Article
Peer-Review Record

Deep Learning for Automated Detection of Periportal Fibrosis in Ultrasound Imaging: Improving Diagnostic Accuracy in Schistosoma mansoni Infection

Appl. Sci. 2026, 16(1), 87; https://doi.org/10.3390/app16010087
by Alex Mutebe 1,2,*, Bakhtiyar Ahmed 3, Agnes Natukunda 1, Emily Webb 1,4, Andrew Abaasa 1, Simon Mpooya 5, Moses Egesa 1,6,7, Ayoub Kakande 1, Alison M. Elliott 1 and Samuel O. Danso 8
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 November 2025 / Revised: 27 November 2025 / Accepted: 3 December 2025 / Published: 21 December 2025

Round 1

Reviewer 1 Report (Previous Reviewer 2)

Comments and Suggestions for Authors

The authors have made the necessary revisions in response to my previous comments, and I believe the manuscript has met the minimum acceptance criteria.

Nevertheless, several key metrics must be corrected. For example, based on the confusion matrix in Figure 8, the Sensitivity of Model 2 in Table 2 should be 84. The authors should double-check the accuracy of all data presented in Table 2.

On Page 11, correspondingly, the statement "Model 2 achieved a precision of 84%" should be revised to 80%.

Given that model metrics are a critical and sensitive issue, yet the authors have failed to maintain full data consistency repeatedly, I request that they also provide the confusion matrix for Model 1 to facilitate further verification of all model metrics.

Author Response

We sincerely thank the reviewer for the time and effort devoted to reviewing our manuscript. We deeply appreciate the insightful and constructive feedback, which has greatly contributed to improving the clarity, quality, and scholarly rigor of our work. Below are our detailed responses to each comment, with corresponding revisions clearly marked and referenced by line numbers in the resubmitted manuscript.

Comment

The authors have made the necessary revisions in response to my previous comments, and I believe the manuscript has met the minimum acceptance criteria.

Nevertheless, several key metrics must be corrected. For example, based on the confusion matrix in Figure 8, the Sensitivity of Model 2 in Table 2 should be 84. The authors should double-check the accuracy of all data presented in Table 2.

On Page 11, correspondingly, the statement "Model 2 achieved a precision of 84%" should be revised to 80%.

Given that model metrics are a critical and sensitive issue, yet the authors have failed to maintain full data consistency repeatedly, I request that they also provide the confusion matrix for Model 1 to facilitate further verification of all model metrics.

 

Response

Thank you for your careful re-examination of our manuscript and for identifying inconsistencies in the reported performance metrics. Accordingly, we have taken the following actions:

Correction of Table 2: We have re-checked all calculations against the confusion matrices. The Sensitivity of Model 2 has been corrected to 84%, and all other values in Table 2 (Section 3.1, page 10) have been verified for accuracy.

Correction of Page 11 statement: The text has been revised to reflect the correct precision value for Model 2, now stated as 76% and reported consistently in Table 2 (page 10) and the Results section (page 11). We recognise that the reviewer suggested 80%, but our verified calculations confirm 76% as the accurate figure.

Addition of Confusion Matrix for Model 1: To ensure transparency, we have included the confusion matrices for both Model 1 (Figure 8) and Model 2 (Figure 9), allowing independent verification of all metrics. These corrections eliminate the earlier inconsistency and strengthen the reliability of our findings.
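For reference, the following is a minimal Python sketch of how sensitivity, specificity, precision, and accuracy can be re-derived from a binary confusion matrix; the counts used in the example are illustrative placeholders only, not the values reported in Figures 8 and 9.

```python
# Minimal sketch: re-deriving binary classification metrics from a confusion
# matrix. The counts passed in the example are illustrative placeholders only;
# substitute the TP/FN/FP/TN cells read from Figures 8 and 9 to audit Table 2.

def metrics_from_confusion(tp: int, fn: int, fp: int, tn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),            # recall on the disease class
        "specificity": tn / (tn + fp),
        "precision":   tp / (tp + fp),
        "accuracy":    (tp + tn) / (tp + fn + fp + tn),
    }

if __name__ == "__main__":
    # Placeholder counts, not taken from the manuscript.
    print(metrics_from_confusion(tp=40, fn=10, fp=8, tn=42))
```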

Author Response File: Author Response.pdf

Reviewer 2 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

The response letter is not written in a proper format.
It should list each reviewer’s comment one by one, followed by your detailed response to each point.
Note that these are the comments and responses of all reviewers, not just one.

In addition, the revised manuscript does not include any highlighted changes.
All modifications must be highlighted in the revised version so reviewers can easily identify the updates, rather than having to compare the old and new versions manually.

Please make the above corrections and resubmit your manuscript and response letter.

Author Response

 Comment

The response letter is not written in a proper format. It should list each reviewer’s comment one by one, followed by your detailed response to each point. Note that these are the comments and responses of all reviewers, not just one. In addition, the revised manuscript does not include any highlighted changes. All modifications must be highlighted in the revised version so reviewers can easily identify the updates, rather than having to compare the old and new versions manually. Please make the above corrections and resubmit your manuscript and response letter.

Response

We have reformatted the response letter to list each reviewer’s comment verbatim, followed by our detailed point‑by‑point replies. All reviewers’ comments are now included. In addition, all modifications in the revised manuscript (manuscript ID indicated) have been highlighted, with section IDs, page numbers, and line references provided in the response letter to ensure reviewers can easily trace the updates. Both the revised manuscript and corrected response letter are resubmitted as requested.

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

I am pleased to receive your submission, "Deep Learning for Automated Detection of Periportal Fibrosis in Ultrasound Imaging: Improving Diagnostic Accuracy in Schistosoma mansoni Infection." As a reviewer of this manuscript, I have carefully examined the work, which employs deep learning techniques, particularly a VGG16-inspired CNN, to automatically identify and classify periportal fibrosis (PPF) induced by Schistosoma mansoni infection, with the aim of enhancing the accuracy and accessibility of ultrasound diagnosis. The proposed method achieved an accuracy of 80%, and the model demonstrates robust performance even with low-resolution 32×32 inputs, effectively showcasing its resilience. Nevertheless, certain aspects warrant further refinement and enhancement:

The methodology utilized only 200 images selected from an initial pool of 791 for training and testing. While 200 images represent a reasonable sample size, this quantity slightly falls below conventional dataset sizes in medical AI studies. Would it be possible to incorporate additional datasets for both training and validation to strengthen the model's generalizability?

Furthermore, the original dataset comprised 791 images, yet only 200 were retained for model training. The exclusion rationale merely noted issues of untraceable image origins and inconsistent image quality without specifying explicit exclusion criteria. To enhance reproducibility and scientific rigor, could the authors:

(1) Detail the precise selection protocol for the 200 images, or

(2) Clarify the specific technical issues rendering the remaining images unusable for model training?

The methodology reports that 32×32-pixel resolution yielded superior performance compared to 128×128 pixels—a counterintuitive finding given that higher resolution (128×128) theoretically should capture richer image features. The authors must clarify:

(1) Whether 32×32 resolution compromises the preservation of critical textural details (e.g., periportal fibrosis-specific features), and (2) Why a low-resolution approach—despite potential degradation of key medical features—achieves better diagnostic efficacy.

The experimental section lacks explicit explanation of the model's decision-making process. To enhance interpretability and clinical relevance, the authors should:
  (1) Incorporate one or more Grad-CAM visualizations in the results section to demonstrate salient regions of attention.

(2) Analyze whether these regions align with clinical experts' annotations (e.g., consensus on periportal fibrosis zones).

The paper emphasizes the significance of model lightweighting and deployability in the Discussion section, yet lacks quantitative performance metrics for real-world deployment. To substantiate these claims, it is recommended that the author add:

(1) Inference latency and memory footprint measured on edge computing platforms

(2) Comparison with traditional models or clinical experts under clinically relevant time constraints

Suggestions for enhancing the paper's impact: the authors could demonstrate model utility through verifiable real-world deployment. It is also recommended to provide pseudocode for the core algorithmic logic or neural network architecture diagrams illustrating tensor flow operations, explicitly linking each component to the clinical task.

In summary, the authors present a novel approach for automated identification and diagnosis of Schistosoma mansoni-induced periportal fibrosis (PPF), demonstrating rigorous validation through clinical datasets. While the methodology exhibits strong innovation, certain revisions are still required in terms of the data processing procedure and output accuracy. If the authors can adequately address and enhance these issues, this work holds significant promise for advancing medical AI in diagnostic imaging.

Author Response

Response to Reviewer 3 Comments

We sincerely thank the reviewer for the time and effort devoted to reviewing our manuscript. Below are our detailed responses to each comment, with corresponding revisions clearly marked and referenced by line numbers in the resubmitted manuscript.

 Comment

The methodology utilized only 200 images selected from an initial pool of 791 for training and testing. While 200 images represent a reasonable sample size, this quantity slightly falls below conventional dataset sizes in medical AI studies. Would it be possible to incorporate additional datasets for both training and validation to strengthen the model's generalizability?

Furthermore, the original dataset comprised 791 images, yet only 200 were retained for model training. The exclusion rationale merely noted issues of untraceable image origins and inconsistent image quality without specifying explicit exclusion criteria. To enhance reproducibility and scientific rigor, could the authors:

(1) Detail the precise selection protocol for the 200 images, or

(2) Clarify the specific technical issues rendering the remaining images unusable for model training?

Response

Thank you for this comment. As noted in Section 2.1 (Lines 107–116), inconsistencies in the image identification numbers stored on the ultrasound device prevented reliable matching of some images with their corresponding clinical records. Furthermore, the class imbalance between non-disease (n = 594) and disease cases (n = 179) constrained the effective sample size for this binary classification task. We also highlight in the Discussion (Lines 355–360) that future work will incorporate larger and more diverse datasets to improve model robustness and generalisability.

 

Comment

The methodology reports that 32×32-pixel resolution yielded superior performance compared to 128×128 pixels—a counterintuitive finding given that higher resolution (128×128) theoretically should capture richer image features. The authors must clarify:

 

(1) Whether 32×32 resolution compromises the preservation of critical textural details (e.g., periportal fibrosis-specific features), and (2) Why a low-resolution approach—despite potential degradation of key medical features—achieves better diagnostic efficacy.

Response

In response to this, we have clarified that the original DICOM frames varied in size due to patient anatomy and imaging depth (Section 2.2, Lines 125–132). Images were resized to standard dimensions (32×32 and 128×128) to improve computational efficiency. Aspect ratio was preserved using padding to minimize distortion. Preliminary experiments comparing input resolutions of 32×32 and 128×128 pixels showed that the 32×32 configuration achieved superior accuracy, indicating that key diagnostic features for PPF detection were retained even at lower resolution (Page 11, Section 4, Lines 302–306). Further clarification can be found on Page 8, Section 2.6, Lines 243–248.
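For illustration, a minimal sketch of the aspect-ratio-preserving resizing with padding described above; the library choice (Pillow), grayscale conversion, and black padding are assumptions, as the manuscript does not specify these implementation details.

```python
# Minimal sketch of aspect-ratio-preserving resizing with padding, as described
# in Section 2.2. Library choice (Pillow), grayscale conversion, and black
# padding are assumptions; the manuscript does not specify these details.
from PIL import Image

def resize_with_padding(path: str, target: int = 32) -> Image.Image:
    img = Image.open(path).convert("L")              # treat ultrasound frame as grayscale
    img.thumbnail((target, target))                  # shrink in place, keeping aspect ratio
    canvas = Image.new("L", (target, target), 0)     # square black canvas
    offset = ((target - img.width) // 2, (target - img.height) // 2)
    canvas.paste(img, offset)                        # centre the resized frame
    return canvas

# Example (hypothetical file name): small = resize_with_padding("frame_001.png", target=32)
```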

Comment

The experimental section lacks explicit explanation of the model's decision-making process. To enhance interpretability and clinical relevance, the authors should:

  (1) Incorporate one or more Grad-CAM visualizations in the results section to demonstrate salient regions of attention.

 

(2) Analyze whether these regions align with clinical experts' annotations (e.g., consensus on periportal fibrosis zones).

 

Response

We appreciate this constructive suggestion. In the revised manuscript (Lines 370–374), we have added a statement proposing the incorporation of interpretability techniques such as Grad‑CAM and other visualization tools in future research to identify the image regions most influential in model decision‑making. While Grad‑CAM visualizations were not included in the current Results section, given the exploratory nature and limitations of this study, we agree that such analyses are critical for clinical relevance. We therefore highlight interpretability as a key priority for future work, including explicit comparison of Grad‑CAM saliency maps with expert‑annotated periportal fibrosis zones. This addition strengthens the discussion on model transparency and clinician confidence.
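For concreteness, a minimal Grad‑CAM sketch of the kind proposed for future work, written for a tf.keras CNN with a single sigmoid output; the model variable and the last convolutional layer name are hypothetical placeholders, not the study's actual implementation.

```python
# Minimal Grad-CAM sketch (future work as discussed above, not the study's code).
# "model" is assumed to be a trained tf.keras CNN with a single sigmoid output,
# and "last_conv" the name of its final convolutional layer; both are placeholders.
import numpy as np
import tensorflow as tf

def grad_cam(model: tf.keras.Model, image: np.ndarray, last_conv: str) -> np.ndarray:
    """Return a [0, 1] heatmap over the spatial grid of the last conv layer."""
    grad_model = tf.keras.Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...].astype("float32"))
        score = preds[:, 0]                           # predicted probability of PPF
    grads = tape.gradient(score, conv_out)            # d(score)/d(conv activations)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # global-average-pool the gradients
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Example (hypothetical layer name): heatmap = grad_cam(model, img32, last_conv="conv2d_5")
```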

 

Comment

The paper emphasizes the significance of model lightweighting and deployability in the Discussion section, yet lacks quantitative performance metrics for real-world deployment. To substantiate these claims, it is recommended that the author add:

 

(1) Inference latency and memory footprint measured on edge computing platforms

(2) Comparison with traditional models or clinical experts under clinically relevant time constraints

Suggestions for enhancing the paper's impact: the authors could demonstrate model utility through verifiable real-world deployment. It is also recommended to provide pseudocode for the core algorithmic logic or neural network architecture diagrams illustrating tensor flow operations, explicitly linking each component to the clinical task.

 

Response

We sincerely thank the reviewer for their constructive feedback regarding the inclusion of deployment metrics, comparative evaluation under clinical constraints, and detailed algorithmic documentation. We agree that these aspects would further strengthen the translational impact of our work. However, the current study was designed as a proof-of-concept investigation to evaluate the diagnostic performance of a CNN model for periportal fibrosis detection in ultrasound images. Within this scope, it was not practically feasible to measure inference latency or memory footprint on edge computing platforms, nor to conduct comparative benchmarking against clinical experts under time-sensitive workflows. Similarly, while pseudocode and detailed tensor flow diagrams would enhance reproducibility, these elements will be incorporated in subsequent studies where we extend evaluation to real-world deployment scenarios and larger, more diverse datasets.

To address this valuable feedback, we have now explicitly acknowledged these limitations in the Discussion section (last paragraph, Lines 384–390) and highlighted them as priorities for future work. We believe this staged approach, first establishing diagnostic feasibility and then expanding to deployment and expert benchmarking, will ensure methodological rigor and strengthen the clinical translation of our findings.
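As an indication of what such deployment measurements might look like, a minimal benchmarking sketch for wall-clock inference latency and a parameter-based memory footprint estimate; the input shape, repeat counts, and float32 weight assumption are illustrative, and results would depend on the target edge hardware.

```python
# Minimal sketch of the deployment measurements discussed above: average
# inference latency and a rough weight-memory estimate. The input shape,
# warm-up/repeat counts, and float32 assumption are illustrative placeholders.
import time
import numpy as np
import tensorflow as tf

def benchmark(model: tf.keras.Model, input_shape=(1, 32, 32, 1), warmup=10, repeats=100):
    x = np.random.rand(*input_shape).astype("float32")
    for _ in range(warmup):                      # warm-up runs (graph tracing, caches)
        model.predict(x, verbose=0)
    start = time.perf_counter()
    for _ in range(repeats):
        model.predict(x, verbose=0)
    latency_ms = (time.perf_counter() - start) / repeats * 1000.0
    weight_mb = model.count_params() * 4 / 1e6   # float32 weights only, ignoring activations
    return latency_ms, weight_mb

# Example (hypothetical): lat, mem = benchmark(model); print(f"{lat:.2f} ms/image, {mem:.2f} MB")
```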

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report (Previous Reviewer 3)

Comments and Suggestions for Authors

There were no substantial experiments, and although the comments from the first round were addressed, they were all labeled as future work. 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Advanced deep learning methods were applied to automate the detection of periportal fibrosis (PPF) in ultrasound images of Schistosoma mansoni infection. A convolutional neural network trained on a balanced dataset of 200 images achieved an accuracy of 80%, with sensitivity and specificity of 80% and 84%, respectively. Future validation on larger datasets and multi-class fibrosis grading was recommended to enhance clinical applicability.

  • A small dataset of only 200 images was used, which limits generalisability.
  • Images were selected sequentially without randomisation, introducing potential selection bias.
  • The sample size was not statistically determined, which reduced the robustness of the findings.
  • Only one ultrasonographer performed the image classification, increasing the risk of subjective bias.
  • No independent validation set from external sources was used.
  • The study excluded other liver diseases, limiting its applicability in broader clinical contexts.
  • The CNN was evaluated only for binary classification (fibrosis vs. no fibrosis), not for multiple severity levels.
  • The study population was geographically restricted to Ugandan communities, reducing cross-population validity.
  • The CNN architecture was adapted from VGG16 but was not benchmarked against other state-of-the-art models.
  • Interpretability tools (e.g., Grad-CAM) were not incorporated, limiting clinical trust.
  • Training involved manual hyperparameter tuning without automated optimisation strategies.
  • The achieved accuracy (80%) remains moderate, which may limit clinical adoption without further improvements.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript investigates the detection of periportal fibrosis (PPF) in ultrasound images. While liver fibrosis detection has been widely studied, the authors restrict their work to schistosomiasis-related PPF under the Niamey protocol, which gives the study some novelty in terms of context.

That said, personally, the manuscript has very limited technical appeal. The experiments are conducted on a small private dataset, the backbone is an early VGG network, and the stated objective is largely confined to comparing model metrics under different batch sizes. Moreover, the core results (Table 2) contradict Figure 8. For example, specificity should be defined as TN/(TN+FP)=16/(16+5)=76%, not 84% as reported in Table 2. I cannot tell whether this discrepancy is intentional or accidental, but the authors must audit all their metrics, and I would require the full confusion matrix to be included in the revision.

The description of the datasets is also very general and omits critical information—we only learn that the images are 32×32-pixel PNGs. The authors must provide more details, such as: patient positioning during acquisition; the imaging device; acquisition procedures; the format and size of the raw image data; criteria for extracting frames from ultrasound videos; and the qualifications of the physician(s) who performed the Image Pattern Score assessments. In addition, 32×32 is extremely small—why were the images downscaled to such a low resolution? Without the original image size or even example raw images, it is difficult to understand the necessity of this choice.

Other comments include:

The dataset is very small (200 samples). Even considering label balance, there could be 394 samples, yet the authors state that images were “very difficult” to obtain. Collecting as much data as possible is essential for this line of research—especially since the data appear to be single-center. Personally, the explanation provided does not make sense to me.

The manuscript states that “the adapted model was specifically designed to …,” but from the description the authors are simply using a standard VGG-16. I do not see any substantive differences.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper develops a deep learning-based automated detection method for periportal fibrosis.

Strengths:
The authors compiled a comprehensive dataset of liver ultrasound images and carefully annotated it. 

Weaknesses:
1. The paper is overly lengthy and covers many basic concepts. Paragraphs 6 and 7 of the introduction are somewhat repetitive. It is recommended that you refine your presentation and highlight the innovations in your work.

2. The paper lacks innovation. The VGG framework used is a classic, early approach. While improvements are mentioned in the paper, no concrete improvements are found. Note that modifying parameters does not constitute an improvement.

3. The article does not point out that directly resizing the original-size data to a 32×32 resolution will result in a loss of diagnostic information.

4. In the Model Training section, “Training began by evaluating two input image resolutions (32×32 pixels and 128×128 pixels), cycling through each size to identify the optimal scale for feature extraction.” After evaluation, was the 32×32 pixel size adopted?

5. Model 1 and Model 2 are both models proposed by the authors; the original results of VGG16 or other methods should be included for comparison.

6. The training and validation curves of Fig. 6 and Fig. 7 are both oscillating and have not yet converged. Consider increasing the batch size and epochs to make a final judgment.

Author Response

Please see the attachment

Author Response File: Author Response.pdf
