Article
Peer-Review Record

Enhancing Pediatric Asthma Homecare Management: The Potential of Deep Learning Associated with Spirometry-Labelled Data

Appl. Sci. 2025, 15(19), 10662; https://doi.org/10.3390/app151910662
by Heidi Cleverley-Leblanc 1,2, Johan N. Siebert 2,3, Jonathan Doenz 4, Mary-Anne Hartley 4,5, Alain Gervaix 2,3, Constance Barazzone-Argiroffo 2,6, Laurence Lacroix 2,3 and Isabelle Ruchonnet-Metrailler 2,6,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 21 August 2025 / Revised: 19 September 2025 / Accepted: 22 September 2025 / Published: 2 October 2025
(This article belongs to the Special Issue Deep Learning and Data Mining: Latest Advances and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for submitting your manuscript entitled “Enhancing pediatric asthma homecare management: The Potential of Deep Learning associated with spirometry-labelled data.”

Your study addresses an important and clinically relevant challenge: improving early detection of asthma exacerbations in pediatric patients through AI-based analysis of lung sounds. In particular, your use of spirometry-labelled data, as opposed to more subjective human-labeled datasets, is a novel and commendable approach with strong potential for future impact.

However, while the study is conceptually interesting, we believe major revisions are required before the manuscript can be considered for publication. Below, we outline the rationale for this recommendation and provide specific guidance for improving the manuscript.

1. Lack of Model Generalization – Needs Deeper Analysis

The AI model showed promising performance on the training set (AUROC = 0.763) but very poor generalization on the validation set (AUROC = 0.398), which is essentially no better than random classification. This outcome, while acknowledged in the paper, demands deeper technical and scientific analysis.

The discussion should go beyond listing possible factors (e.g., cohort characteristics, device variance) and provide quantitative analyses supporting each hypothesis.

For example:

Performance breakdown by clinical severity (e.g., ACTp or PRAM scores).

Error analysis comparing misclassified cases to correctly classified ones.

Evaluation of class imbalance effects and data stratification.

Understanding exactly why the model failed to generalize is key to making your study meaningful—even if the initial results were negative.
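As an illustration of the stratified analysis suggested above, here is a minimal, hypothetical sketch (pure standard-library Python, toy data): a Mann-Whitney formulation of AUROC plus a per-stratum breakdown that could be applied to severity bands (e.g., ACTp or PRAM groups) or recording-device groups. The grouping keys and scores below are invented for illustration.

```python
from collections import defaultdict

def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; ties receive average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 1-based average rank over the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    return (sum(pos) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auroc_by_group(labels, scores, groups):
    """AUROC per stratum (e.g., severity band or stethoscope model)."""
    buckets = defaultdict(list)
    for y, s, g in zip(labels, scores, groups):
        buckets[g].append((y, s))
    out = {}
    for g, pairs in buckets.items():
        ys, ss = zip(*pairs)
        if 0 < sum(ys) < len(ys):  # both classes must be present in the stratum
            out[g] = auroc(list(ys), list(ss))
    return out
```

A breakdown of this kind would make visible whether the validation drop is concentrated in particular severity bands or devices.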

2. Limitations in Dataset Composition – Affects Model Learning

Your dataset includes mostly asymptomatic or mildly symptomatic outpatients, which likely limited the model’s ability to learn meaningful features related to asthma exacerbation.

While this was an intentional design choice to detect early signs, the result is a dataset that lacks the acoustic diversity needed to train a robust model.

Please clarify:

Was the low symptom profile a deliberate inclusion criterion, or a limitation of recruitment?

Do you intend this model for early-stage detection only, or general asthma monitoring?

Addressing these questions will help clarify the model’s intended scope and limitations.

3. Device Variability and Its Impact on Performance

Two digital stethoscopes were used in the study, with 74% of recordings obtained using the Eko® CORE stethoscope. Surprisingly, this device yielded lower validation performance than the Littmann® stethoscope, despite being the preferred tool.

This discrepancy warrants deeper technical investigation:

Could filtering or noise suppression features in the Eko® device be unintentionally masking subtle lung sounds?

We recommend including:

Device-specific performance breakdowns.

Basic acoustic analysis (e.g., frequency distribution, noise levels).

A short discussion on standardizing input data from different devices in future work.
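To make the suggested acoustic analysis concrete, a small hypothetical sketch using NumPy: it computes the fraction of spectral power falling into each frequency band, plus the dominant frequency, so profiles could be compared between the two stethoscope models. The band edges are illustrative choices, not values from the manuscript.

```python
import numpy as np

def band_profile(signal, fs, bands=((100, 400), (400, 1000), (1000, 2000))):
    """Fraction of spectral power per frequency band, plus the dominant frequency.

    signal: 1-D audio samples; fs: sampling rate in Hz.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    total = spectrum.sum()
    fractions = {
        (lo, hi): spectrum[(freqs >= lo) & (freqs < hi)].sum() / total
        for lo, hi in bands
    }
    return fractions, freqs[np.argmax(spectrum)]
```

Systematic differences in such band profiles between devices would support (or rule out) the filtering hypothesis above.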

4. Lack of Transparency in Model Architecture and Training

The manuscript mentions the use of the DeepBreath algorithm, but offers minimal detail about the model's structure, training parameters, or validation strategy.

Readers should not be required to follow an external link to understand your methodology.

Please include:

An overview of model architecture.

Hyperparameters used.

Any techniques to prevent overfitting (e.g., dropout, early stopping).

How cross-validation or internal validation was performed (if any).

This will improve the reproducibility and scientific rigor of your work.
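As a concrete example of one such overfitting countermeasure, a minimal early-stopping helper in plain Python (an illustrative sketch, not the authors' DeepBreath code) that halts training once validation loss stops improving for a set number of epochs:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Reporting whether a criterion like this was applied (and with what patience) would let readers judge how the training/validation gap arose.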

5. Clarifying the Contribution of a “Negative Result”

Despite the limited model performance, your work offers value by clearly demonstrating when and why AI-based lung sound models may fail—a crucial insight for future development.

To emphasize this contribution, we encourage you to:

Adjust the abstract and conclusion to explicitly highlight the value of identifying the failure conditions (e.g., model underperformance in low-symptom patients or home-use scenarios).

Reframe the results not simply as disappointing, but as a guidepost for better model design, data collection, and clinical integration.

This repositioning could turn your paper into a meaningful contribution, especially for others working on digital stethoscope data, pediatric AI applications, or early-stage disease monitoring.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Overall assessment.

The paper is devoted to the creation and evaluation of a deep learning model for the detection of asthma in children based on digital lung auscultation, with an objective link to the results of spirometry (lines 18–26, 65–69). This is a significant step forward compared to previous works, which were mostly based on human labeling of breath sounds (lines 61–64, 216–220). The authors emphasize the potential of such an approach for improving home monitoring of asthma and early detection of exacerbations.

Strengths

  1. Scientific novelty. Using spirometry as the “gold standard” for labeling recordings makes the model more objective and less prone to bias (lines 222–225).
  2. Study design. Prospective, single-center study with clearly described patient inclusion and exclusion criteria (lines 72–88).
  3. Data volume. 16.8 hours of audio were collected from 151 patients (lines 28–29, 153–159), a reasonable sample size for a pilot study.
  4. Technical implementation. The DeepBreath algorithm is adapted for multichannel audio recordings (lines 132–138). The authors took into account the features of different digital stethoscopes (lines 93–101, 239–244).
  5. Ethical aspect. Ethics committee approval and informed consent from parents and children were obtained (lines 72–78).

Limitations

  1. Model generalization. The significant performance gap between the training set (AUROC = 0.763) and the validation set (0.398–0.511) (lines 172–179, 231–233) indicates overfitting and limited generalizability.
  2. Sample composition. Most patients were asymptomatic or had mild symptoms (lines 159–161, 247–253). This reduced the number of “high-quality” asthma exacerbations to train on.
  3. Instrumental differences. Results varied significantly between stethoscope models (lines 239–244), which casts doubt on the universality of the developed algorithm.
  4. Lack of multimodal data. The model only considered breath sounds, although the integration of clinical parameters (saturation, respiratory rate, symptoms) could improve accuracy (lines 258–260).
  5. Comparison with other algorithms. The article lacks direct comparison with other approaches or open datasets to better assess the benefits.

 

Recommendations to authors

  • Expand the sample to include children with acute exacerbations so that the training data are more varied, or use data augmentation, and show how the results change with augmentation or with external datasets. The obtained accuracy of ~0.3 is too low for medical diagnostics and does not allow the model to be considered a practical tool. To improve performance, the authors should consider data augmentation methods (for example, artificially enlarging the sample by adding noise, time-stretching, or generating synthetic examples), as well as transfer learning approaches to improve the model's generalization ability.
  • Analyze cases of incorrect classifications to highlight typical algorithm errors.
  • The literature references are not in the correct format and need to be corrected. Also make the indentation consistent: between lines 142 and 143 the indentation style is the same, but between lines 152 and 154 it differs, and line 230 is also questionable. Fix everything according to the journal's requirements.
  • Line 67 – the goal of a study is never simply to develop a method. The goal can be to increase the accuracy or speed of diagnostics; the method is the means of achieving that goal. Please correct the stated goal. Once you do, you will see that before formulating it you need to show the shortcomings of the existing approach that you plan to address, i.e., the gap between what exists and what is needed (you describe the problem very well in the first two sentences of the abstract). Overcoming this gap is your goal, which you pursue through this study. In the introduction, briefly outline how exactly you plan to close this gap – the method (not to be confused with the goal). Then, in the conclusions, state by how many points or percent your method improved the state of affairs, in your case diagnostic accuracy, or justify why it did not. The abstract itself is formulated perfectly – everything needed is there.
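To make the augmentation suggestion above concrete, a hypothetical NumPy sketch of two of the mentioned transforms: noise injection at a target signal-to-noise ratio and a naive time stretch via linear resampling. Real pipelines would typically use a dedicated audio library, and the parameter values here are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible augmentation

def add_noise(signal, snr_db=20.0):
    """Mix Gaussian noise into the signal at a target SNR in dB."""
    power = np.mean(signal ** 2)
    noise_power = power / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

def time_stretch(signal, rate=1.1):
    """Naive stretch by linear resampling (rate > 1 shortens the clip)."""
    n_out = int(len(signal) / rate)
    src = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(src, np.arange(len(signal)), signal)
```

Applied to the recordings, such transforms would multiply the effective number of training examples without new recruitment.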

 

Conclusion

The article is interesting, well-structured, and relevant. It contributes to the development of digital medicine and the application of artificial intelligence in pediatrics. However, the results are of limited practical importance due to the weak generalizability of the model (or small sample size). After further development and expansion of clinical scenarios, the study has significant potential for implementation in the home monitoring system for asthma in children.

Recommendation: The study deserves publication after further development.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The aim of the work sounds promising: teaching patients independent control of asthma symptoms. It may well be achievable, especially since asthma is managed mainly on an outpatient basis.

  1. The work draws on only two methods. In clinical practice, however, asthma is largely diagnosed by immunological parameters. In the Introduction it would therefore be advisable to review the other methods used to diagnose asthma, so as to lead the reader to the importance of auscultation and spirometry.
  2. It would be helpful to show original auscultation and spirometry recordings.
  3. The part of Table 1 preceding the clinical data should be moved to the Introduction and linked to Section 2.3.2. The anamnesis and the age at the first recorded attack should be added.
  4. Why was forced inspiratory volume not assessed during spirometry?
  5. In Figures 2 and 3, it is better to avoid abbreviations and use full labels.
  6. Table 1: the description of the spirometry data should be expanded in the text, and significant differences should be indicated in the table, especially for the clinical data related to the research objectives.
  7. The AI model is not disclosed in the work. What results support the model? The results read more like a fragment of an expert system based on auscultation and spirometry data. What is the proposed algorithm for using the model at home? An optimally functioning model should produce a results file that can be exported to the clinic. Deep learning itself is also neither described nor discussed.
  8. Of the 24 cited sources, only 4 are from the last 5 years, whereas a share of up to 70% recent sources is expected for a research paper.
  9. How is source 21 used?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

This paper aims to develop an AI-based automatic auscultation analysis tool for the early detection of asthma exacerbations in children within homecare settings. It identifies a key limitation in existing auscultation-based AI models—namely, their reliance on subjective human labeling—and proposes a novel approach using objective spirometry results as labels for model training. The study was conducted as a prospective observational study at the Geneva University Hospitals in Switzerland, collecting lung sound recordings and spirometry data from a total of 151 pediatric outpatients. While the model achieved an AUROC of 0.763 on the training data, its performance declined significantly on the validation set (AUROC = 0.398), revealing limitations in generalization. Despite this, the paper holds strong academic value in that it analyzes real-world constraints of AI model deployment and identifies performance deterioration due to differences between digital stethoscope devices, while also suggesting directions for future research.

<General Comments on Scientific Concept and Content>
Strengths:

- The scientific problem is clearly defined, and the limitations of previous studies are well summarized.

- The labeling strategy based on spirometry represents a robust method to eliminate subjectivity in AI-based lung sound analysis.

- Technical aspects of the AI model—such as preprocessing, model architecture, and training strategy—are described in detail, supporting reproducibility.

- The study’s focus on pediatric outpatients reflects realistic conditions of home-based asthma management.

<Specific Comments>

(1) L33–34: Rather than using the expression “failed to generalize,” it is recommended to use a more objective phrasing such as “performance in validation set was limited (AUROC = 0.398).”

(2) L106–108: The authors state that dropout or early stopping was not used due to the lack of repeated measures, implying overfitting was not a concern. However, the significant performance drop in the validation set suggests overfitting may have occurred. This should be reconsidered.

(3) L259–281: While the discussion of model performance is adequate, including additional evaluation metrics such as the confusion matrix, precision, and recall would enhance the reliability of the interpretation.

(4) L360–372: The suggestions for future work are appropriate, but it would strengthen the manuscript to further elaborate on how AI-based clinical tools could realistically influence decision-making in clinical practice.
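For reference, the metrics requested in comment (3) can be computed from binary predictions in a few lines of plain Python; the labels and predictions below are toy values, not the study's data.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def precision_recall(y_true, y_pred):
    """Precision and recall, returning 0.0 where a denominator is empty."""
    tp, fp, fn, _ = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Reporting these alongside AUROC would show whether the validation failure stems from missed exacerbations (low recall) or false alarms (low precision).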

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

In the updated conclusions (in green), the authors state that “Our findings reveal that in mostly asymptomatic outpatients, lung sounds alone do not provide sufficient information for the early detection of asthma exacerbations” (around line 380). From this, they conclude that “First, differences in how digital stethoscopes process sounds may suppress relevant frequencies for the diagnosis of early asthma attacks” (line 382). That is, from the fact that “… lung sounds alone do not provide sufficient information,” it is concluded that the problem is “how digital stethoscopes process sounds.” Something in the conclusion needs to be rephrased here.

The last sentence of the conclusion is not very informative. Rewrite it, because both I and the reader can only guess at its meaning: your studies showed that despite encouraging training performance (___), in experiments on real asymptomatic patients the method showed insufficient accuracy (___). You therefore conclude that, to make a diagnosis, the test data must be integrated with spirometry data? And this will be your next work? Correct the last sentence.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors responded to the reviewer's comments and made the appropriate edits.

It seems to me that the captions should be placed under the figures (Figures 1–3).

Author Response

Thank you for your comment.

We have now placed the captions under the figures.
