Ensemble Transfer Learning for Gastric Cancer Prediction Using Electronic Health Records in a Data-Scarce Single-Hospital Setting
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
1. In the introduction section, it is recommended to reorganize the structure. Please arrange it according to research background, research motivation, research objectives, and research framework. Currently, the research objectives are overly lengthy, and some parts actually belong to the motivation rather than the objectives.
2. The literature review section is too brief. It is suggested to include more relevant studies and divide the content into subsections with subtitles for better clarity.
3. Please explain how the study identified patients with atrophic gastritis or gastric cancer.
4. In the Feature Selection part, please clarify why you focused on nine underlying medical conditions rather than more. Was this choice based on existing literature or another reason?
5. In Figure 2, please specify where Deep Neural Network (DNN) and TabNet were applied.
6. Please add a comparison figure of the fine-tuned model results to the Methods section.
7. In the pretrained model, the TabNet threshold is 0.87, whereas in the fine-tuned model, the threshold is 0.1. This is an extremely large difference. Please explain the reason for this discrepancy.
8. In the Results section, only SVM, RF, and DNN are compared. Please explain why. Since TabNet also showed good performance in both Methods and Results, why was it not included in the comparison? In Table 6, the Ensemble Transfer Learning only shows one result. Please clarify which model this result corresponds to.
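As context for comment 7, a decision threshold is usually tuned on validation scores, so it moves with the score distribution of the model it is tuned on. The following is a minimal sketch, not the authors' procedure: it assumes Youden's J statistic as the selection rule, and the scores, labels, and the function name `best_threshold` are hypothetical.

```python
def best_threshold(scores, labels):
    """Return the cutoff maximizing Youden's J = sensitivity + specificity - 1.

    Assumes binary labels (1 = positive) and that a case is predicted
    positive when score >= cutoff. Illustrative only.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_j, best_cut = -1.0, None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_j, best_cut = j, cut
    return best_cut


# Hypothetical validation scores from two models on the same six cases.
labels = [0, 0, 0, 1, 1, 1]
# Model A outputs scores bunched near 1, so its best cutoff is high.
scores_a = [0.80, 0.82, 0.85, 0.90, 0.93, 0.97]
# Model B outputs scores bunched near 0, so its best cutoff is low.
scores_b = [0.02, 0.03, 0.05, 0.12, 0.20, 0.30]

threshold_a = best_threshold(scores_a, labels)
threshold_b = best_threshold(scores_b, labels)
```

In this toy case both models rank the cases identically, yet the selected cutoffs are 0.90 and 0.12. A large gap between thresholds can therefore reflect differently scaled output scores rather than an error; whether that explains the values in the manuscript is for the authors to confirm.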
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This study presents an ensemble transfer learning framework for gastric cancer prediction in data-scarce single-hospital settings using structured electronic health records (EHRs). Three pretrained models (SVM, RF, and DNN) were fine-tuned on a smaller institutional dataset and combined through stacking ensemble learning, achieving an AUC of 0.92 and other performance values. Risk factors such as smoking status, gender, and hypertension were identified, demonstrating the model’s interpretability and its potential to improve gastric cancer prediction in limited-data clinical environments. The paper is well written; however, I have a few concerns regarding its content.
- There is no information about whether the data are longitudinal. Predictive analysis requires longitudinal data, because such samples help identify the key factors associated with the development of a disease over time. Since this work concerns gastric cancer prediction, longitudinal data are needed.
- How many participants are there?
- The gender and age distributions are missing.
- While gold standards are available for cancer detection and diagnosis, it is unclear why the authors used clinical data only.
- Why did they not use whole-slide images or other relevant data modalities that may be better suited to predictive analysis?
- In Table 4, all the ML models have recall, F-score, and precision of around 70%, yet accuracy is above 92% for all of them, which I do not think is realistic. I recommend that the authors reconsider these simulations.
- Also in Table 4, the authors report a recall value of 1.0, i.e., 100%. This means that the predictive model produces zero false negatives, which is surprising. I suggest reconsidering this simulation for the following two reasons:
  - All other models have recall values of around 70%, and only TabNet reaches 100%; it is unclear why.
  - There may be data leakage during the simulation that produces the 100% recall value.
- I recommend that the authors state the type of data used in their simulations (e.g., tabular data or text data).
- If the data are tabular, did the authors try gradient boosting models, as they often outperform other models on tabular data?
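The Table 4 comments above turn on how accuracy, precision, recall, and F-score are defined. The sketch below uses hypothetical confusion-matrix counts (not taken from the manuscript) to illustrate two points: with a rare positive class, accuracy can exceed 92% while the other metrics sit near 70%, and a recall of 100% means zero false negatives, which does not by itself constrain false positives.

```python
def metrics(tp, fp, fn, tn):
    """Compute standard binary classification metrics from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical cohort: 1000 patients, 100 of them positive (10% prevalence).
imbalanced = metrics(tp=70, fp=30, fn=30, tn=870)
# precision, recall, and F1 are all about 0.70, while accuracy is 0.94

# Hypothetical model that flags every true case but also raises false alarms.
full_recall = metrics(tp=100, fp=50, fn=0, tn=850)
# recall is 1.0 even though there are 50 false positives
```

Reporting the class prevalence and the raw confusion-matrix counts alongside Table 4 would let readers judge whether the reported figures fit this pattern or point to leakage.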
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
I thank the authors for submitting their work. Please see my comments below:
- The novelty of the study appears to be very limited. The authors have applied three well-established algorithms, Support Vector Machine (SVM), Random Forest, and Deep Neural Network, pretrained on a large-scale national dataset from the South Korean National Health Insurance Service (NHIS) for gastric cancer prediction. Could the authors clearly elaborate, in bullet points, on the specific contributions of their work that warrant consideration as a full-length journal article?
- I also request the authors to include a comprehensive comparison table with the latest state-of-the-art approaches in this field and to highlight their unique contributions to the knowledge domain.
- Please specify the toolchain used for model training and testing, and include all relevant details to ensure full reproducibility. The software code should also be made available for verification purposes.
- I would like to see the confusion matrices generated directly from the tool, with all relevant screenshots included in the manuscript as well as the calculated p-value.
- Additionally, please provide the data training and testing waveforms obtained directly from the tool to demonstrate how the error converged during training. Indicate which error functions were used, and provide low-level implementation details regarding the filters.
- Since one of the keywords of the paper is “explainable AI,” the authors should offer a detailed technical explanation to help readers fully understand the methodology.
- Without these simulation results and corresponding visualisations directly obtained from the tool, I am unable to recommend the paper for publication at this stage. I therefore request that the authors address all of the above points and clearly highlight their contributions in comparison with the existing state-of-the-art.
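The confusion matrix and p-value requested above can be produced in a few lines. The following is a minimal sketch under stated assumptions: the review does not name a statistical test, so an exact McNemar test comparing two classifiers on the same test cases is assumed here, and all labels, predictions, and function names are hypothetical.

```python
from math import comb


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return tn, fp, fn, tp


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on paired cases."""
    # b: cases only model A gets right; c: cases only model B gets right.
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree on correctness
    # Under the null hypothesis the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical test labels and predictions from two models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
pred_b = [1, 0, 0, 0, 0, 0, 1, 1, 0, 1]

cm_a = confusion_matrix(y_true, pred_a)
p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

A paired test is used in this sketch because both models are scored on the same patients; a different test may be more appropriate depending on the comparison the authors intend.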
Minor editing is required.
Author Response
Please see the attachment
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
This manuscript has fully addressed my questions and is suitable for publication.
Reviewer 3 Report
Comments and Suggestions for Authors
I thank the authors for addressing my comments. I am happy to recommend the manuscript for publication.
Comments on the Quality of English Language
Minor editing is required.

