We are grateful to Dr. Pastore for his thoughtful comments [1] on our article. Below, we respond to each of the points raised.
We appreciate the reviewer’s insightful observation regarding the potential limitations associated with dataset selection and experimental design. The primary objective of our study was to conduct a systematic and controlled comparison of multiple fine-tuned transfer-learning architectures for brain tumor classification, with an emphasis on methodological benchmarking rather than immediate clinical deployment. To this end, we employed a publicly available Kaggle-sourced MRI dataset, which facilitates transparency, reproducibility, and direct comparison with prior studies, as recommended in early-stage deep learning research for medical imaging [2].
We sincerely thank the reviewer for highlighting the critical issue of subject-level data separation, which is widely recognized in the medical imaging literature as a prerequisite for valid model evaluation. Brain MRI datasets frequently consist of multiple highly correlated slices per subject, often acquired across different sequences and imaging parameters. When random image-level splitting is employed, there is a substantial risk that anatomically adjacent or near-duplicate slices from the same individual may appear in both training and testing sets. This scenario can lead to inadvertent information leakage, enabling models to leverage subject-specific anatomical cues, scanner-dependent textures, or acquisition artifacts rather than learning disease-specific and generalizable representations [3,4]. In the present study, we relied on a publicly available Kaggle MRI dataset that did not include explicit patient, sequence, or site identifiers, thereby limiting our ability to implement strict patient-level or sequence-aware splitting. This constraint reflects a known limitation of several open-access imaging repositories frequently used for benchmarking purposes. To partially mitigate redundancy effects, we confined data augmentation strictly to the training set and ensured that no augmented samples were shared across partitions. However, we acknowledge that these measures cannot substitute for subject-level separation and that residual optimistic bias may remain in the reported performance.
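When patient identifiers are available, subject-level separation can be enforced by assigning whole subjects, rather than individual slices, to each partition. The sketch below is purely illustrative and is not taken from our pipeline (which could not apply it, as the dataset lacks identifiers); the function name, slice IDs, and subject IDs are hypothetical:

```python
import random
from collections import defaultdict

def subject_level_split(slice_ids, subject_of, test_frac=0.2, seed=0):
    """Split MRI slices so that every slice from a given subject falls in a
    single partition, preventing subject-level information leakage.

    slice_ids  : list of slice identifiers
    subject_of : dict mapping slice id -> subject id
    """
    # Group slices by their parent subject
    groups = defaultdict(list)
    for s in slice_ids:
        groups[subject_of[s]].append(s)

    subjects = sorted(groups)
    random.Random(seed).shuffle(subjects)

    # Assign whole subjects to the test set until the target fraction is met
    n_test = max(1, round(test_frac * len(slice_ids)))
    test, train = [], []
    for subj in subjects:
        target = test if len(test) < n_test else train
        target.extend(groups[subj])
    return train, test

# Toy example: 3 subjects, two correlated slices each
subject_of = {"s1a": "p1", "s1b": "p1", "s2a": "p2", "s2b": "p2",
              "s3a": "p3", "s3b": "p3"}
train, test = subject_level_split(list(subject_of), subject_of, test_frac=0.34)
# No subject contributes slices to both partitions
assert not {subject_of[s] for s in train} & {subject_of[s] for s in test}
```

The key design choice is that the unit of randomization is the subject, not the image, so near-duplicate slices can never straddle the train–test boundary.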
We appreciate the reviewer’s detailed and constructive feedback regarding the risk of pipeline leakage arising from improperly nested preprocessing, augmentation, and model optimization steps. We fully agree that all data-dependent operations must be strictly confined to the training folds within a resampling framework to ensure unbiased estimation of model performance. The medical imaging literature has repeatedly emphasized that failure to nest these steps appropriately can introduce subtle but significant information leakage, leading to inflated discrimination metrics and reduced reproducibility [5]. Preprocessing operations such as intensity normalization, resizing, and imputation inherently encode statistical properties of the data distribution. When these transformations are estimated using the full dataset—rather than exclusively on training partitions—information from the held-out data can inadvertently influence the learned feature space, violating the independence of evaluation sets [4]. Similarly, data augmentation strategies, while essential for improving generalization in deep learning models, must be applied only to training samples; applying augmentation prior to data splitting or across folds risks introducing correlated samples into validation or test sets, thereby biasing performance estimates [3,4]. In the present study, all preprocessing and augmentation procedures were implemented using the training data only, and the resulting transformations were applied unchanged to the validation and test sets. Augmentation was performed exclusively on the training subset to avoid redundancy-induced bias. Hyperparameter tuning and fine-tuning schedules were optimized on a dedicated validation split and were not adjusted based on test-set performance. The final evaluation was conducted on a held-out test set that remained untouched throughout model development, thereby preserving the integrity of performance estimation.
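The fit-on-train, apply-everywhere pattern described above can be made concrete with a minimal sketch. The example below is a simplified illustration (not our actual preprocessing code) of intensity normalization in which the statistics are estimated from the training fold only and then frozen before being applied to held-out data:

```python
import statistics

def fit_normalizer(train_values):
    """Estimate normalization statistics from the training fold ONLY."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values) or 1.0  # guard against zero spread
    return mu, sigma

def apply_normalizer(values, mu, sigma):
    """Apply the frozen training-fold statistics to any partition."""
    return [(v - mu) / sigma for v in values]

# Leakage-free pattern: statistics come from the training split alone,
# then are applied unchanged to validation and test data.
train_intensities = [10.0, 12.0, 14.0, 16.0]
test_intensities = [11.0, 15.0]

mu, sigma = fit_normalizer(train_intensities)               # fit on train only
train_norm = apply_normalizer(train_intensities, mu, sigma)
test_norm = apply_normalizer(test_intensities, mu, sigma)   # no refitting
```

Estimating `mu` and `sigma` from the pooled data instead would let test-set statistics shape the feature space, which is precisely the nesting violation the reviewer cautions against.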
We sincerely thank the reviewer for highlighting this important consideration. We agree that evaluating model transportability under data drift and varying acquisition conditions is essential for assessing clinical robustness [6]. In this study, we focused on a single train–validation–test division to establish a baseline comparison across transfer-learning architectures. We acknowledge that repeated nested cross-validation, inclusion of temporally or externally separated test sets, and calibration analyses (e.g., reliability diagrams, decision curve analysis) would provide a stronger assessment of generalization and clinical utility. These aspects will be prioritized in future work.
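As a brief illustration of the calibration analyses mentioned above, a reliability diagram reduces to binning predicted probabilities and comparing mean confidence with observed event frequency per bin. The sketch below is a generic, stdlib-only example with toy values, not an analysis of our model's outputs:

```python
def reliability_bins(probs, labels, n_bins=5):
    """Bin predicted probabilities and compare mean confidence with observed
    frequency in each bin -- the data underlying a reliability diagram."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:  # skip empty bins
            mean_conf = sum(p for p, _ in b) / len(b)
            obs_freq = sum(y for _, y in b) / len(b)
            rows.append((mean_conf, obs_freq, len(b)))
    return rows

# Toy predictions: a well-calibrated model has mean_conf close to obs_freq
probs = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3]
labels = [0, 0, 1, 1, 1, 0]
rows = reliability_bins(probs, labels)
for mean_conf, obs_freq, n in rows:
    print(f"confidence={mean_conf:.2f}  observed={obs_freq:.2f}  n={n}")
```

Plotting `obs_freq` against `mean_conf` yields the reliability curve; systematic deviation from the diagonal indicates miscalibration that discrimination metrics alone would not reveal.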
Author Contributions
Conceptualization, D.R.; Methodology, D.R.; Software, L.K.; Validation, M.D.; Formal analysis, D.R. and L.K.; Investigation, P.J.; Resources, L.K. and A.R.; Data curation, A.R.; Writing—original draft, D.R.; Writing—review & editing, P.J. and S.K.K.; Visualization, S.B. and A.R.; Supervision, P.J., M.D. and S.K.K.; Project administration, D.R.; Funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement
The data are available from the “Brain Tumor” dataset on Kaggle: https://www.kaggle.com/datasets/jakeshbohaju/brain-tumor (accessed on 16 February 2025).
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Pastore, E.P. Comment on Rastogi et al. Brain Tumor Detection and Prediction in MRI Images Utilizing a Fine-Tuned Transfer Learning Model Integrated Within Deep Learning Frameworks. Life 2025, 15, 327. Life 2026, 16, 535.
2. Rastogi, D.; Johri, P.; Donelli, M.; Kumar, L.; Bindewari, S.; Raghav, A.; Khatri, S.K. Brain Tumor Detection and Prediction in MRI Images Utilizing a Fine-Tuned Transfer Learning Model Integrated Within Deep Learning Frameworks. Life 2025, 15, 327.
3. Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Van Calster, B.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; van Smeden, M.; et al. TRIPOD+AI: Updated reporting guidance for clinical prediction models using regression or machine-learning methods. BMJ 2024, 385, e078378.
4. Moons, K.G.M.; Damen, J.A.A.; Kaul, T.; Hooft, L.; Navarro, C.A.; Dhiman, P.; Beam, A.L.; Van Calster, B.; Celi, L.A.; Denaxas, S.; et al. PROBAST+AI: Updated tool to assess risk of bias and applicability of prediction models using regression or AI. BMJ 2025, 388, e082505.
5. Kaufman, S.; Rosset, S.; Perlich, C.; Stitelman, O. Leakage in Data Mining: Formulation, Detection, and Avoidance. ACM Trans. Knowl. Discov. Data 2012, 6, 15.
6. Vickers, A.J.; Van Calster, B.; Steyerberg, E.W. A simple guide to decision-curve analysis. Diagn. Progn. Res. 2019, 3, 18.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.