The comment raises three principal themes: (i) safeguarding statistical independence through grouping at the level of specimen/collector/site prior to splitting, (ii) preventing selection bias via fully nested preprocessing and hyperparameter tuning, and (iii) assessing shortcut learning and reporting probability calibration for decision thresholding in ecological workflows [1]. The study addressed Discomycetes classification with a 2800-image corpus and a 60/20/20 train–validation–test protocol, and reported comparative CNN performance and XAI visualizations (Grad-CAM, Score-CAM) [2].
The concern about potential non-independence arising from near-duplicates, shared backgrounds, or repeated collection events is acknowledged [1]. The dataset was assembled to maximize heterogeneity across photographers, habitats, and illumination, and the fixed split (60%/20%/20%) was chosen to retain distributional diversity while isolating a held-out test set [2,3,4,5,6,7,8,9,10]. For future iterations, grouped partitioning will be prioritized to match the deployment setting, as sketched below:
- Specimen-grouped or event-grouped splitting (all images from a collection event withheld together);
- Site-blocked or photographer-blocked cross-validation to mitigate scene-level leakage;
- Geography-aware blocking (e.g., region or habitat strata) to better emulate out-of-domain transfer;
- Camera/EXIF-stratified folds to counter device-specific priors.
These designs align with hierarchical/spatial CV practices recommended for ecological data and directly address the dependency risks highlighted in the comment [1].
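As a concrete illustration of the first option, the following is a minimal sketch of event-grouped splitting built on scikit-learn's GroupShuffleSplit; the metadata table, its columns (filepath, species, event_id), and the toy values are hypothetical placeholders rather than the study's actual pipeline.

```python
# Minimal sketch of event-grouped splitting (illustrative only).
# In practice "meta" would be the image metadata table; the columns below
# ("filepath", "species", "event_id") are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

meta = pd.DataFrame({
    "filepath": [f"img_{i:03d}.jpg" for i in range(12)],
    "species":  ["Peziza"] * 6 + ["Scutellinia"] * 6,
    "event_id": ["e1", "e1", "e2", "e2", "e3", "e3",
                 "e4", "e4", "e5", "e5", "e6", "e6"],
})

# Hold out a test set whose collection events never appear in training.
outer = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
trainval_idx, test_idx = next(outer.split(meta, groups=meta["event_id"]))
trainval, test = meta.iloc[trainval_idx], meta.iloc[test_idx]

# Split the remainder into train/validation, again by event.
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, val_idx = next(inner.split(trainval, groups=trainval["event_id"]))
train, val = trainval.iloc[train_idx], trainval.iloc[val_idx]

# Sanity check: no collection event appears in more than one partition.
assert set(train["event_id"]).isdisjoint(val["event_id"])
assert set(train["event_id"]).isdisjoint(test["event_id"])
assert set(val["event_id"]).isdisjoint(test["event_id"])
```

Photographer-, site-, or camera-blocked partitions follow the same pattern by substituting the grouping column.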
All transforms (normalization, resizing, augmentations) and optimizer schedules were confined to the training partition; validation was reserved for early stopping and hyperparameter selection; and the test set remained untouched for final reporting [1]. While this pipeline follows common deep learning practice in image recognition, a fully nested scheme will be adopted in expanded studies to provide more conservative generalization estimates: outer folds for evaluation, inner loops for fitting transforms and tuning, and frozen transforms applied to held-out folds before scoring [2]. This migration to nested CV will reduce selection bias and align the evaluation with rigorous protocols for structured ecological data.
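The structure of such a nested, grouped scheme can be sketched as follows with scikit-learn on placeholder data (X, y, and groups are synthetic stand-ins); in the full image pipeline the same outer/inner layout applies, with augmentation and normalization statistics re-fit inside each inner training fold and then frozen before scoring.

```python
# Structural sketch of nested, grouped cross-validation (illustrative only).
# X: placeholder feature matrix, y: species labels, groups: collection-event IDs.
import numpy as np
from sklearn.model_selection import GroupKFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 128))
y = rng.integers(0, 7, size=300)
groups = rng.integers(0, 60, size=300)

# Data-dependent transforms live inside the pipeline, so they are re-fit on each
# inner training fold and only applied (frozen) to the corresponding held-out data.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
param_grid = {"clf__C": [0.1, 1.0, 10.0]}

outer = GroupKFold(n_splits=5)   # unbiased performance estimate
inner = GroupKFold(n_splits=3)   # hyperparameter selection

outer_scores = []
for fit_idx, eval_idx in outer.split(X, y, groups):
    search = GridSearchCV(pipe, param_grid, cv=inner, scoring="f1_macro")
    search.fit(X[fit_idx], y[fit_idx], groups=groups[fit_idx])
    outer_scores.append(search.score(X[eval_idx], y[eval_idx]))

print(f"Nested grouped CV macro-F1: {np.mean(outer_scores):.3f} +/- {np.std(outer_scores):.3f}")
```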
The concern that high accuracy may arise from background or acquisition shortcuts rather than morphology is pertinent [1]. Class-activation analyses (Grad-CAM, Score-CAM) were used to interrogate whether focus regions coincide with the cap, the hymenial surface, and other taxonomically meaningful traits [2,3,4,5,6,7,8]. To strengthen this line of evidence, subsequent work will include the following:
- Implement saliency sanity checks (model/label randomization tests) to verify explanation fidelity.
- Use perturbation-based localization and deletion/insertion metrics to quantify whether highlighted regions are causally important.
- Add counterfactual augmentations (background swaps, color-cast normalization) to stress-test reliance on contextual cues.
- Report probability calibration (temperature scaling/isotonic regression) with reliability diagrams and the Expected Calibration Error (ECE), enabling threshold setting for biodiversity monitoring and curation pipelines [1]; a minimal calibration sketch follows this list.
These additions address the request for decisive tests beyond qualitative overlays and for calibrated outputs suitable for operational decision-making [1].
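For the calibration item, a minimal sketch of post-hoc temperature scaling and a standard top-label ECE is given below; the logits and labels at the end are random placeholders standing in for a trained CNN's validation and test outputs, and reliability diagrams or isotonic regression would be added analogously.

```python
# Minimal sketch of post-hoc temperature scaling and ECE (illustrative only).
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, max_iter=200):
    """Fit a single temperature T > 0 on validation logits by minimising the NLL."""
    log_t = torch.zeros(1, requires_grad=True)            # optimise log T so T stays positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

def expected_calibration_error(probs, labels, n_bins=15):
    """Top-label ECE with equal-width confidence bins."""
    conf, pred = probs.max(dim=1)
    correct = pred.eq(labels).float()
    edges = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.float().mean() * (correct[in_bin].mean() - conf[in_bin].mean()).abs()
    return ece.item()

# Hypothetical logits/labels standing in for a trained CNN's outputs.
torch.manual_seed(0)
val_logits, val_labels = torch.randn(200, 7), torch.randint(0, 7, (200,))
test_logits, test_labels = torch.randn(200, 7), torch.randint(0, 7, (200,))

T = fit_temperature(val_logits, val_labels)
probs = F.softmax(test_logits / T, dim=1)
print(f"T = {T:.2f}, ECE after scaling = {expected_calibration_error(probs, test_labels):.3f}")
```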
Generalization to new cameras, habitats, and independent collections is essential [1]. An expanded, multi-institutional repository (including herbarium-verified vouchers and geographically distinct field sets) is being curated to support the following:
- External validation on fully independent sources;
- Site-withheld evaluation where entire locales are unseen during training;
- Temporal holdouts (by season/year) to probe phenology-related drift;
- Domain-shift diagnostics (source-specific performance stratified by habitat, device, and photographer), as sketched below.
These experiments will complement the current fixed split and provide deployment-oriented evidence of robustness [2,3,4,5,6,7,8].
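For the domain-shift diagnostics listed above, a simple stratified summary of held-out predictions is sketched below; the results table and its columns (y_true, y_pred, site, device) are hypothetical placeholders for the study's actual prediction and metadata files.

```python
# Illustrative sketch of per-source performance stratification (hypothetical data).
import pandas as pd
from sklearn.metrics import f1_score

# Hypothetical held-out predictions joined with acquisition metadata.
results = pd.DataFrame({
    "y_true": [0, 1, 2, 2, 1, 0, 2, 1],
    "y_pred": [0, 1, 2, 1, 1, 0, 2, 2],
    "site":   ["A", "A", "B", "B", "B", "C", "C", "C"],
    "device": ["d1", "d2", "d1", "d1", "d2", "d2", "d1", "d2"],
})

for key in ["site", "device"]:
    print(f"\nMacro-F1 stratified by {key}:")
    for value, grp in results.groupby(key):
        score = f1_score(grp["y_true"], grp["y_pred"], average="macro")
        print(f"  {key}={value}: n={len(grp)}, macro-F1={score:.3f}")
```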
To aid re-use and auditability, forthcoming revisions will (i) enumerate transform parameters and their fit scope (fitted on training data only vs. merely applied to other partitions) [2,3,4,5,6,7,8,9,10,11,12], (ii) list hyperparameter search spaces and selection criteria [2,3,4,5,6,7,8,9,10,11,12,13], (iii) publish per-class metrics (precision/recall/F1) and confusion matrices [2,3,4,5,6,7,8,9,10,11,12,13,14], and (iv) include calibration curves and threshold–utility analyses so that practitioners can select operating thresholds for monitoring workflows [2,3,4,5,6,7,8]. Where permissible, code and trained weights will be shared to facilitate independent replication.
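Items (iii) and (iv) can be produced directly from held-out predictions; a minimal sketch for the per-class metrics and confusion matrix, using hypothetical y_true and y_pred arrays, is shown below.

```python
# Minimal sketch of the planned per-class reporting (illustrative only).
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical placeholders; in practice these are the held-out test labels
# and the model's argmax predictions.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2, 2]

print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall/F1
print(confusion_matrix(y_true, y_pred))                 # rows: true class, columns: predicted
```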
The present study established a comparative baseline across ten CNNs with XAI analyses on Discomycetes images [2]. Recognized limitations include (a) absence of fully nested CV, (b) lack of grouped/blocked splits, (c) qualitative emphasis in XAI without formal causality metrics, and (d) missing calibration reporting. The roadmap above targets each limitation with concrete methodological upgrades consistent with the comment’s recommendations [1].
The comment’s recommendations on grouped splitting, nested validation, explanation reliability, calibration, and external/site-withheld evaluation are well-taken and align with the next phase of this research program [1]. Within the current scope, processing was confined to training data, validation guided tuning, and a held-out test set was used for final reporting [2]. Future releases will incorporate grouped/nested protocols, quantitative XAI sanity checks, probability calibration, and independent validations to provide deployment-grade evidence for biodiversity and curation use cases.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Pastore, E.P. Comment on Korkmaz et al. A Deep Learning and Explainable AI-Based Approach for the Classification of Discomycetes Species. Biology 2025, 14, 719. Biology 2026, 15, 106.
2. Korkmaz, A.F.; Ekinci, F.; Altaş, Ş.; Kumru, E.; Güzel, M.S.; Akata, I. A Deep Learning and Explainable AI-Based Approach for the Classification of Discomycetes Species. Biology 2025, 14, 719.
3. Özsarı, Ş.; Kumru, E.; Ekinci, F.; Güzel, M.S.; Açıcı, K.; Asuroglu, T.; Akata, I. Advanced deep learning approaches for the automated classification of macrofungal species in biodiversity monitoring. Trak. Univ. J. Nat. Sci. 2025. Online First.
4. Kumru, E.; Ugurlu, G.; Sevindik, M.; Ekinci, F.; Güzel, M.S.; Acici, K.; Akata, I. Hybrid Deep Learning Framework for High-Accuracy Classification of Morphologically Similar Puffball Species Using CNN and Transformer Architectures. Biology 2025, 14, 816.
5. Kumru, E.; Korkmaz, A.F.; Ekinci, F.; Aydoğan, A.; Güzel, M.S.; Akata, I. Deep Ensemble Learning and Explainable AI for Multi-Class Classification of Earthstar Fungal Species. Biology 2025, 14, 1313.
6. Ekinci, F.; Ugurlu, G.; Ozcan, G.S.; Acici, K.; Asuroglu, T.; Kumru, E.; Guzel, M.S.; Akata, I. Classification of Mycena and Marasmius Species Using Deep Learning Models: An Ecological and Taxonomic Approach. Sensors 2025, 25, 1642.
7. Kumru, E.; Ekinci, F.; Açici, K.; Altindal, Ö.B.; Güzel, M.S.; Akata, I. Advanced deep learning approaches for the accurate classification of Phallaceae fungi with explainable AI. Turk. J. Bot. 2025, 49, 388–405.
8. Ozsari, S.; Kumru, E.; Ekinci, F.; Akata, I.; Guzel, M.S.; Acici, K.; Ozcan, E.; Asuroglu, T. Deep learning-based classification of macrofungi: Comparative analysis of advanced models for accurate fungi identification. Sensors 2024, 24, 7189.
9. Kalkan, M.; Guzel, M.S.; Ekinci, F.; Akcapinar Sezer, E.; Asuroglu, T. Comparative analysis of deep learning methods on CT images for lung cancer specification. Cancers 2024, 16, 3321.
10. Atilkan, Y.; Kirik, B.; Acici, K.; Benzer, R.; Ekinci, F.; Guzel, M.S.; Benzer, S.; Asuroglu, T. Advancing Crayfish Disease Detection: A Comparative Study of Deep Learning and Canonical Machine Learning Techniques. Appl. Sci. 2024, 14, 6211.
11. Jain, E.; Nandy, T.; Aggarwal, G.; Tendulkar, A.; Iyer, R.; De, A. Efficient data subset selection to generalize training across models: Transductive and inductive networks. Adv. Neural Inf. Process. Syst. 2023, 36, 4716–4740.
12. Vignali, S.; Barras, A.G.; Arlettaz, R.; Braunisch, V. SDMtune: An R package to tune and evaluate species distribution models. Ecol. Evol. 2020, 10, 11488–11506.
13. Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 014008.
14. Zeng, G. Invariance Properties and Evaluation Metrics Derived from the Confusion Matrix in Multiclass Classification. Mathematics 2025, 13, 2609.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.