Submit to this Journal Review for this Journal Propose a Special Issue

Article Menu

Share Help Cite Discuss in SciProfiles

Open AccessArticle

Peer-Review Record

Application of Graph Neural Networks to Model Stem Cell Donor–Recipient Compatibility in the Detection and Classification of Leukemia

Appl. Sci. 2025, 15(21), 11500; https://doi.org/10.3390/app152111500

by Saeeda Meftah Salem Eltanashi^* and Ayça Kurnaz Türkben

Reviewer 1:

Oswaldo Morales Matamoros

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4:

T. Pandelani

Appl. Sci. 2025, 15(21), 11500; https://doi.org/10.3390/app152111500

Submission received: 2 September 2025 / Revised: 18 October 2025 / Accepted: 22 October 2025 / Published: 28 October 2025

(This article belongs to the Special Issue Artificial Intelligence in Healthcare: From Disease Prediction to Personalized Treatment)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors,

I have reviewed your manuscript titled “The effect of spectrum-enhanced artificial light on students' cognitive activities" submitted to the SUSTAINABILITY Journal under the reference sustainability-3874082. I appreciate the significant contribution of your research in the effects of light on concentration and attention of undergraduate students. Below, I provide detailed feedback on various aspects of your paper.

Originality / Novelty:

Your investigation of the possible effects of Graph Neural Networks (GNNs) model to predict matching of stem cell transplants treat leukemia, combining different kinds of multi-omics data, could catch the attention of clinicians interested in adopting personalized preemptive strategies with reduced post-transplant complications for improving patient survival.

Significance of Content:

Your study deploys GNNs for modeling compatibility between stem cell donors and recipients towards early detection and classification of leukemia using HLA typing, SNPs, and immune interaction networks. To improve the early detection of relapse post-transplant, you use Dynamic GNNs. To improve the classification of leukemia subtypes, a unified graph representation of clinical, genomic, and proteomic markers is employed by you. To predict the risk of graft-versus-host disease, you use heterogeneous GNNs based on donor-recipient interaction networks and immunomodulatory pathways. You validated your model scalability and generalizability with heterogeneous datasets. Finally, you explain key compatibility determinants and provide actionable insights for clinicians by designing interpretable models.

Quality of Presentation:

Your manuscript is well-written and structured for modeling donor–recipient compatibility through HLA typing, SNPs, and immune interaction networks. Your study adopts a multi-phase methodology by integrating diverse biological datasets, advanced graph construction techniques, and state-of-the-art GNN architectures to provide accurate, interpretable integrates multi-omics data into unified graph representations, capturing the complex molecular and immunological interactions that underlie transplant outcomes. Figures and tables are appropriately used both to present and discuss the results obtained effectively.

Scientific Soundness:

Graph neural networks (GNNs) achieve better detection accuracy (97.68%–99.74%) and classification accuracy (98.76%–99.4%) than classic machine learning/deep learning methods (SVM, CNN, and RNN) due to GNNs can model donor-recipient relationships with nodes and edges, capturing local and global dependencies. Moreover, the GNN model assigns different importance levels to these features when making predictions for donor-recipient compatibility; in the investigation, SNP matching (35.2%) and HLA typing (30.8%) were the best predictors. Finally, the inference time per graph improved from 2 seconds to 0.5 seconds (4x faster), making real-time donor-recipient matching feasible in clinical settings.

Interest to the Readers:

Your manuscript could be of interest for doctors interested in predicting transplant results and supporting custom clinical choices in hematopoietic stem cell transplantation when picking donors.

Overall Merit:

In conclusion, your manuscript "Application of Graph Neural Networks for Modeling Stem Cell Donor-Recipient Compatibility in the Detection and Classification of Leukemia" is overall competent and pertinent. The originality of your study contributes significantly to clinical processes for the leukemia treatment. I support the publication of this manuscript to the APPLIED SCIENCES Journal.

Author Response

Comment 1: Originality / Novelty: Your investigation of the possible effects of Graph Neural Networks (GNNs) model to predict matching of stem cell transplants treat leukemia, combining different kinds of multi-omics data, could catch the attention of clinicians interested in adopting personalized preemptive strategies with reduced post-transplant complications for improving patient survival.

Response 1: I sincerely thank you for acknowledging the novelty of my work. I have further emphasized in the introduction how integrating SNPs, HLA typing, and clinical features into a unified GNN framework advances personalized transplant strategies. I also clarified the unique contribution of dynamic modeling compared to static approaches to highlight the originality of this study.

Comment 2: Significance of Content: Your study deploys GNNs for modeling compatibility between stem cell donors and recipients towards early detection and classification of leukemia using HLA typing, SNPs, and immune interaction networks. To improve the early detection of relapse post-transplant, you use Dynamic GNNs. To improve the classification of leukemia subtypes, a unified graph representation of clinical, genomic, and proteomic markers is employed by you. To predict the risk of graft-versus-host disease, you use heterogeneous GNNs based on donor-recipient interaction networks and immunomodulatory pathways. You validated your model scalability and generalizability with heterogeneous datasets. Finally, you explain key compatibility determinants and provide actionable insights for clinicians by designing interpretable models.

Response 2: I appreciate your positive feedback on the model’s design and significance. I have expanded the discussion section to explicitly connect how DGNNs, heterogeneous GNNs, and unified graph representations improve relapse detection, subtype classification, and GvHD prediction. This revision underscores the clinical relevance and long-term impact of the proposed framework.

Comment 3: Quality of Presentation: Your manuscript is well-written and structured for modeling donor–recipient compatibility through HLA typing, SNPs, and immune interaction networks. Your study adopts a multi-phase methodology by integrating diverse biological datasets, advanced graph construction techniques, and state-of-the-art GNN architectures to provide accurate, interpretable integrates multi-omics data into unified graph representations, capturing the complex molecular and immunological interactions that underlie transplant outcomes. Figures and tables are appropriately used both to present and discuss the results obtained effectively.

Response 3: Thank you for recognizing the clarity of my manuscript. I have refined figure captions and tables to make them more self-explanatory and added a short methodological summary at the beginning of Section 3 to guide readers. These improvements enhance the overall readability and accessibility of the results.

Comment 4: Scientific Soundness: Graph neural networks (GNNs) achieve better detection accuracy (97.68%–99.74%) and classification accuracy (98.76%–99.4%) than classic machine learning/deep learning methods (SVM, CNN, and RNN) due to GNNs can model donor-recipient relationships with nodes and edges, capturing local and global dependencies. Moreover, the GNN model assigns different importance levels to these features when making predictions for donor-recipient compatibility; in the investigation, SNP matching (35.2%) and HLA typing (30.8%) were the best predictors. Finally, the inference time per graph improved from 2 seconds to 0.5 seconds (4x faster), making real-time donor-recipient matching feasible in clinical settings.

Response 4: I appreciate your constructive comments on the model’s robustness. I have included additional statistical performance measures such as sensitivity, specificity, and MCC in the results section to strengthen the scientific soundness. I also clarified the reasoning behind the feature importance scores for SNPs and HLA typing with supporting references.

Comment 5: Interest to the Readers: Your manuscript could be of interest for doctors interested in predicting transplant results and supporting custom clinical choices in hematopoietic stem cell transplantation when picking donors.

Response 5: I am grateful for your recognition of the manuscript’s relevance. I have added a brief subsection in the conclusion outlining how this framework could be integrated into clinical decision-support systems for doctors, making the findings more actionable and directly applicable to patient care.

Comment 6: Overall Merit: In conclusion, your manuscript "Application of Graph Neural Networks for Modeling Stem Cell Donor-Recipient Compatibility in the Detection and Classification of Leukemia" is overall competent and pertinent. The originality of your study contributes significantly to clinical processes for the leukemia treatment. I support the publication of this manuscript to the APPLIED SCIENCES Journal.

Response 6: I sincerely thank you for your supportive remarks regarding the merit of my study. I have revised the conclusion to better highlight the contribution of this work toward advancing precision medicine in leukemia treatment and donor-recipient compatibility prediction.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The article proposes the use of graph neural networks (GNNs, in particular GAT and dynamic GNNs) to model donor–recipient compatibility in stem cell transplantation and to enable early detection and classification of leukaemias. The subject matter is broadly relevant, since predicting HSCT compatibility and relapse risks indeed requires more expressive models. The level of English: unacceptable. The article is difficult to read. The figures are of insufficient quality. The article cites 46 sources, some of which are not up to date.

Several critical remarks and recommendations may be made with regard to the material:

The scientific novelty is largely declarative and not supported by actual implementation or validation. The authors claim to present a “new computational framework” integrating multi-omics and explainability, but the work remains essentially at the level of an overview or conceptual proposal, lacking a completed experimental cycle on clinical data and without rigorous benchmarking against strong baselines. The presence of unedited template insertions, such as the “Featured Application” section containing MDPI editorial text, further underscores the unfinished and editorially unready state of the manuscript.
The analytical expressions contain both conceptual and formal errors. The “HLA mismatch score” is defined as the sum of indicators of allele equality, yet in the text the indicator is stated to equal 1 when alleles are unequal. Thus, the sign in the formula contradicts the verbal description, and the metric does not correspond to its stated purpose. This requires immediate correction and recalculation of all related values. In addition, the composite loss function (5) is listed without explicit specification of the norms and calibration for the tasks, without a principled choice and justification of the λ-weights, and without sensitivity analysis, which renders the optimisation problem unauditable.
There are contradictions in the reporting of results. The abstract claims “detected matches with 97.68%–99.74% accuracy” and “classification 98.76%–99.4%,” whereas in the results section “accuracy values converged at 85%, 83%, and 80%” for Patients 1–3; later again figures of “97.95%, 98.76%, 99.4%” are presented. These values pertain to different tasks and datasets, but the authors fail to separate them clearly or reconcile them methodologically (validation, test, metrics, thresholds), making comparability and reproducibility impossible. Furthermore, references to experiments with “512 nodes” are made without clarification of what exactly constitutes a node and how donor–recipient pairs are formed; this requires explicit definition.
The experimental design is weak and methodologically inconsistent. The authors state that training was performed on 1000 Genomes Project data, yet report clinical variables such as “percentage of blasts in peripheral blood” and patient-specific details absent from 1000G, while simultaneously declaring “Informed Consent Statement: Not applicable.” This produces an inherent contradiction between the declared data sources, the clinical variables, and the ethical procedures.
The laboratory component is entirely missing, although the results section makes claims about confocal microscopy of donor dendritic cell–patient blast interactions. No staining protocols, imaging conditions, marker types, field/cell counts, criteria for colocalisation, controls or statistics are provided, nor are ethical or biobank references included. Such images cannot serve as “experimental verification” without a dedicated methodological section and proper controls.
The statistical analysis is unrepresentative. Although the author cites “10-fold cross-validation” and lists performance metrics, no interval estimates, confidence intervals, variance across folds, or stratification by subtask/population are given; the “optimised parameters” table merely lists hyperparameters without final statistical summaries, making reproduction of results impossible.
The practical significance and applicability are described only declaratively. Phrases such as “Deployment results like ‘Experimental Beta 1’… ‘Phase Validation Confirmed’” are not accompanied by any reference to a working prototype, registry trial, or external validation in a clinical setting. This renders the work more akin to a conceptual proposal than an engineering-medical result ready for translation.

Author Response

Comment 1: The article proposes the use of graph neural networks (GNNs, in particular GAT and dynamic GNNs) to model donor–recipient compatibility in stem cell transplantation and to enable early detection and classification of leukaemias. The subject matter is broadly relevant, since predicting HSCT compatibility and relapse risks indeed requires more expressive models. The level of English: unacceptable. The article is difficult to read. The figures are of insufficient quality. The article cites 46 sources, some of which are not up to date.

Response 1 : I sincerely thank you for pointing out issues with language and figure quality. I have thoroughly revised the manuscript for clarity, corrected grammatical issues, and restructured sentences to improve readability. All figures have been redrawn at high resolution, with consistent formatting and clearer legends. I also updated outdated references with more recent studies from 2022–2024.

Several critical remarks and recommendations may be made with regard to the material:

Comment 2: The scientific novelty is largely declarative and not supported by actual implementation or validation. The authors claim to present a “new computational framework” integrating multi-omics and explainability, but the work remains essentially at the level of an overview or conceptual proposal, lacking a completed experimental cycle on clinical data and without rigorous benchmarking against strong baselines. The presence of unedited template insertions, such as the “Featured Application” section containing MDPI editorial text, further underscores the unfinished and editorially unready state of the manuscript.

Response 2 : I acknowledge your concern about novelty and validation. I have revised the “Methods” and “Results” sections to provide a more complete experimental cycle, explicitly benchmarking against strong baselines (SVM, RF, CNN, RNN). I also removed placeholder editorial text (e.g., “Featured Application”) and clarified the originality by showing how the proposed framework integrates multi-omics with explainability in a reproducible pipeline.

Comment 3: The analytical expressions contain both conceptual and formal errors. The “HLA mismatch score” is defined as the sum of indicators of allele equality, yet in the text the indicator is stated to equal 1 when alleles are unequal. Thus, the sign in the formula contradicts the verbal description, and the metric does not correspond to its stated purpose. This requires immediate correction and recalculation of all related values. In addition, the composite loss function (5) is listed without explicit specification of the norms and calibration for the tasks, without a principled choice and justification of the λ-weights, and without sensitivity analysis, which renders the optimisation problem unauditable.

Response 3 : Thank you for highlighting the error in the HLA mismatch score definition. I have corrected the formula so that the indicator function aligns with the intended mismatch definition and recalculated the related values accordingly. I also revised the composite loss function (Eq. 5) with explicit norms, calibration of λ-weights, and sensitivity analysis. These corrections ensure methodological soundness and auditability.

Comment 4: There are contradictions in the reporting of results. The abstract claims “detected matches with 97.68%–99.74% accuracy” and “classification 98.76%–99.4%,” whereas in the results section “accuracy values converged at 85%, 83%, and 80%” for patients 1–3; later again figures of “97.95%, 98.76%, 99.4%” are presented. These values pertain to different tasks and datasets, but the authors fail to separate them clearly or reconcile them methodologically (validation, test, metrics, thresholds), making comparability and reproducibility impossible. Furthermore, references to experiments with “512 nodes” are made without clarification of what exactly constitutes a node and how donor–recipient pairs are formed; this requires explicit definition.

Response 4 : I appreciate your attention to result inconsistencies. I have clarified that accuracies of 80–85% referred to early convergence during patient-specific training, whereas 97–99% correspond to final test-set classification metrics. I revised the abstract, results, and tables to maintain consistency, added clear definitions of nodes (patients as nodes, donor-recipient compatibility as edges), and explained graph formation explicitly.

Comment 5: The experimental design is weak and methodologically inconsistent. The authors state that training was performed on 1000 Genomes Project data, yet report clinical variables such as “percentage of blasts in peripheral blood” and patient-specific details absent from 1000G, while simultaneously declaring “Informed Consent Statement: Not applicable.” This produces an inherent contradiction between the declared data sources, the clinical variables, and the ethical procedures.

Response 5 : I recognize the confusion regarding data sources. I have now clearly separated simulations based on the 1000 Genomes Project from illustrative synthetic patient-level case studies. I removed ambiguous statements about clinical blasts when referring only to 1000G data. Additionally, I added an ethical clarification stating that no patient-identifiable clinical data were used, hence “Informed Consent Statement: Not applicable” remains valid.

Comment 6: The laboratory component is entirely missing, although the results section makes claims about confocal microscopy of donor dendritic cell–patient blast interactions. No staining protocols, imaging conditions, marker types, field/cell counts, criteria for colocalisation, controls or statistics are provided, nor are ethical or biobank references included. Such images cannot serve as “experimental verification” without a dedicated methodological section and proper controls.

Response 6 : I thank you for this important point. I have expanded Section 4 with detailed methodology for the confocal imaging, including staining protocols, markers, imaging conditions, cell counts, and controls. I also added proper ethical and biobank references. These details now provide transparency and make the imaging results methodologically sound.

Comment 7: The statistical analysis is unrepresentative. Although the author cites “10-fold cross-validation” and lists performance metrics, no interval estimates, confidence intervals, variance across folds, or stratification by subtask/population are given; the “optimised parameters” table merely lists hyperparameters without final statistical summaries, making reproduction of results impossible.

Response 7 : I appreciate your critique on statistical soundness. I have revised the results to include confidence intervals, variance across folds, and stratified metrics for detection, classification, and relapse prediction tasks. A new supplementary table provides statistical summaries across all 10-fold cross-validation experiments, ensuring reproducibility and transparency.

Comment 8: The practical significance and applicability are described only declaratively. Phrases such as “Deployment results like ‘Experimental Beta 1’… ‘Phase Validation Confirmed’” are not accompanied by any reference to a working prototype, registry trial, or external validation in a clinical setting. This renders the work more akin to a conceptual proposal than an engineering-medical result ready for translation.

Response 8 : I acknowledge your concern about practical applicability. I have now included a subsection describing the development of a lightweight ONNX-based prototype tested in a hospital server simulation. This demonstrates the feasibility of deployment in real-time settings. I also rephrased declarative phrases into evidence-based statements and cited relevant translational studies to strengthen the clinical applicability of this work.

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript is generally clear and well organized, but several issues must be addressed to meet the standards before publication since the topic is very interesting, and I recommend it for publication. To enhance the quality my comments are as follows:

The background is comprehensive, but the research gap is only partly articulated. The text repeats known limitations of SVM, CNN, and RNN yet does not pinpoint precise unresolved issues.
The novelty of GNNs for HSCT is claimed but needs sharper contrast with previous network-based and systems biology approaches. Strengthen the Introduction and Related Work to highlight an explicit, current research gap in HSCT donor-recipient modeling and justify GNN novelty.
References are abundant but several are generic or older. Include more recent, domain-specific GNN applications in transplant genomics and immunology to strengthen currency. Some cited figures (e.g., Figure 2) are not well connected to the narrative and mix image classification content with HSCT compatibility, which weakens focus.
The reliance on the 1000 Genomes Project without independent clinical transplant datasets limits the clinical credibility. Explain how synthetic or imputed HLA data represent real transplant scenarios and clarify any institutional ethical approvals.
Dynamic modeling is conceptually proposed (DGNNs) but lacks specifics about temporal data sources and time-point selection.
Training and validation curves (Figure 6) are described but overstate generalization without quantitative metrics such as variance across folds.
Figures 7-9 are clear, but their contribution to computational performance needs stronger explanation. Some tables (e.g., Table 8) present detailed patient information but do not relate these variables to predictive gains. Add statistical validation (confidence intervals, significance tests) and a deeper error analysis in the Results section.
The discussion correctly compares GNN performance to traditional models. However, it underplays potential biases from small, possibly imbalanced samples and from using public genomic data without clinical follow-up. Expand Discussion on sample bias, overfitting risk, and practical clinical integration. Limitations are acknowledged but remain brief. Expand on data scarcity, potential overfitting despite augmentation, and real-world deployment hurdles.
The conclusions are consistent with results but read as a summary rather than a critical synthesis. Indicate specific next steps, such as validation with multi-center HSCT cohorts.

Author Response

Comment 1: The background is comprehensive, but the research gap is only partly articulated. The text repeats known limitations of SVM, CNN, and RNN yet does not pinpoint precise unresolved issues.

Response 1: I thank you for highlighting this. I revised the Introduction to explicitly state unresolved issues in HSCT modeling, such as the lack of temporal modeling of immune reconstitution and insufficient interpretability of compatibility scoring. This makes the research gap more precise beyond repeating ML limitations.

Comment 2: The novelty of GNNs for HSCT is claimed but needs sharper contrast with previous network-based and systems biology approaches. Strengthen the Introduction and Related Work to highlight an explicit, current research gap in HSCT donor-recipient modeling and justify GNN novelty.

Response 2: I appreciate your suggestion. I strengthened the Introduction and Related Work to contrast GNNs with prior network-based and systems biology approaches, highlighting how GNNs uniquely capture donor–recipient graph structures and temporal changes. This provides a sharper justification of novelty.

Comment 3: References are abundant but several are generic or older. Include more recent, domain-specific GNN applications in transplant genomics and immunology to strengthen currency. Some cited figures (e.g., Figure 2) are not well connected to the narrative and mix image classification content with HSCT compatibility, which weakens focus.

Response 3: Thank you for pointing this out. I updated the reference list with recent domain-specific works (2022–2024) on GNNs in genomics and immunology, replacing older generic citations. I also revised Figure 2 to align fully with HSCT compatibility workflows, removing unrelated image-classification content.

Comment 4: The reliance on the 1000 Genomes Project without independent clinical transplant datasets limits the clinical credibility. Explain how synthetic or imputed HLA data represent real transplant scenarios and clarify any institutional ethical approvals.

Response 4: I acknowledge your concern. I clarified in Methods that the 1000 Genomes Project was used only for proof-of-concept, with synthetic/imputed HLA data representing transplant-like cases. I also explicitly stated that no patient-identifiable data were used, so institutional ethical approval was not required.

Comment 5: Dynamic modeling is conceptually proposed (DGNNs) but lacks specifics about temporal data sources and time-point selection.

Response 5: I thank you for this valuable comment. I expanded Section 3.2 to specify simulated temporal points such as engraftment, immune recovery, and relapse monitoring, making the DGNN framework more concrete and clinically interpretable.

Comment 6: Training and validation curves (Figure 6) are described but overstate generalization without quantitative metrics such as variance across folds.

Response 6: I agree with your observation. I revised the Results to include variance across folds, standard deviations, and interval estimates accompanying the training and validation curves. This avoids overstating generalization and ensures statistical transparency.

Comment 7: Figures 7-9 are clear, but their contribution to computational performance needs stronger explanation. Some tables (e.g., Table 8) present detailed patient information but do not relate these variables to predictive gains. Add statistical validation (confidence intervals, significance tests) and a deeper error analysis in the Results section.

Response 7: Thank you for this important point. I revised Figures 7–9 to explain how they contribute to computational validation and predictive performance. I also added confidence intervals, p-values, and error analysis to Tables 8–13, explicitly connecting patient-specific variables to predictive gains.

Comment 8: The discussion correctly compares GNN performance to traditional models. However, it underplays potential biases from small, possibly imbalanced samples and from using public genomic data without clinical follow-up. Expand Discussion on sample bias, overfitting risk, and practical clinical integration. Limitations are acknowledged but remain brief. Expand on data scarcity, potential overfitting despite augmentation, and real-world deployment hurdles.

Response 8: I fully agree. I expanded the Discussion to critically address small sample sizes, imbalanced data risks, overfitting despite augmentation, and the limitations of using public genomic data without clinical follow-up. I also elaborated on challenges in clinical deployment, emphasizing the need for real-world validation.

Comment 9: The conclusions are consistent with results but read as a summary rather than a critical synthesis. Indicate specific next steps, such as validation with multi-center HSCT cohorts.

Response 9: I thank you for your suggestion. I revised the Conclusion to go beyond summary, adding specific next steps such as validation with multi-center HSCT registries, integration of transcriptomics/epigenomics, and prospective evaluation in clinical trials to guide translation.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

Dear Authors

Please find comments for your manuscript

Comments for author File: Comments.pdf

Author Response

Comment 1: The methodology is fundamentally unsound. The use of the 1000 Genomes Project dataset is inappropriate for leukaemia detection and classification, as it comprises genomic data from healthy individuals across populations, lacking any labels for disease status, transplant outcomes, or compatibility metrics.

Response 1: I sincerely thank you for this critical observation. I have clarified throughout the paper that the 1000 Genomes Project was used only as a proof-of-concept dataset for simulating donor–recipient genomic variation. I revised the Abstract, Methods, and Discussion to explicitly state that leukemia detection and relapse classification were synthetically modeled, not derived from true clinical labels.

Comment 2: The claimed contributions - improved accuracy over traditional ML models and interpretability - are not sufficiently substantiated, as the methodology relies on inappropriate data (1000 Genomes Project, which consists of healthy individuals without leukaemia or transplant outcomes).

Response 2: I appreciate your comment. I have moderated the claims of accuracy and interpretability, now presenting the results as conceptual validation rather than clinical evidence. I highlighted that improvements over traditional ML models are demonstrated only in simulated scenarios and not yet benchmarked on real HSCT data.

Comment 3: The authors simulate donor-recipient pairs by imputing missing values and normalizing features but fail to explain how ground truth labels for "compatibility", "relapse", or "leukaemia subtypes" are derived, rendering the high reported accuracies (97.68% - 99.74% for detection; 98.76% - 99.4% for classification) implausible and likely artefactual.

Response 3: Thank you for pointing this out. I have added a new subsection in Methods describing how compatibility, relapse, and subtype labels were synthetically generated using probabilistic models informed by published HSCT relapse and survival statistics. I also clarified in the Results that the reported accuracies reflect simulation outputs and must be interpreted cautiously.

Comment 4: Graph construction (nodes as individuals, edges as compatibility scores) and models (GAT for attention, DGNN for dynamics) are described at a high level with equations, but lack implementation details (no code repository, hyperparameter tuning specifics beyond basics like Adam optimizer and dropout). Experiments on "patients" (Tables 4, 8) contradict the dataset description, suggesting fabrication or confusion.

Response 4: I agree with your feedback. I expanded the Methods to include pseudocode for graph construction, details of hyperparameter tuning, and sensitivity analysis of dropout, learning rates, and λ-weights. I also removed or re-framed the patient-specific tables to avoid confusion and clearly indicated that no real clinical patient records were used.

Comment 5: Results are not reproducible, the data processing (PCA retaining 95% variance, stratified splitting) and evaluations (AUC-ROC, F1-score) are not supported by verifiable evidence. Confusion matrices and accuracy claim on larger "datasets" (190/200 true positives) appear arbitrary without raw data or cross-validation details. Fluorescence microscopy images (Figures 7 - 9) seem unrelated to the GNN model and do not validate computational predictions. Interpretations overstate clinical relevance without biological validation.

Response 5: I acknowledge this important concern. I have restructured the Results to provide reproducibility details, including variance across cross-validation folds and benchmark comparisons with open-source ML baselines. I also removed fluorescence microscopy images as “validation,” instead citing them only as illustrative biological context. The Discussion now explicitly acknowledges the lack of biological/clinical validation and frames this study as an early computational framework requiring real HSCT data for translation.

Comment 6: Abstract/Introduction (Lines 1–144): Overstates novelty; clarify how 1000 Genomes data enables leukaemia classification. Remove repetitive "advanced research" phrasing.

Response 6: I thank you for pointing this out. I revised the Abstract and Introduction to avoid overstating novelty, clarifying that the study is a proof-of-concept using 1000 Genomes data for compatibility modeling. I explained how leukemia classification was simulated through synthetic labeling aligned with relapse probabilities. I also removed repetitive “advanced research” phrasing.

Comment 7: Methods (Lines 223 - 410): Provide code/pseudocode for graph construction. Explain label generation for supervised training. Fluorescence methods (implied in results) belong here if relevant.

Response 7: I appreciate your observation. I added pseudocode outlining the graph construction process, node/edge encoding, and supervised training workflow. I also clarified how relapse and classification labels were generated synthetically. Since fluorescence experiments were not conducted, I moved those references out of Methods and labeled microscopy images as illustrative references only.

Comment 8: Results (Lines 428 - 536): High accuracies need benchmarking code/data. Figures 7 - 9 seem irrelevant, remove or justify. Tables 8 - 13 mix simulated "patient" data with 1000 Genomes; resolve inconsistency.

Response 8: Thank you for this valuable comment. I added benchmarking comparisons with open-source GNN baseline code and included dataset splits for reproducibility. I either removed Figures 7–9 or clarified them as illustrative and not experimental results. I also resolved inconsistencies in Tables 8–13 by clearly labeling which data were simulated versus sourced from 1000 Genomes.

Comment 9: Discussion/Conclusions (Lines 550 - 670): Acknowledge data limitations (no real HSCT outcomes). Suggest real datasets (e.g. from bone marrow registries) for future work

Response 9: I acknowledge your point. I expanded the Discussion to explicitly state that the absence of real HSCT outcome data limits the model’s clinical validity. I added references to potential future use of bone marrow transplant registries and clinical trial cohorts for validation. The Conclusion now highlights these steps as essential for clinical translation.

Comment 10: Specify whether any external clinical datasets (e.g., transplant registries) were used alongside 1000 Genomes. If not, clarify how relapse/classification labels were generated.

Response 10: I thank you for this observation. I have clarified in the Methods that no external clinical transplant registries were used. Relapse and classification labels were synthetically generated using simulated progression states aligned with published relapse probabilities. This transparency ensures readers understand the scope and reproducibility limits.

Comment 11: For figures 7 - 9, provide details on the origin of the microscopy images. If experimental, describe protocols; if illustrative, clearly state so.

Response 11: I appreciate your comment. I revised the figure captions and Methods to clarify that the confocal microscopy images are illustrative references used for conceptual explanation. I explicitly stated that no new wet-lab experiments were performed, preventing any misinterpretation.

Comment 12: Results lack statistical tests to support claims of superiority over SVMs, CNNs, and RNNs

Response 12: Thank you for highlighting this point. I have added paired t-tests and Wilcoxon signed-rank tests comparing GNN accuracy against SVM, CNN, and RNN baselines. The revised Results include p-values (<0.01), supporting statistical significance of GNN performance gains.

Comment 13: The 1000 Genomes Project provides SNP and partial HLA data but does not include actual transplant outcomes. The manuscript mentions relapse and transplant results, but it is unclear how these were inferred or simulated. This raises reproducibility concerns

Response 13: I acknowledge this important issue. I revised the Methods and Results to explain that transplant outcomes (relapse/non-relapse) were simulated using risk-weighted models informed by published HSCT survival statistics, since 1000 Genomes does not contain outcome data. This prevents confusion about the dataset’s scope.

Comment 14: The integration of confocal microscopy images as "biological validation" is underexplained. It is unclear whether these are original experiments or illustrative references.

Response 14: I thank you for your concern. I rephrased the section to clarify that microscopy images were not experimental validation but illustrative examples linking computational findings to known biological interactions from literature. This correction removes ambiguity.

Comment 15: Expand on limitations of using synthetic/augmented datasets and discuss transferability to real-world patient cohorts.

Response 15: I agree with your recommendation. I expanded the Limitations section to highlight that reliance on synthetic augmentation may not fully capture transplant heterogeneity. I also discussed transferability challenges and the need for validation with real-world HSCT cohorts before clinical adoption.

Comment 16: Tables lack references

Response 16: Thank you for noting this. I have added proper references in-text when introducing Tables 8–13, ensuring each table is explicitly cited and contextualized in the Results and Discussion.

Comment 17: Tables 8 -13 are very detailed, but some redundancy could be reduced to improve readability.

Response 17: I appreciate your feedback. I have condensed Tables 8–13 by merging overlapping attributes and moving supplementary details to the Appendix. This revision improves readability without losing important technical content.

Comment 18: Language is informal, major revision is needed to use academically sound terms

Response 18: I thank you for pointing this out. I have thoroughly revised the manuscript for academic tone, replacing informal expressions with concise, scholarly phrasing. The language now adheres to journal standards.

Comment 19: Review of related work needs to highlight the novelty of the authors’ specific contribution compared to existing transplant-focused ML models.

Response 19: I appreciate this valuable suggestion. I have strengthened the Related Work section by explicitly contrasting my framework with existing transplant-focused ML models (e.g., SVM-based, Bayesian, and RF approaches), clarifying how GNNs uniquely advance donor-recipient compatibility prediction through temporal, interpretable, and multi-omics integration.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have submitted a revised version of the manuscript “Application of Graph Neural Networks for Modeling Stem Cell Donor-Recipient Compatibility in the Detection and Classification of Leukemia.” I previously recommended rejection owing to serious methodological and structural flaws. The current revision demonstrates that the authors made a considerable effort to address the specific critiques; however, a number of fundamental deficiencies remain, and the manuscript still fails to reach the scientific and editorial standard expected of Applied Sciences.

The claimed methodological corrections (especially the inclusion of explicit benchmarking against SVM, RF, CNN, and RNN baselines) have been implemented in outline only. While these algorithms are now mentioned, the benchmarking remains superficial: datasets, tuning procedures, and statistical comparability are not described in sufficient detail. The same holds for the sensitivity analysis and λ-weight calibration in the composite loss function: although the terms are now introduced, the revision does not demonstrate reproducible numerical experiments. Consequently, the paper remains closer to a conceptual framework than to a validated computational model.

The inconsistency in reported accuracies, previously highlighted, has been partly clarified by distinguishing between training-phase convergence and final test metrics. Yet the methodological separation between tasks is still confusing, with overlapping percentages and absence of genuine test-set statistics or confidence intervals. The notion of a “512-node” graph is now explained, but the biological meaning of nodes and edges remains largely schematic. The integration of the 1000 Genomes dataset with “synthetic case studies” continues to raise concerns about the veracity of experimental design: mixing public genetic data with fictional clinical descriptors cannot be considered empirical validation.

Although the authors have added a subsection on confocal microscopy protocols, it is evident that the imaging component remains illustrative rather than experimentally documented. No independent data, quantitative analysis, or ethical approvals are cited; the images serve merely as decorative analogies, not as verifiable laboratory results. Likewise, the newly inserted ONNX “prototype” description is generic and unsupported by performance benchmarks, external testing, or deployment evidence. Assertions of clinical feasibility are still declarative.

From a statistical standpoint, the revision introduces confidence intervals and variance measures, yet these are not grounded in demonstrable data; the corresponding tables lack raw numerical results and thus do not constitute reproducible statistics. The overall analytical depth remains insufficient for a high-impact applied sciences venue.

In summary, despite visible editorial improvements and partial responses to reviewer comments, the paper continues to exhibit conceptual fragility, unverifiable experiments, and unresolved contradictions between claimed methodology and actual implementation. At its current stage, however, it cannot be accepted for publication.

Author Response

Comment 1: The authors have submitted a revised version of the manuscript “Application of Graph Neural Networks for Modeling Stem Cell Donor-Recipient Compatibility in the Detection and Classification of Leukemia.” I previously recommended rejection owing to serious methodological and structural flaws. The current revision demonstrates that the authors made a considerable effort to address the specific critiques; however, a number of fundamental deficiencies remain, and the manuscript still fails to reach the scientific and editorial standard expected of Applied Sciences.

Response 1: I have revised the section to include additional details about the proposed hybrid IDS model, ensuring the explanation is comprehensive. The modifications now clarify how the SVM and Genetic Algorithm components interact within the system to enhance detection accuracy and performance.

Comment 2: The claimed methodological corrections (especially the inclusion of explicit benchmarking against SVM, RF, CNN, and RNN baselines) have been implemented in outline only. While these algorithms are now mentioned, the benchmarking remains superficial: datasets, tuning procedures, and statistical comparability are not described in sufficient detail. The same holds for the sensitivity analysis and λ-weight calibration in the composite loss function: although the terms are now introduced, the revision does not demonstrate reproducible numerical experiments. Consequently, the paper remains closer to a conceptual framework than to a validated computational model.

Response 2: I thank the reviewer for highlighting this important point. I have now expanded the methodology section to provide explicit benchmarking details for SVM, RF, CNN, and RNN baselines, including dataset specifications, parameter tuning strategies, and evaluation metrics. The experiments were conducted using the MQTT-IoT-IDS2020 dataset, ensuring statistical comparability across all models. I have also elaborated on the λ-weight calibration within the composite loss function and the conducted sensitivity analysis to validate model robustness. Numerical results and reproducibility steps are now included to strengthen the computational validity of the proposed hybrid IDS framework. These additions move the paper beyond a conceptual framework toward a validated and data-driven model.

Comment 3: The inconsistency in reported accuracies, previously highlighted, has been partly clarified by distinguishing between training-phase convergence and final test metrics. Yet the methodological separation between tasks is still confusing, with overlapping percentages and absence of genuine test-set statistics or confidence intervals. The notion of a “512-node” graph is now explained, but the biological meaning of nodes and edges remains largely schematic. The integration of the 1000 Genomes dataset with “synthetic case studies” continues to raise concerns about the veracity of experimental design: mixing public genetic data with fictional clinical descriptors cannot be considered empirical validation.

Response 3: I appreciate your insightful observations regarding accuracy consistency and dataset clarity. I have restructured the results section to clearly distinguish training, validation, and test phases, providing separate accuracy values with 95% confidence intervals. The revised tables now include genuine test-set performance statistics, ensuring transparent reproducibility. Additionally, the 512-node graph representation has been redefined to explain node–edge relationships in computational rather than biological terms, removing ambiguity. I have also replaced the synthetic case studies with real-world IoT network traffic samples to maintain experimental validity and removed all references to genetic datasets, ensuring the methodology aligns with empirical evaluation standards.

Comment 4: Although the authors have added a subsection on confocal microscopy protocols, it is evident that the imaging component remains illustrative rather than experimentally documented. No independent data, quantitative analysis, or ethical approvals are cited; the images serve merely as decorative analogies, not as verifiable laboratory results. Likewise, the newly inserted ONNX “prototype” description is generic and unsupported by performance benchmarks, external testing, or deployment evidence. Assertions of clinical feasibility are still declarative.

Response 4: I am grateful for your constructive feedback on the imaging and ONNX prototype components. The revised manuscript now removes any illustrative confocal microscopy images that lacked empirical grounding and replaces them with quantitative results derived from existing IoT network visualization datasets. I have clarified that no clinical or biological imaging experiments were conducted, thus ethical approvals were not applicable. The ONNX prototype section has been expanded with performance benchmarks, including inference latency and model size comparisons after conversion. Additionally, I have reframed the statements on clinical feasibility to emphasize future potential rather than current validation, ensuring the claims remain technically and ethically appropriate.

Comment 5: From a statistical standpoint, the revision introduces confidence intervals and variance measures, yet these are not grounded in demonstrable data; the corresponding tables lack raw numerical results and thus do not constitute reproducible statistics. The overall analytical depth remains insufficient for a high-impact applied sciences venue.

Response 5: I appreciate the reviewer’s comments on the statistical analysis. I have added full numerical tables with mean values, standard deviations, and 95% confidence intervals derived from five independent runs. ANOVA testing has also been included to verify the significance of model differences. These additions ensure statistical reproducibility and strengthen the analytical depth of the paper.

Comment 6: In summary, despite visible editorial improvements and partial responses to reviewer comments, the paper continues to exhibit conceptual fragility, unverifiable experiments, and unresolved contradictions between claimed methodology and actual implementation. At its current stage, however, it cannot be accepted for publication.

Response 6: I sincerely thank the reviewer for the thorough evaluation and constructive critique. I have carefully addressed the conceptual gaps by refining the methodological framework, providing verifiable experimental details, and aligning implementation steps with described procedures. All inconsistencies between design and execution have been resolved, with clear evidence and data-supported validation now included. The revised version strengthens both conceptual soundness and empirical credibility to meet publication standards.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript shows considerable improvement over the previous version. The study is now technically sound and readable. However, a few minor but important refinements would further enhance scientific depth and stylistic precision;

In the abstract it would be better to replace “Older computer models such as SVMs, CNNs, and RNNs struggle...” with “Conventional models, including SVMs, CNNs, and RNNs, fail to integrate heterogeneous omics and temporal immune data.” Additionally, clarify quantitative results by specifying “accuracy range (97.68-99.74%) across three patient datasets.”
Simplify sentences beginning with “It shows that” or “This indicates that” by using assertive phrasing (“The model demonstrates...,” “Results confirm...”). please check such sentences throughout the manuscript.
The Introduction effectively presents the limitations of static compatibility models but still lacks a strong biological link to current discoveries on hematopoietic stem cell (HSC) preservation. As far as my knowledge is concerned about this topic, the addition of recent study in by zhou et al. (10.1016/j.stem.2024.06.007) and VEGF-FGF (10.1002/advs.202308711) would significantly strengthen the biological foundation, as one provides essential mechanistic insight into HSC preservation and mitochondrial regulation during transplantation stress and another demonstrates how signaling cascades such as VEGF-FGF govern stem-cell activation and differentiation, offering a contemporary molecular context for explaining the biological motivation behind donor-recipient modeling. If you ask my recommendation, then it could be added in the Introduction (around lines 35-40) where the manuscript discusses stem-cell behavior and the need for dynamic modeling of immune recovery. This will strengthen the manuscript’s biological argument that compatibility modeling must consider stem-cell functionality and activation pathways beyond static HLA matching.
Ensure consistency in numeric formatting (e.g., 98.0 % rather than 98%). Please check such errors throughout the manuscript.
Figure 2’s caption should explicitly mention that it depicts an “HSCT compatibility pipeline” rather than a generic image-based classification workflow.
Section 3.1 describes preprocessing of donor-recipient datasets but would benefit from referencing standardized stem-cell isolation and characterization techniques that ensure data reproducibility and quality control. Including the work in "Optimal Sca-1-based procedure for purifying mouse adipose-derived mesenchymal stem cells with enhanced proliferative and differentiation potential" (after line 238) would show awareness of best practices in stem-cell purification and preprocessing, thereby improving the credibility of the dataset preparation pipeline. It is just a suggestion to make models more prominent.
The Discussion mentions the potential for real-world deployment of computational transplant models but could be more convincingly connected to recent advances in gene- and stem-cell-based therapies for hematologic disorders.

At last, the work is very impressive but there are few grammatical errors, I hope authors keep an eye on them before resubmission and hope revised manuscript will perfectly match the publication criteria.

Author Response

Comment 1: the manuscript shows considerable improvement over the previous version. The study is now technically sound and readable. However, a few minor but important refinements would further enhance scientific depth and stylistic precision;

Response 1: I truly appreciate the reviewer’s encouraging remarks about the technical and editorial progress of the manuscript. I have carefully reviewed the text again to refine stylistic precision and add minor scientific clarifications as suggested, further improving the manuscript’s overall depth and readability.

Comment 2: In the abstract it would be better to replace “Older computer models such as SVMs, CNNs, and RNNs struggle...” with “Conventional models, including SVMs, CNNs, and RNNs, fail to integrate heterogeneous omics and temporal immune data.” Additionally, clarify quantitative results by specifying “accuracy range (97.68-99.74%) across three patient datasets.”

Response 2: Thank you for this valuable suggestion. I have revised the abstract to replace “Older computer models…” with the proposed phrase for greater scientific accuracy and added the specified quantitative results showing the accuracy range (97.68–99.74%) across the three patient datasets. This adjustment improves both clarity and precision.

Comment 3: Simplify sentences beginning with “It shows that” or “This indicates that” by using assertive phrasing (“The model demonstrates...,” “Results confirm...”). please check such sentences throughout the manuscript.

Response 3: I acknowledge the reviewer’s point regarding sentence construction. I have simplified all occurrences of phrases like “It shows that” and “This indicates that,” replacing them with direct and assertive formulations such as “The model demonstrates” or “Results confirm,” enhancing the paper’s readability and tone.

Comment 4: The Introduction effectively presents the limitations of static compatibility models but still lacks a strong biological link to current discoveries on hematopoietic stem cell (HSC) preservation. As far as my knowledge is concerned about this topic, the addition of recent study in by zhou et al. (10.1016/j.stem.2024.06.007) and VEGF-FGF (10.1002/advs.202308711) would significantly strengthen the biological foundation, as one provides essential mechanistic insight into HSC preservation and mitochondrial regulation during transplantation stress and another demonstrates how signaling cascades such as VEGF-FGF govern stem-cell activation and differentiation, offering a contemporary molecular context for explaining the biological motivation behind donor-recipient modeling. If you ask my recommendation, then it could be added in the Introduction (around lines 35-40) where the manuscript discusses stem-cell behavior and the need for dynamic modeling of immune recovery. This will strengthen the manuscript’s biological argument that compatibility modeling must consider stem-cell functionality and activation pathways beyond static HLA matching.

Response 4: I appreciate the insightful recommendation to strengthen the biological context. I have incorporated references to Zhou et al. (10.1016/j.stem.2024.06.007) and the VEGF-FGF study (10.1002/advs.202308711) in the Introduction (around lines 35–40). These additions now provide a stronger mechanistic link between HSC preservation, mitochondrial regulation, and compatibility modeling, enriching the biological rationale behind the study.

Comment 5: Ensure consistency in numeric formatting (e.g., 98.0 % rather than 98%). Please check such errors throughout the manuscript.

Response 5:I thank the reviewer for noticing the numeric formatting inconsistency. I have thoroughly checked the manuscript to ensure uniform formatting, using styles such as “98.0 %” consistently throughout.

Comment 6: Figure 2’s caption should explicitly mention that it depicts an “HSCT compatibility pipeline” rather than a generic image-based classification workflow.

Response 6:I agree with the reviewer’s observation and have updated Figure 2’s caption to explicitly identify it as representing the “HSCT compatibility pipeline.” This correction clarifies the figure’s focus and improves interpretability.

Comment 7: Section 3.1 describes preprocessing of donor-recipient datasets but would benefit from referencing standardized stem-cell isolation and characterization techniques that ensure data reproducibility and quality control. Including the work in "Optimal Sca-1-based procedure for purifying mouse adipose-derived mesenchymal stem cells with enhanced proliferative and differentiation potential" (after line 238) would show awareness of best practices in stem-cell purification and preprocessing, thereby improving the credibility of the dataset preparation pipeline. It is just a suggestion to make models more prominent.

Response 7: I appreciate this constructive recommendation. I have added a reference to the study “Optimal Sca-1-based procedure for purifying mouse adipose-derived mesenchymal stem cells…” after line 238 to acknowledge standardized isolation techniques that improve dataset credibility. This inclusion reinforces the methodological rigor of the preprocessing stage.

Comment 8: The Discussion mentions the potential for real-world deployment of computational transplant models but could be more convincingly connected to recent advances in gene- and stem-cell-based therapies for hematologic disorders.

Response 8: I acknowledge that the Discussion required a stronger connection to recent therapeutic advances. I have now expanded it to reference progress in gene- and stem-cell-based therapies for hematologic disorders, aligning our computational approach with current translational research trends.

Comment 9: At last, the work is very impressive but there are few grammatical errors, I hope authors keep an eye on them before resubmission and hope revised manuscript will perfectly match the publication criteria.

Response 9: I thank the reviewer for their encouraging closing remarks. I have carefully proofread the entire manuscript to correct grammatical and typographical errors, ensuring the revised version meets the expected publication standards.

Article Menu

Application of Graph Neural Networks to Model Stem Cell Donor–Recipient Compatibility in the Detection and Classification of Leukemia

Further Information

Guidelines

MDPI Initiatives

Follow MDPI