


Open Access Article

Peer-Review Record

Agentic RAG-Driven Multi-Omics Analysis for PI3K/AKT Pathway Deregulation in Precision Medicine

Algorithms 2025, 18(9), 545; https://doi.org/10.3390/a18090545

by Micheal Olaolu Arowolo^1,*, Sulaiman Olaniyi Abdulsalam^2, Rafiu Mope Isiaka^2, Kingsley Theophilus Igulu^3, Bukola Fatimah Balogun^4, Mihail Popescu^5,6 and Dong Xu^2,5,7,8

Reviewer 1: Anonymous

Reviewer 2: Reema Singh


Submission received: 28 May 2025 / Revised: 13 August 2025 / Accepted: 19 August 2025 / Published: 30 August 2025

(This article belongs to the Special Issue Advanced Algorithms for Biomedical Data Analysis)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper is innovative and highly relevant to precision medicine and AI-driven multi-omics research. However, the authors should address key methodological gaps (external validation, comparative analysis) and improve the clarity of presentation.

1. Provide a clearer explanation of the synthetic dataset creation and its implications for model training.

2. While ARMOA is compared against traditional ML models and some LLM-based systems, state-of-the-art bioinformatics or multi-omics integration methods are not included for fair benchmarking.

3. Discuss computational efficiency and runtime requirements of the ARMOA system in real-world settings.

Author Response

Comment 1: Synthetic Dataset Creation and Implications

Response: We have added a new paragraph to Section 4 (Results) describing the synthetic dataset:
This study created a synthetic multi-omics dataset of 1,000 samples and 100 features, encompassing genomic, transcriptomic, proteomic, and metabolomic dimensions, to enable thorough training and evaluation of the Agentic RAG-Driven Multi-Omics Analysis (ARMOA) system. A generative model was used to create the dataset based on the statistical properties of real PI3K/AKT pathway data from open sources such as TCGA, PRIDE, GEO, and HMDB. To mimic biological variability and simulate patient heterogeneity and pathway dysregulation, we carefully recreated feature distributions, including gene expression, protein abundance, and metabolite concentrations, using controlled perturbations and random noise. Known association patterns among PIK3CA, AKT1, PTEN, SIRT1, and G6PD were used to validate that the dataset accurately reflected genuine biological associations. To avoid the challenges posed by incomplete or heterogeneous real-world multi-omics data, ARMOA was trained on this precisely annotated, controlled synthetic dataset, and the integration of verified ground-truth labels facilitated efficient model optimisation.
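The response describes the generator only in outline. As a hedged illustration of the idea (controlled perturbations of pathway features plus random noise, with known ground-truth labels), the sketch below simulates a 1,000 × 100 matrix split across four omics layers. The layer sizes, effect size, noise level, and the choice of which features act as "pathway" features are all assumptions for illustration; this is not the authors' generative model, which was fitted to TCGA, GEO, PRIDE, and HMDB statistics.

```python
import numpy as np

# Hypothetical layout: features split evenly across four omics layers.
LAYERS = ("genomic", "transcriptomic", "proteomic", "metabolomic")

def make_synthetic_multiomics(n_samples=1000, n_features=100, n_pathway=10,
                              effect=1.0, noise=0.5, seed=0):
    """Simulate a labelled multi-omics matrix with a PI3K/AKT-like signal.

    Assumptions (illustrative only, not the authors' generator):
    - baseline values are standard normal;
    - each sample is labelled 'dysregulated' (1) or not (0) at random;
    - in dysregulated samples the first `n_pathway` features of every
      omics layer are shifted by `effect` (the controlled perturbation).
    """
    rng = np.random.default_rng(seed)
    per_layer = n_features // len(LAYERS)
    X = rng.normal(0.0, 1.0, size=(n_samples, n_features))
    y = rng.integers(0, 2, size=n_samples)  # ground-truth labels

    # Controlled perturbation of the 'pathway' block in each layer.
    for k in range(len(LAYERS)):
        cols = slice(k * per_layer, k * per_layer + n_pathway)
        X[y == 1, cols] += effect

    # Additional random noise to mimic patient heterogeneity.
    X += rng.normal(0.0, noise, size=X.shape)

    layer_of = np.repeat(LAYERS, per_layer)  # omics layer of each column
    return X, y, layer_of
```

Because the labels are known by construction, a model trained on `X` can be scored against exact ground truth, which is the main benefit the response attributes to synthetic data; the trade-off is that performance on such data need not transfer to real cohorts.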

Comment 2: Benchmarking Against State-of-the-Art Methods
Response: We compared the performance of ARMOA with LLMs, conventional machine learning models, and established multi-omics integration techniques. Table 2 shows that ARMOA achieved an accuracy of 0.9200 on the synthetic multi-omics dataset, surpassing the compared models. We attribute this improvement to ARMOA's use of retrieval-augmented generation (RAG) for rapid hypothesis formulation and graph neural networks (GNNs) for modelling intricate pathway interconnections, which together enable context-aware predictions. The agentic AI-driven methodology of ARMOA is therefore well suited to precision medicine applications, offering substantial advantages for hypothesis-driven, pathway-specific research.

Comment 3: Computational Efficiency and Runtime Requirements

Response: The ARMOA system is computationally efficient: the total runtime on the synthetic dataset is approximately 2.5 hours, and inference takes approximately 5 seconds per sample, which supports its use in real-time precision medicine settings.

Improving ARMOA's computational efficiency can be accomplished by utilising caching techniques for RAG-based queries and applying model compression approaches, such as quantisation and pruning, to decrease GNN training and inference times. The aim of these adjustments is to improve ARMOA's efficacy in resource-limited settings, particularly in clinical scenarios with restricted computational capabilities.
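Caching RAG queries can be as simple as memoising retrieval results keyed by a normalised query string. The minimal sketch below uses Python's `functools.lru_cache`; the toy keyword retriever and corpus are hypothetical placeholders for ARMOA's actual retrieval back end. Results are returned as tuples so that cached values cannot be mutated by callers.

```python
from functools import lru_cache

# Hypothetical toy corpus standing in for ARMOA's knowledge sources.
CORPUS = {
    "doc1": "PTEN loss activates PI3K/AKT signalling",
    "doc2": "PIK3CA mutations in breast cancer",
    "doc3": "G6PD links AKT to the pentose phosphate pathway",
}
CALLS = {"retrieve": 0}  # counts how often the (expensive) retriever really runs

def _retrieve_uncached(query, k):
    """Toy keyword-overlap retriever; a placeholder for an embedding or database lookup."""
    CALLS["retrieve"] += 1
    terms = set(query.split())
    ranked = sorted(
        CORPUS,
        key=lambda d: len(terms & set(CORPUS[d].lower().split())),
        reverse=True,
    )
    return tuple(ranked[:k])  # immutable, so safe to cache

@lru_cache(maxsize=1024)
def _retrieve_cached(normalised_query, k):
    return _retrieve_uncached(normalised_query, k)

def retrieve(query, k=2):
    """Normalise case and whitespace so trivially different queries share one cache entry."""
    return _retrieve_cached(" ".join(query.lower().split()), k)
```

Repeated or trivially reworded queries then skip the expensive lookup. In a live system the cache would also need an expiry policy, so that newly published evidence is not masked by stale entries.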

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript by Arowolo et al. presents an AI-driven framework, Agentic RAG-Driven Multi-Omics Analysis (ARMOA), designed to advance precision medicine through in-depth analysis of the PI3K/AKT signaling pathway. The authors address limitations in existing computational approaches that lack real-time, pathway-specific insights, and propose ARMOA as a solution that integrates Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), and agentic AI. ARMOA employs Graph Neural Networks (GNNs) to model complex biological interactions and autonomously retrieves knowledge from diverse databases to identify novel biomarkers and potential drug repurposing candidates. Case studies illustrate the framework’s ability to predict patient-specific therapeutic outcomes in diseases such as breast cancer and type 2 diabetes with high accuracy. The manuscript positions ARMOA as a transformative tool for AI-enabled multi-omics analysis and pathway-oriented therapeutic strategies.

Overall, this is a timely and well written manuscript. I have several specific comments and suggestions that I believe will help strengthen the clarity, depth, and impact of the work.

Comments:

1. In Section 2 (Related Works), the paragraph spanning lines 116–133 appears out of place. The second paragraph introduces the PI3K/AKT signaling pathway, followed by a discussion of multi-omics approaches in the third paragraph, and then returns to the PI3K/AKT pathway in the fourth. This sequence disrupts the logical flow and may confuse readers. To improve coherence, I suggest relocating the paragraph in lines 116–133 to precede the paragraph beginning at line 180, where it would better align with the surrounding content.
2. Line 281: Could the authors please clarify what is meant by 'agentic RAG system creation'? A brief definition or explanation would help readers better understand the concept.
3. Regarding the KEGG data used in the analysis, could the authors clarify whether this dataset is freely available and, if so, provide details on how it was accessed?
4. Lines 309–310: The ComBat approach was used to correct for batch effects. However, based on my prior experience applying ComBat to RNA-seq data, I observed that some corrected values appeared artificially imputed or distorted. It would be helpful if the authors could clarify how they assessed the effectiveness of batch correction and confirm that ComBat did not introduce artifacts. What metrics or visualizations were used to validate that batch effects were successfully removed?
5. Figure 3: Could the authors please clarify what is meant by 'No Response'? A brief explanation would help readers interpret the figure more accurately.
6. The GitHub link provided (https://github.com/micheal1209/ARMOA-.git) appears to be inaccessible or broken. Could the authors please verify the URL or provide an alternative link to access the repository?
7. To strengthen the argument for ARMOA’s clinical utility and foster greater physician trust, the authors could provide a more detailed explanation of how the tool leverages specific XAI frameworks to generate transparent and interpretable forecasts. This is especially important given the inherent complexity of the GNNs and LLMs employed. Clarifying the mechanisms behind ARMOA’s interpretability, such as feature attribution, attention mechanisms, or model-agnostic explanations, would enhance the credibility and practical relevance of the tool in clinical decision-making.
8. The authors could consider elaborating on concrete plans for acquiring and utilizing large-scale clinical datasets for future validation, as well as addressing potential challenges related to data access, quality, and integration. Discussing these aspects would help clarify the feasibility of scaling and validating the tool in real-world clinical settings and further strengthen its translational potential.
9. The authors mention plans to integrate single-cell omics, epigenomic data, wearable biosensor outputs, and electronic health records (EHRs) into the ARMOA framework in future iterations. In the revised manuscript, it would be valuable to elaborate on the technical challenges associated with integrating these diverse data types into the existing ARMOA architecture. Additionally, a discussion of the potential benefits and a proposed roadmap for such integration would strengthen the manuscript by highlighting the scalability and translational relevance of the tool.
10. The authors could expand on how ARMOA’s predictive capabilities may contribute to addressing key challenges in clinical translation and long-term toxicity assessment. Specifically, elaborating on how ARMOA can support early identification of adverse effects, stratification of patient risk, or optimization of therapeutic strategies would help underscore its potential impact in real-world clinical settings.
11. The limited understanding of resistance mechanisms and the need for improved predictive biomarkers remain significant challenges in breast cancer therapy targeting the PI3K/AKT/mTOR pathway. In the revised manuscript, the authors could elaborate on how ARMOA’s multi-omics integration and autonomous hypothesis generation capabilities uniquely position it to uncover novel resistance mechanisms and identify more accurate predictive biomarkers. This would further emphasize ARMOA’s potential to advance precision oncology and overcome current therapeutic limitations.
12. In the manuscript, the literature highlights the importance of reliable datasets and the challenges posed by privacy and data integrity in the broader adoption of AI tools. While ARMOA currently utilizes publicly available datasets, the authors could strengthen the manuscript by discussing the quality control measures implemented to ensure data reliability. In addition, outlining potential strategies for safeguarding privacy and maintaining data integrity, particularly in scenarios involving sensitive patient data, would enhance the tool’s credibility and readiness for clinical deployment.
13. The manuscript notes ongoing challenges with scalable augmentation methods, efficient retrieval strategies, and trustworthy assessment frameworks in RAG systems. Since ARMOA is described as an Agentic RAG-Driven system, the authors could strengthen the manuscript by elaborating on how their specific implementation addresses these limitations. In addition, discussing planned future work to enhance the scalability and efficiency of ARMOA’s dynamic knowledge retrieval, particularly as it updates from vast and evolving data sources, would provide valuable insight into its long-term robustness and adaptability.
14. To strengthen the discussion, the authors could include a section detailing how ARMOA mitigates potential biases in the multi-omics data it collects and processes. Addressing strategies for bias detection, correction, and validation would help demonstrate ARMOA’s ability to produce more generalizable and equitable predictions, which is especially important for clinical applications across diverse patient populations.
15. The authors could expand on the specific mechanisms within ARMOA that enable real-time adaptation to evolving clinical contexts and disease progression. Including illustrative examples beyond the current validation would help demonstrate how ARMOA more effectively addresses the dynamic nature of disease compared to conventional methods.

Author Response

Comment 1: Reorganize Paragraph in Section 2 (Related Works)

Response: We agree that the paragraph discussing Cas13d and PI3K/AKT pathway effects (originally lines 116–133) disrupts the logical flow in Section 2, as it shifts focus prematurely from multi-omics approaches to pathway-specific studies. To improve coherence, we have relocated this paragraph to immediately precede the drug repurposing discussion (originally starting at line 180), where it aligns with other PI3K/AKT-specific studies. The introductory sentence has been adjusted to ensure a smooth transition, maintaining the section’s narrative flow.

Comment 2: Clarify 'Agentic RAG System Creation' (Line 281)

Response: We appreciate the request for clarification. To enhance reader understanding, we have added a concise definition of “agentic RAG system creation” at the beginning of Section 3.3 (Agentic RAG System Development), around line 281. The definition explains that it refers to developing a retrieval-augmented generation (RAG) framework enhanced by autonomous AI agents that dynamically retrieve and synthesize knowledge to generate context-aware hypotheses.

Comment 3: Clarify KEGG Data Availability

Response: We thank the reviewer for this comment. The KEGG database is freely accessible for academic research, with restrictions on commercial use. We have revised Section 3.2 to explicitly state that the KEGG pathway data (hsa04151) was accessed freely via the KEGG REST API for academic purposes on January 15, 2025, ensuring compliance with KEGG’s terms.
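For readers reproducing this step, KEGG's REST API serves pathway maps in KGML (XML) form through its `get` operation, for example `https://rest.kegg.jp/get/hsa04151/kgml`. The sketch below assumes that endpoint and academic-use access as described above; it downloads the map and turns its `relation` elements into an edge list. Parsing is kept separate from the download so it can be checked offline. Which relation types ARMOA actually retains is not stated in the response, so all of them are kept here.

```python
import urllib.request
import xml.etree.ElementTree as ET

KEGG_KGML_URL = "https://rest.kegg.jp/get/{pathway}/kgml"

def fetch_kgml(pathway="hsa04151", timeout=30):
    """Download a KEGG pathway in KGML format (academic use; see KEGG's terms)."""
    with urllib.request.urlopen(KEGG_KGML_URL.format(pathway=pathway), timeout=timeout) as resp:
        return resp.read()  # bytes; ElementTree parses the XML declaration itself

def kgml_edges(kgml):
    """Return (source, target, relation_type, [subtypes]) tuples from KGML text.

    Entry names are KEGG identifiers such as 'hsa:5290'; one entry may list
    several space-separated genes, which this sketch leaves unsplit.
    """
    root = ET.fromstring(kgml)
    names = {e.get("id"): e.get("name") for e in root.findall("entry")}
    edges = []
    for rel in root.findall("relation"):
        subtypes = [s.get("name") for s in rel.findall("subtype")]
        edges.append((names[rel.get("entry1")], names[rel.get("entry2")],
                      rel.get("type"), subtypes))
    return edges

# Usage (requires network access):
# edges = kgml_edges(fetch_kgml("hsa04151"))
```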

Comment 4: Validate ComBat Batch Correction (Lines 309–310)

Response: We acknowledge the concern about potential artifacts from ComBat. We have added a paragraph in Section 3.2 detailing the validation process, which uses principal component analysis (PCA) and t-SNE visualizations to confirm batch effect removal. The pre-correction silhouette score (0.45) indicated batch-specific clustering; it fell to 0.12 after correction, and the normalized correlation matrices (Figure 5) showed no artifacts.
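The validation logic described here (do samples still cluster by batch after correction?) can be quantified with a silhouette score computed on batch labels in PCA space. The sketch below uses simulated data. Because full ComBat (empirical Bayes shrinkage of batch parameters, as implemented in the R `sva` package) is beyond a short example, it substitutes a crude per-batch standardisation; it illustrates the metric, not the authors' pipeline, and does not reproduce the reported 0.45 and 0.12 values.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples onto the top principal components (SVD of the centred data)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def batch_silhouette(Z, batches):
    """Mean silhouette width using batch labels as clusters (high = samples cluster by batch)."""
    batches = np.asarray(batches)
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    scores = []
    for i in range(len(Z)):
        same = batches == batches[i]
        same[i] = False  # exclude the sample itself
        if not same.any():
            scores.append(0.0)  # singleton-cluster convention
            continue
        a = D[i, same].mean()  # mean distance to own batch
        b = min(D[i, batches == other].mean()  # nearest other batch
                for other in np.unique(batches) if other != batches[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def standardize_within_batch(X, batches):
    """Crude stand-in for batch correction: z-score each feature inside each batch.
    NOT ComBat: no empirical Bayes shrinkage and no protection of biological covariates."""
    X = np.asarray(X, dtype=float).copy()
    batches = np.asarray(batches)
    for b in np.unique(batches):
        m = batches == b
        X[m] = (X[m] - X[m].mean(axis=0)) / X[m].std(axis=0)
    return X

# Simulated example: two sequencing runs, the second shifted by a batch effect.
rng = np.random.default_rng(1)
batches = np.repeat(["run1", "run2"], 100)
X = rng.normal(size=(200, 20))
X[batches == "run2"] += 3.0
before = batch_silhouette(pca_scores(X), batches)
after = batch_silhouette(pca_scores(standardize_within_batch(X, batches)), batches)
```

A silhouette near 0 means batch labels no longer explain the sample layout. Per-batch standardisation would also erase any biological signal confounded with batch, which is one reason ComBat accepts biological covariates to preserve.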

Comment 5: Clarify 'No Response' in Figure 3

Response: We thank the reviewer for noting this ambiguity. “No Response” in Figure 3 refers to instances where ARMOA fails to generate a valid hypothesis or prediction due to insufficient data or low-confidence outputs. We have revised the Figure 3 caption to clarify this.
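One plausible reading of this definition is a simple abstention rule. The sketch below returns "No Response" when too few features are observed for a sample or when the top class probability falls below a threshold. Both thresholds and the class labels are hypothetical, since the response does not state ARMOA's actual cut-offs.

```python
NO_RESPONSE = "No Response"

def predict_or_abstain(probs, labels, n_observed, min_confidence=0.6, min_observed=10):
    """Return the most probable label, or NO_RESPONSE for insufficient data or low confidence.

    probs: class probabilities aligned with labels; n_observed: number of
    non-missing features for this sample. Thresholds are illustrative assumptions.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must align")
    if n_observed < min_observed:  # insufficient data for this sample
        return NO_RESPONSE
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < min_confidence:  # low-confidence output
        return NO_RESPONSE
    return labels[best]
```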

Comment 6: Fix GitHub Link

Response: We apologize for the broken link due to a trailing hyphen. The correct URL is https://github.com/micheal1209/ARMOA.git. We have updated Section 4 to reflect this and confirmed the repository’s accessibility.

Comment 7: Enhance Explanation of XAI Frameworks

Response: We appreciate the suggestion to strengthen ARMOA’s clinical utility. We have added a subsection in Section 3.5 titled “Explainability Mechanisms in ARMOA,” describing the use of SHAP for feature attribution and GNN attention mechanisms to highlight key pathway interactions, enhancing transparency for clinical decision-making.
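SHAP attributions approximate Shapley values. For a handful of features the Shapley value can be computed exactly by enumerating every coalition, which makes the idea concrete: feature i receives its marginal contribution averaged over coalitions S of the other features, weighted by |S|!(n-|S|-1)!/n!. The sketch below assumes a hypothetical model `f` and replaces "absent" features with a baseline vector. It is not the SHAP library's API, and its cost grows as 2^n, which is why SHAP relies on approximations for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = set(S)
                phi[i] += weight * (value(S | {i}) - value(S))  # marginal contribution of i
    return phi
```

For a linear model the attributions reduce to coefficient × (x − baseline), and in general they sum to f(x) − f(baseline), which is the additivity property that makes such explanations easy to present to clinicians.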

Comment 8: Plans for Large-Scale Clinical Datasets

Response:

We have added a paragraph in Section 5 outlining plans to acquire datasets from clinical consortia (e.g., NCI-MATCH, ICGC) and hospital collaborations for EHR integration. Challenges like data heterogeneity and privacy are addressed using federated learning and automated harmonization pipelines.
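Federated learning lets each site train locally and share only model parameters, not patient records. A common aggregation rule is federated averaging (FedAvg), which weights each site's parameters by its local sample count. The minimal sketch below shows only that aggregation step; the hospital sites, parameter vectors, and sizes are hypothetical, and the response does not specify which federated algorithm ARMOA would use.

```python
def fedavg(site_params, site_sizes):
    """Federated averaging: weight each site's parameter vector by its local sample count."""
    if not site_params or len(site_params) != len(site_sizes):
        raise ValueError("need one parameter vector per site")
    if any(n <= 0 for n in site_sizes):
        raise ValueError("site sizes must be positive")
    dim = len(site_params[0])
    if any(len(p) != dim for p in site_params):
        raise ValueError("all sites must share the same model shape")
    total = sum(site_sizes)
    return [sum(p[k] * n for p, n in zip(site_params, site_sizes)) / total
            for k in range(dim)]

# Hypothetical round: two hospitals with 1 and 3 local samples.
# fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]) -> [2.5, 3.5]
```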

Comment 9: Technical Challenges of Integrating Diverse Data Types

Reviewer Comment: Elaborate on technical challenges and benefits of integrating single-cell omics, epigenomic data, wearable biosensors, and EHRs, with a proposed roadmap.

Response:

We have added a paragraph in Section 5 discussing challenges like data dimensionality and format heterogeneity, proposing solutions such as scalable GNNs and harmonization pipelines. Benefits include improved cellular resolution and real-time monitoring, with a staged integration roadmap.

Comment 10: Addressing Clinical Translation and Toxicity Assessment

Response:

We have added a paragraph in Section 5 explaining how ARMOA identifies off-target effects, stratifies patient risk, and optimizes therapies using GNN embeddings and drug repurposing scores, enhancing clinical translation and toxicity assessment.

Comment 11: Uncovering Resistance Mechanisms and Biomarkers

Reviewer Comment: Elaborate on how ARMOA’s multi-omics integration and hypothesis generation uncover novel resistance mechanisms and predictive biomarkers in breast cancer.

Response:

We have added a paragraph in Section 4 describing how ARMOA’s GNNs model resistance-related interactions (e.g., PTEN mutations) and RAG-driven hypothesis generation identifies biomarkers (e.g., SIRT1, G6PD), improving prediction of therapy resistance in breast cancer.

Comment 12: Data Quality Control and Privacy Strategies

Response:

We have added a subsection in Section 3.2 detailing quality control measures (e.g., outlier detection, cross-validation) and privacy strategies (e.g., differential privacy, secure multi-party computation) to ensure data reliability and compliance with regulations like HIPAA.
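As one concrete example of outlier detection for quality control, the sketch below flags values whose modified z-score, 0.6745 × (x - median) / MAD, exceeds 3.5, the cut-off suggested by Iglewicz and Hoaglin. This rule and threshold are assumptions; the response does not name the outlier method ARMOA uses.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag outliers with the median/MAD-based modified z-score (robust to the outliers themselves)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return [False] * len(values)  # no spread: the score is undefined, flag nothing
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]
```

Median and MAD are used instead of mean and standard deviation because a single extreme value would inflate the standard deviation and could hide itself.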

Comment 13: Addressing RAG System Limitations

Response:

We have revised Section 3.3 to explain how ARMOA mitigates RAG challenges (e.g., retrieval noise) using Maximal Marginal Relevance (MMR) and Q-learning. A paragraph in Section 5 outlines future work on vector databases and incremental learning for scalability.
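Maximal Marginal Relevance picks the next document d from the candidates R \ S as argmax [λ·sim(d, q) − (1 − λ)·max over s in S of sim(d, s)], so each pick balances relevance to the query q against redundancy with the documents S already selected. The sketch below assumes documents and query are already embedded as plain vectors (the embeddings here are hypothetical) and uses cosine similarity.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr(query, docs, k, lam=0.7):
    """Greedy Maximal Marginal Relevance; returns indices of selected docs in pick order."""
    selected, candidates = [], list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * cosine(query, docs[i]) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ = 1 this reduces to plain relevance ranking; lowering λ makes near-duplicate passages (a common source of retrieval noise) less likely to crowd out complementary evidence.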

Comment 14: Mitigating Biases in Multi-Omics Data

Response:

We have added a subsection in Section 3.5 titled “Bias Mitigation in Multi-Omics Data,” describing ARMOA’s use of fairness-aware algorithms, batch effect correction, and cross-population validation to reduce biases and ensure equitable predictions.
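A basic form of cross-population validation is to report a metric per subgroup and the largest gap between subgroups. The sketch below does this for accuracy; the grouping variable (for example ancestry or source cohort) is a hypothetical input, and this check is not the fairness-aware algorithm the response refers to.

```python
from collections import defaultdict

def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy within each subgroup plus the largest between-group gap."""
    hits, counts = defaultdict(int), defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        hits[g] += int(t == p)
    acc = {g: hits[g] / counts[g] for g in counts}
    gap = max(acc.values()) - min(acc.values())
    return acc, gap
```

A large gap signals that the model may generalise poorly to an under-represented population, even when overall accuracy looks high.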

Comment 15: Real-Time Adaptation Mechanisms

Reviewer Comment: Expand on ARMOA’s mechanisms for real-time adaptation with illustrative examples beyond current validation.

Response:

We have added a paragraph in Section 3.3 describing ARMOA’s use of Q-learning and online learning for real-time adaptation, with a hypothetical example of updating predictions based on new Alpelisib trial data.
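The tabular Q-learning update is Q(s, a) ← Q(s, a) + α[r + γ·max over a′ of Q(s′, a′) − Q(s, a)]. The sketch below applies it to a hypothetical agent that reacts to incoming trial evidence. The states, actions, reward, α, and γ are all illustrative assumptions and do not describe ARMOA's actual state or action space.

```python
from collections import defaultdict

# Hypothetical action set for an agent deciding how to react to new evidence.
ACTIONS = ("retrieve_more_evidence", "update_prediction", "keep_prediction")

def new_q_table():
    """Q-values default to 0.0 for unseen (state, action) pairs."""
    return defaultdict(float)

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9, actions=ACTIONS):
    """One tabular Q-learning step; returns the updated Q(state, action)."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q[(state, action)]

def greedy_action(Q, state, actions=ACTIONS):
    """Action with the highest current Q-value (ties go to the first listed)."""
    return max(actions, key=lambda a: Q[(state, a)])
```

For instance, if updating a prediction after new (hypothetical) Alpelisib trial data is repeatedly rewarded, `update_prediction` becomes the greedy choice in that state; how ARMOA defines such rewards is not specified in the response.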

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

While the manuscript is technically sound and well-structured, there are some concerns regarding the clarity and presentation of key figures, particularly Figure 6 (UMAP visualization) and Figure 7 (Confusion Matrix).

(1) Lack of Detailed Figure Legends: The current figure captions are overly brief and lack sufficient explanation. For instance, Figure 6 does not clearly annotate what each cluster represents in the UMAP projection. Similarly, Figure 7 does not include a legend to define the meaning of TP, TN, FP, FN or to explain the context of model evaluation.

(2) Font and Line Clarity: Some figures contain small, pixelated text or low-resolution lines (e.g., axis labels, numerical annotations). This may hinder readability in print and online versions. Authors are encouraged to regenerate figures at higher resolution, ensuring all elements (text, symbols, lines) are sharp and legible.

Author Response

Comment 1: Lack of Detailed Figure Legends

Response: We agree with the reviewer that the captions for Figures 6 and 7 were too brief and lacked sufficient detail to fully convey the significance of the visualizations. To address this, we have revised the captions to provide comprehensive descriptions, including explanations of clusters (for Figure 6) and definitions of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) (for Figure 7), along with the context of the model evaluation. Additionally, we have updated the figures themselves to include legends and annotations for improved clarity.
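The definitions behind a binary confusion matrix such as Figure 7 can be stated compactly in code. In the sketch below, which class counts as "positive" (for example a responder or a dysregulated-pathway sample) is an assumption passed in by the caller; undefined ratios return NaN rather than raising.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task with the given positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["TP" if t == positive else "FP"] += 1  # predicted positive
        else:
            counts["FN" if t == positive else "TN"] += 1  # predicted negative
    return counts

def _ratio(num, den):
    return num / den if den else float("nan")

def summarize(c):
    """Standard metrics derived from the four confusion-matrix cells."""
    total = sum(c.values())
    return {
        "accuracy": _ratio(c["TP"] + c["TN"], total),
        "sensitivity": _ratio(c["TP"], c["TP"] + c["FN"]),  # recall, true positive rate
        "specificity": _ratio(c["TN"], c["TN"] + c["FP"]),  # true negative rate
        "precision": _ratio(c["TP"], c["TP"] + c["FP"]),
    }
```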

Comment 2: Font and Line Clarity

Response: We acknowledge the reviewer’s concern regarding the readability of some figures due to small or pixelated text and low-resolution lines. To address this, we have regenerated Figures 6 and 7 (and reviewed other figures for similar issues) using high-resolution settings to ensure all elements are sharp and legible in both print and online formats. Below are the specific actions taken:

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for submitting the revised manuscript. After careful review, I would like to note that my previous comment has not been fully addressed by the authors.

General comment:

In the response letter, the authors have not included line numbers, which makes it challenging for the reviewer to identify the specific changes made in the revised manuscript. Highlighting revisions in red alone is insufficient. To facilitate the review process and ensure clarity, the authors should explicitly reference the exact line numbers corresponding to each revision in their response letter. This will significantly improve transparency and ease of cross-referencing.

Previous comments not answered:

Previous comment: In Section 2 (Related Works), the paragraph spanning lines 116–133 appears out of place. The second paragraph introduces the PI3K/AKT signaling pathway, followed by a discussion of multi-omics approaches in the third paragraph, and then returns to the PI3K/AKT pathway in the fourth. This sequence disrupts the logical flow and may confuse readers. To improve coherence, I suggest relocating the paragraph in lines 116–133 to precede the paragraph beginning at line 180, where it would better align with the surrounding content.

New comment: In their response to the reviewer’s comments, the authors stated that they had relocated the paragraph; however, this change does not appear to have been incorporated into the revised manuscript.

Previous comment: The GitHub link provided (https://github.com/micheal1209/ARMOA-.git) appears to be inaccessible or broken. Could the authors please verify the URL or provide an alternative link to access the repository?

New comment: The GitHub link provided in the revised manuscript (line 774: https://github.com/micheal1209/ARMOA-.git) appears to be incorrect. Additionally, the link mentioned in the response letter (https://github.com/micheal1209/ARMOA.git) is also inaccessible—attempting to access it results in a “page not found” error. This is a critical issue, as access to the code and workflow is essential for evaluating the functionality and reproducibility of the proposed AI-driven framework. Without the source code and a user guide, the framework remains a conceptual algorithm rather than a usable tool. The authors should either ensure the GitHub repository is publicly accessible or provide the complete code and user manual as supplementary material.

Previous comment: To strengthen the argument for ARMOA’s clinical utility and foster greater physician trust, the authors could provide a more detailed explanation of how the tool leverages specific XAI frameworks to generate transparent and interpretable forecasts. This is especially important given the inherent complexity of the Graph GNNs and LLMs employed. Clarifying the mechanisms behind ARMOA’s interpretability, such as feature attribution, attention mechanisms, or model-agnostic explanations, would enhance the credibility and practical relevance of the tool in clinical decision-making.

New comment: Regarding my previous Comment 7: In the response letter, the authors stated that they had added a subsection titled “Explainability Mechanisms in ARMOA” under Section 3.5. However, in the revised manuscript, Section 3.5 is titled “Predictive Modeling and Validation,” and there is no identifiable subsection with the mentioned title. This discrepancy needs to be clarified, and the authors should ensure that the referenced content is properly included and labeled in the manuscript.

Previous comment: The authors could expand on how ARMOA’s predictive capabilities may contribute to addressing key challenges in clinical translation and long-term toxicity assessment. Specifically, elaborating on how ARMOA can support early identification of adverse effects, stratification of patient risk, or optimization of therapeutic strategies would help underscore its potential impact in real-world clinical settings.

New comment: In the response letter, the authors stated that they had added a subsection titled “Bias Mitigation in Multi-Omics Data” in Section 3.5. However, in the revised manuscript, Section 3.5 is titled “Predictive Modeling and Validation,” and there is no identifiable subsection with the mentioned title. This discrepancy needs to be clarified, and the authors should ensure that the referenced content is properly included and labeled in the manuscript.

Previous comment: To strengthen the discussion, the authors could include a section detailing how ARMOA mitigates potential biases in the multi-omics data it collects and processes. Addressing strategies for bias detection, correction, and validation would help demonstrate ARMOA’s ability to produce more generalizable and equitable predictions, which is especially important for clinical applications across diverse patient populations.

New comment: There is no Section 3.5 titled “Bias Mitigation in Multi-Omics Data” in the revised manuscript. Instead, Section 3.5 is titled “Predictive Modeling and Validation.” This discrepancy should be addressed to ensure consistency between the response letter and the manuscript.

Author Response

Comment: Lack of Line Numbers in Response Letter
Response:
We apologize for not including line numbers in our previous response letter. In this response, every change is referenced by its exact line number in the revised manuscript, and all modifications in the manuscript are highlighted in red. A detailed description of each change follows, and the revised manuscript includes the necessary updates to address the concerns.

Comment 1: Relocation of Paragraph in Section 2 (Related Works)
Response:
The paragraph discussing the PI3K/AKT signaling pathway and the effects of Cas13d (previously lines 116–133) has been relocated to precede the paragraph starting at line 180 (now line 162 in the revised manuscript) to improve the logical flow. This change ensures that the discussion of the PI3K/AKT pathway is presented cohesively before transitioning to multi-omics approaches, enhancing readability and coherence.

Comment 2: Inaccessible GitHub Link
Response:
The correct link is https://github.com/micheal1209/ARMOA-/blob/main/Untitled76.ipynb. We have now made the repository accessible.

Comment 3: Explanation of XAI Frameworks for Interpretability
Response:
We have now added a new subsection titled “3.6. Explainability Mechanisms in ARMOA” to clarify how ARMOA leverages XAI techniques to enhance interpretability and foster physician trust.

Comment 4: Contribution to Clinical Translation and Toxicity Assessment
Response:
We have added a new subsection titled “3.7. Clinical Translation and Toxicity Assessment” in Section 3 (Materials and Methods) to elaborate on how ARMOA supports early identification of adverse effects, patient risk stratification, and therapeutic optimization.

Comment 5: Bias Mitigation in Multi-Omics Data
Response:
We have added a new subsection titled “3.8. Bias Mitigation in Multi-Omics Data” in Section 3 (Materials and Methods) to detail ARMOA’s strategies for detecting, correcting, and validating biases in multi-omics data, ensuring generalizable and equitable predictions.

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Revised manuscript V3:

Thank you for submitting the revised manuscript. The authors have addressed most of my previous comments; however, one of my main concerns remains unresolved, despite being raised in all prior rounds of revision.

Unresolved Comment from previous revisions

Previous comment (Version 1): The GitHub link provided (https://github.com/micheal1209/ARMOA-.git) appears to be inaccessible or broken. Could the authors please verify the URL or provide an alternative link to access the repository?

Previous comment on revised manuscript (Version 2): The GitHub link provided in the revised manuscript (line 774: https://github.com/micheal1209/ARMOA-.git) appears to be incorrect. Additionally, the link mentioned in the response letter (https://github.com/micheal1209/ARMOA.git) is also inaccessible—attempting to access it results in a “page not found” error. This is a critical issue, as access to the code and workflow is essential for evaluating the functionality and reproducibility of the proposed AI-driven framework. Without the source code and a user guide, the framework remains a conceptual algorithm rather than a usable tool. The authors should either ensure the GitHub repository is publicly accessible or provide the complete code and user manual as supplementary material.

New Comment on revised manuscript (Version 3): Unfortunately, the GitHub link provided (https://github.com/micheal1209/ARMOA-/blob/main/Untitled76.ipynb) remains inaccessible. Attempts to access it consistently result in a “404: This is not the web page you are looking for” error. This suggests that the repository may be private. To enable proper evaluation, the authors must make the repository public. This is a critical issue, as access to the source code and workflow is essential for assessing the functionality and reproducibility of the proposed AI-driven framework. Without the code and a user guide, the framework cannot be validated and remains a conceptual algorithm rather than a practical tool. The authors should either ensure the GitHub repository is publicly accessible or provide the complete code and documentation as supplementary material.

Additional comment on Revised Manuscript (Version 3)

In the Results section, the authors present several figures illustrating raw and normalized correlation metrics across genomic, transcriptomic, proteomic, and metabolomic data, a ROC curve for the model, and a confusion matrix for the ARMOA model. However, it is unclear which dataset was used to train the model. Did the authors use differentially expressed genes or highly variable genes across datasets? Additionally, the methodology for identifying a common set of IDs across the various datasets is not described. Clarification on these points would strengthen the interpretation of the results.

Author Response

Comment 1: Unresolved Comment from previous revisions

Response: The GitHub repository (https://github.com/micheal1209/ARMOA-/blob/main/Untitled76.ipynb) has now been made public and is accessible.

Comment 2: Additional comment on Revised Manuscript (Version 3)

Dataset Used for Training the Model: The ARMOA model was trained on a multi-omics dataset comprising 1,000 samples and 400 features, which was designed to replicate the statistical properties of real-world PI3K/AKT pathway data. This dataset was generated using a generative model based on data from public repositories, including TCGA (genomic), GEO (transcriptomic), PRIDE (proteomic), and HMDB (metabolomic), as described in Section 4 of the manuscript. The synthetic dataset was created to address challenges associated with real-world data, such as heterogeneity and incomplete annotations, while maintaining biological relevance through controlled perturbations and random noise to mimic patient variability and pathway dysregulation. Ground-truth labels were incorporated to ensure accurate model optimization. To clarify this in the manuscript, we propose adding a more explicit statement in the Results section to link the dataset description to the training process.

Use of Differentially Expressed Genes or Highly Variable Genes: The feature selection process for the ARMOA model prioritized differentially expressed genes (DEGs) and highly variable features across the multi-omics datasets to capture biologically relevant signals specific to the PI3K/AKT pathway. Specifically, differential expression analysis was performed using limma for RNA-seq data and LIMMA-VOOM for proteomic data, as noted in Section 3.2. Genes and proteins with significant expression changes (log-fold change > 1.5, p < 0.05) were selected, focusing on key PI3K/AKT pathway components such as PIK3CA, AKT1, PTEN, and MTOR. Additionally, feature selection was refined using ANOVA F-value to reduce dimensionality to the top 50 features, ensuring that only the most informative and variable features were retained. To address the reviewer’s concern, we will revise the manuscript to explicitly state the use of DEGs and the criteria for feature selection in both the Methods and Results sections.
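The two-stage selection described above (a differential-expression filter, then ranking by ANOVA F-value) can be sketched as follows. The sketch assumes the log-fold changes and p-values were already computed by a DE tool such as limma, and reads the "log-fold change > 1.5" criterion as an absolute value, which is an assumption. The ranking step computes the same one-way ANOVA F statistic that scikit-learn's `f_classif` (often used with `SelectKBest(f_classif, k=50)`) returns.

```python
from statistics import fmean

def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across class labels."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand = fmean(values)
    means = {g: fmean(vs) for g, vs in groups.items()}
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    if ss_within == 0:
        return float("inf")  # perfectly separated groups
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_features(table, labels, log_fc, p_values, lfc_cut=1.5, p_cut=0.05, top_k=50):
    """Keep DE features with |logFC| > lfc_cut and p < p_cut, then keep the top_k by ANOVA F.

    table: feature -> values across samples (same order as labels);
    log_fc, p_values: feature -> statistics precomputed by a DE tool (assumed).
    """
    passing = [f for f in table if abs(log_fc[f]) > lfc_cut and p_values[f] < p_cut]
    passing.sort(key=lambda f: anova_f(table[f], labels), reverse=True)
    return passing[:top_k]
```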

Methodology for Identifying a Common Set of IDs Across Datasets: To ensure interoperability across the diverse multi-omics datasets, we employed a standardized data integration pipeline, which was briefly mentioned in Section 3.2 but requires further elaboration. A common set of identifiers (e.g., Ensembl gene IDs, UniProt IDs, and HMDB metabolite IDs) was established by mapping features across datasets using cross-referencing tools such as Ensembl BioMart and UniProt ID mapping services. The KEGG, Reactome, and STRING databases provided a unified interaction matrix for the PI3K/AKT pathway, which served as a reference for aligning features across omics layers. Data harmonization involved normalization (e.g., DESeq2 for RNA-seq, Pareto scaling for metabolomics, and MaxQuant for proteomics) and batch effect correction using the ComBat method to ensure consistency. To address the reviewer’s comment, we propose adding a dedicated subsection in the Materials and Methods section to describe the identifier mapping and data harmonization process in detail.
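A minimal sketch of the identifier-harmonization step and of Pareto scaling is given below in Python. It assumes that the mapping tables have already been exported from services such as Ensembl BioMart or UniProt ID mapping. Here, each table is represented as a plain dictionary that maps a layer-specific ID to a shared gene-level key. The function names are hypothetical, and the sketch does not cover DESeq2 normalization, MaxQuant processing or ComBat batch correction. Pareto scaling is applied per feature as (x − mean) / √SD.

```python
import math


def to_common_ids(layer, mapping):
    """Re-key one omics layer with a mapping table; unmapped features are dropped.

    layer: dict feature_id -> list of per-sample values.
    mapping: dict feature_id -> common identifier (hypothetical stand-in for
        an exported BioMart or UniProt ID-mapping table).
    If several source IDs map to the same common ID, the last one wins here;
    a real pipeline would need to aggregate or resolve such collisions.
    """
    return {mapping[fid]: vals for fid, vals in layer.items() if fid in mapping}


def shared_ids(*layers):
    """Common identifiers present in every layer, in sorted order."""
    common = set(layers[0])
    for layer in layers[1:]:
        common &= set(layer)
    return sorted(common)


def pareto_scale(values):
    """Pareto scaling of one feature: centre on the mean, divide by sqrt(SD)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate the SD")
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # sample SD
    if sd == 0:
        return [0.0] * n  # constant feature: nothing to scale
    return [(v - mean) / math.sqrt(sd) for v in values]
```

In this design, a feature reaches the integrated matrix only if its common identifier appears in every omics layer. This mirrors the alignment role that the response assigns to the pathway reference databases. Pareto scaling is a compromise between no scaling and unit-variance scaling: it reduces the dominance of high-abundance metabolites without inflating noisy low-abundance ones as much.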

Author Response File: Author Response.docx

Round 4

Reviewer 2 Report

Comments and Suggestions for Authors

Many thanks for the revised manuscript and the working GitHub link. I have just a couple of minor comments:

Comment 1: In response to the authors’ explanation regarding the “Methodology for Identifying a Common Set of IDs Across Datasets,” please ensure the following sentence is included in Section 3.2 of the Methods:

"A common set of identifiers (e.g., Ensembl gene IDs, UniProt IDs, and HMDB metabolite IDs) was established by mapping features across datasets using cross-referencing tools such as Ensembl BioMart and UniProt ID mapping services. The KEGG, Reactome, and STRING databases provided a unified interaction matrix for the PI3K/AKT pathway, which served as a reference for aligning features across omics layers."

Including this statement will help readers better understand the approach used for integrating multiple datasets.

Comment 2: Several key resources and methodologies are mentioned without accompanying citations. Specifically, no references are cited in Sections 3.1, 3.6, and 3.7. Additionally, the following terms appear in Section 3.2 and elsewhere without cited sources:

PI3K/AKT pathway, GEO, KEGG, Reactome, PRIDE, DESeq2, LIMMA-Voom, TCGA, ENCODE genomic data, ComBat, HIPAA, ANOVA, t-SNE, STRING database

Please ensure that appropriate references are cited in the Results and Methods sections wherever these tools, datasets, or concepts are discussed. This will enhance the manuscript’s clarity and credibility.

Author Response

We thank you for your valuable feedback and the opportunity to revise our manuscript. We have carefully addressed your comments to enhance the clarity and credibility of the manuscript. Below, we outline our responses and the corresponding revisions made.

Comment 1: Methodology for Identifying a Common Set of IDs Across Datasets
We have incorporated the suggested sentence into Section 3.2 (Data Collection and Preprocessing) to clarify the approach used for integrating multiple datasets. The sentence, “A common set of identifiers (e.g., Ensembl gene IDs, UniProt IDs, and HMDB metabolite IDs) was established by mapping features across datasets using cross-referencing tools such as Ensembl BioMart and UniProt ID mapping services. The KEGG, Reactome, and STRING databases provided a unified interaction matrix for the PI3K/AKT pathway, which served as a reference for aligning features across omics layers,” has been added after the description of data sources and before the preprocessing steps. This placement ensures a logical flow, connecting data collection with preprocessing and highlighting how features were aligned across omics layers.

Comment 2: Citations for Key Resources and Methodologies
We have added citations for all uncited resources and methodologies mentioned in Sections 3.1, 3.2, 3.6, and 3.7, as well as elsewhere in the manuscript where applicable.

We believe these revisions address your concerns comprehensively, improving the manuscript’s transparency and scientific rigor. We appreciate your guidance in strengthening our work and are happy to address any further suggestions.

Sincerely,

Micheal Arowolo
On behalf of all authors

Author Response File: Author Response.docx
