FR-BINN: Biologically Informed Neural Networks for Enhanced Biomarker Discovery and Pathway Analysis

Cao, Yangkun; Yin, Chaoyi; Zhou, Xinsen; Zhao, Yonghe

doi:10.3390/ijms26146670

School of Artificial Intelligence, Jilin University, Changchun 130012, China

Authors to whom correspondence should be addressed.

Int. J. Mol. Sci. 2025, 26(14), 6670; https://doi.org/10.3390/ijms26146670

Submission received: 8 June 2025 / Revised: 28 June 2025 / Accepted: 8 July 2025 / Published: 11 July 2025

(This article belongs to the Special Issue Machine Learning Applications in Bioinformatics and Biomedicine: 3rd Edition)

Abstract

Chronic inflammation plays a pivotal role in human health, with certain inflammatory conditions significantly increasing the risk of cancer, while others do not. However, the molecular mechanisms underlying this divergent risk remain poorly understood. In this study, we propose FR-BINN, a biologically informed neural network framework for disease prediction and interpretability. Incorporating Fenton reaction (FR)-related biological priors and leveraging multiple interpretability methods, FR-BINN identifies key genes driving cancer-prone and non-cancer-prone chronic inflammatory diseases. The experimental results demonstrate that FR-BINN achieves superior classification performance while offering biologically interpretable insights. Moreover, attribution results derived from different explainable techniques show high consistency, and intra-method results exhibit distinct patterns across disease categories. We further combine large language models with feature attributions to identify candidate biomarkers, and independent datasets confirm the robustness of these findings. Notably, genes such as NCOA1 and SDHB are identified as being associated with cancer susceptibility. The framework further reveals distinct patterns in energy metabolism, oxidative stress, and pH regulation between cancer-prone and non-cancer-prone inflammatory diseases. These insights enhance our understanding of inflammation-associated tumorigenesis and contribute to the identification of potential biomarkers and therapeutic targets.

Keywords: biologically informed neural network; chronic inflammation; explainable artificial intelligence; biomarker

1. Introduction

Chronic inflammation represents a significant challenge to global health, imposing substantial physical and psychological burdens on patients while dramatically increasing societal healthcare costs. Its impact extends across a broad spectrum of diseases, and its intricate relationship with cancer progression has garnered considerable attention. Chronic inflammation plays a pivotal role in nearly every stage of carcinogenesis, from precancerous lesions to malignant transformation, thereby acting as a critical modulator of cancer development [1,2,3]. However, not all chronic inflammatory diseases exhibit a clear link to cancer [4,5]. This raises a fundamental question: why do certain types of chronic inflammation lead to cancer, while others remain relatively stable? At present, there is neither a unified standard definition of cancer-prone inflammation nor a clear explanation of the underlying biological mechanism. Therefore, understanding the molecular basis of these differences, especially identifying key genes, is essential to explaining why some chronic inflammatory conditions are more likely to promote cancer than others. Notably, these key factors may support early prevention and diagnosis and offer potential targets for the development of novel therapeutic strategies.

Recent studies have uncovered a prevalent phenomenon of reduced protein mobility in various chronic diseases, a condition termed proteolethargy, where pathogenic signaling suppresses the mobility of proteins essential for cellular functions [6,7,8,9]. These proteins often exhibit functional dysregulation closely tied to the pathological features of chronic diseases. This research revealed that the reduction in protein mobility is strongly associated with dysregulated redox environments, highlighting a potential connection to the oxidative stress commonly seen in chronic conditions [6]. Previous studies have also established that reactive oxygen species (ROS) play a critical role in the progression of chronic diseases and cancer, with iron-mediated redox reactions such as the Fenton reaction identified as central players in these processes [10,11,12,13]. Building on these findings, this study uses the Fenton reaction as a mechanistic basis to explore the differences between cancer-prone chronic inflammatory diseases (CP-CIDs) and non-cancer-prone chronic inflammatory diseases (NCP-CIDs).

In recent years, artificial intelligence, especially deep learning, has made remarkable progress in biomedicine and has been widely used in protein structure prediction, single-cell gene expression analysis, gene–disease association prediction, cancer subtyping, and other tasks [14,15,16,17,18]. Beyond capturing complex nonlinear relationships between inputs and outputs, deep learning models have also been employed as foundation models for biological reasoning and knowledge discovery [19,20,21,22,23,24,25,26]. Despite these advancements, the application of deep learning to model the cancer propensity of chronic inflammation remains underexplored. Existing studies lack both a predictive framework tailored to this problem and an approach to mining key genes and discovering new knowledge from predictive models.

In this study, we propose FR-BINN, a novel framework for disease category prediction and interpretability analysis based on biologically informed neural networks. In summary, the main contributions of FR-BINN are as follows:

We established a formal definition of whether chronic inflammatory diseases are susceptible to carcinogenesis and compiled transcriptomic datasets from chronic inflammatory diseases to provide a foundation for predictive modeling.
Based on biological domain knowledge of the FR, we constructed a hierarchical knowledge neural network. Interpretability methods were used to explore the important genes and the potential patterns of the FR. Leveraging chain-of-thought reasoning, a large language model offers auxiliary semantic analysis and explanations.
Extensive experiments and downstream analyses demonstrate the capability of FR-BINN, providing novel insights into the mechanisms of inflammation-driven carcinogenesis and offering potential targets for prevention and therapy.

2. Results

2.1. Overview of FR-BINN Framework

The FR-BINN framework, illustrated in Figure 1, integrates prediction and interpretability analysis to identify key genes involved in the predisposition of chronic inflammation to progress to cancer. It consists of three core components: the biologically informed module, the interpretability module, and the large language model (LLM)-based semantic reasoning module. Together, these modules enable the extraction of both key genes and biological insights, ensuring that the results are reliable and interpretable.

First, we established formal definitions for CP-CIDs and NCP-CIDs based on statistical indicators. Based on these definitions, we collected a comprehensive dataset encompassing transcriptomic profiles from disease and control groups. Then, biological prior knowledge, specifically gene sets and pathway interactions related to the FR, was compiled (Figure 1A).

To mitigate shortcut learning by the model and enhance its interpretability, we incorporated biological hierarchy knowledge into a sparse masked neural network, which forms the biologically informed module (Figure 1B). This module leverages gene sets and pathway relationships related to the FR to construct a biologically informed architecture. The resulting sparse biological hierarchy network is trained to classify samples into four distinct categories: cancer, CP-CIDs, NCP-CIDs, and normal.
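The idea of a sparse masked layer can be sketched in a few lines. The following is a minimal illustration, not the FR-BINN implementation: the gene names, pathway names, membership map, and the `build_mask` and `masked_forward` helpers are all hypothetical placeholders, and the tanh activation is an assumption. It only shows how a binary gene-to-pathway mask restricts each pathway node to its member genes.

```python
import math
import random

def build_mask(genes, pathways, membership):
    """Return a pathways x genes 0/1 matrix; 1.0 where the gene belongs to the pathway."""
    return [[1.0 if g in membership[p] else 0.0 for g in genes] for p in pathways]

def masked_forward(x, weights, mask, bias):
    """One sparse layer: each pathway node only sees its member genes."""
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b
        out.append(math.tanh(z))  # activation choice is an assumption
    return out

# Hypothetical toy prior: placeholder names, not the FR gene sets from the paper.
genes = ["g1", "g2", "g3", "g4"]
pathways = ["pathway_A", "pathway_B"]
membership = {"pathway_A": {"g1", "g2"}, "pathway_B": {"g3", "g4"}}

random.seed(0)
mask = build_mask(genes, pathways, membership)
weights = [[random.uniform(-1, 1) for _ in genes] for _ in pathways]
bias = [0.0 for _ in pathways]

x = [0.5, -1.2, 2.0, 0.3]
h = masked_forward(x, weights, mask, bias)

# Changing g3 can only affect pathway_B, because pathway_A masks it out.
x_perturbed = [0.5, -1.2, 9.0, 0.3]
h_perturbed = masked_forward(x_perturbed, weights, mask, bias)
```

Because masked weights never contribute to a node, any attribution later assigned to a pathway node can be traced back to its member genes only, which is what makes such a layer easier to interpret than a dense one.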

Additionally, an interpretability module is developed to provide multi-level explanations, encompassing features, pathways, and higher-level biological patterns. This module plays a crucial role in identifying key genes that drive the distinction between CP-CIDs and NCP-CIDs. Furthermore, it enables the exploration of the learned patterns related to the FR.

Finally, the large language model module, built on chain-of-thought (COT) reasoning, facilitates the semantic-level interpretation of the attribution results. By harnessing the reasoning and knowledge capabilities of the LLM, this module contextualizes key findings in terms of their biological significance, linking gene attribution to inflammation-to-cancer transitions and providing an additional semantic explanation.

2.2. Performance Evaluation

Epidemiological evidence indicates that certain chronic inflammatory conditions are more likely to undergo malignant transformation, whereas others exhibit a lower propensity for cancer progression. To train and evaluate the model’s performance, we constructed a dataset and defined class labels as described in the Methods section. Chronic inflammatory diseases were categorized into CP-CIDs or NCP-CIDs based on statistical thresholds including Relative Risk (RR), Hazard Ratio (HR), and Standardized Incidence Ratio (SIR) (Figure 2A).

The biologically informed hierarchical neural network, built on a network of prior biological knowledge, serves as the foundation for identifying key genes associated with the cancer susceptibility of chronic inflammatory diseases. To assess the predictive performance of FR-BINN in disease classification, we compared it against five widely used machine learning algorithms: Kolmogorov–Arnold Networks (KAN) [27], k-Nearest Neighbors (KNN), Naive Bayes (NB), Random Forest (RF), and XGBoost. As shown in Figure 2B, model performance was evaluated using two key metrics, Precision and F1-score, with Recall results provided in Supplementary Figure S1. The results demonstrate that our proposed method outperforms all baselines across these metrics, achieving a Precision of 0.8715, Recall of 0.8737, and F1-score of 0.8702. These findings highlight the capability of the biologically informed architecture based on the FR priors to distinguish between cancer, CP-CIDs, NCP-CIDs, and normal bulk tissue samples. In addition, we conducted a comprehensive hyperparameter search to optimize model performance. Using the F1-score as the evaluation criterion, we determined the optimal settings for key hyperparameters.
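For readers who want to reproduce this kind of evaluation, the sketch below computes Precision, Recall, and F1-score for a four-class problem. The text does not state how the per-class scores were averaged, so macro-averaging is an assumption here, and the labels and predictions are made-up toy values rather than results from the study.

```python
def macro_scores(y_true, y_pred, labels):
    """Macro-averaged precision, recall, and F1 over the given class labels."""
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels for illustration only; not data from the paper.
labels = ["cancer", "CP-CID", "NCP-CID", "normal"]
y_true = ["cancer", "cancer", "CP-CID", "CP-CID",
          "NCP-CID", "NCP-CID", "normal", "normal"]
y_pred = ["cancer", "CP-CID", "CP-CID", "CP-CID",
          "NCP-CID", "normal", "normal", "normal"]

precision, recall, f1 = macro_scores(y_true, y_pred, labels)
```

Macro-averaging weights every class equally, which matters when the four categories have different sample counts; a weighted average would instead follow the class frequencies.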

Furthermore, Figure 2C depicts the density distribution of samples across different clinical stages, sorted by the predicted probabilities generated by the model. By ranking the samples and computing their density distributions, we observed concordance between the predicted probabilities and the progression of clinical stages for specific diseases. For example, the distributions of samples for non-cancer-prone irritable bowel syndrome (IBS) and cancer-prone Hepatitis B virus (HBV)-associated chronic inflammation aligned closely with the observed clinical progression of these diseases. This demonstrates that beyond achieving high predictive accuracy, the model’s probability scores can provide insights into the progression dynamics of diseases.

These results collectively validate the effectiveness of FR-BINN in predicting the cancer propensity of chronic inflammation. The strong performance across classification tasks underscores the utility of integrating biological priors, while the model’s ability to capture clinically relevant patterns further reinforces its potential to uncover key genes driving inflammation-to-cancer transitions. This provides a solid foundation for subsequent interpretability, bioinformatics analysis at the gene level, and mechanism discovery at the pathway level.

2.3. Validation of Attribution Methods

The robustness of our framework’s gene attribution results was further validated through a combination of complementary strategies, including a visualization of the top attributed genes and gene expression heatmaps, an assessment of inter- and intra-interpretability method agreement, LLM-driven knowledge refinement, and independent classification (Figure 3).

To investigate the key genes learned by the model under the influence of biological priors, we employed the Integrated Gradients (IG) and Shapley value (SV) methods within the interpretability module to derive attribution results for both CP-CIDs and NCP-CIDs (Supplementary File S2). Figure 3A highlights the top 10 genes attributed to CP-CIDs and NCP-CIDs categories based on the IG method. For instance, genes such as SDHB contributed positively to the classification of CP-CIDs samples, and NCOA1 provided evidence favoring NCP-CIDs classification.
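To make the two attribution methods concrete, the sketch below applies both to a toy scorer. It is an illustration under stated assumptions, not the interpretability module itself: the three-input sigmoid model, its weights `W`, the all-zero baseline, and the helper names are hypothetical. Integrated Gradients is approximated with a midpoint Riemann sum along the straight path from the baseline, and Shapley values are computed exactly by enumerating feature orderings, with absent features set to their baseline value.

```python
import math
from itertools import permutations

W = [1.5, -2.0, 0.5]  # hypothetical weights for three input genes

def model(x):
    """Toy differentiable scorer: sigmoid of a weighted sum."""
    z = sum(w * xi for w, xi in zip(W, x))
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x):
    """Analytic gradient of the sigmoid model with respect to each input."""
    s = model(x)
    return [s * (1.0 - s) * w for w in W]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann approximation of the IG path integral."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, gradient(point))]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

def shapley_values(x, baseline):
    """Exact Shapley values; features not yet added keep their baseline value."""
    phi = [0.0] * len(x)
    orders = list(permutations(range(len(x))))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]
            new = model(current)
            phi[i] += new - prev  # marginal contribution of feature i
            prev = new
    return [p / len(orders) for p in phi]

x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
ig = integrated_gradients(x, baseline)
sv = shapley_values(x, baseline)
delta = model(x) - model(baseline)  # both attribution vectors should sum to this
```

Both methods distribute the same quantity, the change in model output relative to the baseline, across the inputs, which is one reason their rankings can be compared directly. Exact enumeration is only feasible for a handful of features; for thousands of genes a sampling-based approximation would be needed.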

To illustrate attribution results at the gene expression level, we visualized the union of the top 10 genes identified by both the IG and SV methods for each category in a heatmap (Figure 3B). The results reveal that the genes identified by both methods exhibit strong discriminatory power between CP-CIDs and NCP-CIDs, reinforcing the biological relevance of the attributions.

Considering that combining multiple attribution methods can reduce bias and increase robustness [28], we further assessed the overlap between the two methods within each category. Figure 3C demonstrates a high degree of overlap between the IG and SV attribution results, with intersections exceeding 80% across the top 10, 30, 50, and 100 genes for both CP-CIDs and NCP-CIDs categories. This high overlap further supports the reliability of the attribution results and suggests that the key genes identified are robust across interpretability methods. Importantly, we further explored the extent to which the same interpretability approach overlaps under both categories (Supplementary Figures S2 and S3). There is less than 20% overlap in the top 100 attribution results between CP-CIDs and NCP-CIDs categories, underscoring the distinct features that differentiate cancer-prone from non-cancer-prone inflammatory diseases. This distinction provides a valuable foundation for further exploration of the molecular mechanisms driving the differences between these two categories.
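The overlap measure described above can be sketched as follows. This is a minimal illustration: the gene names and scores are placeholders, and ranking by absolute attribution is an assumption, since the text does not state whether signed or absolute values were used for ranking.

```python
def top_k(scores, k):
    """Names of the k genes with the largest absolute attribution."""
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:k])

def overlap_fraction(scores_a, scores_b, k):
    """Share of the top-k genes that two attribution rankings have in common."""
    return len(top_k(scores_a, k) & top_k(scores_b, k)) / k

# Placeholder attribution scores for five hypothetical genes.
ig_scores = {"g1": 0.9, "g2": -0.7, "g3": 0.5, "g4": 0.1, "g5": 0.05}
sv_scores = {"g1": 0.8, "g2": -0.6, "g3": 0.05, "g4": 0.4, "g5": 0.02}

overlap_top2 = overlap_fraction(ig_scores, sv_scores, 2)
overlap_top3 = overlap_fraction(ig_scores, sv_scores, 3)
```

The same function covers both comparisons in the text: passing the IG and SV scores for one category gives the inter-method agreement, while passing one method's scores for CP-CIDs and NCP-CIDs gives the cross-category overlap.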

Moreover, by leveraging the LLM combined with COT, we further refined the top 10 ranked genes in both attribution methods. This was achieved by filtering out genes with weak associations to cancer progression, as determined through LLM-driven semantic explanation (Supplementary File S2). The step ensured that the remaining genes were more strongly linked to the biological processes relevant to the cancer-prone phenotype. Then, we built a logistic regression model using the refined gene sets to evaluate the model’s ability to classify samples as CP-CIDs or NCP-CIDs (Figure 3D). Finally, we tested the logistic regression model on independent disease datasets to assess its generalizability. As shown in Figure 3E, the model achieved high predictive accuracy in independent datasets for diseases such as Multiple Sclerosis in the NCP-CIDs and Dermatomyositis in the CP-CIDs, demonstrating the robustness of the attribution results across diverse contexts.

In summary, these results confirm the effectiveness of the FR-BINN interpretability framework in identifying key genes that distinguish cancer-prone from non-cancer-prone chronic inflammatory diseases. The strong consistency between the two attribution methods, the heterogeneity across disease categories, and the successful validation on independent datasets together underscore the biological significance of the identified genes.

2.4. Analysis of Attribution Results

Following the validation of attribution methods, we computed total attribution scores for further ranking genes, investigating potential causal associations between the genes and chronic disease or cancer outcomes, and systematically analyzing the top-ranked genes.

First, we calculated the total attribution scores of the two feature attribution methods in CP-CIDs and NCP-CIDs. The genes were ranked based on the total attribution scores and the results of the LLM reasoning module, with the top 10 candidates visualized in Figure 4A. Among these, NCOA1 achieved the highest total attribution score, indicating its significant contribution to the model’s classification decisions.

Using a two-sample Mendelian randomization database, we further assessed the causal relationships of the top 10 genes. Of these, 7 genes (NCOA1, SDHB, DAD1, SNX3, ALDH9A1, PSMD2, and EXTL3) exhibited causal relationships with diseases (Supplementary File S2). Moreover, 4 genes (NCOA1, CANT1, ALDH9A1, and EXTL3) from the top 10 list encode proteins known to catalyze hydrogen ion production reactions. The Venn diagram comparing the above causal genes and hydrogen ion production-related genes is shown in Figure 4B. Three genes (NCOA1, ALDH9A1, and EXTL3) fall in the intersection, being both causal genes and hydrogen ion-producing genes. This overlap points to the potential functional importance of these genes in chronic inflammation and carcinogenesis.
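The set comparison behind the Venn diagram can be checked directly from the gene lists given in the text. The variable names below are illustrative.

```python
# Gene lists as stated in the text above.
causal_genes = {"NCOA1", "SDHB", "DAD1", "SNX3", "ALDH9A1", "PSMD2", "EXTL3"}
hydrogen_ion_genes = {"NCOA1", "CANT1", "ALDH9A1", "EXTL3"}

both = causal_genes & hydrogen_ion_genes                 # intersection of the two sets
causal_only = causal_genes - hydrogen_ion_genes          # causal but not proton-producing
hydrogen_ion_only = hydrogen_ion_genes - causal_genes    # proton-producing but not causal

print(sorted(both))
print(sorted(causal_only))
print(sorted(hydrogen_ion_only))
```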

Focusing on NCOA1, which had the highest total attribution score, we conducted a detailed expression analysis. Figure 4C illustrates the differential expression of NCOA1 between CP-CIDs and NCP-CIDs and across nine additional diseases. NCOA1 was consistently expressed at lower levels in disease groups compared to controls, and its expression in CP-CIDs was significantly lower than in NCP-CIDs. Further, we investigated NCOA1 expression in cancers, as shown in Figure 4D, where its expression was found to be reduced in the majority of cancer types relative to controls. Additionally, Figure 4E illustrates a survival analysis for NCOA1 in liver hepatocellular carcinoma (LIHC), revealing that lower NCOA1 expression correlates with improved survival outcomes. Several studies have reported that NCOA1 is highly expressed in various cancers and promotes tumor progression. For example, in breast cancer and prostate cancer, NCOA1 overexpression enhances cell proliferation and metastasis [29,30]. In HCC, the downregulation of NCOA1 was found to decrease cell invasion in vivo and was associated with a better 5-year survival rate in HCC patients [31]. The researchers demonstrated the pivotal role of NCOA1 as a critical modulator in HCC metastasis, presenting a potential therapeutic target for HCC intervention. Functionally, NCOA1 acts as a transcriptional coactivator by directly binding to transcription factors and being recruited to target gene promoters, thereby enhancing gene transcription through chromatin remodeling and transcriptional complex formation. Its emerging multiorgan oncogenic role is under intense investigation [32,33]. Mechanistically, NCOA1 in colorectal cancer has been shown to activate JAK-STAT signaling by inhibiting SOCS1 expression, to coactivate STAT3 and IRF1 to enhance programmed death-ligand 1 (PD-L1) transcription, and to stabilize PD-L1 protein by inhibiting SPOP-mediated proteasomal degradation, highlighting its potential as a significant biomarker across multiple tumor types [34].

Furthermore, SDHB, ranking second in total attribution scores, is considered a significant contributor to the model’s classification of CP-CIDs samples. SDHB plays a critical role in cellular and tissue metabolism by ensuring the stability of mitochondrial complexes and the proper functioning of the TCA cycle. Its proper function is crucial for mitochondrial ATP production, providing energy to cells across various tissues. Importantly, SDHB is linked to tumor formation [35,36]. Given its critical function in cellular metabolism and its established link to tumor development, SDHB is increasingly recognized as both a significant prognostic biomarker and a promising therapeutic target. Research has demonstrated that changes in SDHB expression can serve as a prognostic indicator in cancers such as colorectal cancer [37] and clear cell renal cell carcinoma [38]. Beyond its prognostic value, SDHB deficiency has been shown to be a key driver in the development of specific tumor types, including pheochromocytomas and paragangliomas [39]. In these tumors, multi-omic analyses have identified unique molecular profiles associated with metastasis, highlighting SDHB as a crucial target for treatment [39]. Additionally, the metabolic dysregulation caused by altered SDHB function, such as in hepatoblastoma, can be targeted to inhibit tumor cell proliferation [40]. The clinical utility of SDHB extends to its metabolic product, succinate, which can be measured as a serum biomarker for SDHB-mutated tumors [41], further solidifying its importance in both diagnosis and therapeutic strategies.

2.5. Pathway Enrichment Analysis

Conductance analysis reveals distinct utilization patterns of the FR between CP-CIDs and NCP-CIDs; the top five pathways ranked by conductance value for each category are presented in Figure 5A. Specifically, the model relies on divergent patterns to distinguish between disease types. These differences prompted further investigation into their underlying causes. Consequently, we explored the Gene Ontology (GO)-enriched pathways associated with the attribution results of the two categories, aiming to understand the biological distinctions between these two groups at the pathway level. The full results of the pathway enrichment analysis are provided in Supplementary File S2.

The top 10 enriched biological process pathways for CP-CIDs and NCP-CIDs are shown in Figure 5B. Compared with NCP-CIDs, the pathways most enriched in CP-CIDs are related to energy metabolism, including oxidative phosphorylation and the aerobic electron transport chain. Energy metabolism not only serves as the foundation for rapid cellular proliferation but also plays a pivotal role in responding to persistent microenvironmental stress. To better understand this phenomenon, we examined the role of energy metabolism in both disease categories and its potential implications.

The enriched energy metabolism pathways identified in the attribution results for both categories are illustrated in Figure 5C. CP-CIDs showed enrichment for numerous energy metabolism-related pathways, while NCP-CIDs exhibited far fewer enriched energy pathways. This observation suggests that CP-CIDs have a significantly higher energy demand. By calculating the Gene Set Variation Analysis (GSVA) scores for each pathway (Figure 5D), we found that CP-CIDs demonstrated markedly higher energy demand compared to NCP-CIDs [42]. This elevated energy requirement likely supports the rapid proliferation observed in CP-CIDs and provides essential substrates for subsequent nucleotide biosynthesis.

The choice of energy metabolism pathways not only determines cellular energy supply but also directly impacts the generation of ROS [43,44]. To further explore this, we analyzed pathways related to oxidative stress and endoplasmic reticulum stress (Supplementary Figure S4). While both categories exhibited enrichment for oxidative stress pathways, CP-CIDs also showed significant enrichment for pathways involved in hydrogen peroxide metabolism and cellular responses to free radicals. These results indicate that CP-CIDs likely face higher levels of oxidative stress, which could be a key driver of tumorigenesis. Conversely, NCP-CIDs appear to experience more controlled oxidative conditions, potentially mitigating the risk of carcinogenesis. The GSVA scores for related pathways further confirmed the heightened oxidative stress in CP-CIDs (Supplementary Figure S5).

Excess iron accumulation is a major driver of the FR, which generates hydroxyl radicals and other ROS, leading to oxidative stress, pH dysregulation, and cellular damage [45,46,47]. These effects increase the risk of cancer by causing DNA damage, protein dysfunction, and cellular instability. The pathway analysis of iron metabolism revealed that both disease categories showed enrichment for pathways related to intracellular iron ion homeostasis, highlighting the importance of maintaining iron levels in these conditions. However, CP-CIDs also exhibited enrichment for pathways related to iron–sulfur cluster assembly, suggesting that CP-CIDs experience more severe oxidative stress and require additional mechanisms to manage iron-related challenges. In contrast, NCP-CIDs appear to better maintain normal iron levels (Supplementary Figure S6). The GSVA scores for iron metabolism pathways further support these observations (Supplementary Figure S7).

In light of the observed pH disequilibrium between the intracellular and extracellular pH in various cancers, we investigated the pH regulation pathways in CP-CIDs and NCP-CIDs [48,49]. Both categories showed enrichment for pathways related to the regulation of cellular pH, macroautophagy, and intracellular pH reduction, suggesting that cells may enhance proton production to counteract the alkaline pH pressure caused by chronic inflammation and localized iron overload (Supplementary Figures S8 and S9). Notably, CP-CIDs were enriched for a wide range of nucleotide biosynthesis and catabolic metabolism pathways, particularly purine nucleotide synthesis pathways (Supplementary Figures S10 and S11). The proton-generating property of purine synthesis, coupled with reduced proton consumption during its catabolism, may serve as a compensatory mechanism to mitigate intracellular alkalinity caused by chronic inflammation and persistent Fenton reactions [50]. This mechanism could provide CP-CIDs with a means to manage persistent intracellular pH stress while simultaneously supporting rapid proliferation and biosynthetic needs [13].

Finally, we examined the extent of mitochondrial and cytoplasmic protein damage in the two categories, as shown in Figure 5E. CP-CIDs exhibited significantly higher FR intensity, reflecting the elevated oxidative stress challenges faced by this category. This heightened FR activity is likely associated with greater mitochondrial dysfunction, which could serve as a critical driver of cancer progression. In contrast, NCP-CIDs displayed lower levels of mitochondrial damage, suggesting a more stable cellular environment less prone to oncogenic transformation.

These findings underscore the significant biological distinctions between cancer-prone and non-cancer-prone chronic inflammatory diseases. CP-CIDs demonstrate higher energy demands, elevated oxidative stress, disrupted iron metabolism, and greater mitochondrial dysfunction, all of which are hallmarks of tumorigenic processes. The enrichment of purine nucleotide biosynthesis pathways in CP-CIDs further highlights a potential compensatory mechanism for pH regulation under chronic stress conditions. In contrast, NCP-CIDs exhibit a more stable metabolic and oxidative profile, which may contribute to their reduced cancer risk.

3. Discussion

Understanding the key factors that determine whether chronic inflammation is prone to carcinogenesis is essential to uncover the mechanisms underlying the transition from chronic inflammation to cancer. This study presents FR-BINN, a biologically informed and interpretable framework for understanding the differential cancer susceptibility of chronic inflammatory diseases. By integrating biological priors of Fenton reaction pathways with multi-layered explainable AI, our study achieves robust disease classification and uncovers critical molecular features, including key genes and pathways, through interpretability analysis. High consistency was observed in top-ranked features of the same disease category between the Integrated Gradients and Shapley values methods, while the limited overlap of the intra-interpretability method across disease categories confirmed the heterogeneity of feature attribution. The chain-of-thought-augmented large language model further filtered out biologically implausible candidates and provided semantic explanation. The refined genes achieved better predictive performance on both the chronic inflammatory disease dataset and the independent disease dataset, further validating the reliability of the attribution results.

The pathway enrichment analysis revealed key biological distinctions between cancer-prone chronic inflammatory diseases and non-cancer-prone chronic inflammatory diseases. CP-CIDs demonstrated significant enrichment in energy metabolism pathways, particularly oxidative phosphorylation and the aerobic electron transport chain, indicating heightened energy demands likely supporting rapid cellular proliferation. This elevated energy requirement, coupled with increased oxidative stress, suggests that CP-CIDs face greater challenges related to reactive oxygen species and pH regulation, which are critical drivers of tumorigenesis. In contrast, NCP-CIDs exhibited a more stable oxidative and metabolic profile, with fewer enriched energy pathways and lower oxidative stress, potentially contributing to their reduced cancer risk. Notably, CP-CIDs also showed enrichment in iron metabolism pathways, particularly those related to iron–sulfur cluster assembly, further emphasizing the severity of oxidative stress in this category. Additionally, the enriched purine nucleotide biosynthesis pathways in CP-CIDs highlight a potential compensatory mechanism to manage intracellular pH stress. These findings underscore the complex and imbalanced nature of CP-CIDs, where energy and oxidative stress imbalances may drive carcinogenesis, while NCP-CIDs appear to maintain a dynamic equilibrium that mitigates cancer risk. Extending this approach to new domains requires considering the biological plausibility between pathway relationships and disease classification. Future work may extend the framework to broader disease contexts and incorporate additional omics data to further enhance its predictive and interpretative capabilities.

In summary, this study establishes a robust framework that combines AI-driven modeling with biological insights to tackle the long-standing challenge of understanding the cancer propensity of chronic inflammation. By bridging the gap between predictive accuracy and interpretability, FR-BINN provides valuable insights into inflammation-cancer transitions and suggests potential avenues for the development of novel diagnostic and therapeutic strategies.

4. Materials and Methods

4.1. Datasets and Data Processing

In this study, we curated a comprehensive dataset comprising 4284 transcriptomic samples sourced from the Gene Expression Omnibus (GEO) [51]. These samples represent four primary categories: cancer-prone chronic inflammatory diseases (CP-CIDs), non-cancer-prone chronic inflammatory diseases (NCP-CIDs), cancer, and normal samples. Notably, the normal samples were derived from the control cohorts of NCP-CIDs and CP-CIDs studies. A total of 12 diseases were included in the analysis, spanning a diverse range of conditions: Asthma, Alzheimer’s disease (AD), Psoriasis, Irritable bowel syndrome (IBS), Rheumatoid arthritis (RA), Ulcerative colitis (UC), Crohn’s disease (CD), Non-alcoholic steatohepatitis (NASH), Hepatitis B virus (HBV), Colon cancer, Colorectal cancer, and Hepatocellular carcinoma (HCC). Detailed information on the GEO dataset accession numbers is provided in Supplementary Tables S1 and S2. Additionally, two independent datasets, including multiple sclerosis and dermatomyositis, were retrieved from GEO to serve as external validation sets (Supplementary Tables S3 and S4).

To ensure consistency and compatibility across datasets, all the transcriptome data that were used were processed through a unified pipeline established by the NCBI SRA and GEO teams (www.ncbi.nlm.nih.gov/geo/download/?acc=GEONumber, accessed on 3 June 2024). This pipeline involved the re-alignment and quantification of raw sequencing data to produce high-quality, harmonized expression profiles. The expression levels were quantified using transcripts per kilobase million, a normalization metric that facilitates cross-study comparison by accounting for both sequencing depth and gene length. This standardized preprocessing approach ensured robust data integration and reproducibility in downstream analyses. The inclusion of a larger and more diverse dataset aimed to improve the robustness and generalizability of gene-level attribution results, allowing the model to better capture class-specific biological signatures. Cancer samples were chosen to reflect malignancies associated with high inflammation-to-cancer risk, which are also the primary tissues involved in CP-CIDs. During data processing, we retained only transcriptomic samples derived from the disease-relevant tissue types to ensure biological relevance and reduce confounding effects.
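As a reminder of what a TPM-style normalization does, the sketch below converts raw read counts into values that account for both gene length and sequencing depth. It illustrates the standard TPM definition only and is not the NCBI pipeline; the counts and gene lengths are made-up toy numbers, and the `tpm` helper is a hypothetical name.

```python
def tpm(counts, lengths_bp):
    """Length-normalize counts per kilobase, then scale each sample to sum to one million."""
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Toy sample: three genes whose counts scale exactly with their lengths.
counts = [100, 200, 300]
lengths_bp = [1000, 2000, 3000]
values = tpm(counts, lengths_bp)
```

In this toy sample the longer genes collect more reads only because they are longer, so all three end up with the same normalized value. Since every sample sums to one million after scaling, values are comparable across samples with different sequencing depths.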

4.2. The Definition of NCP-CIDs and CP-CIDs

Chronic inflammation has been firmly established as a key factor in the initiation and progression of tumors, with inflammatory conditions significantly increasing tumor risk. Epidemiological evidence reveals that certain chronic inflammatory conditions exhibit a higher likelihood of malignant transformation, while others appear less prone to cancer progression. However, a standardized framework to categorize these conditions is currently lacking.

Drawing from extensive literature reviews and statistical analyses of large population cohorts, we propose a quantitative definition for determining the susceptibility of chronic inflammation to carcinogenesis:

$$\mathrm{prone\_or\_not} = \begin{cases} \mathrm{prone} & \text{if } (RR > 2) \lor (HR > 2) \lor (SIR > 1.4) \\ \mathrm{not\_prone} & \text{otherwise} \end{cases} \tag{1}$$

where $RR$, $HR$, and $SIR$ represent Relative Risk, Hazard Ratio, and Standardized Incidence Ratio, respectively. Based on this definition, we categorized chronic inflammatory diseases into NCP-CIDs and CP-CIDs (Supplementary Table S5). The thresholds of RR > 2 and HR > 2 align with established epidemiological conventions where an effect size exceeding 2.0 signifies a clinically important association and substantially elevated risk [52,53,54]. Our analysis of large-scale cohort studies and meta-analyses revealed that diseases consistently exhibiting RR > 2, HR > 2, or SIR > 1.4 demonstrate a clinically significant increase in cancer risk compared to reference populations. Crucially, empirical data aggregation showed clear separation: CP-CIDs consistently exceeded these thresholds, while NCP-CIDs fell below them. This natural dichotomy informed our threshold selection to robustly distinguish risk categories. These thresholds may warrant refinement if applied to broader disease spectra. Nevertheless, they provide a rigorously justified foundation for our framework’s classification task.

We first identified five diseases as NCP-CIDs: Asthma, AD, Psoriasis, IBS, and RA. Below, we summarize their characteristics and relationships with cancer risk.

Asthma: The Global Initiative for Asthma defines Asthma as a heterogeneous disease characterized by chronic airway inflammation and symptoms such as wheezing, coughing, dyspnea, and chest tightness [55]. While severe Asthma has been associated with an increased risk of certain cancers, the HR remains moderate at 1.36 [56,57].

Alzheimer’s Disease: AD is a neurodegenerative disorder marked by progressive cognitive decline and psychiatric disturbances [58]. Epidemiological studies consistently show an inverse association between AD and cancer risk. Patients with AD are significantly less likely to develop overall malignancies or specific cancers compared to the general population [4,59,60,61].

Psoriasis: Psoriasis is a chronic inflammatory disease of the skin and joints. While patients with Psoriasis have a slightly increased cancer risk, the RR remains modest at 1.21 [62].

Irritable Bowel Syndrome: IBS is a chronic gastrointestinal condition affecting 7–21% of the general population [63]. Despite its prevalence, IBS does not increase overall cancer risk. In fact, IBS is associated with a reduced risk of colorectal cancer and cancer-specific mortality [64].

Rheumatoid Arthritis: RA is a systemic autoimmune disease characterized by persistent joint inflammation and the presence of autoantibodies [65]. Patients with RA exhibit an increased overall cancer risk, with a SIR of 1.20 [66].

Next, we classified four diseases as CP-CIDs: UC, CD, NASH, and HBV.

Inflammatory Bowel Disease: UC and CD are the primary forms of inflammatory bowel disease, distinguished by differences in genetic predisposition, clinical features, and histopathological characteristics [67]. UC increases the risk of colorectal cancer, with a pooled SIR of 2.4 and a RR of 2.4 [68]. Similarly, CD significantly elevates the risk of colorectal and small bowel cancers, with a RR of 2.5 [69].

Non-Alcoholic Steatohepatitis: NASH, a more severe form of Non-alcoholic fatty liver disease, is characterized by necroinflammation and accelerated fibrosis progression [70]. Patients with NASH are at a significantly increased risk of developing HCC, with a HR of 7.62 [71].

Hepatitis B Virus: HBV is a major risk factor for HCC. After 4.4 million person-years of follow-up, participants seropositive for Hepatitis B surface antigen exhibited a dramatically increased risk of HCC, with an HR of 15.77 [72].
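As a worked illustration, the rule in Equation (1) can be applied to the effect sizes quoted in this section. The sketch below is not the authors' code: the function name `classify_cid` is hypothetical, and treating an unreported measure as not exceeding its threshold is an assumption (it is how AD and IBS, for which only inverse or null associations are reported, end up as not prone).

```python
def classify_cid(rr=None, hr=None, sir=None):
    """Apply Equation (1); a measure that is not reported (None) cannot trigger 'prone'."""
    exceeds = [
        rr is not None and rr > 2,
        hr is not None and hr > 2,
        sir is not None and sir > 1.4,
    ]
    return "prone" if any(exceeds) else "not_prone"


# Effect sizes quoted in this section; AD and IBS have no elevated estimate reported here.
reported = {
    "Asthma": {"hr": 1.36},
    "AD": {},
    "Psoriasis": {"rr": 1.21},
    "IBS": {},
    "RA": {"sir": 1.20},
    "UC": {"sir": 2.4, "rr": 2.4},
    "CD": {"rr": 2.5},
    "NASH": {"hr": 7.62},
    "HBV": {"hr": 15.77},
}
labels = {name: classify_cid(**vals) for name, vals in reported.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

With these inputs the rule reproduces the grouping described above: UC, CD, NASH, and HBV are labeled prone, and the remaining five diseases are labeled not prone.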

Finally, we used two independent external datasets: dermatomyositis, assigned to the CP-CIDs, and multiple sclerosis, assigned to the NCP-CIDs [73,74,75,76]. These datasets were not used during training, enabling independent validation of model predictions and biological insights.

4.3. Construction of Prior Networks

The FR is a critical biochemical process that generates hydroxyl radicals ($^{\bullet}OH$) and hydroxide ions ($OH^{-}$) via the interaction of ferrous iron ($Fe^{2+}$) and hydrogen peroxide ($H_{2}O_{2}$):

$$Fe^{2+} + H_{2}O_{2} \longrightarrow Fe^{3+} + {}^{\bullet}OH + OH^{-} \tag{2}$$

Hydroxyl radicals, among the most reactive oxidative species, are well documented to cause severe cellular damage, including lipid peroxidation and DNA damage, while hydroxide ions significantly influence intracellular and extracellular pH homeostasis, which is fundamental to maintaining cellular function [13,50,77,78]. The FR has been implicated in cancer and chronic inflammatory diseases [10,50]. To construct the prior network, we curated FR-related gene sets and the relationships among biological entities from the existing literature [50,77,79]. This network captures key pathways and molecular components associated with FR processes, including genes involved in iron homeostasis, ROS production, and antioxidant defense. The prior network was structured in layered form, denoted as S, and integrated into the FR-BINN framework to enable biologically grounded prediction and interpretability in the context of CP-CIDs and NCP-CIDs.

4.4. Construction of Model

We propose FR-BINN, a framework for interpretable disease prediction based on biological hierarchy knowledge. It can be used not only to classify cancer, NCP-CIDs, CP-CIDs, and normal controls, but also to explore multiple levels of interpretation of features and pathways (Figure 1). The framework comprises three primary modules, each designed to address specific objectives. The first, the biological hierarchy knowledge module, incorporates prior knowledge to reduce model parameters and mitigate shortcut learning. The second, the interpretability module, explores and analyzes feature and pathway importance across different categories, mainly using Integrated Gradients, Shapley values, and conductance. The third, the large language model-based semantic module, leverages chain-of-thought reasoning to refine the identification of key genes and reduce potential hallucinations, contributing to more accurate and interpretable semantic explanations.

Biologically informed module. To reduce shortcut learning and improve the interpretability of the model, we first use biological prior knowledge, namely FR-related gene sets and pathway relationships, to construct a prior-based sparse neural network. This hierarchical knowledge also helps to reduce the number of parameters and improve the performance of the model [23,25,26,80]. Training on different types of data helps FR-BINN learn to distinguish the state of a sample from the biological prior network and the sample’s transcript expression data. The architecture of the biologically informed module was built from the pathways of the Fenton reaction. In FR-BINN, each node encodes a biological entity (for example, a gene or a pathway) and each edge represents a known relationship between the corresponding entities. Restricting connections to these known edges yields fewer parameters than a fully connected network with the same number of nodes, and thus potentially fewer computations.

$$h_{l+1} = f\left((S_{l} \odot W_{l})^{T} h_{l} + b\right) \tag{3}$$

$$\mathrm{pred} = \mathrm{softmax}(W_{out} h_{out} + b) \tag{4}$$

where $h_{0} = x$, and $x$ represents the input features, which here are the gene expression values. $S_{l} \in \mathbb{R}^{n^{(l+1)} \times n^{(l)}}$ denotes the mask matrix of the layered prior knowledge. The activation function $f$ introduces nonlinearity, while the Hadamard product $\odot$ imposes sparsity by incorporating prior knowledge into the weights $W$. The model is trained using the Adam optimizer, and the cross-entropy loss function is employed to minimize classification errors:
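The masked forward pass of Equations (3) and (4) can be sketched without any deep learning library. This is an illustrative, dependency-free sketch rather than the authors' PyTorch implementation: the names `masked_layer` and `softmax`, the toy prior, and all weights are hypothetical. It also assumes that the mask and weight matrices are stored with one row per input node and one column per output node, so that the transpose in Equation (3) maps layer $l$ to layer $l+1$.

```python
import math


def masked_layer(h, S, W, b, f=math.tanh):
    """One sparse layer: f((S ⊙ W)^T h + b).

    h: activations of layer l (length n_in).
    S, W: n_in x n_out nested lists; S holds 0/1 entries from the prior network.
    b: bias (length n_out).
    """
    n_in, n_out = len(S), len(S[0])
    out = []
    for j in range(n_out):
        # Column j of (S ⊙ W): only edges present in the prior contribute.
        z = sum(S[i][j] * W[i][j] * h[i] for i in range(n_in)) + b[j]
        out.append(f(z))
    return out


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


# Hypothetical toy prior: 3 genes -> 2 pathways; genes 0 and 1 belong to
# pathway 0, gene 2 belongs to pathway 1.
S = [[1, 0], [1, 0], [0, 1]]
W = [[0.5, 9.9], [-0.3, 9.9], [9.9, 0.8]]  # 9.9 marks weights the mask removes
b = [0.0, 0.1]
x = [1.0, 2.0, 3.0]  # gene expression values (h_0 = x)

h1 = masked_layer(x, S, W, b)
# Dense output layer of Equation (4) with K = 2 classes (bias omitted for brevity).
W_out = [[1.0, -1.0], [0.5, 0.5]]
logits = [sum(W_out[k][j] * h1[j] for j in range(2)) for k in range(2)]
pred = softmax(logits)
print(h1, pred)
```

The weights set to 9.9 never influence the output because the corresponding mask entries are zero, which is how the prior network removes parameters relative to a fully connected layer.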

$$L = -\frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{ik} \log(\mathrm{pred}_{ik}) \tag{5}$$

where N represents the number of samples, K is the number of categories, and y is the ground truth.
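Equation (5) can be checked numerically with a short sketch. The function name `cross_entropy`, the small constant `eps`, and the two toy samples are assumptions introduced for illustration; in practice a framework loss function would be used instead.

```python
import math


def cross_entropy(y_onehot, pred, eps=1e-12):
    """Mean cross-entropy over N samples and K classes (Equation (5))."""
    n = len(y_onehot)
    total = 0.0
    for y_i, p_i in zip(y_onehot, pred):
        for y_ik, p_ik in zip(y_i, p_i):
            # eps guards against log(0); it is an implementation detail,
            # not part of Equation (5).
            total += y_ik * math.log(max(p_ik, eps))
    return -total / n


# Two hypothetical samples, K = 4 classes (cancer, CP-CID, NCP-CID, normal).
y = [[1, 0, 0, 0], [0, 0, 1, 0]]
pred = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
loss = cross_entropy(y, pred)
print(round(loss, 4))
```

Only the predicted probability of the true class contributes for each sample, so a confident correct prediction (0.7) is penalized less than an uninformative one (0.25).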

Interpretability module. To identify and analyze key biological entities in a framework grounded in prior biological knowledge, we incorporated Integrated Gradients (IG) [17,81], Shapley value (SV) [82], and conductance [83,84] into the interpretability module of FR-BINN. These methods evaluate feature attributions and neuron importance in the prediction process, and together they enhance the interpretability of predictions at multiple levels, from input features to network layers.

IG is an attribution method designed to quantify the contribution of each input feature to a model’s output. It works by integrating the gradients of the model’s output with respect to the input features along a straight path from a baseline input to the actual input. IG satisfies two key axioms: Sensitivity (ensuring that the attribution score reflects the influence of the feature) and Implementation Invariance (ensuring consistency across functionally equivalent models).

$$IG_{i} = (x_{i} - x_{i}^{'}) \sum_{k=1}^{m_{ig}} \frac{\partial F\left(x^{'} + \frac{k}{m_{ig}} (x - x^{'})\right)}{\partial x_{i}} \cdot \frac{1}{m_{ig}} \tag{6}$$

where $F$ represents our deep neural network, $IG_{i}$ signifies the final attribution score with respect to the $i$-th dimension of the features, the baseline $x^{'}$ is set to zero, and $m_{ig}$ denotes the number of steps used in the Legendre–Gauss quadrature integral approximation.
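A minimal sketch of Equation (6) on a toy model is given below. It uses the plain Riemann sum exactly as the equation is written, not the Legendre–Gauss quadrature mentioned in the text, and it replaces the neural network with a hypothetical logistic model whose gradient is known analytically; the weights `W`, offset `C`, and input `x` are illustrative only.

```python
import math

W = [0.8, -0.5, 0.3]  # hypothetical weights of a toy model
C = -0.2


def F(x):
    """Toy differentiable model: logistic regression output."""
    z = sum(w * v for w, v in zip(W, x)) + C
    return 1.0 / (1.0 + math.exp(-z))


def grad_F(x):
    """Analytic gradient of F with respect to each input feature."""
    p = F(x)
    return [p * (1.0 - p) * w for w in W]


def integrated_gradients(x, baseline=None, m_ig=800):
    """Riemann-sum approximation of Equation (6)."""
    if baseline is None:
        baseline = [0.0] * len(x)  # zero baseline, as in the text
    sums = [0.0] * len(x)
    for k in range(1, m_ig + 1):
        point = [b + k / m_ig * (v - b) for v, b in zip(x, baseline)]
        for i, g in enumerate(grad_F(point)):
            sums[i] += g
    return [(v - b) * s / m_ig for v, b, s in zip(x, baseline, sums)]


x = [2.0, 1.0, 4.0]
ig = integrated_gradients(x)
# Completeness: attributions should sum to roughly F(x) - F(baseline).
gap = abs(sum(ig) - (F(x) - F([0.0, 0.0, 0.0])))
print(ig, gap)
```

The completeness check at the end is a useful sanity test when choosing the number of steps: the gap shrinks as `m_ig` grows.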

The concept of total conductance extends IG to compute the importance of internal neurons. Consider a specific neuron $y$ in a hidden layer of a network. We can define the conductance of neuron $y$ for the attribution to an input variable $i$ as follows:

$$Cond_{i}^{y}(x) ::= (x_{i} - x_{i}^{'}) \cdot \int_{\alpha=0}^{1} \frac{\partial F(x^{'} + \alpha (x - x^{'}))}{\partial y} \cdot \frac{\partial y}{\partial x_{i}} \, d\alpha \tag{7}$$

The total conductance of the hidden neuron $y$, obtained by summing over the input variables, is defined as follows:

$$Cond^{y}(x) ::= \sum_{i} (x_{i} - x_{i}^{'}) \cdot \int_{\alpha=0}^{1} \frac{\partial F(x^{'} + \alpha (x - x^{'}))}{\partial y} \cdot \frac{\partial y}{\partial x_{i}} \, d\alpha \tag{8}$$

Additionally, we can aggregate over a set of logically related neurons that belong to a specific hidden layer: the conductance of the whole set is the sum of the conductances of its neurons. As with IG, we use an approximation algorithm [84] to compute the conductance, with $m_{con}$ approximation steps.
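Total conductance as defined in Equation (8) can be illustrated on a tiny two-neuron network. The sketch below approximates the integral with a direct Riemann sum, which is an assumption and not necessarily the approximation algorithm of [84]; the weights `A` and `V`, the input `x`, and the function names are hypothetical.

```python
import math

A = [[0.4, -0.2, 0.1], [0.3, 0.5, -0.6]]  # hypothetical input-to-hidden weights
V = [1.5, -0.7]                            # hypothetical hidden-to-output weights


def hidden(x):
    """Hidden activations y_j = tanh(A_j . x)."""
    return [math.tanh(sum(a * v for a, v in zip(row, x))) for row in A]


def F(x):
    """Network output: linear read-out of the hidden neurons."""
    return sum(v * y for v, y in zip(V, hidden(x)))


def total_conductance(x, baseline=None, m_con=800):
    """Riemann-sum approximation of Equation (8) for every hidden neuron."""
    if baseline is None:
        baseline = [0.0] * len(x)
    cond = [0.0] * len(A)
    for k in range(1, m_con + 1):
        point = [b + k / m_con * (v - b) for v, b in zip(x, baseline)]
        ys = hidden(point)
        for j, row in enumerate(A):
            dF_dy = V[j]  # dF/dy_j for this linear read-out
            for i, a in enumerate(row):
                dy_dx = (1.0 - ys[j] ** 2) * a  # dy_j/dx_i at the path point
                cond[j] += (x[i] - baseline[i]) * dF_dy * dy_dx / m_con
    return cond


x = [1.0, 2.0, 0.5]
cond = total_conductance(x)
# Conductances of all neurons in a layer should sum to about F(x) - F(baseline).
gap = abs(sum(cond) - (F(x) - F([0.0, 0.0, 0.0])))
print(cond, gap)
```

Summing the per-neuron values over a layer recovers the change in the output between baseline and input, which is what makes aggregation over a set of neurons meaningful.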

The SV, rooted in cooperative game theory, provides a principled way to fairly distribute the payoff among input features by quantifying their marginal contributions to the prediction across all possible feature combinations. While the exact computation of SVs is computationally expensive due to the combinatorial number of possible feature subsets, Kernel SHAP approximates the SVs efficiently by leveraging a weighted linear regression method inspired by the local interpretable model-agnostic explanations framework [82]. Kernel SHAP constructs a surrogate interpretable model by sampling feature subsets and calculating the original model’s predictions for those subsets, assigning weights based on their importance. Here, we use $m_{shapley}$ to represent the approximate number of times the original model is queried to generate predictions for training the surrogate interpretable model.
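To make the quantity that Kernel SHAP approximates concrete, the sketch below computes exact Shapley values by enumerating every feature subset. This is not Kernel SHAP and is feasible only for a handful of features; the toy model, the zero baseline used for absent features, and the names `shapley_values` and `value_fn` are assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                with_i = value_fn(frozenset(subset) | {i})
                without_i = value_fn(frozenset(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical toy model f(x) = 2*x0 + x1*x2; absent features are replaced
# by a zero baseline.
x = [1.0, 3.0, 2.0]


def value_fn(present):
    z = [x[i] if i in present else 0.0 for i in range(len(x))]
    return 2 * z[0] + z[1] * z[2]


phi = shapley_values(value_fn, len(x))
print(phi)
```

In this toy case the additive term is credited entirely to the first feature, while the interaction term is split equally between the two interacting features; the three values sum to the difference between the prediction and the baseline prediction.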

Large language model-based semantic reasoning module. The third module of FR-BINN leverages a large language model (LLM) to analyze key genes. This module employs chain-of-thought (COT) reasoning to enhance the model’s explanations, providing context-aware answers while reducing hallucinations.

Recent advancements in LLMs have demonstrated their potential in achieving near-human-level intelligence by training on vast amounts of knowledge from the web and other domains [85,86,87]. Studies have shown that COT reasoning improves LLMs’ capabilities in logical reasoning and decision-making tasks, making them more effective for complex, multi-step problem-solving scenarios [88,89]. In our framework, we utilize prompt engineering and COT reasoning to guide the LLM in interpreting relationships between key genes and disease states. The standard prompting mechanism in LLMs can be represented as follows:

$$p(A \mid T, Q) = \prod_{i=1}^{|A|} p_{LLM}(a_{i} \mid T, Q, a_{<i}) \tag{9}$$

where $Q$ represents the reasoning question, $T$ denotes the prompt, $p_{LLM}$ is the parameterized probabilistic model, $A$ is the answer, $|A|$ represents the length of the final answer, and $a_{i}$ denotes the $i$-th token. The objective is to maximize the likelihood of the answer. To incorporate COT reasoning, the equation is reformulated by introducing intermediate reasoning steps:

$$p(A \mid T, Q) = p(A \mid T, Q, R) \, p(R \mid T, Q) \tag{10}$$

$$p(R \mid T, Q) = \prod_{i=1}^{|R|} p_{LLM}(r_{i} \mid T, Q, r_{<i}) \tag{11}$$

$$p(A \mid T, Q, R) = \prod_{j=1}^{|A|} p_{LLM}(a_{j} \mid T, Q, R, a_{<j}) \tag{12}$$

where the prompt $T = \{(Q_{i}, R_{i}, A_{i})\}_{i=1}^{K}$, and $r_{i}$ is one of the $|R|$ reasoning steps.
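The structure of a COT prompt, a set of (question, reasoning, answer) demonstrations followed by the new question, can be sketched as a small string builder. The prompts actually used in this work are not given in the text, so the template, the function name `build_cot_prompt`, and the placeholder demonstration below are hypothetical and carry no biological claim.

```python
def build_cot_prompt(demos, question):
    """Assemble T = {(Q_i, R_i, A_i)} followed by the new question Q.

    demos: list of (question, reasoning, answer) triples used as demonstrations.
    """
    parts = []
    for q, r, a in demos:
        parts.append(f"Question: {q}\nReasoning: {r}\nAnswer: {a}")
    # The model is expected to continue with R (reasoning) and then A (answer).
    parts.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(parts)


# Placeholder demonstration; angle-bracket fields stand in for real content.
demos = [
    (
        "Is gene <GENE_X> mechanistically relevant to <DISEASE_Y>?",
        "<step 1: known function> <step 2: link to pathway> <step 3: link to phenotype>",
        "<yes/no with one-sentence justification>",
    )
]
prompt = build_cot_prompt(
    demos, "Is gene <GENE_Z> mechanistically relevant to cancer development?"
)
print(prompt)
```

Ending the prompt at "Reasoning:" mirrors the factorization in Equations (10)–(12): the model first generates the reasoning steps and only then the answer conditioned on them.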

4.5. Evaluation Metrics

We used five-fold cross-validation to evaluate method performance. For the classification task, the folds were stratified, preserving the percentage of samples of each class. We report the weighted Precision, weighted Recall, and weighted F1-score.
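The weighted metrics can be written out explicitly: each class's precision, recall, and F1-score is weighted by the class's share of the true labels. The sketch below is a dependency-free illustration with hypothetical labels; the function name `weighted_scores` and the convention of scoring a class with no predictions as zero precision are assumptions.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1 over all classes in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Hypothetical labels for six samples.
y_true = ["cancer", "cancer", "CP", "CP", "NCP", "normal"]
y_pred = ["cancer", "CP", "CP", "CP", "NCP", "normal"]
precision, recall, f1 = weighted_scores(y_true, y_pred)
print(precision, recall, f1)
```

Note that support-weighted recall equals overall accuracy, so the three metrics differ mainly in how they treat false positives in the minority classes.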

4.6. Implementation

We implemented a layered architecture consisting of two pathway layers and one gene layer, though users can define custom hierarchies and relationships based on their gene sets or pathways of interest. Higher layers in the hierarchy represent higher-order biological processes or pathways. The model input layer receives the omics values, which in our task are the transcriptional expression levels of genes, and the output layer performs the classification. Our implementation of FR-BINN is based on the PyTorch machine learning framework, version 2.3. Using the F1-score as the evaluation criterion, we determined the optimal settings for key hyperparameters: 500 epochs, a batch size of 32, and a learning rate of 0.001.

To uncover latent biological insights, we trained the model on the complete dataset after determining the optimal hyperparameters for the biological prior module. To ensure robust and biologically meaningful interpretation, attribution analyses were computed exclusively on correctly predicted samples. For each sample, we focused solely on the attribution patterns corresponding to its true class label. The values of $m_{ig}$, $m_{con}$, and $m_{shapley}$ were all set to 800. The large language model used was GPT-4o. The filtering process relied on the LLM’s semantic explanations generated through COT prompting: genes were retained only if the LLM’s narrative established mechanistic relevance to cancer development. To validate the disease relevance of the identified genes, we interrogated the Mendelian Disease Database (DMRdb) for evidence of established causal associations [90]. Attribution ranking scores were computed as the mean absolute values of sample-level attributions. To obtain total attribution rankings, we derived a composite score by summing the attribution values from both feature attribution methods. Gene enrichment analysis was performed using the union of genes identified by both interpretability methods within each class. For each class, we selected the top 5% of pathway genes based on total attribution scores and ranking values. Significant enrichment results were defined as those with adjusted p-values < 0.05. GSVA scores were calculated according to [42], with group-level GSVA scores represented as the mean score across all samples in the group.
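The attribution ranking described in this section (mean absolute attribution per gene, summed across the two feature attribution methods, followed by top-fraction selection) can be sketched as follows. The sketch assumes that the two methods are IG and SV and that their scores are summed without rescaling, neither of which is stated explicitly; the gene names, attribution values, and the choice to round the cutoff up are hypothetical.

```python
import math


def mean_abs(attributions):
    """Mean absolute attribution per gene; rows are samples, columns are genes."""
    n = len(attributions)
    n_genes = len(attributions[0])
    return [sum(abs(row[g]) for row in attributions) / n for g in range(n_genes)]


def top_genes(genes, ig_attr, sv_attr, fraction=0.05):
    """Rank genes by the summed IG and SV scores and keep the top fraction."""
    composite = [a + b for a, b in zip(mean_abs(ig_attr), mean_abs(sv_attr))]
    ranked = sorted(zip(genes, composite), key=lambda pair: pair[1], reverse=True)
    keep = max(1, math.ceil(fraction * len(genes)))  # rounding up is an assumption
    return ranked[:keep]


# Hypothetical attributions for two samples and four placeholder genes.
genes = ["G1", "G2", "G3", "G4"]
ig_attr = [[0.1, -0.8, 0.05, 0.2], [0.3, -0.6, 0.05, 0.0]]
sv_attr = [[0.2, -0.5, 0.00, 0.1], [0.2, -0.7, 0.10, 0.1]]
selected = top_genes(genes, ig_attr, sv_attr, fraction=0.5)
print(selected)
```

Taking absolute values before averaging means a gene with consistently negative attributions (here the placeholder G2) still ranks as important, since the ranking measures the magnitude of influence rather than its direction.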

5. Conclusions

Our study introduced FR-BINN, a novel biologically informed neural network framework designed to classify chronic inflammatory diseases based on their propensity for carcinogenesis and to uncover the underlying molecular mechanisms. By integrating biological priors of the Fenton reaction with advanced interpretability methods, our work provides significant insights into the transition from inflammation to cancer. The core contributions, potential applications, and future directions of study are as follows:

FR-BINN effectively classifies samples into four categories by integrating a hierarchical structure based on FR-related pathways. This biologically informed design enhances model performance and ensures the interpretability of its predictions.
Through a multi-method interpretability analysis, we identified and validated key biomarkers that are critical in distinguishing CP-CIDs from NCP-CIDs. These identified key genes are promising candidate biomarkers for early diagnosis and therapeutic targets.
Our analysis revealed clear differences in energy metabolism, oxidative stress, and pH regulation between CP-CIDs and NCP-CIDs. This understanding suggests that therapies could be developed to modulate these metabolic pathways or mitigate oxidative stress specifically in CP-CIDs, potentially preventing cancer progression.
While FR-BINN effectively leverages the strengths of biologically informed neural networks for gene and complex pattern recognition, future research could benefit from incorporating more classical mathematical modeling approaches [91,92,93]. Although these methods typically require substantial mathematical expertise and a firm grasp of the underlying physical or biological mechanisms, they offer complementary perspectives and have the potential to enhance predictive power. This is particularly valuable for long-term predictions or when establishing precise causal links is paramount.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26146670/s1.

Author Contributions

Conceptualization, Y.C. and Y.Z.; methodology, Y.C. and C.Y.; software, Y.C. and C.Y.; investigation, Y.C., X.Z. and Y.Z.; data curation, Y.C. and C.Y.; visualization, Y.C., C.Y., X.Z. and Y.Z.; writing—original draft preparation, Y.C. and C.Y.; writing—review and editing, Y.C., X.Z. and Y.Z.; supervision, Y.C. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data utilized in this study are derived from the publicly available GEO database (www.ncbi.nlm.nih.gov/geo/, accessed on 3 June 2024). Detailed information is provided within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Coussens, L.M.; Werb, Z. Inflammation and cancer. Nature 2002, 420, 860–867.
Piotrowski, I.; Kulcenty, K.; Suchorska, W. Interplay between inflammation and cancer. Rep. Pract. Oncol. Radiother. 2020, 25, 422–427.
Zhao, H.; Wu, L.; Yan, G.; Chen, Y.; Zhou, M.; Wu, Y.; Li, Y. Inflammation and tumor progression: Signaling pathways and targeted intervention. Signal Transduct. Target. Ther. 2021, 6, 263.
Rocca, W.A.; Petersen, R.C.; Knopman, D.S.; Hebert, L.E.; Evans, D.A.; Hall, K.S.; Gao, S.; Unverzagt, F.W.; Langa, K.M.; Larson, E.B.; et al. Trends in the incidence and prevalence of Alzheimer’s disease, dementia, and cognitive impairment in the United States. Alzheimer’s Dement. 2011, 7, 80–93.
Singh, N.; Baby, D.; Rajguru, J.P.; Patil, P.B.; Thakkannavar, S.S.; Pujari, V.B. Inflammation and cancer. Ann. Afr. Med. 2019, 18, 121–126.
Dall’Agnese, A.; Zheng, M.M.; Moreno, S.; Platt, J.M.; Hoang, A.T.; Kannan, D.; Dall’Agnese, G.; Overholt, K.J.; Sagi, I.; Hannett, N.M.; et al. Proteolethargy is a pathogenic mechanism in chronic disease. Cell 2025, 188, 207–221.
Dall’Agnese, A.; Platt, J.M.; Zheng, M.M.; Friesen, M.; Dall’Agnese, G.; Blaise, A.M.; Spinelli, J.B.; Henninger, J.E.; Tevonian, E.N.; Hannett, N.M.; et al. The dynamic clustering of insulin receptor underlies its signaling and is disrupted in insulin resistance. Nat. Commun. 2022, 13, 7522.
Li, H.; Zhang, J.; Shi, Y.; Zhao, G.; Xu, H.; Cai, M.; Gao, J.; Wang, H. Mechanism of INSR clustering with insulin activation and resistance revealed by super-resolution imaging. Nanoscale 2022, 14, 7747–7755.
Nair, S.J.; Yang, L.; Meluzzi, D.; Oh, S.; Yang, F.; Friedman, M.J.; Wang, S.; Suter, T.; Alshareedah, I.; Gamliel, A.; et al. Phase separation of ligand-activated enhancers licenses cooperative chromosomal enhancer assembly. Nat. Struct. Mol. Biol. 2019, 26, 193–203.
Ru, Q.; Li, Y.; Chen, L.; Wu, Y.; Min, J.; Wang, F. Iron homeostasis and ferroptosis in human diseases: Mechanisms and therapeutic prospects. Signal Transduct. Target. Ther. 2024, 9, 271.
Zhuang, X.; Wang, Q.; Joost, S.; Ferrena, A.; Humphreys, D.T.; Li, Z.; Blum, M.; Krause, K.; Ding, S.; Landais, Y.; et al. Ageing limits stemness and tumorigenesis by reprogramming iron homeostasis. Nature 2024, 637, 184–194.
Yu, Y.; Yan, Y.; Niu, F.; Wang, Y.; Chen, X.; Su, G.; Liu, Y.; Zhao, X.; Qian, L.; Liu, P.; et al. Ferroptosis: A cell death connecting oxidative stress, inflammation and cardiovascular diseases. Cell Death Discov. 2021, 7, 193.
Tan, R.; Zhou, Y.; An, Z.; Xu, Y. Cancer is a survival process under persistent microenvironmental and cellular stresses. Genom. Proteom. Bioinform. 2023, 21, 1260–1265.
Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500.
Theodoris, C.V.; Xiao, L.; Chopra, A.; Chaffin, M.D.; Al Sayed, Z.R.; Hill, M.C.; Mantineo, H.; Brydon, E.M.; Zeng, Z.; Liu, X.S.; et al. Transfer learning enables predictions in network biology. Nature 2023, 618, 616–624.
Cui, H.; Wang, C.; Maan, H.; Pang, K.; Luo, F.; Duan, N.; Wang, B. scGPT: Toward building a foundation model for single-cell multi-omics using generative AI. Nat. Methods 2024, 21, 1470–1480.
Cao, Y.; Xiao, J.; Sheng, N.; Qu, Y.; Wang, Z.; Sun, C.; Mu, X.; Huang, Z.; Li, X. X-lda: An interpretable and knowledge-informed heterogeneous graph learning framework for lncrna-disease association prediction. Comput. Biol. Med. 2023, 167, 107634.
Shi, H.; Gu, Y.; Zhang, H.; Li, X.; Cao, Y. MORGAT: A Model Based Knowledge-Informed Multi-omics Integration and Robust Graph Attention Network for Molecular Subtyping of Cancer. In Proceedings of the International Conference on Intelligent Computing, Kaohsiung, Taiwan, 14–16 December 2023; Springer: Singapore, 2023; pp. 192–206.
Xuan, P.; Cao, Y.; Zhang, T.; Kong, R.; Zhang, Z. Dual convolutional neural networks with attention mechanisms based method for predicting disease-related lncRNA genes. Front. Genet. 2019, 10, 416.
Hao, M.; Gong, J.; Zeng, X.; Liu, C.; Guo, Y.; Cheng, X.; Wang, T.; Ma, J.; Zhang, X.; Song, L. Large-scale foundation model on single-cell transcriptomics. Nat. Methods 2024, 21, 1481–1491.
Gao, S.; Fang, A.; Huang, Y.; Giunchiglia, V.; Noori, A.; Schwarz, J.R.; Ektefaie, Y.; Kondic, J.; Zitnik, M. Empowering biomedical discovery with ai agents. Cell 2024, 187, 6125–6151.
Song, L.; Segal, E.; Xing, E. Toward AI-Driven Digital Organism: Multiscale Foundation Models for Predicting, Simulating and Programming Biology at All Levels. arXiv 2024, arXiv:2412.06993.
Elmarakeby, H.A.; Hwang, J.; Arafeh, R.; Crowdis, J.; Gang, S.; Liu, D.; AlDubayan, S.H.; Salari, K.; Kregel, S.; Richter, C.; et al. Biologically informed deep neural network for prostate cancer discovery. Nature 2021, 598, 348–352.
Lan, W.; Liao, H.; Chen, Q.; Zhu, L.; Pan, Y.; Chen, Y.P.P. DeepKEGG: A multi-omics data integration framework with biological insights for cancer recurrence prediction and biomarker discovery. Briefings Bioinform. 2024, 25, bbae185.
Jiang, Y.; Immadi, M.S.; Wang, D.; Zeng, S.; Chan, Y.O.; Zhou, J.; Xu, D.; Joshi, T. IRnet: Immunotherapy response prediction using pathway knowledge-informed graph neural network. J. Adv. Res. 2024, 72, 319–331.
Hartman, E.; Scott, A.M.; Karlsson, C.; Mohanty, T.; Vaara, S.T.; Linder, A.; Malmström, L.; Malmström, J. Interpreting biologically informed neural networks for enhanced proteomic biomarker discovery and pathway analysis. Nat. Commun. 2023, 14, 5359.
Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. Kan: Kolmogorov-arnold networks. arXiv 2024, arXiv:2404.19756.
Chen, V.; Yang, M.; Cui, W.; Kim, J.S.; Talwalkar, A.; Ma, J. Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments. Nat. Methods 2024, 21, 1454–1461.
Qin, L.; Wu, Y.L.; Toneff, M.J.; Li, D.; Liao, L.; Gao, X.; Bane, F.T.; Tien, J.C.Y.; Xu, Y.; Feng, Z.; et al. NCOA1 directly targets M-CSF1 expression to promote breast cancer metastasis. Cancer Res. 2014, 74, 3477–3488.
Qin, L.; Xu, Y.; Xu, Y.; Ma, G.; Liao, L.; Wu, Y.; Li, Y.; Wang, X.; Wang, X.; Jiang, J.; et al. NCOA1 promotes angiogenesis in breast tumors by simultaneously enhancing both HIF1α-and AP-1-mediated VEGFa transcription. Oncotarget 2015, 6, 23890.
Tong, Z.; Zhang, Y.; Guo, P.; Wang, W.; Chen, Q.; Jin, J.; Liu, S.; Yu, C.; Mo, P.; Zhang, L.; et al. Steroid receptor coactivator 1 promotes human hepatocellular carcinoma invasiveness through enhancing MMP-9. J. Cell. Mol. Med. 2024, 28, e18171.
Chen, Q.; Guo, P.; Hong, Y.; Mo, P.; Yu, C. The multifaceted therapeutic value of targeting steroid receptor coactivator-1 in tumorigenesis. Cell Biosci. 2024, 14, 41.
Pavón, M.A.; Parreño, M.; Téllez-Gabriel, M.; León, X.; Arroyo-Solera, I.; López, M.; Céspedes, M.V.; Casanova, I.; Gallardo, A.; López-Pousa, A.; et al. CKMT1 and NCOA1 expression as a predictor of clinical outcome in patients with advanced-stage head and neck squamous cell carcinoma. Head Neck 2016, 38, E1392–E1403.
Hong, Y.; Chen, Q.; Wang, Z.; Zhang, Y.; Li, B.; Guo, H.; Huang, C.; Kong, X.; Mo, P.; Xiao, N.; et al. Targeting Nuclear Receptor Coactivator SRC-1 Prevents Colorectal Cancer Immune Escape by Reducing Transcription and Protein Stability of PD-L1. Adv. Sci. 2024, 11, 2310037.
Liu, C.; Zhou, D.; Yang, K.; Xu, N.; Peng, J.; Zhu, Z. Research progress on the pathogenesis of the SDHB mutation and related diseases. Biomed. Pharmacother. 2023, 167, 115500.
Reynolds, M.B.; Hong, H.S.; Michmerhuizen, B.C.; Lawrence, A.L.E.; Zhang, L.; Knight, J.S.; Lyssiotis, C.A.; Abuaita, B.H.; O’Riordan, M.X. Cardiolipin coordinates inflammatory metabolic reprogramming through regulation of Complex II disassembly and degradation. Sci. Adv. 2023, 9, eade8701.
Golozar, M.; Motlagh, A.V.; Mahdevar, M.; Peymani, M.; InanlooRahatloo, K.; Ghaedi, K. TBX15 and SDHB expression changes in colorectal cancer serve as potential prognostic biomarkers. Exp. Mol. Pathol. 2024, 136, 104890.
Cornejo, K.M.; Lu, M.; Yang, P.; Wu, S.; Cai, C.; Zhong, W.d.; Olumi, A.; Young, R.H.; Wu, C.L. Succinate dehydrogenase B: A new prognostic biomarker in clear cell renal cell carcinoma. Hum. Pathol. 2015, 46, 820–826.
Flynn, A.; Pattison, A.D.; Balachander, S.; Boehm, E.; Bowen, B.; Dwight, T.; Rossello, F.J.; Hofmann, O.; Martelotto, L.; Zethoven, M.; et al. Multi-omic analysis of SDHB-deficient pheochromocytomas and paragangliomas identifies metastasis and treatment-related molecular profiles. Nat. Commun. 2025, 16, 2632.
Zhu, J.; Mao, S.; Zhen, N.; Zhu, G.; Bian, Z.; Xie, Y.; Tang, X.; Ding, M.; Wu, H.; Ma, J.; et al. SNORA14A inhibits hepatoblastoma cell proliferation by regulating SDHB-mediated succinate metabolism. Cell Death Discov. 2023, 9, 36.
Lamy, C.; Tissot, H.; Faron, M.; Baudin, E.; Lamartina, L.; Pradon, C.; Al Ghuzlan, A.; Leboulleux, S.; Perfettini, J.L.; Paci, A.; et al. Succinate: A serum biomarker of SDHB-mutated paragangliomas and pheochromocytomas. J. Clin. Endocrinol. Metab. 2022, 107, 2801–2810.
Hänzelmann, S.; Castelo, R.; Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-seq data. BMC Bioinform. 2013, 14, 7.
Quijano, C.; Trujillo, M.; Castro, L.; Trostchansky, A. Interplay between oxidant species and energy metabolism. Redox Biol. 2016, 8, 28–42.
Yang, S.; Lian, G. ROS and diseases: Role in metabolism and energy supply. Mol. Cell. Biochem. 2020, 467, 1–12.
Kew, M.C. Hepatic iron overload and hepatocellular carcinoma. Cancer Lett. 2009, 286, 38–43.
Das, T.K.; Wati, M.R.; Fatima-Shad, K. Oxidative stress gated by Fenton and Haber Weiss reactions and its association with Alzheimer’s disease. Arch. Neurosci. 2015, 2, e60038.
Lushchak, V.I. Free radicals, reactive oxygen species, oxidative stress and its classification. Chem.-Biol. Interact. 2014, 224, 164–175.
Webb, B.A.; Chimenti, M.; Jacobson, M.P.; Barber, D.L. Dysregulated pH: A perfect storm for cancer progression. Nat. Rev. Cancer 2011, 11, 671–677.
Koltai, T. The Ph paradigm in cancer. Eur. J. Clin. Nutr. 2020, 74, 14–19.
Sun, H.; Zhou, Y.; Jiang, H.; Xu, Y. Elucidation of functional roles of sialic acids in cancer migration. Front. Oncol. 2020, 10, 401.
Edgar, R.; Domrachev, M.; Lash, A.E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002, 30, 207–210.
Lavigne, S.E. Evolving evidence for relationships between periodontitis and systemic diseases: Position paper from the Canadian Dental Hygienists Association. Can. J. Dent. Hyg. 2022, 56, 155.
Nurminen, M.; Karjalainen, A. Epidemiologic estimate of the proportion of fatalities related to occupational factors in Finland. Scand. J. Work. Environ. Health 2001, 27, 161–213.
Ferguson, C.J. An effect size primer: A guide for clinicians and researchers. In Methodological Issues and Strategies in Clinical Research, 4th ed.; American Psychological Association: Worcester, MA, USA, 2016; pp. 301–310.
Bateman, E.D.; Hurd, S.S.; Barnes, P.J.; Bousquet, J.; Drazen, J.M.; FitzGerald, M.; Gibson, P.; Ohta, K.; O’Byrne, P.; Pedersen, S.E.; et al. Global strategy for asthma management and prevention: GINA executive summary. Eur. Respir. J. 2007, 31, 143–178.
Salameh, L.; Mahboub, B.; Khamis, A.; Alsharhan, M.; Tirmazy, S.H.; Dairi, Y.; Hamid, Q.; Hamoudi, R.; Al Heialy, S. Asthma severity as a contributing factor to cancer incidence: A cohort study. PLoS ONE 2021, 16, e0250430.
Guo, Y.; Bian, J.; Chen, Z.; Fishe, J.N.; Zhang, D.; Braithwaite, D.; George, T.J.; Shenkman, E.A.; Licht, J.D. Cancer incidence after asthma diagnosis: Evidence from a large clinical research network in the United States. Cancer Med. 2023, 12, 11871–11877.
McDade, E.M. Alzheimer disease. Contin. Lifelong Learn. Neurol. 2022, 28, 648–675.
Ospina-Romero, M.; Glymour, M.M.; Hayes-Larson, E.; Mayeda, E.R.; Graff, R.E.; Brenowitz, W.D.; Ackley, S.F.; Witte, J.S.; Kobayashi, L.C. Association between Alzheimer disease and cancer with evaluation of study biases: A systematic review and meta-analysis. JAMA Netw. Open 2020, 3, e2025515.
Kang, H.S.; Kim, J.H.; Lim, H.; Kim, J.H.; Noh, H.M.; Choi, H.G.; Min, K.W.; Kim, N.Y.; Kwon, M.J. Alzheimer’s Disease and Different Types of Cancer Likelihood: Unveiling Disparities and Potential Protective Effects in a Korean Cohort Study. Cancers 2023, 15, 4615.
Nolen, S.C.; Evans, M.A.; Fischer, A.; Corrada, M.M.; Kawas, C.H.; Bota, D.A. Cancer—incidence, prevalence and mortality in the oldest-old. A comprehensive review. Mech. Ageing Dev. 2017, 164, 113–126.
Vaengebjerg, S.; Skov, L.; Egeberg, A.; Loft, N.D. Prevalence, incidence, and risk of cancer in patients with psoriasis and psoriatic arthritis: A systematic review and meta-analysis. JAMA Dermatol. 2020, 156, 421–429. [Google Scholar] [CrossRef]
Chey, W.D.; Kurlander, J.; Eswaran, S. Irritable bowel syndrome: A clinical review. Jama 2015, 313, 949–958. [Google Scholar] [CrossRef] [PubMed]
Wu, S.; Yuan, C.; Liu, S.; Zhang, Q.; Yang, Z.; Sun, F.; Zhan, S.; Zhu, S.; Zhang, S. Irritable bowel syndrome and long-term risk of cancer: A prospective cohort study among 0.5 million adults in UK biobank. Off. J. Am. Coll. Gastroenterol. ACG 2022, 117, 785–793. [Google Scholar] [CrossRef] [PubMed]
Bhandari, B.; Basyal, B.; Sarao, M.S.; Nookala, V.; Thein, Y. Prevalence of cancer in rheumatoid arthritis: Epidemiological study based on the National Health and Nutrition Examination Survey (NHANES). Cureus 2020, 12, e7870. [Google Scholar] [CrossRef] [PubMed]
Beydon, M.; Pinto, S.; De Rycke, Y.; Fautrel, B.; Mariette, X.; Seror, R.; Tubach, F. Risk of cancer for patients with rheumatoid arthritis versus general population: A national claims database cohort study. Lancet Reg.-Health -Eur. 2023, 35, 100768. [Google Scholar] [CrossRef]
Gajendran, M.; Loganathan, P.; Jimenez, G.; Catinella, A.P.; Ng, N.; Umapathy, C.; Ziade, N.; Hashash, J.G. A comprehensive review and update on ulcerative colitis. Disease-A-Month 2019, 65, 100851. [Google Scholar] [CrossRef]
Jess, T.; Rungoe, C.; Peyrin-Biroulet, L. Risk of colorectal cancer in patients with ulcerative colitis: A meta-analysis of population-based cohort studies. Clin. Gastroenterol. Hepatol. 2012, 10, 639–645. [Google Scholar] [CrossRef]
Canavan, C.; Abrams, K.; Mayberry, J. Meta-analysis: Colorectal and small bowel cancer risk in patients with Crohn’s disease. Aliment. Pharmacol. Ther. 2006, 23, 1097–1104. [Google Scholar] [CrossRef]
Powell, E.E.; Wong, V.W.S.; Rinella, M. Non-alcoholic fatty liver disease. Lancet 2021, 397, 2212–2224. [Google Scholar] [CrossRef]
Kanwal, F.; Kramer, J.R.; Mapakshi, S.; Natarajan, Y.; Chayanupatkul, M.; Richardson, P.A.; Li, L.; Desiderio, R.; Thrift, A.P.; Asch, S.M.; et al. Risk of hepatocellular cancer in patients with non-alcoholic fatty liver disease. Gastroenterology 2018, 155, 1828–1837. [Google Scholar] [CrossRef]
Song, C.; Lv, J.; Liu, Y.; Chen, J.G.; Ge, Z.; Zhu, J.; Dai, J.; Du, L.B.; Yu, C.; Guo, Y.; et al. Associations between hepatitis B virus infection and risk of all cancer types. JAMA Netw. Open 2019, 2, e195718. [Google Scholar] [CrossRef]
Olazagasti, J.M.; Baez, P.J.; Wetter, D.A.; Ernste, F.C. Cancer risk in dermatomyositis: A meta-analysis of cohort studies. Am. J. Clin. Dermatol. 2015, 16, 89–98. [Google Scholar] [CrossRef] [PubMed]
Qiang, J.K.; Kim, W.B.; Baibergenova, A.; Alhusayen, R. Risk of malignancy in dermatomyositis and polymyositis: A systematic review and meta-analysis. J. Cutan. Med. Surg. 2017, 21, 131–136. [Google Scholar] [CrossRef] [PubMed]
Pierret, C.; Mulliez, A.; Le Bihan-Benjamin, C.; Moisset, X.; Bousquet, P.J.; Leray, E. Cancer Risk Among Patients With Multiple Sclerosis: A 10-Year Nationwide Retrospective Cohort Study. Neurology 2024, 103, e209885. [Google Scholar] [CrossRef] [PubMed]
Bosco-Lévy, P.; Foch, C.; Grelaud, A.; Sabidó, M.; Lacueille, C.; Jové, J.; Boutmy, E.; Blin, P. Incidence and risk of cancer among multiple sclerosis patients: A matched population-based cohort study. Eur. J. Neurol. 2022, 29, 1091–1099. [Google Scholar] [CrossRef] [PubMed]
Sun, H.; Zhang, C.; Cao, S.; Sheng, T.; Dong, N.; Xu, Y. Fenton reactions drive nucleotide and ATP syntheses in cancer. J. Mol. Cell Biol. 2018, 10, 448–459. [Google Scholar] [CrossRef]
Sun, H.; Zhou, Y.; Skaro, M.F.; Wu, Y.; Qu, Z.; Mao, F.; Zhao, S.; Xu, Y. Metabolic reprogramming in cancer is induced to increase proton production. Cancer Res. 2020, 80, 1143–1155. [Google Scholar] [CrossRef]
Huang, Z.; Chen, Q.; Mu, X.; An, Z.; Xu, Y. Elucidating the Functional Roles of Long Non-Coding RNAs in Alzheimer’s Disease. Int. J. Mol. Sci. 2024, 25, 9211. [Google Scholar] [CrossRef]
Chen, H.; Lu, Y.; Dai, Z.; Yang, Y.; Li, Q.; Rao, Y. Comprehensive single-cell RNA-seq analysis using deep interpretable generative modeling guided by biological hierarchy knowledge. Briefings Bioinform. 2024, 25, bbae314. [Google Scholar] [CrossRef]
Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 3319–3328. [Google Scholar]
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
Dhamdhere, K.; Sundararajan, M.; Yan, Q. How Important is a Neuron. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
Shrikumar, A.; Su, J.; Kundaje, A. Computationally efficient measures of internal neuron importance. arXiv 2018, arXiv:1807.09946. [Google Scholar]
Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
Wu, S.; Peng, Z.; Du, X.; Zheng, T.; Liu, M.; Wu, J.; Ma, J.; Li, Y.; Yang, J.; Zhou, W.; et al. A Comparative Study on Reasoning Patterns of OpenAI’s o1 Model. arXiv 2024, arXiv:2410.13639. [Google Scholar]
Chu, Z.; Chen, J.; Chen, Q.; Yu, W.; He, T.; Wang, H.; Peng, W.; Liu, M.; Qin, B.; Liu, T. Navigate through enigmatic labyrinth a survey of chain of thought reasoning: Advances, frontiers and future. arXiv 2023, arXiv:2309.15402. [Google Scholar]
Qiao, S.; Ou, Y.; Zhang, N.; Chen, X.; Yao, Y.; Deng, S.; Tan, C.; Huang, F.; Chen, H. Reasoning with language model prompting: A survey. arXiv 2023, arXiv:2212.09597. [Google Scholar]
Zheng, X.; Tian, Z.; Che, X.; Zhang, X.; Xiang, Y.; Ge, Z.; Zhai, Z.; Ma, Q.; Pan, J. DMRdb: A disease-centric Mendelian randomization database for systematically assessing causal relationships of diseases with genes, proteins, CpG sites, metabolites and other diseases. Nucleic Acids Res. 2025, 53, D1363–D1371. [Google Scholar] [CrossRef]
Mitkowski, P.J. Mathematical Structures of Ergodicity and Chaos in Population Dynamics; Springer: Berlin/Heidelberg, Germany, 2021; Volume 312. [Google Scholar]
Marciniak-Czochra, A.; Karch, G.; Suzuki, K. Instability of Turing patterns in reaction-diffusion-ODE systems. J. Math. Biol. 2017, 74, 583–618. [Google Scholar] [CrossRef]
Lasota, A.; Mackey, M.C.; Ważewska-Czyżewska, M. Minimizing therapeutically induced anemia. J. Math. Biol. 1981, 13, 149–158. [Google Scholar] [CrossRef]

Figure 1. Overview of FR-BINN framework. (A) Integration of transcriptomic datasets, disease category definitions (derived from statistical indicators), and FR-associated biological prior knowledge. (B) Biologically informed neural network encoding hierarchical FR-related knowledge for classification. The framework further provides multi-level explanations (e.g., key genes, pathways) and utilizes an LLM for semantic reasoning and interpretation.

Figure 2. Performance evaluation. (A) Disease categorization into CP-CIDs and NCP-CIDs based on epidemiological statistical thresholds. (B) Comparative classification performance of FR-BINN versus five baseline models across categories. (C) Density distributions of model-predicted probabilities aligned with clinical disease stages.

Figure 3. Validation of attribution methods. (A) Top attributed gene identification: Top 10 genes identified by the IG method that distinguish CP-CIDs from NCP-CIDs. (B) Expression validation: Gene expression heatmap illustrating the discriminatory power of the union of the top 10 genes identified by both IG and SV methods for cancer-prone versus non-cancer-prone inflammatory diseases. (C) Attribution method robustness: High concordance between IG and SV attribution results for top-ranked genes across both inflammatory disease categories, indicating the robustness of the identified features. (D) Performance of the logistic regression model with refined genes. (E) Predictive accuracy on independent datasets.

Figure 4. Analysis of key attributed genes. (A) Top 10 candidate genes ranked by combined IG and SV attribution scores. (B) Venn diagram illustrating, in the top 10 candidate genes, the overlap between 7 causal genes and 4 genes encoding proteins related to hydrogen ion production. (C) Gene expression level of NCOA1 in CP-CIDs, NCP-CIDs, diseases, and control (********** denotes $p = 1.47 \times 10^{-32}$). (D) NCOA1 expression profiles across various cancer types of TCGA. (E) Kaplan–Meier survival analysis for NCOA1 in LIHC.

Figure 5. Pathway-level analysis. (A) Patterns in CP-CIDs and NCP-CIDs. (B) GO enrichment analysis. (C) Energy metabolism pathway enrichment results for the two categories. (D) GSVA scores of the energy metabolism pathways. (E) Mitochondrial and cytosolic protein damage.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
