Article

Integrating Transcriptomics and Machine Learning to Uncover the FLI1-PARP14-Immune Axis in Ulcerative Colitis Activity and Pathogenesis

Cancer Research Center, School of Medicine, Xiamen University, Xiamen 361102, China
*
Author to whom correspondence should be addressed.
Genes 2025, 16(11), 1342; https://doi.org/10.3390/genes16111342
Submission received: 20 October 2025 / Revised: 30 October 2025 / Accepted: 4 November 2025 / Published: 7 November 2025
(This article belongs to the Section Human Genomics and Genetic Diseases)

Abstract

Background: Ulcerative colitis (UC) is a chronic inflammatory bowel disease whose molecular mechanisms remain incompletely characterized. This study aimed to identify potential diagnostic biomarkers and clarify the molecular drivers of UC activity by integrating transcriptome analysis with machine learning and genetic causal inference. Methods: Gene expression datasets (GSE75214, GSE53306, GSE179285) from the GEO database were analyzed. Weighted gene co-expression network analysis (WGCNA) and differentially expressed gene (DEG) analysis were applied to identify activity-associated genes. Protein–protein interaction networks and ensemble machine learning methods were used to refine the candidate list. Furthermore, summary-data-based Mendelian randomization (SMR) analysis and immune infiltration analysis were conducted. Results: Eight characteristic genes were identified, with CXCL11, PARP14, and IFITM1 emerging as hub genes. These hub genes exhibited strong diagnostic accuracy, with area under the curve (AUC) values consistently exceeding 0.83 across 3 independent cohorts. SMR analysis suggested a probable causal connection between higher PARP14 expression and UC susceptibility. The hub genes were strongly correlated with immune cell infiltration, including M1 macrophages and NK cells. FLI1 was identified as a critical upstream transcription factor regulating this network. Conclusions: The findings outline a FLI1-PARP14-immune axis central to UC activity, providing new insights into its pathophysiology and highlighting PARP14 as a promising diagnostic biomarker and potential therapeutic target.

1. Introduction

Inflammatory bowel disease (IBD) is a group of chronic, relapsing inflammatory disorders of the gastrointestinal tract. It is characterized by recurrent mucosal inflammation, which can result in irreversible tissue damage, impaired intestinal function, and systemic complications such as malnutrition and extraintestinal manifestations [1]. Of its 2 major subtypes, ulcerative colitis (UC) is characterized by diffuse, continuous mucosal inflammation that originates in the rectum and progresses proximally across the colon. In contrast, Crohn’s disease (CD) can affect any segment of the gastrointestinal tract and often manifests with discontinuous lesions [2,3]. The global burden of UC has risen considerably over recent decades. By 2023, its prevalence was estimated at roughly 5 million cases worldwide, with incidence growing in both developed and developing regions. This trend, driven by the combination of genetic susceptibility, Westernized diets, microbial dysbiosis, and environmental exposures, imposes major physical, psychological, and economic costs on patients and healthcare systems [3,4].
The pathophysiology of UC is multifaceted and not yet fully elucidated. It involves complex interactions among genetic predisposition, intestinal barrier dysfunction, immunological dysregulation, and alterations in the gut microbiota [5,6]. Genome-wide association studies (GWAS) have identified more than 200 susceptibility loci, many of which are associated with pathways critical to epithelial integrity (e.g., MUC2-mediated mucus barrier), immune regulation (e.g., TNF-α and IL-23 signaling), and microbial recognition (e.g., NOD2) [7,8]. Environmental risk factors, such as early antibiotic exposure, consumption of processed foods, and persistent psychological stress, further alter intestinal homeostasis by reducing microbial diversity (e.g., decreased Bacteroidetes and increased Proteobacteria) and increasing epithelial permeability. This disturbance contributes to the development of a “leaky gut”, which, in turn, can worsen mucosal inflammation [6,9]. Despite the considerable progress in the field of UC research, the molecular pathways underlying its development, relapse, and progression remain poorly characterized. This limitation, in turn, constrains the efficacy of current therapeutic options.
Current management strategies for UC reflect its incurable nature and focus on 4 key goals: inducing a rapid clinical response, maintaining long-term remission, promoting mucosal healing, and preventing complications such as colorectal cancer and toxic megacolon [10,11]. Therapeutic regimens are stratified by disease severity and extent. Meta-analyses demonstrate the efficacy of 5-aminosalicylic acid (5-ASA) over placebo in reducing mucosal inflammation and relapse rates [12]. For mild-to-moderate UC, 5-ASA formulations, taken orally or rectally, remain the primary therapy for inducing and maintaining remission. For patients with moderate-to-severe disease who do not respond adequately to conventional treatments, the therapeutic landscape has expanded significantly to include biologic agents, particularly inhibitors targeting specific biomarkers [13]. While systemic corticosteroids remain highly effective for short-term symptom relief in moderate-to-severe UC, their use is restricted to acute settings due to significant long-term adverse effects, including osteoporosis and increased infection risk [14].
A critical challenge in UC management is the absence of individualized strategies to anticipate relapse, maximize therapeutic response, and achieve lasting mucosal healing—essential criteria for improving long-term outcomes and quality of life [10,15]. Mucosal healing, defined as the endoscopic normalization of the colonic mucosa, is a critical therapeutic target that is correlated with lower hospitalization and colectomy rates, as well as a reduced risk of colorectal cancer [10,11]. This is particularly relevant given that the risk of colorectal cancer for any patient with ulcerative colitis is known to be elevated, and is estimated to be 2% after 10 years, 8% after 20 years and 18% after 30 years of disease [16].
However, traditional biomarkers (e.g., fecal calprotectin and C-reactive protein) and clinical symptom scores often fail to detect the subtle molecular changes that drive disease activity or remission [17,18]. Histology, while a more definitive tool for diagnosing disease activity and severity and featuring higher accessibility in clinical practice, also struggles to rapidly capture these early subtle molecular alterations. In contrast, transcriptomics, though less accessible compared to histology, demonstrates advantages in quickly reflecting such fine-grained molecular changes in the early stages of disease.
Because the two methods are complementary, integrating them could better address existing diagnostic gaps and meet the urgent need for diagnostic strategies that rapidly reflect treatment efficacy. Such combined strategies, leveraging both the diagnostic definitiveness of histology and the molecular sensitivity of transcriptomics, would enable timely intervention and improved personalized management. They would also help identify molecular fingerprints that distinguish active from inactive disease phases and uncover key regulatory genes involved in inflammation resolution, epithelial repair, and immune homeostasis.
Recent advancements in high-throughput sequencing and analytics have enabled systematic analysis of transcriptome profiles between inactive and active disease phases. Differential gene expression studies of patient samples have identified potential diagnostic biomarkers such as CCL11 and MMP1, which contribute to immune cell recruitment and extracellular matrix remodeling—processes implicated in mucosal inflammation and healing [18]. Weighted gene co-expression network analysis (WGCNA) and machine learning algorithms have further defined important hub genes, including CTSS, S100A11, and TUBB, which are dysregulated in UC and linked to macrophage-mediated inflammation and epithelial barrier dysfunction [19]. Integrating these results with expression quantitative trait loci (eQTL) analysis [20] and Mendelian randomization [21] offers a means to establish causal relationships between gene expression and UC susceptibility, while protein–protein interaction (PPI) networks and immune cell infiltration analysis (e.g., CIBERSORT) provide mechanistic insights into how hub genes regulate inflammatory pathways and immune cell dynamics [7,18,22].
By comprehensively comparing gene expression profiles between inactive and active disease phases, this study aims to discover hub genes driving UC activity. The overarching goal is to facilitate the development of targeted therapeutics capable of inducing durable remission, promoting mucosal healing, and improving patient quality of life. These discoveries are expected to address important gaps in understanding the molecular mechanisms of UC and advance the development of precision medicine strategies for this complex disease.

2. Materials and Methods

2.1. Study Design

Using the GEO database [23], overlapping genes associated with UC activity were identified through weighted gene co-expression network analysis (WGCNA) and differentially expressed gene (DEG) analysis. Characteristic and hub genes were further evaluated using protein–protein interaction (PPI) analysis and machine learning algorithms. Immune infiltration patterns associated with UC activity, and their relationships with the hub genes, were also investigated. In addition, expression quantitative trait locus (eQTL) data and genome-wide association study (GWAS) data related to IBD were integrated to examine causal links between hub genes and IBD. The overall analytical workflow is depicted in Figure 1.

2.2. Data Collection and Analysis

Three gene expression profiles (GSE75214 [24], GSE53306 [25], and GSE179285 [26]) of UC patients were retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/ (accessed on 5 August 2025)), maintained by the National Center for Biotechnology Information (NCBI), Bethesda, MD, USA. The GSE75214 dataset was used for DEG analysis and selected as the training cohort. The GSE53306 and GSE179285 datasets were deployed as independent testing cohorts (Table 1).

2.3. WGCNA

To construct a weighted gene co-expression network from the top 5000 most variably expressed genes, the R package “WGCNA” (version 1.73; https://cran.r-project.org/web/packages/WGCNA/index.html (accessed on 5 August 2025)) [27] was utilized to identify gene modules strongly correlated with UC activity. A scale-free network was built by determining the appropriate soft-thresholding power (β). A weighted adjacency matrix was constructed and subsequently transformed into a topological overlap matrix (TOM). Hierarchical clustering was then performed using a TOM-based dissimilarity metric. Gene modules were identified by dynamic tree cutting, and genes significantly correlated with UC activity were selected for further analysis.
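The adjacency and TOM steps described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the WGCNA package itself: it assumes an unsigned network (adjacency = |Pearson correlation| raised to β) and the standard unsigned TOM formula. The toy expression values and β = 6 are made up for the example; the study's own network type and data are not specified here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(g_i, g_j)| ** beta."""
    n = len(expr)
    adj = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj

def tom(adj):
    """Topological overlap: shared-neighbour strength normalised by connectivity."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    out = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            out[i][j] = out[j][i] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return out

# Toy data (illustrative only): rows are genes, columns are samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],   # gene A
    [2.0, 4.1, 5.9, 8.2, 9.9],   # gene B, closely tracks gene A
    [5.0, 1.0, 4.0, 2.0, 3.0],   # gene C, weakly related
]
adj = adjacency(expr, beta=6)
tom_matrix = tom(adj)
# TOM-based dissimilarity, the input to hierarchical clustering.
dissimilarity = [[1.0 - v for v in row] for row in tom_matrix]
```

Raising the correlation to a power suppresses weak correlations much more than strong ones, which is why genes A and B stay tightly connected while gene C is effectively disconnected.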

2.4. Functional Enrichment Analysis

Functional enrichment analyses were performed on the overlapping genes of the DEGs and the UC activity-associated module to uncover their biological roles. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis was performed using the R packages “clusterProfiler” and “DOSE” (version 4.2.0), supplemented by the org.Hs.eg.db database.

2.5. Protein–Protein Interaction (PPI) Analysis

A PPI network was constructed using the STRING database (https://string-db.org/ (accessed on 6 August 2025)) [20], focusing on genes overlapping between the DEGs and the UC activity-associated module. The network was visualized using Cytoscape software (version 3.10.2). To identify characteristic genes, the cytoHubba plugin was used to extract the top 15 genes with the highest maximal clique centrality (MCC) scores, emphasizing functionally significant sub-networks.
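For readers unfamiliar with the MCC score, the sketch below shows one way it can be computed, assuming the commonly cited definition: for each node, sum (|C| − 1)! over all maximal cliques C that contain it. This is an illustration in plain Python, not the cytoHubba implementation, and the node labels and edges are hypothetical.

```python
from math import factorial

def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

def mcc_scores(graph):
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        if len(clique) < 2:      # isolated nodes keep a score of 0
            continue
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores

# Hypothetical undirected network: node -> set of neighbours.
graph = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
scores = mcc_scores(graph)
# Rank nodes by MCC; the study kept the top 15, here we keep the top 2.
top = sorted(scores, key=lambda v: (-scores[v], v))[:2]
```

Because the factorial grows quickly with clique size, nodes that sit in large, densely connected sub-networks dominate the ranking.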

2.6. Machine Learning Algorithm Analysis

The top 15 genes identified from the PPI analysis were assessed using a diverse set of machine learning algorithms to identify the most predictive features. Twelve algorithms were implemented, including regularization methods such as Least Absolute Shrinkage and Selection Operator (Lasso), Ridge Regression (Ridge), and Elastic Net (ElasticNet); generalized linear models such as Stepwise Generalized Linear Model (Stepglm), Boosted Generalized Linear Model (glmBoost), and Partial Least Squares Regression with Generalized Linear Model (plsRglm); ensemble learners such as Random Forest (RF), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost); and pattern recognition approaches such as Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and Naive Bayes Classifier (Naive Bayes) [28]. This integrative strategy ensured coverage of both linear and nonlinear modeling paradigms.
The computational approach involved 2 stages: (1) feature ranking by recursive feature elimination for preliminary screening and (2) predictive modeling through stacked generalization. All models were trained with stratified 10-fold cross-validation and underwent systematic hyperparameter optimization, resulting in 113 unique configurations. Four algorithms with intrinsic feature selection ability (Lasso, Random Forest, Stepglm, and glmBoost) were initially utilized to refine the gene set. The refined subsets were subsequently used to train 8 additional algorithms for predictive modeling. Each of the 113 models was evaluated based on the area under the receiver operating characteristic curve (AUC) in both training and testing cohorts. AUC values close to 1 were considered to indicate strong discriminative performance, whereas values near 0.5 suggested no diagnostic utility. The optimum model was selected based on the highest mean AUC across validation folds.
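The cross-validation and model-selection logic can be sketched as follows. This is a simplified illustration, not the authors' pipeline: `stratified_folds` and `best_by_mean_auc` are hypothetical helper names, and the model names and per-fold AUC values are placeholders chosen only to show the selection rule.

```python
import random
from statistics import mean

def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps the class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)   # deal samples round-robin
    return folds

def best_by_mean_auc(fold_aucs):
    """Pick the configuration with the highest mean AUC across folds."""
    return max(fold_aucs, key=lambda name: mean(fold_aucs[name]))

# 20 active (1) and 10 inactive (0) samples, split into 10 folds.
labels = [1] * 20 + [0] * 10
folds = stratified_folds(labels, k=10)

# Placeholder per-fold AUCs for two hypothetical configurations.
fold_aucs = {
    "model_a": [0.90, 0.86, 0.88],
    "model_b": [0.84, 0.91, 0.80],
}
best = best_by_mean_auc(fold_aucs)
```

Stratification matters when the classes are unbalanced: without it, a fold could contain too few samples of one class for the AUC to be meaningful.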

2.7. Diagnostic Value and Expression Patterns of Hub Genes

To investigate the diagnostic potential of hub genes in UC activity, receiver operating characteristic (ROC) curve analysis was performed using the R package “pROC” (version 1.19.0.1). The AUC was used to assess each gene’s ability to differentiate between the inactive and active phases. Genes with AUC values greater than 0.7 in both the training and testing cohorts were identified as hub genes with biomarker potential. Furthermore, a logistic regression model was developed using the R package “glmnet” (version 4.1-10) to analyze the combined diagnostic performance of these hub genes in both the training (GSE75214) and testing (GSE179285) cohorts.
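As a rough illustration of the per-gene screen, the AUC can be computed directly as the probability that a randomly chosen active sample has higher expression than a randomly chosen inactive one. The sketch below is plain Python rather than pROC, assumes higher expression indicates the active phase, and uses made-up expression values; `passes_both` is a hypothetical helper encoding the 0.7 cutoff.

```python
def auc(scores, labels):
    """AUC as P(active score > inactive score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def passes_both(train_auc, test_auc, cutoff=0.7):
    """Keep a gene only if its AUC exceeds the cutoff in both cohorts."""
    return train_auc > cutoff and test_auc > cutoff

# Made-up expression values; 1 = active phase, 0 = inactive phase.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
phase = [0, 0, 1, 0, 1, 1]
gene_auc = auc(expression, phase)   # 8 of 9 active/inactive pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering of active and inactive samples, and 1.0 to perfect separation.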

2.8. Prediction of Transcription Factors

To investigate upstream regulatory mechanisms of hub genes, the TF Target Finder (TFTF) database (https://jingle.shinyapps.io/TF_Target_Finder/ (accessed on 6 August 2025)) [29] was used to predict transcription factors (TFs) potentially targeting these genes. It was hypothesized that TFs, upregulated in UC, may actively influence hub gene expression. Common TFs among the 8 characteristic genes were identified by petal diagram analysis. A TF–gene regulation network was subsequently built, and functionally relevant sub-networks were extracted.

2.9. Summary-Data–Based Mendelian Randomization (SMR) Analysis

To evaluate potential causal relationships between gene expression and UC susceptibility, summary-data–based Mendelian randomization (SMR) analysis [30] was conducted. The analysis focused on cis-regions and incorporated heterogeneity in dependent instruments (HEIDI) tests to assess instrumental variable heterogeneity and minimize horizontal pleiotropy. Genetic instruments were derived from single nucleotide polymorphisms (SNPs), integrating eQTLs as exposures and GWAS summary statistics for IBD from the FinnGen biobank (version R12; https://r12.finngen.fi/ (accessed on 7 August 2025)).
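The core SMR estimate for a single top cis-eQTL SNP can be sketched as below, assuming the standard formulation: the effect of expression on the trait is the ratio of the GWAS and eQTL effects, and the test statistic is an approximate chi-square with 1 degree of freedom. The input effect sizes are invented for illustration, the function name is hypothetical, and the HEIDI test is not implemented here.

```python
import math

def smr(b_eqtl, se_eqtl, b_gwas, se_gwas):
    """SMR estimate from one SNP's eQTL and GWAS summary statistics."""
    z_eqtl = b_eqtl / se_eqtl
    z_gwas = b_gwas / se_gwas
    b_smr = b_gwas / b_eqtl                      # effect of expression on trait
    t_smr = (z_eqtl ** 2 * z_gwas ** 2) / (z_eqtl ** 2 + z_gwas ** 2)
    se_smr = abs(b_smr) / math.sqrt(t_smr)       # delta-method standard error
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))    # chi-square (1 df) upper tail
    return {"b": b_smr, "se": se_smr, "t": t_smr, "p": p_smr}

# Invented summary statistics for one SNP (not values from the study).
result = smr(b_eqtl=0.5, se_eqtl=0.05, b_gwas=0.1, se_gwas=0.02)
```

In this example the eQTL z-score is 10 and the GWAS z-score is 5, giving a test statistic of 20. The statistic is always smaller than either squared z-score, so SMR is only as strong as the weaker of the two associations.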

2.10. Immune Cell Infiltration Analysis

To assess immune cell dynamics in UC, CIBERSORT analysis was performed, a deconvolution method based on linear support vector regression that infers immune cell composition from bulk gene expression data. The Leukocyte Signature Matrix (LM22) was utilized as a reference to quantify the relative abundances of 22 immune cell types (CIBERSORTx; https://cibersortx.stanford.edu/ (accessed on 7 August 2025)) [31]. Correlation analysis was then performed between hub gene expression and immune cell subtypes to identify relationships relevant to UC activity.
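To make the idea of deconvolution concrete, the sketch below estimates cell-type fractions from a bulk profile and a signature matrix. Note the substitution: CIBERSORT uses ν-support vector regression, whereas this example uses non-negative least squares solved by projected gradient descent, chosen only because it fits in a few lines of plain Python. The 4-gene, 2-cell-type signature matrix is invented and is not LM22.

```python
def deconvolve(signature, mixture, iterations=2000):
    """Estimate non-negative cell-type fractions that sum to 1.

    signature: genes x cell types; mixture: bulk expression per gene.
    Uses projected gradient descent on squared error (not nu-SVR).
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Step size 1 / ||S||_F^2 is small enough to guarantee convergence.
    step = 1.0 / sum(v * v for row in signature for v in row)
    w = [1.0 / n_types] * n_types
    for _ in range(iterations):
        resid = [sum(signature[g][t] * w[t] for t in range(n_types)) - mixture[g]
                 for g in range(n_genes)]
        grad = [sum(signature[g][t] * resid[g] for g in range(n_genes))
                for t in range(n_types)]
        w = [max(0.0, w[t] - step * grad[t]) for t in range(n_types)]  # clip at 0
    total = sum(w)
    return [v / total for v in w] if total > 0 else w

# Invented signature: rows are genes, columns are 2 hypothetical cell types.
signature = [
    [10.0, 1.0],
    [8.0, 2.0],
    [1.0, 9.0],
    [2.0, 7.0],
]
# Bulk profile built as 30% of type 1 plus 70% of type 2.
mixture = [3.7, 3.8, 6.6, 5.5]
fractions = deconvolve(signature, mixture)
```

As in CIBERSORT, negative weights are clipped to zero and the result is normalised so that the fractions are relative abundances.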

2.11. Statistical Analysis

All statistical analyses were performed using R software (version 4.5.1). Pearson correlation coefficients were calculated to examine hub gene expression patterns. Spearman’s rank correlation was utilized to analyze relationships between immune cell infiltration and hub gene transcriptional activity. A p-value < 0.05 was considered statistically significant.
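Spearman's rank correlation, used here for the immune cell analysis, is the Pearson correlation of the ranks. The following plain-Python sketch shows the calculation with average ranks for ties; the expression values and cell fractions are made up for illustration.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up hub gene expression and immune cell fractions for 5 samples.
gene_expression = [1.2, 2.5, 3.1, 4.8, 5.0]
cell_fraction = [0.05, 0.10, 0.08, 0.20, 0.30]
rho = spearman(gene_expression, cell_fraction)
```

Working on ranks makes the measure robust to the skewed, bounded distributions typical of estimated cell fractions.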

3. Results

3.1. DEGs Correlated with UC Activity

To identify DEGs associated with disease activity in UC, the GSE75214 dataset was analyzed. Samples were classified into 2 groups based on UC activity: inactive phase and active phase. DEG analysis was performed using the limma package in R, with thresholds set at |log2FC| > 1 and false discovery rate (FDR) < 0.05. A total of 715 DEGs (194 downregulated and 521 upregulated) were identified between the inactive and active phases (Figure 2A). The top 50 DEGs are visualized in a clustered heatmap (Figure 2B).
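The thresholding step can be illustrated as follows. This sketch covers only the filter on |log2FC| and FDR, assuming the FDR is obtained by Benjamini-Hochberg adjustment; it does not reproduce limma's moderated statistics. Gene labels, fold changes, and p-values are placeholders.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_degs(genes, log2fc, pvalues, lfc_cut=1.0, fdr_cut=0.05):
    """Split genes into up- and downregulated sets using both thresholds."""
    fdr = benjamini_hochberg(pvalues)
    up = [g for g, l, q in zip(genes, log2fc, fdr) if q < fdr_cut and l > lfc_cut]
    down = [g for g, l, q in zip(genes, log2fc, fdr) if q < fdr_cut and l < -lfc_cut]
    return up, down

# Placeholder inputs (active versus inactive phase).
genes = ["geneA", "geneB", "geneC", "geneD"]
log2fc = [2.3, -1.6, 0.4, 1.2]
pvalues = [0.0004, 0.003, 0.01, 0.2]
up, down = call_degs(genes, log2fc, pvalues)
```

Here geneC is significant but changes too little, and geneD changes enough but is not significant, so neither is called a DEG.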

3.2. Meorange Module Most Strongly Correlated with UC Activity

WGCNA was applied to the GSE75214 dataset to identify co-expression modules associated with UC activity. Soft-thresholding powers ranging from 1 to 30 were evaluated using the scale-free topology fit index and mean connectivity (Figure 3A). A power value of 28 was selected, achieving a scale-free fit over 0.8 while maintaining strong connectivity. Using this threshold, a co-expression network was constructed, and gene clusters were visualized in a dendrogram (Figure 3B). Modules identified with the dynamic tree cut algorithm were represented as color-coded branches, with correlated modules subsequently merged. A cluster heatmap of all genes demonstrated intermodular co-expression patterns, where color intensity reflected gene–gene connectivity strength (Figure 3C). Module–trait correlation analysis revealed that the Meorange module had the strongest positive correlation with active phase (R = 0.84, p < 0.05; Figure 3D). This module comprised 383 genes after merging.

3.3. Immune System and Cytokine-Driven Inflammation Closely Associated with UC Activity

To elucidate molecular mechanisms underlying UC activity, genes overlapping the DEGs (n = 715) and the Meorange module (n = 383) were identified. A Venn diagram showed 138 intersecting genes (Figure 4A), suggesting candidates potentially driving UC activity.
These 138 genes were subjected to GO and KEGG enrichment analysis. GO analysis covered biological process (BP), molecular function (MF), and cellular component (CC) categories. Significantly enriched BP terms were primarily associated with immune regulation, including “response to type II interferon” (reflecting IFN-γ-mediated antiviral and pro-inflammatory responses), “humoral immune response” (antibody-mediated pathogen clearance), “response to virus” (viral recognition and defense), and “response to molecule of bacterial origin” (pattern recognition of microbial antigens) (Figure 4B).
KEGG pathway analysis further emphasized inflammatory and immune signaling, notably the “NOD-like receptor signaling pathway” (cytosolic pathogen sensing and inflammasome activation), “Complement and coagulation cascades” (innate immune amplification and vascular regulation), and the “IL-17 signaling pathway” (a critical mediator of mucosal inflammation in UC) (Figure 4C). Collectively, these results indicate that UC activity is closely associated with immune system and cytokine-driven inflammation, providing a biological rationale for targeting key disease pathways.

3.4. Characteristic Genes Associated with UC Activity

The top 15 genes identified from the PPI network (Figure 5A) were considered as pivotal regulators of UC-related biological processes. To refine these candidates, machine learning was applied, constructing 113 predictive models using combinations of algorithms. The combination of Random Forest and Stepglm [forward] achieved the highest mean AUC (0.877) in both training and testing cohorts (Figure 5B), demonstrating superior discriminative capacity. From this model, 8 characteristic genes were identified: SAMD9L (Sterile Alpha Motif Domain Containing 9 Like), IFITM1 (Interferon-Induced Transmembrane Protein 1), GBP5 (Guanylate Binding Protein 5), PARP14 (Poly (ADP-Ribose) Polymerase 14), PARP9 (Poly (ADP-Ribose) Polymerase 9), CXCL10 (C-X-C Motif Chemokine Ligand 10), GBP1 (Guanylate Binding Protein 1), and CXCL11 (C-X-C Motif Chemokine Ligand 11). These genes represent promising candidates for further mechanistic studies and biomarker or therapeutic target development.

3.5. Hub Genes and Diagnostic Capability in UC Activity

The diagnostic performance of the 8 characteristic genes was evaluated using the pROC package by performing ROC curve analysis in both training and testing cohorts. Among them, CXCL11, PARP14, and IFITM1 consistently displayed high diagnostic accuracy, with AUC values surpassing 0.83 in both cohorts (Figure 6A, Table 2). The notable performance suggests their high sensitivity and specificity in discriminating active phase from inactive phase. Based on these results, CXCL11, PARP14, and IFITM1 were identified as hub genes for UC activity.
Expression analysis in 2 datasets supported their diagnostic potential. In the training dataset (GSE75214), all 3 genes were significantly upregulated in the active phase compared with the inactive phase (p < 0.05). Similarly, in the testing dataset (GSE179285), they showed consistent overexpression in the active phase (Figure 6B). Despite variations in sample collection, patient demographics, and technology platforms, their diagnostic performance remained consistent. These findings strongly support CXCL11, PARP14, and IFITM1 as reliable diagnostic biomarkers for UC activity.

3.6. TFs of Characteristic Genes in UC Activity

To investigate transcriptional regulation, upstream transcription factors (TFs) were predicted using the TFTF database. Each gene was regulated by multiple TFs: CXCL11 (11), CXCL10 (7), GBP1 (10), GBP5 (9), IFITM1 (10), PARP9 (13), PARP14 (14), and SAMD9L (9). Petal diagram analysis identified 3 TFs common to all 8 genes: FLI1, STAT1, and IRF4 (Figure 7A). A TF–gene regulatory network was constructed with Cytoscape and analyzed via cytoHubba, which confirmed the same 3 TFs but excluded CXCL10 due to weaker regulatory connectivity (Figure 7B).
Expression validation in both training and testing datasets (Table 3) showed that (1) STAT1 and IRF4 lacked consistent differential expression, and (2) STAT1 displayed variable trends between cohorts. Consequently, only FLI1 was considered as a major TF regulating characteristic genes in UC activity.

3.7. Causal Association Between PARP14 Expression and UC Risk Identified by SMR Analysis

SMR analysis integrating GTEx v8 eQTL data and FinnGen R12 IBD GWAS data identified a significant causal association between PARP14 expression and UC susceptibility (Table 4). Among the 3 hub genes tested, only PARP14 retained statistical significance following colocalization analysis (Figure 8A). The colocalization plot of GWAS and eQTL signals for PARP14 showed clustering of effect sizes, supporting shared genetic variants underlying both PARP14 expression and IBD susceptibility (Figure 8B). This genetic evidence, combined with the upregulation of PARP14 in the active phase, highlights PARP14 as a key driver of UC pathogenesis.

3.8. Immune Cell Infiltration Patterns Associated with UC Activity

The immunological microenvironment was characterized using the LM22 signature matrix and GSE75214 transcriptome data. Relative abundances of 22 immune cell types were estimated with CIBERSORT (Figure 9A), while correlations among immune cells were assessed (Figure 9B). Several immune subsets displayed significant differences between inactive and active phase (Figure 9C) and correlated with characteristic genes (Figure 9D). Specifically, activated dendritic cells, M0 macrophages, M1 macrophages, neutrophils, resting NK cells, and resting CD4+ memory T cells were enriched in active phase and positively correlated with most characteristic genes. These increases align with the pro-inflammatory state of UC, implicating them in cytokine production and tissue injury. Conversely, M2 macrophages, activated NK cells, CD8+ T cells, and regulatory T cells (Tregs) were decreased and negatively correlated with the same genes, supporting consistent immune–molecular relationships. Notably, PARP14 exhibited its most significant positive correlation with M1 macrophages and negative correlation with activated NK cells, further substantiating its potential role in modulating the immune imbalance associated with UC activity (Figure 9E).
Together, these results indicate that UC activity involves infiltration of pro-inflammatory immune cells, coupled with a decline in regulatory cell subsets. Moreover, significant molecular correlations suggest that PARP14 and FLI1 play pivotal roles in driving mucosal inflammation.

4. Discussion

The objective of this study was to identify potential molecular signatures of UC activity and elucidate their pathogenic roles by integrating transcriptomic analysis, machine learning, and genetic causal inference. To this end, a comprehensive analysis of 3 independent GEO datasets (GSE75214, GSE53306, GSE179285) was conducted, and the “Meorange” module was identified using WGCNA (R = 0.84, p < 0.05). This module demonstrated a robust correlation with UC activity. The integration of DEG analysis resulted in the identification of 138 core genes. Subsequent PPI networking and ensemble machine learning (RF + Stepglm [forward], average AUC = 0.877) prioritized 8 characteristic genes. Among these, CXCL11, PARP14, and IFITM1 emerged as hub genes with exceptional diagnostic performance (average AUC: 0.855, 0.840, and 0.830, respectively). Furthermore, FLI1 was identified as a pivotal transcription factor (TF) that regulates these hubs, and a causal relationship between PARP14 upregulation and UC susceptibility was confirmed via SMR. These findings contribute to our enhanced understanding of the molecular and immunological mechanisms underlying the pathogenesis of UC.
A critical contribution of this study is the identification of CXCL11, PARP14, and IFITM1 as stable diagnostic markers for UC activity, which outperform many previously reported UC biomarkers. For instance, CCL11 and MMP1 were identified as inflammation-related diagnostic biomarkers for UC, with AUC values of 0.741 and 0.703, respectively, in the GSE193677 dataset [18]. Eight novel biomarkers (e.g., TFF3, LRG, and HMGB1) were similarly evaluated [17]. In contrast, our hub genes (CXCL11, PARP14, and IFITM1) achieved consistently high average AUC values (0.830–0.855) across 3 independent cohorts of diverse ethnic backgrounds. Furthermore, SMR analysis confirmed a causal association between PARP14 and UC susceptibility. The HEIDI p-value > 0.05 indicates that the genetic instrumental variables (SNPs) used in this study predominantly influence the outcome via the target exposure, rather than through irrelevant pathways—thus ruling out significant horizontal pleiotropy. This finding provides genetic evidence supporting the pathogenic role of PARP14. Collectively, these characteristics highlight PARP14 as a potential candidate for clinical translation.
Active phase is characterized by marked dysregulation of immune cell infiltration, with pro-inflammatory subsets dominating the intestinal mucosal microenvironment [32,33]. This finding aligns with and extends the current understanding of UC immunopathogenesis. In our analysis, PARP14 was significantly positively correlated with M1 macrophage infiltration and neutrophil accumulation—2 cell types strongly implicated in UC inflammation [34,35]. PARP14 has been associated with the progression of inflammatory diseases by regulating Th2/Th17 signaling. It also acts as a downstream molecule of Klf5 involved in metabolic reprogramming and the polarization of microglia into the M1/M2 state [36,37]. These findings indicate that PARP14 may play a key role in coordinating multiple pro-inflammatory processes in UC activity, including immune infiltration, inflammatory signaling, and cellular polarization.
Furthermore, FLI1, as a key TF regulating CXCL11, PARP14, and IFITM1, indirectly shapes immune infiltration: CXCL11 (a CXC chemokine) recruits activated T cells and dendritic cells to the inflamed mucosa [18], while IFITM1 activates the NF-κB pathway (by interacting with IKKβ to promote its phosphorylation) to regulate immune and inflammatory responses [38]. Consistent with these findings, our immune cell deconvolution analysis indicated that FLI1 expression was positively correlated with Th17 cell infiltration—a subset known to drive UC activity via IL-17 secretion [39]. These data collectively demonstrate that FLI1 orchestrates a pro-inflammatory transcriptional program, amplifying immune cell infiltration through PARP14 and downstream genes, ultimately exacerbating intestinal mucosal damage.
A methodological consideration in this study is the use of IBD GWAS data rather than UC activity-specific GWAS data for SMR analysis. This approach was necessitated by current limitations in available genomic resources. To date, most public GWAS datasets for UC focus on disease susceptibility [39,40], with few explicitly stratifying samples by UC activity. Therefore, the IBD GWAS data were integrated with eQTL data from intestinal mucosal tissues [7]. While this approach cannot capture activity-specific genetic effects, it nonetheless establishes a causal relationship between PARP14 and UC susceptibility (rather than just UC activity), providing critical genetic validation for our transcriptomic findings. Future studies should prioritize generating activity-stratified UC GWAS and eQTL data to refine causal inferences and identify activity-specific genetic regulators.
A key finding of this study is the identification of the FLI1-PARP14 axis as a central regulator of UC activity, with implications for long-term complications such as colorectal cancer (CRC). Chronic intestinal inflammation is a well-established driver of CRC in UC patients [41], and our data suggest that the FLI1-PARP14 axis may contribute to this progression: (1) FLI1 regulates PARP14 and downstream pro-inflammatory genes, sustaining a chronic inflammatory microenvironment that promotes DNA damage accumulation in intestinal epithelial cells by increasing reactive oxygen species (ROS) and impairing DNA repair [34,35]; (2) PARP14-mediated immune infiltration (e.g., M1 macrophages, neutrophils) further amplifies inflammation-associated tissue damage, creating a self-reinforcing loop that accelerates epithelial dysplasia [41]; (3) FLI1 itself has been implicated in cancer progression by regulating cell proliferation and epithelial–mesenchymal transition [35], suggesting that it may directly and indirectly contribute to CRC development in long-standing UC. While functional experiments are needed to confirm these links, our data provide a novel framework: targeting the FLI1-PARP14 axis could not only alleviate UC activity by reducing immune infiltration but also potentially lower CRC risk by mitigating chronic inflammation.
Several limitations of this study warrant consideration. First, reliance on GEO datasets introduces potential selection bias (e.g., overrepresentation of patients of European ancestry), which may limit the generalizability of our hub genes to diverse populations. Second, transcriptomic data alone cannot confirm protein-level expression; future studies should validate PARP14 in UC mucosal biopsies via immunohistochemistry or Western blotting [18]. Third, functional experiments (e.g., PARP14 knockdown in intestinal epithelial cells, FLI1 ChIP-seq to confirm binding to the PARP14 promoter) are needed to directly verify the regulatory mechanisms identified. Fourth, potential crosstalk between PARP14 and the gut microbiota, an important factor in UC pathogenesis [6], was not explored; examining it may further clarify the role of PARP14 in inflammation. To address these gaps, future research could: (1) validate hub genes in multi-ethnic cohorts and explore non-invasive detection (e.g., serum or fecal PARP14); (2) test PARP14 inhibitors (e.g., RBN012759) in dextran sulfate sodium (DSS)-induced UC mice to evaluate effects on mucosal healing and CRC development [42]; (3) integrate microbiomics data to investigate interactions between PARP14 and gut microbiota [6].

5. Conclusions

In this study, PARP14 was identified as a potential biomarker associated with UC activity, demonstrating promising clinical utility. A potential FLI1–PARP14–immune infiltration axis was further proposed, which may be involved in UC pathogenesis: FLI1 may regulate PARP14 and downstream genes, potentially increasing the infiltration of M1 macrophages, neutrophils, and Th17 cells, which may in turn exacerbate intestinal mucosal inflammation. This axis may also be linked to the risk of CRC in chronic UC, possibly by sustaining inflammatory damage and influencing epithelial dysplasia. These findings provide insights into the molecular mechanisms of UC and lay a foundation for further research on precision medicine approaches to UC diagnosis and treatment.

Author Contributions

Conceptualization, Z.Z. and G.S.; methodology, Z.Z.; software, Z.Z.; investigation, Y.Z.; resources, G.S.; data curation, Z.Z. and Y.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Y.Z., Z.G., H.C. and G.S.; visualization, Z.G.; supervision, H.C. and G.S.; validation; formal analysis; project administration; funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (82372809 and 82572979) and the Special Fund for Public Welfare Research Institutes of Fujian Province (2023R1001003).

Institutional Review Board Statement

This manuscript does not contain any individual person’s private data (e.g., images, clinical details). All analyzed data were anonymized and publicly accessible through the GEO repository.

Informed Consent Statement

Not applicable.

Data Availability Statement

The RNA expression dataset is publicly accessible through the NCBI GEO repository at https://www.ncbi.nlm.nih.gov/geo/ (accessed on 5 August 2025). All findings from this research have been incorporated into the manuscript. Additional data can be obtained by contacting the corresponding author with a formal request.

Acknowledgments

During the preparation of this manuscript, the authors used DeepSeek-R1 for the purposes of generating text. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AUC: Area under the receiver operating characteristic curve
CD: Crohn’s disease
CRC: Colorectal cancer
DEG: Differentially expressed gene
DSS: Dextran sulfate sodium
eQTL: Expression quantitative trait locus
GBM: Gradient boosting machine
GO: Gene Ontology
GWAS: Genome-wide association study
IBD: Inflammatory bowel disease
KEGG: Kyoto Encyclopedia of Genes and Genomes
LDA: Linear discriminant analysis
LM22: Leukocyte signature matrix
Lasso: Least absolute shrinkage and selection operator
MCC: Maximal clique centrality
PPI: Protein–protein interaction
plsR-glm: Partial least squares regression with generalized linear model
Ridge: Ridge regression
RF: Random forest
ROC: Receiver operating characteristic
SMR: Summary-data-based Mendelian randomization
SVM: Support vector machine
Stepglm: Stepwise generalized linear model
glmBoost: Boosted generalized linear model
UC: Ulcerative colitis
WGCNA: Weighted gene co-expression network analysis
XGBoost: Extreme gradient boosting

References

  1. Seyedian, S.S.; Nokhostin, F.; Malamir, M.D. A Review of the Diagnosis, Prevention, and Treatment Methods of Inflammatory Bowel Disease. J. Med. Life 2019, 12, 113–122.
  2. Ungaro, R.; Mehandru, S.; Allen, P.B.; Peyrin-Biroulet, L.; Colombel, J.-F. Ulcerative Colitis. Lancet 2017, 389, 1756–1770.
  3. Le Berre, C.; Honap, S.; Peyrin-Biroulet, L. Ulcerative Colitis. Lancet 2023, 402, 571–584.
  4. Du, L.; Ha, C. Epidemiology and Pathogenesis of Ulcerative Colitis. Gastroenterol. Clin. N. Am. 2020, 49, 643–654.
  5. Porter, R.J.; Kalla, R.; Ho, G.-T. Ulcerative Colitis: Recent Advances in the Understanding of Disease Pathogenesis. F1000Research 2020, 9, F1000 Faculty Rev-294.
  6. Shen, Z.-H.; Zhu, C.-X.; Quan, Y.-S.; Yang, Z.-Y.; Wu, S.; Luo, W.-W.; Tan, B.; Wang, X.-Y. Relationship between Intestinal Microbiota and Ulcerative Colitis: Mechanisms and Clinical Application of Probiotics and Fecal Microbiota Transplantation. World J. Gastroenterol. 2018, 24, 5–14.
  7. Mauduit, A.; Mas, E.; Solà-Tapias, N.; Ménard, S.; Barreau, F. Main Genetic Factors Associated with Inflammatory Bowel Diseases and Their Consequences on Intestinal Permeability: Involvement in Gut Inflammation. J. Gastroenterol. 2025, 60, 1323–1338.
  8. Yao, D.; Dai, W.; Dong, M.; Dai, C.; Wu, S. MUC2 and Related Bacterial Factors: Therapeutic Targets for Ulcerative Colitis. EBioMedicine 2021, 74, 103751.
  9. Wang, Y.; Zhuang, H.; Jiang, X.-H.; Zou, R.-H.; Wang, H.-Y.; Fan, Z.-N. Unveiling the Key Genes, Environmental Toxins, and Drug Exposures in Modulating the Severity of Ulcerative Colitis: A Comprehensive Analysis. Front. Immunol. 2023, 14, 1162458.
  10. Feuerstein, J.D.; Moss, A.C.; Farraye, F.A. Ulcerative Colitis. Mayo Clin. Proc. 2019, 94, 1357–1373.
  11. Cleveland, N.K.; Torres, J.; Rubin, D.T. What Does Disease Progression Look Like in Ulcerative Colitis, and How Might It Be Prevented? Gastroenterology 2022, 162, 1396–1408.
  12. Kucharzik, T.; Koletzko, S.; Kannengiesser, K.; Dignass, A. Ulcerative Colitis-Diagnostic and Therapeutic Algorithms. Dtsch. Arztebl. Int. 2020, 117, 564–574.
  13. Feuerstein, J.D.; Isaacs, K.L.; Schneider, Y.; Siddique, S.M.; Falck-Ytter, Y.; Singh, S.; AGA Institute Clinical Guidelines Committee. AGA Clinical Practice Guidelines on the Management of Moderate to Severe Ulcerative Colitis. Gastroenterology 2020, 158, 1450–1461.
  14. Salice, M.; Rizzello, F.; Calabrese, C.; Calandrini, L.; Gionchetti, P. A Current Overview of Corticosteroid Use in Active Ulcerative Colitis. Expert. Rev. Gastroenterol. Hepatol. 2019, 13, 557–561.
  15. Lim, C.-T.; Teichert, C.; Pruijt, M.; De Voogd, F.; D’Haens, G.; Gecse, K. Transmural Healing in Ulcerative Colitis Patients Improves Long-Term Outcomes Compared to Endoscopic Healing Alone. J. Crohns Colitis 2025, 19, jjaf149.
  16. Lakatos, P.-L.; Lakatos, L. Risk for Colorectal Cancer in Ulcerative Colitis: Changes, Causes and Management Strategies. World J. Gastroenterol. 2008, 14, 3937–3947.
  17. Nakov, R. New Markers in Ulcerative Colitis. Clin. Chim. Acta 2019, 497, 141–146.
  18. Dai, F.; Ye, S.; Zhu, Y.; Zhang, J. Identification of Inflammation-Related Diagnostic Biomarker and Molecular Subtypes in Ulcerative Colitis Based on Machine Learning. Dig. Dis. Sci. 2025.
  19. Chen, G.; Qi, H.; Jiang, L.; Sun, S.; Zhang, J.; Yu, J.; Liu, F.; Zhang, Y.; Du, S. Integrating Single-Cell RNA-Seq and Machine Learning to Dissect Tryptophan Metabolism in Ulcerative Colitis. J. Transl. Med. 2024, 22, 1121.
  20. Võsa, U.; Claringbould, A.; Westra, H.-J.; Bonder, M.J.; Deelen, P.; Zeng, B.; Kirsten, H.; Saha, A.; Kreuzhuber, R.; Yazar, S.; et al. Large-Scale Cis- and Trans-eQTL Analyses Identify Thousands of Genetic Loci and Polygenic Scores That Regulate Blood Gene Expression. Nat. Genet. 2021, 53, 1300–1310.
  21. Smith, G.D.; Ebrahim, S. “Mendelian Randomization”: Can Genetic Epidemiology Contribute to Understanding Environmental Determinants of Disease? Int. J. Epidemiol. 2003, 32, 1–22.
  22. Huang, Y.; Liu, J.; Liang, D. Comprehensive Analysis Reveals Key Genes and Environmental Toxin Exposures Underlying Treatment Response in Ulcerative Colitis Based on In-Silico Analysis and Mendelian Randomization. Aging 2023, 15, 14141–14171.
  23. Chaudhary, V.; Chung, F.R.; Delau, O.; Dane, B.; Levine, I.; Meng, X.; Chodosh, J.; da Luz Moreira, A.; Simon, J.N.; Axelrad, J.E.; et al. Risk of Malnutrition Increases in the Year Prior to Surgery among Patients with Inflammatory Bowel Disease. Therap. Adv. Gastroenterol. 2025, 18, 17562848251365036.
  24. Vancamelbeke, M.; Vanuytsel, T.; Farré, R.; Verstockt, S.; Ferrante, M.; Van Assche, G.; Rutgeerts, P.; Schuit, F.; Vermeire, S.; Arijs, I.; et al. Genetic and Transcriptomic Bases of Intestinal Epithelial Barrier Dysfunction in Inflammatory Bowel Disease. Inflamm. Bowel Dis. 2017, 23, 1718–1729.
  25. Zhao, X.; Fan, J.; Zhi, F.; Li, A.; Li, C.; Berger, A.E.; Boorgula, M.P.; Barkataki, S.; Courneya, J.-P.; Chen, Y.; et al. Mobilization of Epithelial Mesenchymal Transition Genes Distinguishes Active from Inactive Lesional Tissue in Patients with Ulcerative Colitis. Hum. Mol. Genet. 2015, 24, 4615–4624.
  26. Keir, M.E.; Fuh, F.; Ichikawa, R.; Acres, M.; Hackney, J.A.; Hulme, G.; Carey, C.D.; Palmer, J.; Jones, C.J.; Long, A.K.; et al. Regulation and Role of αE Integrin and Gut Homing Integrins in Migration and Retention of Intestinal Lymphocytes during Inflammatory Bowel Disease. J. Immunol. 2021, 207, 2245–2254.
  27. Langfelder, P.; Horvath, S. WGCNA: An R Package for Weighted Correlation Network Analysis. BMC Bioinform. 2008, 9, 559.
  28. Liu, Z.; Liu, L.; Weng, S.; Guo, C.; Dang, Q.; Xu, H.; Wang, L.; Lu, T.; Zhang, Y.; Sun, Z.; et al. Machine Learning-Based Integration Develops an Immune-Derived lncRNA Signature for Improving Outcomes in Colorectal Cancer. Nat. Commun. 2022, 13, 816.
  29. Wang, J. TFTF: An R-Based Integrative Tool for Decoding Human Transcription Factor-Target Interactions. Biomolecules 2024, 14, 749.
  30. Zhu, Z.; Zhang, F.; Hu, H.; Bakshi, A.; Robinson, M.R.; Powell, J.E.; Montgomery, G.W.; Goddard, M.E.; Wray, N.R.; Visscher, P.M.; et al. Integration of Summary Data from GWAS and eQTL Studies Predicts Complex Trait Gene Targets. Nat. Genet. 2016, 48, 481–487.
  31. Newman, A.M.; Liu, C.L.; Green, M.R.; Gentles, A.J.; Feng, W.; Xu, Y.; Hoang, C.D.; Diehn, M.; Alizadeh, A.A. Robust Enumeration of Cell Subsets from Tissue Expression Profiles. Nat. Methods 2015, 12, 453–457.
  32. Chen, K.; Shang, S.; Yu, S.; Cui, L.; Li, S.; He, N. Identification and Exploration of Pharmacological Pyroptosis-Related Biomarkers of Ulcerative Colitis. Front. Immunol. 2022, 13, 998470.
  33. Qu, F.; Xu, B.; Kang, H.; Wang, H.; Ji, J.; Pang, L.; Wu, Y.; Zhou, Z. The Role of Macrophage Polarization in Ulcerative Colitis and Its Treatment. Microb. Pathog. 2025, 199, 107227.
  34. Long, D.; Mao, C.; Huang, Y.; Xu, Y.; Zhu, Y. Ferroptosis in Ulcerative Colitis: Potential Mechanisms and Promising Therapeutic Targets. Biomed. Pharmacother. 2024, 175, 116722.
  35. Song, Y.; Song, Q.; Tan, F.; Wang, Y.; Li, C.; Liao, S.; Yu, K.; Mei, Z.; Lv, L. Seliciclib Alleviates Ulcerative Colitis by Inhibiting Ferroptosis and Improving Intestinal Inflammation. Life Sci. 2024, 351, 122794.
  36. Zhang, Y.; Chen, J.-C.; Zheng, J.-H.; Cheng, Y.-Z.; Weng, W.-P.; Zhong, R.-L.; Sun, S.-L.; Shi, Y.-S.; Pan, X.-D. Pterosin B Improves Cognitive Dysfunction by Promoting Microglia M1/M2 Polarization through Inhibiting Klf5/Parp14 Pathway. Phytomedicine 2024, 135, 156152.
  37. Wu, S.; Zeng, X.; Liu, J.; Cong, K.; Lou, S.; Li, Z.; Wei, P.; Shao, L.; Zhang, Y.; Qu, L.; et al. Discovery and Optimization of Potent and Highly Selective PARP14 Inhibitors for the Treatment of Atopic Dermatitis. J. Med. Chem. 2025, 68, 9755–9776.
  38. Sun, X.; Zhu, L.; Mou, C.; Zhao, J.; Go, Y.Y.; Shi, K.; Chen, Z. African Swine Fever Virus A179L Inhibits Interferon Induced Transmembrane Protein 1 Activation of NF-κB Pathway. Cell Commun. Signal. 2025, 23, 380.
  39. Xu, H.; Wang, Z.-H.; Zhong, S.-L.; Chen, T.; Liu, S.-Z.; Zhang, S.-L.; Xie, X.-X.; Liu, T.; Yang, W. Rhoifolin Attenuates DSS-Induced Colitis in Mice by Modulating Gut Microbiota and Restoring Th17/Treg Balance. J. Inflamm. Res. 2025, 18, 11109–11124.
  40. Bruland, T.; Østvik, A.E.; Sandvik, A.K.; Hansen, M.D. Host-Viral Interactions in the Pathogenesis of Ulcerative Colitis. Int. J. Mol. Sci. 2021, 22, 10851.
  41. Bhama, A.R.; Kapadia, M.R. Management of Dysplasia in Ulcerative Colitis. J. Laparoendosc. Adv. Surg. Tech. A 2021, 31, 855–860.
  42. Tang, Y.; Wang, Z.; Zhou, F.; Li, L.; Sun, C.; Li, L.; Tang, F.; Huang, D.; Li, Z.; Tan, Y.; et al. Benzoylpaeoniflorin Alleviates Ulcerative Colitis by Inhibiting Ferroptosis through Targeting Phosphogluconic Dehydrogenase. Phytomedicine 2025, 147, 157111.
Figure 1. Flowchart of the study. Abbreviations: gene expression omnibus (GEO); differentially expressed genes (DEGs); weighted gene co-expression network analysis (WGCNA); protein–protein interaction (PPI); expression quantitative trait locus (eQTL); genome-wide association study (GWAS); summary-data-based mendelian randomization (SMR).
Figure 2. Differentially expressed genes analysis (DEGs) for UC activity in the GSE75214 dataset (n = 97; 23 inactive, 74 active). (A) Volcano plot of DEGs. Blue represents downregulated genes (log2FC < −1, FDR < 0.05), red represents upregulated genes (log2FC > 1, FDR < 0.05), and gray represents non-differentially expressed genes. (B) Clustered heatmap of the top 50 DEGs. Rows represent DEGs; columns represent samples.
Figure 3. Weighted gene co-expression network analysis (WGCNA) results for UC activity. (A) Threshold selection for WGCNA analysis; the optimal soft-thresholding power was 28. (B) Gene clustering dendrogram generated by WGCNA. (C) Heatmap of co-expression patterns across merged gene modules. (D) Module–trait relationships. Each cell contains the Pearson correlation coefficient and corresponding p-value; the color gradient indicates the direction and strength of the correlation.
Figure 4. Functional enrichment analysis of genes overlapping between differentially expressed genes (DEGs) and the weighted gene co-expression network analysis (WGCNA)-identified Meorange module. (A) Venn diagram illustrating the intersection of DEGs and Meorange module genes. (B) Gene Ontology (GO) enrichment analysis: circle plot displaying the top significantly enriched terms. (C) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis: the top 15 significantly enriched pathways.
Figure 5. Protein–protein interaction (PPI) and Machine Learning analysis. (A) Top 15 genes identified from the PPI network. (B) Performance evaluation of 113 machine learning algorithm combinations using stratified 10-fold cross-validation. Abbreviations: Area under the receiver operating characteristic curve (AUC).
Figure 6. Diagnostic performance and expression validation of characteristic genes. (A) Receiver operating characteristic (ROC) curves of the 8 characteristic genes in the training and testing cohorts. (B) Expression levels of the 3 hub genes (CXCL11, PARP14, IFITM1) in the training and testing cohorts.
Figure 7. Transcriptional regulatory network analysis of characteristic genes. (A) Petal diagram of transcription factors (TFs) targeting the 8 characteristic genes. Each petal represents 1 gene (labeled with gene name), with the number of regulating TFs indicated in parentheses; the central region shows 3 common TFs (FLI1, STAT1, and IRF4) targeting all 8 genes. (B) Core TF–gene regulatory subnetwork, comprising 3 TFs and 7 characteristic genes, highlighting key transcriptional interactions in UC activity.
Figure 8. Summary-data–based mendelian randomization (SMR) analysis integrating expression quantitative trait locus (eQTL) and genome-wide association study (GWAS) data (FinnGen R12). (A) Manhattan plot illustrating genes associated with IBD susceptibility. (B) Scatter plot of eQTL versus GWAS effect sizes for the PARP14 locus, indicating a shared genetic association.
Figure 9. Immune infiltration patterns and correlations with characteristic genes in the training cohort. (A) Heatmap of immune cell infiltration abundances. (B) Correlation heatmap among immune cell subsets. (C) Box plots showing differential infiltration levels of immune cell subsets between inactive and active phase. (D) Correlation heatmap between characteristic genes and immune cell subtypes. (E) Lollipop plot for correlations between PARP14 and immune cell subsets. Statistical significance is denoted as follows: ns, not significant; *, p < 0.05; **, p < 0.01; ***, p < 0.001; ****, p < 0.0001.
Table 1. GEO datasets and samples (samples from UC patients).
| Dataset | Inactive phase (n) | Active phase (n) | Group |
|---|---|---|---|
| GSE75214 | 23 | 74 | Training cohort |
| GSE53306 | 12 | 16 | Testing cohort |
| GSE179285 | 32 | 23 | Testing cohort |
Table 2. AUC of the 8 characteristic genes in the training and testing cohorts.
| Gene | Training (GSE75214) | Testing (GSE53306) | Testing (GSE179285) | Average AUC |
|---|---|---|---|---|
| CXCL11 | 0.9 | 0.771 | 0.894 | 0.855 |
| PARP14 | 0.952 | 0.703 | 0.864 | 0.840 |
| IFITM1 | 0.973 | 0.729 | 0.789 | 0.830 |
| SAMD9L | 0.975 | 0.562 | 0.874 | 0.804 |
| GBP1 | 0.949 | 0.646 | 0.792 | 0.796 |
| GBP5 | 0.961 | 0.609 | 0.803 | 0.791 |
| PARP9 | 0.936 | 0.604 | 0.818 | 0.786 |
| CXCL10 | 0.907 | 0.609 | 0.764 | 0.760 |
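The "Average AUC" column of Table 2 can be reproduced with simple arithmetic. The sketch below assumes that the average is the unweighted mean of the three cohort AUCs (this is an assumption, although it agrees with the reported values); the variable and function names are illustrative only and do not come from the study's code.

```python
# Per-gene AUCs transcribed from Table 2, in the order
# (GSE75214 training, GSE53306 testing, GSE179285 testing).
auc = {
    "CXCL11": (0.900, 0.771, 0.894),
    "PARP14": (0.952, 0.703, 0.864),
    "IFITM1": (0.973, 0.729, 0.789),
    "SAMD9L": (0.975, 0.562, 0.874),
    "GBP1": (0.949, 0.646, 0.792),
    "GBP5": (0.961, 0.609, 0.803),
    "PARP9": (0.936, 0.604, 0.818),
    "CXCL10": (0.907, 0.609, 0.764),
}


def mean_auc(values):
    """Unweighted mean across cohorts (assumed definition of 'Average AUC')."""
    return sum(values) / len(values)


# Round to three decimals, matching the precision reported in the table.
averages = {gene: round(mean_auc(values), 3) for gene, values in auc.items()}

# Rank genes from highest to lowest average AUC.
ranked = sorted(averages, key=averages.get, reverse=True)

for gene in ranked:
    print(f"{gene}: {averages[gene]:.3f}")
```

Under this assumption, the three genes with the highest average AUC are the three hub genes named in the text (CXCL11, PARP14, IFITM1).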
Table 3. Transcription factors (TFs) targeting characteristic genes.
| Key TF | Description | Training p value | Training log2FC | GSE53306 p value | GSE53306 log2FC | GSE179285 p value | GSE179285 log2FC |
|---|---|---|---|---|---|---|---|
| FLI1 | Fli-1 proto-oncogene, ETS transcription factor | 1.07 × 10^−10 | 1.078 | 0.0305 | 0.560 | 9.47 × 10^−4 | 0.702 |
| STAT1 | Signal transducer and activator of transcription 1 | 3.87 × 10^−14 | 1.035 | 0.404 | −0.194 | 1.96 × 10^−4 | 0.833 |
| IRF4 | Interferon regulatory factor 4 | 4.67 × 10^−16 | 1.665 | 0.162 | 0.453 | 7.27 × 10^−5 | 0.739 |
Table 4. PARP14 SMR analysis.
| Gene | b_SMR | se_SMR | p_SMR | p_HEIDI | nsnp_HEIDI |
|---|---|---|---|---|---|
| PARP14 | 0.0722 | 0.0348 | 0.0378 | 0.240 | 20 |
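As a rough consistency check on Table 4, the reported p_SMR can be approximately recovered from b_SMR and se_SMR. The sketch below assumes a two-sided normal (Wald-type) test on b_SMR / se_SMR, interprets b_SMR as a log odds ratio, and uses a HEIDI cutoff of 0.05; all three are assumptions made for illustration and are not stated in this section. Small differences from the reported p-value are expected because the inputs are rounded.

```python
import math

# Values transcribed from Table 4 (PARP14).
b_smr = 0.0722
se_smr = 0.0348
p_heidi = 0.240

# Wald-type z statistic and two-sided p-value under a standard normal
# (assumed form of the test; equivalent to a 1-df chi-square on z**2).
z = b_smr / se_smr
p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))

# Assumption: b_SMR is on the log-odds scale, so exp(b) is an odds ratio
# per unit increase in genetically predicted expression.
odds_ratio = math.exp(b_smr)
ci = (
    math.exp(b_smr - 1.96 * se_smr),
    math.exp(b_smr + 1.96 * se_smr),
)

# Assumed conventional cutoff: a HEIDI p-value above it gives no evidence
# that the association is driven by heterogeneity (linkage) between SNPs.
heidi_threshold = 0.05
passes_heidi = p_heidi > heidi_threshold

print(f"z = {z:.3f}, p = {p_two_sided:.4f}")
print(f"OR = {odds_ratio:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"passes HEIDI filter: {passes_heidi}")
```

The modest effect size and a confidence interval whose lower bound sits close to 1 are consistent with the cautious wording used for the PARP14 finding in the Discussion.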
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zheng, Z.; Zhang, Y.; Gao, Z.; Chen, H.; Song, G. Integrating Transcriptomics and Machine Learning to Uncover the FLI1-PARP14-Immune Axis in Ulcerative Colitis Activity and Pathogenesis. Genes 2025, 16, 1342. https://doi.org/10.3390/genes16111342