1. Introduction
Diabetic cardiomyopathy (DCM) is increasingly recognized as a distinct myocardial disorder characterized by structural remodeling and functional impairment in the absence of overt coronary artery disease or uncontrolled hypertension [1]. The clinical phenotype encompasses early diastolic dysfunction, progressive myocardial stiffness, and, in advanced stages, systolic impairment. These alterations are thought to arise from a complex interplay of metabolic derangements, oxidative stress, and chronic low-grade inflammation associated with diabetes mellitus [2].
At the cellular level, hyperglycemia-driven metabolic stress promotes mitochondrial dysfunction, excessive reactive oxygen species (ROS) generation, and impaired redox homeostasis [2]. These changes not only disrupt cardiomyocyte energetics but also activate maladaptive signaling pathways that contribute to both cell death and extracellular matrix (ECM) remodeling. In this context, increasing attention has been directed toward regulated forms of cell death beyond apoptosis, particularly ferroptosis, which is characterized by iron-dependent lipid peroxidation [3].
Ferroptosis is mechanistically linked to the failure of antioxidant defense systems, most notably involving glutathione peroxidase 4 (GPX4), which normally detoxifies lipid hydroperoxides [3]. Impairment of GPX4 activity, coupled with increased availability of redox-active iron and enhanced incorporation of polyunsaturated fatty acids into membrane phospholipids via acyl-CoA synthetase long-chain family member 4 (ACSL4), creates a permissive environment for lipid peroxidation and membrane damage [3,4]. Emerging experimental data suggest that these processes may contribute to cardiomyocyte injury in diabetic settings, although their precise role in human DCM remains to be fully elucidated [2,3].
Parallel to cell death mechanisms, myocardial fibrosis represents a central structural hallmark of DCM and a major determinant of adverse clinical outcomes [5]. Fibrotic remodeling is largely driven by activation of the transforming growth factor-beta (TGF-β) signaling pathway, leading to downstream SMAD-dependent transcriptional programs that promote ECM deposition and fibroblast activation [5]. The resulting increase in myocardial stiffness contributes directly to diastolic dysfunction and impaired ventricular compliance.
Importantly, fibrotic processes do not occur in isolation but are closely modulated by inflammatory and metabolic signaling networks. Among these, signal transducer and activator of transcription 3 (STAT3) and protein kinase B (AKT1) have been implicated as key integrative nodes linking metabolic stress, inflammation, and tissue remodeling [6]. STAT3 activation has been associated with both pro-inflammatory and profibrotic responses, while AKT1 signaling plays a dual role in cell survival and metabolic adaptation under stress conditions [6]. These pathways may therefore serve as critical mediators bridging cellular injury and structural remodeling in DCM.
Taken together, these observations are compatible with the concept that ferroptosis and fibrosis may represent interconnected rather than independent processes within the diabetic heart. Oxidative stress, mitochondrial dysfunction, and inflammatory signaling constitute plausible mechanistic links that integrate these pathways into a broader ferroptosis–fibrosis axis. However, despite growing interest in this concept, current evidence remains fragmented, with most studies focusing on individual pathways rather than their integrated network behavior.
From a therapeutic perspective, the multifactorial nature of DCM poses a challenge for conventional single-target interventions. Increasing emphasis has therefore been placed on multitarget pharmacological strategies capable of modulating multiple interconnected pathways simultaneously. In this setting, computational approaches such as network pharmacology provide a systems-level framework to identify potential regulatory hubs and drug–target interactions across complex biological networks [7]. When integrated with molecular docking techniques, these approaches may offer additional structural insights into ligand–target binding affinities and interaction patterns.
Nevertheless, it should be emphasized that such computational strategies are inherently exploratory and hypothesis-generating. They do not establish causality but rather provide a structured means of prioritizing targets and candidate compounds for further experimental validation. Accordingly, careful interpretation and methodological transparency are essential to avoid overstatement of findings.
In this context, the present study was designed as an integrative systems-level investigation of the ferroptosis–fibrosis axis in cardiomyopathy with exploratory relevance to DCM. Publicly available transcriptomic data were analyzed to identify differentially expressed genes, characterize enriched biological processes and pathways, construct protein–protein interaction networks, and prioritize candidate hub genes associated with ECM remodeling and stress-related responses. To provide additional support for candidate gene prioritization, receiver operating characteristic (ROC) analysis and machine learning–based feature selection approaches were subsequently applied. Finally, molecular docking was used as an exploratory structural approach to evaluate potential ligand–target compatibility within selected fibrosis-, ferroptosis-, and stress-related pathways.
The primary objective of this study was not to establish causal mechanisms or therapeutic efficacy, but rather to explore potential biological relationships linking ECM remodeling, ferroptosis-associated stress responses, and cardiometabolic dysfunction within an integrated analytical framework. The resulting observations are intended to generate testable hypotheses, prioritize candidate genes and pathways for future investigation, and provide a foundation for subsequent experimental validation.
3. Results
3.1. Differential Expression Results in GSE5406
Differential expression analysis of the GSE5406 dataset identified a subset of genes exhibiting statistically significant transcriptional alterations between nonfailing myocardial samples and cardiomyopathy samples (ischemic and idiopathic combined). Using predefined thresholds of adjusted p-value < 0.05 and absolute log2 fold-change ≥ 1, a total of 63 differentially expressed probe sets corresponding to 49 unique gene symbol entries were retained for downstream analyses, with the top-ranked genes summarized in Table 1. The complete list of differentially expressed probe sets is provided in Supplementary Table S1.
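The thresholding step described above can be sketched as follows. This is a minimal illustration only: the rows, gene labels, and the function name `filter_degs` are hypothetical and are not taken from the GSE5406 results. It shows how several probe sets can collapse to fewer unique gene symbols.

```python
# Hypothetical rows mimicking a differential-expression top table (illustrative values only).
rows = [
    {"probe": "p1", "symbol": "GENE_A", "log2fc": 1.8, "adj_p": 0.001},
    {"probe": "p2", "symbol": "GENE_A", "log2fc": 1.2, "adj_p": 0.010},
    {"probe": "p3", "symbol": "GENE_B", "log2fc": -1.1, "adj_p": 0.030},
    {"probe": "p4", "symbol": "GENE_C", "log2fc": 0.6, "adj_p": 0.002},   # fails fold-change cutoff
    {"probe": "p5", "symbol": "GENE_D", "log2fc": -2.0, "adj_p": 0.200},  # fails significance cutoff
]


def filter_degs(rows, adj_p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep probe sets with adjusted p < cutoff and |log2FC| >= cutoff."""
    return [
        r for r in rows
        if r["adj_p"] < adj_p_cutoff and abs(r["log2fc"]) >= lfc_cutoff
    ]


degs = filter_degs(rows)
# Several probe sets may map to the same gene symbol, so count both levels.
unique_symbols = sorted({r["symbol"] for r in degs})
print(len(degs), "probe sets ->", len(unique_symbols), "unique gene symbols")
```

Counting at both the probe-set and gene-symbol level mirrors the distinction drawn in the text between retained probe sets and unique gene symbol entries.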
Visualization of differential expression patterns using a volcano plot demonstrated a broad distribution of transcripts, with a subset exceeding both statistical significance and fold-change thresholds (Figure 1A). While the majority of genes clustered near the center, consistent with relatively modest expression changes, a distinct group of transcripts exhibited more pronounced upregulation and downregulation, consistent with a detectable disease-associated transcriptional signal.
The mean–difference (MA) plot further illustrated the relationship between fold-change and average expression levels, suggesting that differential expression was not restricted to a specific expression range (Figure 1B). Both upregulated and downregulated genes were distributed across a wide span of expression intensities, supporting the robustness of the observed transcriptional differences.
Unsupervised dimensionality reduction using UMAP revealed partial separation between nonfailing and cardiomyopathy samples (Figure 1C). Although a general clustering trend was observed, a degree of overlap between groups persisted, suggesting underlying biological heterogeneity within the cardiomyopathy cohort.
Global expression distributions were highly comparable between groups, as demonstrated by density plots (Figure 1D), consistent with appropriate normalization and absence of major technical bias. This observation was further supported by the mean–variance trend plot (Figure 1E), which demonstrated stable variance across a broad range of expression values, consistent with the assumptions of the limma modeling framework.
Finally, the distribution of adjusted p-values showed an expected deviation from the null distribution, with a subset of low p-values supporting the presence of true differential expression signals while remaining consistent with appropriate control of multiple testing (Figure 1F).
Taken together, these findings suggest that the GSE5406 dataset captures a detectable yet heterogeneous transcriptional signature associated with cardiomyopathic remodeling. Accordingly, the identified DEG set was considered suitable for subsequent network-based and enrichment analyses.
3.2. Cross-Dataset Comparison
To assess the directional consistency of the discovery-cohort transcriptional findings, cross-dataset comparison was performed using the independent GSE263297 dataset. Of the 51 discovery genes evaluated, all 51 were represented in the GSE263297 expression matrix. Directional comparison showed that 20 genes exhibited the same direction of expression change between GSE5406 and GSE263297, corresponding to a directional concordance rate of 39.2% (Table 2). Only one gene showed both same-direction change and an absolute log2 fold-change ≥ 1 in the independent cohort. Because adjusted p-values in GSE263297 did not support statistically significant replication of the discovery findings, this analysis was interpreted as directional comparison rather than robust external validation.
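The concordance figure reported above follows from simple counting. The sketch below uses hypothetical gene labels and fold-change values (only the 20-of-51 arithmetic comes from the text), and the function name `directional_concordance` is an illustrative assumption.

```python
def directional_concordance(discovery_lfc, replication_lfc, lfc_cutoff=1.0):
    """Count genes whose log2FC has the same sign in both cohorts."""
    shared = [g for g in discovery_lfc if g in replication_lfc]
    same_dir = [g for g in shared if discovery_lfc[g] * replication_lfc[g] > 0]
    # Concordant genes that also reach the fold-change cutoff in the second cohort.
    strong = [g for g in same_dir if abs(replication_lfc[g]) >= lfc_cutoff]
    rate = len(same_dir) / len(shared) if shared else float("nan")
    return {"shared": len(shared), "concordant": len(same_dir),
            "strong": len(strong), "rate": rate}


# Hypothetical toy values, not taken from either dataset.
discovery = {"G1": 1.5, "G2": -1.2, "G3": 2.0, "G4": -1.8}
replication = {"G1": 1.24, "G2": 0.3, "G3": 0.4, "G4": 0.1}
summary = directional_concordance(discovery, replication)
print(summary)

# Arithmetic check of the rate reported in the text: 20 of 51 genes.
reported_rate_percent = round(20 / 51 * 100, 1)
print(reported_rate_percent)
```

A rate below 50% means that fewer than half of the genes moved in the same direction, which is why the text frames this as a directional comparison rather than validation.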
For the independent dataset, log2 fold-change values were recalculated using normalized expression data by comparing ICM-DM samples (n = 7) with donor controls (n = 7). Statistical comparisons were performed using Welch’s t-test, and p-values were adjusted using the Benjamini–Hochberg false discovery rate method. Among the concordant genes, MAFF demonstrated the largest effect size in GSE263297 (log2FC = 1.24), whereas the remaining 19 genes showed attenuated expression differences in the independent cohort, with absolute log2 fold-change values below 1. Although none of the concordant genes remained statistically significant after multiple-testing correction in GSE263297, the observed directional agreement between datasets suggests limited cross-cohort consistency of the transcriptional trends identified in GSE5406. The reduced effect sizes and lack of statistical significance in the independent cohort may reflect differences in cohort composition, sample size, disease characteristics, and transcriptomic platform (microarray versus RNA sequencing).
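The two statistical steps named above (Welch's t-test and Benjamini–Hochberg adjustment) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' pipeline: the function names are assumptions, and converting the t statistic to a p-value would additionally require a t-distribution routine from a statistics package.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical expression values for one gene in two groups.
t_stat, dof = welch_t([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# Hypothetical raw p-values for four genes.
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(t_stat, dof, adj)
```

With seven samples per group, as in this comparison, the degrees of freedom are small, which is one reason adjusted p-values may fail to reach significance even when fold-changes point in the expected direction.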
3.3. PPI Network and Functional Enrichment Analysis
The PPI network constructed from the DEG-derived input list comprised 51 nodes and 26 edges, with a PPI enrichment p-value of 8.78 × 10⁻¹², indicating that the observed interactions significantly exceeded random expectation (Figure 2A). Topological inspection revealed a limited number of moderately connected nodes, with COL1A1 (degree = 6) and COL1A2 and COL3A1 (degree = 5 each) emerging as the most connected elements. In contrast, a substantial proportion of nodes exhibited low or zero degree, consistent with a partially fragmented network structure (Figure 2A). This pattern may reflect the coexistence of a central ECM-related module alongside more isolated functional components.
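The sparseness described above can be made concrete with a short sketch. The toy edge list and node labels are hypothetical. Only the node and edge counts (51 and 26) come from the text, and the helper name `node_degrees` is an assumption.

```python
from collections import Counter


def node_degrees(nodes, edges):
    """Degree of every node in an undirected graph, including isolated nodes."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


# Hypothetical toy network: one small connected module plus an isolated node.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
deg = node_degrees(nodes, edges)
isolated = [n for n in nodes if deg[n] == 0]
print(dict(deg), isolated)

# Summary statistics for the reported network size (51 nodes, 26 edges).
n_nodes, n_edges = 51, 26
mean_degree = round(2 * n_edges / n_nodes, 2)
density = round(2 * n_edges / (n_nodes * (n_nodes - 1)), 4)
print(mean_degree, density)
```

A mean degree close to 1 is consistent with the description of a partially fragmented network in which many nodes have few or no interaction partners.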
Gene Ontology (GO) enrichment analysis demonstrated a consistent pattern centered on ECM organization and stress-related processes (Figure 2B–D, Table 3). Within the Biological Process category, enrichment was observed for response to stress (25 genes; FDR = 0.0025), response to inorganic substance (10 genes; FDR = 0.0071), and nitric oxide transport (3 genes; FDR = 0.0071) (Figure 2B, Table 3). These broadly defined processes may suggest activation of adaptive and redox-related responses rather than a single dominant signaling axis. In the Cellular Component category, the most prominent enrichments were collagen-containing ECM (16 genes; FDR = 1.09 × 10⁻¹¹) and ECM (17 genes; FDR = 3.12 × 10⁻¹¹) (Figure 2C, Table 3), suggesting that a substantial fraction of the network localizes to extracellular structural compartments, consistent with the central positioning of collagen-related nodes. Within the Molecular Function category, enrichment was observed for ECM structural constituent (8 genes; FDR = 1.20 × 10⁻⁵) and structural molecule activity (13 genes; FDR = 0.00017) (Figure 2D, Table 3). This profile may reflect a predominance of structural and scaffold-associated proteins rather than enzymatic or receptor-driven activities.
Pathway-level analysis further supported these observations (Figure 2E,F, Table 3). In KEGG, enrichment was identified for protein digestion and absorption (FDR = 0.0025) and the AGE–RAGE signaling pathway in diabetic complications (FDR = 0.0216) (Figure 2E, Table 3). Given the gene composition, particularly the collagen isoforms, the former likely reflects structural protein turnover rather than gastrointestinal physiology. In Reactome, enriched pathways included ECM proteoglycans (FDR = 7.53 × 10⁻⁵), ECM organization (FDR = 8.6 × 10⁻⁴), integrin cell surface interactions, and collagen chain trimerization (Figure 2F, Table 3). Additionally, binding and uptake of ligands by scavenger receptors demonstrated strong enrichment (FDR = 1.04 × 10⁻⁷), which may suggest involvement of macrophage-related or clearance-associated processes.
Taken together, these findings suggest that the network is organized around ECM structure and remodeling, stress-responsive processes, and receptor-mediated ligand handling pathways. However, given the relatively small network size, partial fragmentation, and heterogeneity of enriched terms, these results should be interpreted as exploratory and hypothesis-generating, rather than definitive evidence of a unified mechanistic pathway.
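For readers less familiar with over-representation testing, the sketch below shows one common formulation, the hypergeometric upper-tail probability. It is an assumption that this matches the general logic of the enrichment tools used here; the exact statistics and background sets of those tools are not restated in this section, and the background and term sizes in the example are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Small check that can be verified by hand: all 5 drawn genes are annotated.
p_small = hypergeom_sf(5, 10, 5, 5)

# Hypothetical example: 16 of 51 input genes fall in a term annotating
# 400 of 20000 background genes (background and term sizes are assumptions).
p_example = hypergeom_sf(16, 20000, 400, 51)
print(p_small, p_example)
```

Because collagen genes are shared across many ECM-related terms, several terms can reach small p-values from largely the same genes, which is one reason the enriched terms are described as partially overlapping.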
3.4. Cytoscape-Based Network Analysis
Network topology analysis performed in Cytoscape demonstrated heterogeneous node connectivity within the MCC-derived subnetwork. Degree, betweenness centrality, and closeness centrality values are summarized in Table 4.
Among the analyzed nodes, COL1A1 exhibited the highest degree and betweenness centrality, consistent with a prominent position in maintaining network connectivity within this subnetwork. COL1A2 and COL3A1 also showed relatively high centrality values, whereas several nodes, including HSP90AA1 and HBB, displayed minimal or absent connectivity.
In contrast, ranking based on MCC scores revealed a partially distinct pattern of hub gene prioritization (Table 5).
Specifically, COL15A1, ASPN, and LUM demonstrated the highest MCC scores, while COL1A1 and other collagen-related genes showed comparatively lower MCC values. Notably, some nodes with low or absent degree centrality were also associated with low MCC scores. Visualization of the MCC-derived subnetwork illustrates a structured cluster primarily composed of ECM-related genes (Figure 3).
The observed differences between degree-based and MCC-based rankings suggest that these centrality measures capture distinct topological features of the network. These findings should therefore be interpreted in the context of a topologically limited subnetwork.
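Why degree and MCC can rank nodes differently is easiest to see on a toy graph. The sketch below assumes the usual maximal clique centrality definition (for each node, the sum of (|C| − 1)! over all maximal cliques C containing it, which reduces to the degree when no neighbors are interconnected). The graph, node labels, and function names are hypothetical and unrelated to the study network.

```python
from itertools import combinations
from math import factorial


def build_adj(edges):
    """Adjacency sets for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(adj):
    """Sum (|C| - 1)! over the maximal cliques containing each node."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:
            continue  # an isolated node contributes nothing
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Star-shaped hub H with five unconnected neighbors, plus a 4-node clique A-D.
star = [("H", f"L{i}") for i in range(1, 6)]
clique = list(combinations(["A", "B", "C", "D"], 2))
adj = build_adj(star + clique)
scores = mcc_scores(adj)
print(len(adj["H"]), scores["H"], len(adj["A"]), scores["A"])
```

In this toy case the hub has the higher degree but the clique members have the higher MCC, illustrating how a node embedded in a densely interconnected module can outrank a node with more, but mutually unconnected, neighbors.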
3.5. Disease Enrichment Results
Disease enrichment analysis was performed using the DisGeNET and OMIM Disease libraries in the Enrichr platform to explore potential disease-associated patterns within the differentially expressed gene set (Table 6).
DisGeNET analysis demonstrated a strong overrepresentation of cardiovascular-related conditions, including congestive heart failure and heart failure, which showed the most significant adjusted p-values (Figure 4A). Additional enriched terms included myocardial infarction, hypertrophic cardiomyopathy, and broader cardiovascular disease categories. These enrichments were supported by genes involved in cardiac structure and function, such as NPPA, MYH6, and FLNC.
In addition to cardiac-related terms, several cardiometabolic conditions, including diabetes mellitus and atherosclerosis, were also enriched. These findings suggest overlap between the identified transcriptional profile and systemic processes commonly associated with cardiac remodeling.
OMIM Disease analysis provided complementary results, highlighting enrichment of connective tissue and structural disorders, including Ehlers–Danlos syndrome and osteoporosis, largely driven by collagen-related genes (COL1A1, COL1A2, COL3A1). Enrichment of anemia-related terms was also observed, reflecting the presence of hemoglobin gene family members (HBA1, HBA2, HBB).
The gene–disease association structure further revealed distinct clustering patterns (Figure 4B), where hemoglobin-related genes grouped with erythroid and thalassemia-associated terms, while ECM-related genes showed associations with connective tissue–related conditions.
These disease associations were broad and partially overlapping, and therefore were interpreted cautiously. Overall, disease enrichment analysis provides contextual support for a transcriptional profile centered on cardiac remodeling, ECM organization, and cardiometabolic conditions (Figure 4). Additional OMIM-based visualizations are provided in Supplementary Figure S1.
3.6. Pathway Enrichment Results
Pathway enrichment analysis was performed using the KEGG 2026 and Reactome 2024 databases through Enrichr (Table 7). These pathway enrichment results are independent of the STRING functional enrichment analysis presented in Table 3; therefore, adjusted p-values may differ between the two analyses.
KEGG analysis identified several pathways related to structural organization and metabolic processes (Figure 5A). Among the most significantly enriched pathways were cytoskeleton organization in muscle cells and protein digestion and absorption, the latter likely reflecting the overrepresentation of collagen-related genes. Additional pathways included AGE–RAGE signaling in diabetic complications, ECM–receptor interaction, and DCM, suggesting involvement of ECM remodeling and cardiometabolic processes.
Reactome analysis further emphasized ECM-related pathways (Figure 5B), including ECM organization, ECM proteoglycans, collagen formation, and integrin-mediated interactions. These findings are consistent with the prominent representation of collagen and matrix-associated genes within the dataset.
Additional pathways related to platelet activation and signaling were also identified, suggesting a potential link to vascular or hemostatic processes. However, similar to the disease enrichment results, these pathways were relatively broad and partially overlapping.
Overall, pathway enrichment analysis is consistent with a network organization centered on ECM remodeling, structural integrity, and stress-related signaling processes (Figure 5), while remaining descriptive in nature. Detailed pathway–gene association patterns are shown in Supplementary Figures S2 and S3.
3.7. Exploratory Therapeutic Contextualization of Candidate Compounds
To provide additional mechanistic contextualization of the identified ferroptosis–fibrosis network, selected candidate compounds associated with fibrosis, oxidative stress, inflammatory signaling, and iron-dependent cellular injury pathways were reviewed using publicly available pharmacological resources and supporting literature sources. As summarized in Supplementary Table S2, the selected compounds demonstrated mechanistic relevance to several biologically interconnected processes implicated in DCM, including ECM remodeling, inflammatory signaling, oxidative stress responses, and ferroptosis-associated cellular injury. These observations were not intended to imply therapeutic efficacy, but rather to provide pathway-level contextualization supporting the biological rationale underlying the subsequent molecular docking analyses.
3.8. Docking Results
Molecular docking analyses were performed to provide structural context for selected ligand–target combinations within fibrosis-, ferroptosis-, inflammatory signaling-, oxidative stress-, and cellular survival–related pathways. Representative three-dimensional docking conformations are presented in Figure 6 and Figure 7. Detailed docking parameters and grid definitions are provided in Supplementary Table S3.
The chemical structures of all candidate compounds included in the docking analyses are provided in Supplementary Figure S4. Detailed residue-level interaction profiles, including conventional hydrogen bonds, π-cation/π-anion interactions, alkyl/π-alkyl contacts, and π-sigma interactions, are summarized in Supplementary Table S4. Two-dimensional protein–ligand interaction diagrams for all analyzed protein–ligand complexes are provided in Supplementary Figures S5–S8.
Across the analyzed ligand–target combinations, predicted binding energy values varied according to both compound and target protein. For TGFBR1, the numerically lowest predicted binding energy values were recorded for ruxolitinib (−12.08 kcal/mol), liproxstatin-1 (−11.39 kcal/mol), and finerenone (−10.40 kcal/mol). For STAT3, the lowest numerical values were recorded for liproxstatin-1 (−9.03 kcal/mol), bardoxolone methyl (−8.69 kcal/mol), ruxolitinib (−8.59 kcal/mol), and finerenone (−8.49 kcal/mol). For GPX4, ruxolitinib (−8.06 kcal/mol), liproxstatin-1 (−7.65 kcal/mol), and finerenone (−7.19 kcal/mol) showed the lowest numerical binding energy values within this target group. For AKT1, lower numerical binding energy values were recorded for ruxolitinib (−11.72 kcal/mol), liproxstatin-1 (−10.10 kcal/mol), bardoxolone methyl (−8.75 kcal/mol), and finerenone (−8.60 kcal/mol). In the SMAD3 docking set, liproxstatin-1 (−8.55 kcal/mol), ruxolitinib (−7.84 kcal/mol), and finerenone (−7.18 kcal/mol) yielded the lowest numerical values. For ACSL4, liproxstatin-1 (−10.86 kcal/mol), ruxolitinib (−10.75 kcal/mol), and finerenone (−10.20 kcal/mol) had the lowest predicted binding energy values among the evaluated compounds. Notably, although ruxolitinib yielded the numerically lowest predicted binding energy value for TGFBR1, this observation should be interpreted cautiously because ruxolitinib is a clinically established JAK1/JAK2 inhibitor rather than a known TGFBR1-directed ligand. Therefore, the observed docking score may reflect structural compatibility within the modeled binding region rather than biologically relevant target selectivity.
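The per-target values listed above can be tabulated programmatically to make the ranking explicit. The sketch below uses only the binding energies reported in the preceding paragraph; the variable names are illustrative, and the ranking is purely numerical, with no implication of target selectivity or biological activity.

```python
from collections import Counter

# Predicted binding energies (kcal/mol) as reported in the text above.
binding_energy = {
    "TGFBR1": {"ruxolitinib": -12.08, "liproxstatin-1": -11.39, "finerenone": -10.40},
    "STAT3": {"liproxstatin-1": -9.03, "bardoxolone methyl": -8.69,
              "ruxolitinib": -8.59, "finerenone": -8.49},
    "GPX4": {"ruxolitinib": -8.06, "liproxstatin-1": -7.65, "finerenone": -7.19},
    "AKT1": {"ruxolitinib": -11.72, "liproxstatin-1": -10.10,
             "bardoxolone methyl": -8.75, "finerenone": -8.60},
    "SMAD3": {"liproxstatin-1": -8.55, "ruxolitinib": -7.84, "finerenone": -7.18},
    "ACSL4": {"liproxstatin-1": -10.86, "ruxolitinib": -10.75, "finerenone": -10.20},
}

# Numerically lowest (most negative) value per target.
best_per_target = {t: min(s, key=s.get) for t, s in binding_energy.items()}
top_counts = Counter(best_per_target.values())

for target, ligand in best_per_target.items():
    print(f"{target}: {ligand} ({binding_energy[target][ligand]:.2f} kcal/mol)")
print(dict(top_counts))
```

Tabulated this way, the same two compounds account for the lowest value across all six targets, a pattern that is compatible with the caution expressed in the text that docking scores may reflect general structural compatibility rather than target-specific binding.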
The retained docking poses were associated with RMSD values within the predefined acceptable range, and residue-level interaction patterns varied across ligand–target pairs. These interaction profiles are presented in detail in Supplementary Table S4 and were used only to describe the docking conformations retained for structural interpretation.
Overall, the docking results are presented as exploratory computational observations intended to provide structural context for the investigated ligand–target combinations. Predicted binding energies and interaction patterns were not interpreted as evidence of biological activity, target inhibition, pathway modulation, or therapeutic efficacy.
3.9. Exploratory ROC Analysis of Candidate Hub Genes
ROC analysis was performed to evaluate the discriminatory performance of ten candidate hub genes between nonfailing control myocardial samples and cardiomyopathy samples in the GSE5406 dataset. Among the evaluated genes, LUM demonstrated the highest discriminatory performance (AUC = 0.960, 95% CI: 0.923–0.997), followed by ASPN (AUC = 0.952, 95% CI: 0.889–1.000), COL3A1 (AUC = 0.835, 95% CI: 0.715–0.954), COL1A1 (AUC = 0.832, 95% CI: 0.703–0.960), COL1A2 (AUC = 0.827, 95% CI: 0.705–0.949), THBS1 (AUC = 0.802, 95% CI: 0.686–0.919), SERPINE1 (AUC = 0.780, 95% CI: 0.632–0.929), PTX3 (AUC = 0.748, 95% CI: 0.615–0.881), S100A8 (AUC = 0.714, 95% CI: 0.598–0.830), and PDK4 (AUC = 0.697, 95% CI: 0.532–0.861) (Figure 8, Table 8).
For ROC analysis, score direction was inverted only for THBS1, SERPINE1, PTX3, S100A8, and PDK4 when the raw expression direction yielded an AUC below 0.50. No inversion was applied for LUM, ASPN, COL3A1, COL1A1, or COL1A2. logFC values reflect the control − disease direction according to GEO2R group assignment.
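The score-inversion convention described above can be illustrated with a pairwise (Mann–Whitney) formulation of the AUC. The expression values and function names below are hypothetical, and the confidence intervals reported in the text are not reproduced by this sketch.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def auc_with_inversion(scores_pos, scores_neg):
    """Flip the score direction when the raw AUC falls below 0.5."""
    raw = auc(scores_pos, scores_neg)
    inverted = raw < 0.5
    # Negating every score turns an AUC of a into 1 - a.
    return (1.0 - raw if inverted else raw), inverted


# Hypothetical gene that is lower in disease than in controls.
disease = [5.1, 6.0, 5.8]
control = [6.5, 7.0, 5.9]
value, was_inverted = auc_with_inversion(disease, control)
print(round(value, 3), was_inverted)
```

Under this convention every reported AUC is at least 0.5 by construction, so the direction of the underlying expression change has to be read from the fold-change rather than from the AUC itself.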
LUM and ASPN exhibited the highest AUC values among the evaluated candidate hub genes, followed by COL3A1, COL1A1, COL1A2, and THBS1. SERPINE1 and PTX3 showed moderate discriminatory performance, whereas S100A8 and PDK4 showed lower AUC values. Overall, these findings may reflect the contribution of ECM-associated and fibrosis-related genes to sample-level separation between cardiomyopathy and nonfailing myocardial samples in the discovery cohort. However, because ROC analysis was performed within the same dataset used for differential-expression discovery and because of the unequal sample distribution between groups, these results should be interpreted as exploratory and hypothesis-generating rather than evidence of definitive diagnostic performance.
3.10. Machine Learning–Based Identification of Candidate Diagnostic Signatures
LASSO regression identified 9 candidate genes, whereas random forest and SVM-RFE selected 18 and 34 genes, respectively (Table 9).
Comparison of the three feature-selection approaches demonstrated partial overlap among genes associated with ECM remodeling, myocardial structure, inflammation, and metabolic regulation. The overlap analysis indicated partial convergence of machine-learning prioritization despite methodological differences among the algorithms (Figure 9). Because feature selection was performed within the same discovery cohort and under marked class imbalance, these findings should be regarded as exploratory and hypothesis-generating.
To address the risk of overfitting and class imbalance, the machine-learning framework was further evaluated using stratified class-weighted five-fold cross-validation. The internally validated model achieved an AUC of 0.934 (95% CI: 0.820–1.000), with class weighting applied to account for the imbalance between cardiomyopathy samples and nonfailing controls (Figure 10A, Table 10(A)). Cross-cohort comparison between GSE5406 and GSE263297 showed that all 51 discovery genes were present in the independent cohort, with 20 genes showing the same direction of expression change and one gene retaining both directional consistency and |logFC| ≥ 1 (Figure 10B, Table 10(B)). Sample-level external evaluation of the FCN3–HOPX–CNN1–GLUL four-gene signature in GSE263297 yielded an AUC of 0.673 (95% CI: 0.380–0.967) (Figure 10C, Table 10(C)). Among individual genes, FCN3 showed the highest AUC (0.806), whereas HOPX, GLUL, and CNN1 showed lower AUC values. Together, these analyses provide exploratory internal and independent-cohort observations for the machine-learning–derived signature, while remaining insufficient to establish clinical diagnostic utility.
Internal validation was performed using stratified class-weighted five-fold cross-validation. Cross-cohort comparison assessed directional concordance of differential-expression patterns between GSE5406 and GSE263297. External sample-level evaluation was conducted using normalized log2-transformed RNA-sequencing expression data from the independent GSE263297 cohort.
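The two ingredients of the internal validation, stratified fold assignment and class weighting, can be sketched without any modelling library. The group sizes below are hypothetical, the function names are assumptions, and the inverse-frequency weighting shown is one widely used heuristic; the specific weighting scheme applied in the study is not stated in this section.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % n_folds].append(i)
    return folds


# Hypothetical imbalanced cohort: 10 controls (0) and 90 cases (1).
labels = [0] * 10 + [1] * 90
weights = balanced_class_weights(labels)
folds = stratified_folds(labels)
controls_per_fold = [sum(labels[i] == 0 for i in fold) for fold in folds]
print(weights, controls_per_fold)
```

Stratification guarantees that every held-out fold contains minority-class samples, and the weights let errors on the smaller group count proportionally more during fitting; neither step replaces evaluation in an independent cohort.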
4. Discussion
This study was designed to explore whether transcriptional alterations identified in cardiomyopathic myocardial tissue may exhibit an underlying network organization reflecting ECM remodeling and stress-related biological processes.
At the level of individual differentially expressed genes, several transcripts identified in the present analysis are consistent with previously described structural and stress-related alterations in cardiomyopathic remodeling. In particular, the prominent representation of ECM-associated genes, including COL1A1, COL1A2, COL3A1, LUM, and ASPN, may reflect ongoing matrix remodeling processes. Similarly, the presence of stress- and inflammation-related genes, such as SERPINE1, PTX3, and S100A8, may indicate broader adaptive or injury-related responses.
However, differential expression alone does not establish functional involvement, and the observed gene-level alterations may reflect downstream or compensatory responses rather than primary disease drivers. In addition, the partial concordance observed in cross-dataset comparisons limits the generalizability of individual gene-level findings. Accordingly, interpretation at the single-gene level should be approached with caution and considered within a broader network and pathway context.
The present study identified a transcriptional signature associated with cardiomyopathic remodeling and explored its potential biological organization using network-based and enrichment analyses. Although differential expression analysis revealed a detectable set of altered transcripts, cross-dataset comparison demonstrated only partial preservation of expression patterns, suggesting that the observed signal is likely context-dependent and biologically heterogeneous rather than universally conserved.
From a network perspective, the PPI analysis suggested a relatively sparse and partially fragmented structure, with a limited number of moderately connected nodes. Notably, collagen-related proteins (COL1A1, COL1A2, and COL3A1) emerged as the most connected elements, suggesting that ECM-associated components may represent a central structural axis within the network. The absence of a densely interconnected core suggests that the transcriptional alterations identified here do not converge on a single dominant pathway but instead reflect multiple coexisting biological processes.
Functional enrichment analysis further supported this interpretation. Gene Ontology results demonstrated enrichment in ECM organization and structural components, particularly within the Cellular Component and Molecular Function categories. These findings are consistent with the central positioning of collagen isoforms in the PPI network and may reflect remodeling of extracellular architecture rather than activation of a discrete signaling cascade.
At the same time, enrichment of broadly defined Biological Process terms, such as response to stress and response to inorganic substance, suggests activation of adaptive and redox-related processes. Given the nonspecific nature of these categories, these observations should be interpreted cautiously and may represent generalized stress responses rather than pathway-specific regulation.
Pathway-level analyses provided additional context but should be interpreted with similar caution. The enrichment of protein digestion and absorption in KEGG, in the presence of collagen-rich gene input, likely reflects structural protein turnover rather than gastrointestinal physiology. Similarly, the identification of the AGE–RAGE signaling pathway may be influenced by the inclusion of genes associated with ECM remodeling and diabetic complications, rather than reflecting direct pathway activation.
Interestingly, recent evidence suggests that ferroptosis may represent a mechanistic link between metabolic stress, oxidative injury, and myocardial remodeling in DCM. Experimental and translational studies have implicated ferroptosis-related pathways in the development of myocardial fibrosis, ECM accumulation, and adverse cardiac remodeling, providing biological context for the ferroptosis-associated signals identified in the present analysis [35,36,37].
Reactome analysis highlighted ECM organization, ECM proteoglycans, and integrin-mediated interactions, further reinforcing the concept that extracellular structural remodeling represents a prominent feature of the observed network. The enrichment of scavenger receptor–related pathways may additionally suggest involvement of clearance-associated or macrophage-related processes, although this interpretation remains speculative.
In addition to functional enrichment, disease-oriented annotation provided complementary contextual information. Disease enrichment analysis using DisGeNET and OMIM libraries demonstrated overrepresentation of cardiovascular conditions, including heart failure, myocardial infarction, and cardiomyopathy-related terms. These associations were supported by genes involved in cardiac structure and contractile function, such as NPPA, MYH6, FLNC, and FHL1.
Furthermore, enrichment of connective tissue and ECM–related disorders, including Ehlers–Danlos syndrome, was observed, reflecting the prominence of collagen genes (COL1A1, COL1A2, COL3A1) within the dataset. Metabolic and cardiometabolic pathways, including AGE–RAGE signaling and DCM, were also represented, suggesting potential overlap between extracellular remodeling processes and systemic metabolic stress. However, enrichment of hemoglobin-related disease categories was also detected, likely driven by the inclusion of HBA and HBB gene family members, and should therefore be interpreted cautiously.
Taken together, these findings suggest that the observed transcriptional profile is organized around ECM remodeling, structural integrity, and cardiometabolic stress-related processes. Accordingly, the fibrosis component of the proposed ferroptosis–fibrosis axis is more directly supported by the transcriptomic findings, whereas the ferroptosis component should be interpreted as an integrative, hypothesis-generating biological framework informed by prior mechanistic knowledge and exploratory docking analysis rather than as a directly demonstrated transcriptomic mechanism.
Recent evidence has increasingly highlighted potential mechanistic links among ferroptosis, myocardial fibrosis, oxidative stress, and cardiometabolic remodeling, suggesting that ferroptosis-associated pathways may contribute to ECM accumulation and adverse cardiac remodeling in cardiovascular disease states, including cardiomyopathy [
16,
36,
37,
38].
ROC and machine learning analyses provided additional contextual information regarding the identified remodeling signature. Among the evaluated candidate hub genes, LUM and ASPN exhibited the highest discriminatory performance between cardiomyopathy and nonfailing myocardial samples. COL3A1, COL1A1, COL1A2, and THBS1 also demonstrated relatively strong classification capacity, whereas SERPINE1, PTX3, S100A8, and PDK4 showed moderate discriminatory performance. Notably, these genes are closely linked to ECM organization and myocardial structural remodeling, reinforcing the central role of matrix-related processes identified through differential expression, network, and enrichment analyses.
Machine learning–based feature selection further refined candidate gene prioritization from a complementary analytical perspective. Despite methodological differences among LASSO regression, random forest classification, and SVM-RFE, FCN3, HOPX, CNN1, and GLUL were consistently identified across all three algorithms, constituting the core machine learning signature. The convergence of independent feature-selection approaches on a limited subset of genes suggests that these candidates may represent relatively stable components of the observed cardiomyopathy-associated transcriptional landscape. In particular, the identification of LUM by ROC analysis and by two of three machine learning algorithms provides additional context for its potential relevance within ECM-associated remodeling processes. HOPX, a transcriptional regulator implicated in cardiac development and stress-responsive remodeling, and CNN1, a cytoskeleton-associated marker linked to smooth muscle-like contractile and remodeling phenotypes, are consistent with the fibrosis-dominant and structural remodeling signature observed in the present analysis.
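The consensus logic described above can be sketched as a simple vote count over the gene sets returned by each selector. The keys `algorithm_A` to `algorithm_C` stand for LASSO, random forest, and SVM-RFE in no particular order; which two algorithms selected LUM is not stated here, so its placement below is an arbitrary placeholder, and the sets are truncated to the genes named in the text.

```python
from collections import Counter

# Illustrative selections only: the four core genes appear in every set,
# and LUM is placed in two sets arbitrarily (assumption).
selections = {
    "algorithm_A": {"FCN3", "HOPX", "CNN1", "GLUL", "LUM"},
    "algorithm_B": {"FCN3", "HOPX", "CNN1", "GLUL", "LUM"},
    "algorithm_C": {"FCN3", "HOPX", "CNN1", "GLUL"},
}

# Count how many selectors retained each gene.
votes = Counter(gene for genes in selections.values() for gene in genes)
n_selectors = len(selections)

# Genes retained by every selector form the core signature.
core_signature = sorted(g for g, v in votes.items() if v == n_selectors)
# Genes retained by a majority, but not all, of the selectors.
partial_support = sorted(g for g, v in votes.items() if 2 <= v < n_selectors)

print("core signature:", core_signature)
print("partial support:", partial_support)
```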
Nevertheless, these findings should be interpreted within the exploratory framework of the present study. ROC analysis and machine-learning–based feature selection were performed within the GSE5406 discovery cohort, and additional validation analyses were conducted to reduce the risk of overfitting. Stratified class-weighted five-fold cross-validation demonstrated preserved internal discriminatory performance, while exploratory sample-level external evaluation of the FCN3–HOPX–CNN1–GLUL signature in the independent GSE263297 cohort yielded an AUC of 0.673 (95% CI: 0.380–0.967). Because the lower bound of this interval falls below 0.5, the external estimate remains statistically compatible with chance-level discrimination. In addition, the external cohort was limited in size and differed from the discovery cohort in disease context, sample composition, and transcriptomic platform. Therefore, the results inform exploratory candidate-gene prioritization rather than definitive diagnostic validation or clinical utility.
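To illustrate why a small external cohort yields a wide confidence interval around the AUC, the following minimal sketch computes the AUC as a Mann-Whitney statistic and an approximate interval using the Hanley-McNeil standard error. The interval method actually used in the present analysis is not specified here, so Hanley-McNeil is a stand-in approximation, and the group sizes below are hypothetical rather than the GSE263297 sample counts.

```python
from math import sqrt


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as the probability that a case scores above a control (ties = 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate CI for an AUC using the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    se = sqrt(variance)
    # Clip to the valid AUC range [0, 1].
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical group sizes (assumptions) applied to the same AUC point estimate.
small_cohort_ci = hanley_mcneil_ci(0.673, n_pos=8, n_neg=8)
large_cohort_ci = hanley_mcneil_ci(0.673, n_pos=80, n_neg=80)

print("small cohort CI:", small_cohort_ci)
print("large cohort CI:", large_cohort_ci)
```

With the same point estimate, the interval narrows substantially as the number of samples per group increases, which is the main reason the external estimate cannot support claims of diagnostic validity.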
Within this framework, mechanistic contextualization of selected candidate compounds further suggested potential biological convergence between ECM remodeling, oxidative stress, inflammatory signaling, and ferroptosis-associated cellular injury pathways. Although these observations do not imply therapeutic efficacy, the integration of compounds associated with mineralocorticoid signaling, iron handling, oxidative stress modulation, and cytokine-related pathways provided an additional systems-level perspective supporting the biological plausibility of the identified ferroptosis–fibrosis network.
In addition, molecular docking analyses provided a complementary structural perspective by examining selected ligand–target combinations within fibrosis-, ferroptosis-, and inflammation-related pathways. Although docking-derived binding energies and interaction patterns cannot be interpreted as evidence of biological activity or therapeutic efficacy, they offered a structure-based framework for prioritizing candidate compound–target relationships for future experimental investigation.
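For orientation, a docking-derived binding free energy can be translated into a predicted inhibition constant through the standard thermodynamic relation ΔG = RT ln(Ki). The sketch below applies this relation at 298.15 K; the function name and the example energies are illustrative assumptions rather than values reported in this study, and the resulting constants inherit all the approximations of the docking score itself.

```python
from math import exp

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.98720425864083e-3


def predicted_ki_molar(delta_g_kcal_per_mol: float,
                       temperature_k: float = 298.15) -> float:
    """Convert a binding free energy (kcal/mol) to a predicted Ki (mol/L).

    Uses delta_G = R * T * ln(Ki), so more negative energies give smaller Ki.
    """
    return exp(delta_g_kcal_per_mol / (R_KCAL_PER_MOL_K * temperature_k))


# Illustrative energies only (assumptions, not docking results from this work).
for delta_g in (-6.0, -8.0, -10.0):
    ki = predicted_ki_molar(delta_g)
    print(f"delta_G = {delta_g:6.1f} kcal/mol -> predicted Ki = {ki:.2e} M")
```

Because the relation is exponential, a difference of roughly 1.4 kcal/mol corresponds to about a tenfold change in predicted Ki at room temperature, so small scoring errors translate into large apparent affinity differences.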
Several limitations should be considered when interpreting these results. First, the primary transcriptomic discovery analysis was based on a single public microarray cohort and therefore requires further validation in larger independent datasets and experimental models. Moreover, detailed patient-level clinical phenotypes were not available in the public transcriptomic dataset, limiting the ability to relate molecular findings to diabetes subtype, disease duration, glycemic control, treatment status, or clinical outcomes. Because T1D- and T2D-associated cardiomyopathy may involve distinct metabolic, inflammatory, and pathophysiological mechanisms, the present findings should not be interpreted as diabetes subtype-specific observations. The relatively small network size and the presence of disconnected nodes also limit the ability to infer robust interaction architecture. In addition, enrichment analyses are inherently dependent on input gene composition and database annotations and may introduce bias toward well-characterized biological processes. Because the ROC and machine-learning analyses were derived from a single discovery cohort, the resulting performance estimates should be interpreted cautiously. Although exploratory sample-level external evaluation was performed in GSE263297, the limited sample size, cohort differences, and platform differences preclude definitive claims of diagnostic validity or clinical utility. Several inflammation- and immune-associated transcripts, including CD163, S100A8, PTX3, and SERPINE1, were present among the differentially expressed genes; however, immune cell deconvolution was not performed in the present study because the analysis was based on bulk myocardial microarray data and a heterogeneous cardiomyopathy cohort. 
Future studies using single-cell transcriptomics, spatial transcriptomics, or validated immune deconvolution frameworks may help clarify whether macrophage-, monocyte-, neutrophil-, T-cell-, or stromal-related signatures contribute to the observed remodeling profile.
Molecular docking analyses represent structure-based computational predictions and do not account for the full complexity of biological systems, including protein dynamics, conformational flexibility, post-translational modifications, tissue-specific context, or in vivo pharmacokinetics and pharmacodynamics. Furthermore, the ACSL4 structure was derived from an AlphaFold model because an experimentally resolved ligand-bound structure was not available. In addition, GPX4 contains a catalytically important selenocysteine residue that may not be fully represented by standard AutoDock 4.2.6 atom parameterization. Therefore, GPX4-related docking results should be interpreted as approximate estimates of ligand–target compatibility and considered with appropriate methodological caution. SMAD3 (PDB ID: 1MJS) represents an MH2-domain transcription factor complex for which no well-established small-molecule binding pocket has been experimentally validated. Consequently, SMAD3-related docking results should be interpreted with additional caution and regarded primarily as exploratory structural observations. The particularly low predicted binding energy obtained for the ruxolitinib–TGFBR1 interaction also warrants caution: ruxolitinib is a known JAK1/JAK2 inhibitor rather than a TGFBR1-directed ligand, and the observed docking score may reflect structural compatibility within the modeled binding region rather than biologically meaningful target selectivity. Accordingly, predicted binding energies and interaction patterns should be interpreted as indicators of potential structural compatibility rather than evidence of biological activity, target modulation, or therapeutic efficacy.
Finally, the present work was designed as an exploratory and hypothesis-generating in silico study integrating transcriptomic profiling, network analysis, machine learning-based prioritization, and molecular docking. Additional validation using independent patient cohorts, molecular dynamics simulations, target-specific functional assays, and appropriate in vitro and in vivo models will be necessary to further evaluate the biological and translational relevance of the reported findings.