Article

DBCL-DFNet: Dual-Branch Contrastive Learning for Multi-Omics Dynamic Fusion

1 College of Computer Science and Technology, Taiyuan University of Technology, Taiyuan 030024, China
2 College of Artificial Intelligence, Taiyuan University of Technology, Taiyuan 030024, China
3 College of Artificial Intelligence, Wuhan University, Wuhan 430072, China
* Authors to whom correspondence should be addressed.
Entropy 2026, 28(6), 616; https://doi.org/10.3390/e28060616 (registering DOI)
Submission received: 8 May 2026 / Revised: 26 May 2026 / Accepted: 29 May 2026 / Published: 30 May 2026
(This article belongs to the Section Entropy and Biology)

Abstract

Multimodal omics data portray biological processes across molecular layers, yet their heterogeneity and high dimensionality hinder a unified representation. Existing integrative approaches either focus on local feature interactions or adopt static fusion, often overlooking the complementary global sequential context and the dynamic relevance among omics sources. Consequently, clinically critical tasks such as accurate cancer-subtype classification and therapy selection still lack sufficient accuracy and robustness. We introduce the Dual-Branch Contrastive Learning for Multi-Omics Dynamic Fusion Network (DBCL-DFNet), a dual-branch contrastive-learning framework that simultaneously encodes local heterogeneous graphs and global omics sequences, distills key features via contrastive objectives, and employs a dynamic attention mechanism for adaptive, data-driven fusion. Benchmarked on three public cancer multi-omics datasets, DBCL-DFNet outperforms both conventional machine-learning models and state-of-the-art deep-integration methods, establishing a competitive and reliable framework for multi-omics integration and demonstrating potential for precision-oncology decision-making. From an information-theoretic perspective, the framework integrates Copula-entropy-guided feature selection with mutual-information-maximizing contrastive alignment, providing a principled foundation for robust multi-omics integration.

1. Introduction

The rapid advancement of high-throughput sequencing technologies has generated an overwhelming amount of multi-omics data in life sciences, significantly enhancing our understanding of complex biological systems [1]. These technologies cover multiple dimensions, including genome-scale DNA sequencing, transcriptome mRNA expression profiling, epigenome DNA methylation detection, and small RNA (miRNA) expression analysis, among others [2]. Each type of omics data captures key aspects of biological processes from different perspectives, and their interconnections and interactions collectively drive disease progression and development [3]. The expanding dimensions of omics data provide a more comprehensive perspective for elucidating disease mechanisms, helping overcome the limitations of traditional genome-wide association studies (GWAS), such as identifying variants in non-coding sequences and revealing the value of important metabolic molecules in tumor progression [4,5]. Beyond increasing the dimensions of omics data, researchers are also committed to developing methods for integrating data from multiple omics layers. For example, integrating matched genomic and gene expression data enables the identification of genetic variants that influence gene expression levels across the genome, known as expression quantitative trait loci (eQTL) [6]. Fully leveraging these multi-omics data and analytical methods is an important direction for uncovering complex disease characteristics and advancing clinical applications.
Graph-based cancer subtype classification effectively captures complex relationships between samples that traditional methods often fail to utilize. By constructing patient similarity networks, graph models can reveal patient groups with similar molecular features, which is crucial for subtype identification [7]. Their visualization capabilities also help intuitively understand data structures and classification results [2].
Graph neural networks (GNNs) and graph attention networks (GATs) serve as core technologies for graph representation learning. Modeling omics data as a graph structure of biological entities and interactions allows GNNs to deeply explore intricate relationships within biological networks [8]. The attention mechanism introduced by GATs enables the model to learn the importance of different interactions and enhances its ability to focus on key biological linkages [7]. For example, the MOGAT framework integrates eight data types (mRNA, lncRNA, methylation, etc.) and improves cancer subtype prediction accuracy through a multi-head attention mechanism [8]; the AMOGEL model combines association rule mining (ARM) with GNNs to achieve better AUC scores than existing models in breast and kidney cancer data [6].
GNNs show great promise for multi-omics research, but there are still a number of obstacles to overcome. The first is the “curse of dimensionality–information loss” problem in high-dimensional data, which occurs when all original features are used directly to create a graph structure. This causes the graph size to increase exponentially and creates noise through erroneous correlations between redundant features. Conventional feature selection techniques may remove promising biomarkers and result in information loss even when they reduce dimensionality [9]. Moreover, current GAT variants mainly focus on direct interactions between nodes and fail to effectively integrate global information [6]. Second, homogeneous graphs may lead to a one-sided representation of feature associations: existing GAT models based on homogeneous graphs can only represent a single type of node or edge and cannot capture feature associations [10,11]. Third, static weights have insufficient adaptability for sample-level modality contributions: traditional multi-omics integration methods use fixed weights that cannot adapt to the varying importance of different modalities across samples [12,13].
From an information-theoretic perspective, the core challenge is to preserve discriminative information across heterogeneous omics while suppressing redundancy. To address this, we perform feature selection by combining the F3 score with Copula entropy. This hybrid strategy retains both class separability and information-theoretic relevance while reducing dimensionality. Furthermore, contrastive learning in our dual-branch encoder can be understood as maximizing a mutual-information lower bound between local graph views and global sequence views. Motivated by these insights, we design DBCL-DFNet as an information-aware multi-omics integration framework to overcome the above three limitations.
To address the above challenges, this paper employs heterogeneous graphs to model heterogeneous relationships, GAT+Mamba to capture local and global graph structures, a Transformer branch to preserve original sequence information, contrastive learning to align the dual branches, and dynamic attention to achieve sample-level adaptive fusion, thereby forming the Dual-Branch Contrastive Learning for Multi-Omics Dynamic Fusion Network (DBCL-DFNet) framework. The main contributions are as follows:
  • We propose a multi-omics integration framework named DBCL-DFNet, which offers a robust and effective solution for cancer subtype classification based on multi-omics integration.
  • The proposed model employs a dual-branch contrastive learning encoder to integrate local heterogeneous graphs with global sequences. This approach provides a unified perspective, effectively capturing critical features and their interrelationships, while simultaneously reducing graph complexity and preserving latent information.
  • The model incorporates a dynamic attention mechanism to fuse the outputs from multiple omics encoders, thereby addressing the limitation of static weighting schemes and enhancing adaptability to sample-specific modality contributions.

2. Method

The proposed multi-omics model for cancer subtype classification consists of three main modules, namely heterogeneous graph construction, the graph-sequence dual-branch structure, and the dynamic attention fusion mechanism. The overall workflow of our model is illustrated in Figure 1.

2.1. Heterogeneous Graph Construction

We construct a heterogeneous graph using carefully selected features to capture intricate patient-feature relationships while avoiding the computational overhead of a fully connected graph. Specifically, we create three complementary subgraphs: a patient similarity network, a feature similarity network, and a feature-patient network. These structures jointly reveal inter-patient affinities, intrinsic feature correlations, and associations between features and individual patients. The resulting heterogeneous graph encodes rich relational information that substantially enhances downstream node representation learning.
First, feature selection was conducted using the F3 score and Copula entropy to identify the most informative features from each omics modality [14,15]. The F3 score evaluates a feature’s discriminative power by measuring the degree of overlap between different classes, with higher scores indicating better separation. It is computed as follows:
$$F3_j = \frac{2}{C(C-1)} \sum_{(e_1, e_2)} \left( 1 - \frac{n_{\text{overlap}}}{n_{\text{total}}} \right)$$
where $C$ is the number of classes, $n_{\text{overlap}}$ is the number of overlapping samples between classes $e_1$ and $e_2$, and $n_{\text{total}}$ is the total number of samples in these two classes. Copula entropy measures the dependence between a feature and the target variable, with lower values indicating higher dependence. It is defined as:
$$\text{CE}(x_j, y) = -\int_0^1 \int_0^1 c(u, v) \log c(u, v) \, du \, dv$$
where $u$ and $v$ are the uniform-transformed variables of the marginal distributions of $x_j$ and $y$, and $c(u, v)$ is the corresponding Copula density function.
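The Copula entropy definition above can be approximated numerically. The following is a minimal sketch, assuming a rank transform for the marginals and a simple histogram estimate of the Copula density; the function names and the `bins` parameter are illustrative choices, not the authors' estimator. Ties are broken by sort order, which is crude for discrete class labels.

```python
import math


def ranks_to_uniform(values):
    """Map values to (0, 1) by rank, approximating the marginal CDF transform."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    u = [0.0] * len(values)
    for rank, idx in enumerate(order):
        u[idx] = (rank + 0.5) / len(values)
    return u


def copula_entropy(x, y, bins=4):
    """Histogram estimate of -integral of c(u, v) * log c(u, v) (assumed estimator)."""
    u, v = ranks_to_uniform(x), ranks_to_uniform(y)
    n = len(x)
    counts = {}
    for ui, vi in zip(u, v):
        cell = (min(int(ui * bins), bins - 1), min(int(vi * bins), bins - 1))
        counts[cell] = counts.get(cell, 0) + 1
    ce = 0.0
    for count in counts.values():
        p = count / n          # probability mass of the grid cell
        c = p * bins * bins    # piecewise-constant Copula density in that cell
        ce -= p * math.log(c)
    return ce
```

Under this estimator, a feature that is independent of the target gives a value near zero, while a strongly dependent feature gives a clearly negative value, which matches the statement that lower values indicate higher dependence.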
The F3 score and Copula entropy serve complementary roles in feature selection. The F3 score evaluates the discriminative power of a feature by measuring class separability based on sample overlap, which is intuitive and computationally efficient but primarily captures linear or rank-based separation. Copula entropy, in contrast, quantifies the statistical dependence between a feature and the target variable without assuming a specific functional form, thus capturing nonlinear and higher-order relationships that may be missed by the F3 score. By combining both measures, we retain features that are either well-separated in the original space or strongly dependent on the target in a potentially nonlinear manner. This hybrid strategy reduces dimensionality while minimizing the risk of discarding biologically relevant but nonlinearly associated features.
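To make the overlap-based F3 score concrete, here is a minimal sketch for a single feature. It assumes that a sample counts as overlapping when its value falls inside the intersection of the two classes' value ranges; the source does not spell out how the overlap is counted, so this rule and the function name `f3_score` are assumptions.

```python
from itertools import combinations


def f3_score(values, labels):
    """Average non-overlap ratio over all class pairs for one feature.

    Assumption: a sample overlaps when its value lies inside the
    intersection of the two classes' [min, max] ranges.
    """
    classes = sorted(set(labels))
    pairs = list(combinations(classes, 2))  # C(C-1)/2 class pairs
    total = 0.0
    for e1, e2 in pairs:
        v1 = [v for v, c in zip(values, labels) if c == e1]
        v2 = [v for v, c in zip(values, labels) if c == e2]
        lo, hi = max(min(v1), min(v2)), min(max(v1), max(v2))
        n_overlap = sum(lo <= v <= hi for v in v1 + v2)
        total += 1.0 - n_overlap / len(v1 + v2)
    return total / len(pairs)
```

Averaging over the class pairs is equivalent to the factor $2 / (C(C-1))$ in the formula, since there are $C(C-1)/2$ pairs.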
Subsequently, we constructed the heterogeneous graph to link the selected features and reveal their relationships. For each omics modality $X_m$, we computed the similarity matrix $S^{(m)} \in \mathbb{R}^{N \times N}$ between patients using the Pearson correlation coefficient:
$$S_{i,j}^{(m)} = \frac{\sum_{d=1}^{D_m} \left( x_{i,d}^{(m)} - \bar{x}_i^{(m)} \right) \left( x_{j,d}^{(m)} - \bar{x}_j^{(m)} \right)}{\sqrt{\sum_{d=1}^{D_m} \left( x_{i,d}^{(m)} - \bar{x}_i^{(m)} \right)^2} \, \sqrt{\sum_{d=1}^{D_m} \left( x_{j,d}^{(m)} - \bar{x}_j^{(m)} \right)^2}}$$
where $D_m$ is the number of features in the $m$-th omics modality, $x_i^{(m)} \in \mathbb{R}^{D_m}$ and $x_j^{(m)} \in \mathbb{R}^{D_m}$ are the feature vectors of patients $i$ and $j$, $x_{i,d}^{(m)}$ is the value of the $d$-th feature of patient $i$ in the $m$-th omics modality, and $\bar{x}_i^{(m)}$ is the mean of patient $i$’s features in the $m$-th omics modality. According to the sparsity rate, we selected entries in the similarity matrix that exceeded a certain threshold to construct the sparse adjacency matrix $A_m$. Similarly, we constructed the feature similarity graph.
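The patient similarity network described above can be sketched in a few lines. This example assumes that the sparsity rate is the fraction of off-diagonal similarities to discard, so the threshold is chosen to keep roughly the top `1 - sparsity` share; the source only says that entries above a threshold are kept, so this rule and the helper names are assumptions. Constant feature vectors would cause a division by zero and are not handled here.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def sparse_adjacency(X, sparsity):
    """Binary adjacency keeping about the top (1 - sparsity) off-diagonal similarities."""
    n = len(X)
    S = [[pearson(X[i], X[j]) for j in range(n)] for i in range(n)]
    off_diag = sorted(
        (S[i][j] for i in range(n) for j in range(i + 1, n)), reverse=True
    )
    keep = max(1, round((1.0 - sparsity) * len(off_diag)))
    threshold = off_diag[keep - 1]
    return [
        [1 if i != j and S[i][j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]
```

Applying the same function to the transposed matrix (features as rows) would give a feature similarity graph under the same assumptions.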
For the feature-patient association graph, we established connections between features and patients to represent the association between each feature and each patient. Specifically, for each feature $x_i$ and each patient $p_j$, we constructed an edge $(x_i, p_j)$ to indicate that feature $x_i$ belongs to patient $p_j$.
By integrating feature selection and heterogeneous graph construction, our model effectively captures the most informative features and their relationships within each omics modality, providing a solid foundation for subsequent multi-omics data analysis.

2.2. Graph-Sequence Dual-Branch Structure

In the graph branch, GAT dynamically assigns attention weights to the first-order neighbors of each node, thereby emphasizing local topological features. Meanwhile, Mamba arranges graph nodes into a sequential representation according to breadth-first traversal and leverages a selective state space model to capture long-range dependencies between distant node pairs, enabling effective modeling of global structural information [16,17]. The two modules extract complementary information from local and global perspectives, respectively, while avoiding functional redundancy. Their outputs are further fused through residual connections, allowing the resulting node representations to preserve both local contextual features and global structural dependencies. The detailed pipeline of the graph branch is illustrated in Figure 2.
Let $l$ denote the layer index, and let $Z_m^{(l-1)} \in \mathbb{R}^{N \times D_{l-1}}$ be the node feature matrix output by the dual-branch encoder for modality $m$ at layer $l-1$, where $N$ is the number of nodes and $D_{l-1}$ is the feature dimension. The $j$-th row $Z_{m,j}^{(l-1)}$ corresponds to the features of node $j$. For the omics modality $X_m$, GAT learns the structural representation $G_m^{(l)}$ at layer $l$ via the following attention-based convolution:
$$G_m^{(l)} = \sigma\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}} \alpha_{ij}^{(l)} W_m^{(l-1)} Z_{m,j}^{(l-1)} + Z_{m,i}^{(l-1)} \right)$$
Here, $W_m^{(l-1)}$ denotes the weight matrix of layer $l-1$. The term $\alpha_{ij}^{(l)}$ is the standard GAT attention coefficient computed by a shared attention mechanism. The input to this layer is given by $Z_m^{(l-1)}$. GAT learns the representation $G_m^{(l)}$ of layer $l$ from $Z_m^{(l-1)}$ and $W_m^{(l-1)}$.
For the omics modality $X_m$, Mamba learns the structural representation of layer $l$, denoted $M_m^{(l)}$, through the following operations.
After batch-wise alignment and zero-padding, the node features are organized into a dense sequence $X^{(l-1)} \in \mathbb{R}^{B \times T \times D}$, where $B$ is the batch size, $T$ is the maximum number of nodes in any single graph, and $D$ is the feature dimension. An accompanying binary mask in $\{0, 1\}^{B \times T}$ indicates valid nodes (1) and zero-padding (0). Leveraging the Structured State Space (S4) mechanism, Mamba models long-range dependencies within the sequence $X^{(l-1)}$. The core computation can be decomposed into the following steps.
Define the latent state $S_t \in \mathbb{R}^{H}$ with state dimension $H$, initialized as $S_0 = 0$ (the zero vector). The state evolution follows discretized continuous-time state-space dynamics:
$$S_t = A S_{t-1} + B X_t^{(l-1)}$$
Here, $A \in \mathbb{R}^{H \times H}$ is the state-transition matrix, and $B \in \mathbb{R}^{H \times D}$ is the input projection matrix (not to be confused with the batch size $B$). The output at step $t$ is computed as:
$$O_t = C S_{t-1} + D_{\text{ss}} X_t^{(l-1)}$$
Here, $C \in \mathbb{R}^{D \times H}$ denotes the projection matrix from the hidden state to the output, and $D_{\text{ss}} \in \mathbb{R}^{D \times D}$ denotes the direct input-to-output mapping matrix (note that $D_{\text{ss}}$ is distinct from the feature dimension $D$).
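The recurrence above can be traced step by step with a short sketch. It implements the two update equations exactly as written, with fixed matrices and a single unbatched sequence. This is a plain linear state-space scan for illustration only: Mamba's selective mechanism makes some of these parameters input-dependent, which is not modeled here, and the function names are hypothetical.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def ssm_scan(X, A, B, C, D_ss):
    """Run S_t = A S_{t-1} + B X_t and O_t = C S_{t-1} + D_ss X_t over a sequence."""
    state = [0.0] * len(A)  # S_0 is the zero vector
    outputs = []
    for x_t in X:
        prev_state = state
        # State update from the previous state and the current input.
        state = [a + b for a, b in zip(matvec(A, prev_state), matvec(B, x_t))]
        # Output from the previous state plus the direct input path.
        outputs.append(
            [c + d for c, d in zip(matvec(C, prev_state), matvec(D_ss, x_t))]
        )
    return outputs
```

With scalar state and input, `A = [[0.5]]`, `B = [[1.0]]`, `C = [[2.0]]`, `D_ss = [[1.0]]`, and a constant input of 1, the states are 1, 1.5, 1.75 and the outputs are 1, 3, 4, which shows how earlier inputs keep influencing later outputs through the state.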
After aggregating the per-step outputs $\{O_t\}_{t=1}^{T}$ into the full-sequence encoding $Y_m^{(l)} = \{O_t\}_{t=1}^{T} \in \mathbb{R}^{B \times T \times D}$, we apply the mask to keep only the representations of valid nodes, perform dropout on the resulting $\tilde{Y}_m^{(l)}$, and finally add the processed vectors back to the layer input $Z_m^{(l-1)}$ via a residual connection, producing the layer-$l$ representation $M_m^{(l)}$:
$$M_m^{(l)} = \text{Dropout}\left( \tilde{Y}_m^{(l)} \right) + Z_m^{(l-1)}$$
Finally, the representation $Z_m^{(l)}$ is obtained by fusing the outputs of GAT and Mamba:
$$Z_m^{(l)} = M_m^{(l)} + G_m^{(l)}$$
The input to the graph branch consists of heterogeneous graph nodes after feature selection, which reduces dimensionality but may also discard some global distribution information. To address this, we retain the original sequence data (without feature selection) and feed it separately into a Transformer branch [18]. This branch directly captures long-range dependencies among original features via multi-head self-attention, and its output complements that of the graph branch: the graph branch provides structured topological information, while the Transformer branch provides global patterns of the original sequence. The two branches are aligned via a contrastive loss, thereby fusing the two types of information.
For the omics modality $X_m$, the Transformer learns the $l$-th layer representation as follows.
The Transformer leverages multi-head self-attention to capture long-range dependencies among sequence elements, serving as the core component of feature processing. Given the $l$-th layer input $S_m^{(l-1)} \in \mathbb{R}^{B \times T \times D_{l-1}}$, where $B$ is the batch size, $T$ is the sequence length, and $D_{l-1}$ is the feature dimension, learnable projection matrices are applied to generate queries $Q_l$, keys $K_l$, and values $V_l$.
We then compute the similarity between queries and keys, normalize it via Softmax to obtain attention weights, and simultaneously apply the mask $\text{mask}^{attn}$ to suppress contributions from padded positions:
$$A_l = \text{Softmax}\left( \frac{Q_l K_l^{\top}}{\sqrt{D_{l-1}}} \odot \text{mask}^{attn} \right)$$
Next, the representation is split into multiple attention heads, the outputs from all heads are aggregated, and the original feature dimension is restored:
$$\text{MHSA}\left( S_m^{(l-1)} \right) = \text{Concat}(\text{head}_1, \ldots, \text{head}_h) \, W_o^{(l)}$$
where $h$ denotes the number of attention heads and $W_o^{(l)}$ is the aggregation matrix. Here, $\text{head}_i = \text{Attention}(Q_i, K_i, V_i)$ with head-specific projections.
To preserve historical feature information and accommodate dimension changes across layers, residual connections along with projection operations are introduced:
$$\text{Proj}\left( S_m^{(l-1)} \right) = \begin{cases} W_{res}^{(l)} S_m^{(l-1)} + b_{res}^{(l)}, & D_{l-1} \neq D_l \\ S_m^{(l-1)}, & D_{l-1} = D_l \end{cases}$$
where $D_l$ is the feature dimension of the $l$-th layer. When the dimensions differ between layers ($D_{l-1} \neq D_l$), a linear projection matrix $W_{res}^{(l)} \in \mathbb{R}^{D_l \times D_{l-1}}$ together with bias $b_{res}^{(l)} \in \mathbb{R}^{D_l}$ is applied to align the feature dimensions; otherwise, the features are passed through unchanged.
Subsequently, layer normalization is applied to the sum of the attention output and the residual connection, stabilizing training and accelerating convergence:
$$S_m^{(l),\text{norm}} = \text{LN}\left( \text{MHSA}\left( S_m^{(l-1)} \right) + \text{Proj}\left( S_m^{(l-1)} \right) \right)$$
Ultimately, a two-layer feed-forward network enriches nonlinear feature expressiveness and distills higher-level semantics, yielding the final layer-$l$ output $S_m^{(l)}$ of the Transformer.
For the loss function, we adopt a cooperative optimization strategy that jointly employs a classification loss and a contrastive loss [19]. Concretely, the adopted contrastive loss (InfoNCE) can be interpreted as maximizing a lower bound of the mutual information between the graph-branch representation and the Transformer-branch representation, thereby encouraging information-theoretic alignment across the two views.
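The mutual-information reading of InfoNCE can be stated explicitly. As a sketch, assume one positive pair and $B - 1$ in-batch negatives per anchor, and write $Z^{g}$ and $Z^{s}$ for the graph-branch and Transformer-branch representations (this notation is ours and is introduced only for illustration). The standard InfoNCE bound then reads:

```latex
I\left(Z^{g}; Z^{s}\right) \;\geq\; \log B - \mathcal{L}_{\text{InfoNCE}}
```

Lowering the contrastive loss therefore raises the lower bound on the mutual information between the two views. Because the loss is non-negative, the bound itself cannot exceed $\log B$.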
Taking modality m as an example, the loss computation proceeds as follows.
The classification loss is used to supervise the model in learning discriminative features for category distinction. For modality $m$, the predicted logits $O_m^{(l)} \in \mathbb{R}^{B \times C}$ (with $C$ classes) are combined with the ground-truth labels $y \in \mathbb{R}^{B}$ and the valid-sample mask in $\{0, 1\}^{B}$ to compute the classification loss:
$$\mathcal{L}_{cls,m}^{(l)} = -\frac{1}{\sum_{i=1}^{B} \text{mask}_i} \sum_{i=1}^{B} \text{mask}_i \sum_{c=1}^{C} y_{i,c} \log(p_{i,c})$$
Here, $\sum_{i=1}^{B} \text{mask}_i$ represents the number of valid samples, $y_{i,c} \in \{0, 1\}$ is the true label indicating whether the $i$-th sample belongs to class $c$, and $p_{i,c}$ is the predicted probability obtained by applying Softmax to the logits $O_m^{(l)}$.
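The masked classification loss can be sketched as follows. The example assumes that labels are given as class indices, which is equivalent to the one-hot $y_{i,c}$ in the formula; the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_cross_entropy(logits, labels, mask):
    """Mean negative log-likelihood over samples whose mask entry is 1."""
    n_valid = sum(mask)
    loss = 0.0
    for row, label, keep in zip(logits, labels, mask):
        if keep:
            loss -= math.log(softmax(row)[label])
    return loss / n_valid
```

Masked-out samples contribute nothing to either the numerator or the normalizing count, so padded entries do not dilute the average.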
The two sets of features for the contrastive loss come from two different processing paths: View 1 is the encoding result of the graph branch (GAT + Mamba) on the feature-selected heterogeneous graph nodes; View 2 is the encoding result of the Transformer branch on the original sequence data without feature selection. The contrastive loss forces the representations of these two views for the same sample to be as similar as possible, thereby enabling the model to fuse structured topological information with raw global sequence information and enhancing the discriminative power of the features.
For the $m$-th modality, given two sets of features $T_m^{(l)} \in \mathbb{R}^{B \times T \times D_l}$ from the graph branch and $\hat{T}_m^{(l)} \in \mathbb{R}^{B \times T \times D_l}$ from the Transformer branch, the process is as follows.
After performing $L_2$ normalization on the features, we obtain $T_m^{\text{norm}}$ and $\hat{T}_m^{\text{norm}}$. Then, the similarity matrix of the normalized features is computed:
$$\text{Sim}_{i,j} = \frac{\left( T_m^{\text{norm}} \right)_i \cdot \left( \hat{T}_m^{\text{norm}} \right)_j^{\top}}{\tau}$$
Here, $\left( T_m^{\text{norm}} \right)_i$ denotes the $i$-th sample feature vector of $T_m^{\text{norm}}$, and $\left( \hat{T}_m^{\text{norm}} \right)_j^{\top}$ represents the transpose of the $j$-th sample feature vector of $\hat{T}_m^{\text{norm}}$. $\tau$ is the temperature parameter. A mask $\text{mask}^{attn} \in \{0, 1\}^{B \times B}$ is introduced to mask out padded positions. The attention weights are then computed via the Softmax function:
$$A_{m,i,j} = \frac{\exp\left( \text{Sim}_{i,j} \times \text{mask}_{i,j}^{attn} \right)}{\sum_{k} \exp\left( \text{Sim}_{i,k} \times \text{mask}_{i,k}^{attn} \right)}, \quad i, j \in [1, B]$$
where $A_{m,i,j}$ is the attention weight between sample $i$ of the first view and sample $j$ of the second view for modality $m$.
The contrastive loss is computed using cross-entropy, with the labels defined as the indices of the diagonal elements:
$$\mathcal{L}_{cont,m}^{(l)} = -\frac{1}{B} \sum_{i=1}^{B} \log\left( A_{m,i,i} \right)$$
The total loss is obtained by combining the classification loss and the contrastive loss, with $\lambda_1$ and $\lambda_2$ denoting the corresponding loss weights:
$$\mathcal{L}_{total,m}^{(l)} = \lambda_1 \mathcal{L}_{cls,m}^{(l)} + \lambda_2 \mathcal{L}_{cont,m}^{(l)}$$
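The contrastive and total losses can be illustrated with a small sketch. It assumes one pooled feature vector per sample in each view and a mask of all ones (no padded samples); the default temperature and loss weights are placeholders, not values from the paper.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(view1, view2, tau=0.5):
    """Contrastive loss where row i of view1 and row i of view2 are the positive pair."""
    z1 = [l2_normalize(v) for v in view1]
    z2 = [l2_normalize(v) for v in view2]
    loss = 0.0
    for i, zi in enumerate(z1):
        sims = [sum(a * b for a, b in zip(zi, zj)) / tau for zj in z2]
        m = max(sims)
        # Log of the softmax denominator, computed stably.
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        loss -= sims[i] - log_denominator
    return loss / len(z1)


def total_loss(cls_loss, cont_loss, lambda1=1.0, lambda2=1.0):
    """Weighted sum of the classification and contrastive losses."""
    return lambda1 * cls_loss + lambda2 * cont_loss
```

When the two views of each sample agree, the diagonal similarities dominate and the loss is small; swapping the rows of one view makes the positives the least similar pairs and the loss grows accordingly.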

2.3. Dynamic Attention Fusion Mechanism

Unlike static fusion strategies such as simple concatenation or fixed weighting, dynamic attention fusion generates modality-specific weights for each sample individually, thereby accommodating sample-specific differences in the contributions of different omics modalities [20,21]. Concretely, cross-modal multi-head attention is used to compute interactions among modality features, which are then passed through a two-layer fully connected network to produce sample-specific weight vectors. The weights are normalized via Softmax and used to adaptively weight the modality features. This allows the omics modality that is more discriminative for a given sample to obtain a higher fusion weight.
The fusion module learns the fused feature representation for classification prediction through the following operations. For the $m$-th omics input $\text{input}_1[m] \in \mathbb{R}^{B \times D_m}$ and $\text{input}_2[m] \in \mathbb{R}^{B \times D_m}$, where $B$ denotes the batch size and $D_m$ represents the feature dimension of the $m$-th omics modality, the features are first concatenated and projected for enhancement. Subsequently, the enhanced features from all omics are stacked to form a sequence $\text{features} \in \mathbb{R}^{B \times M \times D_{\text{hid}}}$, where $M$ denotes the number of omics modalities and $D_{\text{hid}}$ represents the dimension of the enhanced features.
Based on the multi-head attention mechanism, cross-modal interactions are performed on the multi-omics feature sequence. The query $Q$, key $K$, and value $V$ are all defined as $\text{features}$. The multi-head attention is calculated as follows:
$$\text{head}_i = \text{Softmax}\left( \frac{Q_i K_i^{\top}}{\sqrt{D_{\text{hid}} / H}} \right) V_i$$
$$\text{attn\_output} = \text{Concat}(\text{head}_1, \ldots, \text{head}_H) \, W_o$$
where $H$ is the number of attention heads, $W_o \in \mathbb{R}^{D_{\text{hid}} \times D_{\text{hid}}}$ is the aggregation matrix, and $\text{attn\_output} \in \mathbb{R}^{B \times M \times D_{\text{hid}}}$ denotes the attention output.
Based on the multimodal feature sequence, modality-specific weights are computed by a weight-generation network that first flattens the sequence into a vector representation, processes it with a two-layer feed-forward network, and finally applies a Softmax to yield $\text{weight}_{\text{dynamic}} \in \mathbb{R}^{B \times M}$.
Finally, after obtaining $W_{\text{output}}$ by weighting the attention output with $\text{weight}_{\text{dynamic}}$, $W_{\text{output}}$ is passed through a two-layer feed-forward network that maps it into the classification space, producing the final classification logits in $\mathbb{R}^{B \times C}$, where $C$ denotes the number of classes.
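The sample-specific weighting step can be sketched for a single sample. The example assumes that "weighting the attention output" means a weighted sum over the modality axis, and that `weight_scores` stands in for the raw outputs of the weight-generation network; both are assumptions, since the source does not give the exact reduction.

```python
import math


def softmax(scores):
    """Numerically stable softmax over modality scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_fusion(modality_features, weight_scores):
    """Fuse M modality vectors for one sample using softmax-normalized weights."""
    weights = softmax(weight_scores)
    dim = len(modality_features[0])
    fused = [
        sum(w * feats[d] for w, feats in zip(weights, modality_features))
        for d in range(dim)
    ]
    return weights, fused
```

Equal scores reduce to a plain average of the modalities, while a large score for one modality lets that modality dominate the fused vector for that sample only, which is the behavior that distinguishes dynamic fusion from fixed weighting.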
The idea of adapting to changing conditions rather than following a fixed schedule also appears in dynamic event-triggered control [22,23], which is conceptually related at a high level to our sample-specific dynamic weighting.

3. Experiments

3.1. Datasets

To validate our model, we applied it to three real-world cancer multi-omics datasets: the LGG dataset for classifying low-grade gliomas (Grade II and Grade III) [24], the RCC dataset for classifying renal cell carcinomas (KICH, KIRC, and KIRP) [25], and the BLCA dataset for classifying bladder urothelial carcinoma (low-grade vs. high-grade) [26]. The omics data were obtained from the TCGA cohort via the UCSC Xena platform [27]. Each dataset included DNA methylation, mRNA expression, and miRNA expression data, with matched samples only. Preprocessing involved handling missing values, normalizing features, and removing low-variance features. The details of the datasets are provided in Table 1.

3.2. Experimental Setup

To avoid information leakage, all data preprocessing steps, including Copula entropy and F3-score feature selection, as well as heterogeneous graph construction, were performed strictly within the training set of each fold; the test set did not participate in any preprocessing step. Specifically, we adopted stratified five-fold cross-validation: each dataset was randomly divided into five folds according to class proportions. In each round, one fold was used as the test set and the remaining four folds were used as the training set, resulting in five evaluation rounds. For each fold, feature selection and heterogeneous graph construction were first performed on the training set, and then the selected features and graph structure were mapped to the corresponding test set. All deep learning algorithms were implemented using the PyTorch 2.1.0 framework, and experiments were conducted on a Linux operating system with a vGPU-32GB GPU. The final results were reported as the mean ± standard deviation over the five test folds.
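The leakage-free protocol described above can be sketched as a stratified split plus a loop that fits all preprocessing on training indices only. The callbacks `fit_preprocessing` and `evaluate` are hypothetical placeholders for feature selection, graph construction, and model scoring; the seed of 42 mirrors the value reported for the experiments.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=42):
    """Assign each sample index to one of k folds, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validate(labels, fit_preprocessing, evaluate, k=5, seed=42):
    """Fit preprocessing on training indices only, then score the held-out fold."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        state = fit_preprocessing(train_idx)  # e.g. selected features, graph
        scores.append(evaluate(state, train_idx, test_idx))
    return scores
```

Because `fit_preprocessing` never receives test indices, anything it learns (selected features, similarity thresholds) cannot depend on the held-out fold.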
For the evaluation of binary classification results in cancer multi-omics data, we used Accuracy, AUROC, Recall, Precision, Specificity, NPV, and F1 score to assess the classification outcomes. For the evaluation of multi-class classification results in cancer multi-omics data, we used Accuracy, Macro F1, Micro F1, Weighted F1, Precision, and Recall to assess the classification outcomes.
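For reference, the threshold-based binary metrics listed above follow directly from the confusion matrix. The sketch below assumes labels coded as 0 and 1 and returns 0.0 when a denominator is zero, which is a convention chosen here; AUROC is omitted because it requires predicted scores rather than hard labels.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics for labels coded as 0 (negative) and 1 (positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "npv": safe_div(tn, tn + fn),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }
```

On imbalanced data, Accuracy can stay high while Precision or Recall for the minority class is poor, which is why the full set of metrics is reported.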

3.3. Hyperparameters

To improve reproducibility, we summarized the key hyperparameter settings of DBCL-DFNet on the three datasets in Table 2. These hyperparameters include training-related parameters, branch-specific dropout rates, attention heads and layers, graph-construction parameters, and loss weights.
The feature sparsity rates for DNA methylation, mRNA expression, and miRNA expression were fixed at 0.9, 0.9, and 0.8, respectively, across all datasets. The random seed was fixed at 42 for reproducibility. For contrastive learning, the graph-branch representation and Transformer-branch representation of the same patient were treated as a positive pair, whereas representations from different patients within the same mini-batch were treated as negative pairs.
The hyperparameters in Table 2 were selected according to validation performance on the training folds. Dataset-specific hyperparameters, such as learning rate, dropout rate, GAT heads, and patient sparsity rate, were tuned because the three datasets differ in sample size, class distribution, and feature dimensionality. All hyperparameter selection was performed only on the training folds, and the test folds were not used for model selection or parameter tuning.

4. Results and Discussion

4.1. Evaluation of Multi-Omics Classification Performance

To effectively evaluate the classification accuracy of our model, we compared it with ten other multi-omics integration models. These included the K-Nearest Neighbors classifier (KNN) [28], Random Forest classifier (RF) [29], eXtreme Gradient Boosting (XGBoost) [25], Multi-Omics Graph Convolutional Network (MOGONET) [30], Graph Convolutional Network (GCN) [31], Cancer Molecular Subtype Diagnosis model (CancerSD) [32], Multi-Omics Dynamic Learning Integration Network (TMODINET) [33], Multi-Omics Hypergraph Integration Network (MORE) [34], Integrative Graph Convolution Networks (IGCN) [35], and SMODA [36].
The detailed comparison of classification results is shown in Table 3. Our model achieved the best performance on most evaluation metrics across the three datasets. On the LGG dataset, its accuracy was 3.2 percentage points higher than that of the second-best model (0.741 vs. 0.709). On the RCC dataset, the margin was 1.4 percentage points (0.980 vs. 0.966). On the highly imbalanced BLCA dataset, despite the small number of positive samples, our model still outperformed the other machine learning and deep learning methods, with an accuracy 1.2 percentage points higher (0.976 vs. 0.964) and an F1 score 5.6 percentage points higher (0.868 vs. 0.812) than those of the second-best model. These results demonstrate that our model achieves competitive performance in both binary and multi-class classification tasks.

4.2. Ablation Study of Key Modules

To validate the effectiveness of each module, we conducted ablation experiments on three datasets using different variants of our proposed model. The settings are as follows: First, we directly constructed a patient similarity network and named this variant W/oHE. Second, to verify the effectiveness of the dual-branch structure, we performed the following operations: removing the Transformer branch (W/oTR), removing the GAT module from the graph branch (W/oGA), removing the Mamba module from the graph branch (W/oMA), and retaining only the Transformer branch (W/oGR). Third, we used only the original classification loss and named this variant W/oCL. Fourth, we replaced the dynamic attention mechanism with VCDN and named this variant W/oDA. Table 4 presents the classification results of the proposed model under different configurations.
The results show that the complete model (Ours) matches or outperforms every variant on all datasets and metrics. Taking Accuracy on the LGG dataset as an example, the complete model achieves 0.741. Removing only GAT (W/oGA) reduces it to 0.703, and removing only Mamba (W/oMA) reduces it to 0.697; the similar drops indicate that GAT and Mamba each contribute complementary local and global information. Removing the entire graph branch (W/oGR) further reduces Accuracy to 0.674, and removing the Transformer branch (W/oTR) reduces it to 0.665, demonstrating that the graph branch and the sequence branch work synergistically and are both indispensable. After replacing the dynamic attention mechanism with VCDN (W/oDA), Accuracy on LGG, RCC, and BLCA drops to 0.669, 0.957, and 0.952, respectively, all lower than the complete model, confirming the effectiveness of sample-level dynamic weights. Removing the contrastive loss (W/oCL) also leads to performance degradation on all datasets (e.g., Accuracy on BLCA drops from 0.976 to 0.962), indicating that contrastive learning enhances the alignment of dual-branch features. The above ablation experiments not only verify the complementary strengths of each module but also provide empirical support for the overall effectiveness and robustness of the complete framework.
A closer look at Table 4 on the BLCA dataset reveals that W/oDA achieves the same Recall (0.720) as the full model but a lower F1 score (0.783 vs. 0.868). This implies that removing dynamic attention does not affect the identification of positive samples but leads to a drop in Precision (0.550 vs. 0.833), i.e., more false positives. The BLCA dataset is highly imbalanced (397 high-grade vs. 21 low-grade), which exacerbates over-prediction of the majority class. The dynamic attention mechanism adaptively reweights modality features per sample, helping to reduce false positives by focusing on discriminative cues. Without it, the model suffers from increased false positives and thus lower Precision and F1, while Recall stays unchanged.

4.3. Model Performance Across Different Omics Data Types

To demonstrate the necessity of integrating multi-omics data, we conducted experiments using different modality combinations, including DNA, mRNA, miRNA, DNA + mRNA, DNA + miRNA, mRNA + miRNA, and DNA + mRNA + miRNA. The performance comparison under different settings is shown in Figure 3.
Clearly, the best results were achieved when DNA methylation, mRNA expression, and miRNA expression were simultaneously fed into the model, demonstrating that it effectively exploits cross-omics complementarity to capture biological signals inaccessible to any single modality. These results further confirm that integrating more comprehensive molecular information improves model performance and supports its potential utility in clinical research.
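The seven settings compared in this experiment are exactly the non-empty subsets of the three modalities. A minimal sketch of how such an evaluation grid could be enumerated is given below; the function name and the label format are illustrative assumptions, not part of the released code.

```python
from itertools import combinations

MODALITIES = ("DNA", "mRNA", "miRNA")


def modality_subsets(modalities):
    """Return every non-empty subset of modalities, smallest subsets first."""
    subsets = []
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            subsets.append(" + ".join(combo))
    return subsets


settings = modality_subsets(MODALITIES)
for label in settings:
    print(label)
```

With three modalities this yields 2^3 - 1 = 7 settings, in the same order as listed in the text.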

4.4. Interpretability Analysis Based on Dynamic Modality Weights

To further investigate the biological interpretability of DBCL-DFNet, we analyzed the dynamic attention weights learned by the multi-omics fusion module. Unlike static fusion strategies that assign fixed contributions to different omics modalities, the proposed dynamic attention mechanism generates sample-specific modality weights. Therefore, the learned weights can be regarded as modality-level indicators reflecting the relative contributions of DNA methylation, mRNA expression, and miRNA expression to the final prediction.
As shown in Figure 4, the learned dynamic weights reveal distinct modality-contribution patterns across different datasets and class labels. The values are reported in the order of DNA methylation, mRNA expression, and miRNA expression.
For the LGG dataset, the average weights are 0.428, 0.391, and 0.181 for Grade II samples, and 0.451, 0.387, and 0.162 for Grade III samples. These results indicate that both LGG grades assign higher weights to DNA methylation and mRNA expression than to miRNA expression.
For the RCC dataset, KICH shows the highest contribution from mRNA expression, with an average weight of 0.475. In contrast, KIRC and KIRP assign relatively higher weights to miRNA expression, with average weights of 0.405 and 0.519, respectively, suggesting subtype-specific modality preferences.
For the BLCA dataset, the average weights are 0.413, 0.171, and 0.416 for high-grade samples, and 0.417, 0.218, and 0.365 for low-grade samples. These results suggest that both BLCA grades rely more on DNA methylation and miRNA expression than on mRNA expression.
Overall, these observations indicate that DBCL-DFNet does not rely on fixed modality contributions but adaptively captures cancer- and subtype-specific molecular evidence. Therefore, the dynamic modality weights provide a quantitative modality-level explanation for the model’s classification behavior and offer useful clues for subsequent biological interpretation and precision-oncology research.
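The class-wise averages reported above can be sketched as follows. This is a minimal illustration that assumes the per-sample modality weights come from a softmax over three modality scores, which is consistent with the reported weights summing to 1 but is not confirmed by this section. The scores and labels are hypothetical.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_weights_by_class(scores_per_sample, labels):
    """Average the sample-level modality weights within each class label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for scores, label in zip(scores_per_sample, labels):
        for i, w in enumerate(softmax(scores)):
            sums[label][i] += w
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


# Hypothetical attention scores in the order (DNA, mRNA, miRNA).
scores = [[1.2, 1.0, 0.1], [0.9, 1.1, 0.2], [1.0, 0.2, 1.1], [0.8, 0.1, 1.3]]
labels = ["A", "A", "B", "B"]

avg = mean_weights_by_class(scores, labels)
for lab, weights in avg.items():
    print(lab, [round(w, 3) for w in weights])
```

Since each sample's weights sum to 1, the class averages also sum to 1, which makes them directly comparable across classes and datasets.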

5. Conclusions

This study proposes the DBCL-DFNet framework to address core challenges in multi-omics data integration. The framework performs feature selection using the F3 score and Copula entropy, constructs a heterogeneous graph model integrating feature similarity, patient similarity, and feature-patient networks, and employs graph-sequence dual-branch learning with GAT-Mamba and Transformer to model both local and global features, aligning dual-branch representations via a contrastive loss. Additionally, it achieves adaptive multi-omics fusion through a dynamic attention mechanism. Experiments on three different cancer datasets demonstrate that the proposed method outperforms both conventional machine learning techniques and state-of-the-art deep learning-based multi-omics integration models. Quantitatively, compared with the best-performing state-of-the-art methods, DBCL-DFNet improves accuracy by 3.2 percentage points on LGG, 1.4 on RCC, and 1.2 on BLCA, and increases the F1 score by 5.6 percentage points on the highly imbalanced BLCA dataset.
Overall, DBCL-DFNet combines information-theoretic principles with deep multi-modal fusion, providing a reproducible and robust solution for cancer subtype classification. The framework establishes a systematic multi-omics data integration pipeline covering “feature selection–graph-sequence modeling–dynamic fusion”, enabling trustworthy and in-depth integration of multi-omics data. From a real-world application perspective, DBCL-DFNet can serve as a computational decision-support tool for precision oncology by assisting multi-omics-based cancer subtype classification, patient stratification, and subtype-related risk assessment. Rather than replacing pathological diagnosis or clinical judgment, it is expected to provide complementary molecular evidence for clinicians and biomedical researchers.
Although the proposed method achieves promising classification performance in cancer subtype identification, its generalization ability and fine-grained biological interpretability still have room for improvement. In future work, we will optimize the model structure and incorporate pathway-level enrichment analysis, biomarker discovery, and graph relationship interpretation to explore the biological significance of the learned feature representations, thereby improving the clinical applicability and interpretability of the model.

Author Contributions

Y.D.: Writing—original draft, Visualization, Formal analysis. X.Y.: Writing—original draft. L.Z.: Writing—Funding acquisition. D.L.: Writing—Supervision, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32470678; the Consulting Research Project of Chinese Academy of Engineering, grant number 2020SX6; the Basic Research Program of Shanxi Province, grant numbers 202303021211069 and 202403021222070; and the Key Research and Development Program of Shanxi Province, grant number 202402020101008.

Data Availability Statement

DBCL-DFNet is available at https://github.com/dangyun943/DBCL-DFNet (accessed on 28 May 2026).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Baião, A.R.; Cai, Z.; Poulos, R.C.; Robinson, P.J.; Reddel, R.R.; Zhong, Q.; Vinga, S.; Gonçalves, E. A technical review of multi-omics data integration methods: From classical statistical to deep generative approaches. Brief. Bioinform. 2025, 26, bbaf355.
  2. Barylli, M.; Saha, J.; Buffart, T.E.; Koster, J.; Lenos, K.J.; Vermeulen, L.; Sheraton, V.M. Biological Multi-Layer and Single Cell Network-Based Multiomics Models—A Review. arXiv 2025, arXiv:2503.09568.
  3. Hawkes, G.; Chundru, K.; Jackson, L.; Patel, K.A.; Murray, A.; Wood, A.R.; Wright, C.F.; Weedon, M.N.; Frayling, T.M.; Beaumont, R.N. Whole-genome sequencing analysis identifies rare, large-effect noncoding variants and regulatory regions associated with circulating protein levels. Nat. Genet. 2025, 57, 626–634.
  4. Song, T.; Shi, Y.; Li, Y.; Hao, D.; Zhan, K.; Xu, T.; Chen, R.; He, S. TOAnnoPriDB: An integrative database for trans-omic annotation and prioritization of non-coding variants across human genome. Sci. Bull. 2025, 70, 1757–1760.
  5. Strober, B.J.; Zhang, M.J.; Amariuta, T.; Rossen, J.; Price, A.L. Fine-mapping causal tissues and genes at disease-associated loci. Nat. Genet. 2025, 57, 42–52.
  6. Tanvir, R.B.; Islam, M.M.; Sobhan, M.; Luo, D.; Mondal, A.M. MOGAT: A Multi-Omics Integration Framework Using Graph Attention Networks for Cancer Subtype Prediction. Int. J. Mol. Sci. 2024, 25, 2788.
  7. Tabakhi, S.; Vandermeulen, C.; Sudbery, I.; Lu, H. Heterogeneous Graph Attention Network Improves Cancer Multiomics Integration. arXiv 2024, arXiv:2408.02845.
  8. Choi, J.M.; Chae, H. moBRCA-net: A breast cancer subtype classification framework based on multi-omics attention neural networks. BMC Bioinform. 2023, 24, 169.
  9. Fang, Z.; Zhang, X.; Zhao, A.; Li, X.; Chen, H.; Li, J. Recent Developments in GNNs for Drug Discovery. arXiv 2025, arXiv:2506.01302.
  10. Wang, W.; Chen, H. Predicting miRNA-disease associations based on lncRNA–miRNA interactions and graph convolution networks. Brief. Bioinform. 2023, 24, bbac495.
  11. Li, X.; Ma, J.; Leng, L.; Han, M.; Li, M.; He, F.; Zhu, Y. MoGCN: A Multi-Omics Integration Method Based on Graph Convolutional Network for Cancer Subtype Analysis. Front. Genet. 2022, 13, 806842.
  12. Sammut, S.J.; Crispin-Ortuzar, M.; Chin, S.-F.; Provenzano, E.; Bardwell, H.A.; Ma, W.; Cope, W.; Dariush, A.; Dawson, S.-J.; Abraham, J.E.; et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature 2022, 601, 623–629.
  13. Pan, Y.; Lei, X.; Zhang, Y.C. Association predictions of genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, radiomics, drug, symptoms, environment factor, and disease networks: A comprehensive approach. Med. Res. Rev. 2022, 42, 441–461.
  14. Durante, F.; Sempi, C. Copula Theory: An Introduction. In Copula Theory and Its Applications; Springer: Berlin/Heidelberg, Germany, 2010; pp. 3–31.
  15. Güneş, S.; Polat, K.; Yosunkaya, Ş. Multi-class f-score feature selection approach to classification of obstructive sleep apnea syndrome. Expert Syst. Appl. 2010, 37, 998–1004.
  16. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903.
  17. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
  18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2017); MIT Press: Cambridge, MA, USA, 2017; Volume 30, pp. 5998–6008.
  19. van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748.
  20. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473.
  21. Guo, H.; Jin, X.; Jiang, Q.; Wozniak, M.; Wang, P.; Yao, S. DMF-Net: A Dual Remote Sensing Image Fusion Network Based on Multiscale Convolutional Dense Connectivity With Performance Measure. IEEE Trans. Instrum. Meas. 2024, 73, 4501015.
  22. Zhang, D.; Meng, L.; Liang, L.; Qin, C.; Liu, D. Dynamic Event-Triggered Control for Human–Machine Cooperative Systems Based on Dynamic Authority Allocation. IEEE Trans. Syst. Man Cybern. Syst. 2026, 56, 3733–3744.
  23. Zhang, D.; Hao, Y.; Yuan, Q.; Qin, C. Dynamic event-triggered approximate optimal consensus control for unknown nonlinear multi-agent systems via adaptive dynamic programming. ISA Trans. 2026, 172, 21–32.
  24. The Cancer Genome Atlas Research Network. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N. Engl. J. Med. 2015, 372, 2481–2498.
  25. Chen, F.; Zhang, Y.; Şenbabaoğlu, Y.; Ciriello, G.; Yang, L.; Reznik, E.; Shuch, B.; Micevic, G.; De Velasco, G.; Shinbrot, E.; et al. Multilevel Genomics-Based Taxonomy of Renal Cell Carcinoma. Cell Rep. 2016, 14, 2476–2489.
  26. The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 2014, 507, 315–322.
  27. Goldman, M.J.; Craft, B.; Hastie, M.; Repečka, K.; McDade, F.; Kamath, A.; Banerjee, A.; Luo, Y.; Rogers, D.; Brooks, A.N.; et al. Visualizing and interpreting cancer genomics data via the Xena platform. Nat. Biotechnol. 2020, 38, 675–678.
  28. Uddin, S.; Haque, I.; Lu, H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256.
  29. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
  30. Wang, T.; Shao, W.; Huang, Z.; Tang, H.; Zhang, J.; Ding, Z.; Huang, K. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat. Commun. 2021, 12, 3445.
  31. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017; pp. 1–13.
  32. Bu, Y.; Liang, J.; Li, Z.; Wang, J.; Wang, J.; Yu, G. Cancer molecular subtyping using limited multi-omics data with missingness. PLoS Comput. Biol. 2024, 20, e1012710.
  33. Du, L.; Gao, P.; Liu, Z.; Yin, N.; Wang, X. TMODINET: A trustworthy multi-omics dynamic learning integration network for cancer diagnostic. Comput. Biol. Chem. 2024, 113, 108202.
  34. Wang, Y.; Wang, Z.; Yu, X.; Wang, X.; Song, J.; Yu, D.-J.; Ge, F. MORE: A multi-omics data-driven hypergraph integration network for biomedical data classification and biomarker identification. Brief. Bioinform. 2025, 26, bbae658.
  35. Ozdemir, C.; Vashishath, Y.; Bozdag, S.; Alzheimer's Disease Neuroimaging Initiative. IGCN: Integrative Graph Convolution Networks for Patient Level Insights and Biomarker Discovery in Multi-Omics Integration. Bioinformatics 2025, 41, btaf313.
  36. Zhao, J.; Bao, H.; Guan, P.; Zhao, X.; Wang, B.; Yan, Z.; Zhao, C.; Zhao, Y.; Lu, X.; Xu, G. SMODA: Interpretable Multimodal Omics Integration for Disease Classification and Subtype Discovery via Heterogeneous Transfer Learning. Anal. Chem. 2026, 98, 10997–11009.
Figure 1. Schematic illustration of the Dual-Branch Contrastive Learning for Multi-Omics Dynamic Fusion Network (DBCL-DFNet), which integrates three types of omics data: DNA methylation, mRNA expression, and miRNA expression. The blue, red, and green branches represent DNA methylation, mRNA expression, and miRNA expression data, respectively.
Figure 2. Workflow diagram of the heterogeneous graph branch using DNA methylation as an illustrative example. The three internal branches correspond to the patient similarity network, feature–patient association network, and feature similarity network within the heterogeneous graph. The blue and cyan modules are used to distinguish different representation-processing paths. Black arrows indicate the direction of information flow, curved arrows denote the interaction and residual fusion between GAT and Mamba modules, dotted arrows indicate cross-branch information transfer, and plus signs represent feature fusion.
Figure 3. Performance comparison of the model across different modality combinations on three datasets (based on mean and standard deviation from 5-fold cross-validation). Subplots: (a) LGG dataset, (b) RCC dataset, (c) BLCA dataset.
Figure 4. Average dynamic attention weights of DNA methylation, mRNA expression, and miRNA expression across different class labels. (a) LGG dataset (Grade II vs. Grade III). (b) RCC dataset (KICH, KIRC, KIRP). (c) BLCA dataset (high-grade vs. low-grade).
Table 1. Summary of multi-omics data.
| No. | Dataset | Categories | Patients | DNA features | mRNA features | miRNA features |
|---|---|---|---|---|---|---|
| 1 | LGG | Grade II: 254, Grade III: 268 | 522 | 8277 | 1166 | 287 |
| 2 | RCC | KICH: 65, KIRC: 201, KIRP: 294 | 560 | 4107 | 2456 | 238 |
| 3 | BLCA | High-grade: 397, Low-grade: 21 | 418 | 7999 | 2373 | 249 |
Table 2. Key hyperparameter settings of DBCL-DFNet.
| No. | Hyperparameter | LGG | RCC | BLCA |
|---|---|---|---|---|
| 1 | Max epochs | 200 | 100 | 200 |
| 2 | Learning rate | 1.0 × 10⁻⁴ | 1.0 × 10⁻³ | 3.3 × 10⁻⁴ |
| 3 | Weight decay | 4.0 × 10⁻⁵ | 1.0 × 10⁻³ | 1.6 × 10⁻⁶ |
| 4 | GAT dropout | 0.25 | 0.00 | 0.30 |
| 5 | Transformer dropout | 0.20 | 0.20 | 0.20 |
| 6 | GAT/Transformer heads | 4/4 | 2/4 | 5/4 |
| 7 | GAT/Transformer layers | 3/4 | 3/4 | 3/4 |
| 8 | Selected graph features | 200 | 200 | 200 |
| 9 | Patient sparsity rate | 0.88 | 0.80 | 0.90 |
| 10 | Contrastive temperature τ | 0.5 | 0.5 | 0.5 |
| 11 | Loss weights (λ1, λ2) | 0.5, 0.5 | 0.5, 0.5 | 0.5, 0.5 |
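To show how a contrastive temperature such as τ = 0.5 enters the objective, the following is a minimal sketch of an InfoNCE-style loss (in the spirit of reference 19) between graph-branch and sequence-branch embeddings. The exact loss used by DBCL-DFNet is not restated in this section, so the cosine similarity, the pairing scheme, the function names, and the toy embeddings are all assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(graph_emb, seq_emb, tau=0.5):
    """Mean InfoNCE loss; row i of both inputs belongs to the same patient.

    The matching row is the positive pair, all other rows are negatives.
    """
    losses = []
    for i, g in enumerate(graph_emb):
        logits = [cosine(g, s) / tau for s in seq_emb]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy two-patient example (hypothetical embeddings).
graph_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # sequence branch agrees with graph branch
swapped = [[0.1, 0.9], [0.9, 0.1]]   # sequence branch disagrees

loss_aligned = info_nce(graph_emb, aligned)
loss_swapped = info_nce(graph_emb, swapped)
print(f"aligned: {loss_aligned:.3f}  swapped: {loss_swapped:.3f}")
```

A smaller τ sharpens the softmax over similarities and penalizes hard negatives more strongly; the loss is lower when the two branches agree on which patient each embedding belongs to.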
Table 3. The classification performance is evaluated on three datasets.
| Data | Metric | KNN | RF | XGBoost | GCN | MOGONET | CancerSD | TMODINET | MORE | IGCN | SMODA | Ours |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LGG | Accuracy | 0.667 ± 0.053 | 0.703 ± 0.056 | 0.678 ± 0.078 | 0.663 ± 0.044 | 0.674 ± 0.060 | 0.699 ± 0.029 | 0.691 ± 0.068 | 0.680 ± 0.042 | 0.692 ± 0.061 | 0.709 ± 0.042 | 0.741 ± 0.049 |
| LGG | AUROC | 0.670 ± 0.052 | 0.704 ± 0.055 | 0.679 ± 0.078 | 0.704 ± 0.184 | 0.716 ± 0.050 | 0.765 ± 0.039 | 0.737 ± 0.046 | 0.734 ± 0.030 | 0.722 ± 0.075 | 0.765 ± 0.035 | 0.783 ± 0.051 |
| LGG | Precision | 0.737 ± 0.066 | 0.735 ± 0.058 | 0.715 ± 0.108 | 0.712 ± 0.027 | 0.681 ± 0.057 | 0.713 ± 0.029 | 0.721 ± 0.094 | 0.698 ± 0.057 | 0.759 ± 0.103 | 0.756 ± 0.057 | 0.787 ± 0.096 |
| LGG | F1 Score | 0.631 ± 0.058 | 0.694 ± 0.064 | 0.671 ± 0.088 | 0.659 ± 0.050 | 0.685 ± 0.056 | 0.699 ± 0.029 | 0.689 ± 0.071 | 0.677 ± 0.043 | 0.688 ± 0.061 | 0.707 ± 0.042 | 0.739 ± 0.050 |
| LGG | Recall | 0.552 ± 0.116 | 0.664 ± 0.118 | 0.638 ± 0.107 | 0.575 ± 0.127 | 0.689 ± 0.102 | 0.694 ± 0.045 | 0.679 ± 0.046 | 0.683 ± 0.068 | 0.613 ± 0.080 | 0.646 ± 0.058 | 0.709 ± 0.044 |
| LGG | Specificity | 0.788 ± 0.077 | 0.745 ± 0.083 | 0.721 ± 0.114 | 0.756 ± 0.055 | 0.657 ± 0.079 | 0.705 ± 0.038 | 0.704 ± 0.158 | 0.676 ± 0.121 | 0.775 ± 0.138 | 0.775 ± 0.070 | 0.775 ± 0.140 |
| LGG | NPV | 0.630 ± 0.056 | 0.686 ± 0.071 | 0.657 ± 0.076 | 0.635 ± 0.063 | 0.674 ± 0.073 | 0.713 ± 0.029 | 0.704 ± 0.158 | 0.676 ± 0.121 | 0.654 ± 0.046 | 0.675 ± 0.041 | 0.715 ± 0.018 |
| RCC | Accuracy | 0.946 ± 0.028 | 0.950 ± 0.026 | 0.955 ± 0.022 | 0.952 ± 0.025 | 0.952 ± 0.025 | 0.964 ± 0.025 | 0.964 ± 0.018 | 0.954 ± 0.018 | 0.952 ± 0.028 | 0.966 ± 0.024 | 0.980 ± 0.013 |
| RCC | Macro F1 | 0.944 ± 0.025 | 0.950 ± 0.020 | 0.955 ± 0.019 | 0.951 ± 0.028 | 0.953 ± 0.022 | 0.962 ± 0.030 | 0.964 ± 0.026 | 0.953 ± 0.020 | 0.954 ± 0.029 | 0.963 ± 0.029 | 0.977 ± 0.016 |
| RCC | Micro F1 | 0.946 ± 0.028 | 0.950 ± 0.026 | 0.955 ± 0.022 | 0.952 ± 0.025 | 0.952 ± 0.025 | 0.964 ± 0.025 | 0.964 ± 0.018 | 0.954 ± 0.018 | 0.952 ± 0.028 | 0.966 ± 0.024 | 0.980 ± 0.013 |
| RCC | Weighted F1 | 0.947 ± 0.028 | 0.950 ± 0.026 | 0.955 ± 0.022 | 0.952 ± 0.024 | 0.952 ± 0.026 | 0.964 ± 0.024 | 0.964 ± 0.018 | 0.953 ± 0.018 | 0.952 ± 0.027 | 0.966 ± 0.024 | 0.980 ± 0.013 |
| RCC | Precision | 0.949 ± 0.027 | 0.950 ± 0.025 | 0.955 ± 0.020 | 0.953 ± 0.023 | 0.956 ± 0.023 | 0.965 ± 0.023 | 0.961 ± 0.034 | 0.954 ± 0.021 | 0.953 ± 0.027 | 0.967 ± 0.023 | 0.981 ± 0.013 |
| RCC | Recall | 0.946 ± 0.028 | 0.950 ± 0.026 | 0.955 ± 0.021 | 0.952 ± 0.025 | 0.952 ± 0.025 | 0.964 ± 0.025 | 0.968 ± 0.018 | 0.952 ± 0.023 | 0.952 ± 0.028 | 0.966 ± 0.024 | 0.980 ± 0.013 |
| BLCA | Accuracy | 0.955 ± 0.027 | 0.955 ± 0.022 | 0.964 ± 0.016 | 0.955 ± 0.014 | 0.948 ± 0.023 | 0.964 ± 0.012 | 0.962 ± 0.013 | 0.957 ± 0.012 | 0.959 ± 0.010 | 0.962 ± 0.014 | 0.976 ± 0.007 |
| BLCA | AUROC | 0.779 ± 0.191 | 0.684 ± 0.149 | 0.760 ± 0.175 | 0.920 ± 0.023 | 0.884 ± 0.160 | 0.962 ± 0.029 | 0.963 ± 0.018 | 0.932 ± 0.034 | 0.934 ± 0.067 | 0.964 ± 0.019 | 0.966 ± 0.029 |
| BLCA | Precision | 0.517 ± 0.329 | 0.600 ± 0.436 | 0.650 ± 0.369 | 0.513 ± 0.328 | 0.292 ± 0.407 | 0.650 ± 0.137 | 0.720 ± 0.205 | 0.383 ± 0.323 | 0.617 ± 0.100 | 0.639 ± 0.119 | 0.833 ± 0.139 |
| BLCA | F1 Score | 0.780 ± 0.191 | 0.680 ± 0.150 | 0.757 ± 0.178 | 0.685 ± 0.119 | 0.611 ± 0.168 | 0.791 ± 0.093 | 0.789 ± 0.073 | 0.671 ± 0.157 | 0.738 ± 0.081 | 0.812 ± 0.048 | 0.868 ± 0.039 |
| BLCA | Recall | 0.583 ± 0.382 | 0.383 ± 0.299 | 0.533 ± 0.356 | 0.340 ± 0.206 | 0.250 ± 0.335 | 0.570 ± 0.208 | 0.610 ± 0.223 | 0.350 ± 0.300 | 0.430 ± 0.186 | 0.670 ± 0.103 | 0.720 ± 0.169 |
| BLCA | Specificity | 0.975 ± 0.020 | 0.985 ± 0.020 | 0.987 ± 0.013 | 0.987 ± 0.014 | 0.985 ± 0.023 | 0.985 ± 0.006 | 0.980 ± 0.019 | 0.990 ± 0.009 | 0.987 ± 0.000 | 0.977 ± 0.015 | 0.990 ± 0.009 |
| BLCA | NPV | 0.978 ± 0.020 | 0.968 ± 0.016 | 0.976 ± 0.019 | 0.966 ± 0.011 | 0.961 ± 0.019 | 0.650 ± 0.137 | 0.980 ± 0.019 | 0.966 ± 0.017 | 0.970 ± 0.010 | 0.982 ± 0.006 | 0.985 ± 0.009 |
Bold values indicate the best performance, and underlined values indicate the second-best performance. For the binary-class datasets (LGG and BLCA), Precision and Recall are indicators for the high-grade (positive) class, while NPV and Specificity are indicators for the low-grade (negative) class. F1 Score is the macro-average (i.e., the average of the F1 scores of the positive and negative classes).
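The gains quoted in the Conclusions can be re-derived from the mean values in Table 3. The short sketch below copies the relevant means from the table and reports the margin over the best baseline in percentage points; the helper name and data layout are illustrative assumptions.

```python
# Mean scores copied from Table 3 (standard deviations omitted).
METHODS = ["KNN", "RF", "XGBoost", "GCN", "MOGONET", "CancerSD",
           "TMODINET", "MORE", "IGCN", "SMODA", "Ours"]
TABLE3_MEANS = {
    ("LGG", "Accuracy"): [0.667, 0.703, 0.678, 0.663, 0.674, 0.699,
                          0.691, 0.680, 0.692, 0.709, 0.741],
    ("RCC", "Accuracy"): [0.946, 0.950, 0.955, 0.952, 0.952, 0.964,
                          0.964, 0.954, 0.952, 0.966, 0.980],
    ("BLCA", "Accuracy"): [0.955, 0.955, 0.964, 0.955, 0.948, 0.964,
                           0.962, 0.957, 0.959, 0.962, 0.976],
    ("BLCA", "F1 Score"): [0.780, 0.680, 0.757, 0.685, 0.611, 0.791,
                           0.789, 0.671, 0.738, 0.812, 0.868],
}


def gain_over_best_baseline(values):
    """Return (best baseline name, gain of 'Ours' in percentage points)."""
    scores = dict(zip(METHODS, values))
    ours = scores.pop("Ours")
    best_name = max(scores, key=scores.get)  # first method wins on ties
    return best_name, round((ours - scores[best_name]) * 100, 1)


gains = {key: gain_over_best_baseline(vals) for key, vals in TABLE3_MEANS.items()}
for (dataset, metric), (baseline, gain) in gains.items():
    print(f"{dataset} {metric}: +{gain} points over {baseline}")
```

These differences are absolute differences between mean scores, so they are percentage points rather than relative improvements.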
Table 4. Ablation studies are conducted on three datasets.
| Data | Metric | W/oHE | W/oTR | W/oGA | W/oMA | W/oGR | W/oCL | W/oDA | Ours |
|---|---|---|---|---|---|---|---|---|---|
| LGG | Accuracy | 0.697 ± 0.027 | 0.665 ± 0.013 | 0.703 ± 0.029 | 0.697 ± 0.039 | 0.674 ± 0.033 | 0.690 ± 0.039 | 0.669 ± 0.029 | 0.741 ± 0.049 |
| LGG | AUROC | 0.748 ± 0.027 | 0.668 ± 0.031 | 0.765 ± 0.031 | 0.761 ± 0.039 | 0.742 ± 0.051 | 0.756 ± 0.030 | 0.743 ± 0.042 | 0.783 ± 0.051 |
| LGG | Precision | 0.741 ± 0.046 | 0.690 ± 0.026 | 0.734 ± 0.052 | 0.761 ± 0.086 | 0.732 ± 0.086 | 0.725 ± 0.053 | 0.732 ± 0.063 | 0.787 ± 0.096 |
| LGG | F1 Score | 0.696 ± 0.027 | 0.664 ± 0.013 | 0.701 ± 0.030 | 0.694 ± 0.038 | 0.668 ± 0.033 | 0.688 ± 0.039 | 0.665 ± 0.028 | 0.739 ± 0.050 |
| LGG | Recall | 0.638 ± 0.033 | 0.634 ± 0.033 | 0.676 ± 0.070 | 0.623 ± 0.082 | 0.616 ± 0.125 | 0.646 ± 0.075 | 0.575 ± 0.065 | 0.709 ± 0.044 |
| LGG | Specificity | 0.759 ± 0.067 | 0.697 ± 0.051 | 0.732 ± 0.095 | 0.775 ± 0.117 | 0.736 ± 0.143 | 0.736 ± 0.071 | 0.767 ± 0.090 | 0.775 ± 0.140 |
| LGG | NPV | 0.665 ± 0.018 | 0.644 ± 0.012 | 0.684 ± 0.031 | 0.664 ± 0.034 | 0.653 ± 0.050 | 0.666 ± 0.041 | 0.632 ± 0.025 | 0.715 ± 0.018 |
| RCC | Accuracy | 0.970 ± 0.020 | 0.957 ± 0.017 | 0.964 ± 0.019 | 0.962 ± 0.026 | 0.952 ± 0.028 | 0.961 ± 0.027 | 0.957 ± 0.033 | 0.980 ± 0.013 |
| RCC | Macro F1 | 0.969 ± 0.018 | 0.955 ± 0.021 | 0.956 ± 0.029 | 0.958 ± 0.031 | 0.948 ± 0.034 | 0.955 ± 0.026 | 0.952 ± 0.033 | 0.977 ± 0.016 |
| RCC | Micro F1 | 0.970 ± 0.020 | 0.957 ± 0.017 | 0.964 ± 0.019 | 0.962 ± 0.026 | 0.952 ± 0.028 | 0.961 ± 0.027 | 0.957 ± 0.033 | 0.980 ± 0.013 |
| RCC | Weighted F1 | 0.970 ± 0.020 | 0.957 ± 0.018 | 0.964 ± 0.019 | 0.963 ± 0.026 | 0.952 ± 0.028 | 0.961 ± 0.027 | 0.957 ± 0.033 | 0.980 ± 0.013 |
| RCC | Precision | 0.971 ± 0.020 | 0.958 ± 0.018 | 0.965 ± 0.018 | 0.965 ± 0.023 | 0.954 ± 0.026 | 0.962 ± 0.026 | 0.958 ± 0.032 | 0.981 ± 0.013 |
| RCC | Recall | 0.970 ± 0.020 | 0.957 ± 0.017 | 0.964 ± 0.019 | 0.962 ± 0.026 | 0.952 ± 0.028 | 0.961 ± 0.027 | 0.957 ± 0.033 | 0.980 ± 0.013 |
| BLCA | Accuracy | 0.964 ± 0.011 | 0.954 ± 0.020 | 0.962 ± 0.009 | 0.959 ± 0.010 | 0.955 ± 0.018 | 0.962 ± 0.014 | 0.952 ± 0.014 | 0.976 ± 0.007 |
| BLCA | AUROC | 0.932 ± 0.057 | 0.888 ± 0.129 | 0.852 ± 0.162 | 0.926 ± 0.068 | 0.959 ± 0.027 | 0.780 ± 0.153 | 0.928 ± 0.064 | 0.966 ± 0.029 |
| BLCA | Precision | 0.700 ± 0.187 | 0.457 ± 0.293 | 0.717 ± 0.163 | 0.763 ± 0.225 | 0.594 ± 0.161 | 0.750 ± 0.247 | 0.550 ± 0.092 | 0.833 ± 0.139 |
| BLCA | F1 Score | 0.809 ± 0.044 | 0.741 ± 0.161 | 0.762 ± 0.060 | 0.738 ± 0.054 | 0.791 ± 0.049 | 0.750 ± 0.088 | 0.783 ± 0.040 | 0.868 ± 0.039 |
| BLCA | Recall | 0.620 ± 0.112 | 0.600 ± 0.424 | 0.480 ± 0.163 | 0.420 ± 0.144 | 0.670 ± 0.103 | 0.430 ± 0.186 | 0.720 ± 0.243 | 0.720 ± 0.169 |
| BLCA | Specificity | 0.982 ± 0.013 | 0.972 ± 0.016 | 0.987 ± 0.008 | 0.987 ± 0.014 | 0.970 ± 0.022 | 0.990 ± 0.009 | 0.967 ± 0.019 | 0.990 ± 0.009 |
| BLCA | NPV | 0.980 ± 0.006 | 0.980 ± 0.018 | 0.973 ± 0.009 | 0.970 ± 0.006 | 0.982 ± 0.006 | 0.970 ± 0.010 | 0.985 ± 0.013 | 0.985 ± 0.009 |
Bold values indicate the best performance, and underlined values indicate the second-best performance.

Share and Cite

MDPI and ACS Style

Dang, Y.; Yan, X.; Zhou, L.; Li, D. DBCL-DFNet: Dual-Branch Contrastive Learning for Multi-Omics Dynamic Fusion. Entropy 2026, 28, 616. https://doi.org/10.3390/e28060616
