TEMINET: A Co-Informative and Trustworthy Multi-Omics Integration Network for Diagnostic Prediction

Advancing the domain of biomedical investigation, integrated multi-omics data have shown exceptional performance in elucidating complex human diseases. However, as the variety of omics information expands, precisely perceiving the informativeness of intra- and inter-omics becomes challenging due to the intricate interrelations, thus presenting significant challenges in the integration of multi-omics data. To address this, we introduce a novel multi-omics integration approach, referred to as TEMINET. This approach enhances diagnostic prediction by leveraging an intra-omics co-informative representation module and a trustworthy learning strategy used to address inter-omics fusion. Considering the multifactorial nature of complex diseases, TEMINET utilizes intra-omics features to construct disease-specific networks; then, it applies graph attention networks and a multi-level framework to capture more collective informativeness than pairwise relations. To perceive the contribution of co-informative representations within intra-omics, we designed a trustworthy learning strategy to identify the reliability of each omics in integration. To integrate inter-omics information, a combined-beliefs fusion approach is deployed to harmonize the trustworthy representations of different omics types effectively. Our experiments across four different diseases using mRNA, methylation, and miRNA data demonstrate that TEMINET achieves advanced performance and robustness in classification tasks.


Introduction
In light of recent advancements in the acquisition of high-throughput omics data, multi-omics integration is rapidly expanding as a research field with the aim of providing a more comprehensive understanding of the underlying biological processes and molecular mechanisms involved in complex diseases [1,2]. Compared to single-omics studies, integrating multiple types of omics data enables the capture of complementary information across various molecular layers, leading to a more holistic view of biological systems [3]. Traditional approaches often involve statistical tools, which may have a limited capacity to fully capture the complex, non-linear relationships present in multi-omics datasets. In recent years, the application of artificial neural networks (ANNs) to multi-omics studies has emerged as a promising avenue to address these limitations [4][5][6]. ANN approaches can learn intricate patterns within data, enabling more accurate predictions and classifications, as well as the identification of previously unexplored relationships between molecular entities. These methods have shown great potential in various biomedical applications, such as disease prediction, patient stratification, and the discovery of novel biomarkers [7].
Despite significant achievements, the stability and biological explainability of many existing multi-omics integration approaches remain underdeveloped, primarily because of the insufficient exploration of both intra- and inter-omics interaction information. The existing multi-omics integration strategies can be grouped mainly into three categories: early, intermediate, and late fusion [7][8][9]. Early integration methods, such as fully connected neural networks (FCNNs) [10][11][12] and autoencoders (AEs) [13][14][15], perform integration by concatenating representations without considering the complex inter-omics interactions. Intermediate fusion emphasizes the interactions within inter-omics, rather than perceiving each omics type individually. For example, variational AEs (VAEs) are widely used to perform a fusion of homogeneous data types in a joint manner [16,17]. Despite the advantages of coordinated representation learning in intermediate fusion, it operates under the presumption of an equal contribution from each modality. This presumption can be challenging for the fitting capabilities of ANNs, especially in cases of feature imbalance, severe missing modalities, or substantial noise interference. Considering the uncertainty of such situations, trustworthy learning has been adopted. It involves transforming the operation from feature embedding to a decision-level process, which ultimately stabilizes and enhances the outcomes of late integration strategies. Han et al. [18] introduced a dynamic fusion approach for multi-modal classification. This method employed true-class probability to assess the classification confidence across different omics and then performed adjustments through modality confidence weighting for integration. Wang et al. [19] proposed Mogonet, which employed a view correlation discovery network (VCDN) to integrate initial classifications instead of fusing features across modalities, thereby utilizing label-correlated information in the shared space to produce final classification labels. We observed that, by perceiving and integrating the informativeness of each modality and inter-modality from distinct samples in trustworthy multimodal learning, practical applications can be significantly enhanced. However, traditional intra-modality information embedding may inadequately capture the full scope of informativeness, primarily because it neglects the biological explainability of intra-omics relations. Such studies may overlook crucial aspects of the underlying biological processes, potentially resulting in a limited comprehension of the intricate molecular networks that drive various biological systems [20]. Ultimately, this could result in challenges to effective integration.
In omics research, it is crucial to acknowledge the interconnections among molecular functions, given the multifactorial nature of complex diseases [21,22]. Instead of treating each genetic factor as an independent entity, this approach allows for a more comprehensive understanding. Consequently, more investigations are focused on constructing disease networks by reconfiguring omics data into graph-based structures, reflecting the growing recognition of the importance of contextualized molecular interactions in understanding disease mechanisms. For instance, Ramirez et al. [23] employed a graph convolutional network (GCN) approach for cancer classification, leveraging a gene co-expression framework based on Spearman correlation analysis. Althubaiti et al. [24] utilized the DeepMOCCA framework, combining omics data with protein-protein interaction networks for improved cancer prognosis predictions. Tang et al. introduced SiGra [25] and SpaRx [26], which used graph-based approaches to decode complex spatial tissue structures and nuanced cellular drug responses, showcasing a deeper insight into the dynamics of molecular interactions in biomedical research. Furthermore, Xing et al. [27] adopted a weighted correlation method to build prior knowledge graphs. This method was particularly advantageous for unraveling disease-specific complexities. Therefore, inspired by the effectiveness of network representations in omics studies, we were motivated to adopt a graph-based representation to enhance the informativeness of intra-omics. This approach is intuitive and easy to interpret. It has the potential to better preserve the inherent structure and capture functional interactions from omics data, ultimately leading to improved disease prediction performance.
Based on the above observations, we propose a multi-omics integration framework, named TEMINET, for disease-predictive diagnosis that leverages graph attention networks and an uncertainty-based trustworthy strategy. Specifically, we construct a disease-specific network for each omics data type to represent large-scale, unstructured, and irregular data effectively. We apply hierarchical graph attention networks (GATs) to capture co-informative intra-omics representations. Then, a trustworthy learning mechanism is employed to assess the reliability of informativeness using uncertainty information. Combined-beliefs fusion then integrates informativeness and uncertainty into an inter-omics fusion embedding for subsequent tasks. We conduct extensive experiments to show the effectiveness and robustness of TEMINET for multi-omics prediction. Our results demonstrate that the combination of graph-based feature representation and uncertainty-based trustworthy learning integration surpasses the state-of-the-art models on four disease classification tasks. We further employ a global interpretation approach to identify important biomarkers and analyze their disease-related functional relevance.

Results
Our research evaluated the effectiveness of the proposed model relative to current methodologies across diverse classification tasks. We also explored the scalability through the incorporation of additional omics data types and conducted a robustness study to assess the generalizability of the model. In the comparative analysis of various methodologies, accuracy (ACC) is a common metric employed for binary and multi-class classifications. In addition to ACC, binary classifications also utilize the F1 score (F1) and the area under the receiver operating characteristic curve (AUC). For multi-class classifications, the ACC, the weighted average F1 score (F1_weighted), and the macro-averaged F1 score (F1_macro) are used. Our experimental framework replicated the settings used in Mogonet [19] and ran five randomized experiments to report the mean and standard deviation of the evaluation metrics. To test whether the improvement of our model over the second-best method was significant, we conducted an independent t-test. A p-value of less than 0.05 was taken to indicate a statistically significant improvement.
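As an illustration of the multi-class metrics named above, the following is a minimal pure-Python sketch of ACC, F1_macro, and F1_weighted. The function names are ours, and the choice to average over the labels present in the ground truth is an assumption; it is not the evaluation code used in the study.

```python
from collections import Counter

def f1_per_class(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def f1_weighted(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)
```

The macro average treats every class equally, whereas the weighted average follows the class frequencies, so the two diverge on imbalanced tasks such as the five-class BRCA problem.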

Classification Performance Comparison
In Table 1, a comparative analysis between the proposed model and established methods on the ROSMAP and LGG datasets is provided. Compared to alternative approaches, the outcomes demonstrated that our proposed model performed better in binary classification tests. TEMINET outperformed the second-best method in many evaluation metrics, though not in the AUC metric for the ROSMAP dataset. The outcomes demonstrated the advantages of integrating informative data utilizing a multi-level, graph-based attention framework and disease-specific networks. Moreover, the proposed model significantly exceeded the performance of MODILM (GAT). This advantage was likely due to the incorporation of uncertainty-based adaptive fusion, enhancing the capability of the model to select and utilize the informative modalities, thus ensuring a more precise characterization of each subject. As shown in Table 2, the proposed method continued to lead in overall performance, yet it exhibited slightly lower efficacy in the five-class BRCA task compared to the top-performing method in this specific area. This further indicated the strength of our model in leveraging disease-specific network analysis and graph attention mechanisms, underscoring its potential despite the room for improvement in certain multi-class classification tasks. In the tables, the terms F1-M and F1-W denote the F1 macro and weighted scores, respectively.

Ablation Comparison of the Model Key Component
In an extensive ablation study of our framework, as shown in Table 3, performance metrics were compared against established benchmarks. Our model incorporated an advanced uncertainty-based mechanism and omics-specific co-expression networks, and it achieved increases in ACC and F1, particularly within the ROSMAP dataset, on which it outstripped the GAT+NN model by 4.3% and 4.4%, respectively. However, there was a slight decrement in the AUC of 1.0% compared to the second-best model. This outcome suggested that, while our model advanced predictive accuracy, it did so with a trade-off in AUC performance, signaling an area for further refinement in balancing predictive precision with generalizability across diverse datasets. The results on the LGG and KIPAN datasets also revealed improvements across all metrics, reflecting the ability of our model to capture intricate data patterns. For the BRCA dataset, our model exhibited enhancements in both ACC and F1-M, yet it encountered a slight decrease in the F1-W score, indicating potential for further refinement in the equilibrium of precision and recall. In Table 3, NN refers to a neural network, and the terms F1-M and F1-W denote the F1 macro and weighted scores, respectively.

Ablation Study Comparing Integration Performance across Varied Omics Categories
In our investigation, we analyzed the effectiveness of different omics data combinations for classification performance. Figure 1 shows that using all three omics types outperformed combinations of just two. This result underlined the individual and substantial contributions provided via different omics categories. Moreover, it confirmed the advantage of integrating multiple omics approaches. Our results demonstrated that TEMINET significantly improved classification performance by integrating multi-omics informative data. Remarkably, these enhancements became more pronounced with the gradual inclusion of diverse omics types. This observation underscores the scalability and adaptability of our model, suggesting its potential for broader applications in the field of biomedicine.

Robustness Study Involving Comparisons with Advanced Methods
In the robustness experiments, we increased the masked ratio to heighten the uncertainty of a specified modality. This method assessed the robustness of our model by comparing the reduction in accuracy as the data became increasingly incomplete or uncertain. Figure 2 demonstrates that TEMINET exhibited superior stability, with consistently lower accuracy reduction ratios across all masked ratios, compared to Mogonet and Dynamic. The robustness of TEMINET was attributed to its use of a graph-based topology and an uncertainty-based trust mechanism. While Mogonet also employed a graph-based approach by constructing similarity graphs among samples and incorporating the VCDN trust mechanism, its performance was moderate. Conversely, the Dynamic method, which relied on an encoder network and a confidence-based trust mechanism, displayed a greater decrease in accuracy, indicating reduced robustness relative to TEMINET. The findings underscored that TEMINET not only maintained lower accuracy reduction ratios but also exhibited stability across various levels of data masking, highlighting the effectiveness of its graph-based approach and uncertainty-based trust mechanism in preserving model robustness. This robustness enhances the generalizability and applicability of the model.
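The masking protocol can be sketched as follows. This is a hypothetical illustration only: replacing masked entries with zero, and defining the accuracy reduction ratio as the relative drop from the unmasked accuracy, are both assumptions, since the text does not specify either.

```python
import random

def mask_features(sample, masked_ratio, rng):
    """Zero out a random fraction `masked_ratio` of one modality's features.

    Zero-filling is an assumed masking value, not taken from the paper.
    """
    n_masked = round(len(sample) * masked_ratio)
    masked_idx = set(rng.sample(range(len(sample)), n_masked))
    return [0.0 if i in masked_idx else x for i, x in enumerate(sample)]

def accuracy_reduction_ratio(acc_clean, acc_masked):
    """Relative accuracy drop caused by masking (assumed definition)."""
    return (acc_clean - acc_masked) / acc_clean

rng = random.Random(0)                       # fixed seed for repeatability
masked = mask_features([1.0] * 10, 0.3, rng)  # three of ten entries masked
```

Under this definition, a model whose accuracy falls from 0.80 to 0.60 has a reduction ratio of 0.25, and a more robust model keeps the ratio closer to zero as the masked ratio grows.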

Figure 2. The robustness of the proposed approach was evaluated by comparing it with that of advanced methods. KIPAN was excluded from this comparison since it is relatively easy to classify.

Important Biomarkers Identified via TEMINET
In interpreting the model, the primary objective was to identify biomarkers of significance. As shown in Table 4, the five most discriminating biomarkers based on their differential values were reported. The biomarkers with identical values were assigned the same ranking. If the number of tied positions exceeded the threshold for reporting (i.e., five distinct ranks), a random set from the tied group was chosen to fulfill the report. Subsequently, we conducted a brief review of the existing literature to elucidate the biological significance and disease association of these identified biomarkers.
As shown in Table 4, existing advances in AD research have identified biomarkers associated with its pathogenesis. As the most significant mRNA biomarker of AD identified via TEMINET, MEIS3 was revealed through differential expression analysis to be considerably linked with cognitive decline and increased neurofibrillary tangle density [39,40]. Complementing this, cg19485804 (NGEF) emerged from LASSO regression analysis as another marker [41], notably associated with the APP mutation in mouse models. The downregulation of NGEF in the CVN-AD model suggested a critical role in modifying actin dynamics and consequently disrupting neuronal growth cone motility [42]. Furthermore, the microRNA miR-132 has been associated with the progression of Aβ and tau pathologies, with its reduced levels in circulation suggesting its utility as a potential diagnostic marker for AD. In the realm of breast cancer, CA9 has been identified as an mRNA biomarker. A study showed that BRCA patients with lower levels of CA9 derive more benefit from adjuvant therapies, suggesting that CA9 expression could be instrumental in tailoring patient-specific treatment plans [43]. The interaction between IGF2BP3, TRIM25, and miR-3614 delineated a novel regulatory pathway crucial for tumor cell proliferation. The protective role of IGF2BP3 in safeguarding TRIM25 mRNA from degradation and its influence on miR-3614 maturation presented new potential targets for therapeutic intervention in BRCA [44]. In LGG, our model did not reveal any methylation or miRNA biomarkers. However, ADD3 was identified as the leading mRNA biomarker. Essential for actin cytoskeleton assembly, ADD3 deficiency in GBM cells triggered pro-angiogenic signaling, enhancing VEGFR expression in endothelial cells, which could have implications for angiogenesis in LGG [45]. Suggested as a tumor suppressor and survival predictor on chromosome 10q, ADD3 was valuable for prognostic assessments in LGG [46]. In kidney-related cancers, miR-126 has been recognized for its strong prognostic potential in clear-cell renal cell carcinoma (ccRCC) [47], while miR-1271, markedly upregulated in ccRCC tissues, has emerged as a significant marker for assessing disease severity [48]. These findings affirmed the capability of TEMINET in elucidating complex biological markers pertinent to disease mechanisms and therapy responsiveness.

Discussion
The advancement of high-throughput techniques and individualized healthcare approaches has produced various supervised data collections critical for predictive applications such as pinpointing disease conditions, classifying tumor stages, and distinguishing cancer subtypes. The fusion of multi-omics information has demonstrated enhanced efficacy in disease prediction compared to single-omics approaches. For clinical applications, these integration models must not only provide precise diagnostic guidance but also cover a wide range of diseases. This underscores the necessity for models exhibiting high accuracy and strong generalization across diverse medical conditions.
To achieve optimal multi-omics integration, the informativeness of modality representation has increasingly attracted attention. On the one hand, this informativeness reflects the quality of features specific to each omics type. This quality is contingent not only on the methodologies employed for feature representation but also on the inherent quality of the data. This is because data quality can be compromised during collection, storage, and processing, leading to potential loss and degradation. On the other hand, the informativeness of a modality significantly determines its contribution to the integration process. This contribution is measured by the extent to which data from a modality can complement or enhance the understanding that other omics types provide. It concerns not merely the quantity of data each modality brings but also the unique biological insights it offers that cannot be captured by others. Therefore, evaluating the informativeness of each modality representation is essential to ensure that the most informative and relevant features are utilized for better predictive accuracy and a deeper biological understanding.
In this study, we introduced TEMINET, a framework optimized for the trustworthy integration of multi-omics datasets. The advanced performance of TEMINET can be attributed to the joint observation of intra-omics molecular interactions and inter-omics informativeness. The framework addressed multifaceted complexities in disease pathogenesis by amalgamating omics data with topological models. Our approach involved constructing individual graphs for each subject within all omics data. This data formation strategy allowed us to leverage the inner topological information among intra-omics molecules obtained from disease-specific data to improve model performance. It emphasized the importance of the interplay between genetic factors in revealing the underlying causes of diseases. Our experiments have shown that TEMINET outperforms existing methods on various metrics across four distinct tasks, demonstrating that enhancing intra-omics informativeness can significantly improve the performance of a trustworthy learning strategy in multi-omics integration. The outperformance observed with four diseases, including one cerebral degeneration disease and three cancers, highlighted the generalizability and adaptability of TEMINET at the disease level. The robustness study also confirmed its generalizability, as a robust model produces more stable and reliable results, which are particularly essential in real-world application scenarios. Additionally, the combination ablation experiment conducted at the omics level confirmed the scalability of TEMINET, indicating that its integration capabilities significantly improve with the increasing variety of omics data types. Applied to four diverse diseases, TEMINET enhanced the understanding of disease mechanisms and patient stratification, revealing potential biomarkers and offering precise classifications. These advancements assist healthcare professionals in developing personalized therapeutic interventions based on deeper insights into patient conditions.
However, the model exhibited several limitations. In comparison to other models, TEMINET demonstrated lower computational efficiency due to the construction of multiple omics-level networks for each sample, potentially posing challenges for practical deployment. Meanwhile, it also presented computational demands that became apparent when dealing with larger datasets, indicating potential scalability issues. While it concentrates on specific omics interactions, the model might overlook the broader dynamics between different omics. This oversight could reduce the AUC performance, indicating that a more balanced approach should be considered in future developments.
This study can be extended in several future directions. For example, one direction would be to extend the capability of TEMINET by incorporating spatial transcriptomics data [25]. This enhancement would enable the analysis of not only mRNA, methylation, and miRNA but also the exploration of the spatial dimensions of cellular behavior and interactions. Another direction would be to improve the computational efficiency of the model and make it applicable to a broader range of diseases, thereby enhancing its generalizability.

Overview of TEMINET
The proposed method is illustrated in Figure 3. It begins with constructing a co-expression graph for the omics data of each subject via weighted gene co-expression network analysis (WGCNA). The second step involves generating initial classification results for each omics data type using a multi-level GAT process. This process utilizes three layers for the extraction of intra-omics features, encompassing $G_0$, $G_1$, and $G_2$. $G_1$ is updated from $G_0$, and $G_2$ is subsequently updated from $G_1$. Thirdly, the uncertainty of each initial distribution is parameterized using subjective logic. Based on the Dempster-Shafer theory, the integration of multi-omics evidence comes from the inference of overall uncertainty and classification probability. The whole inference process concludes with the final label prediction of each subject.
Figure 3. (B) The first intra-omics network was built using the WGCNA. (C) The intra-omics information at each omics level was augmented using the multi-level GAT. (D) The evidence was evaluated via the subjective logic module to determine the uncertainty. During the integration phase, the trustworthy informativeness and uncertainty of each omics were amalgamated into a composite embedding that encompassed inter-omics information. The fused representation was subsequently applied to implement a downstream classification task.

Dataset Overview
In our investigation, we performed analysis using four public benchmark datasets, including ROSMAP for binary classification (distinguishing between Alzheimer's disease (AD) and a normal control (NC)), BRCA for the classification of breast-invasive carcinoma into PAM50 subtypes (including five categories), low-grade glioma (LGG) for distinguishing between grade II and grade III, and KIPAN, referring to the classification of renal cell carcinoma subtypes. All datasets were acquired from Wang et al. [19], and they each contained three types of omics data: mRNA expression, DNA methylation, and miRNA expression. Detailed information regarding data acquisition and preprocessing is available in [19]. In brief, features with no signal (mean = 0) and those with low variances (standard deviation thresholds of 0.1 for mRNA, 0.001 for DNA methylation, and 0 for miRNA) were filtered out. To optimize the feature selection, the ANOVA F-value and principal component analysis (PCA) were employed. Initially, ANOVA tests were utilized to preselect features, reducing the impact of redundant ones. Subsequently, PCA was applied to the preselected features, aiming for the first principal component to account for less than 50% of the variance, thereby avoiding the overrepresentation of any individual feature within the dataset. Each feature was then scaled to the range of [0,1] through a linear transformation. Comprehensive details regarding these datasets are presented in Table 5.
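The signal filter, variance filter, and linear scaling described above can be sketched in a few lines. This is a simplified illustration, not the preprocessing pipeline of [19]: the function name is ours, the use of the population standard deviation and of a strict "greater than" comparison are assumptions, and the ANOVA and PCA steps are omitted.

```python
from statistics import mean, pstdev

def filter_and_scale(matrix, min_std):
    """Filter and rescale an omics matrix (rows = subjects, columns = features).

    Drops features with zero mean or a standard deviation not above `min_std`,
    then min-max scales each kept feature to [0, 1].
    Returns (kept_column_indices, scaled_rows).
    """
    columns = list(zip(*matrix))
    kept = [j for j, col in enumerate(columns)
            if mean(col) != 0 and pstdev(col) > min_std]
    scaled_cols = []
    for j in kept:
        col = columns[j]
        lo, hi = min(col), max(col)
        # hi > lo is guaranteed because the standard deviation is positive.
        scaled_cols.append([(x - lo) / (hi - lo) for x in col])
    rows = [list(r) for r in zip(*scaled_cols)]
    return kept, rows

# Toy mRNA-like matrix: column 0 has no signal, column 2 has zero variance.
toy = [[0.0, 1.0, 5.0],
       [0.0, 3.0, 5.0],
       [0.0, 5.0, 5.0]]
kept, scaled = filter_and_scale(toy, min_std=0.1)
```

In this toy case only the middle feature survives, and its values 1, 3, and 5 are mapped to 0, 0.5, and 1.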

Intra-Omics Network Construction
The development of functional interaction networks is integral to understanding the pathogenesis of complex diseases. To leverage synergistic relationships among intra-omics molecules, the initial step in our methodology involved implementing the WGCNA [49] to construct intra-omics co-expression networks, as shown in Figure 3B. The construction of the intra-omics graph $G_0$ for each subject involved several key stages: Firstly, an adjacency matrix was formed through the WGCNA, with each entry indicating the correlation strength between pairs of omics features. Secondly, this matrix was transformed into an edge matrix by applying a threshold. Thirdly, a co-expression network was constructed for individual subjects, assigning omics data expression values to nodes as their features.
In this context, an initial co-expression graph network of each patient was denoted as $G_0 = G(V_0^{d \times 1}, E_0^{d \times d})$. Here, $V^{d \times 1}$ symbolizes the attributes of $d$ nodes. $E^{d \times d}$ represents the edge matrix derived from the co-expression correlations computed via the WGCNA. Specifically, for each subject $n$, a feature vector of dimension $1 \times d$ was generated, where $d$ represents the number of features. For a group of $N$ subjects within a single omics data type, an $N \times d$ matrix was formulated to calculate the co-expression matrix $A^{d \times d}$. The correlation $A_{ij}$ between nodes $v_i$ and $v_j$ was determined as follows:

$$A_{ij} = \left| \frac{\sum_{n=1}^{N} (v_{n,i} - \bar{v}_i)(v_{n,j} - \bar{v}_j)}{\sqrt{\sum_{n=1}^{N} (v_{n,i} - \bar{v}_i)^2}\,\sqrt{\sum_{n=1}^{N} (v_{n,j} - \bar{v}_j)^2}} \right|^{\beta},$$

where $v_{n,i}$ is the value of node $v_i$ for subject $n$, $\bar{v}_i$ and $\bar{v}_j$ denote the mean values of nodes $v_i$ and $v_j$, and $\beta$ denotes an adjustable parameter set through WGCNA. The edge matrix $E_0^{d \times d}$ was then obtained by binarizing the values from the matrix $A^{d \times d}$. The optimal threshold for binarizing was determined through a grid search approach, where the threshold parameter was varied within a range of 0.05 to 0.5. The optimal threshold setting in our study was 0.08. Intra-omics co-expression networks for mRNA, methylation, and miRNA datasets were constructed similarly in our study.
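The adjacency and edge construction can be illustrated with a small pure-Python sketch. It assumes the standard WGCNA soft-thresholded (unsigned) Pearson adjacency, excludes self-loops, and keeps an edge when the adjacency is at least the threshold; the function names and the value $\beta = 6$ used in the example are illustrative assumptions rather than settings reported in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def coexpression_edges(data, beta, threshold):
    """Build a soft-thresholded adjacency A and a binary edge matrix E.

    data: N subjects x d features for one omics type.
    A[i][j] = |pearson(feature_i, feature_j)| ** beta
    E[i][j] = 1 if i != j and A[i][j] >= threshold, else 0 (assumed rule).
    """
    cols = list(zip(*data))
    d = len(cols)
    A = [[abs(pearson(cols[i], cols[j])) ** beta for j in range(d)]
         for i in range(d)]
    E = [[1 if i != j and A[i][j] >= threshold else 0 for j in range(d)]
         for i in range(d)]
    return A, E

# Four subjects, three features; features 0 and 1 are perfectly correlated.
toy = [[1, 2, 5], [2, 4, 1], [3, 6, 4], [4, 8, 2]]
A, E = coexpression_edges(toy, beta=6, threshold=0.08)
```

Raising the correlation to the power $\beta$ suppresses weak correlations, so with the threshold of 0.08 reported above only the strongly co-expressed pair in the toy data is connected.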

Intra-Omics Informative Augmentation
To leverage the information embedded in nodes and their associations within co-expression matrices effectively, we introduced GAT [50] to enhance the disease-specific characteristics of the omics dataset. GAT incorporated masked self-attention-based layers to enable the dynamic weighting of neighboring node contributions, which allowed GAT to selectively focus on more pertinent adjacent nodes, thereby diminishing the impact of nodes that were less significant. As a result, GAT exhibited a superior ability to discern intricate and non-structured connections, as well as variations within the topology of the graph.
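Specifically, an initial network for each omics subject $n$, denoted as $G_0 = G(V_0^n, E_0)$ in Figure 3B, was updated through a GAT layer. Initially, for a node $h_i$ within this network, the normalized attention coefficients $\alpha_{ij}$ with its neighboring nodes $h_j$ were calculated as follows:

$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W_C h_i \,\Vert\, W_C h_j\right]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W_C h_i \,\Vert\, W_C h_k\right]\right)\right)},$$

where $\Vert$ is the concatenation operation, $W_C$ is the shared parameter matrix for linear transformation, $\mathbf{a}$ is the learnable attention vector of the GAT layer [50], and $\mathcal{N}_i$ is the set of neighbors of node $i$. To ensure a more stable self-attention update, a multi-head approach was introduced [50]. The attention-layer function was applied $T$ times, each with its own parameters. The outputs of these heads were then concatenated in sequence to give the updated node representation $h'_i$, as follows:

$$h'_i = \Big\Vert_{t=1}^{T} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{t} W_R^{t} h_j\Big),$$

where $\alpha_{ij}^{t}$ represents the attention coefficients from the $t$-th attention head, $W_R^{t}$ is the weight matrix associated with the $t$-th head, and $\sigma$ is a nonlinear activation function.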
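To make the normalization step concrete, the following sketch computes the attention coefficients of one node over its neighbors for a single head, following the standard GAT formulation of [50]. The weights `W` and `a` are fixed toy values rather than learned parameters, the LeakyReLU slope of 0.2 is the default from [50], and all names are illustrative.

```python
import math

def leaky_relu(x, slope=0.2):
    """LeakyReLU with the negative slope used in the original GAT paper."""
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_coefficients(h, neighbors, i, W, a):
    """Softmax-normalized attention of node i over its neighbors.

    h: node feature vectors; neighbors: {node: [neighbor ids]};
    W: shared linear map; a: attention vector over the concatenation [Wh_i || Wh_j].
    """
    Wh_i = matvec(W, h[i])
    scores = []
    for j in neighbors[i]:
        concat = Wh_i + matvec(W, h[j])            # list concatenation = ||
        scores.append(leaky_relu(sum(ak * ck for ak, ck in zip(a, concat))))
    m = max(scores)                                 # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {j: e / total for j, e in zip(neighbors[i], exps)}

# Three nodes with scalar expression values, as in the d x 1 node features.
h = [[1.0], [2.0], [3.0]]
alpha = attention_coefficients(h, {0: [0, 1, 2]}, 0, W=[[1.0]], a=[0.5, 0.5])
```

The coefficients sum to one over the neighborhood, and with these toy weights the neighbor with the largest transformed feature receives the largest share of attention.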
To enhance the exploration of internal feature relationships, we incorporated the multi-level feature representation approach implemented by Xing et al. [27]. The foundational network $G_0$ encapsulated data corresponding to the primal features. Subsequently, $G_1$ evolved from $G_0$ via a multi-head GAT attention layer. Analogously, $G_2$ was generated from $G_1$. This progression through three graph network layers created a hierarchical integration structure, systematically amalgamating features across the GAT layers. Ultimately, the vectors produced from these transformation stages were fused, resulting in comprehensive feature representations. This design facilitates a dynamic optimization of feature interplays within the network, leading to a more substantial and comprehensive representation of the underlying biological mechanisms. Using a similar process, we also implemented the DNA methylation and miRNA informative augmentation for each subject.

Trust-Driven Multi-Omics Fusion
In traditional multi-omics integration methods, the trustworthiness of different datasets is often not adequately considered, leading to potential inaccuracies in understanding complex biological processes. To address this, we introduced an uncertainty-based trustworthy learning approach to our integration method [51]. This approach enhances trustworthiness and precision in multi-omics data integration by quantifying inherent uncertainty in each modality. It leverages this measure of uncertainty to jointly perceive informativeness across intra- and inter-omics. Given that uncertainty assessments define confidence in predictions, the evidence of a dataset with lower uncertainty should achieve higher trustworthiness and be assigned a larger contribution to multi-omics integration.
Evidence in a classifier is generally considered the outputs of a neural network processed through an activation function like softmax. In our study, the evidence $e^m = [e_1^m, \cdots, e_K^m]$ for the $m$th omics category across $K$ classes was generated from GAT-enhanced features. In the augmentation module, GAT-enhanced features were transformed into evidence through a sequence of fully connected layers and an activation layer. Here, we set a cross-entropy loss $L_{GAT}^m$ to update the GAT augmentation module. Secondly, to obtain the uncertainty, we applied subjective logic [52] to the evidence $e^m$. For each class $k$ in the $m$th omics category, the belief mass $b_k^m$ and uncertainty mass $u^m$ were calculated:

$$b_k^m = \frac{e_k^m}{S^m} = \frac{\alpha_k^m - 1}{S^m}, \qquad u^m = \frac{K}{S^m},$$

where $\alpha_k^m = e_k^m + 1$ and $S^m = \sum_{k=1}^{K} \alpha_k^m$, so that $u^m + \sum_{k=1}^{K} b_k^m = 1$. In this way, the opinion $M^m = \{\{b_k^m\}_{k=1}^{K}, u^m\}$ was obtained for the evidence $e^m$. In summary, for the $m$th omics category, the more evidence gathered for each of the $K$ classes, the higher the probability assigned to the respective class, thus reducing uncertainty. Conversely, a scarcity of evidence led to increased uncertainty. Utilizing subjective logic, this approach models second-order probabilities and uncertainties for the $m$th omics category, effectively countering the overconfidence often seen in traditional neural network classifiers.
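The mapping from evidence to an opinion is simple enough to show directly. The sketch below assumes the usual subjective-logic parameterization, in which the Dirichlet parameters are the evidence plus one and the Dirichlet strength is their sum; the function name is illustrative.

```python
def opinion_from_evidence(evidence):
    """Convert non-negative class evidence into (belief masses, uncertainty).

    alpha_k = e_k + 1, S = sum(alpha), b_k = e_k / S, u = K / S,
    so the beliefs and the uncertainty always sum to one.
    """
    K = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    S = sum(alpha)
    belief = [e / S for e in evidence]
    uncertainty = K / S
    return belief, uncertainty

# Strong evidence for class 0 gives high belief and low uncertainty.
confident = opinion_from_evidence([8.0, 0.0])
# No evidence at all gives zero belief and maximal uncertainty.
vacuous = opinion_from_evidence([0.0, 0.0])
```

With evidence of 8 for one of two classes, the belief in that class is 0.8 and the uncertainty is 0.2, whereas an omics type that produces no evidence yields an uncertainty of 1 and therefore contributes nothing decisive to the fusion.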
Thirdly, within the methodological framework for multi-omics fusion, we applied the Dempster-Shafer theory to synthesize evidence from the different omics types. This process consolidates the independent probability mass assignments of each omics type into a unified joint mass. The Dempster rule orchestrates this fusion to merge belief and uncertainty across the omics spectrum, symbolized as follows:
$$M^F = \bigoplus_{m} M^m,$$
where $M^F$ denotes the combined beliefs, and $M^m$ represents the opinion of the $m$th omics type. As illustrated in Figure 3D, we took a fusion of two omics categories as an example. The first category, $M^1$, represented in orange, corresponded to the mRNA type and was denoted as $M^1 = \{\{b^1_k\}_{k=1}^{K}, u^1\}$. The second category, $M^2$, represented in green, corresponded to the methylation type and was denoted as $M^2 = \{\{b^2_k\}_{k=1}^{K}, u^2\}$. In the process of combining these two sets, we focused on recombining the compatible elements (indicated with brown blocks) while disregarding the mutually exclusive parts (shown as white blocks). The fusion process was implemented to form the combined beliefs, which were denoted as $M^F$, as follows:
$$b^F_k = \frac{1}{1-C}\left(b^1_k b^2_k + b^1_k u^2 + b^2_k u^1\right), \qquad u^F = \frac{1}{1-C}\, u^1 u^2,$$
where $\frac{1}{1-C}$ denotes the scale factor for normalization. The term $C = \sum_{i \neq j} b^1_i b^2_j$ represents the degree of conflict observed between the two sets of mass values, and $b^1_i b^2_j$ represents the white blocks in Figure 3D. It can be observed that, in instances where both $M^1$ and $M^2$ exhibit high levels of uncertainty (high values of $u^1$ and $u^2$), the resulting prediction has low confidence, indicated by a lower value of $b^F_k$. Conversely, when both sources demonstrate low uncertainty (low values of $u^1$ and $u^2$), the resulting prediction exhibits high confidence, indicated by a higher value of $b^F_k$. When only one source shows low uncertainty (either $u^1$ or $u^2$), the final prediction relies on the more reliable source. After $M^F = \{\{b^F_k\}_{k=1}^{K}, u^F\}$ was obtained, the final evidence could be inferred with $S = K / u^F$, $e_k = b^F_k \times S$, and $\alpha_k = e_k + 1$.
Furthermore, we introduced an enhanced cross-entropy loss function by integrating sample-specific evidence, as follows:
$$L_{ace}(\alpha_i) = \sum_{j=1}^{K} y_{ij}\left(\psi(S_i) - \psi(\alpha_{ij})\right),$$
where $\alpha_i$ is the parameter of the Dirichlet distribution for the $i$th sample, $y_i$ is its one-hot label, $S_i = \sum_{j=1}^{K} \alpha_{ij}$, and $\psi(\cdot)$ is the digamma function. Building on this, an overall sample-specific loss function, which combined the adjusted cross-entropy loss with a Kullback-Leibler divergence term to effectively manage the evidence for incorrect labels, was defined as follows:
$$L(\alpha_i) = L_{ace}(\alpha_i) + \lambda\, KL\left[D(p_i \mid \tilde{\alpha}_i) \,\|\, D(p_i \mid \mathbf{1})\right].$$
The modified parameter of the Dirichlet distribution is $\tilde{\alpha}_i = y_i + (1 - y_i) \odot \alpha_i$, and $\lambda$ is a balance factor greater than zero. This design helps penalize incorrect class evidence while preserving the evidence for the correct class. To ensure that the informative augmentation and evidence fusion were updated simultaneously, a total loss was defined that combines the GAT losses $L^m_{GAT}$ with the fusion loss above, where $\gamma$ denotes an adjustable weighting factor. We set $\gamma = 1$ across our investigations.
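The two-source fusion described above can be sketched as follows. This is a minimal illustration that assumes the reduced Dempster combination rule commonly used in trusted multi-view classification; the function names and the toy mRNA and methylation opinions are hypothetical, not values from the study.

```python
def combine_opinions(b1, u1, b2, u2):
    """Fuse two opinions (belief masses plus uncertainty) with Dempster's rule."""
    if len(b1) != len(b2):
        raise ValueError("both opinions must cover the same classes")
    k = len(b1)
    # Conflict C: mass that the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j] for i in range(k) for j in range(k) if i != j)
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; the rule is undefined")
    scale = 1.0 / (1.0 - conflict)
    fused_b = [scale * (b1[c] * b2[c] + b1[c] * u2 + b2[c] * u1) for c in range(k)]
    fused_u = scale * u1 * u2
    return fused_b, fused_u


def evidence_from_opinion(beliefs, uncertainty):
    """Recover evidence e_k = b_k * S and alpha_k = e_k + 1, with S = K / u."""
    strength = len(beliefs) / uncertainty
    evidence = [b * strength for b in beliefs]
    alpha = [e + 1.0 for e in evidence]
    return evidence, alpha


# Hypothetical opinions: a confident mRNA view and an uncertain methylation view.
mrna_b, mrna_u = [0.8, 0.1], 0.1
meth_b, meth_u = [0.1, 0.2], 0.7
fused_b, fused_u = combine_opinions(mrna_b, mrna_u, meth_b, meth_u)
fused_e, fused_alpha = evidence_from_opinion(fused_b, fused_u)
```

In this toy case the uncertain methylation view weakly favours the second class, yet the fused opinion follows the confident mRNA view, which mirrors the behaviour described for the case in which only one source has low uncertainty.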

Identifying Biomarkers with TEMINET
In biomedical research, the identification of biomarkers is pivotal for unraveling the intricacies of biological processes and providing insight into diverse outcomes. Concurrently, there is a growing need in clinical research for interpretable models that elucidate underlying disease mechanisms and bolster model credibility. Consequently, we applied a global interpretation method to identify the importance of biomarkers in our model. Specifically, the omics features were first normalized to a scale from zero to one. Feature ablation then involved removing features one at a time to evaluate their impact on model efficacy, with a focus on classification capability. The importance of each feature was determined by the reduction in model performance after ablation. In binary and multi-class classification tasks, the F1 score and the macro-averaged F1 score, respectively, were used to assess the impact of feature ablation on model performance. This process was implemented with the best-performing model. For multi-omics data, we applied this strategy to each type of omics data.
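The ablation procedure can be sketched for the binary case as follows. The sketch assumes that "removing" a feature means setting its normalized value to zero, which is an assumption rather than a detail stated in the text; `toy_predict`, the sample values, and the function names are hypothetical stand-ins for the trained model and the real data.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 score for the positive class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def ablation_importance(predict, samples, labels, score=f1_binary):
    """Importance of feature j = baseline score minus score with feature j zeroed."""
    baseline = score(labels, [predict(x) for x in samples])
    importance = []
    for j in range(len(samples[0])):
        # Assumption: ablation is modelled by zeroing the normalized feature.
        ablated = [x[:j] + [0.0] + x[j + 1:] for x in samples]
        importance.append(baseline - score(labels, [predict(x) for x in ablated]))
    return importance


def toy_predict(x):
    """Hypothetical classifier that only looks at the first feature."""
    return 1 if x[0] > 0.5 else 0


samples = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6], [0.2, 0.1]]  # features scaled to [0, 1]
labels = [1, 1, 0, 0]
importance = ablation_importance(toy_predict, samples, labels)
```

Features can then be ranked by the drop in score; in this toy example only the first feature matters to the classifier, so only its ablation reduces the F1 score.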

Figure 1. The efficacy of the proposed approach on different omics data combinations, presented as means and standard errors for comparison.

Figure 3. Framework of TEMINET. (A) TEMINET operates on a sample-wise basis, with the multi-omics information of each sample being imported into the model. (B) The first intra-omics network was built using WGCNA. (C) The intra-omics information at each omics level was augmented using the multi-level GAT. (D) The evidence was evaluated via the subjective logic module to determine the uncertainty. During the integration phase, the trustworthy informativeness and uncertainty of each omics type were amalgamated into a composite embedding that encompassed inter-omics information. The fused representation was subsequently applied to a downstream classification task.
For the subjective-logic masses, $u^m \geq 0$ indicates the overall uncertainty in the classification for the $m$th omics category and $b^m_k \geq 0$ indicates the confidence in each class prediction. The concentration parameters $\alpha^m$ of the Dirichlet distribution were determined from the evidence, where $\alpha^m_k = e^m_k + 1$. With the Dirichlet strength $S^m = \sum_{k=1}^{K} \alpha^m_k$, the belief mass for each class was computed as $b^m_k = e^m_k / S^m$ and the overall uncertainty as $u^m = K / S^m$.

Table 1. Comparison with advanced methods on the ROSMAP and LGG datasets. The top-performing results are emphasized in boldface. Metrics marked with * indicate a significant improvement for our model compared to the second-best method, as confirmed with an independent t-test resulting in p < 0.05.

Table 2. Comparison with advanced methods on the BRCA and KIPAN datasets. The top-performing results are emphasized in boldface. Metrics marked with * indicate a significant improvement for our model compared to the second-best method, as confirmed with an independent t-test resulting in p < 0.05.

Table 3. Examination of the key components of TEMINET on benchmark datasets. The top-performing results are highlighted.

Table 4. Top five significant disease-specific biomarkers identified using TEMINET.

Table 5. Overview of datasets used in this investigation.