Comprehensive Evaluation of Multi-Omics Clustering Algorithms for Cancer Molecular Subtyping

Wang, Juan; Wang, Lingxiao; Liu, Yi; Li, Xiao; Ma, Jie; Li, Mansheng; Zhu, Yunping

doi:10.3390/ijms26030963

by Juan Wang ^1,2,†, Lingxiao Wang ^2,†, Yi Liu ^2, Xiao Li ^2, Jie Ma ^2, Mansheng Li ^2,* and Yunping Zhu ^1,2,*

^1 School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China

^2 State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing 102206, China

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Int. J. Mol. Sci. 2025, 26(3), 963; https://doi.org/10.3390/ijms26030963

Submission received: 14 December 2024 / Revised: 15 January 2025 / Accepted: 21 January 2025 / Published: 23 January 2025

(This article belongs to the Special Issue Machine Learning in Disease Diagnosis and Treatment)

Abstract

Cancer is a highly heterogeneous and complex disease, and identifying its molecular subtypes is crucial for accurate diagnosis and personalized treatment. The integration of multi-omics data enables a comprehensive interpretation of the molecular characteristics of cancer at various biological levels. In recent years, an increasing number of multi-omics clustering algorithms for cancer molecular subtyping have been proposed. However, the absence of a definitive gold standard makes it challenging to evaluate and compare these methods effectively. In this study, we developed a general framework for the comprehensive evaluation of multi-omics clustering algorithms and introduced a new metric, the accuracy-weighted average index, which simultaneously considers both clustering performance and clinical relevance. Using this framework, we performed a thorough evaluation and comparison of 11 state-of-the-art multi-omics clustering algorithms, including deep learning-based methods. By integrating the accuracy-weighted average index with computational efficiency, our analysis reveals that PIntMF demonstrates the best overall performance, making it a promising tool for molecular subtyping across a wide range of cancers.

Keywords:

multi-omics; cancer subtyping; clustering algorithm; accuracy-weighted average index

1. Introduction

Cancer is a disease with complex etiology and high heterogeneity, necessitating its classification into distinct types or subtypes based on various characteristics. In recent years, with the widespread adoption of high-throughput technologies and the continuous increase in omics data, cancer subtyping has progressively shifted from a traditional focus on disease sites and histomorphological features to a more molecularly driven approach [1]. Initially, molecular subtyping of cancer primarily focused on single-omics data, which, while providing insights into molecular alterations at specific levels, struggled to offer a comprehensive understanding of the full spectrum of cancer biology. Multi-omics integration can uncover how various molecular entities and biological processes interact and interrelate [2]. In order to better investigate the complex regulatory mechanisms underlying cancer, researchers have increasingly turned to the integration of multi-omics data for molecular subtyping analysis. To date, numerous multi-omics clustering algorithms have been proposed for the molecular subtyping of cancer. These algorithms are typically classified into one or more of the following categories based on their underlying principles: similarity, fusion, matrix factorization, Bayesian, network, dimensionality reduction, deep learning, and multivariate [3,4,5].

The analysis of multi-omics data is prone to various sources of error, bias, and uncertainty that can affect their quality and validity, so methods for their analysis must be rigorously and transparently evaluated to ensure reliability and accuracy [2]. The evaluation and comparison of multi-omics clustering methods remains challenging due to the absence of a gold standard. Currently, some progress has been made in this area. For example, Tini et al. [6] tested several integration methods, MCCA, MCIA, MFA, JIVE, and SNF, for classifying patients with varying signal strengths and noise levels using both real datasets (mouse liver, platelet reactivity, and breast cancer datasets) and simulated datasets. They evaluated classification performance using F-scores and accuracy indices. Similarly, Chauvel et al. [7] evaluated the ability of algorithms such as iCluster, moCluster, iNMF, JIVE, MDI, and BCC to recover the correct number of clusters and simulated clusters at the public and data-specific levels using simulated datasets, with validation on a breast cancer dataset. However, most of the existing studies have not specifically focused on cancer multi-omics datasets, nor have they assessed the generalizability of evaluation metrics across a broad spectrum of cancers. The majority of research has primarily concentrated on traditional methods, with less emphasis on the application and evaluation of emerging multi-omics clustering algorithms, such as those based on deep learning. This highlights the need for the development of a robust evaluation framework that can be broadly applied to multi-omics data across various cancer types, representing a promising direction for future research.

To address this gap, we developed a comprehensive evaluation process for multi-omics clustering algorithms and introduced a new metric, the accuracy-weighted average index (AWA). By integrating multi-omics data from a diverse set of cancers, we conducted a detailed comparative analysis of 11 state-of-the-art multi-omics clustering algorithms (MOSD [8], MSNE [9], MCLS [10], Subtype-WESLR [11], PIntMF [12], nNMF [13], SMCC [14], MDICC [15], Parea [16], Subtype-GAN [17], and Subtype-DCC [18]), specifically focusing on their performance in the molecular subtyping of cancer (Figure 1). This evaluation not only helps identify the strengths and limitations of each algorithm but also provides valuable insights for guiding future algorithmic evaluation in this field.

2. Results

2.1. Internal Metrics of Clustering Algorithm

Herein, we first calculated the internal metrics (silhouette coefficient (S), Calinski–Harabasz index (CH), and Dunn’s index (D) values) for 11 multi-omics clustering algorithms applied to each of the nine TCGA cancer datasets (BRCA, BLCA, KIRC, LUAD, PAAD, SKCM, STAD, UCEC, and UVM). We then evaluated the clustering effectiveness of the algorithms by averaging the internal metric scores across the cancer datasets (Figure 2). The results showed that PIntMF achieved the highest average scores for internal metrics, indicating that its clustering performance was superior across the cancer datasets compared to the other 10 algorithms. Additionally, Subtype-DCC ranked in the top three for all three internal evaluation metrics, demonstrating similarly strong clustering performance. In contrast, MDICC and MCLS received relatively low scores, suggesting poorer clustering performance.

2.2. Clinical Metrics of Clustering Algorithm

Clinical differences between molecular subtypes are primarily assessed by survival differences and clinical label enrichment, quantified by the log rank test (LRT) and enriched clinical parameters (ECP) values, respectively. To minimize bias in the enrichment analysis, we selected a consistent set of clinical information across all cancer datasets, including gender, age at diagnosis, pathologic T, pathologic M, pathologic N, and pathologic stage. The clinical metric scores reveal that Subtype-DCC achieved the highest scores for both LRT and ECP (Figure 3), indicating its superior ability to identify subtypes with clinically significant differences. Similarly, MOSD performed well, particularly excelling in the ECP metric, where it outperformed all other algorithms. In contrast, MDICC had lower clinical metric scores, suggesting it identifies subtypes with less pronounced clinical differences.
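The survival comparison behind the LRT score can be illustrated with a minimal sketch of a log-rank test. It is restricted to two subtypes for brevity, whereas the study compares K subtypes per dataset, and the function name `logrank_two_groups` and the toy follow-up times are hypothetical. It uses only the Python standard library and is not the authors' implementation.

```python
import math
from typing import Sequence, Tuple


def logrank_two_groups(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) of a two-group log-rank test.

    An event flag of 1 marks an observed death, 0 a censored observation.
    """
    # Pool all observations as (time, event flag, belongs to group A).
    pooled = [(t, e, True) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, False) for t, e in zip(times_b, events_b)]

    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e, _ in pooled if e}):
        n = sum(1 for ti, _, _ in pooled if ti >= t)                # at risk overall
        n_a = sum(1 for ti, _, in_a in pooled if ti >= t and in_a)  # at risk in A
        d = sum(1 for ti, e, _ in pooled if ti == t and e)          # events overall
        d_a = sum(1 for ti, e, in_a in pooled if ti == t and e and in_a)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    stat = observed_minus_expected ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example: subtype A patients die early, subtype B patients die late.
stat, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

A smaller p-value indicates stronger survival separation between the subtypes. How the p-value is converted into the LRT score used for ranking is not reproduced here.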

2.3. The Accuracy-Weighted Average Index of Clustering Algorithm

Although optimal algorithms can be found based on clustering effects and clinical performance through internal and clinical metrics, respectively, discrepancies remain when considering the overall perspective. We ranked the algorithms based on their internal and clinical scores separately across different cancer datasets and selected the top five scorers in each category. Interestingly, algorithms selected based on internal metrics (internal algorithms) did not align with those selected based on clinical metrics (clinical algorithms) (Figure 4A). This observation suggests that such screening methods might highlight algorithms that excel in one aspect (either clustering effect or clinical performance) but perform poorly in the other. Moreover, these methods could miss high-quality algorithms that maintain balanced performance across both aspects. For example, in the STAD dataset, MOSD and MSNE ranked in the top five for internal scores (7.08 and 6.67, respectively), but their clinical scores (3.5 and 1.25) placed them at the bottom (Figure 4A).

To address this limitation, we propose the AWA value, which assigns different weights to internal and clinical metrics (I/C) and performs weighted averaging to synthesize the clustering effect and clinical performance of the algorithm. With the weights set to I/C = 50%/50%, the AWA identifies Subtype-DCC, nNMF, Subtype-GAN, SMCC, and Parea as the top performers in both clustering effect and clinical performance for the STAD dataset (Figure 4B,C). Notably, while SMCC and Parea were not selected on the basis of internal or clinical scores alone, their balanced performance in both aspects outperformed over half of their counterparts. Additionally, Subtype-DCC is the preferred algorithm for clustering the STAD dataset, as it effectively combines both clustering effectiveness and clinical performance. We performed survival analyses of the clustering results on the STAD dataset from PIntMF (top-ranked for internal metrics), Subtype-WESLR (top-ranked for clinical metrics), and Subtype-DCC (top-ranked for AWA scores). The clustering result derived from Subtype-DCC shows the most significant survival differences (Subtype-DCC, p = 0.0082 (Figure 4D); Subtype-WESLR, p = 0.038 (Figure S5); PIntMF, p = 0.66), suggesting that AWA can balance the difference between internal and clinical indicators and provide a more effective measurement of disease-related characteristics.

Meanwhile, we calculated and ranked the AWA scores for 11 algorithms applied to nine different cancer datasets (Figure 5A), with the complete set of AWA scores provided in Figure S1. Although the accuracy of each algorithm varies significantly across different cancer datasets, PIntMF, which achieves the highest AWA score, performs best across most datasets, particularly excelling in the PAAD, UCEC, and BRCA datasets (with AWA scores of 9.62, 8.92, and 8.62, respectively). However, PIntMF performed relatively poorly in the STAD and LUAD datasets (with AWA scores of 5.25 and 5.98), primarily due to its low clinical scores in these datasets (2.25 and 4.00, respectively). In contrast, Subtype-DCC, which has the second-highest AWA score, performs consistently across all datasets (with AWA scores above 6.5 in all cases) and outperforms PIntMF in the BLCA, STAD, SKCM, and LUAD datasets (Figure S1), yet it is less efficient in terms of runtime. In addition, the lower-ranked algorithms MDICC and MCLS scored below 5 in most datasets, particularly ranking among the worst performers in the BLCA, KIRC, and BRCA datasets. To verify the reliability of the ranking, we calculated the AWA scores under various weight settings (Figure S2). Notably, PIntMF and Subtype-DCC consistently maintained their leading performance across different weights.
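A weight-sensitivity check of this kind can be sketched as follows. The algorithm names and scores are hypothetical placeholders, not values from the study. The sketch also assumes that the internal and clinical weights sum to 1, which the text does not state for the settings in Figure S2.

```python
from typing import Dict, List, Tuple

# (internal scores S, CH, D) and (clinical scores LRT, ECP) per algorithm.
Scores = Tuple[Tuple[float, float, float], Tuple[float, float]]


def awa(internal: Tuple[float, ...], clinical: Tuple[float, ...],
        w_internal: float) -> float:
    """AWA with clinical weight 1 - w_internal (an assumption of this sketch)."""
    w_clinical = 1.0 - w_internal
    total = sum(internal) * w_internal + sum(clinical) * w_clinical
    return total / (3 * w_internal + 2 * w_clinical)


def rank_algorithms(scores: Dict[str, Scores], w_internal: float) -> List[str]:
    """Return algorithm names ordered from highest to lowest AWA."""
    return sorted(scores, key=lambda name: awa(*scores[name], w_internal),
                  reverse=True)


# Hypothetical scores: algo_A clusters well, algo_B is clinically stronger.
hypothetical_scores: Dict[str, Scores] = {
    "algo_A": ((9.0, 9.0, 9.0), (8.0, 8.0)),
    "algo_B": ((7.0, 7.0, 7.0), (9.0, 9.0)),
    "algo_C": ((3.0, 3.0, 3.0), (4.0, 4.0)),
}

for w in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"I/C = {w:.0%}/{1 - w:.0%}:", rank_algorithms(hypothetical_scores, w))
```

In this toy data the top two algorithms swap places only when the clinical weight dominates. A ranking that stays fixed across the whole sweep would be the pattern reported for PIntMF and Subtype-DCC.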

In addition to achieving high accuracy, researchers highly value multi-omics clustering algorithms that are efficient and require less computation time. Here, we calculated the average runtime of the 11 algorithms across nine cancer datasets. As shown in Figure 5B, with the exception of Subtype-WESLR, Subtype-DCC, nNMF, and SMCC, the average runtime of the remaining seven algorithms is under 50 s, indicating their high operational efficiency.

2.4. Molecular Subtyping Performance of PIntMF

To assess whether the AWA scores accurately reflect both the clustering effect and clinical performance of the algorithm, we conducted UMAP dimensionality reduction and survival analysis using the PIntMF algorithm on the BRCA dataset (AWA score: 8.62, indicating high accuracy) and the STAD dataset (AWA score: 5.25, indicating low accuracy). As shown in Figure 6, the BRCA dataset was successfully divided into five distinct clusters, with significant survival differences between these clusters (p = 0.0041), among which Cluster 2 exhibited the best prognosis. In contrast, the STAD dataset was divided into three clusters, but its UMAP visualization revealed obvious mixing between clusters, and the survival differences between the clusters were not significant (p = 0.66). This suggests that PIntMF performs poorly for subtyping STAD and may not be effective in supporting prognosis for this cancer type. These findings are consistent with the observed AWA scores. Therefore, the AWA score serves as a reliable metric for evaluating the performance of the algorithm, facilitating both cancer subtyping and its associated prognosis.

3. Discussion

In recent years, the field of artificial intelligence (AI) has transitioned from primarily theoretical research to real-world applications [19]. For example, the FDA approved automated detection, counting, and computer-generated analysis of the HER2 gene for therapeutic determination in breast cancer [20]. Esteva et al. trained a convolutional neural network (CNN) to distinguish nonmelanoma skin cancer from benign seborrheic keratosis and melanoma from benign nevi. They projected their results onto a practical application: incorporating deep neural networks into clinicians' mobile devices to yield diagnoses beyond the confines of the office or clinic [21]. However, the ability of AI to effectively assist in clinical decision-making depends on the performance of prediction, recognition, or classification models.

Molecular subtyping of cancer plays a crucial role in enabling precise diagnosis and personalized treatment for patients, which also requires high precision in subtyping algorithms to support decision-making. With the rapid advancement of multi-omics clustering algorithms, the need for effective evaluation and comparison of these methods has become increasingly important. In this study, we established a comprehensive framework for evaluating multi-omics clustering algorithms. We assessed the clustering performance and clinical relevance of 11 algorithms across nine TCGA cancer datasets using both internal and clinical metrics. During the evaluation, we observed that while some algorithms had very similar performance across these two types of metrics, many others exhibited significant discrepancies between their internal and clinical scores. This variation posed challenges in evaluating and selecting the best-performing algorithm, making it difficult to quickly identify the algorithm with the highest accuracy.

Therefore, we propose a new evaluation metric, AWA, which measures the accuracy of multi-omics clustering algorithms by considering both their clustering effectiveness and the clinical performance of the resulting molecular subtypes. This metric enables a comprehensive assessment of algorithm performance in cancer molecular subtyping. Additionally, the optimal clustering algorithm varies across different cancer datasets. Among the nine cancer datasets analyzed in this study, PIntMF achieved the highest AWA score, demonstrating strong overall performance, particularly in the PAAD, UCEC, and BRCA datasets, where it can be recommended as the first-choice clustering algorithm (Figure 5C). Furthermore, the subtyping performance of PIntMF on the BRCA and STAD datasets further demonstrates that AWA serves as a reliable metric for evaluating the algorithm’s subtyping performance. In addition, to validate the reliability and generalizability of the comprehensive evaluation process, we applied our analysis to two non-TCGA datasets: the PBT and HCC datasets. The results show that our strategy can achieve an accurate evaluation of the performance of multi-omics clustering algorithms on the supplemental datasets. PIntMF, the best clustering algorithm identified by the evaluation process in TCGA datasets, consistently performs well in the non-TCGA datasets containing additional omics data (Figures S3 and S4).

By combining the AWA scores with the operational efficiency of each algorithm, we found that PIntMF delivers the best overall performance (Figure 5). This performance may be attributed to the design of PIntMF. In terms of clustering performance, a key advantage of this method is its ability to automatically tune the lasso penalties for both variable and sample matrices. For operational efficiency, PIntMF utilizes the faster GLMNet framework to infer the matrix H^k and optimizes algorithm initialization using the SNF algorithm [12]. These features enable PIntMF to effectively address the distributional heterogeneity of multi-omics data, with particularly strong performance on the PAAD and BRCA datasets. In contrast, other algorithms showed certain limitations: Subtype-DCC and MOSD exhibited high accuracy but poor running efficiency, while MDICC had lower accuracy but better operational efficiency. However, each algorithm has its limitations in application. For instance, when applied to large datasets, the operational efficiency of PIntMF decreases significantly [12]; in such cases, algorithms with lower accuracy but better runtime efficiency, such as MDICC, may offer more practical support for disease subtype clustering.

Overall, we construct a comprehensive evaluation process to identify ideal algorithms for cancer molecular subtyping and provide a theoretical foundation for the development of computer-aided tools to support cancer diagnosis and treatment. After being evaluated by multiple datasets, PIntMF is identified as a highly accurate and operationally efficient cancer subtyping algorithm that can be integrated into computer-aided tools for analyzing patient-derived multi-omics data, assisting in the diagnosis and treatment of specific cancers, such as PAAD, BRCA, and UCEC.

Notably, in this study, the clustering effect of the algorithm was considered to be of equal importance to the clinical performance. To calculate the overall evaluation, we assigned equal weights to the internal and clinical metrics. However, this may not represent the optimal weighting for each cancer dataset, and these parameters could be adjusted in future studies depending on the specific cancer datasets or research focus. In addition, the K-values for the cancer datasets were selected based on existing studies; we recognize that the performance of clustering algorithms could be further improved by optimization of K-values on different datasets. Additionally, the current evaluation process mainly focused on omics data of copy number, DNA methylation, mRNA, and miRNA data for nine cancers, which may lead to limitations in the application of the evaluation process to other types of omics data and diseases. In future studies, we will try to include a broader range of multi-omics data (e.g., proteomics, single-cell omics, etc.) and validate and improve the evaluation process across a wider array of diseases. The characteristics of clustering algorithms, such as interpretability and the issues related to deep learning models, including overfitting and underfitting results, that were not included in the current study, should also be considered.

4. Materials and Methods

In this study, a comprehensive evaluation process of multi-omics clustering algorithms was constructed (Figure 1). The process began with the preparation and preprocessing of the omics data, followed by the selection and execution of clustering algorithms. Finally, the molecular subtyping performance of the different algorithms was thoroughly assessed based on the AWA and running efficiency.

4.1. Data Sources and Preprocessing

The Cancer Genome Atlas (TCGA), a cancer research project created in 2006 by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), contains genomic, transcriptomic, epigenomic, and proteomic data across 33 different cancer types [22]. In the current study, we utilized nine TCGA cancer datasets, as provided by Yang et al. [17]. These datasets collectively include 4027 tumor samples from nine cancer types: breast invasive carcinoma (BRCA, 1031 samples), bladder urothelial carcinoma (BLCA, 399 samples), kidney renal clear cell carcinoma (KIRC, 488 samples), lung adenocarcinoma (LUAD, 490 samples), pancreatic adenocarcinoma (PAAD, 176 samples), skin cutaneous melanoma (SKCM, 446 samples), stomach adenocarcinoma (STAD, 407 samples), uterine corpus endometrial carcinoma (UCEC, 510 samples), and uveal melanoma (UVM, 80 samples). Each dataset includes four types of omics data for all samples: copy number, DNA methylation, mRNA, and miRNA. The large sample size and comprehensive clinical information in these datasets ensure the stability and reliability of the subsequent molecular subtyping analysis. To minimize bias in the clinical analysis, we selected a consistent set of clinical information across all cancer datasets, including gender, age at diagnosis, pathologic T, pathologic M, pathologic N, and pathologic stage.

After data collection, we performed preprocessing on the four omics data types: (1) Copy number data: removing duplicate regions from the original data and constructing features based on the correspondence between samples and genomic regions. (2) DNA methylation data: combining features with β values ≥ 0.3 from DNA methylation arrays HM27 and HM450. (3) mRNA and miRNA data: applying a log2 transformation, removing poorly expressed genes using the median absolute deviation, and reducing feature dimensionality through a variance threshold. Moreover, missing values were imputed using the sample mean of each omics data, and data were normalized by removing the mean of the features and scaling them to unit variance. After preprocessing, we obtained a total of 3105 copy number features, 3139 DNA methylation features, 3217 mRNA features, and 383 miRNA features.
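The final imputation and normalization step can be sketched as below. This is a minimal illustration using only the Python standard library. It assumes that missing values are encoded as `None`, that imputation uses the mean of each feature over the observed samples, and that scaling uses the population standard deviation. The function name `impute_and_standardize` is hypothetical.

```python
import statistics
from typing import List, Optional


def impute_and_standardize(
    matrix: List[List[Optional[float]]],
) -> List[List[float]]:
    """Mean-impute missing entries, then scale each feature to mean 0, variance 1.

    Rows are samples and columns are features; None marks a missing value.
    """
    n_features = len(matrix[0])
    result = [[0.0] * n_features for _ in matrix]
    for j in range(n_features):
        observed = [row[j] for row in matrix if row[j] is not None]
        mean = statistics.fmean(observed)
        # Filling gaps with the mean leaves the feature mean unchanged.
        column = [mean if row[j] is None else row[j] for row in matrix]
        sd = statistics.pstdev(column)
        for i, value in enumerate(column):
            # Constant features carry no information and are mapped to 0.
            result[i][j] = (value - mean) / sd if sd > 0 else 0.0
    return result


toy = [[1.0, 10.0], [None, 20.0], [3.0, 30.0]]
for row in impute_and_standardize(toy):
    print([round(v, 3) for v in row])
```

After this step every feature contributes on the same scale, which matters because the four omics layers have very different raw ranges.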

To further validate the reliability and generalizability of the comprehensive evaluation process, we supplemented our analysis with two non-TCGA datasets: the pediatric brain tumor (PBT) dataset (214 samples) from the UCSC Xena browser (https://xenabrowser.net/ (accessed on 20 September 2024)) and the hepatocellular carcinoma (HCC) dataset (159 samples) from the National Omics Data Encyclopedia (NODE, https://www.biosino.org/node/ (accessed on 25 September 2024)). The PBT dataset comprises three omics types: transcriptomics, proteomics, and phosphoproteomics, while the HCC dataset includes four omics types: genomics (copy number), transcriptomics, proteomics, and phosphoproteomics. Preprocessing methods for copy number and transcriptomics data followed the procedures described above. For proteomics and phosphoproteomics data, features with more than 50% missing values were excluded, and the remaining missing values were imputed using the mean. Feature selection and normalization were conducted using the median absolute deviation (MAD) and median normalization methods, respectively.
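The proteomics filtering rules described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The helper names and the number of retained features (`n_keep`) are hypothetical, since the text does not state how many features were kept after MAD-based selection.

```python
import statistics
from typing import List, Optional, Sequence

Matrix = List[List[Optional[float]]]  # rows = samples, columns = features


def filter_by_missingness(matrix: Matrix, max_missing: float = 0.5) -> List[int]:
    """Return indices of features with at most max_missing fraction missing."""
    n_samples = len(matrix)
    keep = []
    for j in range(len(matrix[0])):
        missing = sum(1 for row in matrix if row[j] is None)
        if missing / n_samples <= max_missing:
            keep.append(j)
    return keep


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_mad_features(matrix: Matrix, candidates: List[int], n_keep: int) -> List[int]:
    """Keep the n_keep candidate features with the largest MAD."""
    scored = []
    for j in candidates:
        observed = [row[j] for row in matrix if row[j] is not None]
        scored.append((mad(observed), j))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return sorted(j for _, j in scored[:n_keep])


toy: Matrix = [
    [1.0, None, 5.0],
    [2.0, None, 5.0],
    [3.0, None, 5.1],
    [10.0, 4.0, None],
]
kept = filter_by_missingness(toy)            # feature 1 is 75% missing
print(kept, top_mad_features(toy, kept, n_keep=1))
```

Computing the MAD on observed values only, before imputation, avoids shrinking the variability of features that had many gaps. Whether the study applied selection before or after imputation is not stated.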

It is important to note that multi-omics clustering algorithms for molecular subtyping of cancers typically use unsupervised clustering methods. To ensure the fairness and consistency of the evaluation results, the number of clusters (K-value) for each cancer dataset was set based on the reasonable number of subtypes identified in previous studies (Table 1).

4.2. Multi-Omics Clustering Algorithm Selection

In recent years, researchers have developed a wide range of multi-omics clustering algorithms, each employing different strategies. Even when applied to the same cancer dataset, these algorithms often yield inconsistent subtyping results. As a result, evaluating and comparing different algorithms becomes crucial for ensuring reliable and accurate subtyping.

This study evaluated the performance of 11 multi-omics clustering algorithms for comprehensive cancer molecular subtyping. The selection was based on the multi-omics clustering algorithms published in the last five years and the availability of implementable code. A brief description of each of the 11 algorithms is provided below:

MOSD [8] first creates affinities for each data type using local scaling. The affinities are then linearly combined into a network by assigning weights to each omics. Finally, spectral clustering is applied to the self-diffusion-enhanced similarity network to identify cancer subtypes.
MSNE [9] first constructs similarity networks of samples for complete or partial multi-omics data. Then, the integrated similarity of samples is captured through random walks on multiple similarity networks. Finally, the cancer subtypes are obtained with the Skip-gram method by projecting the samples into a low-dimensional space for k-means clustering.
MCLS [10] first utilizes complete multi-omics data to construct a latent subspace using principal component analysis (PCA)-based feature extraction and singular value decomposition (SVD). Then, a projection matrix of each omics is learned to project the incomplete multi-omics data to the latent subspace. Finally, the samples are clustered using spectral clustering in the latent subspace.
Subtype-WESLR [11] is based on a sparse subspace learning framework. First, it employs a weighted ensemble strategy to fuse base clustering obtained from different methods as prior knowledge. Then, the sample feature profiles of each data type are projected to a common latent subspace corresponding to the subspace consistency. Finally, the common subspace is optimized by an iterative method to identify cancer subtypes.
PIntMF [12] is a matrix factorization model with non-negativity and sparsity constraints. First, the original matrix is decomposed into a product of a common basis matrix (W) and a specific coefficient matrix (H^k). Then, sparsity is added to W and H^k via lasso penalization and equality constraints are applied to improve interpretability. Finally, W and H^k are iteratively updated until the similarity of W is stable, and hierarchical clustering is performed on W to obtain the sample subtypes.
nNMF [13] combines the strengths of intNMF [34] and SNF [35]. It first uses intNMF to construct a stable consensus matrix for each data type. Then, these consensus matrices are integrated into a single consensus matrix by SNF. Finally, spectral clustering is performed on this single consensus matrix to determine cancer subtypes.
SMCC [14] first constructs sample-sample similarity networks based on Euclidean distance. Then, it integrates weighted least squares, low-rank subspace representation, and entropy to fuse networks. Finally, co-regularization is used to measure and minimize the distribution difference between the similarity networks and the fused network, and cancer subtypes are obtained through clustering.
MDICC [15] first constructs affinity matrices for different omics data based on Gaussian kernel functions. After fusing them using low-rank subspace representation and entropy, the integrated matrix is clustered using K-means++ to obtain cancer subtypes.
Parea [16] is based on multi-view hierarchical clustering and data fusion. It first selects hierarchical clustering methods to represent each omics data into separate views. Then, hierarchical clustering is used again to identify cancer subtypes by creating a fusion object.
Subtype-GAN [17] is a deep adversarial learning method based on a multi-input multi-output neural network. First, the features of each omics data are extracted from relatively independent layers. Then, after inputting these features into the same shared layer, the subtypes of the samples are obtained by consensus GMM clustering through the hidden factors of the shared layer.
Subtype-DCC [18] combines deep clustering and decoupled contrastive learning. First, data pairs are constructed from the pair construction backbone (PCB) through data augmentation, and features are extracted from the augmented data using a shared deep neural network. Then, the instance-level contrastive head (ICH) and the cluster-level contrastive head (CCH) are used for contrastive learning in the row and column spaces of the feature matrix, respectively. After training, the CCH is used to predict cancer subtypes.
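Several of the similarity-based methods above (for example MOSD, SMCC, and MDICC) start from a sample-by-sample affinity matrix for each omics layer. The sketch below shows one common way to build such a matrix with a Gaussian kernel on Euclidean distances. It illustrates the general idea only and does not reproduce any of the 11 algorithms. The fixed bandwidth `sigma` is an assumption, and the published methods may use adaptive or locally scaled bandwidths.

```python
import math
from typing import List


def gaussian_affinity(samples: List[List[float]], sigma: float = 1.0) -> List[List[float]]:
    """Return the symmetric matrix A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(samples)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-sq_dist / (2 * sigma ** 2))
            affinity[i][j] = affinity[j][i] = value  # kernel is symmetric
    return affinity


# Three toy samples from one omics layer: the first two are close, the third is far.
layer = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
for row in gaussian_affinity(layer):
    print([round(v, 4) for v in row])
```

Each omics layer yields one such matrix. The algorithms differ mainly in how they fuse the per-layer matrices and which clustering step they run on the fused result.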

As shown in Table 2, the operating principles of each multi-omics clustering algorithm are different. Based on the classification framework proposed by Subramanian and Rappoport et al. [3,4], we categorized the 11 algorithms into five main groups: similarity-based clustering methods, dimensionality reduction-based clustering methods, matrix factorization-based clustering methods, fusion-based methods, and deep learning-based clustering methods.

4.3. Metrics for Evaluating Subtyping Performance

Currently, commonly used metrics for evaluating cancer subtyping algorithms include clustering internal metrics or clinical metrics and, in some cases, runtime to assess the algorithm’s operational performance. Internal metrics include the silhouette coefficient (S) [36], the Calinski–Harabasz index (CH) [37], and the Dunn’s index (D) [38]. S compares, for each sample, the mean distance to the other members of its own cluster with the mean distance to the members of the nearest other cluster; CH is the ratio of between-cluster to within-cluster dispersion (sums of squares), each normalized by its degrees of freedom; and D is the ratio of the minimum pairwise distance between clusters to the maximum diameter within a cluster. None of the internal metrics rely on external information; they measure clustering performance only through intra-cluster compactness and inter-cluster separation. Clinical metrics include survival differences between subtypes obtained by the log rank test (LRT) [39] and the number of enriched clinical parameters (ECP) obtained by the χ² test [40] and the Kruskal–Wallis test [41]. The formulas for all clustering internal metrics or clinical metrics used in this study are provided in Table 3.
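For orientation, the three internal metrics can be computed as in the sketch below. It follows the standard textbook definitions with Euclidean distance, which is assumed here to match Table 3. The function names are hypothetical, and the code favours readability over speed.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def group(points: List[Point], labels: List[int]) -> Dict[int, List[Point]]:
    """Collect the points of each cluster label."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean of (b - a) / max(a, b) over all samples."""
    clusters = group(points, labels)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def calinski_harabasz(points: List[Point], labels: List[int]) -> float:
    """Between-cluster over within-cluster dispersion, scaled by degrees of freedom."""
    clusters = group(points, labels)
    n, k, dim = len(points), len(clusters), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * math.dist(centroid, overall) ** 2
        within += sum(math.dist(p, centroid) ** 2 for p in members)
    return (between / (k - 1)) / (within / (n - k))


def dunn(points: List[Point], labels: List[int]) -> float:
    """Minimum inter-cluster distance divided by maximum cluster diameter."""
    clusters = list(group(points, labels).values())
    min_separation = min(math.dist(p, q)
                         for i, a in enumerate(clusters) for b in clusters[i + 1:]
                         for p in a for q in b)
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter


toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
print(silhouette(toy_points, toy_labels),
      calinski_harabasz(toy_points, toy_labels),
      dunn(toy_points, toy_labels))
```

All three metrics increase as clusters become tighter and better separated, which is why they can be averaged into a single internal ranking once placed on a common scale.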

When comparing the subtyping performance of algorithms, the clinical significance of inter-subtyping is as important as the clustering effect itself. To integrate both the clustering performance of the algorithm and the clinical characteristics between subtypes, we propose a new evaluation metric, the accuracy-weighted average index (AWA). The AWA is defined as follows:

\mathrm{AWA} = \frac{(S_{S} + S_{CH} + S_{D}) \times w_{1} + (S_{LRT} + S_{ECP}) \times w_{2}}{3 w_{1} + 2 w_{2}} \qquad (1)

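where S_S, S_CH, and S_D are the scores for S, CH, and D among the internal metrics, and S_LRT and S_ECP are the scores for LRT and ECP among the clinical metrics. Moreover, w₁ and w₂ are the weights of the internal and clinical metrics, respectively. The denominator 3w₁ + 2w₂ normalizes for the three internal and two clinical terms, so with equal weights the AWA reduces to the simple mean of the five scores.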
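Equation (1) translates directly into a few lines of code. The sketch below is an illustration rather than the authors' implementation. The function name `awa` and the example scores are hypothetical, and the default weights correspond to the I/C = 50%/50% setting.

```python
def awa(s_s: float, s_ch: float, s_d: float,
        s_lrt: float, s_ecp: float,
        w1: float = 0.5, w2: float = 0.5) -> float:
    """Accuracy-weighted average index following Equation (1).

    s_s, s_ch, s_d are the internal metric scores; s_lrt, s_ecp are the
    clinical metric scores; w1 and w2 weight the two groups.
    """
    denominator = 3 * w1 + 2 * w2
    if denominator <= 0:
        raise ValueError("at least one weight must be positive")
    numerator = (s_s + s_ch + s_d) * w1 + (s_lrt + s_ecp) * w2
    return numerator / denominator


# Hypothetical scores for one algorithm on one dataset.
print(awa(8.0, 7.0, 9.0, 4.0, 5.0))                  # equal weights
print(awa(8.0, 7.0, 9.0, 4.0, 5.0, w1=1.0, w2=0.0))  # internal metrics only
```

Setting one weight to zero recovers the mean of the other group of scores, so the purely internal and purely clinical rankings are the two extremes of the same index.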

5. Conclusions

In this study, we developed a comprehensive framework for evaluating multi-omics clustering algorithms and introduced a novel metric for subtyping performance, the accuracy-weighted average index, which assigns different weights to clustering effectiveness and clinical performance. Furthermore, we comprehensively compared the molecular subtyping performance of 11 state-of-the-art multi-omics clustering algorithms on nine TCGA cancer datasets. Our findings indicate that PIntMF is the preferred algorithm for molecular subtyping in most cancers owing to its superior clustering accuracy and operational efficiency.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26030963/s1.

Author Contributions

Conceptualization, J.M., M.L. and Y.Z.; Data curation, J.W.; Formal analysis, J.W.; Funding acquisition, J.M. and Y.Z.; Investigation, J.W.; Methodology, Y.L. and X.L.; Project administration, J.M., M.L. and Y.Z.; Resources, J.W.; Software, J.W. and L.W.; Supervision, J.M., M.L. and Y.Z.; Validation, Y.L., X.L. and J.M.; Visualization, J.W. and L.W.; Writing—original draft, J.W. and L.W.; Writing—review and editing, J.W. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Key Research Program of China (grant number 2021YFA1301603), the National Natural Science Foundation of China (grant number 82341079), and the Open Project Program of the State Key Laboratory of Medical Proteomics (Academy of Military Medical Sciences) (grant number SKLP-O202204).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Materials.

Acknowledgments

We thank the bioinformatics platform of the National Center for Protein Sciences (Beijing, China) for providing the computational resource for data analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhao, L.; Lee, V.H.F.; Ng, M.K.; Yan, H.; Bijlsma, M.F. Molecular Subtyping of Cancer: Current Status and Moving toward Clinical Applications. Brief. Bioinform. 2019, 20, 572–584.
2. Ivanisevic, T.; Sewduth, R.N. Multi-Omics Integration for the Design of Novel Therapies and the Identification of Novel Biomarkers. Proteomes 2023, 11, 34.
3. Subramanian, I.; Verma, S.; Kumar, S.; Jere, A.; Anamika, K. Multi-Omics Data Integration, Interpretation, and Its Application. Bioinform. Biol. Insig. 2020, 14, 117793221989905.
4. Rappoport, N.; Shamir, R. Multi-Omic and Multi-View Clustering Algorithms: Review and Cancer Benchmark. Nucleic Acids Res. 2018, 46, 10546–10562.
5. Athieniti, E.; Spyrou, G.M. A Guide to Multi-Omics Data Collection and Integration for Translational Medicine. Comput. Struct. Biotec 2023, 21, 134–149.
6. Tini, G.; Marchetti, L.; Priami, C.; Scott-Boyer, M.-P. Multi-Omics Integration—A Comparison of Unsupervised Clustering Methodologies. Brief. Bioinform. 2019, 20, 1269–1279.
7. Chauvel, C.; Novoloaca, A.; Veyre, P.; Reynier, F.; Becker, J. Evaluation of Integrative Clustering Methods for the Analysis of Multi-Omics Data. Brief. Bioinform. 2020, 21, 541–552.
8. Duan, X.; Ding, X.; Zhao, Z. Multi-Omics Integration with Weighted Affinity and Self-Diffusion Applied for Cancer Subtypes Identification. J. Transl. Med. 2024, 22, 79.
9. Xu, H.; Gao, L.; Huang, M.; Duan, R. A Network Embedding Based Method for Partial Multi-Omics Integration in Cancer Subtyping. Methods 2021, 192, 67–76.
10. Ye, X. Multi-Omics Clustering for Cancer Subtyping Based on Latent Subspace Learning. Comput. Biol. Med. 2023, 164, 107223.
11. Song, W.; Wang, W.; Dai, D.-Q. Subtype-WESLR: Identifying Cancer Subtype with Weighted Ensemble Sparse Latent Representation of Multi-View Data. Brief. Bioinform. 2022, 23, bbab398.
12. Pierre-Jean, M.; Mauger, F.; Deleuze, J.-F.; Le Floch, E. PIntMF: Penalized Integrative Matrix Factorization Method for Multi-Omics Data. Bioinformatics 2022, 38, 900–907.
13. Chalise, P.; Ni, Y.; Fridley, B.L. Network-Based Integrative Clustering of Multiple Types of Genomic Data Using Non-Negative Matrix Factorization. Comput. Biol. Med. 2020, 118, 103625.
14. Tian, S.; Yang, Y.; Qiu, Y.; Zou, Q. SMCC: A Novel Clustering Method for Single- and Multi-Omics Data Based on Co-Regularized Network Fusion. IEEE/ACM Trans. Comput. Biol. Bioinform. 2024, 14, 1–9.
15. Yang, Y.; Tian, S.; Qiu, Y.; Zhao, P.; Zou, Q. MDICC: Novel Method for Multi-Omics Data Integration and Cancer Subtype Identification. Brief. Bioinform. 2022, 23, bbac132.
16. Pfeifer, B.; Bloice, M.D.; Schimek, M.G. Parea: Multi-View Ensemble Clustering for Cancer Subtype Discovery. J. Biomed. Inform. 2023, 143, 104406.
17. Yang, H.; Chen, R.; Li, D.; Wang, Z. Subtype-GAN: A Deep Learning Approach for Integrative Cancer Subtyping of Multi-Omics Data. Bioinformatics 2021, 37, 2231–2237.
18. Zhao, J.; Zhao, B.; Song, X.; Lyu, C.; Chen, W.; Xiong, Y.; Wei, D.-Q. Subtype-DCC: Decoupled Contrastive Clustering Method for Cancer Subtype Identification Based on Multi-Omics Data. Brief. Bioinform. 2023, 24, bbad025.
19. Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of Machine Learning in Drug Discovery and Development. Nat. Rev. Drug Discov. 2019, 18, 463–477.
20. Olsen, T.G.; Jackson, B.H.; Feeser, T.A.; Kent, M.N.; Moad, J.C.; Krishnamurthy, S.; Lunsford, D.D.; Soans, R.E. Diagnostic Performance of Deep Learning Algorithms Applied to Three Common Diagnoses in Dermatopathology. J. Pathol. Inform. 2018, 9, 32.
21. Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist–Level Classification of Skin Cancer with Deep Neural Networks. Nature 2017, 542, 115–118.
22. Lee, J.-S. Exploring Cancer Genomic Data from the Cancer Genome Atlas Project. BMB Rep. 2016, 49, 607–611.
23. Berger, A.C.; Korkut, A.; Kanchi, R.S.; Hegde, A.M.; Lenoir, W.; Liu, W.; Liu, Y.; Fan, H.; Shen, H.; Ravikumar, V.; et al. A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers. Cancer Cell 2018, 33, 690–705.e9.
24. Robertson, A.G.; Kim, J.; Al-Ahmadie, H.; Bellmunt, J.; Guo, G.; Cherniack, A.D.; Hinoue, T.; Laird, P.W.; Hoadley, K.A.; Akbani, R.; et al. Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer. Cell 2017, 171, 540–556.e25.
25. Creighton, C.J.; Morgan, M.; Gunaratne, P.H.; Wheeler, D.A.; Gibbs, R.A.; Gordon Robertson, A.; Chu, A.; Beroukhim, R.; Cibulskis, K.; Signoretti, S.; et al. Comprehensive Molecular Characterization of Clear Cell Renal Cell Carcinoma. Nature 2013, 499, 43–49.
26. Collisson, E.A.; Campbell, J.D.; Brooks, A.N.; Berger, A.H.; Lee, W.; Chmielecki, J.; Beer, D.G.; Cope, L.; Creighton, C.J.; Danilova, L.; et al. Comprehensive Molecular Profiling of Lung Adenocarcinoma. Nature 2014, 511, 543–550.
27. Raphael, B.J.; Hruban, R.H.; Aguirre, A.J.; Moffitt, R.A.; Yeh, J.J.; Stewart, C.; Robertson, A.G.; Cherniack, A.D.; Gupta, M.; Getz, G.; et al. Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma. Cancer Cell 2017, 32, 185–203.e13.
28. Akbani, R.; Akdemir, K.C.; Aksoy, B.A.; Albert, M.; Ally, A.; Amin, S.B.; Arachchi, H.; Arora, A.; Auman, J.T.; Ayala, B.; et al. Genomic Classification of Cutaneous Melanoma. Cell 2015, 161, 1681–1696.
29. Lei, Z.; Tan, I.B.; Das, K.; Deng, N.; Zouridis, H.; Pattison, S.; Chua, C.; Feng, Z.; Guan, Y.K.; Ooi, C.H.; et al. Identification of Molecular Subtypes of Gastric Cancer With Different Responses to PI3-Kinase Inhibitors and 5-Fluorouracil. Gastroenterology 2013, 145, 554–565.
30. Levine, D.A. Integrated Genomic Characterization of Endometrial Carcinoma. Nature 2013, 497, 67–73.
31. Robertson, A.G.; Shih, J.; Yau, C.; Gibb, E.A.; Oba, J.; Mungall, K.L.; Hess, J.M.; Uzunangelov, V.; Walter, V.; Danilova, L.; et al. Integrative Analysis Identifies Four Molecular and Clinical Subsets in Uveal Melanoma. Cancer Cell 2017, 32, 204–220.e15.
32. Gao, Q.; Zhu, H.; Dong, L.; Shi, W.; Chen, R.; Song, Z.; Huang, C.; Li, J.; Dong, X.; Zhou, Y.; et al. Integrated Proteogenomic Characterization of HBV-Related Hepatocellular Carcinoma. Cell 2019, 179, 561–577.e22.
33. Petralia, F.; Tignor, N.; Reva, B.; Koptyra, M.; Chowdhury, S.; Rykunov, D.; Krek, A.; Ma, W.; Zhu, Y.; Ji, J.; et al. Integrated Proteogenomic Characterization across Major Histological Types of Pediatric Brain Cancer. Cell 2020, 183, 1962–1985.e31.
34. Chalise, P.; Fridley, B.L. Integrative Clustering of Multi-Level ‘omic Data Based on Non-Negative Matrix Factorization Algorithm. PLoS ONE 2017, 12, e0176278.
35. Wang, B.; Mezlini, A.M.; Demir, F.; Fiume, M.; Tu, Z.; Brudno, M.; Haibe-Kains, B.; Goldenberg, A. Similarity Network Fusion for Aggregating Data Types on a Genomic Scale. Nat. Methods 2014, 11, 333–337.
36. Rousseeuw, P.J. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
37. Calinski, T.; Harabasz, J. A Dendrite Method for Cluster Analysis. Comm. Stats.—Theory Methods 1974, 3, 1–27.
38. Dunn, J.C. Well-Separated Clusters and Optimal Fuzzy Partitions. J. Cybern. 1974, 4, 95–104.
39. Mantel, N. Evaluation of Survival Data and Two New Rank Order Statistics Arising in Its Consideration. Cancer Chemother. Rep. 1966, 50, 163–170.
40. Plackett, R.L. Karl Pearson and the Chi-Squared Test. Int. Stat. Rev. / Rev. Int. De Stat. 1983, 51, 59.
41. Mahoney, M.; Magel, R. Estimation of the Power of the Kruskal-Wallis Test. Biom. J. 1996, 38, 613–630.

Figure 1. Comprehensive evaluation process of multi-omics clustering algorithms.

Figure 2. Internal metrics of clustering algorithm. The internal metrics (S, CH, D) are represented by bars of different colors. The height of each bar corresponds to the average score of each internal metric across the 9 cancer datasets.

Figure 3. Clinical metrics of clustering algorithm. The clinical metrics (LRT and ECP) are represented by bars of different colors. The height of each bar indicates the average score for each clinical metric across the 9 cancer datasets.

Figure 4. Metrics of clustering algorithms on the STAD dataset: (A) Plot showing the metrics of clustering algorithms on the STAD dataset, with the horizontal axis representing internal metrics and the vertical axis representing clinical metrics. The top five algorithms, ranked by internal or clinical metrics, are highlighted with color blocks. (B) Venn diagram illustrating the overlap of the top five algorithms selected based on internal metrics, clinical metrics, and AWA scores for the STAD dataset. (C) Bar chart displaying the internal metrics, clinical metrics, and AWA scores for all 11 algorithms on the STAD dataset. The yellow and blue bars represent the average internal metrics and clinical metrics for each algorithm, respectively, with the corresponding values labeled at the top of the bars. The endpoints of the line indicate the average AWA scores for the algorithms. The horizontal axis corresponds to the 11 algorithms, while the vertical axis represents the scores. (D) Kaplan–Meier survival plot for the clustering result of Subtype-DCC on the STAD dataset. The black dashed lines mark the survival times corresponding to a 50% cumulative survival probability for each of the three subtypes.

Figure 5. Average AWA score and running time (RT) for 11 clustering algorithms on 9 cancer datasets: (A) Bar diagram showing the average AWA scores for 11 clustering algorithms on 9 cancer datasets, sorted by score from highest to lowest. (B) Bar diagram illustrating the average running time for the 11 clustering algorithms on the 9 cancer datasets, sorted by runtime from longest to shortest. (C) Overview of the internal metrics, clinical metrics, AWA score, and runtime for PIntMF across the 9 cancer datasets. The yellow and blue bars represent the average internal metrics and clinical metrics for each algorithm, respectively, with the corresponding values labeled at the top of the bars. The endpoints of the line indicate the average AWA scores (denoted by dot endpoints on the orange line) and runtime (denoted by diamond endpoints on the purple line) for the algorithms. The horizontal axis represents the 11 algorithms, the left vertical axis displays the internal metrics, clinical metrics, and AWA scores, and the right vertical axis shows the run time.

Figure 6. Clustering performance and clinical performance of PIntMF: (A) UMAP visualization of latent variables generated by PIntMF based on the BRCA dataset. (B) Kaplan–Meier survival plot for the clustering result of PIntMF on the BRCA dataset. The black dashed lines represent the survival times corresponding to a 50% cumulative survival probability for each of the five subtypes. (C) UMAP visualization of latent variables generated by PIntMF based on the STAD dataset. (D) Kaplan–Meier survival plot for the clustering result of PIntMF on the STAD dataset. The black dashed lines represent the survival times corresponding to a 50% cumulative survival probability for each of the three subtypes.

Table 1. K value of 11 cancer datasets.

Datasets	K Value	References
BRCA	5	[23]
BLCA	5	[24]
KIRC	4	[25]
LUAD	3	[26]
PAAD	2	[27]
SKCM	4	[28]
STAD	3	[29]
UCEC	4	[30]
UVM	4	[31]
PBT	8	[32]
HCC	3	[33]

Table 2. Classification and information of 11 multi-omics clustering algorithms.

Categories	Algorithms	Year	Implementation	Reference
Similarity-based clustering methods	MOSD	2024	https://github.com/DXCODEE/MOSD (accessed on 4 March 2024)	[8]
Similarity-based clustering methods	MSNE	2021	https://github.com/GaoLabXDU/MSNE (accessed on 5 March 2024)	[9]
Dimensionality reduction-based clustering methods	MCLS	2023	https://github.com/ShangCS/MCLS (accessed on 9 March 2024)	[10]
Dimensionality reduction-based clustering methods	subtype-WESLR	2022	https://github.com/songwenjing123/subtype-WESLR (accessed on 2 April 2024)	[11]
Matrix factorization-based clustering methods	PIntMF	2022	https://github.com/mpierrejean/pintmf (accessed on 5 April 2024)	[12]
Matrix factorization-based clustering methods	nNMF	2020	R v4.3.2	[13]
Fusion-based clustering methods	SMCC	2024	https://github.com/yushanqiu/SMCC (accessed on 5 April 2024)	[14]
Fusion-based clustering methods	MDICC	2022	https://github.com/yushanqiu/MDICC (accessed on 5 April 2024)	[15]
Fusion-based clustering methods	Parea	2023	https://github.com/mdbloice/Pyrea (accessed on 5 April 2024)	[16]
Deep learning-based clustering methods	Subtype-GAN	2021	https://github.com/haiyang1986/Subtype-GAN (accessed on 6 April 2024)	[17]
Deep learning-based clustering methods	Subtype-DCC	2023	https://github.com/zhaojingo/Subtype-DCC (accessed on 6 April 2024)	[18]

Table 3. Metrics of subtyping performance evaluation.

Categories	Metrics	Formulas	Optimum
Internal metrics	S	$S = \frac{b - a}{\max(a, b)}$; a: mean distance from the sample point to other points within the same cluster; b: mean distance from the sample point to the nearest other cluster	Maximum
Internal metrics	CH	$CH = \frac{\mathrm{tr}(B_{k}) \times (n - k)}{\mathrm{tr}(W_{k}) \times (k - 1)}$; B_k: inter-cluster covariance matrix; W_k: intra-cluster covariance matrix; n: sample size; k: number of clusters	Maximum
Internal metrics	D	$D = \frac{\min_{i \neq j} d(C_{i}, C_{j})}{\max_{k} \mathrm{diam}(C_{k})}$; d(C_i, C_j): inter-cluster distance; diam(C_k): cluster diameter	Maximum
Clinical metrics	LRT	p-value calculated using the log-rank test	Minimum
Clinical metrics	ECP	χ² test for discrete parameter enrichment; Kruskal–Wallis test for numerical parameter enrichment	Maximum
Time metrics	RT	Average running time	Minimum
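
The CH formula in Table 3 can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It follows the usual Calinski–Harabasz convention in which tr(B_k) and tr(W_k) are the traces of the between-cluster and within-cluster dispersion (sum-of-squares) matrices, which is an assumption about how the covariance matrices in the table are computed. The function name and toy data are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = [tr(B_k) * (n - k)] / [tr(W_k) * (k - 1)].

    tr(B_k): sum over clusters of size * squared distance between the
             cluster centroid and the overall centroid
    tr(W_k): sum of squared distances of points to their cluster centroid
    Requires at least two clusters and a non-zero within-cluster spread.
    """
    n = len(points)
    dim = len(points[0])

    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)

    def centroid(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    overall = centroid(points)
    tr_b = 0.0
    tr_w = 0.0
    for members in clusters.values():
        c = centroid(members)
        tr_b += len(members) * sq_dist(c, overall)
        tr_w += sum(sq_dist(p, c) for p in members)

    return (tr_b * (n - k)) / (tr_w * (k - 1))


# Toy example (hypothetical data): two tight, well-separated clusters.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
print("CH =", calinski_harabasz(toy_points, toy_labels))
```

In this toy case tr(B_k) = 100 and tr(W_k) = 1, so with n = 4 and k = 2 the index is 100 × 2 / (1 × 1) = 200. Mixing the two groups would raise tr(W_k) and lower the score.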

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
