MoE Based Consistency and Complementarity Mining for Multi-View Clustering
Abstract
1. Introduction
- We propose an MoE-based multi-view embedding learning method that effectively leverages inter-view consistency and complementarity, enabling adaptive modeling of the complex relationships in multi-view data.
- Through rigorous evaluation on multiple benchmark datasets, our approach achieves substantial performance improvements over existing state-of-the-art methods.
2. Related Work
2.1. Multi-View Clustering
2.2. Mixture of Experts
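The MoE architectures surveyed in this section share one core pattern: a gating network produces a distribution over experts, and the layer output is the gate-weighted combination of the expert outputs. A minimal dense-gating sketch in NumPy (all names and shapes are illustrative, not taken from any surveyed system; real MoE layers typically use MLP experts and sparse top-k gating):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class DenseMoELayer:
    """Dense mixture-of-experts layer: a gating network yields a softmax
    distribution over experts; the output is the gate-weighted sum of the
    expert outputs (linear experts here for brevity)."""

    def __init__(self, d_in, d_out, n_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_experts, d_in, d_out)) * 0.1  # experts
        self.Wg = rng.standard_normal((d_in, n_experts)) * 0.1        # gate

    def forward(self, x):                        # x: (batch, d_in)
        gates = softmax(x @ self.Wg)             # (batch, n_experts)
        expert_out = np.einsum('bi,eio->beo', x, self.W)  # per-expert outputs
        return np.einsum('be,beo->bo', gates, expert_out), gates
```

Sparsely gated variants (as in Shazeer et al.) keep only the top-k gate entries and renormalize, so most experts are skipped per input.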
3. MEL-MoE Method
3.1. Reconstruction Loss
3.2. Consistency Loss
3.3. Kullback-Leibler Divergence Loss
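The paper's exact KL formulation is not reproduced in this outline. A common choice in deep clustering, sketched here under that assumption, follows DEC: soft assignments q of each embedding to the K-means centroids via a Student's t kernel, a sharpened target distribution p, and minimization of KL(P‖Q). All function names below are illustrative:

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    # Student's t-kernel soft assignment q_ij of embedding i to centroid j.
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    # Sharpened target p_ij that emphasizes high-confidence assignments
    # and normalizes per cluster to prevent large clusters from dominating.
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(q, p, eps=1e-12):
    # KL(P || Q), averaged over samples.
    return float((p * np.log((p + eps) / (q + eps))).sum() / q.shape[0])
```

Minimizing this loss pulls embeddings toward their most likely centroids, which is consistent with the K-means initialization step of Algorithm 1.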
3.4. Optimization
| Algorithm 1: Our proposed MEL-MoE |
| Input: multi-view data with c predefined clusters. Output: cluster assignments for each sample. |
| 1. Initialize the model parameters; |
| 2. Optimize the parameters with respect to the objective function defined in Equation (3); |
| 3. Calculate the clustering centroids using the K-means algorithm; |
| 4. Optimize the parameters with respect to the objective function defined in Equation (2), excluding one of its loss terms; |
| 5. Return clustering labels for all samples by running K-means on the learned representations. |
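The steps of Algorithm 1 amount to a two-stage schedule. In the sketch below, `embed`, `pretrain_step`, and `finetune_step` are hypothetical callables standing in for the MEL-MoE network and the optimizers of Equations (3) and (2); only the K-means routine (steps 3 and 5) is concrete:

```python
import numpy as np

def kmeans(z, c, n_iter=50, seed=0):
    # Minimal K-means used in steps 3 and 5 of Algorithm 1.
    rng = np.random.default_rng(seed)
    centroids = z[rng.choice(len(z), c, replace=False)]
    for _ in range(n_iter):
        labels = ((z[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for k in range(c):
            if (labels == k).any():
                centroids[k] = z[labels == k].mean(0)
    return labels, centroids

def mel_moe_pipeline(views, c, embed, pretrain_step, finetune_step,
                     n_pretrain=100, n_finetune=100):
    """Two-stage schedule mirroring Algorithm 1 (callables are stand-ins,
    not the authors' implementation)."""
    for _ in range(n_pretrain):               # step 2: optimize Eq. (3)
        pretrain_step(views)
    z = embed(views)
    _, centroids = kmeans(z, c)               # step 3: initialize centroids
    for _ in range(n_finetune):               # step 4: optimize Eq. (2)
        finetune_step(views, centroids)
    labels, _ = kmeans(embed(views), c)       # step 5: final assignment
    return labels
```

With identity/no-op callables, the pipeline degenerates to plain K-means on the raw features, which makes the schedule itself easy to unit-test.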
4. Experiments
4.1. Experimental Datasets
4.2. Experimental Settings
4.3. Performance of MEL-MoE
4.4. Ablation Experiment
- MEL-MoE-Layer-1: We reduce the MoE network depth to a single layer.
- MEL-MoE-Layer-4: We extend the MoE network depth to four layers.
- MEL-MoE-Groups-1: We reduce the number of groups in each MoE layer to a single group.
- MEL-MoE-Groups-4: We extend the number of groups in each MoE layer to four groups.
- MEL-MoE-Expert-1: We reduce the number of experts in each group to a single expert, which degenerates into a traditional fully connected neural network.
- MEL-MoE-Expert-4: We extend the number of experts in each group to four experts.
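The three ablation axes above (layers, groups, experts per group) can be made concrete with a hypothetical grouped-MoE encoder. The sketch follows the naming of the variants but does not reproduce the actual MEL-MoE architecture; with n_experts=1 the gate is trivial and each group reduces to a single fully connected transform, as in MEL-MoE-Expert-1:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grouped_moe_layer(x, params):
    # One layer: each group is an independent gated mixture of experts;
    # the group outputs are concatenated along the feature axis.
    outs = []
    for Wg, W in params:                     # per-group gate and experts
        gates = softmax(x @ Wg)              # (batch, n_experts)
        eo = np.einsum('bi,eio->beo', x, W)  # (batch, n_experts, d_group)
        outs.append(np.tanh(np.einsum('be,beo->bo', gates, eo)))
    return np.concatenate(outs, axis=1)

def init_encoder(d_in, d_group, n_layers, n_groups, n_experts, seed=0):
    # Stack n_layers grouped-MoE layers; each group maps to d_group dims.
    rng = np.random.default_rng(seed)
    layers, d = [], d_in
    for _ in range(n_layers):
        layer = [(rng.standard_normal((d, n_experts)) * 0.1,
                  rng.standard_normal((n_experts, d, d_group)) * 0.1)
                 for _ in range(n_groups)]
        layers.append(layer)
        d = d_group * n_groups
    return layers

def encode(x, layers):
    for layer in layers:
        x = grouped_moe_layer(x, layer)
    return x
```

Varying `n_layers`, `n_groups`, and `n_experts` here reproduces the capacity knobs that the six ablated variants probe.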
4.5. Hyperparameter Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest


Table: Clustering accuracy (%) on the six benchmark datasets (standard deviation in parentheses).
| Method | NH | CCV | Cora | BBC | USPS | Source |
|---|---|---|---|---|---|---|
| SingleB | 81.49 (7.80) | 21.27 (0.67) | 32.75 (1.47) | 81.96 (5.57) | 61.81 (4.81) | 52.93 (3.59) |
| CCA | 80.41 (8.06) | 26.86 (2.23) | 26.95 (1.58) | 35.55 (8.49) | 74.53 (4.96) | 62.37 (6.76) |
| Co-Pair | 78.17 (0.35) | 23.49 (0.50) | 44.57 (3.57) | 85.85 (9.00) | 75.85 (5.89) | 58.37 (3.28) |
| Co-Cent | 81.05 (0.56) | 25.39 (0.93) | 43.26 (2.98) | 87.29 (7.99) | 77.73 (5.89) | 58.93 (3.07) |
| MultiDMF | 87.10 (0.90) | 23.87 (0.07) | 50.19 (0.02) | 88.25 (0.05) | 83.30 (0.15) | 71.42 (0.48) |
| MultiTE | 82.25 (5.70) | 26.88 (0.61) | 59.95 (0.86) | 93.74 (0.99) | 85.96 (1.45) | 82.91 (2.64) |
| MVCF | 88.90 (0.35) | 24.40 (0.35) | 38.60 (0.23) | 90.25 (0.15) | 63.00 (1.37) | 75.70 (4.31) |
| SGF | 85.21 (0.30) | 21.99 (0.61) | 55.17 (0.00) | 94.14 (0.00) | 94.25 (0.00) | 83.99 (0.13) |
| DGF | 87.38 (0.23) | 22.00 (0.00) | 55.17 (0.03) | 94.53 (0.00) | 67.55 (5.64) | 88.11 (0.26) |
| SUMVC | 90.35 (0.05) | 27.10 (0.35) | 58.17 (0.05) | 95.13 (0.08) | 93.55 (0.64) | 89.11 (0.16) |
| APADC | 89.32 (0.15) | 26.95 (0.12) | 57.85 (0.13) | 95.12 (0.18) | 93.25 (0.56) | 88.85 (0.14) |
| VDMN | 92.78 (0.36) | 28.17 (0.69) | 61.85 (0.80) | 96.34 (0.26) | 95.10 (0.12) | 90.12 (2.56) |
| MEL-MoE | 93.15 (0.24) | 29.68 (0.43) | 63.46 (0.65) | 95.85 (0.12) | 96.18 (0.24) | 91.48 (1.24) |
Table: NMI (%) on the six benchmark datasets (standard deviation in parentheses).
| Method | NH | CCV | Cora | BBC | USPS | Source |
|---|---|---|---|---|---|---|
| SingleB | 70.48 (2.27) | 19.03 (0.40) | 18.20 (1.00) | 62.32 (3.16) | 59.12 (1.95) | 53.38 (2.12) |
| CCA | 77.05 (6.62) | 22.71 (1.38) | 1.30 (0.46) | 17.14 (8.54) | 75.54 (3.07) | 60.39 (6.94) |
| Co-Pair | 66.27 (0.27) | 19.71 (0.38) | 27.94 (2.00) | 73.37 (4.30) | 71.16 (1.45) | 62.25 (2.76) |
| Co-Cent | 76.71 (0.48) | 22.09 (0.62) | 24.47 (1.48) | 73.39 (3.78) | 73.40 (2.42) | 62.25 (2.51) |
| MultiDMF | 79.70 (0.50) | 22.77 (0.08) | 30.67 (0.02) | 78.44 (0.07) | 77.19 (0.08) | 54.15 (0.21) |
| MultiTE | 74.00 (2.87) | 22.24 (0.26) | 39.75 (0.25) | 81.58 (0.60) | 82.32 (0.53) | 79.36 (1.95) |
| MVCF | 76.40 (0.21) | 22.50 (0.46) | 18.10 (0.24) | 80.12 (0.05) | 63.30 (0.78) | 65.10 (3.38) |
| SGF | 86.28 (0.10) | 23.34 (0.28) | 42.48 (0.13) | 82.48 (0.00) | 88.94 (0.00) | 77.23 (0.00) |
| DGF | 86.24 (0.20) | 24.00 (0.10) | 45.34 (0.00) | 82.71 (0.00) | 78.53 (3.00) | 76.32 (0.18) |
| SUMVC | 88.28 (0.24) | 25.10 (0.20) | 44.86 (0.05) | 82.15 (0.15) | 86.54 (1.15) | 78.52 (0.08) |
| APADC | 88.15 (0.15) | 25.06 (0.14) | 44.95 (0.16) | 82.05 (0.25) | 86.25 (8.86) | 78.84 (0.13) |
| VDMN | 90.23 (1.47) | 26.15 (0.46) | 46.08 (0.38) | 84.85 (0.63) | 86.68 (0.23) | 81.23 (1.64) |
| MEL-MoE | 91.69 (0.34) | 26.24 (0.37) | 47.95 (0.46) | 86.16 (0.42) | 87.13 (0.18) | 82.36 (0.96) |
Table: Ablation results for the MEL-MoE variants (Accuracy and NMI, %; standard deviation in parentheses).
| Method | Accuracy | NMI |
|---|---|---|
| MEL-MoE-Layer-1 | 90.85 (0.23) | 88.37 (0.32) |
| MEL-MoE-Layer-4 | 93.05 (0.19) | 90.89 (0.42) |
| MEL-MoE | 93.15 (0.24) | 91.69 (0.34) |
| MEL-MoE-Groups-1 | 92.86 (0.21) | 90.33 (0.35) |
| MEL-MoE-Groups-4 | 93.12 (0.25) | 91.48 (0.16) |
| MEL-MoE | 93.15 (0.24) | 91.69 (0.34) |
| MEL-MoE-Expert-1 | 87.35 (0.68) | 86.57 (0.92) |
| MEL-MoE-Expert-4 | 92.95 (0.34) | 91.87 (0.12) |
| MEL-MoE | 93.15 (0.24) | 91.69 (0.34) |
Table: Hyperparameter sensitivity (Accuracy and NMI, %); columns are the tested hyperparameter values.
| Metric | 0.5 | 1 | 1.5 | 2 | 5 |
|---|---|---|---|---|---|
| Accuracy | 83.16 (0.36) | 92.15 (0.12) | 92.96 (0.16) | 93.15 (0.24) | 91.68 (0.15) |
| NMI | 81.68 (0.24) | 90.85 (0.31) | 91.35 (0.25) | 91.69 (0.34) | 90.12 (0.20) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, X.; Cao, Y.; Zhang, Y.; Ren, H.; Yin, Q. MoE Based Consistency and Complementarity Mining for Multi-View Clustering. Algorithms 2026, 19, 132. https://doi.org/10.3390/a19020132
