An End-to-End, Multi-Branch, Feature Fusion-Comparison Deep Clustering Method
Abstract
1. Introduction
- We propose a new end-to-end, multi-branch, feature fusion-comparison deep clustering method. Contrastive learning is used to accomplish a priori representation learning while fusing aggregated information across multiple branches of the feature-extraction network. In the contrastive representation-learning stage, instance samples are compared against clustering centers to extract semantically meaningful feature representations, and representation learning and clustering are combined for joint training and iterative optimization.
- We design a new multi-branch feature-aggregation method. The feature map is divided into multi-channel sub-features, and a three-branch structure learns cross-dimensional spatial-channel information together with weighted receptive-field spatial features. The branches exchange information across dimensions, aggregating the sub-features and establishing both long-term and short-term dependencies.
- We design a clustering-oriented contrastive representation-learning strategy. Unsupervised contrastive representation learning and clustering are optimized jointly, alleviating the error propagation that multi-stage deep clustering pipelines suffer from. Over successive iterations, training extracts clustering-oriented feature representations, improving the model's ability to cluster.
2. Related Work
3. Materials and Methods
3.1. Contrast Deep Clustering
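The contributions describe comparing instance samples against clustering centers during contrastive representation learning. As a rough, non-authoritative illustration, the sketch below implements a SwAV-style swapped-prediction loss over learnable prototypes with Sinkhorn-balanced targets (the paper cites Sinkhorn distances [28]); the names `sinkhorn`, `prototypes`, and `swapped_prediction_loss` are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn(scores, eps=0.05, n_iters=3):
    # Sinkhorn-Knopp normalization [28]: turn sample-to-center scores
    # into a soft, batch-balanced assignment matrix (no gradient).
    Q = torch.exp(scores / eps).t()            # K x B
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True) * K    # balance cluster marginals
        Q /= Q.sum(dim=0, keepdim=True) * B    # balance sample marginals
    return (Q * B).t()                         # B x K

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    # Compare two augmented views of the same instances against shared
    # clustering centers (prototypes) and swap the targets between views.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    c = F.normalize(prototypes, dim=1)         # K x D clustering centers
    s1, s2 = z1 @ c.t(), z2 @ c.t()            # B x K similarities
    q1, q2 = sinkhorn(s1), sinkhorn(s2)        # gradient-free targets
    p1 = F.log_softmax(s1 / temperature, dim=1)
    p2 = F.log_softmax(s2 / temperature, dim=1)
    return -0.5 * ((q2 * p1).sum(1) + (q1 * p2).sum(1)).mean()
```

In such a setup, `prototypes` would be a learnable `nn.Parameter` of shape (num_clusters, feature_dim) trained jointly with the encoder, which is consistent with the joint representation-clustering optimization described in Section 1.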
3.2. Multi-Branch Feature Aggregation
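As a concrete but hedged illustration of the three-branch grouped aggregation, the sketch below follows the efficient multi-scale attention design the paper cites [32]: channels are split into G sub-feature groups, two 1 × 1 branches encode direction-aware channel statistics, a 3 × 3 branch captures weighted receptive-field spatial features, and cross-branch products exchange information across dimensions. The class name and exact wiring are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class MultiBranchAggregation(nn.Module):
    # Sketch of a three-branch, grouped sub-feature aggregation block
    # in the spirit of the EMA attention module [32].
    def __init__(self, channels, groups=8):
        super().__init__()
        assert channels % groups == 0
        self.g = groups
        c = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool over height
        self.conv1x1 = nn.Conv2d(c, c, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.gn = nn.GroupNorm(c, c)

    def forward(self, x):
        b, ch, h, w = x.shape
        c = ch // self.g
        x = x.reshape(b * self.g, c, h, w)             # split sub-features
        # Branches 1-2: direction-aware 1x1 channel encoding.
        xh = self.pool_h(x)                            # (bg, c, h, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)        # (bg, c, w, 1)
        y = self.conv1x1(torch.cat([xh, xw], dim=2))
        xh, xw = torch.split(y, [h, w], dim=2)
        x1 = self.gn(x * xh.sigmoid() * xw.permute(0, 1, 3, 2).sigmoid())
        # Branch 3: 3x3 weighted receptive-field spatial features.
        x2 = self.conv3x3(x)
        # Cross-dimensional exchange: each branch's global descriptor
        # re-weights the other branch's spatial map.
        d1 = x1.mean(dim=(2, 3)).softmax(dim=1).unsqueeze(1)   # (bg, 1, c)
        d2 = x2.mean(dim=(2, 3)).softmax(dim=1).unsqueeze(1)
        f1 = x1.reshape(b * self.g, c, h * w)
        f2 = x2.reshape(b * self.g, c, h * w)
        attn = torch.bmm(d1, f2) + torch.bmm(d2, f1)           # (bg, 1, hw)
        attn = attn.reshape(b * self.g, 1, h, w).sigmoid()
        return (x * attn).reshape(b, ch, h, w)                 # re-fuse groups
```

For example, `MultiBranchAggregation(256, groups=8)` could be inserted after a 256-channel backbone stage; the G = 8 default mirrors the best-performing group setting in the parameter-sensitivity results of Section 4.7.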
3.3. Objective Function
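The exact objective is given in the paper; schematically, the joint training described in Section 1 pairs a swapped cluster-assignment prediction term with an auxiliary term under a trade-off weight. The notation below is illustrative, not the authors':

```latex
\mathcal{L}_{\mathrm{total}}
  = -\frac{1}{2N}\sum_{i=1}^{N}
      \left( q_i^{(2)\top}\log p_i^{(1)} + q_i^{(1)\top}\log p_i^{(2)} \right)
  + \lambda\, \mathcal{L}_{\mathrm{aux}},
```

where $p_i^{(v)}$ is the softmax of view $v$'s similarities to the clustering centers, $q_i^{(v)}$ is the corresponding Sinkhorn-balanced target [28], and $\mathcal{L}_{\mathrm{aux}}$ stands for any auxiliary regularizer (e.g., an entropy term that prevents cluster collapse).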
4. Experiments
4.1. Dataset
- CIFAR-10 [33] is a dataset of 60,000 color object images, of which 50,000 are training images and 10,000 are test images. Each image is a 32 × 32 three-channel RGB image of a real-world object and belongs to one of 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
- CIFAR-100/20 [33] likewise contains 60,000 color images (50,000 training and 10,000 test), each a 32 × 32 three-channel RGB image. CIFAR-100 covers 100 categories, which can be grouped into 20 superclasses of 5 subcategories each. In terms of category division, CIFAR-100/20 is therefore more fine-grained and hierarchically richer than CIFAR-10, which is more conducive to network learning.
- STL10 [34] is a commonly used benchmark dataset in the unsupervised domain. It consists of 113,000 RGB images, all at a resolution of 96 × 96, comprising 105,000 training images and 8000 test images.
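For reference, all three benchmarks are available through torchvision; a minimal loading sketch follows (the transform is a placeholder rather than the paper's augmentation policy, and mapping CIFAR-100's fine labels to the 20 superclasses is a separate step, since torchvision exposes only the 100 fine labels).

```python
import torchvision
import torchvision.transforms as T

transform = T.ToTensor()  # placeholder; contrastive methods use richer augmentations

cifar10 = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
cifar100 = torchvision.datasets.CIFAR100(   # 100 fine labels; 20 superclasses need remapping
    root="./data", train=True, download=True, transform=transform)
stl10 = torchvision.datasets.STL10(         # 'train+unlabeled' yields the 105,000 training images
    root="./data", split="train+unlabeled", download=True, transform=transform)
```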
4.2. Evaluation Metrics
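The experiments report clustering accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). A minimal sketch of the standard computation follows, assuming scikit-learn and SciPy; ACC matches predicted cluster ids to ground-truth classes with the Hungarian algorithm before measuring accuracy.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    # Find the best one-to-one mapping between cluster ids and class
    # labels (Hungarian algorithm), then compute plain accuracy.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)  # negate to maximize matches
    return count[rows, cols].sum() / y_true.size

# NMI and ARI come directly from scikit-learn:
# normalized_mutual_info_score(y_true, y_pred)
# adjusted_rand_score(y_true, y_pred)
```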
4.3. Experimental Settings
4.4. Comparative Experiment
4.5. Empirical Analysis
4.5.1. Visualization of Cluster Semantics
4.5.2. Ablation Study
4.6. Comparative Study
4.7. Parameter Sensitivity
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, F.; Qiao, H.; Zhang, B.; Xi, X. Discriminatively Boosted Image Clustering with Fully Convolutional Auto-Encoders. Pattern Recognit. 2018, 83, 161–173. [Google Scholar] [CrossRef]
- von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
- Caron, M.; Bojanowski, P.; Joulin, A.; Douze, M. Deep Clustering for Unsupervised Learning of Visual Features; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11218, pp. 139–156. [Google Scholar]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In International Conference on Machine Learning; PMLR: Birmingham, UK, 2020; Volume 119, pp. 1597–1607. [Google Scholar]
- Xu, J.; Tang, H.; Ren, Y.; Peng, L.; Zhu, X.; He, L. Multi-level Feature Learning for Contrastive Multi-view Clustering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; Volume 1, pp. 16030–16039. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
- Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Gansbeke, W.V.; Vandenhende, S.; Georgoulis, S.; Proesmans, M.; Gool, L.V. SCAN: Learning to Classify Images without Labels. In European Conference on Computer Vision 2020; Springer International Publishing: Cham, Switzerland, 2020; pp. 268–285. [Google Scholar]
- Chen, C.; Lu, H.; Wei, H.; Geng, X. Deep Subspace Image Clustering Network with Self-Expression and Self-Supervision; Springer: Berlin/Heidelberg, Germany, 2022; Volume 53, pp. 4859–4873. [Google Scholar]
- Yang, X.; Deng, C.; Zheng, F.; Yan, J.; Liu, W. Deep Spectral Clustering Using Dual Autoencoder Network. arXiv 2019, arXiv:1904.13113. [Google Scholar]
- Niu, C.; Shan, H.; Wang, G. SPICE: Semantic Pseudo-Labeling for Image Clustering. IEEE Trans. Image Process. 2022, 31, 7264–7278. [Google Scholar] [CrossRef]
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020, Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735. [Google Scholar]
- Chen, X.; Fan, H.; Girshick, R.; He, K. Improved Baselines with Momentum Contrastive Learning. arXiv 2020, arXiv:2003.04297. [Google Scholar]
- Xie, J.; Girshick, R.; Farhadi, A. Unsupervised Deep Embedding for Clustering Analysis. arXiv 2016, arXiv:1511.06335. [Google Scholar]
- Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised Feature Learning via Non-Parametric Instance Discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018, Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
- van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924. [Google Scholar]
- Grill, J.-B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Avila Pires, B.; Guo, Z.D.; Gheshlaghi Azar, M.; et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21271–21284. [Google Scholar]
- Chen, X.; He, K. Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021, Nashville, TN, USA, 20–25 June 2021; pp. 15745–15753. [Google Scholar]
- Chen, X.; Xie, S.; He, K. An Empirical Study of Training Self-Supervised Vision Transformers. arXiv 2021, arXiv:2104.02057. [Google Scholar]
- Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers. arXiv 2021, arXiv:2104.14294. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Guo, X.; Gao, L.; Liu, X.; Yin, J. Improved Deep Embedded Clustering with Local Structure Preservation. In Proceedings of the IJCAI 2017, Melbourne, Australia, 19–25 August 2017; pp. 1753–1759. [Google Scholar]
- Mukherjee, S.; Asnani, H.; Lin, E.; Kannan, S. Clustergan: Latent Space Clustering in Generative Adversarial Networks. arXiv 2019, arXiv:1809.03627. [Google Scholar] [CrossRef]
- Hu, J.; Zhang, Y.; Zhao, D.; Yang, G.; Chen, F.; Zhou, C.; Chen, W. A Robust Deep Learning Approach for the Quantitative Characterization and Clustering of Peach Tree Crowns Based on UAV Images. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
- Li, Y.; Hu, P.; Liu, Z.; Peng, D.; Zhou, J.T.; Peng, X. Contrastive Clustering. In Proceedings of the AAAI Conference on Artificial Intelligence 2021, Virtually, 2–9 February 2021; Volume 35, pp. 8547–8555. [Google Scholar]
- Zhong, H.; Wu, J.; Chen, C.; Huang, J.; Deng, M.; Nie, L.; Lin, Z.; Hua, X.S. Graph Contrastive Clustering. arXiv 2021, arXiv:2104.01429. [Google Scholar]
- Cuturi, M. Sinkhorn Distances: Lightspeed Computation of Optimal Transportation Distances. arXiv 2013, arXiv:1306.0895. [Google Scholar]
- Peyré, G.; Cuturi, M. Computational Optimal Transport. arXiv 2019, arXiv:1803.00567. [Google Scholar]
- Hu, Q.; Zhang, L.; Zhang, D.; Pan, W.; An, S.; Pedrycz, W. Measuring relevance between discrete and continuous features based on neighborhood mutual information. Expert Syst. Appl. 2011, 38, 10737–10750. [Google Scholar] [CrossRef]
- Ntelemis, F.; Jin, Y.; Thomas, S.A. Information maximization clustering via multi-view self-labelling. Knowl.-Based Syst. 2022, 250, 109042. [Google Scholar] [CrossRef]
- Ouyang, D.; He, S.; Zhan, J.; Guo, H.; Huang, Z.; Luo, M.; Zhang, G. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. arXiv 2023, arXiv:2305.13563. [Google Scholar]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 2 September 2024).
- Coates, A.; Ng, A.Y.; Lee, H. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics 2011, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 215–223. [Google Scholar]
- Znalezniak, M.; Rola, P.; Kaszuba, P.; Tabor, J.; Smieja, M. Contrastive Hierarchical Clustering. arXiv 2023, arXiv:2303.03389. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
Dataset Name | Total Samples | Clusters | Type | Size |
---|---|---|---|---|
CIFAR-10 | 60,000 | 10 | Color object image | 32 × 32 |
CIFAR-100/20 | 60,000 | 20/100 | Color object image | 32 × 32 |
STL10 | 113,000 | 10 | Color object image | 96 × 96 |
Model | CIFAR-10 ACC | CIFAR-10 NMI | CIFAR-10 ARI | CIFAR-100/20 ACC | CIFAR-100/20 NMI | CIFAR-100/20 ARI | STL10 ACC | STL10 NMI | STL10 ARI |
---|---|---|---|---|---|---|---|---|---|
K-means | 22.2 | 7.5 | 4.6 | 14.2 | 8.2 | 2.6 | 22.5 | 12.7 | 6.1 |
CC [26] | 78.9 | 70.4 | 63.7 | 42.8 | 43.0 | 26.5 | 85.0 | 76.3 | 72.5 |
SCAN [8] | 87.2 | 78.2 | 75.3 | 46.7 | 45.9 | 29.0 | 75.0 | 66.0 | 58.7 |
CoHiClust [35] | 83.0 | 75.3 | 70.1 | 45.0 | 41.0 | 28.0 | 69.0 | 60.7 | 52.5 |
IMC-SwAV [31] | 89.3 | 81.4 | 79.2 | 49.3 | 51.2 | 34.5 | 81.4 | 71.9 | 67.4 |
SwEAC (AVG) | 89.6 ± 0.4 | 81.8 ± 0.5 | 79.8 ± 0.6 | 51.0 ± 0.5 | 52.1 ± 0.4 | 35.7 ± 0.7 | 83.3 ± 0.3 | 73.1 ± 0.3 | 68.5 ± 0.5 |
SwEAC (best) | 90.1 | 82.3 | 80.7 | 51.5 | 52.8 | 36.5 | 83.6 | 73.4 | 69.0 |
Dataset | Method | ACC | NMI | ARI |
---|---|---|---|---|
CIFAR-10 | SwEAC (ResNet) | 89.8 | 82.2 | 80.6 |
CIFAR-10 | SwEAC (EAR) | 90.1 | 82.3 | 80.7 |
CIFAR-100/20 | SwEAC (ResNet) | 49.0 | 50.4 | 33.9 |
CIFAR-100/20 | SwEAC (EAR) | 51.5 | 52.8 | 36.5 |
STL10 | SwEAC (ResNet) | 83.5 | 73.0 | 68.9 |
STL10 | SwEAC (EAR) | 83.6 | 73.4 | 69.0 |
Methods | ACC | NMI | ARI |
---|---|---|---|
SwEAC-kmeans | 65.5 | 68.8 | 42.7 |
SwEAC-sc | 73.7 | 75.4 | 61.5 |
SwEAC | 89.9 | 82.3 | 80.6 |
Groups | ACC | NMI | ARI |
---|---|---|---|
G = 4 | 50.0 | 51.0 | 34.6 |
G = 8 | 51.5 | 52.5 | 36.5 |
G = 16 | 49.6 | 50.9 | 34.3 |
G = 32 | 47.8 | 50.9 | 33.6 |