Article

A Globally Collaborative Multi-View k-Means Clustering

by Kristina P. Sinaga 1,2 and Miin-Shen Yang 2,*
1 Institute of Information Science and Technologies of the National Research Council of Italy (ISTI-CNR), Via G. Moruzzi, 1, 56124 Pisa, Italy
2 Department of Applied Mathematics, Chung Yuan Christian University, Taoyuan 32023, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2025, 14(11), 2129; https://doi.org/10.3390/electronics14112129
Submission received: 3 April 2025 / Revised: 10 May 2025 / Accepted: 20 May 2025 / Published: 23 May 2025

Abstract: Multi-view (MV) data are increasingly collected from various fields, like IoT. The surge in MV data demands clustering algorithms capable of handling heterogeneous features and high dimensionality. Existing feature-weighted MV k-means (MVKM) algorithms often neglect effective dimensionality reduction, which limits their scalability and interpretability. To address this, we propose a novel procedure for clustering MV data, namely a globally collaborative MVKM (G-CoMVKM) clustering algorithm. The proposed G-CoMVKM integrates a collaborative transfer learning framework with entropy-regularized feature-view reduction, enabling dynamic elimination of uninformative components. This method achieves clustering by balancing local view importance and global consensus, without relying on matrix reconstruction. We design a feature-view reduction by embedding transferred learning processes across view components, using penalty terms and entropy to simultaneously remove unimportant feature-view components. Experiments on synthetic and real-world datasets demonstrate that G-CoMVKM consistently outperforms existing MVKM clustering algorithms in clustering accuracy, performance, and dimensionality reduction, affirming its robustness and efficiency.

1. Introduction

The proliferation of multi-view (MV) data is evident across diverse domains. Social media platforms such as Facebook, YouTube, Instagram, and Twitter have grown significantly over the past years. The amount of MV data is also growing rapidly on Google Images, whose database stores over 136 billion images. News agencies like Reuters, BBC, and CNN leverage these platforms to share headline news through multiple views: textual articles, photographs, videos, and social media engagement metrics. In healthcare, patient data often combine clinical records, imaging data (MRI, CT scans), and genetic information, forming comprehensive multi-view datasets. Data collected from multiple sources in this way form MV data, which will continue to grow rapidly in the future. These massive phenomena, which exploit the availability of data, can enable businesses to accelerate their services and benefits. With the surge of data generated from diverse sources and its ease of accessibility, clustering methods for MV data have become an increasingly attractive field of research.
Clustering is one of the most useful methods in pattern recognition and machine learning. A clustering method uses mathematical functions to label input data with high similarity into the same cluster while placing dissimilar data into different clusters. In clustering, there are generally two approaches: model-based [1] and non-parametric [2]. In the model-based approach, expectation-maximization (EM) is the most popular [3,4]. In the non-parametric approach, partitional methods are widely used, including k-means [5,6], mean-shift [7,8], spectral clustering [9,10], fuzzy c-means [11,12], graph clustering [13,14], and possibilistic c-means [15,16]. However, most of these consider only single-view data. In this paper, we develop clustering algorithms specifically for MV data, focusing on k-means clustering. MV clustering algorithms represent a popular branch of unsupervised machine learning. While several MV k-means (MVKM) algorithms exist [17,18,19,20], most treat feature components with equal importance. In real applications, different features may have varying weights, making feature weighting crucial for MVKM algorithms. Several feature-weighted MVKM algorithms have been proposed in the literature [21,22,23], showing that MV clustering algorithms are more robust than conventional single-view clustering (SVC) algorithms.
However, most recent feature-weighted MVKM clustering methods only reveal the behavior of multiple representations of MV data by setting up a collaboration stage between local memberships and cluster centers across views. The real challenge lies in combining feature-view reduction and collaboration between feature weights, local memberships, and cluster centers within a single framework. This integration faces several key challenges:
  • Mathematical complexity in revealing common information through matrix learning.
  • Integrating local steps to enhance global-based collaboration.
  • Maintaining consistency across different views.
  • Managing noise sensitivity in datasets.
  • Achieving unified patterns through fusion stages.
  • Implementing the Shannon entropy function [24] for collaborative transfer learning.
Yang and Sinaga [25] proposed Co-FW-MVFCM, which builds collaborative learning based on the multiplication between diagonal matrix components and weighted squared distances. However, they considered feature weighting independently, without addressing mutual links across data views. To address these limitations, we propose a globally collaborative MV k-means (G-CoMVKM) clustering algorithm that embeds mutual feature-weighting links across views within a single framework. The remainder of the paper is organized as follows. In Section 2, we review related works in the literature. Section 3 presents the proposed G-CoMVKM objective function, updating equations, and algorithm details, including parameter behavior and computational complexity. In Section 4, we give a comparative analysis of G-CoMVKM and existing methods using simulated and real data sets. Finally, conclusions are stated in Section 5.

2. Related Works

In this section, we review some related works of MVKM algorithms. We organize these related works into five categories: classical methods, feature reduction approaches, collaborative learning techniques, deep learning-based approaches, and contrastive learning methods. Let $X=\{x_1,\ldots,x_n\}$ be an MV dataset in a $d$-dimensional Euclidean space $\mathbb{R}^d$ with $x_i=\{x_i^h\}_{h=1}^{s}$, $x_i^h=\{x_{ij}^h\}_{j=1}^{d_h}$, $x_i^h\in\mathbb{R}^{d_h}$, $i=1,\ldots,n$, and $\sum_{h=1}^{s}d_h=d$. Let $\mu_{ik}\in\{0,1\}$ represent the membership degree of the $i$th data point assigned to the $k$th cluster, where $\mu_{ik}=1$ if the data point $x_i$ belongs to the $k$th cluster $a_k$ and $\mu_{ik}=0$ if it does not. Let $U_{c\times n}=[\mu_1,\ldots,\mu_n]_{n\times c}$ be the membership matrix. Furthermore, let $\mu_i^h=[\mu_{ik}^h]$ be the membership degree of the $i$th data point assigned to the $k$th cluster in the $h$th view, with $U^h=\{\mu_1^1,\ldots,\mu_n^1,\mu_1^2,\ldots,\mu_n^2,\ldots,\mu_n^s\}$. Let $A^h=\{a_1^1,\ldots,a_{cd_1}^1,a_1^2,\ldots,a_{cd_2}^2,\ldots,a_{cd_s}^s\}$ with $a_k^h=[a_{kj}^h]$, $k=1,\ldots,c$, being the cluster centers of the $j$th feature component in the $k$th cluster for each $h$th view. Let $W^h=\{w_1^1,\ldots,w_{d_1}^1,w_1^2,\ldots,w_{d_2}^2,\ldots,w_{d_s}^s\}$ with $w_{d_h}^h=[w_j^h]$ being the $j$th feature weight for the $h$th view, and let $V^h=[v_1,\ldots,v_s]_{1\times s}$ be the weights of the $s$ views.

2.1. Classical MVKM Methods

The foundation of MVKM clustering algorithms is rooted in the classical k-means algorithm introduced by MacQueen [26]. The k-means algorithm has been widely adopted for clustering tasks due to its simplicity and efficiency in handling single-view data. Its objective function is expressed as $J_{k\text{-}means}(U,A)=\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\lVert x_i-a_k\rVert^2=\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\sum_{j=1}^{d}(x_{ij}-a_{kj})^2$. Despite its widespread use, the classical k-means algorithm is inherently limited to single-view data and cannot effectively handle datasets with multiple representations (multi-view data). Furthermore, its sensitivity to initialization and the lack of a mechanism to incorporate feature importance or inter-view relationships restrict its applicability in complex multi-view scenarios.
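For concreteness, a minimal NumPy sketch of this single-view objective is given below; the variable names are illustrative only and do not come from the original paper.

import numpy as np

def kmeans_objective(X, labels, centers):
    # J_kmeans = sum_k sum_{i in cluster k} ||x_i - a_k||^2 for single-view data.
    # X: (n, d) data matrix, labels: (n,) hard assignments in {0, ..., c-1},
    # centers: (c, d) cluster centers.
    diffs = X - centers[labels]          # x_i - a_{k(i)} for every point
    return float(np.sum(diffs ** 2))     # squared Euclidean distances, summed

# Toy usage: two well-separated clusters in 2-D
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 6.0])
labels = np.array([0] * 50 + [1] * 50)
centers = np.vstack([X[labels == k].mean(axis=0) for k in (0, 1)])
print(kmeans_objective(X, labels, centers))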
To address these limitations, several extensions of the classical k-means algorithm have been proposed, particularly in the context of multi-view clustering. Two notable advancements are the Two-Level Weighted K-means (TW-K-means) by Chen et al. [27] and the Weighted Multi-view Clustering with Feature Selection (WMCFS) by Xu et al. [21]. TW-K-means [27] extended the single-view feature-weighted k-means (WKM) [28] to multi-view data by introducing a two-level weighting mechanism. This approach assigns weights to both features and views, enabling the algorithm to identify and prioritize the most informative components across multiple representations. The objective function of TW-K-means [27] is formulated as:
$J_{TW\text{-}K\text{-}means}(V,U,W,A)=\sum_{h=1}^{s}v_h\sum_{j=1}^{d_h}w_j^{h}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}(x_{ij}^{h}-a_{kj}^{h})^{2}+\eta\sum_{h=1}^{s}v_h\log v_h+\beta\sum_{j=1}^{d_h}w_j^{h}\log w_j^{h}$
Here, the parameters η and β control the entropy-based regularization terms, which prevent overfitting by penalizing extreme weight distributions. TW-K-means effectively identifies significant feature-view components, but it is sensitive to user-defined parameters and initialization. Additionally, it lacks mechanisms for inter-view information sharing, which limits its ability to achieve consensus clustering results.
WMCFS [21] introduces a feature selection mechanism by incorporating L2 penalty terms into the objective function. This approach shrinks feature-view weights toward zero, effectively eliminating irrelevant or redundant features during the clustering process. The objective function of WMCFS is given by:
$J_{WMCFS}(V,U,W,A)=\sum_{h=1}^{s}v_h^{\alpha}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\left\lVert\operatorname{diag}(w^{h})\,(x_i^{h}-a_k^{h})\right\rVert^{2}+\beta\sum_{h=1}^{s}\lVert w^{h}\rVert^{2}$
where α and β are regularization parameters that control the distribution of view and feature weights. By penalizing large weights, WMCFS ensures that only the most relevant features contribute to the clustering process. This method is particularly effective for sparse datasets, such as text or image data. However, its performance heavily depends on the choice of the regularization parameters, and it may struggle with datasets that exhibit diverse feature characteristics across views. These MVKM methods laid the groundwork for MV clustering by extending the capabilities of k-means to handle MV data. However, their reliance on user-defined parameters, sensitivity to initialization, and limited ability to incorporate inter-view relationships highlight the need for more advanced approaches, such as collaborative learning and feature reduction techniques, which are discussed in subsequent sections.

2.2. Feature Reduction Approaches

Feature reduction approaches in MVKM clustering aim to address the challenges posed by high-dimensional data in identifying and eliminating irrelevant or redundant features within each view. These methods not only improve clustering performance but also reduce computational complexity, making them particularly valuable for large-scale and high-dimensional datasets. Two prominent algorithms in this category are the Simultaneous Weighting on Views and Features (SWVF) by Jiang et al. [22] and the Feature Reduction Multi-View K-means (FRMVK) by Yang et al. [29].
SWVF was proposed by Jiang et al. [22] as an enhancement to the Weighted Multi-view Clustering with Feature Selection (WMCFS) algorithm. While WMCFS [21] employs L2 penalty terms to shrink feature-view weights, SWVF [22] introduces a sparsity representation mechanism using logarithmic weight functions. This approach allows SWVF to better capture the importance of feature-view components while maintaining a compact representation of the data. The objective function of SWVF is formulated as:
$J_{SWVF}(V,U,W,A)=\sum_{h=1}^{s}v_h^{\alpha}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\sum_{j=1}^{d_h}(w_j^{h})^{\beta}(x_{ij}^{h}-a_{kj}^{h})^{2}$
where the parameters $\alpha$ and $\beta$ are regularization exponents that control the distribution of view and feature weights. By leveraging these parameters, SWVF achieves improved clustering performance compared to WMCFS. However, the algorithm is sensitive to the choice of $\alpha$ and $\beta$ and requires extensive parameter tuning to achieve optimal results. Additionally, while SWVF effectively handles sparsity in datasets such as text or image data, its performance may degrade when applied to datasets with diverse feature characteristics.
FRMVK was proposed by Yang et al. [29] as a novel approach to incorporate feature reduction directly into the clustering process. Unlike SWVF, which focuses on sparsity representation, FRMVK employs a feature reduction mechanism that dynamically eliminates irrelevant features during clustering. This is achieved by regulating the importance of each feature through a balancing parameter $\delta_j^{h}$, defined as $\delta_j^{h}=\operatorname{mean}(x_{ij}^{h})\operatorname{var}(x_{ij}^{h})$, where $\operatorname{mean}(x_{ij}^{h})$ and $\operatorname{var}(x_{ij}^{h})$ denote the mean and variance of the $j$th feature in the $h$th view, respectively. The objective function of FRMVK is formulated as:
$J_{FRMVK}(V,U,W,A)=\sum_{h=1}^{s}v_h^{\alpha}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}\sum_{j=1}^{d_h}w_j^{h}\delta_j^{h}(x_{ij}^{h}-a_{kj}^{h})^{2}+\frac{n}{d_h}\sum_{h=1}^{s}\sum_{j=1}^{d_h}w_j^{h}\ln(\delta_j^{h}w_j^{h})$
where the first term represents the clustering objective, and the second term penalizes uninformative features by shrinking their weights toward zero. The exponent parameter α controls the contribution of each view, with values typically ranging from 2 to 10. By dynamically reducing the dimensionality of each view, FRMVK improves clustering performance and reduces computational overhead. However, it does not leverage complementary information across views, limiting its ability to achieve consensus clustering results. This highlights the need for collaborative learning approaches that integrate feature reduction with inter-view information sharing.
In summary, feature reduction approaches such as SWVF and FRMVK play a critical role in addressing the challenges of high-dimensional multi-view data. While SWVF excels in sparsity representation, FRMVK introduces an effective mechanism for feature elimination. Both methods, however, face limitations in their ability to fully exploit inter-view relationships, paving the way for the development of more advanced collaborative learning techniques.

2.3. Collaborative Learning Methods

Collaborative learning methods in MVKM clustering represent a significant advancement by incorporating cross-view information sharing to enhance clustering performance. These methods address the limitations of traditional approaches by leveraging the disagreement between views to achieve a global consensus solution. One prominent example is the two-level weighted collaborative k-means (TW-Co-K-means) algorithm proposed by Zhang et al. [30]. This method introduces a novel collaboration step that facilitates information exchange across views during the clustering process. The objective function of TW-Co-K-means is formulated as:
$J_{TW\text{-}Co\text{-}K\text{-}means}(V,U,W,A)=\sum_{h=1}^{s}v_h\sum_{j=1}^{d_h}w_j^{h}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}d_{ik}^{h}+\frac{\eta}{s-1}\Delta+\alpha\sum_{h=1}^{s}\sum_{j=1}^{d_h}w_j^{h}\ln w_j^{h}+\beta\sum_{h=1}^{s}v_h\ln v_h$
where $d_{ik}^{h}=(x_{ij}^{h}-a_{kj}^{h})^{2}$ represents the squared Euclidean distance between data points and cluster centers, and $\Delta$ quantifies the degree of disagreement between views. The disagreement term $\Delta$ is expressed as $\Delta=\sum_{h=1}^{s}\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\sum_{k=1}^{c}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\left|v_h\sum_{j=1}^{d_h}w_j^{h}d_{ik}^{h}-v_{h'}\sum_{j=1}^{d_{h'}}w_j^{h'}d_{ik}^{h'}\right|$. The inclusion of $\Delta$ ensures that the algorithm accounts for the disagreement between views, promoting collaboration and information sharing. Larger values of $\Delta$ indicate stronger cross-view collaboration, which enhances the clustering performance.
The TW-Co-K-means algorithm [30] incorporates entropy-based regularization terms for both feature weights $w_j^{h}$ and view weights $v_h$, controlled by the parameters $\alpha$ and $\beta$, respectively. These terms encourage sparsity and prevent overfitting by penalizing excessive reliance on specific features or views. Additionally, the parameter $\eta$ regulates the degree of agreement between views, with higher values promoting stronger inter-view correlations. While TW-Co-K-means demonstrates superior performance in real-world applications, it is computationally intensive due to the iterative nature of the collaboration step and the need to optimize the parameters $\alpha$, $\beta$, and $\eta$. Furthermore, the algorithm considers all feature components within each view during clustering, which may degrade performance if irrelevant or redundant features are included. To address this, feature selection or reduction techniques, such as those employed in FRMVK [29], can be integrated to enhance efficiency and accuracy.
In summary, collaborative learning methods like TW-Co-K-means represent a paradigm shift in MVKM clustering by leveraging cross-view information sharing to achieve a global consensus solution. These methods effectively balance the contributions of individual views and their features, enabling robust clustering performance across diverse datasets. However, future research should focus on reducing computational complexity and improving the interpretability of parameter tuning to further enhance their applicability.

2.4. Deep Learning-Based Approaches

Recent advances in deep learning have led to more sophisticated MVKM variants. In this subsection, we focus on the Dynamic Multi-scale and multi-resolution Convolution Network (DMC-Net) proposed by Yang et al. [31,32] for the automatic pancreas and pancreatic mass segmentation in CT images. DMC-Net enhances the traditional U-Net architecture with two key modules [31,32]. We briefly review the Dynamic Multi-Resolution Convolution (DMRC). The DMRC [31] module consists of three main components:
  • Multi-scale Feature Extraction: Utilizes three distinct paths:
    - Path 1: Original-resolution features via a 3 × 3 convolution, $F_1=F_{\mathrm{conv}}(X)$.
    - Path 2: Neighboring context via 4 × 4 average pooling and upsampling, $F_2=F_{\mathrm{up}}(F_{\mathrm{conv}}(F_{\mathrm{AvgPool}}(X)))$.
    - Path 3: Pixel-wise context via a 1 × 1 convolution, $F_3=F_{\mathrm{conv}}^{(1)}(X)$.
  • Feature Fusion: Combines the Path 2 and Path 3 features to enhance the representation capabilities of the network, according to $F_s=F_2\oplus F_3$ and $F=F_1\otimes F_{\mathrm{Sig}}(F_s)$.
  • Global Context Integration: Extracted from the fused features using global average pooling $F_{\mathrm{GAvgPool}}$, a linear layer $F_{\mathrm{conv}}^{(1)}$, and a sigmoid activation $F_{\mathrm{Sig}}$, formulated as $G=F_{\mathrm{Sig}}(F_{\mathrm{conv}}^{(1)}(F_{\mathrm{GAvgPool}}(F)))$ and $F'=F\otimes G$; a minimal sketch of this module is given below.
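For concreteness, a PyTorch-style sketch of one possible reading of the DMRC module follows; the fusion operator between F2 and F3 (here element-wise addition), the sigmoid gating by element-wise multiplication, and the layer sizes are our assumptions rather than the original DMC-Net implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DMRCSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)      # Path 1
        self.conv3x3_ctx = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # Path 2
        self.conv1x1 = nn.Conv2d(channels, channels, kernel_size=1)                 # Path 3
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)                      # global-context linear layer

    def forward(self, x):
        f1 = self.conv3x3(x)                                    # original-resolution features
        pooled = F.avg_pool2d(x, kernel_size=4)                 # neighboring context (4x4 pooling)
        f2 = F.interpolate(self.conv3x3_ctx(pooled), size=x.shape[-2:],
                           mode="bilinear", align_corners=False)
        f3 = self.conv1x1(x)                                    # pixel-wise context
        fs = f2 + f3                                            # fusion of Paths 2 and 3 (assumed addition)
        f = f1 * torch.sigmoid(fs)                              # gate Path 1 by the fused context
        g = torch.sigmoid(self.fc(F.adaptive_avg_pool2d(f, 1))) # global context vector
        return f * g

# Toy usage: a 16-channel 64x64 feature map keeps its shape through the module
x = torch.randn(1, 16, 64, 64)
print(DMRCSketch(16)(x).shape)   # torch.Size([1, 16, 64, 64])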

2.5. Contrastive Learning Methods

Hassani and Khasahmadi [33] introduced a self-supervised approach for learning node- and graph-level representations by contrasting encodings from two structural views of graphs, obtained via graph diffusion and graph pooling, and Fu et al. [34] later considered contrastive multi-view clustering. We briefly describe its key points as follows.
  • Graph diffusion is a technique used to generate structural views of graphs by combining local and global information. Graph diffusion provides a global view of the graph structure, complementing the local view provided by the adjacency matrix. The diffusion matrix is computed as $S=\sum_{k=0}^{\infty}\Theta_k T^{k}\in\mathbb{R}^{n\times n}$, where $T$ is the generalized transition matrix and $\Theta_k$ is the weighting coefficient (a small sketch of this computation is given after the list).
  • Graph pooling is used to aggregate node-level representations into graph-level representations. This pooling method is simple yet effective, outperforming more complex hierarchical graph pooling methods like DiffPool. It ensures that both local and global information from all layers is captured in the graph-level representation. Node representations are aggregated into graph representations using a pooling function, formulated as $\vec{h}_g=\sigma\left(\big\Vert_{l=1}^{L}\left[\sum_{i=1}^{n}\vec{h}_i^{(l)}\right]\right)W\in\mathbb{R}^{d_h}$, where $\vec{h}_i^{(l)}$ is the latent representation of node $i$ in layer $l$, $\Vert$ is the concatenation operator, $L$ is the number of GCN layers, $W\in\mathbb{R}^{(L\times d_h)\times d_h}$ contains the network parameters, and $\sigma$ is a PReLU non-linearity.
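A small NumPy sketch of the truncated diffusion sum is given below; the symmetric normalization used for T and the choice of the coefficients Θ_k (e.g., heat-kernel or PPR weights) are assumptions made for illustration, not prescribed by the original work.

import numpy as np

def diffusion_matrix(A, theta, K=10):
    # Truncated graph diffusion S = sum_{k=0}^{K} theta_k T^k, where T is the
    # symmetrically normalized transition matrix built from the adjacency A.
    # theta is a length-(K+1) array of user-chosen weighting coefficients.
    deg = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
    T = D_inv_sqrt @ A @ D_inv_sqrt
    S = np.zeros_like(A, dtype=float)
    Tk = np.eye(A.shape[0])
    for k in range(K + 1):
        S += theta[k] * Tk      # accumulate theta_k * T^k
        Tk = Tk @ T
    return S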

2.6. Comparative Analysis

Table 1 summarizes the key characteristics of representative methods from each category. As reported in Table 1, each approach presents distinct advantages and limitations. TW-K-means and WMCFS provide robust foundations but lack sophisticated feature reduction. TW-Co-K-means leverages cross-view information but faces computational challenges. Autoencoder-based methods demonstrate non-linear feature transformation capabilities, effective noise reduction through reconstruction, automatic feature hierarchy learning, and challenges in determining optimal network architecture. Contrastive learning approaches offer self-supervised representation learning, better view-invariant features, robust performance without manual labeling, and increased computational requirements during training. Our proposed G-CoMVKM addresses these limitations by combining feature reduction with collaborative learning while maintaining computational efficiency.

3. The Proposed Globally Collaborative Multi-View K-Means Algorithm

This section will cover the proposed algorithm’s architecture and intuition, followed by the mathematical formulation, component-wise explanation of the objective function, optimization procedures, parameters, threshold setting and impact, fusion stage, algorithm implementation, complexity analysis, and comparison with existing methods. This provides readers with a better understanding of the section’s organization and helps them navigate through the technical details of our proposed Globally Collaborative Multi-View K-means (G-CoMVKM) method. Let $X=\{x_1,\ldots,x_n\}$ be an MV dataset in $\mathbb{R}^d$ with $x_i=\{x_i^h\}_{h=1}^{s}$, $x_i^h=\{x_{ij}^h\}_{j=1}^{d_h}$, $x_i^h\in\mathbb{R}^{d_h}$, $i=1,\ldots,n$, and $\sum_{h=1}^{s}d_h=d$. Let $\mu_{ik}\in\{0,1\}$ be the membership of the $i$th data point assigned to the $k$th cluster. Let $U_{c\times n}=[\mu_1,\ldots,\mu_n]_{n\times c}$ be the membership matrix with $\mu_i^h=[\mu_{ik}^h]$ being the membership of the $i$th data point assigned to the $k$th cluster in the $h$th view, with $U^h=\{\mu_1^1,\ldots,\mu_n^1,\mu_1^2,\ldots,\mu_n^2,\ldots,\mu_n^s\}$. Let $A^h=\{a_1^1,\ldots,a_{cd_1}^1,a_1^2,\ldots,a_{cd_2}^2,\ldots,a_{cd_s}^s\}$ with $a_k^h=[a_{kj}^h]$, $k=1,\ldots,c$, being the cluster centers of the $j$th feature component in the $k$th cluster for each $h$th view. Let $W^h=\{w_1^1,\ldots,w_{d_1}^1,w_1^2,\ldots,w_{d_2}^2,\ldots,w_{d_s}^s\}$ with $w_{d_h}^h=[w_j^h]$ being the $j$th feature weight for the $h$th view, and let $V^h=[v_1,\ldots,v_s]_{1\times s}$ be the weights of the $s$ views.

3.1. Overview and Algorithm Architecture

This subsection introduces our G-CoMVKM algorithm, which addresses several limitations of existing MVKM methods. We first present the overall architecture and working principles of G-CoMVKM, and then formulate its objective function with detailed explanations of each component. Subsequently, we derive the updating rules for cluster memberships, centroids (cluster centers), view weights, and feature weights. We also explain the feature pruning mechanism that enables effective dimensionality reduction, followed by a discussion on parameter settings and computational complexity analysis. The G-CoMVKM algorithm uniquely combines three critical components into a unified framework: (1) adaptive feature weighting, (2) cross-view collaboration through transfer learning, and (3) automatic feature pruning via threshold-based selection. Figure 1 illustrates the overall architecture of the G-CoMVKM algorithm, showing the flow from multi-view data input through feature weighting, cross-view collaboration, and feature pruning to the final clustering output.
As shown in Figure 1, the algorithm begins with multi-view input data, where each view may contain different feature representations of the same entities. These features undergo an adaptive weighting process that determines their importance for clustering. Simultaneously, membership information is shared across views through a collaborative learning mechanism that transfers knowledge between views. The algorithm then employs a feature-pruning process that automatically identifies and eliminates redundant or irrelevant features using a threshold-based approach. Unlike feature pruning which occurs automatically, view pruning in G-CoMVKM is an optional user-guided process. The algorithm computes weights for each view, but the decision to exclude views with minimal weights depends on empirical validation. We recommend that users evaluate clustering performance both with and without the lowest-weighted views. If excluding a low-weight view improves clustering accuracy, we suggest discarding that view from the global solution. This empirical approach to view selection ensures optimal performance while maintaining interpretability. The entire integrated process iterates until convergence, producing the final clustering result with optimized feature-view weights. Most existing weighted MVKM algorithms can distinguish between relevant and weakly relevant features, but they cannot effectively eliminate redundant, unimportant, or noisy features [23,24,25,26]. In contrast, G-CoMVKM enables all relevant components across views to enhance collaborative learning while automatically pruning irrelevant features.

3.2. Problem Formulation and Objective Function

In this subsection, we formulate the G-CoMVKM objective function, which integrates adaptive feature weighting, cross-view collaborative learning, and automatic feature pruning mechanisms. Before diving into the mathematical formulation, we first provide an intuitive explanation of our approach. The core idea of G-CoMVKM is to perform clustering by balancing three key objectives: distance minimization, cross-view knowledge transfer, and feature pruning. Distance minimization aims to minimize the weighted distance between data points and their assigned cluster centers across views. Cross-view knowledge transfer facilitates information sharing between different views through a collaborative transfer learning mechanism. Feature pruning is designed to automatically identify and eliminate irrelevant or redundant features using a threshold-based approach.
The mathematical formulation of G-CoMVKM is designed to address the challenges of clustering high-dimensional multi-view data by integrating feature-view reduction and collaborative learning. Unlike traditional MVKM methods, G-CoMVKM balances local view importance and global consensus, resulting in improved scalability, interpretability, and clustering accuracy. Most existing weighted MVKM algorithms can distinguish between relevant and weakly relevant features but are unable to effectively remove redundant, uninformative, or noisy features [25,26,27,29]. To overcome this limitation, we propose G-CoMVKM, which enables collaborative learning across all relevant components from multiple views. The objective function of G-CoMVKM is formulated as follows:
$J_{G\text{-}CoMVKM}=\sum_{h=1}^{s}v_h^{\gamma}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho\right)+\theta\sum_{h=1}^{s}\sum_{j=1}^{d_h}w_j^{h}\log(\tau_j^{h}w_j^{h})$
where $\rho=\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\left|\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}-\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}d_{ik}^{h'}\right|$, $d_{ik}^{h}=(x_{ij}^{h}-a_{kj}^{h})^{2}$, and the constraints are $\sum_{k=1}^{c}\mu_{ik}^{h}=1$, $\mu_{ik}^{h}\in\{0,1\}$ for each view $h$, $\sum_{h=1}^{s}v_h=1$, $v_h\in[0,1]$, and $\sum_{j=1}^{d_h}w_j^{h}=1$, $w_j^{h}\in[0,1]$.
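To make the roles of the two terms concrete, the following NumPy sketch evaluates the objective of Equation (1) under the reconstruction above; the data layout and variable names are illustrative and are not taken from the authors' implementation.

import numpy as np

def gcomvkm_objective(Xs, mus, centers, ws, taus, v, gamma, delta, theta):
    # One reading of Equation (1). Inputs, per view h:
    #   Xs[h]: (n, d_h) data, mus[h]: (n, c) hard memberships, centers[h]: (c, d_h),
    #   ws[h], taus[h]: (d_h,) feature weights / penalty parameters; v: (s,) view weights.
    # zeta[h][i, k] = sum_j w_j^h tau_j^h (x_ij^h - a_kj^h)^2    (weighted distance)
    # rho_h = sum_{h' != h} sum_{i,k} |mu_ik^h - mu_ik^h'| |zeta_ik^h - zeta_ik^h'|
    s = len(Xs)
    zeta = []
    for h in range(s):
        d2 = (Xs[h][:, None, :] - centers[h][None, :, :]) ** 2   # (n, c, d_h)
        zeta.append(d2 @ (ws[h] * taus[h]))                       # (n, c)
    J = 0.0
    for h in range(s):
        rho_h = sum(np.sum(np.abs(mus[h] - mus[hp]) * np.abs(zeta[h] - zeta[hp]))
                    for hp in range(s) if hp != h)
        J += v[h] ** gamma * (np.sum(mus[h] * zeta[h]) + delta * rho_h)
        # Entropy-regularization term; pruned (zero) weights contribute nothing.
        wl = np.where(ws[h] > 0, ws[h] * np.log(taus[h] * np.maximum(ws[h], 1e-300)), 0.0)
        J += theta * np.sum(wl)
    return J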

3.3. Component-Wise Explanation of the Objective Function

As can be seen in Equation (1), the objective function of our proposed G-CoMVKM consists of two main components: a collaborative clustering term (the first term) and an entropy-regularized feature-view reduction term (the second term). The first term promotes clustering within each view while encouraging collaboration across views through the $\rho$ term. This enables the algorithm to leverage complementary information and achieve a global consensus. The second term, weighted by $\theta$, is an entropy regularization that encourages sparsity in feature weights. This term penalizes uninformative or redundant features, allowing the algorithm to dynamically eliminate them based on a threshold criterion. Features with small weights are considered unimportant and are pruned, resulting in effective dimensionality reduction.
To provide a clearer understanding, we explain each component of the objective function:
$\sum_{h=1}^{s}v_h^{\gamma}\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}(x_{ij}^{h}-a_{kj}^{h})^{2}$
The term presented in Equation (2) performs weighted k-means clustering across all views. The weights v h and w j h determine the importance of each view and feature, respectively. The exponent γ and penalty parameter τ j h control the distribution of these weights. A larger γ increases the view weight disparity, allowing dominant views to have more influence. Similarly, a larger τ j h enhances the feature weight discrimination.
$\delta\sum_{h=1}^{s}\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\left|v_h^{\gamma}\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}(x_{ij}^{h}-a_{kj}^{h})^{2}-v_{h'}^{\gamma}\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}(x_{ij}^{h'}-a_{kj}^{h'})^{2}\right|$
As displayed in Equation (3), this term facilitates G-CoMVKM's cross-view knowledge transfer by minimizing the disagreement between weighted distances across different views. The parameter $\delta$ controls the strength of this collaboration: a larger $\delta$ enforces stronger agreement between views, promoting consistency in the clustering results. For more detail, we briefly describe how transfer learning is implemented through three key mechanisms. The first is cross-view disagreement quantification: the expression $\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|$ measures the disagreement in cluster assignments between views $h$ and $h'$ for each data point $i$; when views disagree on cluster assignments, this term captures potentially complementary information. The second is distance-based knowledge transfer: the term $\left|\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}-\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}d_{ik}^{h'}\right|$ compares weighted distances across views, facilitating the transfer of structural information from one view to another; this allows G-CoMVKM to leverage complementary clustering structures that may be more prominent in certain views. The third is adaptive transfer strength: the parameter $\delta$ in Equation (1) controls the strength of knowledge transfer between views, where a larger value of $\delta$ enforces stronger collaboration and a smaller value preserves more view-specific information.
$\theta\sum_{h=1}^{s}\sum_{j=1}^{d_h}w_j^{h}\log(\tau_j^{h}w_j^{h})$
In Equation (4), entropy-based terms prevent the algorithm from assigning extreme weights to specific views or features. They ensure a more balanced weight distribution while still allowing the algorithm to identify and prioritize the most informative components.
$w_j^{h}<t_h,\quad h=1,\ldots,s,\ j=1,\ldots,d_h$
During optimization, features with weights below a predefined threshold $t_h$ are excluded from further computation. This mechanism ensures that only informative features and views contribute to the clustering process, improving both accuracy and computational efficiency. The constraint in Equation (5) implements the feature pruning mechanism by setting feature weights below the threshold $t_h$ to zero, effectively eliminating irrelevant or redundant features from the clustering process and improving both performance and interpretability. The threshold $t_h$ plays a crucial role in the feature pruning mechanism, since it determines which features are considered relevant for clustering. In our implementation, $t_h$ is adaptively set based on the distribution of feature weights within each view. In our scenario, the candidate threshold estimators $t_h$ include $1/n$, $1/D$, $t/(nD)$, $s/(nD)$, $t/(nd_h)$, and $\theta/(nd_h-1)$. In summary, our thresholds are built from the current number of iterations $t$, the total dimensionality $D$ of the $s$ input views, the total number of data points $n$, and the dimensionality $d_h$ of one data view. This adaptive threshold allows the algorithm to automatically determine the appropriate level of sparsity for each view.
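As a concrete illustration of this pruning step, a minimal NumPy sketch is given below; the function name and the illustrative threshold are ours, not from the paper's implementation.

import numpy as np

def prune_features(w, threshold):
    # Threshold-based feature pruning for one view (Equation (5)): weights below
    # the threshold are zeroed and the surviving weights are renormalized to sum
    # to one. Returns the pruned weights and the indices of the kept features.
    w = np.asarray(w, dtype=float).copy()
    keep = w >= threshold
    w[~keep] = 0.0
    if keep.any():
        w[keep] /= w[keep].sum()
    return w, np.flatnonzero(keep)

# Example with an illustrative threshold of 1/d_h (the candidate estimators
# listed above depend on n, D, t, s, and theta in the same spirit)
w = np.array([0.40, 0.35, 0.20, 0.03, 0.02])
pruned, kept = prune_features(w, threshold=1.0 / len(w))
print(pruned, kept)   # the two smallest weights are removed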

3.4. Optimization

Theorem 1.
The necessary and sufficient conditions for minimizing the objective function $J_{G\text{-}CoMVKM}$ in Equation (1) with respect to the cluster memberships $\mu_{ik}^{h}$, cluster centers $a_{kj}^{h}$, view weights $v_h$, and feature weights $w_j^{h}$ are given as follows:
$\mu_{ik}^{h}=\begin{cases}1, & \text{if }k=\arg\min_{1\le q\le c}\left(v_h^{\gamma}\zeta_{iq}^{h}-\dfrac{\delta}{n}\left(v_h^{\gamma}\zeta_{iq}^{h}-\sum_{h'=1,h'\neq h}^{s}\zeta_{iq}^{h'}\right)\right)\\ 0, & \text{otherwise}\end{cases}$
$a_{kj}^{h}=\dfrac{\sum_{i=1}^{n}\mu_{ik}^{h}\Theta_j^{h}x_{ij}^{h}+\delta\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\Theta_j^{h}x_{ij}^{h}}{\sum_{i=1}^{n}\mu_{ik}^{h}\Theta_j^{h}+\delta\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\Theta_j^{h}}$
$v_h=\left[\dfrac{\sum_{r=1}^{s}\left(\gamma\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{r}\zeta_{ik}^{r}+\delta\rho\right)}{\gamma\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\zeta_{ik}^{h}+\delta\rho}\right]^{\frac{1}{\gamma-1}}$
$w_j^{h}=\dfrac{\frac{1}{\tau_j^{h}}\exp\left(-\varpi_h\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho'\right)\right)}{\sum_{j'=1}^{d_h}\frac{1}{\tau_{j'}^{h}}\exp\left(-\varpi_h\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_{j'}^{h}d_{ik}^{h}+\delta\rho'\right)\right)}$
where $\rho=\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\left|\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}-\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}d_{ik}^{h'}\right|$, $\rho'=\partial\rho/\partial w_j^{h}$, $d_{ik}^{h}=(x_{ij}^{h}-a_{kj}^{h})^{2}$, $\Theta_j^{h}=w_j^{h}\tau_j^{h}$, $\zeta_{ik}^{h}=\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}$, and $\varpi_h=v_h^{\gamma}/\theta$.
Proof. 
The necessary condition for minimizing $J_{G\text{-}CoMVKM}$ w.r.t. $\mu_{ik}^{h}$ is the same as in k-means, and so the updating equation for $\mu_{ik}^{h}$ is obtained as Equation (6). To find the necessary condition for minimizing $J_{G\text{-}CoMVKM}$ w.r.t. $a_{kj}^{h}$, we differentiate $J_{G\text{-}CoMVKM}$ w.r.t. $a_{kj}^{h}$ while treating the other variables as fixed. First, differentiating $\rho$ w.r.t. $a_{kj}^{h}$ and setting it to zero gives $\frac{\partial\rho}{\partial a_{kj}^{h}}=\frac{\partial}{\partial a_{kj}^{h}}\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}(x_{ij}^{h}-a_{kj}^{h})^{2}=0$. Letting $\Theta_j^{h}=w_j^{h}\tau_j^{h}$, this reduces to $\Omega=\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\Theta_j^{h}(x_{ij}^{h}-a_{kj}^{h})=0$. Thus, $\frac{\partial J_{G\text{-}CoMVKM}}{\partial a_{kj}^{h}}=v_h^{\gamma}\sum_{i=1}^{n}\mu_{ik}^{h}\Theta_j^{h}(x_{ij}^{h}-a_{kj}^{h})+\delta\Omega=0$. Collecting the terms in $a_{kj}^{h}$ on one side and the terms in $x_{ij}^{h}$ on the other yields $\left(\sum_{i=1}^{n}\mu_{ik}^{h}\Theta_j^{h}+\delta\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\Theta_j^{h}\right)a_{kj}^{h}=\sum_{i=1}^{n}\mu_{ik}^{h}\Theta_j^{h}x_{ij}^{h}+\delta\sum_{h'=1,h'\neq h}^{s}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\Theta_j^{h}x_{ij}^{h}$. Therefore, the updating equation for $a_{kj}^{h}$ is obtained as Equation (7). To find the necessary condition for minimizing $J_{G\text{-}CoMVKM}$ w.r.t. $v_h$, we treat all other variables as constants; the terms that do not depend on $v_h$ vanish, leaving $J_{G\text{-}CoMVKM}=\sum_{h=1}^{s}v_h^{\gamma}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\zeta_{ik}^{h}+\delta\rho\right)$. The Lagrangian with respect to $v_h$ is $\tilde J_{G\text{-}CoMVKM}=J_{G\text{-}CoMVKM}+\lambda_2\left(\sum_{h=1}^{s}v_h-1\right)$. Setting $\partial\tilde J_{G\text{-}CoMVKM}/\partial v_h=0$ gives $\gamma v_h^{\gamma-1}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\zeta_{ik}^{h}+\delta\rho\right)+\lambda_2=0$, hence $v_h=\left[-\lambda_2\Big/\left(\gamma\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\zeta_{ik}^{h}+\delta\rho\right)\right]^{\frac{1}{\gamma-1}}$. Since $\sum_{h=1}^{s}v_h-1=0$, substituting this expression and solving for $\lambda_2$ eliminates the multiplier, and the updating equation for $v_h$ is obtained as Equation (8). To find the partial derivative of $J_{G\text{-}CoMVKM}$ w.r.t. $w_j^{h}$, we consider only the terms that depend on $w_j^{h}$ and treat all other variables as constants; the absolute value $\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|$ is kept fixed during differentiation. The Lagrangian with respect to $w_j^{h}$ is $\tilde J_{G\text{-}CoMVKM}=J_{G\text{-}CoMVKM}+\lambda_1\left(\sum_{j=1}^{d_h}w_j^{h}-1\right)$.
First, we have $\rho'=\partial\rho/\partial w_j^{h}=\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\tau_j^{h}d_{ik}^{h}$. Setting $\partial\tilde J_{G\text{-}CoMVKM}/\partial w_j^{h}=0$ gives $v_h^{\gamma}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho'\right)+\theta\left(\log(\tau_j^{h}w_j^{h})+1\right)+\lambda_1=0$, which can be rewritten as $\log(\tau_j^{h}w_j^{h})=-\frac{v_h^{\gamma}}{\theta}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho'\right)-\frac{\lambda_1}{\theta}-1$, and hence $\tau_j^{h}w_j^{h}=\exp\left(-\frac{v_h^{\gamma}}{\theta}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho'\right)\right)\Big/\exp\left(\frac{\lambda_1}{\theta}+1\right)$ (*). Since $\sum_{j=1}^{d_h}w_j^{h}-1=0$, we obtain $\sum_{j'=1}^{d_h}\frac{1}{\tau_{j'}^{h}}\exp\left(-\frac{v_h^{\gamma}}{\theta}\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_{j'}^{h}d_{ik}^{h}+\delta\rho'\right)\right)=\exp\left(\frac{\lambda_1}{\theta}+1\right)$ (**). Substituting (**) into (*) and letting $\varpi_h=v_h^{\gamma}/\theta$, the updating equation for the feature weight $w_j^{h}$ becomes $w_j^{h}=\frac{1}{\tau_j^{h}}\exp\left(-\varpi_h\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_j^{h}d_{ik}^{h}+\delta\rho'\right)\right)\Big/\sum_{j'=1}^{d_h}\frac{1}{\tau_{j'}^{h}}\exp\left(-\varpi_h\left(\sum_{k=1}^{c}\sum_{i=1}^{n}\mu_{ik}^{h}\tau_{j'}^{h}d_{ik}^{h}+\delta\rho'\right)\right)$. This is Equation (9). □
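To illustrate how these closed-form updates translate into code, a minimal NumPy sketch of the feature-weight update of Equation (9) is given below; it follows our reading of the derivation (including the minus sign inside the exponential), and all argument names are illustrative.

import numpy as np

def update_feature_weights(mu, d2, tau, v_h, gamma, theta, delta, rho_prime):
    # Feature-weight update for one view h.
    # mu: (n, c) memberships, d2: (n, c, d_h) squared deviations (x_ij - a_kj)^2,
    # tau: (d_h,) penalty parameters, rho_prime: (d_h,) derivative of rho w.r.t. w_j
    # (pass zeros to ignore the collaboration term).
    varpi = (v_h ** gamma) / theta                    # \varpi_h = v_h^gamma / theta
    cost = np.einsum("ik,ikj->j", mu, d2) * tau       # sum_k sum_i mu_ik tau_j d_ikj
    logits = -varpi * (cost + delta * rho_prime)
    w = np.exp(logits - logits.max()) / tau           # max-shift for numerical stability
    return w / w.sum()                                # normalize so the weights sum to one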

3.5. Parameter, Threshold Setting and Impact

In addition to excluding irrelevant features within a single view of multiple-representation (MV) data, we experiment with G-CoMVKM under a scheme that decreases the number of views using its feature-view weights. The purpose of this experiment is to discover specific insights within one view during the clustering process. The idea is simply to test whether excluding an uninformative or even unrealistic view during computation has a nonzero effect. Views with lower contributions are considered less important and can be excluded from the clustering process, thereby reducing the computational complexity and improving the clustering performance. By combining both feature-level and view-level selection criteria in a collaborative manner, G-CoMVKM can effectively handle high-dimensional MV datasets by transforming them into lower dimensionalities through a subset-reduction framework that discards redundant or noisy features. The feature-level selection helps to identify the relevant features of each view, while the view-level selection helps to identify uninformative or unrealistic views during the clustering process. Together, they improve the accuracy and efficiency of the proposed G-CoMVKM.
There are four essential parameters in the proposed G-CoMVKM: an exponent parameter $\gamma$ to handle the distribution of view weights, a balancing parameter $\delta$ to handle the disagreement across views, a balancing parameter $\theta$ to regularize the entropy of the feature weights and thereby drive feature pruning, and a penalty parameter $\tau$ to handle the distribution of feature-view components. We need to choose $\gamma$ carefully so that it matches the collaboration step. Since multi-view data vary, the measurement to control each data view $h$ also differs from view to view. Generally, $\gamma$ is user-defined with $\gamma\in[3,\infty)$. In G-CoMVKM, the parameter $\theta$ is also user-defined with $\theta\in(0,1)$; to produce a desirable result, the starting point is 0.1 with an incremental step of 0.05 or 0.1. To produce a properly lower number of dimensionalities within one view on some MV data, tuning $\theta$ with the criterion $\theta>2$ and $\gamma\in(0,1)$ is recommended. A detailed analysis of these two parameters $\gamma$ and $\theta$ will be given in the experiments and results section. The estimation of $\tau_j^{h}$ depends on the minimum, mean, and maximum of the input data within one view, formulated as below:
$\tau_j^{h}=\operatorname{mean}\left(x_{ij}^{h}\right)\Big/\operatorname{mean}\left(\max_{i}x_{ij}^{h}\big/\min_{i}x_{ij}^{h}\right),\qquad \operatorname{mean}\left(x_{ij}^{h}\right)=\frac{1}{n}\sum_{i=1}^{n}x_{ij}^{h}$
As shown in Equation (10), the mean of the data view measures the dispersion of the feature weights, and the mean of the maximum/minimum of the data points in each view guarantees the variation of the data sets. To exclude irrelevant features within one view during the clustering process, we need to take a threshold into account. We discontinue unimportant feature-view components by using some selection threshold criteria. In our scenario, the candidate threshold estimators $t_h$ include $1/n$, $1/D$, $t/(nD)$, $s/(nD)$, $t/(nd_h)$, and $\theta/(nd_h-1)$. In our procedure, we define $\delta=\gamma\theta$; this multiplication between the parameter that handles the distribution of view weights and the regularization for feature weights works better for handling collaborative transfer learning in the proposed G-CoMVKM. This adaptive balancing parameter maintains a greater weight variance, leading to clearer discrimination between important and unimportant features.
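A small sketch of how the penalty parameters and the candidate thresholds can be computed is given below; the grouping of terms in Equation (10) is ambiguous in the text, so the penalty_tau function reflects one possible reading and should be taken as an assumption, as should the function names themselves.

import numpy as np

def penalty_tau(X_h, eps=1e-12):
    # One reading of Equation (10) for a single view X_h of shape (n, d_h):
    # the per-feature mean divided by the view-level mean of the per-feature
    # max/min ratio (assumed grouping).
    col_mean = X_h.mean(axis=0)
    ratio = X_h.max(axis=0) / np.maximum(X_h.min(axis=0), eps)
    return col_mean / ratio.mean()

def threshold_candidates(n, D, d_h, s, t, theta):
    # Candidate pruning thresholds listed above: n = #points, D = total
    # dimensionality, d_h = dimensionality of view h, s = #views, t = iteration.
    return {
        "1/n": 1 / n, "1/D": 1 / D, "t/nD": t / (n * D),
        "s/nD": s / (n * D), "t/nd_h": t / (n * d_h),
        "theta/(nd_h-1)": theta / (n * d_h - 1),
    }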

3.6. Fusion Stage

The initial memberships are randomly generated and updated based on the input data within each view, taking advantage of its own local weighting component and the collaborative step. In the very first stage, different views will have different membership values. The updating equations of the view weights, cluster centers, and feature weights are then calculated locally, within each view's own loop; in this sense, we can call it a local updating estimation. The feature-view components within the multi-view data interact with each other through their weights, transforming the original dimensionality of the MV data into a new, lower dimensionality without matrix-reconstruction approaches. In this scenario, the views simply transfer their information to build an agreement that generates a global solution, taking the threshold-based feature reduction step into account. As our global solution is formulated, the new transformation of the original MV data is expected to have lower dimensionality containing only the relevant feature components. The global solution is finally made by integrating the memberships of the local step, in what we call a fusion step. This fusion step concludes the final pattern of the multi-view data. Our fusion step is formulated as follows.
$\bar{U}=\sum_{h=1}^{s}\mu_{ik}^{h}v_{h}$
where μ i k h is the membership of ith data point assigned in the kth cluster for hth view, and v h is the weight for the hth view.
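The fusion step can be written in a few lines; reading off the final labels as the arg-max of the fused memberships is a natural choice we assume here, and the function name is illustrative.

import numpy as np

def fuse_memberships(mus, v):
    # Fusion step of Equation (11): the global membership is the view-weighted
    # sum of the local memberships, and each point is assigned to the cluster
    # with the largest fused membership.
    # mus: list of (n, c) local membership matrices, one per view; v: (s,) view weights.
    U_bar = sum(v[h] * mus[h] for h in range(len(mus)))   # (n, c) fused memberships
    return U_bar, U_bar.argmax(axis=1)                    # fused matrix and final labels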

3.7. Algorithm Implementation

In this subsection, we outline the full implementation of the proposed G-CoMVKM clustering algorithm (Algorithm 1). The algorithm begins by initializing the cluster centers, feature weights, and view weights, and then, it iteratively updates these variables until convergence or until the maximum number of iterations is reached. Thus, the proposed G-CoMVKM algorithm is summarized as follows in which Figure 2 demonstrates its flowchart.
Algorithm 1. The G-CoMVKM Clustering Algorithm
Input: Multi-view dataset $X=\{x_1,x_2,\ldots,x_n\}$ with $x_i=\{x_i^h\}_{h=1}^{s}$, $x_i^h=\{x_{ij}^h\}_{j=1}^{d_h}$; number of clusters $c$; parameters $\gamma$, $\delta$, $\theta$; maximum number of iterations $t_{\max}$; convergence threshold $\epsilon$
Initialization:
Initialize cluster centers $a_{kj}^{h}$ randomly or using k-means++
Initialize feature weights $w_j^{h}=1/d_h$ for all $j=1,\ldots,d_h$, $h=1,\ldots,s$
Initialize view weights $v_h=1/s$ for all $h=1,\ldots,s$
$t=0$ and $J_{G\text{-}CoMVKM}^{(0)}=\infty$
While  $t<t_{\max}$  and  $\left|J_{G\text{-}CoMVKM}^{(t)}-J_{G\text{-}CoMVKM}^{(t-1)}\right|>\epsilon$  do
      // Compute penalty parameter
      Compute τ j h by Equation (10).
      // Compute membership matrix
      for  i = 1  to  n  do
             for  k = 1  to  c  do
                       Calculate $D_{ik}=\left|v_h^{\gamma}\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}(x_{ij}^{h}-a_{kj}^{h})^{2}-v_{h'}^{\gamma}\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}(x_{ij}^{h'}-a_{kj}^{h'})^{2}\right|$
                      Compute μ i k h according to Equation (6)
      // Update Cluster Centers
      for  h = 1  to  s  do
             for  k = 1  to  c  do
                    for  j = 1  to  d h  do
                      Calculate Θ j h
                      Compute a k j h according to Equation (7)
      // Update Feature Weights
      for  h = 1  to  s  do
             for  j = 1  to  d h  do
                       Calculate $\rho=\sum_{h'=1,h'\neq h}^{s}\sum_{k=1}^{c}\sum_{i=1}^{n}\left|\mu_{ik}^{h}-\mu_{ik}^{h'}\right|\left|\sum_{j=1}^{d_h}w_j^{h}\tau_j^{h}d_{ik}^{h}-\sum_{j=1}^{d_{h'}}w_j^{h'}\tau_j^{h'}d_{ik}^{h'}\right|$
                      Update w j h according to Equation (9)
      // Feature pruning
      for  h = 1  to  s  do
             Calculate threshold t h
             for  j = 1  to  d h  do
                       set $w_j^{h}=0$ if $w_j^{h}<t_h$
              Renormalize remaining non-zero weights: $w_j^{h}=w_j^{h}\big/\sum_{j'=1}^{d_h}w_{j'}^{h}$
      // Update view weights
      for  h = 1  to  s  do
             Calculate ζ h
             Update v h according to Equation (8)
      // View pruning      
      for  h = 1  to  s  do
              set $v_h=0$ if $v_h=\min_{h'}v_{h'}$
              Renormalize remaining non-zero weights: $v_h=v_h\big/\sum_{r=1}^{s}v_r$
      // Calculate objective function value
       Calculate $J_{G\text{-}CoMVKM}^{(t+1)}$ according to Equation (1)
       t = t + 1
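The view-pruning step of Algorithm 1, for example, amounts to the following small routine (the function name is illustrative):

import numpy as np

def prune_min_view(v):
    # Zero out the smallest view weight and renormalize the remaining weights
    # so that they sum to one, as in the view-pruning step of Algorithm 1.
    v = np.asarray(v, dtype=float).copy()
    v[np.argmin(v)] = 0.0
    return v / v.sum()

print(prune_min_view([0.45, 0.35, 0.20]))   # -> [0.5625, 0.4375, 0.0]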

3.8. Computational Complexity and Initialization Sensitivity

The computational complexity of G-CoMVKM is an important consideration, especially when dealing with large-scale multi-view datasets. As a non-convex optimization problem, G-CoMVKM employs Lagrangian multiplier techniques to derive the update functions for each variable while satisfying the respective constraints. This analytical approach, while mathematically elegant, inherits the initialization sensitivity common to many clustering algorithms in the k-means family. Similar to k-means and its variants, G-CoMVKM’s convergence to a global optimum cannot be guaranteed due to the non-convex nature of its objective function. The algorithm typically converges to a local minimum, with the quality of this solution being heavily dependent on the initial cluster centers, feature weights, and view weights. Multiple runs with different initializations are often necessary to identify a robust solution—a common practice in non-convex clustering frameworks.
With these considerations in mind, we analyze the complexity of each major step in the algorithm, accounting for both the computational requirements and the iterative nature needed to mitigate initialization sensitivity. First, the membership update of $\mu_{ik}^{h}$ requires computing the distance to each cluster center across all views; its computational complexity is $O(n\times c\times s\times\bar d)$, where $n$ is the number of samples, $c$ is the number of clusters, $s$ is the number of views, and $\bar d=\frac{1}{s}\sum_{h=1}^{s}d_h$ is the average dimensionality across views. Second, the cluster center update of $a_{kj}^{h}$ requires computing the weighted sum of all assigned samples, so its computational complexity is $O(n\times c\times s\times\bar d)$. Third, the view weight update of $v_h$ involves the memberships and cluster centers and therefore has a complexity of $O(n\times c\times s\times\bar d)$. Similarly, the feature weight update of $w_j^{h}$ has a complexity of $O(n\times c\times s\times\bar d)$. Thus, the overall computational complexity of the G-CoMVKM algorithm is $O(n\times c\times s\times\bar d\times t)$, where $t$ is the number of iterations.

3.9. Comparison with Existing Methods

Table 2 compares the computational complexity of G-CoMVKM with existing multi-view clustering algorithms.
Despite incorporating the collaborative transfer mechanism and feature pruning, G-CoMVKM maintains the same asymptotic complexity as basic multi-view k-means variants like TW-K-means and FRMVK. This is achieved through efficient implementation of the updating rules and pruning mechanism. The computational advantage becomes even more significant as the algorithm progresses because feature pruning reduces the effective dimensionality with each iteration, view weights concentrate on the most informative views, allowing for potential view-based early stopping strategies, and adaptive threshold calculation adds negligible overhead. In practice, G-CoMVKM shows superior computational efficiency compared to TW-Co-K-means due to the latter’s quadratic dependency on the number of views. Additionally, G-CoMVKM’s feature pruning mechanism not only improves interpretability but also enhances computational efficiency by reducing the effective dimensionality during clustering.

4. Experiments and Results

In this section, numerical and real data sets are used to evaluate the performance of the proposed G-CoMVKM and several related MVKM algorithms. Table 3 summarizes detailed information of ten real-world applications, and Table 4 summarizes the detailed parameter settings of the related methods used for comparison. All algorithms in our experiments are evaluated using three clustering evaluation metrics: accuracy rate (AR), Rand Index (RI) [35], and Normalized Mutual Information (NMI) [36,37]. The AR is calculated as $\mathrm{AR}=\sum_{k=1}^{c}n_k/n$, where $n$ is the number of data points and $n_k$ is the number of correctly clustered data points in cluster $k$. The RI evaluates how similar the cluster assignments are to one another by making pair-wise comparisons. Let $(P_i,P_j)$ be a given pair of points from the data set. Let $a$ be the number of pairs of points in which both points belong to the same cluster in $C$ and to the same cluster in $C'$, $b$ denote the number of pairs in which the two points belong to different clusters in $C$ and to different clusters in $C'$, $c$ be the number of pairs in which the two points belong to the same cluster in $C$ and to different clusters in $C'$, and $d$ be the number of pairs in which the two points belong to different clusters in $C$ and to the same cluster in $C'$. The overall number $M=n(n-1)/2$ of possible point pairs within the dataset is used to determine the RI with $\mathrm{RI}=(a+b)/M$. NMI measures the amount of information on the presence/absence of a term that contributes to making the correct classification decision [36,37]. NMI is computed as $\mathrm{NMI}=\mathrm{MI}(y,\hat y)\big/\left[\left(H(y)+H(\hat y)\right)/2\right]$, where $H(y)$ and $H(\hat y)$ are the marginal entropies of $y$ and $\hat y$, respectively, and $\mathrm{MI}(y,\hat y)$ is the mutual information between the ground truth $y$ and the predicted clusters $\hat y$. The values of AR, RI, and NMI range from 0 to 1; a higher value indicates larger similarity, with the worst value close to 0 and the best value close to 1.
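For reproducibility, these metrics can be computed as in the sketch below; scikit-learn's rand_score and normalized_mutual_info_score implement RI and NMI (the latter with arithmetic averaging, matching the formula above), while the Hungarian matching inside the AR computation is a standard choice that the text does not spell out and is assumed here.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, rand_score

def accuracy_rate(y_true, y_pred):
    # AR = (number of correctly clustered points) / n, after matching predicted
    # clusters to ground-truth classes with the Hungarian algorithm.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((clusters.size, classes.size), dtype=int)
    for a, kp in enumerate(clusters):
        for b, kt in enumerate(classes):
            cost[a, b] = np.sum((y_pred == kp) & (y_true == kt))
    row, col = linear_sum_assignment(-cost)               # maximize matched counts
    return cost[row, col].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                               # same partition, permuted labels
print(accuracy_rate(y_true, y_pred))                      # 1.0
print(rand_score(y_true, y_pred))                         # 1.0
print(normalized_mutual_info_score(y_true, y_pred))       # 1.0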
Example 1.
A two-dimensional two-view data set with 1000 points distributed into two clusters is generated from a 2-component bivariate Gaussian mixture model (GMM), as shown in Figure 3a–c, and is called 2V2D2C. View 1 is named V1D2C, while view 2 is named V2D3C. Here $x_1^1$ and $x_2^1$ are the coordinates of V1D2C, while $x_1^2$, $x_2^2$, and $x_3^2$ are the coordinates of V2D3C. The means $\mu_k^1$ for V1D2C in the 2V2D2C data are $(2,2)$ and $(6,6)$. The means $\mu_k^2$ for V2D3C in the 2V2D2C data are $(6,6)$ and $(2,2)$. The covariance matrices and mixing proportions for the two views are $\Sigma_1^1=\Sigma_2^1=\Sigma_1^2=\Sigma_2^2=\begin{bmatrix}1&0\\0&1\end{bmatrix}$ and $\alpha_k^1=\alpha_k^2=1/2$, $k=1,2$, respectively. To simulate our scenario of selecting relevant features with automatic reduction of unimportant features, an additional feature component $x_3^2$ is added to V2D3C. We design all feature components in view 1 to be relevant, so the reduction procedure is expected not to be applied to them, and they are continuously processed as good features. The additional feature is generated from a uniform distribution on the range 50 to 60 to represent an unimportant feature within one view in this multi-view data scenario. In this experiment, we investigate the ability of our proposed G-CoMVKM algorithm to detect the unimportant features within each data view.
Table 3 shows the results of our experiments on this synthetic data set obtained by dropping unimportant features during the clustering process. As can be seen in Table 3, the proposed G-CoMVKM effectively shows its ability to correctly measure the quality of feature-view components and views. It identifies the third component of V2D3C as a non-informative feature and automatically excludes it from the computation after iteration 1, while the remaining feature-view components of V1D2C remain robust and improve the accuracy by 65.6% after the third iteration. The experimental results support our hypothesis that a good pattern can be found by exploiting the relevancy of both feature and view components, reducing the dimensionality of each view that is designed as a noisy feature-view. Moreover, the distribution of data points within one cluster for a single view (locally) and globally across views is reported in Table 4. As shown in Table 4, G-CoMVKM converges completely after 4 iterations, with the view weight distribution stabilizing by the 3rd iteration. This stable distribution of weights determines the final assignment of data points across clusters. The early stabilization of some weights, far from being problematic, demonstrates the algorithm’s efficiency in identifying and preserving important features while focusing computational effort on optimizing features that require more refinement.
Example 2.
In this experiment, we consider two real datasets, called Wikipedia Articles [38] and Prokaryotic phyla [39]. The detailed information of these two datasets is displayed in Table 5. The purpose of this experiment is to learn a typical threshold for filtering out the unimportant, noisy, or redundant features within one view. Our experiments include several learning thresholds to see the effect of dimensionality reduction in advance. Our learning threshold uses the current dimensionality, the number of data points, the number of iterations, and the number of views to distinguish essential feature-view components from unimportant ones. The early weight stabilization observed in Example 1 occurs because:
  • Feature importance stability: These features ($x_1^1$, $x_2^1$ for view 1 and $x_1^2$, $x_2^2$ for view 2) represent highly informative dimensions that are immediately recognized by the algorithm as significant for clustering. Their weights stabilize early because they consistently provide strong discriminative power.
  • View-specific convergence: Different views may converge at different rates depending on their information content. In this case, the stable weights indicate that the algorithm quickly identified the optimal weighting for these specific features.
  • Collaborative learning effect: The collaborative learning mechanism in G-CoMVKM can accelerate convergence for features that have strong consensus across views, resulting in early stabilization of their weights.
Table 5. A Summary of Seven Real-World Datasets.

Data Sets | # of h | # of n | # of c | Name of Views | # of d
Biological data
Prokaryotic phyla [39] | 3 | 551 | 4 | Gene repertoire | 393
 | | | | Proteome composition | 3
 | | | | Textual | 438
Image data
MNIST4 [40] | 3 | 4000 | 4 | ISO | 30
 | | | | LDA | 9
 | | | | NPE | 30
COIL20 [41] | 3 | 1440 | 20 | Degree [0, 85] | 30
 | | | | Degree [90, 175] | 19
 | | | | Degree [180, 265] | 30
ORL Face [42] | 4 | 400 | 40 | Intensity | 4096
 | | | | LBP | 3304
 | | | | Gabor | 6750
 | | | | PHOG | 1024
Text and Image Data
Wikipedia articles [38] | 2 | 639 | 10 | Text | 128
 | | | | Image | 10
Motion or Human Activity Recognition (HAR) Data
DHA [43] | 2 | 253 | 23 | RGB view | 6144
 | | | | Depth view | 110
UWA [44] | 2 | 254 | 30 | RGB view | 6144
 | | | | Depth view | 110
As comparisons, we employ TW-k-means, WMCFS, SWVF, TW-Co-k-means, and FRMVK to quantify the superiority of our proposed G-CoMVKM. Note that the parameter settings on these five competitive algorithms of TW-k-means, WMCFS, SWVF, TW-Co-k-means, and FRMVK are presented in Table 6.
The behavior of different thresholds on the Wikipedia articles data produced by G-CoMVKM is displayed in Table 7. As can be seen, the first view of Wikipedia Articles, which is represented by text data, shows a trend of dimensionality reduction. The highest performance was obtained when the remaining dimensionality of the text view is 13, producing 56.13% ARs, 88.55% RIs, and 54.15% NMIs. Here, the image feature-view components remain consistently relevant when the thresholds $s/(nD)$, $1/n$, and $1/D$ are used to exclude the unimportant features from the computation. On the Prokaryotic phyla data, the best performance is reached with the threshold $t_h=t/(nD)$, displayed in Table 8. The three feature-view components are reduced to 4, 3, and 7 dimensions, respectively. Note that the maximum number of iterations in this experiment was set manually and each experiment ran under 100 different random initializations. The performances of the five related algorithms on the Wikipedia articles [38] and Prokaryotic phyla data [39] are reported in Table 9. As can be seen in Table 9, the proposed G-CoMVKM performs better compared to these five related methods, with 56.13% ARs, 88.55% RIs, and 54.15% NMIs on Wikipedia articles, and 62.49% ARs, 67.57% RIs, and 40.60% NMIs on the Prokaryotic phyla data. Using this example, we also investigate the impact of the parameters $\eta$, $\alpha$, and $\beta$ of the TW-Co-k-means clustering algorithm on the Wikipedia articles data set. As can be seen in Figure 4a,b, fixing one parameter of TW-Co-k-means and varying another on Wikipedia articles does not significantly affect the clustering performance. The final dimensionalities after feature reduction on Wikipedia articles and Prokaryotic phyla produced by our proposed G-CoMVKM and the FRMVK clustering algorithm are reported in Table 10. As shown in Table 10, compared to FRMVK with final $d_h=\{57,10\}$ on Wikipedia articles and $d_h=\{116,3,124\}$ on Prokaryotic phyla, our proposed G-CoMVKM with final $d_h=\{13,10\}$ on Wikipedia articles performs better, with an improvement of about 0.63% ARs, 0.51% RIs, and 1.15% NMIs, while its final $d_h=\{4,3,7\}$ on Prokaryotic phyla yields an improvement of about 6.35% ARs, 3.14% RIs, and 8.31% NMIs.
Example 3.
In this experiment, the real-world COIL20 [41] and MNIST4 [40] datasets are used to assess the effectiveness of the proposed G-CoMVKM on data with larger numbers of data points and clusters. Detailed information on these data is reported in Table 5. For simulation purposes, all algorithms are run with the default parameter settings provided in Table 6. Additionally, we set the maximum number of iterations of each algorithm to 35 and report the minimum, average, and maximum values over one hundred simulations. Table 11 and Table 12 report the detailed clustering performances of the proposed G-CoMVKM and five related clustering algorithms as the minimum, average, and maximum values of ARs, RIs, and NMIs. To examine the consistency of the importance of each view within the multi-view data, we employed the five competitive clustering algorithms TW-k-means [27], WMCFS [21], SWVF [22], TW-Co-k-means [30], and FRMVK [29]. Figure 5a,b displays the distribution of view weights produced by the proposed G-CoMVKM and the five related clustering algorithms on the MNIST4 [40] and COIL20 [41] data sets, respectively. It clearly shows that the proposed G-CoMVKM and the five related algorithms assign a different weight to each data view, reflecting its contribution during the clustering process. Based on the clustering performances in Table 11 and Table 12 and the view contributions in Figure 5a,b, we next investigate the impact of the parameters of WMCFS, SWVF, and FRMVK on the COIL20 data, shown in Figure 6a-e. As can be seen, different parameter combinations for WMCFS and SWVF affect the clustering performance, while FRMVK on COIL20 remains stable under different parameter values. This analysis shows that the parameter combination must be chosen properly to obtain good clustering performance with WMCFS and SWVF. Overall, the proposed G-CoMVKM is superior to the five competitive clustering algorithms. Even in computational cost, the proposed G-CoMVKM is considerably faster despite its additional feature-view weighting, collaboration, and feature-reduction steps, especially on data sets with a very large number of observations such as MNIST4 [40] and COIL20 [41].
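The reported minimum, average, and maximum values can be compiled as in the following sketch. It assumes AR denotes clustering accuracy after optimal cluster-to-label matching and uses scikit-learn for RI and NMI; run_clustering is a hypothetical stand-in for one G-CoMVKM run, not the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy after optimally matching predicted clusters to true labels
    (Hungarian assignment); assumed here as the meaning of AR."""
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    cost = np.zeros((clusters.size, classes.size), dtype=int)
    for i, k in enumerate(clusters):
        for j, c in enumerate(classes):
            cost[i, j] = np.sum((y_pred == k) & (y_true == c))
    row, col = linear_sum_assignment(-cost)   # maximize matched counts
    return cost[row, col].sum() / y_true.size

def summarize_runs(run_clustering, y_true, n_runs=100, seed=0):
    """(min, mean, max) of AR, RI, NMI over repeated random initializations.
    run_clustering(rng) is a hypothetical stand-in for one G-CoMVKM run."""
    rng = np.random.default_rng(seed)
    scores = {"AR": [], "RI": [], "NMI": []}
    for _ in range(n_runs):
        y_pred = run_clustering(rng)
        scores["AR"].append(clustering_accuracy(y_true, y_pred))
        scores["RI"].append(rand_score(y_true, y_pred))
        scores["NMI"].append(normalized_mutual_info_score(y_true, y_pred))
    return {m: (float(np.min(v)), float(np.mean(v)), float(np.max(v)))
            for m, v in scores.items()}

# Toy usage with random labelings standing in for real clustering output.
y_true = np.repeat(np.arange(4), 25)
print(summarize_runs(lambda rng: rng.integers(0, 4, size=100), y_true, n_runs=10))
```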
Example 4.
To further assess the efficacy of the proposed G-CoMVKM, view-importance procedures are applied during the clustering process. The significance of each view is evaluated separately using the proposed G-CoMVKM feature-view-weighted approach, which helps to exclude biased or unrealistic views and is expected to generate more valuable insights. The approach involves selecting views with lower dimensions and analyzing them in depth. Our focus is on views that offer abundant information, with particular emphasis on the features most pertinent for pattern recognition.
$v_h = \min \{ v_1, v_2, \ldots, v_s \}$
The hth view with weight v_h selected according to Equation (12) is significant and can then be subjected to a repeated run of G-CoMVKM, which excludes unimportant features to refine the results. As an experimental design to enhance accuracy, we can exclude one or two detected unrealistic views and observe whether even minor changes in behavior lead to an improvement. We believe that good feature-view components can significantly boost accuracy while being more cost-effective in running time. To address the effect of a decreasing number of views on MV data, we applied G-CoMVKM to the Olivetti Research Laboratory (ORL) face data [42]. Table 13 reports that G-CoMVKM produced the best results when all four view components of the ORL face data were processed; it outperformed G-CoMVKM with only two or three views of the ORL face data as input. In other words, there is no improvement when the number of views decreases. Instead, clustering performance degrades relative to consistently processing all data-feature-view components with G-CoMVKM on the complete views. More precisely, the memberships in the local and global feature-view component steps with complete views show no tendency toward over-distributed behavior, as displayed in Figure 7a,b. Figure 8a,b displays the feature-weight and view-weight behaviors on the ORL face data set when all original view components are processed as input and when the number of views is decreased, respectively. As can be seen in Figure 8a, each feature component of the intensity, LBP, Gabor, and PHOG views shows its own trend; some components contribute more and others less even after feature reduction is applied. The peaks of the graph indicate components with high contribution, and the bottoms indicate the lowest contributions. Figure 8b shows that the fewer the views, the better discriminated the importance of each component. As the number of views decreased to two, G-CoMVKM recognized that intensity has the highest contribution in discovering the pattern of the ORL face data. However, this linearly decreased the performance by 7.35% ARs, 5% RIs, and 4.21% NMIs. From these results, we conclude that decreasing the number of views within MV data does not guarantee improved clustering performance.
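A minimal sketch of this view-dropping experiment is given below, assuming the learned view weights are available as a vector. The function name and the toy ORL-like shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def drop_least_important_view(view_weights, views):
    """Remove the view with the smallest learned weight (v_h = min{v_1, ..., v_s})
    before re-running the clustering; a sketch of the view-dropping experiment."""
    h = int(np.argmin(view_weights))
    reduced_views = [X for i, X in enumerate(views) if i != h]
    reduced_weights = np.delete(np.asarray(view_weights, dtype=float), h)
    return h, reduced_views, reduced_weights / reduced_weights.sum()

# Toy usage with four random 'views' of the same 400 samples (ORL-like layout).
rng = np.random.default_rng(1)
views = [rng.normal(size=(400, d)) for d in (4096, 3304, 6750, 1024)]
weights = np.array([0.35, 0.30, 0.15, 0.20])
dropped, views_left, w_left = drop_least_important_view(weights, views)
print(dropped, [v.shape[1] for v in views_left], w_left.round(3))
# -> 2 [4096, 3304, 1024] [0.412 0.353 0.235]
```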
Example 5.
In this experiment, our goal is to cluster motion or Human Activity Recognition (HAR) data, namely the UWA3D multi-view activity (UWA) [43] and Depth-induced Human Action (DHA) [44] data sets. The UWA data contains 660 action sequences, in which 11 actions were performed by 12 subjects with five repetitions. DHA contains 483 video clips in 23 categories. Both HAR data sets are represented by RGB and depth features. We first conduct 30 different simulations of the G-CoMVKM algorithm and report the minimum, average, and maximum ARs, RIs, and NMIs. Secondly, we conduct an experiment by tuning the two parameters γ and θ on UWA and DHA. Further comparisons with five related methods are then made and reported as the minimum, average, and maximum clustering performances under 30 different simulations.
Figure 9a displays the distribution of feature components in the RGB and depth views of the DHA data after feature reduction by G-CoMVKM, and Figure 9b displays the corresponding view-weight distribution. For the DHA data, we set γ to 0.49, 0.51, 0.59, 0.65, 0.67, and 0.69. The minimum, average, and maximum ARs, RIs, and NMIs produced by the proposed G-CoMVKM are displayed as bar charts in Figure 10a-c. For the UWA data, we set γ to 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, and 0.95; the corresponding minimum, average, and maximum ARs, RIs, and NMIs are displayed as bar charts in Figure 11a-c. The final dimensionality of each view of the DHA and UWA data after feature reduction by G-CoMVKM is presented in Table 14; for comparison, the final dimensionalities produced by FRMVK are also reported in the same table. As can be seen, all γ values in {0.25, 0.35, 0.45, 0.55, 0.65, 0.75} produce the same final dimensionality on the depth view, whereas γ values of 0.85 and 0.95 produce different final dimensionalities on both the RGB and depth views. Although these γ values produce different numbers of retained dimensions within one view, this does not affect the final clustering performance, because both the relevant features and the weakly relevant but non-redundant features are retained and discriminated by our threshold during the clustering process. Overall, the proposed G-CoMVKM performs better than the other related state-of-the-art methods, as displayed in Table 15.
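The parameter sweep can be organized as in the following sketch, where run_mvkm and the scoring function are hypothetical stand-ins; only the experimental bookkeeping (a grid of γ values, fixed θ, repeated runs, and recorded final dimensions) mirrors the procedure described in the text.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def sweep_gamma(run_mvkm, score_fn, views, y_true, gammas, theta, n_runs=30):
    """Sweep gamma with theta fixed; record the final per-view dimensionality and
    the (min, mean, max) of a chosen clustering score over repeated runs.

    run_mvkm(views, gamma, theta, seed) is a hypothetical stand-in returning
    (labels, kept_dims_per_view); the real G-CoMVKM updates are not shown here.
    """
    results = {}
    for g in gammas:
        scores, last_dims = [], None
        for r in range(n_runs):
            labels, last_dims = run_mvkm(views, gamma=g, theta=theta, seed=r)
            scores.append(score_fn(y_true, labels))
        results[g] = {"final_dims": last_dims,
                      "score_min_mean_max": (min(scores), float(np.mean(scores)), max(scores))}
    return results

# Toy stand-in mimicking the UWA layout (254 samples, RGB + depth views).
def dummy_run(views, gamma, theta, seed):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 30, size=views[0].shape[0])
    return labels, [int(600 * (1 - gamma)) + 100, 110]   # made-up kept dimensions

views = [np.zeros((254, 6144)), np.zeros((254, 110))]
y_true = np.repeat(np.arange(30), 9)[:254]
gammas = [0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
out = sweep_gamma(dummy_run, normalized_mutual_info_score, views, y_true, gammas, theta=20, n_runs=3)
print({g: v["final_dims"] for g, v in out.items()})
```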

5. Conclusions

In summary, the proposed G-CoMVKM algorithm captures local, collaborative, and global steps while taking advantage of feature-view reduction within a single framework. It produces a new, lower dimensionality within each view to discover the desired patterns of MV data, identifying redundant components and dropping them based on a threshold criterion. G-CoMVKM thus succeeds in transforming MV data into a new representation with only relevant feature components. Based on our experimental design and results, the proposed G-CoMVKM can process real MV data applications and perform feature-view reduction efficiently without compromising clustering performance. Overall, G-CoMVKM is superior in terms of parameter tuning and in its efficiency in handling high dimensionality without resorting to a matrix-reconstruction approach. It is a simple MVKM algorithm that simultaneously enables transferred feature-view learning and feature reduction to form a global solution while accounting for local and collaborative steps across data views. However, the proposed G-CoMVKM still needs improvement with respect to privacy concerns by adopting collaborative MV scenarios under multiple clients. Our future work will therefore consider federated learning to discover hidden information from MV data held by multiple clients.

Author Contributions

Conceptualization, K.P.S. and M.-S.Y.; methodology, K.P.S. and M.-S.Y.; validation, K.P.S.; formal analysis, K.P.S.; investigation, K.P.S. and M.-S.Y.; writing—original draft preparation, K.P.S.; writing—review and editing, M.-S.Y.; supervision, M.-S.Y.; funding acquisition, M.-S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data generated or analyzed during this study are included in this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. McLachlan, G.J.; Basford, K.E. Mixture Models: Inference and Applications to Clustering; Marcel Dekker: New York, NY, USA, 1988. [Google Scholar]
  2. Kaufman, L.; Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis; Wiley: New York, NY, USA, 1990. [Google Scholar]
  3. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B 1977, 39, 1–22. [Google Scholar] [CrossRef]
  4. Yu, J.; Chaomurilige, C.; Yang, M.S. On convergence and parameter selection of the EM and DA-EM algorithms for Gaussian mixtures. Pattern Recognit. 2018, 77, 188–203. [Google Scholar] [CrossRef]
  5. Jain, A.K. Data clustering: 50 years beyond k-means. Pattern Recognit. Lett. 2010, 31, 651–666. [Google Scholar] [CrossRef]
  6. Sinaga, K.P.; Hussian, I.; Yang, M.S. Entropy k-means clustering with feature reduction under unknown number of clusters. IEEE Access 2021, 9, 67736–67751. [Google Scholar] [CrossRef]
  7. Chang-Chien, S.J.; Hung, W.L.; Yang, M.S. On mean shift-based clustering for circular data. Soft Comput. 2012, 16, 1043–1060. [Google Scholar] [CrossRef]
  8. Yuan, Y.; Zhou, Y.; Chen, X.; Xiong, Q.; Okere, H.C. Enhancing recommendation diversity and novelty with Bi-LSTM and mean shift clustering. Electronics 2024, 13, 3841. [Google Scholar] [CrossRef]
  9. Ding, L.; Li, C.; Jin, D.; Ding, S. Survey of spectral clustering based on graph theory. Pattern Recognit. 2024, 151, 110366. [Google Scholar] [CrossRef]
  10. Xie, D.; Gao, Q.; Zhao, Y.; Yang, F.; Song, W. Consistent graph learning for multi-view spectral clustering. Pattern Recognit. 2024, 154, 110598. [Google Scholar] [CrossRef]
  11. Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981. [Google Scholar]
  12. Lata, A.A.; Kang, M.; Shin, S. FCM-OR: A local density-aware opportunistic routing protocol for energy-efficient wireless sensor networks. Electronics 2025, 14, 1841. [Google Scholar] [CrossRef]
  13. Zheng, Q.; Zhu, J.; Li, Z.; Tang, H. Graph-guided unsupervised multiview representation learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 146–159. [Google Scholar] [CrossRef]
  14. Hajiveiseh, A.; Seyedi, S.A.; Tab, F.A. Deep asymmetric nonnegative matrix factorization for graph clustering. Pattern Recognit. 2024, 148, 110179. [Google Scholar] [CrossRef]
  15. Krishnapuram, R.; Keller, J.M. A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1993, 1, 98–110. [Google Scholar] [CrossRef]
  16. Yu, H.; Xie, S.; Fan, J.; Lan, R.; Lei, B. Mahalanobis-kernel distance-based suppressed possibilistic c-means clustering algorithm for imbalanced image segmentation. IEEE Trans. Fuzzy Syst. 2024, 32, 4595–4609. [Google Scholar] [CrossRef]
  17. Cleuziou, G.; Exbrayat, M.; Martin, L.; Sublemontier, J.H. CoFKM: A centralized method for multiple-view clustering. In Proceedings of the 2009 Ninth IEEE International Conference on Data Mining, Miami Beach, FL, USA, 6–9 December 2009; pp. 752–757. [Google Scholar]
  18. Du, G.; Zhou, L.; Li, Z.; Wang, L.; Lü, K. Neighbor-aware deep multi-view clustering via graph convolutional network. Inf. Fusion 2023, 93, 330–343. [Google Scholar] [CrossRef]
  19. Yang, M.S.; Hussain, I. Unsupervised multi-view k-means clustering algorithm. IEEE Access 2023, 11, 13574–13593. [Google Scholar] [CrossRef]
  20. Busch, E.L.; Huang, J.; Benz, A.; Wallenstein, T.; Lajoie, G.; Wolf, G.; Krishnaswamy, S.; Turk-Browne, N.B. Multi-view manifold learning of human brain-state trajectories. Nat. Comput. Sci. 2023, 3, 240–253. [Google Scholar] [CrossRef]
  21. Xu, Y.M.; Wang, C.D.; Lai, J.H. Weighted multi-view clustering with feature selection. Pattern Recognit. 2016, 53, 25–35. [Google Scholar] [CrossRef]
  22. Jiang, B.; Qiu, F.; Wang, L. Multi-view clustering via simultaneous weighting on views and features. Appl. Soft Comput. 2016, 47, 304–315. [Google Scholar] [CrossRef]
  23. Long, Z.; Zhu, C.; Comon, P.; Liu, Y. Feature space recovery for incomplete multi-view clustering. In Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023. [Google Scholar]
  24. Shannon, C.E. A mathematical theory of communication. ACM SIGMOBILE Mob. Comput. Commun. Rev. 2001, 5, 3–55. [Google Scholar] [CrossRef]
  25. Yang, M.S.; Sinaga, K.P. Collaborative feature-weighted multi-view fuzzy c-means clustering. Pattern Recognit. 2021, 119, 108064. [Google Scholar] [CrossRef]
  26. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
  27. Chen, X.; Xu, X.; Huang, J.Z.; Ye, Y. TW-k-means: Automated two-level variable weighting clustering algorithm for multiview data. IEEE Trans. Knowl. Data Eng. 2013, 25, 932–944. [Google Scholar] [CrossRef]
  28. Huang, J.Z.; Ng, M.K.; Rong, H.; Li, Z. Automated variable weighting in k-means type clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 657–668. [Google Scholar] [CrossRef]
  29. Yang, M.S.; Sinaga, K.P. A feature-reduction multi-view k-means clustering algorithm. IEEE Access 2019, 7, 114472–114486. [Google Scholar] [CrossRef]
  30. Zhang, G.Y.; Wang, C.D.; Huang, D.; Zheng, W.S.; Zhou, Y.R. TW-Co-k-means: Two-level weighted collaborative k-means for multi-view clustering. Knowl.-Based Syst. 2018, 150, 127–138. [Google Scholar] [CrossRef]
  31. Yang, J.; Marcus, D.S.; Sotiras, A. DMC-Net: Lightweight Dynamic Multi-Scale and Multi-Resolution Convolution Network for Pancreas Segmentation in CT Images. Biomed. Signal Process. Control. 2025, 109, 107896. [Google Scholar] [CrossRef]
  32. Yang, J.; Marcus, D.S.; Sotiras, A. Dynamic U-Net: Adaptively calibrate features for abdominal multiorgan segmentation. In Medical Imaging 2025: Computer-Aided Diagnosis; SPIE: San Diego, CA, USA, 2025; Volume 13407, p. 134071D. [Google Scholar] [CrossRef]
  33. Hassani, K.; Khasahmadi, A.H. Contrastive multi-view representation learning on graphs. In Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, 13–18 June 2020; Volume 119, pp. 4116–4126. [Google Scholar]
  34. Fu, L.; Huang, S.; Zhang, L.; Yang, J.; Zheng, Z.; Zhang, C.; Chen, C. Subspace-contrastive multi-view clustering. ACM Trans. Knowl. Discov. Data 2024, 18, 1–35. [Google Scholar]
  35. Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850. [Google Scholar] [CrossRef]
  36. Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 1991. [Google Scholar]
  37. Kvalseth, T.O. On normalized mutual information: Measure derivations and properties. Entropy 2017, 19, 631. [Google Scholar] [CrossRef]
  38. Pereira, J.C.; Coviello, E.; Doyle, G.; Rasiwasia, N.; Lanckriet, G.R.; Levy, R.; Vasconcelos, N. On the role of correlation and abstraction in cross-modal multimedia retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 521–535. [Google Scholar] [CrossRef]
  39. Brbić, M.; Piškorec, M.; Vidulin, V.; Kriško, A.; Šmuc, T.; Supek, F. The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Res. 2016, 44, 10074–10090. [Google Scholar] [CrossRef]
  40. LeCun, Y.; Cortes, C.; Burges, C.J. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist (accessed on 3 February 2025).
  41. Columbia Object Image Library (COIL-20). Available online: https://git-disl.github.io/GTDLBench/datasets/coil20/ (accessed on 12 February 2025).
  42. Olivetti Face Data Set, The Database of Faces, AT&T Laboratories Cambridge. 2002. Available online: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html (accessed on 3 February 2025).
  43. Wang, L.; Ding, Z.; Tao, Z.; Liu, Y.; Fu, Y. Generative multi-view human action recognition. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6212–6221. [Google Scholar]
  44. Lin, Y.-C.; Hu, M.-C.; Cheng, W.-H.; Hsieh, Y.-H.; Chen, H.-M. Human action recognition and retrieval using sole depth information. In Proceedings of the 20th ACM International Conference on Multimedia, Nara, Japan, 29 October–2 November 2012; pp. 1053–1056. [Google Scholar]
Figure 1. Overall architecture of the proposed G-CoMVKM.
Figure 2. Flowchart of the proposed G-CoMVKM algorithm.
Figure 3. Numerical data generated from GMM with 2-views-2-dimensions-2-clusters (2V2D2C): (a) V1D2C; (b) V3D2C; (c) numerical data generated from GMM + Uniform with view-3-dimensions-2-clusters (V3D2C).
Figure 4. The Clustering performances of TW-Co-k-means on Wikipedia Articles data: (a) under different η with fixed α = 30 and β = 25 ; and (b) under different β with fixed α = 30 and η = 25 .
Figure 5. View weights distribution produced by the comparison and proposed G-CoMVKM algorithms on (a) MNIST data; and (b) COIL20 data.
Figure 6. The Clustering performances of three related algorithms on COIL20 data based on different tuning parameters: (a) WMCFS under different β and fixed α = 7 ; (b) WMCFS under fixed β = 0.005 and different α ; (c) SWVF under fixed α and different β ; (d) SWVF under fixed β and different α ; (e) FRMVK under different α .
Figure 7. The behavior of U on ORL Face data performed by G-CoMVKM: (a) reported from one initialization; and (b) average of 30 different initializations.
Figure 8. The distribution of feature-view components on ORL data processed by G-CoMVKM: (a) feature weight components; and (b) view weight.
Figure 9. The distribution of feature-view components on DHA data processed by G-CoMVKM: (a) feature weight components; and (b) view weight.
Figure 10. The Clustering performances of G-CoMVKM on DHA data based on different γ and fixed θ: (a) Minimum values; (b) Mean values; and (c) Maximum values.
Figure 11. The Clustering performances of G-CoMVKM on UWA data: (a) Minimum values; and (b) Mean values; and (c) Maximum values.
Table 1. Comprehensive Analysis of Multi-View K-Means Clustering Methods.

| Method | FS | VC | PS | S | Performance |
|---|---|---|---|---|---|
| Classical Multi-View K-Means Methods |
| TW-K-means | V | X | High | High | Baseline with high scalability |
| WMCFS | V | X | Med. | Med. | Balanced trade-off |
| SWVF | V | X | High | Med. | Strong feature selection |
| FRMVK | V | X | Med. | High | Robust to noise |
| TW-Co-K-means | V | V | High | Low | Enhanced collaboration |
| Modern Deep Learning-Based Methods |
| Deep-learning/autoencoder MV clustering | V | V | Med. | Med. | Superior learning |
| Contrastive MV clustering | V | V | Med. | Med. | State-of-the-art |

FS: Feature Selection, VC: View Collaboration, PS: Parameter Sensitivity, S: Scalability; V/X: Present/Absent, Med.: Medium.
Table 2. Computational Complexity Comparison.

| Algorithm | Time Complexity | Space Complexity |
|---|---|---|
| TW-K-means | O(n × c × s × d̄ × t) | O(n × c × s × d̄) |
| WMCFS | O(n × c × s × d̄ × t) | O(n × c × s × d̄) |
| SWVF | O(n × c × s × d̄ × t) | O(n × c × s × d̄) |
| TW-Co-K-means | O(n × c × s² × d̄ × t) | O(n × c × s² × d̄) |
| FRMVK | O(n × c × s × d̄ × t) | O(n × c × s × d̄) |
| G-CoMVKM | O(n × c × s × d̄ × t) | O(n × c × s × d̄) |
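To make the entries of Table 2 concrete, the sketch below shows a generic weighted multi-view assignment step whose nested work over n points, c centers, s views, and d̄ features per view gives the listed O(n × c × s × d̄ × t) per-iteration cost. It is an illustrative stand-in under that assumption, not the exact G-CoMVKM objective or update.

```python
import numpy as np

def assign_step(views, centers, view_weights, feature_weights):
    """Generic weighted multi-view assignment step.

    views[h]: (n, d_h) data of view h; centers[h]: (c, d_h) centers of view h;
    view_weights[h]: scalar weight; feature_weights[h]: (d_h,) feature weights.
    The nested work over n, c, s, and d_h is what yields the per-iteration cost
    in Table 2; this is an illustrative sketch, not the G-CoMVKM objective.
    """
    n, c = views[0].shape[0], centers[0].shape[0]
    dist = np.zeros((n, c))
    for h, X in enumerate(views):                        # s views
        diff = X[:, None, :] - centers[h][None, :, :]    # (n, c, d_h) differences
        dist += view_weights[h] * np.einsum(
            "ncd,d->nc", diff ** 2, feature_weights[h])  # weighted squared distance
    return dist.argmin(axis=1)                           # cluster label per point

# Toy usage: 2 views, 3 clusters, uniform feature weights.
rng = np.random.default_rng(0)
views = [rng.normal(size=(100, 5)), rng.normal(size=(100, 8))]
centers = [rng.normal(size=(3, 5)), rng.normal(size=(3, 8))]
labels = assign_step(views, centers, [0.6, 0.4],
                     [np.full(5, 1 / 5), np.full(8, 1 / 8)])
print(np.bincount(labels, minlength=3))
```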
Table 3. Experiments with the numerical data set before and after withdrawing irrelevant features.

| | V1D2C | | V2D3C | | | AR |
| | x_1^(1) | x_2^(1) | x_1^(2) | x_2^(2) | x_3^(2) | |
|---|---|---|---|---|---|---|
| Initials | 0.5 | 0.5 | 0.333 | 0.333 | 0.333 | |
| Iteration 0 | 0.483 | 0.517 | 0.382 | 0.480 | 0.138 | 0.362 |
| Iteration 1 | 0.483 | 0.517 | 0.443 | 0.557 | Dropped out | 0.362 |
| Iteration 2 | 0.483 | 0.517 | 0.443 | 0.557 | | 0.344 |
| Iteration 3 | 0.483 | 0.517 | 0.443 | 0.557 | | 1.00 |
| Iteration 4 | 0.483 | 0.517 | 0.443 | 0.557 | | 1.00 |
Table 4. Experiments with the numerical data set before and after withdrawing irrelevant features.

| | View Weights | | μ_k^1 | | μ_k^2 | | Ū | |
| | V1D2C | V2D3C | C1 | C2 | C1 | C2 | C1 | C2 |
|---|---|---|---|---|---|---|---|---|
| Initials | 0.5 | 0.5 | - | - | - | - | - | - |
| Iteration 0 | 0.487 | 0.513 | 371 | 629 | 653 | 347 | 515.8 | 484.2 |
| Iteration 1 | 0.474 | 0.527 | 371 | 629 | 653 | 347 | 519.6 | 480.5 |
| Iteration 2 | 0.497 | 0.503 | 572 | 428 | 615 | 385 | 593.6 | 406.4 |
| Iteration 3 | 0.493 | 0.507 | 533 | 467 | 533 | 467 | 533 | 467 |
| Iteration 4 | 0.494 | 0.507 | 533 | 467 | 533 | 467 | 533 | 467 |
Table 6. Parameter Setting of Six Related Algorithms.

| Data Sets | TW-k-Means | WMCFS | SWVF | TW-Co-k-Means | FRMVK | G-CoMVKM |
|---|---|---|---|---|---|---|
| Prokaryotic Phyla | η = 10, β = 5 | α = 7, β = 5 × 10^-3 | α = 5, β = 4 | α = 60, β = 50, η = 0.45 | α = 8 | γ = 3, θ = 4 |
| Coil20 | | α = 7, β = 5 × 10^-3 | α = 5, β = 4 × 10^-4 | α = 6, β = 5, η = 0.45 | | γ = 0.5, θ = 8 |
| MNIST4 | | α = 7, β = 5 × 10^-6 | | | | |
| ORL Face | | α = 57 × 10^-7, β = 8 | α = 8, β = 25 × 10^-4 | α = 35, β = 7, η = 0.25 | | γ = 0.35, θ = 8 |
| Wikipedia Articles | η = 10, β = 5 | α = 10, β = 25 × 10^-2 | α = 5, β = 4 | α = 60, β = 50, η = 0.45 | α = 8 | γ = 0.35, θ = 6 |
| DHA | | α = 57 × 10^-7, β = 8 | α = 8, β = 25 × 10^-4 | | α = 4 | γ = 0.7, θ = 8 |
| UWA | | α = 3 × 10^-6, β = 8 | α = 8, β = 45 × 10^-4 | α = 60, β = 50, η = 0.45 | α = 5 | γ = 0.95, θ = 20 |
Table 7. The Clustering Performances of G-CoMVKM On Wikipedia Articles Data under Different Thresholds.

| Thresholds | v_h | | Final d_h | | Ū | Clustering Performances | | |
| | Text View | Image View | d_1 | d_2 | | AR | RI | NMI |
|---|---|---|---|---|---|---|---|---|
| t/nD | 0.515 | 0.482 | 8 | 8 | [107; 31; 43; 57; 57; 86; 139; 44; 70; 59] | 0.470 | 0.869 | 0.482 |
| s/nD | 0.514 | 0.486 | 30 | 10 | [68; 101; 84; 57; 73; 50; 67; 65; 80; 48] | 0.552 | 0.885 | 0.536 |
| 1/n | 0.512 | 0.488 | 121 | 10 | [104; 57; 86; 24; 33; 76; 74; 71; 113; 55] | 0.547 | 0.885 | 0.541 |
| 1/D | 0.515 | 0.485 | 13 | 10 | [57; 69; 87; 58; 78; 62; 67; 109; 44; 62] | 0.561 * | 0.886 * | 0.542 * |

* Bold face indicates the best one, _ indicates the second best one.
Table 8. The Clustering Performances of G-CoMVKM On Prokaryotic Data Set under Different Thresholds.

| Thresholds | v_h | | | Final d_h | | | Ū | Clustering Performances | | |
| | Gene Repertoire | Proteome Composition | Textual | d_1 | d_2 | d_3 | | AR | RI | NMI |
|---|---|---|---|---|---|---|---|---|---|---|
| t/nD | 0.317 | 0.329 | 0.354 | 4 | 3 | 7 | [140.1; 103.8; 53.1; 59.3] | 0.623 * | 0.676 * | 0.406 * |
| s/nD | 0.285 | 0.357 | 0.358 | 4 | 3 | 3 | [50.9; 97.6; 91.4; 114.1] | 0.582 | 0.664 | 0.384 |
| 1/n | 0.297 | 0.340 | 0.363 | 375 | 3 | 371 | [107.5; 113.5; 74.4; 55.1] | 0.561 | 0.642 | 0.318 |
| 1/D | 0.299 | 0.335 | 0.367 | 393 | 3 | 438 | [115.9; 75.1; 54.5; 103.6] | 0.570 | 0.644 | 0.317 |

* Bold face indicates the best one, _ indicates the second best one.
Table 9. Experimental Results for Wikipedia Articles and Prokaryotic Data Sets.

| Evaluation Criteria | Wikipedia Articles | | | Prokaryotic | | |
| | AR | RI | NMI | AR | RI | NMI |
|---|---|---|---|---|---|---|
| TW-k-means | 0.527 | 0.879 | 0.539 | 0.294 | 0.522 | 0.006 |
| WMCFS | 0.425 | 0.629 | 0.361 | 0.553 | 0.645 | 0.325 |
| SWVF | 0.523 | 0.877 | 0.544 * | 0.442 | 0.582 | 0.157 |
| TW-Co-k-means | 0.554 | 0.885 | 0.540 | 0.521 | 0.594 | 0.235 |
| FRMVK | 0.555 | 0.881 | 0.530 | 0.561 | 0.644 | 0.323 |
| G-CoMVKM | 0.561 * | 0.886 * | 0.542 | 0.625 * | 0.676 * | 0.406 * |

* Bold face indicates the best one, _ indicates the second best one.
Table 10. Summary of Relevant Features By FRMVK and G-CoMVKM on Wikipedia Articles and Prokaryotic Data Sets.

| Evaluation Criteria | Wikipedia Articles | | Prokaryotic | | |
| | Text | Image | Gene Repertoire | Proteome Composition | Textual |
|---|---|---|---|---|---|
| FRMVK | 57 | 10 | 116 | 3 | 124 |
| G-CoMVKM | 13 | 10 | 4 | 3 | 7 |
Table 11. The Clustering Performances (Minimum, Mean, and Maximum AR, RI, NMI) and Total Running Time (TRT) of Each Employed Competitive Algorithms and Proposed G-CoMVKM on COIL20 Data Set.

| | AR | RI | NMI | TRT |
|---|---|---|---|---|
| TW-k-means | 0.565/0.631/0.701 | 0.511/0.589/0.655 | 0.767/0.854/0.902 | 6032.7 |
| WMCFS | 0.236/0.382/0.562 | 0.198/0.357/0.546 | 0.222/0.401/0.606 | 6176.3 |
| SWVF | 0.476/0.561/0.679 | 0.357/0.487/0.633 | 0.636/0.747/0.866 | 6177.7 |
| TW-Co-k-means | 0.535/0.613/0.680 | 0.484/0.566/0.654 | 0.701/0.793/0.872 | 2818.6 |
| FRMVK | 0.716 */0.752 */0.781 | 0.707/0.743/0.776 | 0.880 */0.902 */0.919 * | 394.0 |
| G-CoMVKM | 0.579/0.719/0.798 * | 0.947 */0.967 */0.976 * | 0.815/0.867/0.889 | 34.50 * |

* Bold face indicates the best one, _ indicates the second best one.
Table 12. The Clustering Performances (Minimum, Mean, and Maximum AR, RI, NMI) and Total Running Time (TRT) of Each Employed Competitive Algorithms and Proposed G-CoMVKM on MNIST4 Data Set.

| | AR | RI | NMI | TRT |
|---|---|---|---|---|
| TW-k-means | 0.729/0.786/0.845 | 0.567/0.627/0.681 | 0.623/0.659/0.696 | 3001.1 |
| WMCFS | 0.801/0.816/0.820 | 0.576/0.587/0.592 | 0.566/0.575/0.578 | 1790.6 |
| SWVF | 0.722/0.794/0.851 | 0.556/0.628/0.684 | 0.616/0.661/0.696 | 3038.3 |
| TW-Co-k-means | 0.729/0.777/0.831 | 0.570/0.619/0.675 | 0.629/0.659/0.692 | 2608.1 |
| FRMVK | 0.804 */0.843 */0.876 | 0.648/0.713/0.683 | 0.668/0.691/0.712 | 571.9 |
| G-CoMVKM | 0.617/0.833/0.885 * | 0.808 */0.882 */0.899 * | 0.885 */0.899 */0.717 * | 26.2 * |

* Bold face indicates the best one, _ indicates the second best one.
Table 13. The Performance of G-CoMVKM with View Dropping Out Scheme On ORL Face Data Set.

| | View 1 | View 2 | View 3 | View 4 | AR | RI | NMI |
|---|---|---|---|---|---|---|---|
| ORL Face | 693 | 1487 | 381 | 1021 | 0.4600 */0.5005 */0.5650 * | 0.9630 */0.9682 */0.9746 * | 0.7276 */0.7596 */0.7911 * |
| | 693 | 1487 | 381 | Dropped out | 0.3775/0.4164/0.4500 | 0.9591/0.9629/0.9675 | 0.7072/0.7209/0.7436 |
| | 693 | 1487 | Dropped out | 996 | 0.3825/0.4309/0.5350 | 0.9626/0.9650/0.9707 | 0.7153/0.7290/0.7597 |
| | 693 | 1487 | Dropped out | Dropped out | 0.4025/0.4270/0.4625 | 0.9599/0.9632/0.9692 | 0.7023/0.7175/0.7483 |

* Bold face indicates the best one, _ indicates the second best one.
Table 14. Final Dimensions on UWA Data After Performing G-CoMVKM with Fixed θ and Different γ.

| G-CoMVKM | γ = 0.25 | γ = 0.35 | γ = 0.45 | γ = 0.55 | γ = 0.65 | γ = 0.75 | γ = 0.85 | γ = 0.95 |
|---|---|---|---|---|---|---|---|---|
| RGB | 596 | 217 | 156 | 137 | 130 | 121 | 119 | 112 |
| Depth | 110 | 110 | 110 | 110 | 110 | 110 | 109 | 106 |
Table 15. The Performance of G-CoMVKM and Five Related Algorithms On DHA and UWA Data Sets.

| Method | HAR Data | RGB View | Depth View | AR | RI | NMI | TRT |
|---|---|---|---|---|---|---|---|
| G-CoMVKM | DHA | 147 | 110 | 0.5967 */0.6195 */0.6667 * | 0.9468 */0.9556 */0.9621 * | 0.7362 */0.7751 */0.8075 * | 595.728 |
| | UWA | 177 | | 0.5415 */0.5818 */0.6245 * | 0.9525 */0.9618 */0.9688 * | 0.7610 */0.7815 */0.8086 * | 637.582 |
| FRMVK | DHA | 1731 | | 0.0658/0.1870/0.5761 | 0.0420/0.3127/0.9535 | 0.0000/0.1991/0.7121 | 8.781 * |
| | UWA | 120 | | 0.1304/0.2694/0.4269 | 0.8431/0.8977/0.9430 | 0.3681/0.5303/0.6714 | 2.390 * |
| TW-k-means | DHA | 6144 | 110 | 0.3745/0.4471/0.4897 | 0.9266/0.9365/0.9442 | 0.6465/0.6673/0.7130 | 226.441 |
| | UWA | | | 0.4269/0.4900/0.5929 | 0.9434/0.9527/0.9610 | 0.6947/0.7306/0.7755 | 510.223 |
| WMCFS | DHA | | | 0.0658/0.1052/0.4691 | 0.0420/0.1314/0.9384 | 0.0000/0.0699/0.7197 | 223.247 |
| | UWA | | | 0.0553/0.0553/0.0553 | 0.0325/0.0325/0.0325 | 0.0000/0.0000/0.0000 | 454.559 |
| SWVF | DHA | | | 0.3745/0.4750/0.5432 | 0.9302/0.9393/0.9469 | 0.6416/0.6919/0.7324 | 233.293 |
| | UWA | | | 0.4704/0.5410/0.6087 | 0.9476/0.9596/0.9654 | 0.7340/0.7648/0.7914 | 466.964 |
| TW-Co-k-means | DHA | | | 0.3333/0.4016/0.4897 | 0.9171/0.9307/0.9405 | 0.6101/0.6384/0.6699 | 51.603 |
| | UWA | | | 0.4783/0.5387/0.6126 | 0.9512/0.9589/0.9668 | 0.7365/0.7638/0.7926 | 93.426 |

* Bold face indicates the best one, _ indicates the second best one.