Article

Consensus Guided Multi-View Unsupervised Feature Selection with Hybrid Regularization

Yifan Shi, Haixin Zeng, Xinrong Gong, Lei Cai, Wenjie Xiang, Qi Lin, Huijie Zheng and Jianqing Zhu

1 School of Engineering, Huaqiao University, Quanzhou 362021, China
2 School of Information Science and Engineering, Huaqiao University, Xiamen 361021, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(12), 6884; https://doi.org/10.3390/app15126884
Submission received: 11 May 2025 / Revised: 12 June 2025 / Accepted: 16 June 2025 / Published: 18 June 2025

Abstract

Multi-source heterogeneous data has been widely adopted in developing artificial intelligence systems in recent years. In real-world scenarios, raw multi-source data are generally unlabeled and inherently contain multi-view noise and feature redundancy, leading to extensive research on unsupervised multi-view feature selection. However, existing approaches mainly utilize local adjacency relationships and the $L_{2,1}$-norm to guide the feature selection process, which may lead to instability in performance. To address these problems, this paper proposes Consensus Guided Multi-view Unsupervised Feature Selection with Hybrid Regularization (CGMvFS). Specifically, CGMvFS integrates multiple view-specific basic partitions into a unified consensus matrix, which is constructed to guide the feature selection process by preserving comprehensive pairwise constraints across diverse views. A hybrid regularization strategy incorporating the $L_{2,1}$-norm and the Frobenius norm is introduced into the feature selection objective function, which not only promotes feature sparsity but also effectively prevents overfitting, thereby improving the stability of the model. Extensive empirical evaluations demonstrate that CGMvFS outperforms state-of-the-art comparison approaches across diverse multi-view datasets.

1. Introduction

With the rapid advancement of the Internet of Things, multi-source heterogeneous data has been widely adopted in developing artificial intelligence systems across diverse domains such as multimedia [1], bioinformatics [2], industry [3], etc. However, high-dimensional multi-source heterogeneous data usually contains multi-view noise and redundant features that negatively impact the robustness and efficiency of learning models. Consequently, multi-view feature selection research has attracted growing interest, focusing on collaboratively learning multi-view features to identify and filter discriminative features.
Multi-view feature selection approaches can be divided into three categories based on the availability of labels: supervised methods [4,5,6,7], semi-supervised methods [8,9,10], and unsupervised methods [11,12,13]. Supervised methods leverage ground-truth label annotations to achieve optimal feature discriminability, typically demonstrating superior performance. Semi-supervised variants strategically incorporate both labeled and unlabeled samples to maintain competitive accuracy with partial supervision. The unsupervised paradigm, operating without labeled data, employs intrinsic data structures to identify important features, rendering it particularly prevalent in real-world applications where annotated multi-source heterogeneous data are often unavailable. Accordingly, this paper focuses on unsupervised multi-view feature selection.
In unsupervised learning, feature selection algorithms’ performance is highly contingent on the design paradigms of guidance information. Current methods showcase diverse innovations in guidance mechanisms, which can be broadly grouped into two principal paradigms according to their underlying strategies: (1) Graph-structured regularization approaches: representative works include the multi-view feature selection framework proposed by Hou et al. [14], which integrated adaptive graph similarity learning with view weight co-optimization. Additionally, Cao et al. [15] achieved cross-view complementary information mining by constructing multi-graph orthogonal constraints alongside clustering consistency objective functions. (2) Non-negative orthogonal decomposition frameworks: notable implementations encompass Huang et al.’s [16] extended weighted non-negative matrix factorization (WNMF) architecture, which utilized a complementary-consensus dual-module collaborative mechanism for feature selection. Another line of work [17,18,19] introduced shared subspace embedding techniques into multi-view learning, thereby establishing low-dimensional representations for high-dimensional data. Despite this methodological progress, persistent limitations warrant attention. Current methods mainly focus on the local adjacency relationships of samples, while the global cluster structures between samples can be further used as pseudo-constraints to guide the feature selection process. In addition, applying only $L_{2,1}$-norm regularization can ensure sparsity in feature selection, but it may lead to overfitting problems.
To address these problems, this paper presents a Consensus Guided Multi-view Unsupervised Feature Selection with Hybrid Regularization (CGMvFS). Specifically, CGMvFS employs consensus clustering across individual views to capture view-specific intrinsic data structures. The resulting view-specific co-association matrices are aggregated to construct a unified co-association matrix that establishes comprehensive pairwise constraints across all views. A novel hybrid-regularized objective function is then formulated to simultaneously learn view-specific feature projection matrices while preserving the cross-view pairwise constraints. Through convex relaxation of the original problem’s upper bound, the optimized projection matrices are derived, enabling discriminative feature selection through embedded sparsity constraints. The primary contributions of this work are summarized as follows:
  • Multiple view-specific basic partitions are integrated into a unified consensus matrix, which guides the feature selection process by preserving comprehensive pairwise constraints across diverse views.
  • A hybrid regularization strategy incorporating the $L_{2,1}$-norm and the Frobenius norm is introduced into the feature selection objective function, which not only promotes feature sparsity but also effectively prevents overfitting, thereby improving the stability of the model.
  • The proposed CGMvFS framework is extensively evaluated on multiple multi-view datasets, demonstrating superior performance in unsupervised feature selection and robustness compared to existing methods.
This paper is structured systematically as follows. Section 2 conducts a critical survey of current methodologies and theoretical foundations in multi-view unsupervised feature selection. Section 3 provides detailed exposition of our innovatively designed cross-view feature selection mechanism operating in unsupervised contexts. Section 4 delivers empirical validation through standardized datasets, rigorously validating the efficacy of our approach with comparative performance metrics. Finally, Section 5 concludes the paper and discusses future research directions.

2. Related Works

Recent years have witnessed substantial scholarly interest in multi-view learning, particularly in its application to feature selection. This paradigm leverages complementary data representations from multiple perspectives to capture intrinsic structural patterns more comprehensively than conventional single-view approaches, offering distinct analytical advantages.
Current multi-view unsupervised feature selection methods [7,20] split into two theoretically grounded frameworks, each employing distinct guidance mechanisms. The first paradigm employs graph-regularized architectures [21,22] that maintain inter-view correlations through adaptive graph embeddings, effectively preserving manifold structures via topological representations. The second framework utilizes non-negative orthogonal decomposition, implementing matrix factorization with orthogonality constraints to balance discriminative feature selection with interpretable low-dimensional projections. These complementary approaches provide robust theoretical underpinnings and methodological flexibility, collectively advancing the field’s technical evolution.
The discipline has progressed through two synergistic yet distinct research trajectories. Graph-regularized approaches originated with Hou et al. [14], who pioneered adaptive view-weight coordination through joint similarity learning. Subsequent refinements include Zhang et al. [23] introducing sparsity-constrained feature weighting via dynamic sample space reconstruction, and Zhang et al. [24] developing subspace-embedded collaborative graphs that preserve view-specific local structures while ensuring global consistency. Tang et al. [25] achieved dual-space projection through manifold-aligned graph embeddings, culminating in Cao et al.’s [15] integration of spectral graph theory with clustering-driven selection via adaptive weighting matrices. Concurrently, non-negative decomposition frameworks evolved through Tang et al. [12] embedding sparse feature selection in NMF-based clustering, followed by Si et al. [26] incorporating cluster-aware optimization via diversity-constrained matrix factorization. Huang et al. [16] advanced weighted NMF formulations with auto-calibrated view importance, ultimately refined by Fang et al. [27] through orthogonalized projection matrices. The two complementary methodological foundations of graph-theoretic regularization and algebraic decomposition have facilitated substantial technical progress through adaptive operationalization strategies.
Despite these advancements, current methodologies exhibit critical limitations in dynamically distinguishing reliable information from low-quality view-specific data. This deficiency leads to feature redundancy and structural overlap in selected features, substantially constraining their effectiveness when handling complex high-dimensional multi-view datasets with inherent redundancy and noise interference.

3. Proposed Method

3.1. Framework and Definition

In this article, we propose a Consensus Guided Multi-view Unsupervised Feature Selection with Hybrid Regularization (CGMvFS), whose conceptual architecture is shown in Figure 1. CGMvFS mainly consists of two steps. First, consensus clustering is applied independently to each view to generate view-specific co-association matrices, which are then aggregated into a unified co-association matrix that encodes comprehensive pairwise relationships across all views. Then, a hybrid-regularized objective function is formulated to learn view-specific feature projection matrices under the cross-view constraints; by convexly relaxing its upper bound, the optimal projections are obtained and embedded sparsity is enforced to select the most discriminative features.
To facilitate mathematical exposition, we adopt uppercase symbols (e.g., $M$) for matrices and bold lowercase symbols (e.g., $\mathbf{v}$) for vectors. Given a multi-view dataset $X = [X^1, \ldots, X^V]$ containing $V$ views, each view $X^v \in \mathbb{R}^{n \times d_v}$ corresponds to a data matrix containing $n$ instances characterized by $d_v$-dimensional features. The core objective of unsupervised multi-view feature selection is to identify the $m$ most discriminative features, so as to eliminate noise and redundancy and improve the performance of learning models.

3.2. Formulation

Within the multi-view feature selection methodological framework [14], cluster labels are conventionally required to supervise the learning of the projection matrices through the following formulation:
$$\min_{W^v} \sum_{v=1}^{V} \left\| X^v W^v - Y \right\|_F^2 + \alpha \left\| W^v \right\|_{2,1} \tag{1}$$
The formulation incorporates an equilibrium coefficient $\alpha$ to balance the objective components, where $Y \in \mathbb{R}^{n \times k}$ is expected to indicate the ground-truth cluster assignments of all samples. The feature projection matrix $W^v \in \mathbb{R}^{d_v \times k}$ establishes an optimized mapping between the feature space and the label space for the $v$-th view. Furthermore, structured sparsity patterns are induced through $L_{2,1}$-norm regularization on $W^v$, serving to mitigate feature redundancy and suppress noise artifacts through row-wise feature suppression. After solving for $W^v$, the importance of each feature in the $v$-th view can be derived from $\|w_i^v\|_2$, the $\ell_2$ norm of the $i$-th row of $W^v$. This metric enables systematic selection of the top-$r$ discriminative features through descending norm-based ranking, as features with higher norms typically capture more significant variation or discriminative power in the data representation.
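For illustration, the ranking step can be realized in a few lines. The sketch below is our own minimal example (not the authors' code), assuming a learned projection matrix `W` stored as a NumPy array of shape `(d_v, k)`:

```python
import numpy as np

def rank_features(W: np.ndarray, r: int) -> np.ndarray:
    """Score each feature by the l2 norm of its row of W and return the top-r indices."""
    scores = np.linalg.norm(W, axis=1)      # ||w_i||_2 for each of the d_v feature rows
    return np.argsort(scores)[::-1][:r]     # indices of the r largest scores, descending
```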
However, in unsupervised learning scenarios where ground-truth labels Y are inherently inaccessible, effective feature projection learning necessitates the exploitation of inherent structural relationships within the data distribution. To address this problem, we introduce ensemble clustering to capture complementary cluster structure information and design a multi-view objective function for more robust and comprehensive representation learning.
Specifically, in order to capture the diverse intrinsic data structures of each view, $k$-means is applied to generate $B$ basic partitions $\{P_i^v\}_{i=1}^{B}$, where $P_b^v \in \mathbb{R}^{n \times k_b^v}$ is a partition matrix indicating $k_b^v$ clusters in the $v$-th view space [28]. The $(i,j)$-th entry of $P_b^v$ is defined as follows:
$$p_{ij} = \begin{cases} 1, & \text{if } x_i \in C_j \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
By concatenating the $B$ basic partitions, the ensemble partition matrix [28,29] $P^v = [P_1^v, P_2^v, \ldots, P_B^v]$ is constructed to provide sufficient intra-view cluster structural information. Then, for the $v$-th view, the co-association matrix can be formulated analytically as follows:
$$M^v = \frac{1}{B} P^v (P^v)^\top \tag{3}$$
The $(i,j)$-th entry of the co-association matrix $M^v$ quantifies the probability that $x_i^v$ and $x_j^v$ are co-clustered according to the multiple basic partitions in the $v$-th view. When two samples are frequently co-assigned to identical clusters across partitions, the corresponding entry $M_{ij}^v$ converges toward 1, whereas infrequent co-occurrences result in values approaching 0. This formulation effectively encodes similarity relationships between samples from different view-specific perspectives. To achieve cross-view consensus, we aggregate the view-specific co-association matrices through element-wise averaging to construct the unified consensus matrix as follows:
$$M = \frac{1}{V} \sum_{v=1}^{V} M^v \tag{4}$$
Since the unified consensus matrix encodes the pairwise relationships between samples, we use $M$ as pseudo-constraints instead of pseudo-labels to guide the feature learning process as follows:
$$\min_{W^v} \sum_{v=1}^{V} \left\| (X^v W^v)(X^v W^v)^\top - M \right\|_F^2 + \alpha \left\| W^v \right\|_{2,1} + \sigma \left\| W^v \right\|_F^2 \tag{5}$$
where $X^v W^v \in \mathbb{R}^{n \times k}$ projects the data from the $v$-th view into a low-dimensional feature subspace, in which the similarity between samples can be assessed using the linear kernel $(X^v W^v)(X^v W^v)^\top$. In addition, the composite regularization architecture incorporates hybrid regularization terms (i.e., the $L_{2,1}$-norm and the Frobenius norm) to achieve dual structural control over the projection matrix. Intuitively, the $L_{2,1}$-norm performs hard selection of discriminative features while the Frobenius norm softly regularizes the retained features, thereby avoiding the over-sparsification problem while maintaining the benefits of dimensionality reduction. The trade-off between sparsity and generalization is explicitly controlled by $\alpha$ and $\sigma$.
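To make the construction of the pseudo-constraints concrete, the following sketch is our own illustrative reimplementation, not the authors' released code; the helper name `build_consensus` and the random choice of per-partition cluster numbers are our assumptions. It generates $B$ $k$-means basic partitions per view, forms each co-association matrix as in Eq. (3), and averages across views as in Eq. (4):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_consensus(views, B=10, k_range=(2, 10), seed=0):
    """Consensus matrix M from B k-means basic partitions per view (Eqs. (2)-(4))."""
    rng = np.random.default_rng(seed)
    n = views[0].shape[0]
    M = np.zeros((n, n))
    for X_v in views:                          # each X_v has shape (n, d_v)
        M_v = np.zeros((n, n))
        for _ in range(B):
            k_b = int(rng.integers(k_range[0], k_range[1] + 1))  # cluster count k_b^v
            labels = KMeans(n_clusters=k_b, n_init=10,
                            random_state=int(rng.integers(10**6))).fit_predict(X_v)
            P = np.eye(k_b)[labels]            # one-hot partition matrix (Eq. (2))
            M_v += P @ P.T                     # 1 where two samples share a cluster
        M += M_v / B                           # Eq. (3): average over basic partitions
    return M / len(views)                      # Eq. (4): average over views
```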

3.3. Optimization

The non-convex optimization problem in (5) is computationally intractable when approached through conventional analytical methods. To transform it into a convex problem for further optimization, we introduce auxiliary variables that simplify the subsequent derivation, making the objective function more intuitive and eliminating the need for complex matrix multiplications. Specifically, since the consensus matrix $M$ is positive semi-definite, it admits the spectral decomposition:
$$M = E \Lambda E^\top = (E \Lambda^{\frac{1}{2}})(E \Lambda^{\frac{1}{2}})^\top \tag{6}$$
where $\Lambda \in \mathbb{R}^{n \times n}$ is a diagonal matrix containing the $n$ eigenvalues of $M$, and $E$ is the corresponding eigenvector matrix. Comparing Equations (4) and (6), it can be seen that $E \Lambda^{\frac{1}{2}}$ reflects the embedding of the samples in the feature subspace.
By minimizing $\|(X^v W^v)(X^v W^v)^\top - M\|_F^2$ for the $v$-th view, the optimal solution can be obtained based on the Eckart–Young–Mirsky theorem [30,31] as follows:
$$Y^* = X^v W^v = E^* (\Lambda^*)^{\frac{1}{2}} \tag{7}$$
Here, $\Lambda^* \in \mathbb{R}^{k \times k}$ is a diagonal matrix containing the largest $k$ eigenvalues of $M$, and $E^* \in \mathbb{R}^{n \times k}$ is composed of the corresponding $k$ eigenvectors. According to [13,32,33], the upper bound of $\|(X^v W^v)(X^v W^v)^\top - M\|_F$ is determined by $\|X^v W^v - Y^*\|_F$. Thus, we can further derive the upper bound of $\sum_{v=1}^{V} \|(X^v W^v)(X^v W^v)^\top - M\|_F$ as follows:
$$\sum_{v=1}^{V} \left\| Y^* (Y^*)^\top - M \right\|_F \le \sum_{v=1}^{V} \left\| Y^v (Y^v)^\top - M \right\|_F \le \sum_{v=1}^{V} 2\left( \|Y^*\|_F + \|\Psi\|_F \right) \|\Psi\|_F + \sum_{v=1}^{V} \left\| Y^* (Y^*)^\top - M \right\|_F \quad \text{s.t. } \Psi = Y^v - Y^*,\; Y^v = X^v W^v \tag{8}$$
Since $\sum_{v=1}^{V} \|Y^* (Y^*)^\top - M\|_F$ and $\|Y^*\|_F$ are constants, the upper bound of $\sum_{v=1}^{V} \|Y^v (Y^v)^\top - M\|_F$ is governed by $\Psi$. As a result, problem (5) can be rewritten to minimize its upper bound as follows:
$$\min_{W^v} \mathcal{J} = \sum_{v=1}^{V} \left\| X^v W^v - Y^* \right\|_F^2 + \alpha \left\| W^v \right\|_{2,1} + \sigma \left\| W^v \right\|_F^2 \quad \text{s.t. } Y^* = E^* (\Lambda^*)^{\frac{1}{2}} \tag{9}$$
The globally optimal solution can be derived by solving $\frac{\partial \mathcal{J}}{\partial W^v} = 0$, yielding the following result:
$$\frac{\partial \mathcal{J}}{\partial W^v} = 2 (X^v)^\top X^v W^v - 2 (X^v)^\top Y^* + 2 \alpha F^v W^v + 2 \sigma I^v W^v = 0 \tag{10}$$
$$W^v = \left( (X^v)^\top X^v + \alpha F^v + \sigma I^v \right)^{-1} (X^v)^\top Y^* \tag{11}$$
$$F^v = \mathrm{diag}\!\left( \frac{1}{2 \|w_1^v\|_2 + \mu},\; \frac{1}{2 \|w_2^v\|_2 + \mu},\; \ldots,\; \frac{1}{2 \|w_{d_v}^v\|_2 + \mu} \right) \tag{12}$$
where $I^v \in \mathbb{R}^{d_v \times d_v}$ is the identity matrix and $\mu$ is a small positive constant that prevents division by zero. Note that $F^v$ depends on $W^v$, so the two updates are iterated alternately until convergence to obtain the optimum.
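For concreteness, a minimal sketch of the per-view solver follows; it is our own illustrative implementation, and the initialization and convergence tolerance are assumptions not specified above. It computes $Y^*$ from the top-$k$ eigenpairs of $M$ (Eq. (7)) and then alternates the updates of Eqs. (11) and (12):

```python
import numpy as np

def solve_view_projection(X, M, k, alpha=1.0, sigma=1.0, mu=1e-8,
                          max_iter=50, tol=1e-6):
    """Per-view solver: alternate Eqs. (11) and (12) until W converges."""
    # Eq. (7): consensus embedding Y* from the k largest eigenpairs of M.
    eigvals, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    E = eigvecs[:, -k:]                             # top-k eigenvectors
    lam = np.clip(eigvals[-k:], 0.0, None)          # guard against tiny negative values
    Y = E * np.sqrt(lam)                            # E* (Lambda*)^{1/2}

    d = X.shape[1]
    XtX, XtY = X.T @ X, X.T @ Y
    W = np.linalg.solve(XtX + sigma * np.eye(d), XtY)   # ridge-style initialization
    for _ in range(max_iter):
        # Eq. (12): diagonal reweighting from the current row norms of W.
        F = np.diag(1.0 / (2.0 * np.linalg.norm(W, axis=1) + mu))
        # Eq. (11): closed-form update of W.
        W_new = np.linalg.solve(XtX + alpha * F + sigma * np.eye(d), XtY)
        if np.linalg.norm(W_new - W) < tol:
            return W_new
        W = W_new
    return W
```

Ranking the rows of the returned matrix by $\|w_i^v\|_2$ then yields the per-view feature ordering used in the output step of Algorithm 1.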
In general, the computational workflow of the proposed methodology is formally outlined in Algorithm 1.

Algorithm 1 Consensus Guided Multi-view Unsupervised Feature Selection with Hybrid Regularization (CGMvFS)
Require: Multi-view dataset $\{X^v\}_{v=1}^{V}$; parameters $\alpha$, $\sigma$.
  1: Initialize $F^v$, $I^v$, $M_{\mathrm{sum}}$, $M$.
  2: for each view $v = 1$ to $V$ do
  3:    Generate the basic partitions $P^v$;
  4:    Compute the co-association matrix $M^v$ and accumulate $M_{\mathrm{sum}} \leftarrow M_{\mathrm{sum}} + M^v$;
  5: end for
  6: Compute the global consensus matrix $M = M_{\mathrm{sum}} / V$;
  7: Compute the consensus representation matrix $Y^* = E^* (\Lambda^*)^{\frac{1}{2}}$ through spectral decomposition of $M$;
  8: for each view $v = 1$ to $V$ do
  9:     repeat
 10:        Update $W^v = \left( (X^v)^\top X^v + \alpha F^v + \sigma I^v \right)^{-1} (X^v)^\top Y^*$;
 11:        Update $F^v = \mathrm{diag}\!\left( \frac{1}{2\|w_1^v\|_2 + \mu}, \ldots, \frac{1}{2\|w_{d_v}^v\|_2 + \mu} \right)$;
 12:     until $W^v$ converges
 13: end for
Output: Rank the features by $\|w_i^v\|_2$ and select the top $r$ most discriminative features.
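Putting the pieces together, a hypothetical end-to-end driver, using the illustrative helpers sketched above (the names `build_consensus`, `solve_view_projection`, and `rank_features` are our own, not the authors'), would mirror Algorithm 1 as follows:

```python
# views: list of (n, d_v) arrays; k: embedding dimension; r: features kept per view.
def cgmvfs(views, k, r, alpha=1.0, sigma=1.0):
    M = build_consensus(views)                        # steps 2-6: consensus matrix
    selected = []
    for X_v in views:                                 # steps 8-13: per-view solve
        W_v = solve_view_projection(X_v, M, k, alpha=alpha, sigma=sigma)
        selected.append(rank_features(W_v, r))        # output: top-r feature indices
    return selected
```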

3.4. Computational Complexity Analysis

We now analyze the computational complexity of CGMvFS. Given that the algorithm primarily operates on the variable $W^v$, its efficiency critically depends on the scale and manipulation of this matrix. Through careful evaluation of the dominant procedures, notably the iterative updates and evaluations involving $W^v$, we determine the per-view time complexity to be $O(k_v^3)$. Consequently, the overall complexity of CGMvFS is $O(\sum_{v=1}^{V} k_v^3)$. This confirms that the computational demands are fundamentally tied to the dimensionality of its core variable.

4. Experiments

This section presents a systematically designed experimental framework to rigorously assess the performance of the proposed method. Through comparative analysis with existing state-of-the-art unsupervised feature selection techniques across diverse real-world datasets, we validate both the efficacy and robustness of our approach. Furthermore, we conduct robustness analysis under varying noise conditions, perform sensitivity analysis to investigate parameter impacts, execute algorithmic diagnostics to examine convergence characteristics, and implement running time comparisons to evaluate computational efficiency.

4.1. Datasets and Experimental Setup

In this work, we utilize six benchmark multi-view datasets sourced from publicly accessible repositories, covering diverse application scenarios. As summarized in Table 1, these standardized datasets address fundamental machine learning tasks including handwritten digit recognition, object categorization, and web page classification. The diversity of this curated collection facilitates a comprehensive cross-domain evaluation of our methodology’s generalization performance.
Feature selection methodologies fundamentally function via discriminative relevance ranking to retain information-rich feature subsets. This dimensionality reduction mechanism prunes low-saliency attributes, optimizing downstream analytical pipelines while improving computational tractability. Following feature subset optimization, we project the data into the compressed feature space for subsequent k-means clustering. To ensure statistical reliability, we conducted 20 repeated experiments and report the average performance as the final result. Our evaluation framework adopts two criteria: Normalized Mutual Information (NMI), assessing the distributional consistency between clusters and ground truth, and Accuracy (ACC), evaluating class-label agreement, formally defined as:
$$\mathrm{NMI}(X, Y) = \frac{I(X, Y)}{\sqrt{H(X) \times H(Y)}} \tag{13}$$
$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \delta(\hat{y}_i, y_i) \tag{14}$$
where $I(X, Y)$ denotes the mutual information between the predicted and ground-truth label distributions, $H(\cdot)$ represents the Shannon entropy, $n$ indicates the number of samples, $\hat{y}_i$ and $y_i$ correspond to the predicted and ground-truth labels, respectively, and $\delta(\cdot, \cdot)$ is the Kronecker delta function evaluating to 1 for matched labels and 0 otherwise.
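For reference, both criteria can be computed as in the sketch below (our own utility, assuming scikit-learn and SciPy are available). ACC uses the standard Hungarian matching between predicted clusters and ground-truth classes, which the delta-function definition above presupposes, and the geometric `average_method` matches the normalization in Eq. (13):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_acc(y_true, y_pred):
    """ACC (Eq. (14)) after optimally matching cluster labels to classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                           # co-occurrence counts
    rows, cols = linear_sum_assignment(-cost)     # Hungarian: maximize matched counts
    return cost[rows, cols].sum() / y_true.size

def clustering_nmi(y_true, y_pred):
    """NMI (Eq. (13)) with geometric-mean normalization."""
    return normalized_mutual_info_score(y_true, y_pred, average_method="geometric")
```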

4.2. Comparison Experiment

To evaluate the methodological efficacy, we benchmarked our framework against eight contemporary multi-view unsupervised feature selection architectures: ASVW [14], CGMV-FS [12], CRV-DGL [34], NSGL [35], TLR [36], CvLP-DGL [25], CCSFS [11], and CDMvFS [15]. This comparator set captures key methodological dimensions in multi-view learning, including representation learning paradigms and discriminative feature selection frameworks. To define a baseline performance level in the raw feature space, we further apply the k-means algorithm [37] to concatenated multi-view features without feature selection, providing a non-parametric baseline that quantifies clustering capability prior to dimensionality reduction.
Table 2 and Table 3 provide a systematic comparison of unsupervised multi-view feature selection methodologies across six datasets, where optimal and suboptimal performances are highlighted using boldface and underlined formatting, respectively. The empirical findings substantiate the sustained advantage of the CGMvFS architecture over contemporary methodologies across the large majority of evaluation scenarios. Specifically, CGMvFS achieves significant performance advantages in NMI, establishing new performance benchmarks across all six experimental datasets. Regarding the ACC metric, CGMvFS obtains optimal performance on four of the six datasets.
As depicted in Figure 2 and Figure 3, CGMvFS’s performance increases monotonically with the number of selected features until reaching a plateau beyond 150 features. This convergence pattern suggests diminishing marginal returns from additional features, empirically validating both the method’s effectiveness and its resistance to feature space over-complexity.

4.3. Parameter Sensitivity Analysis

This subsection systematically examines the impact of the parameters $\alpha$ and $\sigma$ on CGMvFS’s performance. We explore both hyperparameters on a logarithmic grid spanning five orders of magnitude ($10^{-2}$ to $10^{2}$). By independently varying each parameter while keeping the other fixed, the sensitivity analysis reveals parameter-dependent performance variations, as illustrated in Figure 4. The empirical evidence demonstrates substantial sensitivity of algorithmic outcomes to parameter selection, with clustering results exhibiting discernible variations across configurations. These observations underscore the importance of careful parameter calibration for optimal performance.

4.4. Convergence Behavior Analysis

To substantiate our algorithm’s operational efficiency, we conduct a systematic convergence analysis on three benchmark datasets: Handwritten, MSRCV1, and WebKB. Figure 5 demonstrates the monotonic descent of the objective function during iterative optimization. Notably, the algorithm converges within a limited number of iterations, characterized by a rapid decline of the objective value (particularly in the initial five iterations) followed by asymptotic stabilization. This empirical evidence substantiates that our method exhibits desirable computational efficiency and scalability for processing large-scale datasets.

4.5. Robustness Analysis

To evaluate our method’s robustness under noisy conditions, Gaussian noise was introduced at 10 % , 20 % , and 30 % of the feature dimensions. As depicted in Figure 6, our approach maintains performance comparable to the noise-free scenario in the Handwritten, Yale, and WebKB datasets. Meanwhile, the other three datasets exhibit marginal performance degradation.
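The exact corruption protocol is not spelled out above, so the sketch below records one plausible reading (an assumption on our part): unit-variance Gaussian noise is added to a randomly chosen 10–30% of the feature dimensions of each view.

```python
import numpy as np

def corrupt_features(X, ratio=0.1, seed=0):
    """Add unit Gaussian noise to a random `ratio` of the feature dimensions of X.
    NOTE: add-vs-replace and the noise variance are our assumptions."""
    rng = np.random.default_rng(seed)
    X_noisy = X.astype(float).copy()
    m = int(round(ratio * X.shape[1]))                       # number of corrupted dims
    idx = rng.choice(X.shape[1], size=m, replace=False)      # which dimensions
    X_noisy[:, idx] += rng.standard_normal((X.shape[0], m))  # N(0, 1) perturbation
    return X_noisy
```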

4.6. Comparison of Running Time

Given computational resource constraints and experimental feasibility, this study employed two representative comparison methods for the execution time assessment to capture the prevailing trends. As shown in Table 4, our method is sensitive to feature dimensionality and consequently requires more computational time on higher-dimensional datasets, which we deem acceptable: the significant performance improvements are obtained by trading off a degree of execution speed.

5. Discussion

This work introduces a Consensus-Guided Multi-View Unsupervised Feature Selection framework with Hybrid Regularization (CGMvFS). Our methodology formulates unified consensus representations through systematic aggregation of view-specific base partitions, preserving cross-view pairwise constraints to govern the feature selection process. This approach synergistically captures complementary similarity patterns across views while mitigating single-view bias through consensus regularization. The proposed optimization architecture innovatively integrates a dual-norm formulation, combining L 2 , 1 -norm induced feature sparsity with Frobenius norm-based stability enhancement, effectively balancing model fidelity and generalization capacity. Empirical validation across six heterogeneous multi-view benchmark datasets confirms the framework’s superiority.
However, the proposed method exhibits certain limitations: (1) significant scope for computational efficiency enhancement and (2) potential noise susceptibility in consensus clustering derivation through view averaging. Future work will pursue algorithmic refinements to boost computational efficiency and scalability while incorporating adaptive view-specific reliability weighting to enhance real-world applicability.

Author Contributions

Methodology: Y.S., H.Z. (Haixin Zeng) and X.G.; validation: H.Z. (Haixin Zeng); writing—original draft preparation: H.Z. (Haixin Zeng); writing—review and editing: Y.S. and X.G.; formal analysis: L.C.; investigation: W.X.; data curation: Q.L.; visualization: H.Z. (Huijie Zheng); supervision: J.Z.; funding acquisition: Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under No. 62306122, the Natural Science Foundation of Xiamen under No. 3502Z202571034, the Natural Science Foundation of Fujian Province under No. 2022J05065, the Fundamental Research Funds for the Central Universities under No. ZQN-1122, the High-level Talent Team Project of Quanzhou City under No. 2023CT001, the Scientific Research Funds of Huaqiao University under No. 24BS141, and the Science and Technology Major Special Project of Fujian Province under No. 2024HZ022007.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed in this study were derived from Multi-view Unsupervised Feature Selection With Consensus Partition And Diverse Graph [15] (DOI: 10.1016/j.ins.2024.120178).

Acknowledgments

We are extremely grateful for the valuable suggestions provided by the editors and reviewers.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Lin, Y.; Yu, Z.; Yang, K.; Philip Chen, C.L. Ensemble Denoising Autoencoders Based on Broad Learning System for Time-Series Anomaly Detection. IEEE Trans. Neural Netw. Learn. Syst. 2025, 1–14. [Google Scholar] [CrossRef] [PubMed]
  2. Yu, Z.; Zhong, Z.; Yang, K.; Cao, W.; Chen, C.L.P. Broad Learning Autoencoder With Graph Structure for Data Clustering. IEEE Trans. Knowl. Data Eng. 2024, 36, 49–61. [Google Scholar] [CrossRef]
  3. Chen, W.; Yang, K.; Yu, Z.; Nie, F.; Chen, C.L.P. Adaptive Broad Network With Graph-Fuzzy Embedding for Imbalanced Noise Data. IEEE Trans. Fuzzy Syst. 2025, 33, 1949–1962. [Google Scholar] [CrossRef]
  4. Yu, Z.; Dong, Z.; Yu, C.; Yang, K.; Fan, Z.; Chen, C.P. A review on multi-view learning. Front. Comput. Sci. 2025, 19, 197334. [Google Scholar] [CrossRef]
  5. Lin, Q.; Yang, L.; Zhong, P.; Zou, H. Robust Supervised Multi-View Feature Selection with Weighted Shared Loss and Maximum Margin Criterion. Knowl.-Based Syst. 2021, 229, 107331. [Google Scholar] [CrossRef]
  6. Wang, C.; Song, P.; Duan, M.; Zhou, S.; Cheng, Y. Low-Rank Tensor Based Smooth Representation Learning for Multi-View Unsupervised Feature Selection. Knowl.-Based Syst. 2025, 309, 112902. [Google Scholar] [CrossRef]
  7. Duan, M.; Song, P.; Zhou, S.; Cheng, Y.; Mu, J.; Zheng, W. High-Order Correlation Preserved Multi-View Unsupervised Feature Selection. Eng. Appl. Artif. Intell. 2025, 139, 109507. [Google Scholar] [CrossRef]
  8. Yang, K.; Yu, Z.; Chen, W.; Liang, Z.; Chen, C.L.P. Solving the Imbalanced Problem by Metric Learning and Oversampling. IEEE Trans. Knowl. Data Eng. 2024, 36, 9294–9307. [Google Scholar] [CrossRef]
  9. Shi, C.; Gu, Z.; Duan, C.; Tian, Q. Multi-View Adaptive Semi-Supervised Feature Selection with the Self-Paced Learning. Signal Process. 2020, 168, 107332. [Google Scholar] [CrossRef]
  10. Zhang, C.; Fang, Y.; Liang, X.; Zhang, H.; Zhou, P.; Wu, X.; Yang, J.; Jiang, B.; Sheng, W. Efficient Multi-view Unsupervised Feature Selection with Adaptive Structure Learning and Inference. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, Republic of Korea, 3–9 August 2024. [Google Scholar]
  11. Cao, Z.; Xie, X.; Sun, F.; Qian, J. Consensus Cluster Structure Guided Multi-View Unsupervised Feature Selection. Knowl.-Based Syst. 2023, 271, 110578. [Google Scholar] [CrossRef]
  12. Tang, C.; Chen, J.; Liu, X.; Li, M.; Wang, P.; Wang, M.; Lu, P. Consensus Learning Guided Multi-View Unsupervised Feature Selection. Knowl.-Based Syst. 2018, 160, 49–60. [Google Scholar] [CrossRef]
  13. Shi, Y.; Yang, K.; Wang, M.; Yu, Z.; Zeng, H.; Hu, Y. Boosted Unsupervised Feature Selection for Tumor Gene Expression Profiles. CAAI Trans. Intell. Technol. 2024. [Google Scholar] [CrossRef]
  14. Hou, C.; Nie, F.; Tao, H.; Yi, D. Multi-View Unsupervised Feature Selection with Adaptive Similarity and View Weight. IEEE Trans. Knowl. Data Eng. 2017, 29, 1998–2011. [Google Scholar] [CrossRef]
  15. Cao, Z.; Xie, X.; Li, Y. Multi-View Unsupervised Feature Selection with Consensus Partition and Diverse Graph. Inf. Sci. 2024, 661, 120178. [Google Scholar] [CrossRef]
  16. Huang, Y.; Shen, Z.; Cai, Y.; Yi, X.; Wang, D.; Lv, F.; Li, T. C²IMUFS: Complementary and Consensus Learning-Based Incomplete Multi-View Unsupervised Feature Selection. IEEE Trans. Knowl. Data Eng. 2023, 35, 10681–10694. [Google Scholar] [CrossRef]
  17. Wan, Y.; Sun, S.; Zeng, C. Adaptive Similarity Embedding for Unsupervised Multi-View Feature Selection. IEEE Trans. Knowl. Data Eng. 2021, 33, 3338–3350. [Google Scholar] [CrossRef]
  18. Gong, X.; Gao, J.; Sun, S.; Zhong, Z.; Shi, Y.; Zeng, H.; Yang, K. Adaptive Compressed-based Privacy-preserving Large Language Model for Sensitive Healthcare. IEEE J. Biomed. Health Inform. 2025, 1–13. [Google Scholar] [CrossRef]
  19. Gong, X.; Chen, C.L.P.; Hu, B.; Zhang, T. CiABL: Completeness-Induced Adaptative Broad Learning for Cross-Subject Emotion Recognition With EEG and Eye Movement Signals. IEEE Trans. Affect. Comput. 2024, 15, 1970–1984. [Google Scholar] [CrossRef]
  20. Wang, D.; Wang, L.; Chen, W.; Wang, H.; Liang, C. Unsupervised Multi-View Feature Selection Based on Weighted Low-Rank Tensor Learning and Its Application in Multi-Omics Datasets. Eng. Appl. Artif. Intell. 2025, 143, 110041. [Google Scholar] [CrossRef]
  21. Xu, S.; Xie, X.; Cao, Z. Graph–Regularized Consensus Learning and Diversity Representation for Unsupervised Multi-View Feature Selection. Knowl.-Based Syst. 2025, 311, 113043. [Google Scholar] [CrossRef]
  22. Jiang, B.; Liu, J.; Wang, Z.; Zhang, C.; Yang, J.; Wang, Y.; Sheng, W.; Ding, W. Semi-Supervised Multi-View Feature Selection with Adaptive Similarity Fusion and Learning. Pattern Recognit. 2025, 159, 111159. [Google Scholar] [CrossRef]
  23. Zhang, L.; Liu, M.; Wang, R.; Du, T.; Li, J. Multi-View Unsupervised Feature Selection with Dynamic Sample Space Structure. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 2641–2648. [Google Scholar]
  24. Zhang, H.; Wu, D.; Nie, F.; Wang, R.; Li, X. Multilevel Projections with Adaptive Neighbor Graph for Unsupervised Multi-View Feature Selection. Inf. Fusion 2021, 70, 129–140. [Google Scholar] [CrossRef]
  25. Tang, C.; Zheng, X.; Liu, X.; Zhang, W.; Zhang, J.; Xiong, J.; Wang, L. Cross-View Locality Preserved Diversity and Consensus Learning for Multi-View Unsupervised Feature Selection. IEEE Trans. Knowl. Data Eng. 2022, 34, 4705–4716. [Google Scholar] [CrossRef]
  26. Si, X.; Yin, Q.; Zhao, X.; Yao, L. Consistent and Diverse Multi-View Subspace Clustering with Structure Constraint. Pattern Recognit. 2022, 121, 108196. [Google Scholar] [CrossRef]
  27. Fang, S.G.; Huang, D.; Wang, C.D.; Tang, Y. Joint Multi-View Unsupervised Feature Selection and Graph Learning. IEEE Trans. Emerg. Top. Comput. Intell. 2024, 8, 16–31. [Google Scholar] [CrossRef]
  28. Shi, Y.; Yu, Z.; Chen, C.L.P.; You, J.; Wong, H.S.; Wang, Y.; Zhang, J. Transfer Clustering Ensemble Selection. IEEE Trans. Cybern. 2020, 50, 2872–2885. [Google Scholar] [CrossRef]
  29. Shi, Y.; Yu, Z.; Cao, W.; Chen, C.L.P.; Wong, H.S.; Han, G. Fast and Effective Active Clustering Ensemble Based on Density Peak. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3593–3607. [Google Scholar] [CrossRef]
  30. Eckart, C.; Young, G. The Approximation of One Matrix by Another of Lower Rank. Psychometrika 1936, 1, 211–218. [Google Scholar] [CrossRef]
  31. Chan, J.T.; Li, C.K.; Sze, N.S. Isometries for Unitarily Invariant Norms. Linear Algebra Its Appl. 2005, 399, 53–70. [Google Scholar] [CrossRef]
  32. Zhao, Z.; Wang, L.; Liu, H.; Ye, J. On Similarity Preserving Feature Selection. IEEE Trans. Knowl. Data Eng. 2013, 25, 619–632. [Google Scholar] [CrossRef]
  33. Shi, Y.; Yu, Z.; Chen, C.L.P.; Zeng, H. Consensus Clustering With Co-Association Matrix Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 4192–4205. [Google Scholar] [CrossRef] [PubMed]
  34. Tang, C.; Zhu, X.; Liu, X.; Wang, L. Cross-View Local Structure Preserved Diversity and Consensus Learning for Multi-View Unsupervised Feature Selection. Proc. AAAI Conf. Artif. Intell. 2019, 33, 5101–5108. [Google Scholar] [CrossRef]
  35. Bai, X.; Zhu, L.; Liang, C.; Li, J.; Nie, X.; Chang, X. Multi-view Feature Selection Via Nonnegative Structured Graph Learning. Neurocomputing 2020, 387, 110–122. [Google Scholar] [CrossRef]
  36. Yuan, H.; Li, J.; Liang, Y.; Tang, Y.Y. Multi-View Unsupervised Feature Selection with Tensor Low-Rank Minimization. Neurocomputing 2022, 487, 75–85. [Google Scholar] [CrossRef]
  37. Xu, J.; Lange, K. Power K-Means Clustering. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6921–6931. [Google Scholar]
Figure 1. The framework of the proposed CGMvFS.
Figure 2. The accuracy (ACC) obtained by applying various feature selection methods on different datasets.
Figure 3. The normalized mutual information (NMI) obtained by applying various feature selection methods on different datasets.
Figure 4. Parameter sensitivity on different datasets. Colors are used for visual clarity only and do not indicate specific meanings.
Figure 5. Convergence trajectory analysis: objective function evolution of CGMvFS across iterative refinement stages.
Figure 6. ACC and NMI results of the influence of Gaussian noise features on CGMvFS.
Table 1. Details of datasets.

Feature  | Handwritten | WebKB        | MSRCV1         | ORL           | Outdoor Scene | Yale
1        | FCCS (76)   | view1 (1703) | HOG (576)      | View 1 (4096) | GIST (512)    | Intensity (4096)
2        | KAR (64)    | view2 (230)  | CMT (24)       | View 2 (3304) | HOG (432)     | LBP (3304)
3        | FAC (216)   | view3 (230)  | GIST (512)     | View 3 (6750) | LBP (256)     | GABOR (6075)
4        | PA (240)    | –            | CENTRIST (254) | –             | GABOR (48)    | –
5        | MOR (6)     | –            | LBP (256)      | –             | –             | –
6        | ZER (47)    | –            | –              | –             | –             | –
Instance | 2000        | 203          | 210            | 400           | 2688          | 165
Class    | 10          | 4            | 7              | 40            | 8             | 15
Table 2. ACC% ± std% of different methods on different datasets.

Method        | MSRCV1       | Yale         | Handwritten  | Outdoor Scene | ORL          | WebKB
ASVW [14]     | 69.43 ± 6.12 | 44.00 ± 2.25 | 80.12 ± 7.05 | 47.56 ± 1.98  | 33.1 ± 1.54  | 56.11 ± 6.53
CGMV-FS [12]  | 68.14 ± 5.41 | 42.88 ± 3.13 | 67.66 ± 4.85 | 26.95 ± 0.65  | 33.49 ± 1.16 | 58.18 ± 5.98
CRV-DGL [34]  | 77.05 ± 7.79 | 50.18 ± 5.59 | 79.92 ± 6.75 | 61.15 ± 4.50  | 54.56 ± 3.37 | 73.37 ± 7.38
NSGL [35]     | 69.88 ± 4.89 | 39.82 ± 3.52 | 75.91 ± 4.85 | 45.77 ± 3.02  | 40.75 ± 2.62 | 72.02 ± 4.58
TLR [36]      | 81.19 ± 7.32 | 48.58 ± 4.82 | 81.74 ± 6.73 | 42.87 ± 3.31  | 55.25 ± 3.39 | 76.82 ± 1.93
CvLP-DGL [25] | 73.57 ± 3.61 | 46.18 ± 4.53 | 73.05 ± 6.55 | 62.83 ± 3.97  | 58.89 ± 2.94 | 70.96 ± 7.91
CCSFS [11]    | 78.36 ± 5.20 | 54.64 ± 4.30 | 84.28 ± 7.30 | 62.16 ± 3.50  | 58.36 ± 3.76 | 75.34 ± 8.06
CDMvFS [15]   | 82.46 ± 6.16 | 54.58 ± 5.85 | 86.78 ± 7.69 | 62.58 ± 6.00  | 60.18 ± 2.89 | 75.71 ± 8.86
Ours          | 83.14 ± 0.79 | 57.82 ± 1.85 | 92.57 ± 0.63 | 61.76 ± 0.66  | 62.80 ± 1.66 | 70.15 ± 1.38
Table 3. NMI% ± std% of different methods on different datasets.

Method        | MSRCV1       | Yale         | Handwritten  | Outdoor Scene | ORL          | WebKB
ASVW [14]     | 61.29 ± 6.12 | 49.55 ± 1.72 | 78.24 ± 3.34 | 39.76 ± 1.01  | 55.59 ± 1.51 | 11.73 ± 3.99
CGMV-FS [12]  | 58.04 ± 3.80 | 48.59 ± 2.26 | 67.31 ± 2.54 | 11.82 ± 0.42  | 55.77 ± 0.96 | 13.92 ± 8.49
CRV-DGL [34]  | 68.61 ± 5.18 | 57.72 ± 4.82 | 77.31 ± 2.83 | 49.01 ± 1.46  | 73.76 ± 2.03 | 35.14 ± 9.62
NSGL [35]     | 61.29 ± 3.78 | 46.15 ± 2.80 | 72.77 ± 2.83 | 37.59 ± 0.66  | 62.48 ± 1.67 | 33.73 ± 2.36
TLR [36]      | 74.67 ± 4.79 | 53.01 ± 3.59 | 81.44 ± 3.69 | 37.66 ± 0.79  | 74.23 ± 1.44 | 39.71 ± 4.79
CvLP-DGL [25] | 64.21 ± 4.03 | 50.05 ± 4.18 | 70.26 ± 3.33 | 49.85 ± 1.28  | 76.43 ± 1.82 | 35.96 ± 5.07
CCSFS [11]    | 70.61 ± 3.32 | 58.42 ± 3.42 | 79.77 ± 3.52 | 53.33 ± 0.56  | 76.09 ± 2.17 | 42.71 ± 10.32
CDMvFS [15]   | 72.66 ± 5.00 | 59.72 ± 4.29 | 82.69 ± 4.27 | 51.13 ± 2.11  | 77.89 ± 1.20 | 44.04 ± 10.52
Ours          | 75.10 ± 1.28 | 62.39 ± 2.43 | 85.50 ± 1.51 | 54.05 ± 0.60  | 80.03 ± 0.67 | 44.32 ± 1.27
Table 4. The runtimes of different methods on different datasets.

Method  | Handwritten | WebKB | Yale   | Outdoor Scene | ORL    | MSRCV1
CCSFS   | 268.74      | 9.15  | 138.39 | 256.47        | 275.15 | 3.12
CDMvFS  | 938.17      | 4.02  | 138.39 | 1227.69       | 119.69 | 2.65
CGMvFS  | 3.76        | 11.34 | 847.02 | 11.88         | 889.06 | 2.1
