Federated Incomplete Multi-View Unsupervised Feature Selection with Fractional Sparsity-Guided Whale Optimization and Tensor Alternating Learning

Yufan Yuan; Wangyu Wu; Chang-An Xu; Weirong Zhang; Chuan Jin

doi:10.3390/fractalfract9110717

,

and

¹

School of Mathematical Science, Yangzhou University, Yangzhou 225002, China

²

School of Computer Science, University of Liverpool, Liverpool L69 3DR, UK

³

College of Materials and Energy, South China Agricultural University, Guangzhou 510642, China

⁴

School of Ecology, Hainan University, Haikou 570228, China

Fractal Fract.2025, 9(11), 717;https://doi.org/10.3390/fractalfract9110717

Version Notes

Order Reprints

Abstract

With the widespread application of multi-view data across various domains, multi-view unsupervised feature selection (MUFS) has achieved remarkable progress in both feature selection (FS) and missing-view completion. However, existing MUFS methods typically rely on centralized servers, which not only fail to meet privacy requirements in distributed settings but also suffer from suboptimal FS quality and poor convergence. To overcome these challenges, we propose a novel federated incomplete MUFS method (Fed-IMUFS), which integrates a fractional Sparsity-Guided Whale Optimization Algorithm (SGWOA) and Tensor Alternating Learning (TAL). Within this federated learning framework, each client performs local optimization in two stages: in the first stage, SGWOA introduces an

L_{2, 1}

proximal projection to enforce row-sparsity in the FS weight matrix, while fractional-order dynamics and fractal-inspired elite kernel injection mechanisms enhance global search ability, yielding a discriminative and stable weight matrix; in the second stage, based on the obtained weight matrix, an alternating optimization framework with tensor decomposition is employed to iteratively complete missing views while simultaneously optimizing low-dimensional representations to preserve cross-view consistency, with the objective function gradually minimized until convergence. During federated training, the server employs an aggregation and distribution strategy driven by normalized mutual information, where clients upload only their local weight matrices and quality indicators, and the server adaptively fuses them into a global FS matrix before distributing it back to clients. This process achieves consistent FS across clients while safeguarding data privacy. Comprehensive evaluations on CEC2022 and several incomplete multi-view datasets confirm that Fed-IMUFS outperforms state-of-the-art methods, delivering stronger global optimization capability, higher-quality feature selection, faster convergence, and more effective handling of missing views.

Keywords:

multi-view unsupervised feature selection; federated learning; whale optimization algorithm; tensor alternating learning; missing-view completion; privacy preservation

1. Introduction

Multi-view data refers to information collected from diverse sources or perspectives, providing complementary observations of the same entity or phenomenon. Such complex data structures pose both opportunities and challenges for effective data representation and feature selection (FS). This paper focuses on addressing these challenges through a federated multi-view unsupervised FS (MUFS) framework. Such views may come from various sensors, different modalities (e.g., images, text, or audio), or even different time points. For example, combining RGB images with depth sensor data enables more accurate 3D scene reconstruction []. In machine learning, multi-view learning focuses on modeling and integrating these diverse sources of information to enhance performance and generalization. A central challenge lies in effectively leveraging both the complementarity and consistency among views. Identifying informative features across multiple views is crucial for tasks such as classification, clustering, and detection [,]. Yet, multi-view datasets often encompass numerous views, each associated with high-dimensional features, which together cause extremely large feature spaces. This high dimensionality increases computational burden and raises the risk of overfitting. Therefore, effective FS techniques are indispensable in addressing these challenges [].

These features often exhibit certain correlations and complementarities across different views [,]. Due to the heterogeneity of feature sources and the high cost of data annotation, the resulting datasets are typically high-dimensional and unlabeled [,]. With the advent of big data, selecting the most informative features from such unlabeled multi-view data has become a critical issue in various domains, including recommendation systems, image recognition, and social network analysis. As a well-established unsupervised learning paradigm, MUFS has gained increasing interest within the research community, aiming to enhance the performance of downstream tasks through dimensionality reduction and the elimination of redundant features [].

Existing studies on MUFS can be classified into two fundamental categories. The first category integrates features from multiple views into a unified representation, upon which traditional single-view FS methods are applied. Representative methods include Graph-based Unsupervised FS for Interval-valued Information Systems (GBUFS) by Xu et al. [], High-order Similarity Learning (HSL) by Mi et al. [], and Sparse PCA with

ℓ_{2, p}

-Norm Regularization for Unsupervised FS (SPCAFS) by Li et al. []. Sparsity-driven methods, such as LASSO, originally emerged in single-view FS by introducing sparse regularization (e.g.,

L_{1}

penalty) to enforce sparsity in the weight matrix and automatically filter irrelevant features. However, when extended to multi-view scenarios, these approaches typically perform FS independently within each view and then aggregate the results, making it difficult to fully capture cross-view correlations. The second group directly performs FS on multi-view data to exploit the inherent relationships among different views. Typical examples include Structure Learning with Consensus Label Information for MUFS (SCMvFS) by Cao et al. [], Joint MUFS and Graph Learning (JMVFG) by Fang et al. [], and Structural Regularization-based Discriminative MUFS (SDFS) by Zhou et al. []. These methods usually rely on similarity graphs or graph learning frameworks to enhance cross-view consistency and improve the effectiveness of FS. Nevertheless, they generally assume that each sample is fully observed in all views, an assumption that rarely holds in real-world applications.

This challenge is particularly pronounced in practical applications. For instance, in recommendation systems, cold-start users may lack behavioral records in certain views []; in multimodal learning tasks, videos may be missing audio or subtitle information []; in social networks, users exhibit varying levels of activity across text, image, or interaction views []; and in medical diagnostics, although patient records and test results are often available, privacy concerns frequently lead to missing personal health information []. These scenarios collectively demonstrate that incomplete multi-view data are not only pervasive but also significantly impair the performance of downstream tasks. Although some methods, such as MIMB [] and IMC-MCL [], are capable of handling incomplete multi-view data, they typically address missing-value imputation and FS separately, thereby failing to exploit their potential synergy. To overcome this limitation, Huang et al. [] proposed a joint learning of tensorial incomplete MUFS and missing-view imputation (TIME-FS), which introduces an optimization framework that jointly incorporates missing-value completion, FS, and low-dimensional representation learning. While effective in handling missing views, TIME-FS is a centralized MUFS approach and thus cannot meet privacy requirements in distributed scenarios. Moreover, there remains room to enhance the efficiency of FS, the quality of solutions, and the speed of convergence.

Another pressing challenge is that most existing MUFS methods still rely on centralized frameworks, assuming that all data are stored on a central server. However, such a design not only poses severe privacy risks but also leads to substantial transmission overhead. Although a few studies have attempted to introduce federated learning (FL) into multi-view scenarios, their focus has largely remained on downstream tasks such as clustering and representation learning, while the upstream problem of FS under federated settings has been insufficiently explored [,]. In practice, multi-view data are inherently distributed and highly sensitive: in medical diagnostics, different hospitals separately store imaging, genomic, and clinical records; in financial risk control, banks, payment platforms, and e-commerce companies each maintain transaction, credit, and consumption data; and in cross-platform recommendation, video, shopping, and social platforms respectively retain users’ viewing, purchasing, and interaction histories. These characteristics render centralized storage and sharing impractical in real-world applications. Recent studies have successfully extended basic FS [] and multi-label FS [] problems into the FL framework, demonstrating its feasibility and potential for FS tasks. By enabling collaborative modeling across clients without exposing raw data, FL offers a promising solution to privacy-sensitive FS in distributed scenarios. Motivated by this, this paper integrates MUFS into the FL framework, not only bridging the gap between these two research directions but also establishing a methodological foundation for privacy-preserving and distributed multi-view data analysis, with significant theoretical value and broad application prospects.

In recent years, researchers have explored a wide range of metaheuristic algorithms, including Genetic Algorithm (GA) [], Particle Swarm Optimization algorithm (PSO) [], Ant Colony Optimization (ACO) [], and the Energy Valley Optimizer (EVO) []. These algorithms have contributed to enhancing global search capabilities, while simultaneously improving the stability and convergence of FS. Moreover, they have been applied to the optimization of diverse problems. For example, Meng et al. [] introduced an approach that combines the fractional-order grey model (FOGM) with PSO to identify the optimal fractional order, thereby enhancing prediction accuracy when data are scarce or unstable. Similarly, Kartci et al. [] presented a GA-driven mixed integer-order method for accurate optimization of fractional capacitive and inductive components over a four-decade bandwidth (up to 1 GHz), achieving exceptionally low phase error. Mirjalili and Lewis [] developed the Whale Optimization Algorithm (WOA), a metaheuristic inspired by the bubble-net hunting behavior of humpback whales. WOA has exhibited outstanding performance in global optimization [], engineering applications [], and FS [], and has been widely recognized for its simple yet effective mechanism, strong global search ability, and fast convergence [,]. Accordingly, this work adopts WOA as the metaheuristic optimization tool to improve both solution quality and convergence in incomplete MUFS. Nevertheless, the No Free Lunch theorem [] indicates that no algorithm achieves optimal performance on every problem. WOA, in particular, still encounters challenges in high-dimensional sparse modeling, escaping the local optima, and ensuring stable convergence, highlighting the need for task-oriented refinements and further extensions.

Summarizing the above, existing studies on MUFS face two major limitations: (i) most approaches assume fully observed data across all views, which hardly holds in real-world scenarios, and (ii) centralized frameworks raise privacy and communication concerns in distributed environments. To address these issues, we propose a federated incomplete MUFS method, Fed-IMUFS, which integrates a fractional Sparsity-Guided WOA (SGWOA), Tensor Alternating Learning (TAL), and an aggregation and distribution strategy driven by normalized mutual information (NMI) within a federated learning framework. Each client performs a two-stage local optimization: SGWOA learns sparse and stable feature weights, while TAL imputes missing views and refines low-dimensional representations. The server then adaptively integrates the client updates into a global matrix without exposing any raw data. Experiments on CEC2022 and multiple incomplete multi-view datasets show that Fed-IMUFS outperforms state-of-the-art methods in optimization, FS quality, convergence, and missing-view completion, confirming its effectiveness and practicality.

In light of the above motivation and challenges, the main contributions of this study are summarized as follows:

We propose a novel Fed-IMUFS method that integrates federated learning with MUFS. Each client sequentially executes SGWOA and TAL to obtain an optimized FS weight matrix and compute its NMI. During federated training, the server employs an aggregation and distribution strategy driven by NMI to adaptively fuse the uploaded weight matrices and quality indicators into a global matrix, which is then redistributed to clients. This process safeguards data privacy while enhancing the quality and convergence of MUFS.
We design an SGWOA for global search in the vectorized FS matrix space, integrating three mechanisms: (i) Prox- $ℓ_{2, 1}$ projection enforces row sparsity on W, enhancing stability and interpretability; (ii) fractional-order dynamics adaptively regulate parameters to avoid premature convergence; and (iii) fractal-inspired elite-kernel injection replaces poor solutions with samples near elites, sustaining diversity at low cost. Together, these mechanisms enable SGWOA to learn discriminative and robust weight matrices, forming a solid basis for data completion and representation learning.
We introduce an aggregation and distribution strategy driven by NMI. Each client independently optimizes its FS matrix and computes its NMI, uploading only the FS matrix and NMI to the server. The server adaptively allocates weights using NMI to aggregate local matrices into a global FS matrix, which is then redistributed to clients for subsequent optimization rounds. This strategy eliminates raw data transmission, improves global performance and cross-client consistency, and further strengthens privacy protection by minimizing the risk of data leakage.

The remainder of this paper is organized as follows. Section 2 provides a detailed review of related work. Section 3 introduces the fundamental concepts of the proposed method and elaborates on its theoretical foundation. Section 4 presents and discusses the experimental results. Finally, Section 5 concludes the paper, discusses its limitations, and outlines potential directions for future research.

2. Related Work

In this section, we review previous studies on single-view FS and MUFS, summarizing their main ideas, methodologies, and limitations.

2.1. Single-View Unsupervised Feature Selection

In recent years, a variety of single-view unsupervised FS methods have been developed, which are typically grouped into three categories: filter, wrapper, and embedded strategies.

Filter-based methods assess the significance of features using predefined metrics, such as variance, to select the most informative ones. For example, the Laplacian Score (LS) [] evaluates features according to their ability to preserve local manifold structures, while Spectral FS (SPEC) [] applies theory of graph spectra, combining supervised and unsupervised principles to rank feature importance. Despite their effectiveness, these methods mainly focus on inherent sample characteristics and often overlook correlations between features.

Wrapper methods [] select feature subsets by iteratively training learning algorithms, typically involving subset search and performance evaluation. Although their performance often surpasses that of filter methods, their excessive computational cost makes them inefficient for practical applications.

In contrast, embedded methods combine model training with FS under a joint optimization scheme, enabling the two processes to strengthen one another. For instance, Wang et al. [] proposed an unsupervised FS approach that combines local structure learning with exponentially weighted sparse regression, which not only preserves both global and local information in the feature subspace but also alleviates the common bias of traditional methods toward high-weight features while neglecting global structure. Li et al. [] designed a binary differential evolution algorithm for unsupervised FS based on feature subspace learning (FBUFS), which leverages local manifold structures and pseudo-label learning to derive more informative feature weights. Fan et al. [] introduced a multi-strategy improved beluga WOA (MSBWO), enhancing both the quality of FS and convergence speed. Miao et al. [] developed a dynamic multi-WOA with elite tuning, effectively addressing high-dimensional FS problems in classification tasks. Pramanik [] proposed an unsupervised WOA-based deep FS method (U-WOA), which has demonstrated effectiveness in cancer detection from breast ultrasound images. Similarly, Nie et al. [] presented the Structured Optimal Graph for Unsupervised FS (SOGFS), which adaptively learns sample similarities to capture local structures while simultaneously performing FS.

However, when single-view unsupervised FS methods are applied to multi-view data, a typical approach is to concatenate features from different views directly, without considering their inherent correlations. Such neglect of cross-view relationships undermines the potential synergy of complementary information across views, thereby reducing the effectiveness of FS.

2.2. Multi-View Unsupervised Feature Selection

Unlike single-view approaches, MUFS methods are specifically developed for multi-view data, aiming to exploit latent cross-view relationships and leverage complementary information across different views. This enables more accurate and robust FS compared with single-view methods.

To improve human action retrieval, Wang et al. [] designed the Adaptive Multi-View FS (AMFS) model, which constructs view-specific Laplacian matrices and adaptively assigns weights to integrate cross-view complementary information, thereby enhancing feature representation and retrieval accuracy. Cao et al. [] developed a novel graph construction strategy that integrates high-order neighbor information into multi-view FS, projecting data into a shared latent space to capture feature-level complementarity and achieving superior clustering performance. Building on this, Cao et al. [] further proposed a method that generates mutually exclusive multi-view graphs to enhance cross-view complementarity, combining graph learning with consensus clustering to mitigate the challenges of heterogeneous views. Liu et al. [] presented a FS method based on latent semantics and anchor graph learning, which integrates feature weighting, latent semantic discovery, and a unified multilayer framework that integrates adaptive graph learning. This method incorporates explicit redundancy suppression and effectively handles noise, redundant features, and the high cost of large-scale graph construction. Yang et al. [] addressed imbalanced and incomplete data with the TERUIMUFS method, which combines self-representation learning, sample diversity learning, and tensor low-rank constraints to model feature importance and correct structural errors. Yu et al. [] proposed CLSG, an incomplete multi-view FS approach that recovers missing data in a latent space, employs a dual-layer local similarity graph, and uses a linear-complexity optimization scheme to improve both FS and data reconstruction. Huang et al. [] introduced TIME-FS, which unifies missing value recovery, discriminative FS, and low-dimensional representation learning into a tensor decomposition framework, enabling joint optimization of FS and missing value imputation while reducing computational overhead.

Although these methods have demonstrated promising results in practice, several limitations remain: (1) most existing approaches are centralized and thus unsuitable for distributed scenarios involving privacy-sensitive and incomplete multi-view data; and (2) the quality of solutions and convergence performance still leave room for improvement. To address these challenges, this study builds upon the TIME-FS framework by incorporating an improved WOA to optimize both solution quality and convergence behavior. Furthermore, the proposed method is deployed in a federated learning framework, where an aggregation and distribution strategy driven by NMI is designed to effectively tackle the aforementioned issues.

3. The Proposed Method

This section elaborates on the fundamental principles of the WOA, the implementation details of MUFS matrix initialization, SGWOA, and Fed-IMUFS, followed by an analysis of the privacy preservation, communication overhead, and time complexity of Fed-IMUFS.

3.1. Whale Optimization Algorithm

The WOA [] is a nature-inspired metaheuristic based on the bubble-net feeding behavior of humpback whales. As shown in Figure 1, during this process, whales encircle their prey and create spiral bubbles to capture it. WOA simulates these actions for global optimization and has been successfully applied in various engineering and machine learning tasks. The two main update patterns are described below.

Figure 1. Schematic of the bubble-net feeding behavior, which inspires the WOA.

(1): Encircling and Searching

At iteration t, let

X (t)

be a whale’s position and

X^{*} (t)

the best solution found so far. The encircling behavior is expressed as Equation (1).

X (t + 1) = X^{*} (t) - A \cdot | B X^{*} (t) - X (t) |,

(1)

where

A = 2 b r - a

and

B = 2 r

, with

r \in [0, 1]

. When

| A | \geq 1

, a random whale

X_{rand}

is selected to promote exploration. Here,

X (t)

,

X^{*} (t)

, and

X_{rand}

\in R^{k}

denote position vectors in a k-dimensional continuous search space, while

A

and

B

are coefficient vectors of the same dimensionality.

(2): Spiral updating

The bubble-net (exploitation) phase is modeled by a logarithmic spiral around the prey, as defined in Equation (2).

X (t + 1) = X^{*} (t) + | X^{*} (t) - X (t) | e^{b k} cos (2 π k),

(2)

where

k \in [- 1, 1]

is random and b controls spiral contraction. With probability p, the algorithm alternates between the encircling and spiral modes to balance exploration and exploitation. In this context,

X (t)

and

X^{*} (t)

are k-dimensional position vectors, and all vector operations are performed element-wise.

3.2. MUFS Matrix Initialization

In this paper, a multi-view dataset with V views is used, where the v-th view is expressed as a matrix

X^{(v)} \in R^{d_{v} \times n}

(see Equation (3)).

X^{(v)} \in R^{d_{v} \times n},

(3)

with

d_{v}

denoting the number of features in view v, n the number of samples, and

v = 1, \dots, V

indexing each view. To ensure consistent feature learning across views, we define a projection matrix for each view, formally defined in Equation (4):

W^{(v)} \in R^{d_{v} \times k},

(4)

where each row of

W^{(v)}

corresponds to an original feature and each column corresponds to a latent feature dimension. The parameter k denotes the dimensionality of the latent subspace, typically set equal to the number of clusters c to align the learned representation with the underlying clustering structure. All projection matrices

{W^{(v)}}

are then flattened and concatenated into a single vector, as shown in Equation (5).

w = vec ({W^{(v)}}_{v = 1}^{V}) \in R^{\sum_{v = 1}^{V} d_{v} k},

(5)

where

vec (\cdot)

denotes the vectorization operation. This representation enables the optimization problem to be solved directly in a standard continuous search space.

3.3. Sparsity-Guided Whale Optimization Algorithm and Tensor Alternating Learning

This subsection details the SGWOA and TAL (SGWOA-TAL) executed on local clients to tackle incomplete multi-view FS. The approach employs a two-stage optimization framework, namely: (i) SGWOA (Stage 1) to obtain feature selectors

W^{(v)}

, and (ii) TAL (Stage 2) to learn low-dimensional embeddings

Z^{(v)}

for each view. Here,

Z^{(v)}

denotes the low-dimensional embedding of the v-th view, and all embeddings are stacked into a third-order tensor

Z = [Z^{(1)}, . . ., Z^{(V)}] \in R^{k \times n \times V}

for cross-view alignment.

3.3.1. Stage 1: Sparsity-Guided Whale Optimization Algorithm

To optimize

W^{(v)}

, we design a SGWOA. SGWOA inherits the canonical update rules of the WOA (see Equations (1) and (2)), which generate candidate solutions via prey encirclement, stochastic exploration, and spiral bubble-net foraging. A population size of

P = 30

is adopted, as justified by the sensitivity analysis in Section 4.5. Building upon the original framework of WOA, SGWOA integrates three tailored enhancements to tackle the challenges of high-dimensional multi-view FS.

First, proximal

ℓ_{2, 1}

projection. After each update step, a proximal operator associated with the group

ℓ_{2, 1}

norm is applied to enforce row sparsity, as formulated in Equation (6).

W_{i, :}^{(v)} \leftarrow {(1 - \frac{τ}{∥ W_{i, :}^{(v)} ∥_{2} + ϵ})}_{+} \cdot W_{i, :}^{(v)},

(6)

where

W_{i, :}^{(v)}

refers to the i-th row of

W^{(v)}

,

| | \cdot {| |}_{2}

the

ℓ_{2}

norm,

ϵ

a small constant (

10^{- 8}

) to avoid division by zero, and

(\cdot) +

the positive part. This operation shrinks feature weights row-wise toward zero, eliminating uninformative features and yielding sparse, interpretable projections.

Second, fractional-order dynamics. To prevent premature convergence while capturing long-term dependencies in the optimization process, the update coefficients A and B are adjusted according to a fractional-order dynamical scheme, as shown in Equation (7).

A (t) = 2 α^{ν} (t) r - α^{ν} (t), B (t) = 2 r,

(7)

where

r \in [0, 1]

is a random vector, and

α^{ν} (t)

represents a fractional-order decay function with order

0 < ν \leq 1

:

α^{ν} (t) = α_{0} {(1 - \frac{t}{T})}^{ν} .

(8)

Here,

α_{0} = 2

is the initial value (as in the standard WOA []), and T is the maximum number of iterations. When

ν = 1

, the update reduces to the classical linear decay, while smaller

ν

values introduce a memory effect with slower fading, enriching the exploration–exploitation tradeoff. In our implementation, the fractional order

ν

is fixed at

0.8

, chosen a priori based on previous studies [] and empirical validation, which provides a balanced trade-off between memory length and convergence stability and ensures compatibility with the canonical WOA while offering additional robustness. This fractional-order perspective enables the search dynamics to integrate non-local historical information, thereby improving stability and reducing premature convergence in high-dimensional multi-view FS.

Third, fractal-inspired elite kernel injection is employed to strengthen the algorithm’s global search capability. To prevent the population

P

from stagnating in suboptimal regions, the worst-performing individuals are periodically replaced with new candidates sampled in the neighborhood of the current global best X, as defined in Equation (9):

X_{new} = X + σ \cdot N (0, I) .

(9)

Here, the neighborhood is defined in the solution space, representing the vicinity of the current global best

X^{*}

within the FS weight domain. The perturbation scale

σ

controls the exploration radius, and empirical validation shows that setting

σ = 1

achieves a balanced trade-off between exploration and exploitation, ensuring stable convergence across datasets. This fractal-inspired injection maintains population diversity and strengthens the search dynamics, effectively guiding the algorithm toward promising regions of the solution space.

As a result, SGWOA constructs highly discriminative and robust FS weight matrices

W^{(v)}

, laying a solid foundation for subsequent missing-view completion and low-dimensional representation learning. To further ensure search stability and population diversity, candidate solutions are ranked by their NMI-based fitness at each iteration: the top 10% are retained as elites to guide exploitation, whereas the bottom 20% are replaced during the fractal-inspired kernel injection step. Proximity between solutions is measured using the Euclidean distance

∥ w_{i} - w^{*} ∥_{2} < δ

(with

δ = 0.05 \times mean ∥ w ∥

).

3.3.2. Stage 2: Tensor Alternating Learning

After obtaining

W^{(v)}

via SGWOA in Stage 1, the second stage optimizes latent representations and completes missing views within a tensor-based alternating learning framework inspired by TIME-FS [], which has been validated for missing-view completion, compact representation learning, and multi-view alignment. The central objective integrates FS, adaptive imputation, and tensor-based alignment. The overall optimization problem is formulated as shown in Equation (10).

\begin{matrix} min_{\begin{matrix} W^{(v)}, Z^{(v)}, E^{(v)}, \\ A, P, M, α \end{matrix}} & \sum_{v = 1}^{V} α_{v} {∥\underset{{\hat{X}}^{(v)}}{\underset{︸}{X^{(v)} + E^{(v)} ⊙ G^{(v)}}} - W^{(v)} Z^{(v)}∥}_{F}^{2} + λ \sum_{v} tr ({(W^{(v)} ⊙ Q W^{(v)})}^{⊤} W^{(v)}) \\ + γ \sum_{v} ∥ W^{(v)} ∥_{2, 1} + η \sum_{v} ∥ E^{(v)} ∥_{F}^{2} + {∥ Z - ⟦ A, P, M ⟧ ∥}_{F}^{2} \\ s . t . & α_{v} \geq 0, \sum_{v} α_{v} = 1, M 1 = 1, M \geq 0, \end{matrix}

(10)

where

X^{(v)} \in R^{d_{v} \times n}

denotes the v-th view,

G^{(v)}

is its binary mask,

E^{(v)}

contains imputed values,

W^{(v)}

denotes the feature selector,

Z^{(v)}

the low-dimensional embedding,

Z = [Z^{(1)}, \dots, Z^{(V)}]

, Q encodes a structure prior,

α

are non-negative view weights, and

λ

,

γ

,

η

are trade-off parameters controlling different regularization terms. Specifically,

λ

balances the influence of the structural prior Q to preserve feature-level consistency,

γ

regulates the

ℓ_{2, 1}

-norm sparsity of

W^{(v)}

to promote discriminative and compact FS, and

η

constrains the magnitude of the imputation matrix

E^{(v)}

to prevent overfitting and noise amplification.

Specifically, the framework comprises three components with an alternating update scheme:

(1) Adaptive Imputation under Sparsity Constraints

The imputation rule for completing missing values is defined as shown in Equation (11).

{\hat{X}}^{(v)} = X^{(v)} + E^{(v)} ⊙ G^{(v)},

(11)

where ⊙ denotes the element-wise (Hadamard) product and only entries with

G_{i j}^{(v)} = 1

are altered. After learning

W^{(v)}

and

Z^{(v)}

, the imputation is further refined as shown in Equation (12).

E^{(v)} = W^{(v)} (Z^{(v)} G^{(v) ⊤}) .

(12)

Here,

G^{(v)}

again guarantees that updates affect missing positions exclusively.

(2) Tensor-based Anchor Graph Learning

All view embeddings are stacked into a third-order tensor, as shown in Equation (13).

Z = [Z^{(1)}, \dots, Z^{(V)}] \in R^{k \times n \times V},

(13)

where k represents the embedding dimension, n the number of samples, and V the number of views. A rank-r CP factorization is enforced, as shown in Equation (14).

Z \approx ⟦ A, P, M ⟧ = \sum_{j = 1}^{r} a_{j} \circ p_{j} \circ m_{j},

(14)

where

A \in R^{k \times r}

comprises anchor bases,

P \in R^{n \times r}

stores sample-to-anchor affinities,

M \in R^{V \times r}

contains non-negative, row-normalized view coefficients, and ∘ denotes the outer product.

As shown in Equation (14), a rank-r CP decomposition is applied to the tensor Z. This factorization couples feature embeddings from all views into a shared latent space. By constraining Z to a low-rank representation, the decomposition implicitly filters out redundant and noisy information, thereby supporting more effective FS. Meanwhile, the factor matrices

(A, P, M)

provide cross-view correlations that can be exploited to estimate missing elements in Z, which facilitates adaptive imputation and alignment among different views.

Given two factors, each of the remaining factors is updated via alternating least squares: the update of A is given by Equation (15):

\begin{matrix} A & = Z_{(1)} (P ⊙ M) {[{(P ⊙ M)}^{⊤} (P ⊙ M)]}^{- 1}, \end{matrix}

(15)

the update of P is given by Equation (16):

\begin{matrix} P & = Z_{(3)} (M A) {[{(M A)}^{⊤} (M A)]}^{- 1}, \end{matrix}

(16)

and the update of M (via the intermediate Y) is given by Equation (17):

\begin{matrix} Y & = - Z_{(2)} (P A) {[η I + {(P A)}^{⊤} (P A)]}^{- 1}, M = max (\frac{1 + Y 1}{r} 1^{⊤} - Y, 0), \end{matrix}

(17)

where

Z_{(1)}, Z_{(2)}, Z_{(3)}

are the mode-1/2/3 unfoldings of Z, ⊙ denotes the Khatri–Rao product,

η > 0

sparsifies M, and

max (\cdot, 0)

enforces non-negativity.

(3) Alternating Update Strategy

The optimization iterates over four blocks to solve Equation (10): (i) update

E^{(v)}

using Equation (12); (ii) solve for

W^{(v)}

,

Z^{(v)}

, and

α_{v}

by minimizing Equation (10) with other variables fixed; (iii) refine CP factors via Equations (14)–(17); (iv) optionally synchronize

W^{(v)}

with a global optimizer. Steps (i)–(iv) are iterated until convergence or the maximum iteration limit is met.

Figure 2 illustrates the workflow of SGWOA-TAL, while Algorithm 1 provides its implementation details.

Figure 2. Workflow of Sparsity-Guided Whale Optimization and Tensor Alternating Learning.

Algorithm 1: Implementation of SGWOA-TAL

$Fractalfract 09 00717 i001$

3.4. Federated Incomplete Multi-View Unsupervised Feature Selection via Sparsity-Guided Whale Optimization Algorithm and Tensor Alternating Learning

To handle incomplete multi-view FS in distributed scenarios while ensuring privacy, we adopt SGWOA-TAL as the client-side optimization method within a federated learning framework. Furthermore, an aggregation and distribution strategy driven by NMI is devised to enable secure and effective collaboration among clients and the central server. Based on these designs, we propose a federated learning framework named Fed-IMUFS. The overall framework and corresponding pseudocode of Fed-IMUFS are illustrated in Figure 2 and Algorithm 2, respectively. During federated training, the NMI-driven aggregation and distribution mechanism facilitates secure information exchange while ensuring privacy preservation across all participants.

Algorithm 2: Implementation of Fed-IMUFS

$Fractalfract 09 00717 i002$

3.4.1. An Aggregation and Distribution Strategy Driven by Normalized Mutual Information

To address the variations in FS quality inherent in federated learning, we propose an aggregation and distribution strategy driven by NMI. This strategy establishes a bidirectional communication loop, in which high-quality local models are integrated during the aggregation phase, and the optimized global model is subsequently broadcast to guide the next round of local updates.

(1) Local Optimization and Upload. In each federated communication round, client k independently runs the SGWOA–TLA and obtains the optimal feature weight matrix

W_{k}^{★} \in R^{d \times c}

. The client then evaluates the local clustering quality via

q_{k} = NMI (W_{k}^{★})

and uploads the pair

{W_{k}^{★}, q_{k}}

to the central server.

(2) Server-Side Aggregation. After receiving all uploads, the server assigns each client an NMI-driven adaptive weight coefficient, as defined in Equation (18).

γ_{k} = \frac{q_{k}}{\sum_{j = 1}^{K} q_{j}}, \sum_{k = 1}^{K} γ_{k} = 1,

(18)

where K denotes the number of participating clients in the current round. Using these coefficients, the server aggregates the local models into a global feature-weight matrix, as shown in Equation (19).

\tilde{W} = \sum_{k = 1}^{K} γ_{k} W_{k}^{★} .

(19)

(3) Global Model Distribution. The server then broadcasts the aggregated matrix

\tilde{W}

to all clients. This matrix provides a more discriminative initialization for the next local optimization, effectively transferring knowledge from high-quality models, mitigating data heterogeneity, and accelerating convergence.

This strategy uploads only the local feature matrix and a scalar NMI value, thereby preserving data privacy while letting clients with higher feature-selection quality exert greater influence on the global model. Consequently, it suppresses the negative impact of low-quality models and enhances overall stability and performance.

3.4.2. Privacy and Communication Overhead Analysis of Fed-IMUFS

As illustrated in Figure 3, the proposed Fed-IMUFS framework enables each client to perform FS locally using SGWOA-TAL. It adopts an aggregation and distribution strategy driven by NMI, where each client uploads its locally optimized FS weights along with the corresponding NMI value to the server for aggregation. The server has a single role—to handle aggregation and distribution management, ensuring that all clients exchange information indirectly while maintaining data privacy. The aggregated global model is then broadcast back to all clients, enabling privacy-preserving collaborative optimization. By ensuring that raw data never leaves local devices, this strategy markedly decreases the possibility of data leakage. Since only abstract model parameters (i.e., the feature weight matrix and NMI scores) are shared, the system’s privacy protection capability is substantially enhanced.

Figure 3. The framework of Fed-IMUFS. (a) Upload the locally optimal feature weight matrix while keeping the original data stored locally; (b) The server aggregates the optimal weight matrix information received from all clients; (c) The server distributes the aggregated results back to each client to further guide their local optimization.

Furthermore, benefiting from the powerful local optimization ability of SGWOA-TAL, experimental results show that Fed-IMUFS typically reaches a globally optimal solution in just 2–3 communication rounds, which significantly reduces communication frequency. In each round, clients upload their locally optimal FS parameters; the server performs weighted aggregation to construct the global model and broadcasts it back to all clients. The communication cost of this process is

O (d \cdot c)

, where d denotes the feature dimensionality and c represents the number of clusters (or the dimensionality of the latent subspace). Compared with transmitting raw data directly, this cost is substantially lower, thereby reducing bandwidth usage and further enhancing privacy protection.

3.4.3. Time Complexity Analysis of Fed-IMUFS

The computational cost of Fed-IMUFS mainly arises from client-side optimization, along with a lightweight aggregation on the server, repeated over R communication rounds. On the client side, each participant runs the two-stage SGWOA-TAL algorithm to optimize a local FS matrix. In the first stage, SGWOA, the algorithm evolves a population of P individuals over

T_{1}

iterations. Each individual represents a vectorized matrix of size

d \times c

, where d is the feature dimension and c is the number of clusters (i.e., the latent subspace dimension). Each iteration includes three position-update strategies, a row-wise

L_{2, 1}

proximal projection for inducing sparsity, and fitness evaluation based on clustering quality (e.g., NMI), all with a per-iteration cost of

O (P d c)

. Therefore, the overall complexity for Stage 1 is

O (T_{1} P d c)

.

The second stage, TAL, involves

T_{2}

iterations consisting of adaptive imputation, joint optimization, and tensor decomposition. For a dataset with V views and n samples, each TAL iteration performs imputation with cost

O (V d n)

, optimizes the objective function in Equation (10) with cost

O (V d n c)

, constructs the tensor

Z

with cost

O (V c n)

, and updates CP factors

(A, P, M)

using ALS, whose complexity is

O (r (c + n + V))

, where r is the tensor rank. Thus, the total cost for Stage 2 is

O (T_{2} (V d n c + r (c + n + V)))

. Combining both stages, the total cost per client in a single communication round is

O (T_{1} P d c + T_{2} (V d n c + r (c + n + V)))

, and for K clients, this scales to

O (K (T_{1} P d c + T_{2} (V d n c + r (c + n + V))))

.

On the server side, in each round, the server collects the weight matrices and NMI scores from all K clients, computes adaptive weight coefficients, aggregates K matrices of size

d \times c

, and broadcasts the global model. These operations require a total cost of

O (K d c)

per round. Therefore, considering R rounds of communication, the total time complexity of the Fed-IMUFS training procedure is given by

O (R K (T_{1} P d c + T_{2} (V d n c + r (c + n + V)) + d c))

, where the overall cost is clearly dominated by the client-side optimization. Nonetheless, experimental results demonstrate that Fed-IMUFS typically reaches convergence within just 2–3 rounds, confirming its practical efficiency and scalability for large-scale distributed systems such as IIoT environments.

The final time complexity of the complete Fed-IMUFS model across R federated rounds is:

O (R K [T_{1} P d c + T_{2} (V d n c + r (c + n + V))])

(20)

Here, the leading order term arises from client-side iterative optimization and tensor learning, dominated by the product of R federated rounds and K clients. The first part corresponds to SGWOA’s population-based search

(T_{1} P d c)

, while the second part represents the TAL stage with imputation and CP decomposition.

4. Experiments and Analysis

This section first presents the overall experimental setup, including dataset descriptions, experimental environment, baseline methods, and parameter configurations. Then, a comprehensive analysis of the proposed approach is conducted, including SGWOA’s global optimization capability, the performance of Fed-IMUFS, and several ablation studies. Finally, a statistical significance analysis is performed on Fed-IMUFS to further validate its effectiveness.

4.1. Experimental Setup

This subsection presents the dataset description, experimental setup, baseline algorithms and parameter settings, and the comparison schemes.

4.1.1. Dataset Description

For the client-side SGWOA-TAL optimization, we focus on validating the global optimization capability of the first-stage metaheuristic algorithm—SGWOA—using the CEC2022 benchmark function set []. This is because the effectiveness of the second-stage TAL module has already been established by the TIME-FS framework []. To evaluate the proposed Fed-IMUFS framework in real-world scenarios, we perform comprehensive experiments on eight multi-view datasets [,], covering a variety of domains including object recognition, text classification, handwritten digit recognition, and face recognition. Table 1 summarizes six representative datasets selected for detailed analysis.

Table 1. Dataset description.

The incomplete multi-view scenario is simulated using the missing-view generation strategy of [], where a specified fraction of samples is randomly omitted from each view. The robustness of the model is then evaluated under different levels of missingness with missing ratios (MR) set to 10%, 20%, 30%, 40%, 50%.

4.1.2. Experimental Environment

All experiments are performed using MATLAB R2022a on a Windows-based desktop environment. The system is configured with an Intel Core i7-13700KF processor operating at 3.40 GHz, 32 GB of DDR4 RAM, an NVIDIA RTX 3060 GPU with 12 GB of dedicated memory, and 2.29 TB of SSD storage. This hardware setup ensures sufficient computational capacity for metaheuristic optimization, multi-view FS, and federated learning experiments performed in this study.

4.1.3. Comparison Algorithms and Parameter Configuration

For the purpose of benchmarking the global optimization ability of SGWOA, we compare it against three classical metaheuristic algorithms—PSO [], GA [], and ACO []; the original WOA as a baseline; two enhanced WOA variants, namely MSBWO [] and ImWOA []; and two recently proposed optimization techniques, CHBSI [] and CEO []. To assess the effectiveness of the proposed Fed-IMUFS, which is the first federated method for MUFS, we compare it with several representative centralized MUFS approaches, including TIME-FS [] and TERUIMUFS [], both representing state-of-the-art MUFS methods, TRCA-CGL [], SCMvFS [], JMVFG [], SDFS [], and CDMvFS []. For completeness, we include a baseline that uses the full set of original features (AllFea).

The main parameters of Fed-IMUFS are set as follows: the number of federated communication rounds is 10; the trade-off parameters are

γ = 6

,

λ = 0.01

, and

η = 1

; both the subspace dimension and the number of anchors are set to c; and the maximum number of iterations is set to 100. For the first stage of SGWOA, the population size is set to P = 30. For all baseline methods, the algorithm-specific parameters are configured according to the original literature.

4.1.4. Comparison Schemes

To ensure a fair comparison, methods that do not natively handle incomplete multi-view data are subjected to a unified preprocessing pipeline: except for TIME-FS, all baselines apply mean imputation to missing entries prior to FS. For centralized baselines, we follow the standard practice of evaluating them on the full dataset, where all samples and views are aggregated into a single collection and missing entries are imputed when necessary. This setting assumes idealized full access to data and thereby serves as an upper-bound reference. In contrast, our Fed-IMUFFS is evaluated in a genuine federated environment, where the dataset is randomly partitioned across K clients and no raw data is exchanged, thus preserving privacy. We then conduct a grid-based sensitivity analysis over four key hyperparameters: the sparsity weight

λ \in {0.001, 0.005, 0.01, 0.05}

, the view-weighting exponent

γ \in {1.5, 2, 3, 4, 5, 6}

, the anchor-graph regularization coefficient

η \in {0.01, 0.1, 1, 10, 100}

, and the population size

P \in {10, 20, 30, 40, 50}

used in the external optimization stage. Since the optimal number of selected features is not known a priori in unsupervised settings [], we sweep the feature ratio (FR) from 10% to 90% in 10% increments to assess robustness across dimensionalities. All methods are evaluated with k-means clustering, using clustering accuracy (ACC) and Normalized Mutual Information (NMI) as evaluation metrics []. Each experiment is repeated 30 times, and the average result is reported.

4.2. Global Optimization Analysis of SGWOA

To provide a comprehensive assessment of the global optimization ability of the proposed SGWOA, we carried out 30 independent runs on the CEC2022 benchmark suite. Each run was executed with 500 iterations under a problem dimension of 20. Figure 4 and Figure 5 present the convergence trajectories of SGWOA alongside several representative algorithms, offering a clear comparison of their search dynamics and convergence behaviors. For relatively simple unimodal functions, including F1, F2, F6, and F12, SGWOA converges remarkably fast, reaching near-optimal values within only 10 iterations. This demonstrates its strong exploitation ability and capacity to rapidly refine promising search regions. On functions such as F3, F8, F9, and F10, the convergence is slower and typically stabilizes after about 100 iterations, reflecting a more balanced interaction between exploration and exploitation. For the more challenging multimodal functions F4, F5, F7, and F11, the algorithm requires a longer search horizon; nevertheless, it continues to improve steadily and often reaches the optimum around 200 iterations, highlighting its robustness and resistance to premature stagnation. When compared with other state-of-the-art algorithms, SGWOA consistently exhibits superior performance both in convergence speed and final solution quality. Its ability to adaptively maintain population diversity while steadily guiding the search towards optimal regions enables it to achieve reliable results across nearly all benchmark functions. These findings confirm that SGWOA is not only efficient and accurate but also stable, making it highly suitable for solving a wide range of complex optimization tasks with varying landscape characteristics.

Figure 4. Convergence curves of the compared algorithms on the first six benchmark functions of CEC2022.

Figure 5. Convergence curves of the compared algorithms on the last six benchmark functions of CEC2022.

4.3. Comparative Performance Analysis of Fed-IMUFS

To guarantee a fair comparison, we apply a consistent preprocessing pipeline to all methods that are unable to directly handle incomplete multi-view data. Specifically, with the exception of TIME-FS, missing entries in all baseline methods are first filled using mean imputation prior to FS. For centralized approaches, we follow the conventional evaluation protocol: they are trained on the fully aggregated dataset, where all available samples and views are pooled together, and missing values are imputed when necessary. This protocol effectively assumes global data availability and thus serves as an optimistic upper-bound reference.

In contrast, Fed-IMUFFS is deployed in a realistic federated scenario, where the dataset is randomly partitioned across K clients, and raw data never leaves the local devices, ensuring privacy preservation. Within this setting, we conduct a grid-based sensitivity study on four key hyperparameters: the sparsity weight

λ \in 0.001, 0.005, 0.01, 0.05

, the view-weighting exponent

γ \in 1.5, 2, 3, 4, 5, 6

, the anchor-graph regularization parameter

η \in 0.01, 0.1, 1, 10, 100

, and the population size

P \in 10, 20, 30, 40, 50

employed in the external optimizer.

Since the optimal number of selected features is unknown in unsupervised learning [], we additionally vary the feature ratio (FR) from 10% to 90% in increments of 10% to examine robustness against dimensionality changes. All methods are ultimately evaluated using k-means clustering, and ACC together with NMI [] are reported as evaluation metrics. Each experiment is repeated 30 independent times, and the mean performance is reported.

Figure 6 and Figure 7 illustrate the performance of different methods in terms of ACC and NMI under varying FR(10–90%) with a fixed MR of 40%. As shown, Fed-IMUFS consistently outperforms all competing methods across the entire range of FS ratios. Specifically, on the Caltech101, ORL_mtv, and WebKB_mtv datasets, Fed-IMUFS achieves an average improvement of over 11% in both ACC and NMI compared with the best centralized approaches (TIME-FS and TERUIMUFS). On the COIL20, Digit4k, and HandWritten datasets, the average improvement exceeds 7% over the second-best methods, and Fed-IMUFS also demonstrates superior performance on the remaining datasets. Figure 8 and Figure 9 present the results with varying MR values while keeping the FR fixed at 40%. Across all datasets, Fed-IMUFS continues to achieve consistently higher ACC and NMI than other competitors. These improvements in ACC and NMI suggest that Fed-IMUFS can extract more discriminative and stable features from incomplete multi-view data, thereby producing more reliable clustering results in realistic privacy-preserving federated scenarios.

Figure 6. ACC of various methods on eight datasets under varying feature selection ratios (5 Clients).

Figure 7. NMI of various methods on eight datasets under varying feature selection ratios (5 Clients).

Figure 8. ACC of various methods on eight datasets with varying missing ratios (5 Clients).

Figure 9. NMI of various methods on eight datasets with varying missing ratios (5 Clients).

To evaluate the runtime efficiency and computational complexity of the proposed Fed-IMUFS, three datasets—COIL20, Caltech101, and HandWritten—are selected for comparison. The FR and MR are both set to 40%, and five clients are used in the federated configuration. Table 2 shows the average running time (averaged over 30 independent runs) and the theoretical time complexity of Fed-IMUFS compared with all baseline algorithms. The results show that Fed-IMUFS exhibits linear time complexity with respect to the number of samples n in each view.

Table 2. Average running time (s) (bold indicates the best result) and time complexity of different methods on three datasets, where

d = \sum_{v = 1}^{V} d_{v}

.

On COIL20 and Caltech101, its runtime is slightly lower than that of TIME-FS, while it achieves the shortest runtime and the best overall efficiency on HandWritten.

Figure 10 further illustrates the convergence performance of Fed-IMUFS compared with two state-of-the-art methods on the COIL20 and HandWritten datasets. It can be observed that Fed-IMUFS converges faster and attains a better optimum than the competing approaches.

Figure 10. Convergence performance of Fed-IMUFS compared with two state-of-the-art methods on the COIL20 and HandWritten datasets.

The superior performance of Fed-IMUFS stems from its integrated design, which combines the SGWOA-based feature optimizer, adaptive missing-view imputation, anchor-graph learning, and an NMI-driven aggregation and distribution strategy within a unified FL framework. This synergistic integration allows each component to complement the others, significantly enhancing overall performance and clustering quality.

4.4. Ablation Study

To evaluate the contribution of each proposed component in Fed-IMUFS, we design a series of ablation experiments and compared them with several variants: (i) Fed-IMUFS-I: It removes the first-stage SGWOA optimization and relies only on the second-stage TAL optimization; (ii) Fed-IMUFS-II: It removes the second-stage TAL optimization, relying solely on the first-stage SGWOA optimization, with missing values imputed by mean filling; (iii) Fed-IMUFS-III: To verify the effectiveness of each component in the first-stage SGWOA, this variant uses the basic WOA optimizer while retaining the second-stage TAL optimization; (iv) Fed-IMUFS-IV: It removes the

L_{2, 1}

proximal projection in SGWOA while retaining TAL optimization; (v) Fed-IMUFS-V: It removes the adaptive diversity control mechanism in SGWOA while retaining TAL optimization; (vi) Fed-IMUFS-VI: It removes the elite kernel injection mechanism in SGWOA while retaining TAL optimization; (vii) Fed-IMUFS-VII: It removes the NMI-driven federated aggregation and allocation strategy, replacing it with a simple average-weighted aggregation scheme. Table 3 reports the average results over 30 runs on eight datasets with a MR of 40%, a FR of 30%, and 5 clients.

Table 3. Comparison of ACC(%) and NMI(%) between IMUFFS, its ablation variants, and Fed-IMUFFS-VII, where * indicates significant improvement (Wilcoxon test,

p < 0.05

).

The experimental results demonstrate three main findings: (1) Compared with full Fed-IMUFS, the performance of Fed-IMUFS-III, Fed-IMUFS-IV, Fed-IMUFS-V, and Fed-IMUFS-VI drops significantly, confirming the essential role of SGWOA enhancements (i.e.,

L_{2, 1}

proximal projection, adaptive diversity control, and elite kernel injection) in boosting performance and laying a stronger foundation for adaptive missing-view imputation and FS in the TAL stage; (2) Fed-IMUFS-II performs substantially worse, showing that incorporating TAL with adaptive view imputation, consistent anchor graphs, and the alternating update strategy is vital for improving local structure modeling and clustering accuracy; (3) Fed-IMUFS-VII also exhibits degraded performance, indicating that replacing the NMI-driven federated aggregation and distribution strategy with a simple average-weighted scheme reduces global consistency and accuracy. This validates the effectiveness of the proposed NMI-driven aggregation and distribution strategy in enhancing system-wide performance.

4.5. Parameter Sensitivity Analysis

Fed-IMUFS involves four key tuning parameters:

γ

,

η

,

λ

, and the population size P. In the experiments, FR is fixed at 40% and MR at 30%, with five clients participating. The algorithm’s performance is assessed under various parameter configurations on multiple datasets. Each result is averaged over 30 independent runs, as presented in Figure 11, Figure 12 and Figure 13, and the ACC performance on the Caltech101, COIL20, and Digit4k datasets is reported. As illustrated in Figure 11, Figure 12 and Figure 13, Fed-IMUFS consistently achieves the best performance on all three datasets when

γ = 6

,

η = 1

,

λ = 0.1

, and

P = 30

. Moreover, the parameter P shows a relatively stable influence on the overall performance of Fed-IMUFS.

Figure 11. ACC of Fed-IMUFS on the Caltech101 dataset across varying values of

γ

,

η

,

λ

, and P.

Figure 12. ACC of Fed-IMUFS on the Coil20 dataset across varying values of

γ

,

η

,

λ

, and P.

Figure 13. ACC of Fed-IMUFS on the Digit4k dataset across varying values of

γ

,

η

,

λ

, and P.

4.6. Analysis of Statistical Significance

To investigate statistical differences between Fed-IMUFS and competing approaches, the experimental setup fixes the FR at 40% and the MR at 30%, involving five federated clients. ACC is adopted as the evaluation index. For significance testing, the Wilcoxon signed-rank test is applied at a 0.05 level, and the pairwise comparison outcomes are provided in Table 4. A p-value smaller than 0.05 indicates that the performance gap is statistically significant, while larger values imply that no reliable distinction can be established. To complement p-value, effect sizes are also reported:

(0, 0.2]

denotes a small effect,

(0.2, 0.5]

a medium effect, and

(0.5, 0.8]

a large effect. In the table, the markers +”, ≈”, and “−” correspond to Fed-IMUFS being better, statistically equivalent, or worse than the alternative, respectively.

Table 4. Wilcoxon signed-rank test results across eight datasets under

α = 0.05

.

Empirical evidence shows that Fed-IMUFS achieves superior results across most datasets. When compared with TIME-FS, it yields improvements on six datasets, produces comparable outcomes on one dataset, and performs worse on one dataset. The associated p-value of

1.1511 \times 10^{- 2}

confirms statistical significance, while an effect size of 0.4547 suggests a medium-level advantage. Extending the comparison to other baselines, Fed-IMUFS demonstrates statistically significant gains on eight datasets, underscoring the method’s robustness and stable performance improvements across diverse conditions.

5. Conclusions

In this paper, we introduce Fed-IMUFS, an FL-based approach for incomplete MUFS, integrating SGWOA with TAL and an NMI-driven federated aggregation and distribution strategy. Locally, SGWOA enforces row sparsity and strengthens global search, while TAL completes missing views and jointly refines low-dimensional representations to preserve cross-view consistency. Globally, the server aggregates only weight matrices and quality indicators via NMI and redistributes a unified FS model, achieving client-wide consistency with strong privacy protection. Experiments on CEC2022 and multiple real-world datasets show that Fed-IMUFS surpasses state-of-the-art methods in global optimization, FS quality, convergence stability, missing-view completion, and computational efficiency. These advantages confirm the practical value of Fed-IMUFS for deployment in real distributed environments where data privacy and incompleteness are key concerns.

Nevertheless, this study has certain limitations. In FL environments, the communication overhead may increase with the number of participating clients. In addition, the current imputation strategy primarily focuses on numerical completion rather than semantic understanding. Future work will explore more advanced imputation and aggregation mechanisms, reduce communication costs, and extend the framework to larger-scale, more heterogeneous datasets, as well as to online or stream-processing scenarios, thereby further validating its scalability and generalization capability.

Author Contributions

Conceptualization, Y.Y. and W.W.; methodology, Y.Y. and C.-A.X.; software, Y.Y.; validation, C.-A.X., W.Z. and C.J.; formal analysis, Y.Y.; investigation, W.Z. and C.J.; resources, W.W. and C.-A.X.; data curation, W.Z.; writing—original draft preparation, Y.Y.; writing—review and editing, W.W.; visualization, Y.Y.; supervision, W.W.; project administration, W.W.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data used and/or analyzed in this study are openly available, with access details and relevant dataset references provided in the manuscript. Further details can be obtained from the corresponding author upon reasonable request.

Conflicts of Interest

The authors confirm that they have no competing interests.

References

Lv, J.; Kang, Z.; Wang, B.; Ji, L.; Xu, Z. Multi-view subspace clustering via partition fusion. Inf. Sci. 2021, 560, 410–423. [Google Scholar] [CrossRef]
Wan, Y.; Sun, S.; Zeng, C. Adaptive similarity embedding for unsupervised multi-view feature selection. IEEE Trans. Knowl. Data Eng. 2020, 33, 3338–3350. [Google Scholar] [CrossRef]
Tang, J.; Li, D.; Tian, Y. Image classification with multi-view multi-instance metric learning. Expert Syst. Appl. 2022, 189, 116117. [Google Scholar] [CrossRef]
Ma, W.; Zhou, X.; Zhu, H.; Li, L.; Jiao, L. A two-stage hybrid ant colony optimization for high-dimensional feature selection. Pattern Recognit. 2021, 116, 107933. [Google Scholar] [CrossRef]
Hao, P.; Gao, W.; Hu, L. Embedded feature fusion for multi-view multi-label feature selection. Pattern Recognit. 2025, 157. [Google Scholar] [CrossRef]
Ran, W.; Yuan, W.; Zheng, Y. You Always Recognize Me (YARM): Robust Texture Synthesis Against MultiView Corruption. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025. [Google Scholar]
Hu, R.; Gan, J.; Zhan, M.; Li, L.; Wei, M. Unsupervised Kernel-based Multi-view Feature Selection with Robust Self-representation and Binary Hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 17287–17294. [Google Scholar]
Duan, M.; Song, P.; Zhou, S.; Mu, J.; Liu, Z. Consensus and discriminative non-negative matrix factorization for multi-view unsupervised feature selection. Digit. Signal Process. 2024, 154, 104668. [Google Scholar] [CrossRef]
Liang, C.; Wang, L.; Liu, L.; Zhang, H.; Guo, F. Multi-view unsupervised feature selection with tensor robust principal component analysis and consensus graph learning. Pattern Recognit. 2023, 141, 109632. [Google Scholar] [CrossRef]
Xu, W.; Huang, M.; Jiang, Z.; Qian, Y. Graph-based unsupervised feature selection for interval-valued information system. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 12576–12589. [Google Scholar] [CrossRef]
Mi, Y.; Chen, H.; Luo, C.; Horng, S.J.; Li, T. Unsupervised feature selection with high-order similarity learning. Knowl.-Based Syst. 2024, 285, 111317. [Google Scholar] [CrossRef]
Li, Z.; Nie, F.; Bian, J.; Wu, D.; Li, X. Sparse PCA via ℓ_2,p-Norm Regularization for Unsupervised Feature Selection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 45, 5322–5328. [Google Scholar]
Cao, Z.; Xie, X. Structure learning with consensus label information for multi-view unsupervised feature selection. Expert Syst. Appl. 2024, 238, 121893. [Google Scholar] [CrossRef]
Fang, S.G.; Huang, D.; Wang, C.D.; Tang, Y. Joint multi-view unsupervised feature selection and graph learning. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 8, 16–31. [Google Scholar] [CrossRef]
Zhou, S.; Song, P.; Yu, Y.; Zheng, W. Structural regularization based discriminative multi-view unsupervised feature selection. Knowl.-Based Syst. 2023, 272, 110601. [Google Scholar] [CrossRef]
Amara, A.; Taieb, M.A.H.; Aouicha, M.B. A multi-view GNN-based network representation learning framework for recommendation systems. Neurocomputing 2025, 619, 129001. [Google Scholar] [CrossRef]
Yu, H.T.; Song, M. Mm-point: Multi-view information-enhanced multi-modal self-supervised 3d point cloud understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 6773–6781. [Google Scholar]
Wang, X.; Zhang, Z.; Shen, G.; Lai, S.; Chen, Y.; Zhu, S. Multi-view knowledge graph convolutional networks for recommendation. Appl. Soft Comput. 2025, 169, 112633. [Google Scholar] [CrossRef]
Han, T.; Gong, X.; Feng, F.; Zhang, J.; Sun, Z.; Zhang, Y. Privacy-preserving multi-source domain adaptation for medical data. IEEE J. Biomed. Health Inform. 2022, 27, 842–853. [Google Scholar] [CrossRef] [PubMed]
Wang, H.; Yao, M.; Chen, Y.; Xu, Y.; Liu, H.; Jia, W.; Wang, Y. Manifold-based incomplete multi-view clustering via bi-consistency guidance. IEEE Trans. Multimed. 2024, 26, 10001–10014. [Google Scholar] [CrossRef]
Yin, J.; Wang, P.; Sun, S.; Zheng, Z. Incomplete Multi-View Clustering via Multi-Level Contrastive Learning. IEEE Trans. Knowl. Data Eng. 2025, 37, 4716–4727. [Google Scholar] [CrossRef]
Huang, Y.; Lu, M.; Huang, W.; Yi, X.; Li, T. Time-fs: Joint learning of tensorial incomplete multi-view unsupervised feature selection and missing-view imputation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 17503–17510. [Google Scholar]
Chen, X.; Ren, Y.; Xu, J.; Lin, F.; Pu, X.; Yang, Y. Bridging gaps: Federated multi-view clustering in heterogeneous hybrid views. In Proceedings of the Advances in Neural Information Processing Systems 37 (NeurIPS 2024), Vancouver, BC, Canada, 9–15 December 2024; pp. 37020–37049. [Google Scholar]
Gao, M.; Zheng, H.; Feng, X.; Tao, R. Multimodal fusion using multi-view domains for data heterogeneity in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 16736–16744. [Google Scholar]
Banerjee, S.; Bhuyan, D.; Elmroth, E.; Bhuyan, M. Cost-efficient feature selection for horizontal federated learning. IEEE Trans. Artif. Intell. 2024, 5, 6551–6565. [Google Scholar] [CrossRef]
Mahanipour, A.; Khamfroush, H. FMLFS: A federated multi-label feature selection based on information theory in IoT environment. In Proceedings of the 2024 IEEE International Conference on Smart Computing (SMARTCOMP), Dublin, Ireland, 29 June–3 July; IEEE: Piscataway, NJ, USA, 2024; pp. 166–173. [Google Scholar]
Bohrer, J.D.S.; Dorn, M. Enhancing classification with hybrid feature selection: A multi-objective genetic algorithm for high-dimensional data. Expert Syst. Appl. 2024, 255, 124518. [Google Scholar] [CrossRef]
Song, X.; Zhang, Y.; Gong, D.; Liu, H.; Zhang, W. Surrogate sample-assisted particle swarm optimization for feature selection on high-dimensional data. IEEE Trans. Evol. Comput. 2022, 27, 595–609. [Google Scholar] [CrossRef]
Marzbani, F.; Osman, A.H.; Hassan, M.S. Two-Stage Hybrid Feature Selection: Integrating ACO Algorithms with a Statistical Ensemble Technique for EV Demand Prediction. IEEE Trans. Ind. Appl. 2025, 61, 5091–5102. [Google Scholar] [CrossRef]
Fathi, I.S.; El-Saeed, A.R.; Hassan, G.; Aly, M. Fractional Chebyshev Transformation for Improved Binarization in the Energy Valley Optimizer for Feature Selection. Fractal Fract. 2025, 9, 521. [Google Scholar] [CrossRef]
Meng, Z.; Hu, Y.; Jiang, S.; Zheng, S.; Zhang, J.; Yuan, Z.; Yao, S. Slope Deformation Prediction Combining Particle Swarm Optimization-Based Fractional-Order Grey Model and K-Means Clustering. Fractal Fract. 2025, 9, 210. [Google Scholar] [CrossRef]
Kartci, A.; Agambayev, A.; Farhat, M.; Herencsar, N.; Brancik, L.; Bagci, H.; Salama, K.N. Synthesis and optimization of fractional-order elements using a genetic algorithm. IEEE Access 2019, 7, 80233–80246. [Google Scholar] [CrossRef]
Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
Liang, Z.; Shu, T.; Ding, Z. A novel improved whale optimization algorithm for global optimization and engineering applications. Mathematics 2024, 12, 636. [Google Scholar] [CrossRef]
Wu, L.; Xu, D.; Guo, Q.; Chen, E.; Xiao, W. A nonlinear randomly reuse-based mutated whale optimization algorithm and its application for solving engineering problems. Appl. Soft Comput. 2024, 167, 112271. [Google Scholar] [CrossRef]
Miao, F.; Wu, Y.; Yan, G.; Si, X. A memory interaction quadratic interpolation whale optimization algorithm based on reverse information correction for high-dimensional feature selection. Appl. Soft Comput. 2024, 164, 111979. [Google Scholar] [CrossRef]
Deng, L.; Liu, S. Deficiencies of the whale optimization algorithm and its validation method. Expert Syst. Appl. 2024, 237, 121544. [Google Scholar] [CrossRef]
Li, L.L.; Fan, X.D.; Wu, K.J.; Sethanan, K.; Tseng, M.L. Multi-objective distributed generation hierarchical optimal planning in distribution network: Improved beluga whale optimization algorithm. Expert Syst. Appl. 2024, 237, 121406. [Google Scholar] [CrossRef]
Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 2002, 1, 67–82. [Google Scholar] [CrossRef]
He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. In Proceedings of the Advances in Neural Information Processing Systems 18 (NIPS 2005), Vancouver, BC, Canada, 5–8 December 2005. [Google Scholar]
Zhao, Z.; Liu, H. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 1151–1157. [Google Scholar]
Shi, D.; Zhu, L.; Li, J.; Zhang, Z.; Chang, X. Unsupervised adaptive feature selection with binary hashing. IEEE Trans. Image Process. 2023, 32, 838–853. [Google Scholar] [CrossRef] [PubMed]
Wang, C.; Wang, J.; Gu, Z.; Wei, J.M.; Liu, J. Unsupervised feature selection by learning exponential weights. Pattern Recognit. 2024, 148, 110183. [Google Scholar] [CrossRef]
Li, T.; Qian, Y.; Li, F.; Liang, X.; Zhan, Z.H. Feature subspace learning-based binary differential evolution algorithm for unsupervised feature selection. IEEE Trans. Big Data 2025, 11, 99–114. [Google Scholar] [CrossRef]
Fan, Z.; Xiao, Z.; Li, X.; Huang, Z.; Zhang, C. MSBWO: A Multi-Strategies Improved Beluga Whale Optimization Algorithm for Feature Selection. Biomimetics 2024, 9, 572. [Google Scholar] [CrossRef]
Miao, F.; Wu, Y.; Yan, G.; Si, X. Dynamic multi-swarm whale optimization algorithm based on elite tuning for high-dimensional feature selection classification problems. Appl. Soft Comput. 2025, 169, 112634. [Google Scholar] [CrossRef]
Pramanik, P.; Pramanik, R.; Naskar, A.; Mirjalili, S.; Sarkar, R. U-WOA: An unsupervised whale optimization algorithm based deep feature selection method for cancer detection in breast ultrasound images. In Handbook of Whale Optimization Algorithm; Academic Press: New York, NY, USA, 2024; pp. 179–191. [Google Scholar]
Nie, F.; Zhu, W.; Li, X. Unsupervised feature selection with structured graph optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
Wang, Z.; Feng, Y.; Qi, T.; Yang, X.; Zhang, J.J. Adaptive multi-view feature selection for human motion retrieval. Signal Process. 2016, 120, 691–701. [Google Scholar] [CrossRef]
Cao, Z.; Xie, X. Multi-view unsupervised complementary feature selection with multi-order similarity learning. Knowl.-Based Syst. 2024, 283, 111172. [Google Scholar] [CrossRef]
Cao, Z.; Xie, X.; Li, Y. Multi-view unsupervised feature selection with consensus partition and diverse graph. Inf. Sci. 2024, 661, 120178. [Google Scholar] [CrossRef]
Liu, Q.; Liu, S.; Liu, X.; Dai, J. Latent Semantics and Anchor Graph Multi-layer Learning for Multi-view Unsupervised Feature Selection. IEEE Trans. Knowl. Data Eng. 2025, 37, 6032–6045. [Google Scholar] [CrossRef]
Yang, X.; Che, H.; Leung, M.F. Tensor-based unsupervised feature selection for error-robust handling of unbalanced incomplete multi-view data. Inf. Fusion 2025, 114, 102693. [Google Scholar] [CrossRef]
Yu, H.W.; Wu, J.Y.; Wu, J.S.; Min, W. Confident local similarity graphs for unsupervised feature selection on incomplete multi-view data. Knowl.-Based Syst. 2025, 316, 113369. [Google Scholar] [CrossRef]
Fan, S.; Wang, R.; Su, K. A novel metaheuristic algorithm: Advanced social memory optimization. Phys. Scr. 2025, 100, 055004. [Google Scholar] [CrossRef]
Xie, X.; Li, Y.; Sun, S. Deep multi-view multiclass twin support vector machines. Inf. Fusion 2023, 91, 80–92. [Google Scholar] [CrossRef]
Ma, Y.; Shen, X.; Wu, D.; Cao, J.; Nie, F. Cross-view approximation on grassmann manifold for multiview clustering. IEEE Trans. Neural Netw. Learn. Syst. 2024, 36, 7772–7777. [Google Scholar] [CrossRef]
Lin, Y.; Gou, Y.; Liu, X.; Bai, J.; Lv, J.; Peng, X. Dual contrastive prediction for incomplete multi-view representation learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4447–4461. [Google Scholar] [CrossRef] [PubMed]
Chakraborty, S.; Sharma, S.; Saha, A.K.; Saha, A. A novel improved whale optimization algorithm to solve numerical optimization and real-world applications. Artif. Intell. Rev. 2022, 55, 4605–4716. [Google Scholar] [CrossRef]
Mei, M.; Zhang, S.; Ye, Z.; Wang, M.; Zhou, W.; Yang, J.; Shen, J. A cooperative hybrid breeding swarm intelligence algorithm for feature selection. Pattern Recognit. 2026, 169, 111901. [Google Scholar] [CrossRef]
Dong, Y.; Zhang, S.; Zhang, H.; Zhou, X.; Jiang, J. Chaotic evolution optimization: A novel metaheuristic algorithm inspired by chaotic dynamics. Chaos Solitons Fractals 2025, 192, 116049. [Google Scholar] [CrossRef]
Li, J.; Cheng, K.; Wang, S.; Morstatter, F.; Trevino, R.P.; Tang, J.; Liu, H. Feature selection: A data perspective. ACM Comput. Surv. (CSUR) 2017, 50, 1–45. [Google Scholar] [CrossRef]

$Fractalfract 09 00717 g001$

Figure 1. Schematic of the bubble-net feeding behavior, which inspires the WOA.

$Fractalfract 09 00717 g002$

Figure 2. Workflow of Sparsity-Guided Whale Optimization and Tensor Alternating Learning.

$Fractalfract 09 00717 g003$

Figure 3. The framework of Fed-IMUFS. (a) Upload the locally optimal feature weight matrix while keeping the original data stored locally; (b) The server aggregates the optimal weight matrix information received from all clients; (c) The server distributes the aggregated results back to each client to further guide their local optimization.

$Fractalfract 09 00717 g004$

Figure 4. Convergence curves of the compared algorithms on the first six benchmark functions of CEC2022.

$Fractalfract 09 00717 g005$

Figure 5. Convergence curves of the compared algorithms on the last six benchmark functions of CEC2022.

$Fractalfract 09 00717 g006$

Figure 6. ACC of various methods on eight datasets under varying feature selection ratios (5 Clients).

$Fractalfract 09 00717 g007$

Figure 7. NMI of various methods on eight datasets under varying feature selection ratios (5 Clients).

$Fractalfract 09 00717 g008$

Figure 8. ACC of various methods on eight datasets with varying missing ratios (5 Clients).

$Fractalfract 09 00717 g009$

Figure 9. NMI of various methods on eight datasets with varying missing ratios (5 Clients).

$Fractalfract 09 00717 g010$

Figure 10. Convergence performance of Fed-IMUFS compared with two state-of-the-art methods on the COIL20 and HandWritten datasets.

$Fractalfract 09 00717 g011$

Figure 11. ACC of Fed-IMUFS on the Caltech101 dataset across varying values of

γ

,

η

,

λ

, and P.

$Fractalfract 09 00717 g012$

Figure 12. ACC of Fed-IMUFS on the Coil20 dataset across varying values of

γ

,

η

,

λ

, and P.

$Fractalfract 09 00717 g013$

Figure 13. ACC of Fed-IMUFS on the Digit4k dataset across varying values of

γ

,

η

,

λ

, and P.

Table 1. Dataset description.

Datasets	Views	Samples	Features	Classes
BBCSport	2	544	7073/6935	5
Caltech101	6	1474	48/40/254/1984/512/928	7
COIL20	3	1440	30/19/30	20
Digit4k	4	2000	240/216/47/64	10
HandWritten	2	544	4657/1125	5
ORL_mtv	3	400	4096/3304/6750	40
WebKB	3	2100	540/640/256	21
Yale	2	165	1024/3304/6750	15

Table 2. Average running time (s) (bold indicates the best result) and time complexity of different methods on three datasets, where

d = \sum_{v = 1}^{V} d_{v}

.

Table 2. Average running time (s) (bold indicates the best result) and time complexity of different methods on three datasets, where

d = \sum_{v = 1}^{V} d_{v}

.

Methods	COIL20	Caltech101	HandWritten	Time Complexity
Fed-IMUFS	2.87	39.01	5.40	$O (R K [T_{1} P d c + T_{2} (V d n c + r (c + n + V))])$
TIME-FS	0.46	6.55	7.14	$O (d n c + c^{2} n V)$
TRCA-CGL	15.81	127.30	35.69	$O (n^{2} V^{2} + 2 n^{2} V log (n) + d^{3})$
SDFS	27.68	187.16	48.45	$O (\sum_{v = 1}^{V} (n^{3} + n^{2} d_{v} + n d_{v}^{2} + d_{v}^{2}))$
CDMvFS	50.84	527.95	79.38	$O (V n^{3} + \sum_{v = 1}^{V} d_{v}^{2})$
JMVFG	3.22	68.75	12.45	$O (\sum_{v = 1}^{V} d_{v}^{3} + n \sum_{v = 1}^{V} d_{v}^{2} + n^{2} d)$
SCMvFS	19.05	256.84	27.64	$O (V n^{3} + \sum_{v = 1}^{V} d_{v}^{2})$
TERUI-MUFS	3.07	44.42	11.46	$O (max (d^{n}) n^{2})$

Table 3. Comparison of ACC(%) and NMI(%) between IMUFFS, its ablation variants, and Fed-IMUFFS-VII, where * indicates significant improvement (Wilcoxon test,

p < 0.05

).

Table 3. Comparison of ACC(%) and NMI(%) between IMUFFS, its ablation variants, and Fed-IMUFFS-VII, where * indicates significant improvement (Wilcoxon test,

p < 0.05

).

Datasets	Fed-IMUFS		Fed-IMUFS-I		Fed-IMUFS-II		Fed-IMUFS-III		Fed-IMUFS-IV		Fed-IMUFS-V		Fed-IMUFS-VII
Datasets	ACC	NMI	ACC	NMI	ACC	NMI	ACC	NMI	ACC	NMI	ACC	NMI	ACC	NMI
Caltech101	52.85 *	53.42 *	48.41	47.46	47.21	46.82	50.11	49.23	49.85	47.92	48.92	48.31	50.72	49.85
COIL20	90.55 *	96.12 *	85.81	91.97	84.98	90.14	86.42	92.11	87.05	91.85	85.63	90.72	87.42	92.23
Digit4k	76.65 *	45.64 *	71.39	40.14	70.80	39.07	72.50	42.11	71.95	41.73	70.25	40.32	72.18	41.52
HandWritten	65.18 *	65.43 *	57.95	65.23	56.62	59.77	59.84	62.09	60.21	61.84	59.45	60.37	61.72	62.35
ORL_mtv	86.73 *	87.27 *	78.81	82.97	77.08	81.64	80.45	84.11	81.62	83.92	79.58	82.16	82.05	84.02
WebKB	87.10 *	62.87 *	80.44	50.59	79.82	47.15	82.21	55.42	81.95	54.83	80.25	52.62	82.65	55.72

Table 4. Wilcoxon signed-rank test results across eight datasets under

α = 0.05

.

Table 4. Wilcoxon signed-rank test results across eight datasets under

α = 0.05

.

Methods	Superior	Comparable	Inferior	p-Value	Significance	Effect Size
Fed-IMUFS—TIME-FS	6	1	1	1.1511 × 10⁻²	+	0.4547
Fed-IMUFS—TRCA-CGL	8	0	0	2.6441 × 10⁻²	+	0.74614
Fed-IMUFS—SDFS	8	0	0	4.3422 × 10⁻³	+	0.73443
Fed-IMUFS—CDMvFS	8	0	0	6.1702 × 10⁻⁴	+	0.73227
Fed-IMUFS—JMVFG	8	0	0	1.8360 × 10⁻³	+	0.63172
Fed-IMUFS—SCMvFS	8	0	0	3.2637 × 10⁻³	+	0.62534
Fed-IMUFS—TERUIMUFS	7	1	0	3.3042 × 10⁻²	+	0.54402

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Federated Incomplete Multi-View Unsupervised Feature Selection with Fractional Sparsity-Guided Whale Optimization and Tensor Alternating Learning

Abstract

1. Introduction

2. Related Work

2.1. Single-View Unsupervised Feature Selection

2.2. Multi-View Unsupervised Feature Selection

3. The Proposed Method

3.1. Whale Optimization Algorithm

3.2. MUFS Matrix Initialization

3.3. Sparsity-Guided Whale Optimization Algorithm and Tensor Alternating Learning

3.3.1. Stage 1: Sparsity-Guided Whale Optimization Algorithm

3.3.2. Stage 2: Tensor Alternating Learning

3.4. Federated Incomplete Multi-View Unsupervised Feature Selection via Sparsity-Guided Whale Optimization Algorithm and Tensor Alternating Learning

3.4.1. An Aggregation and Distribution Strategy Driven by Normalized Mutual Information

3.4.2. Privacy and Communication Overhead Analysis of Fed-IMUFS

3.4.3. Time Complexity Analysis of Fed-IMUFS

4. Experiments and Analysis

4.1. Experimental Setup

4.1.1. Dataset Description

4.1.2. Experimental Environment

4.1.3. Comparison Algorithms and Parameter Configuration

4.1.4. Comparison Schemes

4.2. Global Optimization Analysis of SGWOA

4.3. Comparative Performance Analysis of Fed-IMUFS

4.4. Ablation Study

4.5. Parameter Sensitivity Analysis

4.6. Analysis of Statistical Significance

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics