DLPLSR: Dual Label Propagation-Driven Least Squares Regression with Feature Selection for Semi-Supervised Learning

Zhang, Shuanghao; Yang, Zhengtong; Shi, Zhaoyin

doi:10.3390/math13142290


by Shuanghao Zhang ^1, Zhengtong Yang ^1,2 and Zhaoyin Shi ^3,*

^1 Shenzhen Key Laboratory of Ultraintense Laser and Advanced Material Technology, Center for Intense Laser Application Technology, and College of Engineering Physics, Shenzhen Technology University, Shenzhen 518118, China

^2 School of Applied Technology, Shenzhen University, Shenzhen 518060, China

^3 College of Mechatronics and Control Engineering, Shenzhen University, Shenzhen 518060, China

^* Author to whom correspondence should be addressed.

Mathematics 2025, 13(14), 2290; https://doi.org/10.3390/math13142290

Submission received: 19 June 2025 / Revised: 14 July 2025 / Accepted: 15 July 2025 / Published: 16 July 2025

(This article belongs to the Special Issue Machine Learning and Optimization for Clustering Algorithms)


Abstract

In the real world, most data are unlabeled, which drives the development of semi-supervised learning (SSL). Among SSL methods, least squares regression (LSR) has attracted attention for its simplicity and efficiency. However, existing semi-supervised LSR approaches suffer from challenges such as the insufficient use of unlabeled data, low pseudo-label accuracy, and inefficient label propagation. To address these issues, this paper proposes dual label propagation-driven least squares regression with feature selection, named DLPLSR, which is a pseudo-label-free SSL framework. DLPLSR employs a fuzzy-graph-based clustering strategy to capture global relationships among all samples, and manifold regularization preserves local geometric consistency, so that it implements the dual label propagation mechanism for comprehensive utilization of unlabeled data. Meanwhile, a dual-feature selection mechanism is established by integrating orthogonal projection for maximizing feature information with an ℓ_2,1-norm regularization for eliminating redundancy, thereby jointly enhancing the discriminative power. Benefiting from these two designs, DLPLSR boosts learning performance without pseudo-labeling. Finally, the objective function admits an efficient closed-form solution solvable via an alternating optimization strategy. Extensive experiments on multiple benchmark datasets show the superiority of DLPLSR compared to state-of-the-art LSR-based SSL methods.

Keywords:

semi-supervised classification; least squares regression; dual label propagation; fuzzy graph clustering; pseudo-label-free

MSC:

62H30

1. Introduction

Supervised learning is a cornerstone of machine learning and has been successfully applied in various domains such as computer vision, natural language processing, and biomedical analysis [1,2,3]. Classic models including K-nearest neighbor (KNN) [4,5], support vector machine (SVM) [6,7], ResNet [8,9], and vision transformer (ViT) [10,11] have shown impressive performance when trained with sufficient labeled data [12]. However, in real-world applications, acquiring labeled data is often costly and time-consuming, whereas unlabeled data are abundant and easily accessible. This has motivated the development of semi-supervised learning (SSL) approaches [13,14,15], which aim to exploit both labeled and unlabeled data for learning.

Pioneering representative methods include semi-supervised support vector machines [7,16], semi-supervised graph models [17,18,19], and non-negative matrix factorization [20,21]. Although these methods often deliver strong performance, their reliance on complex graph construction and iterative optimization incurs high computational costs, particularly on large-scale datasets. As a more efficient alternative, least squares regression (LSR) has gained traction in SSL due to its simplicity and convexity. By modeling label inference as a regression problem with manifold or graph regularization, LSR-based methods provide closed-form solutions and competitive performance with substantially lower computational overhead.

Least squares regression (LSR) aims to learn a projection W that maps features to labels via least-squares minimization. In semi-supervised scenarios, where labeled data are limited, existing LSR-based methods typically enhance either feature selection (FS) or label propagation (LP) to improve performance. For FS, ℓ_2,1-norm regularization [22], redundancy minimization [23], and discriminant-guided projections [24,25,26] have been explored to promote sparsity and class separability. For LP, early representative works such as flexible manifold embedding (FME) [27] and its acceleration [28] jointly learn graphs and classifiers. Recent advances include binary hash-based propagation [29], adaptive graph learning [30,31], and clustering-guided LSRs [32,33,34] that eliminate manual graphs.

While recent advances have enhanced feature selection in semi-supervised LSR frameworks, challenges remain in designing effective label propagation mechanisms. Existing methods either rely on adaptive graphs built from self-expression or local manifolds, which may overlook intrinsic sample similarity, or on pseudo-labels derived from clustering, whose quality is often unreliable. To overcome these issues, we propose a dual label propagation framework that integrates fuzzy graph-based clustering and adaptive manifold regularization, eliminating the need for pseudo-labels. Meanwhile, a dual feature selection strategy combining orthogonal projection and ℓ_2,1-norm regularization is introduced to improve the quality and compactness of the learned representations. The proposed method admits a closed-form solution and achieves superior performance across diverse benchmarks.

The major contributions of this paper are summarized as follows:

Dual label propagation mechanism: The global structure is preserved via the fuzzy graph, while the adaptive manifold regularization captures local geometric relationships among samples. Benefiting from this design, a dual label propagation mechanism is established to enable effective and consistent knowledge transfer from labeled to unlabeled data.
Dual feature selection mechanism: An orthogonal projection is employed to preserve feature diversity and maximize information retention, while an ℓ_2,1-norm regularization imposes structured sparsity to eliminate irrelevant dimensions. This dual feature selection mechanism enhances the robustness and discriminant capability of the learned representations under limited supervision.
Pseudo-label-free framework: The proposed framework discards the pseudo-labels typically required in semi-supervised LSR models, thereby avoiding performance degradation caused by low-quality supervision. Instead, it transfers supervision from labeled to unlabeled data solely based on structural relationships, which are captured through fuzzy graph similarity and manifold regularization.
End-to-end unified optimization: The model eliminates manual intervention and integrates all components into a single unified objective. It supports fully end-to-end optimization, allowing all modules to be jointly trained via an alternating strategy with closed-form solutions and guaranteed convergence.

Compared with state-of-the-art (SOTA) semi-supervised LSR methods such as DRLSR [34], which uses a single label propagation path by generating pseudo-labels via fuzzy clustering, the proposed pseudo-label-free framework DLPLSR eliminates the need for membership estimation or explicit label assignment. Instead, it directly learns a fuzzy similarity matrix derived from a global clustering partition, enabling label propagation without relying on potentially inaccurate pseudo-labels. At the same time, a manifold similarity matrix based on local geometric structure is incorporated, establishing a dual label propagation mechanism that jointly captures both global and local relationships within a unified optimization framework. Moreover, the orthogonal sparse dual feature extraction mechanism enables DLPLSR to capture sample features more accurately and effectively than DRLSR, which lacks such a specialized structural design.

The remainder of this paper is organized as follows. The next section provides the necessary preliminaries and a review of related works. In Section 3, we present the proposed model in detail. Section 4 elaborates on the optimization procedure. Experimental results and comparisons are reported in Section 5 to validate the effectiveness of our method. Finally, the last section concludes the paper and discusses potential directions for future research.

2. Preliminaries and Related Works

2.1. Notations and Definitions

To ensure consistent notation and improve readability, we specify the symbol conventions used throughout this paper. Matrices are denoted by bold, uppercase, upright letters such as X. Vectors are denoted by lowercase, italic, bold letters, where x_{i} represents the i-th column vector of a matrix and x^{i} indicates its i-th row vector. Scalars are denoted by lowercase, regular (non-bold, non-italic) letters; for example, x_{i j} represents the (i, j)-th element of a matrix.

With the notation clarified, we now formally define the semi-supervised classification problem considered in this paper. Consider a training dataset X = [x_{1}, x_{2}, \dots, x_{N}] \in R^{d \times N} consisting of N samples, where each column x_{i} \in R^{d} denotes a d-dimensional sample. Assume that the first N_{l} samples are labeled and the remaining N_{u} = N - N_{l} samples are unlabeled. The labeled samples are associated with the label matrix Y_{l} = [y_{1}, y_{2}, \dots, y_{N_{l}}] \in R^{K \times N_{l}}, where K is the number of classes and each column y_{i} is a one-hot vector indicating the class membership of the i-th labeled sample. The unlabeled samples do not have associated ground-truth labels. The objective of semi-supervised classification is to leverage both the labeled data \{X_{l}, Y_{l}\} and the unlabeled data X_{u} to learn an effective classifier f_{θ} that can accurately predict the labels of the unlabeled samples as well as unseen test data.

2.2. Least Squares Regression

LSR is one of the most fundamental models for supervised learning and is commonly solved by least squares methods, as shown in Figure 1. By learning a linear transformation, it projects the input data from the feature space X into the label space Y, aiming to minimize the discrepancy between the predicted outputs and the true labels:

min_{W} {∥ W^{T} X - Y ∥}_{F}^{2} + λ R (W),

(1)

where W \in R^{d \times K} is the projection matrix, R (W) denotes the regularization term, and λ is the regularization coefficient. The choice of R (W) plays a key role in controlling the complexity of the model and improving generalization. Common strategies include the squared Frobenius norm {∥ W ∥}_{F}^{2} = \sum_{i, j} w_{i j}^{2}, the ℓ₁-norm {∥ W ∥}_{1} = \sum_{i, j} | w_{i j} |, and a combination of both, corresponding to ridge regression [35,36], Lasso regression [37,38], and elastic net regression [39,40], respectively. Ridge regression encourages smoothness and prevents overfitting through ℓ₂-norm regularization, while Lasso regression promotes sparsity and enables feature selection via ℓ₁-norm regularization. Elastic net integrates both penalties, balancing sparsity and stability, and is particularly effective for handling correlated features.
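For the ridge case of Equation (1), the minimizer has a well-known closed form: setting the gradient to zero gives (X X^{T} + λ I) W = X Y^{T}. The following is a minimal NumPy sketch of that special case only, not the authors' implementation. The function names `ridge_lsr` and `predict` and the toy two-class data are hypothetical, and Y is stored as a K × N one-hot matrix so that W^{T} X - Y is well defined.

```python
import numpy as np

def ridge_lsr(X, Y, lam):
    """Closed-form minimizer of ||W^T X - Y||_F^2 + lam * ||W||_F^2.

    X: (d, N) array with samples in columns.
    Y: (K, N) array with one-hot labels in columns (assumed layout).
    """
    d = X.shape[0]
    # Zero gradient: 2 X (X^T W - Y^T) + 2 lam W = 0
    #            =>  (X X^T + lam I) W = X Y^T
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ Y.T)

def predict(W, X):
    """Assign each column of X to the class with the largest regression score."""
    return np.argmax(W.T @ X, axis=0)

# Hypothetical toy data: two well-separated Gaussian blobs in 2D.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(-2.0, 0.5, (2, 20)), rng.normal(2.0, 0.5, (2, 20))])
labels = np.repeat([0, 1], 20)
Y = np.eye(2)[:, labels]          # (K, N) one-hot label matrix

W = ridge_lsr(X, Y, lam=0.1)
pred = predict(W, X)
```

The Lasso and elastic net penalties have no such closed form in general and are usually handled by iterative solvers.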

By solving for W, a linear regression classifier can be obtained, which enables label prediction for unseen test samples. However, it is important to note that LSR can only utilize the labeled samples, that is, X = X_{l} and Y = Y_{l}, while ignoring the abundant unlabeled training data commonly available in semi-supervised scenarios. This limitation significantly restricts the applicability of LSR in semi-supervised classification tasks, where leveraging both labeled and unlabeled data is essential for improving learning performance.

2.3. Semi-Supervised LSR

SSL fundamentally relies on the construction of an effective bridge between labeled and unlabeled data, with the core objective of propagating label information from the labeled subset to the larger unlabeled portion, a process commonly referred to as the label propagation (LP) mechanism. Typical approaches adopt graph-based perspectives that assume that neighboring or clustered samples are likely to share the same label. Representative works include Gaussian fields and harmonic functions (GFHF) [41], learning with local and global consistency (LGC) [42], and the manifold regularization geometric framework (MRGF) [17], which construct affinity graphs and enforce smoothness over the data manifold to effectively guide label diffusion.

To extend LSR to SSL scenarios, it is essential to incorporate unlabeled data into the whole learning framework. Using abundant unlabeled samples to improve feature selection performance is an intuitive approach. A representative method is rescaled linear square regression (RLSR) [22]:

\begin{matrix} min_{W, b, θ, Y_{u}} & ({∥W^{T} diag (θ^{\frac{1}{2}}) X + b 1 - Y∥}_{F}^{2} + γ {∥ W ∥}_{F}^{2}) \\ s . t . & Y = [Y_{l}, Y_{u}], Y_{u} \geq 0, 1^{T} Y_{u} = 1^{T}, θ > 0, 1^{T} θ = 1, \end{matrix}

(2)

where θ is a feature selection vector and b is the bias term.

Similarly, Wang et al. [26] extend the RLSR framework by introducing an ϵ-drag matrix, leading to the sparse discriminative semi-supervised feature selection (SDSSFS) method. Meanwhile, sparse rescaled linear square regression (SRLSR) [43] improves RLSR by enforcing stronger sparsity on the learned projection matrix W.

In addition to feature selection, customized label propagation is also vital to model performance. To overcome the drawbacks of two-stage graph-based methods, Bao et al. [30] proposed robust embedding regression (RER), which jointly learns the classifier and the self-expression matrix Z of all training samples:

\begin{matrix} min_{W, Y, Z, E} & {∥W^{⊤} X Z - Y∥}_{*} + λ_{1} {∥W∥}_{2, 1} + λ_{2} Tr (Y L_{Z} Y^{T}) + λ_{3} {∥E∥}_{2, 1} \\ s . t . & Y = [Y_{l}, Y_{u}], X = X Z + E, diag (Z) = 0, Z \geq 0, Z^{T} 1 = 1, \end{matrix}

(3)

where L_{Z} is the Laplacian matrix of Z. Complementary to global self-expression-based approaches, local manifold structure-based graph construction is also an effective strategy. Liao et al. [31] proposed the AGLSOFS framework, which was further enhanced with entropy regularization and sparsity regularization for adaptive graph learning.

Moreover, treating pseudo-labels as a soft assignment matrix offers a flexible LP strategy for semi-supervised learning. For example, the unified dual label learning model for semi-supervised feature selection (UDM-SFS) [32] formulates the pseudo-label assignment as:

\begin{matrix} min_{W, Y_{u}} & \sum_{j = 1}^{N} \sum_{i = 1}^{K} y_{i j}^{m} {∥W^{T} x_{j} - t_{i}∥}_{2}^{2} \\ s . t . & Y = [Y_{l}, Y_{u}], Y_{u} \geq 0, Y_{u} 1 = 1, {∥ W ∥}_{2, 0} = C, \end{matrix}

(4)

where T \in R^{K \times K} denotes an identity matrix representing class prototypes and C denotes a constant. Building upon UDM-SFS, Qi et al. [44] formally proposed the class-credible pseudo-label learning (CPL) framework, which provides a general optimization framework for pseudo-label-based learning. In addition, the introduction of anchor graph structures [34] has been shown to further improve the efficiency of such methods.

2.4. Fuzzy Graph and Its Derived Clustering

For a large number of unlabeled samples, clustering remains one of the most intuitive and widely adopted strategies. As discussed earlier, it is typically integrated into semi-supervised LSR frameworks via pseudo-labels. However, the quality of these pseudo-labels is often unreliable and may degrade performance. Fuzzy graph theory [45] provides an alternative perspective by interpreting the fuzzy membership matrix U \in R^{N \times K} as an anchor graph, yielding a complete similarity matrix S = U Diag (U^{T} 1) U^{T}.

Under this formulation, the authors of [45] revisited the fuzzy K-means (FKM) clustering objective and derived the following expression:

\begin{matrix} min_{S} & - Tr (X S X^{T}) + α R (S) \\ s . t . & S \in Ω_{S}, \end{matrix}

(5)

where Ω_{S} denotes the abstract constraint set on S. This formulation provides a principled way to embed clustering structure into similarity learning without relying on pseudo-labels.

3. Model Description

In this section, we present a novel semi-supervised classification framework that avoids pseudo-label learning and instead performs dual-path label propagation through both manifold structure and fuzzy clustering. The core idea is to learn a discriminative linear classifier based on labeled data, while simultaneously leveraging the geometric structure and unsupervised clustering patterns of the entire dataset.

3.1. Pseudo-Label-Free Semi-LSR with LP Based on Manifold

We begin with the basic classification objective of semi-supervised least squares regression, which utilizes only the labeled samples to construct the supervised loss. To further enhance the utilization of the unlabeled data, we introduce a manifold regularization term that captures the local geometric structure among all training samples, including both labeled and unlabeled data. The resulting formulation is as follows:

min_{W} {∥W^{T} X_{l} - Y_{l}∥}_{F}^{2} + λ_{1} Tr (X L_{S} X^{T}),

(6)

where X_{l} and Y_{l} denote the labeled data and labels, and L_{S} is the graph Laplacian matrix derived from a similarity matrix S constructed on all training data. This term enables label propagation over the manifold structure, thereby guiding the learning process with both labeled and unlabeled information.

Note that we do not assign pseudo-labels to the unlabeled data. Instead, the unlabeled samples influence the model through the structural constraint encoded by the graph Laplacian, forming the first label propagation path without explicit label learning.
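The manifold term in Equation (6) can be read as a weighted sum of pairwise squared distances: for a symmetric similarity matrix S with Laplacian L_{S} = D - S, where D is the diagonal degree matrix, Tr (X L_{S} X^{T}) = \frac{1}{2} \sum_{i, j} s_{i j} {∥ x_{i} - x_{j} ∥}^{2}. The sketch below checks this identity numerically on a hypothetical random graph. The symmetric S, the degree-based Laplacian, and all variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 6
X = rng.normal(size=(d, N))              # samples in columns

# Hypothetical toy graph: symmetric, non-negative, zero diagonal.
S = rng.random((N, N))
S = (S + S.T) / 2.0
np.fill_diagonal(S, 0.0)

# Graph Laplacian L_S = D - S with D the diagonal degree matrix.
L = np.diag(S.sum(axis=1)) - S

# Pairwise squared Euclidean distances between columns of X.
sq = (X * X).sum(axis=0)
D_X = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)

lhs = np.trace(X @ L @ X.T)              # manifold regularizer
rhs = 0.5 * (S * D_X).sum()              # 1/2 * sum_ij s_ij ||x_i - x_j||^2
```

Because (S * D_X).sum() equals Tr (D_X^{T} S), the regularizer is linear in S, so samples that are far apart are penalized for receiving a large similarity.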

3.2. Dual Label Propagation

While the previous subsection leverages local geometric structure via manifold regularization, this part complements it by introducing a global perspective of label propagation. Specifically, we employ fuzzy clustering to construct a task-adaptive similarity matrix S, inspired by fuzzy graph theory [45]. This formulation avoids the need for pseudo-labels, enabling global propagation based on soft cluster assignments over the entire dataset.

The similarity matrix S not only provides a new propagation path through global relationships but also replaces the fixed graph used in manifold regularization, allowing joint adaptation. As a result, a unified dual label propagation mechanism is established: the local path relies on the Laplacian L_{S}, while the global path utilizes the learned S directly.

Based on this formulation, we jointly optimize the classifier W and the fuzzy graph S via the following objective function:

\begin{matrix} min_{W, S} & {∥ W^{T} X_{l} - Y_{l} ∥}_{F}^{2} + λ_{1} Tr (X L_{S} X^{T}) \\ & + λ_{2} [Tr (W^{T} X X^{T} W) - Tr (W^{T} X S X^{T} W)] + λ_{3} {∥ S ∥}_{F}^{2} \\ s . t . & S \geq 0, S 1 = 1, diag (S) = 0 . \end{matrix}

(7)

The Frobenius regularization on S helps prevent overfitting by controlling the scale and complexity of the similarity matrix. It avoids dense structures that may encode noise or spurious patterns, which is especially important when labeled data are limited. A larger value of λ_{3} encourages a sparser and more stable similarity structure that better captures the intrinsic relationships among samples. The fuzzy graph further enables adaptive connection strengths between samples, resulting in more flexible and informative propagation paths.

3.3. Dual Feature Selection

While the previous sections focus on label propagation and the utilization of unlabeled data, this subsection shifts attention to the features themselves. Under limited supervision, learning compact and discriminative features is crucial for model generalization. To achieve this, we introduce a dual feature selection mechanism.

First, an orthogonality constraint W^{T} W = I is imposed on the projection matrix, which eliminates feature redundancy and maximizes variance preservation in the projected space. This constraint complements the adaptive dual label propagation by ensuring that the learned features are uncorrelated and carry maximal information.

To further enhance discriminability, we incorporate an ℓ_2,1 regularization term on W. It enforces structured sparsity by shrinking entire rows of W towards 0. This operation effectively removes globally irrelevant features across all classes, leading to a compact, robust, and interpretable model.

Together, these two terms construct a dual feature selection mechanism: orthogonality ensures feature diversity, while sparsity enforces discriminative power by eliminating redundant dimensions. The overall optimization integrates dual propagation and dual feature selection into a unified framework:

\begin{matrix} min_{W, S} & {∥W^{T} X_{l} - Y_{l}∥}_{F}^{2} + λ_{1} Tr (X L_{S} X^{T}) \\ & + λ_{2} [Tr (W^{T} X X^{T} W) - Tr (W^{T} X S X^{T} W)] \\ & + λ_{3} {∥S∥}_{F}^{2} + λ_{4} {∥W∥}_{2, 1} \\ s . t . & W^{T} W = I, S \geq 0, S 1 = 1, diag (S) = 0 . \end{matrix}

(8)

This formulation not only facilitates label propagation through both manifold and clustering structures, but also ensures that the selected features are simultaneously informative, uncorrelated, and discriminative. The illustration of the proposed DLPLSR is shown in Figure 2.

4. Optimization Strategy

The objective function in Equation (8) involves two coupled variables, W and S. To solve the problem efficiently, we adopt an alternating optimization strategy, where we iteratively update one variable while keeping the other fixed until convergence.

4.1. Algorithm Implementation

With S fixed, the objective reduces to:

\begin{matrix} min_{W} & {∥W^{T} X_{l} - Y_{l}∥}_{F}^{2} + λ_{1} Tr (W^{T} X L_{S} X^{T} W) \\ & + λ_{2} [Tr (W^{T} X X^{T} W) - Tr (W^{T} X S X^{T} W)] + λ_{4} {∥W∥}_{2, 1} \\ s . t . & W^{T} W = I . \end{matrix}

(9)

To facilitate unified optimization, we reformulate the ℓ_2,1 norm {∥W∥}_{2, 1} in matrix form. Specifically, it can be written as:

{∥ W ∥}_{2, 1} = \sum_{i = 1}^{d} {∥ w^{i} ∥}_{2} = 2 Tr (W^{T} D_{W} W),

(10)

where D_{W} \in R^{d \times d} is a diagonal matrix with

{[D_{W}]}_{i i} = \frac{1}{2 {∥ w^{i} ∥}_{2} + ϵ},

(11)

w^{i} denotes the i-th row of W, and ϵ > 0 is a small constant added for numerical stability. This reformulation enables efficient gradient-based optimization in the alternating update framework.
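The identity in Equation (10) can be checked numerically. The sketch below treats the norm as a sum over the d rows of W, which is the reading consistent with the summation limit in Equation (10), and sets ϵ = 0 so that the equality is exact. The helper names `l21_norm` and `reweight_matrix` are hypothetical.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def reweight_matrix(W, eps=0.0):
    """Diagonal matrix with entries 1 / (2 * ||row_i||_2 + eps), as in Equation (11)."""
    row_norms = np.linalg.norm(W, axis=1)
    return np.diag(1.0 / (2.0 * row_norms + eps))

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))              # d = 5 features, K = 3 classes

D_W = reweight_matrix(W)                 # eps = 0: the identity is exact
lhs = l21_norm(W)
rhs = 2.0 * np.trace(W.T @ D_W @ W)      # equals sum_i ||row_i||^2 / ||row_i||
```

Since D_{W} depends on W itself, reweighting schemes of this kind typically compute D_{W} from the previous iterate and hold it fixed while W is updated. With ϵ > 0 the two sides agree only approximately.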

Then, Equation (9) can be equivalently written as:

min_{W} Tr (W^{T} A W) - 2 Tr (W^{T} B), s . t . W^{T} W = I,

(12)

where

A = X_{l} X_{l}^{T} + λ_{1} X L_{S} X^{T} + λ_{2} (X X^{T} - X S X^{T}) + λ_{4} D_{W},

(13)

and

B = X_{l} Y_{l}^{T} .

(14)

Problem (12) with the orthogonality constraint W^{T} W = I can be effectively solved using the generalized power iteration (GPI) method proposed by Nie et al. [46]. GPI provides a theoretically sound and computationally efficient solution for trace minimization problems under orthogonality constraints; its procedure is summarized in Algorithm 1.

Algorithm 1 Generalized power iteration (GPI) [46] for Equation (12).

Input: Symmetric matrix A \in R^{d \times d}, matrix B \in R^{d \times K}.
Output: Orthogonal matrix W \in R^{d \times K}.
1: Initialize W such that W^{T} W = I.
2: repeat
3: Compute Z = 2 A W + 2 B,
4: Perform SVD: Z = U Σ V^{T},
5: Update W \leftarrow U V^{T}.
6: until convergence
7: return W
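A minimal NumPy sketch of a GPI-style solver for the minimization in Equation (12) is given below. It rests on one assumption that is not spelled out in Algorithm 1: because GPI is usually stated for maximizing a quadratic form with a positive semidefinite matrix, the sketch replaces A by α I - A, with α slightly above the largest eigenvalue of A. Under W^{T} W = I this changes the objective only by the constant α K. The function name `gpi_min`, the stopping rule, and the random test problem are hypothetical.

```python
import numpy as np

def gpi_min(A, B, n_iter=200, tol=1e-10):
    """Sketch: min_W Tr(W^T A W) - 2 Tr(W^T B)  subject to  W^T W = I."""
    d, K = B.shape
    # Assumed shift: alpha * I - A is positive semidefinite.
    alpha = np.linalg.eigvalsh(A)[-1] + 1e-8
    A_shift = alpha * np.eye(d) - A

    def objective(W):
        return np.trace(W.T @ A @ W) - 2.0 * np.trace(W.T @ B)

    W = np.eye(d)[:, :K]                 # feasible start with orthonormal columns
    history = [objective(W)]
    for _ in range(n_iter):
        Z = 2.0 * A_shift @ W + 2.0 * B
        U, _, Vt = np.linalg.svd(Z, full_matrices=False)
        W = U @ Vt                       # maximizes Tr(W^T Z) over W^T W = I
        history.append(objective(W))
        if abs(history[-2] - history[-1]) < tol:
            break
    return W, history

# Hypothetical random test problem.
rng = np.random.default_rng(0)
M = rng.normal(size=(6, 6))
A = M @ M.T                              # symmetric matrix
B = rng.normal(size=(6, 2))
W, history = gpi_min(A, B)
```

With the shift in place, each update cannot increase the objective of Equation (12), which is the property the alternating scheme relies on.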

When W is fixed, the objective with respect to S is

\begin{matrix} min_{S} & λ_{1} Tr (X L_{S} X^{T}) - λ_{2} Tr (X^{T} W W^{T} X S) + λ_{3} {∥ S ∥}_{F}^{2} \\ & = \frac{λ_{1}}{2} Tr (D_{X}^{T} S) - λ_{2} Tr (X^{T} W W^{T} X S) + λ_{3} {∥ S ∥}_{F}^{2} \\ s . t . & S \geq 0, S 1 = 1, diag (S) = 0 . \end{matrix}

(15)

which, up to an additive constant that does not depend on S, is equivalent to

\begin{matrix} min_{S} & {∥ S - E ∥}_{F}^{2} \\ s . t . & S \geq 0, S 1 = 1, diag (S) = 0 . \end{matrix}

(16)

where

E = \frac{1}{2 λ_{3}} (λ_{2} X^{T} W W^{T} X - \frac{λ_{1}}{2} D_{X}) .

(17)

Equation (16) is a constrained quadratic programming problem in which each row of S can be computed independently via the linear-time analytical solution proposed by Huang et al. [47].
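Row by row, Equation (16) is a Euclidean projection of the off-diagonal entries of each row of E onto the probability simplex. The sketch below uses the standard sort-based projection, which costs O (N log N) per row; it is a stand-in for the linear-time solution of [47], not a reproduction of it. The function names `project_simplex` and `update_S` are hypothetical, and E is taken as a given input.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {s : s >= 0, sum(s) = 1} (sort-based)."""
    n = v.size
    u = np.sort(v)[::-1]                         # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, n + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]     # last index with a positive gap
    tau = css[rho] / (rho + 1)                   # shift shared by the active entries
    return np.maximum(v - tau, 0.0)

def update_S(E):
    """Solve min ||S - E||_F^2  s.t.  S >= 0, S 1 = 1, diag(S) = 0, one row at a time."""
    N = E.shape[0]
    S = np.zeros_like(E, dtype=float)
    for i in range(N):
        mask = np.arange(N) != i                 # enforce s_ii = 0
        S[i, mask] = project_simplex(E[i, mask])
    return S

rng = np.random.default_rng(0)
E = rng.normal(size=(5, 5))                      # hypothetical input matrix
S = update_S(E)
```

Entries of E that fall below the row threshold are set exactly to zero, which is how the projection yields a sparse similarity matrix.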

The algorithm iterates between the W-step and the S-step until convergence. The general procedure is summarized in Algorithm 2.

Algorithm 2 DLPLSR: Dual label propagation-driven least squares regression with feature selection for semi-supervised classification.

Input: Labeled data \{X_{l}, Y_{l}\}, unlabeled data X_{u}, hyperparameters \{λ_{1}, λ_{2}, λ_{3}, λ_{4}\}.
Output: Final classifier W.
1: Initialize W = I_{d} (:, 1 : K).
2: Calculate {[D_{X}]}_{i j} = {∥ x_{i} - x_{j} ∥}^{2}.
3: repeat
4: Compute E by Equation (17).
5: Update S by Equation (16) according to [47].
6: Calculate A and B by Equations (13) and (14).
7: Update W according to Algorithm 1.
8: until convergence

4.2. Complexity Analysis

The time complexity of Algorithm 2 is primarily determined by the major matrix operations within each iteration. The pairwise squared distance matrix D_{X} is computed only once, with a cost of O (N^{2} d). In the iterative phase, computing E via Equation (17) costs O (N K d + N^{2} K). Updating the similarity matrix S involves solving Equation (16) for each of the N rows, each with complexity O (N), resulting in a total cost of O (N^{2}). Computing the matrix A in Equation (13) requires O (N^{2} d + N d^{2}), while computing B in Equation (14) costs O (N_{l} K d). The projection matrix W is updated using Algorithm 1, with a complexity of O ({Iter}_{1} \cdot (K d^{2} + K^{2} d)), where {Iter}_{1} denotes the number of inner GPI iterations. Assuming that Algorithm 2 converges after epo epochs, the total time complexity of DLPLSR is O (epo \cdot [N^{2} d + {Iter}_{1} \cdot (K d^{2} + K^{2} d)]).

The space complexity of the proposed DLPLSR algorithm mainly arises from storing the input data and several intermediate matrices. The data matrix X \in R^{d \times N} requires O (N d) space. The pairwise distance matrix D_{X}, similarity matrix S, and propagation matrix E all lie in R^{N \times N}, contributing O (N^{2}) space in total. The projection matrix W \in R^{d \times K} requires O (d K) space, and the auxiliary matrices A \in R^{d \times d} and B \in R^{d \times K} require O (d^{2} + d K) space. The label matrix Y_{l} \in R^{K \times N_{l}} adds a negligible O (N_{l} K) cost. Therefore, the overall space complexity of DLPLSR is O (N^{2} + N d + d^{2}).

4.3. Convergence Analysis

The proposed DLPLSR algorithm adopts an alternating optimization strategy to update the variables S and W. Each subproblem has a closed-form or guaranteed-convergent solution, so the total objective function value does not increase at any step. The total objective function is therefore monotonically non-increasing and bounded below, ensuring that the DLPLSR algorithm converges to a local optimum.

5. Experiments

In this section, we evaluate the proposed DLPLSR on 14 benchmark datasets against seven representative SOTA semi-supervised LSR models to demonstrate its effectiveness. We also present comprehensive experimental results, implementation details, parameter settings, sensitivity analyses, an ablation study, and real-world application experiments.

5.1. Datasets

To provide a comprehensive and systematic evaluation, we select 14 representative benchmark datasets from various domains. These include two UCI datasets (Iris (https://www.kaggle.com/datasets/uciml/iris, accessed on 1 June 2025) and Wine (https://www.kaggle.com/datasets/sgus1318/winedata, accessed on 1 June 2025)), handwritten digit datasets (USPS (https://www.kaggle.com/datasets/bistaumanga/usps-dataset, accessed on 1 June 2025), MNIST-2k2k (https://www.kaggle.com/datasets/hojjatk/mnist-dataset/code, accessed on 1 June 2025), MNIST-10k (https://www.kaggle.com/datasets/hojjatk/mnist-dataset/code, accessed on 1 June 2025), and Semeion (https://www.kaggle.com/datasets/ibrahimalizade/semeion/data, accessed on 1 June 2025)), object image datasets (COIL20 (https://git-disl.github.io/GTDLBench/datasets/coil20, accessed on 1 June 2025) and COIL100 (https://www.kaggle.com/datasets/jessicali9530/coil100, accessed on 1 June 2025)), and facial image datasets (ORL (https://cam-orl.co.uk/facedatabase.html, accessed on 1 June 2025), JAFFE (https://zenodo.org/records/3451524, accessed on 1 June 2025), PIX10 (https://www.kaggle.com/datasets/shivamvyasiitm/extended-yale-face-b, accessed on 1 June 2025), Yale (http://cvc.cs.yale.edu/cvc/projects/yalefaces/yalefaces.html, accessed on 1 June 2025), YaleB (https://vision.ucsd.edu/datasets/extended-yale-face-database-b-b, accessed on 1 June 2025), and AR (https://personalpages.manchester.ac.uk/staff/timothy.f.cootes/data/tarfd_markup/tarfd_markup.html, accessed on 1 June 2025)). This diverse selection allows a fair comparison across different data characteristics in semi-supervised learning. Detailed information on the employed datasets is shown in Table 1, where ‘Capacity’ indicates the range of samples per class. Some example images are shown in Figure 3.

5.2. Baseline Methods

To comprehensively evaluate the effectiveness of the proposed DLPLSR framework, we compare it with seven representative semi-supervised LSR-based methods, which cover a wide range of design philosophies, including manifold regularization, graph learning, and label propagation:

RSSLSR (robust semi-supervised least squares regression using ℓ_2,p-norm minimization) [33]: A biased regression model in which each training sample is associated with a learnable weight. It adopts the ℓ_2,p-norm to compute the classification loss, thereby enhancing robustness against outliers and label noise.
SFS_BLL (semi-supervised feature selection with binary label learning) [29]: Performs discriminative feature selection in a binary hashing code space, enhancing class separability. However, its two-stage manual graph construction process may limit adaptability and increase sensitivity to noise.
DSLSR (discriminative sparse least squares regression) [24]: Enhances the discriminability of the regression space by employing a coordinate relaxation matrix to enlarge the distance between inter-class samples, while imposing sparsity constraints on regression features for more compact representation.
RER (robust embedding regression) [30]: Constructs an adaptive graph based on self-expressiveness, and evaluates the regression error using the nuclear norm, which captures the global low-rank structure of the error matrix from a holistic perspective. This enhances the model’s robustness against noise and outliers.
DRLSR (discriminative and robust least squares regression) [34]: Constructs an adaptive anchor-based graph and performs label propagation via the fuzzy membership matrix derived from classical fuzzy clustering, thereby enhancing the model’s robustness and discriminability under semi-supervised scenarios.
AGLSOFS_N (adaptive orthogonal semi-supervised feature selection with reliable label matrix learning_norm) [31]: Incorporates confidence-based label learning to control inter-class overlap, employs orthogonal projection to enhance feature discriminability, and introduces a Frobenius norm regularization term to facilitate adaptive graph construction.
AGLSOFS_E (adaptive orthogonal semi-supervised feature selection with reliable label matrix learning_entropy) [31]: Similar in overall structure to AGLSOFS_N, but replaces the Frobenius norm with an entropy regularization term to achieve adaptive graph construction, resulting in a denser similarity structure.

Table 2 provides a detailed comparison of the characteristics and differences among these baseline methods and the proposed DLPLSR.

Note that in Table 2, Feature Selection refers to the regularization imposed on the projection matrix W, rather than on the transformed features W^{T} X. Regularizing W directly enforces sparsity or discriminative structure, guiding the model to suppress irrelevant features during learning, which is more fundamental than post-hoc analysis of projected outputs. Meanwhile, for pre-KNN methods, the count includes neighborhood size and Gaussian kernel bandwidth.

5.3. Evaluation Metrics

In the comparison experiments, we adopt two commonly used evaluation metrics to assess classification performance: Accuracy (ACC) and the F1-score. These metrics are briefly introduced as follows.

Starting from the binary classification setting, given a trained classifier f_{θ} applied to unlabeled samples (including both training and testing data), each prediction falls into one of the following four categories:

TP (True Positive): Positive samples correctly predicted as positive.
FP (False Positive): Negative samples incorrectly predicted as positive.
TN (True Negative): Negative samples correctly predicted as negative.
FN (False Negative): Positive samples incorrectly predicted as negative.

Based on these definitions, the ACC is calculated as

ACC = \frac{TP + TN}{TP + TN + FP + FN},

(18)

which measures the overall proportion of correctly classified samples. ACC is particularly informative when class distributions are relatively balanced.

However, in imbalanced scenarios—such as anomaly detection, where the number of negative samples far exceeds the number of positives—predicting all samples as negative can still yield a deceptively high ACC, despite the classifier’s failure to detect anomalies. To better assess such cases, two additional metrics are introduced:

Recall = \frac{TP}{TP + FN},

(19)

Precision = \frac{TP}{TP + FP} .

(20)

In the above example, both Recall and Precision would approach zero, indicating poor predictive performance. To balance the trade-off between these two metrics, the F1-score is defined as their harmonic mean:

F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} .

(21)

For multi-class classification problems with K classes, each class can be treated as the positive class while the others are considered negative (one-vs-all strategy). The final macro-average ACC and F1-score are computed by averaging the individual metrics across all classes:

Macro-F1 = \frac{1}{K} \sum_{k = 1}^{K} F1^{(k)},

(22)

Macro-ACC = \frac{1}{K} \sum_{k = 1}^{K} ACC^{(k)} .

(23)

In multi-class scenarios, where class imbalance is common, the F1-score becomes especially important as it better captures per-class performance.
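As an illustration of Equations (18)–(23), the following minimal Python sketch computes the one-vs-all confusion counts and the macro-averaged ACC and F1-score from two label lists. The function names and the convention that a ratio with an empty denominator counts as zero are assumptions made for this example; they are not taken from the released DLPLSR code.

```python
def one_vs_all_counts(y_true, y_pred, positive):
    """Return (TP, FP, TN, FN) when `positive` is treated as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    # Assumed convention: an undefined ratio (zero denominator) is scored as 0.
    return num / den if den else 0.0


def macro_scores(y_true, y_pred):
    """Return (Macro-ACC, Macro-F1), averaging per-class one-vs-all scores."""
    classes = sorted(set(y_true))
    acc_sum = f1_sum = 0.0
    for k in classes:
        tp, fp, tn, fn = one_vs_all_counts(y_true, y_pred, k)
        precision = safe_div(tp, tp + fp)                       # Eq. (20)
        recall = safe_div(tp, tp + fn)                          # Eq. (19)
        acc_sum += safe_div(tp + tn, tp + tn + fp + fn)         # Eq. (18)
        f1_sum += safe_div(2 * precision * recall, precision + recall)  # Eq. (21)
    return acc_sum / len(classes), f1_sum / len(classes)        # Eqs. (23), (22)


# Imbalanced toy case: one positive among 100 samples, everything predicted negative.
imbalanced_true = [1] + [0] * 99
imbalanced_pred = [0] * 100
counts = one_vs_all_counts(imbalanced_true, imbalanced_pred, positive=1)
```

In the imbalanced toy case the one-vs-all ACC for the positive class is 0.99 while Recall is zero, which is the situation the F1-score is meant to expose.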

5.4. Configurations

5.4.1. Testing Configuration

To ensure fairness and comparability, we adopt a unified prediction strategy for all testing samples across methods. Specifically, once the final projection matrix W is obtained, all testing samples are projected via W^{T} X (plus bias if applicable), and a standard 1-nearest neighbor (1-NN) classifier is employed to assign labels. Each testing sample is classified by finding its nearest labeled training sample in the transformed feature space.

Different methods handle prediction differently for the unlabeled training samples X_{u}. All evaluated methods except our DLPLSR are capable of generating pseudo-labels, and we adopt the following rule to determine the prediction of X_{u}:

If a method explicitly uses pseudo-labels for unlabeled samples in its original paper, as RER [30] and DRLSR [34] do, we follow that design.
If the original paper does not specify the prediction approach for unlabeled data, as is the case for RSSLSR [33], SFS_BLL [29], DSLSR [24], AGLSOFS_N [31], and AGLSOFS_E [31], we select the strategy that achieves better performance, either pseudo-labels or 1-NN.

When pseudo-labeling is applied, the predicted label \hat{y}_{i} for each unlabeled sample is assigned as

\hat{y}_{i} = \arg\max_{j} Y_{ij}, \quad \text{for } i = N_{l} + 1, \dots, N,

(24)

where Y is the predicted label matrix, and N_{l} and N denote the number of labeled and total training samples, respectively.

For our proposed DLPLSR, which is pseudo-label-free, the unlabeled training samples are predicted directly with the same 1-NN classifier used for the test samples. The detailed prediction strategies of all methods are summarized in Table 3, where ‘Unlabeled Specified’ indicates whether the original paper specifies a prediction strategy for unlabeled training samples.
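The two prediction rules described above can be sketched as follows. This is a simplified illustration in pure Python, not the authors' implementation: samples are stored as rows (so the projection is written per sample as W^T x), the bias term is omitted, Euclidean distance is assumed for the 1-NN search, and all function names are hypothetical.

```python
import math


def pseudo_labels(Y, n_labeled):
    """Eq. (24): for each unlabeled row of Y, pick the column with the largest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in Y[n_labeled:]]


def project(W, x):
    """Compute W^T x for a d x K matrix W (list of rows) and a length-d sample x."""
    n_outputs = len(W[0])
    return [sum(W[i][k] * x[i] for i in range(len(x))) for k in range(n_outputs)]


def one_nn_predict(W, X_labeled, y_labeled, X_query):
    """Label each query sample with the label of its nearest labeled sample
    in the projected space (Euclidean distance assumed)."""
    Z_labeled = [project(W, x) for x in X_labeled]
    predictions = []
    for x in X_query:
        z = project(W, x)
        nearest = min(range(len(Z_labeled)), key=lambda i: math.dist(z, Z_labeled[i]))
        predictions.append(y_labeled[nearest])
    return predictions


# Toy example: d = 3 input features, K = 2 outputs; the third feature is discarded by W.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
X_labeled = [[0.0, 0.0, 5.0], [10.0, 10.0, -5.0]]
y_labeled = [0, 1]
X_query = [[1.0, 1.0, 100.0], [9.0, 8.0, 0.0]]
predicted = one_nn_predict(W, X_labeled, y_labeled, X_query)
```

The same `one_nn_predict` call serves both the testing samples and, for a pseudo-label-free method, the unlabeled training samples.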

5.4.2. Semi-Supervised Configuration

In semi-supervised learning scenarios, the proportion of labeled samples in the training set should remain low. However, it is also important to ensure that each class is represented by at least a few labeled instances, particularly for datasets with a small number of samples but a large number of classes. To balance these requirements, we uniformly set the labeled sample ratio to 10% of the total training set.

Furthermore, to guarantee fairness across all methods, we adopt the standard 50–50% split between training and testing data. For each dataset, the labeled/unlabeled and training/testing splits are pre-determined based on the above ratios via random sampling. These fixed splits are consistently applied throughout all experiments to ensure reproducibility and fair comparison.
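A fixed split of this kind could be generated as in the sketch below. The seed value, the rounding rule, and the absence of per-class stratification are assumptions made for illustration; the paper does not state how the random sampling was implemented.

```python
import random


def make_splits(n_samples, train_ratio=0.5, labeled_ratio=0.1, seed=0):
    """Return (labeled, unlabeled, test) index lists from one seeded shuffle."""
    rng = random.Random(seed)           # fixed seed so the split can be reused
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_ratio)
    train, test = indices[:n_train], indices[n_train:]
    n_labeled = max(1, round(n_train * labeled_ratio))
    return train[:n_labeled], train[n_labeled:], test


labeled, unlabeled, test = make_splits(200)
```

With 200 samples this yields 10 labeled, 90 unlabeled, and 100 testing indices, and calling the function again with the same seed reproduces the same split.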

5.4.3. Hyperparameter Configurations

The proposed DLPLSR method involves four hyperparameters, namely λ_{1}, λ_{2}, λ_{3}, and λ_{4}, each selected from the candidate set {10^{-4}, 10^{-3}, \dots, 10^{4}}. This leads to a total of 9^{4} = 6561 unique hyperparameter combinations. To mitigate randomness, the model avoids random initialization. As outlined in Algorithm 2, the projection matrix W is initialized with the first K columns of the identity matrix I_{d}. If the feature dimension d is smaller than K, then the first d rows of I_{K} are used instead. Under this initialization scheme, each hyperparameter combination requires only a single training run. To support reproducibility, the optimal settings for DLPLSR on each dataset are reported in Table 4; the correspondence between dataset numbers and datasets can be found in Table 1.
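The search space and the deterministic initialization described above can be written down compactly. The sketch below is illustrative only; the function names are hypothetical, and the matrix is stored as a plain list of rows.

```python
from itertools import product

# Candidate values 10^-4, 10^-3, ..., 10^4 shared by all four hyperparameters.
CANDIDATES = [10.0 ** e for e in range(-4, 5)]


def hyperparameter_grid():
    """Iterate over every (lambda_1, lambda_2, lambda_3, lambda_4) combination."""
    return product(CANDIDATES, repeat=4)


def init_projection(d, K):
    """Return a d x K matrix with ones on the main diagonal and zeros elsewhere.

    For d >= K this equals the first K columns of I_d; for d < K it equals
    the first d rows of I_K, so one expression covers both cases.
    """
    return [[1.0 if i == j else 0.0 for j in range(K)] for i in range(d)]


n_combinations = sum(1 for _ in hyperparameter_grid())
```

Because the initialization contains no randomness, repeating a run with the same hyperparameters gives the same result, which is why one run per combination suffices.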

For the comparison methods, the coefficients of regularization terms such as {∥ W ∥}_{2, 1}, {∥ W ∥}_{F}, or Tr (W^{T} X_{l} L_{S} X^{T} W) are also chosen from the same candidate set {10^{-4}, 10^{-3}, \dots, 10^{4}}. Additionally, specialized hyperparameters, including the p value in the ℓ_2,p-norm [33], the scaling factor in reliable label learning [31], and the fuzzifier r used in membership functions [34], as well as other task-specific settings, are configured according to their respective original papers. For fairness, all methods use the same initialization for W as DLPLSR, and biases in biased models are uniformly initialized to zero.

5.5. Comparison Experiments

This section presents the most critical part of the experimental evaluation. The comparison results in terms of ACC and F1-score are reported in Table 5 and Table 6, where the results are presented in Value (Rank) format. The best results are highlighted in bold, and the second-best results are underlined. The last row of each table reports the average rank across all datasets. As can be observed, the proposed method achieves a significant lead compared to 7 SOTA semi-supervised LSR models across 14 diverse benchmark datasets. Meanwhile, it is worth noting that, compared with other types of semi-supervised learning methods, approaches based on LSR are particularly advantageous in terms of computational efficiency. Therefore, training time is also considered an important evaluation metric; since the testing stage involves only simple pseudo-label assignment or 1-NN classification, its time consumption is negligible. We report the training time of each method in Table 7, in Time (Rank) format. AveT denotes the mean training time of each model averaged over all datasets, while AveR represents the average rank of training time across all datasets.

To determine whether the average rank differences between DLPLSR and the baseline methods are statistically significant on multiple datasets, we adopt the Nemenyi post-hoc test at a 0.05 significance level. Specifically, the critical difference (CD) is calculated as:

CD = q_{α} \sqrt{\frac{N_{m} (N_{m} + 1)}{6 N_{d}}}

(25)

Here, N_{m} denotes the total number of compared methods (including DLPLSR), N_{d} is the number of datasets, and q_{α} = 3.031 is the critical value at the 0.05 significance level. With N_{m} = 8 and N_{d} = 14, the critical difference is calculated to be 2.81. The significance boundaries and relative rankings under all metrics are illustrated in the critical difference diagrams in Figure 4.
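The value of the critical difference can be checked numerically. The short sketch below evaluates Equation (25) with q_α = 3.031, eight methods (seven baselines plus DLPLSR), and 14 datasets; the helper names are introduced here for illustration only.

```python
import math


def critical_difference(q_alpha, n_methods, n_datasets):
    """Nemenyi critical difference, Eq. (25)."""
    return q_alpha * math.sqrt(n_methods * (n_methods + 1) / (6 * n_datasets))


def ranks_differ(rank_a, rank_b, cd):
    """Two methods differ significantly when their average ranks are more than CD apart."""
    return abs(rank_a - rank_b) > cd


cd = critical_difference(q_alpha=3.031, n_methods=8, n_datasets=14)
print(round(cd, 2))                   # 2.81
print(ranks_differ(2.64, 2.21, cd))   # False: ACC_U ranks of DLPLSR and RER
```

Any pair of methods whose average ranks differ by less than this value cannot be separated by the test at the 0.05 level.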

As shown in Tables 5 and 6 and in Figure 4, DLPLSR exhibits consistently competitive performance across the four metrics (ACC_U, ACC_T, F1_U, and F1_T). For instance, in ACC_U, it ranks within the top three on 11 out of 14 datasets, with only USPS, Jaffe, and Yale falling outside. Its average rank is 2.64, slightly behind RER (2.21); however, the difference is not statistically significant, as the gap of 0.43 lies within the critical difference (CD = 2.81).

In ACC_T, DLPLSR is even stronger, ranking within the top three on 13 out of 14 datasets, with only USPS falling outside. It achieves the best average rank of 1.79, a margin of 2.00 over the second-best method (RER at 3.79), indicating superior generalization ability to unseen data.

The results for F1_U and F1_T follow similar trends. On the unlabeled data, DLPLSR ranks second, slightly behind RER, but the difference is not statistically significant. On the testing data, it achieves the best overall performance with a statistically significant lead. The number of datasets where DLPLSR ranks in the top three is also comparable to that in ACC, further confirming its robust generalization capability.

Taking specific datasets for closer inspection, on the Yale dataset, which is the most challenging overall, DLPLSR still ranks first on testing data in both ACC and F1. Similarly, on datasets such as PIX10, COIL100, and AR, our method consistently ranks in the top two. These results demonstrate that DLPLSR not only performs well on average but also avoids catastrophic failures in difficult cases, reflecting a stable and balanced generalization capability.

Moreover, compared to the other three pseudo-label-based methods, including RSSLSR, DRLSR, and AGLSOFS_E, our method shows a clear advantage in the unlabeled evaluation, outperforming them by large margins. This strongly supports the effectiveness of our pseudo-label-free framework, which reduces optimization variables, improves stability, and avoids overfitting issues caused by overly confident pseudo-labels.

With respect to efficiency, as shown in Table 7, the proposed DLPLSR model demonstrates strong computational efficiency. It ranks in the top half among all methods in terms of both AveT and AveR. Notably, DLPLSR achieves the third-lowest AveT (20.97 s) among all competing methods. Importantly, the time gap between DLPLSR and the fastest method (AGLSOFS_E, 11.63 s) is only 9.34 s, which is even smaller than the gap between DLPLSR and the fourth-fastest method (RSSLSR, 35.25 s). This indicates that DLPLSR sits much closer to the efficiency frontier than to slower alternatives, highlighting its advantage in terms of practical usability, especially in large-scale settings.

Moreover, DLPLSR shows consistently low training times across almost all datasets, ranking 1st or 2nd on 7 out of 14 datasets, including complex and large-scale ones such as 10k, YaleB, COIL100, and USPS. This further demonstrates that the efficiency advantage of DLPLSR becomes even more pronounced as dataset size and complexity increase. In summary, DLPLSR not only achieves competitive learning performance but also maintains a low computational cost across diverse data scenarios, confirming its practicality for efficient large-scale semi-supervised learning tasks.

5.6. Parameter Sensitivity

The proposed DLPLSR model includes four key hyperparameters: λ_{1} controls manifold-based label propagation, λ_{2} governs clustering-based label propagation, λ_{3} imposes graph sparsity regularization, and λ_{4} enforces projection sparsity via the ℓ_2,1-norm.

To evaluate the model’s robustness to hyperparameter selection, we perform a sensitivity analysis by fixing two parameters at their optimal values and varying the other two across a logarithmic range. We select three representative datasets from different application domains, namely Wine (UCI tabular), YaleB (face images), and Semeion (handwritten digits), and conduct the analysis for both Testing and Unlabeled prediction. The results are shown in Figure 5, where each row fixes two hyperparameters at their optimal values from Table 4 and varies the other two. For example, Wine-U12 represents the impact of varying λ_{1} and λ_{2} on the ACC of unlabeled data on the Wine dataset.

Figure 5 illustrates the parameter sensitivity of DLPLSR. We observe that the model is more sensitive to λ_{1} and λ_{2}, particularly on complex datasets such as Semeion and YaleB (e.g., Figure 5e,f,i,j). Performance may vary significantly across different combinations, showing peaks and valleys rather than a flat plateau. In contrast, when λ_{3} and λ_{4} are varied with λ_{1} and λ_{2} fixed, the model generally maintains more stable accuracy, as observed in Figure 5c,d,g,h. This indicates that the regularization terms contribute to stability, while the model is less sensitive to their exact weights. Moreover, on relatively simpler datasets such as Wine (Figure 5a–d), DLPLSR achieves consistently high accuracy across a wide range of parameters, demonstrating robustness in low-dimensional settings. These results suggest that proper tuning of λ_{1} and λ_{2} is more critical for complex tasks, while the model remains generally tolerant to variations in λ_{3} and λ_{4}.
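The sensitivity protocol (two hyperparameters fixed, two varied over the logarithmic candidate set) can be sketched as below. The `toy_evaluate` function and the values in `best` are placeholders invented for this example; in practice the evaluation would train DLPLSR and return the ACC, and the fixed values would come from Table 4.

```python
def sensitivity_grid(evaluate, best, vary, candidates):
    """Score every pair of values for the two varied hyperparameters,
    keeping all other hyperparameters at their values in `best`."""
    first, second = vary
    results = {}
    for a in candidates:
        for b in candidates:
            params = dict(best)      # copy so `best` is left untouched
            params[first] = a
            params[second] = b
            results[(a, b)] = evaluate(params)
    return results


def toy_evaluate(params):
    # Placeholder score peaking at lam1 = lam2 = 1; NOT the DLPLSR objective.
    return 1.0 / (1.0 + abs(params["lam1"] - 1.0) + abs(params["lam2"] - 1.0))


CANDIDATES = [10.0 ** e for e in range(-4, 5)]
best = {"lam1": 1.0, "lam2": 1.0, "lam3": 0.1, "lam4": 10.0}  # hypothetical optima
grid = sensitivity_grid(toy_evaluate, best, ("lam1", "lam2"), CANDIDATES)
```

Each such grid contains 9 × 9 = 81 scores, corresponding to one panel of a 3D bar chart like those in Figure 5.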

5.7. Ablation Study

In the proposed DLPLSR, four hyperparameters are introduced in the objective function, each corresponding to a distinct model component. Specifically, λ_{1} and λ_{2} control the dual label propagation processes based on the local manifold and the global fuzzy graph, respectively, λ_{3} regularizes the Frobenius norm of the similarity matrix S, and λ_{4} imposes an ℓ_2,1-norm on the projection matrix W to promote structured sparsity. Among them, λ_{3} plays a pivotal role in preventing overfitting by stabilizing the learned similarity structure. Moreover, it is a core component of the model formulation and directly influences the optimization procedure. Therefore, λ_{3} is retained in all configurations.

To further verify the necessity of the other three terms, we conduct an ablation study. By individually setting λ_{1}, λ_{2}, and λ_{4} to zero, we observe consistent performance degradation across various datasets, as detailed in Table 8, where ‘w’ denotes the complete model and ‘λ_{i}’ denotes the variant with λ_{i} = 0. These results validate the contribution of each term to the overall effectiveness and robustness of the proposed framework.

From the ablation results in Table 8, it is clear that removing any of the regularization terms leads to noticeable performance degradation on most datasets and evaluation metrics. For instance, when λ_{4} (corresponding to the ℓ_2,1-norm constraint) is removed, the ACC and F1-score on datasets such as Wine, YaleB, and AR drop significantly for both unlabeled and testing data, indicating the importance of feature sparsity in enhancing robustness. Likewise, setting λ_{1} or λ_{2} to zero weakens the dual label propagation process, particularly on more complex datasets like YaleB and AR, where both ACC and F1-score degrade by more than 2%. These observations demonstrate that each regularization term plays a distinct and indispensable role in maintaining stable and high-performing model behavior. The ablation study, therefore, confirms the necessity and effectiveness of these components in the overall framework.

5.8. Visualization Analysis

To further validate the proposed DLPLSR, we present a visualization study from two aspects: numerical convergence and confusion matrix.

Although the theoretical convergence of DLPLSR has been established in Section 4.3, we additionally provide empirical evidence of its numerical stability. We select six representative datasets from various domains and scales: Iris, Wine, Yale, YaleB, 10k, and AR. The convergence curves of the loss function over iterations are illustrated in Figure 6. Despite different data characteristics and magnitudes, DLPLSR consistently converges within a small number of iterations (mostly fewer than 10), showing either a stable decrease or bounded oscillation after a sharp descent. This confirms that the optimization procedure of DLPLSR is numerically stable and efficient across different datasets.

Then, we visualize the confusion matrices on three representative datasets: COIL100, AR, and 10k, as shown in Figure 7. The COIL100 and AR datasets involve 100 classes each, presenting significant classification challenges due to high inter-class similarity. For better visualization, the confusion matrices are grouped by class intervals of 10. Although the absolute accuracy values on COIL100 and AR are moderate, DLPLSR achieves top-tier rank positions among all compared methods (Table 5). This indicates that while fine-grained recognition remains difficult due to limited labels, DLPLSR still captures essential class structures more effectively than others. The 10k dataset, characterized by a larger scale and more complex class distribution, further demonstrates the generalization strength of DLPLSR. As shown in the confusion matrices, most diagonal blocks remain prominent in both unlabeled and testing settings, indicating consistent and robust discrimination ability even with limited supervision.

These visual results reaffirm that DLPLSR maintains reliable classification quality across diverse and challenging scenarios, with strong generalization in large-scale and high-class-count conditions.

5.9. Real-World Applications Experiments

To validate the real-world applicability of the proposed method, we evaluate it on two widely used industrial fault diagnosis datasets, CWRU (https://engineering.case.edu/bearingdatacenter, accessed on 1 June 2025) and SEU (https://github.com/cathysiyu/Mechanical-datasets, accessed on 1 June 2025), both involving vibration signals collected under various fault types and operating conditions. The raw time-series signals are transformed into time-frequency representations using the continuous wavelet transform (CWT), with a sliding window of length 1024, an overlap ratio of 0.5, 128 frequency scales, and the wavelet basis function set to ‘cmor100-1’. From each frequency scale of the CWT matrix, we extract eleven statistical features: mean, standard deviation, maximum, minimum, energy, skewness, kurtosis, average absolute value, peak value, shape factor, and Shannon entropy. These features are concatenated across all scales to form the final sample-level representation. For dataset splitting, both the CWRU and SEU datasets are divided into training and testing sets with a 1:19 ratio, using a fixed random seed to ensure reproducibility. Specifically, CWRU contains 1635 training samples and 31,065 test samples, while SEU includes 2047 training samples and 38,893 test samples. Furthermore, 50% of the training samples are labeled. The results are shown in Table 9.
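The windowing and per-scale feature extraction can be sketched as follows. This is only an illustration under several assumptions that the text does not specify: each CWT row is taken to hold real-valued coefficient magnitudes, skewness and kurtosis use population moments (kurtosis without the −3 offset), the peak value is the maximum absolute value, the shape factor is RMS divided by the mean absolute value, and the Shannon entropy is computed on the normalized energy distribution. The CWT itself is not shown, and the function names are hypothetical.

```python
import math


def sliding_windows(signal, length=1024, overlap=0.5):
    """Cut a 1-D signal into fixed-length windows with the given overlap ratio."""
    step = int(length * (1 - overlap))
    return [signal[s:s + length] for s in range(0, len(signal) - length + 1, step)]


def statistical_features(row):
    """Eleven statistics of one frequency-scale row (definitions assumed, see above)."""
    n = len(row)
    mean = sum(row) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in row) / n)
    energy = sum(v * v for v in row)
    abs_mean = sum(abs(v) for v in row) / n
    rms = math.sqrt(energy / n)
    skewness = sum((v - mean) ** 3 for v in row) / (n * std ** 3) if std else 0.0
    kurtosis = sum((v - mean) ** 4 for v in row) / (n * std ** 4) if std else 0.0
    peak = max(abs(v) for v in row)
    shape_factor = rms / abs_mean if abs_mean else 0.0
    entropy = 0.0
    if energy:
        for v in row:
            p = v * v / energy          # share of the total energy
            if p > 0:
                entropy -= p * math.log(p)
    return [mean, std, max(row), min(row), energy, skewness,
            kurtosis, abs_mean, peak, shape_factor, entropy]


def sample_representation(cwt_rows):
    """Concatenate the eleven features of every scale into one feature vector."""
    features = []
    for row in cwt_rows:
        features.extend(statistical_features(row))
    return features


windows = sliding_windows(list(range(2048)))
toy_features = statistical_features([1.0, -1.0, 1.0, -1.0])
```

Under these assumptions, a CWT matrix with 128 scales yields a vector of 128 × 11 = 1408 features per window.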

From Table 9, the best overall performer is DSLSR with an average rank of 1.50, while DLPLSR also demonstrates strong performance, ranking second with an average rank of 2.75 across all metrics. On the SEU dataset, DLPLSR achieves top-2 results in all four metrics (ACC_U, ACC_T, F1_U, F1_T). Although it slightly lags behind the best method on the CWRU dataset, it still ranks within the top four across all metrics, reflecting stable generalization. While its current accuracy may not yet meet the stringent demands of critical fault diagnosis applications, it is notable that DLPLSR achieves over 90% accuracy on both widely used datasets without any task-specific adaptation, indicating considerable potential for further development in the fault diagnosis field.

6. Discussion

In this paper, we proposed DLPLSR, a dual label propagation-driven least squares regression framework without pseudo label learning. By jointly leveraging manifold-based local geometry and fuzzy graph clustering-based global structure, the model enables effective label propagation across all training samples. In addition, a dual feature selection mechanism combining orthogonal projection and ℓ_2,1 regularization is integrated to extract compact and discriminative features under limited supervision. Extensive experiments on 14 benchmark datasets against 7 SOTA baselines demonstrate the robustness and effectiveness of DLPLSR across various natural image classification tasks such as faces, objects, and handwritten digits. Moreover, without any domain-specific design, DLPLSR achieves over 90% accuracy on two real-world fault diagnosis datasets, showing strong potential for broader applications beyond standard vision tasks.

With the widespread adoption of multi-view data over single-view data, future work may first consider extending the proposed framework to multi-view semi-supervised classification [48,49], enabling more effective exploration of inter-view relationships to improve accuracy. Meanwhile, in the current model, label information is incorporated solely through the loss function. Exploring coarser-granularity label representations and embeddings [50] may offer a promising direction for enhancing label utilization and further improving model performance. Finally, while DLPLSR shows promise in fault diagnosis, its accuracy is not yet adequate for industrial use. Future research will aim to optimize the model for task-specific industrial applications.

Author Contributions

Conceptualization, S.Z. and Z.S.; methodology, Z.Y.; software, Z.Y.; writing—original draft preparation, S.Z.; writing—review and editing, Z.S.; supervision, Z.S.; project administration, Z.S.; funding acquisition, S.Z. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Guangdong Province Youth Innovation Talent Project for Colleges and Universities under Grant 2023KQNCX069 and the Scientific Foundation for Youth Scholars of Shenzhen University under Grant 868-000001032407.

Data Availability Statement

To facilitate academic exchange and reproducibility, the source code of this work is available at: https://github.com/ShiZhaoyin/DLPLSR (accessed on 1 June 2025).

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine learning: A review of classification and combining techniques. Artif. Intell. Rev. 2006, 26, 159–190.
Sen, P.C.; Hajra, M.; Ghosh, M. Supervised classification algorithms in machine learning: A survey and review. In Emerging Technology in Modelling and Graphics: Proceedings of IEM Graph 2018; Springer Nature: Singapore, 2020; pp. 99–111.
Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of image classification algorithms based on convolutional neural networks. Remote Sens. 2021, 13, 4712.
Zhang, S. Challenges in KNN classification. IEEE Trans. Knowl. Data Eng. 2021, 34, 4663–4675.
Xie, J.; Xiang, X.; Xia, S.; Jiang, L.; Wang, G.; Gao, X. Mgnr: A multi-granularity neighbor relationship and its application in knn classification and clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 7956–7972.
Chandra, M.A.; Bedi, S. Survey on SVM and their application in image classification. Int. J. Inf. Technol. 2021, 13, 1–11.
Zhang, Y.; Liu, L.; Qiao, Q.; Li, F. A Lie group Laplacian Support Vector Machine for semi-supervised learning. Neurocomputing 2025, 630, 129728.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Shafiq, M.; Gu, Z. Deep residual learning for image recognition: A survey. Appl. Sci. 2022, 12, 8972.
Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10012–10022.
Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A survey on vision transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 87–110.
Jain, A.; Patel, H.; Nagalapatti, L.; Gupta, N.; Mehta, S.; Guttula, S.; Mujumdar, S.; Afzal, S.; Sharma Mittal, R.; Munigala, V. Overview and importance of data quality for machine learning tasks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), Virtual, 6–10 July 2020; pp. 3561–3562.
Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440.
Han, K.; Sheng, V.S.; Song, Y.; Liu, Y.; Qiu, C.; Ma, S.; Liu, Z. Deep semi-supervised learning for medical image segmentation: A review. Expert Syst. Appl. 2024, 245, 123052.
Zhou, Z.H. Semi-supervised learning. In Machine Learning; Springer: Berlin/Heidelberg, Germany, 2021; Chapter 13; pp. 315–341.
Bennett, K.; Demiriz, A. Semi-supervised support vector machines. Adv. Neural Inf. Process. Syst. NIPS 1998, 11, 368–374.
Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2434.
Nie, F.; Shi, S.; Li, X. Semi-supervised learning with auto-weighting feature and adaptive graph. IEEE Trans. Knowl. Data Eng. 2019, 32, 1167–1178.
Qiao, X.; Chen, C.; Wang, W.; Peng, Q.; Ghafar, A. Efficient ℓ_2,1-norm graph for robust semi-supervised classification. Pattern Recognit. 2025, 169, 111890.
Jia, Y.; Kwong, S.; Hou, J.; Wu, W. Semi-supervised non-negative matrix factorization with dissimilarity and similarity regularization. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2510–2521.
Yuan, A.; You, M.; He, D.; Li, X. Convex non-negative matrix factorization with adaptive graph for unsupervised feature selection. IEEE Trans. Cybern. 2020, 52, 5522–5534.
Chen, X.; Yuan, G.; Nie, F.; Huang, J.Z. Semi-supervised Feature Selection via Rescaled Linear Regression. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; Volume 2017, pp. 1525–1531.
Xu, S.; Dai, J.; Shi, H. Semi-supervised feature selection based on least square regression with redundancy minimization. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
Liu, Z.; Lai, Z.; Ou, W.; Zhang, K.; Huo, H. Discriminative sparse least square regression for semi-supervised learning. Inf. Sci. 2023, 636, 118903.
Zhong, W.; Chen, X.; Yuan, G.; Li, Y.; Nie, F. Semi-supervised feature selection with adaptive discriminant analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 10083–10084.
Wang, C.; Chen, X.; Yuan, G.; Nie, F.; Yang, M. Semisupervised feature selection with sparse discriminative least squares regression. IEEE Trans. Cybern. 2021, 52, 8413–8424.
Nie, F.; Xu, D.; Tsang, I.W.H.; Zhang, C. Flexible manifold embedding: A framework for semi-supervised and unsupervised dimension reduction. IEEE Trans. Image Process. 2010, 19, 1921–1932.
Qiu, S.; Nie, F.; Xu, X.; Qing, C.; Xu, D. Accelerating flexible manifold embedding for scalable semi-supervised learning. IEEE Trans. Circuits Syst. Video Technol. 2018, 29, 2786–2795.
Shi, D.; Zhu, L.; Li, J.; Cheng, Z.; Liu, Z. Binary label learning for semi-supervised feature selection. IEEE Trans. Knowl. Data Eng. 2023, 35, 2299–2312.
Bao, J.; Kudo, M.; Kimura, K.; Sun, L. Robust embedding regression for semi-supervised learning. Pattern Recognit. 2024, 145, 109894.
Liao, H.; Chen, H.; Yin, T.; Horng, S.J.; Li, T. Adaptive orthogonal semi-supervised feature selection with reliable label matrix learning. Inf. Process. Manag. 2024, 61, 103727.
Zhang, H.; Gong, M.; Nie, F.; Li, X. Unified dual-label semi-supervised learning with top-k feature selection. Neurocomputing 2022, 501, 875–888.
Wang, J.; Xie, F.; Nie, F.; Li, X. Robust supervised and semisupervised least squares regression using ℓ_2,p-norm minimization. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8389–8403.
Wang, J.; Chen, C.; Nie, F.; Li, X. Discriminative and robust least squares regression for semi-supervised image classification. Neurocomputing 2024, 575, 127316.
McDonald, G.C. Ridge regression. Wiley Interdiscip. Rev. Comput. Stat. 2009, 1, 93–100.
Varaprasad, S.; Goel, T.; Tanveer, M.; Murugan, R. An effective diagnosis of schizophrenia using kernel ridge regression-based optimized RVFL classifier. Appl. Soft Comput. 2024, 157, 111457.
Ranstam, J.; Cook, J.A. LASSO regression. J. Br. Surg. 2018, 105, 1348.
Wang, S.; Chen, Y.; Cui, Z.; Lin, L.; Zong, Y. Diabetes risk analysis based on machine learning LASSO regression model. J. Theory Pract. Eng. Sci. 2024, 4, 58–64.
Zhang, Z.; Lai, Z.; Xu, Y.; Shao, L.; Wu, J.; Xie, G.S. Discriminative elastic-net regularized linear regression. IEEE Trans. Image Process. 2017, 26, 1466–1481.
Amini, F.; Hu, G. A two-layer feature selection method using genetic algorithm and elastic net. Expert Syst. Appl. 2021, 166, 114072.
Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 912–919.
Zhou, D.; Bousquet, O.; Lal, T.; Weston, J.; Schölkopf, B. Learning with local and global consistency. Adv. Neural Inf. Process. Syst. NIPS 2003, 16, 1–8.
Chen, X.; Yuan, G.; Nie, F.; Ming, Z. Semi-supervised feature selection via sparse rescaled linear square regression. IEEE Trans. Knowl. Data Eng. 2018, 32, 165–176.
Qi, X.; Zhang, H.; Nie, F. Discriminative Semi-Supervised Feature Selection Via a Class-Credible Pseudo-Label Learning Framework. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 6895–6899.
Shi, Z.; Chen, L.; Ding, W.; Zhong, X.; Wu, Z.; Chen, G.Y.; Zhang, C.; Wang, Y.; Chen, C.L.P. IFKMHC: Implicit Fuzzy K-Means Model for High-Dimensional Data Clustering. IEEE Trans. Cybern. 2024, 54, 7955–7968.
Nie, F.; Zhang, R.; Li, X. A generalized power iteration method for solving quadratic problem on the stiefel manifold. Sci. China Inf. Sci. 2017, 60, 1–10.
Huang, J.; Nie, F.; Huang, H. A new simplex sparse learning model to measure data similarity for clustering. In Proceedings of the IJCAI, Buenos Aires, Argentina, 25–31 July 2015; pp. 3569–3575.
Xu, C.; Si, J.; Guan, Z.; Zhao, W.; Wu, Y.; Gao, X. Reliable conflictive multi-view learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 16129–16137.
Shi, Z.; Chen, L.; Ding, W.; Zhang, C.; Wang, Y. Parameter-free robust ensemble framework of fuzzy clustering. IEEE Trans. Fuzzy Syst. 2023, 31, 4205–4219.
Shi, Z.; Luo, Y.; Liu, X.; Chen, L.; Ding, W.; Zhong, X.; Wu, Z.; Philip Chen, C.L. MGL-FBLS: Multi-Granularity Label-Driven Feature Enhanced Broad Learning System for Semi-Supervised Classification. IEEE Trans. Emerg. Top. Comput. Intell. 2025; early access.

Figure 1. The demonstration of least squares regression.

Figure 2. The illustration of the proposed DLPLSR.

Figure 3. Visual examples of the datasets used in experiments.

Figure 4. Post-hoc Nemenyi test results at significance level 0.05.

Figure 5. Parameter sensitivity 3D bar chart of DLPLSR on Wine, Semeion, and YaleB.

Figure 6. Loss convergence curves of DLPLSR on six representative datasets.

Figure 7. Confusion matrix visualization of DLPLSR on COIL100, AR, and 10k.

Table 1. Statistics of the benchmark datasets adopted in the experiments.

Type	UCI		Handwriting				Objects		Faces
Dataset	Iris	Wine	USPS	2k2k	10k	Semeion	COIL20	COIL100	ORL	Jaffe	PIX10	Yale	YaleB	AR
No.	①	②	③	④	⑤	⑥	⑦	⑧	⑨	⑩	❶	❷	❸	❹
Features	4	13	16 × 16	28 × 28	28 × 28	16 × 16	32 × 32	32 × 32	32 × 32	32 × 32	100 × 100	32 × 32	32 × 32	55 × 40
Instances	150	178	9298	4000	10,000	1593	1440	7200	400	213	100	165	2414	2600
Classes	3	3	10	10	10	10	20	100	40	10	10	15	38	100
Capacity	50	48, 59, 71	708–1553	359–454	863–1127	155–162	72	72	10	20–23	10	11	59–64	26

Table 2. Comparison of the proposed DLPLSR and the baseline methods.

Methods	Label Propagation	Feature Selection	Samples’ Relationship	Biased Regression	Pseudo-Labels Learning	Number of Parameters	Solver	Highlights
RSSLSR [33]	No special	No special	No special	✓	✓	3	Alternating iterations	Sample-wise weights
SFS_BLL [29]	Pre-KNN	Sparse	Local	✗	✓	5	ADMM	Binary hash
DSLSR [24]	Pre-KNN	No special	Local	✓	✓	5	ADMM	Discriminant enhanced
RER [30]	Self-expression	Sparse	Global	✓	✓	3	ADMM	Robustness enhanced
DRLSR [34]	Fuzzy clustering	No special	Global	✓	✓	3	Alternating iterations	Anchor graph
AGLSOFS_N [31]	Adaptive manifold	Orthogonal + Sparse	Local	✗	✓	4	Alternating iterations	Reliable label
AGLSOFS_E [31]	Adaptive manifold	Orthogonal + Sparse	Local	✗	✓	4	Alternating iterations	Reliable label
DLPLSR	Fuzzy clustering + Adaptive manifold	Orthogonal + Sparse	Local + Global	✗	✗	4	Alternating iterations	Dual LP, pseudo-label-free

Table 3. The detailed prediction strategies of all methods.

Prediction Strategy	RSSLSR [33]	SFS_BLL [29]	DSLSR [24]	RER [30]	DRLSR [34]	AGLSOFS_N [31]	AGLSOFS_E [31]	DLPLSR
Unlabeled specified	✗	✗	✗	Pseudo labels	Pseudo labels	✗	✗	1-NN
Unlabeled in experiments	Pseudo labels	1-NN	1-NN	Pseudo labels	Pseudo labels	1-NN	Pseudo labels	1-NN
Testing in experiments	1-NN	1-NN	1-NN	1-NN	1-NN	1-NN	1-NN	1-NN
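
Table 3 reports that every method is evaluated on the testing data with a 1-nearest-neighbor (1-NN) rule. The rule can be sketched as follows, under stated assumptions: samples are first mapped by a learned projection matrix and then compared by Euclidean distance. The function names `project` and `predict_1nn`, the toy matrix `W`, and the choice of distance are illustrative and are not taken from the paper.

```python
from math import dist

def project(x, W):
    """Map a d-dimensional sample x to len(W[0]) dimensions (row vector times W)."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) for j in range(len(W[0]))]

def predict_1nn(queries, labeled_X, labeled_y, W):
    """Give each query the label of its nearest labeled sample in the projected space."""
    refs = [project(x, W) for x in labeled_X]
    preds = []
    for q in queries:
        zq = project(q, W)
        nearest = min(range(len(refs)), key=lambda i: dist(zq, refs[i]))
        preds.append(labeled_y[nearest])
    return preds

# Toy data: 3 features projected to 2 dimensions with a hypothetical W
# that simply discards the third feature.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
labeled_X = [[0.0, 0.0, 5.0], [4.0, 4.0, -5.0]]
labeled_y = ["a", "b"]
queries = [[0.5, 0.2, 9.0], [3.5, 4.1, 0.0]]
print(predict_1nn(queries, labeled_X, labeled_y, W))
```

Because the distance is taken after projection, features that the projection suppresses (the third coordinate in this toy case) do not influence which neighbor is chosen.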

Table 4. The optimal hyperparameters ($λ_{1}$ to $λ_{4}$) of DLPLSR in the benchmarks.

Opts	①	②	③	④	⑤	⑥	⑦	⑧	⑨	⑩	❶	❷	❸	❹
$λ_{1}$	0.0001	0.1	10	0.1	0.01	1	10	100	10	1000	10	1	0.01	0.1
$λ_{2}$	0.0001	1	1	1	0.001	1	0.0001	100	100	10	100	1	100	10,000
$λ_{3}$	0.0001	0.001	1000	100	1	10	10	100	0.0001	10,000	1000	10	0.1	10
$λ_{4}$	0.0001	1	1	0.0001	0.1	0.01	100	100	100	10,000	10	10	1000	10,000

Table 5. Classification accuracy (%) and ranks on unlabeled and testing data.

	Unlabeled								Testing
	RSSLSR	SFS_BLL	DSLSR	RER	DRLSR	AGLSOFS_N	AGLSOFS_E	DLPLSR	RSSLSR	SFS_BLL	DSLSR	RER	DRLSR	AGLSOFS_N	AGLSOFS_E	DLPLSR
Iris	73.13 (8)	85.07 (6)	95.52 (1)	92.54 (3)	89.55 (4)	89.55 (4)	74.63 (7)	95.52 (1)	100.00 (1)	96.00 (6)	100.00 (1)	100.00 (1)	96.00 (6)	94.67 (8)	98.67 (5)	100.00 (1)
Wine	70.00 (6)	73.75 (4)	76.25 (3)	71.25 (5)	62.50 (7)	78.75 (2)	43.75 (8)	91.25 (1)	70.79 (5)	70.79 (5)	83.15 (3)	70.79 (5)	62.92 (8)	84.27 (1)	76.40 (4)	84.27 (1)
USPS	86.66 (6)	53.91 (8)	89.24 (3)	95.70 (1)	89.77 (2)	87.52 (5)	75.86 (7)	87.91 (4)	88.30 (2)	54.33 (8)	88.19 (3)	82.66 (7)	88.34 (1)	86.26 (5)	85.76 (6)	86.75 (4)
2k2k	74.17 (6)	76.00 (4)	75.28 (5)	86.67 (1)	65.44 (7)	78.28 (2)	64.67 (8)	77.39 (3)	79.25 (3)	76.90 (6)	77.55 (5)	74.40 (7)	65.50 (8)	79.65 (2)	78.40 (4)	79.95 (1)
10k	81.42 (5)	81.04 (6)	84.16 (4)	93.36 (1)	79.40 (7)	84.36 (3)	70.73 (8)	85.11 (2)	84.30 (3)	81.04 (6)	83.44 (4)	78.74 (8)	80.02 (7)	85.74 (1)	83.26 (5)	84.66 (2)
Semeion	69.83 (5)	53.91 (7)	75.98 (4)	79.19 (1)	32.26 (8)	77.93 (3)	61.45 (6)	79.19 (1)	75.28 (4)	54.33 (7)	71.27 (5)	69.89 (6)	34.76 (8)	80.68 (1)	78.04 (3)	80.18 (2)
COIL20	79.17 (6)	81.48 (4)	84.10 (2)	86.11 (1)	62.19 (8)	79.94 (5)	68.67 (7)	82.56 (3)	80.00 (5)	79.58 (6)	83.61 (1)	80.28 (4)	62.78 (8)	78.19 (7)	81.53 (3)	83.47 (2)
COIL100	56.23 (6)	68.49 (4)	72.19 (2)	75.49 (1)	53.55 (7)	67.87 (5)	47.59 (8)	68.77 (3)	60.17 (7)	66.03 (2)	69.17 (1)	64.06 (6)	50.11 (8)	64.97 (4)	64.47 (5)	65.92 (3)
ORL	24.44 (2)	9.44 (8)	21.67 (6)	22.22 (5)	13.89 (7)	23.33 (4)	25.56 (1)	23.89 (3)	45.00 (4)	22.00 (8)	45.00 (4)	51.00 (2)	39.50 (7)	51.00 (2)	43.50 (6)	52.00 (1)
Jaffe	93.68 (4)	83.16 (7)	96.84 (2)	97.89 (1)	53.68 (8)	87.37 (6)	94.74 (3)	93.68 (4)	92.52 (3)	85.98 (7)	100.00 (1)	90.65 (5)	49.53 (8)	86.92 (6)	91.59 (4)	98.13 (2)
PIX10	26.67 (3)	22.22 (7)	26.67 (3)	31.11 (2)	22.22 (7)	26.67 (3)	80.00 (1)	26.67 (3)	58.00 (6)	62.00 (3)	62.00 (3)	60.00 (5)	58.00 (6)	66.00 (1)	58.00 (6)	64.00 (2)
Yale	22.97 (3)	22.97 (3)	18.92 (6)	27.03 (2)	14.86 (8)	18.92 (6)	33.78 (1)	21.62 (5)	33.73 (6)	34.94 (5)	40.96 (1)	39.76 (4)	16.87 (8)	40.96 (1)	33.73 (6)	40.96 (1)
YaleB	54.33 (4)	12.25 (8)	52.30 (5)	32.23 (6)	58.29 (3)	60.22 (2)	19.71 (7)	62.62 (1)	59.98 (3)	14.00 (8)	49.46 (7)	56.01 (5)	60.15 (2)	56.84 (4)	52.53 (6)	60.23 (1)
AR	33.76 (2)	14.19 (6)	1.45 (7)	53.16 (1)	1.20 (8)	22.82 (4)	18.80 (5)	25.47 (3)	32.38 (1)	12.54 (5)	0.46 (8)	23.31 (2)	1.69 (7)	22.15 (4)	10.46 (6)	23.31 (2)
Average	4.71	5.86	3.79	2.21	6.50	3.86	5.50	2.64	3.79	5.86	3.36	4.79	6.57	3.36	4.93	1.79
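
The "Average" row of Table 5 appears to be the arithmetic mean of the per-dataset ranks in each column. A minimal sketch that recomputes two of its entries from ranks transcribed from the table; the helper name `average_rank` is illustrative, and rounding to two decimals is an assumption about how the row was produced.

```python
def average_rank(ranks):
    """Mean of per-dataset ranks, rounded to two decimals."""
    return round(sum(ranks) / len(ranks), 2)

# Ranks in parentheses from Table 5, in row order (Iris, Wine, ..., AR).
dlplsr_testing = [1, 1, 4, 1, 2, 2, 2, 3, 1, 2, 2, 1, 1, 2]
rer_unlabeled = [3, 5, 1, 1, 1, 1, 1, 1, 5, 1, 2, 2, 6, 1]

print(average_rank(dlplsr_testing))  # compare with the DLPLSR testing entry, 1.79
print(average_rank(rer_unlabeled))   # compare with the RER unlabeled entry, 2.21
```
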

Table 6. Classification F1-score (%) and ranks on unlabeled and testing data.

	Unlabeled								Testing
	RSSLSR	SFS_BLL	DSLSR	RER	DRLSR	AGLSOFS_N	AGLSOFS_E	DLPLSR	RSSLSR	SFS_BLL	DSLSR	RER	DRLSR	AGLSOFS_N	AGLSOFS_E	DLPLSR
Iris	58.33 (8)	84.28 (6)	95.48 (1)	92.35 (3)	89.61 (4)	89.09 (5)	62.19 (7)	95.48 (1)	100.00 (1)	96.17 (6)	100.00 (1)	100.00 (1)	95.92 (7)	94.59 (8)	98.62 (5)	100.00 (1)
Wine	68.76 (5)	72.78 (4)	76.14 (3)	67.43 (6)	59.63 (7)	78.82 (2)	20.29 (8)	91.64 (1)	69.45 (6)	68.99 (7)	82.87 (3)	69.82 (5)	59.51 (8)	84.81 (1)	76.23 (4)	84.42 (2)
USPS	85.55 (6)	54.57 (8)	87.92 (3)	95.21 (1)	88.61 (2)	86.36 (5)	75.34 (7)	86.61 (4)	87.06 (2)	55.13 (8)	86.90 (3)	81.32 (7)	87.09 (1)	85.04 (5)	84.60 (6)	85.40 (4)
2k2k	73.37 (6)	75.82 (4)	75.29 (5)	86.57 (1)	64.90 (7)	78.09 (2)	64.41 (8)	77.27 (3)	78.72 (3)	76.53 (6)	77.25 (5)	73.97 (7)	64.62 (8)	79.19 (2)	77.90 (4)	79.60 (1)
10k	80.71 (5)	80.59 (6)	83.75 (4)	93.24 (1)	78.93 (7)	84.00 (3)	69.79 (8)	84.75 (2)	83.92 (3)	80.75 (6)	83.13 (4)	78.40 (8)	79.56 (7)	85.49 (1)	82.93 (5)	84.36 (2)
Semeion	69.02 (5)	54.57 (7)	75.93 (4)	78.74 (2)	31.39 (8)	77.92 (3)	60.28 (6)	79.13 (1)	75.67 (4)	55.13 (7)	71.06 (5)	70.21 (6)	33.98 (8)	80.92 (1)	78.21 (3)	80.05 (2)
COIL20	76.85 (6)	80.14 (4)	82.70 (2)	85.34 (1)	59.78 (8)	78.22 (5)	66.76 (7)	81.94 (3)	79.69 (5)	79.11 (6)	83.58 (1)	79.77 (4)	61.30 (8)	77.75 (7)	81.53 (3)	83.26 (2)
COIL100	51.61 (7)	66.81 (4)	71.57 (2)	75.01 (1)	52.67 (6)	66.67 (5)	41.62 (8)	67.55 (3)	60.37 (7)	65.31 (3)	69.78 (1)	63.30 (6)	50.59 (8)	65.07 (4)	64.44 (5)	65.86 (2)
ORL	23.69 (4)	9.75 (8)	23.20 (5)	18.68 (6)	17.62 (7)	25.19 (3)	31.44 (1)	25.26 (2)	29.29 (4)	14.27 (8)	28.52 (5)	32.73 (2)	26.51 (7)	32.73 (3)	27.34 (6)	33.25 (1)
Jaffe	93.56 (4)	83.08 (7)	96.75 (2)	98.23 (1)	53.05 (8)	87.25 (6)	94.53 (3)	93.49 (5)	92.48 (3)	85.98 (7)	100.00 (1)	89.13 (5)	51.40 (8)	86.81 (6)	91.40 (4)	98.12 (2)
PIX10	12.58 (8)	22.11 (6)	22.53 (4)	32.54 (2)	20.11 (7)	22.26 (5)	79.90 (1)	24.62 (3)	37.62 (6)	39.49 (2)	37.89 (5)	37.89 (4)	35.93 (7)	39.90 (1)	34.97 (8)	39.01 (3)
Yale	20.04 (3)	19.43 (4)	16.72 (6)	21.65 (2)	10.25 (8)	15.13 (7)	38.84 (1)	18.95 (5)	21.66 (7)	24.04 (5)	29.07 (1)	27.60 (4)	8.12 (8)	27.93 (3)	22.97 (6)	28.13 (2)
YaleB	58.10 (4)	12.39 (8)	52.52 (5)	30.79 (6)	59.79 (3)	60.21 (2)	19.33 (7)	62.52 (1)	62.50 (1)	15.05 (8)	51.12 (7)	56.99 (5)	60.98 (3)	57.58 (4)	53.81 (6)	61.23 (2)
AR	30.99 (2)	14.64 (6)	0.48 (7)	51.78 (1)	0.02 (8)	22.65 (4)	14.88 (5)	23.96 (3)	32.69 (1)	13.53 (5)	0.02 (8)	24.97 (2)	0.76 (7)	21.97 (4)	11.22 (6)	23.21 (3)
Average	5.21	5.86	3.79	2.43	6.43	4.07	5.50	2.64	3.79	6.00	3.57	4.71	6.79	3.57	5.07	2.07
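
Average ranks such as those above are the usual input to the post-hoc Nemenyi test referred to in Figure 4. The sketch below computes the critical difference in its standard form, CD = q_α · sqrt(k(k + 1) / (6N)), for k = 8 methods and N = 14 datasets, and applies it to the testing-side average F1 ranks of Table 6. The value q_α = 3.031 is the commonly tabulated constant for k = 8 at α = 0.05; it is an assumption here, as is the choice of which ranks to compare, since this section does not state what Figure 4 was computed from.

```python
from math import sqrt

def nemenyi_cd(q_alpha, k, n):
    """Critical difference for k methods compared on n datasets."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))

# Testing-side average F1 ranks, transcribed from Table 6.
avg_rank = {
    "RSSLSR": 3.79, "SFS_BLL": 6.00, "DSLSR": 3.57, "RER": 4.71,
    "DRLSR": 6.79, "AGLSOFS_N": 3.57, "AGLSOFS_E": 5.07, "DLPLSR": 2.07,
}

# Assumed constant: q_alpha for k = 8, alpha = 0.05 (not given in the source).
cd = nemenyi_cd(3.031, k=len(avg_rank), n=14)

# Methods whose average rank exceeds DLPLSR's by more than the critical difference.
worse = [m for m, r in avg_rank.items() if r - avg_rank["DLPLSR"] > cd]
print(round(cd, 3), worse)
```

Two methods are judged significantly different only when their average ranks differ by more than the critical difference, so smaller gaps in the table should not be read as significant under this test.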

Table 7. Average training time (seconds) across all hyperparameter groups for each dataset.

Time	10k	2k2k	AR	COIL100	COIL20	Iris	Jaffe	ORL	PIX10	Semeion	USPS	Wine	Yale	YaleB	AveT	AveR
RSSLSR	216.60 (5)	16.27 (4)	36.23 (4)	76.87 (4)	1.07 (4)	0.00 (2)	0.01 (2)	0.06 (1)	0.00 (1)	1.71 (3)	140.23 (6)	0.00 (2)	0.01 (1)	4.40 (5)	35.25 (4)	3.14 (3)
SFS_BLL	552.66 (7)	292.03 (8)	324.55 (7)	942.07 (8)	8.29 (8)	0.01 (4)	0.02 (4)	0.26 (6)	0.01 (3)	7.68 (7)	7.68 (1)	0.01 (4)	0.03 (3)	31.76 (8)	154.79 (7)	5.57 (6)
DSLSR	42.31 (2)	11.78 (3)	20.40 (2)	65.83 (3)	1.15 (5)	0.00 (1)	0.01 (1)	0.06 (2)	0.00 (2)	1.68 (2)	60.00 (4)	0.00 (1)	0.01 (2)	3.40 (4)	14.76 (2)	2.43 (1)
RER	907.55 (8)	110.90 (7)	301.14 (6)	503.40 (7)	3.91 (7)	0.02 (7)	0.05 (7)	0.21 (5)	0.02 (5)	8.50 (8)	517.86 (8)	0.03 (6)	0.04 (5)	15.56 (7)	169.23 (8)	6.64 (8)
DRLSR	229.03 (6)	23.06 (6)	9.38 (1)	97.05 (6)	0.91 (3)	0.01 (6)	0.05 (5)	0.12 (3)	0.04 (7)	2.37 (5)	166.95 (7)	0.05 (8)	0.05 (6)	2.19 (3)	37.95 (5)	5.14 (5)
AGLSOFS_N	125.67 (4)	21.20 (5)	415.23 (8)	81.20 (5)	2.51 (6)	0.04 (8)	0.15 (8)	0.92 (8)	0.13 (8)	3.93 (6)	63.49 (5)	0.05 (7)	0.24 (8)	9.10 (6)	51.71 (6)	6.57 (7)
AGLSOFS_E	44.99 (3)	4.81 (1)	24.00 (3)	26.37 (2)	0.41 (1)	0.00 (3)	0.02 (3)	0.14 (4)	0.02 (4)	0.67 (1)	59.67 (3)	0.00 (3)	0.03 (4)	1.65 (1)	11.63 (1)	2.57 (2)
DLPLSR	31.53 (1)	5.00 (2)	170.24 (5)	22.95 (1)	0.72 (2)	0.01 (5)	0.05 (6)	0.31 (7)	0.04 (6)	2.10 (4)	58.40 (2)	0.02 (5)	0.09 (7)	2.11 (2)	20.97 (3)	3.93 (4)

Table 8. Ablation results on the impact of regularization terms.

Datasets	ACC_U				ACC_T				F1_U				F1_T
Datasets	w	1	2	4	w	1	2	4	w	1	2	4	w	1	2	4
Iris	95.52	95.52	95.52	95.52	100.00	100.00	100.00	100.00	95.48	95.48	95.48	95.48	100.00	100.00	100.00	100.00
Wine	91.25	90.00	91.25	68.75	84.27	82.02	82.02	73.03	91.64	90.60	91.64	67.10	84.42	82.27	82.27	71.92
USPS	87.91	87.33	87.31	88.31	86.75	86.62	86.62	86.17	86.61	86.16	86.12	87.07	85.40	85.46	85.46	84.78
2k2k	77.39	77.11	78.06	78.22	79.95	79.65	79.80	79.35	77.27	77.03	77.93	78.11	79.60	79.26	79.40	78.93
10k	85.11	85.07	85.07	85.33	84.66	84.68	84.66	85.06	84.75	84.69	84.70	84.99	84.36	84.37	84.36	84.76
Semeion	79.19	77.23	78.07	77.93	80.18	80.43	78.17	80.93	79.13	77.17	77.99	77.92	80.05	80.46	78.21	80.73
COIL20	82.56	82.10	82.56	82.72	83.47	80.56	83.47	81.39	81.94	80.64	81.94	81.16	83.26	80.43	83.26	81.51
COIL100	68.77	67.41	68.33	67.38	65.92	64.81	65.58	64.83	67.55	66.28	67.02	66.19	65.86	64.80	65.65	64.86
ORL	23.89	19.44	23.89	20.00	52.00	44.00	51.00	43.50	25.26	20.94	26.34	21.86	33.25	27.60	32.49	27.32
Jaffe	93.68	88.42	91.58	90.53	98.13	92.52	97.20	91.59	93.49	88.35	91.94	90.60	98.12	92.43	97.21	91.51
PIX10	26.67	24.44	24.44	24.44	64.00	64.00	64.00	64.00	24.62	25.91	22.99	23.95	39.01	39.36	39.12	39.27
Yale	21.62	20.27	20.27	20.27	40.96	34.94	34.94	34.94	18.95	18.20	18.20	18.20	28.13	22.72	22.72	22.72
YaleB	62.62	59.58	59.94	24.59	60.23	56.42	58.00	22.87	62.52	59.45	60.02	25.62	61.23	57.04	58.73	24.83
AR	25.47	22.91	23.25	15.04	23.31	21.77	21.85	13.31	23.96	22.50	22.64	15.69	23.21	21.70	21.64	15.34

Table 9. Fault diagnosis performance comparison.

	CWRU				SEU				Average Rank
	ACC_U	ACC_T	F1_U	F1_T	ACC_U	ACC_T	F1_U	F1_T	Average Rank
RSSLSR	95.59 (1)	95.05 (1)	95.14 (1)	95.10 (1)	79.86 (6)	83.80 (4)	80.18 (6)	83.79 (4)	3.00
SFS_BLL	80.17 (5)	78.81 (4)	78.05 (5)	78.59 (4)	81.04 (5)	82.29 (6)	81.02 (5)	82.30 (6)	5.00
DSLSR	95.35 (2)	92.47 (2)	95.13 (2)	92.50 (2)	98.34 (1)	98.43 (1)	98.34 (1)	98.43 (1)	1.50
RER	93.27 (3)	78.11 (5)	92.24 (3)	77.94 (5)	90.22 (3)	55.26 (7)	90.25 (3)	55.28 (7)	4.50
DRLSR	9.91 (8)	10.14 (8)	1.80 (8)	1.84 (8)	20.14 (7)	20.00 (8)	6.70 (7)	6.67 (8)	7.75
AGLSOFS_N	71.73 (6)	71.35 (6)	68.58 (6)	71.37 (6)	83.09 (4)	83.54 (5)	83.18 (4)	83.57 (5)	5.25
AGLSOFS_E	42.23 (7)	71.07 (7)	31.30 (7)	70.77 (7)	20.14 (7)	92.56 (3)	6.70 (7)	92.56 (3)	6.00
DLPLSR	92.04 (4)	92.35 (3)	90.72 (4)	92.20 (3)	93.35 (2)	94.39 (2)	93.38 (2)	94.39 (2)	2.75

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

