Article

RMFRASL: Robust Matrix Factorization with Robust Adaptive Structure Learning for Feature Selection

by Shumin Lai, Longjun Huang, Ping Li, Zhenzhen Luo, Jianzhong Wang and Yugen Yi
1 School of Software, Jiangxi Normal University, Nanchang 330022, China
2 College of Information Science and Technology, Northeast Normal University, Changchun 130117, China
* Authors to whom correspondence should be addressed.
Algorithms 2023, 16(1), 14; https://doi.org/10.3390/a16010014
Submission received: 31 October 2022 / Revised: 3 December 2022 / Accepted: 22 December 2022 / Published: 26 December 2022

Abstract

In this paper, we present a novel unsupervised feature selection method termed robust matrix factorization with robust adaptive structure learning (RMFRASL), which can select discriminative features from a large amount of multimedia data to improve the performance of classification and clustering tasks. RMFRASL integrates three models (robust matrix factorization, adaptive structure learning, and structure regularization) into a unified framework. More specifically, a robust matrix factorization-based feature selection (RMFFS) model is proposed by introducing an indicator matrix to measure the importance of features, and the L21-norm is adopted as a metric to enhance the robustness of feature selection. Furthermore, a robust adaptive structure learning (RASL) model based on the self-representation capability of the samples is designed to discover the geometric structure relationships of original data. Lastly, a structure regularization (SR) term is designed on the learned graph structure, which constrains the selected features to preserve the structure information in the selected feature space. To solve the objective function of our proposed RMFRASL, an iterative optimization algorithm is proposed. By comparing our method with some state-of-the-art unsupervised feature selection approaches on several publicly available databases, the advantage of the proposed RMFRASL is demonstrated.

1. Introduction

In recent years, with the rapid development of the Internet and the popularization of mobile devices, it has become increasingly convenient to analyze, search, transmit, and process large amounts of data, which also promotes the development of society and the sharing of resources. However, with the continuous advancement of information technology, mobile Internet data are becoming enormous [1], inevitably increasing the amount of data that must be processed in real life. Although these high-dimensional data can bring us helpful information, they also cause problems such as data redundancy and noise [2,3,4]. Dimensionality reduction technology can not only avoid the “curse of dimensionality” by removing redundant and irrelevant features, but also reduce the time and storage space required for data processing [5]. Therefore, it is widely used to process large quantities of mobile Internet data [6]. The most commonly used dimensionality reduction techniques are feature extraction and feature selection. Feature extraction projects the original high-dimensional data into a low-dimensional subspace [7,8,9]. Unlike feature extraction, feature selection selects an optimal subset from the original feature set. Since feature selection preserves the semantics of the original features, it is more interpretable and physically meaningful; thus, it has been widely used in many fields, e.g., intrusion detection for the industrial Internet [10], attack detection in the Internet of Things [11], and computer vision [12].
According to the presence or absence of labels, feature selection methods can be divided into supervised, semi-supervised, and unsupervised methods. Among them, unsupervised feature selection is a significant research challenge because few labeled data are available in real life [13]. Unsupervised feature selection can be divided into three categories: (1) filtering methods, (2) wrapping methods, and (3) embedding-based methods. Filtering methods perform feature selection on the original dataset in advance and then train a learner; in other words, the procedure of feature selection is independent of learner training [14]. Wrapping methods continuously select feature subsets from the initial dataset and train the learner until the optimal feature subset is selected [15,16]. Embedding-based methods perform feature selection automatically during the learning process. Compared with the filtering and wrapping methods, embedded methods can significantly reduce the computational cost and achieve good results [17,18].
He et al. [19] proposed an unsupervised feature selection method named Laplacian score (LS), which selects optimal feature subsets to preserve the local manifold structure of the original data. However, the LS method ignores the correlation between features. To overcome this disadvantage, Liu et al. [20] proposed an unsupervised feature selection method with robust neighborhood embedding (RNEFS). RNEFS adopts the local linear embedding (LLE) algorithm to compute the feature weight matrix, and the L1-norm-based reconstruction error is utilized to suppress the influence of noise and outliers. Subsequently, Wang et al. [21] proposed an efficient soft label feature selection (SLFS) method, which first performs soft label learning and then selects features on the basis of the learned soft labels. However, the weight matrices of the above methods are defined in advance by considering the similarity of features, so their performance may be affected by the quality of these weight matrices. Therefore, Yuan et al. [22] proposed a convex non-negative matrix factorization method with adaptive graph learning for unsupervised feature selection (CNAFS), which embeds self-expression and pseudo-label information into a joint model. CNAFS can select the most representative features by computing the top-ranked features. Shang et al. [23] proposed a non-negative matrix factorization adaptive constraint-based feature selection (NNSAFS) method, which introduces a feature map into matrix factorization to combine manifold learning and feature selection. By combining feature transformation with adaptive constraints, NNSAFS can significantly improve the accuracy of feature selection. However, the above methods take the Euclidean distance as the metric, which is sensitive to noise and redundant data and may cause their performance to be unstable and not robust. To address this issue, Zhao et al. [24] proposed a joint adaptive and discriminative unsupervised feature selection method, which adaptively learns similarity graphs while employing irrelevance constraints to enhance the discriminability of the selected features. Zhu et al. [25] proposed a regularized self-representation (RSR) model, which efficiently performs feature selection by using the self-representation ability of features with an L21-norm error to guarantee robustness. Shi et al. [26] proposed a robust spectral learning framework, which utilizes graph embedding and spectral regression to deal with noise during unsupervised feature selection. Du et al. [27] proposed a robust unsupervised feature selection based on matrix factorization (RUFSM) method, which adopts the L21-norm to improve robustness and preserve the local manifold structure, so that an optimal feature subset can be found to retain the manifold structure information. Miao et al. [28] proposed an unsupervised feature selection method named graph-regularized locally linear embedding (GRLLE), which combines linear embedding and graph regularization into a unified framework while maintaining the local spatial structure. However, the correlation between the selected features is still ignored in the above methods.
This paper proposes an unsupervised feature selection method called robust matrix factorization with robust adaptive structure learning (RMFRASL) to solve the above problems. First, this paper presents a robust matrix factorization-based feature selection (RMFFS) model, which introduces a feature indicator matrix into matrix factorization with an L21-norm-based error metric. Next, a robust adaptive structure learning (RASL) model based on the self-representation capability of the samples is proposed to better describe the local geometry of the samples, which can reduce redundant and noisy features to learn a more discriminative graph structure. Then, to preserve the geometric structure information during feature selection, a structure regularization (SR) term is constructed. Finally, the above three models are combined into a unified framework, as shown in Figure 1. In summary, the main contributions of this paper are as follows:
(1) To overcome the shortcoming that existing methods neglect the correlation of features, we introduce an indicator matrix into matrix factorization and design a robust matrix factorization-based feature selection (RMFFS) model.
(2) To consider the structure information during the process of feature selection, a structure regularization term is integrated into the RMFFS model.
(3) To overcome the disadvantage that existing methods need a predefined graph structure, a robust adaptive structure learning model based on the self-representation capability of the samples is proposed, which adopts the L21-norm as an error metric rather than the L2-norm to reduce the influence of noise and outliers.
(4) To further improve the performance of feature selection, a unified framework is proposed by combining the processes of feature selection and structure learning.
(5) An iterative optimization algorithm is designed to solve the objective function, and the convergence of the optimization algorithm is verified from mathematical theory and numerical experiments.
This paper is organized as follows: Section 2 briefly reviews the related work. The proposed RMFRASL model is introduced in Section 3. The convergence analysis is provided in Section 4. The experimental results and analysis are provided in Section 5. Section 6 gives the conclusions and directions for future research.

2. Related Work

2.1. Notations

Assume that $Z = [z_1; z_2; \ldots; z_j; \ldots; z_n] \in \mathbb{R}^{n \times d}$ is a matrix of size $n \times d$ representing the high-dimensional original data, where $n$ and $d$ denote the number of samples and features, respectively, and $z_j = [z_{j1}, z_{j2}, \ldots, z_{jd}] \in \mathbb{R}^{1 \times d}$ is a row vector of size $d$ denoting the $j$-th sample of $Z$. The L1-norm, L2-norm, and L21-norm of the matrix $Z$ are defined as follows:
$$\|Z\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{d} |z_{ij}|,$$
$$\|Z\|_2 = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{d} z_{ij}^2},$$
$$\|Z\|_{2,1} = \sum_{i=1}^{n} \|z_i\|_2 = \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{d} z_{ij}^2}.$$
According to the above definitions and matrix operations [29], we can obtain $\|Z\|_{2,1} = \mathrm{tr}(Z^T U Z)$, where $U \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix whose diagonal elements are $u_{ii} = 1/\|z_i\|_2$. For ease of reading, a description of the commonly used symbols is listed in Table 1.
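As a quick numerical illustration of these definitions, the following minimal NumPy sketch (hypothetical data; the variable names are ours, not from the paper) checks the identity $\|Z\|_{2,1} = \mathrm{tr}(Z^T U Z)$ directly:
```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.random((6, 4))                      # n = 6 samples, d = 4 features

l1_norm  = np.abs(Z).sum()                  # ||Z||_1
l2_norm  = np.sqrt((Z ** 2).sum())          # ||Z||_2 (Frobenius norm)
row_l2   = np.linalg.norm(Z, axis=1)        # ||z_i||_2 for every row z_i
l21_norm = row_l2.sum()                     # ||Z||_{2,1}

# Diagonal matrix U with u_ii = 1 / ||z_i||_2, so that ||Z||_{2,1} = tr(Z^T U Z)
U = np.diag(1.0 / row_l2)
assert np.isclose(l21_norm, np.trace(Z.T @ U @ Z))
```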

2.2. Introduction of Indicator Matrix

The indicator matrix $S = [S_1, S_2, \ldots, S_k] \in \mathbb{R}^{d \times k}$ is a binary matrix, where $S_i$ denotes a column vector with only one element equal to 1. To understand how the indicator matrix $S$ is used for feature selection, we give an example. First, we assume that the data matrix contains four samples and that each sample has five features; the data matrix can be described as follows:
$$X = \begin{pmatrix} 0.9 & 0.1 & 0.7 & 1.0 & 0.2 \\ 0.7 & 0.4 & 0.5 & 0.3 & 0.4 \\ 0.3 & 0.8 & 0.4 & 0.6 & 0.1 \\ 1.0 & 0.6 & 0.6 & 0.8 & 1.0 \end{pmatrix},$$
where the rows correspond to the samples $x_1, x_2, x_3, x_4$ and the columns to the features $f_1, f_2, f_3, f_4, f_5$.
Then, supposing that three features are selected (i.e., f1, f4 and f5), we can obtain the selected feature subset as follows:
$$\begin{pmatrix} 0.9 & 1.0 & 0.2 \\ 0.7 & 0.3 & 0.4 \\ 0.3 & 0.6 & 0.1 \\ 1.0 & 0.8 & 1.0 \end{pmatrix},$$
whose columns correspond to the selected features $f_1$, $f_4$, and $f_5$.
Meanwhile, the indicator matrix S can be assumed as follows:
$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
whose rows are indexed by the five features and whose columns correspond to the selected features $f_1$, $f_4$, and $f_5$.
Thus, we can obtain
$$XS = \begin{pmatrix} 0.9 & 0.1 & 0.7 & 1.0 & 0.2 \\ 0.7 & 0.4 & 0.5 & 0.3 & 0.4 \\ 0.3 & 0.8 & 0.4 & 0.6 & 0.1 \\ 1.0 & 0.6 & 0.6 & 0.8 & 1.0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 0.9 & 1.0 & 0.2 \\ 0.7 & 0.3 & 0.4 \\ 0.3 & 0.6 & 0.1 \\ 1.0 & 0.8 & 1.0 \end{pmatrix}.$$
Equation (7) describes the process of feature selection using the indicator matrix S. Finally, we can also obtain $S^T S = I_{3 \times 3}$ as follows:
$$S^T S = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
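The toy example above can be reproduced with a few lines of NumPy (a sketch; the array literals simply restate the matrices shown above):
```python
import numpy as np

X = np.array([[0.9, 0.1, 0.7, 1.0, 0.2],
              [0.7, 0.4, 0.5, 0.3, 0.4],
              [0.3, 0.8, 0.4, 0.6, 0.1],
              [1.0, 0.6, 0.6, 0.8, 1.0]])

# Indicator matrix S selecting features f1, f4, f5 (exactly one 1 per column)
S = np.zeros((5, 3))
S[0, 0] = S[3, 1] = S[4, 2] = 1.0

X_selected = X @ S                                # feature selection via X S
assert np.allclose(X_selected, X[:, [0, 3, 4]])   # same as picking columns f1, f4, f5
assert np.allclose(S.T @ S, np.eye(3))            # S^T S = I_{3x3}
```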

3. Robust Matrix Factorization with Robust Adaptive Structure Learning

In this section, a robust matrix factorization with robust adaptive structure learning (RMFRASL) model for unsupervised feature selection is developed to deal with multimedia data collected from the Internet of Things. Then, an iterative optimization method is designed to solve the objective function of the proposed method. Lastly, a description and the time complexity of the proposed algorithm are provided.

3.1. The Proposed RMFRASL Model

The RMFRASL effectively integrates three models, i.e., robust matrix factorization-based feature selection (RMFFS), robust adaptive structure learning (RASL), and structure regularization (SR), into a unified framework.

3.1.1. RMFFS Model

In order to take the correlation of features into consideration, the self-representation of data or features is widely used for feature representation learning [30]. Therefore, a new feature selection model named robust matrix factorization-based feature selection (RMFFS) is proposed. In the proposed RMFFS model, we first utilize an indicator matrix to find a suitable subset of features for capturing the most critical information to represent the original features approximately. Then, unlike the previous methods, the L21-norm metric is imposed on the reconstruction errors to improve the effectiveness and enhance robustness of feature selection. As a result, the objective function of the RMFFS model is defined as follows:
$$\min_{S, A} \ \phi(Z, A, S) = \|Z - ZSA\|_{2,1} \quad \mathrm{s.t.} \quad S \geq 0, \ A \geq 0, \ S^T S = I_{k \times k},$$
where $I_{k \times k}$ is an identity matrix of size $k$, $A \in \mathbb{R}^{k \times d}$ is a coefficient matrix which maps the original features from the high-dimensional space into a new low-dimensional space, $S = [s_1, s_2, \ldots, s_k] \in \mathbb{R}^{d \times k}$ is defined as a feature weight matrix, and $k$ is the number of selected features. The constraint conditions $S^T S = I_{k \times k}$ and $S \geq 0$ force each element of matrix $S$ to be either one or zero and allow at most one nonzero element in any row or column during the iterative update process. Thus, the matrix $S$ can be considered an indicator matrix for the selected features under these constraint conditions.

3.1.2. RASL Model

Although the proposed RMFFS model can implement feature selection, it does not sufficiently consider the local structure relationships among original data, which may lead to unsatisfactory performance. Therefore, a robust structure learning method based on the self-representation capability of the original data is proposed, which can learn non-negative representation coefficients to indicate the structural relationships among original data. In order to learn more discriminative non-negative representation coefficients, an error metric function with L21-norm is imposed. Hence, the objective function of RASL is defined as follows:
$$\min_{W} \ \|Z^T - Z^T W\|_{2,1} \quad \mathrm{s.t.} \quad W \geq 0,$$
where $W = [w_1, w_2, \ldots, w_n] \in \mathbb{R}^{n \times n}$ denotes the non-negative representation coefficient matrix. A larger element of $W$ indicates a more remarkable correlation between the two corresponding samples.

3.1.3. SR Model

Preserving the structure of selected feature space is very important for feature learning [31]. Thus, a structure regularization (SR) term is designed, which can be defined as follows:
$$\min_{S, W} \ \varepsilon(Z, S, W) = \sum_{i,j=1}^{n} \|y_i - y_j\|_2^2 w_{ij} = \mathrm{tr}(Y D Y^T) - \mathrm{tr}(Y W Y^T) = \mathrm{tr}(Y L Y^T) = \mathrm{tr}(S^T Z^T L Z S) \quad \mathrm{s.t.} \ S \geq 0, \ W \geq 0,$$
where $y_i = S^T z_i^T$ denotes the selected features of sample $z_i$, $L$ is a Laplacian matrix defined by $L = D - W$ [32], and $D$ is a diagonal matrix with elements $d_{ii} = \sum_{j=1}^{n} w_{ij}$.
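A minimal sketch of how this term can be evaluated is given below (hypothetical random data; W is taken to be symmetric for the pairwise check, and, for a symmetric W, the weighted pairwise sum equals the trace term up to the conventional factor of 2):
```python
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 8, 5, 2
Z = rng.random((n, d))                         # data matrix
W = rng.random((n, n)); W = (W + W.T) / 2      # non-negative (here symmetric) weight matrix

# Indicator matrix S selecting k features (columns 0 and 3, chosen arbitrarily)
S = np.zeros((d, k)); S[0, 0] = S[3, 1] = 1.0

D = np.diag(W.sum(axis=1))                     # degree matrix, d_ii = sum_j w_ij
L = D - W                                      # graph Laplacian L = D - W

sr_term = np.trace(S.T @ Z.T @ L @ Z @ S)      # structure regularization tr(S^T Z^T L Z S)

# Pairwise form on the selected features y_i = S^T z_i^T
Y = (Z @ S).T                                  # k x n, column i is y_i
pairwise = sum(W[i, j] * np.sum((Y[:, i] - Y[:, j]) ** 2)
               for i in range(n) for j in range(n))
assert np.isclose(pairwise, 2 * sr_term)       # factor 2 for a symmetric W
```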

3.1.4. The Framework of RMFRASL

The three modules (feature selection, robust structure learning, and structure regularization) are integrated into a unified framework to obtain the final objective function of RMFRASL as follows:
$$\min_{S, A, W} \ \psi(Z, S, A, W) = \|Z - ZSA\|_{2,1} + \alpha\, \mathrm{tr}(S^T Z^T L Z S) + \beta \|Z^T - Z^T W\|_{2,1} \quad \mathrm{s.t.} \ S \geq 0, \ A \geq 0, \ W \geq 0, \ S^T S = I_{k \times k},$$
where α > 0 and β > 0 are tradeoff parameters that adjust the importance of each term in the model.
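For reference, the value of this objective for given Z, S, A, and W can be evaluated directly; the sketch below (our own helper names, not the authors' code) uses the row-wise L21-norm defined in Section 2.1:
```python
import numpy as np

def l21_norm(M):
    """||M||_{2,1}: sum of the L2 norms of the rows of M."""
    return np.linalg.norm(M, axis=1).sum()

def rmfrasl_objective(Z, S, A, W, alpha, beta):
    """Value of the RMFRASL objective in Equation (12) for fixed Z, S, A, W."""
    L = np.diag(W.sum(axis=1)) - W                    # Laplacian of the learned graph
    term_rmffs = l21_norm(Z - Z @ S @ A)              # robust matrix factorization term
    term_sr    = np.trace(S.T @ Z.T @ L @ Z @ S)      # structure regularization term
    term_rasl  = l21_norm(Z.T - Z.T @ W)              # robust adaptive structure learning term
    return term_rmffs + alpha * term_sr + beta * term_rasl
```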

3.2. Model Optimization

The proposed model has three variables (S, A, and W) to be solved. The objective function in Equation (12) is not jointly convex in all variables [32]; therefore, we cannot directly obtain its globally optimal solution. To solve this problem, we design an iterative optimization algorithm: we optimize one of the variables while fixing the others, and alternate these updates until the objective function converges.

3.2.1. Fix A and W; Update S

For Equation (12), removing the terms irrelevant to S yields the following objective function:
$$\varphi(S) = \|Z - ZSA\|_{2,1} + \alpha\, \mathrm{tr}(S^T Z^T L Z S) \quad \mathrm{s.t.} \ S \geq 0, \ S^T S = I_{k \times k}.$$
However, the constraint conditions in Equation (13) make it difficult to optimize; hence, we relax the constraint $S^T S = I_{k \times k}$ into the penalty term $\|S^T S - I_{k \times k}\|_2^2$. Thus, Equation (13) is transformed into the following objective function:
$$\varphi(S) = \|Z - ZSA\|_{2,1} + \alpha\, \mathrm{tr}(S^T Z^T L Z S) + \lambda \|S^T S - I\|_2^2 \quad \mathrm{s.t.} \ S \geq 0,$$
where the parameter λ > 0 is used to enforce the orthogonality of the matrix S.
Let $M = Z^T L Z$; according to the above definitions and matrix operations [29], Equation (14) can be simplified to Equation (15) through a series of algebraic operations:
$$\begin{aligned} \varphi(S) &= \|Z - ZSA\|_{2,1} + \alpha\, \mathrm{tr}(S^T M S) + \lambda\, \mathrm{tr}\big((S^T S - I)(S^T S - I)^T\big) \\ &= \mathrm{tr}\big((Z - ZSA)^T U (Z - ZSA)\big) + \alpha\, \mathrm{tr}(S^T M S) + \lambda\, \mathrm{tr}\big((S^T S - I)(S^T S - I)\big) \\ &= \mathrm{tr}\big((Z^T - A^T S^T Z^T) U (Z - ZSA)\big) + \alpha\, \mathrm{tr}(S^T M S) + \lambda\, \mathrm{tr}(S^T S S^T S - 2 S^T S) \\ &= \mathrm{tr}\big(Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A + \alpha S^T M S + \lambda S^T S S^T S - 2 \lambda S^T S\big) \quad \mathrm{s.t.} \ S \geq 0, \end{aligned}$$
where $U \in \mathbb{R}^{n \times n}$ denotes the diagonal matrix defined as follows:
$$U_{ii} = \frac{1}{\|(Z - ZSA)_i\|_2 + \varepsilon},$$
where ε is a small constant.
To solve Equation (15), a Lagrange multiplier $\Lambda$ for the non-negative constraint $S \geq 0$ is introduced, and the Lagrangian function $\varphi(S, \Lambda)$ can be defined as follows:
$$\varphi(S, \Lambda) = \mathrm{tr}\big(Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A + \alpha S^T M S\big) + \lambda\, \mathrm{tr}(S^T S S^T S - 2 S^T S) + \mathrm{tr}(\Lambda S) \quad \mathrm{s.t.} \ S \geq 0.$$
The partial derivative of Equation (17) with respect to matrix S can be obtained as follows:
$$\frac{\partial \varphi(S, \Lambda)}{\partial S} = -2 Z^T U Z A^T + 2 Z^T U Z S A A^T + 2 \alpha M S + 4 \lambda S S^T S - 4 \lambda S + \Lambda.$$
Equation (19) can be obtained according to the KKT condition ($\Lambda_{ij} S_{ij} = 0$) [33]:
$$\big(-2 Z^T U Z A^T + 2 Z^T U Z S A A^T + 2 \alpha M S + 4 \lambda S S^T S - 4 \lambda S + \Lambda\big)_{ij} S_{ij} = 0.$$
The update formula of the matrix S is as follows:
$$S_{ij} \leftarrow S_{ij} \frac{\big(Z^T U Z A^T + \alpha M^{-} S + 2 \lambda S\big)_{ij}}{\big(Z^T U Z S A A^T + \alpha M^{+} S + 2 \lambda S S^T S\big)_{ij}},$$
where $M = M^{+} - M^{-}$, $M^{+} = \frac{|M| + M}{2}$, and $M^{-} = \frac{|M| - M}{2}$.

3.2.2. Fix S and W; Update A

For Equation (12), removing the terms irrelevant to A yields the following objective function:
$$\varphi(A) = \|Z - ZSA\|_{2,1} \quad \mathrm{s.t.} \ A \geq 0.$$
Through a series of algebraic formulas [29], Equation (21) can be simplified as
$$\begin{aligned} \varphi(A) &= \|Z - ZSA\|_{2,1} = \mathrm{tr}\big((Z - ZSA)^T U (Z - ZSA)\big) \\ &= \mathrm{tr}\big(Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A\big) \quad \mathrm{s.t.} \ A \geq 0. \end{aligned}$$
Likewise, a Lagrange multiplier $\varpi$ for the non-negative constraint $A \geq 0$ is introduced to solve Equation (22), and the Lagrangian function $\varphi(A, \varpi)$ can be expressed as follows:
$$\varphi(A, \varpi) = \mathrm{tr}\big(Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A\big) + \mathrm{tr}(\varpi A).$$
The partial derivative of Equation (23) with respect to A is defined as follows:
$$\frac{\partial \varphi(A, \varpi)}{\partial A} = -2 S^T Z^T U Z + 2 S^T Z^T U Z S A + \varpi = 0.$$
As the complementary slackness of the KKT condition gives $\varpi_{ij} A_{ij} = 0$, we can obtain Equation (25):
$$\big(-2 S^T Z^T U Z + 2 S^T Z^T U Z S A + \varpi\big)_{ij} A_{ij} = 0.$$
According to Equation (25), the update rule of the matrix A is as follows:
$$A_{ij} \leftarrow A_{ij} \frac{\big(S^T Z^T U Z\big)_{ij}}{\big(S^T Z^T U Z S A\big)_{ij}}.$$

3.2.3. Fix S and A; Update W

For Equation (12), the following objective function can be obtained by removing the terms irrelevant to W:
$$\varphi(W) = \beta \|Z^T - Z^T W\|_{2,1} + \alpha \sum_{i,j=1}^{n} \|y_i - y_j\|_2^2 w_{ij} = \beta \|Z^T - Z^T W\|_{2,1} + \alpha \sum_{i,j=1}^{n} DistY_{ij}\, w_{ij},$$
where $DistY = [DistY_{ij}] \in \mathbb{R}^{n \times n}$ and $DistY_{ij} = \|y_i - y_j\|_2^2$.
Through a series of algebraic formulas [29], Equation (27) can be simplified as
$$\begin{aligned} \varphi(W) &= \|Z^T - Z^T W\|_{2,1} + \mu \sum_{i,j=1}^{n} DistY_{ij}\, w_{ij} \\ &= \mathrm{tr}\big((Z^T - Z^T W)^T C (Z^T - Z^T W)\big) + \mu\, \mathrm{tr}(DistY\, W) \\ &= \mathrm{tr}\big(Z C Z^T - 2 W^T Z C Z^T + W^T Z C Z^T W + \mu\, DistY\, W\big) \quad \mathrm{s.t.} \ W \geq 0, \end{aligned}$$
where $\mu = \alpha / \beta$ ($\beta \neq 0$) and $C \in \mathbb{R}^{d \times d}$ is a diagonal matrix defined as follows:
$$C_{ii} = \frac{1}{\|(Z^T - Z^T W)_i\|_2 + \varepsilon}.$$
To solve Equation (28), a Lagrange multiplier $\tau$ for the non-negative constraint $W \geq 0$ is introduced, and the Lagrangian function $\varphi(W, \tau)$ is defined as follows:
$$\varphi(W, \tau) = \mathrm{tr}\big(Z C Z^T - 2 W^T Z C Z^T + W^T Z C Z^T W + \mu\, DistY\, W\big) + \mathrm{tr}(\tau W).$$
The partial derivative of Equation (30) with respect to W can be shown as follows:
$$\frac{\partial \varphi(W, \tau)}{\partial W} = -2 Z C Z^T + 2 Z C Z^T W + \mu\, DistY + \tau = 0.$$
As the complementary slackness of the KKT condition gives $\tau_{ij} W_{ij} = 0$, we can obtain Equation (32):
$$\big(-2 Z C Z^T + 2 Z C Z^T W + \mu\, DistY + \tau\big)_{ij} W_{ij} = 0.$$
The update formula for the matrix W is as follows:
$$W_{ij} \leftarrow W_{ij} \frac{\big(2 Z C Z^T\big)_{ij}}{\big(2 Z C Z^T W + \mu\, DistY\big)_{ij}}.$$

3.3. Algorithm Description

The pseudo code of RMFRASL is listed in Algorithm 1, and a flowchart of the proposed method is shown in Figure 2. The termination condition of the algorithm is that the change in the objective function value between two successive iterations is less than a threshold or that the predefined maximum number of iterations has been reached.
Algorithm 1 RMFRASL algorithm
Input: The sample data matrix $Z = [z_1; z_2; \ldots; z_n] \in \mathbb{R}^{n \times d}$; parameters α > 0, β > 0, λ > 0
1: Initialization: matrices S0, A0, and W0 are initial nonnegative matrices, t = 0
2: Calculate matrices Ut and Ct according to Equations (16) and (29)
3: Repeat
4:   Update St according to Equation (20)
5:   Update At according to Equation (26)
6:   Update Wt according to Equation (33)
7:   Update Ut and Ct according to Equations (16) and (29)
8:   Update t = t + 1
9: Until the objective function is convergent.
Output: Index matrix S, coefficient matrix A, and weight matrix W
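For concreteness, a compact NumPy sketch of Algorithm 1 is given below. It follows the multiplicative updates of Equations (20), (26), and (33) under stated assumptions (non-negative random initialization, a small ε added to every denominator, and ranking features by the row norms of S at the end); it is an illustrative re-implementation, not the authors' released code.
```python
import numpy as np

def rmfrasl(Z, k, alpha=0.1, beta=0.01, lam=1e5, max_iter=100, tol=1e-6, eps=1e-8):
    """Sketch of the RMFRASL iterative optimization (Algorithm 1)."""
    n, d = Z.shape
    rng = np.random.default_rng(0)
    S, A, W = rng.random((d, k)), rng.random((k, d)), rng.random((n, n))

    prev_obj = np.inf
    for _ in range(max_iter):
        # Diagonal reweighting matrices, Equations (16) and (29)
        U = np.diag(1.0 / (np.linalg.norm(Z - Z @ S @ A, axis=1) + eps))
        C = np.diag(1.0 / (np.linalg.norm(Z.T - Z.T @ W, axis=1) + eps))

        # Graph Laplacian, M = Z^T L Z, and its positive/negative parts
        L = np.diag(W.sum(axis=1)) - W
        M = Z.T @ L @ Z
        Mp, Mn = (np.abs(M) + M) / 2, (np.abs(M) - M) / 2

        # Multiplicative updates, Equations (20), (26), and (33)
        S *= (Z.T @ U @ Z @ A.T + alpha * Mn @ S + 2 * lam * S) / \
             (Z.T @ U @ Z @ S @ A @ A.T + alpha * Mp @ S + 2 * lam * S @ S.T @ S + eps)
        A *= (S.T @ Z.T @ U @ Z) / (S.T @ Z.T @ U @ Z @ S @ A + eps)
        Y = (Z @ S).T
        DistY = ((Y[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0)   # DistY_ij = ||y_i - y_j||^2
        W *= (2 * Z @ C @ Z.T) / (2 * Z @ C @ Z.T @ W + (alpha / beta) * DistY + eps)

        # Objective value of Equation (12), used for the stopping criterion
        obj = (np.linalg.norm(Z - Z @ S @ A, axis=1).sum()
               + alpha * np.trace(S.T @ Z.T @ L @ Z @ S)
               + beta * np.linalg.norm(Z.T - Z.T @ W, axis=1).sum())
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj

    # A common selection rule: rank the features by the row norms of the learned S
    selected = np.argsort(-np.linalg.norm(S, axis=1))[:k]
    return selected, S, A, W
```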

3.4. Computational Complexity Analysis

The computational complexity can be calculated in light of the above flowchart of the proposed RMFRASL algorithm. First, the computational complexities of all variables in one iteration are shown in Table 2, where d, n, and k represent the dimension of the original samples, the number of samples, and the number of selected features, respectively. Since k ≪ d and k ≪ n, we only need to compare the sizes of d and n. (1) If d > n, updating S, A, and W needs O(nd²), O(kdn), and O(nd²), respectively, so the computational complexity of the RMFRASL algorithm is O(t × nd²), where t is the number of iterations. (2) If n ≥ d, updating S, A, and W needs O(dn²), O(kn²), and O(dn²), respectively, so the computational complexity of the RMFRASL algorithm is O(t × dn²). Finally, we can conclude that the computational complexity of the RMFRASL algorithm is O(t × max(dn², nd²) + dn²).
Meanwhile, the running time of the proposed RMFRASL and other compared feature selection methods was also tested when the number of experimental iterations was set to 10. From the experimental results listed in Table 3, the running time of the proposed method was lower than that of the other compared methods.

4. Convergence Analysis

In order to analyze the convergence of the proposed RMFRASL method, we first establish some theorems and definitions below.
Theorem 1.
For variables S, A, and W, the value of Equation (12) is nonincreasing under the update rules of Equations (20), (26) and (33).
We can refer to the convergence analysis of NMF to prove the above theorem. Therefore, some auxiliary functions are introduced below.
Definition 1.
$\phi(m, m')$ is an auxiliary function of $\varphi(m)$ if $\phi(m, m')$ and $\varphi(m)$ satisfy the following conditions [35]:
$$\phi(m, m') \geq \varphi(m), \qquad \phi(m, m) = \varphi(m).$$
Lemma 1.
If $\phi(m, m')$ is an auxiliary function of $\varphi(m)$, then $\varphi(m)$ is nonincreasing under the following update rule:
$$m^{t+1} = \arg\min_{m} \phi(m, m^t).$$
Proof. 
$$\varphi(m^{t+1}) \leq \phi(m^{t+1}, m^t) \leq \phi(m^t, m^t) = \varphi(m^t).$$
In order to apply Lemma 1 to prove Theorem 1, we need to find suitable auxiliary functions for the variables S, A, and W. Therefore, the functions, first-order derivatives, and second-order derivatives of the three variables are specified. □
$$\varphi_{ij}(S_{ij}) = \big[Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A + \alpha S^T Z^T L Z S + \lambda S^T S S^T S - 2 \lambda S^T S\big]_{ij},$$
$$\varphi'_{ij}(S_{ij}) = 2 \big[-Z^T U Z A^T + Z^T U Z S A A^T + \alpha Z^T L Z S + 2 \lambda S S^T S - 2 \lambda S\big]_{ij},$$
$$\varphi''_{ij}(S_{ij}) = 2 [Z^T U Z]_{ii} [A A^T]_{jj} + 2 \alpha [M]_{ij} + 4 \lambda [S S^T]_{ij},$$
$$\varphi_{ij}(A_{ij}) = \big[Z^T U Z - 2 A^T S^T Z^T U Z + A^T S^T Z^T U Z S A\big]_{ij},$$
$$\varphi'_{ij}(A_{ij}) = 2 \big[-S^T Z^T U Z + S^T Z^T U Z S A\big]_{ij},$$
$$\varphi''_{ij}(A_{ij}) = 2 \big[S^T Z^T U Z S\big]_{ii},$$
$$\varphi_{ij}(W_{ij}) = \big[Z C Z^T - 2 W^T Z C Z^T + W^T Z C Z^T W + \mu\, DistY\, W\big]_{ij},$$
$$\varphi'_{ij}(W_{ij}) = \big[-2 Z C Z^T + 2 Z C Z^T W + \mu\, DistY\big]_{ij},$$
$$\varphi''_{ij}(W_{ij}) = 2 \big[Z C Z^T\big]_{ii},$$
where $\varphi'_{ij}(\cdot)$ and $\varphi''_{ij}(\cdot)$ denote the first-order and second-order derivatives, respectively.
Then, we can obtain the three lemmas described below.
Lemma 2.
The function
$$\phi(S_{ij}, S_{ij}^{(t)}) = \varphi_{ij}(S_{ij}^{(t)}) + \varphi'_{ij}(S_{ij}^{(t)}) (S_{ij} - S_{ij}^{(t)}) + \frac{\big[Z^T U Z S A A^T + \alpha M^{+} S + 2 \lambda S S^T S\big]_{ij}}{S_{ij}^{(t)}} (S_{ij} - S_{ij}^{(t)})^2$$
is an auxiliary function of $\varphi(S_{ij})$.
Lemma 3.
The function
$$\phi(A_{ij}, A_{ij}^{(t)}) = \varphi_{ij}(A_{ij}^{(t)}) + \varphi'_{ij}(A_{ij}^{(t)}) (A_{ij} - A_{ij}^{(t)}) + \frac{\big[S^T Z^T U Z S A\big]_{ij}}{A_{ij}^{(t)}} (A_{ij} - A_{ij}^{(t)})^2$$
is an auxiliary function of $\varphi(A_{ij})$.
Lemma 4.
The function
$$\phi(W_{ij}, W_{ij}^{(t)}) = \varphi_{ij}(W_{ij}^{(t)}) + \varphi'_{ij}(W_{ij}^{(t)}) (W_{ij} - W_{ij}^{(t)}) + \frac{\big[2 Z C Z^T W + \mu\, DistY\big]_{ij}}{W_{ij}^{(t)}} (W_{ij} - W_{ij}^{(t)})^2$$
is an auxiliary function of $\varphi(W_{ij})$.
Next, we can prove Lemma 2 using the steps below.
Proof of Lemma 2.
The Taylor series expansion of $\varphi(S_{ij})$ can be given as follows:
$$\begin{aligned} \varphi_{ij}(S_{ij}) &= \varphi_{ij}(S_{ij}^{(t)}) + \varphi'_{ij}(S_{ij}^{(t)}) (S_{ij} - S_{ij}^{(t)}) + \frac{1}{2} \varphi''_{ij}(S_{ij}^{(t)}) (S_{ij} - S_{ij}^{(t)})^2 \\ &= \varphi_{ij}(S_{ij}^{(t)}) + \varphi'_{ij}(S_{ij}^{(t)}) (S_{ij} - S_{ij}^{(t)}) + \frac{1}{2} \big(2 [Z^T U Z]_{ii} [A A^T]_{jj} + 2 \alpha [M]_{ij} + 4 \lambda [S S^T]_{ij}\big) (S_{ij} - S_{ij}^{(t)})^2. \end{aligned}$$
We can obtain the following inequalities through a series of matrix calculations:
$$\big[Z^T U Z S A A^T\big]_{ij} = \sum_{l=1}^{k} \big[Z^T U Z S^t\big]_{il} \big[A A^T\big]_{lj} \geq \big[Z^T U Z S^t\big]_{ij} \big[A A^T\big]_{jj} = \Big(\sum_{l=1}^{d} [Z^T U Z]_{il} S^t_{lj}\Big) \big[A A^T\big]_{jj} \geq [Z^T U Z]_{ii} S^t_{ij} \big[A A^T\big]_{jj} = [Z^T U Z]_{ii} \big[A A^T\big]_{jj} S^t_{ij},$$
$$\big[M^{+} S\big]_{ij} = \sum_{l=1}^{d} [M^{+}]_{il} S^t_{lj} \geq M^{+}_{ij} S^t_{ij} \geq \big(M^{+}_{ij} - M^{-}_{ij}\big) S^t_{ij} = [M]_{ij} S^t_{ij},$$
$$\big[S S^T S\big]_{ij} = \sum_{l=1}^{d} [S S^T]_{il} S^t_{lj} \geq [S S^T]_{ii} S^t_{ij}.$$
To sum up, we can obtain the inequality
$$\frac{\big[Z^T U Z S A A^T + \alpha M^{+} S + 2 \lambda S S^T S\big]_{ij}}{S_{ij}^{(t)}} \geq \big[Z^T U Z S^t\big]_{ij} \big[A A^T\big]_{jj} + \alpha [M]_{ij} + 2 \lambda [S S^T]_{ii}.$$
Therefore, we can obtain Equation (54).
$$\phi(S_{ij}, S_{ij}^{(t)}) \geq \varphi_{ij}(S_{ij}).$$
Similarly, we can prove Lemma 3 and Lemma 4. Finally, according to Lemma 1, we can obtain the update scheme of variables S, A, and W as follows:
$$S_{ij}^{t+1} = \arg\min_{S_{ij}} \phi(S_{ij}, S_{ij}^{t}) = S_{ij}^{t} - S_{ij}^{t} \frac{\varphi'_{ij}(S_{ij}^{t})}{2 \big[Z^T U Z S A A^T + \alpha M^{+} S + 2 \lambda S S^T S\big]_{ij}} = S_{ij}^{t} \frac{\big[Z^T U Z A^T + \alpha M^{-} S + 2 \lambda S\big]_{ij}}{\big[Z^T U Z S A A^T + \alpha M^{+} S + 2 \lambda S S^T S\big]_{ij}},$$
$$A_{ij}^{t+1} = \arg\min_{A_{ij}} \phi(A_{ij}, A_{ij}^{t}) = A_{ij}^{t} - A_{ij}^{t} \frac{\varphi'_{ij}(A_{ij}^{t})}{2 \big[S^T Z^T U Z S A\big]_{ij}} = A_{ij}^{t} \frac{\big[S^T Z^T U Z\big]_{ij}}{\big[S^T Z^T U Z S A\big]_{ij}},$$
$$W_{ij}^{t+1} = \arg\min_{W_{ij}} \phi(W_{ij}, W_{ij}^{t}) = W_{ij}^{t} - W_{ij}^{t} \frac{\varphi'_{ij}(W_{ij}^{t})}{2 \big[2 Z C Z^T W + \mu\, DistY\big]_{ij}} = W_{ij}^{t} \frac{\big[2 Z C Z^T\big]_{ij}}{\big[2 Z C Z^T W + \mu\, DistY\big]_{ij}}.$$
In summary, Theorem 1 is proven.
Next, the convergence of the proposed iterative procedure in Algorithm 1 is proven.
For any nonzero vectors $p \in \mathbb{R}^m$ and $q \in \mathbb{R}^m$, the following inequality can be obtained:
$$\|p\|_2 - \frac{\|p\|_2^2}{2 \|q\|_2} \leq \|q\|_2 - \frac{\|q\|_2^2}{2 \|q\|_2}.$$
A more detailed proof of Equation (58) is similar to that given in [36].
Theorem 2.
According to the iterative update approach depicted in Algorithm 1, the objective function value in Equation (12) monotonically decreases in each iteration until it converges to the global optimum [36].
Proof. 
First, matrices $U^t$ and $C^t$ denote the values of matrices U and C at the t-th iteration. Then, with $U^t$ and $C^t$ fixed, matrices $S^{t+1}$, $A^{t+1}$, and $W^{t+1}$ are updated so that the following inequality holds:
$$\Phi(S^{t+1}, A^{t+1}, W^{t+1}, U^t, C^t) \leq \Phi(S^t, A^t, W^t, U^t, C^t).$$
The above inequality can be changed to
$$\begin{aligned} &\mathrm{tr}\big((Z - Z S^{(t+1)} A^{(t+1)})^T U^{(t)} (Z - Z S^{(t+1)} A^{(t+1)})\big) + \alpha\, \mathrm{tr}\big(S^{(t+1)T} Z^T L Z S^{(t+1)}\big) + \beta\, \mathrm{tr}\big((Z^T - Z^T W^{(t+1)})^T C^{(t)} (Z^T - Z^T W^{(t+1)})\big) \\ \leq\ &\mathrm{tr}\big((Z - Z S^{(t)} A^{(t)})^T U^{(t)} (Z - Z S^{(t)} A^{(t)})\big) + \alpha\, \mathrm{tr}\big(S^{(t)T} Z^T L Z S^{(t)}\big) + \beta\, \mathrm{tr}\big((Z^T - Z^T W^{(t)})^T C^{(t)} (Z^T - Z^T W^{(t)})\big). \end{aligned}$$
On the basis of the definitions of matrices Ut and Ct, the following inequalities can be obtained:
$$\begin{aligned} &\sum_i \frac{\|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2} + \alpha\, \mathrm{tr}\big(S^{(t+1)T} Z^T L Z S^{(t+1)}\big) + \beta \sum_i \frac{\|(Z^T - Z^T W^{(t+1)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2} \\ \leq\ &\sum_i \frac{\|(Z - Z S^{(t)} A^{(t)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2} + \alpha\, \mathrm{tr}\big(S^{(t)T} Z^T L Z S^{(t)}\big) + \beta \sum_i \frac{\|(Z^T - Z^T W^{(t)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2}, \end{aligned}$$
$$\begin{aligned} &\sum_i \|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2 - \Big(\sum_i \|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2 - \sum_i \frac{\|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2}\Big) + \alpha\, \mathrm{tr}\big(S^{(t+1)T} Z^T L Z S^{(t+1)}\big) \\ &\quad + \beta \sum_i \|(Z^T - Z^T W^{(t+1)})_i\|_2 - \beta \Big(\sum_i \|(Z^T - Z^T W^{(t+1)})_i\|_2 - \sum_i \frac{\|(Z^T - Z^T W^{(t+1)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2}\Big) \\ \leq\ &\sum_i \|(Z - Z S^{(t)} A^{(t)})_i\|_2 - \Big(\sum_i \|(Z - Z S^{(t)} A^{(t)})_i\|_2 - \sum_i \frac{\|(Z - Z S^{(t)} A^{(t)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2}\Big) + \alpha\, \mathrm{tr}\big(S^{(t)T} Z^T L Z S^{(t)}\big) \\ &\quad + \beta \sum_i \|(Z^T - Z^T W^{(t)})_i\|_2 - \beta \Big(\sum_i \|(Z^T - Z^T W^{(t)})_i\|_2 - \sum_i \frac{\|(Z^T - Z^T W^{(t)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2}\Big). \end{aligned}$$
According to Equation (58), the following inequalities can be obtained:
$$\sum_i \|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2 - \sum_i \frac{\|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2} \leq \sum_i \|(Z - Z S^{(t)} A^{(t)})_i\|_2 - \sum_i \frac{\|(Z - Z S^{(t)} A^{(t)})_i\|_2^2}{2 \|(Z - Z S^{(t)} A^{(t)})_i\|_2},$$
$$\sum_i \|(Z^T - Z^T W^{(t+1)})_i\|_2 - \sum_i \frac{\|(Z^T - Z^T W^{(t+1)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2} \leq \sum_i \|(Z^T - Z^T W^{(t)})_i\|_2 - \sum_i \frac{\|(Z^T - Z^T W^{(t)})_i\|_2^2}{2 \|(Z^T - Z^T W^{(t)})_i\|_2}.$$
Combining Equations (61)–(64), the following inequality can be obtained:
$$\sum_i \|(Z - Z S^{(t+1)} A^{(t+1)})_i\|_2 + \alpha\, \mathrm{tr}\big(S^{(t+1)T} Z^T L Z S^{(t+1)}\big) + \beta \sum_i \|(Z^T - Z^T W^{(t+1)})_i\|_2 \leq \sum_i \|(Z - Z S^{(t)} A^{(t)})_i\|_2 + \alpha\, \mathrm{tr}\big(S^{(t)T} Z^T L Z S^{(t)}\big) + \beta \sum_i \|(Z^T - Z^T W^{(t)})_i\|_2.$$
From Equation (65), we can clearly see that the objective function in Equation (12) decreases monotonically during the iterations and has a lower bound. Therefore, the proposed iterative optimization algorithm will converge. Moreover, the numerical experiments verify that the objective function values of our proposed method converge.

5. Experiments and Analysis

In this section, we compare the proposed RMFRASL method with some advanced unsupervised feature selection methods in recent years for classification and clustering tasks, mainly consisting of LSFS [19], RNEFS [20], USFS [21], CNAFS [22], NNSAFS [23], RSR [25], and SPNFSR [34].

5.1. Description of Compared Methods

RNEFS [20] is a robust neighborhood embedding unsupervised feature selection method that employs the locally linear embedding (LLE) to compute the feature weight matrix and minimizes the reconstruction error using the L1-norm. USFS [21] guides feature selection by combining the feature selection matrix with soft labels learned in the low-dimensional subspace. CNAFS [22] combines adaptive graph learning with pseudo-label matrices and constructs two different manifold regularizations with pseudo-label and projection matrices. NNSAFS [23] uses residual terms from sparse regression and introduces feature graphs and maximum entropy theory to construct the manifold structure. RSR [25] uses a linear combination to represent the correlation among each feature. Meanwhile, it also imposes an L21-norm on the loss function and weight matrix to select representative features. SPNFSR [34] constructs a low-rank graph to maintain the global and local structure of data and employs the L21-norm to constrain the representation coefficients for feature selection. The summarization of the methods is shown in Table 4.

5.2. Description of Experimental Data

In this experiment, we performed classification and clustering tasks [37] on five publicly available datasets, including four publicly available face datasets (AR, CMU PIE, Extended YaleB, and ORL) and one object dataset (COIL20). The specific details of the five datasets are shown in Table 5, where Tr and Te represent the number of training samples and testing samples selected from each class.
The AR [38] database consists of 4000 images corresponding to face images of 70 males and 56 females. The photos have different facial expressions, lighting conditions, and occlusions. There are 26 images for each object. Some examples of the AR database are shown in Figure 3a.
The CMU PIE [39] database consists of 41,368 multi-pose, illumination, and expression face images, including face images with different pose conditions, illumination conditions, and expressions for each object. Some instances of the CMU PIE database are shown in Figure 3b.
The Extended YaleB [40] database consists of 2414 images containing 38 objects with 64 photographs taken for each object, with strict control over pose, lighting, and the shooting angle. Some face images of the Extended YaleB database are shown in Figure 3c.
The ORL [41] dataset consists of 400 face images containing 40 subjects taken at different times of day, under different lighting, and with different facial expressions. Some examples of the ORL dataset are shown in Figure 3d.
The COIL20 [42] dataset contains 1440 images taken from 20 objects at different angles, with 72 images per object. Some images of the COIL20 dataset are shown in Figure 3e.

5.3. Experimental Evaluation

For the classification task, the nearest neighbor classifier (NNC) [43] was employed to classify the selected features of each method, and the classification accuracy rate was used to evaluate the performance, which is defined as
$$Accuracy\ rate = \frac{N_{cor}}{N_{tol}} \times 100\%,$$
where Ncor and Ntol denote the number of the samples that are accurately classified and the number of all samples for classification, respectively.
For the clustering task, we applied the k-means clustering to cluster the selected features of different methods, and we adopted clustering accuracy (ACC) [44,45,46] to evaluate their performances. The ACC is defined as follows:
$$ACC = \frac{1}{n} \sum_{k=1}^{n} \vartheta\big(l_k, map(c_k)\big),$$
where n is the total number of test samples, $l_k$ and $c_k$ denote the ground-truth label and the corresponding clustering result of the test sample $x_k$, respectively, and $\vartheta(\cdot, \cdot)$ is a function defined as follows:
$$\vartheta(x, y) = \begin{cases} 1, & \text{if } x = y \\ 0, & \text{if } x \neq y. \end{cases}$$
$map(\cdot)$ is a mapping function that projects each clustering label $c_k$ to its corresponding ground-truth label $l_k$ using the Kuhn–Munkres algorithm. Moreover, normalized mutual information (NMI) [46] was also used to test the clustering performance, which is defined as follows:
$$NMI(U, V) = \frac{I(U, V)}{\sqrt{H(U) H(V)}},$$
where U and V denote two arbitrary variables, H(U) and H(V) are the entropies of U and V, and I(U, V) is the mutual information between U and V.
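Both clustering metrics are available through standard tooling; the sketch below (assuming SciPy and scikit-learn, with integer labels starting at 0) computes ACC via the Kuhn–Munkres assignment and notes the geometric normalization matching the NMI definition above:
```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true_labels, cluster_labels):
    """ACC: fraction of samples correctly assigned under the best one-to-one
    mapping between cluster labels and class labels (Kuhn-Munkres algorithm)."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    n_classes = int(max(true_labels.max(), cluster_labels.max())) + 1
    # Contingency table: count[i, j] = samples in cluster i whose true label is j
    count = np.zeros((n_classes, n_classes), dtype=int)
    for c, t in zip(cluster_labels, true_labels):
        count[c, t] += 1
    row_ind, col_ind = linear_sum_assignment(-count)   # maximize matched samples
    return count[row_ind, col_ind].sum() / len(true_labels)

# NMI with the sqrt(H(U)H(V)) normalization used above:
# normalized_mutual_info_score(true_labels, cluster_labels, average_method="geometric")
```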

5.4. Experimental Setting

In the experiments of this paper, we randomly selected Tr samples from each class of each database as the training set and used the remaining Te samples as the test set. All experiments were repeated 10 times, and the average classification accuracy was calculated. In order to find the best parameters for the proposed method, a grid search [47] was used. For the three balance parameters α, β, and λ, the values of α and β were chosen from {0, 0.01, 0.1, 1, 10, 100, 1000, 10,000}, and λ was set as large as possible (i.e., 100,000) in this work. For the feature dimensions, we set the range from 60 to 560 in intervals of 100 dimensions.
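A sketch of this search procedure is given below (the helper `evaluate_rmfrasl` is hypothetical and stands in for running Algorithm 1 with the given parameters, selecting the features, and returning the mean nearest-neighbor accuracy over the 10 random splits):
```python
import itertools
import random

def evaluate_rmfrasl(alpha, beta, lam, k):
    """Stand-in for the real pipeline (feature selection + nearest-neighbor
    classification averaged over 10 random splits); a random score keeps the
    sketch runnable."""
    return random.random()

param_grid = [0, 0.01, 0.1, 1, 10, 100, 1000, 10000]   # candidate values for alpha and beta
dims = range(60, 561, 100)                              # selected feature dimensions: 60, 160, ..., 560
lam = 1e5                                               # lambda fixed to a large value

best_acc, best_cfg = -1.0, None
for alpha, beta, k in itertools.product(param_grid, param_grid, dims):
    acc = evaluate_rmfrasl(alpha, beta, lam, k)
    if acc > best_acc:
        best_acc, best_cfg = acc, {"alpha": alpha, "beta": beta, "dimension": k}
```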

5.5. Analysis of Experimental Results

5.5.1. Classification Performance Analysis

In the first experiment, we tested the influence of parameter values on the classification performance of the proposed method. First, we analyzed the impact of each balance parameter (i.e., α and β) on the accuracy of the RMFRASL method. From the experimental results shown in Figure 4, we can observe that (1) when the value of parameters is set to 0, the performance of the proposed RMFRASL method is relatively low. Thus, adding these two terms is necessary for feature selection. (2) With the parameter value increasing, the performance of the method also improves, which indicates that introducing the two terms can enhance the discriminatory capability of the selected features. (3) When the performance of the proposed method reaches the optimal value, it decreases or remains stable as the value of the parameter increases. This phenomenon may be due to the parameters being set as larger values such that the importance of the second and third terms of the objective function is overemphasized, while the first term is neglected. As a result, the proposed method is unable to extract the discriminatory features. Then, we evaluated the impact of the number of iterations on the performance of the proposed method. Figure 5 shows the classification accuracy of RMFRASL with different numbers of iterations on five datasets, where the number of iterations was set to {20, 100, 200, 300, 400, and 500}. The experimental results show that the accuracy rate increased with the number of iterations between 20 and 100 and then stabilized or decreased slightly. The best combination of parameters for the classification task is listed in Table 6.
In the second experiment, we compared the performance of different feature selection methods. The classification accuracies of each unsupervised feature selection method on the five databases are given in Table 7. From the experimental results, we can see that, because LSFS computes the feature scores without considering the correlation of the selected features, its performance was the worst. RNEFS considers the feature self-representation capability; hence, its performance was better than that of LSFS. The performance of USFS was superior to that of RNEFS and LSFS, since it incorporates soft label learning and constructs regression relationships between the soft label matrix and the features to guide feature selection. CNAFS and NNSAFS incorporate the geometric structure of the data or features and can select more discriminative features, obtaining slightly higher classification accuracy than RNEFS and USFS. Since RSR and SPNFSR exploit the feature self-representation capability and define the reconstruction error with the L21-norm to significantly improve robustness to noise, their performance was better than that of the other compared methods. However, SPNFSR outperformed RSR because it considers the geometric structure of the features. By fully considering feature self-representation, geometric structure, adaptive graph learning, and robustness together, our proposed RMFRASL achieved the best results among all compared methods. The classification accuracy rate curves of the different feature selection methods with different numbers of selected features are shown in Figure 6. First, with the increase in the number of selected features, the classification accuracy rates of all methods improved. Then, the classification accuracy rates of most methods began to stabilize after achieving their best performance.

5.5.2. Clustering Performance Analysis

In this subsection, the k-means method was used for the clustering task. To reduce the randomness of the initialization in k-means, we report the mean of 10 experiments. First, the effect of different parameters on the performance of the proposed method was tested. The experimental results of each balance parameter (i.e., α and β) on the clustering performance of the RMFRASL method are shown in Figure 7. The results show that the influences of the balance parameters on the performance of our method were similar to those observed in the classification experiments. Then, with the optimal parameter settings for RMFRASL listed in Table 8, the clustering results in terms of ACC and NMI are shown in Table 9 and Table 10; the conclusions are consistent with those of the classification experiments.

5.5.3. Convergence Analysis

In this subsection, we set the parameters to their respective optimal combinations in each of the five databases; then, iterative convergence analysis was performed. Figure 8 shows that RMFRASL achieved convergence with a small number of iterations on each database. Moreover, we provide the number of iterations needed by the compared methods to achieve convergence in Table 11. According to this table, the number of iterations of the proposed RMFRASL method was lower than that of the other methods in most cases.

6. Discussion and Future Work

Feature selection is an effective dimensionality reduction technique; however, previous methods did not consider adaptive graph learning and robustness simultaneously. To solve the above problems, this paper proposed a robust matrix factorization with robust adaptive structure learning (RMFRASL) method, which integrates feature selection and adaptive graph learning into a unified framework. A large number of experiments showed that RMFRASL can achieve better performance than previous methods. Deep learning and transfer learning show excellent promise in feature learning tasks. In future work, it may be possible to introduce deep learning and transfer learning schemes into feature selection to develop a more efficient feature selection method.

Author Contributions

Data curation, S.L., L.H., P.L. and Z.L.; formal analysis, S.L., P.L. and Z.L.; methodology, S.L., L.H. and Y.Y.; resources, S.L. and Z.L.; supervision, L.H. and Y.Y.; writing—original draft, S.L., L.H. and J.W.; writing—review and editing, Y.Y. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China (Nos. 62062040, 62006174, and 61967010), the Outstanding Youth Project of Jiangxi Natural Science Foundation (No. 20212ACB212003), the Jiangxi Province Key Subject Academic and Technical Leader Funding Project (No. 20212BCJ23017), and the Fund of Jilin Provincial Science and Technology Department (No. 20210101187JC).

Data Availability Statement

The data were derived from public domain resources.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cheng, X.; Fang, L.; Hong, X.; Yang, L. Exploiting mobile big data: Sources, features, and applications. IEEE Netw. 2017, 31, 72–79.
  2. Cheng, X.; Fang, L.; Yang, L.; Cui, S. Mobile big data: The fuel for data-driven wireless. IEEE Internet Things J. 2017, 4, 1489–1516.
  3. Lv, Z.; Lou, R.; Li, J.; Singh, A.K.; Song, H. Big data analytics for 6G-enabled massive internet of things. IEEE Internet Things J. 2021, 8, 5350–5359.
  4. Tang, C.; Zheng, X.; Liu, X.; Zhang, W.; Zhang, J.; Xiong, J.; Wang, L. Cross-view locality preserved diversity and consensus learning for multi-view unsupervised feature selection. IEEE Trans. Knowl. Data Eng. 2021, 34, 4705–4716.
  5. Jin, J.; Xiao, R.; Daly, I.; Miao, Y.; Wang, X.; Cichocki, A. Internal feature selection method of CSP based on L1-norm and Dempster-Shafer theory. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4814–4825.
  6. Zhu, F.; Ning, Y.; Chen, X.; Zhao, Y.; Gang, Y. On removing potential redundant constraints for SVOR learning. Appl. Soft Comput. 2021, 102, 106941.
  7. Li, Z.; Nie, F.; Bian, J.; Wu, D.; Li, X. Sparse PCA via L2,p-norm regularization for unsupervised feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 1–8.
  8. Luo, F.; Zou, Z.; Liu, J.; Lin, Z. Dimensionality reduction and classification of hyperspectral image via multistructure unified discriminative embedding. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–16.
  9. Lee, D.D.; Seung, H.S. Learning the parts of objects by non-negative matrix factorization. Nature 1999, 401, 788–791.
  10. Awotunde, J.B.; Chakraborty, C.; Adeniyi, A.E. Intrusion detection in industrial internet of things network-based on deep learning model with rule-based feature selection. Wirel. Commun. Mob. Comput. 2021, 2021, 7154587.
  11. Aminanto, M.E.; Choi, R.; Tanuwidjaja, H.C.; Yoo, P.D.; Kim, K. Deep abstraction and weighted feature selection for Wi-Fi impersonation detection. IEEE Trans. Inf. Secur. 2017, 13, 621–636.
  12. Zhang, F.; Li, W.; Feng, Z. Data driven feature selection for machine learning algorithms in computer vision. IEEE Internet Things J. 2018, 5, 4262–4272.
  13. Qi, M.; Wang, T.; Liu, F.; Zhang, B.; Wang, J.; Yi, Y. Unsupervised feature selection by regularized matrix factorization. Neurocomputing 2018, 273, 593–610.
  14. Zhai, Y.; Ong, Y.S.; Tsang, I.W. The emerging “big dimensionality”. IEEE Comput. Intell. Mag. 2014, 9, 14–26.
  15. Bermejo, P.; Gámez, J.A.; Puerta, J.M. Speeding up incremental wrapper feature subset selection with Naive Bayes classifier. Knowl.-Based Syst. 2014, 55, 140–147.
  16. Cheng, L.; Wang, Y.; Liu, X.; Li, B. Outlier detection ensemble with embedded feature selection. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3503–3512.
  17. Lu, Q.; Li, X.; Dong, Y. Structure preserving unsupervised feature selection. Neurocomputing 2018, 301, 36–45.
  18. Li, X.; Zhang, H.; Zhang, R.; Liu, Y.; Nie, F. Generalized uncorrelated regression with adaptive graph for unsupervised feature selection. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 1587–1595.
  19. He, X.; Cai, D.; Niyogi, P. Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. 2005, 18, 1–8.
  20. Liu, Y.; Ye, D.; Li, W.; Wang, H.; Gao, Y. Robust neighborhood embedding for unsupervised feature selection. Knowl.-Based Syst. 2020, 193, 105462.
  21. Wang, F.; Zhu, L.; Li, J.; Chen, H.; Zhang, H. Unsupervised soft-label feature selection. Knowl.-Based Syst. 2021, 219, 106847.
  22. Yuan, A.; You, M.; He, D.; Li, X. Convex non-negative matrix factorization with adaptive graph for unsupervised feature selection. IEEE Trans. Cybern. 2020, 52, 5522–5534.
  23. Shang, R.; Zhang, W.; Lu, M.; Jiao, L.; Li, Y. Feature selection based on non-negative spectral feature learning and adaptive rank constraint. Knowl.-Based Syst. 2022, 236, 107749.
  24. Zhao, H.; Li, Q.; Wang, Z.; Nie, F. Joint Adaptive Graph Learning and Discriminative Analysis for Unsupervised Feature Selection. Cogn. Comput. 2022, 14, 1211–1221.
  25. Zhu, P.; Zuo, W.; Zhang, L.; Hu, Q.; Shiu, S.C. Unsupervised feature selection by regularized self-representation. Pattern Recognit. 2015, 48, 438–446.
  26. Shi, L.; Du, L.; Shen, Y.D. Robust spectral learning for unsupervised feature selection. In Proceedings of the 2014 IEEE International Conference on Data Mining, Shenzhen, China, 14–17 December 2014; pp. 977–982.
  27. Du, S.; Ma, Y.; Li, S.; Ma, Y. Robust unsupervised feature selection via matrix factorization. Neurocomputing 2017, 241, 115–127.
  28. Miao, J.; Yang, T.; Sun, L.; Fei, X.; Niu, L.; Shi, Y. Graph regularized locally linear embedding for unsupervised feature selection. Pattern Recognit. 2022, 122, 108299.
  29. Hou, C.; Nie, F.; Li, X.; Yi, D.; Wu, Y. Joint embedding learning and sparse regression: A framework for unsupervised feature selection. IEEE Trans. Cybern. 2013, 44, 793–804.
  30. Wang, S.; Pedrycz, W.; Zhu, Q.; Zhu, W. Subspace learning for unsupervised feature selection via matrix factorization. Pattern Recognit. 2015, 48, 10–19.
  31. Zhu, F.; Gao, J.; Yang, J.; Ye, N. Neighborhood linear discriminant analysis. Pattern Recognit. 2022, 123, 108422.
  32. Cai, D.; He, X.; Han, J.; Huang, T.S. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 1548–1560.
  33. Dreves, A.; Facchinei, F.; Kanzow, C.; Sagratella, S. On the solution of the KKT conditions of generalized Nash equilibrium problems. SIAM J. Optim. 2011, 21, 1082–1108.
  34. Zhou, W.; Wu, C.; Yi, Y.; Luo, G. Structure preserving non-negative feature self-representation for unsupervised feature selection. IEEE Access 2017, 5, 8792–8803.
  35. Yi, Y.; Zhou, W.; Liu, Q.; Luo, G.; Wang, J.; Fang, Y.; Zheng, C. Ordinal preserving matrix factorization for unsupervised feature selection. Signal Process. Image Commun. 2018, 67, 118–131.
  36. Yang, Y.; Shen, H.T.; Ma, Z.; Huang, Z.; Zhou, F. ℓ2,1-norm regularized discriminative feature selection for unsupervised learning. Int. Jt. Conf. Artif. Intell. 2011, 22, 1589–1594.
  37. Yi, Y.; Wang, J.; Zhou, W.; Fang, Y.; Kong, J.; Lu, Y. Joint graph optimization and projection learning for dimensionality reduction. Pattern Recognit. 2019, 92, 258–273.
  38. Martinez, A.; Benavente, R. The AR Face Database: CVC Technical Report; Universitat Autònoma de Barcelona: Bellaterra, Spain, 1998; Volume 24.
  39. Sim, T.; Baker, S.; Bsat, M. The CMU pose, illumination, and expression (PIE) database. In Proceedings of the Fifth IEEE International Conference on Automatic Face Gesture Recognition, Washington, DC, USA, 21 May 2002; pp. 53–58.
  40. Georghiades, A.S.; Belhumeur, P.N.; Kriegman, D.J. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660.
  41. Samaria, F.; Harter, A. Parameterisation of a Stochastic Model for Human Face Identification. In Proceedings of the 1994 IEEE Workshop on Applications of Computer Vision, Sarasota, FL, USA, 5–7 December 1994; pp. 138–142.
  42. Nene, S.; Nayar, S.; Murase, H. Columbia Object Image Library (COIL-20); Technical Report CUCS-005-96; Columbia University: New York, NY, USA, 1996.
  43. Dai, J.; Chen, Y.; Yi, Y.; Bao, J.; Wang, L.; Zhou, W.; Lei, G. Unsupervised feature selection with ordinal preserving self-representation. IEEE Access 2018, 6, 67446–67458.
  44. Tang, C.; Li, Z.; Wang, J.; Liu, X.; Zhang, W.; Zhu, E. Unified One-step Multi-view Spectral Clustering. IEEE Trans. Knowl. Data Eng. 2022, 1–11.
  45. Tang, C.; Zhu, X.; Liu, X.; Li, M.; Wang, P.; Zhang, C.; Wang, L. Learning a joint affinity graph for multiview subspace clustering. IEEE Trans. Multimed. 2018, 21, 1724–1736.
  46. Li, Z.; Tang, C.; Liu, X.; Zheng, X.; Zhang, W.; Zhu, E. Consensus graph learning for multi-view clustering. IEEE Trans. Multimed. 2021, 24, 2461–2472.
  47. Zhu, F.; Gao, J.; Xu, C.; Yang, J.; Tao, D. On selecting effective patterns for fast support vector regression training. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 3610–3622.
Figure 1. The structure flowchart of the proposed RMFRASL method.
Figure 2. The flowchart of the proposed RMFRASL method.
Figure 3. Selected images from the five databases: (a) AR; (b) CMU PIE; (c) Extended YaleB; (d) ORL; (e) COIL20.
Figure 4. The classification accuracy rate of different parameter values: (a) parameter α; (b) parameter β.
Figure 5. Accuracy rate of the proposed RMFRASL method on each database with different numbers of iterations.
Figure 6. Accuracy rate of each database in different dimensions: (a) AR; (b) CMU PIE; (c) Extended YaleB; (d) ORL; (e) COIL20.
Figure 7. The clustering accuracy (ACC) of different parameter values: (a) parameter α; (b) parameter β.
Figure 8. The convergence of the RMFRASL algorithm: (a) AR; (b) CMU PIE; (c) Extended YaleB; (d) ORL; (e) COIL20.
Table 1. Description of commonly used symbols.
Symbol | Description | Symbol | Description
$Z \in \mathbb{R}^{n \times d}$ | Sample matrix | d | Dimension of samples
$S \in \mathbb{R}^{d \times k}$ | Indicator matrix | n | Number of samples
$A \in \mathbb{R}^{k \times d}$ | Low-dimensional feature matrix | k | Number of selected features
$W \in \mathbb{R}^{n \times n}$ | Weight matrix | $\|\cdot\|_1$ | L1-norm
$I \in \mathbb{R}^{k \times k}$ | Identity matrix | $\|\cdot\|_2$ | L2-norm
$U \in \mathbb{R}^{n \times n}$ | Diagonal matrix | $\|\cdot\|_{2,1}$ | L21-norm
$C \in \mathbb{R}^{d \times d}$ | Diagonal matrix | $\mathrm{tr}(\cdot)$ | Matrix trace operation
Table 2. Computational complexity of each variable.
Matrix | Complexity | Matrix | Complexity
S | O(max(dn², nd²)) | U | O(dn²)
A | O(max(kn², kdn)) | C | O(kn²)
W | O(max(nd², dn²)) | – | –
Table 3. Running time (s) of each method on different databases.
Method | AR | CMU PIE | Extended YaleB | ORL | COIL20
RNEFS [20] | 11.57 | 15.43 | 13.89 | 1.698 | 4.187
USFS [21] | 36.12 | 30.09 | 18.27 | 18.29 | 8.524
CNAFS [22] | 11.81 | 14.19 | 11.48 | 4.292 | 5.382
NNSAFS [23] | 34.83 | 52.95 | 43.70 | 4.261 | 8.775
RSR [25] | 3.607 | 4.920 | 4.377 | 1.553 | 1.805
SPNFSR [34] | 15.21 | 18.24 | 16.24 | 7.291 | 8.924
RMFRASL | 2.250 | 2.687 | 2.493 | 1.190 | 1.504
Table 4. The summarization of the compared methods in this paper.
Method (Year) | Objective Function | L21-Norm
LSFS [19] (2005) | $\sum_{ij} (z_{ri} - z_{rj})^2 W_{ij} / \mathrm{Var}(z_r)$ | No
RNEFS [20] (2020) | $\|(I_n - A^T) Z S\|_1 + \frac{\alpha}{4} \|S^T S - I_m\|_2^2 + \mathrm{tr}(\beta S^T)$ | No
USFS [21] (2021) | $\sum_{i=1}^{n} \sum_{j=1}^{c} \|W^T z_i - o_j\|_2^2 y_{ij} + \alpha \|Z S - Y\|_F^2 + \gamma \|S\|_{2,1} + \beta \|Y\|_2^2$ | No
CNAFS [22] (2020) | $\|Z - Z S A\|_2^2 + \|C_n (Z^T W - Y_p)\|_2^2 + \lambda \|W\|_{2,1} + \alpha\, \mathrm{tr}(Y_p^T L Y_p) + \beta \sum_{ij} g_{ij} \log g_{ij} + \gamma\, \mathrm{tr}(A L A^T) + \varepsilon\, \mathrm{tr}(A^T Q A)$ | No
NNSAFS [23] (2022) | $\|Z^T W - Y\|_2^2 + \alpha_1 \|W\|_1 + \alpha_2\, \mathrm{tr}(W^T L_W W) + 2\lambda \big(\mathrm{tr}(Y^T L_S Y) + \beta \sum_{ij} g_{ij} \log g_{ij}\big)$ | No
RSR [25] (2015) | $\|Z - Z S\|_{2,1} + \lambda \|S\|_{2,1}$ | Yes
SPNFSR [34] (2017) | $\|Z - Z S\|_{2,1} + \alpha\, \mathrm{tr}(S^T M S) + \beta \|S\|_{2,1}$ | Yes
RMFRASL (2022) | $\|Z - Z S A\|_{2,1} + \alpha\, \mathrm{tr}(S^T Z^T L Z S) + \beta \|Z^T - Z^T W\|_{2,1}$ | Yes
Table 5. Details of five databases.
Database | Image Size | Number of Classes | Each Class Size | Tr | Te
AR | 32 × 32 | 100 | 14 | 7 | 7
CMU PIE | 32 × 32 | 68 | 24 | 12 | 12
Extended YaleB | 32 × 32 | 38 | 64 | 20 | 44
ORL | 32 × 32 | 40 | 10 | 7 | 3
COIL20 | 32 × 32 | 20 | 72 | 20 | 52
Table 6. The optimal combination of parameters and dimension on the classification task.
Databases | Accuracy Rate | Parameter {α, β} | Dimension r
AR | 69.77 ± 0.79 | {0.1, 0.01} | 260
CMU PIE | 89.17 ± 0.80 | {0.1, 1} | 360
Extended YaleB | 64.98 ± 0.75 | {0.1, 0.01} | 360
ORL | 93.92 ± 1.56 | {10,000, 0.01} | 460
COIL20 | 95.93 ± 1.17 | {1000, 0.1} | 160
Table 7. The best accuracy rate and STD of different algorithms on the five databases (the number of selected features is given in parentheses).
Method | AR | CMU PIE | Extended YaleB | ORL | COIL20
LSFS | 61.90 ± 2.35 (560) | 83.69 ± 1.36 (560) | 61.21 ± 1.56 (560) | 91.42 ± 3.07 (560) | 90.34 ± 1.48 (560)
RNEFS | 63.56 ± 1.48 (460) | 85.54 ± 1.29 (560) | 62.95 ± 0.75 (260) | 92.33 ± 2.00 (460) | 91.68 ± 0.93 (560)
USFS | 63.71 ± 2.71 (460) | 86.90 ± 1.17 (560) | 63.57 ± 1.38 (560) | 92.67 ± 1.93 (560) | 92.66 ± 1.39 (60)
CNAFS | 63.83 ± 1.66 (560) | 83.95 ± 1.56 (560) | 63.92 ± 1.12 (560) | 93.13 ± 1.85 (560) | 94.29 ± 1.22 (160)
NNSAFS | 64.90 ± 2.12 (460) | 87.29 ± 0.64 (560) | 63.58 ± 1.39 (360) | 92.17 ± 1.81 (560) | 93.56 ± 1.32 (160)
RSR | 67.79 ± 2.03 (260) | 88.45 ± 0.90 (360) | 64.28 ± 1.10 (360) | 93.33 ± 1.47 (560) | 94.88 ± 1.15 (560)
SPNFSR | 68.50 ± 0.91 (260) | 88.22 ± 1.02 (360) | 64.02 ± 1.95 (360) | 92.83 ± 1.81 (560) | 94.21 ± 1.42 (460)
RMFRASL | 69.77 ± 0.97 (260) | 89.17 ± 0.80 (360) | 64.98 ± 0.75 (360) | 93.92 ± 1.56 (460) | 95.93 ± 1.17 (160)
Table 8. The optimal combination of parameters and dimension on the clustering task.
Databases | ACC% | NMI% | Parameter {α, β} | Dimension r
AR | 34.33 ± 1.33 | 65.56 ± 1.04 | {0.01, 1} | 380
CMU PIE | 33.03 ± 0.90 | 56.66 ± 0.97 | {0.01, 1} | 480
Extended YaleB | 34.42 ± 1.16 | 57.26 ± 1.87 | {0.1, 0.01} | 350
ORL | 61.75 ± 2.42 | 78.78 ± 1.24 | {0.01, 0.1} | 270
COIL20 | 59.90 ± 1.56 | 72.56 ± 1.07 | {0.01, 1} | 240
Table 9. The best ACC% and STD of different algorithms on the five databases.
Method | AR | CMU PIE | Extended YaleB | ORL | COIL20
LSFS | 29.89 ± 1.37 | 26.88 ± 0.94 | 32.82 ± 0.93 | 55.57 ± 2.65 | 53.42 ± 2.77
RNEFS | 30.37 ± 1.34 | 28.13 ± 0.59 | 33.99 ± 0.60 | 59.79 ± 2.22 | 55.67 ± 1.99
USFS | 31.00 ± 1.17 | 28.50 ± 1.59 | 33.39 ± 0.90 | 57.79 ± 2.15 | 58.73 ± 3.66
CNAFS | 31.10 ± 1.67 | 30.38 ± 1.38 | 34.33 ± 0.77 | 57.53 ± 0.00 | 58.27 ± 3.67
NNSAFS | 31.10 ± 1.70 | 31.32 ± 1.43 | 33.76 ± 0.62 | 57.79 ± 5.01 | 58.55 ± 3.25
RSR | 32.91 ± 1.22 | 32.17 ± 1.42 | 33.52 ± 0.80 | 58.79 ± 3.52 | 55.77 ± 1.65
SPNFSR | 33.13 ± 0.80 | 32.07 ± 1.01 | 33.57 ± 0.91 | 59.57 ± 2.85 | 57.10 ± 3.46
RMFRASL | 34.33 ± 1.33 | 33.03 ± 0.90 | 34.42 ± 1.16 | 61.75 ± 2.42 | 59.90 ± 1.56
Table 10. The best NMI% and STD of different algorithms on the five databases.
Method | AR | CMU PIE | Extended YaleB | ORL | COIL20
LSFS | 63.00 ± 0.75 | 52.81 ± 0.51 | 55.13 ± 1.40 | 75.96 ± 1.18 | 64.92 ± 1.71
RNEFS | 62.87 ± 0.77 | 53.64 ± 0.55 | 55.51 ± 0.82 | 78.13 ± 0.82 | 65.55 ± 1.42
USFS | 63.04 ± 1.14 | 54.01 ± 1.10 | 56.10 ± 0.91 | 76.71 ± 1.61 | 67.92 ± 1.69
CNAFS | 63.03 ± 1.01 | 54.63 ± 0.82 | 56.34 ± 1.37 | 75.90 ± 0.00 | 68.66 ± 2.15
NNSAFS | 63.99 ± 1.17 | 53.78 ± 1.25 | 56.26 ± 1.38 | 72.26 ± 2.43 | 69.19 ± 2.38
RSR | 63.53 ± 0.91 | 54.98 ± 1.06 | 56.66 ± 1.69 | 76.09 ± 1.47 | 66.60 ± 1.91
SPNFSR | 64.68 ± 0.66 | 55.18 ± 0.86 | 57.18 ± 1.05 | 78.68 ± 1.41 | 70.89 ± 1.53
RMFRASL | 65.56 ± 1.04 | 56.66 ± 0.97 | 57.26 ± 1.87 | 78.78 ± 1.24 | 72.56 ± 1.07
Table 11. The number of iterations of different algorithms on the five databases.
MethodARCMU PIEExtended YaleBORLCOIL20
RNEFS1523141215
USFS1330171520
CNAFS4095103982430
NNSAFS15455524
SPNFSR396410412462490
RMFRASL372120811

