Article

A Bi-Directional Two-Dimensional Deep Subspace Learning Network with Sparse Representation for Object Recognition

Xiaoxue Li, Weijia Feng, Xiaofeng Wang, Jia Guo, Yuanxu Chen, Yumeng Yang, Chao Wang, Xinyu Zuo and Manlu Xu
1 College of Computer and Information Engineering, Tianjin Normal University, Tianjin 300387, China
2 Postdoctoral Innovation Practice Base, Huafa Industrial Share Co., Ltd., Zhuhai 519000, China
3 Tianjin Key Laboratory for Advanced Mechatronic System Design and Intelligent Control, Tianjin University of Technology, Tianjin 300384, China
4 Ping An Technology, Shenzhen 518000, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(18), 3745; https://doi.org/10.3390/electronics12183745
Submission received: 25 July 2023 / Revised: 23 August 2023 / Accepted: 29 August 2023 / Published: 5 September 2023

Abstract

A principal component analysis network (PCANet), as one of the representative deep subspace learning networks, utilizes principal component analysis (PCA) to learn filters that represent the dominant structural features of objects. However, the filters used in PCANet are linear combinations of all the original variables and contain complex and redundant principal components, which hinders the interpretability of the results. To address this problem, we introduce sparse constraints into a subspace learning network and propose three sparse bi-directional two-dimensional PCANet algorithms: sparse row 2D²PCANet (SR2D²PCANet), sparse column 2D²PCANet (SC2D²PCANet), and sparse row–column 2D²PCANet (SRC2D²PCANet). These algorithms perform sparse operations on the projection matrices in the row, column, and row–column directions, respectively. Sparsity is achieved by utilizing the elastic net to shrink the loadings of the non-primary elements in the principal components to zero and to reduce the redundancy in the projection matrices, thus improving the learning efficiency of the networks. Finally, a variety of experimental results on the ORL, COIL-100, NEC, and AR datasets demonstrate that the proposed algorithms learn filters with more discriminative information and outperform other subspace learning networks and traditional deep learning networks in terms of classification and run-time performance, especially for small-sample learning.

1. Introduction

Deep learning [1] exploits network models with multiple hidden layers to train on large amounts of data and learn deep features for object detection or recognition [2]. As a classic deep learning model, convolutional neural networks (CNNs) [3], with their convolutional and pooling layers, have achieved unprecedented accuracy in numerous object recognition tasks. Nevertheless, when training CNN models, researchers need to determine many network parameters in advance, and the large number of convolutional operations incurs significant computational cost. In addition, CNNs have complex structures, converge slowly, are prone to overfitting, and require high-performance hardware for their huge computations [4]. To address these problems, several researchers have proposed simpler deep subspace learning models [5,6,7,8], including the principal component analysis network (PCANet) [9], linear discriminant analysis network (LDANet) [9], independent component analysis network (ICANet) [10], canonical correlation analysis network (CCANet) [11], and local binary pattern network (LBPNet) [12].
PCANet is one of the most representative deep subspace learning networks, with a simpler structure than CNNs. Its filters are learned by the PCA algorithm without parameter tuning, and binary hashing and block histograms, placed after the two successive filter-learning layers, are deployed for indexing and pooling. This model avoids the overfitting problem caused by small-sample datasets and naturally requires less computational cost, thus demonstrating the advantages of deep subspace learning networks. PCANet has been widely applied in face recognition [13], power load characteristics classification [14], biometric identification [15], coal and gangue identification [16], image fusion [17], cell tracking [18], and texture representation [19].
In recent years, various improvements have been proposed for PCANet, either by adding new processing layers to the original PCANet structure or by replacing PCA with other subspace learning methods.
Several studies have improved recognition accuracy by modifying the PCANet network structure. Liu et al. [20] proposed an enhanced PCANet (EPCANet) that adds a spatial pooling layer between the first and second layers of PCANet. Low et al. [13] presented a stacking PCANet+ that operates on each feature map using mean pooling units. Wang et al. [21] added another PCA convolution in the second stage to extract features by considering the global structure. Wang et al. [22] put forward an MMPCANet that obtains more image feature information by using spatial pyramids as the feature pooling layer. Duan et al. [23] provided a multi-scale stack sparse PCANet (MS-SSPCANet) that introduces sparsity, multi-scale filters, and multi-scale pooling layers to optimize the PCANet structure; this network introduces sparsity, but at the cost of a complicated structure. Sun et al. [5] employed feature pooling in a deep subspace model and added a rank-based average pooling layer between the subspace mapping layers. However, increasing the number of layers or complicating the data processing inevitably deviates from the original purpose of PCANet, which is to obtain meaningful features effectively with a simple network structure.
Another way to improve PCANet is to replace PCA with other subspace learning methods to learn the filters in the convolution stage. To extract revocable palmprint features, Haouam et al. [24] proposed a discrete cosine transform network (DCTNet) using the discrete cosine transform (DCT) as the filters. Feng et al. [25] presented a discriminant locality alignment network (DLANet) for wild scene classification, where the DLA layer maximizes the margin between inter-class patches and minimizes the distance between intra-class patches within a local area. Mustafa et al. [26] suggested a multilevel dense network (MLDNet) for multi-focus image fusion, where feature extraction, feature fusion, and reconstruction are learned within the same network to provide an end-to-end solution. Zeng et al. [27] proposed a quaternion PCANet (QPCANet) that extends PCANet using quaternion theory to ensure greater intra-class invariance and improve the recognition rate for color images. Yang et al. [11] put forward a canonical correlation analysis network (CCANet) that uses a two-view feature representation and learns two-view multilevel filters by the canonical correlation analysis (CCA) method. Fan et al. [28] combined the second-order statistical pooling method with the shallow network PCANet and proposed another network model, called PCANet-II, to obtain more discriminative information. Sun et al. [29] replaced the PCA in PCANet with a combination of Fisher linear discriminant analysis (LDA) and PCA and developed a Fisher PCANet (FPCANet). Yu and Wu [30] further extended PCANet to 2DPCANet and employed 2DPCA [31] to learn the filter banks. To enhance robustness against outliers, Li et al. [32] put forward an L1-2D²PCANet, replacing the PCA algorithm with L1-2DPCA. To capture nonlinear structures within data and more representational image features, Sun et al. [33] proposed a randomized nonlinear two-dimensional PCANet (RN2DPCANet), in which an approximate method based on a Gaussian kernel maps the original images to a random feature space. The above methods improve filter learning in different ways, but the principal components are still linear combinations of all the original variables, which leads to complex and redundant principal components and hinders the interpretability of the results.
Sparse representation of the linear combinations forming the principal components is therefore particularly necessary. Zou et al. [34] proposed a sparse principal component analysis (SpPCA) that transforms the solution of the principal components into a regression problem; the algorithm obtains reconstructed principal components with sparse loadings by applying an elastic net [35]. As a result, sparse principal component analysis generally provides higher data interpretability as well as less redundancy. Dutta et al. [36] presented an SpPCANet that utilizes SpPCA to learn multistage filter banks in the convolution stage. Although the SpPCA used in SpPCANet makes the principal components sparse, it still concatenates two-dimensional image matrices into one-dimensional vectors, which leads to time-consuming computation and destroys the intrinsic structure of images.
In this paper, we combine SpPCA and 2D²PCANet to propose three new models, called sparse row 2D²PCANet (SR2D²PCANet), sparse column 2D²PCANet (SC2D²PCANet), and sparse row–column 2D²PCANet (SRC2D²PCANet). To solve the problems of redundancy and unfavorable interpretability in the principal components, we employ the elastic net to sparsify the principal components. Since the PCA algorithm is computationally expensive and discards the 2D structure, 2D²PCA is used in filter learning to reduce computation and preserve the intrinsic features of images. After two stages of filter convolution, we apply binary hashing and block-wise histograms to encode the features. The three proposed algorithms further sparsify the extracted principal components while retaining the advantages of deep subspace learning networks, thus reducing redundancy and improving the learning efficiency of the network. To evaluate the proposed methods, we test them on four publicly available datasets and compare them with other subspace networks and traditional convolutional neural networks.
The remainder of the paper is organized as follows: Section 2 provides related work. Section 3 explains SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet. The experimental results and analysis are presented in Section 4, and Section 5 concludes the paper.

2. Related Work

Some of the deep subspace learning models, namely PCANet, 2D²PCANet, L1-2D²PCANet, and SpPCANet, will be explained in this section.

2.1. PCANet

PCANet is a simple deep subspace learning model that learns filters by applying PCA to patches extracted from the input of each layer. Generally, PCANet consists of two PCA stages followed by a hashing and histogram stage.
Suppose that there exists a dataset $\{P_i\}_{i=1}^{N}$ with $N$ training samples of size $m \times n$. The size of the sampling patch in each stage is $k \times k$, the number of filters in the first stage is $L_1$, and the number of filters in the second stage is $L_2$. For the $i$-th image $P_i$ of the dataset, all overlapping patches are collected and the image is represented as:

$$X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,j}, \ldots, x_{i,\tilde{m}\tilde{n}}] \in \mathbb{R}^{k^2 \times \tilde{m}\tilde{n}} \tag{1}$$

where $x_{i,j} \in \mathbb{R}^{k^2 \times 1}$ denotes the $j$-th vectorized patch, $\tilde{m} = m - k + 1$, and $\tilde{n} = n - k + 1$.
Then, after these patches are centralized, this image matrix can be expressed as $\bar{X}_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,j}, \ldots, \bar{x}_{i,\tilde{m}\tilde{n}}]$. The centralized training dataset is denoted as:

$$X = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_N] \in \mathbb{R}^{k^2 \times N\tilde{m}\tilde{n}} \tag{2}$$
PCA satisfies the objective of minimizing the reconstruction error within a family of orthonormal filters:

$$\min_{V \in \mathbb{R}^{k^2 \times L_1}} \|X - VV^{T}X\|_F^2, \quad \text{s.t.}\; V^{T}V = I \tag{3}$$

where $I$ is an identity matrix of size $L_1 \times L_1$, and $V$ represents the projection matrix composed of the $L_1$ principal eigenvectors.
Suppose that $q_l(XX^T)$ is the $l$-th principal eigenvector of $XX^T$; then the projection matrix $V_l^{s1}$ in the first stage (denoted by the superscript $s1$) is described by:

$$V_l^{s1} = \mathrm{mat}_{k,k}\!\left(q_l(XX^T)\right) \in \mathbb{R}^{k \times k}, \quad l = 1, 2, \ldots, L_1 \tag{4}$$

where $\mathrm{mat}_{k,k}(v)$ represents the transformation of a vector $v \in \mathbb{R}^{k^2 \times 1}$ into a matrix $V_l^{s1} \in \mathbb{R}^{k \times k}$.
By using the PCA filters to convolve with the original training set, the outputs of the first stage become:

$$O_{i,l}^{s1} = P_i * V_l^{s1}, \quad i = 1, 2, \ldots, N;\; l = 1, 2, \ldots, L_1 \tag{5}$$

where $*$ denotes 2D convolution, and the boundary of $P_i$ is zero-padded before convolving with $V_l^{s1}$.
Similar to those in the first stage, the PCA filters $V_\ell^{s2}$ in the second stage (denoted by the superscript $s2$) can be obtained by using the $O_{i,l}^{s1}$ as the input of the second stage, where $\ell = 1, 2, \ldots, L_2$. For each input $O_{i,l}^{s1}$ of the second stage, we have $L_2$ outputs:

$$O_{i,l,\ell}^{s2} = O_{i,l}^{s1} * V_\ell^{s2}, \quad \ell = 1, 2, \ldots, L_2 \tag{6}$$

The $L_2$ real-valued outputs associated with each $O_{i,l}^{s1}$ are binarized, and the resultant decimal-coded image is:

$$\Pi_i^l = \sum_{\ell=1}^{L_2} 2^{\ell-1} H\!\left(O_{i,l,\ell}^{s2}\right), \quad l = 1, 2, \ldots, L_1 \tag{7}$$

where $H(\cdot)$ is a Heaviside step function, and each pixel value ranges from $0$ to $2^{L_2} - 1$.
For each of the $L_1$ images $\Pi_i^l$ ($l = 1, 2, \ldots, L_1$), we partition it into $B$ blocks. Then, we compute the histogram (with $2^{L_2}$ bins) of the decimal values in each block and concatenate all $B$ histograms into one vector, denoted $\mathrm{Bhist}(\Pi_i^l)$. After this encoding process, the input image $P_i$ is converted to the vector $\varphi_i$ ($i = 1, 2, \ldots, N$), which is composed of block histograms and represented as:

$$\varphi_i = \left[\mathrm{Bhist}(\Pi_i^1), \ldots, \mathrm{Bhist}(\Pi_i^{L_1})\right] \in \mathbb{R}^{(2^{L_2})L_1 B} \tag{8}$$
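To make this pipeline concrete, the following Python/NumPy sketch implements first-stage PCANet filter learning as described by (1)–(4). It is an illustrative reconstruction rather than the authors' code; the default patch size and filter count simply mirror the settings used later in the experiments.

```python
import numpy as np

def pcanet_filters(images, k=5, L1=4):
    """First-stage PCANet filters, following Eqs. (1)-(4):
    vectorize all overlapping k x k patches, centre each patch,
    and reshape the leading eigenvectors of X X^T into filters."""
    patches = []
    for P in images:
        m, n = P.shape
        for i in range(m - k + 1):
            for j in range(n - k + 1):
                x = P[i:i + k, j:j + k].reshape(-1)   # x_{i,j} in R^{k^2}
                patches.append(x - x.mean())          # patch-wise centring
    X = np.stack(patches, axis=1)                     # k^2 x (N m~ n~) data matrix
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)        # k^2 x k^2 scatter matrix
    order = np.argsort(eigvals)[::-1][:L1]            # L1 principal eigenvectors
    return [eigvecs[:, l].reshape(k, k) for l in order]   # mat_{k,k}(q_l)
```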

2.2. 2D²PCANet

2D²PCANet has the same network architecture as PCANet but utilizes 2D²PCA instead of PCA to learn the filters. For the $i$-th image, $P_i$ is represented as a data matrix, and all (overlapping) patches are collected as:

$$A_i = (a_{i,1}, a_{i,2}, \ldots, a_{i,j}, \ldots, a_{i,\tilde{m}\tilde{n}}) \in \mathbb{R}^{k \times k\tilde{m}\tilde{n}} \tag{9}$$

where $a_{i,j} \in \mathbb{R}^{k \times k}$ denotes the $j$-th patch in $P_i$, $\tilde{m} = m - k + 1$, and $\tilde{n} = n - k + 1$.
After removing the average of each patch, $\bar{A}_i = (\bar{a}_{i,1}, \bar{a}_{i,2}, \ldots, \bar{a}_{i,j}, \ldots, \bar{a}_{i,\tilde{m}\tilde{n}})$. For all training images, the normalized sample matrices are concatenated, and the dataset is expressed as:

$$A = [\bar{A}_1, \ldots, \bar{A}_N] = (\bar{a}_{1,1}, \bar{a}_{1,2}, \ldots, \bar{a}_{1,\tilde{m}\tilde{n}}, \ldots, \bar{a}_{N,1}, \bar{a}_{N,2}, \ldots, \bar{a}_{N,\tilde{m}\tilde{n}}) = (a_1, a_2, \ldots, a_{N\tilde{m}\tilde{n}}) \in \mathbb{R}^{k \times kN\tilde{m}\tilde{n}} \tag{10}$$
The optimization objective of 2D²PCANet is defined as:

$$\min_{V \in \mathbb{R}^{k \times L_1}} \|A - VV^{T}A\|_F^2, \quad \text{s.t.}\; V^{T}V = I \tag{11}$$

where $V$ represents the projection matrix.
In the 2D²PCANet model, 2D²PCA is employed to learn the filters of both stages, and the binary hashing and block-wise histogram stage is exploited to generate the local features.
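As a point of comparison with the vectorized PCA step above, the following sketch (an illustration under stated assumptions, not the authors' code) learns bi-directional filters directly from $k \times k$ patch matrices: the row and column covariances anticipate Eqs. (14) and (15) below, and each filter is the outer product of paired column- and row-direction eigenvectors, as used later in Eq. (17) before any sparsification.

```python
import numpy as np

def twod2pca_filters(patches, L1=4):
    """Bi-directional 2D^2 PCA filters from mean-removed k x k patches
    (array of shape (num_patches, k, k)). Row/column covariances follow
    Eqs. (14)-(15); each filter is an outer product C_l R_l^T."""
    G_r = np.mean([a.T @ a for a in patches], axis=0)  # row-direction covariance
    G_c = np.mean([a @ a.T for a in patches], axis=0)  # column-direction covariance
    R = np.linalg.eigh(G_r)[1][:, ::-1][:, :L1]        # leading eigenvectors of G_r
    C = np.linalg.eigh(G_c)[1][:, ::-1][:, :L1]        # leading eigenvectors of G_c
    return [np.outer(C[:, l], R[:, l]) for l in range(L1)]
```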

2.3. L1-2D²PCANet

To improve robustness, the L1-2D²PCANet algorithm was proposed, which replaces the squared F-norm in 2D²PCANet with the $l_1$-norm. The data input method is the same as that of 2D²PCANet. The objective function of L1-2D²PCANet is defined as:

$$\min_{V \in \mathbb{R}^{k \times L_1}} \|A - VV^{T}A\|_{l_1}, \quad \text{s.t.}\; V^{T}V = I \tag{12}$$

where $\|\cdot\|_{l_1}$ represents the $l_1$-norm.
To satisfy the minimum reconstruction error in the objective function, a polarity function is introduced in the iteration to find the optimal solution. L1-2D²PCANet is robust in handling data with noise or outliers.

2.4. SpPCANet

SpPCANet uses SpPCA to learn the filters in the network, and the remaining architecture is the same as that of PCANet. SpPCA utilizes the elastic net, which combines an $l_1$-norm penalty and an $l_2$-norm penalty, to obtain sparse loadings of the principal components. The $l_1$-norm penalty tends to select only one predictor variable from a group of correlated variables, while the $l_2$-norm penalty tends to select all of them. After obtaining the projection matrix in (4), the objective function of the elastic net is:

$$\hat{\beta} = \arg\min_{\beta} \|Y - V\beta\|^2 + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2 \tag{13}$$

where $\beta$ is an $L_1$-dimensional parameter representing the loading of the projection matrix; $V \in \mathbb{R}^{k^2 \times L_1}$ denotes the projection matrix; $Y$ is the corresponding observation; and $\lambda_1$ and $\lambda_2$ are non-negative penalty parameters that control the sparsity of $\beta$.
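In practice, (13) can be solved with any off-the-shelf elastic-net solver. The sketch below uses scikit-learn as one possible implementation; note that scikit-learn scales its data term by $1/(2n)$, so the lam1/lam2 arguments map onto $\lambda_1$ and $\lambda_2$ only after the rescaling shown in the comments.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def sparse_loading(V, y, lam1=1e-3, lam2=1e-2):
    """Solve Eq. (13) for one sparse loading vector beta by regressing
    the observation y on the projection matrix V with l1/l2 penalties."""
    n = V.shape[0]
    # scikit-learn minimizes ||y - Vb||^2/(2n) + a*r*||b||_1 + a*(1-r)/2*||b||^2,
    # so matching Eq. (13) requires the rescaled alpha and l1_ratio below.
    alpha = (lam1 + 2 * lam2) / (2 * n)
    l1_ratio = lam1 / (lam1 + 2 * lam2)
    model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                       fit_intercept=False, max_iter=10000)
    model.fit(V, y)
    return model.coef_   # sparse loading beta
```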

2.5. Summary

These four representative deep subspace networks employ different methods to learn filters. The 2D²PCA method preserves the 2D features of images, and SpPCA further provides sparse principal components; both improve recognition efficiency. A detailed summary of the four methods is listed in Table 1.

3. Proposed Sparse 2D²PCANet Algorithms

In this section, we present three algorithms: SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet. These algorithms utilize 2D²PCA to compute the filters without matrix-to-vector conversion. In addition, sparse operations are performed on the extracted principal components in the row and/or column directions, thus facilitating the interpretation of the discriminant results.

3.1. SR2D²PCANet

The proposed SR2D²PCANet method is based on the 2D²PCANet model described in Section 2. SR2D²PCANet performs sparse operations only on the principal components in the row direction, while those in the column direction remain unchanged. Since all key elements of the input layer have been outlined above, this section directly discusses the convolutional layer. The architecture of SR2D²PCANet is shown in Figure 1.

3.1.1. SR2D²PCANet Convolutional Layer

After the patch sampling and averaging operations, all training images can be represented as in (10). Then, the covariance matrices in the row and column directions are denoted as:

$$G_r = \frac{1}{N\tilde{m}\tilde{n}} \sum_{i=1}^{N\tilde{m}\tilde{n}} (a_i - \bar{a})^{T}(a_i - \bar{a}) \tag{14}$$

$$G_c = \frac{1}{N\tilde{m}\tilde{n}} \sum_{i=1}^{N\tilde{m}\tilde{n}} (a_i - \bar{a})(a_i - \bar{a})^{T} \tag{15}$$

respectively, where $\bar{a}$ denotes the average data matrix of all patches.
For the sake of distinction, suppose $R \in \mathbb{R}^{k \times L_1}$ is the projection matrix in the row direction, satisfying $k > L_1$; the matrix $R = [R_1, R_2, \ldots, R_{L_1}]$ consists of the $L_1$ principal eigenvectors of $G_r$. Similarly, suppose $C \in \mathbb{R}^{k \times L_1}$ is the projection matrix in the column direction; then $C = [C_1, C_2, \ldots, C_{L_1}]$ contains the $L_1$ principal eigenvectors of $G_c$. In this way, we obtain projection matrices in both the row and column directions and then apply the elastic net to achieve the sparse representation.
For the projection matrix $R = [R_1, R_2, \ldots, R_{L_1}]$ in the row direction, the objective function is:

$$\beta_l = \arg\min_{\beta_l} (R_l - \beta_l)^{T} G_r (R_l - \beta_l) + \lambda_2\|\beta_l\|_2^2 + \lambda_{1,l}\|\beta_l\|_1, \quad l = 1, 2, \ldots, L_1 \tag{16}$$

where $\beta_l$ is the loading of $R_l$, $\lambda_2$ is the penalty parameter based on the $l_2$-norm, and $\lambda_{1,l}$ denotes the penalty parameter based on the $l_1$-norm.
The larger $\lambda_{1,l}$ is, the sparser the obtained $\beta_l$ will be. By choosing appropriate parameters $\lambda_2$ and $\lambda_{1,l}$, a solution satisfying the desired sparsity can be achieved. Solving the elastic net problem (16) for the $L_1$ principal components yields $B = [\beta_1, \beta_2, \ldots, \beta_{L_1}]$. The matrices $R$ and $B$ are then solved jointly by alternating iterations. The numerical solution of the SR2D²PCA method is listed in Algorithm 1.
Algorithm 1 SR2D²PCA
Input: $R$, $B$, $\lambda_2$, $\lambda_{1,l}$ ($l = 1, 2, \ldots, L_1$).
Initialize: $t = 0$, $R(t) = [R_1, R_2, \ldots, R_{L_1}]$, $B(t) = [\beta_1, \beta_2, \ldots, \beta_{L_1}]$, $\varepsilon = 1 \times 10^{-4}$.
Output: $R^*$.
1: $r \leftarrow 1$
2: $s \leftarrow 1$
3: while $r \geq \varepsilon$ or $s \geq \varepsilon$ do
4:  With $R(t)$ fixed, solve the $L_1$ elastic net problems using (16) and obtain $B(t+1)$
5:  With $B(t+1)$ fixed, compute the SVD of $G_r B(t+1) = U \Sigma V^{T}$
6:  $R(t+1) \leftarrow U V^{T}$
7:  $r \leftarrow \|R(t+1) - R(t)\|_2$
8:  $s \leftarrow \|B(t+1) - B(t)\|_2$
9:  $t \leftarrow t + 1$
10: end while
11: $\hat{\beta}_l \leftarrow \beta_l / \|\beta_l\|$, $l = 1, 2, \ldots, L_1$
12: $R \leftarrow R(t)$
13: $B \leftarrow B(t)$
14: return $R^* = R(B)^{T}$
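A compact NumPy/scikit-learn rendering of Algorithm 1 is sketched below, under stated assumptions. Since $(R_l - \beta)^{T} G_r (R_l - \beta) = \|S R_l - S\beta\|^2$ with $S = G_r^{1/2}$, each step-4 subproblem in (16) reduces to an ordinary elastic-net regression; the penalty arguments are passed with scikit-learn's own scaling, which differs from (16), so they are indicative only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def sr2d2pca(G_r, L1=4, lam1=1e-3, lam2=1e-2, eps=1e-4, max_iter=100):
    """Alternating solver sketch for Algorithm 1 (SR2D^2 PCA)."""
    w, Q = np.linalg.eigh(G_r)
    S = Q @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ Q.T  # symmetric sqrt of G_r
    R = Q[:, ::-1][:, :L1]          # initialize with the leading eigenvectors
    B = R.copy()
    for _ in range(max_iter):
        # Step 4: with R fixed, solve the L1 elastic-net problems for B
        B_new = np.empty_like(B)
        for l in range(L1):
            en = ElasticNet(alpha=lam1 + lam2, l1_ratio=lam1 / (lam1 + lam2),
                            fit_intercept=False, max_iter=10000)
            en.fit(S, S @ R[:, l])
            B_new[:, l] = en.coef_
        # Steps 5-6: with B fixed, update R from the SVD of G_r B
        U, _, Vt = np.linalg.svd(G_r @ B_new, full_matrices=False)
        R_new = U @ Vt
        r = np.linalg.norm(R_new - R)    # step 7
        s = np.linalg.norm(B_new - B)    # step 8
        R, B = R_new, B_new
        if r < eps and s < eps:          # loop exits once both have converged
            break
    B /= np.maximum(np.linalg.norm(B, axis=0), 1e-12)  # step 11: normalize loadings
    return R @ B.T                                      # step 14: R* = R B^T
```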
After this processing, the projection matrix $R^*$ in the row direction is sparse, while the projection matrix $C$ in the column direction remains unchanged. In this way, the filters of SR2D²PCANet in the first stage are obtained:

$$W_l^{1} = C_l \cdot (R_l^{*})^{T} \in \mathbb{R}^{k \times k}, \quad l = 1, 2, \ldots, L_1 \tag{17}$$

By using the 2D²PCA filters to convolve with the training image $P_i$, the $l$-th filter output in the first stage is obtained by:

$$P_i^{l} = P_i * W_l^{1} \in \mathbb{R}^{m \times n}, \quad i = 1, 2, \ldots, N \tag{18}$$

where $*$ denotes the two-dimensional convolution, and the boundary of $P_i$ is zero-padded before being convolved with $W_l^{1}$.
Each image $P_i$ yields $L_1$ output maps of size $m \times n$; thus, all images yield $NL_1$ output maps of size $m \times n$ as the input of the second stage. By simply repeating the first stage, we can obtain the filters of the second stage, $W_\ell^{2}$ ($\ell = 1, 2, \ldots, L_2$), and the output of the second stage is:

$$O_i^{l} = \left\{P_i^{l} * W_\ell^{2}\right\}_{\ell=1}^{L_2} \tag{19}$$

where $i = 1, 2, \ldots, N$ and $l = 1, 2, \ldots, L_1$.

3.1.2. SR2D²PCANet Output Layer

The $L_2$ real-valued outputs in $O_i^l$ obtained in the second stage are binarized, and the resulting decimal-coded image is:

$$\Gamma_i^{l} = \sum_{\ell=1}^{L_2} 2^{\ell-1} H\!\left(P_i^{l} * W_\ell^{2}\right), \quad i = 1, 2, \ldots, N;\; l = 1, 2, \ldots, L_1 \tag{20}$$

where $H(\cdot)$ is a Heaviside step function, the value of which is one for positive entries and zero otherwise.
Each pixel value ranges from $0$ to $2^{L_2} - 1$. Then, the $L_1$ images $\Gamma_i^l$ ($l = 1, 2, \ldots, L_1$) are segmented into $B$ blocks, and the histograms (with $2^{L_2}$ bins) of the decimal values in each block are computed. All $B$ histograms can then be concatenated into one vector, named $\mathrm{Bhist}(\Gamma_i^l)$. After this encoding process, the input image $P_i$ is converted to a vector $f_i$ ($i = 1, 2, \ldots, N$), which is the set of block-wise histograms:

$$f_i = \left[\mathrm{Bhist}(\Gamma_i^1), \ldots, \mathrm{Bhist}(\Gamma_i^{L_1})\right] \in \mathbb{R}^{(2^{L_2})L_1 B} \tag{21}$$
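The output layer can be sketched in a few lines of NumPy. The version below uses non-overlapping blocks for brevity, whereas the experiments in Section 4 use overlapping blocks with a 0.5 overlap ratio; it is an illustrative rendering of (20) and (21), not the authors' code.

```python
import numpy as np

def hash_and_histogram(O, L2, block=(5, 5)):
    """Binary hashing and block histograms for one first-stage map.
    `O` holds the L2 second-stage outputs, shape (L2, m, n)."""
    bins = 2 ** L2
    # Eq. (20): Gamma = sum_l 2^(l-1) H(O_l), pixel values in [0, 2^L2 - 1]
    weights = 2 ** np.arange(L2)
    Gamma = np.tensordot(weights, (O > 0).astype(np.int64), axes=1)
    bh, bw = block
    hists = []
    for i in range(0, Gamma.shape[0] - bh + 1, bh):
        for j in range(0, Gamma.shape[1] - bw + 1, bw):
            h, _ = np.histogram(Gamma[i:i + bh, j:j + bw],
                                bins=bins, range=(0, bins))
            hists.append(h)
    return np.concatenate(hists)   # Bhist(Gamma), one slice of Eq. (21)
```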

3.2. SC2D²PCANet

Different from SR2D²PCANet, this algorithm carries out the sparse operation only on the principal components in the column direction. The $L_1$ principal eigenvectors already obtained from (14) and (15) are $R = [R_1, R_2, \ldots, R_{L_1}] \in \mathbb{R}^{k \times L_1}$ in the row direction and $C = [C_1, C_2, \ldots, C_{L_1}]$ in the column direction, respectively. We can obtain the loadings of $C = [C_1, C_2, \ldots, C_{L_1}]$ from:

$$\alpha_l = \arg\min_{\alpha_l} (C_l - \alpha_l)^{T} G_c (C_l - \alpha_l) + \lambda\|\alpha_l\|_2^2 + \lambda_{1,l}\|\alpha_l\|_1 \tag{22}$$

where $l = 1, 2, \ldots, L_1$; $\alpha_l$ is the loading of $C_l$; $\lambda$ is the penalty parameter of the $l_2$-norm; and $\lambda_{1,l}$ denotes the penalty parameter of the $l_1$-norm.
Denote $D = [\alpha_1, \alpha_2, \ldots, \alpha_{L_1}]$ and use $C$ and $D$ together as the input to Algorithm 2.
Algorithm 2 SC2D²PCA
Input: $C$, $D$, $\lambda$, $\lambda_{1,l}$ ($l = 1, 2, \ldots, L_1$).
Initialize: $t = 0$, $C(t) = [C_1, C_2, \ldots, C_{L_1}]$, $D(t) = [\alpha_1, \alpha_2, \ldots, \alpha_{L_1}]$, $\varepsilon = 1 \times 10^{-4}$.
Output: $C^*$.
1: $r \leftarrow 1$
2: $s \leftarrow 1$
3: while $r \geq \varepsilon$ or $s \geq \varepsilon$ do
4:  With $C(t)$ fixed, solve the $L_1$ elastic net problems using (22) and obtain $D(t+1)$
5:  With $D(t+1)$ fixed, compute the SVD of $G_c D(t+1) = U \Sigma V^{T}$
6:  $C(t+1) \leftarrow U V^{T}$
7:  $r \leftarrow \|C(t+1) - C(t)\|_2$
8:  $s \leftarrow \|D(t+1) - D(t)\|_2$
9:  $t \leftarrow t + 1$
10: end while
11: $\hat{\alpha}_l \leftarrow \alpha_l / \|\alpha_l\|$, $l = 1, 2, \ldots, L_1$
12: $C \leftarrow C(t)$
13: $D \leftarrow D(t)$
14: return $C^* = C(D)^{T}$
After the sparse operation, we have a sparse projection matrix $C^*$ in the column direction, while the projection matrix $R$ in the row direction remains unchanged. Therefore, the filters in the first stage are:

$$W_l^{c1} = C_l^{*} \cdot R_l^{T} \in \mathbb{R}^{k \times k}, \quad l = 1, 2, \ldots, L_1 \tag{23}$$

where $c1$ is a superscript to distinguish these from the other filters.
By repeating the design process of the first stage, we can obtain the filters of the second stage. The output layer of SC2D²PCANet is the same as that of SR2D²PCANet.

3.3. SRC2D²PCANet

SRC2D²PCANet performs sparse operations on the principal components in both the row and column directions. Based on 2D²PCANet, the $L_1$ principal components $R = [R_1, R_2, \ldots, R_{L_1}]$ and $C = [C_1, C_2, \ldots, C_{L_1}]$ can be obtained using (14) and (15), respectively. Using Algorithms 1 and 2, we obtain the sparse projection matrix $R^*$ in the row direction and the sparse projection matrix $C^*$ in the column direction. The filters of SRC2D²PCANet in the first stage are expressed as:

$$W_l^{rc} = C_l^{*} \cdot (R_l^{*})^{T} \in \mathbb{R}^{k \times k}, \quad l = 1, 2, \ldots, L_1 \tag{24}$$

where $rc$ is a superscript to distinguish these from the other filters.

3.4. Discussions

The difficulty of object recognition lies in identifying the main features while eliminating the interference of redundant information and noise. The algorithms proposed in this paper perform sparse operations on the basis of 2D²PCANet, which preserves the inherent 2D features of the image and reduces the computational effort. At the same time, we further sparsify the principal components to highlight the important factors by compressing the loadings of the non-primary elements to zero, which improves the efficiency of object recognition. However, owing to differences in image characteristics such as illumination, facial expressions, noise, and orientation, the degree of sparsity should be selected based on the specific features of the data: insufficient sparsity retains redundant information, while excessive sparsity loses important image information. Therefore, it is crucial to explore the most appropriate sparsity level for each image dataset.
As for how to choose the appropriate sparsity, there is no ready-made standard to refer to; it can only be determined through experiments, which is a point we focus on in the experimental section. As far as the available experience goes, beyond a certain range, sparser is not better: an algorithm that sparsifies only part of the projection matrices may outperform one that sparsifies the projection matrices in both the row and column directions.
In addition, the proposed sparse operations can be extended to other deep subspace networks. Consequently, the sparse constraints can also be used in other subspace learning methods (LDA, ICA, etc.) and subspace learning networks (LDANet, ICANet, CCANet, etc.).

4. Experiments and Analysis

To examine the performance of the three proposed algorithms, SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet, experiments were conducted on the ORL face dataset, the COIL-100 image base, the NEC toy animal image base, and the AR face dataset. We compared them with several deep subspace networks, namely PCANet [9], 2D²PCANet [30], L1-2D²PCANet [32], and SpPCANet [36]. The extracted features were then classified using a support vector machine (SVM). In addition, traditional convolutional neural network algorithms, including AlexNet [37], VGG [38], and ResNet50 [39], were used for comparison. The hardware platform for these experiments was an Intel i7-6500U (2.5 GHz) processor with 16 GB of memory, and the software platform was the Windows 10 operating system with PyCharm Community Edition 2022.
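The classification step can be reproduced with a standard linear SVM; a minimal usage sketch follows, in which the regularization constant C is an assumption, as the paper does not report the SVM settings.

```python
import numpy as np
from sklearn.svm import LinearSVC

def evaluate_features(train_feats, train_labels, test_feats, test_labels):
    """Fit a linear SVM on block-histogram features extracted by one of
    the networks above and return test accuracy; C=1.0 is assumed."""
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(np.asarray(train_feats), train_labels)
    return clf.score(np.asarray(test_feats), test_labels)
```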

4.1. Dataset Description

The ORL dataset consists of 400 facial images with a resolution of 112 × 92 pixels, 10 images per individual, covering males and females of a wide range of races and ages, with rich facial expressions in each image. The COIL-100 dataset contains 100 objects, each rotated through 360 degrees to change its pose relative to a fixed color camera, with a variety of complex geometric and reflective characteristics. The NEC dataset comprises 5000 images of 60 animal dolls taken from different angles, with about 72 consecutive photographs per doll; in this study, 70 images of each doll were selected for the experiments. The AR face database contains 2600 images, 26 for each of 50 males and 50 females, with facial expression variations such as smiling or not, eyes open or closed, and wearing glasses or not. For more information on these datasets, see Table 2 and Figures 2 and 3.

4.2. Sparse Stage Experiments

To investigate the optimal sparse stage, we added the sparse operation at different stages. The parameters of SR2D²PCANet are set as follows: in both stages, the patch (filter) size is 5 × 5, and the number of filters in each stage is $L_1 = 4$ and $L_2 = 4$. To control the number of blocks, the block size of the local histograms in the output layer is set to 5 × 5, and the overlap ratio between blocks is 0.5. We perform sparse operations at different stages and observe the trend of the recognition rate as the number of training samples gradually increases. Through analysis of the experimental results, the optimal sparse stage of SR2D²PCANet can be obtained.
In general, the recognition rates of each method increase with the number of training samples, as the extracted features become richer. Figure 4 illustrates that the recognition rates of SR2D²PCANet-0 and SR2D²PCANet-1 increase as the number of training samples increases. In contrast, the recognition rate of SR2D²PCANet-2 is much lower than those of the other two variants, and it does not exhibit a gradual increase with more training samples. This phenomenon confirms our speculation that more sparse operations are not always better: excessive sparsity may discard important image features and thus forfeit the expected benefit of the sparse operation. On the COIL-100 dataset, although the recognition rate of SR2D²PCANet-1 is higher than that of SR2D²PCANet-0 at some points, overall SR2D²PCANet-0 outperforms SR2D²PCANet-1. All things considered, in the following experiments, we only perform the sparsification operation in the first stage and directly refer to SR2D²PCANet-0 as SR2D²PCANet without specifying the sparse stage.

4.3. Sparse Degree Experiments

To evaluate the effect of sparsity on the recognition rates of the algorithms, we chose different sampling patch sizes for the input layer. The sampling patch size varies from 3 × 3 to 15 × 15, and the number of filters is kept constant at 2 × 2. We found that the sparsity of the projection matrix increases with the sampling patch size. For the ORL, COIL-100, NEC, and AR datasets, we randomly selected 5, 7, 10, and 3 images for training, respectively, and used the rest for testing. The results are shown in Figure 5.
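The paper does not state how sparsity is computed; a natural measure, assumed in the sketch below, is the fraction of zeroed loadings in the sparse projection matrix produced by the elastic net.

```python
import numpy as np

def sparsity(R_star, tol=0.0):
    """Fraction of (near-)zero entries in a sparse projection matrix;
    elastic-net solvers return exact zeros, so tol=0 usually suffices."""
    return float(np.mean(np.abs(R_star) <= tol))
```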
It can be seen that different datasets have different optimal sparsities. When the sampling patch on ORL is 13 × 13, the sparsity is 92.3% and the accuracy peaks at 92.5%; as the patch size continues to increase, the sparsity increases but the accuracy begins to decrease, so the optimal sparsity for the ORL dataset is 92.3%. Similarly, for the COIL-100 dataset, the accuracy peaks at a patch size of 11 × 11, where the sparsity is 90.9%. For the NEC dataset, the highest accuracy is achieved at a patch size of 9 × 9 and a sparsity of 88.8%, and for the AR dataset, the optimal sparsity is also 88.8%. These results indicate that the optimal sparsity varies with the features extracted from each type of dataset. As we suspected, images with different characteristics have different optimal sparsities, and the advantage of the sparse operation depends on choosing the sparsity appropriately; otherwise, the advantage of the algorithm turns into a disadvantage.

4.4. Classification Rate Experiments

The classification performance of the proposed algorithms was evaluated on the ORL, COIL-100, NEC, and AR databases. The features extracted by each algorithm were used as inputs to a support vector machine (SVM) classifier. For the SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet algorithms, the optimal sparsity was found by adjusting the penalty parameters. We randomly took a certain number of images from each dataset as the training set and used the remaining images as the testing set; considering the randomness of sample selection, each setting was repeated five times. We then observed the change in recognition rates as the number of training samples gradually increased. The average classification rates are listed in Tables 3–6.
We can conclude that the classification rate of each method increases with the number of training samples. In the vast majority of cases, the classification rates of SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet outperform those of the other deep subspace networks, indicating that the proposed methods have a higher learning capability. The reason is that our algorithms use 2D²PCA to learn filters, which preserves the inherent 2D features of the image compared to PCANet, and the sparsification removes the redundant principal components compared to the base algorithm, 2D²PCANet. Therefore, the proposed algorithms capture the main features of images more accurately after sparsification. However, on the AR dataset, the advantages of the proposed algorithms over SpPCANet are not obvious. The reason is that the faces in the AR dataset carry natural noise, such as illumination changes, scarves, and sunglasses. We speculate that the proposed algorithms may not perform as well in the face of certain kinds of noise, which we explore further in the next section.

4.5. Robustness Experiments

To validate the robustness of the proposed algorithms, we added 0–60% occlusion noise at intervals of 10% to the training samples of each dataset. For example, 40% occlusion noise means that 40% of the training samples are randomly selected and the size of the noise block is also 40% of the image size; the noise blocks are randomly located and do not exceed the image boundaries. The network parameters are the same as those used in the classification rate experiments. The numbers of training samples for the ORL, COIL-100, and NEC datasets were 3, 7, and 5, respectively, and the remaining images were used for testing. Considering the randomness of the noise, the experiment was repeated five times for each setting. The average recognition results are listed in Tables 7–9.
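As a reference implementation of this occlusion protocol, the following sketch makes two stated assumptions: "size" is read as a fraction of the image area (hence the square-root side scaling), and the blocks are filled with uniform random noise.

```python
import numpy as np

def add_block_occlusion(images, ratio, rng=None):
    """Occlude `ratio` of the images with one random noise block each,
    the block covering `ratio` of the image area (assumed reading)."""
    rng = rng or np.random.default_rng()
    images = [img.copy() for img in images]
    chosen = rng.choice(len(images), size=int(ratio * len(images)), replace=False)
    for idx in chosen:
        img = images[idx]
        m, n = img.shape
        bh = int(m * np.sqrt(ratio))   # block area = ratio * image area
        bw = int(n * np.sqrt(ratio))
        i = rng.integers(0, m - bh + 1)   # block stays inside the boundary
        j = rng.integers(0, n - bw + 1)
        img[i:i + bh, j:j + bw] = rng.random((bh, bw)) * img.max()
    return images
```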
It can be observed that the average classification rates decrease as the noise blocks become larger: the fewer useful features extracted, the lower the average classification rate. In the sparse algorithms, most of the extracted principal component loadings are compressed to zero, yet this does not affect their excellent and stable recognition ability under noise, indicating that the proposed SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet are robust to noise.
However, the types of artificially added noise above are limited. Since the AR dataset did not perform as well in the classification rate experiments, we study it separately. The AR database is a widely used standard database in which some of the faces carry natural noise, such as illumination changes, scarves, and sunglasses. The images with illumination changes, scarf occlusion, and sunglass occlusion are used as training samples and the rest as testing ones. The average classification rates are listed in Table 10.
It is worth mentioning that the classification accuracies of the three proposed algorithms are worse than that of SpPCANet on the noisy samples with scarves and sunglasses. The reason is found to be the different locations of the two types of noise. For the artificially added noise, the locations of the noise blocks are random and conform to a uniform distribution overall, and the noise caused by varying illumination in the AR dataset is likewise spread over the whole image. However, for images with scarves or sunglasses, the positions of the noise blocks are fixed. Therefore, SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet are more sensitive to noise blocks with fixed locations.
To further validate this finding, 40 × 40 noise blocks were added at a fixed location to all training images from the ORL, COIL-100, and NEC datasets. The numbers of training samples for the three datasets were 5, 7, and 30, respectively, and the remaining samples were used for testing to obtain the classification accuracy. Different training samples were selected over five repetitions, and the average results are listed in Table 11.
The experimental results verify this finding. The reason is that SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet employ 2D²PCA to learn filters, which preserves the intrinsic features of the images and therefore also treats fixed-location noise blocks as inherent image features. Consequently, when the samples are covered by fixed-location noise blocks, the recognition rates of the three proposed algorithms are lower than that of SpPCANet. This is a disadvantage of the proposed algorithms, although such a situation is rarely encountered with natural noise, whose location tends to be randomly distributed in typically captured images. Nevertheless, there is still room for improvement in specific applications, such as face recognition in a workshop where everyone wears a mask.

4.6. Comparative Experiments with Traditional Deep Learning Algorithms

To highlight the superiority of the proposed algorithms, traditional deep learning CNNs (AlexNet, VGG, and ResNet50) were also compared with the three proposed algorithms on the ORL dataset. These traditional methods were trained in TensorFlow 2.8.0, and the experimental results are shown in Figure 6 and listed in Table 12.
As can be seen from Figure 6, the extracted features become richer as the number of training samples increases, and the recognition rate gradually improves. Nevertheless, AlexNet and ResNet50 improve relatively slowly and cannot quickly learn image features from few samples. The accuracy of the VGG network is close to zero, indicating that overfitting occurs with so few training samples. Traditional deep learning networks have larger model sizes and more parameters, so their advantages emerge only with larger datasets. This demonstrates that the three proposed algorithms can learn quickly even with limited samples, reflecting the fast-learning advantage of lightweight networks. Moreover, Table 12 shows that the training and testing times of the SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet networks are much lower than those of the traditional deep learning networks under the same conditions.

5. Conclusions

In this paper, three algorithms, SR2D²PCANet, SC2D²PCANet, and SRC2D²PCANet, were proposed on the basis of the 2D²PCANet algorithm. Modified principal components with sparse loadings were generated using the elastic net. The proposed algorithms further reduce redundancy and improve the interpretability of the principal components while retaining the advantages of deep subspace learning networks. We then explored different sparse stages and degrees of sparsity and determined the optimal sparsity for different datasets. Finally, a variety of experimental results verified the superiority of the three proposed algorithms in learning ability and robustness.
However, since the optimal penalty parameters are currently found through manual experiments, future work will focus on finding optimal parameters using machine learning methods. Moreover, the solution method for sparse PCA needs further improvement; as a next step, we will consider introducing an iterative scheme that automatically updates a dynamic sparse penalty term for different types of data. In any case, the three proposed algorithms are good solutions for sparsifying redundant information, as demonstrated by the experimental results. In addition, the sparse operation used in the proposed algorithms can be extended to other deep subspace networks, such as LDANet, ICANet, and CCANet, making it possible to further sparsify and streamline the filters while maintaining each network's own advantages, enhancing interpretability, and improving learning efficiency.
Given the advantages of deep subspace networks in small-sample learning and the redundancy-reducing properties of sparse operations, we will also try to apply them to micro-expression recognition in the future. Because the number of micro-expression image sequences in publicly available datasets is extremely small, totaling no more than 1000 sequences, micro-expression detection and classification is typically a small-sample problem and is, therefore, a potentially suitable application direction.

Author Contributions

Conceptualization, methodology, and software, X.L., W.F. and X.W.; validation, J.G.; formal analysis, C.W.; investigation, M.X.; resources, Y.C.; data curation, Y.Y. and X.Z.; writing—original draft preparation, X.L.; writing—review and editing, X.W.; visualization, X.L.; supervision, project administration, and funding acquisition, W.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grants: 61602345 and 62002263), partly by the National Key Research and Development Plan (grant: 2019YFB2101900), and partly by the Application Foundation and Advanced Technology Research Project of Tianjin (grant: 15JCQNJC01400).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  2. Mamieva, D.; Abdusalomov, A.B.; Mukhiddinov, M.; Whangbo, T.K. Improved face detection method via learning small faces on hard images based on a deep learning approach. Sensors 2023, 23, 502. [Google Scholar] [CrossRef] [PubMed]
  3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  4. Alzubaidi, L.; Zhang, J.L.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaria, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
  5. Sun, Z.; Chiong, R.; Hu, Z.P. An extended dictionary representation approach with deep subspace learning for facial expression recognition. Neurocomputing 2018, 316, 1–9. [Google Scholar] [CrossRef]
  6. Xu, Y.H.; Meng, R.T.; Yang, Z.X. Research on micro-fault detection and multiple-fault isolation for gas sensor arrays based on serial principal component analysis. Electronics 2022, 11, 1755. [Google Scholar] [CrossRef]
  7. Abdelbaky, A.; Aly, S. Human action recognition using short-time motion energy template images and PCANet features. Neural Comput. Appl. 2020, 32, 12561–12574. [Google Scholar] [CrossRef]
  8. Bruna, J.; Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1872–1886. [Google Scholar] [CrossRef]
  9. Chan, T.H.; Jia, K.; Gao, S.H.; Lu, J.W.; Zeng, Z.N.; Ma, Y. PCANet: A simple deep learning baseline for image classification? IEEE Trans. Image Process. 2015, 24, 5017–5032. [Google Scholar] [CrossRef]
  10. Zhang, Y.Q.; Geng, T.Y.; Wu, X.; Zhou, J.L.; Gao, D.R. ICANet: A simple cascade linear convolution network for face recognition. EURASIP J. Image Video Process. 2018, 2018, 51. [Google Scholar] [CrossRef]
  11. Yang, X.H.; Liu, W.F.; Tao, D.P.; Cheng, J. Canonical correlation analysis networks for two-view image recognition. Inf. Sci. 2017, 385, 338–352. [Google Scholar] [CrossRef]
  12. Tuncer, T.; Dogan, S.; Ertam, F. Automatic voice based disease detection method using one dimensional local binary pattern feature extraction network. Inf. Sci. 2019, 155, 500–506. [Google Scholar] [CrossRef]
  13. Low, C.Y.; Teoh, A.B.J.; Toh, K.A. Stacking PCANet plus: An overly simplified convNets baseline for face recognition. IEEE Signal Process. Lett. 2017, 24, 1581–1585. [Google Scholar] [CrossRef]
  14. Bian, S.R.; Wang, Z.S.; Song, W.H.; Zhou, X.H. Feature extraction and classification of time-varying power load characteristics based on PCANet and CNN plus Bi-LSTM algorithms. Electr. Power Syst. Res. 2023, 217, 109149. [Google Scholar] [CrossRef]
  15. Liu, X.; Si, Y.J.; Yang, W.Y. A novel two-level fusion feature for mixed ECG identity recognition. Electronics 2021, 10, 2052. [Google Scholar] [CrossRef]
  16. Hu, F.; Hu, Y.J.; Cui, E.H.; Guan, Y.Q.; Gao, B.; Wang, X.; Wang, K.; Liu, Y.; Yao, X.K. Recognition method of coal and gangue combined with structural similarity index measure and principal component analysis network under multispectral imaging. Microchem. J. 2023, 186, 108330. [Google Scholar] [CrossRef]
  17. Li, S.S.; Zou, Y.H.; Wang, G.J.; Lin, C. Infrared and visible image fusion method based on a principal component analysis network and image pyramid. Remote Sens. 2023, 15, 685. [Google Scholar] [CrossRef]
  18. Zhong, B.N.; Pan, S.N.; Wang, C.; Wang, T.; Du, J.X.; Chen, D.S.; Cao, L.J. Robust individual-cell/object tracking via PCANet deep network in biomedicine and computer vision. BioMed Res. Int. 2016, 2016, 8182416. [Google Scholar] [CrossRef]
  19. Arashloo, S.R.; Amirani, M.C.; Noroozi, A. Dynamic texture representation using a deep multi-scale convolutional network. J. Vis. Commun. Image Represent. 2017, 43, 89–97. [Google Scholar] [CrossRef]
  20. Liu, Y.; Zhao, S.S.; Wang, Q.Q.; Gao, Q.X. Learning more distinctive representation by enhanced PCA network. Neurocomputing 2018, 275, 924–931. [Google Scholar] [CrossRef]
  21. Wang, J.P.; Ran, R.S.; Fang, B. Global and local structure network for image classification. IEEE Access 2023, 11, 27963–27973. [Google Scholar] [CrossRef]
  22. Wang, Z.W.; Zhang, Y.J.; Pan, C.C.; Cui, Z.W. MMPCANet: An improved PCANet for occluded face recognition. Appl. Sci. 2022, 12, 3144. [Google Scholar] [CrossRef]
  23. Duan, J.; Hu, C.; Zhan, X.B.; Zhou, H.D.; Liao, G.L.; Shi, T.L. MS-SSPCANet: A powerful deep learning framework for tool wear prediction. Robot. Comput.-Integr. Manuf. 2022, 78, 102391. [Google Scholar] [CrossRef]
  24. Haouam, M.Y.; Meraoumia, A.; Laimeche, L.; Bendib, I. S-DCTNet: Security-oriented biometric feature extraction technique: An effective pathway to secure and reliable biometric systems. Multimed. Tools Appl. 2021, 80, 36059–36091. [Google Scholar] [CrossRef]
  25. Feng, Z.Y.; Jin, L.W.; Tao, D.P.; Huang, S.P. DLANet: A manifold-learning-based discriminative feature learning network for scene classification. Neurocomputing 2015, 157, 11–21. [Google Scholar] [CrossRef]
  26. Mustafa, H.T.; Zareapoor, M.; Yang, J. MLDNet: Multi-level dense network for multi-focus image fusion. Signal Process.-Image Commun. 2020, 85, 115864. [Google Scholar] [CrossRef]
  27. Zeng, R.; Wu, J.S.; Shao, Z.H.; Chen, Y.; Chen, B.J.; Senhadji, L.; Shu, H.Z. Color image classification via quaternion principal component analysis network. Neurocomputing 2016, 216, 416–428. [Google Scholar] [CrossRef]
  28. Fan, C.X.; Hong, X.P.; Tian, L.; Ming, Y.; Pietikainen, M.; Zhao, G.Y. PCANet-II: When PCANet meets the second order pooling. IEICE Trans. Inf. Syst. 2018, E101D, 2159–2162. [Google Scholar] [CrossRef]
  29. Sun, K.; Zhang, J.S.; Yong, H.W.; Liu, J.M. FPCANet: Fisher discrimination for principal component analysis network. Knowl.-Based Syst. 2019, 166, 108–117. [Google Scholar] [CrossRef]
  30. Yu, D.; Wu, X.J. 2DPCANet: A deep leaning network for face recognition. Multimed. Tools Appl. 2018, 77, 12919–12934. [Google Scholar] [CrossRef]
  31. Yang, J.; Zhang, D.; Frangi, A.F.; Yang, J.Y. Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 131–137. [Google Scholar] [CrossRef]
  32. Li, Y.K.; Wu, X.J.; Kittler, J. L1-2D(2)PCANet: A deep learning network for face recognition. J. Electron. Imaging 2019, 28, 023016. [Google Scholar] [CrossRef]
  33. Sun, Z.J.; Shao, Z.H.; Shang, Y.Y.; Li, B.C.; Wu, J.S.; Bi, H. Randomized nonlinear two-dimensional principal component analysis network for object recognition. Mach. Vis. Appl. 2023, 34, 21. [Google Scholar] [CrossRef]
  34. Zou, H.; Hastie, T.; Tibshirani, R. Sparse principal component analysis. J. Comput. Graph. Stat. 2006, 15, 265–286. [Google Scholar] [CrossRef]
  35. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B-Stat. Methodol. 2005, 67, 301–320. [Google Scholar] [CrossRef]
  36. Dutta, K.; Bhattacharjee, D.; Nasipuri, M. SpPCANet: A simple deep learning-based feature extraction approach for 3D face recognition. Multimed. Tools Appl. 2020, 79, 31329–31352. [Google Scholar] [CrossRef]
  37. Han, X.B.; Zhong, Y.F.; Cao, L.Q.; Zhang, L.P. Pre-Trained AlexNet architecture with pyramid pooling and supervision for high spatial resolution remote sensing image scene classification. Remote Sens. 2017, 9, 848. [Google Scholar] [CrossRef]
  38. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
  39. Behar, N.; Shrivastava, M. ResNet50-based effective model for breast cancer classification using histopathology images. CMES-Comput. Model. Eng. Sci. 2022, 130, 823–839. [Google Scholar] [CrossRef]
Figure 1. The architecture of SR2D²PCANet. In the feature matrix, blue blocks represent non-zero elements and white blocks represent zero elements.
Figure 2. Some sample images with occluding noise from (a) the ORL face dataset, (b) the COIL-100 image base, and (c) the NEC toy animal image base. From left to right, the percentage of the image size covered by the noise area is 0%, 10%, 20%, 30%, 40%, 50%, and 60%, respectively.
Figure 3. Some sample images from the AR dataset with (a) facial expression, (b) illumination variation, (c) scarf occlusion, and (d) glasses occlusion.
Figure 4. Recognition rates with varying training samples. SR2D²PCANet-0, SR2D²PCANet-1, and SR2D²PCANet-2 indicate sparse processing in the first stage, in the second stage, and in both stages, respectively. (a) ORL dataset. (b) COIL-100 dataset. (c) NEC dataset. (d) AR dataset.
Figure 5. Recognition rates with varying patch size and sparsity. (a) ORL dataset. (b) COIL-100 dataset. (c) NEC dataset. (d) AR dataset.
Figure 6. Recognition rates on the ORL dataset.
Table 1. Detailed summary of related deep subspace networks.

| Method | Description |
| --- | --- |
| PCANet | Advantage: avoids the overfitting caused by small-sample datasets; does not adjust the parameters. Shortcoming: leads to time-consuming computation and loses the 2D features; redundancy and unfavorable interpretability in the principal components. |
| 2D²PCANet | Advantage: preserves the 2D features of images; reduces time-consuming computation. Shortcoming: redundancy and unfavorable interpretability in the principal components. |
| L1-2D²PCANet | Advantage: robust in handling data with noise or outliers. Shortcoming: unfavorable interpretability in the principal components. |
| SpPCANet | Advantage: reduces redundancy and improves interpretability in the principal components. Shortcoming: leads to time-consuming computation and loses the 2D features of images. |
Table 2. Dataset description.

| Dataset | Total Samples | Image Size | Subjects |
| --- | --- | --- | --- |
| ORL | 400 | 112 × 92 | 40 |
| COIL-100 | 1400 | 100 × 100 | 100 |
| NEC | 4200 | 100 × 100 | 60 |
| AR | 2600 | 165 × 120 | 100 |
Table 3. Average classification rates of the ORL dataset with different numbers of training samples.

| Algorithm | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.7088 | 0.8407 | 0.8792 | 0.9270 | 0.9175 | 0.9350 |
| 2D²PCANet | 0.8056 | 0.9050 | 0.9242 | 0.9440 | 0.9638 | 0.9750 |
| L1-2D²PCANet | 0.8050 | 0.8857 | 0.9308 | 0.9500 | 0.9725 | 0.9667 |
| SpPCANet | 0.7762 | 0.8343 | 0.8783 | 0.9020 | 0.9675 | 0.9733 |
| SR2D²PCANet | 0.8450 | 0.9143 | 0.9383 | 0.9580 | 0.9787 | 0.9800 |
| SC2D²PCANet | 0.7875 | 0.8957 | 0.9300 | 0.9410 | 0.9535 | 0.9733 |
| SRC2D²PCANet | 0.8663 | 0.8986 | 0.9317 | 0.9650 | 0.9662 | 0.9817 |
Table 4. Average classification rates of the COIL-100 dataset with different numbers of training samples.

| Algorithm | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.6799 | 0.7495 | 0.7791 | 0.8160 | 0.8893 | 0.9180 | 0.9171 | 0.9400 |
| 2D²PCANet | 0.6588 | 0.7057 | 0.7728 | 0.8182 | 0.9053 | 0.8711 | 0.8976 | 0.9272 |
| L1-2D²PCANet | 0.7159 | 0.7540 | 0.8023 | 0.8707 | 0.8996 | 0.9015 | 0.9189 | 0.9207 |
| SpPCANet | 0.7180 | 0.7789 | 0.8287 | 0.8613 | 0.8960 | 0.9229 | 0.9427 | 0.9477 |
| SR2D²PCANet | 0.6988 | 0.7950 | 0.8176 | 0.8794 | 0.9061 | 0.9090 | 0.9349 | 0.9640 |
| SC2D²PCANet | 0.7243 | 0.7689 | 0.8049 | 0.8584 | 0.8954 | 0.9112 | 0.9386 | 0.9500 |
| SRC2D²PCANet | 0.7571 | 0.7929 | 0.8316 | 0.8544 | 0.8947 | 0.9203 | 0.9344 | 0.9539 |
Table 5. Average classification rates of the NEC dataset with different numbers of training samples.

| Algorithm | 10 | 15 | 20 | 25 | 30 | 35 | 40 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.9014 | 0.9512 | 0.9473 | 0.9587 | 0.9735 | 0.9781 | 0.9789 |
| 2D²PCANet | 0.9106 | 0.9333 | 0.9564 | 0.9724 | 0.9816 | 0.9866 | 0.9901 |
| L1-2D²PCANet | 0.7401 | 0.8818 | 0.8614 | 0.9033 | 0.9571 | 0.9357 | 0.9500 |
| SpPCANet | 0.8986 | 0.9543 | 0.9542 | 0.9779 | 0.9761 | 0.9855 | 0.9955 |
| SR2D²PCANet | 0.9334 | 0.9642 | 0.9733 | 0.9836 | 0.9832 | 0.9875 | 0.9962 |
| SC2D²PCANet | 0.9260 | 0.9542 | 0.9559 | 0.9786 | 0.9796 | 0.9926 | 0.9963 |
| SRC2D²PCANet | 0.9264 | 0.9572 | 0.9653 | 0.9824 | 0.9858 | 0.9874 | 0.9897 |
Table 6. Average classification rates of the AR dataset with different numbers of training samples.

| Algorithm | 4 | 6 | 8 | 10 | 12 | 14 |
| --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.8673 | 0.9075 | 0.9133 | 0.9275 | 0.8414 | 0.9883 |
| 2D²PCANet | 0.9836 | 0.9860 | 0.9967 | 0.9894 | 0.9816 | 0.9967 |
| L1-2D²PCANet | 0.9355 | 0.9795 | 0.9783 | 0.9900 | 0.9814 | 0.9958 |
| SpPCANet | 0.9618 | 0.9855 | 0.9972 | 0.9994 | 0.9921 | 0.9992 |
| SR2D²PCANet | 0.9664 | 0.9630 | 0.9839 | 0.9969 | 0.9943 | 0.9983 |
| SC2D²PCANet | 0.9418 | 0.9940 | 0.9950 | 0.9994 | 0.9993 | 0.9950 |
| SRC2D²PCANet | 0.9545 | 0.9595 | 0.9689 | 0.9975 | 0.9750 | 0.9683 |
Table 7. Average classification rates under different noise blocks from the ORL dataset.

| Algorithm | 0% | 10% | 20% | 30% | 40% | 50% | 60% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.8407 | 0.8200 | 0.8143 | 0.8404 | 0.7693 | 0.7364 | 0.7550 |
| 2D²PCANet | 0.9050 | 0.8693 | 0.8586 | 0.7957 | 0.8027 | 0.7922 | 0.7685 |
| L1-2D²PCANet | 0.8857 | 0.8850 | 0.8672 | 0.8750 | 0.7993 | 0.7521 | 0.6757 |
| SpPCANet | 0.8343 | 0.8728 | 0.8414 | 0.8243 | 0.8286 | 0.8307 | 0.8586 |
| SR2D²PCANet | 0.9143 | 0.8871 | 0.8907 | 0.8922 | 0.8793 | 0.8422 | 0.7928 |
| SC2D²PCANet | 0.8957 | 0.8793 | 0.8450 | 0.8293 | 0.8164 | 0.7957 | 0.8143 |
| SRC2D²PCANet | 0.8986 | 0.8872 | 0.8478 | 0.8250 | 0.7350 | 0.7007 | 0.7178 |
Table 8. Average classification rates under different noise blocks from the COIL-100 dataset.

| Algorithm | 0% | 10% | 20% | 30% | 40% | 50% | 60% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.8893 | 0.8500 | 0.8444 | 0.8269 | 0.8450 | 0.8350 | 0.7972 |
| 2D²PCANet | 0.9053 | 0.8780 | 0.8664 | 0.8278 | 0.8536 | 0.8282 | 0.6800 |
| L1-2D²PCANet | 0.8996 | 0.8931 | 0.8683 | 0.8851 | 0.8696 | 0.8484 | 0.8096 |
| SpPCANet | 0.8960 | 0.9029 | 0.8781 | 0.8833 | 0.8836 | 0.8463 | 0.8349 |
| SR2D²PCANet | 0.9061 | 0.9218 | 0.8891 | 0.8906 | 0.8749 | 0.8716 | 0.7463 |
| SC2D²PCANet | 0.8954 | 0.9003 | 0.8819 | 0.8830 | 0.8689 | 0.8303 | 0.7246 |
| SRC2D²PCANet | 0.8947 | 0.9013 | 0.8803 | 0.8893 | 0.8810 | 0.7511 | 0.7823 |
Table 9. Average classification rates under different noise blocks from the NEC dataset.

| Algorithm | 0% | 10% | 20% | 30% | 40% | 50% | 60% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PCANet | 0.8468 | 0.8349 | 0.8101 | 0.8000 | 0.8248 | 0.7850 | 0.6786 |
| 2D²PCANet | 0.8713 | 0.8309 | 0.7850 | 0.8358 | 0.7429 | 0.7350 | 0.5794 |
| L1-2D²PCANet | 0.5603 | 0.5931 | 0.5721 | 0.7485 | 0.6700 | 0.5690 | 0.3812 |
| SpPCANet | 0.8047 | 0.7948 | 0.8212 | 0.8281 | 0.8341 | 0.8133 | 0.7276 |
| SR2D²PCANet | 0.8595 | 0.8047 | 0.8791 | 0.8587 | 0.7779 | 0.7883 | 0.6634 |
| SC2D²PCANet | 0.8731 | 0.8536 | 0.8799 | 0.8221 | 0.8173 | 0.7346 | 0.6179 |
| SRC2D²PCANet | 0.8319 | 0.8247 | 0.8486 | 0.8169 | 0.8236 | 0.6984 | 0.6457 |
Table 10. Average classification rates under natural noise from the AR dataset.

| Algorithm | Illumination | Scarf | Sunglass |
| --- | --- | --- | --- |
| PCANet | 0.7950 | 0.6750 | 0.5875 |
| 2D²PCANet | 0.9438 | 0.9150 | 0.8825 |
| L1-2D²PCANet | 0.8875 | 0.6875 | 0.7912 |
| SpPCANet | 0.9538 | 0.9375 | 0.9212 |
| SR2D²PCANet | 0.9438 | 0.9112 | 0.8350 |
| SC2D²PCANet | 0.9588 | 0.9100 | 0.8738 |
| SRC2D²PCANet | 0.9388 | 0.8812 | 0.7812 |
Table 11. Average classification rates with fixed-location noise blocks.

| Algorithm | ORL | COIL-100 | NEC |
| --- | --- | --- | --- |
| SpPCANet | 0.9690 | 0.8567 | 0.9698 |
| SR2D²PCANet | 0.9360 | 0.8383 | 0.9647 |
| SC2D²PCANet | 0.9300 | 0.8319 | 0.9352 |
| SRC2D²PCANet | 0.9090 | 0.8261 | 0.9363 |
Table 12. Recognition rates and running time on the ORL dataset.

| Algorithm | Recognition Rate | Training Time (s) | Testing Time (s) |
| --- | --- | --- | --- |
| AlexNet | 0.5714 | 279.63 | 2.13 |
| VGG | 0.4718 | 5128.48 | 43.68 |
| ResNet50 | 0.5536 | 2831.08 | 26.00 |
| SR2D²PCANet | 0.9667 | 167.54 | 0.24 |
| SC2D²PCANet | 0.9500 | 175.59 | 0.24 |
| SRC2D²PCANet | 0.9500 | 170.24 | 0.33 |


