Article

A Fusion-Assisted Multi-Stream Deep Learning and ESO-Controlled Newton–Raphson-Based Feature Selection Approach for Human Gait Recognition

1 Department of Computer Science, HITEC University, Taxila 47080, Pakistan
2 Department of Informatics, University of Leicester, Leicester LE1 7RH, UK
3 College of Computer Science and Engineering, University of Ha’il, Ha’il 81451, Saudi Arabia
4 Department of Software Engineering, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj 16242, Saudi Arabia
5 Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, Al-Kharj 16242, Saudi Arabia
6 Faculty of Computers & Information Technology, Computer Science Department, University of Tabuk, Tabuk 71491, Saudi Arabia
7 Department of Computer Science, Hanyang University, Seoul 04763, Republic of Korea
* Author to whom correspondence should be addressed.
Sensors 2023, 23(5), 2754; https://doi.org/10.3390/s23052754
Submission received: 31 January 2023 / Revised: 24 February 2023 / Accepted: 25 February 2023 / Published: 2 March 2023

Abstract

The performance of human gait recognition (HGR) is affected by partial obstruction of the human body caused by the limited field of view in video surveillance. Traditional methods require an accurate bounding box to recognize human gait in video sequences, which is a challenging and time-consuming step. Owing to important applications such as biometrics and video surveillance, HGR performance has improved over the last half-decade. Based on the literature, the challenging covariate factors that degrade gait recognition performance include walking while wearing a coat or carrying a bag. This paper proposes a new two-stream deep learning framework for human gait recognition. In the first step, a contrast enhancement technique based on the fusion of local and global filter information is proposed, and a high-boost operation is applied to highlight the human region in a video frame. In the second step, data augmentation is performed to increase the size of the preprocessed dataset (CASIA-B). In the third step, two pre-trained deep learning models, MobilenetV2 and ShuffleNet, are fine-tuned and trained on the augmented dataset using deep transfer learning, and features are extracted from the global average pooling layer instead of the fully connected layer. In the fourth step, the extracted features of both streams are fused using a serial-based approach and are further refined in the fifth step using an improved equilibrium state optimization-controlled Newton–Raphson (ESOcNR) selection method. The selected features are finally classified using machine learning algorithms. The experimental process was conducted on 8 angles of the CASIA-B dataset and obtained accuracies of 97.3, 98.6, 97.7, 96.5, 92.9, 93.7, 94.7, and 91.2%, respectively. Comparisons were conducted with state-of-the-art (SOTA) techniques and showed improved accuracy and reduced computational time.

1. Introduction

Human verification or identification plays a significant role in information security, public security systems, point-of-sale machines, automatic teller machines, etc. [1]. Human beings can be identified by examining different external and internal traits, such as blood samples, skin, hair, ear shape, bite marks (forensic odontology), the face (face recognition), and walking style (gait analysis) [2]. Fingerprint and face recognition are well-known biometric systems, but both have limitations: fingerprint verification requires contact with the fingers, and face recognition requires a controlled environment and an appropriate distance.
On the other hand, human gait analysis [3] can recognize people at a distance; gait refers to the individual walking style of a human being [4]. There are two main approaches to gait recognition: model-based and model-free. The model-based approach tracks parts of the human body, such as the arms, legs, hands, feet, and neck, and obtains a set of static and dynamic parameters [5]; the basic idea behind this approach is to model the bones and joints of the human skeleton. The model-free approach, in contrast, tracks an object’s geometry and shape, which is helpful in object recognition systems [6].
Gait recognition is widely used in many applications, such as medical sciences [7], personal recognition, sports sciences, and cyber security. The human gait has 24 elements that can be used to identify a person. Different studies and experiments have shown that every person has a unique musculoskeletal structure, which makes it possible to recognize a person with the help of gait information [8]. Gait recognition also has several advantages over other biometric identifiers [9]. Firstly, human gait can be captured from far away without the subject’s cooperation [10], whereas other biometrics cannot be obtained without the person’s physical and deliberate interaction with the data-acquiring sensors [11,12]. Secondly, gait recognition can be performed on low-resolution images or videos, while other biometrics, such as facial recognition, perform poorly on low-resolution data [13]. Thirdly, HGR can be performed with minimal equipment, such as a camera, accelerometer, floor sensor, or radar [14].
Most gait recognition systems are based on four main components [15]. The first is capturing gait data from video sequences and real-time scenarios using different tools and techniques [16]. The second is segmentation, which identifies the human body shape and removes noise and blurriness from the background using different segmentation techniques based on the characteristics of an image and the region of interest. The third is contour detection, which is needed when gaps appear between joints or parts of the human body are missing after segmentation; contour detection is useful for shape analysis and object detection. The last is feature extraction based on human properties, such as shape and geometry. These features are finally classified using machine learning classifiers [17].
Image processing applications require each image or frame to be enhanced before proceeding to the next stage, such as feature extraction. The enhancement methodology relies on high-frequency and low-frequency pixel components [18]. High-frequency pixel components depict the scene and objects in the image, whereas low-frequency pixel components indicate minute features, including certain lines and minuscule points. To enhance the image, great care is always required to increase the high-frequency components while retaining the low-frequency components [19]. Recently, deep learning has shown significant success in object classification, gait recognition, and action recognition, among other tasks. In deep learning, features are extracted from the raw images, and a deep learning architecture consists of several hidden layers: convolutional, pooling, and fully connected. Training a deep learning model on an enhanced dataset may yield better features that later improve accuracy [20].

1.1. Existing Techniques

Several deep learning-based techniques for human gait classification have been introduced in the literature. Muhammad et al. [21] implemented an improved ant colony optimization (IACO) and deep learning framework for human gait recognition. The IACO algorithm was used to select the best features, which were then classified using machine learning classifiers. Several experiments showed that the IACO method is more accurate and less computationally expensive than other state-of-the-art techniques. Imran et al. [22] presented a deep learning (DL) and kurtosis-controlled entropy (KcE) based framework for HGR using video sequences, aiming to resolve challenges such as human angle shift, clothing, and walking style. The authors extracted features using the ResNet101 deep model and then selected the best features using the KcE approach. The experimental process was conducted on the CASIA-B dataset and obtained accuracies of 95.26% and 96.60%, respectively. Khan et al. [23] introduced a single-stream HGR framework based on optimal deep learning fused features. In their work, the authors performed data augmentation in the first step and then used two pre-trained models, Inception-ResNet-V2 and NASNet-Mobile. Features of both deep models were fused and further optimized using the whale optimization algorithm. Machine learning classifiers were applied, obtaining a best accuracy of 89%.
Huang et al. [24] demonstrated a gait recognition method based on multisource sensing information. Three-dimensional human feature data were extracted from the body’s structure and multisource stream information during walking. A person’s walk exhibits distinct characteristics, and the person is identified based on these characteristics. The CASIA-A dataset was used for the experimental process, obtaining an accuracy of 88.33%. Hasan et al. [25] presented a modified residual block and a novel shallow convolutional layer for HGR. Wearable sensors embedded in objects that can be worn on the subject’s body, such as wristwatches, necklaces, and smartphones, were used for gait analysis. Template matching and conventional matching approaches were not appropriate and did not perform well on low-end wearable devices; their modified residual block and shallow convolutional neural network obtained an accuracy of 85% on an IMU-based dataset. Junaid et al. [26] presented a human gait analysis approach for osteoarthritis using DL and a kernel extreme learning machine. The authors faced numerous difficulties in this approach, such as abnormal walking, patients’ clothes, and angle changes. Conventional techniques are only concerned with feature selection and do not address such issues; therefore, the authors employed a novel robust method to address that disparity. For experimental purposes, two pre-trained models (VGG16 and AlexNet) were used and obtained improved accuracy. Tian et al. [27] addressed the free-view gait recognition problem, in which traditional methods struggle with gait sequences captured in uncontrolled scenes with unknown and dynamically changing viewing angles during the walk. They presented a unique walking trajectory fitting (WTF) approach for these challenges and also introduced a joint gait manifold (JGM) technique for gait similarity evaluation.

1.2. Major Challenges

In summary, all the above methods still face several issues, such as difficulty in selecting the most important features and the extraction of irrelevant features. Moreover, the above methods did not use the entire CASIA-B dataset for the experimental process. Feature extraction from the original video frames may produce redundant and irrelevant features due to complicating factors such as outdoor environments, lighting conditions, complex backgrounds, noise, and low-resolution frames. These factors impact recognition accuracy. In addition, several studies extracted features from a region of interest (ROI), which is a time-consuming step that sometimes leads to incorrect ROI detection. Incorrect ROI detection increases the developed system’s overall runtime and extracts irrelevant features that later reduce the classification accuracy. Therefore, this work presents a new framework based on the best fusion-assisted deep learning features.

1.3. Major Contributions

The major contributions of this work are as follows:
  • A contrast enhancement technique based on local and global filter information fusion is proposed. The high-boost operation is finally applied to highlight the human region in a video frame.
  • Data augmentation was performed, and two fine-tuned deep learning models (MobilenetV2 and ShuffleNet) were trained using deep transfer learning. Features were extracted from the global average pooling layers instead of the fully connected layers.
  • Features of both streams were fused in a serial-based fashion that minimizes the loss of information, and the best features were then selected using a new approach called ESO-controlled Newton–Raphson.
  • Detailed ablation-study results have been computed and discussed, showing the improvement in accuracy achieved by this work.

1.4. Manuscript Organization

The rest of the manuscript is organized in the following order. The proposed methodology is presented in Section 2, which includes the contrast enhancement technique, deep learning features, proposed feature fusion, and best feature selection. Section 3 presents the experimental results of the proposed methodology. Section 4 concludes the manuscript.

2. Proposed Methodology

The proposed human gait recognition framework is presented in Figure 1. This figure illustrates that the proposed HGR framework consists of several phases: contrast enhancement of original frames, data augmentation, training of the deep learning models, extraction of features, a fusion of both stream features, selection of the most optimal features, and finally, classification. A brief description of each step in the form of mathematics and numerical values is discussed below.

2.1. Novelty 1: Hybrid Fusion Enhancement Technique

Preprocessing is an important step in computer vision that refines the original images into more useful information. In this work, a new fusion-based technique is proposed for contrast enhancement and improvement of frame quality. For this purpose, the HSV color transformation is initially applied, which encodes 24-bit colors by hue, saturation, and value; this transformation organizes the colors in a more practically applicable manner. Consider the CASIA-B dataset, denoted as ẞ, and let φ(i, j) denote the HSV color-transformed image, where R, G, and B denote the red, green, and blue channels with values between 0 and 255. The mathematical formulation is defined as follows:
R′ = R / 255
G′ = G / 255
B′ = B / 255
RangMax = max(R′, G′, B′)
RangMin = min(R′, G′, B′)
Δ = RangMax − RangMin
Here Δ denotes the difference between the maximum and minimum ranges, and R′, G′, and B′ denote the normalized red, green, and blue channels, respectively. Based on the above information, the hue, saturation, and value channels are computed as follows:
H = 60° × (((G′ − B′) / Δ) mod 6), if RangMax = R′;  H = 60° × (((B′ − R′) / Δ) + 2), if RangMax = G′;  H = 60° × (((R′ − G′) / Δ) + 4), if RangMax = B′
S = 0, if RangMax = 0;  S = Δ / RangMax, if RangMax ≠ 0
V = RangMax
After that, an averaging filter is applied to the transformed image; being a linear filter, it reduces ambient noise, refines edges, and corrects uneven lighting. This approach filters the frame by correlation with a suitable filter kernel, so the value of each resultant pixel is determined as a weighted combination of its neighboring pixels. On the input signal it functions as an averaging filter: it processes an input vector of values and determines an average for every value inside the vector. Consider an input image P(u, v), a filtered image P̄(u, v), and averaging-filter weights w(a, b) that are convolved with the input image to produce the smoothed, filtered image. Mathematically, the averaging filter can be written as:
P̄(u, v) = ( Σ_a Σ_b w(a, b) · P(u + a, v + b) ) / ( Σ_a Σ_b w(a, b) )
The high-boost filter is later applied to the resultant image P̄(u, v) to sharpen the edges in a video frame. This operation strengthens the high-frequency components of the image, further increasing the relative importance of the features conveyed by those components. The high-boost filtered image and the final fused result are computed as follows:
P_hb(u, v) = (W − 1) · P̄(u, v) + P̄(u, v) ∗ h_hpf(a, b)
P̃_hb(u, v) = [P_hb(u, v) + φ(i, j)] ⊕ I(u, v)
where W is a boosting factor for adjusting the weights, P̄(u, v) ∗ h_hpf(a, b) denotes the high-pass filtered image obtained with the high-pass kernel h_hpf, φ(i, j) is the HSV-transformed image, I(u, v) is the original frame, ⊕ denotes the fusion operation, and P̃_hb(u, v) is the final fused enhanced image. The visual illustration is shown in Figure 2. Based on these contrast-enhanced outputs, we augmented the entire dataset; the resulting numbers of frames are described in Table 1.
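For illustration, the sketch below (Python with OpenCV and NumPy, both assumed available) shows one way to chain the three operations described above — HSV transformation, averaging filter, and high-boost sharpening — on a single frame. The kernel size, the boost factor W, and the final fusion weights are placeholder choices, since their exact values are not stated here.

import cv2
import numpy as np

def enhance_frame(frame_bgr, kernel_size=3, boost_w=1.5):
    """Sketch of the fusion-based enhancement: HSV transform, averaging
    filter, high-boost sharpening, and a simple additive fusion.
    kernel_size, boost_w, and the 0.5/0.5 fusion weights are illustrative."""
    # HSV color transformation (OpenCV computes H, S, V from the RGB channels)
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)

    # Averaging (mean) filter to suppress ambient noise and uneven lighting
    smoothed = cv2.blur(frame_bgr, (kernel_size, kernel_size)).astype(np.float32)

    # High-boost filtering: (W - 1) * smoothed + high-pass component
    low_pass = cv2.GaussianBlur(smoothed, (kernel_size, kernel_size), 0)
    high_pass = smoothed - low_pass
    high_boost = (boost_w - 1.0) * smoothed + high_pass
    high_boost = np.clip(high_boost, 0, 255).astype(np.uint8)

    # Fuse the sharpened frame with the HSV-transformed frame
    hsv_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
    return cv2.addWeighted(high_boost, 0.5, hsv_bgr, 0.5, 0)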

2.2. Pre-Trained Deep Models

MobilenetV2: MobilenetV2 was introduced by Google in 2018 and is a variation of the Mobilenet model. It is a convolutional neural network that contains 53 deep layers and is based on an inverted residual structure with connections between the bottleneck levels; therefore, we used this model as a backbone network. A visual illustration of this network is shown in Figure 3. The figure shows two types of block: a residual block with a stride of 1 and a downsizing block with a stride of 2. Both types of block contain three layers. The first layer is a 1 × 1 convolution with ReLU6, the second layer is a depth-wise convolution, and the third layer is another 1 × 1 convolution without any non-linearity. It is claimed that if ReLU were used again, the deep network would only have the power of a linear classifier on the non-zero volume part of the output domain. MobilenetV2 is a very effective feature extractor, mostly used for object detection and segmentation, and was pre-trained on the ImageNet dataset of about 1000 object classes.
ShuffleNet: ShuffleNet is a lightweight convolutional neural network (CNN) introduced by Megvii Inc. in 2017. It is a 50-layer deep lightweight CNN with 1.4 million parameters that accepts an RGB input image of size 224 × 224. The model is specially designed for mobile devices and is highly efficient in computation and power consumption; it was evaluated on the ImageNet 2016 classification dataset. It introduces three variants of the shuffle unit, composed of group convolution and channel shuffle. Group convolution uses multiple kernels per layer and generates multiple channel outputs per layer; it can learn more intermediate features and increases the number of channels for the next layer.
Moreover, channel shuffle is an operation that helps information flow across feature channels in a CNN and is used as part of the ShuffleNet architecture. The input and output channels can become fully related if a group convolution is allowed to obtain input data from different groups, so the network can be designed for very limited computing power. Two new operations are used in this architecture, pointwise group convolution and channel shuffle, which reduce the computation cost while maintaining accuracy. The visual description of this architecture is shown in Figure 4.
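As an illustration of the channel shuffle operation described above, the following sketch (PyTorch, an assumption; the experiments in this paper were run in MATLAB) shows the standard reshape–transpose–flatten trick that mixes the channels produced by group convolutions:

import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so information flows between the
    channel groups produced by a group convolution."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by the number of groups"
    # (n, c, h, w) -> (n, groups, c // groups, h, w) -> swap -> flatten back
    x = x.view(n, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example: a batch of 8 feature maps with 12 channels split into 3 groups
shuffled = channel_shuffle(torch.randn(8, 12, 28, 28), groups=3)
print(shuffled.shape)  # torch.Size([8, 12, 28, 28])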

2.3. Deep Transfer Learning based Features Extraction

The deep transfer learning process is employed in this work to train both pre-trained models on the enhanced CASIA-B dataset. For the deep transfer learning process, we first fine-tuned both models: the last three consecutive layers were removed, and new layers were added and connected to the preceding global average pooling layer. Next, the models were trained through deep transfer learning.
Domain. A domain can be represented as:
D = { V, ρ(V) }
which contains two parts: the feature space V and the probability distribution ρ(V), where V = { v | v_i ∈ V, i = 1, …, N } and N is the number of instances in the dataset. The source and target domains are subcategories of transfer learning with the same feature space but different probability distributions.
Task. The task can be represented as:
T = { ℒ, ʄ(·) }
which includes two factors: the label space ℒ and a mapping function ʄ(·), where ℒ = { L | L_i ∈ ℒ, i = 1, …, M } and M is the number of labels for the relevant instances in D. The mapping function ʄ(·), generally written as ʄ(V) = ρ(V | L), is a non-linear and indirect function that fills the gap between the input instances and the projected decision acquired from the selected datasets. Similarly, distinct objectives are specified by the label spaces of these tasks. The visual process of deep transfer learning is shown in Figure 5. This figure illustrates that none of the layers are frozen, and the entire network is trained on the selected enhanced CASIA-B dataset. After training, features are extracted from the selected layer, namely the global average pooling layer. In the first trained model, MobilenetV2, the global average pooling layer is employed and its activations are computed; from this layer, we obtain 1280 features for each image, so the resultant vector is denoted by N × 1280. In the second trained model, ShuffleNet, we extract features from the global average pooling layer and obtain 544 features, so the resultant vector has dimension N × 544. This is summarized in Table 2.
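A minimal sketch of this feature-extraction step is given below, assuming PyTorch and torchvision rather than the MATLAB models used in the paper; the exact fine-tuned head layers and the training loop are omitted, and num_classes is a placeholder for the number of gait classes in the enhanced dataset.

import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder for the number of gait classes

# Pre-trained backbone with a new classification head (no layer is frozen)
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
backbone.classifier = nn.Linear(backbone.last_channel, num_classes)
# ... fine-tune the whole network on the enhanced CASIA-B frames here ...

def extract_gap_features(model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Return global-average-pooled features (N x 1280 for MobilenetV2)
    instead of the fully connected layer's outputs."""
    model.eval()
    with torch.no_grad():
        feature_maps = model.features(images)   # N x 1280 x 7 x 7 for 224 x 224 inputs
        return feature_maps.mean(dim=(2, 3))    # global average pooling -> N x 1280

batch = torch.randn(4, 3, 224, 224)
print(extract_gap_features(backbone, batch).shape)  # torch.Size([4, 1280])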

2.4. Novelty 2: Minimal Serial Features Fusion

The two feature vectors FV1 and FV2 contain N × 1280 and N × 544 features for the MobilenetV2 and ShuffleNet models, respectively. FV3 is a fused feature vector obtained using a serial approach, which returns a feature vector of dimension N × 1824 through the following mathematical formulation:
FV3 = (FV1 ⊕ FV2), of dimension N × (1280 + 544)
where ⊕ denotes serial (column-wise) concatenation. The resultant fused feature vector contains some redundant information; therefore, we implemented a minimization function that removes the redundant features after each iteration. The objective of this function is to minimize the error rate and to reduce the computational time required after the removal of the redundant features. Mathematically, this function is defined as follows:
F = arg min_FV3 (Er)
The working of this process is described in Algorithm 1 below, and the final fused vector is denoted by F̃V.
Algorithm 1: Proposed Feature Fusion.
Input: Feature vectors FV1 and FV2
Step 1: Serially fuse using FV3 = (FV1 ⊕ FV2), of dimension N × (1280 + 544)
Step 2: for i = 1 to sizeof(FV3)
Step 3:   Initialize the static parameters: Er = 0, Iterations = 100
Step 4:   Apply the minimization function F = arg min_FV3 (Er)
          if (F > Er), repeat the above steps
          Stop when Er is near to zero or equal to zero
Output: Final fused feature vector F̃V
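The serial fusion itself is a simple column-wise concatenation of the two streams; the sketch below (NumPy, assumed) shows only this step, since the error measure Er used in the redundancy-minimization loop of Algorithm 1 is not fully specified here.

import numpy as np

def serial_fuse(fv1: np.ndarray, fv2: np.ndarray) -> np.ndarray:
    """Serial (concatenation-based) fusion: N x 1280 and N x 544 -> N x 1824."""
    assert fv1.shape[0] == fv2.shape[0], "both streams must describe the same N frames"
    return np.concatenate([fv1, fv2], axis=1)

# Illustrative shapes only
fused = serial_fuse(np.random.rand(10, 1280), np.random.rand(10, 544))
print(fused.shape)  # (10, 1824)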

2.5. Novelty 3: Proposed ESOcNR Feature Selection

Feature selection has become an important step in machine learning in recent years. Many techniques have been introduced, but they face a few issues, such as discarding important features and selecting extra, irrelevant features; these factors can reduce the accuracy and increase the computational time. This work proposes a new equilibrium state optimization technique controlled by the Newton–Raphson method (ESOcNR) for best feature selection. The proposed technique is based on the original ESO algorithm [28], which uses a mass balance equation to define the concentration of a nonreactive ingredient in a control volume. The mass balance equation describes the mechanics of mass entering, leaving, and being created in a control volume. The universal mass-balance equation is represented by a first-order ordinary differential equation, described as follows:
W dE/dt = R·E_fr − R·E + G
where W represents the control volume, E denotes the concentration, W dE/dt is the rate of mass change in the control volume, and R denotes the volumetric flow rate into and out of the control volume. The variable E_fr is the concentration at the equilibrium state with no generation inside the control volume, and G denotes the mass generation rate within the control volume. A stable equilibrium condition is reached once W dE/dt hits zero. Rearranging the above equation allows dE/dt to be solved as a function of R/W, where R/W reflects the inverse of the residence period, referred to as λ or the turnover rate (λ = R/W). The concentration E is then computed as follows:
dE / (λ·E_fr − λ·E + G/W) = dt
∫_{E_0}^{E} dE / (λ·E_fr − λ·E + G/W) = ∫_{t_0}^{t} dt
E = E_fr + (E_0 − E_fr)·H + (G / (λ·W))·(1 − H)
In the above equation, H is determined as follows:
H = exp[ −λ(t − t_0) ]
where t_0 and E_0 are the initial start time and concentration, respectively, calculated over an integration interval. The above equation can be utilized to determine the concentration in the control volume for a specified turnover rate, among several other things. It can also be employed to compute the average turnover rate by applying a simple linear regression with a predetermined generation rate.
The first term is the equilibrium concentration, which is one of the best candidate solutions chosen randomly from a pool known as the equilibrium pool. The second term is a direct search component concerned primarily with the concentration difference between a particle and the equilibrium state; it acts as an explorer, urging particles to explore the entire region. The third term is associated with the generation rate, which primarily contributes as an exploiter or remedy refiner but can also occasionally function as an explorer. Each term, and how it influences the search pattern, is defined below.
Initialization and Evaluation of Functions: Like several other meta-heuristic algorithms, ESO initiates the optimization process with an initial population. The initial concentrations are determined by the number of particles, using uniform random initialization over the dimensions of the search space, as follows:
E_i^initial = E_min + rand_i · (E_max − E_min),  i = 1, 2, 3, …, n
The i-th particle’s initial concentration vector is indicated by E_i^initial, while the maximum and minimum values for the dimensions are given by E_max and E_min, respectively. The variable n is the population size, and rand_i is a random vector whose entries fall inside the range 0–1. To select the equilibrium candidates, particles are evaluated with the fitness function and then ranked.
Candidates and the Equilibrium Pool (E_eq_pool): The algorithm’s ultimate convergence state is the global optimal equilibrium state. There is no knowledge of the equilibrium state at the outset of the optimization procedure; thus, equilibrium candidates are picked to create a search pattern for the particles. The best four particles found throughout the optimization process are selected, along with one additional particle whose concentration is the arithmetic mean of the four previously mentioned particles. These four candidates aid the optimizer in improving its exploration abilities, whereas the average helps with exploitation. The number of candidates is arbitrary and determined by the nature of the optimization problem.
Selecting fewer than four candidates degrades the method’s performance on multimodal and composition functions, although it improves outcomes on uni-modal functions; conversely, having more than four candidates may have a detrimental effect. Thus, five particles, referred to as equilibrium candidates, are utilized to form the equilibrium pool vector:
E_eq_pool = { E_eq(1), E_eq(2), E_eq(3), E_eq(4), E_eq(ave) }
Each particle’s concentration is updated in each cycle by picking randomly from this pool of candidates with equal probability. For instance, in the first iteration the first particle may update its concentration based on E_eq(1), while in the second iteration it may be updated based on E_eq(ave). Each particle is updated until the optimization process is complete, with approximately the same proportion of updates going to each candidate solution.
Exponential Term (H): The next term contributing to the main concentration update rule is the exponential term H. A proper definition of this term helps the optimizer strike a fair balance between exploration and exploitation. Because the turnover rate in a real control volume fluctuates over time, λ is a random vector with entries in the range 0–1:
H = e^(−λ(t − t_0))
The time t is represented as follows:
t = (1 − Itr/Max_Itr)^(b_2 · Itr/Max_Itr)
where Itr and Max_Itr represent the current and maximum number of iterations, respectively, and b_2 is a constant used to adjust the exploitation capability. To ensure convergence, the search velocity is reduced over time, balancing the algorithm’s exploration and exploitation capabilities. The initial time t_0 is defined as:
t_0 = (1/λ) · ln( −b_1 · sign(s − 0.5) · [1 − e^(−λt)] ) + t
where b_1 is a constant that regulates exploration: the higher the b_1 value, the better the exploration capacity and hence the lower the exploitation efficiency. Similarly, increasing b_2 improves exploitation while decreasing the exploration capability. The component sign(s − 0.5) influences the direction of exploration and exploitation, where s is a random vector with values ranging from 0 to 1. These constants are calculated empirically by evaluating a set of test functions; however, they can be modified as needed for specific circumstances. Substituting t_0 into the expression for H yields:
H = b_1 · sign(s − 0.5) · [ e^(−λt) − 1 ]
Generation Rate (G): The generation rate is one of the most important terms for providing an accurate solution by improving the exploitation phase. The model below expresses the generation rate as a first-order exponential decay process:
G = G_0 · e^(−l(t − t_0))
where G_0 represents the initial value and l represents the decay constant. To obtain a more controlled and systematic search pattern, and to limit the number of random variables, we assume l = λ and reuse the previously computed exponential term. As a result, the final set of generation rate equations is as follows:
G = G_0 · H
H = e^(−λ(t − t_0))
G_0 = GCP · (E_eq − λ·E)
GCP = 0.5·s_1, if s_2 ≥ GP;  GCP = 0, if s_2 < GP
The generation rate control parameter (GCP) represents the potential contribution of the generation term to the updating process, while the generation probability (GP) determines the likelihood of this contribution by defining how many particles use the generation term to update their states. With GP = 0.5, a reasonable balance between exploration and exploitation may be reached. Finally, the EO update rule is defined as follows:
E = E_eq + (E − E_eq)·H + (G / (λ·W))·(1 − H)
The first term reflects the equilibrium concentration, whereas the second and third terms describe variations in concentration. The second term is in charge of searching the entire space for the best position, while the third term contributes to exploitation by making the solution more exact once a promising spot is reached. Depending on factors such as the particle concentrations, the equilibrium candidates, and the turnover rate λ, the second and third terms may have the same or opposite signs: the same sign promotes diversity, which helps with global search, while opposite signs reduce variation, which helps with local search. Finally, a memory-saving mechanism helps each particle keep track of its positions in space and the corresponding fitness values. Each particle’s fitness value in the current iteration is compared with the one from the previous iteration and is overwritten if a better solution is attained. This process improves exploitation capacity but increases the likelihood of becoming trapped in local minima if the approach does not benefit from global exploration capability.
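A compact sketch of the update rule defined above is shown below (NumPy, assumed). The defaults b1 = 2, GP = 0.5, and W = 1 follow commonly cited equilibrium-optimizer settings rather than values reported in this paper, so treat the code as illustrative only.

import numpy as np

rng = np.random.default_rng(0)

def eo_update(E, E_eq, t, b1=2.0, gp=0.5, W=1.0):
    """One equilibrium-optimization concentration update for a single particle,
    using the exponential term H and generation rate G defined above."""
    lam = rng.random(E.shape)                              # turnover rate in [0, 1)
    s = rng.random(E.shape)
    H = b1 * np.sign(s - 0.5) * (np.exp(-lam * t) - 1)     # exponential term
    s1, s2 = rng.random(), rng.random()
    gcp = 0.5 * s1 if s2 >= gp else 0.0                    # generation rate control parameter
    G = gcp * (E_eq - lam * E) * H                         # G = G_0 * H
    return E_eq + (E - E_eq) * H + (G / (lam * W)) * (1 - H)

# t shrinks with the iteration count, e.g. t = (1 - itr/max_itr) ** (b2 * itr/max_itr)
E, E_eq = rng.random(5), rng.random(5)
print(eo_update(E, E_eq, t=0.8))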
Newton–Raphson-based Final Selection: The features from each iteration of ESO are passed to a Newton–Raphson-based function [29] that computes a resultant value; this value determines when the iterations are stopped. The main purpose of this function is to find the threshold value quickly, which reduces the computational time and improves the performance. Mathematically, this process is defined as follows:
f_1 = f_0 − h(f_0) / h′(f_0)
f_(n+1) = f_n − h(f_n) / h′(f_n)
where f_(n+1) ∈ E is the selected feature vector after each iteration. This vector is updated after each iteration, and once the value of f_(n+1) becomes constant, the process stops and returns the best feature vector. The finally selected features are classified using machine learning classifiers.
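The Newton–Raphson step can be sketched as a generic root-finding loop that stops once successive values become effectively constant; the objective h, its derivative, the starting point, and the tolerance below are placeholders, since they are not spelled out here.

def newton_raphson_threshold(h, h_prime, f0, tol=1e-6, max_iter=100):
    """Newton-Raphson iteration f_{n+1} = f_n - h(f_n)/h'(f_n), stopping when
    the value becomes (nearly) constant, as in the selection rule above."""
    f = f0
    for _ in range(max_iter):
        f_next = f - h(f) / h_prime(f)
        if abs(f_next - f) < tol:   # value has become effectively constant
            return f_next
        f = f_next
    return f

# Toy example: root of h(f) = f**2 - 2 starting from f0 = 1.0
print(newton_raphson_threshold(lambda f: f**2 - 2, lambda f: 2 * f, 1.0))  # ~1.4142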

3. Results and Analysis

Dataset and Performance Measures: The proposed HGR framework was evaluated using the CASIA-B dataset; a detailed description of the dataset is given in Table 1. Several classifiers were used to compute the classification accuracies, namely fine tree, medium tree, linear SVM, quadratic SVM, weighted KNN, coarse KNN, bagged trees, subspace discriminant, bi-layered neural network, and tri-layered neural network. The performance of each classifier is computed using the recall rate, precision rate, accuracy, and time (seconds).
Experimental Setup: We divided the entire dataset 50:50 for training and testing, and 10-fold cross-validation was chosen for the testing process. Moreover, several hyperparameters were utilized in training the deep learning models: a learning rate of 0.0001, 100 epochs, a momentum value of 0.7, a mini-batch size of 32, and stochastic gradient descent as the optimizer. The entire framework was simulated in MATLAB R2022a on a personal computer with a Core i7 CPU, 16 GB of RAM, and an 8 GB graphics card.
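For reference, the reported training hyperparameters can be collected into a small configuration; the sketch below expresses them as a PyTorch-style SGD optimizer (an assumption — the experiments were run in MATLAB), with the model and data pipeline omitted.

import torch

# Hyperparameters reported above
hyperparams = {"learning_rate": 1e-4, "epochs": 100, "momentum": 0.7, "mini_batch_size": 32}

def make_optimizer(model: torch.nn.Module) -> torch.optim.SGD:
    """Stochastic gradient descent with the reported learning rate and momentum."""
    return torch.optim.SGD(model.parameters(),
                           lr=hyperparams["learning_rate"],
                           momentum=hyperparams["momentum"])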

3.1. Results Analysis

Table 3 presents the results of angle 0 of the CASIA-B dataset using the proposed framework. The results of the fusion and optimization methods are presented in this table. Several classifiers have been employed for the classification results. First, the fusion method obtained the highest accuracy of 97.3% on Quadratic SVM, whereas the recall rate is 97.33% and the precision rate was 97.37%. Computational time was also noted, and it was observed that the fusion process consumes 572.19 s for this classifier. However, the minimum noted time for the fusion process was 69.438 s on the medium tree classifier, whereas the maximum reported time was 4420.7 s on the tri-layered neural network classifier. Second, the optimization results have been presented and obtained the maximum accuracy of 97.2% on Quadratic SVM. The recall rate of this classifier was 97.23%, and the precision rate was 97.3%. The computation time of this step was also noted, and Quadratic SVM executes in 45.259 s. However, the minimum noted time for the optimization process is 38.502 s on the medium tree classifier, whereas the maximum reported time was 2049.1 s on the tri-layered neural network classifier. This shows that the accuracy of the optimization process was almost consistent, but the execution time was significantly reduced, which thus shows the strength of the proposed framework.
Table 4 presents the results of the fusion and optimization methods on angle 18 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 98.6% on Quadratic SVM, whereas the recall rate was 98.57% and the precision rate was 98.57%. Computational time is also noted, and it was observed that the fusion process consumes 859.46 s for this classifier. However, the minimum noted time for the fusion process was 174.72 s on the fine tree classifier, whereas the maximum reported time was 2049.8 s on the subspace discriminant classifier. Second, the optimization obtained a maximum accuracy of 98.0% on Quadratic SVM. The recall rate of this classifier was 97.93%, and the precision rate was 98%. The computation time of this step was also noted, and Quadratic SVM executes in 42.512 s. However, the minimum noted time for the optimization process was 34.35 s on the medium tree classifier, whereas the maximum reported time is 1573.6 s on the bagged trees classifier. This shows that the accuracy of the optimization process was almost consistent, but the execution time was significantly reduced.
Table 5 presents the results of angle 36 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 97.7% on Quadratic SVM, whereas the recall rate was 97.67% and the precision rate was 97.63%. Computational time was also noted, and it was observed that the fusion process consumes 1014.7 s for this classifier. However, the minimum noted time for the fusion process was 95.118 s on the medium tree classifier, whereas the maximum reported time was 5675.8 s on the bagged trees classifier. Second, the optimization obtained the maximum accuracy of 97.2% on Quadratic SVM. The recall rate of this classifier was 97.23%, and the precision rate was 97.23%. The computation time of this step was also noted, and Quadratic SVM executes in 53.864 s. However, the minimum noted time for the optimization process was 23.989 s on the medium tree classifier, whereas the maximum reported time was 1827.5 s on the bagged trees classifier. The results in this table show consistent accuracy, but the time was significantly reduced, which is the main strength of this step.
Table 6 discusses the results of angle 54 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 96.5% on Quadratic SVM, whereas the recall rate was 96.53% and the precision rate was 96.53%. The minimum noted computational time for the fusion process was 45.349 s on the medium tree classifier, whereas the maximum reported time was 4780.4 s on the bagged trees classifier. Second, the optimization obtained the maximum accuracy of 96.2% on Quadratic SVM. The recall rate of this classifier was 96.27%, and the precision rate was 96.27%. After this process, the computational time was significantly reduced, but not much change occurred in the accuracy.
Table 7 presents the results of angle 72 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 92.8% on coarse KNN, whereas the recall rate was 82.87% and the precision rate was 82.97%. Computational time was also noted; the minimum noted time was 232.76 s on the medium tree classifier, whereas the maximum reported time was 3645.4 s on the tri-layered neural network classifier. Second, the optimization obtained the maximum accuracy of 92.9% on coarse KNN. The recall rate of this classifier was 83%, and the precision rate was 82.9%. The minimum computation time of this step was 72.293 s on the medium tree classifier, whereas the maximum reported time was 2918 s on the tri-layered neural network classifier. Overall, this step reduced the computational time and remained consistent with the classification accuracy.
Table 8 shows the results of angle 90 of the CASIA-B dataset using the proposed framework. Results are presented for both the fusion and optimization steps. First, the fusion method obtained the highest accuracy of 93.7% on Quadratic SVM, whereas the recall rate was 93.73% and the precision rate was 94.03%. The minimum noted computational time on the medium tree classifier was 281.13 s, whereas the maximum reported time was 4960.7 s on the bagged trees classifier. Second, the optimization obtained the maximum accuracy of 93.1% on Quadratic SVM. The recall rate of this classifier was 93.17%, and the precision rate was 93.53%. The minimum recorded time for the optimization process was 78.005 s on the medium tree classifier, whereas the maximum reported time was 1280.4 s on the weighted KNN classifier. These facts show that the accuracy was not changed too much, but a significant reduction was noted in computational time, which is the strength of the proposed optimization algorithm.
Table 9 presents the results of angle 108 of the CASIA-B dataset using the proposed framework. In this table, the fusion process obtained the highest accuracy of 94.7% on Quadratic SVM, whereas the recall rate was 94.57% and the precision rate was 94.63%. Second, the optimization process obtained the maximum accuracy of 94.2% on Quadratic SVM. Computational time was noted for both experiments, and the minimum noted time for the fusion process was 83.087 s on the medium tree classifier. In contrast, the minimum noted time for the optimization process was 39.314 s on the medium tree classifier. This shows the significant improvement in the optimization process’s computational time, which is the strength of this step.
Similarly, Table 10 presents the results of angle 126 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 91.2% on Quadratic SVM, whereas the recall rate was 91.13% and the precision rate was 91.17%. Computational time was also computed, and the minimum noted time for the fusion process was 346.5 s on the medium tree classifier. Second, the optimization results were obtained, with a maximum accuracy of 90.4% on Quadratic SVM. The minimum computational time of this step was 25.116 s on the fine tree classifier. Hence, it is clearly observed that the proposed framework improved the accuracy and reduced the computational time.
Table 11 presents the results of angle 144 of the CASIA-B dataset using the proposed framework. In this table, the fusion method obtained the highest accuracy of 92.4% on Quadratic SVM, whereas the recall rate was 92.34% and the precision rate was 92.6%. The minimum computational time for this experiment was 126.29 s on the fine tree classifier. Second, the optimization results were obtained, with a maximum accuracy of 91.9% on Quadratic SVM. The recall rate of this classifier was 91.9%, and the precision rate was 92.17%. Compared to the fusion process, the computation time of this step was significantly reduced, to 96.952 s on the fine tree classifier.
Table 12 presents the results of angle 162 of the CASIA-B dataset; the maximum accuracy for the fusion method was 96.5% on Quadratic SVM, with a recall rate of 96.47% and a precision rate of 96.57%. Computational time was also noted, and it was observed that the minimum reported time was 32.703 s on the medium tree classifier. Second, the optimization obtained the maximum accuracy of 96.3% on Quadratic SVM. The recall rate of this classifier was 96.23%, and the precision rate was 96.4%. The computation time of this step was 68.363 s on the fine tree classifier, which is significantly lower than that of the fusion process.
Table 13 presents the results of angle 180 of the CASIA-B dataset using the proposed framework. First, the fusion method obtained the highest accuracy of 99.9% on Quadratic SVM, whereas the recall rate was 99.83% and the precision rate was 99.87%. Computational time was also noted, and it was observed that the fusion process consumes 240.29 s for this classifier. However, the minimum noted time for the fusion process was 88.322 s on the medium tree classifier, whereas the maximum reported time was 2476.1 s on the bagged trees classifier. Second, the optimization obtained the maximum accuracy of 99.8% on Quadratic SVM. The recall rate of this classifier was 99.8%, and the precision rate was 99.77%. The computation time of this step was also noted, and Quadratic SVM executes in 113.25 s. However, the minimum noted time for the optimization process was 28.024 s on the bi-layered neural network classifier, whereas the maximum reported time was 168.63 s on the weighted KNN classifier. This shows that the accuracy of the optimization process was almost consistent, but the execution time was significantly reduced.

3.2. Comparative Analysis

A detailed comparison of the proposed framework is provided in this section, based on the performance of the intermediate steps and on individual ESO-based feature selection. Figure 6 shows the analysis of the intermediate steps of the proposed framework. This figure illustrates that the accuracy of the ShuffleNet and MobilenetV2 deep-model features alone is insufficient, with both models performing better for only a few angles. The fusion process improves the accuracy but increases the computational time, as presented in Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13. Therefore, a new technique named ESOcNR is proposed. Using this technique, the number of features is significantly reduced, with only a minor drop in accuracy (Table 3, Table 4, Table 5, Table 6, Table 7, Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13). Figure 7 illustrates the comparison between ESO-based feature selection and ESOcNR-based feature selection; it shows that the accuracy was improved after employing the proposed selection method. For example, for angle 0, a 3% change occurred, and for the other angles, an almost 3–4% change is reported after employing the proposed ESOcNR. Table 14 shows the results of the proposed feature selection technique for the CASIA-B dataset on all 11 angles; the QSVM classifier shows the most improved accuracy for most angles. Table 15 presents the comparison of the proposed framework with state-of-the-art techniques for each angle of the CASIA-B dataset. The proposed framework performed better on angles 0, 18, 36, 54, 144, 162, and 180; however, the performance on the other angles (72, 90, 108, and 126) was not improved, which will be addressed in future work. Hence, overall, the proposed framework shows improved accuracy.

4. Conclusions

A new framework is proposed in this work based on fusion-assisted deep learning features and the ESOcNR feature selection technique. The proposed framework consists of a few important sequential steps: contrast enhancement of video frames, deep learning feature extraction from the selected models, the proposed minimal serial fusion approach, and ESOcNR-based feature selection. Results are computed on the enhanced CASIA-B dataset using all 11 angles and show improvements in accuracy on 7 of these 11 angles. Based on the results and comparative analysis, we conclude the following points:
  • Training the deep learning models on the enhanced dataset extracted more useful features that later improved the accuracy.
  • The proposed fusion approach improved the accuracy but increased the computation time.
  • The original ESO-based feature selection approach selected some redundant features that reduced the classification accuracy.
  • Selection of the best features using the proposed ESOcNR maintains the classification accuracy and reduces the computational time of the fusion process.
In the future, the weights of the deep learning models can be optimized using meta-heuristic feature selection techniques. Moreover, the angles 72, 90, 108, and 126 should be analyzed further to attempt to improve their accuracy.

Author Contributions

Conceptualization, F.J., M.A.K. and M.A.; methodology, F.J., M.A.K., M.A. and A.A.; software, F.J., M.A.K., M.A. and S.A.; validation, S.A., M.S. and A.A.H.; formal analysis, A.A.H. and J.-h.C.; investigation, J.-h.C., M.A.K., A.A.H. and M.S.; resources, J.-h.C., M.S. and A.A.H.; data curation, A.A. and S.A.; writing—original draft preparation, F.J., M.A.K. and M.A.; writing—review and editing, A.A., S.A. and M.S.; visualization, A.A.H. and J.-h.C.; supervision, M.A.K. and J.-h.C.; project administration, A.A. and S.A.; funding acquisition, J.-h.C. and M.A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “Human Resources Program in Energy Technology” of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), granted financial resources from the Ministry of Trade, Industry & Energy, Republic of Korea (No. 20204010600090). This study is also supported via funding from Prince Sattam bin Abdulaziz University, project number (PSAU/2023/R/1444).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this work is publicly available.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jain, A.K.; Ross, A.; Prabhakar, S. An Introduction to Biometric Recognition; IEEE: New York, NY, USA, 2004; pp. 4–20.
  2. Jain, A.K.; Ross, A.; Pankanti, S. Biometrics: A tool for information security. IEEE Trans. Inf. Forensics Secur. 2006, 1, 125–143.
  3. Liu, L.; Wang, H.; Li, H.; Liu, J.; Qiu, S.; Zhao, H.; Guo, X. Ambulatory human gait phase detection using wearable inertial sensors and hidden Markov model. Sensors 2021, 21, 1347.
  4. Deligianni, F.; Guo, Y.; Yang, G.-Z. From emotions to mood disorders: A survey on gait analysis methodology. IEEE J. Biomed. Health Inform. 2019, 23, 2302–2316.
  5. Liao, R.; Li, Z.; Bhattacharyya, S.S.; York, G. PoseMapGait: A model-based gait recognition method with pose estimation maps and graph convolutional networks. Neurocomputing 2022, 501, 514–528.
  6. Gafurov, D. A survey of biometric gait recognition: Approaches, security and challenges. In Proceedings of the Annual Norwegian Computer Science Conference, Bergen, Norway, 20–24 August 2007; pp. 19–21.
  7. Khan, M.A.; Kadry, S.; Parwekar, P.; Damaševičius, R.; Mehmood, A.; Khan, J.A.; Naqvi, S.R. Human gait analysis for osteoarthritis prediction: A framework of deep learning and kernel extreme learning machine. Complex Intell. Syst. 2021, 1–19.
  8. Bijalwan, V.; Semwal, V.B.; Mandal, T. Fusion of multi-sensor-based biomechanical gait analysis using vision and wearable sensor. IEEE Sens. J. 2021, 21, 14213–14220.
  9. Bayat, N.; Rastegari, E.; Li, Q. Human Gait Recognition Using Bag of Words Feature Representation Method. arXiv 2022, arXiv:2203.13317.
  10. Derlatka, M.; Borowska, M. Ensemble of heterogeneous base classifiers for human gait recognition. Sensors 2023, 23, 508.
  11. Turk, M.A.; Pentland, A.P. Face recognition using eigenfaces. In Proceedings of the 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991; IEEE Computer Society: Washington, DC, USA, 1991; pp. 586–591.
  12. Kim, D.; Paik, J. Gait recognition using active shape model and motion prediction. IET Comput. Vision 2010, 4, 25–36.
  13. Wang, L.; Li, Y.; Xiong, F.; Zhang, W. Gait Recognition Using Optical Motion Capture: A Decision Fusion Based Method. Sensors 2021, 21, 3496.
  14. Liao, R.; An, W.; Li, Z.; Bhattacharyya, S.S. A novel view synthesis approach based on view space covering for gait recognition. Neurocomputing 2021, 453, 13–25.
  15. Slemenšek, J.; Fister, I.; Geršak, J.; Bratina, B.; van Midden, V.M.; Pirtošek, Z.; Šafarič, R. Human Gait Activity Recognition Machine Learning Methods. Sensors 2023, 23, 745.
  16. Pinčić, D.; Sušanj, D.; Lenac, K. Gait Recognition with Self-Supervised Learning of Gait Features Based on Vision Transformers. Sensors 2022, 22, 7140.
  17. Wan, C.; Wang, L.; Phoha, V.V. A survey on gait recognition. ACM Comput. Surv. (CSUR) 2018, 51, 1–35.
  18. Xu, C.; Makihara, Y.; Li, X.; Yagi, Y. Occlusion-aware Human Mesh Model-based Gait Recognition. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1309–1321.
  19. Zhu, H.; Zheng, Z.; Nevatia, R. Gait Recognition Using 3-D Human Body Shape Inference. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 2–7 January 2023; pp. 909–918.
  20. Shi, L.-F.; Liu, Z.-Y.; Zhou, K.-J.; Shi, Y.; Jing, X. Novel Deep Learning Network for Gait Recognition Using Multimodal Inertial Sensors. Sensors 2023, 23, 849.
  21. Kececi, A.; Yildirak, A.; Ozyazici, K.; Ayluctarhan, G.; Agbulut, O.; Zincir, I. Implementation of machine learning algorithms for gait recognition. Eng. Sci. Technol. Int. J. 2020, 23, 931–937.
  22. Dou, H.; Zhang, W.; Zhang, P.; Zhao, Y.; Li, S.; Qin, Z.; Wu, F.; Dong, L.; Li, X. VersatileGait: A large-scale synthetic gait dataset with fine-Grained Attributes and complicated scenarios. arXiv 2021, arXiv:2101.01394.
  23. Zhu, Z.; Guo, X.; Yang, T.; Huang, J.; Deng, J.; Huang, G.; Du, D.; Lu, J.; Zhou, J. Gait recognition in the wild: A benchmark. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 14789–14799.
  24. Huang, C.; Zhang, F.; Xu, Z.; Wei, J. The Diverse Gait Dataset: Gait segmentation using inertial sensors for pedestrian localization with different genders, heights and walking speeds. Sensors 2022, 22, 1678.
  25. Hasan, M.A.M.; Al Abir, F.; Al Siam, M.; Shin, J. Gait recognition with wearable sensors using modified residual block-based lightweight cnn. IEEE Access 2022, 10, 42577–42588.
  26. An, W.; Yu, S.; Makihara, Y.; Wu, X.; Xu, C.; Yu, Y.; Liao, R.; Yagi, Y. Performance evaluation of model-based gait on multi-view very large population database with pose sequences. IEEE Trans. Biom. Behav. Identity Sci. 2020, 2, 421–430.
  27. Tian, Y.; Wei, L.; Lu, S.; Huang, T. Free-view gait recognition. PLoS ONE 2019, 14, e0214389.
  28. Faramarzi, A.; Heidarinejad, M.; Stephens, B.; Mirjalili, S. Equilibrium optimizer: A novel optimization algorithm. Knowl. Based Syst. 2020, 191, 105190.
  29. Gnetchejo, P.J.; Essiane, S.N.; Dadjé, A.; Ele, P. A combination of Newton-Raphson method and heuristics algorithms for parameter estimation in photovoltaic modules. Heliyon 2021, 7, e06673.
  30. Khan, A.; Khan, M.A.; Javed, M.Y.; Alhaisoni, M.; Tariq, U.; Kadry, S.; Choi, J.-I.; Nam, Y. Human gait recognition using deep learning and improved ant colony optimization. Comput. Mater. Contin. 2022, 70, 2113–2130.
  31. Asif, M.; Tiwana, M.I.; Khan, U.S.; Ahmad, M.W.; Qureshi, W.S.; Iqbal, J. Human gait recognition subject to different covariate factors in a multi-view environment. Results Eng. 2022, 15, 100556.
  32. Wang, L.; Zhang, X.; Han, R.; Yang, J.; Li, X.; Feng, W.; Wang, S. A Benchmark of Video-Based Clothes-Changing Person Re-Identification. arXiv 2022, arXiv:2211.11165.
  33. Rao, P.S.; Sahu, G.; Parida, P.; Patnaik, S. An Adaptive Firefly Optimization Algorithm for Human Gait Recognition. In Smart and Sustainable Technologies: Rural and Tribal Development Using IoT and Cloud Computing: Proceedings of ICSST 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 305–316.
  34. Mehmood, A.; Khan, M.A.; Sharif, M.; Khan, S.A.; Shaheen, M.; Saba, T.; Riaz, N.; Ashraf, I. Prosperous human gait recognition: An end-to-end system based on pre-trained CNN features selection. Multimed. Tools Appl. 2020, 1–21.
  35. Anusha, R.; Jaidhar, C. Clothing invariant human gait recognition using modified local optimal oriented pattern binary descriptor. Multimed. Tools Appl. 2020, 79, 2873–2896.
Figure 1. Proposed framework of HGR using two-stream fusion-assisted deep learning and optimal features selection.
Figure 2. Proposed fusion of filters for contrast enhancement of moving subject.
Figure 3. Original bottleneck architecture of MobilenetV2.
Figure 4. Original architecture of ShuffleNet deep model.
Figure 5. Visual illustration of deep feature extraction using deep transfer learning.
Figure 6. Visual analysis of intermediate steps of the proposed framework.
Figure 7. Comparison of proposed ESOcNR feature selection algorithm with original ESO based selection.
Table 1. Description of the extracted dataset (CASIA-B) on all selected angles.
Angle | Class | No. of Images (Before Augmentation) | No. of Images (After Augmentation)
0 | Bag | 2347 | 9388
0 | Coat | 2436 | 9744
0 | Normal | 2191 | 8764
18 | Bag | 2457 | 9828
18 | Coat | 2428 | 9712
18 | Normal | 2550 | 10,200
36 | Bag | 2459 | 9836
36 | Coat | 2530 | 10,120
36 | Normal | 2285 | 9140
54 | Bag | 2578 | 10,312
54 | Coat | 2661 | 10,644
54 | Normal | 2433 | 9732
72 | Bag | 2531 | 10,124
72 | Coat | 2582 | 10,328
72 | Normal | 2471 | 9884
90 | Bag | 2512 | 10,048
90 | Coat | 2717 | 10,868
90 | Normal | 2320 | 9280
108 | Bag | 2363 | 9452
108 | Coat | 2753 | 11,012
108 | Normal | 2647 | 10,588
126 | Bag | 2472 | 9888
126 | Coat | 2301 | 9204
126 | Normal | 2418 | 9672
144 | Bag | 2436 | 9744
144 | Coat | 2452 | 9808
144 | Normal | 2438 | 9752
162 | Bag | 2466 | 9864
162 | Coat | 2528 | 10,112
162 | Normal | 2422 | 9688
180 | Bag | 2423 | 9692
180 | Coat | 2674 | 10,696
180 | Normal | 2625 | 10,500
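Across every angle and class in Table 1, the post-augmentation count is exactly four times the original count, i.e., each extracted frame contributes three additional augmented copies. The table does not state which operations were applied, so the snippet below is only a minimal sketch of such a 4× expansion, with a horizontal flip and two small in-plane rotations as assumed (illustrative) transforms.

```python
# Minimal 4x augmentation sketch. The exact operations are not listed in Table 1;
# the mirror and the +/-5 degree rotations below are illustrative assumptions.
from pathlib import Path
from PIL import Image, ImageOps

def augment_frame(src_path: Path, dst_dir: Path) -> None:
    """Write the original frame plus three augmented copies (4x total)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(src_path).convert("RGB")
    variants = {
        "orig": img,                       # keep the original frame
        "flip": ImageOps.mirror(img),      # mirror the walking direction
        "rot_p5": img.rotate(5),           # slight in-plane rotation
        "rot_m5": img.rotate(-5),          # slight in-plane rotation (opposite)
    }
    for tag, out in variants.items():
        out.save(dst_dir / f"{src_path.stem}_{tag}.png")
```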
Table 2. Description of deep features for the selected deep models.
Model | Layer | Input | Output | Feature Vector
Fine-tuned MobilenetV2 | 154 | 224 × 224 × 3 | 1280 | N × 1280
Fine-tuned ShuffleNet | 172 | 224 × 224 × 3 | 544 | N × 544
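Table 2 implies that each sample is described by a 1280-dimensional MobilenetV2 vector and a 544-dimensional ShuffleNet vector; fusing the two streams serially (column-wise concatenation) therefore yields an N × 1824 matrix. The snippet below is only a NumPy sketch of that concatenation step, where random arrays stand in for the exported global-average-pooling features; it is not the actual extraction code.

```python
# Serial (column-wise) feature fusion sketch based on the dimensions in Table 2.
# Random arrays stand in for the per-sample features exported from the two
# fine-tuned backbones; only the shapes matter here.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 9388                                            # e.g. angle 0, "Bag" class after augmentation

mobilenet_feats = rng.standard_normal((n_samples, 1280))    # N x 1280 (fine-tuned MobilenetV2)
shufflenet_feats = rng.standard_normal((n_samples, 544))    # N x 544  (fine-tuned ShuffleNet)

# Serial fusion = horizontal concatenation, giving one N x 1824 feature matrix
fused = np.concatenate([mobilenet_feats, shufflenet_feats], axis=1)
assert fused.shape == (n_samples, 1280 + 544)
```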
Table 3. Classification results of HGR using the proposed framework on angle 0 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine Tree | Fusion | 94.8 | 94.83 | 94.8 | 154.08
Fine Tree | Optimization | 94.63 | 94.63 | 94.6 | 61.748
Medium Tree | Fusion | 95.17 | 95.2 | 95.1 | 69.438
Medium Tree | Optimization | 93 | 93.2 | 93.0 | 38.502
Linear SVM | Fusion | 97.3 | 97.3 | 97.2 | 466.93
Linear SVM | Optimization | 97.17 | 97.2 | 97.1 | 192.66
Quadratic SVM | Fusion | 97.33 | 97.37 | 97.3 | 572.19
Quadratic SVM | Optimization | 97.23 | 97.3 | 97.2 | 45.259
Coarse KNN | Fusion | 96.17 | 96.23 | 96.1 | 1290.8
Coarse KNN | Optimization | 96.37 | 96.4 | 96.3 | 585.42
Weighted KNN | Fusion | 96.27 | 96.27 | 96.2 | 1347.6
Weighted KNN | Optimization | 96.1 | 96.1 | 96.0 | 656.35
Bagged Trees | Fusion | 95.6 | 95.6 | 95.5 | 3765.9
Bagged Trees | Optimization | 95.33 | 95.3 | 95.2 | 1422.3
Subspace Discriminant | Fusion | 96.97 | 97 | 96.9 | 2999.2
Subspace Discriminant | Optimization | 96.6 | 96.67 | 96.6 | 575.46
Bilayered Neural Network | Fusion | 96.07 | 96.07 | 96.0 | 4155.3
Bilayered Neural Network | Optimization | 95.8 | 95.83 | 95.8 | 2032.2
Trilayered Neural Network | Fusion | 96.3 | 96.27 | 96.2 | 4420.7
Trilayered Neural Network | Optimization | 95.7 | 95.6 | 95.6 | 2049.1
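Tables 3–13 report recall, precision, accuracy, and computation time for ten classifiers, each evaluated once on the fused feature set and once on the optimized (selected) subset. Purely as a point of reference, the sketch below shows how such per-angle metrics could be computed with scikit-learn; the hold-out split, the linear-kernel SVC, and the macro averaging are assumptions, not the validation protocol or toolbox used in the original experiments.

```python
# Illustrative evaluation sketch for the metrics reported in Tables 3-13.
# X is a fused (or selected) feature matrix for one angle, y holds the class
# labels (bag / coat / normal). The 70/30 split and LinearSVM-style classifier
# are assumptions used only to show how the columns could be computed.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = SVC(kernel="linear").fit(X_tr, y_tr)     # stand-in for "Linear SVM"
    y_hat = clf.predict(X_te)
    return {
        "recall_%": 100 * recall_score(y_te, y_hat, average="macro"),
        "precision_%": 100 * precision_score(y_te, y_hat, average="macro"),
        "accuracy_%": 100 * accuracy_score(y_te, y_hat),
    }

# Example call (hypothetical inputs): metrics = evaluate(fused_features, labels)
```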
Table 4. Classification results of HGR using the proposed framework on angle 18 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 92.97 | 93 | 92.9 | 174.72
Fine tree | Optimization | 92.03 | 92.07 | 92.0 | 48.364
Medium tree | Fusion | 91.17 | 91.37 | 91.2 | 505.02
Medium tree | Optimization | 89.37 | 89.57 | 89.4 | 34.35
Linear SVM | Fusion | 98.4 | 98.37 | 98.4 | 498.59
Linear SVM | Optimization | 97.77 | 97.77 | 97.7 | 204.978
Quadratic SVM | Fusion | 98.57 | 98.57 | 98.6 | 859.46
Quadratic SVM | Optimization | 97.93 | 98 | 98.0 | 42.512
Coarse KNN | Fusion | 94.93 | 95.23 | 95.0 | 873.62
Coarse KNN | Optimization | 95.4 | 95.5 | 95.4 | 691.91
Weighted KNN | Fusion | 96.93 | 97.07 | 96.9 | 695.55
Weighted KNN | Optimization | 96.87 | 97 | 96.9 | 668.99
Bagged trees | Fusion | 96 | 96.03 | 96.0 | 2899.7
Bagged trees | Optimization | 95.27 | 95.27 | 95.3 | 1573.6
Subspace discriminant | Fusion | 97.57 | 97.52 | 97.5 | 2049.8
Subspace discriminant | Optimization | 96.5 | 96.57 | 96.5 | 621.48
Bilayered neural network | Fusion | 98.23 | 98.20 | 98.2 | 726.05
Bilayered neural network | Optimization | 97.07 | 97.07 | 97.1 | 839.48
Trilayered neural network | Fusion | 98.27 | 98.27 | 98.3 | 600.06
Trilayered neural network | Optimization | 97.17 | 97.2 | 97.2 | 1216.1
Table 5. Classification results of HGR using the proposed framework on angle 36 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 90.17 | 90.23 | 90.2 | 175.99
Fine tree | Optimization | 89.57 | 89.57 | 89.6 | 30.051
Medium tree | Fusion | 87.4 | 87.33 | 87.4 | 95.118
Medium tree | Optimization | 86.8 | 86.83 | 86.8 | 23.989
Linear SVM | Fusion | 97.43 | 97.37 | 97.4 | 850.09
Linear SVM | Optimization | 96.83 | 96.8 | 96.8 | 259.84
Quadratic SVM | Fusion | 97.67 | 97.63 | 97.7 | 1014.7
Quadratic SVM | Optimization | 97.23 | 97.23 | 97.2 | 53.864
Coarse KNN | Fusion | 94.3 | 94.23 | 94.3 | 2152.5
Coarse KNN | Optimization | 93.8 | 93.73 | 93.8 | 699.39
Weighted KNN | Fusion | 95.67 | 95.57 | 95.6 | 2562.8
Weighted KNN | Optimization | 95.13 | 95.07 | 95.1 | 671.14
Bagged trees | Fusion | 95.17 | 95.13 | 95.1 | 5675.8
Bagged trees | Optimization | 93.7 | 93.7 | 93.7 | 1827.5
Subspace discriminant | Fusion | 96.6 | 96.57 | 96.6 | 3608.6
Subspace discriminant | Optimization | 95.07 | 95.03 | 95.1 | 805.07
Bilayered neural network | Fusion | 97.17 | 97.17 | 97.2 | 2644.7
Bilayered neural network | Optimization | 96.07 | 96.07 | 96.1 | 1054
Trilayered neural network | Fusion | 97.1 | 97.1 | 97.1 | 3075.3
Trilayered neural network | Optimization | 95.67 | 95.7 | 95.7 | 1131.5
Table 6. Classification results of HGR using the proposed framework on angle 54 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 90 | 90 | 90.0 | 126.75
Fine tree | Optimization | 89.53 | 89.6 | 89.5 | 108.46
Medium tree | Fusion | 87 | 87 | 86.9 | 45.349
Medium tree | Optimization | 86.2 | 86.03 | 86.0 | 32.89
Linear SVM | Fusion | 96.17 | 96.13 | 96.1 | 835.99
Linear SVM | Optimization | 95.77 | 95.8 | 95.7 | 290.16
Quadratic SVM | Fusion | 96.53 | 96.53 | 96.5 | 1020
Quadratic SVM | Optimization | 96.27 | 96.27 | 96.2 | 320.93
Coarse KNN | Fusion | 93.23 | 93.47 | 93.2 | 1881.1
Coarse KNN | Optimization | 93.37 | 93.57 | 93.4 | 652.75
Weighted KNN | Fusion | 95.13 | 95.13 | 95.1 | 1889.2
Weighted KNN | Optimization | 95 | 95.03 | 95.0 | 913.86
Bagged trees | Fusion | 94.3 | 94.37 | 94.3 | 4780.4
Bagged trees | Optimization | 93.63 | 93.67 | 93.6 | 1510.8
Subspace discriminant | Fusion | 95.67 | 95.67 | 95.7 | 3385
Subspace discriminant | Optimization | 94.67 | 94.7 | 94.7 | 863.8
Bilayered neural network | Fusion | 96.13 | 96.13 | 96.2 | 2521
Bilayered neural network | Optimization | 95.33 | 95.33 | 95.3 | 923.33
Trilayered neural network | Fusion | 96 | 96 | 96.0 | 2438.8
Trilayered neural network | Optimization | 95.03 | 95 | 95.0 | 1013.3
Table 7. Classification results of HGR using the proposed framework on angle 72 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 8.87 | 80.97 | 90.8 | 300.21
Fine tree | Optimization | 80.47 | 80.53 | 80.4 | 107.03
Medium tree | Fusion | 75.7 | 75.7 | 75.6 | 232.76
Medium tree | Optimization | 76.3 | 78.57 | 76.1 | 72.293
Linear SVM | Fusion | 86.93 | 87.17 | 86.9 | 1263.8
Linear SVM | Optimization | 86 | 86.17 | 85.9 | 666.27
Quadratic SVM | Fusion | 87.6 | 87.57 | 87.5 | 1588.8
Quadratic SVM | Optimization | 86.93 | 86.87 | 86.8 | 204.15
Coarse KNN | Fusion | 82.87 | 82.97 | 92.8 | 1324.9
Coarse KNN | Optimization | 83 | 82.9 | 92.9 | 791.35
Weighted KNN | Fusion | 85.8 | 85.77 | 85.7 | 1369.1
Weighted KNN | Optimization | 85.17 | 85.17 | 85.1 | 925.01
Bagged trees | Fusion | 85.03 | 85.07 | 85.0 | 2694.2
Bagged trees | Optimization | 84.87 | 84.87 | 84.7 | 2772.8
Subspace discriminant | Fusion | 86.67 | 86.97 | 86.6 | 2415.2
Subspace discriminant | Optimization | 85.37 | 85.53 | 85.3 | 1544.1
Bilayered neural network | Fusion | 85.57 | 85.57 | 85.0 | 3259.8
Bilayered neural network | Optimization | 84.1 | 84.1 | 84.0 | 2850.5
Trilayered neural network | Fusion | 85.5 | 85.57 | 85.4 | 3645.4
Trilayered neural network | Optimization | 84.43 | 84.47 | 84.4 | 2918
Table 8. Classification results of HGR using the proposed framework on angle 90 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 88 | 88.43 | 88.0 | 1773.7
Fine tree | Optimization | 87.8 | 88.43 | 87.8 | 104.61
Medium tree | Fusion | 85.37 | 86.5 | 85.4 | 281.13
Medium tree | Optimization | 83.5 | 84.57 | 83.3 | 78.005
Linear SVM | Fusion | 92.73 | 93.33 | 92.7 | 1871.9
Linear SVM | Optimization | 92.4 | 92.9 | 92.4 | 735.43
Quadratic SVM | Fusion | 93.73 | 94.03 | 93.7 | 1865.3
Quadratic SVM | Optimization | 93.17 | 93.53 | 93.1 | 737.25
Coarse KNN | Fusion | 89.57 | 90.33 | 89.4 | 2268.8
Coarse KNN | Optimization | 89.33 | 90.13 | 89.2 | 553.79
Weighted KNN | Fusion | 92.4 | 92.57 | 92.4 | 2467.2
Weighted KNN | Optimization | 92.07 | 92.13 | 91.9 | 1280.4
Bagged trees | Fusion | 91.67 | 91.79 | 91.7 | 4960.7
Bagged trees | Optimization | 91.5 | 91.73 | 91.5 | 1041.7
Subspace discriminant | Fusion | 92.47 | 93.33 | 92.5 | 4581.6
Subspace discriminant | Optimization | 90.93 | 92.07 | 91.0 | 557.85
Bilayered neural network | Fusion | 92.67 | 92.67 | 92.6 | 3507.9
Bilayered neural network | Optimization | 91.5 | 91.53 | 91.4 | 808.15
Trilayered neural network | Fusion | 92.9 | 92.93 | 92.8 | 4274.3
Trilayered neural network | Optimization | 91.47 | 91.47 | 91.4 | 1225
Table 9. Classification results of HGR using the proposed framework on angle 108 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 89.1 | 89.1 | 89.2 | 195.8
Fine tree | Optimization | 88.03 | 88.1 | 88.2 | 60.125
Medium tree | Fusion | 83.8 | 84.5 | 84.3 | 83.087
Medium tree | Optimization | 82.5 | 85 | 83.2 | 39.314
Linear SVM | Fusion | 94.3 | 94.33 | 94.4 | 1126.1
Linear SVM | Optimization | 93.6 | 94.13 | 93.7 | 276.45
Quadratic SVM | Fusion | 94.57 | 94.63 | 94.7 | 1382
Quadratic SVM | Optimization | 94.1 | 94.13 | 94.2 | 268.99
Coarse KNN | Fusion | 89.93 | 89.9 | 90.0 | 1970
Coarse KNN | Optimization | 90 | 89.97 | 90.1 | 418.15
Weighted KNN | Fusion | 92.57 | 95.6 | 92.7 | 2148.8
Weighted KNN | Optimization | 92.23 | 92.27 | 92.3 | 1020.5
Bagged trees | Fusion | 93.3 | 93.33 | 93.4 | 4736.1
Bagged trees | Optimization | 92.8 | 92.8 | 92.9 | 1467.7
Subspace discriminant | Fusion | 93.87 | 93.97 | 94.0 | 3994.2
Subspace discriminant | Optimization | 92.8 | 92.97 | 93.0 | 426.27
Bilayered neural network | Fusion | 93.87 | 93.9 | 93.9 | 3528.1
Bilayered neural network | Optimization | 92.6 | 92.6 | 92.7 | 1079
Trilayered neural network | Fusion | 93.63 | 93.67 | 93.7 | 3639.6
Trilayered neural network | Optimization | 92.53 | 92.57 | 92.6 | 1315.2
Table 10. Classification results of HGR using the proposed framework on angle 126 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 84.87 | 85 | 84.8 | 468.07
Fine tree | Optimization | 83.9 | 83.93 | 83.9 | 25.116
Medium tree | Fusion | 83.03 | 85.07 | 82.8 | 346.5
Medium tree | Optimization | 81.9 | 84.03 | 81.7 | 65.492
Linear SVM | Fusion | 90.87 | 90.9 | 90.9 | 1243.8
Linear SVM | Optimization | 89.73 | 89.7 | 89.7 | 254.83
Quadratic SVM | Fusion | 91.13 | 91.17 | 91.2 | 1450.2
Quadratic SVM | Optimization | 90.33 | 90.37 | 90.4 | 338.84
Coarse KNN | Fusion | 87.27 | 87.63 | 87.3 | 1743.2
Coarse KNN | Optimization | 87.83 | 88 | 87.9 | 359.53
Weighted KNN | Fusion | 89.3 | 89.37 | 89.3 | 1733
Weighted KNN | Optimization | 88.37 | 88.43 | 88.4 | 399.24
Bagged trees | Fusion | 89 | 89.07 | 89.1 | 3541.7
Bagged trees | Optimization | 88.5 | 88.5 | 88.6 | 1202.6
Subspace discriminant | Fusion | 90.4 | 90.47 | 90.4 | 2676.8
Subspace discriminant | Optimization | 88.9 | 89.17 | 88.9 | 705.52
Bilayered neural network | Fusion | 90.23 | 90.23 | 90.3 | 2448.2
Bilayered neural network | Optimization | 88.4 | 88.4 | 88.4 | 1268.9
Trilayered neural network | Fusion | 90.23 | 90.2 | 90.3 | 3238.2
Trilayered neural network | Optimization | 88.5 | 88.5 | 88.6 | 1229.1
Table 11. Classification results of HGR using the proposed framework on angle 144 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 87.43 | 87.47 | 87.4 | 126.29
Fine tree | Optimization | 86.5 | 86.6 | 86.5 | 96.952
Medium tree | Fusion | 84.33 | 85.1 | 84.3 | 987.03
Medium tree | Optimization | 84.1 | 84.83 | 84.1 | 173.73
Linear SVM | Fusion | 92.1 | 92.43 | 92.1 | 2679.4
Linear SVM | Optimization | 91.27 | 91.73 | 91.3 | 296.31
Quadratic SVM | Fusion | 92.37 | 92.6 | 92.4 | 3202.6
Quadratic SVM | Optimization | 91.9 | 92.17 | 91.9 | 327.22
Coarse KNN | Fusion | 87.93 | 88.9 | 87.9 | 3544.2
Coarse KNN | Optimization | 88.47 | 89.37 | 88.5 | 485.17
Weighted KNN | Fusion | 90.47 | 90.6 | 90.5 | 3531.9
Weighted KNN | Optimization | 90.43 | 90.53 | 90.4 | 505.14
Bagged trees | Fusion | 90.77 | 90.8 | 90.8 | 7531.4
Bagged trees | Optimization | 90.43 | 90.37 | 90.3 | 1321.3
Subspace discriminant | Fusion | 91.4 | 91.63 | 91.4 | 1477.2
Subspace discriminant | Optimization | 90.43 | 90.8 | 90.4 | 676.45
Bilayered neural network | Fusion | 91.2 | 91.2 | 91.2 | 10934
Bilayered neural network | Optimization | 90.17 | 90.2 | 90.2 | 1393.7
Trilayered neural network | Fusion | 91.27 | 91.27 | 91.3 | 1674.1
Trilayered neural network | Optimization | 89.73 | 89.7 | 89.7 | 1460.7
Table 12. Classification results of HGR using the proposed framework on angle 162 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 92.27 | 92.3 | 92.3 | 52.63
Fine tree | Optimization | 92.33 | 92.53 | 92.4 | 68.363
Medium tree | Fusion | 90.1 | 90.4 | 90.1 | 32.703
Medium tree | Optimization | 89.27 | 89.93 | 89.4 | 143.29
Linear SVM | Fusion | 96.23 | 96.4 | 96.2 | 342.89
Linear SVM | Optimization | 96 | 96.13 | 96.0 | 55.835
Quadratic SVM | Fusion | 96.47 | 96.57 | 96.5 | 432.39
Quadratic SVM | Optimization | 96.23 | 96.4 | 96.3 | 240.25
Coarse KNN | Fusion | 93.8 | 94.37 | 93.8 | 882.19
Coarse KNN | Optimization | 93.9 | 94.4 | 93.9 | 353.4
Weighted KNN | Fusion | 95.5 | 95.57 | 95.5 | 807.78
Weighted KNN | Optimization | 95.43 | 95.53 | 95.4 | 384.97
Bagged trees | Fusion | 95.47 | 95.5 | 95.5 | 126.7
Bagged trees | Optimization | 94.8 | 94.8 | 94.8 | 1053.6
Subspace discriminant | Fusion | 96 | 96.07 | 96.0 | 1254.6
Subspace discriminant | Optimization | 95.2 | 95.37 | 95.2 | 553.05
Bilayered neural network | Fusion | 96.03 | 96.03 | 96.0 | 3389.06
Bilayered neural network | Optimization | 95.37 | 95.37 | 95.4 | 798.62
Trilayered neural network | Fusion | 96.1 | 96.1 | 96.1 | 281.9
Trilayered neural network | Optimization | 95.37 | 95.33 | 95.4 | 848.98
Table 13. Classification results of HGR using the proposed framework on angle 180 of the CASIA-B dataset.
Classifier | Features | Recall (%) | Precision (%) | Accuracy (%) | Time (s)
Fine tree | Fusion | 97.5 | 97.57 | 97.5 | 140.6
Fine tree | Optimization | 97.2 | 97.17 | 97.2 | 106.4
Medium tree | Fusion | 96.7 | 96.7 | 96.7 | 88.322
Medium tree | Optimization | 95.57 | 95.57 | 95.6 | 46.451
Linear SVM | Fusion | 99.87 | 99.87 | 99.9 | 243.19
Linear SVM | Optimization | 99.73 | 99.73 | 99.8 | 20.143
Quadratic SVM | Fusion | 99.83 | 99.87 | 99.9 | 240.29
Quadratic SVM | Optimization | 99.8 | 99.77 | 99.8 | 113.25
Coarse KNN | Fusion | 99.33 | 99.37 | 99.4 | 1815.3
Coarse KNN | Optimization | 99.07 | 99.03 | 99.1 | 164.76
Weighted KNN | Fusion | 99.73 | 99.73 | 99.7 | 1883.6
Weighted KNN | Optimization | 99.6 | 99.57 | 99.6 | 168.63
Bagged trees | Fusion | 99.07 | 99.1 | 99.1 | 2476.1
Bagged trees | Optimization | 98.87 | 98.87 | 98.9 | 338.02
Subspace discriminant | Fusion | 99.63 | 99.63 | 99.6 | 1925.6
Subspace discriminant | Optimization | 99.47 | 99.47 | 99.5 | 112.38
Bilayered neural network | Fusion | 99.77 | 99.77 | 99.8 | 237.92
Bilayered neural network | Optimization | 99.7 | 99.7 | 99.7 | 28.024
Trilayered neural network | Fusion | 99.8 | 99.83 | 99.8 | 289.07
Trilayered neural network | Optimization | 99.67 | 99.7 | 99.7 | 34.53
Table 14. Overall accuracy (%) obtained with the proposed feature selection on the CASIA-B dataset for all 11 angles.
Classifier | 0 | 18 | 36 | 54 | 72 | 90 | 108 | 126 | 144 | 162 | 180
Fine tree | 94.6 | 92.0 | 89.6 | 89.5 | 80.4 | 87.8 | 88.2 | 83.9 | 86.5 | 92.4 | 97.2
Medium tree | 93.0 | 89.4 | 86.8 | 86.0 | 76.1 | 83.3 | 83.2 | 81.7 | 84.1 | 89.4 | 95.6
Linear SVM | 97.1 | 97.7 | 96.8 | 95.7 | 85.9 | 92.4 | 93.7 | 89.7 | 91.3 | 96.0 | 99.8
Quadratic SVM | 97.2 | 98.0 | 97.2 | 96.2 | 86.8 | 93.1 | 94.2 | 90.4 | 91.9 | 96.3 | 99.8
Coarse KNN | 96.3 | 95.4 | 93.8 | 93.4 | 92.9 | 89.2 | 90.1 | 87.9 | 88.5 | 93.9 | 99.1
Weighted KNN | 96.0 | 96.9 | 95.1 | 95.0 | 85.1 | 91.9 | 92.3 | 88.4 | 90.4 | 95.4 | 99.6
Bagged trees | 95.2 | 95.3 | 93.7 | 93.6 | 84.7 | 91.5 | 92.9 | 88.6 | 90.3 | 94.8 | 98.9
Subspace discriminant | 96.6 | 96.5 | 95.1 | 94.7 | 85.3 | 91.0 | 93.0 | 88.9 | 90.4 | 95.2 | 99.5
Bilayered NN | 95.8 | 97.1 | 96.1 | 95.3 | 84.0 | 91.4 | 92.7 | 88.4 | 90.2 | 95.4 | 99.7
Trilayered NN | 95.6 | 97.2 | 95.7 | 95.0 | 84.4 | 91.4 | 92.6 | 88.6 | 89.7 | 95.4 | 99.7
Table 15. Comparison of the proposed framework's accuracy (%) with state-of-the-art techniques across all angles.
Method | 0 | 18 | 36 | 54 | 72 | 90 | 108 | 126 | 144 | 162 | 180
[30] 2022 | 95.2 | 93.9 | - | - | - | - | - | - | - | - | 98.2
[31] 2022 | 97.0 | 97.9 | - | - | 97.2 | - | - | - | - | - | 96.0
[32] 2022 | 92.1 | 96.1 | - | 95.7 | - | 93 | - | - | - | 94.87 | 91.33
[33] 2022 | - | - | - | - | 98.3 | - | - | - | - | 94.90 | 98.6
[34] 2020 | - | 94.3 | 93.8 | 94.7 | - | - | - | - | - | - | -
[35] 2019 | 98.8 | 95.6 | 96.3 | 91.9 | 94.0 | 95.2 | 94.6 | 95.4 | 90.4 | 93.00 | 95.1
PROPOSED | 97.2 | 98.0 | 97.2 | 96.2 | 92.8 | 93.1 | 94.2 | 90.4 | 91.9 | 96.3 | 99.8
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
