A Novel Hybrid Gradient-Based Optimizer and Grey Wolf Optimizer Feature Selection Method for Human Activity Recognition Using Smartphone Sensors

Human activity recognition (HAR) plays a vital role in various real-world applications, such as tracking elderly activities for elderly care services, assisted living environments, smart home interactions, healthcare monitoring, electronic games, and various human–computer interaction (HCI) applications, and it is an essential part of Internet of Healthcare Things (IoHT) services. However, the high dimensionality of the data collected in these applications strongly affects the quality of the HAR model. Therefore, in this paper, we propose an efficient HAR system that uses a lightweight feature selection (FS) method to enhance the HAR classification process. The developed FS method, called GBOGWO, aims to improve the performance of the gradient-based optimizer (GBO) algorithm by using the operators of the grey wolf optimizer (GWO). First, GBOGWO is used to select the appropriate features; then, a support vector machine (SVM) is used to classify the activities. To assess the performance of GBOGWO, extensive experiments using the well-known UCI-HAR and WISDM datasets were conducted. The overall outcomes show that GBOGWO improved the classification accuracy, with an average accuracy of 98%.


Introduction
The widespread use of mobile and smart devices has increased the demand for various smart home and Internet of Things (IoT) applications [1]. One of the most important applications is the Internet of Medical Things (IoMT) [2]/Internet of Healthcare Things (IoHT) [3], in which a real-time tracking, detection, and surveillance system is required for monitoring people's daily activities for medical diagnostics, healthy lifestyle purposes, or assisted living environments [4]. In many cases, such a system uses the sensor data of a mobile device, such as a smartphone [5]. To this end, human activity recognition (HAR) is a necessary application for the IoHT, and it plays an essential role in medical care applications [6].
In previous decades, different techniques have been used for HAR, such as computer vision methods [7][8][9] that use cameras to track human motion and actions, and wearable devices that must be carried by users, such as wearable sensors [10], smartwatches [11], and smartphones [12,13]. Additionally, there are other techniques, such as environment-installed sensors [14] and WiFi signals, which include three approaches, namely received signal strength [15], channel state information [16], and WiFi radar (micro-Doppler radar) [17]. Each of these techniques has its advantages and disadvantages. For instance, computer vision methods need good lighting conditions, and they raise significant concerns in terms of people's privacy [18]. Wireless methods do not require additional installation, but they are still at an early stage and require substantial improvement. Using carried sensors, such as smartphones, is preferred because virtually everyone uses a smartphone today, so it is easy to collect data and to track different motions and activities.
With the developments in context-aware and machine learning techniques, researchers have applied different methods for HAR using data collected from smartphones. Smartphones have gained significant popularity for HAR for three reasons. The first is the ubiquitous nature of these small devices, which are used by almost everyone. The second is the reliability and efficiency of the procured data, and the third is that fewer privacy restrictions apply compared to computer vision methods. Therefore, in recent years, a number of studies have been proposed using different artificial intelligence (AI) techniques, such as [19][20][21].
In general, feature selection (FS) plays a vital role in improving classification accuracy and reducing computation costs. Nature-inspired algorithms such as ant colony optimization [22], particle swarm optimization [23], artificial bee colony [24], firefly algorithm [25], artificial ecosystem-based optimization [26], marine predators algorithm [27], Harris hawks optimizer [28], grey wolf optimizer [29], polar bear optimization [30] and red fox optimization [31], not to mention many others [32], are applicable and robust algorithms for finding a subset of prominent features while removing the non-informative features.
Especially in HAR, FS methods are popular techniques that help in obtaining high accuracy rates [33,34]. However, there are some limitations that can affect the performance of FS methods. For example, obtaining high accuracy rates can only be achieved with the correct features since some features do not provide improvements to the classification accuracy. In addition, FS methods are prone to a large number of features (i.e., high dimensionality), which can result in a high computational cost. Thus, to overcome these limitations and challenges, an efficient FS method should fulfill certain criteria such as being light and fast and able to extract relevant features, lower the feature space dimension, and reduce computation costs in terms of time and resources.
Hybrid algorithms are important for increasing the feature selection capability. Hybridization aims to benefit from each underlying optimization method to create a hybrid algorithm while minimizing any significant drawbacks. Such hybridization can often enhance the performance of various systems on complex tasks [35][36][37].
In our study, we propose a new FS method to improve the HAR system using the hybridization of two algorithms, namely the gradient-based optimizer (GBO) and the grey wolf optimizer (GWO). The GBO is a novel metaheuristic (MH) algorithm proposed by Ahmadianfar et al. [38]. The GBO was inspired by Newton's gradient-based method, and it has two operators, namely the gradient search rule and the local escape operator. Moreover, the GBO uses a set of vectors to explore the search space. To our knowledge, this is the first study to apply the GBO to feature selection. Meanwhile, the GWO algorithm is a swarm intelligence and MH algorithm inspired by the hunting mechanisms and leadership hierarchies of grey wolves [39]. The GWO has four types of grey wolves, called alpha, beta, delta, and omega, which are applied to emulate leadership hierarchies. Furthermore, the GWO has three hunting steps, called searching for, encircling, and attacking prey. In recent years, the GWO has been adopted to solve various optimization tasks, including feature selection [40][41][42].

Contribution
The main contribution of the current study is to provide an efficient HAR system using smartphone sensors. The proposed system uses advanced AI techniques to overcome the complexity and limitations of traditional methods. We investigated the application of MH optimization methods to select the best features that enhance the performance of the proposed HAR system. The GBO and GWO have proven their performance in the literature, but their individual applications suffer from certain limitations, such as becoming stuck in local optima and slow convergence. Thus, the combination of GBO and GWO provides a more robust method that balances the exploration and exploitation stages, allowing the combined method to overcome the local optima problem. In addition, the proposed GBOGWO feeds the selected features into the well-known support vector machine (SVM) classifier, which is applied to classify human activities. Furthermore, extensive experimental evaluations were carried out to evaluate the proposed HAR system's performance using a public dataset called UCI-HAR [43] and to verify its significant performance in extensive comparisons with existing HAR methods. We applied several performance measures, and we found that the proposed GBOGWO achieved better results when compared to several existing methods. Additionally, we used the WISDM dataset to further verify the performance of the GBOGWO method.
The rest of the current study is structured as follows. Related works are highlighted in Section 2. The preliminaries of the applied methods are described in Section 3. The proposed GBOGWO system is described in Section 4. Evaluation experiments are studied in Section 5. Finally, we conclude this study in Section 6.

Related Work
In this section, we only focus on the recent related works of HAR using smartphones. For other HAR techniques, the readers can refer to the survey studies [10,18,44].
Ronao and Cho [45] proposed a deep convolutional neural network (CNN) for tracking human activities using smartphone sensors. They used the UCI-HAR dataset [43], which is also used in this paper to test the performance of our proposed method. Their method achieved an average accuracy of 94.79%. Ahmed et al. [34] proposed a hybrid FS method to improve HAR using smartphones. They applied both wrapper and filter FS methods using a sequential floating forward search approach to extract features and then fed these features to a multiclass support vector machine classifier. The proposed approach showed robust performance and achieved significant classification results. Chen et al. [46] applied an ensemble extreme learning machine method for HAR using smartphone datasets. They applied Gaussian random projection to generate the input weights of the extreme learning machine, which improves the performance of the ensemble learning. Additionally, they tested the proposed method with two datasets and obtained high accuracy rates on both. Wang et al. [21] proposed an HAR system using deep learning. They proposed an FS method using a CNN to extract local features. After that, they employed several machine learning and deep learning classifiers to recognize several activities from two benchmark datasets. Zhang et al. [47] proposed an HAR model, called HMM-DNN, which uses a deep neural network (DNN) to model the hidden Markov model (HMM). The main idea of this hybrid model is to enhance the performance of the HMM by using the DNN to learn suitable features from the training datasets and improve the classification process. Cao et al. [48] proposed a group-based context-aware HAR method, called GCHAR. They used a hierarchical group-based approach to enhance the classification accuracy and reduce errors. The GCHAR uses two hierarchical classification structures, inner- and inter-group, which are used to detect transitions between activity groups.
Wang et al. [49] proposed an HAR model using a new feature selection method combining both filter and wrapper methods. Moreover, they studied the use of different built-in sensors of smartphones and their impacts on HAR. Sansano et al. [50] compared several deep learning models, including CNN, long short-term memory (LSTM), bidirectional LSTM (biLSTM), deep belief networks (DBN), and gated recurrent unit (GRU) networks, for human activity recognition using different benchmark datasets. They found that the CNN methods achieved the best results. Xia et al. [51] proposed a hybrid HAR model that combines CNN and LSTM. The hybrid model aims to automatically extract features of the targeted activities and classify these activities using a small number of parameters. They evaluated the proposed model using different datasets, including the UCI-HAR dataset, on which it achieved an average accuracy of 95.78%. Moreover, a few studies have used swarm intelligence in the HAR field. For example, Elsts et al. [52] proposed an efficient HAR system using the multi-objective particle swarm optimization (PSO) algorithm. The PSO was applied to select the appropriate features, which also reduces computation time. They used a random forest (RF) to classify several activities. The results confirmed that the PSO improved the classification accuracy and reduced the computational cost. Abdel-Basset et al. [6] proposed a new HAR system, called ST-DeepHAR, which uses an attention mechanism to improve long short-term memory (LSTM). Two public datasets were utilized to evaluate the performance of ST-DeepHAR, which showed significant performance.

Material and Methods
In this section, we describe the datasets used in our experiments. Furthermore, we present the preliminaries of gradient-based optimization (GBO) and grey wolf optimization.

UCI-HAR Dataset
Anguita et al. [43] published a public dataset for activities of daily living. Thirty participating subjects were asked to follow a protocol for performing six activities using a waist-mounted smartphone, namely walking (WK), walking upstairs (WU), walking downstairs (WD), sitting (ST), standing (SD), and lying down (LD). A sampling rate of 50 Hz was used to collect the tri-axial linear acceleration and angular velocity from the smartphone's accelerometer and gyroscope sensors. Each participant performed a sequence of activities in order; hence, the raw signals of all activities were registered in one text file per participant. Owing to the noise in the collected signals, they were filtered using a low-pass filter with a corner frequency of 20 Hz. Then, the body acceleration was separated from the gravity acceleration component in order to better extract representative features. After that, additional time- and frequency-domain signals were generated from the filtered body/gravity tri-axial signals, such as the jerk (time derivative), the signal magnitude using the Euclidean norm, and the fast Fourier transform (FFT). A total of 17 signals were obtained per subject. Time-domain signals were segmented using fixed-width sliding windows of a length of 2.56 s with 50% overlap, and an equivalent rate was applied to the FFT signals. Thus, each window contained approximately 128 data points of activity; this segmentation rate was chosen to match the pace of the activities of typical users, as justified in [43]. After that, many useful functions were applied to the filtered and segmented signals in order to extract features, including the mean, standard deviation, signal magnitude area, entropy, energy, autoregressive coefficients, and the angle between vectors. As a result, each activity window is represented by a 561-dimensional feature vector.
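The preprocessing pipeline described above (noise filtering, gravity separation, and fixed-width windowing) can be sketched as follows. This is a minimal illustration, not the dataset authors' implementation: the Butterworth filter order and the 0.3 Hz gravity cutoff are assumptions for the sketch, while the 50 Hz rate, the 20 Hz corner, and the 128-sample windows with 50% overlap follow the description above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50          # sampling rate (Hz), as in the UCI-HAR protocol
NOISE_CUT = 20   # noise low-pass corner (Hz), as described above
GRAV_CUT = 0.3   # assumed corner (Hz) for gravity/body separation

def lowpass(x, cutoff, fs, order=3):
    """Zero-phase Butterworth low-pass filter."""
    b, a = butter(order, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, x)

def preprocess(acc):
    """Split a raw 1-D acceleration signal into body and gravity parts."""
    denoised = lowpass(acc, NOISE_CUT, FS)
    gravity = lowpass(denoised, GRAV_CUT, FS)
    body = denoised - gravity
    return body, gravity

def windows(x, length=128, overlap=0.5):
    """Fixed-width sliding windows (2.56 s at 50 Hz with 50% overlap)."""
    step = int(length * (1 - overlap))
    return np.array([x[i:i + length]
                     for i in range(0, len(x) - length + 1, step)])
```

The feature functions named in the text (mean, standard deviation, entropy, and so on) would then be applied to each row returned by `windows`.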
The authors also published separate files for the training and testing data, where 70% of the data samples were randomly selected for training and the remaining 30% form the independent set for testing. The number of examples per activity for training and testing is indicated in Table 1. The percentage of each activity in this dataset indicates a roughly balanced dataset. Hence, it is well suited for designing and testing different HAR classification and recognition models from an applicability point of view.

Gradient-Based Optimization (GBO)
Within this section, we introduce the basic concept of a new metaheuristic technique named GBO. In general, GBO simulates the gradient-based Newton's approach. The GBO depends on two operators to update the solutions, and each one of them has its own task. The first operator is the gradient search rule (GSR) which is used to improve the exploration, while the second operator is the local escaping operator (LEO), which is used to enhance the exploitation ability.
The first process in GBO is to construct a population X of N solutions, randomly generated using the following equation:

x_i = x_min + rand(0, 1) × (x_max − x_min),   (1)

where x_min and x_max are the limits of the search space and rand(0, 1) denotes a random number in [0, 1]. Then, the fitness value of each solution is computed, and the best solution x_b is determined. Thereafter, the gradient search rule (GSR) and the direction of movement (DM) are applied to update each solution x_i^It (i = 1, 2, ..., N) in the direction (x_b − x_i^It). This updating process is achieved by computing three new solutions x1_i^It, x2_i^It, and x3_i^It, where the first is

x1_i^It = x_i^It − randn × ρ1 × (2Δx × x_i^It) / (yp_i^It − yq_i^It + ε) + rand × ρ2 × (x_b − x_i^It).   (2)

In Equation (2), randn denotes a normally distributed random number, ε is a small constant, and ρ1 is applied to improve the balance between exploitation and exploration during the optimization process; it is defined as

ρ1 = 2 × rand × α − α,   (3)

α = |β × sin(3π/2 + sin(3πβ/2))|,   β = β_min + (β_max − β_min) × (1 − (It / Max_It)^3)^2,   (4)

where β_min = 0.2 and β_max = 1.2, It denotes the current iteration, and Max_It is the total number of iterations. ρ2 is formulated in the same way as ρ1 in Equation (3). The locations yp_i and yq_i are updated using Equations (5) and (6):

yp_i = rand × ((z_{i+1} + x_i^It) / 2 + rand × Δx),   (5)

yq_i = rand × ((z_{i+1} + x_i^It) / 2 − rand × Δx),   (6)

with

z_{i+1} = x_i^It − randn × (2Δx × x_i^It) / (x_worst − x_b + ε),   (7)

Δx = rand(1:N) × |step|,   step = ((x_b − x_{r1}^It) + δ) / 2,   δ = 2 × rand × |(x_{r1}^It + x_{r2}^It + x_{r3}^It + x_{r4}^It) / 4 − x_i^It|,   (8)

where rand(1:N) is a random vector with N dimensions, and r1, r2, r3, and r4 refer to distinct random integers selected from [1, N]. The second and third solutions are

x2_i^It = x_b − randn × ρ1 × (2Δx × x_i^It) / (yp_i^It − yq_i^It + ε) + rand × ρ2 × (x_{r1}^It − x_{r2}^It),   (9)

x3_i^It = x_i^It − ρ1 × (x2_i^It − x1_i^It).   (10)

Finally, based on the positions x1_i^It, x2_i^It, and x3_i^It, a new solution at iteration It + 1 is obtained as

x_i^{It+1} = r_a × (r_b × x1_i^It + (1 − r_b) × x2_i^It) + (1 − r_a) × x3_i^It,

where r_a and r_b denote two random numbers. Moreover, the local escaping operator (LEO) is applied to improve the exploitation ability of GBO. This is achieved by updating the solution x_i^It according to the probability pr using the following equation:

X_LEO = x_i^{It+1} + f_1 × (u_1 × x_b − u_2 × x_k^It) + f_2 × ρ1 × (u_3 × (x2_i^It − x1_i^It) + u_2 × (x_{r1}^It − x_{r2}^It)) / 2, if rand < 0.5;
X_LEO = x_b + f_1 × (u_1 × x_b − u_2 × x_k^It) + f_2 × ρ1 × (u_3 × (x2_i^It − x1_i^It) + u_2 × (x_{r1}^It − x_{r2}^It)) / 2, otherwise.   (11)

In Equation (11), f_1 ∈ [−1, 1] and f_2 denote a uniform random number and a normal random number, respectively. u_1, u_2, and u_3 are three random numbers defined as

u_1 = L_1 × 2 × rand + (1 − L_1),   u_2 = L_1 × rand + (1 − L_1),   u_3 = L_1 × rand + (1 − L_1),   (12)

where L_1 represents a binary variable (i.e., assigned to 0 or 1). The solution x_k^It is obtained as

x_k^It = L_2 × x_p^It + (1 − L_2) × x_rand,   (13)

where L_2 is a binary variable similar to L_1, x_p^It refers to a randomly selected solution from X, and x_rand denotes a random solution generated using Equation (1). Therefore, the overall position update of GBO is

x_i^{It+1} = X_LEO if rand < pr; otherwise, x_i^{It+1} is the combination of x1_i^It, x2_i^It, and x3_i^It given above.   (14)
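As a rough illustration, a single GSR/DM position update for one solution might look as follows in Python. This is a simplified sketch of the update rules described above, not the authors' implementation; the helper name `gbo_step` and the exact placement of the random draws are assumptions.

```python
import numpy as np

def gbo_step(X, i, best, worst, it, max_it, eps=1e-8, rng=np.random):
    """One GSR/DM position update for solution i of population X (sketch)."""
    N, D = X.shape
    # adaptive coefficients rho1 and rho2
    beta = 0.2 + (1.2 - 0.2) * (1 - (it / max_it) ** 3) ** 2
    alpha = abs(beta * np.sin(3 * np.pi / 2 + np.sin(3 * np.pi * beta / 2)))
    rho1 = 2 * rng.rand() * alpha - alpha
    rho2 = 2 * rng.rand() * alpha - alpha
    # four distinct random peers r1..r4 (all different from i)
    r1, r2, r3, r4 = rng.choice([k for k in range(N) if k != i], 4, replace=False)
    # step size built from the best solution and the four peers
    delta = 2 * rng.rand(D) * np.abs((X[r1] + X[r2] + X[r3] + X[r4]) / 4 - X[i])
    step = ((best - X[r1]) + delta) / 2
    dx = rng.rand(D) * np.abs(step)
    # intermediate positions z, yp, yq
    z = X[i] - rng.randn() * 2 * dx * X[i] / (worst - best + eps)
    yp = rng.rand() * ((z + X[i]) / 2 + rng.rand() * dx)
    yq = rng.rand() * ((z + X[i]) / 2 - rng.rand() * dx)
    gsr = rng.randn() * rho1 * 2 * dx * X[i] / (yp - yq + eps)
    # candidate solutions x1, x2, x3 and their random combination
    x1 = X[i] - gsr + rng.rand() * rho2 * (best - X[i])
    x2 = best - gsr + rng.rand() * rho2 * (X[r1] - X[r2])
    x3 = X[i] - rho1 * (x2 - x1)
    ra, rb = rng.rand(), rng.rand()
    return ra * (rb * x1 + (1 - rb) * x2) + (1 - ra) * x3
```

In the full algorithm, this candidate would then be passed through the LEO step with probability pr before the fitness comparison.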
The main steps of the GBO algorithm are presented in Algorithm 1.

Algorithm 1 The Gradient-Based Optimizer (GBO)
1: Initialize the parameters of GBO: ε, pr, Max_It (maximum number of iterations), and N (population size)
2: Randomly initialize the population of N vectors using Equation (1)
3: Evaluate the position of each vector using the fitness function fit
4: Determine the best and worst solutions x_best and x_worst
5: Set It = 1
6: while It ≤ Max_It do
7:   for each vector x_i^It do
8:     Choose four distinct integers r1, r2, r3, and r4 at random from the range [1, N] such that r1 ≠ r2 ≠ r3 ≠ r4
9:     Update the position of the vector x_i^{It+1} using Equation (14)
10:    Evaluate the quality of the vector x_i^{It+1} using the fitness function fit_i
11:  end for
12:  if rand < pr then
13:    Update the position of x_i^{It+1} using the first branch of Equation (11)
14:  else
15:    Update the position of x_i^{It+1} using the second branch of Equation (11)
16:  end if
17:  Determine the best and worst solutions x_best and x_worst
18:  Set It = It + 1
19: end while
20: Return the optimal solution x_best

Grey Wolf Optimization
In this section, the steps of the grey wolf optimization (GWO) [39] are described. The GWO emulates the behaviors of wolves in nature during the process of catching the prey X_b. The GWO has three groups of solutions, named α, β, and δ; each of these has its own task, and they represent the first three best solutions, respectively, while the other solutions are called the ω group.
GWO starts by setting the initial values of a set of solutions X, evaluating the fitness value of each of them, and determining X_α, X_β, and X_δ. Thereafter, the solutions are updated using a set of strategies, such as the encircling technique, which is formulated as [39]

X(t + 1) = X_b(t) − A × D,   D = |B × X_b(t) − X(t)|,   (15)

A = 2b × q_1 − b,   B = 2 × q_2,   (16)

where A and B denote the coefficient parameters, whereas q_1 and q_2 refer to random numbers generated from [0, 1]. The value of b sequentially decreases from 2 to 0 with an increase in the iterations as

b = 2 − 2t / t_max,   (17)

where t_max refers to the total number of iterations. The second strategy in GWO is called hunting, in which each solution is updated using the following equations [39]:

X_k(t) = X_Lk(t) − A_k × |B_k × X_Lk(t) − X(t)|,   k = 1, 2, 3,   (18)

X(t + 1) = (X_1(t) + X_2(t) + X_3(t)) / 3,   (19)

where the leaders are X_L1 = X_α, X_L2 = X_β, and X_L3 = X_δ, and A_k = 2b × q_1 − b and B_k = 2 × q_2 are computed as in Equation (16).
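The hunting update described above can be sketched as follows; `gwo_update` is an illustrative helper name, and the sketch follows the standard GWO formulation in which every wolf moves toward the average of three leader-guided positions.

```python
import numpy as np

def gwo_update(X, X_alpha, X_beta, X_delta, t, t_max, rng=np.random):
    """GWO hunting step: move every wolf toward alpha, beta, and delta."""
    b = 2 * (1 - t / t_max)            # decreases from 2 to 0 over the run
    new = np.empty_like(X)
    for i, x in enumerate(X):
        guided = []
        for leader in (X_alpha, X_beta, X_delta):
            A = 2 * b * rng.rand(*x.shape) - b   # exploration/exploitation
            B = 2 * rng.rand(*x.shape)           # encircling coefficient
            D = np.abs(B * leader - x)
            guided.append(leader - A * D)
        new[i] = sum(guided) / 3                 # average of the three guides
    return new
```

Note the design consequence: at the final iteration b = 0, so A = 0 and every wolf collapses onto the mean of the three leaders, which is the exploitation end of the schedule.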
The steps of GWO are listed in Algorithm 2 [39].

Proposed Approach
Within this section, the steps of the developed HAR method based on a modified version of the GBO are introduced. The framework of the developed HAR method is given in Figure 1. The developed method starts by receiving the input data and splitting them into training and testing sets. This is followed by determining the initial values of the parameters of the developed HAR model, such as the population size, the total number of generations, and the probability pr. Then, the initial population X is generated, and the quality of each solution X_i, i = 1, 2, ..., N, is evaluated. This is achieved through two steps; the first step is to convert X_i into a binary solution using the following equation:

BX_i(j) = 1 if X_i(j) > 0.5, and BX_i(j) = 0 otherwise,   (20)

where BX_i is the binary form of X_i and j indexes the features. The second step is to remove the features corresponding to zeros in BX_i, which represent irrelevant features. Then, the selected features from the training set are used to train the multiclass SVM classifier and compute the fitness value as [43,51]

fit_i = λ × PR + (1 − λ) × (1 − |BX_i| / D),   (21)

where PR represents the classification precision, |BX_i| is the number of selected features, D is the total number of features, and λ weights classification quality against feature reduction. The next step in the developed model is to find the best solution X_b and the worst solution. Then, the solutions are updated according to X_b and the operators of GBO and GWO. Here, GWO is applied to enhance the local escaping operator (LEO) according to the value of pr: if pr is greater than a random value, the operators of GBO are used to generate a new solution; otherwise, the operators of GWO are used. By comparing the fitness value of the newly obtained solution with that of the current solution X_i, we keep the better of the two and discard the worse one. The process of updating the solutions continues until the stopping criteria are reached. Thereafter, the testing set is reduced according to the features selected by the best solution, and the performance of the predicted activities is computed using different classification measures.
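A minimal sketch of the binarization and fitness evaluation might look as follows. Here, `precision_fn` is a hypothetical stand-in for the wrapper that trains the multiclass SVM on the selected feature columns and returns its precision; the 0.5 threshold and the λ-weighted fitness follow the description above.

```python
import numpy as np

def binarize(x, thresh=0.5):
    """Keep feature j when the continuous position exceeds the threshold."""
    return (x > thresh).astype(int)

def fitness(x, precision_fn, lam=0.99):
    """Weighted mix of classifier precision and feature-reduction ratio.

    precision_fn(mask) scores the feature subset selected by the binary
    mask (e.g., by training a multiclass SVM on those columns).
    """
    mask = binarize(x)
    if mask.sum() == 0:            # guard: an empty subset is worthless
        return 0.0
    reduction = 1 - mask.sum() / mask.size
    return lam * precision_fn(mask) + (1 - lam) * reduction
```

With λ close to 1, the precision term dominates and the reduction term acts only as a tie-breaker between subsets of equal quality.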
The time complexity of the developed GBOGWO can be estimated as O(T_init.) + O(T × N × [p × T_GWO_Xnew + (1 − p) × (T_GBO_GSR + T_GBO_Xnew) + T_FE + T_upd.]), where T_init. represents the time spent generating the initial population, and p is the probability of selecting either the GWO update mechanism or the GBO exploration subprocedure. T_GBO_GSR, T_GBO_Xnew, and T_GWO_Xnew each have a time complexity of O(D), where D is the problem dimension. T_FE refers to the time taken by the function evaluation, which dominates the execution time in HAR applications due to the use of classifiers such as the multiclass SVM, random forest, and neural networks. T_upd. denotes the time for evaluating X_new and updating the best solution if necessary, and T refers to the total number of iterations.

Experimental Results and Discussion
The proposed algorithm was applied to improve the classification performance on the UCI-HAR dataset via a feature selection approach. In this section, we present the experimental settings, the results of the proposed approach, the comparisons with other models, and the classification rates for the concerned dataset in comparison with other studies in the literature. Moreover, a critical analysis of the results obtained using the proposed HAR system is given.

UCI-HAR Dataset
The performance of GBOGWO was exhaustively compared to a set of 11 optimization algorithms for feature selection. Basic continuous versions of the GBO, GWO, genetic algorithm (GA) [53], differential evolution (DE) [54], moth-flame optimization (MFO) [55], sine-cosine algorithm (SCA) [56], Harris hawks optimization (HHO) [57], and manta ray foraging optimization (MRFO) [58] were implemented, in addition to the binary particle swarm optimization (B-PSO) [59], binary bat algorithm (B-BAT) [60], and binary sine-cosine algorithm (B-SCA) [56]. The settings and parameter values of all algorithms used in the comparison are provided in Table 2. For the classification task, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts define the commonly used performance metrics for HAR systems, which are given by:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall/Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)

Evaluation metrics of the comparison involve the mean (M) and standard deviation (std) of the precision (PR), the M and std of the number of selected features (# F), the percentage of feature reduction (red (%)), and the execution time. The Wilcoxon statistical test was used to determine the degree of significant difference between GBOGWO and each compared algorithm in terms of the null-hypothesis indicator H and the significance level (p-value). Each algorithm was repeated for 10 independent runs, which may be considered the minimum for examining the behavior of such stochastic optimization techniques; this limit is due to the huge execution time required to train a multiclass SVM on very long training records (each record in the training set has 561 dimensions). The classification rates obtained by the proposed approach were compared to those of the original paper of the dataset under study as well as one recent study in the literature. Moreover, the performance of GBOGWO was compared to filter-based methods commonly used in feature selection applications, such as the t-test and ReliefF [61].
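The four rates above translate directly into code; `har_metrics` is an illustrative helper, assuming one-vs-rest counts for a single activity class.

```python
def har_metrics(tp, tn, fp, fn):
    """Per-class classification rates from one-vs-rest confusion counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),   # a.k.a. sensitivity
        "specificity": tn / (tn + fp),
    }
```

For a multiclass confusion matrix, these counts would be derived per activity by treating that activity as the positive class and all others as negative.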
All algorithms were implemented in the Matlab 2018a (MathWorks Inc., Natick, MA, USA) environment using a 2.6 GHz CPU and 10 GB of RAM. Figure 2 summarizes the results reported in Table 3 in a normalized fashion, which gives a clearer intuition about the behavior of GBOGWO according to the different evaluation metrics. The confusion matrix, presented in Table 4, provides the rates of PR, sensitivity (Sens.), and specificity (Spec.) for each single activity. Walking downstairs (WD), lying down (LD), and walking (WK) were the best-recognized activities, with PR rates of 100%, 100%, and 99.2%, respectively, while the worst PR rate was for the standing (SD) activity, with 93.57%. The recall of most activities was high except for sitting (ST), with 92.46%. It can also be noticed that the Spec. for all activities is quite good (>98.51%). The proposed model was able to distinguish well between the group of periodic activities (WK, WU, WD) and that of static or single-transition activities (ST, SD, LD), where the rate of misclassification is almost zero (only one wrong label between WU and ST in Table 4). As shown in Figure 3, (WK, WU, WD), plotted in dark green, blue, and black, can be linearly separated from (SD, ST, LD), plotted in red, yellow, and light green, except for very few records that are clustered into the wrong classes between WU and ST. On the other hand, there is a high degree of similarity between the extracted features of SD and ST. Such similarity has complicated the classification task; thus, there is notable confusion between SD and ST (on average, 36 wrong labels in-between).

Numerical Results of Experiments
To summarize the conducted experiments, the feature set proposed for the UCI-HAR dataset in [43] was useful for the targeted recognition task; however, discarding some misleading features using the proposed technique proved very useful for improving the overall performance of such an HAR model. The feature set was successfully reduced by 45.8%, while the mean PR reached 98.13% and the mean accuracy was 98%.

Comparison with Other Studies
Recognition rates of the proposed HAR model were compared to those of the original study of the UCI-HAR dataset [43] and the recent study in [51]. In [43], 561-dimensional feature vectors were provided to a multiclass SVM, which gave a mean PR of 96%. A hybrid model using LSTM and CNN was applied to segmented sequences of activity signals in [51], which reported a mean PR of 95.8%. Table 5 shows a comparison of the results obtained herein and in the aforementioned studies. A notable improvement in overall model performance is observed, in particular for the WK and ST activities. However, all three models resulted in low precision for the SD activity.

Comparison with Filter-Based Methods
Filter-based methods such as statistical tests and the ReliefF algorithm [62] are commonly used for feature selection tasks. Such methods are time-efficient, and their classifier-independent nature simplifies passing the selected feature set to any further classifier [63]. As a statistical test, the t-test examines the similarity between classes for each individual feature via mean and standard deviation calculations. It is then possible to rank features according to their significance and, finally, define some cut-off threshold to select a feature set. The ReliefF algorithm applies a penalty scheme, where features that map to different values for the same neighbors are penalized (i.e., given a negative weight) and otherwise rewarded. After that, the feature set with non-negative weights is expected to better represent the concerned classes. Table 6 gives the results of the comparison between the proposed model and the filter-based approach using the t-test and ReliefF. ReliefF was able to extract the smallest feature set, achieving a reduction ratio of 67%, but GBOGWO was superior in terms of the resulting accuracy, sensitivity, and precision. In contrast, the t-test enlarged the selected feature set to 350 dimensions, but this did not improve the performance. In Table 6, and for a typical value of λ = 0.99, the fitness of the proposed GBOGWO was 97.15%. For λ = 0.9, which is more biased towards reducing the feature set, the fitness of GBOGWO reaches 88.37%. For both values of λ, the proposed approach is superior to the examined filter-based methods. The superior performance of the developed method over all other tested methods can be noticed from the previous discussion. However, the developed method still suffers from several limitations, such as the relatively large feature set required for achieving reasonable performance (i.e., 304 features on average for six activities).
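A t-test-style filter ranking of the kind described above can be sketched as follows; `t_scores` and `select_top` are illustrative two-class helpers, and the cut-off `k` plays the role of the threshold mentioned in the text.

```python
import numpy as np

def t_scores(X, y):
    """Two-class t-test score per feature: |mean gap| over pooled spread."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def select_top(X, y, k):
    """Rank features by t-score and keep the k most discriminative ones."""
    return np.argsort(t_scores(X, y))[::-1][:k]
```

For a multiclass problem like HAR, such a score would be computed one-vs-rest per activity and the per-class rankings merged, which is one reason filter methods can retain larger feature sets than a wrapper.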
Thus, it is reasonable to deploy such an HAR system in a smartphone environment to examine both the model size and the real-time behavior. Moreover, enlarging the set of targeted activities is expected to add more time complexity for training a classifier such as the multiclass SVM.

Evaluation of the Proposed GBOGWO on the WISDM Dataset
For further evaluation, we tested the proposed GBOGWO on another HAR dataset, called the WISDM dataset [64]. This dataset contains six activities, namely walking (WK), walking upstairs (WU), walking downstairs (WD), sitting (ST), standing (SD), and jogging (JG). Table 7 shows the results of the proposed GBOGWO and several optimization methods, including the GWO, GA, MFO, MRFO, and GBO. From the table, we can see that the proposed method achieved the best results. It is worth mentioning that the best results for the WISDM dataset were achieved using the random forest (RF) classifier; therefore, for the WISDM dataset, we also used the RF in this paper. A basic version of the RF algorithm with 50 decision trees gives an average accuracy of 97.5% for the feature set defined in Table 8. Following the pre-processing steps of the UCI-HAR dataset, each activity signal was separated into body acceleration and gravity component signals. Then, segments of a length of 128 points (i.e., the same segment length used for the UCI-HAR dataset) with 50% overlap were generated for the purposes of real-time applications. The feature set in Table 8 was generated using simple time-domain statistics over the three axes of each segment, notably the mean, standard deviation (STD), the coefficients of the auto-regressive (AR) model of order 4, and the histogram counts with 5 bins, among others. Moreover, the mean, max, and median frequencies of each segment in the three axes enrich the feature set. Considering that the proposed features are generated for both the body signal and the gravity component, the cardinality of the feature set reaches 150. Thus, such a feature set can help distinguish the behavior of the compared algorithms on the WISDM dataset. Since previous studies that addressed the WISDM dataset have used accuracy to evaluate their algorithms, the classification error is set to 1 − mean(Accuracy), as shown in Figure 4b.
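The time-domain part of this feature extraction can be sketched for a single window as follows. The exact feature list is an assumption based only on the statistics named above (mean, STD, 5-bin histogram counts, and a magnitude statistic); `window_features` is an illustrative name, and the stacked feature matrix would then be fed to an RF with 50 trees.

```python
import numpy as np

def window_features(seg):
    """Simple time-domain statistics for one tri-axial segment.

    seg has shape (128, 3): 128 samples by 3 axes, echoing the WISDM
    segmentation described in the text.
    """
    feats = []
    for axis in range(seg.shape[1]):
        x = seg[:, axis]
        hist, _ = np.histogram(x, bins=5)      # 5-bin histogram counts
        feats += [x.mean(), x.std(), *hist]
    feats.append(np.linalg.norm(seg, axis=1).mean())  # mean magnitude
    return np.asarray(feats)
```

Frequency-domain statistics (mean, max, and median frequencies) and the AR coefficients would be appended per axis in the same manner to approach the 150-dimensional set described above.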

(Table 8 fragment: time-domain features of the body acceleration and gravity component signals, e.g., AR coefficients (12), AR coefficients of the magnitude signal (4), acceleration (1), and the entropy of jerk (3).)

Since the search space of UCI-HAR, as a feature selection problem, is high-dimensional, it is a suitable benchmark for the compared algorithms. Thus, to avoid redundancy, only the top six algorithms according to the results in Table 3, namely GBOGWO, GWO, GA, MFO, MRFO, and GBO, were included in the experiments on the WISDM dataset.
In Table 7, GBOGWO achieves a mean accuracy (Acc) of 98.87%, a notable improvement over the basic model with the whole feature set (97.5%). GBOGWO outperforms the other algorithms in classification Acc while using only 32.7 features on average (a 78.2% reduction ratio). MFO uses the largest feature set among the examined optimizers, with 59.9 features, yet reaches a mean Acc of only 98.21%. GBO attains the smallest feature set, with a cardinality of 25, but this seems insufficient to push the mean Acc above 98.11%. It was noticed that the STD for all algorithms was less than 0.01, which may be due to the relatively limited search space (the feature set size is 150). Moreover, the Wilcoxon test results in Table 7 confirm that GBOGWO is well distinguished from the other compared algorithms.
In Table 9, the selection power of GBOGWO outperforms both the t-test and ReliefF, which tend to retain large feature sets of sizes 124 and 108, respectively, while achieving lower mean Acc values of 97.58% and 98.11%, respectively. According to the fitness criterion defined in Equation (21), GBOGWO outperforms both methods whether most importance is given to Acc (i.e., λ = 0.99) or to feature-set reduction (i.e., λ = 0.9).

Table 10 shows the confusion matrix of the test set, which represents 30% of all samples. The activities ST, SD, and WK were well recognized, with mean precision (PR) exceeding 99.5%. The rates of PR, sensitivity (Sens.), and specificity (Spec.) were close for most activities, which reflects that the classification model (features + classifier) was balanced across these metrics. Most confusions occur between WU and WD, and between WU and JG, where the misclassifications reach 27 and 15, respectively. Such confusions may be caused by the sensor position (in the pocket); thus, for such applications, it is suggested to collect activity signals from different positions on the body, such as the pocket, wrist, waist, and shoulder.

Table 11 focuses on the most frequent features in the optimized feature sets of each algorithm. For UCI-HAR, only features selected by all considered algorithms (i.e., count = 6) are shown. These features are generated from the body signals of both the accelerometer (BodyAcc) and gyroscope (BodyGyro) in both the time domain (prefix t) and frequency domain (prefix f). For a fuller explanation of these features, the reader can refer to [43]. For WISDM, the skewness of the y axis of the body signal (Skewness-Y) appears to be the most important feature, as it is selected by every algorithm. Similarly, the tilt angle (TA), the STD of the jerk of the x-axis body signal (STD-Jerk-X), and the first coefficient of the AR model of the magnitude signal (AR-Magnitude,1) have a frequency of 5.
The maximum frequency of the z axis of the body signal (Max-Freq-Z) shows the most notable effectiveness among the generated frequency-domain features, with a count of 4. It is reasonable that body-signal statistics are more useful than those of the gravity component for such applications; thus, only Gravity-STD-Y and Gravity-Kurtosis-Y appear in the elite feature set.
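The fitness comparison in Table 9 can be reproduced from the reported mean Acc and feature-set sizes, assuming Equation (21) takes the common weighted-sum form fitness = λ·(1 − Acc) + (1 − λ)·(|S|/N); this form is an assumption, as the equation itself is defined earlier in the paper.

```python
# Reported values from Tables 7 and 9 (WISDM): (mean accuracy, mean #selected features)
N = 150  # total feature-set size
methods = {
    "GBOGWO":  (0.9887, 32.7),
    "t-test":  (0.9758, 124),
    "ReliefF": (0.9811, 108),
}

def fitness(acc, n_selected, lam):
    """Assumed form of Equation (21): weighted sum of error and feature ratio (lower is better)."""
    return lam * (1 - acc) + (1 - lam) * (n_selected / N)

for lam in (0.99, 0.9):
    scores = {m: fitness(a, n, lam) for m, (a, n) in methods.items()}
    best = min(scores, key=scores.get)
    print(lam, best, round(scores[best], 4))
```

Under both weightings, GBOGWO attains the lowest fitness, consistent with the comparison discussed above.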

Conclusions and Future Work
In this study, we presented a robust human activity recognition (HAR) system based on data collected from smartphones. We developed a new feature selection (FS) method to enhance the HAR system using a hybrid metaheuristic (MH) algorithm that combines the gradient-based optimizer (GBO) and the grey wolf optimizer (GWO). The proposed method, called GBOGWO, was applied with the SVM classifier to classify the activities of the well-known UCI-HAR dataset. The combination of GBO and GWO overcomes the shortcomings of the individual methods by exploiting the advantages of both algorithms, yielding an efficient FS method that is employed to build a robust HAR classification system. Compared to existing HAR methods, as well as to several metaheuristic algorithms applied as FS methods with the SVM classifier, the developed GBOGWO showed better performance in terms of classification accuracy and other performance metrics. Additionally, we evaluated GBOGWO on the WISDM dataset using the RF classifier, where it also obtained the best results compared to several optimization algorithms.
The developed method could be further improved in future work to address more complex HAR datasets that may contain two or more human activities conducted simultaneously.