Research on Hyper-Parameter Optimization of Activity Recognition Algorithm Based on Improved Cuckoo Search

Activity recognition methods often include some hyper-parameters based on experience, which greatly affects their effectiveness in activity recognition. However, the existing hyper-parameter optimization algorithms are mostly for continuous hyper-parameters, and rarely for the optimization of integer hyper-parameters and mixed hyper-parameters. To solve the problem, this paper improved the traditional cuckoo algorithm. The improved algorithm can optimize not only continuous hyper-parameters, but also integer hyper-parameters and mixed hyper-parameters. This paper validated the proposed method with the hyper-parameters in Least Squares Support Vector Machine (LS-SVM) and Long-Short-Term Memory (LSTM), and compared the activity recognition effects before and after optimization on the smart home activity recognition data set. The results show that the improved cuckoo algorithm can effectively improve the performance of the model in activity recognition.


Introduction
In recent years, a number of methods have been proposed for activity recognition with non-obtrusive sensors [1,2], which have appeared as Ambient Intelligence (AmI) [3] and enablers to facilitate the development of applications that are aware of users' presence and contexts, and are adaptive and responsive to their needs and habits. The methods mainly include conventional approaches and deep learning approaches, which have witnessed fast development and advancement in recent years [4]. Conventional approaches mainly use Decision Tree [5], K-Nearest Neighbor (KNN) [6], Support Vector Machine (SVM) [7], Naïve Bayes(NB) [8], Hidden Markov Model (HMM) [9], Conditional Random Field (CRF) [10] and other traditional machine learning methods [11]. Deep learning approaches mainly include Convolutional Neural Network (CNN) [12] and Recurrent Neural Network (RNN) [13]. Compared to conventional approaches, deep learning approaches could automatically extract appropriate features from raw sensor data during the training phase, and present the low-level original temporal features with high-level abstract sequences [14].
However, both conventional approaches, or traditional methods, and deep learning methods often contain some hyper-parameters which affect the performance of the model. Hyper-parameters are defined relative to general parameters. General parameters are variables obtained by model learning, while hyper-parameters are set variables, based on experience. Specifically, a typical traditional method is Support Vector Machine (SVM) [15,16], which has been proven to be sensitive to hyper-parameters C and λ, while a typical deep learning approach is CNN [17,18], which has been proven to be very sensitive to hyper-parameter network depth, the number of filters and their respective sizes. In addition, there are other methods that include hyper-parameters but are rarely optimized for activity recognition, such as Least Squares Support Vector Machine (LS-SVM) [19] and Long-Short-Term Memory (LSTM) [20].

Improved Cuckoo Optimization Search Algorithm
This section first introduces the traditional cuckoo optimization algorithm, and then gives two improved cuckoo algorithms.

Traditional Cuckoo Optimization Algorithm
The CS Algorithm assumes that the cuckoo's behavior is in the following three ideal states: (1) The cuckoo lays only m eggs at a time, and randomly selects a suitable nest to hatch these m eggs. (2) In the process of cuckoo bird nest selection, the best quality bird nest will be retained for the next generation. (3) Under the premise of a certain number of bird nests that a cuckoo can choose from, the probability that each bird nest owner finds a foreign bird egg is p, where p ∈ [0,1]. If foreign bird eggs are found, the owner of the nest will rebuild a nest.
The flowchart of traditional the CS algorithm is presented in Figure 1.
The implementation of the traditional CS algorithm is as follows: Step 1. Generate n random host nests x i , i = 1, · · · , n, set t = 0 and compute the fitness F i , i = 1, · · · , n.
Step 2. Find best_nest and best_fitness. If t < iter_num (the maximum number of iterations), go to Step 3, or else go to the last step.
Step 3. Update_nests: generate x new i , i = 1, · · · , n with Lévy flights and compute the fitness F new Step 4. Abandon_nests: generate a random fraction P for every nest x i , i = 1, · · · , n, if P < Pa, build a new one at new locations Step 5. Calculate F t i , i = 1, · · · , n at the nest x i , i = 1, · · · , n obtained in Step 4 in the t-th iterations, and find max_fitness F tmax and the corresponding nest x tmax , if F tmax > best_fitness, update best_nest = x tmax and best_fitness = F tmax .
Step 7. Return best_nest and best_fitness.  The implementation of the traditional CS algorithm is as follows: Step 1. Generate n random host nests , 1, , i x i n = , set t = 0 and compute the fitness Step 2. Find best_nest and best_fitness. If t < iter_num (the maximum number of iterations), go to Step 3, or else go to the last step.

Improved Cuckoo Optimization for Optimizing Integer Parameters
To optimize integer parameters, the host nests x i , i = 1, · · · , n need to be rounded. To this end, this article changed Steps 1, 3, and 4 in the cuckoo optimization algorithm to adapt to the optimization of integer parameters. In order to distinguish it from the basic cuckoo algorithm, this article refers to the above improved algorithm as cuckoo Algorithm 1.

Algorithm 1 Improved Cuckoo Optimization for Optimizing Integer Parameters
Step 1. Generate n random host nests x i , i = 1, · · · , n, rounding all nests, and compute the fitness F i , i = 1, · · · , n.
Step 2. Find best_nest and best_fitness. If t < iter_num, go to Step 3, else go to the last step.
Step 3. Update_nests: generate x new i , i = 1, · · · , n with Lévy flights, rounding all nests, and compute the fitness F new Step 4. Abandon_nests: generate a random fraction P for every nest x i , i = 1, · · · , n, if fraction P < Pa, build a new one at new locations x new i via Lévy flights, rounding it, and update x i = x new i .
Step 5. Calculate F t i , i = 1, · · · , n at the nest x i , i = 1, · · · , n in the t-th iterations, find max_fitness F tmax and the corresponding nest x tmax , if F tmax > best_fitness, update best_nest = x tmax and best_fitness = F tmax .

Improved Cuckoo Optimization for Optimizing Continuous and Integer Mixed Parameters
Assuming that the parameters to be trained include m continuous parameters and k integer parameters, the i-th host nest x i needs to include two parts: the continuous part x m i and the integer part x k i . Therefore, the i-th host nest can be expressed as To optimize continuous and integer mixed parameters simultaneously, Steps 1, 3, and 4 in the cuckoo optimization algorithm need to be modified. To distinguish it from the above two previous cuckoo algorithms, this article refers to the above improved algorithm as cuckoo Algorithm 2.

Algorithm 2 Improved Cuckoo Optimization for Optimizing Continuous and Integer Mixed Parameters
Step 1. Generate n random host nests x i , i = 1, · · · , n which include two parts, the random continuous part x m i , i = 1, · · · , n and the random integer part x k i , i = 1, · · · , n. Then, compute the fitness F i , i = 1, · · · , n.
Step 2. Find best_nest and best_fitness. If t < iter_num, go to Step 3, else go to the last step.
Step 3. Update_nests: generate host nests x new i , i = 1, · · · , n with Lévy flights, rounding the integer part, and compute the fitness F new Step 4. Abandon_nests: For every nest x i , i = 1, · · · , n, generate a random fraction P, if fraction P < Pa, build a new one at new locations x new i via Lévy flights and rounding the integer part, then update x i = x new i .
Step 5. Calculate F t i , i = 1, · · · , n at the nest x i , i = 1, · · · , n in the t-th iterations, find max_fitness F tmax and the corresponding nest x tmax , if F tmax > best_fitness, update best_nest = x tmax and best_fitness = F tmax .

Hyper-Parameters in LS-SVM
LS-SVM is an improvement on the standard SVM. It changes the insensitive loss function in SVM to a quadratic loss function, and replaces inequality constraints with equality constraints. The solving coefficient is greatly reduced, and the solving speed is accelerated. LS-SVM is described as follows: Take a given training dataset {x i , y i }, i = 1, · · · , n, where x i is an n-dimensional input vector and y i is a one-dimensional output scalar. The optimization problem and the constraint conditions of LS-SVM algorithm are: where w ∈ H is the weight vector and H is the higher dimension space projected by the nonlinear function ϕ(x) from the original space R. Furthermore, ξ i ∈ R is the slack variable for x i , which measures the deviation degree of a datum from the ideal condition of the classification model, and b ∈ R is the bias, γ is regularization factor. The regularization factor is similar to the penalty factor C in SVM and is used to adjust the confidence interval of LS-SVM and the proportion of empirical risk. By solving the above optimization problem, the decision function for classification can be obtained [29] as: where K(x, x i ) is kernel functions, α i are positive real constants and b is a real constant which can be obtained by solving the above optimization problem.
There are three commonly used kernel functions: linear kernel functions, polynomial kernel functions, and radial basis kernel functions. Among the three commonly used kernel functions, the radial basis kernel function can non-linearly map the samples to the high-dimensional space, and has fewer parameters and operations compared with the polynomial kernel function.
The width parameter σ (sigma) of the RBF function, together with the regularization parameter γ (gam), directly affect the performance of the LS-SVM model. However, the two hyper-parameters were continuously chosen as a fixed value according to experience. Therefore, it is very important to study the selection method of σ, and γ. In this paper, we used the cuckoo search algorithm to optimize the optimal gam and sigma, and then used the LS-SVM to identify user activities.

Hyper-Parameters in LSTM
LSTM is one of the RNNs and the primary objectives of LSTM are to model long-term dependencies and determine the optimal time lags for time series problems. These features are especially desirable for activity recognition, due to the lack of a priori knowledge on the relationship between prediction results and the length of input historical data. The LSTM architecture is composed of one input layer, one recurrent hidden layer, which has a basic unit that is memory block instead of traditional neuron node, and one output layer.
Suppose that the input sequence is denoted as x = (x 1 , x 2 , · · · , x T ), the LSTM computes the hidden vector sequence h = (h 1 , h 2 , · · · , h T ) and the output predicted sequence y = (y 1 , y 2 , · · · , y T ) by iterating the following equations: where the W term denotes weight matrices (e.g., W xh is the input-hidden weight matrix), the b term denotes bias vectors (e.g., b h is hidden bias vector) and H is the hidden layer function. For more information about the LSTM model, please refer to Reference [30]. In the model of LSTM, there are integer hyper-parameter num_units and batch_size that affect the performance. In the RMSprop optimization, there are continuous hyper-parameters, lr and rho, which also affect the performance of LSTM. Thus, there are both integer and continuous hyper-parameters affecting the performance of LSTM.

Validation
In this section, we validate the proposed method that used the cuckoo algorithm and the improved algorithm to solve the hyper-parameters problems in LS-SVM and LSTM with the activity recognition datasets in smart home. In doing so, we first optimized hyper-parameters in LS-SVM, based on the basic cuckoo algorithm of Section 4.1. Then, the improved cuckoo algorithms were used to optimize the hyper-parameters in LSTM of Section 4.2.

Hyper-Parameter Optimization in LS-SVM
This subsection verified the effectiveness of the cuckoo algorithm in optimizing hyperparameters in LS-SVM based on WiFi-based indoor positioning dataset.
Wi-Fi based indoor positioning mainly uses the signal strength value generated by Wi-Fi, that is, the RSSI value, to locate the user's location. Each hotspot that releases Wi-Fi signals in space is called AP (access point, the signal sent by the wireless router), and one or more APs may be detected at each location. They can be detected at the same time. The AP's BSSID (that is, the MAC address, which uniquely identifies this hotspot) and the LEVEL value of the Wi-Fi signal (the strength value of the received Wi-Fi signal, in dBm, also known as the RSSI value, Received Signal Strength Indication) need to be obtained The RSSI is continuously collected through a smart phone. First, it is necessary to write an APP and, then, call the Wi-Fi module that comes with the phone to collect and process the data. After that, the phone is saved or sent to the computer for processing.
Due to the instability and fading of the signal in space, the RSSI values measured at the same place and at different times will fluctuate to a certain extent. Therefore, when position matching is performed, corresponding algorithms are required to calculate the distribution of values, and then test this to predict the distribution of values. For example, input the corresponding coordinate matrix and RSSI matrix for training data, and then input the tested RSSI vector to predict the coordinate value to complete the positioning.
Here we used the LS-SVM to predict the user's location based on the RSSI value generated by Wi-Fi. Since the model parameter gam and sigma of the least square support vector machine have a great influence on the position prediction effect, we first used the cuckoo search algorithm to optimize the optimal gam and sigma, and then used the least square support vector machine to determine the position so as to make an estimate.
Errors continuously exist. Assuming that the real coordinates are (x0, y0) and the predicted coordinates are (x, y), the error between the predicted position coordinates and the real position coordinates in this paper was defined as: Before optimization, we used the empirical parameters gam = 0.001, sigma = 0.05 as the initial points, and the search intervals of gam and sigma were both set to [0.001, 1000]. The predictions errors of the training position with iterate 10, 100 and 1000 times respectively are shown in Figure 2.
Since the optimized parameters of iterate 1000 have the lowest prediction error for the training position, we used this parameter to predict and estimate 33 test positions during the test. Figure 3 is a schematic diagram of the errors before and after prediction at 33 locations, o is the estimated error of the empirical parameter gam = 0.001, sigma = 0.05 for each test location, * is the estimated error of the optimized parameter gam = 0.001, sigma = 4.3507 for each test location. From the figure, we could see that the optimized parameters greatly reduced the estimation error of the test data.
Since the coordinates of the 33 tested locations were very close to each other, we only gave a schematic diagram of the prediction comparison of the first 8 locations. Figure 4 is a schematic diagram of the comparison of predictions before and after optimization. the training position, we used this parameter to predict and estimate 33 test positions during the test. Figure 3 is a schematic diagram of the errors before and after prediction at 33 locations, o is the estimated error of the empirical parameter gam = 0.001, sigma = 0.05 for each test location, * is the estimated error of the optimized parameter gam = 0.001, sigma = 4.3507 for each test location. From the figure, we could see that the optimized parameters greatly reduced the estimation error of the test data.

Hyper-Parameter Optimization in LSTM
This subsection validated hyper-parameter optimization in LSTM with the ADL Adlnormal dataset [31] and Kasteren Dataset [32]. Adlnormal dataset was collected in a smart apartment test bed located on the WSU campus. The dataset recorded 24 WSU undergraduate students performing five ADLs, one at a time. For the 6425 samples of the data set, we divided them into three parts: 3000 for the training data set, 2000 for the validation data set, and 1425 for the test data set. The Kasteren dataset was collected in a three-room apartment where a 26-year-old man lives and there were 14 state-change sensors installed in this apartment. We conducted experiments using the previous 30,000 time slices and split the training, validation and test datasets into three equal parts, i.e., 10,000 time slices, respectively.
In the model of LSTM, there are integer hyper-parameters num_units and batch_size that affect the performance. In the RMSprop optimization, there are continuous hyperparameters, lr and rho, which also affect the performance of LSTM. Thus, there are both integer and continuous hyper-parameters affecting the performance of LSTM. This subsection used the proposed hyper-parameter optimization algorithm to optimize continuous hyper-parameters, integer hyper-parameters and mixed hyper-parameters, respectively.
In order to verify the effectiveness of our proposed method, the activity recognition accuracy obtained by this method was not only compared with the empirical value, but also compared with the activity recognition accuracy obtained by Optuna [25] optimizing hyperparameters. Finally, this subsection compared the accuracy of LSTM activity recognition under different hyper-parameter optimization strategies, and analyzed the impact of different strategies on the model.
After CS optimization, the continuous hyper-parameters for the Adlnormal dataset were lr = 0.00782101 and rho = 0.59629055, and the continuous hyper-parameters for the Kasteren Dataset were lr = 0.00381946, rho = 0.56684786. It could be seen that the optimized continuous hyper-parameters of LSTM were different for the different data sets.
Before continuous hyper-parameter optimization, the accuracy score of test data set for Adlnormal dataset was 0.7986 and the accuracy score of test data set for Kasteren Dataset was 0.8445. After continuous hyper-parameter optimization with CS, the accuracy score of test data set for Adlnormal dataset was 0.8529 and the accuracy score of test data set for Kasteren Dataset was 0.8537. Figure 5 compares the accuracy score of the test data set with initialized hyper-parameters, hyper-parameter optimization with CS and hyper-parameter optimization with Optuna, where CHO represents only continuous hyper-parameters, lr and rho, optimized. From the figure, we can see that the activity recognition accuracies of the two datasets both improved significantly after continuous hyper-parameter optimization with CS and CS was better than Optuna.

Experiment 2
This experiment validated integer hyper-parameter num_units and batch_size optimization in LSTM, based on the improved Cuckoo optimization Algorithm 1 with the ADL Adlnormal dataset and Kasteren dataset.
Set epochs = 10, the integer hyper-parameters after optimization for Adlnormal dataset were num_units = 253, batch_size = 491, and the integer hyper-parameters after optimization for Kasteren Dataset were num_units = 12, batch_size = 931. It could be seen that the optimized integer hyper-parameters of LSTM were different for the different data sets.
After integer hyper-parameter optimization, the accuracy score of test data set for

Experiment 2
This experiment validated integer hyper-parameter num_units and batch_size optimization in LSTM, based on the improved Cuckoo optimization Algorithm 1 with the ADL Adlnormal dataset and Kasteren dataset.
Set epochs = 10, the integer hyper-parameters after optimization for Adlnormal dataset were num_units = 253, batch_size = 491, and the integer hyper-parameters after optimiza-tion for Kasteren Dataset were num_units = 12, batch_size = 931. It could be seen that the optimized integer hyper-parameters of LSTM were different for the different data sets.
After integer hyper-parameter optimization, the accuracy score of test data set for Adlnormal dataset was 0.8220 and the accuracy score of test data set for Kasteren Dataset was 0.8560. Figure 6 compares the accuracy score of the test data set with initialized hyperparameters, hyper-parameter optimization with CS and hyper-parameter optimization with Optuna, where IHO represents only integer hyper-parameter num_units and batch_size optimized. From the figure, we can see that the activity recognition accuracies of the two datasets both improved significantly after integer hyper-parameter optimization with CS and CS was better than Optuna.

Experiment 3
This experiment validated mixed hyper-parameters lr, rho, num_units and batch_size optimization in LSTM, based on the improved Cuckoo optimization Algorithm 2 with the ADL Adlnormal dataset and Kasteren dataset.
To optimize the continuous and integer hyper-parameters together, the mixed hyperparameters were initialized as lr = 0.001, rho = 0.9, num_units = 128, batch_size = 200, and the search intervals for hyper-parameters were set as continuous hyper-parameters and integer hyper-parameters above.
After mixed hyper-parameters optimization, the accuracy score of test data set for Adlnormal dataset was 0.8446 and the accuracy score of test data set for the Kasteren dataset was 0.8693. Figure 7 compares the accuracy score of the test data set with initialized hyper-parameters, hyper-parameters optimization with CS and hyperparameters optimization with Optuna, where MHO represent optimized mixed hyperparameters lr, rho, num_units and batch_size together with lr = 0.001, rho = 0.9, num_units = 128 and batch_size = 200 as the initial value of optimization. From the figure, we can see that the activity recognition accuracies of the two datasets both improved significantly after mixed hyper-parameter optimization with CS and CS was better than Optuna.

Experiment 3
This experiment validated mixed hyper-parameters lr, rho, num_units and batch_size optimization in LSTM, based on the improved Cuckoo optimization Algorithm 2 with the ADL Adlnormal dataset and Kasteren dataset.
To optimize the continuous and integer hyper-parameters together, the mixed hyperparameters were initialized as lr = 0.001, rho = 0.9, num_units = 128, batch_size = 200, and the search intervals for hyper-parameters were set as continuous hyper-parameters and integer hyper-parameters above.
After mixed hyper-parameters optimization, the accuracy score of test data set for Adlnormal dataset was 0.8446 and the accuracy score of test data set for the Kasteren dataset was 0.8693. Figure 7 compares the accuracy score of the test data set with initialized hyperparameters, hyper-parameters optimization with CS and hyper-parameters optimization with Optuna, where MHO represent optimized mixed hyper-parameters lr, rho, num_units and batch_size together with lr = 0.001, rho = 0.9, num_units = 128 and batch_size = 200 as the initial value of optimization. From the figure, we can see that the activity recognition accuracies of the two datasets both improved significantly after mixed hyper-parameter optimization with CS and CS was better than Optuna.

Experiment 4
This experiment compared the accuracy of LSTM activity recognition under different hyper-parameter optimization strategies, and analyzed the impact of different strategies on the model with the ADL Adlnormal dataset and Kasteren Dataset.
The optimized hyper-parameter and activity recognition accuracy with different strategies are shown in Table 1, where CHO and IHO meant optimizing continuous hyper-parameters, lr and rho, and optimizing integer hyper-parameters, num_units and batch_size, separately, and finally, the separately trained parameters were merged together for activity recognition. CHO after IHO meant optimizing continuous hyperparameters, lr and rho, with optimized integer hyper-parameters, num_units and batch_size, as input. IHO after CHO meant optimizing integer hyper-parameters, num_units and batch_size, with optimized continuous hyper-parameters, lr and rho, as input. MHO after CHO and IHO meant optimizing mixed hyper-parameters with CHO and IHO result as the initial value of optimization.

Experiment 4
This experiment compared the accuracy of LSTM activity recognition under different hyper-parameter optimization strategies, and analyzed the impact of different strategies on the model with the ADL Adlnormal dataset and Kasteren Dataset.
The optimized hyper-parameter and activity recognition accuracy with different strategies are shown in Table 1, where CHO and IHO meant optimizing continuous hyper-parameters, lr and rho, and optimizing integer hyper-parameters, num_units and batch_size, separately, and finally, the separately trained parameters were merged together for activity recognition. CHO after IHO meant optimizing continuous hyper-parameters, lr and rho, with optimized integer hyper-parameters, num_units and batch_size, as input. IHO after CHO meant optimizing integer hyper-parameters, num_units and batch_size, with optimized continuous hyper-parameters, lr and rho, as input. MHO after CHO and IHO meant optimizing mixed hyper-parameters with CHO and IHO result as the initial value of optimization. Figures 8 and 9 compared the accuracy score of different hyper-parameter optimization strategies for Adlnormal dataset and Kasteren Dataset, respectively. From the figure, we can see that the activity recognition accuracy was improved after hyper-parameter optimization for all optimization strategies. Compared with integer parameters, continuous parameters had a greater impact on the LSTM. Mixed hyper-parameter optimization obtained a stable improvement effect. The optimization strategies CHO and IHO, CHO after IHO, IHO after CHO, MHO after CHO and IHO also obtained relatively good effects.        The accuracy score of Adlnormal dataset

Conclusions
In order to reduce the hyper-parameter settings influence of certain algorithms on activity recognition, this paper proposed to use the cuckoo optimization algorithm to optimize the hyper-parameters in the algorithms, and improved the cuckoo algorithm to adapt to the optimization of integer hyper-parameters and hybrid hyper-parameters in activity recognition problems.
To validate the proposed method, this paper first optimized the hyper-parameters in LS-SVM, based on the basic cuckoo algorithm, and compared the WiFi-based indoor localization results before and after optimizing the hyper-parameters. The experimental results showed that after CS optimized the hyper-parameters, the LS-SVM model predicted the positions more accurately. Then, this paper validated hyper-parameter optimization in LSTM with the ADL Adlnormal dataset and Kasteren Dataset and compared the activity recognition accuracy of CS-optimized hyper-parameters, empirical hyper-parameters, and Optuna optimized hyper-parameters. Experimental results showed that the optimized hyper-parameters of LSTM were different for different data sets and optimizing hyper-parameters with CS obtained the best activity recognition accuracy compared with empirical hyper-parameters, and Optuna optimized hyper-parameters. Finally, this paper compared the accuracy of LSTM activity recognition under different hyper-parameter optimization strategies, and analyzed the impact of different strategies on the model. The result showed that the mixed hyper-parameter optimized with the improved cuckoo algorithm obtained a stable improvement effect. The other optimization strategies also obtained relatively good effects.
Each contribution to activity recognition brings us one step closer to the realization of Ambient Intelligence. As future work, we plan to expand hyper-parameter optimization in LS-SVM and LSTM to other activity recognition algorithms, and expand the cuckoo algorithm hyper-parameter optimization to other artificial intelligence algorithms, such as particle swarm and wolf swarm algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.