Feature Optimization for Gait Phase Estimation with a Genetic Algorithm and Bayesian Optimization

: For gait phase estimation, time-series data of lower-limb motion can be segmented according to time windows. Time-domain features can then be calculated from the signal enclosed in a time window. A set of time-domain features is used for gait phase estimation. In this approach, the components of the feature set and the length of the time window are inﬂuential parameters for gait phase estimation. However, optimal parameter values, which determine a feature set and its values, can vary across subjects. Previously, these parameters were determined empirically, which led to a degraded estimation performance. To address this problem, this paper proposes a new feature extraction approach. Speciﬁcally, the components of the feature set are selected using a binary genetic algorithm, and the length of the time window is determined through Bayesian optimization. In this approach, the two optimization techniques are integrated to conduct a dual optimization task. The proposed method is validated using data from ﬁve walking and ﬁve running motions. For walking, the proposed approach reduced the gait phase estimation error from 1.284% to 0.910%, while for running, the error decreased from 1.997% to 1.484%.


Introduction
Walking and running are essential motions in daily life [1]. Gait motions can be analyzed by gait phases because gait motion is a cyclic movement composed of several gait phases [2]. Therefore, gait phase identification is widely used to evaluate and diagnose gait. Gait phases have been investigated using foot pressure sensors [3] and inertial measurement units (IMUs) [4]. Accordingly, gait phase analysis has been used in several fields, such as rehabilitation [5], diagnosis [6,7], and assistive robot design [8,9]. The performance of these applications relies on the gait phase estimation accuracy.
Gait phase estimation has been conducted using two different approaches: the discrete and continuous gait phase approaches. In the discrete gait phase approach, the gait event is recognized as one of the discrete gait phases, such as loading response, midstance, and terminal stance [10]. These discrete phases are differentiated based on the kinematic characteristics of lower limbs. In the continuous gait phase approach, the gait phase is a motion status that is defined as a value between 0% and 100% of the gait cycle [8]. More specifically, the heel strike of the left (or right) foot matches with 0% and 100%. Other gait events within one cycle correspond to values between 0% and 100%; the gait phase values linearly increase from 0% to 100% over time during one cycle. In practical applications, the discrete gait phase approach has drawbacks when compared with the continuous gait phase approach. First, assigning gait states to discrete gait phases is challenging, especially in transition states between two distinct phases [11]. This gait phase misidentification may provide unreliable gait information, which results in inadequate controls. Furthermore, controls that are determined by the discrete phase The remainder of this paper is organized as follows. Section 2 describes the thigh IMU signal used in this study for gait phase estimation. Section 3 describes the MLP-based gait phase estimation model used in this study. Subsequently, the proposed method for the feature set and hyperparameter determination is explained. Section 4 presents the optimized results for each of the walking and running cases evaluated. Finally, the paper is concluded in Section 5.

Data Description
This study used the IMU data of gait motions from five healthy male subjects (28 ± 3 years; 181.3 ± 8.0 cm; 78.7 ± 9.8 kg, mean ± SD); these IMU data were obtained from a previously conducted experiment in a previous work [39]. In that experiment, two IMU sensors (MTi-3 AHRS, Xsens Technologies, the Netherlands) were attached to the anterior part of each thigh to collect the data, as shown in Figure 1. The subjects walked (1.5 m/s) and ran (2.5 m/s) on an instrumented treadmill with mounted force plates (Bertec, Columbus, OH, USA). The IMU sensors recorded the thigh angles and angular velocities at a 1 kHz sampling rate. Details regarding the size of the signal used in this study, including the number of strides and the total experimental time for each subject, are provided in Table 1. Seventy percent of the time-series data were used for training the MLP, the remaining 30% of the data were used for testing.  The thigh motion IMU data of the five subjects are shown in Figure 2. The data show variations in gait patterns across subjects. Because of these variations, if the same feature set and LTW are used for every subject, the performance of the gait estimation algorithm may downgrade.  The thigh motion IMU data of the five subjects are shown in Figure 2. The data show variations in gait patterns across subjects. Because of these variations, if the same feature set and LTW are used for every subject, the performance of the gait estimation algorithm may downgrade.

MLP for Gait Phase Estimation
This section describes the gait estimation procedure using TD features. First, the TD features are calculated from the original time-series data in the current time window, as shown in Figure 3. Notably, the LTW is different for each TD feature set. Notably, the feature values were not normalized because a previous study [22] showed that an accurate estimation can be obtained without normalization. Next, the calculated TD features are fed into an MLP consisting of two hidden layers. Because the optimal selection of TD features is one of the main objectives of this study, the number of input nodes can vary depending on the number of selected TD features. The two hidden layers were constructed with 15 and 20 nodes, respectively. In this study, tanh activation function was applied to every hidden layer. The details on the MLP architecture are provided in Tables S4 and S5 in Supplementary Materials. The two output nodes represent continuously varying variables x and y, which can be converted to the gait phase pg as:

MLP for Gait Phase Estimation
This section describes the gait estimation procedure using TD features. First, the TD features are calculated from the original time-series data in the current time window, as shown in Figure 3. Notably, the LTW is different for each TD feature set. Notably, the feature values were not normalized because a previous study [22] showed that an accurate estimation can be obtained without normalization.

MLP for Gait Phase Estimation
This section describes the gait estimation procedure using TD features. First, the TD features are calculated from the original time-series data in the current time window, as shown in Figure 3. Notably, the LTW is different for each TD feature set. Notably, the feature values were not normalized because a previous study [22] showed that an accurate estimation can be obtained without normalization. Next, the calculated TD features are fed into an MLP consisting of two hidden layers. Because the optimal selection of TD features is one of the main objectives of this study, the number of input nodes can vary depending on the number of selected TD features. The two hidden layers were constructed with 15 and 20 nodes, respectively. In this study, tanh activation function was applied to every hidden layer. The details on the MLP architecture are provided in Tables S4 and S5 in Supplementary Materials. The two output nodes represent continuously varying variables x and y, which can be converted to the gait phase pg as: Next, the calculated TD features are fed into an MLP consisting of two hidden layers. Because the optimal selection of TD features is one of the main objectives of this study, the number of input nodes can vary depending on the number of selected TD features. The two hidden layers were constructed with 15 and 20 nodes, respectively. In this study, tanh activation function was applied to every hidden layer. The details on the MLP architecture are provided in Tables S4 and S5 in Supplementary Materials. The two output nodes represent continuously varying variables x and y, which can be converted to the gait phase p g as: If the error in the gait phase φ is directly used in the regression cost function, the result would be unsatisfactory. Because the gait phase discontinuously changed from 100% to Appl. Sci. 2021, 11, 8940 5 of 14 0% at the end of each gait cycle, the error is much larger near 0% (and 100%) than at other points. To avoid this unequally distributed error, a conversion technique (1) was adapted from a previous study [22]. The error is very large around the end of the gait cycle. For example, if the ground truth is 1%, the MLP prediction lags behind by 2%. Then, the MLP result would be 99%, and the absolute error is 98%. Thus, this unrealistic error must be addressed using (1).
This issue can be addressed by converting the gait phase into continuously varying variables x and y, as shown in (1). After the MLP calculates x and y, the current gait phase is obtained using the arctan2 function. A loss function L is defined using the mean absolute error as follows: where n represents the number of training data points. The Adam optimizer was applied with a learning rate of 0.001 and 5000 iteration epochs, which were fixed throughout the study. The estimation algorithm was trained on a workstation with a single graphics processing unit (CPU: Intel Xeon CPU E5-2620 v4, 2.10 GHz, and 16 cores; GPU: NVIDIA GeForce RTX 2080 Ti). TensorFlow was used as the deep learning framework.

Optimization Algorithms
The main objectives of this study are (1) the selection of the best feature set and (2) obtaining the optimal LTWs. A new optimizing procedure is proposed to achieve these objectives simultaneously, as shown in Figure 4. The BGA selects the feature sets, and BO determines the optimal LTWs. Using this structure, both the TD features and LTWs can be effectively optimized in a single procedure. If the error in the gait phase is directly used in the regression cost function, the result would be unsatisfactory. Because the gait phase discontinuously changed from 100% to 0% at the end of each gait cycle, the error is much larger near 0% (and 100%) than at other points. To avoid this unequally distributed error, a conversion technique (1) was adapted from a previous study [22]. The error is very large around the end of the gait cycle. For example, if the ground truth is 1%, the MLP prediction lags behind by 2%. Then, the MLP result would be 99%, and the absolute error is 98%. Thus, this unrealistic error must be addressed using (1).
This issue can be addressed by converting the gait phase into continuously varying variables x and y, as shown in (1). After the MLP calculates x and y, the current gait phase is obtained using the arctan2 function. A loss function is defined using the mean absolute error as follows: where n represents the number of training data points. The Adam optimizer was applied with a learning rate of 0.001 and 5000 iteration epochs, which were fixed throughout the study. The estimation algorithm was trained on a workstation with a single graphics processing unit (CPU: Intel Xeon CPU E5-2620 v4, 2.10 GHz, and 16 cores; GPU: NVIDIA GeForce RTX 2080 Ti). TensorFlow was used as the deep learning framework.

Optimization Algorithms
The main objectives of this study are (1) the selection of the best feature set and (2) obtaining the optimal LTWs. A new optimizing procedure is proposed to achieve these objectives simultaneously, as shown in Figure 4. The BGA selects the feature sets, and BO determines the optimal LTWs. Using this structure, both the TD features and LTWs can be effectively optimized in a single procedure.  To conduct this optimization, the estimation performance should be quantified. Therefore, the estimation performance was evaluated using the estimation error as follows: Appl. Sci. 2021, 11, 8940 Here, x i and y i are the MLP-predicted values, andx i andŷ i are the true values calculated from the true gait phase values. The root mean square error (RMSE) is divided by two because the range of x i and y i is between −1 and 1. Seventy percent of the timeseries data were used for training the MLP. Subsequently, the test error was computed using the remaining 30% of the data. The test error was used as the objective function to be minimized in the optimization procedure.

Binary Genetic Algorithm for Feature Selection
A BGA was used to select an optimal feature set for the target subject. The BGA generates different populations iteratively to create binary-type genes, each of which corresponds to one of the TD features. The BGA can be implemented using the following procedure. First, an initial population is generated. Then, the objective function value is calculated for each chromosome; the objective function is described in Section 3.2.2. The chromosome is a vector consisting of binary values, and each binary value implies the presence of a designated TD feature. Next, a new population is created by applying crossover and mutation to the chromosomes. Finally, the objective functions of the new population are evaluated. This procedure can be repeated until the generation number reaches a predefined value.
A feature pool for the BGA was constructed with 12 TD features: minimum, maximum, mean (µ), standard deviation (σ), initial (t i ), and final (t f ) values for the thigh angle and angular velocity. Specifically, the feature pool was constructed as (θ min , θ (t f ) ). If zero is allocated to a gene, the corresponding feature is not used in the gait phase estimation. If unity is allocated to a gene, the corresponding feature is used in the estimation. All features are composed of the values for the left and right thighs. For example, if µ θ is selected as a feature, then the mean values of both the left and right thigh angles (i.e., µ L,θ and µ R,θ ) are used in the gait phase estimation, as shown in Figure 5.
To conduct this optimization, the estimation performance should be quantified. Therefore, the estimation performance was evaluated using the estimation error as follows: Here, and are the MLP-predicted values, and and are the true values calculated from the true gait phase values. The root mean square error (RMSE) is divided by two because the range of and is between -1 and 1. Seventy percent of the timeseries data were used for training the MLP. Subsequently, the test error was computed using the remaining 30% of the data. The test error was used as the objective function to be minimized in the optimization procedure.

Binary Genetic Algorithm for Feature Selection
A BGA was used to select an optimal feature set for the target subject. The BGA generates different populations iteratively to create binary-type genes, each of which corresponds to one of the TD features. The BGA can be implemented using the following procedure. First, an initial population is generated. Then, the objective function value is calculated for each chromosome; the objective function is described in Section 3.2.2. The chromosome is a vector consisting of binary values, and each binary value implies the presence of a designated TD feature. Next, a new population is created by applying crossover and mutation to the chromosomes. Finally, the objective functions of the new population are evaluated. This procedure can be repeated until the generation number reaches a predefined value.
A feature pool for the BGA was constructed with 12 TD features: minimum, maximum, mean ( ), standard deviation ( ), initial ( ), and final ( ) values for the thigh angle and angular velocity. Specifically, the feature pool was constructed as (  ,  ,  , , ( ) , ( ) , , , , , ( ) , ( ) ). If zero is allocated to a gene, the corresponding feature is not used in the gait phase estimation. If unity is allocated to a gene, the corresponding feature is used in the estimation. All features are composed of the values for the left and right thighs. For example, if is selected as a feature, then the mean values of both the left and right thigh angles (i.e., , and , ) are used in the gait phase estimation, as shown in Figure 5.  In this study, the population size for each generation was fixed at 15, the maximum number of generations at 20, the crossover rate at 0.5, and the mutation rate at 0.05. Although Grefenstette et al. [40] suggested using 50 initial populations, this study selected  15 initial populations considering the small dimension of the feature pool. Consequently, the other parameters were adjusted based on [40]. Additionally, roulette-wheel crossover and bit-flip mutation were applied to the crossover and mutation methods, respectively. The roulette wheel is a widely used stochastic selection method in which the probability of an individual being selected is proportional to its fitness value [41]. This strategy enables the delivery of valuable individuals to the next generation with a high probability. The bit-flip mutation flips the binary value of an individual to generate a mutated chromosome [42]. Finally, one elite chromosome for each generation was preserved and passed to the next generation.

Bayesian Optimization for LTW
In this study, BO was used as the optimizer for the LTWs of the TD features, and the loss values of the test datasets, obtained by (2), were the values to be minimized. The BO of these loss values was realized using a surrogate model and an acquisition function. A Gaussian process was used as the surrogate model, and an expected improvement function was used as the acquisition function. The parameters of the BO algorithm implemented in this study are summarized in Table 2. After the BGA assigns a feature set to the BO, the BO determines the LTW value of each TD feature. Subsequently, the MLP is trained with the features calculated using the LTW, as shown in Figure 6. Next, the MLP returns the performance of the test dataset. Then, the BO selects new LTW values based on the previous results. The procedure described above is repeated until a predefined number of iterations are conducted. Notably, in this study, the BO was conducted with five random initial points for every feature set to obtain reliable results. Through this procedure, the BO is able to obtain the optimal gait estimation performance for the assigned feature set, and the corresponding optimal performance is returned to the BGA.
There are two special cases in which the BO needs to be operated differently. First, if every gene in a chromosome is zero, the MLP cannot estimate the gait phase. Second, if the selected TD feature set consists only of the final angle θ t f and angular velocity . θ t f value, the LTW determination is unnecessary because these final values are not affected by the LTW. Thus, BO does not iteratively optimize the LTWs in these cases.

Results
The proposed algorithm was evaluated using the walking and running motion data of five subjects. Because this study aims to optimize the gait estimation algorithm for each specific subject, the error of the proposed optimization was compared to that of a heuristic model.

Estimation Error of the Heuristic Model
A feature set and LTW are empirically determined in the heuristic model. Five feature sets were selected, as shown in Table 3. Feature set 1 is a set comprising 12 TD features; feature sets 2 and 3 consist of TD features extracted from the angle and angular velocity, respectively; feature sets 4 and 5 are randomly selected from the possible TD feature set. When training the MLP with feature sets 1-5, the LTW value was set to 300 ms, as suggested in a previous study [22].
The MLP was trained using a heuristically determined feature set and LTW. Seventy percent of the time-series data were used for training. The test error was computed using the remaining 30% of the data. The MLP was trained using five different initial values for weights and biases. The mean and standard deviations of the five errors were obtained and are presented in Tables 4 and 5. For walking, the error ranges from 1.1% to 1.6% in most cases, as presented in Table 4. The error is relatively large when feature set 3 is used, suggesting that angle information is essential for gait phase estimation. For running, the error is approximately 2%, which is relatively larger than that for walking, as shown in Table 5.
To quantitatively compare the performance of the heuristic model and the proposed optimizer, one of the five heuristic feature sets was selected and used for comparison. For walking, feature set 1 provides the smallest or second smallest error. Therefore, this set was used for the comparison. Similarly, feature set 2 was chosen as the representative heuristic set for running.

Results
The proposed algorithm was evaluated using the walking and running motion data of five subjects. Because this study aims to optimize the gait estimation algorithm for each specific subject, the error of the proposed optimization was compared to that of a heuristic model.

Estimation Error of the Heuristic Model
A feature set and LTW are empirically determined in the heuristic model. Five feature sets were selected, as shown in Table 3. Feature set 1 is a set comprising 12 TD features; feature sets 2 and 3 consist of TD features extracted from the angle and angular velocity, respectively; feature sets 4 and 5 are randomly selected from the possible TD feature set. When training the MLP with feature sets 1-5, the LTW value was set to 300 ms, as suggested in a previous study [22]. Table 3. Feature sets for comparison.
The MLP was trained using a heuristically determined feature set and LTW. Seventy percent of the time-series data were used for training. The test error was computed using the remaining 30% of the data. The MLP was trained using five different initial values for weights and biases. The mean and standard deviations of the five errors were obtained and are presented in Tables 4 and 5. For walking, the error ranges from 1.1% to 1.6% in most cases, as presented in Table 4. The error is relatively large when feature set 3 is used, suggesting that angle information is essential for gait phase estimation. For running, the error is approximately 2%, which is relatively larger than that for walking, as shown in Table 5. To quantitatively compare the performance of the heuristic model and the proposed optimizer, one of the five heuristic feature sets was selected and used for comparison. For walking, feature set 1 provides the smallest or second smallest error. Therefore, this set was used for the comparison. Similarly, feature set 2 was chosen as the representative heuristic set for running.

Error Reduction Due to Optimization
During the optimization procedure, the error value decreased and converged for all subjects, as shown in Figure 7. The final optimization results for walking and running are presented in Tables 6 and 7, respectively. The shaded boxes indicate that the corresponding feature was not included in the best feature set. While the initial angle θ(t i ) and angular velocity . θ(t i ) depend on the LTW, the final angle θ t f and angular velocity . θ t f are independent of the LTW. Thus, no LTW values are provided for θ t f and . θ t f . To assess the error reduction for walking, the error for feature set 1 is also provided in Table 6 (see the numbers in parentheses in the last row). For walking, when the error is averaged over all subjects, the average error is 0.910% after optimization. This subjectaverage error is 1.284% when feature set 1 is used. For running, the error for feature set 2 is provided in Table 7 (see the numbers in parentheses in the last row). The average error is 1.997% for feature set 2. The error decreased to 1.484% when using the proposed approach.
Additional case study is provided in Tables S1 and S2 in Supplementary Materials, which suggests the importance of feature selection on the estimation accuracy. It is worth noting that the optimal LTW is considerably different between some subjects. This difference is caused by the local minima of error over LTW. Details are provided in Figure S2 in Supplementary Materials.   A gray shadow represents a feature that is not selected in the best feature set. A hyphen indicates that this feature is selected, but no LTW value is required to calculate the corresponding feature value.
To assess the error reduction for walking, the error for feature set 1 is also provided in Table 6 (see the numbers in parentheses in the last row). For walking, when the error is averaged over all subjects, the average error is 0.910% after optimization. This subjectaverage error is 1.284% when feature set 1 is used. For running, the error for feature set 2 is provided in Table 7 (see the numbers in parentheses in the last row). The average error is 1.997% for feature set 2. The error decreased to 1.484% when using the proposed approach. Additional case study is provided in Tables S1 and S2 in Supplementary Materials, which suggests the importance of feature selection on the estimation accuracy.  A gray shadow represents a feature that is not selected in the best feature set. A hyphen indicates that this feature is selected, but no LTW value is required to calculate the corresponding feature value. A gray shadow represents a feature that is not selected in the best feature set. A hyphen indicates that this feature is selected, but no LTW value is required to calculate the corresponding feature value.

Discussion
In this study, the best feature set and its LTW were determined for each subject with BGA and BO. Some TD features were excluded from the best feature set, suggesting that some features were less useful than others. For example, the optimizer selects only six features among the 12 candidate features for subjects 1 and 3 for walking. For running, only four features were selected for subject 3. These small-sized feature sets show higher accuracy than other feature sets, which shows that optimization can also reduce the number of features. A smaller number of features reduces the computation time required for feature extraction. Table 8 lists the computation time for feature extraction. The computation cost for σ θ is considerably larger than that for the other features. Thus, a feature set that does not contain σ θ would be much faster than other feature sets with σ θ . Although the computational speed was not considered for optimization in this study, it could be included in future optimization studies. Table 8. Comparison of the computation time required for feature extraction with an LTW of 300 ms. The ratio represents the computation time normalized by the computation time required to extract θ(t i ). The best feature sets and optimal LTW values varied considerably across subjects, as shown in Tables 6 and 7. This finding implies that personal variations in features and LTWs are considerable; thus, personalized optimization is necessary for gait estimation. Furthermore, the best feature sets for walking and running are different, which indicates that the effects of features on the estimation are also affected by gait speed. Because the gait patterns of walking and running motions may differ considerably, the gait phase needs to be estimated using different TD features.
This study proposes a mathematical procedure to extract the TD features and LTWs. Previously, most studies selected a TD feature set without a systematic approach. For example, Kang et al. [22] used a single TD feature set for each subject. Farah et al. [23] changed the feature sets manually and compared the results. The proposed method is especially effective when the number of candidate features is very large. A grid search is a traditional feature selection approach in which all possible cases are verified to find an optimal set. However, this grid search requires a considerable amount of time if the number of features is large. Considering the time required for optimization, the proposed method is much more efficient than the grid search approach for large feature pools.

Conclusions
The gait phase estimation performance strongly depends on features. However, parameter determination for feature extraction is complicated. For gait estimation, TD feature set selection and LTW determination require different techniques. Previous studies determined the features by trial and error, which does not guarantee optimal features. To address this problem, this study proposes an optimizing procedure composed of a BGA and BO. This approach can effectively reduce the estimation error by simultaneously considering the feature set and LTWs. This study also revealed that the best feature set and optimal LTW vary across subjects and motion types (i.e., walking and running).
The proposed approach can be modified depending on its application. For example, if the gait phase is required for assistive robots, computation speed is the most important factor. In this system, the upper bound of the LTW can be lowered during the optimization process. Then, the time required to calculate the values of the TD features can be decreased. The maximum number of TD features can also be defined in order to reduce computation time. Penalty values can be allocated to individual features. Some features require a very short computation time, whereas other features require significant computational resources.
For instance, θ(t i ) and θ t f can be handled with very small computational resources. However, µ θ and σ θ require a relatively lengthy computation time, as shown in Table 8. Thus, the cost function of the optimizer can be modified to consider computation time. For example, a high cost can be applied if µ θ and σ θ are selected because their computation time is relatively long. This modification can provide different features that are optimized for computational speed.
Feature optimization for various walking (or running) speeds is required for practical use. In this study, optimization was conducted for gait motions where the speed remained constant over time. As shown in the optimization results, the optimal feature set and LTW changes according to walking (or running) speed. Thus, if the gait phase needs to be estimated at various speeds, the optimizer should consider the estimation errors over the target speed range. Alternatively, adaptive feature selection could be used as a scheme to handle speed variations. Walking (or running) speed can be classified into discrete levels. Then, the optimal feature set and LTW at each level is predefined. Subsequently, the estimation algorithm is able to handle speed variations by updating the feature set and LTW according to the walking (or running) speed.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/ 10.3390/app11198940/s1, Table S1: Effects of optimization on the test error for walking, Table S2: Effects of optimization on the test error for running, Table S3: Estimation result with different MLP nodes for walking, Table S4: Estimation result with different MLP nodes for running, Figure S1: BO result for the LTW of θ min , Reference [22].

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: This study has used data measured in a previously published study.

Data Availability Statement:
The data supporting the results of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.