3.1. Data Collection
We recruited four undergraduate students (one female, three male) for the experiment. The participants were 20 to 23 years old, weighed 45 kg to 75 kg, and were between 155 cm and 180 cm tall. All participants could walk normally and had no conditions affecting their gait.
The wearable sensors were triaxial acceleration sensors. Owing to improvements in sensor manufacturing and optimized detection algorithms, more data can now be obtained with fewer sensors [33], which is one of the main reasons inertial sensors have been increasingly used for gait event detection in recent years.
The acceleration sensors were placed on the instep [34], calf [19], and thigh [18]. The orientation of the acceleration sensors on the instep, calf, and thigh is shown in Figure 1a. The acceleration resolution of the triaxial accelerometer was 6.1 × 10−5 g, the attitude measurement stability was 0.01°, and the baud rate range was 2400–921,600 bps. The experiment used a baud rate of 115,200 bps and a sampling frequency of 50 Hz.
The toggle switch of the triaxial acceleration sensor was used to start collecting the acceleration data. Participants were then asked to walk for at least 120 s on the pre-configured treadmill. The treadmill speeds were set to 0.78 m/s, 1.0 m/s, and 1.25 m/s, and participants walked normally three times on the treadmill at each speed, with identical settings in each trial. To prevent fatigue from affecting gait, participants rested for 2 min after each walking test. Data were saved only once the treadmill reached the set speed; likewise, data collection stopped when the trial ended and the treadmill began to slow down. Each participant performed the same experiment under the same conditions to ensure the reliability and validity of the collected data.
3.2. Data Preprocessing
Because each dataset has multiple characteristics, including the acceleration in the X-, Y-, and Z-directions, a large number of features would make the DNN model very complex. PCA performs dimensionality reduction, transforming many indices into a few comprehensive indices (principal components). The principal components retain most of the information of the original variables, and the information carried by different principal components is uncorrelated. This method reduces many complex factors to a few principal components, which simplifies the problem. In this study, PCA was used to compress the data and generate a synthetic variable Comp. Comp is expressed by Equation (1), where a_x, a_y, and a_z represent the acceleration in the X-, Y-, and Z-directions, respectively, and z1, z2, and z3 represent the corresponding coefficients of the acceleration in the three directions. The distribution of z1, z2, and z3 for the different body parts at the different walking speeds is shown in Table 1. The acceleration curves in the three directions and the composite acceleration curve collected in the walking experiment are shown in Figure 2.
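As an illustration, the composite variable Comp can be obtained by projecting the three-axis acceleration onto the first principal component. The sketch below uses NumPy directly rather than the toolchain used in the study, and the function name is illustrative:

```python
import numpy as np

def pca_composite(acc_xyz):
    """Project three-axis acceleration onto its first principal
    component, yielding one synthetic variable per sample (Comp).

    acc_xyz : (n_samples, 3) array of X-, Y-, Z-acceleration.
    Returns (comp, coeffs): the composite signal and the
    coefficients z1, z2, z3 of the linear combination.
    """
    centered = acc_xyz - acc_xyz.mean(axis=0)
    # Covariance matrix of the three axes
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; the last column is the
    # direction of maximum variance (the first principal component)
    coeffs = eigvecs[:, -1]
    comp = centered @ coeffs
    return comp, coeffs
```

By construction, the variance of Comp is at least as large as the variance along any single axis, so the one-dimensional signal retains most of the information of the three axes.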
During normal walking, the acceleration signals in the three directions at the foot, thigh, and calf are periodic. The swing phase accounts for approximately 40% of the gait cycle and the stance phase for approximately 60% [35], so the stance phase can be taken as the largest phase of the walking cycle. Following the gait phase division of Miguel et al. [27], the phase division used in this paper is shown in Figure 3. Feature extraction is then used to extract meaningful information from the acceleration signals, after which a classification algorithm assigns each segment to the corresponding gait phase.
Feature extraction is an effective and important method for extracting meaningful information from acceleration signals. The features extracted in this study are the standard deviation (SD), mean absolute value (MAV), maximum (MAX), minimum (MIN), and median (MED). These features were selected because some of them were used by Miguel et al. in their study on acceleration signals [27].
Su et al. [
36] proposed the use of single and multiple feature sets to achieve high accuracy. Therefore, to assess multi-feature performance and improve recognition accuracy, SD, MAX, MIN, MED, and MAV were combined to form the input feature vector. Because PCA had already reduced the dimensionality, the number of input feature vectors for the neural network is relatively low.
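The five features can be computed per signal window as follows; this is a minimal sketch, and the function names are illustrative rather than taken from the study:

```python
import numpy as np

def extract_features(window):
    """Compute the five statistical features used in this study
    for one window of a (composite) acceleration signal."""
    window = np.asarray(window, dtype=float)
    return {
        "SD":  window.std(),           # standard deviation
        "MAV": np.abs(window).mean(),  # mean absolute value
        "MAX": window.max(),
        "MIN": window.min(),
        "MED": np.median(window),
    }

def feature_vector(window):
    """Combine the five features into one input vector."""
    f = extract_features(window)
    return np.array([f["SD"], f["MAV"], f["MAX"], f["MIN"], f["MED"]])
```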
3.3. Gait Phase Division
A person’s walking process is a rhythmic movement, and a complete walking cycle runs from one heel striking the ground to the next strike of the same heel [35]. During the walking cycle, the legs support the body’s weight during forward movement. Taborri et al. [
11] divided the gait phase into four phases: (i) heel strike (HS); (ii) flat foot (FF); (iii) heel-off (HO); and (iv) swing (SW). This division is relatively detailed and makes the system more complex, which can lower the recognition accuracy. Mileti et al. [37] also divided the gait cycle into four phases, namely, loading response (LR), flat foot (FF), pre-swing (PS), and swing (SW), but there were large errors in the SW and PS phases. Therefore, to improve recognition accuracy, fewer divisions are usually preferable, and most researchers group the cycle into stance and swing phases [16]. Following the above classifications, and to keep the division of the gait cycle sound, the walking cycle is divided here into the stance and swing phases, where the stance phase comprises the HS, FF, and HO phases. The gait phase division is shown in Figure 3. The first 60% of the signal of a complete cycle belongs to the stance phase and the remaining 40% to the swing phase.
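The 60/40 split can be expressed as a simple labeling rule; the function name and the normalized cycle-position input are illustrative assumptions:

```python
def phase_label(cycle_position):
    """Label a sample by its position within one gait cycle
    (0.0 = heel strike, 1.0 = next heel strike of the same foot).
    The first 60% is the stance phase, the remainder the swing phase."""
    return "stance" if cycle_position < 0.6 else "swing"
```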
The data then need to be segmented. The data collected by the inertial sensors form a stream distributed over time; features cannot be extracted and classified from it directly, so the data must be segmented first. A common approach is the multi-dimensional sliding window segmentation method [38], in which the acceleration signal is divided into sections carrying different gait phase information. Sliding windows can be overlapping or separated; overlapping windows usually reduce the influence of the transition process on the detection accuracy [39], so a window with 50% overlap is used in most studies [38]. In this study, the sliding window uses a 50% overlap to evaluate the gait phase. Figure 4 shows a schematic diagram of the sliding window segmentation.
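A 50% overlapping sliding window of the kind described above can be sketched as follows; the helper name and the window size in the example are illustrative, since the paper does not state its window length here:

```python
import numpy as np

def sliding_windows(signal, window_size, overlap=0.5):
    """Segment a 1-D signal stream into windows with the given
    fractional overlap (0.5 = each window shares half its samples
    with the previous one). Trailing samples that do not fill a
    complete window are dropped."""
    signal = np.asarray(signal)
    step = max(1, int(window_size * (1.0 - overlap)))
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])
```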
3.4. Proposed LSTM-DNN Algorithm
The LSTM network is based on the recurrent neural network (RNN). Hochreiter et al. [40] proposed the LSTM network by adding gates such as the input gate, forget gate, and output gate, together with a memory unit, which solved the vanishing and exploding gradient problems of the RNN and improved the network’s ability to process long sequences. The difference between an LSTM and an ordinary feed-forward neural network is that the nodes between the hidden layers are connected: the input of the hidden layer at a given time contains not only the current input but also the output of the same hidden layer at the previous time step. The historical information of the time series is thus stored in the hidden layer of the network [41].
The LSTM can be used to process sequence data. Since the collected acceleration signal is a time series, it can be processed by the LSTM network. Unlike traditional classifiers such as the SVM, the LSTM captures feature vectors conveniently and automatically: the feature vectors enter the network directly and a classification model is established, whereas a traditional classification model requires more time to extract the feature vectors, which can lead to failures in the data preprocessing stage [42].
The LSTM algorithm model is shown in Figure 5 [43], where x_t is the input feature vector of the current cell, h_{t-1} is the output of the previous cell, σ is the sigmoid function, f_t is the forget gate that decides which information to discard, and C_{t-1} is the cell state before the update. The output of f_t is a real number between 0 and 1: 0 means all the information of the previous cell is forgotten, and 1 means all of it is retained and applied to C_{t-1}. The calculation of the forget gate is defined in Equation (2); information is discarded when the input passes through the forget gate. The input gate consists of two parts, a sigmoid layer and a tanh layer. The sigmoid layer decides which information to update and the tanh layer creates the vector of candidate update content; the two parts are then combined, as defined in Equations (3) and (4). After the information passes through the forget gate and the input gate, the cell state is updated as defined in Equation (5): the old state C_{t-1} is updated to C_t. Finally, the output gate also includes a sigmoid layer and a tanh layer. The sigmoid layer determines which information is output and the tanh layer processes the information of the cell state; the two parts are combined as the output of the output gate, as shown in Equations (6) and (7), where W_f, W_i, W_C, and W_o represent the weights applied to the input of the current feature vector by each control gate and b_f, b_i, b_C, and b_o represent the bias terms of the control gates.
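Equations (2)–(7) are not reproduced in this text; for reference, they correspond to the standard LSTM gate formulation, shown here in the usual notation (the exact layout in the paper is assumed, not copied):

```latex
\begin{align}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{(2) forget gate}\\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{(3) input gate}\\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{(4) candidate state}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(5) cell-state update}\\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{(6) output gate}\\
h_t &= o_t \odot \tanh(C_t) && \text{(7) output}
\end{align}
```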
DNN algorithms have been widely used in industry and academia in recent years and provide good recognition rates. As an important machine learning method, the DNN performs particularly well in multi-label classification [44]. This study focuses on the identification of two classes, namely, the stance phase and the swing phase. The training dataset contains five features extracted from the subjects’ gait events.
The DNN is a feed-forward artificial neural network consisting of an input layer, an output layer, and at least two hidden layers [45]. Several important parameters need to be set, including the gradient descent optimizer, learning rate, activation function, and number of training steps. In most machine learning tasks, the objective is to minimize a loss; once the loss is defined, the next task is to select the optimizer. Gradient descent is the common optimization method in deep learning, and an optimizer is essentially a refinement of the gradient descent algorithm. The optimizer chosen in this study is the AdaGrad algorithm. The original gradient descent algorithm converges slowly near the bottom of a slope, which AdaGrad avoids: in directions with steep gradients, the learning rate decays faster, which drives the parameters toward the bottom of the slope and accelerates convergence [46].
The learning rate has a crucial influence on the quality of the learned model. The smaller the learning rate, the finer the learning, but the learning speed is also reduced, which easily causes overfitting and reduces the generalization ability of the trained model. Conversely, if the learning rate is too high, the span of each step is too large and the model misses important information during training, resulting in poor detection accuracy [47]. The learning rate is generally related to the selected optimizer, the neural network model, the number of training steps, and the size of the sample set. For the LSTM-DNN algorithm, learning rates of 0.001, 0.003, 0.005, 0.008, 0.01, 0.03, 0.05, 0.08, 0.1, 0.5, 0.7, and 0.8 were tested; with a learning rate of 0.5, every evaluation index was relatively high. There are three multi-label hidden layers. In a multilayer neural network, the functional relationship between the output of an upper node and the input of a lower node is called the activation function. In this study, the ReLU function is used as the activation function of the neurons. An important feature of the AdaGrad optimizer is that it automatically adjusts the learning rate, which helps avoid the dead-ReLU problem caused by a very large learning rate. The softmax classifier is an extension of the logistic regression model to multi-class problems, and the softmax function is generally used as the activation function of the output layer [42]. As is evident from Equation (8), the softmax function maps the outputs of multiple neurons to the interval (0, 1); the node with the highest probability is selected as the prediction target of the model, thus achieving multi-class classification.
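The softmax mapping of Equation (8) can be sketched as:

```python
import numpy as np

def softmax(logits):
    """Map the raw outputs of the final layer to probabilities
    in (0, 1) that sum to 1; the node with the highest
    probability is taken as the predicted class."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()
```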
The acceleration signal obtained in this study is a time series; therefore, it can be input directly into the LSTM network for preprocessing. The output of the LSTM network is then fed into the DNN, which refines the LSTM output signal and improves accuracy. The structure of the LSTM-DNN is shown in Figure 6. The pre-processed data consisted of 18 parameters, and a total of 198,930 learning data points were processed with the LSTM-DNN algorithm. The LSTM network includes two important parameters, num_units and forget_bias: num_units is the number of neurons in a cell and forget_bias is the bias of the forget gate. In this study, num_units is 36 and forget_bias is 0.7. The LSTM-DNN algorithm used in this study is also a multi-layer perceptron with a basic DNN architecture. The feed-forward DNN consists of one input layer, two hidden layers, and one output layer. The input vector P of the input layer is expressed as p1, p2, ..., p18, and each input is weighted by the corresponding entries w11, w21, ..., w181 of the weight matrix. The essence of the LSTM-DNN algorithm is to integrate the DNN and the LSTM network. Although the LSTM-DNN algorithm performs well in time series processing, it has a simple network structure owing to its small number of neurons. In the training process, the predicted value of the DNN is continuously fed back to the LSTM network; because the DNN is used, the weights of both the LSTM network and the DNN change during training. As a result, fluctuations in the LSTM network weights late in training are reduced, thereby stabilizing the network structure. First, the LSTM network processes the output of the previous cell and the input feature vectors of the current cell. Then, the DNN uses the LSTM network’s output signal as its input, and the weights of both the DNN and the LSTM network are adjusted during training. Finally, the trained network structure is used for gait detection, with classification performed by the softmax classifier.
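A minimal NumPy sketch of the forward pass of such an LSTM-DNN follows, using the dimensions stated in the text (18 input parameters, 36 num_units, a forget_bias of 0.7, two hidden layers, and two output classes). The weights are random placeholders rather than trained values, and the hidden-layer width is an assumption; a real model would learn these weights with AdaGrad:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMDNN:
    """Forward pass only: an LSTM cell with 36 units feeds a small
    feed-forward DNN (two ReLU hidden layers) ending in a 2-class
    softmax (stance vs. swing)."""

    def __init__(self, n_in=18, n_units=36, n_hidden=16, n_out=2,
                 forget_bias=0.7):
        z = n_in + n_units
        # One weight matrix per gate: forget, input, candidate, output
        self.Wf, self.Wi, self.Wc, self.Wo = (
            rng.normal(scale=0.1, size=(n_units, z)) for _ in range(4))
        self.bf = np.full(n_units, forget_bias)  # forget-gate bias
        self.bi = np.zeros(n_units)
        self.bc = np.zeros(n_units)
        self.bo = np.zeros(n_units)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_units))
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        self.W3 = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.n_units = n_units

    def forward(self, sequence):
        h = np.zeros(self.n_units)
        C = np.zeros(self.n_units)
        for x_t in sequence:                 # LSTM cell, Equations (2)-(7)
            zt = np.concatenate([h, x_t])
            f = sigmoid(self.Wf @ zt + self.bf)
            i = sigmoid(self.Wi @ zt + self.bi)
            c_tilde = np.tanh(self.Wc @ zt + self.bc)
            C = f * C + i * c_tilde
            o = sigmoid(self.Wo @ zt + self.bo)
            h = o * np.tanh(C)
        a1 = np.maximum(0.0, self.W1 @ h)    # DNN hidden layer 1 (ReLU)
        a2 = np.maximum(0.0, self.W2 @ a1)   # DNN hidden layer 2 (ReLU)
        return softmax(self.W3 @ a2)         # stance / swing probabilities
```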
We compared the results of the LSTM-DNN with those of mature, traditional algorithms, namely the k-nearest neighbor (KNN) method and the SVM, as implemented in the scikit-learn (sklearn) package. To make the classification results comparable, the PCA-processed dataset was also used as the input to the KNN and SVM classifiers. The SVM separates classes with hyperplanes learned from the samples; the algorithm maximizes the margin between two classes, thereby differentiating the categories as much as possible. The choice of kernel function is crucial to the performance of the SVM. The main kernel functions of the SVM are the linear, RBF, polynomial, and sigmoid kernels. To determine the kernel function at each speed, all four kernels were tested separately; the results are shown in Table 2. In the KNN classification algorithm, if most of the K nearest samples in the feature space belong to a certain category, the sample is assigned to that category. The KNN algorithm is suitable for multi-class classification. The distance metric chosen in this paper is the Euclidean distance. Using the KNN algorithm also involves the parameter K; to find a suitable value, K values of 2, 5, 7, 10, 15, 20, and 30 were tested at the three paces. The classification results for the different K values are shown in Table 3. The final K value was then determined using the accuracy and F-score.
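The KNN voting rule with the Euclidean distance can be sketched as follows; this is a minimal illustration, not the scikit-learn implementation used in the comparison:

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k=7):
    """Classify sample x by majority vote among its K nearest
    training samples under the Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)  # distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the K nearest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]            # majority class
```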
In this experiment, a total of more than 640,000 acceleration samples were collected. To avoid overfitting, the data from three subjects were used for training and the data from the remaining subject were used for testing: we randomly selected three subjects’ data to train the LSTM-DNN model and the comparison models, and used the remaining subject’s data to verify the performance of the models. The final model parameters were determined after 15,000 training iterations. The overall process is shown in Figure 7.