A Human Activity Recognition Algorithm Based on Stacking Denoising Autoencoder and LightGBM

Recently, the demand for human activity recognition has become more and more urgent. It is widely used in indoor positioning, medical monitoring, safe driving, etc. Existing activity recognition approaches require either the location information of the sensors or the specific domain knowledge, which are expensive, intrusive, and inconvenient for pervasive implementation. In this paper, a human activity recognition algorithm based on SDAE (Stacking Denoising Autoencoder) and LightGBM (LGB) is proposed. The SDAE is adopted to sanitize the noise in raw sensor data and extract the most effective characteristic expression with unsupervised learning. The LGB reveals the inherent feature dependencies among categories for accurate human activity recognition. Extensive experiments are conducted on four datasets of distinct sensor combinations collected by different devices in three typical application scenarios, which are human moving modes, current static, and dynamic behaviors of users. The experimental results demonstrate that our proposed algorithm achieves an average accuracy of 95.99%, outperforming other comparative algorithms using XGBoost, CNN (Convolutional Neural Network), CNN + Statistical features, or single SDAE.


Introduction
With the development of the healthy life and smart home concept, human activity recognition (HAR) has been increasingly studied and applied in Human-Computer Interaction (HCI), and Mobile and Pervasive Computing [1]. One of the purposes of HAR is indoor positioning [2]. As the landmark of indoor positioning, elevators and escalators detect whether a human is currently taking them by judging moving modes to calibrate the indoor positioning results. Human physical motion recognition can also be used in indoor navigation by combining with the wireless signals [3]. Another feasible objective of HAR is static behavior recognition for safe driving [4], and scientific exercise [5]. Moreover, HAR can also be used for dynamic behavior recognition in healthcare monitoring [6]. This process will detect whether the patients or the elderly experience a sudden fall and raise the alarm promptly to protect the personal safety of the users. In addition, other applications include bilateral links for advertising, entertainment, games, and multimedia visualization guidance [7,8].
At present, HAR methods are mainly divided into vision-based HAR [9,10] and sensor-based HAR. Vision-based HAR has high recognition accuracy, but it brings with it high power consumption. Although many works focus on HAR, there are still many deficiencies in accuracy, latency, and power consumption. The observation noise of the sensor is a key reason for low recognition accuracy. Recently, the stacked autoencoder (SAE), as a classical unsupervised learning algorithm, has shown feature extraction [40,41] and data compression [42,43] performance that matches the current state of the art [41]. Vincent et al. [44,45] modified the traditional SAE to learn useful features from corrupted data and developed the stacked denoising autoencoder (SDAE), which eliminates sensor observation noise by signal reconstruction. The SDAE model therefore has the potential to eliminate noise and extract robust unsupervised features in practice. Nevertheless, few researchers have used SDAE as an independent feature extraction module in HAR. A deep convolutional autoencoder (CAE) network proposed in [46] utilizes an autoencoder to initialize the weights of the following convolutional layers. Another network, named AE-LRCN [47], uses an autoencoder layer to remove the inherent noise of the input data. Thus, it is worthwhile to carry out the task of HAR based on features extracted by SDAE.
Unlike traditional HAR algorithms, this paper proposes a fusion method of the Stacked Denoising Autoencoder [45] and LightGBM [48] for human activity recognition based on smartphone inertial sensor data, and highlights the classification of different daily activities under three typical scenarios: human moving modes, current static behavior, and current dynamic behavior. The main contributions of this paper are as follows:

• We propose a method which combines the feature extraction ability of deep learning with the classification ability of decision trees. We take advantage of SDAE to filter the occasional sensor noise (caused by the low-cost MEMS and complex human activities) and use the automatically obtained features for accurate human activity recognition. The Boosting K-Fold LGB is then used to realize accurate classification of user behaviors.

• We propose a small trick for k-fold cross-validation based on the idea of Boosting. By duplicating the misclassified samples in the validation set of the previous fold, the attention paid to the error samples of the n th fold can be increased in the (n + 1) th fold of training.

• We selected four datasets under three typical application scenarios to verify the algorithm proposed in this paper and to show that the model achieves high accuracy on multiple datasets and multi-class classification problems.

• We also implemented state-of-the-art algorithms based on XGB [49], CNN [50], CNN + statistical features [13], and single SDAE [2], and compared them with the proposed algorithm on the same datasets.

For convenience of description, we introduce the notations used in this paper. We use N for the number of sensors. The original data collected from the k th sensor, k = 1, 2, . . . , N, has dimension d_k. After applying a sliding window with a size of w and a stride of s, the data of the k th sensor is divided into N samples, each of which is represented by x_i^k, i = 1, 2, . . . , N, x_i^k ∈ R^(w×d_k). After splicing and standardization, the input of the k th SDAE model is obtained, represented by →x_i^k.

Data Pre-Process
The data pre-process aims to change the sensor data collected at a fixed frequency into the input of the SDAE network. The specific processing process is described below.
Data Segmentation. In each experiment conducted by each person, the result is a sensor data sequence which has indefinite length. A sliding window with a size of 2.56 s is used to capture sample data on the different datasets. The final shape of the samples is shown in Section 3.1.
Data Reshaping. To match the input shape of SDAE, it is necessary to reshape the sample obtained in the previous step. In this paper, we use the axis as the module to splice. If the sample shape after the first step is (N , P , M ), then the reshaped sample shape is (N , P × M ).
Data Standardization. This paper adopted the max-min standardization method before feeding the sample data into the SDAE network. The following procedure is performed on the i th column of data (x[:, i], i = 1, 2, · · · , P × M) of the samples obtained from the previous step:

x[:, i] = (x[:, i] − min(x[:, i])) / (max(x[:, i]) − min(x[:, i]))

The pseudo code of Algorithm 1 shows the data pre-process implementation. This algorithm receives a time-series matrix d = [d_1, · · · , d_i, · · · , d_n], a variable l representing the size of the sliding window, and a variable s representing the stride of the sliding window. For matrix d, d_i is an m-dimensional vector where m represents the total number of axes of all the sensors. The algorithm output is a list named "finalsamples" with a shape of [⌊n/s⌋ − 1, l, m].
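The three pre-processing steps above (segmentation, reshaping, standardization) can be sketched as follows. This is an illustrative re-implementation, not the paper's Algorithm 1; the function name `preprocess` and the NumPy representation are our own assumptions.

```python
import numpy as np

def preprocess(d, l, s):
    """Hypothetical sketch of the pre-process: sliding window -> reshape -> max-min scaling.

    d : (n, m) array of n time steps from m sensor axes.
    l : sliding-window size; s : sliding-window stride.
    Returns an array of shape (num_windows, l * m), scaled column-wise to [0, 1].
    """
    n, m = d.shape
    # Data segmentation: cut the sequence into (possibly overlapping) windows.
    windows = np.stack([d[i:i + l] for i in range(0, n - l + 1, s)])
    # Data reshaping: flatten each (l, m) window into a single l*m vector.
    samples = windows.reshape(len(windows), l * m)
    # Data standardization: max-min scaling per column (guarding constant columns).
    lo, hi = samples.min(axis=0), samples.max(axis=0)
    return (samples - lo) / np.where(hi > lo, hi - lo, 1.0)
```

With l = 256 and s = 128 this reproduces the 50%-overlapping 2.56 s windows used later for the moving-modes datasets.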

Unsupervised Feature Extraction
In this section, we introduce unsupervised feature extraction based on SDAE and demonstrate the effectiveness of the extracted features.
Feature Extraction

SDAE comes from a deep network scheme which stacks multiple denoising autoencoders together to learn complicated features [45]. Each denoising autoencoder consists of four layers: the input, imnoise, hidden, and output layers. The hidden layer and the output layer are called the encoding layer and the decoding layer, respectively. To thoroughly learn the data variation rules of each sensor, we construct a separate SDAE network for each sensor. Each sensor's data is passed into its SDAE network separately for data forward propagation and backward parameter learning.
Assume the k th SDAE has n_k layers; at the l th layer, a complete set of encoding-decoding operations is performed. Given the pre-processed vector of the k th sensor →x_i^k, the imnoise layer of the denoising autoencoder first transforms it by

→x_{i,noi}^{k,l} = f_noi(→x_i^{k,l}; θ_noi^{k,l})

where f_noi(·) is the noising function and θ_noi^{k,l} is the probability of dropout in this paper. By using the dropout layer, a certain number of input sensor values are randomly chosen and forced to be zero. The encoding layers are trained to fill in these blanks and reconstruct the corrupted sensor inputs. Let →x_{i,e}^{k,l} be the output of the encoding layer, calculated by

→x_{i,e}^{k,l} = f_enc(→x_{i,noi}^{k,l}; θ_enc^{k,l})

where f_enc(·) represents the encoding function and θ_enc^{k,l} contains the noised-to-hidden parameters. The →x_{i,e}^{k,l} obtained by the encoding function is the feature learned at the current layer. Let f_dec represent the decoding function and θ_dec^{k,l} the decoding parameters; the final output of the denoising autoencoder is then

→x_{i,d}^{k,l} = f_dec(→x_{i,e}^{k,l}; θ_dec^{k,l})

Here θ_noi^{k,l} is a hyperparameter that needs to be defined manually, while θ_enc^{k,l} and θ_dec^{k,l} are parameters trained through the back-propagation process, where the loss function is defined to minimize the mean square error between the decoded data and the input data:

(θ_enc^{k,l*}, θ_dec^{k,l*}) = arg min ‖→x_{i,d}^{k,l} − →x_i^{k,l}‖²

In this so-called "denoising" way, we can reduce the influence of the inherent noise of the sensor data collected by smartphones and focus on retrieving the information we need, the so-called "useful features". In the stacked structure, once the l th layer is trained, the SDAE scheme leverages its outputs to train the (l + 1) th layer.
After fine-tuning of the layers, we obtain the final "useful features" by

→x_{i,e}^k = f_enc(· · · f_enc(→x_i^k; θ_enc^{k,1}); · · · ; θ_enc^{k,n_k})

When all the feature extraction tasks are completed, the features of each sensor are spliced together again to form the final feature vector →x_{i,e}. Together with the label y_i, i = 1, 2, . . . , N, it forms the input of the supervised classification layer.
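As a concrete illustration of one encoding-decoding pass, the following NumPy sketch implements the imnoise, encoding, and decoding steps for a single denoising-autoencoder layer. The sigmoid activations, the random dropout mask, and the function name are our own assumptions for illustration; training (back-propagation over θ_enc, θ_dec) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_forward(x, W_enc, b_enc, W_dec, b_dec, p_noise=0.2):
    """One denoising-autoencoder layer: corrupt, encode, decode, and
    report the reconstruction error that training would minimise."""
    mask = rng.random(x.shape) > p_noise      # f_noi: dropout with probability p_noise
    x_noisy = x * mask                        # corrupted input
    h = sigmoid(x_noisy @ W_enc + b_enc)      # f_enc: hidden features
    x_rec = sigmoid(h @ W_dec + b_dec)        # f_dec: reconstruction of the clean input
    loss = np.mean((x_rec - x) ** 2)          # mean squared reconstruction error
    return h, x_rec, loss
```

Stacking simply feeds the hidden features h of the trained layer l in as the input of layer l + 1.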

Analysis of the Feature Performance
To evaluate the influence of the features extracted by SDAE, we leverage the inner-class dispersion matrix and the outer-class dispersion matrix to describe the distribution of the samples.
The inner-class dispersion matrix of the class Ω_i is defined as

S_W^{(i)} = Σ_{x ∈ X^{(i)}} (x − m^{(i)})(x − m^{(i)})^T

where X^{(i)} represents the i th class sample set and m^{(i)} is the mean of all the samples in X^{(i)}. The total inner-class dispersion matrix is defined as

S_W = Σ_{i=1}^{M} P(Ω_i) S_W^{(i)}

where M denotes the number of sample classes and P(Ω_i) is the proportion of the i th class samples in the total number of samples. For the outer-class dispersion, the dispersion matrix between the i th and j th classes is defined as

S_B^{(ij)} = (m^{(i)} − m^{(j)})(m^{(i)} − m^{(j)})^T

and the total outer-class dispersion matrix as

S_B = Σ_{i=1}^{M} P(Ω_i)(m^{(i)} − m)(m^{(i)} − m)^T

where m is the global mean vector. We use the trace of a dispersion matrix as a measure of sample divergence. That is, tr(S_W^{(i)}) represents the divergence of the samples in the i th class from their mean vector, and tr(S_W) is the mean measure of the feature variance over all classes. S_B^{(ij)} denotes the dispersion between the i th and j th classes, while S_B measures the mean dispersion between the class means and the global mean vector. The features extracted by the SDAE should therefore make the inner-class divergence as small as possible and the outer-class divergence as large as possible.
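The two trace measures above can be computed directly from a labeled feature matrix. The sketch below follows the definitions term by term (class priors P(Ω_i), class means m^(i), global mean m); the function name is our own.

```python
import numpy as np

def divergence_traces(X, y):
    """tr(S_W) and tr(S_B) for features X (n, d) and integer labels y (n,).

    Good features give a small inner-class trace and a large outer-class trace.
    """
    m = X.mean(axis=0)                       # global mean vector
    classes, counts = np.unique(y, return_counts=True)
    tr_sw = tr_sb = 0.0
    for c, n_c in zip(classes, counts):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                 # class mean m^(i)
        p = n_c / len(X)                     # prior P(Omega_i)
        tr_sw += p * np.sum((Xc - mc) ** 2)  # trace of P(Omega_i) * S_W^(i)
        tr_sb += p * np.sum((mc - m) ** 2)   # trace of the outer-class term
    return tr_sw, tr_sb
```

The trace shortcut works because tr((x − m)(x − m)^T) equals the squared Euclidean distance ‖x − m‖².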
We selected six types of sample data and compared the sample dispersion before and after using SDAE. Table 2 lists the total inner- and outer-class divergence of the original data and the extracted features. The inner-class divergence has been reduced by nearly 97.5% from the original data to the extracted features. Although the outer-class divergence also decreases, the proportion of the decline is much smaller than that of the inner-class divergence. For the original data, the inner-class divergence is greater than the outer-class divergence, but for the extracted features, the outer-class divergence is more than four times the inner-class divergence.

Tables 3 and 4 list the specific inner- and outer-class divergence of the original data and the extracted features. The upper-left corner shows the calculation results for the original data, and the lower-right corner those for the extracted features. As can be seen from the comparison, the features extracted by SDAE have a significant effect. For example, the inner-class divergence of WALK is 10.50, larger than the outer-class divergence between WALK and SIT on the original data. After feature extraction, the inner-class divergence drops to 0.09 while the outer-class divergence becomes 0.49, more than five times as large. Therefore, both the total and the specific divergence demonstrate that the SDAE has obvious advantages in excavating the hidden features of various types of data.

To visually verify the validity of the features extracted by the SDAE model, we selected several features and plotted the numerical distribution of all the samples in each category on a given feature, as shown in Figure 2. As shown in Figure 2b, although the distribution of the sixth feature value among the three dynamic categories overlaps to a certain extent, it still has strong classification ability. For example, 93% of the values of the WALK class are distributed between 0.74 and 0.80, compared with 61% for the WALKUP class and only 43% for WALKDOWN.
As for the sixty-ninth feature on the three static classes shown in Figure 2c, the classification effect is particularly noticeable. The feature values of the LAY class are distributed between 0.4 and 0.9, with no overlap with the other classes. For STAND and SIT, there is only 38% overlap. Although a single feature shows a certain classification ability, it still has many limitations. Therefore, we need a powerful classifier to handle the 90-dimensional features learned by SDAE to achieve the best classification effect.

Supervised Classification
To make full use of the features extracted by SDAE for high-precision classification, we selected the LGB algorithm as a supervised classification method. This section gives a simple introduction to the advantages and calculation methods of LGB, and explains the Boosting K-fold algorithm proposed in this paper.

Classification Algorithm
With the labeled training dataset C = {(→x_{i,e}, y_i)}_{i=1}^{N} gained from the unsupervised feature extraction layer, the LGB algorithm is used for classification.
LGB is a GBDT implementation with Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) that meets the requirements of efficiency and scalability for high-dimensional data at large scale. Research shows that LGB speeds up the training process of conventional GBDT by up to 20 times while achieving almost the same accuracy [48].

Boosting K-fold
In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data.
In this paper, we bring the idea of boosting into the process of five-fold cross-validation. The change process of the dataset is shown in Figure 3. In this algorithm, the five-fold cross-validation is a serial process. At first, the original data, as shown in Figure 3a, is divided into five parts. During the first fold of training, the samples that are misclassified in the validation set are selected (Figure 3b). In the second fold of training, the misclassified samples are first copied, to increase their weight, and are then added to the training set (Figure 3c). Likewise, the error samples in the validation set are marked and copied for the third fold of training (Figure 3d). This process repeats until all training is completed.
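The per-fold duplication can be sketched independently of LGB itself. The snippet below is a simplified rendition of this idea, not the paper's Algorithm 2: `make_model` is any hypothetical classifier factory with a fit/predict interface (LightGBM in the paper), and the carried-over misclassified samples are simply concatenated into the next fold's training set.

```python
import numpy as np

def boosting_kfold(X, y, make_model, k=5, seed=0):
    """Boosting K-Fold: misclassified validation samples of fold i are
    duplicated into the training set of fold i + 1."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    carry_X = np.empty((0,) + X.shape[1:], X.dtype)
    carry_y = np.empty((0,), y.dtype)
    models = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Boost: append the previous fold's misclassified validation samples.
        model = make_model()
        model.fit(np.concatenate([X[train], carry_X]),
                  np.concatenate([y[train], carry_y]))
        pred = model.predict(X[val])
        wrong = pred != y[val]
        carry_X, carry_y = X[val][wrong], y[val][wrong]
        models.append(model)
    return models
```

Because every sample still appears exactly once as validation data, the scheme keeps the usual k-fold guarantee while raising the weight of hard samples.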

The pseudo code of Algorithm 2 shows the implementation of Boosting K-Fold LGB. This algorithm receives the whole training data and labels, named "X" and "Y", as input. The trained LGB models will be used for prediction; this process is not shown in Algorithm 2.

Models for Comparison
In this section, we provide a brief introduction to the four algorithms from the literature that are compared with the method proposed in this paper: single SDAE, XGB, CNN, and CNN + Statistical Features.

HAR Based on Single SDAE
A single SDAE model can also be directly used for multi-class classification. In this paper, the effects of single SDAE and SDAE+LGB were also compared. The pre-training phase of single SDAE is the same as in Section 3-B. The difference is that, for single SDAE, a softmax layer is superimposed on the trained encoding network for category prediction. The back-propagation process is the same as that of an ordinary neural network. The algorithm schematic diagram is shown in Figure 4.


HAR Based on XGB
The XGB algorithm is one of the common machine learning algorithms in HAR. In this paper, we construct a complete set of feature engineering by studying the internal laws of the datasets, and then the XGB algorithm is called for classification. The accuracy of XGB will be compared with that of the algorithm proposed in this paper. The constructed features of the data are listed in Table 5.

HAR Based on CNN
As a classic supervised deep learning method, CNN can also be used in HAR. This paper therefore also studied the accuracy of the CNN algorithm. Taking the UCI-HAR dataset as an example, the CNN network structure adopted in this paper is shown in Figure 5. A one-dimensional convolution operation is performed on the three sensor data streams respectively. The features obtained by convolution are flattened into a one-dimensional vector. Then the complete input vector of the fully connected layer is obtained by splicing the feature vector of each sensor. After three fully connected operations, the output is obtained, in which the i th dimension represents the probability that the current sample belongs to the i th class.
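The per-sensor convolution-and-splice step can be sketched with plain NumPy; the kernel, the `valid` padding, and the function names below are illustrative assumptions rather than the network's trained parameters.

```python
import numpy as np

def conv1d_valid(x, k):
    """Valid 1-D convolution of each sensor axis; x: (w, d), k: (kw,)."""
    return np.stack([np.convolve(x[:, j], k, mode="valid")
                     for j in range(x.shape[1])], axis=1)

def cnn_features(sensors, kernel):
    """Convolve each sensor's window separately, flatten each result,
    then splice them into the fully connected layer's input vector."""
    return np.concatenate([conv1d_valid(x, kernel).ravel() for x in sensors])
```

A real CNN would use multiple learned kernels per sensor plus pooling, but the shape bookkeeping (convolve, flatten, splice) is the same.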


Considering that CNN is only an algorithm for comparison, we directly adopted the optimal parameters from [27], only slightly changing the network structure of CNN according to the different data formats of the input.

HAR Based on CNN+ Statistic Features
We also implemented a state-of-the-art HAR algorithm for comparison with our own method. The method proposed in [13] presents a user-independent deep learning-based approach for online human activity recognition, using CNN for local feature extraction together with simple statistical features that preserve information about the global form of the time series. The results published in [13] show that this method achieves state-of-the-art performance while requiring low computational cost and no manual feature engineering.

Experiments and Evaluation
To evaluate the performance of the proposed algorithm, we carried out a set of experiments described in this section.

Datasets
For the evaluation of the generalization ability of the proposed algorithm, we tested four datasets from three typical scenarios. These datasets are elaborated below, and the number of samples for each category of each dataset is shown in Table 6.

The Human Moving Modes with Pressure (HMMwithPre): This dataset is from a variety of smartphones (HUAWEI NXT-TL00, NXT-AL10, Samsung G9200, MIX 2, and MI 5s) positioned horizontally in the user's hand to collect data from an accelerometer, gyroscope, magnetometer, and air pressure sensor at a 100 Hz sampling rate. Twenty-five (25) subjects participated in data collection: 20 men and 5 women from 20 to 50 years old, of 165−192 cm height and 48−80 kg weight. Let n represent the length of the sequence. The processed data were built from 50%-overlapping sliding windows of 256 samples. Since the sampling frequency was 100 Hz, each data frame lasted 2.56 s, with a new frame available every 1.28 s. Finally, the sample shape obtained is (⌊n/128⌋ − 1, 256, 10).
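As a sanity check on the sample shape above, the number of 50%-overlapping 256-sample windows can be computed directly (the sequence length 2560 used below is an arbitrary example, not a property of the dataset):

```python
def num_windows(n, w=256, s=128):
    """Number of windows of size w and stride s that fit in a sequence of
    length n; with s = w // 2 this matches floor(n / 128) - 1 when n is a
    multiple of the stride."""
    return (n - w) // s + 1

# A 100 Hz recording of 2560 samples (25.6 s) yields 19 windows.
```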
The Human Moving Modes without Pressure (HMMwithoutPre): This dataset is a variation of HMMwithPre, derived from the original HMMwithPre data by removing the pressure sensor data. With the same sliding window size and stride as HMMwithPre, the final sample shape is (⌊n/128⌋ − 1, 256, 9).
The Human Static Behavior Dataset (HSBD): This dataset (https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones) is from a single smartphone (Samsung Galaxy S2) positioned on the user's waist to collect the total accelerometer, the estimated body accelerometer, and gyroscope data at a 50 Hz sampling rate. Thirty (30) subjects aged 19−48 years participated in data collection. The processed data were built from non-overlapping sliding windows of 2.56 s. Since the sampling frequency was 50 Hz, each data frame contains 128 samples. Finally, the sample shape obtained is (⌊n/128⌋, 128, 9).

The Human Dynamic Behavior Dataset (HDBD): This dataset (http://archive.ics.uci.edu/ml/datasets/Smartphone-Based+Recognition+of+Human+Activities+and+Postural+Transitions) is from a single smartphone (Samsung Galaxy S2) positioned on the user's waist to collect the total accelerometer, the estimated body accelerometer, and gyroscope data at a 50 Hz sampling rate. It is an updated version of the HSBD. After removing the data which has the same labels as HSBD, the training samples were extracted. Considering the small size of the dataset, a sliding stride of 0.16 s is adopted to obtain the samples, the shape of which is (⌊n/8⌋ − 1, 128, 9).

Evaluation Metrics
In order to comprehensively evaluate the performance of HAR, we used four evaluation metrics: accuracy (A), precision (P), recall (R), and F1-score (F1) to evaluate the classification results. For this multi-classification problem, the calculation steps of P, R, and F1 are shown below.
Step 1: For each activity category, count the number of samples of this class predicted as this class (TP), of other classes predicted as this class (FP), and of this class predicted as other classes (FN).
Step 2: Calculate P_k, R_k, and F1_k for each category from the statistics of the first step. The calculation formulas are as follows:

P_k = TP_k / (TP_k + FP_k),  R_k = TP_k / (TP_k + FN_k),  F1_k = 2 P_k R_k / (P_k + R_k)

Step 3: Average the results over all the categories obtained in the second step.
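The three steps amount to macro-averaging over classes. A minimal sketch (the function name is hypothetical; the formulas are the standard P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2PR/(P+R)):

```python
import numpy as np

def macro_prf1(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall, and F1 over all activity classes."""
    ps, rs, f1s = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))  # this class predicted as this class
        fp = np.sum((y_pred == c) & (y_true != c))  # other classes predicted as this class
        fn = np.sum((y_pred != c) & (y_true == c))  # this class predicted as other classes
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
        ps.append(p)
        rs.append(r)
    return np.mean(ps), np.mean(rs), np.mean(f1s)
```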

Network Structure of SDAE
An SDAE network is composed of multiple encoding layers and decoding layers, whose numbers of cells differ. The network structure of our SDAE for each dataset is summarized in Table 7, where n_layer represents the number of encoder layers, n_hidden represents the number of cells in each encoding layer, and dropout represents the rate at which input data is discarded.

Classification Performance
In the first experiment, we evaluated our proposed method on the HMMwithPre, HMMwithoutPre, HSBD, and HDBD datasets. The recognition results are presented as confusion matrices in Figure 6 and summarized as average recognition accuracy in Table 8.
For each dataset, the data on the diagonal occupies an absolute proportion. With the HMMwithPre dataset, the recognition accuracy is 95.73%, while there is a large possibility that elevator_up and elevator_down are misjudged, and 1% of the walking_up and walking_down data are judged as walking. As for the HMMwithoutPre dataset, all but the stilling categories have a certain probability of being misjudged. Among them, the probability of elevator_down being judged as others and others being judged as escalator_up is the highest. For HSBD, the three categories of motion (walking, walking_up, walking_down) and the three categories of rest (sitting, lying, standing) can be perfectly separated. Within the rest categories, the misjudgment rate between standing and sitting is higher. With the HDBD, the misjudged samples mainly concern the discrimination of standing or sitting as lying, and lying as standing or sitting, which is consistent with the classification results of HSBD.
The experimental results show that the algorithm proposed in this paper achieves a good classification effect on multiple datasets, especially on the discrimination between motion and rest data. Among the human moving modes, the escalator and elevator categories have a relatively high misjudgement rate, while for the user behavior data, the distinction between standing and sitting is the main difficulty of classification.

Comparison of Different Models
In the second experiment, to compare the classification performance of other algorithms with that of the algorithm proposed in this paper, we implemented the single SDAE algorithm, XGB, CNN, and the CNN + Statistical features algorithm proposed in [13]. The experimental results are shown below.

Comparison with Single SDAE
The classification result of single SDAE compared with SDAE+LGB is shown in Table 9. The difference between them is that the former uses a fully connected layer rather than LGB for classification. As shown in the table, the accuracy of single SDAE is about 10% lower than that of SDAE+LGB but varies little among the four datasets, which verifies the robustness of SDAE in extracting effective features.

Comparison with XGB

Table 10 shows the evaluation scores of XGB. As for HMMwithPre, the XGB algorithm achieves an accuracy of 95.06%, which fully demonstrates the effectiveness of the constructed features. However, the 10 s sliding window of pressure data causes a long time delay and affects the real-time performance of recognition. With the HMMwithoutPre dataset, the performance of XGB dropped significantly, by almost 10%, simply because it was missing four pressure-dependent features. The experimental results show that the performance of XGB greatly depends on the effectiveness of the features.

Comparison with CNN
The performance of CNN is shown in Table 11, with an average accuracy of 89.52% on the four datasets. The experimental results show that the CNN algorithm has good robustness and obtains similar results on multiple datasets. However, CNN still has shortcomings in feature extraction, which limits its accuracy.

Comparison with CNN+Statistical Features
The comparison results on all the datasets are shown in Table 12. As can be seen from the table, the accuracy of the CNN + Statistical features algorithm varies greatly across datasets, from 78.11% to 97.63%. Compared with CNN, this algorithm shows a great improvement on HSBD, but a sharp decrease on HDBD. This indicates that the algorithm is not robust, mainly because the manually extracted features are not universal.

Analysis of Comparison Results
From the experimental results, the highest accuracy among the different classification methods is achieved by SDAE+LGB. As for the single SDAE algorithm, the learning ability of the fully connected layer added after the encoding layers is limited; it is difficult for it to make full use of the features acquired by the encoding-decoding network. For the XGB algorithm, the recognition accuracy largely depends on the effectiveness of the extracted features. Meanwhile, the accuracy of the CNN algorithm is affected by many hyperparameters, and it is difficult to find an optimal combination that achieves the ideal recognition accuracy. When combined with the statistical features, the robustness of CNN drops significantly.

Conclusions
In this paper, we propose a human activity recognition algorithm combining the feature extraction ability of SDAE and the classification ability of LGB and demonstrate its capability to produce robust HAR. The evaluation was performed on four distinct datasets under different combinations of sensors, various sensor positions, and three typical application scenarios: human moving modes, current static behavior, and dynamic behavior change. For comparison, we also implemented single SDAE, XGB, CNN, and a state-of-the-art algorithm and compared them with the SDAE+LGB algorithm on each dataset.
Extensive experimental results demonstrate that our proposed algorithm is more generic and robust than the other state-of-the-art algorithms. There are two main reasons for this. One is that the features learned by SDAE are more capable of showing the variation law of sensor data than those constructed manually. The other is that, for the same features, the classification capability of LGB is better than that of a simple fully connected layer.
For the future work, we plan to conduct further research along the following lines. First, we will explore the usage of unlabeled data generated during the user's use to improve the existing model incrementally. Second, we will construct an effective indoor positioning algorithm by combining the classification results of human moving modes with Pedestrian Dead Reckoning (PDR). Third, we will translate the classification results of HAR into practical semantic layer expression, which can provide suggestions for human daily life.