A Feature Selection and Classification Method for Activity Recognition Based on an Inertial Sensing Unit

Abstract: The purpose of activity recognition is to identify activities through a series of observations of the experimenter's behavior and the environmental conditions. In this study, through feature selection algorithms, we investigated the effects of a large number of features on human activity recognition (HAR) assisted by an inertial measurement unit (IMU), with a view to application in future smartphones. We considered 585 features (calculated from tri-axial accelerometer and tri-axial gyroscope data) and comprehensively analyzed the signal features and classification methods. Three feature selection algorithms were considered, and the combination effect between features was used to select a feature set with a significant effect on activity classification, which reduced the complexity of the classifier and improved the classification accuracy. We used five classification methods (support vector machine [SVM], decision tree, linear regression, Gaussian process, and threshold selection) to verify the classification accuracy. The proposed activity recognition method can recognize six basic activities (BAs) (standing, going upstairs, going downstairs, walking, lying, and sitting) and six postural transitions (PTs) (stand-to-sit, sit-to-stand, stand-to-lie, lie-to-stand, sit-to-lie, and lie-to-sit), with an average accuracy of 96.4%.


Introduction
With the rapid development of artificial intelligence, activity recognition has become an important emerging field of research. Techniques that recognize users' activities by means of different embedded sensors have been actively studied [1]. Early research focused on recognizing activities from the signals of one or more standalone motion sensors attached to the human body at locations chosen by the researcher. Because vision-based activity recognition entails high cost and environmental restrictions [2], human activity recognition (HAR) systems have instead inferred the state of the user from wearable inertial sensors attached to the body, measuring and evaluating action patterns [3,4].
Human activities are recognized from raw sensor data through machine learning algorithms [5,6]. The main application areas of this research are low-complexity systems such as single-chip microcomputers, which only use a small amount of data. Accurate signal processing and feature selection are fast becoming key problems in the field of posture recognition.
A total of 50% of the data was used for feature selection, 35% for classification model training, and the remaining 15% for precision evaluation [18]. Signal acquisition is the collection and processing of sensor data from available sources. In general, signal conditioning (e.g., noise reduction, amplification, digitization) is required to adapt the sensed signal to the application requirements [19]. The feature extraction process is responsible for obtaining meaningful features that describe the data, and allows for a better representation and understanding of the phenomena under study. Feature selection then provides the most effective of the extracted features as input to the classifier in order to train the classification model [20]. Finally, the classification model is verified.


Signal Processing and Feature Extraction
Firstly, the tri-axial acceleration signal and tri-axial angular velocity signal of the inertial sensor corresponding to six basic motions (walking (WK), walking upstairs (WU), walking downstairs (WD), standing (SD), sitting (ST), and lying (LY)) were obtained in the experiment, as shown in Figures 2 and 3. Since the energy spectrum of human body motion lies mainly in the range of 0-15 Hz, a median filter and a third-order low-pass Butterworth filter with a 20-Hz cut-off frequency were used to filter the six-axis signal and remove noise. The angular velocity signal was high-pass filtered to remove any DC offset that would affect the gyroscope. Similarly, a low-pass Butterworth filter with a cut-off frequency of 0.3 Hz was used to separate gravity acceleration signals from the tri-axial acceleration signals, yielding tAcc-xyz, tGyro-xyz, and tGravityAcc-xyz [21]. The resulting acceleration signals, gravity acceleration signals, and angular velocity signals provide information about the user's body movements, the person's orientation (for example, helping to distinguish between lying down and standing up), and the movement patterns with which people perform certain activities [22].
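As an illustration, the filtering chain described above can be sketched with SciPy. This is a minimal sketch, not the paper's implementation: the signal is simulated, and the function names (`denoise`, `split_gravity`) are our own.

```python
import numpy as np
from scipy.signal import medfilt, butter, filtfilt

FS = 100.0  # IMU sampling frequency (Hz)

def denoise(sig, fs=FS):
    """Median filter followed by a 3rd-order low-pass Butterworth at 20 Hz."""
    sig = medfilt(sig, kernel_size=3)
    b, a = butter(3, 20.0 / (fs / 2.0), btype="low")
    return filtfilt(b, a, sig)

def split_gravity(acc, fs=FS):
    """Separate gravity with a low-pass Butterworth at a 0.3-Hz cut-off."""
    b, a = butter(3, 0.3 / (fs / 2.0), btype="low")
    gravity = filtfilt(b, a, acc)   # slowly varying component
    body = acc - gravity            # body-motion component
    return body, gravity

# Simulated x-axis accelerometer: gravity offset plus a 2-Hz body motion.
t = np.arange(0, 4, 1 / FS)
acc_x = 9.8 + 0.5 * np.sin(2 * np.pi * 2 * t)
body, gravity = split_gravity(denoise(acc_x))
```

The zero-phase `filtfilt` keeps the filtered signal aligned with the raw samples, which matters when windows are later cut from both.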
Subsequently, Jerk signals were calculated from the body acceleration signals and angular velocity signals by differencing (Jerk signals are known to carry information about the characteristics of related activities and have been successfully applied to tests on patients [23]); they are represented by tAccJerk-xyz and tGyroJerk-xyz, respectively.
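A minimal sketch of the differencing step, approximating the time derivative with a forward difference scaled by the sampling rate (the sampling rate and signal values here are illustrative):

```python
import numpy as np

FS = 100.0  # sampling frequency (Hz)

def jerk(signal, fs=FS):
    """Approximate d(signal)/dt with a forward difference times the rate."""
    return np.diff(signal) * fs

# Each step of this toy signal rises faster, so the jerk grows.
acc = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
j = jerk(acc)
```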
In addition, the formula √(x² + y² + z²) was used to calculate the magnitudes of the tri-axial acceleration, tri-axial angular velocity, and gravity component, that is, tAccMag, tGyroMag, and tGravityAccMag, respectively [24]. The processed signal reduced the data dimensions and made the data independent of direction. Furthermore, the angle between the tri-axial signal vector and the gravity component was computed, giving tAccAng and tGyroAng, respectively. The angle between the earth's gravity and the sensor device is effective for discriminating static actions.
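The magnitude and angle computations can be sketched as follows (function names are our own; the vectors are illustrative):

```python
import numpy as np

def magnitude(xyz):
    """Euclidean norm sqrt(x^2 + y^2 + z^2) per sample; orientation-independent."""
    return np.sqrt((xyz ** 2).sum(axis=1))

def angle_to_gravity(xyz, gravity_xyz):
    """Angle (radians) between the signal vector and the gravity vector."""
    dot = (xyz * gravity_xyz).sum(axis=1)
    norm = np.linalg.norm(xyz, axis=1) * np.linalg.norm(gravity_xyz, axis=1)
    return np.arccos(np.clip(dot / norm, -1.0, 1.0))

acc = np.array([[3.0, 4.0, 0.0]])      # one acceleration sample
grav = np.array([[0.0, 1.0, 0.0]])     # gravity direction at that sample
mag = magnitude(acc)                   # sqrt(9 + 16) = 5
ang = angle_to_gravity(acc, grav)
```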
Finally, the real-valued Fast Fourier Transform (FFT) algorithm was used to transform these windows into the frequency domain, resulting in fAcc-xyz, fGyro-xyz, and fAccJerk-xyz [25].
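A minimal sketch of this transformation, assuming a 100-Hz sampling rate and a 128-sample window (both illustrative):

```python
import numpy as np

FS = 100.0  # sampling frequency (Hz)

def to_frequency_domain(window):
    """Real-valued FFT magnitude of one analysis window (e.g. tAcc-X -> fAcc-X)."""
    return np.abs(np.fft.rfft(window))

t = np.arange(128) / FS
win = np.sin(2 * np.pi * 5.0 * t)      # simulated 5-Hz oscillation
spec = to_frequency_domain(win)
freqs = np.fft.rfftfreq(128, d=1 / FS)
peak = freqs[np.argmax(spec)]          # bin holding the most energy
```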
After the signal processing described above was completed, these signals were used as variables for estimating the feature vector of each mode, and the behavior was modeled using a windowed method (the average human walking rhythm is at least 1.5 steps/second, so each window sample should preferably contain at least one complete walking cycle). Overlapping rectangular sliding windows were used to extract the sensor signals. After selecting an appropriate window length, various features were extracted from the signal of each single window to obtain the feature vector. Table 1 shows 22 measures applied to the x-axis. Table 2 shows the correlation feature between signal pairs of the x-axis and y-axis. All of the above signal-processing methods correspond to the time domain and frequency domain, respectively, as shown in Tables 3 and 4.
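The windowing step can be sketched as follows (a hypothetical sketch: window width, overlap, and the two demo features are illustrative, not the paper's choices):

```python
import numpy as np

def sliding_windows(sig, width, overlap=0.5):
    """Split a 1-D signal into overlapping rectangular windows."""
    step = int(width * (1 - overlap))
    return np.array([sig[i:i + width]
                     for i in range(0, len(sig) - width + 1, step)])

sig = np.arange(1000.0)                      # stand-in for one filtered axis
wins = sliding_windows(sig, width=128)       # 128 samples = 1.28 s at 100 Hz
# One feature vector per window (here just mean and standard deviation).
feats = np.column_stack([wins.mean(axis=1), wins.std(axis=1)])
```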

486-505: Information entropy (tAcc-X,Y,Z, tGyro-X,Y,Z, tGravityAcc-X,Y,Z, tAccJerk-X,Y,Z, tGyroJerk-X,Y,Z, tAccMag, tGyroMag, tGravityAccMag, tAccAng, tGyroAng).
The signal magnitude area is helpful for identifying the active period of the tri-axial signals in the time domain. It is defined as the sum of the absolute values of all axes divided by the number of samples in the signal window [29].
Information entropy measures uncertainty in information theory and is used here to estimate the information provided by a signal; the normalized information entropy of the signal magnitude is used. The inter-quartile range indicator is the difference between the upper quartile (Q3) and the lower quartile (Q1) of a set of sorted elements; these quartiles are the points that divide the data at 25% and 75% [30].
The autoregressive coefficients are found by the Burg method, which fits an autoregressive model to the input. This operation is applied to the signals in the time domain and produces outputs corresponding to four features of the coefficient sequence [31]. The weighted average of the signal, with each point weighted in proportion to its amplitude, gives the mean frequency of the signal. The spectral energy of a frequency band returns an energy measurement in a manner similar to the energy function, but only within an interval of the frequency signal. Starting from zero frequency, we selected consecutive intervals with three different bandwidths (8, 16, and 24 points) [32].
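A few of the measures above can be sketched directly (a hypothetical sketch; histogram bin count and the demo window are our own choices, not the paper's):

```python
import numpy as np

def sma(window_xyz):
    """Signal magnitude area: summed absolute values over all axes,
    divided by the number of samples in the window."""
    return np.abs(window_xyz).sum() / window_xyz.shape[0]

def entropy(window, bins=16):
    """Normalized information entropy of the signal's value histogram."""
    counts, _ = np.histogram(window, bins=bins)
    p = counts / counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum() / np.log2(bins)

def iqr(window):
    """Inter-quartile range: Q3 - Q1 of the sorted samples."""
    q1, q3 = np.percentile(window, [25, 75])
    return q3 - q1

def band_energy(window, lo, hi):
    """Spectral energy restricted to the FFT bins [lo, hi)."""
    spec = np.abs(np.fft.rfft(window)) ** 2
    return spec[lo:hi].sum()

window = np.sin(2 * np.pi * np.arange(128) / 16)   # demo window
features = [iqr(window), entropy(window), band_energy(window, 0, 8)]
```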

Feature Selection
This study focuses on three feature selection algorithms (Fisher_score, ReliefF, and Chi_square), all filter methods of feature selection, because such methods are independent of the selected classifier. In order to reduce the computational complexity and improve the classification accuracy, this study combined the three methods to perform feature selection.

Fisher_score
The Fisher score algorithm is defined as the gradient of the log-likelihood with respect to the model parameters, and describes how each parameter helps to generate a specific example [33]. If a feature is discriminative, the variance of the feature within each class should be as small as possible, and the variance between classes should be as large as possible. This is conducive to subsequent operations such as classification and prediction. Let μ^{f_i} denote the average of the i-th feature f_i over all samples, and μ_k^{f_i} denote the average of f_i over the samples of the k-th class. We can then give each feature a score. The Fisher score is defined as follows:

F(f_i) = [ Σ_k n_k (μ_k^{f_i} − μ^{f_i})² ] / [ Σ_k Σ_{j=1}^{n_k} (f_{i,j} − μ_k^{f_i})² ],

where n_k represents the number of samples of the k-th class and f_{i,j} represents the value of the i-th feature in the j-th sample of that class.
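A minimal sketch of the Fisher score on synthetic data (the data and function name are illustrative):

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter divided by
    within-class scatter."""
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for k in np.unique(y):
        Xk = X[y == k]
        mu_k = Xk.mean(axis=0)
        between += len(Xk) * (mu_k - mu) ** 2
        within += ((Xk - mu_k) ** 2).sum(axis=0)
    return between / within

# Feature 0 separates the two classes; feature 1 is pure noise.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(0, 1, (50, 2))])
X[50:, 0] += 5.0
y = np.array([0] * 50 + [1] * 50)
scores = fisher_score(X, y)
```

The discriminative feature receives a much larger score, matching the intuition of small within-class and large between-class variance.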

ReliefF
The family of Relief algorithms (the original Relief and its later extension ReliefF) is considered among the best evaluation algorithms of the filter type [34]. In each iteration, the ReliefF algorithm randomly takes a sample R from the training set, finds the k nearest neighbor samples of R in the same class and the k nearest neighbors in each different class, and updates the feature weights accordingly. Repeating this for many randomly selected samples yields a feature weight ranking, and a threshold is set to select the effective features.
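A simplified sketch of the idea with k = 1 (plain Relief rather than full ReliefF, on synthetic two-class data; all names are illustrative):

```python
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    """Simplified Relief (k = 1): raise the weight of features that differ on
    the nearest sample of a different class (the "miss") and agree on the
    nearest sample of the same class (the "hit")."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        d = np.abs(X - X[i]).sum(axis=1)   # L1 distance to every sample
        d[i] = np.inf                      # exclude the sample itself
        same, diff = y == y[i], y != y[i]
        hit = np.where(same & (d == d[same].min()))[0][0]
        miss = np.where(diff & (d == d[diff].min()))[0][0]
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(0, 1, (40, 2))])
X[40:, 0] += 4.0                           # feature 0 separates the classes
y = np.array([0] * 40 + [1] * 40)
weights = relief(X, y)
```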

Chi_square
Chi_square has been successfully used in facial image analysis applications [35]. The chi-square test is a hypothesis test method for count data. It is used for the correlation analysis of two categorical variables, or for comparing the ratios of two or more sample rates; that is, it tests the degree of fit between the theoretical frequency and the actual frequency. The basic formula of the chi-square test is as follows:

χ² = Σ (A − T)² / T,

where A is the actual frequency, T is the theoretical frequency, and χ² is the chi-square value. Fisher_score and ReliefF are based on the correlation between feature and category, weighting each feature to obtain features with a high accuracy for different posture classifications. Chi_square determines the influence of a feature on classification based on the degree of deviation between observed values and theoretically inferred values.
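The statistic itself is a one-liner; the counts below are illustrative (observed class counts for a binarized feature against a uniform expectation):

```python
import numpy as np

def chi_square(observed, expected):
    """Chi-square statistic: sum over (A - T)^2 / T."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return ((observed - expected) ** 2 / expected).sum()

stat = chi_square([30, 10], [20, 20])   # (100/20) + (100/20) = 10
```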
Firstly, the six basic activities and six postural transitions (six postural transitions as one class) were divided into six levels for feature selection, as shown in Table 5. Corresponding features were then selected for each level.

Support Vector Machine
The selection of classifiers for activity recognition is determined by many factors. In addition to accuracy, factors such as ease of development, computational complexity, and execution speed also affect the selection [36]. The support vector machine (SVM) classifier is a popular machine learning method based on finding the optimal separating decision hyperplane with the maximum margin between the patterns of each class [37]. The SVM constructs two parallel hyperplanes as the separation boundary to discriminate the classification of the samples: each sample of the input data contains a number of features and thus constitutes a feature vector x_i = (x_1, . . . , x_n) ∈ X; the learning target is a binary variable y ∈ {−1, 1}, representing the negative and positive classes; and the parameters ω and b are the normal vector and the intercept of the hyperplane, respectively, so that every training sample satisfies y_i(ω·x_i + b) ≥ 1. SVM can avoid the complexity of high-dimensional space and has good generalization and promotion ability. Finally, we chose SVM for predictive classification.
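A minimal SVM classification sketch with scikit-learn (synthetic data stands in for a selected feature subset; the kernel and parameters are illustrative, not the paper's settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the selected feature subset of one layer.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0)   # maximum-margin decision boundary
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)      # held-out classification accuracy
```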

Threshold Classification
The threshold-based classification method divides human body postures by defining a threshold. If the feature value in the current window is higher than the threshold, one kind of action is determined; another kind of action is determined when the value is lower than the threshold [38]. An appropriate threshold can reduce the classification error. This paper uses the threshold selection method based on Bayesian decision theory [39], which uses the probability distributions of the features to minimize the total segmentation error and obtains the best tradeoff between false positives and false negatives. For a simple activity set that only includes moving and not moving, thresholding the standard deviation (SD) of the 3D acceleration magnitude can achieve an accuracy of 99.4% [40].
The SD reflects the extent to which the signal fluctuates around its mean. The SD expression is as follows:

SD_k = √( (1/n) Σ_{i=1}^{n} (s_{ki} − s̄_k)² ),

where n is the length of the window (the number of samples in the current window), s_{ki} is the acceleration of the i-th sample point on the k-th axis, and s̄_k is the average of the sample points of the k-th axis.
The threshold can be computed by (4) and (5); under the Gaussian assumption, it is the root, lying between the two class means, of

(T − μ₂)²/σ₂² − (T − μ₁)²/σ₁² = 2 ln(σ₁/σ₂),

where μ₁ and μ₂ denote the averages of the SD for the two classes, and σ₁² and σ₂² denote the corresponding variances of the SD.
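This minimum-error threshold between two Gaussian SD distributions can be solved in closed form (a sketch assuming equal class priors; the SD statistics below are illustrative numbers, not values from the paper):

```python
import numpy as np

def bayes_threshold(mu1, var1, mu2, var2):
    """Intersection of two Gaussian likelihoods (equal priors) that lies
    between the class means: the minimum-error decision threshold."""
    if abs(var1 - var2) < 1e-15:
        return (mu1 + mu2) / 2.0          # equal variances: midpoint
    # Quadratic a*T^2 + b*T + c = 0 from equating the two log-likelihoods.
    a = var1 - var2
    b = 2.0 * (mu1 * var2 - mu2 * var1)
    c = (var1 * mu2**2 - var2 * mu1**2
         + var1 * var2 * np.log(var2 / var1))
    disc = np.sqrt(b * b - 4.0 * a * c)
    r1 = (-b + disc) / (2.0 * a)
    r2 = (-b - disc) / (2.0 * a)
    lo, hi = sorted((mu1, mu2))
    return r1 if lo <= r1 <= hi else r2

# SD statistics of a "static" and a "dynamic" class (illustrative).
T = bayes_threshold(0.01, 0.0001, 0.05, 0.0004)
```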

Other Classification Methods
The basic idea of linear regression classification (LRC) is to find the best class reconstruction for a test sample; that is, the class with the best reconstruction is taken as the class of the test sample [41]. The Gaussian process model is a kernel method developed from research on Bayesian artificial neural networks in recent years. In addition to the advantages of traditional kernel methods, it offers a fully Bayesian formulation, easy implementation, and adaptive acquisition of parameters [42]. The decision tree algorithm has the advantages of low complexity, good stability, and being easy to understand. It is easy to evaluate the model by a static test and to measure the model's reliability; for a given observation model, it is easy to derive a logical expression from the resulting decision tree [43].

Selection of Data Sets
In many existing studies, the data set is small and homogeneous (e.g., consisting of subjects of the same age group, such as college students) [44]. This study selected the public-domain UCI HAR data set (Smartphone-Based Recognition of Human Activities and Postural Transitions Data Set), which covers 12 types of actions, including six basic activities (standing, sitting, lying, walking, going upstairs, and going downstairs), recorded from 30 different participants (aged between 19 and 48, each instructed to follow the activity protocol while wearing an SGSII smartphone at the waist). The sensor sampling frequency was 100 Hz.
In order to ensure the accuracy of the calculated classification precision, the data of the 30 experimenters in the original data set were divided: the data of the first 15 people were used as the feature selection set, the data from the 16th to the 26th person were used as the training set of the classifier, and the data from the 27th to the 29th person were used as the test set of the classifier. Class distributions at each level are shown in Table 6. In this experiment, the previously calculated complete feature set was classified into six levels by three feature selection algorithms to obtain feature scores. The features were arranged in descending order. Figure 4 shows the comparison of the classification accuracy and the number of selected features. In Figure 4, the number X on the x-axis refers to the first X features in the descending order produced by the different feature selection algorithms, and the performance of the corresponding classifier is represented on the y-axis. With the increase in the number of features, the classification accuracy increases to a level close to 1 and then becomes stable. We selected the top 40 features with the highest score in each level, as shown in Figure 4. The initial selection of features in the five levels is shown in Table 7.
Table 7. The feature numbers of classification for the five layers using three feature selection algorithms.

In the process of the experiment, the running time of the classification process and the computational complexity of the features should be taken into account. Since large input feature values increase the classification time, features with values greater than 10 and features with a high computational time complexity were removed. Different feature calculations have different time complexities [45].
Due to the high computational complexity of the information-entropy and frequency domain features, we did not use the features selected by Chi_square. In the first layer, we took the first five selected features of Fisher_score and ReliefF as the subset of selected features, with feature numbers 78, 201, 286, 255, 257, 197, 199, 196, 223, and 198. We input the selected feature subsets into an SVM to classify basic activities (BAs) and postural transitions (PTs), achieving a cross-validation accuracy of 0.99. For the selected features of this layer, pairwise combinations were used to train the SVM and obtain the classification accuracy, as shown in Table 8.
Among them, the feature combination (223, 78) had the maximum classification accuracy, as shown in Table 8; that is, the optimal feature combination of the first layer is fAcc-X Sample Range and fAcc-X Largest values.
Similarly, in the classification of the other five layers, we took the first five of the selected features of Fisher_score and ReliefF as the subset of selection features. In the second layer, the classification accuracy of the feature subset (235, 363, 276, 301, 247, 199, 197, 196, 198, and 194) was 0.90, and the feature combination (276, 247) had the maximum classification accuracy, as shown in Table 9. The best combination of features for the second layer was the tAccMag 10th percentile and tAccMag inter-quartile range. In the fourth layer, the selected feature subset (114, 275, 276, 392, 235, 199, 196, 278, 139, and 195) had a cross-validation classification accuracy of 0.99. The feature combination (276, 278) had the maximum classification accuracy, as shown in Table 10. The best combination of features for the third layer was the tAccMag 10th percentile and tGravityAccMag 10th percentile. In the sixth layer, the feature subset (432, 426, 452, 446, 319, 199, 197, 296, 201, and 194) achieved a cross-validation classification accuracy of 0.90. The feature combination (319, 296) had the maximum classification accuracy, as shown in Table 11. The best combination of features for the fourth layer was the tAcc-X 50th percentile and tGravityAcc-X 25th percentile. Since classification of the third layer and fifth layer was difficult using only two features, we selected the first 18 features in Fisher_score and ReliefF (130, 334, 322, 4, 293, 3, 32, 351, 264, 81, 380, 160, 33, 323, 114, 161, 125, 43, 333, 44, 116, 159, 148, 110, 460, 45, 335, 230, 85, 485, 193, 440, 212, 74, 462, and 465) for the third layer (these features were used to train the SVM and obtain a classification accuracy of 0.99). For the fifth layer, we took the first five of the selected features of Fisher_score and ReliefF (166, 65, 326, 78, 84, 108, 199, 83, 36, and 196). Finally, we chose the feature subset (326, 36, 108, 78, 83, 84) for the fifth layer (the SVM classification accuracy was 0.97).
In the experiment, these feature sets obtained a higher classification accuracy than other feature sets.
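The pairwise combination search described above can be sketched as follows (a hypothetical illustration on synthetic data, not the paper's feature numbering; only two features are informative by construction):

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def best_pair(X, y, candidates, cv=3):
    """Train an SVM on every 2-feature combination and keep the pair with
    the highest cross-validated accuracy."""
    best_found, best_acc = None, 0.0
    for pair in combinations(candidates, 2):
        acc = cross_val_score(SVC(), X[:, list(pair)], y, cv=cv).mean()
        if acc > best_acc:
            best_found, best_acc = pair, acc
    return best_found, best_acc

rng = np.random.default_rng(3)
X = rng.normal(size=(120, 6))
y = (X[:, 1] + X[:, 4] > 0).astype(int)   # only features 1 and 4 matter
pair, acc = best_pair(X, y, candidates=range(6))
```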

Threshold Selection of Classification Results
Using the feature set selected in the previous step as the features for threshold classification, the classification accuracy of the six levels was calculated, and the features with the best classification effect were selected. In this study, 50% of the database was used for threshold selection, and the rest of the data was used for precision verification. Firstly, the threshold between BAs and PTs was selected; the feature with the best classification effect was the fAcc-X Sample Range. The corresponding probability distribution is shown in Figure 5, where the calculated threshold is 0.045 and the classification accuracy is 0.91. Next, the threshold between static and dynamic actions was selected; the feature with the best classification effect was the tAccMag 75th percentile. The probability distribution of static and dynamic actions is shown in Figure 6, where the calculated threshold is 0.024 and the classification accuracy is 0.99. Furthermore, a threshold was selected for walking, going upstairs, and going downstairs, and the corresponding probability distribution was obtained using the feature tAccJerk-X Mean crossing rate, as shown in Figure 7. The threshold was determined to be 0.604, and the corresponding classification accuracy was 0.77. It can therefore be seen that threshold classification is not ideal for walking, going upstairs, and going downstairs. Using the feature tGyro-X 10th percentile to carry out threshold selection for going upstairs and going downstairs, the corresponding probability distribution diagram is shown in Figure 8. The calculated threshold was −0.699, and the classification accuracy was 0.97, so the classification effect was relatively ideal.
Using the tGravityAcc-Y 90th percentile to carry out threshold selection for standing, sitting, and lying, the corresponding probability distribution diagram was developed, as shown in Figure 9. The calculated threshold was 0.255 and the classification accuracy was 0.97, so the classification effect was relatively ideal. In the last step, using the tAcc-X 50th percentile to carry out threshold selection for sitting and lying, the corresponding probability distribution diagram was produced, as shown in Figure 10. The calculated threshold was 0.511 and the classification accuracy was 0.99, so the classification effect was relatively ideal.
Figure 9. Probability distribution of standing, sitting, and lying.

Discussions
Multiple classification models (decision tree, linear regression, Gaussian process, and SVM) were used to further evaluate the accuracy of the classification at six levels, as shown in Table 12. The selected optimal feature set was input into the SVM classifier to train the classification model, and the 27th-29th people in the database were used for cross-validation to calculate the average precision value. It can be observed that the five classification levels can achieve a satisfactory classification effect. According to the experiments presented in the literature [44], when the proportion of the training data set is 70%~90%, the accuracy of the decision tree algorithm is relatively high, and the proportion of the training set selected in this paper was 70%. This method verifies the proportion of the best training set and test set corresponding to the SVM, linear regression, and the Gaussian process, and verifies that its classification accuracy can reach a high level through cross-validation. Figure 11 shows the total accuracy of classification for six BAs and PTs using SVM, a decision tree (DT), linear regression (LR), the Gaussian process (GP), and threshold classification (TH) in six levels. In general analysis, the method of threshold classification has a better effect. As shown in Figure 11, the classification accuracy of the five classifiers is higher in the fifth layer (that is, to distinguish between sitting and lying).

Discussions
Multiple classification models (decision tree, linear regression, Gaussian process, and SVM) were used to further evaluate the accuracy of the classification at six levels, as shown in Table 12. The selected optimal feature set was input into the SVM classifier to train the classification model, and the 27th-29th people in the database were used for cross-validation to calculate the average precision value. It can be observed that the five classification levels can achieve a satisfactory classification effect. According to the experiments presented in the literature [44], when the proportion of the training data set is 70%~90%, the accuracy of the decision tree algorithm is relatively high, and the proportion of the training set selected in this paper was 70%. This method verifies the proportion of the best training set and test set corresponding to the SVM, linear regression, and the Gaussian process, and verifies that its classification accuracy can reach a high level through cross-validation. Figure 11 shows the total accuracy of classification for six BAs and PTs using SVM, a decision tree (DT), linear regression (LR), the Gaussian process (GP), and threshold classification (TH) in six levels. In general analysis, the method of threshold classification has a better effect. As shown in Figure 11, the classification accuracy of the five classifiers is higher in the fifth layer (that is, to distinguish between sitting and lying).

Figure 11. Accuracy for six activities using multiple classification approaches.
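As a rough illustration of the threshold classification (TH) used at the single-feature levels (e.g., sitting vs. lying at the fifth level), the sketch below fits a cutoff on one feature by exhaustive search over the observed values. The feature values, labels, and two-class setup are illustrative assumptions, not the paper's actual features or thresholds:

```python
# Minimal sketch of per-level threshold classification (TH): at one level
# of the hierarchy, a single selected feature is compared against a cutoff
# learned from training data. Toy data only; not the paper's features.

def fit_threshold(values, labels):
    """Pick the cutoff on one feature that best separates two classes (0/1)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        preds = [1 if v >= t else 0 for v in values]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def predict_threshold(values, t):
    """Classify each sample by comparing its feature value to the cutoff."""
    return [1 if v >= t else 0 for v in values]

# Hypothetical feature that separates "sitting" (0) from "lying" (1).
feature = [0.1, 0.2, 0.25, 0.8, 0.9, 1.0]
labels = [0, 0, 0, 1, 1, 1]
t, acc = fit_threshold(feature, labels)
```

A single comparison per sample is what makes TH attractive at levels where one selected feature already separates the two classes well.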
It can be seen from Table 12 that the feature selection method proposed in this paper, which uses different feature sets for classification at different levels based on the SVM, also achieves good classification results when applied to the decision tree and linear regression models, but it is not applicable to the Gaussian process, which instead uses all selected feature sets across the six classification levels to classify the six BAs and the PTs. Table 13 shows that the trained classifiers obtain good classification results. Finally, the data of the 22nd subject in the database were used to simulate online prediction of human activity. According to Table 11, we used the SVM at the first level, TH at the second level, the SVM at the third level, the DT at the fourth and fifth levels, and TH at the sixth level. The experimental results are shown in Figure 12; the classification accuracy was 0.94. Nicole et al. [45] used a similar feature selection algorithm, but considered only 76 features and verified them using different classification methods. In this work, 48 features were selected from 585 features, and frequency-domain characteristics were taken into account. The results show that the frequency-domain characteristics apply to the classification at the first, third, and fifth levels. Since the orientation angle of a portable sensor is prone to error, we finally found that the magnitude Mag (√(x² + y² + z²)) is more suitable for activity classification.
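The orientation-independent magnitude used above can be sketched as follows; the function name and the toy signals are illustrative assumptions:

```python
import math

def magnitude(x, y, z):
    """Per-sample signal magnitude sqrt(x^2 + y^2 + z^2) over three axes."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]

# A pure rotation of the sensor redistributes energy among the axes but
# leaves the magnitude unchanged, which is why Mag is less sensitive to
# the orientation error of a portable sensor than any single axis.
x, y, z = [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]
mag = magnitude(x, y, z)  # both samples have magnitude 1.0
```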

Figure 12. The actual label of the 22nd person (a), and the simulated online prediction of the 22nd person (b). The label of walking (WK) is "1", the label of walking upstairs (WU) is "2", the label of walking downstairs (WD) is "3", the label of standing (SD) is "4", the label of sitting (ST) is "5", the label of lying (LY) is "6", and the label of PTs is "7".
Conclusions
This paper proposed a feature selection method that synthesizes multiple feature selection algorithms and considers the combination effect among features. To verify this method, we calculated 585 features, covering the time and frequency domains, and used various classifiers, such as the SVM, to evaluate the features selected by the introduced method; the validity of the selected features was verified. The threshold classification method, a decision tree, linear regression, and the Gaussian process were then used to evaluate the classification accuracy of the selected features. The results show that the human activity recognition system based on an inertial sensor classifies several human activities well and can play a useful auxiliary role in such systems. The feature selection method works with many classification methods (SVM, TH, DT, LR, and GP). We finally classified the seven activities according to the method presented in Table 14. Compared with other classification methods, the method proposed in this paper selects a feature set of smaller dimension and obtains higher classification accuracy. In the future, more activities and classification methods are needed to test the proposed algorithm. At the same time, we hope that IMUs can assist human activity recognition systems more effectively.
tGyroJerk-Z-Sample-Variance, tAccMag-50th-Percentile, tGyro-X-50th-Percentile, tGyro-Y-Mean, tGyro-X-25th-Percentile, tGyro-X-Mean, tGyro-X-Median, tGyro-X-75th-Percentile, tGyro-X-10th-Percentile, fGyro-X-Largest-value, tGyro-X-90th-Percentile, tAccMag-Skewness, tGyro-Y-Median, tGyro-X-50th-Percentile, fAccJerk-Y-Smallest-value, tGyroMag-Skewness, tAccJerk-X-Sample-Variance, tGyroJerk-Z-Median, tGyroJerk-Z-50th-Percentile, tAccMag-Median, tAcc-X-Sample-Variance, tGyroJerk-Z-Skewness, tGyro-X-Skewness, fGyro-X-Smallest-value, tGyroJerk-Z-Power, tGyroMag-Median, tGyroMag-50th-Percentile, fAccJerk-Y-Sample-Range, fAccJerk-Y-Largest-value, tGyroAng-Slope, tGyroAng-Kurtosis, tGyroJerk-Z-Root-Mean-Square, tAccJerk-X-Sample-Range, tGyroMag-Largest-value, tGyroMag-Power, tGyroAng-Power
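A few of the listed time-domain statistics (percentile, sample variance, root-mean-square) can be sketched as below. The nearest-rank percentile convention, the window contents, and the feature names in the dictionary are illustrative assumptions, since the paper does not specify its exact formulas:

```python
import math

def percentile(xs, p):
    """Nearest-rank percentile (one of several common conventions)."""
    s = sorted(xs)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

def rms(xs):
    """Root-mean-square of a signal window."""
    return math.sqrt(sum(v * v for v in xs) / len(xs))

# Toy gyroscope window; real windows would come from the segmented signal.
window = [1.0, 2.0, 3.0, 4.0, 5.0]
feats = {
    "tGyro-X-50th-Percentile": percentile(window, 50),
    "tGyro-X-Sample-Variance": sample_variance(window),
    "tGyro-X-Root-Mean-Square": rms(window),
}
```

Each selected feature is one such scalar per window, so a 48-feature set is simply 48 of these statistics computed per axis (or per magnitude/jerk signal) on each window.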