A Method of Human Activity Recognition in Transitional Period

Abstract: Human activity recognition (HAR) has been increasingly used in medical care, behavior analysis, and the entertainment industry to improve the user experience. Most existing works use fixed models to identify various activities. However, they do not adapt well to the dynamic nature of human activities. We investigated activity recognition with postural transition awareness. The inertial sensor data were processed by filters, and we used both the time domain and the frequency domain of the signals to extract the feature set. For posture classification, three feature selection algorithms were applied to the 585 extracted features to obtain the optimal feature subset. We adopted three classifiers (support vector machine, decision tree, and random forest) for comparative analysis. In our experiments, the support vector machine gave better classification results than the other two methods. Using the support vector machine, we achieved up to 98% accuracy in multi-class classification. Finally, the results were verified by probability estimation.


Introduction
Recognition of human activities and posture transitions provides users with valuable situational awareness and has thus become a hotspot in many fields, such as medical care, human-computer interaction, film and television production, and motion analysis [1]. The two dominant approaches to human activity classification in the literature are vision-based systems and wearable sensor-based systems. Vision-based systems are widely used for the detection of human body parts and the identification of daily activities [2]. These systems process the collected visual data for activity classification.
Wearable sensor-based systems consist of multiple inertial sensors connected in a body sensor network. After receiving and executing system commands, they feed back the raw human body data [3,4]. Inertial measurement units (accelerometers and gyroscopes) measure the triaxial angular velocity and the triaxial acceleration signals generated during human body movement [5]. Sensors available in smartphones, such as temperature sensors and pressure sensors, are useful for sensing the surroundings [6]. The data collected from the sensors attached to the user and from the sensors installed in the surroundings are processed to provide situational awareness to the user [7]. One problem with using an accelerometer to detect the motion of an object is that the measurement is affected by the gravitational field, whose value (g = 9.81 m/s²) is relatively high. However, many studies have
Most human behavior recognition systems developed in the past ignored posture transitions because their incidence is lower and their duration is shorter than those of other basic physical activities [19]. However, this assumption depends on the application and does not hold when multiple activities must be performed in a short period of time. In many practical scenarios, such as fitness or disability monitoring systems, determining posture transitions is critical because the user performs multiple tasks in a short period of time [20]. In fact, when a human behavior recognition system also perceives transient postures, the classification changes slightly, and omitting the posture transitions may lead to poor system performance [21].
A posture transition is a finite-duration event determined by its start and end times. In general, the time required for a posture transition differs between individuals. A posture transition is bounded by two other activities and represents the transition period between them [22]. Basic activities such as standing and walking can last much longer than posture transitions. The data collection for the two types of activities also differs. A posture transition needs to be repeated to obtain each separate sample. Since the basic activities are continuous, multiple window samples can be obtained from a single test, depending on its duration [23].
Our earlier work related to this paper is reported in [24,25]. There, we studied a large number of features for HAR assisted by an inertial measurement unit. The activity features were classified hierarchically, and six basic activities were identified with an average accuracy of 96.4%. However, the transition periods between activities were not taken into account.
This paper focuses on human activity recognition with postural transition awareness. The motion of the human body was sensed by the accelerometer and the gyroscope of an inertial measurement unit. The magnitude and direction of the acceleration can be measured by arranging the sensors perpendicular to one another in three-dimensional space. Such sensors can also be built on a single chip, and three-axis accelerometers are now common in commercial electronic devices [26]. First, we analyzed the six-axis signal data acquired by the inertial measurement unit and preprocessed them to obtain a variety of signals that can represent the action. Features were then extracted from these preprocessed signals in the time domain and the frequency domain, using various standard and original measures to characterize each activity sample. Thereafter, we performed feature selection according to the specific classification condition using various feature selection algorithms. Several machine learning methods were used for classification, and the one with the highest classification accuracy was selected. Finally, we used the support vector machine to classify the postures, with different kernel functions and specific parameters to optimize the model. Figure 1 shows the framework followed in this paper for activity recognition. The framework consists of four modules: data preprocessing, feature extraction and selection, classifier selection, and classifier evaluation. The details of each module are given in the following sections. Section 2 describes data preprocessing, feature extraction, and data selection. Section 3 focuses on classifier selection. Section 4 discusses classifier selection and the results. Section 5 concludes the paper.

Data Preprocessing
The role of this module is to process the activity data received from the sensors and extract the variety of signals useful for activity recognition.
In this paper, we used the second-generation human behavior recognition database available on the University of California Irvine (UCI) public platform [27]. The data set includes 6 basic activities, 3 static poses (standing, sitting, lying) and 3 dynamic poses (walking, downstairs, upstairs), recorded for 30 volunteers aged between 19 and 48. Each volunteer was instructed to follow the activity protocol while wearing an SGSII smartphone at the waist, as shown in Table 1, and was asked to perform the protocol twice. In addition, all possible pose transitions between the three static poses are available: standing-sitting (St-Si), sitting-standing (Si-St), sitting-lying (Si-Li), lying-sitting (Li-Si), standing-lying (St-Li), and lying-standing (Li-St). The frequency of the IMU was 100 Hz. Table 1 shows all the activity tasks in order, with the corresponding times. During the experiment, every posture transition was performed twice by each volunteer. This generated 60 labels for each posture transition, and the transitions account for 9% of all recorded experimental data. The duration differs between posture transitions, even between reverse transitions (for example, Stand-Sit and Sit-Stand). The average duration of a posture transition is 3.7 s, while that of a basic activity is about 20.1 s. The signals collected from one volunteer were extracted, and the data of the 12 movements (6 basic movements and 6 posture transitions) were statistically analyzed, as shown in Figure 2.
Information 2020, 11, x FOR PEER REVIEW 5 of 18

We processed the original sensor signals obtained from the accelerometer (ar(t)) and the gyroscope (wr(t)) in three steps. First, we applied a third-order median filter and a third-order low-pass filter with a cutoff frequency of 20 Hz. Second, a Butterworth filter (transfer function H1(ω)) was applied for noise reduction, and a high-pass filter with a cutoff frequency of 0.3 Hz (transfer function H2(ω)) was applied to eliminate the influence of DC bias in the gyroscope. Third, the acceleration signal was divided into gravity g(t) and object motion acceleration a(t).
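The three preprocessing steps can be sketched with SciPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the median kernel size of 3 and the filter order of 3 are one reading of "third-order", and the 0.3 Hz cutoff used to isolate gravity is an assumption, since the text gives that cutoff only for the gyroscope high-pass filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt


def denoise(raw, fs, cutoff_hz=20.0):
    """Median filter (kernel size 3, assumed) then a 3rd-order low-pass Butterworth."""
    smoothed = medfilt(raw, kernel_size=3)
    b, a = butter(3, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, smoothed)  # forward-backward pass, zero phase shift


def remove_gyro_bias(gyro, fs, cutoff_hz=0.3):
    """High-pass filter that suppresses the slowly varying (DC) bias of one gyro axis."""
    b, a = butter(3, cutoff_hz, btype="high", fs=fs)
    return filtfilt(b, a, gyro)


def split_gravity(acc, fs, cutoff_hz=0.3):
    """Split one acceleration axis into gravity g(t) and body motion a(t).

    The cutoff for the gravity component is an assumption of this sketch.
    """
    b, a = butter(3, cutoff_hz, btype="low", fs=fs)
    gravity = filtfilt(b, a, acc)
    return gravity, acc - gravity


# Synthetic example: 1 g offset plus a 2 Hz movement component and a little noise.
fs = 100.0  # sampling rate in Hz, as stated in the text
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(0)
raw = 1.0 + 0.2 * np.sin(2 * np.pi * 2.0 * t) + 0.01 * rng.standard_normal(t.size)

clean = denoise(raw, fs)
gravity, body = split_gravity(clean, fs)
```

In this synthetic case the low-frequency component stays close to the 1 g offset, while the 2 Hz movement ends up in the body-motion signal.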
The sensor data are plotted in Figures 3 and 4. The red, green, and blue lines are the acceleration signals along the X-, Y-, and Z-axes, respectively. It is evident from Figures 3 and 4 that the sensor data change significantly during the posture transition phase. Accelerations are given in units of g, and the gyroscope readings are in rad/s. The horizontal axis shows the sampling points, which correspond to time. All the preprocessed signals are summarized in Table 2.

Table 2. Preprocessed signals (columns: Name, Quantity).

Feature Extraction
We used both the time and the frequency domain to extract the features. Table 3 shows the measures and formulas used to generate feature sets on a fixed-width window of length N, with 50% overlap between consecutive windows. The window length used in the experiment is 2.56 s: since a person typically takes 1.5 steps per second on average, each window covers at least one full walking cycle.

Table 3. Feature vector (columns: Function, Description, Formula).
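The fixed-width windowing with 50% overlap can be sketched as below. The helper name `sliding_windows` is hypothetical. The sampling rate is passed in as a parameter; the example uses 50 Hz only because that value makes a 2.56 s window 128 samples long, which is the window length quoted later in the text.

```python
import numpy as np


def sliding_windows(signal, fs, window_s=2.56, overlap=0.5):
    """Cut a 1-D signal into fixed-width windows with fractional overlap.

    Assumes the signal holds at least one full window; trailing samples that
    do not fill a complete window are dropped.
    """
    width = int(round(window_s * fs))           # samples per window
    step = int(round(width * (1.0 - overlap)))  # hop between window starts
    starts = range(0, len(signal) - width + 1, step)
    return np.stack([signal[s:s + width] for s in starts])


signal = np.arange(512, dtype=float)    # stand-in for one preprocessed axis
windows = sliding_windows(signal, fs=50.0)
print(windows.shape)                    # (number of windows, samples per window)
```

With 50% overlap, the second half of each window is identical to the first half of the next one, so every sample away from the edges contributes to two windows.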
In our past work, we extracted a total of 585 features to describe each activity window [25]. Among the various features tabulated in Table 3, some new features are taken into account. These features are extracted from each axis of the acceleration signal and the angular velocity signal. The statistical features in Table 3 are applied to the x-axis, y-axis, z-axis, magnitude (Mag), differential, and tilt angle of the acceleration and angular velocity. Table 3 shows the form of each feature, computed on a window signal of length 128. Taking Mean(v) as an example, the feature is calculated on the different processed signals, and Table 4 lists the resulting mean-value features with their descriptions.
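To illustrate how one window is turned into features in both domains, the sketch below computes a handful of statistics on a single axis. It covers only a small, illustrative subset of the measures, not the 585 features of the paper, and the function name and dictionary keys are hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis


def window_features(window):
    """Compute a few time-domain and frequency-domain statistics of one window."""
    spectrum = np.abs(np.fft.rfft(window))  # magnitude spectrum of the window
    return {
        "t_mean": float(np.mean(window)),
        "t_std": float(np.std(window)),
        "t_range": float(np.max(window) - np.min(window)),
        "f_max": float(np.max(spectrum)),
        "f_range": float(np.max(spectrum) - np.min(spectrum)),
        "f_kurtosis": float(kurtosis(spectrum)),
    }


# Example: a 128-sample window holding a sine with 4 full cycles plus an offset.
n = np.arange(128)
features = window_features(0.5 + np.sin(2 * np.pi * 4 * n / 128))
print(features)
```

Applying the same function to every axis and every derived signal (magnitude, differential, tilt angle) is what multiplies a short list of measures into a large feature vector.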

Feature Selection
The objective of this step is to select the significant features from the feature set obtained in the feature extraction module for training the model [28,29]. The feature selection methods adopted by most researchers fall into three groups: filter, embedded, and wrapper methods. In this step, we used filter methods. The basic principle of the feature selection algorithm is shown in Figure 5. The algorithm scores each feature with a divergence or correlation indicator and then either keeps the features whose scores exceed a threshold or keeps the top k features with the largest scores. Specifically, it computes the divergence of each feature and removes the features whose divergence is below the threshold (or keeps the top k by score); likewise, it computes the correlation between each feature and the label and removes the features whose correlation is below the threshold (or keeps the top k by score).
The main advantages of filter-based feature selection are versatility, low complexity, and fast running speed [30]. In this paper, three filter feature selection algorithms, Relief-F, Fisher-Score, and Chi-Square, were applied to select the features.
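The score-then-keep-top-k principle can be sketched for one of the three filters. The example below implements a common textbook form of the Fisher score (between-class scatter divided by within-class scatter, per feature); the paper does not give its exact formula, so this form, the function names, and the synthetic data are assumptions.

```python
import numpy as np


def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return np.argsort(scores)[::-1][:k]


# Synthetic data: 5 features, only feature 2 separates the two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.tile([0, 1], 100)
X[y == 1, 2] += 5.0

scores = fisher_scores(X, y)
print(top_k(scores, 3))
```

A threshold variant would simply keep `np.flatnonzero(scores > threshold)` instead of the first k indices.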
The purpose of the selected feature set is to distinguish the posture transitions from the six basic movements (walking, going upstairs, going downstairs, sitting, standing, and lying); to achieve this, we selected from the 585 extracted features. First, feature selection was performed for the two-class case: one class contains the six basic actions, and the other contains the six posture transitions. The results are shown in Figure 6. Second, feature selection was performed for the multi-class case: the six basic movements form six classes, and all posture transitions form another class. The results are shown in Figure 7.
In Figures 6 and 7, the abscissa is the number of features selected by the three feature selection algorithms, and the ordinate is the classification accuracy. It can be seen from Figures 6 and 7 that the classification accuracy increases gradually with the number of selected features and approaches 1. The features along the abscissa are ordered by the scores assigned to them by each of the three algorithms.
To obtain a lower-dimensional feature set that still classifies human poses with high accuracy, we first feed the top-ranked feature from each algorithm (three features in total) into the classifier, train a classification model, and test it. If the test accuracy does not reach the desired value, the top two features from each feature selection algorithm are used for training, and so on; the feature combination with the highest classification accuracy is selected.
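The incremental procedure described above can be sketched as a loop over the ranking depth. The function name, the stopping target of 0.95, the use of a linear SVM as the classifier at this stage, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC


def grow_feature_set(rankings, X_tr, y_tr, X_te, y_te, target=0.95):
    """Add the top-i features of every ranking until the test accuracy reaches target.

    Returns the best feature combination seen and its accuracy.
    """
    best_acc, best_feats = -1.0, []
    max_depth = min(len(r) for r in rankings)
    for i in range(1, max_depth + 1):
        # Union of the first i features of each ranking, duplicates removed.
        feats = sorted({int(f) for r in rankings for f in r[:i]})
        model = SVC(kernel="linear").fit(X_tr[:, feats], y_tr)
        acc = model.score(X_te[:, feats], y_te)
        if acc > best_acc:
            best_acc, best_feats = acc, feats
        if acc >= target:
            break
    return best_feats, best_acc


# Synthetic data: feature 0 alone separates the two classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = np.tile([0, 1], 150)
X[y == 1, 0] += 6.0

# Hypothetical rankings from three selection algorithms (best feature first).
rankings = [[0, 1, 2], [0, 3, 4], [0, 5, 1]]
feats, acc = grow_feature_set(rankings, X[:200], y[:200], X[200:], y[200:])
print(feats, acc)
```

Because the three rankings may agree on a feature, the combined set at depth i holds at most 3i features, which matches the three-feature and 30-feature sets reported for depths 1 and 10.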
Finally, for the two-class case, the highest-scoring feature from each of the three feature selection methods was selected: the maximum value of the fAcc(X) sequence, the frequency-signal kurtosis of the fAcc(Y) sequence, and the sample range of the fAcc(X) sequence. To ensure classification accuracy in the multi-class case, 30 features (the top ten features selected by each feature selection method) were selected, as shown in Table 5.

Classifier Selection
We used the support vector machine (SVM), a supervised machine learning algorithm developed in the last century and often used in statistical classification problems [31]. It was originally applied mostly to two-class problems. The basic model is a linear classifier; maximizing the margin turns training into a convex quadratic programming problem [32]. SVM is effective in high-dimensional spaces and remains suitable when the number of dimensions exceeds the number of samples. Different kernel functions can be chosen for different scenarios. Linearly separable samples can be classified by a linear function. The decision boundary takes different forms depending on the dimension: a straight line in two dimensions, as shown in Figure 8, a plane in three dimensions, and a hyperplane in higher-dimensional spaces.
A decision tree is a tree constructed according to a chosen splitting strategy. Training on the input data yields a tree that can classify unknown data efficiently, that is, predict the future based on the known [33]. It is a tree-structured algorithm composed of a root node, internal nodes, and leaf nodes. The core idea of the decision tree algorithm is to select attributes based on information gain, taking the attribute with the largest information gain as the root [34]. The root is the top classification condition, and each internal node of the tree acts as a test on an attribute. Each leaf node represents a class label, and each branch represents an outcome of the test. A binary tree has two branches at each node, while a node in a multiway tree has more than two branches.
The random forest algorithm is based on the idea of model aggregation and achieves high precision in the classification and regression of high-dimensional, uncertain data [35]. The key idea of the random forest classifier is to grow a large number of unbiased decision trees from bootstrap samples; each tree votes for an activity class, and the forest selects the class with the most votes [36]. The random forest starts by drawing bootstrap samples from the original training data. A decision tree is then learned from each bootstrap sample, with only a small number of variables available for the binary split at each node.
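For reference, the information-gain criterion mentioned for the decision tree is usually written as follows. This is the standard textbook definition, stated here as an assumption because the paper does not give its formula; D is the set of training samples at a node, p_k is the fraction of samples in D belonging to class k, and D_v is the subset of D for which attribute A takes the value v.

```latex
\begin{align}
  H(D) &= -\sum_{k} p_k \log_2 p_k, \\
  \mathrm{Gain}(D, A) &= H(D) - \sum_{v \in \mathrm{values}(A)} \frac{|D_v|}{|D|}\, H(D_v).
\end{align}
```

The attribute with the largest gain removes the most class uncertainty, which is why it is placed at the root.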
In the previous section, three filter feature selection algorithms were used to select three features for the two-class case and 30 features for the multi-class case. Next, the three features and the 30 features were applied to the three classifiers for the respective classification cases; the test-set classification accuracy is shown in Tables 6 and 7. According to the classification results, there is no large difference between the classification accuracies of the three classifiers, but the SVM results are better than those of the other two. Precision, recall, and F1-score are the evaluation indices of the classification results. Avg/total is the mean over all classes and represents the overall value of each evaluation index. We used the features selected by Fisher-Score, Relief-F, and Chi-Square to train the SVM, and the training-set accuracy is shown in Table 8.
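A comparison of this kind can be sketched with scikit-learn. The data below are synthetic (30 features, seven classes) and only stand in for the selected feature set, so the printed numbers say nothing about the paper's results; the hyperparameters are library defaults or assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 30 selected features and the seven classes.
X, y = make_classification(n_samples=700, n_features=30, n_informative=10,
                           n_classes=7, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

models = {
    "svm": SVC(kernel="linear", C=1.0),
    "decision_tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

accuracy = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    accuracy[name] = accuracy_score(y_te, model.predict(X_te))

# Per-class precision, recall, and F1-score, plus their averages.
report = classification_report(y_te, models["svm"].predict(X_te))
print(accuracy)
print(report)
```

`classification_report` produces the same kind of per-class precision, recall, and F1 table, with averaged rows at the bottom, that the text refers to.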

Classifier Parameter Selection
In this module, we used the support vector machine as the common classifier to classify the poses. The role of the kernel function is to map the input space to a high-dimensional space according to certain rules and to construct an optimal separating hyperplane there, which makes it possible to separate nonlinear data [37]. We mainly used the linear kernel and the radial basis function (RBF) kernel.
Training and testing the classifier model on the same subset of data leads to over-fitting, which can be avoided by cross-validation.
The data of the 30 volunteers in the original data set were divided as follows: the data of the first 15 people were used as the feature selection set, the data from the 16th to the 26th person were used as the training set of the classifier, and the remaining data were used as the test set of the classifier.
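The subject-wise split can be sketched with boolean masks over a per-window subject identifier. The function name and the assumption that subjects are numbered 1 to 30 are illustrative.

```python
import numpy as np


def split_by_subject(subject_ids):
    """Return boolean masks for the feature-selection, training, and test sets."""
    subject_ids = np.asarray(subject_ids)
    selection = np.isin(subject_ids, np.arange(1, 16))   # subjects 1-15
    train = np.isin(subject_ids, np.arange(16, 27))      # subjects 16-26
    test = np.isin(subject_ids, np.arange(27, 31))       # subjects 27-30
    return selection, train, test


# Example: two windows per subject.
ids = np.repeat(np.arange(1, 31), 2)
selection, train, test = split_by_subject(ids)
print(selection.sum(), train.sum(), test.sum())
```

Splitting by person rather than by window keeps all windows of one volunteer in a single set, so the test accuracy reflects performance on people the classifier has not seen.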

Classifier Linear Kernel Parameter Selection
A commonly used parameter of the linear kernel is the penalty factor C. When the value of C is large, there are fewer misclassifications and the fit to the samples is better, but overfitting occurs easily [38]. When the value of C is small, the possibility of misclassification becomes larger and the fit to the samples is degraded, but the prediction may still be better because the samples are noisy [39].
First, based on the three features selected in the previous section, the linear-kernel support vector machine is used to solve the two-class problem in behavior recognition. Figure 9 shows the selection process for parameter C in the two-class case. Next, based on the 30 features selected in the previous section, we used the linear-kernel support vector machine to solve the seven-class problem in behavior recognition. Figure 10 shows the selection process for parameter C in the multi-class case.
In Figures 9 and 10, the upper line represents the test-set classification accuracy, and the lower line represents the cross-validation average. The abscissa shows the penalty factor C, and the ordinate shows the classification accuracy. As C increases, the test-set classification accuracy and the cross-validation average increase, but when C is too large, the classification accuracy decreases slightly. The larger the value of C, the less error is tolerated and the longer the data processing takes. If the value of C is too small, however, we cannot guarantee that the parameter carries over to other data sets, although the result is still good. Taking these considerations together, we set the penalty factor to 1. The 27th-29th people in the database were used for cross-validation to calculate the mean and standard deviation of the accuracy. The test-set classification accuracy is 0.973, the cross-validation average is 0.956, and the cross-validation standard deviation is 0.042, which achieves the desired effect. With the factor C set to 1, the test-set classification accuracy is 0.975, the cross-validation average is 0.972, and the cross-validation standard deviation is 0.033, which also achieves the desired effect.
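The role of C is easiest to see in the standard soft-margin formulation of the SVM, given here as a reference form rather than as the paper's own notation. The symbols w, b, and the slack variables ξ_i are the usual ones and do not appear in the original text.

```latex
\begin{align}
  \min_{w,\, b,\, \xi} \quad & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \\
  \text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, n.
\end{align}
```

A large C makes every margin violation expensive, so the boundary bends to fit the training samples; a small C tolerates violations in exchange for a wider margin.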
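A sweep over the penalty factor of this kind can be sketched with k-fold cross-validation. The function name, the grid of C values, the five folds, and the synthetic three-feature data are assumptions; the paper's own cross-validation uses held-out volunteers.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def sweep_linear_C(X, y, C_values, folds=5):
    """Return (C, mean accuracy, std of accuracy) for each candidate C."""
    results = []
    for C in C_values:
        scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=folds)
        results.append((C, float(scores.mean()), float(scores.std())))
    return results


# Synthetic two-class problem with three features.
X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
results = sweep_linear_C(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
for C, mean_acc, std_acc in results:
    print(f"C={C:<6} mean={mean_acc:.3f} std={std_acc:.3f}")
```

Reporting the standard deviation next to the mean, as the text does, shows whether a value of C is good on average or only on some folds.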

Classifier RBF Kernel Parameter Selection
The radial basis function (RBF) is a localized kernel function that maps samples to a high-dimensional space. The RBF classifier has two main parameters: the penalty factor C and the kernel parameter σ [40]. The parameter σ reflects how the points cluster after the mapping. The smaller σ is, the more the distances between the mapped points tend to be equal and the finer the classification becomes, which easily leads to overfitting. The larger σ is, the coarser the classification becomes, until the data can no longer be distinguished.
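The effect of σ can be illustrated numerically. The sketch below assumes the Gaussian width convention K(x, y) = exp(-||x - y||² / (2σ²)), which the text does not state explicitly; some libraries parameterize the same kernel by γ = 1/(2σ²) instead. The function names and sample points are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel, assuming K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def feature_space_distance(x, y, sigma):
    """Distance between mapped points: sqrt(K(x,x) + K(y,y) - 2 K(x,y))."""
    # K(x, x) = K(y, y) = 1 for the RBF kernel, so the sum reduces to 2 - 2 K(x, y).
    return math.sqrt(max(0.0, 2.0 - 2.0 * rbf_kernel(x, y, sigma)))


a, b, c = (0.0, 0.0), (1.0, 0.0), (3.0, 0.0)

# Very small sigma: every pair of distinct points ends up about sqrt(2) apart.
d_small_ab = feature_space_distance(a, b, sigma=0.05)
d_small_ac = feature_space_distance(a, c, sigma=0.05)

# Very large sigma: all mapped points nearly coincide.
d_large_ab = feature_space_distance(a, b, sigma=100.0)
d_large_ac = feature_space_distance(a, c, sigma=100.0)
```

With the small σ the two distances are practically identical, so each sample is isolated from all others (the over-fitting regime); with the large σ both distances are close to zero, so the samples become hard to separate.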
When selecting the penalty factor C and the parameter σ, over-fitting occurs easily if the value of C is too large. If the value of σ is too small, the number of support vectors grows and the classification becomes finer, which also easily leads to over-fitting. An increase in the number of support vectors also slows down training and prediction [41]. Cross-validation is therefore also used to determine whether the classification result has been over-fitted.
First, based on the three features selected in the previous section, the RBF-kernel classifier was used to solve the two-class problem in behavior recognition. Figure 11 shows the selection process of parameters C and σ for the two-class problem. We then used the RBF-kernel support vector machine to solve the seven-class problem in behavior recognition, based on the 30 features selected in the previous section. Figure 12 shows the selection process of parameters C and σ for the seven-class problem.
There are two subgraphs in each of Figures 11 and 12. The abscissa shows the parameter σ and the ordinate shows the parameter C. Figures 11a and 12a show the classification accuracy of the test set, and Figures 11b and 12b show the cross-validation average. The darker the color, the larger the value.
When the penalty factor C is too small and the parameter σ is too large, the classification accuracy may not reach the ideal value. However, pursuing classification accuracy excessively may increase the computational complexity. Weighing both, for the two-class problem we selected C = 100 and σ = 0.00001, which gives a test set classification accuracy of 0.973, a cross-validation average of 0.975, and a cross-validation standard deviation of 0.011, achieving the desired effect. For the seven-class problem, we selected C = 1 and σ = 0.001, which gives a test set classification accuracy of 0.978, a cross-validation average of 0.938, and a cross-validation standard deviation of 0.057, also achieving the desired effect.
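The trade-off between accuracy and computational cost can be expressed as a simple selection rule over the (C, σ) grid. The sketch below is one possible rule, not the procedure used in the experiments: among all settings whose cross-validation average is within a tolerance of the best one, it prefers the smallest C. The function `select_parameters`, the tolerance, and the scores in `TOY_SCORES` are all hypothetical.

```python
from statistics import mean, pstdev


def select_parameters(c_values, sigma_values, evaluate, tolerance=0.005):
    """Evaluate every (C, sigma) pair and pick a near-best setting with small C."""
    results = {}
    for c in c_values:
        for sigma in sigma_values:
            fold_scores = evaluate(c, sigma)
            results[(c, sigma)] = (mean(fold_scores), pstdev(fold_scores))
    best_mean = max(m for m, _ in results.values())
    # Keep settings whose cross-validation average is close to the best one.
    near_best = [key for key, (m, _) in results.items() if m >= best_mean - tolerance]
    # Tuples compare by C first, then sigma, so this prefers the smallest C.
    chosen = min(near_best)
    return chosen, results


# Synthetic fold accuracies standing in for real SVM cross-validation runs.
TOY_SCORES = {
    (1, 0.001): [0.93, 0.95],
    (1, 0.1): [0.80, 0.82],
    (100, 0.001): [0.96, 0.98],
    (100, 0.1): [0.85, 0.87],
    (10000, 0.001): [0.965, 0.977],
    (10000, 0.1): [0.86, 0.88],
}

chosen, results = select_parameters(
    [1, 100, 10000], [0.001, 0.1], lambda c, s: TOY_SCORES[(c, s)]
)
```

In this toy grid the largest C has a marginally higher average, but the rule selects C = 100 because its average lies within the tolerance.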

Probability Estimation
A commonly used SVM outputs only class labels, without probabilities. Probability estimation can be used to transform the classification result of the support vector machine into the probability that a sample belongs to each category [42].
The probabilistic calibration used in this study is isotonic regression, which is a nonparametric method. The core idea is to fit the deviation between the current classifier output and the real results. Isotonic regression is suitable for large sample sizes and is prone to over-fitting when the sample size is small. The Brier score can be used to evaluate the results of the probabilistic calibration. The Brier score is a loss, so a smaller score is better [43]. Over a set of N predictions, the Brier score measures the mean squared error between the predicted probability and the actual outcome for the category. Therefore, for a set of predictions, the lower the Brier score, the better the calibration.
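Both ideas can be sketched in a few lines. The code below is a minimal illustration, assuming one-vs-rest calibration of a single category and the standard pool-adjacent-violators algorithm for isotonic regression; it is not the implementation used in this study. The function names, decision scores, and labels are hypothetical.

```python
def fit_isotonic(scores, labels):
    """Pool Adjacent Violators: least-squares non-decreasing fit of labels vs. scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block holds [sum of labels, number of samples]
    for i in order:
        blocks.append([float(labels[i]), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    # Return the calibrated probabilities in the original sample order.
    calibrated = [0.0] * len(scores)
    for position, i in enumerate(order):
        calibrated[i] = fitted[position]
    return calibrated


def brier_score(probabilities, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical decision scores and labels (1 = sample belongs to the category).
scores = [0.3, -2.0, 1.5, -0.5, -1.0, 0.8]
labels = [1, 0, 1, 0, 1, 1]
calibrated = fit_isotonic(scores, labels)
score = brier_score(calibrated, labels)
```

Here the Brier score is computed on the same samples used for fitting, which is optimistic; in practice the calibration map would be fitted on one part of the data and scored on another.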
In this paper, we trained and tested the support vector machine on the data of five volunteers, and then used isotonic regression to estimate the probabilities for the data collected from these volunteers. Because of individual differences, the volunteers took different amounts of time to complete each activity. To preserve the integrity of a whole set of actions, only the result of one volunteer is presented in Figure 13. In Figure 13, the abscissa is the test set data corresponding to different postures, randomly selected from the volunteer data, and the ordinate is the predicted probability obtained from the probability estimation. The seven differently colored lines represent the predicted probabilities of the seven categories.

The Brier score is then used to evaluate the results of the probability estimates. The average results of five volunteers are shown in Table 9. The column labels in the table indicate which action the selected data comes from, the row labels represent the seven categories, and the values in the table are the obtained Brier scores. The Brier scores on the diagonal of the table are relatively small, so the probability estimation achieves the desired effect.
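A table with this layout could be computed as sketched below. This is one possible reading of the table, assuming that each entry is the mean squared error between the predicted probability of the row category and a 0/1 target, taken over the samples of the column action. The function `brier_table`, the three shortened category names, and the probabilities are hypothetical; the actual table covers seven categories.

```python
def brier_table(probabilities, true_labels, categories):
    """table[c][a]: mean of (p_c - [c == a])^2 over samples whose true action is a."""
    table = {c: {} for c in categories}
    for action in categories:
        rows = [p for p, t in zip(probabilities, true_labels) if t == action]
        for c in categories:
            target = 1.0 if c == action else 0.0
            table[c][action] = (
                sum((p[c] - target) ** 2 for p in rows) / len(rows) if rows else None
            )
    return table


# Hypothetical calibrated probabilities for three categories (illustration only).
categories = ["sit", "stand", "walk"]
probabilities = [
    {"sit": 0.9, "stand": 0.1, "walk": 0.0},
    {"sit": 0.7, "stand": 0.3, "walk": 0.0},
    {"sit": 0.2, "stand": 0.8, "walk": 0.0},
    {"sit": 0.0, "stand": 0.1, "walk": 0.9},
]
true_labels = ["sit", "sit", "stand", "walk"]
table = brier_table(probabilities, true_labels, categories)
```

A small diagonal entry such as `table["walk"]["walk"]` means that samples of that action receive a probability close to 1 for their own category.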
Compared with the SVM experiments reported in the literature [44], the SVM with tuned kernel parameters is significantly more effective and accurate in identifying "walking", "upstairs", and "downstairs".

Conclusions
In recent years, behavior recognition methods with postural transition awareness have been used more and more widely in fields such as medical care. On the evaluated human behavior recognition data set, we found that the three-axis acceleration values of different static actions differ significantly, that the three-axis angular velocity values are basically the same, and that the data change markedly during posture transitions between static actions. Admittedly, the data of a static posture are not always stable, because it cannot be guaranteed that the volunteer was completely still while sitting (or standing or lying) during the experiment.
We used Fisher-Score, Relief-F, and Chi-Square to select from the 585 features and obtain a relatively good feature set for classification. The features with higher scores were computed from statistics such as the maximum, minimum, variance, skewness, kurtosis, and information entropy. The investigation shows that the support vector machine gives better results than the decision tree and the random forest. In the two-class classification, the accuracy of the linear kernel (C = 1) is 97%, and in the multi-class classification, the accuracy of the RBF kernel (C = 1, σ = 0.001) is 98%. Probability estimation overcomes some of the shortcomings of SVM and can directly output the probability that the data belongs to each category, thus making the results more intuitive.