A Novel Segmentation Scheme with Multi-Probability Threshold for Human Activity Recognition Using Wearable Sensors

In recent years, much research has been conducted on time-series-based human activity recognition (HAR) using wearable sensors. Most existing work for HAR is based on manual labeling. However, complete time-series signals not only contain different types of activities but also include many transition and atypical activities. Thus, effectively filtering out these activities has become a significant problem. In this paper, a novel machine learning based segmentation scheme with a multi-probability threshold is proposed for HAR. Threshold segmentation (TS) and slope-area (SA) approaches are employed according to the small fluctuation of static activity signals and the typical peaks and troughs of periodic-like ones. In addition, a multi-label weighted probability (MLWP) model is proposed to estimate the probability of each activity. The HAR error can be significantly decreased, as the proposed model addresses the problem that a fixed window usually contains multiple kinds of activities, while unknown activities can be accurately rejected to reduce their impact. Compared with other existing schemes, computer simulation reveals that the proposed model maintains high performance on the UCI and PAMAP2 datasets, with average HAR accuracies reaching 97.71% and 95.93%, respectively.


Introduction
With the rapid development of the internet of things (IoT), human activity recognition (HAR) has gradually become a research hotspot. HAR provides the detection, interpretation, and recognition of different kinds of activities, such as walking, running, eating, lying down, and sitting. Recently, numerous research works on HAR have been conducted, mostly on healthcare [1,2], surveillance [3,4], context-aware computing [5,6], and smart homes [7]. For example, in the medical industry, the accurate detection of human movement by HAR supports the development of autonomous machine-based diagnostic systems. For smart homes and video surveillance, HAR applications can assist family members in remotely monitoring abnormal behaviors and the physical health conditions of the elderly and children. Two data types, video based and sensor based, are mainly applied for HAR. Compared with the video type, the sensor type is more widely utilized since no image information of users is required, which protects user privacy [8].
In order to collect sensor-based data, external sensors and wearable sensors are usually deployed in HAR systems [9]. For the former, the devices are fixed in a predetermined place, so the inference of activity entirely depends on the voluntary interaction between users and sensors, as in a smart home environment. In contrast, wearable sensors can support HAR with data (such as accelerometer data, temperature, heart rate, etc.) collected anytime and anywhere [1]. They are widely used in HAR analysis due to their light weight, ease of carrying, flexible installation positions, and low power consumption [5]. In recent years, the continuous development of machine learning technology has also provided efficient algorithms for HAR, such as the support vector machine (SVM), K-nearest neighbor (KNN) and decision tree (DT) [10]. Here, one of the key steps is feature extraction; the extracted features include statistical features that depend on the original signals (time and frequency domain features) and cross coding (such as the Fourier transform and wavelet transform) [11]. In addition, with the successful applications of deep learning in the field of computer vision [4], the convolutional neural network (CNN), long short-term memory (LSTM), bidirectional LSTM (BLSTM), multi-layer perceptron (MLP), etc., have also been introduced for sensor-based HAR [12]. Compared with traditional classification algorithms, deep learning automatically extracts relevant features by constructing multilayer deep structures [3,13]. However, a large number of samples are required for accurate analysis, and expensive hardware is indispensable to build a proper deep learning model [14].
The general process of simple activity recognition is first to identify the action segments manually from the action time series; then the HAR classifier can be generated after feature extraction and training. However, only part of the data and the labels of related actions are known in a real collected time series, and many challenges exist in identifying the main human activities in a complete time series. For example, it is difficult for the trained model to classify human activities that have not been learned before, and each segmented window may contain multiple types of activities, which increases the difficulty of classification. In addition, the starting and ending points of each main activity in the complete time series should be exactly located. Moreover, body jitter and useless segments may have characteristics similar to the main activities, which decreases the accuracy of HAR. Therefore, effectively identifying the main activities from the time series and rejecting unknown activities is an important issue. In [15], Gupta and Dallas proposed an algorithm (later referred to as the GD algorithm) using Relief-F and sequential forward floating search (SFFS) for feature selection. Here, naive Bayes (NB) and KNN were applied to identify six kinds of daily life activities and transition activities with a fixed window size of 6 s. In [10,16-20], researchers used different segmentation methods. Ref. [19] proposed using an adaptive time window in the quasi-periodic part and a fixed time window in the non-periodic part. Ref. [20] proposed an adaptive signal segmentation method to detect transition activities and integrated it with the activity classification algorithm to overcome the limitations of the fixed-size sliding window used in existing work. However, these approaches require large computation, and the accuracy of the classifier can still be improved.
In this paper, a novel segmentation model based on the multi-probability threshold is proposed for complex activity recognition, and the corresponding algorithms are developed based on the characteristics of typical activities. According to the small fluctuation of static activity data, a new threshold segmentation (TS) algorithm is proposed to find the optimal threshold according to the related measurements. Periodic-like activity has typical signal characteristics, such as peak and trough points; through the connection of peak and trough points, the corresponding gradient area can be used to obtain the optimal threshold. Additionally, in order to identify the periodic-like activity, the slope-area (SA) filtering algorithm is applied to eliminate the abnormal points in the time series. Furthermore, a new multi-label weighted probability (MLWP) algorithm is proposed to estimate the probability of each activity by overlapping the sliding windows and combining the results with the proposed segmentation algorithms. In addition, a threshold, θ_reject, helps distinguish whether a segment belongs to a main activity or an unknown activity. The proposed method is evaluated using two common HAR benchmark datasets, UCI and PAMAP2. Computer simulation reveals that the proposed segmentation and recognition model significantly improves the recognition accuracy and has relatively low computational complexity. The main contributions are as follows:
• The TS algorithm is proposed according to the stationarity of the static signal. A new indicator, F_ab, is estimated to identify the optimal threshold and segment the static intervals of the unknown time series.
• The SA algorithm is proposed according to the peaks and troughs of the periodic-like signal. Two new notions, slope and area, are employed to eliminate the abnormal points, which helps identify the suspected periodic-like intervals of the unknown time series.
• Combined with the pre-segmentation results, a multi-probability threshold recognition model is proposed, which not only substantially improves the accuracy of HAR, but also effectively distinguishes the useless segments in the complex continuous time series.
The remainder of this paper is organized as follows. Section 2 provides the related work of HAR. Section 3 describes the proposed multi-probability threshold recognition model and the segmentation algorithms. Section 4 introduces the HAR data set and shows the performance evaluation of the proposed scheme. Finally, Section 5 summarizes the paper and lists the future work.

Human Activity Recognition
Recent HAR research focuses on typical activities (e.g., walking, standing, sitting, and running). However, human daily activities are usually complex and continuous and may include transition and atypical actions. Esfahani et al. showed that location-aware multi-sensors (PAMS) can significantly improve the classification accuracy of HAR [7]. Gyllensten et al. [21] used traditional machine learning techniques to classify static and dynamic actions in human daily life. Wan et al. [11] applied deep learning methods, including CNN and LSTM, to identify human activities. Gyroscopes can also be used for HAR: it has been shown that combining gyroscopes and accelerometers improves recognition performance [22]. In [23], the hidden Markov model (HMM) was introduced to detect feeding activities from the collected acceleration and angular velocity of the arms, and the accuracy reached 84.3%.
In [24], the researchers proposed a lightweight CNN using Lego filters for HAR, which can greatly reduce the cost of memory and computation compared with a traditional CNN. Ref. [25] introduced a mixed channel and time attention mechanism into CNN, which enhanced interpretability. CondConv [26] was employed to replace the standard convolution procedure in CNN, and the performance of the model can be improved by increasing the number of experts. Yang et al. [27] quantified the weights and adopted a dynamic fusion strategy for different types of activities, which achieved good results on multiple data sets and greatly saved memory. Since deep learning methods require a large number of samples and expensive hardware to train the model, this paper mainly focuses on shallow learning methods. Experiments show that they can also achieve good classification results with fewer computing resources.

Signal Segmentation
HAR can essentially be simplified to a multivariate time series classification problem [4]. The signal is divided into different fragments using segmentation methods, and these fragments are then mapped to specific activities [3]. In [18], the researchers proposed a method to dynamically adjust the window size based on entropy for activity recognition, but it did not consider transition actions. Ref. [28] applied a data stream segmentation algorithm to adjust the window size according to whether the data values are stable. These algorithms are very sensitive to noise; therefore, it is necessary to preprocess signals before recognition. Many kinds of filters exist, such as the Butterworth filter [28], Chebyshev filter, Bessel filter and elliptic filter [29]. In [9], researchers used the Butterworth filter to process acceleration data and achieved good results. Referring to fundamental tone (pitch) extraction in speech signal processing, an adaptive time window method was employed to accurately extract features from periodic-like signals for HAR [19]; experiments showed that it achieves a good recognition rate for dynamic and static activities. A symbol-based segmentation method [30] was proposed to detect the gait phase and convey important dynamic information from the accelerometer signal. Here, a symbol-based symmetry index was introduced to replace the traditional one.
As shown in Figure 1, the sliding window is a typical segmentation method for the HAR problem [5] and can be divided into two main types: time based and activity based. The time-based type applies window segmentation to the original signal. Jorge-L et al. [17] proposed the transition-aware human activity recognition (TAHAR) system architecture, which greatly improved the recognition of transition actions in the UCI [9], PAMAP2 [31] and REALDISP datasets. Noor et al. [20] used an adaptive window segmentation method to overcome the limitation of fixed window segmentation in the UCI [9] dataset; here, the window was adaptively expanded according to the probability of the action in the window. Activity-based segmentation applies window segmentation to the data segments of each activity. Fida et al. studied the recognition effect of different window lengths on short-term activities (sitting, standing and transitions) and long-term activities (walking, upstairs and downstairs) using a self-collected dataset in which subjects wore a tri-axial accelerometer on their waist [10]. Since gait recognition performance decreases with changes in walking speed, Sun et al. [16] proposed a gait segmentation method based on adaptive speed, with the threshold generated by a single match; the ZJU-GaitAcc public dataset and a self-collected dataset were utilized in the comparative experiment. Of these two sliding window segmentation methods, the activity-based type does not need to consider useless segments or multiple activities in one window, so it can achieve better accuracy. However, the complex and continuous time series of HAR contains useless segments, and the starting point of each activity is unknown. This paper therefore adopts the time-based sliding window segmentation method.
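As a concrete illustration of the time-based type adopted here, an overlapping sliding window can be sketched as follows (a minimal example; the window size and overlap ratio are illustrative, not values taken from the cited works):

```python
import numpy as np

def sliding_windows(signal, win_size, overlap=0.5):
    """Split a 1-D signal into time-based windows with the given overlap ratio."""
    step = max(1, int(win_size * (1 - overlap)))
    starts = range(0, len(signal) - win_size + 1, step)
    return np.array([signal[s:s + win_size] for s in starts])

x = np.arange(10)
w = sliding_windows(x, win_size=4, overlap=0.5)  # window slides by 2 samples
```

With 50% overlap, consecutive windows share half of their samples, so every sample near an activity boundary is seen by more than one window, which is exactly what the MLWP voting step later exploits.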

Problem Formalization
In this paper, it is assumed that volunteers wear k sensors on different parts of their bodies, while all sensors have the same sampling frequency and synchronized transmission times. Usually, wearable sensors, such as smartphones and inertial measurement units (IMUs), are equipped with accelerometers, gyroscopes and magnetometers. Each sensor can generate multi-dimensional signals (for example, an accelerometer generates three-dimensional signals along the x-, y- and z-axes), and the signals generated by all sensors can be expressed as a multi-dimensional time series T, as shown in Equation (1). Here, T_t represents the 1 × k output vector of the k sensors at time t, and T_kt represents the output of the kth sensor at time t, so T is a matrix of size t × k.
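The construction of T can be illustrated with a minimal sketch (the sensor count, sample length, and random data are hypothetical placeholders for real sensor streams):

```python
import numpy as np

t, k = 100, 3                                   # t samples from k sensor channels
rng = np.random.default_rng(0)
streams = [rng.standard_normal(t) for _ in range(k)]  # one stream per sensor

T = np.column_stack(streams)  # multi-dimensional time series T, shape (t, k)
T_t = T[10]                   # the 1 x k output vector of all sensors at one time step
```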
As shown in Figure 2, it is assumed that volunteers perform a total of N different daily activities during time t, including some useless segments caused by body jitter or manually unmarked segments. Let A = {g_1, g_2, ..., g_N, g_τ} represent the whole recognition set of activities, where g_τ is the category of useless segments. Then the complex HAR problem can be described as follows: given an unknown time series S, find the various activities occurring in S and identify their corresponding starting and ending positions. The mathematical description is shown in Equation (2), where S_{u_i r_i} represents the sequence segment of the ith activity segment from time u_i to r_i, and o is the number of activity segments in the time series.

The Proposed Framework
Figure 3 shows the overall framework proposed to segment and identify unknown time series with multiple activities. The black, orange, blue, and black dotted arrows represent the training procedure, the TS algorithm and its optimization, the SA algorithm, and the MLWP algorithm with the testing procedure, respectively. The framework mainly includes four procedures:
• The training set is segmented by activity-based sliding windows, and the corresponding time-frequency domain features are extracted manually. The recognition model is trained by traditional classifiers (SVM, DT, NB, etc.).
• For the training set, the TS and its optimization algorithms are used to find the optimal threshold parameters, c_best and d_best, which are applied to the testing set to identify the suspected static segmentations in the time series.
• For the training set, the peak-trough method is applied to estimate the related slope, K_min, and area, S_max. The SA algorithm is used to detect and eliminate the outliers, and the suspected periodic-like segmentations in the testing set can be determined.
• The testing set is segmented using overlapping sliding windows with feature extraction, and multi-class labels are generated by the trained model. Combined with the basic activity segmentations identified before, the probability vector of each window is obtained by the MLWP algorithm, and the correct activity category and unknown activities of the window are distinguished by θ_reject.

Section 3.3 describes the data preprocessing. Section 3.4 explains the TS algorithm and its optimization in detail, obtaining the optimal thresholds c_best and d_best. Section 3.5 shows how to segment the periodic-like intervals and describes the exclusion of outliers in detail. After a test sample is pre-segmented, activity recognition is carried out by the MLWP algorithm described in Section 3.6.

Filtering and Feature Extraction
In the real environment, the signal generated by the sensor usually contains noise, and data may even be lost. Therefore, it is necessary to preprocess the raw signal first. In order to reduce the interference of random noise, a median filter and a third-order Butterworth filter are employed on the original signal. Here, both the acceleration and angular velocity data are utilized for feature extraction in order to improve the HAR performance [16]. Six new sets of data are generated by taking derivatives of the original data (A_x, A_y, A_z, G_x, G_y and G_z) from each sensor. In addition, the Euclidean norms of the original acceleration, R_A, and angular velocity, R_G, are calculated to obtain two further sets of data. Therefore, 14 × k sets of data are obtained in total, comprising 6 sets of original data and 8 sets of generated data, where k is the number of sensors. The sliding window method is used to extract seven time domain features (mean value, standard deviation, mode, maximum, minimum, skewness and kurtosis) and three frequency domain features (gravity frequency, i.e., the weighted average of the amplitude of the power spectrum; frequency variance; and mean square frequency) from each set of data in each window, so that each sliding window yields a total of 140 × k statistical features. The initial feature set and the descriptions of the 14 signals are listed in Tables 1 and 2.
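The seven time domain features listed above can be sketched as follows (a minimal NumPy implementation; the rounding used to compute the mode is an assumption, since the mode of continuous-valued data requires some form of binning not specified in the text):

```python
import numpy as np

def time_features(w):
    """Seven time-domain features of one window:
    mean, std, mode, max, min, skewness, kurtosis (excess)."""
    vals, counts = np.unique(np.round(w, 2), return_counts=True)
    mode = vals[np.argmax(counts)]        # most frequent (binned) value
    c = w - w.mean()
    std = w.std()
    skew = (c**3).mean() / std**3         # third standardized moment
    kurt = (c**4).mean() / std**4 - 3.0   # fourth standardized moment, excess
    return np.array([w.mean(), std, mode, w.max(), w.min(), skew, kurt])

w = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
f = time_features(w)
```

The three frequency domain features would be computed analogously from the window's power spectrum (e.g., via an FFT), which is omitted here for brevity.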

A_x  Acceleration of x-axis
A_y  Acceleration of y-axis
A_z  Acceleration of z-axis
G_x  Angular velocity of x-axis
G_y  Angular velocity of y-axis
G_z  Angular velocity of z-axis

Static Segmentation
Human activities can be divided into static, dynamic and transition actions. Compared with dynamic and transition actions, a static action has a small rate of change; therefore, the difference between the signals can be clearly reflected through the acceleration and angular velocity.
The signals of acceleration and angular velocity are differenced, respectively, and the static segmentations in the whole time series can be identified by thresholding. As shown in Figure 4a, a pair of thresholds is selected from the candidate value pairs to obtain the corresponding static segmentation set C̃ by the proposed TS algorithm. Figure 4b illustrates the selection of the threshold pair using the grid search approach. The evaluation indicator F_ab is estimated by comparing C̃ from Figure 4a with the manually labeled set C. The best threshold pair, c_best and d_best, is finally obtained when F_ab is maximized. A detailed estimation of F_ab is provided below. For a complex time series, the starting point of the focused activity is often manually identified [7]. Let C = {S_{u_1 r_1}, S_{u_2 r_2}, ..., S_{u_i r_i}, ..., S_{u_K r_K}}, where 1 ≤ u_i < r_i ≤ t and 1 ≤ i ≤ K, be the set of manually identified static segmentations. S_{u_k r_k} represents the static segmentation from time u_k to r_k, while K represents the number of manually identified static segmentations. After differencing the time series data, it can be found that the differences within static segmentations are relatively small, lying in [0, g], where g is the gravitational acceleration. Following the grid search method, this paper exhaustively traverses all hyper-parameter combinations in order to select the optimal set as the final result. The purpose of the TS algorithm is to find the optimal thresholds c_best and d_best that cut out the optimal static segmentations. It is assumed that the thresholds c and d each have z candidate values, listed as 1 × z one-dimensional matrices, I_c and I_d, respectively; together, I_c and I_d generate z × z candidate pairs.
According to Figure 4a, the static segmentations under different candidate values can be identified, and the related optimal threshold can then be obtained according to Figure 4b. Let C̃ = {S̃_{ũ_1 r̃_1}, S̃_{ũ_2 r̃_2}, ..., S̃_{ũ_i r̃_i}, ..., S̃_{ũ_K̃ r̃_K̃}}, where 1 ≤ ũ_i < r̃_i ≤ t and 1 ≤ i ≤ K̃, be the set of static segmentations identified using candidate thresholds c and d. Here, K̃ represents the number of static segmentations identified by the TS algorithm, and S̃_{ũ_i r̃_i} denotes a static segmentation from time ũ_i to time r̃_i.
In order to find the optimal thresholds in the training samples, the algorithm should clearly determine the optimized static segmentations, while the TS algorithm should not mix in segmentations of other types of activities. As given in Equations (3) and (4), S_a denotes the total number of sampling points in the manually labeled static segmentations, while S_b is the total number of sampling points in the static segmentations identified by the TS algorithm with the candidate threshold pairs. Here, u_i and r_i represent the starting and ending points of the ith static segmentation in set C, respectively. Similarly, ũ_j and r̃_j are the starting and ending points of the jth static segmentation in set C̃, respectively. S_ab represents the number of sampling points in the overlapping areas between the static segmentations identified by the TS algorithm and the labels. S_ab/S_a represents the proportion of all static intervals that are correctly split, and S_ab/S_b represents the proportion of correct segmentation within the intervals segmented by the TS algorithm. In order for the division to be both correct and complete, S_ab/S_a and S_ab/S_b should both be as large as possible. As shown in Figure 5, the red parts are the manually labeled static segmentations, and the black rectangular boxes are the static segmentations identified by the TS algorithm. In Figure 5a, most static intervals are not split when a very small threshold is used, so S_ab/S_a is small. In Figure 5b, transition actions are contained in S_b, which makes S_ab/S_b smaller. Therefore, there exists a trade-off between these two requirements. The F1-score is an indicator used in statistics to measure the accuracy of a binary classification model, considering the precision and recall of the model at the same time. Following the logic of the F1-score, F_ab is calculated in Equation (5).
As shown in Figure 6, the red and blue parts represent the static segmentations identified manually and by the TS algorithm, respectively. Among the six possible relative positions of the two kinds of static segmentations, only four produce an overlap. For cases 1-3, the ending point r̃_j of the segmentation obtained by the TS algorithm is smaller than that of the manually labeled one. If the ending point of the blue part is smaller than the starting point u_i of the red part, there is no overlapping area; therefore, max(0, r̃_j − u_i)/|r̃_j − u_i| is used to eliminate case 1. Additionally, Figure 6 shows that the overlapping length can be obtained as min(r̃_j, r_i) − max(u_i, ũ_j) + 1. For cases 4-6, similarly, the ending points of the blue parts are greater than those of the red part. Then, the total number of overlapping points of C and C̃ is obtained as max(0, r̃_j − u_i)/|r̃_j − u_i| × (min(r̃_j, r_i) − max(u_i, ũ_j) + 1). In summary, the overall calculation is given in Equation (6).
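Putting Equations (3)-(6) together, a minimal sketch of the F_ab computation might look as follows. F_ab is written here in the equivalent form 2·S_ab/(S_a + S_b), assuming F_ab is the harmonic mean of S_ab/S_a and S_ab/S_b as the F1-score analogy suggests; segmentations are represented as inclusive (start, end) index pairs:

```python
def f_ab(manual, detected):
    """F1-style agreement between manually labeled and TS-detected static
    segmentations, each given as a list of inclusive (start, end) pairs."""
    s_a = sum(r - u + 1 for u, r in manual)        # points in labeled segments
    s_b = sum(r - u + 1 for u, r in detected)      # points in detected segments
    s_ab = 0
    for u, r in manual:
        for ut, rt in detected:
            # overlap length; max(0, ...) removes the disjoint cases
            s_ab += max(0, min(r, rt) - max(u, ut) + 1)
    if s_ab == 0:
        return 0.0
    # harmonic mean of s_ab/s_a and s_ab/s_b simplifies to 2*s_ab/(s_a + s_b)
    return 2 * s_ab / (s_a + s_b)

score = f_ab([(0, 9), (20, 29)], [(0, 9), (22, 31)])
```

In the example, the first segment matches exactly and the second overlaps in 8 of 10 points, giving S_ab = 18 against S_a = S_b = 20.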
Algorithm 1 gives the detailed procedures of the proposed TS algorithm.

Algorithm 1 The proposed TS algorithm.
Input: C, I_c, I_d, A_x, G_x. Output: c_best, d_best
 1: initialization: c_best, d_best, f_best = 0
 2: function COMPARE(C, C̃)
 3:   Calculate S_a using Equation (3).
 4:   Calculate S_b using Equation (4).
 5:   Calculate S_ab using Equation (6).
 6:   Calculate F_ab using Equation (5).
 7:   return F_ab
 8: end function
 9: for i from 1 to z do
10:   for j from 1 to z do
11:     C̃ = Segmentation(A_x, G_x, I_c(i), I_d(j))
12:     F_ab = Compare(C, C̃)
13:     if F_ab > f_best then
14:       f_best = F_ab, c_best = I_c(i), d_best = I_d(j)
15:     end if
16:   end for
17: end for
18: return c_best, d_best
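The grid search at the heart of Algorithm 1 can be sketched as follows (the `segment` and `compare` hooks are hypothetical stand-ins for the TS segmentation of Figure 4a and the F_ab evaluation of Equation (5)):

```python
def ts_grid_search(I_c, I_d, segment, compare):
    """Exhaustively search the z x z candidate threshold pairs (c, d).
    `segment(c, d)` returns the static segmentations for one pair;
    `compare(segs)` returns the F_ab score of those segmentations."""
    c_best = d_best = None
    f_best = 0.0
    for c in I_c:
        for d in I_d:
            f = compare(segment(c, d))
            if f > f_best:                 # keep the pair maximizing F_ab
                f_best, c_best, d_best = f, c, d
    return c_best, d_best, f_best

# toy demo: a fake score function that peaks at (0.3, 0.5)
seg = lambda c, d: (c, d)
cmp_ = lambda s: 1.0 - abs(s[0] - 0.3) - abs(s[1] - 0.5)
c, d, f = ts_grid_search([0.1, 0.3, 0.5], [0.3, 0.5, 0.7], seg, cmp_)
```

The exhaustive search costs z² segmentation-and-compare passes, which is affordable because the candidate lists I_c and I_d are short.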

Periodic-like Interval Segmentation
Periodic-like activity usually lasts a long time, and its peaks and troughs clearly reflect the characteristics of periodic signals. Generally, the horizontal distance between a peak and a trough is half of the human activity cycle. For a complex time series, the periodic-like action segmentations can be identified by finding peaks and troughs. However, the transition activity between two static activities and the jitter of the human body may generate abnormal peaks and troughs, which can seriously affect HAR. Therefore, the SA algorithm is applied to eliminate these abnormal points. For abnormal points, the area enclosed by the lines connecting two adjacent peaks and the trough between them is much larger than the normal area, so abnormal points can be preliminarily found from the calculated area. However, the abnormal points cannot be accurately identified by the area alone, so another notion, the slope, is introduced: for abnormal points, the slope of the line connecting two adjacent peaks and the trough between them is much smaller than the normal slope, so the abnormal points can be further filtered by the slope. The flowchart of the proposed SA algorithm is shown in Figure 7. The training part lists the peaks and troughs, connects the adjacent peaks and troughs to estimate the threshold slope and area, and stores the minimum slope value and maximum area value as K_min and S_max. For the test data set, the procedure is repeated to calculate the corresponding slope and area, which are compared with K_min and S_max. After eliminating the abnormal points, the periodic-like segmentations can be cut out from the time series.
Let P_v = {P_v^1, P_v^2, ..., P_v^m} and P_c = {P_c^1, P_c^2, ..., P_c^n} be the sets of peak and trough points in the periodic-like segmentations of the training sample, respectively. Here, m and n are the numbers of peak and trough points. Suppose k_ur and L_ur are the absolute value of the slope and the length of the line connecting the peak point P_v^r and the trough point P_c^u, respectively. Similarly, k_u(r+1) and L_u(r+1) are the absolute slope and length of the line connecting the peak point P_v^(r+1) and the trough point P_c^u. Here, P_v^r < P_c^u < P_v^(r+1), 1 ≤ u ≤ n, 1 ≤ r < m. It is necessary to calculate the area of the triangle formed by the three points P_c^u, P_v^(r+1) and P_v^r, where the triangle area is S_(u,r,r+1) = (1/2) L_ur × L_u(r+1) × sin a, and a is the angle between the two connecting lines, as shown in Figure 8. The slope k_ur of the line through P_c^u and P_v^r can be obtained as shown in Equation (7). Here, x_(P_v^r) and y_(P_v^r) represent the sampling-point index and the acceleration value of point P_v^r, respectively. Let 1/k_ur and 1/k_u(r+1) be the tangent values of ∠1 and ∠2. The tangent of ∠a can then be obtained, as shown in Equation (8). From tan a, the corresponding sin a is obtained using Equation (9), and the triangle area S_(u,r,r+1) is calculated by Equation (10).
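The slope and triangle area used by the SA algorithm can be sketched as follows. The area is computed here with the shoelace formula, which is mathematically equivalent to (1/2)·L_ur·L_u(r+1)·sin a from Equations (7)-(10); the sample points are hypothetical:

```python
def slope(p, q):
    """Absolute slope of the line through points p = (x, y) and q = (x, y)."""
    return abs((q[1] - p[1]) / (q[0] - p[0]))

def triangle_area(p, q, r):
    """Area of the triangle spanned by three points (shoelace formula),
    equal to (1/2) * L_ur * L_u(r+1) * sin(a) in the text."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                     - (r[0] - p[0]) * (q[1] - p[1]))

# a trough between two adjacent peaks (hypothetical sample points:
# x = sampling-point index, y = acceleration value)
peak1, trough, peak2 = (0.0, 1.0), (5.0, -1.0), (10.0, 1.0)
area = triangle_area(trough, peak1, peak2)
k = slope(trough, peak1)
```

A candidate trough is flagged as abnormal when its area exceeds S_max and its slope falls below K_min, matching the two-stage filtering described above.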
Algorithm 2 lists the detailed steps of the proposed SA algorithm. The output, D, is the set of peak and trough points with the abnormal points removed.

Algorithm 2 The proposed SA algorithm.
Input: K_min, S_max, P_c, P_v, P̃_c, P̃_v. Output: D
1: function GETSLOPEAREA(P_c, P_v)
2:   Calculate K(P_c, P_v) using Equation (7).

Multi-Label Weighted Probability Model (MLWP)
For complex HAR using the sliding window method, a window is prone to classification errors at the boundaries between different actions or at segmentations caused by body jitter. Even worse, some useless segmentations may be classified as major activities. If a small sliding window is used to reduce the number of sampling points at the boundaries and thus improve the recognition rate, the basic characteristics of other activities may be lost. Therefore, window overlap is a better solution. Let the sliding windows be overlapped by q%, which means each sub-window produces 1/(q%) labels. When the overlapping sliding window method is used for classification and recognition, the sub-windows at the boundaries may generate many different labels. For these sub-windows, the corresponding weight vector can be determined by combining the basic activity segmentations identified before, and the corresponding probability can be obtained. By setting a threshold, unknown classes are rejected, and classification is carried out to determine the activity category of each sub-window.
Let E = {s_{m_1 n_1}, s_{m_2 n_2}, s_{m_3 n_3}, ..., s_{m_k n_k}}, 1 ≤ m_i < n_i ≤ t, 1 ≤ i ≤ k, be the set of all abnormal segmentations in the time series, and let l_{m_i n_i} = {l_1, l_2, l_3, l_4}, 1 ≤ i ≤ k, l ∈ A, be the four labels of s_{m_i n_i} generated by the classifier. Let w_{m_i n_i} = [w_1, w_2, w_3, ..., w_N]^T be the weight vector over all activity classes for the time interval from m_i to n_i; its initial value is the zero vector of size N × 1, where N is the number of activity classes in the time series. Let M be the set of all static and periodic-like segmentations obtained by the proposed TS and SA algorithms. The algorithm diagram is shown in Figure 9. Through the proposed TS and SA algorithms, the thresholds K_min, S_max, c_best and d_best are obtained, and the time series of the test set is pre-segmented. The time series then passes through the classifier using overlapping sliding windows. According to the labels, L, generated by the sub-windows, the corresponding weight vector, w, is estimated. When the labels are not completely consistent, the segmentation is identified as belonging to E, and it is checked whether the segmentation inside the sub-window lies in M. The corresponding w is weighted accordingly; the detailed weighting procedure is shown in Algorithm 3. Here, w is converted into the corresponding activity probability vector, P, using Equation (11). The maximum activity probability is found, and the threshold, θ_reject, is used to distinguish between known and unknown activities. Figure 9. The diagram of the proposed MLWP algorithm.
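The final decision step can be sketched as follows (a minimal example; the simple normalization standing in for Equation (11) and the exact rejection rule are assumptions based on the description above):

```python
import numpy as np

def mlwp_decide(w, theta_reject):
    """Turn a weight vector w (one entry per activity class) into a probability
    vector and either return the winning class index or reject as unknown."""
    total = w.sum()
    if total == 0:
        return None                      # no evidence at all: unknown
    p = w / total                        # stand-in for Equation (11)
    best = int(np.argmax(p))
    return best if p[best] >= theta_reject else None  # None = unknown activity

w = np.array([1.0, 6.0, 1.0, 2.0])       # accumulated sub-window weights
label = mlwp_decide(w, theta_reject=0.5)  # p = [0.1, 0.6, 0.1, 0.2]
```

When the maximum probability stays below θ_reject, the sub-window is treated as a useless or unknown segment rather than forced into one of the N known classes.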
For the selection of the threshold θ_reject, this paper selects a group of complete time series subjects from the training samples as validation data, and the candidate values of θ_reject are taken from [0, 1]. Without degrading the accuracy, the maximum candidate is selected as the θ_reject of the data set to reject the unknown activities in the time series.

Experimental Environment and Data Sets
The experiments were conducted on a laptop equipped with an AMD Ryzen 5 4600H 3 GHz CPU and an NVIDIA GeForce GTX 1650 2 GB GPU. The operating system was Windows 10, and MATLAB R2019b was used for HAR.
The data sets used in this paper are the UCI and PAMAP2 data sets. The UCI (University of California, Irvine) data set comes from "Human activity recognition using smart phones" in the UCI machine learning repository [9]. It consists of data from 30 volunteers aged 19-48 who wore a smartphone (Samsung Galaxy S II) on their waist. Each volunteer performed six consecutive activities (walking, walking upstairs, walking downstairs, sitting, standing, and lying down), while the embedded accelerometer and gyroscope sampled 3-axis acceleration and 3-axis angular velocity at a constant rate of 50 Hz. The PAMAP2 data set was measured by nine volunteers wearing inertial measurement units (IMUs) containing accelerometers, gyroscopes, magnetometers, and temperature sensors, together with a heart rate sensor. Each volunteer performed 12 consecutive activities [31]. As described in the previous section, this data set was preprocessed, and the sensor data of one experimenter were randomly selected as the verification set, while the sensor data of the other experimenters were used for model training and hyperparameter tuning.

Evaluation Indicators
The problem to be solved in this paper is to accurately detect the starting point of each activity in a complex activity time series. In order to evaluate the performance of the proposed scheme from multiple perspectives, the evaluation indicators include accuracy, precision, recall, and F1-score [32]. Accuracy is the percentage of correct predictions among all samples. Precision is defined over the prediction results: the probability that a sample predicted as positive is actually positive. Recall is defined over the original samples: the probability that an actual positive sample is predicted as positive. The F1-score considers both precision and recall, balancing them so that both are as high as possible at the same time. The indicators can be obtained from Equations (12)-(15).
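As a sketch (a hypothetical helper, not taken from the paper), these four indicators can be computed directly from a confusion matrix, macro-averaging the per-class precision, recall, and F1-score:

```python
import numpy as np

def har_metrics(conf):
    """conf[i][j] counts samples of true class i predicted as class j.
    Returns accuracy and macro-averaged precision, recall, F1-score."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)                        # correctly classified counts
    precision = tp / conf.sum(axis=0)         # over each predicted class
    recall = tp / conf.sum(axis=1)            # over each true class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / conf.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Two-class toy example: 17 of 20 samples on the diagonal.
acc, prec, rec, f1 = har_metrics([[8, 2], [1, 9]])
```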

Static and Period-like Interval Segmentation
In this paper, 5 groups of 60 samples from the 30 volunteers in the UCI data set were randomly selected as test samples, while the others were used as training samples. One of the nine volunteers in the PAMAP2 data set was randomly selected as the test sample, while the others were used as training samples. According to the proposed method, the optimal thresholds were calculated to identify the segmentations. The results are shown in Figure 10. Figure 10a,b shows the complete continuous activity time series in the UCI and PAMAP2 data sets, respectively. The red and blue parts are the static and periodic-like segmentations identified by the proposed algorithm, respectively, and the black-dotted rectangular boxes are the manually labeled periodic-like and static segmentations. It is clear that the typical segmentations in the original time series can be clearly identified.

Model Classification Results
The selection of the sliding window size has a certain influence on the final recognition rate [33]. In [34], sliding windows of 0.5 s, 1.28 s, 2.56 s, and 3 s were evaluated as candidates on the UCI data set, and the 2.56 s window showed the best performance. Therefore, the training set adopted a 2.56 s window to collect the six basic action signals and extract the corresponding time-frequency domain features.
Since the sampling frequency of the PAMAP2 data set is 100 Hz, this paper evaluates 1.28 s, 2.56 s, 3.84 s, 5.12 s, and 6.4 s as candidate sliding window sizes. The experimental results are shown in Table 3, where 5.12 s performs the best, so 5.12 s is selected as the sliding window size for the PAMAP2 data set. For the UCI and PAMAP2 test sets, sliding windows of 2.56 s and 5.12 s are used, respectively, with an overlap of q = 25% for feature extraction. The multi-class labels are obtained through the classifier. Among them, the UCI data set involves the identification of transition actions; therefore, according to the previously identified static segmentations, the transition action segmentations are derived from the change of the actions before and after them. For the UCI data set, Anguita et al. proved that SVM had the best performance, so multi-class labels were obtained by SVM. For the PAMAP2 test set, multi-class labels are obtained by different classifiers trained on the training set. The corresponding θ_reject is obtained from the training set, as shown in Figure 11: 0.4 is selected as the threshold for the UCI data set, while 0.1 is selected for the PAMAP2 data set. The θ_reject of the PAMAP2 data set is low because, when this data set was manually labeled, the transitions between actions were not considered (e.g., the previous sampling point is walking and the next is sitting), so its labeling is not as fine-grained as that of the UCI data set. The proposed MLWP algorithm is used to determine the labels, and the results are compared with the manually labeled ones. The experimental results are shown in Tables 4 and 5.
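The window-size selection described above is a simple grid search (a sketch; `train_eval` is a hypothetical routine that trains and scores a classifier for a window of the given number of samples):

```python
def best_window_size(candidates_s, fs, train_eval):
    """Score each candidate window length (in seconds) at sampling rate
    fs (Hz) and return the best one together with all scores."""
    scores = {w: train_eval(int(round(w * fs))) for w in candidates_s}
    return max(scores, key=scores.get), scores

# PAMAP2-style sweep at 100 Hz over the candidates of Table 3, with a
# placeholder scorer that peaks at 5.12 s (512 samples).
best, scores = best_window_size([1.28, 2.56, 3.84, 5.12, 6.4], 100,
                                lambda n: 1.0 if n == 512 else 0.5)
```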
Figure 11. Accuracies of the two data sets at different thresholds: (a) UCI data set, and (b) PAMAP2 data set. As shown in Table 4, five groups of test samples are randomly selected, where S1-S5 represent the first to fifth groups of data in the test samples, respectively. The highest accuracy is 98.28%, the lowest is 97.39%, and the average accuracy reaches 97.71%.
As shown in Table 5, SVM, DT, linear discriminant analysis (LDA), NB, KNN, and bagged tree (BT) classifiers are applied. The accuracies of SVM, LDA, KNN, and BT are relatively better than the others, and SVM performs the best, reaching 95.93%. Figure 12 shows the confusion matrices of the proposed scheme on the UCI and PAMAP2 data sets. From Figure 12a, it can be seen that the classification of the three static activities, standing, sitting, and lying, and the three dynamic activities, walking, upstairs, and downstairs, is very good, while the performance on the transition activities (standing to lying, standing to sitting, sitting to standing, sitting to lying, lying to standing, and lying to sitting) is relatively poor, because the boundary part is often mistakenly classified into static actions. Additionally, Figure 12b shows that the model has a good recognition rate for lying, running, cycling, walking, and going up and down stairs, and a relatively poor one for sitting, standing, ironing, vacuum cleaning, and other actions (the volunteer does not perform rope skipping). As illustrated in Figure 13, different types of actions are shown in different colors; the red spaces in Figure 13a,c are manually unlabeled segmentations, and the black spaces in Figure 13b,d are unknown segmentations rejected by the proposed algorithm. The first black box in Figure 13b is identified as unknown and walking, because the volunteer may stand up and walk for some time, while the second black box is entirely identified as unknown, since it can be distinguished as a transition action from the actions before and after it. Similarly, the first black box in Figure 13d is identified as unknown and downstairs, while the second black box is identified as unknown since it is a transition action. It can be seen that the proposed scheme can clearly segment the time series and identify all kinds of actions. In addition, the unknown segmentations can be distinguished accurately.
In order to demonstrate the superiority of the proposed model, this paper compares the results with existing research. As shown in Figure 14, in [35], the features are first processed by kernel principal component analysis (KPCA) and LDA; the researchers then proposed a deep belief network (DBN) and compared it with SVM and an artificial neural network (ANN). Ref. [36] proposed a U-Net network (UNET) and fully convolutional networks (FCN); UNET achieved a sufficiently fast recognition speed. Ref. [37] evaluated extreme gradient boosted machines (EGBM) for HAR. Ref. [38] proposed a sparse representation based hierarchical (SRH) classifier. Figure 14 shows the comparison of the accuracy of the different methods on the UCI data set. Numerically, the proposed scheme shows outstanding performance, producing 8.65%, 4.79%, 4.55%, 3.59%, 2.74%, 1.85%, and 0.15% higher accuracy than ANN, FCN, UNET, SVM, EGBM, DBN, and SRH, respectively. Figure 14. The comparison of accuracy using different methods in the UCI data set. Table 6 compares the recall of the various types of activities from the UCI data set. Among them, A1-A12 denote walking, upstairs, downstairs, sitting, standing, lying, standing to sitting, sitting to standing, sitting to lying, lying to sitting, standing to lying, and lying to standing. The proposed scheme produces better recognition results for most of the activities. For the PAMAP2 data set, the accuracy, precision, recall, and F1-score are compared with existing deep learning-based schemes. As shown in Figure 15, numerically, the proposed scheme shows outstanding performance, producing 11.86%, 4.93%, 2.96%, 2.43%, 1.92%, 8.03%, 1.21%, 8.53%, and 1.55% higher accuracy than SVM, CNN, Local Loss CNN, Lego CNN, condconv CNN, MLP-D, CNN-D, LSTM-D, and Hybrid-D, respectively. As shown in Table 7, the proposed model focuses on a shallow learning method.
Through probabilistic alignment of the identified typical segmentations, the F1-score is raised to 95.12%. Ref. [39] introduced a distance-based loss function into MLP, CNN, LSTM, and hybrid models and found that CNN-D showed the best performance among these methods. Compared with CNN-D, the accuracy and F1-score increase by 1.21% and 0.89%, respectively. Compared with [26], which introduced condconv to replace the standard convolution layer, the accuracy increases by 1.92%. Compared with [24], which applied the Lego CNN model, the accuracy, recall, and F1-score increase by 2.43%, 5.64%, and 3.72%, respectively. For the other schemes, the proposed scheme also shows the best performance on the four evaluation indicators. In summary, the proposed shallow learning scheme is able to maintain good classification results with fewer computing resources. Table 8 compares the recall of the various types of activities from the PAMAP2 data set. Among them, B1-B11 denote lying, sitting, standing, walking, running, cycling, Nordic walking, upstairs, downstairs, vacuum cleaning, and ironing. The proposed scheme produces better recognition results for most of the activities.

Conclusions
Most current research work focuses on simple HAR, where classification and recognition are based on manually labeled segmentations in time series, without considering the cost of manual labeling or personal privacy. In this paper, a probability threshold based algorithm for complex HAR is proposed, which can segment and identify the basic actions in a complex activity time series. The proposed scheme accurately segments the activities while effectively rejecting the useless segmentations. In addition, the cost of manual labeling can be reduced, improving the efficiency of HAR. The proposed model is applied to the UCI and PAMAP2 data sets for experimental validation. The results show that, for the UCI data set, the proposed model can well segment and identify the static, dynamic, and transition activities; additionally, the useless segmentations can be effectively identified, and the overall accuracy reaches 97.8%. For the PAMAP2 data set, the proposed model can distinguish the basic activities well, and the overall accuracy is about 95.9%. This paper only classifies and identifies six basic activities and six transitional activities. The structure of the proposed model can be further optimized, and more detailed comparative experiments can be carried out. In future work, in order to verify the robustness and practicability of the proposed model, experiments are planned on more data sets, and the developed modules will be applied to deep learning models.