A Robust Step Detection Algorithm and Walking Distance Estimation Based on Daily Wrist Activity Recognition Using a Smart Band

Human activity recognition and pedestrian dead reckoning are active research fields because of their usefulness in daily life and healthcare. These fields currently face many challenges, one of which is the lack of a robust, high-performance algorithm. This paper proposes a new method that implements robust step detection and adaptive distance estimation based on the classification of five daily wrist activities during walking at various speeds using a smart band. The key idea is that a non-parametric adaptive distance estimator is applied after two activity classifiers and a robust step detector. In this study, two classifiers recognize the five wrist activities during walking in two phases. Then, a robust step detection algorithm, which integrates an adaptive threshold with peak and valley correction, is applied to the classified activities to detect the walking steps. In addition, misclassified activities are fed back to the previous layer. Finally, three adaptive distance estimators, which are based on a non-parametric model of the average walking speed, calculate the length of each step. The experimental results show that the average classification accuracy is about 99% and the step detection accuracy is 98.7%. The error of the estimated distance is 2.2–4.2%, depending on the type of wrist activity.


Introduction
In recent years, human activity recognition and pedestrian dead reckoning using inertial sensor-based wearable devices have received much attention from researchers as a means of supporting daily life [1][2][3][4][5][6][7][8][9][10]. Using more wearable devices means deploying more sensors. However, wearing many devices simultaneously while performing various daily activities is inconvenient for users [11]. The main problem is therefore to find a robust, high-performance algorithm that uses the smallest possible number of sensors, so that users are inconvenienced as little as possible.
Some studies estimated the walking distance by considering a small number of user walking modes and device poses [5,10,12]. Phuc et al. [12] presented a precise stride counting-based method to estimate the walking distance using insole sensors. The insole sensors consisted of a triaxial inertial sensor and eight pressure sensors. The authors estimated the traveling distance based on the number of strides extracted from the phase information. However, they only considered the walking distance estimation of normal walking on flat ground. Lee et al. [5] introduced a robust step detection algorithm for three step modes and seven device poses of a smartphone. The step detection used adaptive magnitude and temporal thresholds, which addressed both the transitions among step modes or device poses and the time-varying pace of human walking or running. The developed method can detect the number of steps for any combination of step mode and device pose. Ho et al. [10] developed a method of walking distance estimation based on an adaptive step-length estimator and robust step detection. The presented method successfully estimated the traveling distance at three speed levels and four different distances. Furthermore, the step-length estimator, which was an improvement of the Weinberg equation [13], used an adaptive K-value modeled by linear regression.
In common approaches, all processed activity data are fed directly to an adaptive step detector without classifying the activities being performed [5,6]. However, it is more effective to classify the activities first, because the thresholds of the acceleration values depend on the type of activity. To achieve high accuracy in estimating the traveling distance with various actions, e.g., texting, calling, and swinging, some studies [9,14,15] addressed the problem by implementing classifiers or improving the step detection algorithms before estimating the distance. Susi et al. [9] proposed an adaptive step detection method by analyzing the characteristics of the gait cycle, which included the hand motion and carrying-mode differences of a pedestrian using a smartphone. The authors detected the motion modes, e.g., swinging, texting, phoning, bag and irregular motion, before applying the step detection algorithm to the collected inertial signals. Renaudin et al. [14] estimated the step length using a handheld sensor, extending the idea of [9]. The presented step detection algorithm used the step frequency, the height of the pedestrian, and three variables to estimate the step frequency of non-body-fixed sensors. Zhang et al. [15] designed an inertial pedestrian navigation system (IPNS) based on an improved step mode and device pose algorithm using a low-cost hand-held device. The step detection algorithm addressed the over-counting and under-counting errors by implementing a support vector machine that recognized step modes and device poses.
The aforementioned studies do not account for the errors caused by the classifiers and step detectors. Therefore, the step detector can make a serious error when it attempts to detect the steps of a calling or texting activity that has been classified as hand swinging, and vice versa. The system can raise an exception when these errors are not handled. Specifically, the accuracy of the walking distance estimator decreases significantly. Thus, in this paper, we propose a new method that uses a smart band to estimate the walking distance based on robust step detection and adaptive step length estimation for five daily wrist activities during walking: phone texting, phone calling, hand in pocket, suitcase carrying and hand swinging. The performance of step detection and traveling distance estimation can be improved by applying classifiers, robust step detectors, and the error feedback technique. The activity samples are classified and labeled by support vector machine (SVM) classifiers. A 2-s window of the preprocessed data is used to obtain the features that are fed to the classifiers. The step detector uses adaptive thresholds for each activity. Activity samples can thus be classified twice before being fed into the step detectors. The movement distance is estimated by summing the lengths of all walking steps. Furthermore, the step length equation is constructed based on a non-parametric regression of the average magnitude of the tri-axial velocities and a set of variables. The contributions of this paper are as follows:

• Developing a hierarchical framework of walking distance estimation for five daily living activities: phone texting, phone calling, hand in pocket, suitcase carrying and hand swinging.
• Proposing a robust step detection algorithm using an adaptive threshold.
• Improving the step detectors and traveling distance estimators using error feedback.
• Developing the step-length estimation based on non-parametric regression.
• Estimating and comparing the performance of each walking distance estimator with various activities and speed levels.

This paper is organized as follows. In Section 2, we describe the hierarchical framework of the walking distance estimation in detail. Section 3 shows the results of our method in three parts: activity classification, step detection, and walking distance estimation. Finally, in Section 4, we conclude the paper and provide directions for future work.

Proposed Hierarchical Framework of the Walking Distance Estimation
Our objective is to improve the performance of the walking distance estimation using two layers of activity classification (Figure 1). The first layer divides the five activities into two groups. Group 1 contains phone calling, texting, suitcase carrying and hand in pocket during walking; group 2 contains hand swinging during walking. The second layer separately classifies the activities in group 1. The five daily wrist activities are described in Figure 2. The characteristics of the acceleration signal generated on the wrist during walking depend on the type of the user's wrist activity. Each activity has a different threshold value for detecting a peak and a valley of the acceleration signal. Therefore, the step detection algorithm performs more effectively if it knows the type of data being processed. In the step detection phase, the number of peaks between two valleys is checked to make a final decision about the type of activity being processed, using the relationship among a step event, a peak and a valley of the filtered acceleration signal. Therefore, the step detection phase can detect misclassified activities and return them to the previous layer. After the activity is determined, an adaptive threshold algorithm is applied to detect the step events. Once the step events are identified, three non-parametric step length models are applied (Figure 3). For each class of activities, only the model with the highest distance accuracy among the three is considered.
[Figure 3 diagram: step event information is passed to the non-parametric Weinberg, Kim and Tian methods, and the highest traveling distance accuracy for the 5 walking activities is retained.]

The complete hierarchical framework of the walking distance estimation is illustrated in Figure 4. This framework is discussed in Sections 2.2-2.6, which describe the new walking distance estimation algorithm based on activity recognition using a smart band.
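
The two-layer routing described above can be sketched as follows. This is a minimal illustration only: `classify_window`, `toy_layer1` and `toy_layer2` are hypothetical names, the toy rules stand in for trained classifiers, and the feature keys are placeholders rather than the features of Table 1.

```python
GROUP1 = ("texting", "calling", "hand_in_pocket", "suitcase_carrying")


def classify_window(features, layer1, layer2):
    """Route one feature vector through the two classification layers.

    layer1 returns "swinging" or "group1"; layer2 returns one of GROUP1.
    Both are hypothetical callables standing in for trained classifiers.
    """
    if layer1(features) == "swinging":
        return "swinging"  # group 2: handled by the swinging step detector
    label = layer2(features)
    if label not in GROUP1:
        raise ValueError(f"unexpected layer-2 label: {label!r}")
    return label


# Toy stand-ins for illustration only (not trained models).
def toy_layer1(features):
    return "swinging" if features["intensity"] > 1.0 else "group1"


def toy_layer2(features):
    return "texting" if features["mean"] > 0 else "calling"
```

The label returned here would select the activity-specific thresholds used by the step detector.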

Data Collection and Pre-Processing
In this study, a smart band (Microsoft band 2) that integrates a tri-axis accelerometer (ST-Microelectronics, LSM6DS2, Scottsdale, AZ, USA) was used to collect tri-axis acceleration data. Ten healthy people participated in the experiments: six men (aged 24-27; height 170 ± 15.0 cm; weight 70.0 ± 5.0 kg) and four women (aged 24-25; height 168 ± 5.0 cm; weight 50.0 ± 2.0 kg). They were requested to wear the smart band and perform five wrist activities during walking: texting, calling, hand in pocket, suitcase carrying and swinging. Each person was required to repeat 20 m of walking at different speed levels 28 times for each wrist activity. The collected dataset contained 280 trials for each activity and 1400 trials in total (approximately 40 min of walking for each person) and was sampled at 62.5 Hz (the maximum sampling frequency of the Microsoft band). In the preprocessing phase, we resampled the raw data at 50 Hz. Because the frequency components and the energy of human body movements lie below 15 Hz [16,17], we applied a low-pass filter (10th-order Butterworth filter) with a cut-off frequency of 15 Hz to the collected tri-axis acceleration data.
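
As an illustration of the resampling step, the sketch below converts a 62.5 Hz signal to 50 Hz by linear interpolation. The interpolation method is an assumption, since the paper does not state how the resampling was done, and `resample_linear` is a hypothetical helper. The Butterworth filtering step is not shown.

```python
def resample_linear(samples, fs_in=62.5, fs_out=50.0):
    """Resample a uniformly sampled signal by linear interpolation.

    Assumption: linear interpolation is used only for illustration;
    the resampling method of the original study is not specified.
    """
    if len(samples) < 2:
        return list(samples)
    duration = (len(samples) - 1) / fs_in          # seconds covered by the input
    n_out = int(duration * fs_out + 1e-9) + 1      # output samples in that span
    out = []
    for k in range(n_out):
        pos = k * fs_in / fs_out                   # fractional index in the input
        i = min(int(pos), len(samples) - 2)
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out
```

For a 2-s recording (126 samples at 62.5 Hz), this yields 101 samples at 50 Hz.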

Feature Extraction
The filtered signal alone does not characterize the activities. Therefore, we must extract features from the data that characterize the different activities. In this paper, 23 features were extracted from a sliding window of 100 data points (2 s at 50 Hz) with 50% overlap over the filtered data. This window size was shown to work well for activity recognition in a previous work [18]. The following features, which have been shown to be effective in human activity recognition [18][19][20][21], are used in the paper:

• Average Energy (AE) [20,22,23]: The energy of each axis of the triaxial acceleration sensor is calculated by summing the squared discrete FFT component magnitudes of the signal in a sliding window. The AE in this paper is the average of the energy values of the three axes. Here, n is the size of the sliding window, and a_x, a_y and a_z are the acceleration samples on the x, y and z axes of the triaxial sensor, respectively.
• Intensity of Movement (IM) [20,24].
• Mean.
• Standard deviation [19,20,22].
• Band power and peak power: the band power, which is defined as the power ratio in three frequency ranges (0-0.5 Hz, 0.5-1 Hz, 1-5 Hz), and the peak power, which is defined as the total power of the five dominant frequencies, are also effective features, as demonstrated in [18]. The power in the frequency band from f_a Hz to f_b Hz is calculated from S_X(f), the power spectral density of the Fourier transform of the acceleration signal; N is the sampling frequency.
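
A minimal sketch of the windowing and of the Average Energy feature is given below. The function names are hypothetical, and the division by the window length in `dft_energy` is an assumed normalization (it makes the result equal to the sum of squared samples, by Parseval's theorem); the exact normalization used in the paper is not recoverable from the text.

```python
import cmath
import math


def sliding_windows(samples, size=100, overlap=0.5):
    """Split a signal into fixed-size windows with fractional overlap."""
    step = max(1, int(size * (1 - overlap)))
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def dft_energy(axis):
    """Sum of squared DFT magnitudes of one axis, divided by the window length.

    Assumption: the 1/n normalization is chosen for illustration.
    A naive O(n^2) DFT is acceptable for 100-sample windows.
    """
    n = len(axis)
    total = 0.0
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(axis))
        total += abs(coeff) ** 2
    return total / n


def average_energy(ax, ay, az):
    """Average of the per-axis energies over the three axes."""
    return (dft_energy(ax) + dft_energy(ay) + dft_energy(az)) / 3.0
```

With 100-sample windows and 50% overlap, a 300-sample recording produces five windows starting at samples 0, 50, 100, 150 and 200.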

Activity Classification
In the activity classification task, two support vector machine (SVM) classifiers are used to classify the five daily wrist activities during walking (Figure 4), since SVMs are robust and highly accurate, as demonstrated in other studies [18,25]. The first classifier is a binary SVM; it separates two classes: the swing activity and the other four activities. The second classifier is a multi-class SVM; it classifies four classes: texting, calling, hand in pocket and suitcase carrying. To select features for each classifier, we visualized the separation of the wrist activities in the feature space. The corresponding features of the classifiers are described in Table 1.

After the hand motion mode is classified, the acceleration data are low-pass filtered again with a cut-off frequency of 5 Hz to remove noise and avoid failures in peak detection (Figure 5). Park et al. [26] demonstrated that the arm and foot movements are synchronized during walking. Using this relationship, the step events are more easily detected by analyzing the acceleration data collected from the smart band. Figure 6 gives insight into the relation between the wrist acceleration and the arm movement. For arm swinging, when the arm is in front of or behind the user's hip, the wrist's acceleration is at a maximum (peak); when the arm is perpendicular to the ground and tends to move forward, the wrist's acceleration is at a minimum (valley). The second case includes texting, calling, hand in pocket and suitcase carrying, whose common property is the center-of-mass movement. The acceleration in this case changes in a sinusoidal pattern because of the up and down motion of the user's torso [9]. Therefore, the step detection problem can be reduced to detecting the peaks and valleys of the wrist acceleration signal. The main difference between these two cases is the number of peaks between two valleys. In arm swinging, there are two peaks between two valleys, which corresponds to the number of steps. In the other cases, there is only one peak between two valleys.

To detect the peaks and valleys of the wrist acceleration signal, we define the peak and valley thresholds. In addition, to minimize the probability of missing peaks and valleys, we initialize these values as follows: in the arm-swing case, th_p = 0.5 × max(a); in the other cases, th_p = 0.5 × max(a), where th_p and th_v are the threshold values for the peak and valley detection, respectively, and a is the vector of acceleration in a sample.
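
The threshold-based detection can be sketched as follows. The peak threshold follows the text (0.5 × max(a)); the valley threshold of 0.5 × min(a) is an assumption made for this example, because the valley initialization did not survive in the text. All function names are hypothetical.

```python
import math


def initial_thresholds(signal):
    """Initial peak and valley thresholds for one acceleration sample."""
    th_p = 0.5 * max(signal)
    th_v = 0.5 * min(signal)  # assumption: mirrors the peak rule
    return th_p, th_v


def local_extrema(signal, th_p, th_v):
    """Return (peaks, valleys) as lists of sample indices.

    A peak is a local maximum above th_p; a valley is a local minimum below th_v.
    """
    peaks, valleys = [], []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur >= nxt and cur > th_p:
            peaks.append(i)
        elif cur < prev and cur <= nxt and cur < th_v:
            valleys.append(i)
    return peaks, valleys


def peaks_between(peaks, left_valley, right_valley):
    """Count the peaks that fall strictly between two valleys."""
    return sum(1 for p in peaks if left_valley < p < right_valley)


# Synthetic torso-like signal: one peak between two valleys (non-swinging case).
signal = [math.sin(2 * math.pi * i / 20) for i in range(60)]
th_p, th_v = initial_thresholds(signal)
peaks, valleys = local_extrema(signal, th_p, th_v)
```

On this synthetic sinusoid, each pair of adjacent valleys encloses exactly one peak, which matches the pattern described for the non-swinging activities.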

Minimum Correction
The initial thresholds are not perfect for detecting the valleys (peaks) because some data do not clearly reflect the human action. Therefore, fewer valleys (peaks) may be detected than actually exist, or vice versa, which causes an incorrect detection of the actual steps. To resolve this issue, we define an abnormal interval Ab_in, which is the minimal distance between two valleys and is calculated in Equation (9).
Any distance between two adjacent detected valleys that is greater than the abnormal interval is considered to contain a missing valley. Those valleys are detected again using our minimum correction algorithm with an adaptive threshold, as illustrated in Figure 7. The notation in our algorithms is described in Table 2.
First, the Findvalleys function takes the abnormal interval (Equation (9)) and the threshold values for valleys (Equations (6) and (7)) as its input. The valley thresholds are increased in either of two scenarios: (1) no valley is detected; (2) the distance from a detected valley to the first or to the last data point of the abnormal interval is smaller than a quarter of µ_d. A valley is considered invalid if the absolute value of the acceleration at that valley is smaller than 0.1 m/s² (considered noise). If more than one valley is detected, the valley with the largest corresponding absolute acceleration is accepted.
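
The sketch below illustrates the idea of the minimum correction under simplifying assumptions; it is not the algorithm of Figure 7. The abnormal interval and the end margin are passed in as plain parameters (Equation (9) and µ_d are not reproduced here), the relaxation factor `relax` is invented for the example, and the valley threshold is assumed to be negative.

```python
import math


def correct_missing_valleys(signal, valleys, th_v, abnormal_interval,
                            margin, relax=0.8, noise=0.1, max_iter=10):
    """Re-detect valleys in gaps that are longer than the abnormal interval.

    Assumptions: th_v is negative, margin >= 1, and `relax` (hypothetical)
    raises the threshold toward zero on each retry.
    """
    corrected = list(valleys)
    for left, right in zip(valleys, valleys[1:]):
        if right - left <= abnormal_interval:
            continue  # normal spacing, nothing is missing
        threshold = th_v
        for _ in range(max_iter):
            threshold *= relax  # increase the (negative) threshold
            candidates = [
                i for i in range(left + margin, right - margin + 1)
                if signal[i] < signal[i - 1] and signal[i] <= signal[i + 1]
                and signal[i] < threshold and abs(signal[i]) >= noise
            ]
            if candidates:
                # Keep the valley with the largest absolute acceleration.
                corrected.append(max(candidates, key=lambda i: abs(signal[i])))
                break
    return sorted(corrected)


# Synthetic example: the valley at sample 35 is too shallow for th_v = -0.5.
signal = [math.sin(2 * math.pi * i / 20) for i in range(60)]
for i in range(30, 40):
    signal[i] *= 0.3
recovered = correct_missing_valleys(signal, [15, 55], th_v=-0.5,
                                    abnormal_interval=30, margin=5)
```

In this example the shallow valley is recovered once the threshold has been relaxed three times.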

Misclassified Activity Feedback and Maximum Correction
Once all the valleys have been successfully detected, each interval between two adjacent valleys is considered a reference interval. As mentioned in Section 2.5.1, the number of peaks between two valleys is useful information for classifying the wrist activity: swinging produces two peaks per reference interval, whereas the other activities produce one. Therefore, we check the number of peaks in each reference interval. If the activities are classified as hand swinging, but the number of reference intervals with one peak exceeds 50% of the total reference intervals in one observation, the activities are considered as belonging to another wrist activity class. These activities are fed back to the second classification layer as its input. Conversely, if the activities are classified as belonging to group 1, but the number of reference intervals with two peaks exceeds 50% of the total reference intervals in one observation, they belong to the hand-swinging class. These activities are returned to the step detection algorithm of the hand-swinging case as its input.
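
The feedback rule can be sketched as a small function. It follows the peak counts given in Section 2.5.1 (two peaks between valleys for swinging, one for the other activities); the function name and the label strings are hypothetical.

```python
def check_classification(label, peaks_per_interval):
    """Return the label to use after the peak-count check.

    label: "swinging" or "group1" (texting, calling, pocket, suitcase).
    peaks_per_interval: number of peaks found in each reference interval.
    """
    total = len(peaks_per_interval)
    if total == 0:
        return label  # nothing to check
    one_peak = sum(1 for n in peaks_per_interval if n == 1)
    two_peaks = sum(1 for n in peaks_per_interval if n == 2)
    if label == "swinging" and one_peak > 0.5 * total:
        return "group1"    # fed back to the second classification layer
    if label == "group1" and two_peaks > 0.5 * total:
        return "swinging"  # sent to the swinging step detector
    return label
```

For example, an observation labeled as swinging whose reference intervals mostly contain a single peak is relabeled as group 1 and re-classified by the second layer.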
Capturing the step events, i.e., detecting the peaks, is the main factor in achieving high accuracy of the distance estimation. Therefore, a maximum correction algorithm is necessary to correct the peaks that were missed in the first detection described in Section 2.5.1. This algorithm is shown in Figure 8.
The key idea is to use all successfully detected valleys and the characteristics of the acceleration data for each type of activity during walking (analyzed in Section 2.5.1) to find the peaks that were missed in the first detection; only the reference intervals in which the peak detection was incorrect are taken into account. If there are fewer than two peaks in a reference interval in the swinging case (or fewer than one in the other cases), the Findpeaks function detects the peaks once more, with the threshold set to the maximum acceleration of the valleys in that interval. This threshold varies with the valleys of each reference interval to ensure that the peaks are successfully detected. In some cases, an irregular motion occurs during walking and causes redundant peaks to be detected. Then, the two peaks (in the swinging case) or the one peak (in the other cases) with the maximum accelerations are selected as the correct peaks.
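
A simplified sketch of the maximum correction is shown below, assuming the valleys are already known. It re-detects the peaks in every reference interval using the larger of the two bounding valley values as the threshold and keeps the expected number of highest peaks. The function name is hypothetical, and the flow of Figure 8 is not reproduced.

```python
def correct_peaks(signal, valleys, expected):
    """Keep `expected` peaks per reference interval (2 for swinging, 1 otherwise).

    The threshold of each interval is the maximum acceleration of its two
    bounding valleys; redundant peaks are dropped by keeping the highest ones.
    """
    peaks = []
    for left, right in zip(valleys, valleys[1:]):
        threshold = max(signal[left], signal[right])
        candidates = [
            i for i in range(left + 1, right)
            if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
            and signal[i] > threshold
        ]
        candidates.sort(key=lambda i: signal[i], reverse=True)
        peaks.extend(sorted(candidates[:expected]))
    return peaks


# Toy signal with a redundant peak (index 4) in the first reference interval.
toy = [-1, 0.2, 1.0, 0.1, 0.4, -0.2, -1, 0.5, 0.9, 0.3, -1]
```

With `expected=1` the redundant peak at index 4 is discarded; with `expected=2` it is kept as the second peak of that interval.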

Distance Estimation Method
For the walking distance estimation problem, a popular strategy is to sum up the lengths of all steps walked [13,[27][28][29][30]. In the proposed method, we derive the step-length equations based on the results of three previous studies [13,29,30]. These studies use a K-factor that is manually set according to the statistics of the volunteers. In the previous work [10], the K-factor was presented as a parametric polynomial regression model. However, the parametric model is less robust and less flexible than the non-parametric model [31]. Considering this issue, we propose the K-factor as a non-parametric regression model of the velocity features, which is called locally weighted polynomial regression [32,33]. Furthermore, we consider the three step length equations of [13,29,30] to obtain efficient distance estimators for each activity. The equations of the step length estimation are as follows:

• Weinberg method [13]:

$$SL = K \sqrt[4]{A_{max} - A_{min}},$$

where SL is the step length, A_max and A_min are the maximum and minimum accelerations in the vertical movement of the human body axis, and K is a constant unit for conversion (i.e., feet or meters traveled).

• Kim method [29]:

$$SL = K \sqrt[3]{\frac{\sum_{i=1}^{N} |A_i|}{N}},$$

where A_i is the measured acceleration of the ith sample in a single step; N is the number of samples covered in each step; and K is a constant unit for conversion.

• Tian method [30]:

$$SL = K \, h \sqrt{f_s},$$

where h is the height of the subject and f_s is the step frequency, which is measured during the walking experiment; K is a constant unit for conversion.
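Based on these methods, we derive the K-factor as a polynomial function of the step velocity:

$$K_j = \beta_0 + \beta_1 v_j + \beta_2 v_j^2 + \dots + \beta_p v_j^p + e_j, \quad j = 1, \dots, n,$$

where v is the magnitude of the average velocities on the three axes in each step, p is the polynomial degree, and e = (e_1, e_2, ..., e_n) and n are the noise and the number of observations, respectively. We assume that e contains uncorrelated, zero-mean random variables [32]. The coefficients are then obtained by solving the weighted least-squares problem:

$$\hat{\beta} = \arg\min_{\beta} \, (K - V\beta)^{T} W (K - V\beta),$$

where K = (K_1, ..., K_n)^T, V is the n × (p + 1) matrix whose jth row is (1, v_j, v_j^2, ..., v_j^p), and W is a diagonal matrix of Gaussian weights, which achieves a more accurate local approximation model and a smooth fit [34].
The solution for the coefficient vector β is:

$$\hat{\beta} = (V^{T} W V)^{-1} V^{T} W K.$$

The K-factor of the ith step is obtained as:

$$\hat{K}_i = \sum_{j=0}^{p} \hat{\beta}_j v_i^{j}.$$

The proposed adaptive step-length estimation equations, where SL_i is the length of the ith step, are derived from the three equations mentioned above as follows:

Non-parametric Weinberg method:

$$SL_i = \hat{K}_i \sqrt[4]{A_{max} - A_{min}}$$

Non-parametric Kim method:

$$SL_i = \hat{K}_i \sqrt[3]{\frac{\sum_{k=1}^{N} |A_k|}{N}}$$

Non-parametric Tian method:

$$SL_i = \hat{K}_i \, h \sqrt{f_s}$$

The walking distance D is calculated by summing the lengths of all steps for each experiment:

$$D = \sum_{i=1}^{N} SL_i,$$

where N is the number of walked steps in each experimental sample.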
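
To make the estimation chain concrete, the sketch below fits a locally weighted polynomial regression of the K-factor on the step velocity, evaluates it at a query velocity, and applies the Weinberg form of the step length. It is a minimal illustration under stated assumptions: the Gaussian kernel is centred on the query velocity, the `bandwidth` value is invented for the example, the training K-factors are assumed to be available from labeled walks, and all function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def local_k_factor(v_query, v_train, k_train, degree=4, bandwidth=0.3):
    """Locally weighted polynomial regression of K on v, evaluated at v_query.

    Assumption: Gaussian weights centred on v_query with a hypothetical bandwidth.
    """
    weights = [math.exp(-((v - v_query) ** 2) / (2 * bandwidth ** 2)) for v in v_train]
    size = degree + 1
    # Normal equations of the weighted least-squares problem.
    lhs = [[sum(w * v ** (r + c) for w, v in zip(weights, v_train))
            for c in range(size)] for r in range(size)]
    rhs = [sum(w * k * v ** r for w, v, k in zip(weights, v_train, k_train))
           for r in range(size)]
    beta = solve_linear(lhs, rhs)
    return sum(b * v_query ** j for j, b in enumerate(beta))


def weinberg_step_length(k, a_max, a_min):
    """Weinberg-form step length with a given K-factor."""
    return k * (a_max - a_min) ** 0.25


def walking_distance(step_lengths):
    """Total distance as the sum of all step lengths."""
    return sum(step_lengths)
```

A low polynomial degree is numerically safer with few training points; the degree of four reported in the experiments would need a correspondingly larger training set.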

Activity Classification
As mentioned, to collect sufficient data to assess the performance of our proposed method, ten participants were requested to perform five daily wrist activities in 20 m of walking at different speed levels. We used confusion matrices to evaluate the performance of the classifiers in Tables 3 (classifier 1) and 4 (classifier 2). In these tables, the first column lists the activities performed by the participants, and the first row lists the activities predicted by the classifiers. In Table 3, the swing activity is 100% correctly predicted. As mentioned, the swing acceleration data are significantly different from those of the other cases. In addition to the up and down motion of the hip, the forward and backward motion of the arms also affects the acceleration data. This characteristic makes the swinging activity different from the other activities. The accuracy of predicting texting/calling/hand in pocket/suitcase carrying is 99%. The first classifier incorrectly predicted 1% of them as swinging. Texting, calling, hand in pocket, and suitcase carrying are center-of-mass motions, and the acceleration is generated by the up and down motion of the hip, but, in some cases, the arm moves slightly because of the inertia of fast walking. In this situation, texting/calling/hand in pocket/suitcase carrying resembles swinging at a slow speed, so the classifier failed to classify these activities. All activities that are predicted as swinging are the input of the swinging step detector. The 1% of incorrectly predicted activities are rechecked in the step detector and returned to the second classifier. The classes (texting, calling, hand in pocket and suitcase carrying) from the first classifier are the input of the second classifier. The confusion matrix is provided in Table 4. The hand in pocket activity is perfectly predicted. Calling and suitcase carrying are each 2% incorrectly predicted, as texting and calling, respectively. Texting is 1% incorrectly classified as hand in pocket. Those errors affect the performance of the step detector, but only to an acceptable degree, because all of these activities have one peak between two valleys.

Step Detection
Classified data are fed to the step detector, which has five different reference and adaptive thresholds for the five walking activities. The misclassifications of the first classifier are returned and corrected. The step detection algorithm is affected by the wrong classifications of the first and second classifiers. Figure 9 illustrates the accuracy and standard deviation of the step detection with and without misclassification correction for each walking activity. As shown in the figure, the accuracy of the step detection algorithm with misclassification correction is higher than without it for calling, suitcase carrying and swinging. This is because the 1% of activities wrongly predicted as swinging by the first classifier are returned to the second classifier. For step detection with misclassification correction, the accuracy of each walking activity is higher than 98% and the highest standard deviation is 3%. We must emphasize that the axis carrying the vertical acceleration changes among the x, y and z-axes depending on the walking activity. Step detection and distance estimation are therefore difficult when the vertical acceleration is taken from a fixed axis. The classification solves this problem by identifying the activity and using the corresponding vertical data of that activity. In addition, the adaptive threshold also improves the step detection performance.

Walking Distance Estimation
To evaluate the performance of the proposed method, the Leave-One-Sample-Out technique, which makes one trial the test set and the remaining trials the training set in each epoch, was applied to the classified activity data. This technique is commonly used for small datasets [35]. We derived the K-factor as a p-degree polynomial function of the velocity feature. In the experiment, we examined various values of p to minimize the estimation error. The polynomial degree p of the K-factor, which was implemented for the three methods [13,29,30], was four.
The walking speeds are: low speed (v ≤ v̄ − σ), normal speed (v̄ − σ < v < v̄ + σ) and high speed (v̄ + σ < v). Here, v̄ is the average speed and σ is the standard deviation of human walking speed [10].
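
The three speed levels can be expressed as a small helper. This is a sketch with a hypothetical function name; the boundary case where the speed equals the mean plus one deviation is not assigned in the text and is placed in the high-speed class here as an assumption.

```python
def speed_level(v, v_mean, sigma):
    """Classify a walking speed as "low", "normal" or "high".

    v_mean and sigma are the average and deviation of the walking speed.
    Assumption: v == v_mean + sigma is treated as "high".
    """
    if v <= v_mean - sigma:
        return "low"
    if v < v_mean + sigma:
        return "normal"
    return "high"
```

For instance, with an assumed average of 1.2 m/s and a deviation of 0.3 m/s, a trial walked at 0.8 m/s falls in the low-speed class.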
The performance (accuracy, standard deviation (Std) and normalized mean square error (NMSE)) of each traveling distance estimator for each activity and walking speed is presented in Table 5. All three proposed methods estimate the walking distance in the texting activity efficiently, and the performance is best when the person walks at high speed (the accuracy is more than 97.91%). The average distance accuracy is higher than 97% for low, normal and high speeds, the normalized mean square error is 1.19, and the standard deviation is acceptable (below 0.92 m). In the calling case, by contrast, the distance accuracy at high walking speed is lower than that at normal and low walking speeds. For the other cases, the accuracy does not depend on the walking speed but on the deployed method. For example, in the hand-in-pocket case, the highest accuracy is 97.89% using the non-parametric Kim method at high walking speed, and the worst accuracy is 93.33% using the non-parametric Tian method at low speed. The swinging case has a larger standard deviation than the other activities because the vertical acceleration changes as a result of both arm swinging and hip movement during walking.
One of our objectives is to select the distance estimators that are stable and achieve high accuracy for each daily hand activity using a smart band. The non-parametric Tian estimator was used to estimate the walking distance of the texting, calling, and suitcase-carrying activities. In addition, the non-parametric Kim estimator was used for the hand-in-pocket and swinging activities.
According to Table 6, the proposed method achieved an average accuracy of 96.9%, whereas that of the reference method is 95.1%. The calling-during-walking experiment is the most unstable because of the different arm gestures used during a phone call and arm fatigue during the experiments. The smallest and largest gaps in accuracy between the proposed method and the reference method were found for the texting and hand in pocket activities, respectively. According to Figure 10, the proposed and reference methods suffer low accuracy and high standard deviation for the suitcase-carrying and calling activities, respectively. Overall, by using a dedicated walking distance estimator for each activity, the proposed method surpasses the reference method in terms of accuracy and standard deviation.

Conclusions
In this paper, a step detection algorithm and a walking distance estimation method based on daily hand activity recognition using a smart band have been presented and experimentally evaluated. Five daily hand activities during walking were considered: phone calling, phone texting, hand in pocket, suitcase carrying and swinging. Each hand activity has different vertical acceleration data, and the changing vertical acceleration data of the smart band is the main challenge of the distance estimation. Therefore, two SVM classifiers are used to classify the activities and to let the step detector and distance estimator know which activity is being processed. In addition, the classification is processed in two steps to improve the robustness of the step detection and walking distance estimation by feeding back the wrongly classified candidates. To evaluate the performance of the proposed method, experiments of 20-m walking while performing daily hand activities using the Microsoft smart band were conducted with ten participants. The classification accuracy was above 99% for all activities with both classifiers. With prior knowledge about the data being processed, the adaptive threshold strategy of the step detection algorithm performs effectively. The step detection error is approximately 2%. The experimental results also show the performance of the three non-parametric methods, and we compared the performance of the walking distance estimation algorithm with the reference method. The results show that the proposed method achieves higher accuracy and robustness than the reference method.
In the proposed method, a post-hoc analysis has been applied.For real applications, real-time processing algorithms will be required.Also, to enhance the performance of estimation, we should consider other daily living activities.These remain future work.

Figure 3. Brief structure of step detection-based distance estimation.

Figure 4. Proposed hierarchical framework of walking distance estimation.

Figure 6. Relation between the vertical acceleration and the activities during walking: (a) calling; (b) swinging.

Figure 10. Performance comparison of the proposed method and the reference method.

Table 1. Corresponding features of the classifiers.

Table 2. Notation of the variables used in the algorithms.

Table 3. Classification results: swinging versus texting, calling, hand in pocket and suitcase carrying.

Table 5. Distance estimation accuracy of the proposed method.

Table 6. Accuracy of the distance estimation of the proposed method and the reference method.