Iterative Learning for Human Activity Recognition from Wearable Sensor Data †

Wearable sensor technologies are a key component in the design of applications for human activity recognition, in areas like healthcare, sports and safety. In this paper, we present an iterative learning method to classify human locomotion activities extracted from the Opportunity dataset by implementing a data-driven architecture. Data collected by twelve 3D acceleration sensors and seven inertial measurement units are de-noised using a wavelet filter, prior to the extraction of statistical parameters of kinematic features, obtained through Principal Component Analysis and Singular Value Decomposition of roll, pitch, yaw and the norm of the axial components. A novel approach is proposed to minimize the number of samples required to classify the walk, stand, lie and sit human locomotion activities based on these features. The methodology consists of an iterative extraction of the best candidates for building the training dataset. A sample is selected as a training candidate when the Euclidean distance between that sample and its cluster's centroid is larger than the mean plus the standard deviation of all Euclidean distances between all input samples and their corresponding cluster centroids. The resulting datasets are then used to train an SVM multi-class classifier that produces the lowest prediction error. The learning method presented in this paper ensures a high level of robustness to variations in the quality of input data, while using a much smaller number of training samples and therefore a much shorter training time, which is an important aspect given the large size of the dataset.


Introduction
Wearable sensor technologies are gaining interest in research communities due to the use of significantly miniaturized electronic components with low power consumption, which makes them ideal for applications in human activity recognition for both indoor and outdoor environments. These applications allow users to achieve a natural execution of any physical activity, while providing good results in multiple practical applications, such as health rehabilitation, respiratory and muscular activity assessment, sports and safety applications [1]. However, in practical situations, collected data are affected by several factors related to sensor data alignment, data losses and noise, among other experimental constraints, all deteriorating their quality [2]. Also, the non-ergodicity of the acquisition process, especially when processing signals from acceleration sensors, will result in a poor learning performance [3] in applications involving multi-class classification [4]. The problems become even more complex if the multi-class classification process is applied on high-dimensionality data vectors. Considering these restrictions, prevalent in multimodal sensor data fusion [3], which is the case in the study reported in this paper, feature extraction becomes a critical component for finding the multi-variable correlations that allow the classifier to improve the model precision, reflected by a low misclassification rate.
In this paper, we present a new method for classifying human locomotion activities (e.g., walk, stand, lie and sit) by implementing a data-driven architecture based on an iterative learning framework. The proposed solution optimizes the model performance by choosing the best training dataset for non-linear multi-class classification that makes use of an SVM classifier, while also reducing the computational load. We aim to show that, by appropriately choosing the data samples for the training of this multi-class classifier, we can achieve results close to the current approaches in the literature, while using only a fraction of the data and significantly improving the computation time. The article is organized as follows: Section 2 presents our method, Section 3 shows relevant results, and Section 4 discusses the conclusions.

Iterative Learning Method for Classifying Human Locomotion
The work in this paper is based on data acquired by body-worn sensors, extracted from the Opportunity dataset [5]. The body-worn sensors are twelve customized 3D acceleration sensors [6] and seven inertial measurement units (IMUs, Xsens MT9). The dataset has a total of 58 dimensions, including the time stamp. Each device senses the acceleration along three perpendicular axes, recording the acceleration values at a sampling rate of 30 Hz. All records are labeled according to four primitive classes: walk, lie, sit and stand. The signal acquisition protocol is performed under a pre-established scenario with six experimental sessions (or runs), performed independently by four users. The extracted dataset contains a total of 869,387 samples, distributed as follows: 234,661 samples for user 1; 225,183 samples for user 2; 216,869 samples for user 3; and 192,674 samples for user 4. The goal is to extract from these data the best training samples that enable the classification of the locomotion activity of each user independently.

Data Pre-Processing
The data pre-processing phase consists of two steps. First, we proceed with the exclusion of values affected by data losses and random noise, issues that are very common in wireless acceleration sensors. In the dataset we use, roughly 30% of the data contains such values. In order to deal with the problem of missing data, we fused all readings produced by each sensor, for each user and each experiment, to work exclusively from a data-driven perspective, as explained in the following sections. The aim of the second step is to de-noise the raw data.
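The first step can be sketched as follows. This is a minimal NumPy sketch that assumes lost readings are encoded as NaN and that each channel retains at least one valid sample; the exact encoding of data losses in the Opportunity files and the fusion strategy are not reproduced here.

```python
import numpy as np

def clean_channels(X):
    """Drop samples that are entirely missing, then fill remaining missing
    readings in each sensor channel by linear interpolation over time.
    Illustrative sketch: missing values are assumed to be NaN."""
    X = np.asarray(X, dtype=float)
    keep = ~np.all(np.isnan(X), axis=1)      # remove fully lost samples
    X = X[keep].copy()
    t = np.arange(len(X))
    for c in range(X.shape[1]):
        col = X[:, c]
        bad = np.isnan(col)
        if bad.any():
            # interpolate each gap from the surrounding valid readings
            col[bad] = np.interp(t[bad], t[~bad], col[~bad])
    return X
```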

Wavelet Filtering
In order to efficiently de-noise the raw data, we include a mechanism that guarantees that the resulting classification model is not biased by the quality of the input data [7]. In general, acceleration sensors are influenced by several noise sources, such as electrical noise induced by the electronics [8], or noise produced by the wireless communication processes, resulting from the propagation phenomenon and causing distortion in the transmitted signal. The noise present in acceleration sensor measurements commonly has a flat spectrum, meaning that noise is present in all frequency components. This constitutes a challenge for traditional filtering methods, which, by removing sharp features, can introduce distortions in the resulting signal. Decomposing the noisy signal into wavelets [9] eliminates the small coefficients, commonly associated with the noise, by zeroing them, while concentrating the signal in a few large-magnitude wavelet coefficients.
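Wavelet de-noising of this kind can be sketched as below. The paper does not name the wavelet family or decomposition depth, so a Haar transform with hard thresholding (zeroing small detail coefficients, as described above) and a universal threshold are used purely for illustration.

```python
import numpy as np

def haar_denoise(x, levels=3):
    """De-noise a 1-D signal: Haar wavelet decomposition, zero small detail
    coefficients, reconstruct. Assumes len(x) is divisible by 2**levels.
    Wavelet choice and threshold rule are illustrative, not from the paper."""
    x = np.asarray(x, dtype=float)
    approx, details = x.copy(), []
    for _ in range(levels):
        a = (approx[0::2] + approx[1::2]) / np.sqrt(2)   # approximation
        d = (approx[0::2] - approx[1::2]) / np.sqrt(2)   # detail
        details.append(d)
        approx = a
    # Noise level estimated from finest-scale details (median absolute
    # deviation rule), then the universal threshold sigma*sqrt(2*ln n).
    sigma = np.median(np.abs(details[0])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(x)))
    details = [np.where(np.abs(d) > thresh, d, 0.0) for d in details]
    for d in reversed(details):                          # reconstruct
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + d) / np.sqrt(2)
        out[1::2] = (approx - d) / np.sqrt(2)
        approx = out
    return approx
```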

Feature Extraction and Selection
After filtering the raw data, we proceed with the feature extraction and selection process. The aim is to retrieve a set of data with high correlation, allowing us to extract the best candidates for the training dataset [10]. This process focuses on the extraction of kinematic features, such as roll, pitch, yaw (RPY), and the norm of the axial components produced by each of the body-worn sensors. Our first feature set is based on the signal magnitude vector (SMV). At each time instance j, an acceleration sensor k produces a 3D vector of acceleration values along a system of orthogonal axes, a_{k,j} = (a_x, a_y, a_z) ∈ ℝ³. For each sensor, we can retrieve the signal magnitude vector |a_{k,j}| = √(a_x² + a_y² + a_z²). A second feature set is related to roll, pitch and yaw (RPY), computed from the axial components of each sensor. Finally, we build a matrix with all axial components produced by all sensors under observation. This matrix has n × s × d components, where n is the number of samples in each experiment, for s sensors in d dimensions. To deal with the absence of some values, we use principal component analysis (PCA) and singular value decomposition (SVD). PCA provides a mechanism to reduce dimensionality, while SVD provides a convenient way to extract the most meaningful data. Combining these techniques, we find data dependency while removing redundancy, and the nature of the resulting data structures is preserved in each feature category. When applying PCA, each feature category is reduced to two principal components (Figure 1a). Similarly, when SVD is applied, each feature category is reduced to the first two SVD dimensions, as shown in Equation (3). The new target function F_{k,j} is represented as follows: F_{k,j} = ( PCA(RPY), PCA(SMV), PCA(a_{x,y,z}), SVD(RPY), SVD(SMV), SVD(a_{x,y,z}) ) (3). We are therefore reducing our analysis to a function with three attributes (RPY, SMV, a_{x,y,z}) and two mathematical methods, PCA and SVD.
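The two building blocks described above, the signal magnitude vector and the reduction of a feature category to its first two principal components, can be sketched with NumPy. The function names are illustrative; the SVD-based projection is one standard way to obtain the two leading components of a centered feature matrix.

```python
import numpy as np

def smv(acc):
    """Signal magnitude vector |a| = sqrt(ax^2 + ay^2 + az^2) for an
    (n, 3) array of tri-axial acceleration samples."""
    return np.linalg.norm(acc, axis=1)

def two_pc(X):
    """Project an (n, d) feature-category matrix onto its first two
    principal components, computed via SVD of the centered data, matching
    the paper's reduction of each feature category to two dimensions."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T      # scores on the two leading components
```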

Learning Architecture
Our learning framework aims to classify human activities using a single multi-class SVM classifier (LibSVM version 3.20 [11]). To achieve this, we must deal with two data constraints: (1) the large size of the experimental datasets, containing in many cases overlapping class members and high data density; and (2) the non-ergodicity of the recorded signals, demonstrated by the fact that we were not able to find temporal patterns in the dataset. In order to improve the classification accuracy while reducing the required processing time, the features ((f_1, f_2), …, (f_i, f_j)) produced by Equation (3) are grouped pairwise to cover all possible combinations. The candidates for the training dataset are then determined by measuring the Euclidean distance between each class member and the centroids of each distribution of (f_i, f_j). If the resulting distance is larger than the mean plus the standard deviation of all resulting Euclidean distances, then the class member is considered a candidate for the training set. This process leads to the creation of support vectors, which generate the optimal separating hyperplanes to classify the remaining data with only a fraction of the total data presented for each user experiment. The goal is to build a robust classification model that is not affected by the quality of the input data [12].
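The pairwise grouping of features can be sketched with the standard library. The feature names below are assumptions derived from the six components of Equation (3); with six features there are 15 possible pairs.

```python
from itertools import combinations

# Hypothetical names for the six components of Equation (3):
# PCA and SVD applied to the RPY, SMV and axial feature categories.
features = ["PCA_RPY", "PCA_SMV", "PCA_axial",
            "SVD_RPY", "SVD_SMV", "SVD_axial"]

# All pairwise combinations (f_i, f_j), covering every possibility once.
pairs = list(combinations(features, 2))
```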

Training Data Selection
The following procedure summarizes the process for the extraction of the training dataset (for any user and any experiment):
1. Select a user.
2. Select a user experiment.
3. Extract two features (f_i, f_j) from the experiment.
4. Extract all classes from (f_i, f_j).
5. Select a pair of classes (C_n, C_m) (i.e., a one-versus-all methodology is used) and extract their corresponding centroids.
6. Extract the Euclidean distance between each class member (x_i) and the centroid of the opposite class (c_m). Store the results in a vector of distances R_{n,m}(i) = ||x_i − c_m||, where n and m are the classes of (f_i, f_j), x_i is a class member and c_m is the opposite centroid with respect to the discriminating hyperplane of the class member under evaluation (Figure 1b).
7. If the resulting Euclidean distance vector R_{n,m}(i) satisfies condition (5), i.e., the distance is larger than the mean plus the standard deviation of all resulting distances, then the class member is a candidate for the training dataset.
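Steps 5 to 7 can be sketched for a single class pair as follows. This reads the selection rule as "distance to the opposite class centroid larger than the mean plus one standard deviation of all such distances", which is one reading of the paper's condition (5); function and variable names are illustrative.

```python
import numpy as np

def training_candidates(Xn, Xm):
    """Given the members of class n (Xn) and class m (Xm) as (n, d) arrays,
    return the members of class n selected as training candidates.
    Sketch of steps 5-7: distances R_{n,m}(i) = ||x_i - c_m|| are compared
    against mean + std of all distances (condition (5))."""
    cm = Xm.mean(axis=0)                      # centroid of the opposite class
    dist = np.linalg.norm(Xn - cm, axis=1)    # vector of distances R_{n,m}
    return Xn[dist > dist.mean() + dist.std()]
```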

Model Selection
Once the best training dataset BoC(R_{n,m}(i)) has been identified, we proceed with the selection of the best classification model using a multi-class SVM classifier with an RBF kernel. Since we have more than two classes, we follow the one-versus-all strategy. The problem of model selection is reduced to finding the best combination of the parameters cost c and γ, extracted in a 5-fold cross-validation process in which the values of c and γ are chosen from a grid of values, i.e., (2^{−5}, …, 2^{7}). The reason for this grid is to compensate for the behavior of c and γ. When c is large, the classifier presents low bias and high variance; for small values of c, the classifier presents high bias and low variance. A similar situation is found with γ: for large values of γ the classifier presents low bias and high variance, and for small values of γ it presents high bias and low variance. The best model produced by the combination of c and γ in the 5-fold cross-validation achieves the lowest misclassification rate. This model is then used to predict the labels of the testing dataset. Once the classification rate is determined, the algorithm stores the accuracy values, the features (f_i, f_j), c and γ, and the size of the training sample BoC(R_{n,m}(i)), and repeats the process until all combinations of (f_i, f_j) are exhausted.
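The grid search over (c, γ) can be sketched with scikit-learn, whose SVC class wraps the same libsvm library. Note one deliberate difference, stated for transparency: SVC's built-in multi-class handling is one-versus-one, whereas the paper follows a one-versus-all strategy, so this is a sketch of the parameter search rather than a faithful reproduction of the paper's classifier.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Grid of powers of two, 2^-5 .. 2^7, for both cost C and gamma,
# mirroring the range described in the text.
grid = {"C": 2.0 ** np.arange(-5, 8), "gamma": 2.0 ** np.arange(-5, 8)}

def best_model(X, y):
    """5-fold cross-validated search for the (C, gamma) pair that yields
    the lowest misclassification rate with an RBF-kernel SVM."""
    search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)
    search.fit(X, y)
    return search.best_estimator_, search.best_params_
```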

Experimental Results
The proposed process is evaluated over three experimental scenarios, and the results are presented in Tables 1-3. We compare our method with a scenario in which the training dataset is randomly selected and its size corresponds to 80% of the total data of each user experiment, a common practice when 5-fold cross-validation is performed. These results are compared with our proposed method in Table 4. We use two measures to validate our results, namely the prediction accuracy (Acc) and the size of the training dataset as a percentage of the total dataset (TS): Acc = (labels correctly predicted) / (size of user's dataset) × 100%; TS = size(R_{n,m}) / (size of user's dataset) × 100% (6)
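The two measures of Equation (6) amount to the following; the function name is illustrative.

```python
def accuracy_and_ts(y_true, y_pred, n_train, n_total):
    """Equation (6): Acc is the percentage of correctly predicted labels,
    TS is the training-set size as a percentage of the user's dataset."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    acc = 100.0 * correct / len(y_true)
    ts = 100.0 * n_train / n_total
    return acc, ts
```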

Figure 1. (a) PCA is applied to a_{x,y,z} (the data distribution corresponds to the first and second principal components); (b) classes are extracted in pairs (C_n, C_m), centroids are extracted and Euclidean distances are calculated according to step 6; and (c) training candidates are produced by the selection algorithm.