
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The use of on-body wearable sensors is widespread in several academic and industrial domains. Of great interest are their applications in ambulatory monitoring and pervasive computing systems, where the main computational tasks are the quantitative analysis of human motion and its automatic classification. In this paper, we discuss how human physical activity can be classified using on-body accelerometers, with the main emphasis on the computational algorithms employed for this purpose. In particular, we motivate our current interest in classifiers based on Hidden Markov Models (HMMs). An example is illustrated and discussed by analysing a dataset of accelerometer time series.

The availability of a system capable of automatically classifying the physical activity performed by a human subject is extremely attractive for many applications in the field of healthcare monitoring and in developing advanced human-machine interfaces. By the term physical activity, we mean either static postures, such as standing, sitting and lying, or dynamic motions, such as walking, running, stair climbing, cycling, and so forth. More precisely, we distinguish in this paper between primitives, namely elementary activities like the ones just mentioned, and composite activities, namely sequences of primitives, e.g., sitting-standing-walking-standing-sitting, in much the same way as we distinguish between words and sentences in a spoken language.

The information on the human physical activity is valuable in the long-term assessment of biomechanical parameters and physiological variables. Think, for instance, of the limitations when the metabolic energy expenditure of a human subject is estimated using indirect methods: serious estimation errors may occur when wearable sensor systems composed of motion sensors, such as accelerometers, are used without any regard to what she/he is actually doing [

In this paper the most common approaches to automatic classification of human physical activity are introduced and discussed. In regard to the problem stated above, the main steps regarding sensor selection, data acquisition, feature selection, extraction and classification are reviewed by tracing the diagram of

The first important aspect to be considered in building a system for automatic classification of human physical activity concerns the choice of sensors. Wearable sensors should be small and lightweight, in order to be fastened to the human body without compromising the user’s comfort and allowing her/him to perform under unrestrained conditions as much as possible. Although ultrasonic or electromagnetic localisation systems [

Historically, accelerometers entered the biomechanical arena well before gyros. Few pioneering contributions [

Interestingly, using accelerometers is also commonplace in many other biomedical applications, such as tremor analysis [

A pattern recognition machine does not perform its classification tasks directly on the raw sensor data. Usually, classification is carried out after a data representation has been built in terms of feature variables. Choosing features with high information content for classification purposes is both a fundamental step in the development of any pattern recognition machine and a highly problem-dependent task.

An accelerometer—the sensor of main interest in this paper—measures the projection along its sensitive axis of the specific force

Although the choice of features is problem-specific, and different researchers may pursue different approaches for their identification and computation [

The DC component of acceleration is estimated by taking the signal average from the data samples within each frame. Since each accelerometer axis provides a data frame, the DC component feature vector can be conveniently used to get an idea about how the body is oriented in space with respect to the gravity direction. The DC component is thus well suited to classify postures.
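As an illustration of how the DC component relates to posture, the following minimal Python sketch averages each axis of a frame and derives the angle between one sensor axis and the measured gravity direction. The function names, the tri-axial layout and the sample values are hypothetical and are not taken from the dataset analysed in this paper.

```python
import math

def dc_vector(frame_by_axis):
    """Average of the samples of each axis within one frame."""
    return [sum(axis) / len(axis) for axis in frame_by_axis]

def inclination_deg(dc, axis_index):
    """Angle (degrees) between one sensor axis and the measured gravity direction."""
    norm = math.sqrt(sum(c * c for c in dc))
    ratio = max(-1.0, min(1.0, dc[axis_index] / norm))  # guard against rounding
    return math.degrees(math.acos(ratio))

# Hypothetical frames in units of g; axis order (x, y, z), z assumed along the trunk.
upright = [[0.02, -0.01, 0.00], [0.01, 0.00, -0.01], [0.99, 1.01, 1.00]]
lying = [[1.00, 0.99, 1.01], [0.00, 0.01, -0.01], [0.02, 0.00, -0.02]]

upright_angle = inclination_deg(dc_vector(upright), 2)  # close to 0 degrees
lying_angle = inclination_deg(dc_vector(lying), 2)      # close to 90 degrees
```

In this toy case the trunk axis is nearly aligned with gravity when upright and nearly perpendicular to it when lying, which is the kind of separation that makes the DC component suitable for posture classification.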

Simple statistical descriptors, such as the variance, are widely used; the variance is computed by taking the average of the squared detrended data samples within each frame. The signal energy and the distribution of signal energy over the frequency domain are other popular choices. Frequency-domain features can be derived from the coefficients of time-frequency transforms, like the Short Time Fourier Transform (STFT), the Continuous or the Discrete Wavelet Transform (CWT, DWT) [

The frequency-domain entropy is helpful in discriminating primitives that differ in complexity. As a matter of fact, walking and cycling can be difficult to discriminate based on the DC component and energy features; however, the walking entropy turns out to be much higher than the cycling entropy, mainly because of the foot impacts with ground occurring during walking, which give rise to the distinctive high-frequency coloured noise-like signatures typically observed in the signals from on-body accelerometers. In this paper, the coefficients of the STFT transform are used to compute the frequency-domain entropy [
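The effect of impact-like content on the entropy can be sketched in a few lines of Python. This is a simplified stand-in, not the estimator used in this paper: it takes the Shannon entropy of the normalised power spectrum of a single frame, computed with a naive DFT, and the two test signals are synthetic.

```python
import cmath
import math

def power_spectrum(frame):
    """Squared DFT magnitudes for bins 1..N/2 (the DC bin is skipped)."""
    n = len(frame)
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def spectral_entropy(frame):
    """Shannon entropy (bits) of the power spectrum normalised to sum to one."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

n = 32
# Smooth, cycling-like motion: a single sinusoid.
smooth = [math.sin(2 * math.pi * 2 * i / n) for i in range(n)]
# Impact-like motion: the same sinusoid plus one sharp spike in the frame.
impact = [s + (3.0 if i == 5 else 0.0) for i, s in enumerate(smooth)]

entropy_smooth = spectral_entropy(smooth)
entropy_impact = spectral_entropy(impact)
```

The spike spreads energy across all frequency bins, so `entropy_impact` comes out clearly larger than `entropy_smooth`, mirroring the walking versus cycling contrast described above.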

The correlation coefficients between each pair of accelerometer signals are also useful features. They are obtained by computing the dot product of pairs of frame vectors, normalised to their length, and are highly helpful in discriminating activities that involve motions of several body parts [
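A minimal Python sketch of this feature follows, assuming that each frame is detrended (mean-removed) before the normalised dot product is taken. The channel names and sample values are hypothetical.

```python
import math
from itertools import combinations

def correlation(frame_a, frame_b):
    """Normalised dot product of two mean-removed frames."""
    mean_a = sum(frame_a) / len(frame_a)
    mean_b = sum(frame_b) / len(frame_b)
    a = [x - mean_a for x in frame_a]
    b = [y - mean_b for y in frame_b]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # a constant frame carries no correlation information
    return sum(x * y for x, y in zip(a, b)) / norm

def pairwise_correlations(channels):
    """Correlation coefficient for every unordered pair of distinct channels."""
    return {(i, j): correlation(channels[i], channels[j])
            for i, j in combinations(range(len(channels)), 2)}

channels = [
    [0.0, 1.0, 0.0, -1.0],   # hypothetical wrist axis
    [0.0, 2.0, 0.0, -2.0],   # hypothetical ankle axis, moving in phase
    [0.0, -1.0, 0.0, 1.0],   # hypothetical hip axis, moving in antiphase
]
features = pairwise_correlations(channels)
```

Channels that move in phase yield a coefficient near +1 and channels in antiphase yield one near -1, regardless of their amplitudes.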

When the dimension of the feature space is high, learning the parameters of a classifier becomes a difficult task, especially when the size of the training set is small (the curse of dimensionality). Usually, one identifies, empirically or on theoretical grounds, as many features as needed to deal with the classification problem at hand. The available dataset is then divided into a training set and a test set. As a rule of thumb, the

The feature selection approach consists of detecting and discarding the features that are shown to contribute least to a correct response by the classifier. Identifying the optimal feature set is not always feasible because of the high computational cost of searching through an inordinate number of

The feature extraction approach revolves around the idea that data representations can be constructed in subspaces with reduced dimension, while at the same time retaining, if not increasing, the discriminative capability of the new set of feature variables [

Feature selection and feature extraction are not necessarily cascaded in some predefined order. Oftentimes, for instance, a feature selection algorithm is applied either to data that have previously undergone dimensionality reduction by feature extraction, or on its own, with no subsequent extraction step.

A taxonomy of classifiers can be built according to different criteria [

In accordance to the rules of the probabilistic approach, a feature vector _{i}

In the geometric approach the classification is performed based upon the construction of decision boundaries in the feature space that specify regions for each class. Decision boundaries are constructed during the training session via iterative procedures or geometrical considerations. As a matter of fact, Artificial Neural Networks (ANN) are based on iteratively tessellating the feature space [

A carefully handcrafted setting of thresholds is required in order to separate the various classes under examination. For instance, a threshold based on an energy-related feature, or simply the data variance, helps discriminate between presence and absence of motion. The main disadvantage of this approach is its potential sensitivity to intra- and inter-individual variations and to the precise placement of sensors. In this sense, extensive handcrafting of classifier parameters is believed to be detrimental to the generalisation properties of the classifier itself [
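The variance-based example mentioned above can be sketched as follows. The threshold value and the sample frames are hypothetical; in practice the threshold would have to be tuned by hand, which is precisely the weakness of the approach.

```python
def frame_variance(frame):
    """Average of the squared mean-removed samples of one frame."""
    mean = sum(frame) / len(frame)
    return sum((x - mean) ** 2 for x in frame) / len(frame)

# Hypothetical threshold in g^2; it depends on the subject and on sensor placement.
MOTION_THRESHOLD = 0.01

def is_moving(frame, threshold=MOTION_THRESHOLD):
    """Binary decision: motion is present when the variance exceeds the threshold."""
    return frame_variance(frame) > threshold

standing = [1.00, 1.01, 0.99, 1.00, 1.01, 0.99]  # small fluctuations around 1 g
walking = [1.0, 1.4, 0.6, 1.3, 0.7, 1.0]         # large swings around 1 g
```

With these values `is_moving(standing)` is false and `is_moving(walking)` is true; a different sensor location or subject could easily require a different threshold.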

The template matching approach is based on the concept of similarity between observed data and activity templates, either defined by the designer or obtained during the training session. The editing and condensing techniques, customarily applied to

Finally, there exist so-called binary classifiers, where the classification process is articulated in several different steps. At each step, different strategies, based on either threshold-based or template-matching detectors, are followed to reach a binary decision. For instance, in hierarchical binary decision trees each node is capable of discriminating between two states, and the classification becomes progressively more refined as the tree is descended along its branches [

Although single-frame methods are quite widespread for the classification of human physical activity, a possibly better way to deal with this problem is to exploit the decisions taken by the classifier in the past (sequential approach to classification). In a sequential approach, a composite activity (motor sentence) can be conveniently viewed as the result of chaining a number of primitives (motor words). Knowledge of the way humans organise the functional tasks they are involved in during their daily life (motor language) can help describe the statistical properties of this chaining process. The sequential approach calls quite naturally for Markov modelling [_{i}

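prior probability vector $\pi$, whose element $\pi_i$ is the probability that the chain is in state $S_i$ at the initial time $t_0$;

transition probability matrix (TPM) $A$, whose element $a_{ij}$ is the probability of a transition from state $S_i$ at time $t_n$ to state $S_j$ at time $t_{n+1}$, as schematically depicted in

Elementary considerations of probability calculus yield the following constraints for the transition probabilities:

$$a_{ij} \geq 0, \qquad \sum_{j} a_{ij} = 1 \quad \text{for every } i.$$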

The prior and transition probabilities needed to create the Observable Markov Model (OMM) (π, A) associated to the Markov chain can be empirically determined based on observations of the activity behaviour of a subject. If the TPM and the state at the current time are known, then the most likely state that will follow is probabilistically determined. In a more practical sense, each primitive can only be observed through a set of raw sensor signals (the measured time series from on-body accelerometers, in the present case). We would like to infer the hidden state from the available noisy observations, and to trace the time history of how the primitives have evolved up to the present time, in order to estimate the composite activity. In other words, the states are hidden and only a second-level process is actually observable. The observable outputs are called emissions.
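The empirical determination of the prior and transition probabilities can be sketched as a relative-frequency count over labelled activity sequences. The function name, the three-state alphabet and the two toy sequences below are hypothetical, and the uniform fallback for states with no outgoing transitions is an assumption of this sketch.

```python
from collections import Counter

def estimate_markov_chain(sequences, states):
    """Relative-frequency estimates of the prior vector and the TPM."""
    start_counts = Counter(seq[0] for seq in sequences)
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))  # consecutive (from, to) pairs
    prior = [start_counts[s] / len(sequences) for s in states]
    tpm = []
    for s_from in states:
        row_total = sum(pair_counts[(s_from, s_to)] for s_to in states)
        if row_total == 0:
            # No transition out of this state was observed: uniform row (assumption).
            tpm.append([1.0 / len(states)] * len(states))
        else:
            tpm.append([pair_counts[(s_from, s_to)] / row_total for s_to in states])
    return prior, tpm

states = ["sitting", "standing", "walking"]
sequences = [
    ["sitting", "sitting", "standing", "walking", "walking"],
    ["standing", "walking", "walking", "standing", "sitting"],
]
prior, tpm = estimate_markov_chain(sequences, states)
```

Each row of the estimated TPM sums to one by construction, so the constraints on the transition probabilities are satisfied automatically.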

If the assumption is made that the emissions are discrete, an alphabet _{i}, i_{ij}_{j}_{n}_{i}

Finally, an HMM is modelled by a parameter set

If the emissions are continuous, continuous PDFs are to be assigned, instead of probability mass functions (

A mixture of _{jm}_{jm}_{jm}_{jm}_{jm}

The HMM modelling framework requires that three main problems, (a) through (c), be solved: given an observation sequence _{1})_{2})…_{T}_{1})_{2})…_{T}

Currently, HMMs are applied in a large number of pattern recognition problems. For many years, speech recognition has been considered the killer application for HMMs [

In this paper we propose to build a sequential classifier composed of a Gaussian cHMM. A potential problem with this approach is the huge number of parameters we need to estimate. In fact, a Gaussian cHMM trained in a

Suppose that the training set presents only a relatively limited number of examples. A sensible approach to deal with the difficulty of parameter estimation may be to train, separately, different subsets of them. We propose to train the transition parameters,

Finally, an interesting feature of the classifier we have developed is its capability of managing spurious data. A difficulty arises when activity primitives that have no examples in the training set are presented to the classifier during operation. Our approach to this problem consists of computing the likelihood of each feature vector, given the GMM structure that models the cHMM emissions. A simple threshold-based detector then flags anomalous feature vectors, preventing them from being presented to the classifier.
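The rejection rule can be sketched as follows, under simplifying assumptions that are not taken from this paper: a two-component mixture with diagonal covariances, a two-dimensional feature space, and a fixed log-likelihood threshold. All parameter values are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a Gaussian mixture (log-sum-exp for stability)."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def is_spurious(x, weights, means, variances, log_threshold):
    """Flag a feature vector whose likelihood falls below the threshold."""
    return gmm_log_likelihood(x, weights, means, variances) < log_threshold

# Hypothetical two-component model in a two-dimensional feature space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
LOG_THRESHOLD = -8.0  # hypothetical value, not one reported in this paper

typical = [0.1, -0.2]     # close to the first component
outlier = [10.0, -10.0]   # far from both components
```

Here `is_spurious(typical, ...)` is false and `is_spurious(outlier, ...)` is true. Working with log-likelihoods avoids numerical underflow when the feature space has many dimensions.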

At present, we are developing a wearable sensor system for indoor-outdoor pedestrian navigation, which embodies the following sub-systems: an on-body network of four tri-axial accelerometers, an on-foot fully integrated Inertial Measurement Unit (IMU) that includes a triad of magnetometers, and finally a waist-worn GPS receiver. Since the hardware and firmware components of this system are currently undergoing their production phase, the validation of the classification methods studied in this paper is based on analysing a dataset of acceleration waveforms, made available to us by Prof. Intille and associates at MIT [

The classification methods were applied to the dataset described in [

Since the research goal in [

The procedure of synthesising virtual experiments in the manner described above implied the existence of clear-cut borders between data frames associated to different primitives, which were managed by data cropping in creating the original dataset [

The feature vectors were built from 50%-overlapping sliding windows with 512 samples. Since the sampling frequency was 76.25 Hz, each data frame lasted 6.7 seconds, with every new frame available every 3.35 s. The DC component, the energy, the frequency-domain entropy, and the correlation coefficients were calculated for inclusion in the feature vector. In order to evaluate the entropy, the PDF of the STFT coefficients was estimated using an Epanechnikov kernel density estimator. Since five dual-axis accelerometers were considered in the experimental setup, each feature vector was composed of 30 components, which yielded the DC component, energy and entropy for the 10 data channels, plus 55 correlation coefficients (^{th}-dimensional feature vectors were not submitted to any feature extraction step.
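The framing arithmetic quoted above can be checked with a short Python sketch. The constant and function names are hypothetical; the window length, overlap and sampling frequency are the ones stated in the text.

```python
SAMPLING_RATE_HZ = 76.25
FRAME_LEN = 512
OVERLAP = 0.5

hop = int(FRAME_LEN * (1 - OVERLAP))            # samples between frame starts
frame_duration_s = FRAME_LEN / SAMPLING_RATE_HZ
hop_duration_s = hop / SAMPLING_RATE_HZ

def frame_count(n_samples, frame_len=FRAME_LEN, hop_len=hop):
    """Number of complete frames that fit in a recording of n_samples."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // hop_len

frames_in_two_windows = frame_count(2 * FRAME_LEN)
```

The hop is 256 samples, the frame duration is about 6.71 s, and a new frame becomes available about every 3.36 s, in agreement with the rounded values given in the text. A recording of 1,024 samples yields three complete frames.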

The single-frame classification algorithms included in

The figure of merit for classifier performance assessment was the aggregate classification accuracy; it was computed by constructing an aggregated confusion matrix that added the classification outcomes for all subjects. The algorithms were developed in MATLAB, using the PRTools [

The cHMM-based sequential classification algorithm was a

The classifier training was performed both by running the first-phase training only, and by combining it with the second-phase training, as discussed in Section 2.6. Additional testing was performed, where the TPM estimated in the first-phase was altered, before applying the cHMM-based classifier to incoming data during testing, both with and without the second-phase training.

Finally, additional testing was performed with the aim of specifically assessing the classifier's capability of protecting itself from spurious data. The threshold for spurious frame rejection was determined by performing a ROC study of the sensitivity and specificity of the classification process, averaged over all subjects. If not flagged as spurious, each feature vector presented to the classifier was processed by the Viterbi algorithm, used for estimating the most likely state sequence generated by the cHMM.
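For readers unfamiliar with the Viterbi algorithm, a minimal log-domain sketch in Python is given below. It assumes that the emission log-likelihood of every frame under every state has already been computed; the two-state model and all probability values are hypothetical.

```python
import math

def viterbi(log_prior, log_tpm, log_emissions):
    """Most likely state sequence; log_emissions[t][j] is the log-likelihood
    of frame t under state j."""
    n_states = len(log_prior)
    score = [log_prior[j] + log_emissions[0][j] for j in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at this time step.
            best_i = max(range(n_states), key=lambda i: score[i] + log_tpm[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_tpm[best_i][j] + frame[j])
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def logs(row):
    """Element-wise logarithm, mapping zero probability to minus infinity."""
    return [math.log(p) if p > 0 else float("-inf") for p in row]

# Hypothetical two-state model: 0 = standing, 1 = walking, with persistent states.
log_prior = logs([0.5, 0.5])
log_tpm = [logs([0.9, 0.1]), logs([0.1, 0.9])]
# The third frame weakly favours walking; the other frames favour standing.
log_emissions = [logs(f) for f in
                 [[0.8, 0.2], [0.8, 0.2], [0.4, 0.6], [0.8, 0.2], [0.8, 0.2]]]
path = viterbi(log_prior, log_tpm, log_emissions)
```

A single-frame decision would label the third frame as walking, whereas the decoded path stays in the standing state throughout, because two unlikely transitions would be needed to explain one weakly ambiguous frame. This is the benefit a sequential classifier draws from the TPM.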

The training set for single-frame classifiers is composed of

The performance of the single-frame classifiers is reported in

As for the cHMM-based sequential classifier, we settled

Results summarising the performance of the cHMM-based sequential classifier are given in

Finally, we are interested in assessing the benefits of the rejection of spurious feature vectors, outlined in the previous Section (sensitivity: 96.4%; specificity: 93.7%),

The performance improvement is remarkable, yielding results similar to those achieved when spurious data frames are not inserted in the sequences to be classified, see

The classification accuracy achieved by analysing the reduced acceleration dataset for the purpose of classifying the seven primitives of

It is remarkable that the features selected by the Pudil algorithm yield simply gross postural information (the DC components), and highlight the existence of stable patterns in the various acceleration time series due to coordinated motion of different body parts (the set of surviving correlation coefficients). Nonetheless, it is argued that energy and entropy time-domain features would be highly valuable, provided that we decide to investigate other activities, e.g., those from the set studied in [

An important contribution of this paper is the demonstration that Markov modelling can be an important weapon in our arsenal of computational methods for classification of human physical activity. In fact, it should be pointed out that the cHMM-based sequential classifier performs systematically better than its simple single-frame GMM counterpart (99.1%

In this paper, supervised training is pursued by splitting the estimation of the parameters of the cHMM-based sequential classifier into two distinct phases. This is a helpful recipe for coping effectively with the size limitations of the training set.

A final point is related to the proposed method for managing spurious data. It is worth noting that most published studies, including [

In conclusion, in this paper we have reviewed the various steps needed to implement a pattern recognition machine for automatic classification of human physical activity from on-body accelerometers. A major contribution of the paper lies in pursuing a Markov modelling approach to the design of one such machine. The results of extensive testing performed on an available dataset of acceleration time series shed light on the potential advantages of the proposed approach.

Future work will concern the integration of the proposed pattern recognition machine in the wearable sensor system we are currently developing in our lab for applications in the field of outdoor-indoor pedestrian navigation.

The authors are indebted to Stephen S. Intille, for allowing them to use his acceleration dataset for the computer experiments in this paper.

Conceptual scheme of a generic classification system with supervised learning.

Graphical representation of a six-state Markov chain: the nodes are the states of the chain; the oriented arcs between nodes denote state-to-state transitions, including self-transitions.

Block diagram of the developed cHMM-based sequential classifier.

Experimental setup for the acquisition of the selected dataset (courtesy of Ling Bao and Stephen S. Intille © 2004 IEEE).

Sequential classification through an HMM-based classifier.

Classification accuracy

Feature vectors of three different classes are projected in a bi-dimensional subspace, to show how spurious data can be rejected based on the value of its likelihood.

State of the art of human motor activity classification systems.

Reference | Sensors | Features | Classifier | No. of activities | No. of subjects | Accuracy (%) |

[ | 1 tri-axis accelerometer (3D acc) | Raw data | GMM | 8 | 6 | 91.3 |

[ | 1 bi-axis accelerometer (2D acc) | Wavelet coefficients | k-NN | 5 | 6 | 86.6 |

[ | 1 3D acc | Standard deviation | Naive Bayesian | 8 | NA | 46.3–99.3 |

[ | 5 2D acc | Standard deviation | Naive Bayesian | 20 | 20 | 84 |

[ | 2 3D acc | Wavelet coefficients | ANN | 4 | 6 | 83–90 |

[ | 1 2D acc | RMS velocity | ANN | 6 | 10 | 95 |

[ | 1 2D acc | Standard deviation | ANN | 7 | NA | 42–96 |

[ | 1 3D acc | Wavelet coefficients | Threshold-based | 3 | 23 | p < 0.01 |

[ | 1 3D acc | Wavelet coefficients | Threshold-based | 3 | 20 | 98.8 |

[ | 1 2D acc 1 gyro | Wavelet coefficients | Threshold-based | 5 | 44 | > 90 |

[ | 1 3D acc | FFT | Threshold-based | 9 | 12 | 95.1 |

[ | 1 2D acc | Raw data | Threshold-based | 5 | 8 | 92.9–95.9 |

[ | 2 uni-axis acc (1D acc) | Median | Threshold-based | 4 | 5 | 89.3 |

[ | 4 1D acc | FFT | Template matching | 9 | 24 | 95.8 |

[ | 3 1D acc | DC component | Threshold-based | 6 | 10 | 80–97.5 |

[ | 5 1D acc | Angular signal | Binary decision | 23 | NA | 81–93 |

[ | 1 3D acc | Magnitude area/vector | Binary decision | 10 | 6 | 90.8 |

Activity primitives in the reduced dataset.

sitting | walking |

lying | stair climbing |

standing | running |

cycling |

Single-frame classifiers.

Naive Bayesian (NB) | Support vector machine (SVM) | Binary decision tree (C4.5) |

Gaussian Mixture Model (GMM) | Nearest mean (NM) | |

Logistic classifier | k-NN | |

Parzen classifier | ANN (multilayer perceptron) |

Example of TPM.

0.9500 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0100 | 0.0400 |

0.0001 | 0.8999 | 0.0000 | 0.0400 | 0.0000 | 0.0100 | 0.0500 |

0.0001 | 0.0000 | 0.6199 | 0.2500 | 0.0100 | 0.0200 | 0.1000 |

0.0001 | 0.0100 | 0.0300 | 0.7999 | 0.0200 | 0.0700 | 0.0700 |

0.0001 | 0.0100 | 0.0100 | 0.3500 | 0.3999 | 0.0100 | 0.2200 |

0.0200 | 0.0000 | 0.0100 | 0.0400 | 0.0000 | 0.8500 | 0.0900 |

0.0100 | 0.0300 | 0.0100 | 0.1800 | 0.0300 | 0.1200 | 0.6200 |

Single-frame classifier performance.

NB | 97.4 |

GMM | 92.2 |

Logistic | 94.0 |

Parzen | 92.7 |

SVM | 97.8 |

NM | 98.5 |

k-NN | 98.3 |

ANN | 96.1 |

C4.5 | 93.0 |

Sequential classifiers classification accuracy.

First-phase only | 95.6 |

First and second-phase combined | 98.4 |

Performance in the presence of spurious data (one spurious frame every three data frames).

Without rejection of spurious data | 73.3 |

With rejection of spurious data | 99.1 |