1. Introduction
Physical activity (PA) is fundamental to the functioning of the human body and is a strong predictor of healthy ageing and wellbeing. Low physical activity in the elderly population is strongly associated with fall-related injuries, age-related loss of muscle, mobility disorders, and loss of independence in daily life. A study conducted by the World Health Organization (WHO) in the 28 member states of the European Union (EU) proposed that promotion of physical activity and prevention of falls are among the five priority interventions to promote healthy ageing [1]. Statistics show that the yearly proportion of falls is 30% in the population over 65 and increases to 50% in the population above 80 [1]. Better knowledge about activities of daily life (ADL) is needed in order to design interventions that prevent inactivity and improve health and function during the ageing process.
Recent technological advances in inertial measurement unit (IMU) sensors have encouraged researchers to incorporate them into personal health systems, mainly because of their low cost, low power consumption, small size, wearability, and reliable data transfer capabilities. A typical IMU device is composed of a tri-axial accelerometer and gyroscope capable of measuring linear acceleration and angular velocity. An increasing number of physical activity classification (PAC) systems use these sensors to classify ADLs [2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]. The overall performance of the PAC systems presented in the literature can depend on many factors, illustrated in Figure 1.
- (i). Dataset: The nature of the datasets differs in terms of the population studied, how and where the ADLs are performed, and the types of ADLs included. The majority of existing PAC systems have used datasets collected in a laboratory setting or in a controlled environment with predefined sets of activities [13,14,17,18].
- (ii). Number of sensors: Varies from a single-sensor setup [3] to multi-sensor setups [2,4,5].
- (iii). Placement of sensors: Varies, covering different body locations in order to record upper and lower body movements. Common sensor placements are L5, hip, thigh, waist, foot, ankle, chest, and wrist [4,5,14,17,18,19].
- (iv). Feature set: Existing PAC systems use numerous time- and frequency-domain features, statistical features, and biomechanical features [8,20].
- (v). Window size: The window size and the overlap between consecutive windows used for feature computation vary, and they may affect the performance of machine learning algorithms and classifiers. Window sizes differ widely across the PAC systems proposed in the literature: 2 s [4], 2.5 s [11], 5 s [5], 5.12 s [3], 6.7 s [2], and 10 s [9]. The overlap used in most PAC systems is 50% of the window size [20].
- (vi). Classifier: In most PAC systems, a single classifier is used to differentiate between all the ADLs in the dataset. Common choices include a decision tree classifier [2], support vector machine (SVM), artificial neural network (ANN) [13], and K-nearest neighbors (KNN) [4]. However, some systems have integrated base-level classifiers, either by plurality voting [3] or by defining a hierarchical classification process that uses different classifiers for each subset of ADLs [6,10,15].
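As an illustration of the plurality-voting fusion mentioned in item (vi), the following minimal Python sketch fuses the per-window predictions of several base classifiers. The function names, the tie-breaking rule (the label that appears first in classifier order wins), and the example labels are assumptions made for illustration; they are not taken from the cited systems.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def plurality_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label chosen by the largest number of base classifiers.

    Assumption: ties are broken in favour of the label that appears
    first in `predictions` (i.e., in classifier order).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def fuse_windows(per_classifier: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fuse predictions window by window.

    `per_classifier[k][i]` is the label that classifier k assigns to window i.
    """
    return [plurality_vote(votes) for votes in zip(*per_classifier)]


# Hypothetical outputs of three base classifiers over four windows.
tree = ["sitting", "walking", "walking", "lying"]
svm = ["sitting", "standing", "walking", "lying"]
knn = ["standing", "walking", "walking", "sitting"]

fused = fuse_windows([tree, svm, knn])
print(fused)
```

With these example inputs, each window receives the label that at least two of the three classifiers agree on.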
Each of the choices discussed above is crucial in the development of a robust PAC system, since all of these factors contribute directly to overall performance. Because of the large diversity in the design process, existing PAC systems are not directly comparable, which hinders the development of new techniques informed by the strengths and gaps of these systems. Another issue is that most existing PAC systems used younger subjects for data collection [3,4,5,6,9,10,13,14,17,21,22], and few collected data on older subjects [11,23,24,25,26]. Furthermore, most PAC systems are developed in a controlled environment, which is quite different from real-life conditions [27]. A group of researchers [28] recently proposed a set of recommendations for standardizing validation procedures for PAC systems in older people; these emphasize the need to develop and validate systems using a semi-structured protocol where ADLs are performed in real-life conditions, in addition to validation in the laboratory setting.
In the past, some researchers [10,29,30] have tried to compare the performance of their proposed PAC systems with existing systems. However, in our opinion, these comparisons were not fair, since they did not account for the fact that the factors reported in Figure 1 were not comparable across systems. Therefore, the present study aims to propose a fair and unbiased benchmark for the field-based validation of existing state-of-the-art (SOA) systems for PAC of older subjects, highlighting the gap between laboratory and real-life conditions. The specific aims of this study are as follows:
- (1) To compare the performance of existing PAC systems on a common dataset of activities of older subjects in an unbiased way (i.e., with the same subjects, sensors, sampling frequency, window size, and cross-validation procedure), and to investigate the effect of varying the window size on each system's performance.
- (2) To validate the PAC systems in real-life scenarios and compare their performance with that in an in-lab setting, in order to check whether these systems are transferable to real-life settings.
- (3) To evaluate the impact of the number of sensors on performance in the analyses in (1) and (2) using a reductionist approach (i.e., analyzing only the sensing unit worn at the lower back instead of the multi-sensor setup). The lower back location is chosen since it is a very common placement that shows no major drawbacks for monitoring the activities of older subjects.
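The window-size analysis in aim (1) rests on segmenting each sensor signal into fixed-length, overlapping windows. A minimal sketch of that segmentation step follows; the function name, the 100 Hz sampling frequency, and the 60 s recording length are hypothetical values chosen for illustration, not the settings used in this study.

```python
from typing import List, Sequence


def sliding_windows(signal: Sequence[float], fs_hz: float,
                    window_s: float, overlap: float = 0.5) -> List[Sequence[float]]:
    """Split a 1-D signal into fixed-length windows with fractional overlap.

    Only complete windows are returned; trailing samples that do not
    fill a whole window are dropped (an assumption of this sketch).
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    size = int(round(window_s * fs_hz))                 # samples per window
    if size < 1:
        raise ValueError("window is shorter than one sample")
    step = max(1, int(round(size * (1.0 - overlap))))   # hop between window starts
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


FS_HZ = 100.0                           # assumed sampling frequency
recording = [0.0] * int(60 * FS_HZ)     # 60 s of dummy samples

for window_s in (2.0, 5.0, 10.0):
    windows = sliding_windows(recording, FS_HZ, window_s)
    print(f"{window_s} s window: {len(windows)} windows "
          f"of {len(windows[0])} samples each")
```

Under these assumed values and a 50% overlap, the same 60 s recording yields 59, 23, and 11 windows for window sizes of 2 s, 5 s, and 10 s, respectively, so longer windows leave fewer instances for training and testing a classifier.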
For these aims, we selected three representative SOA systems for PAC [2,9,10], motivated by the following reasons: (i) diversity in the number of sensors used, ranging from four sensing units by Leutheuser et al. [10] up to six sensing units by Cleland et al. [9]; (ii) use of different time intervals for windowing, ranging from 5 s [10] to 10 s [9]; and (iii) different classification techniques, i.e., a decision tree classifier by Bao et al. [2], an SVM by Cleland et al. [9], and hierarchical classification by Leutheuser et al. [10].
Four ADLs (sitting, standing, walking, and lying) are studied in this work in order to provide a fair comparison. These ADLs are chosen because they are the most common in this kind of study and because all four are present in all of the selected systems.
The rest of the article is structured as follows: Section 2 presents the methodology of the study and a description of the dataset used; Section 3 presents the results, with a comprehensive discussion of the findings and a comparative analysis of the three systems; Section 4 concludes the study.
4. Conclusions
A benchmark study is presented that investigates the performance of various SOA systems for PAC in in-lab and out-of-lab environments. The sensitivity analysis to window size shows that increasing the window size generally degrades performance. The in-lab training/out-of-lab testing analysis shows that systems developed in controlled settings do not perform well in real-life conditions, where ADLs are performed in a more natural way. Therefore, newly developed systems should be trained and tested on datasets collected in real-life conditions. The reductionist approach obtained similar results in all analyses (in-lab sensitivity analysis to window size, out-of-lab analysis, in-lab training/out-of-lab testing), but the degradation is much larger than with the multi-sensor setup. Furthermore, the computational complexity of the feature extraction stage and the classifier testing stage was investigated on out-of-lab data. The findings, as expected, show that more complex classification approaches and larger numbers of sensors increase the computational complexity of the system.
The number of analyzed subjects (16) is a limitation to be overcome in future studies by adding more subjects. However, the analyzed database is one of the largest available to date [31], especially considering that the activities were manually annotated at a very high rate (25 Hz, i.e., 25 annotations per second), a process that required significant resources. Another limitation of this study is that it only investigates basic ADLs, while real-life conditions contain many other activities.
The reductionist approach we developed, which is derived from existing systems, is an important first step toward studying the effect of reducing the number of sensors in order to find an optimal trade-off between usability and performance (the use of multiple sensors on various body locations can be impractical in real life).
Our future aim is to develop a physical activity classification system for real-life conditions with an optimal number of sensors (by exploring various sensor locations), an improved feature set (using various feature selection approaches), and robust classification methods that perform comparably to, or better than, existing systems.