The monitoring of daily activity patterns may support people seeking to enhance their personal fitness, promote patient engagement in the management of chronic diseases, or enable providers and patients to gain insights into the progression and impact of illnesses [1]. A great deal of research has focused on the concept of frailty, which has been shown to affect older people's lives and health radically [2], increasing the risk of falls, disability, hospitalization, loss of autonomy and mortality. Because of its complexity, multi-faceted nature and unclear pathophysiology, frailty is difficult to define, identify early and prevent. Information and communications technologies [3,4,5,6,7] try to address this unmet need, many of them by monitoring the physical behavior of older people with the aim of providing solutions for active and healthy aging.
Most of the technological solutions previously reported in the literature for the recognition of activities of daily living (ADL) use a variety of wearable and non-wearable sensors. In this paper we present an activity classification scheme for detecting movement patterns of older people and focus on the model reconfigurations necessary to resolve challenges imposed by inconsistent measurements due to the type and misplacement of the sensors. Two approaches are presented. The first is based on standard machine learning techniques coupled with the pre- and postprocessing steps needed to tackle data inconsistencies [8]; it is referred to as the augmented standard approach in this paper. This approach was incorporated into the online analysis module of the FrailSafe project [9], and was therefore constrained to be computationally efficient. The second approach exploits deep network architectures, specifically convolutional neural networks (CNNs), and aims to provide an alternative solution for unified multi-scale feature extraction and classification.
1.1. Related Work
One of the earliest and most-cited studies on sensor-based ADL classification used accelerometers at multiple locations on the body of young individuals and performed decision tree classification [10]. The results showed that for some activities inter-subject analysis was difficult, requiring the training of subject-specific models. A signature extraction methodology was proposed in [11] using a smartphone's accelerometer placed at the subjects' pelvis and implementing a threshold-based or a principal component analysis (PCA)-based classification algorithm. In [12] the possibility of improving ADL classification accuracy by applying feature ranking and selection is explored. Activity recognition was performed in [13] using a hidden Markov model on recordings from sensors placed in the house and on the body, whereas in [14] the aim was to identify activities related to a high risk of falls in older people based on a wearable passive radio frequency identification sensor; the analysis was based on data from healthy adult volunteers.
Although many frameworks for activity monitoring of older people have been reported in the literature, most of them have been tested on data from young and healthy participants [11,12,14], or the experiments were performed under laboratory conditions; e.g., in [13] a scaled model of a house was used along with a simulated sequence of activities. Those works report high classification accuracy, but their results are not directly comparable with those of uncontrolled monitoring systems in real home environments. In contrast to the aforementioned studies, our approach is tested on data unobtrusively recorded from wearable sensors designed to monitor the physiological signals of older people during their everyday activities.
Comparable works that use wearable sensors on older people for activity recognition aim at promoting active and healthy ageing through the development of assisted living systems or the prediction of adverse events [15,16,17,18]. In [15] the authors reported the use of a smart watch enclosing three different kinds of sensors, namely an accelerometer, temperature sensors and an altimeter. After calibrating the raw signals and performing feature selection, neural network and support vector machine (SVM) classifiers were used to classify the activities of the elderly. In [16] an inertial measurement unit located near the sternum and a thigh-worn sensor were used to detect the posture of the elderly with a rule-based algorithm. Notable work on activity classification for the elderly is reported in [17], where instrumented shoes able to record movement are introduced for the purpose of discriminating postural transitions, locomotion and walking activities using decision trees. A review of the potential benefits of ADL monitoring in the care of elderly people is presented by Gokalp and Clarke [18].
Deep learning architectures, such as convolutional neural networks (CNNs), have also been exploited for ADL recognition [19,20,21,22]. CNNs show great potential because they can tackle the two main elements of ADL recognition. The first is localization, i.e., the ability to capture only the part of the signal stream that is relevant to the pattern of interest. Continuously monitored physiological signals consist mostly of non-specific or indistinct activities and transitional states. These irrelevant parts, denoted as the null activity, dominate over the few distinct classes, introducing a major challenge for classification. The second element is the rich variation in how a given activity can be performed; an activity usually decomposes into several movement patterns, arranged sequentially with smooth transitions. CNNs can extract discriminative features hierarchically, from lower to higher scales. This allows the basic movement patterns to be identified (in the lower layers of the architecture) along with combinations of several movements (in the higher layers), thereby capturing the multi-scale salient patterns characterizing the different activities across individuals.
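This hierarchical growth of scale can be illustrated with a minimal sketch in plain NumPy (not the architecture used in this work; the kernels and window length are placeholders): stacking two one-dimensional convolutional layers enlarges the effective receptive field, so deeper layers respond to combinations of the movement primitives detected by the first layer.

```python
import numpy as np

def conv1d(x, kernel):
    """One CNN-style layer: valid-mode 1D convolution followed by ReLU."""
    y = np.convolve(x, kernel, mode="valid")
    return np.maximum(y, 0.0)  # ReLU non-linearity

# Synthetic accelerometer window (values are illustrative only)
rng = np.random.default_rng(0)
window = rng.standard_normal(128)

# Layer 1: a short kernel responds to basic movement primitives
h1 = conv1d(window, np.array([1.0, -1.0, 1.0]))
# Layer 2: operating on layer-1 features, the effective receptive
# field over the raw signal grows, capturing combined movements
h2 = conv1d(h1, np.array([0.5, 0.5, 0.5, 0.5, 0.5]))

print(len(window), len(h1), len(h2))  # the feature maps shrink with depth
```

Each successive layer therefore sees an ever-larger slice of the raw signal, which is what allows the higher layers to model composite activities rather than isolated primitives.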
The work most related to ours is the CNN proposed by Yang et al. [19]. It was evaluated on the hand gesture recognition problem [23] using recordings from body-worn sensors and on the OPPORTUNITY activity recognition dataset [24,25], which includes activities in the kitchen monitored by a variety of body-worn, object-based, and ambient sensors. The authors in [26] experimented with deep CNNs for human activity recognition using data from smartphone sensors. They report increasingly better performance with the addition of new layers, while complexity is successfully decreased, making the approach an effective solution for smartphone-based activity recognition systems. In another work [20], a shallow CNN with a weight-sharing technique was applied to accelerometer signals. It showed improved classification over previous techniques, such as PCA based on empirical cumulative distribution estimation and k-nearest neighbor classification; the evaluation was based on recordings collected with a cell phone in a controlled laboratory environment [27]. Moreover, Jiang and Yin [22], aiming to exploit 2D CNNs, stacked the raw signals row by row such that every signal sequence becomes adjacent to every other sequence, creating a 2D image. They then computed the magnitude of the 2D discrete Fourier transform of this image and used it as input to a deep 2D CNN. A deep belief network (DBN) [28] was investigated for ADL classification in [29]; however, these networks are fully connected and therefore do not capture local dependencies in the signals. A different approach was reported in [30], where a transfer learning methodology was proposed for activity recognition without the use of new labeled data. The approach is based on measuring the correlation between observations acquired from an old sensor, for which trained models are available, and observations from a newer sensor. This multi-view learning technique achieved high accuracy and is applicable in both batch and online modes.
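The row-stacking and Fourier preprocessing described by Jiang and Yin can be sketched as follows (a minimal NumPy illustration; the channel count, window length and row permutation are placeholders, not the exact configuration of their paper):

```python
import numpy as np

# Assume three synchronized sensor channels, e.g. a tri-axial accelerometer
rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 64))

# Stack the raw sequences row by row, repeating channels so that each
# sequence becomes adjacent to the others, forming a 2D "activity image"
image = np.vstack([x, y, z, x, z, y])          # shape (6, 64)

# The magnitude of the 2D discrete Fourier transform of this image is
# then used as input to a deep 2D CNN
spectrum = np.abs(np.fft.fft2(image))           # same shape, non-negative

print(image.shape, spectrum.shape)
```

The point of the construction is that inter-channel relationships become spatial neighborhoods, so an off-the-shelf 2D convolutional architecture can exploit them.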
Furthermore, Ordóñez and Roggen [21] used convolutional and long short-term memory (LSTM) recurrent units to model the temporal dependencies in multimodal activity recognition, showing improved performance over baseline CNNs. An interesting approach to handling data heterogeneity in the context of human activity recognition is reported in [31], where a combination of convolutional and LSTM layers forming a deep network serves as a fine-tuning process to efficiently learn new settings of an existing system. More details on the effectiveness of different network architectures can be found in [32], where deep, convolutional, and recurrent approaches are compared using recordings of physiological activity captured by wearable sensors.
1.2. Challenges in Activity Recognition Using Wearable Sensors
Moving from a laboratory environment to real-life experiments, researchers face numerous obstacles, mostly concerning the devices used to monitor older people. A first challenge arises when the device is not placed in the standard orientation, causing a rotation of the axes. For sensors such as the accelerometer, orientation plays a significant role in recognizing the subject's activity; thus, a misplaced wearable device can easily mislead the data analysis. It should be noted that the sensor displacement problem has been studied before; e.g., in [33] the authors present a benchmark dataset for examining the performance of activity recognition algorithms under sensor displacement. Nonetheless, both the problem of inconsistent sensor placement and the lack of data from uncontrolled environments persist and still need to be addressed.
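As a minimal illustration of the axis-rotation problem and one simple partial remedy (a sketch with synthetic data, not the pipeline of this work): rotating the sensor mixes the individual axis signals, but the per-sample acceleration magnitude is invariant to any rotation of the axes, at the cost of discarding directional information.

```python
import numpy as np

# Simulated tri-axial accelerometer window (gravity on z, arbitrary units)
rng = np.random.default_rng(2)
acc = rng.standard_normal((100, 3)) * 0.1
acc[:, 2] += 9.81

# A misplaced device corresponds to an unknown rotation of the axes...
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
acc_rotated = acc @ R.T

# ...which changes every axis signal, yet leaves the per-sample
# magnitude untouched, so magnitude-based features survive misplacement
mag = np.linalg.norm(acc, axis=1)
mag_rotated = np.linalg.norm(acc_rotated, axis=1)
print(np.allclose(mag, mag_rotated))  # True
```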
Another issue occurs when a different type of sensor is used across individuals (e.g., accelerometers with different technical characteristics), or when the sensors are placed at different locations on the body. This can happen when data from different clinical centers are combined or, for example, after updating hardware and software components; it causes a lack of uniformity in the dataset and adds barriers to inference and modeling. When activity annotations are used for model construction (e.g., in supervised learning settings), additional challenges arise from inter- and intra-rater variability, the rater's subjectivity and the interactive nature of the annotation process, especially when frail individuals are instructed and monitored. A relevant work aiming to integrate new sensors into an existing activity recognition system is reported in [34], where a semi-supervised graph method was used to incorporate the information provided by the new sensor and improve the system's performance. The list of challenges could be much longer, but we focus on the aforementioned issues because they are likely to occur in uncontrolled sensor systems, such as the one used in this work. Retrospective solutions for addressing these issues during data analysis are proposed and discussed in the subsequent section.
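One common and simple retrospective remedy for differing sensor characteristics, shown here only as an illustrative sketch rather than the method of this work, is per-recording z-score standardization: it maps signals recorded with different gains and offsets onto a comparable scale, although it does not address sampling-rate or placement differences.

```python
import numpy as np

def standardize(signal):
    """Z-score a recording: remove its mean and divide by its standard
    deviation, cancelling per-device gain and offset differences."""
    return (signal - signal.mean()) / signal.std()

rng = np.random.default_rng(3)
base = rng.standard_normal(200)
# Two hypothetical sensor models recording the same movement with
# different gain and offset
sensor_a = 2.0 * base + 1.0
sensor_b = 0.5 * base - 3.0

a, b = standardize(sensor_a), standardize(sensor_b)
print(np.allclose(a, b))  # True: the scale difference is removed
```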