1. Introduction
The rise in life expectancy and the profound changes in family structure have led to a remarkable increase in the number of older people who live on their own. In this respect, falls are one of the most challenging risks to the autonomy and quality of life of the elderly. The World Health Organization has reported [1] that roughly a third of people aged over 65 fall at least once every year (with a prevalence of 50% for those over 80). According to certain clinical studies [2], up to 48% of fallers are unable to get up on their own after a fall, and a third may remain on the floor for longer than one hour. Unfortunately, a long period lying on the floor without assistance is closely linked to comorbidities (such as pressure sores, dehydration, pneumonia, or hypothermia) that cause long-term hospital or care home admissions, while remarkably increasing (in up to 50% of fallers) the probability of dying within six months [
3,
4]. In addition, from an economic point of view, the yearly medical costs attributable to older adult falls (in the USA alone) have been estimated at $50.0 billion [5]. With that in mind, the deployment of cost-effective Fall Detection Systems (FDSs) has gained much attention in recent years as a pivotal element to support the autonomy of older people and the affordability of health systems.
The general objective of an FDS is to track the movement of potential fallers and to automatically transmit an alarm (text message, phone call, etc.) to a remote monitoring point as soon as a fall is suspected to have taken place. Thus, FDSs are a particular type of pattern recognition system for human movements, or HAR (Human Activity Recognition), intended to discriminate falls from ordinary movements or ADLs (Activities of Daily Living). FDSs are designed to achieve a compromise between the need to minimize the number of unnoticed falls and the need to avoid false alarms (namely, ADLs misidentified as falls).
Depending on the nature of the employed sensors and the application environment, FDSs can be categorized into two generic groups [
6,
7,
8]. The first group corresponds to Context-Aware Systems (CAS), which detect the occurrence of falls by analyzing the signals captured by a set of ambient and/or vision-based sensors, located in a preconfigured scenario around the patient (for example, a nursing home). The operation of CAS (or ambient-based) architectures (which normally entail high installation and maintenance costs) is restricted to a very particular zone and prone to errors caused by spurious elements [
9] (falling objects or pets, changes in the lighting or in the position of the furniture that create shadow or occlusion areas for the cameras, etc.). Furthermore, under this kind of supervision, the user may feel that his/her privacy is violated if the permanent use of video cameras or microphones is required.
In contrast, the other general typology of FDSs (wearable systems) can monitor the subjects in an almost ubiquitous and continuous way thanks to transportable sensors that are attached to clothing or embedded into a pendant or into any other personal device or accessory of the user. As long as the system is provided with a long-range connection (e.g., cellular telephony), the operability of a wearable FDS (and consequently the freedom of movement of the patient) is not constrained to a specific area. Moreover, the signals measured by the sensors unambiguously describe the mobility of the user to be supervised. Due to the increasing popularity and wide expansion of wearable devices among the general public, research on wearable FDSs has witnessed a boost during the last decade.
A wearable FDS typically consists of a set of inertial sensors and a central transportable node, which is in charge of classifying the user movements as a fall or as an ADL based on the measurements collected by the sensors. In most proposals, the sensors and the central node are integrated in the same device (usually a smartphone, which natively embeds an accelerometer and a gyroscope). Otherwise, the sensors are placed in independent external sensing motes which can be attached to certain parts of the body more easily and which wirelessly connect to the central node (e.g., via Bluetooth), creating a Body Sensor Network.
The central decision element in an FDS is the detection algorithm. Existing algorithms for FDS can be roughly grouped into two main categories [
10]: threshold-based and machine learning strategies. Threshold-based solutions detect a fall by comparing the signals collected by the inertial sensors (e.g., the acceleration magnitude) with one or several reference values that are expected to be surpassed whenever a fall occurs. However, due to the variety and complexity of the movements associated with falls, these deterministic “thresholding” policies are normally too rigid and produce poor results. In fact, the other type of detection strategy (based on machine learning and artificial intelligence techniques) generally outperforms thresholding methods [
11].
When the fall detection problem is approached with pattern recognition methods, the arbitrary selection of a decision threshold is avoided. Instead, the detection algorithms are parametrized to infer the unknown function that links a set of statistical features derived from the mobility measurements and the correct classification decision (i.e., fall or ADL) that corresponds to every movement. Under supervised learning schemes (which are typically employed in most fall detection architectures, although there are also examples of unsupervised and semi-supervised FDSs), this parameterization is achieved by means of a training procedure during which the algorithm is configured to map the inputs (computed from the collected measurements of the movement) and the desired output (the binary categorization of the movement) for a predetermined set of labeled samples. In this field, deep learning architectures can be characterized as a type of machine learning that utilizes multiple and successive processing layers, aiming at obtaining a representation of the data with different levels of abstraction [
12]. The most interesting advantage of deep learning techniques is that they can automatically and directly extract, from the raw data (in an FDS, the measurements directly collected by the sensors), the most adequate features to represent the input patterns that must be classified. Hence, in contrast with most machine learning algorithms, deep learning techniques do not require a previous feature extractor, which is often manually engineered and strongly dependent on the expertise of the designer.
Convolutional Neural Networks (CNNs) are one of the most promising and widespread deep learning methods. In order to discover the underlying structure and interrelation of the input data in big datasets, CNNs employ several layers with convolutional filters (or kernels) that sequentially operate to condense the input data into a series of feature maps.
CNNs were originally envisaged for image recognition and classification systems, but they have been profitably applied to other domains, including HAR (Human Activity Recognition) systems [
13,
14]. Therefore, CNNs can replace those conventional machine learning techniques that entail some sort of “handcrafted” selection of input features, which must be computed from the inertial information captured by the wearable sensors prior to being fed into the classifier.
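As a purely illustrative sketch (not part of the original proposal), the two basic operations of a CNN, a one-dimensional convolution followed by max pooling, can be reproduced in MATLAB on a synthetic acceleration signal; the kernel coefficients are hand-picked here, whereas a CNN learns them during training:

% Synthetic 1-D acceleration signal with an artificial impact-like peak
fs  = 200;                          % sampling rate (Hz)
t   = (0:1/fs:2)';                  % 2 s of samples
acc = ones(size(t));                % roughly 1 g at rest
acc(200:210) = 3;                   % impact-like peak

kernel = [-1; -1; 0; 1; 1];         % edge-like filter (learned in a real CNN)
featureMap = max(conv(acc, kernel, 'same'), 0);   % convolution + ReLU

% Max pooling with a window (and stride) of 3 samples condenses the feature map
nPool  = 3 * floor(numel(featureMap)/3);
pooled = max(reshape(featureMap(1:nPool), 3, []), [], 1);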
On another note, most wearable FDSs proposed in the literature exclusively make use of the accelerometry signals as the input data that characterize the movements of the users to be monitored. However, in many of these proposals, the detection architecture was provided with an Inertial Measurement Unit (IMU), which integrates not only an accelerometer but also a gyroscope (and, in many cases, a magnetometer as well). Consequently, since it is not particularly complex to access the information about the evolution of the angular velocity of the user’s body, it is of great relevance to assess whether the complementary use of the measurements delivered by the gyroscope can enhance the efficiency of the fall detection system.
In this paper, we employ CNNs (also considered in the FDS by He et al. in [
15]) to evaluate the advantages of introducing the data collected by the gyroscope into the input features of the classifier of a wearable FDS. As the benchmarking tool, we employ the SisFall repository [
16] (available in [
17]), one of the largest public datasets containing accelerometry and gyroscope signals captured during the execution of falls and ADLs.
This paper is organized as follows: after this introduction,
Section 2 reviews the state of the art.
Section 3 presents the employed dataset while
Section 4 discusses the selection of the input features for the neural classifier.
Section 5 describes the architecture of the CNN, which is tested and systematically evaluated in
Section 6 under different configurations. Finally,
Section 7 recapitulates the main conclusions of the paper.
2. Related Work
As mentioned above, most studies in the related literature about wearable FDSs exclusively consider the accelerometry signals to feed the fall detection algorithm. However, the idea of complementing the information of the accelerometer with that provided by the gyroscope is not new.
For example, the study by Casilari and Garcia-Lagos in [
18], which reviews in detail all those FDSs that are based on artificial neural networks, reveals that up to 11 out of the 59 reviewed proposals utilize the angular velocity captured by the gyroscope to obtain some of the input features of the neural architecture that classifies the movements. To extend that bibliographical analysis,
Table 1 summarizes those works that describe a system intended for fall detection in which the triaxial components of the angular velocity (or some parameters derived from them) are employed as inputs of the classifying algorithm. The table indicates in separate columns the typology of the algorithm implemented by each detector, the achieved performance metrics (expressed in terms of sensitivity and specificity, or failing that, the obtained accuracy) as well as the size of the experimental population (number of volunteers) with which the system was tested. Likewise, those proposed FDSs that make use of both artificial neural networks and the measurements of the gyroscope are recapitulated in
Table 2. For each FDS, the last column of this table itemizes the nature of the features utilized by the neural detector. Thus, that column specifies whether they are statistics derived from the sensor measurements (which implies a heuristic selection of features and the need for a certain preprocessing phase to generate the inputs) or whether the neural architecture is directly fed with the “raw” measurements, so that a deep neural classifier can autonomously identify the features that allow discriminating the occurrence of falls from the monitored movements of the user.
Apart from these studies, in other works, such as in [
19,
20,
21,
22,
23,
24,
25,
26], the data obtained from the three-axis accelerometer, gyroscope and magnetometer are combined to obtain information about the orientation and tilt of the body, which are, in turn, input into the classifier.
Gyroscopes have also been considered in hybrid FDSs that blend CAS and wearable approaches. In [
27], Nyan et al. describe an architecture that employs the signals of a gyroscope to monitor the sagittal plane of the body, while the user is simultaneously monitored by a video-camera. Following a similar approach, Kepski et al. present a fuzzy system for fall detection in [
28], grounded on the joint analysis of the signals captured by a Kinect device, an accelerometer and a gyroscope.
In spite of the small (but not negligible) number of papers that have addressed the use of the gyroscope in FDSs, the possible incremental benefits of introducing this element (or other sensors) in the detection process (compared to the case in which only the accelerometer is considered) have been discussed in only a few studies.
For example, Bianchi showed that an accelerometer-based detector can produce better results if it is complemented by a barometric pressure sensor [
80].
Authors in [
81] state that accelerometer-only based detectors can offer a high detection accuracy that could be improved if additional motion sensors (including gyroscopes) are employed. However, no systematic analysis of the advantages of using a gyroscope is provided. De Cillis et al. show in [
39] that the combined use of the information provided by the accelerometer and the gyroscope can reduce the number of false alarms. In particular, the detection algorithms integrate the measurements of the vertical component of the angular velocity to estimate the heading of a user equipped with a waist-worn mobile sensor and a smartphone. Nonetheless, the proposed threshold-based system is tested against only 30 samples (14 falls and 16 ADLs).
The study of the prototype presented by Astriani et al. in [
30] suggests that the combined use of accelerometry and gyroscope signals in basic threshold methods improves the accuracy of the detector, although the system was evaluated with a small set of falls (only 84). Similarly, according to the results presented by Hakim et al. in [
46], the accuracy of a smartphone-based fall detector (using different machine learning strategies) increases as the number of considered IMU sensors grows. However, only six experimental subjects (and a very low sampling rate of 10 Hz) were employed to generate the training samples of the detection methods. In addition, the process of feature extraction to feed the classifiers is not described.
The study by Dzeng et al. [
43] compares an accelerometer-based and a gyroscope-based algorithm intended for fall detection with smartphones. Results indicate that the accuracy of the algorithm that utilizes the gyroscope signals is more affected by work-related motions.
The study by Yang et al. in [
69] analyses the advantages of combining the detection decision of two separate threshold-based algorithms: one using the signals captured by the accelerometer and the other one the measurements from the gyroscope. As a fall is only presumed when both algorithms detect it, the specificity is obviously increased (although the cost in terms of sensitivity is not studied).
Chelli et al. showed in [
72] that the detection ratio of four different machine learning strategies can be enhanced if a wide set of features extracted from the signals of the accelerometer and gyroscope is considered. This improvement is achieved at the cost of requiring a high number of input features (328). In a real scenario, these sophisticated features would have to be computed in real time to continuously feed the classifier, which could pose a problem for the limited computing capacity of current wearables.
In this regard, there is a family of FDSs that try to benefit from a “multi-sensor fusion approach”, i.e., from the simultaneous use of several sensors. Gyroscopes and altimeters are assumed to be helpful to improve our understanding of the falling dynamics [
82].
In some cases, the gyroscope, or the information about the orientation, is used to confirm the horizontal position of the user after the fall. For example, this strategy is followed by the smartphone-only FDS presented by Viet et al. in [
83]. A similar policy is assumed by Rashidpour et al. in [
76]. These authors suggest that falls are better identified by the accelerometer than by the gyroscope. By separately employing both elements, the number of undetected falls can be reduced (but at the cost of decreasing the specificity).
Tsinganos et al. [
84] compare four different algorithms extracted from the literature to assess the effects of the fusion of data captured by diverse sensors. Nevertheless, the incremental improvement achieved by using the data from the gyroscope is not analyzed. In a multi-sensor scheme, mutual information analysis is utilized by Chernbumroong [
36] to evaluate the importance of different sensors for movement classification, concluding that the most relevant features are those derived from two particular (Z- and Y-) acceleration components.
Furthermore, other studies have highlighted the different costs linked to the use of the gyroscope. For example, it has been shown [
85,
86] that the operation of this sensor normally requires more energy than the accelerometer. Figueiredo et al. present a smartphone-based FDS in [
44], in which diverse features (calculated from the acceleration and angular velocity) are used to produce the detection decision. In their analysis, the authors also highlight that the energy and economic costs of the gyroscope are higher than those of an accelerometer. In contrast, Nguyen et al. report in [
87] an increase of less than 3% in the consumption of a sensing module (comprising a gyroscope and an accelerometer) when both sensors are activated (with respect to the case in which only the acceleration magnitudes are measured).
In the next sections, we propose to optimize the hyperparameters of a Convolutional Neural Network to detect falls from the raw signals captured by the accelerometer and the gyroscope. Once the final deep learning architecture is parameterized, in order to assess the benefits achieved by the use of the gyroscope, the detector is also trained and tested taking into consideration only the accelerometry signals.
3. Employed Dataset
The effectiveness of an FDS is evaluated by assessing the binary responses that it provides when it is fed with a set of known mobility patterns (ADL or fall). Due to the evident and intrinsic complexity of testing an FDS against the actual falls suffered by the target population (the elderly) in a real scenario, almost all authors in the related literature utilize datasets obtained from a group of volunteers who systematically emulate a set of predefined ADLs and mimicked falls. Thus, these “laboratory-created” falls are normally collected by monitoring the movements of young volunteers who simulate falling on a cushioned surface to avoid injuries. In this regard, we have to emphasize that the appropriateness of testing an FDS with ADLs and, in particular, falls mimicked by a set of young healthy experimental subjects on a cushioning element is a controversial issue still under discussion and out of the scope of this work (see, for example, the discussion presented by Kangas et al. in [
88]). In this respect, some researchers, such as Klenk et al. in [
89], found non-negligible discrepancies between the patterns of real-world falls and those mimicked or executed on a padded element. On the other hand, after analyzing the dynamics of actual falls experienced by elderly people, Jämsa et al. conclude in [90] that intentional and real-life falls may exhibit similar behavior.
Although many authors do not make the traces generated for their experiments publicly available, during the last few years some repositories (specifically designed for the analysis of wearable FDSs) have been published to be used as benchmarking tools by the research community.
Table 3 summarizes the main characteristics of most of these released datasets (see [
91] for further insight), which contain a list of numerical series with the measurements obtained from one or several sensing units transported by the experimental users during the execution of the movements. The series provided by these repositories are labelled to indicate whether the action corresponds to an ADL or a fall.
By analyzing the data provided by this table, we select the SisFall dataset [
16,
17] not only because it integrates the measurements of both a gyroscope and two accelerometers, but also because of the large size and wide age range of the experimental population (38 participants, 19 males and 19 females, aged between 19 and 75 years), the number and duration of the samples (4505 samples of between 10 and 180 s, comprising 2707 ADLs and 1798 falls) and the large diversity of emulated movements (19 classes of ADLs, including basic and sporting activities, and 15 classes of falls).
In this dataset, every experimental subject repeated each type of movement five times, except for the cases of jogging and walking, which were carried out just once but during a longer time interval (100 s).
The sensing node was deployed on a microcontroller-based platform, specifically designed for this purpose, which integrated an ITG3200 gyroscope, two accelerometers (models ADXL345 and MMA8451Q) and an SD card to store the traces. By using an elastic belt, the mote was fixed to the waist. This position is considered to offer an adequate location to characterize the dynamics of the trunk [
57], as it is normally close to the center of gravity of the human body and not particularly affected by the individual movements of the limbs. The employed sampling frequency was 200 Hz, which is far above the minimum rate (20–40 Hz) suggested for an adequate modelling of fall mobility [
103]. In our analysis, we consider the series captured by the MMA8451Q accelerometer, as it was programmed with a higher range (±16 g) than the ADXL345 model (±8 g). The range of the gyroscope is ±2000°/s.
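Although the original experiments were carried out with the authors' own scripts, a hedged MATLAB sketch of how a raw SisFall trace could be read and converted into physical units is shown below. The file name, the column layout and the range/resolution values are assumptions that should be checked against the documentation distributed with the dataset:

% Illustrative sketch (not the authors' code) for loading one SisFall trace
raw = readmatrix('D01_SA01_R01.txt');     % hypothetical name of one trace file

accCols = 7:9;      % columns of the selected accelerometer (assumed layout)
gyrCols = 4:6;      % columns of the ITG3200 gyroscope (assumed layout)

accRange = 16;  accBits = 14;             % configured range (g) and ADC resolution (assumed)
gyrRange = 2000; gyrBits = 16;            % configured range (deg/s) and resolution (assumed)

% Generic conversion: physical value = raw reading * (2*Range) / 2^Resolution
acc = raw(:, accCols) * (2*accRange) / 2^accBits;   % acceleration in g
gyr = raw(:, gyrCols) * (2*gyrRange) / 2^gyrBits;   % angular velocity in deg/s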
4. Discussion on the Input Features
The selection and design of a fall detection algorithm for an FDS cannot overlook the fact that most wearables present remarkable limitations in terms of storage capacity, power consumption and real-time computation. Consequently, a complex preprocessing of the data measured by the sensors, which could facilitate the task of the detector, may be unviable if it implies an intensive use of the hardware or a rapid depletion of the battery. Thus, we opt for using the raw measurements obtained from the sensors (the accelerometer and the gyroscope) as the input features of the classifier. Under this approach, in contrast to other proposals, the real-time calculation of a set of arbitrarily chosen statistics computed from the measurements (moments, autocorrelation coefficients, etc.) is not required. In particular, we propose to focus the analysis of the detector on those intervals (or observation windows) where a fall is most likely to have occurred.
Human falls are typically connected to one or several abrupt acceleration peaks originated when the body hits the ground [
104]. This acceleration peak (or peaks) is preceded by a period of “free fall” during which the acceleration module rapidly decays. In addition, both before and after the impact, the body orientation is normally strongly altered, which is reflected in abrupt alterations of the values of the three coordinates of the acceleration as well as in noteworthy changes of the angular velocity in the three axes measured by the gyroscope. Therefore, we take the peak of the acceleration module as the basis to define the observation window of the detector.
This acceleration module or Signal Magnitude Vector ($SMV_i$) for the $i$-th sample is computed as:

$$SMV_i = \sqrt{a_{x,i}^{2} + a_{y,i}^{2} + a_{z,i}^{2}}$$

where $a_{x,i}$, $a_{y,i}$ and $a_{z,i}$ are the $x$, $y$ and $z$ components of the vector measured by the triaxial accelerometer, respectively (one of them corresponding to the vertical direction, perpendicular to the floor plane, when the monitored subject is standing).
For each trace, the maximum (or peak) of the SMV is defined as:

$$SMV_{max} = SMV_{k_0} = \max_{i \in \{1,\dots,n\}} SMV_i$$

where $n$ is the number of samples of the movement captured in the trace, while $k_0$ indicates the instant (sample index) at which this maximum is found.
As the movements associated with a fall typically extend between 1 s and 3 s [105], we will consider an observation time window of up to 8 s around $SMV_{max}$ (4 s before and after the peak) to select the measurements that will feed (as input mobility patterns) the detection algorithm. This procedure of selecting a time window around the detected peak is followed in other studies such as [
98,
106,
107,
108].
To illustrate the dynamics of the two different types of activities that must be discriminated,
Figure 1 displays the measured acceleration components and SMV, as well as the components of the angular velocity, for a certain ADL (walking upstairs and downstairs quickly) and a fall (caused by a mimicked slip while walking). In turn,
Figure 2 zooms into the same series to show a 3-s observation window around the detected acceleration peak (which is indicated in the figures with a square marker).
In
Section 6, the impact of the selection of the observation window is investigated.
Once the window size is fixed and the maximum is found, the input sequence ($I$) that feeds the detector is formed by simply concatenating the series of triaxial measurements of the accelerometer and the gyroscope (forming a tuple of six elements) in that period of time:

$$I = \left\{ \left( a_{x,j},\, a_{y,j},\, a_{z,j},\, \omega_{x,j},\, \omega_{y,j},\, \omega_{z,j} \right) \;\middle|\; j \in \left[ k_0 - \tfrac{T_W}{2} f_s,\; k_0 + \tfrac{T_W}{2} f_s \right] \right\}$$

where $\omega_{x,j}$, $\omega_{y,j}$ and $\omega_{z,j}$ describe the three components of the angular velocity measured by the gyroscope for the $j$-th sample, $T_W$ is the duration of the observation window and $f_s$ is the sampling rate of the sensors (200 Hz).
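For illustration purposes, the following MATLAB sketch (with assumed variable names and placeholder data) computes the SMV of a trace, locates its peak and extracts the observation window that forms the input sequence I:

% Placeholder data: ax, ay, az and wx, wy, wz are column vectors with the
% triaxial acceleration and angular velocity of one trace.
ax = randn(2000,1); ay = randn(2000,1); az = 1 + randn(2000,1);
wx = randn(2000,1); wy = randn(2000,1); wz = randn(2000,1);

fs = 200;                                   % sampling rate of SisFall (Hz)
Tw = 3;                                     % observation window (s); tuned in Section 6
halfWin = round(Tw/2 * fs);

smv = sqrt(ax.^2 + ay.^2 + az.^2);          % Signal Magnitude Vector
[~, k0] = max(smv);                         % sample index of the acceleration peak

% Clamp the window to the limits of the trace
idx = max(1, k0-halfWin) : min(numel(smv), k0+halfWin-1);

% Concatenate the six raw channels within the window into a single input vector I
I = reshape([ax(idx) ay(idx) az(idx) wx(idx) wy(idx) wz(idx)]', 1, []);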
5. Architecture of the CNN
The general objective of a fall detector is to discriminate falls from ADLs based on the measurements collected by the mobility sensors. In the case of using a supervised machine learning technique, the classifier must be trained with pre-recorded samples to map a certain number of “input features” (computed from the sensor measurements) to the corresponding binary decision, expressed in terms of an output probability of 0 or 1.
As already mentioned, one of the main advantages of Convolutional Neural Networks is that they are able to autonomously learn the internal structure of the data to generate the output, avoiding the preprocessing and the heuristic choice and calculation of “intermediate” variables derived from the raw measurements, which are required by other artificial intelligence methods.
A CNN essentially consists of a series of sequential (and alternate) convolutional and pooling layers that extract the features to feed a final classification stage, which produces the final output decision.
Every convolutional layer applies to the input features a series of linear convolution filters followed by a non-linear activation function, typically the Rectified Linear Unit (ReLU) function, defined as $f(z) = \max(z, 0)$ [12]. The obtained results are then down-sampled by a pooling layer, commonly a max-pooling operator that moves across the input values to select only the maximum for every region of a predetermined size within the input data map.
A parameter called “stride” defines the step size utilized by the convolutional filters to slide across the data. Thus, depending on the selected value of the stride, the regions to which this processing is applied may overlap or not.
In any case, the combination of these two layers (convolutional and pooling) yields a certain abstract representation of the data, which is employed as the input features of the next layer. During the training process, the coefficients of the convolutional operators are jointly tuned to optimize the response of the final classifier, whose architecture (i.e., the neurons’ weights) is also adapted to produce the output that corresponds to the input training patterns.
Although CNNs were conceived for image processing, they can also be successfully utilized in the analysis of temporal (one-dimensional) sequences, as is the case with the inputs received by an FDS.
The use of small, adjustable convolutional filters allows CNNs to be responsive to a set of very particular characteristics of the input signals (e.g., a sudden acceleration peak or decay, or a short period in which the variation of the angular velocity in a particular direction is noteworthy), which are extracted and learned from the data without requiring any further intervention of the user. The pooling layers, in turn, enable the “translation invariance” of the key features (that is to say, they are detected independently of their position, i.e., the moment within the sequence or observation interval). Thus, the features learned by the preceding convolutional layer are filtered into a condensed map of features that summarizes the main characteristics of a certain region of the input data.
The learned features of the last convolutional/pooling layer are fed to the classifier, in our case, a fully-connected neural network layer. The linear weighted sum of the features produced by the neurons is passed through a softmax (or normalized exponential) function, which normalizes the values into a probability distribution that is used by a final stage to assign one of the two mutually exclusive output classes (fall or ADL).
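As a rough illustration of this layer stack (not a literal transcription of the configuration summarized in Table 4 and Table 5), a comparable architecture can be defined with MATLAB's deep learning layer functions. The number of filters per layer and the padding option are placeholders, while the three convolutional blocks, the 1 × 30 and 1 × 10 filters and the pooling window of three samples anticipate the values selected in Section 6:

Ni = 6 * 5 * 200;                                         % input features for a 5 s window at 200 Hz

layers = [
    imageInputLayer([1 Ni 1], 'Normalization', 'none')    % the 1 x width "image"
    convolution2dLayer([1 30], 32, 'Padding', 'same')     % 1st convolutional layer
    reluLayer
    maxPooling2dLayer([1 3], 'Stride', [1 3])             % max-pooling window of 3
    convolution2dLayer([1 30], 32, 'Padding', 'same')     % 2nd convolutional layer
    reluLayer
    maxPooling2dLayer([1 3], 'Stride', [1 3])
    convolution2dLayer([1 10], 32, 'Padding', 'same')     % last layer: smaller filter
    reluLayer
    maxPooling2dLayer([1 3], 'Stride', [1 3])
    fullyConnectedLayer(2)                                 % two classes: fall / ADL
    softmaxLayer
    classificationLayer];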
Training of the CNN
To prevent overfitting and, consequently, a loss in the capability of the model to generalize, a cross-validation method is conducted. For this purpose, the dataset is partitioned into three independent sets of samples: training, validation and test sets, with 60%, 20% and 20% of the samples, respectively. This data segmentation is executed randomly, after ensuring that all the sample groups have the same proportion of falls and ADLs.
The validation set is employed to assess the performance of the CNN after each training epoch (a single pass through the entire training set). If the mean error of the outputs obtained for the validation samples increases a certain number of times (the validation patience), the training phase is stopped, as the CNN is assumed to have started overfitting the training data. If no overfitting is detected, the training process finishes after a predetermined number of epochs (in our case, five epochs). The reported global effectiveness of the CNN is then computed by employing the test set.
To diminish the impact of the random division of the dataset, we perform a five-fold validation by executing five rounds of training, validation and testing with five different subsets of samples, randomly extracted from the dataset. Accordingly, the final results presented correspond to the average of the quality metrics obtained over the five rounds.
In addition, during training, the weight decay (or L2 Regularization) technique is applied to reduce the values of the weights as another method to prevent overfitting.
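A hedged sketch of this training procedure is shown below. The stratified 60/20/20 split, the five epochs, the L2 regularization and the validation patience follow the description above, whereas the placeholder data, the solver and the specific numeric values merely stand in for the actual settings listed in Table 4 and Table 5:

% Placeholder inputs: N windows of Ni raw samples each, with binary labels.
% In the real system, these windows are extracted from SisFall as in Section 4.
Ni = 6 * 3 * 200;  N = 1000;
X = rand(1, Ni, 1, N, 'single');                          % 1 x width x 1 x N "images"
Y = categorical(randi([0 1], N, 1), [0 1], {'ADL', 'Fall'});

% Stratified 60/20/20 split: shuffle each class separately and then merge.
idxTrain = []; idxVal = []; idxTest = [];
for c = categories(Y)'
    idx = find(Y == c{1});  idx = idx(randperm(numel(idx)));
    nTr = round(0.6*numel(idx));  nVa = round(0.2*numel(idx));
    idxTrain = [idxTrain; idx(1:nTr)];
    idxVal   = [idxVal;   idx(nTr+1:nTr+nVa)];
    idxTest  = [idxTest;  idx(nTr+nVa+1:end)];
end

% Compact single-block CNN for brevity (see the fuller stack sketched in Section 5).
layers = [imageInputLayer([1 Ni 1], 'Normalization', 'none')
          convolution2dLayer([1 30], 16)
          reluLayer
          maxPooling2dLayer([1 3], 'Stride', [1 3])
          fullyConnectedLayer(2)
          softmaxLayer
          classificationLayer];

options = trainingOptions('adam', ...                     % solver: an assumption
    'MaxEpochs', 5, ...                                   % five epochs, as in the text
    'MiniBatchSize', 64, ...                              % placeholder value
    'L2Regularization', 1e-4, ...                         % weight decay (placeholder)
    'ValidationData', {X(:,:,:,idxVal), Y(idxVal)}, ...
    'ValidationPatience', 4, ...                          % placeholder patience
    'Shuffle', 'every-epoch');

net = trainNetwork(X(:,:,:,idxTrain), Y(idxTrain), layers, options);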
Figure 3 shows an example of the evolution of the loss and accuracy of the training progress. The figure illustrates that the convergence is attained after very few epochs.
Table 4 and
Table 5 present the parameters (or hyper-parameters) that define the initial architecture and training process of the CNN. The impact of the choice of the hyper-parameters included in
Table 4 is investigated in the next section.
6. Performance Analysis
We utilized scripts based on the Matlab Neural Network Toolbox™ [108] to implement the CNN. As the employed functions are conceived for image processing, we define an equivalent “image” of (1 × width) “pixels” as the input features of the classifier. The parameter “width” defines the size of the vector with the measurements (triaxial components of the acceleration and angular velocity) captured during the observation interval around the peak. This size, or number of input features ($N_i$), straightforwardly depends on the duration of the observation window ($T_W$):

$$N_i = 6 \cdot T_W \cdot f_s$$
To appraise the performance of the detector, three typical quality metrics (commonly used for binary pattern classifiers) are considered:
(1) Sensitivity (or recall), which describes the ability to identify falls:

$$Sensitivity = \frac{TP}{TP + FN}$$

where TP (“True Positives”) and FN (“False Negatives”) respectively indicate the numbers of test movement samples containing falls that were correctly and wrongly identified.
(2) Specificity, which assesses the competence of the FDS to avoid ADLs being misidentified as falls (which would trigger false alarms):

$$Specificity = \frac{TN}{TN + FP}$$

where TN and FP denote the numbers of “True Negatives” (ADLs that are properly detected) and “False Positives” (ADLs misclassified as falls).
(3) Accuracy, which provides a global metric of the effectiveness of the system, defined as the ratio:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$
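Once the trained network has classified the test windows, these three metrics can be obtained directly from the confusion counts; a minimal MATLAB sketch (with placeholder label vectors) is:

% yTrue: test labels; yPred: CNN predictions (e.g., yPred = classify(net, XTest)).
% The four-element vectors below are placeholders for illustration.
yTrue = categorical({'Fall'; 'ADL'; 'Fall'; 'ADL'});
yPred = categorical({'Fall'; 'ADL'; 'ADL';  'ADL'});

TP = sum(yPred == 'Fall' & yTrue == 'Fall');   % falls correctly detected
FN = sum(yPred == 'ADL'  & yTrue == 'Fall');   % missed falls
TN = sum(yPred == 'ADL'  & yTrue == 'ADL');    % ADLs correctly recognized
FP = sum(yPred == 'Fall' & yTrue == 'ADL');    % false alarms

sensitivity = TP / (TP + FN);
specificity = TN / (TN + FP);
accuracy    = (TP + TN) / (TP + TN + FP + FN);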
The results obtained (for an observation time window $T_W$ of three seconds) with the test data and the initial (reference) architecture of the network are presented in
Table 6.
To refine the detection model, we modify the most relevant hyperparameters of the configured network as well as the size of the observation window. As this analysis progresses, we retain those combinations of parameters that yield the best performance metrics (marked in bold in the corresponding tables).
Firstly, we investigate the effect of the size of the Max Pooling Window (MPW) (i.e., the size of the region to which the max-pooling operator is applied). From the results tabulated in
Table 7, we can conclude that a smaller value of this window is preferable; thus, MPW is set to three for the next experiments.
The study of the impact of the considered duration of the observation window (
TW) is presented in
Table 8, which reveals that the best global efficiency of the FDS is achieved with an observation interval of five seconds (±2.5 s) around the peak.
The repercussion of the dimension of the convolutional filters is summarized in
Table 9, which shows that a larger size (1 × 30) improves the specificity. Accordingly, we set the dimension of the filters to 1 × 30 (except for the last convolutional layer, for which a filter of 1 × 10 samples is configured, since the number of inputs to this layer is much smaller).
Table 10, in turn, displays the influence of the number of convolutional layers. Results indicate that an architecture of only three layers (which reduces the complexity of the system) improves the sensitivity at the cost of a slight decay of the specificity. Thus, this three-layer topology is preferred.
Table 11 and
Table 12 investigate, in turn, the effects of the choice of the number of filters per layer and the use of the ReLU layer (which can be optionally employed in a CNN). As no improvement is achieved with respect to the reference case, these hyperparameters are not altered. On the contrary,
Table 13 shows that a better performance is obtained if a smaller size is selected for the mini-batches (the subset of the training samples that is utilized to compute the gradient of the loss function in order to update the weights).
Once the hyper-parameters and the observation window have been chosen to maximize the results, we evaluate the advantages of a multi-sensor approach. Thus, we compare in
Table 14 the obtained performance of the final architecture with that achieved when the CNN is trained to discriminate falls just based on the data captured by the accelerometer.
Surprisingly,
Table 14 shows that the best metrics are obtained in the case in which only the acceleration signals are considered. This seems to indicate that the gyroscope does not provide any supplementary information capable of improving the effectiveness of the detection decision of the CNN. Thus, the dimension and complexity of the system can be reduced by half (by considering the acceleration signals alone) without any performance loss.
These results are consistent with those recently presented by Boutella et al. in [
34]. These authors also analyze the benefits of using the combined information retrieved from multiple sensors when a k-NN classifier is considered. The classifier is fed with the covariance matrix of the signals of the sensors collected during a fixed interval. When the detection method is applied to two public datasets (Cogent and DLR, described in
Table 3), the obtained performance metrics (which do not exhibit a high accuracy, in particular for the Cogent dataset) also suggest that the joint utilization of gyroscopes and accelerometers may even deteriorate the effectiveness of the detector.
Nevertheless, our proposal yields better results (with values for the specificity and the sensitivity around 99%) than those achieved by other works in the literature that use the same SisFall repository as the benchmarking tool [
16,
103,
109,
110,
111]. Similarly, our system also outperforms the FDS based on Recurrent Neural Networks (RNNs) analyzed in [
112], which is tested with three different datasets.
In any case, we cannot forget that the most adequate validation of any FDS should be carried out in a realistic scenario with real participants. Although there is a non-negligible number of commercial devices intended for fall detection (normally marketed in the form of a pendant or a watch), vendors do not usually report either the detection method that the detector implements or the procedure that was followed to test its effectiveness when it is worn by a real user (in some cases, the prototypes are just tested with mannequins). In fact, in some clinical analyses, such as that provided by Lipsitz et al. in [
113], the authors showed that the fall detection ratio of this type of off-the-shelf device is still far from satisfactory. In that study, after monitoring nursing home residents over six months, only 17 of the 89 actual falls recorded by the nursing staff were detected by the system, while 111 false alarms (87% of the alerting messages) were generated.
In this regard, aspects such as usability, ergonomics and comfort are key factors to guarantee the social acceptance of fall detectors among older adults (the main target public of these monitoring applications). Unfortunately, these subjective characteristics are not considered in most articles dealing with new prototypes for FDSs. However, these topics are attracting increasing interest in the research on fall prevention and detection in the area of geriatrics (see, for example, [
114,
115,
116] for a systematic review on these subjects). In a recent study by Thillo et al. [
117], community-dwelling older people were asked to carry a prototype of a wearable FDS for nine days. After several types of qualitative analysis based on discussions with the users, the authors conclude that usability is not only determined by technical requirements but is also strongly influenced by the habits and personal preferences of the final users. Thus, an adequate design of an FDS can only be achieved by involving the target users. It has also been shown that the performance of an FDS can improve notably if its parameters are “personalized” (tuned or configured) after analyzing the particular characteristics of the mobility of the patient to be tracked [
118]. In this sense, it is not clear whether this configuration phase of the detection method may cause discomfort. If it does, users could be reluctant to accept detectors that oblige them to participate in a configuration procedure during which a set of mobility samples must be generated to train the system.
7. Conclusions
Falls among the elderly constitute a serious medical and social concern. In this context, wearable Fall Detection Systems have been proposed as an economical and non-intrusive alternative to context-aware techniques, aimed at deploying alerting systems capable of transmitting an alarm to a remote monitoring point whenever a fall is suspected.
This work has investigated the capability of a deep learning strategy (a Convolutional Neural Network) to discriminate falls from other ordinary movements (ADLs) based on the measurements of a gyroscope and an accelerometer. CNNs avoid the complex pre-processing pipeline and the manual engineering of input features required by other artificial intelligence techniques. Thus, the deep learning architecture is able to directly extract, from the raw data captured by the sensors, those features that maximize the capability of the classifier to discriminate the movements.
By using a large and well-known public repository containing traces of falls and ADLs, the CNN is carefully tailored and hyper-parameterized to optimize the classification performance when it is fed with signals from both sensors. However, even in that case, the performed experiments show that the configured CNN obtains better results when the gyroscope measurements are ignored and the accelerometry signals are the only parameters used as input features to train and test the convolutional neural classifier.
In spite of the fact that gyroscopes have been employed in several studies on wearable FDSs, the particular benefits of combining both sensors have not usually been specifically analyzed. Our results imply that a deep learning classifier can properly characterize the mobility of the user by focusing solely on the raw signals provided by the accelerometer, without needing the extra costs (energy, hardware complexity, processing) of employing the information of the gyroscope. In addition, by considering the acceleration components exclusively, the dimension of the input features of the detection algorithm can be reduced, which eases the ability of the detector to operate in real time (a key factor when the fall detection system is deployed on a wearable with limited computing resources). In any case, future studies should confirm this conclusion by using other public datasets.