Review

Fall Detection Approaches for Monitoring Elderly HealthCare Using Kinect Technology: A Survey

1 Nanomedicine, Imagery, and Therapeutics Laboratory, University of Franche-Comté, Cedex, 25030 Besançon, France
2 DISC Department, FEMTO-ST Institute, University of Franche-Comté, 90000 Belfort, France
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(18), 10352; https://doi.org/10.3390/app131810352
Submission received: 13 August 2023 / Revised: 5 September 2023 / Accepted: 11 September 2023 / Published: 15 September 2023

Abstract:
The severity of falls increases with age and reduced mobility. Falls are a frequent source of domestic accidents and accidental death among fragile people. They produce anatomical injuries, reduce quality of life, cause dramatic psychological effects, and impose heavy financial burdens. A growing elderly population leads to a direct increase in health service costs, and indirectly to a deterioration of social life in the long term. Unsurprisingly, socioeconomic costs have triggered new scientific health research to detect falls in older people. One of the most appropriate solutions for monitoring the elderly and automatically detecting falls is computer vision. The Kinect camera plays a vital role in recognizing and detecting activities while ensuring seniors’ comfort, safety, and privacy preferences in the fall detection system. This research surveys several Kinect-based works in the literature that cover the approaches used in fall detection. In addition, we discuss the public fall benchmark datasets based on Kinect technology. In general, the main objective of this survey is to provide a complete description of the modules making up the fall detectors and thereby guide researchers in developing fall detection approaches based on Kinect.

1. Introduction

The world is facing demographic changes on an unprecedented scale. According to the World Health Organization (WHO), the number of people over 60 years old is expected to increase from 605 million to 2 billion, rising from 11% to 22% of the world’s total population [1]. The elderly are the fastest-growing segment of the world population [2]. One estimate in Europe suggests that between 2004 and 2050 the elderly population will increase by 58 million [3]. In the United States, the population aged 65 and older is anticipated to reach 83.7 million by 2050, almost double its population of 43.1 million in 2012 [4]. In China, the elderly population is expected to reach 487 million, or nearly 35% of China’s total population [5]. Accordingly, the number of frail and dependent older adults will increase.
Aging is characterized by a gradual deterioration of physical and mental skills. Among the risks inherent in the elderly, falling remains the most critical and complex [6]. Several factors increase the risk of falls in the elderly: (a) the state of a person’s health (e.g., deterioration of the biological and physical state); (b) the environment (e.g., low light, poor accommodation); and (c) behaviors (e.g., alcohol abuse, forgetting to use aid instruments). Nevertheless, the age and fragility of the person remain the main risk factors for falls.
Falls are the most important source of domestic accidents and the leading cause of death from unintentional injuries for the elderly. According to one study, 28–35% of people aged 65 and over fall each year, and this percentage increases to 32–42% for those aged 70 and over [7].
Falls impact seniors’ physical and mental health, and have economic repercussions on both the family and society. A fallen older person is left in a vulnerable situation, mainly because falls often result in fractures. The psychological impact results in loss of confidence, fear of falling, and occasionally withdrawal into oneself. The cost of falls for the elderly in healthcare spending is higher than that of epilepsy, and is on the same order as the direct costs of mental illnesses like depression, schizophrenia, and dementia [8]. Every year in Europe, fall care costs are estimated at 25 billion Euros [9] and more than USD 50 billion in America [10].
Faced with this major social, economic, and ethical issue for our society, many researchers have invented technical solutions to detect falls automatically. These solutions are based on sensors that can identify a fall and report it by sending alerts in needed cases. The critical step in fall detection is the measurement of the human activity parameters with precision and reliability. Various types of intelligent sensors allow for the acquisition of necessary data. These sensors are classified into wearable sensors (accelerometers [11], gyroscopes [12], magnetometers [13], and RFID [14]) and external sensors (piezoelectric [15], microphone [16], passive infrared [17], single-camera [18], and several cameras [19]). Each category has advantages and disadvantages related to constraints such as cost, operating zone, privacy, the comfort of older adults, and the system’s accuracy (Figure 1). The actuation of wearable sensors depends on the subject and body position. Thus, older persons are advised to wear the sensors permanently, which can be particularly inconvenient at night. Most of these sensors are generally refused by the elderly due to their high false alarm rate and the difficulties involved in wearing them during activities of daily living (ADLs) [20]. Consequently, the camera is the best option for detecting falls in the home or in other environments such as hospitals and accommodation establishments for dependent older adults, thanks to its versatility [21].
Over the last ten years, developments in the field of microelectromechanical systems (MEMS) have profoundly transformed the interfaces of human–machine communication. Since 2010, the Microsoft Kinect has offered a new way to communicate with a machine through a single device [22]. Researchers have used Kinect technology to develop applications in various sectors such as physical rehabilitation, medical operation, healthcare monitoring, 3D reconstruction, augmented reality, video games, robotics, and object recognition [23]. Older people are in favor of Kinect technology. They find that using this technology can give them a greater sense of security while respecting their private life and allowing them to maintain their lifestyles.
Furthermore, Kinect-based fall detection systems effectively distinguish human falls from other activities of daily living. In this paper, we are primarily interested in healthcare monitoring for the elderly. For several reasons, we focus mainly on fall detection applications using Kinect:
  • Kinect is a multi-sensor device, combining depth, RGB, infrared, and voice data, which supports precise fall detection.
  • Kinect’s Time Of Flight (TOF) principle makes it insensitive to lighting issues that reduce system reliability [24].
  • Kinect generates 3D data with a single camera, whereas a stereoscopic system is needed to attain the same objective in the RGB domain.
  • The acceptable price of Kinect compared to other motion-tracking cameras encourages researchers to invest in the field of computer vision for health applications.
  • Kinect fully respects the privacy of the person, and provides skeleton and depth data independently of RGB data.
  • The release of the new Kinect Azure motivated us to write a survey on the different fall detection approaches that use Kinect V1 or V2.
This survey provides guidelines to researchers interested in developing a healthcare monitoring system for fall detection for the elderly using Kinect technology. The remaining part of this document is organized as follows. Section 2 is devoted to the presentation of related work and the highlighting of our contributions. In Section 3, we describe the principal features of Microsoft Kinect. In Section 4, we discuss the different modules in the life cycle of the Kinect-based fall detection approach. Section 5 presents the various metrics used to measure performance. Section 6 describes public Kinect datasets for fall detection. Finally, our remarks are discussed in Section 7 before concluding and advising on future works in Section 8.

2. Related Work

Fall detection has already been reviewed by several researchers. Mohamed et al. [21] conducted a survey of fall detection systems in the context of elderly care, covering algorithms based on wearable, audio, and video technologies. They deduced that the video-based system is the best due to its versatility. However, they did not explore fall detection systems based on Kinect technology. Zhu et al. [25] surveyed a variety of fall detection algorithms for monitoring the health of the elderly. First, they analyzed fall detection based on the acceleration sensor. After a comparison, they concluded that using a single accelerometer is limited, as the false-positive rate increases.
Moreover, the video-based fall detection system is highly accurate [21]. Lun et al. [26] reviewed the application of the Kinect technology in three application domains of healthcare: physical therapy and rehabilitation, clinical environments, and fall detection. For the fall detection subsection, they outlined ten studies based on the Kinect V1 relative to their specific application, i.e., fall detection or fall prevention. In addition, they identified the main drawbacks of using Kinect in healthcare (occlusion, self-occlusion, and unconventional body postures). Zhang et al. [27] surveyed only vision-based methods for fall detection. They divided the vision-based fall detectors into three classifications according to the camera (single RGB, multiple RGB, and depth). Bet et al. [28] reviewed three types of applications related to falls (fall detection, fallers classification, and fall risk) using wearable sensors. Among the selected papers, fall risk screening was the most thoroughly considered application. The authors identified essential perspectives based on sensor type, sample rate, signal processing, and sample size.
This article presents an exhaustive view of the component modules of a fall detection approach based on Kinect. It presents guidelines for further research in the area of automatic fall detection. The following contributions make this survey different from previous surveys or reviews related to fall detection:
  • This article serves as a valuable reference for the development of elderly fall detection applications. We have compiled a comprehensive overview of both older and recent articles spanning from 2011 to 2022. Additionally, we provide in-depth insights into the various components of fall detection approaches, including data acquisition, preprocessing, feature extraction, actuation algorithms, and alert systems. Furthermore, we offer an extensive examination of performance metrics and highlight all publicly available Kinect-based datasets, making this resource indispensable for researchers and practitioners in the field.
  • We cover Kinect applications in a critical area of our society, namely, healthcare fall detection.
  • We do not focus on commercial fall detection products.
  • We present guidelines for further research.

3. The Microsoft Kinect

For presentation consistency, we introduce the Kinect camera. We start with its success story, then give an overview of the features of its three generations, and finally describe its software tools.

3.1. Success of Kinect Technology

The Microsoft Kinect (formerly known as the Natal project) made its first public appearance during the Electronic Entertainment Expo in June 2009 [29]. Initially, this optional accessory was intended for the XBox 360 video game console. In the digital leisure context, it allowed direct user interaction with a video game’s content without any controller, with the player simply being the controller. The Kinect can be controlled by either motion recognition or voice commands.
More than 8 million copies were sold within 60 days of its release in November 2010. The Kinect holds the Guinness World Record as the fastest-selling consumer electronics device [30]. In February 2012, Microsoft unveiled the official use of this depth sensor for personal computers. After the great success of version 1, Microsoft continued its development project and released the Kinect V2 in 2014. This giant company then resuscitated the Kinect after announcing a production halt in 2015, presenting a new connected camera, the Microsoft Azure Kinect, at the 2019 MWC in Barcelona [31].
Thanks to the innovations across the different versions of Kinect, scientific researchers have shown its potential as a platform for both science and industry to create innovative new applications. Kinect has opened enormous opportunities for the scientific community to tackle societal issues such as health, education, and aging.

3.2. Technical Characteristics

The Kinect is composed of different sensors that allow visual and sound data to be collected, consolidated, and analyzed; in other words, it offers restitution of what it films and hears within its operating range. It can capture three types of images: RGB, depth, and infrared. We collected the following technical information from Microsoft’s official documentation [32] and books [33,34,35]. Table 1 presents the technical characteristics of each model.
The Kinect V1 includes an RGB camera, a 3D depth sensor (infrared camera and CMOS detector), a three-axis accelerometer, a four-microphone array, and a tilt motor. The Israeli company PrimeSense developed its 3D vision technology. The data are continuously interpreted from the projected structured infrared light pattern, which is used to create a depth map giving the distance between the lens and the scene for each pixel. The RGB camera allows the video stream to be recorded at 640 × 480 and 30 fps. The depth sensor generates depth data in 320 × 240 format at 30 fps. The microphones allow for the spatial location of a sound source and the elimination of background noise; they collect sound data at 16 kHz. The motorized base makes it possible to tilt the field of view up and down by 27°. A USB 2.0 port and USB power adapter are required to communicate with a PC. Microsoft’s minimum recommendations are a 32-bit (x86) or 64-bit (x64) dual-core 2.66 GHz processor, 2 GB RAM, a DirectX 9.0c-compatible graphics card, and a Windows 7 operating system or newer.
The Kinect V2 contains an RGB camera with a high definition of 1920 × 1080 and a wide capture angle. It is possible to collect data at 30 fps under high light conditions or 15 fps under low light conditions. The 3D information is obtained by the depth sensor and the infrared emitter. Combining these two technologies makes it possible to obtain accurate data in 3D space even under dark lighting conditions. Version 2 uses the TOF method to generate a complete 3D image of the measured object. This principle makes it possible to measure the distance of a scene from the time required for light to reach the object and return. The depth sensor operates at a resolution of 512 × 424 at 30 fps, and the IR sensor can be used at the same time as the color camera.
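The distance measurement behind the TOF principle can be summarized by a standard relation (our illustration, not a formula from the Kinect documentation): the sensor emits an infrared pulse and measures the round-trip time $\Delta t$ of the reflected light, so that, with $c$ the speed of light,

d = \frac{c \cdot \Delta t}{2}

For example, a round-trip time of about 13.3 ns corresponds to a distance of roughly 2 m.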
The Kinect V2 also includes a high-quality four-microphone linear progressive array that allows useful sound information to be captured in noisy conditions at 48 kHz. The orientation of these sensors allows for the detection of the sound’s origin area over a 180° range. In addition, version 2 includes a three-axis accelerometer that communicates with a PC via a USB 3.0 controller. Microsoft provides the following minimum recommendations regarding the PC performance requirements to run version 2: a 64-bit 3.1 GHz dual-core processor, 4 GB RAM, and a DirectX 11-compatible graphics adapter. This version works on a PC that has at least a Windows 8 or Embedded 8 operating system.
Kinect’s innovation lies in the evolution of sensor performance in each version along with its ability to miniaturize. Microsoft Azure Kinect is characterized by the fusion of the cloud, artificial intelligence, and the motion detection technology of the Kinect. It can be deployed with or without cloud services. This latest technology is half the size of version 2. It contains a 12 MP RGB camera, a 1 MP TOF depth camera, a circular array of seven microphones, and an inertial motion unit consisting of a three-axis accelerometer and a three-axis gyroscope. The RGB camera captures images with a high resolution of 3840 × 2160 at 30 fps, while the depth sensor captures information in space at different resolutions of 640 × 576 and 512 × 512 at 30 fps and 1024 × 1024 at 15 fps. The inertial measurement unit (IMU) allows sensor orientation and spatial tracking. It only works if the color and/or depth sensors are running. The device also allows the external synchronization of sensor streams from several Kinects. Microsoft provides the following minimum recommendations: a 64-bit 2.4 GHz dual-core processor, 4 GB RAM, an HD620 graphics processor or higher, and a USB 3.0 port. Windows 10 and Ubuntu 18.04 LTS support this version.
Each new generation of Kinect is more powerful and is presented in a new style. The sensors’ innovation, shape evolution, and motion recognition algorithms allow for more reliable data on the filmed objects. This versatile technology is far from being outdated: the new functionalities of each generation, especially those of the Azure Kinect, which relies on artificial intelligence, allow for communication with the cloud as well as the automatic synchronization of several Azure Kinects.

3.3. Kinect Software Tools

Microsoft initially planned to use the Kinect only for video games. However, Adafruit launched a hacking contest to make the Kinect functional on Windows, with a bonus increased to up to USD 3000 [36]. Hacker groups successfully exploited Kinect data on the PC, providing free libraries and unofficial software development kits (SDKs) such as the CL NUI platform, Libfreenect, OpenNI, and PCL. To act against hacker initiatives and open-source libraries, Microsoft launched the first official SDK in July 2011 [37]. The official SDK allows users to develop applications that support gesture and voice recognition. It provides direct access to Kinect data. This SDK includes drivers for deploying the Kinect on a computer running the required operating system. Thus, application programming interfaces (APIs), device interfaces, and sample code are available in the development kit delivered by Microsoft. Indeed, research laboratories were the first users of Kinect outside consoles. Most of the previous libraries offer basic functionalities such as camera calibration, automatic body calibration, skeleton tracking, face tracking, and scanning. Each library has advantages and disadvantages. OpenNI and the Microsoft SDK are the most used libraries in the development community [38]. OpenNI’s most compelling benefit is its flexibility across multiple operating platforms. Moreover, it is the most beneficial choice for obtaining colored point clouds. However, the official Kinect SDK is more stable in terms of the accuracy of the original image and the preprocessing technology. Moreover, it is more convenient to use when skeletal tracking and audio processing are required.
Each version of Kinect can detect and interpret the movements of a specific number of individuals in the scene (see Figure 2 and Table 2). It exploits a person’s skeleton by generating 3D data (position and orientation) for particular joints [39]. The depth data are used with a random forest classifier to establish the skeleton in each acquired frame [40]; the random forest categorizes each pixel into the background or part of the body, then the pixels allocated to each body part are clustered to create the skeletal model. The ease of obtaining the skeleton has made the Kinect one of the most suitable technologies for fall detection.
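To make this per-pixel classification idea concrete, the following is a minimal, hypothetical Python sketch (our illustration, not Microsoft's implementation): a pre-trained random forest labels each depth pixel as background or a body part, and the pixels assigned to each part are then reduced to a single joint proposal, here simply by averaging their image coordinates and depth.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical body-part labels; a real pipeline distinguishes many more parts.
BACKGROUND, HEAD, TORSO, LEFT_HAND, RIGHT_HAND = range(5)

def estimate_joints(depth_map, pixel_features, forest: RandomForestClassifier):
    """Label every depth pixel with a body part, then collapse each part
    into one joint proposal (mean pixel position and mean depth)."""
    h, w = depth_map.shape
    labels = forest.predict(pixel_features.reshape(h * w, -1)).reshape(h, w)

    joints = {}
    for part in (HEAD, TORSO, LEFT_HAND, RIGHT_HAND):
        ys, xs = np.where(labels == part)
        if len(xs) == 0:                      # part occluded or out of view
            continue
        zs = depth_map[ys, xs]
        joints[part] = np.array([xs.mean(), ys.mean(), zs.mean()])
    return joints
```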

4. Kinect-Based Fall Detection Approaches

A fall detection method does not make it possible to identify all falls and ADLs; rather it has the objective of maximizing support for home care, retirement homes, and hospitals in satisfactory safety conditions. Therefore, each approach has its own particularities. A fall detection system can be described as a closed-loop system consisting of five main steps: (1) data acquisition, (2) preprocessing, (3) feature extraction, (4) the actuation algorithm, and (5) interpretation in control. Figure 3 explores the key elements of each module, which are precisely described by different steps. The details are summarized in Table 3.

4.1. Data Acquisition

Data acquisition is the most critical step for all fall detection methods. In the case of a vision-based system, it consists of continuously tracking the person in the scene. First, the camera acquires the predefined data flows, then the processing unit analyzes them. Kinect allows for the proper surveillance of older adults in controlled environments. This technology, with its different smart sensors, facilitates the differentiation of humans from other objects in the scene. The three generations of Kinect can record and hear in real time during both day and night. They provide four types of vision streams, namely, color, depth, infrared, and skeleton, as well as sound signals. The skeleton class was available in version 1, and has been replaced by the body class in version 2. Depth and skeleton data ideally respond to ethical issues related to privacy, as they prevent any identification of the people being monitored. Fall detection techniques based on RGB or depth streams do not monitor the parameters of the subjects’ joints; instead, they are based on image processing techniques applied to the frames around the region of interest. Other models use the 3D positions of the body’s joints estimated by the Kinect to monitor changes in posture before, during, and after a fall.
In most approaches, volunteers perform scenarios of falls and activities of daily life to make the datasets. They operate in experimental configurations that guarantee their safety. Evaluation of approaches in offline mode is carried out with data already prepared. The integration of data collected from older adults is a challenge for researchers. Indeed, we are convinced that real data contribute to reinforcing the realism of fall detection techniques. Researchers adapt the data acquisition conditions, scenarios, and characteristics necessary to ensure that their algorithms perform successfully.

4.2. Preprocessing

The data acquisition module in turn sends the data to the preprocessing unit. This step ensures that the algorithms make the correct decisions. Firstly, we consider this stage crucial to refining and preparing the parameters. Different filters and techniques are used to clean up the signals and accentuate certain information. We found that filtering was widely used in depth data. We will not go into detail about each technique as we aim to identify the broad outlines applied in this step. Indeed, preprocessing refers to all the tasks that generate information as independent as possible of the defects specific to the Kinect and the experimental configurations.
(1) In image preprocessing, different algorithms often detect the target person appearing and reappearing in Kinect’s field of vision. They extract the silhouette in a video sequence and follow different body parts. The segmentation processes used are grouped into adaptive and non-adaptive methods for detecting the background and foreground of the image. Filters are primarily designed to reduce noise while preserving the definition and structure of the data. The objective is to obtain a set of connected pixels with a shape that can be easily analyzed. The morphological methods (erosion, dilation, opening, and closing) are applied after the binarization of images. They modify the structure and shape of regions of interest in frames. These nonlinear operators are applied to remove or fill pixels as needed. The normalization process is used to map the input data into a more uniform pixel intensity range. Researchers have applied a change from the Cartesian image coordinate system to the polar coordinate system [42]. The main advantage of this change is that it solves the nonlinear deformation of the pattern. The median, averaging, particle, and Kalman filters are deployed to homogenize and smooth areas. The median filter retains proper contours compared to the others. Framing objects by encompassing geometric shapes is used in particular works. Authors have relied on the coordinates within a predefined coordinate system to delimit the external surface of the objects of interest in the scene and identify their locations. To measure specific geometrical features relative to the ground, researchers have used techniques such as the v-disparity, Hough transformation, and least squares, either in addition to or as an alternative to the parameters of the floor plane equation delivered by the SDK.
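As an illustration of these image preprocessing steps, the snippet below is a minimal sketch using OpenCV (the choice of library and parameters is ours, not that of any specific surveyed paper): adaptive background subtraction, median filtering, and morphological opening and closing produce a cleaned binary silhouette.

```python
import cv2
import numpy as np

# Adaptive background model (Gaussian mixture); a non-adaptive method would
# instead subtract a fixed reference frame.
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

def extract_silhouette(frame: np.ndarray) -> np.ndarray:
    """Return a binary silhouette mask for one frame."""
    mask = bg_subtractor.apply(frame)                        # foreground/background split
    mask = cv2.medianBlur(mask, 5)                           # smooth while keeping contours
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small noise blobs
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    return mask
```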
(2) The large quantity and diversity of the dataset require a splitting step. Researchers have divided the data into at least two parts according to predefined objectives. The training data is the largest part, being used to train the algorithms and to calculate parameters in the case of thresholding algorithms. The test data are used to verify the performance of the deployed approaches, and must be made up of new data not used when designing and adjusting the model. Usually, the best performance is obtained with a ratio of 70–80% for training and 20–30% for testing [43,44].
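As a simple example of such a split, the sketch below uses scikit-learn's train_test_split with the commonly reported 80/20 ratio; the features and labels variables are assumed to hold one descriptor vector and one fall/ADL label per recorded sequence.

```python
from sklearn.model_selection import train_test_split

# features: one row of descriptors per sequence; labels: 1 = fall, 0 = ADL
X_train, X_test, y_train, y_test = train_test_split(
    features, labels,
    test_size=0.2,       # 80% training / 20% testing, within the 70-80/20-30 range
    stratify=labels,     # keep the fall/ADL proportion identical in both parts
    random_state=42,
)
```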

4.3. Feature Extraction

Measuring human activity leads to collecting a large amount of essential data. Working with the smallest number of values that characterize the relevant aspects of human movement is recommended in order to achieve the best possible performance for the fall detection system. These values are called characteristics, features, descriptors, or judging criteria. Scientific researchers always select relevant descriptors. Characteristics are established during the formulation of objectives, with the primary goals of diminishing error rates, mitigating the risk of model overfitting, and regulating the response time. This is particularly critical in detecting falls, as falls are typically marked by biomechanical shifts resulting in a loss of dynamic balance in the elderly [45].
A cycle of movements is repeated during a fall before the body reaches the ground. In particular works, the fall has been considered as a spatiotemporal study. For this, researchers have defined a study reference system, such as a spatiotemporal coordinate system, allowing the complete characterization of the fall. They analyze events over time, characterized by trajectory, speed, and acceleration. Using a single parameter type does not lead to a reliable system that distinguishes the different types of actions. For example, based on speed variation alone, a fall detection approach incorrectly classifies a fast stretch to the ground; in other words, the approach may detect a daily activity as a fall. Fall detection can be reduced to a simple study of the distance between precise joints and the ground. Among the drawbacks of tracking the evolution of joint positions, we cite the interruption of relevant data, especially in the case of occlusion, which implies an increase in the rate of false alarms. In addition, during the fall cycle the body is sometimes represented by its center of mass. The center of mass is estimated using the geometric center of the silhouette or an articulation such as the hip center. The fall aspect is sometimes explored through kinematic parameters such as the angle of inclination, which makes it possible to analyze the body’s direction and flexion relative to the ground or its initial posture. In other cases, mathematical calculations are applied to the pixels of a frame to derive local visual characteristics. Among these characteristics are the histogram of oriented gradients (HOG), motion history image (MHI), curvature scale space (CSS), wavelet moment, and depth range difference. These features facilitate the interpretation of a person’s movement in the field of vision. On the other hand, their main drawback is the processing time of the image data. The fusion of kinematic and visual data with variations in displacement or velocity over time has shown its effectiveness in giving a global description of all the issues involved, and time can be used as an additional characteristic to reinforce the decision.
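To illustrate the kinematic descriptors discussed above, the following is a minimal sketch of our own (not taken from a specific surveyed paper) that derives the centroid height, vertical velocity, and trunk inclination angle from two consecutive skeleton frames; joint positions are assumed to be NumPy arrays expressed in metres in a floor-referenced frame with the y-axis pointing up.

```python
import numpy as np

def fall_features(joints_prev, joints_curr, dt):
    """joints_*: dict mapping joint names to 3D positions (x, y, z) as NumPy arrays.
    Returns [centroid height, vertical velocity, trunk inclination in degrees]."""
    # Center of mass approximated by the hip-center joint (a common simplification).
    h_prev = joints_prev["hip_center"][1]
    h_curr = joints_curr["hip_center"][1]

    centroid_height = h_curr                      # distance of the centroid to the floor
    vertical_velocity = (h_curr - h_prev) / dt    # negative values indicate descent

    # Trunk inclination: angle between the hip-to-head vector and the vertical axis.
    trunk = joints_curr["head"] - joints_curr["hip_center"]
    vertical = np.array([0.0, 1.0, 0.0])
    cos_angle = np.dot(trunk, vertical) / (np.linalg.norm(trunk) + 1e-9)
    inclination_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

    return np.array([centroid_height, vertical_velocity, inclination_deg])
```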

4.4. Actuation Algorithm

After modeling the specific characteristics from large sets of learned data, the judgment unit intervenes in the method for arbitrating events. Threshold-based approaches are established on a value judgment, which is generally a comparison of the acquired data with predefined fixed values [46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66], an adaptive threshold [67,68,69,70,71,72], or both [73,74]. Researchers have based these thresholds either on logical hypotheses very close to real life or on experimental trials. Setting a threshold is a major challenge, and it is typically necessary to find the most generalized thresholds possible. These values try to correctly detect the events despite the physical variations of the monitored subjects (size, sex, and age) and the experimental configurations (scenarios, target–Kinect distance, Kinect positioning). Machine learning is more compatible with variations in the operational constraints of fall detection. The generalization capabilities of ML models distinguish approaches based on this type of algorithm, often achieving remarkable results. These algorithms are characterized by their ability to receive large amounts of data and learn from it. However, real-time implementation and execution in embedded systems remains a scientific challenge. Machine learning techniques commonly used for fall detection include SVM [42,43,63,64,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89], KNN [90], K-Means [91], BoW–VPSO–ELM [92], BoW–naïve Bayes [93], RBF [83], Fuzzy logic [94,95,96], MLP [83,84], NCA [43], decision tree [97,98], C4.5 [43], random forest [43,99], Fisher vector [85], LMNN [100], Bayesian framework [101], hidden Markov model [102], NBC [86], and sparse representation [65]. Deep learning, or machine learning with neural networks, has also been used to detect falls in the elderly. Deep learning models are attracting much attention from scientists, and can reach a previously unachievable level of performance, sometimes higher than human performance. The algorithms used include DNN MyNet 1D-D [103], DNN [83], ANN [43,86], ResNet [104], BPNN [105], and LSTM [44,106].
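As a concrete illustration of the simplest family of actuation algorithms, the sketch below combines the features from the previous example in a fixed-threshold rule; the numeric thresholds are placeholders chosen for illustration rather than values reported in the surveyed papers.

```python
def is_fall(centroid_height, vertical_velocity, inclination_deg,
            height_thr=0.4, velocity_thr=-1.0, angle_thr=60.0):
    """Fixed-threshold decision: flag a fall when the centroid drops close to the
    floor, the downward velocity is high, and the trunk is strongly inclined."""
    return (centroid_height < height_thr and
            vertical_velocity < velocity_thr and
            inclination_deg > angle_thr)
```

Machine learning and deep learning approaches replace such a hand-tuned rule with a classifier (e.g., an SVM or LSTM) trained on the same kind of feature vectors.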

4.5. Interpretation in Control

Unfortunately, falls can lead to severe and even fatal consequences. These issues are compounded by a long time spent lying on the ground after a fall. This occurs when an older adult is unable to stand up alone and spends more than an hour on the ground before someone comes to help [107,108]. One study has shown that staying immobilized on the ground for more than an hour brings a 50% risk of death within six months [109]. More than 50% of those who fall cannot get up from the floor by themselves [110,111], making effective and rapid automatic alerting necessary in each approach to anticipate the consequences of prolonged immobilization on the ground. A long lie time can lead to complications such as muscle weakness, pressure area development, pneumonia, dehydration, missed medication, and hypothermia [112]. Thus, automatic notification via SMS, e-mail, or call to emergency services, family members, or neighbors must occur immediately after the accident. In the literature, we find several alert strategies that researchers have conditioned on the period of inactivity of the fallen person [68]. Several durations have been implemented: 0.7 s [78], 5 s [73], 10 s [59,105], and 60 s [54,75,91]. Voice exchange has been used via the Kinect’s voice recognition system [56]. This solution without a timer is not recommended, especially if the person loses consciousness. In certain works, the authors have indicated the alert system’s presence in their methods without details on the triggering process [44,50,57,58,62,74,83,84,90,94,95,96,97]. Sometimes, an older person who has fallen lives with the anxiety of falling again. This behavior can impoverish social life and increase isolation. The integration of neighbors, family members, and volunteers in the intervention after the fall minimizes the psychological and social consequences of a fall [55].
This functionality appears in the form of audio, message [76], or visual exchange [55] of information between the target and the caregivers (family, neighbors, professionals). This task can be referred to as feedback; one of its objectives is to reduce the error rate of the approach.
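The following sketch illustrates an inactivity-conditioned alert of the kind described above; the 60 s delay, the is_person_moving probe, and the notify helper are illustrative assumptions rather than elements of any specific surveyed system.

```python
import time

INACTIVITY_DELAY_S = 60   # one of the delays reported in the literature

def monitor_after_fall(is_person_moving, notify):
    """After a detected fall, wait for the inactivity period before alerting.
    is_person_moving: callable returning True if the person has moved or stood up.
    notify: callable sending the SMS/e-mail/call alert to caregivers."""
    fall_time = time.time()
    while time.time() - fall_time < INACTIVITY_DELAY_S:
        if is_person_moving():
            return False      # person recovered, no alert (reduces false alarms)
        time.sleep(1.0)
    notify("Fall detected: no movement for %d s" % INACTIVITY_DELAY_S)
    return True
```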
Table 3. Summary table of Kinect-based fall detection approaches; (*) means that the feature is relative to the ground.
Paper | Data | Preprocessing | Features | Algorithm | Threshold Values | Performance
[89]- Skeletal- Apply the average
filter of 4 previous samples
(1) Location of the center of
mass with respect to the
support polygon
(2) Height of center of mass
(3) Dispersion of joint heights
(4) change of dispersion of joint heights
- SVM- - Acc 98.5%
[90]- Skeletal-(1) Standard deviation of the head joint
y-axis trajectory
- KNN-- Acc 95%
[88]- RGB- Segment the foreground using
the adaptive background
mixture model
- Extract the key frame
(1) Optical flow vectors
(2) Orientation of the elliptical pattern
- SVM with
linear Kernel
-- Acc 97.14%
- Pr 93.75%
- Se 100%
- Sp 95%
[66] - RGB- Extract the human shape
- Create the groups of pictures (GOP)
- Select Keyframes from GOP
(1) Motion quantification from the
optical flow
- Fixed thresholding(1) α >0.95
(1) β ≤ 0.05
- Se 99%
- Sp 98.57%
- Pr 98.57%
- Acc 99%
[65]- Depth
- 3-axis acceleration
- Compute the threshold value
from the training data
- Extract the depth difference
map each 15 frames
- Generate the depth difference
gradient along the 4 perpendicular
directions
(1) Sum vector magnitude
(2) Number of entropy of
depth difference
(3) Gradient map
- Fixed thresholding
- Sparse representation
-- Mean Acc 97.86%
- Mean Pr 95.05%
- Mean Se 100%
- Mean Sp 96.37%
[99]- Depth
- 3-axis acceleration
-(1) Acceleration along x, y and z
(2) Magnitude of the acceleration vector
(3) Differential of acceleration along x, y and z
(4) Height, width, and depth of the user
(5) Area
(6) ratio of height to width
(7) ratio of height to depth
(8) differential of the height, width and depth
- Random forest - - Acc 90%
[75]- Depth- Noise reduction(1) HOG features- SVM with linear
kernel
-- Acc 98.1%
- Se 97.6%
- Sp 100%
[43]- Skeletal- Skeleton normalization(1) Common 12 features from NCA
(2) Original distances from Kinect
(3) Normalized distance from Kinect
(4) Velocities and distance from Kinect
- C4.5
- RF
- ANN
- SVM
- NCA
-- Mean Acc 99.03%
[55]- Skeletal-(1) y-coordinate
(2) Angle of the upper body
(head, neck, shoulder right, shoulder
left and spine shoulder)
- Fixed thresholding(2) 30° - Se 70%
- Sp 75%
[76]- Skeletal- Manual labeling of dataset
- Outlier skeleton filter using
one-class SVM
(1) EDM of 17 normalized joints by
the height of the monitored human
(2) Interframe speed
- SVM with Gaussian
RBF kernel + CUSUM
-- Acc 91.7%
[103] - Depth- Foreground segmentation
- Labeling RLC and object labeling
- Thinning with Zhang-Suen's rapid thinning method
- Searching
(1) 2D skeleton information of calculated
joints (Head, ShoulderLeft, ElbowLeft,
HandLeft, ShoulderRight, ElbowRight,
and HandRight)
- DNN MyNet1D-D-- Acc 99.2%
- Pr 99.1%
- Recall rate 98.9%
[59] - Depth- Get the 3D point cloud images
- Generate the 3D human point cloud
- Foreground map by segmentation
- Estimate the height of the human
point cloud
(1) Speed, and height of the user’s
point cloud
(2) Inactivity time
- Fixed thresholding(2) 10 s- Acc 99%
[105]- Skeletal- Calibration
- Coordinate transformation from 2D
data to 3D
- Normalization with the
Euclidean norm
(1) 3D position of the 20 joints - BPNN-- Acc 98.5%
[104]- Skeletal-(1) Body centroid height *
(2) Left hip angle
(3) Right hip angle
(4) Left knee angle
(5) Right knee angle
(6) Left feet height *
(7) Right feet height *
- ResNet modified- - Acc 98.51%
- Pr 99.02%
- Recall 98.06%
- F1 score 98.54%
[44] - Depth- Resized the depth input frames to
224 × 224 × 3
- Normalize the depth values
- Apply a color map to produce a
RGB frames
(1) Visual features using ResNet - LSTM- - Acc 98%
- Pr 97%
- Se 100%
- Sp 97%
[100]- Depth - Extract the 3D human upper body
and the head through random
forest classifier
- Mean-shift clustering
(1) Human head height *
(2) Upper body center *
(3) Inactivity
(4) Human-bed position
- LMNN(2) 0.3 m
(3) >3 s
- Acc 92.3%
- Error 7.7%
- Pr 97.8%
- Se 86.6%
- Sp 98.1%
[91]- Skeletal- FFT image encryption(1) Height and width of the
circumscribed rectangle of
the skeleton image
- K-means--
[63] - Depth- Median filter
- Background elimination
- Morphological operations
- Speed constraint
(1) Inclination angle
(2) Aspect ratio of the bounding rectangle
(3) Average distance of ellipse’s center *
- Fixed thresholding
- SVM
- - Mean Se 98.52%
- Mean Sp 97.35%
[84]- Skeletal- Min-max normalization
- Euclidean distance
- K-mean clustering to detect the
limits of the transition phase
(1) Duration of the transition phase
(2) Average distance of torso joint position
during the transition phase
(3) Derived ratio of the average distance
over transition phase half time
- MLP
- SVM with RBF
kernel
-- Mean Acc 99.5%
- Mean Pr 99.71%
- Mean recall 99.95%
[86]- Depth- Background elimination
- Denoising
- Remove the discontinuities
in the silhouette
through median filter and a
closing operator
- Compute the 3D coordinates of
the silhouette’s center
(1) More than 30 kinematic features
related to the position, velocity and
acceleration of the monitored person
(2) Mel-cepstrum related features
- SVM
- ANN
- NBC
- - Mean Se 96.25%
98.85%.
[69]- Depth- Filtration
- Background subtraction
- Binarization
- Connected component analysis
(1) Height of the region of interest - Adaptive
thresholding
- (Target’s height)
/3
- Mean Acc 86.65%
- Mean Pr 100%
- Mean Recall 65.5%
[74]- Depth- Background subtraction(1) Distance of person’s central point
from sensor *
(2) Recovery time
(3) Wind time
(4) Sit time
(5) Shift time
- Fixed and Adaptive
Thresholding
(1) KinectHeight-
0.6 m
(2) 2 s
(3) 3 s
(4) 3 s
(5) 1 s
- Mean Acc 95.9%
[106]- Skeletal- Determine the state of equilibrium
by estimating the support of base
and line of gravity
- Compute the dynamic position
of COM of 5
kinematic chains (trunk, left arm,
right arm, left leg and right leg)
(1) Accelerated velocity of COM of
5 body segments
(2) 3D skeleton data
- LSTM -- Acc 97.41%
[64]- Skeletal
- 3-axis
acceleration
- Median filter(1) Signal magnitude area
(2) Signal magnitude vector
(3) Tilt angle between y axis
and vertical direction
(4) Head speed along z axis
and x-y combined axis
- Fixed thresholding
- SVM with RBF
- - Acc 100%
[71] - Depth- Estimate floor plane
- Generate skeletal data
- Identification of fall factors
(step symmetry, trunk sway
and arm spread)
(1) Velocity and height * of subject- Adaptive thresholding - - Acc 88.57%
- Pr 80.56%
- Se 96.67%
- Sp 82.5%
[78]- Depth- Foreground segmentation
- Filtering/erosion/hole filling/
blob analysis
- Random decision features to
recognize the
posture (standing, sitting, lying)
(1) Change in lying posture
confidence levels
- SVM -- Acc 96%
- Pr 91%
- Se 100%
- Sp 93%
[70] - Skeletal- Generate the collection of Skeleton
stream using the 20 joints
- Build the adaptive directional
bounding box to virtually rotate
the camera view in order for
the fall to be in front view
(1) Aspect ratio of DHW(DW to DH)
(2) Aspect ratio of center of gravity (COG)
(3) Aspect ratio of diagonal line (DGNL)
(4) Aspect ratio of bounding box Height
(BB-Height)
- Adaptive thresholding(1) 1.38–3.20
(2) 0.27
(3) 0.85
(4) 0.81
- Acc 98.15%
- Se 97.75%
- Sp 98.25%
[53]- Depth- GMM background
subtraction method
- Apply morphological operations
erosion and dilation
- Approximate head to ellipse
- Particle filter
- Cross product of 3 points
to estimate the floor plane detection
(1) Head distance *
(2) Covariance of the center
of mass movement
- Fixed thresholding-- Mean Acc 85.97%
- Mean Se 81.46%
- Mean Sp 87.35%
[62]- Depth- Get the 3D position of hip center
and shoulder center joints
- Form the torso line using the
shoulder center and hip center joints
- Use the hip center joint and a point
on the y-axis to draw the gravity vector
- Considered the hip center joint like
as the person’s centroid
- When the Kinect cannot estimate
the ground parameter, they detected
the foot joint
- Start the human torso motion model
processing when the torso angle
exceeds 13°
(1) Max changing rate of the angle torso
which formed between the torso line
vector and the gravity vector
(2) Max changing rate of the
centroid height *
(3) Tracking time
- Fixed thresholding(1) 12°/100 ms
(2) 1.21 m/s
(3) 1300 ms
- Acc 97.5%
- TPR 98%
- TNR 97%
[80]- Depth- Person detected segmentation
- Extract the data of all the detected joints
(1) Height and velocity change of the person
(2) Position of the subject during and after
a movement
- SVM-- Average Acc 97.39%
- Sp 96.61%
- Se 100%
[83]- Depth- Transform the coordinate system from
camera coordinate to the
global coordinate
- Foreground detection Gaussian
distributed background
- Background subtraction method
(1) COG
(2) Vertical velocity of the COG
(3) Ratio of the height and width of
the bounding box
(4) Ratio of floor pixels (of the
detected foreground)
(5) Amount of floor pixels
(6) Angle of the main axes
- MLP
- RBF;
- SVM
- DNN
- - Acc ratio 98.15%
- k 0.96
[87]- Depth- Create a new coordinate system
centered at the hip center
- Calculate the 3D skeletal data
regarding to the new coordinate system
- Build the three anatomical planes
centered at the hip center
- Calculate the displacement vectors and
the signed distances for the skeletal joints
- Estimate the pose profile using some
relational geometric characteristics
(1) Histogram-based
representation of the motion-pose
geometric descriptor
- SVM with a RBF
kernel
- - Mean Acc
78.77%
[72] - Skeletal - Change axis in the coordinate system
- Apply a sliding window of 2 s
(1) Head velocity
(2) Horizontal velocity of hip center
joint (center of gravity)
(3) Vertical velocity of hip center joint
- Adaptive thresholding(1) 0.6 * ((√2 *
person’s height)/2)
(2) 0.55 *(person’s
height/4)
(3) 0.7 * (person’s
height/4)
- F1 measure 94.4%
[67] - Depth- Background subtraction
- Noise filter
- Calculate the depth histogram of the
body detected
(1) Depth range difference
(2) Depth mean difference
(3) Inactivity
- Adaptive thresholding(3) 2 s- Acc 97.2%
- Error rate 2.8%
- Se 94%
- Sp 100%
[54]- Depth- Median filter
- Canny filter
- Calculate the tangent vector
angle of the outline binary image
(1) Percentage of tangent vector angle
divided into 15° groups
- Fixed thresholding (1) 40%- Acc 97.1%
- Error rate 2.9%
- Se 94.9%
- Sp 100%
[42]- Depth- Image normalization
- Image polar coordinates
- FFT transform
(1) Wavelet moment- Minimum distance
- SVM with linear
kernel
- - Mean Acc 91%
[96] - Depth
- 3-axis
acceleration
- Person extraction if sum vector of
acceleration >3 g
using differencing technique or
region growing
- Floor plane extraction
- Update the floor plane parameters
through RANdom SAmple Consensus (RANSAC)
- Transformation data to 3D
point cloud
(1) Height to width ratio of the person’s
bounding box in the maps of depth
(2) Ratio of height of the person’s bounding
box to the real height of person
(3) Height of person’s centroid to the ground
(4) Largest standard deviation from the
centroid for the abscissa and the application
(5) Ratio of the number of points in the cuboid
to the number of points in the surrounding
cuboid measured from 40 cm height
(6) Sum vector of acceleration and
2 dynamic features
- Fuzzy logic- - Acc 97.14%
- Pr 93.75%
- Se 100%
- Sp 95%
[93] - Depth- Human body segmentation using static
background subtraction
- Noise removal by
morphological operations
- Silhouette extraction using
edge detection
- Generate the codebook by
K-medoids clustering
(1) Silhouette orientation volume- BoW-Naïve bayes- - Mean Acc 93.84%
[60]- Depth- Person detection by
background subtraction
- Noise filtering using erosion and
dilation operations
- Model the head by an ellipse
- Track the head by particle filter
(1) Ratio of y-coordinate of the head’s
center to the person’s height
(2) Covariance of fall’s duration and
velocity of center of mass
- Fixed thresholding- - Acc 92.98%
- Se 90.76%
- Sp 93.52%
[48]- Depth- Median filter
- Background subtraction
- Otsu algorithm
- V-disparity
- least squares
- Convert coordinates to the world
coordinate system
(1) Distance * of the
approximated ellipse
(2) Orientation * of the
approximated ellipse
- Fixed thresholding(1) 0.5 m
(2) 45°
-
[81]- Skeletal- Divided the behavior sequences into
3 types: actual fall, non-fall and ADL
- Use the y-coordinates of the head, left
shoulder, right shoulder
and torso joints
(1) 5 Body shapes and behavior sequences - SVM - - Mean Acc 95.89%
[85] - Depth- Human body extraction from the
background using the adaptive
mixture of gaussian method
- Extraction of human silhouette
using Canny edge detector
- Generate the codebook
(1) CSS features - Fisher vector
- SVM with RBF
-- Mean Acc 93.8%
[61]- Skeletal
- 3-axis
acceleration
- Linearized the samples of
accelerometers by a linear
regression algorithm
- Synchronize one acceleration sample
to one skeletal frame based
on timestamps
- Modeling the floor plane
(1) Variation of the y-spine base joint
(2) Height of the spine base joint *
(3) Acceleration magnitude
(4) Angle formed between the g vector and
x axis
- Fixed thresholding(1) 0.5 m
(2) 0.2 m
(3) 3 g
(4) 90° for 0.5 s
- Mean Se 79.3%
- Mean Sp 99%
[98]- Depth- Filtering Dilation hole filling erosion
and erosion
- 3-D foreground segmentation
- Median/average filters
- Vertical state characterisation
- On ground event segmentation
(1) Minimum vertical velocity
(2) Maximum vertical acceleration
(3) Mean of smoothed signal of the
vertical state time series
(4) Occlusion adjusted change in pixels
(5) Minimum frame-to-frame
vertical velocity
- Decision tree --
[52] - Depth- Background subtraction
- Smooth velocity with a Kalman filter
- Estimate the threshold by random search
(1) Velocity of the height of the user
bounding box
(2) Velocity of the width-depth of the
user bounding box
(3) Inactivity
- Fixed thresholding(1) 1.18 m/s
(2) 1.2 m/s
(3) 2 s
- Se 100%
[73]- Skeletal
- Audio
-(1) Positions * of some joints (head,
shouldercenter, hipcenter,
ankleright, ankleleft)
- Fixed and adaptive
thresholding
0.3 m or
0.7 *
target’s height
- Mean Acc
81.25%
[58] - Depth- Foreground segmentation
- Floor uniformization
- Substitute the null pixels
- Edge detection using sobel filter
- Generate the super-pixel frame 40 × 40
- Split the object in the depth scene
- Identification of the human
subjects through (head-ground/head-
shoulder distance gap and the
appropriate head dimension)
(1) height * of the person’s central point- Fixed thresholding (1) 0.4 m -
[82]- Skeletal- Compute the ground plane using
V-disparity, the Hough
transform, or the Kinect SDK
- Convert the 3D joint coordinate into
the coordinate of the plane ground
(1) Distance * and velocity * of 5 modes
composed from 8 joints (head, shoulder
center, spine, hip center, shoulder right,
shoulder left, hip right, and hip left)
(2) Angle composed between the normal of
the floor plane and the line joint the head
with specific joints
- SVM - - FN 0
(mode(2))
- FP 3.3%
(mode(2))
[47]- Depth- Kalman filter
- Create the 3D human bounding box
(1) Speed of height
(2) Speed of width-depth
(3) speed of y-coordinate
of the top left vertex, of the user
bounding box
- Fixed thresholding (1) 1.1 m/s
(2) 1.4–1.7 m/s
(3) 0.5 m/s
- Reliability
97.3%
- Efficiency
80%
[92] - Depth - Foreground segmentation GMM
- Contour detection with the Canny
edge detector
- Use K-means to generate
the codebook
(1) CSS features of the extracted
human silhouette
- BoW-VPSO-ELM-- Mean Acc
92.02%
- Mean Se
95.54%
- Mean Sp
84.56%
[102]- Depth- Background extraction using the
running average method
- Denoising through erode and
dilate filters
- Estimate the real world coordinates
and compensate for the Kinect
tilt angle
- Detect the pixel belonging to the
same object in scene using
the component labelling method
- Calculate the person’s center of mass
(1) Vertical position of the center of mass
(2) Vertical speed and the standard
derivation of all the person’s points
- Hidden Markov Model -- Mean Se
86.25%
- Mean Sp
97.88%
[77]- Skeletal - Rigid transformation from
camera coordinates
to world coordinates
(1) Average Height of the shoulders
(2) Vertical speed of the upper body
(3) Body orientation(shoulders, torso,
and hips)
(4) Temporal gradient of body orientation
(5) Distance between COM and COS
- Linear SVM - - Se 89.1%
- False alarm
rate 4.5%
[57] - Skeletal- Adding a postural recognition
algorithm to separate the false
positives and the true positives
(1) Position
(2) Velocity(vertical, horizontal)
of user’s hip center, represented as the center
of mass
- Fixed thresholding(2) 1.9 m/s- Se 100%
- Sp 90%
[56]- Skeletal- Calculate the floor plane equation
based on the angle of the Kinect
in case of difficulty having
the equation given by the SDK
(1) Position *
(2) Average velocity
of 20 joints
- Fixed thresholding(2) −1 m/s-
[51]- Depth - V-disparity
- Least squares algorithm
- Eliminating tracking error
- Calculate the mean position of
the Knee
(1) Major axis orientation formed by the
(head, shoulderCenter, spine, hip, and
knee) joints in 2D and 3D
(2) Height * of the spine * in 2D and 3D
- Fixed thresholding(1) Parallel to the floor plane- Mean Acc
87.7%
- Mean Pr
92.5%
- Mean recall
85%
- Mean true
negative
rate 91.3%
- Mean
F-score
88.15%
[49] - Depth- Background subtraction
- Head detection using the head
model algorithm
- Recognize people using HOG features
and a modified SVM
(1) Vertical speed
(2) Distance *
of the head and the body center
- Fixed thresholding(1) 2 m/s, and
1 m/s
(2) 0.5 m
- Acc 98.4%
- Se 96.7%
- Sp 100%
[97] - Skeletal- Removed depth values unrealistic
for the room’s size or the direct
neighborhood pixels
- Manual segmentation to identify
the regions in the reference image
- Apply some rules to distinguish persons
from objects in the scene
(1) Person’s height and width
(2) Ratio of circumference to area
(3) Orientation of patches
(4) Change of slopes of object’s longer axis
(5) COG
- Decision tree -- Se 93%
[94]- Depth - Median filter
- Interpolation
- Decimation
- Synchronization
(1) Acceleration
(2) Angular velocity
(3) Difference between the person’s
center of gravity
and the height at which the Kinect is located
- Fuzzy logic- - Se 100%
[95]- Depth - Foreground extraction
- Median filter
- Interpolation
- Decimation
- Synchronization
(1) acceleration
(2) Angular velocity
(3) Difference between the person’s
center of gravity
and the height at which the Kinect
is located
- Fuzzy logic-- Se 100%
- Sp 96.4%
[50] - Depth- Median filter
- Segmentation Mean shift/connected
components
(1) Difference between the person’s
center of gravity
and the height at which the Kinect is located
- Fixed thresholding--
[79]- Depth
- RGB
- Generate the three dimensional
motion history Images
- 7 hu-moments for each dimension of MHI- SVM-- Acc 97%
- Error rate
0.03%
[68]- Skeletal-(1) Distance *
(2) Vertical velocity *
of the (hipcenter, head, and neck)joints
- Adaptive thresholding- 1.5 * (distance head to neck) -
[101]- Depth- Person detection using background
thresholding subtraction
- Coordinate transformation from image
to 3D world coordinate system
- Estimate the floor level by using
2.5th percentile of heights in the image
- Average the top 5% values in the
y-coordinate to compute the head
position to the floor
(1) Time span of fall
(2) Total head height change throughout fall
(3) Maximum head speed
(4) Head’s height *
(5) Percentage of frames where the head’s
height is small
- Bayesian framework-- Pr 92%
[46]- Depth- V-disparity
- Hough transform
- Least squares
- Background subtraction
- Morphological filtering
(1) Human centroid height *
(2) body velocity
- Fixed thresholding(1) 0.358 m
(2) 0.63 m/s
- Success rate 98.7%

5. The Performance Measurement Terms

The performance of a fall detection approach is measured using standard terms. In particular, these terms verify how often the judgments of the system are correct given the actual situation of events. The fall detection problem can be considered as a binary detection/classification problem. The confusion matrix and its derivations are the most frequently used measurement tools to assess the quality of a fall detector [113,114]. In this section, we explain the terms of performance used in the reviewed papers, providing the general definition of each measurement and its application in fall detection algorithms. Before going into details, we note that these terms are used to set thresholds as well as in machine learning and deep learning approaches. We use the prediction and observation or class terms for all types of algorithms.
To measure the total performance, it is necessary to have the four components of the confusion matrix: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). A TP is a result where the algorithm correctly predicts the positive observation. Similarly, a TN is a result where the algorithm correctly predicts the negative observation. Likewise, FPs and FNs are outcomes where the model incorrectly predicts the positive and negative classes. Because in almost all fall detection papers the positive and negative observations correspond to real-life falls and ADLs, respectively, the TPs and FPs respectively count the number of falls and ADLs detected as falls. The same counting principle applies to TNs and FNs. Table 4 explains the distribution of terms for each situation. The table’s columns correspond to the actual situation of actions, and every row is linked to the result of the algorithm.
The confusion matrix exposes how the approach is confused when it judges scenarios detected by intelligent devices. It provides an explanation of the types of errors made by the system. Several evaluation measures based on the confusion matrix elements are defined. These are used to quantify the performance from different perspectives [115].
The total number of observations successfully detected is known as accuracy. It proves the success rate of fall detection approaches. However, it only reflects the reality of the system when working on symmetric datasets, where there is no disproportion between the numbers of real situations (i.e., falls and ADLs).
Acc = \frac{TP + TN}{TP + TN + FP + FN}
The error rate (misclassification rate) represents the ratio of incorrectly detected observations to the total amount of data, or simply one minus the accuracy. It demonstrates how often the fall detection algorithm is wrong.
ERR = \frac{FP + FN}{TP + TN + FP + FN}
Precision is the ratio of correctly positive predictions to the number of predicted positive observations. It determines the percentage of actions identified as falls that have been detected correctly. This performance expression allows for judging the stability of a fall detection algorithm.
Pr = \frac{TP}{TP + FP}
The sensitivity (recall, recognition rate, or true positive rate) can be defined as the proportion of actual positive results that have been correctly detected. In other words, this metric measures how useful the algorithm is in identifying fall movements.
Se = \frac{TP}{TP + FN}
Specificity (selectivity or true negative rate) is the proportion of actual negative outcomes that have been correctly identified. It shows the ability of algorithms to correctly recognize ADLs and avoid false alarms.
$$Sp = \frac{TN}{TN + FP}$$
The false alarm rate (false positive rate or fallout) is equivalent to one minus the specificity. It is calculated by dividing the number of ADLs incorrectly detected as falls by the total number of ADL instances.
$$FAR = \frac{FP}{FP + TN}$$
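To make the six definitions above concrete, the sketch below computes them directly from the four counts; it is a straightforward transcription of the formulas (with illustrative counts), not code taken from the reviewed approaches.

```python
# Direct transcription of the formulas above; raises ZeroDivisionError if a
# denominator is empty (e.g., a test set without any ADL samples).
def basic_metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,
        "error_rate":  (fp + fn) / total,   # = 1 - accuracy
        "precision":   tp / (tp + fp),
        "sensitivity": tp / (tp + fn),      # recall / true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "far":         fp / (fp + tn),      # = 1 - specificity
    }

print(basic_metrics(tp=45, tn=40, fp=5, fn=10))  # illustrative counts only
```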
Three elements determine the validity of a fall detection algorithm: sensitivity, false alarm rate, and miss rate [116]. In order to fully determine the efficacy of a fall detection approach, both the precision and the sensitivity should be calculated. Nevertheless, the improvement of one of these elements is generally made at the expense of the other. For example, in the case of high sensitivity and low precision, most of the falls are correctly identified (low FN), along with a lot of false positives, and vice versa. Thus, the performance estimation tends towards calculation of the specificity and F-score.
Three more terms used in the performance evaluation deserve to be mentioned here. The F-score (F1 score) combines the precision and sensitivity by calculating their harmonic mean; it reflects the algorithm's capacity to detect all relevant situations.
$$F\text{-}score = \frac{2 \times Precision \times Sensitivity}{Precision + Sensitivity}$$
The kappa value compares the system's performance to what would be expected merely by chance; it judges the inter-rater reliability of fall detectors [18,117]. The best kappa values approach 1.
$$k = \frac{Accuracy - Expected\ Accuracy}{1 - Expected\ Accuracy}$$
$$Expected\ Accuracy = \frac{(TP + FN) \times (TP + FP) + (TN + FN) \times (TN + FP)}{(TP + FP + TN + FN)^{2}}$$
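The F-score and kappa can likewise be derived from the same four counts; the sketch below follows the two equations above and uses the same illustrative counts as before.

```python
def f_score_and_kappa(tp, tn, fp, fn):
    """Harmonic-mean F-score and Cohen's kappa from the confusion matrix."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / total
    expected = ((tp + fn) * (tp + fp) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return f_score, kappa

print(f_score_and_kappa(tp=45, tn=40, fp=5, fn=10))  # -> (~0.857, 0.7)
```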
The area under the Receiver Operating Characteristic (ROC) curve (AUROC) is obtained from the ROC curve, which shows the variation in sensitivity and specificity as a function of the decision threshold. It measures the discriminative power of an approach over all possible decision thresholds; the fall detection approach is perfectly discriminating if the AUROC approaches 1. In addition, cross-validation has been used in several machine learning and deep learning approaches to estimate how performance generalizes in practice. A fall detection system is considered ideal if the above performance terms are close to 1 (100%) while the fallout and the error rate are very close to zero.
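As a hedged illustration of the last two points, the snippet below uses scikit-learn (our assumption, not a tool prescribed by the reviewed papers) to compute the AUROC from a continuous decision score and to run a 5-fold cross-validation on hypothetical feature vectors.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# AUROC needs a continuous score (e.g., margin above a fall threshold),
# not a hard binary decision; labels: 1 = fall, 0 = ADL (illustrative values).
y_true  = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.90, 0.80, 0.40, 0.65, 0.30, 0.55, 0.70, 0.20])
print("AUROC:", roc_auc_score(y_true, y_score))

# 5-fold cross-validation of a classifier on hypothetical feature vectors
# (rows = events, columns = features such as velocity or inactivity time).
X = np.random.rand(40, 3)
y = np.array([0, 1] * 20)
print("CV accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())
```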

6. Kinect Datasets for Fall Detection

The creation of rich and well-annotated datasets is one of the compulsory elements for detecting falls by computer vision. These datasets are made up of video sequences of fall and ADL events and are primarily intended to assess the success of the algorithms. Many researchers have made their datasets public within the scientific community in order to enable fairer comparisons between approaches and to raise new challenges. Most of the current fall detection benchmark datasets were recorded in simulated scenarios [118]. Table 5 summarizes the public Kinect datasets usable for fall detection, while Table 6 enumerates the types of scenarios and provides the access links for each database.
The fall detection dataset created by Zhang Zhong and Vassilis Athitsos (ZZ-VA dataset) [101] contains data collected from two viewpoints. It was collected using two version 1 Kinects located at two corners of a simulated apartment. Six subjects performed the experimental actions in the Heracleia Human-Centered Computing Laboratory at the University of Texas at Arlington. This dataset includes 26 real falls from the two viewpoints and 61 fall-like events; in all, 31,614 depth frames are registered at 320 × 240 pixel resolution. The experimental scenarios from the two Kinects are not captured at the same moment. Using the same experimental setup, two further fall detection benchmark datasets were recorded, namely, EDF and OCCU [119]; EDF contains no occlusion data, while OCCU does. Each of the five volunteers made un-occluded falls from each point of view in eight directions. In total, there are 110 falls and 30 actions that show features similar to those of a fall occurrence; the video sequences are recorded at the same time from both viewpoints. The main characteristic of OCCU is the presence of occluded falls in which the end of the fall is entirely occluded by a bed; its data are not captured simultaneously from the two Kinect cameras.
The SDU fall dataset [92] was recorded using one version 1 Kinect camera installed at a height of 1.5 m. This dataset includes five ADLs and intentional falls performed by ten young male and female subjects. In total, 1800 video clips were collected under randomized experimental conditions, such as carrying or not carrying an object, lighting on or off, varied room layouts, and changes of position and direction relative to the Kinect camera. The resolution of the data stream is 320 × 240 pixels, and it is stored in an audio-video interleaved format at 30 fps. The SDU Fall server is in service again, and a new access link with the password is available on the author's Google page.
The UR fall detection dataset [120] can be considered the richest dataset in terms of vision and movement data. It comprises 70 event sequences; the measurements were collected using two Kinect cameras and accelerometers. All RGB and depth images are synchronized with the movement data based on the timestamp value. Camera 0 was parallel to the ground at a height of around 1 m, while Camera 1 used an in-ceiling configuration 2.5 m above the ground. Moreover, accelerometers were mounted on the waist or close to the pectorals. Five subjects performed fall actions (falls from a standing position and falls from sitting on a chair) and ADLs (walking, sitting down, lying on the floor, taking an object from or placing it on the floor, tying shoelaces, bending to lift an object, and crouching down) in traditional spaces such as an office, classroom, etc. The fall events were recorded with both cameras and the accelerometer, whereas the ADLs were recorded only by the camera parallel to the floor together with the accelerometer.
The TST fall detection dataset v1 [58] was collected using a Kinect version 1 in a top-view configuration. It includes depth frames captured from four volunteers. A total of 20 tests were carried out by subjects aged 26 to 27 years with heights between 1.62 and 1.78 m. The tests are organized into two main groups; the first ten include data from two or more people walking in the covered area, while the rest correspond to simulated falls. Moreover, the TST fall detection dataset v2 [61] provides both vision and acceleration samples. In this dataset, eleven healthy volunteers performed eight types of activities three times each, resulting in 264 sequences. The subjects who performed the ADLs and fall actions were aged between 22 and 39 years and measured 1.62 to 1.97 m in height. TSTv2 was captured using a Kinect version 2 with two accelerometers (IMU shimmer) attached to the wrist and waist of the subjects.
Pawel Mazurek et al. [86] built the IRMTv1 dataset with two synchronized Kinects. This dataset provides matrices of integers representing the distance between the plane parallel to the device lens and the scene points in the image, captured by both cameras simultaneously. Two performers simulated 20 different types of falls and 20 non-falls, for a total of 160 sequences of depth images. Each recorded sequence is 75 frames long, and each performer simulated the scenarios at a distance of approximately 1.5–4 m from each instrument in the observed area.

7. Discussion

The field of fall detection is pivotal in enhancing the quality of life for older adults and mitigating the risk of losing their independence. It encompasses various approaches aimed at recognizing and quantitatively analyzing activities. These approaches predominantly focus on data analysis and processing, seeking to characterize human movement through sensing devices.
One of the significant challenges in fall detection lies in the physical characteristics shared between falls and certain routine activities. This challenge manifests itself in the nearly indistinguishable alteration of the body’s 3D information when an individual falls or is in a supine position on the ground. Another area for improvement affecting the accuracy of these techniques is the viewing angle. In order to accurately estimate the skeleton, the person must be appropriately oriented towards the Kinect device while ensuring that the right and left joints align along the depth axis (z-axis). Potential solutions include utilizing joints in the upper body or those forming the major axis of the tracked person, such as the head joint. Integrating additional features such as speed, acceleration, and inactivity time can effectively enhance the system’s performance in detecting falls. Moreover, performing studies in a 2D space imposes constraints on precision and generalization, resulting in loss of information on the third axis.
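For instance, the head-joint height discussed above can be combined with its vertical speed and a post-impact inactivity check in a simple rule. The sketch below is a minimal illustration with hypothetical thresholds (speed_thr, floor_thr) and is not taken from any reviewed approach.

```python
FPS = 30  # typical Kinect frame rate

def vertical_speed(head_heights, fps=FPS):
    """Frame-to-frame downward speed (m/s) of the head height above the floor."""
    return [(prev - curr) * fps for prev, curr in zip(head_heights, head_heights[1:])]

def detect_fall(head_heights, speed_thr=1.5, floor_thr=0.4, inactivity_frames=60):
    """Flag a fall when the head drops quickly, ends near the floor,
    and stays there for inactivity_frames frames (about 2 s at 30 fps)."""
    speeds = vertical_speed(head_heights)
    for i, s in enumerate(speeds):
        if s > speed_thr and head_heights[i + 1] < floor_thr:
            after = head_heights[i + 1 : i + 1 + inactivity_frames]
            if len(after) == inactivity_frames and max(after) < floor_thr:
                return True
    return False
```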
In most of the reviewed approaches, volunteers with varying characteristics (age, sex, height, weight, and health status) simulated activities within diverse environments and experimental setups. Consequently, the collected data used for method development may require greater realism. Integrating real and simulated data encompassing various fall scenarios and activities of daily living (ADLs) into a publicly accessible dataset would significantly benefit future research endeavors. This approach would facilitate equitable algorithm comparisons, streamline experimental preparations, and promote the generalization of techniques. Additionally, incorporating real-world events can better validate the effectiveness and applicability of the proposed approaches.
Efficiently optimizing the features that represent and differentiate visual information is critical for developing dependable real-time fall detection systems. Compared to machine learning and deep learning approaches, threshold-based systems are computationally less demanding and are more suitable for 24 h surveillance systems. They exhibit lower energy consumption compared to machine learning-based methods or those employing multiple sensor types. Operationally, threshold-based approaches offer rapid outputs at the cost of precision, whereas machine learning-based approaches provide high precision but may not be suitable for real-time systems due to their computational overhead. Notably, the accuracy of a fall detection approach depends not only on the type of algorithm, but also on the diversity and quantity of characteristics covering the three phases of a fall (pre-fall, critical, and post-fall).
It is worth noting that the Kinect sensor’s operational range extends up to 8 m; however, beyond 4.5 m the reliability of the acquired data diminishes, rendering skeleton detection invalid. Various filters become imperative after data acquisition to enhance the accuracy of the results. Additionally, the calibration process for the Kinect camera demands meticulous precision and time investment. This process must be repeated whenever the camera is intentionally or accidentally moved. This calibration step is obviated when employing official Kinect drivers, streamlining the operational process.
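A simple way to apply such filtering after acquisition is to discard samples outside the reliable range and smooth the remainder; the sketch below (with a hypothetical class name and window size) illustrates the idea for one joint's depth measurements.

```python
from collections import deque
from statistics import median

MAX_RELIABLE_RANGE = 4.5  # metres, beyond which skeleton data are discarded

class JointDepthFilter:
    """Discard out-of-range measurements and smooth the rest with a sliding
    median to suppress depth noise (the window size is an arbitrary choice)."""
    def __init__(self, window=5):
        self.buffer = deque(maxlen=window)

    def process(self, depth_m):
        if depth_m is None or not (0.5 <= depth_m <= MAX_RELIABLE_RANGE):
            return None                      # unreliable sample, skip it
        self.buffer.append(depth_m)
        return median(self.buffer)
```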

8. Conclusions

Developing an effective fall detection system necessitates a comprehensive understanding of human movements. Fortunately, advancements in Microsoft Kinect hardware and software have alleviated many computer vision challenges, significantly advancing scientific research in recognizing and comprehending human activities. In this survey, we have provided an extensive overview of various Kinect-based methods for detecting falls in elderly individuals, offering valuable insights into this critical area. Our focus has encompassed Kinect vision technology, life cycle approaches, performance metrics, and publicly available Kinect-based datasets. Significantly, the principles underpinning a Kinect-based fall detection system can be adapted for applications with various sensor types. Such adaptability is crucial to ensure that the model remains robust across changes in viewpoint, scale, inter-class deformation, and occlusion.
The core objective of fall detection approaches is to safeguard the lives and well-being of elderly individuals under surveillance. From a personal and medical perspective, missed and incorrectly detected falls carry more significant consequences than inaccurately detected activities of daily living (ADLs). Thus, a dependable system is characterized by maximization of correct detections (true positives and true negatives) along with minimization of missed and incorrectly detected falls (false negatives) and incorrectly detected ADLs (false positives). To enhance the performance of these fall detection techniques, avenues for improvement include expanding the dataset, refining data preprocessing and denoising techniques, extracting additional meaningful features, and transitioning from analytical decision algorithms to machine or deep learning-based approaches (and vice versa).
Looking ahead, future research in this field should consider the following key directions:
  • Open and Balanced Datasets: Researchers should strive to make their datasets publicly accessible, encompassing diverse fall scenarios and daily activities performed by various volunteers. Ensuring dataset balance is crucial for robust model training.
  • Multi-Sensor Integration: Combining Kinect sensor technology with other sensor types can mitigate constraints limiting system performance, such as occlusion, data scarcity, and spatial limitations in monitoring areas.
  • Leveraging 3D Biomechanical Data: Focusing on utilizing skeletal data can more effectively characterize the biomechanics of the human body, while 3D data are better suited for precise motion analysis as compared to 2D data.
  • Real-World Data Validation: Evaluating methods using real data acquired from retirement homes or government organizations while respecting the privacy of elderly individuals and avoiding commercial objectives is essential for real-world validation and practical applicability.
  • Real-Time Performance Evaluation: Evaluating approaches in real-time scenarios should be prioritized in order to enhance their generality and suitability for practical deployment.
  • Incorporating Social and Ethical Aspects: The significance of the alarm unit within each model and the importance of elderly individuals’ social lives must be acknowledged during fall detection application development.
  • Dynamic Thresholds: Algorithm development should be directed toward adaptive thresholding in order to infuse dynamism into the system, thereby improving its responsiveness to varying conditions. Such techniques are more relevant than those based on fixed thresholds (a minimal sketch follows this list).
  • Deep Learning Advancements: Embracing the application of deep learning algorithms, which have demonstrated remarkable results with minimal errors in the healthcare domain, can further enhance the reliability of fall detection systems, and issues with a lack of samples can be solved through the use of deep generative models.
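As a minimal sketch of the adaptive thresholding direction mentioned above (all parameters are illustrative assumptions, not values from the reviewed works), the decision level can track running statistics of a motion feature instead of staying fixed:

```python
# Rough sketch of an adaptive threshold on a generic motion feature
# (e.g., downward speed); the parameters k and alpha are illustrative only.
class AdaptiveThreshold:
    def __init__(self, k=3.0, alpha=0.01):
        self.k = k            # how many standard deviations above normal
        self.alpha = alpha    # adaptation rate of the running statistics
        self.mean = 0.0
        self.var = 1.0

    def update(self, value):
        """Return True when value exceeds the adaptive level, then adapt
        the running mean/variance only on non-alarming (normal) samples."""
        level = self.mean + self.k * self.var ** 0.5
        alarm = value > level
        if not alarm:
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var += self.alpha * (delta * delta - self.var)
        return alarm
```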

Author Contributions

M.F. and M.-Y.H. are respectively the first and second authors; they collected all the data, organized them, analyzed them, and wrote this paper. R.Y. is the supervisor of this project and supervised the full work. K.G., A.M., S.C., F.P., G.H. and I.L. suggested various edits to improve this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by: (1) The FEDER “European regional development fund” project “Reper@ge” (https://www.europe-bfc.eu/beneficiaire/reperge/). (2) The Junior Professor Chair (CPJ) of Franche Comte University, Galaxie number 4718.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

As this is a review paper, all of the data we used may be available in public research databases.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. WHO. WHO|Facts about Ageing; World Health Organization: Geneva, Switzerland, 2014. [Google Scholar]
  2. He, W.; Goodkind, D.; Kowal, P.; West, L.A.; Ferri, G.; Fitzsimmons, J.D.; Humes, K. An Aging World: 2015; United States Census Bureau: Suitland, MD, USA, 2016.
  3. Carone, G.; Costello, D. Can Europe afford to grow old. Financ. Dev. 2006, 43, 28–31. [Google Scholar]
  4. Ortman, J.M.; Velkoff, V.A.; Hogan, H. An Aging Nation: The Older Population in the United States; United States Census Bureau; Economics and Statistics Administration: Washington, DC, USA, 2014.
  5. Leung, K.M.; Chung, P.K.; Chan, A.W.; Ransdell, L.; Siu, P.M.F.; Sun, P.; Yang, J.; Chen, T.C. Promoting healthy ageing through light volleyball intervention in Hong Kong: Study protocol for a randomised controlled trial. BMC Sport. Sci. Med. Rehabil. 2020, 12, 6. [Google Scholar] [CrossRef] [PubMed]
  6. Berg, R.L.; Cassells, J.S. Falls in older persons: Risk factors and prevention. In The Second Fifty Years: Promoting Health and Preventing Disability; National Academies Press (US): Washington, DC, USA, 1992. [Google Scholar]
  7. World Health Organization (Ed.) WHO Global Report on Falls Prevention in Older Age; World Health Organization: Geneva, Switzerland, 2008. [Google Scholar]
  8. Heinrich, S.; Rapp, K.; Rissmann, U.; Becker, C.; König, H.H. Cost of falls in old age: A systematic review. Osteoporos. Int. 2010, 21, 891–902. [Google Scholar] [CrossRef] [PubMed]
  9. Turner, S.; Kisser, R.; Rogmans, W. Falls among older adults in the EU-28: Key facts from the available statistics. EuroSafe, Amsterdam; The European Public Health Association: Brussels, Belgium, 2015. [Google Scholar]
  10. Important Facts about Falls | Home and Recreational Safety | CDC Injury Center; CDC Injury Center: Atlanta, GA, USA, 2019.
  11. Bourke, A.; O’brien, J.; Lyons, G. Evaluation of a threshold-based tri-axial accelerometer fall detection algorithm. Gait Posture 2007, 26, 194–199. [Google Scholar] [CrossRef] [PubMed]
  12. Li, Q.; Stankovic, J.A.; Hanson, M.A.; Barth, A.T.; Lach, J.; Zhou, G. Accurate, Fast Fall Detection Using Gyroscopes and Accelerometer-Derived Posture Information. In Proceedings of the BSN, Berkeley, CA, USA, 3–5 June 2009; Volume 9, pp. 138–143. [Google Scholar]
  13. Hwang, S.; Ryu, M.; Yang, Y.; Lee, N. Fall detection with three-axis accelerometer and magnetometer in a smartphone. In Proceedings of the International Conference on Computer Science and Technology, Jeju, Republic of Korea, 22–25 November 2012; pp. 25–27. [Google Scholar]
  14. Chen, Y.C.; Lin, Y.W. Indoor RFID gait monitoring system for fall detection. In Proceedings of the 2010 2nd International Symposium on Aware Computing, Tainan, Taiwan, 1–4 November 2010; pp. 207–212. [Google Scholar]
  15. Alwan, M.; Rajendran, P.J.; Kell, S.; Mack, D.; Dalal, S.; Wolfe, M.; Felder, R. A smart and passive floor-vibration based fall detector for elderly. In Proceedings of the Information and Communication Technologies, Damascus, Syria, 24–28 April 2006; Volume 1, pp. 1003–1007. [Google Scholar]
  16. Li, Y.; Ho, K.; Popescu, M. A microphone array system for automatic fall detection. IEEE Trans. Biomed. Eng. 2012, 59, 1291–1301. [Google Scholar] [PubMed]
  17. Popescu, M.; Hotrabhavananda, B.; Moore, M.; Skubic, M. VAMPIR-an automatic fall detection system using a vertical PIR sensor array. In Proceedings of the 2012 6th International Conference on Pervasive Computing Technologies for Healthcare (PervasiveHealth) and Workshops, San Diego, CA, USA, 21–24 May 2012; pp. 163–166. [Google Scholar]
  18. Miaou, S.G.; Sung, P.H.; Huang, C.Y. A customized human fall detection system using omni-camera images and personal information. In Proceedings of the 1st Transdisciplinary Conference on Distributed Diagnosis and Home Healthcare, Arlington, VA, USA, 2–4 April 2006; pp. 39–42. [Google Scholar]
  19. Cucchiara, R.; Prati, A.; Vezzani, R. A multi-camera vision system for fall detection and alarm generation. Expert Syst. 2007, 24, 334–345. [Google Scholar] [CrossRef]
  20. Wang, Z.; Ramamoorthy, V.; Gal, U.; Guez, A. Possible life saver: A review on human fall detection technology. Robotics 2020, 9, 55. [Google Scholar] [CrossRef]
  21. Mohamed, O.; Choi, H.J.; Iraqi, Y. Fall detection systems for elderly care: A survey. In Proceedings of the 2014 6th International Conference on New Technologies, Mobility and Security (NTMS), Dubai, United Arab Emirates, 12 May 2014; pp. 1–4. [Google Scholar]
  22. Kandroudi, M.; Bratitsis, T. Exploring the educational perspectives of XBOX kinect based video games. Proc. ECGBL 2012, 2012, 219–227. [Google Scholar]
  23. Saenz-de Urturi, Z.; Garcia-Zapirain Soto, B. Kinect-based virtual game for the elderly that detects incorrect body postures in real time. Sensors 2016, 16, 704. [Google Scholar] [CrossRef]
  24. Gallo, L.; Placitelli, A.P.; Ciampi, M. Controller-free exploration of medical image data: Experiencing the Kinect. In Proceedings of the 2011 24th international symposium on computer-based medical systems (CBMS), Bristol, UK, 27–30 June 2011; pp. 1–6. [Google Scholar]
  25. Zhu, L.; Zhou, P.; Pan, A.; Guo, J.; Sun, W.; Wang, L.; Chen, X.; Liu, Z. A survey of fall detection algorithm for elderly health monitoring. In Proceedings of the 2015 IEEE Fifth International Conference on Big Data and Cloud Computing, Dalian, China, 26–28 August 2015; pp. 270–274. [Google Scholar]
  26. Khosrow-Pour, D.B.A. Encyclopedia of Information Science and Technology, 3rd ed.; IGI Global: Hershey, PA, USA, 2015. [Google Scholar] [CrossRef]
  27. Zhang, Z.; Conly, C.; Athitsos, V. A survey on vision-based fall detection. In Proceedings of the 8th ACM International Conference on Pervasive Technologies Related to Assistive, Corfu, Greece, 1–3 July 2015; p. 46. [Google Scholar]
  28. Bet, P.; Castro, P.C.; Ponti, M.A. Fall detection and fall risk assessment in older person using wearable sensors: A systematic review. Int. J. Med. Inform. 2019, 130, 103946. [Google Scholar] [CrossRef]
  29. Kinect Introduced at E3. 2010. Available online: https://blogs.microsoft.com/blog/2010/06/14/kinect-introduced-at-e3/ (accessed on 10 September 2023).
  30. Suma, E.A.; Krum, D.M.; Lange, B.; Koenig, S.; Rizzo, A.; Bolas, M. Adapting user interfaces for gestural interaction with the flexible action and articulated skeleton toolkit. Comput. Graph. 2013, 37, 193–201. [Google Scholar] [CrossRef]
  31. Microsoft at MWC Barcelona: Introducing Microsoft HoloLens 2; Microsoft: Redmond, WA, USA, 2019.
  32. Technical Documentation, API, and Code Examples|Microsoft Docs; Microsoft: Redmond, WA, USA, 2020.
  33. Webb, J.; Ashley, J. Beginning Kinect Programming with the Microsoft Kinect SDK; Apress: New York, NY, USA, 2012. [Google Scholar]
  34. Ouvré, T.; Santin, F. Kinect: Intégrez le Capteur de Microsoft dans vos Applications Windows; Epsilon, Editions ENI: Saint-Herblain, France, 2015. [Google Scholar]
  35. Rahman, M. Beginning Microsoft Kinect for Windows SDK 2.0: Motion and Depth Sensing for Natural User Interfaces; Apress: New York, NY, USA, 2017. [Google Scholar]
  36. Cruz, L.; Lucio, D.; Velho, L. Kinect and rgbd images: Challenges and applications. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images Tutorials, Ouro Preto, Brazil, 22–25 August 2012; pp. 36–49. [Google Scholar]
  37. Cai, Z.; Han, J.; Liu, L.; Shao, L. RGB-D datasets using microsoft kinect or similar sensors: A survey. Multimed. Tools Appl. 2017, 76, 4313–4355. [Google Scholar] [CrossRef]
  38. Han, J.; Shao, L.; Xu, D.; Shotton, J. Enhanced computer vision with microsoft kinect sensor: A review. IEEE Trans. Cybern. 2013, 43, 1318–1334. [Google Scholar]
  39. Lan, Y.; Li, J.; Ju, Z. Data fusion-based real-time hand gesture recognition with Kinect V2. In Proceedings of the 2016 9th International Conference on Human System Interactions (HSI), Portsmouth, 6–8 July 2016; pp. 307–310. [Google Scholar]
  40. Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. Real-time human pose recognition in parts from single depth images. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1297–1304. [Google Scholar]
  41. Jarraya, S.K. Computer Vision-Based Fall Detection Methods Using the Kinect Camera: A Survey. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 2018, 10, 73–92. [Google Scholar] [CrossRef]
  42. Ding, Y.; Li, H.; Li, C.; Xu, K.; Guo, P. Fall detection based on depth images via wavelet moment. In Proceedings of the 2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Shanghai, China, 14–16 October 2017; pp. 1–5. [Google Scholar]
  43. Alzahrani, M.S.; Jarraya, S.K.; Ben-Abdallah, H.; Ali, M.S. Comprehensive evaluation of skeleton features-based fall detection from Microsoft Kinect v2. Signal Image Video Process. 2019, 13, 1431–1439. [Google Scholar] [CrossRef]
  44. Abobakr, A.; Hossny, M.; Abdelkader, H.; Nahavandi, S. Rgb-d fall detection via deep residual convolutional lstm networks. In Proceedings of the 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, ACT, Australia, 10–13 December 2018; pp. 1–7. [Google Scholar]
  45. Lockhart, T.E.; Smith, J.L.; Woldstad, J.C. Effects of aging on the biomechanics of slips and falls. Hum. Factors 2005, 47, 708–729. [Google Scholar] [CrossRef] [PubMed]
  46. Rougier, C.; Auvinet, E.; Rousseau, J.; Mignotte, M.; Meunier, J. Fall detection from depth map video sequences. In Proceedings of the International conference on smart homes and health telematics, Montreal, QC, Canada, 20–22 June 2011; pp. 121–128. [Google Scholar]
  47. Bevilacqua, V.; Nuzzolese, N.; Barone, D.; Pantaleo, M.; Suma, M.; D’Ambruoso, D.; Volpe, A.; Loconsole, C.; Stroppa, F. Fall detection in indoor environment with kinect sensor. In Proceedings of the 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications (INISTA) Proceedings, Alberobello, Italy, 23–25 June 2014; pp. 319–324. [Google Scholar]
  48. Yang, L.; Ren, Y.; Zhang, W. 3D depth image analysis for indoor fall detection of elderly people. Digit. Commun. Netw. 2016, 2, 24–34. [Google Scholar] [CrossRef]
  49. Nghiem, A.T.; Auvinet, E.; Meunier, J. Head detection using kinect camera and its application to fall detection. In Proceedings of the 2012 11th international conference on information science, signal processing and their applications (ISSPA), Montreal, QC, Canada, 2–5 July 2012; pp. 164–169. [Google Scholar]
  50. Kepski, M.; Kwolek, B. Human fall detection by mean shift combined with depth connected components. In Proceedings of the International Conference on Computer Vision and Graphics, Warsaw, Poland, 24–26 September 2012; pp. 457–464. [Google Scholar]
  51. Planinc, R.; Kampel, M. Introducing the use of depth data for fall detection. Pers. Ubiquitous Comput. 2013, 17, 1063–1072. [Google Scholar] [CrossRef]
  52. Mastorakis, G.; Makris, D. Fall detection system using Kinect’s infrared sensor. J. Real-Time Image Process. 2014, 9, 635–646. [Google Scholar] [CrossRef]
  53. Merrouche, F.; Baha, N. Fall detection using head tracking and centroid movement based on a depth camera. In Proceedings of the International Conference on Computing for Engineering and Sciences, Istanbul Turkey, 22–24 July 2017; pp. 29–34. [Google Scholar]
  54. Kong, X.; Meng, L.; Tomiyama, H. Fall detection for elderly persons using a depth camera. In Proceedings of the 2017 International Conference on Advanced Mechatronic Systems (ICAMechS), Xiamen, China, 6–9 December 2017; pp. 269–273. [Google Scholar]
  55. Fayad, M.; Mostefaoui, A.; Chouali, S.; Benbernou, S. Fall Detection Application for the Elderly in the Family Heroes System. In Proceedings of the 17th ACM International Symposium on Mobility Management and Wireless Access, Miami Beach, FL, USA, 25–29 November 2019; pp. 17–23. [Google Scholar]
  56. Kawatsu, C.; Li, J.; Chung, C.J. Development of a fall detection system with Microsoft Kinect. In Robot Intelligence Technology and Applications 2012; Springer: Berlin/Heidelberg, Germany, 2013; pp. 623–630. [Google Scholar]
  57. Lee, C.K.; Lee, V.Y. Fall detection system based on kinect sensor using novel detection and posture recognition algorithm. In Proceedings of the International Conference on Smart Homes and Health Telematics, Singapore, 18–21 June 2013; pp. 238–244. [Google Scholar]
  58. Gasparrini, S.; Cippitelli, E.; Spinsante, S.; Gambi, E. A depth-based fall detection system using a Kinect® sensor. Sensors 2014, 14, 2756–2775. [Google Scholar] [CrossRef]
  59. Peng, Y.; Peng, J.; Li, J.; Yan, P.; Hu, B. Design and development of the fall detection system based on point cloud. Procedia Comput. Sci. 2019, 147, 271–275. [Google Scholar] [CrossRef]
  60. Merrouche, F.; Baha, N. Depth camera based fall detection using human shape and movement. In Proceedings of the 2016 IEEE International Conference on Signal and Image Processing (ICSIP), Beijing, China, 13–15 August 2016; pp. 586–590. [Google Scholar]
  61. Gasparrini, S.; Cippitelli, E.; Gambi, E.; Spinsante, S.; Wåhslén, J.; Orhan, I.; Lindh, T. Proposal and experimental evaluation of fall detection solution based on wearable and depth data fusion. In Proceedings of the International Conference on ICT Innovations, Lugano, Switzerland, 3–6 February 2015; pp. 99–108. [Google Scholar]
  62. Yao, L.; Min, W.; Lu, K. A new approach to fall detection based on the human torso motion model. Appl. Sci. 2017, 7, 993. [Google Scholar] [CrossRef]
  63. Panahi, L.; Ghods, V. Human fall detection using machine vision techniques on RGB—D images. Biomed. Signal Process. Control 2018, 44, 146–153. [Google Scholar] [CrossRef]
  64. Li, X.; Nie, L.; Xu, H.; Wang, X. Collaborative fall detection using smart phone and Kinect. Mob. Netw. Appl. 2018, 23, 775–788. [Google Scholar] [CrossRef]
  65. Jansi, R.; Amutha, R. Detection of fall for the elderly in an indoor environment using a tri-axial accelerometer and Kinect depth data. Multidimens. Syst. Signal Process. 2020, 31, 1207–1225. [Google Scholar] [CrossRef]
  66. Sowmyayani, S.; Murugan, V.; Kavitha, J. Fall detection in elderly care system based on group of pictures. Vietnam. J. Comput. Sci. 2021, 8, 199–214. [Google Scholar] [CrossRef]
  67. Sun, C.C.; Sheu, M.H.; Syu, Y.C. A new fall detection algorithm based on depth information using RGB-D camera. In Proceedings of the 2017 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Xiamen, China, 6–9 November 2017; pp. 413–416. [Google Scholar]
  68. Bian, Z.P.; Chau, L.P.; Magnenat-Thalmann, N. A depth video approach for fall detection based on human joints height and falling velocity. In Proceedings of the International Conference on Computer Animation and Social Agents, Singapore, 9–11 May 2012. [Google Scholar]
  69. Sase, P.S.; Bhandari, S.H. Human fall detection using depth videos. In Proceedings of the 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 22–23 February 2018; pp. 546–549. [Google Scholar]
  70. Yajai, A.; Rasmequan, S. Adaptive directional bounding box from RGB-D information for improving fall detection. J. Vis. Commun. Image Represent. 2017, 49, 257–273. [Google Scholar] [CrossRef]
  71. Nizam, Y.; Mohd, M.N.H.; Jamil, M. Development of a user-adaptable human fall detection based on fall risk levels using depth sensor. Sensors 2018, 18, 2260. [Google Scholar] [CrossRef]
  72. Zhang, S.; Li, Z.; Wei, Z.; Wang, S. An automatic human fall detection approach using RGBD cameras. In Proceedings of the 2016 5th International Conference on Computer Science and Network Technology (ICCSNT), Changchun, China, 10–11 December 2016; pp. 781–784. [Google Scholar]
  73. Mundher, Z.A.; Zhong, J. A real-time fall detection system in elderly care using mobile robot and kinect sensor. Int. J. Mater. Mech. Manuf. 2014, 2, 133–138. [Google Scholar] [CrossRef]
  74. Spinsante, S.; Fagiani, M.; Severini, M.; Squartini, S.; Ellmenreich, F.; Martelli, G. Depth-based fall detection: Outcomes from a real life pilot. In Proceedings of the Italian Forum of Ambient Assisted Living, Catania, Italy, 21–23 February 2018; pp. 287–299. [Google Scholar]
  75. Kong, X.; Meng, Z.; Nojiri, N.; Iwahori, Y.; Meng, L.; Tomiyama, H. A hog-svm based fall detection iot system for elderly persons using deep sensor. Procedia Comput. Sci. 2019, 147, 276–282. [Google Scholar] [CrossRef]
  76. Seredin, O.; Kopylov, A.; Huang, S.C.; Rodionov, D. A Skeleton Features-Based Fall Detection Using Microsoft Kinect V2 with One Class-Classifier Outlier Removal. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2/W12, 189–195. [Google Scholar] [CrossRef]
  77. Davari, A.; Aydin, T.; Erdem, T. Automatic fall detection for elderly by using features extracted from skeletal data. In Proceedings of the 2013 International Conference on Electronics, Computer and Computation (ICECCO), Ankara, Turkey, 7–9 November 2013; pp. 127–130. [Google Scholar]
  78. Abobakr, A.; Hossny, M.; Nahavandi, S. A skeleton-free fall detection system from depth images using random decision forest. IEEE Syst. J. 2017, 12, 2994–3005. [Google Scholar] [CrossRef]
  79. Dubey, R.; Ni, B.; Moulin, P. A depth camera based fall recognition system for the elderly. In Proceedings of the International Conference Image Analysis and Recognition, Aveiro, Portugal, 25–27 June 2012; pp. 106–113. [Google Scholar]
  80. Mohd, M.N.H.; Nizam, Y.; Suhaila, S.; Jamil, M.M.A. An optimized low computational algorithm for human fall detection from depth images based on Support Vector Machine classification. In Proceedings of the 2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), Kuching, Malaysia, 12–14 September 2017; pp. 407–412. [Google Scholar]
  81. Yoon, H.J.; Ra, H.K.; Park, T.; Chung, S.; Son, S.H. FADES: Behavioral detection of falls using body shapes from 3D joint data. J. Ambient. Intell. Smart Environ. 2015, 7, 861–877. [Google Scholar] [CrossRef]
  82. Le, T.L.; Morel, J. An analysis on human fall detection using skeleton from Microsoft Kinect. In Proceedings of the 2014 IEEE Fifth International Conference on Communications and Electronics (ICCE), Danang, Vietnam, 30 July–1 August 2014; pp. 484–489. [Google Scholar]
  83. Su, M.C.; Liao, J.W.; Wang, P.C.; Wang, C.H. A smart ward with a fall detection system. In Proceedings of the 2017 IEEE International Conference on Environment and Electrical Engineering and 2017 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I&CPS Europe), Milan, Italy, 6–9 June 2017; pp. 1–4. [Google Scholar]
  84. Patsadu, O.; Watanapa, B.; Dajpratham, P.; Nukoolkit, C. Fall motion detection with fall severity level estimation by mining kinect 3D data stream. Int. Arab J. Inf. Technol. 2018, 15, 378–388. [Google Scholar]
  85. Aslan, M.; Sengur, A.; Xiao, Y.; Wang, H.; Ince, M.C.; Ma, X. Shape feature encoding via fisher vector for efficient fall detection in depth-videos. Appl. Soft Comput. 2015, 37, 1023–1028. [Google Scholar] [CrossRef]
  86. Mazurek, P.; Wagner, J.; Morawski, R.Z. Use of kinematic and mel-cepstrum-related features for fall detection based on data from infrared depth sensors. Biomed. Signal Process. Control 2018, 40, 102–110. [Google Scholar] [CrossRef]
  87. Alazrai, R.; Momani, M.; Daoud, M.I. Fall detection for elderly from partially observed depth-map video sequences based on view-invariant human activity representation. Appl. Sci. 2017, 7, 316. [Google Scholar] [CrossRef]
  88. Jeffin Gracewell, J.; Pavalarajan, S. Fall detection based on posture classification for smart home environment. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 3581–3588. [Google Scholar] [CrossRef]
  89. Maldonado-Mendez, C.; Hernandez-Mendez, S.; Torres-Muñoz, D.; Hernandez-Mejia, C. Fall detection using features extracted from skeletal joints and SVM: Preliminary results. Multimed. Tools Appl. 2022, 81, 27657–27681. [Google Scholar] [CrossRef]
  90. Mansoor, M.; Amin, R.; Mustafa, Z.; Sengan, S.; Aldabbas, H.; Alharbi, M.T. A machine learning approach for non-invasive fall detection using Kinect. Multimed. Tools Appl. 2022, 81, 15491–15519. [Google Scholar] [CrossRef]
  91. Kong, X.; Meng, Z.; Meng, L.; Tomiyama, H. A privacy protected fall detection iot system for elderly persons using depth camera. In Proceedings of the 2018 International Conference on Advanced Mechatronic Systems (ICAMechS), Zhengzhou, China, 25 October 2018; pp. 31–35. [Google Scholar]
  92. Ma, X.; Wang, H.; Xue, B.; Zhou, M.; Ji, B.; Li, Y. Depth-based human fall detection via shape features and improved extreme learning machine. IEEE J. Biomed. Health Inform. 2014, 18, 1915–1922. [Google Scholar] [CrossRef] [PubMed]
  93. Akagündüz, E.; Aslan, M.; Şengür, A.; Wang, H.; İnce, M.C. Silhouette orientation volumes for efficient fall detection in depth videos. IEEE J. Biomed. Health Inform. 2016, 21, 756–763. [Google Scholar] [CrossRef] [PubMed]
  94. Kepski, M.; Kwolek, B.; Austvoll, I. Fuzzy inference-based reliable fall detection using kinect and accelerometer. In Proceedings of the International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 29 April–3 May 2012; pp. 266–273. [Google Scholar]
  95. Kepski, M.; Kwolek, B. Fall detection on embedded platform using kinect and wireless accelerometer. In Proceedings of the International Conference on Computers for Handicapped Persons, Linz, Austria, 11–13 July 2012; pp. 407–414. [Google Scholar]
  96. Kwolek, B.; Kepski, M. Fuzzy inference-based fall detection using kinect and body-worn accelerometer. Appl. Soft Comput. 2016, 40, 305–318. [Google Scholar] [CrossRef]
  97. Marzahl, C.; Penndorf, P.; Bruder, I.; Staemmler, M. Unobtrusive fall detection using 3D images of a gaming console: Concept and first results. In Ambient Assisted Living; Springer: Berlin/Heidelberg, Germany, 2012; pp. 135–146. [Google Scholar]
  98. Stone, E.E.; Skubic, M. Fall detection in homes of older adults using the Microsoft Kinect. IEEE J. Biomed. Health Inform. 2014, 19, 290–301. [Google Scholar] [CrossRef] [PubMed]
  99. Kim, K.; Yun, G.; Park, S.K.; Kim, D.H. Fall detection for the elderly based on 3-axis accelerometer and depth sensor fusion with random forest classifier. In Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 23–27 July 2019; pp. 4611–4614. [Google Scholar]
  100. Zhao, F.; Cao, Z.; Xiao, Y.; Mao, J.; Yuan, J. Real-time detection of fall from bed using a single depth camera. IEEE Trans. Autom. Sci. Eng. 2018, 16, 1018–1032. [Google Scholar] [CrossRef]
  101. Zhang, Z.; Liu, W.; Metsis, V.; Athitsos, V. A viewpoint-independent statistical method for fall detection. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012), Tsukuba, Japan, 11–15 November 2012; pp. 3626–3630. [Google Scholar]
  102. Dubois, A.; Charpillet, F. Human activities recognition with RGB-Depth camera using HMM. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; pp. 4666–4669. [Google Scholar]
  103. Tsai, T.H.; Hsu, C.W. Implementation of Fall Detection System Based on 3D Skeleton for Deep Learning Technique. IEEE Access 2019, 7, 153049–153059. [Google Scholar] [CrossRef]
  104. Zheng, Y.; Liu, S.; Wang, Z.; Rao, Y. ReFall: Real-Time Fall Detection of Continuous Depth Maps with RFD-Net. In Proceedings of the Chinese Conference on Image and Graphics Technologies, Beijing, China, 23–25 August 2019; pp. 659–673. [Google Scholar]
  105. Xu, Y.; Chen, J.; Yang, Q.; Guo, Q. Human Posture Recognition and fall detection Using Kinect V2 Camera. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8488–8493. [Google Scholar]
  106. Xu, T.; Zhou, Y. Elders’ fall detection based on biomechanical features using depth camera. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1840005. [Google Scholar] [CrossRef]
  107. Reece, A.C.; Simpson, J.M. Preparing older people to cope after a fall. Physiotherapy 1996, 82, 227–235. [Google Scholar] [CrossRef]
  108. Fleming, J.; Brayne, C. Inability to get up after falling, subsequent time on floor, and summoning help: Prospective cohort study in people over 90. BMJ 2008, 337, a2227. [Google Scholar] [CrossRef]
  109. Cummings, S.R.; Kelsey, J.L.; Nevitt, M.C.; O’Dowd, K.J. Epidemiology of osteoporosis and osteoporotic fractures. Epidemiol. Rev. 1985, 7, 178–208. [Google Scholar] [CrossRef]
  110. Pannurat, N.; Thiemjarus, S.; Nantajeewarawat, E. Automatic fall monitoring: A review. Sensors 2014, 14, 12900–12936. [Google Scholar] [CrossRef]
  111. Vlaeyen, E.; Deschodt, M.; Debard, G.; Dejaeger, E.; Boonen, S.; Goedemé, T.; Vanrumste, B.; Milisen, K. Fall incidents unraveled: A series of 26 video-based real-life fall events in three frail older persons. BMC Geriatr. 2013, 13, 103. [Google Scholar] [CrossRef] [PubMed]
  112. Charlton, K.; Murray, C.M.; Kumar, S. Perspectives of older people about contingency planning for falls in the community: A qualitative meta-synthesis. PLoS ONE 2017, 12, e0177510. [Google Scholar] [CrossRef] [PubMed]
  113. Provost, F.; Kohavi, R. Glossary of terms. J. Mach. Learn. 1998, 30, 271–274. [Google Scholar] [CrossRef]
  114. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  115. Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. In Proceedings of the Australasian joint conference on artificial intelligence, Hobart, Australia, 4–8 December 2006; pp. 1015–1021. [Google Scholar]
  116. Liu, Y.; Wang, N.; Lv, C.; Cui, J. Human body fall detection based on the Kinect sensor. In Proceedings of the 2015 8th International Congress on Image and Signal Processing (CISP), Shenyang, China, 14–16 October 2015; pp. 367–371. [Google Scholar]
  117. McHugh, M.L. Interrater reliability: The kappa statistic. Biochem. Medica Biochem. Medica 2012, 22, 276–282. [Google Scholar] [CrossRef]
  118. Xu, T.; Zhou, Y.; Zhu, J. New advances and challenges of fall detection systems: A survey. Appl. Sci. 2018, 8, 418. [Google Scholar] [CrossRef]
  119. Zhang, Z.; Conly, C.; Athitsos, V. Evaluating depth-based computer vision methods for fall detection under occlusions. In Proceedings of the International Symposium on Visual Computing, Las Vegas, NV, USA, 8–10 December 2014; pp. 196–207. [Google Scholar]
  120. Kwolek, B.; Kepski, M. Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput. Methods Programs Biomed. 2014, 117, 489–501. [Google Scholar] [CrossRef]
Figure 1. Types of sensors deployed in fall detection, with + indicating advantages and − indicating disadvantages.
Figure 2. The hierarchy of joints provided by V1 (a), V2 (b), and Azure Kinect (c) [32,41].
Figure 3. General flowchart of Kinect-based fall detection approaches.
Table 1. Comparison between the three generations of Kinect technology.

| Features | Kinect V1 | Kinect V2 | Azure Kinect |
|---|---|---|---|
| Sensors | RGB; 3D depth; 4-mic linear array; 3-axis accelerometer | RGB; 3D depth; 4-mic linear array; 3-axis accelerometer | RGB; 3D depth; 7-mic circular array; 3-axis accelerometer; 3-axis gyroscope |
| Color stream | 640 × 480 @30 fps | 1920 × 1080 @30 fps; 1920 × 1080 @15 fps | 3840 × 2160 @30 fps |
| Depth stream | 320 × 240 @30 fps | 512 × 424 @30 fps | 640 × 576 @30 fps; 512 × 512 @30 fps; 1024 × 1024 @15 fps |
| Operating range | 0.8–4.5 m | 0.5–4.5 m | 0.5–3.86 m |
| View angles (H × V) | 57° × 43° | 70° × 60° | 75° × 65° |
| Joints per person | 20 | 25 | 31 |
| Depth method | Structured light | TOF | TOF |
| Tilt motor | Motorized (up/down 27°) | Manual | Manual |
| Hardware requirements | Dual-core 2.66 GHz; 2 GB RAM | Dual-core 3.1 GHz; DX11 graphics processor; 4 GB RAM | Dual-core 2.4 GHz; HD620 graphics processor; 4 GB RAM |
| Supported OS | Win 7, Win 8, Win Embedded Standard 7 | Win 8, Win Embedded 8 | Win 10, Ubuntu 18.04 |
| Connectivity | USB 2.0 | USB 3.0 | USB 3.0 |
Table 2. Comparison of representations of the human body in different Kinect SDK versions.

| | Kinect V1 | Kinect V2 | Azure Kinect |
|---|---|---|---|
| Number of bodies tracked simultaneously | 2 | 6 | N/A a |
| Number of joints per person | 20 | 25 | 32 |
| Joints enumerated in the SDK | 0-Hip_Center, 1-Spine, 2-Shoulder_Center, 3-Head, 4-Shoulder_Left, 5-Elbow_Left, 6-Wrist_Left, 7-Hand_Left, 8-Shoulder_Right, 9-Elbow_Right, 10-Wrist_Right, 11-Hand_Right, 12-Hip_Left, 13-Knee_Left, 14-Ankle_Left, 15-Foot_Left, 16-Hip_Right, 17-Knee_Right, 18-Ankle_Right, 19-Foot_Right | 0-Spine_Base, 1-Spine_Mid, 2-Neck, 3-Head, 4-Shoulder_Left, 5-Elbow_Left, 6-Wrist_Left, 7-Hand_Left, 8-Shoulder_Right, 9-Elbow_Right, 10-Wrist_Right, 11-Hand_Right, 12-Hip_Left, 13-Knee_Left, 14-Ankle_Left, 15-Foot_Left, 16-Hip_Right, 17-Knee_Right, 18-Ankle_Right, 19-Foot_Right, 20-Spine_Shoulder, 21-HandTip_Left, 22-Thumb_Left, 23-HandTip_Right, 24-Thumb_Right | 0-Pelvis, 1-Spine_Naval, 2-Spine_Chest, 3-Neck, 4-Clavicle_Left, 5-Shoulder_Left, 6-Elbow_Left, 7-Wrist_Left, 8-Hand_Left, 9-Handtip_Left, 10-Thumb_Left, 11-Clavicle_Right, 12-Shoulder_Right, 13-Elbow_Right, 14-Wrist_Right, 15-Hand_Right, 16-Handtip_Right, 17-Thumb_Right, 18-Hip_Left, 19-Knee_Left, 20-Ankle_Left, 21-Foot_Left, 22-Hip_Right, 23-Knee_Right, 24-Ankle_Right, 25-Foot_Right, 26-Head, 27-Nose, 28-Eye_Left, 29-Ear_Left, 30-Eye_Right, 31-Ear_Right |
| Development language | C++, C#, VB | C++, C#, VB | C, C++ |

a The official documentation does not specify the exact number, stating only that “Azure Kinect body tracking can track multiple human bodies at the same time”.
Table 4. Confusion matrix.

| Predicted Situation \ Real Situation | Fall | No-Fall/ADLs |
|---|---|---|
| Fall | TP | FP |
| No-fall/ADLs | FN | TN |
Table 5. Descriptive table of public Kinect datasets for fall detection.

| Dataset | Year | Devices | Data | Size | Contents |
|---|---|---|---|---|---|
| ZZ-VA | 2012 | 2 Kinect V1 | Depth | 12 sequences performed by 6 subjects | Depth frames; start and end frame numbers for each fall |
| EDF | 2014 | 2 Kinect V1 | Depth | 110 actions performed by 5 subjects | Depth frames; start and end frame numbers for each fall; floor plane estimation parameters |
| OCCU | 2014 | 2 Kinect V1 | Depth | 140 actions performed by 5 subjects | Depth frames; start and end frame numbers for each fall; floor plane estimation parameters |
| SDUFall | 2014 | Kinect V1 | Depth | 1800 video clips performed by 10 young male and female subjects | Depth video in 320 × 240 @30 fps |
| URFD | 2014 | 2 Kinect V1; PS-Move; x-IMU | Depth; RGB; Acceleration | 70 video sequences performed by 5 subjects | Acceleration values along the x, y and z axes; signal magnitude vector of acceleration; ratio of the person’s bounding box height to width; ratio of major to minor axis of the segmented person; standard deviation of pixels from the centroid for the X and Z axes; ratio of the bounding box occupancy; ratio of the person’s height in the current frame to the physical height while standing; actual height of the person; distance from the person’s centroid to the ground; ratio of the number of point clouds attached to the cuboid of 40 cm height to the cuboid of the person’s height |
| TSTv1 | 2014 | Kinect V1 | Depth | 20 tests performed by 4 volunteers aged 26–27 and measuring 1.62 to 1.78 m | Depth frames in 320 × 240 pixels |
| TSTv2 | 2015 | Kinect V2; 2 IMU shimmer | Depth; Skeletal; Acceleration | 264 sequences performed by 11 young actors aged 22 to 39 years with heights of 1.62–1.97 m | Depth frames; acceleration streams; skeleton joints in depth and skeleton space; timestamps |
| IRMTv1 | 2018 | 2 Kinect V1 | Depth | 160 sequences performed by 2 males in their mid-twenties | Depth values of the points in the imaged scene (background and frame) |
Table 6. Types of events and access links of public Kinect datasets for fall detection.

| Dataset | Types of Events | Available from (*) |
|---|---|---|
| ZZ-VA | Real falls; picking up an object from the ground; tying shoelaces; sitting/lying on the floor; lying down/jumping on the bed; opening/closing a drawer at ground level | http://vlm1.uta.edu/~zhangzhong/fall_detection/ |
| EDF | Falls; picking up an object from the ground; sitting on the ground; lying down on the ground | https://sites.google.com/site/occlusiondataset/ |
| OCCU | Totally occluded falls; un-occluded picking up something from the ground; un-occluded sitting on the ground; un-occluded tying shoelaces; lying down on the ground (end frame is totally occluded) | https://sites.google.com/site/occlusiondataset/ |
| SDUFall | Falling down; bending; squatting; sitting; lying; walking | https://sites.google.com/view/haibowang/home |
| URFD | Falls from standing; falls from sitting on a chair; walking; sitting down; lying on the floor; lying on a bed/sofa; bending to take or put an object from the floor; tying laces; crouching down | http://fenix.univ.rzeszow.pl/~mkepski/ds/uf.html |
| TSTv1 | Falls; two or more people walking | https://ieee-dataport.org/documents/tst-fall-detection-dataset-v1; https://www.tlc.dii.univpm.it/research/processing-of-rgbd-signals-for-the-analysis-of-activity-daily-life/Kinect-based-dataset-for-motion-analysis |
| TSTv2 | Fall forward ending up lying; fall backward ending up lying/sitting; fall lateral ending up lying; sitting on a chair; walking and picking an object from the floor; walking back and forth; lying down on the floor | https://ieee-dataport.org/documents/tst-fall-detection-dataset-v2; https://www.tlc.dii.univpm.it/research/processing-of-rgbd-signals-for-the-analysis-of-activity-daily-life/Kinect-based-dataset-for-motion-analysis |
| IRMTv1 | Fall forward when trying to sit on a chair, with and without rotation; fall lateral when trying to sit on a chair, without rotation; fall forward/backward when trying to sit on the floor, with and without rotation; fall forward/backward from standing, with and without rotation; fall lateral without rotation; fall forward when walking, with and without rotation; fall forward against the wall, without rotation; fall lateral from lying on a bed, without rotation; sitting down on a chair (also slowly); sitting down on the floor (also slowly); rising from a sitting position on a chair/the floor; rising from a lying position on the floor/a bed; rising from one’s knees/a crouching position; lying down on a bed/on the floor; kneeling/crouching down on the floor; standing still; bending to pick up an object from the floor; stretching; shifting from one foot to the other; resting against the wall; stumbling while walking with recovery | http://home.elka.pw.edu.pl/~pmazurek/ |
(*) the access date is 1 January 2023.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
