Applied Sciences
  • Article
  • Open Access

7 June 2023

A Robust and Automated Vision-Based Human Fall Detection System Using 3D Multi-Stream CNNs with an Image Fusion Technique

1 Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
2 Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
* Author to whom correspondence should be addressed.

Abstract

Unintentional human falls, particularly in older adults, can result in severe injuries and death, and negatively impact quality of life. The World Health Organization (WHO) states that falls are a significant public health issue and the primary cause of injury-related fatalities worldwide. Injuries resulting from falls, such as broken bones, trauma, and internal injuries, can have severe consequences and can lead to a loss of mobility and independence. To address this problem, strategies have been proposed to reduce the frequency of falls and thereby decrease healthcare costs and productivity loss. Vision-based fall detection approaches have proven effective at detecting falls in a timely manner, which can help to reduce fall-related injuries. This paper introduces an automated vision-based system for detecting falls and issuing instant alerts upon detection. The proposed system processes live footage from a monitoring surveillance camera by applying a fine-tuned human segmentation model and an image fusion technique as pre-processing steps and classifying sequences of live footage with a 3D multi-stream CNN model (4S-3DCNN). The system raises an alert when the monitored person is classified as Falling followed by Fallen. The effectiveness of the system was assessed using the publicly available Le2i dataset. System validation revealed an impressive result, achieving an accuracy of 99.44%, sensitivity of 99.12%, specificity of 99.12%, and precision of 99.59%. Based on the reported results, the presented system can be a valuable tool for detecting human falls, preventing fall injury complications, and reducing healthcare and productivity loss costs.

1. Introduction

Falls in older adults can have far-reaching consequences and can lead to severe injury or death if medical assistance is not obtained immediately. Human falls are usually unplanned and involve dropping from a higher level, such as a sitting or standing position, to a lower level on the ground. According to [], the World Health Organization (WHO) has described falls as the leading cause of trauma in elderly individuals, with approximately 30% of individuals aged over 65 years experiencing at least one trauma event annually []. A total of 47% of older adults who fall lose their independence and must rely on others for their daily activities []. These figures show that there is a need to assist older adults as soon as they fall, and therefore a need for technologies that can detect falls in this population so that help can be provided as quickly as possible. With the rapid development of video monitoring, surveillance, and communication technologies, it is becoming increasingly feasible to detect falls immediately after they occur.
There has been increased interest in fall detection, resulting in the development of many technologies that can be used to detect these events [,]. Acceleration and vibration sensors [,] are utilized to identify human motion, sound, and vibration [,,]. Most of the proposed methods cannot perform as expected due to several challenges, including noise that affects the function of acoustic sensors. Additionally, some strategies, such as floor-vibration sensing, are only possible if sensors are installed on the ground. Another major challenge is that deploying sensors over large areas can be extremely expensive. These challenges explain why smartwatches and smartphones, which carry equally effective sensors, have been widely adopted for this purpose.
Other models used in the detection of falls include those that gather data from video sequences []. This strategy includes the use of multiple [], single, omnidirectional [], and stereo-pair [] cameras. Since cameras can record detailed information about the subject’s mobility, this approach is considered useful for identifying whether a fall has occurred, and it is generally accepted that camera data are more comprehensive and insightful than those of conventional sensors. Moreover, vision-based fall detection may save lives and considerably reduce medical costs for the elderly.
Deep learning encompasses a wide range of methods, including hierarchical probabilistic models, artificial neural networks, and various algorithms for supervised and unsupervised feature learning []. In comparison to other machine learning techniques utilized in different fields in the past, deep learning approaches exhibit a superior performance [,]. Deep learning models are distinctive in that their multiple layers allow them to extract representations at different levels of abstraction, so they can process large amounts of complicated data and distill useful information from it. Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks [] are widely used deep learning models that outperform state-of-the-art (SOTA) techniques in visual processing, audio, natural language, and other sensor-based challenges. Deep learning has also delivered a considerable leap in computer vision applications, such as object detection, activity recognition [], semantic segmentation [], and motion tracking.
A video is a series of still images or frames played in rapid succession to give the impression of uninterrupted motion. CNNs find widespread use in areas such as video analysis, categorization, person identification, and posture estimation. However, incorporating the temporal dimension of videos can be challenging, a challenge that can be addressed in several ways. One approach is using 3D CNNs, which can capture the movement of objects in videos by applying a 3D filter to a sequence of frames used as input for the convolution process. This approach allows for the integration of temporal and spatial details in the same convolution operation.
One of the most important parts of object identification is identifying the kind of item present in an image or video and pinpointing its exact location with the help of a bounding box []. However, this method poses challenges that can be addressed through the use of deep learning techniques. Popular frameworks that are utilized include Deep Belief Networks, CNNs, and Recurrent Neural Networks (RNNs), which are commonly used in processing temporal signals.
Systems that rely on visual understanding need segmentation modules. Since the inception of the computer vision field, image segmentation has been a primary challenge concerned with separating images into several segments []. Segmentation denotes a method of allocating a label to every image pixel, with the understanding that pixels sharing a label generally belong to the same object. Several objects may be involved when segmenting an image, such as buildings, dogs, cars, or people. When image segmentation is accurate, understanding scenes becomes simpler [].
As a computer vision problem, human segmentation has been used in numerous applications, including understanding human movement, retexturing and classifying clothing, or identifying pedestrians. Human segmentation is a crucial phase of surveillance cameras’ functions before any recognition decision. However, this is often challenging because the human body shape, pose, environment, and clothing differ from one instance to the next. To successfully segment the human body from its background, the human segmentation approach needs the capability to define the boundaries of the human body and mask them.
The image segmentation task can broadly be handled using two methods. The first technique groups image pixels into segments based on their similarity; this similarity depends on pixel threshold values and can be attained in machine learning through clustering algorithms. The second technique uses discontinuities between image pixels: methods for detecting edges, points, and lines employ the discontinuity approach to obtain intermediate segments, and the final segmentation can be obtained by further processing these intermediate segments. To date, the algorithms designed to solve the image segmentation problem apply varying techniques based on neural networks, clustering, regions, edges, and thresholds. This work mainly solves the semantic segmentation problem using a region-based technique [].
In semantic segmentation, pictures are categorized pixel by pixel such that each pixel belongs to a unique class cluster []. With the advent of deep learning, semantic segmentation has become a pivotal area of computer vision and image processing; it has experienced major research efforts and diverse applications in various fields [].
Instance segmentation is a particular type of image segmentation that focuses on identifying objects’ instances and outlining their boundaries. It has broad practical use in various real-world situations, such as self-driving vehicles, medical imaging, and monitoring crops from above. When there are several objects of the same kind that need to be tracked individually, instance segmentation is useful.
Image fusion combines multiple images from different sources into a single image containing information from all of the input images. Image fusion aims to create a new image with improved features compared to the individual input images. This can include improved resolution, increased contrast, or reduced noise. The technique is used in many fields, including remote sensing, medical imaging, and surveillance.
Deep learning has been used for image fusion to improve the performance of the fusion process. The utilization of CNNs enables the extraction of characteristics from the input images, which can then be combined through a fusion process. This approach has been shown to improve image fusion performance, especially in cases in which there is significant variation in the input images.
Image fusion is a process that involves gathering significant details from multiple images to produce a reduced set of images, typically just one image. The resulting image is more precise and informative than any individual source image since it contains all the essential details. The primary objective of image fusion is to have fewer images with more information and to produce images that are easily interpretable by both humans and machines [,,]. When various images are combined into a single image to include relevant information, it is referred to as multisensory image fusion in computer vision []. An image created by fusing many images contains more information than any of the individual images used to create it [].
This paper proposes a new human fall detection system using four-stream 3D CNNs. The major contributions of the system are as follows. Each of the system’s streams is associated with one of the four stages of a human fall (standing or walking, falling, fallen, and at rest). Each stage consists of a series of frames with a certain orientation(s), and the sequences are organized into four groups based on these characteristics. Feeding frames to 3D convolutional neural networks phase by phase is a novel idea for detecting human falls. Multi-stream CNNs have been utilized in other studies, but only to represent frames in a variety of ways, not to distinguish between fall stages. The poor accuracy of those systems can be traced back to the fact that some earlier efforts simply fed the CNN a set number of frames without paying attention to the semantic information (the human fall phases) contained within them.
The remaining parts of this paper are organized into five main sections. Section 2 reviews previous studies in the field of human fall detection, analyzing different approaches to the problem and identifying gaps in knowledge. In Section 3, the proposed methodology for the human fall detection system is detailed, highlighting the key features and their functions. Section 4 presents an in-depth analysis of the conducted experiments that evaluate the efficacy of the proposed system. The results are discussed in Section 5, which compares the proposed method’s performance with that of other approaches. Finally, Section 6 concludes the paper, summarizing the main findings, limitations, and potential avenues for future research in related fields.

3. Our Proposed Method

In recent years, deep learning has gained significant attention as a powerful tool in various applications of image and video processing, including the challenging task of human fall detection. Figure 1 shows a block diagram of a human fall detection system that uses deep learning. Various sensors, including wearable smartwatches, body sensors, and mounted IP cameras, can be utilized to gather data about the patient []. The obtained data are conveyed to a local server for pre-processing, which may involve filtering out redundant information, consolidating frames, and eliminating noise. Subsequently, the pre-processed data are input into a deep learning model, such as a CNN. The system’s output could be standing, falling, fallen, or other activities []. In our proposed work, we aim to build upon our previous research [] to develop an advanced automated vision-based method for detecting human falls using multi-streams of 3D CNNs. The complex event of a human fall occurs in live or recorded video scenes and typically involves transitions from standing (or sitting) to falling to a resting position.
Figure 1. Example of a general block diagram of a human fall detection system.
A video-level CNN model was trained to capture appearance and motion information from surveillance recordings, allowing for the collection of a wide variety of elements from people’s everyday lives. Standard 2D convolutional networks focus only on feature extraction at the frame or picture level, excluding any consideration of dynamics over many frames. In contrast, 3D CNNs are proficient in extracting features from both spatial and temporal dimensions, making them well-suited for video data processing. Furthermore, multi-stream CNNs have emerged as a new trend for handling video data, and by leveraging the capabilities of 3D CNNs, the aim is to capture both spatial and temporal features from video scenes, which is essential for robust and accurate human fall detection.
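To make this distinction concrete, the short PyTorch sketch below (illustrative only, not the implementation used in this work) contrasts a 2D convolution, which sees one frame at a time, with a 3D convolution, whose kernel also spans the temporal axis of a 32-frame clip.

```python
import torch
import torch.nn as nn

# A clip of 32 grayscale frames of size 128 x 128:
# tensor layout is (batch, channels, time, height, width).
clip = torch.randn(1, 1, 32, 128, 128)

# A 2D convolution processes a single frame, so it captures no motion.
conv2d = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
frame_features = conv2d(clip[:, :, 0])   # shape: (1, 8, 128, 128)

# A 3D convolution slides a 3x3x3 kernel over time and space,
# so neighbouring frames contribute to every output feature.
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
clip_features = conv3d(clip)             # shape: (1, 8, 32, 128, 128)
```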
Building upon the aforementioned discussion, our proposed work created an advanced method for detecting human falls utilizing multi-stream 3D CNNs. This was accomplished by leveraging the 4S-3DCNN architecture [], which incorporated a 4-stream 3D CNN architecture capable of analyzing videos to capture both the spatial and temporal features of human fall actions.
Unlike our previous method [], this work employed the 4S-3DCNN model within a comprehensive fall video scenario, together with several other improvements. The current work incorporated human segmentation as a preliminary processing step aimed at improving the accuracy of the classification process, thereby resulting in more precise fall detection. In addition to human segmentation, the method also employed image fusion on the segmented images, utilizing a sequence of 32 frames instead of 16 frames. This expansion allowed for the capture of a broader scope of temporal data, further enhancing the accuracy of the method.
The proposed method, as shown in Figure 2, begins by capturing 32 consecutive frames from an input video, which corresponds to roughly one second of live video at a frame rate of 30 frames per second (fps), representing a fall action. These frames are then converted into grayscale images. Next, a fine-tuned deep learning model designed for human semantic segmentation [] is utilized to segment human appearance within the frames. An image fusion technique is then applied to the segmented images to generate four RGB pre-processed images. Details of the pre-processing procedure, which involves human segmentation and image fusion, are provided in Section 3.1 and Section 3.2, respectively.
These pre-processed images are then fed into the previously developed 4S-3DCNN model, which uses a 4-branch architecture wherein each branch learns features from one of the four fused images, each capturing a distinct combination of spatial and temporal information. This model classifies the 32 frames into one of the four classes it was trained on: Standing, Falling, Fallen, or Others. If a Falling classification is followed by a Fallen classification, the method raises an alert that a human fall has occurred. Section 3.3 provides details of classification and human fall alerts.
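The overall flow can be summarized in pseudocode form. The Python sketch below is a simplified outline; the helper functions (to_grayscale, segment_human, three_level_fusion, classify_4s3dcnn, raise_alert) are placeholders standing in for the components detailed in Sections 3.1–3.3 and are not part of any released implementation.

```python
def monitor_stream(frames, to_grayscale, segment_human,
                   three_level_fusion, classify_4s3dcnn, raise_alert):
    """Process a live stream in consecutive 32-frame windows: pre-process,
    classify, and alert when a Falling window is followed by a Fallen one."""
    window = []
    previous_label = None
    for frame in frames:
        window.append(frame)
        if len(window) < 32:
            continue                                   # wait for a full window
        gray = [to_grayscale(f) for f in window]       # grayscale conversion
        masks = [segment_human(g) for g in gray]       # Section 3.1
        fused = three_level_fusion(masks)              # Section 3.2 -> 4 images
        label = classify_4s3dcnn(fused)                # Standing/Falling/Fallen/Others
        if previous_label == "Falling" and label == "Fallen":
            raise_alert("Human fall detected")         # Section 3.3
        previous_label = label
        window = []                                    # next 32-frame window
```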

3.1. Human Segmentation

Human segmentation is a computer vision task that involves separating human figures or shapes from the background in images or videos. The goal is to accurately distinguish between foreground (human shapes) and background pixels in an image or video. This task has various applications, including surveillance systems, medical imaging, and video editing. Deep-learning-based approaches have achieved significant progress in human segmentation, with CNNs being the most used technique. The segmentation process is typically performed by classifying pixels as either belonging to the foreground or background using a trained neural network [].
Gruosso et al. [] proposed an approach for human segmentation in surveillance videos using deep learning techniques. The authors leveraged a combination of CNNs and Fully Convolutional Networks (FCNs) to extract features from the input frames and perform pixel-level segmentation. The proposed method uses an encoder–decoder neural network, developed from the SegNet [], to classify pixels and distinguish between foreground (human shapes) and background. Figure 3 shows the architecture of the human segmentation model. The authors also demonstrated the practicality of their approach by applying it to real-world surveillance footage and achieving accurate segmentation results.
Figure 3. The human segmentation CNN model.
Figure 2. The architecture of the proposed fall detection system.
Our approach involves utilizing the human segmentation model proposed by [] as the initial pre-processing step. This model was designed to effectively segment humans from irrelevant background objects, thus facilitating the accurate classification of human actions. By leveraging this model as a pre-processing step, our method aimed to improve the overall accuracy of human fall identification. In our method, the human segmentation starts by reading an original input image, as illustrated in Figure 4A. Then, it converts the input image into a grayscale image resized to 256 × 256 × 1, as shown in Figure 4B. The resulting image is then processed by a trained human segmentation CNN model, which produces an image segmented by class, as shown in Figure 4C. Morphological image analysis is then implemented to obtain the final segmentation of the human within the original image [], as shown in Figure 4D.
Figure 4. Human segmentation. (A) Original input image, (B) converting to grayscale, (C) applying the human segmentation CNN model, and (D) keeping the most sizable cluster of pixels and filling any holes.
To implement this approach, we fine-tuned and retrained the original model of [] on a manually selected and annotated set of images from the Le2i dataset. Each image was manually annotated with pixel-level labels, where each pixel was categorized as either a person pixel or a background pixel. During fine-tuning, we updated the original model’s input layer to accept grayscale images of size 256 × 256 × 1 and transformed the first layer’s 3-dimensional filters $f_{x,y,3}$ into 2-dimensional filters $f_{x,y}$ using the mapping $F: f_{x,y,3} \mapsto f_{x,y}$, so that:

$$ f_{x,y} = F(f_{x,y,3}) = \frac{1}{3} \sum_{i=1}^{3} f_{x,y,i}. $$
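As an illustration of this mapping, the NumPy sketch below averages the three input-channel slices of every first-layer filter. The assumed weight layout (height, width, input channels, number of filters) is only one possible convention and would need to match the framework actually used.

```python
import numpy as np

def rgb_filters_to_grayscale(weights_rgb):
    """Convert 3-channel first-layer filters to single-channel filters by
    averaging over the input-channel axis, i.e. f_xy = (1/3) * sum_i f_xyi.

    weights_rgb: array of shape (h, w, 3, num_filters)  -- assumed layout
    returns:     array of shape (h, w, 1, num_filters)
    """
    return weights_rgb.mean(axis=2, keepdims=True)

# Example with random values standing in for the pre-trained weights.
w_rgb = np.random.randn(3, 3, 3, 64)
w_gray = rgb_filters_to_grayscale(w_rgb)   # shape (3, 3, 1, 64)
```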
To ensure the reliability and generalizability of the fine-tuned pre-trained model, we employed the 3-fold cross-validation technique. Despite the high segmentation results that can be achieved through fine-tuning, it is common for specific pixels or groups of pixels to be classified incorrectly. This can result in some pixels being wrongly labeled as a person while they belong to the background, and vice versa. To fix this, post-processing is required, which entails filling in any missing pixels (holes) and removing small connected components of pixels while keeping the largest connected group of pixels [,].
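A possible post-processing step of this kind is sketched below using SciPy; it keeps only the largest connected component of the predicted person mask and fills any holes inside it, which is one straightforward way to realize the cleanup illustrated in Figure 4D (the exact operations used in the study may differ).

```python
import numpy as np
from scipy import ndimage

def clean_person_mask(mask):
    """Keep the largest connected component of a binary person mask and
    fill any holes inside it (cf. Figure 4D)."""
    mask = mask.astype(bool)
    labeled, num_components = ndimage.label(mask)
    if num_components == 0:
        return mask                                   # nothing was segmented
    # Size of each component (label 0 is the background and is skipped).
    sizes = ndimage.sum(mask, labeled, index=range(1, num_components + 1))
    largest = labeled == (np.argmax(sizes) + 1)
    return ndimage.binary_fill_holes(largest)
```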

3.2. Pre-Processing with Image Fusion

Image fusion combines multiple images from different sources into a single image containing information from all the input images. Image fusion aims to create a new image with improved features compared to the individual input images. This can include improved resolution, increased contrast, or reduced noise [,]. When two frames from a video sequence are fused, the aspects of motion that contribute to scene interpretation and categorization are brought to the forefront [].
The proposed method involves taking a sequence of 32 frames from a video, converting them into grayscale, segmenting them to isolate the human appearance, and resizing them to 128 × 128. The aim is to create four fused images that are then fed into the previously developed 4-branch 3D CNN model (4S-3DCNN), with each image being processed by a separate branch. To obtain these four fused images, a fusion process using three levels of image fusion is applied. Initially, every odd-numbered frame is fused with the following frame, resulting in 16 fused frames. The same fusion approach is then performed at the second and third levels, resulting in eight and then four fused, pre-processed images, respectively. Figure 5 provides an example of how this three-level image fusion process generates four fused images from the 32 frames that were grayscale-converted and human-appearance-segmented.
Figure 5. Three-level frame fusion to yield four fused images out of the 32 grayscale-converted and human-appearance-segmented frames.
The formulation for the three-level image fusion is as follows:
First-level fusion of the 32 frames:

$$ A_1 = \mathrm{fuse}(F_1 + F_2); \quad A_2 = \mathrm{fuse}(F_3 + F_4); \quad \ldots; \quad A_{16} = \mathrm{fuse}(F_{31} + F_{32}) $$

Second-level fusion of the resulting 16 images:

$$ B_1 = \mathrm{fuse}(A_1 + A_2); \quad B_2 = \mathrm{fuse}(A_3 + A_4); \quad \ldots; \quad B_8 = \mathrm{fuse}(A_{15} + A_{16}) $$

Third-level fusion of the resulting 8 images:

$$ C_1 = \mathrm{fuse}(B_1 + B_2); \quad C_2 = \mathrm{fuse}(B_3 + B_4); \quad C_3 = \mathrm{fuse}(B_5 + B_6); \quad C_4 = \mathrm{fuse}(B_7 + B_8) $$
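The three-level scheme amounts to repeated pairwise fusion, as in the sketch below. The fuse operator here is a simple pixel-wise average, used only as a placeholder since the text above does not prescribe a specific fusion rule.

```python
import numpy as np

def fuse(a, b):
    # Placeholder fusion operator: pixel-wise average of two frames.
    # The actual operator used by the method may differ.
    return (a.astype(np.float32) + b.astype(np.float32)) / 2.0

def three_level_fusion(frames):
    """Reduce 32 segmented grayscale frames to 4 fused images by fusing
    consecutive pairs three times: 32 -> 16 (A) -> 8 (B) -> 4 (C)."""
    assert len(frames) == 32
    level = list(frames)
    for _ in range(3):
        level = [fuse(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level   # the four fused images C1..C4
```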

3.3. Action Classification and Fall Alert

In the final step of the proposed method, the goal is to classify the action and detect if a human has fallen, as in Figure 2. To accomplish this, the pre-processed images that have undergone human segmentation and image fusion are fed into the previously developed 4S-3DCNN model that has a 4-branch architecture, as depicted in Figure 6. Each branch within this model handles one image of the four resulting fused images to learn features that capture different spatial and temporal information. The model then classifies every 32 frames of the input video into one of four classes: Standing, Falling, Fallen, or Others, based on the learned features. If there is a sequence of Falling followed by Fallen, the method detects a human fall situation and alerts accordingly.
Figure 6. Proposed 4S-3DCNN model’s architecture for human fall detection.
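The alert rule itself is a simple check on consecutive window-level predictions, as in the following sketch (illustrative only; the label strings and the notification callback are assumptions).

```python
def fall_alert(window_labels, notify):
    """Raise an alert the first time a window classified as 'Falling' is
    immediately followed by one classified as 'Fallen'."""
    previous = None
    for label in window_labels:
        if previous == "Falling" and label == "Fallen":
            notify("Human fall detected")
            return True
        previous = label
    return False

# Example: predictions for three consecutive 32-frame windows.
fall_alert(["Standing", "Falling", "Fallen"], notify=print)
```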

4. Experiments

The equipment utilized in conducting the experiments consisted of an Intel® Core™ i9-9900K central processing unit operating at a speed of 3.60 GHz and equipped with 64 GB of RAM, running a 64-bit Windows 10 operating system, together with an NVIDIA GeForce RTX 2080Ti graphics processing unit with 11 GB of memory. The experiments were implemented and run using the 64-bit edition of MATLAB R2022b.
The experiments conducted in this study involved assessing the effectiveness of the fine-tuned human segmentation model as well as evaluating the fall action classification of the full proposed method using the Le2i fall dataset. During the evaluations, we employed commonly used metrics to measure the performance, including accuracy, sensitivity, specificity, and precision. However, the performance of the human segmentation CNN model was assessed through various metrics, such as pixel global accuracy, mean recall, mean Intersection Over Union (IoU), and Weighted Intersection Over Union (wIoU). All these metrics were calculated using True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) concepts [], as illustrated in Table 1.
Table 1. Performance measurement metrics.
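For completeness, the four video-level metrics reduce to the usual confusion-matrix expressions; the sketch below simply restates those definitions (it is not code from the study).

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics computed from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)    # true-positive rate (fall videos)
    specificity = tn / (tn + fp)    # true-negative rate (non-fall videos)
    precision   = tp / (tp + fp)
    return accuracy, sensitivity, specificity, precision
```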

4.1. Database Description

The proposed study considered the Le2i fall detection dataset [], which comprises fall and normal videos. The normal category contains videos of normal daily activities, such as walking, standing, squatting, and sitting. The data were obtained from fixed cameras set up in a home, a coffee shop, a lecture hall, and an office. The actors wore different clothing in different scenes and simulated normal activities and falls to make the dataset as diverse as possible. The videos also feature shadows, occlusions, and varying levels of illumination, and the start and end frames of each fall were annotated manually.

4.2. Dataset Preparation and Augmentation

To fine-tune the human segmentation model, a dataset of 355 images was created by randomly selecting images from videos in the Le2i dataset. The images were manually labeled using MATLAB Image Labeler, where each pixel in the selected images was labeled as either Person or Background using pixel-wise labeling. To prevent overfitting during the training or fine-tuning of deep learning models, it is necessary to have a large number of training images. Therefore, three augmentation techniques were applied to this dataset []. The first two were image translation in the x-axis (horizontal) and y-axis (vertical) directions; the translation range was set at 10 pixels in both directions. The third augmentation technique was random rotation, which rotates every image by up to 10° clockwise or anticlockwise.
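A minimal version of such an augmentation step is sketched below with SciPy, applying a random translation of up to 10 pixels and a random rotation of up to 10° identically to an image and its pixel-wise label mask. Combining the transforms in one function is a simplification of the three separate techniques described above.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng()

def augment(image, mask):
    """Random +/-10 px translation and +/-10 degree rotation applied
    identically to an image and its pixel-wise label mask."""
    dy, dx = rng.integers(-10, 11, size=2)
    angle = rng.uniform(-10.0, 10.0)
    img = ndimage.shift(image, (dy, dx), order=1, mode='nearest')
    img = ndimage.rotate(img, angle, reshape=False, order=1, mode='nearest')
    # Nearest-neighbour interpolation keeps the mask labels binary.
    msk = ndimage.shift(mask, (dy, dx), order=0, mode='nearest')
    msk = ndimage.rotate(msk, angle, reshape=False, order=0, mode='nearest')
    return img, msk
```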
To evaluate the effectiveness of fall action reporting on the proposed method, the entire Le2i dataset was analyzed and labeled. The dataset comprised a total of 177 videos, with 120 videos containing fall action and 57 videos without any fall. This comprehensive dataset was used to assess the proposed method’s ability to accurately report fall actions, ensuring that the method’s performance was evaluated under diverse conditions. Table 2 shows the preparation of the needed datasets.
Table 2. Dataset preparation for the human segmentation model and fall action video classification.

4.3. Experimental Results

The fine-tuning of the human segmentation model was undertaken using a rigorous method that incorporated a three-fold cross-validation process to ensure the robustness and generalizability of results; in addition, data shuffling was utilized during training. Through this procedure, the model achieved remarkable metrics, including a pixel global accuracy of 98.46%, a mean sensitivity of 92.78%, a mean Intersection Over Union (IoU) of 83.98%, and a mean weighted IoU of 97.24%, as shown in Table 3.
Table 3. Results of the fine-tuning of the human segmentation model.
The implementation of three-fold cross-validation served to reduce overfitting by dividing the dataset into three distinct subsets. Each subset was used in turn as a validation set, while the remaining two subsets formed the training set. This ensured that the model’s performance was validated on different portions of the dataset, thus contributing to the generalizability of the model.
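The splitting procedure is equivalent to the following scikit-learn sketch (illustrative only; the random seed and index list are assumptions).

```python
from sklearn.model_selection import KFold

image_indices = list(range(355))            # the 355 annotated images
kfold = KFold(n_splits=3, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_indices), start=1):
    # Two folds are used for training, the held-out fold for validation.
    print(f"Fold {fold}: {len(train_idx)} training / {len(val_idx)} validation images")
```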
The resulting model was highly effective in distinguishing between a person and their surroundings. This effectiveness is further demonstrated by experimental examples of human segmentation using the fine-tuned [] model on each type of falling action as well as other actions (such as sitting), with post-processing, as illustrated in Figure 7. These examples highlight the model’s accuracy in segmenting human subjects, further supporting its effectiveness and reliability, as proven by the three-fold cross-validation process.
Figure 7. Examples of human segmentation using fine-tuning [] with post-processing.
On average, the segmentation network took 0.21 s to segment one image.
In the experiment with the entire proposed method for fall action classification, the prepared dataset included 120 videos, each containing fall action, and 57 videos without any falls. The method was applied to each video, and a binary result was reported based on the presence or absence of a fall situation. The classification results of the proposed method, as in Table 4, were highly accurate, achieving an accuracy of 99.44%, a sensitivity of 99.12%, a specificity of 99.12%, and a precision of 99.59%. The confusion matrix of the proposed method’s performance in classifying fall actions can be seen in Figure 8, providing a visual representation of the classification results. These impressive results indicate the effectiveness of the proposed method in accurately detecting fall actions in videos.
Table 4. Results of the proposed method for fall action classification.
Figure 8. Confusion matrix of the proposed method’s performance.

5. Discussion

In this section, we compare our proposed technique for human fall detection to the state of the art in terms of numerous metrics, such as accuracy, sensitivity, specificity, and precision. The comparison in Table 5 provides a comprehensive overview of the performance of various fall detection models. The accuracy of the models ranges from 78.50% (obtained by Vishnu et al. []) to 99.44%. The proposed method has the highest accuracy of 99.44%, which indicates that it has a high percentage of correctly classified fall and non-fall videos compared to other models.
Table 5. A comparison of our proposed method with other studies on the Le2i dataset.
Sensitivity is an important metric for fall detection, as it measures the ability of the model to correctly identify a fall event. The models have sensitivities ranging from 84.30% (obtained by Chamle et al. []) to 100.00%. The models by Zou et al. [] and Youssfi et al. [] have perfect sensitivities of 100.00%, indicating that they can classify all fall events correctly. Our proposed method achieved a sensitivity of 99.12%, which is above average in comparison to the other sensitivities of similar works.
Specificity measures the ability of the model to identify non-fall events correctly. The specificities of the compared works range from 64.29% (obtained by Poonsri et al. []) to 98.32%. Chamle et al. [] and Alaoui et al. [] also have specificities below 85%. The model proposed by Carneiro et al. [] demonstrates the highest specificity among the compared works, with a value of 98.32%. The specificity of our proposed method outperformed the others, as it achieved 99.12%.
Precision measures the proportion of predicted fall events that are correct. The models have precision values ranging from 79.40% (obtained by Chamle et al. []) to 99.59%. The proposed method has the highest precision of 99.59%, indicating that it can accurately predict fall events.
Overall, our proposed model has the highest accuracy, specificity, precision, and above-average sensitivity compared to other similar methods. These results demonstrate the effectiveness of our approach to human fall detection and its potential for real-world applications.

6. Conclusions

In conclusion, this paper presented an innovative automated vision-based system for detecting falls and issuing instant alerts upon detection. By utilizing a fine-tuned human segmentation model and an image fusion technique as pre-processing and classifying sets of live footage with a 3D multi-stream CNN model (4S-3DCNN), the system can effectively detect when a monitored human experiences a sequence of Falling followed by Fallen. By applying human segmentation pre-processing, the system isolates the human appearance in 32 input frames and uses three-level image fusion to create four fused images that capture movement features across the consecutive frames. The 4S-3DCNN architecture has four branches, each corresponding to one of the pre-processed images, and each branch is responsible for extracting and learning features from a different set of consecutive spatial and temporal information. The model classifies the 32 frames into one of four classes: Standing, Falling, Fallen, or Others. If there is a sequence reporting Falling followed by Fallen, the system raises an alert for a human fall situation. The system’s effectiveness was assessed using the publicly available Le2i dataset, achieving impressive accuracy, sensitivity, specificity, and precision values of 99.44%, 99.12%, 99.12%, and 99.59%, respectively. Therefore, this proposed system has the potential to be a valuable tool for detecting human falls, preventing fall injury complications, and reducing healthcare and productivity loss costs.
Despite its effectiveness in detecting human falls, the proposed method has some limitations. Firstly, it can only detect falls in scenes containing a single person, and it does not localize the person within the video. Additionally, the experiments conducted to evaluate the method’s performance were only conducted on the Le2i fall detection dataset, which may not represent real-world scenarios accurately. In future research, we aim to explore the performance of this method on various publicly available datasets, such as MCFD and URFD, to enhance its generalizability and effectiveness in real-world settings. Additionally, we will consider integrating comparisons with wearable sensor-based methods to provide a broader perspective on the problem.

Author Contributions

Conceptualization, methodology, software, investigation, resources, formal analysis, and validation, T.A., K.B. and G.M.; data curation, T.A. and K.B.; writing—original draft preparation, T.A.; writing—review and editing, supervision, project administration, and funding acquisition, G.M. All authors have read and agreed to the published version of the manuscript.

Funding

The work was funded by Researchers Supporting Project number (RSP2023R34), King Saud University, Riyadh, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Le2i dataset [] is used in the experiments. The dataset can be obtained from http://le2i.cnrs.fr (accessed on 26 March 2023).

Acknowledgments

The authors extend their appreciation to Researchers Supporting Project number (RSP2023R34), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Falls. 2021. Available online: https://www.who.int/news-room/fact-sheets/detail/falls (accessed on 10 October 2022).
  2. Alam, E.; Sufian, A.; Dutta, P.; Leo, M. Vision-based human fall detection systems using deep learning: A review. Comput. Biol. Med. 2022, 146, 105626. [Google Scholar] [CrossRef] [PubMed]
  3. Yu, M.; Rhuma, A.; Naqvi, S.M.; Wang, L.; Chambers, J. A posture recognition-based fall detection system for monitoring an elderly person in a smart home environment. IEEE Trans. Inf. Technol. Biomed. 2012, 16, 1274–1286. [Google Scholar] [CrossRef] [PubMed]
  4. W.H.O. WHO Global Report on Falls Prevention in Older Age; World Health Organization Ageing and Life Course Unit: Geneva, Switzerland, 2008. [Google Scholar]
  5. San-Segundo, R.; Echeverry-Correa, J.D.; Salamea, C.; Pardo, J.M. Human activity monitoring based on hidden Markov models using a smartphone. IEEE Instrum. Meas. Mag. 2016, 19, 27–31. [Google Scholar] [CrossRef]
  6. Baek, J.; Yun, B.-J. Posture monitoring system for context awareness in mobile computing. IEEE Trans. Instrum. Meas. 2010, 59, 1589–1599. [Google Scholar] [CrossRef]
  7. Tao, Y.; Hu, H. A novel sensing and data fusion system for 3-D arm motion tracking in telerehabilitation. IEEE Trans. Instrum. Meas. 2008, 57, 1029–1040. [Google Scholar]
  8. Mubashir, M.; Shao, L.; Seed, L. A survey on fall detection: Principles and approaches. Neurocomputing 2013, 100, 144–152. [Google Scholar] [CrossRef]
  9. Shieh, W.-Y.; Huang, J.-C. Falling-incident detection and throughput enhancement in a multi-camera video-surveillance system. Med. Eng. Phys. 2012, 34, 954–963. [Google Scholar] [CrossRef]
  10. Miaou, S.-G.; Sung, P.-H.; Huang, C.-Y. A Customized Human Fall Detection System Using Omni-Camera Images and Personal Information. In Proceedings of the 1st Transdisciplinary Conference on Distributed Diagnosis and Home Healthcare, Arlington, VA, USA, 2–4 April 2006. [Google Scholar]
  11. Jansen, B.; Deklerck, R. Context aware inactivity recognition for visual fall detection. In Proceedings of the Pervasive Health Conference and Workshops, Innsbruck, Austria, 29 November–1 December 2006. [Google Scholar]
  12. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Eftychios, P. Deep Learning for Computer Vision: A Brief Review. Comput. Intell. Neurosci. 2018, 2018, 13. [Google Scholar] [CrossRef]
  13. Islam, M.M.; Nooruddin, S.; Karray, F.; Muhammad, G. Human activity recognition using tools of convolutional neural networks: A state of the art review, data sets, challenges, and future prospects. Comput. Biol. Med. 2022, 149, 106060. [Google Scholar] [CrossRef]
  14. Muhammad, G.; Alshehri, F.; Karray, F.; El Saddik, A.; Alsulaiman, M.; Falk, T.H. A comprehensive survey on multimodal medical signals fusion for smart healthcare systems. Inf. Fusion 2021, 76, 355–375. [Google Scholar] [CrossRef]
  15. Islam, M.M.; Nooruddin, S.; Karray, F.; Muhammad, G. Multi-level feature fusion for multimodal human activity recognition in Internet of Healthcare Things. Inf. Fusion 2023, 94, 17–31. [Google Scholar] [CrossRef]
  16. Altaheri, H.; Muhammad, G.; Alsulaiman, M.; Amin, S.U.; Altuwaijri, G.A.; Abdul, W.; Bencherif, M.A.; Faisal, M. Deep learning techniques for classification of electroencephalogram (EEG) motor imagery (MI) signals: A review. Neural Comput. Appl. 2021, 35, 14681–14722. [Google Scholar] [CrossRef]
  17. Pathak, A.R.; Pandey, M.; Rautaray, S. Application of Deep Learning for Object Detection. Procedia Comput. Sci. 2018, 132, 1706–1717. [Google Scholar] [CrossRef]
  18. Szeliski, R. Computer Vision: Algorithms and Applications; Texts in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  19. Guo, Z.; Huang, Y.; Hu, X.; Wei, H.; Zhao, B. A Survey on Deep Learning-Based Approaches for Scene Understanding in Autonomous Driving. Electronics 2021, 10, 471. [Google Scholar]
  20. Li, F.-F.; Johnson, J.; Yeung, S. Detection and Segmentation. Lecture. 2011. Available online: http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture11.pdf (accessed on 10 March 2023).
  21. Liu, C.; Chen, L.-C.; Schroff, F.; Adam, H.; Hua, W.; Yuille, A.L.; Fei-Fei, L. Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 82–92. [Google Scholar]
  22. Kirillov, A.; He, K.; Girshick, R.; Rother, C.; Dollár, P. Panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9404–9413. [Google Scholar]
  23. Blasch, E.; Zheng, Y.; Liu, Z. Multispectral Image Fusion and Colorization; SPIE Press: Bellingham, WA, USA, 2018. [Google Scholar]
  24. Masud, M.; Gaba, G.S.; Choudhary, K.; Hossain, M.S.; Alhamid, M.F.; Muhammad, G. Lightweight and Anonymity-Preserving User Authentication Scheme for IoT-Based Healthcare. IEEE Internet Things J. 2021, 9, 2649–2656. [Google Scholar] [CrossRef]
  25. Muhammad, G.; Hossain, M.S. COVID-19 and Non-COVID-19 Classification using Multi-layers Fusion from Lung Ultrasound Images. Inf. Fusion 2021, 72, 80–88. [Google Scholar] [CrossRef]
  26. Haghighat, M.B.A.; Aghagolzadeh, A.; Seyedarabi, H. Multi-focus image fusion for visual sensor networks in DCT domain. Comput. Electr. Eng. 2011, 37, 789–797. [Google Scholar] [CrossRef]
  27. Haghighat, M.B.A.; Aghagolzadeh, A.; Seyedarabi, H. A non-reference image fusion metric based on mutual information of image features. Comput. Electr. Eng. 2011, 37, 744–756. [Google Scholar] [CrossRef]
  28. Trapasiya, S.; Parmar, R. A Comprehensive Survey of Various Approaches on Human Fall Detection for Elderly People. Wirel. Pers. Commun. 2022, 126, 1679–1703. [Google Scholar]
  29. Biroš, O.; Karchnak, J.; Šimšík, D.; Hošovský, A. Implementation of wearable sensors for fall detection into smart household. In Proceedings of the IEEE 12th International Symposium on Applied Machine Intelligence and Informatics (SAMI), Herl’any, Slovakia, 23–25 January 2014. [Google Scholar]
  30. Nafea, O.; Abdul, W.; Muhammad, G.; Alsulaiman, M. Sensor-Based Human Activity Recognition with Spatio-Temporal Deep Learning. Sensors 2021, 21, 2141. [Google Scholar] [CrossRef]
  31. Quadros, T.D.; Lazzaretti, A.E.; Schneider, F.K. A Movement Decomposition and Machine Learning-Based Fall Detection System Using Wrist Wearable Device. IEEE Sens. J. 2018, 18, 5082–5089. [Google Scholar] [CrossRef]
  32. Özdemir, A.T.; Barshan, B. Detecting Falls with Wearable Sensors Using Machine Learning Techniques. Sensors 2014, 14, 10691–10708. [Google Scholar] [CrossRef] [PubMed]
  33. Pernini, L.; Belli, A.; Palma, L.; Pierleoni, P.; Pellegrini, M.; Valenti, S. A High Reliability Wearable Device for Elderly Fall Detection. IEEE Sens. J. 2015, 15, 4544–4553. [Google Scholar]
  34. Yazar, A.; Erden, F.; Cetin, A.E. Multi-sensor ambient assisted living system for fall detection. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014. [Google Scholar]
  35. Santos, G.L.; Endo, P.T.; Monteiro, K.; Rocha, E.; Silva, I.; Lynn, T. Accelerometer-Based Human Fall Detection Using Convolutional Neural Networks. Sensors 2019, 19, 1644. [Google Scholar] [CrossRef]
  36. Islam, M.M.; Nooruddin, S.; Karray, F.; Muhammad, G. Internet of Things: Device Capabilities, Architectures, Protocols, and Smart Applications in Healthcare Domain. IEEE Internet Things J. 2023, 10, 3611–3641. [Google Scholar] [CrossRef]
  37. Alshehri, F.; Muhammad, G. A Comprehensive Survey of the Internet of Things (IoT) and AI-Based Smart Healthcare. IEEE Access 2021, 9, 3660–3678. [Google Scholar] [CrossRef]
  38. Chelli, A.; Pätzold, M. A Machine Learning Approach for Fall Detection and Daily Living Activity Recognition. IEEE Access 2019, 7, 38670–38687. [Google Scholar] [CrossRef]
  39. Muhammad, G.; Rahman, S.K.M.M.; Alelaiwi, A.; Alamri, A. Smart Health Solution Integrating IoT and Cloud: A Case Study of Voice Pathology Monitoring. IEEE Commun. Mag. 2017, 55, 69–73. [Google Scholar] [CrossRef]
  40. Muhammad, G.; Alhussein, M. Security, trust, and privacy for the Internet of vehicles: A deep learning approach. IEEE Consum. Electron. Mag. 2022, 6, 49–55. [Google Scholar] [CrossRef]
  41. Leone, A.; Diraco, G.; Siciliano, P. Detecting falls with 3D range camera in ambient assisted living applications: A preliminary study. Med. Eng. Phys. 2011, 33, 770–781. [Google Scholar] [CrossRef]
  42. Jokanovic, B.; Amin, M.; Ahmad, F. Radar fall motion detection using deep learning. In Proceedings of the IEEE Radar Conference (RadarConf16), Philadelphia, PA, USA, 2–6 May 2016. [Google Scholar]
  43. Amin, M.G.; Zhang, Y.D.; Ahmad, F.; Ho, K.D. Radar Signal Processing for Elderly Fall Detection: The future for in-home monitoring. IEEE Signal Process. Mag. 2016, 33, 71–80. [Google Scholar] [CrossRef]
  44. Yang, L.; Ren, Y.; Hu, H.; Tian, B. New Fast Fall Detection Method Based on Spatio-Temporal Context Tracking of Head by Using Depth Images. Sensors 2015, 15, 23004–23019. [Google Scholar] [CrossRef] [PubMed]
  45. Ma, X.; Wang, H.; Xue, B.; Zhou, M.; Ji, B.; Li, Y. Depth-Based Human Fall Detection via Shape Features and Improved Extreme Learning Machine. IEEE J. Biomed. Health Inform. 2014, 18, 1915–1922. [Google Scholar] [CrossRef]
  46. Angal, Y.; Jagtap, A. Fall detection system for older adults. In Proceedings of the IEEE International Conference on Advances in Electronics, Communication and Computer Technology (ICAECCT), Pune, India, 2–3 December 2016. [Google Scholar]
  47. Stone, E.E.; Skubic, M. Fall Detection in Homes of Older Adults Using the Microsoft Kinect. IEEE J. Biomed. Health Inform. 2015, 19, 290–301. [Google Scholar] [CrossRef] [PubMed]
  48. Yang, L.; Ren, Y.; Zhang, W. 3D depth image analysis for indoor fall detection of elderly people. Digit. Commun. Netw. 2016, 2, 24–34. [Google Scholar] [CrossRef]
  49. Adhikari, K.; Bouchachia, A.; Nait-Charif, H. Activity recognition for indoor fall detection using convolutional neural network. In Proceedings of the Fifteenth IAPR International Conference on Machine Vision Applications (MVA), Nagoya, Japan, 8–12 May 2017. [Google Scholar]
  50. Fan, K.; Wang, P.; Zhuang, S. Human fall detection using slow feature analysis. Multimed. Tools Appl. 2019, 78, 9101–9128. [Google Scholar] [CrossRef]
  51. Xu, H.; Leixian, S.; Zhang, Q.; Cao, G. Fall Behavior Recognition Based on Deep Learning and Image Processing. Int. J. Mob. Comput. Multimed. Commun. 2018, 9, 1–15. [Google Scholar] [CrossRef]
  52. Bian, Z.-P.; Hou, J.; Chau, L.-P.; Magnenat-Thalmann, N. Fall Detection Based on Body Part Tracking Using a Depth Camera. IEEE J. Biomed. Health Inform. 2015, 19, 430–439. [Google Scholar] [CrossRef]
  53. Wang, S.; Chen, L.; Zhou, Z.; Sun, X.; Dong, J. Human Fall Detection in Surveillance Video Based on PCANet. Multimed. Tools Appl. 2016, 75, 11603–11613. [Google Scholar] [CrossRef]
  54. Benezeth, Y.; Emile, B.; Laurent, H.; Rosenberger, C. Vision-Based System for Human Detection and Tracking in Indoor Environment. Int. J. Soc. Robot. 2009, 2, 41–52. [Google Scholar] [CrossRef]
  55. Liu, H.; Zuo, C. An Improved Algorithm of Automatic Fall Detection. AASRI Procedia 2012, 1, 353–358. [Google Scholar] [CrossRef]
  56. Lu, K.-L.; Chu, E.T.-H. An Image-Based Fall Detection System for the Elderly. Appl. Sci. 2018, 8, 1995. [Google Scholar] [CrossRef]
  57. Debard, G.; Karsmakers, P.; Deschodt, M.; Vlaeyen, E.; Bergh, J.; Dejaeger, E.; Milisen, K.; Goedemé, T.; Tuytelaars, T.; Vanrumste, B. Camera Based Fall Detection Using Multiple Features Validated with Real Life Video. In Proceedings of the Workshop 7th International Conference on Intelligent Environments, Nottingham, UK, 25–28 July 2011. [Google Scholar]
  58. Shawe-Taylor, J.; Sun, S. Kernel Methods and Support Vector Machines. Acad. Press Libr. Signal Process. 2014, 1, 857–881. [Google Scholar]
  59. Shawe-Taylor, J.; Cristianini, N. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods 22; Cambridge University Press: London, UK, 2001. [Google Scholar]
  60. Muaz, M.; Ali, S.; Fatima, A.; Idrees, F.; Nazar, N. Human Fall Detection. In Proceedings of the 16th International Multi Topic Conference, INMIC, Lahore, Pakistan, 19–20 December 2013. [Google Scholar]
  61. Leite, G.; Silva, G.; Pedrini, H. Three-Stream Convolutional Neural Network for Human Fall Detection. In Deep Learning Applications 2; Springer: Singapore, 2020; pp. 49–80. [Google Scholar]
  62. Zou, S.; Min, W.; Liu, L.; Wang, Q.A.Z.X. Movement Tube Detection Network Integrating 3D CNN and Object Detection Framework to Detect Fall. Electronics 2021, 10, 898. [Google Scholar] [CrossRef]
  63. Charfi, I.; Mitéran, J.; Dubois, J.; Atri, M.; Tourki, R. Optimised spatio-temporal descriptors for real-time fall detection: Comparison of SVM and Adaboost based classification. J. Electron. Imaging 2013, 22, 17. [Google Scholar] [CrossRef]
  64. Lu, N.; Wu, Y.; Feng, L.; Song, J. Deep Learning for Fall Detection: Three-Dimensional CNN Combined with LSTM on Video Kinematic Data. IEEE J. Biomed. Health Inform. 2019, 23, 314–323. [Google Scholar] [CrossRef] [PubMed]
  65. Min, W.; Cui, H.; Rao, H.; Li, Z.; Yao, L. Detection of Human Falls on Furniture Using Scene Analysis Based on Deep Learning and Activity Characteristics. IEEE Access 2018, 6, 9324–9335. [Google Scholar] [CrossRef]
  66. Kong, Y.; Huang, J.; Huang, S.; Wei, Z.; Wang, S. Learning Spatiotemporal Representations for Human Fall Detection in Surveillance Video. J. Vis. Commun. Image Represent. 2019, 59, 215–230. [Google Scholar] [CrossRef]
  67. Taramasco, C.; Rodenas, T.; Martinez, F.; Fuentes, P.; Munoz, R.; Olivares, R.; De Albuquerque, V.H.; Demongeot, J. A Novel Monitoring System for Fall Detection in Older People. IEEE Access 2018, 6, 43563–43574. [Google Scholar] [CrossRef]
  68. Nogas, J.; Khan, S.S.; Mihailidis, A. DeepFall: Non-Invasive Fall Detection with Deep Spatio-Temporal Convolutional Autoencoders. J. Healthc. Inform. Res. 2020, 4, 50–70. [Google Scholar]
  69. Gu, C.; Sun, C.; Ross, D.A.; Vondrick, C.; Pantofaru, C.; Li, Y.; Vijayanarasimhan, S.; Toderici, G.; Ricco, S.; Sukthankar, R.; et al. Ava: A video dataset of spatio-temporally localized atomic visual actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  70. Peng, X.; Schmid, C. Multi-region Two-Stream R-CNN for Action Detection. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016. [Google Scholar]
  71. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 28, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  72. Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  73. Fan, Y.; Levine, M.D.; Wen, G.; Qiu, S. A deep neural network for real-time detection of falling humans in naturally occurring scenes. Neurocomputing 2017, 260, 43–58. [Google Scholar] [CrossRef]
  74. Núñez-Marcos, A.; Azkune, G.; Arganda-Carreras, I. Vision-Based Fall Detection with Convolutional Neural Networks. Wirel. Commun. Mob. Comput. 2017, 2017, 1–16. [Google Scholar] [CrossRef]
  75. Hsieh, Y.Z.; Jeng, Y.-L. Development of Home Intelligent Fall Detection IoT System Based on Feedback Optical Flow Convolutional Neural Network. IEEE Access 2018, 6, 6048–6057. [Google Scholar] [CrossRef]
  76. Carneiro, S.A.; da Silva, G.P.; Leite, G.V.; Moreno, R.; Guimarães, S.J.F.; Pedrini, H. Multi-Stream Deep Convolutional Network Using High-Level Features Applied to Fall Detection in Video Sequences. In Proceedings of the International Conference on Systems, Signals and Image Processing (IWSSIP), Osijek, Croatia, 5–7 June 2019. [Google Scholar]
  77. Leite, G.; Silva, G.; Pedrini, H. Fall Detection in Video Sequences Based on a Three-Stream Convolutional Neural Network. In Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019. [Google Scholar]
  78. Menacho, C.; Ordoñez, J. Fall detection based on CNN models implemented on a mobile robot. In Proceedings of the 17th International Conference on Ubiquitous Robots (UR), Kyoto, Japan, 22–26 June 2020. [Google Scholar]
  79. Chhetri, S.; Alsadoon, A.; Al-Dala, T.; Prasad, P.W.C.; Rashid, T.A.; Maag, A. Deep learning for vision-based fall detection system: Enhanced optical dynamic flow. Comput. Intell. 2020, 37, 578–595. [Google Scholar] [CrossRef]
  80. Vishnu, C.; Datla, R.; Roy, D.; Babu, S.; Mohan, C.K. Human Fall Detection in Surveillance Videos Using Fall Motion Vector Modeling. IEEE Sens. J. 2021, 21, 17162–17170. [Google Scholar] [CrossRef]
  81. Berlin, S.J.; John, M. Vision based human fall detection with Siamese convolutional neural networks. J. Ambient Intell. Humaniz. Comput. 2022, 13, 5751–5762. [Google Scholar] [CrossRef]
  82. Alanazi, T.; Muhammad, G. Human Fall Detection Using 3D Multi-Stream Convolutional Neural Networks with Fusion. Diagnostics 2022, 12, 20. [Google Scholar] [CrossRef]
  83. Gruosso, M.; Capece, N.; Erra, U. Human segmentation in surveillance video with deep learning. Multimed. Tools Appl. 2021, 80, 1175–1199. [Google Scholar] [CrossRef]
  84. Soille, P. Morphological Image Analysis: Principles and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  85. Gonzalez, R.C.; Woods, R.E. Digital Image Processing; Pearson Education Limited: London, UK, 2018. [Google Scholar]
  86. Musallam, Y.K.; Al Fassam, N.I.; Muhammad, G.; Amin, S.U.; Alsulaiman, M.; Abdul, W.; Altaheri, H.; Bencherif, M.A.; Algabri, M. Electroencephalography-based motor imagery classification using temporal convolutional network fusion. Biomed. Signal Process. Control 2021, 69, 102826. [Google Scholar] [CrossRef]
  87. Chamle, M.; Gunale, K.G.; Warhade, K.K. Automated unusual event detection in video surveillance. In Proceedings of the International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016. [Google Scholar]
  88. Alaoui, A.Y.; El Hassouny, A.; Thami, R.O.H.; Tairi, H. Human Fall Detection Using Von Mises Distribution and Motion Vectors of Interest Points. Assoc. Comput. Mach. 2017, 82, 5. [Google Scholar]
  89. Poonsri, A.; Chiracharit, W. Improvement of fall detection using consecutive-frame voting. In Proceedings of the International Workshop on Advanced Image Technology (IWAIT), Chiang Mai, Thailand, 7–9 January 2018. [Google Scholar]
  90. Alaoui, A.Y.; Tabii, Y.; Thami, R.O.H.; Daoudi, M.; Berretti, S.; Pala, P. Fall Detection of Elderly People Using the Manifold of Positive Semidefinite Matrices. J. Imaging 2021, 7, 109. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
