The risk of falling is a major concern, not only for older adults in aging societies, but also for younger adults with drug and alcohol problems and for patients with chronic diseases. Falls have serious health consequences and are a leading cause of death. According to a survey by the World Health Organization [1], falls occur most frequently among older adults: they affect 28–35% of individuals between the ages of 65 and 70 and 42% of those over 70. Falling is particularly dangerous for people who live alone indoors, because much time can pass before they receive assistance. Accordingly, many countries are adopting policies to increase life expectancy by providing extra care to people living independently.
For this reason, much research has focused on developing robust fall-detection processes for smart home systems using specialized technologies. Progress in developing such intelligent technologies holds the promise of improving the quality of life for the aged and infirm. Popular fall detection and prediction systems are based on wearable monitoring devices, ambient devices, vision-based devices, and portable devices. The most commonly used wearable devices feature accelerometers and gyroscopes embedded in belts, watches, or pendants that are reasonably comfortable to wear. However, some of these devices have disadvantages that limit their usability, such as excessive power consumption. Some produce false alarms when triggered by normal body movements, and some require the manual activation of an alarm after a fall. In addition, the elderly often forget to wear the devices. Nevertheless, wearable devices based on machine learning and intelligent systems are effective for monitoring people in both indoor and outdoor environments. Various interesting approaches to fall detection using wearable devices are discussed in [2], in which accelerometers and gyroscopes are employed to collect data on the emotional state and body movements of subjects.
The processing of data collected by ambient devices provides information without demanding user intervention. Commercially available devices include presence sensors, motion sensors, and vibration sensors, which can be embedded in an indoor environment, such as on furniture. Presence sensors can detect the tiniest movements of subjects, such as the movement of a finger, with high resolution and precision. They can easily be mounted on high ceilings, as well as the floor, to detect the smallest movement. Compared to presence sensors, motion sensors are useful for perceiving the arm movements involved in walking within detection zones. Such zones are placed in high-traffic areas to detect moving objects in busy indoor or outdoor settings. Vibration sensors are useful for detecting fall events and can distinguish activities through their vibration signatures. Vibration sensors installed on floors or beds acquire vibration data that can be analyzed to determine the fall detection ratio [6]. Although such sensors do not disturb the people involved, they can generate false alarms.
In recent years, the development of smartphones incorporating accelerometers has enabled some very interesting approaches to healthcare monitoring, including fall detection. Mobile phone-based fall detection algorithms have been presented in [7]. However, these devices are not useful if the monitored subject does not carry the phone at all times.
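A common baseline for accelerometer-based detection, whether on a phone or a wearable, is to flag an impact spike in total acceleration followed by a near-stationary period. The sketch below is a minimal illustration of that idea; the thresholds and window length are illustrative assumptions, not values taken from [7].

```python
import math

def detect_fall(samples, impact_g=2.5, rest_g=1.1, rest_window=10):
    """Flag a fall when total acceleration spikes above `impact_g` (in g)
    and is followed by `rest_window` near-stationary (~1 g) samples."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    for i, m in enumerate(mags):
        if m >= impact_g:
            window = mags[i + 1:i + 1 + rest_window]
            if len(window) == rest_window and all(v <= rest_g for v in window):
                return True
    return False

# Synthetic trace (units of g): normal motion, an impact spike, lying still.
normal = [(0.1, 0.2, 1.0)] * 20
impact = [(2.0, 1.5, 2.0)]           # magnitude ~3.2 g
still  = [(0.0, 0.05, 1.0)] * 15     # ~1 g while lying down
print(detect_fall(normal + impact + still))  # True for this trace
```

Such threshold rules are simple but prone to the false alarms noted above, which is one motivation for the learned classifiers discussed later.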
For these reasons, video monitoring systems based on computer vision and machine learning are potentially more beneficial and reliable for fall detection. Vision-based devices utilizing cameras share some limitations with ambient devices, such as the need to install devices in several places to provide full coverage of the required areas over a long period of time. However, video surveillance systems can effectively predict specific human activities, such as walking, sitting down, going to bed, and getting up from bed [11], as well as detect fall events [13]. Moreover, a large amount of visual information is captured in the video record of a falling event. The various definitions of falling events, as well as the reasons and circumstances behind such events and other abnormal occurrences in real-world environments, should be analyzed visually in monitoring systems. The vision-based detection of abnormal or fall events could become an important tool in elderly care systems.
In developing better vision-based video monitoring systems, abnormal event detection holds great promise, but faces many challenges in real-world environments. Establishing a fall detection system is difficult because the dynamic conditions involved in falls are not well understood. Researchers must therefore investigate the universal features common to all falling events, and falls must be differentiated from normal activities in a comprehensive way that reflects realistic falling scenarios. Any system developed must be robust to changing postures and positions, such as a loss of balance or abrupt changes in direction, and these conditions must be considered to assess abnormal events reliably. Nevertheless, research trends and current best practices indicate that the prospects are great for using vision-based monitoring to improve the quality of health care.
Therefore, we propose a vision-based system for fall detection that differentiates abnormal behavior or falls from normal states. The system combines simple feature extraction with effective statistical analysis. Key to detecting abnormal or falling events is the recognition that they involve a loss of balance. The three research objectives of the proposed system are as follows:
Develop a monitoring system that provides a visual understanding of a person’s situation and can judge whether the state is abnormal or normal based on video data acquired using a simple and affordable RGB camera;
Develop an individualized and modified statistical analysis of each of the extracted features, providing trustworthy information, not only on the exact moment of a fall, but also on the duration of a fall;
Develop an efficient way of using a Hidden Markov Model (HMM) for the detailed detection of sequential abnormal and normal states for the person being monitored.
Our proposed system is intended to provide long-term monitoring that facilitates independent living. The major steps in the system are (1) feature extraction to estimate the position and posture of the person, utilizing the virtual grounding point (VGP) concept and related visual features inspired by our previous research [13]; (2) modified statistical analysis to estimate the time interval or period of falls and normal states through extensive feature observation; and (3) the establishment of a Hidden Markov Model (HMM) to detect the sequential normal and fall states of the person. The rest of the paper is organized as follows: Section 2 presents related works; Section 3 presents the theoretical analysis and methodologies of the proposed system; Section 4 presents and evaluates experimental results, comparing the robustness and limitations of the proposed system with existing algorithms; and finally, Section 5 presents the conclusions of this work.
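Step (3) above, decoding a sequence of normal and fall states from per-frame observations, can be sketched as a minimal two-state HMM decoded with the Viterbi algorithm. All probabilities and the coarse "upright"/"on_ground" observation labels below are illustrative placeholders, not the parameters of the proposed system.

```python
import math

# Two hidden states for the monitored person, as in step (3) above.
STATES = ("normal", "fall")
START  = {"normal": 0.99, "fall": 0.01}
TRANS  = {"normal": {"normal": 0.95, "fall": 0.05},
          "fall":   {"normal": 0.10, "fall": 0.90}}
# Observations: a hypothetical discretization of posture features.
EMIT   = {"normal": {"upright": 0.8, "on_ground": 0.2},
          "fall":   {"upright": 0.1, "on_ground": 0.9}}

def viterbi(obs):
    """Most likely normal/fall state sequence (log-space Viterbi)."""
    v = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            col[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi(["upright", "upright", "on_ground", "on_ground", "on_ground"]))
# → ['normal', 'normal', 'fall', 'fall', 'fall']
```

The self-transition probabilities encode the persistence of each state, which is what lets an HMM smooth over single-frame misclassifications.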
2. Related Works
This section discusses related state-of-the-art fall detection systems. To effectively define the postures assumed by a human subject, it is very important to perform feature extraction or selection, analyze the selected features, and set detection rules in a vision-based video monitoring system, especially for health care monitoring. The most common feature extraction methods used in fall detection systems involve the human shape and motion history images. The fall detection system presented in [14] is based on motion history images (MHI) and changes in the human shape. In this system, information on the history of motion occurring over a fixed interval is obtained from the MHI. The human shape is then obtained by constructing a blob using an approximate ellipse. Finally, fall detection is achieved by considering three factors: motion quantification, analysis of the human shape, and the lack of motion after a fall. Motion quantification allows sudden motion to be detected when a person falls. The ellipse approximating the subject provides information about changes in the human shape, more precisely, changes in orientation. The final analysis tracks the ellipse, which indicates whether the person is moving after a fall; the decision confirming a fall is made when the ellipse stops moving for five seconds. The video sequences are captured using wall-mounted cameras to cover wide areas, and the results are presented as 2D motion and shape information. Extended work on this system [15] involves both 2D and 3D information for fall detection, as the researchers intended to recover the localization of the person relative to the ground. This extended feature extraction process computes the 3D head trajectory of the person, and a fall is detected if the velocity of the head exceeds a certain value and the position of the head is too close to the floor.
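The MHI update used in such systems records, for each pixel, how recently motion occurred there. The following numpy sketch shows the standard update rule; the decay-by-one scheme and the duration value `tau` are illustrative choices, not parameters from [14].

```python
import numpy as np

def update_mhi(mhi, motion_mask, tau=30):
    """One Motion History Image update: pixels moving in the current frame
    are set to `tau`; all other pixels decay by 1 per frame toward zero."""
    return np.where(motion_mask, tau, np.maximum(mhi - 1, 0))

# Toy 4x4 example: motion sweeps from the left column to the right column.
mhi = np.zeros((4, 4), dtype=np.int32)
left = np.zeros((4, 4), dtype=bool);  left[:, 0] = True
right = np.zeros((4, 4), dtype=bool); right[:, 3] = True
mhi = update_mhi(mhi, left)    # left column stamped with 30
mhi = update_mhi(mhi, right)   # left decays to 29, right column gets 30
print(mhi[0])  # [29  0  0 30]
```

Because recent motion has higher values, a simple sum or maximum over the MHI quantifies how much sudden motion occurred, which is the basis of the motion quantification step described above.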
Similarly, a process of detecting unnatural falls has been developed [16], which includes background subtraction using the frame difference method, feature extraction using the MHI and changes in the human shape, and classification using a support vector machine (SVM). Compared with the previous approach [15], this system constructs three specific features of the human shape, namely, the orientation of the approximated ellipse, its aspect ratio, and silhouette detection below a threshold line. The aspect ratio of the ellipse describes changes in the major and minor axes, and differentiates a fall from normal activities. After a fall, the previously moving subject lies on the ground; for this reason, the threshold line is set at a suitable height above the ground, and fall and non-fall objects can then be differentiated according to the height of the silhouette. Finally, classification is performed using the SVM, k-nearest neighbor (KNN) classifier, Stochastic Gradient Descent (SGD), Decision Tree (DT), and Gradient Boosting (GB). The DT algorithm consistently outperforms the rest, with a high detection rate, as confirmed using the Le2i fall detection dataset.
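The ellipse orientation and aspect ratio described above can be approximated from the second-order moments of the binary silhouette. The numpy sketch below is an illustration of that idea, not the exact procedure of [16]; the synthetic standing and lying silhouettes are our own toy inputs.

```python
import numpy as np

def ellipse_features(silhouette):
    """Orientation (degrees, folded to [0, 90]) and axis ratio of the
    ellipse approximating a binary silhouette, from central moments."""
    ys, xs = np.nonzero(silhouette)
    cov = np.cov(np.stack([xs, ys]).astype(float))
    evals, evecs = np.linalg.eigh(cov)              # eigenvalues ascending
    vx, vy = evecs[:, 1]                            # major-axis direction
    angle = np.degrees(np.arctan2(vy, vx)) % 180.0  # fold to [0, 180)
    angle = min(angle, 180.0 - angle)               # axis, not direction
    ratio = np.sqrt(evals[1] / evals[0])            # major/minor axis ratio
    return angle, ratio

standing = np.zeros((40, 40), np.uint8); standing[5:35, 18:23] = 1
lying    = np.zeros((40, 40), np.uint8); lying[18:23, 5:35] = 1
print(ellipse_features(standing))  # (~90 deg, elongated)
print(ellipse_features(lying))     # (~0 deg, elongated)
```

A standing person yields a near-vertical, elongated ellipse, while a person on the ground yields a near-horizontal one, which is exactly the contrast the orientation and aspect-ratio features exploit.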
Another fall detection system for elderly care is proposed in [17]. This system first conducts background subtraction to segment the moving silhouette, and then tracks the object to determine its trajectory. Second, a timed motion history image (tMHI) is constructed to detect high velocities. The motion is then quantified by summing the pixel values of the tMHI and dividing by the number of pixels in the detected silhouette. Similar to the approach presented in [16], this system provides a useful definition of the human body posture using the following combined features as input: the aspect ratio, orientation, and major and minor semi-axes of the fitted ellipse. Both the MHI and a projection histogram are applied to confirm that a falling event has occurred. In addition, the position of the head is tracked across sequential frames to obtain useful information, since the trajectory of the head is visible most of the time. Finally, a multilayer perceptron (MLP) neural network is applied to the extracted features to classify falls and non-falls, with an accuracy of 99.24% and a precision of 99.60%, as confirmed using the UR fall detection dataset. In addition, an extensive automated human fall-recognition process is proposed in [18] to support independent living for the elderly in indoor environments using the Le2i fall-detection dataset. This system is based on motion, orientation, and histogram features, and achieves an overall accuracy of 99.82%. The approaches in [14] discussed above focus on distinguishing falls from normal activities.
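The tMHI-based motion quantification mentioned above can be sketched as follows: moving pixels are stamped with the current timestamp, old entries are cleared, and the fraction of recently moving pixels relative to the silhouette size gives a motion coefficient. The update rule and the window/duration values below are a simplified illustration, not the exact formulation of [17].

```python
import numpy as np

def update_tmhi(tmhi, motion_mask, timestamp, duration=1.0):
    """tMHI update: moving pixels stamped with the current time (seconds);
    entries older than `duration` seconds are cleared."""
    tmhi = np.where(motion_mask, timestamp, tmhi)
    tmhi[tmhi < timestamp - duration] = 0.0
    return tmhi

def motion_coefficient(tmhi, silhouette, timestamp, window=0.5):
    """Number of pixels that moved within the last `window` seconds,
    divided by the silhouette pixel count."""
    recent = tmhi >= (timestamp - window)
    return recent.sum() / max(silhouette.sum(), 1)

# Toy example: half of a 4-pixel silhouette moves at t = 2.0 s.
sil = np.zeros((4, 4), bool); sil[1:3, 1:3] = True    # 4-pixel silhouette
mask = np.zeros((4, 4), bool); mask[1, 1:3] = True    # 2 pixels moved
tmhi = update_tmhi(np.zeros((4, 4)), mask, timestamp=2.0)
print(motion_coefficient(tmhi, sil, timestamp=2.0))   # 0.5
```

A sudden fall produces a brief, large motion coefficient followed by values near zero, which is the signature such systems threshold on.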
However, any monitoring system should take into account consecutive daily activities in a real-world environment, including walking, standing, and sitting, as well as the transitions between these activities. In this regard, our previous work [19] proposed human action analysis based on motion history and the orientation of the human shape. The main purpose of that system was to predict the degree of mobility of the elderly in daily activities, such as getting into and out of bed. These activities comprise the following consecutive actions: sitting, transitioning from sitting to lying, lying, transitioning from lying to sitting, transitioning from sitting to standing, and walking. We first conducted background subtraction [20] for the proper separation of foreground and background. Among the features used in this system are the tMHI and the orientation of the approximated ellipse. However, constructing one ellipse is not enough to detect the human object region. As a key part of our design, two approximated ellipses are constructed using horizontal and vertical histogram values: the vertical histogram is employed for the whole body region, and the horizontal histogram is employed for the upper body region. Motion quantification is then used to analyze these human activities. To consider detailed sequential actions, multiple threshold values are observed, which depend on the shape orientation and the coefficient of motion. Rather than using fall scenes, the experiments were conducted using the normal scenes in 14 videos of the Le2i dataset. We subsequently extended our system [22] to analyze not only normal activities, but also falls. In this system, the virtual grounding point (VGP) is introduced for feature extraction, and analyses are performed on the combined features of the tMHI. The overall accuracy of the proposed system for the detection of falling events was 82.18%, as experimentally confirmed using 15 videos from the Le2i fall detection dataset. To improve the accuracy of detecting falls in a given video sequence, we extended our research [13] by incorporating feature selection using the VGP concept with its related features, statistical analysis for estimating the falling period, and two classification methods, namely the support vector machine (SVM) and period detection (PD). A comparison with existing approaches using the Le2i dataset showed that our SVM approach outperforms the rest, with a precision of 93%, a recall of 100%, and an accuracy of 100%. However, our previous system concentrated exclusively on differentiating falling events from normal events; in other words, on classifying videos as containing abnormal or normal events. A long-term monitoring system for home care requires the effective detection of sequential normal and abnormal states.
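The projection-histogram step underlying the two-ellipse construction can be sketched as follows: row and column sums of the binary silhouette give the vertical extent of the whole body and, by splitting the occupied rows, a candidate upper-body region. This is a simplified numpy reading of the idea; the half-height split for the upper body is our illustrative choice, not the exact rule of [19].

```python
import numpy as np

def projection_regions(silhouette):
    """Whole-body and upper-body bounding regions (top, bottom, left, right)
    derived from projection histograms of a binary silhouette."""
    v_hist = silhouette.sum(axis=1)   # pixels per row    -> extent in y
    h_hist = silhouette.sum(axis=0)   # pixels per column -> extent in x
    rows = np.nonzero(v_hist)[0]
    cols = np.nonzero(h_hist)[0]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    whole = (top, bottom, left, right)
    # Upper body: the top half of the occupied rows (illustrative split).
    upper = (top, top + (bottom - top) // 2, left, right)
    return whole, upper

sil = np.zeros((40, 20), np.uint8)
sil[10:30, 8:12] = 1                     # toy standing figure
whole, upper = projection_regions(sil)
print(whole, upper)                      # (10, 29, 8, 11) (10, 19, 8, 11)
```

An ellipse fitted to each region then yields the orientation features used to separate the consecutive actions listed above.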
Another vision-based fall detection system using a convolutional neural network has been developed by Adrian [23]. This system was designed to work on human motion, avoiding any dependence on the appearance of the image. An optical flow image generator is utilized to represent human motion efficiently. A set of optical flow images is stacked and then used as input to a convolutional neural network (CNN) that can learn longer time-related features. A fully connected neural network (FCNN) receives these features as inputs and produces fall and non-fall outputs. The best overall accuracy for this system is 97%, as confirmed using the Le2i fall-detection dataset. Moreover, the approach in [24] proposes fall detection based on body keypoints and a sequence-to-sequence architecture. In this system, a skeleton framework is modeled to receive a sequence of observed frames, and the coordinates of the keypoints of the object are extracted from the observed frames. The bounding boxes of the detected object are given to the tracking algorithm for clustering body keypoints belonging to the same person across video sequences. A keypoint-vectorization method is exploited to extract salient features from the associated coordinates. Next, the pose prediction phase is conducted, predicting the vectors of future keypoints for the person. Finally, falls are classified using the Le2i dataset, achieving an accuracy of 97%, a precision of 90.8%, a recall of 98.3%, and an F1-score of 0.944.
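The stacking step in the optical-flow CNN approach amounts to concatenating consecutive two-channel flow fields along the channel axis, so the network sees a short temporal window in one tensor. The sketch below illustrates only this tensor construction; the stack size, resolution, and randomly generated flow fields are hypothetical, not values from [23].

```python
import numpy as np

def stack_flows(flows):
    """Stack consecutive optical-flow fields (each H x W x 2, channels =
    horizontal and vertical displacement) into one CNN input tensor of
    shape (H, W, 2 * len(flows))."""
    return np.concatenate(flows, axis=-1)

# Hypothetical flow fields for 10 consecutive frame pairs of a 64x64 clip.
flows = [np.random.randn(64, 64, 2).astype(np.float32) for _ in range(10)]
x = stack_flows(flows)
print(x.shape)  # (64, 64, 20)
```

Because the channels encode displacement rather than appearance, a classifier trained on such stacks depends on how the person moves rather than on clothing, lighting, or background.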
A related approach is presented in [25]: an automatic fall-detection system with an RGB-D camera using a Hidden Markov Model (HMM). Background subtraction is first performed by averaging the depth map to learn the background. The factory-calibrated Kinect optical parameters are employed to obtain a real-world coordinate system, and the Open Natural Interaction (OpenNI) framework is used for the Kinect-to-real-world transformation. The center of mass of the person is then extracted to calculate the vertical speed at that point, and the standard deviation of all points belonging to the person is calculated. These three features are used as inputs to compute the HMM probabilities. Finally, the forward-backward and Viterbi algorithms are applied to classify the states of normal activities and falls. The experiments were conducted on young, healthy subjects, and occlusions were not included; in future research, the system will be tested on real-life unhealthy subjects with occlusions. In addition, a fall detection system for shop floor applications [26] was modeled using an HMM based on the vertical velocity, area variance, and height of a person, using cameras positioned to provide a top view. This incident detection method focuses on two things: detecting people in restricted areas and detecting falls. Potential fall events are analyzed based on specific features and on circumstances such as whether the person can get up or not. The analysis also assigns a status according to the location: whether the event occurs in an area that is restricted to all personnel, where work is ongoing, or where maintenance is being performed. Such considerations are valuable in identifying health and safety issues. In future work, this system will be improved by incorporating more incident types, such as collisions.
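The forward pass used in such HMM classifiers maintains, frame by frame, the probability of each hidden state given the observations so far. The sketch below uses a scaled forward recursion over two states; the probabilities and the discretization of vertical speed into "slow"/"fast_drop" symbols are illustrative assumptions, not the parameters of [25] or [26].

```python
# Scaled forward algorithm: running estimate of P(state_t | obs_1..t).
states  = ("normal", "fall")
start_p = {"normal": 0.99, "fall": 0.01}
trans_p = {"normal": {"normal": 0.95, "fall": 0.05},
           "fall":   {"normal": 0.10, "fall": 0.90}}
emit_p  = {"normal": {"slow": 0.9, "fast_drop": 0.1},
           "fall":   {"slow": 0.3, "fast_drop": 0.7}}

def forward_filter(obs):
    """Normalized forward probabilities P(state_t | obs_1..t) per frame."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    history = []
    for t in range(len(obs)):
        if t > 0:
            alpha = {s: emit_p[s][obs[t]] *
                        sum(alpha[p] * trans_p[p][s] for p in states)
                     for s in states}
        z = sum(alpha.values())
        alpha = {s: alpha[s] / z for s in states}  # rescale (no underflow)
        history.append(alpha)
    return history

probs = forward_filter(["slow", "slow", "fast_drop", "fast_drop"])
print(round(probs[-1]["fall"], 3))  # fall probability after two fast drops
```

The fall probability starts near zero and rises sharply once successive high-speed observations arrive, which is how such systems distinguish a genuine fall from a single noisy frame.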
Vision-based monitoring systems for detecting falls or abnormal events can be powerful tools in various applications, and these new technologies have great potential as intelligent monitoring systems. Therefore, we propose a detection system for indoor environments that covers normal activities as well as the consecutive states of abnormal or fall events, based on image processing techniques and a Hidden Markov Model. This detection system will be well suited to elderly care monitoring.