Article

Automated Remote Detection of Falls Using Direct Reconstruction of Optical Flow Principal Motion Parameters

1 GATE Institute, Sofia University, 1164 Sofia, Bulgaria
2 Stichting Epilepsie Instellingen Nederland (SEIN), Achterweg 5, 2103 SW Heemstede, The Netherlands
3 Image Sciences Institute, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands
4 Faculty of Mathematics and Informatics, Sofia University, St. Kliment Ohridski, 1164 Sofia, Bulgaria
* Author to whom correspondence should be addressed.
Sensors 2025, 25(18), 5678; https://doi.org/10.3390/s25185678
Submission received: 29 July 2025 / Revised: 2 September 2025 / Accepted: 9 September 2025 / Published: 11 September 2025

Highlights

What are the main findings?
  • Falls can be reliably detected using optical flow video processing algorithms.
  • Real-time performance is enhanced by direct reconstruction of principal motion parameters.
What is the implication of the main finding?
  • The proposed algorithm allows for modular integration into existing patient care observation systems.
  • It provides non-obstructive, maintenance-free, and privacy-respecting tools for safety.

Abstract

Detecting and alerting for falls is a crucial component of both healthcare and assistive technologies. Wearable devices are vulnerable to damage and require regular inspection and maintenance. Manned video surveillance avoids these problems, but it involves constant labor-intensive attention and, in most cases, may interfere with the privacy of the observed individuals. To address this issue, in this work we introduce and evaluate a novel approach for fully automated fall detection. The presented technique uses direct reconstruction of principal motion parameters, avoiding the computationally expensive full optical flow reconstruction and still providing relevant descriptors for accurate detections. Our method is systematically compared with state-of-the-art techniques. Comparisons of detection accuracy, computational efficiency, and suitability for real-time applications are presented. Experimental results demonstrate notable improvements in accuracy while maintaining a lower computational cost compared to traditional methods, making our approach highly adaptable for real-world deployment. The findings highlight the robustness and universality of our model, suggesting its potential for integration into broader surveillance technologies. Future directions for development will include optimization for resource-constrained environments and deep learning enhancements to refine detection precision.

1. Introduction

1.1. Motivation

Detection of falls is an increasingly important task in the modern world. Approximately one-third of adults aged 65 and above in the European Union experience a fall annually [1]. Falls are dangerous and often lead to serious injuries and health complications. In many cases, falling can be fatal [2]. Falls frequently require medical attention, and people often need to be hospitalized [2]. This adds up to an annual treatment cost of around €25 billion, which is a substantial toll [3]. Naturally, developing alerting and prevention strategies, protocols, methods, and technologies is an important endeavor that helps reduce the negative outcomes and costs related to falls in the elderly population, as well as in specific groups of vulnerable individuals such as epileptic patients.
There are different approaches for automated fall detection that are used in practice. Frequently used devices include wearable sensors and ambient devices. Wearable sensors, which are the most accurate [4,5,6], are widely used. They offer round-the-clock monitoring and instant response in the event of an emergency. Wearable sensors are affordable and can be customized for the individual using them. Ambient sensors [7] also offer substantial benefits: they require no compliance from the user, provide broad area coverage, and are usable continuously. In the current work, we use a visual sensor (camera) for data collection, specifically for the purpose of fall detection. Compared to wearables, camera-based systems offer several advantages. They can successfully identify falls in any individual present within the monitored space. Wearable devices presuppose that their user is already at risk of falling; camera-based systems make no such assumption, thus broadening their field of application. Camera infrastructure is more robust and, once installed, requires little support and/or maintenance. Wearable devices need to be charged and placed properly. They may also compromise comfort and privacy and have a stigmatizing effect on the individual wearing them. Remote sensors, such as cameras, can monitor and alert without such obstructive effects. Compared to ambient devices, cameras are less expensive and require less maintenance while offering comparable benefits in terms of detection accuracy. There are several disadvantages of using cameras that are worth mentioning. They may be less accurate in certain scenarios due to unfavorable lighting conditions. Obstructions and limited coverage may further impact detection capability. Personal privacy is also affected when working with video data. These are important points that need to be accounted for when selecting a sensor for the task.
Our method for fall detection works by processing video data with Optical Flow (OF) [8,9,10,11] techniques. Such methods are widely used in different computer vision tasks, and their application to fall detection can offer various benefits. OF methods allow us to capture motion without the need for physical markers or wearable devices [12]. If certain conditions are met, OF methods allow for real-time analysis of movement [13]. To work properly, Optical Flow methods only require simple and cheap hardware, such as a basic USB camera connected to a personal computer [14]. OF techniques have been shown to perform well in dynamic environments, i.e., complex scenes with multiple moving objects [15]. Finally, Optical Flow methods can be combined with machine learning algorithms [16] in order to better solve different computer vision tasks.

1.2. Related Works

Fall detection using camera-based systems is an increasingly important topic in the fields of computer vision and digital healthcare. A lot of work is constantly being performed in the field. Many different approaches are available; we refer the reader to the relevant literature [4,17]. Here we will go over only those methods that use Optical Flow evaluation in their workflow. In [18], the motion vector field is calculated for each pair of consecutive frames. The mean value of the vertical component of the vector field for each frame pair is evaluated. From it, features such as the maximum vertical velocity and acceleration are derived, and together with the maximum amplitude of the recorded sound, a feature vector is defined. An SVM [19] classifier is trained on a large dataset, and subsequently the classifier is evaluated on new data. For this work, both video and audio data are used. The method proposed in [20] utilizes a mixture of background subtraction [21], standard Optical Flow techniques, and Kalman filtering [22] to detect falling events. A feature vector that consists of the “angle”, the “ratio” (defined as the width-to-height relationship of a bounding rectangle), and the “ratio derivative” (defined as the velocity with which the silhouette changes) is used with a k-Nearest Neighbor [23] classifier to decide whether the observed movement event is a fall. In [24], OF is calculated, and various features of points of interest are derived. These are then used by a Convolutional Neural Network [25] for classification of falling events. The system also includes a two-step rule-based motion detection component that handles large, sudden movements and applies rule-based mechanisms when variations in optical flow exceed a threshold.
In the current paper, Optical Flow is calculated by a novel technique developed in our group [9]. This OF method provides substantial benefits when compared to traditional OF algorithms. GLORIA is a group-parameter optical flow reconstruction method known for its computational efficiency and speed. In the following sections, we show how to utilize the benefits of this method in solving the fall detection problem. We would like to point out that the method for fall detection we are presenting has been designed and tested on indoor data with a single moving person in frame but can certainly be applied in open-space applications as well. We would also note that, even though we analyze video data, our method is set up in a way that respects the privacy of the individuals being filmed. This is a specific property of the GLORIA algorithm: it extracts only principal motion information from a video. Any other structure is lost, and the results of the technique cannot be used for any other purposes (unethical or otherwise). This is a central theme in computer vision tasks. We have discussed the sensitivity of video data in more detail in a previous work of ours [13]. Our technique has low computational requirements. It can run in real time on low- to mid-range CPUs. This is a significant advantage when compared to other contemporary methods, which require a dedicated GPU for their detection workflow.

1.3. Organization

This paper is organized in the following way: our novel fall detection method is introduced in the following section. We present its description, the machine learning methods that are used for classification, the evaluation metrics, and the video datasets on which our algorithm is evaluated. Subsequently, we present our results on said datasets and provide comments about the method based on accuracy, speed, and other conditions such as camera placement. In the final section of the paper, we go over further comments, directions for future improvements, limitations, and the properties of our approach that make it stand out.

2. Materials and Methods

Our method consists of three steps. First, the GLORIA technique extracts motion features from a subsection of a video. We use it to calculate principal motion information (two translations, two shears, dilatation, and rotation) and organize these data into a lower-dimensional structure, a 6 × 150 array. These data are then used as input to our CNN. The network classifies events based on that input, and subsequently an LSTM allows us to better capture short-term dynamics (a key property of these types of methods). The final step increases detection accuracy. We go over each step in more detail in the following subsections, starting with a brief description of the GLORIA technique.

2.1. GLORIA Net

Description of our current method begins by introducing the OF problem in Equation (1). We refer to an individual pixel in a color image frame as $L_c(x, y, t)$. Here, we denote with $x, y, t$ the spatial coordinates and time, and $c$ is an index for the color channel (RGB). If we assume that all temporal changes in the image content arise solely from scene deformation, and we define the local velocity vector field as $\mathbf{v}(x, y, t)$, then the corresponding image transformation can be expressed as:

$$\frac{dL_c}{dt} = -\nabla L_c \cdot \mathbf{v} \equiv -\hat{v} L_c \tag{1}$$
In Equation (1), $\hat{v}$ denotes the vector field operator, $c = 1{:}N_c$ is the current color channel, $N_c$ is the total number of color channels, and $t$ is the time. The velocity field can account for a wide variety of objects’ basic motions, such as rotations, dilatations, translations, etc. For the current method, however, we do not need to calculate the velocity vector field for each point, as we can directly reconstruct the global features of the optic flow by considering only specific aggregated values associated with it. Specifically, we aim to identify the global two-dimensional linear non-homogeneous transformations, which comprise rotations, translations, dilatations, and shear transformations that characterize fall movements. For this purpose, we use the GLORIA algorithm. The vector field operator from Equation (1) takes the following form:

$$\hat{v} \equiv \mathbf{v} \cdot \nabla \equiv \sum_k v_k \partial_k; \qquad \partial_k L_c \equiv \frac{\partial L_c}{\partial x_k} \tag{2}$$
The representation in Equation (2) is used for the decomposition of the vector field operator $\hat{v}$ into a group of several known transformations. In Equation (3), we denote by $\hat{v}_u$ the transformation generators within the group and by $A_u$ their corresponding parameters.

$$\hat{v} \equiv \sum_u A_u \hat{v}_u \tag{3}$$
With Equation (3), we introduce a set of differential operators for the group of transformations that form a Lie algebra:
$$\hat{G}_u \equiv \sum_k v^u_k \partial_k \tag{4}$$
We can apply Equation (4) to the group of six general linear non-homogeneous transformations in two-dimensional images to derive the global motion operators:
$$\hat{G}_{\mathrm{translation}\,x} = \partial_x; \quad \hat{G}_{\mathrm{translation}\,y} = \partial_y; \quad \hat{G}_{\mathrm{dilation}} = x\,\partial_x + y\,\partial_y; \quad \hat{G}_{\mathrm{rotation}} = y\,\partial_x - x\,\partial_y; \quad \hat{G}_{\mathrm{shear}\,1} = x\,\partial_y; \quad \hat{G}_{\mathrm{shear}\,2} = y\,\partial_x \tag{5}$$
The aggregated values $A_u$ related to the translations, shears, rotation, and dilatation can be expressed through the structural tensor $S_{kj}$ and the driving vector field $H_k$:

$$H_k = -\sum_c \frac{dL_c(x, t)}{dt}\,\partial_k L_c, \qquad S_{kj} = \sum_c \partial_k L_c\,\partial_j L_c; \quad j, k = 1, 2, \qquad A_u = \sum_v \left(S^{-1}\right)_{uv} H_v \tag{6}$$
The GLORIA algorithm allows us to skip pointwise OF calculation and instead obtain a 6-element mean-flow vector for each pair of consecutive frames. This is very advantageous, as it speeds up the process significantly while preserving a high degree of accuracy. Another strength of this method is that it works regardless of camera position. It is also important to note that GLORIA can work with spectral data (in our case, the color images that make up the video). As seen from Equation (1), all spectral channels contribute to the solution of the inverse problem. This problem is, in general, underdetermined, especially when only intensity images are used, because the local velocities in the directions of constant intensity can be arbitrary. This degeneracy is less likely to occur in multichannel images.
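To make the reconstruction concrete, the following MATLAB sketch shows how the six group parameters could be estimated from two consecutive color frames by a least-squares projection onto the generators of Equation (5). It is an illustration only, not the authors' released implementation; the function name gloriaParams, the frame-difference estimate of $dL_c/dt$, and the centering of the coordinate grid at the image center are assumptions made for this example.

```matlab
function A = gloriaParams(frame1, frame2)
% Illustrative reconstruction of the six global motion parameters
% (translation x/y, dilatation, rotation, shear 1/2) from two consecutive
% color frames, via least squares over the generators of Equation (5).
    L1 = double(frame1) / 255;   L2 = double(frame2) / 255;
    dLdt = L2 - L1;                              % temporal derivative estimate
    [h, w, nc] = size(L1);
    [x, y] = meshgrid(1:w, 1:h);
    x = x - mean(x(:));  y = y - mean(y(:));     % assumed origin: image center
    S = zeros(6);  H = zeros(6, 1);
    for c = 1:nc                                 % all spectral channels contribute
        [Lx, Ly] = gradient(L1(:, :, c));        % spatial gradients
        G = [Lx(:), Ly(:), ...                   % translations
             x(:).*Lx(:) + y(:).*Ly(:), ...      % dilatation
             y(:).*Lx(:) - x(:).*Ly(:), ...      % rotation
             x(:).*Ly(:), y(:).*Lx(:)];          % shears
        S = S + G' * G;                          % structural tensor
        H = H - G' * reshape(dLdt(:, :, c), [], 1);  % driving vector
    end
    A = S \ H;                                   % group parameters A_u
end
```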
We examine the videos by splitting them into N = 150 frame subdivisions. For each subsection, we have a 6 × 150 array. We use our datasets to train a Convolutional Neural Network (CNN) that works with our GLORIA vectors and subsequently classify whether new video data contains a fall event or not.
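Assuming the gloriaParams sketch above, assembling the 6 × 150 input array for one window could look roughly as follows; the file name and the use of MATLAB's VideoReader are illustrative assumptions, not part of the authors' pipeline.

```matlab
% Build one 6-by-150 GLORIA feature array from a 150-frame window
% ('example_clip.avi' is a hypothetical file name).
v = VideoReader('example_clip.avi');
N = 150;                                        % frames per analysis window
features = zeros(6, N);
prev = readFrame(v);
for k = 1:N
    if ~hasFrame(v), break; end
    curr = readFrame(v);
    features(:, k) = gloriaParams(prev, curr);  % 6-element mean-flow vector per frame pair
    prev = curr;
end
% 'features' is the 6-by-150 array passed to the CNN classifier described below.
```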
The CNN uses a 1D-like architecture by applying 2D convolutional layers, allowing it to capture patterns across the width while preserving the vertical structure. The network has an input layer tailored to arrays of size 6 × 150. It uses three convolutional blocks to extract features. The first two blocks each have a convolutional layer, batch normalization, a ReLU activation function, and a max pooling step. These layers extract different features and reduce the width of the feature maps. The last convolutional block is used to refine finer details and features and is optional. Afterwards, the feature maps are flattened into a one-dimensional vector, passed through a dense layer with 64 neurons, and regularized with a dropout step to prevent overfitting. The final fully connected layer has two neurons that are used for binary classification, followed by a softmax and a classification layer to output probabilities and compute the cross-entropy loss. The model is trained using an Adam optimizer with a learning rate of 0.001, mini-batches of size 32, for up to 16 epochs. Validation is performed regularly using test data, and L2 regularization (0.001) is applied to improve generalization. This architecture is specially designed for our type of structured temporal data. Our main idea with this approach is that a fall is a very fast and abrupt event that will affect all six elements of the flow vector. The CNN helps us to additionally refine and classify features that reflect the fall. A diagram of the network’s layers is presented in Figure 1, and an illustrative layer listing is given below:
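The following MATLAB (Deep Learning Toolbox) sketch reflects the layout described above and complements Figure 1. The learning rate, mini-batch size, epoch count, L2 regularization, dense-layer width, and two-class output follow the text; the filter counts, kernel sizes, pooling factors, and dropout rate are assumptions made for illustration.

```matlab
layers = [
    imageInputLayer([6 150 1], 'Normalization', 'none')   % 6 x 150 GLORIA array
    % Block 1: convolution, batch normalization, ReLU, max pooling (reduces width only)
    convolution2dLayer([3 5], 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([1 2], 'Stride', [1 2])
    % Block 2
    convolution2dLayer([3 5], 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([1 2], 'Stride', [1 2])
    % Optional third block refining finer features
    convolution2dLayer([3 3], 64, 'Padding', 'same')
    reluLayer
    % Classification head
    fullyConnectedLayer(64)
    dropoutLayer(0.5)
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

options = trainingOptions('adam', ...
    'InitialLearnRate', 1e-3, ...
    'MiniBatchSize', 32, ...
    'MaxEpochs', 16, ...
    'L2Regularization', 1e-3, ...
    'ValidationData', {XVal, YVal}, ...   % held-out GLORIA arrays and labels
    'Shuffle', 'every-epoch');

% XTrain: 6 x 150 x 1 x numWindows numeric array; YTrain: categorical fall / no-fall labels
net = trainNetwork(XTrain, YTrain, layers, options);
```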

2.2. LSTM

The GLORIA algorithm generates a sequence of 6 × 1 vectors, each quantifying the total motion between consecutive video frames, thereby encoding inherent temporal dependencies. To classify videos for fall detection, a model capable of processing this sequential data while preserving historical context was essential. Traditional recurrent neural networks (RNNs) are fundamentally limited in maintaining long-term contextual information. Consequently, we adopted Long Short-Term Memory (LSTM) networks [26], a robust extension designed to overcome these limitations through its specialized gating mechanism. Despite their prevalent use in natural language processing, LSTMs have demonstrated strong performance in time-series analysis, suggesting their utility for our classification problem. Our solution employs a hybrid CNN-LSTM architecture to leverage the respective strengths of both models. Their specific properties allow them to capture short-term dynamics like sudden movements and long-term tendencies like slow postural movements. This makes them very effective for temporal data such as human activity or in our current case—fall detection. Convolutional Neural Network (CNN) layers are first utilized to extract local patterns and refine the raw motion vector representation. The output from these CNN layers is then fed into LSTM layers, which process the refined vectors sequentially to identify meaningful temporal features. This integrated approach effectively combines the spatial feature extraction capabilities of CNNs with the temporal modeling power of LSTMs, resulting in enhanced classification accuracy for fall detection. This LSTM approach allows us to additionally increase the accuracy of our method. Figure 2 shows where the Bidirectional LSTM block fits in our CNN layer structure:
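The sketch below illustrates one way such a CNN-LSTM hybrid could be assembled in MATLAB, with a 1-D convolution refining the per-frame motion vectors before a bidirectional LSTM summarizes the sequence. The layer sizes, kernel length, and dropout rate are illustrative assumptions rather than the exact configuration used in this work.

```matlab
% Illustrative CNN-LSTM hybrid for sequences of 6-element GLORIA vectors.
layersHybrid = [
    sequenceInputLayer(6)                           % one 6-element vector per frame pair
    convolution1dLayer(5, 32, 'Padding', 'causal')  % local pattern extraction over time
    reluLayer
    layerNormalizationLayer
    bilstmLayer(64, 'OutputMode', 'last')           % keep only the final summary state
    dropoutLayer(0.3)
    fullyConnectedLayer(2)                          % fall / no-fall
    softmaxLayer
    classificationLayer];

% XSeq: cell array of 6-by-T sequences; YSeq: categorical labels.
optionsSeq = trainingOptions('adam', 'InitialLearnRate', 1e-3, ...
    'MiniBatchSize', 32, 'MaxEpochs', 16);
netHybrid = trainNetwork(XSeq, YSeq, layersHybrid, optionsSeq);
```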

2.3. Datasets

We have used a total of three datasets for method training and evaluation in this work. All of them are public and freely available for download. We have decided to use widely popular, freely available resources, as we believe this is a good way to benchmark our method’s capabilities.

2.3.1. UP Fall Detection Dataset

This dataset [27] contains 1118 videos in total, recorded at 18 frames per second. There are two camera positions from which falls are recorded. Videos are short, with an average duration of roughly 16 s. The camera is placed horizontally with respect to the floor. The frame size is 640 × 480, and the recorded video is in color. The dataset is labeled by activity, i.e., the motions the recorded person is performing: jumping, lying on the ground, sitting in a chair, walking, squatting, falling, and standing still upright.

2.3.2. LE2I Video Dataset

This dataset [28] contains 191 videos recorded at 25 frames per second. There are numerous camera positions, including camera placement in one of the upper corners of the room. This means that sometimes, depending on the fall direction and camera placement, the velocities of the person related to the fall will not have a significant y-component. The frame resolution is 320 × 240. Here the videos are labeled by marking during which frames a fall occurs.

2.3.3. UR Fall Detection Dataset

This dataset [29] contains 70 videos recorded at 30 frames per second. As in the previous dataset, camera positions are numerous. Kinect cameras are used to record fall events. The videos are in color. We use this dataset mainly as an out-of-distribution set to further evaluate the performance of the GLORIA Net method.

2.4. Evaluation Parameters

In the current work, we are interested in binary classification. In order to evaluate the performance of our models, we introduce a number of statistical values, derived from the confusion matrix. This is a 2 × 2 matrix that is used to compare ground-truth labels to labels predicted from our models. It has the following entries:
  • True positives (TP)—number of times we have correctly predicted an event as positive (e.g., our model predicts a fall, and a fall occurred indeed);
  • False positives (FP)—number of times we have incorrectly predicted an event as positive (Type I error);
  • True negative (TN)—number of times we have correctly predicted an event as negative (e.g., our model predicts no fall, and a fall did not occur);
  • False negatives (FN)—number of times we have incorrectly predicted an event as negative (Type II error).
Using these values, we can define the following measures:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Accuracy is a useful evaluation parameter as it shows us the overall proportion of correctly predicted events. Another measure we use in the article is the Sensitivity:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
This measure indicates how often the model correctly classifies positive cases. Similarly, the Specificity shows us how good the model’s prediction is for negative cases:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
Precision shows us the rate of predicted positives that are classified correctly:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
The F1-score is a good example for a single parameter that can evaluate false positives and false negatives at the same time:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$
In order to demonstrate the performance of our model graphically, we will use Receiver Operating Characteristic (ROC) curves. They plot the Sensitivity (True Positive Rate) against the False Positive Rate (1 − Specificity) at different thresholds. They are useful because the Area Under the Curve (AUC) describes how effective a model is: the closer the AUC is to 1, the better the model, while an AUC value close to 0.5 indicates random guessing. The measures introduced so far are dimensionless and are typically reported as percentages.
Used together, all the measures described so far can give a very detailed evaluation of the performance of our predictive model.
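For reference, a small MATLAB helper computing these measures from predicted and true labels could look like the following. The positive-class name 'fall' and the use of perfcurve (Statistics and Machine Learning Toolbox) for the ROC/AUC are assumptions made for this illustration.

```matlab
function m = fallMetrics(yTrue, yPred, scores)
% Compute the evaluation measures of Section 2.4 from categorical labels.
% 'fall' is assumed to be the name of the positive class.
    pos = 'fall';
    TP = sum(yPred == pos & yTrue == pos);   % true positives
    FP = sum(yPred == pos & yTrue ~= pos);   % false positives (Type I error)
    TN = sum(yPred ~= pos & yTrue ~= pos);   % true negatives
    FN = sum(yPred ~= pos & yTrue == pos);   % false negatives (Type II error)
    m.Accuracy    = (TP + TN) / (TP + TN + FP + FN);
    m.Sensitivity = TP / (TP + FN);
    m.Specificity = TN / (TN + FP);
    m.Precision   = TP / (TP + FP);
    m.F1 = 2 * m.Precision * m.Sensitivity / (m.Precision + m.Sensitivity);
    % Optional ROC/AUC from classification scores for the positive class
    % (requires the Statistics and Machine Learning Toolbox).
    if nargin > 2
        [~, ~, ~, m.AUC] = perfcurve(yTrue, scores, pos);
    end
end
```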

3. Results

A Lenovo® ThinkPad (Lenovo, Hong Kong) with an Intel® Core i5 CPU (Intel, Santa Clara, CA, USA) and 16 GB of RAM is used to process videos from our sets. The algorithm is implemented in the MATLAB® R2023b environment.

3.1. GLORIA Net Results

For training and evaluation of this method, we combined the UP-Fall dataset and the LE2I dataset. The CNN was trained on 90% of the data, which was also used to fine-tune the hyperparameters and other network-related settings. Performance evaluation was then carried out on the remaining 10% of the data. We also introduced an out-of-distribution dataset (the UR Fall Detection Dataset) in order to give a more detailed and clearer picture of model performance, as CNNs sometimes tend to overfit on their training data. This also helps us to verify the quality, adequacy, reusability, and robustness of the selected features. ROC curves for GLORIA Net are shown in Figure 3:
The other evaluation measures, introduced in Section 2.4, are displayed in Table 1:
With the LSTM approach included, the performance of our method is slightly improved (we register fewer false negative (FN) events). In this case, the same evaluation measures are presented in Table 2:
The results in Table 2 show the beneficial effect of the LSTM approach.

3.2. Comparison in Processing Times

We also provide a detailed evaluation of the speed with which we process the video data and classify events in a 150-frame window extracted from the video. Figure 4a shows the time our method needs to decide whether a fall event is present in a 150-frame time window for different camera resolutions. Figure 4b displays the memory requirements of our method. The largest contribution comes from the fact that we need to load two images at any given time in order to extract the six-element vector of principal movements.
This points to the method’s applicability in real-time scenarios and the overall low resource requirements.

4. Discussion and Conclusions

We have developed a novel method for detecting falls using video data from standard cameras. Our technique is very robust—it does not depend on camera placement, which is very advantageous as often cameras are placed in various locations throughout a room. We provide two evaluations with which we can judge overall performance of the method—one on the 10% remainder of training data and another on an OOD set. From the evaluations, we see that it matches the accuracy of state-of-the-art optical flow methods such as the one presented in [18], without needing fusion of both video and audio data.
The method relies on a fast OF algorithm, which makes it very suitable for real-time problems such as the fall detection task. Both of these factors—freedom to place the camera in different locations and the computational efficiency—show us the universality of our method and its application for detection of falls in real-time.
GLORIA Net runs on a low- to mid-range laptop CPU with low memory requirements and provides highly accurate results. Many contemporary methods for fall detection rely on processing through a dedicated GPU (both during training and when analyzing new data). This is a key feature that makes our approach competitive and practically relevant.
In addition to the above, it is worth mentioning that in the present work, we show an original approach to applying CNNs to video data. Most commonly, CNNs are used with images. A number of possible approaches [30,31] for the application of Convolutional Neural Networks to videos can be found in the literature, but the integration of the principal-component OF parameters with a CNN is a novel contribution.
Recently, our group introduced a system for real-time automated detection of epileptic seizures from video. Its capabilities were upgraded to include patient tracking and multiple-camera coverage. GLORIA Net has been added to the existing system, as it is modular in nature. We are currently monitoring its operation, and so far the addition has been very beneficial.
A limitation of our method is that it works with chunks of video data—collections of 150 frames from the video. This can be restricting in certain scenarios. As a future direction of development, we plan to increase responsiveness of the method so relevant parties can react quicker in the event of a fall. As with all other optical flow methods, GLORIA Net needs constant brightness of the scene in order to function properly. A future direction of work may include sensor fusion with an IR camera, utilizing the multi-spectral nature of the GLORIA technique. This would allow the fall detection method to function in more varied lighting conditions.
Another direction for future development is to define and extract features from the data that are related to the fall event in a physical sense. For example, we may look at the fall duration, the acceleration of the person, whether or not movement is periodic, and other similar quantities. Such preliminary refinement may help increase the accuracy of our fall detection scheme.
The datasets we have used in the current article are of videos in which a single person is present. We plan to test our method and upgrade it if needed on data that contains multiple moving people. A more thorough investigation of the additional LSTM block will be conducted on this data.
Many commercially available fall detection systems exist. Like the method presented here, they are described as accurate, unobtrusive, real-time, and automatic. What our system can do better is rely on fewer sensors than wearable approaches [6,7]: it can work with a single camera while retaining real-time operation with accurate results. In addition, cameras are readily available and can be found in many hospitals and homes, which removes the need to install and maintain special ambient sensors [5]. These benefits of our method, when compared to commercial products, further highlight its usefulness and ease of applicability.

Author Contributions

Conceptualization, G.P. and S.K. (Stiliyan Kalitzin); methodology, S.K. (Stiliyan Kalitzin), S.K. (Simeon Karpuzov), and O.G.; software, S.K. (Simeon Karpuzov), T.S. and A.T.; validation, S.K. (Simeon Karpuzov), T.S. and A.T.; formal analysis, G.P.; investigation, S.K. (Simeon Karpuzov), T.S., G.P., and A.T.; resources, O.G. and S.K. (Stiliyan Kalitzin); data curation, S.K. (Simeon Karpuzov); writing—original draft preparation, S.K. (Simeon Karpuzov); writing—review and editing, O.G., S.K. (Stiliyan Kalitzin), and G.P.; visualization, S.K. (Simeon Karpuzov), T.S., and A.T.; supervision, S.K. (Stiliyan Kalitzin) and G.P.; project administration, G.P.; funding acquisition, G.P., O.G., and S.K. (Stiliyan Kalitzin). All authors have read and agreed to the published version of the manuscript.

Funding

This research is part of the GATE project funded by the Horizon 2020 WIDESPREAD-2018-2020 TEAMING Phase 2 program under grant agreement no. 857155 and the program “Research, Innovation and Digitalization for Smart Transformation” 2021-2027 (PRIDST) under grant agreement no. BG16RFPR002-1.014-0010-C01. Stiliyan Kalitzin is partially funded by “Anna Teding van Berkhout Stichting”, Program 35401, Remote Detection of Motor Paroxysms (REDEMP).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
OF    Optical Flow
SVM   Support Vector Machine
CNN   Convolutional Neural Network
LSTM  Long Short-Term Memory
ROC   Receiver Operating Characteristic
AUC   Area Under the Curve
FPS   Frames Per Second

References

  1. Davis, J.C.; Husdal, K.; Rice, J.; Loomba, S.; Falck, R.S.; Dimri, V.; Pinheiro, M.; Cameron, I.; Sherrington, C.; Madden, K.M.; et al. Cost-effectiveness of falls prevention strategies for older adults: Protocol for a living systematic review. BMJ Open 2024, 14, e088536. [Google Scholar] [CrossRef]
  2. European Public Health Association. Falls in Older Adults in the EU: Factsheet. Available online: https://eupha.org/repository/sections/ipsp/Factsheet_falls_in_older_adults_in_EU.pdf (accessed on 4 June 2025).
  3. Davis, J.C.; Robertson, M.C.; Ashe, M.C.; Liu-Ambrose, T.; Khan, K.M.; Marra, C.A. International comparison of cost of falls in older adults living in the community: A systematic review. Osteoporos. Int. 2010, 21, 1295–1306. [Google Scholar] [CrossRef]
  4. Wang, X.; Ellul, J.; Azzopardi, G. Elderly Fall Detection Systems: A Literature Survey. Front. Robot. AI 2020, 7, 71. [Google Scholar] [CrossRef] [PubMed]
  5. Alon, N. Fall Detection and Prevention System and Method. WO2025082457, 24 April 2025. Available online: https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2025082457 (accessed on 16 June 2025).
  6. John, C.-F. Method and System for Fall Detection. US8217795B2, 10 July 2012. Available online: https://patents.google.com/patent/US8217795B2/en (accessed on 16 June 2025).
  7. González, C.S. Device, System and Method for Fall Detection. EP 3 796 282 A2, 21 March 2021. Available online: https://patentimages.storage.googleapis.com/e9/e8/a1/fc9d181803c231/EP3796282A2.pdf#page=19.77 (accessed on 16 June 2025).
  8. Kalitzin, S.; Geertsema, E.; Petkov, G. Scale-Iterative Optical Flow Reconstruction from Multi-Channel Image Sequences. Front. Artif. Intell. Appl. 2018, 310, 302–314. [Google Scholar] [CrossRef]
  9. Kalitzin, S.; Geertsema, E.; Petkov, G. Optical Flow Group-Parameter Reconstruction from Multi-Channel Image Sequences. Front. Artif. Intell. Appl. 2018, 310, 290–301. [Google Scholar] [CrossRef]
  10. Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision (IJCAI). Available online: https://www.researchgate.net/publication/215458777 (accessed on 1 June 2025).
  11. Horn, B.K.P.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
  12. Jeyasingh-Jacob, J.; Crook-Rumsey, M.; Shah, H.; Joseph, T.; Abulikemu, S.; Daniels, S.; Sharp, D.J.; Haar, S. Markerless Motion Capture to Quantify Functional Performance in Neurodegeneration: Systematic Review. JMIR Aging 2024, 7, e52582. [Google Scholar] [CrossRef]
  13. Karpuzov, S.; Petkov, G.; Ilieva, S.; Petkov, A.; Kalitzin, S. Object Tracking Based on Optical Flow Reconstruction of Motion-Group Parameters. Information 2024, 15, 296. [Google Scholar] [CrossRef]
  14. Vargas, V.; Ramos, P.; Orbe, E.A.; Zapata, M.; Valencia-Aragón, K. Low-Cost Non-Wearable Fall Detection System Implemented on a Single Board Computer for People in Need of Care. Sensors 2024, 24, 5592. [Google Scholar] [CrossRef]
  15. Wu, L.; Huang, C.; Zhao, S.; Li, J.; Zhao, J.; Cui, Z.; Yu, Z.; Xu, Y.; Zhang, M. Robust fall detection in video surveillance based on weakly supervised learning. Neural Netw. 2023, 163, 286–297. [Google Scholar] [CrossRef]
  16. Chhetri, S.; Alsadoon, A.; Al-Dala’IN, T.; Prasad, P.W.C.; Rashid, T.A.; Maag, A. Deep Learning for Vision-Based Fall Detection System: Enhanced Optical Dynamic Flow. Comput. Intell. 2021, 37, 578–595. [Google Scholar] [CrossRef]
  17. Gaya-Morey, F.X.; Manresa-Yee, C.; Buades-Rubio, J.M. Deep learning for computer vision based activity recognition and fall detection of the elderly: A systematic review. Appl. Intell. 2024, 54, 8982–9007. [Google Scholar] [CrossRef]
  18. Geertsema, E.E.; Visser, G.H.; Viergever, M.A.; Kalitzin, S.N. Automated remote fall detection using impact features from video and audio. J. Biomech. 2019, 88, 25–32. [Google Scholar] [CrossRef]
  19. Cortes, C.; Vapnik, V.; Saitta, L. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  20. De Miguel, K.; Brunete, A.; Hernando, M.; Gambao, E. Home camera-based fall detection system for the elderly. Sensors 2017, 17, 2864. [Google Scholar] [CrossRef]
  21. Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 10–13 October 2004; Volume 4, pp. 3099–3104. [Google Scholar] [CrossRef]
  22. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  23. Cover, T.M.; Hart, P.E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
  24. Hsieh, Y.Z.; Jeng, Y.L. Development of Home Intelligent Fall Detection IoT System Based on Feedback Optical Flow Convolutional Neural Network. IEEE Access 2017, 6, 6048–6057. [Google Scholar] [CrossRef]
  25. Hinton, G.E. Computation by neural networks. Nat. Neurosci. 2000, 3, 1170. [Google Scholar] [CrossRef] [PubMed]
  26. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  27. Martínez-Villaseñor, L.; Ponce, H.; Brieva, J.; Moya-Albor, E.; Núñez-Martínez, J.; Peñafort-Asturiano, C. UP-Fall Detection Dataset: A Multimodal Approach. Sensors 2019, 19, 1988. [Google Scholar] [CrossRef] [PubMed]
  28. Charfi, I.; Miteran, J.; Dubois, J.; Atri, M.; Tourki, R. Optimized spatio-temporal descriptors for real-time fall detection: Comparison of support vector machine and Adaboost-based classification. J. Electron. Imaging 2013, 22, 041106. [Google Scholar] [CrossRef]
  29. Kwolek, B.; Kepski, M. Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput. Methods Programs Biomed. 2014, 117, 489–501. [Google Scholar] [CrossRef] [PubMed]
  30. Carreira, J.; Zisserman, A. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HA, USA, 21–26 July 2017. [Google Scholar]
  31. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar] [CrossRef]
Figure 1. Description of the layers and architecture of GLORIA Net.
Figure 2. Addition of the LSTM block to our network.
Figure 3. ROC curves for the GLORIA Net. Green line is for the performance on UR dataset, and blue line is for the performance on the UP-Fall+LE2I 10% subset. AUC values are provided in the legend.
Figure 4. (a) Processing time required to make a classification for different frame resolutions. (b) Memory requirements for different frame resolutions.
Table 1. Accuracy, precision, sensitivity, specificity, and F1-score for both datasets.

Dataset          Accuracy  Sensitivity  Specificity  Precision  F1-Score
UP-Fall + LE2I   97.7%     98.1%        96.9%        97.0%      98.0%
UR               83.3%     83.3%        83.3%        83.3%      83.3%
Table 2. Accuracy, precision, sensitivity, specificity, and F1-score for the UR dataset with and without the LSTM block.

Dataset        Accuracy         Sensitivity  Specificity       Precision         F1-Score
UR (+LSTM)     91.7% (↑ 8.4%)   83.3%        100.0% (↑ 16.7%)  100.0% (↑ 16.7%)  90.9% (↑ 7.6%)
UR (no LSTM)   83.3%            83.3%        83.3%             83.3%             83.3%