Exploring Optical Flow Methods for Automated Fall Detection System

Karpuzov, Simeon; Kalitzin, Stiliyan; Tabakov, Stefan; Tsolyov, Dobromir; Petkov, Georgi

doi:10.3390/info17030300

Open AccessArticle

Exploring Optical Flow Methods for Automated Fall Detection System

by

Simeon Karpuzov

¹

,

Stiliyan Kalitzin

^2,3

,

Stefan Tabakov

¹,

Dobromir Tsolyov

¹ and

Georgi Petkov

^1,*

¹

GATE Institute, Sofia University, 1164 Sofia, Bulgaria

²

Stichting Epilepsie Instellingen Nederland (SEIN), Achterweg 5, 2103 SW Heemstede, The Netherlands

³

Image Sciences Institute, University Medical Center Utrecht, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands

^*

Author to whom correspondence should be addressed.

Information 2026, 17(3), 300; https://doi.org/10.3390/info17030300

Submission received: 19 February 2026 / Revised: 12 March 2026 / Accepted: 18 March 2026 / Published: 20 March 2026

Download

Browse Figures

Versions Notes

Abstract

Falls pose severe risks to vulnerable populations, particularly the elderly and individuals with adverse neurological conditions, necessitating reliable and non-obstructive detection systems. While previous multi-modal approaches utilizing video and audio have demonstrated strong performance, they face significant limitations regarding sensitivity to environmental noise. This paper presents a robust, video-only fall detection framework that eliminates reliance on acoustic data to enhance universality. We conduct a comprehensive comparative analysis of five optical flow (OF) algorithms—Horn–Schunck, Lucas–Kanade (LK), LK-Derivative of Gaussian, Farneback, and the spectral method SOFIA—to determine the range of applicability of each technique for capturing fall dynamics. Beyond detection accuracy, we investigate the computational efficiency of each configuration. This optimized, privacy-centric pipeline offers a scalable solution for continuous monitoring in home and clinical settings, addressing the critical need for immediate intervention following high-impact falls.

Keywords:

optical flow; fall detection; video analysis

1. Introduction

1.1. Motivation

Falls represent a significant health risk for vulnerable populations, particularly the elderly and individuals with epilepsy. For these groups, a fall is a potential indicator of an acute medical event, such as a seizure, or a precursor to a long-term injury. The consequences of such incidents are worsened when the individual is unable to get help due to injury or loss of consciousness. Consequently, the development of automated fall detection systems has become a critical area of research in biomedical engineering and ambient assisted living.

To address the limitations of wearable devices, which are prone to an array of issues, remote sensing approaches using computer vision have emerged as a non-obstructive alternative. In our previous work [1], we proposed an automated analysis algorithm for the remote detection of high-impact falls. That system utilized a multi-modal approach, combining optical flow (OF) features from video with sound amplitude analysis. While the inclusion of audio features increased detector specificity, reliance on sound presents significant challenges in real-world deployment, such as sensitivity to environmental noise and privacy concerns.

This article presents a substantial evolution of that pipeline, moving towards a robust, video-only fall detection system. By removing the dependency on audio, we aim to increase the universality of our solution. To compensate for the loss of audio data, we focus on maximizing the information extraction from the video stream. The specific choice of optical flow algorithm and machine learning classifier plays a decisive role in the system’s ability to distinguish falls from everyday activities. Therefore, this study systematically evaluates multiple optical flow techniques, ranging from classical global methods to modern robust estimators. Our aim is to identify the optimal configuration for real-time fall detection. We aim to investigate which OF method is most suitable for which use case. We also propose an extracted feature set to achieve better detection performance. We will examine the potential benefit of these additions and determine which can help with the exclusion of the audio data.

While numerous optical flow methods exist in isolation, their performance varies drastically when deployed in live clinical environments. We proceed from the understanding that different methods behave differently depending on scene dynamics—such as the sudden lighting changes or occlusions common in hospital corridors. Therefore, we are not merely ranking algorithms; we are examining the OF methods to identify which domains allow for computationally cheaper estimators and which strictly require robust, complex solvers. This comparative analysis helps us explore new approaches that could improve the reliability of the existing system.

1.2. Related Works

Camera-based fall detection has emerged as a prominent research direction within computer vision and digital healthcare. Numerous methodologies have been explored, and we refer the reader to existing surveys for a broader overview [2,3]. In what follows, we focus specifically on approaches that incorporate optical flow (OF) analysis as a core component of their processing pipeline.

In [4], the authors propose a system aimed at improving classification accuracy in dynamic lighting conditions and optimizing pre-processing performance. The methodology employs an enhanced dynamic optical flow technique that leverages rank pooling to summarize temporal video data into a single image representation.

The method shown in [5] uses a combination of standard optical flow, background subtraction [6] and Kalman filtering [7] to determine whether the recorded movement is a fall. The algorithm proceeds by organizing the isolated motion patterns into a feature vector. This structure aggregates the silhouette angle, the width-to-height ratio of the bounding rectangle, and the ratio derivative—a parameter specifically included to quantify the rate of shape deformation over time. Subsequently, this data is used as an input for a k-nearest neighbor (k-NN) algorithm [8]. The classifier processes these features to determine whether the observed motion dynamics correspond to a fall event.

Another contemporary method is shown in [9]. Optical flow is calculated, and then the extracted information is utilized in a distinct manner. Descriptors of points of interest are derived and subsequently serve as an input to a Convolutional Neural Network (CNN) [10], which executes the final classification. To increase robustness, the framework incorporates a two-stage rule-based motion detection module. This component identifies large, abrupt movements and applies predefined rules once the optical flow magnitude surpasses specific thresholds.

In [11], principal motion parameters are passed into a lightweight CNN that classifies falls. It works in real-time and avoids computationally expensive optical flow calculation. The videos are divided into portions of five seconds, and the system analyzes each subsection for falls.

Although we focus on Optical flow methods in the current paper, it is worth mentioning some of the recent advancements in deep learning-based methods for fall detection. Recent literature in computer vision for fall detection highlights a shift towards Transformer-based architectures, which outperform traditional convolutional neural networks by modeling long-range dependencies and complex spatial representations [12,13]. Vision Transformers utilize self-attention mechanisms on discrete frame patches to capture global spatial dependencies, effectively tracking the rapid postural changes indicative of a fall with over 98% accuracy [12,13]. In addition, to mitigate privacy concerns and computational overhead, researchers are leveraging spatio-temporal graph convolutional networks (ST-GCNs) for skeleton-based detection [14,15]. By extracting human anatomical key points and modeling them as dynamic graphs, ST-GCNs analyze the temporal evolution of a skeleton, achieving exceptional robustness (over 99% accuracy) even under occlusions, multi-person scenes, or rotated viewpoints [14,15]. Finally, to overcome the vulnerabilities of single-modality systems, such as visual occlusions or false positives from wearable inertial measurement units (IMUs), recent state-of-the-art frameworks employ bimodal late-fusion networks [16]. These systems process parallel data streams (e.g., an LSTM for IMU data and an attention-based network for visual data) and aggregate independent predictions at the decision level, drastically minimizing false-positive rates to as low as 3.6% and ensuring highly reliable monitoring in challenging environments [16].

1.3. Original Contributions

In this work, we extend our original method by including tests and analysis on the following:

•: Comparative Evaluation of Optical Flow Methods: We move beyond the exclusive use of Horn–Schunck (HS) [17] by implementing and testing four additional optical flow algorithms: Lucas–Kanade (LK) [18], Lucas–Kanade Derivative of Gaussian (LKDoG) [19], Farneback (FB) [20], and SOFIA (SF) [21]. This comparison allows us to determine which method provides the most reliable motion estimation for fall dynamics under varying conditions.
•: Expanded Feature Extraction: To attempt compensation for the removal of the sound channel, we introduce new motion features derived from the optical flow field, moving beyond simple vertical velocity and acceleration to capture more subtle kinematic characteristics of falls.
•: FPS and Computational Efficiency Analysis: Real-time processing is a prerequisite for emergency alerting systems. Thus, we provide a detailed analysis of the frames per second (FPS) performance for the different OF methods. This ensures that the recommended solution is not only accurate but also viable for deployment on standard hardware.

1.4. Article Structure

The remainder of this article is organized in the following way: Section 2 details the methodology, outlining the modified detection pipeline that excludes sound analysis. We describe the working principles of the five optical flow methods selected for evaluation and the expanded feature set extracted from the flow fields. Section 3 presents the experimental results, comparing various performance metrics of each optical flow method. Finally, Section 4 discusses the implications of these results for real-world deployment, analyzes the trade-offs between accuracy and computational cost, and suggests directions for future research.

2. Materials and Methods

In our original work we combine audio and video features into a single feature vector that is then used as input for an SVM classifier. A general scheme of the system is presented in Figure 1a. Here, we remove the audio branch entirely. We examine different OF calculation methods and add a new motion feature to our feature vector. The updated system can be viewed in Figure 1b:

We begin this chapter by introducing different optical flow calculation techniques.

2.1. Optical Flow Methods

The Horn–Schunck method is a global regularization method that addresses optical flow estimation as a variational minimization problem. The method minimizes a global energy functional:

J (h) = \iint [{(I_{x} u + I_{y} v + I_{t})}^{2} + α^{2} ({|\nabla u|}^{2} + {|\nabla v|}^{2})] d x d y,

(1)

where the first term enforces brightness constancy and the second term imposes smoothness regularization on the velocity field, with the parameter α controlling the trade-off between these competing objectives. The solution is obtained through Euler–Lagrange equations. The resulting global regularization produces dense flow fields with velocity estimates at every pixel, propagating motion information from high-gradient regions into textureless areas through the smoothness constraint. Horn–Schunck produces dense flow fields with velocity estimates at every pixel. Even in textureless regions where the brightness constraint provides no useful information, the global smoothness constraint propagates flow from surrounding areas, filling in the velocity field throughout the image. This makes the method particularly valuable for applications requiring complete motion coverage across the entire image domain. This is the OF method we have used in our original work.

We implemented and tested the Farneback OF algorithm to assess its suitability for fall-related motion analysis. Farneback provides a dense flow field with good sensitivity to broad motion patterns, estimating motion for every pixel, allowing it to capture these patterns across the entire image. This dense representation is particularly sensitive to large, abrupt movements typical of fall events. From the resulting flow fields, we extracted frame-wise vertical velocity profiles and computed the standard fall-related features used in prior works (maximum vertical velocity, acceleration, deceleration).

Additionally, we also integrated the Lucas–Kanade method as a lightweight alternative to Farneback’s dense estimation. Lucas–Kanade tracks small sets of coherent key points, making it faster, sparser, and less memory intensive. Using the vertical component of the tracked object, we derived the same fall-related features, enabling a direct comparison between the two methods within the unified evaluation framework.

A different variation of the standard LK method is the so-called Lucas–Kanade Derivative of Gaussian (LKDoG). Compared with the original Lukas-Kanade (LK) method, LKDoG differs primarily in how it computes derivatives. For spatial derivatives, LKDoG first applies Gaussian smoothing to the whole input image. Then, it computes the DoG filters, followed by additional Gaussian smoothing applied on the gradient fields. The method uses separate standard deviation parameters for image smoothing and gradient smoothing to control the characteristics of these filters. For temporal derivatives, it applies a DoG filter for temporal filtering not on two consecutive frames, but on a window of multiple frames. Using DoG filters provides better noise reduction and smoother gradient estimates compared to simple finite difference approximations used in basic Lucas–Kanade implementations. The LKDoG method also incorporates temporal filtering across multiple frames rather than just two consecutive frames, making it more robust to insignificant temporal variations.

The spectral optical flow iterative algorithm (SOFIA) proposes a novel approach for optical flow reconstruction that integrates local structural information from multiple spectral components, such as color channels, to address the inverse problem. To tackle singularities, the method constructs a structural tensor using aggregated spatial-spectral gradient data, effectively enhancing the rank of the system. Furthermore, the algorithm introduces a spatial smoothening functionality applied directly to the structural tensor within local neighborhoods, which avoids the gradient cancellation effects often observed when smoothening images directly. Subsequently, an iterative multi-scale scheme is utilized where the vector field reconstructed at coarser scales deforms the source image to refine the input for finer scale calculations. Validations demonstrate that this approach significantly improves reconstruction accuracy and functional association compared to standard methods like Horn–Schunck, particularly when processing complex global translations.

In summary, a brief description of the methods is available in Table 1:

2.2. Data

In the current work we have used two publicly available datasets of people falling down on video. The first dataset is LE2I [22]. It contains 190 videos with a framerate of 25 frames per second. Various camera positions are available; most often the camera is placed in one of the upper corners of the room. The frame resolution is 320 by 240 pixels. Only one moving person is present in the videos. Only one person is falling down. This is a very widespread dataset in the current topic.

The second dataset is UG [23]. It consists of 40 videos of five seconds each. Twenty are with falls, and twenty are with other movement activities. There is a single moving person in front of a homogenous background. Camera resolution is 1920 by 1080 pixels, with a framerate of 30 frames per second. It is indoors, and the brightness in the scene is constant. It is a useful out-of-distribution set to test detection and classification performance. It differs from LE2I with respect to the background of the scenes.

Both datasets are open access and available to everyone for testing, which was our primary reason for selection—availability. Both sets are labeled.

The datasets share a similar annotation convention. For each video there is a binary flag that indicates whether or not a fall is present—1 for fall, 0 for other type of movement. LE2I also has information about the duration of the fall and which frames constitute the person falling. In UG, the duration of the individual video is quite short—5 s each—and there is only one type of movement.

2.3. Expanded Motion Features

In our previous work we extract three fall descriptors from the mean vertical velocity curve—the maximum acceleration

A_{m a x}

, the maximum deceleration

D_{m a x}

and the maximum vertical velocity

V_{m a x}

. We choose the maximum in value

(A, V, D)

triplet, such that these values are sequential in time. The mean velocity curve may have several peaks for a given time interval, and imposing a restriction related to the time of occurrence of the above features allows us to select the proper triplet that we have shown is related to the fall event. These three features are based on amplitude characteristics. Falls have a distinct duration, so it would be natural to introduce a temporal-based feature.

A natural candidate is related to the width of the velocity curve, where a

(A, V, D)

triplet is found. The width is related to the standard fall time of the average adult—somewhere between 0.5 and 3.0 s. Because of this, we add the full width at half maximum σ to our feature vector. All motion extracted features are presented in Figure 2.

This width—

σ

captures the duration of the impact, whereas the original triplet can only capture the intensity, making both features conceptually and statistically distinct. The width of the curve is helpful because it allows us to distinguish between events with different rates of movement. When a person sits down, σ will be lower. When a person falls we expect to see a wider mean velocity curve.

Calculation of σ is straightforward:

•: Find the value of $V_{m a x}$ ;
•: Construct a straight line with equation $y = 0.5 \times V_{m a x}$ ;
•: Find the intercepts $x_{1}, x_{2}$ between the velocity curve and y;
•: Obtain FWHM by subtraction: $σ = x_{2} - x_{1}$ .

After we have calculated σ, we incorporate it into the feature vector

(A_{m a x,} V_{m a x,} D_{m a x,} σ)

.

2.4. Evaluation Parameters

We split events into two categories: falls and all other types of movement. For evaluation, we use a set of statistical values derived from the confusion matrix, which is a two-by-two table that is useful when comparing ground-truth labels with model predictions. It has the following elements: true positives (TP)—number of times we have correctly predicted an event as a fall; false positives (FP)—number of times we have incorrectly predicted an event as fall; true negative (TN)—number of times we have correctly predicted an event as other movement; false positives (FN)—number of times we have incorrectly predicted an event as other movement. Using these values, we introduce the following metrics:

\begin{array}{l} A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N}, S e n s i t i v i t y = \frac{T P}{T P + F N}, S p e c i f i c i t y = \frac{T N}{T N + F P}, \\ P r e c i s i o n = \frac{T P}{T P + F P}, F 1 = 2 \times \frac{P r e c i s i o n \cdot S e n s i t i v i t y}{P r e c i s i o n + S e n s i t i v i t y} \end{array}

(2)

To comprehensively evaluate the system’s performance, Accuracy is utilized to quantify the overall proportion of correctly predicted events. Complementary metrics include Sensitivity, which measures the model’s capacity to correctly classify positive instances, and Specificity, which assesses the prediction reliability for negative cases. Precision is defined as the ratio of true positive classifications to the total predicted positives. Furthermore, the F1-score is computed to provide a balanced metric that simultaneously accounts for false positives and false negatives. Graphically, the discriminatory capability of the model is illustrated using receiver operating characteristic (ROC) curves, which plot Sensitivity against the false-positive rate (1−Specificity) across varying thresholds. The area under the curve (AUC) is subsequently derived to quantify effectiveness; a value approaching 1 indicates superior performance, whereas a value near 0.5 suggests results equivalent to random guessing. Collectively, all these dimensionless metrics provide us with a robust assessment of the fall detection model.

2.5. Statistical Testing

Bootstrapping [24] is a robust, non-parametric statistical resampling technique used to estimate the variance and uncertainty of a model’s performance metric (in our case—the AUC). Because a single test set only represents one specific sample of real-world data, a single AUC score lacks context regarding its reliability. To compute a bootstrapped confidence interval (CI), the original test dataset is randomly resampled with replacement thousands of times (e.g., N = 1000 iterations) to create many synthetic test sets of the same size. The model’s AUC is recalculated on each of these resampled sets, generating a distribution of performance scores. From this distribution, the 95% confidence interval is extracted, representing the range in which the model’s true real-world performance is 95% likely to fall. When comparing multiple machine learning models, if their 95% CIs heavily overlap, it mathematically indicates that there is no statistically significant difference in their performance on that dataset size. Conversely, a clear gap between intervals denotes definitive statistical superiority.

McNemar’s Test [25] is a non-parametric statistical hypothesis test specifically designed to compare the predictive accuracy of two or more machine learning classifiers evaluated on the exact same dataset. Unlike standard accuracy metrics that evaluate the overall proportion of correct predictions, McNemar’s test isolates the specific instances where the two models disagree. The test constructs a contingency table focusing only on the cases where one of the models predicted correctly while the rest were incorrect, and vice versa. It then calculates a χ² (chi-squared) statistic to determine if the ratio of these disagreements is heavily skewed toward one model. The resulting p-value indicates whether the difference in the models’ binary classification performance is statistically significant (p < 0.05) or simply due to random variance (p ≥ 0.05). In the context of ensemble system design, a high p-value between top-performing classical methods confirms a statistical tie, validating that no single algorithm holds a universal advantage across the given data.

3. Results

A Lenovo^® ThinkPad (Lenovo, Hong Kong) with an Intel^® Core i5 CPU (Intel, Santa Clara, CA, USA) and 16 Gb of RAM is used to process videos from our sets. Algorithm ralization is carried out in the MATLAB^® R2023b environment.

The parameters of the different optical flow methods are the following:

Horn–Schunck—smoothness value is set to 1, maximum number of iterations are set to 10;
Lucas–Kanade—window size is set to 3 × 3, noise threshold is set to 0.039;
Farneback—number of pyramid levels is set to 3, pyramid scale is set to 0.5, size of the pixel neighborhood is set to 5, number of iterations per pyramid level is set to 3;
LKDoG—number of buffered frames is set to 3, noise threshold is set to 0.039, standard deviation for image smoothing filter is set to 1.5;
SOFIA—smoothing scale is set to 3, vertical and horizontal structural integration scales are set to 3.

In the paper, we follow the same SVM setup as our previous work: The LE2I dataset, consisting of 190 videos, is first randomly divided into two equal parts to form separate training and testing sets. For the classification task, the model utilizes an SVM equipped with a radial basis function (RBF) kernel. Before being fed into the classifier, the extracted event features are logarithmically transformed and standardized to possess a zero mean and unit standard deviation. The model’s parameters are tuned strictly on the training set using a 10-fold cross-validation scheme to ensure robust performance and prevent data leakage. The UG dataset is used exclusively for testing.

Our first contribution is investigating how different optical flow methods affect the overall detection algorithm. We have tested the following five: Lucas–Kanade, Horn–Schunck, Lucas–Kanade Derivative of Gaussian, SOFIA and Farneback. Results for the LE2I dataset, presented as ROC curves and performance metrics are available in Figure 3:

Here we can see that the three best performing methods in terms of maximum area below the curves are Horn–Schunck, Farneback and SOFIA. Detailed performance metrics—accuracy, sensitivity, specificity, precision, f1-score and AUC—are also presented. These values confirm the findings from the ROC curves for the three best-performing OF methods.

In addition, we present ROC curves and performance metrics for the second dataset—UG. The results are available in Figure 4:

From these results it would seem that the best performing OF method is SOFIA. Adding a new feature to the feature vector—namely the width of the velocity curve where V_y is at a maximum—gives us the following results, presented on Figure 5:

Figure 5. (Left): Performance metrics for the new feature vector on LE2I data. (Right): Performance metrics for the new feature vector on UG data. The maximum value for each metric is given in bold and black. Stronger shades of green indicate a better result. All values are in the range from 0 to 1. Higher value indicates a better result.

We can see that for LE2I, LKDoG excels, while for the UG dataset, SOFIA is still the best performing. We can also compare how the addition of the new feature affected performance. In Figure 6 we show the difference in performance metrics between the original and updated feature vector.

We provide information about the bootstrapped 95% confidence intervals for both datasets:

Figure 7 illustrates the bootstrapped 95% confidence intervals for the area under the curve of the evaluated OF methods on both datasets. On the heavily textured LE2I dataset, LKDoG (mean AUC 0.899, confidence bounds 0.812–0.950) heavily overlapped with the other classical methods, including the lowest-performing SOFIA method. This significant overlap indicates no definitive statistical superiority among the top methods in known environments. However, performance on the unseen UG dataset revealed severe degradation in performance for the majority of the algorithms. Notably, LKDoG and HS experienced drastic mean AUC reductions (dropping to 0.590 and 0.450, respectively), rendering their predictive capabilities equivalent to or worse than random chance. Conversely, SOFIA demonstrated unique environmental invariance, increasing its mean AUC to 0.780 and maintaining predictive viability despite the expanded error bounds inherent to the out-of-distribution testing.

Figure 8 presents the pairwise McNemar’s test p-values, evaluating the statistical significance of differences in the final binary predictions of the models. On the LE2I dataset, the test confirms a statistical tie among the top-tier classical estimators (LKDoG and Farneback, p = 1.000), further corroborating the AUC confidence interval findings. The test also successfully isolated the lowest-performing method, SOFIA, as statistically inferior to LKDoG (p = 0.024) and Farneback (p = 0.029) in this specific environment. In contrast, the heatmap for the out-of-distribution UG dataset shows that all pairwise p-values exceed the 0.05 threshold. This lack of statistical significance across the board mathematically reflects the high error variance and shifting performance boundaries that occur when these models are applied to drastically different conditions.

In addition, we examine the correlation between the existing features and the newly introduced σ. We calculate the Pearson linear correlation coefficient between σ and the old features. The results are presented in Table 2.

These results demonstrate that the absolute correlation coefficients remain low (∣r∣ < 0.30), indicating a weak to negligible linear relationship. Crucially, we highlight a fundamental mathematical distinction: while

V_{m a x}

,

D_{m a x}

, and

A_{m a x}

function as differential descriptors identifying instantaneous local extrema, σ serves as an integral temporal descriptor.

Finally, we examine OF method runtimes for a video with FPS = 25 and N = 150 total frames. Video frame size is 320 by 240 pixels². Results are presented in Figure 9:

These results are expected and are related to the nature of the used algorithms—sparse versus dense. In the case of LKDoG, Farneback and SOFIA, the extra assumptions, conditions and information used by the methods lead to a higher time-performance cost. Their reported calculation speed does not prevent their use in real-time calculations.

Texture Analysis of the Background Scene

We carry out quantitative analysis of the texture information of the background scene. The process initiates by sequentially extracting 100 frames from two separate videos from each dataset. A 2D directional Sobel spatial filter [26] is applied to the frame data to compute the gradient magnitude matrix, which effectively highlights regions containing strong structural edges and high contrast. To quantify the overall structural complexity of the image, this matrix is flattened and the arithmetic mean of its elements is calculated to yield a single average gradient magnitude (AGM) scalar value. This is systematically repeated for every extracted frame across both videos. Finally, the resulting temporal AGM values are plotted in Figure 10 in order to visualize the dynamic texture changes between the two videos.

These results show clearly the textural difference in the scenes of both video sets. These properties of the data have a large effect on the performance of different OF methods for fall detection.

Figure 10. AGM for each individual frame for two videos from LE2I and UG.

4. Discussion

In this study, we presented a comprehensive evaluation of a vision-based fall detection system that integrates variable optical flow (OF) algorithms with a standard ML classifier (namely the SVM). We tested the OF methods on two datasets and can summarize our findings in the following way:

On the LE2I dataset, the Horn–Schunck and Farneback methods demonstrate the strongest performance with the original feature vector. Horn–Schunck achieves the highest overall metrics, with a sensitivity of 98.4%, an F1-score of 93.3% and an AUC of 0.962. Farneback shows the second-best performance across all categories. The introduction of the new feature vector significantly shifts the results for LE2I. The Lucas–Kanade Derivative of Gaussian method sees a dramatic improvement, becoming the top performer with increases in accuracy (+22.1%) and specificity (+35.5%). This suggests that LKDoG benefits most from the additional temporal shape information in simpler environments.

In the UG dataset, SOFIA emerges as the most robust method using the original feature vector, achieving the highest accuracy (75.0%), sensitivity (90.0%), and F1-score (78.3%). While Horn–Schunck struggles here with a low accuracy of 45.0%, SOFIA maintains high reliability, outperforming standard methods like Lucas–Kanade and Farneback. Unlike the LE2I dataset, adding the new feature vector to the UG dataset generally degrades performance for most methods. SOFIA’s sensitivity drops by 30%, and Horn–Schunck sees decreases across multiple metrics. Only Lucas–Kanade shows a mild improvement in sensitivity (+30%) but suffers in specificity (−30%), indicating that the new feature may introduce noise or overfitting in this dataset.

The performance disparity on the UG dataset—where Horn–Schunck (HS) failed (45.0% accuracy) while SOFIA excelled (75.0%)—can be attributed to the handling of homogeneous backgrounds. Our AGM analysis shows that the LE2I dataset yielded an AGM of roughly 52, whereas the UG dataset demonstrated a significantly reduced AGM of roughly 15. This comparative analysis confirms that the UG dataset constitutes a homogenous, low-texture environment relative to the high-frequency spatial components found in the domestic and office backgrounds of the LE2I scenes. The observed performance degradation of the Horn–Schunck (HS) algorithm on the UG dataset is directly attributable to its reliance on global smoothness constraints. In such under-determined, textureless regions, the HS method encounters an ill-posed inverse problem: the “Aperture Problem” [27], where the lack of local gradients forces the algorithm to rely on artificial flow propagation, thereby distorting the subject’s silhouette. Conversely, SOFIA demonstrates superior architectural robustness, maintaining an accuracy of 75% in identical conditions. This stability is facilitated by SOFIA’s implementation of Tikhonov regularization on the structural tensor. By utilizing aggregated spatial-spectral gradient data to mitigate singularities, SOFIA effectively stabilizes the motion estimation in homogeneous areas, preserving critical geometric features where global smoothness constraints typically fail. The analysis of the background scene and its effect on detection accuracy can help increase the robustness of the method. As it stands, our system is intended for indoor scenarios that include a small number of people. Solving the problem of dynamic scene variation can increase the generalization extent of our work.

Our statistical tests show the following results: there is a lack of absolute statistical superiority among the evaluated optical flow methods, which fundamentally supports the necessity of a multi-method approach for real-world fall detection. On the in-distribution LE2I dataset, both 95% confidence intervals and pairwise McNemar’s tests (p > 0.05) confirmed that top-performing classical methods like HS, LKDoG and Farneback are in a statistical tie. This proves that within richly textured environments, no single method acts as a universal solution. The limitations of relying on a single, isolated estimator are further exposed by the out-of-distribution UG dataset results. When transitioned to homogeneous, low-texture backgrounds, algorithms that excelled on the LE2I dataset suffered severe relative degradation, with AUC scores plummeting to near-random chance (AUC was roughly equal to 0.500). In stark contrast, the SOFIA estimator, despite being proven statistically inferior on the textured LE2I dataset (p < 0.05), demonstrated remarkable structural robustness on the UG dataset, maintaining a highly viable predictive AUC of 0.780.

While the small sample size of the UG dataset resulted in wide confidence intervals and no definitive statistical dominance, the practical engineering implications are absolute. SOFIA shows immunity to the domain shift that disabled the other methods. This highlights its unique value in homogeneous environments. Ultimately, this performance inversion empirically proves that optical flow estimator efficacy is strictly environmentally dependent. To break through the current accuracy upper bound, future robust fall detection systems must abandon single-method architectures in favor of context-aware combinations, utilizing algorithms like LKDoG or HS for textured backgrounds while dynamically routing homogeneous video frames to more resilient algorithms like SOFIA.

One limitation of the current work is the relatively small size of the test dataset (UG). Misclassification of a small number of videos can create large differences in reported performance. However, the UG dataset provides a preliminary look at out-of-distribution performance. For future development, more specialized video data is needed to ensure more rigorous validation.

One future direction of work is to determine whether SOFIA can benefit from calculation on a graphics processing unit. We will explore full GPU acceleration (utilizing CUDA kernels) to offload the structural tensor computation and Gaussian smoothing steps.

A different direction for future work is to find better ways to utilize the temporal nature of the fall movement. The proposed new feature provided only marginal improvements for some OF methods. A natural candidate for such analysis would be to include a LSTM (long short-term memory) block or a lightweight Transformer that can help with the temporal properties of the data. A larger volume of test data may still prove the usefulness of σ.

While this study focuses on classical and spectral optical flow methods suitable for CPU-bound edge devices, we acknowledge the emergence of deep learning-based estimators like RAFT (recurrent all-pairs field transforms) [28]. RAFT offers superior accuracy in detecting fine micro-movements, but its heavy GPU requirement currently makes it cost-prohibitive for large-scale, privacy-preserving edge deployment in every hospital room. Future iterations of our system may explore lightweight distilled versions of such networks (e.g., FlowSeek or LiteFallNet) to balance this trade-off.

Finally, we aim to explore how changing the ML method for classification can increase performance. In our original work, the SVM classifier showed the best performance when compared to other existing machine learning techniques. Since then, there has been rapid development of new methods that could potentially be better for the task at hand. Such an examination would deserve its own separate study.

Author Contributions

Conceptualization, S.K. (Stiliyan Kalitzin); methodology, S.K. (Stiliyan Kalitzin), S.K. (Simeon Karpuzov) and G.P.; software, S.K. (Simeon Karpuzov), G.P. and S.K. (Stiliyan Kalitzin); data, S.K. (Simeon Karpuzov), S.T. and D.T.; verification, S.T. and D.T.; writing—original draft preparation, S.K. (Simeon Karpuzov). All authors have read and agreed to the published version of the manuscript.

Funding

This research is part of the GATE project funded by the Horizon 2020 WIDESPREAD-2018–2020 TEAMING Phase 2 program under grant agreement no. 857155 and the program “Research, Innovation and Digitalization for Smart Transformation” 2021–2027 (PRIDST) under grant agreement no. BG16RFPR002-1.014-0010-C01. Stiliyan Kalitzin is partially funded by “Anna Teding van Berkhout Stichting”, Program 35401, Remote Detection of Motor Paroxysms (REDEMP).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets used in the study are publicly available at: https://doi.org/10.5281/zenodo.17777971, https://www.kaggle.com/datasets/tuyenldvn/falldataset-imvia, accessed on 8 March 2026.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

OF	Optical Flow
LK	Lucas–Kanade
LKDoG	Lucas–Kanade Derivative of Gaussian
SF	SOFIA
FB	Farneback
HS	Horn–Schunck

References

Geertsema, E.E.; Visser, G.H.; Viergever, M.A.; Kalitzin, S.N. Automated remote fall detection using impact features from video and audio. J. Biomech. 2019, 88, 25–32. [Google Scholar] [CrossRef] [PubMed]
Gutiérrez, J.; Rodríguez, V.; Martin, S. Comprehensive Review of Vision-Based Fall Detection Systems. Sensors 2021, 21, 947. [Google Scholar] [CrossRef] [PubMed]
Singh, A.; Rehman, S.U.; Yongchareon, S.; Chong, P.H.J. Sensor Technologies for Fall Detection Systems: A Review. IEEE Sens. J. 2020, 20, 6889–6919. [Google Scholar] [CrossRef]
Chhetri, S.; Alsadoon, A.; Al-Dala’IN, T.; Prasad, P.W.C.; Rashid, T.A.; Maag, A. Deep Learning for Vision-Based Fall Detection System: Enhanced Optical Dynamic Flow. Comput. Intell. 2020, 37, 578–595. [Google Scholar] [CrossRef]
De Miguel, K.; Brunete, A.; Hernando, M.; Gambao, E. Home camera-based fall detection system for the elderly. Sensors 2017, 17, 2864. [Google Scholar] [CrossRef] [PubMed]
Piccardi, M. Background subtraction techniques: A review. In Proceedings of the 2004 IEEE International Conference on Systems, Man and Cybernetics, The Hague, The Netherlands, 10–13 October 2004. [Google Scholar] [CrossRef]
Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
Cover, T.M.; Hart, P.E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27. [Google Scholar] [CrossRef]
Hsieh, Y.Z.; Jeng, Y.L. Development of Home Intelligent Fall Detection IoT System Based on Feedback Optical Flow Convolutional Neural Network. IEEE Access 2017, 6, 6048–6057. [Google Scholar] [CrossRef]
Hinton, G.E. Computation by neural networks. Nat. Neurosci. 2000, 3, 1170. [Google Scholar] [CrossRef] [PubMed]
Karpuzov, S.; Kalitzin, S.; Georgieva, O.; Trifonov, A.; Stoyanov, T.; Petkov, G. Automated Remote Detection of Falls Using Direct Reconstruction of Optical Flow Principal Motion Parameters. Sensors 2025, 25, 5678. [Google Scholar] [CrossRef] [PubMed]
Zafar, R.O.; Farhan, Z. Real-time activity and fall detection using transformer-based deep learning models for elderly care applications. BMJ Health Care Inform. 2025, 32, e101439. [Google Scholar] [CrossRef] [PubMed]
Saha, N.; Kundu, T.; Banerjee, A.; Ghosh, A.; Ghosh, A. Fall Detection Using Vision Transformer. In Proceedings of the 2025 3rd International Conference on Computer Graphics and Image Processing (CGIP), Tianjin, China, 11–13 July 2025. [Google Scholar]
Zobi, M.; Bolzani, L.; Tabii, Y.; Thami, R.O.H. Robust 3D Skeletal Joint Fall Detection in Occluded and Rotated Views Using Data Augmentation and Inference–Time Aggregation. Sensors 2025, 25, 6783. [Google Scholar] [CrossRef] [PubMed]
Liu, L.; Sun, Y.; Ge, X. A Hybrid Multi-Person Fall Detection Scheme Based on Optimized YOLO and ST-GCN. Int. J. Interact. Multimed. Artif. Intell. 2025, 9, 26–38. [Google Scholar] [CrossRef]
Rehouma, H.; Mounir, B. Fall Detection by Deep Learning-Based Bimodal Movement and Pose Sensing with Late Fusion. Sensors 2025, 25, 6035. [Google Scholar] [CrossRef] [PubMed]
Horn, B.K.P.; Schunck, B.G. Determining optical flow. Artif. Intell. 1981, 17, 185–203. [Google Scholar] [CrossRef]
Lucas, B.D.; Kanade, T. An Iterative Image Registration Technique with an Application to Stereo Vision (IJCAI) An Iterative Image Registration Technique with an Application to Stereo Vision. Available online: https://www.researchgate.net/publication/215458777 (accessed on 29 December 2025).
Barron, J.L.; Fleet, D.J.; Beauchemin, S.S.; Burkitt, T.A. Performance of Optical Flow Techniques; Springer Nature: Berlin/Heidelberg, Germany, 1994; Volume 12, pp. 43–77. [Google Scholar] [CrossRef]
Farnebäck, G. Two-frame motion estimation based on polynomial expansion. In Scandinavian Conference on Image Analysis; Springer: Berlin/Heidelberg, Germany, 2003; pp. 363–370. [Google Scholar]
Kalitzin, S.; Geertsema, E.; Petkov, G. Scale-Iterative Optical Flow Reconstruction from Multi-Channel Image Sequences. Front. Artif. Intell. Appl. 2018, 310, 302–314. [Google Scholar] [CrossRef]
Charfi, I.; Miteran, J.; Dubois, J.; Atri, M.; Tourki, R. Optimized spatio-temporal descriptors for real-time fall detection: Comparison of support vector machine and Adaboost-based classification. J. Electron. Imaging 2013, 22, 041106. [Google Scholar] [CrossRef]
Karpuzov, S.; Petkov, G. Fall Detection Dataset UG (Underground GATE). Dataset. Available online: https://zenodo.org/records/17777971 (accessed on 15 December 2025).
Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26. [Google Scholar] [CrossRef]
McNemar, Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947, 12, 153–157. [Google Scholar] [CrossRef] [PubMed]
Sobel, I. An Isotropic 3x3 Image Gradient Operator. Presentation at Stanford A.I. Project 1968. 2014. Available online: https://www.researchgate.net/publication/285159837_A_33_isotropic_gradient_operator_for_image_processing (accessed on 19 February 2026).
Nakayama, K.; Silverman, G.H. The aperture problem—I. Perception of nonrigidity and motion direction in translating sinusoidal lines. Vision Res. 1988, 28, 739–746. [Google Scholar] [CrossRef] [PubMed]
Teed, Z.; Jia, D. Raft: Recurrent all-pairs field transforms for optical flow. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar]

Figure 1. (a) Outline of our original system. Both sound and motion features are combined into a feature vector that is then used as input to an SVM classifier. (b) Updated system pipeline in the current work. We no longer use audio for fall detection.

Figure 2. All four motion features that we extracted from the video—

(A_{m a x,} V_{m a x,} D_{m a x,} σ)

.

Figure 2. All four motion features that we extracted from the video—

(A_{m a x,} V_{m a x,} D_{m a x,} σ)

.

Figure 3. (Left): ROCs for each of the tested optical flow methods. (Right): Various performance metrics of our technique, based on different OF methods for velocity field calculation. The maximum value for each metric is given in bold and black. Stronger shades of green indicate a better result. All values are in the range from 0 to 1. Higher value indicates a better result. LE2I dataset with the original feature vector.

Figure 4. (Left): ROCs for each of the tested optical flow methods. (Right): Various performance metrics of our technique, based on different OF methods for velocity field calculation. The maximum value for each metric is given in bold and black. Stronger shades of green indicate a better result. All values are in the range from 0 to 1. Higher value indicates a better result. UG dataset with the original feature vector.

Figure 6. (Left): Difference in performance between new feature vector and original feature vector for LE2I dataset. (Right): Difference in performance between new feature vector and original feature vector for first occurrence for the UG dataset. The maximum value for each metric is given in bold and black. Stronger shades of yellow/orange indicate a better result. Stronger shades of blue indicate a worse result. Positive values indicate an increase in performance, negative values indicate a decrease in performance.

Figure 7. (Left): Confidence intervals for the AUC of the test split of the LE2I dataset. (Right): Confidence intervals for the AUC of OOD dataset UG.

Figure 8. (Left): Calculated p-values on the LE2I dataset. (Right): Calculated p-values on the UG dataset. For values of p > 0.05, we observe a lack of statistical significance. Darker shades of red indicate a higher p-value, brighter shades of yellow indicate a lower p-value. For p < 0.05, the models differ mathematically. We have marked those cases with “*”.

Figure 9. Computation times for each optical flow method. Lower value is better.

Table 1. Summary of different optical flow methods.

Method	Strengths	Weaknesses	Use Cases
Lucas–Kanade (LK)	- Simple, efficient - Good for small, coherent displacements - Works well in sparse, feature-based tracking (corners, textures)	- Assumes brightness constancy & small motion - Fails with large displacements - Sensitive to noise and textureless regions	- Tracking a few good features (e.g., Shi–Tomasi corners) - Real-time applications with limited compute
Lucas–Kanade Derivative of Gaussian (LKDoG)	- More accurate gradient estimation than standard LK - Improves robustness to noise - Better handling of textured patterns	- Higher computational cost than basic LK - Still assumes small motion	- Scientific/medical imaging - Cases where accurate gradient estimation matters more than speed
Farneback (Polynomial Expansion)	- Dense optical flow - Handles larger displacements better than LK - Provides smooth, detailed flow fields	- More computationally expensive - May over smooth motion boundaries - Can be less accurate with sudden motion	- Dense motion analysis - Video stabilization, background/foreground segmentation - Gesture recognition
Horn–Schunck (HS)	- Produces dense, smooth flow - Global regularization makes it robust to noise - Well-established classical method	- Over-smooths at motion boundaries - Sensitive to parameter tuning (smoothness λ) - Computationally heavier than LK	- Global motion estimation - Medical imaging (e.g., tissue/organ motion) - Scenarios where smoothness is desired
SOFIA	- Robust to noise via spectral representation - Can handle complex, non-rigid deformations - Iterative refinement improves accuracy	- Higher computational load - Less common as it is new - Requires careful parameter tuning	- Biomedical signals (EEG, brain activity mapping) - Applications with periodic/oscillatory motion - Research where spectral methods outperform spatial-only methods

Table 2. Correlation analysis of used features.

Feature Pairwise Comparison	Pearson Correlation Coefficient ( $r$ )
$V_{m a x}$ and $σ$	−0.05
$A_{m a x}$ and $σ$	0.27
$D_{m a x}$ and $σ$	−0.25

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Karpuzov, S.; Kalitzin, S.; Tabakov, S.; Tsolyov, D.; Petkov, G. Exploring Optical Flow Methods for Automated Fall Detection System. Information 2026, 17, 300. https://doi.org/10.3390/info17030300

AMA Style

Karpuzov S, Kalitzin S, Tabakov S, Tsolyov D, Petkov G. Exploring Optical Flow Methods for Automated Fall Detection System. Information. 2026; 17(3):300. https://doi.org/10.3390/info17030300

Chicago/Turabian Style

Karpuzov, Simeon, Stiliyan Kalitzin, Stefan Tabakov, Dobromir Tsolyov, and Georgi Petkov. 2026. "Exploring Optical Flow Methods for Automated Fall Detection System" Information 17, no. 3: 300. https://doi.org/10.3390/info17030300

APA Style

Karpuzov, S., Kalitzin, S., Tabakov, S., Tsolyov, D., & Petkov, G. (2026). Exploring Optical Flow Methods for Automated Fall Detection System. Information, 17(3), 300. https://doi.org/10.3390/info17030300

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Exploring Optical Flow Methods for Automated Fall Detection System

Abstract

1. Introduction

1.1. Motivation

1.2. Related Works

1.3. Original Contributions

1.4. Article Structure

2. Materials and Methods

2.1. Optical Flow Methods

2.2. Data

2.3. Expanded Motion Features

2.4. Evaluation Parameters

2.5. Statistical Testing

3. Results

Texture Analysis of the Background Scene

4. Discussion

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI