2. Materials and Methods
In our original work we combined audio and video features into a single feature vector that was then used as input for an SVM classifier. A general scheme of the system is presented in
Figure 1a. Here, we remove the audio branch entirely. We examine different OF calculation methods and add a new motion feature to our feature vector. The updated system can be viewed in
Figure 1b:
We begin this section by introducing different optical flow calculation techniques.
2.1. Optical Flow Methods
The Horn–Schunck method is a global regularization method that addresses optical flow estimation as a variational minimization problem. The method minimizes a global energy functional:
E(u, v) = ∬ [ (I_x u + I_y v + I_t)² + α² (‖∇u‖² + ‖∇v‖²) ] dx dy,
where the first term enforces brightness constancy and the second imposes smoothness regularization on the velocity field, with the parameter α controlling the trade-off between these competing objectives. The solution is obtained through the Euler–Lagrange equations. The global regularization produces a dense flow field with a velocity estimate at every pixel: even in textureless regions, where the brightness constraint provides no useful information, the smoothness constraint propagates flow from surrounding areas, filling in the velocity field throughout the image. This makes the method particularly valuable for applications requiring complete motion coverage across the entire image domain. This is the OF method we used in our original work.
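The Horn–Schunck iteration can be sketched as follows. This is an illustrative NumPy reimplementation of the classical update equations (with periodic-boundary neighbour averaging for brevity), not our MATLAB implementation; the function name and defaults are our own choices.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=10):
    """Illustrative Horn-Schunck sketch: dense flow (u, v) for two grayscale frames."""
    I1 = np.asarray(I1, float)
    I2 = np.asarray(I2, float)
    # Brightness gradients via simple finite differences.
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def avg(f):
        # 4-neighbour averaging implements the smoothness (Laplacian) term.
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    for _ in range(n_iter):
        u_bar, v_bar = avg(u), avg(v)
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v
```

Note how flow is updated everywhere, including pixels where Ix = Iy = 0: there the update reduces to the neighbourhood average, which is exactly the propagation into textureless regions described above.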
We implemented and tested the Farneback OF algorithm to assess its suitability for fall-related motion analysis. Farneback provides a dense flow field, estimating motion for every pixel, which gives it good sensitivity to broad motion patterns across the entire image. This dense representation is particularly sensitive to the large, abrupt movements typical of fall events. From the resulting flow fields, we extracted frame-wise vertical velocity profiles and computed the standard fall-related features used in prior works (maximum vertical velocity, acceleration, deceleration).
We also integrated the Lucas–Kanade method as a lightweight alternative to Farneback’s dense estimation. Lucas–Kanade tracks small sets of coherent key points, making it faster, sparser, and less memory-intensive. Using the vertical component of the tracked object’s motion, we derived the same fall-related features, enabling a direct comparison between the two methods within the unified evaluation framework.
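The local least-squares solve at the heart of Lucas–Kanade can be sketched as follows (illustrative NumPy for a single point, without the pyramid, feature selection, or noise threshold used in practice; the function name is our own):

```python
import numpy as np

def lucas_kanade_point(I1, I2, y, x, win=3):
    """Illustrative Lucas-Kanade solve: flow (u, v) at pixel (y, x) from a
    win x win window, via the overdetermined system A [u, v]^T = b."""
    I1 = np.asarray(I1, float)
    I2 = np.asarray(I2, float)
    Ix = np.gradient(I1, axis=1)
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    # Each window pixel contributes one brightness-constancy equation.
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv  # [u, v]
```

The least-squares solve makes the local structure requirement explicit: if the window lacks gradients in two independent directions, the system is rank-deficient, which is the sparse counterpart of the aperture problem discussed later.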
A different variation of the standard LK method is the so-called Lucas–Kanade Derivative of Gaussian (LKDoG). Compared with the original Lucas–Kanade (LK) method, LKDoG differs primarily in how it computes derivatives. For spatial derivatives, LKDoG first applies Gaussian smoothing to the whole input image, then computes the derivatives with DoG filters, followed by additional Gaussian smoothing applied to the gradient fields. The method uses separate standard deviation parameters for image smoothing and gradient smoothing to control the characteristics of these filters. For temporal derivatives, it applies a DoG filter not to two consecutive frames but to a window of multiple frames. Using DoG filters provides better noise reduction and smoother gradient estimates than the simple finite-difference approximations used in basic Lucas–Kanade implementations, and the multi-frame temporal filtering makes the method more robust to insignificant temporal variations.
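The DoG idea can be illustrated with a one-dimensional derivative-of-Gaussian kernel. The default sigma = 1.5 mirrors the image-smoothing standard deviation we report in Section 3; the kernel radius and function names are illustrative choices.

```python
import numpy as np

def dog_kernel(sigma=1.5, radius=4):
    """1-D derivative-of-Gaussian filter: the analytic derivative of a
    normalized Gaussian, used for smoothed gradient estimates."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                 # normalized Gaussian
    return -x / sigma**2 * g     # d/dx of the Gaussian

def dog_gradient(I, sigma=1.5, radius=4):
    """Smoothed horizontal gradient: convolve each row with the DoG kernel."""
    k = dog_kernel(sigma, radius)
    I = np.asarray(I, float)
    return np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, I)
```

Unlike a finite difference, which reacts to single-pixel noise, the DoG kernel differentiates and smooths in one pass, which is the noise-reduction benefit described above.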
The spectral optical flow iterative algorithm (SOFIA) proposes a novel approach for optical flow reconstruction that integrates local structural information from multiple spectral components, such as color channels, to address the inverse problem. To tackle singularities, the method constructs a structural tensor using aggregated spatial-spectral gradient data, effectively enhancing the rank of the system. Furthermore, the algorithm introduces a spatial smoothening functionality applied directly to the structural tensor within local neighborhoods, which avoids the gradient cancellation effects often observed when smoothening images directly. Subsequently, an iterative multi-scale scheme is utilized where the vector field reconstructed at coarser scales deforms the source image to refine the input for finer scale calculations. Validations demonstrate that this approach significantly improves reconstruction accuracy and functional association compared to standard methods like Horn–Schunck, particularly when processing complex global translations.
In summary, a brief description of the methods is available in
Table 1:
3. Results
A Lenovo® ThinkPad (Lenovo, Hong Kong) with an Intel® Core i5 CPU (Intel, Santa Clara, CA, USA) and 16 GB of RAM is used to process the videos from our sets. Algorithm realization is carried out in the MATLAB® R2023b environment.
The parameters of the different optical flow methods are the following:
Horn–Schunck—smoothness value is set to 1, maximum number of iterations is set to 10;
Lucas–Kanade—window size is set to 3 × 3, noise threshold is set to 0.039;
Farneback—number of pyramid levels is set to 3, pyramid scale is set to 0.5, size of the pixel neighborhood is set to 5, number of iterations per pyramid level is set to 3;
LKDoG—number of buffered frames is set to 3, noise threshold is set to 0.039, standard deviation for image smoothing filter is set to 1.5;
SOFIA—smoothing scale is set to 3, vertical and horizontal structural integration scales are set to 3.
In the paper, we follow the same SVM setup as our previous work: The LE2I dataset, consisting of 190 videos, is first randomly divided into two equal parts to form separate training and testing sets. For the classification task, the model utilizes an SVM equipped with a radial basis function (RBF) kernel. Before being fed into the classifier, the extracted event features are logarithmically transformed and standardized to possess a zero mean and unit standard deviation. The model’s parameters are tuned strictly on the training set using a 10-fold cross-validation scheme to ensure robust performance and prevent data leakage. The UG dataset is used exclusively for testing.
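The leakage-free preprocessing described above (log-transform, then standardization with training-set statistics only) can be sketched as follows; the function name and the small eps guard against log(0) are our illustrative additions:

```python
import numpy as np

def preprocess_features(train_X, test_X, eps=1e-9):
    """Log-transform the event features, then standardize both splits to zero
    mean and unit standard deviation using ONLY training-set statistics,
    so no information leaks from the test set."""
    train_log = np.log(np.asarray(train_X, float) + eps)
    test_log = np.log(np.asarray(test_X, float) + eps)
    mu = train_log.mean(axis=0)
    sd = train_log.std(axis=0)
    sd[sd == 0] = 1.0  # guard against degenerate (constant) features
    return (train_log - mu) / sd, (test_log - mu) / sd
```

The standardized features would then be fed to the RBF-kernel SVM, whose hyperparameters are tuned with 10-fold cross-validation on the training split alone.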
Our first contribution is investigating how different optical flow methods affect the overall detection algorithm. We have tested the following five: Lucas–Kanade, Horn–Schunck, Lucas–Kanade Derivative of Gaussian, SOFIA and Farneback. Results for the LE2I dataset, presented as ROC curves and performance metrics, are available in
Figure 3:
Here we can see that the three best-performing methods in terms of area under the ROC curve are Horn–Schunck, Farneback and SOFIA. Detailed performance metrics—accuracy, sensitivity, specificity, precision, F1-score and AUC—are also presented. These values confirm the findings from the ROC curves for the three best-performing OF methods.
In addition, we present ROC curves and performance metrics for the second dataset—UG. The results are available in
Figure 4:
From these results it would seem that the best performing OF method is SOFIA. Adding a new feature to the feature vector—namely the width of the velocity curve where V_y is at a maximum—gives us the following results, presented in
Figure 5:
Figure 5.
(Left): Performance metrics for the new feature vector on LE2I data. (Right): Performance metrics for the new feature vector on UG data. The maximum value for each metric is given in bold and black. Stronger shades of green indicate a better result. All values are in the range from 0 to 1. A higher value indicates a better result.
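The new width feature can be sketched as follows. Here it is measured as the number of contiguous frames around the V_y peak that stay above a fraction of the maximum; the 0.5 threshold and function name are illustrative choices, not necessarily the exact definition in our implementation.

```python
import numpy as np

def peak_width(vy, level=0.5):
    """Illustrative width-at-peak feature: count contiguous frames around the
    maximum of the vertical-velocity profile vy that stay above level * max."""
    vy = np.asarray(vy, float)
    k = int(np.argmax(vy))
    thr = level * vy[k]
    lo = k
    while lo > 0 and vy[lo - 1] >= thr:
        lo -= 1
    hi = k
    while hi < len(vy) - 1 and vy[hi + 1] >= thr:
        hi += 1
    return hi - lo + 1
```

A genuine fall produces a narrow, high peak in V_y, whereas slower activities (sitting, lying down) spread comparable velocity over more frames, which is what this width is meant to capture.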
We can see that for LE2I, LKDoG excels, while for the UG dataset, SOFIA is still the best performing. We can also compare how the addition of the new feature affected performance. In
Figure 6 we show the difference in performance metrics between the original and updated feature vector.
We provide information about the bootstrapped 95% confidence intervals for both datasets:
Figure 7 illustrates the bootstrapped 95% confidence intervals for the area under the curve of the evaluated OF methods on both datasets. On the heavily textured LE2I dataset, LKDoG (mean AUC 0.899, confidence bounds 0.812–0.950) heavily overlapped with the other classical methods, including the lowest-performing SOFIA method. This significant overlap indicates no definitive statistical superiority among the top methods in known environments. However, performance on the unseen UG dataset revealed severe degradation in performance for the majority of the algorithms. Notably, LKDoG and HS experienced drastic mean AUC reductions (dropping to 0.590 and 0.450, respectively), rendering their predictive capabilities equivalent to or worse than random chance. Conversely, SOFIA demonstrated unique environmental invariance, increasing its mean AUC to 0.780 and maintaining predictive viability despite the expanded error bounds inherent to the out-of-distribution testing.
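The percentile-bootstrap procedure behind such intervals can be sketched as follows (illustrative: the rank-based AUC formula, resample count, and seed are our choices, not necessarily those used to produce Figure 7):

```python
import numpy as np

def auc_mann_whitney(y_true, scores):
    """AUC via the Mann-Whitney formulation: fraction of (positive, negative)
    pairs ranked correctly, with ties counted as 0.5."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, float)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)        # resample with replacement
        yb = y_true[idx]
        if yb.min() == yb.max():           # skip single-class resamples
            continue
        stats.append(auc_mann_whitney(yb, scores[idx]))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```

The wide UG intervals reported above follow directly from this construction: with few test videos, each resample can flip several predictions, spreading the bootstrap distribution.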
Figure 8 presents the pairwise McNemar’s test p-values, evaluating the statistical significance of differences in the final binary predictions of the models. On the LE2I dataset, the test confirms a statistical tie among the top-tier classical estimators (LKDoG and Farneback, p = 1.000), further corroborating the AUC confidence interval findings. The test also successfully isolated the lowest-performing method, SOFIA, as statistically inferior to LKDoG (p = 0.024) and Farneback (p = 0.029) in this specific environment. In contrast, the heatmap for the out-of-distribution UG dataset shows that all pairwise p-values exceed the 0.05 threshold. This lack of statistical significance across the board reflects the high error variance and shifting performance boundaries that occur when these models are applied to drastically different conditions.
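An exact-binomial form of McNemar’s test on paired predictions can be sketched as follows (illustrative; the function name is ours, and the figure could equally be produced with the chi-square approximation):

```python
import math

def mcnemar_exact_p(correct_a, correct_b):
    """Exact (binomial) two-sided McNemar's test on paired binary outcomes.
    Inputs are per-sample booleans: whether model A / model B classified
    each sample correctly. Only discordant pairs carry information."""
    b = sum(1 for ca, cb in zip(correct_a, correct_b) if ca and not cb)
    c = sum(1 for ca, cb in zip(correct_a, correct_b) if cb and not ca)
    n = b + c
    if n == 0:
        return 1.0  # models disagree on no sample: no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair favours either model with probability 0.5.
    p = 2 * sum(math.comb(n, i) for i in range(k + 1)) * 0.5**n
    return min(1.0, p)
```

Because the test conditions on discordant pairs only, two models with identical accuracy but disjoint error sets can still differ significantly, which is why it complements the AUC intervals above.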
In addition, we examine the correlation between the existing features and the newly introduced σ. We calculate the Pearson linear correlation coefficient between σ and the old features. The results are presented in
Table 2.
These results demonstrate that the absolute correlation coefficients remain low (∣r∣ < 0.30), indicating a weak to negligible linear relationship. Crucially, we highlight a fundamental mathematical distinction: while the maximum vertical velocity, acceleration, and deceleration function as differential descriptors identifying instantaneous local extrema, σ serves as an integral temporal descriptor.
Finally, we examine OF method runtimes for a video with FPS = 25 and N = 150 total frames. The video frame size is 320 × 240 pixels. Results are presented in
Figure 9:
These results are expected and relate to the nature of the algorithms used—sparse versus dense. In the case of LKDoG, Farneback and SOFIA, the extra assumptions, conditions and information used by the methods lead to a higher time-performance cost. Nevertheless, their calculation speeds do not preclude real-time use.
Texture Analysis of the Background Scene
We carry out quantitative analysis of the texture information of the background scene. The process begins by sequentially extracting 100 frames from two separate videos from each dataset. A 2D directional Sobel spatial filter [26] is applied to the frame data to compute the gradient magnitude matrix, which effectively highlights regions containing strong structural edges and high contrast. To quantify the overall structural complexity of the image, this matrix is flattened and the arithmetic mean of its elements is calculated to yield a single average gradient magnitude (AGM) scalar value. This is systematically repeated for every extracted frame across both videos. Finally, the resulting temporal AGM values are plotted in
Figure 10 in order to visualize the dynamic texture changes between the two videos.
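The AGM computation can be sketched as follows (illustrative NumPy with an explicit 3 × 3 Sobel filtering; border pixels are cropped, and the function names are our own):

```python
import numpy as np

def average_gradient_magnitude(frame):
    """AGM sketch: mean Sobel gradient magnitude over one grayscale frame."""
    I = np.asarray(frame, float)
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], float)   # horizontal Sobel kernel
    ky = kx.T                            # vertical Sobel kernel

    def filt2(img, k):
        # 3x3 filtering via explicit shifts ('valid' region only),
        # keeping the sketch numpy-only.
        out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
        for di in range(3):
            for dj in range(3):
                out += k[di, dj] * img[di:di + out.shape[0], dj:dj + out.shape[1]]
        return out

    gx, gy = filt2(I, kx), filt2(I, ky)
    return float(np.mean(np.hypot(gx, gy)))
```

A flat wall yields an AGM near zero, while cluttered domestic scenes produce high values, which is exactly the contrast between the UG and LE2I backgrounds quantified above.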
These results clearly show the textural difference between the scenes of the two video sets. These properties of the data have a large effect on the performance of different OF methods for fall detection.
Figure 10.
AGM for each individual frame for two videos from LE2I and UG.
4. Discussion
In this study, we presented a comprehensive evaluation of a vision-based fall detection system that integrates various optical flow (OF) algorithms with a standard ML classifier (namely an SVM). We tested the OF methods on two datasets and can summarize our findings as follows:
On the LE2I dataset, the Horn–Schunck and Farneback methods demonstrate the strongest performance with the original feature vector. Horn–Schunck achieves the highest overall metrics, with a sensitivity of 98.4%, an F1-score of 93.3% and an AUC of 0.962. Farneback shows the second-best performance across all categories. The introduction of the new feature vector significantly shifts the results for LE2I. The Lucas–Kanade Derivative of Gaussian method sees a dramatic improvement, becoming the top performer with increases in accuracy (+22.1%) and specificity (+35.5%). This suggests that LKDoG benefits most from the additional temporal shape information in simpler environments.
In the UG dataset, SOFIA emerges as the most robust method using the original feature vector, achieving the highest accuracy (75.0%), sensitivity (90.0%), and F1-score (78.3%). While Horn–Schunck struggles here with a low accuracy of 45.0%, SOFIA maintains high reliability, outperforming standard methods like Lucas–Kanade and Farneback. Unlike the LE2I dataset, adding the new feature vector to the UG dataset generally degrades performance for most methods. SOFIA’s sensitivity drops by 30%, and Horn–Schunck sees decreases across multiple metrics. Only Lucas–Kanade shows a mild improvement in sensitivity (+30%) but suffers in specificity (−30%), indicating that the new feature may introduce noise or overfitting in this dataset.
The performance disparity on the UG dataset—where Horn–Schunck (HS) failed (45.0% accuracy) while SOFIA excelled (75.0%)—can be attributed to the handling of homogeneous backgrounds. Our AGM analysis shows that the LE2I dataset yielded an AGM of roughly 52, whereas the UG dataset demonstrated a significantly reduced AGM of roughly 15. This comparative analysis confirms that the UG dataset constitutes a homogeneous, low-texture environment relative to the high-frequency spatial components found in the domestic and office backgrounds of the LE2I scenes. The observed performance degradation of the HS algorithm on the UG dataset is directly attributable to its reliance on global smoothness constraints. In such under-determined, textureless regions, the HS method encounters an ill-posed inverse problem: the “aperture problem” [27], where the lack of local gradients forces the algorithm to rely on artificial flow propagation, thereby distorting the subject’s silhouette. Conversely, SOFIA demonstrates superior architectural robustness, maintaining an accuracy of 75% in identical conditions. This stability is facilitated by SOFIA’s implementation of Tikhonov regularization on the structural tensor. By utilizing aggregated spatial-spectral gradient data to mitigate singularities, SOFIA effectively stabilizes the motion estimation in homogeneous areas, preserving critical geometric features where global smoothness constraints typically fail. The analysis of the background scene and its effect on detection accuracy can help increase the robustness of the method. As it stands, our system is intended for indoor scenarios that include a small number of people. Solving the problem of dynamic scene variation can improve the generalization of our work.
Our statistical tests show the following results: there is a lack of absolute statistical superiority among the evaluated optical flow methods, which fundamentally supports the necessity of a multi-method approach for real-world fall detection. On the in-distribution LE2I dataset, both 95% confidence intervals and pairwise McNemar’s tests (p > 0.05) confirmed that top-performing classical methods like HS, LKDoG and Farneback are in a statistical tie. This indicates that within richly textured environments, no single method acts as a universal solution. The limitations of relying on a single, isolated estimator are further exposed by the out-of-distribution UG dataset results. When transitioned to homogeneous, low-texture backgrounds, algorithms that excelled on the LE2I dataset suffered severe relative degradation, with AUC scores plummeting to near-random chance (AUC roughly equal to 0.500). In stark contrast, the SOFIA estimator, despite being statistically inferior on the textured LE2I dataset (p < 0.05), demonstrated remarkable structural robustness on the UG dataset, maintaining a highly viable predictive AUC of 0.780.
While the small sample size of the UG dataset resulted in wide confidence intervals and no definitive statistical dominance, the practical engineering implications are clear. SOFIA shows immunity to the domain shift that disabled the other methods, highlighting its unique value in homogeneous environments. Ultimately, this performance inversion empirically demonstrates that optical flow estimator efficacy is strictly environmentally dependent. To break through the current accuracy upper bound, future robust fall detection systems must abandon single-method architectures in favor of context-aware combinations, utilizing algorithms like LKDoG or HS for textured backgrounds while dynamically routing homogeneous video frames to more resilient algorithms like SOFIA.
One limitation of the current work is the relatively small size of the test dataset (UG). Misclassification of a small number of videos can create large differences in reported performance. However, the UG dataset provides a preliminary look at out-of-distribution performance. For future development, more specialized video data is needed to ensure more rigorous validation.
One future direction of work is to determine whether SOFIA can benefit from calculation on a graphics processing unit. We will explore full GPU acceleration (utilizing CUDA kernels) to offload the structural tensor computation and Gaussian smoothing steps.
A different direction for future work is to find better ways to utilize the temporal nature of the fall movement. The proposed new feature provided only marginal improvements for some OF methods. A natural candidate for such analysis would be to include an LSTM (long short-term memory) block or a lightweight Transformer that can help model the temporal properties of the data. A larger volume of test data may still prove the usefulness of σ.
While this study focuses on classical and spectral optical flow methods suitable for CPU-bound edge devices, we acknowledge the emergence of deep learning-based estimators like RAFT (recurrent all-pairs field transforms) [28]. RAFT offers superior accuracy in detecting fine micro-movements, but its heavy GPU requirement currently makes it cost-prohibitive for large-scale, privacy-preserving edge deployment in every hospital room. Future iterations of our system may explore lightweight distilled versions of such networks (e.g., FlowSeek or LiteFallNet) to balance this trade-off.
Finally, we aim to explore how changing the ML method for classification can increase performance. In our original work, the SVM classifier showed the best performance when compared to other existing machine learning techniques. Since then, there has been rapid development of new methods that could potentially be better for the task at hand. Such an examination would deserve its own separate study.