1. Introduction
Since its inception, background subtraction has been a problem of enduring interest among computer vision researchers. Background subtraction is the computer vision technique in which the foreground is separated from the background of a scene. The foreground is the cluster of pixels that denotes a moving object in the scene, captured by a camera, and is distinguished from the background. This segmentation of pixels into foreground and background has many applications in areas ranging from medical imaging, astronomy, machine vision, and tool inspection to action recognition and vehicle detection and tracking [
1]. The advent of autonomous vehicles has caused a resurgence of interest in the background subtraction techniques for detecting moving vehicles, pedestrians, animals, and other objects of interest [
2]. Even though background subtraction is a well-studied problem, many challenges remain open and unsolved. In particular, dynamic scenes that include several moving objects, changes in illumination conditions, shadows, occluded objects, or foreground objects that become part of the background (for instance, when a moving vehicle stops or parks) make background subtraction a difficult and still open problem.
A basic technique to tackle the problem of background subtraction is pixel differencing, that is, subtracting corresponding pixels between two consecutive image frames, with a given threshold differentiating foreground from background pixels. The resulting black-and-white image provides a representation of the pixels belonging to either of the two categories. Complex environments, however, require more sophisticated methods to deal with noise, shadows, changes in illumination conditions, and the time interval after which a pixel is considered background or foreground. A significant number of background subtraction algorithms rely on statistical techniques to tackle the aforementioned issues, for example, mixture of Gaussians [
3] or optical flow [
4] where the motion parameters of pixels are inferred with respect to time and space, thus allowing the segmentation of pixels belonging to multiple foreground objects.
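A minimal sketch of this pixel-differencing baseline is shown below; the toy frame values and the threshold of 25 are illustrative placeholders, not taken from the cited works:

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Mark a pixel as foreground (255) when the absolute difference
    between two consecutive grayscale frames exceeds the threshold."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)

# Toy 2x2 frames: only the top-right pixel changes significantly.
prev = np.array([[10, 10], [10, 10]], dtype=np.uint8)
curr = np.array([[10, 200], [10, 10]], dtype=np.uint8)
mask = frame_difference(prev, curr)  # mask[0, 1] == 255, all others 0
```

The black-and-white mask corresponds to the two pixel categories described above; practical systems add temporal and spatial filtering on top of this bare scheme.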
In this research, the problem of background subtraction is addressed using statistical techniques that model pixel intensities with lognormal distributions. In our method, a static camera captures a small number of frames for inferring the parameters (log mean and log standard deviation) of the lognormal distributions used to describe a scene without any foreground (i.e., moving) objects; this constitutes the training phase of the algorithm. Our method models every single pixel using a lognormal distribution. In the testing phase, every pixel is checked as to whether it belongs to the foreground or the background, given the lognormal parameters estimated in the training phase. We have tested our algorithm on a number of videos and have compared it with the state of the art to validate our methodology. Our proposed model has proven to be robust yet efficient at distinguishing foreground pixels from the background in a parsimonious manner. In addition, our algorithm adapts quickly to changes in illumination, or to objects entering or leaving a scene, by continuously re-sampling pixel intensities and recomputing the lognormal distribution parameters. Our algorithm is fully parallel and has been implemented and tested on NVIDIA Graphical Processing Units (GPUs) [
5] using the CUDA [
6] parallel platform. Our parallel algorithm is based on the serial algorithm that was implemented on a CPU, presented in [
7], with improvements being introduced in the statistical model.
This paper consists of five sections. The next section,
Section 2, presents a review of the state-of-the-art algorithms on background subtraction.
Section 3 provides a mathematical explanation of the statistical model, which is then followed by an explanation of the parallel algorithm.
Section 4 presents the experiments carried out as well as the results from the proposed parallel algorithm compared with the state-of-the-art algorithms. A qualitative analysis is provided. Finally,
Section 5 offers a synopsis of the results obtained as well as a discussion for future research directions and improvements of the proposed algorithm.
2. Background Work
This section reviews representative background subtraction approaches and highlights their modeling assumptions and practical trade-offs. To improve readability, we group prior work by (i) parametric per-pixel models, (ii) feature/texture enhancements, (iii) GPU acceleration, (iv) low-rank/decomposition models, (v) non-parametric and sample-based models, and (vi) deep-learning approaches, followed by deployment considerations and the positioning of the present work.
2.1. Parametric Per-Pixel Background Models (Gaussian Mixtures)
Mixture of Gaussians (MOG) algorithms [
8] are among the most well-known algorithms to date for background subtraction. Each pixel is modeled using Gaussian distributions, and an online approximation is used to update the model. Every pixel that does not fit the Gaussian mixture is considered foreground, with the remaining pixels considered background. The concept of mixture models also appears in [
3], which is based on the framework proposed by [
8]. Post-processing further enhances the overall outcome of background subtraction by utilizing two-pass connected components to merge foreground regions. Furthermore, it is also possible to detect shadows [
9] in foreground pixels.
In [
10], an improved adaptive mixture model for background subtraction is presented, while in [
11], the pixel-level background subtraction is continuously updated using recursive equations from the Gaussian mixture models. Furthermore, a non-parametric adaptive density estimation method is also presented. In [
12], background pixels are described using a statistical representation, while a second statistical method is used for foreground moving objects. A general non-parametric utilization of the density estimation technique is presented for foreground and background pixels. In [
13], the authors present a non-parametric texture-based method using adaptive local binary pattern histograms in regions around a pixel to distinguish the foreground from the background. An adaptive non-parametric background subtraction algorithm is presented in [
14] using non-parametric kernel density estimations. In [
15], an adaptive model is presented where randomly selected values are substituted instead of selecting the oldest ones. A background subtraction for real-time tracking in embedded camera networks appears in [
16], while in [
17], an embedded vision MOG-based algorithm is presented.
In [
18], the authors present a statistical background estimation with Bayesian segmentation and a solution to the multi-target tracking problem using Kalman filters and Gale–Shapley matching. Their method tracks and segments people under variable lighting and illumination conditions. In [
19], the authors use local texture features to differentiate the background from the foreground. Although this method can adapt efficiently to varying illumination conditions, it does not perform well on uniform regions where a homogeneous texture appears.
Strengths of Gaussian mixture models include their computational efficiency and online adaptation, which makes them attractive for real-time systems. Limitations include sensitivity to heavy-tailed deviations (e.g., sudden illumination changes and specularities) and the need to tune learning rates and thresholds to avoid ghost artifacts and false positives.
2.2. GPU Acceleration and Real-Time Systems
A GPU-based algorithm for high-performance MOG background subtraction appears in [
20]. The authors present several GPU optimization techniques such as control flow and register usage optimization implemented on a CUDA platform. In [
21], a parallel implementation is presented using Gaussian mixture models (GMMs) and the Codebook, reporting the efficiency of GMMs over Codebook. A Deep-Neural-Network-based algorithm for background subtraction is proposed in [
22] where a Convolutional Neural Network (CNN) is used on spatial features in grayscale images. In another neural network implementation, the authors in [
23] use multiple layers to combine spatio-temporal correlations among pixels.
GPU implementations exploit the fact that most operations are per-pixel and embarrassingly parallel. In practice, throughput depends not only on arithmetic intensity but also on memory layout, coalesced access, and kernel launch overhead; these considerations motivate designs that minimize passes over the image.
2.3. Low-Rank and Decomposition Models (RPCA/DLAM)
Early change-detection research focused on formalizing the signal model itself. The influential survey in [
24] mapped the field into hypothesis-testing, predictive, and background-modeling families and discussed how statistical rigor invariably traded off against real-time speed on the hardware of the day. That taxonomy framed subsequent work on robust low-rank models, culminating in the convex Principal Component Pursuit formulation proven in [
25]. The authors showed that any video stack can, under mild incoherence conditions, be split exactly into a low-rank background and a sparse foreground, and they supplied an Augmented-Lagrange solver that immediately became the go-to baseline for Robust Principal Component Analysis (RPCA) methods.
The review in [
26] catalogs thirty-two DLAM (Decomposition into Low-rank and Additive Matrices) algorithms—covering Stable-PCA, Robust-NMF (Non-negative Matrix Factorization), and Incremental-SVD—and re-benchmarks them on the BMC2012 corpus, highlighting which variants remain plausibly real-time and which require GPU acceleration. Addressing that speed gap, ref. [
27] recasts RPCA as Robust Online Matrix Factorization: background sub-spaces are updated with stochastic gradients, allowing 250 fps on 720 p streams and graceful recovery from sudden illumination jumps. Yet, even on-line factorization requires dense linear algebra; ref. [
28] sidesteps that by treating background pixels as smooth graph signals and recovering them with graph total-variation priors, which improves F-measure on CDnet’s Dynamic Background scenes while needing far fewer labels than CNN-based rivals.
Low-rank approaches can be highly accurate, particularly in the presence of structured noise, but they often require heavier optimization and memory, which can be challenging for high frame rates or embedded deployments.
2.4. Non-Parametric and Sample-Based Models
Parallel to low-rank theory, the community refined non-parametric pixel models. The Pixel-Based Adaptive Segmenter (PBAS) of [
29] introduced per-pixel thresholds and learning rates that self-tune from recent error statistics, eliminating global hyper-parameters and making PBAS a popular baseline for embedded cameras. The authors of [
30] later standardized evaluation practice by re-implementing twenty-nine classical algorithms in a unified C++ library and publishing exhaustive numbers on synthetic and real footage. Their work, together with the greatly expanded CDnet-2014 benchmark in [
31], gave the field the large, labeled data required to compare algorithms fairly.
Sample-based and non-parametric models can adapt quickly to multi-modal backgrounds and reduce modeling bias, but they increase the memory footprint and may be sensitive to sample-update policies and threshold schedules.
2.5. Deep Learning for Background Subtraction
With datasets in place, deep learning surged. The 2019 synthesis in [
32] explains why fully convolutional networks began to dominate CDnet’s league tables and which open problems—domain adaptation and class imbalance—remain unsolved. Scene-specific CNNs were first explored in [
22], where a modest five-layer network learned the subtraction function directly and already outperformed handcrafted features at 25 fps on a GPU. Generalizing further, ref. [
33] trained a deeper ResNet on only 5% of CDnet frames and fused its logits with a spatial median filter, achieving a top-three average rank while keeping annotation needs low. Performance jumped again with explicit multi-scale design: FgSegNet v2 [
34] combines skip connections and feature-pooling modules to boost the F-measure on cluttered indoor and camera-jitter categories. Hybrid pipelines followed: RT-SBS in [
35] runs ViBe every frame but calls a light semantic segmenter only every k frames, preserving a real-time speed yet lifting recall in crowded scenes. Finally, BSUV-Net [
36] introduces a two-background-frame input and a cosine-annealed augmentation schedule; remarkably, it ranks first on the Unseen Videos track without any scene-specific fine-tuning.
Deep learning models provide state-of-the-art accuracy on large benchmarks, but they typically require labeled data, careful domain adaptation, and significant computing resources, which may limit the throughput on edge devices.
2.6. Deployment Considerations and Downstream Pipelines
Speed, however, is still a constraint for fielded systems on drones or roadside units. GPU-specific kernel fusion and memory tiling in [
20] accelerate classic GMM and ViBe by up to 30× on a GTX 780, while the CUDA port of ViBe for Jetson TX modules in [
37] sustains 30 fps at 720 p within a 10 W power envelope. FPGA solutions surveyed in [
38] show a PBAS core reaching 150 fps VGA at under 3 W, and the edge-oriented study [
39] measures how energy–latency trade-offs shift across ARM CPUs, Mali GPUs, and Jetson NX boards—critical information when sizing airborne inspection rigs.
Most recently, YOLOv8 has become the detector of choice for highway-distress inspection networks. The algorithm-specific improvements in [
40] embed SPD-Conv blocks and adopt WIoU loss, lifting mAP by 6% on RDD2022 while shrinking model size by 22%. Complementarily, ref. [
41] replaces C2f with a Faster-EMA block plus SimSPPF, netting another 5.8% mAP gain and 21% fewer FLOPs. These latest advances illustrate how the field continues to swing between better modeling and faster deployment, echoing the same accuracy–speed tension first articulated two decades ago [
24] and still central to our own work.
Overall, the literature reflects a recurring accuracy–speed trade-off. This motivates methods that remain label-free and lightweight while improving robustness to skewed pixel deviations and exploiting GPU parallelism.
2.7. Positioning and Contribution of This Work
Motivated by the need for fast, label-free per-pixel modeling that remains robust under right-skewed deviations, we propose a lognormal per-pixel deviation model implemented fully in parallel on GPUs. The method uses a short circular buffer to estimate log-domain parameters and produces a probabilistic foreground score via the lognormal cumulative density function.
3. Methods
This section presents the mathematical formulation and GPU implementation of the proposed background subtraction method. We describe (i) a short training phase that estimates per-pixel lognormal parameters from a circular buffer, (ii) a testing phase that computes a probabilistic foreground score for each pixel, and (iii) the parallelization strategy on GPUs. Unless stated otherwise, our experiments use 640 × 480 frames.
3.1. Problem Setup and Notation
Let B(i, j, z) denote the pixel intensity at location (i, j) in the z-th frame of a circular buffer of length n (z = 1, ..., n). Let I(i, j) denote the pixel intensity at location (i, j) in the current test frame.
3.2. Why Lognormal Modeling of Pixel Deviations?
We model deviations (absolute differences) rather than raw intensities. Deviations are nonnegative and, in practice, often exhibit right-skewed distributions due to illumination changes, sensor noise, specularities, and multiplicative effects. Under such conditions, a log-domain representation tends to stabilize variance and leads naturally to a lognormal model for deviations. In
Section 4, we also describe a simple goodness-of-fit comparison pathway to empirically support the choice of lognormal over Gaussian fits.
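This goodness-of-fit pathway can be illustrated with a small, self-contained experiment: fit both a lognormal and a Gaussian by maximum likelihood to right-skewed samples and compare log-likelihoods. The synthetic data and parameters below are assumptions for illustration, not the article's measurements:

```python
import numpy as np

# Synthetic right-skewed deviation samples (an illustrative stand-in for
# per-pixel absolute deviations; the generating parameters are assumed).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.9, size=2000)

# Lognormal fit: maximum-likelihood parameters live in the log domain.
m, s = np.log(x).mean(), np.log(x).std()
ll_lognormal = np.sum(-np.log(x) - np.log(s) - 0.5 * np.log(2 * np.pi)
                      - (np.log(x) - m) ** 2 / (2 * s ** 2))

# Gaussian fit: maximum-likelihood parameters on the raw samples.
mu, sigma = x.mean(), x.std()
ll_gaussian = np.sum(-np.log(sigma) - 0.5 * np.log(2 * np.pi)
                     - (x - mu) ** 2 / (2 * sigma ** 2))

# On right-skewed data, the lognormal fit attains a higher log-likelihood.
assert ll_lognormal > ll_gaussian
```

The same comparison, applied per pixel to observed deviations, supports choosing the lognormal over the Gaussian model.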
3.3. Training and Testing Phase: Log-Domain Parameter Estimation
Our parallel algorithm has been tested in various illumination conditions and is compared with two state-of-the-art algorithms. The results show that our algorithm can fully adapt to environmental changes such as illumination by performing temporal sampling on individual pixels on a constant basis. Furthermore, it can detect shadows as foreground pixels and process a high number of frames per second due to its parallel implementation.
At first, the training phase takes place by capturing a small number of samples (i.e., frames), n, that fill up the buffer while the environment consists of only static objects. During this phase, the mean value of every single pixel, across all frames of the sample, is computed as shown in Equation (1),

mean(i, j) = (1/n) Σ_{z=1}^{n} B(i, j, z),    (1)

where B(i, j, z) is the intensity of the pixel at location (i, j) in the z-th buffered frame, and the sample (buffer) size, in our experiments, is equal to three frames (n = 3). In other words, we compute the mean intensity of all pixels that belong to the same location, (i, j), across multiple frames. In addition, the median is computed for every pixel across all frames. Upon computing the mean of every pixel, we subtract it from every pixel value and take the absolute deviation (Equation (2)),

d(i, j, z) = |B(i, j, z) − mean(i, j)|,    (2)

resulting in the histograms shown in
Figure 1.
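Under the assumption that Equation (1) is the per-pixel temporal mean and Equation (2) the absolute deviation from it (consistent with the motivation in Section 3.2), the training-phase statistics over a three-frame buffer can be sketched as follows (the pixel values are illustrative):

```python
import numpy as np

# Toy circular buffer of n = 3 grayscale frames (values are illustrative).
buffer = np.stack([
    np.array([[100.0, 50.0], [30.0, 80.0]]),
    np.array([[102.0, 52.0], [28.0, 79.0]]),
    np.array([[ 98.0, 51.0], [32.0, 81.0]]),
])

mean = buffer.mean(axis=0)          # per-pixel temporal mean (Equation (1))
median = np.median(buffer, axis=0)  # per-pixel temporal median
deviations = np.abs(buffer - mean)  # per-pixel absolute deviations (Equation (2))

# Each pixel location now has n deviation samples for the lognormal fit.
assert mean[0, 0] == 100.0 and median[0, 0] == 100.0
assert deviations.shape == (3, 2, 2)
```

Each per-pixel operation here is independent of its neighbors, which is what makes the computation map naturally onto GPU threads.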
In this figure, the lognormal distributions differ at every pixel location. Curve fitting on the resulting histograms produces lognormal probability density functions.
The difference between the training-phase and testing-phase deviations is that the former are obtained during training, while the latter are obtained during testing. Equation (4) shows the lognormal pdf, and
Figure 1 provides a pictorial representation of the aforementioned process,
while Equation (5) shows the lognormal cumulative density function (cdf), where erfc is the complementary error function and Φ is the standard normal cdf, as follows:

f(x; μ, σ) = 1 / (x σ √(2π)) · exp(−(ln x − μ)² / (2σ²)), x > 0,    (4)

F(x; μ, σ) = (1/2) erfc(−(ln x − μ) / (σ √2)) = Φ((ln x − μ) / σ),    (5)

where μ and σ denote the log mean and log standard deviation, respectively.
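The two functions can be written directly from the standard lognormal definitions of Equations (4) and (5); the erfc-based form below is the conventional closed form:

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal probability density function (Equation (4))."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) \
        / (x * sigma * math.sqrt(2 * math.pi))

def lognormal_cdf(x, mu, sigma):
    """Lognormal cumulative density function (Equation (5)),
    expressed with the complementary error function erfc."""
    return 0.5 * math.erfc(-(math.log(x) - mu) / (sigma * math.sqrt(2)))

# Sanity check: at x = exp(mu), the cdf equals 0.5 (the lognormal median).
assert abs(lognormal_cdf(math.exp(1.0), 1.0, 0.5) - 0.5) < 1e-12
```

Because erfc is a cheap elementary-function call, the cdf evaluation is well suited to per-thread execution on a GPU.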
By knowing the log mean μ and log standard deviation σ, we can infer a probabilistic score for every pixel using Equation (5). A probabilistic score above a given threshold, T, denotes that the pixel belongs to the foreground, whereas a probabilistic score below the threshold indicates that the pixel belongs to the background. During the testing phase, the model is updated in every frame, with the buffer evicting the oldest frame and incorporating the newest one. All the aforementioned parameters, such as the mean, the median, and σ (log scale), are updated in parallel, exploiting the multiple cores that the GPU offers, to ensure that the model adapts immediately to changes and that a pixel switches from the foreground to the background should it remain unchanged for a period of x frames (a fixed value in our experiments).
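The per-pixel test and the buffer eviction policy can be sketched as follows; the threshold of 0.95 and the per-pixel parameters are placeholder assumptions, and a deque stands in for the circular buffer:

```python
import math
from collections import deque

def is_foreground(deviation, mu_log, sigma_log, threshold=0.95):
    """Score a pixel's deviation with the lognormal cdf (Equation (5));
    scores above the threshold mark the pixel as foreground."""
    if deviation <= 0:
        return False  # no deviation: treat as background
    score = 0.5 * math.erfc(-(math.log(deviation) - mu_log)
                            / (sigma_log * math.sqrt(2)))
    return score > threshold

# FIFO buffer of n = 3 frames: the oldest frame is evicted on append.
buffer = deque(maxlen=3)
for frame_id in range(5):
    buffer.append(frame_id)  # placeholder for a full image frame
assert list(buffer) == [2, 3, 4]  # only the three newest frames remain

assert is_foreground(1000.0, 0.0, 1.0)    # large deviation -> foreground
assert not is_foreground(0.01, 0.0, 1.0)  # tiny deviation -> background
```

In the GPU implementation, every thread applies this test to its own pixel, so the whole frame is classified in a single parallel pass.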
Figure 1.
Lognormal distributions obtained for every pixel during the training phase by sampling temporal pixel values over a three-frame buffer. The buffer is constantly updated with newer frames by evicting the old ones.
3.4. Algorithm Description
Within the if statement in Algorithm 1, we primarily create the variables required by Equations (1)–(3), whose outputs are needed for Equations (4) and (5) in Algorithm 2. In Algorithm 1, before the if statement, we initialize the CUDA/C++ variables used within the GPU processing. To optimize background subtraction using lognormal cumulative density functions, Algorithm 1 must be processed first. In this algorithm, the pixelsPerFrame parameter is crucial as it determines the number of threads required to process the image. If it is not set correctly, the CUDA/C++ implementation will create too many threads. Allocating a pixel variable ensures that the number of threads matches the frame size, fitting within the thread matrix on the GPU. The if statement in the algorithm identifies the threads needed for processing. By avoiding the allocation of unnecessary threads, the algorithm ensures that no overhead is added, which improves processing speed and efficiency by preventing stray threads from accessing memory outside the image bounds.
The method relies on a dynamic frame-handling system, where each new frame replaces the oldest one in a buffer array, following a first-in, first-out (FIFO) structure. This constant refresh of the buffer allows new static objects to become part of the background. The iterative nature of this process, involving both Algorithms 1 and 2, ensures that variables and frames are continuously updated, maintaining a high-quality flow of data for the background subtraction task, including updating the buffer with new pixel intensities. This buffer is referenced throughout both algorithms as B(i, j, z), where (i, j) are the coordinates in the 2D image array and z is the depth, that is, the index of the 2D image array within the buffer that constitutes the 3D array.
The line pixel = threadIdx.x + blockIdx.x * blockDim.x combines the thread's local ID within a block and the block's global ID to assign each thread a unique one-dimensional index across the entire CUDA grid. By doing so, we ensure that each thread operates on one distinct pixel in the image frame. For instance, if the frame resolution is 1920 × 1080 (i.e., 2,073,600 pixels) and each block contains 512 threads, launching 4050 blocks results in exactly one thread per pixel. The if (pixel < pixelsPerFrame) condition ensures that threads beyond the valid range do not access invalid memory, maintaining safe and efficient execution.
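The grid-sizing arithmetic in this paragraph can be checked with a short sketch; 512 threads per block mirrors the example in the text, and the index line reproduces the CUDA expression in plain Python:

```python
import math

def grid_size(pixels_per_frame, threads_per_block=512):
    """Blocks needed so that every pixel gets one thread; the kernel-side
    guard `if (pixel < pixelsPerFrame)` deactivates surplus threads."""
    return math.ceil(pixels_per_frame / threads_per_block)

# The 2,073,600-pixel Full-HD case from the text, at 512 threads/block.
assert grid_size(1920 * 1080) == 4050
# The 640 x 480 resolution used in the experiments divides evenly as well.
assert grid_size(640 * 480) == 600

# Global index, e.g., for thread 7 of block 3 (blockDim.x == 512):
pixel = 7 + 3 * 512  # threadIdx.x + blockIdx.x * blockDim.x
assert pixel == 1543
```

When the pixel count is not a multiple of the block size, the ceiling adds one partially filled block, which is exactly what the guard condition protects against.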
Algorithm 2 is a core component of the background subtraction framework, responsible for computing the cumulative density function (CDF) statistics to classify pixels based on a lognormal model. This process enhances the accuracy of dynamic background subtraction by leveraging statistical normalization techniques.
Table 1 and
Table 2 provide a description of the variables for Algorithms 1 and 2, respectively.
Table 1.
Description of variables in Algorithm 1.
| Variable | Description |
|---|---|
| pixelsPerFrame | Total number of pixels per frame, calculated as frame width × frame height. |
| pixel | Current pixel index. |
| n | Total number of frames in the buffer. |
|  | Sum of all pixel intensities across frames for given coordinates (i, j). |
|  | Mean pixel intensity at (i, j) over all frames. |
|  | Lognormal buffer. |
|  | Sum of all pixel intensities across the lognormal buffer. |
|  | Median of the pixel intensities at (i, j). |
|  | Standard deviation of the pixel intensities at (i, j). |

Algorithm 1: Image Buffer Parameterization
Initially, the algorithm determines the total number of pixels per frame by multiplying the frame's width and height. The if statement ensures that only the necessary threads are allocated by checking whether the current pixel index is within the total number of pixels per frame. Within the if statement, the algorithm performs several sequential transformations to standardize the pixel intensity variations (Equations (1)–(3)).
Table 2.
Description of variables in Algorithm 2.
| Variable | Description |
|---|---|
| pixelsPerFrame | Total number of pixels per frame, calculated as frame width × frame height. |
|  | Latest frame to be processed. |
|  | Lognormalized buffer frame (i.e., Equation (5)). |
|  | Input variability amount. |
|  | Background subtracted image. |

Algorithm 2: CDF-based Statistics Calculation
4. Results
In
Figure 2,
Figure 3,
Figure 4 and
Figure 5, we provide a series of images that our algorithm was tested on. In particular, we have experimented with different videos taken under varying illumination conditions with high-speed moving objects, such as a vehicle, as well as with humans in different scenarios. In the same set of figures, we juxtapose the output of our parallel lognormal algorithm with the state-of-the-art algorithms, namely, MOG [
3] and MOG2 [
10,
11], and provide a qualitative analysis for comparison. For the implementation of MOG and MOG2 algorithms, the
deepgaze (
https://github.com/mpatacchiola/deepgaze, accessed on 3 May 2017) computer vision library has been utilized [
42] using the Python 3 programming language and OpenCV [
43]. The video in
Figure 2 is from [
42], while videos in
Figure 3 and
Figure 4 are from [
44] (
http://study.marearts.com/, accessed on 3 May 2017). The implementation of our parallel algorithm can be found on
Github (
https://github.com/TSUrobotics/Parallel-Background-Subtraction, accessed on 10 May 2023), while the output videos can be found on our
YouTube channel (Background Subtraction playlist) (
https://www.youtube.com/@robotperception6035, accessed on 1 June 2025).
In all of the figures shown (
Figure 2,
Figure 3,
Figure 4 and
Figure 5), our training sample size has been kept at minimum, that is, three frames. In
Figure 2, a moving vehicle is detected successfully across all video sequences. Furthermore, the qualitative analysis demonstrates that our parallel algorithm outperforms the state-of-the-art algorithms. The moving object, in this case a vehicle, is robustly segmented by our parallel algorithm. The MOG algorithm introduces some noise, and the segmented vehicle appears with missing portions, particularly in its center. MOG2 achieves better segmentation of the vehicle, but significant noise is still present throughout the image.
Figure 3 and
Figure 4 feature substantial variations in lighting conditions. In spite of this, our algorithm outperforms the MOG algorithm, while MOG2 is able to differentiate between foreground pixels and shadows (
Figure 3). Finally, in
Figure 4, our algorithm again outperforms both MOG and MOG2. In particular, the MOG algorithm performs poorly in low light conditions, whereas MOG2 exhibits substantial noise, especially in areas containing stationary vehicles.
Finally, in
Figure 5, a highway is shown with multiple vehicles moving at high speeds. The MOG algorithm struggles to effectively segment vehicles that are farther from the camera, though the segmentation improves when vehicles are closer, where more pixels represent the object. While the MOG2 algorithm performs well, it is still affected by noise. In contrast, our parallel algorithm successfully segments all vehicles, regardless of their distance from the camera, and exhibits significantly lower noise. In the upper part of the image, a flock of birds above the highway, visible in the original frame, is correctly segmented by all three algorithms.
The frame processing time has been estimated to be between 10 ms (100 fps) and 15 ms (67 fps) when using the NVIDIA GeForce GTX 970M, while a processing time of 8 ms (125 fps) is achieved with the NVIDIA Titan X. As mentioned, the resolution used was 640 × 480, and the buffer size was equal to three frames (n = 3). Finally, no post-processing, such as connected components or the flood-fill algorithm, has been applied to improve the output of the background subtraction method.
In addition to the aforementioned resolution and buffer sizes, a number of experiments were carried out with higher—raw video—resolutions as well as different buffer sizes. The results are summarized in
Table 3. The system specifications are as follows: an AMD Ryzen 7 9700X processor (3800 MHz, 8 cores, 16 logical processors), 2 × 32 GB of 6000 MHz DDR5 RAM, and an NVIDIA 3080 Ti GPU. The table shows the average per-frame execution time for four video sequences at seven circular-buffer sizes (3, 8, 10, 20, 30, 40, and 50 frames). For the Full-HD sequences, that is, 1920 × 1080 (cars, cctv1, and cctv5), the processing time rises from roughly 9 ms to 76 ms as the buffer grows. The 1280 × 720 highway sequence is consistently faster, sustaining an execution time between 5 ms and roughly 48 ms.
Table 4 shows the frame-per-second (fps) rates achieved. These results illustrate the classic accuracy–latency trade-off: a larger temporal context improves the model stability but reduces real-time throughput, especially for higher-resolution inputs.
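The fps rates follow from the per-frame times via fps = 1000 / t, with t in milliseconds; the conversion can be checked against the endpoints quoted earlier for the 640 × 480 experiments:

```python
def ms_to_fps(t_ms):
    """Convert an average per-frame execution time (ms) to throughput (fps)."""
    return 1000.0 / t_ms

assert round(ms_to_fps(10)) == 100  # GTX 970M, best case
assert round(ms_to_fps(15)) == 67   # GTX 970M, worst case
assert round(ms_to_fps(8)) == 125   # Titan X
```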
In
Figure 6, each row shows the same frame from a given dataset, whereas each column shows the result obtained with a different buffer size; specifically, columns 2–4 correspond to buffer sizes of 3, 8, and 10 frames, respectively. Increasing the buffer size reduces noise, but this comes at the cost of a lower frame rate, as reported in
Table 3.
Parameter Sensitivity
The throughput depends primarily on the buffer size
n and image resolution (
Table 3), while the segmentation behavior depends on both
n and the decision threshold T. Smaller buffers improve speed and reduce the memory footprint but may increase sensitivity to short-term fluctuations; larger buffers typically stabilize parameter estimates at the cost of throughput (
Figure 6). This produces a clear speed–accuracy curve that complements qualitative comparisons. Deep models often require labeled data and may use scene-specific fine-tuning, so it is important to explicitly report the number of labeled training frames (if any), whether fine-tuning was performed, and the inference hardware and throughput (fps). This makes comparisons meaningful in the context of edge deployment.
5. Conclusions and Future Work
In this research, we have addressed the problem of background subtraction using a parallel algorithm implemented on GPU cores. Our proposed algorithm models pixel intensities using lognormal distributions. We compared our algorithm’s output with two of the most well-established mixture of Gaussians (MOG) algorithms. The results obtained show that our method surpasses in most cases the state-of-the-art algorithms. Specifically, our algorithm combines the strengths of both MOG and MOG2, maintaining low noise levels similar to MOG while providing superior object segmentation akin to MOG2 but with significantly less noise. Furthermore, the throughput of our algorithm achieves a high number of processed frames, measured between 9 fps and 188 fps depending on the number of cores available in a given GPU and the video resolution as well as the buffer size. The parallel nature of the algorithm allows it to fully leverage additional cores, further increasing the frame rate. As demonstrated in the figures, the proposed algorithm quickly adapts to changes in lighting conditions and efficiently switches pixel states between the background and foreground, thanks to the minimal sample size required to train the lognormal model.
Our approach is designed to be lightweight and highly parallel, but it may still be challenged by highly dynamic backgrounds (e.g., waving vegetation), strong camera motion, and hard shadow boundaries. In such cases, practical improvements may include (i) mild spatial regularization (median/morphological filtering) on the binary mask, (ii) adaptive thresholding based on scene statistics, and (iii) incorporating motion cues (e.g., optical flow) to suppress false positives. These extensions preserve per-pixel parallelism and can be integrated without changing the core lognormal formulation.
Currently, we are working on an enhancement that leverages KL divergence, where we will compare distributions rather than individual pixels between the training and testing phases. For future research, we plan to adopt a Bayesian approach and optical flow [
45,
46,
47] with the view to more accurately differentiate moving objects from static. Finally, we plan on testing our algorithm on a grid of GPUs to fully exploit the potential of our parallel algorithm. Our algorithm has numerous potential applications in transportation, surveillance, and other fields that involve detecting and tracking moving objects.