Article

FreeViBe+: An Enhanced Method for Moving Target Separation

Jianwei Wu, Keju Zhang, Yuhan Shen and Jiaxiang Lin

1 Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China
2 Key Laboratory of Smart Agriculture and Forestry (Fujian Agriculture and Forestry University), Fujian Province University, Fuzhou 350002, China
* Author to whom correspondence should be addressed.
Information 2025, 16(12), 1052; https://doi.org/10.3390/info16121052
Submission received: 21 October 2025 / Revised: 24 November 2025 / Accepted: 26 November 2025 / Published: 1 December 2025

Abstract

An enhanced method for moving target segmentation, called FreeViBe+, is proposed in this paper, addressing limitations of the ViBe algorithm such as ghosting, shadows, and holes. To eliminate ghosts, multi-frame background modeling is introduced. Shadows are detected and removed based on their characteristics in the HSV color space, while holes are filled by merging GrabCut segmentation results with the ViBe extraction output. Furthermore, the fusion weight is tuned using the Structure-measure, enabling improved foreground–background separation. Comprehensive experiments on the UCF101 and Weizmann datasets demonstrate the effectiveness of FreeViBe+ in comparison with Finite Difference, Gaussian Mixture Model, and ViBe methods. Ablation studies confirm the individual contributions of multi-frame modeling, shadow removal, and GrabCut refinement, while sensitivity analysis verifies the robustness of key parameters. Quantitative evaluations show that FreeViBe+ achieves superior precision, recall, and F-measure compared with existing approaches.

1. Introduction

Enhancing the foreground–background S-measure in videos is a crucial step in moving target detection and tracking within dynamic scenes and represents one of the key and challenging issues in computer vision and image processing. Numerous methods for enhancing the foreground–background S-measure have been proposed in the literature, including classical approaches such as background subtraction, Gaussian Mixture Model (GMM) background modeling, and the Visual Background Extractor (ViBe) algorithm for foreground detection. However, the accuracy of existing S-measure enhancement algorithms is significantly affected by factors such as background disturbances, lighting variations, and camera shake.
Although researchers have introduced various improved techniques to address issues such as shadows, ghosting, and holes, none has fully resolved the problem of an incomplete S-measure. Therefore, this paper proposes an enhanced algorithm, called FreeViBe+, that integrates the ViBe algorithm with the GrabCut algorithm to better eliminate shadows, ghosting, and holes in the foreground. It uses a targeted, sequential pipeline for cascade error mitigation, so that each step directly addresses the primary weakness of the previous one. Furthermore, the effectiveness and superiority of the enhanced algorithm are validated using the F-measure and the S-measure.
The primary contributions of this work are threefold. First, we introduce a novel framework that strategically integrates the efficiency of ViBe for initial motion detection with the refinement capabilities of GrabCut, moving beyond a simple pipeline to an adaptive fusion model. Second, the proposed framework incorporates an innovative feedback mechanism, where the S-measure evaluation metric is not merely used for final assessment but is also actively employed to guide the iterative refinement process in GrabCut, dynamically adjusting segmentation parameters based on structural similarity outcomes. Third, we enhance the classic ViBe and shadow detection components with a context-aware update policy, significantly improving robustness against dynamic backgrounds and accuracy in distinguishing shadows from foreground objects. These contributions collectively result in a system that achieves a superior balance between computational efficiency and segmentation precision compared with standalone or merely combined classical approaches.

2. Related Work

A substantial body of research on video-based moving object detection via background and foreground extraction has been reported in the literature; it can be broadly categorized into several main families of algorithms, including background subtraction, Gaussian Mixture Models, and neural network methods [1,2,3].
For background subtraction-based moving object detection, representative work includes that by Panda et al. [4], who proposed a background extraction technique for moving objects in complex environments based on a fuzzy color difference histogram (CDH), testing and validating it in challenging scenarios such as water ripples, fountains, and swaying vegetation. Barnich et al. [5] presented a technique for motion detection that incorporates several innovative mechanisms. Chen et al. [6] proposed a new weighted kernel density estimation technique to build long-term background and short-term foreground models to flexibly represent the long-term state and the short-term changes in a scene. Aliouat et al. [7] presented an enhanced video background subtraction algorithm, with a controlled adaptive threshold selection method for low-cost surveillance systems.
As for Gaussian Mixture Model-based moving object detection, typical works include that by Xie et al. [8], who leveraged low-rank matrix approximation (LRMA) in computer vision to introduce an image denoising and background extraction method based on weighted Schatten p-norm minimization. Javed et al. [9] studied moving target detection in complex environments via spatio-temporal robust principal component analysis (RPCA). Dewan et al. [10] proposed fuzzified color difference histogram-based background modeling to significantly deal with complex background scenes, followed by principal component analysis-based feature extraction. Zhang et al. [11] designed a spiking auto-encoder network based on the noise resilience and time-sequence sensitivity of spiking neural networks to enhance the S-measure of foreground and background.
Moving target detection based on neural networks and deep learning has gained significant popularity in recent years, with typical works being the following: Babaee et al. [12] proposed a background extraction method for video sequences based on a deep convolutional neural network (CNN) capable of real-time processing, without the need for feature selection or parameter tuning, and validated its performance on the change detection benchmark dataset CDnet 2014. Bouwmans et al. [1] provided the first review of deep neural network concepts in background subtraction for novices and experts, in order to analyze this success and to provide further directions. Houhou et al. [13] proposed a novel deep learning model, called deep multi-scale network (DMSN), for background subtraction. Yang et al. [14] proposed a new background subtraction algorithm named spatio-temporal propagation network. Zhao et al. [15] proposed a universal background subtraction framework based on the arithmetic distribution neural network for learning the distributions of temporal pixels. Dai et al. [16] proposed an encoder–decoder-type deep neural network to tackle the task of moving object detection from video sequences. Xiong et al. [17] utilized background subtraction, which is highly sensitive to dynamic pixels, to provide YOLO with the location and features of small dynamic targets, thus reducing the missed detection rate of small targets.
While these algorithms are widely applied and perform well, they each possess certain limitations. For instance, background subtraction requires obtaining the original video’s background, which must be static, limiting its real-time applicability; Gaussian Mixture Models face computational load issues and often fail to achieve a complete foreground–background S-measure; neural network methods demand large volumes of training data and suffer from poor interpretability of results; and the ViBe algorithm often produces foregrounds with issues such as shadows and ghosting [1,18].
Compared with other moving target foreground detection algorithms, the ViBe algorithm [3,5,19] was the first to introduce random sampling and neighborhood propagation mechanisms into background modeling and updating. Since it initializes the background model using the first frame of the video sequence, it offers high real-time performance and robustness, with low computational cost and high efficiency. However, being based on statistical principles, the ViBe model is constrained by its sample size. When foreground and background colors are similar, the foreground can be misclassified as background, leading to problems such as ghosts, shadows, and voids in the foreground [18]. In response to these issues, numerous researchers have proposed improvements. Notable efforts include that by Cheng et al. [20], who addressed background modeling by proposing multi-frame motion region detection and background modeling methods to suppress ghosting. Ye et al. [21] proposed an enhanced ViBe algorithm to improve the accuracy of moving object detection. Ma et al. [22] proposed an improved ViBe algorithm combined with the average background to address the ghost phenomenon in ViBe foreground detection and the difficulty of eliminating it over the long term.
Although numerous studies on moving target extraction have made improvements in ghost suppression, shadow removal, complete target extraction, and computational complexity reduction, existing algorithms still struggle to achieve a satisfactory foreground–background S-measure for moving targets. No single method has fully resolved the issues of ghosts, shadows, and voids. Therefore, from the perspectives of background modeling, shadow removal, and multi-feature extraction result fusion, this study proposes an enhanced method for enhancing the foreground–background S-measure of moving targets.

3. FreeViBe+: An Enhanced Foreground–Background S-Measure Algorithm

By employing a targeted, sequential pipeline for cascade error mitigation, the enhanced FreeViBe+ algorithm significantly reduces ghosting through multi-frame background modeling, improves shadow removal via HSV-based shadow detection, and integrates the GrabCut image segmentation algorithm to extract more complete contours of moving targets while mitigating internal voids, thereby substantially improving moving target detection performance. The algorithm's underlying principles are introduced in the following subsections.

3.1. Principles

3.1.1. Ghost Elimination Based on Multi-Frame Background Modeling

The ViBe algorithm selects the first frame containing foreground objects for background modeling, and subsequent frames extract foreground targets based on this background. However, this approach is susceptible to interference from the initial background, leading to ghosting issues. To address this, the FreeViBe+ algorithm constructs the background model using the first t frames of video data that include foreground objects. By applying inter-frame difference analysis, moving objects in these initial frames are classified as foreground, while the remaining static parts are treated as background. This method avoids simply treating the first frame as the background, significantly reducing the impact of the initial background on subsequent moving target extraction and thereby minimizing the likelihood of ghosting.
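As a concrete illustration, the following Python/NumPy sketch shows one way this multi-frame initialization could be realized; the function name, the differencing threshold, and the mean/median combination are our illustrative assumptions, not the authors' reference implementation.

```python
# Illustrative sketch of multi-frame background initialization via
# inter-frame differencing (Section 3.1.1). Names and the threshold
# value are assumptions for exposition, not the authors' code.
import numpy as np

def initialize_background(gray_frames, t=3, diff_thresh=15):
    """Build a background image from the first t grayscale frames."""
    frames = np.stack(gray_frames[:t]).astype(np.float32)   # (t, H, W)
    # Union of inter-frame differences marks pixels touched by motion.
    motion = np.zeros(frames.shape[1:], dtype=bool)
    for prev, curr in zip(frames[:-1], frames[1:]):
        motion |= np.abs(curr - prev) > diff_thresh
    # Static pixels: temporal mean; motion pixels: temporal median as a
    # robust fallback so no hole is left in the background model.
    background = np.where(motion, np.median(frames, axis=0),
                          frames.mean(axis=0))
    return background.astype(np.uint8)
```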

3.1.2. Shadow Detection and Removal Based on HSV and Luminance

Under strong illumination in particular, shadows move with nearly the same speed and magnitude as the objects casting them, so the ViBe algorithm often misidentifies shadows as part of the foreground. The FreeViBe+ algorithm leverages a characteristic of shadows: in shadowed areas of the current frame, the sum of the hue H, saturation S, and value V of each pixel is less than the sum of the corresponding hue h, saturation s, and value v in the background image. Pixels in the current frame that satisfy both conditions, a sum of HSV values lower than the background and a brightness V lower than the average brightness of all backgrounds, are classified as shadows and removed as part of the background. In FreeViBe+, the criterion for determining whether a pixel is a shadow point is given in Formula (1), where 0 denotes a background point and 1 a foreground point.
$$\mathrm{IsShadow}(x,y)=\begin{cases}0, & \mathrm{HSV}(x,y)<\mathrm{hsv}_i(x,y)\ \text{and}\ V(x,y)<\bar{v}(x,y),\ \forall i\in\{1,2,\dots,k\}\\ 1, & \text{otherwise}\end{cases}\quad(1)$$
Here, $\mathrm{HSV}(x,y)$ denotes the sum of the hue H, saturation S, and brightness V of the current frame at point $(x,y)$; $\mathrm{hsv}_i(x,y)$ denotes the sum of the hue h, saturation s, and brightness v of the i-th background image at $(x,y)$ (there are k background images in total); $V(x,y)$ is the brightness of the current frame at $(x,y)$; and $\bar{v}(x,y)=\frac{1}{k}\sum_{i=1}^{k} v_i(x,y)$ is the average brightness of the background images at $(x,y)$.
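For clarity, a minimal NumPy/OpenCV sketch of this shadow test is given below; the array layout and helper name are illustrative assumptions, and OpenCV's 8-bit HSV ranges (H in [0, 179], S and V in [0, 255]) are assumed.

```python
# Minimal sketch of the shadow criterion in Formula (1); variable names
# are illustrative, not the authors' exact code.
import cv2
import numpy as np

def shadow_mask(frame_bgr, backgrounds_bgr):
    """True where a pixel of the current frame is classified as shadow."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv_sum = hsv.sum(axis=2)               # H + S + V of the current frame
    v = hsv[..., 2]                         # brightness V of the current frame

    bg_hsv = np.stack([cv2.cvtColor(b, cv2.COLOR_BGR2HSV).astype(np.float32)
                       for b in backgrounds_bgr])        # (k, H, W, 3)
    bg_sums = bg_hsv.sum(axis=3)            # h + s + v of each background image
    v_bar = bg_hsv[..., 2].mean(axis=0)     # per-pixel mean background brightness

    # Formula (1): below every background's HSV sum AND below mean brightness.
    return (hsv_sum[None] < bg_sums).all(axis=0) & (v < v_bar)
```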

3.1.3. Void Filling by Integrating the GrabCut Algorithm

Due to the limited number of statistical model samples, the ViBe algorithm falls short of achieving the infinite sample size required to accurately describe a scene. As a result, whether for rigid or non-rigid objects, the extracted contours may appear relatively complete but often contain internal voids or intermediate fractures. In contrast, object extraction based on the image segmentation algorithm GrabCut [23] typically yields more complete internal regions, though the contours may be fragmented.
To address these complementary limitations, the enhanced FreeViBe+ algorithm integrates both approaches: it uses the ViBe algorithm to obtain the contour of the moving object and the GrabCut algorithm to extract its internal regions. The fusion weight between the ViBe result and the GrabCut result is then adjusted, based on the Structure-measure (referred to hereafter as the S-measure) [24,25] as an evaluation metric. This enables the algorithm to achieve a more satisfactory foreground S-measure of the moving target. The fusion criterion is defined by Formula (2).
$$R_i(x,y)=G_i(x,y)\cdot\omega + C_i(x,y)\cdot(1-\omega)\quad(2)$$
For the i-th grayscale frame of the video, $G_i(x,y)$ is the result extracted by the ViBe algorithm, $C_i(x,y)$ is the result obtained by the GrabCut algorithm, $R_i(x,y)$ is their fused result, and $\omega$ is the fusion weight.
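A direct transcription of Formula (2) is straightforward; in the sketch below, the 0.5 binarization threshold and the function name are our assumptions, while the S-measure-driven choice of $\omega$ is left as a pluggable step.

```python
# Sketch of the weighted fusion in Formula (2). The binarization
# threshold of 0.5 is an assumption for producing a binary mask.
import numpy as np

def fuse_masks(g_i, c_i, omega=0.5):
    """Fuse the ViBe result G_i and the GrabCut result C_i (binary masks)."""
    blended = omega * g_i.astype(np.float32) \
              + (1.0 - omega) * c_i.astype(np.float32)
    return (blended >= 0.5).astype(np.uint8)   # R_i as a binary foreground mask
```

In FreeViBe+, $\omega$ is not fixed: as described in Section 3.2, the S-measure serves as the criterion for adjusting the fusion weight.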

3.2. Implementation

The workflow of the enhanced FreeViBe+ algorithm for the enhancement of the moving target foreground–background S-measure is illustrated in Figure 1.
First, video data are read and processed through frame (image) acquisition, where the images are converted into grayscale. Let the frame size be $M \times N$, the grayscale image sequence be denoted by $P_1, P_2, P_3, \dots, P_n$, and the algorithm parameters be set as follows: radius threshold $R$, matching threshold $\sigma$, and grayscale value replacement probability $\rho$.
Next, the first t frames containing foreground targets are used to perform background modeling via inter-frame difference analysis, yielding a background image $P_B$. Based on $P_B$, k background images $M_1, M_2, \dots, M_k$ are generated (a sketch of the variant generation is given below), replacing the background model initialization of the original ViBe algorithm, which relies solely on the initial frame $P_1$. This enhancement helps mitigate the ghosting issues inherent in the standard ViBe approach. This step corresponds to Step $S_1$ in Figure 1: multi-frame background modeling.
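The k background variants can be produced in the spirit of ViBe's neighborhood sampling; the sketch below randomly replaces each pixel of $P_B$ with one of its 8-neighbors, which is one plausible reading of generate_background_variant (the exact roles of R, $\sigma$, and $\rho$ in the perturbation are simplified here and should be treated as assumptions).

```python
# Hedged sketch of generate_background_variant: each variant replaces
# pixels of P_B with random 8-neighborhood values, mimicking ViBe-style
# sample initialization. The use of R, sigma, rho is simplified away.
import numpy as np

def generate_background_variant(p_b, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    h, w = p_b.shape
    dy = rng.integers(-1, 2, size=(h, w))    # random row offset in {-1, 0, 1}
    dx = rng.integers(-1, 2, size=(h, w))    # random column offset in {-1, 0, 1}
    ys = np.clip(np.arange(h)[:, None] + dy, 0, h - 1)
    xs = np.clip(np.arange(w)[None, :] + dx, 0, w - 1)
    return p_b[ys, xs]                        # one background sample M_i
```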
Subsequently, for each grayscale frame $P_i$ ($i = 1, 2, 3, \dots$), moving target extraction based on ViBe and image feature extraction based on GrabCut are performed:
  • Moving Target Detection Based on ViBe: First, shadow detection based on HSV and luminance is executed to remove potential shadows, as shown in Step $S_2$ in Figure 1: HSV color space identification. The shadow removal module eliminates the elongated shadows attached to the moving target, which would otherwise be misclassified as foreground. Then, foreground target recognition using the ViBe algorithm is carried out.
  • Target Detection Based on GrabCut Image Segmentation: Taking the moving target as the region of interest, the Gaussian Mixture Model (GMM) and K-means clustering are iteratively applied to update the results, yielding a relatively complete set of foreground region pixels. The hole-filling module effectively reconstructs the complete silhouette of the object, which was fragmented due to uniform texture.
  • Finally, the S-measure is used as the criterion for weight adjustment, and the results obtained from the ViBe algorithm and the GrabCut algorithm are fused to produce the final foreground moving target. This step corresponds to Step $S_4$ in Figure 1: result image fusion.
The pseudocode for the FreeViBe+ algorithm can be seen in Algorithm 1.
Algorithm 1 FreeViBe+: Video-based Moving Object Segmentation Algorithm

Input: video_file; parameters R, σ, ρ, t, k
Output: final segmented frames R_1, R_2, …

function FreeViBe+(video_file, params = {R, σ, ρ, t, k})
  frames = []
  while video_file has next frame do
    frame = read_frame(video_file)
    gray_frame = convert_to_grayscale(frame)
    append gray_frame to frames
  end while
  P = [P_1, P_2, P_3, …]                        ▷ grayscale frame sequence

  R, σ, ρ, t, k ← params

  P_B = initialize_background(P, t)             ▷ multi-frame background modeling
  M = []
  for i = 1 to k do
    M_i = generate_background_variant(P_B, R, σ, ρ)
    append M_i to M
  end for

  ViBe_model = initialize_ViBe(M)
  G = []                                        ▷ ViBe branch with shadow removal
  for P_i in P do
    shadow_mask = detect_shadows(P_i, HSV_thresholds)
    P_i_no_shadow = P_i − shadow_mask
    foreground_mask_ViBe = ViBe_detect(ViBe_model, P_i_no_shadow)
    G_i = extract_foreground(P_i, foreground_mask_ViBe)
    append G_i to G
  end for

  C = []                                        ▷ GrabCut branch with hole filling
  for P_i in P do
    grabcut_mask = GrabCut_segment(P_i, initial_rect)
    foreground_region = extract_foreground(P_i, grabcut_mask)
    binary_foreground = binarize(foreground_region, threshold)
    filled_foreground = fill_holes(binary_foreground)
    C_i = create_segmentation_result(P_i, filled_foreground)
    append C_i to C
  end for

  R = []                                        ▷ S-measure-guided fusion
  for i = 1 to length(P) do
    G_i = G[i]
    C_i = C[i]
    S_metric = calculate_S_metric(G_i, C_i)
    w = adjust_fusion_weight(S_metric, ρ)
    R_i = fuse_images(G_i, C_i, w)
    append R_i to R
  end for
  return R
end function
In the above, key functions are listed as follows:
  • initialize_background(P, t) uses frame differencing on the first t frames to initialize the background $P_B$.
  • generate_background_variant($P_B$, R, σ, ρ) generates a variant of the background $P_B$ with random perturbations (controlled by R, σ, and ρ).
  • detect_shadows($P_i$, HSV_thresholds) identifies shadow regions in $P_i$ using HSV color space thresholds.
  • initialize_ViBe(M) initializes the ViBe model with the background images M.
  • ViBe_detect(ViBe_model, $P_i$_no_shadow) applies ViBe to detect the moving foreground in $P_i$_no_shadow.
  • GrabCut_segment($P_i$, initial_rect) performs GrabCut segmentation on $P_i$ with an initial bounding box.
  • fill_holes(binary_foreground) fills holes in the binary foreground mask.
  • calculate_S_metric($G_i$, $C_i$) computes a similarity metric S between the ViBe foreground $G_i$ and the GrabCut result $C_i$.
  • fuse_images($G_i$, $C_i$, w) fuses $G_i$ and $C_i$ using weight w (e.g., weighted average or pixel-wise selection).
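To make the GrabCut branch concrete, the following OpenCV sketch shows plausible implementations of GrabCut_segment and fill_holes; the initial rectangle, the iteration count, and the flood-fill hole-filling strategy are our assumptions rather than the authors' exact code.

```python
# Hedged OpenCV sketch of the GrabCut branch (GrabCut_segment + fill_holes).
import cv2
import numpy as np

def grabcut_segment(frame_bgr, rect, iters=5):
    """Run GrabCut from an initial bounding box; return a binary mask."""
    mask = np.zeros(frame_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)    # GMM state buffers required
    fgd_model = np.zeros((1, 65), np.float64)    # by OpenCV's grabCut
    cv2.grabCut(frame_bgr, mask, rect, bgd_model, fgd_model,
                iters, cv2.GC_INIT_WITH_RECT)
    # Definite and probable foreground both count as foreground.
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)

def fill_holes(binary_mask):
    """Fill interior holes by flood-filling the background from a corner."""
    h, w = binary_mask.shape
    flood = binary_mask.copy()
    ff_mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill needs a border
    cv2.floodFill(flood, ff_mask, (0, 0), 1)
    holes = (flood == 0).astype(np.uint8)         # zeros unreachable from the edge
    return binary_mask | holes
```

For a 320 × 240 frame, rect might be, e.g., (10, 10, 300, 220); in FreeViBe+ the region of interest around the moving target would supply this box.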

4. Experiments and Analysis

To evaluate whether the algorithm can effectively detect ghosts and shadows in video sequences, achieve accurate foreground–background S-measures of moving targets, and avoid internal voids, we conducted comparative experiments using the UCF101 and Weizmann datasets as benchmarks. The proposed FreeViBe+ algorithm was compared against classical background subtraction-based methods, such as Finite Difference, Gaussian Mixture Models (GMM), and the standard ViBe algorithm. These experiments were designed to validate the performance of FreeViBe+ in separating moving targets from the background.

4.1. Dataset Information

UCF101 [26] is an action recognition dataset collected from YouTube, featuring 101 action categories. It extends the UCF50 dataset, which comprised 50 action categories, into a large-scale video dataset of human actions captured in outdoor environments.
The Weizmann dataset [27] is an image segmentation dataset shared by the Weizmann Institute in Israel. The videos are recorded from a fixed perspective with relatively simple backgrounds. Each frame contains only one individual performing an action, with each person executing ten different actions (bend, jack, jump, pjump, run, side, skip, walk, wave1, and wave2) (Figure 2). Each action includes nine distinct samples.

4.2. Experimental Setup and Results

To validate the effectiveness and superiority of FreeViBe+, it was compared with Finite Difference, the GMM, and the standard ViBe algorithm using identical parameters. The parameter settings followed those recommended in reference [20]: number of background models K = 18, background update probability ρ = 0.01, matching threshold σ = 5, fusion weight ω = 0.5, and number of frames used for background modeling t = 3.
The experiments were conducted on a PC with an Intel Xeon W-2145 CPU, 256 GB of RAM, and 4 TB HDD + 512 GB SSD storage. For fairness and objectivity of the comparisons, all experiments used video sequences with a unified resolution of 320 × 240 pixels. The proposed FreeViBe+ algorithm and all baseline methods (ViBe, GMM, etc.) were implemented in Python 3.8; key libraries included OpenCV for image processing and NumPy for numerical computations.
Table 1 reports the precision, recall, and F-measure of FreeViBe+ on the UCF101 and Weizmann datasets, compared with three typical background modeling-based video moving object segmentation methods, i.e., Finite Difference (FD), GMM, and ViBe, and two popular deep learning-based models, i.e., DeepLabv3+ and ST-Former.
The S-measure scores of four distinct moving object detection algorithms, Finite Difference, GMM, ViBe, and FreeViBe+, on different test datasets are presented in Table 2.
Figure 3 presents a comparative visualization of the foreground–background S-measure results achieved by the four algorithms on the Jump1 video from the UCF101 dataset, using the 86th frame as an example. Specifically, the bounding boxes highlight areas that are not properly handled by the traditional methods, where green regions represent the background, orange regions represent shadows, red regions represent holes, blue regions represent ghosts, and yellow regions represent noise.
Figure 4 presents a comparative visualization of the foreground–background S-measure results achieved by the four algorithms on the Walk1 video from the Weizmann dataset, using the 23rd frame as an example.

4.3. Analysis and Discussion

As shown in Table 1 and Table 2, the deep learning baselines, which represent the absolute mainstream in current research on moving object detection and segmentation in videos and images, particularly ST-Former, demonstrate competitive performance in terms of F-measure, precision, and recall. However, the use and deployment of deep learning methods are subject to foundational constraints: they demand considerable hardware resources and computing power as well as large-scale, high-quality annotated data. Under such constraints, the proposed method maintains a significant advantage, which validates its value within its intended application scope.
The significant improvement in F-measure/recall/S-measure scores observed in our full method (Table 1 and Table 2) can be largely attributed to the effective reduction in holes in the foreground masks, a direct result of the proposed hole-filling strategy. Similarly, the notable gain in precision is closely associated with the suppression of dynamic background and shadows, achieved by our multi-frame updating and shadow detection modules.
As observed in Figure 3, the Finite Difference method (Subplot (a)) exhibits issues such as ghosting, voids, shadows, and an incomplete background S-measure, achieving an S-measure score of 0.6662. The GMM algorithm (Subplot (b)) does not suffer from ghosting but still presents voids, shadows, and an incomplete S-measure, with an S-measure score of 0.7406. The ViBe algorithm (Subplot (c)) avoids an incomplete background S-measure but is affected by voids and ghosting. In contrast, the FreeViBe+ algorithm (Subplot (d)) effectively addresses the issue of voids compared with the other methods while also being free from ghosting, shadows, and an incomplete S-measure. With an S-measure score of 0.8221, it demonstrates superior foreground–background S-measure performance.
From Figure 4, it can be seen that the background subtraction method (Subplot (a)) produces ghosting artifacts, resulting in an S-measure score of 0.7228. The GMM algorithm (Subplot (b)) introduces some noise due to an incomplete S-measure, achieving an S-measure score of 0.6632. The ViBe algorithm (Subplot (c)) is affected by ghosting and also exhibits voids within the foreground target, with an S-measure score of 0.8369. In comparison, the FreeViBe+ algorithm (Subplot (d)) is free from ghosting, shadows, and voids, attaining an S-measure score of 0.8775, which indicates excellent S-measure performance.
As for the computational cost of FreeViBe+, the average processing time per frame was evaluated on the 320 × 240 video sequences from the UCF101 and Weizmann datasets using a PC with an Intel Xeon W-2145 CPU and 256 GB of RAM. All algorithms were implemented and executed in Python. The original ViBe algorithm takes about 0.035 s per frame, background subtraction about 0.083 s per frame, and the Gaussian Mixture Model (GMM) about 0.056 s per frame. The proposed FreeViBe+ takes about 0.037 s per frame: although the multi-stage approach introduces additional computation, it achieves significantly higher recognition accuracy (as shown by the S-measure in Figure 3 and Figure 4) without a noticeable increase in time cost. FreeViBe+ thus achieves a favorable balance between accuracy and efficiency compared with traditional methods such as ViBe, making it suitable for applications requiring high-quality segmentation.

4.3.1. Ablation Analysis

To verify the effectiveness of the three core modules of FreeViBe+, namely (1) multi-frame background modeling for ghost elimination, (2) HSV- and luminance-based shadow detection and removal, and (3) GrabCut-based void filling, ablation experiments were conducted on the Weizmann and UCF101 datasets. The standard ViBe algorithm was taken as the baseline, and the performance gain (F-measure) achieved by adding each proposed module to the baseline was evaluated. The experimental results are shown in Table 3.
In Table 3, each subsequent row adds a new component to the configuration of the previous row. The values in parentheses indicate the approximate performance gain over the previous configuration. The results clearly show that each component (multi-frame modeling, shadow removal, and GrabCut refinement) provides a significant and cumulative improvement in the F-measure, effectively validating our design choices.

4.3.2. Sensitivity Analysis

A sensitivity analysis was performed on the UCF101 dataset to assess the robustness of FreeViBe+. The baseline performance (F-measure) achieved with the default parameters was 88.4%. Table 4 illustrates the effect of varying each key parameter.
It can be concluded from the sensitivity analysis that our method is not overly sensitive to the exact parameter values within a reasonable range. The selected default parameters (K = 18, ρ = 0.01, σ = 5, ω = 0.5, and t = 3) consistently reside at or near the performance optimum, thereby justifying our choices and underscoring the robustness of the proposed algorithm.

5. Conclusions

In conclusion, an enhanced algorithm for moving target separation called FreeViBe+ was presented. FreeViBe+ eliminates ghosting through multi-frame background modeling, achieves shadow detection and removal by leveraging the HSV and luminance characteristics of shadows, and fills voids by integrating the results of the GrabCut image segmentation algorithm. Finally, a comparative analysis was conducted to verify the effectiveness of FreeViBe+ in extracting the contours of moving targets and avoiding the problems of ghosting, shadows, and voids. The main contribution of the work is the free integration of multi-frame modeling, HSV-based shadow detection, and GrabCut within a single, problem-oriented pipeline. This architecture effectively demonstrates that carefully sequencing solutions to interdependent problems (ghosts, shadows, and voids) can lead to significant performance gains over using individual methods or their naive combination. Collectively, these innovations address the limitations of traditional background modeling-based video object segmentation methods, which are usually sensitive to dynamic backgrounds and “ghosting artifacts”, while achieving a better balance between computational efficiency and segmentation accuracy compared with standalone or simply combined classical approaches.
Despite these promising results, we acknowledge certain limitations that pave the way for future work. Primarily, the shadow detection module’s reliance on fixed thresholds may limit its adaptability to environments with highly variable illumination. To address these issues, we plan to develop an adaptive parameter-tuning mechanism to enhance robustness across diverse lighting conditions. Furthermore, we aim to validate FreeViBe+ in more complex real-world applications, such as intelligent surveillance and autonomous driving, and investigate a hybrid framework that integrates its strengths with deep learning models to tackle even more challenging segmentation tasks.

Author Contributions

Conceptualization, J.W.; methodology, J.W.; software, J.W.; validation, J.W., Y.S., K.Z., and J.L.; resources, J.W. and J.L.; writing—original draft preparation, J.W. and J.L.; writing—review and editing, J.W., Y.S., K.Z., and J.L.; visualization, J.W., Y.S., K.Z., and J.L.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

The work was supported in part by Basic Scientific Research Project of the Third Institute of Oceanography, Ministry of Natural Resources (No. 2016020); National Key R&D Program of China (2022YFC2804002); Fujian Provincial Natural Science Foundation of China (2025J011222); and Fujian Provincial Science and Technology Plan Project of China (No. 2024I1001).

Institutional Review Board Statement

This study was based on the analysis of publicly available data, a fully anonymized dataset. We have ensured that all data were handled in strict compliance with relevant data protection regulations and ethical guidelines to safeguard participant confidentiality.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in the study are included in the article, and further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bouwmans, T.; Javed, S.; Sultana, M.; Jung, S.K. Deep Neural Network Concepts for Background Subtraction: A Systematic Review and Comparative Evaluation. Neural Netw. 2019, 117, 8–66. [Google Scholar] [CrossRef]
  2. Tokmakov, P.; Schmid, C.; Alahari, K. Learning to Segment Moving Objects. Int. J. Comput. Vis. 2019, 127, 282–301. [Google Scholar] [CrossRef]
  3. Bou, X.; Ehret, T.; Facciolo, G.; Morel, J.M.; von Gioi, R.G. Reviewing ViBe, a Popular Background Subtraction Algorithm for Real-Time Applications. Image Process. Line 2022, 12, 527–549. [Google Scholar] [CrossRef]
  4. Panda, D.K.; Meher, S. Detection of Moving Objects Using Fuzzy Color Difference Histogram Based Background Subtraction. IEEE Signal Process. Lett. 2016, 23, 45–49. [Google Scholar] [CrossRef]
  5. Barnich, O.; Droogenbroeck, M.V. ViBe: A Universal Background Subtraction Algorithm for Video Sequences. IEEE Trans. Image Process. 2011, 20, 1709–1724. [Google Scholar] [CrossRef] [PubMed]
  6. Chen, Z.; Wang, R.L.; Zhang, Z.; Wang, H.B.; Xu, L.Z. Background-foreground Interaction for Moving Object Detection in Dynamic Scenes. Inf. Sci. 2019, 483, 65–81. [Google Scholar] [CrossRef]
  7. Aliouat, A.; Kouadria, N.; Maimour, M.; Harize, S. EVBS-CAT: Enhanced Video Background Subtraction with a Controlled Adaptive Threshold for Constrained Wireless Video Surveillance. J. Real-Time Image Process. 2024, 21, 9. [Google Scholar] [CrossRef]
  8. Xie, Y.; Gu, S.H.; Liu, Y.; Zuo, W.M.; Zhang, W.S.; Zhang, L. Weighted Schatten p-Norm Minimization for Image Denoising and Background Subtraction. IEEE Trans. Image Process. 2016, 25, 4842–4857. [Google Scholar] [CrossRef]
  9. Javed, S.; Mahmood, A.; Al-Maadeed, S.; Bouwmans, T.; Jung, S.K. Moving Object Detection in Complex Scene Using Spatiotemporal Structured-Sparse RPCA. IEEE Trans. Image Process. 2019, 28, 1007–1022. [Google Scholar] [CrossRef]
  10. Dewan, P.; Nivedita, N.; Kumar, R. A Novel Approach for Detection of Moving Objects in Complex Scenes Using Fuzzy Colour Difference Histogram. Int. J. Softw. Innov. 2021, 9, 81–101. [Google Scholar] [CrossRef]
  11. Zhang, Z.X.; Li, X.P.; Li, Q. SAEN-BGS: Energy-efficient Spiking Autoencoder Network for Background Subtraction. Pattern Recognit. 2026, 169, 111792. [Google Scholar] [CrossRef]
  12. Babaee, M.; Dinh, D.T.; Rigoll, G. A Deep Convolutional Neural Network for Video Sequence Background Subtraction. Pattern Recognit. 2018, 76, 635–649. [Google Scholar] [CrossRef]
  13. Houhou, I.; Zitouni, A.; Ruichek, Y.; Bekhouche, S.E.; Kas, M.; Taleb-Ahmed, A. RGBD Deep Multi-scale Network for Background Subtraction. Int. J. Multimed. Inf. Retr. 2022, 11, 395–407. [Google Scholar] [CrossRef]
  14. Yang, Y.Z.; Ruan, J.H.; Zhang, Y.Q.; Cheng, X.; Zhang, Z.; Xie, G.J. STPNet: A Spatial-Temporal Propagation Network for Background Subtraction. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 2145–2157. [Google Scholar] [CrossRef]
  15. Zhao, C.Q.; Hu, K.K.; Basu, A. Universal Background Subtraction Based on Arithmetic Distribution Neural Network. IEEE Trans. Image Process. 2022, 31, 2934–2949. [Google Scholar] [CrossRef]
  16. Dai, Y.; Yang, L. Background Subtraction for Video Sequence Using Deep Neural Network. Multimed. Tools Appl. 2024, 83, 82281–82302. [Google Scholar] [CrossRef]
  17. Xiong, J.; Wu, J.; Tang, M.; Xiong, P.W.; Huang, Y.S.; Guo, H. Combining YOLO and Background Subtraction for Small Dynamic Target Detection. Vis. Comput. 2025, 41, 481–490. [Google Scholar] [CrossRef]
  18. Cucchiara, R.; Grana, C.; Piccardi, M.; Prati, A. Detecting Moving Objects, Ghosts, and Shadows in Video Streams. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 1337–1342. [Google Scholar] [CrossRef]
  19. Barnich, O.; Droogenbroeck, M.V. ViBe: A Powerful Random Technique to Estimate the Background in Video Sequences. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009; International Conference on Acoustics Speech and Signal Processing ICASSP. pp. 945–948. [Google Scholar]
  20. Cheng, K.Y.; Hui, K.F.; Zhan, Y.Z.; Qi, M. A Novel Improved ViBe Algorithm to Accelerate the Ghost Suppression. In Proceedings of the 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2016), Changsha, China, 13–15 August 2016; pp. 1692–1698. [Google Scholar]
  21. Ye, Y.; Mingwei, C.; Feng, Y. EVibe: An Improved Vibe Algorithm for Detecting Moving Objects. Chin. J. Sci. Instrum. 2014, 35, 924–931. [Google Scholar]
  22. Ma, Y.J.; Chen, M.L.; Liu, P.P.; Duan, R.G.; Ma, Y.T. ViBe Algorithm-Based Ghost Suppression Method. Laser Optoelectron. Prog. 2020, 57, 105–112. [Google Scholar]
  23. Rother, C.; Kolmogorov, V.; Blake, A. GrabCut: Interactive Foreground Extraction Using Iterated Graph Cuts. ACM Trans. Graph. 2004, 23, 309–314. [Google Scholar] [CrossRef]
  24. Fan, D.P.; Cheng, M.M.; Liu, Y.; Li, T.; Borji, A. Structure-measure: A New Way to Evaluate Foreground Maps. In Proceedings of the 16th IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017; IEEE International Conference on Computer Vision. pp. 4558–4567. [Google Scholar]
  25. Cheng, M.M.; Fan, D.P. Structure-Measure: A New Way to Evaluate Foreground Maps. Int. J. Comput. Vis. 2021, 129, 2622–2638. [Google Scholar] [CrossRef]
  26. Soomro, K.; Zamir, A.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. arXiv 2012, arXiv:1212.0402. [Google Scholar] [CrossRef]
  27. Blank, M.; Gorelick, L.; Shechtman, E.; Irani, M.; Basri, R. Actions as Space-Time Shapes. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 2247–2253. [Google Scholar] [CrossRef] [PubMed]
  28. Sobral, A.; Vacavant, A. A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos. Comput. Vis. Image Underst. 2014, 122, 4–21. [Google Scholar] [CrossRef]
Figure 1. Workflow of FreeViBe+.
Figure 2. Sample experimental data (UCF101 jump video and Weizmann walk video).
Figure 3. Comparison of foreground and background S-measure effects for moving targets (UCF101-Jump1, frame 86 image).
Figure 4. Comparison of foreground and background S-measure effects for moving targets (Weizmann-Walk1, frame 23 image).
Table 1. Precision, recall, and F-measure of different methods on UCF101 and Weizmann (%) [5,28].

| Dataset | Metric | FD | GMM | ViBe | FreeViBe+ | DeepLabv3+ | ST-Former |
|---|---|---|---|---|---|---|---|
| Weizmann | Precision | 82 ± 3 | 88 ± 2 | 94 ± 1 | 96 ± 1 | 98.5 ± 0.4 | 99 ± 0.3 |
| Weizmann | Recall | 78 ± 4 | 85 ± 3 | 92 ± 2 | 95 ± 1 | 97.5 ± 0.7 | 98 ± 0.5 |
| Weizmann | F-measure | 80 ± 3 | 86 ± 2 | 93 ± 1 | 97 ± 1 | 98 ± 0.5 | 98.5 ± 0.4 |
| UCF101 | Precision | 65 ± 5 | 72 ± 4 | 85 ± 3 | 89 ± 3 | 96 ± 0.8 | 97 ± 0.6 |
| UCF101 | Recall | 58 ± 6 | 68 ± 5 | 81 ± 4 | 85 ± 3 | 95 ± 1 | 96 ± 0.8 |
| UCF101 | F-measure | 61 ± 5 | 70 ± 4 | 83 ± 3 | 88 ± 3 | 95.5 ± 0.9 | 96.5 ± 0.7 |
Table 2. S-measure scores of Finite Difference, GMM, ViBe, and FreeViBe+ on different datasets (%).

| Dataset | Finite Difference | GMM | ViBe | FreeViBe+ (Ours) |
|---|---|---|---|---|
| UCF101 Jump1 | 66.62 | 74.06 | 71.18 | 82.21 |
| UCF sports video1 | 68.93 | 71.29 | 73.18 | 80.91 |
| Weizmann Walk1 | 72.28 | 83.69 | 66.32 | 87.75 |
| ViBe original data | 81.90 | 82.11 | 71.43 | 89.51 |
Table 3. Ablation study on the UCF101 and Weizmann datasets (%).

| Method Configuration | Weizmann (F-Measure) | UCF101 (F-Measure) |
|---|---|---|
| Baseline: ViBe | 93 ± 1 | 83 ± 3 |
| +Multi-frame Modeling (t = 3) | 94.5 ± 1.5 (+1.5) | 84.5 ± 2 (+1.5) |
| +Shadow Removal | 95.5 ± 1.5 (+1.0) | 86 ± 2 (+1.5) |
| +GrabCut Refinement: FreeViBe+ | 97 ± 1 (+1.5) | 88 ± 3 (+2.0) |
Table 4. Algorithm FreeViBe+'s sensitivity to different parameter values.

| Parameter | Tested Range | F-Measure Range | Observation |
|---|---|---|---|
| K | 10–30 | 85.5–89.3% | Performance is stable. K = 18 provides a good balance between model complexity and accuracy. Smaller models (K < 15) lead to a slight drop, while larger models (K > 22) offer negligible improvement. |
| ρ | 0.001–0.1 | 86.0–89.5% | The method is relatively sensitive. Very low values (ρ < 0.005) cause slow adaptation, while high values (ρ > 0.05) lead to increased noise. The chosen ρ = 0.01 lies within a stable, high-performance plateau. |
| σ | 3–20 | 87.8–89.2% | Performance is robust across a wide range. A lower threshold (σ < 5) is too strict, misclassifying some background as foreground. A higher threshold (σ > 15) makes the matching too lenient. σ = 5 is optimal for our setup. |
| ω | 0.3–0.8 | 87.5–90.2% | The performance is very stable, confirming the robustness of our fusion strategy. The chosen value of ω = 0.5 gives equal weight to both feature streams, which yields the best and most balanced result. |
| t | 1–10 | 87.2–89.4% | Using too few frames (t < 3) results in an insufficiently trained model. Performance plateaus after t = 3, indicating that a small number of frames is sufficient for effective initialization. |

