Article

An Improved Vibe Algorithm Based on Adaptive Thresholding and the Deep Learning-Driven Frame Difference Method

School of Computer Science and Engineering, Anhui University of Science and Technology, Taifeng Street, Huainan 232001, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(16), 3481; https://doi.org/10.3390/electronics12163481
Submission received: 5 July 2023 / Revised: 31 July 2023 / Accepted: 14 August 2023 / Published: 17 August 2023
(This article belongs to the Special Issue Modern Computer Vision and Image Analysis)

Abstract
Foreground detection is the main way to identify regions of interest, and its effectiveness determines the accuracy of subsequent behavior analysis. To enhance the detection effect and address the problem of low accuracy, this paper proposes an improved Vibe algorithm combining the frame difference method and adaptive thresholding. First, we adopt a shallow convolutional layer of VGG16 to extract the lower-level features of the image; feature images with high correlation are fused into a new image. Second, adaptive factors based on the spatio-temporal domain are introduced to divide the foreground and background. Finally, we construct an inter-frame average speed value to measure the moving speed of the foreground, which solves the mismatch between the background change rate and the model update rate. Experimental results show that our algorithm effectively overcomes the drawbacks of the traditional method and prevents the background model from being contaminated. It suppresses the generation of ghosting, significantly improves detection accuracy, and reduces the false detection rate.

1. Introduction

With the development of computer vision and intelligent surveillance technology, moving target detection has become a prominent research topic. The effectiveness of detection affects subsequent behavioral analysis and identification tracking. Moving target detection is widely used in intelligent transportation, medicine, aerospace remote sensing, unmanned aerial vehicles, disaster rescue, and many other fields, with promising prospects.
Target detection involves a wide range of fields and a huge amount of data processing. Detection capability and real-time performance therefore become urgent problems when deploying on devices with low computational resources. Compared with deep learning methods, traditional detection algorithms have clear advantages in computational cost. Conventional mainstream detection generally includes three types: the optical flow method, the inter-frame difference method, and the background difference method. The optical flow method [1] can detect independently moving targets without prior knowledge of the scene; however, it is susceptible to environmental factors, and the calculation is time-consuming. The inter-frame difference method [2,3] has high stability and low complexity, but it cannot extract the entire region of the target. The background difference method [4] has wider application scenarios than the previous two; it divides the foreground and background by comparing the difference between the background model and the current frame. The most commonly used background difference methods are the Gaussian mixture model [5,6] and the Vibe algorithm. The Gaussian mixture model is highly adaptable to the scene; however, it is computationally intensive and cannot meet real-time demand.
The Vibe algorithm [7,8] has a simple concept, high stability, and rapid computation, making it suitable for real-time detection in static scenes. However, it also suffers from significant drawbacks. The Vibe algorithm uses the initial frame to build the background model; if moving targets are present in the initial frame, or if the background change rate does not match the update rate of the sample model, ghosting phenomena may appear [9]. Wind, sunlight, or camera shake may cause changes in the background, resulting in considerable noise interference. In addition, if a moving target blocks the light, the color projected on the shadowed area will be darker than the background color, and the obvious pixel differences can cause the background to be misclassified as foreground. Furthermore, a limited sample model or a high similarity between foreground and background can lead to missing target edges. Deep-learning-based target detection techniques require substantial computational resources and are difficult to deploy on small devices, while the detection accuracy of traditional vision-based techniques is low. In this paper, we develop an improved ViBe algorithm that integrates traditional detection techniques with deep learning. Under limited computational resources, detection accuracy is improved by suppressing ghosting and noise. The main contributions are as follows.
  • We adopt a shallow convolutional layer of a pre-trained neural network to extract the underlying features and calculate the peak signal-to-noise ratio between each channel's features and the original image. The feature images with the highest correlation are then selected for fusion, and the fused image replaces the original frame for background modeling. This method suppresses redundant information with weak correlation, removes background noise, and establishes the foundation for the subsequent background modeling.
  • Based on the newly synthesized images, we use the histogram similarity method to compare similarity frame by frame. Then, three frames with large differences in similarity are selected for inter-frame differencing. Finally, the background model is completed by morphological filling to obtain a clear foreground target. This approach suppresses the ghosting generated during modeling and eliminates the problem of target holes.
  • We introduce a spatio-temporal adaptive factor to adjust the threshold value dynamically. This solves the problem of poor background suitability and enhances the ability to adapt to changes in light. In addition, we set an inter-frame average speed value to measure the speed of the target’s movement. This approach dynamically updates the neighborhood pixels so that the update rate of the model matches the background change rate and adapts to changes in the scene.
In this paper, Section 2 reviews previous related work. Section 3 briefly reviews the detection principles of the Vibe algorithm. In Section 4, the proposed algorithm is described in detail. Section 5 presents the dataset and performance metrics; feasibility is verified by ablation experiments and comparison with the performance of the Vibe algorithm. Finally, Section 6 concludes the article.

2. Related Works

In dynamic target detection, background modeling is widely used to detect foreground targets in video frames. The background model and its simulation accuracy directly affect the effectiveness of detection. Any moving target detection algorithm has to meet the processing requirements of different scenes as far as possible. However, background modeling and simulation become more difficult due to the complexity of the scene and various environmental disturbances. The basic idea of the ViBe algorithm is to store, for each pixel, a set of values observed in the past at the same or adjacent positions, and then compare the current pixel with the pixels in the sample set to determine whether it is a background point. Among traditional detection algorithms, the optical flow method detects changes between frames from temporal and spatial gradients of the image, but it is influenced by environmental factors and is computationally time-consuming. The inter-frame difference method finds the detected object by comparing the difference between two consecutive frames, but the result usually contains residual artifacts and depends on the inter-frame time interval. The advantages of the ViBe algorithm are its small computational cost, speed, and real-time capability. In contrast to deep learning methods, it does not require training on specific scenes and is an unsupervised method. However, it also has disadvantages, such as ghosting, shadows, and incomplete targets. Researchers have proposed many improved moving target detection algorithms for the above shortcomings.
To address the above problems, Piccardi M et al. [10] proposed a background modeling method based on a mean-shift procedure in 2004. The convergence property of mean shift enables the system to implement background modeling; in addition, histogram-based computation and local attraction enable it to meet the stringent real-time requirements of video processing. Mittal A et al. [11] proposed using a hybrid model to represent pixels in a panoramic view and constructing background images of the static parts of the scene. The method detects moving objects in a video sequence over a wide field of view and removes them from the video. Zhang H et al. [12] proposed a method based on thread block coordinates, an optimized divergence angle, a computed kernel function, and CUDA (compute unified device architecture). The improved algorithm eliminated ghosting, avoided large irrelevant background edges, and achieved better accuracy and precision. Shao X et al. [13] proposed a moving target detection method based on the Vibe algorithm in which the background model is initialized using consecutive multi-frame images, which solves the ghosting caused by moving targets in the initial frames. However, this algorithm cannot eliminate foreground shadows caused by light and increases the algorithm’s time complexity. Hayat MA et al. [14] proposed calculating the maximum Euclidean distance using the grayscale values of odd frames; the background model is then updated using the current pixel grayscale value instead of the Euclidean distance sample value, which suppresses the effect of ghosting on the background model. Singh RP et al. [15] proposed a pixel sample consensus technique for segmenting the foreground, which uses a segmentation mask to analyze the possibility of absorption and speed up ghost suppression. However, it is too computationally intensive and cannot effectively eliminate the ghost region. In the same year, Sudha D et al. [16] used an improved Yolo-fused Vibe algorithm, combining the Kalman filter and particle filtering techniques to find oncoming vehicles. The detection accuracy is improved without unduly increasing the computational resource consumption; however, the parameters of the algorithm cannot be adjusted adaptively, and regions with depth variations are not well handled.
In recent years, researchers have considered noise point elimination and suppression of ghosting to be equally important. Yan Q et al. [17] proposed initializing the background model with the average pixel values of multi-frame images; in addition, setting adaptive thresholds enhances the adaptability of the model when calculating the Euclidean distance between the current pixel value and the background model. This algorithm increases the computation and complexity of the model. Wang T et al. [18] designed an improved Vibe algorithm incorporating the CLD (color layout descriptor), with an adaptive thresholding method in the background model that reduces the error detection rate through a differential operation. However, the algorithm has poor real-time performance and allows empty regions in foreground detection. In 2022, Lyu C et al. [19] used the EfficientNetB0 lightweight network combined with a fused Vibe algorithm for leak detection, significantly improving detection accuracy; however, it is only suitable for detecting small targets in specific scenarios. To solve the problem of multi-scale moving objects and dynamic backgrounds in real surveillance tasks, Subudhi B et al. [20] proposed modeling each pixel in a kernel-induced space using a possibilistic fuzzy cost function. Using the induced kernel function to project the low-dimensional data into a high-dimensional space, a robust background model is constructed with a likelihood function based on the density of the data in the time domain, avoiding noise and outliers. In 2023, Qi Q et al. [21] designed a novel regional multi-feature frequency (RMFF) model for detecting multi-scale moving objects in dynamic backgrounds. Background changes are ignored through spatial relationships between pixels and eigenfrequencies over time in the neighborhood; multi-scale superpixels then exploit the structural information of the real-world scene to better delineate the background from the foreground and improve robustness. RMFF is highly accurate when detecting multi-scale moving targets in dynamic backgrounds, but it consumes too many computing resources and cannot be deployed on low-power devices. Ju J et al. [22] proposed a detection algorithm that combines the smoothed three-frame difference method with robust principal component analysis (RPCA). Smoothed frames weaken the effect of illumination variation, while RPCA enables data dimensionality reduction; both suppress noise. However, the approach is prone to crashing when processing large amounts of image data. Zheng D et al. [23] use a Gaussian modeling algorithm with a three-frame difference logical operation to suppress target voids and breaks. In addition, a U-net-based method is proposed to reduce the dependence on the number of data sets by using the inverse of the positive-to-negative sample ratio as sample weights to deal with data imbalance, with a threshold set to predict the results. The method reduces target voids and enhances anti-interference ability; however, although the number of samples is small, the introduction of deep learning still consumes some computing resources.
The above researchers have used different approaches to improve detection accuracy, including improvements based on the traditional background difference method and improvements based on deep learning; both suppress ghosting and noise to improve detection accuracy. Most of the traditional improvements work through background modeling, increasing the update rate of the model, and so on. These methods have lower detection accuracy but are fast and offer high real-time performance. Deep learning-based methods have high detection accuracy but consume large computing resources and are difficult to deploy on small surveillance devices. The algorithm proposed in this paper combines traditional background differencing with deep learning: it embeds a shallow VGG16 network into an improved traditional ViBe algorithm, which slightly reduces detection efficiency but ensures better detection results.
Based on the above research, this paper uses the VGG network and the frame difference method to complete the background modeling. In addition, an adaptive threshold is employed to adjust the relationship between the background change rate and the update rate. Ghosting and noise points are suppressed under the premise of limited computational resources.

3. Principle of the Detection Algorithm

Vibe [24] is a pixel-based algorithm for moving target detection in image sequences or video. The process mainly includes initializing the background model, foreground segmentation, and updating the background model.

3.1. Initializing the Background Model

In the Vibe algorithm, the background model is usually initialized using the first frame of the video file or image sequence. The spatial distribution of random pixel points with respect to their neighbors is used to build a background sample model. By default, N = 20 samples are selected from the eight-neighborhood of each pixel, M(x, y) denotes the sample set of a pixel, and v1, v2, … denote the pixel values in the neighborhood. The background model is defined in Equation (1), as shown in Figure 1.
$M_B(x, y) = \{ v_1, v_2, v_3, \ldots, v_{N-1}, v_N \}$ (1)
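To make the initialization concrete, the following is a minimal NumPy sketch for grayscale frames (the function and parameter names are ours, not from the paper):

```python
import numpy as np

N = 20  # samples per pixel (ViBe default)
rng = np.random.default_rng(0)

def init_background_model(first_frame):
    """Build the sample set M_B(x, y) of Equation (1) by drawing N values
    at random from the 8-neighborhood of each pixel."""
    h, w = first_frame.shape
    padded = np.pad(first_frame, 1, mode="edge")  # give border pixels 8 neighbors
    samples = np.empty((h, w, N), dtype=first_frame.dtype)
    ys = np.arange(h)[:, None] + 1
    xs = np.arange(w)[None, :] + 1
    for k in range(N):
        dy = rng.integers(-1, 2, size=(h, w))  # random offsets in {-1, 0, 1}
        dx = rng.integers(-1, 2, size=(h, w))
        samples[:, :, k] = padded[ys + dy, xs + dx]
    return samples
```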

3.2. Foreground Segmentation

After the background model is created, each pixel of every new frame must be classified. First, the Euclidean distance between the current frame pixel and the sample model pixels is calculated. Second, the radius threshold R and the matching-number threshold min are set. Third, the current pixel point P(x, y) is compared with the pixel points in the background model to determine whether P is foreground or background. MR(P(x, y)) represents the pixel region with the current pixel point P(x, y) as the center and R as the radius. The condition distinguishing foreground from background is shown in Equation (2): if the count is not less than the threshold, the pixel is background, and foreground otherwise.
$\# \left\{ M_R(P(x, y)) \cap M(x, y) \right\} \geq \min$ (2)
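Assuming grayscale values (so the Euclidean distance reduces to an absolute difference), the test of Equation (2) can be sketched as follows; R and MIN_MATCHES are the empirical defaults commonly cited for ViBe, not values stated in this paper:

```python
import numpy as np

R = 20           # radius threshold
MIN_MATCHES = 2  # number threshold "min"

def segment_frame(frame, samples):
    """Classify each pixel of `frame` against its sample set `samples`
    (shape (H, W, N)): background if at least MIN_MATCHES samples lie
    within distance R of the current value (Equation (2))."""
    diff = np.abs(samples.astype(np.int16) - frame[:, :, None].astype(np.int16))
    matches = np.sum(diff < R, axis=2)
    return np.where(matches >= MIN_MATCHES, 0, 255).astype(np.uint8)  # 255 = foreground
```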

3.3. Updating the Background Model

The environment changes constantly over time, so the foreground segmentation results must be continually updated and iterated. The commonly used update strategies are the conservative update method and the foreground counting method. The conservative update method can cause a deadlock effect: pixel points judged to be in the foreground are never used to fill the background model. The foreground counting method, on the other hand, makes a judgment for each pixel in the current frame: once a pixel has been identified as foreground several times in succession, it must be reconsidered as a possible background point. Based on this, the Vibe algorithm introduces a time probability factor: if a sampled value has not been updated after time Δt, the point is considered a background point, and vice versa. The formula for the temporal probability factor is shown in Equation (3).
$P(t, t + \Delta t) = \left( \frac{N - 1}{N} \right)^{\Delta t}$ (3)
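A sketch of the conservative update with the 1/φ subsampling implied by Equation (3) follows; the in-place update and neighbor propagation mirror the standard ViBe scheme, with names of our choosing:

```python
import numpy as np

N, PHI = 20, 16  # sample count and subsampling (update) rate
rng = np.random.default_rng(0)

def update_model(frame, mask, samples):
    """Each background pixel (mask == 0) replaces one of its own samples,
    and one sample of a random 8-neighbor, with probability 1/PHI."""
    h, w = frame.shape
    by, bx = np.nonzero(mask == 0)
    keep = rng.random(by.size) < 1.0 / PHI
    ys, xs = by[keep], bx[keep]
    samples[ys, xs, rng.integers(0, N, size=ys.size)] = frame[ys, xs]
    # propagate the value into a random neighbor's sample set as well
    ny = np.clip(ys + rng.integers(-1, 2, size=ys.size), 0, h - 1)
    nx = np.clip(xs + rng.integers(-1, 2, size=xs.size), 0, w - 1)
    samples[ny, nx, rng.integers(0, N, size=ys.size)] = frame[ys, xs]
```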

4. The Proposed Algorithm

The ghosting phenomenon is mainly caused by two factors: the presence of a moving target in the initial frame and the state transition of a moving target. In addition, the thresholds used in the Vibe algorithm are fixed empirical values, so many false detections occur in complex scenarios.
Based on this, this paper proposes an improved Vibe algorithm based on adaptive thresholding and a deep learning-driven frame difference method. The algorithm first uses a shallow convolutional layer to extract lower-level features and synthesize a new image. Then, three new frames are selected by a histogram similarity algorithm for the difference operation. Finally, background modeling is completed by initializing the parameters. In addition, adaptive thresholds and inter-frame averaging coefficients are set: the spatio-temporal adaptive threshold is introduced to distinguish foreground from background accurately, and the inter-frame average is used to match the rate of change of the background with the update rate of the model.
The flow of the implementation steps of the improved Vibe algorithm is shown in Figure 2.

4.1. Description of Improved Background Modelling Methods

The Vibe algorithm initializes the background model with the first frame. A moving target in the initial frame is judged to be background, causing misjudgment and resulting in ghost regions. The frame difference method can extract the foreground area by background subtraction; however, it requires a suitable target movement speed. When the target moves fast, it produces an oversized outline of the foreground area and leads to false detections during filling; when the target moves slowly, it produces a large overlap in the foreground region and affects detection accuracy. In this paper, we first use a shallow convolutional layer of a pre-trained neural network to extract the lower-level features of the image. Then, features with high correlation are selected to synthesize a new image. Furthermore, a histogram similarity algorithm is used to select the three frames with significant differences for the difference operation. Finally, the background modeling is completed by filling in the image initialization parameters.

4.2. The Underlying Features Extraction of VGG16

A deep convolutional network can extract abstract features, but it needs to be trained for specific scenarios and consumes a lot of computational resources. The proposed method adopts a shallow convolutional layer of the pre-trained VGG16 network to extract the lower-level features of the image. These features comprise multiple channels, which contain the primary information of the image along with a large amount of redundant information. Applying only the primary information channels to the Vibe algorithm reduces data processing and eliminates redundant interference in the scene.
VGG16 is a classic model with a simple structure and extensive application. The model includes 13 convolutional layers and 3 fully connected layers, and its first stage uses 64 convolutional kernels to extract features of the image. First, the inner product of the original image and the flipped convolution kernel is calculated to obtain the convolved value; then, the entire image is traversed with the defined stride. The convolution operation shrinks the image, so the image needs to be padded to restore the original size. We denote the original image as M(h, w), the convolution kernel as K(m, n), and the convolved image as C(x, y). The convolution operation is formulated as follows.
$C(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} K(i, j) \cdot M(x - i, y - j)$ (4)
In the VGG16 network, both CONV3–128 and CONV3–64 can extract shallow image features. CONV3–64 denotes a 3 × 3 convolution producing 64 feature channels and, similarly, CONV3–128 produces 128 channels. The memory footprint in Figure 3 shows that CONV3–128 requires far more parameters and memory. The advantage of the Vibe algorithm is that it consumes few computing resources and offers high real-time performance, whereas using CONV3–128 greatly increases the computational load. On small devices with limited computing resources, choosing CONV3–64 to extract features is more responsive and better meets the demand for real-time operation.
The underlying features extracted by VGG span multiple channels that contain redundant information as well as the primary information of the image. These low-level features include color, lines, edge shapes, spatial relationships, and so on.
However, the output of the background difference method is a binarized image that only indicates whether the current frame contains a moving target; the modeling only needs to judge the pixel value of the current frame against neighboring pixels. Replacing the input image with a fusion of the main channel features is therefore equivalent to a filtering operation on the original video image, i.e., it suppresses the interference of irrelevant noise and improves the performance of the differencing method.
We adopt the shallow convolutional layer of VGG16 to extract the generic features of the original images and obtain 64 feature images. Some feature images appear as noise; others show the main features of the original image. According to the similarity between each convolutional feature map and the original image, we extract the main convolutional features that represent the image. A shallow network, with smaller perceptual fields and overlapping areas, can extract more details. Since the output of the Vibe algorithm is a binarized image and the modeling process only needs spatial location and pixel-value information, the weight of the main features is increased during feature fusion while the weight of weakly correlated redundant information is reduced. Finally, the new image replaces the current frame of the original image.
This provides the prerequisite for subsequent background modeling. PSNR (peak signal-to-noise ratio) indicates the strength of the correlation between a feature image and the original image: higher values indicate a stronger correlation and vice versa. The PSNR is formulated as follows.
$PSNR = 10 \lg \left( \frac{Max^2}{MSE} \right)$ (5)
$MSE = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left[ M(i, j) - C(i, j) \right]^2$ (6)
In these formulas, Max represents the maximum pixel value of the original image; MSE represents the mean square error between the original image and the feature map; H and W represent the image height and width, respectively.
The PSNR is obtained from Equations (5) and (6). Figure 4b,c show the convolutional images with the largest and second-largest PSNR, and the index of each feature map is recorded. The distribution of features extracted by the convolutional layers is the same across different video frames of the same scene, so the channel indices found for the initial frame can be reused for subsequent video frames; only the first n channels with the largest PSNR need to be used for feature fusion. The fusion uses series feature fusion: the two features are first concatenated in series, the outputs are then summed, and the result is finally normalized to obtain the feature image. The experimental results are shown in Figure 4, and the feature image is subsequently used instead of the input image for background differencing.
Figure 4d shows the result of feature fusion. The results show that the low-level features extracted by the shallow convolutional layer are closer to the input: they contain the color, texture, edge, and corner information of the image. The shallow network has a smaller perceptual field and smaller overlapping area, capturing more details. This keeps the result closer to the original image and weakens the interference caused by fusing other features. Compared with deep features, shallow feature fusion reduces computational resources and improves detection efficiency. In addition, no training for a specific scene is needed.
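As an illustration of this pipeline, the sketch below extracts the CONV3–64 block of a pre-trained VGG16 with PyTorch/torchvision (our framework choice; the paper only states Python and OpenCV), ranks the 64 channels by PSNR against the input, and fuses the top-n channels. Averaging the selected maps before normalization stands in for the series fusion described above, n_channels is an illustrative parameter, and loading the ImageNet weights requires a network connection:

```python
import cv2
import numpy as np
import torch
from torchvision.models import vgg16

# First convolutional block (conv-relu-conv-relu) -> 64 feature maps, full resolution.
backbone = vgg16(weights="IMAGENET1K_V1").features[:4].eval()

def psnr(ref, img):
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)  # Eqs. (5)-(6)

def fuse_shallow_features(frame_bgr, n_channels=2):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    x = torch.from_numpy(frame_bgr[:, :, ::-1].copy()).float().permute(2, 0, 1) / 255.0
    with torch.no_grad():
        feats = backbone(x.unsqueeze(0))[0].numpy()          # (64, H, W)
    # normalize each channel to 0..255 and rank it by PSNR against the input
    maps = [cv2.normalize(c, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
            for c in feats]
    best = np.argsort([psnr(gray, m) for m in maps])[-n_channels:]
    fused = np.mean([maps[i].astype(np.float32) for i in best], axis=0)
    return cv2.normalize(fused, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```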

4.3. Improved Three-Frame Differential Method

The traditional three-frame differencing algorithm takes three consecutive frames and performs the difference operation. It eliminates background interference in static scenes and has high stability. However, it cannot extract the complete area of the target and depends on the inter-frame time interval. We propose combining the improved frame difference method with the Vibe algorithm so that they complement each other and improve the effectiveness of target detection. The three-frame differencing method is affected by the moving speed of the target: slow movement causes large overlap, while fast movement enlarges the contour. To mitigate the overlap problem and improve detection, this paper uses the histogram similarity algorithm combined with the frame difference method to extract the moving-target contour. The difference between the initial frame and subsequent frames is measured first; then three sufficiently different frames are selected instead of three consecutive frames, and the resulting pixels are filled to obtain the real background.
A histogram similarity algorithm was used to measure the differences between images. First, the new images synthesized by VGG16 were captured frame by frame. Second, the Bhattacharyya coefficient was applied to compare the images and obtain an image similarity value. Its value lies in [0, 1], where 0 means completely different and 1 means identical. The Bhattacharyya coefficient is calculated as follows, where p and p′ represent the histograms of the two frames being compared.
$P(p, p') = \sum_{i=1}^{N} \sqrt{p(i)\, p'(i)}$ (7)
The steps of the improved three-frame differencing method are as follows (a code sketch follows the list).
  • Screening of the three frames. Using the first frame Ms1 as the base, the histogram similarity algorithm traverses the images frame by frame to calculate the difference. When the similarity falls below the threshold value, we set the current frame as Ms2, and then take Ms2 as the base to obtain Ms3.
  • Extraction of the target contour. Ms2 and Ms1 undergo the difference operation to obtain I1; Ms3 and Ms2 undergo the difference operation to obtain I2; finally, the intersection of I1 and I2 gives I12.
  • Filling the background image. The background area is filled using morphological filtering such as dilation, erosion, and opening and closing operations.
  • Background modeling. Each pixel point and its neighboring pixel points are randomly sampled to build the background model samples.
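A sketch of steps 1–3 in OpenCV follows; the similarity threshold and the binarization threshold are illustrative values, grayscale (fused) frames are assumed, and step 4 then reuses the sampling initialization of Section 3.1:

```python
import cv2
import numpy as np

def bhattacharyya_similarity(img_a, img_b, bins=256):
    """Bhattacharyya coefficient of Equation (7): 1 = identical, 0 = disjoint."""
    ha = cv2.calcHist([img_a], [0], None, [bins], [0, 256]).ravel()
    hb = cv2.calcHist([img_b], [0], None, [bins], [0, 256]).ravel()
    ha, hb = ha / (ha.sum() + 1e-12), hb / (hb.sum() + 1e-12)
    return float(np.sum(np.sqrt(ha * hb)))

def pick_three_frames(frames, sim_threshold=0.9):
    """Step 1: scan forward from Ms1, keeping the next frame whose similarity
    to the current base drops below the threshold, until Ms1, Ms2, Ms3 are found."""
    picked, base = [frames[0]], frames[0]
    for f in frames[1:]:
        if bhattacharyya_similarity(base, f) < sim_threshold:
            picked.append(f)
            base = f
            if len(picked) == 3:
                break
    return picked

def three_frame_background(ms1, ms2, ms3, t=25):
    """Steps 2-3: differencing, intersection, and morphological filling."""
    _, b1 = cv2.threshold(cv2.absdiff(ms2, ms1), t, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(cv2.absdiff(ms3, ms2), t, 255, cv2.THRESH_BINARY)
    i12 = cv2.bitwise_and(b1, b2)                          # intersection I12
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(i12, cv2.MORPH_CLOSE, kernel)  # fill small holes
```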

4.4. Description of the Improved Thresholds Method

The Vibe algorithm uses fixed thresholds, which can lead to a significant number of false detections in complex scenes. To address the poor background applicability and the noise points caused by fixed thresholds, this paper establishes an adaptive threshold for each pixel based on the spatio-temporal domain. It eliminates false detections caused by insignificant grayscale differences and removes noise interference, so foreground and background can be better distinguished. Finally, an inter-frame average speed value is introduced to adjust the relationship between the background change rate and the model update rate, which prevents the background model from being contaminated.

4.5. Adaptive Thresholds in the Spatio-Temporal Domain

In complex scenes, the background can be affected by external factors such as light, camera shake, and weather changes. The Vibe algorithm uses a fixed threshold to determine whether the current pixel matches the sample pixels. Once the background difference method identifies a background point as a foreground point, the misjudgment degrades the detection effect. If the threshold is too large, it causes missed detections; if it is too small, it causes false detections [25]. Therefore, this paper introduces an adaptive threshold based on the spatio-temporal domain, which corrects the false detections caused by a fixed threshold and improves detection accuracy.
$d_{\min} = \left[ \sum_{i=1}^{t} \min \left\{ d_i \left[ M_N(x, y), M_B(x, y) \right] \right\} \right] / t$ (8)
The time dimension introduces a time metric factor, calculated as shown in Equation (8), where $d_i$ denotes the minimum Euclidean distance between the current pixel point and the sample model pixel points at moment i, $M_N(x, y)$ denotes the coordinates of the current pixel point, and $M_B(x, y)$ denotes the coordinates of the pixel points in the sample set. $d_{\min}$ denotes the average minimum distance of the pixel point from the sample set over period t.
The neighborhood standard deviation is used in the spatial dimension to measure the complexity of background changes [26,27]. It quantifies the degree of difference between the current and average pixel values, so the threshold can be adaptively adjusted to a suitable size when the background changes. M(x, y) represents the current pixel position in the region; the average pixel value $\overline{M}$ and the standard deviation $d_{space}$ of the pixels in the region are then calculated [28]. The calculation formulas are shown in Equations (9) and (10).
$\overline{M} = \frac{1}{S_w \times S_h} \sum_{i=1}^{S_w \times S_h} M_i^t(x, y)$ (9)
$d_{space} = \sqrt{ \frac{1}{S_w \times S_h - 1} \sum_{i=1}^{S_w \times S_h} \left| M_i^t(x, y) - \overline{M} \right|^2 }$ (10)
The mixed background complexity can be expressed as Equation (11), where $\alpha_1$ and $\alpha_2$ are the weight coefficients.
$d_{mix} = \alpha_1 \overline{d}_{\min} + \alpha_2 d_{space}$ (11)
$R'(x, y) = \begin{cases} R(x, y) \times (1 + d_{mix}), & \delta > d_{space} \\ R(x, y) \times (1 - d_{mix}), & \delta < d_{space} \end{cases}$ (12)
The calculation of the adaptive threshold is shown in Equation (12), where R′(x, y) denotes the adaptive threshold, R(x, y) denotes the fixed threshold, and δ is the set scale factor.
The comparison of the adaptive and fixed segmentation thresholds is shown in Figure 5: panels (a) and (b) show the fixed threshold R in low- and high-dynamic backgrounds, while panel (c) shows the adaptive threshold R′(x, y) in a high-dynamic background. P(x, y) is a background point in the low-dynamic background; with a fixed threshold, P(x, y) becomes a foreground point in the highly dynamic background, whereas with the adaptive threshold R′(x, y), P(x, y) is classified as a background point again. The size of the threshold determines the sensitivity of the model.
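A per-pixel sketch of Equations (8)–(12) follows; the weights α1 and α2, the scale factor δ, and the window size are illustrative values, and d_min_history stands for the recorded minimum sample distances of this pixel over the last t frames:

```python
import numpy as np

ALPHA1, ALPHA2 = 0.5, 0.5  # weight coefficients (illustrative)
DELTA = 10.0               # scale factor delta (illustrative)

def adaptive_radius(R, d_min_history, frame, x, y, win=3):
    """Return R'(x, y) for one pixel from Equations (8)-(12)."""
    d_min = np.mean(d_min_history)                # Eq. (8): temporal average
    half = win // 2                               # Eqs. (9)-(10): local statistics
    region = frame[max(0, y - half):y + half + 1,
                   max(0, x - half):x + half + 1].astype(np.float64)
    d_space = region.std(ddof=1)                  # neighborhood standard deviation
    d_mix = ALPHA1 * d_min + ALPHA2 * d_space     # Eq. (11): mixed complexity
    # Eq. (12): scale the fixed radius up or down depending on delta vs. d_space
    return R * (1 + d_mix) if DELTA > d_space else R * (1 - d_mix)
```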

4.6. Background Update Phase Improvements

The background model is affected by external factors such as weather and light, and a mismatch between the background update rate and the background change rate may lead to false detection. In this paper, we adopt an inter-frame average speed factor to measure the foreground’s movement speed and dynamically adjust the model’s update rate. The inter-frame average speed factor ε is expressed in Equation (13), where f represents the number of frames and d_i represents the positional deviation of the foreground between adjacent frames.
$\varepsilon = \left( \sum_{i=1}^{n} d_i \right) / f$ (13)
The Vibe algorithm sets the update rate φ to 16. In this paper, the model update rate is divided into three categories, determined by empirical frame-rate-dependent thresholds δ1 and δ2, as shown in Equation (14).
$\begin{cases} \varphi_1 = 8, & \varepsilon \geq \delta_2 \\ \varphi_2 = 16, & \delta_1 \leq \varepsilon < \delta_2 \\ \varphi_3 = 32, & \varepsilon < \delta_1 \end{cases}$ (14)
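The rate selection of Equations (13)–(14) can be sketched as follows; δ1 and δ2 are placeholder empirical bounds, and the foreground position is represented here by its centroid in each frame:

```python
import numpy as np

DELTA1, DELTA2 = 2.0, 8.0  # empirical speed bounds (placeholders)

def update_rate(centroids):
    """Pick the subsampling rate phi from the inter-frame average speed (Eq. (13))."""
    d = [np.linalg.norm(np.subtract(b, a)) for a, b in zip(centroids, centroids[1:])]
    eps = sum(d) / len(centroids)  # epsilon = (sum of d_i) / f
    if eps >= DELTA2:
        return 8    # fast foreground: update the model more often
    if eps >= DELTA1:
        return 16   # moderate speed: ViBe default rate
    return 32       # slow foreground: update rarely to avoid absorption
```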

5. Experimental Results and Analysis

To verify the feasibility and effectiveness of the proposed algorithm, we conducted experiments on ghosting suppression and noise elimination. The hardware environment of the experimental platform is an ordinary computer with an Intel i5 processor at 2.60 GHz and 8 GB of RAM. Ablation experiments were conducted for the three improvement points separately; the experimental results are shown in Figure 5, Figure 6, Figure 7 and Figure 8. The materials used in the experiments are videos from the CDW-2014 dataset and self-built campus videos. The CDW-2014 dataset contains 11 video categories (Baseline, Dynamic Background, Camera Jitter, Intermittent Object Motion, Shadow, Thermal, Bad Weather, Low Framerate, Night Videos, PTZ, Turbulence), and each category has four to six video sequences. The dataset scenarios are complex and varied, with different video sequences located in different scenes and subject to different environments, so the background changes in different situations. Different data were used in the experiments to highlight the generalization of the algorithm. The software environment is PyCharm with the OpenCV3 open-source computer vision library, and the programming language is Python. First, ablation experiments were conducted to verify feasibility. Second, the Vibe algorithm was compared with the improved algorithms in the literature [22,23]. Finally, the effectiveness of the proposed algorithm in eliminating ghosting and suppressing noise interference was verified with different performance metrics. Experimental results demonstrate that this method can effectively eliminate ghosts and suppress the noise interference caused by background changes.

5.1. Algorithm Performance Evaluation

To quantitatively evaluate the proposed algorithm, this paper assesses the model’s strengths and weaknesses in five aspects [26,27]: accuracy, precision, recall, F-measure, and balance error rate (PCW), and tests them on different datasets. These metrics are calculated as follows.
$ACC = \frac{T_P + T_f}{T_P + F_p + T_f + F_f}$
$Precision = \frac{T_P}{T_P + F_p} \times 100\%$
$Recall = \frac{T_P}{T_P + F_f} \times 100\%$
$F\text{-}measure = \frac{2 \times Recall \times Precision}{Recall + Precision}$
$PCW = \left[ 1 - \frac{1}{2} \left( \frac{T_P}{T_P + F_p} + \frac{T_f}{T_f + F_p} \right) \right] \times 100\%$
where T_P represents the number of points correctly identified as foreground; F_p represents the number of background points incorrectly detected as foreground; T_f represents the number of points correctly identified as background; and F_f represents the number of foreground points mistakenly detected as background.
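For reference, the metrics above can be computed from a pair of binary masks as in the sketch below (255 = foreground, 0 = background; the names are ours):

```python
import numpy as np

def evaluate(pred, gt):
    """Per-pixel counts and the five metrics defined above."""
    fg_p, fg_g = pred == 255, gt == 255
    tp = np.sum(fg_p & fg_g)     # foreground detected as foreground
    fp = np.sum(fg_p & ~fg_g)    # background detected as foreground
    tf = np.sum(~fg_p & ~fg_g)   # background detected as background
    ff = np.sum(~fg_p & fg_g)    # foreground missed as background
    acc = (tp + tf) / (tp + fp + tf + ff)
    precision = tp / (tp + fp)
    recall = tp / (tp + ff)
    f_measure = 2 * recall * precision / (recall + precision)
    pcw = 1 - 0.5 * (tp / (tp + fp) + tf / (tf + fp))
    return acc, precision, recall, f_measure, pcw
```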

5.2. Experiment of Ghost Elimination

The ghosting elimination experiments were validated not only on CDW public datasets such as “Highway I”, “Pets 2006”, and “Pedestrians”, but also on our campus’ self-built datasets. The experimental results are shown in Figure 6. In addition to the Vibe algorithm, the foreground extraction algorithms of Ju J et al. and Zheng D et al. were compared in the experiments. Figure 6 shows that the Vibe algorithm leaves many ghost pixels, which degrades detection effectiveness. Ju J et al. adopt the improved frame difference method to extract the foreground, which has some suppression effect but still leaves ghost pixels. Zheng D et al. apply deep learning-driven GMM modeling, achieving suppression on all four sets of data; however, the targets still contain gaps and noise points that affect detection accuracy.
The ghost shadow experiment aims to eliminate the residual shadow behind the motion area. There are no established quantitative indicators for ghost elimination, so an additional experiment was added for quantitative analysis. As shown in Figure 7, a suspicious ghost area is set behind the moving target. The ROI of the image is extracted, and the numbers of foreground points and background points are counted within the rectangular box: pixels with value 0 are real background points and pixels with value 255 are ghost points. The effect of ghost elimination is analyzed by calculating the percentage of each kind of pixel in the region.
Table 1 shows the number of foreground points and background points in the region. The number of foreground points is the number of pixels judged to be ghosting points, while the background points are the true background pixels of the region. The larger the percentage of predicted background points, the better the ghost removal. The percentages of background points are shown in Figure 8.
The set ghosting area consists of real background points, and the foreground pixels within it are ghosting pixels. The ratio of background pixels to the total pixels in the region represents the ghosting elimination rate. Tested on three public video sequences and one self-constructed dataset, the methods of Ju J et al. and Zheng D et al. have some suppression effect on ghosting, but their stability varies greatly across datasets. The proposed algorithm is the most stable, with a ghosting elimination rate ranging from 96.99% to 99.39%. The method clearly suppresses the ghosting phenomenon and ensures the integrity of the results. The experimental results reveal that the proposed algorithm not only has a significant extraction effect on the public dataset but also retains a good ghost suppression effect on the self-built campus dataset.
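The ghosting elimination rate used in Table 1 and Figure 8 reduces to a pixel count inside the ROI, as in this short sketch (ROI given as (x, y, w, h)):

```python
import numpy as np

def ghost_elimination_rate(mask, roi):
    """Share of ROI pixels that remain background (value 0) in the binary mask;
    the higher the ratio, the better the ghost suppression."""
    x, y, w, h = roi
    patch = mask[y:y + h, x:x + w]
    return np.count_nonzero(patch == 0) / patch.size
```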

5.3. Experiment of Eliminating Noise

In the noise cancellation experiment, we selected data from public datasets such as “Office”, “Highway II”, and “Snowfall”, and verified the detection effectiveness on our campus’ self-built dataset. The detection effectiveness of the different algorithms is shown in Figure 9. The Vibe algorithm is sensitive to sudden changes in light, leaf shaking, rain, and snow, leading to background points being falsely detected as foreground. Ju J et al. propose an improved algorithm combining smoothed frame differencing with robust principal component analysis. Zheng D et al. adopt a deep learning-driven improved GMM algorithm for background modeling; in addition, their background image is optimized by fusing the frame difference method with morphological filtering. In this paper, we improved foreground segmentation and model updating in addition to the optimized background model, adopting an adaptive threshold to suppress noise interference and to adjust the relationship between the update rate and the background change rate.
The detection effectiveness figure reveals that Vibe produces a significant number of noise points and suffers from target voids. The methods of Ju J et al. and Zheng D et al. achieve a certain degree of suppression. In contrast, the proposed method suppresses noise points more clearly and strengthens the anti-interference capability. The following results quantify the above detections in terms of precision, recall, F-measure, and balance error rate. The data for each group of metrics are given in Table 2, and the comparison is shown in the bar charts of Figure 10.
Compared with the Vibe algorithm, the detection of Ju J et al.’s and Zheng et al.’s methods on the different datasets is generally improved, and noise points are also suppressed. The detection accuracy of Zheng’s method on Highway II decreased slightly, with the F-measure dropping by about 4%: because of the light, many shadows were left when detecting vehicles, and the morphologically processed noise points were amplified, affecting detection accuracy. The algorithm in this paper reaches a high accuracy of 96.68% on the Campus dataset, and its PCW is as low as 13.12%; the results are substantially improved on all other datasets as well. Comparing the methods, the proposed algorithm performs best in suppressing ghosting and eliminating noise, with the highest stability. However, the computation of image similarity and the introduction of adaptive factors increase the computation time. Therefore, the proposed algorithm is more effective on images with a smaller percentage of foreground pixels.

5.4. Ablation Experiments

Replacing the first frame with the image synthesized by VGG16 amounts to a filtering operation on the image. To verify the effectiveness of this method, ablation experiments were conducted. Noise point tests were performed separately with the first frame image and with the image synthesized from the feature maps; the experimental results are shown in Figure 11.
After grayscale binarization of the two images, they were compared pixel by pixel with the binarized foreground image. By counting the pixels other than the foreground pixels, we found that the number of noise points caused by light was greatly suppressed when the fused feature map was binarized instead of the original image, which also makes the background clearer.
Ablation experiments were then conducted on the three improvement points proposed in this paper. The experimental results are shown in Figure 12.
Figure 12c shows the image processed by Vibe on the pedestrians dataset, and Figure 12d shows the result of fusing the improved three-frame difference method with Vibe. The image clearly shows the ghosting phenomenon of Vibe [25], which is due to moving targets in the initial frame; single-frame initialization cannot handle this situation. Meanwhile, the improved algorithm uses the VGG network to eliminate redundant components, and histogram similarity selects three frames with relatively significant differences for the difference operation, avoiding the ghosting caused by using only the initial frame.
Figure 13a shows frame 188 of the campus dataset. Figure 13c is the result of Vibe processing, with a large amount of noise interference. Figure 13d is the result of processing with the spatio-temporal domain threshold. The adaptive setting of the threshold R not only makes the foreground-background delineation more accurate but also mitigates the high false-detection rate of the traditional algorithm.
Figure 14a shows frame 108 of the CDW-2014 dataset. Figure 14c shows that when a sudden change of light causes the background to change rapidly, the model cannot be updated in time, leading to false detection. Figure 14d adds the inter-frame average speed factor, which adaptively adjusts the background model’s update factor according to the motion speed of the foreground objects. The background change rate is thus matched with the model update rate, reducing the false detection rate.
A quantitative analysis of the ablation experiments on three sets of video data is performed in terms of precision, accuracy, recall, F-measure (F1), and balance error rate. The experimental results are shown in Figure 15.
Different datasets (including self-constructed ones) influence the evaluation metrics to a considerable degree: differences in background complexity, the number of noise points, and the rate of change of the foreground all lead to large differences in performance metrics. Ablation experiments were conducted on three video datasets with the different improvement points. The histogram shows that the Vibe algorithm has a higher recall rate but lower precision. We use the F1 index to consider the detection effect comprehensively; on the three datasets, the F1 values are substantially elevated, reaching a maximum of 78.05%. The data comparison proves the feasibility of the proposed algorithm and quantitatively shows the improvement in detection. The PCW values indicate that the Vibe algorithm has higher error detection; in comparison, the improved algorithm achieves a PCW as low as 15.69% on the pedestrians dataset, which significantly reduces the false detection rate and improves detection accuracy.

6. Conclusions

This paper presents a foreground segmentation method with good real-time performance and high detection accuracy. Ghosting is effectively suppressed, and detection accuracy is improved. The adaptive spatio-temporal threshold makes the model more responsive and reduces the error detection rate. Experiments show that the accuracy on different datasets reaches about 90%, and the balance error rate drops below 13%. The proposed algorithm is effective not only on public datasets but also maintains positive detection results on self-built datasets. In future work, we will study how to deploy the method on small mobile devices, seeking the optimal balance between real-time performance and accuracy for the computing power of the device. The output images of this method can also be transferred to larger devices and combined with deep learning for high-accuracy detection. The algorithm can serve as a pre-processing step for image segmentation tasks such as leak detection and fire detection, offering a direction for simplifying deep learning computation. However, the proposed method still has limitations. The introduction of a neural network and the calculation of similarity increase computational cost, making the proposed algorithm more time-consuming. In addition, the algorithm cannot handle problems arising from very slow target movement or a target transitioning from motion to stillness, which leads to the foreground being absorbed into the background. The proposed algorithm is better at processing images with small targets; further research is needed for images with a relatively large proportion of foreground.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/2079-9292/12/16/3481/s1.

Author Contributions

Conceptualization, H.L.; Validation, H.W.; Data curation, C.X.; Writing—original draft, H.W.; Writing—review & editing, G.Y.; Supervision, S.Z.; Funding acquisition, G.Y., C.X. and S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This project was funded by the National Natural Science Foundation of China (No. 62102003) and the Natural Science Foundation of Anhui Province of China (No. 2008085MF220). This study is also supported by the Open Foundation of the Anhui Engineering Research Center of Intelligent Perception and Elderly Care, Chuzhou University, under Grant No. 2022OPB01. The funders had no role in the study design, data collection and analysis, publication decisions, or manuscript preparation.

Data Availability Statement

The public dataset can be found at https://aimagelab.ing.unimore.it/visor/video_categories.asp, and the self-built dataset and code are provided in the supplementary files.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zheng, Y.; Zhang, M.; Lu, F. Optical Flow in the Dark. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 6748–6756. [Google Scholar]
  2. Fadl, S.M.; Han, Q.; Li, Q. Inter-frame forgery detection based on differential energy of residue. IET Image Process. 2019, 13, 522–528. [Google Scholar] [CrossRef]
  3. Bakas, J.; Naskar, R.; Bakshi, S. Detection and localization of inter-frame forgeries in videos based on macroblock variation and motion vector analysis. Comput. Electr. Eng. 2021, 89, 106929. [Google Scholar] [CrossRef]
  4. Xu, F.; Li, G. Feature extraction algorithm of basketball trajectory based on the background difference method. Math. Probl. Eng. 2022, 2022, 2653279. [Google Scholar] [CrossRef]
  5. Singhal, A.; Singh, P.; Lall, B.; Joshi, S.D. Modeling and prediction of COVID-19 pandemic using Gaussian mixture model. Chaos Solitons Fractals 2020, 138, 110023. [Google Scholar] [CrossRef]
  6. Ban, Z.; Liu, J.; Cao, L. Superpixel Segmentation Using Gaussian Mixture Model. IEEE Trans. Image Process. 2018, 27, 4105–4117. [Google Scholar] [CrossRef] [PubMed]
  7. Huang, W.; Liu, L.; Yue, C.; Li, H. The moving target detection algorithm based on the improved visual background extraction. Infrared Phys. Technol. 2015, 71, 518–525. [Google Scholar] [CrossRef]
  8. Liu, L.; Chai, G.-h.; Qu, Z. Moving target detection based on improved ghost suppression and adaptive visual background extraction. J. Cent. South Univ. 2021, 28, 747–759. [Google Scholar] [CrossRef]
  9. Liu, L.; Liu, S.; Qu, Z.; Zhou, D. Self-adaptive visual background extraction with ghost regions elimination. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 22–24 October 2021; pp. 456–462. [Google Scholar]
  10. Piccardi, M.; Jan, T. Mean-shift background image modelling. In Proceedings of the 2004 International Conference on Image Processing, 2004. ICIP’04, Singapore, 24–27 October 2004; Volume 5, pp. 3399–3402. [Google Scholar]
  11. Mittal, A.; Huttenlocher, D. Scene modeling for wide area surveillance and image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No. PR00662), Hilton Head, SC, USA, 15 June 2000; Volume 2, pp. 160–167. [Google Scholar]
  12. Zhang, H.; Qian, Y.; Wang, Y.; Chen, R.; Tian, C. A ViBe Based Moving Targets Edge Detection Algorithm and Its Parallel Implementation. Int. J. Parallel Program. 2019, 48, 890–908. [Google Scholar] [CrossRef]
  13. Shao, X.; Chen, X.; Li, K.; Lv, Z.; Zhu, H. An Improved Moving Target Detection Method Based on Vibe Algorithm. In Proceedings of the 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018. [Google Scholar]
  14. Hayat, M.A.; Yang, G.; Iqbal, A.; Saleem, A.; Mateen, M. The swimmers motion detection using improved vibe algorithm. In Proceedings of the 2019 International Conference on Robotics and Automation in Industry (ICRAI), Rawalpindi, Pakistan, 21–22 October 2019; pp. 1–6. [Google Scholar]
  15. Singh, R.P.; Sharma, P.; Madarkar, J. Compute-Extensive Background Subtraction for Efficient Ghost Suppression. IEEE Access 2019, 7, 130180–130196. [Google Scholar] [CrossRef]
  16. Sudha, D.; Priyadarshini, J. An intelligent multiple vehicle detection and tracking using modified vibe algorithm and deep learning algorithm. Soft Comput. 2020, 24, 17417–17429. [Google Scholar] [CrossRef]
  17. Yan, Q.; Wang, J. An improved moving target detection algorithm based on vibe. In Proceedings of the 2020 International Conference on Computer Network, Electronic and Automation (ICCNEA), Xi’an, China, 25–27 September 2020; pp. 16–20. [Google Scholar]
  18. Wang, T.; Wang, W.; Cui, Y.-h. Improved vibe algorithm based on color layout descriptor. J. Comput. Appl. 2020, 40, 812–818. [Google Scholar]
  19. Lyu, C.; Liu, Y.; Wang, X.; Chen, Y.; Jin, J.; Yang, J. Visual Early Leakage Detection for Industrial Surveillance Environments. IEEE Trans. Ind. Informatics 2021, 18, 3670–3680. [Google Scholar] [CrossRef]
  20. Subudhi, B.N.; Panda, M.K.; Veerakumar, T.; Jakhetiya, V.; Esakkirajan, S. Kernel-Induced Possibilistic Fuzzy Associate Background Subtraction for Video Scene. IEEE Trans. Comput. Soc. Syst. 2022, 10, 1314–1325. [Google Scholar] [CrossRef]
  21. Qi, Q.; Yu, X.; Lei, P.; He, W.; Zhang, G.; Wu, J.; Tu, B. Background subtraction via regional multi-feature-frequency model in complex scenes. Soft Comput. 2023, 1–14. [Google Scholar] [CrossRef]
  22. Ju, J.; Xing, J. RETRACTED ARTICLE: Moving object detection based on smoothing three frame difference method fused with RPCA. Multimed. Tools Appl. 2018, 78, 29937–29951. [Google Scholar] [CrossRef]
  23. Zheng, D.; Zhang, Y.; Xiao, Z.; Jan, M.A. Deep learning-driven gaussian modeling and improved motion detection algorithm of the three-frame difference method. Mob. Inf. Syst. 2021, 2021, 9976623. [Google Scholar]
  24. Barnich, O.; Van Droogenbroeck, M. ViBe: A Universal Background Subtraction Algorithm for Video Sequences. IEEE Trans. Image Process. 2011, 20, 1709–1724. [Google Scholar] [CrossRef] [PubMed]
  25. Qu, Z.; Huang, X.-L. The foreground detection algorithm combined the temporal–spatial information and adaptive visual background extraction. Imaging Sci. J. 2017, 65, 49–61. [Google Scholar] [CrossRef]
  26. Prati, A.; Mikic, I.; Trivedi, M.M.; Cucchiara, R. Detecting moving shadows: Algorithms and evaluation. IEEE Trans. Pattern Anal. Mach. Intell. 2003, 25, 918–923. [Google Scholar] [CrossRef]
  27. Zuo, J.; Jia, Z.; Yang, J.; Kasabov, N. Moving object detection in video sequence images based on an improved visual background extraction algorithm. Multimed. Tools Appl. 2020, 79, 29663–29684. [Google Scholar] [CrossRef]
  28. Liu, J.; Zhang, Y.; Zhao, Q. Adaptive ViBe Algorithm Based on Pearson Correlation Coefficient. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; pp. 4885–4889. [Google Scholar]
Figure 1. Randomly collect neighborhood pixel values to initialize the background model.
Figure 2. Flow chart of the improved algorithm.
Figure 3. Memory capacity required for VGG16 partial model.
Figure 4. VGG16 feature extraction and fusion. (a) Video frames; (b) Convolutional images with the largest PSNR; (c) Convolutional images with the second highest PSNR; (d) Image after feature fusion.
Figure 5. Comparison graph of the threshold for different dynamic backgrounds. (a) Low-dynamic background fixed threshold; (b) high-dynamic background fixed threshold; (c) high-dynamic background adaptive threshold.
Figure 6. Detection results of different algorithms. (a) Video frames; (b) Ground truth; (c) Vibe algorithm; (d) Ju J’s method; (e) Zheng’s method; (f) Proposed algorithm.
Figure 7. Extracted ROI-ghosted regions.
Figure 8. The ratio of the real background points to the sum of the pixel values in the regions.
Figure 9. Extraction effect of motion targets. (a) Video frames; (b) Ground truth; (c) Vibe algorithm; (d) Ju J’s method; (e) Zheng D’s method; (f) Proposed algorithm.
Figure 10. Performance histograms of each algorithm.
Figure 11. The ablative experiment of ghost shadow. (a) Video frames; (b) Fusion image; (c) Binarized image of ground truth; (d) Binarized image of the original video frame; (e) Binarized image of the proposed algorithm.
Figure 12. The ablative experiment of ghost shadow. (a) Video frames; (b) Ground truth; (c) Vibe algorithm; (d) Proposed algorithm.
Figure 13. The ablative experiment of radius threshold. (a) Video frames; (b) Ground truth; (c) Vibe algorithm; (d) Proposed algorithm.
Figure 14. The ablative experiment of background renewal rate. (a) Video frames; (b) Ground truth; (c) Vibe algorithm; (d) Proposed algorithm.
Figure 15. The ablative experimental evaluation index.
Table 1. Number of foreground/background pixels in the region.

Algorithm | Pedestrians | Highway I | Pets 2006 | Campus
ViBe | 2648/5632 | 7412/5214 | 9248/5446 | 1515/6285
Ju J's method | 176/8104 | 2497/10,129 | 3456/11,238 | 717/7083
Zheng's method | 224/8056 | 1296/11,330 | 2357/12,337 | 310/7490
Proposed | 50/8230 | 268/12,358 | 443/14,251 | 226/7574
Table 2. Evaluation indexes of different algorithms, unit: %.

Evaluation Index | Algorithm | Highway II | Office | Snowfall | Campus
Precision | ViBe | 78.66 | 81.06 | 67.51 | 79.80
Precision | Ju J's method | 80.13 | 79.11 | 72.43 | 82.69
Precision | Zheng's method | 71.80 | 86.15 | 88.23 | 77.59
Precision | Proposed | 82.21 | 91.21 | 85.32 | 91.68
Recall | ViBe | 92.65 | 85.70 | 70.96 | 87.55
Recall | Ju J's method | 89.46 | 78.36 | 66.12 | 83.14
Recall | Zheng's method | 91.94 | 84.87 | 79.84 | 91.87
Recall | Proposed | 92.59 | 87.23 | 95.57 | 96.24
F-measure | ViBe | 85.08 | 83.32 | 69.19 | 83.49
F-measure | Ju J's method | 84.54 | 78.73 | 69.14 | 82.92
F-measure | Zheng's method | 80.63 | 85.50 | 83.82 | 84.13
F-measure | Proposed | 87.09 | 89.18 | 90.16 | 93.91
PCW | ViBe | 24.95 | 25.15 | 30.76 | 24.30
PCW | Ju J's method | 18.13 | 18.08 | 15.98 | 20.83
PCW | Zheng's method | 15.20 | 12.80 | 13.94 | 16.98
PCW | Proposed | 12.60 | 9.67 | 9.55 | 13.12
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
