
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

This paper presents a region-based method for background subtraction. It relies on color histograms, texture information, and successive division of candidate rectangular image regions to model the background and detect motion. Our proposed algorithm uses this principle and combines it with Gaussian Mixture background modeling to produce a new method which outperforms the classic Gaussian Mixture background subtraction method. Our method has the advantages of filtering noise during image differentiation and providing a selectable level of detail for the contour of the moving shapes. The algorithm is tested on various video sequences and is shown to outperform state-of-the-art background subtraction methods.

To track moving objects in videos, two main approaches are possible: explicit segmentation of the moving regions followed by matching of the segmented regions, or searching for moving objects based on appearance without segmentation. Although segmentation is known to be challenging, segmenting moving regions makes it possible to focus an appearance-based search on a smaller area, and the segmented regions themselves provide additional information for later processing. This is why background subtraction is an important subject: it forms the basis of many algorithms and applications in video surveillance, to detect suspicious behaviors of people or objects, and in human-computer interaction (HCI), to recognize the posture, gestures, or activities of people so that the environment can react accordingly.

Even with all the effort made to date, background subtraction, which is applicable to images from a still camera, continues to face a number of difficulties. The principle behind background subtraction is to subtract a background model image from the current frame and threshold the result. The result gives the differences between the two images, and it is hypothesized that these differences correspond to moving objects. In practice, this is not always the case, as differences may also correspond to shadows, changes in lighting, or camera noise. Furthermore, some of them may correspond to changes in the scene, like waving leaves or waves on a lake, that are irrelevant to the application. The challenge, then, is to propose a background model that filters out these unavoidable perturbations while still correctly detecting the moving objects of interest.
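For readers implementing this principle, the subtract-and-threshold idea can be sketched in a few lines. This is an illustrative sketch only; the function name and threshold value are ours, not part of the proposed method:

```python
import numpy as np

def naive_background_subtraction(frame, background, threshold=25):
    """Classic subtract-and-threshold: pixels whose absolute difference
    from the background model exceeds the threshold are labeled foreground."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold  # boolean foreground mask

# Toy example: a flat background with one "moving" bright pixel.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1, 2] = 200  # simulated moving object
mask = naive_background_subtraction(frame, background)
print(mask.sum())  # -> 1
```

As the surrounding text notes, this naive per-pixel scheme is exactly what breaks down under shadows, lighting changes, and camera noise, which motivates the region-based model proposed here.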

Many background subtraction methods have been proposed with different models and update strategies. Most rely on the difference between individual pixels. Since perturbations often affect individual pixels, this may cause misdetection when performing differentiation, as observed in [

We thoroughly tested our method by comparing detected moving regions with ground-truth regions using true and false positive rate measures. We also characterized the impact of parameter change on the results to evaluate parameter sensitivity and stability. The results show that our proposed method, combined with Gaussian Mixture, outperforms Gaussian Mixture alone and other state-of-the-art methods.

One of the advantages of our proposed approach compared to state-of-the-art methods is that it reduces the number of false detections, as pixel-level differentiation can be performed in regions with significant motion only. Another advantage is that the subdivision of large regions into small ones can be stopped before pixel level is reached. So, if required, only a coarse background subtraction need be performed (see

The paper is structured as follows. Section 2 gives an overview of state-of-the-art background subtraction methods. Section 3 presents the proposed motion detection and background subtraction algorithms, and Section 4 demonstrates the capability of the proposed method with a thorough analysis. Finally, Section 5 concludes the paper.

Most background subtraction methods consider pixels individually. One of those most often used is the Single Gaussian method [

The Single Gaussian method can be improved significantly by using more than one Gaussian per pixel [

A related model uses the median, the minimum, and maximum values of a pixel [

Edges can be used to model the background instead of the pixel colors, for example, edge histograms for pixel blocks may be used to model the background [

A Bayesian approach is proposed in the work of Li

In the work of Wu

With the Gaussian Mixture method, it may not be possible to accurately model backgrounds with fast variations using only a few Gaussians. To overcome this problem, a non-parametric approach, which estimates the probability density function of each pixel using a kernel density estimation technique, was proposed in [

Some region-based methods have also been proposed. Recently, a method based on local binary patterns was tested [

The methods presented in the work of Matsuyama

Like all background subtraction approaches, our proposed method is composed of a regularly updated background model and a similarity measure to compare a given frame with that background model. The background is modeled at different scales with color histograms of rectangular regions. We use histograms to model the regions, because they are robust to local noise and because histograms of larger regions can be recursively built from the combination of smaller ones. This is not true for the descriptors that account for pixel locations, however. Thus, histograms can be computed rapidly and can filter noise. The current frame in which we want to detect motion is modeled in the same way. Motion is detected by comparing the corresponding rectangular regions from the coarsest scale to the finest scale. Comparisons are performed at a finer scale only if motion is detected at a coarser scale.
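The recursive histogram property mentioned above (histograms of larger regions built by combining those of smaller ones) can be sketched as follows. This is a minimal single-channel illustration; the bin count and region size are arbitrary choices of ours:

```python
import numpy as np

def region_histogram(pixels, n_bins=8):
    """Color histogram of a rectangular region (single channel for brevity)."""
    hist, _ = np.histogram(pixels, bins=n_bins, range=(0, 256))
    return hist

# A parent region's histogram is simply the sum of its children's histograms,
# so coarse-scale histograms come almost for free once the finest scale is built.
rng = np.random.default_rng(0)
region = rng.integers(0, 256, size=(8, 8))
children = [region[:4, :4], region[:4, 4:], region[4:, :4], region[4:, 4:]]
parent_from_children = sum(region_histogram(c) for c in children)
parent_direct = region_histogram(region)
print(np.array_equal(parent_from_children, parent_direct))  # -> True
```

This equality holds because the four quadrants partition the region, which is precisely why descriptors that account for pixel locations do not share this property.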

To devise a more efficient method, the Gaussian Mixture method is applied at the finest scale. We call our method RECTGAUSS-Tex.

Our background model _{R}

The background model _{R}_{B}

For each 4 by 3 rectangle (the finest scale), we compute two statistical measures: a color histogram and a pixel intensity variance. These measures allow us to detect changes between the rectangles of the current frame and the corresponding rectangles of the background model.

For color histograms, we use a uniform quantization of the color space for each rectangular region.

To detect motion, the current frame is decomposed into the same hierarchy of rectangular regions, and each rectangle is compared with the corresponding rectangle of the background model.

Starting from the coarsest scale, for each rectangle we compute the histogram similarity between the current frame and the background model using the MDPA distance [
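A common formulation of the MDPA (Minimum Difference of Pair Assignments) distance for ordinal histograms of equal mass is the sum of absolute prefix-sum differences; the sketch below assumes that formulation and uses toy histograms of our own choosing:

```python
import numpy as np

def mdpa_distance(h1, h2):
    """MDPA distance between two equal-mass histograms, using the
    prefix-sum formulation valid for ordinal (ordered-bin) histograms."""
    return np.abs(np.cumsum(np.asarray(h1) - np.asarray(h2))).sum()

# Mass shifted by one bin is "closer" than mass shifted by three bins,
# even though the Euclidean distances between these histograms are equal.
h1 = [5, 0, 0, 0]
h2 = [0, 5, 0, 0]
h3 = [0, 0, 0, 5]
print(mdpa_distance(h1, h2), mdpa_distance(h1, h3))  # -> 5 15
```

This cross-bin behavior is what the MDPA figure in this paper illustrates: the distance between H1 and H2 is smaller than between H1 and H3, while their Euclidean distances are similar.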

Two histograms are considered similar if their MDPA distance is below a threshold.

For greater robustness, a second statistical measure is used: the pixel intensity variance of the rectangle.

The intensity variance is used to modulate the histogram similarity threshold.

Given the intensity variance of a rectangle from the background model and that of the corresponding rectangle in the current frame, we compare the two. This gives a value between 0 and 1, corresponding to the texture similarity of the current rectangular region with the background model region, which modulates the histogram threshold.

Through testing, we have noted that our method is improved by combining it with the Gaussian Mixture [

Gaussian Mixture method

In the Gaussian Mixture method, the RGB values measured over time at a given pixel position are assumed to be generated by a stationary process modeled by a mixture of K Gaussian distributions:

P(X_t) = Σ_{i=1..K} ω_{i,t} · η(X_t, μ_{i,t}, Σ_{i,t})

where X_t = (X_{t,r}, X_{t,g}, X_{t,b}) is the pixel color at time t, ω_{i,t} is the weight of the i-th Gaussian at time t, and η is the Gaussian probability density function with mean μ_{i,t} and covariance matrix Σ_{i,t}.

Gaussian Mixture in our method

In our method, Gaussian Mixture background subtraction is used in the following way. It is applied only when the finest scale is reached, which means that the Gaussian Mixture background model is updated for all pixels, but Gaussian Mixture motion detection is applied only for rectangles where motion is detected at the finest scale (4 × 3 pixels).
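The coarse-to-fine control flow described above can be sketched as follows. This is an illustrative skeleton with our own names and a toy dissimilarity measure, not the paper's exact implementation; the `pixel_test` hook stands in for the Gaussian Mixture refinement step:

```python
import numpy as np

def detect_multiscale(dissimilar, frame_rect, model_rect, finest=(3, 4),
                      pixel_test=None):
    """Coarse-to-fine detection sketch: a rectangle is subdivided only
    when it differs from the background model; at the finest scale an
    optional pixel-level test refines the detection."""
    h, w = frame_rect.shape
    if not dissimilar(frame_rect, model_rect):
        return np.zeros((h, w), dtype=bool)            # static: stop here
    if h <= finest[0] and w <= finest[1]:
        if pixel_test is not None:
            return pixel_test(frame_rect, model_rect)  # pixel-level refinement
        return np.ones((h, w), dtype=bool)             # coarse foreground
    mask = np.zeros((h, w), dtype=bool)
    for rs in (slice(0, h // 2), slice(h // 2, h)):    # recurse on quadrants
        for cs in (slice(0, w // 2), slice(w // 2, w)):
            mask[rs, cs] = detect_multiscale(
                dissimilar, frame_rect[rs, cs], model_rect[rs, cs],
                finest, pixel_test)
    return mask

# Toy usage: mean absolute difference as the region dissimilarity measure.
model = np.zeros((12, 16), dtype=np.uint8)
frame = model.copy()
frame[:3, :4] = 255                                    # simulated moving object
moving = lambda a, b: np.abs(a.astype(int) - b.astype(int)).mean() > 10
mask = detect_multiscale(moving, frame, model)
print(mask.sum())  # -> 12 (only the 3 x 4 moving patch is flagged)
```

Note how static quadrants are rejected wholesale without any pixel-level work, which is the source of the noise filtering and the speed of the region-based approach.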

We have thoroughly characterized our method with various experiments. The goal of this work is to propose a background subtraction method that is efficient, but at the same time does not require much parameter tuning to be easily applicable. Thus, the performance of RECTGAUSS-Tex was tested for parameter stability. The foreground/background segmentations were verified against ground truth using the same parameters for all videos in each dataset. In addition, RECTGAUSS-Tex was compared with other common background subtraction methods.

We used the Wallflower dataset[

First, for the Wallflower dataset, seven different background subtraction methods were tested: RECTGAUSS-Tex (RGT), Single Gaussian (SG) [

For the performance evaluation of our proposed background subtraction method, we used two metrics: True Positive Rate (TPR) and False Positive Rate (FPR). True Positive (TP)/True Negative (TN) is defined as the number of foreground/background pixels that are correctly classified as foreground/background pixels. False Positive (FP)/False Negative (FN) is defined as the number of background/foreground pixels that are erroneously classified as foreground/background. The TPR and FPR are defined as TPR = TP / (TP + FN) and FPR = FP / (FP + TN).
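These metrics follow directly from the four confusion-matrix counts; a minimal sketch (function name and example counts are ours):

```python
def rates(tp, fp, tn, fn):
    """True and False Positive Rates from the confusion-matrix counts."""
    tpr = tp / (tp + fn)   # fraction of true foreground correctly detected
    fpr = fp / (fp + tn)   # fraction of true background wrongly flagged
    return tpr, fpr

# Example: 90 of 100 foreground pixels found, 30 of 900 background pixels
# wrongly flagged as foreground.
tpr, fpr = rates(tp=90, fp=30, tn=870, fn=10)
print(tpr, round(fpr, 4))  # -> 0.9 0.0333
```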

A high TPR will be obtained if the number of real foreground pixels detected in the extracted foreground is much larger than the number of real foreground pixels that are detected in the background (

As shown in

For complete separation of foreground from background, the FPR value has to be very small (

There is no motion in the

In the second experiment, we tested our method with our dataset. The parameters of each method are listed in

Next, we tested the effect of parameters on the performance of the proposed background subtraction method. This can be determined by a metric called sensitivity [

The same normalization is performed for the TNR. As shown in

Now, are the results stable for some parameter range? Do we have to change all the parameters or just a few? Do we have the parameter insensitivity that we wished for? To answer these questions, the normalized sum of TPR and TNR is also calculated to arrive at a general metric for evaluation. As shown in _{c}

The selection of the best parameter set also depends on the application. For some applications, having a high TPR is much more important than a high TNR, because there is a requirement not to miss any foreground parts (e.g., hand tracking, face tracking,

The results show that our method has the following advantages over the state-of-the-art:

Because of the use of rectangular regions, local noise affecting individual pixels is filtered naturally by the histogram properties. Waving leaves in a background can thus be dealt with directly during foreground detection;

Small objects can be ignored by customizing the size of the rectangles at the coarsest level (e.g., to detect cars, not humans);

Texture and rectangular region intensity variance allow our method to deal with light shadows and small illumination changes by adjusting the foreground detection threshold when comparing histograms;

Motion detection can be performed by selecting a different scale dynamically at each frame by stopping histogram subdivision at the foreground detection step. Foreground object shapes may be detected with precision only when required during a tracking process (see

However, like many other background subtraction methods, our method does not handle large illumination changes. Perhaps this could be accounted for by considering histogram shifts. Another drawback is the choice of the coarsest rectangle size, which must be small enough to detect the objects of interest: objects to detect must occupy a large enough proportion of the coarsest rectangle. Thus, choosing a rectangle size large enough to filter noise may be incompatible with the detection of very small objects.

In this paper, a novel background subtraction method is proposed. The background is modeled by rectangular regions described by a color histogram and a texture measure. It is modeled at different scales to detect motion with increasing precision. This background model is combined with the Gaussian Mixture model. The use of rectangular regions filters out small motions, such as swaying vegetation, as well as data acquisition noise. The Gaussian Mixture background subtraction method then completes the work by detailing the foreground detection in rectangular areas where significant motion is detected. Compared to the Gaussian Mixture method alone, RECTGAUSS-Tex gives fewer false positive detections for similar true positive results.

The algorithm was evaluated with various videos against different illumination changes and resolutions. The results obtained show that RECTGAUSS-Tex outperforms the Gaussian Mixture method as it has a similar TPR, but a smaller FPR. For the datasets used, it also outperforms CodeBook, KDE, and TBMOD using their default parameters. Although our algorithm uses eight parameters, six of them are stable, and only two require tuning.

Our motion detection algorithm can be performed at different scales to match the object-shape precision required by an application; that is, detection can be performed only at a coarse scale, with large rectangles, without using the Gaussian Mixture method.

The drawback of the method is that it requires 3% to 6% more processing than the Gaussian Mixture method; but, like the Gaussian Mixture method, ours is suitable for online applications at image resolutions of 320 × 240 or less with our implementation. However, the MDPA distance calculations could be performed in parallel to speed up processing. Future work will involve adjusting the thresholds dynamically based on scene content and object appearance: if the color of an object is similar to the background, the detection threshold could be decreased in that background area for more sensitivity.

This work is supported by the Fonds Québécois de la Recherche sur la Nature et les Technologies (FQRNT), by the Natural Sciences and Engineering Research Council of Canada (NSERC), and by the Canada Foundation for Innovation (CFI). We would like to thank Cynthia Orman for revising the paper and the anonymous reviewers for providing helpful comments.

Motion detection at different scales. Finest rectangle sizes of (a) 4 × 3; (b) 16 × 12; and (c) 32 × 24.

Steps of the background modeling process.

Steps in background updating and motion detection.

Motion detection criterion.

MDPA distance example. This distance is smaller for H1 and H2 than for H1 and H3. The Euclidean distances are similar.

An example of motion detection as the algorithm progresses through the scales. Black rectangles illustrate regions labeled as background. Motion is detected in the rectangle indicated. When the algorithm moves to a finer scale, it will consider all the rectangles in the grayed area.

True Positive Rate of various background subtraction methods for the Wallflower dataset. The Total column represents the combination of all videos. SG: Single Gaussian, GM: Gaussian Mixture, TA: Temporal Average, RGT: our method, KDE: Kernel Density Estimation, CB: CodeBook, TBMOD: Texture-based Moving Object Detection.

False Positive Rate of various background subtraction methods for the Wallflower dataset. The Total column is the combination of all videos. SG: Single Gaussian, GM: Gaussian Mixture, TA: Temporal Average, RGT: our method, KDE: Kernel Density Estimation, CB: CodeBook, TBMOD: Texture-based Moving Object Detection.

Detection results of the proposed method for the Wallflower dataset.

True Positive Rate of different variations of the proposed background subtraction method for our dataset. The Total column is the combination of all the videos. RGT: our method, GM: Gaussian Mixture, KDE: Kernel Density Estimation, CB: CodeBook, TBMOD: Texture-based Moving Object Detection.

False Positive Rate of variations of the proposed background subtraction method for our dataset. The Total column is the combination of all videos. RGT: our method, GM: Gaussian Mixture, KDE: Kernel Density Estimation, CB: CodeBook, TBMOD: Texture-based Moving Object Detection.

Detection results of the proposed method for our dataset.

ROC curve of our method (parameter

Normalized TPR, TNR and TPR + TNR of each parameter of the proposed background subtraction method for the

Parameters for the experiment using Wallflower dataset.

Method | _{g} | _{b} | Δ | _{c} | _{v} | | | | |
---|---|---|---|---|---|---|---|---|---|
SG | - | - | 0.05 | - | - | - | - | - | - |
GM | 3 | 3.5 | 0.007 | 0.50 | - | - | - | - | - |
TA | - | - | - | - | - | - | - | - | 31 |
RGT | 5 | 3.0 | 0.005 | 0.30 | 0.044 | 0.014 | 100 | 2 | - |

_{g}_{b}_{c}_{v}

Parameters for the experiment using our dataset.

Method | _{g} | _{b} | Δ | _{c} | _{v} | | | |
---|---|---|---|---|---|---|---|---|
GM | 3 | 3.4 | 0.009 | 0.50 | - | - | - | - |
RGT | 5 | 3.0 | 0.01 | 0.30 | 0.38 | 0.0095 | 100 | 2 |

_{g}_{b}_{c}_{v}

Parameters for the experiment using both datasets.

Method | _{B} | _{b} | _{ω} | _{P} | _{Region} | | | | | | |
---|---|---|---|---|---|---|---|---|---|---|---|
TBMOD [ | 3 | 0.4 | 0.01 | 0.01 | 0.65 | 9 | 2 | 6 | - | - | - |
KDE [ | - | - | - | - | - | - | - | - | 10 | 15 | 1e-20 |

_{B}_{b}_{ω}_{P}_{region}

Parameters for the experiment using both datasets.

Method | _{1} | _{2} | _{r} | _{t} | _{s} | | | |
---|---|---|---|---|---|---|---|---|
CB [ | 20 | 20 | 0.7 | 1.2 | 60 | 3 | 20 | 100 |

_{1}: Color sampling bandwidth, _{2}: Color detecting threshold, _{r}_{t}_{s}

Processing rate (fps) of various background subtraction methods for different image resolutions.

Resolution | Processing rate of GM (fps) | Processing rate of RGT (fps) |
---|---|---|
640 × 480 | 7.33 | 7.1 |
320 × 240 | 28.3 | 27.2 |
160 × 120 | 86.4 | 81.5 |