
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In this paper, a pixel-based background modeling method that uses nonparametric kernel density estimation is proposed. To reduce the burden of image storage, we modify the original KDE method: the model is initialized with the first frame and updated at every subsequent frame by controlling the learning rate according to the situation. We apply an adaptive threshold based on image changes to effectively subtract dynamic backgrounds. The devised scheme allows the proposed method to automatically adapt to various environments and effectively extract the foreground. The method presented here exhibits good performance and is suitable for dynamic background environments. The algorithm is tested on various video sequences and compared with other state-of-the-art background subtraction methods to verify its performance.

One of the most important aspects of an intelligent vision surveillance system is background subtraction, which is used as a preprocessing step for object detection and tracking in vision systems. Usually, every pixel is searched and compared step-by-step with a predefined object dataset in order to detect or track an object. However, searching every pixel requires considerable computation time; thus, a background subtraction method is generally used to reduce the search region and improve computational performance. Background subtraction is also used in human-computer interaction (HCI) as a preprocessing step to reduce computational cost. As such, background subtraction is an important subject in the field of computer vision. Since background modeling significantly affects the performance of the overall vision system, it is important to employ a good background subtraction method. However, many challenges are associated with background modeling.

Dynamic backgrounds: The background is generally non-static (e.g., waving trees, swaying curtains, escalators, and rippling water surfaces).

Gradual illumination changes: These are caused by either sunlight changes as time elapses or by the sun being covered by clouds.

Sudden illumination changes: Light can sometimes be switched on or off in indoor environments. This can significantly change the background. Thus, a modeled background should quickly adapt to environmental changes.

Moved objects: The background should change when an object is moved. If someone parks a car and it is not moved for a long period of time, the car should be accepted as part of the background.

Shadows: Usually, the shadows of moving objects need to be eliminated.

Another challenge is that many moving foregrounds can appear simultaneously with the above non-static problems. Therefore, background modeling methods should intelligently overcome such issues.

There are three representative approaches for background subtraction methods. First, pixel-based methods extract foregrounds using each pixel independently. Such approaches do not consider the relationships among the surrounding pixels. One of the most commonly used pixel-based methods is Gaussian modeling; Wren et al. modeled each pixel with a single Gaussian distribution.

Another group of background subtraction techniques are the block-based methods. Among these techniques, the Markov random field framework was used by Reddy et al. for background estimation.

The third class of background subtraction approaches are the texture-based methods; Heikkilä et al. modeled the background with local binary pattern (LBP) histograms.

Many background subtraction algorithms have been proposed, and each has produced effective foreground extraction results in a limited environment. However, more robust and faster algorithms are constantly required because, as a preprocessing step, accurate foreground extraction improves subsequent object detection and tracking. In this paper, we use a pixel-based method since it is simpler and faster than block-based or hierarchical methods and yields more precise results. Specifically, we propose an adaptive pixel-based background subtraction method based on kernel density estimation. Through kernel density estimation, we can adaptively devise a probabilistic background model in each environment. The proposed method can automatically adapt to various environments, stochastically deleting non-background information and adding new background values. In addition, the scheme can quickly adapt to sudden or gradual illumination changes. In Section 2, we present the proposed method and background modeling scheme. In Section 3, well-known sequences are used to compare the performance of the proposed method to that of other state-of-the-art methods. Finally, conclusions are presented in Section 4.

Backgrounds are generally non-static, with many dynamic factors such as waving trees, rippling water, and illumination changes. Various attempts have been made to overcome these problems. One of the most useful methods is MOG, but its parameters, such as the number of Gaussian models and the variances, must be selected manually, and initializing the background model with the expectation maximization (EM) algorithm independently at every pixel is time-consuming.

In this paper, we used the kernel density estimation (KDE) method. Given the past intensity samples x_1, ..., x_N of a pixel, the probability of the current value x_t is estimated nonparametrically as

p(x_t) = (1/N) Σ_{i=1}^{N} K_σ(x_t − x_i),

where K_σ is a kernel function (typically Gaussian) with bandwidth σ.
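The KDE estimator above can be sketched as follows; the Gaussian kernel and the bandwidth value are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def kde_probability(x, samples, sigma=5.0):
    """Nonparametric KDE estimate of the probability density of intensity x,
    given past intensity samples of the same pixel (Gaussian kernel).
    sigma (kernel bandwidth) is an illustrative choice."""
    samples = np.asarray(samples, dtype=float)
    k = np.exp(-0.5 * ((x - samples) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return k.mean()
```

For a single sample equal to x, this returns the peak of the Gaussian kernel, 1/(σ√(2π)).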

Each pixel has a probability model. The probability obtained by the KDE method is added to the prior probability density at every frame: the updated density p_t(x) blends the previous density p_{t−1}(x) with the kernel contribution of the current observation x_t,

p_t(x) = (1 − α_t) p_{t−1}(x) + α_t K_σ(x − x_t),

where α_t is the learning rate.

A new probability background model is obtained through the above process. This updating method improves memory effectiveness because it does not require many images to be saved to initialize the probability background model. The updating method automatically reduces the probability of unimportant backgrounds that do not appear over a long period of time, by adding the new probability and performing a normalization step. For example, when a car parked for a long period of time moves or disappears, the proposed method continually updates the environment. Consequently, new background information appears and the prior unimportant background probability associated with the car is automatically lowered by updating the background model. We used a time-varying learning rate α_t in the update, so that the influence of new observations can be controlled frame by frame.

The value of α_t is large in the initial stage, so that the model adapts quickly, and decreases toward a small constant as the model stabilizes over time.
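The behavior described above can be sketched with a sigmoid-scheduled learning rate; the bounds, midpoint, and slope below are illustrative assumptions, not the paper's values.

```python
import math

def learning_rate(t, alpha_min=0.01, alpha_max=0.5, t0=100.0, s=20.0):
    """Sigmoid-scheduled learning rate: large in the early frames so the
    model adapts quickly, decaying toward a small constant (alpha_min)
    as the model stabilizes. All parameter values are illustrative."""
    return alpha_min + (alpha_max - alpha_min) / (1.0 + math.exp((t - t0) / s))
```

Early frames get a rate near alpha_max; after the transition around frame t0 the rate settles at alpha_min.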

An example of the value of α_t over time is shown in the corresponding figure.

A few of the problems associated with the non-parametric kernel density estimation approach are the undesirably long processing time and the large memory requirement. We can reduce the complexity and memory requirement using histogram approximation: for each channel d, the continuous density is approximated by a histogram whose bins C_k have width W_d. The Gaussian probability and an example of histogram approximation are shown in the corresponding figure.

A normalization method was employed in this work since the probability histogram p_t^d must sum to one over its bins C_k. After every update, each bin is divided by the total mass,

p̂_t^d(C_k) = p_t^d(C_k) / Σ_j p_t^d(C_j),

so that Σ_k p̂_t^d(C_k) = 1.

To reduce the complexity, the bin to which the current input belongs can be found directly by taking the integer part of the input divided by the bin width: k = ⌊x_t^d / W_d⌋, where x_t^d is the current input of channel d and W_d is the bin width.

For instance, if the input sequences have values in [0, 255] and the bin width W_d is 4, the histogram has 64 bins and an input value of 100 falls into bin ⌊100/4⌋ = 25.
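The bin lookup above reduces to a single integer division:

```python
def bin_index(x, bin_width=4):
    """Map an intensity in [0, 255] to its histogram bin by integer
    division; a bin width of 4 (64 bins) is the example used in the text."""
    return int(x) // bin_width
```

For example, `bin_index(100)` gives bin 25 and `bin_index(255)` gives the last bin, 63.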

To update the probability histogram, we applied a Gaussian whose mean is the input value (offset by W_d/2 to the bin center). The Gaussian weights are added to p_t^d over the neighboring bins from C_(k−2) to C_(k+2).

When we update p_t^d, the neighbor indices (k−2) or (k+2) may fall outside the valid range of the histogram; such out-of-range contributions are discarded, and the histogram is then renormalized.

Consider again the previous example. If the bin width W_d or the kernel standard deviation σ_d changes, the span of the Gaussian changes accordingly: the update covers the bins from C_(k−G) to C_(k+G), where G = ⌊σ_d / W_d⌋ + 1.

If we calculate and save the Gaussian weights in advance, we can reduce the computational cost by simply using the saved values each time we update the histogram.
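The histogram update with precomputed Gaussian weights can be sketched as follows; the kernel width in bin units and the blending rate are illustrative assumptions.

```python
import numpy as np

# Precompute Gaussian weights once for bin offsets -2..+2 (5 neighbors),
# so the exponential is never re-evaluated during the per-pixel update.
SIGMA = 1.0                      # kernel std. dev. in bin units (assumed)
OFFSETS = np.arange(-2, 3)
WEIGHTS = np.exp(-0.5 * (OFFSETS / SIGMA) ** 2)
WEIGHTS /= WEIGHTS.sum()

def update_histogram(hist, k, alpha=0.05):
    """Blend the stored histogram toward a Gaussian centered on bin k,
    discarding out-of-range neighbors, then renormalize to sum to one."""
    hist *= (1.0 - alpha)
    for off, w in zip(OFFSETS, WEIGHTS):
        j = k + off
        if 0 <= j < hist.size:      # out-of-range contributions dropped
            hist[j] += alpha * w
    hist /= hist.sum()
    return hist
```

Starting from a uniform 64-bin histogram, repeated updates at the same bin concentrate the probability mass there.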

Most background extraction methods use color information, especially the RGB color space. However, RGB color is very sensitive to illumination changes, whereas the HSV color space lets us analyze the color itself and the illumination independently: even if the illumination changes significantly, the hue and saturation remain stable. Compared with RGB, HSV is therefore more useful for devising a background model and removing shadows, so we employed the HSV color space to develop the background model. Note that hue is not linear: its values repeat periodically, which must be considered when updating the probability histogram. For example, if the histogram bins span [0, 63], the Gaussian update normally covers C_(k−2) to C_(k+2); if k lies near a boundary (e.g., k = 0), the neighbor indices wrap around to bins 62 and 63 instead of being discarded.
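The periodic hue update can be handled with modulo indexing, as in this sketch (the blending rate is an illustrative assumption; `weights` holds the precomputed Gaussian weights for offsets −2..+2):

```python
def update_hue_histogram(hist, k, weights, alpha=0.05):
    """Same Gaussian spread as the other channels, but hue is periodic:
    neighbor indices wrap around with modulo instead of being discarded."""
    n = len(hist)
    hist = [(1.0 - alpha) * h for h in hist]
    for off, w in zip(range(-2, 3), weights):
        j = (k + off) % n        # e.g., with 64 bins and k = 0: 62, 63, 0, 1, 2
        hist[j] += alpha * w
    s = sum(hist)
    return [h / s for h in hist]
```

With k = 0 and 64 bins, the update touches bins 62, 63, 0, 1, and 2, so no kernel mass is lost at the boundary.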

A background subtraction algorithm is composed of two steps: background modeling and background updating. Most background subtraction algorithms collect image frames and use the collected images to generate the background model. The algorithms then extract the foreground using the background model, which is subsequently updated. However, our proposed method does not require an image collection process to generate the background model. The scheme proposed here updates the background model and extracts the foreground at every frame. In other words, the probability background model is initialized by the first frame and updated by the same process with the initialized background. As time elapses, this method automatically adapts to the environment and extracts the foreground in a more precise manner. The proposed method used the minimum distance value between the current image and the background model to obtain the foreground. We also used the average mean value of the minimum distances to adaptively extract the foreground.

The foreground is acquired via the following steps. First, for each channel d, the bin C_k nearest to the current input is found, and the minimum distance D_d between the input value and the background model bins is computed.

We can obtain the foreground by comparing the minimum distance D_d with an adaptive threshold Th_{t,d}: a pixel is classified as foreground when D_d exceeds Th_{t,d}. The threshold is updated at every frame by blending the previous threshold Th_{t−1,d} with the average of the minimum distances D_d over the image, so that the sensitivity automatically follows the amount of change in each channel d.
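The adaptive thresholding step can be sketched as below. The exact update rule is not fully recoverable from the text, so the exponential blend with a scale factor, and both parameter values, are assumed stand-ins.

```python
import numpy as np

def extract_foreground(dist, prev_th, beta=0.05, scale=3.0):
    """Classify pixels whose minimum background distance `dist` (H x W)
    exceeds an adaptive threshold as foreground, then update the
    threshold from the frame-average distance. `beta` (blend rate) and
    `scale` are illustrative assumptions."""
    fg = dist > prev_th
    new_th = (1.0 - beta) * prev_th + beta * scale * dist.mean()
    return fg, new_th
```

The threshold thus rises in frames with large overall change and relaxes in quiet frames.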

To remove the shadows of moving objects, we applied a moving cast shadow detection algorithm in HSV space: a foreground pixel is classified as shadow if the ratio of its value channel I^v to the background value B^v lies within a fixed range, while the differences of its saturation I^s and hue I^h from the background values remain below thresholds τ_s and τ_h, respectively.
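A minimal sketch of this HSV shadow test follows; the ratio range and the thresholds τ_s and τ_h are illustrative assumptions (channels assumed normalized to [0, 1]).

```python
import numpy as np

def shadow_mask(img_hsv, bg_hsv, lo=0.4, hi=0.9, tau_s=0.2, tau_h=0.1):
    """HSV cast-shadow test: a pixel is shadow if its value ratio to the
    background lies in [lo, hi] while hue and saturation stay close to
    the background. All threshold values here are illustrative."""
    h, s, v = img_hsv[..., 0], img_hsv[..., 1], img_hsv[..., 2]
    bh, bs, bv = bg_hsv[..., 0], bg_hsv[..., 1], bg_hsv[..., 2]
    ratio = v / np.maximum(bv, 1e-6)          # darkening ratio
    return (lo <= ratio) & (ratio <= hi) \
        & (np.abs(s - bs) <= tau_s) & (np.abs(h - bh) <= tau_h)
```

A pixel darkened to half the background brightness with unchanged hue and saturation is flagged as shadow; a pixel at full brightness is not.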

If the background itself is significantly changed (e.g., suddenly brightened or darkened), fast adaptation is required. We can obtain this effect by re-initializing the learning rate α_t, which makes the model re-adapt as quickly as in the initial stage.

Such a sudden change is detected from the value channel: if the average absolute change of the value channel over all pixels (i, j) between consecutive frames exceeds a threshold T_v, the learning rate α_t is re-initialized.
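This global-change test can be sketched as follows; the threshold value is an assumption (value channel assumed normalized to [0, 1]).

```python
import numpy as np

def sudden_change(v_curr, v_prev, threshold=0.25):
    """Detect a global illumination change: if the mean absolute change
    of the value channel over all pixels exceeds `threshold`, the
    learning rate should be re-initialized. Threshold is illustrative."""
    return np.mean(np.abs(v_curr - v_prev)) > threshold
```

Switching a light off, which shifts the value channel everywhere at once, trips this test, while ordinary foreground motion affecting a few pixels does not.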

We tested the proposed method with the Li and Wallflower datasets (the Li dataset is publicly available).

Three measures were used to evaluate the performance of the proposed method: recall, precision, and F-measure. Recall is the number of correctly detected foreground pixels divided by the number of true foreground pixels; it shows how many of the true foreground pixels are classified as foreground. Precision is the number of correctly detected foreground pixels divided by the number of pixels assigned to the foreground; it indicates how many of the assigned foreground pixels are truly foreground.

High recall or high precision alone does not imply high performance; each measure can be misleading when examined in isolation. For example, a simple algorithm that assigns every pixel to the foreground has a perfect recall of 100% but unacceptably low precision. Conversely, a system that assigns most pixels to the background scores high precision but sacrifices recall to a significant degree. Since there is usually a trade-off between the two, we also report the F_1 measure, the harmonic mean of precision and recall: F_1 = 2 · precision · recall / (precision + recall).
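The three measures can be computed from boolean foreground masks as follows:

```python
import numpy as np

def evaluate(pred, truth):
    """Recall, precision, and F1 from boolean foreground masks
    (pred = algorithm output, truth = ground truth)."""
    tp = np.logical_and(pred, truth).sum()       # correctly detected foreground
    recall = tp / max(truth.sum(), 1)            # vs. all true foreground
    precision = tp / max(pred.sum(), 1)          # vs. all assigned foreground
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return recall, precision, f1
```

A prediction that hits one of two true foreground pixels while also flagging one background pixel scores recall 0.5, precision 0.5, and F1 0.5.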

To verify the performance of the proposed method, we used seven video sequences from the Li dataset. The results obtained with the proposed scheme are compared with those from the MOG method and other state-of-the-art methods.

The qualitative results for each sequence are shown in the corresponding figure.

To compare the performance of the proposed method with that of the other methods, we used the parameters presented in the papers detailing those methods or found appropriate parameters by repeated testing. If a paper proposed a parameter set for an image sequence, we used the given values; otherwise, we used the default parameters of the available implementations or searched for the best parameters. The Gaussian mixture model was the OpenCV implementation run in default mode, and the CodeBook algorithm was tested with a publicly available implementation.

We used six sequences in the Wallflower dataset to test our method. The first sequence is BOOTSTRAP (B), which contains many moving people and numerous shadows. If the updating speed for the background is too fast or the threshold is too high, people at the desk can be classified as part of the background. In this sequence, the proposed algorithm is able to effectively eliminate the shadows, while the other methods sometimes cannot reduce such errors. The second sequence is CAMOUFLAGE (C). In this sequence, the codebook method yields the best result; while our method has a lower recall than codebook, it exhibits higher precision than codebook and the other methods. We can confirm that our method adapts to sudden environment changes with the LIGHTSWITCH (LS) sequence, which contains 2,714 frames: the light is turned off after 812 frames and turned on again at frame 1,854. The sequence MOVEOBJECT (MO) contains 1,745 frames with a moving object and is appropriate for testing the adaptability of the background model. When the chair is moved at frame 888, it should become part of the background after a suitable period of time. The recall, precision, and F1 results for the MO sequence are not displayed here; however, the adaptability of each background modeling method is shown in the corresponding figure.

The recall results obtained with all methods are shown in the corresponding tables.

In this work, we also investigated the effect of different values of the bin width W_d on the performance of the proposed method.

If the number of bins is large (i.e., the bin width W_d is small), the background model is more precise but the memory and computation costs grow; if W_d is large, the model is coarser but faster. An appropriate W_d therefore balances accuracy against cost.

Since the proposed method uses histograms instead of full density estimation and the Gaussian values are calculated in advance according to distance, the complex kernel computation is never repeated and the overall process is simplified. Also, because the proposed method does not use other complex operations, such as calculating gradient information or thresholds over all pixel values, it is much faster than the KDE method and the method of Park et al.

An adaptive background subtraction method based on kernel density estimation was presented. The background is modeled probabilistically by kernel density estimation. To reduce the computational complexity and memory requirements, we modified the original kernel density estimation method, applying histogram approximation and a modified updating scheme. The method automatically adapts to the environment as time progresses and reduces the complexity compared with the original KDE approach. In the initial stage, the proposed method could not correctly extract the foreground, because a moving object and the space it passes through can be classified as background; the background therefore needs to be re-updated quickly in the initial stage, while the updating process should stabilize as time goes on. We consequently applied a sigmoid function to control the learning rate according to the situation.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0026367).

Example of the value of the learning rate α_t over time.

Gaussian probability and an example of histogram approximation; the Gaussian is approximated over the histogram bins C_k of width W_d.

An example of how the histogram was updated.

An example of possible Gaussians in a bin.

Summary of the proposed algorithm.

Background subtraction results obtained with the proposed scheme and other methods using the Li dataset. The first frame of each video sequence is shown in the first row, the test frames are displayed in the second row, the ground truth data of the test frames are shown in the third row, and the results obtained with the proposed method are displayed in the fourth row. The results acquired with the other methods are shown in the fifth to eighth rows.

The recall results obtained with the proposed scheme and other methods for the Li dataset. The AVG column represents the average values of the results in all datasets.

The precision results obtained with the proposed scheme and other methods for the Li dataset. The AVG column represents the average values of the results in all datasets.

The F-measure results obtained with the proposed scheme and other methods for the Li dataset. The AVG column represents the average values of the results in all datasets.

Background subtraction results obtained with the proposed scheme and other methods using the Wallflower dataset. The first frame of each video sequence is shown in the first row, test frames are displayed in the second row, ground truth data for the test frames are shown in the third row, and the results obtained with the proposed method are displayed in the fourth row. The results obtained with the other methods are shown in the fifth to eighth rows.

The recall results obtained with the proposed scheme and other methods for the Wallflower dataset. The AVG column represents the average values of the results in all datasets.

The precision results obtained with the proposed scheme and other methods for the Wallflower dataset. The AVG column represents the average values of the results in all datasets.

The F1 results obtained with the proposed scheme and other methods for the Wallflower dataset. The AVG column represents the average values of the results in all datasets.

The evaluation performance as a function of the bin width W_d.

A contingency table.

                    | Classified foreground | Classified background
Actual foreground   | A                     | B
Actual background   | C                     | D