# Asynchronous Semantic Background Subtraction


## Abstract


## 1. Introduction

#### Problem Statement

**Contributions.** We summarize our contributions as follows. (i) We propose a novel method, called ASBS, for the task of background subtraction. (ii) We alleviate the slow computation of semantics by replacing it, for the frames where it is not available, with a substitute based on a change detection algorithm. This makes our method usable in real time. (iii) We show that, at a semantic frame rate corresponding to real-time computations, we achieve results close to those of SBS, meaning that our substitute for semantics is adequate. (iv) We show that ASBS, combined with a real-time BGS algorithm such as ViBe and a simple feedback mechanism, achieves performances close to those of non-real-time state-of-the-art BGS algorithms such as SuBSENSE, while satisfying the real-time constraint.

## 2. Description of the Semantic Background Subtraction Method

## 3. Asynchronous Semantic Background Subtraction

**Algorithm 1.** Pseudo-code of ASBS for pixels with semantics. The rule and color maps are updated during the application of SBS (note that $R$ is initialized with zero values at the program start).

**Require:** ${I}_{t}$ is the input color frame (at time $t$)

1. **for all** $(x,y)$ with semantics **do**
2. $\quad{D}_{t}(x,y)\leftarrow$ apply SBS in $(x,y)$
3. $\quad$**if** rule 1 was activated **then**
4. $\quad\quad R(x,y)\leftarrow 1$
5. $\quad\quad C(x,y)\leftarrow {I}_{t}(x,y)$
6. $\quad$**else if** rule 2 was activated **then**
7. $\quad\quad R(x,y)\leftarrow 2$
8. $\quad\quad C(x,y)\leftarrow {I}_{t}(x,y)$
9. $\quad$**else**
10. $\quad\quad R(x,y)\leftarrow 0$
11. $\quad$**end if**
12. **end for**
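The per-pixel update of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `asbs_with_semantics`, the dictionary-based maps keyed by `(x, y)`, and the label encoding `BG = 0`, `FG = 1` are assumptions made for the example, and the SBS decision is reduced to the two threshold tests listed in Table 2.

```python
BG, FG = 0, 1  # label encoding chosen for this sketch (assumption)


def asbs_with_semantics(I_t, B_t, S_bg, S_fg, tau_bg, tau_fg, R, C):
    """Apply SBS to every pixel that has semantics and refresh the maps.

    I_t, B_t, S_bg and S_fg are dicts keyed by (x, y). The rule map R and
    the color map C are updated in place. Returns the output map D_t.
    """
    D_t = {}
    for xy in S_bg:                  # pixels for which semantics is available
        if S_bg[xy] <= tau_bg:       # rule 1: semantics indicates background
            D_t[xy] = BG
            R[xy] = 1
            C[xy] = I_t[xy]          # store the color seen when the rule fired
        elif S_fg[xy] >= tau_fg:     # rule 2: semantics indicates foreground
            D_t[xy] = FG
            R[xy] = 2
            C[xy] = I_t[xy]
        else:                        # fallback: keep the BGS decision
            D_t[xy] = B_t[xy]
            R[xy] = 0
    return D_t
```

The stored pair `(R, C)` is what later frames without semantics rely on, which is why both maps are refreshed at every pixel visited here.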

**Algorithm 2.** Pseudo-code of ASBS for pixels without semantics: rule A, rule B, or the fallback is applied.

**Require:** ${I}_{t}$ is the input color frame (at time $t$)

1. **for all** $(x,y)$ without semantics **do**
2. $\quad$**if** $R(x,y)=1$ **then**
3. $\quad\quad$**if** $\mathrm{dist}\left(C(x,y),{I}_{t}(x,y)\right)\le {\tau}_{A}$ **then**
4. $\quad\quad\quad{D}_{t}(x,y)\leftarrow \mathrm{BG}$
5. $\quad\quad$**end if**
6. $\quad$**else if** $R(x,y)=2$ **then**
7. $\quad\quad$**if** $\mathrm{dist}\left(C(x,y),{I}_{t}(x,y)\right)\le {\tau}_{B}$ **then**
8. $\quad\quad\quad{D}_{t}(x,y)\leftarrow \mathrm{FG}$
9. $\quad\quad$**end if**
10. $\quad$**else**
11. $\quad\quad{D}_{t}(x,y)\leftarrow {B}_{t}(x,y)$
12. $\quad$**end if**
13. **end for**
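A plain-Python sketch of the decision for pixels without semantics is given below. It is illustrative only: the names `manhattan` and `asbs_without_semantics`, the dictionary-based maps, and the encoding `BG = 0`, `FG = 1` are assumptions of the example. The distance is taken to be the Manhattan distance on color values, and the sketch assumes that a pixel whose distance test fails falls back to the BGS output ${B}_{t}$, a case the pseudo-code leaves implicit.

```python
BG, FG = 0, 1  # label encoding chosen for this sketch (assumption)


def manhattan(c1, c2):
    """Manhattan (L1) distance between two color tuples."""
    return sum(abs(a - b) for a, b in zip(c1, c2))


def asbs_without_semantics(I_t, B_t, R, C, tau_A, tau_B):
    """Decide every pixel of a frame for which no semantics was computed.

    R and C are the rule and color maps left by the last frame with
    semantics; pixels missing from R are treated as R = 0.
    """
    D_t = {}
    for xy, color in I_t.items():
        rule = R.get(xy, 0)
        if rule == 1 and manhattan(C[xy], color) <= tau_A:
            D_t[xy] = BG         # rule A: color unchanged since rule 1 fired
        elif rule == 2 and manhattan(C[xy], color) <= tau_B:
            D_t[xy] = FG         # rule B: color unchanged since rule 2 fired
        else:
            D_t[xy] = B_t[xy]    # fallback (also used when the test fails: assumption)
    return D_t
```

The color test acts as a change detector: as long as the pixel still looks the way it did when semantics last made a decision, that decision is repeated.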

#### Timing Diagrams of ASBS

- ${I}_{t}$, ${S}_{t}$, ${B}_{t}$, and ${D}_{t}$ respectively denote an arbitrary input frame, its semantics, the segmentation mask produced by the BGS algorithm, and the segmentation mask produced by ASBS, all indexed by $t$.
- ${\delta}_{I}$ represents the time between two consecutive input frames.
- ${\Delta}_{S}$, ${\Delta}_{B}$, and ${\Delta}_{D}$ are, respectively, the times needed to compute the semantics, to compute the BGS output, and to apply SBS or ASBS (applying SBS and applying ASBS are assumed to take the same time). These times are reasonably constant.
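To make the role of these quantities concrete, the following sketch checks the real-time condition ${\Delta}_{B}+{\Delta}_{D}<{\delta}_{I}$ and computes the constant output delay ${\Delta}_{S}+{\Delta}_{D}$, both of which are stated with the timing diagrams. The function name `asbs_timing` is hypothetical, and the last returned value, the number of input frames per semantic frame estimated as $\lceil {\Delta}_{S}/{\delta}_{I}\rceil$, is an assumption of this sketch (it supposes that the semantic network starts on a new frame as soon as it becomes free).

```python
import math


def asbs_timing(delta_I, Delta_S, Delta_B, Delta_D):
    """Return (real_time, output_delay, frames_per_semantic_frame).

    All arguments must be expressed in the same time unit.
    """
    real_time = Delta_B + Delta_D < delta_I      # BGS plus ASBS fit in one frame period
    output_delay = Delta_S + Delta_D             # constant lag of the output stream
    frames_per_semantic = max(1, math.ceil(Delta_S / delta_I))  # assumption of this sketch
    return real_time, output_delay, frames_per_semantic
```

For instance, with hypothetical values ${\delta}_{I}=40$, ${\Delta}_{S}=180$, ${\Delta}_{B}=20$, and ${\Delta}_{D}=2$ (same unit), the call reports that the real-time condition holds, that the output lags by 182 time units, and that semantics is available for one frame out of five.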

## 4. Experimental Results

#### 4.1. Evaluation Methodology

#### 4.2. Performances of ASBS

#### 4.3. A Feedback Mechanism for SBS and ASBS

#### 4.4. Time Analysis of ASBS

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Bouwmans, T. Traditional and recent approaches in background modeling for foreground detection: An overview. Comput. Sci. Rev. **2014**, 11–12, 31–66.
2. Stauffer, C.; Grimson, E. Adaptive background mixture models for real-time tracking. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Corfu, Greece, 20–25 September 1995; Volume 2, pp. 246–252.
3. Elgammal, A.; Harwood, D.; Davis, L. Non-parametric Model for Background Subtraction. In European Conference on Computer Vision (ECCV); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2000; Volume 1843, pp. 751–767.
4. Maddalena, L.; Petrosino, A. A Self-Organizing Approach to Background Subtraction for Visual Surveillance Applications. IEEE Trans. Image Proc. **2008**, 17, 1168–1177.
5. Barnich, O.; Van Droogenbroeck, M. ViBe: A universal background subtraction algorithm for video sequences. IEEE Trans. Image Proc. **2011**, 20, 1709–1724.
6. St-Charles, P.L.; Bilodeau, G.A.; Bergevin, R. SuBSENSE: A Universal Change Detection Method with Local Adaptive Sensitivity. IEEE Trans. Image Proc. **2015**, 24, 359–373.
7. St-Charles, P.L.; Bilodeau, G.A.; Bergevin, R. Universal Background Subtraction Using Word Consensus Models. IEEE Trans. Image Proc. **2016**, 25, 4768–4781.
8. Bianco, S.; Ciocca, G.; Schettini, R. Combination of Video Change Detection Algorithms by Genetic Programming. IEEE Trans. Evol. Comput. **2017**, 21, 914–928.
9. Javed, S.; Mahmood, A.; Bouwmans, T.; Jung, S.K. Background-Foreground Modeling Based on Spatiotemporal Sparse Subspace Clustering. IEEE Trans. Image Proc. **2017**, 26, 5840–5854.
10. Ebadi, S.; Izquierdo, E. Foreground Segmentation with Tree-Structured Sparse RPCA. IEEE Trans. Pattern Anal. Mach. Intell. **2018**, 40, 2273–2280.
11. Vacavant, A.; Chateau, T.; Wilhelm, A.; Lequièvre, L. A Benchmark Dataset for Outdoor Foreground/Background Extraction. In Asian Conference on Computer Vision (ACCV); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2012; Volume 7728, pp. 291–300.
12. Wang, Y.; Jodoin, P.M.; Porikli, F.; Konrad, J.; Benezeth, Y.; Ishwar, P. CDnet 2014: An Expanded Change Detection Benchmark Dataset. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Columbus, OH, USA, 23–28 June 2014; pp. 393–400.
13. Cuevas, C.; Yanez, E.; Garcia, N. Labeled dataset for integral evaluation of moving object detection algorithms: LASIESTA. Comput. Vis. Image Understand. **2016**, 152, 103–117.
14. Braham, M.; Van Droogenbroeck, M. Deep Background Subtraction with Scene-Specific Convolutional Neural Networks. In Proceedings of the IEEE International Conference on Systems, Signals and Image Processing (IWSSIP), Bratislava, Slovakia, 23–25 May 2016; pp. 1–4.
15. Bouwmans, T.; Garcia-Garcia, B. Background Subtraction in Real Applications: Challenges, Current Models and Future Directions. arXiv **2019**, arXiv:1901.03577.
16. Lim, L.; Keles, H. Foreground Segmentation Using Convolutional Neural Networks for Multiscale Feature Encoding. Pattern Recognit. Lett. **2018**, 112, 256–262.
17. Wang, Y.; Luo, Z.; Jodoin, P.M. Interactive Deep Learning Method for Segmenting Moving Objects. Pattern Recognit. Lett. **2017**, 96, 66–75.
18. Zheng, W.B.; Wang, K.F.; Wang, F.Y. Background Subtraction Algorithm With Bayesian Generative Adversarial Networks. Acta Autom. Sin. **2018**, 44, 878–890.
19. Babaee, M.; Dinh, D.; Rigoll, G. A Deep Convolutional Neural Network for Background Subtraction. Pattern Recognit. **2018**, 76, 635–649.
20. Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene Parsing through ADE20K Dataset. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5122–5130.
21. Everingham, M.; Van Gool, L.; Williams, C.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. Available online: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (accessed on 1 August 2019).
22. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
23. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. Microsoft COCO: Common Objects in Context. In European Conference on Computer Vision (ECCV); Lecture Notes in Computer Science; Springer: Berlin, Germany, 2014; Volume 8693, pp. 740–755.
24. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014; pp. 580–587.
25. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239.
26. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
27. Sevilla-Lara, L.; Sun, D.; Jampani, V.; Black, M.J. Optical Flow with Semantic Segmentation and Localized Layers. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 3889–3898.
28. Vertens, J.; Valada, A.; Burgard, W. SMSnet: Semantic motion segmentation using deep convolutional neural networks. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 582–589.
29. Reddy, N.; Singhal, P.; Krishna, K. Semantic Motion Segmentation Using Dense CRF Formulation. In Proceedings of the Indian Conference on Computer Vision Graphics and Image Processing, Bangalore, India, 14–18 December 2014; pp. 1–8.
30. Braham, M.; Piérard, S.; Van Droogenbroeck, M. Semantic Background Subtraction. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 4552–4556.
31. Cioppa, A.; Van Droogenbroeck, M.; Braham, M. Real-Time Semantic Background Subtraction. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 25–28 October 2020.
32. Van Droogenbroeck, M.; Braham, M.; Piérard, S. Foreground and Background Detection Method. European Patent Office, EP 3438929 A1, 7 February 2017.
33. Roy, S.; Ghosh, A. Real-Time Adaptive Histogram Min-Max Bucket (HMMB) Model for Background Subtraction. IEEE Trans. Circuits Syst. Video Technol. **2018**, 28, 1513–1525.
34. Piérard, S.; Van Droogenbroeck, M. Summarizing the performances of a background subtraction algorithm measured on several videos. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, 25–28 October 2020.
35. Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Semantic understanding of scenes through the ADE20K dataset. Int. J. Comput. Vis. **2019**, 127, 302–321.
36. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Implementation of PSPNet. Available online: https://github.com/hszhao/PSPNet (accessed on 1 August 2019).
37. Barnich, O.; Van Droogenbroeck, M. Code for ViBe. Available online: https://orbi.uliege.be/handle/2268/145853 (accessed on 1 August 2019).
38. St-Charles, P.L. Code for SuBSENSE. Available online: https://bitbucket.org/pierre_luc_st_charles/subsense (accessed on 1 August 2019).
39. Jiang, S.; Lu, X. WeSamBE: A Weight-Sample-Based Method for Background Subtraction. IEEE Trans. Circuits Syst. Video Technol. **2018**, 28, 2105–2115.

**Figure 1.** Timing diagram of a naive real-time implementation of the semantic background subtraction (SBS) method when the frame rate of semantics is too slow to handle all the frames in real time. From top to bottom, the timelines represent: the input frames ${I}_{t}$, the computation of semantics ${S}_{t}$ by the semantic segmentation algorithm (on GPU), the computation of intermediate segmentation masks ${B}_{t}$ by the background subtraction (BGS) algorithm (on CPU), and the computation of output segmentation masks ${D}_{t}$ by the SBS method (on CPU). Vertical lines indicate when an image is available, and filled rectangular areas show when a GPU or CPU performs a task. Arrows show the inputs required by the different tasks. This diagram shows that, even when the background subtraction algorithm is real time with respect to the input frame rate, it is the computation of semantics that dictates the output frame rate.

**Figure 2.** Schematic representation of our method, named ASBS, which extends SBS [30] and is capable of combining the two asynchronous streams of semantics and background subtraction masks to improve the performances of BGS algorithms. When semantics is available, Asynchronous Semantic Background Subtraction (ASBS) applies Rule 1, Rule 2, or selects the fallback, and it updates the color and rule maps. Otherwise, ASBS applies Rule A, Rule B, or selects the fallback.

**Figure 3.** Timing diagram of ASBS in the case of a real-time BGS algorithm (${\Delta}_{B}<{\delta}_{I}$) satisfying the condition ${\Delta}_{B}+{\Delta}_{D}<{\delta}_{I}$. Note that the output stream is delayed by a constant time ${\Delta}_{S}+{\Delta}_{D}$ with respect to the input stream.

**Figure 4.** Overall ${F}_{1}$ scores obtained with SBS and ASBS for four state-of-the-art BGS algorithms and different sub-sampling factors. As the semantic frame rate decreases, the performances of ASBS decrease much more slowly than those of SBS and, therefore, stay much closer to those of the ideal case (SBS with all semantic maps computed, that is, SBS $1:1$), meaning that ASBS provides better decisions for frames without semantics. On average, ASBS with 1 frame of semantics out of 25 frames (ASBS $25:1$) performs as well as SBS (with copy of ${B}_{t}$) with 1 frame of semantics out of 2 frames (SBS $2:1$).

**Figure 5.** Effects of SBS and ASBS on BGS algorithms in the mean ROC space of CDnet 2014 [12]. Each point represents the performance of a BGS algorithm, and the end of the associated arrow indicates the performance after application of the methods for a temporal sub-sampling factor of $5:1$. We observe that SBS improves the performances, but only marginally, whereas ASBS moves the performances much closer to the oracle (upper left corner).

**Figure 6.** Per-category analysis. We display the relative improvements of the ${F}_{1}$ score of SBS, ASBS, and the second heuristic compared with the original algorithms, by considering only the frames without semantics (at a $5:1$ semantic frame rate).

**Figure 7.** Evolution of the optimal thresholds ${\tau}_{A}$ and ${\tau}_{B}$ of the ASBS method when the semantic frame rate is reduced. Note that the Manhattan distance associated with these thresholds is computed on 8-bit color values. The results are shown here for the PAWCS algorithm and follow the same trend for the IUTIS-5, SuBSENSE, and WeSamBE BGS algorithms.

**Figure 8.** Our feedback mechanism, which impacts the decisions of any BGS algorithm whose model update is conservative, consists in replacing the $\mathrm{BG}/\mathrm{FG}$ segmentation of the BGS algorithm with the final segmentation map improved by semantics (either by SBS or ASBS) when updating the internal background model.
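The feedback loop described in this caption can be illustrated with a deliberately simple background model. The sketch below is not ViBe or any algorithm evaluated in this paper: `ToyConservativeBGS` is a hypothetical one-value-per-pixel running average, `step_with_feedback` and `refine` are names invented for the example, and `refine` stands for whichever of SBS or ASBS produces ${D}_{t}$. The only point it shows is that the model update receives ${D}_{t}$ instead of ${B}_{t}$.

```python
BG, FG = 0, 1  # label encoding chosen for this sketch (assumption)


class ToyConservativeBGS:
    """Hypothetical grayscale running-average model with a conservative update."""

    def __init__(self, first_frame, threshold=30, alpha=0.5):
        self.model = dict(first_frame)   # one background value per pixel
        self.threshold = threshold
        self.alpha = alpha

    def segment(self, frame):
        # A pixel is foreground when it departs too much from the model.
        return {xy: FG if abs(v - self.model[xy]) > self.threshold else BG
                for xy, v in frame.items()}

    def update(self, frame, mask):
        # Conservative update: only pixels labeled BG in `mask` enter the model.
        for xy, v in frame.items():
            if mask[xy] == BG:
                self.model[xy] = (1 - self.alpha) * self.model[xy] + self.alpha * v


def step_with_feedback(bgs, frame, refine):
    """Process one frame; `refine(frame, B_t)` returns the improved map D_t."""
    B_t = bgs.segment(frame)
    D_t = refine(frame, B_t)     # SBS or ASBS, supplied by the caller
    bgs.update(frame, D_t)       # feedback: D_t replaces B_t in the model update
    return B_t, D_t
```

With a conservative update, a pixel wrongly kept as foreground is never absorbed into the model; feeding ${D}_{t}$ back lets a correction made by semantics reach the model as well.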

**Figure 9.** Comparison of the performances of SBS and ASBS, measured by the mean ${F}_{1}$ score on the CDnet 2014 dataset, when a feedback uses ${D}_{t}$ to update the model of the BGS algorithm. The results are given with respect to a decreasing semantic frame rate. It can be seen that SBS and ASBS always improve the results of the original BGS algorithm and that the feedback is beneficial. Graphs in the right column show that the intrinsic quality of the BGS algorithms is improved, as their output ${B}_{t}$, prior to any combination with semantics, produces higher mean ${F}_{1}$ scores.

**Figure 10.** Illustration of the results of ASBS using ViBe as the BGS algorithm. From left to right, we provide the original color image, the ground truth, the segmentation provided by the original ViBe algorithm, the segmentation obtained with our ASBS method without any feedback, and the segmentation obtained with ASBS and a feedback. Each row corresponds to a representative frame of a video in each category of CDnet 2014.

**Figure 11.** Timing diagram of ASBS with a feedback mechanism in the case of a real-time BGS algorithm (${\Delta}_{B}<{\delta}_{I}$) satisfying the condition ${\Delta}_{B}+{\Delta}_{D}<{\delta}_{I}$, with the computation of semantics not being real time (${\Delta}_{S}>{\delta}_{I}$). Note that the feedback time ${\Delta}_{F}$ is negligible.

**Table 1.** Comparison of the best mean ${F}_{1}$ score achieved for two semantic networks used in combination with SBS on the CDnet 2014 dataset. These performances are obtained with the SBS method when the output of the BGS algorithm is replaced by the ground-truth masks. This indicates how much the semantic information used in SBS would deteriorate a perfect BGS algorithm.

**Table 2.** Decision table as implemented by SBS. Rows corresponding to “don’t-care” values (X) cannot be encountered, assuming that ${\tau}_{\mathrm{BG}}<{\tau}_{\mathrm{FG}}$.

${B}_{t}(x,y)$ | ${S}_{t}^{\mathrm{BG}}(x,y)\le {\tau}_{\mathrm{BG}}$ | ${S}_{t}^{\mathrm{FG}}(x,y)\ge {\tau}_{\mathrm{FG}}$ | ${D}_{t}(x,y)$ |
---|---|---|---|
BG | false | false | BG |
BG | false | true | FG |
BG | true | false | BG |
BG | true | true | X |
FG | false | false | FG |
FG | false | true | FG |
FG | true | false | BG |
FG | true | true | X |
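
Read row by row, the decision table reduces to a piecewise rule. The expression below is only a restatement of the table, given as a reading aid; labeling the two semantic tests as rule 1 and rule 2 follows the way the rule map $R$ is used in Algorithms 1 and 2.

$$
{D}_{t}(x,y)=
\begin{cases}
\mathrm{BG} & \text{if } {S}_{t}^{\mathrm{BG}}(x,y)\le {\tau}_{\mathrm{BG}} \quad (\text{rule 1}),\\
\mathrm{FG} & \text{if } {S}_{t}^{\mathrm{FG}}(x,y)\ge {\tau}_{\mathrm{FG}} \quad (\text{rule 2}),\\
{B}_{t}(x,y) & \text{otherwise (fallback)}.
\end{cases}
$$

Because the rows marked X cannot occur when ${\tau}_{\mathrm{BG}}<{\tau}_{\mathrm{FG}}$, the first two cases never hold together and their order is immaterial.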

${\Delta}_{D}(\mathrm{SBS})$ | $1.56$ |
---|---|
${\Delta}_{D}(\mathrm{ASBS}:\ \mathrm{frames}\ \mathrm{with}\ \mathrm{semantics})$ | $2.12$ |
${\Delta}_{D}(\mathrm{ASBS}:\ \mathrm{frames}\ \mathrm{without}\ \mathrm{semantics})$ | $0.8$ |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Cioppa, A.; Braham, M.; Van Droogenbroeck, M.
Asynchronous Semantic Background Subtraction. *J. Imaging* **2020**, *6*, 50.
https://doi.org/10.3390/jimaging6060050
