Compressive Online Video Background-Foreground Separation Using Multiple Prior Information and Optical Flow

In the context of video background-foreground separation, we propose a compressive online Robust Principal Component Analysis (RPCA) algorithm with optical flow that recursively separates a sequence of video frames into foreground (sparse) and background (low-rank) components. This method can process each video frame from a small set of measurements, in contrast to conventional batch-based RPCA, which processes the full data. The proposed method also leverages multiple prior information by incorporating previously separated background and foreground frames in an n-ℓ1 minimization problem. Moreover, optical flow is utilized to estimate motion between the previous foreground frames and then compensate it, yielding higher-quality foreground priors that improve the separation. Our method is tested on several video sequences in different scenarios for online background-foreground separation from compressive measurements. The visual and quantitative results show that the proposed method outperforms existing methods.


Introduction
Emerging applications in surveillance and autonomous driving are challenging existing visual systems to detect and understand objects from visual observations. Video background-foreground separation is one of the most important components for object detection, identification, and tracking. In video separation, a video sequence can be separated into a slowly-changing background (modeled by L as a low-rank component) and the foreground (modeled by S, a sparse component). RPCA [1,2] was shown to be a robust method for separating the low-rank and sparse components. RPCA decomposes a data matrix M into the sum of an unknown sparse S and a low-rank L by solving the Principal Component Pursuit (PCP) [1] problem:

min_{L,S} ||L||_* + λ||S||_1  subject to  M = L + S,  (1)

where ||·||_* is the matrix nuclear norm (sum of singular values) and ||·||_1 is the ℓ1-norm. Many applications of RPCA can be found in computer vision, web data analysis, and recommender systems. However, batch RPCA processes all data samples, e.g., all frames in a video, which demands high computational and memory resources.
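As a concrete illustration, a relaxed (Lagrangian) form of the PCP problem (1) can be attacked by alternating the two proximal operators that also reappear later in the online algorithm: singular value thresholding for L and soft thresholding for S. The sketch below is our own minimal illustration under these assumptions; it is not the ALM solver used in [1], and the parameters tau and lam are illustrative.

```python
import numpy as np

def soft(X, tau):
    """Entrywise soft thresholding: proximal operator of tau*||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of tau*||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def rpca_alt(M, lam, tau=1.0, iters=200):
    """Sketch of batch RPCA: alternate exact proximal (block-minimization)
    steps on 0.5*||M - L - S||_F^2 + tau*||L||_* + lam*tau*||S||_1,
    a Lagrangian relaxation of the PCP constraint M = L + S."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        L = svt(M - S, tau)          # low-rank update
        S = soft(M - L, lam * tau)   # sparse update
    return L, S
```

Each block update is an exact minimizer over its variable, so the relaxed objective decreases monotonically; the usual PCP weight λ = 1/sqrt(max(n1, n2)) [1] is a reasonable default for lam.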
Moreover, video separation can be improved by taking into account the correlation between consecutive frames. The correlations can be captured in the form of motions, which manifest as changes in information from one frame to the next. Detecting motion is an integral part of the human visual system. A popular and convenient approach for estimating motion in computer vision is optical flow [3-5] computed by variational methods. Optical flow estimates the motion vectors of all pixels in a given frame arising from the relative motion between frames. In particular, the motion vector at each pixel can be estimated by minimizing a gradient-based matching of the pixel gray values, combined with a smoothness criterion [3]. Thereafter, the computed motion vectors in the horizontal and vertical directions [4] are used to compensate and predict information in the next frame. To produce highly accurate motions and correspondences between frames, large displacement optical flow [6], which combines coarse-to-fine optimization with descriptor matching, can be used to estimate the motions from previously separated frames and subsequently use them to support the separation of the current frame.
In order to deal with video separation in an online manner, we consider an online RPCA algorithm that recursively processes a sequence of frames (a.k.a. the column-vectors of M) per time instance. Additionally, we aim at recovering the foreground and background from a small set of measurements rather than full frame data, leveraging information from a set of previously separated frames. In particular, at time instance t, we wish to separate the frame x_t + v_t into x_t and v_t, where x_t, v_t ∈ R^n are column-vectors of the sparse matrix S_t = [x_1, ..., x_t] and the low-rank matrix L_t = [v_1, ..., v_t], respectively, and [·] denotes a matrix. At time instance t, we have access to compressive measurements of the full frame, i.e., of the vector x_t + v_t; that is, we observe y_t = Φ(x_t + v_t), where Φ ∈ R^{m×n} with m < n. The recovery problem at time instance t is thus written [8] as

min_{x_t, v_t} ||[L_{t-1} v_t]||_* + λ||x_t||_1  subject to  y_t = Φ(x_t + v_t),  (2)

where L_{t-1} and S_{t-1} are given.
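To make the measurement model concrete, the sketch below forms the compressive observation y_t = Φ(x_t + v_t) with a random Gaussian projection; the frame size, the measurement rate m/n, and the synthetic sparse/background vectors are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative sketch of the compressive observation y_t = Phi @ (x_t + v_t).
rng = np.random.default_rng(0)
n = 80 * 60                    # vectorized 80x60 frame
m = int(0.4 * n)               # m < n compressive measurements (rate m/n = 0.4)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian projection

x = np.zeros(n)                                  # sparse foreground vector
x[rng.choice(n, 50, replace=False)] = 1.0
v = rng.standard_normal(n) * 0.1                 # one column of the low-rank background
y = Phi @ (x + v)                                # observed at time instance t
```

The recovery problem then asks for x and v given only y and Phi, plus the priors built from earlier frames.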

Related Work
Many methods [9-15] that advance RPCA [1] have been proposed to solve the separation problem. Incremental PCP [9] processes each column-vector of M at a time, but it needs access to complete data (e.g., full frames) rather than compressive data. A counterpart of batch RPCA that operates on compressive measurements, known as Compressive PCP, can be found in [16]. The studies in [12-15,17] aim at solving the problem of online estimation of low-dimensional subspaces from randomly subsampled data for modeling the background. An algorithm to recover the sparse component x_t in (2) has been proposed in [18]; however, the low-rank component v_t in (2) is not recovered per time instance from a small number of measurements.
The alternative method in [19,20] estimates the number of compressive measurements required to recover the foreground x_t per time instance by assuming that the background v_t does not vary. This assumption is invalid in realistic scenarios due to variations in illumination or dynamic backgrounds. The online method in [17] works on compressive measurements without taking prior information into account.
The problem of separating a sequence of time-varying frames using prior information brings significant improvements in the context of online RPCA [18,21,22]. Several studies on recursive recovery from low-dimensional measurements have been proposed to leverage prior information [18,19,21,23]. The study in [23] provided a comprehensive overview of the domain, reviewing a class of recursive algorithms. A review of the recent problem formulations is given in [24] under a unified view based on Decomposition into Low-rank plus Additive Matrices (DLAM).
The studies in [18,21] used modified-CS [25] to leverage prior knowledge under the condition of slowly varying support and signal values. However, this method, as well as the methods in [12,13,15], does not explore the correlations between the current frame and multiple previously separated frames.
The recent work in [8] leverages correlations across the previously separated foreground frames. However, displacements between the previous foreground frames and the current frame are not taken into account. These displacements can degrade the separation performance.
There are alternative approaches, such as the double-constrained RPCA, namely shape and confidence map based RPCA (SCM-RPCA) [26], which combines a saliency map with RPCA for automated maritime surveillance. Recently, a method that incorporates spatial and temporal sparse subspace clustering into the RPCA framework was developed in [27] and earlier in [28]. A recent study [29] proposes a non-parametric approach for background subtraction.

Contributions
In this paper, we propose a compressive online robust PCA with optical flow (CORPCA-OF) method, which is based on our previous work in [8]. We leverage information from previously separated foreground frames via optical flow [6]. The novelty of CORPCA-OF over CORPCA [8] is that we use optical flow to estimate and compensate the motions between the foreground frames in order to generate new prior foreground frames. These new priors have higher correlation with the current frame and thus improve the separation. We also exploit the slowly-changing characteristics of the background, known as the low-rank component, via an incremental SVD [30] method. The compressive separation problem in (2) is solved in an online manner by minimizing not only an n-ℓ1-norm cost function [31] for the sparse foreground but also the rank of a matrix for the low-rank background. Thereafter, the newly separated foreground and background frames are used to update the prior knowledge for the next processing instance. This yields video separation with higher accuracy and a more efficient implementation. The algorithm has been implemented in C++ using OpenCV and also tested in Matlab. Various functions/methods needed for video separation or general compressed sensing have been developed in C++ and can be used as stand-alone libraries.
The paper is organized as follows. We provide a brief introduction to the CORPCA algorithm [8], on which our proposed CORPCA-OF is built. Based on this, we formulate the problem statement in Sec. 2.1, followed by the proposed CORPCA-OF algorithm in Sec. 2.2.1. The visual and quantitative results obtained by testing our method on real video sequences are presented and discussed in Sec. 3.

Compressive Online Robust PCA Using Multiple Prior Information and Optical Flow
In this section, we first review the CORPCA [8] algorithm for online compressive video separation and state our problem. Thereafter, we present CORPCA-OF, which is summarized in Algorithm 1.

Compressive Online Robust PCA (CORPCA) for video separation
The CORPCA algorithm [8] for video separation is based on RAMSIA [31], which solves an n-ℓ1 minimization problem with adaptive weights to recover a sparse signal x from low-dimensional random measurements y = Φx with the aid of multiple prior information (side information) z_j, j ∈ {0, 1, ..., J}, with z_0 = 0. The objective function of RAMSIA [31] is given by

min_x { (1/2)||y − Φx||_2^2 + λ Σ_{j=0}^{J} β_j ||W_j(x − z_j)||_1 },  (3)

where λ > 0 and β_j > 0 are weights across the prior information, and W_j is a diagonal matrix of weights for the elements of the prior information signal z_j; namely, W_j = diag(w_j1, w_j2, ..., w_jn) with w_ji > 0 being the weight for the i-th element of z_j. The CORPCA algorithm processes one data vector per time instance by leveraging prior information for both its sparse and low-rank components. At time instance t, we observe y_t = Φ(x_t + v_t) with y_t ∈ R^m. Let Z_{t−1} := {z_1, ..., z_J}, a set of z_j ∈ R^n, and B_{t−1} ∈ R^{n×d} denote the prior information for x_t and v_t, respectively. The priors Z_{t−1} and B_{t−1} are formed from the already reconstructed sets of vectors {x̂_1, ..., x̂_{t−1}} and {v̂_1, ..., v̂_{t−1}}.
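The special case J = 1 with identity weights already shows how side information enters the recovery: the ℓ1 penalty is centered at the prior z, so its proximal operator is a soft thresholding around z. The proximal-gradient (ISTA) sketch below is our simplified illustration of this idea under those assumptions; it is not the full RAMSIA algorithm, which adapts the weights W_j and β_j across multiple priors.

```python
import numpy as np

def soft(u, tau):
    """Entrywise soft thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def ista_side_info(y, Phi, z, lam=0.05, iters=5000):
    """Minimize 0.5*||y - Phi x||_2^2 + lam*||x - z||_1 (single prior z,
    identity weights) by proximal gradient descent; the proximal operator
    of ||. - z||_1 is a soft thresholding centered at z."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L with L = ||Phi||_2^2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        grad = Phi.T @ (Phi @ x - y)           # gradient of the data term
        u = x - step * grad
        x = z + soft(u - z, step * lam)        # prox centered at the prior
    return x
```

A good prior z shifts the ℓ1 "anchor" toward the true signal, so far fewer measurements suffice than for plain ℓ1 recovery.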
The objective function of CORPCA solves Problem (2) and can be formulated as

min_{x_t, v_t} { (1/2)||Φ(x_t + v_t) − y_t||_2^2 + λμ Σ_{j=0}^{J} β_j ||W_j(x_t − z_j)||_1 + μ||[B_{t−1} v_t]||_* },  (4)

where μ > 0. It can be seen that if v_t is static (not changing), Problem (4) reduces to Problem (3). Furthermore, when x_t and v_t are batch variables and we do not take the prior information Z_{t−1} and B_{t−1} and the projection matrix Φ into account, Problem (4) reduces to Problem (1). The CORPCA algorithm solves Problem (4) given that Z_{t−1} and B_{t−1} are known (they are obtained from the previous time instance or recursion). Thereafter, we update Z_t and B_t, which are used in the subsequent time instance. Denoting the three terms of (4) by f(x_t, v_t) (a function of both foreground and background), g(x_t) (a function of the sparse foreground), and h(v_t) (a function of the low-rank background), the components x_t^{(k+1)} and v_t^{(k+1)} are solved iteratively at iteration k+1 via the soft thresholding operator [32] for x_t and the singular value thresholding operator [33] for v_t.

Video Foreground and Background Separation using CORPCA-OF

Problem statement. Using the prior information in CORPCA [8] provides a significant improvement of the current frame separation. However, there can be displacements between consecutive frames that deteriorate the separation performance. Fig. 1 illustrates an example of three previous foreground frames, x_{t−3}, x_{t−2}, and x_{t−1}. These frames can be used directly as prior information to recover the foreground x_t and background v_t due to the temporal correlations between x_t and x_{t−3}, x_{t−2}, x_{t−1}, as in CORPCA. In the last row of the prior foreground frames in Fig. 1, it can be seen that motions exist between the frames. Using optical flow [6], we can estimate the motions between the previous foreground frames, as in Fig. 1, visualized with color codes based on the magnitude and direction of the motion vectors [6]. These motions can be compensated to generate better-quality prior frames (compare the compensated frames x'_{t−2}, x'_{t−3} with x_{t−2}, x_{t−3}), which are better correlated with x_t. In this work, we propose CORPCA with Optical Flow (CORPCA-OF), whose workflow is shown in Fig. 1; optical flow [6] is used to improve the prior foreground frames.

Compressive Separation Model with CORPCA-OF
A compressive separation model using the CORPCA-OF method is shown in Fig. 2. At a time instance t, the inputs consist of the compressive measurements y_t = Φ(x_t + v_t) and the prior information from time instance t − 1, Z_{t−1} (foreground) and B_{t−1} (background). The model outputs the foreground and background components x̂_t and v̂_t by solving the CORPCA minimization problem in (4). Finally, the outputs x̂_t and v̂_t are used to generate better foreground priors via prior generation using optical flow and to update Z_{t−1} and B_{t−1} for the next instance via a prior update. The novel block of CORPCA-OF compared with CORPCA [8] is the prior generation using optical flow, where the foreground priors are improved by exploiting the correlation between frames using large displacement optical flow [6]. The method is further described in Algorithm 1.
Algorithm 1 – Input: y_t, Z_{t−1}, B_{t−1}; Output: x̂_t, v̂_t, Z_t, B_t. // Initialize variables and parameters.

Prior Generation using Optical Flow
The main idea of CORPCA-OF is to improve the foreground prior frames using the correlation between frames, which is done by estimating motion between frames via optical flow. In Algorithm 1, the prior frames are initialized with x_{t−1}, x_{t−2}, and x_{t−3}. Optical flow is used to compute the motions between frames x_{t−1} and x_{t−2} (also x_{t−1} and x_{t−3}) to obtain the flow vectors for these two frame pairs. This can be seen in Fig. 1 from the color-coded representation of the optical flow fields [6]. The function f_ME(·) in Lines 2 and 3 of Algorithm 1 computes the motions between the prior foreground frames. It computes the optical flow vectors consisting of horizontal (x) and vertical (y) components, denoted by v_1x, v_2x and v_1y, v_2y ∈ R^n, respectively. The estimated motions in the form of optical flow vectors, (v_1x, v_1y) and (v_2x, v_2y), are then used to predict the following frames by motion compensation. Considering a point (pixel) i in the given frame, the horizontal and vertical components v_1xi and v_1yi of the flow vectors v_1x and v_1y are obtained, as mentioned in [34], by solving

I_1x · v_1xi + I_1y · v_1yi + I_1t = 0,

where I_1x = ∂I_1/∂x and I_1y = ∂I_1/∂y are the intensity changes in the horizontal (x) and vertical (y) directions, respectively, constituting the spatial gradients of the intensity level I_1, and I_1t = ∂I_1/∂t is the time gradient, which measures the temporal change of the intensity level at point i. There are various methods [3-6] to determine v_1xi and v_1yi. Our solution is based on large displacement optical flow [6], which combines global and local approaches to estimate all kinds of motion. It minimizes the matching error via descriptor matching and a continuation method, utilizing feature matching along with conventional optical flow estimation to obtain the flow field. We combine the optical flow components of each point i in the image into the two vectors (v_1x, v_1y), i.e., the horizontal and the vertical components of the optical flow field. Similarly, we obtain (v_2x, v_2y).
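The per-pixel constraint above is underdetermined (one equation, two unknowns). A classic way to see how it becomes solvable is the Lucas–Kanade assumption of locally constant flow, which stacks the constraint over a patch and solves it in the least-squares sense. The sketch below is that textbook variant, shown only to illustrate the constraint; it is not the large displacement optical flow of [6] used in CORPCA-OF.

```python
import numpy as np

def lucas_kanade_patch(I1, I2):
    """Estimate one (vx, vy) for a patch from the optical-flow constraint
    Ix*vx + Iy*vy + It = 0, solved in the least-squares sense over all
    pixels of the patch (Lucas-Kanade: locally constant flow).
    An illustrative stand-in for the motion estimation f_ME."""
    Ix = np.gradient(I1, axis=1)       # spatial gradient, horizontal (x)
    Iy = np.gradient(I1, axis=0)       # spatial gradient, vertical (y)
    It = I2 - I1                       # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (vx, vy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return vx, vy
```

For displacements larger than a pixel or two the linearization breaks down, which is exactly why [6] adds coarse-to-fine optimization and descriptor matching.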
The estimated motions in the form of optical flow vectors are used along with the frame x_{t−1} to produce new prior frames that form the updated prior information. Linear interpolation is used to generate the new frames via column and row interpolation. This is represented as f_MC(·) in Lines 5 and 6 of Algorithm 1. The flow fields (v_1x, v_1y) and (1/2·v_2x, 1/2·v_2y) are used to predict the motions in the next frame and compensate them on the frame x_{t−1} to obtain x'_{t−2} and x'_{t−3}, respectively. It should be noted that x'_{t−3} is obtained by compensating for half the motion, i.e., (1/2·v_2x, 1/2·v_2y), between x_{t−1} and x_{t−3}. These improved frames x'_{t−2}, x'_{t−3} are more correlated with the current frame x_t than x_{t−2}, x_{t−3}, i.e., the frames without motion estimation and compensation. We also keep the most recent frame x'_{t−1} = x_{t−1} (Line 4) as one of the prior frames.
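A minimal sketch of the motion-compensation step f_MC(·): given a flow field, the new prior frame is produced by sampling the reference frame at the displaced positions with bilinear (linear row/column) interpolation. The backward-warping convention and the helper name are our illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def warp(frame, vx, vy):
    """Motion-compensate a frame: sample it at positions displaced by the
    flow (vx, vy) using bilinear interpolation, a minimal stand-in for
    f_MC in Algorithm 1. vx/vy may be scalars or per-pixel arrays."""
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # backward warping: output pixel (y, x) is taken from (y - vy, x - vx)
    sx = np.clip(xs - vx, 0, w - 1)
    sy = np.clip(ys - vy, 0, h - 1)
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0; wy = sy - y0
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bot = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bot
```

Scaling a flow field by 1/2 before warping, as done for x'_{t−3} above, simply compensates half of the estimated displacement.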
Thereafter, v_t^{(k+1)} and x_t^{(k+1)} are iteratively computed as in Lines 14-15 of Algorithm 1. It can be noted that the proximal operator Γ_{τg_1}(·) in Line 13 of Algorithm 1 is defined [8] as

Γ_{τg_1}(X) = sign(X) · max(|X| − τ, 0),

where g_1(·) = ||·||_1 (the ℓ1-norm) and the operations are applied element-wise. The weights W_j and β_j are updated per iteration of the algorithm [31]. As suggested in [2], the convergence of Algorithm 1 in Line 8 is determined by evaluating the change ||x_t^{(k+1)} − x_t^{(k)}||_2^2 + ||v_t^{(k+1)} − v_t^{(k)}||_2^2 between successive iterates. In the next step, we perform an update of the priors Z_t and B_t.

Prior Update
The update of Z_t and B_t [8] is carried out after each time instance (see Lines 21-22, Algorithm 1). Due to the correlation between subsequent frames, we update the prior information Z_t by using the J latest recovered sparse components, which is given by Z_t := {z_j = x̂_{t−J+j}}_{j=1}^{J}. For B_t ∈ R^{n×d}, we consider an adaptive update that operates on a fixed number d of columns of B_t. To this end, the incremental singular value decomposition [30] method (incSVD(·) in Line 12, Algorithm 1) is used. It is worth noting that the update [B_{t−1} v̂_t] causes the dimension of B_t to grow to R^{n×(d+1)} after each instance; in order to maintain a constant number d of columns, we retain only the d most significant singular vectors and values after the update. The benefit of incSVD(·) over a conventional SVD is a lower computational cost [9,30], since we only compute the full SVD of a middle matrix of size (d+1)×(d+1), where d ≪ n, instead of n×(d+1). The computation of incSVD([B_{t−1} v̂_t]) = U_t Σ_t V_t^T proceeds as follows. Writing v̂_t = U_{t−1}e_t + δ_t, with e_t = U_{t−1}^T v̂_t and δ_t = v̂_t − U_{t−1}e_t, we have

[B_{t−1} v̂_t] = [U_{t−1}  δ_t/||δ_t||_2] · [ Σ_{t−1}  e_t ; 0  ||δ_t||_2 ] · [ V_{t−1}  0 ; 0  1 ]^T.  (9)

By computing the SVD of the middle matrix in (9), Ũ Σ̃ Ṽ^T, we obtain U_t = [U_{t−1}  δ_t/||δ_t||_2] · Ũ, Σ_t = Σ̃, and V_t = [ V_{t−1}  0 ; 0  1 ] · Ṽ.
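The update can be sketched directly from (9): project v̂_t onto the current subspace, form the small (d+1)×(d+1) middle matrix, and take its full SVD. The sketch below follows that factorization under the assumption δ_t ≠ 0; the function and variable names are our own.

```python
import numpy as np

def inc_svd(U, S, Vt, v):
    """Incremental SVD step: given B = U @ np.diag(S) @ Vt with U of size
    n x d, return SVD factors of [B v] by taking the full SVD of a small
    (d+1) x (d+1) middle matrix only (assumes v is not fully contained
    in the span of U, i.e. the residual delta is nonzero)."""
    d = len(S)
    e = U.T @ v                      # e_t: projection onto current subspace
    delta = v - U @ e                # delta_t: out-of-subspace residual
    dn = np.linalg.norm(delta)
    core = np.zeros((d + 1, d + 1))  # the (d+1) x (d+1) middle matrix of (9)
    core[:d, :d] = np.diag(S)
    core[:d, d] = e
    core[d, d] = dn
    Uc, Sc, Vct = np.linalg.svd(core)
    Ut = np.column_stack([U, delta / dn]) @ Uc   # updated left vectors
    Vt_ext = np.zeros((d + 1, d + 1))            # [[V, 0], [0, 1]]^T
    Vt_ext[:d, :d] = Vt
    Vt_ext[d, d] = 1.0
    return Ut, Sc, Vct @ Vt_ext
```

Truncating the returned factors back to d columns then keeps B_t at a fixed width, as described above.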

Experimental Results
In this section we present and discuss the experimental results obtained by applying our method CORPCA-OF on real video data.We also evaluate the performance of our algorithm and compare the results with other existing methods.
Experimental setup. The experiments were carried out on two computers. The Matlab implementation was tested on a desktop PC with an Intel i5 3.5 GHz CPU (4 cores) and 12 GB RAM. The C++ implementation was tested on a computer with an Intel i7-4510U 2.0 GHz CPU (2 cores) and 8 GB RAM.
For the experimental comparison with the various existing methods, mainly two sequences [35], Bootstrap (80×60) and Curtain (80×64), were used. The Bootstrap sequence consists of 3055 frames and has a static background and a complex foreground. The Curtain sequence contains 2964 frames with a dynamic background and simple foreground motion. For separating each of these sequences, 100 frames are randomly selected and used for the initialization of the prior information. The prior information is later updated by selecting the three most recent frames, as described in Sec. 2.2.3.
We evaluate the performance of the proposed CORPCA-OF in Algorithm 1 and compare it against the existing methods, RPCA [1], GRASTA [12], and ReProCS [18,23].RPCA [1] is a batch-based method assuming full access to the data, while GRASTA [12] and ReProCS [18] are online methods that can recover either the (low-rank) background component (GRASTA) or the (sparse) foreground component (ReProCS) from compressive measurements.

Prior Information Evaluation
We evaluate the prior information of CORPCA-OF against that of CORPCA [8], which uses the previously separated foreground frames directly. For CORPCA-OF, we generate the prior information by estimating and compensating motions among the previous foreground frames. Fig. 3 shows a few examples of the prior information generated for the sequences Bootstrap and Curtain. In Fig. 3(a), it can be observed that the frames #2210', #2211', and #2212' (of CORPCA-OF) are better than the corresponding frames #2210, #2211, and #2212 (of CORPCA) for the current frame #2213; similar observations hold for Figs. 3(b), 3(c), and 3(d). Especially in Fig. 3(c), the generated frames #448' and #449' are significantly improved due to dense motion compensation. In Fig. 3(d), it is clear that the movements of the person are well compensated in #2771' and #2772' by CORPCA-OF compared with #2771 and #2772 of CORPCA, respectively, leading to better correlations with the foreground of the current frame #2774.

Compressive Video Foreground and Background Separation
We assess our CORPCA-OF method in the application of compressive video separation and compare it against the existing methods CORPCA [8], RPCA [1], GRASTA [12], and ReProCS [18]. We run all methods on the test video sequences. In this experiment, we use d = 100 frames as training vectors for the proposed CORPCA-OF and CORPCA [8], as well as for GRASTA [12] and ReProCS [18]. The three latest previous foregrounds are used as the foreground prior for CORPCA; CORPCA-OF refines them using optical flow [6]. Fig. 7 illustrates the ROC results when assuming full data access, i.e., m/n = 1, for CORPCA-OF, CORPCA, RPCA, GRASTA, and ReProCS. The results show that CORPCA-OF delivers higher performance than the other methods. Furthermore, we compare the foreground recovery performance of CORPCA-OF against CORPCA and ReProCS for different compressive measurement rates, m/n = {0.8; 0.6; 0.4; 0.2}. The ROC results in Figs. 8 and 9 show that CORPCA-OF achieves higher performance than ReProCS and CORPCA. In particular, with a small number of measurements, CORPCA-OF produces better curves than CORPCA. This is evident for Bootstrap at m/n = {0.2; 0.4; 0.6} [see Fig. 8(a)]. For the Curtain sequence, which has a dynamic background and a less complex foreground, the results at m/n = {0.2; 0.4} [see Fig. 9(a)] are clearly better. The ROC results for ReProCS degrade quickly even at a high compressive measurement rate of m/n = 0.8 [see Fig. 9(c)].
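The ROC curves above are obtained by comparing thresholded foreground masks with ground truth. A sketch of how such (false-positive rate, true-positive rate) points can be computed from a recovered sparse component; the threshold sweep over the foreground magnitude is our illustrative assumption about the evaluation.

```python
import numpy as np

def roc_points(x_rec, truth, thresholds):
    """Compute (FPR, TPR) pairs by thresholding the magnitude of a
    recovered sparse component x_rec against a binary ground-truth
    foreground mask."""
    pts = []
    truth = truth.astype(bool)
    for t in thresholds:
        detected = np.abs(x_rec) > t
        tp = np.sum(detected & truth)       # correctly detected foreground
        fp = np.sum(detected & ~truth)      # background flagged as foreground
        tpr = tp / max(truth.sum(), 1)
        fpr = fp / max((~truth).sum(), 1)
        pts.append((fpr, tpr))
    return pts
```

Sweeping many thresholds traces the full curve; a method is better when its curve lies closer to the top-left corner (high TPR at low FPR).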

Escalator and Fountain sequences
The CORPCA-OF method was also compared with CORPCA on the Escalator and Fountain sequences for compressive measurements. From Figs. 10(a) and 10(b), it is clear that CORPCA-OF performs slightly better than CORPCA. In Figs. 11(a) and 11(b), we can see that for the Fountain sequence, which is similar to the Curtain sequence in terms of the complexity of foreground motion, the results are better for CORPCA-OF than for CORPCA at rate m/n = 0.2 and almost the same at higher rates.

Figure 4. Foreground and background separation for the different separation methods with full data access: Bootstrap #2213 and Curtain #2866.

Figure 7. ROC for the different separation methods with full data.