2.1. Notation and Pixel Model
Throughout this paper, lowercase letters represent scalar variables, bold lowercase letters represent vectors, capital letters represent matrices, and Greek letters are used for coefficients.
As shown in Figure 2, the video sequences are organized in the form of three-dimensional arrays. Each element of the array represents the IR intensity value associated with the corresponding pixel. Since the IR images are monochromatic, each element carries just one value, instead of the RGB triad of color videos.
Referring to [3], we model the signal $s(i,j,t)$ carried by a single pixel of spatial coordinates $i$ and $j$ at the quantized time instant $t$ as:

$$s(i,j,t) = b(i,j,t) + f(i,j,t), \qquad i = 1,\dots,H,\quad j = 1,\dots,W,\quad t = 1,\dots,T \tag{1}$$

where $b(i,j,t)$ is the background signal; $f(i,j,t)$ is the target signal; $H$ and $W$ are the height and the width of each frame, respectively; $T$ is the number of collected frames. We also introduce the matrices $S = [\mathbf{s}_1, \dots, \mathbf{s}_T] \in \mathbb{R}^{P \times T}$, $B = [\mathbf{b}_1, \dots, \mathbf{b}_T] \in \mathbb{R}^{P \times T}$, and $F = [\mathbf{f}_1, \dots, \mathbf{f}_T] \in \mathbb{R}^{P \times T}$, in which the columns $\mathbf{s}_t$, $\mathbf{b}_t$, and $\mathbf{f}_t$ denote the $t$-th frame, the corresponding background, and the target, respectively, reorganized in lexicographic order, while $P = H \cdot W$ is the number of pixels. Given such a model, the objective of target detection is to separate the target signal $F$ from the background $B$. In the literature, such a task is commonly referred to as background subtraction.
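The lexicographic reorganization described above is a simple reshape. The following sketch (a toy example; the array layout, sizes, and variable names are our own) flattens a video cube of height $H$, width $W$, and $T$ frames into the $P \times T$ matrix:

```python
import numpy as np

def to_lexicographic(video):
    """Reshape an H x W x T video cube into a P x T matrix S, where each
    column is one frame in lexicographic (row-major) order."""
    H, W, T = video.shape
    return video.reshape(H * W, T)  # pixel (i, j) of frame t -> row i*W + j

# Tiny synthetic example: constant IR background plus a one-pixel "target".
H, W, T = 4, 5, 3
video = np.full((H, W, T), 10.0)   # flat background
video[2, 3, 1] += 5.0              # target appears at pixel (2, 3), frame 1

S = to_lexicographic(video)        # S has shape (P, T) = (20, 3)
```

Each column of `S` is one frame, so the target pixel lands in row `2 * W + 3` of column 1.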
  2.2. RPCA for Background Subtraction
RPCA is a well-known technique that improves PCA [43] by making it robust against outliers. In fact, while PCA can effectively purge the input matrix of additive white Gaussian noise, it fails in the presence of outliers. In the case of MTD, according to the previously introduced model, the input matrix $S$, which is representative of the input video, can be seen as the sum of a background matrix, represented by $B$, and an outlier matrix $F$, which represents the target. The idea behind using RPCA is that $B$ is low rank, while $F$ is sparse. Mathematically, the problem can be formulated as that of finding the $B$ and $F$ that satisfy Equation (2):

$$\min_{B,F}\; \operatorname{rank}(B) + \lambda_1 \|F\|_0 \quad \text{s.t.} \quad S = B + F \tag{2}$$
where $\|\cdot\|_0$ denotes the $\ell_0$-pseudo-norm, which counts the total number of non-zero elements in the matrix $F$, while $\lambda_1$ is a regularization parameter. Since both $\operatorname{rank}(\cdot)$ and $\|\cdot\|_0$ are non-convex, the problem is not tractable as it is. For this reason, a convex relaxation makes it possible to find the optimal $B$ and $F$ with high probability. Such relaxation is given in Equation (3):

$$\min_{B,F}\; \|B\|_* + \lambda_1 \|F\|_1 \quad \text{s.t.} \quad S = B + F \tag{3}$$
which is further relaxed in:

$$\min_{B,F}\; \|B\|_* + \lambda_1 \|F\|_1 + \frac{\lambda_2}{2}\,\|S - B - F\|_F^2 \tag{4}$$

where $\|B\|_*$ is the nuclear norm of $B$ (i.e., the sum of its singular values), which is a convex envelope of the function $\operatorname{rank}(B)$; $\|F\|_1$ is the $\ell_1$-norm of $F$, which is a convex approximation of the $\ell_0$-pseudo-norm that promotes sparsity; $\lambda_2$ is another regularization parameter which, along with $\lambda_1$, controls the balance of the three terms. The convex problem in Equation (3) is known as principal component pursuit (PCP); it converges to the problem in Equation (2) and can be solved using an augmented Lagrange multiplier (ALM) algorithm [25,44]. The implementation is reported in Algorithm 1.
| Algorithm 1: RPCA by ALM | 
| 1 | Input: $S$ (observed data); $\lambda_1$, $\mu$ (regularization parameters) | 
| 2 | Initialize: $F_0 = 0$, $Y_0 = 0$ | 
| 3 | while not converged do | 
| 4 | (1) $B_{k+1} = \mathcal{D}_{\mu^{-1}}\!\left[S - F_k + \mu^{-1}Y_k\right]$ | 
| 5 | (2) $F_{k+1} = \mathcal{S}_{\lambda_1\mu^{-1}}\!\left[S - B_{k+1} + \mu^{-1}Y_k\right]$ | 
| 6 | (3) $Y_{k+1} = Y_k + \mu\,(S - B_{k+1} - F_{k+1})$ | 
| 7 | return: $B$, $F$ | 
In Algorithm 1:

- $\mathcal{S}_{\tau}[X] = \operatorname{sgn}(X) \circ \max(|X| - \tau,\, 0)$ denotes the shrinkage operator applied to the matrix $X$, which is the proximal operator for the $\ell_1$-norm minimization problem [45]; 
- $\mathcal{D}_{\tau}[X] = U\,\mathcal{S}_{\tau}[\Sigma]\,V^T$ denotes the singular value thresholding operator applied to the matrix $X$, whose singular value decomposition (SVD) is $X = U\Sigma V^T$, which is the proximal operator for the nuclear-norm minimization problem [45]. 
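Putting the two operators and the three ALM steps together, Algorithm 1 can be sketched in NumPy as follows. The $\lambda$ heuristic, the $\mu$ initialization and its increasing schedule, and the stopping rule are common implementation choices of ours, not prescriptions of the algorithm table:

```python
import numpy as np

def shrink(X, tau):
    """Shrinkage (soft-thresholding): proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(sig, tau)) @ Vt

def rpca_alm(S, lam=None, mu=None, rho=1.2, n_iter=500, tol=1e-7):
    """Decompose S into a low-rank B plus a sparse F by inexact ALM."""
    P, T = S.shape
    lam = 1.0 / np.sqrt(max(P, T)) if lam is None else lam
    mu = 1.25 / np.linalg.norm(S, 2) if mu is None else mu
    F = np.zeros_like(S)
    Y = np.zeros_like(S)
    norm_S = np.linalg.norm(S)
    for _ in range(n_iter):
        B = svt(S - F + Y / mu, 1.0 / mu)        # step (1)
        F = shrink(S - B + Y / mu, lam / mu)     # step (2)
        Y = Y + mu * (S - B - F)                 # step (3)
        mu = min(mu * rho, 1e7)                  # gradually tighten the penalty
        if np.linalg.norm(S - B - F) <= tol * norm_S:
            break
    return B, F

# Synthetic check: a rank-1 "background" plus two sparse "target" pixels.
rng = np.random.default_rng(0)
S0 = rng.normal(size=(30, 1)) @ rng.normal(size=(1, 20))
S0[5, 5] += 10.0
S0[10, 2] -= 8.0
B_hat, F_hat = rpca_alm(S0)
```

On this easy synthetic instance the sparse component absorbs the two outlier pixels while the low-rank component recovers the rank-1 part.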
RPCA is usually implemented in a batch form. In this implementation, the video is divided into batches of fixed length of $n_{\mathrm{win}}$ frames, and RPCA is applied to each batch. The length of the batches has to be chosen taking into consideration the minimum speed of the targets we are interested in, as well as the stationarity of the background. This method is non-causal and, therefore, does not meet real-time requirements: we would need to wait for the collection of the entire batch before obtaining background and target estimates. A possible solution is to apply a sliding window to the input video, resulting in a moving window RPCA (MW-RPCA) [40] which, for each newly collected frame, computes the batch RPCA on the last $n_{\mathrm{win}}$ frames to provide the background/foreground separation of the last frame. In the analysis of video sequences, this implementation usually carries quite a large computational burden.
  2.3. Online Moving Window RPCA
In the literature, there are a few proposals of online RPCA implementations [38,39,40]. For this study, we referred to the online moving window RPCA (OMW-RPCA) proposed by Xiao et al. [40], which is an improvement of online robust PCA via stochastic optimization (RPCA-STOC) proposed by Feng et al. [39]. We hereinafter summarize the ideas behind OMW-RPCA, which, by relaxing (3), solves the following problem:

$$\min_{B,F}\; \frac{1}{2}\|S - B - F\|_F^2 + \bar\lambda_1\|B\|_* + \bar\lambda_2\|F\|_1 \tag{5}$$

where $\bar\lambda_1$ and $\bar\lambda_2$ are regularization parameters. It is worth noting that, even though, by dividing the three terms in (5) by $\bar\lambda_1$, we could recast it in a form similar to that of Equation (4), which refers to the batch implementation, the online implementation requires a different proportioning of the regularization parameters. For this reason, and in order to comply with the notation used in the reference paper, we decided to keep the notations distinct. Therefore, hereinafter, $\lambda_1$ and $\lambda_2$ will refer to batch RPCA, while $\bar\lambda_1$ and $\bar\lambda_2$ will refer to the online implementation.
According to [39], the nuclear norm of $B$ respects the relation in Equation (6), which means that, given two matrices $L \in \mathbb{R}^{P \times r}$ and $R \in \mathbb{R}^{T \times r}$ such that $B = LR^T$, with $r \geq \operatorname{rank}(B)$, the nuclear norm of $B$ is never higher than $\frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right)$:

$$\|B\|_* = \inf_{L,R \,:\, LR^T = B}\; \frac{1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) \tag{6}$$

This means that solving the minimization problem in Equation (7), obtained by plugging (6) into (5), also solves the minimization problem in Equation (5):

$$\min_{L \in \mathbb{R}^{P \times r},\, R \in \mathbb{R}^{T \times r},\, F}\; \frac{1}{2}\|S - LR^T - F\|_F^2 + \frac{\bar\lambda_1}{2}\left(\|L\|_F^2 + \|R\|_F^2\right) + \bar\lambda_2\|F\|_1 \tag{7}$$
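The Frobenius bound on the nuclear norm is easy to verify numerically. The toy example below (entirely our own) checks that any factorization $B = LR^T$ upper-bounds $\|B\|_*$ and that the balanced SVD factorization attains the infimum with equality:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(8, 6))

# Nuclear norm of B: the sum of its singular values.
U, sig, Vt = np.linalg.svd(B, full_matrices=False)
nuclear = sig.sum()

# An arbitrary factorization B = L @ R.T, built from the SVD and a random
# well-conditioned mixing matrix M: the bound holds, but is not tight.
M = rng.normal(size=(6, 6)) + 3.0 * np.eye(6)
L = U @ np.diag(sig) @ M
R = (np.linalg.inv(M) @ Vt).T
bound = 0.5 * (np.linalg.norm(L, 'fro') ** 2 + np.linalg.norm(R, 'fro') ** 2)

# The balanced factorization L* = U Sigma^{1/2}, R* = V Sigma^{1/2}
# attains the infimum: the bound equals the nuclear norm exactly.
L_star = U @ np.diag(np.sqrt(sig))
R_star = (np.diag(np.sqrt(sig)) @ Vt).T
tight = 0.5 * (np.linalg.norm(L_star, 'fro') ** 2
               + np.linalg.norm(R_star, 'fro') ** 2)
```

This balanced SVD factorization is exactly the one used to initialize the online algorithm after the burn-in phase.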
The above-depicted nuclear norm factorization is a well-established solution for online optimization problems [39,40,46,47] and is particularly elegant, since $L$ can be seen as a basis for the low-rank subspace, in which case $R$ represents the coefficients of the observations with respect to the basis $L$. Given the input matrix $S$, solving Equation (7) minimizes the following so-called “empirical cost function”:

$$f_T(L) = \frac{1}{T}\sum_{t=1}^{T}\ell(\mathbf{s}_t, L) + \frac{\bar\lambda_1}{2T}\|L\|_F^2 \tag{8}$$

where $\ell(\mathbf{s}_t, L)$ is the empirical loss function for each frame, which is defined as:

$$\ell(\mathbf{s}_t, L) = \min_{\mathbf{r}, \mathbf{f}}\; \frac{1}{2}\|\mathbf{s}_t - L\mathbf{r} - \mathbf{f}\|_2^2 + \frac{\bar\lambda_1}{2}\|\mathbf{r}\|_2^2 + \bar\lambda_2\|\mathbf{f}\|_1 \tag{9}$$
The vectors $\mathbf{r}_t$ and $\mathbf{f}_t$ and the matrix $L$ are updated in two steps. First, Equation (9) is solved in $\mathbf{r}$ and $\mathbf{f}$, to find $\mathbf{r}_t$ and $\mathbf{f}_t$; then, $L$ is updated by minimizing the following function:

$$g_t(L) = \frac{1}{t}\sum_{i=1}^{t}\left(\frac{1}{2}\|\mathbf{s}_i - L\mathbf{r}_i - \mathbf{f}_i\|_2^2 + \frac{\bar\lambda_1}{2}\|\mathbf{r}_i\|_2^2 + \bar\lambda_2\|\mathbf{f}_i\|_1\right) + \frac{\bar\lambda_1}{2t}\|L\|_F^2 \tag{10}$$

whose minimum can be found in closed form:

$$L_t = \left(\sum_{i=1}^{t}(\mathbf{s}_i - \mathbf{f}_i)\,\mathbf{r}_i^T\right)\left(\sum_{i=1}^{t}\mathbf{r}_i\mathbf{r}_i^T + \bar\lambda_1 I\right)^{-1} \tag{11}$$

which means that $L$ can be updated by block-coordinate descent with warm restart.
The advantage of the online implementation with respect to MW-RPCA lies in the fact that, for each new frame, only Equation (9) must be minimized with respect to two vectors, which requires remarkably less time than the minimization of Equation (4) with respect to two matrices. In addition, the update of $L$ is in closed form and does not have to be accomplished iteratively, therefore adding a very small computational load.
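For each new frame, the two steps just described reduce to a small alternating solver for the per-frame loss, followed by a closed-form refresh of the basis. The sketch below illustrates the mechanics on a toy stream; variable names, parameter values, and the stream itself are ours, and the moving-window subtraction and the block-coordinate warm restart are omitted for brevity:

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding: proximal operator of the l1-norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def solve_frame(L, s, lam1, lam2, n_iter=50):
    """Alternating minimization of the per-frame loss: given the current
    basis L, find coefficients r and sparse foreground f for frame s."""
    k = L.shape[1]
    G = np.linalg.inv(L.T @ L + lam1 * np.eye(k)) @ L.T  # ridge projector
    f = np.zeros_like(s)
    for _ in range(n_iter):
        r = G @ (s - f)                 # l2-regularized least squares in r
        f = shrink(s - L @ r, lam2)     # proximal step in f
    return r, f

def update_basis(A, C, s, r, f, lam1):
    """Accumulate sufficient statistics and refresh the basis in closed
    form; a moving window would also subtract the oldest frame's terms."""
    A = A + np.outer(r, r)
    C = C + np.outer(s - f, r)
    L = C @ np.linalg.inv(A + lam1 * np.eye(A.shape[0]))
    return L, A, C

# Toy stream: a fixed rank-1 background direction u with varying intensity;
# a one-pixel target appears in the last frame.
rng = np.random.default_rng(2)
P = 40
u = rng.normal(size=(P, 1))
L = u.copy()                            # pretend burn-in found the subspace
A = np.zeros((1, 1))
C = np.zeros((P, 1))
for t in range(30):
    s = u[:, 0] * rng.uniform(1.0, 2.0)
    if t == 29:
        s[3] += 5.0                     # target pixel in the last frame
    r, f = solve_frame(L, s, lam1=0.1, lam2=0.5)
    L, A, C = update_basis(A, C, s, r, f, lam1=0.1)
```

The foreground vector of the last frame isolates the target pixel, while the basis stays aligned with the true background direction.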
The implementation of OMW-RPCA, unfortunately, needs an initialization which provides both the estimated rank $r$ of the matrix $B$ and the initial basis $L$. Such initialization, which is called the “burn-in” phase, is accomplished by applying batch RPCA on the first $n_{\mathrm{win}}$ frames of the sequence, where $n_{\mathrm{win}}$ is a user-specified window size that must be higher than the expected rank of the matrix $B$. Although we suggest reading [40] for more details, we report in Algorithm 2 the steps of OMW-RPCA.
| Algorithm 2: Online Moving Window RPCA | 
| 1 | Input: $\{\mathbf{s}_t\}_{t=1}^{T}$ (observed data, revealed sequentially); $\lambda_1$, $\lambda_2$ (burn-in regularization parameters); $\bar\lambda_1$, $\bar\lambda_2$ (online regularization parameters); $n_{\mathrm{win}}$ (burn-in samples) | 
| 2 | Initialize: compute batch RPCA on the burn-in samples to get $r$, $B_{\mathrm{burn}}$, and $F_{\mathrm{burn}}$; compute the SVD on $B_{\mathrm{burn}} = U\Sigma V^T$ to get $L = U\Sigma^{1/2}$ and $R = V\Sigma^{1/2}$; $A = 0$, $C = 0$ (auxiliary matrices) | 
| 3 | for $i = 1$ to $n_{\mathrm{win}}$ do | 
| 4 | $A \leftarrow A + \mathbf{r}_i\mathbf{r}_i^T$, $\;C \leftarrow C + (\mathbf{s}_i - \mathbf{f}_i)\,\mathbf{r}_i^T$ | 
| 5 | for $t = n_{\mathrm{win}}+1$ to $T$ do | 
| 6 | (4) Reveal the sample $\mathbf{s}_t$ | 
| 7 | (5) Project the new sample: solve Equation (9) to get $\mathbf{r}_t$ and $\mathbf{f}_t$ | 
| 8 | (6) $A \leftarrow A + \mathbf{r}_t\mathbf{r}_t^T - \mathbf{r}_{t-n_{\mathrm{win}}}\mathbf{r}_{t-n_{\mathrm{win}}}^T$, $\;C \leftarrow C + (\mathbf{s}_t - \mathbf{f}_t)\,\mathbf{r}_t^T - (\mathbf{s}_{t-n_{\mathrm{win}}} - \mathbf{f}_{t-n_{\mathrm{win}}})\,\mathbf{r}_{t-n_{\mathrm{win}}}^T$ | 
| 9 | (7) Compute $L_t$ from Equation (11) by block-coordinate descent, with $L_{t-1}$ as warm restart | 
| 10 | return: $\mathbf{b}_t = L_t\mathbf{r}_t$ and $\mathbf{f}_t$ for each frame | 
Although OMW-RPCA solves the causality problem, the result is highly affected by the burn-in phase. In fact, if, on the one hand, no target is present in the burn-in sequence, the successive iterations effectively isolate any target that later enters the scene. On the other hand, if a target is present in the burn-in sequence, the successive iterations keep on considering its initial presence as a part of the background. The result is that the estimated foreground and background contain a ghost of the target in the position it occupied during the burn-in phase. This problem is a sensitive issue since, in an operative context, we do not have any control over the scene during the initialization of the surveillance system. 
Figure 3 shows the effect of the burn-in ghosting in a sequence in which the target was present at the beginning of the recording. The upper row shows one of the first frames of the video sequence, which is included in the burn-in sequence, while the lower row shows a later frame, which is outside of the burn-in sequence. Alongside both frames, the corresponding background and foreground estimations are represented. It is worth noting that the presence of the target in the burn-in sequence affects the estimations and, even though the target is moving at a constant speed, the ghost remains in the position assumed by the boat in the burn-in sequence and does not move towards the successive positions.
A trivial idea to solve the burn-in ghosting problem is to increase the value of the regularization parameter $\bar\lambda_2$, which increases the weight of $\|F\|_1$ in the loss function in Equation (5). In fact, by increasing $\bar\lambda_2$, we would increase the threshold of the proximal operator associated with the $\ell_1$-norm, which is, indeed, the shrinkage operator. By doing this, we would cut the lower-intensity pixels out of the foreground. Such pixels would hopefully belong to the ghost rather than to the actual target. In this way, the background estimation would also be modified, because of the condition $S \approx B + F$, therefore effectively deleting the ghost.
Increasing $\bar\lambda_2$ is, unfortunately, an unpleasant solution for the following reasons:
- The parameter would become much more dependent on the specific input matrix $S$, while, in practice, it is usually set as $\bar\lambda_2 = 1/\sqrt{\max(P, T)}$; 
- Along with the ghost pixels, a higher $\bar\lambda_2$ would also cause erosion of target-associated pixels, affecting the detection probability as well. 
In order to overcome those problems, we used a saliency-based approach, described in Section 2.4, which consisted of using a saliency map to modulate the regularization parameter associated with $F$.
  2.4. Saliency-Aided RPCA
The saliency-based approach in RPCA is not new in the literature [41,48,49]. Our approach was inspired by the one proposed by Oreifej et al. in [41], which modified the minimization problem in Equation (3) as follows:

$$\min_{B,F}\; \|B\|_* + \lambda_1 \|g(\Pi) \circ F\|_0 \quad \text{s.t.} \quad S = B + F \tag{12}$$

which is then relaxed to the form:

$$\min_{B,F}\; \|B\|_* + \lambda_1 \|g(\Pi) \circ F\|_1 \quad \text{s.t.} \quad S = B + F \tag{13}$$

where $\Pi$ is a matrix whose $t$-th column $\boldsymbol{\pi}_t$ is the saliency map of the $t$-th frame, scaled in the range between 0 and 1 and organized in lexicographic order. The operator $\circ$ indicates element-wise multiplication, while the operator $g(\cdot)$ denotes any function that:

- inverts the polarity of each element of $\Pi$, in the sense that a low value should indicate high objectness confidence, and vice versa; 
- scales the resulting matrix to a wider modulation range (e.g., between 0 and 20).  

We use $g(\Pi) = \beta e^{-\alpha \Pi}$, where $\alpha$ and $\beta$ are tuning parameters controlling the slope of the negative exponential and the dynamic of the resulting matrix, respectively. For each new frame, the saliency map is calculated through one of the many saliency filters presented in the literature. In this work, we refer to the SR and the FG algorithms because of their very small execution times. In particular, SR takes advantage of the property of natural images known as the 1/f law, which states that the amplitude $A(f)$ of the averaged Fourier spectrum of the ensemble of natural images obeys a distribution of the type $A(f) \propto 1/f$.
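The SR idea can be illustrated with a simplified sketch: whiten the log-amplitude spectrum by removing its smooth, 1/f-like local trend, and transform the residual back. This sketch uses a box filter in place of the original local-average and Gaussian filters, and all sizes and parameters are our own choices:

```python
import numpy as np

def box_filter(x, k):
    """k x k local mean with edge padding (stands in for the local-average
    filter applied by SR to the log-amplitude spectrum)."""
    pad = k // 2
    xp = np.pad(x, pad, mode='edge')
    h, w = x.shape
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + w]
    return out / (k * k)

def spectral_residual_saliency(img, k=3):
    """Saliency map in the spirit of SR: the smooth trend of the
    log-amplitude spectrum is estimated by local averaging and removed;
    the inverse transform of the residual highlights salient spots."""
    spec = np.fft.fft2(img)
    log_amp = np.log(np.abs(spec) + 1e-12)
    phase = np.angle(spec)
    residual = log_amp - box_filter(log_amp, k)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_filter(sal, 5)                       # final smoothing
    return (sal - sal.min()) / (sal.max() - sal.min() + 1e-12)  # to [0, 1]

# A bright compact object on a uniform background should dominate the map.
img = np.full((32, 32), 1.0)
img[10:13, 20:23] = 5.0
sal = spectral_residual_saliency(img)
```

On this toy image, the saliency map concentrates around the bright blob, which is exactly the behavior exploited to protect target pixels from shrinkage.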
FG is an implementation of the well-known visual attention model, which emulates the behavior of the retina of the human eye to highlight the spots within the image that are characterized by the highest center–surround contrast. After calculating the saliency maps, the problem in Equation (13) can be solved, again, using ALM. Referring to [41] for the details, the steps of the saliency-aided RPCA are reported in Algorithm 3.
| Algorithm 3: Saliency-aided RPCA | 
| 1 | Input: $S$ (observed data); $\lambda_1$, $\mu$ (regularization parameters); $\alpha$, $\beta$ (parameters of $g(\cdot)$) | 
| 2 | Initialize: $F_0 = 0$, $Y_0 = 0$; $\Pi = [\ ]$ (empty matrix of size $P \times 0$) | 
| 3 | for $t = 1$ to $T$ do | 
| 4 | Reshape $\mathbf{s}_t$ into frame form to get the matrix $X_t$ of size $H \times W$ | 
| 5 | Compute the saliency algorithm on the frame $X_t$ to get $\mathrm{Sal}(X_t)$ | 
| 6 | Put $\mathrm{Sal}(X_t)$ in lexicographic order to get $\boldsymbol{\pi}_t$ and update $\Pi \leftarrow [\Pi, \boldsymbol{\pi}_t]$ | 
| 7 | while not converged do | 
| 8 | (1) $B_{k+1} = \mathcal{D}_{\mu^{-1}}\!\left[S - F_k + \mu^{-1}Y_k\right]$ | 
| 9 | (2) $F_{k+1} = \mathcal{S}_{\lambda_1\mu^{-1}g(\Pi)}\!\left[S - B_{k+1} + \mu^{-1}Y_k\right]$ | 
| 10 | (3) $Y_{k+1} = Y_k + \mu\,(S - B_{k+1} - F_{k+1})$ | 
| 11 | return: $B$, $F$ |
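Under this scheme, the only change with respect to plain shrinkage is that the scalar threshold becomes an element-wise threshold modulated by $g(\Pi)$. A minimal sketch follows; the numeric values of $\alpha$, $\beta$, and the threshold are illustrative only:

```python
import numpy as np

def g(sal, alpha=4.0, beta=20.0):
    """Polarity inversion and rescaling of a saliency matrix with values
    in [0, 1]: salient pixels get a small weight, non-salient a large one."""
    return beta * np.exp(-alpha * sal)

def weighted_shrink(X, weights, tau):
    """Element-wise shrinkage with per-pixel threshold tau * weights,
    replacing the scalar-threshold shrinkage operator."""
    return np.sign(X) * np.maximum(np.abs(X) - tau * weights, 0.0)

# Two pixels with identical foreground amplitude: the salient one survives
# almost untouched, the non-salient one is strongly suppressed.
X = np.array([[3.0, 3.0]])
sal = np.array([[1.0, 0.0]])     # first pixel salient, second not
out = weighted_shrink(X, g(sal), tau=0.1)
```

This is how the saliency map steers the sparsity penalty: ghost pixels (low saliency) face a large effective threshold, while true target pixels (high saliency) are spared.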