A Constrained Sparse-Representation-Based Spatio-Temporal Anomaly Detector for Moving Targets in Hyperspectral Imagery Sequences

At present, small dim moving target detection in hyperspectral imagery sequences is mainly based on anomaly detection (AD). However, most conventional detection algorithms only utilize the spatial spectral information and rarely employ the temporal spectral information. Besides, multiple targets in complex motion situations, such as multiple targets at different velocities and dense targets on the same trajectory, are still challenges for moving target detection. To address these problems, we propose a novel constrained sparse representation-based spatio-temporal anomaly detection algorithm that extends AD from the spatial domain to the spatio-temporal domain. Our algorithm includes a spatial detector and a temporal detector, which play different roles in moving target detection. The former can suppress moving background regions, and the latter can suppress non-homogeneous background and stationary objects. Two temporal background purification procedures maintain the effectiveness of the temporal detector for multiple targets in complex motion situations. Moreover, the smoothing and fusion of the spatial and temporal detection maps can adequately suppress background clutter and false alarms on the maps. Experiments conducted on a real dataset and a synthetic dataset show that the proposed algorithm can accurately detect multiple targets with different velocities and dense targets with the same trajectory and outperforms other state-of-the-art algorithms in high-noise scenarios.


Introduction
With the development of optical sensor technology, hyperspectral imagery (HSI) has been dramatically improved in recent years, and HSI sequences are more available in the real world. Because of adequate spectral information with dozens or hundreds of spectrum bands, the HSI detection technique can find and distinguish dim targets, which are unobservable in the visible or infrared images, and has promising prospects in military, security, satellite surveillance, disaster monitoring, and other applications [1]. According to whether prior target spectral information is utilized, the HSI detection technique can be mainly divided into target detection [2][3][4] and anomaly detection. Due to factors such as camera angle, illumination, atmosphere, and sensor spatial resolution, it is common in HSI that the same object has different spectra. Besides, no prior target spectrum is available for most of the moving target detection scenes. Therefore, current hyperspectral moving target detection technologies [5][6][7][8][9][10][11][12] are mainly based on anomaly detection.
Traditional single-frame anomaly detection is usually accomplished by detecting irregular deviations between the test pixel and background pixels in a hyperspectral image. Designed to detect the presence of a dim target in a multi-band image, the Reed-Xiaoli (RX) algorithm [13] assumes that the global background spectra obey a multivariate Gaussian distribution and applies the Mahalanobis distance to identify anomaly spectra. Because the Gaussian assumption does not hold for a non-stationary global background, the local version of RX [14] divides the local neighborhood of the test pixel into a potential target region and a background region by dual-windows and replaces global statistics with local statistics. The Quasi-local-RX (QLRX) algorithm [15] improves point-target detection by utilizing local and global statistics simultaneously. The kernel RX (KRX) algorithm [16], a nonlinear version of RX, maps spectra into a higher-dimensional feature space through a kernel function and outperforms the original RX detector in military target and mine detection. The cluster KRX (CKRX) algorithm [17] improves the performance of KRX by replacing background pixels with cluster centers. Support vector data description (SVDD) algorithms [18,19] also determine anomalies in a high-dimensional feature space by building a minimal enclosing hypersphere around local background pixels. Sparse representation (SR)-based algorithms [20][21][22][23][24][25][26][27] have made significant progress in anomaly detection in recent years. These algorithms usually assume that background pixels can be represented as linear combinations of the surrounding background, and anomaly pixels cannot. The collaborative representation (CR)-based algorithm [22] adopts $\ell_2$-norm minimization to reinforce the collaboration of background representation and is superior to RX and its improved algorithms. 
To realize the detection of dense small targets, the constrained sparse representation (CSR)-based algorithm [23] imposes two constraints on abundance vectors and can remove anomalous atoms from the local background dictionary. Because background pixels and target pixels are considered low rank and sparse, respectively, low-rank and sparse matrix decomposition-based algorithms [28][29][30] have also received widespread attention in anomaly detection.
When a hyperspectral staring camera is continuously imaging at short intervals, anomaly detectors can output detection maps in succession. Usually, anomaly detection maps of a hyperspectral imagery sequence can be regarded as an infrared image sequence. Therefore, multi-frame infrared detection or tracking algorithms can be used to detect or track dim moving targets on these maps. Rotman et al. combined hyperspectral target detection and infrared target tracking for the first time [5][6][7]. They transformed each HSI into a two-dimensional anomaly detection map and then utilized a variance filter (VF) [31] to detect targets moving at subpixel velocity. Besides, Duran et al. focused on tracking small dense objects, such as pedestrians or vehicles, from airborne platforms [8][9][10]. They adopted endmember techniques to detect subpixel targets and estimated the motion parameters of targets under the framework of the Bayesian filter. Wang et al. proposed a novel temporal anomaly detector in dim moving target detection, which extracts the local spatial background in the previous frame to mine the singularity of the test pixel [11]. Combining the traditional single-frame detection with their proposed temporal detection can effectively reduce temporal noise clutter. Then, Wang et al. introduced a simplified VF to calculate a trajectory history map in the literature [12]. The fusion of the spatial detection map, the temporal detection map, and the trajectory history map (STH) is superior to previous moving target detection algorithms in hyperspectral imagery sequences.
In summary, current anomaly detection algorithms for moving targets still only utilize the spatial neighborhood background of the current frame or the previous frame. However, static objects whose spectra differ from their neighborhoods can be regarded as anomaly targets by these detection algorithms. Temporal profile filtering algorithms can detect moving targets, but require prior information about speed. Besides, detecting targets in complex motion, such as multiple targets at different velocities and dense targets on the same trajectory, is still a challenge for temporal profile filtering-based algorithms [5][6][7][11]. To solve these problems, we propose a CSR-based spatio-temporal anomaly detector (CSR-ST), which fully employs the temporal spectral information in HSI sequences. Unlike hyperspectral change detection (CD) [32,33], which detects anomaly regions under diurnal and seasonal changes, moving target detection requires a very short interval between frames. This means that camera angle, illumination, weather, and other imaging conditions are almost unchanged in adjacent frames. After frame registration, the spectrum of the same pixel can be regarded as a mixture of spectra in a small local region, only affected by the temporal clutter in different frames. Based on this assumption, we propose a novel temporal anomaly detection framework that calculates the anomaly score of the test pixel employing its former spectra. In our previous work [23], the CSR detector was based on the assumption that a background pixel can be linearly represented by the endmembers present in its spatial neighborhood while an anomaly pixel cannot. Compared to background spectra in the spatial neighborhood, the former spectra of the test pixel in previous frames can provide purer background endmembers to represent the current spectrum. 
Therefore, the CSR-based temporal detector has a better ability to recover the test background pixel than the CSR-based spatial detector. Besides, the temporal detector has two safeguards for constructing a pure temporal background dictionary for the test pixel. The first safeguard is to remove potential target spectra from the candidate set of the temporal background dictionary based on spatial detection results. The second safeguard is to automatically remove anomaly atoms from the background dictionary when the corresponding abundances reach a given upper bound and then solve the model with the new background dictionary. Non-homogeneous background pixels or stationary objects can turn into false alarms in the single-frame detection, while the temporal detector is mainly sensitive to moving targets. However, when some background regions move in the imaging scene, the temporal detector can regard them as targets and be inferior to the spatial detector. The fusion of the spatial detection map and the temporal detection map combines the advantages of the two detectors and can suppress the background and stationary objects. The main contributions of this article are summarized as follows.
1. A novel hyperspectral spatio-temporal anomaly detection algorithm is proposed. Compared to traditional anomaly detection algorithms, the proposed algorithm utilizes the temporal spectral information and extends the CSR algorithm from the spatial domain to the spatio-temporal domain. The spatial detector and the temporal detector play different roles in moving target detection. The former can suppress moving background regions, and the latter can suppress non-homogeneous background and stationary objects. To the best of our knowledge, no literature has introduced the historical spectra of the test pixel to construct the temporal background set in anomaly detection yet.
2. In the CSR-based temporal detection, there are two procedures to purify the background dictionary. The purification procedures can improve the ability of the temporal detector to detect multiple targets in complex motion situations, such as multiple targets with different velocities and dense targets with the same trajectory.
3. An iterative smoothing filter is executed on both spatial and temporal detection maps to suppress the background clutter. Furthermore, the filter can strengthen the detection performance for slow-moving area targets.
The rest of this article is organized as follows. The CSR detector and its kernel version are introduced in Section 2. The proposed CSR-ST algorithm is described in Section 3. The experiments conducted on a real dataset and a synthetic dataset are presented in Section 4, followed by the conclusions in Section 5.

Related Work
SR-based anomaly detection algorithms usually assume that a background pixel lies in a low-dimensional subspace spanned by surrounding background pixels, whereas an anomaly pixel cannot be represented as a sparse linear mixture of background spectra. Suppose $y$ is the test pixel, which has $N$ spectral bands, and $A$ is the background dictionary, which has $M$ atoms; the competing hypotheses for the SR-based algorithms are:
$$H_0: y = A\alpha + n \ \ \text{(background pixel)}, \qquad H_1: y \neq A\alpha + n \ \ \text{(anomaly pixel)} \tag{1}$$
where $A \in \mathbb{R}^{N \times M}$, $\alpha$ is a sparse vector in which each item is the abundance of the corresponding atom in $A$, and $n$ is a random noise term. Usually, a sparsity constraint $\|\alpha\|_0 \le K$ is imposed on $\alpha$ in SR-based detection, where $K$ is a sparsity parameter. However, if there is no constraint on each abundance item in $\alpha$, anomaly pixels can also be expressed as linear mixtures of the background dictionary because negative abundances are allowed. The linear spectral mixture model (LMM) [34] supposes that the abundance vector $\alpha$ of a mixed pixel should satisfy a sum-to-one constraint:
$$\sum_{i=1}^{M} \alpha_i = 1 \tag{2}$$
and a non-negativity constraint:
$$\alpha_i \ge 0, \quad i = 1, \ldots, M \tag{3}$$
The CSR algorithm introduces Equations (2) and (3) into the SR model, and the minimization problem of CSR can be expressed as:
$$\min_{\alpha} \|y - A\alpha\|_2^2 \quad \text{s.t.} \quad e^T\alpha = 1, \ \alpha_i \ge 0, \ \|\alpha\|_0 \le K \tag{4}$$
where $e$ represents an $M \times 1$ vector in which each item is one. The objective function can be converted to:
$$\|y - A\alpha\|_2^2 = \alpha^T A^T A \alpha - 2 y^T A \alpha + y^T y \tag{5}$$
Note that $y^T y$ is a constant and can be removed. If the test pixel is anomalous and the background dictionary contains a few anomaly pixels, the corresponding entries of $\alpha$ can be very large, resulting in a small reconstruction residual. To avoid missed detections, a sufficiently small constant $C$ is introduced as an upper limit on the entries of $\alpha$, and Equation (4) can be transformed as:
$$\min_{\alpha} \ \alpha^T A^T A \alpha - 2 y^T A \alpha \quad \text{s.t.} \quad e^T\alpha = 1, \ 0 \le \alpha_i \le C \tag{6}$$
where $C \in [1/M, 1]$. According to the Karush-Kuhn-Tucker conditions [35], the constraint $\|\alpha\|_0 \le K$ in Equation (4) can be removed in Equation (6). When abnormal pixels are tested, the abundances associated with similar anomalous atoms can reach the maximum. 
Accordingly, the atoms whose abundances equal $C$ are very likely to be anomalies and should be eliminated from the background dictionary. A pure dictionary $\tilde{A}$ is built from the remaining atoms. With the constraint $0 \le \tilde{\alpha}_i \le 1$ and $\tilde{A}$, the reconstruction residuals of anomalous test pixels will be significantly higher than those in the first reconstruction and can be regarded as anomaly scores:
$$r = \tilde{\alpha}^{*T} \tilde{A}^T \tilde{A} \tilde{\alpha}^{*} - 2 y^T \tilde{A} \tilde{\alpha}^{*} + y^T y \tag{7}$$
where $\tilde{\alpha}^{*}$ is the approximately calculated sparse vector obtained with the background dictionary $\tilde{A}$, from which the anomalous atoms have been removed.
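The two-pass CSR procedure above can be illustrated with a minimal sketch. It assumes a plain projected-gradient solver for the box-constrained, sum-to-one quadratic program; the solver choice and the names `project_capped_simplex`, `solve_csr`, and `csr_score` are illustrative assumptions, not the implementation used in [23].

```python
import numpy as np

def project_capped_simplex(v, cap, n_bisect=60):
    """Euclidean projection of v onto {a : sum(a) = 1, 0 <= a_i <= cap}.

    The projection has the form clip(v - tau, 0, cap); tau is found by bisection.
    """
    lo, hi = v.min() - cap, v.max()  # the clipped sum is M*cap >= 1 at lo and 0 at hi
    for _ in range(n_bisect):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, cap).sum() > 1.0:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, cap)

def solve_csr(A, y, cap, n_iter=500):
    """Projected gradient for min a^T A^T A a - 2 y^T A a, s.t. sum(a) = 1, 0 <= a_i <= cap."""
    G, b = A.T @ A, A.T @ y
    step = 1.0 / (2.0 * np.linalg.norm(G, 2) + 1e-12)  # 1 / Lipschitz constant of the gradient
    alpha = np.full(A.shape[1], 1.0 / A.shape[1])       # feasible start because cap >= 1/M
    for _ in range(n_iter):
        grad = 2.0 * (G @ alpha - b)
        alpha = project_capped_simplex(alpha - step * grad, cap)
    return alpha

def csr_score(A, y, cap, tol=1e-6):
    """First pass with upper bound cap, drop atoms that hit cap, second pass with bound 1."""
    alpha = solve_csr(A, y, cap)
    keep = alpha < cap - tol          # atoms at the upper bound are treated as anomalous
    if not keep.any():                # assumption: fall back to the full dictionary
        keep[:] = True
    A_pure = A[:, keep]
    alpha_pure = solve_csr(A_pure, y, 1.0)
    residual = y - A_pure @ alpha_pure
    return float(residual @ residual)  # equals the quadratic form of the anomaly score
```

With a dictionary that contains one anomalous atom, a call such as `csr_score(A, y, cap=0.2)` should remove that atom in the first pass, so an anomalous test pixel keeps a large residual instead of being reconstructed by its own spectrum.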
Given secondary or multiple scattering in the atmosphere, spectrum mixing is usually a nonlinear process [36]. Kernel methods map the original data into a higher-dimensional feature space via nonlinear functions and thereby make linearly inseparable data linearly separable [37]. Conveniently, the inner product in the feature space can be replaced by:
$$k(x_i, x_j) = \phi(x_i)^T \phi(x_j) \tag{8}$$
where $\phi$ is a nonlinear function, $x_i$ and $x_j$ are the original data, and $k$ is the kernel function. The kernel CSR (KCSR) algorithm introduces the kernel method and adopts the Gaussian radial basis function kernel:
$$k(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|_2^2\right) \tag{9}$$
The optimization problem is replaced by:
$$\min_{\alpha} \ \alpha^T K \alpha - 2 K_y \alpha \quad \text{s.t.} \quad e^T\alpha = 1, \ 0 \le \alpha_i \le C \tag{10}$$
where $K$ is an $M \times M$ Gram matrix whose item in the $i$-th row and $j$-th column is $K_{i,j} = k(a_i, a_j)$. $K_y = \phi(y)^T \phi(A)$ can also be replaced by:
$$K_y = \left[k(y, a_1), k(y, a_2), \ldots, k(y, a_M)\right] \tag{11}$$
Likewise, the atoms whose abundances equal $C$ are removed, and then, a pure background dictionary $\tilde{A}$ is used to solve Equation (10). Therefore, the anomaly score can be replaced by:
$$r = \tilde{\alpha}^{*T} \tilde{K} \tilde{\alpha}^{*} - 2 \tilde{K}_y \tilde{\alpha}^{*} + k(y, y) \tag{12}$$
where $r$ is the approximation error and $\tilde{K}$ and $\tilde{K}_y$ are both computed from $\tilde{A}$.
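As a small illustration of the kernel quantities, the sketch below builds the Gram matrix and evaluates the kernel-space residual for a given abundance vector. It assumes the Gaussian kernel is parameterized as exp(-gamma * ||x - z||^2), for which k(y, y) = 1; the function names are hypothetical.

```python
import numpy as np

def rbf_gram(X, Z, gamma):
    """Gaussian RBF kernel between all columns of X (N x P) and Z (N x Q)."""
    sq_dist = (X ** 2).sum(axis=0)[:, None] + (Z ** 2).sum(axis=0)[None, :] - 2.0 * X.T @ Z
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clamp tiny negative round-off

def kernel_residual(A, y, alpha, gamma):
    """Squared feature-space distance between phi(y) and the mixture phi(A) @ alpha."""
    K = rbf_gram(A, A, gamma)                      # M x M Gram matrix of the atoms
    K_y = rbf_gram(y[:, None], A, gamma).ravel()   # kernel values k(y, a_m)
    return float(alpha @ K @ alpha - 2.0 * K_y @ alpha + 1.0)  # k(y, y) = 1 for this kernel
```

Because only kernel evaluations appear, the same constrained solver used for the linear model can be applied by passing `K` and `K_y` in place of the linear Gram matrix and correlation vector.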

Spatio-Temporal Anomaly Detection for Moving Targets
In this section, a novel CSR-based spatio-temporal anomaly algorithm is proposed to detect dim moving targets accurately in HSI sequences. Our algorithm is divided into four steps, namely spatial anomaly detection, iterative smoothing filter, temporal anomaly detection, and spatial-temporal fusion. The spatial anomaly detection finds abnormal targets by utilizing the spectral information of the current frame. An iterative smoothing filter can reduce noise and false alarms in the time and space domains. Different from AD, CD, and the temporal detection [12] using the information between two adjacent frames, our proposed temporal anomaly detection constructs background dictionaries with the historical spectral curves of the test pixels. The proposed temporal anomaly detection explores anomaly characteristics in the time dimension and provides anomaly information different from that in the spatial detection. The fusion of spatial and temporal anomaly detection can explore the target information more comprehensively. The framework of the proposed CSR-ST algorithm is displayed in Figure 1.

Spatial Anomaly Detection
Let $X_i = \{x_i^j\}_{j=1}^{d_1 \times d_2}$, $x_i^j \in \mathbb{R}^N$, denote a hyperspectral cube collected in the current frame, where $i$ is the current sequence number, $d_1$ and $d_2$ are defined as the spatial sizes of the cube, and $N$ is the number of spectral bands. Dual concentric windows [38] are used to extract a spatial background dictionary for each pixel $x_i^j$. The dual-windows are centered at each test pixel and divide the neighborhood into a potential target region and a background region. Pixels in the background region are selected as atoms to form a background dictionary $A_i^j$. Then, the spatial anomaly score $s_i^j$ of the test pixel $x_i^j$ is solved by the CSR detector with the corresponding background dictionary $A_i^j$. After all pixels on $X_i$ are detected in sequence, a two-dimensional spatial detection map $S_i$ is obtained:
$$S_i = \{s_i^j\}_{j=1}^{d_1 \times d_2} \tag{13}$$
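The dual-window extraction can be sketched as follows. This is a minimal illustration that assumes square windows with odd side lengths and simply clips the outer window at the image border; the function name and the border handling are assumptions rather than details given in the text.

```python
import numpy as np

def dual_window_dictionary(cube, row, col, w_in, w_out):
    """Collect background atoms for the pixel at (row, col).

    cube has shape (d1, d2, N). Atoms are the pixels inside the outer window
    (side w_out) but outside the inner window (side w_in). Returns an N x M matrix.
    """
    d1, d2, _ = cube.shape
    r_in, r_out = w_in // 2, w_out // 2
    atoms = []
    for r in range(max(0, row - r_out), min(d1, row + r_out + 1)):
        for c in range(max(0, col - r_out), min(d2, col + r_out + 1)):
            # Chebyshev distance decides whether the pixel is in the background ring
            if max(abs(r - row), abs(c - col)) > r_in:
                atoms.append(cube[r, c])
    return np.stack(atoms, axis=1)
```

For an interior pixel with a 3 × 3 inner window and a 5 × 5 outer window, the dictionary holds 25 − 9 = 16 atoms; near the border it holds fewer.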

Iterative Smoothing Filter
The spectra change with time due to the measurement noise, resulting in temporal fluctuation of anomaly scores. Meanwhile, spatial background clutter is also generated in the detection maps due to the fluctuation. The literature [17] has used a simple smoothing filter as a post-processing procedure to decrease false alarms and noise in detection maps. Inspired by [17], an iterative smoothing filter is adopted to reduce noise both in the spatial and temporal domains simultaneously.
To avoid the overall drift of anomaly scores on the spatial detection map $S_i$ caused by sudden changes in imaging conditions, Z-score normalization should first be performed:
$$\bar{S}_i = \frac{S_i - \mu}{\sigma} \tag{14}$$
In typical image preprocessing, $\mu$ and $\sigma$ are the mean value and standard deviation of pixels in the whole image, respectively. However, because anomaly scores of anomalous pixels are much higher than those of background pixels on $S_i$, it is more accurate to describe the distribution of $s_i^j$ by a truncated normal distribution or a half-normal distribution [39] rather than a normal distribution. Therefore, it is more reasonable to set $\mu$ and $\sigma$ to the mean value and standard deviation of the collection of $S_i$ and its symmetric set about zero.
Then, an iterative smoothing operation is performed on $\bar{S}_i$ to reduce spatial and temporal clutter:
$$\hat{s}_i^j = (1 - \rho)\, \hat{s}_{i-1}^j + \rho \sum_{l \in L} \varepsilon_l\, \bar{s}_i^l \tag{15}$$
where $\bar{s}_i^l$ is the normalized spatial anomaly score of $x_i^l$, $\hat{s}_i^j$ and $\hat{s}_{i-1}^j$ are the smoothed spatial anomaly scores of $x_i^j$ and $x_{i-1}^j$, respectively, $L$ denotes the spatial neighborhood used for smoothing, and $\rho$ and $\varepsilon_l$ denote filter weights. When the first spatial detection map is smoothed, let $\rho = 1$. The latter part of Equation (15) is actually a spatial smoothing filter such as the mean filter or the Gaussian filter. Furthermore, one-dimensional denoising algorithms can also replace the temporal iterative smoothing part of Equation (15) to reduce temporal clutter. Compared to the original spatial detection map $S_i$, background clutter and noise on the smoothed map $\hat{S}_i$ are suppressed, and detection performance can be improved.
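A minimal sketch of the normalization and smoothing steps is given below. It assumes a mean filter as the spatial part and a first-order recursive update with weight rho as the temporal part; the function names and the edge-padding choice are illustrative assumptions.

```python
import numpy as np

def symmetric_zscore(S):
    """Z-score using the mean and standard deviation of S together with its mirror -S."""
    both = np.concatenate([S.ravel(), -S.ravel()])
    return (S - both.mean()) / both.std()

def mean_filter(S, radius=1):
    """Spatial mean filter with a (2*radius+1)^2 window and edge padding."""
    padded = np.pad(S, radius, mode="edge")
    side = 2 * radius + 1
    out = np.zeros(S.shape, dtype=float)
    for dr in range(side):
        for dc in range(side):
            out += padded[dr:dr + S.shape[0], dc:dc + S.shape[1]]
    return out / side ** 2

def smooth_step(S_norm, prev_smoothed, rho, radius=1):
    """One iteration: blend the previous smoothed map with the filtered current map."""
    spatial = mean_filter(S_norm, radius)
    if prev_smoothed is None:       # first frame: equivalent to rho = 1
        return spatial
    return (1.0 - rho) * prev_smoothed + rho * spatial
```

Note that the mean of a set joined with its mirror image is zero, so this normalization amounts to dividing the scores by their root mean square.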

Temporal Anomaly Detection
Note that, using the dual-window strategy to select a background dictionary has several disadvantages. Firstly, the selection of an inappropriate dual-window size can cause the local background to be contaminated by target pixels in spatial anomaly detection. If the inner window of dual-windows is too small, the chosen local background of the test target pixel can contain some target pixels. Moreover, the contamination problem can also occur when multiple targets are densely distributed. Secondly, the spatial distributions of moving targets are usually unknown and change in the real world. Therefore, it is difficult to determine the optimal dual-window size to detect moving targets in advance. Thirdly, the performance of these algorithms still varies with the dual-window size, and the best performance of the dual-window-based AD algorithms is a local optimum. For instance, detection results can be further improved after combining with a weight matrix obtained by segmentation or clustering in the literature [40,41], where background pixels are assigned lower weight values. An interesting phenomenon is that the best local background of some detection algorithms for subpixel targets are eight neighborhoods [42], and large dual-windows are harmful to these algorithms. Fourthly, the dual-window-based spatial detection cannot eliminate motionless objects, the spectra of which are also different from the background spectra.
To accurately detect moving targets in HSI sequences, we propose a new approach for constructing background dictionaries of test pixels. Compared to hyperspectral CD, the interval between two contiguous frames in moving target detection is short; thus, the camera angle, illumination, weather, and other imaging conditions are almost unchanged. In this case, the spectrum of the same object in short HSI sequences is affected only by measurement noise. Moreover, due to camera shake and the error of frame registration, the imaging space corresponding to the same pixel in the HSI moves back and forth in a local background region. Therefore, it can be assumed that the spectrum of a background pixel in the current frame is a linear mixture of the spectra of the same pixel in previous frames:
$$x_i^j = B_i^j \beta + n, \qquad B_i^j = \left[x_{i-1}^j, x_{i-2}^j, \ldots, x_{i-P}^j\right] \tag{16}$$
where $B_i^j$ is defined as the former spectra matrix, $\beta$ is defined as the abundance vector, $P$ is defined as the number of former spectra, and $n$ is defined as the noise term.
Equation (16) indicates that $B_i^j$ is more suitable as a background dictionary for the CSR and KCSR detectors. In this subsection, temporal anomaly detection is defined as a method to calculate the anomaly scores of the test pixel $x_i^j$ in the current frame by using its former spectra $B_i^j$. Because the positions of non-homogeneous background pixels or motionless objects are almost unchanged in the HSI after inter-frame registration, the temporal anomaly detection can avoid false alarms caused by these pixels.
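However, $B_i^j$ is sometimes not a pure background dictionary. When the target is moving slowly, it takes more than one frame to pass through a pixel. In this case, if $x_i^j$ is a target pixel, its former spectra may also be target spectra. Besides, if the trajectories of moving targets intersect, the former spectra of pixels at the intersection can also be contaminated by targets. Therefore, we delete the abnormal atoms in $B_i^j$ based on the spatial anomaly detection results. $N_D$ and $N_C$ are defined as the number of atoms in the background dictionary and its candidate set, respectively. Specifically, for the test pixel $x_i^j$ in the current frame, the smoothed spatial anomaly scores $\hat{s}_{i-1}^j, \ldots, \hat{s}_{i-N_C}^j$ of its $N_C$ candidate former spectra are sorted, and the $N_D$ spectra with the lowest scores are kept to form the temporal background dictionary $\tilde{B}_i^j$. The CSR model of Equation (6) is then solved with $x_i^j$ as the test pixel and $\tilde{B}_i^j$ as the dictionary:
$$\min_{\alpha} \ \alpha^T \tilde{B}_i^{jT} \tilde{B}_i^j \alpha - 2\, x_i^{jT} \tilde{B}_i^j \alpha \quad \text{s.t.} \quad e^T\alpha = 1, \ 0 \le \alpha_l \le C \tag{17}$$
where $C \in [1/N_D, 1]$. The background dictionary $\tilde{B}_i^j$ can be further purified by removing the atoms whose abundances equal $C$. The temporal anomaly detection result $t_i^j$ of $x_i^j$ is then computed as:
$$t_i^j = \left\| x_i^j - \tilde{B}_i^j \tilde{\alpha}^{*} \right\|_2 \tag{18}$$
where $\tilde{\alpha}^{*}$ is the approximately calculated sparse vector obtained after the anomalous atoms have been removed from the background dictionary $\tilde{B}_i^j$, and $t_i^j$ is the $\ell_2$-norm of the approximation error. Similarly, the KCSR algorithm can also be applied to the temporal anomaly detection. After all pixels on $X_i$ are detected in sequence, a two-dimensional temporal detection map $T_i$ is obtained.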
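The first purification step can be sketched as a simple selection by spatial score. The sketch assumes that the candidate spectra with the lowest smoothed spatial anomaly scores are the ones retained; the function names are hypothetical, and the abundance vector passed to `temporal_score` would come from a constrained solver that is not shown here.

```python
import numpy as np

def select_temporal_dictionary(former_spectra, former_scores, n_d):
    """Keep the n_d candidate spectra with the lowest smoothed spatial scores.

    former_spectra: N x N_C matrix, one column per previous frame of the same pixel.
    former_scores:  length-N_C smoothed spatial anomaly scores of those columns.
    Returns the N x n_d dictionary and the indices of the retained columns.
    """
    order = np.argsort(former_scores, kind="stable")
    keep = np.sort(order[:n_d])          # preserve the temporal order of retained frames
    return former_spectra[:, keep], keep

def temporal_score(x, dictionary, alpha):
    """l2-norm of the reconstruction error of the current spectrum x."""
    return float(np.linalg.norm(x - dictionary @ alpha))
```

In this sketch, the number of discarded candidates is the candidate set size minus `n_d`, so frames in which the pixel was flagged by the spatial detector never enter the temporal dictionary.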
The lower limit of the constraint parameter $C$ is connected with the number of anomalous atoms in the background dictionary. To obtain a convenient setting of $C$ in the spatial and temporal anomaly detection, $C$ can be represented as:
$$C = \frac{1}{\nu N_D} \tag{19}$$
where $\nu \in [1/N_D, 1]$. If $\nu < 1/N_D$, then $C > 1$ and the inequality constraint $\alpha_l \le C$ has no effect.
To further explore the meaning of $\nu$, two definitions are given as follows:
$$\eta_1 = \frac{N_a}{N_D} \tag{20}$$
$$\eta_2 = \frac{1}{N_D} \sum_{l=1}^{N_a} \alpha_a^l \tag{21}$$
where $N_a$ is defined as the number of anomalous atoms and $\alpha_a^l$ is defined as the abundance relevant to the anomaly endmember in the LMM of the $l$-th anomalous atom. In the hyperspectral AD, $0 \le \eta_2 \le \eta_1 \ll 1$. We proved a proposition about the parameter $\nu$ in the article [23]: To delete all anomalous atoms from the background dictionary, $\nu$ must satisfy:
$$\nu \ge \max\left(\eta_1, \frac{\eta_2}{\alpha_a}\right) \tag{22}$$
where $\alpha_a$ is defined as the abundance relevant to the anomaly endmember in the LMM of the test pixel.
The proposition gives an intuitive interpretation of $\nu$. When $\nu$ is larger than $\max(\eta_1, \eta_2/\alpha_a)$, all anomalous atoms can be deleted. Regardless of spatial detection or temporal detection, $\alpha_a$ of the same test pixel is constant. Therefore, it is practicable to set $\nu$ to the same value in both detections. $\eta_1$ and $\eta_2/\alpha_a$ in temporal detection can be made smaller than those in spatial detection by reducing the proportion of anomalous atoms in $\tilde{B}_i^j$. One method is to enlarge $N_D$, the size of $\tilde{B}_i^j$. Another method is to decrease $N_a$, the number of anomalous atoms, by enlarging the size of the candidate set of $B_i^j$ or by sampling the former spectra at intervals before constructing $B_i^j$. Through the above operations, the lower limit of $\nu$ in temporal detection is less than that in spatial detection. When $\nu$ is set to an excessively large value, many background atoms are deleted unnecessarily, which slightly degrades the ability of the CSR and KCSR algorithms to represent test background pixels. Therefore, in spatial detection, $\nu$ should be a trade-off between the inadequate deletion of anomalous atoms and the unnecessary deletion of background atoms. The same $\nu$ can cause the excessive deletion of atoms in temporal detection, but a large $N_D$ can avoid this situation.
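The relation between ν, the upper bound C, and the proportion of anomalous atoms can be made concrete with a short numeric sketch. It assumes C = 1/(ν N_D), η1 = N_a/N_D, and η2 equal to the sum of the anomaly abundances of the anomalous atoms divided by N_D; these forms and the function names are assumptions made for illustration.

```python
def upper_bound_c(nu, n_d):
    """Upper abundance bound C for a dictionary with n_d atoms (assumed form 1 / (nu * n_d))."""
    return 1.0 / (nu * n_d)

def minimum_nu(atom_anomaly_abundances, n_d, alpha_a):
    """Smallest nu that lets every anomalous atom reach the bound C.

    atom_anomaly_abundances: anomaly-endmember abundance of each anomalous atom.
    alpha_a: anomaly-endmember abundance of the test pixel.
    """
    eta1 = len(atom_anomaly_abundances) / n_d
    eta2 = sum(atom_anomaly_abundances) / n_d
    return max(eta1, eta2 / alpha_a)

# Example: 2 anomalous atoms among 20, test pixel with 40% anomaly abundance.
nu_min = minimum_nu([0.4, 0.1], n_d=20, alpha_a=0.4)   # max(0.1, 0.0625)
c_max = upper_bound_c(nu_min, n_d=20)
```

Doubling the dictionary size to 40 atoms with the same two contaminated atoms halves both η1 and η2, which is why a larger temporal dictionary tolerates a smaller ν.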

Spatio-Temporal Fusion
Compared to the spatial anomaly detection, the temporal anomaly detection can suppress spatially non-homogeneous background pixels and stationary objects. Furthermore, compared to the temporal profile filtering algorithms, the proposed temporal anomaly detection can identify moving targets with different speeds simultaneously and is robust to the situation where multiple targets pass through the same trajectory one after the other. However, the temporal detection is inferior to the spatial detection in some situations. If there are some moving background pixels in the scene, such as clouds, temporal anomaly detection can judge them as targets. Besides, if the frame registration error is too large, the temporal background dictionary cannot describe the background accurately. To improve the stability and robustness of the detection algorithm, it is necessary to combine spatial and temporal detection results.
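Before fusion, the filtering operation in Section 3.2 can also be performed on the temporal detection map $T_i$. First, perform Z-score normalization on $T_i$:
$$\bar{T}_i = \frac{T_i - \mu}{\sigma} \tag{23}$$
where $\mu$ and $\sigma$ are set to the mean value and standard deviation of the collection of $T_i$ and its symmetric set about zero. Then, the same iterative smoothing operation as Equation (15) is performed on $\bar{T}_i$ to reduce temporal clutter:
$$\hat{t}_i^j = (1 - \rho)\, \hat{t}_{i-1}^j + \rho \sum_{l \in L} \varepsilon_l\, \bar{t}_i^l \tag{24}$$
where $\bar{t}_i^l$ is the normalized temporal anomaly score of $x_i^l$ and $\hat{t}_i^j$ and $\hat{t}_{i-1}^j$ are the smoothed temporal anomaly scores of $x_i^j$ and $x_{i-1}^j$, respectively. Finally, the smoothed maps $\hat{S}_i$ and $\hat{T}_i$ are scaled to $[0, 1]$ and fused:
$$ST_i = \frac{\hat{S}_i - \min \hat{S}_i}{\max \hat{S}_i - \min \hat{S}_i} \circ \frac{\hat{T}_i - \min \hat{T}_i}{\max \hat{T}_i - \min \hat{T}_i} \tag{25}$$
where $\max \hat{S}_i$ and $\max \hat{T}_i$ are the maximum values in $\hat{S}_i$ and $\hat{T}_i$, $\min \hat{S}_i$ and $\min \hat{T}_i$ are the minimum values in $\hat{S}_i$ and $\hat{T}_i$, the symbol $\circ$ denotes the Hadamard product, and $ST_i$ is the fused spatio-temporal detection map. The overall description of the proposed spatio-temporal anomaly detection is presented in Algorithm 1.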
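The fusion step can be sketched as a min-max scaling of each smoothed map followed by an element-wise product. The handling of a constant map (returned as all zeros) is an assumption added to avoid division by zero; the function names are illustrative.

```python
import numpy as np

def min_max(detection_map):
    """Scale a detection map to [0, 1]; a constant map is mapped to zeros (assumption)."""
    low, high = detection_map.min(), detection_map.max()
    if high == low:
        return np.zeros(detection_map.shape, dtype=float)
    return (detection_map - low) / (high - low)

def fuse(S_smooth, T_smooth):
    """Hadamard product of the scaled spatial and temporal detection maps."""
    return min_max(S_smooth) * min_max(T_smooth)
```

Because the product is large only where both maps are large, a stationary object (high spatial score, low temporal score) and a moving cloud edge (low spatial score, high temporal score) are both suppressed in the fused map.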

Algorithm 1 CSR-based spatio-temporal anomaly detection for moving targets
Input: Hyperspectral sequences, dual-window size (w in , w out ), temporal background dictionary size N D , candidate set size N C , parameter ν, and kernel parameter γ for KCSR.
for each frame X i in the hyperspectral sequences do

Experimental Results and Discussion
In the beginning of this section, a real HSI sequence dataset and a synthetic dataset are introduced. Subsequently, the capability of the proposed temporal anomaly detection with different background dictionary sizes and different spatial detection results is demonstrated in detail. Additionally, the proposed spatio-temporal anomaly detection is compared to several existing algorithms in the detection performance.

Datasets and Evaluation Metrics
The Cloud dataset is an HSI sequence under a complex cloudy background and was collected by the Interuniversity Microelectronics Centre of Beihang University with the xiSpec snapshot mosaic hyperspectral cameras [12]. The dataset has a spatial size of 409 × 216 pixels and 25 spectral bands covering the 682-957 nm spectral region. The HSI sequence consists of 500 frames, where an aircraft (Target A) rises from the bottom of the imagery. Since the distance between the camera and the aircraft increases with the frames, the size of the aircraft decreases over time, from 53 pixels in the 1st frame to 21 pixels in the 500th frame, resulting in a decreasing spectral difference from the background. However, because the aircraft's speed on the HSIs also decreases, the number of frames that the aircraft needs to pass through a pixel increases. Three small flying targets (Target B, Target C, and Target D) with no more than 10 pixels exist in the 250th-393rd, 256th-363rd, and 417th-466th frames, respectively, and their velocities are all greater than 5 pixels per frame. As shown in Figure 2, there is noise clutter in the cloudy background. The synthetic dataset is based on the Terrain dataset acquired by the Hyperspectral Digital Image Collection Experiment sensor. The dataset has a spatial size of 180 × 180 pixels and 210 spectral bands covering the 400-2500 nm spectral region, as shown in Figure 3a. The spatial resolution is 1 m, and the spectral resolution is 10 nm. The water absorption and high noise bands are deleted, and 162 spectral bands are usable in the experiments. According to the LMM, synthetic targets can be added to the Terrain dataset by:
$$\tilde{a} = \lambda a + (1 - \lambda) b + n \tag{26}$$
where $a$ is a pure target spectrum, $b$ is an original background spectrum, $\tilde{a}$ is a mixed target spectrum, $n$ is the added zero-mean Gaussian noise vector, and $\lambda$ is the target abundance to be set. 
Considering that the radiation response interval of the background varies with bands, Gaussian noise with different variance is added to each band of a hyperspectral cube. Noise intensity is adjusted by the signal-to-noise ratio (SNR), expressed in this dataset by:
$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_{b,l}^2}{\sigma_{n,l}^2} \tag{27}$$
where $\sigma_{b,l}^2$ and $\sigma_{n,l}^2$ are the variances of the background and noise in the $l$-th band. Three targets with a size of 5 × 5 pixels and a speed of 2 pixels per frame are added to the Terrain dataset and move for 100 frames. The targets follow the same planar trajectory, and the interval between two adjacent targets is ten frames. Considering that the boundaries between neighboring objects are often accompanied by severe spectral mixing in the real data, $\lambda$ of the 16 pixels on the periphery of targets is set to 10%, while that of the 9 pixels in the center of targets is set to 40%. To explore the noise immunity of CSR-ST, the SNR is set to 20 dB, 10 dB, 5 dB, and 0 dB in turn. Figure 3b-f shows background spectra and mixed target spectra in different noise environments. With the decrease of SNR, the discriminability between background and mixed targets also decreases. When the SNR is 0 dB, background spectra and mixed target spectra are almost indistinguishable. To evaluate anomaly detection performance, this article adopts the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). The detection probability ($P_d$) and false alarm rate ($P_f$) are computed on a segmentation map, which is obtained by applying a given threshold to the detection map. By sweeping the threshold, a set of ($P_f$, $P_d$) pairs is obtained and used to plot the ROC curve. An excellent detector has an ROC curve close to the upper left corner [43]. However, the ROC curve can only qualitatively analyze detection performance. 
AUC [44] can give an intuitive and quantitative description and is calculated by several trapezoids:
$$\mathrm{AUC} = \frac{1}{2} \sum_{l=1}^{n-1} \left(P_f^{l+1} - P_f^{l}\right)\left(P_d^{l+1} + P_d^{l}\right) \tag{28}$$
where $(P_f^l, P_d^l)$ is defined as the $l$-th coordinate point and $n$ is defined as the number of coordinate points constituting the curve. The closer to 1 the AUC value is, the better the detection algorithm is. For the anomaly detector in an HSI sequence, the mean ROC of all frames can describe the performance.
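The trapezoidal AUC can be sketched in a few lines. The sketch assumes that a pixel is declared a detection when its score is greater than or equal to the threshold and that every distinct score is used once as a threshold; both function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (P_f, P_d) points obtained by sweeping the threshold over all scores.

    scores: anomaly scores of the pixels; labels: 1 for target pixels, 0 for background.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]                       # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, t in zip(scores, labels) if s >= thr and t)
        fp = sum(1 for s, t in zip(scores, labels) if s >= thr and not t)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Sum the areas of the trapezoids between consecutive ROC points."""
    return sum(0.5 * (f2 - f1) * (d1 + d2)
               for (f1, d1), (f2, d2) in zip(points, points[1:]))
```

For a sequence, one option is to compute the AUC per frame with these functions and average the values over frames.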
Considering that kernel space can represent hyperspectral data better, the proposed spatio-temporal anomaly detection algorithm is based on the KCSR model in the following experiments. KCSR-S, KCSR-SF, KCSR-T, and KCSR-ST denote spatial detection, smoothed spatial detection, temporal detection, and spatio-temporal fusion detection, respectively. All the experiments were implemented on a machine that was equipped with an Intel Core i9-9980XE CPU and 128-GB RAM, and the programs were written in Python.
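The construction of the synthetic dataset described in this subsection can be sketched as follows. The sketch assumes that the per-band SNR in dB is ten times the base-10 logarithm of the ratio between the background variance and the noise variance of that band, and it adds noise to the whole cube after implanting targets; the function names and the random generator argument are illustrative assumptions.

```python
import numpy as np

def implant_target(background, target, lam):
    """Linear mixture of a pure target spectrum and a background spectrum with abundance lam."""
    return lam * target + (1.0 - lam) * background

def add_band_noise(cube, snr_db, rng):
    """Add zero-mean Gaussian noise whose variance is set per band from the target SNR.

    cube has shape (d1, d2, N); the noise variance of band l is
    var_background(l) / 10 ** (snr_db / 10).
    """
    band_var = cube.reshape(-1, cube.shape[-1]).var(axis=0)
    noise_std = np.sqrt(band_var / 10.0 ** (snr_db / 10.0))
    return cube + rng.normal(size=cube.shape) * noise_std   # broadcast over the band axis
```

A typical call would be `add_band_noise(cube, snr_db=10, rng=np.random.default_rng(0))`; at 0 dB the noise variance equals the background variance in every band.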

Temporal Detection Performance under Different Settings of the Temporal Background Dictionary
For the KCSR-based temporal detection, the parameter ν can be set to the same value as in spatial detection, which was analyzed in Section 3.3. Moreover, because the spatial and temporal detection share the same background spectra, the kernel space used in spatial detection is also suitable for temporal detection. Therefore, after the parameters of spatial detection are adjusted, the settings to be adjusted in the temporal detection are $N_D$ and $N_C$, denoting the sizes of the temporal background dictionary and its candidate set, respectively. To further explain $N_D$ and $N_C$, we define the number of removed atoms as:

$$N_R = N_C - N_D$$

The purpose of the candidate set is to protect the background dictionary from target contamination, and $N_R$ should ensure that most of the abnormal spectra can be removed from the candidate set.
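The relation between the candidate set, the removed atoms, and the final dictionary can be illustrated with a short Python sketch. It assumes that each candidate spectrum of a pixel carries a spatial anomaly score and that the atoms with the highest scores are the ones discarded. This selection rule, the function name `purify_candidates`, and the array layout are assumptions for illustration and may differ from the actual purification procedure.

```python
import numpy as np


def purify_candidates(candidates, spatial_scores, n_d):
    """Keep the n_d candidate spectra with the lowest spatial anomaly scores.

    candidates     : array (N_C, bands), former spectra of one test pixel.
    spatial_scores : array (N_C,), spatial anomaly score of each candidate.
    n_d            : size of the temporal background dictionary (N_D).
    Returns the dictionary (n_d, bands) and the number of removed atoms N_R.
    """
    candidates = np.asarray(candidates, dtype=float)
    spatial_scores = np.asarray(spatial_scores, dtype=float)
    n_c = candidates.shape[0]
    if not 0 < n_d <= n_c:
        raise ValueError("n_d must satisfy 0 < n_d <= N_C")
    # Indices of the n_d lowest scores, re-sorted to preserve temporal order.
    keep = np.sort(np.argsort(spatial_scores, kind="stable")[:n_d])
    return candidates[keep], n_c - n_d
```

With 50 candidates and a dictionary of 30 atoms, for example, the function returns 20 as the number of removed atoms.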

Experiments on the Cloud Dataset
Traditional temporal profile filtering algorithms require strong prior information about the target velocity. We count the number of frames that targets take to pass through a single pixel in the Cloud dataset and draw the histogram. As shown in Figure 4, 3241 pixels are passed through by targets within 20 frames, while only 130 pixels take targets more than 20 frames to pass. The latter occurs mainly in the latter half of the sequence because the airplane is far away from the camera and moves more slowly in the imagery.
Figure 4. Histogram of the number of frames that targets take to pass through a single pixel in the Cloud dataset (horizontal axis: frame number).
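A statistic of this kind can be approximated from per-frame ground-truth masks, as in the sketch below. It simply counts, for every pixel, the total number of frames in which the pixel is covered by a target. This equals the passing time only when each pixel is crossed by a single target, so it is a simplification. The names `dwell_frames` and `count_by_dwell` and the mask layout are hypothetical.

```python
import numpy as np


def dwell_frames(target_masks):
    """Number of frames each pixel is covered by a target.

    target_masks : boolean array (frames, rows, cols), True on target pixels.
    Returns a 1-D array with one count per pixel that was covered at least once.
    """
    masks = np.asarray(target_masks, dtype=bool)
    counts = masks.sum(axis=0)
    return counts[counts > 0]


def count_by_dwell(dwell, limit):
    """Split pixels into those covered for at most `limit` frames and the rest."""
    dwell = np.asarray(dwell)
    return int((dwell <= limit).sum()), int((dwell > limit).sum())
```

The histogram itself can then be obtained with `np.bincount(dwell_frames(masks))`.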
To explore the impact of the temporal background dictionary on the temporal detection performance, we set $N_C$ to 20, 30, 50, 80, and 100 and $N_R$ to 10, 20, 30, and 40. Because the first 100 frames in the Cloud dataset are selected as the temporal background candidate set, the temporal anomaly detection starts at the 101st frame. The parameters ν, γ and the dual-window size of KCSR-S are empirically tuned to acquire the best detection capability in the first frame. The mean AUCs of KCSR-T on the Cloud dataset are shown in Table 1. When $N_C$ is set to 20, the mean AUC of KCSR-T drops to 0.970966, the worst value in the table. That is because, if the dictionary candidate set is too small, the temporal background dictionaries of some target pixels can consist mainly of target spectra. When $N_C$ is set to 50, 80, and 100, the mean AUCs of KCSR-T with $N_R = 20$ are better than those with $N_R = 10$, because the former setting removes more target spectra from the dictionary candidate set than the latter. When $N_C$ is set to 30 and 50, the mean AUCs of KCSR-T with $N_D = 10$ are worse than those with $N_D = 20$, which indicates that a small temporal background dictionary is not conducive to the representation of spectral features. Moreover, the best mean AUC in Table 1 is 0.980302, achieved when $N_C$ is 50 and $N_R$ is 20.

Experiments on the Synthetic Terrain Dataset
To explore how to set the temporal background dictionary on the synthetic Terrain dataset, we set $N_C$ to 20, 30, 40, and 50 in turn and $N_R$ to 10, 20, 30, and 40. The first 50 frames in the Terrain dataset are selected as the temporal background candidate set, and KCSR-T is performed on the last 50 frames. The parameters ν, γ and the dual-window size of KCSR-S are empirically tuned to acquire the best detection capability in the first frame. As shown in Table 2, when the background dictionary size $N_D$ is fixed to 10, the worst mean AUC is obtained with $N_R = 10$. That is because the former spectra of target pixels contain at most 8 target spectra, and a small $N_R$ is not conducive to removing them from the background dictionary candidate set. As the SNR decreases, the distinction between background and target spectra decreases, and the gaps in mean AUC between $N_R = 10$ and the other settings become larger. In addition, the best mean AUCs in the four noise conditions are achieved by $N_C = 50$ and $N_R = 20$, which are 0.999258, 0.932968, 0.819948, and 0.685078, respectively.

Detection Performance under Different Settings of the Dual-Window
As mentioned in Section 3.3, it is difficult to set the optimal dual-window size in advance for moving target detection in the spatial anomaly detection. One important reason for this is that the sizes of moving targets can change. For the Cloud dataset, as the airplane moves away from the camera, the aircraft size in the HSI becomes smaller. In Section 4.2.1, the dual-window size ($w_{in}$, $w_{out}$) of KCSR-S was set to (29,31), which is the optimal size in the first frame. However, (29,31) is too large for the aircraft in the 500th frame, which only has 21 pixels. To explore the impact of different settings of the dual-window on KCSR-S and KCSR-T, we set the dual-window size to (3,5), (9,11), (13,15), (19,21), (23,25), and (29,31), respectively. Considering that the iterative smoothing filter can improve the spatial detection map of KCSR-S, KCSR-T uses the original spatial detection results instead of smoothed results to select the temporal background dictionary in this subsection.

As shown in Table 3, the dual-window size has a significant influence on the detection capability of KCSR-S. The best mean AUC of KCSR-S in the 101st-500th frames is 0.970251, while the worst mean AUC is 0.961412. However, the mean AUC of KCSR-T is better than that of KCSR-S under all dual-window sizes and fluctuates in a small range from 0.979962 to 0.980175. To give a more intuitive representation, we fit the variation curves of AUC with time in the 101st-500th frames by a polynomial with a highest power of 15. As shown in Figure 5a, with the change of the aircraft size, the optimal dual-window size also changes over time. Around the 200th, 300th, 450th, and 480th frames, the optimal size is (23,25), (19,21), (13,15), and (9,11), respectively.
Although the fitted curve with a dual-window size of (29,31) performs well at the beginning of the sequence, the gap between the curve and the best AUC increases over time. However, the AUC of KCSR-T is almost unaffected by the dual-window size of KCSR-S. Compared to Figure 5a, the curves with different dual-window sizes in Figure 5b are almost the same. There are two reasons why KCSR-T is robust to the dual-window size of KCSR-S. On the one hand, different dual-window sizes can result in different anomaly scores of target pixels in the spatial detection, and an unsuitable size can lead to lower anomaly scores. However, for the same pixel, even under unsuitable dual-window sizes, the gap between the anomaly scores inside and outside targets is still large enough to remove anomalous spectra from the candidate set of the temporal background dictionary. On the other hand, KCSR-T can also automatically remove anomalous atoms from the background dictionary during the temporal detection process. In conclusion, the proposed temporal anomaly detection is remarkably robust to the dual-window size in the spatial detection, and the combination of the spatial and temporal detection can overcome the disadvantages of the dual-window strategy.
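The trend curves of AUC over time used in this comparison can be reproduced approximately with a least-squares polynomial fit. The sketch below interprets the fitted function as an ordinary polynomial of degree 15. The function name `smooth_auc_curve` is hypothetical, and `numpy.polynomial.Polynomial.fit` is used because it rescales the frame indices internally, which keeps a high-degree fit numerically stable.

```python
import numpy as np
from numpy.polynomial import Polynomial


def smooth_auc_curve(frames, aucs, degree=15):
    """Fit a polynomial trend to per-frame AUC values and evaluate it.

    frames : 1-D array of frame indices (e.g., 101..500).
    aucs   : 1-D array of AUC values, one per frame.
    Returns the fitted trend evaluated at the given frame indices.
    """
    frames = np.asarray(frames, dtype=float)
    aucs = np.asarray(aucs, dtype=float)
    # Polynomial.fit maps the frame range to [-1, 1] before solving.
    trend = Polynomial.fit(frames, aucs, deg=degree)
    return trend(frames)
```

Plotting one such trend per dual-window size gives curves that are easier to compare than the raw per-frame AUC values.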

Comparison to the State-of-the-Art
In this subsection, the KCSR-ST algorithm is compared with several single-frame HSI anomaly detection algorithms, including RX [13], QLRX [15], KSVDD [19], KRX [16], CR [22], KCR [22], and CSR. The proposed algorithm is also compared with two detection algorithms for moving targets, VF [5] and STH [12]. For fairness, both VF and STH are based on KCSR in the following experiments, denoted by KCSR-VF and KCSR-STH, respectively. All parameters of these algorithms are empirically tuned to acquire the best detection capability at the beginning of the sequences. The dual-window sizes are set to (29,31) and (9,15) for the Cloud and Terrain datasets, respectively. $N_C$ and $N_D$ on the two datasets are set to the optimal values obtained in Section 4.2. The temporal filter weight ρ is set to 0.5, and the spatial smoothing filter is a simple 3 × 3 mean filter. The AUC performances and detection maps of KCSR-S, KCSR-SF, and KCSR-T are also shown to explore the role of each step in the proposed KCSR-ST algorithm.

Experiments on the Cloud Dataset
The ROC curves obtained on the Cloud dataset are shown in Figure 6; the AUC values are shown in Table 4; and the color detection maps are shown in Figure 7. These all illustrate that KCSR-ST is superior to all single-frame and multiple-frame anomaly detection algorithms.

As shown in Table 4, the best AUC value among single-frame anomaly detection algorithms is 0.9649, achieved by KCSR. Taking advantage of temporal information, the AUC values of KCSR-VF, KCSR-STH, and KCSR-T are all higher than those of the single-frame algorithms. The reason for this phenomenon can be explained by Figure 7. As shown in Figure 7a, obvious vignetting exists at the edges of the false color images. Vignetting is a common phenomenon in photography, but it turns the edges of HSIs into heterogeneous background pixels. Therefore, there is always a relatively large number of false alarms at the edges of detection maps obtained by single-frame algorithms, as shown in Figure 7c-j. Because KCSR-VF and KCSR-T make use of the historical spatial detection results and the former spectra of test pixels, respectively, the heterogeneous background pixels rarely lead to false alarms in the corresponding detection maps, as shown in Figure 7k,n. However, the historical trajectory of Target B turns into false alarms in the detection map of KCSR-VF in the 400th frame. That is because the VF algorithm is mainly designed to detect slow targets, and the parameter setting of VF depends on the speed of targets. Because Targets B, C, and D all move faster than 5 pixels per frame and pass through a pixel within a single frame, the temporal variance-calculation window suitable for Target A is too long for them. As long as the temporal variance-calculation window contains the trajectory of Targets B, C, and D, the detection results can have high values and become false alarms in the VF detection map. Moreover, KCSR-STH combines KCSR-VF with other spatial detection maps and is slightly affected by these false alarms, as shown in Figure 7l.
As shown in Figure 7j,n, there is much background clutter on the detection maps of KCSR-S and KCSR-T. Compared to KCSR-S, the spatial detection map after the iterative smoothing filter, KCSR-SF, suppresses the background clutter and enhances the target. However, false alarms resulting from the heterogeneous background are also enhanced in Figure 7m. KCSR-ST combines the smoothed spatial detection map (KCSR-SF) with the smoothed temporal detection map, and the heterogeneous background and the background clutter are entirely suppressed in Figure 7o. As shown in Figure 6a, the ROC curve of KCSR-ST lies to the upper left of those of the other algorithms, which indicates that KCSR-ST is superior to the single-frame and multi-frame anomaly detection algorithms. However, when $P_f$ is limited to an extremely low value range, the detection performance of KCSR-ST is inferior to that of KCSR-T. As shown in Figure 6b, when $P_f$ is $10^{-5}$, the $P_d$ of KCSR-T is about 0.75, while that of KCSR-ST is only about 0.35. Furthermore, when $P_f$ is smaller than $10^{-5}$, the ROC curve of KCSR-S outperforms that of KCSR-SF. This is because the iterative smoothing filter enhances both target pixels and the pixels around targets. Compared to KCSR-S and KCSR-T, KCSR-SF and KCSR-ST blur the boundary between target and background in the detection maps. However, the iterative smoothing filter can still be regarded as a useful strategy. Although it reduces $P_d$ when $P_f$ is low, the enhancement improves the ability to detect slow targets and the robustness to different target speeds. For most hyperspectral anomaly detection scenarios, the focus is on whether the target exists rather than on the shape of the target. The false alarms that result from the enhancement of pixels around the target have little influence on the judgment of whether the target exists. Besides, the enhancement from the iterative smoothing filter can be optimized by adjusting the filter weights or changing the smoothing strategy.
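Operating points at a very low false alarm rate, such as those read from Figure 6b, can also be computed directly from a detection map rather than read off a plotted curve. The following sketch returns the detection probability at the largest false alarm rate that does not exceed a requested value. The function name `pd_at_pf` and the strict `>` decision rule are assumptions of this sketch.

```python
import numpy as np


def pd_at_pf(score_map, truth_mask, target_pf):
    """Detection probability at a false alarm rate not exceeding target_pf.

    score_map  : array of anomaly scores (any shape).
    truth_mask : boolean array of the same shape, True on target pixels.
    target_pf  : requested false alarm rate in [0, 1].
    """
    scores = np.asarray(score_map, dtype=float).ravel()
    truth = np.asarray(truth_mask, dtype=bool).ravel()
    background = np.sort(scores[~truth])[::-1]        # descending order
    # Number of background pixels that are allowed to exceed the threshold.
    allowed = int(np.floor(target_pf * background.size))
    if allowed < background.size:
        threshold = background[allowed]
    else:
        threshold = -np.inf                           # every pixel is detected
    return float((scores[truth] > threshold).mean())
```

Note that a false alarm rate of $10^{-5}$ can only be resolved when at least $10^{5}$ background pixels are available, so scores from many frames would have to be pooled before calling the function.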

Experiments on the Terrain Dataset
The ROC curves achieved on the synthetic Terrain dataset under different noise environments are shown in Figure 8; the color detection maps are shown in Figures 9-11; and the AUC results are shown in Table 5. Our proposed KCSR-ST algorithm is considerably robust to noise and superior to all single-frame anomaly detection algorithms. When the SNR is set to 20 dB, 10 dB, 5 dB, and 0 dB, the corresponding mean AUC of KCSR-ST is 0.9996, 0.9959, 0.9461, and 0.7516, respectively, whereas the best mean AUC among single-frame algorithms is 0.8402, 0.7438, 0.7057, and 0.6205, respectively. As shown in Figure 9c-j, Figure 10c-j, and Figure 11c-j, there are a large number of false alarms on the detection maps of single-frame algorithms because some trees are sparsely distributed in the scene.

Although KCSR-VF and KCSR-STH are also superior to single-frame algorithms, their detection performance is far inferior to that of KCSR-ST on the Terrain dataset. As shown in Figure 9k, the trajectory of targets results in false alarms on the detection map of KCSR-VF. That is because the targets share the same trajectory, and the baseline background of VF cannot be estimated accurately. KCSR-STH combines KCSR-VF, KCSR-S, and its temporal detection and suppresses the background and false alarms. However, because the temporal detection of STH extracts the background dictionary of the test pixel in the forward frame with the same dual-window as KCSR-S, the false alarms resulting from sparse trees remain on the temporal detection map of KCSR-STH and then appear in the final fusion map, i.e., Figure 9l.
As shown in Figure 9n, when the SNR is 20 dB, KCSR-T has an excellent ability to detect moving targets. Although the mean AUC of KCSR-T is slightly lower than that of KCSR-ST, the ROC performance of KCSR-T is better than that of KCSR-ST when $P_f$ is smaller than $10^{-3}$. As shown in Figure 8a, when $P_f$ is $10^{-5}$, the $P_d$ of KCSR-T is about 0.98, while the $P_d$ values of all the other algorithms are lower than 0.25. When the SNR is 10 dB, the mean AUC of KCSR-T is much lower than that of KCSR-ST, and the ROC performance of KCSR-T is also inferior to that of KCSR-ST. That is because the target abundances of the peripheral target pixels are lower, so these pixels cannot be detected by KCSR-T, as shown in Figure 10n. By employing the iterative smoothing filter, KCSR-ST enhances the anomaly scores of pixels around targets and therefore performs prominently under the ROC and AUC evaluation metrics. When the SNR drops to 5 dB, there is much noise clutter on the detection map of KCSR-T, i.e., Figure 11n, and the mean AUC of KCSR-T decreases to 0.8199. As shown in Figure 8c, the ROC curves of single-frame algorithms are close to the diagonal, which means that single-frame algorithms detect the moving targets very poorly. KCSR-ST can effectively suppress the noise clutter and false alarms on the detection map, as shown in Figure 11o, and its ROC curve is much better than the other curves in Figure 8c. Even when the SNR drops to 0 dB, KCSR-ST can still detect targets. As shown in Figure 8d, the ROC curves of the other algorithms are around the diagonal, while the $P_d$ of KCSR-ST can reach 0.6 when $P_f$ is 0.1.

Conclusions
In traditional single-frame anomaly detection, false alarms on stationary objects and non-homogeneous backgrounds are unavoidable. Besides, detecting targets in complex motion is still a challenge for multi-frame algorithms. In this article, a constrained sparse representation-based spatio-temporal AD algorithm is proposed to identify small and dim moving targets in hyperspectral sequences and to overcome the aforementioned drawbacks. Our algorithm includes a spatial detector and a temporal detector. The former can suppress moving background regions, and the latter can suppress non-homogeneous background and stationary objects. Moreover, two temporal background purification procedures ensure the effectiveness of the temporal detector for targets in complex motion. Experiments conducted on the Cloud dataset and the synthetic Terrain dataset indicate that our algorithm is superior to other classic detection algorithms. Even when the noise clutter is extreme, our algorithm can still suppress the clutter and effectively detect small and dim moving targets.
Our algorithm provides a novel spatio-temporal anomaly detection framework for hyperspectral remote sensing. In addition, adaptive anomaly elimination in the temporal background is a promising approach for detecting targets in complex motion. However, the proposed algorithm needs accurate frame registration and places heavy demands on data storage. Besides, the iterative smoothing filter can effectively suppress background clutter, but blurs the boundary between the target and the background. In future work, we will focus on reducing the algorithm's need for inter-frame matching and data storage and on improving the iterative smoothing filter by introducing edge-preserving filters. Furthermore, the proposed algorithm can be combined with target tracking, state estimation, and trajectory prediction to provide motion information about targets.

Acknowledgments: Thanks to Wang of Beihang University for providing the Cloud dataset.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: