A Sliding Window-Based Joint Sparse Representation (SWJSR) Method for Hyperspectral Anomaly Detection

Abstract: In this paper, a new sliding window-based joint sparse representation (SWJSR) anomaly detector for hyperspectral data is proposed. The main contribution is to improve the judgment about the probability of anomaly presence in each signal by integrating the information gathered while a sliding window passes over each pixel. Through the transition of this sliding window, each pixel occupies different spatial positions with respect to its spatial neighbors. In each position, an optimized local background dictionary is formed using the K-Singular Value Decomposition (K-SVD) algorithm, and the recovery error of the sparse estimation of each pixel is calculated using the simultaneous orthogonal matching pursuit (SOMP) algorithm. Thus, the votes of each signal regarding anomaly presence in each spatial neighborhood are calculated, and the variance of these recovery errors is used as the detection criterion. In experiments on both synthetic and real datasets, the proposed SWJSR method outperformed the Global RX (GRX), Local RX (LRX), Collaborative Representation Detector (CRD), Background Joint Sparse Representation (BJSR), Causal RX (CR-RXD and CK-RXD), and Sliding Local RX (SLRX) detectors, with average efficiency improvements of about 7.5%, 14.25%, 8.2%, 8.25%, 6.45%, 6.5%, and 3.6%, respectively.


Introduction
Today, hyperspectral imaging has become a powerful tool in remote sensing applications. It provides valuable data acquired from hundreds of narrow spectral bands across the reflective electromagnetic spectrum, which makes it possible to distinguish different materials based on their unique spectral responses [1]. Target detection and classification can be considered the most important information extraction approaches in hyperspectral data interpretation [2][3][4]. Target detection algorithms fall into supervised and unsupervised categories [4]. In the former, the spectral signatures of the targets are used in the detection algorithm, whereas in the latter no prior knowledge is available about the spectral characteristics of the targets, and only the detection of spectral anomalies is on the agenda [5]. In fact, anomaly detection can be considered an unsupervised classification with two classes (anomaly and background) [6]. Thus, anomalies are unknown targets that differ significantly from their neighboring samples and have a low probability of occurrence. Detection of these differences is independent of the spectral signatures of the targets and, thus, of the parameters that affect them, including environmental and atmospheric conditions [7]. Targets in remote sensing applications such as search and rescue [8], detection of military vehicles and objects [9], detection of rare minerals in geology, recognition of vegetation stress [10], toxic wastes in environmental monitoring, and tumors in medical imaging can be considered spectral anomalies that can be detected via hyperspectral anomaly detection algorithms.
The methods developed in the field of anomaly detection can be classified in two ways. The first classification distinguishes local and global methods. In global methods, the criterion for judging anomaly presence in each pixel is generated from all of the signals recorded in the hyperspectral image [11]. In local methods, only the spatial neighbors of each signal are used for this purpose. Whether the hyperspectral data are assumed to comply with the normal distribution in the feature space leads to the second classification. Parametric methods, such as those that use the covariance/correlation matrix, assume that the background data follow a normal distribution. In contrast, methods based on linear unmixing or sparse representation make no assumption about the statistical distribution of the hyperspectral data.
The Reed-Xiaoli (RX) method [12] is known as the traditional benchmark among hyperspectral anomaly detection algorithms. Its idea has been the basis for the development of similar methods, used in both local and global strategies, such as normalized RX, modified RX, causal RX [13,14], weighted RX [15], RX-UTD, and the Adaptive Causal Anomaly Detector (ACAD) [16]. The main assumption of these algorithms is that the hyperspectral data follow a multivariate normal distribution. Thus, anomalous signals are assumed to lie at a larger Mahalanobis distance from the centroid of the data. Although this seems reasonable in homogeneous regions, it is not suitable for representing the background signals when the data do not follow a Gaussian distribution. In this regard, modified versions of RX, such as the Kernel-RX algorithm [17], were proposed to overcome the flaws of the RX background assumption. This method attempts to bring the data closer to a Gaussian distribution by mapping the signals into a higher-dimensional feature space using non-linear kernels. Because they use the covariance/correlation matrix of the sampled data, RX-based methods are categorized as parametric algorithms.
Another algorithm developed to detect anomalies in hyperspectral data is the Dual Window-based Eigen Separation Transform (DWEST) algorithm [18]. Based on the linear transformation of EST, this algorithm is designed to maximize the separation between two classes in low-dimensional subspaces by using local windows [19]. The Nested Spatial Window-based Target Detector (NSWTD) is another anomaly detection algorithm [20]. In this algorithm, similar to DWEST, nested spatial windows with pre-defined sizes are used as inner, middle, and outer windows. The differences between the spectral features of these windows are evaluated using the Orthogonal Projection Divergence (OPD). Liu and co-workers extended the concept of DWEST to propose a new approach, called multiple-window anomaly detection (MWAD), which uses multiple windows to perform anomaly detection adaptively. This method is able to detect anomalies of various sizes because local spectral variations can be characterized and extracted by different window sizes [21]. Chang and co-workers proposed an anomaly detection method using causal sliding windows, which has real-time capability. They suggested three types of causal windows: causal sliding square matrix windows, causal sliding rectangular matrix windows, and causal sliding array windows. In this method, a causal sample covariance/correlation matrix can be derived for causal anomaly detection. The detectors that use the covariance matrix and the correlation matrix are called CK-RXD and CR-RXD, respectively. They also proposed a recursive update equation to speed up real-time processing [22]. Moreover, Li and co-workers introduced another method, named the CRD algorithm [23]. The main assumption in this method is that the background can be estimated precisely using the neighboring pixels. This assumption does not hold for anomalous signals, for which a high residual occurs. Therefore, in this method the $l_2$-norm of the residuals of the estimated signals is considered as the anomaly detection map. In other words, this detector locally estimates the background using a dynamic dual-window structure, and the estimation error vector of the signal located at the center of the window is then taken as the criterion for the probability of anomaly presence in that signal. The idea of recovering background signals using the bases of the background subspace, and of using the same bases to recover the anomalous signals, is the most important innovative aspect of this detector. Yuan and co-workers proposed a novel method for fast and accurate hyperspectral anomaly detection, called 2DCAD [24]. In this method, a high-order two-dimensional (2-D) crossing approach is proposed to find the regions of rapid change in the spectrum, and it runs without any a priori assumption. This method has a low-complexity discrimination framework that can be implemented by a series of filtering operators with linear time cost. It also has the ability to detect at the true pixel level for real-time applications. In addition, Yuan and co-workers proposed a graph-based method for anomaly detection without any assumptions about the background distribution statistics [25]. In this method, after the construction of a vertex- and edge-weighted graph, a pixel selection process is used to locate the anomalies. The philosophy behind this method is that anomalies tend to be picked out more easily than background pixels in the constructed graph, because an anomalous pixel generally deviates from the background and its distinctiveness makes its connections with background pixels vulnerable. This method has good robustness to noise and adaptability to window sizes, which makes it more applicable in real situations.
Recently, methods based on sparse representation theory have been introduced and successfully accepted as a strong tool for anomaly/target detection [26]. The main objective of these techniques is to recover the majority of high-dimensional signals from a low-dimensional subspace through a dictionary of normalized signals (atoms). In the sparse estimation of each signal, only a limited number of dictionary atoms are active, and the majority of the coefficients related to the dictionary atoms are zero [27]. In other words, signals are recovered as a linear mixture of dictionary atoms weighted by the sparse coefficients.
With sparse representation techniques, targets and anomalies can be detected using two different approaches. In the target detection approach, the main step is the creation of a dictionary containing background and target spectra. In other words, proper background modeling results in an efficient estimate of the presence of spectral targets [5]. In this regard, Chen and co-workers [28] defined one dictionary of the targets of interest using their spectral signatures and another dictionary of the local background signals. Subsequently, these two dictionaries are used to decide whether a pixel is a target or background. This decision is made through the sparse estimation of each pixel with the target and background dictionaries while considering the difference between the recovery errors. Furthermore, Du and co-workers [29] presented a target detection algorithm that integrates statistical methods and sparse representation, the Hybrid Sparsity and Statistic Detector (HSSD). The primary assumption in this method is that the pixel of interest follows the Gaussian normal distribution with the same covariance and different variance under the two statistical hypotheses of being or not being a target. To achieve efficient detection, the probable target pixels are removed from the background dictionary using the Spectral Angle Mapper (SAM) algorithm based on the initial target spectral signatures. Then, in an iterative process, the sparse estimation is performed by the Orthogonal Matching Pursuit (OMP) method in two stages: (1) with the dictionary that includes only the background data; and (2) with the integrated dictionary of target and background data. Finally, the difference between the recovery errors of the pixel in these two stages is compared with a pre-determined threshold to decide whether the pixel is a target or background.
In anomaly detection methods, where no prior knowledge about the spectral targets is available, the plan is to build a dictionary of atoms that exclusively models the background elements [30]. In other words, a dictionary composed of bases that denote the background subspace enables the precise recovery of background signals. In contrast, anomalous signals, which are assumed to deviate from the background subspace, will not be estimated precisely by the background dictionary. The main idea of anomaly detection methods based on the sparse representation of signals is therefore to evaluate the recovery errors of signals with a dictionary that describes the background subspace. Removing the atoms that describe anomalies from the background dictionary can be considered one of the essential actions in this procedure [31]. In this field, Yuan and co-workers [32] presented a new method for anomaly detection in hyperspectral images by introducing a spatial-spectral evaluation index, called the Local Sparsity Divergence (LSD), in which the elements of the sparse matrix are estimated locally according to the dimensions of the search window. Lee et al. [33] suggested Background Joint Sparse Representation (BJSR) for anomaly detection, which estimates the background locally using a limited number of subspaces extracted from the hyperspectral data through sparse coding. Zhao and co-workers [34] presented the Sparsity Score Estimation Anomaly Detector (SSEAD) algorithm for the same purpose. In this method, anomalies are detected with an index based on how frequently each atom participates in the dictionary learning process that estimates the background atoms. In this way, the estimation of the background is optimized through an iterative process. Moreover, by optimizing the weights of the atoms that form the background, each pixel in the hyperspectral data is scored and a decision is made as to whether it is an anomaly or background. Zhang and co-workers [35] introduced the LLTSA-SSBJSR method as an extension of the BJSR method. This method first uses the spectral space to identify anomalies, and then spatial analysis is performed on the data after dimensionality reduction by the LLTSA method. In addition, Ma and co-workers proposed a novel anomaly detection method based on sparse dictionary learning with a capped norm constraint and a sliding dual window, named SDLCN [36]. In this method, a number of patches of the same size are randomly selected from the entire image and stacked as training data to construct the background dictionary. A capped $l_1$-norm based loss function is then used to suppress the effects of anomalies in the training set, which yields a dictionary that is more resistant to anomalies. After the optimized background dictionary is learned, the sparse representation coefficient matrix is computed and the reconstruction errors are calculated, which can be regarded as the corresponding anomaly probability values.
Focusing on local anomaly detection algorithms, all of these methods assume spatial symmetry of the background elements when judging a signal. Thus, each pixel of the hyperspectral image is tested only once for anomaly presence. In such situations, if the anomalous pixels are near the edges of the image, the probability of false detection increases and background signals might be considered anomalies. Due to the lack of prior knowledge about the spatial distribution of similar signals in a geographic area, a voting-based approach with a diverse definition of the neighborhood could be a good solution. Accordingly, in this research, creating diversity in the definition of the spatial neighborhoods of spectral signals and voting-based judgment over the different spatial situations are proposed as two approaches to confront this challenge. In other words, the most important aspect of this study is to improve the judgment about the probability of anomaly presence in signals by diversifying the definition of the spatial neighborhood of their surrounding area. To this end, by designing an optimized local dictionary based on a sliding window with a new structure, the votes of each signal regarding anomaly presence in each spatial neighborhood are calculated with the aim of achieving a better judgment.

Dictionary Learning and Joint Sparse Coding
In sparse coding techniques, the b-dimensional signals ($[s]_{b\times 1}$) are mapped to a low-dimensional subspace through a dictionary of atoms [37]. When considering $[D]_{b\times n} = [\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_n]$ as a dictionary of unit-length atoms ($\lVert \vec{d}_i \rVert_2 = 1$), where b << n, the aim of the sparse estimation of a signal is to find the sparse vector $[\alpha]_{n\times 1}$ by solving the under-determined system of equations presented in Equation (1) [38]:

$$\hat{\alpha} = \arg\min_{\alpha} \lVert \alpha \rVert_0 \quad \text{subject to} \quad s = D\alpha \qquad (1)$$

where $\lVert \cdot \rVert_0$ indicates the $L_0$-norm, which is equivalent to the number of non-zero elements of $\hat{\alpha}$. Since there is no explicit method to solve this system of equations, greedy algorithms [39] are used as the general approach to estimate $\hat{\alpha}$. OMP [40] and Simultaneous Orthogonal Matching Pursuit (SOMP) [41] are two common techniques for the greedy sparse estimation of signals using dictionaries.
In the OMP algorithm, the sparse vector is estimated individually for a single signal, while in the SOMP algorithm it is estimated simultaneously for several signals. Both algorithms iteratively search for the atoms that describe the signals so as to satisfy the conditions in Equation (1) [42]. In each iteration of these two techniques, the dictionary atom that has the minimum spectral angle with the estimation error of the signal(s) is added to the set of previously selected atoms (active atoms). In this procedure, the similarity of each atom to the residual vector(s) of the signals estimated using the previously spanned subspace is the criterion for choosing new atoms. In other words, when considering R as the vector/matrix of the estimation error obtained from the previously activated atoms (Equation (2)), in each iteration the atom that maximizes $\lVert R^{T}\vec{d}_k \rVert$ (k = 1, 2, ..., n) is added as the new active atom to the set of previously activated atoms of the dictionary (D):

$$R = S - DA \qquad (2)$$

Here, when the OMP technique is used, S is a vector containing a single signal, $S = [s]_{b\times 1}$, and when the SOMP technique is used, S is a matrix whose columns are all of the signals to be estimated simultaneously. In the same way, A is a single vector $[\alpha]_{n\times 1}$ in the OMP technique and a matrix with one such column per signal in the SOMP technique, where all columns share the same zero rows. Notably, in the first iteration R is set to S (R = S) to select the first atom.
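To make the greedy selection concrete, the following NumPy sketch implements a basic SOMP loop. It is an illustration under stated assumptions rather than the implementation used in this work: the function name `somp`, the fixed number of atoms `n_atoms` used as the stopping rule, and the use of the $l_2$ norm of each row of $D^{T}R$ as the selection score are hypothetical choices that the text does not specify. With a single-column S, the same loop reduces to OMP.

```python
import numpy as np

def somp(D, S, n_atoms):
    """Greedy simultaneous sparse coding of the columns of S over dictionary D.

    D: (b, n) dictionary with unit-length columns (atoms).
    S: (b, t) signals, or a single (b,) signal, which reduces this to OMP.
    n_atoms: number of atoms to activate (hypothetical stopping rule).
    Returns the (n, t) coefficient matrix A and the (b, t) residual R.
    """
    S = S.reshape(S.shape[0], -1)          # accept a single vector as well
    R = S.copy()                           # first iteration: R = S
    support = []
    coef = np.zeros((0, S.shape[1]))
    for _ in range(n_atoms):
        # Similarity of every atom to the current residuals (assumed l2 row norm).
        scores = np.linalg.norm(D.T @ R, axis=1)
        if support:
            scores[support] = -np.inf      # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares fit of all signals on the active atoms.
        coef, *_ = np.linalg.lstsq(D[:, support], S, rcond=None)
        R = S - D[:, support] @ coef       # residual as in Equation (2)
    A = np.zeros((D.shape[1], S.shape[1]))
    A[support, :] = coef                   # all columns share the same non-zero rows
    return A, R
```

The per-signal recovery error is then `np.linalg.norm(R, axis=0)`, one value per column of S.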
In sparse coding procedures, the sparse recovery of signals is performed by creating a redundant dictionary from the probable endmembers in the feature space. The effective performance of a dictionary depends on the correct orientation of its atoms in the feature space, and these bases may be missing from the input data when the corresponding endmembers are absent from the imaged scene. The direct use of sampled signals and the learning of dictionary atoms are the two main approaches to dictionary generation. In the first approach, if all sampled signals are chosen, the sparse estimation of each signal reduces to a minimum distance classification process, and the $L_0$-norm of the sparse estimation vector ($\hat{\alpha}$) of each signal will be one. Choosing only a part of the sampled signals faces probable problems, such as (1) the occurrence of the minimum distance classification phenomenon ($\lVert \hat{\alpha} \rVert_0 = 1$) for the chosen signals, and (2) the possibility that the dictionary atoms cannot span the signal subspace.
In anomaly detection applications that use sparse coding methods, it is critical to have a dictionary whose atoms are capable of spanning the space formed by the background signals. In other words, because the sparse estimation error of a signal with the background dictionary is the measure of whether that signal is an anomaly, the bases of the background subspace must be extracted correctly and be present in the dictionary. Due to the limitations of using randomly selected signals to form the background dictionary (given the designed structure of the proposed anomaly detection algorithm), this research learns the dictionary atoms so that they match the bases that can correctly recover the space of the background signals.
The K-SVD technique [43], one of the dictionary learning methods, creates the initial dictionary by randomly choosing a percentage of the sampled signals and, during an iterative process, converges its atoms to the spanning bases of the subspace of all input signals. In each iteration of the K-SVD algorithm, after the sparse estimation of all signals with the OMP technique, the effect of removing each atom on the estimation error vectors of the signals that use that atom is evaluated. The main idea of this technique is to correct the direction of the specified atom toward the dominant direction of the estimation error vectors of those signals. To this aim, the singular vector corresponding to the maximum singular value obtained from the singular value decomposition of the residual matrix is chosen as the substitute for the specified atom. This iterative procedure continues until the directions of all dictionary atoms stabilize.
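The atom update described above can be sketched in a few lines of NumPy. This is a minimal illustration of a single K-SVD atom update, not the authors' code: the function name `ksvd_update_atom` and its argument layout are hypothetical, and a full K-SVD loop would alternate this update over all atoms with a sparse coding pass such as OMP.

```python
import numpy as np

def ksvd_update_atom(D, A, S, k):
    """Update atom k of D (and its coefficients in A) in place, K-SVD style.

    D: (b, n) dictionary, A: (n, t) sparse codes, S: (b, t) signals.
    """
    users = np.flatnonzero(A[k, :])        # signals whose code uses atom k
    if users.size == 0:
        return D, A                        # unused atom: leave it unchanged
    # Residual of those signals when the contribution of atom k is removed.
    E = S[:, users] - D @ A[:, users] + np.outer(D[:, k], A[k, users])
    U, sing, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                      # dominant left singular vector
    A[k, users] = sing[0] * Vt[0, :]       # matching coefficients
    return D, A
```

Because the rank-one SVD term is the best rank-one approximation of the residual matrix, this update cannot increase the total reconstruction error of the signals that use atom k, and the new atom has unit length by construction.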

Methodology
In general, as mentioned before, there are two main strategies, global and local, in the field of anomaly detection for hyperspectral images, and all of the methods developed so far can be placed in one of these categories. Our proposed method is a local strategy that evaluates the probability of anomaly existence using various spatial neighborhood conditions around each pixel of interest (PoI). This is performed through the transition of a sliding window with a pre-defined size around the PoI. In other words, considering the location of each PoI in the input image (e.g., Figure 1), the PoI experiences different spatial positions with respect to its spatial neighbors through the transition of the sliding window. In each position, the PoI is investigated for the presence or absence of an anomaly. Finally, the fusion of the results obtained for each PoI during its presence in the sliding window generates the anomaly detection criterion. According to Figure 1, m is the number of elements of the sliding window. Thus, each PoI can be placed in m different positions with respect to the sliding window, and in each position the corresponding index $f_i$, i = 1, 2, ..., m, is calculated, as described in Equation (3). In other words, through a complete transition of the sliding window over each PoI, m different positions of the PoI occur in the sliding window ($w_i$, i = 1, 2, 3, ..., m). Finally, for each PoI, a feature vector composed of m members ($F_{PoI}$) is calculated, and the variance of its elements is used as the index of the anomaly detector. The main reason for proposing this idea is the assumption of spatial asymmetry of the background elements around probable spectral anomalies. Prior to this idea, all local anomaly detection methods have assumed spatial symmetry of the background elements around the pixels of interest in the detection process. Furthermore, each pixel of the image is evaluated for the presence of an anomaly only one time, in its symmetric neighboring region. Accordingly, the proposed solution includes two main contributions: (1) the ability to judge each PoI with a variety of spatial neighborhoods; and (2) the capability to combine the knowledge obtained through the transition of the sliding window for each PoI.
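As a small illustration of how one PoI comes to occupy m positions, the sketch below enumerates every window placement that contains a given pixel. It assumes a square window of side `win` (so that m = win × win) and simply skips placements that would cross the image border; both choices, like the function name `window_origins`, are assumptions made for the example, since the text does not state the window shape or the border handling.

```python
def window_origins(row, col, win, n_rows, n_cols):
    """Top-left corners of every win x win window that contains pixel (row, col).

    Windows that would extend past the image border are skipped (an assumption;
    the text does not say how borders are handled).
    """
    origins = []
    for dr in range(win):
        for dc in range(win):
            r0, c0 = row - dr, col - dc    # PoI sits at offset (dr, dc) in the window
            if 0 <= r0 and r0 + win <= n_rows and 0 <= c0 and c0 + win <= n_cols:
                origins.append((r0, c0))
    return origins
```

For an interior pixel this yields all m placements, whereas a pixel near the border is covered by fewer windows under this border rule.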
The presentation of the proposed algorithm focuses on the estimation of the $f_i$ index for each PoI at the $w_i$ location of the sliding window. Generalizing this process to the other $w_i$ positions of each PoI yields its m-dimensional $F_{PoI}$ vector.
The main idea of the proposed method for generating the anomaly index is the simultaneous sparse representation of all pixels that occur in each $w_i$ using a unique set of local background atoms from a learned dictionary. Through this process, it is expected that the simultaneous estimation of all available signals in $w_i$ imposes the selection of the descriptive background atoms. Consequently, an increase in the $l_2$-norm of the estimated residual of a signal can be interpreted as its level of anomaly. In this procedure, the randomly selected signals used in dictionary learning are initially refined with the traditional RXD anomaly detector by excluding the highly probable anomalous signals. Figure 2 illustrates the process of $f_i$ estimation in one occurrence of $w_i$.
Here, a spatial subset of the hyperspectral image that coincides with the position of $w_i$ is depicted, where the PoI inside $w_i$ is shown as a red pixel (parts a, b, and c). Obviously, the transition of $w_i$ changes the location of the PoI within it. According to Figure 2, as the first step, the potentially anomalous signals are removed using the traditional RXD by applying a proper threshold (Th-Plane). This threshold is set to twice the average (2μ) of the RXD map. Next, the signals with values higher than Th-Plane are omitted from the random selection process of dictionary learning (parts c and d). The aim of this process is an initial refinement of the dictionary atoms, so that the initial dictionary atoms ($D_{w_i}$) are more descriptive of the background signals.
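The pre-screening step can be sketched as follows. The example computes a Mahalanobis-distance RX score for the signals it is given and keeps those whose score does not exceed twice the mean score. The function names `rx_scores` and `background_candidates` are hypothetical, and two points are assumptions of the sketch rather than statements from the text: the statistics are estimated from exactly the signals passed in, and a pseudo-inverse is used so that the example still runs when the covariance matrix is singular.

```python
import numpy as np

def rx_scores(S):
    """Mahalanobis-distance RX score of each column of S (b bands x m pixels)."""
    centered = S - S.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / S.shape[1]
    cov_inv = np.linalg.pinv(cov)          # pseudo-inverse guards against a singular covariance
    # Quadratic form x_j^T C^{-1} x_j for every pixel j.
    return np.einsum("ij,ik,kj->j", centered, cov_inv, centered)

def background_candidates(S):
    """Indices of pixels kept for dictionary learning: RX score <= 2 * mean score."""
    scores = rx_scores(S)
    return np.flatnonzero(scores <= 2.0 * scores.mean())
```

The initial atoms would then be drawn at random from the columns returned by `background_candidates`.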
In Figure 2, the matrix $[S_{w_i}]_{b\times m}$ (b is the number of hyperspectral image bands) contains the constituent signals of $w_i$, where the green columns indicate the possible anomalous signals detected using RXD. The black columns are the randomly chosen candidates for the initial atoms in the process of dictionary learning ($D_{w_i}$). The number of signals used in dictionary learning (K) is equal to 'q' percent of the $S_{w_i}$ signals that remain after the probably anomalous signals are omitted. As a result, the matrix $D_{w_i}$ is constructed from the matrix $S_{w_i}$ as the initial dictionary of the dictionary learning process (part d).
To optimize the initial atoms of $D_{w_i}$, the K-SVD method [43] is used, where the OMP algorithm [41] performs the sparse coding of each signal (part e). In this method, the direction of each dictionary atom is updated in an iterative process composed of two main steps. In the first step, sparse coding of all input signals ($S_{w_i}$) is performed, and in the second step, for each selected atom, a new direction is calculated using the signals coded by that atom. This new direction is estimated through the Singular Value Decomposition (SVD) of a matrix whose columns are the residual vectors of the affected signals. In other words, the residual vectors are calculated only for the signals affected by the selected atom, and they are computed with the effect of the selected atom eliminated, that is, as if the atom were absent. Finally, the b-dimensional singular vector corresponding to the largest singular value is chosen as the substitute direction of the selected atom. As can be seen in [43], dictionary learning is thus an iterative procedure that includes two steps: (1) sparse coding of the $S_{w_i}$ signals, and (2) optimization of the directions of the $D_{w_i}$ atoms.
After $D_{w_i}$ is learned via K-SVD, a common subspace for all $S_{w_i}$ signals is found through the SOMP algorithm (part e). In this algorithm, the sparse coding of a set of signals is carried out simultaneously. This means that all of the $S_{w_i}$ signals are estimated simultaneously through the same subspace, of minimum dimension, spanned by atoms of the learned dictionary. Furthermore, the $l_2$-norm of the estimated residuals should be minimized. The aim of this process is to choose the descriptive background atoms to reconstruct the $S_{w_i}$ signals.
As discussed before, it is expected that, during this process, the background signals are estimated to be more precise than the rare and anomalous signals.When considering this expectation, after the estimation of the Swi residual vectors using SOMP, their l2-norm for all wi signals would be calculated as rj, j = 1, 2, …, m.To continue, the index fi for the PoI will be estimated by normalizing the ri corresponding the location of PoI in wi through the Equation (3): According to Figure 2, and as the first step, using the traditional RXD, the potentially anomalous signals were removed by applying a proper threshold (Th-Plane).This threshold is set to twice of the average (2µ) of RXD map.In continue, the signals having higher values than Th-Plane were omitted from randomly selection process of dictionary learning (parts c and d).The aim of this process is to perform initial refinement of dictionary atoms.Thus, the initial dictionary atoms (D wi ) would be more descriptive to model the background signals.
In Figure 2, the matrix [S wi ] b×m (b is the number of hyperspectral image bands) contains the constructive signals of wi where the green columns are indications of possible anomaly signals that are detected using RXD.The black columns are randomly-chosen candidates as initial atoms in the process of dictionary learning (D wi ).The number of signals used in dictionary learning (K) is equivalent to the 'q' percentage of the S wi signals in the case of the omitted probably-anomalous signals.As a result, the matrix D wi would be constructed from matrix S wi as the initial dictionary in the dictionary learning process (part d).
To optimize the initial atoms of D_wi, the K-SVD method [43] has been utilized, where the OMP algorithm [41] is used for the sparse coding of each signal (part e). In this method, the direction of each dictionary atom is updated in an iterative process composed of two main steps. In the first step, sparse coding of all input signals (S_wi) is performed. In the second step, a new direction is calculated for each selected atom using only the signals whose codes use that atom. This new direction is estimated through the Singular Value Decomposition (SVD) of a matrix whose columns are the residual vectors of the affected signals. In other words, the residual vectors are computed only for the signals affected by the selected atom, and they are computed with the contribution of that atom removed. Finally, the b-dimensional left singular vector corresponding to the largest singular value is chosen as the new direction of the selected atom. As described in [43], dictionary learning is therefore an iterative procedure that alternates between (1) sparse coding of the S_wi signals and (2) optimization of the directions of the D_wi atoms.
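The atom-update step described above can be illustrated with a short Python sketch. It shows a single K-SVD atom update under the usual formulation; the function name `ksvd_atom_update` and the array layout (atoms as columns of `D`, codes as rows of `X`) are assumptions for this example, and the sparse coding step is taken as already done.

```python
import numpy as np

def ksvd_atom_update(S, D, X, k):
    """Update atom k of D (b, K) and row k of the codes X (K, m) in place."""
    users = np.flatnonzero(X[k, :])        # signals whose code uses atom k
    if users.size == 0:
        return D, X                        # unused atom: leave it unchanged
    X[k, users] = 0.0                      # remove the contribution of atom k
    E = S[:, users] - D @ X[:, users]      # residuals with atom k absent
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                      # leading left singular vector
    X[k, users] = s[0] * Vt[0, :]          # matching coefficients
    return D, X
```

Because the rank-one term `s[0] * U[:, 0] * Vt[0, :]` is the best rank-one approximation of the residual matrix, the update cannot increase the reconstruction error of the affected signals.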
After D_wi has been learned via K-SVD, a common subspace for all S_wi signals is found through the SOMP algorithm (part e). In this algorithm, the sparse coding of a set of signals is carried out simultaneously. This means that all of the S_wi signals are estimated together through the same subspace, spanned by a minimal number of atoms of the learned dictionary, while the l2-norm of the estimation residuals is minimized. The aim of this process is to choose the atoms that best describe the background in order to reconstruct the S_wi signals.
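A minimal Python sketch of simultaneous orthogonal matching pursuit is given below to make the shared-support idea concrete. It is a generic greedy SOMP under stated assumptions (atoms scored by the l2-norm of their correlations with all residuals, least-squares refit at each step); the function name `somp` and its return values are hypothetical, not taken from the paper.

```python
import numpy as np

def somp(S, D, sparsity):
    """Jointly code the columns of S (b, m) with at most `sparsity` atoms of D."""
    residual = S.astype(float).copy()
    support = []
    for _ in range(sparsity):
        # Score each atom by its total correlation with all current residuals.
        scores = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            scores[support] = -np.inf      # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Refit all signals on the shared support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], S, rcond=None)
        residual = S - D[:, support] @ coef
    # One l2 residual norm per signal (the r_j values of the text).
    return support, np.linalg.norm(residual, axis=0)
```

Signals that are poorly described by the atoms shared with their neighbors keep a large residual norm, which is the behavior the detector relies on.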
As discussed before, it is expected that, during this process, the background signals are estimated more precisely than the rare and anomalous signals. With this expectation in mind, after the residual vectors of S_wi are estimated using SOMP, their l2-norms are calculated for all signals of w_i as r_j, j = 1, 2, ..., m. The index f_i for the PoI is then obtained by normalizing the r_i that corresponds to the location of the PoI in w_i, through Equation (3).

Through the transition of w_i (i = 1, 2, ..., m) over the PoI, an m-dimensional vector F_PoI is generated. Finally, its variance is taken as the criterion of the anomaly detector after the 3σ statistical test (Equation (4)). In this equation, m* is the number of f_i values remaining for each PoI after the blunder separation procedure by the 3σ test. The well-known 3σ test is a standard statistical test for removing blunders from a set of random variables (f_i). In this procedure, assuming a normal distribution of the random variables and using the mean (µ) and standard deviation (σ) of the f_i elements, the samples located within the range µ − 3σ < f_i < µ + 3σ are regarded as inliers and the samples outside this interval are considered outliers. The outlier samples are removed from the data set and the anomaly index (Equation (4)) is calculated for the remaining samples [44]. The following pseudo-code represents the general process of the proposed algorithm (Algorithm 1).
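As a complement to the description of the 3σ test, the blunder removal and variance computation can be sketched in Python. The sketch takes the normalized indices f_i as given and does not reproduce Equations (3) and (4); the function name `anomaly_index` and the use of the population variance (division by m*) are assumptions.

```python
import numpy as np

def anomaly_index(f):
    """Variance of the f_i values of one PoI after a 3-sigma blunder test."""
    f = np.asarray(f, dtype=float)
    mu, sigma = f.mean(), f.std()
    if sigma > 0:
        inliers = f[np.abs(f - mu) < 3.0 * sigma]  # mu-3s < f_i < mu+3s
    else:
        inliers = f                                # all values identical
    # Returns the detection criterion and m*, the number of retained values.
    return float(inliers.var()), int(inliers.size)
```

For example, a vector with twenty identical values and one extreme value loses the extreme value in the test, so the resulting variance is zero.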

Datasets and Pre-Processing
Three real and two synthetic datasets were used in this research. The first real dataset covers an urban and forested region of Cook city in Minnesota, USA, acquired in 2006 by a HyMap hyperspectral sensor with 126 spectral bands ranging from 370-512 µm. This dataset is freely available to the public through the Rochester Institute of Technology (RIT) and includes several targets with known spatial and spectral characteristics. It is considered a reference for the evaluation of target and anomaly detection methods. Figure 3 shows this reference data and the location of the spectral targets. The details of the spectral targets and their behavior can be found in [45].
A subset of 80 × 100 pixels from the first real dataset, containing six spectral targets, was selected for analysis in this study. The second real dataset covers an airport zone of San Diego and was collected by the AVIRIS sensor with 224 spectral bands ranging from 350-2510 nm. It was converted to a 100 × 100 × 189 hypercube after removal of the water absorption and noisy bands. This dataset contains three spectral targets (airplanes), each extending over more than a couple of pixels, which were used to evaluate the anomaly detection algorithms. Figure 4 displays the original data, the selected subset, and the spectral curves of the targets.

The third real dataset was acquired over a region of Viareggio city in Italy by the SIM.GA airborne sensor [46]. Although the original dataset has 512 spectral bands ranging from 388-994 nm, it was converted to a 100 × 100 × 123 hypercube after removal of the low Signal to Noise Ratio (SNR) bands and spectral resampling with a 4 nm interval. In the area of interest of this image, five spectral targets, each extending over more than a few pixels, were chosen to evaluate the anomaly detection algorithms. Figure 5 shows the original data, the area of interest, and the spectral curves of the targets.
In the majority of previous works, the efficiency of the developed target and anomaly detection methods has also been evaluated using synthetic data. In this research, two synthetic datasets were therefore created following two different strategies. In the first strategy, sub-pixel targets were implanted in a region near the location of the original targets in the Rochester Institute of Technology (RIT) data. Figure 6 shows the implanted targets and their spectral curves. In this way, the number of spectral targets is increased and a higher number of probable spectral anomalies has to be identified. According to Figure 6, seven spectral targets with 50-80% similarity to the original spectrum were linearly added to the hyperspectral image, so that a total of 13 potential anomaly pixels were constructed. Then, to simulate point spread function (PSF) effects, a Gaussian weighted averaging process using a 3 × 3 window around the location of each implanted target was applied.

In the second strategy, spectral destruction of original signals in the real RIT dataset was performed. A variation between ±5% and ±20% with respect to the original signals was applied to a randomly selected number of spectral bands (5-10% of the total image bands) for six candidate pixels, so that a total of 12 potential anomaly pixels were constructed. As in the first strategy, the candidate pixels were chosen locally in relatively homogeneous regions. The positions of the destructed signals, a sample destructed spectral curve, and the related original data are displayed in Figure 7.
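The second strategy can be illustrated with a small Python sketch. It is only one possible reading of the description: the function name `destruct`, the independent random sign per band, and the uniform sampling of the band fraction and variation size are assumptions, not details given in the text.

```python
import numpy as np

def destruct(spectrum, rng, band_frac=(0.05, 0.10), change=(0.05, 0.20)):
    """Perturb a random 5-10% of the bands of one spectrum by +/-5% to +/-20%."""
    out = np.array(spectrum, dtype=float)
    b = out.size
    # Number of bands to modify, drawn as a fraction of all bands.
    n = max(1, int(round(rng.uniform(*band_frac) * b)))
    bands = rng.choice(b, size=n, replace=False)
    sign = rng.choice([-1.0, 1.0], size=n)           # assumed random direction
    out[bands] *= 1.0 + sign * rng.uniform(*change, size=n)
    return out, bands
```

A call such as `destruct(pixel, np.random.default_rng(0))` returns the modified spectrum together with the indices of the altered bands, which is convenient for plotting the destructed curve against the original one.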

Results and Discussion
As mentioned before, five datasets (three real and two synthetic) were used to evaluate the results and efficiency of the proposed algorithm.
In this study, the performance of the proposed method was assessed by performing a three-dimensional (3D) ROC analysis [10,14] (Figure 8), evaluating the areas under the curves (Figure 9 and Table 1) and the background suppression criteria (Figure 10 and Table 1), and generating a target-background separation diagram [5] (Figure 11).
The traditional ROC curve is obtained by plotting the probability of correct detection (P_D) against the false alarm rate (P_FA) for different thresholds, through Equation (5):

$$P_D = \frac{N_{\text{Signal detected}}}{N_t}, \qquad P_{FA} = \frac{N_{\text{False Alarm}}}{N} \qquad (5)$$

Since the output of an anomaly detection algorithm is an image with two classes (anomaly and background), the probability of correct detection is calculated, for each threshold, as the ratio of the number of correctly detected anomaly pixels (N_Signal detected) to the total number of anomaly pixels (N_t). Similarly, the probability of wrong detection (known as the false alarm rate) is obtained, for each threshold, as the ratio of the number of background pixels wrongly placed in the anomaly class (N_False Alarm) to the number of all pixels of the image (N).
Recently, the 3-D ROC analysis, which has some advantages over the 2-D one, was developed to evaluate anomaly detection algorithms [44]. In this case, varying the value of the threshold (Th) enables users to observe the progressive changes in P_D and P_FA independently. A 3-D ROC curve is generated by considering P_D, P_FA, and the threshold (Th) as the three components of a 3-D point in a Cartesian coordinate system. In other words, it is a three-dimensional curve of (P_D, P_FA, Th), from which three different 2-D ROC curves can be generated. The 2-D ROC obtained from (P_D, P_FA) is the traditional one, and the 2-D ROCs obtained from (P_FA, Th) and (P_D, Th) are the new ones.
The 2-D ROC of (P_D, Th) represents the progressive detection power as the threshold changes, and the 2-D ROC of (P_FA, Th) provides important information about the progressive background suppression as the threshold varies, especially for visual interpretation when no ground truth data are available.
After the detection maps of each method were obtained in the different situations, this 3-D curve was plotted for 5000 different thresholds between the minimum and maximum values of the detection map, and the areas under the curves were used as a measure of efficiency.
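The threshold sweep and the three areas under the curves can be sketched in Python as follows. The sketch follows the definitions in the text (P_FA is divided by the number of all pixels N); the function names `trapz` and `roc_3d`, the min-max normalization of the detection map, and the trapezoidal integration are assumptions for illustration.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal area under y(x) for x given in ascending order."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def roc_3d(det_map, truth, n_thresholds=5000):
    """Return AUC(P_D, P_FA), AUC(P_D, Th) and AUC(P_FA, Th) for one map."""
    det = np.asarray(det_map, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    det = (det - det.min()) / (det.max() - det.min())  # thresholds in [0, 1]
    th = np.linspace(0.0, 1.0, n_thresholds)
    tgt, bkg = np.sort(det[truth]), np.sort(det[~truth])
    # Number of pixels with score >= threshold, found via binary search.
    p_d = (tgt.size - np.searchsorted(tgt, th, side="left")) / tgt.size
    p_fa = (bkg.size - np.searchsorted(bkg, th, side="left")) / det.size
    return {
        # Reverse so that P_FA ascends along the integration axis.
        "auc_pd_pfa": trapz(p_d[::-1], p_fa[::-1]),
        "auc_pd_th": trapz(p_d, th),
        "auc_pfa_th": trapz(p_fa, th),
    }
```

With this definition of P_FA, the largest attainable AUC(P_D, P_FA) is the background fraction of the image rather than exactly one, which should be kept in mind when comparing values across datasets with different numbers of anomaly pixels.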
The separability diagram is another index for evaluating the efficiency of two-class classification algorithms; it shows the statistical separation of the anomaly and background data. This diagram is generated with the help of the ground truth map and shows the range of the values recorded at the anomaly and background locations in the detection map. The degree of separation, or of overlap, between the ranges of the anomaly and background values indicates the level of success of the anomaly detection algorithm. The diagram is plotted through the following steps: (1) generating the anomaly detection maps for all of the compared methods; (2) normalizing the detection maps using the minimum and maximum over all anomaly detection maps simultaneously; (3) identifying the anomaly and background signals through a ground truth reference map; and (4) estimating the minimum and maximum anomaly and background values of each detection map in two ways: (A) without removing any of the signals, which gives the bars in the graphs; and (B) after removing the lowest 10% and the highest 10% of the background and anomaly values, which gives the reduced ranges drawn as colored boxes.
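The four steps above can be sketched in Python as follows. The function name `separability` and the interpretation of step (B) as the 10th to 90th percentile range are assumptions; plotting is left out.

```python
import numpy as np

def separability(det_maps, truth, trim=10.0):
    """Value ranges of anomaly and background pixels for several detectors.

    det_maps: dict mapping a detector name to its 2-D detection map.
    truth: boolean ground truth mask (True at anomaly pixels).
    """
    truth = np.asarray(truth, dtype=bool)
    # Step 2: one common min-max normalization over all detection maps.
    lo = min(float(np.min(m)) for m in det_maps.values())
    hi = max(float(np.max(m)) for m in det_maps.values())
    out = {}
    for name, m in det_maps.items():
        z = (np.asarray(m, dtype=float) - lo) / (hi - lo)
        out[name] = {}
        # Step 3: split values by class; step 4: full and trimmed ranges.
        for label, vals in (("anomaly", z[truth]), ("background", z[~truth])):
            box = np.percentile(vals, [trim, 100.0 - trim])
            out[name][label] = {
                "bar": (float(vals.min()), float(vals.max())),
                "box": (float(box[0]), float(box[1])),
            }
    return out
```

A detector separates the two classes well when its anomaly box lies entirely above its background box.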
To compare the results obtained from the proposed algorithm, seven other anomaly detection algorithms were also implemented. The traditional Global and Local RX algorithms [12], Causal R-RX and K-RX [22], and the recently developed CRD [23] and BJSR [33] algorithms were chosen for this evaluation. Except for Global RX, which has no setting parameters, the other six algorithms have several setting parameters. Generally, the default settings proposed by the developers were used in our comparisons. Window size is the only setting parameter of the Local RX method; the best result obtained from windows of 11 × 11, 13 × 13, and 15 × 15 pixels was used in the comparisons. In Causal R-RX and K-RX, the width of the sliding array is the main setting parameter, which was set to CW = 900 according to the best result obtained in [22]. The CRD settings are the inner and outer window sizes and the regularization parameter. Here, a 7 × 7 inner window, a 15 × 15 outer window, and a regularization parameter of 10^-6 were used [23]. The BJSR setting parameters are the background and guard window sizes, the search window, and the level of sparsity. The window sizes were selected based on the optimum settings reported in [33]: a 17 × 17 background window, a 5 × 5 guard window, and a 19 × 19 search window, with the sparsity level set to 3 in the SOMP method [33].

Among the detection algorithms used in this paper, the Local RX algorithm (LRX) could easily be adapted to the proposed sliding window. Thus, a new version of Local RX, called the Sliding-window Local RX anomaly detector (SLRX), is also used in the evaluations. In this version, F_PoI is generated for each PoI from the Mahalanobis distances calculated by the Local RX on the samples in the w_i windows. As in the SWJSR method, the variance of F_PoI is used as the measure of this detector for each PoI.
Table 1 presents the AUC indices, AUC (P_D, P_FA) and AUC (P_FA, Th), for the abovementioned algorithms and the best results of the proposed algorithm (SWJSR) on the five real and synthetic datasets (including the implanted and destructed data). Considering the traditional AUC (P_D, P_FA) provided in Table 1, the proposed algorithm shows higher efficiency under similar conditions, with an average improvement of 2% over the best results obtained from the other methods. On the other hand, the AUC (P_FA, Th) values of the proposed method are rather lower than those of the other algorithms. It should be noted that the AUC of P_FA versus Th represents the level of background suppression, so lower values indicate better performance [47]. In order to compare the anomaly detection algorithms, the 3-D ROC curves, the 2-D ROCs of (P_D, P_FA), the 2-D ROCs of (P_FA, Th), and the target-background separation diagrams related to each dataset (Table 1) are shown in the following figures.

Again, the higher efficiency of the SWJSR algorithm can be seen from the shape of the 3-D ROC curves and the separability diagrams. Except for the implanted synthetic data, these diagrams also reveal a desirable separation between the anomaly and background elements for the proposed method, which confirms its better performance compared to the other methods.
In the case of synthetic data, all of the compared methods yield similar results. In the ROC (P_D, P_FA) curves, for almost all of the experiments, the curve of the SWJSR is closer to the upper-left corner of the diagram, which explains the increase in the AUC (P_D, P_FA) value.
Figure 12 shows the detection maps obtained by the GRX, LRX, CRD, BJSR, CR-RXD, CK-RXD, and SLRX algorithms and by the proposed algorithm (SWJSR), together with the reference ground truth maps, for all of the datasets used in this research.
Based on the best results obtained from the proposed SWJSR algorithm, a sensitivity analysis of the algorithm with respect to its tuning parameters was also carried out. These parameters are (1) the dimensions of the sliding window and (2) the level of sparsity used in the simultaneous reconstruction of the background signals by the SOMP algorithm. As the first investigation, the effect of changing the sliding window dimensions on the AUC index is presented in Table 2 for all of the datasets.

As shown in Table 2, the results depend on a correct choice of the sliding window dimensions. In this regard, when the traditional RXD is used to remove potentially anomalous signals by applying a threshold (Th-Plane) (part b of Figure 2), a covariance matrix estimated from a small number of data samples can be rank-deficient (non-invertible). To overcome numerical instabilities for smaller sliding windows, the Moore-Penrose pseudo-inverse was used: when the number of samples (sliding window elements) was less than the number of spectral bands, the "pinv" function was used instead of the common inversion function ("inv") in MATLAB. For spectral anomalies with a large spatial extent, larger sliding windows are needed to achieve better results. For example, since the multi-pixel anomaly regions in the San Diego airport zone and in Viareggio city are more extended than the one-pixel anomalies in the RIT data, the optimum sliding window is also larger. The same rule applies to the San Diego data compared with the Viareggio data, for which larger sliding window dimensions are appropriate. Thus, keeping this rule in mind and considering the spatial resolution of the sensor, prior knowledge about the extent of probable anomalies can help in tuning the sliding window size and in reaching reliable results faster. This knowledge is less important when dealing with anomalies of one pixel or less.
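The switch between ordinary inversion and the pseudo-inverse can be sketched in Python with NumPy, whose `numpy.linalg.inv` and `numpy.linalg.pinv` play the role of the MATLAB functions named above. The function name `rx_scores`, the `rcond` cutoff, and the use of window-local statistics are assumptions for this example.

```python
import numpy as np

def rx_scores(S):
    """RX (Mahalanobis) score of every column of S, shape (bands, samples)."""
    b, m = S.shape
    X = S - S.mean(axis=1, keepdims=True)
    cov = X @ X.T / m
    if m < b:
        # Fewer samples than bands: the covariance is rank-deficient,
        # so use the Moore-Penrose pseudo-inverse instead of inv.
        cov_inv = np.linalg.pinv(cov, rcond=1e-10, hermitian=True)
    else:
        cov_inv = np.linalg.inv(cov)
    return np.einsum("ij,ik,kj->j", X, cov_inv, X)
```

For a 3 × 3 window on a 20-band image, for instance, `S` has shape (20, 9) and the pseudo-inverse branch is taken, while a 7 × 7 window on a 5-band image would use the ordinary inverse.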

One of the most important tuning parameters of the proposed algorithm is the level of simultaneous sparse estimation of the background elements, called the level of sparsity. In other words, the maximum number of atoms of the learned dictionary used for the simultaneous estimation of all signals located in the sliding window is another tuning parameter. The value of this parameter depends on the variety of endmembers occurring in the window. Since this value is the maximum number of dictionary atoms that may be used, the number of atoms actually selected in the recovery at each sliding window position is lower than or equal to it. Assigning too low a value to this parameter yields an incomplete modeling of the background, whereas a value higher than necessary allows unrelated atoms to contribute to decreasing the recovery residuals when an anomaly occurs. Accordingly, the optimum value of this parameter was selected to provide a balance between these two consequences of an incorrect selection.
In Table 3, with the sliding window dimensions set to the identified optimum value for each dataset, the effect of changing the level of sparsity on the AUC index is reported for all of the datasets.
As observed from the results of Table 3, an optimized selection of this parameter plays a significant role in the efficiency of the proposed algorithm. Considering the variety of the input data, values of 5, 6, or 7 mainly yield desirable results. Within the range close to the optimum value, the results do not change significantly, but a clearly incorrect setting of this parameter considerably influences the results.
Because the proposed method involves considerably more processing than the other methods, it is not competitive in terms of computational cost and running time. For example, the running time for the RIT data in MATLAB, on a computer with an Intel Core i7 2.6 GHz processor and 16 GB of RAM under the Windows 10 64-bit operating system, was 129 s, which is longer than for the other methods. Nevertheless, the average running time of the proposed method and of the other methods is tabulated in Table 4. These times are the average running times over all datasets for each anomaly detection algorithm. Finally, parallel processing is expected to increase the running speed of the proposed algorithm, which is a focus of the authors' future studies.

Conclusions
Since anomaly detection in hyperspectral images has a large number of applications, many researchers are motivated to develop efficient methods in this area. In this paper, a new method based on the simultaneous sparse representation of local background signals using a sliding window was proposed to detect spectral anomalies. In this method, every signal located in the sliding window receives a vote, based on its estimation error, on whether or not it is an anomaly. Because the recovery precision of each pixel of the hyperspectral image is evaluated several times during the transition of the sliding window, there are better conditions for deciding whether each signal is an anomaly or background. The dictionary learned at each position of the sliding window is affected by the signals located in that window, so in practice each pixel is recovered many times with the help of a set of different background dictionaries.
The results of applying the proposed SWJSR method to the five datasets used in this research showed its higher performance compared to the GRX, LRX, CRD, BJSR, CR-RXD, CK-RXD, and SLRX detectors. According to the obtained AUC values, the average improvement in efficiency (AUC) over these algorithms is about 7.5%, 14.25%, 8.2%, 8.25%, 6.45%, 6.5%, and 3.6%, respectively. The success of this idea shows that voting algorithms and the combination of their results can be an effective approach for detecting anomalies in hyperspectral signals. This idea could also be applied to other hyperspectral image processing algorithms and evaluated by comparison with prior methods. The results of SLRX, which show an average improvement in efficiency (AUC) of about 10% in comparison with the traditional Local RX, support this idea.
Automatic tuning of the parameters of the proposed SWJSR algorithm and the development of parallel processing techniques to improve its running time are the focus of the authors' future research. Moreover, detecting spatial anomalies with the proposed approach and using spatial-spectral features in this field are further topics of interest for the authors' future work.
Remote Sens. 2018, 10, x FOR PEER REVIEW

around the PoI. In other words, when considering the location of each PoI in the input image (e.g., Figure 1), the PoI experiences different spatial positions with respect to its spatial neighbors through the transition of the sliding window. In each position, the PoI is investigated for the presence or absence of an anomaly. Finally, the fusion of the results obtained for each PoI during its presence in the sliding window generates the anomaly detection criterion.
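The transition of the sliding window around a PoI can be illustrated with a minimal Python sketch. It assumes a square window moved with a stride of one pixel and kept fully inside the image; the function names `window_positions` and `fuse_errors` are hypothetical. The fusion uses the variance of the recovery errors, which is the detection criterion stated in the abstract.

```python
import numpy as np

def window_positions(row, col, win, height, width):
    """Top-left corners of every win x win window that contains pixel (row, col).

    Assumes a stride of one pixel and windows that lie fully inside the image.
    """
    tops = range(max(0, row - win + 1), min(row, height - win) + 1)
    lefts = range(max(0, col - win + 1), min(col, width - win) + 1)
    return [(t, l) for t in tops for l in lefts]

def fuse_errors(errors):
    """Fuse the recovery errors collected for one PoI into one score (their variance)."""
    return float(np.var(np.asarray(errors, dtype=float)))

# An interior pixel is covered by win * win window positions.
print(len(window_positions(row=5, col=5, win=3, height=10, width=10)))  # 9
# A corner pixel is covered by a single window position.
print(window_positions(0, 0, 3, 10, 10))  # [(0, 0)]
```

Under these assumptions, pixels near the image border receive fewer votes than interior pixels, so a practical implementation might pad the image or normalize for the number of positions.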

Figure 1. Structure of moving the sliding window around the PoI.

Figure 2. The process of the proposed anomaly detection algorithm.

Figure 3. Rochester Institute of Technology (RIT) real dataset: (a) the spectral curve of targets; (b) the location of targets in the selected subset; and (c) the original data.

Figure 4. San Diego real dataset: (a) the spectral curve of targets; (b) the location of targets in the selected subset; and (c) the original data.

Figure 5. Viareggio real dataset: (a) the spectral curve of targets; (b) the location of targets in the selected subset; and (c) the original data.

Figure 6. Implanted RIT dataset: (a) the spectral curve of implanted targets; (b) the location of implanted targets in the selected subset; and (c) the original data.

Figure 7. Destructed RIT dataset: (a) the spectral curve of destructed targets; (b) the location of destructed targets in the selected subset; and (c) the original data.

Figure 11. Target-background separation diagram of the anomaly detection algorithms (the green box shows the target statistics and the red box shows the background statistics): (a) real RIT dataset; (b) real San Diego dataset; (c) real Viareggio dataset; (d) implanted RIT dataset; and (e) destructed RIT dataset.

Algorithm 1. SWJSR Anomaly Detector Algorithm.
3   Find all w_i around the PoI (W_PoI = {w_1, w_2, ..., w_m})
4   FOR all W_PoI members
5       S_{w_i} = vector matrix of all signals in w_i
6       Remove signals with high anomaly potential from S_{w_i} using the RX algorithm
7       D_{w_i} = randomly selected q% of the remaining signals in S_{w_i}
8       TD_{w_i} = D_{w_i} trained using the K-SVD algorithm
9       seS_{w_i} = S_{w_i} simultaneously estimated using TD_{w_i} via the SOMP algorithm
10      rS_{w_i} = [S_{w_i} - seS_{w_i}]_{b×m}
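Steps 5 to 10 of Algorithm 1 can be sketched for a single window position as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the K-SVD training of step 8 is omitted and the randomly selected signals are used directly as the dictionary, the fraction of signals kept after RX screening (`rx_keep`) is an assumed parameter, and all function names, default values, and the toy data are hypothetical.

```python
import numpy as np

def rx_scores(S):
    """RX (Mahalanobis) score of each column of S, shaped bands x pixels."""
    X = S - S.mean(axis=1, keepdims=True)
    inv_cov = np.linalg.pinv(X @ X.T / S.shape[1])
    return np.einsum("ij,ik,kj->j", X, inv_cov, X)

def somp(D, S, sparsity):
    """Simultaneous OMP: approximate all columns of S with one shared set of atoms."""
    residual, support = S.copy(), []
    coef = np.zeros((0, S.shape[1]))
    for _ in range(sparsity):
        # Score each atom by its joint correlation with all current residuals.
        corr = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            corr[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(D[:, support], S, rcond=None)
        residual = S - D[:, support] @ coef
    return support, coef, residual

def window_recovery_errors(S, q=0.5, sparsity=3, rx_keep=0.9, seed=0):
    """Recovery error of every pixel in one window (steps 5-10, without K-SVD)."""
    rng = np.random.default_rng(seed)
    # Step 6: keep the signals with the lowest RX scores (rx_keep is an assumption).
    keep = np.argsort(rx_scores(S))[: max(1, int(rx_keep * S.shape[1]))]
    # Step 7: randomly select a fraction q of the remaining signals as atoms.
    n_atoms = min(keep.size, max(sparsity, int(q * keep.size)))
    D = S[:, rng.choice(keep, size=n_atoms, replace=False)]
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    # Step 8 (K-SVD training) is skipped in this sketch.
    # Steps 9-10: joint sparse estimate and residual rS = S - seS.
    _, _, rS = somp(D, S, min(sparsity, D.shape[1]))
    return np.linalg.norm(rS, axis=0)

# Toy window: 49 pixels (e.g. 7 x 7), 10 bands, background near a 2-D subspace.
rng = np.random.default_rng(1)
bands, pixels = 10, 49
S = rng.normal(size=(bands, 2)) @ rng.uniform(0.5, 1.5, size=(2, pixels))
S += 0.01 * rng.normal(size=(bands, pixels))
S[:, 24] = 5.0 * rng.normal(size=bands)  # implant one anomalous spectrum
errors = window_recovery_errors(S)
print(int(np.argmax(errors)))  # index of the pixel with the largest recovery error
```

In the full method, these per-window errors would be collected for each pixel over all window positions that contain it and then fused into the detection criterion.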

Table 1. Average improvement of efficiency (AUC) of the GRX, local RX algorithm (LRX), CRD, BJSR, CR-RXD, CK-RXD, sliding-window Local RX anomaly detector (SLRX), and SWJSR for all the datasets. (Bold marks the best value: the highest for AUC(P_D, P_FA) and the lowest for AUC(P_FA, Th).)

Table 2. Effect of the size of the sliding window in the Sliding Window-Based Joint Sparse Representation (SWJSR) detector for all datasets. (Bold marks the highest value.)

Table 3. Effect of the level of sparsity in the SWJSR detector for all datasets. (Bold marks the highest value.)

Table 4. Average running time of the compared algorithms using all datasets.