A Randomized Subspace Learning Based Anomaly Detector for Hyperspectral Imagery

: This paper proposes a randomized subspace learning based anomaly detector (RSLAD) for hyperspectral imagery (HSI). Improved from robust principal component analysis, the RSLAD assumes that the background matrix is low-rank, and the anomaly matrix is sparse with a small portion of nonzero columns (i.e., column-wise). It also assumes the anomalies do not lie in the column subspace of the background and aims to ﬁnd a randomized subspace of the background to detect the anomalies. First, random techniques including random sampling and random Hadamard projections are implemented to construct a coarse randomized columns subspace of the background with reduced computational cost. Second, anomaly columns are searched and removed from the coarse randomized column subspace by solving a series of least squares problems, resulting in a puriﬁed randomized column subspace. Third, the nonzero columns in the anomaly matrix are located by projecting all the pixels on the orthogonal subspace of the puriﬁed subspace, and the anomalies are ﬁnally detected based on the L 2 norm of the columns in the anomaly matrix. The detection performance of RSLAD is compared with four state-of-the-art methods, including global Reed-Xiaoli (GRX), local RX (LRX), collaborative-representation based detector (CRD), and low-rank and sparse matrix decomposition base anomaly detector (LRaSMD). Experimental results show good detection performance of RSLAD with lower computational cost. Therefore, the proposed RSLAD offers an alternative option for hyperspectral anomaly detection.


Introduction
Hyperspectral imaging collects detailed spectral information of ground objects on the earth surface using hundreds of narrow and continuous bands [1,2]. It has distinctive advantages in detecting small and low-probability ground objects using the techniques of target detection [3][4][5][6]. In particular, with the forthcoming generation of hyperspectral sensors (e.g., EnMAP, HISUI, and Hispery), there is a tremendous need to develop intelligent methods and protocols for target detection to fully benefit from a wider range of spectral bands. Hyperspectral target detection can be applied in many realistic applications, including biophysical parameter retrieval [7], classification of complicated environments [8] and military target detection [9]. Compared with supervised target detection, unsupervised target detection, i.e., anomaly detection, does not require any prior knowledge of target spectral characteristic [10][11][12]. Anomaly detection methods have witnessed increasing interest due to their interesting applications [13][14][15][16]. In anomaly detection, common ground objects that dominate the image scene are defined as the background, whereas small and low-probability ground objects that detector (CRD) assumes that each background pixel can be approximately represented by its spatial neighborhoods. The anomalies are estimated by subtracting the predicted background from the original HSI data [32]. The low-rank and sparse representation (LRASR) based detector regards that a background pixel can be approximately represented by a background dictionary and the anomalies are estimated from the residual sparse image [33]. The low-rank representation and learned dictionary (LRRaLD) improves decomposition process of the regular low-rank representation model using the random selected dictionary and could obtain more robust detection results within less computation time [34]. The sparsity score estimation framework (SSEF) detector counts the frequency of each dictionary atom for hyperspectral data construction in sparse representation, and it estimates the anomalies using the sparsity score matrix of all pixels [35]. The low-rank and sparse decomposition (LSD) formulates the detection of anomalies as a RPCA problem in the local image region and finds the anomalies by soring each pixel by the norm of its corresponding sparse coefficient vector [36]. The low-rank and sparse matrix decomposition based anomaly detection (LRaSMD) [12] improved the RPCA model by separating the noise term from the anomaly term in the sparse noise matrix. The pixels that have small L 2 norm values of the sparse coefficient vectors are selected as the anomalies. Later, the LRaSMD is further improved by replacing the L 2 norm with the Mahalanobis distance [37].
In this paper, inspired by RPCA [38][39][40], we propose a randomized subspace learning based anomaly detector (RSLAD) for hyperspectral anomaly detection. The RSLAD assumes that the background is low-rank, and the anomaly matrix is sparse and has a small portion of nonzero columns (i.e., column-wise). Meanwhile, the RSLAD assumes that the background pixels lie in the column subspace of the background whereas the anomalies do not. It aims to find a randomized subspace of the background where anomalies are more likely to be excluded. Random techniques are utilized to find a coarse randomized subspace of the background. Random sampling and the random Hadamard projections could separately sketch the original data from columns and rows and greatly reduce the computational requirements of subspace learning. The anomaly columns are excluded from the coarse randomized subspace by solving a series of least squares problems, resulting in purified randomized subspace of the background. The anomalies are then located by projecting the data onto the orthogonal subspace projection of the purified column subspace.
Compared with current sparsity based anomaly detectors, our method favors three main contributions: (1) The RSLAD has more advanced assumptions than RPCA. The RSLAD assumes that the anomaly matrix has a small portion of nonzero columns and these nonzero columns do not lie in the column subspace of the background. This assumption reduces the impact from anomalies when constructing the column subspace of the background. In contrast, the RPCA based anomaly detectors assume that nonzero elements in the sparse anomaly matrix are uniformly scattered without any specific structure. Accordingly, nonzero entries in the anomaly matrix would negatively affect all the columns of the background when optimizing the convex programs.
(2) The idea behind RSLAD is more advanced than current sparsity based anomaly detectors. It is to find a randomized subspace of the background and investigates the low-dimensionality of the background column subspace and the independence between anomalies. It estimates the randomized column subspace of the background and alleviates the effects from anomalies in estimating the low-rank background.
(3) The RSLAD does not actually solve a complicated convex optimization problem, and it offers good performance with a low computational cost due to the use of random selection and projection. The low computational complexity makes it more appealing in practical applications.
The forthcoming sections of our paper are arranged as follows. Section 2 describes the modeling of background and anomalies in RSLAD. Section 3 presents the methodology of RSLAD. Section 4 compares with state-of-the-art methods and analyzes the detection performance of RSLAD using four real hyperspectral images. Section 5 discusses the experimental results. Section 6 draws conclusions of our paper.

Consider a hyperspectral data as a collection of band vectors
where M is the number of bands and N is the number of pixels. Let the background matrix and the anomaly matrix be denoted as B ∈ R M×N and S ∈ R M×N respectively. The anomaly detection is to separate the anomalies from the background, and accordingly, the HSI matrix Y can be expressed as the sum of background matrix B and anomaly matrix S.
The background matrix B consists of spectral vectors of main ground objects in the image scene and is assumed to lie on a low-dimensional subspace with low-rank properties. The anomaly matrix S collects spectral vectors of small and low-probability ground objects (i.e., anomalies), and hence it is column-wise sparse with a small portion of nonzero columns C. Obviously, the corresponding columns C in B is zero [38,41]. The matrices Y, B, and S are related as where b i and s i are the i-th column of B and S, respectively, r is the dimensonality of subspace of B, and U is the basis of column subspace of B. The constraint Rank(B) = r is to guarantee the low-rank property of B. The constraint (I − U U T U −1 U T )s i = 0, ∀i ∈ C is to restrict that the anomalies do not lie in the column subspace of B. Figure 1 illustrates the difference between the model in Equation (1) and the original RPCA [38]. The RPCA assumes that the matrix S is sparse with nonzero entries being scattered uniformly at random. The nonzero entries can have arbitrarily large magnitude. Consequently, all the columns of B can be affected by the nonzero elements of S [41,42]. On the contrary, Equation (1) assumes that the anomaly matrix S is column-wise sparse, where only a portion of its columns are nonzero. A small portion of the columns of the anomaly matrix S are nonzero, and these nonzero columns do not lie in the column space of B. Therefore, a portion of the columns that formulate the space of B are unaffected by the nonzero columns of S [42,43].

Modeling of Background and Anomalies in RSLAD
Consider a hyperspectral data as a collection of band vectors = ∈ ℝ × , where M is the number of bands and N is the number of pixels. Let the background matrix and the anomaly matrix be denoted as ∈ × and ∈ × respectively. The anomaly detection is to separate the anomalies from the background, and accordingly, the HSI matrix can be expressed as the sum of background matrix and anomaly matrix . The background matrix consists of spectral vectors of main ground objects in the image scene and is assumed to lie on a low-dimensional subspace with low-rank properties. The anomaly matrix collects spectral vectors of small and low-probability ground objects (i.e., anomalies), and hence it is column-wise sparse with a small portion of nonzero columns . Obviously, the corresponding columns in is zero [38,41]. The matrices Y, , and are related as where and are the i-th column of and , respectively, is the dimensonality of subspace of , and is the basis of column subspace of . The constraint Rank( ) = is to guarantee the low-rank property of . The constraint ( − ( ) ) ≠ , ∀ ∈ is to restrict that the anomalies do not lie in the column subspace of . Figure 1 illustrates the difference between the model in Equation (1) and the original RPCA [38]. The RPCA assumes that the matrix is sparse with nonzero entries being scattered uniformly at random. The nonzero entries can have arbitrarily large magnitude. Consequently, all the columns of can be affected by the nonzero elements of [41,42]. On the contrary, Equation (1) assumes that the anomaly matrix is column-wise sparse, where only a portion of its columns are nonzero. A small portion of the columns of the anomaly matrix are nonzero, and these nonzero columns do not lie in the column space of . Therefore, a portion of the columns that formulate the space of are unaffected by the nonzero columns of [42,43].

The Proposed RSLAD for HSI Anomaly Detection
The background matrix and the anomaly matrix are unknown, and it is difficult to find a closed form solution of Equation (1). It is often transformed into a convex optimization problem [41] with an objection function combining the nuclear norm minimization of and the , norm minimization of . The nuclear norm and , norm could ensure the low-rank of and the column space of , respectively. However, the approach could yield a robust estimate of only when the fractions of anomalies in the image scene are less than a constant threshold [43].

The Proposed RSLAD for HSI Anomaly Detection
The background matrix B and the anomaly matrix S are unknown, and it is difficult to find a closed form solution of Equation (1). It is often transformed into a convex optimization problem [41] with an objection function combining the nuclear norm minimization of B and the L 1,2 norm minimization Remote Sens. 2018, 10, 417 5 of 20 of S. The nuclear norm and L 1,2 norm could ensure the low-rank of B and the column space of S, respectively. However, the approach could yield a robust estimate of B only when the fractions of anomalies in the image scene are less than a constant threshold [43].
In Equation (1), once the column subspace U of the background is found, the anomalies can be also correctly located with the orthogonal subspace projection constraint ( where y i is a column (corresponds to one pixel) of the data matrix Y. The explanation is that the background pixels lie in the column subspace of B, whereas the anomaly pixels lie out of the column subspace of B. Accordingly, the idea of our RSLAD is to find a column subspace of the background where anomalies being excluded.

Constructing Coarse Randomized Subspace by Data Sketching
In Equation (1), the column subspace of the background matrix B is a low-dimensional subspace, and it can be spanned by a small subset of its columns. Assuming Y Ψ is the matrix of p randomly sampled columns of the background matrix B, and V Ψ is an orthonormal basis for the row space of B, random sampling states that, if the sampled number satisfies p ≥ 10µn log 2n δ , then Y Ψ and Y have the same column subspace with probability at least (1 − δ), and the Y Ψ is incoherent with the basis of row subspace V Ψ by satisfying max i ||e T i V Ψ || 2 2 ≤ 6µn p with probability at least (1 − δ) [43]. n is the rank of background B, e i is an identity vector with all entries equal to 1, and µ is the incoherence parameter of row subspace of B. Therefore, the column subspace Y Ψ of the background matrix B can be learned from a random subset of its columns when its row space is incoherent with the orthonormal basis.
Random sampling can be used to extract the low-dimensional subspace structure of the background. It also sketches the data matrix Y from columns and reduce its computational requirements. Suppose the p pixels (i.e., columns) are randomly sampled, the random sampling can be defined as where Y Ψ ∈ R M×p is the sub-matrix of Y with p selected columns, and Ψ is the random sampling matrix whose columns are randomly selected with replacement from the N × N identity matrix I N .
On the other hand, numerous bands in Y bring high computational complexity and memory. Random projections originate from the famous Johnson-Lindenstrauss lemma and have been proven to be a computationally efficient and sufficiently accurate method for reducing the dimensionality of the HSI data. Compared with the regular Gaussian random projections, the random Hadamard projections have lower computational costs and better performance [44,45]. Therefore, we adopt the random Hadamard projections to sketch the matrix Y Ψ from rows and reduce the spectral dimensionality of Y Ψ . The random Hadamard matrix based dimensionality reduction is defined as where Y Φ ∈ R K×p is the projected matrix and Φ = K M DH M P ∈ R M×K is the random Hadamard projection matrix. D ∈ R M×M is a diagonal matrix with diagonal entries sampled uniformly from {−1, 1}. H N ∈ R M×M is the Hadamard matrix defined recursively for any M that is an integer power , where H 1 = 1. P ∈ R M×K is a uniform sampling matrix that randomly samples K columns of DH N , where each column of P is randomly selected with replacement from the M × M identity matrix I M .

Purifying Randomized Column Subspace of Background
Due to the presence of anomalous pixels, Y Φ may include both background and anomalous columns. However, the number of background columns are much larger than that of anomaly columns with higher probability. Therefore, the anomaly columns should be removed from Y Φ to purify the column subspace of background.
The columns of Y Φ from random sampling and random projections can span the columns of background with high probability. Accordingly, any background column of Y Φ lies in the span of other background columns of Y Φ with high probability. That is, for each background column y i Φ , let the matrix Y Φ(−i) be Y Φ but with the i-th column being removed, it can be expressed as a linear combination of columns of Y Φ(−i) . In contrast, the anomaly column does not lie in the span of background columns. Since real hyperspectral data are always contaminated by noises, the anomaly columns from Y Φ can be located by solving the following least squares problem: RSE i > ε, i belong to anomaly columns RSE i ≤ ε, i belong to background columns (4) where ε > 0 is the defined residual threshold because of noise. After locating the anomaly columns in Y Φ , the purified matrix U can be obtained.

Detecting Anomalies Using Orthogonal Subspace Projection
The purified matrix U of Y Ψ contains linearly independent background columns and can be considered as a basis of the background matrix B. Since the anomalies do not lie in the column subspace of the background B, the projection of the HSI pixels on the orthogonal subspace of U can be used to locate nonzero columns of S [43]. The anomaly matrix S is estimated from Equation (5) Due to the impact of noise, the anomaly matrix S does not necessarily have a small portion of nonzero columns. Therefore, the L 2 norm is adopted to calculate the anomalous value for each pixel, and the pixels with the anomalous value above a manually selected threshold are determined to be anomalies.

The Summary of RSLAD for HSI Anomaly Detection
The RSLAD assumes that the background is low-rank, the anomaly matrix is sparse and column-wise, and the anomalies do not lie in the column subspace of the background B. It seeks the randomized subspace of the background B and detects the anomalies by projecting the HSI dataset Y on the complement subspace of the randomized subspace. Algorithm 1 lists the detailed procedure of RSLAD for anomaly detection. The RSLAD uses random sampling and the random Hadamard projections to sketch the HSI data from columns and rows, respectively. That greatly reduces the computational complexity and memory requirements of the HSI data in subspace learning. More importantly, random sampling constructs a coarse low-dimensional randomized subspace Y Φ of the background. After that, the RSLAD removes the anomalies from the coarse randomized subspace Y Φ by solving a series of least square problems, and the purified randomized subspace is obtained. The least square problem investigates the linear dependence of background pixels and the linear independence between the column subspace of background pixels and anomaly pixels. Finally, the RSLAD detects the anomalies in the HSI data Y by projecting it on the complement subspace of U. The sketch map of anomaly detection using RSLAD is illustrated in Figure 2.
(c) Locate the anomaly columns in using the residual threshold in (4); End for (a) Obtain the purified matrix from and set it as the basis of the background (3) Detecting the anomalies using orthogonal subspace projection (a) Compute the anomaly matrix using complement subspace projection of in (5); (b) Compute anomalous values of all pixels via the L2 norm of columns in the anomaly matrix Output: Anomaly detection map

The HSI Dataset Descriptions
The first dataset is the Pavia Center (PaviaC) dataset acquired by the reflective optics system imaging spectrometer (ROSIS) sensor [12,37]. It covers the Pavia Center in northern Italy and has accurate ground truth information. The number of bands in the initial dataset is 115 with 1.3 m spatial resolutions covering the spectrum range from 430 to 860 nm. In the experiment, the digital numbers of a smaller image were used as the input data, containing 108 × 120 pixels and 102 bands after removing low signal-to-noise ratio (SNR) bands. As shown in Figure 3a, three ground objects constitute the background: bridge, water, and shadow. A total of 47 pixels representing vehicles on the bridge and the bare soil near the bridge were commonly selected as anomalies. The reason is that they cover a very small number of pixels and are spectrally different from main ground objects. Figure 3b shows the ground objects of the anomalies and Figure 3c plots spectral curves of anomalies and main ground objects.

The HSI Dataset Descriptions
The first dataset is the Pavia Center (PaviaC) dataset acquired by the reflective optics system imaging spectrometer (ROSIS) sensor [12,37]. It covers the Pavia Center in northern Italy and has accurate ground truth information. The number of bands in the initial dataset is 115 with 1.3 m spatial resolutions covering the spectrum range from 430 to 860 nm. In the experiment, the digital numbers of a smaller image were used as the input data, containing 108 × 120 pixels and 102 bands after removing low signal-to-noise ratio (SNR) bands. As shown in Figure 3a, three ground objects constitute the background: bridge, water, and shadow. A total of 47 pixels representing vehicles on the Remote Sens. 2018, 10, 417 8 of 20 bridge and the bare soil near the bridge were commonly selected as anomalies. The reason is that they cover a very small number of pixels and are spectrally different from main ground objects. Figure 3b shows the ground objects of the anomalies and Figure 3c plots spectral curves of anomalies and main ground objects.
The third dataset includes the Botswana data acquired from Remote Sensing Group of the University of Texas at Austin (www.csr.utexas.edu/hyperspectral/index.html) [12,48]. The dataset was collected by the EO-1 Hyperion sensor. The dataset covers the area of Okavango Delta, Botswana. It was acquired on 31 May 2001 with 30 m spatial resolution and 10 nm spectral resolution, ranging the spectrum between 400 and 2500 nm. In total, 145 bands were used in the experiment: 10-55, 82-97, 102-119, 134-164, and 187-220. A smaller subset of size 235 × 255 pixels was cropped, containing five classes, namely, woodlands, exposed soil, savanna, floodplain and mopane [44], and the digital numbers were implemented as the input data. These classes reflect the impact of flooding on vegetation in the study area. In the image scene of Figure 5a, 35 pixels were selected as anomalies since these minority pixels are spectrally different from main ground objects. Figure 5b shows the ground objects of the anomalies and Figure 5c plots spectral curves of anomalies and main ground objects.   The second dataset includes the San Diego data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over San Diego, CA, USA [46,47]. The initial images have 3.5 m spatial resolution and 224 spectral channels ranging the spectrum from 370 nm to 2510 nm. In the experiment, a subset image with the image size of 100 × 85 pixels was selected, and the digital numbers were used as the input data. After removing the bad bands [1-6, 33-35, 97, 107-113, 153-166, 221-224] due to water absorption and low signal-to-noise ratio, the 189 bands were used in the experiment. In the image scene shown in Figure 4a, main ground objects of the background are roof, road, shadow and grass. Three planes occupying 58 pixels were commonly regarded as anomalies because they cover a very small number of pixels and are spectrally different from main ground objects. Figure 4b shows the ground objects of the anomalies and Figure 4c plots spectral curves of anomalies and main ground objects.
Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 20 The second dataset includes the San Diego data collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over San Diego, CA, USA [46,47]. The initial images have 3.5 m spatial resolution and 224 spectral channels ranging the spectrum from 370 nm to 2510 nm. In the experiment, a subset image with the image size of 100 × 85 pixels was selected, and the digital numbers were used as the input data. After removing the bad bands [1-6, 33-35, 97, 107-113, 153-166, 221-224] due to water absorption and low signal-to-noise ratio, the 189 bands were used in the experiment. In the image scene shown in Figure 4a, main ground objects of the background are roof, road, shadow and grass. Three planes occupying 58 pixels were commonly regarded as anomalies because they cover a very small number of pixels and are spectrally different from main ground objects. Figure 4b shows the ground objects of the anomalies and Figure 4c plots spectral curves of anomalies and main ground objects.
The third dataset includes the Botswana data acquired from Remote Sensing Group of the University of Texas at Austin (www.csr.utexas.edu/hyperspectral/index.html) [12,48]. The dataset was collected by the EO-1 Hyperion sensor. The dataset covers the area of Okavango Delta, Botswana. It was acquired on 31 May 2001 with 30 m spatial resolution and 10 nm spectral resolution, ranging the spectrum between 400 and 2500 nm. In total, 145 bands were used in the experiment: 10-55, 82-97, 102-119, 134-164, and 187-220. A smaller subset of size 235 × 255 pixels was cropped, containing five classes, namely, woodlands, exposed soil, savanna, floodplain and mopane [44], and the digital numbers were implemented as the input data. These classes reflect the impact of flooding on vegetation in the study area. In the image scene of Figure 5a, 35 pixels were selected as anomalies since these minority pixels are spectrally different from main ground objects. Figure 5b shows the ground objects of the anomalies and Figure 5c plots spectral curves of anomalies and main ground objects.   The third dataset includes the Botswana data acquired from Remote Sensing Group of the University of Texas at Austin (www.csr.utexas.edu/hyperspectral/index.html) [12,48]. The dataset was collected by the EO-1 Hyperion sensor. The dataset covers the area of Okavango Delta, Botswana. It was acquired on 31 May 2001 with 30 m spatial resolution and 10 nm spectral resolution, ranging the spectrum between 400 and 2500 nm. In total, 145 bands were used in the experiment: 10-55, 82-97, 102-119, 134-164, and 187-220. A smaller subset of size 235 × 255 pixels was cropped, containing five classes, namely, woodlands, exposed soil, savanna, floodplain and mopane [44], and the digital numbers were implemented as the input data. These classes reflect the impact of flooding on vegetation in the study area. In the image scene of Figure 5a, 35 pixels were selected as anomalies since these minority pixels are spectrally different from main ground objects. Figure 5b shows the ground objects of the anomalies and Figure 5c plots spectral curves of anomalies and main ground objects.  Figure 6a shows the image scene of size 200 × 800 pixels, and its main background types include houses, roads, and trees. Two small fabric targets F1 and F2 occupying 32 pixels were selected as anomalies, since these minority pixels are spectrally different from main ground objects. Figure 6b shows the ground objects of the anomalies, and Figure 6c plots spectral curves of anomalies and main ground objects.

Experimental Results
In this section, we conduct four groups of experiments on the above hyperspectral datasets to testify the performance of RSLAD in anomaly detection. First, we make comparison between RSLAD and other four state-of-the-art detectors, including GRX [15], LRX [15], CRD [32] and LRaSMD [12]. Second, we investigate the performance sensitivity of RSLAD to the number of randomly sampled columns (i.e., pixels) p. Third, we explore the performance sensitivity of RSLAD to the randomly projected dimension K. Finally, we analyze the performance sensitivity of RSLAD to the residual threshold . The receiver operating characteristic (ROC) curve and area under curve (AUC) are utilized to evaluate the detection performance. The ROC depicts both the probability of detection and the probability of false alarm rate. In this paper, we utilize the logarithmic curve to better illustrate  Figure 6a shows the image scene of size 200 × 800 pixels, and its main background types include houses, roads, and trees. Two small fabric targets F1 and F2 occupying 32 pixels were selected as anomalies, since these minority pixels are spectrally different from main ground objects. Figure 6b shows the ground objects of the anomalies, and Figure 6c plots spectral curves of anomalies and main ground objects.  Figure 6a shows the image scene of size 200 × 800 pixels, and its main background types include houses, roads, and trees. Two small fabric targets F1 and F2 occupying 32 pixels were selected as anomalies, since these minority pixels are spectrally different from main ground objects. Figure 6b shows the ground objects of the anomalies, and Figure 6c plots spectral curves of anomalies and main ground objects.

Experimental Results
In this section, we conduct four groups of experiments on the above hyperspectral datasets to testify the performance of RSLAD in anomaly detection. First, we make comparison between RSLAD and other four state-of-the-art detectors, including GRX [15], LRX [15], CRD [32] and LRaSMD [12]. Second, we investigate the performance sensitivity of RSLAD to the number of randomly sampled columns (i.e., pixels) p. Third, we explore the performance sensitivity of RSLAD to the randomly projected dimension K. Finally, we analyze the performance sensitivity of RSLAD to the residual threshold . The receiver operating characteristic (ROC) curve and area under curve (AUC) are utilized to evaluate the detection performance. The ROC depicts both the probability of detection and

Experimental Results
In this section, we conduct four groups of experiments on the above hyperspectral datasets to testify the performance of RSLAD in anomaly detection. First, we make comparison between RSLAD and other four state-of-the-art detectors, including GRX [15], LRX [15], CRD [32] and LRaSMD [12].
Second, we investigate the performance sensitivity of RSLAD to the number of randomly sampled columns (i.e., pixels) p. Third, we explore the performance sensitivity of RSLAD to the randomly projected dimension K. Finally, we analyze the performance sensitivity of RSLAD to the residual threshold ε. The receiver operating characteristic (ROC) curve and area under curve (AUC) are utilized to evaluate the detection performance. The ROC depicts both the probability of detection and the probability of false alarm rate. In this paper, we utilize the logarithmic curve to better illustrate the details with a base 10 logarithmic scale for the false alarm rate and a liner scale for the probability of detection. The AUC quantifies the area under the ROC curve and shows how far the ROC curve from the base line (i.e., AUC = 1).

Detection Performance the RSLAD Method
This experiment compares the detection performance of RSLAD with four state-of-the-art methods: two classical detectors GRX and LRX, a representation based detector CRD [32] and a RPCA based detector LRaSMD [12]. Table 1 lists the parameters of all the five detectors on the four datasets. For the LRX and CRD methods, the outer window (OW) and inner window (IW) of PaviaC dataset are set to be 19 × 19 and 7 × 7, respectively; those of San Diego dataset are set to be 21 × 21 and 5 × 5, respectively; those of Botswana dataset are set to be 17 × 17 and 9 × 9, respectively; and those of HyMap dataset are set to be 15 × 15 and 7 × 7, respectively. For the LRaSMD method, the rank of background r and the sparsity level k of anomaly matrix on Pavia dataset are set to be 2 and 0.45, respectively; the r and k on San Diego dataset are set to be 2 and 0.4, respectively; the r and k on Botswana dataset are set to be 4 and 0.5, respectively; and the r and k on the on HyMap dataset are set to be 10 and 0.75, respectively. For the RSLAD method, the number of sampled pixels p, the projected dimension K and the residual threshold ε on the PaviaC dataset are manually set to be 150, 50 and 10 −9 , respectively; the p, K and ε on the San Diego dataset are manually set to be 120, 50 and 1.7 × 10 −10 , respectively; the p, K and ε on the Botswana dataset are manually set to be 100, 50 and 4 × 10 −10 , respectively; and the p, K and ε on the HyMap dataset are manually set to be 200, 60 and 1.7 × 10 −10 , respectively.   Figure 7 illustrates the ROC curves and confidence intervals and regions [51] of RSLAD and other four methods on the four datasets. For the PaviaC dataset of Figure 7a, RSLAD has the lowest false alarm rate at 100% probability of detection. The false alarm rate of RSLAD is smaller than those of GRX, LRX and CRD in the ROC curve. The CRD curve behaves better in the false alarm rate than that of LRX but it is inferior to that of GRX. The LRX is the worst among all five detectors. The explanation for the worse performance of LRX than GRX is as follows. The LRX assumes homogenous background within the spatial window. However, for Pavia Center, the anomalies lie close to the edge of the bridge, and the selected window with the anomalies can be composed of different materials including water, bridge, and shadow, which may be falsely detected as the anomaly targets. For the San Diego dataset in Figure 7b, the RSLAD has higher probability of detection than GRX, LRX and CRD. Moreover, the RSLAD performs slightly better than LRaSMD, having a slightly smaller false alarm rate at 100% probability of detection. The CRD behaves worse than LRaSMD and RSLAD in the false alarm rate but it outperforms the two RX detectors, especially LRX. The LRX has the worst performance of all the five methods. The reason for that is similar to that of PaviaC dataset in Figure 7a, i.e., the background within spatial neighborhood does not satisfy the homogeneous assumption. For the Botswana dataset of Figure 7c, the RSLAD curve is superior to those of GRX, LRX, CRD and LRaSMD, having the smallest false alarm rate at 100% probability of detection. The probability of detection in LRaSMD is higher than those of GRX, LRX and CRD. The CRD outperforms GRX and RX in the false alarm rate at 100% probability of detection, and the GRX is the worst of all the methods. For the HyMap dataset of Figure 7d, the LRX curve has the largest probability of detection with a small false alarm rate less than 0.01, but it could not fully detect all the anomalies in the HyMap imagery. The RSLAD curve has the smallest false alarm rate when obtaining 100% probability of detection. Furthermore, Figure 7 shows the 95% confidence regions drawn around each estimated ROC curves from all the five detectors. The superiority of RSLAD to GRX is statistically significant in all the four datasets, and the ROC curves of RSLAD and LRaSMD show less statistical difference from each other. datasets among all the five methods, and the second highest is LRaSMD. The CRD performs better than LRX in the AUCs. Figures 9-11 show the detection maps of all five methods on the PaviaC, San Diego and Botswana datasets with normalized anomaly values between 0 and 1. We did not show the HyMap detection maps because of too small and even invisible anomalies with respect to the overlarge image size. Table 2 compares the computational time of all five methods on the four datasets. All the methods are implemented in Matlab 2016a and their codes are run on a WIN10 computer with Intel Core (TM) i7-6700 CPU 3.40 GHz and 32 GB of RAM. The LRX and CRD cost longer computational times than the other three methods GRX, LRaSMD and RSLAD, and the CRD takes the longest computational times of all. The LRaSMD takes shorter computational times than LRX and CRD but its computational speed is much slower than GRX and RSLAD. The RSLAD takes slightly longer time than GRX on the PaviaC dataset, but it clearly surpasses the GRX in computational speed on the San Diego, Botswana and HyMap datasets.     Figure 7. The RSLAD has the highest AUCs on four HSI datasets among all the five methods, and the second highest is LRaSMD. The CRD performs better than LRX in the AUCs. Figures 9-11 show the detection maps of all five methods on the PaviaC, San Diego and Botswana datasets with normalized anomaly values between 0 and 1. We did not show the HyMap detection maps because of too small and even invisible anomalies with respect to the overlarge image size.

Performance Sensitivity to the Number of Sampled Pixels p
In the experiment, the ranges of sampled pixel numbers p on the PaviaC, San Diego and Botswana datasets are manually set as [60, 120, 300, 600, 1200, 3000, 6000]. Figure 12 illustrates the ROC curves from the RSLAD on the three HSI datasets by changing number of sampled pixels p. For the PaviaC data of Figure 12a, the detection performance increases with the rising p from 60 to 6000, and the RSLAD performs the best when the sampled pixel number p equals 300. After that, the detection performance of RSLAD begins to decrease. The ROC curve with p equal to 6000 performs the worst of all, having the highest false alarm rate at 100% probability of detection. The ROC curves of San Diego data in Figure 12b have similar observations. The performance is the best when the p equals 120. For the Botswana dataset in Figure 12c, the ROC curve of p at 60 performs the best of all, having the lowest false alarm rate at 100% probability of detection. Moreover, Figure 13a shows the result of AUC curves with the changing number of sampled pixels p on the three datasets. In the figure, the AUCs of RSLAD on three HSI datasets gradually decrease with the increasing p from 60 to 6000. Particularly, the AUC curves have the greatest fall on the San Diego dataset, decreasing from 0.9965 to 0.2157 as the p increases from 60 to 6000. The observations of AUC curves coincide with those of ROC curves in Figure 12. Moreover, Figure 13b plots the curves of computational times with respect to the increasing sampled pixel number p. The results show that computational time of RSLAD increases with the rising p. Especially, the  Table 2 compares the computational time of all five methods on the four datasets. All the methods are implemented in Matlab 2016a and their codes are run on a WIN10 computer with Intel Core (TM) i7-6700 CPU 3.40 GHz and 32 GB of RAM. The LRX and CRD cost longer computational times than the other three methods GRX, LRaSMD and RSLAD, and the CRD takes the longest computational times of all. The LRaSMD takes shorter computational times than LRX and CRD but its computational speed is much slower than GRX and RSLAD. The RSLAD takes slightly longer time than GRX on the PaviaC dataset, but it clearly surpasses the GRX in computational speed on the San Diego, Botswana and HyMap datasets. In the experiment, the ranges of sampled pixel numbers p on the PaviaC, San Diego and Botswana datasets are manually set as [60, 120, 300, 600, 1200, 3000, 6000]. Figure 12 illustrates the ROC curves from the RSLAD on the three HSI datasets by changing number of sampled pixels p. For the PaviaC data of Figure 12a, the detection performance increases with the rising p from 60 to 6000, and the RSLAD performs the best when the sampled pixel number p equals 300. After that, the detection performance of RSLAD begins to decrease. The ROC curve with p equal to 6000 performs the worst of all, having the highest false alarm rate at 100% probability of detection. The ROC curves of San Diego data in Figure 12b have similar observations. The performance is the best when the p equals 120. For the Botswana dataset in Figure 12c, the ROC curve of p at 60 performs the best of all, having the lowest false alarm rate at 100% probability of detection.
Moreover, Figure 13a shows the result of AUC curves with the changing number of sampled pixels p on the three datasets. In the figure, the AUCs of RSLAD on three HSI datasets gradually decrease with the increasing p from 60 to 6000. Particularly, the AUC curves have the greatest fall on the San Diego dataset, decreasing from 0.9965 to 0.2157 as the p increases from 60 to 6000. The observations of AUC curves coincide with those of ROC curves in Figure 12. Moreover, Figure 13b plots the curves of computational times with respect to the increasing sampled pixel number p. The results show that computational time of RSLAD increases with the rising p. Especially, the computational times drastically increase when having a larger pixel number p over 1200. Since a large value of p degrades the detection performance and significantly increase computing time, a smaller value of p, e.g., 120, is preferred. and the RSLAD performs the best when the sampled pixel number p equals 300. After that, the detection performance of RSLAD begins to decrease. The ROC curve with p equal to 6000 performs the worst of all, having the highest false alarm rate at 100% probability of detection. The ROC curves of San Diego data in Figure 12b have similar observations. The performance is the best when the p equals 120. For the Botswana dataset in Figure 12c, the ROC curve of p at 60 performs the best of all, having the lowest false alarm rate at 100% probability of detection. Moreover, Figure 13a shows the result of AUC curves with the changing number of sampled pixels p on the three datasets. In the figure, the AUCs of RSLAD on three HSI datasets gradually decrease with the increasing p from 60 to 6000. Particularly, the AUC curves have the greatest fall on the San Diego dataset, decreasing from 0.9965 to 0.2157 as the p increases from 60 to 6000. The observations of AUC curves coincide with those of ROC curves in Figure 12. Moreover, Figure 13b plots the curves of computational times with respect to the increasing sampled pixel number p. The results show that computational time of RSLAD increases with the rising p. Especially, the computational times drastically increase when having a larger pixel number p over 1200. Since a large value of p degrades the detection performance and significantly increase computing time, a smaller value of p, e.g., 120, is preferred.

Performance Sensitivity to the Projected Dimension K
The projected dimension K correlates with the number of random Hadamard projections in the RSLAD. In the experiment, the projected dimensions K on the PaviaC, San Diego and Botswana datasets are manually set between 10 and 100 with a step interval of 10. Figure 14 shows the ROC curves of RSLAD on the three datasets with the changing projected dimensions K from 10 to 100. For the PaviaC dataset of Figure 14a, when the K increases from 10 to 100, the ROC curves of RSLAD have similar trend with small fluctuations. Particularly, the false alarm rates of most ROC curves are concentrated between 0.001 and 0.003 when achieving 100% probability of detection. The ROC curves of San Diego dataset in Figure 14b and Botswana dataset in Figure 14c have similar observations with that of Figure 14a. In Figure 14b,c, the ROC curves of RSLAD are overlaid with each other, despite of the changing projected dimensions K.

Performance Sensitivity to the Projected Dimension K
The projected dimension K correlates with the number of random Hadamard projections in the RSLAD. In the experiment, the projected dimensions K on the PaviaC, San Diego and Botswana datasets are manually set between 10 and 100 with a step interval of 10. Figure 14 shows the ROC curves of RSLAD on the three datasets with the changing projected dimensions K from 10 to 100. For the PaviaC dataset of Figure 14a, when the K increases from 10 to 100, the ROC curves of RSLAD have similar trend with small fluctuations. Particularly, the false alarm rates of most ROC curves are concentrated between 0.001 and 0.003 when achieving 100% probability of detection. The ROC curves of San Diego dataset in Figure 14b and Botswana dataset in Figure 14c have similar observations with that of Figure 14a. In Figure 14b,c, the ROC curves of RSLAD are overlaid with each other, despite of the changing projected dimensions K.
the PaviaC dataset of Figure 14a, when the K increases from 10 to 100, the ROC curves of RSLAD have similar trend with small fluctuations. Particularly, the false alarm rates of most ROC curves are concentrated between 0.001 and 0.003 when achieving 100% probability of detection. The ROC curves of San Diego dataset in Figure 14b and Botswana dataset in Figure 14c have similar observations with that of Figure 14a. In Figure 14b,c, the ROC curves of RSLAD are overlaid with each other, despite of the changing projected dimensions K.   Figure 15a illustrates the AUC curves of RSLAD on the three datasets when the projected dimensions K change from 10 to 100. The results show that the AUC curves of RSLAD are not much affected by the changing projected dimension K. Among them, the AUC curves on San Diego has largest fluctuations ranging from 0.9966 to 0.9977. That further supports the above observations of ROC curves in Figure 14. Moreover, Figure 15b shows the computational times of RSLAD on the three datasets when K from 10 to 100, where the computational times linearly increases with the K. The detection performance of RSLAD is less insensitive to the projected dimension K. The computational times of RSLAD linearly increases with the projected dimension K. Therefore, we will recommend to adopt a small projected dimension K, say, K = 50.
Remote Sens. 2018, 10, x FOR PEER REVIEW 15 of 20 Figure 15a illustrates the AUC curves of RSLAD on the three datasets when the projected dimensions K change from 10 to 100. The results show that the AUC curves of RSLAD are not much affected by the changing projected dimension K. Among them, the AUC curves on San Diego has largest fluctuations ranging from 0.9966 to 0.9977. That further supports the above observations of ROC curves in Figure 14. Moreover, Figure 15b shows the computational times of RSLAD on the three datasets when K from 10 to 100, where the computational times linearly increases with the K. The detection performance of RSLAD is less insensitive to the projected dimension K. The computational times of RSLAD linearly increases with the projected dimension K. Therefore, we will recommend to adopt a small projected dimension K, say, K = 50.

Performance Sensitivity to the Residual Threshold
This experiment investigates the impacts from the residual threshold in the detection performance of RSLAD on the PaviaC, San Diego and Botswana HSI datasets. With prior knowledge about the range of residual errors for all the pixels, we manually set the range of residual thresholds on the three datasets as [10 −12 , 10 −11 , 10 −10 , 10 −9 , 10 −8 ]. According to previous investigation, the sampled pixel numbers p is set to be 120, and the projected dimensions K is 50. Figure 16a-c shows the ROC curves of RSLAD on the three datasets with different residual

Performance Sensitivity to the Residual Threshold
This experiment investigates the impacts from the residual threshold ε in the detection performance of RSLAD on the PaviaC, San Diego and Botswana HSI datasets. With prior knowledge about the range of residual errors for all the pixels, we manually set the range of residual thresholds ε on the three datasets as [10 −12 , 10 −11 , 10 −10 , 10 −9 , 10 −8 ]. According to previous investigation, the sampled pixel numbers p is set to be 120, and the projected dimensions K is 50. Figure 16a-c shows the ROC curves of RSLAD on the three datasets with different residual threshold. For the PaviaC dataset of Figure 16a, with the increase in ε from 10 −12 to 10 −8 , the detection performance of RSLAD first increases and then drastically decreases. The ROC curve performs best when the ε equal to 10 −9 . The ROC curve at ε equal to 10 −8 is worst of all, having the largest false alarm rate when achieving 100% probability of detection. The similar observations exist in ROC curves of the San Diego dataset in Figure 16b. The detection performance of RSLAD first increases and then achieves the best performance when the ε is equal to 10 −10 . After that, further increasing the ε will lower the detection performance of RSLAD. The ROC curves of Botswana dataset in Figure 16c coincides with the observations in Figure 16b. The ROC curve at ε equal to 10 −10 performs best, and the ROC curve at ε equal to 10 −12 is the worst. Moreover, the AUC curves of RSLAD on three datasets in Figure 16d further support the above observations of ROC curves. With the increase in ε from 10 −12 to 10 −8 , the AUC first increases, achieves a maximum and then gradually decreases until the end of the range.

Discussion
In Section 4.2, four groups of experiments are designed to analyze the performance of RSLAD. The first experiment compares the detection results of RSLAD with those of four state-of-the-art methods, i.e., GRX, LRX, CRD and LRaSMD. Experimental results show that the RSLAD has the smallest false alarm rate under 100% probability of detection on the four HSI datasets. The confidence intervals and regions are also adopted to verify the significant difference between RSLAD and other detectors. The superiority of RSLAD to GRX is statistically significant in all the four datasets, and the ROC curves of RSLAD and LRaSMD show less statistical difference from each other. The better performance of RSLAD compared to CRD and LRX is statistically significant on the PaviaC and San Diego datasets, whereas it is less statistically significant on both Botswana and HyMap datasets. As discussed in [52], one could not guarantee a detector is always superior to other detection methods. Fortunately, the RSLAD yields better detection performance than GRX and takes less computational

Discussion
In Section 4.2, four groups of experiments are designed to analyze the performance of RSLAD. The first experiment compares the detection results of RSLAD with those of four state-of-the-art methods, i.e., GRX, LRX, CRD and LRaSMD. Experimental results show that the RSLAD has the smallest false alarm rate under 100% probability of detection on the four HSI datasets. The confidence intervals and regions are also adopted to verify the significant difference between RSLAD and other detectors. The superiority of RSLAD to GRX is statistically significant in all the four datasets, and the ROC curves of RSLAD and LRaSMD show less statistical difference from each other. The better performance of RSLAD compared to CRD and LRX is statistically significant on the PaviaC and San Diego datasets, whereas it is less statistically significant on both Botswana and HyMap datasets. As discussed in [52], one could not guarantee a detector is always superior to other detection methods. Fortunately, the RSLAD yields better detection performance than GRX and takes less computational time than other detectors (i.e., LRaSMD, GRX, LRX and CRD). Therefore, the good detection performance and lower computational cost make the RSLAD a great alternative for hyperspectral anomaly detection.
The second experiment investigates the detection performance sensitivity of RSLAD with respect to the sampled pixels p. Experimental results show that the detection performance and computational times of RSLAD are sensitive to the number of sampled pixels p. Both the detection performance and computational speed of RSLAD decrease with the increasing pixel number p. Particularly, a small number of sample pixel could bring about good performance of RSLAD in detection and computational efficiency. In contrast, a too large sampled pixel number p severely decreases the detection performance of RSLAD and also bring about extremely high computational costs. The explanation for that is as follows. The detection performance of RSLAD relies on the orthonormal basis estimation U of the column subspace of background B. The purified randomized subspace using Equation (4) takes enough independent background columns and can be seen a basis U for the background matrix B. If a too large p pixels were randomly sampled from the original HSI dataset, the purified matrix would have too many dependent columns that represent similar background pixels. That reduces the independence among the columns of U and accordingly degrades the detection performance of RSLAD. From the above analysis, we recommend to adopt a small sampled pixel number p in the following experiment to guarantee good detection performance and low computational costs.
The third experiment analyzes the detection performance sensitivity of RSLAD to the projected dimension K. Experimental results show that the detection performance of RSLAD is insensitive to the projected dimension K. However, the computational times of RSLAD linearly increases with the projected dimension K. The explanation for the above conclusions is as follows. The projected dimension K could lower the computational requirements of the original HSI dataset by reducing its dimensionality from M to K. However, the projected dimension K does not affect the coarse randomized subspace Y Φ of the background. Accordingly, it shows no correlations with the detection performance of the RSLAD. From the above analysis, we recommend to adopt a small projected dimension K in further experiments to reduce the computation costs of RSLAD.
The final experiment explores the detection performance sensitivity of RSLAD to the residual threshold ε. Experimental results show that the detection performance of RSLAD is sensitive to the residual threshold ε. A too-small or too-large threshold ε from its predefined range would negatively affect the detection performance of RSLAD. The explanation for that is as follows. The residual threshold ε correlates with the estimation of anomaly columns in Y Φ and determines the accurate estimation of orthonormal basis U for the column space of background B. If a too-small threshold ε were adopted, too many columns (i.e., sampled background pixels) would be removed from Y Φ . Too few independent columns of U could not fully describe the subspace structure of the background and accordingly negatively affect the detection performance of RSLAD. In contrast, if a too-large threshold were adopted, the anomaly columns from random sampling would be left in the columns of purified matrix U. That would degrade the estimation of orthonormal basis U and negatively affect the detection performance of RSLAD. From the above analysis from three HSI datasets, we recommend to initialize the residual threshold ε on the HSI dataset with 10 −10 , make slight tuning around the initials, and select the proper threshold that has the best performance in ROC and AUC.
Our RSLAD still has some drawbacks and needs careful investigations in the future work. First, we could not provide an intelligent estimation scheme for the threshold ε. The automatic parameter estimation of residual threshold ε will be carefully studied in our following work. Second, the quantitative relationship between sampled pixel number p and the detection performance