Real-Time Anomaly Detection Based on a Fast Recursive Kernel RX Algorithm

Real-time anomaly detection has received wide attention in remote sensing image processing because many moving targets must be detected in a timely manner. A widely used anomaly detection algorithm is the Reed-Xiaoli (RX) algorithm proposed by Reed and Yu. The kernel RX algorithm proposed by Kwon and Nasrabadi is a nonlinear version of the RX algorithm that outperforms it in detection accuracy, but at a higher computational cost. This paper presents a novel real-time anomaly detection framework based on the kernel RX algorithm. In the kernel RX detector, the computation of the inverse covariance matrix and the estimated mean of the background data in the kernel space is non-causal and computationally inefficient. In this work, a local causal sliding array window is used to ensure the causality of the detection system. Using the matrix inversion lemma and the Woodbury matrix identity, both the inverse covariance matrix and the estimated mean can be derived recursively without extensive repetitive calculations; therefore, the kernel RX detector can be implemented and processed pixel-by-pixel in real time. To substantiate its effectiveness and utility in real-time anomaly detection, experiments are conducted on real hyperspectral datasets.


Introduction
Hyperspectral imagery (HSI) can provide abundant spectral information to describe various ground materials owing to its very high spectral resolution [1]. Anomaly detection, one of the main research areas in HSI, is of particular importance because it can uncover subtle materials for which image analysts have no prior knowledge and which cannot be identified visually [2]. Such materials generally appear as anomalies in hyperspectral images, for example, special species in agriculture and ecology, rare minerals in geology, oil spills in water pollution, drug trafficking in law enforcement, man-made objects on battlefields, and tumors in medical imaging [3].
In HSI anomaly detection, the Reed-Xiaoli (RX) detector of Reed and Yu [4] is widely used and considered a baseline algorithm [5][6][7][8][9][10][11][12]. The RX detector is derived from a generalized likelihood ratio test for an unknown additive contrast signal in a multivariate Gaussian background. Because it uses only low-order statistics of the hyperspectral data, complex ground-material distributions degrade its detection performance. To address this issue, Kwon and Nasrabadi proposed the kernel RX (KRX) algorithm [5], a nonlinear version of the RX algorithm. By exploiting high-order correlations between spectral bands via a kernel function, the KRX detector (KRXD) provides better detection performance when the original data samples are mixed nonlinearly, as is often the case in practice. Depending on the background information used, KRXD can be divided into two types: global KRXD (GKRXD) and local KRXD (LKRXD). GKRXD calculates the Mahalanobis distance between a test pixel and the global background in the feature space, whereas LKRXD uses sliding windows to detect local anomalous targets that could easily be overwhelmed by the global background. By virtue of the kernel trick [13], the KRX detector can be implemented with kernel evaluations on the input data in the original low-dimensional space instead of explicit dot products in the high-dimensional feature space.
Recently, there has been increasing interest in real-time and fast anomaly detection algorithms for HSI [14][15][16]. This is particularly crucial because some moving targets, which appear as anomalies, need to be located in real time. Real-time processing also reduces data storage requirements, system response time, and the need to transmit large amounts of hyperspectral data. Over the past few years, many algorithms have been proposed in the literature to enable real-time or nearly real-time on-board anomaly detection [14][15][16][17][18][19][20][21][22][23][24][25][26]. In [14], computationally efficient anomaly detectors were developed and tested on operational airborne platforms. In [17], an architecture for real-time evaluation of global background statistics was presented. To speed up processing, real-time anomaly detection was implemented on graphics processing units (GPUs) [21][22][23]. Subsequently, real-time anomaly detectors based on efficient updating strategies were proposed in [25,26]. Unfortunately, most of these are not truly real-time processors but simply fast algorithms. An anomaly detection algorithm implemented in real time must satisfy causality [16]: the data sample vectors used for processing can only be those preceding the sample vector currently being visited, and no future sample vectors may be involved. To this end, real-time causal anomaly detection was proposed in [15], in which causal equations are derived and updated recursively. In [16,27], global real-time causal RX detectors (GRTC-RXD) and local real-time causal RX detectors (LRTC-RXD) based on different causal sliding windows were investigated. Both use the Woodbury matrix identity to reduce computational complexity. While the GRTC-RXD takes all data samples before the test pixel as background information, the LRTC-RXD uses only the background information in a local causal sliding window. However, because these algorithms are built on the RX detector and use only low-order statistics of hyperspectral data, their detection accuracy remains limited.
Therefore, this paper proposes a new real-time anomaly detection framework based on the KRX algorithm, which has better detection accuracy than RX algorithms. A local causal sliding array window is employed to ensure a causal detection system [27], with the advantage that data collection and detection are performed simultaneously. The matrix inversion lemma and the Woodbury matrix identity are then combined to recursively update the inverse covariance matrix and the estimated mean of the background data in the kernel space. Previously visited data sample vectors need not be reprocessed, which speeds up real-time processing. Because it meets the requirements of both causality and efficiency, the kernel RX detector can be implemented and processed pixel-by-pixel in real time.

RX and KRX Anomaly Detector
Reed and Yu [4] developed the RX detector, a widely used anomaly detection algorithm in hyperspectral imaging. The RX algorithm calculates the Mahalanobis distance between the data sample vector currently being detected and the background data sample vectors. To exploit the abundant nonlinear information in hyperspectral data, Kwon et al. proposed the KRX algorithm, which uses kernel functions to better separate anomalies from the background. In the following, we briefly describe the RX and KRX algorithms.

RX Algorithm
Let each input spectral signal consisting of $l$ spectral bands be denoted by $\mathbf{x}_i = [x_{1i}, x_{2i}, \ldots, x_{li}]^T$. The two competing hypotheses that the RX algorithm must distinguish are

$$H_0: \mathbf{x} = \mathbf{n} \quad \text{(target absent)}, \qquad H_1: \mathbf{x} = a\mathbf{s} + \mathbf{n} \quad \text{(target present)},$$

where $a = 0$ under $H_0$ and $a > 0$ under $H_1$, $\mathbf{n}$ is additive Gaussian noise, and $\mathbf{s}$ is a vector representing the spectral signature of the signal (target). The model assumes that the data come from two normal probability density functions with the same covariance matrix but different means. Under $H_0$, the data (background clutter) are modeled as $N(\mathbf{0}, \mathbf{C})$, and under $H_1$ as $N(\boldsymbol{\mu}, \mathbf{C})$. The RX detector, denoted $\delta_{RXD}(\mathbf{r})$, is specified by

$$\delta_{RXD}(\mathbf{r}) = (\mathbf{r} - \boldsymbol{\mu})^T \mathbf{C}^{-1} (\mathbf{r} - \boldsymbol{\mu}),$$

where $\boldsymbol{\mu}$ is the estimated background mean and $\mathbf{C}$ is the background covariance matrix.
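A minimal NumPy sketch of the RX detector is given below for illustration only; it is not the authors' code. The function name `rx_score` and the synthetic Gaussian background are assumptions. The sketch estimates $\boldsymbol{\mu}$ and $\mathbf{C}$ from background spectra stored one per row and evaluates the Mahalanobis distance with a linear solve instead of an explicit inverse.

```python
import numpy as np

def rx_score(r, background):
    """RX score: Mahalanobis distance of test spectrum r from the background.

    r          : (l,) test spectrum
    background : (N, l) background spectra, one per row
    """
    mu = background.mean(axis=0)              # estimated background mean
    C = np.cov(background, rowvar=False)      # (l, l) background covariance
    d = r - mu
    # Solve C z = d instead of forming C^{-1} explicitly (numerically safer).
    return float(d @ np.linalg.solve(C, d))

# Illustrative use on a synthetic Gaussian background (assumed data).
rng = np.random.default_rng(0)
bg = rng.normal(size=(500, 5))
print(rx_score(bg.mean(axis=0), bg))          # test pixel at the background mean
print(rx_score(np.full(5, 4.0), bg))          # test pixel far from the background
```

Because only low-order statistics ($\boldsymbol{\mu}$ and $\mathbf{C}$) enter the score, any structure in the background beyond its second moments is invisible to this detector, which is the limitation the kernel version targets.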

Kernel RX Algorithm
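The KRX algorithm uses the same assumptions as the RX algorithm; in other words, the input data mapped into the feature space are modeled by two Gaussian distributions, giving the two hypotheses

$$H_{0\Phi}: \Phi(\mathbf{x}) = \mathbf{n}_\Phi, \qquad H_{1\Phi}: \Phi(\mathbf{x}) = a_\Phi \Phi(\mathbf{s}) + \mathbf{n}_\Phi,$$

where $a_\Phi = 0$ under $H_{0\Phi}$ and $a_\Phi > 0$ under $H_{1\Phi}$, and $\Phi(\mathbf{s})$ and $\mathbf{n}_\Phi$ represent the target spectral signature and the noise in the feature space, respectively. The corresponding KRX detector in the feature space is

$$\delta_{KRXD}(\Phi(\mathbf{r})) = (\Phi(\mathbf{r}) - \hat{\boldsymbol{\mu}}_{B\Phi})^T \hat{\mathbf{K}}_{B\Phi}^{-1} (\Phi(\mathbf{r}) - \hat{\boldsymbol{\mu}}_{B\Phi}),$$

where $\hat{\boldsymbol{\mu}}_{B\Phi}$ and $\hat{\mathbf{K}}_{B\Phi}$ are the estimated mean and covariance matrix of the background data in the feature space, respectively. Through kernelization, these feature-space quantities are expressed with the kernel vectors

$$\mathbf{k}_{\mathbf{r}} = k(\mathbf{X}_B, \mathbf{r}), \qquad \mathbf{k}_{\mu} = \frac{1}{N}\sum_{i=1}^{N} k(\mathbf{X}_B, \mathbf{x}_i),$$

where $\mathbf{X}_B$ is the original background data consisting of $N$ data sample vectors. The Gram matrix $\mathbf{K}_w$ is given by $k(\mathbf{X}_B, \mathbf{X}_B)$. Finally, the KRX algorithm can be simplified to an expression, Equation (7), that involves only $\mathbf{k}_{\mathbf{r}}$, $\mathbf{k}_{\mu}$, and the inverse Gram matrix $\mathbf{K}_w^{-1}$. Kernel-based learning algorithms use the kernel trick to compute dot products in the feature space by evaluating kernel functions in the input space [13]. A commonly used kernel is the Gaussian radial basis function (RBF) kernel

$$k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{c}\right), \qquad (8)$$

where $c$ is a positive constant.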
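The kernel quantities above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, and the score assumes the simplified form $(\mathbf{k}_{\mathbf{r}} - \mathbf{k}_{\mu})^T \mathbf{K}_w^{-1} (\mathbf{k}_{\mathbf{r}} - \mathbf{k}_{\mu})$, which may differ from the paper's Equation (7) in centering or normalization. A pseudoinverse is used because RBF Gram matrices can be ill-conditioned.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / c) for every row pair of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)   # clip small negative round-off

def krx_score(r, XB, c):
    """KRX-style score; ASSUMES the form (k_r - k_mu)^T K_w^+ (k_r - k_mu)."""
    Kw = rbf_kernel(XB, XB, c)                 # (N, N) Gram matrix k(X_B, X_B)
    k_r = rbf_kernel(XB, r[None, :], c)[:, 0]  # k_r = k(X_B, r)
    k_mu = Kw.mean(axis=1)                     # k_mu = (1/N) sum_i k(X_B, x_i)
    d = k_r - k_mu
    # Pseudoinverse, since RBF Gram matrices can be nearly singular for large c.
    return float(d @ np.linalg.pinv(Kw) @ d)

rng = np.random.default_rng(1)
XB = rng.normal(size=(20, 5))                  # assumed background samples (rows)
print(krx_score(rng.normal(size=5), XB, c=1.0))
```

Note that $\mathbf{k}_{\mu}$ is simply the vector of row means of $\mathbf{K}_w$. As $c$ grows, all kernel values approach 1 and $\mathbf{K}_w$ approaches a rank-one matrix, so the choice between an exact inverse and a pseudoinverse matters in practice.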

Proposed Real-Time Processing of KRX Detector
Although the KRX detector has desirable detection accuracy, neither the GKRXD nor the LKRXD is a true real-time detector, because the Gram matrix must be computed from either all background data sample vectors or all data sample vectors in the local window. Fortunately, both the global and local KRX detection algorithms can be modified to operate in real time. In this paper, however, only real-time processing based on the LKRXD, using a local causal sliding array window, is investigated. For a global real-time KRX detector, the Gram matrix grows as detection progresses, because its size equals the number of background data samples and the global model must include all data samples before the test pixel. Since a real-time detector is expected to operate continuously, the Gram matrix would eventually become so large that the computation is too time-consuming for practical applications.

Local Causal Sliding Array Window
Due to the requirement of causality, local real-time processing would be rather complicated with the commonly used dual window: each time the window moves, more than one vector in the local window changes, so the background set cannot be updated by simply removing one old sample and adding one new one. To address this issue, [16] proposed a local causal sliding array window obtained by stretching out the causal matrix window. The local causal sliding array window of width $w$ slides along with the data sample vector $\mathbf{r}_n$ being processed and operates in a first-in, first-out manner. Figure 1 shows the local causal sliding array window at $\mathbf{r}_n$ (dotted lines) and at $\mathbf{r}_{n+1}$ (dashed lines): the farthest data sample vector $\mathbf{r}_{n-w}$ from $\mathbf{r}_n$ in the window at $\mathbf{r}_n$ is removed from the window at $\mathbf{r}_{n+1}$, while the most recent data sample vector $\mathbf{r}_n$ is added to the window at $\mathbf{r}_{n+1}$.
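The first-in, first-out behavior of the local causal sliding array window can be sketched with a bounded `collections.deque`. Assumptions for illustration: pixels arrive as a flat stream in acquisition order, the helper name `causal_windows` is hypothetical, and pixels with fewer than $w$ predecessors are skipped, consistent with the first window not being processed.

```python
from collections import deque

def causal_windows(samples, w):
    """Yield (n, window) pairs for a local causal sliding array window of width w.

    The window used for sample n holds only the w samples that precede it,
    r_{n-w}, ..., r_{n-1}; no future sample is ever included (causality).
    """
    window = deque(maxlen=w)          # appending to a full deque drops the oldest item (FIFO)
    for n, r in enumerate(samples):
        if len(window) == w:          # samples without a full causal window are skipped
            yield n, list(window)
        window.append(r)              # r_n enters the window used for r_{n+1}

for n, win in causal_windows(["r0", "r1", "r2", "r3", "r4"], w=3):
    print(n, win)
# 3 ['r0', 'r1', 'r2']
# 4 ['r1', 'r2', 'r3']
```

Exactly one sample leaves and one enters per step, which is what makes the rank-one style updates of the Gram matrix inverse and kernel mean possible.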

Local Causal KRX Detector
By using the local causal sliding array window to obtain the background data sample vectors, the LKRXD can be designed as a local causal KRX detector (LC-KRXD), Equation (9), in which the background quantities of the KRX detector are replaced by causal ones. Here $\mathbf{r}_n$ is the $n$th sample vector currently being processed, and $\mathbf{K}_w(n)$ is called the causal Gram matrix, defined by $\mathbf{K}_w(n) = k(\mathbf{X}_w(n), \mathbf{X}_w(n))$, where $\mathbf{X}_w(n)$ consists of all data sample vectors in the local causal sliding array window. In Equation (9), $\mathbf{k}^T_{\mathbf{r}}(n)$ and $\mathbf{k}^T_{\mu}(n)$ must also satisfy causality. Accordingly, they are represented by

$$\mathbf{k}_{\mathbf{r}}(n) = k(\mathbf{X}_w(n), \mathbf{r}_n), \qquad \mathbf{k}_{\mu}(n) = \frac{1}{N_w}\sum_{\mathbf{x}_i \in \mathbf{X}_w(n)} k(\mathbf{X}_w(n), \mathbf{x}_i),$$

where $N_w$ is the total number of data sample vectors in the local causal sliding array window.
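For reference, a direct (non-recursive) LC-KRXD can be sketched by rebuilding $\mathbf{K}_w(n)$, $\mathbf{k}_{\mathbf{r}}(n)$, and $\mathbf{k}_{\mu}(n)$ from the causal window at every pixel. This is a hypothetical sketch: it assumes the simplified score form $(\mathbf{k}_{\mathbf{r}} - \mathbf{k}_{\mu})^T \mathbf{K}_w^{-1} (\mathbf{k}_{\mathbf{r}} - \mathbf{k}_{\mu})$, the Gaussian RBF kernel, and $N_w = w$. Its $O(w^3)$ per-pixel solve is the cost that a recursive detector is meant to avoid.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Gaussian RBF kernel exp(-||x - y||^2 / c) for every row pair of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def lc_krxd(X, w, c):
    """Direct LC-KRXD over a pixel stream X of shape (num_pixels, l).

    The background for pixel n is the causal window X_w(n) = X[n-w:n] (N_w = w).
    The first w pixels have no full causal window and are left as NaN.
    ASSUMED score form: (k_r - k_mu)^T K_w^{-1} (k_r - k_mu).
    """
    scores = np.full(len(X), np.nan)
    for n in range(w, len(X)):
        Xw = X[n - w:n]                             # causal background samples only
        Kw = rbf_kernel(Xw, Xw, c)                  # causal Gram matrix, rebuilt each step
        k_r = rbf_kernel(Xw, X[n:n + 1], c)[:, 0]   # k_r(n) = k(X_w(n), r_n)
        k_mu = Kw.mean(axis=1)                      # k_mu(n): row means of K_w(n)
        d = k_r - k_mu
        scores[n] = d @ np.linalg.solve(Kw, d)      # O(w^3) work per pixel
    return scores

rng = np.random.default_rng(4)
pixels = rng.normal(size=(60, 4))                   # assumed stream: 60 pixels, 4 bands
print(np.round(lc_krxd(pixels, w=10, c=1.0)[10:15], 4))
```

Because each score depends only on `X[:n + 1]`, changing later pixels cannot change earlier scores, which is a quick way to check causality of any implementation.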

Local Real-time KRX Detector
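LC-KRXD is quite time-consuming, because every component of Equation (9) has to be recalculated whenever the local causal sliding array window moves. To solve this problem, we develop a local real-time causal KRX detector (LRTC-KRXD). Employing the Gaussian RBF kernel of Equation (8), the Gram matrix in the local causal sliding array window at $\mathbf{r}_n$ is partitioned as

$$\mathbf{K}_w(n) = k(\mathbf{X}_{n,w}, \mathbf{X}_{n,w}) = \begin{bmatrix} \gamma_n & \boldsymbol{\eta}_n^T \\ \boldsymbol{\eta}_n & \mathbf{K}_n \end{bmatrix}, \qquad (13)$$

where $\mathbf{X}_{n,w} = (\mathbf{x}_{n-w}, \mathbf{x}_{n-w+1}, \cdots, \mathbf{x}_{n-1})$ contains all of the data sample vectors in the window at $\mathbf{r}_n$, $\gamma_n = k(\mathbf{x}_{n-w}, \mathbf{x}_{n-w})$, $\boldsymbol{\eta}_n = k(\mathbf{X}_{n,2:w}, \mathbf{x}_{n-w})$, and $\mathbf{K}_n = k(\mathbf{X}_{n,2:w}, \mathbf{X}_{n,2:w})$ is the Gram matrix of the $w-1$ samples shared by the windows at $\mathbf{r}_n$ and $\mathbf{r}_{n+1}$. Similarly, the Gram matrix in the window at $\mathbf{r}_{n+1}$ is

$$\mathbf{K}_w(n+1) = k(\mathbf{X}_{n+1,w}, \mathbf{X}_{n+1,w}) = \begin{bmatrix} \mathbf{K}_n & \boldsymbol{\beta}_n \\ \boldsymbol{\beta}_n^T & \chi_n \end{bmatrix}, \qquad (14)$$

where $\mathbf{X}_{n+1,w} = (\mathbf{x}_{n-w+1}, \mathbf{x}_{n-w+2}, \cdots, \mathbf{x}_n)$ contains all of the data sample vectors in the window at $\mathbf{r}_{n+1}$, $\chi_n = k(\mathbf{x}_n, \mathbf{x}_n)$, and $\boldsymbol{\beta}_n = k(\mathbf{X}_{n+1,1:w-1}, \mathbf{x}_n)$. The matrix inversion lemma [28,29] simplifies the inversion of a partitioned matrix: for $\mathbf{M} = \begin{bmatrix} \mathbf{E} & \mathbf{F} \\ \mathbf{G} & \mathbf{H} \end{bmatrix}$ with invertible $\mathbf{E}$ and Schur complement $\mathbf{S} = \mathbf{H} - \mathbf{G}\mathbf{E}^{-1}\mathbf{F}$,

$$\mathbf{M}^{-1} = \begin{bmatrix} \mathbf{E}^{-1} + \mathbf{E}^{-1}\mathbf{F}\mathbf{S}^{-1}\mathbf{G}\mathbf{E}^{-1} & -\mathbf{E}^{-1}\mathbf{F}\mathbf{S}^{-1} \\ -\mathbf{S}^{-1}\mathbf{G}\mathbf{E}^{-1} & \mathbf{S}^{-1} \end{bmatrix}. \qquad (15)$$

Since $\gamma_n$ in Equation (13) can be regarded as a $1 \times 1$ matrix, applying (15) to (13) with $\mathbf{E} = \gamma_n$ shows that the lower-right $(w-1) \times (w-1)$ block of $\mathbf{K}_w^{-1}(n)$ is

$$\mathbf{D} = \left(\mathbf{K}_n - \gamma_n^{-1}\boldsymbol{\eta}_n\boldsymbol{\eta}_n^T\right)^{-1}.$$

Next, we speed up the calculation of $\mathbf{K}_w^{-1}$ by deriving a recursive formula between $\mathbf{K}_w^{-1}(n)$ and $\mathbf{K}_w^{-1}(n+1)$, which is one of the main contributions of this paper. To calculate $\mathbf{K}_n^{-1}$ efficiently from the known $\mathbf{D}$ without a matrix inversion, we use the Woodbury matrix identity [30,31]

$$(\mathbf{A} + \mathbf{U}\mathbf{P}\mathbf{V})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{U}\left(\mathbf{P}^{-1} + \mathbf{V}\mathbf{A}^{-1}\mathbf{U}\right)^{-1}\mathbf{V}\mathbf{A}^{-1}. \qquad (20)$$

With $\mathbf{A} = \mathbf{D}^{-1}$, $\mathbf{U} = \boldsymbol{\eta}_n$, $\mathbf{P} = \gamma_n^{-1}$, and $\mathbf{V} = \boldsymbol{\eta}_n^T$, Equation (20) gives

$$\mathbf{K}_n^{-1} = \mathbf{D} - \frac{\mathbf{D}\boldsymbol{\eta}_n\boldsymbol{\eta}_n^T\mathbf{D}}{\gamma_n + \boldsymbol{\eta}_n^T\mathbf{D}\boldsymbol{\eta}_n}.$$

Applying the matrix inversion lemma (15) again to Equation (14), where $\chi_n$ can also be treated as a $1 \times 1$ matrix, $\mathbf{K}_w^{-1}(n+1)$ is finally written as

$$\mathbf{K}_w^{-1}(n+1) = \begin{bmatrix} \mathbf{K}_n^{-1} + s_n^{-1}\mathbf{K}_n^{-1}\boldsymbol{\beta}_n\boldsymbol{\beta}_n^T\mathbf{K}_n^{-1} & -s_n^{-1}\mathbf{K}_n^{-1}\boldsymbol{\beta}_n \\ -s_n^{-1}\boldsymbol{\beta}_n^T\mathbf{K}_n^{-1} & s_n^{-1} \end{bmatrix}, \qquad s_n = \chi_n - \boldsymbol{\beta}_n^T\mathbf{K}_n^{-1}\boldsymbol{\beta}_n. \qquad (22)$$

Since $s_n$ is a scalar, no matrix inversion is required in Equation (22).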
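The recursion for the inverse Gram matrix can be sketched in NumPy. This is an illustrative implementation under assumptions, not the authors' code: the Gaussian RBF kernel is used so that $\gamma_n = \chi_n = 1$, the window is stored as rows, and the function names are hypothetical. It takes $\mathbf{D}$ from $\mathbf{K}_w^{-1}(n)$, applies the Woodbury identity to obtain $\mathbf{K}_n^{-1}$, and then borders $\mathbf{K}_n^{-1}$ with the new sample, so only scalar divisions occur.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Gaussian RBF kernel exp(-||x - y||^2 / c) for every row pair of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def slide_inverse(Kinv_n, x_old, X_keep, x_new, c):
    """Return K_w(n+1)^{-1} from K_w(n)^{-1} without inverting any matrix.

    Kinv_n : (w, w) inverse Gram matrix of the window [x_old; X_keep]
    X_keep : (w-1, l) samples shared by both windows (X_{n,2:w} = X_{n+1,1:w-1})
    x_old  : sample leaving the window (x_{n-w}); x_new : sample entering (x_n)
    """
    gamma = chi = 1.0                                # k(x, x) = 1 for the Gaussian RBF kernel
    eta = rbf_kernel(X_keep, x_old[None, :], c)[:, 0]
    beta = rbf_kernel(X_keep, x_new[None, :], c)[:, 0]

    D = Kinv_n[1:, 1:]                               # lower-right block of K_w(n)^{-1}
    De = D @ eta
    Kn_inv = D - np.outer(De, De) / (gamma + eta @ De)   # Woodbury: K_n^{-1} from D

    Kb = Kn_inv @ beta
    s = chi - beta @ Kb                              # scalar Schur complement
    w = Kinv_n.shape[0]
    out = np.empty((w, w))
    out[:-1, :-1] = Kn_inv + np.outer(Kb, Kb) / s    # bordering with the new sample
    out[:-1, -1] = -Kb / s
    out[-1, :-1] = -Kb / s
    out[-1, -1] = 1.0 / s
    return out

# Demo: slide a width-5 window along a synthetic stream and compare with direct inversion.
rng = np.random.default_rng(2)
X, w, c = rng.normal(size=(15, 4)), 5, 2.0
Kinv = np.linalg.inv(rbf_kernel(X[:w], X[:w], c))
for n in range(w, len(X)):
    Kinv = slide_inverse(Kinv, X[n - w], X[n - w + 1:n], X[n], c)
ref = np.linalg.inv(rbf_kernel(X[-w:], X[-w:], c))
print(np.abs(Kinv - ref).max())
```

Each step costs $O(w^2)$ for the matrix-vector and outer products plus $O(w \cdot l)$ for the two kernel vectors. Over long streams, round-off can accumulate, so an occasional direct re-inversion is a reasonable safeguard (an implementation choice, not something stated in the text).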
The proposed fast formula for updating $\mathbf{K}_w^{-1}(n+1)$ from $\mathbf{K}_w^{-1}(n)$ consists of the following steps:
Step 1: Take $\mathbf{D}$, the lower-right $(w-1) \times (w-1)$ block of the known $\mathbf{K}_w^{-1}(n)$.
Step 2: Compute $\gamma_n$ and $\boldsymbol{\eta}_n$ for the removed sample $\mathbf{x}_{n-w}$, and obtain $\mathbf{K}_n^{-1}$ from $\mathbf{D}$ with the Woodbury matrix identity (20).
Step 3: Compute $\chi_n$ and $\boldsymbol{\beta}_n$ for the newly added sample $\mathbf{x}_n$, together with the scalar $\chi_n - \boldsymbol{\beta}_n^T\mathbf{K}_n^{-1}\boldsymbol{\beta}_n$.
Step 4: Assemble $\mathbf{K}_w^{-1}(n+1)$ from $\mathbf{K}_n^{-1}$ with Equation (22).
To save further computing time, $\mathbf{k}^T_{\mu}(n+1)$ can also be updated recursively using $\mathbf{X}_{n,2:w}$, the data sample matrix with the 2nd to the $w$th vectors in the local causal sliding array window at $\mathbf{r}_n$, and $\mathbf{X}_{n+1,1:w-1}$, the data sample matrix with the 1st to the $(w-1)$th vectors in the window at $\mathbf{r}_{n+1}$; these two matrices contain the same $w-1$ samples. From $\mathbf{k}_{m1}(n)$ we obtain $\mathbf{k}_{mf}$, and then $\mathbf{k}_{m1}(n+1)$ by Equation (26). Using Equation (27), $\mathbf{k}_{m2}(n+1)$ is updated from $\mathbf{k}_{m2}(n)$. After obtaining $\mathbf{k}_{m1}(n+1)$ and $\mathbf{k}_{m2}(n+1)$, $\mathbf{k}^T_{\mu}(n+1)$ is updated recursively by Equation (24).
This paper is inspired by the real-time RX algorithms in [15,16], but the two approaches differ both in the detector on which the real-time framework is built and in the processing procedure. The real-time RX algorithm is built on the RX algorithm, whereas the proposed algorithm is derived from its nonlinear version, which is more complicated but more accurate. In the recursion of the real-time RX algorithm, the Woodbury matrix identity directly updates the inverse covariance matrix. The acceleration of the proposed algorithm is more involved: the matrix inversion lemma must be combined with the Woodbury matrix identity, first deriving $\mathbf{K}_n^{-1}$ from $\mathbf{D}$ and then, by applying the matrix inversion lemma again, updating $\mathbf{K}_w^{-1}(n+1)$ from $\mathbf{K}_n^{-1}$. Moreover, the estimated mean of the background data in the kernel space is also updated recursively in this paper.
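A possible $O(w)$ recursion for the kernel mean follows from writing $\mathbf{k}_{\mu}(n) = \frac{1}{w}\mathbf{K}_w(n)\mathbf{1}$. With $\boldsymbol{\eta}_n$ and $\boldsymbol{\beta}_n$ the kernel vectors of the removed and added samples against the $w-1$ shared samples, the entries for the shared samples change only by $(\boldsymbol{\beta}_n - \boldsymbol{\eta}_n)/w$, and the new last entry is $(\boldsymbol{\beta}_n^T\mathbf{1} + \chi_n)/w$. This sketch assumes $N_w = w$ and the Gaussian RBF kernel, and the helper names are hypothetical. It is not claimed to reproduce the paper's intermediate terms $\mathbf{k}_{m1}$, $\mathbf{k}_{m2}$, and $\mathbf{k}_{mf}$, only an update with the same result under this definition.

```python
import numpy as np

def rbf_kernel(A, B, c):
    """Gaussian RBF kernel exp(-||x - y||^2 / c) for every row pair of A and B."""
    sq = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / c)

def slide_mean(kmu_n, eta, beta, chi=1.0):
    """O(w) update k_mu(n) -> k_mu(n+1), where k_mu(n) = (1/w) K_w(n) 1.

    eta  : k(X_keep, x_old), kernel values of the sample leaving the window
    beta : k(X_keep, x_new), kernel values of the sample entering the window
    chi  : k(x_new, x_new), equal to 1 for the Gaussian RBF kernel
    """
    w = kmu_n.shape[0]
    kmu_next = np.empty(w)
    kmu_next[:-1] = kmu_n[1:] + (beta - eta) / w     # shared samples: swap x_old's term for x_new's
    kmu_next[-1] = (beta.sum() + chi) / w            # row mean for the new sample
    return kmu_next

rng = np.random.default_rng(3)
X, w, c = rng.normal(size=(12, 3)), 4, 5.0
kmu = rbf_kernel(X[:w], X[:w], c).mean(axis=1)       # direct k_mu for the first window
for n in range(w, len(X)):
    keep = X[n - w + 1:n]
    eta = rbf_kernel(keep, X[n - w][None, :], c)[:, 0]
    beta = rbf_kernel(keep, X[n][None, :], c)[:, 0]
    kmu = slide_mean(kmu, eta, beta)
print(np.abs(kmu - rbf_kernel(X[-w:], X[-w:], c).mean(axis=1)).max())
```

The update reuses $\boldsymbol{\eta}_n$ and $\boldsymbol{\beta}_n$, which the inverse update needs anyway, so the mean adds essentially no kernel evaluations of its own.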

Pavia University Dataset
The Pavia University (PaviaU) data were acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor during a flight over Pavia University, northern Italy. They consist of 115 bands ranging from 430 to 860 nm with a spectral resolution of 4 nm, and the spatial resolution is approximately 1.3 m. The image scene, shown in Figure 2, contains 610 × 340 pixels. In this study, the subarea shown in Figure 3a was cropped from the original image for the experiments. The subset contains 260 × 110 pixels and 103 bands after removal of low signal-to-noise ratio (SNR) bands. The ground-truth map is displayed in Figure 3b, and the detailed parameters are listed in Table 1.

Pavia Center Dataset
The Pavia Center hyperspectral dataset, also acquired by the ROSIS sensor, covers the center of Pavia, northern Italy, as shown in Figure 4. It is a 115-band image of 1096 × 715 pixels, of which only 102 bands were used for the experiments after removing low-SNR bands. In this experiment, a smaller subset of 115 × 115 pixels, shown in Figure 5a, was cropped from the original image; its parameters are listed in Table 2. In this subset, the background consists of a bridge, water, and shadows, and the anomalies are vehicles on the bridge, as shown in Figure 5b.

San Diego Airport Dataset
The San Diego Airport dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the San Diego airport area; the image is shown in Figure 6. It contains 400 × 400 pixels and 224 bands, 126 of which were used for the experiments. A smaller subset, shown in Figure 7a, was cropped from the larger image, and its ground truth is given in Figure 7b. Table 3 lists the parameters of this subset.

Experimental Results
In this section, three real hyperspectral datasets, collected by two different imaging sensors (ROSIS and AVIRIS), are used to evaluate the proposed algorithm experimentally.

Optimum Kernel Parameter on the LRTC-KRXD
This group of experiments explores the optimal Gaussian radial basis function kernel parameter c for the LRTC-KRXD. In these experiments, the local causal sliding array window width of the LRTC-KRXD was set by cross-validation to 70, 40, and 70 for the PaviaU, Pavia Center, and San Diego Airport datasets, respectively.
The receiver operating characteristic (ROC) curve, which plots detection probability against false-alarm rate, is a standard tool for quantitative performance analysis, and the area under the ROC curve (AUC) is also used to assess hyperspectral detection performance. Figure 8 shows the AUC of the LRTC-KRXD as the kernel parameter c varies for all three hyperspectral images. For the PaviaU dataset, the AUC is largest when c = 100, whereas for the Pavia Center and San Diego Airport datasets, the LRTC-KRXD achieves the best AUC when c = 10. As Figure 8 shows, the AUC for all three datasets rises rapidly between $c = 10^{-1}$ and $c = 1$, remains essentially flat up to $c = 10^{6}$, and then drops steadily until $c = 10^{7}$. Therefore, the LRTC-KRXD is insensitive to the kernel parameter c over a wide range of values.
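The AUC can be computed without drawing the ROC curve: it equals the probability that a randomly chosen anomaly pixel receives a higher detector score than a randomly chosen background pixel, with ties counted as one half. The sketch below computes it from average ranks (the Mann-Whitney statistic). The function name and the 0/1 label encoding of the ground-truth map are assumptions for illustration.

```python
def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) statistic.

    scores : detector outputs, larger means more anomalous
    labels : 1 for ground-truth anomaly pixels, 0 for background pixels
    Tied scores receive their average rank, which counts each tie as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1                                   # extend the group of tied scores
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0    # average 1-based rank of the group
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))     # prints 0.75
```

Here 3 of the 4 anomaly/background pairs are ordered correctly, hence 0.75. Sorting makes the cost $O(n \log n)$ in the number of pixels, which matters for full images.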

Effects of the Local Causal Sliding Array Window Width on the LRTC-KRXD
This group of experiments investigates the sensitivity of the LRTC-KRXD to the local causal sliding array window width. In these experiments, the window width of the LRTC-KRXD on the three images was varied from 10 to 130 in steps of 30. Based on cross-validation, the Gaussian radial basis function kernel parameter c of the LRTC-KRXD was set to 100 for the PaviaU dataset and to 10 for both the Pavia Center and San Diego Airport datasets.
Figure 9 shows the ROC curves of the LRTC-KRXD on the three images as the window width w varies between 10 and 130. For the PaviaU dataset (Figure 9a), detection performance is poor with w = 10 but improves as the window width grows; for w ≥ 70, the detection performances are comparable. For the Pavia Center dataset (Figure 9b), the ROC curve with w = 10 is the worst, and detection accuracy increases with the window width; from w = 40 onward, the ROC curves are similar. For the San Diego Airport dataset (Figure 9c), detection performance improves as w rises from 10 to 70 and remains essentially unchanged as w continues to grow.

Detection Performance of the LRTC-KRXD
These experiments explore the detection performance of the LRTC-KRXD, which was compared with three other anomaly detectors on the three hyperspectral datasets: two real-time detectors (GRTC-RXD and LRTC-RXD) and one non-real-time detector (LKRXD). For the LRTC-KRXD, the Gaussian radial basis function kernel parameter c was set by cross-validation to 100 on the PaviaU dataset and to 10 on both the Pavia Center and San Diego Airport datasets. The local causal sliding array window width w was set to 70 on the PaviaU and San Diego Airport datasets and to 40 on the Pavia Center dataset. For the LRTC-RXD, w was set to 400, 300, and 300 on the PaviaU, Pavia Center, and San Diego Airport datasets, respectively, to obtain the best and most stable outputs. For the LKRXD, the kernel parameter c was set by cross-validation to the same values as for the LRTC-KRXD on all three images, and the inner and outer window sizes were set to 5 and 11, respectively.
Figure 11 shows the ROC curves of the three real-time anomaly detectors and the non-real-time anomaly detector on the three HSI datasets. For the PaviaU dataset (Figure 11a), the curves of the LRTC-KRXD and the LKRXD are similar throughout and show a much higher detection probability than those of the GRTC-RXD and the LRTC-RXD. For the Pavia Center dataset (Figure 11b), the ROC curves of the LRTC-KRXD and the LKRXD are comparable and far outperform those of the GRTC-RXD and the LRTC-RXD. For the San Diego Airport dataset (Figure 11c), the LRTC-KRXD and the LKRXD again show similar detection performance, better than that of the GRTC-RXD and the LRTC-RXD. This is because the recursion of the LRTC-KRXD is derived exactly from the LKRXD, which mines nonlinear information through the kernel trick, so no information is lost in the process. The small differences between the LRTC-KRXD and the LKRXD have two causes: the LKRXD generally needs to compute the pseudoinverse of the Gram matrix for each detected pixel [32], whereas the LRTC-KRXD avoids this through recursion; and the LKRXD uses a local dual window, whereas the LRTC-KRXD uses a local causal sliding array window.

Computational Analysis of the LRTC-KRXD
Both computational complexity and computing time are important performance indicators for anomaly detection, especially for real-time processing. The computational complexity of the LKRX algorithm comes from the components of the LKRX formula in Equation (7): the current pixel in the feature space $\mathbf{k}^T_{\mathbf{r}}$, the background data sample mean $\mathbf{k}^T_{\mu}$, the Gram matrix $\mathbf{K}_w$, and its inverse $\mathbf{K}_w^{-1}$. First, $\mathbf{k}^T_{\mathbf{r}}$ is of order $O(\omega \cdot l)$, where $\omega$ is the size of the local causal sliding array window and $l$ is the number of bands. This term cannot be reduced by recursion, because the test pixel $\mathbf{r}$ changes every time the local causal sliding array window moves. Second, for $\mathbf{k}^T_{\mu}$, the multiplicative order is approximately $O(\omega)$ without recursive updating, whereas with the recursive update of Equation (23) it becomes negligible. Third, the multiplicative order of $\mathbf{K}_w$ in the LKRXD is $O(\omega^2 \cdot l) + O(\omega^2 \cdot e^m)$ $(m \in \mathbb{R})$, while in the LRTC-KRX detector, owing to recursive processing, the multiplicative order is not defined ("n.d." entries in Table 4). Lastly, the multiplicative order of $\mathbf{K}_w^{-1}$ is reduced from $O(\omega^3)$ to $O(\omega^2)$ by virtue of the matrix inversion lemma and the Woodbury matrix identity. The detailed computational complexity is given in Table 4, which shows that the $O(\omega^3)$ and $O(\omega^2 \cdot l)$ multiplicative terms are absent from LRTC-KRX detection; this greatly reduces the computational complexity and the computing time.
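A back-of-the-envelope count makes the last point concrete. Counting only the dominant multiplications of the inversion step, and ignoring constants and the $O(\omega \cdot l)$ cost of $\mathbf{k}^T_{\mathbf{r}}$ (assumptions of this illustration), a window of $\omega = 70$ samples, as used on the PaviaU and San Diego Airport datasets, gives

$$\frac{\omega^3}{\omega^2} = \frac{70^3}{70^2} = \frac{343{,}000}{4{,}900} = 70.$$

This ratio concerns the inversion step alone; it is not a prediction of the measured speedups in Table 5, which also depend on the remaining terms and on the different window configurations of the two detectors.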
The experiments were run on a 64-bit operating system with an Intel(R) Core(TM) i7-4770K 3.5 GHz CPU and 16 GB of RAM. Each experiment was run five times and the results were averaged to reduce timing variability [15,16]. Table 5 shows the total computing time of the LKRXD and the LRTC-KRXD on the three hyperspectral datasets. Owing to the recursive update equations, the total computing time of the LRTC-KRXD is at least 44 times shorter than that of the LKRXD.

Discussion
The experiments analyzed the effect of the kernel parameter c on the LRTC-KRXD. Figure 8 shows that if c is less than 1 or larger than $10^{6}$, the AUC drops sharply for all three datasets, whereas over a wide range of the kernel parameter (e.g., from 10 to $10^{5}$), the AUC reaches a very high value and remains essentially stable. Therefore, the LRTC-KRXD is insensitive to the Gaussian radial basis function kernel parameter c over a wide range of values.
The effect of the local causal sliding window width on real-time detectors was also examined. The results in Figures 9 and 10 indicate that the LRTC-KRXD reaches its best detection performance with a relatively small local causal sliding window width (less than 100). By contrast, the LRTC-RXD needs hundreds of pixels to form the local background to achieve good detection performance [27]. Due to the requirement of causality, the pixels in the first local causal sliding window are not processed, which implies that anomalous targets located in the initial window will be missed. Since the LRTC-KRXD uses fewer pixels than the LRTC-RXD to form the local background, it has a lower probability of missing targets.
In the experiments, the proposed LRTC-KRXD was also compared with its original algorithm (LKRXD) and two other widely used algorithms (GRTC-RXD and LRTC-RXD). Experimental results on three real datasets illustrate the advantage of the proposed LRTC-KRXD method. The LRTC-KRXD produces detection output comparable to that of the LKRXD and achieves higher detection accuracy than the GRTC-RXD and the LRTC-RXD (Figures 11 and 12). In terms of computational complexity and processing time, Tables 4 and 5 show that the LRTC-KRXD is highly computationally efficient. Together, these results indicate that the LRTC-KRXD brings the detection accuracy of the kernel RX detector to real-time anomaly detection. Although the LRTC-KRXD achieves a speedup of at least 44-fold, its speed may still be insufficient for some practical applications. GPUs could be used to accelerate processing further through parallel computation.
In real-time detectors based on the RX algorithm, the covariance and autocorrelation matrices are $l \times l$, so their size depends on the number of bands. As the number of bands grows, the computational complexity and processing time increase considerably. For the LRTC-KRXD, however, the size of the Gram matrix is determined by the number of pixels in the local causal sliding window; the number of bands enters only through the kernel evaluations, which are of order $O(\omega \cdot l)$, so band growth has a much smaller effect on the LRTC-KRXD. This is an encouraging result, since modern hyperspectral sensors have very high spectral resolution and therefore acquire data in a large number of contiguous bands.
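The scaling argument can be checked with rough leading-order multiplication counts per pixel. This sketch ignores all constant factors and rests on two assumptions: a Woodbury-based recursive RX update costs about $l^2$ per pixel for its $l \times l$ inverse, and the recursive kernel update costs about $\omega \cdot l$ for the kernel vector plus $\omega^2$ for the inverse update. The window width of 70 is one of the settings used in the experiments; the function names are hypothetical.

```python
def rx_cost(l):
    """Assumed leading-order cost per pixel of a recursive RX update: l x l inverse update."""
    return l * l

def krx_cost(w, l):
    """Assumed leading-order cost per pixel of the recursive kernel RX update:
    w kernel evaluations over l bands, plus a w x w inverse update."""
    return w * l + w * w

w = 70  # window width used for PaviaU and San Diego Airport in the experiments
for l in (100, 200, 400):
    print(f"l = {l}: RX ~ {rx_cost(l):,}   kernel RX ~ {krx_cost(w, l):,}")
```

Doubling l quadruples the RX count but less than doubles the kernel RX count, which matches the behavior described above; where the two curves cross depends on the constants these counts leave out.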
A real-time anomaly detector processes the image pixel by pixel, so the detection result displayed for the current pixel is not affected by subsequent detection results. Accordingly, some weak anomalies detected early may remain visible in the detection result (Figure 14). This phenomenon does not occur in non-real-time anomaly detectors, which show the final detected anomalies after a single one-shot operation.

Conclusions
Real-time anomaly detection has promising applications and significant practical value. Most real-time anomaly detection algorithms are based on the RX detector; however, the detection output of real-time RX detectors is often unsatisfactory. To address this issue, this paper develops a new real-time processing framework based on the KRX detector. The kernel RX algorithm has better detection accuracy, but computing the Gram matrix and its inverse is computationally inefficient. By taking advantage of the matrix inversion lemma and the Woodbury matrix identity, these quantities can be updated recursively without repeated calculation. As a result, the complexity of the kernel RX algorithm is greatly reduced and the computing time is substantially shortened. Experimental results on three hyperspectral datasets indicate that the proposed real-time KRX detector achieves detection accuracy comparable to that of the original KRX algorithm with much shorter processing time.
In general, hyperspectral imaging has three data acquisition formats: band interleaved by pixel (BIP), which collects data pixel by pixel; band sequential (BSQ), which collects data band by band; and band interleaved by line (BIL), which collects data line by line. The method in this paper is designed for the BIP format, so real-time KRX implementations for the BSQ and BIL formats can be considered in future work. In addition, other ways to improve computational efficiency, for example GPU implementations or further matrix simplifications, could be explored to speed up real-time processing.
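The three formats differ only in how a sample at (row, column, band) maps to a position in the flat data stream. A minimal sketch with hypothetical function names shows why BIP suits pixel-by-pixel processing: all bands of a pixel are contiguous, so its full spectrum is available as soon as that pixel arrives, whereas BIL and BSQ scatter it across the stream.

```python
def offset(fmt, row, col, band, rows, cols, bands):
    """Position of sample (row, col, band) in a flat buffer of the given interleave."""
    if fmt == "BIP":  # pixel by pixel: all bands of one pixel, then the next pixel
        return (row * cols + col) * bands + band
    if fmt == "BIL":  # line by line: within a line, band 0 of every pixel, then band 1, ...
        return (row * bands + band) * cols + col
    if fmt == "BSQ":  # band by band: the whole image for band 0, then band 1, ...
        return (band * rows + row) * cols + col
    raise ValueError(f"unknown interleave format: {fmt}")

def spectrum(buf, fmt, row, col, rows, cols, bands):
    """Gather the full spectrum of one pixel from a flat buffer."""
    return [buf[offset(fmt, row, col, b, rows, cols, bands)] for b in range(bands)]

# A tiny 2 x 3 x 4 cube whose sample values encode their own coordinates.
rows, cols, bands = 2, 3, 4
buffers = {}
for fmt in ("BIP", "BIL", "BSQ"):
    buf = [None] * (rows * cols * bands)
    for r in range(rows):
        for c in range(cols):
            for b in range(bands):
                buf[offset(fmt, r, c, b, rows, cols, bands)] = 100 * r + 10 * c + b
    buffers[fmt] = buf

print(buffers["BIP"][:8])  # the first two pixels, all bands, back to back
print(spectrum(buffers["BSQ"], "BSQ", 1, 2, rows, cols, bands))
```

The same spectrum is recovered from every format, but only in BIP does it occupy one contiguous slice of the stream.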
dashed lines, where the farthest data sample vector $\mathbf{r}_{n-w}$ from $\mathbf{r}_n$ in the local causal sliding array window at $\mathbf{r}_n$ is removed from the local causal sliding array window at $\mathbf{r}_{n+1}$, while the most recent data sample vector $\mathbf{r}_n$ is added to the local causal sliding array window at $\mathbf{r}_{n+1}$.

Figure 1. Local causal sliding array windows at $\mathbf{r}_n$ and $\mathbf{r}_{n+1}$.

Figure 3. (a) The smaller image scene; (b) ground truth of the smaller image scene.

Figure 5. (a) The smaller image scene; (b) ground truth of the smaller image scene.

Figure 7. (a) The smaller image scene; (b) ground truth of the smaller image scene.

4.1. Optimum Kernel Parameter on the LRTC-KRXD
This group of experiments explores the optimum Gaussian radial basis function kernel parameter c for the LRTC-KRXD. In these experiments, the local causal sliding window width of the LRTC-KRXD is set to 70, 40, and 70 for the PaviaU, Pavia Center, and San Diego Airport datasets, respectively, based on cross-validation.
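The AUC used to compare settings of c can be computed directly from the detector scores and the ground-truth map. A minimal sketch, assuming one score per pixel and binary labels (1 = anomaly): the AUC equals the probability that a randomly chosen anomalous pixel scores higher than a randomly chosen background pixel, with ties counted as one half. The function name is hypothetical, and the pairwise loop is quadratic, so a sort-based version would be preferable for full images.

```python
def auc(scores, labels):
    """AUC as the probability that an anomaly outscores a background pixel.

    scores: detector output per pixel; labels: 1 for anomaly, 0 for background.
    Ties count as one half. O(P * N) pairwise loop, suitable for small examples only.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one background pixel")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: two anomalies and two background pixels.
scores = [0.9, 0.2, 0.5, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # 0.75: three of the four anomaly/background pairs are ranked correctly
```

A sweep like Figure 8 would call this once per value of c on the LRTC-KRXD output; that driver is not shown, since the detector statistic of Equation (7) is not reproduced here.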

Figure 8. The area under the curve (AUC) of the local real-time causal kernel RX detector (LRTC-KRXD) as a function of the kernel parameter c on the Pavia University (PaviaU), Pavia Center, and San Diego Airport datasets.

Figure 9. The receiver operating characteristic (ROC) curves of the LRTC-KRXD with varying local causal sliding array window width on the (a) PaviaU dataset; (b) Pavia Center dataset; (c) San Diego Airport dataset.

Figure 10 shows the grayscale results of the above experiments. When the local causal sliding window width is w = 10 (Figure 10a,f,k), detection is poor for the pixels next to the large anomaly in the direction of the window. This is because, at those positions, the small local causal sliding array window contains few pixels, so the background is easily corrupted by anomalous pixels inside the window. As the window width increases, this effect gradually disappears because the number of background pixels in the window grows. Once the window width of the LRTC-KRXD is at least w = 70, w = 40, and w = 70 on the PaviaU, Pavia Center, and San Diego Airport images, respectively, the grayscale outputs for each image are visually similar.


Figure 10. The grayscale results of the LRTC-KRXD with the changing local causal sliding array window width on the PaviaU, Pavia Center, and San Diego Airport datasets.

Figure 11. The ROC curves of four anomaly detectors on (a) PaviaU dataset; (b) Pavia Center dataset; (c) San Diego Airport dataset.

Figure 12 presents the grayscale outputs of the GRTC-RXD, LRTC-RXD, LKRXD, and LRTC-KRXD on the three hyperspectral images. For the PaviaU and Pavia Center datasets, visual inspection of Figure 12c,d,g,h reveals no appreciable differences, and these panels clearly show better grayscale results than Figure 12a,b,e,f. For the San Diego Airport dataset, the LRTC-KRXD and the LKRXD show better background suppression. Three-dimensional (3D) plots are used to examine the detailed differences among the GRTC-RXD, LRTC-RXD, LKRXD, and LRTC-KRXD on the three hyperspectral datasets. As we can see from the

Figure 14 shows the progressive detection procedures of the LRTC-KRXD on the PaviaU, Pavia Center, and San Diego Airport datasets. Using the local causal sliding array window, anomalies in the hyperspectral images are detected pixel by pixel in real time. In addition, as detection progresses, some weak anomalies appear with varying levels of background suppression. For example, on the Pavia Center image, the first three weak anomalies are clearly displayed in the progressive detection results, but when the strong anomaly is detected later, these weak anomalies become dim.

Table 1. Parameters of the Pavia University hyperspectral dataset. ROSIS is the Reflective Optics System Imaging Spectrometer.

Table 2. Parameters of the Pavia Center hyperspectral dataset.

Table 3. Parameters of the San Diego Airport hyperspectral dataset. AVIRIS is the Airborne Visible/Infrared Imaging Spectrometer.

Table 4. Computational complexity of the four components.

Table 5. Processing time on the real datasets.