A Preprocessing Method for Hyperspectral Target Detection Based on Tensor Principal Component Analysis

Abstract: Traditional target detection (TD) algorithms for hyperspectral imagery (HSI) typically suffer from background interference. To alleviate this problem, we propose a novel preprocessing method based on tensor principal component analysis (TPCA) to separate the background from the target. With TPCA, the HSI is decomposed into a principal component part and a residual part while the spatial-spectral information of the HSI is fully exploited, and TD is performed on the residual part. Moreover, the scheme treats the spatial and spectral domains of an HSI tensor differently, which is in line with their physical meanings. Experimental results on both synthetic and real hyperspectral data show that the proposed method outperforms other preprocessing methods in improving TD accuracy. Further, target detectors that combine the TPCA preprocessing approach with traditional target detection methods can achieve better results than state-of-the-art methods aimed at background suppression.


Introduction
Hyperspectral remote sensing for earth observation has attracted increasing interest in recent years. In addition to the ground cover's spatial distribution, hyperspectral imaging sensors can synchronously measure the reflectance of materials in hundreds of narrow and contiguous bands, spanning the visible-to-infrared range and even a wider part of the electromagnetic spectrum [1,2]. Therefore, each pixel of hyperspectral imagery (HSI) can be presented as a continuous spectral curve. HSI has been extensively applied in various fields, such as mineral exploration, environmental protection, and military affairs [3][4][5].
Target detection (TD) is one of the most important HSI processing technologies [6,7]. Different materials usually have distinct wavelength-dependent electromagnetic scattering and radiation characteristics, which essentially enables TD for HSI. In fact, TD amounts to solving a supervised binary problem in which pixels are labeled as target or background according to their similarity to a priori target spectra.
In recent decades, many TD algorithms have been developed. The probability and statistics-based model is one of the classic models for TD, and several traditional TD algorithms in the literature are based on it. For example, constrained energy minimization (CEM) [8] is a finite impulse response filter that minimizes the total output energy, subject to the constraint that the output for the target's spectral signature is one. In general, CEM can distinguish the targets from the background well.
The adaptive cosine/coherence estimation (ACE) [9] and the adaptive matched filter (AMF) [10] algorithms are based on Kelly's generalized likelihood ratio detection operator. These two detectors assume that the mixed background follows a multivariate Gaussian distribution and that additive noise has already been included in the background. Generally, the ACE and AMF detectors can be deduced from the generalized likelihood ratio test (GLRT). These algorithms usually provide closed-form solutions at a considerably low computational cost. Given these properties, CEM, ACE and AMF are commonly cited as classic TD methods in many works related to TD [11][12][13].
In fact, the vast majority of HSI pixels often belong to the background, while the target is present in only a few pixels. The aforementioned methods commonly suffer from background interference, which may give rise to false alarms. Therefore, it is very important to improve the performance of target detection algorithms by minimizing the interference of background pixels.
Several state-of-the-art hyperspectral target detection techniques have been proposed to cope with this issue. Two types of methods, i.e., background suppression and background separation, are usually considered. The hierarchical constrained energy minimization (hCEM) method [14] consists of different layers of classic CEM detectors, and the output of each layer is used as the coefficient of the next layer to suppress the background. The geometric matched filter (GMF) method [15,16] provides an inventive way to suppress the background by estimating the background endmembers. Both hCEM and GMF detectors are adept at suppressing the undesired background, which enables them to concentrate on the targets and achieve remarkable detection results.
Apart from background suppression, background separation also plays an important role in minimizing the background interference and has drawn increasing interest. It is thus very meaningful to develop a preprocessing method that separates the background from the target before TD is performed. Principal component analysis (PCA) [17] can be used to address this problem by decomposing the HSI into a principal component (PC) part and a residual part. The background is assumed to lie mainly in the PC part, while the target is assumed to be present in the residual part. Another well-known method, independent component analysis (ICA), which is an unsupervised blind source separation technique, has also been used in hyperspectral data processing [18] and background separation [19]. ICA can separate endmembers, which enables its application in hyperspectral target detection. Nevertheless, ICA only yields independent sources without their signal intensity, which makes it difficult to identify the part containing the target. In this sense, the connection between ICA and our feature extraction preprocessing scheme is considerably weak. Therefore, we consider PCA as a feasible approach for background separation.
However, PCA only takes advantage of the one-dimensional spectral features of samples and involves no spatial dimensions. As a result, PCA fails to employ the spatial information of HSI, which is an inevitable drawback because this spatial information is of great significance in HSI processing. This fact prevents PCA from being a desirable instrument for HSI processing.
Mathematically, a hyperspectral data cube is a three-dimensional tensor [20,21]. A tensor can be considered as an extension of a vector and a matrix, which provides a convenient way to capture inter-correlation among multiple dimensions. In this sense, the spectral and spatial domains of hyperspectral datasets can be integrated jointly in a tensor expression better than in two-dimensional matrices. Based on a tensor model, Tucker decomposition [22,23] can take spatial-spectral information into consideration, but it ignores the essential difference between the spatial and spectral domains of HSI. Consequently, both classic feature extraction methods, PCA and Tucker decomposition, perform poorly in reflecting the HSI's intrinsic physical properties.
Despite this, tensor-based algorithms have recently attracted increasing attention in the field of hyperspectral image processing [24][25][26]. Tensor principal component analysis (TPCA) has been successfully used for feature extraction in HSI classification [27]. Furthermore, we consider that the TPCA model should have much more potential in TD than the PCA and Tucker decomposition models.
The TPCA method is a tensorial counterpart of PCA. Not only can the spatial and spectral information of HSI be considered jointly, but they are also treated differently. In this sense, TPCA can overcome the drawbacks of PCA and Tucker decomposition, leading to a more accurate description of the HSI physical property. Inspired by this, we propose a TPCA-based TD preprocessing method. This method first applies TPCA to the HSI, so that the imagery is decomposed into the sum of the principal component imagery and the residual imagery. Since the target only takes up a very small number of pixels, it is assumed to appear in the residual imagery. In this way, target detection in the residual imagery of TPCA can largely reduce the interference of the background, which belongs to the principal component of the HSI, and produce better detection results. Experimental results demonstrate that with the proposed algorithm, the background and the target can be well separated, and better TD results can be obtained than those with other feature extraction methods and state-of-the-art methods aimed at background suppression.
The main contributions of our work can be concisely summarized as follows:
(1) A tensor principal component analysis (TPCA) based preprocessing method is proposed for hyperspectral target detection. Because the HSI is decomposed into a principal part and a residual part for detection, the effect of background interference on detection can be well alleviated.
(2) The spectral and spatial information of HSI is not only employed jointly but is also treated differently in the proposed preprocessing method. This characteristic is especially important for revealing the HSI physical property in detection, and it gives the proposed method superior performance compared with PCA and Tucker decomposition.
(3) TPCA-based detectors that combine the proposed preprocessing method with traditional detectors can significantly improve the performance of TD and can outperform some state-of-the-art TD methods.
Traditional hyperspectral target detectors suffer from background interference; thus, their detection performance is limited. Feature extraction preprocessing methods used for background separation, such as PCA and Tucker decomposition, fail to reveal the physical meaning of the HSI's spatial or spectral domains. The precise objective of this work is to provide a TPCA-based preprocessing method for TD through which both spatial and spectral information of HSI can be reasonably exploited.
The remainder of this paper is organized as follows. Section 2 reviews the related works. Our TPCA-based target detection algorithm is presented in Section 3. Section 4 shows the experimental results. The discussion is given in Section 5. Conclusions are drawn in Section 6.

Symbols and Notations
Representations of basic notations used throughout the paper are listed in Table 1.
Table 1. Basic notations.
$R_t$: Tensor space (each element is a tensor of a certain size)
$x_{tv}$: Tensor vector (vector defined in a tensor space)
$X_{tm}$: Tensor matrix (matrix defined in a tensor space)
$(x_{tv})_i$: $i$th entry of tensor vector
$F(X)$: The Fourier transform of $X$

Consider a hyperspectral image with $N$ pixels and $D$ bands, where each pixel vector can be denoted as $x_i \in \mathbb{R}^D$, $i = 1, 2, \ldots, N$, and let $d \in \mathbb{R}^D$ represent an a priori target spectrum. The CEM [8] aims at producing an optimal finite impulse response (FIR) filter $w$ [29,30]. Specifically, the optimal solution of $w$, which minimizes the output energy subject to the constraint that the filter's response to $d$ is 1, is given in Equation (1):
$$w_{CEM} = \frac{R^{-1} d}{d^T R^{-1} d} \tag{1}$$
where $R = (1/N)\sum_{i=1}^{N} x_i x_i^T$ is the estimated correlation matrix of the background. With the use of $w_{CEM}$, the output of a pixel $x_i$ for detection is defined as follows:
$$D_{CEM}(x_i) = w_{CEM}^T x_i \tag{2}$$
ACE [9] and AMF [10] are derived from Kelly's generalized likelihood ratio detection operator, which is based on the following competing hypotheses for target and non-target pixels:
$$H_0: x_0 = b, \qquad H_1: x_0 = S a + \sigma b \tag{3}$$
where $x_0$ denotes the mean-removed pixel's spectrum, and the columns of $S$ are a priori target spectrum vectors. $a$ is the corresponding abundance vector in the target pixel, whose entries represent the coefficients of several base material vectors [31]. The parameter $\sigma$ represents the intensity of the background in pixel $x_0$.
The null hypothesis $H_0$ assumes that the pixel $x_0$ contains no target information, so that it can be described with $b$, which is modeled by a multivariate normal distribution with zero mean and covariance matrix $\Sigma_b$ [32]. In contrast, the alternative hypothesis $H_1$ assumes that the target is present, and pixel $x_0$ can be linearly represented by the background and target jointly.
The GLRT approach [33] leads to the output of ACE as follows:
$$D_{ACE}(x_0) = \frac{(d_0^T \Sigma_b^{-1} x_0)^2}{(d_0^T \Sigma_b^{-1} d_0)(x_0^T \Sigma_b^{-1} x_0)} \tag{4}$$
where $d_0$ denotes the mean-removed a priori target spectrum vector.
Under the hypothesis that the dimension of the target subspace is one, the AMF detector can be presented as follows:
$$D_{AMF}(x_0) = \frac{(d_0^T \Sigma_b^{-1} x_0)^2}{d_0^T \Sigma_b^{-1} d_0} \tag{5}$$
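As a rough illustration of the three classic detectors, the following NumPy sketch computes CEM, ACE and AMF scores on toy data. It assumes the standard closed forms of these detectors and estimates the correlation and covariance matrices from all pixels; the function names `cem_filter` and `ace_amf` and the random test data are hypothetical and are not taken from the paper.

```python
import numpy as np

def cem_filter(X, d):
    """Return the CEM filter w for pixels X (N, D) and target spectrum d (D,)."""
    R = X.T @ X / X.shape[0]            # sample correlation matrix
    Rinv_d = np.linalg.solve(R, d)      # R^{-1} d without forming the inverse
    return Rinv_d / (d @ Rinv_d)        # normalised so that w^T d = 1

def ace_amf(X, d):
    """Return (ACE, AMF) scores for every row of X, using mean-removed data."""
    mu = X.mean(axis=0)
    X0, d0 = X - mu, d - mu
    Sigma = X0.T @ X0 / X.shape[0]      # background covariance estimate (assumed 1/N)
    Si_d = np.linalg.solve(Sigma, d0)   # Sigma^{-1} d0
    Si_X = np.linalg.solve(Sigma, X0.T).T
    num = (X0 @ Si_d) ** 2              # squared matched-filter numerator
    dd = d0 @ Si_d                      # d0^T Sigma^{-1} d0
    xx = np.einsum("ij,ij->i", X0, Si_X)  # x0^T Sigma^{-1} x0 per pixel
    return num / (dd * xx), num / dd

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, size=(500, 10))   # toy "image": 500 pixels, 10 bands
d = rng.normal(loc=5.0, size=10)          # toy target spectrum
w = cem_filter(X, d)
cem_scores = X @ w
ace_scores, amf_scores = ace_amf(X, d)
```

Solving linear systems instead of inverting the matrices is a numerical convenience only; it does not change the detector outputs.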

Feature Extraction
Two classic feature extraction algorithms are presented: PCA and Tucker decomposition.
The traditional PCA algorithm linearly transforms the samples in the original spectral space to vectors in the feature space, retaining most of the available information. For samples $x_1, x_2, \ldots, x_N \in \mathbb{R}^D$, singular value decomposition (SVD) [28] is first performed on their covariance matrix, as follows:
$$G = \frac{1}{N-1}\sum_{k=1}^{N} (x_k - \bar{x})(x_k - \bar{x})^T = U S V^T \tag{6}$$
where $\bar{x} = (1/N)\sum_{k=1}^{N} x_k$ is the sample mean, and $U$ and $V$ are two orthonormal matrices, i.e., $UU^T = VV^T = I$. Columns of $U$ and $V$ are called the left and right singular vectors of $G$, and the positive diagonal entries of $S$ are called the singular values of $G$. Then, using $U$ with the feature vectors arranged in descending order of the singular values, the PCA result of a test sample $y$ can be obtained by the following:
$$\hat{y} = U^T (y - \bar{x}) \tag{7}$$
where the leading elements of $\hat{y}$ are the most significant PCs of $y$ in the feature space.
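To make the PCA transform and its use for background separation concrete, here is a minimal NumPy sketch. It assumes a covariance normalised by N − 1 and, as a further assumption, assigns the sample mean to the PC part; the function name `pca_split` and the toy data are hypothetical.

```python
import numpy as np

def pca_split(X, n_pc):
    """Split pixels X (N, D) into a PC part and a residual part.

    Returns (pc_part, residual) with pc_part + residual == X.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    G = Xc.T @ Xc / (X.shape[0] - 1)     # sample covariance matrix
    U, s, _ = np.linalg.svd(G)           # singular values come sorted descending
    Y = Xc @ U                           # each row is U^T (x - mean)
    # Reconstruct from the leading n_pc components (assumed background) ...
    pc_part = Y[:, :n_pc] @ U[:, :n_pc].T + mean
    # ... and from the remaining components (assumed to hold the targets).
    residual = Y[:, n_pc:] @ U[:, n_pc:].T
    return pc_part, residual

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 8))  # correlated toy pixels
pc_part, residual = pca_split(X, n_pc=3)
```

Because the columns of U are orthonormal, the two parts sum back to the original pixels exactly (up to rounding).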
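Tucker decomposition [22,23] is a feature extraction algorithm based on tensor models. For a third-order tensor $\mathcal{X} \in \mathbb{R}^{D_1 \times D_2 \times D}$, where $D_1$ and $D_2$ represent the two spatial dimensions of the HSI and $D$ is the number of bands, the form of Tucker decomposition is as follows:
$$\mathcal{X} = \mathcal{G} \times_1 A \times_2 B \times_3 C \tag{8}$$
where $\times_n$ represents the n-mode product [22] (e.g., $[\mathcal{X} \times_n v]_{i_1,\ldots,i_{n-1},i_{n+1},\ldots,i_N} = \sum_{i_n} x_{i_1,\ldots,i_N} v_{i_n}$), $\mathcal{G}$ is the core tensor, and $A \in \mathbb{R}^{D_1 \times D_1}$, $B \in \mathbb{R}^{D_2 \times D_2}$ and $C \in \mathbb{R}^{D \times D}$ are the factor matrices in three different dimensions, respectively. The first rows in these factor matrices stand for the principal components and can be used for tensor reconstruction.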
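The n-mode product and a Tucker decomposition can be sketched in NumPy as follows. The factor matrices here are computed with the higher-order SVD (HOSVD), which is one common way to obtain a Tucker decomposition and is an assumption of this sketch rather than necessarily the procedure used in the paper; all function names are hypothetical.

```python
import numpy as np

def mode_n_product(T, M, n):
    """Multiply tensor T by matrix M (shape J x T.shape[n]) along mode n."""
    out = np.tensordot(M, T, axes=(1, n))   # contracted mode ends up as axis 0
    return np.moveaxis(out, 0, n)           # move it back to position n

def unfold(T, n):
    """Mode-n unfolding: mode n becomes the rows of a matrix."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def hosvd(T):
    """Full-rank HOSVD: returns the core tensor and one factor matrix per mode."""
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0]
               for n in range(T.ndim)]
    core = T
    for n, U in enumerate(factors):
        core = mode_n_product(core, U.T, n)
    return core, factors

def tucker_reconstruct(core, factors):
    """Rebuild the tensor as core x_1 A x_2 B x_3 C."""
    out = core
    for n, U in enumerate(factors):
        out = mode_n_product(out, U, n)
    return out

rng = np.random.default_rng(2)
T = rng.normal(size=(4, 5, 6))               # toy cube: 4 x 5 pixels, 6 bands
core, factors = hosvd(T)
rebuilt = tucker_reconstruct(core, factors)
```

Truncating the factor matrices to their leading columns before reconstruction would give a low-rank PC part, in the same spirit as the PCA case.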

Tensor Algebra
Given a second-order tensor A, which represents the reflectance values of an m × n local spatial neighborhood in a single band of HSI, and [A] i 1 ,i 2 denotes the (i 1 , i 2 )th entry of A, several definitions about tensor algebra are briefly introduced here [34][35][36].
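Tensor multiplication: For two second-order tensors $A, B \in \mathbb{R}^{m \times n}$, their tensor multiplication [34] $C = A \bullet B \in \mathbb{R}^{m \times n}$ is defined as the two-dimensional circular convolution:
$$[C]_{i_1,i_2} = \sum_{k_1=1}^{m}\sum_{k_2=1}^{n} [A]_{k_1,k_2}\,[B]_{\mathrm{mod}(i_1-k_1,\,m)+1,\ \mathrm{mod}(i_2-k_2,\,n)+1} \tag{9}$$
Tensor space: Each element in a tensor space is a tensor with a certain size [34,35]. Given a tensor space $R_t$, for clarity we suppose in this section that each element in $R_t$ is a tensor of $m \times n$ dimensions.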
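Assuming that the tensor multiplication is the two-dimensional circular convolution (as the TPCA section states), the following sketch compares a direct evaluation with the equivalent FFT-based one. The function names are hypothetical, and indices are 0-based as usual in Python.

```python
import numpy as np

def tensor_mul_direct(A, B):
    """2-D circular convolution of two m x n arrays, evaluated term by term."""
    m, n = A.shape
    C = np.zeros((m, n))
    for i1 in range(m):
        for i2 in range(n):
            for k1 in range(m):
                for k2 in range(n):
                    # indices of B wrap around the array borders
                    C[i1, i2] += A[k1, k2] * B[(i1 - k1) % m, (i2 - k2) % n]
    return C

def tensor_mul_fft(A, B):
    """Same product via the convolution theorem: multiply in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B)))

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))
C_direct = tensor_mul_direct(A, B)
C_fft = tensor_mul_fft(A, B)
```

The FFT route turns each tensor product into an elementwise product of Fourier coefficients, which is what makes a slice-by-slice implementation in the Fourier domain possible.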
Tensor matrix multiplication: Given tensor matrices $X_{tm} \in R_t^{D_1 \times D_2}$ and $Y_{tm} \in R_t^{D_2 \times D_3}$, their tensor matrix multiplication [34,35] is defined as follows:
$$[X_{tm} \bullet Y_{tm}]_{i_1,i_2} = \sum_{k=1}^{D_2} (X_{tm})_{i_1,k} \bullet (Y_{tm})_{k,i_2} \tag{10}$$
where $(X_{tm})_{i_1,k} \bullet (Y_{tm})_{k,i_2}$ is calculated according to Equation (9).
SVD of tensor matrix: Given a square tensor matrix $X_{tm} \in R_t^{D \times D}$, we have the singular value decomposition $X_{tm} = U_{tm} \bullet S_{tm} \bullet V_{tm}^T$. It can be proved that the tensor matrices $U_{tm}$ and $V_{tm}$ are both orthonormal tensor matrices [34,35].
Fourier transform of tensor matrix: Discrete Fourier transform (DFT) [37] transforms the digital data from the time domain into the frequency domain. The matrix DFT of X i 1 ,i 2 ∈ R D 1 ×D 2 [28] is carried out as follows: where j denotes an imaginary unit. Based on the matrix DFT, the Fourier transform of a tensor matrix X tm can be defined [36]. The Fourier transform of X tm is X ftm = F(X tm), and the DFT of the (ω

TPCA
TPCA is a tensorial extension of PCA, which performs feature extraction in a tensor space. Given samples (tensor vectors) $x_{tv,1}, x_{tv,2}, \ldots, x_{tv,N} \in R_t^D$, the sample mean is $\bar{x}_{tv}$, and the sample covariance tensor matrix is calculated similarly to Equation (6) as follows:
$$G_{tm} = \frac{1}{N-1}\sum_{k=1}^{N} (x_{tv,k} - \bar{x}_{tv}) \bullet (x_{tv,k} - \bar{x}_{tv})^T \tag{12}$$
Singular value decomposition is operated on $G_{tm}$:
$$G_{tm} = U_{tm} \bullet S_{tm} \bullet V_{tm}^T \tag{13}$$
Then, similar to Equation (7) in Section 2.3, a sample $y_{tv} \in R_t^D$ is transformed into the feature tensor space as follows:
$$\hat{y}_{tv} = U_{tm}^T \bullet (y_{tv} - \bar{x}_{tv}) \tag{14}$$
where $\bullet$ denotes the tensor matrix multiplication provided in Equation (10).
After $\hat{y}_{tv}$ is obtained, TPCA uses the tensor slice to restore it to a vector in the real number space: Since the tensor multiplication is defined in the form of a circular convolution, it can be implemented efficiently via the Fourier transform [37]. The fast implementation of TPCA via Fourier transform is carried out as follows in Algorithm 1.
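The Fourier-domain idea can be sketched as follows: after a 2-D FFT over the tensor dimensions, every frequency slice is an ordinary complex matrix problem, so a PCA-like split can be done slice by slice and transformed back. This is only a sketch under several assumptions: the conjugate (Hermitian) transpose is used in the Fourier domain, the sample mean is assigned to the PC part, and the final slicing step that restores real vectors is not shown. The function name `tpca_split` and the data layout are hypothetical.

```python
import numpy as np

def tpca_split(T, n_pc):
    """Split N tensor vectors T (N, D, m, n) into PC and residual parts.

    Each sample has D entries, and each entry is an m x n tensor.
    """
    N, D, m, n = T.shape
    Tf = np.fft.fft2(T, axes=(2, 3))          # Fourier transform of every entry
    pc_f = np.empty_like(Tf)
    res_f = np.empty_like(Tf)
    for w1 in range(m):
        for w2 in range(n):
            S = Tf[:, :, w1, w2]              # (N, D) complex slice
            mean = S.mean(axis=0)
            Sc = S - mean
            G = Sc.T @ Sc.conj() / (N - 1)    # Hermitian covariance of the slice
            U, _, _ = np.linalg.svd(G)        # columns sorted by singular value
            Y = Sc @ U.conj()                 # rows are U^H (x - mean)
            pc_f[:, :, w1, w2] = Y[:, :n_pc] @ U[:, :n_pc].T + mean
            res_f[:, :, w1, w2] = Y[:, n_pc:] @ U[:, n_pc:].T
    pc = np.real(np.fft.ifft2(pc_f, axes=(2, 3)))
    res = np.real(np.fft.ifft2(res_f, axes=(2, 3)))
    return pc, res

rng = np.random.default_rng(4)
T = rng.normal(size=(200, 6, 3, 3))           # 200 samples, 6 bands, 3 x 3 tensors
pc, res = tpca_split(T, n_pc=2)
```

The split is performed with projectors rather than with the feature vectors themselves, because singular vectors of a complex matrix are only defined up to a phase factor.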

Proposed Method
In this work, TPCA is introduced as a preprocessing method for hyperspectral target detection. The proposed method can utilize the spatial and spectral properties of HSI jointly, which is in line with the physical meanings of HSI.

Tensorization
TPCA requires the input sample to be a tensor vector, but each pixel of HSI is a vector in the real number space. Therefore, the HSI pixel vectors should be tensorized. Let a hyperspectral imagery be expressed as $\mathcal{X} \in \mathbb{R}^{D_1 \times D_2 \times D}$, whose pixels become tensor vectors defined over the tensor space $R_t$ after tensorization. To this end, the spatial neighborhood information of the pixel $x$ is used. Specifically, the pixels in its $n \times n$ spatial neighborhood are taken into consideration. Since there may be pixels whose $n \times n$ neighborhood is beyond the border of the imagery, the circular-shift neighborhood of $x$ is employed to form a tensor vector, which is defined in the $n \times n$ second-order tensor space. The tensorization formula, Equation (16), forms the tensor vector $x_{tv,(i_1,i_2)}$ of the pixel $x_{(i_1,i_2)}$ from this neighborhood.
We simply take advantage of tensorization to preserve the local spatial information of the HSI. For a pixel $x_{(i_1,i_2)}$ whose $n \times n$ spatial neighborhood is within the border of the imagery, tensorization simply selects the pixels of the spatial neighborhood of $x_{(i_1,i_2)}$. If the neighborhood is beyond the border, the pixels of the corresponding opposite side take the place of the vacancies, which is called the circular-shift neighborhood of $x_{(i_1,i_2)}$.
Tensorization enables the introduction of tensor approaches with spatial information into HSI processing. Notably, our approach merely adds local spatial information through tensorization, which is quite different from global solutions that treat the whole HSI as a tensor. As a result, the selected tensors contain more spatial information related to the center pixel $x_{(i_1,i_2)}$ than a single tensor with global consideration. This operation is critical for better reflecting the HSI physical property and helps to determine the background accurately.
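The circular-shift neighborhood can be illustrated with `np.roll`, which wraps pixels around the image borders. The layout below (neighborhood centred on the pixel, with one n × n tensor per band) is an assumption made for illustration, since the exact index convention of the tensorization formula is not reproduced here; `tensorize` and `tensorize_target` are hypothetical names.

```python
import numpy as np

def tensorize(X, n=3):
    """Map a cube X (H, W, D) to circular-shift neighbourhoods (H, W, D, n, n)."""
    H, W, D = X.shape
    c = n // 2                                   # offset of the centre position
    out = np.empty((H, W, D, n, n), dtype=X.dtype)
    for a in range(n):
        for b in range(n):
            # position (a, b) holds the pixel at offset (a - c, b - c);
            # np.roll wraps around, so border pixels borrow from the opposite side
            out[:, :, :, a, b] = np.roll(X, shift=(c - a, c - b), axis=(0, 1))
    return out

def tensorize_target(t, n=3):
    """A target spectrum has no neighbours, so every position repeats t itself."""
    return np.broadcast_to(t[:, None, None], (t.shape[0], n, n)).copy()

rng = np.random.default_rng(5)
X = rng.normal(size=(6, 7, 4))                   # toy cube: 6 x 7 pixels, 4 bands
X_tv = tensorize(X, n=3)
t_tv = tensorize_target(X[2, 3], n=3)
```

With this layout, `X_tv[i, j]` is the tensor vector of pixel `(i, j)`, and its centre position reproduces the original pixel spectrum.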

TPCA-Based Target Detection
In the TPCA of Section 2, the transformation tensor matrix $U_{tm}$ obtained in Equation (13) is orthonormal. Therefore, $U_{tm}$ can be used to transform the sample tensor vectors in the feature tensor space back into the original tensor space. Similar to PCA, the first few columns of $U_{tm}$ are used as the PC part and the remaining columns as the residual part. These two separated parts are also orthonormal. By using them as transformation matrices, the inverse transformation is performed on the TPCA-transformed pixels in the feature space. Then, the PC part $\mathcal{X}'$ and the residual part $\mathcal{E}$ are obtained, and $\mathcal{X} = \mathcal{X}' + \mathcal{E}$.
It is assumed that $\mathcal{X}'$ mainly corresponds to the background of the original imagery, and that the target exists in $\mathcal{E}$ because the target is usually scarce in HSI. Accordingly, TD can be performed on $\mathcal{E}$ using the PC-part-removed target spectrum to remove the background interference.
Specifically, the procedure of the proposed preprocessing approach includes three aspects. First, the original HSI and the a priori target spectrum are tensorized. Since the target spectrum contains no spatial information, its neighborhood spectra are set as itself in tensorization (Equation (17)), where $t \in \mathbb{R}^D$ is an a priori target spectrum. Sample pixels are randomly selected from $x_1, x_2, \ldots, x_N \in \mathbb{R}^D$. Second, the samples are tensorized and TPCA is performed to obtain a transformation tensor matrix $U_{tm}$. Finally, the first $n_{PC}$ columns of $U_{tm}$ are removed to obtain a residual transformation tensor matrix $U_{\mathcal{E},tm}$.
The number of PCs, denoted as $n_{PC}$, needs to be properly determined. The descent of the energy of the imagery is considered in this paper. The energy of the residual part is calculated according to the $\ell_2$-norm of $\mathcal{E}(n_{PC})$, where $\mathcal{E}(n_{PC})$ represents the residual part after the removal of $n_{PC}$ PCs. The relative residual energy can thus be represented as $\|\mathcal{E}(n_{PC})\|_2 / \|\mathcal{X}\|_2$, where $\|\mathcal{X}\|_2$ is the $\ell_2$-norm [38] of the whole imagery, calculated as the square root of the sum of its squared entries. The descending pace of the relative residual energy, $\Delta E$, is utilized for determining $n_{PC}$. Specifically, $n_{PC}$ is the smallest number of PCs for which $\Delta E$ drops below a threshold parameter $\delta$ (Equation (18)). $\delta$ is set as 0.5% in this paper for all datasets.
The first $n_{PC}$ rows of the feature tensor vector $\hat{y}_{tv}$ and of the target feature tensor vector $\hat{t}_{tv} = U_{tm}^T \bullet (t_{tv} - \bar{x}_{tv})$ are removed as well. Then, they are transformed into the original tensor space using the residual transformation tensor matrix $U_{\mathcal{E},tm}$. Finally, they are restored to the real number space to obtain the residual part $\mathcal{E}$ and the residual target spectrum $t_{\mathcal{E}}$. Traditional target detection algorithms are performed on $\mathcal{E}$ using $t_{\mathcal{E}}$ to obtain the TD results. Furthermore, the corresponding operation is sped up by the Fourier transform. The TD algorithm based on TPCA using the Fourier transform is presented as follows in Algorithm 2.
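The selection of the number of PCs can be sketched as a simple scan over the relative residual energies. The exact form of the stopping rule is an assumption here: the sketch returns the smallest number of removed PCs after which removing one more PC lowers the relative residual energy by less than δ. The function name `choose_n_pc` and the example energies are hypothetical.

```python
import numpy as np

def choose_n_pc(residual_norms, total_norm, delta=0.005):
    """Pick the number of PCs from residual l2-norms.

    residual_norms[k] is the l2-norm of the residual after removing k PCs
    (k = 0, 1, 2, ...); total_norm is the l2-norm of the whole imagery.
    """
    rel = np.asarray(residual_norms, dtype=float) / total_norm
    drops = rel[:-1] - rel[1:]            # drops[k]: gain from removing PC k + 1
    for k, drop in enumerate(drops):
        if drop < delta:                  # energy has (almost) stopped falling
            return k
    return len(drops)                     # threshold never reached

# Made-up relative energies that flatten out after the 4th PC.
example_norms = [1.0, 0.4, 0.1, 0.03, 0.02, 0.018, 0.017]
n_pc = choose_n_pc(example_norms, total_norm=1.0, delta=0.005)
```

With δ = 0.5%, the scan stops as soon as an additional PC changes the relative residual energy by less than half a percent of the total.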

Algorithm 2: TD based on TPCA via Fourier transform
Step 1. Tensorize each pixel of X and t by Equations (16) and (17).
Step 2. Randomly select part of the pixels as samples.
Step 3. Compute the Fourier transform of pixels and target spectrum:
end, thus we obtain the (ω 1 , ω 2 ) slice of the residual part in the Fourier domain.
Step 7. Compute the TPCA results t E and E in the real number space by Equation (15).
Step 8. Perform target detection with E and t E to obtain Z.

Synthetic Data Experiments
The background of the synthetic data comes from the Cuprite dataset collected by the Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) (http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes), containing 224 bands. After removal of low-SNR and water absorption bands, 188 bands remained. A region with the size of 250 × 191 pixels, shown in Figure 1a, is selected as the background. The embedded target spectrum (shown in Figure 1b) is Muscovite_GD111 from the U.S. Geological Survey (USGS) [39] spectral set, with the same bands being removed. Five targets with size 1 × 1 and five targets with size 2 × 2 (shown in Figure 1c) are randomly embedded in the background. Due to the low spatial resolution of HSI, mixed pixels widely exist in practice. The linear mixing model (LMM) [40] is a simple but physically meaningful model that is widely used for describing mixed pixels. Based on the LMM, a synthetic target spectrum $z$ is generated by mixing a desired target spectrum $t$ and a background pixel spectrum $b$ according to a random abundance fraction $f$ between 0 and 1:
$$z = f \cdot t + (1 - f) \cdot b$$
If the abundance fraction $f$ is large enough, the synthetic pixel $z$ becomes more similar to the target $t$, and the target can be detected easily. However, $z$ may be assigned to the background when $f$ is very small, which may result in the failure of detection. After that, 30 dB Gaussian white noise is added to the synthetic data.
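The embedding of a mixed target pixel and the addition of noise can be sketched as follows. The sketch interprets "30 dB Gaussian white noise" as a signal-to-noise ratio of 30 dB measured against the mean signal power, which is an assumption; the function names and the random stand-in spectra are hypothetical and do not use the actual Cuprite or USGS data.

```python
import numpy as np

def embed_target(b, t, f):
    """Linear mixing model: abundance f of target t mixed with background b."""
    return f * t + (1.0 - f) * b

def add_white_noise(X, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power = snr_db."""
    signal_power = np.mean(X ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return X + rng.normal(scale=np.sqrt(noise_power), size=X.shape)

rng = np.random.default_rng(6)
background = rng.uniform(0.1, 0.9, size=(100, 50))   # stand-in for 100 pixels, 50 bands
target = rng.uniform(0.1, 0.9, size=50)              # stand-in target spectrum
f = rng.uniform(0.0, 1.0)                            # random abundance fraction
synthetic = background.copy()
synthetic[10] = embed_target(background[10], target, f)
noisy = add_white_noise(synthetic, snr_db=30.0, rng=rng)
```

Setting f = 1 replaces the pixel by the pure target spectrum, while f = 0 leaves the background pixel unchanged.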

1) Parameters Analysis
First, the impact of the parameters on TD is analyzed using the synthetic data shown in Figure 1. The curve in Figure 2 shows how $\|\mathcal{E}(n_{PC})\|_2$ varies with $n_{PC}$ according to Equation (18). It can be observed that the energy of the residual part changes slowly after the 4th PC. Moreover, three, four, and five PCs are used in turn to obtain the residual part for TD. The experimental results are shown in Figure 3 and Table 2, indicating that not only is the criterion in Equation (18) effective, but the TPCA-based preprocessing method is also robust to the number of PCs. Bold entries in all the tables in this paper denote the highest AUCs.
Second, the impact of the sample rate on the TD results is investigated when 10-90% of the pixels of the dataset are tensorized as the training samples. The AUCs obtained for the ROC curves are shown in Figure 4. They indicate that the rate of training samples has no obvious effect on the detection results. Therefore, we select 40% of the pixels in the image as the training samples for the following experiments.
The detection accuracy is evaluated by receiver operating characteristic (ROC) curves [41] and the areas under the curves (AUCs). Based on the ground truth, the ROC curves describe the relationship between the detection probability $P_d$ and the false alarm rate $P_f$, which provides an unbiased, quantitative and threshold-free performance comparison of the different detectors. ROC curves have been widely used as a performance evaluation tool in TD applications. $P_d$ and $P_f$ are defined as follows:
$$P_d = \frac{N_d}{N_t}, \qquad P_f = \frac{N_f}{N_{all}}$$
where $N_d$ represents the number of detected target pixels under a certain threshold, $N_t$ represents the number of target pixels in the image, $N_f$ represents the number of background pixels mistaken as targets, and $N_{all}$ represents the total number of pixels in the image.
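The ROC curve and its AUC can be computed from a detector's score map and the ground truth as in the sketch below. It follows the definitions of the detection probability (detected targets over all targets) and the false alarm rate (false alarms over all pixels) given in the text, integrates with the trapezoidal rule, and does not treat tied scores specially; the function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_curve(scores, truth):
    """Return (P_f, P_d) arrays obtained by sweeping the threshold downwards."""
    scores = np.ravel(scores)
    truth = np.ravel(truth).astype(bool)
    order = np.argsort(-scores)              # highest score first
    hits = truth[order]
    n_d = np.cumsum(hits)                    # detected target pixels
    n_f = np.cumsum(~hits)                   # background pixels flagged as target
    p_d = np.concatenate(([0.0], n_d / truth.sum()))
    p_f = np.concatenate(([0.0], n_f / truth.size))   # divided by all pixels
    return p_f, p_d

def auc(p_f, p_d):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(p_f) * (p_d[1:] + p_d[:-1]) / 2.0))

toy_scores = np.array([0.9, 0.8, 0.1, 0.2, 0.3])
toy_truth = np.array([1, 1, 0, 0, 0])
p_f, p_d = roc_curve(toy_scores, toy_truth)
toy_auc = auc(p_f, p_d)
```

Because the false alarm rate is normalised by the total number of pixels, its maximum is the background fraction of the image, so a perfect detector on this toy example reaches an AUC of 0.6 rather than 1.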

Third, the impact of the size of the tensorization neighborhood is studied. Different neighborhood sizes are selected, with the number of PCs set as 4 and the sample rate set as 40%. The ROC curves and the running times of the TPCA algorithm are displayed in Figure 5 and Table 3, respectively. It can be observed that the best detection results are obtained when the size of the neighborhood is 3 × 3. This is because the 2 × 2 neighborhood fails to contain enough spatial information, while a neighborhood larger than 3 × 3 comprises superfluous spatial information. Additionally, TPCA with a neighborhood size larger than 3 × 3 is more time-consuming.

2) Quantitative Evaluation of TD Results
Finally, the performances of the feature extraction algorithms PCA and Tucker decomposition for TD, and the TD results obtained using the original imagery (noted as original), are compared in the following experiments. The numbers of PCs in PCA and in the spectral domain of Tucker decomposition are set to be the same as in TPCA. According to Figure 6, the numbers of PCs in the spatial dimensions of Tucker decomposition (noted as the PC number of SD1 and the PC number of SD2, respectively) are both set to five, corresponding to the best results. The sample rate of PCA is consistent with that of TPCA. Here, the number of PCs is set as 4, the sample rate is equal to 40%, and the spatial neighborhood size is 3 × 3 for TPCA.
Figure 7 shows the original imagery $\mathcal{X}$ of the synthetic data, its PC part $\mathcal{X}'$ and its residual part $\mathcal{E}$. The result clearly reveals that, with our TPCA method, the background mostly appears in the PC part and can be removed, while the targets are well preserved in the residual part. Using the preprocessed results, TD is performed on the residual part by CEM [8], ACE [9] and AMF [10], individually. It can be observed from the detection results in Figures 8 and 9 that, among the TD algorithms combining various feature extraction methods with CEM, ACE and AMF, the TPCA preprocessing method yields the best TD results.

2) Quantitative Evaluation of TD Results
Moreover, to further study the stability of the proposed algorithm, the experiments are repeated 20 times with randomly generated target pixel locations and embedded abundances. The AUCs are shown in Table 4. The experimental results show that the TPCA-based TD algorithm provides the best and most stable detection results. This is because, compared with the other preprocessing approaches, TPCA better reflects the physical properties of HSI.
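The repeated-trial evaluation can be sketched as follows. This is only an illustration of how an AUC can be computed and summarized over 20 runs: `toy_trial` is a hypothetical stand-in that draws random detector scores and does not reproduce the experiments or the values in Table 4.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a target outscores a background pixel."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()   # ties count as half
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def toy_trial(rng, n_pixels=2000, n_targets=20):
    """Hypothetical single run: random target locations, toy detector scores."""
    labels = np.zeros(n_pixels, dtype=bool)
    labels[rng.choice(n_pixels, n_targets, replace=False)] = True
    scores = rng.standard_normal(n_pixels) + 2.0 * labels
    return roc_auc(scores, labels)

rng = np.random.default_rng(0)
aucs = np.array([toy_trial(rng) for _ in range(20)])
print(f"AUC mean = {aucs.mean():.4f}, std = {aucs.std(ddof=1):.4f}")
```

Reporting the standard deviation alongside the mean AUC is one simple way to express the stability that the repeated experiments are meant to assess.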
By simply combining our TPCA-based preprocessing method with a traditional TD algorithm, a novel target detector can be obtained. For instance, our TPCA-based method can be combined with CEM [8] to form a TPCA + CEM detector. The experiments are repeated 20 times with randomly generated target pixel locations and embedded abundances. The experimental results in Table 5 indicate that the TPCA + CEM combination is superior in TD accuracy and stability to state-of-the-art detection algorithms designed to suppress background interference, such as hierarchical CEM (hCEM) [14] and the geometric matched filter (GMF) [15,16].
TD is also performed by CEM, ACE, and AMF using the original imagery and the residual parts of PCA, Tucker decomposition, and TPCA, respectively. The detection results are shown in Figures 12 and 13, the ROC curves in Figures 14 and 15, and the AUCs in Table 6. As with the synthetic data, the results on the real data reveal that the proposed method performs better than the other methods on both the HYDICE and Hyperion datasets. As for the synthetic data, the TPCA-based preprocessing method is also integrated with traditional detectors such as CEM. The ROC curves in Figure 16 and the AUCs and running times in Table 7 show that our method outperforms the state-of-the-art TD algorithms in terms of TD accuracy but has a higher time cost. The reason is the high computational complexity of the tensor calculations, which results from retaining the spatial relationships in numerous local image patches. Accelerating this procedure is therefore a meaningful goal for practical applications.

Experimental Results Analysis
The detection results shown in Section 4 indicate that our TPCA-based preprocessing method strongly suppresses background energy. The parameter analysis experiments on the synthetic data not only demonstrate that the criterion in Equation (18) for selecting the number of PCs is valid but also provide the best parameters for the later experiments: a TPCA sample rate of 40% and a neighborhood size of 3 × 3. This indicates that parameter selection in our TPCA-based algorithm is simple.
Figure 7 shows the PC part and the residual part. The PC part closely resembles the original synthetic data, and the targets lie in the residual part. This result demonstrates that our TPCA-based preprocessing method leaves most of the background information in the PC part, while the targets are preserved in the residual part. The detection results in Figure 8 illustrate the effectiveness of the different preprocessing methods with various TD methods. Compared with the other methods, fewer background pixels are detected as targets with the proposed TPCA-based preprocessing method. This can be explained by the fact that less background energy remains in the residual part of our method than in those of the others. Although Tucker decomposition is also good at background suppression, its residual part lacks target information. The ROC curves in Figure 9 support the visual results in Figure 8. A higher detection probability and a lower false alarm rate signify better target preservation and background removal. With the experiments repeated 20 times on randomly generated synthetic data, the results in Table 4 demonstrate the stability of our TPCA-based algorithm. Furthermore, our preprocessing method not only improves the performance of traditional target detectors but also yields new composite detectors by simply combining the TPCA method with traditional detectors. The results in Table 5 show that the TPCA + CEM detector is even more effective than the state-of-the-art hCEM and GMF detectors aimed at background suppression.
Using the parameter selection criterion mentioned above, detection is carried out on two real hyperspectral datasets, the HYDICE data and the Hyperion data. According to the detection results shown in Figures 12 and 13, conclusions similar to those of the synthetic data experiments can be drawn. Our TPCA-based approach performs better in both background separation and target preservation. Compared with PCA, Tucker decomposition, and the original imagery, the targets are more conspicuous and the results are clearer with TPCA-based preprocessing. A convincing example is the performance of the ACE detector on the Hyperion dataset. Targets are difficult to identify among the false alarm pixels in the detection results of the original image and of Tucker decomposition, and few targets are discerned in the PCA detection result. However, the ACE detector performs well on the TPCA residual part. The ROC curves shown in Figures 14 and 15, as well as the AUCs listed in Table 6, also confirm the better performance of our method. Figure 16 and Table 7 indicate that the TPCA + CEM combination detector rivals, and even surpasses, the state-of-the-art algorithms designed for background suppression.
In brief, the experiments on both synthetic and real data reveal the superiority of our TPCA-based preprocessing method over PCA and Tucker decomposition in terms of detection accuracy. Furthermore, an HSI target detector formed by combining TPCA with a traditional TD method such as CEM can outperform the state-of-the-art algorithms aimed at background suppression.

Advantages and Limitations
From the previous discussions, the advantages of the TPCA-based preprocessing method can be summarized as follows:
1. The TPCA-based method exploits the information of the one spectral dimension and the two spatial dimensions simultaneously, which is consistent with the physical properties of HSI. It achieves higher detection accuracy and stability than both traditional algorithms and state-of-the-art algorithms.
2. Very few parameters must be determined manually. According to the discussion in Section 4, the number of PCs can be calculated automatically by Equation (18), and our method is robust to different sample rates.
3. Since our TPCA-based preprocessing method can be combined with any traditional TD method (such as CEM), it offers great flexibility for practical use.
However, our method has an inevitably high time cost, which is its main limitation. Acceleration strategies will be investigated in future work, for instance, more efficient strategies for constructing the tensor computation and the use of parallel computation and GPUs.

Conclusions
In this paper, a novel TPCA-based preprocessing method for hyperspectral TD is proposed to alleviate background interference. The proposed method decomposes an HSI into the sum of a PC part and a residual part via TPCA. Because targets are scarce in HSI, the residual part can be used effectively to perform TD with greatly reduced background interference. Compared with other preprocessing methods, the TPCA-based method better reflects the physical properties of HSI. The experimental results on both synthetic and real hyperspectral datasets show that the accuracy of traditional TD methods can be greatly improved by using the proposed preprocessing method. Moreover, detection approaches that combine our TPCA-based preprocessing method with traditional TD methods outperform the state-of-the-art algorithms aimed at background suppression.
However, some issues remain to be addressed in further research. The high computational complexity associated with tensor methods should be reduced for practical use. A more effective strategy for constructing the tensor computation will be studied. Furthermore, we will focus on more efficient methods, such as parallel processing or GPU computing, to accelerate the computation.

Algorithm 1: Fast TPCA computation via Fourier transform
Input: $x_{tv,1}, x_{tv,2}, \ldots, x_{tv,N} \in \mathbb{R}^{D}_{t}$: training tensor vector samples; $y_{tv} \in \mathbb{R}^{D}_{t}$: test tensor vector sample
Output: $\hat{y} \in \mathbb{R}^{D}$: TPCA result in the real number space
Step 1. Compute the sample mean: $\bar{x}_{tv} = (1/N)\sum_{k=1}^{N} x_{tv,k}$
Step 2. Compute the Fourier transform of the samples: $x_{ftv,k} = F(x_{tv,k} - \bar{x}_{tv})$, $y_{ftv} = F(y_{tv} - \bar{x}_{tv})$
Step 3. For all slices
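Steps 1 and 2 of Algorithm 1 can be sketched in a few lines. The sketch rests on assumptions that the text here does not confirm: each tensor vector sample is stored as an array of shape (D, m, n), where m × n is the spatial neighborhood, and F is taken to be a two-dimensional FFT over the two neighborhood axes. The function name `center_and_fft` is hypothetical, and the steps from Step 3 onward are not covered.

```python
import numpy as np

def center_and_fft(train, test):
    """Steps 1-2: subtract the training mean, then FFT over the last two axes.

    train: (N, D, m, n) training tensor vector samples (assumed layout)
    test:  (D, m, n) test tensor vector sample (assumed layout)
    """
    mean = train.mean(axis=0)                              # Step 1
    train_f = np.fft.fft2(train - mean, axes=(-2, -1))     # Step 2, training
    test_f = np.fft.fft2(test - mean, axes=(-2, -1))       # Step 2, test
    return mean, train_f, test_f

# Toy input (assumed sizes): 50 samples, D = 8, 3 x 3 neighborhood.
rng = np.random.default_rng(0)
train = rng.standard_normal((50, 8, 3, 3))
test = rng.standard_normal((8, 3, 3))
mean, train_f, test_f = center_and_fft(train, test)
```

After the transform, every index of the last two axes gives one complex-valued slice of shape (N, D) for the training data, which is the form that a slice-wise computation would operate on.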

Figure 2. The curve of $E(n_{PC})^2$ varying with $n_{PC}$.

Figure 8. Detection results of the synthetic dataset with different feature extraction methods and traditional TD methods.

4.2. Real Data Experiments
Two publicly available hyperspectral datasets are employed in the real data experiments. The first dataset was collected by the HYperspectral Digital Imagery Collection Experiment (HYDICE) sensor. It contains 210 bands, of which 188 are used after removal of the low-SNR and water absorption bands. A region of interest of 80 × 100 pixels is used. The second hyperspectral dataset was collected by the Hyperion sensor and has 242 bands. After the low-SNR and water absorption bands are discarded, 155 bands remain. The pseudocolor images and ground truths of the two real datasets are shown in Figures 10 and 11, respectively. The number of PCs is selected according to Equation (18).

Figure 12. Detection results of the HYDICE dataset with different feature extraction methods and traditional TD methods.

Figure 13. Detection results of the Hyperion dataset with different feature extraction methods and traditional TD methods.

Figure 16. ROC curves with different target detection methods for the real data: (a) HYDICE; and (b) Hyperion.

Table 1. Representations of basic notations.

Table 2. AUCs of ROC curves for TPCA-based TD with different numbers of PCs.

Table 3. AUCs and running time with different neighborhood sizes.

Table 4. AUCs of ROC curves for TD for synthetic data with different feature extraction methods repeated 20 times.

Table 5. AUCs and running times for TD on synthetic datasets with different TD algorithms repeated 20 times.

Table 6. AUCs and running time for the TD of synthetic datasets with different TD algorithms.

Table 7. AUCs and running time for the TD of real datasets with different TD algorithms.