Tensor Discriminant Analysis via Compact Feature Representation for Hyperspectral Images Dimensionality Reduction

Abstract: Dimensionality reduction aims at reducing the spectral dimensionality of hyperspectral images while preserving their desirable intrinsic structure information. Tensor analysis, which can retain both the spatial and spectral information of hyperspectral images, has attracted increasing attention in the field of hyperspectral image processing. In general, a desirable low dimensionality feature representation should be discriminative and compact. To achieve this, a tensor discriminant analysis model via compact feature representation (TDA-CFR) is proposed in this paper. In TDA-CFR, traditional linear discriminant analysis is extended to tensor space to make the resulting feature representation more informative and discriminative. Furthermore, TDA-CFR redefines the feature representation of each spectral band by employing a tensor low rank decomposition framework, which leads to a more compact representation.


Introduction
Hyperspectral images offer a wealth of information about ground objects, which makes the precise analysis of different materials possible. On the other hand, the high spectral dimensionality not only leads to high computational and storage costs but also degrades the processing performance, especially when training samples are scarce, a phenomenon known as the "curse of dimensionality". Dimensionality reduction (DR) is a critical preprocessing step for hyperspectral images which aims at reducing the spectral dimensionality while keeping the desirable intrinsic structure information [1,2].
According to whether labeled training samples are used, existing dimensionality reduction methods can be divided into three categories: unsupervised, supervised and semi-supervised. Supervised methods require labeled samples, which may lead to a more discriminative low dimensionality subspace, but the cost of labeling samples in hyperspectral images is extremely high, which may limit the application of these methods in practice. The most widely used supervised dimensionality reduction method is Linear Discriminant Analysis (LDA). Unsupervised methods obtain the low dimensionality representation by mining the structural characteristics of the original dataset and need no labeled samples; Principal Component Analysis (PCA) is the most famous unsupervised criterion. To jointly exploit the advantages of supervised and unsupervised methods, semi-supervised criteria utilize the label information from a few labeled samples together with the structure information extracted from a large number of unlabeled samples [3,4].
From another perspective, existing DR methods can also be classified into two types: feature extraction [5][6][7] and feature selection [8,9]. Feature selection selects a feature subset which retains most of the original information of the hyperspectral image according to some designed criterion, while feature extraction projects the original dataset into a low dimensionality subspace with a designed projection matrix. Many dimensionality reduction methods have been proposed, but how to obtain a desirable lower dimensionality feature representation of the original hyperspectral image remains a challenge. LDA is the most popular discriminant analysis criterion; it aims at projecting the original high dimensionality dataset to a lower dimensionality subspace where samples with the same labels are close to each other and samples with different labels are far apart. With the advantage of enhancing the discrimination of the projected dataset, LDA has been widely used in the subspace learning community [10][11][12][13]. In traditional LDA based methods, the original hyperspectral image cube has to be converted to a matrix and the corresponding pixel samples are represented in vector form; this vectorization may destroy the intrinsic spatial region structure information.
To overcome the disadvantages of vectorization, a multilinear algebra framework, i.e., tensor analysis, has been introduced to the community of hyperspectral image processing [14][15][16][17][18][19]. By representing the original image cube as a 3-order tensor, tensor analysis can deal with the tensor samples directly. Furthermore, to jointly exploit the advantages of discriminant analysis and tensor analysis, the traditional LDA model has been extended to tensor space via different criteria. Yan et al. [20] proposed a multilinear discriminant analysis (MDA) model which treats the original images as two or three order tensors and achieves supervised dimensionality reduction. Tao et al. [21] extended the differential scatter discriminant criterion (DSDC) to tensor space and constructed the general tensor discriminant analysis (GTDA) framework. Nie et al. proposed a local within class scatter matrix criterion which overcomes the drawback of the Gaussian hypothesis in LDA [22]. Zhong et al. proposed an integrated spatial-spectral feature extraction method for hyperspectral images under the tensor analysis framework which characterizes the intrinsic formation of the original hyperspectral images more efficiently [23]. By constructing same-class and different-class patches with tensor samples, Zhang et al. proposed a discriminative analysis which maximizes the distances between samples of different classes while minimizing the distances between samples of the same class [19].
It can be observed from the analysis above that extracting the intrinsic spatial-spectral information of the original hyperspectral images and enhancing the discriminant ability are two critical issues in hyperspectral image dimensionality reduction. In vector sample based discriminant analysis methods, the spatial neighborhood information of hyperspectral images may be destroyed during vectorization, which may degrade the performance of dimensionality reduction. In addition, some tensor based discriminant analysis methods extend the discriminant analysis model into tensor space to make the processed dataset more discriminative, but these methods are directly implemented on the raw spectral bands, which may degrade the dimensionality reduction performance due to the redundant, scattered and chaotic spectral information distribution.
In general, a compact feature representation is more informative and representative [24]. For hyperspectral images, the wealth of spectral information makes the analysis of land covers more efficient. It is noted, however, that the original information of a hyperspectral image is distributed randomly across all the spectral bands, and the information offered by different bands may be scattered, or even chaotic and conflicting. All of this may degrade the representative ability of the original spectral bands. To alleviate the conflicts between different bands and derive a compact feature representation, a low rank tensor decomposition based compact feature representation method (TDA-CFR) is proposed. In TDA-CFR, the hyperspectral image is treated as a 3-order tensor dataset, and a novel tensor decomposition criterion is proposed to eliminate the redundant spectral information and make the resulting feature representation more compact and informative. The flowchart of the proposed TDA-CFR is shown in Figure 1.
The main contributions of this paper can be summarized in three points: (1) by employing a fixed spatial window criterion, the proposed TDA-CFR constructs the training samples in 3-order tensor form, which preserves the spatial neighborhood information well and leads to a more effective representation of the hyperspectral image; (2) by employing a tensor low rank criterion, the proposed method obtains a more compact feature representation which eliminates the chaotic and conflicting information offered by different bands; (3) by integrating the compact feature representation into the tensor discriminant analysis framework, the proposed method obtains a compact and discriminative feature representation.
The rest of this paper is arranged as follows. Some related works are introduced in Section 2. TDA-CFR is described in detail in Section 3. In Section 4, experiments are conducted to evaluate the performance of TDA-CFR. Some important issues of the proposed method are discussed in Section 4, and Section 5 concludes this paper.

Linear Discriminant Analysis
LDA [25] aims at finding a projection direction which ensures that samples with the same labels are as close as possible in the low dimensionality subspace, while samples belonging to different classes are far from each other. Suppose all samples belong to C classes, x_ij denotes the j-th sample of class i, n_i is the number of training samples of class i, m_i is the mean of the samples belonging to class i and m is the mean of all samples. Then the within class scatter matrix can be defined as [21]

S_w = Σ_{i=1}^{C} Σ_{j=1}^{n_i} (x_ij − m_i)(x_ij − m_i)^T,

and, similarly, the between class scatter matrix can be defined as

S_b = Σ_{i=1}^{C} n_i (m_i − m)(m_i − m)^T.

With the difference scatter discriminant criterion, the objective function of LDA can be represented as

U* = arg max_U tr( U^T (S_b − ζ S_w) U ),

where ζ is a tuning parameter and U is the optimal projection matrix.
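A minimal NumPy sketch of these definitions; the function name and the symmetric eigen-solver choice are illustrative, not from the paper:

```python
import numpy as np

def lda_projection(X, y, n_components, zeta=1.0):
    """Difference scatter discriminant criterion: maximize tr(U^T (S_b - zeta*S_w) U).

    X: (n_samples, n_features) data matrix, y: integer class labels.
    """
    classes = np.unique(y)
    m = X.mean(axis=0)                        # global mean
    S_w = np.zeros((X.shape[1], X.shape[1]))  # within class scatter
    S_b = np.zeros_like(S_w)                  # between class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        d = (mc - m)[:, None]
        S_b += len(Xc) * (d @ d.T)
    # U: eigenvectors of (S_b - zeta*S_w) with the largest eigenvalues
    vals, vecs = np.linalg.eigh(S_b - zeta * S_w)
    return vecs[:, np.argsort(vals)[::-1][:n_components]]
```

Because `eigh` returns orthonormal eigenvectors, the resulting projection matrix has orthonormal columns.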

Tensor Analysis
In this section, some basic tensor operations are presented. For an N-order tensor X ∈ R I 1 ×I 2 ×···×I N , the related tensor operations are elaborated as follows.
n-mode flattening matrix: by fixing all the indices of X except i_n, we obtain the n-mode vectors of X. The n-mode flattening matrix X_(n) ∈ R^{I_n × (I_1···I_{n−1}I_{n+1}···I_N)} is constructed by taking all the n-mode vectors as columns.

n-mode product: given a matrix U ∈ R^{J×I_n}, the n-mode product is denoted as Y = X ×_n U, where "X ×_n U" means the tensor X is multiplied by the matrix U along mode n, and Y ∈ R^{I_1×···×J×···×I_N} with elements

Y_{i_1···i_{n−1} j i_{n+1}···i_N} = Σ_{i_n=1}^{I_n} x_{i_1···i_{n−1} i_n i_{n+1}···i_N} u_{j i_n}.

In addition, the n-mode product can be written in matrix form as Y_(n) = U X_(n).

Tucker decomposition: given the factor matrices U_n ∈ R^{I_n×R_n} (1 ≤ n ≤ N), the Tucker decomposition can be formulated as

X ≈ G ×_1 U_1 ×_2 U_2 ×···×_N U_N = Σ_{r_1=1}^{R_1} ··· Σ_{r_N=1}^{R_N} g_{r_1···r_N} u_{r_1}^(1) • u_{r_2}^(2) • ··· • u_{r_N}^(N),

where G ∈ R^{R_1×···×R_N} is the core tensor, u_{r_n}^(n) is the r_n-th column of U_n and "•" is the outer product of vectors.

Tensor Frobenius norm: the tensor Frobenius norm is calculated by

‖X‖_F = ( Σ_{i_1=1}^{I_1} ··· Σ_{i_N=1}^{I_N} x_{i_1···i_N}^2 )^{1/2}.
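These operations can be sketched in NumPy as follows; the helper names are illustrative:

```python
import numpy as np

def unfold(X, n):
    """n-mode flattening: the mode-n fibers of X become the columns of an
    (I_n, product of the other dims) matrix."""
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    """Y = X x_n U for U of shape (J, I_n): contract mode n of X with U."""
    Y = np.tensordot(U, X, axes=(1, n))  # contracted mode lands on axis 0
    return np.moveaxis(Y, 0, n)

# Matrix form of the n-mode product: unfold(Y, n) == U @ unfold(X, n)
X = np.random.rand(4, 5, 6)
U = np.random.rand(3, 5)
Y = mode_n_product(X, U, 1)    # Y has shape (4, 3, 6)
fro = np.sqrt((X ** 2).sum())  # tensor Frobenius norm
```

A rank-(R_1, ..., R_N) Tucker approximation is then a chain of mode-n products of a core tensor with the factor matrices.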

Tensor Discriminant Analysis
In this section, traditional LDA is extended to tensor discriminant analysis (TDA). Suppose p tensor samples X_1, X_2, ..., X_p ∈ R^{I_1×I_2×···×I_N} belong to C classes, and n_i is the number of samples of class i. Similar to the discussion above, TDA aims at finding a series of factor matrices U_1, U_2, ..., U_N, U_n ∈ R^{I_n×I'_n} (n = 1, 2, ..., N), which project the original tensor dataset to a low dimensionality subspace,

Y_ij = X_ij ×_1 U_1^T ×_2 U_2^T ×···×_N U_N^T,

where Y_ij is the projected data of X_ij. The projected mean tensor of class i in the low dimensionality subspace can be represented as

M_i = (1/n_i) Σ_{j=1}^{n_i} Y_ij,

and, similarly, the projected mean tensor of all samples in the low dimensionality subspace can be denoted as

M = (1/p) Σ_{i=1}^{C} Σ_{j=1}^{n_i} Y_ij.

Maximizing the between class separation while minimizing the within class scatter in the projected subspace leads to the coupled objectives of Equations (9) and (10),

arg max_{U_1,...,U_N} Σ_{i=1}^{C} n_i ‖M_i − M‖_F²,    (9)

arg min_{U_1,...,U_N} Σ_{i=1}^{C} Σ_{j=1}^{n_i} ‖Y_ij − M_i‖_F².    (10)

It is noted that solving for the N factor matrices in Equations (9) and (10) is a high order optimization problem in which all factors cannot be solved simultaneously. An iteration scheme is therefore adopted which calculates the optimal factor matrices by solving one factor matrix at a time while keeping the remaining ones fixed [26]. To apply this iteration scheme, the tensor samples are flattened along mode n, and the n-mode between class scatter matrix can be denoted as

B_n = Σ_{i=1}^{C} n_i (M̃_i − M̃)_(n) (M̃_i − M̃)_(n)^T,    (11)

where M̃_i and M̃ are the class mean tensor and the global mean tensor projected along all modes except mode n.
Similarly, the n-mode within class scatter matrix can be defined as

W_n = Σ_{i=1}^{C} Σ_{j=1}^{n_i} (X̃_ij − M̃_i)_(n) (X̃_ij − M̃_i)_(n)^T,    (12)

where X̃_ij is the tensor sample X_ij projected along all modes except mode n.
To measure the distance of tensor samples in the low dimensionality subspace, the tensor Frobenius norm is chosen. According to the tensor operations above, maximizing the Frobenius norm distance between the projected class means can be reformulated as Equation (13),

arg max_{U_n} tr( U_n^T B_n U_n ),    (13)

and, similarly, minimizing the Frobenius norm distance between the projected samples and their class means is equivalent to minimizing the trace of the mode n within class scatter matrix, which can be denoted as Equation (14),

arg min_{U_n} tr( U_n^T W_n U_n ).    (14)

By adopting the difference scatter discriminant criterion [21,27], the objective function of TDA can be reformulated as Equation (15),

U_n* = arg max_{U_n} tr( U_n^T (B_n − ξ W_n) U_n ),    (15)

where ξ is the tuning parameter. Finally, the optimal factor matrix U_n is obtained by Equation (16), i.e., by taking the eigenvectors corresponding to the first I'_n largest eigenvalues of (B_n − ξ W_n).
The projected data of sample X_ij in the low dimensionality subspace can then be calculated by Equation (17),

Y_ij = X_ij ×_1 U_1^T ×_2 U_2^T ×···×_N U_N^T.    (17)
The detailed steps of TDA are illustrated in Algorithm 1.

Algorithm 1: Tensor Discriminant Analysis (TDA)
INPUT: Original cube hyperspectral image X ∈ R^{I_1×I_2×I_3} and the corresponding labels, the dimensionality of the projection space I'_1 × I'_2 × I'_3, the spatial size of tensor training samples B_1 and B_2, the number of training samples of each class n_i, parameter ξ, maximum iteration number T_max and iteration error tolerance ε.
Construct n_i tensor training samples for each class with spatial size B_1 × B_2.
Repeat until the change of the factor matrices falls below ε or T_max iterations are reached:
  For n = 1, 2, ..., N do
    compute B_n by Equation (11);
    compute W_n by Equation (12);
    compute the factor matrix U_n by Equation (16).
  end
OUTPUT: The optimal factor matrices U_n, n = 1, 2, 3.
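A compact NumPy sketch of this alternating scheme, assuming standard mode-n scatter forms; the initialization, the fixed iteration count and all names are simplifying assumptions, not the paper's implementation:

```python
import numpy as np

def unfold(X, n):
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def mode_n_product(X, U, n):
    return np.moveaxis(np.tensordot(U, X, axes=(1, n)), 0, n)

def tda(samples, labels, out_dims, xi=0.01, t_max=10):
    """Alternating optimization of the factor matrices U_n.

    samples: list of equally shaped ndarrays; out_dims: target size per mode.
    Each pass fixes all factors but one and takes the leading eigenvectors
    of (B_n - xi * W_n), as in Equation (16).
    """
    N = samples[0].ndim
    classes = np.unique(labels)
    # start from truncated identity factors (no learned projection yet)
    U = [np.eye(samples[0].shape[n])[:, :out_dims[n]] for n in range(N)]
    for _ in range(t_max):
        for n in range(N):
            def proj(X):
                # project along every mode except n with the current factors
                for k in range(N):
                    if k != n:
                        X = mode_n_product(X, U[k].T, k)
                return X
            B_n = np.zeros((samples[0].shape[n],) * 2)
            W_n = np.zeros_like(B_n)
            mean_all = proj(sum(samples) / len(samples))
            for c in classes:
                Xc = [proj(samples[i]) for i in range(len(samples)) if labels[i] == c]
                mean_c = sum(Xc) / len(Xc)
                D = unfold(mean_c - mean_all, n)
                B_n += len(Xc) * (D @ D.T)         # Equation (11)
                for X in Xc:
                    D = unfold(X - mean_c, n)
                    W_n += D @ D.T                  # Equation (12)
            vals, vecs = np.linalg.eigh(B_n - xi * W_n)
            U[n] = vecs[:, np.argsort(vals)[::-1][:out_dims[n]]]
    return U
```

A practical implementation would additionally stop early once the factor matrices change by less than ε between passes.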

Compact Feature Representation of Hyperspectral Images
The large number of spectral bands of hyperspectral images brings rich spectral information, but as far as representation ability is concerned, the original spectral bands may not be an ideal representation. There are at least two drawbacks of the original spectral bands: (1) there is redundant information among different spectral bands, especially between neighboring bands, which leads to high computational and storage costs; (2) the intrinsic spectral features of the original hyperspectral image are randomly distributed across all the spectral bands, and compared with a compact feature distribution, it is difficult to extract such dispersed information from all the spectral bands. These two drawbacks may result in a chaotic and even conflicting spectral representation which degrades the representativeness of the original spectral bands. To overcome these drawbacks, a novel tensor analysis based compact feature representation method for hyperspectral images is proposed.
The original hyperspectral image cube can be directly represented as a 3-order tensor dataset with two spatial dimensions and one spectral dimension. Suppose a hyperspectral image X ∈ R^{I_1×I_2×I_3}, where I_1 and I_2 are the spatial dimensions and I_3 is the spectral dimension. Low rank tensor decomposition aims at finding an approximate tensor X̂ ∈ R^{I_1×I_2×I_3} which satisfies Equation (18),

min_{X̂} ‖X − X̂‖_F²  s.t.  rank_1(X̂) = r_1, rank_2(X̂) = r_2, rank_3(X̂) = r_3,    (18)

where r_1, r_2, r_3 are the given ranks along each mode of X̂. It has been proved that Equation (18) can be converted to Equation (19), where U_n, n = 1, 2, 3, is constructed from the first r_n eigenvectors of R_n R_n^T and R_n is the n-mode flattening matrix of X [28]. The solution of Equation (19) can be represented as

X̂ = X ×_1 P_1 ×_2 P_2 ×_3 P_3,    (20)

where P_n = U_n U_n^T, n = 1, 2, 3. The purpose of compact representation is to concentrate the information that is randomly distributed across all the spectral bands. By extending PCA to tensor space, the eigenvectors corresponding to the larger eigenvalues of R_n R_n^T retain more of the intrinsic information of mode n. From this analysis it can be concluded that the concentration of the spectral bands can be achieved by concentrating the eigenvalues of the covariance matrix R_3 R_3^T along the spectral mode, so we re-sort the eigenvalues and denote the sorted eigenvalue array by δ. It is noted that the spatial region neighborhood information of the hyperspectral image must be kept, so the compact representation is only applied in the spectral domain. Here, the rank along each mode is set equal to the original size of the hyperspectral image, i.e., r_1 = I_1, r_2 = I_2, r_3 = I_3. The compact representation of X can then be calculated by Equation (21),
where Λ is the diagonal matrix corresponding to δ.
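Since the exact form of Equation (21) is not reproduced here, the following sketch implements one plausible reading of the construction: rotating the spectral mode into the eigenbasis of the spectral covariance R_3 R_3^T, sorted by decreasing eigenvalue, so that the energy concentrates in the first bands while the spatial modes are left untouched. All names are illustrative:

```python
import numpy as np

def compact_spectral(X):
    """Rotate the spectral mode into the eigenbasis of its covariance so that
    the information concentrates in the first output bands (full rank kept)."""
    I1, I2, I3 = X.shape
    R3 = X.reshape(-1, I3).T                 # 3-mode flattening, (I3, I1*I2)
    vals, vecs = np.linalg.eigh(R3 @ R3.T)   # spectral covariance eigenpairs
    order = np.argsort(vals)[::-1]           # re-sort eigenvalues descending
    U3 = vecs[:, order]
    return np.tensordot(X, U3, axes=(2, 0))  # X x_3 U3^T: new band axis
```

Because the rotation is orthogonal, the total energy of the cube is preserved; the per-band energy after rotation equals the sorted eigenvalues, hence it is non-increasing across bands.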
To illustrate the effect of compact spectral feature representation, some selected raw bands and compact representation bands of three real hyperspectral images are shown in Figure 2. It can be observed from Figure 2 that, for the raw representation, the spectral information is randomly distributed across all spectral bands, which makes it difficult to extract the intrinsic information of the hyperspectral images, while for the compact representation, the spectral information is mainly preserved in the first spectral bands and the latter bands contain little useful information.

Dimensionality Reduction of Hyperspectral Images by TDA-CFR
In general, an ideal feature representation of a hyperspectral image should be compact and discriminative. To achieve this, a hyperspectral image dimensionality reduction model called tensor discriminant analysis via compact feature representation (TDA-CFR) is constructed by integrating TDA and the compact representation. The proposed model consists of three main steps, i.e., compact spectral feature representation, tensor sample construction and dimensionality reduction.
For compact spectral feature representation, by setting the rank along each mode equal to the original size of the hyperspectral image, the compact spectral feature representation can be calculated by Equation (21).
To fully preserve the spatial region neighborhood information of the hyperspectral image, a tensor sample is constructed for each pixel with a fixed spatial window B_1 × B_2. For the compact representation X_compact ∈ R^{I_1×I_2×I_3}, the resulting tensor sample dataset, called the sub-tensor dataset, can be represented as X^sub_compact ∈ R^{B_1×B_2×I_3×(I_1·I_2)}, where B_1, B_2 is the spatial size of the tensor samples, I_3 is the spectral dimensionality and I_1 · I_2 is the number of tensor samples. In the framework of Tucker decomposition, the dimensionality reduction of the tensor samples can be achieved by Equation (22), where X^sub(i)_compact is the i-th tensor sample and Y_i is the corresponding low dimensionality projection. The critical issue of Equation (22) is the factor matrices U_n, which can be calculated by Equation (16). In Equation (16), we select the first I'_n eigenvectors of (B_n − ξW_n) along mode n, and the factor matrices can be denoted as U_n ∈ R^{I_n×I'_n}, I_n > I'_n, n = 1, 2, 3, where I_n is the original size along mode n and I'_n is the reduced dimensionality along mode n. It is noted that, to keep the spatial dimensionality of the hyperspectral image unchanged while reducing the spectral dimensionality, the spatial factor matrices U_1, U_2 are constructed from the leading eigenvectors along modes 1 and 2, while the factor matrix U_3 is constructed from the first I'_3 eigenvectors along mode 3. After rearranging the projected tensor samples Y_i, the dimensionality reduced dataset Y ∈ R^{I_1×I_2×I'_3} is obtained.
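The per-pixel tensor sample construction can be sketched as follows; the reflect padding at the image borders is an assumption (the paper does not state its border handling), and the names are illustrative:

```python
import numpy as np

def extract_tensor_samples(X, B1, B2):
    """One B1 x B2 x I3 tensor sample per pixel of the I1 x I2 x I3 cube X,
    centered on that pixel via reflect padding at the borders."""
    p1, p2 = B1 // 2, B2 // 2
    Xp = np.pad(X, ((p1, p1), (p2, p2), (0, 0)), mode='reflect')
    I1, I2, I3 = X.shape
    samples = np.empty((I1 * I2, B1, B2, I3))
    for i in range(I1):
        for j in range(I2):
            samples[i * I2 + j] = Xp[i:i + B1, j:j + B2, :]
    return samples
```

Each sample's center entry is the pixel it belongs to, so the projected samples can be rearranged back into an I_1 × I_2 × I'_3 cube afterwards.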
The detailed steps of TDA-CFR are shown in Algorithm 2. Different from the existing tensor based discriminant analysis methods, which are directly applied to the raw spectral bands of hyperspectral images, the proposed TDA-CFR integrates tensor discriminant analysis and compact feature representation in one framework, which eliminates the chaotic and conflicting information of the raw spectral bands and leads to a more discriminative and informative feature representation. On the other hand, the tensor sample volume of TDA-CFR is larger than that of vector based methods because the tensor sample criterion is employed, which may result in a longer running time.

Algorithm 2:
The Proposed TDA-CFR
INPUT: Original cube hyperspectral image X ∈ R^{I_1×I_2×I_3}, the spatial size of sub-tensor samples B_1 and B_2, the number of training samples of each class n_i, parameter ξ, maximum iteration number T_max and iteration error tolerance ε.
1. Calculate the compact representation X_compact ∈ R^{I_1×I_2×I_3} of the hyperspectral image by Equation (21).
2. Construct n_i tensor training samples for each class with spatial size B_1 × B_2.
3. Calculate the factor matrices U_n, n = 1, 2, 3 by Algorithm 1.
4. Calculate the low dimensionality projection of all tensor samples by Equation (22).
5. Rearrange the projected dataset Y_i.
OUTPUT: The dimensionality reduced dataset Y.

Experimental Results and Analysis
The performance of TDA-CFR, i.e., the classification accuracy, the sensitivity to different numbers of spectral bands and the computational efficiency, is presented in this section.

Experimental Setup
(1) Hyperspectral datasets. Three real hyperspectral images, namely Indian Pines, Pavia University and Salinas, are selected as the experimental datasets in the following experiments [29]. The pseudo-color and ground truth maps of these three hyperspectral images are shown in Figure 3.
(2) Comparison methods. The proposed TDA-CFR is a tensor discriminant analysis based dimensionality reduction method, so several related methods are chosen for comparison: Linear Discriminant Analysis (LDA); sparse and low rank graph for discriminant analysis (SLGDA) [30], which employs sparse and low rank constraints to achieve dimensionality reduction under the graph embedding framework; low rank tensor approximation (LRTA) [31], in which the tensor low rank decomposition criterion is directly implemented on the raw hyperspectral images; group tensor based low rank decomposition (GTLR) [32], in which the tensor samples of hyperspectral images are first grouped into clusters and the tensor low rank decomposition is then implemented on the obtained clusters; tensor discriminant analysis without compact representation (TDA), chosen to evaluate the effect of the compact feature representation; and tensor sparse and low rank graph based discriminant analysis (TSLGDA) [33], which is a tensor graph method with sparse and low rank constraints. The raw spectral feature (Original) is chosen as the benchmark.
Two popular classifiers, i.e., the Support Vector Machine (SVM) and Nearest Neighbor (1NN), are chosen. The Gaussian kernel function is employed in SVM and the SVM parameters are obtained by 5-fold cross validation. 10% of the samples are randomly selected to train the classifiers and the remaining samples are used for testing. To reduce the effect of randomness, all experiments are repeated 100 times and the mean values and variances are reported.
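A hedged sketch of this protocol with scikit-learn; the synthetic features, the parameter grid and the seeds are illustrative stand-ins for the real reduced datasets, not values from the paper:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# toy stand-in for the dimensionality-reduced pixel features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = rng.integers(0, 3, size=200)

# 10% of the samples train the classifier, the rest are test data
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.1, stratify=y, random_state=0)

# Gaussian (RBF) kernel SVM with 5-fold cross-validated parameters
grid = GridSearchCV(SVC(kernel='rbf'),
                    {'C': [1, 10, 100], 'gamma': ['scale', 0.01, 0.1]}, cv=5)
grid.fit(X_tr, y_tr)
oa = grid.score(X_te, y_te)  # overall accuracy on the held-out samples
```

In the paper's setting this whole loop would be repeated 100 times with fresh random splits, reporting the mean and variance of the scores.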
Let N be the number of all samples, C the number of classes, N_i the number of samples of class i and n_i the number of correctly classified samples of class i; the corresponding evaluation indexes can then be defined as follows.
OA evaluates the percentage of correctly classified samples among all samples, which is defined by Equation (23):

OA = (1/N) Σ_{i=1}^{C} n_i.

AA measures the mean of the correctly classified percentages over all classes, which is denoted by Equation (24):

AA = (1/C) Σ_{i=1}^{C} n_i / N_i.

CA is the percentage of correctly classified samples for an individual class, which is defined by Equation (25):

CA_i = n_i / N_i.
The Kappa coefficient is a multivariate statistical measure of classification accuracy which takes into account the uncertainty of the classification results and reflects classification errors more accurately. Suppose the confusion matrix M is denoted by Equation (26), where m_ij is the number of pixels which belong to class i and are classified to class j. Then, Kappa can be calculated by Equation (27):

Kappa = (p_o − p_e) / (1 − p_e),

where p_o = (1/N) Σ_{i=1}^{C} m_ii, p_e = (1/N²) Σ_{i=1}^{C} m_{i*} m_{*i}, and m_{i*} = Σ_{j=1}^{C} m_ij, m_{*i} = Σ_{j=1}^{C} m_ji. The Kappa coefficient lies between 0 and 1, and a higher Kappa coefficient indicates better consistency of the classification results.
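The four indexes can be computed from the confusion matrix as in this sketch (the function name is illustrative):

```python
import numpy as np

def classification_metrics(y_true, y_pred, C):
    """OA, AA, per-class accuracy and Kappa from a confusion matrix M,
    where M[i, j] counts class-i samples predicted as class j."""
    N = len(y_true)
    M = np.zeros((C, C), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    ca = np.diag(M) / M.sum(axis=1)  # Equation (25), per class
    oa = np.trace(M) / N             # Equation (23)
    aa = ca.mean()                   # Equation (24)
    p_o = np.trace(M) / N
    p_e = (M.sum(axis=1) * M.sum(axis=0)).sum() / N ** 2
    kappa = (p_o - p_e) / (1 - p_e)  # Equation (27)
    return oa, aa, ca, kappa
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 0, 1, 0], the confusion matrix is [[2, 0], [1, 1]], giving OA = 0.75, AA = 0.75 and Kappa = 0.5.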

Classification Results
The separability of the hyperspectral dataset after dimensionality reduction is an important index for evaluating dimensionality reduction. Classification experiments are employed to evaluate the separability of the dimensionality reduced hyperspectral dataset. The classification results (including OA, AA, Kappa coefficients and class-specific accuracy) are listed in Tables 1-3, and the classification maps are shown in Figures 3-5.
Indian Pines is a relatively complicated scene: there are small sample classes (classes 7 and 9) as well as classes with good consistency (such as classes 11 and 14, marked by white rectangles). TDA-CFR obtains the best performance in terms of OA, AA and Kappa. Specifically, for the small sample classes, TDA-CFR achieves about 94% and 81% classification accuracy on classes 7 and 9. For the classes with large sample numbers and good consistency, such as classes 11 and 14, TDA-CFR achieves about 98% and 99% classification accuracy. This desirable classification performance reveals the good dimensionality reduction performance of TDA-CFR. GTLR achieves the best classification performance on class 7, but its performance depends greatly on the spatial size of the tensor samples and degrades seriously if the spatial size is inappropriate. For Pavia University, it can be concluded from Table 2 and Figure 4 that, for classes with good spatial consistency (such as class 6, marked by a white rectangle), the tensor based methods have evident advantages over the vector based methods, which further reveals the superiority of tensor analysis in hyperspectral image processing. In addition, the proposed method also achieves desirable performance on the zonal and dotted classes (such as class 8, marked by a white oval).
The ground objects in Salinas have relatively large sample numbers and good spatial consistency, which leads to good classification performance for all the compared methods. It is noted that, by employing tensor discriminant analysis and compact spectral feature representation, the proposed method achieves desirable classification performance on classes 8 and 15 (marked by white ovals), which are easy to confuse. In addition, the proposed TDA-CFR achieves almost 100% classification accuracy on classes 2, 3, 6, 9 and 12 with the SVM classifier, which further demonstrates the excellent performance of TDA-CFR at extracting the discriminative and compact intrinsic information of the hyperspectral images.

The Effect of Reduced Dimensionality
The classification accuracy under different spectral dimensionalities is an important evaluation index of dimensionality reduction methods, and this issue is analyzed experimentally for TDA-CFR in this section. The dimensionality range is set to 2-50 with a step size of 2. The results are shown in Figure 6, which illustrates that, for all comparison methods, the OA increases with the dimensionality and then stabilizes. In addition, TDA-CFR achieves the optimal classification performance when the dimensionality is larger than 20, which further reveals the good performance of TDA-CFR in hyperspectral image dimensionality reduction. According to the experimental results, the reduced dimensionality is set as 30 in all the experiments.

Analysis of Computation Efficiency
The computational efficiency is analyzed in terms of running time with Matlab R2014a on a PC with an Intel Core i5-5490 CPU and 8 GB RAM. The results are listed in Table 4.
It can be observed from Table 4 that the running times of the vector based methods, i.e., LDA, SLGDA and Original, are shorter than those of the tensor based methods, such as GTLR, TDA, LRTA, TSLGDA and TDA-CFR. More specifically, GTLR and LRTA require much shorter running times than the other tensor based methods because the best rank values along each mode are given directly in these two methods, which avoids the complicated operation of rank estimation. Because the optimal factor matrices in TSLGDA are calculated by an iterative criterion which is time consuming, TSLGDA requires almost the longest running time. The running times of TDA and TDA-CFR are almost the same, which reveals that most of the running time is consumed in the discriminant analysis step.

Discussion
The proposed TDA-CFR involves several parameters, such as the spatial size of the tensor samples, the number of training samples and the tuning parameter. In this section, these parameters are discussed to evaluate the method more comprehensively.

Discussion of the Number of Training Samples
The number of training samples is an important issue for the proposed TDA-CFR. In this section, a classification experiment with the SVM classifier is conducted with the number of training samples varying from 2 to 20, and the results are shown in Figure 7. Generally speaking, the classification results improve as the number of training samples increases and then tend to be stable. It can be observed from Figure 7 that the number of training samples has only a slight effect on the OA, and the proposed TDA-CFR can achieve desirable performance even with only a few training samples. This may be because one tensor sample usually contains many pixel samples, so a number of pixel samples offering rich intrinsic information of the hyperspectral image remain even when only a few tensor samples are selected. This characteristic makes the proposed TDA-CFR more practical when labeled samples are scarce. The number of training samples for each class is set to 5 in all the experiments.

Discussion of the Spatial Size of Tensor Samples
The spatial fixed window criterion is employed to construct tensor samples in the proposed method, and the window size is an important issue. If too small a window is employed, each tensor sample contains only a few pixel samples and the spatial region neighborhood information cannot be well preserved. On the other hand, too large a window may cause pixel samples of different categories to be assigned to the same tensor sample, which destroys the spatial structure consistency. Here, the effect of the window size on the classification accuracy is examined experimentally and the results are illustrated in Figure 8. Specifically, we suppose the spatial window is a square with side length B. It can be observed from Figure 8 that, with a small spatial window such as 3, the classification performance is poor; as the window size increases, the classification accuracy increases and then stabilizes. The spatial window size is set as 9 × 9 in all the experiments.

Discussion of the Parameter ξ
As discussed above, the parameter ξ balances the effect of within class compactness and between class separation in the tensor discriminant analysis model. Here, the effect of ξ in terms of OA is examined experimentally. Specifically, ξ varies over [0.005, 0.01, 0.05, 0.1, 1, 5] and the experimental results with the SVM classifier are shown in Figure 9. It can be seen from Figure 9 that, when the value of ξ is small, such as 0.005, 0.01 or 0.1, the proposed method achieves the desirable classification performance and the corresponding variance is small. With a larger ξ, the classification performance may degrade evidently and the corresponding variance may also increase, which suggests the proposed method cannot work stably with a large ξ. The parameter ξ is therefore set as 0.01 in all the experiments.

Conclusions
A novel compact feature representation based tensor discriminant analysis model was proposed for hyperspectral image dimensionality reduction. Generally speaking, an ideal feature representation should be compact and discriminative. To this end, a tensor low rank decomposition framework is employed in the proposed TDA-CFR to obtain a more compact feature representation which eliminates the chaotic and conflicting information of the raw spectral bands. In addition, tensor discriminant analysis, which preserves the original spatial region information and enhances the discriminability of the processed dataset, is integrated into the compact feature representation model. Extensive experiments illustrate the superiority of TDA-CFR over the existing methods.
Finally, there are still some issues which deserve further consideration. In the proposed TDA-CFR, the spatial size of the tensor samples is determined experimentally; how to construct tensor samples with a spatial window of adaptive size deserves further study. In addition, how to integrate the process of dimensionality reduction and the subsequent tasks into one framework, making the results of dimensionality reduction more suitable for specific tasks, also requires further consideration.