Hyperspectral Remote Sensing Image Feature Representation Method Based on CAE-H with Nuclear Norm Constraint

Abstract: Due to the high dimensionality and high data redundancy of hyperspectral remote sensing images, it is difficult to maintain the nonlinear structural relationships of hyperspectral data in a dimensionality-reduced representation. In this paper, a feature representation method based on a higher-order contractive auto-encoder with nuclear norm constraint (CAE-HNC) is proposed. A nuclear norm constraint is introduced on the Jacobian matrix of the CAE; the nuclear norm has better sparsity than the Frobenius norm and can better describe the local low-dimensional structure of the data manifold. At the same time, a second-order penalty term is added, namely the Frobenius norm of the Hessian matrix of the hidden-layer representation with respect to the input, which encourages a smoother low-dimensional manifold geometry of the data. Experiments on hyperspectral remote sensing images show that the proposed CAE-HNC is a compact and robust feature representation method, which provides effective help for ground object classification and target recognition in hyperspectral remote sensing images.


Introduction
Hyperspectral remote sensing images are rich in spatial, spectral and radiation information, comprising hundreds or even thousands of spectral bands. They can fully reflect the subtle spectral features of surface objects, provide extremely rich information for the extraction of surface object information, and thus enable more detailed surface object classification [1][2][3][4]. In recent years, hyperspectral remote sensing images have attracted the attention of many scholars and have been widely used in ecological monitoring [5,6], medical diagnosis [7,8], military reconnaissance [9] and other important fields. As the number of spectral bands in hyperspectral remote sensing images increases, the problems of increased dimensionality and high data redundancy appear [10,11], complicating data processing [12]. To alleviate these problems, it is usually necessary to reduce the dimension of hyperspectral data [13,14]. Hyperspectral data has a specific nonlinear structure in the high-dimensional space, and this nonlinear structure is also the region where hyperspectral data are distributed and concentrated with high density [10,15]. A dimensionality-reduced representation of hyperspectral data can accurately describe the effective information in the data and retain its important information only by maintaining the nonlinear structural relationships in the data [16,17]. Therefore, it is necessary to study feature representation methods that can preserve the nonlinear structural relations in hyperspectral data.
Manifold learning algorithms can solve the feature dimensionality reduction problem in hyperspectral remote sensing images. In many manifold learning algorithms, the local structure of the manifold is represented by a local basis of the directions of variation, that is, the tangent plane at any point on the manifold. To construct the global manifold structure or global density, different algorithms propose various ways of stitching these local tangent planes together. In the past decade, scholars at home and abroad have proposed many improved manifold algorithms. For example, Wang et al. proposed an improved ISOMAP algorithm for the analysis of hyperspectral image features; it selects neighborhoods according to the spectral angle, avoiding neighborhood instability in the high-dimensional spectral space [18]. Wang et al. proposed a method combining UVE and LLE, which uses LLE to reduce the dimension of the image composed of the effective wavelengths and uses partial least squares discriminant analysis to establish the classification model [19]. To address the loss of higher-order information when local tangent space alignment is applied to manifolds, Yang et al. proposed a new algorithm that optimizes the extraction of local neighborhood information [20]; by optimizing the extraction of tangent vectors, it improves the dimensionality reduction of high-dimensional, non-uniformly distributed manifolds, effectively reconstructs the density curve of the low-dimensional coordinates, and adapts well to high-dimensional images. Huang proposed sparse discriminant embedding projection (SDE) [21], which exploits the advantages of both sparsity and manifold structure: it not only preserves the sparse reconstruction relation but also promotes the manifold structure of the discriminative data. Pan et al. put forward two-dimensional locality preserving projections (2DLPP), based on the locality preserving rule, which extracts features directly from the image matrix [22]; the algorithm better models the manifold structure inherent in image features, improves feature robustness, reduces the computational complexity and the final feature dimension, and achieves good results in both recognition accuracy and recognition speed. However, a potentially serious limitation of these manifold learning algorithms is that they rely on local generalization, mainly using training points near a point of interest to infer the local tangent planes. To overcome this drawback, Reference [23] proposed the neural-network-based contractive auto-encoder (CAE), which captures the local manifold structure around every input point through the leading singular vectors of the Jacobian matrix; the corresponding singular values specify how much local variation is plausible along the corresponding singular vector directions while remaining in the high-density region of the input space. To maintain a smoother nonlinear manifold structure of the data in the low-dimensional space, Rifai et al. proposed the higher-order contractive auto-encoder (CAE-H) based on the CAE [24]. Yu et al. proposed a stacked contractive auto-encoder (SCAE) to improve the robustness of feature extraction through unsupervised training and learning [25]. Aamir et al. proposed an improved variant of the CAE, named deep CAE, based on a layered architecture following a feed-forward mechanism; by encoding and decoding in each layer of the CAE, the reconstruction error is reduced and more robust features are obtained [26]. Ng et al. proposed a denoising-contractive auto-encoder (DCAE), which can learn robust feature representations from noisy and sparse feature vectors [27]. Zhang et al. proposed an ensemble deep contractive auto-encoder (EDCAE), which automatically learns invariant feature representations by designing a variety of different DCAE models [28]. Because the Jacobian penalty terms of the DCAE models have different characteristics, these models can handle various kinds of noisy data effectively; finally, an effective EDCAE model is designed using a combination strategy. The above CAE models can, to varying degrees, achieve invariance to feature jitter. However, because of the poor sparsity of the Frobenius norm in the CAE model, the low-dimensional manifold structure cannot be described effectively, which reduces the ability to express the effective information in the data.
In view of the high dimensionality and high data redundancy of hyperspectral remote sensing images, existing dimensionality-reduced representations of hyperspectral data cannot effectively maintain the nonlinear structural relationships in the data. Starting from the effective characterization of the nonlinear manifold structure of hyperspectral image data in a low-dimensional space, this paper analyzes the Frobenius norm approximation of the Jacobian matrix in the CAE, the geometric interpretation of the CAE, and the Frobenius norm approximation of the Hessian matrix in CAE-H, and proposes a feature representation method called the higher-order contractive auto-encoder with nuclear norm constraint (CAE-HNC). A nuclear norm constraint is introduced on the Jacobian matrix of the CAE; the nuclear norm has better sparsity than the Frobenius norm and can better describe the local low-dimensional structure of the data manifold. At the same time, a second-order penalty term is added, namely the Frobenius norm of the Hessian matrix of the hidden-layer representation with respect to the input, which encourages a smoother low-dimensional manifold geometry of the data.

Contractive Auto-Encoders
From the point of view of manifold learning, high-dimensional training data lie on a low-dimensional manifold [23]. Directions of variation in the data correspond to local changes along the manifold (along the tangent plane), while invariant directions in the data correspond to directions orthogonal to the manifold. Therefore, as long as we learn the varying and invariant directions in the data, the manifold structure of the high-dimensional data is characterized. The goal of contractive auto-encoders is to learn the manifold structure of high-dimensional data in a low-dimensional space. The two driving forces of CAE learning are the contraction penalty term, which pushes the learned features to be invariant in all directions (shrinking in all directions), and the reconstruction error term, which requires that the learned features can be reconstructed back into the input. During learning, the directions of variation in the data (that is, the directions of the manifold tangent plane) resist the force of the contraction penalty, which is reflected in large singular values of the corresponding Jacobian. The directions that do not resist the contraction correspond to the invariant directions in the data (orthogonal to the manifold), and the corresponding Jacobian gradients become very small. It follows that the contractive auto-encoder can effectively describe the low-dimensional manifold structure of the data and thus obtain a more compact data representation.

CAE Model
The contractive auto-encoder is a kind of regularized auto-encoder whose model structure is consistent with that of the traditional auto-encoder. The main purpose of CAE is to suppress disturbances of the training sample data in all directions and to achieve a local space contraction effect by adding a penalty term to the objective function of the traditional auto-encoder. The penalty term is the Frobenius norm of the Jacobian matrix of the hidden-layer representation with respect to the input, whose purpose is to contract the mapping of the feature space near the training data:

‖J_f(x)‖²_F = Σ_{ij} ( ∂h_j(x)/∂x_i )²  (1)

where J_f(x) is the Jacobian matrix of f at x, composed of the partial derivatives of the hidden representation h with respect to the input x, and ‖•‖_F is the Frobenius norm of the matrix; h contains the component functions and x contains the input variables. CAE adds Equation (1) to the loss function as a penalty term to reduce the sensitivity of the model to small changes in the input, so as to obtain a robust representation. This formula is used as a penalty term because of the following property: when the penalty term is small, the hidden-layer representation of the input signal is relatively flat, so that when the input changes to a certain extent, the hidden-layer representation does not change much, achieving insensitivity to input changes. Therefore, the loss function of CAE can be expressed as:

J_CAE(θ) = Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖²_F ]  (2)

where θ = {W, b} collects the weight matrix and the bias, D is the set of training samples, and λ is a hyperparameter controlling the strength of the penalty term, which can take any value between 0 and 1. The former term of the loss function makes the reconstruction error as small as possible, so that CAE retains as much information of the input signal as possible, while the latter term can be regarded as the information that CAE discards as far as possible.
Therefore, CAE ultimately captures the disturbance information in the training data, making the model invariant to such disturbances. According to Equation (2), the optimization problem can be described as:

θ* = argmin_θ Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖²_F ]

The learning parameters are θ = {W, b_h, b_y}. J_f(x) is the Jacobian matrix of the hidden layer with respect to the input x, J_f(x) = diag(σ′(Wx + b_h)) W, where diag(σ′(•)) is the diagonal matrix formed from the first derivative of the activation function evaluated at the pre-activation. When the stochastic gradient descent algorithm is used to solve for the parameters, the gradient of the loss function with respect to each parameter must be computed.
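As an illustration only (the paper's experiments used MATLAB; this NumPy sketch with toy dimensions is not the authors' implementation), the Jacobian J_f(x) = diag(σ′(Wx + b)) W of a single sigmoid encoder layer, and the closed-form value of the contraction penalty ‖J_f(x)‖²_F, can be computed as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encoder_jacobian(x, W, b):
    """Jacobian dh/dx of the sigmoid encoder h(x) = sigmoid(W x + b):
    J = diag(h * (1 - h)) @ W."""
    h = sigmoid(W @ x + b)
    return (h * (1.0 - h))[:, None] * W

def cae_penalty(x, W, b):
    """Closed-form squared Frobenius norm of the Jacobian:
    ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W[j, i]^2."""
    h = sigmoid(W @ x + b)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))

# toy setting: 5 input bands, 3 hidden units (illustrative sizes only)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

# the closed form agrees with the explicit Jacobian
assert np.isclose(cae_penalty(x, W, b), np.sum(encoder_jacobian(x, W, b) ** 2))
```

The closed form avoids materializing the full Jacobian during training, which is why the penalty of Equation (1) is cheap for a single-layer encoder.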
CAE can describe the complex local manifold structure around each data point through the singular value decomposition (SVD) of the Jacobian matrix at the input. The corresponding singular values specify how much local variation is trusted along the directions associated with the corresponding singular vectors, while remaining in the dense region of the input space.
The penalty term of the CAE loss function encourages insensitivity of h in all directions of the input space. This pressure is balanced by the need for accurate reconstruction, so that h is essentially sensitive to only a few input directions, which are required to distinguish nearby training samples. Since J_f(x) contains all the information needed to compute the sensitivity of h = f(x) to motion in any input direction, performing an SVD of J_f(x) produces an orthonormal basis of directions, ranked from most to least sensitive. The subset of the most sensitive directions in this orthonormal basis can be interpreted as spanning the tangent space of the manifold at the point x.
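The ranking of input directions described above can be sketched numerically. In this hedged toy example (all sizes and the 0.1 threshold are illustrative assumptions, not values from the paper), the SVD of the encoder Jacobian orders directions by sensitivity, and the sensitivity of h along each right singular vector equals the corresponding singular value:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# toy sigmoid encoder: 6 input dimensions, 4 hidden units
W = rng.normal(size=(4, 6))
b = rng.normal(size=4)
x = rng.normal(size=6)

h = sigmoid(W @ x + b)
J = (h * (1.0 - h))[:, None] * W          # Jacobian dh/dx at x

# SVD orders input directions from most to least sensitive
U, s, Vt = np.linalg.svd(J, full_matrices=False)

# the sensitivity of h along the i-th right singular vector equals s[i]
for i in range(len(s)):
    assert np.isclose(np.linalg.norm(J @ Vt[i]), s[i])

# a thresholded subset of top directions spans the estimated tangent plane
tangent_basis = Vt[s > 0.1 * s.max()]
```

In practice the threshold separating "sensitive" from "insensitive" directions is a modeling choice; here it is purely illustrative.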

Jacobian's Frobenius Norm Approximation
To evaluate the Frobenius norm of the Jacobian of the hidden-layer representation in the CAE penalty term, when computing ‖J_f(x)‖²_F this paper uses the method of Ref. [24]:

‖J_f(x)‖²_F = lim_{σ→0} (1/σ²) E_{ε~N(0,σ²I)} [ ‖f(x) − f(x + ε)‖² ]

where ε ~ N(0, σ²I) is an isotropic Gaussian distribution with variance σ². In theory, the smaller σ is, the more accurate the stochastic approximation. In practice, however, a larger σ is used, because it allows the regularizer to explore observation points relatively far from the data manifold. In fact, the above approximation is replaced by a random sampling approximation over a finite set of corruptions ε, giving the regularization function with respect to the parameters Θ:

R(Θ) = Σ_{x∈D} Σ_{ε} ‖f(x) − f(x + ε)‖²

The purpose is to compute the gradient with respect to Θ; the differential of R(Θ) with respect to Θ is obtained by differentiating this sampled objective directly, which avoids explicitly forming the Jacobian.
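The stochastic approximation above can be checked numerically. In this illustrative sketch (toy dimensions, σ and the sample count are arbitrary choices, not the paper's settings), the corruption-based estimate converges to the closed-form ‖J_f(x)‖²_F of a sigmoid encoder:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
x = rng.normal(size=5)

def f(v):
    return sigmoid(W @ v + b)

# closed-form ||J_f(x)||_F^2 for the sigmoid encoder
h = f(x)
exact = np.sum(((h * (1.0 - h))[:, None] * W) ** 2)

# stochastic approximation: corrupt x with isotropic Gaussian noise
sigma = 1e-3
n = 100_000
eps = rng.normal(scale=sigma, size=(n, 5))
diffs = sigmoid((x + eps) @ W.T + b) - h     # f(x + eps) - f(x), row-wise
approx = np.mean(np.sum(diffs ** 2, axis=1)) / sigma ** 2

# the Monte Carlo estimate matches the closed form to within a few percent
assert abs(approx - exact) / exact < 0.05
```

With a small σ the estimate is nearly unbiased; as the text notes, training typically uses a larger σ to probe points farther from the manifold, trading accuracy for broader regularization.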

Geometric Interpretation of CAE
The regularization term in CAE encourages the hidden-layer encoding h to be insensitive in all directions of the input space (i.e., the activation functions saturate), which means that points of the training set become indistinguishable in the representation (sensitivity here means the ability to distinguish). However, the reconstruction task in CAE requires the ability to identify different points in the training set, and this balance makes h sensitive to only a few directions in the input space, just enough to distinguish nearby points in the training sample set. The geometric interpretation is that these sensitive directions span the local tangent plane of the manifold. The tangent bundle of a smooth manifold consists of the tangent planes at all sample points on the manifold. Each tangent plane corresponds to a Euclidean coordinate system, or chart; in topology, an atlas is a collection of such charts. Although the collection of charts can form a non-Euclidean manifold, each individual chart is Euclidean.
Given a data set D, h(x) with x ∈ D satisfies the necessary condition of local injectivity; consider how to define a local chart around x through the properties of h. Since h must be sensitive to the change from a sample x to one of its neighbors x̃, but not to other changes, we expect this sensitivity to be reflected in the spectrum of the Jacobian matrix J(x) = ∂h(x)/∂x at each training point x. Assuming that the rank of J(x) is k, then ideally h(x + δ) and h(x) differ only if δ lies in the span of the singular vectors corresponding to the nonzero singular values of J(x). That is to say, the sensitive directions are the singular vectors corresponding to the nonzero singular values of J(x): when x becomes x̃, h(x) becomes h(x̃). In fact, J(x) also has many smaller singular values. Therefore, the SVD of J(x) is used: J(x) = U S Vᵀ.
Define the local chart at x as follows. The tangent plane H_x at sample x is defined as:

H_x = span(ℬ_x), where ℬ_x = { U_·k | S_kk > ϵ }

that is, the singular (column) vectors associated with the larger singular values. The left singular vectors of the SVD of the transposed Jacobian matrix (the gradient matrix) span the tangent plane. The coordinates of a vector on the basis ℬ_x are its coordinates on the tangent plane H_x. Based on this local linear approximation, the atlas described by the encoder function h is defined by the collection of these charts. Given training samples x ≠ x̃, sensitivity means h(x) ≠ h(x̃), and insensitivity means h(x) = h(x̃).

CAE-H Model and Its Norm Approximation
To improve robustness to small changes of the input, CAE-H improves the objective function on the basis of the CAE by adding a second-order penalty term, the Frobenius norm of the Hessian matrix of the hidden-layer representation with respect to the input:

‖H_f(x)‖²_F, where H_f(x) = ∂J_f(x)/∂x = ∂²f(x)/∂x²

and J_f(x) is the Jacobian matrix. The Frobenius norm constraint on the Hessian matrix is used in CAE-H to penalize curvature and encourage a smoother manifold structure. From the above, the objective function of CAE-H can be obtained as:

J_CAE-H(θ) = Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖²_F + γ‖H_f(x)‖²_F ]

Because a second derivative is added, the complexity of the model is greatly increased; the second derivative is therefore converted into first derivatives to reduce the computational cost. The Hessian Frobenius norm can be approximated as:

‖H_f(x)‖²_F = lim_{σ→0} (1/σ²) E_{ε~N(0,σ²I)} [ ‖J_f(x) − J_f(x + ε)‖²_F ]

Therefore, the final objective function of CAE-H is:

J_CAE-H(θ) = Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖²_F + γ E_ε ‖J_f(x) − J_f(x + ε)‖²_F ]
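The first-order approximation of the Hessian penalty can likewise be verified on a toy sigmoid encoder. This is a hedged sketch (dimensions, σ and sample counts are illustrative): for a single sigmoid layer the Hessian entries are ∂²h_j/∂x_i∂x_k = s″(a_j) W_ji W_jk with s″ = s(1−s)(1−2s), which gives a closed form to compare against the stochastic estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = rng.normal(size=(3, 4))
b = rng.normal(size=3)
x = rng.normal(size=4)

def jac(v):
    """Jacobian dh/dx of the sigmoid encoder at v."""
    hv = sigmoid(W @ v + b)
    return (hv * (1.0 - hv))[:, None] * W

# closed-form ||H||_F^2 for a single sigmoid layer:
# sum_j s''(a_j)^2 * (sum_i W[j,i]^2)^2, with s'' = s(1-s)(1-2s)
h = sigmoid(W @ x + b)
exact = np.sum((h * (1 - h) * (1 - 2 * h)) ** 2 * np.sum(W ** 2, axis=1) ** 2)

# stochastic estimate used by CAE-H:
# ||H||_F^2 ~= (1/sigma^2) E ||J(x + eps) - J(x)||_F^2
sigma = 1e-3
J0 = jac(x)
est = np.mean([np.sum((jac(x + rng.normal(scale=sigma, size=4)) - J0) ** 2)
               for _ in range(20_000)]) / sigma ** 2

assert abs(est - exact) / exact < 0.1
```

Only Jacobian evaluations are needed for the estimate, which is the computational point of the conversion from second to first derivatives.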

CAE-H Based on Nuclear Norm Constraint
The previous section discussed the shortcomings of the Frobenius-norm-constrained optimization model in the CAE. The advantage of the nuclear norm is that it yields sparsity in the low-dimensional manifold directions: the nuclear norm of a matrix equals the ℓ1 norm of its vector of singular values. The ℓ1 norm has good sparsity, makes the model easy to interpret, and characterizes the local low dimensionality of the manifold relatively well; it can fully retain the geometric characteristics of the original data, so that no information of the original data is lost, and the resistance to noise is strong. Therefore, this section proposes CAE-HNC.

Definition of Nuclear Norm and Its Jacobian Approximation
The nuclear norm function F(A), also known as the trace norm, is defined as:

F(A) = ‖A‖_* = tr( (AᵀA)^{1/2} ) = Σ_i σ_i(A)   (17)

where σ_i(A) are the singular values of A. The nuclear norm is a convex function that can be optimized efficiently, and it is the best convex approximation of the rank function on the unit ball of matrices with spectral norm at most one. When the matrix variable is symmetric and positive semidefinite, this heuristic reduces to the trace heuristic often used in control systems. In practice, the nuclear norm heuristic has been observed to produce very low-rank solutions, although a theoretical characterization of when it yields the minimum-rank solution was previously unavailable.

Theorem 1. The Jacobian matrix of the nuclear norm F(A) defined in Equation (17) with respect to A can be approximated as:

∂F(A)/∂A ≈ A (AᵀA)^{−1/2}

Proof of Theorem 1. Since the differential of the nuclear norm function can be expressed as dF(A) = tr( (AᵀA)^{−1/2} Aᵀ dA ), the Jacobian approximation of the nuclear norm function follows as ∂F(A)/∂A = A (AᵀA)^{−1/2}. The matrix square root involved can be handled through the Kronecker sum ⊕: if P ∈ ℝ^{m×m} and Q ∈ ℝ^{n×n}, the Kronecker sum is defined as P ⊕ Q = P ⊗ I_n + I_m ⊗ Q.
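The identities above can be checked numerically. This hedged sketch (random toy matrix, not from the paper) verifies that the SVD form and the trace form of the nuclear norm agree, that A(AᵀA)^{−1/2} equals the standard subgradient UVᵀ for a full-rank A, and that this gradient matches a finite-difference derivative:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(4, 3))

# nuclear norm as the sum of singular values
s = np.linalg.svd(A, compute_uv=False)
nuc_svd = s.sum()

# equivalent trace form: ||A||_* = tr((A^T A)^(1/2))
evals = np.linalg.eigvalsh(A.T @ A)              # eigenvalues of the PSD matrix A^T A
nuc_trace = np.sum(np.sqrt(np.clip(evals, 0.0, None)))

assert np.isclose(nuc_svd, nuc_trace)
assert np.isclose(nuc_svd, np.linalg.norm(A, 'nuc'))

# gradient A (A^T A)^(-1/2), which equals U V^T from the SVD for full-rank A
U, sv, Vt = np.linalg.svd(A, full_matrices=False)
lam, Q = np.linalg.eigh(A.T @ A)
grad = A @ (Q @ np.diag(1.0 / np.sqrt(lam)) @ Q.T)
assert np.allclose(grad, U @ Vt)

# finite-difference check of the gradient along a random direction D
D = rng.normal(size=(4, 3))
t = 1e-6
fd = (np.linalg.norm(A + t * D, 'nuc') - np.linalg.norm(A - t * D, 'nuc')) / (2 * t)
assert np.isclose(fd, np.sum(grad * D), atol=1e-4)
```

For rank-deficient A the inverse square root does not exist and UVᵀ (or a regularized inverse) must be used instead, which is why Theorem 1 is stated as an approximation.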

The Robust CAE-H with Nuclear Norm Constraints
The nuclear norm has the advantage of sparse low-dimensional manifold directions: it equals the ℓ1 norm of the singular value vector, and the ℓ1 norm has good sparsity, makes the model easy to interpret, and characterizes the local low dimensionality of the manifold relatively well, fully retaining the geometric characteristics of the original data without loss of information and with strong resistance to noise. CAE-HNC is still a regularized auto-encoder whose main purpose is to suppress disturbances of the training sample data in all directions; it achieves a better local space contraction effect by adding a Jacobian nuclear norm penalty term to the objective function:

J(θ) = Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖_* ]

The Frobenius norm of the Hessian can be used to encourage a smoother geometric structure of the data manifold. Combining this with Equation (21), the objective function of the higher-order contractive auto-encoder with nuclear norm constraint is obtained as:

J_CAE-HNC(θ) = Σ_{x∈D} [ L(x, g(f(x))) + λ‖J_f(x)‖_* + γ E_ε ‖J_f(x) − J_f(x + ε)‖²_F ]

The higher-order contractive auto-encoder with nuclear norm constraint is thus transformed into the following optimization problem:

θ* = argmin_θ J_CAE-HNC(θ)

The learning parameters of the solution are θ = {W, b_h, b_y}.
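Putting the pieces together, the CAE-HNC objective for a single sample can be sketched as follows. This is an illustrative reconstruction only, not the authors' MATLAB implementation: the tied-weight decoder and the values of λ, γ, σ and the corruption count are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(5)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cae_hnc_loss(x, W, b, c, lam=0.1, gamma=0.1, sigma=1e-2, n_eps=8, rng=rng):
    """Single-sample CAE-HNC objective (sketch): reconstruction error
    + lam * nuclear norm of the encoder Jacobian
    + gamma * stochastic Hessian penalty (1/sigma^2) E||J(x+eps) - J(x)||_F^2."""
    def jac(v):
        hv = sigmoid(W @ v + b)
        return (hv * (1.0 - hv))[:, None] * W

    h = sigmoid(W @ x + b)
    x_rec = sigmoid(W.T @ h + c)                      # tied-weight decoder
    rec = np.sum((x - x_rec) ** 2)

    J = jac(x)
    nuc = np.linalg.svd(J, compute_uv=False).sum()    # ||J||_* penalty

    eps = rng.normal(scale=sigma, size=(n_eps, x.size))
    hess = np.mean([np.sum((jac(x + e) - J) ** 2) for e in eps]) / sigma ** 2

    return rec + lam * nuc + gamma * hess

# toy sizes: 5 input bands, 3 hidden units
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)
c = rng.normal(size=5)
x = rng.normal(size=5)

loss = cae_hnc_loss(x, W, b, c)
assert np.isfinite(loss) and loss > 0.0
```

In practice the parameters would be optimized with stochastic gradient descent, with the nuclear norm subgradient given by Theorem 1.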

Solution Algorithm
Aiming at the specific problem of feature representation of wetland hyperspectral remote sensing images, this part proposes the CAE-HNC feature representation method. A regularization term with a nuclear norm constraint on the Jacobian matrix is designed, which realizes a sparse representation of the Jacobian matrix, makes the model easier to interpret, and makes the local low dimensionality of the manifold easier to describe. The parameter updating procedure in the CAE-HNC solution algorithm is given in Algorithm 1:

Experimental Results and Analysis
Three groups of hyperspectral remote sensing images were selected in this paper. To verify the effectiveness of the proposed algorithm, experimental simulation analysis was carried out on the three groups of hyperspectral images; feature extraction and representation were performed with CAE, CAE-H, SCAE, DCAE, and CAE-HNC, and a classification comparison was then conducted to verify the robustness of the proposed method. The experiments were run on a 64-bit Windows system (Intel(R) Celeron(R) G4900 CPU @ 3.10 GHz, 8 GB RAM), using MATLAB 2016 and ENVI 5.3 for simulation and verification.

The Robust CAE-H
In this paper, simulation experiments are carried out on three groups of data, all of which are obtained by spatial and spectral degradation simulation of hyperspectral images, as shown in Figure 1. The third data set, also acquired by the AVIRIS sensor, covers the Salinas Valley region of California. The image size is 512 × 217, for a total of 111,104 pixels; after removing the water absorption bands, 204 spectral bands took part in the experiment. The wavelength range is 0.4-2.5 μm and the spatial resolution is 3.7 m. The corresponding ground-truth map contains 16 classes of ground objects, including Fallow, Celery, etc.
The experimental data have undergone radiometric correction and geometric registration. The number of available samples for each data set is shown in Table 1.

Classification Results of Hyperspectral Image Features
To verify the effectiveness of the proposed CAE-HNC feature representation method based on the nuclear norm constraint, this paper uses the CAE-HNC algorithm to extract the feature representation of hyperspectral images and uses an SVM classifier to perform the classification. The SVM adopts the Gaussian RBF kernel, with the kernel parameter set to 0.3 and the penalty factor C = 50. To test the robustness of the proposed algorithm, 5% and 10% of the hyperspectral data were selected as training samples, and the CAE-HNC method was used for feature extraction. Then 40% of the extracted feature data were used as training samples and 60% as test samples for ground object classification with the SVM.
The proposed CAE-HNC method is compared with several methods, including the contractive auto-encoder (CAE), the higher-order contractive auto-encoder (CAE-H), the stacked contractive auto-encoder (SCAE) and the denoising contractive auto-encoder (DCAE). The experimental results were measured by the overall accuracy (OA) and the Kappa coefficient of the ground object classification in the hyperspectral images. Each experiment was repeated 10 times, and the average of the 10 results was used for comparison. The experimental results are shown in Table 2, where the results in bold are the highest accuracy and Kappa coefficient. Table 2 shows that the proposed CAE-HNC method can describe the low-dimensional manifold structure of local features of hyperspectral images well, with good robustness and satisfactory classification results. Figure 2 shows a magnified local region of the SVM classification results of several feature representation methods on the University of Pavia (UP) data set, with 5% training samples. As can be seen from Figure 2d-f, the proposed approach better eliminates salt-and-pepper noise in the classification map, so that the image becomes clearer and smoother. In fact, this is the result of effectively mitigating the phenomena of "different objects with the same spectrum" and "the same object with different spectra". For ground objects with similar spectra, such as grass and trees, or buildings composed of gravel and bricks, a certain degree of misclassification remains. The CAE-HNC method proposed in this paper gives more accurate classification results in the red area of Figure 2; the contours of trees and asphalt buildings are clearer and more consistent with the real ground features.

Conclusions
Hyperspectral data has a special nonlinear structure in the high-dimensional space, and this nonlinear structure is also the region where hyperspectral data are distributed and concentrated with high density. This nonlinear structural relation is not effectively described in the dimensionality-reduced representation of hyperspectral data, resulting in poor robustness of the feature representation of hyperspectral images. Therefore, this article focuses on feature representation learning methods for wetland hyperspectral images. Aiming to effectively depict the nonlinear manifold structure of hyperspectral data in a low-dimensional space, it analyzes the Frobenius norm approximation of the Jacobian matrix in the CAE, the geometric interpretation of the CAE, and the Frobenius norm approximation of the Hessian matrix in CAE-H, and proposes the feature representation method based on the higher-order contractive auto-encoder with nuclear norm constraint (CAE-HNC). By introducing a nuclear norm constraint on the Jacobian matrix of the CAE, the sparsity of the features is enhanced and the low-dimensional manifold structure of the data is characterized effectively; the second-order penalty term, the Frobenius norm of the Hessian matrix, is added to encourage a smoother low-dimensional manifold geometry. Experiments on hyperspectral images show that CAE-HNC is a compact and robust feature representation method, which provides effective help for ground object classification and target recognition in hyperspectral images.

Conflicts of Interest:
The authors declare no conflict of interest.