Hyperspectral Sea Ice Image Classification Based on the Spectral-Spatial-Joint Feature with the PCA Network

Sea ice is one of the most prominent causes of marine disasters occurring at high latitudes. The detection of sea ice is particularly important, and the classification of sea ice images is an important part of sea ice detection. Traditional sea ice classification based on optical remote sensing mostly uses spectral information only and does not fully extract the rich spectral and spatial information in sea ice images. At the same time, samples are difficult to obtain, and the resulting small sample sizes used in sea ice classification have limited the improvement of classification accuracy to a certain extent. In response to the above problems, this paper proposes a hyperspectral sea ice image classification method involving spectral-spatial-joint features based on the principal component analysis (PCA) network. First, the method uses the gray-level co-occurrence matrix (GLCM) and Gabor filter to extract textural and spatial information about sea ice. Then, the optimal band combination is extracted with a band selection algorithm based on a hybrid strategy, and the information hidden in the sea ice image is deeply extracted through a fusion of spectral and spatial features. Next, the PCA network is designed based on principal component analysis filters in order to extract the depth features of sea ice more effectively, and hash binarization maps and block histograms are used to enhance the separation and reduce the dimensions of features. Finally, the low-level features in the data form more abstract and invariant high-level features for sea ice classification. In order to verify the effectiveness of the proposed method, we conducted experiments on two different data collection points in Bohai Bay and Baffin Bay.
The experimental results show that, compared with other single-feature and spectral-spatial-joint feature algorithms, the proposed method achieves better sea ice classification results (94.15% and 96.86%) by using fewer training samples and a shorter training time.

This paper proposed a hyperspectral sea ice image classification method based on the spectral-spatial-joint feature with PCANet. The method uses fewer training samples to dig deeper into the textural, spatial, and spectral features implicit in hyperspectral data. The PCANet network was designed to produce improved sea ice image classification compared with other remote sensing image classification methods. The experimental results show that the proposed method can efficiently extract the depth features of remote sensing sea ice images with fewer training samples, and it has a better classification performance on the whole. Thus, it could be used as a new model for the classification of remote sensing sea ice images.


Introduction
Seawater accounts for about 70% of the global area, and sea ice accounts for 5-8% of the global ocean area. Sea ice is the main cause of marine disasters in high-latitude regions, and it is also an important factor affecting fishery production and construction manufacturing in mid-high latitude regions [1]. Therefore, sea ice detection has important research significance. As an important part of sea ice detection, sea ice image classification can extract the types of sea ice accurately and efficiently, which is of great significance in the assessment of sea ice conditions and the prediction of sea ice disasters [2].
In recent years, the continuous development of remote sensing technology has provided more data sources for sea ice detection. At present, common remote sensing data include the synthetic aperture radar [3], multispectral satellite images with medium and high spatial resolution, and hyperspectral images [4]. Of these, hyperspectral remote sensing data have the characteristics of wide coverage, high resolution, and multiple data sources. The data contain rich spectral and spatial information, which supports sea ice detection and classification [5]. At present, more and more hyperspectral images are being used for sea ice classification, all of which have a good classification effect [6]. However, the high dimensions of hyperspectral data bring many computational problems, such as strong correlation bands and data redundancy. In addition, due to the special environment of sea ice, it is difficult to obtain a sufficient number of sea ice samples. These problems bring great challenges to the classification of remote sensing sea ice images.
Traditional remote sensing image classification methods include maximum likelihood estimation, decision trees, and support vector machines (SVM) [7]. Most of these methods do not take full advantage of the spectral and spatial information in hyperspectral images, so it is difficult to solve the phenomenon of having "different objects with the same spectral information". Therefore, the introduction of effective spatial information can make up for the deficiency of using only spectral information. Common spatial feature extraction methods include the GLCM, Gabor filter, and the morphological profile [8]. The GLCM is based on a statistical method and has the advantages of strong robustness and recognition ability. The Gabor filter is based on the signal method and has a similar frequency and direction to the human visual system, so it is particularly suitable for textural representation and discrimination [9]. Zhang Ming et al. [10] used the GLCM to extract features and carried out classification research on sea ice through the SVM. Zheng Minwei et al. [11] used the GLCM to calculate the textural features of images and separated ice and water with the SVM. Yang Xiujie et al. [12] extracted Gabor spatial features from the PCA projection subspace to quantify the local direction and scale features and then combined these with the Gaussian mixture model for classification. For classification using the spectral-spatial-joint features of remote sensing images, there are two main approaches, pixel-based and object-based. Amin et al. [13] proposed a new semi-automatic framework for spectral-spatial classification of hyperspectral images, using pixel-based classification results as pixel reference and finally doing decision-level classification with object-based classification results. 
Although the above studies have achieved a certain classification effect, feature extraction is based on a shallow manual extraction method involving prior knowledge, and deeper feature information cannot be obtained, which also limits the accuracy of hyperspectral image classification to a certain extent.
Some studies have shown that image classification methods based on deep learning can obtain better classification results than traditional methods [14]. Since hyperspectral image classification based on deep learning was first introduced to the field of remote sensing in 2014 [15], the simultaneous extraction of spectral and spatial features through deep learning has become a major technical hotspot in the classification of remote sensing images. These algorithms include the deep belief network and the convolutional neural network [16]. In addition, the deep convolutional neural network is a relatively mature method that can be used for extracting spectral and spatial features simultaneously. Rather than relying on prior knowledge, the neural network can automatically learn advanced features. It reduces the large amount of repetitive and cumbersome data preprocessing work required for traditional algorithms, such as SVMs [17]. Chen Sizhe et al. [18] used the fully convolutional neural network to classify sea ice images and achieved a significantly superior average accuracy to that of traditional algorithms. The convolutional neural network model established by Wang Lei et al. [19] has the ability to recognize the differences between subtle features, which is conducive to intra-class classification. However, training a neural network requires a wealth of data and a large number of parameters, which is a huge challenge for computational performance. In order to avoid cumbersome parameter tuning, in 2015 Chen et al. proposed a deep learning baseline network, PCANet [20], which reduces the computational overhead and achieved good results in tasks such as image classification and target detection. Wang Fan et al. [21] proposed an improved PCANet model, whose improvement lies in the input layer of PCANet, and applied it to the classification of hyperspectral images.
The PCA network, as a competitive deep learning network, has the following advantages: on one hand, it is easy to train, which allows it to achieve good effects for small samples and improve the training time efficiency; on the other hand, the cascading mapping process of the network allows a mathematical analysis to be carried out and proves its effectiveness, which is urgently needed in today's deep learning algorithms.
Based on the above research, this paper proposes a hyperspectral sea ice classification method based on a spectral-spatial-joint feature with PCANet. This method fully extracts spectral and spatial information from hyperspectral data and uses the PCANet deep learning network to obtain depth feature information from sea ice images. It further improves the sea ice classification accuracy and allows a highly efficient training process. In this method, the sea ice textural information is extracted by the GLCM, and the sea ice spatial information is extracted by Gabor filters. The influence of redundant bands is removed by correlation analysis, and the spectral and spatial features of sea ice images are excavated by the PCANet network to improve the final classification accuracy. The method selects samples randomly for training and then inputs them into the SVM classifier for classification, which improves the sea ice classification when there is only a small number of samples.
The rest of this paper is organized as follows. Section 2 introduces the design framework and related algorithms in detail. The experimental data set and related experimental setups are described in Section 3, and the experimental results and model parameters are discussed and analyzed at the same time. Finally, we summarize the work of this paper in Section 4.

Proposed Method
The proposed sea ice classification framework based on spectral-spatial-joint features is shown in Figure 1. It includes four main parts: the extraction of textural features based on GLCM, the extraction of spatial features based on Gabor filters, the selection of spectral features based on the band selection algorithm, and feature extraction and classification based on PCANet. Firstly, the principal component analysis is used to reduce the dimensionality, and then the GLCM and Gabor filter are used to extract textural and spatial features from hyperspectral images. The band selection algorithm and correlation analysis are used to extract the subsets of the main spectral bands from the original data. Then, the spatial and spectral feature information extracted from the two branches are fused as the input of PCANet and classified by the SVM. Finally, classification accuracy is evaluated. This algorithm is described in subsequent sections.

Texture Feature Extraction
Texture refers to small, semi-periodic, or regularly arranged patterns that exist in a certain range of images. It is one of the main features of image processing and pattern recognition. Texture is usually used to represent the uniformity, meticulousness, and roughness of images [10]. The textural features of an image reflect the attributes of the image itself and help to distinguish it.
Textural features refer to changes in the gray values of the image, which are related to spatial statistics. The GLCM is a statistics-based method of extracting textural features. It describes the gray-level relation matrix between a pixel and adjacent pixels within a certain distance in the local or overall area of the image [22]. It reflects the joint distribution probability of pixel pairs in the image, which can be expressed as follows:

$$p(i, j, d, \theta) = \#\{[(x, y), (x + D_x, y + D_y)] \mid f(x, y) = i,\ f(x + D_x, y + D_y) = j\} \tag{1}$$

In Equation (1), $p(i, j, d, \theta)$ represents the number of times that pixel pairs with gray values $i$ and $j$ appear along the direction angle $\theta$, where $\#$ denotes the number of elements in the set and $f(x, y)$ is the gray value at pixel $(x, y)$. Here $i, j \in \{0, 1, \ldots, N - 1\}$, where $N$ is the number of quantized gray levels; $(x, y)$ are the relative coordinates of pixels in the image as a whole, with $x = 0, 1, \ldots, N_x - 1$ and $y = 0, 1, \ldots, N_y - 1$; $D_x$ and $D_y$ are the horizontal and vertical offsets, respectively; $N_x$ and $N_y$ are the number of columns and rows of the image, respectively; $d$ is the distance between pixel pairs; and $\theta$ is the direction angle in the process of displacement, which is generally set as 0°, 45°, 135°, or 180° [22].
After calculating the GLCM, some statistics are usually constructed as texture classification features instead of applying the matrix directly. In this paper, eight typical texture scalars are extracted, which are the mean, variance, homogeneity, contrast, dissimilarity, entropy, angular second moment, and correlation.
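As a rough illustration of the co-occurrence counting and of a few of the scalar statistics listed above, the following minimal sketch builds a normalized GLCM for a single offset on a small 4 × 4 test window. The function names `glcm` and `texture_stats`, the test window, and the particular formulas chosen for homogeneity and entropy (one common convention among several) are assumptions made for illustration, not the implementation used in the paper.

```python
import numpy as np

def glcm(image, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for the offset (dx, dy)."""
    counts = np.zeros((levels, levels), dtype=np.float64)
    rows, cols = image.shape
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            # Only count pairs whose second pixel lies inside the window.
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[image[y, x], image[y2, x2]] += 1
    return counts / counts.sum()

def texture_stats(p):
    """A few scalar texture measures computed from a normalized GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "dissimilarity": float(np.sum(p * np.abs(i - j))),
        "homogeneity": float(np.sum(p / (1.0 + (i - j) ** 2))),
        "asm": float(np.sum(p ** 2)),  # angular second moment
        "entropy": float(-np.sum(nonzero * np.log(nonzero))),
    }

# Hypothetical 4-level window; offset (1, 0) corresponds to d = 1, theta = 0 degrees.
window = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 2, 2, 2],
                   [2, 2, 3, 3]])
p = glcm(window, dx=1, dy=0, levels=4)
stats = texture_stats(p)
```

In a sliding-window setting, the same two functions would be applied to the window around each pixel and the results averaged over the direction angles.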

Spatial Feature Extraction
The Gabor filters can effectively obtain representative frequencies and directions in the spatial domain [2]. Therefore, the establishment of a group of Gabor filters with different frequencies and directions is conducive to the accurate extraction of spatial information features. As a linear filter, the Gabor is suitable for the expression and separation of features. By extracting relevant features at different scales and in different directions in the frequency domain, it can achieve multi-scale and multi-direction features of images.
Generally speaking, the Gabor filter is defined by the Gaussian kernel function, as shown in Equation (2). Its components are divided into a real part and an imaginary part. The real part can smooth the image, while the imaginary part can be used for edge detection, as shown in Equations (3) and (4).
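$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \exp\left(i\left(2\pi \frac{x'}{\lambda} + \psi\right)\right) \tag{2}$$

$$g_{\mathrm{real}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \psi\right) \tag{3}$$

$$g_{\mathrm{imag}}(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \sin\left(2\pi \frac{x'}{\lambda} + \psi\right) \tag{4}$$

In the above equations, $x' = x \cos\theta + y \sin\theta$ and $y' = -x \sin\theta + y \cos\theta$ are the rotated coordinates, with $(x, y)$ measured from the center point of the Gaussian kernel; $\lambda$ is the wavelength, which is usually not greater than one-fifth of the size of the input image; $\theta$ is the direction, ranging from 0° to 360°; $\psi$ is the phase offset, which mainly accounts for possible rotation of the sample and increases robustness to rotation; $\sigma$ is the standard deviation of the Gaussian function; and $\gamma$ represents the spatial aspect ratio, which determines the ellipticity of the shape of the Gabor function.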
After the principal component analysis of the original data, a set of fixed Gabor filters, whose parameters had been adjusted and optimized beforehand, was applied to the principal component images to extract spatial information.
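A filter bank of this kind can be sketched as follows. The kernel follows the standard complex Gabor form with a Gaussian envelope and a complex sinusoidal carrier; the four kernel sizes and four directions are those reported later for the Bohai Bay experiment, while the values of `lam`, `sigma`, `gamma`, and `psi` are placeholders chosen only for illustration, since the tuned values are not given in the text.

```python
import numpy as np

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Complex Gabor kernel sampled on a size x size grid centred at the origin."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate the coordinate system by theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(1j * (2.0 * np.pi * x_r / lam + psi))
    return envelope * carrier  # .real smooths, .imag responds to edges

sizes = (7, 9, 11, 13)                             # filter scales
thetas = (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)  # filter directions
# lam, sigma, gamma and psi below are assumed values, not the paper's settings.
bank = [gabor_kernel(s, lam=4.0, theta=t, psi=0.0, sigma=2.0, gamma=0.5)
        for s in sizes for t in thetas]
```

Convolving a principal component image with the real or imaginary part of each of the 16 kernels would give one spatial feature map per kernel.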

Band Selection
Generally speaking, a reduction in the dimensionality of hyperspectral data can be achieved by feature extraction or feature selection (band selection). Band selection can effectively reduce the amount of redundant information without affecting the original content by selecting a group of representative bands in the hyperspectral image [23]. This paper adopts a band selection algorithm based on a hybrid strategy. The purpose of this algorithm is to select the best band combinations that provide a large amount of information, less relevance, and strong category separability. The main idea is to find the best band division using the evaluation criterion function, and then evaluate all bands using the mixed strategy of clustering and sorting. Finally, the bands with the highest level in each division are used to form the final best band combination [24].
In this paper, two objective functions, the Normalized Cut and the Top-Rank Cut, are used to divide bands. The smaller the value of the former, the higher the intra-group correlation and the lower the inter-group correlation of each band combination. The smaller the value of the latter, the lower the overall correlation. Both of these partitioning functions are beneficial for the subsequent ranking-based selection. In the mixed strategy, we adopt two ranking-based methods, maximum-variance principal component analysis (MVPCA) and information entropy (IE), and one clustering-based method, enhanced fast density-peak-based clustering (E-FDPC).
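The "partition, then rank within each partition" idea can be illustrated with a deliberately simplified sketch. It is not the hybrid algorithm of [24]: the partition here is a plain split into equally sized contiguous groups rather than one found by the Normalized Cut or Top-Rank Cut, and the only ranking criterion is information entropy. The names `band_entropy` and `select_bands`, the bin count, and the synthetic data cube are all assumptions.

```python
import numpy as np

def band_entropy(band, bins=32):
    """Shannon entropy (in bits) of the gray-value histogram of one band."""
    hist, _ = np.histogram(band, bins=bins)
    prob = hist[hist > 0] / hist.sum()
    return float(-np.sum(prob * np.log2(prob)))

def select_bands(cube, n_groups):
    """Pick the highest-entropy band from each contiguous group of bands."""
    n_bands = cube.shape[2]
    # Stand-in partition: equal contiguous groups instead of an optimized cut.
    groups = np.array_split(np.arange(n_bands), n_groups)
    scores = np.array([band_entropy(cube[:, :, b]) for b in range(n_bands)])
    return [int(g[np.argmax(scores[g])]) for g in groups]

rng = np.random.default_rng(0)
cube = rng.normal(size=(20, 20, 12))   # synthetic rows x cols x bands cube
cube[:, :, 5] = 0.0                    # a constant band carries no information
selected = select_bands(cube, n_groups=4)
```

One band is returned per group, and the constant band is never chosen because its entropy is zero.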

PCANet
Compared with previous convolutional neural networks, the PCANet deep learning framework is highly competitive in terms of feature extraction. It trains a simple network that learns low-level features from the data. These low-level features constitute high-level features with abstract meaning and invariance, which is beneficial for image classification, target detection, and other tasks [25]. In hyperspectral image classification, PCANet can choose data-adaptive convolution filter banks as the basic principal component analysis filters, and hash binarization mapping and block histograms enhance feature separation and reduce the feature dimension, respectively [21]. At the same time, the spatial pyramid is used to extract invariant features, and finally, the SVM classifier is used to output the classification results. The specific network structure includes PCA convolution layers, a nonlinear processing layer (NP layer), and a feature pooling layer (FP layer), as shown in Figure 2.
Assume that our input training set $\{I_i\}_{i=1}^{N}$ contains $N$ images of size $m \times n$ and that the filter size in all convolutional layers is $k_1 \times k_2$. We need to learn the principal component analysis filters from $\{I_i\}_{i=1}^{N}$, and the following text introduces the proposed architecture.
[Figure 2. Structure of PCANet; block labels recovered from the figure: patch-mean removal, PCA filters convolution.]

The First PCA Convolution Layer
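First, a $k_1 \times k_2$ block is sampled around each pixel of the input image $I_i$, and these sample blocks are then cascaded. The sample blocks of image $I_i$ are expressed as $x_{i,1}, x_{i,2}, \ldots, x_{i,\tilde{m}\tilde{n}} \in \mathbb{R}^{k_1 k_2}$, where $\tilde{m} = m - \lceil k_1/2 \rceil$ and $\tilde{n} = n - \lceil k_2/2 \rceil$. Then, the mean is removed from each sample block to obtain $\bar{X}_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,\tilde{m}\tilde{n}}] \in \mathbb{R}^{k_1 k_2 \times \tilde{m}\tilde{n}}$, where $\bar{x}_{i,j}$ is a sample block with its average value removed. The same processing is applied to the rest of the images. Finally, the training sample matrix is obtained, as shown in Equation (5):

$$X = [\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_N] \in \mathbb{R}^{k_1 k_2 \times N\tilde{m}\tilde{n}} \tag{5}$$

The PCA algorithm minimizes the reconstruction error by looking for a set of orthonormal filters. We assume that the number of filters in the $i$-th layer is $L_i$, so the reconstruction error for the first layer can be expressed by Equation (6):

$$\min_{V \in \mathbb{R}^{k_1 k_2 \times L_1}} \lVert X - V V^{T} X \rVert_F^2, \quad \text{s.t. } V^{T} V = \mathrm{Id}_{L_1} \tag{6}$$

where $\mathrm{Id}_{L_1}$ is the identity matrix of size $L_1 \times L_1$. As in the classic principal component analysis algorithm, the first $L_1$ eigenvectors of the covariance matrix of $X$ need to be obtained, so the corresponding filters are given by Equation (7):

$$W_l^{1} = \mathrm{mat}_{k_1, k_2}\big(q_l(X X^{T})\big) \in \mathbb{R}^{k_1 \times k_2}, \quad l = 1, 2, \ldots, L_1 \tag{7}$$

where the mat function transforms a vector into a matrix, and $q_l(X X^{T})$ represents the $l$-th principal eigenvector. The output of the first layer is shown by Equation (8):

$$C_i^{l} = I_i * W_l^{1}, \quad i = 1, 2, \ldots, N \tag{8}$$

where $*$ represents the convolution operation and $I_i$ is the $i$-th input image.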
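The filter-learning step of this layer (block sampling, mean removal, and taking the leading eigenvectors of the patch covariance) can be sketched as below. This is a simplified illustration under stated assumptions: patches of height `k1` and width `k2` are taken only where they fit entirely inside the image, so the patch count differs from the one in the text, and the helper names `extract_patches` and `learn_pca_filters` as well as the random test images are hypothetical.

```python
import numpy as np

def extract_patches(image, k1, k2):
    """Collect every k1 x k2 patch as a column and remove each patch's mean."""
    m, n = image.shape
    cols = [image[r:r + k1, c:c + k2].reshape(-1)
            for r in range(m - k1 + 1) for c in range(n - k2 + 1)]
    patches = np.stack(cols, axis=1).astype(np.float64)
    return patches - patches.mean(axis=0, keepdims=True)

def learn_pca_filters(images, k1, k2, n_filters):
    """Return the n_filters leading eigenvectors of X X^T, reshaped to k1 x k2."""
    X = np.hstack([extract_patches(img, k1, k2) for img in images])
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_filters]    # indices of the largest ones
    return [eigvecs[:, idx].reshape(k1, k2) for idx in order]

rng = np.random.default_rng(0)
images = [rng.normal(size=(8, 8)) for _ in range(5)]  # synthetic 8 x 8 inputs
filters = learn_pca_filters(images, k1=3, k2=3, n_filters=4)
```

Each learned filter would then be convolved with every input image to produce the first-layer outputs; because no gradient descent is involved, the filters are obtained in a single pass over the training patches.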

The nth PCA Convolution Layer
The subsequent convolution layers are processed in the same way as the first layer. First, the output $C$ of the previous layer is calculated. Then, it is zero-padded at the edges to ensure that it is the same size as the original image. The output is subjected to operations such as block sampling, cascading, and mean removal. Similarly, the eigenvectors and the output of the $n$-th layer are obtained. It is worth noting that since the first layer has $L_1$ filters, the first layer generates $L_1$ output matrices, and the second layer generates $L_2$ outputs for each output of the first layer, so the number of outputs accumulates layer by layer. For each sample, the $n$-layer PCANet generates $\prod_{i=1}^{n} L_i$ output feature matrices, and the final output feature is expressed as shown in Equation (9). Thus, it can be seen that the individual convolutional layers of the PCANet are structurally similar, which facilitates the construction of the multi-layer deep network.

Nonlinear Processing Layer
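After the features have been obtained through the convolution layers, a hash algorithm is used to perform nonlinear processing on the output of the previous layer. The specific operation is as follows: binary hash encoding is carried out on each output, which re-indexes the pixel values and enhances the separation between different features. The $L_n$ binary images obtained from the same input form, at each pixel, a binary vector of length $L_n$, and each vector is converted to a decimal number as the output of the nonlinear processing layer. The function is shown below:

$$T = \sum_{l=1}^{L_n} 2^{l-1} H(O_l) \tag{10}$$

where $H(\cdot)$ is the unit step function (1 for positive inputs and 0 otherwise), $O_l$ is the $l$-th output of the last convolution layer for a given input, and each pixel of $T$ is an integer in the range $[0, 2^{L_n} - 1]$.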
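The binarize-and-weight operation can be checked on a toy example. The sketch below assumes three filter responses of size 2 × 2 with made-up values; the function name `hash_binarize` is hypothetical, and the bit weights 1, 2, 4 correspond to the first, second, and third response.

```python
import numpy as np

def hash_binarize(outputs):
    """Combine L filter responses into one integer-valued map."""
    code = np.zeros(outputs[0].shape, dtype=np.int64)
    for l, response in enumerate(outputs):
        # Positive responses contribute the bit with weight 2**l.
        code += (response > 0).astype(np.int64) << l
    return code

outputs = [np.array([[1.0, -2.0], [0.5, -0.1]]),
           np.array([[-1.0, 3.0], [2.0, -4.0]]),
           np.array([[0.2, 0.3], [-0.7, -0.9]])]
T = hash_binarize(outputs)
# T == [[5, 6], [3, 0]]
```

For instance, the top-left pixel is positive in the first and third responses, which gives 1 + 4 = 5.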

Feature Pooling Layer
After the nonlinear processing layer, each output matrix is divided into B blocks, and the histogram information of the decimal values in each block is calculated and cascaded into a vector to obtain the final block expansion histogram features, as shown in Equation (11). Histogram features add some stability to the variation in the features extracted by PCANet while reducing the feature dimension [26]. In order to improve the multi-scale property of the spatial features, a spatial pyramid module is added after the block histogram to give the features a better distinguishing ability. Finally, the SVM classifier is selected to get the final output.
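The block histogram step can be sketched in a few lines. The example assumes non-overlapping square blocks and omits the spatial pyramid; the function name `block_histogram` and the 4 × 4 integer map are illustrative assumptions, with values in the range produced by three binarized filter responses (0 to 7).

```python
import numpy as np

def block_histogram(code_map, block, n_filters):
    """Concatenate the histograms of non-overlapping block x block tiles."""
    n_bins = 2 ** n_filters          # one bin per possible hash code
    rows, cols = code_map.shape
    features = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = code_map[r:r + block, c:c + block].reshape(-1)
            features.append(np.bincount(tile, minlength=n_bins))
    return np.concatenate(features)

code_map = np.array([[5, 6, 0, 0],
                     [3, 0, 7, 7],
                     [1, 1, 2, 2],
                     [1, 1, 2, 3]])
f = block_histogram(code_map, block=2, n_filters=3)
```

With four blocks and eight bins per block, the feature vector has 32 entries regardless of the values in the map, which is how the histogram bounds the feature dimension.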

Algorithm Process Description
Based on the framework described above, the specific implementation process used in this paper is given in Algorithm 1:

Algorithm 1 The Proposed Algorithm Process
Begin
Input: the labeled sample set with a size of M × N × B, where M × N is the size of the input image and B is the number of bands.

A. Feature extraction and fusion
(1) Based on the band selection algorithm, select bands from the original data to obtain the optimal band combination, and extract the spectral features $F_{\text{spec}}$;
(2) Process the original data with the principal component analysis algorithm to obtain the first principal component PC;
(3) Using the GLCM algorithm, calculate the GLCM of the sliding window on the principal component PC in each direction at angle θ and step d, and obtain the texture eigenvalue of the center pixel;
(4) Repeat (3) until the principal component area is completely covered by the sliding window;
(5) Average the texture features over the direction angles θ to obtain the final texture feature matrix;
(6) Repeat (3)-(5) until the five texture features are obtained and the extraction of the texture features $F_{\text{GLCM}}$ has been completed;
(7) According to the Gabor algorithm, set up a filter bank with scale s and direction θ;
(8) Based on the results of (2), filter the principal component PC with the Gabor filters at the various scales and directions to obtain the spatial feature information;
(9) Repeat (8) until sixteen spatial features have been obtained to complete the extraction of the spatial features $F_{\text{Gabor}}$;
(10) Fuse the features of (1), (6), and (9) to complete the extraction and fusion of features.

B. PCANet
(1) Randomly select training samples according to a given training-to-testing ratio and input them into the pre-established PCANet network;
(2) Use the first principal component analysis filter layer to obtain the high-level abstract features: learn the filter parameters W and calculate the first-layer convolution output C;
(3) Take the output of (2) as the input and repeat (2) until all PCA filter layers have been calculated and the output O of the last layer has been obtained;
(4) Enhance the separation between features by hash binarization: O is hash-encoded and converted to a decimal to obtain T;
(5) Reduce the feature dimension with the block histogram: the statistical histogram information is cascaded to get the final block-extended histogram feature f, and then the spatial pyramid is used to generate multi-scale spatial features;
(6) Use the SVM classifier for classification.
Output: Confusion matrix, Overall accuracy (OA), Kappa value.
End
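The two scalar outputs of the algorithm, OA and the kappa value, follow directly from the confusion matrix. The sketch below uses the standard definitions (observed agreement on the diagonal and chance agreement from the row and column totals); the 2 × 2 confusion matrix is a made-up example and does not come from the paper's experiments.

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    confusion = np.asarray(confusion, dtype=np.float64)
    total = confusion.sum()
    observed = np.trace(confusion) / total
    # Chance agreement expected from the row and column marginals.
    expected = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa

confusion = [[45, 5],
             [10, 40]]   # illustrative counts: rows = true class, columns = predicted
oa, kappa = overall_accuracy_and_kappa(confusion)
# oa == 0.85, kappa == 0.7
```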

Experimental Results and Discussion
In order to verify the effectiveness of the method proposed in this paper, we used two hyperspectral remote sensing sea ice data sets in the experiment: images of Bohai Bay and Baffin Bay captured by Earth Observation Satellite-1 (EO-1). We compared the proposed algorithm with five other algorithms: SVM [10], one-dimensional convolutional neural network (1D-CNN), two-dimensional convolutional neural network (2D-CNN), three-dimensional convolutional neural network (3D-CNN), and gray-level co-occurrence matrix convolutional neural network (GLCM-CNN) [22]. The experimental results of all classification algorithms were evaluated in terms of the overall accuracy (OA) and the kappa statistic. Each algorithm was run 10 times with different initial random training samples. The experimental environment was an Intel(R) Core(TM) i5-6500 CPU at 3.20 GHz with 20 GB of installed memory.

Data Description
The first data set used in the experiment was a cloud-free Hyperion image of the Bohai Bay area taken on 23 January 2008. The original size of the image was 7061 pixels × 2001 pixels, and the spatial resolution was 30 m. After removing the low-SNR and water absorption bands from the original 242 bands, 176 bands were selected for analysis. A scene covered by sea ice with a size of 442 pixels × 212 pixels was cropped from the original image and used as the experimental area. According to the spectral curve, the experimental area could be roughly divided into four categories: white ice, gray-white ice, gray ice, and sea water. Figure 3a shows a false-color image of the experimental area composed of bands R: 30, G: 25, and B: 18, and the corresponding labeled samples are shown in Figure 3b. The labeled samples were randomly divided into training samples and test samples at a ratio of 1:9, as shown in Table 1.

Generally speaking, a hyperspectral classification algorithm that uses only spectral features does not fully utilize the spatial information, which limits further improvement of the classification accuracy. Especially where the spectral information within classes is relatively similar, making full use of spatial information can further improve the classification accuracy [25]. In this experiment, we studied different spatial feature extraction methods and combined spatial features and spectral features.
Although the GLCM provides information on the gray-level direction, interval, and variation in the amplitude of the image, it cannot directly determine the distinguishing texture. In addition, the highly correlated components in texture features extracted by GLCM need to be removed through a correlation analysis [23]. As shown in Table 2, the correlation analysis was conducted on eight texture components in the Bohai Bay data. When the absolute value of the correlation coefficient of two texture features was greater than 0.8, we classified them as highly correlated. In this case, texture features with smaller average absolute correlation values were selected for further study. Based on the correlation matrix, five texture components with low similarity were finally retained: the mean, variance, homogeneity, contrast, and correlation. In order to extract more abundant spatial features, we studied the multi-direction and multi-scale Gabor filters and then selected suitable Gabor filters through parameter tuning. Finally, we used four filter scale sizes of 7, 9, 11, and 13 in the experiment and four directions, θ = 0, π/4, π/2, and 3π/4, were determined for each scale.
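The correlation-based pruning described above can be sketched as a greedy procedure. This is one possible reading of the rule, stated as an assumption: while some pair of features has an absolute correlation above 0.8, the member of the most correlated pair with the larger average absolute correlation is dropped. The function name `prune_correlated` and the synthetic feature columns are illustrative; the feature names are borrowed from the text only as labels.

```python
import numpy as np

def prune_correlated(features, names, threshold=0.8):
    """Drop features until no remaining pair exceeds the correlation threshold."""
    corr = np.abs(np.corrcoef(features, rowvar=False))
    keep = list(range(len(names)))
    while True:
        sub = corr[np.ix_(keep, keep)].copy()
        np.fill_diagonal(sub, 0.0)
        a, b = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[a, b] <= threshold:
            return [names[k] for k in keep]
        # Of the most correlated pair, drop the one more correlated on average.
        drop = a if sub[a].mean() >= sub[b].mean() else b
        keep.pop(drop)

rng = np.random.default_rng(0)
base = rng.normal(size=500)
features = np.column_stack([
    base,                                   # "contrast"
    base + 0.01 * rng.normal(size=500),     # "dissimilarity", nearly identical
    rng.normal(size=500),                   # "mean"
    rng.normal(size=500),                   # "variance"
])
names = ["contrast", "dissimilarity", "mean", "variance"]
kept = prune_correlated(features, names)
```

In this synthetic case exactly one of the two nearly identical columns is removed, while the two independent columns are kept.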
We conducted a principal component analysis on the original image and used the five texture features extracted by the GLCM and the 16 spatial features extracted by the Gabor filter. The spectral features were combined after band selection and finally input into the network model for training and classification. The model proposed in reference [5] has a simple structure and is easy to train. On this basis, we adjusted the parameters and optimized the structure. The structure of the network model is shown in Table 3. The model contains an input layer, two convolution layers, a nonlinear layer, a feature pooling layer, and an output layer. The data input size was 8×8, and 93 samples were randomly input in each training session. The average of 10 rounds of training was taken as the final evaluation result.

Experimental Results and Discussion
The classification results for the Bohai Bay data are shown in Table 4. In the experiment, after band selection, the input size was 6 × 6 when only spectral features and textural features were used, and 7 × 7 when only the spatial features extracted by the Gabor filter were used. The input size when all of the above were combined was 8 × 8. The input size of the original spectral features and the two spatial features was 14 × 14, while that of the fusion method was 15 × 15. It can be seen from Table 4 that, compared with other combinations, the combination of spectral features after band selection and two spatial features achieved the best classification result, with an average accuracy of 94.15%. The classification accuracy after band selection was 2.51% higher than that using the whole band, which indicates that removing highly correlated and redundant bands can effectively improve the classification accuracy. In terms of feature combinations, the fusion of single spatial features and spectral features did not achieve a great improvement, while the fusion of two spatial features and spectral features improved the result by 0.94%.

Notation in Table 4: Accu: classification accuracy; BN: number of bands; $F_{\text{spec}}$: using only spectral features; $F_{\text{spec}} + F_{\text{GLCM}}$: combining spectral features and texture features extracted by GLCM; $F_{\text{spec}} + F_{\text{Gabor}}$: combining spectral features and spatial features extracted by Gabor filtering; $F_{\text{spec}} + F_{\text{GLCM}} + F_{\text{Gabor}}$: combining the above three features.

Table 5 shows the classification results for the Bohai Bay dataset based on different classification methods. In general, the classification accuracy of the SVM algorithm, which is a shallow learning method, was relatively low, at only 85.21%. Although the one-dimensional convolutional neural network, which only uses spectral features, was found to be better than the SVM, neither of them utilizes spatial features.
The classification accuracy of the proposed method was 4.94% and 4.01% higher than that of 2D-CNN and 3D-CNN, respectively, both of which use spatial features. This shows that fully mining spatial features and fusing spectral and spatial features can effectively improve the sea ice classification accuracy. The GLCM-CNN produced more accurate results than the 3D-CNN, which reflects the effectiveness of the texture information extracted by the GLCM. The classification accuracy of the proposed method improved by 2.1% compared with that of GLCM-CNN. This indicates that, compared with the convolutional neural network, the proposed PCANet network is a better feature extraction method: it can choose data-adaptive convolution filters as the principal component analysis filters and enhance the separation of features through hash binarization mapping and the block histogram. It can obtain a better classification effect by using fewer training samples. Across sea water and the three types of sea ice, the method is especially advantageous for the sea ice classes. Compared with 3D-CNN and GLCM-CNN, the classification accuracy of white ice was found to increase by 4.5% and 0.6%, respectively, and that of gray-white ice and gray ice increased by 5.15%, 2.51%, 2.62%, and 0.62%, respectively.
In addition, result maps are shown in Figure 4. The classification result maps show that the method is an advantageous one for classifying two intermediate seawater categories and three sea ice categories as it can distinguish the boundary between gray ice and gray-white ice more clearly. The proposed method effectively eliminates the noise points and makes the distinction between different types of edge regions more accurate.

Data Description
The second dataset used in the experiment was a cloud-free Hyperion image of the Baffin Bay area taken on 12 April 2014. The original size of the image was 2395 pixels × 1769 pixels, and it had a spatial resolution of 30 m. As with the Bohai Bay dataset, we selected 176 of the original 242 bands for subsequent analysis and cropped a 212 pixel × 300 pixel region from the original image as the experimental area, which was divided into three main categories: white ice, gray ice, and sea water. Figure 5a shows a false-color image of the experimental area composed of three bands (R: 175, G: 168, and B: 150). The corresponding labeled samples are shown in Figure 5b, and the labeled samples were randomly divided into training samples and test samples at a ratio of 1:9, as shown in Table 6.
Table 7 shows the correlation matrix used to select the texture features extracted with the GLCM; features were excluded according to their correlation values. Dissimilarity was highly correlated with both homogeneity and contrast, and homogeneity was in turn highly similar to the angular second moment, so of these the contrast feature was retained. Similarly, the angular second moment and entropy had high similarity, so the angular second moment was excluded. The five remaining texture components were the mean, variance, contrast, entropy, and correlation. The parameters used by the Gabor filter to extract spatial features were similar to those used for the Bohai Bay dataset, except that the four scales of the filter were set to 5, 7, 9, and 11. After the original spectral information was analyzed with the same band selection algorithm, the selected bands were combined with the extracted spatial features and fed into the network, whose structure is shown in Table 8.
The model consists of an input layer, three convolution layers, a nonlinear layer, a feature pooling layer, and an output layer. The size of the input data was 8 × 8, and 123 samples were randomly input for each training session. As with the Bohai Bay dataset, the average of 10 rounds of training was taken as the final evaluation result.
Table 8. Network structure of the Baffin Bay data set.
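The correlation-based exclusion of texture features described above can be illustrated with a short sketch. This is not the authors' procedure: the greedy rule, the 0.9 threshold, the function name `exclude_correlated`, and the synthetic feature columns are all assumptions made for illustration, and the paper's actual decisions were taken from the values in Table 7.

```python
import numpy as np

def exclude_correlated(features, names, threshold=0.9):
    """Drop one feature from each highly correlated pair; return the kept names.

    features: (n_samples, n_features) array; threshold is an assumed cut-off.
    """
    corr = np.abs(np.corrcoef(features, rowvar=False))
    np.fill_diagonal(corr, 0.0)          # ignore self-correlation
    kept = list(range(len(names)))
    while len(kept) > 1:
        sub = corr[np.ix_(kept, kept)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] <= threshold:
            break                        # no remaining pair is too similar
        # Of the two, drop the one more correlated with everything else.
        drop = i if sub[i].mean() >= sub[j].mean() else j
        kept.pop(int(drop))
    return [names[k] for k in kept]

# Synthetic stand-in for per-pixel texture values (not the paper's data):
# "t1_copy" is a near-duplicate of "t1"; "t2" and "t3" are independent.
rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 500))
features = np.column_stack([a, a + 0.01 * rng.standard_normal(500), b, c])
kept_names = exclude_correlated(features, ["t1", "t1_copy", "t2", "t3"])
print(kept_names)
```

On real data, the columns would be the eight GLCM statistics computed per pixel, and the threshold would need to be chosen by inspecting the correlation matrix.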

Experimental Results and Discussion
The classification results of the Baffin Bay data are shown in Tables 9 and 10. For the feature combination results, the input size was the same as that of the Bohai Bay dataset. It can be seen from Table 9 that the classification accuracy of the method in this paper was the highest. The overall classification result of the band selection method was better than that of the full-band method, which indicates that redundant and highly correlated bands had a notable impact on the results. When the number of bands was the same, the extraction of spatial and texture features enriched the information contained in the samples, which improved the final classification accuracy by 2-3%. In addition, combining the spatial and texture information extracted by the two methods was more advantageous for classification than using spectral information alone: the classification accuracy of the former was 2.99% higher than that of the latter.
(Accu: classification accuracy, BN: number of bands, F : using only spectral features, F +F : combining spectral features and texture features extracted by GLCM, F +F : combining spectral features and spatial features extracted by Gabor filtering, F +F +F : combining the above three features)
As shown in Table 10, the classification accuracy for gray ice in the compared methods is relatively low, and it is difficult to distinguish gray ice using only the spectral features on which the SVM and 1D-CNN rely. In the method proposed in this paper, however, the spatial features improve the classification of gray ice. Compared with the other algorithms, the proposed method has the highest overall classification accuracy (OA = 96.86%): 6.16%, 6.67%, 3.46%, 2.78%, and 1.85% higher than that of SVM (90.70%), 1D-CNN (90.19%), 2D-CNN (93.40%), 3D-CNN (94.08%), and GLCM-CNN (95.01%), respectively.
The classification result images are shown in Figure 6, where the visual classification result of the proposed method at the category boundaries was found to be more precise. For example, as can be seen from the lower right corner of the experimental area, for the intermediate category (gray ice), the proposed method better distinguished gray ice from sea water. This shows that the proposed method can effectively extract feature information from samples. Compared with the convolutional neural network, the proposed method can learn the low-level features from the data, and these low-level features constitute more abstract high-level features, which are suitable for the classification of hyperspectral sea ice images.
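The overall accuracy (OA) margins quoted in this section follow directly from the reported OA values, and the short sketch below reproduces that arithmetic. It also shows how OA and per-class accuracy are conventionally obtained from a confusion matrix. The function names and the 3 × 3 confusion matrix are hypothetical and are not taken from Table 10; only the OA percentages come from the text.

```python
def overall_accuracy(confusion):
    """Fraction of correctly classified samples; rows are true classes."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def per_class_accuracy(confusion):
    """Correct predictions of each class divided by that class's sample count."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]

# OA values (%) as reported in the text for the Baffin Bay dataset.
reported_oa = {"SVM": 90.70, "1D-CNN": 90.19, "2D-CNN": 93.40,
               "3D-CNN": 94.08, "GLCM-CNN": 95.01}
proposed_oa = 96.86
gains = {name: round(proposed_oa - oa, 2) for name, oa in reported_oa.items()}
print(gains)

# Made-up confusion matrix (white ice, gray ice, sea water), for illustration only.
toy_confusion = [[45, 3, 2],
                 [4, 40, 6],
                 [1, 2, 47]]
toy_oa = overall_accuracy(toy_confusion)
toy_per_class = per_class_accuracy(toy_confusion)
```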

Parameter Influence Analysis
It can be seen from the experiments that, in the classification of hyperspectral sea ice images, the network parameters and the number of bands affect the final classification accuracy. This section details our analysis of the experimental results in which we changed the number of network layers, the number of filters, the size of the filters, and the number of bands in the model. The parameters of the feature pooling layer were fixed in this experiment. The sizes of the histogram blocks for the Bohai Bay data and Baffin Bay data were 7 × 7 and 5 × 5, respectively, and the sizes of the spatial pyramid pools were set to 1 × 1, 2 × 2, and 4 × 4.
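To make the role of the histogram block size concrete, the following is a minimal sketch of the hash binarization and block histogram step in a PCANet-style output stage. It assumes non-overlapping blocks, a 14 × 14 response map filled with random values, and eight filter responses per group; these choices and the function names are illustrative assumptions rather than the authors' settings, and the spatial pyramid pooling step is omitted.

```python
import numpy as np

def hash_binarize(maps):
    """Turn L real-valued response maps (L, H, W) into one (H, W) integer code map.

    Each map is thresholded at zero and treated as one bit of the code,
    so codes lie in [0, 2**L - 1].
    """
    bits = (maps > 0).astype(np.int64)
    weights = 2 ** np.arange(maps.shape[0], dtype=np.int64)
    return np.tensordot(weights, bits, axes=1)

def block_histograms(codes, block, n_bins):
    """Concatenate code histograms over non-overlapping block x block tiles."""
    h, w = codes.shape
    feats = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            tile = codes[r:r + block, c:c + block]
            feats.append(np.bincount(tile.ravel(), minlength=n_bins))
    return np.concatenate(feats)

# Random stand-in for the responses of 8 second-stage filters (assumed sizes).
rng = np.random.default_rng(0)
responses = rng.standard_normal((8, 14, 14))
codes = hash_binarize(responses)
hist_feature = block_histograms(codes, block=7, n_bins=2 ** 8)
print(codes.shape, hist_feature.shape)
```

With a 7 × 7 block on a 14 × 14 map, four tiles are produced, so the feature has 4 × 256 entries. A smaller block yields more tiles and a longer, more spatially localized feature.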

The Number of PCA Layers
As can be seen from Figure 2, the network used in this paper has multiple layers, that is, multiple PCA processing stages, which extract image features several times. We analyzed the number of network layers for the two datasets, and the experimental results are shown in Figure 7. The results show that for Bohai Bay, the classification accuracy generally decreases as the number of network layers increases. For Baffin Bay, the best classification accuracy was achieved when the number of network layers was 3, after which it decreased. In general, the PCA network should not use a large number of layers. Based on the data, setting 2 or 3 network layers can improve the classification accuracy.

The Number of Filters
The key characteristic of the method proposed in this paper is that each image is convolved with filters generated from the principal eigenvectors, so the number of filters will inevitably have an impact on the classification accuracy. We used the Bohai Bay dataset with a two-layer PCA network for the experiment. The number of filters in the first layer (L1) was held at a fixed value each time, and the experimental results are shown in Figure 8. In general, as the number of filters in the second layer (L2) increased, the classification accuracy showed an obvious increasing trend. When L1 = L2 = 8, the accuracy reached its highest value of 95.43%, after which a downward trend began. Therefore, the numbers of filters in the two layers were set to L1 = L2 = 8 in this paper.
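How a bank of PCA filters is obtained for one stage can be sketched as follows. This follows the general PCANet recipe (collect overlapping patches, remove each patch's mean, take the leading eigenvectors of the patch covariance) and is not the authors' implementation. The 3 × 3 filter size, the random 8 × 8 inputs, and the function name `learn_pca_filters` are assumptions; the filter count of 8 matches the value selected above.

```python
import numpy as np

def learn_pca_filters(images, k, n_filters):
    """Learn n_filters k x k filters from the leading eigenvectors of patch covariance.

    images: array of shape (N, H, W). Returns an array of shape (n_filters, k, k).
    """
    patches = []
    for img in images:
        h, w = img.shape
        for r in range(h - k + 1):
            for c in range(w - k + 1):
                p = img[r:r + k, c:c + k].ravel()
                patches.append(p - p.mean())      # remove the patch mean
    X = np.asarray(patches)                       # (n_patches, k*k)
    cov = X.T @ X / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_filters]         # leading eigenvectors first
    return top.T.reshape(n_filters, k, k)

# Random stand-in for 8 x 8 input samples (not the paper's data).
rng = np.random.default_rng(0)
samples = rng.standard_normal((20, 8, 8))
filters = learn_pca_filters(samples, k=3, n_filters=8)
print(filters.shape)
```

Because the filters are eigenvectors of a symmetric matrix, they are mutually orthonormal, and the number of useful filters is bounded by the patch dimension k × k. This is one reason why the filter count and the filter size interact.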

The Size of Filters
In the training of a deep learning model, the influence of the filter size on the classification results cannot be ignored. We therefore studied this influence further by changing the filter size of each layer. Considering the size of the input, we varied the filter size of the first layer (k1) on the Bohai Bay dataset from 3 to 11 in steps of 2, and for each k1 we varied the filter size of the second layer (k2). Figure 9 shows that when k1 was fixed and k2 was changed, the classification accuracy changed only slightly. When k1 = 3 or 5, the classification accuracy increased as k2 increased and tended to be stable when k2 = 7 or 9. When k1 = 7, 9, or 11, the accuracy tended to be flat at first and then decreased as k2 increased. In general, for this experimental dataset, the size of the filter needed to be equal to half of the input, and the filter size of the first layer needed to be larger than that of the second layer, but the difference between them could not be too large. If the filter was too small, it might not contain enough information, and if the filter was too large, it could contain too much redundant and irrelevant information; either case would lead to a low classification accuracy.
It is worth noting that, considering the three-layer structure used for Baffin Bay, we set the filter size of the second layer to be the same as that of the first layer and varied k3, with the other settings similar to those of the Bohai Bay dataset. The experimental results are shown in Figure 10. It can be seen from the figure that when k1 = k2 = 3, 5, 7, and 9, the classification accuracy first increased and then decreased as k3 increased. When the size of the filter was 11, the overall trend was downward. The results show the same characteristics and lead to the same conclusions as the experiment on Bohai Bay. The filter size of the layers needed to be half of the input, and the difference between layers could not be too large. In addition, because the filter extracts the local features of the image, the filter could not be too large, as this would increase the number of redundant features, introduce more errors into the classification, and reduce the classification accuracy.

The Number of Bands
In order to retain bands with rich information and low correlation as much as possible, we first needed to determine the number of bands and then select the corresponding number of bands through a correlation analysis. Finally, we determined the band combination according to the results. The experimental results are shown in Figure 11. Having too few bands would lead to incomplete information, while having too many bands would lead to redundancy. Either would limit the improvement in classification accuracy, and too many bands would also increase the computational cost. As can be seen from the experimental results, as the number of bands increased, the overall classification accuracy rose. When the number of bands was 30, the best classification accuracy was obtained, and when the number of bands was greater than 30, a downward trend began. Therefore, 30 bands were used for the experiments on both datasets in this paper.
Figure 11. The influences of different numbers of bands on the classification accuracy.
Table 11 compares the training times of the different algorithms on the two datasets. As can be seen from the table, SVM is a shallow learning method with a relatively simple model, so its training time is relatively short. In general, the deep learning methods require a longer training time. However, the 1D-CNN only uses spectral information, while the 2D-CNN uses two-dimensional spatial information after the principal component analysis, so their numbers of parameters are relatively small and their training times are short. The 3D-CNN extracts spatial features and spectral features simultaneously and has a higher level of model complexity and a longer training time than the other models. The method proposed in this paper increases the computational complexity of feature extraction due to the integration of texture information, spatial information, and spectral information.
Compared with the SVM, 1D-CNN, and 2D-CNN algorithms, the training time increases somewhat, but it is far less than that of the 3D-CNN with the same model complexity. The channel dimension of GLCM-CNN is slightly higher than that of the 3D-CNN, so its training time is comparable to that of the 3D-CNN. Therefore, based on the trainability of PCANet and its applicability to small samples, the proposed method can obtain the best classification result for sea ice with a relatively short training time.
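The idea behind choosing a fixed number of informative, weakly correlated bands, as discussed earlier in this section, can be sketched with a simple greedy rule. This is a stand-in and not the hybrid-strategy band selection algorithm used in the paper, whose details are not given in this section. The variance ranking, the 0.95 correlation limit, the function name `select_bands`, and the synthetic cube are all assumptions for illustration.

```python
import numpy as np

def select_bands(cube, n_bands, max_corr=0.95):
    """Greedily pick up to n_bands high-variance bands that are weakly correlated.

    cube: array of shape (H, W, B). Returns the sorted indices of selected bands.
    """
    pixels = cube.reshape(-1, cube.shape[-1])            # (H*W, B)
    corr = np.abs(np.corrcoef(pixels, rowvar=False))     # band-to-band correlation
    order = np.argsort(pixels.var(axis=0))[::-1]         # most variable bands first
    selected = []
    for b in order:
        # Keep a band only if it is not too similar to any band already kept.
        if all(corr[b, s] < max_corr for s in selected):
            selected.append(int(b))
        if len(selected) == n_bands:
            break
    return sorted(selected)

# Synthetic cube: bands 6..11 are near-copies of bands 0..5 (not real Hyperion data).
rng = np.random.default_rng(0)
base = rng.standard_normal((20, 20, 6))
cube = np.concatenate([base, base + 0.01 * rng.standard_normal(base.shape)], axis=-1)
chosen = select_bands(cube, n_bands=6)
print(chosen)
```

In this toy case exactly one band of each near-duplicate pair survives. On real data, the sweep over the number of bands would be repeated for several values of `n_bands` and the resulting classification accuracy compared.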

Conclusions
In the classification of hyperspectral remote sensing sea ice images, environmental conditions make the labeling of samples relatively costly. In addition, traditional sea ice classification methods mostly use a single feature and do not fully explore the deep spatial and spectral features hidden in the sea ice images, which limits the improvement of classification accuracy. To address these problems, this paper proposed a hyperspectral sea ice image classification method based on the spectral-spatial-joint feature with PCANet. The method uses fewer training samples to dig deeper into the textural, spatial, and spectral features implicit in hyperspectral data. The PCANet network was designed to produce improved sea ice image classification compared with other remote sensing image classification methods. The experimental results show that the proposed method can efficiently extract the depth features of remote sensing sea ice images with fewer training samples and that it has a better classification performance on the whole. Thus, it could be used as a new model for the classification of remote sensing sea ice images.
(1) Compared with sea ice image classification methods that only rely on spectral features, the proposed method, which is based on the GLCM and Gabor filter, can fully extract the spatial and texture features hidden in a sea ice image. It can enhance features, which makes it conducive to the classification of sea ice images.
(2) The rich spectral information present in hyperspectral images provides data support for sea ice classification, but the high level of correlation between bands increases the computational cost, and the existence of the Hughes phenomenon also reduces the classification accuracy. In this paper, a band selection algorithm based on a hybrid strategy was adopted to remove redundant bands with a high level of correlation, and bands with a large amount of information and a low level of correlation were selected to extract depth spectral features, which improved the training efficiency.
(3) Compared with other sea ice classification methods, the proposed method integrates the implicit spatial and spectral features in the sea ice image. The PCANet network uses an adaptive convolution filter bank, built from principal component analysis filters, to excavate the depth characteristics of sea ice. Hash binarization mapping and block histograms then enhance the feature separation and reduce the feature dimension. The low-level features are further learned from the data and used to form high-level features with abstract significance and invariance for sea ice image classification. In this paper, it was shown that, using this method, a good sea ice classification result can be achieved with fewer training samples and a relatively short training time.
The two datasets used in this paper are not covered by clouds. In practice, however, the influence of clouds is generally unavoidable. Therefore, in future research, we will integrate microwave remote sensing data and make full use of the fact that synthetic aperture radar data are not affected by clouds in order to enhance the extraction of sea ice features and solve the problem of sea ice classification in complex environments, such as those affected by clouds.
Author Contributions: Y.H. and Y.Z. conceived and designed the framework of the study. X.S. completed the data collection and processing. Y.H. and Z.H. completed the algorithm design and the data analysis and were the lead authors of the manuscript, with contributions from X.S., R.Z. and S.Y. All authors have read and agreed to the published version of the manuscript.