Hyperspectral Sea Ice Image Classification Based on the Spectral-Spatial-Joint Feature with Deep Learning

Sea ice is one of the causes of marine disasters, and the classification of sea ice images is an important part of sea ice detection. Labeled samples for hyperspectral sea ice image classification are difficult to acquire, which leads to the small-sample problem. In addition, most current sea ice classification methods rely mainly on spectral features and shallow learning, which limits further improvement of the classification accuracy. Therefore, this paper proposes a hyperspectral sea ice image classification method based on the spectral-spatial-joint feature with deep learning. The proposed method first extracts sea ice texture information with the gray-level co-occurrence matrix (GLCM). Then, it performs dimensionality reduction and correlation analysis on the spectral information and spatial information of the unlabeled samples, respectively, to eliminate redundant information. It then extracts the spectral-spatial information of the unlabeled samples neighboring each labeled sample and integrates this information with the spectral and texture data of the labeled sample to further enhance the quality of the labeled sample. Lastly, a three-dimensional convolutional neural network (3D-CNN) model is designed to extract the deep spectral-spatial features of sea ice. The proposed method thus combines relevant textural features and performs spectral-spatial feature extraction based on the 3D-CNN model while using a large amount of unlabeled sample information. To verify the effectiveness of the proposed method, sea ice classification experiments are carried out on two hyperspectral data sets: Baffin Bay and Bohai Bay. Compared with CNN algorithms based on a single feature (spectral or spatial) and other CNN algorithms based on spectral-spatial features, the proposed method achieves better sea ice classification accuracy (98.52% and 97.91%, respectively) with small samples. Therefore, it is more suitable for classifying hyperspectral sea ice images.


Introduction
Sea ice is an essential part of the Earth's climate system [1] and one of the indicators of global climate change. It plays an important role in heat exchange between the ocean and the atmosphere [2]. Sea ice detection is crucial for accurate weather forecasting [3]. At the same time, sea ice is one of the causes of marine disasters in polar and mid-to-high latitude regions. An unusually large flux of polar sea ice disrupts the balance of freshwater [4] and affects the survival of living things [5]. Sea ice at mid-to-high latitudes affects marine fisheries [6], the coastal construction industry, and the manufacturing industry, and also causes serious economic losses [7]. In recent years, sea ice disasters have attracted more attention, and sea ice detection has important research significance [8].
Sea ice detection requires effective data, and remote sensing has become an important means of analyzing and studying sea ice [9] because it provides timely, accurate, and large-scale data acquisition. Commonly used sea ice detection data include passive microwave data [10,11], synthetic aperture radar (SAR) data [12,13], multispectral satellite images with medium or high spatial resolution such as MODIS [14,15] and Landsat [16], and hyperspectral images [17]. Of these, hyperspectral remote sensing data have a large data volume and many bands. They provide continuous, high-resolution one-dimensional spectral information together with two-dimensional spatial information of a target object. Hyperspectral images (HSIs) therefore have the characteristic of a "merged image-spectrum" and provide important data support for accurate sea ice classification. However, the high dimensionality and large data volume of hyperspectral data bring many problems (e.g., strong correlation among bands, mixed pixels, and information redundancy), which easily lead to the Hughes phenomenon and complicate sea ice detection. In recent years, hyperspectral remote sensing technology has gradually been applied to sea ice detection research. Common methods include the conventional minimum distance, maximum likelihood, and decision tree methods [18], the support vector machine (SVM) method [19], and methods based on the iterative self-organizing data analysis technique algorithm (ISODATA) [20] and linear prediction (LP) band selection [21]. However, most of these methods use only the spectral information in HSIs. With spectral information alone, it is difficult to resolve the phenomenon of "different objects with the same spectrum" in hyperspectral sea ice detection. Introducing effective spatial information can make up for this deficiency [22] and further improve the accuracy of sea ice classification. At present, the common spatial feature extraction methods are the gray-level co-occurrence matrix (GLCM) [23,24], the Gabor filter [25], and morphological profiles (MPs) [26]. Among them, the statistics-based GLCM algorithm has better application value because of its small feature dimension, strong discriminating ability, adaptability, and robustness [2,24]. However, these shallow, manual feature extraction methods, which rely on expert experience and prior knowledge, often ignore deep information, which limits the classification accuracy of HSIs.
In recent years, deep learning models have developed rapidly in the field of computer vision. Because they learn features progressively from shallow to deep and from simple to complex, without relying on hand-crafted design, they can extract deep spectral-spatial features simultaneously, which provides a new opportunity for the development of remote sensing technology. In 2014, an HSI classification method based on deep learning was introduced to the field of remote sensing for the first time [27]. The method uses a stacked autoencoder (SAE) to extract hierarchical and robust features of HSIs and achieved good results. Subsequently, a large number of deep learning algorithms for HSI classification were proposed, such as the deep belief network (DBN) [28] and the convolutional neural network (CNN) [29]. These algorithms make full use of the ability of deep learning to learn nonlinear, discriminative, and invariant features from the data autonomously. The focus has also evolved from separate spectral and spatial features to spectral-spatial-joint features. The CNN, as a locally connected network, automatically learns translation-invariant features from the input data through its convolution kernels, combines the learned features with the corresponding categories for joint learning, and adaptively adjusts the features to obtain the best feature representation [30]. Additionally, hyperspectral data are usually represented as a three-dimensional data cube, which fits the input mode of the three-dimensional convolution filter in a CNN and provides a simple and effective way to extract spectral-spatial-joint features simultaneously [31]. Therefore, in recent years, CNNs have been widely used in the field of HSI classification.
It is well known that deep learning algorithms acquire discriminative and robust feature representations through repeated iterations, which requires a long training time and a large number of training samples to accurately identify the type of an object [32]. Because of the particular geographical environment of sea ice covered areas, it is difficult to obtain ground-truth information on the actual surface types, and the sea ice type must be labeled manually using prior knowledge. However, the labeling process is time consuming and costly. With limited training samples and unpredictable sample quality, applying deep learning algorithms to hyperspectral sea ice detection is a challenge. To solve this problem, we improve the sample quality by incorporating the spectral-spatial information of unlabeled samples, which reduces the cost of labeling and enables sea ice detection based on deep learning with small samples. Our goal is therefore to make full use of the spectral and spatial information of hyperspectral sea ice data and to improve the accuracy of sea ice classification with a deep learning method.
Based on the above research, this paper proposes a method for the classification of hyperspectral sea ice images based on spectral-spatial-joint features with deep learning. The method combines the three-dimensional CNN (3D-CNN) and the GLCM algorithm, enhances sample quality by making full use of the spectral-spatial information contained in the neighboring unlabeled samples of each pixel, and deeply mines the spectral and spatial information of hyperspectral sea ice to further improve the accuracy of sea ice classification. Considering that the 3D-CNN contains a large number of parameters to be trained, this paper uses the k-nearest neighbor (KNN) algorithm, which does not need any empirical parameters and is easy to implement, to select a certain number of unlabeled neighbor samples, and the information of these samples is used to enhance the quality of the labeled sample. However, the unlabeled samples also contain high-dimensional spectral information and redundant spatial information, which would burden the convolution operation of the 3D-CNN. Therefore, for the spectral information of unlabeled samples, based on our previous work [21], we adopt a dimensionality reduction algorithm to select the optimal spectral band combination according to the spectral characteristics. For the spatial information, we remove highly redundant spatial features by correlation analysis [33] and retain the low-correlation texture components for subsequent deep feature extraction, which reduces the burden of feature extraction in the 3D-CNN as much as possible. Then, we compare the sea ice classification results with the existing National Snow and Ice Data Center (NSIDC, http://nsidc.org/) data to verify the reliability of the proposed method.
The remainder of this paper is organized as follows. Section 2 introduces the design framework and algorithm ideas in detail. The data sets and related experimental setups are described in Section 3, and the experimental results and the effects of related parameters are discussed in Section 4. Lastly, we summarize the work of this paper in Section 5.

Proposed Method
The framework of the proposed method is shown in Figure 1. It consists of four main parts: GLCM-based spatial feature extraction, unlabeled sample data processing based on the band selection algorithm and correlation analysis, labeled sample enhancement that fuses the information of neighboring unlabeled samples, and spectral-spatial feature extraction and classification based on the 3D-CNN. First, the spatial features are extracted from the original HSI using GLCM, and the extracted spatial features are merged with the original data as the labeled sample data. The unlabeled sample data are obtained from the original data and the extracted spatial features by using the band selection algorithm and correlation analysis. Lastly, the labeled sample data and the unlabeled sample data are fused as the input of the 3D-CNN, which then learns the deep spectral-spatial features. The algorithm is described in the following sections.

Feature Extraction: Gray Level Co-Occurrence Matrix (GLCM)
Texture is defined by the fact that the gray value of the pixels in an image changes continuously, according to a certain rule, with spatial position. It is one of the most important visual cues for identifying various homogeneous regions, which helps to segment or classify images [34].
A texture feature is a set of metrics computed by a feature extraction algorithm that is used to quantify the perceived texture of an image. Texture features are usually computed in a moving window, and each pixel in the image is assigned an explicit and generic neighborhood set [26]. The GLCM method extracts the texture feature information of an image by counting the grayscale attributes of each pixel and its neighborhood. It records the frequency at which two pixels with given gray values appear in the image at a given offset distance, which gives the joint probability distribution of the pixel pair over the gray values. It can be expressed as follows.
$$p(i, j, d, \theta) = \#\{[(x, y), (x + D_x, y + D_y)] \mid f(x, y) = i,\ f(x + D_x, y + D_y) = j\} \tag{1}$$
In Equation (1), $i, j \in \{0, 1, \ldots, N_g - 1\}$ range over the set of $N_g$ quantized gray levels, $d$ is the interval distance of pixel pairs, $\theta$ is the direction angle of the displacement, $(x, y)$ is the coordinate of the pixel in the image with $x = 0, 1, \ldots, N_x - 1$ and $y = 0, 1, \ldots, N_y - 1$, $f(x, y)$ is the gray value at $(x, y)$, $D_x$ and $D_y$ are the horizontal and vertical offsets, respectively, $N_x$ and $N_y$ are the numbers of columns and rows, respectively, and $\#$ denotes the number of elements in the set. $p(i, j, d, \theta)$ is the number of times that a pixel pair with gray values $i$ and $j$ appears along the direction angle $\theta$. Generally, the four angles 0°, 45°, 90°, and 135° are used when calculating the GLCM, and the corresponding values of $(D_x, D_y)$ are $(1, 0)$, $(1, 1)$, $(0, 1)$, and $(-1, 1)$, respectively. Then, the feature values in the four directions are averaged to form the texture feature matrix.
The normalized gray level co-occurrence matrix can be expressed as follows.
$$P_{i,j} = \begin{cases} \dfrac{p(i, j)}{N_x (N_y - 1)}, & \theta = 0^\circ \text{ or } 90^\circ \\[2ex] \dfrac{p(i, j)}{(N_x - 1)(N_y - 1)}, & \theta = 45^\circ \text{ or } 135^\circ \end{cases} \tag{2}$$
In Equation (2), $P_{i,j}$ is the normalized representation of the elements in the GLCM, $p(i, j)$ is the element $(i, j)$ of the GLCM, and $N_x$ and $N_y$ are the numbers of columns and rows of the image, respectively. Normally, if the texture of the image is relatively uniform, the larger values of the GLCM extracted from the image are gathered near the diagonal. If the texture of the image changes more sharply, the larger values of the GLCM are distributed farther from the diagonal.
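The counting and normalization described by Equations (1) and (2) can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions rather than the implementation used in this paper: the function names (`glcm`, `contrast`, `asm`, `mean_texture`) are hypothetical, the offset distance is fixed at d = 1, and the matrix is normalized by the number of counted pixel pairs, which coincides with the denominators of Equation (2) for a square window.

```python
import numpy as np

# Offsets (D_x, D_y) for d = 1 at 0, 45, 90, and 135 degrees.
OFFSETS = {0: (1, 0), 45: (1, 1), 90: (0, 1), 135: (-1, 1)}


def glcm(window, levels, dx, dy):
    """Count gray-level pairs (i, j) at offset (dx, dy) and normalize the counts."""
    rows, cols = window.shape
    counts = np.zeros((levels, levels), dtype=np.float64)
    for y in range(rows):
        for x in range(cols):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                counts[window[y, x], window[y2, x2]] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def contrast(P):
    """Contrast: large when mass lies far from the diagonal of the GLCM."""
    i, j = np.indices(P.shape)
    return float(np.sum(P * (i - j) ** 2))


def asm(P):
    """Angular second moment: large when the texture is uniform."""
    return float(np.sum(P ** 2))


def mean_texture(window, levels, feature):
    """Average one texture measure over the four direction angles."""
    values = [feature(glcm(window, levels, dx, dy)) for dx, dy in OFFSETS.values()]
    return float(np.mean(values))


if __name__ == "__main__":
    checker = np.indices((4, 4)).sum(axis=0) % 2  # sharply varying texture
    flat = np.zeros((4, 4), dtype=int)            # uniform texture
    print(mean_texture(checker, 2, contrast), mean_texture(flat, 2, asm))
```

In the sliding-window setting, `mean_texture` would be evaluated on the window centered at each pixel and the result assigned to that pixel, once per texture measure.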

Band Selection
Hyperspectral data dimensionality reduction retains as much of the important information in the data as possible while compressing the original data. Keeping fewer spectral bands through a band selection algorithm not only preserves the original features of the bands, but also removes interference from redundant information, which reduces the computational cost in the CNN.
For band selection, we use the improved similarity measurement method based on linear prediction (ISMLP) from our previous work [21]. The purpose of the algorithm is to select the optimal band combination with a large amount of information and low similarity between bands. The main idea is to determine the first initial band, the one with the largest amount of information, through mutual information (MI), and then to select the second initial band by a spectral correlation measure (SCM). Subsequent bands are selected by linear prediction (LP), and the virtual dimensionality (VD) is used to estimate the minimum number of hyperspectral bands that should be selected.
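The LP stage of this procedure can be sketched as follows. This is a simplified illustration, not the ISMLP algorithm of [21]: the MI and SCM steps that choose the two initial bands and the VD estimate of the band count are not implemented, so `initial_bands` and `n_bands` are hypothetical inputs supplied by the caller. The sketch assumes the common formulation of LP band selection, in which each remaining band is predicted by least squares from the bands already selected and the band with the largest prediction error (the least similar one) is added next.

```python
import numpy as np


def lp_band_selection(cube, initial_bands, n_bands):
    """Greedy linear-prediction band selection (illustrative sketch).

    cube: array of shape (rows, cols, bands).
    initial_bands: indices of the starting bands (assumed to be given).
    n_bands: total number of bands to select (assumed to be given).
    """
    X = cube.reshape(-1, cube.shape[-1]).astype(np.float64)  # pixels x bands
    selected = list(initial_bands)
    while len(selected) < n_bands:
        # Design matrix: a constant column plus the bands selected so far.
        A = np.column_stack([np.ones(X.shape[0]), X[:, selected]])
        # Least-squares prediction of every band from the selected bands.
        coef, *_ = np.linalg.lstsq(A, X, rcond=None)
        errors = np.sum((X - A @ coef) ** 2, axis=0)
        errors[selected] = -np.inf  # never pick an already selected band
        selected.append(int(np.argmax(errors)))
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, c = rng.normal(size=(3, 6, 6))
    # Band 2 is a linear combination of bands 0 and 1, so it is redundant.
    cube = np.stack([a, b, a + 2 * b, c], axis=-1)
    print(lp_band_selection(cube, [0, 1], 3))
```

In the toy cube, the redundant band is skipped because it is predicted almost exactly by the two initial bands, which is the behavior that makes the selected combination low in similarity.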

3D-CNN
Compared with the one-dimensional convolutional neural network (1D-CNN) and the two-dimensional convolutional neural network (2D-CNN), which are based on spectral features and spatial features, respectively, the 3D-CNN based on spectral-spatial-joint features combines the advantages of both and can extract the spectral and spatial information in an HSI simultaneously. By initializing in an unsupervised manner and then fine-tuning in a supervised manner, the 3D-CNN continually learns abstract and invariant high-level features from low-level features, which benefits classification, target detection, and other tasks.
In HSI classification, the 3D-CNN maps the input HSI pixels to pixel labels, so that each pixel in the HSI obtains its category label through the network, which completes the pixel-level classification of the HSI. Specifically, the 3D-CNN extracts the spectral-spatial information from the cube of a small spatial neighborhood centered on a given pixel and uses the category label obtained by the output layer as the label of the central pixel. Therefore, the input of the 3D-CNN is a three-dimensional cube block extracted from the HSI with size K × K × B, where K is the spatial size of the pixel block and must be an odd number, and B is the size of the pixel block in the spectral dimension, that is, the number of bands of the HSI. The 3D convolution is calculated as follows.
$$\upsilon_{ij}^{xyz} = f\left( \sum_{m} \sum_{h=0}^{H_i - 1} \sum_{w=0}^{W_i - 1} \sum_{r=0}^{R_i - 1} k_{ijm}^{hwr}\, \upsilon_{(i-1)m}^{(x+h)(y+w)(z+r)} + b_{ij} \right) \tag{3}$$
where $R_i$ is the size of the 3D convolution kernel in the spectral dimension, $H_i$ and $W_i$ are the height and width of the 3D convolution kernel, respectively, $\upsilon_{ij}^{xyz}$ is the value of the $j$th feature cube of the $i$th layer at position $(x, y, z)$, $k_{ijm}^{hwr}$ is the value at position $(h, w, r)$ of the $j$th convolution kernel of the $i$th layer, which is connected to the $m$th feature cube of the $(i - 1)$th layer, $b_{ij}$ is the bias, and $f(\cdot)$ is the activation function. In this paper, the ReLU function is used as the activation function, which is beneficial to gradient descent and back propagation and avoids the vanishing gradient problem. The formula is shown below.
$$f(x) = \max(0, x) \tag{4}$$
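The convolution and activation described above can be illustrated with a naive NumPy sketch for a single kernel and a single input feature cube, so the sum over m has one term. The function names and the kernel size in the usage example are hypothetical and chosen only for illustration; they are not the settings of the model in this paper.

```python
import numpy as np


def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return np.maximum(0.0, x)


def conv3d_valid(cube, kernel, bias=0.0):
    """Single-kernel 3D convolution (no padding, stride 1) followed by ReLU.

    cube: input feature cube of shape (K, K, B).
    kernel: 3D convolution kernel of shape (H, W, R).
    """
    H, W, R = kernel.shape
    out_shape = tuple(n - k + 1 for n, k in zip(cube.shape, kernel.shape))
    out = np.empty(out_shape)
    for x in range(out_shape[0]):
        for y in range(out_shape[1]):
            for z in range(out_shape[2]):
                patch = cube[x:x + H, y:y + W, z:z + R]
                out[x, y, z] = np.sum(patch * kernel) + bias
    return relu(out)


if __name__ == "__main__":
    cube = np.ones((5, 5, 8))    # a 5 x 5 spatial block with 8 channels
    kernel = np.ones((3, 3, 4))  # illustrative kernel size
    print(conv3d_valid(cube, kernel).shape)
```

Because the kernel slides along the spectral axis as well as the two spatial axes, each output value mixes neighboring pixels and neighboring bands, which is what allows joint spectral-spatial features to be learned.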
In addition, the adaptive moment estimation (Adam) algorithm is used as the gradient optimization algorithm in this paper [35]. The algorithm combines the advantage of the adaptive gradient algorithm (AdaGrad) in dealing with sparse gradients and the advantage of the root mean square propagation algorithm (RMSProp) in dealing with non-stationary objectives. The first-order and second-order moment estimates of the gradient are used, after bias correction, to dynamically adjust the learning rate of each parameter and to update the parameters. The specific expressions are as follows.
First, assume that, at time $t$, the first derivative of the objective function with respect to the parameters is $g_t$. Equations (5) and (6) update the biased first-order moment estimate and the biased second-order moment estimate of the gradient. The bias is corrected by Equations (7) and (8), and, lastly, the parameter update is completed by Equation (9).
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{5}$$
$$\nu_t = \beta_2 \nu_{t-1} + (1 - \beta_2) g_t^2 \tag{6}$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t} \tag{7}$$
$$\hat{\nu}_t = \frac{\nu_t}{1 - \beta_2^t} \tag{8}$$
$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{\nu}_t} + \varepsilon}\, \hat{m}_t \tag{9}$$
In Equations (5)-(9), $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates, set to 0.9 and 0.999, respectively. $m_t$ and $\nu_t$ are the first-order and second-order moment variables at time $t$, with initial values of 0, $\hat{m}_t$ and $\hat{\nu}_t$ are their bias-corrected estimates, and $\eta$ is the learning rate. To prevent the denominator from being 0, the small constant $\varepsilon$ is set to $10^{-8}$. $\theta_t$ is the 3D-CNN parameter variable at time $t$, including weights and biases.
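A single Adam update following Equations (5)-(9) can be written as a short NumPy sketch. The function name `adam_step` and the quadratic objective in the usage example are assumptions made for illustration; the default values of the decay rates and the small constant are those stated above, and the default learning rate is illustrative.

```python
import numpy as np


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at time step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad                   # Eq. (5)
    v = beta2 * v + (1 - beta2) * grad ** 2              # Eq. (6)
    m_hat = m / (1 - beta1 ** t)                         # Eq. (7)
    v_hat = v / (1 - beta2 ** t)                         # Eq. (8)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # Eq. (9)
    return theta, m, v


if __name__ == "__main__":
    # Toy objective: sum(theta ** 2), whose gradient is 2 * theta.
    theta = np.array([1.0, -2.0])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, 4):
        theta, m, v = adam_step(theta, 2 * theta, m, v, t)
    print(theta)
```

On the first step the bias correction makes the update magnitude approximately equal to the learning rate for every parameter, regardless of the scale of its gradient, which is the per-parameter adaptation described above.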

Implementation Process of the Proposed Method
According to the above algorithm, the specific implementation process is described in Algorithm 1 below.
(2) According to the GLCM algorithm, slide the window over the PC with step d and direction angle θ. Each time the window is moved, calculate its gray level co-occurrence matrix, obtain the texture feature value, and assign the value to the center pixel of the window.
(3) Repeat (2) until the sliding window has covered the PC.
(4) For a given texture measure, sum and average the texture feature matrices of the four direction angles to obtain the final texture feature matrix, and do the same for the other texture measures.
(5) Repeat step (4) until eight textural features are obtained.
(6) Feature extraction is completed.
B. Nearest neighbor sample selection
(7) For a given labeled sample in the original data, calculate the Euclidean distance to all unlabeled samples and sort the distances in ascending order. The distance is calculated as follows:
$$dist(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}$$
(8) Repeat step (7) until all labeled samples have been processed.
(9) Neighbor sample extraction is completed.
C. Input data preprocessing
i. Labeled samples
(10) Stack the spatial features extracted in phase A with the original data.
(11) The data acquisition of labeled samples is completed.
ii. Unlabeled samples
(12) Obtain spectral features by reducing the original data with the band selection method in Section 2.2.
(13) Obtain spatial features by correlation analysis of the spatial features extracted in phase A, and remove the highly correlated components.
(14) Stack the spectral features and spatial features from steps (12) and (13).
(15) The data acquisition of unlabeled samples is completed.
(16) Fuse the labeled sample data in i (Labeled samples) with the unlabeled sample data in ii (Unlabeled samples) according to the neighbor relationship in phase B.
(20) Suppose the first layer contains n convolution kernels of size C × C × D. After each K × K × B sample is subjected to the convolution operation of the first layer, n data cubes of size (K-C+1) × (K-C+1) × (B-D+1) are output. The output of the first layer is the input of the second layer, which continues the convolution operation, and so on. The final output is converted into a feature vector and then input into the fully connected layer, which maps and merges the local features extracted during convolution. After the loss is calculated by the softmax cross-entropy function, the gradient of each parameter is calculated by back propagation, and the network parameters are dynamically updated by the Adam algorithm.
Test stage
(23) Input the test samples from step (18) into the trained 3D-CNN model, and calculate the confusion matrix from the predicted labels and the real labels to obtain the classification accuracy.
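The neighbor selection in phase B and the fusion in step (16) can be sketched in NumPy as follows. This is a simplified illustration that works on per-pixel feature vectors rather than on K × K × B blocks, and the function names `nearest_unlabeled` and `fuse` are hypothetical. The feature sizes in the usage example are chosen so that the fused depth matches the channel arithmetic reported later for the Baffin Bay experiment (176 + 8 + 8 × 20 = 344); the random data are placeholders.

```python
import numpy as np


def nearest_unlabeled(labeled, unlabeled, k):
    """For each labeled sample, return the indices of its k nearest unlabeled samples.

    labeled: array of shape (n_labeled, n_features).
    unlabeled: array of shape (n_unlabeled, n_features).
    """
    diff = labeled[:, None, :] - unlabeled[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))  # Euclidean distance
    return np.argsort(dist, axis=1)[:, :k]      # ascending order, keep the first k


def fuse(labeled_full, unlabeled_reduced, neighbor_idx):
    """Append the reduced features of each neighbor to the labeled sample's features."""
    neighbors = unlabeled_reduced[neighbor_idx]       # (n_labeled, k, n_reduced)
    flat = neighbors.reshape(neighbors.shape[0], -1)  # (n_labeled, k * n_reduced)
    return np.concatenate([labeled_full, flat], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labeled_spectra = rng.random((2, 176))    # original bands of labeled pixels
    unlabeled_spectra = rng.random((50, 176)) # original bands of unlabeled pixels
    labeled_full = rng.random((2, 184))       # 176 bands + 8 texture features
    unlabeled_reduced = rng.random((50, 8))   # 3 selected bands + 5 texture features
    idx = nearest_unlabeled(labeled_spectra, unlabeled_spectra, k=20)
    print(fuse(labeled_full, unlabeled_reduced, idx).shape)
```

Keeping the full feature set for the labeled sample while appending only the reduced features of its neighbors is what limits the growth of the channel dimension as more neighbors are added.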

Experimental Results
To verify the effectiveness of the proposed method, we used two hyperspectral remote sensing sea ice data sets in the experiments. The Baffin Bay and Bohai Bay images were captured by the Earth Observing-1 (EO-1) satellite [36]. The proposed method was compared with six other algorithms: decision tree, SVM, 1D-CNN, 2D-CNN, 3D-CNN, and GLCM-CNN. GLCM-CNN combines the spatial features extracted by the GLCM with the original data and then inputs the fused data into the CNN. All classification algorithms were evaluated with the overall accuracy (OA), average accuracy (AA), and kappa statistic (K). Each algorithm was run 20 times with different initial random training samples, and the classification accuracy values are given in the form of the mean ± standard deviation. The experimental environment is an Intel(R) Core(TM) i5-2500 CPU at 3.30 GHz with 22 GB of installed memory.
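The three evaluation measures can be computed from a confusion matrix as in the following sketch, which uses the standard definitions of OA, AA, and the kappa statistic. The function name `accuracy_metrics` is hypothetical and the counts in the usage example are made up for illustration; they are not results from this paper.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Return (OA, AA, kappa) from a confusion matrix.

    confusion[i, j]: number of test samples of true class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    total = confusion.sum()
    # Overall accuracy: fraction of all samples that are classified correctly.
    oa = np.trace(confusion) / total
    # Average accuracy: mean of the per-class accuracies.
    aa = np.mean(np.diag(confusion) / confusion.sum(axis=1))
    # Agreement expected by chance, from the row and column marginals.
    expected = np.sum(confusion.sum(axis=0) * confusion.sum(axis=1)) / total ** 2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


if __name__ == "__main__":
    # Hypothetical two-class confusion matrix.
    print(accuracy_metrics([[50, 0], [10, 40]]))
```

AA weights every class equally, so it is more sensitive than OA to a poorly classified minority class, while kappa discounts the agreement that would occur by chance.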

Data Description
The first data set used in this experiment is a cloud-free Hyperion image of Baffin Bay near Greenland taken on 12 April 2014. The upper left-hand corner of the image is located at 79°51′27″W, 74°16′16″N, and the lower right-hand corner is located at 79°29′20″W, 73°57′5″N. The data set belongs to level L1G, at which geometric correction, projection registration, and topographic correction have already been made [36]. The image size is 2395 pixels × 1769 pixels, with a spatial resolution of 30 m. Considering the computational cost, we take part of the original image as the experimental area, which contains all the sea ice categories in the original image. The size of the experimental region is 186 pixels × 209 pixels. Of the 242 bands in the image data, 176 bands were used for analysis after removing some bands with a low signal-to-noise ratio (SNR) and the water absorption bands [37]. According to the spectral characteristics of sea ice (see Figure 2) and NSIDC data (see Section 4.5), we divided the experimental scene into three categories: seawater, thin ice (<120 cm thick), and thick ice (>120 cm thick). A certain number of samples were labeled manually to form the sample database, which is taken as ground truth, and the database was randomly divided into training samples and test samples, as shown in Figure 3 and Table 1.
The CNN contains a large number of parameters to be trained. For the 3D-CNN, the input size of the three-dimensional data (including the spatial dimensions of the image and the size of the channel depth dimension) directly affects the training time of the model. As the spatial dimensions increase and the channel dimension deepens, the time cost becomes higher. To reduce the time complexity of the model as much as possible, it is necessary to reduce the dimension of the unlabeled samples, which contain spectral features and spatial features.
For the spectral features of the unlabeled samples, based on our previous work [21] and considering the information of each band, the similarity among bands, and the spectral characteristics of sea ice, we use the band selection algorithm to select a combination of bands with a large amount of information and low similarity among bands. As a result, three bands are selected as the spectral features: bands 16, 118, and 84.
In addition, correlation analysis is used to exclude highly correlated components of the textural features extracted by GLCM. Table 2 shows the correlation matrix of the eight texture components in the Baffin Bay data set. When two texture features are highly correlated (the absolute value of the correlation coefficient is greater than 0.7), the texture component with the smaller average absolute correlation is selected for further study. For example, in Table 2, the correlation coefficient between homogeneity and the angular second moment (ASM) is 0.8020, while the average absolute correlation of homogeneity is 0.5087, which is greater than that of ASM (0.4967), so ASM is kept. Similarly, the correlation coefficient between contrast and dissimilarity is 0.7232, the correlation coefficient between entropy and ASM is -0.7353, and dissimilarity and entropy have the larger average absolute correlation in their respective pairs. Therefore, five texture components are finally retained: mean, variance, contrast, ASM, and correlation. Combining the three bands after band selection and the five texture components after correlation analysis, eight attribute features are used to represent the spectral-spatial information of each unlabeled sample. Additionally, 20 neighbor samples (K = 20) are selected for each labeled sample, and their spectral-spatial features are used as the patch information of the labeled sample. The channel (depth) dimension of the final input data is 176 + 8 + 8 × 20 = 344.
The model proposed in Reference [31] is lightweight and easy to train and adjust. To further reduce the time cost, this paper draws on the advantages of that model and optimizes its hyper-parameters. The model structure is shown in Table 3. The model contains an input layer, two convolutional layers, a fully connected layer, and an output layer. The data input size is 5 × 5 × 344, and the input is normalized to [0, 1]. The learning rate is set to 0.001 and the dropout value is 0.5. The number of training iterations is 2000, and 20 training samples are randomly input into the network for each iteration.
It can be seen from Table 4 that, compared with the other algorithms, the proposed algorithm achieves the best classification result, with an OA of 98.52%. On the whole, decision tree and SVM achieve relatively low classification accuracy (85.54% and 90.36%, respectively) because both are shallow learning models and do not fully mine the spatial features in the process of classification. Compared with 1D-CNN using only spectral features and 2D-CNN using only spatial features, the proposed method increases the OA by 8.74% and 5.05%, respectively, which indicates that the use of spectral-spatial-joint features can effectively improve the accuracy of sea ice classification. Among the methods based on spectral-spatial features, GLCM-CNN is more accurate than 3D-CNN, which reflects the validity of the texture information extracted by GLCM. However, the improvement is limited, at only 0.58%. The proposed method is 3.08% higher than 3D-CNN and 2.50% higher than GLCM-CNN (96.02%), which does not fuse neighbor sample information, and AA and the kappa coefficient are increased by 2.35% and 3.82%, respectively. This shows that the spatial information of the nearest-neighbor unlabeled samples can effectively enhance the quality of the training samples and improve the accuracy of sea ice classification, which further supports the superiority of the proposed method.
Among the three categories, seawater has a lower spectral reflectance due to its own characteristics, so it is easier to distinguish than sea ice. Thin ice, as an intermediate category between seawater and thick ice, covers a wide range of thicknesses, so its misclassification is serious. In Table 4, the classification accuracy of thin ice is relatively low in all seven methods, and thin ice is difficult to distinguish by spectral features in the decision tree, SVM, and 1D-CNN. In 2D-CNN and GLCM-CNN, the use of spatial features improves the classification of thin ice to some extent. The proposed method fuses the spatial information of the unlabeled neighbor samples with the spatial features of the original labeled samples and obtains the best classification results, which are increased by 6.34% and 4.57%, respectively. The classification result images are shown in Figure 4. Because the proposed method has advantages in classifying the intermediate category (thin ice), the visual classification results are more accurate at the category boundaries.
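The OA, AA, and Kappa values discussed above can all be derived from a confusion matrix. The following is a minimal sketch in plain Python, assuming the standard definitions of overall accuracy, average per-class accuracy, and Cohen's kappa; the function name `classification_metrics` and the 3 × 3 matrix are hypothetical illustrations and are not taken from Table 4.

```python
def classification_metrics(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix.

    confusion[i][j] counts test pixels of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    # Overall accuracy: fraction of all test pixels labeled correctly.
    oa = correct / total

    # Average accuracy: mean of the per-class accuracies (recall per class).
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(n)]
    aa = sum(per_class) / n

    # Kappa: agreement corrected for the agreement expected by chance.
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(sum(confusion[k]) * col_sums[k] for k in range(n)) / total**2
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Hypothetical counts for seawater, thin ice, and thick ice (not from the paper).
demo_confusion = [
    [50, 0, 0],
    [2, 40, 8],
    [0, 5, 45],
]
oa, aa, kappa = classification_metrics(demo_confusion)
print(f"OA={oa:.4f}  AA={aa:.4f}  Kappa={kappa:.4f}")
```

In this made-up example, the errors are concentrated in the intermediate class, which lowers its per-class accuracy in the same way the text describes for thin ice.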

Data Description
The second data set used in this experiment is a cloud-free Hyperion image of the Bohai Bay region taken on 23 January 2008. The upper left-hand corner of the image is located at 120°45′12″E, 41°39′7″N, and the lower right-hand corner is located at 121°13′9″E, 39°44′42″N. The image size is 7061 pixels × 2001 pixels and the spatial resolution is 30 m. The scene covered by sea ice is cut out from the original image as the experimental area. After clipping, the size of the experimental area is 272 pixels × 159 pixels. As with the Baffin Bay data, 176 of the 242 bands were selected for analysis, and the experimental area is classified into four categories according to spectral characteristics (see Figure 5) and NSIDC data (see Section 4.5): white ice (30-70 cm thick), gray-white ice (15-30 cm thick), gray ice (10-15 cm thick), and seawater. Labeled samples were acquired by manual labeling and then randomly divided into training samples and test samples in a 1:1 ratio, as shown in Figure 6 and Table 5.

Experimental Setup
For the Bohai Bay data set, band 21, band 120, and band 83 were selected as the optimal band combination to represent the spectral information of the unlabeled samples [21]. Table 6 is the correlation matrix of the Bohai Bay data set, in which the correlation coefficients of mean with dissimilarity and with entropy are 0.7458 and 0.7358, respectively. The average absolute correlation of mean (0.5932) is lower than that of these two texture components. Therefore, dissimilarity and entropy are excluded. Homogeneity is highly correlated with the angular second moment (ASM); comparing their average absolute correlations (0.5377 and 0.5040, respectively), homogeneity is excluded. According to Table 6, five texture components are retained: mean, variance, contrast, ASM, and correlation. The input data of the Bohai Bay data set is 5 × 5 × 344, where the 344 channels contain the original 176 bands of the input sample, its texture feature bands, and the patch information. For the patch information, each nearest neighbor sample contributes the optimal combination of three bands and the five low-correlation texture components. Table 7 shows the network structure of the model. The input data are normalized to [0, 1], and the other hyper-parameter settings are the same as for the Baffin Bay data set.
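The selection rule described above (for each pair with an absolute correlation above 0.7, drop the component with the larger average absolute correlation) can be sketched as follows. This is an illustrative reading of the rule, not the authors' code: the function name `select_low_correlation` is hypothetical, the average is assumed to exclude the diagonal, all pairs are assumed to be judged against the averages of the full matrix, and the 4 × 4 matrix is made-up data rather than Table 2 or Table 6.

```python
def select_low_correlation(names, corr, threshold=0.7):
    """Keep components that survive the pairwise high-correlation rule.

    names: component names; corr: symmetric correlation matrix (list of lists).
    """
    n = len(names)
    # Average absolute correlation of each component with all the others
    # (assumption: the diagonal self-correlation is excluded).
    avg_abs = [
        sum(abs(corr[i][j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    dropped = set()
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i][j]) > threshold:
                # Drop the member of the pair with the larger average.
                dropped.add(i if avg_abs[i] > avg_abs[j] else j)
    return [names[k] for k in range(n) if k not in dropped]


# Made-up example with four components "a" to "d" (not from the paper).
demo_names = ["a", "b", "c", "d"]
demo_corr = [
    [1.00, 0.80, 0.10, 0.20],
    [0.80, 1.00, 0.30, 0.50],
    [0.10, 0.30, 1.00, 0.75],
    [0.20, 0.50, 0.75, 1.00],
]
kept = select_low_correlation(demo_names, demo_corr)
print(kept)
```

Here the pairs (a, b) and (c, d) exceed the threshold, and b and d have the larger average absolute correlations, so a and c are retained.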

Experimental Results and Discussion
The classification results and result maps of the Bohai Bay data set are shown in Table 8 and Figure 7. Among the four comparison algorithms, the input of the 3D-CNN is 9 × 9 × 176, and the others are the same as for the Baffin Bay data set. The proposed method has the same spatial dimension as GLCM-CNN (5 × 5). From Table 8, the proposed method obtains the highest classification accuracy (OA = 97.91%), which is 14.04%, 8.30%, 7.32%, 3.83%, 2.96%, and 1.48% higher than the decision tree (83.87%), SVM (89.61%), 1D-CNN (90.59%), 2D-CNN (94.08%), 3D-CNN (94.92%), and GLCM-CNN (96.43%), respectively. This indicates that, when the spatial input sizes are not very different, the proposed method is effective in extracting spatial information from the nearest unlabeled neighbor samples to enhance the sample quality. Among seawater and the three sea ice categories, the proposed method has an advantage in classifying the intermediate categories (gray-white ice and gray ice). Compared with 2D-CNN, 3D-CNN, and GLCM-CNN, which contain spatial information, the proposed method increases the classification accuracy of gray-white ice by 3.23%, 3.23%, and 1.77%, respectively, and that of gray ice by 2.96%, 1.32%, and 1.41%, respectively. Moreover, as demonstrated by the classification result map, the proposed method effectively eliminates noise points. The classification result map is smoother, and the distinction between different types in edge regions is more precise.

Training Samples
The number of training samples is a very important factor in CNN training. Since a CNN contains many parameters to be trained, a large number of training samples are needed to ensure the diversity of samples and to extract more robust and effective features. For sea ice at different latitudes, we explored the number of training samples separately. Ten experiments were carried out for each training sample size, and the reported classification accuracy is the average of the 10 experimental results. For the Baffin Bay data set, which is located in the high latitudes of the Arctic, different types of sea ice are more distinct, so fewer training samples are needed to achieve a high classification accuracy. This paper randomly selects the same number of training samples for each category. As shown in Figure 8a, the classification accuracy increases with the number of training samples, and the proposed method is superior to the other algorithms at every training sample size. For example, when only 10 training samples are selected for each category, the average accuracy of the 10 experiments under random sampling is 90.34%, which is 5.43%, 10.59%, 8.97%, 9.02%, 5.45%, and 2.62% higher than the decision tree (84.91%), SVM (79.75%), 1D-CNN (81.37%), 2D-CNN (81.32%), 3D-CNN (84.89%), and GLCM-CNN (87.72%), respectively. When the number of training samples for each category is 20, the margins over the other methods reach their maximum: 9.60%, 11.38%, 10.59%, 9.60%, 5.36%, and 3.25%, respectively.
Compared to Baffin Bay, Bohai Bay is at a mid-latitude and mainly contains one-year ice. The separability between different types of sea ice is low, and more training samples are needed. Therefore, for the Bohai Bay data set, 10% to 50% of the labeled samples are randomly selected as training samples, and the rest are used as test samples. Ten experiments are performed at each ratio. As shown in Figure 8b, the proposed method has advantages at all training sample ratios, especially when the training samples are insufficient. When the training sample ratio is 10%, the classification accuracy of the proposed approach is 92.74%. Compared with the decision tree (82.64%), SVM (65.33%), 1D-CNN (86.85%), 2D-CNN (88.72%), 3D-CNN (85.99%), and GLCM-CNN (89.13%), the accuracy of the proposed approach is higher by 10.10%, 27.41%, 5.89%, 4.02%, 6.75%, and 3.61%, respectively.
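The repeated random sampling protocol described above (a fixed number of training samples per category, with the accuracy averaged over 10 runs) can be sketched as follows. This is a hedged illustration only: `sample_per_class`, `mean_accuracy`, and the stand-in `evaluate` callback are hypothetical names, and no classifier is trained here.

```python
import random


def sample_per_class(labels, n_per_class, rng):
    """Draw n_per_class training indices from each class; the rest are test indices."""
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    train = []
    for lab in sorted(by_class):
        train.extend(rng.sample(by_class[lab], n_per_class))
    train_set = set(train)
    test = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train), test


def mean_accuracy(labels, n_per_class, evaluate, n_runs=10, seed=0):
    """Average evaluate(train, test) over n_runs independent random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = sample_per_class(labels, n_per_class, rng)
        scores.append(evaluate(train, test))
    return sum(scores) / n_runs


# Hypothetical labels: three classes with 30 labeled pixels each.
demo_labels = [0] * 30 + [1] * 30 + [2] * 30
demo_train, demo_test = sample_per_class(demo_labels, 10, random.Random(1))

# Stand-in for "train a model and return its test accuracy"; it only
# reports the test fraction so that the sketch runs without a classifier.
demo_score = mean_accuracy(
    demo_labels, 10, lambda tr, te: len(te) / (len(tr) + len(te))
)
```

For the Bohai Bay protocol, the same loop would apply with a per-class count derived from the chosen ratio (10% to 50%) instead of a fixed number.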
Remote Sens. 2018, 10, x; doi: FOR PEER REVIEW www.mdpi.com/journal/remotesensing

The Value of K
The KNN algorithm does not contain any parameters to be trained, and only the selection of the K value affects the results. The K value is the number of unlabeled neighbor samples selected for each labeled sample. This section explores the effect of K values from 1 to 20 and performs five experiments under random sampling for each K value. As shown in Figure 9, the classification accuracy rises as the K value increases, but the number of selected neighboring samples also increases. Although this brings a wealth of information to the labeled samples, it also increases the channel (depth) dimension of the input data, so it takes more time to train a robust and efficient classification model. Therefore, choosing the appropriate K value requires weighing the classification accuracy against the computational complexity of the model. In the experiments of this paper, the K value is 20 for both data sets.
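The growth of the channel dimension with K can be made concrete with a small sketch of the neighbor fusion step. The assumptions are stated loudly: the distance metric is not given in this section, so Euclidean distance in the eight-dimensional reduced feature space (three selected bands plus five texture components) is assumed here; `fuse_neighbors` is a hypothetical name; the data are random placeholders; and only a single pixel vector is built, not the 5 × 5 spatial patch.

```python
import math
import random


def fuse_neighbors(labeled_full, labeled_reduced, unlabeled_reduced, k):
    """Append the reduced features of the k nearest unlabeled samples.

    labeled_full: full feature vector of the labeled sample (bands + textures).
    labeled_reduced: its reduced representation, used only for the distance.
    unlabeled_reduced: list of reduced feature vectors of unlabeled samples.
    """
    # Rank unlabeled samples by (assumed) Euclidean distance to the labeled one.
    ranked = sorted(
        (math.dist(labeled_reduced, u), idx)
        for idx, u in enumerate(unlabeled_reduced)
    )
    fused = list(labeled_full)
    for _, idx in ranked[:k]:
        fused.extend(unlabeled_reduced[idx])
    return fused


rng = random.Random(0)
demo_full = [0.5] * (176 + 8)      # 176 bands + 8 texture components
demo_reduced = [0.5] * 8           # 3 selected bands + 5 retained textures
demo_pool = [[rng.random() for _ in range(8)] for _ in range(50)]

fused_vector = fuse_neighbors(demo_full, demo_reduced, demo_pool, k=20)
print(len(fused_vector))           # 176 + 8 + 8 * 20

# Tiny check of the neighbor ordering with hand-made 2-D features.
tiny = fuse_neighbors([9.0], [0.0, 0.0], [[5.0, 5.0], [1.0, 0.0], [0.0, 2.0]], k=2)
```

Each additional neighbor adds eight channels, which is why the depth is 184 + 8K and reaches 344 at K = 20.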

The Size of the GLCM Sliding Window
GLCM extracts spatial texture information through a sliding window of a certain size. The size of the sliding window affects the extracted features, and the computational cost increases with the window size. To select an appropriate sliding window, this section tests three window sizes, 3 × 3, 5 × 5, and 7 × 7, on the two data sets, with 10 experiments for each window size. The classification results are shown in Tables 9 and 10. Both data sets achieve the highest accuracy with the 5 × 5 sliding window, although the results differ little from those of the other two window sizes. However, the larger the sliding window, the more edge samples are lost in the calculation, which affects the classification accuracy. Considering the balance between classification accuracy and calculation time, the sliding window size of the proposed method is set to 5 × 5 for both the Baffin Bay and Bohai Bay data sets.
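To illustrate what a GLCM window computes and why larger windows lose edge samples, the sketch below builds a normalized symmetric co-occurrence matrix for one quantized window and derives four of the texture components named in this paper. The offset, the number of gray levels, and the helper names (`glcm`, `texture_features`, `valid_positions`) are assumptions made for illustration; the paper's own GLCM settings are not stated in this section. Without padding, an H × W image yields (H - w + 1)(W - w + 1) window positions; for the 272 × 159 Bohai Bay area and w = 5 this gives 268 × 155 = 41,540, which coincides with the number of classified pixels reported in the Method Validation section.

```python
import math


def glcm(window, levels, dy=0, dx=1):
    """Normalized symmetric GLCM of a quantized 2-D window for offset (dy, dx)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = window[r][c], window[r2][c2]
                counts[a][b] += 1   # pair (a, b)
                counts[b][a] += 1   # symmetric pair (b, a)
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, ASM, homogeneity, and entropy of a normalized GLCM."""
    feats = {"contrast": 0.0, "asm": 0.0, "homogeneity": 0.0, "entropy": 0.0}
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            feats["contrast"] += v * (i - j) ** 2
            feats["asm"] += v * v
            feats["homogeneity"] += v / (1 + (i - j) ** 2)
            if v > 0:
                feats["entropy"] -= v * math.log(v)
    return feats


def valid_positions(height, width, win):
    """Number of window centers when the window must fit inside the image."""
    return (height - win + 1) * (width - win + 1)


# A flat window has no texture; a striped window has maximal contrast.
flat = texture_features(glcm([[0, 0], [0, 0]], levels=2))
striped = texture_features(glcm([[0, 1], [0, 1]], levels=2))
bohai_positions = {w: valid_positions(272, 159, w) for w in (3, 5, 7)}
```

The `bohai_positions` dictionary shows the trade-off mentioned above: each step up in window size removes one more ring of border pixels from the usable area.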

Training Time
Table 11 compares the training times of the different algorithms for the two data sets. Because the shallow learning models are relatively simple, their training time is generally shorter than that of the deep learning methods. The input size of the training data, the number of iterations, the batch size, and the hyper-parameters of the model (the number of convolutional layers, the size and number of convolution kernels, etc.) all affect the training time. For example, in the Baffin Bay experiment, 3D-CNN has the same number of convolutional layers (two) and the same convolution kernel size (3 × 3) as GLCM-CNN, and both use 2000 iterations and a batch size of 20. However, 3D-CNN uses 7 and 3 convolution kernels in its two layers and 256 neurons in the fully connected layer, whereas GLCM-CNN uses 4, 2, and 120, respectively (the best parameters were selected based on the experimental results). With the same spatial input size, the channel (depth) dimension of GLCM-CNN is higher than that of 3D-CNN, but GLCM-CNN has fewer convolution kernels and fewer neurons in the fully connected layer. Therefore, its training time is shorter than that of 3D-CNN. Because the added texture information and unlabeled sample information increase the computational complexity, the training time of the proposed method is longer than that of the other algorithms. A similar situation also appeared in the Bohai Bay experiment.

Method Validation
Because of the particular geographical environment and conditions, it is difficult to obtain measured data for sea ice covered areas. To verify the effectiveness of the proposed method, we refer to sea ice distribution vector data for the same area and the same period, downloaded from the National Snow and Ice Data Center (http://nsidc.org/), and perform a qualitative verification. The downloaded data are in the Sea Ice Grid-3 (SIGRID-3) format, which contains the ice map (distribution by region) and the corresponding attribute list (such as the concentration, stage of development, and form of ice) [38].
The Baffin Bay experimental area in this paper is part of the red area in Figure 10a, and the National Snow and Ice Data Center also provides the sea ice data description file [38] of the red area, as shown in Figure 10b. The main parameter values are CT: 91, SA: 93, FA: 06, CN: 95, and Poly_type: I; the value "-9" represents no information. As can be seen from Table 12, the total sea ice concentration in the region is 90-100%. From the parameters SA and CN of the sea ice type, we can get the sea ice information: Vast Floe (>120 cm) and Old Ice. In our classification results, there are 35,778 classified pixels, of which 2394 are labeled as seawater, which accounts for 6.69% of the total. Therefore, our classification results are consistent with the data provided by the National Snow and Ice Data Center.
Figure 11a shows the sea ice distribution near Bohai Bay, and the red area contains the experimental area. Figure 11b shows the sea ice parameters in the red area, which are described in Table 13. The total sea ice concentration in the region is 70-90%, as can be seen from Table 13. From the parameters CA and SA, we can get the sea ice information: young ice (10-30 cm). In our classification results, there are 41,540 classified pixels, of which 20,759 are labeled as seawater. This means that sea ice accounts for 50.03% of the total. In addition, the experimental area is only part of the red area in Figure 11a, so some errors are inevitable because of regional differences. We refer to the spectral curves of sea ice and the NSIDC data to define the sea ice types, extract deep spectral-spatial features by 3D-CNN, and obtain results similar to the NSIDC sea ice data. Therefore, from a qualitative point of view, the proposed method and the classification results are reliable.
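The percentages quoted in this qualitative check follow directly from the pixel counts, as the short calculation below shows. The helper name `percent` is hypothetical; the pixel counts are the ones given in the text.

```python
def percent(part, total):
    """Share of `part` in `total`, in percent."""
    return 100.0 * part / total


# Baffin Bay: 2394 of 35,778 classified pixels are seawater.
baffin_seawater = percent(2394, 35778)
baffin_sea_ice = 100.0 - baffin_seawater

# Bohai Bay: 20,759 of 41,540 classified pixels are seawater,
# so the remaining pixels are sea ice.
bohai_sea_ice = percent(41540 - 20759, 41540)

print(f"Baffin Bay seawater: {baffin_seawater:.2f}%")
print(f"Baffin Bay sea ice:  {baffin_sea_ice:.2f}%")
print(f"Bohai Bay sea ice:   {bohai_sea_ice:.2f}%")
```

The Baffin Bay sea ice share falls inside the 90-100% concentration range from Table 12, while the Bohai Bay share is below the 70-90% range from Table 13, which is the gap the text attributes to regional differences.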

Conclusions
In hyperspectral remote sensing sea ice image classification, it is difficult to acquire labeled samples due to environmental conditions, and the labeling cost is high. In addition, most traditional sea ice classification methods only use spectral features and do not make full use of the rich spatial features in hyperspectral remote sensing sea ice images, which limits further improvement of the sea ice classification accuracy. This paper proposes a classification model based on deep learning and spectral-spatial-joint features for sea ice images. By combining a small number of labeled samples with a large number of unlabeled samples, the proposed method fully exploits the spectral and spatial information in the hyperspectral remote sensing sea ice data and improves the accuracy of sea ice classification. Compared with classification methods based on a single type of information and with other spectral-spatial methods, the proposed method effectively extracts sea ice spectral-spatial features from few training samples by using a large amount of unlabeled sample information, and achieves superior overall classification results, which provides a new idea for the classification of remote sensing sea ice images.
(1) Compared with the 1D-CNN model, which can only extract spectral features, and the 2D-CNN model, which can only extract spatial features, the 3D-CNN model can extract spectral and spatial features simultaneously and fully exploit the sea ice characteristic information hidden in the remote sensing data. The 3D-CNN model is a classification model suitable for hyperspectral remote sensing sea ice images.
(2) Because the textural characteristics of different types of sea ice are clearly different, texture feature enhancement by GLCM is conducive to sea ice identification and classification. The proposed method extracts sea ice texture features based on GLCM and combines the texture information with sea ice spectral-spatial information. At the same time, it uses the large-scale features of neighboring unlabeled samples to further enhance the quality of labeled samples. Additionally, the 3D-CNN model is designed for deep spectral-spatial feature extraction and classification, which can significantly improve the accuracy of sea ice classification under small sample conditions.
(3) The addition of a large amount of unlabeled sample information increases the computational cost. To reduce the time complexity of the model, the proposed method preprocesses the spectral-spatial information of the unlabeled samples. For the spectral information, a band selection algorithm is used for dimensionality reduction: a large number of redundant spectral bands are eliminated, and deep spectral feature extraction is performed on the selected bands, which carry a large amount of information and have low similarity. For the spatial information, low-correlation texture features are selected by correlation analysis for deep spatial feature extraction, which effectively reduces the training time. However, the computational cost of the proposed method is still relatively high when the experimental area is large.
Neither data set in this paper contains cloud or snow cover, so this issue was not discussed. However, cloud cover is an important problem in sea ice detection [39]. It introduces the problem of different objects sharing the same spectrum and limits the improvement of sea ice classification accuracy. Moreover, because snow cover has a different reflectance from sea ice, hyperspectral data with nanometer-scale spectral resolution can be used to distinguish snow cover. In addition, melt ponds greatly lower the ice cover albedo, but they still differ spectrally from sea ice [40]. In future research, we intend to integrate microwave remote sensing and higher-resolution data to reduce the impact of cloud cover, snow cover, and melt ponds on sea ice detection. In addition, because of its great potential in automatic feature extraction and model building, deep learning is widely used in various fields [41]. However, it requires large data sets and long training times. Balancing the sample size and the training time is the key to its application. The proposed method offers a new approach for applications with few labeled samples.

Figure 1. The structure of the proposed framework.

(1) Process the original data set by the principal component analysis (PCA) algorithm, taking the first principal component (PC).

(18) Input data are randomly divided into training samples and test samples, and the input size of each sample is K × K × B.

Training stage
(19) Randomly select Batch (Batch = 20) training samples from the training samples and feed them into the pre-established 3D-CNN network each time.

Figure 2. Spectral reflectance curves of sea ice types and seawater from hyperspectral data for the Baffin Bay data set.

Experimental Results and Discussion
Table 4 shows the classification results of the Baffin Bay data set based on different classification methods. The classification result values are obtained by comparing the predicted labels with the corresponding ground truth. In the experiment, the input sizes of 2D-CNN and 3D-CNN are 19 × 19 and 5 × 5 × 176, respectively. The input size of GLCM-CNN is 5 × 5 × (176 + 8), where 176 is the number of original bands and 8 is the number of texture components extracted from the first principal component by GLCM.

Figure 5. Spectral reflectance curves of sea ice types and seawater from hyperspectral data for the Bohai Bay data set.

Figure 8. The effects of training samples on classification accuracies in two data sets: (a) Baffin Bay data set and (b) Bohai Bay data set.

Figure 9. The effects of K values on the accuracies of the two data sets: (a) Baffin Bay data set and (b) Bohai Bay data set.

Figure 10. (a) Sea ice distribution map near Baffin Bay. (b) Description of the sea ice covered area.

Figure 11. (a) Sea ice distribution map near Bohai Bay. (b) Description of the sea ice covered area.

Table 1. Number of training samples (pixels) in each class for the Baffin Bay data set.

Table 2. Correlation matrix for eight textural components in the Baffin Bay data set.

Table 3. Network structure for the Baffin Bay data set.

Table 4. Classification results (%) of the Baffin Bay data set.
* The bold style represents the highest accuracy among the compared methods.

Table 5. Number of training samples (pixels) in each class for the Bohai Bay data set.

Table 6. Correlation matrix for eight textural components in the Bohai Bay data set.

Table 7. Network structure for the Bohai Bay data set.

Table 8. Classification results (%) of the Bohai Bay data set.
* The bold style represents the highest accuracy among the compared methods.

Table 9. Classification results (%) of the Baffin Bay data set with sliding windows of different sizes.

Table 10. Classification results (%) of the Bohai Bay data set with sliding windows of different sizes.

Table 11. Training time (minutes) for the two data sets for different methods.

Table 12. Description of the main parameters.

Table 13. Description of the main parameters.