Deep Learning Based Sea Ice Classiﬁcation with Gaofen-3 Fully Polarimetric SAR Data

In this paper, the performance of C-band synthetic aperture radar (SAR) Gaofen-3 (GF-3) quad-polarization Stripmap (QPS) data is assessed for classifying late spring and summer sea ice types. The investigation is based on 18 scenes of GF-3 QPS data acquired in the Arctic Ocean in 2017. In this study, floe ice (FI), brash ice (BI) between floes, and open water (OW, ice-free area) were classified using a mini sea ice residual convolutional network, which we call MSI-ResNet. While investigating the optimal patch size for MSI-ResNet, we found that as the patch size increases, the classification accuracy first increases and then decreases. A patch size of 31 × 31 was found to achieve the best performance. The classification performance of different polarization combinations from the QPS data was also assessed. The vertical-vertical (VV) polarization input overestimates the FI category, incorrectly identifying most of the BI as FI. The vertical-horizontal (VH) polarization produces a simultaneous improvement in FI, BI, and OW discrimination, with a higher overall accuracy and kappa coefficient (91.09% and 0.85, respectively) than the VV polarization (83.37% and 0.70, respectively). The combination of VV and VH polarizations yields a modest precision improvement for BI and OW, together with a slight overestimation of FI. With VV, VH, and horizontal-horizontal (HH) polarization data as the inputs, the user's accuracy improves to 95.12%, 93.42%, and 95.17% for FI, BI, and OW, respectively. The accuracy was assessed against visual interpretation of the sea ice classes in the images using a stratified sampling method. Applying the MSI-ResNet method to data covering the Beaufort Sea and the area north of the Severnaya Zemlya archipelago achieved high overall accuracies (kappa) of 94.62% (±0.92) and 94.23% (±0.90), respectively, which are similar to the classification accuracy obtained in the Fram Strait.
The results of this study show that the MSI-ResNet method performs better than the classical support vector machine (SVM) classifier for sea ice discrimination. The GF-3 QPS mode data also show more detail in discriminating scattered sea ice floes than the coincident Sentinel-1A Extra Wide (EW) swath mode data.


Introduction
Polar sea ice is a sensitive indicator of global climate change. Information about sea ice type is also important for ship navigation and climate change prediction in polar regions [1][2][3][4][5]. However, the large extent and harsh environment make most of the polar regions difficult to access and the cost of field investigation remarkably high [6]. Spaceborne remote sensing methods, particularly those using active and passive microwave instruments, have proven successful in monitoring sea ice. Long-term records of Arctic sea ice monitoring (>40 years) are now available from different operational sources, including the Canadian Ice Service (CIS), the Russian Arctic and Antarctic Research Institute (AARI), the Norwegian Ice Service (NIS), and the U.S. National Ice Center (NIC).
Synthetic aperture radar (SAR) has proven to be suitable for monitoring polar sea ice because it is independent of sunlight and atmospheric influences such as clouds and water vapor [7][8][9]. Because SAR backscatter is governed by surface roughness and subsurface physical properties, SAR can be used to distinguish different types of sea ice. Milestone SAR systems used to monitor and research Arctic sea ice include NASA's Seasat mission, the series of satellites operated by the European Space Agency (ESA) (the European Remote Sensing, ENVISAT, and Sentinel-1 systems), the Japan Aerospace Exploration Agency (JAXA) Advanced Land Observing Satellite/Phased Array type L-band Synthetic Aperture Radar (ALOS PALSAR) systems, the German Aerospace Center (DLR) TerraSAR-X systems, and the Canadian Space Agency (CSA) RADARSAT systems.
Many studies of sea ice classification using polarimetric SAR data have been conducted, because polarimetric data carry more information about the ice surface. Gill et al. [10] explored the potential of polarimetric parameters and used ground truth data to estimate sea ice classification accuracy with a maximum likelihood classifier. They found that the accuracy increased when more uncorrelated polarimetric parameters were used. Using $\sigma^0_{vv}$, entropy, and $\sigma^0_{hv}$, the accuracies for open water (OW), smooth first-year ice (SFYI), rough first-year ice (RFYI), and deformed first-year ice (DFYI) were 96.72%, 96.58%, 67.44%, and 95.58%, respectively. Moen et al. [11] used three consecutive RADARSAT-2 (RS-2) scenes to investigate the robustness of polarimetric SAR data for sea ice classification under slightly varying winter environmental conditions, using a supervised classification method with an unsupervised automatic segmentation and labeling of the scene as a reference. This study discriminated between seven sea ice types and found that scenes with similar incidence angles produced reasonable results. Ressel et al. [12] examined the performance of an automated sea ice classification algorithm based on polarimetric TerraSAR-X (TS-X) images. Using four polarimetric features, including the geometric intensity, the scattering diversity, and the surface scattering fraction, and comparing the results with in situ measurements, the study correctly identified young ice (YI), SFYI, rough first-year and multi-year ice (RFYMYI), multi-year ice (MYI), and OW. Singha et al. [13] evaluated and validated the polarimetric features of spaceborne L-band (ALOS-2), C-band (RS-2), and X-band (TS-X) quad-polarimetric SAR data for sea ice discrimination using an artificial neural network, obtaining accuracies of 100% and 96.9% for OW and all the sea ice classes, respectively.
The neural network approach has been applied to SAR sea ice classification in previous studies. Hara et al. [14] applied an unsupervised learning vector quantization (LVQ) neural network to airborne polarimetric SAR data to classify sea ice, achieving a total classification accuracy of 77.8%. Karvonen et al. [15] developed an unsupervised method based on pulse-coupled neural networks for sea ice classification in the Baltic Sea under dry snow conditions, using RADARSAT-1 ScanSAR Wide mode data. Ressel et al. [16] also developed a supervised neural network for TS-X backscatter data using gray level co-occurrence matrix (GLCM) textural features as the inputs, reporting classification accuracies of 79.4%, 89.3%, and 94.5% for OW, smooth drift ice/smooth fast ice (SDI), and moderately deformed drift ice (MDDI), respectively. Song et al. [17] designed a residual convolutional network for sea ice classification, called SI-Resnet, using the HH-polarization backscatter from Sentinel-1 SAR Extra Wide (EW) swath mode data, and reported a reasonably high overall classification accuracy and kappa coefficient of 94% and 91.9%, respectively. He et al. [18] introduced the ResNet deep learning framework to ease the training of deep networks by reformulating the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. He et al. [19] subsequently proposed ResNet V2, a refined version of ResNet, in 2016. ResNet V2 has since become one of the most effective deep learning frameworks for image detection and classification.
Gaofen-3 (GF-3) is a civilian spaceborne SAR satellite developed as part of China's High-Resolution Earth Observation System Project. GF-3 was launched on 10 August 2016 by the China Academy of Space Technology (CAST). The satellite operates in a sun-synchronous orbit at an altitude of about 755 km, with an in-orbit design life of 8 years. One of its main purposes is monitoring ocean and coastal areas [20]. Its nominal resolution ranges from 1 to 500 m, and its nominal swath width varies from 10 to 650 km. A distinctive feature of the GF-3 system is its fully polarimetric imaging capability: GF-3 can acquire fully polarimetric data in three modes, namely quad-polarization Stripmap I (QPSI), quad-polarization Stripmap II (QPSII), and wave mode. The first two modes are referred to collectively as QPS mode in this paper. More technical specifications of the sensor can be found in [21,22]. GF-3 SAR data have been used in various marine environment investigations and services, e.g., sea surface wind retrieval [23,24], sea ice detection [25], and ship detection [26,27]. The performance of GF-3 in observing intertidal flats, offshore tidal turbulent wakes, and oceanic internal waves has also been evaluated [28]. However, to date, there has been no specific investigation of the polar sea ice classification capabilities of GF-3.
The objective of this study was to investigate the performance of GF-3 full-polarization data for late spring and summer sea ice classification based on the three linear orthogonal polarization backscatter coefficients ($\sigma^0_{vv}$, $\sigma^0_{hh}$, and $\sigma^0_{vh}$) from QPS mode data. As residual neural networks have been found to be effective in image recognition [19], we adopted this approach with some adaptive modifications and developed the MSI-ResNet scheme (where 'MSI' stands for mini sea ice). This method was found to be effective in discriminating between FI, BI, and OW. The optimal patch size for the deep learning scheme was determined to obtain more precise results. The influence of different polarization combinations on sea ice classification was also systematically explored and analyzed. In addition, this paper presents a comparison between the MSI-ResNet and Support Vector Machine (SVM) [29] classification results for QPS mode data. Finally, the classification results obtained using GF-3 QPS data are compared with those obtained from near-coincident Sentinel-1A data using the same MSI-ResNet classifier.

The GF-3 QPS Mode Dataset
In this study, 18 scenes of GF-3 QPS data acquired over late spring and summer Arctic sea ice were used to evaluate sea ice classification performance based on a deep neural network approach (Section 3). Figure 1 shows the spatial distribution of these scenes in three regions. The five scenes in the Beaufort Sea, denoted as region 1 (R1), were acquired on 25 May 2017. The seven scenes located north of the Severnaya Zemlya archipelago, denoted as region 2 (R2), were acquired on 2 August 2017. The other six scenes, in the Fram Strait and denoted as region 3 (R3), were acquired on 14 and 17 June 2017. According to the temperature-related seasonal descriptors in [30] and the ERA5 2 m temperatures [31] for those three regions, the R1 and R2 scenes are in the early melt season, and the R3 scenes are in the advanced melt season. All the images are Level-1A single-look complex (SLC) products. Table 1 summarizes the information about each scene. The imaging mode was QPSI for the scenes in regions R1 and R2, and QPSII for region R3. The nominal resolution is 8 m for the R1 and R2 scenes and 25 m for the R3 scenes. The incidence angle varies between 35.35° and 43.79°. All these data were acquired with wind speeds of less than 6 m/s.
Remote Sens. 2021, 13, 1452

SAR Data Preprocessing
The preprocessing of the GF-3 SAR data included radiometric calibration, speckle reduction, normalization, and preparation of the training data. The first two steps constitute the fundamental processing requirements when using SAR data, and normalization is a prerequisite for preparing the inputs of a deep learning method.
The GF-3 radiometric calibration method is given in the user manual as follows:

$$\sigma^0_{dB} = 10\log_{10}\left[P_I\left(\frac{QV}{m}\right)^2\right] - K_{dB} \quad (1)$$

where $\sigma^0_{dB}$ is the calibrated backscatter coefficient in dB, and $P_I$ is the sum of the squares of the real and imaginary parts of the single-look complex SAR data. QV (qualified value) is the maximum digital value of the image before quantization, and $K_{dB}$ is the calibration constant; both are provided in the metadata of each scene. The term m equals 32,767 for Level-1A data. Normalization of the backscattering coefficients to a fixed incidence angle was unnecessary for each scene, because the variation of the incidence angle across the swath in this dataset is small, ranging from 1.8° to 2.2°.
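As a minimal sketch of the calibration, the formula can be written in Python with NumPy, assuming the real (I) and imaginary (Q) parts of the Level-1A SLC image are already available as arrays and that QV and $K_{dB}$ have been read from the scene metadata. The function and argument names are hypothetical and not part of any GF-3 toolkit.

```python
import numpy as np


def gf3_sigma0_db(i, q, qv, k_db, m=32767.0):
    """Calibrate GF-3 Level-1A SLC samples to sigma0 in dB.

    Hypothetical helper: i, q are the real and imaginary parts of the SLC image,
    qv is the qualified value and k_db the calibration constant (both assumed to
    be read from the scene metadata beforehand); m = 32767 for Level-1A data.
    """
    i = np.asarray(i, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p_i = i ** 2 + q ** 2  # P_I: sum of squares of real and imaginary parts
    # Pixels with P_I == 0 map to -inf dB; mask or floor them before further use.
    with np.errstate(divide="ignore"):
        return 10.0 * np.log10(p_i * (qv / m) ** 2) - k_db
```

In practice, a complex SLC array `slc` would be passed as `gf3_sigma0_db(slc.real, slc.imag, qv, k_db)`, once per polarization channel.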
A Lee filter was applied to reduce speckle noise. The Lee filter preserves the edges between ice and water with an insignificant loss of texture features. A window size of 5 × 5 pixels was used, and the calibrated GF-3 backscatter coefficients were used as inputs to the adaptive Lee filter. After calibration and speckle filtering, we rescaled the backscatter coefficients (in dB) to a digital range of 0-255 for each region. Although the rescaling was applied to each image separately, the limits were common to a region: they were set at the 1.5% and 98.5% levels of the backscatter values over all polarizations in the given region. We then combined the different polarization data ($\sigma^0_{vv}$, $\sigma^0_{vh}$, and $\sigma^0_{hh}$) into RGB format for input into the deep learning scheme. Figure 2 shows the color composite images of the R1-1, R2-6, and R3-15 scenes (the first two characters give the region designation, and the number after the hyphen is the scene ID listed in Table 1).
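The filtering and rescaling steps can be sketched as follows. This is a sketch under several assumptions: the paper does not specify which "adaptive Lee" variant was used, so the code uses the basic Lee filter in its coefficient-of-variation form, applied to linear intensity with an assumed number of looks of 1 (SLC data); the 1.5%/98.5% limits are interpreted as percentiles of all channels of a region; and the R/G/B order VV/VH/HH is assumed from the order in which the channels are listed. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lee_filter(intensity, size=5, looks=1.0):
    """Basic Lee filter for a 2-D linear-intensity image (not dB).

    Uses W = 1 - Cu^2 / Ci^2 clipped to [0, 1], where Cu^2 = 1 / looks is the
    speckle variation (looks is an assumption) and Ci^2 = var / mean^2 is the
    local variation inside a size x size window.
    """
    img = np.asarray(intensity, dtype=np.float64)
    r = size // 2
    win = sliding_window_view(np.pad(img, r, mode="reflect"), (size, size))
    mean = win.mean(axis=(-2, -1))
    var = win.var(axis=(-2, -1))
    tiny = np.finfo(np.float64).tiny
    ci2 = var / np.maximum(mean ** 2, tiny)
    w = np.clip(1.0 - (1.0 / looks) / np.maximum(ci2, tiny), 0.0, 1.0)
    # Homogeneous areas (w near 0) take the local mean; edges (w near 1) keep the pixel.
    return mean + w * (img - mean)


def rescale_region(channels_db, low_pct=1.5, high_pct=98.5):
    """Map dB images to uint8 0-255 using limits shared by all given channels.

    Assumption: the limits are percentiles of the pooled values of all channels
    (e.g., all polarizations of all scenes in one region).
    """
    pooled = np.concatenate([np.ravel(c) for c in channels_db])
    lo, hi = np.percentile(pooled, [low_pct, high_pct])
    out = []
    for c in channels_db:
        scaled = np.clip((np.asarray(c, dtype=np.float64) - lo) / (hi - lo), 0.0, 1.0)
        out.append(np.round(scaled * 255.0).astype(np.uint8))
    return out


def to_rgb(vv, vh, hh):
    """Stack rescaled channels as R = VV, G = VH, B = HH (assumed channel order)."""
    return np.dstack([vv, vh, hh])
```

Filtering in linear units and converting to dB afterwards (`10 * np.log10(lee_filter(x))`) keeps the multiplicative speckle model of the Lee filter valid; the filtered values stay positive because each output is a convex combination of the pixel and its local mean.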

Dataset for Model Training
Ground truth data are important for implementing supervised neural network classification. The sea ice type maps released by a sea ice monitoring agency such as CIS, NIC, or AARI are commonly used as training datasets in sea ice classification studies. However, CIS ice charts are not available in all the geographic areas of the present study, and although the NIC/AARI ice charts are produced weekly, they are generated at a coarse resolution. Therefore, for the data during the melt season, the training data were generated using manual visual identification of the different sea ice types in the images.
The World Meteorological Organization (WMO) has defined seven major sea ice categories based on the ice development stage [32]. However, it is impractical to identify all these categories, especially in late spring and summer scenes, when young ice types do not exist and flooded ice surfaces can mask the underlying ice type in radar images. As a result, MYI (which has distinctly high backscatter in winter) and first-year ice (FYI) have similar radar signatures in summer. Surface deformation, caused by the collision and convergence of mobile ice floes, produces high backscatter in SAR images in both co- and cross-polarization [33]. However, deformed ice may become eroded in summer or covered with wet snow, both of which reduce the backscatter. BI may also persist between ice floes, and its roughness results in relatively high backscatter. Therefore, three surface categories were considered in this study: floe ice (FI), brash ice (BI), and open water (OW). The FI category combines FYI and MYI, both of which commonly form round or elliptical floes with medium backscatter. The BI category represents the crushed ice between ice floes, and OW denotes the ice-free area, which has the lowest backscatter coefficient because of its relatively smooth surface.
To construct the machine learning model (see Section 3.1), training data and validation data were required. The training data were used to develop the model. The validation data served as auxiliary data: they were used to check the performance of the model during the training phase and to tune its parameters so as to avoid overfitting.
The training and validation datasets used in developing the model were generated as follows. First, we labeled the areas of the different surface types, i.e., FI, BI, and OW, in the RGB composite SAR scenes using LabelMe [34], an efficient open-source graphical image annotation tool. Figure 3 shows the sparsely labeled areas in the composite image of the R3-16 scene, with enlarged segments representing the three surface categories of FI, BI, and OW in blue, green, and red, respectively. The backscatter within a labeled area is not necessarily homogeneous, because a given surface type can span a range of backscatter values. The labeled areas were randomly selected but evenly distributed within the image space, and they occupy a small percentage (about 0.14%) of the entire image area.
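To turn LabelMe annotations into per-pixel class labels, one option is to rasterize each polygon into an integer label map. The sketch below assumes the annotation file has already been parsed with `json.load` and relies only on the standard LabelMe JSON fields `imageHeight`, `imageWidth`, and `shapes` (each with `label`, `points`, and `shape_type`). The label strings "FI", "BI", and "OW" and their integer IDs are hypothetical, and the even-odd ray-casting test at integer pixel coordinates may differ from LabelMe's own conversion utilities along polygon boundaries.

```python
import numpy as np

CLASS_IDS = {"FI": 0, "BI": 1, "OW": 2}  # hypothetical label names and IDs
UNLABELED = -1


def polygon_mask(points, height, width):
    """Rasterize one polygon (list of [x, y] vertices) with the even-odd rule."""
    pts = np.asarray(points, dtype=np.float64)
    yy, xx = np.mgrid[0:height, 0:width]
    xa, ya = pts[:, 0], pts[:, 1]
    xb, yb = np.roll(xa, -1), np.roll(ya, -1)  # next vertex, closing the ring
    inside = np.zeros((height, width), dtype=bool)
    for x0, y0, x1, y1 in zip(xa, ya, xb, yb):
        crosses = (y0 > yy) != (y1 > yy)  # edge spans this pixel row
        with np.errstate(divide="ignore", invalid="ignore"):
            # Horizontal edges never "cross", so their inf/nan values are ignored.
            x_cross = x0 + (yy - y0) * (x1 - x0) / (y1 - y0)
            inside ^= crosses & (xx < x_cross)
    return inside


def labelme_to_label_map(annotation, class_ids=CLASS_IDS):
    """Convert a parsed LabelMe JSON dict into an integer label map."""
    h, w = annotation["imageHeight"], annotation["imageWidth"]
    labels = np.full((h, w), UNLABELED, dtype=np.int64)
    for shape in annotation["shapes"]:
        if shape.get("shape_type", "polygon") != "polygon":
            continue  # only polygons are rasterized in this sketch
        if shape["label"] in class_ids:
            labels[polygon_mask(shape["points"], h, w)] = class_ids[shape["label"]]
    return labels
```

Pixels outside every polygon keep the value -1, which makes it easy to select only the labeled pixels when cutting training patches.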
For each pixel in the labeled area of each type, the pixel and its neighboring pixels in the composite SAR image were extracted as a patch. Each patch was considered to represent a single surface type, i.e., the type of its center pixel. Figure 4 depicts a schematic segment of an image, with the three colors representing the three surfaces of FI, BI, and OW. For instance, the black outer boundary represents the labeled area, with all the pixels inside representing the FI surface. For each pixel within the labeled area, a window is established, as shown by the dotted lines for pixels "a", "b", "c", and "d".
The window is 3 × 3 pixels in this example, and each window constitutes one patch. As the window size changes, the surface-type information contained in a given patch also changes. To determine the most appropriate information content of GF-3 QPS mode data for the algorithm, four patch sizes were tested in this study, i.e., 25 × 25, 31 × 31, 37 × 37, and 43 × 43. All the pixels within a patch constituted a training sample, which was used as the input to the deep learning network. Because a patch may contain peripheral pixels, the pixels within the same patch can belong to different surface types. As the VV, VH, and HH polarizations were considered, each image patch was a 3-D array, e.g., of size 31 × 31 × 3 for the 31 × 31 patch size. The number of generated patches equals the number of pixels in all the labeled areas. Of the generated patches, 80% were randomly selected as training data, and the other 20% were used for validation. No training data were rejected.
The numbers of training samples for the R1, R2, and R3 scenes were 207,409, 219,888, and 344,043, respectively, and the proportions of FI, BI, and OW were about the same in each scene. The composite images of the R1-1, R2-6, and R3-15 SAR scenes were used to test the performance of the final trained model for each region, respectively, and the remaining scenes in each region were used for labeling and generating the training data. The training scenes were not used for testing, to avoid an overly optimistic assessment caused by possible overfitting.
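The patch generation and the random 80/20 split can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the label map is assumed to mark unlabeled pixels with -1, and mirror padding at the image border is an assumption (the text does not say how edge pixels were handled).

```python
import numpy as np


def extract_patches(image, labels, patch=31, unlabeled=-1):
    """Cut one patch per labeled pixel, centered on that pixel.

    image  : (H, W, C) composite (e.g., C = 3 for VV, VH, HH)
    labels : (H, W) integer map; `unlabeled` marks pixels outside labeled areas
    Returns patches of shape (N, patch, patch, C) and the N center-pixel labels.
    """
    r = patch // 2
    # Assumption: mirror padding, so labeled pixels near the border still
    # receive a full-size patch.
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    rows, cols = np.nonzero(labels != unlabeled)
    # Pixel (y, x) sits at (y + r, x + r) in the padded image, so this slice is centered on it.
    patches = np.stack([padded[y:y + patch, x:x + patch] for y, x in zip(rows, cols)])
    return patches, labels[rows, cols]


def split_train_val(n_samples, train_frac=0.8, seed=0):
    """Randomly split sample indices into training (80%) and validation (20%)."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    return idx[:n_train], idx[n_train:]
```

Because every labeled pixel yields one patch, the sample count equals the number of labeled pixels. At several hundred thousand 31 × 31 × 3 uint8 patches per region, the stacked array reaches roughly a gigabyte, so generating patches lazily in batches may be preferable in practice.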

Methodology
In this study, we constructed a deep neural network structure called MSI-ResNet for classifying the three surface types of FI, BI, and OW in GF-3 SAR QPS data. The structure is a shrunken and modified version of ResNet V2 [19], adapted to classify a small number of categories with high-resolution SAR imagery as the input data. In machine learning, the patch size and the choice of inputs influence the classification results. Based on MSI-ResNet, the effect of the patch size and the classification performance of different polarization combinations of GF-3 QPS mode data were explored. To assess the classification results, a stratified random sampling method was used to compare them with the visual classification of the surface types in the SAR images. To further assess the MSI-ResNet classification results, they were also compared with the results obtained using the SVM classifier. The MSI-ResNet structure and the stratified random sampling method are presented in Sections 3.1 and 3.2, respectively. The images with IDs 1, 6, and 15 (Figure 1 and Table 1) were selected for accuracy evaluation in each region, and the other data in the same region were used for training.

Structure of MSI-ResNet
A neural network is able to establish the intrinsic connection between input-target pairs when they are well associated [35]. A deep learning network, which is also known as a deep neural network, consists of an input layer, hidden layers, and an output layer. The hidden layers include convolutional layers, pooling layers, and fully connected layers. A convolutional layer (conv) functions as a feature extractor by convolving with the input data, generally using multiple kernels of a specific size. The convolved features are then nonlinearized by an activation layer to produce the feature maps. A pooling layer compresses the feature map to reduce its redundancy and converts the output to a vector during the last pooling process. All the learned features of the previous layers are combined by the fully connected (fc) layer to determine the desired patterns.
Deep learning models have been widely used in image classification. Among them, deep residual neural network models, and especially the ResNet V2 model, mitigate the problems of exploding and vanishing gradients. Therefore, according to the characteristics of SAR remote sensing imagery and the principle of the ResNet V2 model, we designed a lightweight, pixel-based deep residual neural network model for sea ice classification, i.e., MSI-ResNet (Figure 5a). The model shortens the training time while improving the training efficiency and classification accuracy.
The MSI-ResNet model is structured in 10 layers, as shown in Figure 5a, with input images of size 3 × 31 × 31. The first convolutional layer has a kernel size of 5 × 5, a stride of 2, and 32 convolution kernels. It has the largest kernels in the model, which capture features over a large neighborhood and help suppress noise. The resulting feature maps are processed by max pooling with a stride of 1. Four residual blocks follow, each consisting of two convolutional layers with a kernel size of 3 × 3; the input of each block is connected to its output, as shown by the arrowed curves. The number of kernels generally increases as the network becomes deeper, so that it can learn more features of the inputs: each convolutional layer in the first two residual blocks has 32 kernels, and those in the last two blocks have 64 kernels. The stride of the third block is 2, which downsamples the feature maps as the channel number doubles, while all the other blocks keep the default stride of 1. This leads to a dimension mismatch in the third residual block, whose input and output dimensions are 32 × 14 × 14 and 64 × 7 × 7, respectively. This connection is represented by the dotted curve in Figure 5a, while the other three solid curves denote connections with consistent dimensions. The specific structures of these two types of residual blocks are shown in Figure 5b,c, respectively. After the residual blocks, average pooling reduces the feature maps to a 64 × 1 × 1 vector, which greatly reduces the computational load. The last layer is a fully connected layer that outputs the probability that the center pixel of the input image belongs to each surface type.
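The layer sizes can be checked with a short shape trace. The kernel sizes, strides, and channel counts below come from the architecture description; the padding choices ("valid" for the first convolution, "same" elsewhere) and the 3 × 3 max-pooling window are assumptions, since they are not stated. Under these assumptions, a 31 × 31 input is reduced to 14 × 14 by the first convolution and to 7 × 7 by the stride-2 third block.

```python
def conv_out(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    if padding == "valid":
        return (n - kernel) // stride + 1
    return -(-n // stride)  # "same" padding: ceil(n / stride)


# (name, output channels, kernel, stride, padding). Padding choices and the
# 3 x 3 max-pool window are assumptions; kernels, strides, channels follow the text.
MSI_RESNET_LAYERS = [
    ("conv1", 32, 5, 2, "valid"),
    ("maxpool", 32, 3, 1, "same"),
    ("block1", 32, 3, 1, "same"),
    ("block2", 32, 3, 1, "same"),
    ("block3", 64, 3, 2, "same"),  # stride 2: the projection shortcut is needed here
    ("block4", 64, 3, 1, "same"),
]


def trace_shapes(size=31, channels=3, n_classes=3):
    """Return (layer name, channels, spatial size) after each stage."""
    shapes = [("input", channels, size)]
    for name, out_ch, k, s, pad in MSI_RESNET_LAYERS:
        size = conv_out(size, k, s, pad)
        shapes.append((name, out_ch, size))
    shapes.append(("global_avg_pool", shapes[-1][1], 1))
    shapes.append(("fc", n_classes, 1))
    return shapes


for name, c, s in trace_shapes():
    print(f"{name:16s} {c:3d} x {s:2d} x {s:2d}")
```

Counting the weight layers gives 1 (first convolution) + 4 × 2 (residual blocks) + 1 (fully connected) = 10, matching the stated depth.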
As mentioned above, there are two types of blocks in the MSI-ResNet model, as shown in Figure 5b,c. Each block consists of three kinds of layers: a batch normalization (BN) layer, a rectified linear unit (ReLU) activation layer, and a weight layer (the convolution parameters). In addition to the shortcut connection of the residual block itself, BN is applied in each residual block to mitigate vanishing and exploding gradients, which effectively improves the training efficiency. Suppose that the input of the l-th residual block is $x_l$ and its output is $x_{l+1}$. In most cases (Figure 5b), $x_l$ passes through two convolution operations with weights $W_l$ and a stride of 1 within the residual block, and the residual $F(x_l, W_l)$ plus $x_l$ gives $x_{l+1}$:
$$x_{l+1} = x_l + F(x_l, W_l) \qquad (2)$$
The dimensions of $x_l$ and $F$ must be equal in Equation (2) for the addition to be possible. In the other case (Figure 5c), which corresponds to the third residual block in Figure 5a, both the number of channels and the size of each channel change, so the input and output dimensions differ, as mentioned above. To perform the addition of $x_l$ and $F(x_l, W_l)$, a convolution operator $W_s$ is applied to $x_l$ so that $W_s x_l$ and $F(x_l, W_l)$ have the same number of channels and size:
$$x_{l+1} = W_s x_l + F(x_l, W_l) \qquad (3)$$
We set the learning rate to 0.0001, the weight decay to 0.0001, the BN decay to 0.997, and the batch norm scale to $10^{-5}$. The classifier is the SoftMax function.
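The following NumPy sketch illustrates the two block types of Figure 5b,c in the pre-activation order (BN, then ReLU, then convolution) described above. It is an illustrative reimplementation, not the authors' code: batch normalization is simplified to per-sample, per-channel normalization without learned scale and shift, the convolution is a naive same-padded loop, and the projection $W_s$ is assumed to be a 1 × 1 convolution with stride 2. Following the text, $W_s$ acts on $x_l$ directly; many ResNet V2 implementations instead apply the projection to the pre-activated input.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    # Simplified BN for a single sample: normalize each channel over its
    # spatial positions with unit scale and zero shift (no running averages).
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def conv2d(x, w, stride=1):
    """'Same'-padded 2-D convolution (cross-correlation, as in deep learning).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h_out = -(-x.shape[1] // stride)  # ceiling division
    w_out = -(-x.shape[2] // stride)
    out = np.zeros((w.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            window = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.tensordot(w, window, axes=([1, 2, 3], [0, 1, 2]))
    return out


def residual_block(x, w1, w2, stride=1, w_s=None):
    """Pre-activation block: (BN -> ReLU -> conv) twice, plus a shortcut.

    w_s=None gives the identity shortcut x_l + F(x_l, W_l); a 1 x 1
    projection w_s with the same stride gives W_s x_l + F(x_l, W_l).
    """
    f = conv2d(relu(batch_norm(x)), w1, stride)
    f = conv2d(relu(batch_norm(f)), w2, 1)
    shortcut = x if w_s is None else conv2d(x, w_s, stride)
    return shortcut + f


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 14, 14))
# Identity block (Figure 5b): 32 x 14 x 14 -> 32 x 14 x 14.
y_id = residual_block(x, 0.05 * rng.standard_normal((32, 32, 3, 3)),
                      0.05 * rng.standard_normal((32, 32, 3, 3)))
# Projection block (Figure 5c): 32 x 14 x 14 -> 64 x 7 x 7.
y_proj = residual_block(x, 0.05 * rng.standard_normal((64, 32, 3, 3)),
                        0.05 * rng.standard_normal((64, 64, 3, 3)),
                        stride=2, w_s=0.05 * rng.standard_normal((64, 32, 1, 1)))
print(y_id.shape, y_proj.shape)
```

With all convolution weights set to zero, the residual $F$ vanishes and the identity block returns its input unchanged, which is the behavior the shortcut is meant to guarantee.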
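The loss function is the SoftMax cross-entropy cost function, which is defined as:
$$L = -\frac{1}{n}\sum_{i=1}^{n} y_i \log h(x_i)$$
where $h(x_i)$ is the predicted output, $y_i$ is the expected output, n is the total number of samples, and $x_i$ is an input vector. For several classes, $y_i$ and $h(x_i)$ are vectors over the classes and the product $y_i \log h(x_i)$ is summed over them.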
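A minimal NumPy sketch of this loss, assuming integer class labels (equivalent to one-hot $y_i$) and computing $\log h(x_i)$ through a numerically stable log-softmax; the weight decay term mentioned above is not included:

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over n samples.

    logits: (n, K) raw network outputs; labels: (n,) integer class ids,
    i.e. the positions of the 1 in the one-hot expected outputs y_i.
    """
    z = logits - logits.max(axis=1, keepdims=True)            # shift for stability
    log_h = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log of softmax h(x_i)
    n = logits.shape[0]
    # y_i . log h(x_i) picks out the log-probability of the true class.
    return -log_h[np.arange(n), labels].mean()


# Uniform predictions over the three classes (FI, BI, OW) give a loss of ln 3.
loss = softmax_cross_entropy(np.zeros((4, 3)), np.array([0, 1, 2, 0]))
print(loss)
```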

The Stratified Random Sampling Assessment Method
Because SAR scenes cover wide areas at high resolution, it is difficult to obtain accurate reference data by field measurement or manual annotation. The classification accuracy is therefore usually evaluated by sampling and constructing an error matrix. Stratification is a common sampling technique when the imagery contains distinct subdivisions. When the strata have already been constructed and a random sample is drawn within each stratum, the whole procedure is called stratified random sampling. The stratified random sampling method allows each stratum to have a different expected classification accuracy [36]. As a result, stratified random sampling has been applied in many accuracy assessments of remote sensing image classification [37,38]. A convenient form of the sample-size formula for stratified random sampling with any allocation and continuous data is [36]:
$$n = \frac{\sum_h W_h^2 S_h^2 / w_h}{V + \frac{1}{N}\sum_h W_h S_h^2}$$
where n is the total number of samples, the subscript h denotes the stratum, $w_h = n_h / n$ is the fraction of the samples allocated to stratum h, $W_h = N_h / N$ is the stratum weight, $N_h$ is the number of units in stratum h, and N is the total number of units. In our study, N is the total number of pixels in the validation image. $S_h^2$ is the unbiased estimate of the true variance of stratum h (computed with the divisor $N_h - 1$), which for the correct/incorrect outcome of each sampled pixel is approximately $S_h^2 = U_h(1 - U_h)$, where $U_h$ is the expected user's accuracy of stratum h. We set the expected user's accuracies of FI, BI, and OW to 0.7, 0.9, and 0.95, respectively, based on previous research [10,16,39]. V is the desired variance of the estimates, which was set to 0.01 in this study.

Experiments with the Patch Size
Figure 6 depicts the sea ice classification results for the R3-15 scene obtained by MSI-ResNet with patch sizes from 25 × 25 to 43 × 43 in steps of 6, using the VV, VH, and HH polarization combination as input. Blue, green, and red denote FI, BI, and OW, respectively. The classification accuracy was assessed against visual interpretation of the imagery. Random samples were drawn from each ice type while maintaining the proportion of samples in each class. The confusion matrix is shown in Table 2.
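The stratified sample-size calculation described in this section can be sketched in Python as follows. It is an illustration under stated assumptions: it uses the standard Cochran form for an arbitrary allocation, with the allocation fractions $w_h = n_h/n$ defaulting to proportional allocation ($w_h = W_h$); it approximates each stratum variance as $S_h^2 = U_h(1 - U_h)$; and the pixel counts in the example are hypothetical rather than taken from the R3-15 scene.

```python
import math


def stratified_sample_size(stratum_sizes, expected_ua, V, allocation=None):
    """Total sample size n for stratified random sampling (Cochran-type form).

    stratum_sizes: N_h, number of pixels mapped to each class (stratum).
    expected_ua:   U_h, expected user's accuracy of each stratum.
    V:             desired variance of the accuracy estimate.
    allocation:    w_h = n_h / n; defaults to proportional allocation (w_h = W_h).
    """
    N = sum(stratum_sizes)
    W = [N_h / N for N_h in stratum_sizes]          # stratum weights W_h
    S2 = [u * (1.0 - u) for u in expected_ua]       # assumed S_h^2 = U_h (1 - U_h)
    w = W if allocation is None else allocation
    numerator = sum(Wh ** 2 * s2 / wh for Wh, s2, wh in zip(W, S2, w))
    denominator = V + sum(Wh * s2 for Wh, s2 in zip(W, S2)) / N
    n = numerator / denominator
    # Round each stratum's share up so that no stratum is under-sampled.
    per_stratum = [math.ceil(wh * n) for wh in w]
    return n, per_stratum


# Hypothetical pixel counts for FI, BI, and OW; U_h and V as in the text.
n, per_stratum = stratified_sample_size([500_000, 300_000, 200_000],
                                        [0.7, 0.9, 0.95], V=0.01)
print(round(n, 2), per_stratum)
```

Because V enters the denominator directly, the resulting sample size is very sensitive to the chosen variance target.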
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 22

The user's accuracy is the number of correctly classified pixels in a category divided by the total number of pixels classified into that category. The producer's accuracy is the number of correctly classified pixels of a category divided by the number of reference pixels of that category [40,41]. The overall accuracy is the proportion of all sampled pixels that are correctly classified. The kappa coefficient corrects this agreement for the agreement expected by chance, so it can be used to evaluate the consistency between the model predictions and the reference classification; a higher kappa value indicates higher consistency. Table 2 shows that the overall accuracy and kappa coefficient reach their maximum values (94.67% and 0.91, respectively) with a patch size of 31 × 31, and their minimum values (89.53% and 0.83, respectively) with a patch size of 43 × 43. This suggests that overly large patches may add noise that hinders model learning and hence reduces the classification accuracy. We therefore recommend determining the optimal patch size to maximize the classification accuracy.
Across all the examined patch sizes, the average user's (producer's) accuracies for FI, BI, and OW are 95.37% (93.17%), 87.79% (84.43%), and 87.33% (95.47%), respectively. Over the range of patch sizes in Table 2, the variation range of the user's (producer's) accuracy is 1.01% (5.35%), 8.83% (12.93%), and 17.54% (6.89%) for FI, BI, and OW, respectively, and the corresponding variance is 0.25 (5.76), 16.04 (43.17), and 52.57 (10.53), respectively. FI shows the highest classification accuracy, which remains relatively steady as the patch size changes. The accuracy of BI is more variable than that of FI, and BI has the largest variance of the producer's accuracy. OW is the most sensitive to the patch size, with the highest variance of the user's accuracy.
OW is overestimated at all patch sizes, since its user's accuracy is always lower than its producer's accuracy. In terms of the resulting fractions of the three surface types (Table 2), the OW fraction increases from 18.8% to 23.24% as the patch size increases, in contrast to the fluctuating fractions of FI and BI. The patch size of 31 × 31 was used in the subsequent experiments.

Experiments with Polarization Data Combination
The classification results for the R3-15 scene obtained by MSI-ResNet with different polarization combinations and a patch size of 31 × 31 are presented in Figure 7, and the corresponding confusion matrix is shown in Table 3. Note that Figure 7d is the same as Figure 6c. In these experiments, VV was the only copolarization tested on its own. In addition, because the VH and HV polarizations are physically reciprocal, only the VH cross-polarization was considered. Table 3 shows that the overall accuracy and kappa coefficient increase as polarizations are added, and the combination of all three polarizations gives the maximum accuracy and kappa coefficient. Its overall accuracy is 11.3%, 3.58%, and 3.55% higher than that obtained with VV only, VH only, and VV + VH, respectively. The worst discrimination is obtained with the single VV polarization as the input. The VH polarization gives an overall accuracy and kappa similar to those of the VV + VH combination, at around 91% and 0.85, respectively, which is an improvement of about 7.7% and 0.15 over the use of VV polarization only.
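Because MSI-ResNet labels the center pixel of each input patch, each polarization experiment amounts to stacking a different subset of channels before cutting the 31 × 31 patch. The sketch below illustrates this; the dictionary-based channel layout, the random stand-in scene, and the reflect padding at image edges are assumptions for illustration, not details given in the text.

```python
import numpy as np

# Channel subsets used in the polarization experiments (Table 3).
COMBINATIONS = {
    "VV": ("VV",),
    "VH": ("VH",),
    "VV + VH": ("VV", "VH"),
    "VV + VH + HH": ("VV", "VH", "HH"),
}


def extract_patch(channels, combo, row, col, size=31):
    """Stack the selected channels and cut the size x size patch centred on
    pixel (row, col); the network predicts the class of that centre pixel.

    channels: dict mapping polarization names to 2-D arrays of equal shape.
    Image edges are handled by reflect padding (an assumption).
    """
    half = size // 2
    stack = np.stack([channels[p] for p in combo])    # (C, H, W)
    padded = np.pad(stack, ((0, 0), (half, half), (half, half)), mode="reflect")
    # Pixel (row, col) lies at (row + half, col + half) in the padded stack.
    return padded[:, row:row + size, col:col + size]


rng = np.random.default_rng(0)
scene = {p: rng.random((64, 64)) for p in ("VV", "VH", "HH")}  # stand-in data
for name, combo in COMBINATIONS.items():
    print(name, extract_patch(scene, combo, 0, 10).shape)
```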
When the VV, VH, VV + VH, and VV + VH + HH polarization combinations are considered together, the average user's (producer's) accuracies for FI, BI, and OW are 89.51% (94.77%), 91.83% (81.13%), and 88.65% (87.55%), respectively. These averages also indicate the approximate order in which each surface type can be identified. With the single VV polarization, BI achieves the highest user's accuracy (94.63%) but the lowest producer's accuracy (67.46%). Most of the BI is misclassified as FI, which is therefore overestimated, as shown in Figure 7a and Table 3. The VH polarization gives similar user's and producer's accuracies for every ice type, as well as for OW, and a higher overall accuracy than the VV copolarization, especially for FI and OW discrimination. In short, when dual-polarization or multipolarization data are used for sea ice classification, the proportion of pixels misclassified as FI is greatly reduced, which also improves the classification accuracy of BI and OW.
Almost all the surface types achieve their maximum user's and producer's accuracies with the three polarizations as input, except for the producer's accuracy of BI, which is highest with the VV and VH combination. Furthermore, all the surface types show their minimum accuracy with the single copolarization (VV) input. The classification accuracy with the VV and VH combination is slightly higher than that with VH only, and much higher than that with VV only. This confirms that dual- or quad-polarization data can improve the sea ice classification accuracy compared to single-polarization data. The improvements in overall accuracy and kappa from dual- to quad-polarization input are 3.55% and 0.06, respectively.
Figure 8 shows box-whisker plots of the backscatter coefficients of the three surface types in all the images of the R3 region, based on the classification results obtained by MSI-ResNet with the VV, VH, and HH polarization combination as input and a patch size of 31 × 31. The circles denote the median values. The upper and lower boundaries of each solid rectangle are the upper quartile (Q3) and the lower quartile (Q1), respectively, and Q3 − Q1 is the interquartile range (IQR). The upper and lower extremes of each box-whisker plot are Q3 + 1.5 × IQR and Q1 − 1.5 × IQR, respectively. The lower limits of the box-whisker plots in Figure 8 suggest that the noise equivalent sigma zero (NESZ) values of the VV, VH, and HH polarizations are approximately −33, −45, and −33 dB (at incidence angles of about 40-42°), which are comparable to the values reported in previous GF-3 research [23,24].
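The box-plot statistics behind Figure 8 can be reproduced with the following sketch, which converts linear backscatter to decibels and applies the standard Tukey limits (Q1 − 1.5 × IQR and Q3 + 1.5 × IQR). The function name and the toy input are illustrative assumptions; in this reading, the lower limit computed for each class is the quantity interpreted in the text as an empirical NESZ estimate.

```python
import numpy as np


def box_whisker_stats(sigma0_linear):
    """Median, quartiles, and Tukey whisker limits of backscatter in dB."""
    db = 10.0 * np.log10(np.asarray(sigma0_linear, dtype=float))
    q1, median, q3 = np.percentile(db, [25, 50, 75])
    iqr = q3 - q1
    return {"median": median, "q1": q1, "q3": q3,
            "lower": q1 - 1.5 * iqr, "upper": q3 + 1.5 * iqr}


# Toy input: linear sigma0 values corresponding to 1, 2, ..., 8 dB.
stats = box_whisker_stats(10 ** (np.arange(1, 9) / 10))
print({k: round(float(v), 3) for k, v in stats.items()})
```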
The median backscatter coefficients of FI, BI, and OW in Figure 8 are closer together in the VV polarization than in the VH polarization, which is consistent with the better classification performance of the VH polarization. The three types are also better separated in the HH polarization than in the VV polarization, which helps explain the further improvement in overall accuracy when the three polarizations are used together.

Application and Comparison
Classification of the R1-1 and R2-6 Scene Images
To further investigate the stability of the sea ice classification performance of GF-3, two more sea ice classification experiments based on MSI-ResNet were conducted using the R1-1 and R2-6 scene data. The results are shown in Figure 9 and Table 4. In each experiment, the VV, VH, and HH polarization combination was used as input with a patch size of 31 × 31.
The R1-1 image of the Beaufort Sea in late spring (Figure 2) contains many scattered large ice floes with rough surfaces, scattered ice debris, brash ice, and an extensive area of open water with a visibly wave-roughened surface in the northwest of the scene. In contrast, the R2-6 image (north of the Severnaya Zemlya archipelago in midsummer) contains many small ice floes surrounded by crushed ice. The overall accuracies (kappa) for these two areas are 94.62% (0.92) and 94.23% (0.90), respectively, which are as high as the result obtained for R3-15 with MSI-ResNet (Table 2). In both regions, FI shows the best classification results in terms of both the user's and producer's accuracies. Unlike in the R3-15 scene, the user's accuracy for OW is much higher than its producer's accuracy in these two cases, which indicates that OW is slightly underestimated. Moreover, BI is overestimated in the R1-1 scene, with a relatively low user's accuracy.

Comparison with the SVM Classifier
The classification results for the R3-15 scene obtained using MSI-ResNet (Table 3) are compared in Table 4 with the results achieved using the LibSVM classifier [29], with calibrated, filtered, and scaled VV, VH, and HH backscattering coefficients as input. The LibSVM parameters used in this study were 8, 17, and 31 × 31 for the displacement, quantization, and region size, respectively, based on previous studies of sea ice classification of SAR data [42]. The radial basis function was chosen as the kernel for LibSVM. The training data consisted of 4000 FI pixels, 4000 BI pixels, and 4000 OW pixels. The classification results are displayed in Figure 10 and Table 4. The overall accuracy and kappa coefficient of the LibSVM classifier are 5.63% and 0.1 lower, respectively, than those obtained with MSI-ResNet. In addition, FI is overestimated by the LibSVM method. Improving the accuracy of FI detection, for example by testing the sensitivity of the results to the displacement, quantization, region size, and number of training samples, may be the main direction for future optimization of this method.
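The displacement, quantization, and region size parameters suggest grey-level co-occurrence matrix (GLCM) texture features. Assuming that interpretation, the NumPy sketch below computes a few common GLCM statistics for one 31 × 31 region with 17 grey levels and a displacement of 8 pixels; the choice of statistics and the single horizontal direction are assumptions, since the text does not list the features passed to LibSVM.

```python
import numpy as np


def glcm_features(region, levels=17, displacement=8, offset=(0, 1)):
    """Texture statistics from a symmetric, normalized grey-level
    co-occurrence matrix (GLCM) of one region, e.g. a 31 x 31 window.

    Assumption: 'quantization' is the number of grey levels and
    'displacement' the pixel distance of each pair along `offset` (row, col).
    """
    dr, dc = offset[0] * displacement, offset[1] * displacement
    assert dr >= 0 and dc >= 0, "this sketch only handles non-negative offsets"
    r = np.asarray(region, dtype=float)
    lo, hi = r.min(), r.max()
    if hi > lo:  # linear quantization into `levels` grey levels
        q = np.rint((r - lo) / (hi - lo) * (levels - 1)).astype(int)
    else:        # constant region: a single grey level
        q = np.zeros(r.shape, dtype=int)
    h, w = q.shape
    a = q[:h - dr, :w - dc]   # reference pixels
    b = q[dr:, dc:]           # neighbours at the given displacement
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a.ravel(), b.ravel()), 1.0)
    glcm = glcm + glcm.T      # count each pair in both directions
    glcm /= glcm.sum()
    i, j = np.indices(glcm.shape)
    nz = glcm[glcm > 0]
    return {
        "contrast": float(np.sum(glcm * (i - j) ** 2)),
        "homogeneity": float(np.sum(glcm / (1.0 + (i - j) ** 2))),
        "energy": float(np.sum(glcm ** 2)),
        "entropy": float(-np.sum(nz * np.log(nz))),
    }


# Toy region: a horizontal intensity ramp.
print(glcm_features(np.tile(np.arange(31.0), (31, 1))))
```

In such a pipeline, the features of each channel would be concatenated per pixel and passed to an RBF-kernel SVM, for example LIBSVM itself or scikit-learn's `SVC`, which is built on LIBSVM.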


Comparison with Sentinel-1 SAR Classification
For a comparison with the data of another SAR sensor, and to explore the applicability of the MSI-ResNet method, a scene of near-coincident Sentinel-1A (S1A) EW swath mode data covering the GF-3 R3-15 to R3-18 scenes was processed using the MSI-ResNet classifier. The S1A scene was acquired on 17 June 2017 at 08:17 UTC, 41 min after the GF-3 acquisition. The nominal coverage is 400 × 400 km and the pixel spacing is 40 × 40 m. Although its spatial resolution is coarser than that of the GF-3 QPS mode data, the EW mode was adopted for the comparison because it is the only S1A mode that covers the 18 GF-3 scenes used in this study; S1A Interferometric Wide mode data, with a higher resolution of 10 m, are not available over our study region. Figure 11a shows the geographic location and coverage of the S1A scene (blue rectangle). The part of the S1A scene coinciding with the R3-15 scene (the black box in Figure 11a) was classified using the MSI-ResNet classifier, while the parts coinciding with scenes R3-16, R3-17, and R3-18 (the red box in Figure 11a) were used for training.
The S1A false-color image of scene R3-15 is shown in Figure 11b. Despite the differences in the number of polarization channels and pixel spacing, the composite images of GF-3 and S1A over the same R3-15 area are visually very similar. Both polarization channels (HH and HV) of S1A were radiometrically calibrated, Lee filtered, and scaled before training. The improved denoising method proposed in [43] was applied to remove the additive and residual noise of the HV polarization. The variation of the incidence angle across the extracted S1A subimage was small and was therefore ignored. A patch size of 7 × 7 gave a better accuracy than 15 × 15 or 31 × 31 (results not shown) and was adopted for the S1A classification experiment. The stratified random sampling method was also used to assess the accuracy of the S1A classification.
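The Lee filtering step can be sketched as follows, using the classic Lee filter with local statistics from a box window. The window size and the number of looks are hypothetical parameters, because the text does not give the filter settings, and the sketch omits the HV noise removal of [43].

```python
import numpy as np


def box_mean(img, win):
    """Mean over a win x win window (odd win), with reflect padding at edges."""
    half = win // 2
    p = np.pad(img, half, mode="reflect")
    c = np.pad(p.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))  # integral image
    s = c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]
    return s / (win * win)


def lee_filter(intensity, win=7, looks=1.0):
    """Classic Lee speckle filter for SAR intensity in linear units.

    Assumes fully developed multiplicative speckle with `looks` equivalent
    looks, so the squared speckle coefficient of variation is 1 / looks.
    """
    img = np.asarray(intensity, dtype=float)
    mean = box_mean(img, win)
    var = np.maximum(box_mean(img ** 2, win) - mean ** 2, 0.0)
    cu2 = 1.0 / looks                                  # speckle variation
    ci2 = var / np.maximum(mean ** 2, 1e-30)           # local image variation
    weight = np.clip(1.0 - cu2 / np.maximum(ci2, 1e-30), 0.0, 1.0)
    # Homogeneous areas (ci2 <= cu2) get the local mean; edges keep more detail.
    return mean + weight * (img - mean)


rng = np.random.default_rng(1)
speckled = rng.gamma(shape=4.0, scale=0.25, size=(64, 64))  # 4-look speckle, mean 1
filtered = lee_filter(speckled, win=7, looks=4)
print(speckled.var(), filtered.var())
```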
The S1A classification in Figure 11c is compared with the GF-3 R3-15 classification in Figure 11d, which was obtained by MSI-ResNet with dual-polarization (HH and HV) GF-3 data as input. Qualitatively, the results are similar. The quantitative confusion matrix is presented in Table 5. Applied to the S1A data, MSI-ResNet discriminates large ice floes and OW well, with user's accuracies of 89.16% and 87.82%, respectively, which are comparable with the GF-3 results obtained using dual-polarization inputs (Figure 11d). Nevertheless, compared with Figure 11d, it is weaker at identifying the scattered tiny ice floes surrounded by BI, which causes the overestimation of BI. This may be attributed to the similar backscatter coefficients in the mixed-ice region, the labeling of the training data, or the choice of patch size. In general, with the MSI-ResNet method, the GF-3 QPS mode data capture more detail in sea ice classification, especially for the scattered ice floes, than the S1A EW mode data.

Discussion
In this study, we used 18 scenes acquired by the polarimetric SAR sensor onboard the Chinese GF-3 satellite to classify sea ice in the Arctic Ocean in late spring and summer, assuming that all the scenes contain the same surface types, namely, floe ice, brash ice, and open water. The warm summer weather causes flooding and snowmelt on the ice floe surfaces, which lowers their backscatter. It also melts thin first-year ice (FYI), which expands the open water area, increases the mobility of the ice floes, and leads to the formation of more brash ice. The study was limited in data volume and spatial coverage, using only the 18 available scenes acquired over four days in three regions. The wind speed during the data collection did not exceed 6 m/s, and the incidence angle range across the 30-km swath of each scene was less than 3°. We believe that this study is the first attempt to investigate the performance of GF-3 data in sea ice classification; it was therefore necessary to compare the results with those of other sensors used in previous studies.
Areas of low backscatter intensity, such as flooded or melting snow-covered sea ice surfaces, are often contaminated by system noise. The NESZ is a measure of the sensitivity of a SAR system to areas of low backscatter [44]. Low-backscatter areas, especially under low wind, large incidence angles, and cross-polarization, can be observed well by SAR systems with low NESZ values [45]. The NESZ values for the GF-3 QPS mode estimated empirically from Figure 8 (at wind speeds below 6 m/s and an incidence angle of about 40°) are very low. They are comparable with the NESZ values of other C-band SAR sensors at the same incidence angle, e.g., −33 dB (HH, VV) and −34 dB (HV, VH) for Radarsat-2 fine quad mode data [46], and lower than the −22 dB of S1A EW mode data [45,47]. Notably, the low NESZ of the cross-polarization channel indicates the good capability of GF-3 quad-polarization data for polar sea ice monitoring.
The backscatter coefficient is an essential parameter for surface classification in SAR images. It arises from different scattering mechanisms [48] and is affected by the surface properties, sensor parameters, and viewing geometry. Backscatter is measured in terms of its intensity, phase, and polarization. Polarization is particularly important for discriminating ice from the surrounding open water, because the ice surface may depolarize the backscatter if it becomes deformed or very rough, whereas the water surface does not depolarize the backscatter, however wind-roughened it is [33]. Cross-polarization observations have been recognized as a good tool for discriminating sea ice from open water [49][50][51], and their reduced sensitivity to changes in incidence angle makes them more suitable for sea ice classification [45,52]. This is consistent with the improvement in overall accuracy obtained with $\sigma^0_{vh}$ in this study (Table 3). Ocean clutter is more suppressed in $\sigma^0_{hh}$ than in $\sigma^0_{vv}$, which makes the former better for ice-water discrimination [53]. This is also reflected in the results for $\sigma^0_{vv}+\sigma^0_{vh}$ (Table 3) and $\sigma^0_{hh}+\sigma^0_{hv}$ (Table 5). Some SAR ice classification studies have addressed summer ice types, although the signatures of the ice types overlap as they become covered with wet snow or flooded. Park et al. [54] applied a newly proposed semiautomated SAR-based sea ice classification scheme to S1A EW data to classify three summer ice types in the Fram Strait region, with an overall accuracy of 68% and an ice-water discrimination accuracy of 98%; the latter is comparable with the accuracy of about 90% obtained by Zhang et al. [55] using a conditional random field model based on a mixture of statistical distributions. Singha et al. [56] studied the influence of melting on sea ice classification with an artificial neural network applied to ALOS-2 PALSAR data, and recommended independent training for different seasons.
Using NASA's airborne AIRSAR system in March over the Beaufort Sea, the study in [57] showed that C-band fully polarimetric data can achieve improvements of 9% and 7% in sea ice classification over single-polarization ($\sigma^0_{vv}$) and dual-polarization ($\sigma^0_{vv}+\sigma^0_{hh}$) data, respectively, using a maximum a posteriori classifier. The present study showed that, by using the full-polarization parameters ($\sigma^0_{vv}+\sigma^0_{vh}+\sigma^0_{hh}$) within a machine learning scheme, the classification accuracy can be improved by 11.3% over that obtained with a single polarization ($\sigma^0_{vv}$) (Table 3). In addition to the direct use of backscatter intensity, some studies have combined other information, such as the autocorrelation and the cross- and copolarization ratios and differences, to improve performance [58][59][60]. Other studies have used polarimetric parameters based on decompositions of the coherency or covariance matrices derived from the elements of the scattering matrix. These include the eigendecomposition of the coherency matrix and the resulting entropy/anisotropy/alpha-angle parameters [10,11,13,61]. These parameters are indicators of the three main scattering mechanisms, i.e., surface, volume, and double-bounce scattering. In the present study, we did not use such parameters, in order to focus on testing machine learning with the more traditional backscatter parameters, i.e., $\sigma^0_{hh}$, $\sigma^0_{vv}$, and $\sigma^0_{vh}$. The sea ice classification accuracies obtained with machine learning for other SAR sensors are described below for comparison.
Zakhvatkina et al. [62] used the average backscatter value and eight texture features from ENVISAT ASAR wide-swath mode data in a three-layer neural network, and obtained an overall accuracy of 80% for four winter ice types. Liu et al. [63] used the ice concentration and selected texture features in the second iteration of an SVM classifier applied to RADARSAT-2 ScanSAR mode data, which resulted in an overall accuracy of 91.74% for five late-autumn ice types. Song et al. [17] applied the 14-layer S1-ResNet method, built on ResNet, to S1A EW HH polarization data, and achieved an overall accuracy of 90.3% and a kappa coefficient of 0.86. MSI-ResNet is also built on the ResNet structure, but our classification accuracy for the S1A EW data is slightly lower. However, differences in the network design, training data generation, and validation data make a direct intercomparison difficult. Results for other radar frequencies are also given here for comparison. Ressel et al. [64] fed the co-pol ratio and other selected polarimetric features derived from TerraSAR-X StripMap mode data into the openly accessible Fast Artificial Neural Network (FANN) library, resulting in an overall accuracy of 95% for three spring sea ice classes and open water. Aldenhoff et al. [58] used the $\sigma^0_{hh}$ and $\sigma^0_{vh}$ backscatter intensities, the $\sigma^0_{vh}/\sigma^0_{hh}$ polarization ratio, and the $\sigma^0_{vh}$ autocorrelation from ALOS-2 PALSAR-2 wide-beam dual-polarization data as inputs to a three-layer neural network, which resulted in an overall accuracy of 84.17% for ice and water classification.
Notably, most of the above-mentioned studies used data acquired in the Arctic winter. Since a neural network can establish the intrinsic connection between input/target pairs when they are well associated [65], extending the present technique to winter ice data from GF-3, once such data become available, is a possible direction for future study.

Conclusions
In this study, a deep neural network method based on the ResNet deep learning structure was developed to classify late spring and summer sea ice types in GF-3 QPS mode satellite data acquired at moderate incidence angles and under low sea surface wind conditions. The method, called MSI-ResNet, has 10 layers and classifies three late spring and summer surface categories, i.e., FI, BI (between ice floes), and OW. A single FI category was chosen because the backscatter of FYI and MYI is similar in the melt season. Experiments testing the effect of the patch size used by the algorithm and of different input polarization combinations were undertaken using SAR scenes from the Fram Strait. Two further groups of data, from the Beaufort Sea and north of the Severnaya Zemlya archipelago, were used to further validate the classification performance. In addition, the classification results were compared with those obtained using another classifier (the LibSVM method) and another sensor (Sentinel-1A). In all the experiments, the classification accuracy was estimated using visual interpretation of the images and stratified random sampling from the identified classes.
With the MSI-ResNet method, the patch size experiments indicated that the classification accuracy does not increase monotonically with patch size. The best overall accuracy and kappa coefficient for the three categories were obtained with a patch size of 31 × 31, and were 4.15-5.14% and 0.06-0.08 higher, respectively, than those for the other patch sizes. OW was the surface type most sensitive to patch size. Meanwhile, the OW category was overestimated at every patch size, since BI tends to be misclassified as OW. In contrast, FI was less sensitive to the patch size and obtained the highest user's accuracy, which was 7.58-8.04% higher than that of BI and OW. Most of the misclassified pixels were confused between the FI and BI surface types.
The polarization combination experiments showed that using the three polarizations together as input improves the overall accuracy, the kappa coefficient, and the accuracy of the FI, BI, and OW surface types. The combination of VV and VH polarizations produced a much greater improvement over the use of VV polarization only, but only an insignificant improvement over the use of VH polarization only. The overall accuracies for the three categories when using VV, VH, and VV + VH were 83.37%, 91.09%, and 91.12%, respectively. The VV polarization was found to overestimate FI, while the VH polarization produced a better classification accuracy for this category. The combination of the three polarizations also produced a high accuracy in the scenes from the Beaufort Sea (R1-1), near the Central Arctic (R2-6), and the Fram Strait (R3-15). Sea ice classification of GF-3 QPS mode data based on MSI-ResNet also performed better than the simple LibSVM classifier.
The GF-3 QPS mode data showed details of the scattered FI similar to those in the coincident S1A EW mode data over the same R3-15 area. Comparable classification results for FI and OW were obtained using MSI-ResNet with input from S1A (HH + HV) or GF-3 QPS (HH + HV). Considering that the GF-3 QPS mode data and the S1A EW mode data have spatial resolutions of the same order of magnitude, the overestimation of BI and the relatively low overall accuracy of the S1A result suggest that the newly designed deep learning model (MSI-ResNet) based on the ResNet structure is more suitable for GF-3 QPS mode data.
The different performances of MSI-ResNet with different patch sizes may be caused by the indistinct boundaries between FI and BI, especially in areas of small ice floes, which also make the two ice types hard to identify visually. In contrast, the boundaries between BI and OW are sharp. Therefore, when the extent of one of these two types changes, the extent of the other changes in the opposite direction.
Further investigation is recommended. In addition to the patch size, other factors that may affect the classification accuracy of the MSI-ResNet method remain to be explored, such as the depth of the network and the number of training samples. To date, only summer data are available for GF-3. In the future, when GF-3 QPS mode data from other seasons become available, a more comprehensive analysis should be performed to assess the classification accuracy for winter ice types. The configuration of LibSVM for GF-3 classification could also be improved. In summary, the results of this