SI-STSAR-7: A Large SAR Images Dataset with Spatial and Temporal Information for Classification of Winter Sea Ice in Hudson Bay

Remote sensing satellites have been broadly applied to sea ice monitoring. The substantial increase in satellite imagery provides a large amount of data support for deep learning methods in the sea ice classification field. However, there is a lack of public remote sensing datasets to facilitate sea ice classification with spatial and temporal information and to benchmark the deep learning methods. In this paper, we provide a labeled large sea ice dataset derived from time-series sentinel-1 SAR images, dubbed SI-STSAR-7, and a validated dataset construction method for sea ice classification research. The SI-STSAR-7 dataset includes seven different sea ice types corresponding to different sea ice development stages in Hudson Bay during winter, and its samples are time sequences of SAR image patches in order to embody the differences of backscattering intensity and textures between different sea ice types, as well as the change of sea ice with time. We construct the dataset by first performing noise reduction and mitigation of incidence angle dependence on SAR images, and then producing data samples and labeling them based on our proposed sample-producing principles and the weekly regional ice charts provided by Canadian Ice Service. Three baseline classification methods are developed on SI-STSAR-7 to establish benchmarks, which are evaluated with accuracy and kappa coefficient. The sample-producing principles are verified through experiments. Based on the experimental results, sea ice classification can be implemented well on SI-STSAR-7.


Introduction
Sea ice monitoring in polar regions is very important because sea ice has a great impact on ocean hydrological conditions, atmospheric circulation, and climate change [1]. Moreover, sea ice can directly affect human activities such as ship navigation and seabed mining and may even lead to major disasters [2]. Sea ice classification is the basis of sea ice monitoring, and the classification results can be used to calculate sea ice concentration and area [3].
Many studies on sea ice classification have been carried out using Gabor wavelet techniques [4], Markov random field (MRF) [5,6], neural network [7,8], support vector machine (SVM) [9,10], and other methods. With the widespread application of deep learning in the image field, researchers have applied deep learning models to sea ice classification and achieved high classification accuracy [11,12]. However, the performance of deep learning methods depends on a large dataset. Some public remote sensing image datasets have been developed for ground object detection [13,14], ship identification [15], and oil well detection [16]. In the field of sea ice, although the products of sea ice extent and concentration are widely available [17], the remote sensing image datasets for sea ice classification are scarce [18]. To boost the development of effective deep learning methods, more studies on public sea ice datasets are needed.
The synthetic aperture radar (SAR) imagery from the satellites, such as Sentinel-1, Radarsat-2, and TerraSAR-X, can be a good data source for sea ice classification. SAR satellites have been widely used in sea ice monitoring [19,20]. As an active microwave sensor, SAR has the ability to observe the earth all day without the limitation of light and cloud and provide multi-polarization and multi-band high-resolution remote sensing data. Among the C-band satellites, Sentinel-1 A/B consists of two SAR satellites, can cover the entire world within six days to provide SAR data with the highest time resolution. The polarization modes of SAR imagery, such as single polarization, dual polarization, and full polarization, demonstrate different abilities in sea ice classification. Compared with single polarization, multi-polarization (dual polarization and full polarization) SAR images improve the classification and segmentation of sea ice [21][22][23]. The advantage of full polarized data is limited by long and narrow image bands in sea ice classification [8].
When constructing a sea ice classification dataset, what information it contains is related to what kind of classification models it can support. At present, most sea ice analysis methods are typically based on a combination of backscatter intensity, texture features, and some ancillary information applying different numerical models for classification and segmentation [24]. For example, Aldenhoff et al. [25], based on backscatter intensities in coand cross-polarization and autocorrelation as a texture feature, used neural networks to achieve the mapping between image features and ice/water classification. Chen et al. [26] used extended-maximum operator, morphological image processing, and geometrical features to identify individual MYI floes from SAR images. However, these studies only considered the differences between the spatial characteristics of different sea ice types. To develop deep learning models for sea ice classification or concentration prediction, Malmgren-Hansen et al. [27] fused Sentinel-1 SAR images with Advanced Microwave Scanning Radiometer 2 (AMSR2) data to produce the AI4Arctic/ASIP Sea Ice Dataset and used it to train a Convolutional Neural Network (CNN) architecture for sea ice concentration prediction. Khaleghian et al. [18] combined near-simultaneous SAR images from Sentinel-1 and optical data from Sentinel-2 to train a CNN sea ice classification model. Both studies have focused on the use of complimentary information to improve the model performance. So far, it has been rare to build a dataset that considers temporal dimension information.
Ice type can be defined in terms of the stage of sea ice development-from newly frozen smooth ice (Nilas), to deformed and roughened Young Ice, to thick ice cover that survived summer melts (Old Ice) with several intermediate stages [28]. Every year, sea ice has a melting period and a freezing period. Therefore, for the same area, the growth and development of sea ice are periodic and closed. Moreover, different types of ice grow at different speeds. The time correlation of sea ice has been applied in other areas of sea ice monitoring. For example, Petrou et al. [29] first studied the potential of Recurrent Neural Network (RNN) in predicting the trajectory of sea ice in the next few days based on previously observed satellite images. The predicted results obtained by this method are consistent with the real motion of sea ice. Chi and Kim [30] used Multi-Layer Perceptron (MLP) and Long Short-Term Memory (LSTM) to predict the Arctic sea ice concentration, respectively. The results show that the two methods are superior to the traditional autoregressive model in both numerical statistics and vision. It is clear that taking temporal information into consideration is helpful for sea ice studies. In addition, we previously studied the impact of the combined use of spatial and temporal information on sea ice classification, and the results show that the addition of temporal information is helpful for sea ice classification [31].
To promote the development of sea ice research based on spatial and temporal information and alleviate the lack of a large of labeled datasets suitable for deep learning, we constructed a spatiotemporal dataset for sea ice classification based on Sentinel-1 SAR images, namely SI-STSAR-7. The dataset was constructed using the GRD products of Sentinel-1 SAR images with HH and HV polarization models, obtained from Hudson Bay in winter. Seven sea ice types with 164,564 samples were included in the large dataset to support the training and test of deep learning models. Each sample is a 32 × 32 × 6 image block (i.e., a time series of image patches with the time-step of 6 and the patch size of 32 × 32) to maintain spatial and temporal information of sea ice. The dataset can be accessed at IEEE DataPort: http://ieee-dataport.org/open-access/si-stsar-7 (accessed on 20 May 2021).
A concise summary of our contributions are as follows:

1.
A complete method for constructing a spatiotemporal dataset for sea ice classification is provided. In sample production, the ice concentration principle, ice development principle, and cross-subswath principle are creatively proposed to improve the quality of the dataset. Among them, the cross-subswath principle can effectively alleviate the impact of Sentinel-1 thermal noise on sea ice classification, especially in the first subswath region, which provides a reference scheme for the region subject to high thermal noise in sea ice research.

2.
Using the proposed method, we provide a large spatiotemporal dataset for sea ice classification based on Sentinel-1 SAR images. This is the first large labeled sea ice SAR dataset that provides both spatial and temporal information. We also preliminarily studied the impact of time-step (the number of consecutive SAR scenes) on sea ice classification.

3.
Comprehensive evaluation results of three advanced classification algorithms based on accuracy and kappa coefficient, are presented as the benchmarks of sea ice classification using SI-STSAR-7.
This paper is organized as follows. Section 2 presents the process of constructing the dataset. Section 3 presents the baseline experimental results on the constructed dataset and the verification results of our proposed sample-producing principles. Section 4 is the conclusion.

Dataset Construction
The construction process of the SI-STSAR-7 dataset in this study is shown in Figure 1. We firstly determined the data source and study area, taking into account the accessibility of SAR data, the availability of sea ice reference data, and the diversity of sea ice types. Then, we downloaded the SAR images and the corresponding reference data within the determined location and time. Next, SAR image preprocessing was undertaken to reduce the impacts of noises and the effect of incidence angles. Finally, the sea ice data samples were produced following a set of principles to ensure the accuracy of sea ice classification as high as possible. The details of the dataset construction are presented in the following subsections.

SAR Source and Study Area
The SAR data used in this study come from the Sentinel-1 satellite data that is freely accessed from the Copernicus Open Access Center (https://scihub.copernicus.eu (accessed on 20 May 2021)). Sentinel-1's satellites A and B share the same orbital plane, with an orbital phase difference of 180 degrees, which can provide a 6-day repetition period at the equator. Sentinel-1 can provide a dual-polarization mode with high time resolution, and thus it is suitable for monitoring the change of sea ice.
The format of SAR data we used is as follows: a medium resolution dual-polarization (HH and HV) Level-1 ground range detected (GRD) product obtained in the Extra Wide (EW) swath mode, with a standard strip width of 400 km, a resolution of about 90 m, and a pixel spacing of 40 m. In our study area of Hudson Bay, Sentinel-1A and Sentinel-1B alternately provides a SAR image at the designated location every six days. There are some differences between the two sets of images, as listed in Table 1. It should be noted that the exact size of every SAR image differs by several pixels from those given in Table 1, and the coordinates of SAR images from the two satellites do not overlap exactly. To maintain the position consistency of the two satellite images, we cut out the non-overlapping regions of the data in the study. Following our previous study [31], we selected Hudson Bay as the study area, which locates in the central-eastern part of Canada and its sea ice condition has been long-term monitored by Canadian Ice Service (CIS). It has a large area and a long freezing process, which can provide a large sea ice coverage and different sea ice types and is suitable for data preparation. The time span of the SAR data was set to two freeze-up periods of Hudson Bay from October 2019 to May 2020 and from October 2020 to April 2021.
A total of 80 SAR scenes were collected to provide the data needed for our study. It covers all types of sea ice in Hudson Bay, including seven types: open water (OW), new ice (NI), grey ice (GI), grey-white ice (GWI), thin first year ice (ThinFI), medium first year ice (MedFI), and thick first year ice (ThickFI). In this study, sea ice type is defined based on the stages of development (SoD), which is indeed based on thickness of ice, as shown in Table 2. A fact worthy of our attention is that the sea ice in the freeze-up period is moving as it develops. Although the World Meteorological Organization (WMO) standard defines 15 ice classes with different SoD codes, only seven of them are involved in our study area (see Table 2).

Reference Data
Reference data are used for sample label preparation, that is to help identify the ice types. We refer to weekly regional ice charts and weekly regional ice data provided by CIS, available at https://iceweb1.cis.ec.gc.ca/Archive/page1.xhtml (accessed on 20 May 2021). The weekly regional ice chart is in graphics interchange format (GIF), made by ice experts based on satellite imagery, weather, and oceanographic information, visual observations from ships and aircrafts within three days of the effective date. The weekly regional ice data are in ESRI ArcInfo interchange file (E00) format, containing coordinate and geometry data and can be imported into geographic information system software (such as ArcGIS). In the ice chart, ice information is presented in the ice egg format and color-coded using the WMO Standard. The ice egg contains information on ice concentration, stage of development (sea ice type), and morphology (size of ice floe). Figure 2 shows the weekly ice chart of Hudson Bay on 14 December 2020. The ice egg circled in red gives the ice information in the area where it is located. The 9+ represents the total ice concentration in the area, which means that more than 90% of the area is covered by ice. The notations "5" and "7" indicate that there are two types of sea ice: GWI and ThinFI (see Table 2 for sea ice type code). The digits "1" and "9" indicate that these two types of sea ice account for 10% and 90% of the total ice, respectively. The light green color indicates that ThinFI dominates the region. More information about the ice chart can be found on the Government of Canada website [32]. The ice egg circled in red indicates that more than 90% of the area is occupied by ice, and the ThinFI (code 7) with the proportion of 90% of the total ice is dominating.

SAR Image Preprocessing
SAR is a coherent imaging radar system with high resolution, and speckle noise is its inherent characteristics [33]. Moreover, its unique working mode causes scalloping effect and interswath discontinuities effect on SAR images, which have a great impact on sea ice classification [34]. The cross-polarized channel of Sentinel-1 is highly affected by thermal noise in the low backscatter intensity region, which is a typical characteristic of open water and smooth first-year ice [35]. Although the European Space Agency (ESA) provides a calibrated noise vector for noise power subtraction, the contribution of residual noise is still significant considering the relatively narrow backscatter distribution of the cross-polarized channel [34]. In addition, the SAR images with a wide range of incidence angles have significant changes in backscatter intensity and contrast due to the continuous change in incidence angles, which poses great challenges to the successful visual or automatic classification of Sentinel-1 EW images [36].
In order to alleviate the impact of SAR image residual noise and incidence angle effects on sea ice classification, we followed the generic workflow for processing Sentinel-1 GRD products and use the Sentinel Application Platform (SNAP) [37] to preprocess SAR data [38,39]. The steps are shown in Figure 3. The preprocessing steps in this paper are a more comprehensive preprocessing scheme than our previous study [31], adding the reduction of the border noise and speckle noise of the SAR image, as well as the mitigation of the incidence angle dependence. It should be noted that the preprocessing does not really change the SAR image resolution. The range doppler terrain correction, however, projects the ground range image into a geographic coordinate WGS84, in which each cell size is about 0.00036 • in both longitude and latitude.

Noise Reduction
Our preprocessing scheme mainly reduces the boundary noise, thermal noise, and speckle noise in SAR images. Among them, Lee Sigma filter is used to reduce speckle noise, the parameters of which are set to kernel size 7 × 7, target window size 3 × 3, and sigma 0.9. Figure 4 shows the comparison before and after the preprocessing of "Applying Precise Orbit File → Border Noise Removal → Thermal Noise Removal → Speckle Filtering → Calibration to Sigma0 → Conversion to dB" for the HV polarization image of SAR data acquired on 24 November 2020. After noise reduction processing, it can be found that the speckle noise of the image is significantly reduced, and the texture details of the ice are clearer, which is conducive to the classification of sea ice. Regarding the interswath discontinuities effect, except for the intensity discontinuity between the first subswath (EW1) and the second subswath (EW2), the obvious offset between the other subswaths is effectively alleviated.

Incidence Angle Dependence Correction
To compensate for the incidence angle effect, we need to determine the empirical relationships between incidence angle (θ) and the backscattering coefficient (σ 0 ) at HHand HV-polarizations. A lot of research results show that the linear regression model well describes the dependence of σ 0 in dB scale on θ [40][41][42]. Although some researchers expect the incidence angle dependence of radar backscatter to be lower at cross-polarization compared to co-polarization [43], Aldenhoff et al. [42] demonstrate that incidence angle normalization should be applied to both co-polarization and cross-polarization images to minimize the variation of backscatter intensity within the same ice type in one image swath.
In this study, the dependence of σ 0 in dB scale on θ determined by the linear regression model [40] was used to compensate for the incidence angle effect. The compensation formula is as follows: where σ 0 N is the compensated sigma naught value, σ 0 is the sigma naught value before compensation, c is the dependence of sigma naught value on incidence angle, θ is the incidence angle corresponding to the pixel, θ 0 is the central value of the wide incidence angle range of SAR products.
It is not a straightforward task to compensate for the effect of incidence angle dependence for different ice types separately, because this would first require classifying the SAR image into the required ice type, which typically results in a certain number of misclassifications between the ice types. Coarse correction of generic sea ice σ 0 versus θ loops can also improve the results significantly in many sea ice classification studies based on SAR data. It should be noted that for some sea ice types (such as smooth young ice or rough thin ice), the general correction may be lower or higher than the true dependence of σ 0 on θ.
In this study, referring to the research results of Mäkynen et al. [40] and Aldenhoff et al. [42], which also studied Sentinel-1 SAR products with EW mode and dual polarization GRD, the incidence angle dependence coefficients for HH polarization and HV polarization were determined to be −0.24 and −0.16 dB/ • , respectively. The central value of the wide incidence angle range was set to 33 • calculated based on our SAR data. Then, the incidence angle dependence coarse correction on SAR image was performed using (1). Figure 5 shows the dependence of σ 0 on θ of our SAR image before and after the coarse correction. Figures 5a and 5c are graphs of the dependence of σ 0 on θ for HH-and HVpolarization without incidence angle dependence correction, respectively. Compared with HV-polarization, the backscattering coefficient of HH-polarization is more sensitive to the incidence angle, which is consistent with the conclusion of Nghiem et al. [43]. The HH-polarization after the incidence angle dependence coarse correction (Figure 5b) shows that the incidence angle dependence of GI, GWI, ThinFI, and MedFI is almost eliminated, while the dependence of NI and ThickFI is over-corrected slightly. In addition, the change of the incidence angle has a great influence on OW (Figure 5a). Its residual dependence is still significant after coarse correction (Figure 5b) because the incidence angle dependence coefficients are determined only based on sea ice. However, this does not affect the accurate identification of OW. In HV-polarization, the incidence angle dependence of different types of sea ice is in different directions (Figure 5c), and coarse correction can relatively improve the effect of the incidence angle on the backscattering coefficients of different sea ice on the whole (Figure 5d). Figure 5. The dependence of the backscattering coefficient on incidence angle. Before (a) and after (b) coarse correction of HH-polarization incidence angle dependence. Before (c) and after (d) coarse correction of HV-polarization incidence angle dependence.

Sample Production
To produce representative data samples for training and testing sea ice classification models, SAR images need to be filtered to meet the sample requirements. In addition to the sea ice concentration proposed in paper [31], factors such as the development process of sea ice, the interswath discontinuities effect, the scalloping effect on the first subswath, as well as the edge mismatch between the SAR image and the ice chart will affect the quality of the dataset. To alleviate the influence of these factors, we refer to the ice chart and ice data that have the least time difference between SAR image acquisition time and ice charting time, and select sample scenes and regions according to the following explained principles: When producing the samples for a specific type of sea ice, we believe that the higher the concentration of this kind of ice, the more reliable the SAR images (or partial images) where the ice is located can represent this ice type. Therefore, we rely on the information of ice eggs in the ice chart and attempt to find when and where the total ice concentration reaches 90% or more and the dominant ice type accounts for 90% and above of the total ice area. Take the example shown in Figure 2, the ice egg circled in red indicates that the total ice concentration in the area exceeds 90%, and the sea ice type ThinFI represented by the notation "7" accounts for 90% of the total ice, hence this area can be used as the sample region of ThinFI.
The requirement of the main ice concentration can ensure that the sample represents a type of sea ice as much as possible, to avoid the impact of the incorrect label of sea ice type on the classification performance. However, due to the rapid growth rate of Ni ice type, this type of ice never reaches a high concentration rate. Thus, SAR images with 50% NI ice concentration on 9 December 2019, and 70% NI ice concentration on 24 November 2020, were collected in our study. For GI and GWI, the ice charts show that the concentration of GI is at most 80% and there are only a few areas where the concentration of GWI can reach 90%, so we selected the sample region where the main ice type accounts for 80%. The sample region with the proportion of the main ice type to 90% was selected for the three ice types: ThinFI, MedFI, and ThickFI.

•
Ice development principle To better describe this principle, we took the development process of MedFI as an example. Table 3 shows the development process of MedFI in a certain region of Hudson Bay from January to February 2021. In Table 3, "SAR date" refers to the date when the SAR scene for the region was collected, and the information of sea ice type and concentration is given for each date. Taking the SAR date "20210117" for example, the ice types in the SAR scene collected on 17 January 2021, are ThinFI and MedFI, which account for 70% and 30% of the total ice, respectively. Assuming we want to obtain a data sample with a time-step of 4, we cropped image patches from the same region of four SAR scenes successively acquired to form the sample, and then labeled the sample (true class) as the main ice type in the region of the last acquired scene. For instance, "sample 1" in Table 3 contains a sequence of four image patches and its ice type is labeled as "MedFI".
To learn the entire process of ice development as fully as possible, we produced four sample types representing MeidumFI. The difference between these samples can be seen intuitively in Table 3. Sample 1 represents the sample whose ice concentration of MedFI has just reached 90%. With time, the concentration of MedFI gradually tends to be stable. Sample 2, sample 3, and sample 4 represent different stages of MedFI from formation to stability. This ice development principle can ensure that our samples can represent the entire development process of ice types, so that the model can better extract the temporal characteristics of sea ice.
Ice movement also influences the change of ice type in one data sample. In particular, the ice in the early stages of development is often thin and small, and easy to be moved away by wind and current. Therefore, the time-series of data samples for OW, NI, GI, and GWI may record a quick change of the dominant ice types. In this case, the temporal information has a limited effect on the ice classification. More discussion is presented in Section 3.4.

•
Cross-subswath principle The image intensity of Sentinel-1 satellite with progressive scans SAR is often disturbed by additional thermal noise, particularly in cross-polarization channel. Although we reduced the noise of the SAR images (see Section 2.3.1), the contribution of residual noise is still significant, especially the interswath discontinuities effect and the scalloping effect on the first subswath, which brings great interference to feature learning. For this, we try to sample across subswaths. Setting the sample regions across different subswaths through manual intervention can effectively alleviate the influence of the interswath discontinuities effect and the scalloping effect on sea ice classification.

•
Boundary principle A fact worthy of our attention is that Sentinel-1 SAR data is provided by the Copernicus Open Access Center every six days, while the weekly regional ice chart is provided by CIS every seven days. This results in an imperfect match between the SAR image and the ice chart for the boundaries of different ice classes. To minimize the impact of such a mismatch, we recommend that only samples that are far enough away from the boundary are valid. This principle has also been used in the studies of Park et al. [44] and Boulze et al. [12].
We verified the concentration principle, ice development principle, and cross-subswath principle in Section 3.5.
After screening according to the above four principles, the specific regions of specific SAR scenes that meet the sample requirements are determined, as shown in Table 4 and Figure 6. Table 4 provides the dates and ice information of these SAR scenes for different ice types. Figure 6 schematically demonstrates these determined regions. As there are overlaps among these regions, we used different colors to distinguish them (note: the colors have nothing to do with the ice types).
With the selected SAR scenes for different ice types, the data samples are produced in the following process. Firstly, ArcGIS software is used to crop the SAR scenes according to the screened sample regions. Then, the CIS ice data that has the shortest time difference with the SAR image acquisition time are opened in ArcGIS. After geo-coordinate positioning, the position of the cropped SAR sub-image in the ice data can be determined, so that the main ice type of the region in the ice data is designated as the ice type label of this SAR sub-image. Next, to produce a data sample with both spatial and temporal information, we crop the SAR sub-image into small patches of 32 × 32 pixels, denoted as I (per pixel is about 0.000359 • in the WGS84 coordinate system). Then, we concatenate the patches from consecutive s SAR scenes at the corresponding position into time-sequences (or 3D blocks).
Each data sample is a time sequence I t−(s−1) , . . . , I t−1 , I t , where t represents the current time. Finally, we label each sample with the main ice type at the time t. In order to facilitate the use of this dataset for training classification models, we numerically code OW, NI, GI, GWI, ThinFI, MedFI, and ThickFI from 0 to 6.
The entire process generates the sea ice sample data with the size of (N, s, 32, 32, 2) and the ground-truth data of sea ice type labels as one-dimension vector with the size of N. Here, N is the number of samples, s is the number of SAR scenes involved in each sample, that is a sample's time-step, 32 × 32 is the sample resolution, and 2 refers to the number of HH-and HV-polarization channels. The time-step of 6 is determined based on our previous study [31]. Nevertheless, this dataset can be used with any time-steps between one to six depending on the users' needs, simply by removing the unwanted image patches from the sequence. The large dataset SI-STSAR-7 has been published at IEEE DataPort for data sharing. The public access link is http://ieee-dataport.org/open-access/si-stsar-7 (accessed on 20 May 2021). It includes the dataset used for model training and the SAR scene data for testing and evaluating a sea ice classification model. Users can find more details about SI-STSAR-7 from the "Readme.pdf" in the data access link.

Experiments
To further prove the effectiveness of our proposed method of producing a sea ice spatiotemporal dataset and construct a baseline on the generated spatiotemporal dataset SI-STSAR-7, this paper provides three state-of-the-art sea ice classification methods for benchmark. These methods include a classic machine learning algorithm, Support Vector Machine (SVM), and two deep learning networks, Convolutional Neural Network (CNN) and Convolutional LSTM (ConvLSTM). Accuracy and kappa coefficient are applied for evaluating the classification performance of these methods on sea ice spatiotemporal datasets from 1 to 6 time-steps. The classification results constitute the baseline system. In addition, experiments verified the validity and necessity of the sample-producing principles we proposed. [45] is an extension of the Long Short-Term Memory (LSTM) network. By converting the Hadamard product of the LSTM during the input-to-state and state-to-state transitions into convolution, the network can learn the temporal features and extract the spatial features of the image more effectively. The visualized architecture of the ice classification network based on ConvLSTM is shown in Figure 7. It consists of four ConvLSTM layers, three Batch Normalization (BN) layers, one Global Average Pooling (GAP), and one output layer. The BN [46] layer after each ConvLSTM layer is to normalize the output of the ConvLSTM layer to speed up the training and avoid overfitting. The size of the input image patch is t × 32 × 32 × 2, where t is the time-step (that is, the number of consecutive SAR scenes), 32 × 32 is the pixel of each SAR image block, and 2 refers to the HH-and HV-polarization channels. The output of ConvLSTM is the code of ice type, which is 0~6. Table 5 shows the parameter settings of the sea ice classification network based on ConvLSTM.   [11] and Boulze et al. [12] applied the CNN model to the study of sea ice classification and achieved high classification accuracy. Therefore, we take CNN as one of our baseline methods. To maintain the consistency with the baseline method based on ConvLSTM, we kept the network architecture of the sea ice classification method based on CNN the same as that of Method 1, and only replaced the ConvLSTM layer with the CNN layer. At the same time, we also kept the same parameters of the corresponding layers of the two methods. Figure 8 shows the network visualized architecture of the baseline method based on CNN. The parameters of method 2 are shown in Table 6.  Method 3: SVM is widely used in sea ice classification in recent years [9,10,47,48]. In addition, it has been successfully applied to the Map-Guided sea ice classification system [49]. In this study, SVM is directly applied to the sea ice spatiotemporal dataset to extract high-level features and perform sea ice classification. In the experiment, SVM uses a Gaussian kernel function and its parameter C is set to 1, and the parameter of gamma is automatically learned.

Method 1: ConvLSTM
Compared with the classification model in [31], which first extracted spatial features of sea ice with ResNet networks, and then further learned temporal features from the extracted spatial features with a LSTM network, the three methods designed in this study focus on learning spatial and temporal information of sea ice together to make direct classification.

Evaluation Metrics
In order to verify the performance of the baseline methods, the following metrics in the test dataset were calculated to evaluate the classification model.

•
Accuracy: The proportion of correctly classified samples to total samples. • Accuracy of each ice class: The proportion of correctly classified samples of a given class to total samples of that class. • Kappa coefficient: An indicator for measuring the consistency of multi-class models, which is based on the confusion matrix. The formula is as follows.
where k is the kappa coefficient, p o is the overall accuracy, m is the number of samples correctly classified, n is the total number of samples, a i and b i are the number of real samples and the predicted samples of ith ice level, respectively.

Implementation
Our experiments were conducted on a computer with the following specifications: Ubuntu 16.04 operating system, Intel Xeon Gold 6130 2.10 GHz CPU, 256-GB Memory, and NVIDIA TITAN RTX GPU. Keras framework and Python programming language were adopted to build and train deep learning classification networks, and SVM is implemented with the SVM library.
We normalized the samples by min-max normalization. Then, the dataset was shuffled randomly and divided into training set, validation set, and test set at the ratio of 8:1:1, all of which had the same proportion of ice types.
For deep learning methods, we adopted cross-entropy loss function to minimize the error between the predicted value and the real value. Adam optimizer was used to calculate and update the network parameters. In order to speed up the training, we built a highperformance data input pipeline to load the training data into the computing device with 128 batch-size. In addition, in model training, we also adopted the learning rate decay strategy and the early stopping to accelerate the model convergence and prevent overfitting.

Sea Ice Classification Performance on SI-STSAR-7
To study the effect of spatial and temporal information on the performance of sea ice classification, we used the proposed dataset SI-STSAR-7 with time-steps from 1 to 6 to train and test three baseline classification methods. The experimental results of the baseline methods based on ConvLSTM, CNN, and SVM on the test dataset are given in Table 7.  Table 7 shows the accuracy and kappa coefficient of sea ice classification increase constantly with the increase of the number of time-steps for all the three methods. This indicates that temporal information plays a significant role in promoting the recognition ability of sea ice classification models. We also found that the overall performance of the deep learning methods (ConvLSTM and CNN) is better than the machine learning method (SVM). This suggests that our large dataset is suitable for deep learning models, which have deeper feature extraction and better learning capability. Comparing the evaluation results under the same time-step, Method 1 always overperforms Methods 2 and 3. Clearly, Method 1 using the ConvLSTM network is more effective for spatiotemporal feature extraction, which contributes to identifying different sea ice types.
More details about how the classification accuracy of different sea ice types is changed with the time-step are demonstrated in Figure 9a-c for the three baseline classification methods, respectively. These figures exhibit that although the overall classification accuracy shows a positive dependence on the time step, the usefulness of sea ice temporal information for classification varies with ice types. For NI, GI, and GWI, when the time-step is increased from 1 to 3, the recognition ability of all baseline methods is greatly improved, all of which reach more than 90%. We consider that this trend is related to the short growth period of these types of sea ice, as well as the fast ice movement. According to our statistics on the ice development duration in winter, the existence duration of thin ice is about 1-4 weeks, of which NI can exist for the shortest time, about 1 week, GI can survive for about 2 weeks, and GWI can survive for 3 weeks or more. Therefore, when the time-step is set to 3 (equivalent to 3 weeks), the three ice classes can be well distinguished. In addition, these relatively thin ices are usually small and move fast due to wind and currents. Short time-steps are more reasonable to reflect their fast change. When increasing the time-step, however, the model accuracy does not drop. This indicates that the model has automatically learned that the previous time-step ice information is not related to the current ice and has given very low weights to them. ThinFI, MedFI, and ThickFI take more than three weeks from the initial formation to reaching 90% of concentration, and they exist for a long duration throughout the winter and move slowly due to their thicker and larger morphology. As a result, the short time-step is not as effective as the long time-step in distinguishing them. Moreover, the three methods are generally better at recognizing OW, NI, GI, and GWI than ThinFI, MedFI, and ThickFI. It shows that the three types of first-year ice have similar characteristics and are difficult to differentiate. The temporal characteristics of sea ice have a higher degree of discrimination as the time-steps increase, which benefits sea ice with similar spatial characteristics.
By comparing the three subgraphs in Figure 9, it can be found that, at the same timestep, the recognition ability of the three methods for a certain kind of ice generally presents that ConvLSTM is better than CNN, and both of them are better than SVM. However, the MedFI category is a special case. SVM has higher classification accuracy for MedFI with a small time-step, and its accuracy has hardly improved with the increase of time-step.

Concentration Principle
Concentration principle can ensure that the sample represents a type of sea ice as much as possible, so as to avoid the impact of incorrect labeling of ice type on classification performance. In order to verify the necessity of this principle, we selected SAR images of different ice types at different concentrations: 0.9 vs. 0.7 (the proportion of main ice type to the total ice area when the total ice concentration reaches 90% or more) and compared the probability density function (PDF) curves of sigma0 values of these ice types (see Figure 10). Figure 10. Probability density function curves of different ice types at different concentrations. In the HH polarization mode, ice concentration is (a) 70% and (b) 90%; in the HV polarization mode, ice concentration is (c) 70% and (d) 90%. Note: Since the OW has no concentration information, its data is the same under the same polarization mode; NI can account for up to 70% of the total ice volume, so in (b,d) the concentration of NI is the same 70%; the concentration of GI in (b,d) is 80%. Figures 10a and 10b are the PDF of sigma0 values of various ice types with the ice concentration 70% and 90% in HH polarization mode, respectively. Figure 10c,d show the PDFs in HV mode. Overall, the distribution of various ice types is more separated and more different when their concentration is 90% than it is 70%. This indicates that when the main ice type accounts for 70% of the total ice area, there may be more similar backscattering characteristics between different ice types, making it more difficult to distinguish them. When the proportion of the main ice type in the total ice area increases to 90%, the samples can present more distinguishing characteristics of the specific ice type, which is helpful for reducing misclassification rate of the classification model.
When it comes to the specific ice type, different behaviors can be found. Given that the PDF of OW is clearly different from other ice types, the classification between ice and water is relatively easier. With 70% of concentration, GI and GWI, and ThinFI and MedFI have almost identical PDF curves in both HH and HV modes (Figure 10a,c). These phenomena are obviously improved when the ice concentration goes to 90%. There is an exception in Figure 10d, where the differences between NI and GI have shrunk. Nevertheless, when the data from HH and HV modes are used together, NI and GI can be easily distinguished.

Ice Development Principle
To prove the influence of ice development principle on ice classification performance, we constructed two small-scale datasets for verification. Both datasets have the samples with the time-step of 6. Table 8 describes the difference between the two datasets. All the data samples in Dataset 2 meet the ice development principle presented in Section 2.4 and cover the entire ice development process as much as possible. The samples of OW, NI, GI, and GWI in Dataset 1 remain the same as Dataset 2, but for the rest three types of ice (ThinFI, MedFI, and ThickFI), Dataset 1 only has samples whose ice concentration has just reached 90% (such an example is shown as sample 1 in Table 3). We used Method 1 based on the ConvLSTM network introduced in Section 3.1 as the classification method and trained it with the two sea ice classification datasets, respectively. In order to verify the classification performance of the two datasets trained models, we used the collected SAR images of the two freezing periods to produce a test dataset with seven ice classes that conform to the actual situation of ice development. The results are given in Figure 11. By comparing Figure 11a,b, it is obvious that the classification accuracy of model 2 for ThinFI, MedFI, and ThickFI is much higher than that of model 1, indicating that Dataset 2 can better provide the time correlation information of sea ice, so as to realize the high-precision classification. Therefore, the ice development principle that we consider when constructing spatiotemporal dataset is beneficial and necessary.

Cross-Subswath Principle
Cross-subswath principle is realized by setting the sample region to cross different subswaths through manual intervention. To visually demonstrate the effect of this principle, we constructed another dataset for comparison with our dataset SI-STSAR-7. The difference between these two datasets is that the SI-STSAR-7 contains the sample region across the subswaths of EW1 and EW2, while the other dataset does not. Then used them separately to train the classification model ConvLSTM introduced in Section 3.1. For description convenience, it is defined that the model trained using the dataset that does not contain cross-subswath samples is model 3, and the model trained using the dataset that contains the samples following cross-subswath principle is model 4. Next, we used the two trained models to test a whole SAR scene obtained on 24 November 2020, and generated the sea ice distribution map, shown in Figure 12. It should be noted that the SAR scene on 24 November 2020, was not used to make our dataset.  The CIS weekly regional ice chart in Figure 12a highlights the SAR scene bounded by red lines. The sea ice distribution maps in Figure 12b,c roughly agreed with the weekly regional ice chart, except the lower right area where the proportions of NI and GI in the total ice area are both 50% according to the egg code and which dominant color could be NI or GI. Figure 12b clearly shows the influence of the scalloping effect on EW1 and the discontinuity effect between EW1 and EW2 on sea ice classification. These phenomena are greatly improved by increasing the sample region spanning EW1 and EW2 (Figure 12c). This proves that the cross-subswath principle we proposed can effectively alleviate the disturbance of residual noise to sea ice classification. Some sea ice research chooses to abandon the first subswath to avoid its influence [48,50], but our method provides a potential solution to this problem without losing any information.

Conclusions
This paper provides a labeled large sea ice dataset SI-STSAR-7 based on Sentinel-1 SAR images for deep learning-oriented sea ice classification research. To our best knowledge, this is the first work to publish a sea ice classification dataset of SAR images, which contains both spatial and temporal information to reflect the differences of backscattering intensity and textures between different sea ice types, as well as the growth and development characteristics of sea ice with time. The dataset is described in source data collection, SAR image preprocessing, and sample production. We applied three state-of-the-art machine learning and deep learning models to our spatiotemporal dataset for sea ice classification. This provides a benchmark for future research. By changing the time-step from 1 to 6, we proved the validity of the temporal information for sea ice classification and discussed the response speed of different sea ice types to the time-step. In addition, we also verified the effectiveness and necessity of our proposed sample-producing principles through experiments. The experimental results in this paper reveal (1) the process we designed for constructing SI-STSAR-7 is effective because the results of the baseline methods are reasonable; (2) high-precision sea ice classification can be achieved on SI-STSAR-7, evidenced by the baseline method's performance; (3) the proposed sample-producing principles can effectively alleviate the impact of thermal noise on sea ice classification. We hope that SI-STSAR-7 can promote the development of sea ice research based on sea ice spatial and temporal information and provide a reference for relevant researchers.
For data sharing, the spatiotemporal dataset for model training and testing (seven ice types with six time-step spatiotemporal data) and the spatiotemporal dataset for classification model performance evaluation (six consecutive SAR scenes) are available on the public link http://ieee-dataport.org/open-access/si-stsar-7 (accessed on 20 May 2021). As discussed in Section 3.4, the time-series data are more beneficial to thicker ice than thinner ice. To meet for different users' needs, this dataset can be used with any time-step between one and six. For example, if three time-steps are needed, you just simply remove the earliest three image patches from the sequence.
There are some limitations to this work. Firstly, although our seven types of sea ice dataset can support fine ice classification, it does not cover land and other ice types given by WMO. In the future, using other regions of SAR images, we can further expand the ice types of this dataset for wider use. Secondly, the data sample labeling was taken the CIS's ice chart as the ground truth. It is not completely accurate due to the time difference between the SAR image acquisition date and the weekly ice chart issue date. Moreover, the quality of our dataset is inevitably affected by the original SAR image quality and ice chart quality. Although the ice charts are quality controlled, it may still involve the subjective bias introduced when producing the ice chart by sea ice analysts. For this problem, we provided a quality specification in the "Readme" file to indicate the conditions of the data. Thirdly, the classification model trained with this sea ice dataset may lead to the dependence of the model on the development process of sea ice over time. This will limit the ability of the model to classify ice that has different development characteristics from the training dataset. For example, the ice changes in the melting period are very different from the freezing period. In our further research, more varied validation datasets are necessary to tackle the potential problems. Finally, different visual structures (e.g., stripes, eddies) on SAR images affected by wind and currents may lead to misclassification of sea ice. Without purposely eliminating these influences, our dataset contains weather-affected samples. For deep learning models, this kind of data is conducive to increasing the robustness of the model. Nonetheless, extra information of wind and currents adding as the input of the model may help to improve the classification accuracy of the model. Data Availability Statement: Sentinel-1 SAR data used in this study are available at Copernicus Open Access Center https://scihub.copernicus.eu (accessed on 20 May 2021). Weekly regional ice charts and weekly regional ice data are available at https://iceweb1.cis.ec.gc.ca/Archive/page1.xhtml (accessed on 20 May 2021). Our SI-STSAR-7 dataset is publicly available at http://ieee-dataport.org/ open-access/si-stsar-7 (accessed on 20 May 2021).