Lake Ice-Water Classification of RADARSAT-2 Images by Integrating IRGS Segmentation with Pixel-Based Random Forest Labeling

Changes to ice cover on lakes throughout the northern landscape have been established as an indicator of climate change and variability, and are expected to have implications for both human and environmental systems. Monitoring lake ice cover is also required to enable more reliable weather forecasting across lake-rich northern latitudes. Currently, the Canadian Ice Service (CIS) monitors lakes using synthetic aperture radar (SAR) and optical imagery through visual interpretation, with total lake ice cover reported weekly as a fraction out of ten. An automated method of classification would allow more detailed records to be delivered operationally. In this research, we present an automatic ice-mapping approach that integrates unsupervised segmentation from the Iterative Region Growing using Semantics (IRGS) algorithm with supervised random forest (RF) labeling. IRGS first segments homogeneous regions locally within an image, then merges similar regions into classes across the entire scene. In recent work, these output regions were labeled manually by the user to generate ice maps, or were labeled using a Support Vector Machine (SVM) classifier. Here, three labeling methods (Manual, SVM, and RF) are applied after IRGS segmentation to perform ice-water classification on 36 RADARSAT-2 scenes of Great Bear Lake (Canada). SVM and RF classifiers are also tested without integration with IRGS. An accuracy assessment was performed on the results, comparing outcomes with author-generated reference data as well as with the ice fraction reported by CIS. The IRGS-RF average classification accuracy for this dataset is 95.8%, demonstrating the potential of this automated method to provide detailed and reliable lake ice cover information operationally.


Introduction
Seasonal ice on lakes represents a significant component of the cryosphere and plays a role in many biological, ecological, and socio-economic processes [1]. A trend towards later freeze-up and earlier break-up dates on northern lakes since the middle of the last century is well documented and is predicted to continue [2][3][4]. This alteration to the state of lake ice cover is expected to have implications for both human and environmental systems, making its monitoring imperative in the face of climate change [5]. The inclusion of lakes and lake ice in weather forecasting and climate models has also been advocated, as many climate simulations do not account for the multitude of small lakes across Canada.
Simulations that account for these lakes have produced results that agree better with real-world observations [6]. The development and launch of many Earth-observing synthetic aperture radar (SAR) satellite systems allow detailed lake ice monitoring to take place, especially across large and inaccessible expanses. However, despite the proven importance of lake ice phenology in the context of climate change, such detailed records have yet to be produced operationally.
Currently, the Canadian Ice Service (CIS) monitors ice cover on a weekly basis for over 130 lakes across North America using a combination of SAR and optical imagery. These data are used in a numerical weather prediction system operated at Environment and Climate Change Canada [7]. Scenes are visually interpreted, and ice cover is reported weekly for each lake as a fraction out of ten [8,9]. More detailed information, including percent coverage, position, and extent of the lake ice, is not available operationally, as producing it in the existing manner would be time-consuming and therefore costly. If an automated method of classification were made operational, detailed records of ice extent on these lakes could be provided at high spatial and temporal resolution. The need for such a method has become crucial given the wealth of SAR data becoming available from new satellite missions such as Sentinel-1A/B and the RADARSAT Constellation Mission.
SAR systems are particularly well suited for sea ice and lake ice monitoring, as they are unaffected by cloud cover and can acquire images at night and during polar darkness. However, ice-water classification is complicated because the backscatter signatures of ice and water vary significantly within and across scenes: several signatures can be observed for each class depending on the age, type, or thickness of the ice, the presence of wind, and the incidence angle of the sensor [8]. This classification challenge has motivated a large body of work over the past three decades, mostly focusing on sea ice, with some applications to freshwater lakes and rivers. Many approaches have been presented, including gray-level thresholds [10,11], cluster analysis [12,13], watershed segmentation [14,15], maximum likelihood (ML) [16], neural networks (NN) [17], Support Vector Machines (SVM) [18][19][20], random forest (RF) [21,22], and others, with researchers often combining several methods and/or data sources into a multi-step workflow.
For example, the Advanced Reasoning using Knowledge for Typing of Sea ice (ARKTOS) system incorporated local thresholding, unsupervised clustering, and watershed merging for segmentation, as well as class labeling through a rule-based module [23]. The MAp-Guided Ice Classification (MAGIC) system [24] employed ice charts from CIS in addition to SAR data to produce pixel-based ice maps, and Kim et al. [25] mapped landfast sea ice in the Antarctic using seven satellite-derived products. An overview of data-based SAR sea ice classification is provided by Zakhvatkina et al. [26].
Random forest classification has recently proved successful in ice type and ice-water classification. The previously mentioned work by Kim et al. [25] tested decision tree (DT) and RF classification to map landfast sea ice and concluded that RF achieved better performance both visually and numerically. DT and RF models were also applied to classify melt ponds on ice in the Chukchi Sea in TerraSAR-X SAR imagery, similarly showing that the RF method outperformed DT [27]. Later, an RF model was used to produce a sea ice map of the Chukchi Sea from KOMPSAT-5 SAR imagery with 99.2% accuracy [21]. Six models, including convolutional neural network (CNN), Bayesian, SVM, and RF, were compared by Shen et al. [22] for sea ice type and open water classification using CryoSat-2 altimeter data. The authors found that RF combined with an optimal feature assembly resulted in a mean accuracy of 91.5%, a 9% improvement over the other methods.
Since ice and water can have similar backscatter signatures within or across SAR scenes, pixel intensity alone is not always sufficient for accurate classification. The use of Gray-Level Co-occurrence Matrices (GLCM) or other statistical methods to derive texture features has been investigated and shown to be effective. Backscatter thresholds for both the HH and HV bands were determined statistically from gray-level metrics and applied to discriminate between melting lake ice and open water in RADARSAT-2 SAR images of small, shallow lakes in the Old Crow Flats region, Yukon [28]. Zakhvatkina et al. [17] investigated which GLCM texture features were optimal for discriminating between sea ice types in ENVISAT ASAR imagery. The authors calculated nine texture features and used them to train a neural network classifier, which was tested on 20 images and achieved an average classification accuracy greater than 80%. Later, texture features were used with an SVM classification approach to distinguish sea ice and open water with an average accuracy of 91 ± 4% [20]. The Iterative Region Growing using Semantics (IRGS) algorithm was used with an SVM classifier trained on 28 GLCM features, resulting in 96.42% average accuracy [18]. Liu et al. [29] also implemented SVM for labeling after extracting GLCM features, with good results. GLCM features were chosen to train the classifiers in this research because of the strong results obtained in these studies.
The IRGS algorithm was developed to segment SAR ice imagery in an automated and reliable way. This technique incorporates 'high detail local' and 'large scale global' information and is explained further in Section 3 [18]. Both automated and manual labeling techniques have been employed in combination with IRGS for ice classification. Ochilov & Clausi [30] presented a novel unsupervised sea ice labeling technique that uses neighborhood information in a Markov random field framework. In this approach, no training samples were required, but an ice analyst provided metadata about each polygon to be used during labeling. IRGS segmentation was later employed with an SVM model to assign ice-water labels with high accuracy, as previously mentioned [18]. Li et al. [31] used some properties of IRGS in a semi-supervised approach for ice-water classification based on self-training. Most recently, IRGS segmentation was combined with manual ice-water labeling on 26 RADARSAT-2 scenes of Lake Erie, attaining 90.4% average accuracy [32].
Here, we present the "IRGS-RF" classification approach, which combines IRGS segmentation with supervised random forest labeling. This fully automated approach achieved 95.8% average accuracy in the classification of ice and water in 36 RADARSAT-2 SAR images of Great Bear Lake (Canada). The approach combines the IRGS segmentation methodology with an RF classification model that uses GLCM features. The efficacy of this method is demonstrated here and compared with four other methods. These include a second integrated approach that instead applies supervised SVM labeling with IRGS, similar to that presented by Leigh et al. [18] (hereafter "IRGS-SVM"); a semi-automated approach combining IRGS segmentation with manual labeling as described by Wang et al. [32] ("IRGS-Manual"); and pixel-based RF and SVM classification without IRGS ("RF" and "SVM", respectively). The accuracy of these five methods is tested against 14,400 randomly sampled reference pixels (400 per scene) across the 36 scenes.

Data
The images used in this study were captured over Great Bear Lake (GBL), shown in Figure 1. This large, deep lake is located within the Mackenzie River Basin, Northwest Territories, Canada. It spans 31,000 km² with a mean depth of 76 m and a maximum depth of 446 m. It is reported to be ice covered from November to July [9]. Mean monthly temperatures at Déline (located on the southwest shore) range from 13.3 to −25.2 °C, remaining below 0 °C from October to April.

Synthetic Aperture Radar Data
Thirty-six RADARSAT-2 scenes of GBL are used in this study, spanning three ice cover seasons from 2013 to 2016 as outlined in Table 1. All images are dual-polarized (HH and HV) and were acquired in ScanSAR Wide beam mode. Each image covers an area of approximately 500 km by 500 km with a nominal spatial resolution of 100 m. Each image is originally approximately 10,000 by 10,000 pixels with 50 m by 50 m pixel spacing; however, the images were downsampled using a 4 × 4 block average (giving 200 m pixel spacing) to reduce computation time. Images were acquired in both ascending and descending passes, with incidence angles ranging from 20 to 49 degrees [33]. Land was excluded from segmentation using a vector-based mask with a 250 m buffer between the lake and its shorelines, to minimize the inclusion of land pixels in classification results. Only scenes that included 70% or more of the lake were used, so that classification outcomes could be compared with the weekly fraction reported by CIS. The chosen image set is a suitable sample for this study because it offers a range of backscatter signatures and incidence angles for ice and water at various points throughout the freeze-up and break-up processes.
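A 4 × 4 block average of the kind described above can be sketched with NumPy as follows. This is an illustration only: the function name `block_average` is hypothetical, and we assume the averaging is applied to linear (not dB) intensities, which the text does not specify. With 50 m input spacing, a factor of 4 gives 200 m pixels, so a 10,000 × 10,000 scene becomes 2,500 × 2,500.

```python
import numpy as np


def block_average(image: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks.

    Rows and columns that do not fill a complete block are trimmed off.
    """
    rows = (image.shape[0] // factor) * factor
    cols = (image.shape[1] // factor) * factor
    trimmed = image[:rows, :cols]
    # After this reshape, axes 1 and 3 index the pixels inside each block.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Small stand-in for one intensity band (a real scene is ~10,000 x 10,000).
band = np.arange(16, dtype=float).reshape(4, 4)
print(block_average(band, factor=2))  # block means 2.5, 4.5, 10.5, 12.5
```

Averaging blocks also reduces speckle somewhat, at the cost of spatial detail; the factor trades computation time against resolution.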

Reference Data
To assess the accuracy of the classification outcomes, reference data were generated for comparison. A random sample of 400 pixels within the lake was selected for each scene, and each pixel was labeled as either ice or water based on visual interpretation of morphology, texture, and backscatter. These data were created in the MAp-Guided Ice Classification (MAGIC) system [24], further detailed in Appendix A. A CIS ice analyst provided training to the authors and advised using an RGB composite of the SAR bands (HH/HH/HV in the R/G/B channels) to help discriminate between classes. MODIS optical imagery was also consulted during this process to make the reference data as accurate as possible.
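The random selection of reference pixels can be sketched as below, assuming the 400 pixels are drawn uniformly and without replacement from the land-masked lake area (the text states only that the sample is random). `sample_reference_pixels` is a hypothetical helper; the ice or water label of each pixel still comes from visual interpretation in MAGIC.

```python
import numpy as np


def sample_reference_pixels(lake_mask: np.ndarray, n: int = 400, seed=None):
    """Return (rows, cols) of n distinct pixels drawn uniformly from the lake mask."""
    rows, cols = np.nonzero(lake_mask)                      # coordinates of all lake pixels
    rng = np.random.default_rng(seed)
    picked = rng.choice(rows.size, size=n, replace=False)   # no pixel is drawn twice
    return rows[picked], cols[picked]


# Toy lake mask: a 30 x 40 rectangle of lake inside a 50 x 50 scene.
mask = np.zeros((50, 50), dtype=bool)
mask[10:40, 5:45] = True
r, c = sample_reference_pixels(mask, n=400, seed=0)
print(r.size)  # 400
```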
Weekly ice concentration fractions are recorded by CIS for two sections (north and south) of GBL. In this work, the north and south fractions were averaged to simplify reporting. These averaged fractions are compared with the total ice concentration produced by the tested classification methods.
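Comparing a classified map with the CIS report requires expressing both in tenths. The sketch below assumes labels coded 0 = ice and 1 = water (as in this study), an ice fraction computed over all unmasked lake pixels, and a simple unweighted mean of the north and south CIS values; an area-weighted mean would differ if the two sections are unequal in size. Both helper names are hypothetical.

```python
import numpy as np

ICE = 0  # label coding used in this study: 0 = ice, 1 = water


def ice_fraction_tenths(labels: np.ndarray, lake_mask: np.ndarray) -> float:
    """Share of lake pixels labeled ice, expressed in tenths (0-10)."""
    lake_labels = labels[lake_mask]
    return 10.0 * np.count_nonzero(lake_labels == ICE) / lake_labels.size


def cis_mean_tenths(north: float, south: float) -> float:
    """Unweighted average of the CIS north and south ice fractions (tenths)."""
    return (north + south) / 2.0


labels = np.ones((10, 10), dtype=int)
labels[:7, :] = ICE                       # 70 of 100 lake pixels are ice
mask = np.ones((10, 10), dtype=bool)
print(ice_fraction_tenths(labels, mask), cis_mean_tenths(8, 6))  # 7.0 7.0
```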

IRGS Segmentation
The IRGS algorithm is specifically tailored to the segmentation challenges of SAR scenes of ice and water. The steps involved in IRGS segmentation are shown in Figure 2 and described by Leigh et al. [18]. The MAGIC system detailed in Appendix A is used to implement IRGS in this study. When IRGS is employed, the lake is first divided by a watershed algorithm into several sub-regions called 'autopolygons', which follow the natural structure of the image [24,34]. This step (shown in Figure 2c) uses only the HV image, because HV shows less backscatter variation due to strong wind roughness or incidence angle effects [32]. Creating autopolygons reduces the errors these effects cause across the image, because each autopolygon is segmented individually in the following steps, removing the need for incidence angle normalization. Within each autopolygon, small uniform regions are again delineated using a watershed algorithm applied to both SAR bands [18]. Each region is then represented by a node in a region adjacency graph and assigned an initial label. The subsequent segmentation is an iterative process of merging and clustering regions toward an optimal configuration with fewer nodes [35].
During this process, both the edge strength between adjacent regions and neighborhood information are considered, which increases segmentation accuracy [14]. This process ultimately divides each autopolygon into homogeneous regions of ice or water. Once it is complete, an image-wide 'gluing' step is performed, in which similar regions from any of the autopolygons are merged into a user-defined number of final classes (shown in Figure 2d) [18]. This is called the 'glocal' approach, as it incorporates 'high detail local' and 'large scale global' information [18]. These final classes can then be labeled manually by the user or with an automated technique; both options are described in the following sections.
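The region adjacency graph described above can be illustrated with a small NumPy sketch that finds which segments touch (4-connectivity) and attaches a weight to each adjacent pair. The weight used here, the mean absolute intensity difference across the shared boundary, is a hypothetical proxy for edge strength; the actual IRGS edge penalty and its iterative merging are not reproduced.

```python
from collections import defaultdict

import numpy as np


def region_adjacency_graph(segments: np.ndarray, image: np.ndarray) -> dict:
    """Build a region adjacency graph from a label image.

    Returns {(a, b): strength} for every pair of 4-connected regions a < b, where
    strength is the mean absolute intensity difference across their shared
    boundary (a simple proxy, not the IRGS edge term).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    neighbour_pairs = (
        (segments[:, :-1], segments[:, 1:], image[:, :-1], image[:, 1:]),  # left/right
        (segments[:-1, :], segments[1:, :], image[:-1, :], image[1:, :]),  # up/down
    )
    for seg_a, seg_b, img_a, img_b in neighbour_pairs:
        boundary = seg_a != seg_b                 # pixel pairs straddling two regions
        a, b = seg_a[boundary], seg_b[boundary]
        diff = np.abs(img_a[boundary] - img_b[boundary])
        for u, v, d in zip(np.minimum(a, b).tolist(), np.maximum(a, b).tolist(), diff.tolist()):
            sums[(u, v)] += d
            counts[(u, v)] += 1
    return {edge: sums[edge] / counts[edge] for edge in sums}


segments = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 1]])
image = segments * 10.0                           # toy intensities: each region is flat
print(region_adjacency_graph(segments, image))
# {(0, 1): 10.0, (1, 2): 10.0, (0, 2): 20.0}
```

Each key is an edge between two nodes; a merging scheme would repeatedly combine the pair with the weakest boundary and rebuild the affected edges.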

IRGS-Manual Semi-Automated Classification
For manual classification, the IRGS method described above was implemented in MAGIC to segment the image into multiple classes. Each autopolygon was segmented into five internal classes based on both the HH and HV bands, and these were then glued into 12 arbitrary final classes across the entire scene. To complete the binary classification, these classes were manually labeled as ice or water based on visual interpretation. The 12 segmentation classes do not represent specific ice types; rather, this number of classes is needed to prevent areas of ice and water from being merged. Preliminary trial-and-error testing was conducted to select the most suitable parameters for this process, balancing the homogeneity of the final classes against the time needed to label them. Gluing to 12 final classes adequately separated areas of ice and water while remaining relatively quick to label. Wang et al. [32] applied this method for ice-water classification using eight classes in the gluing step; those authors also arrived at optimal parameters for their study experimentally.
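The gluing and manual-labeling steps can be illustrated with the toy sketch below. It substitutes plain k-means clustering of per-region mean features for the actual IRGS gluing criterion, and both the `glue_regions` helper and the dictionary standing in for the analyst's decisions are hypothetical. Setting `n_classes=12` would mirror the 12 final classes used here; the example uses 2 so the result is easy to check.

```python
import numpy as np


def glue_regions(region_features: np.ndarray, n_classes: int = 12,
                 n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Group per-region feature vectors into n_classes scene-wide classes.

    Plain k-means (Lloyd's algorithm) stands in for the IRGS gluing criterion.
    region_features: (n_regions, n_features), e.g. mean HH and mean HV per region.
    Returns the class index of every region.
    """
    feats = np.asarray(region_features, dtype=float)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=n_classes, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        for k in range(n_classes):
            members = feats[assignment == k]
            if len(members):                      # empty classes keep their old center
                centers[k] = members.mean(axis=0)
    dists = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Toy regions: two well-separated groups of mean (HH, HV) values.
regions = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
classes = glue_regions(regions, n_classes=2)
# The analyst labels each glued class once; every region inherits its class label.
analyst_labels = {classes[0]: "water", classes[3]: "ice"}
region_labels = [analyst_labels[c] for c in classes]
```

The point of gluing is visible in the last two lines: the analyst makes one decision per class rather than one per region, which is also why a class that mixes ice and water regions cannot be labeled correctly.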

Features
Multiple studies (as mentioned in Section 1) have demonstrated the value of GLCM features for improving classification results. In this work, the following GLCM measures were used:
• Contrast Group: contrast (CON), dissimilarity (DIS), homogeneity (HOM)
• Orderliness Group: angular second moment (ASM), entropy (ENT), inverse moment (INV)
• Statistics Group: mean (MU), standard deviation (STD), correlation (COR)
The window size and step size of the sliding window are also important parameters for GLCM features. Window size determines the perceptive area for textural features: for example, open water areas are more easily captured by larger windows, while smaller windows perform well at detecting small individual ice floes. The step size of the GLCM reflects the scale of repeating patterns. The parameters used in this paper are shown in Table 2. All features were extracted from both the HH and HV images, resulting in 162 GLCM features in total. Pixel intensity, and the local average and maximum pixel intensities in window sizes of 5 × 5 and 25 × 25, were also added to the feature pool. Recursive feature elimination with cross-validation [36] was adopted in this work to select the best feature combination, reducing computation time and avoiding overfitting. The initial feature set is recursively pruned to eliminate the least important features, and this procedure is repeated under a cross-validation strategy until the best classification result is reached. In this work, feature selection was run 36 times, once per scene, using a leave-one-out (LOO) cross-validation scheme to obtain the final feature ranking. In each loop, the feature selection estimator was trained on 35 scenes and tested on the remaining scene to determine feature importance. After all 36 scenes had been cross-validated, the importance rankings of each feature from every loop were summed to give the final feature rank. The 31 features selected by this process are listed in Table 3.
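The GLCM measures listed above can be computed for a single window as in the following NumPy sketch. Several details are assumptions rather than the paper's settings: the quantization scheme and number of gray levels, our reading of the "step size" as the co-occurrence displacement, the use of a symmetric GLCM, and the exact formulas (common Haralick-style forms; HOM here uses 1/(1 + (i − j)²)). INV is omitted because its definition is not given in the text. In practice each measure would be evaluated in a sliding window over the HH and HV bands for every window/step pair in Table 2.

```python
import numpy as np


def quantize(window: np.ndarray, levels: int, lo: float, hi: float) -> np.ndarray:
    """Linearly map intensities in [lo, hi] to integer gray levels 0..levels-1."""
    scaled = (window - lo) / (hi - lo) * levels
    return np.clip(scaled.astype(int), 0, levels - 1)


def glcm(window: np.ndarray, levels: int, offset=(0, 1)) -> np.ndarray:
    """Symmetric, normalised co-occurrence matrix for a non-negative (row, col) offset."""
    dr, dc = offset
    rows, cols = window.shape
    ref = window[:rows - dr, :cols - dc]          # reference pixels
    nbr = window[dr:, dc:]                        # neighbours displaced by the offset
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1)   # count each gray-level pair
    P = P + P.T                                   # symmetric: count both directions
    return P / P.sum()


def glcm_features(P: np.ndarray) -> dict:
    """Haralick-style texture measures; definitions vary slightly between sources."""
    i, j = np.indices(P.shape)
    mu = np.sum(i * P)                            # equals sum(j * P) since P is symmetric
    std = np.sqrt(np.sum(P * (i - mu) ** 2))
    nz = P[P > 0]
    return {
        "CON": np.sum(P * (i - j) ** 2),
        "DIS": np.sum(P * np.abs(i - j)),
        "HOM": np.sum(P / (1.0 + (i - j) ** 2)),
        "ASM": np.sum(P ** 2),
        "ENT": -np.sum(nz * np.log(nz)),
        "MU": mu,
        "STD": std,
        # Correlation is undefined for a constant window; 0.0 is returned by convention here.
        "COR": np.sum(P * (i - mu) * (j - mu)) / std ** 2 if std > 0 else 0.0,
    }


# Checkerboard window: every horizontal neighbour pair differs by one gray level.
checker = np.indices((4, 4)).sum(axis=0) % 2
feats = glcm_features(glcm(checker, levels=2, offset=(0, 1)))
print(round(feats["CON"], 3), round(feats["COR"], 3))  # 1.0 -1.0
```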

Classifiers
Support Vector Machine
A support vector machine (SVM) is a supervised classification method for binary problems. It computes a linear decision hyperplane in a high-dimensional feature space that maximizes the margin between classes, using only the subset of training samples near the decision boundary, called the support vectors. The label of a test sample is determined by the SVM decision function:
$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right)$$
where $y_i$ are the class labels of the support vectors (written as $\pm 1$ in the decision function; in this study ice is stored as 0 and water as 1), $\alpha_i$ are weights learned in training, $K(\cdot,\cdot)$ is a kernel function, $\mathbf{x}_i$ are the support vectors, $b$ is a bias term, and $\mathbf{x}$ is the sample to be classified. Different kernels suit different tasks, such as linear, polynomial, radial basis function (RBF), and sigmoid kernels. Nonlinear kernels are widely used for classification because they implicitly map features into a higher-dimensional space in which nonlinear problems become separable. The kernel used in this study is the RBF kernel,
$$K(\mathbf{t}_i, \mathbf{t}_j) = \exp\left(-\gamma \lVert \mathbf{t}_i - \mathbf{t}_j \rVert^2\right)$$
where $\gamma$ is a Gaussian scaling parameter and $\lVert \mathbf{t}_i - \mathbf{t}_j \rVert^2$ is the squared Euclidean distance (SED) between two data points $\mathbf{t}_i$ and $\mathbf{t}_j$. The RBF measures the similarity between data samples and projects them into a new feature space for better classification performance. Because the hyperplane is determined by relatively few support vectors, SVM limits both overfitting and computational cost. Moreover, SVM has been reported to provide more robust results than other conventional classifiers [37].
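The decision function and RBF kernel above can be evaluated directly with NumPy. In this sketch the support vectors, weights (alpha), bias (b), and gamma are made-up values rather than a trained model, and the mapping of −1 to ice and +1 to water is our assumption; a real workflow would obtain these quantities from an SVM library's training routine.

```python
import numpy as np


def rbf_kernel(t_i: np.ndarray, t_j: np.ndarray, gamma: float) -> np.ndarray:
    """K(t_i, t_j) = exp(-gamma * ||t_i - t_j||^2), broadcasting over leading axes."""
    return np.exp(-gamma * np.sum((t_i - t_j) ** 2, axis=-1))


def svm_predict(x, support_vectors, alpha, y, b, gamma):
    """Sign of sum_i alpha_i * y_i * K(x_i, x) + b, mapped to 0 = ice, 1 = water.

    y holds the support-vector labels in {-1, +1} (-1 = ice, +1 = water here);
    a score of exactly zero is assigned to ice.
    """
    k = rbf_kernel(support_vectors, x, gamma)     # one kernel value per support vector
    score = np.dot(alpha * y, k) + b
    return 1 if score > 0 else 0


# Hypothetical model with two support vectors in a 2-D feature space.
sv = np.array([[0.0, 0.0], [4.0, 4.0]])
alpha = np.array([1.0, 1.0])
y = np.array([-1.0, 1.0])
print(svm_predict(np.array([0.1, 0.0]), sv, alpha, y, b=0.0, gamma=0.5))  # 0 (ice)
print(svm_predict(np.array([3.9, 4.0]), sv, alpha, y, b=0.0, gamma=0.5))  # 1 (water)
```

A larger gamma makes each support vector's influence more local, which is one of the parameters that must be tuned for SVM to perform well.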
Random Forest
RF classifiers consist of many classification and regression trees. The training data for each decision tree are bootstrap sampled (drawn with replacement) from the whole data set, and at each node of a tree only a random subset of the features is considered for the split. This randomization helps to suppress overfitting. After all the decision trees have been trained, each tree produces a classification label that counts as one vote, and the final label is determined by majority voting. This procedure increases the robustness of RF and reduces running time, as the decision trees can be trained in parallel. Unlike SVM, RF does not strictly require prior feature selection, since a Gini index is used to assign an importance to each feature. Because of these properties, RF is easy to implement and has good generalization ability [38].
In this paper, RF is selected for three reasons. First, it is less affected by outliers and noise in a data set, which is of great importance for SAR image interpretation, as SAR images are degraded by speckle noise. Second, RF can handle many input features and is relatively resistant to overfitting. This is necessary for this work, as nearly two hundred features were calculated. Finally, RF can determine the importance of each input feature by measuring the degradation in classification accuracy when one input feature is randomly permuted while the rest are kept constant.
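RF's permutation-based importance and the scene-wise leave-one-out ranking described in the Features section can be combined in a short scikit-learn sketch. It is a simplified stand-in, not the authors' configuration: instead of recursive feature elimination with cross-validation, each fold trains a `RandomForestClassifier` on all scenes but one, scores every feature with `permutation_importance` on the held-out scene, and the per-fold ranks are summed. The function name, number of trees, and `n_repeats` are assumptions, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import LeaveOneGroupOut


def scene_wise_feature_ranking(X, y, scene_ids, n_trees=100, seed=0):
    """Return feature indices ordered from most to least important.

    For every held-out scene, an RF is trained on the remaining scenes and each
    feature is scored by the drop in held-out accuracy when that feature is
    randomly permuted. Per-fold ranks (0 = most important) are summed.
    """
    summed_rank = np.zeros(X.shape[1])
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=scene_ids):
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(X[train_idx], y[train_idx])
        result = permutation_importance(rf, X[test_idx], y[test_idx],
                                        n_repeats=5, random_state=seed)
        fold_rank = np.argsort(np.argsort(-result.importances_mean))
        summed_rank += fold_rank
    return np.argsort(summed_rank, kind="stable")


# Synthetic data: 4 "scenes" x 200 pixels; only feature 0 carries the ice/water signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(800, 3))
y = (X[:, 0] > 0).astype(int)
scenes = np.repeat(np.arange(4), 200)
print(scene_wise_feature_ranking(X, y, scenes, n_trees=30))  # feature 0 ranked first
```

A fitted forest also exposes Gini-based importances through `rf.feature_importances_`; permutation importance on held-out scenes is used here because it matches the accuracy-degradation description above and is measured on data the forest has not seen.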

Integration of Segmentation and Labeling
For the SVM and RF methods, each pixel was assigned an ice or water label. For IRGS-SVM and the proposed IRGS-RF method, class labels were assigned by region, based on the IRGS segmentation output. A flow chart of the proposed IRGS-RF automated ice-water classification method is shown in Figure 3. The inputs are the HH and HV images, the land mask, and a pre-trained classifier. The classifier is trained on the features selected in Table 3, and the same features are extracted from the HH and HV images for supervised labeling. Because no manual labeling is required, the segments do not need to be glued into a small number of final classes, so the IRGS output contains hundreds of simply connected homogeneous regions. To assign labels, the pixel-wise classifier is first used to generate a 'rough' classification within each segment. Inspired by the mechanism of RF, majority voting is then used to label each segmented region: the dominant class in the pixel-wise classification result is assigned to the whole segment. Training and testing for the labeling step also use a LOO scheme similar to the one described in Section 3.3.1, in which models are trained on 35 scenes and tested on the single remaining scene. This process is repeated until all scenes have been evaluated.
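Segment-wise majority voting, as used to turn the pixel-wise result into IRGS-RF (or IRGS-SVM) labels, can be sketched with NumPy. The sketch assumes non-negative integer segment IDs and labels coded 0 = ice and 1 = water, and it resolves ties toward ice; the tie-breaking rule is our assumption, since the text does not specify one. The function name is hypothetical.

```python
import numpy as np


def label_segments_by_majority(pixel_labels, segments, lake_mask, n_classes=2):
    """Assign each segment the class that most of its lake pixels received.

    pixel_labels: int array from the pixel-wise classifier (0 = ice, 1 = water).
    segments: int array of segment IDs (>= 0) with the same shape.
    Pixels outside lake_mask are returned unchanged.
    """
    seg = segments[lake_mask]
    lab = pixel_labels[lake_mask]
    votes = np.zeros((segments.max() + 1, n_classes), dtype=np.int64)
    np.add.at(votes, (seg, lab), 1)               # votes[s, c] = pixels of class c in s
    winner = votes.argmax(axis=1)                 # ties go to the lower index (ice)
    out = pixel_labels.copy()
    out[lake_mask] = winner[seg]                  # broadcast each winner to its pixels
    return out


segments = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 1]])
pixels = np.array([[0, 1, 1], [0, 0, 0], [1, 1, 1]])
mask = np.ones_like(segments, dtype=bool)
print(label_segments_by_majority(pixels, segments, mask))
# segment 0 -> ice (3 of 4 votes); segments 1 and 2 -> water
```

Isolated misclassified pixels inside a segment are overruled by the majority, which is how the segmentation suppresses the noise-like errors of pixel-wise labeling.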

Results
In this work, five classification methods were tested on 36 RADARSAT-2 scenes of GBL and validated against 400 reference pixels per scene. The IRGS-Manual, RF, and IRGS-RF methods performed very well overall, each with an average accuracy over 90%. The IRGS-RF approach had the highest agreement with the reference data, with an average accuracy of 95.8%. For this method, the highest accuracy for a single scene was 100% and the lowest was 85%. Table 4 details the classification accuracy for each scene and method. The SVM methods performed moderately well overall, with average accuracies of 74.7% and 81.1% for the SVM and IRGS-SVM approaches, respectively. However, visual inspection reveals that the pixel-based SVM method often produced noisy results that did not accurately represent ice and water. In some cases, both SVM methods performed very poorly, erroneously classifying large swaths of open water as ice, with accuracy as low as 3.8%. Most of the overall error in the SVM and IRGS-SVM results is water error, at 19.0% and 16.7%, respectively, and examples of this phenomenon are found during both the freeze-up and break-up periods. In the scenes from 6 November 2014, 1 July 2015, and 30 June 2016, these two methods labeled most of the lake as ice covered when the opposite was true, resulting in accuracies of 33.5% or less in these three cases.
Ice error and water error for all tested methods are also reported in Table 4. The IRGS-Manual method has slightly more water error (3.7%) than ice error (2.6%), meaning that this method more often labeled open water as ice and thus overestimated the amount of ice present on the lake. The IRGS-RF method has nearly equal amounts of ice error and water error, at 1.9% and 2.2%, respectively.
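The accuracy and error figures can be computed per scene from the 400 reference labels as sketched below. Two points are our reading of the text rather than stated definitions: water error counts reference-water pixels labeled as ice (and ice error the reverse), and both are expressed as a share of all reference pixels. With that convention accuracy, ice error, and water error sum to 100% for each scene, consistent with the averages reported above (e.g. 95.8% + 1.9% + 2.2% ≈ 100% after rounding). The function name is hypothetical.

```python
import numpy as np

ICE, WATER = 0, 1  # label coding used in this study


def accuracy_and_errors(predicted, reference):
    """Return (accuracy, ice error, water error) in percent of all reference pixels.

    Ice error: reference ice labeled as water.
    Water error: reference water labeled as ice.
    """
    predicted = np.asarray(predicted)
    reference = np.asarray(reference)
    n = reference.size
    accuracy = 100.0 * np.count_nonzero(predicted == reference) / n
    ice_error = 100.0 * np.count_nonzero((reference == ICE) & (predicted == WATER)) / n
    water_error = 100.0 * np.count_nonzero((reference == WATER) & (predicted == ICE)) / n
    return accuracy, ice_error, water_error


reference = [0] * 6 + [1] * 4
predicted = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]   # one ice pixel and two water pixels wrong
print(accuracy_and_errors(predicted, reference))  # (70.0, 10.0, 20.0)
```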
Box plots showing the spread of accuracy values for the tested methods, by period and overall, are included in Figure 4. The low minimum accuracies of the SVM and IRGS-SVM methods appear clearly as outliers. The scene from 1 July 2015 is the most extreme case, with accuracies of 10.3% and 3.8%, respectively. Two outliers also exist in the RF results, from 18 and 25 May 2016, where accuracy is 72.3% and 70.0%, respectively. The IRGS-Manual approach performed better during the ice break-up period, with an average accuracy of 97.0% compared with 91.2% during freeze-up. This is likely due to the higher backscatter contrast between classes that is visible during that time. The IRGS-RF approach outperformed the other methods tested and had a consistently high level of agreement with the reference data regardless of period, with only about one percentage point of difference between freeze-up and break-up accuracy. Examples of classification outcomes for all tested methods are shown in Figures 5-7 and 9 and are discussed in the following section.

Discussion
The proposed IRGS-RF method generally performed better than the other methods, according to both the numerical results and visual inspection. The scene of 27 November 2014, shown in Figure 5, is a good example of how the tested methods perform during freeze-up, when multiple signatures of ice and water are present. Figure 5a shows a complex HH-polarized scene of GBL. Three main areas of open water are present, each with similarly high backscatter and visible texture. In this case, the IRGS-RF result has 98.5% agreement with the reference data, adequately capturing the three main areas of open water. The pixel-based RF method also performed well, but produced small ice errors on sections of black ice in the northwest arm. The IRGS-Manual approach performed below its average for this scene, with 86% accuracy. A common flaw in this method arises during the gluing step, when homogeneous segments from each autopolygon are merged across the entire scene to create the final classes. These segments should remain homogeneous after gluing; however, ice and water segments from the autopolygons sometimes become merged. The gluing step is necessary to ease manual classification, as the user only needs to label a handful of classes instead of the hundreds of segments present before this step. This flaw often forces the user to choose a single label for a class they know to be heterogeneous, based on which label accounts for more of it, as is the case in Figure 5b and several other freeze-up scenes; this likely contributes to the lower average freeze-up accuracy of this method. An advantage of the automated IRGS-SVM and IRGS-RF methods is that gluing into a small number of final classes is not needed, and each segment within and across all autopolygons can be labeled individually, since time-consuming manual labeling is no longer required.
An example from the ice break-up period, acquired on 25 May 2016, is shown in Figure 6, where a small amount of open water has begun to appear in the southern arm of GBL. Most of the lake has a solid ice cover with relatively low backscatter compared with earlier images, likely resulting from internal ice melt, melt water ponds, or melting snow cover reducing the reflected radar signal [39]. A narrow range of backscatter values can still be observed, with ice in the southern arms of the lake appearing brighter than that in the central basin. Some unique ice textures are also present. IRGS-Manual performs near perfectly (as the user can simply label all segments as ice) but misses the small area of open water. Conversely, all the automated methods produce ice error in multiple regions.
Although SVM has proven to be a powerful method for binary classification, it has some limitations. First, it requires several well-tuned key parameters to achieve a satisfactory result. Second, SVM projects features into a high-dimensional space, which demands significant computational power and may lead to overfitting. Figures 5c and 6c display noticeable noise-like errors typical of many classification results from the pixel-based SVM method. This method failed to properly label ice and water in several cases, including the scenes from 6 November 2014, 1 July 2015, and 30 June 2016, where agreement with the reference data was as low as 10.3%. IRGS-SVM also failed in these cases, because IRGS-SVM labeling uses the SVM result for segment-wise majority voting. These issues are not common in the pixel-based RF or IRGS-RF results. Bootstrap aggregating, which is the principal idea of RF, suppresses overfitting by training each tree on a random subset of the samples and features. In addition, RF does not require feature selection, as it determines the importance of each feature [36]. Despite these advantages, the RF and IRGS-RF methods both perform relatively poorly on the 25 May 2016 scene (Figure 6e,f) compared with their other results, although both showed better overall performance than the SVM methods.
Although the RF and IRGS-RF results shown in Figure 6 were poorer than average, integration with IRGS improved the classification, raising accuracy from 70.0% for RF to 88.3% for IRGS-RF. Surface melt conditions may have contributed to the high amount of ice error in this case. In other scenes from the spring break-up, the IRGS-RF method classified total ice cover with perfect or near-perfect accuracy (for example, scenes 20150528_142923, 20150606_012716, and 20160518_144544, the last of which is shown in Figure 7f).
An automated classification method must be able to characterize key events in the ice phenology cycle, including full ice cover, the onset of melt, and the date when the lake is clear of ice, as these metrics are commonly used to quantify ice cover change in long-term studies [2,3]. Figure 7 shows the RF and IRGS-RF results for 10 July 2014 and 18 May 2016, when the lake is fully open water and completely ice covered, respectively. In Figure 7a, the surface appears dark with some slightly brighter patches caused by wind roughness. Both the pixel-based RF and IRGS-RF methods perform well, at 99.2% and 99.7%, respectively, indicating that they are robust enough to characterize open water when wind-induced surface roughness is present. Figure 7b shows GBL totally ice covered, with a narrow range of backscatter characteristics similar to those seen in Figure 6a from later that month. Pixel-based RF achieved 72.3% classification accuracy, with ice error present in much of the southern and eastern basins. In the IRGS-RF result, most of these errors are removed and accuracy improves to 95.5%. The classification improvement gained by integrating IRGS with RF in scenes with high ice cover is further visualized in Figure 8.
The IRGS-RF results from 6 June 2015 and from the previously mentioned scenes of 18 and 25 May 2016 all follow the ice fraction reported by CIS more closely than the RF results. Examples are given in Figures 7e,f and 9e,f, where the ice cover fraction from IRGS-RF classification is displayed along with the fraction reported by CIS, showing that the proposed method adequately captures ice cover and is comparable with the CIS fraction. In general, adding IRGS segmentation before RF labeling improved the accuracy of the results: average accuracy increased by 2.5 percentage points, from 93.3% for RF to 95.8% for IRGS-RF. Although this is a small numerical improvement, the value of this addition is not fully captured by numerical results alone. The IRGS step refines ice-water boundaries and suppresses both ice error and water error. This improvement is demonstrated in Figures 5f-7f and is confirmed by the examples shown in Figure 9. In Figure 9c, from 6 June 2015, the pixel-based RF has erroneously labeled regions in the northwest arm as open water. By exploiting the spatial context of the IRGS segments, IRGS-RF labeled the lake consistently, and overall accuracy improved to 99.5%. For the scene acquired on 2 December 2015 (Figure 9b), the numerical improvement from RF to IRGS-RF is only 0.8 percentage points, but the improvement is more noticeable when the two results are compared visually. Red circles in Figure 9d show regions of ice error that were eliminated in the IRGS-RF result. In green circle 1, the pixel-based RF result has lost detail at the ice-water boundary, where both classes have similar signatures; this detail is preserved in the IRGS-RF result. Similarly, the ice error shown in green circle 2 is refined by the proposed method. Based on these improvements, integrating IRGS segmentation with RF is valuable for minimizing classification error, especially considering that this step adds less than one minute to computation time.

Conclusions
In this work, we propose a robust, automated IRGS-RF method to classify lake ice and open water. The method combines unsupervised IRGS segmentation with supervised pixel-wise RF labeling, both of which are state-of-the-art methods in remote sensing. Five methods, including the proposed approach, were tested on 36 RADARSAT-2 scenes of Great Bear Lake from 2013 to 2016. The results were validated against a reference dataset created through human interpretation, which is currently the most common operational approach to ice-water classification. Results from the proposed method were also compared with the ice fraction reported by CIS at the time.
Among the pixel-wise classification algorithms, RF performed much better than SVM, with overall accuracies of 93.3% and 74.7%, respectively. When combined with the spatial context provided by IRGS segmentation, both methods increased in accuracy. The proposed IRGS-RF method achieved an overall accuracy of 95.8%, a slight improvement over the previously tested IRGS-Manual approach (93.8%). IRGS-RF performed reliably across both the freeze-up and break-up periods, making it a robust classification tool for operational use.
The value of integrating IRGS segmentation with RF labeling is not captured by the numerical results alone but is also evident through visual interpretation: ice-water boundaries are refined, and noise-like errors are suppressed. Although the proposed method achieved good results, some limitations exist. First, processing time is longer than that of human interpretation because of GLCM feature extraction. Second, some ice error and water error remain, especially in early spring. Nevertheless, IRGS-RF shows promise as an automated means of accurately processing SAR data for operational lake ice monitoring. RF labeling minimizes the need for visual interpretation, an essential advance for processing the vast amount of imagery becoming available from recently launched SAR missions, including Sentinel-1A/B and the RADARSAT Constellation Mission (RCM).
It has been noted that supervised classification algorithms often perform well only in the specific geographic regions represented in their training data and thus cannot be widely applied without regional validation [32]. Future work should test the IRGS-RF method on a larger dataset including multiple lakes at several latitudes and locations, training and testing the model under a variety of scenarios. This will further test the robustness of the method for operational use.

Figure 1. Location of Great Bear Lake within Canada. Footprints of the 36 RADARSAT-2 scenes used in this work are shown in yellow.

Figure 2. Steps of the IRGS segmentation approach for the scene from 2 December 2015: (a) HH polarization SAR image; (b) HV polarization SAR image with brightness increased by 75%; (c) autopolygons, i.e., the HV image after autopolygon segmentation; (d) segmentation, with all segments 'glued' into the final chosen number of classes.

Figure 3. Flowchart of the ice-water classification system. Inputs are the images, a landmask, and a trained classifier (SVM or RF). The left block is unsupervised segmentation using IRGS, while the right block is supervised pixel-based labeling. The final classification result combines segmentation and labeling by majority voting.

Figure 4. Box-and-whisker plots showing the distribution of classification accuracy for each method over all tested scenes (36) and by period (20 freeze-up and 16 break-up scenes). The mean is marked by "x". Outliers, defined as values more than 1.5 times the interquartile range below the first quartile, are shown as dots.
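The outlier rule in the Figure 4 caption is Tukey's 1.5 × IQR rule, which is also the default whisker length (`whis=1.5`) in matplotlib's `boxplot`. The sketch below flags low outliers in a list of per-scene accuracies. The accuracy values are made up for illustration, and quartile conventions differ between tools, so the plotting software used for Figure 4 may place the fence slightly differently.

```python
import numpy as np


def lower_outliers(values, whis=1.5):
    """Return values lying more than whis * IQR below the first quartile."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])  # NumPy's default linear interpolation
    fence = q1 - whis * (q3 - q1)
    return v[v < fence]


# Illustrative per-scene accuracies in percent (not the paper's data).
accuracies = [90, 92, 94, 95, 96, 97, 98, 60]
print(lower_outliers(accuracies))
```

Only low outliers are computed because accuracy is capped at 100%, so values far above the third quartile cannot occur in practice.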

Figure 7. Ice-water classification of scenes where GBL is fully open water and fully ice covered. Ice is shown in yellow and water in blue. Left: classification results of 10 July 2014, with full open water, from top to bottom: (a) HH image, (c) RF pixel-wise classification with accuracy of 99.2%, and (e) IRGS-RF, i.e., IRGS segmentation labeled by RF, with accuracy of 99.7%. Right: classification results of 18 May 2016, with full ice cover, from top to bottom: (b) HH image, (d) RF pixel-wise classification with accuracy of 72.3%, and (f) IRGS-RF with accuracy of 95.5%. IRGS-RF and CIS ice cover fractions are shown in panels (e,f) for comparison.

Figure 8 displays the amount of ice cover on GBL resulting from pixel-based RF and IRGS-RF classification (as a fraction out of ten) in comparison with the ice cover reported by CIS. Both methods closely follow the ice fraction reported by CIS, differing by 1/10 or less in 26 and 27 of the 36 scenes for RF and IRGS-RF, respectively. The largest discrepancy occurs in the scene from 1 July 2015, where the RF and IRGS-RF classifications resulted in total ice coverage of 1.0/10 and 0.7/10, respectively, while the fraction recorded by CIS was 5.5/10. Upon visual inspection, the IRGS-RF result most correctly represents the ice conditions present in the scene, with an accuracy of 92.8%. Although the ice fraction of the RF result is closer to the CIS fraction, several areas of open water error are present in the central and southern basins of the lake. These errors are likely caused by increased backscatter from wind roughness, which is visible in the HH polarization.
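The comparison with CIS reduces each classified scene to an ice fraction in tenths and then counts scenes within 1/10 of the CIS value. A minimal sketch follows, assuming ice is encoded as `1` and land is excluded by a lake mask. The function names are hypothetical; the 0.7/10 versus 5.5/10 pair is the 1 July 2015 IRGS-RF case from the text, and the other two pairs are illustrative only.

```python
import numpy as np

ICE = 1  # hypothetical label for ice in the classified map


def ice_fraction_tenths(labels, lake_mask):
    """Ice cover over lake pixels, expressed as a fraction out of ten."""
    lake = np.asarray(lake_mask, dtype=bool)
    lake_labels = np.asarray(labels)[lake]
    if lake_labels.size == 0:
        raise ValueError("lake_mask selects no pixels")
    return 10.0 * np.count_nonzero(lake_labels == ICE) / lake_labels.size


def scenes_within(ours, cis, tol=1.0):
    """Count scenes whose fraction differs from the CIS fraction by <= tol tenths."""
    ours = np.asarray(ours, dtype=float)
    cis = np.asarray(cis, dtype=float)
    # Small slack so a difference of exactly 1/10 is not lost to rounding.
    return int(np.count_nonzero(np.abs(ours - cis) <= tol + 1e-9))


# First pair: 1 July 2015, IRGS-RF 0.7/10 vs CIS 5.5/10 (outside tolerance).
# The remaining pairs are illustrative, not taken from the paper.
print(scenes_within([0.7, 9.5, 10.0], [5.5, 10.0, 9.0]))
```

A single fraction hides where the errors are, which is why the 1 July 2015 RF result can sit closer to the CIS value while still containing visible open water error.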

Figure 8. Ice cover fraction for GBL generated with IRGS-RF, compared with the fraction reported by CIS. The date of each SAR scene in this study is shown with the corresponding fraction; however, the date of the CIS ice cover report may differ from this date by up to ±3 days.
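Because CIS report dates may differ from the SAR acquisition dates by up to ±3 days, each scene must be paired with a nearby report before fractions can be compared. A minimal stdlib sketch of nearest-date matching follows; the function name and the example report dates are hypothetical, and the authors' actual pairing procedure is not described in this section.

```python
from datetime import date


def match_cis_report(scene_date, report_dates, max_days=3):
    """Return the report date closest to the scene date, or None if none lies within max_days."""
    best = None
    best_gap = None
    for d in report_dates:
        gap = abs((d - scene_date).days)
        # Keep the closest report; on equal gaps the first one listed wins.
        if gap <= max_days and (best_gap is None or gap < best_gap):
            best, best_gap = d, gap
    return best


# Hypothetical weekly report dates around the 1 July 2015 scene.
print(match_cis_report(date(2015, 7, 1), [date(2015, 6, 29), date(2015, 7, 6)]))
```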

Figure 9. Improvement in ice-water classification accuracy from pixel-based RF to IRGS-RF. Ice is shown in yellow and water in blue. Left: classification results of 6 June 2015 (scene ID: 20150606_012716), from top to bottom: (a) HH image, (c) RF pixel-wise classification with accuracy of 90.0%, and (e) IRGS-RF, i.e., IRGS segmentation labeled by RF, with accuracy of 99.5%. Right: classification results of 2 December 2015 (scene ID: 20151202_144602), from top to bottom: (b) HH image, (d) RF pixel-wise classification with accuracy of 98.0%, and (f) IRGS-RF with accuracy of 98.8%. Red circles show ice error that is suppressed in the IRGS-RF result; green circles show refinement of the ice edge. IRGS-RF and CIS ice cover fractions are shown in panels (e,f) for comparison.

Table 1. List of SAR scenes used in this work. Highlighted scenes indicate the ice freeze-up period, while non-highlighted scenes indicate the ice break-up period.

Table 3. List of 31 GLCM and statistical features selected using recursive feature elimination with cross-validation. Window size and step size are given in pixels.
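To illustrate what a GLCM feature computed "with a window size and step size in pixels" involves, the sketch below quantizes an image, builds a symmetric, normalized grey-level co-occurrence matrix in each window, and derives two common texture statistics (contrast and homogeneity). It is a plain NumPy sketch, not the authors' feature extractor: the number of grey levels, the single horizontal offset, the window and step values, and the choice of statistics are all assumptions. The 31 features actually selected are listed in Table 3.

```python
import numpy as np


def quantize(img, levels):
    """Linearly rescale an image to integer grey levels 0..levels-1."""
    img = np.asarray(img, dtype=float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=np.intp)
    q = np.floor((img - lo) / (hi - lo) * levels).astype(np.intp)
    return np.clip(q, 0, levels - 1)  # the maximum value would otherwise map to `levels`


def glcm(window, levels, dx=1, dy=0):
    """Symmetric, normalized GLCM for one non-negative pixel offset (dy, dx)."""
    w = np.asarray(window, dtype=np.intp)
    h, wd = w.shape
    ref = w[: h - dy, : wd - dx]  # reference pixels
    nbr = w[dy:, dx:]             # their neighbours at the given offset
    m = np.zeros((levels, levels), dtype=float)
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1.0)
    m = m + m.T                   # count each pair in both directions
    total = m.sum()
    return m / total if total > 0 else m


def contrast_homogeneity(p):
    """Two common GLCM texture statistics from a normalized matrix p."""
    i, j = np.indices(p.shape)
    contrast = float(np.sum(p * (i - j) ** 2))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    return contrast, homogeneity


def glcm_feature_grid(img, window=5, step=5, levels=16):
    """Slide a window over the image and compute both statistics at each step."""
    q = quantize(img, levels)
    rows = range(0, q.shape[0] - window + 1, step)
    cols = range(0, q.shape[1] - window + 1, step)
    out = np.zeros((len(rows), len(cols), 2))
    for a, r in enumerate(rows):
        for b, c in enumerate(cols):
            out[a, b] = contrast_homogeneity(glcm(q[r:r + window, c:c + window], levels))
    return out


features = glcm_feature_grid(np.arange(100).reshape(10, 10))
print(features.shape)
```

The cost grows with the number of windows, offsets, and grey levels, which gives some intuition for why GLCM extraction can dominate processing time on full SAR scenes.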

Table 4. Classification accuracy (%) of each tested method for all 36 scenes. Accuracy and error totals may not sum to 100% due to rounding.