End-to-End Classification Network for Ice Sheet Subsurface Targets in Radar Imagery

Sea level rise, caused by the accelerated melting of glaciers in Greenland and Antarctica in recent decades, has become a major concern in the scientific, environmental, and political arenas. A comprehensive study of the properties of ice subsurface targets is particularly important for a reliable analysis of their future evolution. Newer deep learning techniques greatly outperform traditional techniques based on hand-crafted feature engineering. Therefore, we propose an efficient end-to-end network for the automatic classification of ice sheet subsurface targets in radar imagery. Our network uses bilateral filtering to reduce noise and consists of a ResNet module, an improved Atrous Spatial Pyramid Pooling (ASPP) module, and a decoder module. With radar images provided by the Center for Remote Sensing of Ice Sheets (CReSIS) from 2009 to 2011 as our training and testing data, experimental results confirm the robustness and effectiveness of the proposed network on radargrams.


Introduction
In recent years, global warming has exacerbated the melting of glaciers in Greenland and Antarctica, which has considerable influence on sea level rise and, subsequently, the safety of those living in coastal and seaside locations. In order to gather data about glacier melting, it is vital to identify and quantify changes in the ice and bedrock layers of these glaciers. Historically, glaciologists probed the subsurface structure of ice sheets in polar regions by drilling ice cores, but newer methods, such as ground-penetrating radar (GPR) technology, enable scientists to gather robust data sets quickly and efficiently.
Prior work on this topic involves analyzing data from radar sounder instruments (known as radargrams or echograms) to draw inferences about the properties of the ice sampled. Radar sounders, which are usually operated on airborne or satellite platforms, are active instruments that can perform non-intrusive depth measurements of the subsurface structure of the ice sheets on a large spatial scale. While the advent of ground-penetrating radar made the data collection process more efficient, the data analysis process is still intensely time-consuming because it is typically done by hand.
Newer work in this field utilizes image processing, computer vision, and deep learning techniques to automatically or semi-automatically determine ice surface and bottom boundaries from echograms [1][2][3][4][5][6][7]. Gifford et al. [1] employ both edge-based and active contour methodologies to automate the task of locating polar ice and bedrock layers from airborne radar data acquired over Greenland and Antarctica. In the edge-based approach, edge detection, thresholding, and edge following are utilized to identify the layers of interest for ice thickness estimation. The active contour approach involves fitting a contour to the boundary using image and contour costs, as well as a gradient force that pushes the contour upward from the bottom of the image. Both methods have their respective limitations. Traditional classification methods need manually designed features, which are complex and slow to compute, and are not suitable for large, complex datasets. In a more recent development, deep learning technology is employed to better estimate the ice and bedrock boundaries in glacier echograms. Kamangir et al. [14] combine holistically-nested edge detection (HED) [15] with the undecimated wavelet transform technique to develop an end-to-end ice boundary detection network. Xu et al. [16] propose a multi-task spatiotemporal neural network that combines 3D ConvNets and a recurrent neural network (RNN) to estimate ice surface boundaries from sequences of tomographic radar images. However, the identification of ice subsurface targets using deep learning technology has not been fully explored. Deep learning algorithms are efficient on many public data sets (e.g., the Cityscapes dataset, the PASCAL-VOC dataset) because they can automatically learn the features of different data scales without manual intervention. Chen et al. [17][18][19] propose a series of DeepLab networks for image pixel classification, where the latest network, DeepLabv3+, provides accurate and high-resolution results on PASCAL-VOC2012. Yuan et al. [20] introduce an object context pooling (OCP) scheme and focus on the context aggregation strategy for robust classification. Fu et al. [21] propose a dual attention network (DANet) to adaptively integrate local features with their global dependencies. Therefore, it is worth exploring strategies for applying these deep learning methods to the classification of ice subsurface targets.
In this paper, we propose a deep convolution classification network to achieve pixel-level classification of ice subsurface targets. This network is composed of filter processing, an encoder, and a decoder. Our network has been validated on the radargram data set provided by the Center for Remote Sensing of Ice Sheets (CReSIS) from 2009 to 2011, and the results show that our network can outperform an automatic classification system for ice sheet subsurface targets based on a support vector machine [10], the DeepLab networks [17][18][19], the object context network [20], and the dual attention network for scene segmentation [21].
The main contributions of this paper are listed as follows: (1) for the first time in the literature, we introduce a deep convolution network to realize the classification of ice subsurface targets, and the network realizes end-to-end processing, which avoids complex feature engineering for the ice subsurface targets; (2) in the encoder, a modified ASPP structure is used to obtain multi-scale features and improve classification accuracy by removing image-level features and changing feature dimensions; (3) in the decoder, a reasonable method of feature fusion is used to solve the problem of long network training and testing times and further improve the accuracy; (4) we use a bilateral filtering algorithm to reduce the speckle noise in radar images while preserving the useful information they contain.

Materials and Methods
To decipher and categorize these radargrams, we first reduce the noise in the radar images. Then, we apply the proposed network to robustly classify these ice subsurface targets.
The proposed network structure is shown in Figure 2. Images are pre-processed with bilateral filtering [22], which reduces the amount of noise interference. The network is comprised of an encoder and a decoder. The encoder, which includes the network backbone and the ASPP module, extracts radar image features. The ResNet network, which includes atrous convolution, forms the network backbone and captures long-range information without changing the image resolution. The decoder combines the features extracted by the encoder with the low-level features. We then apply two 3×3 convolutions to refine the features, followed by simple bilinear upsampling by a factor of 4. We adopt an improved ASPP structure: by removing image-level features, we change the five-branch structure into a four-branch structure. Additional input feature dimension compression during convolution removes redundant information. To reduce the training time, we modify the channel combination of the decoder from the original (256, 48) to (64, 16). The improved network achieves end-to-end processing and results in faster, more accurate classification of the ice sheet subsurface targets in radar images.

Noise Removal during Image Pre-Processing
Ground-penetrating radar is capable of retrieving sample data across vast areas of ice, but the resulting images are often plagued by speckles and other types of interference. Speckle noise reduces the quality of radar images, and seriously hinders the interpretation and further processing of images (i.e., image feature extraction, image segmentation, etc.). Radar image quality can be improved with a number of noise-reduction techniques. Rahnemoonfar et al. [4] successfully denoised images using the anisotropic diffusion method, which requires regional filter smoothing. Kamangir et al. [3] applied an undecimated wavelet transform to decompose the ice radar image into wavelet sub-bands, and then improved overall image quality using threshold processing. Radar image noise reduction must not only reduce noise, but also maintain relatively complete edge information. Inspired by the work proposed by Tomasi and Manduchi [22], in which a bilateral filtering method reduces speckle noise and maintains image edges, we employ bilateral filtering to reduce the noise in the radar images in our proposed classification framework.
Bilateral filtering [22] is a filtering method that considers both the spatial and range information of an image. The method is non-iterative, local, and simple. The bilateral filtering model can be described as

h(x) = k^{-1}(x) ∬ f(ξ) c(ξ, x) s(f(ξ), f(x)) dξ, (1)

with the normalization

k(x) = ∬ c(ξ, x) s(f(ξ), f(x)) dξ, (2)

where f(·) and h(·) are the input and output image, respectively. x denotes the current pixel, and ξ references the pixels adjacent to pixel x. c(ξ, x) weights the geometric distance between x and ξ, and s(f(ξ), f(x)) weights the grayscale similarity between x and ξ. c(ξ, x) and s(f(ξ), f(x)) are expressed as

c(ξ, x) = exp(−(1/2)(‖ξ − x‖/σ_d)²), (3)
s(f(ξ), f(x)) = exp(−(1/2)(|f(ξ) − f(x)|/σ_r)²), (4)

where σ_d denotes the geometric distance variance and σ_r denotes the approximate variance of the grayscale value.
From (3), we can see that points far from the center x in space have little influence on the final filtering result of x. Spatial filtering weights the adjacent points in space, and the weighting coefficient diminishes with increased distance. From (4), we can see that range filtering weights the adjacent points with similar grayscale values, and the weighting coefficient decreases with large variations in adjacent pixel grayscale values. Because bilateral filtering includes both the spatial and range information of the image, it can significantly reduce speckle noise in radar images. We apply bilateral filtering [22] to the training and test images, which reduces noise and improves classification performance on the radar images.
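As a concrete illustration, the bilateral filter of Equations (1)-(4) can be sketched as a brute-force NumPy reference implementation; the window radius and the σ_d, σ_r values below are illustrative choices, not the parameters used in our experiments:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_d=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a 2-D grayscale image in [0, 1].

    Each output pixel is a normalized weighted average of its window,
    with weights given by the product of a spatial Gaussian c (distance
    in pixels) and a range Gaussian s (grayscale similarity).
    """
    img = np.asarray(img, dtype=np.float64)
    padded = np.pad(img, radius, mode="reflect")
    out = np.zeros_like(img)
    # Precompute the spatial weights c over the (2r+1) x (2r+1) window.
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    c = np.exp(-0.5 * (yy**2 + xx**2) / sigma_d**2)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range weights s from grayscale differences to the center pixel.
            s = np.exp(-0.5 * ((window - img[i, j]) / sigma_r)**2)
            w = c * s
            out[i, j] = (w * window).sum() / w.sum()
    return out
```

On a noisy step image, this smooths the flat regions while the large grayscale difference across the step drives the range weight s toward zero, so the edge is preserved.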

Encoder Structure
We use ResNet-101 [23] as the backbone network of the encoder structure. ResNet is a residual learning framework used to ease the training of networks that are substantially deeper than those used previously. The backbone network (ResNet), which consists of a conv1 layer, a pool1 layer, and four block modules, extracts high-level features from ice radar images (Figure 3a). Each block consists of several bottleneck units (Figure 3b). We use the output stride to represent the ratio of input image spatial resolution to final output resolution, and set the output stride to 16. After the filtered image is input into the backbone network and passed through conv1 and pool1, the output resolution is 1/4 of the input image resolution. Then, after passing through block1, block2, block3, and block4 in turn, the output resolution becomes 1/8, 1/16, 1/16, and 1/16 of the input image, respectively. Here, the stride of the last pooling or convolutional layer in block3 is set to 1 to avoid signal decimation. Then, all subsequent convolutional layers are replaced with atrous convolutional layers with rate r = 2. Atrous convolution adaptively modifies the filter's field-of-view by changing the rate value, which allows us to extract dense feature responses without changing the image spatial resolution or learning any extra parameters [17,18]. The backbone network finally outputs 2048-dimensional features as the input of the atrous spatial pyramid pooling module.
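A toy one-dimensional example illustrates how atrous (dilated) convolution enlarges the field-of-view without adding parameters; the valid-padding convention here is chosen only for simplicity:

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """Valid 1-D convolution whose taps are spaced `rate` samples apart.

    With kernel size k and rate r, the effective field-of-view is
    k + (k - 1) * (r - 1) samples, while the number of weights stays k.
    """
    k = len(kernel)
    span = k + (k - 1) * (rate - 1)  # effective receptive field
    n_out = len(x) - span + 1
    return np.array([
        sum(kernel[t] * x[i + t * rate] for t in range(k))
        for i in range(n_out)
    ])
```

For example, a 3-tap kernel with rate 2 covers 5 input samples per output, which is how the backbone keeps dense responses at output stride 16 instead of decimating further.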
Atrous spatial pyramid pooling (ASPP) is inspired by the efficacy of spatial pyramid pooling [24], which demonstrates that it is effective to resample features at different scales to accurately and efficiently classify regions of an arbitrary scale. We review the ASPP module proposed in [19], as shown in Figure 4a, where three atrous convolutions with different atrous rates, image-level features, and a 1×1 convolution are applied behind the network backbone. The features of all branch results are concatenated and then fed through a 1×1 convolution. Based on the success of the original ASPP module, we propose a new ASPP structure to further improve the classification performance and reduce training parameters, as illustrated in Figure 4b.
Considering the ordered distribution of ice subsurface targets, the features captured by the atrous convolutions with different rates in the ASPP module already represent the global contextual information captured by image-level features. Therefore, we remove image-level features from consideration and use a four-branch structure to replace the five-branch structure of the traditional ASPP. In addition, we use a 1×1 convolution with 512 dimensions to change the input feature dimension of the atrous convolutions. The original ASPP module is designed to evaluate data with respect to the PASCAL VOC 2012 semantic segmentation benchmark [25], which contains 20 foreground object classes and one background class, and has an input feature dimension of 2048. However, ice radar images can only be divided into four categories. In the original ASPP module, the input feature dimension of 2048 is high, which not only causes feature redundancy but also increases the training time. Therefore, a 1×1 convolution is employed to reduce both the input feature dimension (to 512) and the time needed to reconstruct features. Similarly, we modify the encoder output feature dimension from 256 to 64, which is beneficial to further refine the features used in the decoder.
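A back-of-the-envelope weight count suggests why these changes help. The sketch below assumes the branch widths of the public DeepLabv3 design (256 channels per branch, 2048-dimensional input) for the original module and ignores biases and batch normalization, so the figures are illustrative rather than exact counts for our model:

```python
def conv_params(k, c_in, c_out):
    """Weights of a k x k convolution, ignoring biases and batch norm."""
    return k * k * c_in * c_out

def aspp_params(c_in, c_out, branches_3x3=3, image_pooling=True):
    """Rough weight count for an ASPP head: one 1x1 branch, several
    dilated 3x3 branches, an optional image-level pooling branch, and a
    final 1x1 projection over the concatenated branches."""
    n_branches = 1 + branches_3x3 + (1 if image_pooling else 0)
    p = conv_params(1, c_in, c_out)                  # 1x1 branch
    p += branches_3x3 * conv_params(3, c_in, c_out)  # dilated 3x3 branches
    if image_pooling:
        p += conv_params(1, c_in, c_out)             # pooled 1x1 branch
    p += conv_params(1, n_branches * c_out, c_out)   # fusion 1x1
    return p

# Original: five branches, 2048-d input, 256-d output per branch.
original = aspp_params(2048, 256, image_pooling=True)
# Modified: 1x1 reduction 2048 -> 512 first, no image-level branch, 64-d output.
modified = conv_params(1, 2048, 512) + aspp_params(512, 64, image_pooling=False)
```

Under these assumptions the modified head carries well under a fifth of the original head's weights, consistent with the reduced training time reported below.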

Decoder Structure
We use a decoder structure in the whole network framework, as shown in Figure 5. In order to use the edge information in low-level features to enrich the classified edges, the decoder concatenates the high-level and the low-level features in proportion. Then two 3×3 convolutions are used to refine the features, and the image features are bilinearly up-sampled by a factor of 4 to achieve end-to-end processing.
Because the number of encoder feature channels is modified from the original 256 to 64, we also reduce the channel number of the low-level features in the decoder from 48 to 16. The two types of features are then combined and processed using two 3×3 convolutions with 64 channels. In the decoder structure, the (64, 16) channel combination reduces redundant information in feature fusion. We find that the (64, 16) channel combination improves the accuracy of radar image classification and significantly reduces the training time.
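The final ×4 bilinear upsampling can be sketched in NumPy as follows; the half-pixel sampling convention (align_corners=False style) is an assumption, since deep learning frameworks differ on this detail:

```python
import numpy as np

def upsample_bilinear(x, factor):
    """Bilinear upsampling of an (H, W, C) array by an integer factor.

    Output pixel centers are mapped back to input coordinates with a
    half-pixel offset, then interpolated from the four nearest inputs
    (clamped at the borders).
    """
    h, w, c = x.shape
    H, W = h * factor, w * factor
    # Sample positions of output pixel centers in input coordinates.
    ys = (np.arange(H) + 0.5) / factor - 0.5
    xs = (np.arange(W) + 0.5) / factor - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :, None]
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

In the decoder this step would map, for example, a (64, 16)-fused feature map refined to 64 channels back to the input image resolution for per-pixel classification.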


Results
Our sample data consist of radar images of Antarctica acquired between 2009 and 2011 (i.e., 2009 Antarctica DC8, 2010 Antarctica DC8, and 2011 Antarctica TO). The first and second data sets (i.e., 2009 and 2010) were obtained by MCoRDS with a bandwidth of 9.5 MHz and a transmission power of 550 W. The third data set (i.e., 2011) was acquired by MCoRDS2 with a bandwidth of 30 MHz and a transmission power of 1050 W. For all data sets, pulse compression and windowing algorithms are used to improve the range resolution of the images, and synthetic aperture radar (SAR) processing is used to improve the azimuth resolution. The range resolutions of the radar images obtained by MCoRDS and MCoRDS2 are 13.6 m and 4.3 m, respectively. The azimuth resolution of all images is 25 m. Moreover, a minimum variance distortionless response technique is employed to suppress the clutter from the cross-track direction [26]. In this work, 360 images are used for training and 100 images are used for testing. All training and testing images contain all target classes, i.e., free space, layers, bedrock, and noise (including EFZ). Although the radar image dataset is small, the network can accelerate the convergence of gradient descent and obtain better classification performance, because the ResNet model is pretrained on the ILSVRC-2012-CLS image classification dataset before the training phase. Our implementation is built on TensorFlow. The batch normalization parameters are trained with decay = 0.9997. The momentum and weight decay coefficients are set to 0.9 and 0.0002, respectively. The batch size is set to 4 with 10,240 iterations. We employ a 'poly' learning rate policy where the initial learning rate is 0.007. During training, we set the image crop size to 513, randomly flip the images from right to left, and scale the input images from 0.5 to 2.0.
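The 'poly' policy decays the learning rate polynomially toward zero over the training iterations. A minimal sketch follows; the decay power of 0.9 is the value commonly used in DeepLab-style training and is an assumption here, as is nothing else beyond the initial rate and iteration count stated above:

```python
def poly_lr(step, max_steps, base_lr=0.007, power=0.9):
    """'Poly' learning rate policy: base_lr * (1 - step/max_steps)**power.

    Starts at base_lr, decays smoothly, and reaches zero at max_steps.
    """
    return base_lr * (1.0 - step / max_steps) ** power
```

With base_lr = 0.007 and 10,240 iterations, the rate starts at 0.007 and decreases monotonically to zero at the final step.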
Like other classification methods for ice subsurface targets [10,11], this paper employs overall accuracy (OA) to evaluate the classification performance of our network. Instead of taking OA as the evaluation of window sample classification (i.e., each window contains multiple pixels) as in [10] and [11], we take OA to measure the image pixel classification results. OA represents the percentage of correctly classified samples among all samples. At the same time, we also use the Kappa coefficient as an evaluation metric, which integrates the diagonal and non-diagonal terms of the confusion matrix.
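Both metrics can be computed directly from the confusion matrix; a minimal sketch:

```python
import numpy as np

def overall_accuracy(cm):
    """OA: fraction of correctly classified samples (trace over total)."""
    cm = np.asarray(cm, dtype=np.float64)
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's Kappa from a confusion matrix: agreement beyond chance.

    po is the observed agreement (equal to OA); pe is the agreement
    expected by chance from the row and column marginals.
    """
    cm = np.asarray(cm, dtype=np.float64)
    n = cm.sum()
    po = np.trace(cm) / n
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    return (po - pe) / (1.0 - pe)
```

Unlike OA, Kappa penalizes predictors that score well merely because one class (e.g., free space) dominates the image, which is why we report both.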

ASPP Design Choices
To evaluate the effect of different branches in ASPP, we compare the performance of different structures after removing different branches in ASPP, as shown in Table 1. After removing image-level features, the classification performance of the structure is better than that of the original network and the other variant structures, an observation that validates our choice to remove image-level features from the ASPP structure to improve network performance with respect to radar image classification. Using different input feature dimension values, we further demonstrate the positive effect of reducing the ASPP input feature dimension from 2048 to 512 (Table 2). The OA and Kappa values are optimal with 512 dimensions; other dimension values result in lower classification performance because of feature redundancy (higher dimensions) or loss of image details (lower dimensions). With this improved ASPP structure, we improve the classification performance over that of the original ASPP structure.

Decoder Design Choices
We change the channel combination of both the encoder features and the low-level features in the decoder structure. In order to evaluate the effectiveness of our proposed channel combination method, we compare the classification performances of four different channel combinations and the number of network model parameters generated during their training. As shown in Table 3, the channel combination (64, 16) has the highest OA and Kappa values and the best classification performance. This channel combination also requires the second lowest number of parameters, surpassed only by channel combination (32, 8). Compared to the original structure, the numbers of parameters in combinations (32, 8) and (64, 16) are reduced by 25% and 22%, respectively. Channel combination (64, 16) represents the optimal combination of strong classification performance and a reduced number of parameters in the network model. We also design different convolution structures for the decoder module, and report the findings in Table 4. The feature dimension of our convolution structure is 64; after concatenating the feature maps, we find that it is best to employ two 3×3 convolutions with 64 channels to refine the feature map.
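To see where the savings come from, one can count just the weights of the two 3×3 refinement convolutions after feature fusion (the whole-model reductions in Table 3 also include other layers). The sketch assumes the refinement width equals the high-level channel count, following the 64-channel case described above, and ignores biases and batch normalization:

```python
def decoder_fusion_params(c_high, c_low):
    """Weights of the two 3x3 refinement convolutions applied after
    concatenating high-level (c_high) and low-level (c_low) features.

    Assumes both refinement convolutions output c_high channels, as in
    the (64, 16) -> 64-channel configuration described in the text.
    """
    c_cat = c_high + c_low
    return 3 * 3 * c_cat * c_high + 3 * 3 * c_high * c_high
```

Under these assumptions the (64, 16) fusion block carries roughly a fifteenth of the weights of a (256, 48) block, which is consistent with the shorter training times we observe.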

Image Filtering
We compare the noise suppression effects of three different noise reduction methods on radar images: Lee filtering, anisotropic diffusion, and bilateral filtering. To quantify the noise reduction of each method, we use the equivalent number of looks (ENL) and edge preserving index (EPI) metrics as the primary evaluation criteria. ENL represents the image smoothing effect, and EPI represents the ability of the filter to preserve image edges. ENL and EPI are expressed as

ENL = µ² / σ²,
EPI = Σ_{i,j} |p_s(i, j) − p_s(i, j+1)| / Σ_{i,j} |p_o(i, j) − p_o(i, j+1)|,

where µ denotes the mean value of the image, σ² denotes the variance of the image, p_s(i, j) denotes the grayscale value of the output image at the point (i, j), and p_o(i, j) denotes the grayscale value of the input image at the point (i, j). A higher ENL value corresponds to a smoother image; a higher EPI value corresponds to better preserved image edges. Using our experimental dataset, we calculate the EPI and ENL values for Lee filtering, anisotropic diffusion, and bilateral filtering [22] (Table 5). Bilateral filtering yields the best image smoothing and edge preservation of the radar images. Figure 6 visualizes the images after different filtering methods: original image (Figure 6a), Lee filtering (Figure 6b), anisotropic diffusion (Figure 6c), and bilateral filtering (Figure 6d). To demonstrate that filtering can improve the classification accuracy of the network, we calculate the OA and Kappa values for network models trained and tested on the original images and on the filtered images (Table 6); the network trained and tested with the filtered images performs better, since filtering removes more noise from the images, resulting in improved ice subsurface target classification.
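Both criteria are straightforward to compute; the sketch below uses horizontal adjacent-pixel differences for EPI, which is one common convention (other definitions sum gradients in both directions):

```python
import numpy as np

def enl(img):
    """Equivalent number of looks: mean^2 / variance (higher = smoother)."""
    img = np.asarray(img, dtype=np.float64)
    return img.mean() ** 2 / img.var()

def epi(filtered, original):
    """Edge preserving index: ratio of total horizontal adjacent-pixel
    gradient magnitude in the filtered image to that in the original
    (closer to 1 = edges better preserved)."""
    f = np.asarray(filtered, dtype=np.float64)
    o = np.asarray(original, dtype=np.float64)
    return np.abs(np.diff(f, axis=1)).sum() / np.abs(np.diff(o, axis=1)).sum()
```

A filter that blurs everything drives EPI toward zero even as it raises ENL, which is why the two metrics are reported together.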

Comparison with Deep Learning Methods
To assess the classification performance of our network, we use our data set to calculate the OA and Kappa values for the other methods (Table 7). The first three are different DeepLab network versions proposed recently by Chen et al., all of which use ResNet-101 as the network backbone and are generally considered to be methods with high accuracy and robustness. The next two methods are two kinds of OCNet, ResNet-OC and ASP-OC, which introduce object context pooling (OCP) on the basis of the ResNet and ASPP modules, respectively. DANet appends two types of attention modules on top of a dilated FCN. OCNet and DANet both achieve state-of-the-art performance in challenging scene pixel-level classification.
As shown in Table 7, the proposed approach shows a consistent performance improvement over the other classification methods. OCNet and DANet do not achieve as good a performance in the classification of ice subsurface targets as they do in scene pixel-level classification. From this comparison, note that DeepLabv3+ obtains better classification results, which indicates that the ASPP and decoder modules are necessary for the classification of ice subsurface targets. Although DeepLabv3+ and the proposed method both contain ASPP and decoder modules, the proposed method improves OA by 0.13% on the radar image dataset; that is, our modifications, namely bilateral filtering, removing the image-level features from the ASPP, changing the feature dimensions, and effective feature fusion in the decoder, allow the network to learn more information from radar images.

Table 7. Comparison of classification performance of different deep learning methods.
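The ASPP module at the heart of these architectures samples features at several dilation rates in parallel. The building block it runs at each rate is a dilated (atrous) convolution, sketched below in minimal NumPy form; this is an illustration of the operation itself, not the paper's or DeepLab's implementation:

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    # "Valid" 2-D correlation with a dilated (atrous) kernel: the kernel
    # taps are spaced `rate` pixels apart, enlarging the receptive field
    # without adding parameters or reducing resolution.
    kh, kw = k.shape
    eh, ew = (kh - 1) * rate + 1, (kw - 1) * rate + 1  # effective kernel size
    h, w = x.shape
    out = np.empty((h - eh + 1, w - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + eh:rate, j:j + ew:rate] * k).sum()
    return out
```

An ASPP head applies this operation to the same feature map with several rates (e.g., 6, 12, and 18) and concatenates the results, so each pixel aggregates context at multiple scales, which is useful for layers and bedrock that appear at very different extents in a radargram.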

Visualization of Results
To evaluate the classification performance of our method more intuitively, we visualize the classification maps in Figure 7. The results show that our method captures the details of the classification targets well, especially in the highlighted areas marked by rectangles in the figures, where the bedrock exhibits slight changes (Figure 7a,c) and the layers change continuously (Figure 7b,d). We further compare our method with another classification method for ice subsurface targets (i.e., reference [10]). Because [10] uses OA to evaluate window-sample classification, which is not comparable with the OA used to measure the finer-grained pixel-level classification in this paper, we only qualitatively visualize the classification results of the two methods, as shown in Figure 8. Compared with the bedrock area in the radargram, the classification results of the method reported in [10] are noticeably wider in the bedrock area (Figure 8a,c), which may be caused by fixed-window classification. In contrast, our method obtains more accurate results in bedrock areas (Figure 8b). Moreover, our method is very fast, taking only 2 s on average to infer each image on a computer with an Intel Core i7-7700 @ 3.6 GHz and an NVIDIA GTX 1080 Ti GPU. The method of [10] needs about 45 s to infer each image using a cluster of 192 CPUs (at 2.05 GHz).

Figure 7. Examples of (a,b) radargrams and (c,d) corresponding classification maps generated with the presented network. In the classification maps, each class corresponds to a different color: free space (green); layers (yellow); bedrock (red); noise and EFZ (black).
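Rendering such a classification map from the network's per-pixel class indices is a simple palette lookup. A small NumPy sketch follows (the `colorize` helper is ours; the colors are those listed in the Figure 7 caption):

```python
import numpy as np

# Class colors from Figure 7: free space (green), layers (yellow),
# bedrock (red), noise and EFZ (black)
PALETTE = np.array([
    [0, 255, 0],    # 0: free space
    [255, 255, 0],  # 1: layers
    [255, 0, 0],    # 2: bedrock
    [0, 0, 0],      # 3: noise and EFZ
], dtype=np.uint8)

def colorize(class_map):
    # Turn an (H, W) array of class indices 0-3 into an (H, W, 3) RGB image
    return PALETTE[class_map]
```

Indexing the palette array with the whole class map vectorizes the lookup, so a full radargram is colorized in one operation.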

Figure 8. Examples of (a) radargram, (b) corresponding classification map generated with the presented network, and (c) corresponding classification map generated with the method of [10]. In the classification maps, each color represents a different target class: free space (black); layers (blue); bedrock (red); noise and EFZ (yellow).


Experimental Results on Image Boundaries
In order to evaluate the performance of our classification results on the boundary, we extract the boundary results of our method (i.e., the ice surface and the ice bottom), which directly affect the accuracy of the ice thickness calculation. We use the widely used balanced F-measure for evaluation. The F-measure equation is

F-measure = 2 × precision × recall / (precision + recall) (7)

precision = TP / (TP + FP) (8)

recall = TP / (TP + FN) (9)

where TP is a true positive, FP is a false positive, and FN is a false negative. The F-measure is the weighted harmonic mean of precision and recall. We calculate the precision, recall, and F-measure on the test set. Our method obtains a 77% F-measure for the entire test set, matching the accuracy of the existing state-of-the-art boundary detection method [14]. Figure 9 visualizes the boundary result maps. Comparing Figure 9b, the result of the proposed network, with Figure 9a, the manually picked interfaces, we conclude that the proposed network produces results similar to the manually picked interfaces; furthermore, our result appears to be even more accurate in some parts, as shown in Figure 9c,d. In conclusion, our method not only obtains high-precision classification results, but also shows promising results with respect to the manually picked boundary data.
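The boundary evaluation above reduces to a few lines of code once the per-pixel counts are available. A minimal sketch (the function name is ours) of the balanced F-measure from TP/FP/FN counts:

```python
def f_measure(tp, fp, fn):
    # Balanced F-measure: harmonic mean of precision and recall
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For instance, 7 correctly detected boundary pixels, 3 false detections, and 7 missed pixels give precision 0.7 and recall 0.5, for an F-measure of about 0.583; a perfect detector (no false positives or negatives) scores 1.0.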


Conclusions
In this paper, we have presented a novel method for the automatic classification of ice sheet subsurface targets, which automatically divides radar images into various categories for the analysis of ice sheet characteristics and thereby addresses the time-consuming nature of ice radar image data analysis. Our methodology, which uses ResNet and improved ASPP modules to extract multi-scale features and utilizes the decoder module to fuse high-level semantic features with low-level features, can accurately classify ice radar images at the pixel level. The improvements we made to the ASPP and decoder modules, together with the use of bilateral filtering, resulted in less noisy radar images and, subsequently, better classification performance on CReSIS radar image data from 2009 to 2011.