ASA-DRNet: An Improved Deeplabv3+ Framework for SAR Image Segmentation

Pollution caused by oil spills does irreversible harm to marine ecosystems. Synthetic Aperture Radar (SAR) has emerged as a crucial means of finding maritime oil spills. Accurately distinguishing oil spill areas from other types of areas is a critical step in detecting oil spills. Owing to its capacity to extract multiscale features and its distinctive decoder, the DeepLabv3+ framework has developed into an excellent deep learning model in the field of image segmentation. However, in some SAR images, the segmentation of oil film edges is unclear and small areas are segmented incorrectly. To solve these problems, an improved network, named ASA-DRNet, is proposed. Firstly, a new structure combining an axial self-attention module with ResNet-18 is proposed as the backbone of the DeepLabv3+ encoder. Secondly, an atrous spatial pyramid pooling (ASPP) module is optimized to improve the network's capacity to extract multiscale features and to increase the speed of model calculation. Finally, low-level features of different resolutions are merged to enhance the network's ability to extract edge information. The experiments show that ASA-DRNet obtains better results than other neural network models.


Introduction
Oil is called the "blood of industry" and plays a significant part in the development of human society [1]. At the same time, however, the toxic and harmful substances contained in oil can also harm the natural environment and human health. In the wake of the rapid expansion of offshore oil exploration and the shipping industry, oil-spill accidents occur frequently, and oil-spill pollution at sea has become one of the major threats to the marine environment [2]. In the past few decades, there have been many major marine oil spills around the world, which have seriously damaged local marine ecological environments [3]. In early March 2019, a cargo ship ran aground on a reef in the Solomon Islands, causing a heavy oil spill. The local waters and coastline were polluted due to untimely disposal, and the spilled heavy oil gradually approached East Rennell Island, a World Natural Heritage Site (the world's largest atoll-shaped coral island built up from coral) [4].
Synthetic Aperture Radar (SAR) detects oil spills at sea by emitting electromagnetic pulses [5,6]. SAR gains electromagnetic information on the sea surface from reflected echoes of the target [7,8]. When the scattering occurs on a clean sea surface, strong Bragg scattering results, appearing as bright areas in SAR images. When the scattering occurs on a sea surface covered by an oil slick, Bragg scattering is dampened and the slick appears as dark areas in SAR images [8][9][10].
At present, there are two main approaches for marine oil spill image segmentation: traditional algorithms and deep learning models. The traditional algorithms mainly use elementary features of oil spill images to segment them, such as methods based on different thresholds, segmentation methods based on edge detection, segmentation methods based on polarization features, etc.

The main contributions of this paper are as follows:
1. ResNet-18 and axial self-attention are combined as the new backbone of the DeepLabv3+ encoder to enhance the extraction of important features and avoid the interference of errors and irrelevant features, obtaining more adequate and comprehensive deep features.
2. The structure of Atrous Spatial Pyramid Pooling (ASPP) is optimized by reducing the dilation rates of the atrous convolutions in equal proportion and then performing a 2D decomposition, which increases the receptive field while reducing the number of model parameters. These changes improve detection speed and also avoid the loss of target information, yielding more comprehensive features.
3. The capacity of the network to extract edge information is optimized by integrating low-level features at different resolutions to improve the accuracy of segmentation.

This paper is organized as follows: Section 2 focuses on the related work of our research. In Section 3, we describe the proposed ASA-DRNet model based on the DeepLabv3+ framework, axial self-attention, improved ResNet-18, and the optimized ASPP module. Section 4 describes the SAR image datasets and validates the oil spill detection capability of the deep learning model. Section 5 contains the conclusion.

Related Work
Oil spill image segmentation algorithms based on conventional techniques and oil spill image segmentation algorithms based on deep learning models are the two primary categories that have been the focus of recent research in this area.

Traditional Methods
Some researchers have attempted to use traditional threshold segmentation methods, edge detection algorithms, and polarization feature-based segmentation algorithms for the task of segmenting oil spill images. Li et al. [25] proposed a double-threshold oil spill image segmentation algorithm based on a feature probability function. High and low thresholds are used to extract different levels of gray-scale information, and the feature probability function is used to morphologically segment the oil spill area. Wang et al. [26] first applied a 2D-Otsu threshold algorithm to SAR oil spill image segmentation and optimized the algorithm by creating a new histogram region of coherent speckle multiplicative noise. Li et al. [27] proposed an algorithm based on maximum entropy threshold segmentation.
Edge detection techniques for images are an important tool for SAR image processing. Yin et al. [28] proposed a fuzzy enhancement theory combined with a genetic algorithm for oil spill image edge segmentation by improving the Pal.King edge detection algorithm. Singha et al. [29] proposed a SAR image segmentation method combining a threshold detection algorithm and a Canny edge detection algorithm.
Polarization characteristics such as entropy (H), geometric intensity (v), and total power, derived from coherence matrices, scattering matrices, and other polarization information, can be used for oil spill detection and have recently become a research hotspot. Ren et al. [30] proposed a new polarization feature G based on eigenvalue decomposition, which not only reflects the polarization state between different targets in the set, but also describes the impurity of different scattering types in a statistical sense. Shu et al. [31] comprehensively analyzed the performance of 36 polarization features of the compact polarimetric SAR in oil spill image segmentation and found that the best segmentation accuracy was achieved by the odd scattering coefficients among the compact polarimetric features.

Deep Learning Methods
Deep learning models have been used for the segmentation of SAR oil spill images in recent years, thanks to the rapid progress of machine learning. Li et al. [32] developed a multiscale conditional adversarial network for oil spill image segmentation based on limited training data. Fan et al. [33] combined different threshold segmentation algorithms with U-Net to merge the global features of SAR oil spill images and achieved a segmentation accuracy of 98.40%. Using a deep convolutional neural network (DCNN), Shirvany et al. [34] obtained an oil spill convolutional network (OSCN), which could achieve 94.01% accuracy and 83.51% recall by adjusting the hyperparameters on the SAR dark-spot dataset. Wang et al. [35] suggested a Deeplabv3+ semantic segmentation method with several loss constraints. Wang et al. [36] suggested an enhanced deep learning model and optimized the model's hyperparameters using Bayesian optimization (BO), with an average accuracy of 74.69%. Liu et al. [37] suggested a densely linked network model based on the DenseNet convolutional neural network. The model extracts multiscale features of images and improves the ability to capture subtle features and the accuracy of segmentation. Chen et al. [38] used a stacked autoencoder (SAE) and deep belief network (DBN) to improve the polarimetric feature sets and minimize the feature dimension via layer-wise unsupervised pre-training. Gallego et al. [39] used deep neural autoencoders to separate oil spills from Side-Looking Airborne Radar (SLAR) data and achieved a pixel-level score of 93.01%. Ma et al. [40] proposed a deep convolutional neural network (DCNN) based on amplitude and phase data from Sentinel-1 dual-polarimetric images to detect oil spills, using group normalization (GN) as the normalization layer of the network. The experimental findings demonstrated improved performance over conventional techniques.

The Proposed ASA-DRNet Model
Here, we propose an optimized network structure to alleviate coarse segmentation, unclear edge segmentation, and mis-segmentation between oil spill areas and oil spill-like regions. The whole network is an end-to-end encoder-decoder structure. The encoder is composed of an improved DCNN model and an optimized Atrous Spatial Pyramid Pooling (ASPP) model. In the decoder, the output of the encoder is directly upsampled, and the bottom edge features of different resolutions from the improved DCNN are combined with the higher-level semantic features of the encoder output in a feature fusion operation. The overall structure of the proposed DeeplabV3+ network is shown in Figure 1.

DCNN Module (ResNet-18 + Axial Self Attention)
The DCNN network proposed in this paper is based on an improved ResNet-18 network. As shown in Figure 2 below, the network mainly consists of four residual blocks and axial self-attention blocks. The residual blocks have the same structure and are used in series to extract deep features. The axial self-attention module is embedded after the residual blocks in place of the original simple convolution operation, which, on the one hand, makes the extracted features more extensive and adequate and, on the other hand, enhances the extraction of important features and avoids interference from redundant features.
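As a rough illustration of the residual backbone described above, the sketch below (PyTorch) chains ResNet-18-style residual blocks in series; the channel widths and stage layout are assumptions for illustration, not the authors' exact configuration, and in ASA-DRNet an axial self-attention block would additionally follow the residual stages.

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-18-style residual block: two 3x3 convs plus a shortcut."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # projection shortcut when the spatial size or channel count changes
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                nn.BatchNorm2d(out_ch))

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

# four residual stages in series, as in ResNet-18 (widths assumed)
stages = nn.Sequential(
    BasicBlock(64, 64),
    BasicBlock(64, 128, stride=2),
    BasicBlock(128, 256, stride=2),
    BasicBlock(256, 512, stride=2),
)
```

Each strided stage halves the spatial resolution, so a 64 × 64 input leaves the four stages at 1/8 of its size with 512 channels.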

The Proposed Axial Self Attention Block
Given an input U ∈ R^(H × W × C), it is compressed along the x-axis and y-axis directions into the feature matrices K ∈ R^(H × (1 × W) × C) and Q ∈ R^((1 × H) × W × C). The channel attention feature matrix F ∈ R^(H × W × C) is obtained by multiplying K and Q and applying a softmax layer:

F(i, j) = softmax( Q(i) · K(j) / √C )

where F(i, j) is the feature weight value of the i-th row and j-th column, as shown in Figure 3 below.
Spatial attention branch: first, a 1 × 1 convolution is used to reduce the dimensionality of the feature matrix. Then, two 3 × 3 atrous convolutions are concatenated to increase the receptive field. Finally, a 1 × 1 convolution followed by a sigmoid activation function produces the V matrix. Finally, the original feature map is subjected to a weight rescaling operation, i.e., multiplied by the feature weights on the channels and space [40].
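The following is a minimal PyTorch sketch of one plausible reading of this block; since the extracted equations are incomplete, the mean-pooling choice, the √C scaling, and the reduction ratio in the spatial branch are assumptions rather than the authors' exact design.

```python
import torch
import torch.nn as nn

class AxialSelfAttention(nn.Module):
    """Channel branch: pool the input along W and H to form Q and K,
    multiply them, and softmax over channels to obtain weights F.
    Spatial branch: 1x1 reduce -> two 3x3 atrous convs -> 1x1 + sigmoid
    to obtain a gating map V. The input is rescaled by both."""
    def __init__(self, channels, reduction=4, dilation=2):
        super().__init__()
        mid = channels // reduction
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation),
            nn.Conv2d(mid, 1, 1),
            nn.Sigmoid())

    def forward(self, u):
        b, c, h, w = u.shape
        q = u.mean(dim=3, keepdim=True)              # (B, C, H, 1): pooled over W
        k = u.mean(dim=2, keepdim=True)              # (B, C, 1, W): pooled over H
        f = torch.softmax(q * k / c ** 0.5, dim=1)   # (B, C, H, W) channel weights
        v = self.spatial(u)                          # (B, 1, H, W) spatial gate
        return u * f * v                             # weight rescaling
```

The output keeps the input shape, so the block can be dropped in after any residual stage.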

Optimized ASPP Module
After the DCNN module, we use the ASPP module to capture multi-scale information. The original network suffers from unclear edge segmentation when performing SAR oil spill image segmentation, and some images also show mis-segmentation of small oil spill areas against the background. This study improves the ASPP module of the encoder in light of these problems. We reduced the dilation rates of the three atrous convolutions in equal proportion to improve the network's ability to extract multi-scale information. The 3 × 3 atrous convolution in the ASPP is also decomposed into 3 × 1 and 1 × 3 atrous convolutions by 2D decomposition, with the dilation rate maintained. The number of convolutional parameters in this improved ASPP module is smaller than in the original one, which effectively increases the computational speed of the model. In addition, connections are added between layers to enable the sharing of features and extract deeper semantic information. The overall structure of the ASPP module is shown in Figure 4 below.

The precise calculation formula of the one-dimensional dilated convolution is:

y(i) = Σ_{n=1}^{N} U(i + r · n) · W(n)    (5)

where U stands for the input feature map, W represents the convolution kernel, N represents the filter size, and r represents the dilation rate. The concatenation of the feature extraction procedures is displayed in Formula (6), where Concat(·) represents the dense connections of different layers, Zpool(·) represents the normal pooling operations, and C(·) represents a simple convolution operation with a convolutional kernel size of 1 × 1.
The result Yi of each layer is shown in Formula (7), where the dilated convolution with dilation rate r and convolutional kernel sizes n1 and n2 is represented as Fr,n1&n2(·).
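To make the 2D decomposition concrete, here is a sketch of one decomposed ASPP branch (PyTorch; the channel count and example rate are illustrative assumptions). A 3 × 3 atrous conv with C input and C output channels holds 9C² weights, while the 3 × 1 plus 1 × 3 pair holds 3C² + 3C² = 6C², so the parameter count drops by a third while the output size is preserved.

```python
import torch
import torch.nn as nn

def decomposed_atrous_branch(in_ch, out_ch, rate):
    """A 3x3 atrous conv decomposed into a 3x1 then a 1x3 atrous conv,
    keeping the same dilation rate and the same output size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, (3, 1), padding=(rate, 0),
                  dilation=(rate, 1), bias=False),
        nn.Conv2d(out_ch, out_ch, (1, 3), padding=(0, rate),
                  dilation=(1, rate), bias=False),
    )

# compare parameter counts against the plain 3x3 atrous conv
full = nn.Conv2d(256, 256, 3, padding=6, dilation=6, bias=False)
dec = decomposed_atrous_branch(256, 256, 6)
n_full = sum(p.numel() for p in full.parameters())  # 9 * 256 * 256 weights
n_dec = sum(p.numel() for p in dec.parameters())    # 6 * 256 * 256 weights
```

Because padding matches the dilated kernel extent in each direction, both versions map an H × W feature map to the same H × W output.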

Decoder Module
Because the encoder output is only 1/16th the scale of the original image, in the original network the decoder directly upsamples the encoder output and performs a feature fusion operation combining the high-level semantic features from the encoder output with a single low-level edge feature from the DCNN. As the network deepens, the extracted features become more abstract, and this method can blur the edges of the segmentation results. As a result, using only low-level features of a single resolution as input to the decoder is insufficient. We combine low-level features of various resolutions to improve the network's capacity to extract edge information. The optimized Deeplabv3+ network is more accurate for edge segmentation.
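The fusion step above can be sketched as follows (PyTorch); the bilinear upsampling mode and the convention of concatenating along the channel axis are assumptions, since the paper does not spell out these details.

```python
import torch
import torch.nn.functional as F

def fuse_low_level_features(encoder_out, low_feats, out_size):
    """Upsample the encoder output and several low-level feature maps
    of different resolutions to a common size, then concatenate them
    along the channel axis for the decoder to refine."""
    maps = [encoder_out] + list(low_feats)
    ups = [F.interpolate(m, size=out_size, mode='bilinear',
                         align_corners=False) for m in maps]
    return torch.cat(ups, dim=1)
```

A decoder convolution would then reduce the concatenated channels back to the number of segmentation classes.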

Dataset
As there has long been no recognized or peer-accepted standard offshore oil spill database, this experiment involves creating our own dataset, i.e., labeling the images. Although software such as Labelme and ITK-SNAP can be used for labeling, this experiment is conducted in MATLAB 2022a, which provides a very useful image labeling tool, Image Labeler. Its pixel ROI annotation function is used for the semantic segmentation task. Some SAR images and labels in our dataset are shown in Figure 5 below.

In this work, considering the performance of the CPU, 600 images from our own oil spill detection dataset were chosen as the training set and 200 images as the testing set. Additionally, the original image size was reduced from 1250 × 650 pixels to 256 × 256 pixels. The dataset used in this study was named Oil-spill-Dataset. It covers examples from four categories: land, sea, oil spill area, and oil-spill-like area. These four categories were created to lessen the impact of oil-spill-like regions and land on the classification.
The relevant environment for this experiment is shown in Tables 1 and 2.
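One practical detail of the resizing step above is that label masks hold integer class ids, so they must be resized with nearest-neighbour sampling rather than interpolation. The sketch below (NumPy) shows this; the class-id mapping is hypothetical and merely mirrors the four categories named above.

```python
import numpy as np

# hypothetical class ids for the four categories of Oil-spill-Dataset
CLASS_IDS = {"sea": 0, "oil_spill": 1, "oil_like": 2, "land": 3}

def nearest_resize(label, out_hw):
    """Nearest-neighbour resize for an integer label mask. Interpolating
    would blend class ids into meaningless intermediate values, so each
    output pixel simply copies the nearest source pixel."""
    h, w = label.shape
    oh, ow = out_hw
    rows = np.arange(oh) * h // oh     # source row for each output row
    cols = np.arange(ow) * w // ow     # source column for each output column
    return label[rows[:, None], cols]
```

Applied to a 650 × 1250 mask, this yields a 256 × 256 mask containing only class ids that already existed in the original.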

Evaluation Metrics
To evaluate the accuracy of the model segmentation, we used the mean intersection over union (mIOU), mean pixel accuracy (mPA), precision (P), and recall (R). According to Equation (8), the mIOU averages, over all classes, the ratio between the intersection and the union of the model's predictions and the ground truth:

mIOU = (1/j) Σ_{i=1}^{j} TP_i / (TP_i + FP_i + FN_i)    (8)

Equation (9) shows how the mPA is determined by independently computing the percentage of pixels that are properly categorized for each class, then averaging the values:

mPA = (1/j) Σ_{i=1}^{j} TP_i / (TP_i + FN_i)    (9)

According to Equations (10) and (11), precision is the percentage of correct predictions among all predictions, while recall is the likelihood that the real value is properly predicted:

P = TP / (TP + FP)    (10)

R = TP / (TP + FN)    (11)

where TP stands for true positives, FP for false positives, TN for true negatives, FN for false negatives, and j for the number of classes.
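The four metrics can all be read off a per-class confusion matrix. The sketch below (NumPy) computes them in the class-averaged form of Equations (8)-(11); the small epsilon guarding empty classes is an implementation detail of this sketch, not part of the paper's definitions.

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Compute mIOU, mPA, mPrecision, and mRecall from predicted and
    ground-truth label maps via a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for p, g in zip(pred.ravel(), gt.ravel()):
        cm[g, p] += 1                      # rows: ground truth, cols: prediction
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp               # predicted as class i, actually other
    fn = cm.sum(axis=1) - tp               # actually class i, predicted other
    eps = 1e-12                            # avoids division by zero for empty classes
    return {
        "mIOU": (tp / (tp + fp + fn + eps)).mean(),
        "mPA": (tp / (tp + fn + eps)).mean(),
        "mPrecision": (tp / (tp + fp + eps)).mean(),
        "mRecall": (tp / (tp + fn + eps)).mean(),
    }
```

A perfect prediction gives all four metrics a value of 1, which is a quick sanity check for the implementation.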

Experimental Results of ASA-DRNet
On our own datasets, ASA-DRNet performed well in segmentation. The IOU, PA, precision, and recall for each category in the training set are shown in Figure 6, and those for each category in the testing set are shown in Figure 7.


Results of the Comparison Experiment
To further validate the performance of our method compared to different deep learning networks, we qualitatively compared the segmentation results of images from different scenes and categories in the dataset, and we found that both ASA-DRNet and the other networks had some degree of error, but, overall, ASA-DRNet had better segmentation results. The segmentation results are shown in Figure 8.

Figure 8. Images of the segmentation results obtained on our datasets. (a1,a2,a3,a4,a5) SAR image, (b1,b2,b3,b4,b5) label, (c1,c2,c3,c4,c5) result of ResNet, (d1,d2,d3,d4,d5) result of SegNet, (e1,e2,e3,e4,e5) result of UNet, (f1,f2,f3,f4,f5) result of DRNet, (g1,g2,g3,g4,g5) result of SE-DRNet, and (h1,h2,h3,h4,h5) result of ASA-DRNet. Light blue represents the sea, red represents the oil spill-like area, green represents land, and black represents the oil spill area.
We trained ResNet, SegNet, UNet, DeepLabv3+ResNet-18, DeepLabv3+ResNet-18+SE block, and ASA-DRNet on our training datasets and tested them on the testing sets to further confirm the efficacy of our strategy. Each model's mIOU, mPA, mPrecision (P), and mRecall (R) were computed, and Table 3 displays the test outcomes for each model on the testing datasets. The problem of training very deep neural networks is greatly eased by ResNet's clever use of shortcut connections; its mIOU and mPA are 55.12% and 58.96%, respectively. SegNet is a symmetric network consisting of an encoder and a decoder; its mIOU and mPA are higher than ResNet's by 1.60% and 2.60%, respectively. UNet, a major network in the field of image segmentation, benefits from the feature fusion offered by skip connections and outperforms SegNet and ResNet in segmentation performance. Its mIOU, mPA, mPrecision, and mRecall are 59.25%, 63.86%, 70.64%, and 73.18%, respectively, and its mIOU is higher than ResNet's and SegNet's by 4.13% and 2.53%, respectively. DeepLabv3+ResNet-18 is a network proposed in recent years that has achieved even better results in image segmentation; its mIOU and mPA are 2.51% and 1.3% higher than those of UNet, because it has a stronger ResNet-18 encoder and a multi-scale design based on dilated convolution. The improved SE-DRNet, based on the DeepLabV3+ network, has higher precision and recall, and its mIOU is also higher than that of the original DRNet, reaching 63.21%. ASA-DRNet outperforms the previous models in segmentation performance; its precision and recall exceed those of DRNet by 2.74% and 2.09%, respectively. Its mIOU and mPA, which reach 64.47% and 68.72%, respectively, are likewise much greater than those of the other models. Figure 8 shows the segmentation results of the six different networks.
From the figure, we can clearly see that ResNet and SegNet have large errors in segmenting the edges of images and small regions; their mPrecision and mRecall only reach 64.36% and 69.92% (ResNet) and 68.74% and 71.35% (SegNet). Compared with ResNet and SegNet, UNet performs better, with an mPrecision of 70.64%, but it still segments edge details unclearly. The combination of DRNet and the attention module is more effective at solving the problems of unclear edge details and small-region segmentation errors. The mPrecision and mRecall of the proposed method reach 74.98% and 76.75%, which are the best results.

Conclusions
Remote sensing image segmentation has recently attracted the interest of numerous researchers because of advances in deep learning and satellite imaging technologies, yet the segmentation of multicategory objects in remote sensing images is still a very challenging task. In this study, we propose ASA-DRNet, based on axial self-attention, to address the issues of low segmentation accuracy and the many scales across several categories. Firstly, a new structure, which combines an axial self-attention module with ResNet-18, is proposed as the backbone of the DeepLabv3+ encoder. Secondly, the ASPP module is optimized to improve the network's ability to extract multi-scale features and to increase the speed of model calculation. Finally, low-level features of different resolutions are merged to optimize the network's ability to extract edge information. The experiments show that ASA-DRNet achieves the best results compared to other neural network models. On the home-made dataset of this experiment, our method achieved the highest mIOU of 64.47%, which is much higher than UNet's 59.25% and outperforms advanced methods such as DRNet and SE-DRNet. The mPrecision and mRecall of our algorithm also outperformed the other algorithms, reaching 74.98% and 76.75%.
In summary, our approach has great generalizability and excellent segmentation accuracy, in addition to substantially resolving the issues raised in this work. However, our approach does have certain limitations. The selected datasets have rather low image resolution, whereas high-resolution images are also often used in remote sensing, and the efficacy of our approach on such images has not yet been shown. In addition, certain remote sensing images with high noise still yield poor segmentation results, and blurred and noisy images continue to pose significant obstacles to remote sensing image segmentation. Future work will study these issues further.

Acknowledgments: This work was supported by the National Natural Science Foundation of China and the Zhenjiang smart ocean information perception and transmission laboratory project. The above funding did not lead to any conflict of interest regarding the publication of this manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
SAR    synthetic aperture radar
FCN    fully convolutional network
PSPN   pyramid scene parsing network
DCNN   deep convolutional neural network
DRNet  DeepLabv3+ResNet-18
ASPP   atrous spatial pyramid pooling
SE     squeeze and excitation module
ASA    axial self-attention