An Improved SAR Image Semantic Segmentation Deeplabv3+ Network Based on the Feature Post-Processing Module

Abstract: Synthetic Aperture Radar (SAR) can provide rich feature information under all-weather, day-and-night conditions because it is not affected by climatic conditions. However, SAR images contain multiplicative speckle noise, which makes it difficult to accurately identify fuzzy targets in SAR images, such as roads and rivers, during semantic segmentation. This paper proposes an improved Deeplabv3+ network that can be effectively applied to the semantic segmentation task of SAR images. First, this paper adds an attention mechanism and, combined with the idea of an image pyramid, proposes a Feature Post-Processing Module (FPPM) to post-process the feature map output by the network, obtain finer image features, and address the fuzzy texture and spectral features of SAR images. Compared to the original Deeplabv3+ network, segmentation accuracy improves by 3.64% and mIoU by 1.09%. Second, to address the limited SAR image data and unbalanced samples, the focal loss function is used to improve the loss function of the network, which increases mIoU by 1.01%. Finally, the Atrous Spatial Pyramid Pooling (ASPP) module is improved: each 3 × 3 atrous convolution in ASPP is decomposed into a 3 × 1 and a 1 × 3 convolution, which maintains the dilation rate while effectively reducing the computation of the module, shortening the training time by 19 ms and improving the semantic segmentation results.


Introduction
Semantic segmentation of SAR images is a basic and important problem in remote sensing image interpretation. Its purpose is to assign a category label to each pixel in SAR images. Due to their all-weather availability and rich feature information, SAR images have received special attention from scholars at home and abroad in recent years. SAR image semantic segmentation gives each pixel segment semantic information to guide further analysis and understanding of SAR images, which is of great significance for advancing SAR image processing technology. With the rapid development of Deep Learning, Deep Convolutional Neural Networks (DCNN) have become vital for feature extraction and characterization. Fully Convolutional Networks (FCN) [1], the first end-to-end classical semantic segmentation model, achieved great success. At the same time, however, the fixed network structure of FCN also reveals disadvantages: without considering global context information, upsampling the feature map back to the original image size results in inaccurate pixel positioning. Ronneberger [2] proposed the U-Net network for biomedical image segmentation; however, in multi-class tasks, U-Net segments edge contours poorly and easily causes GPU RAM overflow. Marmanis [3] proposed an ensemble learning method combining FCN, SegNet, and edge detection, reducing segmentation error and improving segmentation accuracy on high-resolution remote sensing images. When dealing with objects of similar appearance, the PSPNet [4] network improves discrimination by aggregating global context through its pyramid pooling module.

Method
Deeplabv3+ is widely used in optical image semantic segmentation and has achieved good results. Given the approximate color, position, and other shared characteristics between optical and SAR images, this paper uses Deeplabv3+ for the SAR image semantic segmentation task. Given the characteristics of SAR images, the focal loss function proposed by Tsung-Yi Lin [19] is used to improve the semantic segmentation loss function, and the decomposition theory proposed by Alvarez [20] is used to improve the ASPP module of Deeplabv3+. At the same time, the Feature Post-Processing Module (FPPM) proposed in this paper is added to the Decoder module of Deeplabv3+, which improves the accuracy of Deeplabv3+ on SAR image semantic segmentation tasks. The improved Deeplabv3+ network model is shown in Figure 1.
(1) Information Embedding. For an input feature map X, each channel is first pooled along the two spatial directions separately:

z_c^h(h) = (1/W) Σ_{0≤i<W} x_c(h, i) (4)

z_c^w(w) = (1/H) Σ_{0≤j<H} x_c(j, w) (5)

where z_c^w(w) is the output of the c-th channel at width coordinate w, x_c(j, w) is the value of the feature map of channel c at height j and width w, and H and W are the height and width of the feature map.
The above two transformations aggregate features along two spatial directions to obtain a pair of direction-aware feature maps. This differs from the SE [16] module that generates a single feature vector in the channel attention method. These two transformations also allow the attention module to capture long-term dependencies along one spatial direction and preserve accurate location information along the other. This helps the network to locate the target of interest.
(2) Coordinate Attention Generation. The transformations in (1) obtain a global receptive field and encode accurate position information. To make use of the resulting features, a second pair of transformations, called Coordinate Attention generation, is applied. After the Information Embedding step, the aggregated feature maps produced by Formulas (4) and (5) are concatenated and transformed by a shared 1 × 1 convolution, and the result is split along the spatial dimension into f^h and f^w, which are transformed into the attention maps:

g^h = σ(F_h(f^h)), g^w = σ(F_w(f^w)) (6)

where σ is a sigmoid activation function. To reduce the model's complexity and computational overhead, an appropriate reduction ratio r is usually used to reduce the number of channels of f. The outputs g^h and g^w are then expanded and used as attention weights. Finally, the output of the Coordinate Attention block Y = [y_1, y_2, ..., y_c] can be written as:

y_c(i, j) = x_c(i, j) × g_c^h(i) × g_c^w(j) (7)
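As a concrete illustration, the directional pooling, gating, and reweighting steps of Formulas (4)-(6) can be sketched in NumPy. The matrices F1, Fh, and Fw below stand in for the learned 1 × 1 convolutions; they are hypothetical random placeholders, not weights from the paper.

```python
import numpy as np

def coordinate_attention(x, reduction=8):
    """Sketch of a Coordinate Attention block for a (C, H, W) feature map.

    The 1x1 convolutions F1, F_h, F_w are modeled as random linear maps
    over channels (placeholders for learned weights).
    """
    C, H, W = x.shape
    rng = np.random.default_rng(0)
    Cr = max(C // reduction, 1)
    F1 = rng.standard_normal((Cr, C)) * 0.1   # shared 1x1 conv, C -> C/r
    Fh = rng.standard_normal((C, Cr)) * 0.1   # 1x1 conv back to C channels
    Fw = rng.standard_normal((C, Cr)) * 0.1

    # Formulas (4), (5): pool each channel along one spatial direction
    z_h = x.mean(axis=2)          # (C, H): average over width
    z_w = x.mean(axis=1)          # (C, W): average over height

    # concatenate, transform with F1, apply nonlinearity (ReLU as delta)
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W)
    f = np.maximum(F1 @ z, 0)                 # (C/r, H + W)
    f_h, f_w = f[:, :H], f[:, H:]             # split along spatial dim

    # Formula (6): sigmoid-gated attention weights per direction
    g_h = 1.0 / (1.0 + np.exp(-(Fh @ f_h)))   # (C, H)
    g_w = 1.0 / (1.0 + np.exp(-(Fw @ f_w)))   # (C, W)

    # output: reweight each position by both directional gates
    return x * g_h[:, :, None] * g_w[:, None, :]

y = coordinate_attention(np.ones((16, 8, 8)))
print(y.shape)  # (16, 8, 8)
```

The output keeps the input shape; each pixel is scaled by the product of its row gate and column gate, which is how the block injects position-aware channel attention.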

In the Coordinate Attention generation step, the concatenated embeddings are first transformed as f = δ(F_1([z^h, z^w])), where [·, ·] is a concatenation along the spatial dimension, δ is a nonlinear activation function, and f is an intermediate feature map that encodes spatial information in the horizontal and vertical directions. Here r is used to control the reduction ratio, as in the SE block. f is then split along the spatial dimension into two separate tensors f^h and f^w, and two further 1 × 1 convolution transformations F_h and F_w transform f^h and f^w into tensors with the same number of channels as the input X.

The new focal loss function, the FPPM, the improved ASPP module, and the attention mechanism module in Figure 1 are introduced in detail in the following sections. The rate in the figure represents the dilation rate. As can be seen from Figure 1, our improvements to the Deeplabv3+ network are mainly reflected in the following aspects. First, to address the problems of polarimetric SAR images, such as severe speckle noise, shadowing, and low spatial resolution, we introduce an attention mechanism and, combined with the image pyramid idea, propose the FPPM to post-process the feature map output by the network; an image quality analysis method is used to determine the number of branches of the module so that finer image features can be obtained. Second, to address the limited data and unbalanced samples of polarimetric SAR images, the focal loss function is used to improve the loss function of the network. Finally, to reduce the long computation time of the model, the ASPP module is optimized: each 3 × 3 atrous convolution is decomposed into a 3 × 1 and a 1 × 3 convolution, maintaining its dilation rate while effectively reducing the computational complexity of the module.


Feature Post-Processing Module (FPPM)
Although SAR images contain abundant feature information, the signal-to-noise ratio of the nonlinear FM continuous pulse signal is low. Due to the side-looking coherent imaging mode, image noise pollution is severe, shadows and speckle noise are easily produced, and the spatial resolution is low. To make full use of the fine features of SAR images, and inspired by pyramid pooling, this paper proposes the FPPM. The FPPM structure is shown in Figure 2. Subfigure (a) of Figure 2 describes the detailed architecture of the FPPM. Its first branch applies the basic Non-local attention mechanism to the whole image; the second branch divides the image into blocks of size (H/2) × (W/2) and applies the Non-local mechanism to each block. In the figure, X_C denotes the feature map X with C channels. The third branch divides the image into blocks of size (H/4) × (W/4) and applies the Non-local mechanism, and so on. The detailed operation of a Non-local block on a single patch is described in subfigure (b): the image is divided into small blocks, Non-local is performed on each block separately, and the blocks are folded back into the entire image. The results of the X branches are combined and then sent to the ECA module to generate weight parameters that better capture the local information of the feature map. The ECA module is introduced into the FPPM to better mine the channel information of the feature map output by the preceding network. ECA is a method for capturing local cross-channel interactions that ensures good network performance at low computational complexity: it obtains features through Global Average Pooling (GAP) and uses a one-dimensional convolution with kernel size k to generate channel weights.
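The per-branch "divide, apply Non-local, fold back" operation can be sketched generically. In the sketch below, a zero-mean normalization stands in for the Non-local mechanism (which is too heavy to reproduce in a few lines); the function name and the stand-in operation are our illustrative choices.

```python
import numpy as np

def blockwise(x, n, op):
    """Apply `op` independently to each (H/n) x (W/n) block of the 2D map x
    and fold the results back into a full-size map, as each FPPM branch does
    with its per-block Non-local operation (`op` is a stand-in here)."""
    H, W = x.shape
    bh, bw = H // n, W // n
    out = np.empty_like(x)
    for i in range(n):
        for j in range(n):
            blk = x[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            out[i*bh:(i+1)*bh, j*bw:(j+1)*bw] = op(blk)
    return out

# Branch 2 of the FPPM works on 2 x 2 blocks; zero-mean normalization
# per block stands in for the per-block Non-local mechanism.
x = np.arange(16.0).reshape(4, 4)
y = blockwise(x, 2, lambda b: b - b.mean())
print(y.shape)  # (4, 4)
```

Each deeper branch simply increases n, so the module processes the map at several pyramid scales before the branch outputs are combined.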
In the process of enhancing channel dependency, the output feature is first aggregated through GAP, and the channel weight ω is generated by a one-dimensional convolution with kernel size k, without any dimension-reduction operation, as shown in Formula (8):

ω = σ(C1D_k(y)) (8)

where σ represents the sigmoid function and C1D represents one-dimensional convolution. To adequately capture local cross-channel interaction, the approximate range of channel interaction, that is, the kernel size k of the one-dimensional convolution, must be determined. The ECA module selects k adaptively through Equation (9):

k = ψ(C) = |log2(C)/γ + b/γ|_odd (9)

where |t|_odd denotes the odd number nearest to t and, as in the original work, γ = 2 and b = 1. It is worth noting that the parameters and computation of this attention module are almost negligible, making it a highly lightweight architecture. The network model of ECA is shown in Figure 3.
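A lightweight sketch of the ECA computation follows Formulas (8) and (9). The uniform averaging kernel stands in for the learned 1D convolution weights and is a placeholder of ours, not the trained kernel.

```python
import math
import numpy as np

def eca_kernel_size(C, gamma=2, b=1):
    """Equation (9): choose k as the odd number nearest to
    log2(C)/gamma + b/gamma."""
    t = int(abs(math.log2(C) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca(x):
    """Sketch of the ECA module for a (C, H, W) feature map: GAP over the
    spatial dims, a 1D convolution of size k across channels (placeholder
    uniform weights), then sigmoid channel gating per Formula (8)."""
    C = x.shape[0]
    k = eca_kernel_size(C)
    y = x.mean(axis=(1, 2))                      # GAP -> (C,)
    kernel = np.full(k, 1.0 / k)                 # placeholder C1D weights
    conv = np.convolve(y, kernel, mode="same")   # local cross-channel mix
    w = 1.0 / (1.0 + np.exp(-conv))              # sigmoid -> channel weights
    return x * w[:, None, None]

print(eca_kernel_size(64))   # 3
out = eca(np.ones((16, 4, 4)))
print(out.shape)  # (16, 4, 4)
```

Note how k grows only logarithmically with the channel count, which is what keeps the module nearly parameter-free.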


Determination of FPPM Model Parameters
This section determines the optimal value of the branch number X of the FPPM through an image quality evaluation algorithm. The X branches of the FPPM divide the image into blocks of size (H/X) × (W/X), and each block uses the Non-local mechanism, which affects the image quality during the division process. The image quality evaluation algorithm is used to calculate the quality and distortion of the block images after each layer of branch division. According to the image quality score, the optimal number of branches of the FPPM is obtained, that is, how many times the FPPM should divide the image in pyramid fashion to achieve the best feature extraction effect.
The objective quality evaluation algorithm calculates image quality by establishing a model. Its goal is to use computers in place of the human visual system to achieve automatic and accurate image quality evaluation [21]. According to how much the evaluation depends on information from the original image, objective image quality evaluation methods can be roughly divided into three categories: full reference, partial reference, and no reference. The full-reference approach is the most mature and has been widely studied by the academic community, achieving fruitful results such as the classic structural similarity evaluation algorithm and its improved variants [22–26]. The image quality evaluation algorithm used in this section is a general no-reference (NR) algorithm based on the logarithmic statistical characteristics of images. It follows the idea of NSS (Natural Scene Statistics) used in general no-reference quality evaluation without learning: it requires no training or learning process, extracts only the spatial features of the image without any transform, and has relatively low computational complexity.

Logarithmic Energy Characteristics
As a practical measure of image quality, image definition can use the logarithmic energy statistics of the normalized luminance coefficients in the spatial domain. The logarithmic energy of the normalized luminance coefficients of the n-th image block is computed as:

E_n = log10(1 + (1/N) Σ_{i,j} ŷ_n(i, j)^2)

where ŷ_n(i, j) are the normalized luminance coefficients and N represents the number of normalized luminance coefficients in the n-th image block.
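A minimal sketch of this feature is given below, assuming the log10(1 + mean-square) form stated above; the exact base and offset are an assumption, since the original formula did not survive extraction.

```python
import numpy as np

def log_energy(block):
    """Log-energy of one block of normalized luminance coefficients:
    E = log10(1 + mean of squared coefficients). The constant and base
    are assumptions; the monotone log-of-mean-square form follows the
    feature described in the text."""
    coeffs = np.asarray(block, dtype=float).ravel()
    N = coeffs.size                      # N coefficients in the block
    return np.log10(1.0 + np.square(coeffs).sum() / N)

blk = np.array([[0.5, -0.5], [1.0, -1.0]])
e = log_energy(blk)
print(round(e, 4))
```

Because the feature grows monotonically with block energy, it can rank the distortion of block images produced by successive FPPM branch divisions.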
The simulation results show that the logarithmic energy feature correlates strongly with visual perception of the image, and working purely in the spatial domain keeps the computational complexity low. Under different levels of distortion, the logarithmic energy changes monotonically, and the standard deviation between the logarithmic energies of image blocks is small, which demonstrates that the feature is suitable for no-reference image quality evaluation.

Spatial Feature Extraction
Statistical characteristics model the distribution between adjacent pixels. Since the normalized luminance coefficients of natural images follow a zero-mean Gaussian distribution, the statistical model based on the logarithmic derivative can adopt the generalized Gaussian distribution (GGD). The GGD is expressed as:

f(x; μ, α, β) = (α / (2βΓ(1/α))) exp(−(|x − μ| / β)^α)

where Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt, x > 0, is the gamma function, and μ, α, and β are the mean, shape parameter, and scale parameter, respectively. The shape and variance of the GGD curve are determined by α and β, which can be estimated effectively by the moment-matching algorithm. For each image block, the log-energy feature of the normalized luminance coefficients plus the GGD parameters of the logarithmic derivatives form a 13-dimensional feature vector. To capture multi-resolution image features, spatial features are extracted at two resolutions: the original image resolution and a reduced resolution. Further reducing the resolution has little impact on algorithm performance, so the image resolution is only reduced by a factor of two. Finally, a 26-dimensional spatial feature vector is extracted for each image block.
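A moment-matching fit along these lines can be sketched as follows. The grid search over α and the moment ratio (E|x|)² / E[x²] are standard choices for GGD fitting and are our illustration, not necessarily the paper's exact procedure.

```python
import math
import numpy as np

def rho(alpha):
    """Theoretical moment ratio (E|x|)^2 / E[x^2] of a zero-mean GGD."""
    g = math.gamma
    return g(2 / alpha) ** 2 / (g(1 / alpha) * g(3 / alpha))

def ggd_fit(x):
    """Moment-matching fit of a zero-mean GGD: alpha is the grid value
    whose ratio rho(alpha) best matches the empirical moment ratio;
    beta then follows from the variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # enforce zero mean
    target = np.abs(x).mean() ** 2 / np.square(x).mean()
    grid = np.arange(0.2, 10.0, 0.001)
    alpha = grid[np.argmin([abs(rho(a) - target) for a in grid])]
    beta = math.sqrt(np.square(x).mean()
                     * math.gamma(1 / alpha) / math.gamma(3 / alpha))
    return float(alpha), float(beta)

rng = np.random.default_rng(0)
a, b = ggd_fit(rng.standard_normal(200000))  # Gaussian data => alpha ~ 2
print(round(a, 1))
```

For Gaussian input the fit should recover α ≈ 2 and β ≈ σ√2, which is a quick sanity check that the estimator matches the GGD parameterization above.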

MVG (Multivariate Gaussian) Model
Based on the idea of NSS, the final image quality score is obtained by calculating the distance between the MVG model of the test image and that of the natural image library. First, the MVG model of the natural image library is constructed: the spatial feature vectors extracted from natural images are modeled with an MVG to obtain the mean vector ν and covariance matrix Σ of the natural-image features. The MVG model is:

f(x) = (1 / ((2π)^(k/2) |Σ|^(1/2))) exp(−(1/2)(x − ν)^T Σ^(−1) (x − ν))

where x = (x_1, ..., x_k) is the extracted statistical feature vector of the natural image. The MVG modeling process for the test image is the same as for the natural images. The objective quality score of the test image is obtained by computing the distance between the MVG models of the test image and the natural image library:

D(ν_1, ν_2, Σ_1, Σ_2) = sqrt((ν_1 − ν_2)^T ((Σ_1 + Σ_2) / 2)^(−1) (ν_1 − ν_2))

where ν_1, ν_2 and Σ_1, Σ_2 represent the mean vectors and covariance matrices of the natural image library and the test image, respectively. The higher the value of D, the greater the deviation of the test image from the SAR image statistics, that is, the more serious the distortion; conversely, a lower value means less distortion. Calculated with the above formulas, when the number of branches X is 3, the quality of the smallest-unit image block decreases the least, and the feature extraction effect is best.
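The scoring step can be illustrated with synthetic features. The distance below follows the formula above; the random "feature" data and function names are placeholders of ours.

```python
import numpy as np

def mvg_fit(features):
    """Fit an MVG model (mean vector nu, covariance Sigma) to feature
    vectors arranged as rows."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def mvg_distance(nu1, Sigma1, nu2, Sigma2):
    """D = sqrt((nu1-nu2)^T ((Sigma1+Sigma2)/2)^-1 (nu1-nu2));
    a higher value indicates more distortion."""
    d = nu1 - nu2
    mid = (Sigma1 + Sigma2) / 2.0
    return float(np.sqrt(d @ np.linalg.pinv(mid) @ d))

rng = np.random.default_rng(0)
ref = rng.standard_normal((500, 4))           # stand-in reference features
test_ok = rng.standard_normal((500, 4))       # similar statistics
test_bad = rng.standard_normal((500, 4)) + 3  # shifted (distorted) statistics
n1, S1 = mvg_fit(ref)
score_ok = mvg_distance(n1, S1, *mvg_fit(test_ok))
score_bad = mvg_distance(n1, S1, *mvg_fit(test_bad))
print(score_ok < score_bad)  # True: larger deviation -> higher score
```

Applying this score to the smallest block of each candidate branch depth is how the optimal X is selected.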

Focal Loss Function
Although SAR images contain abundant feature information about ground structures, the imaging process is cumbersome and data post-processing is complex. Therefore, the SAR image data sets available for segmentation are limited, and the class labels of most data sets are extremely uneven. In the data set used in this paper, labels such as "road" and "mountain" are numerous, while labels such as "forest" are scarce. This imbalance leads to low training efficiency: since most samples are simple targets, they provide little valuable information to the model during training, and the dominance of simple samples degrades model performance. The focal loss function proposed by Tsung-Yi Lin [19] addresses category imbalance by down-weighting the loss of well-classified samples, focusing training on data sets with sparse hard samples. It is a dynamically scaled cross-entropy loss: the scaling factor decays to zero as confidence in the correct class increases, as shown in Figure 4. Intuitively, this scale factor automatically reduces the contribution of simple samples during training and quickly focuses the model on hard samples. Experiments show that the focal loss used in this paper enables training a high-precision single-stage segmentation model that significantly outperforms the original loss function and is among the most effective choices for training on unbalanced samples.
The focal loss is designed to address the imbalance between samples in the training of single-stage object detection. For binary classification, focal loss is introduced starting from the cross-entropy (CE) loss:

CE(p, y) = −log(p) if y = 1, and −log(1 − p) otherwise

where y ∈ {±1} is the ground-truth class and p ∈ [0, 1] is the estimated probability of the class y = 1. A common way to address class imbalance is to introduce a weight factor α ∈ [0, 1] for class 1 and 1 − α for class −1; in practice, α can be set by inverse class frequency or treated as a hyperparameter chosen by cross-validation. For notational convenience, p_t is defined as p when y = 1 and 1 − p otherwise, so CE(p, y) = CE(p_t) = −log(p_t). The α-balanced CE loss is then written as:

CE(p_t) = −α_t log(p_t)

This loss is a simple extension of CE and serves as the experimental baseline for the focal loss we use.
The CE loss cannot balance the learning of under-represented samples well, so we introduce focal loss as the loss function to address sample imbalance in the segmentation task. Focal loss is obtained from the cross-entropy function by adding a sample-difficulty modulating factor (1 − p_t)^γ. Its mathematical expression is:

FL(p_t) = −(1 − p_t)^γ log(p_t)

In fact, a category weight α can also be added, rewriting the above form as:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)

where α is the weight parameter between the two categories, (1 − p_t)^γ is the modulating factor for simple/hard samples, and γ is the focusing parameter. When the prediction for a particular category is accurate, that is, when p_t is close to 1, the value of (1 − p_t)^γ is close to 0; when the prediction is inaccurate, that is, when p_t is close to 0, the value of (1 − p_t)^γ is close to 1. We set γ = 2 and α = 0.25, and use this form in the experiments because its accuracy is higher than that of the unbalanced form.
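The behavior of the modulating factor can be checked numerically; the sketch below is a plain-Python rendering of the binary focal loss above (variable and function names are ours).

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t),
    where p is the predicted probability of the positive class and
    y in {0, 1} is the label."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def ce_loss(p, y):
    """Plain cross-entropy baseline CE(p_t) = -log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

# An easy, well-classified sample (p_t = 0.9) is down-weighted far more
# than a hard sample (p_t = 0.1):
easy_ratio = focal_loss(0.9, 1, alpha=1.0) / ce_loss(0.9, 1)
hard_ratio = focal_loss(0.1, 1, alpha=1.0) / ce_loss(0.1, 1)
print(round(easy_ratio, 3), round(hard_ratio, 3))  # 0.01 0.81
```

With γ = 2, the easy sample's loss is scaled by (1 − 0.9)² = 0.01 while the hard sample's is scaled by (1 − 0.1)² = 0.81, which is exactly the focusing effect described above.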

Improvement of ASPP Module
SAR images can collect data of different wavelengths and polarizations and contain more spectrum, texture, and other feature information for different ground objects. Therefore, this paper uses a pyramid pooling structure to divide the features of SAR images, but this dramatically increases the amount of data, increases the memory occupied by the network model, and prolongs the processing time of a single image. Therefore, this paper redesigns the residual unit of the backbone network and optimizes the ASPP module.
It has been proved that two-dimensional convolutions can be decomposed into a series of one-dimensional convolutions. According to the literature [20], under the constraint that the rank of the convolution layer is relaxed to 1, the convolution filter f_i can be rewritten as:

f_i = Σ_{k=1}^{K} σ_k v̄_k h̄_k^T

where v̄_k and h̄_k are vectors of length d, σ_k is a weight scalar, and K is the rank of f_i. Based on this expression, Alvarez and Petersson proposed that each convolution layer can be decomposed into 1D convolutions and a nonlinear operation. With a_0^c as the input of the decomposition layer, the i-th output a_l^i is expressed as:

a_l^i = φ(b_i^h + Σ_j h̄_i^j ∗ [φ(b_j^v + Σ_c v̄_j^c ∗ a_0^c)])

where l = 1, ..., L indexes the convolution layers and φ(·) is ReLU. The 3 × 3 bottleneck unit of the backbone network is replaced with a 1D non-bottleneck unit. With the same number of channels in the convolution input feature map, 1D decomposition reduces the parameters of non-bottleneck units by 33% and of bottleneck units by 29%. If the 3 × 3 convolution has w_0 input channels and c output channels, the conventional 3 × 3 convolution has w_0 × 3 × 3 × c weight parameters, while after decomposition into 3 × 1 and 1 × 3 convolutions the count is w_0 × 3 × 1 × c + c × 1 × 3 × c; with w_0 = c, the weight parameters are reduced by about 33%. Because 3 × 3 convolution learns some redundant information and has a large number of parameters, it takes a long time to train; conventional convolution has been shown to compute much overlapping, redundant information. Following the method shown in Figure 5, each 3 × 3 atrous convolution is decomposed into a 3 × 1 and a 1 × 3 convolution while maintaining its dilation rate. The rate in the figure represents the dilation rate. The convolution parameters of the improved ASPP module are 33% fewer than those of conventional convolution, the decomposed convolutions are faster than 3 × 3 convolution, and essential semantic information can still be extracted, effectively reducing the computation of this module.
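The ~33% saving claimed above can be verified with a simple weight count; the helper below is a hypothetical illustration (bias terms ignored), not code from the paper.

```python
def conv_params(in_ch, out_ch, kh, kw):
    """Weight count of a conv layer with kernel (kh, kw), bias ignored."""
    return in_ch * kh * kw * out_ch

# A 3x3 (atrous) convolution versus its 3x1 + 1x3 decomposition,
# with equal input and output channel counts (w_0 = c):
w0, c = 256, 256
full = conv_params(w0, c, 3, 3)                        # w0 * 3 * 3 * c
decomposed = conv_params(w0, c, 3, 1) + conv_params(c, c, 1, 3)
print(round(1 - decomposed / full, 3))  # 0.333
```

Since 6 of the 9 per-channel weights remain, the saving is exactly one third when w_0 = c, matching the 33% figure; a dilation rate applied to both 1D kernels preserves the receptive field of the original atrous convolution.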


Results
This paper used the SAR image taken by the Sentinel-1 satellite over Nanjing and its surrounding areas in Jiangsu Province, China, as the original data set, with an image resolution of 10 m. The data were obtained on 19 April 2011; the corresponding optical image resolution was 5 m, and its acquisition time was April 2017. This data set was adopted because it is one of the few existing SAR image data sets that corresponds exactly to a precise optical image, which makes it convenient to extract the visual image feature information as a supplement. Secondly, compared with other SAR image data sets, this data set has more sample categories, and the differences between different types are more prominent, which is convenient for testing semantic segmentation tasks. In this paper, the LabelMe labeling software was used to label the SAR images, and the labeled SAR images were then cut into 256 × 256 small images. The marked SAR data set was subjected to data enhancement operations such as random rotation, and finally the whole data set was divided into a training set of one thousand eight hundred images and a verification set of two hundred images. Unlike the training set, the two hundred images in the validation set did not participate in learning the network model; they were only used, after training, to test the segmentation effect of the model and to calculate the various verification indicators. The two hundred pictures in the verification set were drawn from different regions of the whole image, and the number of samples of the different categories was equal.
This paper quantitatively measures network performance using the Dice Similarity Coefficient (DICE), per-class intersection-over-union (IoU_cls), mean intersection-over-union (mIoU), and global accuracy (GA): DICE = 2|P ∩ T| / (|P| + |T|), where P represents the ground truth set and T the predicted set; IoU_cls = n_ii / (t_i + Σ_j n_ji − n_ii); mIoU = (1/k) Σ_i IoU_i; and GA = Σ_i n_ii / Σ_i t_i, where k is the number of pixel categories, n_ij is the number of pixels of class i predicted as class j, and t_i = Σ_j n_ij is the total number of pixels of class i. GA reflects the overall training precision of the network, while IoU_cls appropriately penalizes the network's misclassifications; the two complement each other. Since IoU_cls only reflects the network's prediction accuracy for a single class, mIoU, the average of IoU_cls over all classes, is used to evaluate the overall semantic segmentation accuracy of the network over all pixels.
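These metrics can all be computed from a single confusion matrix. The following sketch (with hypothetical 3-class pixel counts) shows the relationship between IoU_cls, mIoU, and GA:

```python
import numpy as np

def segmentation_metrics(conf):
    # conf[i, j] = n_ij: number of pixels of true class i predicted as j.
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)            # n_ii: correctly classified pixels
    t = conf.sum(axis=1)          # t_i: total pixels of true class i
    pred = conf.sum(axis=0)       # pixels predicted as each class
    iou = tp / (t + pred - tp)    # IoU per class
    return iou, iou.mean(), tp.sum() / conf.sum()  # IoU_cls, mIoU, GA

# Hypothetical 3-class confusion matrix (pixel counts).
conf = np.array([[50, 2, 3],
                 [4, 40, 1],
                 [2, 3, 45]])
iou, miou, ga = segmentation_metrics(conf)
print(iou.round(3), round(miou, 3), round(ga, 3))
```

GA rewards overall correctness, while per-class IoU penalizes both missed pixels (rows) and false predictions (columns) of each class.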

Determination of CA Attention Mechanism
Although the SE (Squeeze-and-Excitation) Block [16] has been widely used in recent years, it only measures the importance of each channel by modeling channel relationships, while ignoring location information, which is important for generating spatially selective attention maps. Later, the CBAM (Convolutional Block Attention Module) [17] attempted to utilize location information by reducing the channel dimension of the input tensor and then using convolution to calculate spatial attention. GALA (Global-Local Attentive Latent Alignment) [27] extends this concept by designing more advanced attention: in the GALA attention mechanism, global attention captures the global information and context of an image to better understand its overall meaning and semantics. ECA-Net (Efficient Channel Attention Network) [21] analyzed the side effects of dimensionality reduction in SE channel attention and proposed a local cross-channel interaction strategy without dimensionality reduction, effectively avoiding the impact of dimensionality reduction on channel attention learning. MS-FPN (Multi-scale Feature Pyramid Network) [28] adopted a pyramid structure to extract shallow and deep feature maps, adaptively learning and selecting important feature maps obtained from different scales, thereby improving detection and segmentation accuracy. However, none of the above methods effectively models the long-range dependencies required for visual tasks. The CA attention mechanism considers not only the relationships between channels but also the location information in the feature space. To compare the CA attention mechanism used in this article against other attention mechanisms within the original Deeplabv3+ network, we set up a comparative experiment in this section; the experimental results are shown in Table 1.
From Table 1, we can see that the model with CA attention performs much better than the models using other attention mechanisms while adding little training time. The experiments verify the performance of the CA attention mechanism, proving that it can enhance and improve the performance of the model. Figure 6 visualizes the segmentation results generated by models using the different attention methods; the red boxes indicate the areas where CA attention performs better than the other methods. Clearly, CA attention is more helpful than the other attention mechanisms in accurately delineating target boundaries.
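For readers unfamiliar with CA, a rough NumPy sketch of coordinate attention follows: the feature map is pooled along each spatial axis separately, so the resulting attention maps retain positional information along the other axis. Random projections stand in here for the learned 1 × 1 convolutions; this illustrates the mechanism only and is not the implementation used in the experiments:

```python
import numpy as np

def coordinate_attention(x, reduction=8, seed=0):
    # x: feature map of shape (C, H, W). Unlike SE, which pools over
    # both spatial axes at once, CA pools each axis separately so the
    # attention weights stay position-aware.
    C, H, W = x.shape
    pool_h = x.mean(axis=2)                 # (C, H): average over width
    pool_w = x.mean(axis=1)                 # (C, W): average over height
    rng = np.random.default_rng(seed)
    mid = max(C // reduction, 1)
    # Random matrices stand in for the shared learned 1x1 convolution.
    w1 = rng.standard_normal((mid, C)) * 0.1
    y = np.concatenate([pool_h, pool_w], axis=1)   # (C, H + W)
    y = np.maximum(w1 @ y, 0)                      # ReLU, (mid, H + W)
    w_h = rng.standard_normal((C, mid)) * 0.1
    w_w = rng.standard_normal((C, mid)) * 0.1
    a_h = 1 / (1 + np.exp(-(w_h @ y[:, :H])))      # (C, H) sigmoid gate
    a_w = 1 / (1 + np.exp(-(w_w @ y[:, H:])))      # (C, W) sigmoid gate
    return x * a_h[:, :, None] * a_w[:, None, :]   # reweighted features

x = np.random.default_rng(1).standard_normal((16, 8, 8))
out = coordinate_attention(x)
assert out.shape == x.shape
```

Because the two gates are indexed by row and column respectively, a feature can be suppressed at one location and emphasized at another, which channel-only attention such as SE cannot do.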

FPPM Model Parameter Experimental Results
According to the image quality evaluation algorithm in Section 3.2, this section verified by calculation that when the number of branches X in the FPPM was 3, the image quality decreased least, the result was closest to the target pixel value of the original feature image, and the feature extraction effect was best. To further verify this optimal value, a contrast experiment was set up after the ablation experiment, studying the influence of different branch numbers X on the prediction results of the model in this paper. Since the X = 1 model is equivalent to direct feature extraction, which is meaningless for the FPPM, this experiment started the value of X from 2, that is, X = 2, 3, 4, 5, 6. The results are shown in Table 2. Table 2 shows that the network prediction results obtained under the different settings are similar. When X = 3, the optimal value of mIoU_cls was obtained and the time consumption increased the least. The global accuracy of the model decreased as X increased, and the time consumption increased dramatically. When the number of branches exceeds three layers, the image information in the feature map is enlarged excessively and the marked targets in the image are divided into new targets that are difficult to recognize; this adds new noise interference to the segmentation and degrades the segmentation effect of the model. Therefore, the value X = 3 used in this paper is verified: when the feature map is divided into pyramid levels in the FPPM model, the three-layer division segments best. The contrast effect is shown in Figure 6. Figure 7 shows the results of SAR image processing by the FPPM model. The red boxes indicate the areas where this method performs better than other methods. From top to bottom, there are original SAR images, optical images, and output results with values of X from 2 to 6.
Figure 7 also shows that when the branch number X of the FPPM model is 3, the image segmentation result is closest to the label and the feature extraction effect is best. In the segmentation process, after the third pyramid division, the image feature pixel values are close to the target pixel values, which makes them easy to segment, and the image quality has not decreased significantly, so the segmentation is more refined than in the other control groups.
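The branch structure studied above can be sketched roughly as follows. This is a hypothetical illustration of an FPPM-style pyramid split, not the paper's code: branch l tiles the feature map into an l × l grid, applies a per-tile operation (a simple per-tile normalization stands in here for the paper's attention block), and the X branch outputs are merged by averaging:

```python
import numpy as np

def fppm_sketch(feat, X=3, branch_op=None):
    # Hypothetical FPPM-style post-processing. feat: (C, H, W).
    # Branch l splits the spatial extent into an l x l grid; each tile
    # is processed independently so local detail at several pyramid
    # scales contributes to the merged output.
    if branch_op is None:
        branch_op = lambda t: (t - t.mean()) / (t.std() + 1e-6)
    C, H, W = feat.shape
    outs = []
    for l in range(1, X + 1):
        out = np.empty_like(feat)
        rows = np.array_split(np.arange(H), l)
        cols = np.array_split(np.arange(W), l)
        for r in rows:
            for c in cols:
                tile = feat[:, r[0]:r[-1] + 1, c[0]:c[-1] + 1]
                out[:, r[0]:r[-1] + 1, c[0]:c[-1] + 1] = branch_op(tile)
        outs.append(out)
    return np.mean(outs, axis=0)   # merge the X branch outputs

feat = np.random.default_rng(0).standard_normal((8, 16, 16))
post = fppm_sketch(feat, X=3)
assert post.shape == feat.shape
```

The sketch also suggests why large X hurts: as the grid becomes finer, each tile carries fewer pixels, so per-tile statistics become noisy, consistent with the degradation observed in Table 2 for X > 3.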

Ablation Experiment
In this section, ablation experiments were carried out on the improvements proposed in this paper, and the experimental results prove their effectiveness. First, to verify the effectiveness of the CA attention mechanism module, the Deeplabv3+ model equipped with the CA module was compared experimentally with the original model, and its effectiveness was verified through analysis of the experimental results. Then, to prove the applicability of the focal loss function, a comparative experiment was carried out against the Deeplabv3+ model based on the cross-entropy loss function; the results show that, on data sets with unbalanced samples, the focal loss function significantly improves the segmentation effect compared with the cross-entropy loss function. Next, the influence of the improved ASPP module proposed in this paper on the performance of the Deeplabv3+ model is compared and, based on the above improvements, the effectiveness of the FPPM for SAR image semantic segmentation is further verified.

In Table 3, we list all the changes to the Deeplabv3+ network. '✓' indicates that the experimental group in this row used the method represented by this column, and a blank indicates that it did not. Figure 8 shows the results of SAR image processing by the model. The red boxes indicate the areas where this method performs better than other methods. From top to bottom, there are original SAR images, optical images, tag images, and the output results of the first five groups of ablation experiments. The experimental results show that the focal loss function and CA module are helpful for the processing tasks of the Deeplabv3+ network, and improving the ASPP module is also helpful for improving the processing speed of the model.

Results Comparison on Composite Images
To show the performance of the improved Deeplabv3+ model, the effect of the model on synthetic data is evaluated first. This section uses the model to process composite data generated by adding noise to the label image, which serves as the input image. This paper uses two traditional shallow image segmentation algorithms: the adaptive threshold method and the maximum inter-class variance method (Otsu algorithm). At the same time, four deep-learning-based methods are used: FCN, PSPNet, Deeplabv3+, and the improved Deeplabv3+ of this paper, and the segmentation results of the above six methods on synthetic images are compared. Finally, the output results of FCN, PSPNet, Deeplabv3+ and the improved Deeplabv3+ are compared with the original label map, and the results are shown in Table 4. Table 4 shows that, compared with the traditional shallow methods, the deep learning methods have higher segmentation accuracy: from the traditional methods to the deep learning methods, SAR image semantic segmentation accuracy makes significant progress, increasing by more than 30%. Among the deep learning methods, the accuracy and mIoU_cls of the improved Deeplabv3+ network model in this paper are improved to some extent over the previous algorithms. Compared with the original Deeplabv3+ network, the accuracy is improved by 2.44% and the IoU_cls4 by nearly 5%. Figure 9 shows the results of the models after processing the composite image. From top to bottom are the original SAR image, optical image, composite image, tag image, adaptive threshold method result, Otsu algorithm result, FCN network result, PSPNet network result, Deeplabv3+ network result, and the result of the improved network model in this paper.
It can be seen from the areas marked by the red boxes in Figure 9 that, after the improvement of the Deeplabv3+ network model, the model segments images more accurately, its ability to recognize small targets is greatly improved, the shape and contour of targets are recognized more accurately, and clear edges and corners are obtained for the segmentation of irregular targets. The background areas between different targets are also clearer. It can be seen that the SAR image semantic segmentation model based on the improved Deeplabv3+ network proposed in this paper can achieve a better semantic segmentation effect in the processing of composite images.
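For reference, the maximum inter-class variance baseline used above can be sketched in a few lines. This is a generic Otsu implementation over an 8-bit histogram, not the exact code used in the experiments:

```python
import numpy as np

def otsu_threshold(img):
    # Maximum inter-class variance (Otsu) threshold for 8-bit values:
    # pick t maximizing w0 * w1 * (mu0 - mu1)^2 over the histogram.
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    total = hist.sum()
    sum_all = float(np.dot(np.arange(256), hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]                      # pixels at or below t
        if w0 == 0:
            continue
        w1 = total - w0                    # pixels above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy image: the threshold should fall between the two modes.
img = np.concatenate([np.full(300, 60), np.full(300, 180)])
thr = otsu_threshold(img)
```

Being a single global threshold on intensity, such a method has no way to use texture or context, which is why it degrades badly on speckled SAR data compared with the deep models.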

Results on SAR Images
To verify that the improved Deeplabv3+ model proposed in this paper can improve the segmentation of SAR images, this section used the SAR images of Nanjing and its surrounding areas and carried out experiments with two traditional shallow image segmentation algorithms: the adaptive threshold method and the maximum inter-class variance method (Otsu algorithm). At the same time, semantic segmentation experiments were carried out with four deep-learning-based methods: FCN, PSPNet, Deeplabv3+, and the improved Deeplabv3+ model of this paper. Finally, the SAR image segmentation results of the above six methods were compared; the results are shown in Table 5. As can be seen from Table 5, first, the segmentation accuracy of the deep learning methods is more than 40% higher than that of the traditional algorithms, and they do not cause label errors. Second, among the deep learning algorithms, the Deeplabv3+ network is significantly better than the other networks in SAR image semantic segmentation, with dramatically improved accuracy and time consumption. The improved Deeplabv3+ network in this paper has higher accuracy still, 2.09% higher than the original Deeplabv3+ network. This is because the focal loss function used in this paper improves the backbone loss function of the network, reducing the impact of sample imbalance, and the FPPM post-processes the feature map output by the network, hierarchically filtering the feature map generated by the preceding network to obtain better fine image features. The results of the several network models are shown in Figure 10. The red boxes indicate the areas where this method performs better than the other methods.
From top to bottom, they are SAR images, optical images, label images, adaptive threshold method results, Otsu algorithm results, FCN network results, PSPNet network results, Deeplabv3+ network results, and the results of the method proposed in this paper. It can be seen from Figure 10 that the results of the SAR image semantic segmentation model based on the improved Deeplabv3+ network are closer to the actual label images, which further proves the effectiveness of this method in improving SAR image semantic segmentation. In recent years, many experts and scholars at home and abroad have tried to improve image semantic segmentation networks to achieve better results on the SAR image semantic segmentation task. Many of these improved methods have made good progress, such as the Lightweight Attention Model proposed by FU Guodong [29] and the CG-Net model proposed by Sun Y [30]. Compared with these methods, the improved method proposed in this paper uses more novel techniques for expanding feature information and can make fuller use of the data's feature information. The CBAM-Unet++ method presented by Zhao Z [31] in 2021 combines Unet++ and the convolutional block attention module based on the original CBAM, which makes it easier for the architecture to ignore irrelevant background information, and adds the original feature map and the channel attention output to the spatial attention network, thus improving the accuracy of image segmentation. However, this method is only a simple superposition of feature attention mechanisms; without multi-dimensional aggregation of multiple attention mechanisms, only shallow indicative features can be extracted, and it cannot be used in complex segmentation tasks with small sample sizes.
In this paper, the pyramid idea is used for the first time in the attention mechanism module to post-process the feature map and comprehensively capture image feature information, which also improves the segmentation effect when data are limited. From the perspectives of feature extraction ability, the change of loss function, and the efficiency of the ASPP module, the improved Deeplabv3+ network proposed in this paper is more practical. The above three experiments prove that the model presented in this paper performs best compared with the other models.

Conclusions
As one of the main data sources of a satellite remote sensing platform, SAR is widely used in geological exploration and other fields because its imaging is not affected by climate and other conditions. As the imaging result of SAR, SAR image processing effect is of great significance in urban construction control, surface vegetation classification and other aspects. However, due to the multiplicative speckle noise in the SAR image, it is more difficult to accurately identify than the optical image, so using the existing network model for optical image segmentation is not satisfactory. Moreover, the SAR image imaging technology is relatively cumbersome and there is little data available for training. The existing network model needs to be improved in order to achieve better segmentation accuracy. Focusing on the Deeplabv3+ network, this paper deeply studies the SAR image semantic segmentation technology and proposes a network model suitable for SAR image semantic segmentation with good performance. The main work completed is as follows: (1) Feature Post-Processing Module: Aiming at the problem that there is multiplicative speckle noise in the SAR image, which makes some slender targets such as roads and rivers in the polarimetric SAR image challenging to be accurately identified during semantic segmentation, this paper proposes a Feature Post-Processing Module (FPPM), as a supplement to the original Deeplabv3+ network feature extraction module, to capture the image feature map in-depth and refined, and then send the folded feature map into different branches in turn, undertake the feature and channel attention mechanism, and complete the deep capture of local information on the feature map. This paper also determines the number of branches of the FPPM through the MVG model based on the image quality evaluation algorithm and improves the model structure. 
Compared with the original Deeplabv3+ network, the mIoU cls improves by 3.63% and the segmentation accuracy improves by 1.09%.
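Because the exact FPPM architecture is not reproduced in this section, the following is only a hypothetical NumPy sketch of the underlying idea: the feature map is pooled to several pyramid scales, channel and then spatial attention are applied in each branch, and the branches are fused. The pooling factors, parameter-free sigmoid gates, and fusion by averaging are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def channel_attention(x):
    """Channel attention: global average pooling + sigmoid gate (SE-style sketch)."""
    w = x.mean(axis=(1, 2))                       # squeeze: one value per channel, (C,)
    gate = 1.0 / (1.0 + np.exp(-w))               # excitation (no learned weights here)
    return x * gate[:, None, None]

def spatial_attention(x):
    """Spatial attention: channel-wise mean map passed through a sigmoid gate."""
    m = x.mean(axis=0)                            # (H, W)
    gate = 1.0 / (1.0 + np.exp(-m))
    return x * gate[None, :, :]

def fppm(x, scales=(1, 2, 4)):
    """Hypothetical FPPM sketch: attention branches at several pyramid scales,
    upsampled back and fused by averaging. x has shape (C, H, W); H and W are
    assumed divisible by every scale factor."""
    C, H, W = x.shape
    outs = []
    for s in scales:
        # average-pool by factor s to form the pyramid level
        pooled = x.reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))
        att = spatial_attention(channel_attention(pooled))
        # nearest-neighbour upsample back to the input resolution
        up = att.repeat(s, axis=1).repeat(s, axis=2)
        outs.append(up)
    return np.mean(outs, axis=0)
```

The output keeps the input shape, so the module can be dropped in after any feature extractor without changing the decoder that follows it.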
(2) Improvement of loss function: SAR imaging technology is complicated and the data processing is complex. Therefore, the SAR image data sets that can be used in the segmentation field are limited and the distribution of feature labels in most data sets is highly uneven, which significantly impacts the segmentation effect. To solve this problem, this paper uses the focal loss function, which is designed for unbalanced samples, to optimize the original Deeplabv3+ network and mitigate the performance degradation caused by the uneven data set. By adjusting the internal weighting of the loss function, the contribution of easy category samples to training is reduced and the mIoU cls is increased by 1.01%.
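The focal loss has the standard form FL(p_t) = −α_t (1 − p_t)^γ log(p_t): the modulating factor (1 − p_t)^γ down-weights well-classified pixels so training concentrates on hard, rare classes. A minimal NumPy sketch of the per-pixel binary case follows; the multi-class form used for segmentation sums this over classes, and the `alpha`/`gamma` values are the common defaults, not necessarily those used in the paper's experiments.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Per-pixel binary focal loss.
    p: predicted probability of the positive class, y: ground-truth label (0 or 1)."""
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of easy (high-confidence) samples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With γ = 2, a confidently correct pixel (p_t = 0.9) contributes roughly three orders of magnitude less loss than a badly misclassified one (p_t = 0.1), which is exactly the rebalancing effect described above; setting γ = 0 recovers α-weighted cross-entropy.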
(3) Improvement of ASPP module: Because the convolution method of the ASPP module in the original Deeplabv3+ network is too simple and direct, the information learned in the convolution layers can overlap, bringing a large number of redundant parameters and adding unnecessary time to model training. To address this problem, this paper decomposes the 3 × 3 atrous convolution in the ASPP module in 2D, replacing it with parallel stacked 3 × 1 and 1 × 3 convolutions. This decomposition keeps the dilation rate of the module unchanged while reducing the number of parameters by about 33% and the model's training time by 19 ms.
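The ~33% figure follows directly from the kernel sizes: a 3 × 3 kernel has nine weights per input–output channel pair, while a 3 × 1 and a 1 × 3 kernel together have six, and 6/9 = 2/3. A short sketch of the count (the channel width of 256 is an assumption for illustration; the ratio is independent of it):

```python
def conv_params(kh, kw, c_in, c_out, bias=False):
    """Weight count of a single 2D convolution layer with a kh x kw kernel."""
    n = kh * kw * c_in * c_out
    if bias:
        n += c_out
    return n

c = 256  # illustrative channel width for an ASPP branch
full = conv_params(3, 3, c, c)                          # standard 3x3 atrous branch
decomposed = conv_params(3, 1, c, c) + conv_params(1, 3, c, c)
saving = 1.0 - decomposed / full                        # fraction of parameters removed
```

Since dilation only spreads the kernel taps apart without adding weights, the same count applies to the atrous case, so the parallel 3 × 1 / 1 × 3 pair keeps each branch's dilation rate while cutting its parameters by one third.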
To sum up, the improved Deeplabv3+ network model proposed in this paper achieves good results in the SAR image semantic segmentation task, improving the segmentation accuracy of the original model by 1.09% and the mIoU cls by 4.64%. However, because a pre-trained model is used for training, the proposed module currently only optimizes the feature map extracted by the preceding network. Whether the attention mechanism module can be added to the backbone network of the Deeplabv3+ model, and whether an attention module can be added to the overall network to re-optimize performance while ensuring that the network does not overfit, is worth further exploration in the future.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are also used for other experimental studies.