AMFA-DeepLab: An Improved Lightweight DeepLabV3+ Adaptive Multi-Statistic Fusion Attention Network for Sea Ice Segmentation in GaoFen-1 Images

Hao, Zengzhou; Li, Xin; Zhu, Qiankun; Li, Yunzhou; Mao, Zhihua; Chen, Jianyu; Pan, Delu

doi:10.3390/rs18050783


by Zengzhou Hao ¹,²,³,*, Xin Li ¹,⁴, Qiankun Zhu ¹, Yunzhou Li ²,³, Zhihua Mao ¹, Jianyu Chen ¹ and Delu Pan ¹,³

¹ State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China

² Institute of Oceanographic Instrumentation, Qilu University of Technology (Shandong Academy of Sciences), Qingdao 266061, China

³ Academician Workstation of Shandong Province, Shandong Academy of Sciences, Jinan 250014, China

⁴ School of Information Engineering, Zhejiang Ocean University, Zhoushan 316022, China

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(5), 783; https://doi.org/10.3390/rs18050783

Submission received: 9 January 2026 / Revised: 14 February 2026 / Accepted: 2 March 2026 / Published: 4 March 2026

(This article belongs to the Topic AI for Natural Disasters Detection, Prediction and Modeling)


Highlights

What are the main findings?

By adopting MobileNetV2 as the backbone of DeepLabV3+, the model becomes extremely lightweight, with only 5.85 million parameters, and reaches a fast inference speed of 281.76 frames per second.
By constructing an innovative adaptive multi-statistic fusion attention (AMFA) module, the model dynamically enhances image edge features while suppressing background interference for sea ice segmentation in GaoFen-1 images.

What are the implications of the main findings?

A new AMFA-DeepLab is proposed by integrating the MobileNetV2 and AMFA modules in the DeepLabV3+ network model and is applied for sea ice segmentation in GaoFen-1 Images.
The AMFA-DeepLab achieves high-precision segmentation of sea ice under complex marine conditions, with precision, recall, F1-score, and intersection over union reaching 94.82%, 97.03%, 95.91%, and 92.15%, respectively. It also maintains extremely low model complexity (only 10.7% of that of the DeepLabV3+ model) while reducing training time by nearly 20%.

Abstract

To address the difficulty of extracting fine details and the low operating efficiency of large-area sea ice monitoring with wide-field-of-view images from the Chinese GaoFen-1 satellite, a lightweight, high-precision sea ice segmentation network that adds an adaptive multi-statistic fusion attention (AMFA) module to the DeepLabV3+ base architecture (AMFA-DeepLab) is proposed. First, the backbone network is replaced with the lightweight MobileNetV2, whose inverted residuals and depthwise separable convolutions preserve feature extraction capability while greatly reducing computational complexity. Second, to address the blurring of fragmented ice texture and speckle noise interference in optical images, an AMFA module is designed and introduced on the decoder side. This module integrates a global median pooling branch and adaptively recalibrates feature weights through a dynamic channel mixing mechanism, effectively enhancing the model’s ability to capture fine sea ice edge features and its noise robustness in complex backgrounds. Experimental results on a dataset from Liaodong Bay in the Bohai Sea of China show that the intersection over union of AMFA-DeepLab reaches 92.15% and the F1-score reaches 95.91%, increases of 3.06% and 1.68%, respectively, over the baseline model. In addition, the model needs only 5.85 million parameters, the training time is shortened to 4.42 h, and the inference speed is 281.76 frames per second. Visual analysis and generalization tests further demonstrate that the model can accurately eliminate clutter interference from coastal land and seawater and extract the fine filamentous structure of drift ice in complex melting-ice scenes. This research overcomes the precision bottleneck while keeping the model extremely lightweight, providing efficient technical support for operational dynamic monitoring of sea ice disasters based on the Chinese GaoFen-1 satellite.

Keywords:

sea ice; lightweight network; DeepLab; attention module; GaoFen-1

1. Introduction

Changes in sea ice occupy an important position in the global climatic system. Sea ice not only influences global climatic patterns but also has far-reaching impacts on the ecosystem and shipping safety. On a global scale, China is one of the most impacted countries suffering from sea ice disasters. The occurrence of such disasters has close correlations with environmental conditions such as seawater temperature and sea area climate [1]. Affected by the dynamic features and potential risks of sea ice, some sailing activities face severe challenges that pose threats to sailing safety and ship operating efficiency [2]. Moreover, in the field of marine fisheries, floating sea ice on the sea may destroy aquaculture facilities in nearshore areas, thus resulting in the death of fish and shrimp owing to insufficient dissolved oxygen in the water [3]. Consequently, accurately and efficiently obtaining information on sea ice distribution has a key practical value for safeguarding marine shipping, fishery aquaculture, and the marine ecological environment. At present, the core technical means for sea ice monitoring include on-site icebreaker cruising, deployment of coastal observation stations, and satellite remote sensing. However, because sea ice is mostly distributed in high-latitude remote sea areas, the extreme weather conditions and complex marine conditions make traditional monitoring means costly and high-risk. With its rapid development, satellite remote sensing technology has become a mainstream means for sea ice monitoring in large areas owing to its wide coverage, excellent temporal resolution, and rapid data acquisition.

The development of sea ice remote sensing image recognition technology has passed through three stages: traditional image processing, shallow machine learning, and deep learning. Each approach has its advantages and disadvantages with respect to feature extraction and applicable scenes. In the early stages, traditional image processing methods primarily relied on the physical differences in grayscale, texture, and spectrum between sea ice, seawater, and land to achieve extraction through threshold segmentation or edge detection. For example, Liu et al. [4] and Zou et al. [5] completed the classification of sea ice in the Arctic and the Antarctic, respectively, by using differences in the backscattering coefficient and texture. Li et al. [6] enhanced the sea ice drift monitoring capability of Synthetic Aperture Radar (SAR) images through texture feature screening. Such methods have an intuitive physical meaning and are computationally convenient, but their threshold settings rely heavily on human experience, making it difficult to adapt to complex and varied marine and lighting conditions. Afterward, shallow machine learning methods represented by support vector machines and random forests gradually emerged. Zhou et al. [7] extracted sea ice parameters using FY-3 data by combining spectral and texture features. Li et al. [8] introduced a random forest algorithm to achieve automatic classification of sea ice under clouds with FY-4A data. Li et al. [9] achieved accurate extraction of sea ice in Liaodong Bay based on HY-1C/D data using multi-channel feature rules. However, such methods still need sophisticated manual feature engineering, and shallow models have difficulty capturing high-dimensional semantic information. Consequently, their generalization capability is still limited in complex scenes with cloud and fog interference or vague sea ice–seawater boundaries.

In recent years, the breakthrough of deep learning technology has brought about a paradigm shift in remote sensing interpretation. Classic architectures such as fully convolutional networks [10] and U-Net [11] significantly improved sea ice recognition precision through their end-to-end feature-learning capability. Many scholars have since proposed modified models for different data sources. Owing to their advantage of all-weather observation, SAR images are the most widely researched. For example, Xu et al. [12] optimized the U-Net hybrid loss function for Sentinel-1 dual-polarized SAR data. Zhang et al. [13] constructed MSI-ResNet based on Gaofen-3 (GF-3) fully polarized SAR images. Dai et al. [14] proposed an improved IceDeepLab network for Arctic sea ice classification, demonstrating the effectiveness of the DeepLabV3+ architecture in processing large-scale SAR imagery. Sun et al. [15] validated the effectiveness of DeepLabV3+ in SAR images of the Greenland Sea. For hyperspectral and multispectral images, Han et al. [16,17] explored deep spectral features of sea ice in Baffin Bay using three-dimensional convolutional and multiscale feature fusion networks. Furthermore, with regard to modifying general-purpose models, Zhou et al. [18] and Zheng et al. [19] improved the performance of YOLO and U-Net++, respectively, in polar sea ice monitoring by introducing an attention module and modifying the convolution module.

Although many research results have been attained thus far, the adaptation between data sources and algorithms still lags evidently in deep-learning-based sea ice monitoring. Current deep learning models mostly focus on SAR data or a few optical satellite data sources, whereas research on models customized to the wide-field-of-view (WFV) sensors of the Chinese GF-1 satellite is relatively scarce. The GF-1 WFV sensors acquire wide-swath images at 16 m resolution, combining large coverage with fine texture, which makes them very suitable for sea ice disaster monitoring in nearshore aquaculture areas.

Nevertheless, in contrast to SAR data, which record backscattering, GF-1 data have the following characteristics: (1) sea ice in GF-1 high-resolution optical images is easily obscured by clouds and fog; (2) thin ice and seawater are spectrally confused; and (3) the fractal features of sea ice edges and the details of fragmented ice are more complex owing to the higher resolution. If existing models designed for SAR or medium- and low-resolution images are used directly, problems such as edge blur and missed detection of tiny floating ice blocks typically occur.

Among the numerous deep learning models, the DeepLabV3+ [20] model exhibits an extremely high potential for solving the above optical sea ice recognition problems owing to its unique structural advantage. Sea ice has a significant multiscale feature in remote sensing images, as they contain not only large-area fixed ice blocks but also fragmented tiny floating ice blocks. The Atrous Spatial Pyramid Pooling (ASPP) module introduced in the DeepLabV3+ model can effectively capture multiscale contextual information through atrous convolutions with different dilation rates. In addition, the encoder–decoder architecture of the model can effectively restore the spatial resolution, and it can better preserve the fine irregular boundaries of sea ice compared with the traditional fully convolutional network architecture. Drawing on its architectural advantages, DeepLabV3+ has exhibited considerable potential for the fine-grained segmentation of diverse and complex terrestrial features. For example, Chen et al. [21] incorporated an axial attention mechanism into DeepLabV3+ to enhance the accuracy of oil spill detection in SAR imagery, while Feng et al. [22] and Yang et al. [23] further validated the efficacy of improved DeepLabV3+ architectures—through the introduction of multiscale attention and cross-scale feature interaction modules, respectively—in applications such as tailings pond identification and crop disease monitoring. Collectively, these studies underscore the strong capability of DeepLabV3+ as a baseline model for handling intricate textures and multiscale targets.

However, the DeepLabV3+ model still faces two key defects when it is directly applied to GF-1 WFV high-resolution images for operational monitoring. The first is the contradiction between the computational redundancy of the model and the demand for a lightweight design. The original DeepLabV3+ model usually employs Xception [24] as its backbone network, which is characterized by numerous parameters and high computational complexity. Sea ice monitoring typically has an extremely high requirement for timeliness, yet the wide swath of GF-1 WFV images greatly increases the data volume, and such a heavy model makes it hard to achieve rapid processing of large-area images under limited computational power. The second is the missed detection of small targets caused by insufficient utilization of channel features. In GF-1 images, fragmented ice and thin ice exhibit little difference from the background seawater in terms of spectral features. The ASPP module of DeepLabV3+ mainly focuses on multiscale fusion in the spatial dimension and neglects the difference in feature response in the channel dimension. Without an effective channel attention module, the model cannot automatically suppress the response of noisy channels and has difficulty focusing on the key channels that express fragmented ice texture, so it ultimately confuses sea ice with the complex background.

Therefore, to meet the dual requirements of sea ice detail extraction precision and real-time processing for GF-1 WFV data, a lightweight, high-precision sea ice segmentation network that adds an AMFA module to the DeepLabV3+ base architecture (AMFA-DeepLab) is proposed. The two main modifications are the following: (1) For computational efficiency, a lightweight MobileNetV2 network is used to replace the original computation-intensive Xception backbone network. With its inverted residuals and depthwise separable convolution, the network greatly reduces the number of model parameters and GPU memory usage while preserving the efficient extraction capability for multiscale features of sea ice. (2) To address the difficult problem of fine feature recognition, an AMFA module is designed and introduced in the decoder feature fusion stage. In contrast to a routine single pooling strategy, the module integrates global average, maximum, and median pooling and generates adaptive channel weights to recalibrate the features through a dynamic mixing mechanism. In particular, the introduced median pooling significantly enhances the robustness of the model against speckle noise in remote sensing images so that the model can better capture fine sea ice textures and effectively suppress the interference from complex backgrounds. Through the above modifications, AMFA-DeepLab achieves a favorable balance between high precision and light weight, providing effective technical support for operational dynamic monitoring of large-area sea ice based on Chinese high-resolution satellites.

2. Materials and Methods

2.1. Base Architecture of DeepLabV3+ and Its Limitations

DeepLabV3+ is a highly representative encoder–decoder framework used in image semantic segmentation. It builds on DeepLabV3, retaining the ASPP module and adding a decoder. The ASPP module efficiently captures multiscale contextual information through atrous convolutions with different dilation rates, which helps to distinguish sea ice blocks that differ greatly in size. In addition, the decoder module can better restore sea ice edge details by fusing low-level spatial features and high-level semantic features. However, the original DeepLabV3+ usually uses Xception as its backbone network, and it has evident limitations in large-area sea ice monitoring based on GF-1 WFV data. Because the Xception network is characterized by numerous parameters and high computational complexity, meeting the strict requirement for real-time capability in emergency monitoring of sea ice disasters is challenging. Moreover, the Xception network does not adaptively weight its feature channels; consequently, segmentation precision easily decreases because of noise accumulation in complex marine conditions with cloud and fog interference.
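The effect of the dilation rate on the spatial extent of an atrous convolution can be illustrated with a short calculation. The sketch below is only an illustration: the rates (1, 6, 12, 18) are assumed as commonly used ASPP settings and are not stated in this paper.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Spatial extent (in feature-map pixels) covered by one atrous kernel.

    A k x k kernel with dilation rate r inserts r - 1 gaps between taps,
    so it spans k + (k - 1) * (r - 1) pixels along each axis.
    """
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


# Assumed ASPP dilation rates (not given in the text).
assumed_rates = (1, 6, 12, 18)
extents = [effective_kernel_size(3, rate) for rate in assumed_rates]
for rate, extent in zip(assumed_rates, extents):
    print(f"rate {rate:2d}: 3 x 3 kernel spans {extent} x {extent} pixels")
```

Each branch uses the same nine weights per filter, so the wider context comes at no extra parameter cost.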

2.2. The Overall Framework of the AMFA-DeepLab Network

The modification of the DeepLabV3+ network in this study is primarily embodied in the following two aspects: (1) The Xception backbone network is replaced with a lightweight MobileNetV2 module. With its inverted residuals built on linear bottleneck layers, MobileNetV2 significantly reduces the number of parameters and the computational complexity of the model while ensuring sea ice feature extraction precision. (2) An AMFA module is introduced after the fusion of the high- and low-level features in the decoder. On the basis of traditional average pooling and maximum pooling, this module adds a global median pooling branch to counter the speckle noise interference common in sea ice remote sensing images, and it adaptively adjusts the weights of all statistics through dynamic channel mixing to refine the fused feature maps a second time. Experimental results show that the AMFA module can effectively suppress incorrect activation of non-ice areas, enhancing the capability of the model for capturing fragmented ice and edge details. The modified model provides more accurate solutions for large-area sea ice monitoring while preserving efficient inference. The overall architecture of the AMFA-DeepLab network model proposed here is shown in Figure 1.

2.2.1. MobileNetV2 Module

To balance the precision and computational efficiency of the model in sea ice feature extraction, we selected the MobileNetV2 module as the backbone network. MobileNetV2 is a lightweight convolutional neural network specially designed for mobile terminals, and its core innovation lies in inverted residuals with linear bottleneck layers. As shown in Figure 2, this architecture takes two forms depending on the convolution stride, each of which contains three key feature transformation stages. The first is the dimension expansion stage, in which the feature channels are expanded using 1 × 1 convolution and the low-dimensional features are mapped to a high-dimensional space with the ReLU6 activation function. The second is the depthwise convolution stage, in which spatial features are extracted channel by channel through 3 × 3 convolution kernels; together with the 1 × 1 convolutions, this forms a depthwise separable convolution that greatly reduces the number of parameters and the computational cost. The third is the dimension reduction stage, in which the features are projected to a low-dimensional space through 1 × 1 convolution. This architecture removes the nonlinear activation in the final dimension reduction layer, which thus forms a linear bottleneck layer that avoids information loss. Furthermore, when the stride is 1 (the upper half in Figure 2), the architecture introduces a shortcut connection similar to that of ResNet, adding the input and output element by element to form the inverted residual. When the stride is 2 (the lower half in Figure 2), downsampling is conducted, and the shortcut connection is not used. Because fine features such as thin ice texture and fragmented ice edges are relatively fragile, the linear bottleneck layer and inverted residuals help preserve the edge details and texture structures of sea ice as far as possible, enhancing the ability of the model to resolve complex ice conditions.
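To make the parameter saving concrete, the following sketch counts convolution weights for the three stages described above. The channel widths (32 and 192) and the expansion factor of 6 are illustrative assumptions rather than values reported in this paper, and biases and batch-normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights of a k x k depthwise convolution plus a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution mixing the channels
    return depthwise + pointwise


def inverted_residual_params(c_in, c_out, expansion=6, k=3):
    """Weights of one block: 1 x 1 expand, k x k depthwise, 1 x 1 project."""
    hidden = c_in * expansion
    expand = c_in * hidden     # dimension expansion stage
    depthwise = k * k * hidden # channel-by-channel spatial filtering
    project = hidden * c_out   # linear bottleneck (dimension reduction)
    return expand + depthwise + project


standard = standard_conv_params(192, 192)
separable = depthwise_separable_params(192, 192)
block = inverted_residual_params(32, 32)
print(standard, separable, block)
```

For these assumed widths, the separable layer needs 38,592 weights against 331,776 for the standard 3 × 3 layer, and a full inverted residual block with a 32-channel bottleneck needs 14,016.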

2.2.2. AMFA Module

A channel attention module is generally used to improve model sensitivity to key features by explicitly modeling the interdependence between channels. The classic Squeeze-and-Excitation module [25] captures background statistical information using only global average pooling, whereas the convolutional block attention module [26] further introduces global maximum pooling to enhance the extraction of salient features. Nevertheless, in processing high-resolution sea ice remote sensing images, these two methods have two shortcomings. The first is that high-frequency noise from light reflected by sea surface waves easily interferes with maximum pooling, resulting in incorrect activation of non-ice areas. The second is that existing feature fusion mostly uses simple addition or concatenation and cannot dynamically adjust the weights of different statistics based on image content.

Consequently, we propose an AMFA module to reduce the impact of these two shortcomings. The AMFA module follows a three-stage architecture of complementary statistics extraction, nonlinear feature enhancement, and adaptive dynamic fusion. Specifically, this module is embedded into a node following the deep and shallow feature fusion stage of the DeepLabV3+ decoder. The fused features will contain the high-level semantic information and the low-level spatial details. Applying an attention module here can refine the pixel-level boundaries between sea ice and seawater while most effectively suppressing shallow-feature background noise. The structure of the AMFA module is shown in Figure 3.

When the input feature map after decoder fusion is $X \in \mathbb{R}^{B \times C \times H \times W}$, the AMFA module first compresses the spatial dimension through three parallel global pooling operations, generating three channel descriptors, which separately capture the distribution characteristics of the feature map on different statistical dimensions. The global average pooling ($F_{avg}$), extracting the smooth statistical information of the background, reflects the overall activation level of the feature channel. The global maximum pooling ($F_{max}$), extracting the most significant texture and edge features, focuses on the key target area. The global median pooling ($F_{med}$) aims at the distribution of salt-and-pepper noise specific to sea ice images. Unlike global average and maximum pooling, the global median pooling has natural statistical robustness against isolated high-frequency noises while preserving the representative central trend of feature distribution. In this study, we innovatively introduce median pooling to effectively filter isolated high-frequency noise, thereby compensating for the shortcoming that maximum pooling is sensitive to noise. Afterward, the above three descriptors perform nonlinear mapping separately through a Multi-Layer Perceptron (MLP), with weights shared to yield transformed high-dimensional feature vectors, $V_{avg}, V_{max}, V_{med} \in \mathbb{R}^{B \times C \times 1 \times 1}$. The weight sharing ensures that the three statistical channels transform in the same mapping space, ensuring fairness in subsequent fusion and avoiding the introduction of too many additional parameters.
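The different noise sensitivity of the three descriptors can be seen on a toy example. The sketch below uses hypothetical activation values for a single 3 × 3 channel and is not taken from the paper's data; it only illustrates why the median is stable when one isolated pixel is corrupted.

```python
from statistics import mean, median


def channel_descriptors(channel):
    """Return (average, maximum, median) of one flattened H*W channel."""
    return mean(channel), max(channel), median(channel)


# Hypothetical weak background responses of one 3 x 3 channel.
clean = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13, 0.11, 0.10, 0.12]

# Same channel with one isolated high-amplitude pixel (noise spike).
noisy = clean.copy()
noisy[4] = 5.0

avg_clean, max_clean, med_clean = channel_descriptors(clean)
avg_noisy, max_noisy, med_noisy = channel_descriptors(noisy)

print("average:", avg_clean, "->", avg_noisy)  # pulled up by the spike
print("maximum:", max_clean, "->", max_noisy)  # replaced by the spike
print("median: ", med_clean, "->", med_noisy)  # unchanged
```

In this example a single outlier takes over the maximum and shifts the average, whereas the median descriptor stays the same.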

To achieve adaptive selection of features, we designed a dynamic channel mixer. In contrast to the static summation strategy of a traditional method, the strategy of this mixer is to dynamically generate normalized weights by learning the interrelationships of the three statistics. First, the mixer stacks the original three pooling descriptors along a new statistic dimension and captures the dependence between different statistics through a one-dimensional (1D) convolutional layer. Second, it uses the Softmax function to perform the normalization on the statistic dimension, generating adaptive weight vectors:

$$W = \mathrm{Conv1D}(\mathrm{Stack}(F_{avg}, F_{max}, F_{med})), \tag{1}$$

$$[\alpha, \beta, \gamma] = \mathrm{Softmax}(W), \tag{2}$$

where $\alpha, \beta, \gamma \in \mathbb{R}^{B \times C}$ represent the importance weights of the average, maximum, and median features on the current channel, respectively, and satisfy $\alpha + \beta + \gamma = 1$. Normalization constraints ensure that the fused features maintain a consistent numerical scale, preventing uncontrolled increase in feature amplitude. Finally, it uses the generated dynamic weights to perform weighted fusion of the MLP-transformed features, generating the final channel attention map, $M_{c}$, through the Sigmoid activation function, achieving recalibration of the original feature $X$:

$$M_{c} = \sigma(\alpha \cdot V_{avg} + \beta \cdot V_{max} + \gamma \cdot V_{med}), \tag{3}$$

$$Y = M_{c} \otimes X, \tag{4}$$

where $\sigma$ is the Sigmoid function and $\otimes$ denotes element-by-element multiplication.
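Equations (1) to (4) can be traced for a single sample with a small standard-library sketch. Several details here are assumptions made purely for illustration: the Conv1D mixer is stood in for by a 3 × 3 matrix (equivalent to a kernel size of 1, which the text does not specify), the shared MLP is replaced by the identity, and all numerical values are hypothetical.

```python
import math
from statistics import mean, median


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def amfa_attention(f_avg, f_max, f_med, v_avg, v_max, v_med, mix):
    """Per-channel attention following Equations (1)-(3).

    f_*: pooled descriptors, one float per channel.
    v_*: the descriptors after the shared MLP (taken as given here).
    mix: 3 x 3 matrix standing in for the Conv1D mixer (assumption).
    """
    attention, weights = [], []
    for c in range(len(f_avg)):
        stats = (f_avg[c], f_max[c], f_med[c])
        # Equation (1): mix the three stacked statistics of this channel.
        w = [sum(mix[i][j] * stats[j] for j in range(3)) for i in range(3)]
        # Equation (2): normalize over the statistic dimension.
        alpha, beta, gamma = softmax(w)
        weights.append((alpha, beta, gamma))
        # Equation (3): weighted fusion followed by the Sigmoid.
        fused = alpha * v_avg[c] + beta * v_max[c] + gamma * v_med[c]
        attention.append(sigmoid(fused))
    return attention, weights


def recalibrate(x, attention):
    """Equation (4): scale every pixel of channel c by attention[c]."""
    return [[a * p for p in channel] for channel, a in zip(x, attention)]


# Toy input: two channels of four pixels each (hypothetical values).
x = [[0.2, 0.4, 0.1, 0.3], [1.0, 0.0, 0.5, 0.5]]
f_avg = [mean(ch) for ch in x]
f_max = [max(ch) for ch in x]
f_med = [median(ch) for ch in x]
identity_mix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Identity stand-in for the shared MLP: V equals F.
attention, weights = amfa_attention(
    f_avg, f_max, f_med, f_avg, f_max, f_med, identity_mix
)
y = recalibrate(x, attention)
print(weights)
print(attention)
```

Because of the Softmax, the three weights of every channel sum to 1, so the fused value stays within the range spanned by the three transformed descriptors before the Sigmoid is applied.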

In summary, the three mechanisms in the AMFA module contribute to the improvement of model performance at different levels. Firstly, the global median pooling enhances noise robustness by filtering out isolated high-frequency noise, which mitigates the false activation caused by global max pooling in sea ice segmentation. Secondly, the shared MLP achieves nonlinear feature enhancement across statistical measures with minimal parameter overhead. Thirdly, the dynamic channel mixer enables the network to adaptively emphasize the most informative statistics on each channel based on image content and overcomes the limitations of traditional fixed fusion strategies. Through the above processing, the AMFA module can preserve the key texture features of sea ice as far as possible while suppressing background noise, thus significantly improving the robustness of the model in complex marine conditions.

2.2.3. Loss Function

In sea ice semantic segmentation tasks, there is typically a dramatic imbalance of pixel quantity between sea ice (foreground) and seawater (background); consequently, the traditional single-pixel-level loss function easily leads to overfitting of the model to the background. To solve this problem, a hybrid loss function constructed from a weighted combination of binary cross entropy (BCE) and the Dice coefficient [27] was used in this study. The total loss function $L_{total}$ is defined as follows:

$$L_{total} = L_{BCE} + \lambda L_{Dice}, \tag{5}$$

where $\lambda$ is a hyperparameter used to balance the weights of the two loss terms, and it is set to 1.0 in this study. BCE loss ($L_{BCE}$) is used to measure pixel-level classification accuracy, ensuring the convergence stability of the model. Owing to its well-behaved log-likelihood gradient, BCE loss provides stable and reliable gradients throughout the entire training process, which is particularly important for compensating for the gradient instability that Dice loss may exhibit during early training stages. It can be calculated as follows:

$$L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i} \cdot \log(\hat{y}_{i}) + (1 - y_{i}) \cdot \log(1 - \hat{y}_{i}) \right], \tag{6}$$

where $N$ denotes the total number of pixels in the batch, $y_{i} \in \{0, 1\}$ denotes the true label of the $i$-th pixel (where 0 represents the background and 1 represents the sea ice), and $\hat{y}_{i} \in [0, 1]$ denotes the probability with which the model predicts this pixel as sea ice.

Dice loss ($L_{Dice}$) originates from the field of medical image segmentation. It is used to directly optimize the overlap degree of sets and has high robustness for foreground–background class imbalance. Since Dice loss evaluates the overlap ratio between predicted and ground truth regions rather than relying on absolute pixel counts, it is inherently invariant to class proportion changes and can automatically strengthen the learning signal for the minority class regardless of the ice-water ratio in different images. It can be calculated as follows:

$$L_{Dice} = 1 - \frac{2 \sum_{i=1}^{N} y_{i} \hat{y}_{i} + \epsilon}{\sum_{i=1}^{N} y_{i} + \sum_{i=1}^{N} \hat{y}_{i} + \epsilon}, \tag{7}$$

where the sum in the numerator measures the intersection of the predicted map and the ground truth map, the denominator is the sum of the sizes of the two maps, and $\epsilon$ is a smoothing term used to prevent the denominator from being zero (and is generally taken as 1 × 10⁻⁵). Through the combined optimization of the above two losses, the model can focus on pixel-level classification correctness and ensure the overall geometric integrity of the sea ice area. Therefore, BCE loss ensures convergence stability, while Dice loss provides ratio-invariant overlap optimization. The two losses form an effective complementary pair under equal weighting ($\lambda = 1.0$), providing sufficiently robust optimization for the current sea ice segmentation without the need for additional dynamic weighting mechanisms.
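A direct transcription of Equations (5) to (7) on flat lists of pixels is sketched below. The clipping constant used to avoid log(0) is an assumption added for numerical safety and is not part of the stated formulas; the example labels and predictions are hypothetical.

```python
import math


def bce_loss(y_true, y_pred, clip=1e-7):
    """Equation (6); predictions are clipped to avoid log(0) (assumption)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, clip), 1.0 - clip)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def dice_loss(y_true, y_pred, eps=1e-5):
    """Equation (7) with the smoothing term eps."""
    intersection = sum(y * p for y, p in zip(y_true, y_pred))
    return 1.0 - (2.0 * intersection + eps) / (sum(y_true) + sum(y_pred) + eps)


def total_loss(y_true, y_pred, lam=1.0):
    """Equation (5) with lambda = 1.0 as used in this study."""
    return bce_loss(y_true, y_pred) + lam * dice_loss(y_true, y_pred)


# Hypothetical imbalanced patch: 1 ice pixel among 100.
labels = [1] + [0] * 99

# A prediction that calls almost everything background.
all_background = [0.01] * 100
bce_bg = bce_loss(labels, all_background)
dice_bg = dice_loss(labels, all_background)

# A perfect prediction.
loss_perfect = total_loss(labels, [float(v) for v in labels])

print(f"BCE  (all background): {bce_bg:.4f}")
print(f"Dice (all background): {dice_bg:.4f}")
print(f"total (perfect):       {loss_perfect:.6f}")
```

In this toy case the BCE term stays small even though the only ice pixel is missed, whereas the Dice term is close to 1, which illustrates why the two terms are complementary under class imbalance.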

2.3. Network Training

2.3.1. Dataset

The experimental data selected for this study were from the WFV sensors carried by the Chinese GF-1 satellite, and the study area was Liaodong Bay in the Bohai Sea, China. This area faces a considerable risk of sea ice disasters and has a long icing period in winter and a wide distribution of drift ice, making it an ideal scene for verifying the robustness of the sea ice recognition algorithm (see Figure 4). To ensure the spatial-temporal diversity and representativeness of the dataset, wide-swath GF-1 WFV L1A original images acquired in January and February from 2014 to 2016 were used. The selected images cover a variety of typical sea ice conditions, from thick ice in the icing period to broken ice floes in the melting period. Because the GF-1 WFV sensors have a 16 m spatial resolution and an 800 km coverage width, they can effectively capture extremely rich and heterogeneous sea ice texture features, which is sufficient to meet the training needs of deep learning. To ensure the repeatability of the study, a standardized preprocessing procedure was implemented. First, in the data preprocessing stage, the red, green, and blue bands of the WFV images were selected to synthesize true color images, so as to preserve the most intuitive visual texture features of sea ice. Second, to construct high-quality training labels, the sea ice area was defined as the foreground (with a pixel value of 1), and the seawater, land, and cloud layer were collectively defined as the background (with a pixel value of 0). The annotation strategy combined semiautomatic threshold extraction, supplemented by morphological operations for initial segmentation, with manual checking by experts, thus building pixel-level binary semantic segmentation ground truth. Third, the original remote sensing images were processed using an overlapping sliding window cropping strategy. The crop size and sliding step were set to 512 × 512 and 256 pixels, respectively. Finally, 3066 pairs of image-label samples in total were constructed and divided by random sampling at a ratio of 7:2:1 into three sets (i.e., a training set containing 2146 pairs, a validation set containing 613 pairs, and a test set containing 307 pairs), providing a data basis for subsequent model training and performance evaluation.
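The cropping and splitting steps can be sketched as follows. How partial windows at the image border were handled and how the 7:2:1 shares were rounded are not stated in the text; the sketch assumes that incomplete border windows are discarded and that the last subset receives the remainder.

```python
def window_origins(length, size=512, stride=256):
    """Start offsets of overlapping crops along one image axis.

    Assumes windows that would extend past the border are discarded.
    """
    if length < size:
        return []
    return list(range(0, length - size + 1, stride))


def split_counts(total, ratios=(7, 2, 1)):
    """Split `total` samples by ratio; the last subset gets the remainder."""
    whole = sum(ratios)
    train = round(total * ratios[0] / whole)
    val = round(total * ratios[1] / whole)
    test = total - train - val
    return train, val, test


# A hypothetical 1024-pixel axis yields three overlapping 512-pixel windows.
print(window_origins(1024))

# The 3066 samples of this study split at 7:2:1.
print(split_counts(3066))
```

Under these assumptions the split reproduces the subset sizes reported above (2146, 613, and 307 pairs).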

2.3.2. Experimental Setting

All models in this study were constructed based on the PyTorch v2.5.1 deep learning framework and were trained on an NVIDIA RTX 4090 GPU (with 24 GB GPU memory). An Adam optimizer was selected for parameter updates, with the initial learning rate set to 1 × 10⁻⁴ and the batch size set to 8. To avoid getting stuck in local optima and improve model convergence, the cosine annealing learning rate schedule [28] was used to dynamically adjust the learning rate. To suppress overfitting and improve the generalization robustness of the model, online data augmentation, including horizontal flip, vertical flip, random scaling, and photometric transformation, was introduced during training. Under the above configuration, the models were trained for 200 epochs in total, and all of them stabilized and converged.
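The shape of a cosine annealing schedule over the 200 training epochs can be sketched as below. The minimum learning rate of 0 and the use of a single cycle without warm restarts are assumptions, since the text gives only the initial learning rate and the number of epochs.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=200, lr_max=1e-4, lr_min=0.0):
    """Learning rate at a given epoch for one cosine cycle.

    lr_min = 0 and a single cycle are assumptions, not stated settings.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


for epoch in (0, 50, 100, 150, 200):
    print(f"epoch {epoch:3d}: lr = {cosine_annealing_lr(epoch):.2e}")
```

The rate starts at the initial value, decays slowly at first, passes through half of the initial value at the midpoint, and flattens out again near the end of training.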

2.4. Evaluation Metrics

To comprehensively evaluate the overall performance of the modified model in sea ice semantic segmentation tasks, seven evaluation metrics were selected, covering the segmentation precision and computational efficiency of the model. First, based on the pixel-level confusion matrix counts of true positives (TP), false positives (FP), and false negatives (FN), precision (P), recall (R), F1-score (F1), and intersection over union (IoU) [29] were selected to quantify the accuracy of sea ice extraction. Precision reflects the reliability of the prediction results, recall reflects the proportion of sea ice that the model detects, the F1-score is the harmonic mean of the two, and IoU measures the degree of overlap between the predicted and true areas; these are the most critical metrics for semantic segmentation. These metrics are calculated as follows:

P = \frac{TP}{TP + FP}, (8)

R = \frac{TP}{TP + FN}, (9)

F1 = 2 \times \frac{P \times R}{P + R}, (10)

IoU = \frac{TP}{TP + FP + FN}, (11)

where TP denotes the number of pixels correctly predicted as sea ice, FP denotes the number of background pixels incorrectly identified as sea ice, and FN denotes the number of sea ice pixels that go undetected. Then, owing to the special requirements for real-time capability and hardware resources in sea ice disaster monitoring, the number of model parameters, the training time, and the inference speed were introduced as efficiency evaluation metrics. The number of model parameters counts all trainable weights (in millions), directly reflecting the spatial complexity of the model. The training time records the total time the model takes to converge on the training set (in hours) and is used to evaluate the computational cost and deployment potential of the algorithm. Finally, the inference speed is measured in frames per second (FPS), that is, the average number of images processed by the model per second, serving as a key indicator of the real-time performance of the algorithm.
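Equations (8)–(11) can be sketched directly from the confusion-matrix counts. The helper names below are hypothetical, masks are represented as flat lists of 0/1 values for simplicity, and returning 0.0 when a denominator is zero is an assumption rather than something the paper specifies.

```python
def confusion_counts(pred, label):
    """Count TP, FP, and FN for flat binary masks (1 = sea ice, 0 = background)."""
    tp = fp = fn = 0
    for p, y in zip(pred, label):
        if p == 1 and y == 1:
            tp += 1
        elif p == 1 and y == 0:
            fp += 1
        elif p == 0 and y == 1:
            fn += 1
    return tp, fp, fn


def segmentation_metrics(tp, fp, fn):
    """Precision, recall, F1, and IoU as in Equations (8)-(11).

    Zero denominators yield 0.0 (an assumption made for this sketch).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"P": precision, "R": recall, "F1": f1, "IoU": iou}


# Toy example: 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 1, 0, 0]
label = [1, 1, 0, 1, 0, 1, 0, 0]
print(segmentation_metrics(*confusion_counts(pred, label)))
```

In this toy example, P, R, and F1 are all 0.75, while IoU is 0.6; IoU is never larger than F1 for the same counts because false positives and false negatives both enter its denominator in full.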

3. Results

3.1. The Results of the Ablation Experiments

To validate the effectiveness of the lightweight backbone network, MobileNetV2, and the AMFA module proposed in this study in sea ice segmentation tasks, step-by-step ablation experiments were designed and conducted on the sea ice dataset with the original DeepLabV3+ backbone (Xception backbone) as the baseline. The main comparison metrics for the experiment included P, R, F1, and IoU. The specific results are listed in Table 1.

It can be seen from Table 1 that, in Experiment 1, Xception with a relatively large number of parameters is used as the backbone network, and IoU is 89.09%. In Experiment 2, the backbone network is replaced with a lightweight MobileNetV2 network. To ensure that the high-level semantic features retain sufficient spatial resolution after backbone replacement, the output stride was maintained at 16, consistent with the original DeepLabV3+ configuration. Specifically, for the convolutional blocks beyond the fourth downsampling stage, the original stride of 2 was replaced with a stride of 1, and atrous (dilated) convolutions with a dilation rate of 2 were applied to compensate for the reduced receptive field, thereby preserving the spatial detail of high-level feature maps. The low-level features were extracted from the output of the fourth convolutional block (at stride 4 relative to the input), which are subsequently fed into the decoder module for multiscale feature fusion. Experimental results show that the segmentation performance of the model does not decrease with the significant decrease in the number of parameters. Instead, IoU and F1 increased to 91.96% and 95.81%, respectively. This counterintuitive increase indicates that, in cases in which the sea ice samples are relatively limited and the texture features are specific, the over-deep and over-wide Xception network may suffer from overfitting or feature redundancy, and the MobileNetV2 network with its more compact architecture can more efficiently extract effective features through its linear bottleneck layer architecture, providing a good lightweight basis for subsequent module embedding. In Experiment 3, the AMFA module constructed in this study is introduced on the basis of a lightweight DeepLabV3+ model. Compared with Experiment 2, although the precision fluctuates slightly in Experiment 3, R further increases from 96.42% to 97.03%, indicating an enhanced sensitivity of the model to missed targets. 
The most critical integrated metric, IoU, finally reaches 92.15% in Experiment 3 (an increase of 0.19 percentage points over Experiment 2 and of 3.06 percentage points over the baseline), and F1 reaches 95.91%. This indicates that the AMFA module does not rely on simple parameter stacking but, through adaptive weighting of channel attention, effectively compensates for the potential shortcomings of the lightweight network in capturing fine textures such as thin ice and cracks.
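The stride-to-dilation replacement described for Experiment 2 can be sketched as simple bookkeeping over the backbone stages. The function name and the stage list of five stride-2 stages are simplifying assumptions for illustration and do not reproduce the exact MobileNetV2 layer layout.

```python
def adapt_to_output_stride(stage_strides, output_stride=16):
    """Replace strides with dilation once the target output stride is reached.

    Returns a list of (stride, dilation) pairs, one per stage, and the
    resulting overall output stride.
    """
    adapted = []
    current_stride = 1
    dilation = 1
    for stride in stage_strides:
        if current_stride >= output_stride and stride > 1:
            # Target resolution reached: keep spatial size (stride 1) and
            # enlarge the dilation to compensate for the receptive field.
            dilation *= stride
            adapted.append((1, dilation))
        else:
            current_stride *= stride
            adapted.append((stride, dilation))
    return adapted, current_stride


# Hypothetical simplified backbone with five stride-2 stages (overall stride 32).
stages, out_stride = adapt_to_output_stride([2, 2, 2, 2, 2], output_stride=16)
print(stages, out_stride)
```

Under these assumptions, the first four stages keep their stride of 2, and the fifth becomes a stride-1 stage with a dilation rate of 2, so a 512 × 512 input yields 32 × 32 high-level feature maps instead of 16 × 16.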

To visually explore the influences of the module on model attention, Figure 5 visualizes the feature response heatmaps of the model in different ablation experiment stages. Each row represents a typical scene (from the top down: dense fragmented ice blocks, nearshore land, and complex drift ice). Each column represents a model in a different stage (from left to right: original image, Xception baseline, lightweight MobileNetV2, and MobileNetV2+AMFA). The redder the color is, the higher the confidence of the model for sea ice class will be; the bluer the color is, the stronger the background suppression will be.

In the first row (dense fragmented ice blocks) and the third row (complex drift ice), the heatmaps of MobileNetV2+AMFA show much sharper edges than the relatively scattered and vague activations of the baseline model. The sea ice areas exhibit bright, uniform red activation, whereas the ice cracks and seawater gaps are clearly depicted in blue tones. This high red-blue contrast indicates that the AMFA module leads the model to concentrate its response on the core sea ice textures, markedly improving the saliency and confidence of the features. In the nearshore land scene in the second row, the heatmaps of the Xception and MobileNetV2 networks show large areas of yellow and light green clutter over the land area (lower-left corner), indicating that the model's feature extraction confuses land with sea ice. By comparison, the heatmap of the model with the AMFA module is pure dark blue in this area. This suggests that the median pooling branch in the AMFA module effectively filters out non-ice interference, including land textures, and strongly suppresses the background.

In summary, the visualized results corroborate the quantitative metrics for the ablation experiment from the perspective of interpretability: the AMFA module not only enhances the sensitivity of the model to sea ice but also greatly improves the recognition robustness of the model in complex backgrounds.

3.2. The Results of the Comparison Experiments

To compare the performance of different models, the following image segmentation networks were selected: SegNet, U-Net++ [30], U-NetV2 [31], MK-UNet [32], BiSeNetV2 [33], DeepLabV3+, and the AMFA-DeepLab proposed in this study. The evaluation metrics included P, R, F1, IoU, parameters, training time, and FPS.

To further validate the optimization efficiency and training stability of the modified model, Figure 6 shows the training loss convergence curves of the AMFA-DeepLab proposed in this study and of SegNet, U-Net++, U-NetV2, MK-UNet, BiSeNetV2, and the original DeepLabV3+ in 200 epochs. As can be clearly seen from Figure 6, the AMFA-DeepLab (red curve) exhibits steeper gradient during the initial stage of training, compared with modified networks such as U-Net++ (gray curve), U-NetV2 (cyan curve), and BiSeNetV2 (green curve). Although MK-UNet (purple curve) and U-NetV2 (cyan curve) perform better in the late stage than SegNet and U-Net++, the AMFA-DeepLab still maintains an evident advantage and finally stabilizes in the lowest loss value interval. The proposed AMFA-DeepLab is significantly better than the original DeepLabV3+ and other U-Net variants, indicating that the AMFA module more effectively captures the fine texture features of sea ice through multistatistic fusion.

Table 2 summarizes the detailed evaluation metrics for different models on the test set. The experimental results show that the AMFA-DeepLab proposed in this study gives the best results with respect to all key precision metrics. Specifically, the IoU and F1 of AMFA-DeepLab reach 92.15% and 95.91%, respectively, and P and R also remain at the highest level. Compared with U-Net++ (IoU 89.95%), U-NetV2 (IoU 90.07%), and MK-UNet (IoU 89.38%), the modified model exhibits an evident performance advantage, with IoU increasing by 2.20, 2.08, and 2.77 percentage points, respectively. In addition, compared with the baseline model, DeepLabV3+, the AMFA-DeepLab exhibits a 3.06 percentage point increase in IoU. This result indicates that the AMFA module, through multistatistic fusion, captures the fine texture and edge features of sea ice more accurately than the traditional network, thus reducing both the missed detection rate and the false detection rate. In terms of computational efficiency, the AMFA-DeepLab exhibits an excellent lightweight advantage. Its number of parameters is only 5.85 million, which is far lower than that of DeepLabV3+ (54.71 million), only ~60% of that of U-Net++ (9.95 million), and comparable to that of BiSeNetV2 (5.19 million). Moreover, the training time of the model is only 4.42 h, which is nearly 47% shorter than that of U-Net++ (8.34 h). It is worth noting that, in terms of inference speed with 512 × 512 input, BiSeNetV2 achieves a higher frame rate (430.86 FPS), but its IoU is only 87.80%. In contrast, AMFA-DeepLab achieves the highest accuracy of all compared models while maintaining a faster-than-real-time speed of 281.76 FPS. This demonstrates that the modified model significantly increases data processing throughput while greatly reducing the demand for computational power, making it well suited to the timeliness requirements of operational monitoring of large-area sea ice.
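The comparisons quoted above follow from simple arithmetic on the reported values, as the short check below shows. All numbers are taken from the text (the DeepLabV3+ IoU is the 89.09% baseline value from the ablation experiment); the variable and function names are illustrative only.

```python
def relative_reduction(new, old):
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


# IoU values (%) as reported in the text.
iou = {
    "AMFA-DeepLab": 92.15,
    "U-Net++": 89.95,
    "U-NetV2": 90.07,
    "MK-UNet": 89.38,
    "DeepLabV3+": 89.09,
}
# Absolute IoU gains of AMFA-DeepLab, in percentage points.
gains = {
    name: round(iou["AMFA-DeepLab"] - value, 2)
    for name, value in iou.items()
    if name != "AMFA-DeepLab"
}

param_ratio_vs_unetpp = 5.85 / 9.95 * 100.0                # share of U-Net++ parameters
param_reduction_vs_deeplab = relative_reduction(5.85, 54.71)
time_reduction_vs_unetpp = relative_reduction(4.42, 8.34)

print(gains)
print(round(param_ratio_vs_unetpp, 1),
      round(param_reduction_vs_deeplab, 1),
      round(time_reduction_vs_unetpp, 1))
```

The IoU differences are absolute differences in percentage points, whereas the parameter and training-time figures are relative reductions.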

Figure 7 shows the relationship between the number of parameters (x axis) and IoU (y axis) for all of these models. The x value of each bubble in Figure 7 corresponds to the parameter scale of the corresponding model. As shown in Figure 7, the AMFA-DeepLab (red bubble) is located on its own in the upper-left area of the coordinate system. This location indicates that the model combines one of the lowest model complexities with the highest segmentation precision. Compared with other models in Figure 7, especially the lightweight BiSeNetV2 (green bubble) and MK-UNet (purple bubble), which are also located in the low-parameter area on the left, AMFA-DeepLab keeps a similarly small parameter count on the x axis while overcoming the precision bottleneck of existing models on the y axis. In addition, compared with the original DeepLabV3+, it achieves higher accuracy with a significant reduction in the number of parameters. This indicates that combining the linear bottleneck layer of MobileNetV2 with an AMFA module can effectively resolve the inherent contradiction in the traditional network that lightweight design decreases precision, providing a well-balanced solution for real-time accurate monitoring of sea ice based on high-resolution remote sensing.

To evaluate the detail capture capability of the model under different marine conditions, Figure 8 shows a visualized comparative analysis of four representative typical scenes selected from the test set. Each block image has a size of 512 × 512 pixels with 16 m spatial resolution. The original image, label, and prediction results based on various models are shown in turn from left to right.

In the scene of narrow ice cracks in the first row, the results of the SegNet and DeepLabV3+ models exhibit severe breaks and fail to maintain the continuity of ice cracks, so that long cracks are incorrectly segmented into unconnected short line segments. It is worth noting that the lightweight BiSeNetV2 almost completely misses the fine ice cracks because of its limited feature capacity. Although the results of U-Net++ restore the connectivity to a certain extent, tiny parts of the crack tips are still missed. By comparison, AMFA-DeepLab accurately captures fine cracks with widths of only several pixels, and the generated segmentation masks closely coincide with the labels, demonstrating the advantage of the AMFA module in maintaining high-frequency linear features.

In the scene of dense, fragmented sea ice blocks in the second row, there are numerous ice blocks with narrow gaps between them, and there are tiny, crushed ice residues scattered in the background, which can easily lead to adhesion or noise in the segmentation results. In this scene, owing to the severe interference from background fragments, SegNet outputs evident salt-and-pepper noise and fails to clearly identify the ice block boundaries. DeepLabV3+ and BiSeNetV2 tend to wrongly merge adjacent ice blocks, losing the narrow water channel features between ice blocks. The AMFA-DeepLab proposed in this study not only effectively filters off tiny interference in the background but also accurately separates closely adjacent independent ice blocks, clearly restoring the complete contour of each fragmented ice block without adhesion.

The third row corresponds to a scene of complex drift ice with relatively complex textures. In this scene, the SegNet model lacks effective aggregation of contextual information, resulting in the loss of many drift ice textures. U-Net++, U-NetV2, and MK-UNet preserve rough contours, but significant edge overlapping occurs when they process gradient details in the drift ice, easily resulting in adhesion. AMFA-DeepLab captures the textures specific to drift ice and clearly restores the fine fragmentation features on its surface. It preserves the complex texture structures in sea ice and accurately restores the gradient details at ice–water boundaries, reflecting the strong analytical ability of the AMFA module regarding texture features.

The fourth row is a scene of coexisting nearshore land and sea ice. It can be seen that missed detections occur in the results based on SegNet, U-Net++, DeepLabV3+, and BiSeNetV2, with evident serration appearing. The edge curve generated by the model proposed in this study is smooth and quite close to the true boundary; in particular, extremely high positioning precision is exhibited at concave–convex transition locations on land.

In summary, the AMFA-DeepLab proposed in this study is significantly better than SegNet, U-Net++, U-NetV2, BiSeNetV2, MK-UNet, and DeepLabV3+ in terms of preserving the continuity of fine cracks, suppressing background noise, and restoring complex texture details. These results verify that the multi-statistic fusion strategy of the AMFA module is robust and can effectively overcome the inherent semantic ambiguity in complex sea ice regions.

3.3. Results of the Generalization Tests

To further explore the adaptability of the model to complex sea ice texture across different years or during the transition to the melting season, a full-scale GF-1 WFV image acquired on 2 February 2017 was used for a generalization test. As a challenging case, this image was strictly excluded from the model training stage.

In Figure 9, the original GF-1 WFV image shows a complex sea ice distribution; in particular, the bottom of the sea ice region presents a complex filiform structure formed by drift ice, which is very difficult to retain with traditional segmentation models. The ice identification result confirms that the AMFA-DeepLab still has strong generalization ability. The result not only outlines the overall sea ice extent but also accurately captures the fine filament structure at the bottom, effectively distinguishing it from the open sea water area. These results confirm that the model has learned a robust feature representation, can adapt to complex sea conditions in other years, and is suitable for operational monitoring tasks in Liaodong Bay in the Bohai Sea, China.

4. Discussion

The proposed model for sea ice segmentation based on an improved lightweight DeepLabV3+ adaptive multi-statistic fusion attention architecture demonstrates improvements in model training efficiency, extraction accuracy, and noise resistance robustness compared to those of traditional models. By integrating the MobileNetV2 and AMFA modules, the model achieves lightweight training and captures more nuanced multiscale features, effectively overcoming the segmentation challenges posed by blurred texture features and weak boundary details in sea ice identification.

Unlike many studies that introduce complex attention mechanisms at the cost of significantly increased computational load, this research achieves a dual optimization of performance and efficiency by rebuilding the backbone network as a lightweight MobileNetV2. As shown in Table 2, the model's training time is reduced to 4.42 h, and the number of parameters decreases by approximately 90% compared to the original DeepLabV3+ model. Importantly, its inference speed reaches 281.76 FPS, which is significantly better than that of U-Net++ (107.91 FPS) and SegNet (121.23 FPS). Although the inference speed of BiSeNetV2 is faster, its IoU (87.80%) is much lower than that of AMFA-DeepLab (92.15%). Therefore, AMFA-DeepLab combines high inference efficiency with high segmentation accuracy, better meeting the real-time requirements of emergency sea ice disaster monitoring.

The notable advantage of the proposed network model lies in its capability for fine-grained segmentation, which is particularly crucial for identifying irregularly shaped broken ice floes and narrow ice cracks. Traditional image segmentation approaches are often susceptible to interference from speckle noise. Some network models, including the U-Net series and DeepLabV3+, perform adequately in capturing overall contours but tend to produce fractures or adhesion when handling subtle textures. In contrast, the proposed AMFA-DeepLab incorporates a noise-resistant mechanism through the global median pooling within the AMFA module, which effectively suppresses false activations in non-ice regions, thereby significantly enhancing segmentation accuracy in edge areas. As shown in Figure 7, the architectural optimization has led to more accurate identification results. Particularly in complex drifting ice scenarios, the model reduces false positives and clearly restores the transitional details at the ice–water interface. As shown in Figure 9, the model can still capture the fine filament structure in images taken in other years or during the melting period. The dynamic channel mixing mechanism of AMFA helps the network use global contextual information while achieving adaptive feature recalibration through the fusion of average, max, and median statistics. This improvement not only reduces false detections but also substantially improves segmentation performance in challenging edge regions.
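The idea of recalibrating channels by fusing average, max, and median statistics can be illustrated with a deliberately simplified sketch. This is not the AMFA module itself: the function names are hypothetical, the fixed fusion logits stand in for weights that the real module would learn, and any learned layers between pooling and gating are omitted.

```python
import math
from statistics import median


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_statistic_channel_gate(feature_map, fusion_logits=(0.0, 0.0, 0.0)):
    """Gate each channel by a fused average/max/median descriptor.

    feature_map: list of channels, each a flat list of activations.
    fusion_logits: stand-ins for learned fusion weights (an assumption).
    Returns the per-channel gates and the recalibrated feature map.
    """
    w_avg, w_max, w_med = softmax(list(fusion_logits))
    gates = []
    for channel in feature_map:
        avg_stat = sum(channel) / len(channel)  # global average pooling
        max_stat = max(channel)                 # global max pooling
        med_stat = median(channel)              # global median pooling
        fused = w_avg * avg_stat + w_max * max_stat + w_med * med_stat
        gates.append(1.0 / (1.0 + math.exp(-fused)))  # sigmoid gate in (0, 1)
    recalibrated = [[g * v for v in ch] for g, ch in zip(gates, feature_map)]
    return gates, recalibrated


# The first channel contains a single outlier; its median stays at 0 while
# its average and max are pulled upwards by the outlier.
features = [[0.0, 0.0, 0.0, 0.0, 10.0], [2.0, 2.0, 2.0, 2.0, 2.0]]
gates, _ = multi_statistic_channel_gate(features)
print(gates)
```

The example shows why a median branch can add robustness: an isolated spike inflates the average and max descriptors of a channel but leaves its median unchanged, so weighting the median more heavily lowers the gate for channels dominated by outliers.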

Achieving reliable segmentation performance across different seasons, specific regions, and complex environmental conditions remains a challenge. During the warm season, melting ice may significantly change the spectral reflectance of sea ice, leading to confusion with open water. In the cold season, heavy snow covering the sea ice may smooth the image texture details, posing a challenge to a model that depends on texture features. In addition, the current training dataset only covers Liaodong Bay in the Bohai Sea, China, where the ice is mainly first-year ice and is less affected by snow cover. In contrast, polar environments usually contain complex multi-year sea ice, deep snow cover, and melt ponds. The AMFA-DeepLab is trained on the specific sea ice texture features of Liaodong Bay, and it may encounter domain shift problems when directly applied to polar regions. Therefore, its generalization ability in polar regions with different ice types still needs to be verified through transfer learning or domain adaptation techniques.

The current model also has some limitations, primarily its reliance on optical remote sensing imagery. This study employs only visible RGB images from the WFV sensor of the Chinese GF-1 satellite and does not incorporate infrared imagery or other spectral information. Consequently, the model may miss information to which thermal or multispectral data are more sensitive, which to some extent affects its generalization ability. Additionally, the AMFA module exhibits insufficient capability for capturing the features of newly formed and melting ice, leading to challenges in segmenting certain thin ice areas. Future research will focus on multimodal data fusion and dynamic temporal analysis. On the one hand, to address the inherent drawback of optical imagery, thermal infrared images and additional spectral information will be introduced to expand the scale and diversity of the dataset and enhance model generalization. Furthermore, Synthetic Aperture Radar (SAR) data will be incorporated to construct an optical-infrared-radar feature fusion network, thereby strengthening the capability for all-weather sea ice segmentation and classification. On the other hand, considering that the formation and evolution of sea ice constitute a dynamic process, time-series remote sensing images can be combined with long-term sequence models to enhance sea ice segmentation or identification. This future work will not only improve sea ice monitoring but also enable the tracking and prediction of sea ice, thereby providing more proactive and valuable decision support for disaster prevention in fields such as shipping, fisheries, and offshore operations.

5. Conclusions

Sea ice monitoring based on Chinese GF-1 WFV images suffers from large computational redundancy, weak noise resistance, and easily missed detection of fine features. To address these problems, we constructed a modified lightweight AMFA-DeepLab segmentation network model. Through theoretical analysis, model construction, ablation experiments, comparative experiments, and generalization tests, the following main conclusions can be drawn: (1) Replacing the Xception network with the MobileNetV2 backbone network greatly decreases the number of model parameters, from 54.71 million to 5.85 million, which is only 10.7% of that of the DeepLabV3+ model, and the training time is shortened by nearly 20%. Experimental results show that the linear bottleneck layer and inverted residuals of MobileNetV2 extract sea ice features even more efficiently, with an inference speed as high as 281.76 FPS, without negatively impacting segmentation precision and with a significant reduction in the demand for computing capability, thereby addressing the difficult problem of timeliness in monitoring large-area sea ice. (2) For the speckle noise and weak texture features specific to sea ice images, we proposed an AMFA module, which, by introducing global median pooling and dynamic weight distribution, effectively overcomes the limitations of the traditional single pooling strategy. The ablation experiment and heatmap visualization analysis consistently demonstrate that this model significantly suppresses incorrect activation of nearshore land and seawater background (false detection) and greatly improves the response to dense fragmented ice blocks and narrow ice gaps (reducing missed detection). The final IoU of the model reaches 92.15%, which is the best of all the models used for comparison. The generalization tests show that the AMFA-DeepLab has good cross-temporal and spatial robustness. 
Even for the complex drift ice scene during the melting period in 2017, the model can still accurately capture the filiform texture structure of sea ice, indicating that it can adapt to inter-annual changes and some environmental disturbances. The AMFA-DeepLab designed in this study achieves high-precision segmentation of sea ice under complex marine conditions while maintaining extremely low model complexity. Not only is it better than the traditional SegNet and U-Net series models, but it also achieves a balance between computational efficiency and segmentation precision, overcoming the limitation that a lightweight network has in maintaining high precision. The method is robust and can meet the demand for rapid emergency response monitoring of sea ice disasters based on the Chinese GaoFen-1 satellite, and thus has important engineering application value. In future work, we will focus on improving the generalizability of the model by expanding the scale and diversity of the training datasets, including images of different ice types in polar regions, and on enhancing the capability for continuous dynamic monitoring of sea ice.

Author Contributions

Conceptualization, Z.H. and D.P.; methodology, Z.H.; software, X.L. and J.C.; validation, Z.H., X.L., and Q.Z.; formal analysis, Z.H.; investigation, X.L. and Y.L.; resources, Q.Z.; data curation, X.L. and Q.Z.; writing—original draft preparation, Z.H. and X.L.; writing—review and editing, Z.H., Z.M., and D.P.; visualization, X.L.; supervision, D.P.; project administration, Z.H. and D.P.; funding acquisition, Z.H. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China under Grant No. LZJMZ24D060001 and by the Major Innovation Project of the Science, Education and Industry Integration Pilot Project of Qilu University of Technology (Shandong Academy of Sciences) under Grant No. 2025ZDYS01.

Data Availability Statement

The GF-1 WFV image in this study can be downloaded from China Platform of Earth Observation System (https://www.cpeos.org.cn/home/#/, accessed on 9 April 2025). SegNet can be downloaded from (https://github.com/alexgkendall/SegNet-Tutorial, accessed on 8 May 2025). U-Net++ can be downloaded from (https://github.com/MrGiovanni/UNetPlusPlus, accessed on 8 May 2025). U-NetV2 can be downloaded from (https://github.com/yaoppeng/U-Net_v2, accessed on 8 May 2025). DeepLabV3+ can be downloaded from (https://github.com/chenxi116/DeepLabv3.pytorch, accessed on 8 May 2025). BiSeNetv2 can be downloaded from (https://github.com/CoinCheung/BiSeNet, accessed on 15 January 2026). MK-UNet can be downloaded from (https://github.com/SLDGroup/MK-UNet, accessed on 15 January 2026).

Acknowledgments

We would like to thank the China Platform of Earth Observation System (CPEOS), which provided the GaoFen-1 satellite data. The authors would also like to thank the editors and reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AMFA	Adaptive Multi-Statistic Fusion Attention
AMFA-DeepLab	AMFA module using DeepLabV3+ as the base architecture
GF-1	GaoFen-1
SAR	Synthetic Aperture Radar
WFV	wide field of view
GF-3	GaoFen-3
ASPP	Atrous Spatial Pyramid Pooling
MLP	Multi-Layer Perceptron
1D	one-dimensional
BCE	binary cross entropy
TP	true positive
FP	false positive
FN	false negative
P	precision
R	recall
F1	F1-score
IoU	intersection over union
FPS	frames per second

References

Geng, S.S.; Han, C.H.; Yang, Y.; Zheng, B. An analysis on the economic losses of sea ice disasters and defense countermeasures in bohai sea. Mar. Econ. 2023, 13, 101–107.
Liu, W.P.; Yan, D.W.; Peng, Z.K.; Xie, M.H.; Sun, Y.L. Vessel safety navigation under the influence of antarctic sea ice. J. Mar. Sci. Eng. 2025, 13, 1267.
Wang, C.L.; Wu, J.D.; He, X.; Ye, M.Q.; Liu, Y. Quantifying the spatial ripple effect of the Bohai Sea ice disaster in the winter of 2009/2010 in 31 provinces of China. Geomat. Nat. Hazards Risk 2018, 9, 986–1005.
Liu, H.Y.; Guo, H.D.; Zhang, L. Sea ice classification using dual polarization SAR data. IOP Conf. Ser. Earth Environ. Sci. 2014, 17, 012115.
Zou, J.H.; Zeng, T.; Guo, M.H.; Cui, S.X. The study on an Antarctic sea ice identification algorithm of the HY-2A microwave scatterometer data. Acta Oceanol. Sin. 2016, 35, 74–79.
Li, X.N.; Zhang, J.; Dai, Y.S.; Zhang, X. Research on the enhanced performance of texture feature for sea ice drift monitoring based on gray level co-occurrence matrices. Mar. Sci. 2018, 42, 9–17.
Zhou, Y.; Kuang, D.B.; Gong, C.L.; Hu, Y.; Fang, S.H.; Zhang, Y.; Peng, Y. A method to extract parameters of Arctic sea ice from FY-3/MERSI imagery. J. Infrared Millim. Waves 2017, 36, 41–48+126–127.
Li, T.; Wu, D.; Han, R.; Xia, J.Y.; Ren, Y.J. A sea ice recognition algorithm in bohai based on random forest. Comput. Mater. Contin. 2022, 73, 3721–3739.
Lia, L.; Suo, Z.Y.; Shi, L.J.; Wang, Q.; Jiao, J.N.; Tang, J.; Lu, Y.C. Identification and estimation of sea ice in Liaodong Bay using HY-1C/D satellite images. Natl. Remote Sens. Bull. 2025, 29, 897–909.
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; IEEE: Piscataway, NJ, USA, 2015.
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Springer: Cham, Switzerland, 2015.
Xu, H.; Ren, Y.B. Detecting sea ice of Bohai Sea using SAR images based on a hybrid loss U-Net model. Acta Oceanol. Sin. 2021, 43, 157–170.
Zhang, T.Y.; Yang, Y.; Shokr, M.; Mi, C.L.; Li, X.M.; Cheng, X.; Hui, F.M. Deep learning based sea ice classification with Gaofen-3 fully polarimetric SAR data. Remote Sens. 2021, 13, 1452.
Dai, Y.; Li, X.-M.; Yuan, H. Pan-Arctic winter sea ice classification using Sentinel-1 dual-polarized SAR images. Remote Sens. Environ. 2026, 333, 115140.
Sun, S.C.; Wang, Z.Y.; Li, Z.J.; Zhang, B.J.; Tian, K.; Zhao, X.Y. An extraction method for sea ice based on improved DeepLabV3+ model: Taking the Arctic Greenland Sea as an example. Acta Oceanol. Sin. 2024, 46, 131–142.
Han, Y.; Gao, Y.; Zhang, Y.; Wang, J.; Yang, S. Hyperspectral sea ice image classification based on the spectral-spatial-joint feature with deep learning. Remote Sens. 2019, 11, 2170.
Han, Y.; Cui, P.; Zhang, Y.; Zhou, R.; Yang, S.; Wang, J. Remote sensing sea ice image classification based on multilevel feature fusion and residual network. Math. Probl. Eng. 2021, 2021, 9928351.
Zhou, L.; Xu, R.X.; Bian, J.Y.; Ding, S.F.; Han, S.; Skjetne, R. Optimized recognition algorithm for remotely sensed sea ice in polar ship path planning. Remote Sens. 2025, 17, 3359.
Zheng, B.; Shi, L.J.; Zou, B.; Ren, P.; Zeng, T.; Sun, X.Y.; Zhang, C.H. Research on Sentinel-1 SAR sea ice detection method in Liaodong Bay based on AUNet++. Acta Oceanol. Sin. 2024, 46, 117–128.
Chen, L.C.; Zhu, Y.K.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Lecture Notes in Computer Science; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018.
Chen, S.; Wei, X.; Zheng, W. ASA-DRNet: An Improved Deeplabv3+ Framework for SAR Image Segmentation. Electronics 2023, 12, 1300.
Yang, T.; Wei, J.; Xiao, Y.; Wang, S.; Tan, J.; Niu, Y.; Duan, X.; Pan, F.; Pu, H. LT-DeepLab: An Improved DeepLabV3+ Cross-Scale Segmentation Algorithm for Zanthoxylum bungeanum Maxim Leaf-Trunk Diseases in Real-World Environments. Front. Plant Sci. 2024, 15, 1423238.
Feng, X.; Wei, C.; Xue, X.; Zhang, Q.; Liu, X. RST-DeepLabv3+: Multi-Scale Attention for Tailings Pond Identification with DeepLab. Remote Sens. 2025, 17, 411.
Shaheed, K.; Mao, A.; Qureshi, I.; Kumar, M.; Hussain, S.; Ullah, I.; Zhang, X. DS-CNN: A pre-trained Xception model based on depth-wise separable convolutional neural network for finger vein recognition. Expert Syst. Appl. 2022, 191, 116288.
Mekruksavanich, S.; Phaphan, W.; Jitpattanakul, A. Squeeze-and-Excitation Hybrid Network for Biometric Identification via Photoplethysmography. In Proceedings of the 2024 Research, Invention, and Innovation Congress: Innovative Electricals and Electronics (RI2C), Bangkok, Thailand, 8–9 August 2024; IEEE: Piscataway, NJ, USA, 2024.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018. [Google Scholar]
Milletari, F.; Navab, N.; Ahmadi, S.-A. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; IEEE: Piscataway, NJ, USA, 2016. [Google Scholar]
Loshchilov, I.; Hutter, F. Sgdr: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar] [CrossRef]
Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the International Workshop on Deep Learning in Medical Image Analysis, Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018. [Google Scholar]
Peng, Y.; Chen, D.Z.; Sonka, M. U-net v2: Rethinking the skip connections of u-net for medical image segmentation. In Proceedings of the 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 14–17 April 2025; IEEE: Piscataway, NJ, USA, 2025. [Google Scholar]
Rahman, M.M.; Marculescu, R. MK-UNet: Multi-kernel lightweight CNN for medical image segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, HI, USA, 19–23 October 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1042–1051. [Google Scholar]
Yu, C.; Gao, C.; Wang, J.; Yu, S.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. Int. J. Comput. Vis. 2021, 129, 3051–3068. [Google Scholar] [CrossRef]

Figure 1. Framework of the AMFA-DeepLab network model.

Figure 2. Inverted residual structure of the MobileNetV2 module.

Figure 3. Structure of the AMFA module.

Figure 4. A typical GF-1 WFV raw image covering Liaodong Bay on 10 January 2016.

Figure 5. Visual comparison of feature response heatmaps in ablation experiment stages.

Figure 6. Comparison of loss function convergence curves during training among different models.

Figure 7. Comparison of IoU precision and efficiency of different segmentation networks.

Figure 8. Extraction results based on different segmentation networks.

Figure 9. The GF-1 WFV image acquired on 2 February 2017 (left) and the ice identification result (right). (a) The original GF-1 WFV image, showing the complex distribution of drift ice and the filiform texture at the bottom. (b) Overlay of the ice identification result on the image (light blue is the predicted sea ice mask).

Table 1. The results of the evaluation metrics for the ablation experiment.

Exp.	Xception	MobileNetV2	AMFA	P (%)	R (%)	F1 (%)	IoU (%)
1	√	-	-	92.72	95.79	94.23	89.09
2	-	√	-	95.22	96.42	95.81	91.96
3	-	√	√	94.82	97.03	95.91	92.15

Where “√” means that the module (Xception, MobileNetV2, or AMFA) is adopted and “-” means that it is not adopted.

Table 2. The results of experimental evaluation metrics for different models.

Model	P (%)	R (%)	F1 (%)	IoU (%)	Parameters (Millions)	Training Time (h)	Inference Speed (FPS)
SegNet	91.72	93.07	92.39	85.85	29.44	6.78	121.23
MK-UNet	92.81	96.02	94.39	89.38	5.24	8.69	142.42
BiSeNetv2	92.11	94.94	93.50	87.80	5.19	3.75	430.86
U-Net++	93.76	95.67	94.71	89.95	9.95	8.34	107.91
U-Netv2	93.54	96.04	94.77	90.07	25.15	4.44	95.86
DeepLabV3+	92.72	95.79	94.23	89.09	54.71	5.48	166.83
AMFA-DeepLab	94.82	97.03	95.91	92.15	5.85	4.42	281.76

where the bold formatting indicates the optimal value in each column.
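
The F1 and IoU columns can be cross-checked against the P and R columns. The following is a minimal sketch, not the authors' evaluation code. It assumes a single foreground class (sea ice versus background) with all four metrics computed from the same pixel-level confusion counts, in which case F1 = 2PR/(P + R) and IoU = F1/(2 − F1). The function names `f1_from_pr` and `iou_from_f1` are hypothetical helpers introduced only for this illustration; the input values are the P and R entries of two rows of Table 2.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2.0 * precision * recall / (precision + recall)


def iou_from_f1(f1: float) -> float:
    """Binary IoU implied by F1 (both in percent): IoU = F1 / (2 - F1).

    Assumption: one foreground class, with F1 and IoU derived from the
    same TP/FP/FN counts, so IoU = TP / (TP + FP + FN).
    """
    f = f1 / 100.0
    return 100.0 * f / (2.0 - f)


# (P, R) pairs copied from Table 2; the model names are only labels here.
rows = {
    "DeepLabV3+": (92.72, 95.79),
    "AMFA-DeepLab": (94.82, 97.03),
}

for name, (p, r) in rows.items():
    f1 = f1_from_pr(p, r)
    iou = iou_from_f1(f1)
    print(f"{name}: F1 = {f1:.2f} %, IoU = {iou:.2f} %")
```

Under these assumptions the recomputed values agree with the tabulated F1 and IoU to within rounding, which is a quick way to spot a transcription error in any single row.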

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

