SAB-DeepLabV3+: A Semantic Segmentation Framework for Mapping Maize Waterlogging from Single-Date Multispectral Imagery

An, Jiahao; Wang, Qingxue; Wang, Chunshan; Sun, Xiang; Tian, Qingwei; Yuan, Jin

doi:10.3390/agronomy16121168

by Jiahao An ¹, Qingxue Wang ²,³,*, Chunshan Wang ²,³,*, Xiang Sun ⁴,⁵, Qingwei Tian ²,³ and Jin Yuan

¹ North Alabama International College of Engineering and Technology, Guizhou University, Guiyang 550025, China

² College of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China

³ Agricultural Remote Sensing Application Hebei Engineering Research Center, Baoding 071001, China

⁴ Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China

⁵ National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China

* Authors to whom correspondence should be addressed.

Agronomy 2026, 16(12), 1168; https://doi.org/10.3390/agronomy16121168 (registering DOI)

Submission received: 15 April 2026 / Revised: 26 May 2026 / Accepted: 12 June 2026 / Published: 15 June 2026

(This article belongs to the Special Issue Application of Machine Learning and Modelling in Food Crops)

Abstract

Rapid identification of maize waterlogging is essential for post-disaster agricultural assessment, but most existing methods rely on multi-temporal imagery that is often unavailable immediately after extreme rainfall events. This study proposes SAB-DeepLabV3+, a semantic segmentation model for mapping waterlogging-affected maize from single-date multispectral imagery within pre-extracted maize planting areas. Built on DeepLabV3+, the model integrates three task-specific modules: a Spectral-Spatial Information Enhancement Module to improve feature discrimination under spectral mixing, an Adaptive Multi-Scale Pooling Module to capture heterogeneous patch sizes, and a Boundary Enhancement Module to refine transition zones. A pixel-level dataset containing 12,198 image patches was constructed from 62 multispectral scenes collected across five major maize-producing cities in Heilongjiang Province, China, during 2022–2024. On the test set, SAB-DeepLabV3+ achieved a waterlogged-class IoU of 68.30%, mIoU of 80.37%, mF1 of 88.62%, and OA of 93.49%, outperforming DeepLabV3+. Leave-one-city-out evaluation further produced an average mIoU of 76.56% and a waterlogged-class IoU of 63.45%. These results indicate that single-date high-resolution multispectral imagery can support rapid and reliable maize waterlogging mapping.

Keywords: maize waterlogging; single-date multispectral imagery; semantic segmentation; DeepLabV3+; cross-regional generalization; agricultural disaster remote sensing

1. Introduction

Agricultural waterlogging stress is one of the major water stress types restricting stable and high yields of dryland crops such as maize [1]. It is usually driven by factors including short-duration heavy rainfall, rising groundwater levels, and insufficient farmland drainage. After waterlogging stress occurs, maize roots remain in a state of high water content or hypoxia for a long time [2], which inhibits root respiration and nutrient uptake [3]. This can further lead to stunted growth, increased lodging risk, and yield reduction. In high-latitude, flat, and large-scale agricultural areas, such disasters often occur rapidly, affect a wide range, and exhibit a highly heterogeneous spatial distribution [4]. In recent years, the frequent occurrence of extreme precipitation events in Northeast China has placed continuous pressure on regional food security and agricultural risk management, including agricultural insurance assessment and post-disaster compensation [5]. Therefore, it is of great practical significance to develop a high-precision and rapidly deployable waterlogging disaster identification method for emergency assessment, disaster relief decision-making, and precision field management.

Remote sensing technology provides key support for agricultural disaster monitoring with its wide coverage and rapid acquisition capabilities. Existing studies mostly rely on multi-temporal optical images to characterize crop stress through time-series changes in vegetation indices, water indicators, or soil moisture indicators, and have made remarkable progress in disaster evolution analysis [6,7]. However, multi-temporal methods usually depend on continuous observation data [8] and are susceptible to cloud cover, image acquisition intervals, and phenological differences [9], making it difficult to meet the demands of high timeliness and fine mapping in post-disaster emergency assessment. In practical production scenarios, only single-scene images can usually be obtained in a timely manner. Therefore, this study aims to evaluate the capacity of single-date high-resolution multispectral imagery to provide sufficient spectral-spatial discriminative information for the reliable identification of agricultural waterlogging stress in the absence of time-series information [10,11,12]. This issue essentially involves judging the information sufficiency and feature separability under single-date conditions, that is, whether the spectral response differences and spatial structural features in multispectral imagery can be fully exploited to form stable discriminative boundaries without temporal information support. Specifically, the key lies not in superficial improvements to model structures, but in constructing deep feature representation mechanisms for spectral-spatial synergistic enhancement [13].

Existing methods for identifying agricultural waterlogging stress can be broadly divided into three categories [14]. The first category includes methods based on spectral indices or threshold rules, which are simple to implement and highly interpretable, but sensitive to threshold settings and susceptible to soil background, crop coverage, and phenological differences, resulting in limited stability in complex farmland environments [15]. The second category consists of change detection methods using multi-temporal or multi-source data (e.g., SAR), which can strengthen disaster signals but usually rely on continuous observations or multi-source data collaboration, imposing certain constraints on data timeliness and cross-sensor consistency [16,17]. The third category involves deep learning-based semantic segmentation methods, which significantly improve object extraction accuracy from high-resolution images through multi-scale context modeling and have been widely applied in agricultural remote sensing scenarios [18]. However, in the task of single-date maize waterlogging stress identification, insufficient attention has been paid to spectral-spatial joint discrimination mechanisms, scale adaptability, and fine boundary representation. Overall, current research lacks systematic verification of the information sufficiency of single-date multispectral imagery, as well as spectral-spatial collaborative modeling schemes for mixed spectral responses under vegetation-covered conditions.

To address the above issues, this study proposed a spectral-spatial and boundary-enhanced semantic segmentation framework, namely SAB-DeepLabV3+ (Spectral-spatial and Boundary-enhanced DeepLabV3+), based on DeepLabV3+. This method improves the identification accuracy and spatial integrity of waterlogged regions in complex farmland scenarios via spectral-spatial information enhancement, adaptive multi-scale context modeling, and boundary-aware feature optimization. The main contributions of this paper are as follows:

(1) A single-date multispectral dataset for maize waterlogging stress was constructed, covering five prefecture-level cities in Heilongjiang Province of China, with typical events from 2022 to 2024, including pixel-level annotation, conventional testing, and cross-regional generalization validation.

(2) A maize waterlogging stress identification framework, namely SAB-DeepLabV3+, was proposed, with three customized modules: SSIEM (Spectral-Spatial Information Enhancement Module), AMSP (Adaptive Multi-Scale Pooling Module), and BEAM (Boundary Enhancement Attention Module), targeting spectral aliasing, scale heterogeneity, and boundary blurring, respectively.

(3) The effectiveness of the proposed method was systematically evaluated through conventional tests and cross-regional independent tests, confirming the application potential of single-date RGB-NIR imagery for rapid maize waterlogging mapping.

2. Materials and Methods

2.1. Study Area

In this study, five typical major maize-producing prefecture-level cities in Heilongjiang Province were selected as the study area, including Daqing, Qiqihar, Mudanjiang, Jixi and Hegang (Figure 1). All these cities are located in the core commodity grain production area of Northeast China, with large maize planting areas. In recent years, these cities have been affected by varying degrees of heavy rainfall and field waterlogging during the maize growing season, making them ideal for regional-scale research on maize waterlogging monitoring.

The study area features diverse landform types, covering typical agricultural ecological units such as the low-lying agricultural area in the central-western Songnen Plain, the western margin of the Sanjiang Plain, and the piedmont hilly transition zone of the Lesser Khingan Mountains. Specifically, among the five cities included, Daqing and Qiqihar have relatively small topographic relief and are prone to persistent plain waterlogging in years with excessive precipitation; Mudanjiang, Jixi, and Hegang are dominated by mountain-hill-plain composite landforms, with waterlogging processes mostly manifested as slope runoff convergence, stagnant water in low-lying areas, and poor farmland drainage conditions.

Climatically, the study area has a temperate continental monsoon climate with uneven annual precipitation distribution, concentrated summer rainfall, and prominent interannual fluctuations. In the middle and late growth stages of maize, short-duration heavy rainfall or persistent overcast rain can easily induce field waterlogging and rhizosphere stress, exerting cumulative effects on crop physiological activities and canopy structure [19]. Variations in precipitation conditions, topographic convergence patterns, and farmland water conservancy infrastructure among different cities lead to significant heterogeneity in the spatial distribution, patch morphology, and damage severity of waterlogging impacts [20].

Overall, the study area represents the typical environmental conditions under which maize waterlogging stress occurs in Heilongjiang Province, providing a representative application background for the subsequent construction of remote sensing samples and method validation.

2.2. Data Sources and Dataset Construction

This study constructed a cross-regional remote sensing dataset for maize waterlogging stress using post-event single-date multispectral images acquired during the maize growing seasons from 2022 to 2024. The waterlogging events considered in this study refer to local field waterlogging caused by short-duration heavy rainfall or persistent rainfall during the middle and late growth stages of maize in the five study cities. Image acquisition was mainly concentrated from mid-to-late August to early September, when waterlogging stress had produced observable effects on canopy structure, leaf pigment status, and field background. The 62 selected scenes were acquired after different local waterlogging events across Daqing, Hegang, Jixi, Mudanjiang, and Qiqihar, and were processed as independent single-date scenes rather than mosaicked into a unified regional image. Specifically, the dataset included 20 scenes from 2022, 21 scenes from 2023, and 21 scenes from 2024.

The remote sensing data were obtained from the GW-A59-C multispectral satellite product of the China SatNet GW constellation, also known as the Guowang constellation. To avoid ambiguity, “Guowang” in this study refers to the China SatNet GW constellation and is not related to the State Grid Corporation of China or the Power Engineering Satellite. The main characteristics of the GW-A59-C data product are summarized in Table 1. After screening for cloud cover, imaging quality, and disaster coverage, 62 scenes were finally selected for dataset construction, including 11 scenes in Daqing, 15 in Hegang, 15 in Jixi, 12 in Mudanjiang, and 9 in Qiqihar. The original GW-A59-C images were geometrically referenced multispectral products with RPC/WGS84 UTM/WGS84 projection information. Image preprocessing was performed using ENVI 5.6 software (NV5 Geospatial Solutions, Inc., Boulder, CO, USA), mainly including radiometric calibration, atmospheric correction, and spatial registration, to ensure consistency among multispectral images, maize masks, and vector labels.

The dataset construction workflow consisted of three stages: data preparation, data processing, and dataset creation, as shown in Figure 2. In the data preparation stage, maize planting vector data were used to extract maize planting regions and generate maize masks. Drone aerial photography and field sampling were not used as model input data, but served as auxiliary reference information for label verification and correction. Field campaigns were conducted during the 2022–2024 growing seasons, mainly from mid-to-late August to early September and generally within three days after local waterlogging events. The campaigns were carried out in representative maize fields across Daqing, Hegang, Jixi, Mudanjiang, and Qiqihar, focusing on typical waterlogged areas, non-waterlogged areas, and uncertain boundary regions.

In the data processing stage, the preprocessed multispectral images were cropped using maize masks to retain only maize planting regions. NDVI was calculated from the Red and NIR bands to assist visual interpretation of waterlogging-affected maize. The NDVI result map was used together with RGB-NIR imagery, spectral anomalies, texture changes, field morphology, drone photographs, and field sampling records to generate and refine the binary disaster map. It should be noted that NDVI was used only as auxiliary interpretation information rather than as an independent threshold-based labeling rule. Specifically, manual interpretation was conducted within the maize masks by jointly comparing RGB composites, NIR band responses, NDVI maps, and field verification information. Waterlogging-affected maize was identified when patches showed reduced vegetation vigor, abnormal visible color or brightness, weakened NIR reflectance, lower NDVI values relative to surrounding healthy maize, irregular or aggregated patch patterns, and spatial correspondence with low-lying or poorly drained field areas. In contrast, maize areas with normal canopy color, relatively homogeneous texture, stable NIR response, and no field evidence of waterlogging were labeled as non-waterlogged maize. Ambiguous regions, especially transition zones and fragmented patches, were further checked using drone photographs and field sampling records before final pixel-level correction.
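The NDVI layer used as an interpretation aid follows the standard definition, NDVI = (NIR − Red) / (NIR + Red). The snippet below is a minimal sketch of that calculation, assuming the bands are available as NumPy arrays of surface reflectance; the function name `ndvi`, the small `eps` stabilizer, and the sample values are illustrative and are not taken from the study.

```python
import numpy as np

def ndvi(red, nir, eps=1e-6):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red).

    eps is an illustrative safeguard against division by zero on
    masked (all-zero) pixels; it is not part of the NDVI definition.
    """
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    return (nir - red) / (nir + red + eps)

# Hypothetical reflectance values: a vigorous canopy pixel and a stressed one.
red = np.array([[0.05, 0.12]])
nir = np.array([[0.45, 0.20]])
index = ndvi(red, nir)
print(index)  # the first pixel has the higher NDVI
```

In line with the text, such a map would only support visual interpretation; no threshold on it is implied here.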

Initial labeling was independently completed by five annotators and reviewed by three experts to ensure label consistency and reliability. Approximately one-third of typical sample regions in each city were selected for drone and field verification, and the interpretation labels were revised pixel by pixel according to the verification results. The final labeling system included two categories: non-waterlogged maize, assigned as 0, and waterlogging-affected maize, assigned as 1. After rasterization, the vector labels were registered with the corresponding multispectral images to generate standardized semantic segmentation labels. Figure 3 displays the overlay results of multispectral images and pixel-level labels for visual verification of the consistency between label boundaries and image features.

In the dataset creation stage, the labeled maize images and corresponding binary disaster maps were cropped into 512 × 512-pixel patches using a sliding-window strategy, yielding a total of 12,198 valid image-label samples. Considering the spatial autocorrelation of remote sensing samples, a spatially non-overlapping tiling strategy was adopted to construct the training, validation, and test sets with a split ratio of 7:1:2. This strategy was used to reduce evaluation bias caused by spatial information leakage from adjacent samples. It should be noted that this splitting method belongs to spatially non-overlapping patch partitioning rather than fully scene-independent partitioning. Therefore, a leave-one-city-out cross-regional experiment was further designed to evaluate the model’s transferability from a stricter region-independent perspective. Specifically, in each round of this experiment, samples from one city were used as the independent test region, while samples from the remaining four cities were used for model training and validation. This process was repeated five times, with Daqing, Hegang, Jixi, Mudanjiang, and Qiqihar each serving once as the held-out test city.
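The leave-one-city-out protocol can be sketched as a simple fold generator. The scene counts below are those reported in this section; the helper name `leave_one_city_out` and the printed summary are illustrative only, and the sketch does not reproduce the actual patch-level file handling.

```python
# Scene counts per city as reported in Section 2.2.
SCENES_PER_CITY = {
    "Daqing": 11,
    "Hegang": 15,
    "Jixi": 15,
    "Mudanjiang": 12,
    "Qiqihar": 9,
}

def leave_one_city_out(cities):
    """Yield (held_out_city, remaining_cities) once per city."""
    cities = list(cities)
    for held_out in cities:
        yield held_out, [c for c in cities if c != held_out]

folds = list(leave_one_city_out(SCENES_PER_CITY))
for held_out, train_cities in folds:
    n_train = sum(SCENES_PER_CITY[c] for c in train_cities)
    n_test = SCENES_PER_CITY[held_out]
    print(f"test={held_out:<10} train/val scenes={n_train:2d} test scenes={n_test:2d}")
```

Each fold keeps every scene of the held-out city out of training and validation, which is what makes this evaluation stricter than the non-overlapping patch split.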

During the training phase, data augmentation was applied only to the training samples to improve model robustness. Specifically, each patch was randomly scaled within 0.5–1.5 and then cropped or padded to 512 × 512 pixels. Random horizontal flipping was performed with a probability of 0.5, and random rotation within −10° to 10° was applied with a probability of 0.25. Brightness, contrast, and hue perturbations were applied only to the RGB bands, while the NIR band was kept unchanged to preserve its spectral consistency. In addition, Gaussian blur with a 5 × 5 kernel was randomly applied with a probability of 0.25. For geometric transformations, bilinear interpolation was used for images, whereas nearest-neighbor interpolation was used for label masks to maintain categorical labels.
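The band-specific part of this augmentation policy can be illustrated as follows. This is a simplified sketch, assuming patches are stored as (4, H, W) arrays ordered R, G, B, NIR and scaled to [0, 1]; the jitter ranges are hypothetical, and the random scaling, rotation, hue shift, and Gaussian blur described above are omitted for brevity.

```python
import numpy as np

def augment(image, label, rng):
    """image: (4, H, W) float array ordered R, G, B, NIR; label: (H, W) int array."""
    image = image.copy()
    label = label.copy()
    # Horizontal flip with probability 0.5, applied to image and label together.
    if rng.random() < 0.5:
        image = image[:, :, ::-1]
        label = label[:, ::-1]
    # Brightness/contrast jitter on the RGB bands only (ranges are assumptions).
    brightness = rng.uniform(-0.1, 0.1)
    contrast = rng.uniform(0.9, 1.1)
    rgb = image[:3]
    mean = rgb.mean()
    image[:3] = np.clip((rgb - mean) * contrast + mean + brightness, 0.0, 1.0)
    # The NIR band (index 3) is never modified photometrically.
    return image, label

rng = np.random.default_rng(0)
image = rng.random((4, 8, 8))
label = (rng.random((8, 8)) > 0.5).astype(np.int64)
aug_image, aug_label = augment(image, label, rng)
```

Because a flip only reorders pixels, label values stay categorical without any interpolation; the nearest-neighbor rule mentioned above matters for the scaling and rotation steps that this sketch leaves out.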

2.3. Research Methods

2.3.1. DeepLabV3+

DeepLabV3+ is a classic encoder–decoder semantic segmentation framework that has been widely used in remote sensing feature extraction and agricultural scene segmentation tasks. The model expands the receptive field through dilated convolutions and aggregates multi-scale contextual information using the Atrous Spatial Pyramid Pooling (ASPP) strategy, thus enhancing semantic representation while maintaining relatively high feature resolution [21]. Subsequently, the decoder fuses the up-sampled high-level semantic features with low-level detail features to recover spatial resolution and improve the representation of boundary transition zones [22]. The overall network architecture of DeepLabV3+ is shown in Figure 4.

To balance segmentation accuracy and computational efficiency, we selected DeepLabV3+ as the baseline framework, with MobileNetV2 as the backbone network. Although DeepLabV3+ exhibits strong advantages in multi-scale feature modeling, it still suffers from three limitations in the task of single-date multispectral maize waterlogging mapping:

(1) Insufficient spectral discriminative modeling. The original DeepLabV3+ framework is designed mainly for natural images and cannot explicitly model the differential contributions of RGB-NIR bands in waterlogging identification.

(2) Relatively fixed scale response. The multi-branch receptive fields of the standard ASPP are determined by preset dilation rates, making it difficult to fully adapt to the significant differences in the area, shape, and spatial organization of waterlogged patches in farmland.

(3) Limited boundary recovery capability. When disaster-affected and non-disaster-affected regions show gradual transitions, relying solely on conventional decoder fusion makes it challenging to accurately delineate field boundaries and fragmented patches [23,24].

To address the above issues, we introduced targeted enhancement modules at key feature flow nodes of DeepLabV3+ to form an improved framework.

2.3.2. SAB-DeepLabV3+

To tackle common problems such as spectral aliasing, large patch-scale variations, and boundary blurring in single-date multispectral maize waterlogging imagery, we proposed the SAB-DeepLabV3+ (Spectral-spatial and Boundary-enhanced DeepLabV3+) framework based on DeepLabV3+ (overall structure shown in Figure 5). The model retains the original encoder–decoder backbone and only embeds enhancement modules at three key positions to improve the model’s adaptability to complex farmland scenes.

The SAB-DeepLabV3+ model still adopts MobileNetV2 as the backbone network. Specifically, the input image first passes through the encoder to extract low-level spatial detail features and high-level semantic features. Subsequently, the high-level semantic features sequentially pass through the Spectral-Spatial Information Enhancement Module (SSIEM) and the Adaptive Multi-Scale Pooling Module (AMSP) to improve discriminative representation and contextual modeling for disaster-affected regions. In the decoding stage, the up-sampled high-level features are fused with low-level detail features and further refined by the Boundary Enhancement Attention Module (BEAM) to sharpen the boundaries of waterlogged regions, finally outputting pixel-level segmentation results.

The three modules aforementioned correspond to three key challenges in single-date maize waterlogging identification:

(1) SSIEM operates on high-level encoder features and adaptively recalibrates different spectral channels and their spatial responses using global context, thereby strengthening the effective information related to waterlogging discrimination while suppressing redundant or interfering responses.

(2) AMSP replaces the standard ASPP module and dynamically learns the weights of different receptive field branches, thereby improving the model’s ability to jointly adapt to large-scale contiguous waterlogged regions and small-scale fragmented disaster patches.

(3) BEAM operates on decoded fused features to strengthen the representation of transition zones between waterlogged and non-waterlogged regions using boundary-sensitive information, thus improving boundary expression and shape recovery.

Notably, the design goal of SAB-DeepLabV3+ is not simply to stack generic attention structures, but to implement task-driven structural improvements targeting specific identification challenges in single-date maize waterlogging mapping. While maintaining the stability of the overall network, the model balances segmentation accuracy, boundary quality, and computational efficiency. The structures and mathematical formulations of SSIEM, AMSP, and BEAM are described in detail in the following subsections. To improve the readability of the multi-module architecture, the encoder–decoder assignment and functional role of each module are summarized in Table 2.

2.3.3. SSIEM Module

In the SAB-DeepLabV3+ framework, the high-level features output by the encoder carry not only spatial structural information but also implicit response differences in multispectral bands under varying crop physiological status and waterlogging stress conditions. However, in single-date agricultural remote sensing images, land cover types such as waterlogged vegetation, healthy vegetation, water bodies, and bare soil often exhibit significant overlaps in the spectral space, weakening the discriminative ability of some spectral channels for waterlogging identification and making them vulnerable to background interference.

Traditional convolutional neural networks (CNNs) generally treat multispectral channels as equivalent and deterministic input features, lacking explicit modeling of the differences in discriminative contribution among spectral channels. As a result, disaster signals tend to be masked by redundant background or ambiguous spectral information. In view of this, we introduced SSIEM at the output end of the encoder’s high-level features to adaptively recalibrate different spectral channels and their spatial responses. This strategy can strengthen effective features related to waterlogging discrimination while suppressing redundant or interfering responses. The overall structure of SSIEM is shown in Figure 6.

Let the high-level features output by the encoder be expressed as:

F \in ℝ^{C \times H \times W}

(1)

where C, H, and W represent the number of channels, feature map height, and width, respectively.

First, Global Average Pooling (GAP) is performed on the spatial dimension to obtain channel-wise spectral response description vectors as follows:

z = \mathrm{GAP}(F), \quad z \in ℝ^{C}

(2)

Then, an efficient channel attention mechanism is used to model inter-channel dependencies. This process can be expressed as:

w = σ(\mathrm{Conv1D}(z))

(3)

where w \in ℝ^{C} represents the confidence estimate of channel-wise spectral responses; \mathrm{Conv1D}(\cdot) denotes a one-dimensional convolution operator used to capture local correlations between adjacent spectral channels; and σ(\cdot) is the Sigmoid activation function.

Based on this, channel recalibration is performed on the original features as follows:

F' = w ⊙ F

(4)

where ⊙ denotes element-wise multiplication.

On the basis of channel enhancement, the spatial heterogeneity of waterlogged regions and the gradual characteristics of their boundary transition zones are taken into account. Specifically, SSIEM incorporates an adaptive response enhancement mechanism in the spatial dimension to generate a spatial weight map by aggregating channel-wise information and applying spatial convolution as follows:

M_{s} = σ(f^{3 \times 3}(\mathrm{Concat}(\mathrm{AvgPool}(F'), \mathrm{MaxPool}(F'))))

(5)

where f^{3 \times 3}(\cdot) denotes a 3 × 3 convolution operator, and M_{s} \in ℝ^{H \times W}.

High-weight regions generally correspond to core waterlogged regions that are clearly discriminable, while low-weight regions are mostly distributed in transition zones between waterlogged regions and the background. Finally, the module output can be expressed as:

F_{out} = M_{s} ⊙ F'

(6)

Through the above joint channel-spatial recalibration process, SSIEM can highlight the spectral-spatial responses related to waterlogging identification from high-level features, thus improving feature discriminability in complex farmland scenarios. This provides more stable semantic representations for subsequent multi-scale contextual modeling.
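The recalibration chain in Equations (2)–(6) can be traced with a small NumPy forward pass. This is a sketch under stated assumptions rather than the authors' implementation: the 1-D kernel length of 3, zero padding, the random stand-in weights, and the function name `ssiem_forward` are all illustrative, and the "convolutions" are computed as cross-correlations, as is usual in deep learning frameworks.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ssiem_forward(F, k1d, k3x3):
    """F: (C, H, W) features; k1d: (k,) 1-D kernel with odd k; k3x3: (2, 3, 3) kernel."""
    C, H, W = F.shape
    # Eq. (2): global average pooling over the spatial dimensions.
    z = F.mean(axis=(1, 2))
    # Eq. (3): 1-D filtering across neighbouring channels, then Sigmoid.
    pad = len(k1d) // 2
    z_pad = np.pad(z, pad)
    w = sigmoid(np.array([z_pad[c:c + len(k1d)] @ k1d for c in range(C)]))
    # Eq. (4): channel recalibration, with w broadcast over H and W.
    F1 = w[:, None, None] * F
    # Eq. (5): channel-wise average and max maps, concatenated, 3x3 filter, Sigmoid.
    pooled = np.stack([F1.mean(axis=0), F1.max(axis=0)])  # (2, H, W)
    padded = np.pad(pooled, ((0, 0), (1, 1), (1, 1)))
    Ms = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            Ms[i, j] = np.sum(padded[:, i:i + 3, j:j + 3] * k3x3)
    Ms = sigmoid(Ms)
    # Eq. (6): spatial recalibration.
    return Ms[None] * F1, w, Ms

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 5, 6))
F_out, w, Ms = ssiem_forward(F, k1d=rng.normal(size=3), k3x3=rng.normal(size=(2, 3, 3)))
print(F_out.shape, w.shape, Ms.shape)
```

Since both gates pass through a Sigmoid, every output element is a damped copy of the corresponding input element, which matches the description of SSIEM as a recalibration step rather than a feature generator.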

2.3.4. AMSP Module

In a standard DeepLabV3+ network, the ASPP module extracts multi-scale contextual information through atrous convolution branches with different preset dilation rates, based on the implicit assumption that the target scale distribution is relatively stable. However, in agricultural disaster remote sensing scenarios, disaster-affected regions often exhibit highly uneven scales and fragmented spatial morphologies, such that fixed dilation rates cannot effectively respond to disaster patches of different scales.

To address this problem, we proposed the AMSP module, which dynamically adjusts the contribution of different receptive field branches by introducing a scale weight learning mechanism. Its structural diagram is shown in Figure 7.

Let the output features of SSIEM, F_{out} in Equation (6), be denoted as F''. AMSP performs multi-scale feature extraction through K parallel branches as follows:

F_{k} = f_{k}(F''), \quad k = 1, \dots, K

(7)

where f_{k}(\cdot) denotes convolution operations under different dilation rates or receptive field configurations.

To avoid scale information redundancy caused by simple concatenation or averaging, a global feature-guided weight generation mechanism was introduced. First, global pooling is performed on features of each scale and the results are fused to obtain a scale description vector as follows:

s = \sum_{k = 1}^{K} \mathrm{GAP}(F_{k})

(8)

Then, scale weights are generated through linear channel mapping (1 × 1 convolution) and an activation function:

α = \mathrm{Softmax}(f^{1 \times 1}(s))

(9)

where α = [α_{1}, \dots, α_{K}] and \sum_{k} α_{k} = 1.

Finally, multi-scale features are fused according to their adaptive weights:

F_{AMSP} = \sum_{k = 1}^{K} α_{k} \cdot F_{k}

(10)

Unlike channel or kernel selection mechanisms, the weight learning of AMSP acts directly at the level of the scale-specific context branches, enabling the model to autonomously emphasize more discriminative receptive field branches according to the scale characteristics of disaster patches in the input image, thus improving the flexibility and robustness of multi-scale context modeling.
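The weighting scheme of Equations (8)–(10) can be sketched as follows, given branch outputs that have already been computed. The assumption that the 1 × 1 mapping projects the C-dimensional descriptor s onto K logits is inferred from the definition of α; the matrix `W_proj`, the function name `amsp_fuse`, and the random inputs are hypothetical stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def amsp_fuse(branches, W_proj):
    """branches: (K, C, H, W) branch outputs F_k; W_proj: (K, C) stand-in for f^{1x1}."""
    # Eq. (8): sum of the globally pooled branch descriptors, s in R^C.
    s = branches.mean(axis=(2, 3)).sum(axis=0)
    # Eq. (9): project s to K logits and normalise with Softmax.
    alpha = softmax(W_proj @ s)
    # Eq. (10): weighted sum of the branch features.
    fused = np.tensordot(alpha, branches, axes=(0, 0))
    return fused, alpha

rng = np.random.default_rng(0)
branches = rng.normal(size=(4, 8, 5, 6))   # K = 4 hypothetical branches
W_proj = rng.normal(size=(4, 8))
fused, alpha = amsp_fuse(branches, W_proj)
print(alpha, fused.shape)
```

Because the weights sum to one, the fused map is a convex combination of the branches, so the output keeps C channels, unlike the concatenation used in standard ASPP.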

2.3.5. BEAM Module

In single-date maize waterlogging identification tasks, wide transition zones often exist between waterlogged and non-waterlogged regions. Especially in regions with mild waterlogging, fragmented fields, or complex background textures, boundaries usually exhibit gradual rather than sharp transition characteristics. Therefore, relying solely on high-level semantic features easily leads to local over-smoothing and a certain degree of misclassification in boundary transition zones. To address this issue, we introduced BEAM after the fusion of low-level and high-level features, which integrates local edge cues and high-level semantic information to improve the representation quality of boundary transition zones and local classification consistency. The overall structure of BEAM is shown in Figure 8.

Let the fused feature in the decoding stage be denoted as F_{d}. First, edge responses are extracted through a gradient operator as follows:

E = g(F_{d})

(11)

where g(\cdot) denotes a differentiable Sobel operator used to characterize potential boundary zones. This operator can participate in end-to-end training under the backpropagation framework.

Subsequently, a semantically guided attention mechanism is introduced to map high-level semantic information into boundary weights:

A_{b} = σ(f^{1 \times 1}(F_{d}))

(12)

where f^{1 \times 1}(\cdot) denotes a channel mapping operator used to generate attention weights related to boundaries.

Finally, the boundary-enhanced feature is expressed as:

F_{BEAM} = F_{d} + A_{b} ⊙ E

(13)

Through the above design, BEAM can maintain the dominance of high-level semantic information while appropriately introducing local structural constraints. This is conducive to refining the representation of boundary transition zones and improving the local integrity of waterlogged patches.
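Equations (11)–(13) can be illustrated with a NumPy sketch on a toy feature map. Several details here are assumptions, since the text does not fix them: the edge response is taken as the per-channel Sobel gradient magnitude, zero padding is used at the borders, and the 1 × 1 mapping is represented by a hypothetical C × C matrix `W_att`.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(x, kernel):
    """Per-channel 3x3 cross-correlation with zero padding. x: (C, H, W)."""
    C, H, W = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * p[:, di:di + H, dj:dj + W]
    return out

def beam_forward(Fd, W_att):
    """Fd: (C, H, W) fused decoder feature; W_att: (C, C) stand-in for f^{1x1}."""
    # Eq. (11): edge response, assumed here to be the Sobel gradient magnitude.
    gx, gy = filter3x3(Fd, SOBEL_X), filter3x3(Fd, SOBEL_Y)
    E = np.sqrt(gx ** 2 + gy ** 2)
    # Eq. (12): a 1x1 convolution is a per-pixel linear map over channels, then Sigmoid.
    A = 1.0 / (1.0 + np.exp(-np.einsum("oc,chw->ohw", W_att, Fd)))
    # Eq. (13): residual boundary enhancement.
    return Fd + A * E, E

Fd = np.zeros((2, 6, 6))
Fd[:, :, 3:] = 1.0                      # vertical step edge between columns 2 and 3
out, E = beam_forward(Fd, W_att=np.eye(2))
print(E[0, 2])                          # interior response peaks next to the step
```

The residual form leaves flat interior regions unchanged (E is zero there) and only adds a gated term near transitions, which is consistent with the stated aim of keeping semantic features dominant. Zero padding also produces responses along the image border, which a practical implementation would need to handle.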

2.3.6. Relationship to Existing Modules

The proposed SSIEM, AMSP, and BEAM are task-oriented adaptations of existing attention, multi-scale, and boundary-refinement strategies for single-date RGB-NIR maize waterlogging segmentation. While Table 2 provides a functional overview of their roles in the encoder–decoder pipeline, Table 3 further reports their detailed architectural settings and computational costs.

Compared with standalone attention modules such as SE, ECA, CBAM, and CoordAttention, SSIEM is not used as a generic plug-in attention block. Instead, it is designed as a task-oriented spectral-spatial enhancement module for RGB-NIR maize waterlogging segmentation. By jointly incorporating spectral response enhancement, channel recalibration, and spatial encoding, SSIEM aims to strengthen waterlogging-related features while suppressing interference from non-waterlogged maize, moist soil, and heterogeneous field backgrounds.

AMSP is derived from the multi-scale context modeling idea of ASPP, but differs from standard ASPP and static multi-scale variants by introducing adaptive scale selection. Rather than simply concatenating atrous convolution branches with fixed receptive fields, AMSP dynamically adjusts the contribution of different scale branches according to the spatial morphology of waterlogged patches, thereby improving its adaptability to both large continuous regions and fragmented patches.

BEAM is related to edge-guided refinement modules, but it does not introduce an additional edge-supervision branch. Instead, it uses Sobel-derived edge responses together with semantic guidance and boundary gating within the decoder. This design allows the model to refine gradual transition zones between waterlogged and non-waterlogged maize while reducing the influence of irrelevant texture edges.

2.4. Experimental Environment and Configuration

All experiments were performed on a 64-bit Linux system (kernel 5.15.0) with Python 3.8.20. The deep learning framework was PyTorch 2.4.1 (built against CUDA 12.1), combined with cuDNN 9.1.0 for accelerated computing and operator optimization. The hardware platform was equipped with four NVIDIA GeForce RTX 3090 graphics processing units (each with 24 GB of video memory and a compute capability of 8.6), enabling multi-GPU parallel training that meets the memory and throughput requirements of semantic segmentation on high-resolution remote sensing images.

The Adam optimizer was used for all models, with an initial learning rate of 1 × 10⁻⁴, a batch size of 16, and 100 training epochs. The loss function combined cross-entropy loss and Dice loss to balance pixel-level classification accuracy and regional overlap consistency. To ensure fair comparison, all deep learning models were trained and evaluated using the same training, validation, and test splits, input size, data augmentation strategy, optimizer, learning-rate schedule, batch size, number of epochs, loss function, and evaluation metrics. The main comparative experiments were conducted with a fixed random seed of 11. To further evaluate training variability, the ablation experiments were repeated using three random seeds, namely 11, 22, and 33, and the results were reported as mean ± standard deviation.
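The combined loss can be sketched as follows for the binary case. This is a minimal illustration rather than the training code: the relative weighting of the two terms is not stated in the text, so equal weights are assumed, and `ce_dice_loss`, `ce_weight`, `dice_weight`, and `eps` are hypothetical names.

```python
import math

def ce_dice_loss(probs, targets, ce_weight=1.0, dice_weight=1.0, eps=1e-7):
    """Binary cross-entropy plus soft Dice loss over a flat list of pixels.

    probs   : predicted probability of the waterlogged class per pixel.
    targets : 0/1 ground-truth labels, same length as probs.
    Equal default weights are an assumption, not a reported setting.
    """
    n = len(probs)
    ce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        ce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    ce /= n
    # Soft Dice: overlap between predicted probabilities and labels.
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    dice = 1.0 - (2.0 * inter + eps) / (denom + eps)
    return ce_weight * ce + dice_weight * dice

perfect = ce_dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0])
uncertain = ce_dice_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

The cross-entropy term penalizes each pixel independently, whereas the Dice term depends on the overlap ratio of the whole predicted region, which is why the combination is commonly used when the target class covers a small share of pixels.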

Module-level profiling was conducted on an NVIDIA GeForce RTX 3090 GPU using the feature dimensions listed in Table 3. Inference time was averaged over 100 iterations after 10 warm-up runs, and FLOPs were reported according to the profiler’s multiply-add convention.
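The timing protocol (10 warm-up runs followed by an average over 100 iterations) can be sketched as below. This is a generic, framework-agnostic illustration; `mean_latency_ms` is a hypothetical helper, and the actual profiler used in the study is not specified.

```python
import time

def mean_latency_ms(fn, warmup=10, iters=100):
    """Average wall-clock time of fn() in milliseconds after warm-up runs."""
    for _ in range(warmup):
        fn()                      # warm-up: caches, lazy initialization
    # Note: for GPU workloads, kernels run asynchronously, so the device
    # should be synchronized (e.g. torch.cuda.synchronize()) before
    # reading the clock at both ends of the timed loop.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iters

calls = []
latency = mean_latency_ms(lambda: calls.append(1))
```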

2.5. Evaluation Indicators

To comprehensively evaluate the segmentation performance of the proposed model in the task of maize waterlogging identification, we adopted Intersection over Union (IoU), mean Intersection over Union (mIoU), Overall Accuracy (OA), mean Pixel Accuracy (mPA), and mean F1-score (mF1) as evaluation indicators. Considering that waterlogged regions usually account for a small proportion in pixel distribution, mIoU, mPA, and mF1 were used simultaneously to alleviate the influence of class imbalance on evaluation results.

Single-class IoU is defined as the degree of overlap between the prediction result and the ground truth, with the calculation formula as follows:

\mathrm{IoU} = \frac{TP}{TP + FP + FN}

(14)

where TP, FP, and FN denote the numbers of correctly classified target pixels, background pixels misclassified as targets, and missed target pixels, respectively. The value of mIoU is calculated as the average IoU over all classes.

OA was used to measure the overall classification correctness across the entire image, defined as:

\mathrm{OA} = \frac{TP + TN}{TP + FP + FN + TN}

(15)

where TN denotes the number of pixels correctly classified as background.

To further measure the recognition balance among classes, we adopted mPA, defined as the average of single-class pixel accuracies:

\mathrm{mPA} = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i}}{TP_{i} + FN_{i}}

(16)

where K denotes the number of classes.

F1-score comprehensively takes into account the precision and recall, with the single-class F1 defined as:

\mathrm{F1} = \frac{2TP}{2TP + FP + FN} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(17)

Subsequently, we further calculated the average F1-score over all classes as mF1 to characterize the model’s comprehensive recognition ability.

In summary, this study comprehensively evaluated model performance from the perspectives of spatial overlap, overall classification accuracy, and class balance, providing a basis for performance comparison among different methods in maize waterlogging segmentation tasks.
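For reference, Equations (14)–(17) can all be computed from a single confusion matrix, as in the following sketch. The function name `segmentation_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def segmentation_metrics(conf):
    """Compute IoU, mIoU, OA, mPA and mF1 from a K x K confusion matrix.

    conf[i][j] = number of pixels of true class i predicted as class j.
    Assumes every class occurs at least once (no zero denominators).
    """
    k = len(conf)
    total = sum(sum(row) for row in conf)
    ious, accs, f1s = [], [], []
    for i in range(k):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                          # class i pixels missed
        fp = sum(conf[r][i] for r in range(k)) - tp     # other pixels labeled i
        ious.append(tp / (tp + fp + fn))                # Eq. (14)
        accs.append(tp / (tp + fn))                     # per-class term of Eq. (16)
        f1s.append(2 * tp / (2 * tp + fp + fn))         # Eq. (17)
    return {
        "iou": ious,
        "miou": sum(ious) / k,
        "oa": sum(conf[i][i] for i in range(k)) / total,  # Eq. (15)
        "mpa": sum(accs) / k,
        "mf1": sum(f1s) / k,
    }

# Hypothetical counts: class 0 = non-waterlogged, class 1 = waterlogged.
conf = [[900, 20],
        [30, 50]]
m = segmentation_metrics(conf)
```

In this made-up example OA is 0.95 while the waterlogged-class IoU is only 0.50, which illustrates why class-averaged metrics are reported alongside OA when the target class is rare.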

3. Results and Analysis

3.1. Spectral Separability Analysis

To evaluate the spectral separability between waterlogged and non-waterlogged maize in single-date multispectral imagery, 10,000 pixels were randomly selected from each class. The Jeffries-Matusita (JM) distance was then calculated for single-band features, the four-band RGB-NIR feature, and two vegetation indices, namely NDVI and GNDVI. NDVI was selected because it uses the contrast between red absorption and NIR reflectance to characterize vegetation vigor, while GNDVI replaces the red band with the green band and is generally more sensitive to chlorophyll-related variations. Both indices can be directly derived from the available RGB-NIR imagery and are therefore suitable for evaluating whether commonly used vegetation-index transformations can improve the separability of waterlogged and non-waterlogged maize.

The results showed that the four-band RGB-NIR feature achieved the highest JM distance of 1.3291 among the tested configurations, providing empirical evidence that the combined multispectral feature offered stronger separability than individual bands and vegetation indices. For single-band features, the Red band showed the highest separability with a JM distance of 0.6589, followed by NIR, Green, and Blue, with JM distances of 0.5494, 0.5288, and 0.3741, respectively. The JM distances of NDVI and GNDVI were 0.7229 and 0.8845, respectively, both higher than those of individual bands but lower than that of the four-band RGB-NIR feature. Compared with GNDVI, the best-performing index, the four-band feature increased the JM distance by 0.4446. This indicates that although vegetation indices can enhance the spectral contrast between the two classes, a single index may compress band-specific information and cannot fully represent the spectral complexity of maize waterlogging in heterogeneous farmland scenes.
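To make the scale of these values concrete, the sketch below computes the JM distance for a single feature under the common Gaussian assumption, in which JM = 2(1 − e^(−B)) and B is the Bhattacharyya distance. The text does not state which distributional form was used, so the Gaussian form, the function name `jm_distance_1d`, and the example inputs are assumptions for illustration.

```python
import math

def jm_distance_1d(mean1, var1, mean2, var2):
    """Jeffries-Matusita distance between two univariate Gaussian classes."""
    avg_var = 0.5 * (var1 + var2)
    # Bhattacharyya distance: a mean-separation term plus a variance-mismatch term.
    b = (0.125 * (mean1 - mean2) ** 2 / avg_var
         + 0.5 * math.log(avg_var / math.sqrt(var1 * var2)))
    return 2.0 * (1.0 - math.exp(-b))

identical = jm_distance_1d(0.0, 1.0, 0.0, 1.0)    # fully overlapping classes
distant = jm_distance_1d(0.0, 1.0, 10.0, 1.0)     # almost no overlap
```

Under this form the JM distance is bounded between 0 (identical distributions) and 2 (complete separation), so a value of 1.3291 for the four-band feature sits well above the single-band and index values yet clearly below full separability.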

Overall, single-date RGB-NIR imagery exhibited a “separable but not fully separated” characteristic for maize waterlogging identification. These results demonstrate the potential of single-date multispectral data for rapid waterlogging assessment, while also indicating that spectral features alone remain insufficient under complex farmland conditions. Therefore, further integration of spatial context and deep feature modeling is necessary to improve segmentation accuracy.

3.2. Comparative Experiments

3.2.1. Comparison with Traditional Single-Date Methods

To further verify the feasibility of applying single-date multispectral imagery for waterlogging identification within maize planting regions, we conducted a comparative experiment with three types of traditional single-date baseline methods: the NDVI threshold method, GNDVI threshold method, and Random Forest (RF) classification method. For the NDVI and GNDVI threshold methods, the thresholds were automatically selected only on the training set. Class-balanced pixels were first sampled from the training patches, and 1200 candidate thresholds were generated from the 0.001–0.999 quantiles of the sampled index values. For each candidate threshold, both lower-than-threshold and higher-than-threshold decision rules were tested. The optimal threshold was defined as the one that achieved the highest F1-score for the waterlogging-affected maize class, and the selected threshold was then fixed for evaluation on the test set. The RF classifier used a total of 22-dimensional input features, including original bands, vegetation index/ratio features, and local spatial statistical features. Class-balanced sampling was employed during training, and the number of trees was set to 500. Meanwhile, class_weight = balanced_subsample was adopted to alleviate the impact of class imbalance. To ensure consistency for comparison with deep learning models, the threshold optimization and classifier training for traditional methods were only performed on the training set, with the final results uniformly reported on the test set. Relevant results are presented in Table 4.
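The threshold selection procedure described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script: candidate thresholds are assumed to be evenly spaced in quantile level between 0.001 and 0.999, the quantile uses linear interpolation, and the names `normalized_difference`, `select_threshold`, and the toy data are hypothetical.

```python
def normalized_difference(nir, band):
    """NDVI when band is red, GNDVI when band is green."""
    return (nir - band) / (nir + band)

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of a pre-sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def f1_score(pred, labels):
    tp = sum(1 for p, y in zip(pred, labels) if p and y)
    fp = sum(1 for p, y in zip(pred, labels) if p and not y)
    fn = sum(1 for p, y in zip(pred, labels) if not p and y)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def select_threshold(values, labels, n_candidates=1200):
    """Return (best F1, threshold, rule) for the waterlogged class (label 1)."""
    ordered = sorted(values)
    best = (-1.0, None, None)
    for i in range(n_candidates):
        q = 0.001 + (0.999 - 0.001) * i / (n_candidates - 1)
        t = quantile(ordered, q)
        # Test both decision directions for every candidate threshold.
        for rule in ("below", "above"):
            pred = [(v < t) if rule == "below" else (v > t) for v in values]
            score = f1_score(pred, labels)
            if score > best[0]:
                best = (score, t, rule)
    return best

# Toy class-balanced training sample: waterlogged pixels have lower index values.
values = [0.2 + 0.001 * i for i in range(100)] + [0.6 + 0.001 * i for i in range(100)]
labels = [1] * 100 + [0] * 100
best_f1, best_t, best_rule = select_threshold(values, labels)
```

The selected threshold and rule would then be frozen and applied unchanged to the test set, which keeps the baseline free of test-set tuning.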

3.2.2. Comparison with Deep Learning Segmentation Models

To systematically evaluate the performance of the proposed SAB-DeepLabV3+ model in remote sensing identification of maize waterlogging stress, we selected several representative semantic segmentation models for comparative experiments, including encoder–decoder CNNs, lightweight real-time models, multi-scale context modeling methods, high-resolution or attention-enhanced models, and Transformer-based semantic segmentation models. To ensure a fair comparison, all deep learning models were trained from scratch using the same data splits, input size, data augmentation strategy, optimizer settings, training epochs, loss function, and evaluation metrics, without using extra pre-trained weights. Data augmentation was applied only to the training set, while the validation and test sets were not augmented. Quantitative evaluation results are summarized in Table 5.

As can be seen from Table 5, the IoU differences among different models for the non-waterlogged maize class were relatively small, with most methods exceeding 88%, indicating that this class has generally good separability in single-date multispectral imagery. In contrast, the IoU of the waterlogging-affected maize class fluctuated much more significantly, suggesting that waterlogged regions are more complex in spatial morphology and spectral response, making them the key for distinguishing model performance.

From the perspective of model architecture, traditional encoder–decoder structures (e.g., UNet, DoubleUNet) preserved local details well but had limited ability to model large-scale continuous waterlogged regions holistically. Although the lightweight model BiSeNetV2 achieved high efficiency, it was weak in depicting fine-grained waterlogged regions in complex farmland scenes. In contrast, models with multi-scale context modeling capabilities (e.g., PSPNet, FPN, and DeepLabV3+) performed more stably overall, indicating that multi-scale semantic fusion plays an important role in waterlogging identification. High-resolution preservation or attention-enhanced models (e.g., HRNet, DCSA-UNet) showed certain advantages in spatial detail representation, but their overall improvement on our dataset was limited. Transformer-based models (e.g., SegFormer, TransUNet, and SwinUNet) showed no obvious advantages without pre-trained weights, implying a strong dependence on prior knowledge when dealing with moderate-scale agricultural remote sensing samples.

To further support the architectural interpretation above, Figure 9 presents qualitative segmentation results from representative test regions that emphasize three key aspects of model performance: holistic modeling of large-scale continuous waterlogged areas, multi-scale contextual discrimination, and preservation of high-resolution spatial details. As shown in Figure 9, models with limited global-context modeling tended to produce discontinuous or fragmented predictions in large waterlogged patches, whereas multi-scale context-based models generally provided more coherent segmentation results. Meanwhile, high-resolution or attention-enhanced models showed relatively better delineation of local boundaries and small patches, although their improvements were not always consistent across scenes. In comparison, SAB-DeepLabV3+ achieved a better balance among large-scale continuity, multi-scale discrimination, and boundary detail preservation, thereby reducing omission and misclassification in complex farmland conditions.

3.3. Ablation Experiments

To systematically evaluate the independent contribution and synergistic mechanism of each functional module in the proposed SAB-DeepLabV3+ framework, progressive ablation experiments were conducted on the basis of the DeepLabV3+ baseline model. Specifically, the single-module, dual-module, and complete model configurations were implemented stepwise by sequentially incorporating SSIEM, AMSP, and BEAM. To assess the stability of the ablation results, each configuration was trained three times using random seeds of 11, 22, and 33. The quantitative results are reported as mean ± standard deviation in Table 6.

As shown in Table 6, the baseline DeepLabV3+ achieved a high IoU-NM of 90.96% ± 0.27%, whereas its IoU-WM was only 63.20% ± 0.92%, indicating that waterlogging-affected maize remained the more difficult class. Under single-module settings, SSIEM, AMSP, and BEAM all improved the baseline performance to different degrees. SSIEM produced the largest single-module gain, increasing IoU-WM and mIoU to 65.01% ± 0.14% and 78.15% ± 0.06%, respectively. AMSP also improved IoU-WM to 64.54% ± 0.42%, while BEAM showed a relatively moderate gain, suggesting that its main contribution lies in boundary refinement and local consistency.

When SSIEM and AMSP were jointly incorporated, IoU-WM and mIoU further increased to 66.99% ± 0.44% and 79.40% ± 0.32%, respectively, indicating their complementarity in spectral-spatial enhancement and multi-scale representation. The complete SAB-DeepLabV3+ achieved the best overall performance, with IoU-WM, mIoU, mF1, and OA reaching 68.15% ± 0.22%, 80.23% ± 0.22%, 88.53% ± 0.13%, and 93.40% ± 0.16%, respectively. Compared with the baseline, the complete model improved these four metrics by 4.95, 3.15, 2.17, and 1.22 percentage points on average, demonstrating the stable and complementary effects of the three proposed modules.
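The reporting convention used in Table 6 can be illustrated with a short sketch. The per-seed values below are hypothetical placeholders, since only the aggregated mean ± standard deviation is reported, and the use of the sample standard deviation is an assumption.

```python
import statistics

def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical IoU-WM values (%) for seeds 11, 22 and 33; not reported data.
hypothetical_runs = [1.0, 2.0, 3.0]
mean_value, std_value = summarize(hypothetical_runs)

# Gain in percentage points between the reported means (complete model vs. baseline).
iou_wm_gain = round(68.15 - 63.20, 2)
```

The gain of 4.95 percentage points in IoU-WM is the difference between the two reported means; it is an absolute difference in percentage points, not a relative improvement.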

Figure 10 provides a qualitative comparison of the ablation results in three representative difficult scenarios, including fragmented waterlogging patches, blurred boundary regions, and areas with spectral confusion. For each case, the original image, ground truth, baseline prediction, dual-module prediction with SSIEM and AMSP, and final tri-module prediction are presented. Compared with the baseline model, which tended to produce incomplete or discontinuous predictions in fragmented waterlogged areas, the dual-module configuration improved the spatial continuity and overall integrity of affected regions. These visual observations are consistent with the quantitative results in Table 6 and indicate the complementary effects of SSIEM and AMSP in enhancing spectral-spatial discrimination and multi-scale structural representation. After further introducing BEAM, the complete model produced more stable predictions along transition zones and reduced local misclassification near blurred boundaries, suggesting that BEAM contributes mainly to boundary refinement and local spatial consistency.

Overall, the qualitative results in Figure 10 are consistent with the multi-seed quantitative trends in Table 6. From the baseline model to the dual-module configuration and then to the complete SAB-DeepLabV3+ model, the recognition integrity of waterlogged regions, boundary consistency, and local classification stability were progressively improved. These results provide both quantitative and visual evidence for the effectiveness of the proposed modular design.

3.4. Comparative Experiments with Representative Modules

Based on the structural differences discussed in Section 2.3.6, we further conducted replacement experiments to empirically evaluate whether the proposed task-oriented modules provide advantages over representative existing attention, multi-scale, and boundary-refinement structures. We adopted a univariate replacement strategy to perform comparative experiments on the high-level feature enhancement module, multi-scale context module, and boundary enhancement module separately under consistent backbone network, decoder architecture, training settings, and data partitioning.

3.4.1. Comparison Between SSIEM and Generic Attention Modules

To verify the effectiveness of SSIEM in single-date maize waterlogging identification, we separately introduced SE, ECA, CBAM, CoordAttention, and SSIEM at the high-level feature output of the encoder and compared them with the baseline DeepLabV3+ without an attention module. All experiments retained the original ASPP context module and decoder structure, with only the high-level feature enhancement method replaced to ensure fair comparison.

As shown in Table 7, all generic attention modules improved waterlogging identification to a certain extent, confirming that adaptive enhancement of high-level semantic features is beneficial for this task. Compared with the baseline model, CBAM, SE, ECA, and CoordAttention increased the IoU of the waterlogging-affected class from 62.23% to 64.37%, 64.51%, 63.93%, and 64.67%, respectively. Among these generic attention modules, CoordAttention achieved the highest IoU-WM and mIoU, indicating that incorporating coordinate-aware spatial information can help improve the representation of waterlogging-affected regions. However, SSIEM still achieved the best overall performance, with IoU-WM, mIoU, Recall-WM, and mF1 reaching 65.14%, 78.18%, 79.57%, and 87.15%, respectively. In particular, the higher Recall-WM suggests that SSIEM is more effective in reducing missed detections of weakly responsive waterlogging-affected regions. Overall, compared with generic attention modules, including CBAM, SE, ECA, and CoordAttention, SSIEM is more suitable for modeling subtle spectral-spatial differences between waterlogged and non-waterlogged maize under single-date RGB-NIR conditions.

3.4.2. Comparison Between AMSP and Multi-Scale Context Modules

To verify the effectiveness of the adaptive multi-scale modeling strategy in AMSP, we further compared the original ASPP, a static multi-scale structure without dynamic scale selection (Static-MSP), and the full AMSP. All three modules operated on high-level encoder features, with other network architectures and training configurations unchanged. Static-MSP retained the same multi-scale branch settings as AMSP but adopted fixed concatenation fusion instead of dynamic scale weight learning. This was designed to distinguish whether performance gains originate from the explicit multi-scale structure itself or from the adaptive scale selection mechanism.
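The contrast between fixed concatenation and learned scale weighting can be sketched as below. This is a simplified illustration of the two fusion styles, not the AMSP definition: softmax weighting over branches is one plausible form of adaptive scale selection, the logits are passed in directly although in practice they would be predicted from the input features, and `static_fusion`, `adaptive_fusion`, and `scale_logits` are hypothetical names.

```python
import numpy as np

def static_fusion(branches):
    """Fixed fusion: concatenate all scale branches along the channel axis."""
    return np.concatenate(branches, axis=0)

def adaptive_fusion(branches, scale_logits):
    """Weighted fusion: softmax weights decide each branch's contribution."""
    logits = np.asarray(scale_logits, dtype=float)
    w = np.exp(logits - logits.max())      # numerically stable softmax
    w = w / w.sum()
    stacked = np.stack(branches, axis=0)   # (S, C, H, W)
    return np.tensordot(w, stacked, axes=1), w

# Three hypothetical scale branches with constant values 1, 2 and 3.
branches = [np.full((2, 3, 3), float(s)) for s in (1, 2, 3)]
concat = static_fusion(branches)
fused_equal, w_equal = adaptive_fusion(branches, [0.0, 0.0, 0.0])
fused_large, w_large = adaptive_fusion(branches, [0.0, 0.0, 10.0])
```

With equal logits the weighted fusion reduces to a plain average of the branches, whereas a dominant logit lets one receptive field drive the output, which is the behavior the Static-MSP comparison is designed to isolate.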

As shown in Table 8, Static-MSP already outperformed the original ASPP, raising the IoU of the waterlogging-affected class from 62.23% to 64.17%, indicating that explicit multi-scale branches are conducive to enhancing representation for waterlogged patches of different sizes. On this basis, AMSP further achieved the best performance, with IoU-WM, mIoU, and mF1 reaching 64.96%, 78.08%, and 87.08%, respectively. Compared with Static-MSP, AMSP yielded additional gains in Recall-WM and overall mIoU, demonstrating that its improvement stems not only from the multi-scale structure but also from adaptive selection of scale responses. In general, AMSP can better adapt to variations in the area, shape, and spatial organization of waterlogged patches, thus delivering stronger context modeling capability.

3.4.3. Comparison Between BEAM and Boundary Enhancement Variants

To verify the roles of each component in BEAM, we constructed two simplified variants for comparative analysis while keeping the full BEAM structure unchanged, namely Edge-only and Semantic-no-gate. The former only uses Sobel edge extraction and residual enhancement, whereas the latter further incorporates semantic guidance without the disaster boundary gate. The complete BEAM integrates three components simultaneously: edge extraction, semantic guidance, and disaster boundary gating. All boundary enhancement modules were inserted after the fusion of low-level and high-level features, with other network architectures and training configurations remaining consistent.

As shown in Table 9, all boundary enhancement strategies consistently improved segmentation performance for the waterlogging-affected maize class relative to the no-boundary-module baseline. Specifically, Edge-only and Semantic-no-gate achieved waterlogged-class IoU values of 63.32% and 63.51%, respectively, both exceeding the baseline by a clear margin. These results indicate that edge cues alone can partially sharpen boundary delineation, while the addition of semantic guidance provides complementary contextual information that further refines class transition regions. The full BEAM configuration, which integrates edge enhancement, semantic context, and a boundary gating mechanism, achieved the best overall performance, with a waterlogged-class IoU of 63.99%, an mIoU of 77.74%, and an OA of 92.61%. Notably, the performance gain of the full BEAM over Edge-only and Semantic-no-gate is modest in absolute terms but consistent across all metrics, suggesting that the boundary gating mechanism plays a complementary rather than dominant role. These findings collectively indicate that the performance improvement brought by BEAM arises from the synergistic interaction among edge information, semantic context, and adaptive boundary gating, rather than from any single component in isolation.

Overall, these comparative experiments provide empirical support for the task-oriented design of SSIEM, AMSP, and BEAM. Compared with representative alternative structures, SSIEM achieved stronger enhancement of waterlogging-related high-level features, AMSP improved the adaptability of multi-scale context modeling, and BEAM provided complementary refinement for boundary transition zones. These results indicate that the performance improvement of SAB-DeepLabV3+ is derived from the combined effects of spectral-spatial discrimination enhancement, adaptive scale representation, and boundary-aware feature refinement, rather than from a single module alone.

3.5. Generalization Experiment

To evaluate the model’s transferability to unseen regions, we conducted cross-regional generalization experiments using a leave-one-city-out strategy. In each trial, one city was selected as the independent test area, while the remaining four cities were used for training and validation, with no retraining or fine-tuning during the testing phase. This setup simulates the cross-regional deployment of the model in practical operational scenarios. To further verify that the performance improvement stems from the three proposed modules rather than from data partitioning differences, cross-regional generalization tests were also performed on the baseline DeepLabV3+ under identical data partitioning, training strategy, and testing procedures.
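The leave-one-city-out protocol can be expressed as a simple fold generator, sketched below. The patch identifiers are hypothetical placeholders, and how the four remaining cities are divided into training and validation subsets is not specified here, so the sketch returns them as one pool.

```python
def leave_one_city_out(patches_by_city):
    """Yield (held_out_city, train_val_patches, test_patches) for each city."""
    for held_out in sorted(patches_by_city):
        train_val = [
            patch
            for city in sorted(patches_by_city)
            if city != held_out
            for patch in patches_by_city[city]
        ]
        yield held_out, train_val, list(patches_by_city[held_out])

# Hypothetical patch identifiers, three per city, for illustration only.
patches_by_city = {
    city: [f"{city}_{i:03d}" for i in range(3)]
    for city in ["Daqing", "Hegang", "Jixi", "Mudanjiang", "Qiqihar"]
}
folds = list(leave_one_city_out(patches_by_city))
```

Splitting by city rather than by patch ensures that no scene from the test city contributes to training, which is what makes the reported scores an estimate of performance in an unseen region.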

Table 10 presents the results of SAB-DeepLabV3+ across the five independent cities. Overall, the model performed stably across all cities, with mIoU ranging from 75.96% to 77.15% (mean = 76.56%) and OA ranging from 88.97% to 93.62% (mean = 91.38%). The IoU of the waterlogging-affected maize class ranged from 60.53% to 68.89% (mean = 63.45%), indicating that the model maintains favorable waterlogging identification ability under cross-regional conditions. In contrast, the IoU of the non-waterlogged maize class exceeded 85% in all cities, suggesting that the main challenge of cross-regional generalization lies in the waterlogging-affected class, especially in regions with blurred boundaries, fragmented patches, or mild waterlogging.

Among different cities, Daqing achieved the best performance, with the IoU of the waterlogging-affected class reaching 68.89%. Qiqihar exhibited high Precision (87.43%) but relatively low Recall (68.96%). These results indicate that the model controlled false positives well in unseen regions but still showed certain omissions in boundary transition zones or mildly waterlogged regions. Overall, SAB-DeepLabV3+ maintained stable segmentation performance across all cities, demonstrating strong cross-regional generalization ability and practical deployment potential.
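The link between these Precision and Recall values and the class IoU follows directly from Equation (14): since Precision = TP/(TP + FP) and Recall = TP/(TP + FN), 1/IoU = 1/Precision + 1/Recall − 1. The short check below applies this identity to the Qiqihar values, assuming they refer to the waterlogging-affected class; the function name is hypothetical.

```python
def iou_from_precision_recall(precision, recall):
    """IoU implied by precision and recall of the same class."""
    return precision * recall / (precision + recall - precision * recall)

# Qiqihar: Precision = 87.43%, Recall = 68.96%.
qiqihar_iou = iou_from_precision_recall(0.8743, 0.6896)
```

The result is approximately 0.627, in line with the IoU-WM of 62.74% reported for Qiqihar, and it shows that the IoU in this city is limited mainly by omissions (low Recall) rather than by false positives.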

Furthermore, Table 11 presents the comparison results between DeepLabV3+ and SAB-DeepLabV3+ under the cross-regional setup. As can be seen, the proposed method outperformed the baseline in all five independent cities. On average, SAB-DeepLabV3+ increased mIoU from 74.43% to 76.56% (a gain of 2.13 percentage points) and OA from 90.71% to 91.38% (a gain of 0.67 percentage points) compared to DeepLabV3+. More importantly, the improvement was concentrated on the waterlogging-affected maize class (the more challenging class), with IoU-WM increasing from 59.82% to 63.45% (a gain of 3.63 percentage points), Recall-WM from 67.78% to 73.26% (a gain of 5.48 percentage points), and F1-WM from 74.77% to 77.60% (a gain of 2.83 percentage points). In comparison, the IoU of the non-waterlogged maize class improved by only 0.62 percentage points, indicating that the spectral-spatial enhancement, adaptive multi-scale fusion, and boundary enhancement modules strengthen the model’s discriminability for cross-regional waterlogging patterns rather than merely benefiting easily classified samples.

At the city level, the largest improvement was observed in Qiqihar, where IoU-WM increased from 53.81% to 62.74% (a gain of 8.93 percentage points) and mIoU increased by 5.36 percentage points. Together with the regional test results, this indicates that the baseline model tended to be conservative in identifying waterlogged regions in this city, achieving high Precision but low Recall. In contrast, SAB-DeepLabV3+ not only maintained high Precision but also substantially reduced the under-segmentation of waterlogging-affected regions, showing stronger cross-regional adaptability. In the other cities, namely Daqing, Hegang, Mudanjiang, and Jixi, the proposed method also achieved consistent gains, indicating that the model was effective not only in individual regions but also robust under diverse geographical environments and disaster backgrounds.

In summary, SAB-DeepLabV3+ not only achieved superior results in within-region testing but also consistently outperformed the baseline in rigorous cross-city generalization experiments, especially showing stronger transferability in waterlogging-affected maize class identification. This demonstrates that the three proposed modules can effectively alleviate spectral aliasing, scale heterogeneity, and boundary blurring under single-date multispectral conditions, supporting cross-regional deployment of the model in practical agricultural disaster monitoring.

4. Discussion

4.1. Feasibility of Single-Date RGB-NIR Imagery and Relation to Previous Studies

A central question of this study is whether single-date high-resolution RGB-NIR imagery can support maize waterlogging identification when time-series observations are unavailable. The spectral separability analysis showed that the four-band combination achieved a higher JM distance than individual bands and vegetation indices, indicating that single-date multispectral imagery contains useful discriminatory information. However, the separability was still incomplete, confirming that waterlogged and non-waterlogged maize remain partially overlapped in the original spectral space. This explains why threshold-based indices and shallow machine-learning methods performed substantially worse than deep segmentation models.

Recent crop flood-damage studies have often relied on multi-temporal or multi-source imagery to characterize inundation, crop damage, and post-event recovery [38,39]. In contrast, the present study addresses a more constrained but operationally relevant setting: pixel-level waterlogging segmentation from a single post-event scene. This setting is closer to rapid emergency assessment when continuous observations are not immediately available. Similar challenges have also been reported in other single-scene agricultural classification tasks, where spectral overlap, background interference, and spatial heterogeneity reduce the reliability of simple spectral rules.

4.2. Performance Interpretation and Regional Transferability

The improvement of SAB-DeepLabV3+ mainly resulted from targeted modeling of three difficulties in single-date maize waterlogging mapping: spectral ambiguity, scale heterogeneity, and blurred transition zones. Compared with DeepLabV3+, the proposed model achieved higher IoU and mIoU for the waterlogging-affected class, showing that the designed modules contributed primarily to the more difficult class rather than to already well-separated non-waterlogged maize.

The leave-one-city-out experiments further showed that SAB-DeepLabV3+ maintained stable performance across five cities in Heilongjiang Province, although cross-regional accuracy remained lower than within-region accuracy. The remaining errors were concentrated in mildly affected fields, fragmented patches, and boundary transition regions, suggesting that regional differences in rainfall processes, drainage conditions, management practices, and waterlogging severity can still cause domain shifts. Therefore, the results support regional transfer potential within the tested setting, rather than unrestricted generalization beyond the study area.

4.3. Insurance-Oriented Application Scope and Dependence on Maize Masks

This study intentionally performs waterlogging segmentation within pre-extracted maize planting masks. This design reflects an agricultural insurance scenario rather than a full-scene land-cover mapping task. In crop insurance practice, the insured crop type and parcel boundaries are usually registered at the time of policy enrollment. Accordingly, the operational question after a disaster is often not “where is maize located?”, but “which insured maize parcels are affected by waterlogging?”. Under this application setting, using prior maize masks is consistent with the available business information and helps focus the model on disaster discrimination within the target crop.

Nevertheless, the accuracy of the maize masks can influence the final waterlogging results. Omission errors in the masks may exclude actual insured maize areas and lead to missed detections, whereas commission errors may introduce non-maize pixels, bare soil, or other spectrally similar surfaces and increase false positives. Boundary misalignment may also affect the delineation of transition zones. Therefore, the reported performance should be interpreted as crop-mask-dependent binary segmentation of waterlogged and non-waterlogged maize, rather than end-to-end full-scene flood-damage mapping. Future operational use should either ensure reliable parcel masks or quantify the sensitivity of the model to mask uncertainty.

4.4. Sensor Context and Comparison with High-Resolution Data Sources

Compared with other high-resolution optical data sources, the GW-A59-C multispectral imagery used in this study provides a four-band RGB-NIR configuration with 3 m spatial resolution and a swath width of 30 km, which is suitable for regional agricultural monitoring and rapid post-disaster mapping. Compared with PlanetScope/SuperDove imagery, GW-A59-C has a comparable meter-level spatial resolution but fewer spectral bands, as SuperDove provides additional bands such as Red Edge, Green I, Yellow, and Coastal Blue, which may offer stronger sensitivity to vegetation stress and chlorophyll-related changes. Compared with Maxar WorldView imagery, GW-A59-C has coarser spatial resolution and fewer spectral channels, whereas WorldView data can provide very-high-resolution panchromatic, multispectral, and SWIR observations that are useful for detailed object-level mapping.

However, the objective of this study was not to demonstrate the superiority of GW-A59-C imagery over richer commercial data sources. Instead, this study aimed to evaluate whether single-date four-band RGB-NIR imagery can support rapid maize waterlogging segmentation within pre-extracted maize planting masks. Future work should further compare the proposed framework across different high-resolution sensors, including GW-A59-C, PlanetScope/SuperDove, Maxar WorldView, GaoFen, and SAR-based data sources.

4.5. Limitations and Future Validation

Overall, the results demonstrate the feasibility of using single-date RGB-NIR imagery and the proposed SAB-DeepLabV3+ model for maize waterlogging segmentation within pre-extracted maize planting masks. However, the findings should be interpreted within the specific experimental setting of this study, including the dependence on maize masks, the use of single-date GW-A59-C imagery, and validation within a limited regional context. These factors indicate that further evaluation is still needed before the method can be generalized to broader operational scenarios.

In addition, the proposed method cannot directly determine the level of crop damage or the percentage of potential yield loss from a single post-event image. Its output should be interpreted as pixel-level waterlogging occurrence or affected-area mapping, rather than as a direct indicator of irreversible crop damage. Based on the classified waterlogged pixels, the method can further derive the proportion of waterlogging-affected area within each registered maize parcel, which may support rapid screening, field verification prioritization, drainage scheduling, and preliminary agricultural insurance inspection. However, actual yield loss is influenced by waterlogging duration, crop growth stage, inundation depth, soil drainage conditions, management practices, and post-event recovery. Therefore, a parcel classified as waterlogged in a single scene may partially or fully recover later, and false-positive waterlogging detections could lead to overestimation of actual damage if used directly for compensation decisions. The proposed method should therefore be used as an auxiliary evidence layer for rapid post-disaster assessment, rather than as the sole basis for agricultural insurance payment or final yield-loss estimation.
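The parcel-level proportion mentioned above can be derived with a simple zonal count. The sketch below assumes a rasterized parcel-ID layer aligned with the prediction (0 meaning "no registered parcel") and the 3 m pixel size from Table 1. The function name, the toy grids, and the 0.5 screening threshold are hypothetical and are not taken from the study.

```python
from collections import defaultdict

PIXEL_AREA_M2 = 3.0 * 3.0  # 3 m spatial resolution (Table 1)

def parcel_affected_stats(parcel_ids, waterlogged, background_id=0):
    """Return {parcel_id: (affected_fraction, affected_area_m2)}."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for id_row, w_row in zip(parcel_ids, waterlogged):
        for pid, w in zip(id_row, w_row):
            if pid == background_id:
                continue  # pixel is outside every registered parcel
            total[pid] += 1
            hit[pid] += 1 if w else 0
    return {pid: (hit[pid] / total[pid], hit[pid] * PIXEL_AREA_M2)
            for pid in total}

# Toy inputs: two parcels (IDs 1 and 2) and a binary waterlogging prediction.
parcel_ids = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 2, 2],
]
waterlogged = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
]

stats = parcel_affected_stats(parcel_ids, waterlogged)

# Hypothetical screening rule: send parcels above 50% affected area
# to field verification first.
SCREENING_THRESHOLD = 0.5
flagged = sorted(pid for pid, (frac, _) in stats.items()
                 if frac > SCREENING_THRESHOLD)
print(stats, flagged)
```

Such a table ranks parcels for inspection only. Consistent with the limitations above, it says nothing about damage severity or yield loss.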

Future work should therefore focus on: (1) quantifying the sensitivity of waterlogging mapping to maize mask errors; (2) developing integrated frameworks that jointly perform maize planting-region extraction and waterlogging identification; (3) expanding validation across years, regions, crop types, and sensor systems; (4) incorporating complementary data sources, such as SAR, SWIR, thermal infrared, terrain information, meteorological records, and crop growth information, to improve robustness under complex disaster and imaging conditions; and (5) linking waterlogging maps with field survey data and yield-loss models to assess damage severity and potential yield reduction more reliably.

5. Conclusions

To support rapid maize waterlogging mapping from single-date, high-resolution multispectral imagery, this study proposed SAB-DeepLabV3+, a semantic segmentation model based on DeepLabV3+ with collaborative spectral-spatial enhancement. Systematic experiments were then conducted on a dataset constructed from 62 scenes covering five major maize-producing cities in Heilongjiang Province, China, from 2022 to 2024. The main conclusions are as follows:

(1) Single-date RGB-NIR imagery provides a discriminative basis for maize waterlogging identification. Spectral separability analysis showed that the JM distance of the four-band RGB-NIR feature was 1.3291, higher than those of the individual bands and of the tested vegetation indices (NDVI and GNDVI). The traditional single-date baseline experiments further confirmed that single-date multispectral imagery contains usable waterlogging information, but that simple thresholding and shallow machine learning methods cannot meet high-precision identification requirements.
(2) SAB-DeepLabV3+ effectively improves the identification accuracy of the waterlogging-affected maize class. Compared with the baseline DeepLabV3+, the proposed model increased the IoU of the waterlogging-affected maize class from 62.23% to 68.30%, while the overall mIoU, mF1, and OA increased from 76.74% to 80.37%, from 86.07% to 88.62%, and from 92.35% to 93.49%, respectively.
(3) The SSIEM, AMSP, and BEAM modules address spectral aliasing, scale heterogeneity, and boundary blurring, respectively, and yield complementary gains. The ablation experiments and module comparisons showed that each of the three modules outperformed its corresponding alternative structures, and that their combination achieved the best performance.
(4) The proposed method exhibits favorable cross-regional transfer potential. In the leave-one-city-out experiments, SAB-DeepLabV3+ achieved average mIoU, IoU-WM, and OA of 76.56%, 63.45%, and 91.38%, respectively, indicating stable identification performance in unseen regions. Given pre-determined maize planting regions, the method can support rapid post-disaster waterlogging mapping, field verification, drainage scheduling, and agricultural insurance surveys.
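For readers unfamiliar with the separability measure cited in conclusion (1), the Jeffries-Matusita (JM) distance is commonly computed from the Bhattacharyya distance between two Gaussian class models and ranges from 0 (inseparable) to 2 (fully separable). The sketch below is a simplified illustration that treats the bands as independent (diagonal covariance); the full multivariate form uses the class covariance matrices instead. The class statistics are hypothetical and do not reproduce the 1.3291 value reported in this study.

```python
import math

def jm_distance_diag(mean_a, var_a, mean_b, var_b):
    """JM distance between two Gaussian classes with diagonal covariance.

    Bhattacharyya distance per band:
        (1/8) * (mu_a - mu_b)^2 / v  +  (1/2) * ln(v / sqrt(var_a * var_b)),
    with v = (var_a + var_b) / 2, summed over bands.
    JM = 2 * (1 - exp(-B)).
    """
    b = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = 0.5 * (va + vb)
        b += 0.125 * (ma - mb) ** 2 / v
        b += 0.5 * math.log(v / math.sqrt(va * vb))
    return 2.0 * (1.0 - math.exp(-b))

# Hypothetical per-band reflectance statistics (Blue, Green, Red, NIR).
normal_mean = [0.04, 0.07, 0.05, 0.40]
normal_var = [0.0001, 0.0002, 0.0002, 0.0040]
water_mean = [0.05, 0.08, 0.08, 0.25]
water_var = [0.0002, 0.0003, 0.0004, 0.0060]

jm_four_band = jm_distance_diag(normal_mean, normal_var, water_mean, water_var)
jm_nir_only = jm_distance_diag(normal_mean[3:], normal_var[3:],
                               water_mean[3:], water_var[3:])
print(round(jm_four_band, 4), round(jm_nir_only, 4))
```

Because each band contributes a non-negative term to the Bhattacharyya distance under this model, the four-band value can never be lower than that of any single band, which is consistent with the ordering reported in conclusion (1).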

Overall, this study demonstrates that, under the crop-mask-dependent and regional GW-A59-C imagery setting, task-driven modeling of spectral aliasing, scale heterogeneity, and boundary blurring can improve the pixel-level identification of waterlogged maize fields from single-date RGB-NIR imagery. From a scientific perspective, the results provide evidence that single-scene multispectral imagery, when combined with targeted spectral-spatial, multi-scale, and boundary-enhancement designs, can support rapid agricultural disaster assessment when time-series observations are unavailable. From an application perspective, the proposed framework has potential value for post-disaster field verification, drainage-priority assessment, agricultural insurance inspection, and loss evaluation, thereby supporting farmers, insurers, and agricultural management agencies in making more timely decisions. Future studies should further validate the method across broader regions, years, crop systems, and sensor types before large-scale operational application.

Author Contributions

J.A.: writing – original draft; Q.W.: software, writing – review and editing; C.W.: methodology; X.S.: data curation; Q.T.: validation; J.Y.: visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2024YFD1601304, and in part by the National Natural Science Foundation of China (Grant No. 62472012).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors. Due to the interests of partners, the dataset used in this study is currently not publicly available. Consideration will be given to making the dataset open after the completion of the collaborative project.

Acknowledgments

We are grateful to our colleagues at Hebei Key Laboratory of Agricultural Big Data and National Sub-center for Digital Agriculture Innovation (Beijing-Tianjin-Hebei Region) for their help and input, without which this study would not have been possible.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Acharya, B.; Dodla, S.; Tubana, B.; Gentimis, T.; Rontani, F.; Adhikari, R.; Duron, D.; Bortolon, G.; Setiyono, T. Characterizing Optimum N Rate in Waterlogged Maize (Zea mays L.) with Unmanned Aerial Vehicle (UAV) Remote Sensing. Agronomy 2025, 15, 434.
2. Li, X.; Feng, Y.; Sun, X.; Liu, W.; Yang, W.; Ge, X.; Jia, Y. Effects of Various Levels of Water Stress on Morpho-Physiological Traits and Spectral Reflectance of Maize at Seedling Growth Stage. Agronomy 2024, 14, 2173.
3. Min, C.-W.; Yoon, I.-K.; Kim, M.-J.; Jung, J.-S.; Rahman, M.A.; Lee, B.-H. Effects of Waterlogging at Different Developmental Stages on Growth, Yield and Physiological Responses of Forage Maize. Agronomy 2025, 15, 2389.
4. Zhi, F.; Zhang, J.; Bao, Y.; Bao, Y.; Dong, Z.; Tong, Z.; Liu, X. Assessment of waterlogging hazard during maize growth stage in the Songliao plain based on daily scale SPEI and SMAI. Agric. Water Manag. 2024, 304, 109081.
5. Chen, H.; Liang, Q.; Liang, Z.; Liu, Y.; Xie, S. Remote-sensing disturbance detection index to identify spatio-temporal varying flood impact on crop production. Agric. For. Meteorol. 2019, 269–270, 180–191.
6. Teixeira, A.C.; Bakon, M.; Lopes, D.; Cunha, A.; Sousa, J.J. A systematic review on soil moisture estimation using remote sensing data for agricultural applications. Sci. Remote Sens. 2025, 12, 100328.
7. Badarneh, O.; Hazaymeh, K.; Almagbile, A.; Shogoor, S.A. Remote sensing-based agricultural drought mapping in Northern Jordan using Landsat and MODIS data. Environ. Adv. 2024, 18, 100602.
8. Thapa, A.; Horanont, T.; Neupane, B. Parcel-Level Flood and Drought Detection for Insurance Using Sentinel-2A, Sentinel-1 SAR GRD and Mobile Images. Remote Sens. 2022, 14, 6095.
9. Andrade, J.; Cunha, J.; Silva, J.; Rufino, I.; Galvão, C. Evaluating single and multi-date Landsat classifications of land-cover in a seasonally dry tropical forest. Remote Sens. Appl. Soc. Environ. 2021, 22, 100515.
10. Karmakar, P.; Teng, S.W.; Murshed, M.; Pang, S.; Li, Y.; Lin, H. Crop monitoring by multimodal remote sensing: A review. Remote Sens. Appl. Soc. Environ. 2024, 33, 101093.
11. Mazzia, V.; Khaliq, A.; Chiaberge, M. Improvement in Land Cover and Crop Classification based on Temporal Features Learning from Sentinel-2 Data Using Recurrent-Convolutional Neural Network (R-CNN). Appl. Sci. 2020, 10, 238.
12. Galieni, A.; D’Ascenzo, N.; Stagnari, F.; Pagnani, G.; Xie, Q.; Pisante, M. Past and Future of Plant Stress Detection: An Overview from Remote Sensing to Positron Emission Tomography. Front. Plant Sci. 2021, 11, 609155.
13. Cho, S.B.; Soleh, H.M.; Choi, J.W.; Hwang, W.-H.; Lee, H.; Cho, Y.-S.; Cho, B.-K.; Kim, M.S.; Baek, I.; Kim, G. Recent Methods for Evaluating Crop Water Stress Using AI Techniques: A Review. Sensors 2024, 24, 6313.
14. Omia, E.; Bae, H.; Park, E.; Kim, M.S.; Baek, I.; Kabenge, I.; Cho, B. Remote Sensing in Field Crop Monitoring: A Comprehensive Review of Sensor Systems, Data Analyses and Recent Advances. Remote Sens. 2023, 15, 354.
15. Ding, Y.; Zheng, X.; Zhao, K.; Xin, X.; Liu, H. Quantifying the Impact of NDVIsoil Determination Methods and NDVIsoil Variability on the Estimation of Fractional Vegetation Cover in Northeast China. Remote Sens. 2016, 8, 29.
16. Ren, J.; Shao, Y.; Wan, H.; Xie, Y.; Campos, A. A two-step mapping of irrigated corn with multi-temporal MODIS and Landsat analysis ready data. ISPRS J. Photogramm. Remote Sens. 2021, 176, 69–82.
17. Zhang, J.; Pan, B.; Shi, W.; Zhang, Y. Monitoring Waterlogging Damage of Winter Wheat Based on HYDRUS-1D and WOFOST Coupled Model and Assimilated Soil Moisture Data of Remote Sensing. Remote Sens. 2023, 15, 4133.
18. Zheng, G.; Jiang, Z.; Zhang, X.; Jiang, D. Multi-scale feature fusion-based semantic segmentation network for agricultural remote sensing images. Chem. Biol. Technol. Agric. 2025, 12, 126.
19. Nazir, A.; Ahmad, A.; Ramzan, M.; Gilani, H.; Mobeen, M.; Tarer, S.; Hanan, N.P. Flood-Induced Agricultural Damage Assessment: A Case Study of Pakistan. Water 2025, 17, 3060.
20. Wei, P.; Ye, H.; Nie, C.; Qin, M.; Zhang, Y.; Wang, H.; Huang, S.; Liu, R. Evaluating the impact of crop waterlogging and flood disasters using multi-source data: A case study of the Sanjiang Plain. Clim. Serv. 2025, 39, 100596.
21. Sun, J.; Zhou, J.; He, Y.; Jia, H.; Liang, Z. RL-DeepLabv3+: A lightweight rice lodging semantic segmentation model for unmanned rice harvester. Comput. Electron. Agric. 2023, 209, 107823.
22. Wang, Y.; Gao, X.; Sun, Y.; Liu, Y.; Wang, L.; Liu, M. Semantic segmentation-based conservation tillage corn straw return cover type recognition. Comput. Electron. Agric. 2025, 229, 109792.
23. Lai, P.; Lv, C.; Zhou, L.; Yang, S.; Xu, J.; Dong, Q.; He, M. Improved lightweight DeepLabV3+ for bare rock extraction from high-resolution UAV imagery. Ecol. Inform. 2025, 89, 103204.
24. Zhang, X.; Zhang, S.; Meng, X.; Zhang, G.; Zang, D.; Han, Y.; Ai, H.; Liu, H. Remote sensing image segmentation of gully erosion in a typical black soil area in Northeast China based on improved DeepLabV3+ model. Ecol. Inform. 2024, 84, 102929.
25. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
26. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597.
27. Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2018.
28. Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. DoubleU-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. In Proceedings of the 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS); IEEE: New York, NY, USA, 2020.
29. Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation. Int. J. Comput. Vis. 2020, 129, 3051–3068.
30. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 6230–6239.
31. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: New York, NY, USA, 2017; pp. 936–944.
32. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611.
33. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep High-Resolution Representation Learning for Visual Recognition. In Proceedings of the IEEE Transactions on Pattern Analysis and Machine Intelligence; IEEE: New York, NY, USA, 2021.
34. Xu, Q.; Ma, Z.; He, N.; Duan, W. DCSAU-Net: A deeper and more compact split-attention U-Net for medical image segmentation. Comput. Biol. Med. 2023, 154, 106626.
35. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. arXiv 2021, arXiv:2105.15203.
36. Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
37. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. In Proceedings of the Computer Vision – ECCV 2022 Workshops; Springer: Cham, Switzerland, 2021.
38. Wen, C.; Sun, Z.; Li, H.; Han, Y.; Gunasekera, D.; Chen, Y.; Zhang, H.; Zhao, X. Flood Mapping and Assessment of Crop Damage Based on Multi-Source Remote Sensing: A Case Study of the “7.27” Rainstorm in Hebei Province, China. Remote Sens. 2025, 17, 904.
39. Lateef, L.O.; Costa, H.; Cabral, P. Improved integrated framework for flooded crop damage and recovery assessment: A multi-source earth observation and participatory mapping in Hadejia, Nigeria. J. Environ. Manag. 2025, 384, 125542.

Figure 1. Location and environmental background of the study area. (a) Location of Heilongjiang Province and the five selected prefecture-level cities in Northeast China. (b) Spatial distribution of monthly accumulated precipitation (mm) in August 2023 in Heilongjiang Province, derived from the China Daily Gridded Precipitation Dataset V2 (CHM_PRE V2; 1960–2024; 0.1°) provided by the National Tibetan Plateau Data Center; the monthly total was obtained by summing the daily precipitation grids. (c–g) Land-use distribution of Daqing, Qiqihar, Hegang, Mudanjiang, and Jixi, respectively.

Figure 2. Workflow of dataset construction and labeling.

Figure 3. Overlay of multispectral imagery and pixel-wise waterlogging labels.

Figure 4. Architecture of the DeepLabV3+ network.

Figure 5. Overall architecture of the proposed SAB-DeepLabV3+ framework.

Figure 6. Network structure of SSIEM.

Figure 7. Network structure of AMSP.

Figure 8. Network structure of BEAM.

Figure 9. Qualitative segmentation results in representative test regions: (a) large continuous waterlogged areas, (b) multi-scale waterlogged patches, and (c) fragmented patches with complex boundaries. Yellow boxes highlight typical differences among methods.

Figure 10. Visual comparison of ablation results in representative difficult waterlogging-affected maize regions: (a) fragmented waterlogging patches, (b) blurred boundary regions, and (c) areas with spectral confusion. Yellow boxes highlight typical error-prone regions.

Table 1. Main characteristics of the GW-A59-C multispectral satellite product used in this study.

Parameter	Description
Satellite product	GW-A59-C multispectral satellite product
Constellation	China SatNet GW/Guowang constellation
Launch date	19 May 2023
Orbit type	Sun-synchronous orbit
Orbital altitude	508 km
Orbital inclination	55°
Off-nadir viewing angle	±20°
Spatial resolution	3 m
Swath width	30 km
Temporal resolution	1 day; up to two acquisitions per day under programmed observation
Positioning accuracy	Better than 3 m
Spectral bands	Blue, Green, Red, and NIR
Blue band	460–520 nm
Green band	540–595 nm
Red band	635–685 nm
NIR band	805–895 nm
Projection information	RPC/WGS84 UTM/WGS84
Scenes used in this study	62 scenes
Input patch size	512 × 512 pixels
Ground extent of each patch	Approximately 1.536 km × 1.536 km
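
The patch extent in the last row follows directly from the pixel size and patch dimensions. As a quick arithmetic check (the variable names are illustrative, and the swath comparison is only an idealized upper bound that ignores scene edges and overlap):

```python
# Values taken from Table 1.
SPATIAL_RESOLUTION_M = 3.0
PATCH_SIZE_PX = 512
SWATH_WIDTH_KM = 30.0

# Side length and footprint of one 512 x 512 patch on the ground.
patch_side_km = PATCH_SIZE_PX * SPATIAL_RESOLUTION_M / 1000.0
patch_area_km2 = patch_side_km ** 2

# Idealized number of non-overlapping patches that fit across one swath.
patches_across_swath = SWATH_WIDTH_KM / patch_side_km

print(patch_side_km, patch_area_km2, patches_across_swath)
```

Each patch therefore covers about 2.36 km², and roughly 19.5 patch widths span the 30 km swath.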

Table 2. Encoder–decoder assignment, pipeline connection, functional role, advantage, and limitation of each module in the proposed multi-module architecture.

Module	Encoder/Decoder Assignment	Pipeline Connection	Main Function	Main Advantage	Potential Limitation
SSIEM	Encoder-side enhancement	Applied after high-level encoder feature extraction and before AMSP	Enhances spectral-spatial responses related to maize waterlogging	Strengthens waterlogging-sensitive features and suppresses background interference	Depends on the discriminative quality of RGB-NIR spectral responses
AMSP	Encoder–decoder bridge/bottleneck	Replaces the original ASPP module after SSIEM	Performs adaptive multi-scale contextual modeling	Improves representation of both large continuous waterlogged areas and fragmented patches	Increases computational cost due to multiple atrous branches
BEAM	Decoder-side refinement	Applied after fusion of high-level semantic features and low-level detail features	Refines boundary transition zones and local spatial consistency	Improves boundary delineation and reduces local misclassification	May be limited when boundary cues are weak or affected by noise

Table 3. Architectural configuration and computational cost of the proposed modules.

Module	Placement	Input Dimension	Output Dimension	Dilation Rates	Branch Settings	Params	FLOPs	Inference Time
SSIEM	After high-level encoder features	320 × 64 × 64	320 × 64 × 64	—	Channel-spatial enhancement pathway	0.139 M	0.525 G	0.710 ms
AMSP	Replacing the original ASPP module	320 × 64 × 64	256 × 64 × 64	6, 12, 18	One 1 × 1 branch and three 3 × 3 atrous branches	2.308 M	9.40 G	1.121 ms
BEAM	Decoder boundary-refinement stage	x: 256 × 64 × 64; semantic: 256 × 32 × 32	256 × 64 × 64	—	Semantic-guided boundary refinement	0.678 M	2.55 G	0.469 ms

Note: FLOPs are reported according to the profiler’s multiply-add convention.
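
As a rough plausibility check on the AMSP row, the sketch below counts only the convolution weights of the four listed branches and the footprint of each atrous kernel. It assumes plain (non-separable), bias-free convolutions mapping 320 to 256 channels, and a feature stride of 8 inferred from the 512-pixel patch and the 64 × 64 feature map. These are assumptions, not details confirmed by the table, and the adaptive weighting and normalization layers are not modelled.

```python
def conv_weights(c_in, c_out, kernel):
    """Weight count of a standard 2D convolution without bias."""
    return c_in * c_out * kernel * kernel

def effective_kernel(kernel, dilation):
    """Span (in feature cells) covered by a dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)

C_IN, C_OUT = 320, 256      # AMSP input/output channels (Table 3)
DILATIONS = (6, 12, 18)     # dilation rates (Table 3)

# One 1 x 1 branch plus three 3 x 3 atrous branches.
branch_weights = [conv_weights(C_IN, C_OUT, 1)]
branch_weights += [conv_weights(C_IN, C_OUT, 3) for _ in DILATIONS]
total_weights = sum(branch_weights)

# Footprint of each atrous kernel, in feature cells and in metres.
# Assumed stride: 512-pixel patch / 64-cell feature map = 8 pixels per cell.
FEATURE_CELL_M = (512 // 64) * 3.0
span_cells = {d: effective_kernel(3, d) for d in DILATIONS}
span_m = {d: span_cells[d] * FEATURE_CELL_M for d in DILATIONS}

print(total_weights, span_cells, span_m)
```

Under these assumptions the four branches hold 2,293,760 weights (about 2.29 M), close to the 2.308 M reported, and the three atrous kernels span 13, 25, and 37 feature cells, or roughly 312 m, 600 m, and 888 m on the ground.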

Table 4. Comparison results of traditional single-date methods for maize waterlogging identification.

Method	IoU-WM/%	Recall-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
NDVI	16.94	53.85	31.81	52.93	46.31	51.92
GNDVI	18.51	57.44	33.55	55.34	48.32	53.96
Random Forest	24.96	34.38	52.45	57.88	64.40	81.18

Note: IoU-WM denotes the IoU of the waterlogging-affected maize class, and Recall-WM denotes the recall of the waterlogging-affected maize class.
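
For context, the two index baselines in Table 4 amount to computing a normalized-difference index per pixel and applying a threshold. The sketch below illustrates this under stated assumptions: waterlogged maize is taken to have a lower index value than healthy maize, and the threshold of 0.4 and the toy reflectances are hypothetical. The thresholds actually used in the study are not given in this section.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b), returning 0.0 where the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0

def ndvi(nir, red):
    return normalized_difference(nir, red)

def gndvi(nir, green):
    return normalized_difference(nir, green)

def index_grid(index_fn, nir_band, other_band):
    """Apply a two-band index pixel by pixel to 2D reflectance grids."""
    return [[index_fn(n, o) for n, o in zip(n_row, o_row)]
            for n_row, o_row in zip(nir_band, other_band)]

def threshold_waterlogged(grid, threshold):
    """Label a pixel as waterlogged (1) when its index is below the threshold."""
    return [[1 if v < threshold else 0 for v in row] for row in grid]

# Toy 2 x 2 reflectance grids; the right column mimics stressed canopy.
nir = [[0.50, 0.20], [0.45, 0.18]]
red = [[0.10, 0.15], [0.09, 0.14]]
green = [[0.12, 0.16], [0.11, 0.15]]

HYPOTHETICAL_THRESHOLD = 0.4
ndvi_labels = threshold_waterlogged(index_grid(ndvi, nir, red),
                                    HYPOTHETICAL_THRESHOLD)
gndvi_labels = threshold_waterlogged(index_grid(gndvi, nir, green),
                                     HYPOTHETICAL_THRESHOLD)
print(ndvi_labels, gndvi_labels)
```

A single global threshold of this kind cannot adapt to local spectral mixing, which is consistent with the low IoU-WM values of the index baselines in Table 4.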

Table 5. Quantitative comparison of different semantic segmentation models on the maize waterlogging dataset.

Model Category	Model	IoU-NM/%	IoU-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
Encoder–Decoder (CNN)	SegNet [25]	87.25	45.78	66.52	74.84	78.00	88.49
	UNet [26]	90.77	62.14	76.46	84.32	85.91	91.98
	UNet++ [27]	89.79	56.19	72.99	80.33	83.28	90.97
	DoubleUNet [28]	90.25	58.73	74.49	81.91	84.44	91.44
Lightweight/Real-time	BiSeNetV2 [29]	88.22	48.60	68.41	76.04	79.58	89.40
Multi-scale Context	PSPNet [30]	91.11	63.65	77.38	85.18	86.57	92.31
	FPN [31]	90.94	63.63	77.29	85.55	86.51	92.18
	DeepLabV3+ [32]	91.24	62.23	76.74	84.05	86.07	92.35
High-resolution/Attention	HRNet [33]	90.00	57.88	73.94	81.54	84.03	91.21
	DCSAU-Net [34]	90.52	58.71	74.61	81.39	84.50	91.64
Transformer-Based	SegFormer [35]	88.23	50.27	69.25	77.39	80.33	89.49
	TransUNet [36]	89.91	55.65	72.78	79.65	83.10	91.04
	SwinUNet [37]	90.93	62.06	76.50	83.87	85.92	92.11
Proposed	SAB-DeepLabV3+	92.43	68.30	80.37	87.87	88.62	93.49
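
The columns of Table 5 can all be derived from a two-class confusion matrix. The sketch below uses the standard definitions, which are assumed here to match those of Section 2.5: mIoU, mPA, and mF1 are the unweighted means of the per-class IoU, recall, and F1-score. The pixel counts are hypothetical.

```python
def binary_segmentation_metrics(tp, fp, fn, tn):
    """Metrics for a two-class map; the positive class is waterlogged maize (WM)."""
    iou_wm = tp / (tp + fp + fn)
    iou_nm = tn / (tn + fp + fn)          # the two classes swap roles
    recall_wm = tp / (tp + fn)
    recall_nm = tn / (tn + fp)
    f1_wm = 2 * tp / (2 * tp + fp + fn)
    f1_nm = 2 * tn / (2 * tn + fp + fn)
    return {
        "IoU-NM": iou_nm,
        "IoU-WM": iou_wm,
        "mIoU": (iou_nm + iou_wm) / 2,
        "mPA": (recall_nm + recall_wm) / 2,
        "mF1": (f1_nm + f1_wm) / 2,
        "OA": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical pixel counts for one evaluation run.
metrics = binary_segmentation_metrics(tp=60, fp=15, fn=25, tn=900)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Because non-waterlogged maize dominates the pixel count, OA stays high even when IoU-WM is modest, which is why the waterlogged-class IoU is the more informative column. The same relationship appears in the table: the proposed model's mIoU of 80.37% is the mean of its two class IoUs, 92.43% and 68.30%, up to rounding.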

Table 6. Quantitative results of ablation experiments under three random seeds.

Exp	SSIEM	AMSP	BEAM	IoU-NM/%	IoU-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
0	×	×	×	90.96 ± 0.27	63.20 ± 0.92	77.08 ± 0.32	85.29 ± 1.27	86.36 ± 0.27	92.18 ± 0.17
1	√	×	×	91.30 ± 0.08	65.01 ± 0.14	78.15 ± 0.06	87.04 ± 0.31	87.12 ± 0.05	92.51 ± 0.06
2	×	√	×	91.05 ± 0.21	64.54 ± 0.42	77.79 ± 0.25	86.59 ± 0.83	86.88 ± 0.18	92.31 ± 0.16
3	×	×	√	91.13 ± 0.36	63.94 ± 0.08	77.53 ± 0.19	85.69 ± 0.55	86.68 ± 0.11	92.34 ± 0.27
4	√	√	×	91.80 ± 0.21	66.99 ± 0.44	79.40 ± 0.32	87.54 ± 0.39	87.98 ± 0.21	92.97 ± 0.17
5	√	√	√	92.31 ± 0.21	68.15 ± 0.22	80.23 ± 0.22	88.04 ± 0.38	88.53 ± 0.13	93.40 ± 0.16

Note: Values are reported as mean ± standard deviation over three random seeds, namely 11, 22, and 33. √ indicates the use of the module, and × indicates the non-use of the module.

Table 7. Comparison of SSIEM with representative attention modules.

Attention Module	IoU-NM/%	IoU-WM/%	Recall-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
None	91.24	62.23	71.21	76.74	84.05	86.07	92.35
CBAM	90.95	64.37	77.18	77.66	86.37	86.79	92.22
SE	91.12	64.51	76.32	77.82	86.12	86.89	92.36
ECA	91.08	63.93	74.98	77.50	85.56	86.66	92.30
CoordAttention	91.86	64.67	77.18	77.86	86.44	86.93	92.32
SSIEM	91.22	65.14	79.57	78.18	87.40	87.15	92.46

Note: Recall-WM denotes the recall of the waterlogging-affected maize class.

Table 8. Comparison of AMSP with different multi-scale context modules.

Context Module	IoU-NM/%	IoU-WM/%	Recall-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
ASPP	91.24	62.23	71.21	76.74	84.05	86.07	92.35
Static-MSP	90.97	64.17	76.42	77.57	86.08	86.72	92.23
AMSP	91.21	64.96	79.10	78.08	87.21	87.08	92.44

Table 9. Comparison between BEAM and different boundary enhancement variants.

Boundary Strategy	IoU-NM/%	IoU-WM/%	Recall-WM/%	mIoU/%	mPA/%	mF1/%	OA/%
None	91.24	62.23	71.21	76.74	84.05	86.07	92.35
Edge-only	90.81	63.32	75.17	77.07	85.50	86.36	92.07
Semantic-no-gate	91.06	63.51	73.92	77.29	85.14	86.50	92.28
BEAM	91.50	63.99	74.14	77.74	85.36	86.80	92.61

Table 10. Cross-regional generalization performance of SAB-DeepLabV3+ under the leave-one-city-out setting.

City	IoU-NM/%	IoU-WM/%	Recall-WM/%	F1-WM/%	mIoU/%	OA/%
Jixi	92.93	60.53	71.59	75.41	76.73	93.62
Daqing	85.41	68.89	80.52	81.58	77.15	88.97
Hegang	90.52	63.00	72.68	77.30	76.76	91.84
Mudanjiang	89.83	62.09	72.54	76.61	75.96	91.28
Qiqihar	89.62	62.74	68.96	77.10	76.18	91.17
Average	89.66	63.45	73.26	77.60	76.56	91.38

Note: F1-WM denotes the F1-score of the waterlogging-affected maize class.
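
The leave-one-city-out protocol and the "Average" row can be reproduced schematically as follows. The fold generator is an illustrative sketch of the splitting logic only (training and evaluation are omitted), while the metric values are copied from Table 10.

```python
CITIES = ("Jixi", "Daqing", "Hegang", "Mudanjiang", "Qiqihar")

def leave_one_city_out(cities):
    """Yield (training_cities, held_out_city), holding out each city once."""
    for held_out in cities:
        yield [c for c in cities if c != held_out], held_out

# Per-city results of SAB-DeepLabV3+ from Table 10 (percent).
TABLE_10 = {
    "Jixi":       {"IoU-WM": 60.53, "mIoU": 76.73, "OA": 93.62},
    "Daqing":     {"IoU-WM": 68.89, "mIoU": 77.15, "OA": 88.97},
    "Hegang":     {"IoU-WM": 63.00, "mIoU": 76.76, "OA": 91.84},
    "Mudanjiang": {"IoU-WM": 62.09, "mIoU": 75.96, "OA": 91.28},
    "Qiqihar":    {"IoU-WM": 62.74, "mIoU": 76.18, "OA": 91.17},
}

def macro_average(table, metric):
    """Unweighted mean over cities, so each city counts equally."""
    return sum(row[metric] for row in table.values()) / len(table)

folds = list(leave_one_city_out(CITIES))
averages = {m: macro_average(TABLE_10, m) for m in ("IoU-WM", "mIoU", "OA")}

for train, held_out in folds:
    print(f"train on {train}, test on {held_out}")
print({m: round(v, 2) for m, v in averages.items()})
```

The unweighted means agree with the "Average" row (63.45%, 76.56%, and 91.38%), indicating that each city contributes equally regardless of how many patches it contains.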

Table 11. Cross-regional comparison between DeepLabV3+ and SAB-DeepLabV3+ under the leave-one-city-out setting.

City	IoU-WM (DL)/%	IoU-WM (SAB)/%	∆IoU-WM/%	mIoU (DL)/%	mIoU (SAB)/%	∆mIoU/%	OA (DL)/%	OA (SAB)/%	∆OA/%
Jixi	57.69	60.53	+2.84	75.18	76.73	+1.55	93.34	93.62	+0.28
Daqing	66.89	68.89	+2.00	75.95	77.15	+1.20	88.49	88.97	+0.48
Hegang	59.79	63.00	+3.21	74.92	76.76	+1.84	91.33	91.84	+0.51
Mudanjiang	60.92	62.09	+1.17	75.28	75.96	+0.68	91.07	91.28	+0.21
Qiqihar	53.81	62.74	+8.93	70.82	76.18	+5.36	89.34	91.17	+1.83
Average	59.82	63.45	+3.63	74.43	76.56	+2.13	90.71	91.38	+0.67

Note: DL denotes DeepLabV3+; SAB denotes SAB-DeepLabV3+; IoU-WM denotes the IoU of the waterlogging-affected maize class; ∆ denotes the gain of SAB over DL in percentage points.
