Robust Segmentation of Mangrove in Remote Sensing Images via ODE-Based Neural Networks and Adversarial Training

Yu, Hao; Pan, Xiaoyan; Wu, Tingtian; Chen, Yiqing; Li, Yuanling; Chen, Xiaohua; Hu, Junjie; Chen, Zongzhu

doi:10.3390/app16125812

by Hao Yu ¹, Xiaoyan Pan ², Tingtian Wu ², Yiqing Chen ², Yuanling Li ², Xiaohua Chen ², Junjie Hu ³ and Zongzhu Chen ²,*

¹ School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China
² Key Laboratory of Tropical Forestry Resources Monitoring and Application of Hainan Province, Hainan Academy of Forestry, Haikou 571100, China
³ Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(12), 5812; https://doi.org/10.3390/app16125812 (registering DOI)

Submission received: 10 May 2026 / Revised: 4 June 2026 / Accepted: 5 June 2026 / Published: 9 June 2026

(This article belongs to the Special Issue Applications of Deep and Machine Learning in Remote Sensing)

Abstract

Mangrove ecosystems are recognized for their exceptional carbon sequestration potential and crucial contribution to coastal ecological balance. However, the sharp decline in mangrove area necessitates efficient monitoring via remote sensing. While Deep Neural Networks (DNNs) have excelled in segmentation tasks, their robustness remains inadequate. This limitation stems from the lack of theoretical guarantees regarding the continuity of layer-by-layer discrete transformations, rendering models susceptible not only to man-made adversarial attacks but also to natural degradations. To address these vulnerabilities, this paper leverages Neural Ordinary Differential Equations (NODEs) to enhance the robustness of mangrove segmentation. We designed and integrated various NODE architectures, including a novel NODE-SE-Block inspired by adaptive feature recalibration, to achieve more stable feature representations. Crucially, our findings reveal that by employing an adversarial training framework based on known attacks, the NODE-integrated network demonstrates superior cross-domain robustness. It not only defends against malicious exploits but also exhibits significantly enhanced resilience toward natural degradations, such as Gaussian noise and sensor-induced artifacts. Experimental results on mangrove datasets verify that the proposed methodology provides a reliable and interference-resistant foundation for ecological management in mission-critical scenarios.

Keywords: mangrove remote sensing; adversarial attack; neural ordinary differential equations; semantic segmentation; robustness

1. Introduction

Mangroves possess remarkable carbon sequestration capacity and provide unique ecological services, but their sustainability has been severely challenged by extensive human activities and environmental changes. Mangrove ecosystems have experienced extensive degradation, as large areas in Southeast Asia have been cleared for aquaculture and coastal development. In regions such as West Africa and South America, pollution and rising sea levels driven by climate change have further accelerated their decline, with research showing that global mangrove cover has declined by nearly one-third over the past fifty years. These challenges highlight the urgency of effective monitoring and conservation of mangroves.

Compared with traditional machine learning approaches, DNNs exhibit superior feature extraction capabilities and higher object recognition accuracy. Consequently, DNN-based remote sensing technologies have emerged as a promising means for large-scale and long-term monitoring of mangrove ecosystems, facilitating timely and accurate assessments of their spatial distribution and ecological status. Nevertheless, DNNs still suffer from several inherent limitations. The lack of interpretability obscures the understanding of internal decision-making mechanisms [1], while the performance of DNNs remains highly sensitive to external perturbations [2,3]. In mangrove monitoring tasks, the inevitable noise arising from atmospheric interference, sensor imperfections, and environmental variability can blur canopy boundaries and fine structural details, thereby amplifying model vulnerability. Furthermore, the spectral similarity between mangroves and other vegetation, as well as their spatial overlap with water bodies, increases the likelihood of misclassification. Without adequate robustness, DNNs are prone to overfitting noisy data and losing generalization capability, which undermines the reliability of ecological monitoring and assessment. Therefore, enhancing the noise resistance of DNN-based models is essential for achieving stable and practical mangrove detection in real-world remote sensing applications.

Fortunately, NODEs [4,5] offer a new approach to improving the robustness of DNNs. NODEs characterize DNNs from the perspective of dynamical systems, extending the discrete stacking paradigm of traditional DNNs to a continuous dynamical evolution process. Unlike DNNs with explicit layer structures, a NODE implicitly establishes a mapping between input and output, which has been observed to yield higher nonlinearity, clearer dynamical characteristics, and stronger fitting capabilities. By evolving feature embeddings along the NODE trajectory, a NODE mitigates the impact of input noise or perturbations and thereby demonstrates enhanced robustness. In addition, unlike the black-box behavior of intermediate hidden layers in traditional DNNs, the evolution trajectory of the NODE internal state reflects the trend of feature changes.

However, employing a full NODE architecture often incurs substantial computational overhead and slows inference. This is because continuous-time modeling requires repeated ODE solver iterations that call the neural network at every step, while training gradients are computed through the computationally expensive adjoint method. To balance robustness and efficiency, we selectively incorporate NODE into key components of the network to perform dynamic modeling over time, as shown in Figure 1. Specifically, we use NODE to implement the original convolution branches of the atrous spatial pyramid pooling (ASPP) module [6]. Furthermore, we apply the modeling idea of NODE to the classic SE-Block [7], proposing the NODE-SE-Block and embedding it into the segmentation network.

In the complex field monitoring scenarios of mangroves, remote sensing models not only need to defend against adversarial human intervention but also must cope with random natural environmental degradation. This study reveals a key collaborative mechanism: adversarial training [8] using known human-induced perturbation samples can significantly improve the robustness of the NODE architecture to non-specific environmental noise. We term this mechanism Synergistic Adversarial Training (SAT), as shown in Figure 2. The underlying mechanism of this phenomenon lies in the regularization that adversarial training imposes on the model’s continuous vector field, which yields a more stable feature evolution trajectory.

By optimizing the network against extremal directional perturbations, the training process enforces a smoother decision manifold that satisfies a higher degree of Lipschitz continuity. This rigorous hardening steers the model away from volatile, non-robust features and toward the invariant structural essence of the mangrove canopy. Consequently, while natural interferences like atmospheric haze or sensor stripes differ in origin from adversarial attacks, they fall within the robust subspace established during the defense against FGSM. This cross-robustness effect ensures that the proposed methodology remains dependable in complex ecological environments, bridging the gap between theoretical security and real-world reliability.

In summary, the contributions of this paper mainly include the following three aspects:

We developed and integrated diverse NODE modules into critical network components, including a novel plug-and-play NODE-SE-Block that leverages continuous dynamics within channel attention mechanisms to adaptively optimize feature weights and bolster model resilience.
We propose an adversarial training framework that uses a deliberate malicious attack as the benchmark for maximum perturbation. By forcing the model to remain invariant under these extreme gradients, the framework stabilizes feature trajectories and smooths the decision manifold, effectively extending robustness from deliberate attacks to natural noise.
Extensive experiments on mangrove remote sensing datasets demonstrate that the proposed method significantly enhances robustness against human-induced adverse disturbances and natural environmental degradation.

2. Related Works

Remote sensing monitoring of mangroves is crucial for ecological conservation and environmental assessment, and semantic segmentation using DNNs plays a central role in this field. However, in practical applications, noise interference and environmental variations pose significant challenges to model stability and robustness. This section reviews current research methods and techniques for improving model robustness in mangrove remote sensing tasks.

2.1. Remote Sensing of Mangrove

Mangrove mapping and semantic segmentation have been studied across a range of sensor types and methodological paradigms. Early reviews summarized the advantages and limitations of optical, hyperspectral and SAR sensors for mangrove monitoring and emphasized challenges including mixed pixels, tidal inundation and species-level confusion [9]. Recent research on DNNs has focused on pixel-level segmentation using U-Net [10] and its hybrids [11] on multisource high-resolution imagery, showing marked gains over traditional pixel or object-based classifiers when sufficient labeled data are available. Concurrently, large-scale benchmark efforts and dataset releases [12,13] have enabled systematic comparisons of different architectures for global-scale mangrove mapping [14]. Beyond improving segmentation accuracy, research on robustness is also valuable.

In recent years, the robustness of remote sensing semantic segmentation models against noise and perturbations has attracted increasing attention, although systematic studies specifically targeting mangrove ecosystems remain scarce [15]. Existing studies mainly focus on reducing the impact of common disturbances such as clouds, shadows, and tidal variations, often through preprocessing [16], spectral unmixing [17], or multi-temporal data fusion [18]. With the development of deep learning, several works introduced adversarial training [19,20] and data augmentation strategies [21] to improve segmentation robustness against synthetic noise or perturbations. A recent method specifically targets robustness and boundary precision in complex tidal and mixed-background scenes by combining local CNN feature extractors with transformer-style global context modules or dual-backbone fusion designs, and reports state-of-the-art gains on regional benchmarks [22].

2.2. Sensitivity of Deep Learning Networks

Although deep neural networks have achieved remarkable success in computer vision and remote sensing tasks, their sensitivity to perturbations remains a significant challenge. Existing studies have shown that even small perturbations may noticeably affect model predictions and reduce robustness in real-world environments.

Recent research has demonstrated this vulnerability across different application scenarios. For example, Ref. [23] showed that stochastic computing architectures may weaken traditional defense mechanisms and allow adversarial perturbations to bypass protection strategies. In medical imaging, Ref. [24] proposed a semi-supervised generative adversarial framework for myocarditis diagnosis and highlighted the sensitivity of deep learning models to perturbation distributions and optimization instability. In addition, Ref. [25] demonstrated that even low-epsilon perturbations can successfully mislead online image stream classifiers.

These studies indicate that deep learning models are generally sensitive to adversarial perturbations and environmental variations, highlighting the importance of robustness research for remote sensing semantic segmentation models in practical mangrove monitoring applications.

2.3. Neural Ordinary Differential Equations

In recent studies, NODEs have been proposed as a continuous form of deep networks. The basic idea is to regard the residual block as a discrete solver of an ODE, define the state evolution equation, and use a numerical integrator to compute the network output [5]. NODE has been gradually introduced into computer vision tasks [26,27], where it replaces the traditional discrete hierarchy with continuous-time modeling, thereby achieving more flexible feature evolution and higher robustness. In image classification, NODE has been used to replace the convolutional layer in the residual network, reducing the number of parameters while maintaining or even improving performance [28]. Existing studies have combined NODE with Gaussian processes to enhance model robustness and uncertainty modeling, while also incorporating numerical methods to improve their performance against adversarial attacks and out-of-distribution samples [29]. These works show that the ODE architecture has broad application potential in visual tasks.

2.4. Defense Mechanisms Against Perturbations

Adversarial training is the most widely used empirical defense; it improves robustness by jointly training on clean and adversarial examples. Early studies employed single-step FGSM [30] for inner maximization, achieving robustness against simple attacks. However, later works showed that models trained with single-step methods remain vulnerable to stronger multi-step attacks. Madry et al. [31] formalized multi-step PGD-based adversarial training as a strong baseline against first-order adversaries. To reduce computational cost, efficient variants such as Free Adversarial Training [32] and Fast Adversarial Training [33] were proposed. Ensemble Adversarial Training [34] further enhances robustness by leveraging adversarial examples transferred from multiple models.

Beyond adversarial training, regularization-based defenses improve robustness by constraining the model’s local sensitivity at the objective level. Input gradient regularization penalizes $|\nabla_{x} \mathcal{L}(\theta, x, y)|$ to smooth responses to small perturbations [35]. Adversarial Logit Pairing (ALP) [36] enforces consistency between clean and adversarial logits, while TRADES [37] explicitly formulates the accuracy–robustness tradeoff through a decoupled training objective. Related approaches, including Jacobian regularization, label smoothing [38], and margin-based training [39], further promote local linearity and confidence calibration.

Input-transformation defenses apply randomization or preprocessing to the input to disrupt gradient-based attacks [40,41]. Typical methods include random resizing or padding, noise injection, and preprocessing operations such as JPEG compression and denoising. While effective against simple adversaries, these techniques may suffer from gradient masking when used alone, leading to overestimated robustness without careful evaluation.

Architecture-level defenses aim to suppress perturbation amplification within the network through structural and dynamical design. Lipschitz constraints [42] bound layer-wise sensitivity, while structural designs such as feature denoising [43], non-local blocks [44], and normalization or residual connections improve stability during adversarial training. More recently, Neural ODEs model forward propagation as continuous dynamical systems, implicitly encouraging smoothness and stability [45]. Existing analyses on stability and perturbation propagation support their robustness potential, which aligns with the design motivation of our proposed ODE-based architecture.

In summary, existing research on mangrove remote sensing has achieved remarkable progress in segmentation accuracy through advances in deep learning and multisource data integration. However, the robustness of these models under real-world noise, environmental disturbances, and adversarial perturbations remains underexplored.

NODE offers inherent stability advantages by modeling feature evolution as a continuous dynamic process [4,29]. This property suggests that NODE-based architectures can naturally resist small input perturbations and suppress gradient amplification during propagation.

Motivated by these insights, we propose a NODE-based mangrove remote sensing semantic segmentation framework. To further enhance robustness, we integrate a collaborative adversarial training strategy that jointly optimizes clean and perturbed samples. The combination of continuous-time feature modeling and adversarial supervision enables the proposed framework to achieve improved robustness against both human-induced perturbations and natural image degradation, while maintaining competitive segmentation accuracy and computational efficiency. As a result, our method offers a robust and practical solution for real-world mangrove segmentation applications.

3. Methodology

Traditional deep networks typically transform features layer by layer through discretely stacked network layers, meaning that feature representations evolve in a discrete form along the network depth direction. While this design has been successful in practice, the abrupt mapping between discrete layers can easily lead to feature discontinuities, gradient amplification, and sensitivity to input perturbations, especially in the presence of noise or adversarial perturbations.

To alleviate the aforementioned problems, NODEs reinterpret deep networks as continuous-time dynamical systems. Specifically, given input features $z(t_{0})$, their evolution within the network is no longer defined by a series of discrete layers, but is modeled through the following ordinary differential equation:

$$\frac{d z(t)}{d t} = f(z(t), t, \theta), \quad z(t_{0}) = z_{0}, \tag{1}$$

where $f(\cdot)$ represents the dynamic function parameterized by the neural network, $\theta$ denotes its learnable parameters, and $t$ represents a continuous depth or time variable.

In this modeling approach, the feature representation evolves continuously along the time dimension, and the network output $z(t_{M})$ is obtained by numerically integrating the above differential equation over the interval $[t_{0}, t_{M}]$, where $M$ represents the length of continuous depth or time:

$$z(t_{M}) = z(t_{0}) + \int_{t_{0}}^{t_{M}} f(z(t), t, \theta)\, d t. \tag{2}$$
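The integral in Equation (2) is evaluated numerically in practice. The following is a minimal sketch of such a forward pass using a fixed-step classical Runge-Kutta (RK4) scheme. The solver choice, the step count, and the toy dynamics `tanh(W z + b)` are illustrative assumptions and are not taken from the paper, which does not state its solver in this section.

```python
import math

def rk4_integrate(f, z0, t0, t1, steps=20):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step classical RK4."""
    h = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        k1 = f(z, t)
        k2 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k1)], t + 0.5 * h)
        k3 = f([zi + 0.5 * h * ki for zi, ki in zip(z, k2)], t + 0.5 * h)
        k4 = f([zi + h * ki for zi, ki in zip(z, k3)], t + h)
        z = [zi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]
        t += h
    return z

def make_dynamics(weights, bias):
    """Hypothetical f(z, t, theta) = tanh(W z + b), with theta = (W, b)."""
    def f(z, t):
        return [math.tanh(sum(w * zj for w, zj in zip(row, z)) + b)
                for row, b in zip(weights, bias)]
    return f

# Sanity check on a system with a known solution: dz/dt = -z gives z(1) = z0 / e.
def decay(z, t):
    return [-zi for zi in z]

z_end = rk4_integrate(decay, [1.0, 2.0], 0.0, 1.0)

# Forward pass of a toy NODE block: z(t_M) = z(t_0) + integral of f over [t_0, t_M].
f_theta = make_dynamics([[0.5, -0.3], [0.2, 0.1]], [0.0, 0.1])
z_tM = rk4_integrate(f_theta, [1.0, -1.0], 0.0, 1.0)
```

Each RK4 step calls the dynamics function four times, which illustrates why solver iterations dominate the cost of a NODE layer compared with a single discrete layer.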

From the perspective of representation learning, NODEs replace the traditional discrete layer stacking with a continuous evolution process, making feature updates smoother and effectively suppressing abrupt amplification and the accumulation of high-frequency noise. At the same time, since feature evolution is constrained by differential equations, its dynamic behavior is easier to analyze in theory and naturally possesses a certain degree of stability and Lipschitz continuity, which provides favorable conditions for improving the robustness of the model.

To comprehensively enhance the robustness and continuity of feature learning, we introduce NODE modeling into three representative components of the encoder: Vanilla Convolution, Dilated Convolution, and SE Convolution, corresponding respectively to local feature extraction, multi-scale contextual representation, and channel-wise feature recalibration.

3.1. Vanilla Convolution with NODE

Conventional convolution gradually increases the receptive field by stacking discrete layers, initially capturing only local information. NODE Convolution (NODE Conv), on the other hand, models the convolution operation as a dynamical system in continuous time, representing the evolution of features over time.

We apply NODE Conv to the bottleneck layer of the network, named BottleneckNODE, for compression, learning, and recovery of image features, as shown in Figure 3. In BottleneckNODE, channel compression is first performed using a $1 \times 1$ convolution. Subsequently, a $3 \times 3$ NODE Conv layer is used to model continuous dynamic feature changes. Finally, a $1 \times 1$ convolution is performed to restore the number of channels. In the NODE Conv part, the evolution of the feature map $z(t)$ is described by the ordinary differential equation in Equation (1), where $f(\cdot)$ represents a nonlinear transformation consisting of $3 \times 3$ convolution, batch normalization, and ReLU activation.

The output features are obtained by solving the NODE and integrating it over the time interval $[t_{0}, t_{M}]$. NODE Conv aggregates richer multi-scale contextual information within a single convolution branch, thereby better capturing long-range dependencies.

Since intermediate layers contain a large number of feature representations, directly applying NODE modeling throughout the entire encoder would incur prohibitive computational overhead. To strike a balance between efficiency and robustness, we adopt a hierarchical design in which standard convolutional layers are first employed to efficiently extract low-level and mid-level image features. NODE Conv is then introduced only at the final stage of the encoder to integrate multi-scale contextual information and capture long-range dependencies.

By modeling feature transformation as a smooth dynamical system, NODE-based convolution enhances robustness against noise and perturbations while preserving expressive power. Following this design principle, all NODE-enhanced modules in our framework are applied after sufficient feature extraction by conventional convolutions, and the same strategy is consistently adopted throughout the network. Therefore, the subsequent NODE-based modules share an identical adaptation philosophy and are not repeatedly elaborated in later sections.

3.2. Dilated Convolution with NODE

Dilated Convolution can effectively enlarge the receptive field without increasing the number of parameters or computational cost, thereby incorporating richer contextual information during feature extraction. The process of dilated convolution can also be dynamically modeled by a NODE, as shown in Figure 4.

The application of dilated convolution in the ASPP structure reflects its advantages. In the conventional ASPP structure, atrous convolution branches with different dilation rates can effectively capture multi-scale contextual information. However, the convolution operation is essentially a discrete feature mapping, which makes it difficult to fully model the continuous evolution of features across scales. To address this limitation, we introduce NODE into ASPP for dynamic modeling, namely NODE-ASPP, to achieve smoother and more robust multi-scale feature representation.

In standard ASPP, each dilation branch applies a discrete convolution with dilation rate $r_{i}$ to the input $x$:

$$z_{i} = f_{conv}(x; r_{i}). \tag{3}$$

In NODE-ASPP, each branch instead defines a dynamical system based on the dilation rate $r_{i}$:

$$z_{i}(t_{M}) = z_{i}(t_{0}) + \int_{t_{0}}^{t_{M}} f_{\theta}(z_{i}(t), t, r_{i})\, d t. \tag{4}$$

Finally, the output features of all integrated branches are concatenated, channel-compressed, and fused using a $1 \times 1$ convolution to obtain the feature representation of NODE-ASPP-Block:

$$z(t_{M}) = \mathrm{Concat}(z_{1}(t_{M}), z_{2}(t_{M}), \dots, z_{n}(t_{M})), \tag{5}$$

$$F_{NODE-ASPP} = \mathrm{Conv}_{1 \times 1}(z(t_{M})). \tag{6}$$

NODE-ASPP-Block enables continuous convolution evolution of each dilation branch, achieving stronger multi-scale feature modeling and robustness through a continuously changing multi-scale receptive field.
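The branch-then-fuse structure of Equations (4) to (6) can be sketched on a one-dimensional, single-channel signal. This is an illustration under several assumptions that the text does not specify: each branch starts from the input itself, the dynamics are `tanh` of a 3-tap dilated convolution, integration uses forward Euler, and the $1 \times 1$ fusion reduces to a weighted sum over branches. All function names and weights are hypothetical.

```python
import math

def dilated_conv1d(z, kernel, rate):
    """3-tap dilated convolution with zero padding; output length equals input length."""
    n = len(z)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):      # taps at offsets -rate, 0, +rate
            j = i + (k - 1) * rate
            if 0 <= j < n:
                acc += w * z[j]
        out.append(acc)
    return out

def node_branch(x, kernel, rate, t0=0.0, t1=1.0, steps=10):
    """Forward-Euler approximation of Eq. (4) for one dilation rate."""
    h = (t1 - t0) / steps
    z = list(x)                             # assumption: z_i(t_0) is the input x
    for _ in range(steps):
        dz = [math.tanh(v) for v in dilated_conv1d(z, kernel, rate)]
        z = [zi + h * di for zi, di in zip(z, dz)]
    return z

def node_aspp(x, kernels, rates, fuse_weights):
    """Eqs. (5)-(6): stack the branch outputs, then fuse them position by position."""
    branches = [node_branch(x, k, r) for k, r in zip(kernels, rates)]
    # A 1x1 convolution over stacked branches is a per-position weighted sum.
    return [sum(w * b[i] for w, b in zip(fuse_weights, branches))
            for i in range(len(x))]

x = [0.0, 1.0, 0.0, -1.0, 0.5, 0.0, 0.0, 1.0]
rates = [1, 2, 4]
kernels = [[0.2, 0.5, 0.2]] * 3
fused = node_aspp(x, kernels, rates, fuse_weights=[0.5, 0.3, 0.2])
```

Because every Euler step reapplies the dilated kernel to the evolving state, the effective receptive field of a branch grows with the number of solver steps, which is one way to read the "continuously changing multi-scale receptive field" described above.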

3.3. SE Convolution with NODE

In traditional channel-wise attention mechanisms, channel weights are typically generated by a two-layer fully connected network, lacking dynamics and continuity. To address this, we designed the NODE-SE-Block, which introduces NODE modeling to enable attention weights to vary continuously as input features evolve, as shown in Figure 5.

Similar to the traditional approach, the input features $X \in \mathbb{R}^{C \times H \times W}$ are transformed into a channel descriptor vector through global average pooling:

$$z_{c}(t_{0}) = F_{sq}(X_{c}) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_{c,i,j}. \tag{7}$$

We introduce a dynamic modeling strategy by formulating the channel descriptor as the initial state of an ordinary differential equation:

$$z(t_{0}) = [z_{1}(t_{0}), z_{2}(t_{0}), \dots, z_{C}(t_{0})]. \tag{8}$$

Subsequently, the hidden representation of the channel descriptor evolves over time. The attention weight representation for each channel can be obtained by integrating the corresponding differential equation:

$$z_{c}(t_{M}) = F_{ex}(z(t_{0})) = z_{c}(t_{0}) + \int_{t_{0}}^{t_{M}} f_{\theta}(z_{c}(t), t, c)\, d t. \tag{9}$$

The function $f_{\theta}(\cdot)$ denotes the parameterized dynamics governing channel-wise feature evolution. In practice, $f_{\theta}$ is implemented as a lightweight two-layer fully connected network with an intermediate nonlinear activation, which enables flexible yet stable modeling of dynamic channel interactions while introducing negligible computational overhead.

Finally, the feature weights of each channel are reassembled in channel order and a nonlinear activation function is applied to obtain the dynamic channel attention weights:

$$z(t_{M}) = [z_{1}(t_{M}), z_{2}(t_{M}), \dots, z_{C}(t_{M})]. \tag{10}$$

$$W = \sigma(z(t_{M})). \tag{11}$$

The obtained attention weights are used to rescale the original features:

$$\hat{X} = F_{scale}(X, W) = W \odot X. \tag{12}$$

This design enables the generation of dynamic channel attention, where the attention weights are continuously evolved via NODE modeling rather than being fixed by static parameters.
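The squeeze, evolve, and rescale steps of Equations (7) to (12) can be traced in a few lines. The sketch below is a plain-Python illustration on nested lists, not the authors' implementation. It assumes a ReLU as the intermediate activation, a sigmoid for $\sigma$, and forward-Euler integration, none of which are fixed by the text; the parameter values are arbitrary.

```python
import math

def squeeze(X):
    """Eq. (7): global average pooling over H x W for each channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in X]

def fc(W, b, v):
    """Fully connected layer: W v + b."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]

def dynamics(z, params):
    """f_theta: two fully connected layers with a ReLU in between (assumed activation)."""
    W1, b1, W2, b2 = params
    hidden = [max(0.0, h) for h in fc(W1, b1, z)]
    return fc(W2, b2, hidden)

def node_se(X, params, t0=0.0, t1=1.0, steps=10):
    z = squeeze(X)                                   # initial state z(t_0), Eq. (8)
    h = (t1 - t0) / steps
    for _ in range(steps):                           # Euler approximation of Eq. (9)
        dz = dynamics(z, params)
        z = [zi + h * di for zi, di in zip(z, dz)]
    W = [1.0 / (1.0 + math.exp(-zi)) for zi in z]    # Eq. (11), sigmoid assumed
    X_hat = [[[w * v for v in row] for row in ch]    # Eq. (12), channel-wise rescale
             for w, ch in zip(W, X)]
    return X_hat, W

# Two channels of size 2 x 2; the hidden width of 1 is chosen only for brevity.
X = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
params = ([[0.5, 0.25]], [0.1], [[1.0], [-1.0]], [0.0, 0.0])
X_hat, W = node_se(X, params)
```

With all dynamics parameters set to zero the state never moves, so the block reduces to a sigmoid of the pooled descriptor; the learned dynamics are what differentiate the weights across channels.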

3.4. Synergistic Adversarial Training

In complex mangrove field monitoring scenarios, remote sensing segmentation models are exposed not only to human-induced perturbations but also to diverse and unpredictable natural degradations, such as atmospheric haze, sensor noise, and stripe artifacts. These natural disturbances are typically stochastic and difficult to model explicitly, which poses a significant challenge for robust deployment in real-world ecological environments.

To address this issue, we reveal a key synergistic mechanism between adversarial training and NODE architectures. Specifically, adversarial training using artificial perturbations can significantly enhance the robustness of NODE-based models against non-adversarial environmental noise. We refer to this mechanism as Synergistic Adversarial Training (SAT).

3.4.1. Robust Optimization Perspective

To formalize the proposed synergistic mechanism, let $f_{\theta} : \mathcal{X} \to \mathcal{Y}$ denote the remote sensing segmentation model parameterized by $\theta \in \mathbb{R}^{d}$, where $\mathcal{X}$ and $\mathcal{Y}$ represent the input image space and the semantic label space, respectively. Let $\mathcal{L}(\cdot, \cdot)$ denote the task-specific loss function, such as cross-entropy. Standard empirical risk minimization (ERM) optimizes the model by minimizing the expected loss over the training distribution $\mathcal{D}$:

$$\min_{\theta} \mathbb{E}_{(x, y) \sim \mathcal{D}} [\mathcal{L}(f_{\theta}(x), y)], \tag{13}$$

where $(x, y)$ are image-label pairs sampled from $\mathcal{D}$.

SAT extends this framework into a robust min-max optimization problem, which can be viewed as a zero-sum game between the model and an adversary:

$$\min_{\theta} \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[\max_{\|\delta\|_{p} \leq \epsilon} \mathcal{L}(f_{\theta}(x + \delta), y)\right], \tag{14}$$

where $\delta \in \mathbb{R}^{n}$ represents a bounded additive perturbation constrained within an $l_{p}$-norm ball $\mathcal{B}_{p}(x, \epsilon) = \{x + \delta : \|\delta\|_{p} \leq \epsilon\}$ with a radius $\epsilon$.

Crucially, in the context of mangrove monitoring, any natural environmental degradation $\eta$ that satisfies $\|\eta\|_{p} \leq \epsilon$ is contained within the same perturbation set. Consequently, by explicitly optimizing against the most damaging adversarial directions $\delta^{*}$, the model implicitly develops a robustness margin that covers stochastic and non-adversarial noise encountered in real-world ecological scenarios.
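The containment argument can be checked numerically on a toy case. The sketch below uses a linear logistic classifier, chosen because the signed-gradient (FGSM) perturbation is then the exact maximizer of the inner problem over the $l_{\infty}$ ball. For a deep segmentation network, FGSM is only a first-order approximation of that maximizer, so the inequality shown here is not guaranteed in general. The weights, input, and budget are arbitrary illustrative values.

```python
import math
import random

def predict(w, b, x):
    """Sigmoid output of a linear model."""
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def loss(w, b, x, y):
    """Binary cross-entropy; y is 0 or 1."""
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def fgsm_delta(w, b, x, y, eps):
    """delta = eps * sign(d loss / d x); for this model d loss / d x = (p - y) * w."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [eps * ((g > 0) - (g < 0)) for g in grad]

w, b = [1.5, -2.0, 0.5], 0.1
x, y, eps = [0.2, 0.4, -0.3], 1, 0.1

clean_loss = loss(w, b, x, y)
delta = fgsm_delta(w, b, x, y, eps)
adv_loss = loss(w, b, [xi + di for xi, di in zip(x, delta)], y)

# Random "natural" noise eta drawn inside the same l_inf ball of radius eps.
rng = random.Random(0)
noise_losses = []
for _ in range(1000):
    eta = [rng.uniform(-eps, eps) for _ in x]
    noise_losses.append(loss(w, b, [xi + ni for xi, ni in zip(x, eta)], y))

worst_noise_loss = max(noise_losses)   # never exceeds adv_loss for this linear model
```

In this setting every sampled noise vector yields a loss no larger than the adversarial loss, which is the sense in which training against $\delta^{*}$ also covers non-adversarial perturbations of the same magnitude.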

3.4.2. Synergy with Continuous Dynamics

In NODE architectures, the feature evolution is conceptualized as a continuous-time dynamical system defined by:

$$\frac{d z(t)}{d t} = f(z(t), t, \theta), \tag{15}$$

where $z(t)$ denotes the hidden state at time $t$. When adversarial training is integrated with NODE-based encoders, the optimization process acts as a form of implicit regularization on the continuous vector field $f$. By forcing the model to remain invariant under extreme input perturbations, the learned dynamics are encouraged to satisfy a lower Lipschitz constant $K$:

$$\|f(z_{1}) - f(z_{2})\| \leq K \|z_{1} - z_{2}\|, \tag{16}$$

where $z_{1}$ and $z_{2}$ represent hidden state representations. A smaller $K$ indicates that the vector field is smoother, which effectively suppresses the amplification of perturbations during the integration process. As a result, the NODE-based encoder learns smoother and more reliable feature trajectories $z(t)$ across varying environmental conditions, favoring the invariant structural characteristics of mangrove canopies over fragile, noise-sensitive patterns.
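The link between a smaller $K$ and reduced perturbation amplification can be made explicit with a standard Grönwall-type estimate. The derivation below is a supplementary sketch rather than part of the original analysis. It assumes that the bound in Equation (16) holds uniformly in $t$ and that the norm is Euclidean, and it compares two trajectories of Equation (15) that start from a clean state and a perturbed state.

```latex
% z_1(t), z_2(t): solutions of Eq. (15) with initial states z_1(t_0) and z_2(t_0).
\begin{aligned}
\frac{d}{dt}\,\| z_1(t) - z_2(t) \|
  &\le \| f(z_1(t), t, \theta) - f(z_2(t), t, \theta) \|
   \le K \,\| z_1(t) - z_2(t) \| \\
\Longrightarrow \quad
\| z_1(t_M) - z_2(t_M) \|
  &\le e^{K (t_M - t_0)} \,\| z_1(t_0) - z_2(t_0) \| .
\end{aligned}
```

The amplification factor $e^{K (t_M - t_0)}$ grows exponentially in $K$, so any reduction of the Lipschitz constant of the vector field tightens the worst-case growth of an input perturbation over the integration interval.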

Although adversarial perturbations and natural environmental degradations arise from different mechanisms, both can be viewed as bounded perturbations in the input space. By explicitly optimizing the model with respect to worst-case adversarial directions, SAT enlarges the robustness margin around clean samples. Natural perturbations, which typically correspond to non-optimal directions within this margin, are therefore effectively handled by the trained model. For generating adversarial examples, we selected three representative methods as noise sources: Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and Momentum Iterative FGSM (MI-FGSM). By combining these three methods, we can cover attacks ranging from single-step fast attacks to multi-step, more powerful attacks with momentum, thereby improving the model’s generalization ability to different perturbation patterns.
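The three attack families can be written against a common interface that only needs the gradient of the loss with respect to the input. The sketch below follows the usual $l_{\infty}$ formulations of FGSM, PGD, and MI-FGSM on a toy logistic loss. The step sizes, iteration counts, and momentum factor are illustrative assumptions, the random start often used with PGD and the clipping to the valid pixel range are omitted, and the final random selection is only one possible reading of how an attack might be chosen for a batch.

```python
import math
import random

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(grad_fn, x, eps):
    """Single step of size eps along the sign of the input gradient."""
    return [xi + eps * sign(g) for xi, g in zip(x, grad_fn(x))]

def pgd(grad_fn, x, eps, alpha=0.02, steps=10):
    """Iterated signed-gradient steps, each projected back onto the l_inf ball."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = [clip(xa + alpha * sign(gi), xi - eps, xi + eps)
                 for xa, gi, xi in zip(x_adv, g, x)]
    return x_adv

def mi_fgsm(grad_fn, x, eps, steps=10, mu=1.0):
    """Iterative FGSM with a momentum buffer of l1-normalized gradients."""
    alpha = eps / steps
    x_adv, m = list(x), [0.0] * len(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        l1 = sum(abs(gi) for gi in g) or 1.0
        m = [mu * mi + gi / l1 for mi, gi in zip(m, g)]
        x_adv = [clip(xa + alpha * sign(mi), xi - eps, xi + eps)
                 for xa, mi, xi in zip(x_adv, m, x)]
    return x_adv

# Toy differentiable model: logistic loss with fixed, hypothetical weights.
w, b, y = [1.5, -2.0, 0.5], 0.1, 1

def prob(x):
    return 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))

def loss(x):
    p = prob(x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def grad_fn(x):
    return [(prob(x) - y) * wi for wi in w]

x, eps = [0.2, 0.4, -0.3], 0.1
attacks = {"FGSM": fgsm, "PGD": pgd, "MI-FGSM": mi_fgsm}
adversarial = {name: attack(grad_fn, x, eps) for name, attack in attacks.items()}

# Assumed selection rule: draw one attack at random for the current batch.
rng = random.Random(0)
chosen = rng.choice(sorted(attacks))
x_adv_batch = attacks[chosen](grad_fn, x, eps)
```

All three functions return a perturbed input that stays within the same $\epsilon$-ball around `x`, so they can be swapped freely inside a training loop that mixes clean and adversarial samples.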

4. Experimental Section

4.1. Dataset

This experiment utilized a publicly available mangrove segmentation dataset (https://aistudio.baidu.com/datasetdetail/74400/0, accessed on 16 March 2026), which includes high-resolution remote sensing images paired with their corresponding pixel-level annotations. The dataset contains a total of 10,000 samples, evenly distributed with 5000 positive samples and 5000 negative samples. The dataset provides four-channel RGBA images, with additional channels representing the spectral characteristics and vegetation index of mangroves. The dataset covers complex scenes such as coastal zones, water bodies, farmland, and woodlands, and features blurred class boundaries and strong background interference, providing a good benchmark for research on intelligent mangrove identification and robust segmentation methods.

The annotations in the dataset use different pixel values to distinguish mangrove from non-mangrove areas. In this experiment, the dataset was divided into training, validation, and test sets in an 8:1:1 ratio, used for model training, hyperparameter tuning, and performance evaluation, respectively.
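The 8:1:1 partition can be illustrated with a minimal sketch. It assumes a random shuffle of sample indices; the fixed seed and the helper name `split_indices` are illustrative choices and are not taken from the original implementation.

```python
import random


def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and split them by the given integer ratios."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an illustrative assumption
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder forms the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 8000 1000 1000
```

With 10,000 samples this yields 8000 training, 1000 validation, and 1000 test samples, and every sample appears in exactly one subset.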

4.2. Evaluation Indicators

In this study, we use the mean Intersection over Union (mIoU) as the main segmentation performance metric. It is obtained by computing the intersection over union (IoU) between the predicted region and the ground-truth annotated region for each class and then averaging over all classes. It is defined as follows:

mIoU = \frac{1}{K} \sum_{k=1}^{K} \frac{TP_{k}}{TP_{k} + FP_{k} + FN_{k}},    (17)

where TP_{k}, FP_{k}, and FN_{k} represent the number of true positive, false positive, and false negative pixels of the kth class, respectively, and K is the total number of classes.
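As a worked illustration of Equation (17), the following sketch computes mIoU from flat sequences of class labels. The function name `mean_iou` and the toy label sequences are hypothetical. Skipping classes that are absent from both the prediction and the ground truth is an assumption of this sketch; Equation (17) itself averages over all K classes.

```python
def mean_iou(pred, target, num_classes):
    """Compute mIoU from flat sequences of predicted and ground-truth class ids."""
    ious = []
    for k in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == k and t == k)
        fp = sum(1 for p, t in zip(pred, target) if p == k and t != k)
        fn = sum(1 for p, t in zip(pred, target) if p != k and t == k)
        denom = tp + fp + fn
        if denom == 0:
            continue  # assumption: a class absent from both maps is skipped
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with K = 2 (0 = non-mangrove, 1 = mangrove).
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(mean_iou(pred, target, num_classes=2))  # 0.5
```

In this example each class has two true positives, one false positive, and one false negative, so both per-class IoU values equal 2/4 and the mean is 0.5.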

4.3. Implementation Details

In our experiments, adversarial training is employed to evaluate the robustness of the proposed model in the mangrove segmentation task. The overall pipeline is as follows.

First, the input image is resized to 256 × 256 and fed into the residual encoder to extract both deep and shallow feature representations. These features are subsequently refined through multiple enhancement modules, fused with shallow features, and finally passed to the decoder to generate the segmentation output.

To further improve robustness, adversarial examples are constructed online during training. Specifically, perturbations are generated from the current model using three representative attack methods, namely FGSM, MI-FGSM, and PGD, and then added to the original images. For the experiments on human-induced (adversarial) perturbations, adversarial examples are generated with a single, fixed attack method throughout the training process. Conversely, to build generalized robustness against natural perturbations, we adopt a random per-batch approach, in which the attack method is drawn at random for each batch. Clean samples and their corresponding adversarial counterparts are combined in a strict 1:1 ratio to form each training batch. The perturbation budget is set to ϵ = 0.03 for all attacks. For MI-FGSM and PGD, the number of iterations is set to 10 with a step size of α = 0.001. During training, the model is jointly supervised using both clean and adversarial samples, thereby improving its robustness against noise and adversarial perturbations.
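The three attacks and the 1:1 batch construction can be sketched on a toy model. The following example is a simplified illustration, not the original implementation: it replaces the segmentation network with a hypothetical logistic model whose input gradient is available in closed form, uses the budget ϵ = 0.03, step size α = 0.001, and 10 iterations stated above, and assumes a momentum decay of 1.0 for MI-FGSM and no random start for PGD, neither of which is specified in the text.

```python
import math
import random

W = [0.8, -0.5, 0.3, 0.6]  # hypothetical toy weights standing in for the network
B = -0.2


def prob(x):
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))


def loss(x, y):
    """Binary cross-entropy of the toy model."""
    p = prob(x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def loss_grad(x, y):
    """Closed-form gradient of the loss with respect to the input x."""
    return [(prob(x) - y) * w for w in W]


def sign(v):
    return (v > 0) - (v < 0)


def project(x_adv, x, eps):
    """Clip x_adv into the L-infinity ball of radius eps around x and into [0, 1]."""
    return [min(max(min(max(a, c - eps), c + eps), 0.0), 1.0)
            for a, c in zip(x_adv, x)]


def fgsm(x, y, eps=0.03):
    g = loss_grad(x, y)
    return project([xi + eps * sign(gi) for xi, gi in zip(x, g)], x, eps)


def pgd(x, y, eps=0.03, alpha=0.001, steps=10):
    x_adv = list(x)  # assumption: no random start
    for _ in range(steps):
        g = loss_grad(x_adv, y)
        x_adv = project([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv


def mi_fgsm(x, y, eps=0.03, alpha=0.001, steps=10, mu=1.0):
    x_adv, m = list(x), [0.0] * len(x)
    for _ in range(steps):
        g = loss_grad(x_adv, y)
        l1 = sum(abs(gi) for gi in g) or 1.0
        m = [mu * mi + gi / l1 for mi, gi in zip(m, g)]  # accumulated momentum
        x_adv = project([a + alpha * sign(mi) for a, mi in zip(x_adv, m)], x, eps)
    return x_adv


ATTACKS = [fgsm, pgd, mi_fgsm]


def make_batch(images, labels, rng):
    """Draw one attack per batch and pair clean and adversarial samples 1:1."""
    attack = rng.choice(ATTACKS)
    adversarial = [attack(x, y) for x, y in zip(images, labels)]
    return images + adversarial, labels + labels


x = [0.5, 0.5, 0.5, 0.5]
print(loss(x, 1), loss(fgsm(x, 1), 1))  # the adversarial loss is larger
```

Note that with α = 0.001 and 10 iterations, the iterative attacks in this sketch move each input value by at most 0.01, so the projection onto the ϵ-ball is never active for them.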

In addition, the proposed architecture integrates an ODE-based feature evolution module within the encoder. The continuous dynamics are numerically solved using a fixed-step fourth-order Runge–Kutta (RK4) solver over the interval t ∈ [0, 1], with a single-step discretization. This results in a constant number of function evaluations (NFE = 4) for each forward pass.
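To make the NFE count concrete, the sketch below performs one fixed-step RK4 update over [0, 1] and counts the calls to the dynamics function. The linear decay dynamics `decay` is a hypothetical stand-in for the learned NODE function, which in the actual model is a neural network.

```python
def rk4_step(f, z, t0=0.0, t1=1.0):
    """One classical fourth-order Runge-Kutta step from t0 to t1."""
    h = t1 - t0
    k1 = f(t0, z)
    k2 = f(t0 + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k1)])
    k3 = f(t0 + h / 2, [zi + h / 2 * ki for zi, ki in zip(z, k2)])
    k4 = f(t1, [zi + h * ki for zi, ki in zip(z, k3)])
    return [zi + h / 6 * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]


nfe = 0  # number of function evaluations


def decay(t, z):
    """Hypothetical dynamics dz/dt = -z, used only for illustration."""
    global nfe
    nfe += 1
    return [-zi for zi in z]


z1 = rk4_step(decay, [1.0, 2.0])
print(z1, nfe)  # nfe is 4; z1 is approximately [0.375, 0.75]
```

For dz/dt = -z, a single RK4 step of size 1 multiplies the state by 1 - 1 + 1/2 - 1/6 + 1/24 = 0.375, which approximates e^{-1} ≈ 0.368 using exactly four evaluations of the dynamics.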

For optimization, the Adam optimizer is adopted with an initial learning rate of 6 × 10^{-5}, which is gradually adjusted during training. The model is trained for up to 1000 epochs, and an early stopping strategy based on validation loss is applied to prevent overfitting once convergence is detected. Mini-batch training is used throughout the optimization process. All experiments are conducted on an NVIDIA GeForce TITAN RTX GPU. The implementation was developed using Python 3.10 and PyTorch 2.1.0, accelerated by CUDA 12.1 and cuDNN 8.9.
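The early stopping rule can be sketched as follows. The patience value is a hypothetical choice, since the text does not state the exact stopping criterion, and the list of validation losses stands in for the values produced by an actual training loop.

```python
def early_stopping(val_losses, patience=10, max_epochs=1000):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    stop_epoch = -1
    wait = 0
    for epoch, val_loss in enumerate(val_losses[:max_epochs]):
        stop_epoch = epoch
        if val_loss < best_loss:
            best_loss, best_epoch, wait = val_loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                break
    return best_epoch, stop_epoch


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
print(early_stopping(losses, patience=3))  # (2, 5)
```

In this example the best validation loss occurs at epoch 2, and training stops at epoch 5 after three consecutive epochs without improvement.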

4.4. Comparisons

To provide a consistent experimental setting, we adopt a unified segmentation network composed of a ResNet50 backbone and a shared decoder. All NODE-based improvements are incorporated into this framework by modifying the corresponding components, while keeping the overall architecture consistent.

NODE Conv: Convolutional layers are reformulated as NODE Conv operations for temporal feature modeling.
NODE ASPP: ODE solvers are introduced into the ASPP module.
NODE SE: Channel attention weights are dynamically modeled using NODE.

To ensure a fair comparison of robustness across different architectures, the same training strategy and hyperparameters are applied to several representative segmentation networks. Model performance is evaluated on a unified test set using multiple metrics, with the results reported in Table 1, comparing different networks under clean and perturbed inputs with varying attack strengths.

To quantitatively evaluate the robustness gains of our methodology, this section employs traditional discrete architectural modules as comparative baselines. Specifically, the standard Squeeze-and-Excitation block (SE) and the standard Atrous Spatial Pyramid Pooling module (ASPP) serve as the benchmark configurations against their respective NODE-enhanced counterparts. This experimental setup is designed to directly measure the performance improvement achieved by substituting discrete feature recalibration and spatial pooling with our proposed continuous-time NODE dynamics when the network is subjected to human-induced adversarial perturbations.

We further include DeepLabv3 as a state-of-the-art segmentation baseline for comparison. By leveraging atrous convolution and the ASPP module, DeepLabv3 is able to capture rich multi-scale contextual information without significantly increasing computational cost. This provides a strong and representative benchmark for evaluating the effectiveness of the proposed approach.

4.4.1. Analysis of Structural Robustness Metrics

To quantitatively evaluate the structural stability of the proposed modules, we measured the empirical local Lipschitz constant and Jacobian spectral norm using random input testing and power iteration methods. As shown in Table 1, the evaluation reveals significant differences in how continuous-time modeling affects various network components.
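The two metrics can be illustrated with a small sketch. It assumes that the empirical local Lipschitz constant is estimated as the largest observed ratio of output change to input change over random perturbations within the budget ϵ, and that the spectral norm is the largest singular value of a Jacobian obtained by power iteration. The norm choices, the number of trials, and the toy linear map are assumptions, as the exact estimation protocol is not detailed in the text.

```python
import math
import random


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def norm2(v):
    return math.sqrt(sum(x * x for x in v))


def spectral_norm(J, iters=100):
    """Largest singular value of J via power iteration on J^T J."""
    Jt = [list(col) for col in zip(*J)]
    v = [1.0] * len(J[0])
    for _ in range(iters):
        w = matvec(Jt, matvec(J, v))
        n = norm2(w)
        if n == 0.0:
            return 0.0
        v = [x / n for x in w]
    return norm2(matvec(J, v))


def empirical_lipschitz(f, x, eps=0.03, trials=200, seed=0):
    """Max observed ||f(x + d) - f(x)|| / ||d|| over random d with |d_i| <= eps."""
    rng = random.Random(seed)
    fx = f(x)
    best = 0.0
    for _ in range(trials):
        delta = [rng.uniform(-eps, eps) for _ in x]
        d = norm2(delta)
        if d == 0.0:
            continue
        fy = f([a + b for a, b in zip(x, delta)])
        best = max(best, norm2([p - q for p, q in zip(fy, fx)]) / d)
    return best


J = [[3.0, 0.0], [0.0, 1.0]]  # hypothetical Jacobian of a toy linear module
sigma = spectral_norm(J)
lip = empirical_lipschitz(lambda x: matvec(J, x), [0.2, 0.4])
print(sigma, lip)  # sigma is close to 3; lip does not exceed sigma
```

For a linear map the empirical estimate is bounded above by the spectral norm, because random sampling can only find directions that are amplified at most as much as the top singular direction. For a nonlinear module the two quantities are measured at specific inputs and need not coincide.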

The results demonstrate that the NODE architecture effectively suppresses perturbation amplification in complex feature extraction modules. Notably, the discrete ASPP module is highly sensitive to perturbations and exhibits a Lipschitz constant of 141.49. By replacing it with NODE ASPP, this metric drops drastically to 6.86, alongside a similar reduction in the spectral norm from 28.79 to 4.30. Furthermore, the NODE SE module reduces the Lipschitz constant from 123.67 for the standard SE module to 91.52. These substantial reductions indicate that modeling multi-scale spatial pooling and channel recalibration as continuous dynamic processes fundamentally stabilizes the feature mappings.

Conversely, compared with the relatively shallow and smooth discrete ResNet50 baseline, whose Lipschitz constant is 7.53, the NODE Conv module produces a higher value of 85.04. This increase mainly stems from the additional implicit computational depth introduced by the ordinary differential equation solver, which simultaneously enhances the non-linear representational capacity of the feature extraction process. Although the higher Lipschitz-related metric suggests increased dynamic complexity, subsequent experimental results demonstrate that the NODE Conv module can still maintain favorable robustness under adversarial perturbations when combined with the proposed synergistic adversarial training strategy.

4.4.2. Performance Under Human-Induced Noise

Table 2 presents a performance comparison of various networks under a clean environment without adversarial training to establish a baseline for their segmentation abilities. Notably, the results demonstrate that the integration of the NODE architecture does not compromise the network’s inherent segmentation capability, maintaining a highly competitive performance compared to standard backbones in noise-free environments.

Table 3 shows that, after adversarial training, the proposed NODE-based models are more robust than their discrete counterparts (ResNet50, ASPP, and SE) under the evaluated white-box attacks. Under FGSM, NODE SE achieves better robustness than the standard SE module, improving the mIoU from 0.8968 to 0.9232 while reducing the performance drop from 5.51 to 3.43, although DeepLabv3 remains the strongest model under this single-step attack (0.9400). Iterative attacks reveal clearer advantages: NODE Conv shows remarkable resilience against MI-FGSM with a minimal drop of 2.27, contrasting sharply with the 4.43 degradation observed in DeepLabv3. Finally, against the PGD attack, NODE ASPP proves to be the most stable configuration, yielding the lowest performance degradation (2.35) among all tested methods.

In summary, the experimental results verify that re-formulating discrete feature mappings as continuous-time dynamic systems via NODEs effectively suppresses gradient-based perturbations. By smoothing feature trajectories and enhancing Lipschitz continuity, our methodology ensures more stable and reliable mangrove segmentation in the presence of adversarial noise.

Beyond the fixed-strength evaluation in Table 3, a sensitivity analysis was conducted by progressively increasing the perturbation intensity, as shown in Figure 6. Since different attack methods exhibit different degradation behaviors, the perturbation ranges were selected accordingly. Specifically, FGSM was evaluated with ϵ ∈ [0, 0.1], whereas MI-FGSM and PGD were evaluated with ϵ ∈ [0, 0.3].

Figure 6 presents the robustness comparison under progressively stronger adversarial perturbations. As the perturbation magnitude ϵ increases, the performance of all models gradually degrades due to the increasing distortion in the input feature distribution. However, the degradation patterns differ significantly across architectures.

Overall, the NODE-enhanced variants exhibit improved robustness and slower performance degradation compared with their discrete counterparts. For example, under FGSM with ϵ = 0.10, NODE Conv achieves an accuracy of 0.5630, significantly outperforming ResNet50 (0.2898) and ASPP (0.3684). In particular, the NODE Conv model consistently maintains higher stability under MI-FGSM and PGD attacks, suggesting that continuous-time feature evolution can effectively mitigate the propagation of adversarial perturbations.

Interestingly, DeepLabv3 also shows relatively strong robustness under FGSM attacks across different perturbation intensities. This behavior is likely related to the relatively limited optimization strength of single-step attacks, under which the overall segmentation structure of DeepLabv3 can still be partially preserved. However, under stronger iterative attacks such as MI-FGSM and PGD, the robustness advantage of the proposed NODE-based architectures becomes significantly more evident, indicating their superior stability against progressively accumulated perturbations.

The SE module also demonstrates strong robustness characteristics. Across the three attack settings, SE maintains relatively stable performance compared with most baseline architectures, indicating that channel-wise feature recalibration helps suppress noise-sensitive responses.

Under the stronger PGD attack, the advantage of the NODE-based design becomes more pronounced at higher perturbation levels. When ϵ = 0.30, NODE ASPP achieves the best performance (0.6536), outperforming both the original ASPP module and other baseline models.

These results indicate that modeling feature transformations as continuous dynamical systems improves the stability of learned representations under adversarial perturbations, while the SE mechanism further contributes to robust feature recalibration.

4.4.3. Performance Under Natural Noise

The proposed Synergistic Adversarial Training (SAT) strategy aims to defend against adversarial perturbations while ensuring that the learned robustness generalizes to natural noise. To evaluate this generalization, two common noise models are introduced: Gaussian noise, which simulates environmental disturbances such as sensor noise and illumination variation, and salt-and-pepper noise, which represents sensor degradation and random pixel corruption.
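The two noise models can be sketched as follows. The example assumes pixel intensities normalized to [0, 1], zero-mean Gaussian noise with standard deviation `sigma`, and salt-and-pepper noise that corrupts a fraction `amount` of pixels, split evenly between black and white. These parameter names and the example intensities are illustrative, since the noise levels used in the experiments are reported only through Figure 7.

```python
import random


def add_gaussian_noise(pixels, sigma, rng):
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    return [min(max(p + rng.gauss(0.0, sigma), 0.0), 1.0) for p in pixels]


def add_salt_and_pepper(pixels, amount, rng):
    """Set a fraction `amount` of pixels to 0 (pepper) or 1 (salt)."""
    noisy = []
    for p in pixels:
        r = rng.random()
        if r < amount / 2:
            noisy.append(0.0)  # pepper
        elif r < amount:
            noisy.append(1.0)  # salt
        else:
            noisy.append(p)    # pixel left unchanged
    return noisy


rng = random.Random(0)
image = [0.2, 0.4, 0.6, 0.8] * 4  # hypothetical flattened image patch
gaussian = add_gaussian_noise(image, sigma=0.1, rng=rng)
salt_pepper = add_salt_and_pepper(image, amount=0.2, rng=rng)
print(gaussian[:4], salt_pepper[:4])
```

Increasing `sigma` or `amount` corresponds to moving along the horizontal axis of a noise-intensity sweep such as the one used to compare models with and without SAT.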

The results are shown in Figure 7. Without SAT, the segmentation performance degrades rapidly as the noise intensity increases. In contrast, models trained with SAT maintain significantly higher accuracy under both Gaussian noise and sensor degradation (salt-and-pepper) noise. Furthermore, the experimental results show that NODE-based architectures achieve more substantial robustness improvements under SAT compared with traditional discrete networks. Although adversarial training enhances the performance of all evaluated models, the NODE-enhanced variants consistently maintain higher segmentation accuracy and exhibit slower performance degradation as the corruption intensity increases.

Among the compared methods, NODE ASPP and NODE SE demonstrate particularly strong robustness under both Gaussian noise and sensor degradation noise. Even under severe corruption conditions, the NODE-based models are able to preserve more stable segmentation performance and clearer structural predictions than their conventional counterparts. These results indicate that the proposed SAT framework is especially effective when combined with NODE-based feature modeling, leading to better robustness generalization from adversarial perturbations to naturally occurring environmental disturbances.

Overall, the experimental findings demonstrate that integrating SAT with NODE architectures not only improves adversarial robustness, but also significantly enhances the stability and reliability of mangrove segmentation under complex real-world noise conditions.

4.4.4. Ablation Study on SE Depth and NODE Architecture

To verify that the performance superiority of NODE SE is driven by the continuous-time NODE mechanism rather than simple depth accumulation, we further introduce DeepSE as an ablation baseline by scaling the conventional SE block to comparable or even greater depths. Specifically, the original SE module is expanded to two and four stacked layers, denoted as DeepSE (Layer2) and DeepSE (Layer4), respectively. Among the evaluated adversarial attacks, we select PGD as the representative benchmark due to its strong iterative perturbation capability and wide adoption in robustness evaluation. The complexity comparison of the four models is reported in Table 4.

After adversarial training, we compare the performance of SE (Layer1), DeepSE (Layer2), DeepSE (Layer4), and NODE SE under both clean and adversarial conditions. The experimental results can be seen in Figure 8.

As shown in Figure 8, increasing the depth of the conventional SE module does improve the mIoU performance under attack-free conditions. However, under PGD attack, NODE SE still achieves a 1.79% higher mIoU than the deeper DeepSE (Layer4) configuration. This result clearly indicates that the robustness enhancement of NODE SE is not merely attributed to increased network depth or parameter accumulation, but rather originates from the continuous dynamic feature evolution introduced by the NODE mechanism.

4.4.5. Inference Speed Comparison

Figure 9 reports the inference speed comparison of different network architectures measured in frames per second (FPS). As shown in the results, the introduction of NODE-based modules leads to a moderate reduction in inference speed due to the additional computation introduced by the ODE solver. For example, NODE Conv achieves 120.61 FPS compared with 138.12 FPS for the standard ResNet50 backbone.

Despite this additional computational cost, the NODE-enhanced models still maintain competitive inference efficiency. In particular, NODE SE achieves 124.8 FPS, which remains close to the original SE module (133.73 FPS). Similarly, NODE ASPP runs at 95.69 FPS compared with 101.16 FPS for ASPP. These results indicate that the proposed continuous-time modeling introduces only a limited computational overhead while preserving high inference efficiency.
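The overhead implied by the reported FPS values can be quantified with a short calculation. The sketch below also includes a generic timing helper; `measure_fps` and its warm-up and run counts are hypothetical and do not reproduce the benchmarking protocol behind Figure 9.

```python
import time


def measure_fps(run_once, n_warmup=5, n_runs=50):
    """Time repeated calls of `run_once` and return calls per second."""
    for _ in range(n_warmup):
        run_once()  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    elapsed = max(time.perf_counter() - start, 1e-12)
    return n_runs / elapsed


def relative_slowdown(fps_base, fps_new):
    """Fraction of throughput lost relative to the baseline."""
    return (fps_base - fps_new) / fps_base


# (baseline FPS, NODE FPS) as reported in the text.
reported = {
    "NODE Conv": (138.12, 120.61),
    "NODE SE": (133.73, 124.8),
    "NODE ASPP": (101.16, 95.69),
}
for name, (base, node) in reported.items():
    print(f"{name}: {100 * relative_slowdown(base, node):.1f}% slower")
# NODE Conv: 12.7% slower
# NODE SE: 6.7% slower
# NODE ASPP: 5.4% slower
```

When timing a model on a GPU, the device should be synchronized before reading the clock (for example with `torch.cuda.synchronize()` in PyTorch), because kernel launches are asynchronous.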

Compared with the full DeepLabv3 architecture, which runs at 59.31 FPS, all lightweight module-level variants achieve substantially higher inference speed. This demonstrates that the proposed NODE-based modifications provide improved robustness while maintaining practical computational efficiency for real-world segmentation tasks.

4.5. Visualization Results

In this section, we compared and visualized the segmentation results obtained from clean and noise-corrupted inputs. As shown in Figure 10, our proposed network produces highly consistent segmentation maps under both conditions. Even when the input images are disturbed by noise, the predicted boundaries and region structures remain nearly identical to those obtained from clean images, showing only minimal deviations in fine details. This consistency demonstrates that our model is capable of resisting noise interference and maintaining stable segmentation performance. Such robustness is particularly valuable for mangrove remote sensing applications, where images are often affected by atmospheric noise, illumination changes, and sensor instability.

5. Limitations and Future Work

Although the proposed NODE-based framework demonstrates strong robustness against both adversarial perturbations and natural noise, several limitations still remain. First, due to the introduction of ODE solvers, the proposed framework inevitably incurs additional computational overhead compared with conventional discrete architectures, which may limit its applicability in resource-constrained or real-time remote sensing scenarios. Second, the current experimental setting adopts a commonly used random train/validation/test split strategy and mainly evaluates robustness under representative perturbations, including adversarial attacks and typical noise corruption. More challenging cross-region, cross-sensor, and remote sensing-specific corruption settings may further reveal the generalization capability of the proposed framework under domain shifts. In addition, the current study mainly focuses on spatial feature robustness and does not explicitly model temporal ecological dynamics.

Future work will therefore investigate lightweight NODE architectures for more efficient deployment, robustness evaluation under broader real-world remote sensing corruptions and cross-domain settings, as well as the integration of temporal modeling and multi-modal remote sensing information to better capture long-term ecological variations and further improve robustness in practical monitoring applications.

6. Conclusions

Benefiting from the dynamic modeling capability of NODE, NODE-based modules demonstrate strong robustness against a wide range of noise perturbations. In addition, the proposed SAT strategy further enhances the adversarial robustness and stability of the network under natural perturbations. Our results show that NODE-embedded neural segmentation networks can effectively resist both natural and potentially human-induced noise in mangrove remote sensing image segmentation tasks. Furthermore, we propose a novel approach to dynamically model and generate the channel attention weights of traditional SE-Blocks using NODE. The inference speed comparison further confirms the efficiency of the individual NODE-based modules. Taken together, these findings suggest that NODE-based neural architectures combined with SAT possess strong potential for future applications in mangrove remote sensing monitoring and ecological maintenance.

Author Contributions

The contributions of the authors are distributed as follows: H.Y. and Z.C. developed the initial conceptualization of the study. H.Y. was responsible for the methodology, software development, formal analysis, data curation, visualization, and the preparation of the original manuscript draft. X.P., T.W. and Y.C. performed the experimental validation and result verification. Y.L. assisted in the investigation and data collection. X.C. and J.H. provided essential resources and hardware support. Z.C. and H.Y. handled the manuscript’s critical review and editing. Z.C. and J.H. secured the funding for this project. Z.C. provided overall supervision and project administration. All authors have read and approved the final version of the manuscript.

Funding

This project was supported by the Open Fund Project of the Key Laboratory of Tropical Forestry Resources Monitoring and Application of Hainan Province under Grant SZDSYS2024-002.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors wish to convey their sincere appreciation to all colleagues and collaborators for their insightful discussions and constructive suggestions throughout the course of this study. Their valuable input and feedback have significantly contributed to enhancing the rigor and completeness of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Framework of the proposed Network with NODE-based architecture. In the network, each submodule treats time as a continuous dimension to perform dynamic modeling, thereby capturing richer representations of the underlying data distribution. By leveraging this continuous-depth formulation, the network is able to enhance its feature extraction capability and improve the robustness of image representation.

Figure 2. Comparison of CAM heatmaps of remote sensing segmentation networks before and after introducing the SAT strategy under natural noise interference. Subfigure (a) illustrates that, without SAT, noise distorts the extraction of ground features; subfigure (b) illustrates that, with SAT, the model remains focused on the structural features of the mangrove canopy despite Gaussian and salt-and-pepper noise.

Figure 3. Schematic of NODE-Based Convolutional Dynamics. The convolution operations within the encoder of the residual network are modeled as a continuous-time dynamical system.

Figure 4. NODE-ASPP-Block diagram. Each dilated convolution branch models the action of the convolution kernel as a NODE function.

Figure 5. NODE-SE-Block diagram. The evolution of each channel’s contribution is formulated as a continuous-time system.

Figure 6. Performance of the compared networks under various adversarial attacks. Subfigure (a) illustrates the performance under FGSM attacks with different noise intensities, subfigure (b) illustrates the performance under MI-FGSM attacks with different noise intensities, and subfigure (c) illustrates the performance under PGD attacks with different noise intensities.

Figure 7. Performance comparison before and after SAT under natural noise in terms of mIoU. (a) Comparison of non-NODE-based networks; (b) Comparison of NODE-based networks.

Figure 8. Performance Comparison Between NODE SE and Deepened SE Baselines Under PGD Attack.

Figure 9. Comparison of inference latency between different models in terms of FPS.

Figure 10. Visualization of the results of NODE-based segmentation networks under FGSM attack. Subfigure (a) shows the input images and corresponding ground-truth masks, subfigure (b) compares the segmentation performance of NODE Conv and ResNet, subfigure (c) compares NODE ASPP and the conventional ASPP module, and subfigure (d) compares NODE SE and the conventional SE module.

Table 1. Theoretical robustness metrics of different modules estimated via random input testing and power iteration.

Metric	ResNet50	ASPP	SE	NODE Conv	NODE ASPP	NODE SE
Lipschitz Constant	7.5296	141.4945	123.6679	85.0418	6.8623	91.5244
Spectral Norm	3.7839	28.7885	35.7250	19.2601	4.3029	22.5218

Lipschitz Constant represents the empirical local Lipschitz constant evaluated under the perturbation budget ϵ = 0.03. Better performance results are in bold.
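
To make the two metrics in Table 1 concrete, the following is a minimal sketch of how a spectral norm can be estimated by power iteration and how an empirical local Lipschitz constant can be estimated by random input testing. The choice of the L2 norm for the ratio, uniform sampling inside an L∞ ball of radius ϵ = 0.03, the sample count, and the toy `tanh` module are all assumptions made for illustration; the paper's exact procedure may differ.

```python
import numpy as np

def spectral_norm_power_iteration(W, num_iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        u = W @ v            # apply W
        v = W.T @ u          # apply W^T
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

def empirical_local_lipschitz(f, x, eps=0.03, num_samples=256, seed=0):
    """Largest observed ||f(x + d) - f(x)|| / ||d|| over random perturbations d
    drawn uniformly with |d_i| <= eps (a lower bound on the true local constant)."""
    rng = np.random.default_rng(seed)
    fx = f(x)
    best = 0.0
    for _ in range(num_samples):
        d = rng.uniform(-eps, eps, size=x.shape)
        ratio = np.linalg.norm(f(x + d) - fx) / np.linalg.norm(d)
        best = max(best, ratio)
    return best

# Toy module (assumption): a linear map followed by tanh.
rng = np.random.default_rng(1)
W = rng.standard_normal((16, 8))
toy_module = lambda z: np.tanh(W @ z)
x0 = rng.standard_normal(8)

sigma = spectral_norm_power_iteration(W)
lip = empirical_local_lipschitz(toy_module, x0)
print(f"spectral norm ~ {sigma:.4f}, empirical local Lipschitz ~ {lip:.4f}")
```

Because `tanh` is 1-Lipschitz, the empirical estimate for this toy module cannot exceed the spectral norm of `W`, which gives a quick sanity check on both routines.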

Table 2. Comparison of Segmentation Ability of Networks under Clean Environment.

NODE-Based Network	mIoU	Non-NODE-Based Network	mIoU
NODE Conv	0.9833	ResNet50	0.9811
NODE ASPP	0.9844	ASPP	0.9808
NODE SE	0.9825	SE	0.9836
		Deeplabv3	0.9932

Better performance results are in bold.

Table 3. Performance under human-induced noise after adversarial training in terms of mIoU.

Type	ResNet50	ASPP	SE	Deeplabv3	NODE Conv	NODE ASPP	NODE SE
FGSM	0.9005 (−4.94)	0.8969 (−5.45)	0.8968 (−5.51)	0.9400 (−2.63)	0.9114 (−4.26)	0.9134 (−4.59)	0.9232 (−3.43)
MI-FGSM	0.8905 (−2.98)	0.8900 (−2.73)	0.8927 (−2.72)	0.8936 (−4.43)	0.9095 (−2.27)	0.8901 (−2.84)	0.8946 (−2.82)
PGD	0.8800 (−3.46)	0.8815 (−2.99)	0.8487 (−4.39)	0.8824 (−3.95)	0.8811 (−3.32)	0.9093 (−2.35)	0.8886 (−3.03)

The numbers in parentheses indicate the performance drop of the corresponding metric relative to the interference-free setting; a smaller drop indicates stronger robustness. Better performance results are in bold.

Table 4. Comparison of Model Complexity Between NODE SE and Deepened SE Baselines.

Model Variant	Params (M) ↓
SE (Layer1)	24.131
DeepSE (Layer2)	24.148
DeepSE (Layer4)	24.164
NODE SE (Ours)	24.135
