Article

Physics-Structured Residual Learning for Ship Maneuvering Prediction: Multi-Source Disturbance Decomposition and Compensation

1
School of Naval Architecture and Ocean Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2
Key Laboratory of Ship and Ocean Hydrodynamics of Hubei Province, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2026, 14(9), 808; https://doi.org/10.3390/jmse14090808
Submission received: 31 March 2026 / Revised: 24 April 2026 / Accepted: 25 April 2026 / Published: 28 April 2026
(This article belongs to the Special Issue Artificial Intelligence and Its Application in Ocean Engineering)

Abstract

Ship maneuvering models based on MMG or Abkowitz formulations often suffer from systematic mismatches under real operating conditions, where shallow water, hull fouling, rudder degradation, and wind loads may coexist. This study proposes a physics-structured residual learning framework for multi-source disturbance decomposition and compensation. Disturbance-specific expert networks are introduced to map different disturbance sources into separate residual channels. A CNN-SE-BiLSTM encoder is further designed to estimate the slowly varying latent disturbance states from residual sequences, whereas wind is treated through an external pathway owing to its directly measurable and higher-frequency nature. Simulations on the KVLCC2 benchmark vessel under single-source, triple-source, and wind-inclusive disturbance scenarios demonstrate stable long-horizon closed-loop autoregressive prediction, with position-RMSE reductions of 74.7–91.7% relative to the corresponding nominal-MMG and wind-ablation baselines. These results indicate that the proposed physics-structured residual learning framework improves long-horizon prediction accuracy while retaining interpretable and modular disturbance-specific correction channels under complex operating conditions.

1. Introduction

The parameters of ship maneuvering models established based on the MMG or Abkowitz formulations are calibrated under ideal conditions using captive model tests or CFD simulations [1,2,3,4]. However, in actual operations, ships frequently encounter environments that deviate from these calibration conditions [5]. When entering shallow waters, bottom confinement can substantially alter the added mass and hydrodynamic derivatives [6,7,8]. Biofouling on the hull accumulates over time during service, continuously increasing frictional resistance and modifying the wake field [9,10]. Meanwhile, rudder surface degradation and cavitation-related roughness may deteriorate hydrodynamic performance and weaken rudder effectiveness [11,12]. These disturbances may induce systematic drift in model parameters, which can accumulate progressively in closed-loop control and ultimately manifest as trajectory deviations, thereby posing a serious threat to the safety of autonomous navigation systems.
To handle disturbances arising from deviations from the calibrated operating conditions, traditional physics-based methods offer a variety of relatively mature approaches in both research and engineering practice. The most common category is offline recalibration and scenario-specific correction of mechanistic models based on experiments, CFD, or empirical formulas, including MMG-based empirical approaches for estimating hydrodynamic parameters directly from basic hull-form parameters [13]. For example, in shallow-water maneuvering studies, correction factors associated with the water-depth-to-draft ratio are often introduced to modify the added mass, maneuvering derivatives, wake distribution, and rudder effectiveness parameters, thereby extending models established under deep-water conditions to shallow-water operations [8,14,15]. For external environmental effects such as wind, waves, and currents, additional environmental force and moment terms are usually incorporated into the equations of motion [16,17]. Problems such as hull fouling, propulsion performance degradation, or rudder deterioration tend to be described only indirectly, as increases in resistance, losses in propulsive efficiency, or degradations in rudder force characteristics [9,10,11,12,18,19].
Within the traditional mechanistic framework, numerous online identification and adaptive estimation methods have also been developed to update model parameters or compensate for equivalent mismatches during operation. Representative approaches include recursive least squares, the extended Kalman filter, and the unscented Kalman filter, which use navigation measurements to perform joint state–parameter estimation and thereby alleviate the mismatch caused by fixed model parameters under varying operating conditions [20,21]. At the control level, disturbance observers, extended state observers, unknown input observers, as well as robust, adaptive, and sliding-mode control methods are also widely adopted to estimate or compensate for hard-to-model disturbances in real time within the closed-loop system [22,23].
However, the effectiveness of these methods generally relies on strong prior assumptions and remains limited in complex real-world navigation scenarios [24]. On the one hand, offline recalibration, scenario-specific correction, and multi-condition parameter scheduling usually require the disturbance type to be known in advance. Their effectiveness decreases when multiple disturbances coexist and evolve continuously over time [15]. On the other hand, although online identification, observer-based, and compensation-based methods can suppress the negative effects on closed-loop control, they focus on tracking and compensating for the overall deviation and make little distinction between the physical sources of different disturbances [20,21,22,24]. Therefore, under multi-source mismatch conditions, they often struggle to achieve deviation compensation and source attribution simultaneously, which limits model interpretability and diagnosability in complex scenarios.
Environmental uncertainty has a significant impact on ship maneuvering dynamics. Data-driven nonparametric modeling methods offer clear advantages over the traditional mechanistic framework, as they do not require an explicit predefined mathematical form and are therefore better suited to handling complex disturbances [25,26]. Existing studies can be broadly categorized into three groups. The first includes static or shallow nonlinear regression models, such as ANNs and SVR, which predict future motion responses from current or limited historical states and control inputs [27,28]. Recent studies have also explored singular-value-decomposition-based approaches to identify hydrodynamic variations from limited maneuvering information [29]. The second group consists of temporal deep learning models, such as LSTM, GRU, and TCN, which can explicitly capture temporal dependence and memory effects and have therefore been widely applied to time-series-based maneuvering modeling and prediction [25,30]. The third group comprises physics-guided or hybrid approaches, including PINNs, Neural ODEs, and residual-learning frameworks, which incorporate dynamical priors, physical constraints, or structured residual representations into data-driven models to improve interpretability [31,32,33].
Nevertheless, the modeling targets and validation settings in existing studies are still largely limited to single disturbances or relatively idealized conditions, such as shallow-water effects, isolated environmental loads, or specific maneuvering tasks [34,35]. Although some recent studies have begun to consider multiple environmental factors or more complex maneuvering scenarios, most still rely on a single unified model to learn the aggregated effect of multiple disturbances, without explicitly decomposing and separately modeling different disturbance sources [31,35]. As a result, despite improvements in overall prediction accuracy, these methods still have limited ability to reveal differences in disturbance pathways and residual structures, which constrains both model interpretability and targeted compensation.
In addition, existing studies on ship motion prediction are still evaluated mainly through one-step forecasting or conditional prediction based on historical observation windows [25,30,36]. Although a few studies have considered recursive or autoregressive prediction, such efforts remain limited, and the prediction horizons are usually short. For practical applications such as collision avoidance, risk assessment, and receding-horizon optimization, the more critical requirement is not only short-term accuracy, but also accurate and stable autoregressive trajectory prediction over extended horizons [25,30,36].
To address the complex mismatch caused by multi-source disturbances, it is necessary to develop a structured and modular framework in which disturbances from different sources are decomposed into modules that can be inferred and compensated independently, with each module focusing on a specific disturbance type. In the maritime domain, some existing approaches have attempted to improve prediction accuracy through multiple expert subnetworks combined with adaptive fusion. Although these methods generally outperform a single end-to-end network in overall prediction performance, they still essentially absorb multiple disturbance sources into a unified mapping and therefore do not provide explicit physical attribution of individual disturbances [36].
In the broader machine learning literature, structured composition has been achieved through mixture-of-experts models and neural modular networks, in which different modules are assigned explicit subfunctions [37,38]. In engineering fault diagnosis, dedicated modules for different fault sources have also been explored in practice, showing that structured decomposition can improve identifiability, attribution capability, and diagnostic accuracy [39,40]. In hybrid physics–data modeling, integrated frameworks that use a mechanistic model as the backbone and introduce disturbance-specific residual modules to enable attribution have been investigated in other domains [41].
This suggests that introducing a physics-structured modular paradigm into ship maneuvering modeling can improve long-horizon prediction while retaining source-wise interpretability. In this study, the proposed framework focuses on one central objective: decomposing systematic residual mismatch into disturbance-specific correction channels and integrating them with the MMG dynamics for stable closed-loop prediction. The main contributions are summarized as follows:
  • A physics-structured residual learning framework is proposed for ship maneuvering prediction under multi-source disturbances. The framework uses the MMG model as the physical backbone and introduces disturbance-specific MLP experts for shallow-water effects, hull fouling, rudder degradation, and wind loads, so that different mismatch sources are compensated through separate residual correction channels rather than a single holistic residual mapping.
  • A CNN-SE-BiLSTM encoder is developed to infer the latent intensities of the endogenous slow-varying disturbances, namely shallow-water effects, hull fouling, and rudder degradation, from residual-state sequences. Wind is treated through a separate measured-input pathway because wind speed and direction are directly observable and vary more rapidly than the endogenous parameter-modification disturbances.
  • The proposed framework is evaluated through long-horizon closed-loop autoregressive prediction under single-source, triple-source, and wind-inclusive disturbance scenarios. The results demonstrate improved prediction accuracy and source-wise interpretability.

2. Materials and Methods

Figure 1 illustrates the overall workflow of the proposed physics-structured residual learning framework, including the MMG physical backbone, the CNN-SE-BiLSTM encoder, the disturbance-specific MLP experts, and the autoregressive closed-loop feedback path.

2.1. MMG Three-Degree-of-Freedom Model

This framework is established on the basis of the MMG method and develops a rigid-body dynamic model for a surface vessel in three degrees of freedom: surge (u), sway (vₘ), and yaw (r):
$$
\begin{aligned}
(m + m_x)\,\dot{u} - (m + m_y)\,v_m r - m x_G r^2 &= X_H + X_P + X_R \\
(m + m_y)\,\dot{v}_m + (m + m_x)\,u r + m x_G \dot{r} &= Y_H + Y_R \\
(I_{zz} + J_{zz})\,\dot{r} + m x_G (\dot{v}_m + u r) &= N_H + N_R
\end{aligned}
\tag{1}
$$
The coordinate system and motion variables used in this model are defined in Figure 2. The specific coefficients and formulas can be referred to in Yasukawa and Yoshimura [2], where they are defined for the KVLCC2 hull form, with parameters taken from the SIMMAN 2014 benchmark dataset.
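As a minimal sketch of how the three equations of motion can be turned into accelerations for time integration, the function below solves them for $(\dot{u}, \dot{v}_m, \dot{r})$. The function name and argument layout are illustrative assumptions, and the right-hand-side totals `X`, `Y`, `N` are taken as caller-supplied inputs rather than computed from the hull, propeller, and rudder sub-models.

```python
def mmg_accelerations(u, v_m, r, X, Y, N, m, m_x, m_y, x_G, I_zz, J_zz):
    """Solve the 3-DOF equations of motion for (u_dot, v_m_dot, r_dot).

    X, Y, N are the summed right-hand sides (X_H + X_P + X_R, Y_H + Y_R,
    N_H + N_R), supplied by the caller; the force sub-models are not
    reproduced in this sketch.
    """
    # Surge does not involve the other two accelerations.
    u_dot = (X + (m + m_y) * v_m * r + m * x_G * r**2) / (m + m_x)

    # Sway and yaw are coupled through m * x_G, giving a 2x2 linear system:
    #   a11 * v_dot + a12 * r_dot = b1
    #   a21 * v_dot + a22 * r_dot = b2
    a11, a12 = m + m_y, m * x_G
    a21, a22 = m * x_G, I_zz + J_zz
    b1 = Y - (m + m_x) * u * r
    b2 = N - m * x_G * u * r
    det = a11 * a22 - a12 * a21
    v_dot = (b1 * a22 - a12 * b2) / det
    r_dot = (a11 * b2 - a21 * b1) / det
    return u_dot, v_dot, r_dot
```

Substituting the returned accelerations back into the three equations should reproduce `X`, `Y`, and `N` up to rounding, which is a convenient sanity check before attaching an integrator.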
The MMG model characterizes the nominal physical laws governing ship maneuvering motion. In most cases, however, it is established under fixed parameter settings, specific experimental conditions, and idealized assumptions. Once the vessel operates under off-nominal conditions, such as shallow water, hull fouling, or rudder degradation, these disturbances may jointly introduce a structural discrepancy between MMG predictions and the actual dynamics.
From this perspective, the disturbances considered in this study are not treated as a substitute for the MMG model. Instead, they are regarded as sources of compensation for the dynamics not captured by the nominal formulation. Residual learning is therefore employed to represent this mismatch. The residual term should thus be interpreted as an additional correction to the MMG baseline under disturbance conditions, rather than as a complete alternative description of the ship dynamics.
Table 1 presents the symbols used in this study and their physical meanings.

2.2. Disturbance Modeling

This study considers four classes of disturbance sources with clear engineering relevance in ship operation: three endogenous slow-varying disturbances, namely shallow-water effects, hull fouling, and rudder degradation, and one external load that can be measured directly by onboard sensors, namely wind load. These factors are parameterized by a normalized intensity $\lambda_k \in [0,1]$, where $k \in \{sw, hf, rd, wind\}$ denotes the disturbance type. For the three endogenous slow-varying disturbances, the latent intensity vector is later written as $\lambda = [\lambda_{sw}, \lambda_{hf}, \lambda_{rd}]$, whereas wind is represented by measured wind-speed and wind-direction inputs. The corresponding parameter modifications are not intended to reproduce all the details of the underlying physical processes with high fidelity. Rather, based on simplified approximate formulas widely adopted in the literature, they are designed to construct physically plausible directions of parameter variation and serve as a structured disturbance generator for training data generation. It is worth noting that full CFD simulations with multi-source coupling are computationally prohibitive and therefore cannot be used to generate large-scale training data with controllable disturbance intensity. The method adopted in this study thus represents a practical engineering compromise.
Its design objective is to cover the space of residual directions under different physical mechanisms and to provide controllable intensity parameterization, rather than to pursue the quantitative accuracy of each correction formula. Detailed correction formulas and parameter lists for each disturbance are provided in Appendix A.

2.2.1. Shallow Water ($\lambda_{sw}$)

Shallow-water effects are modeled by modifying the MMG model parameters to approximately capture the influence of decreasing water depth on maneuvering performance. The resistance increase follows the empirical formula of Lackenby [7], with the increment set to $\Delta R / R = 0.57\,(T/h)^{1.79}$. To account for the influence of shallow water on course-keeping derivatives and added mass, this study directly adopts the regression expressions of Kijima [6], using the standard variable $\sigma$:
$$\sigma = \frac{1}{1 - (T/h)^2}$$
as the basis, and correcting each parameter in the form:
$$C = \sigma^{n_D}$$
The exponent $n_D$ is taken from Kijima's regression results for a wide range of ship types ($C_b \geq 0.7$). Corrections to rudder effectiveness and propulsion-related parameters are based on engineering estimates. A complete list of coefficients is provided in Appendix A, Table A1. The water-depth-to-draft ratio $h/T$ is mapped to the normalized intensity as:
$$\lambda_{sw} = 1 - \frac{h/T - 1.2}{5.0 - 1.2}$$
thereby yielding the normalized disturbance strength.
A complete list of the modified parameters is provided in Table A1.
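The intensity mapping and the Lackenby-type resistance increment can be checked numerically with a short sketch. The function names are illustrative assumptions, and the clamping of $\lambda_{sw}$ to $[0, 1]$ outside the range $h/T \in [1.2, 5.0]$ is an added assumption rather than something stated in the text.

```python
def lambda_sw(h_over_T, shallow=1.2, deep=5.0):
    """Map the depth-to-draft ratio h/T to a normalized intensity.

    h/T = 1.2 gives 1.0 (most severe), h/T = 5.0 gives 0.0 (deep water).
    Clamping outside that range is an assumption of this sketch.
    """
    lam = 1.0 - (h_over_T - shallow) / (deep - shallow)
    return min(1.0, max(0.0, lam))


def resistance_increment(h_over_T):
    """Relative resistance increase dR/R = 0.57 * (T/h)**1.79."""
    return 0.57 * (1.0 / h_over_T) ** 1.79


for ratio in (1.2, 2.0, 5.0):
    print(ratio, lambda_sw(ratio), resistance_increment(ratio))
```

The mapping is linear in $h/T$, so equal steps in water depth correspond to equal steps in $\lambda_{sw}$, while the resistance increment itself grows nonlinearly as the water becomes shallower.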

2.2.2. Hull Fouling ($\lambda_{hf}$)

Hull fouling is modeled primarily as an increase in frictional resistance together with a modification of the wake field. The fouling level is directly parameterized using the scale in Schultz [9] (Table A2), with $\lambda_{hf} \in [0,1]$, where $\lambda_{hf} = 0$ corresponds to a clean antifouling-coated hull (AF, $k_s = 30\ \mu\mathrm{m}$) and $\lambda_{hf} = 1$ corresponds to heavy calcareous fouling (F, $k_s = 10{,}000\ \mu\mathrm{m}$). The total-resistance increase $\Delta R_T / R_T$ and the increment in friction coefficient $\Delta C_F$ are obtained by piecewise-linear interpolation of the six discrete data points in Schultz [9], thereby reproducing the nonlinear relationship reported in the literature. The wake fraction and thrust deduction factor are corrected as:
$$w_{p0} \leftarrow w_{p0}\,(1 + 0.20\,\Delta C_F), \qquad t_p \leftarrow t_p\,(1 + 0.15\,\Delta C_F)$$
following the estimates of Townsin [10] for the degradation of propulsive performance due to hull fouling. The fouling-induced variation in lateral-force derivatives such as $Y_v$ and $N_v$ is represented by engineering scaling factors of the form:
$$1 + (0.08\text{–}0.12)\,\Delta C_F$$
reflecting the secondary influence of surface roughness on sway force and yaw moment. A complete list of the modified parameter-correction coefficients is provided in Table A3.

2.2.3. Rudder Degradation ($\lambda_{rd}$)

The principal effect of rudder degradation is a reduction in the rudder lift slope. It is parameterized by a degradation factor $d_f \in [0, 0.25]$, which is mapped to:
$$\lambda_{rd} = \frac{d_f}{0.25}$$
The primary effect is modeled as:
$$f_\alpha \leftarrow f_\alpha\,(1 - d_f)$$
meaning that the rudder lift slope decreases linearly with the degree of degradation. In the MMG ship–rudder–hull interaction model [2], the reduction in $f_\alpha$ further induces coupled variations in other parameters through the interaction relationships:
$$
\begin{aligned}
\varepsilon &\leftarrow \varepsilon\,(1 - 0.20\,d_f) \\
\gamma_R &\leftarrow \gamma_R\,(1 - 0.15\,d_f) \\
a_H &\leftarrow a_H\,(1 - 0.10\,d_f) \\
t_R &\leftarrow t_R\,(1 + 0.08\,d_f)
\end{aligned}
$$
The above coupling coefficients are based on engineering estimates. A complete list of the modified parameters is provided in Table A4.
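The rudder-degradation update can be sketched as a small function that scales the five affected parameters and returns the normalized intensity $\lambda_{rd} = d_f / 0.25$. The dictionary keys and the function name are illustrative assumptions; the scaling coefficients follow the relations above, read with a decrease for $f_\alpha$, $\varepsilon$, $\gamma_R$, and $a_H$ and an increase for $t_R$.

```python
def apply_rudder_degradation(params, d_f):
    """Return (modified_params, lambda_rd) for a degradation factor d_f.

    `params` is a dict holding the nominal rudder-related parameters under
    hypothetical keys; it is not modified in place.
    """
    if not 0.0 <= d_f <= 0.25:
        raise ValueError("d_f is expected to lie in [0, 0.25]")
    factors = {
        "f_alpha": 1.0 - d_f,          # rudder lift slope (primary effect)
        "epsilon": 1.0 - 0.20 * d_f,   # coupled interaction parameters
        "gamma_R": 1.0 - 0.15 * d_f,
        "a_H": 1.0 - 0.10 * d_f,
        "t_R": 1.0 + 0.08 * d_f,
    }
    modified = dict(params)
    for name, factor in factors.items():
        modified[name] = params[name] * factor
    return modified, d_f / 0.25


# Placeholder nominal values of 1.0, so the printed numbers are the factors.
nominal = {k: 1.0 for k in ("f_alpha", "epsilon", "gamma_R", "a_H", "t_R")}
print(apply_rudder_degradation(nominal, 0.25))
```

With unit placeholders, the output makes the relative size of the coupling visible: the lift slope changes by up to 25%, while the coupled parameters change by at most a few percent.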

2.2.4. Wind Load ($\lambda_{ws}$, $\lambda_{wd}$)

Wind load is treated as an external force acting directly on the ship, without modifying the MMG model parameters. Wind speed and wind direction are obtained from the shipborne anemometer and are regarded as known sensor inputs, parameterized by $\lambda_{ws}$ (wind speed) and $\lambda_{wd}$ (wind direction).
The wind-force calculation follows the simplified Fourier-series method of Fujiwara et al. [42]. The true wind speed $U_T$ and wind direction $\xi_T$ in the earth-fixed frame are transformed into velocity components in the body-fixed frame as:
$$u_w = U_T \cos(\xi_T - \psi), \qquad v_w = U_T \sin(\xi_T - \psi)$$
where $\psi$ is the ship heading. The apparent wind velocity is then:
$$u_a = u_w - u, \qquad v_a = v_w - v_m, \qquad V_a = \sqrt{u_a^2 + v_a^2}, \qquad \theta_a = \operatorname{atan2}(v_a, u_a)$$
where $\theta_a = 0$ corresponds to head wind and $\theta_a = \pi$ corresponds to following wind.
The wind-load coefficients are represented in Fourier-series form by fitting the wind-tunnel data for a VLCC-type ship [42]:
$$
\begin{aligned}
C_X &= 0.60\cos\theta_a - 0.10\cos 3\theta_a + 0.05\cos 5\theta_a \\
C_Y &= 0.80\sin\theta_a + 0.15\sin 3\theta_a + 0.05\sin 5\theta_a \\
C_N &= 0.12\sin\theta_a - 0.08\sin 2\theta_a + 0.03\sin 3\theta_a
\end{aligned}
$$
The wind force and wind-induced yaw moment are then given by:
$$X_W = \tfrac{1}{2}\rho_a V_a^2 A_F C_X, \qquad Y_W = \tfrac{1}{2}\rho_a V_a^2 A_L C_Y, \qquad N_W = \tfrac{1}{2}\rho_a V_a^2 A_L L_{pp} C_N$$
where $\rho_a = 1.225\ \mathrm{kg/m^3}$ is the air density, $A_F = 870\ \mathrm{m^2}$ is the frontal projected area, and $A_L = 3200\ \mathrm{m^2}$ is the lateral projected area.
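The wind pathway can be sketched in two steps: computing the apparent wind in the body-fixed frame, then scaling the dynamic pressure by the projected areas. This is a sketch under stated assumptions: the relative velocity is read as wind velocity minus ship velocity, the coefficients $C_X$, $C_Y$, $C_N$ are passed in by the caller rather than evaluated from the Fourier series, and $L_{pp}$ is a caller-supplied argument because its value is not given in this section.

```python
import math

RHO_A = 1.225   # air density, kg/m^3
A_F = 870.0     # frontal projected area, m^2
A_L = 3200.0    # lateral projected area, m^2


def apparent_wind(U_T, xi_T, psi, u, v_m):
    """Return (V_a, theta_a) from true wind, heading, and ship velocity."""
    u_w = U_T * math.cos(xi_T - psi)
    v_w = U_T * math.sin(xi_T - psi)
    u_a = u_w - u        # assumption: apparent = wind minus ship velocity
    v_a = v_w - v_m
    return math.hypot(u_a, v_a), math.atan2(v_a, u_a)


def wind_loads(V_a, C_X, C_Y, C_N, L_pp):
    """Return (X_W, Y_W, N_W) for given coefficients and ship length L_pp."""
    q = 0.5 * RHO_A * V_a**2    # dynamic pressure
    return q * A_F * C_X, q * A_L * C_Y, q * A_L * L_pp * C_N
```

Keeping the coefficient evaluation outside `wind_loads` lets the same function be reused with any coefficient fit, which is convenient when the sign and angle conventions of a particular wind-tunnel data set need to be checked separately.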
Wind loading is parameterized by a two-dimensional condition vector:
$$\lambda_{wind} = (\lambda_{ws}, \lambda_{wd})$$
where $\lambda_{ws}$ denotes wind strength and $\lambda_{wd}$ denotes wind direction. The wind-speed component is normalized using the Beaufort scale as:
$$\lambda_{ws} = \frac{BF - 5}{5}$$
which maps the operational range $BF \in [5, 10]$ to $\lambda_{ws} \in [0, 1]$. The wind-direction component is defined as:
$$\lambda_{wd} = \frac{\xi_T}{2\pi} \in [0, 1)$$
Wind conditions below Beaufort 5, corresponding to $U_T < 9.3$ m/s, are excluded from the expert training domain. As shown in Table 2, the associated residual magnitude is much smaller than that induced by the dominant disturbance sources considered in this study. Under a representative turning maneuver with $\delta = 20^\circ$, the acceleration residual induced by Beaufort 3 wind is $\Delta a \approx 1.4 \times 10^{-4}\ \mathrm{m/s^2}$, which is less than 1% of the residual generated by severe shallow-water effects at $h/T = 1.2$, for which $\Delta a \approx 1.6 \times 10^{-2}\ \mathrm{m/s^2}$.
Even at Beaufort 5, the wind-induced residual reaches only 1.8% of the shallow-water benchmark. Below this threshold, the wind perturbation is effectively indistinguishable from numerical noise in MMG integration and would therefore reduce the signal-to-noise ratio of expert-network training. Accordingly, $\lambda_{ws} = 0$ is anchored at Beaufort 5, which marks the onset of operationally significant wind loading for VLCC-class vessels.
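The two-component wind condition vector and the residual-magnitude comparison can be reproduced with a few lines. The function name is an illustrative assumption, as is the choice to wrap the wind direction into $[0, 2\pi)$ before normalizing; the residual magnitudes are read as $1.4 \times 10^{-4}$ and $1.6 \times 10^{-2}\ \mathrm{m/s^2}$.

```python
import math


def wind_condition(beaufort, xi_T):
    """Return (lambda_ws, lambda_wd) for Beaufort number and direction [rad]."""
    if not 5 <= beaufort <= 10:
        raise ValueError("expert training domain covers Beaufort 5 to 10")
    lam_ws = (beaufort - 5) / 5
    # Wrapping into [0, 2*pi) is an assumption of this sketch.
    lam_wd = (xi_T % (2.0 * math.pi)) / (2.0 * math.pi)
    return lam_ws, lam_wd


# Residual-magnitude check: Beaufort 3 wind vs. severe shallow water.
ratio_bf3 = 1.4e-4 / 1.6e-2
print(f"Beaufort 3 residual share: {ratio_bf3:.3%}")
```

The ratio evaluates to just under 1%, consistent with the statement that Beaufort 3 wind contributes less than 1% of the severe shallow-water residual.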
In relation to Equation (1), the three parameter-modification disturbances (shallow water, fouling, and rudder degradation) act on the MMG parameter set through multiplicative correction factors. When multiple disturbances are activated simultaneously, the correction factors for the same parameter are multiplied sequentially:
$$p_j \leftarrow p_j \cdot c_j^{(\mathrm{SW})} \cdot c_j^{(\mathrm{HF})} \cdot c_j^{(\mathrm{RD})}$$
The parameter subsets affected by different disturbance types are largely non-overlapping: shallow water mainly affects added mass and hydrodynamic derivatives, fouling mainly affects resistance and wake, and rudder degradation mainly affects rudder-force coefficients. From a physical point of view, this multiplicative superposition can therefore be regarded as an approximate combination of independently acting effects. Wind load, by contrast, is not treated as a parameter modification. It is introduced as directly measured external force and moment components added to the right-hand side of the surge, sway, and yaw equations in Equation (1), denoted as $X_W$, $Y_W$, and $N_W$, respectively.
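The multiplicative superposition of correction factors can be illustrated with a small helper. The parameter names and factor values below are hypothetical placeholders chosen only to show the mechanics; a parameter that is absent from a disturbance's factor set is left unchanged, which is equivalent to a factor of 1.

```python
def apply_corrections(nominal, *factor_sets):
    """Multiply each nominal parameter by every factor that targets it."""
    modified = dict(nominal)
    for factors in factor_sets:
        for name, c in factors.items():
            modified[name] *= c
    return modified


# Hypothetical parameter names and factors, for illustration only.
nominal = {"R_0": 1.0, "Y_v": -0.3, "f_alpha": 2.7}
c_sw = {"R_0": 1.4, "Y_v": 1.2}   # shallow water: resistance, derivatives
c_hf = {"R_0": 1.1}               # fouling: resistance
c_rd = {"f_alpha": 0.8}           # rudder degradation: lift slope

print(apply_corrections(nominal, c_sw, c_hf, c_rd))
```

Because multiplication commutes, the order in which the disturbance factor sets are applied does not affect the result, which matches the view of the disturbances as independently acting effects.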

2.3. Modular Expert Network

2.3.1. Disturbance-Specific Expert Networks

The expert networks used in this study are implemented as multilayer perceptrons (MLPs). An MLP is a feedforward neural network composed of stacked fully connected layers and nonlinear activation functions, and is suitable for approximating nonlinear mappings between vessel states, disturbance coordinates, and residual acceleration corrections.
As illustrated in Figure 3, for each disturbance source $k \in \{sw, hf, rd, wind\}$, the proposed framework assigns an independent MLP expert $f_{\mathrm{Expert},k}$ to learn the mapping from vessel states and disturbance intensity to residual accelerations:
$$\Delta a_k = f_{\text{Expert-FMMG},k}(u, v_m, r, \delta, \dot{\delta}, n, \lambda_k) - f_{\mathrm{MMG}}(u, v_m, r, \delta, \dot{\delta}, n)$$
where
$$\Delta a_k = [\Delta\dot{u},\ \Delta\dot{v}_m,\ \Delta\dot{r}]^{\top}$$
This decomposition is consistent with the physical prior introduced in Section 2.2. Since different disturbance sources act on different parameter groups, they induce distinct dominant directions in the residual acceleration space. The expert-wise design is therefore adopted to preserve such structural heterogeneity and to reduce interference among different disturbance mechanisms during learning.
All four experts share the same backbone architecture, namely a three-hidden-layer MLP. Each hidden layer is organized as Linear–LayerNorm–ReLU–Dropout(0.1). The SW, HF, and RD experts take a 7-dimensional input, while the wind expert uses a 10-dimensional input. In addition to the basic motion states, the wind expert further incorporates wind-related variables, including wind speed, wind direction, and the relative wind-direction encoding:
$$[\lambda_{ws},\ \lambda_{wd},\ \sin\psi_{rel},\ \cos\psi_{rel}]$$
All experts produce a three-dimensional residual acceleration output. The detailed hyperparameter settings are summarized in Table 3.
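The input dimensions quoted above can be made concrete with a short sketch of how the expert inputs might be assembled. It assumes that the 7-dimensional input is the six state and control variables $(u, v_m, r, \delta, \dot{\delta}, n)$ plus $\lambda_k$, and that the wind expert appends the four wind-related entries to the same six variables; the function names are illustrative, and $\psi_{rel}$ is taken as a caller-supplied angle because its definition is not given in this section.

```python
import math


def expert_input(state, lam_k):
    """7-dim input for the SW, HF, or RD expert.

    `state` is (u, v_m, r, delta, delta_dot, n); this ordering is an
    assumption of the sketch.
    """
    return [*state, lam_k]


def wind_expert_input(state, lam_ws, lam_wd, psi_rel):
    """10-dim input for the wind expert: states plus wind encoding."""
    return [*state, lam_ws, lam_wd, math.sin(psi_rel), math.cos(psi_rel)]


state = (7.0, 0.1, 0.002, 0.35, 0.0, 1.5)   # placeholder values
print(len(expert_input(state, 0.5)))                 # 7
print(len(wind_expert_input(state, 0.4, 0.25, 0.3)))  # 10
```

Encoding the relative wind direction as a sine and cosine pair avoids the discontinuity at the wrap-around angle that a raw angle input would introduce.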
The training data are generated by simulating the MMG model under different disturbance intensities. The steering scenarios include zigzag maneuvers, turning maneuvers, and random steering sequences. To improve robustness against sensor uncertainty, mixed observation noise is injected into the state inputs during training, consisting of 70% white noise and 30% AR(1) process noise. The noise magnitude is set to 2–5% of the standard deviation of each channel. In contrast, the disturbance intensity $\lambda_k$ is retained as an exact conditioning label and is not perturbed.
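One possible reading of the mixed-noise scheme is sketched below for a single channel. Several details are assumptions, since the text does not specify them: the 70/30 split is interpreted as amplitude weights on a white component and a unit-variance AR(1) component, the AR(1) coefficient `phi = 0.9` is hypothetical, and the noise level is drawn uniformly from the 2–5% range once per sequence.

```python
import math
import random
import statistics


def add_mixed_noise(signal, rng, phi=0.9, white_share=0.7):
    """Add white + AR(1) noise scaled to 2-5% of the channel's std."""
    level = rng.uniform(0.02, 0.05)              # fraction of channel std
    sigma = level * statistics.pstdev(signal)
    innovation_scale = math.sqrt(1.0 - phi**2)   # keeps AR(1) near unit variance
    ar = 0.0
    noisy = []
    for x in signal:
        white = rng.gauss(0.0, 1.0)
        ar = phi * ar + innovation_scale * rng.gauss(0.0, 1.0)
        noisy.append(x + sigma * (white_share * white + (1.0 - white_share) * ar))
    return noisy


rng = random.Random(0)
clean = [math.sin(0.01 * i) for i in range(1000)]
noisy = add_mixed_noise(clean, rng)
```

The disturbance intensity label is deliberately not passed through this function, mirroring the statement that $\lambda_k$ stays unperturbed during expert training.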
The expert networks are trained using the mean squared error loss:
$$\mathcal{L}_{\mathrm{Expert}} = \mathrm{MSE}\left(f_{\mathrm{Expert},k}(x, \lambda_k),\ a_{cond}(\lambda_k) - a_{nom}\right)$$
where $a_{cond}(\lambda_k)$ denotes the actual acceleration under disturbance condition $\lambda_k$, and $a_{nom}$ denotes the nominal acceleration predicted by the baseline MMG model. The four expert networks are trained independently, with a total parameter count of approximately 410 K.
At the system level, each expert directly outputs a 3-DOF residual acceleration, and the overall correction is obtained by summing the outputs of the four condition-specific experts:
$$\Delta a_{total} = \sum_k f_{\mathrm{Expert},k}(x, \lambda_k)$$
The MLP head in the encoder is used solely to estimate the latent disturbance intensities $\lambda_k$, rather than to produce the final residual correction. Therefore, the aggregation stage is implemented as expert-wise residual prediction, consistent with the actual framework design.
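The training target and the additive aggregation can be sketched with toy stand-ins for the experts. The callables below are hypothetical linear functions, not trained networks; they only illustrate that each expert maps a state and its own intensity to a 3-DOF residual, that the training target is the difference between disturbed and nominal accelerations, and that the total correction is a plain sum.

```python
def mse(pred, target):
    """Mean squared error over the three acceleration components."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def residual_target(a_cond, a_nom):
    """Training target: disturbed minus nominal acceleration."""
    return [c - n for c, n in zip(a_cond, a_nom)]


def total_correction(experts, x, lams):
    """Sum the 3-DOF residuals predicted by all experts."""
    total = [0.0, 0.0, 0.0]
    for k, expert in experts.items():
        total = [t + d for t, d in zip(total, expert(x, lams[k]))]
    return total


# Toy stand-ins for trained experts (hypothetical, linear in lambda).
experts = {
    "sw": lambda x, lam: [-0.02 * lam, 0.01 * lam, 0.0],
    "hf": lambda x, lam: [-0.01 * lam, 0.0, 0.0],
    "rd": lambda x, lam: [0.0, 0.005 * lam, 0.002 * lam],
}
lams = {"sw": 0.5, "hf": 1.0, "rd": 0.2}
print(total_correction(experts, x=None, lams=lams))
```

Because the experts are summed rather than gated, each channel's contribution to the total correction can be inspected on its own, which is the basis for the source-wise attribution discussed in the text.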
The feasibility of the modular expert architecture relies on a key assumption: residual signals induced by different operating conditions are distinguishable in the acceleration space. To formalize this assumption, we introduce the residual Jacobian matrix. At the operating point:
$$x = (u, v_m, r, \delta, \dot{\delta}, n)$$
the residual Jacobian is defined as:
$$
J(x) =
\begin{bmatrix}
\dfrac{\partial \Delta\dot{u}}{\partial \lambda_{sw}} & \dfrac{\partial \Delta\dot{u}}{\partial \lambda_{hf}} & \dfrac{\partial \Delta\dot{u}}{\partial \lambda_{rd}} \\
\dfrac{\partial \Delta\dot{v}_m}{\partial \lambda_{sw}} & \dfrac{\partial \Delta\dot{v}_m}{\partial \lambda_{hf}} & \dfrac{\partial \Delta\dot{v}_m}{\partial \lambda_{rd}} \\
\dfrac{\partial \Delta\dot{r}}{\partial \lambda_{sw}} & \dfrac{\partial \Delta\dot{r}}{\partial \lambda_{hf}} & \dfrac{\partial \Delta\dot{r}}{\partial \lambda_{rd}}
\end{bmatrix}
$$
Here, $\Delta\dot{u}$, $\Delta\dot{v}_m$, and $\Delta\dot{r}$ denote the residuals of surge, sway, and yaw accelerations between the perturbed and nominal models, respectively. The parameters $\lambda_{sw}$, $\lambda_{hf}$, and $\lambda_{rd}$ represent the normalized intensities of shallow-water, hull-fouling, and rudder-degradation perturbations.
Each column of $J(x)$ describes the local sensitivity direction of the residual acceleration vector with respect to one perturbation. The matrix therefore characterizes whether the effects of different perturbations are linearly independent in the acceleration space. In particular, if $J(x)$ is full rank at the operating point $x$:
$$\operatorname{rank}(J(x)) = 3$$
then the residual signatures of the three perturbations are locally linearly independent, and the perturbation sources are locally identifiable from the acceleration residuals.
Numerically, $J(x)$ is computed using central finite differences. Around the nominal intensity $\lambda_k = 0.5$, a small perturbation of $\pm\varepsilon$ is applied to each factor, with $\varepsilon = 10^{-4}$, and the corresponding residual acceleration differences are obtained directly from the MMG model. This procedure requires no network training and depends only on the structure of the physical model itself. Hence, the resulting identifiability conclusion is independent of the learning algorithm.
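The finite-difference procedure can be sketched as follows. The `toy_residual` function is a hypothetical linear stand-in for the difference between the perturbed and nominal MMG accelerations, so the numbers carry no physical meaning; for a $3 \times 3$ matrix, full rank is checked here through a nonzero determinant, which is a simplification of a proper numerical rank test.

```python
def residual_jacobian(residual, x, lam0=(0.5, 0.5, 0.5), eps=1e-4):
    """Central-difference Jacobian of a 3-DOF residual w.r.t. 3 intensities."""
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        lam_plus, lam_minus = list(lam0), list(lam0)
        lam_plus[j] += eps
        lam_minus[j] -= eps
        r_plus, r_minus = residual(x, lam_plus), residual(x, lam_minus)
        for i in range(3):
            J[i][j] = (r_plus[i] - r_minus[i]) / (2.0 * eps)
    return J


def det3(J):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = J
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def toy_residual(x, lam):
    """Hypothetical stand-in for (perturbed MMG) - (nominal MMG)."""
    sw, hf, rd = lam
    return [1.0 * sw + 0.2 * hf,
            0.1 * sw + 0.5 * hf + 0.4 * rd,
            0.1 * hf + 1.0 * rd]


J = residual_jacobian(toy_residual, x=None)
print("full rank:", abs(det3(J)) > 1e-9)
```

In practice, a singular-value-based rank or condition number would be more informative than the determinant, since it also indicates how close the three residual directions are to being collinear.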
It should also be noted that wind disturbance is not included in the above $3 \times 3$ Jacobian framework because its input space contains additional directional encoding, $\sin\psi_{rel}$ and $\cos\psi_{rel}$. As a result, its residual structure is fundamentally different from that of the three scalar-parameterized latent perturbations, and it can be naturally separated through an independent input channel.

2.3.2. Disturbance Intensity Encoder (CNN-SE-BiLSTM)

The encoder takes a multi-channel time-series tensor of size $B \times 10 \times 300$, where $B$ is the batch size, 10 is the number of input channels, and 300 is the number of samples within the time window. This corresponds to a 60 s observation window with a sampling interval of $\Delta t = 0.2$ s. The 10 channels consist of three residual acceleration channels ($\Delta\dot{u}$, $\Delta\dot{v}_m$, $\Delta\dot{r}$), three state/control channels ($u$, $\delta$, $n$), two heading-encoding channels ($\sin\psi$, $\cos\psi$), and two known wind-state channels.
Figure 4 illustrates the CNN block of the proposed encoder. To estimate disturbance intensity, we design an encoder composed of a CNN block, an SE block, and a BiLSTM block. The CNN extracts local temporal patterns from the input residual sequence, the SE block adaptively reweights feature channels to emphasize disturbance-related information, and the BiLSTM captures bidirectional temporal dependencies for higher-level sequence representation.
As the front-end feature extractor, the 1D CNN captures local temporal variations in the residual sequence, including abrupt changes, slow trends, and control-related fluctuations. The convolutional front-end consists of three 1D convolutional layers with channel dimensions $10 \to 64 \to 128 \to 128$. Each layer is followed by Batch Normalization, ReLU activation, and Dropout. The first two layers use strided convolution for temporal downsampling, while the third layer refines local features without further reducing the sequence length. As a result, the input sequence is compressed from $B \times 10 \times 300$ to $B \times 128 \times 75$, enabling the subsequent BiLSTM to model temporal dependence with lower computational cost.
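The temporal compression from 300 to 75 samples can be verified with the standard output-length formula for a 1D convolution without dilation. The kernel size of 5 and padding of 2 used below are hypothetical choices, since the actual values are listed in Table 4 and not in this section; only the strides of 2, 2, and 1 follow from the description above.

```python
def conv1d_out_len(n, kernel, stride, padding):
    """Output length of a 1D convolution with dilation 1."""
    return (n + 2 * padding - kernel) // stride + 1


# (in_channels, out_channels, stride); kernel=5 and padding=2 are assumed.
layers = [(10, 64, 2), (64, 128, 2), (128, 128, 1)]

length = 300
for c_in, c_out, stride in layers:
    length = conv1d_out_len(length, kernel=5, stride=stride, padding=2)
    print(f"{c_in:>3} -> {c_out:<3} stride {stride}: length {length}")
```

With these assumed settings the sequence length goes 300, 150, 75, 75, matching the stated $B \times 128 \times 75$ output and showing that the BiLSTM processes a quarter of the original time steps.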
Figure 5 illustrates the structure of the SE channel-attention block introduced after the CNN front-end to adaptively recalibrate channel-wise features. Because different input channels and extracted feature maps contribute unequally to disturbance estimation, the SE block uses global temporal pooling and two lightweight fully connected layers to generate normalized channel weights. These weights are applied to the convolutional features so that disturbance-sensitive channels are emphasized and less informative channels are suppressed. This provides a more discriminative representation for the subsequent BiLSTM-based temporal modeling.
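The squeeze-and-excitation mechanism described above can be sketched in plain Python on a small feature map. This is an illustrative toy with 4 channels and a bottleneck of 2, not the encoder's actual configuration; the weight layout and function name are assumptions of the sketch.

```python
import math


def se_block(features, W1, b1, W2, b2):
    """Squeeze-and-excitation over C channels of length T.

    features: list of C lists (one per channel).
    W1: (C/r) x C and W2: C x (C/r) weight matrices as nested lists.
    Returns (reweighted_features, channel_weights).
    """
    # Squeeze: global average pooling over time, one scalar per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck FC + ReLU, then FC + sigmoid, weights in (0, 1).
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z)) + b)
         for row, b in zip(W1, b1)]
    s = [1.0 / (1.0 + math.exp(-(sum(w * hi for w, hi in zip(row, h)) + b)))
         for row, b in zip(W2, b2)]
    # Scale: multiply every time step of each channel by its weight.
    return [[si * v for v in ch] for si, ch in zip(s, features)], s


feats = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [-1.0, 1.0, 0.5], [4.0, 4.0, 4.0]]
W1 = [[0.5, 0.0, 0.0, 0.1], [0.0, 0.2, -0.3, 0.0]]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.3, 0.3]]
out, weights = se_block(feats, W1, [0.0, 0.0], W2, [0.0, 0.0, 0.0, 0.0])
print(weights)
```

The weights depend only on time-averaged channel statistics, so the block changes the relative emphasis of channels without altering the temporal shape within any one channel.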
Figure 6 illustrates the BiLSTM block used at the back end of the encoder to model long-range temporal dependencies in the CNN-SE feature sequence. Although the CNN-SE front-end extracts local temporal patterns and emphasizes disturbance-related channels, the residual response also contains long-range temporal dependence caused by sustained offsets, gradual accumulation, and delayed propagation. Therefore, a BiLSTM block is used at the back end of the encoder to model the CNN-SE feature sequence in both forward and backward temporal directions. The bidirectional representation allows the encoder to use contextual information from the available historical observation window when estimating slowly varying disturbance intensities.
The BiLSTM output is passed through fully connected layers to produce the normalized disturbance-intensity estimate λ = λ s w , λ h f , λ r d 0,1 3 , corresponding to the three latent endogenous disturbance channels: shallow water, hull fouling, and rudder degradation. Wind is treated differently because it is an externally forced, direction-dependent, and directly measurable disturbance. Therefore, the measured wind parameters λ w s λ w d are included as input channels to expose the encoder to wind-disturbed trajectories, but they are not predicted by the encoder. Instead, wind compensation is handled by a dedicated wind expert.
This design is consistent with the different time-scale characteristics of the disturbance sources. The encoder acts as a window-based estimator for slowly varying latent conditions, which is appropriate for shallow-water effects, hull fouling, and rudder degradation. By contrast, wind can vary more rapidly and can be measured directly by onboard sensors. Moreover, the residual-magnitude analysis in Section 2.2 shows that, under a representative $20^\circ$ turning maneuver, the residual induced by Beaufort 5 wind is only 1.8% of that produced by severe shallow-water effects at $h/T = 1.2$. Thus, moderate wind should not be treated as a dominant latent condition to be inferred by the encoder. For stronger wind cases, such as Beaufort 10, an explicit wind expert provides a more suitable external compensation pathway.
In implementation, the convolutional features are first recalibrated by the SE block and then fed into a two-layer BiLSTM with hidden size 128 and inter-layer dropout. The final forward and backward hidden states of the last BiLSTM layer are concatenated into a 256-dimensional global temporal feature. This feature is then mapped to the disturbance-intensity estimates by a fully connected regression head with dimensions $256 \to 128 \to 64 \to 3$, using ReLU activations and dropout. A sigmoid output layer is applied to constrain the predictions to $[0, 1]$. The backward branch of the BiLSTM operates only within the current historical window $\{x_{t_i - W + 1}, \ldots, x_{t_i}\}$ and does not access any data beyond the update instant $t_i$; therefore, no information leakage is introduced. Detailed layer configurations and parameter statistics are listed in Table 4.
During training, the expert networks are kept frozen and only the encoder parameters are updated. The encoder output λ is passed through the frozen experts to generate residual predictions, which are then compared with the ground-truth residuals directly obtained from simulation. The total loss is defined as:
$$L = L_{\mathrm{pred}} + \alpha L_{\mathrm{sup}} + \beta L_{\mathrm{sparse}}$$
Here, $L_{\mathrm{pred}}$ is the mean squared error of the normalized residual prediction, which evaluates whether the inferred $\lambda$ enables the experts to reproduce the true residuals. $L_{\mathrm{sup}}$ is a channel-weighted supervision loss between $\lambda$ and the ground-truth label, with $\alpha = 3.0$ and channel weights of 2.0 for SW/HF and 1.0 for RD, directly guiding the encoder to learn the correct intensity mapping. $L_{\mathrm{sparse}} = \|\lambda\|_1$, with $\beta = 0.01$, encourages the output to approach zero under disturbance-free conditions.
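The three loss terms can be sketched numerically as follows. The weights $\alpha = 3.0$, $\beta = 0.01$ and the channel weights come from the text; the use of a weighted mean squared error for the supervision term and a batch mean over the L1 norm are assumptions, since the exact reductions are not stated.

```python
import numpy as np

ALPHA, BETA = 3.0, 0.01                      # loss weights reported in the text
CHANNEL_W = np.array([2.0, 2.0, 1.0])        # supervision weights for SW, HF, RD

def encoder_loss(res_pred, res_true, lam_pred, lam_true):
    """Total encoder loss for a batch.

    res_pred, res_true: normalized residuals, shape (B, 3).
    lam_pred, lam_true: disturbance intensities, shape (B, 3).
    """
    l_pred = np.mean((res_pred - res_true) ** 2)
    # ASSUMPTION: channel-weighted mean squared error.
    l_sup = np.mean(CHANNEL_W * (lam_pred - lam_true) ** 2)
    # L1 norm of lambda per sample, averaged over the batch (assumed reduction).
    l_sparse = np.mean(np.sum(np.abs(lam_pred), axis=1))
    return l_pred + ALPHA * l_sup + BETA * l_sparse

# Tiny hand-checkable batch: perfect residuals, small error on the SW channel.
res = np.zeros((2, 3))
lam_hat = np.array([[0.5, 0.0, 0.0], [0.5, 0.0, 0.0]])
lam_gt = np.array([[0.4, 0.0, 0.0], [0.6, 0.0, 0.0]])
loss = encoder_loss(res, res, lam_hat, lam_gt)
```

In this toy batch the supervision term contributes 0.02 and the sparsity term 0.005, which shows how small the sparsity pressure is relative to the supervised signal at these weights.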
Optimization is performed using AdamW with learning rate $3 \times 10^{-4}$ and weight decay $10^{-4}$. The batch size is 256 and the maximum number of training epochs is 200. The learning-rate schedule consists of linear warm-up over the first 10 epochs, followed by cosine annealing, and stochastic weight averaging (SWA) from epoch 60 onward with a learning rate of $10^{-4}$. Gradient clipping with a threshold of 1.0 and early stopping with a patience of 30 epochs are applied during training. The training and validation sets are split by trajectory with a ratio of 80/20. To improve robustness, 10% random noise is injected into the residual channels and 1–2% noise into the state channels during training; Mixup augmentation is enabled with 50% probability using $\alpha_{\mathrm{mix}} = 0.4$; and the starting point of each window is randomly jittered within $\pm 5$ steps. All models are trained in Python 3.9 with PyTorch 2.3.1.
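A minimal sketch of the learning-rate schedule described above is given below. The warm-up length, base rate, SWA start epoch, and SWA rate follow the text; annealing toward zero over the full 200-epoch budget and holding the SWA rate constant are assumptions, as is the 0-indexed epoch convention.

```python
import math

BASE_LR, SWA_LR = 3e-4, 1e-4
WARMUP_EPOCHS, SWA_START, MAX_EPOCHS = 10, 60, 200

def learning_rate(epoch):
    """Learning rate for a 0-indexed epoch."""
    if epoch >= SWA_START:
        return SWA_LR                                   # constant rate during SWA
    if epoch < WARMUP_EPOCHS:
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS    # linear warm-up
    # ASSUMPTION: cosine annealing toward zero over the remaining epoch budget.
    progress = (epoch - WARMUP_EPOCHS) / (MAX_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))

schedule = [learning_rate(e) for e in range(MAX_EPOCHS)]
```

Under these assumptions the rate climbs to its peak at the end of warm-up, decays slowly along the cosine curve, and then drops to the fixed SWA rate at epoch 60.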

3. Results

3.1. Distinguishability of Residual Signals Under Different Operating Conditions

To quantitatively verify the above hypothesis, we compute the Jacobian matrix $J(x)$ along a standard $10^\circ/10^\circ$ zigzag maneuver using central differences, and perform sampling analysis at multiple operating points. To avoid contamination from near-straight-motion segments, only samples with effective rudder excitation ($|\delta| > 1^\circ$) are retained for statistical evaluation. The results are presented in Figure 7.
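The central-difference Jacobian and its singular values can be sketched as follows. The residual function `toy_residual` and its coefficients are purely hypothetical stand-ins for the disturbed-minus-nominal dynamics; only the differencing and SVD steps mirror the procedure described above.

```python
import numpy as np

def toy_residual(lam, delta):
    """Hypothetical map from (sw, hf, rd) intensities to (surge, sway, yaw)
    acceleration residuals at rudder angle delta (rad). Not the paper's model."""
    sw, hf, rd = lam
    return np.array([
        -0.8 * sw - 1.2 * hf,
        0.6 * sw * delta + 0.5 * rd * delta,
        0.05 * sw * delta + 0.9 * rd * delta,
    ])

def jacobian_central(f, lam, eps=1e-4):
    """Central-difference Jacobian: rows are residual channels,
    columns are disturbance intensities."""
    lam = np.asarray(lam, dtype=float)
    columns = []
    for k in range(lam.size):
        step = np.zeros_like(lam)
        step[k] = eps
        columns.append((f(lam + step) - f(lam - step)) / (2.0 * eps))
    return np.stack(columns, axis=1)

delta = np.deg2rad(10.0)
J = jacobian_central(lambda lam: toy_residual(lam, delta), [0.5, 0.5, 0.5])
singular_values = np.linalg.svd(J, compute_uv=False)

# With zero rudder angle the toy sway and yaw rows vanish and the rank drops.
J_straight = jacobian_central(lambda lam: toy_residual(lam, 0.0), [0.5, 0.5, 0.5])
rank_straight = np.linalg.matrix_rank(J_straight, tol=1e-8)
```

In this toy model the Jacobian loses rank when the rudder is centered, which illustrates why samples without effective rudder excitation are excluded from the statistics.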
This analysis is included to support the physical basis of the disturbance-specific expert decomposition. If different disturbance sources generate distinguishable residual directions, then assigning separate expert networks to different disturbance channels is justified.
Figure 7a shows that all three singular values of the Jacobian matrix remain nonzero during the active steering phases, indicating that the disturbance directions maintain nontrivial projections in the residual space. Figure 7b further shows that the residual fingerprints of different disturbance sources are not fully collinear. The relatively higher similarity between SW and HF is consistent with the channel-sensitivity pattern in Figure 7c: both disturbances mainly affect the $\Delta\dot{u}$ and $\Delta\dot{v}_m$ channels, while their influence on $\Delta\dot{r}$ is weak. However, HF is more concentrated in $\Delta\dot{u}$, whereas SW affects both $\Delta\dot{u}$ and $\Delta\dot{v}_m$, so the two directions are similar but not identical. The SW–RD similarity remains at an intermediate level because both affect $\Delta\dot{v}_m$, whereas RD shows a stronger relative contribution to $\Delta\dot{r}$.
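One simple way to quantify how collinear two residual fingerprints are is the absolute cosine similarity between their direction vectors. The sketch below uses hypothetical fingerprint vectors chosen only to mimic the qualitative pattern described above; they are not values from Figure 7.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two residual-direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# HYPOTHETICAL fingerprints over the (surge, sway, yaw) residual channels.
fingerprints = {
    "SW": (0.7, 0.7, 0.1),   # affects surge and sway
    "HF": (1.0, 0.2, 0.0),   # concentrated in surge
    "RD": (0.0, 0.6, 0.8),   # mainly sway and yaw
}

pairs = [("SW", "HF"), ("SW", "RD"), ("HF", "RD")]
similarity = {
    pair: abs(cosine_similarity(fingerprints[pair[0]], fingerprints[pair[1]]))
    for pair in pairs
}
```

A value of 1 would mean two disturbances are indistinguishable from a single residual snapshot, so values below 1 for every pair correspond to the "not fully collinear" observation.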
Figure 7c provides an averaged view of the Jacobian structure over the trajectory. A clear channel-wise sensitivity pattern can be observed: hull fouling produces the strongest response in the surge residual $\Delta\dot{u}$; shallow water affects both $\Delta\dot{u}$ and $\Delta\dot{v}_m$; and rudder degradation mainly affects $\Delta\dot{v}_m$ and $\Delta\dot{r}$. These results indicate that the residual signals contain sufficient discriminative information for disturbance classification and intensity estimation, thereby supporting the subsequent disturbance-specific expert decomposition and CNN-SE-BiLSTM encoder design.

3.2. Regression Accuracy of Disturbance-Channel Correction

To evaluate the proposed encoder on single-disturbance scenarios, we construct an independent test set consisting of 300 trajectories, including 100 trajectories for each disturbance type (SW/HF/RD), with each trajectory containing only one disturbance category. A sliding window of length $W = 300$ steps is adopted, corresponding to 60 s, with a stride of 50 steps (10 s). For each window, the encoder outputs $\lambda = [\lambda_{sw}, \lambda_{hf}, \lambda_{rd}]$, which is compared channel-wise with the corresponding ground-truth components extracted from the full vector $[\lambda_{sw}, \lambda_{ws}, \lambda_{wd}, \lambda_{hf}, \lambda_{rd}]$. Wind-related components are not evaluated here because they are directly measured inputs rather than encoder predictions. The number of evaluation windows is $n = 16{,}500$ for each predicted channel.
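The window count can be checked with a short calculation. The window length, stride, and trajectory count come from the text; the trajectory length of 3000 steps (600 s at 0.2 s per step) is an assumption, chosen because it is one length that reproduces the reported 16,500 windows.

```python
def window_starts(n_steps, window=300, stride=50):
    """Start indices of all full sliding windows in a trajectory."""
    return list(range(0, n_steps - window + 1, stride))

DT = 60.0 / 300          # 0.2 s per step, implied by 300 steps = 60 s
N_STEPS = 3000           # ASSUMPTION: trajectory length, not stated in this section
N_TRAJECTORIES = 300     # 100 each for SW, HF, RD

starts = window_starts(N_STEPS)
windows_per_trajectory = len(starts)
total_windows = N_TRAJECTORIES * windows_per_trajectory
```

Each window ends at index `start + 299`, so the last window of a trajectory stops exactly at the final sample and never reaches past it.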
Table 5 summarizes the regression accuracy of the proposed model for the three disturbance channels. It should be noted that the $R^2$ metric reported here evaluates the regression performance with respect to the defined disturbance-generator intensity coordinates, rather than directly measured physical-state variables. Therefore, this metric primarily reflects the encoder’s fitting capability for the latent disturbance-intensity representation.
Figure 8 shows the channel-wise scatter distributions between the encoder-predicted latent disturbance-intensity vector $\lambda = [\lambda_{sw}, \lambda_{hf}, \lambda_{rd}]$ and the corresponding ground-truth SW, HF, and RD components. It can be seen that the samples in all channels are closely distributed around the diagonal line, indicating strong consistency between the predicted and true disturbance intensities. Together with the statistical results in Table 5, these observations demonstrate that the proposed CNN-SE-BiLSTM encoder can stably extract discriminative features related to disturbance intensity from residual sequences and achieve accurate regression of the correction intensities for the three latent disturbance channels.

3.3. Closed-Loop Prediction Performance

The closed-loop validation adopts a two-stage evaluation protocol consisting of observation warm-up and strict closed-loop autoregressive prediction. First, during the initial $T_{\mathrm{warm}} = 60$ s (or 400 s), the encoder is driven by the ground-truth observation sequence, and the disturbance-intensity estimate $\lambda$ is obtained from the residual features over the most recent time windows, so as to mitigate the accumulation of errors caused by unstable encoder estimates in the initial phase.
The system is then switched to the strict closed-loop prediction mode, in which the vessel states are propagated entirely by the model itself without reference to any ground-truth trajectory information. Specifically, the current estimate $\lambda$ is used to activate the corresponding expert network, which generates the correction term $\Delta a$. This correction is added to the nominal MMG model output, and the subsequent vessel trajectory over $T_{\mathrm{pred}} = 240$ s (or 400 s) is propagated using an RK4 integrator. During this phase, the encoder input is no longer constructed from real observations; instead, the residual sequence and the corresponding time windows are reconstructed from the model’s own closed-loop predicted trajectory and are then fed back into the encoder to update $\lambda$.
Therefore, the entire prediction stage constitutes a strict autoregressive closed-loop rollout, in which state propagation, residual construction, and disturbance estimation are all recursively driven by the model’s own previous predictions. To account for potentially time-varying disturbances, $\lambda$ is updated every 10 s based on the latest time window.
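The rollout logic described above can be sketched with a deliberately simple scalar system. The RK4 integration, the additive expert correction, the 0.2 s step, and the 10 s update interval follow the text; the toy nominal model, toy expert, and toy encoder are hypothetical placeholders, and storing the expert output as the self-generated residual is one possible reading of how the residual window is reconstructed.

```python
def rk4_step(f, x, u, dt):
    """One classical Runge-Kutta step with the control held constant."""
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def closed_loop_rollout(x0, controls, lam0, nominal, expert, encoder,
                        dt=0.2, update_every=50, window=300):
    """Propagate the state using only the model's own predictions."""
    x, lam = x0, lam0
    history = []            # self-generated (state, control, residual) samples
    trajectory = [x0]
    for k, u in enumerate(controls):
        if k > 0 and k % update_every == 0:
            lam = encoder(history[-window:])          # re-estimate every 10 s
        history.append((x, u, expert(x, u, lam)))
        corrected = lambda s, c: nominal(s, c) + expert(s, c, lam)
        x = rk4_step(corrected, x, u, dt)
        trajectory.append(x)
    return trajectory, lam

# HYPOTHETICAL scalar surge model, used only to exercise the loop.
def toy_nominal(x, u):
    return -0.1 * x + u

def toy_expert(x, u, lam):
    return -0.05 * lam * x

def toy_encoder(samples):
    estimates = [-r / (0.05 * s) for s, _, r in samples if abs(s) > 1e-9]
    return sum(estimates) / len(estimates)

controls = [1.0] * 1200                               # 240 s at 0.2 s per step
trajectory, lam_final = closed_loop_rollout(
    0.0, controls, 0.5, toy_nominal, toy_expert, toy_encoder)
```

No ground-truth state enters the loop after the initial condition, which is the property that distinguishes this setting from teacher-forced or ground-truth-window evaluation.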
All comparative methods share the same ground-truth control input sequence and identical initial observation conditions. The evaluation metrics include position RMSE (m), heading RMSE (°), terminal position error (m), and the normalized root-mean-square errors of surge velocity, sway velocity, and yaw rate ($\mathrm{nRMSE}_u$, $\mathrm{nRMSE}_{v_m}$, $\mathrm{nRMSE}_r$).
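The trajectory metrics listed above might be computed along the following lines. Wrapping the heading error to avoid a spurious 360° jump and normalizing by the range of the reference signal are assumptions, since the normalization used for the nRMSE values is not specified in this section.

```python
import math

def position_rmse(pred_xy, true_xy):
    """RMSE of the Euclidean position error in metres."""
    sq = [(px - tx) ** 2 + (py - ty) ** 2
          for (px, py), (tx, ty) in zip(pred_xy, true_xy)]
    return math.sqrt(sum(sq) / len(sq))

def heading_rmse_deg(pred_psi, true_psi):
    """Heading RMSE in degrees, with errors wrapped to [-180, 180)."""
    err = [((p - t + 180.0) % 360.0) - 180.0 for p, t in zip(pred_psi, true_psi)]
    return math.sqrt(sum(e * e for e in err) / len(err))

def terminal_error(pred_xy, true_xy):
    """Distance between the final predicted and true positions."""
    return math.dist(pred_xy[-1], true_xy[-1])

def nrmse(pred, true):
    """RMSE normalized by the range of the reference (ASSUMED normalization)."""
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
    return rmse / (max(true) - min(true))

pos_err = position_rmse([(3.0, 4.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)])
head_err = heading_rmse_deg([359.0], [1.0])
end_err = terminal_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
speed_err = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The heading example shows why wrapping matters: a prediction of 359° against a reference of 1° is a 2° error, not 358°.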
Three representative test scenarios are considered, with the trajectory and state-prediction results presented in Figures 9–11 and the corresponding averaged performance metrics summarized in Tables 6–8. The first is a random steering scenario under a single severe shallow-water disturbance (Figure 9 and Table 6), which is used to examine the upper limit of the model’s compensation capability under a strong single-source disturbance. The second is a random steering scenario under combined disturbances (SW + HF + RD) (Figure 10 and Table 7), which is used to evaluate the model’s ability to disentangle and compensate for coupled multi-source disturbances. The third is a standard $10^\circ/10^\circ$ zigzag maneuver under a single severe shallow-water condition (Figure 11 and Table 8), which is used to assess the applicability and stability of the model under a typical maneuvering scenario.
Table 9 summarizes the detailed configurations of the baseline and ablation models, including their architectures, hidden-layer settings, parameter counts, input-output formats, and training trajectories. These settings provide the basis for the subsequent closed-loop performance comparisons.
  • Comparison with the UKF baseline
The comparison with UKF should mainly be understood as a comparison against a model-based reference baseline, rather than a direct competition with a practically deployable final engineering solution. The purpose of introducing UKF in this work is to show that, even when a high-fidelity explicit state equation and a classical recursive filtering framework are available, the estimation of λ can still be affected by observational ambiguity; therefore, the true physical parameters do not naturally correspond to the optimal disturbance-compensation coordinates. It should also be noted that UKF benefits from stronger modeling priors in this comparison, including a high-accuracy explicit state-space model and a recursive residual-correction mechanism. Even under such conditions, the proposed method still outperforms UKF in the single shallow-water random steering scenario and the triple-source disturbance random steering scenario, while achieving comparable performance in the single shallow-water zigzag maneuver. These results indicate that the proposed method can still realize stable and effective disturbance compensation without relying on an explicit filtering framework or strong model priors.
  • Comparison with the BiLSTM and MLP baselines
The comparison with the BiLSTM and MLP baselines is mainly intended to highlight the essential difference between the proposed physically structured closed-loop framework and purely data-driven methods in complex disturbance-compensation tasks. The overall poor performance of MLP under various mismatch conditions is consistent with its methodological characteristics. Because this type of model is essentially closer to a one-step mapping and lacks sufficient capability to capture temporal memory and the contextual evolution of disturbances, it struggles to maintain stable predictions in scenarios with strong coupling, time-varying effects, and significant distribution mismatch. This is also reflected during training, where its validation loss shows pronounced fluctuations.
In contrast, BiLSTM actually achieves high accuracy at the one-step residual prediction level, but its major performance degradation appears during multi-step autoregressive closed-loop rollout. To determine whether the failure of the BiLSTM baseline stems from insufficient one-step residual fitting or from distribution shift during autoregressive rollout, we evaluated the same trained BiLSTM checkpoint under five random steering seeds and two representative scenarios, namely the single severe shallow-water scenario and the triple-source disturbance scenario, using three settings: teacher-forced one-step residual prediction, GT-window closed-loop evaluation, and fully autoregressive closed-loop rollout. The results show that the model is highly accurate in one-step residual prediction. For example, in a separate 180 s diagnostic rollout under the single severe shallow-water scenario, the correlation coefficients for $\Delta\dot{u}$ / $\Delta\dot{v}_m$ / $\Delta\dot{r}$ reach 0.911/0.989/0.983, with RMSE values of only $7.08 \times 10^{-4}$, $8.02 \times 10^{-4}$, and $2.2 \times 10^{-5}$, respectively. When a ground-truth window is still provided at each step as context, the closed-loop position RMSE remains $15.65 \pm 4.57$ m. However, under fully autoregressive residual rollout, the position RMSE increases to $108.25 \pm 27.24$ m and the heading RMSE rises to $28.24 \pm 7.97^\circ$. The corresponding mean per-seed degradation ratios are 7.70× for position RMSE and 10.35× for heading RMSE, respectively. A similar mean per-seed position-RMSE deterioration of 8.08× is also observed in the triple-source disturbance scenario.
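Note that the reported 7.70× figure is a mean of per-seed ratios, which in general differs from the ratio of the two reported mean RMSE values. The short sketch below computes the latter from the numbers in the text and then uses hypothetical per-seed values, which are not from the paper, to show how the two aggregations can diverge.

```python
from statistics import mean

# Reported mean position RMSE values (m) from the text.
gt_window_mean, autoregressive_mean = 15.65, 108.25
ratio_of_means = autoregressive_mean / gt_window_mean

# HYPOTHETICAL per-seed RMSE values (m), for illustration only.
gt_window_seeds = [10.0, 12.0, 16.0, 18.0, 22.0]
autoregressive_seeds = [100.0, 90.0, 110.0, 120.0, 130.0]

mean_of_ratios = mean(a / g for a, g in zip(autoregressive_seeds, gt_window_seeds))
hypothetical_ratio_of_means = mean(autoregressive_seeds) / mean(gt_window_seeds)
```

Seeds with a small ground-truth-window error weigh heavily in the mean of ratios, which is why a per-seed figure can exceed the ratio obtained from the averaged RMSE values.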
This difference arises from the feedback mechanism used to construct the prediction context. In the ground-truth-window setting, the residual window remains anchored to the true trajectory, which keeps the input context close to the training distribution and suppresses the propagation of local residual errors. In the fully autoregressive setting, by contrast, both the vessel states and residual windows are reconstructed from the model’s own previous predictions. Small residual-amplitude errors, phase shifts, or low-frequency biases can therefore be recursively fed back into subsequent integration steps and accumulated into much larger ship-position errors. Since all three evaluations use exactly the same network weights, these results indicate that the main limitation of BiLSTM does not lie in the learnability of the one-step residual itself, but rather in the distribution shift and self-amplifying error accumulation induced by autoregressive residual rollout.
  • Comparison with ConcatMLP (True $\lambda$) and Oracle $\lambda$
The comparison with ConcatMLP (True $\lambda$) and Oracle $\lambda$ is mainly intended to examine the actual role of $\lambda$ in residual compensation and to further investigate whether the true $\lambda$ is naturally identical to the optimal coordinate for closed-loop compensation.
First, it should be noted that ConcatMLP (True $\lambda$) is directly provided with the ground-truth $\lambda$ as input, and therefore it is naturally expected to achieve the smallest error; this in itself can also be regarded as a direct validation of the effectiveness of $\lambda$. From the results, ConcatMLP (True $\lambda$) generally outperforms Oracle $\lambda$, although the gap is not large. This suggests that residual compensation is influenced not only by the true disturbance parameter itself, but also by additional effective information arising from multi-channel dynamic coupling, modeling errors, and compensation interactions.
It should also be emphasized that the training objective of the proposed encoder is not to reconstruct the true $\lambda$ with high fidelity. Instead, the loss function is jointly defined by the normalized expert RMSE and the residual term. Therefore, the encoder output $\lambda$ is not a pointwise restoration of the true physical parameter, but rather an optimal equivalent coordinate that is more beneficial to the current expert system after jointly accounting for multi-channel residual superposition, interaction effects, and the approximation error of the expert itself. In other words, the $\lambda$ that yields the best closed-loop performance does not necessarily coincide with the true $\lambda$. This also explains why Oracle $\lambda$ can even perform worse than the proposed method in the single shallow-water scenario: although the true parameter provides the physically correct reference, it does not necessarily correspond directly to the most effective representation for closed-loop compensation, whereas the learned $\lambda$ is more closely aligned with the coordinate representation that is optimal for closed-loop prediction and compensation performance.

3.4. Closed-Loop Validation Under the Four-Source Disturbance Scenario and Wind-Expert Ablation

The wind-expert ablation further shows that the role of the wind expert is not merely to reduce instantaneous error, but more importantly to provide a much more stable compensation for external wind disturbances during the early stage of closed-loop prediction. As shown in Figure 12, with the wind expert enabled, both the BF5 and BF10 cases remain substantially more stable immediately after the observation phase, and the predicted trajectory as well as the evolutions of $u$, $v_m$, and $r$ follow the ground truth more closely. In contrast, without the wind expert, the model deviates much earlier once autonomous rollout begins, and the error continues to accumulate over time.
The different trajectory trend observed in the BF5 case without the wind expert is mainly caused by the absence of explicit wind-load compensation. In the early stage after the prediction starts, the yaw-rate response still follows a trend similar to the ground truth. However, the surge-velocity, sway-velocity, and yaw-rate channels are not independently sufficient to determine long-horizon trajectory accuracy at each instant; even moderate residual errors in these channels can change the integrated heading and position over time. Therefore, the BF5 no-expert case gradually departs from the reference trajectory during the fully autoregressive rollout. With the wind expert included, the wind-induced residual forces and moments are partially compensated, so the predicted trajectory remains closer to the ground truth.
The quantitative results in Table 10 confirm this trend. Under BF5, adding the wind expert reduces the position RMSE from 296.1 m to 64.2 m and the endpoint error from 763.0 m to 170.9 m. Under the stronger BF10 condition, the position RMSE is reduced from 586.8 m to 99.5 m, while the endpoint error decreases from 1389.9 m to 163.7 m. These correspond to reductions of 78.3% and 83.0% in position RMSE under BF5 and BF10, respectively.
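The percentage reductions quoted above follow directly from the Table 10 values cited in the text, as the short check below shows. The endpoint-error reductions are computed in the same way for completeness; the dictionary layout is only an illustrative container.

```python
def reduction_pct(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

# (without wind expert, with wind expert), in metres, as quoted in the text.
table10 = {
    "BF5": {"position_rmse": (296.1, 64.2), "endpoint_error": (763.0, 170.9)},
    "BF10": {"position_rmse": (586.8, 99.5), "endpoint_error": (1389.9, 163.7)},
}

reductions = {
    case: {metric: round(reduction_pct(*values), 1)
           for metric, values in metrics.items()}
    for case, metrics in table10.items()
}
```

The position-RMSE entries reproduce the 78.3% and 83.0% figures stated in the text.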
It should also be noted that, even with the wind expert, the predicted trajectory under BF10 still exhibits a non-negligible deviation from the ground truth. This discrepancy is reasonable because BF10 represents a much stronger external wind-load condition, which introduces larger sway-force and yaw-moment residuals and makes the closed-loop rollout more sensitive to small compensation errors. During the long fully autoregressive prediction horizon, any remaining wind-load approximation error, phase mismatch, or velocity-state deviation can be recursively fed back and accumulated through numerical integration. Therefore, the wind expert substantially improves wind compensation and delays divergence, but it cannot completely eliminate long-horizon drift under strong wind forcing.
Taken together, the results in Figures 9–12 and Tables 6–8 and 10 demonstrate stable long-horizon closed-loop autoregressive prediction across the main disturbance scenarios and wind-inclusive ablation cases. As summarized in Table 11, the proposed framework achieves position-RMSE reductions of 74.7–91.7% relative to the corresponding nominal-MMG and wind-ablation baselines, confirming its overall effectiveness under complex disturbance conditions.

3.5. Time-Varying Disturbance Tracking

To evaluate the model’s online adaptability under time-varying operating conditions, we further examine the disturbance-intensity tracking performance in a gradually varying shallow-water scenario. The results in Figure 13 show that the proposed method can smoothly track the variation trend of the dominant shallow-water channel while maintaining low responses in the inactive channels, demonstrating good channel selectivity and online adaptation capability.
It should be noted that the encoder is trained only under several discrete shallow-water conditions and has never explicitly seen a continuously varying “gradually entering shallow water” process during training. Nevertheless, it still exhibits stable tracking performance in this scenario, indicating that the learned representation is not limited to memorizing discrete conditions, but is able to effectively interpolate and generalize across continuously varying shallow-water intensities.
This property is of clear practical significance, since in real ship operations the shallow-water effects usually change gradually along the sailing process rather than appearing as a few discrete labeled conditions. These results suggest that the proposed framework does not require large amounts of specially collected gradual shallow-water transition data; instead, training under several representative shallow-water conditions is already sufficient to achieve effective adaptation to continuously varying real-world conditions.

3.6. Counterfactual Channel Intervention Analysis

To determine whether the encoder identifies operating conditions primarily from condition-induced residuals rather than from generic motion and control context, we designed a counterfactual channel-swapping experiment using physically matched trajectory pairs. For each maneuver type and random seed, the same open-loop control sequence was replayed under three operating conditions: shallow water, hull fouling, and rudder degradation, with equivalent intensities set to $\lambda_{sw} = 0.92$, $\lambda_{hf} = 0.25$, and $\lambda_{rd} = 0.80$, respectively. In each paired sample, the source is the reference trajectory window, whereas the donor is a matched trajectory window generated under a different operating condition but with the same control input. Thus, the source-donor difference is mainly caused by the operating condition itself.
The encoder input was divided into a residual group and a motion-control context group. The residual group contains the three acceleration residual channels ($\Delta\dot{u}$, $\Delta\dot{v}_m$, $\Delta\dot{r}$), which directly describe the mismatch between the nominal model and the disturbed dynamics. The context group contains the remaining state and control channels, such as vessel motion states and control inputs. Four settings were compared: Source and Donor use all channels from the source and donor windows, respectively; Residual Swap replaces only the residual group with that of the donor; and Context Swap replaces only the context group.
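The four input variants can be constructed with simple channel indexing, as sketched below. The (time, channels) window layout, the total of eight channels, and the placement of the three residual channels at indices 0 to 2 are assumptions made for illustration.

```python
import numpy as np

RESIDUAL_IDX = [0, 1, 2]     # ASSUMED positions of the three residual channels

def swap_channels(source, donor, idx):
    """Copy of `source` with the channels in `idx` taken from `donor`."""
    out = source.copy()
    out[:, idx] = donor[:, idx]
    return out

def build_variants(source, donor):
    """Build the Source, Donor, Residual Swap, and Context Swap windows."""
    n_channels = source.shape[1]
    context_idx = [c for c in range(n_channels) if c not in RESIDUAL_IDX]
    return {
        "Source": source.copy(),
        "Donor": donor.copy(),
        "Residual Swap": swap_channels(source, donor, RESIDUAL_IDX),
        "Context Swap": swap_channels(source, donor, context_idx),
    }

# Dummy windows: zeros mark source content, ones mark donor content.
source_window = np.zeros((300, 8))
donor_window = np.ones((300, 8))
variants = build_variants(source_window, donor_window)
```

Feeding each variant to the encoder and comparing the outputs then isolates which channel group drives the estimate, which is the comparison reported in Figure 14.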
The results in Figure 14 show that Source and Context Swap remain source-oriented, whereas Residual Swap and Donor shift almost completely toward the donor condition. This means that replacing the residual channels alone is sufficient to move the encoder output toward the donor, while replacing the motion-control context alone is not. Therefore, the encoder primarily relies on condition-related residual channels rather than generic motion and control context.

4. Discussion

4.1. Physical Interpretability and Generalizability of the Expert Decomposition

The proposed modular expert decomposition is not merely a computational factorization, but an explicit encoding of different disturbance sources into physically meaningful substructures. This makes the framework extendable to other scenarios, including regime-partitioned modeling over a wide speed range (by assigning dedicated experts to different speed or flow regimes), scale-effect correction in multi-fidelity fusion (by combining baseline physical experts with lightweight correction experts), and demand-driven augmentation under high-dimensional coupling (by incrementally adding residual experts on top of an existing low-order backbone). Moreover, the modular design supports progressive deployment, allowing base experts to be introduced first and additional experts to be incorporated as new simulation, experimental, or full-scale data become available.

4.2. Identifiability Boundaries and the Methodological Role of Disturbance Generators

Overall, the results indicate that the encoder’s diagnostic performance is constrained by the distinguishability of residual fingerprints. SW and HF are more prone to joint attribution because their residual directions are relatively similar, whereas RD is more distinguishable due to its stronger yaw- and sway-related residual signatures. In this sense, the shallow-water, hull-fouling, and rudder-degradation corrections should be interpreted as structured disturbance generators that span physically plausible residual directions, rather than exact representations of all real-vessel parameter changes. Accordingly, λ should be understood as an equivalent operating coordinate in the expert basis, rather than a directly measurable physical quantity.
A remaining limitation is that the identifiability boundary has not yet been quantitatively defined. For cases with highly similar residual fingerprints, a unified criterion is still lacking for deciding whether experts should be separated or merged. Future work could address this issue by introducing quantitative measures such as residual-direction similarity, sensitivity condition numbers, and decomposition-stability indicators.

4.3. Long-Horizon Autoregressive Closed-Loop Prediction

A limitation of many existing studies is that strict closed-loop operation is not considered. In practical ship collision-avoidance tasks, prediction must be performed autoregressively, with the model continuously feeding back its own predicted states and residuals, rather than relying on one-step inference or ground-truth-referenced long-horizon prediction. Since engineering deployment typically requires stable forecasts over several minutes, future work should adopt stricter evaluation protocols based on sustained autoregressive rollouts, in order to assess the method under more realistic and practically relevant conditions.

4.4. Scope of Environmental Disturbance Modeling

The present framework is not intended to exhaustively model all environmental disturbances encountered during ship operation, but to verify whether physics-structured residual learning can decompose and compensate systematic model mismatches through disturbance-specific expert channels. Wave and current effects are also important in real navigation; however, their explicit inclusion would require additional environmental-state inputs and corresponding load models. In particular, wave loads can be coupled with the MMG equations as additional force and moment terms, but they depend on sea-state variables such as significant wave height, wave period, wave direction, and encounter frequency. Therefore, wave and current effects are regarded as important extensions of the proposed modular framework rather than independent expert channels in the present study.

4.5. The Encoder as an Implicit Low-Pass Filter and Its Scope of Applicability

The encoder is better understood as a mechanism for identifying slowly varying latent conditions from motion responses, rather than as a universal estimator for all disturbance sources. Under this view, it is particularly suitable for endogenous factors that evolve gradually and leave persistent signatures in the residual dynamics, such as shallow-water effects, hull fouling, and rudder degradation. By contrast, exogenous disturbances such as wind and waves are more naturally treated through direct sensing or dedicated external modules. A promising future direction is therefore a multimodal fusion framework, in which the encoder captures endogenous conditions from vessel responses, while external sensory measurements are integrated for exogenous disturbances, with attention-based or gated fusion used to form a unified representation.

4.6. What Causes Autoregressive Closed-Loop Divergence—Insufficient Fitting Accuracy or an Architectural Limitation?

To further examine whether autoregressive closed-loop divergence is caused mainly by residual fitting accuracy or by the closed-loop robustness of the model architecture, we conducted an additional teacher-forced residual-scale analysis, with the detailed results provided in Appendix B (Table A5). In this analysis, both BiLSTM and the proposed method were supplied with the same scaled ground-truth residual sequence, so that the two architectures were compared under identical residual-degradation conditions rather than under their own predicted residuals.
The results in Appendix B (Table A5) show that closed-loop stability is not determined solely by one-step residual fitting accuracy. Even when the residual input is controlled, the direct residual-regression structure of BiLSTM remains more sensitive to residual degradation, whereas the proposed method maintains acceptable trajectory accuracy under substantially lower residual scales. This indicates that the structured design of the proposed framework, namely condition encoding combined with disturbance-specific experts, reduces the effective learning difficulty and improves robustness to residual degradation and autoregressive feedback. Therefore, the closed-loop divergence observed in BiLSTM should be understood as the combined effect of residual-approximation sensitivity and autoregressive error amplification, rather than simply as a failure of one-step residual fitting.

4.7. Rationale for Simulation-Based Validation and the Boundary of Real-Data Applicability

The simulation environment adopted in this study is based on KVLCC2 and classical hydrodynamic theory. Compared with real-ship data, this setting provides controllable disturbance sources, known intensity labels, and repeatable operating conditions, which are necessary for evaluating source-wise decomposition and channel-wise attribution. Real-ship datasets, by contrast, rarely provide sufficiently clean labels for individual disturbance sources, especially when multiple disturbances coexist and evolve simultaneously. Therefore, simulation-based validation is used here as a controlled testbed for examining whether the proposed expert decomposition can separate and compensate different mismatch channels. Nevertheless, the practical applicability of the framework still needs to be further evaluated using real-ship maneuvering data collected under diverse operating and environmental conditions.
To further examine the robustness and generalizability of the proposed framework beyond the main KVLCC2 simulation cases, additional analyses are provided in the Supplementary Materials, including cross-vessel transfer performance on the Mariner benchmark ship based on the Abkowitz model (Table S1), robustness tests with unseen correction formulas (Table S2), and closed-loop prediction performance under unseen correction formulas (Table S3).
The MARIN experimental-data analysis provided in Appendix C (Figure A1) is therefore treated as an applicability-boundary check rather than a main validation result. Because the nominal MMG model and the MARIN free-running data differ in propulsion configuration, self-propulsion point, derivative sources, and residual scale, the resulting mismatch may lie outside the structured disturbance space covered during training. In such cases, forcing the residuals to be projected onto the predefined expert basis can lead to unstable or physically unreasonable attribution. This does not indicate a failure of the framework; instead, it clarifies its applicability condition: stable attribution and targeted compensation are meaningful only when the true mismatch remains within or near the structured disturbance space defined during training. For real-vessel deployment, additional experts, new disturbance priors, or retraining with data closer to the target system may be required, and external validity still needs to be verified through captive tests, free-running experiments, and richer full-scale datasets.

4.8. Practical Deployment, Data Requirements, and Computational Cost

For practical deployment, the proposed framework can be integrated as an additional prediction-correction module built on top of an existing maneuvering model, rather than replacing the navigation or control system. The onboard navigation system provides measured vessel states, while the control system provides rudder angle or rudder command and propeller speed. The nominal MMG or Abkowitz model first generates the baseline acceleration and trajectory prediction. The encoder and expert networks then provide residual-acceleration corrections, and the corrected short- to medium-horizon trajectory can be passed to downstream modules such as collision-risk assessment, trajectory planning, decision support, or model predictive control.
The main deployment challenge lies in the preparation of calibration data for the expected mismatch sources. Such data may come from a combination of CFD simulations, model tests, sea trials, and operational records. Shallow-water data can be collected from operations or tests under different water-depth-to-draft ratios, whereas wind data are relatively straightforward to obtain because wind speed and direction are directly measurable by onboard sensors. By contrast, hull fouling and especially rudder degradation are more difficult to label during routine operation, and may require inspection records, maintenance logs, dedicated effectiveness tests, controlled degradation simulations, or system-identification data collected before and after known condition changes.
Based on these data, the expert networks can be calibrated to learn the residual acceleration between the nominal maneuvering model and the observed or simulated disturbed dynamics. After the experts are trained and frozen, the encoder can be trained using multi-condition trajectory windows to infer slowly varying endogenous disturbance coordinates from recent residual histories. During online operation, the encoder is updated periodically, while the expert networks are evaluated at each integration step to correct the nominal prediction. Directly measurable exogenous disturbances such as wind are supplied through onboard sensors and compensated by the wind expert, whereas slowly varying endogenous conditions are inferred from residual histories.
The runtime results in Table 12 indicate that this deployment mode is compatible with real-time use. On the tested CPU platform, the per-step cost of the expert networks and RK4 integration is 0.60 ± 0.03 ms, and the amortized per-step cost including the periodic encoder update is 0.71 ± 0.03 ms, both of which are far below the integration step size of $\Delta t = 200$ ms used in the closed-loop prediction. These results suggest that the framework has sufficient computational margin for onboard implementation. Nevertheless, vessel-specific calibration, hardware-in-the-loop testing, and sea-trial validation are still required before practical deployment under real sensor noise and model uncertainty.
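The online operating mode described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `nominal_accel`, `expert_residual`, `encode_lambda`, and the update period `ENCODER_PERIOD` are hypothetical stand-ins, and the one-dimensional toy dynamics exist only to make the loop runnable. Only the 0.2 s step and the 0.71 ms amortized cost are taken from the text.

```python
DT = 0.2              # integration step in seconds (200 ms, as quoted in the text)
ENCODER_PERIOD = 50   # hypothetical: refresh the disturbance estimate every 50 steps


def nominal_accel(u, thrust):
    """Toy nominal model (stand-in for MMG): thrust minus quadratic drag."""
    return thrust - 0.5 * u * abs(u)


def expert_residual(u, lam):
    """Toy expert (stand-in for a trained MLP): extra drag scaled by intensity lam."""
    return -0.2 * lam * u * abs(u)


def encode_lambda(residual_history):
    """Toy encoder: maps the mean recent residual magnitude to an intensity in [0, 1]."""
    if not residual_history:
        return 0.0
    mean_abs = sum(abs(r) for r in residual_history) / len(residual_history)
    return max(0.0, min(1.0, mean_abs / 0.2))


def rk4_step(u, thrust, lam, dt=DT):
    """One RK4 step of the corrected dynamics: nominal acceleration plus expert residual."""
    def f(x):
        return nominal_accel(x, thrust) + expert_residual(x, lam)
    k1 = f(u)
    k2 = f(u + 0.5 * dt * k1)
    k3 = f(u + 0.5 * dt * k2)
    k4 = f(u + dt * k3)
    return u + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


def predict(u0, thrust, residual_history, n_steps):
    """Rollout in which the expert runs every step and the encoder only periodically."""
    u, lam, traj = u0, encode_lambda(residual_history), [u0]
    for k in range(1, n_steps + 1):
        u = rk4_step(u, thrust, lam)
        traj.append(u)
        if k % ENCODER_PERIOD == 0:
            # In a real system this would use a fresh residual window.
            lam = encode_lambda(residual_history)
    return traj


trajectory = predict(u0=1.0, thrust=0.5, residual_history=[0.1] * 300, n_steps=100)
budget_fraction = 0.71 / 200.0  # amortized per-step cost divided by the step size
```

With the figures quoted above, the amortized cost occupies well under 1% of each 200 ms step, which is the margin the text refers to.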

5. Conclusions

This study proposes a physics-structured residual learning framework for long-horizon autoregressive ship maneuvering prediction under multi-source model mismatch. Instead of absorbing all off-nominal effects into a single black-box correction, the proposed framework uses the MMG model as the physical backbone and decomposes systematic residual mismatch into disturbance-specific correction channels. Shallow-water effects, hull fouling, and rudder degradation are represented as endogenous slow-varying disturbance intensities inferred from residual sequences, while wind load is handled through a separate measured-input pathway. In this way, the framework combines source-wise residual compensation, physical interpretability, and stable closed-loop autoregressive prediction in a unified architecture.
The main results demonstrate that the proposed framework substantially improves long-horizon closed-loop prediction accuracy under complex disturbance conditions. Across the main closed-loop prediction and wind-ablation scenarios, the proposed method achieves position-RMSE reductions of 74.7–91.7% relative to the corresponding nominal-MMG and wind-ablation baselines. Compared with the BiLSTM baseline, the proposed method reduces the position RMSE from 255.3 m to 8.3 m in the severe shallow-water case and from 227.7 m to 14.4 m in the triple-source disturbance case. The wind-expert ablation further shows that explicitly compensating wind-induced residual forces and moments reduces the position RMSE from 296.1 m to 64.2 m under BF5 and from 586.8 m to 99.5 m under BF10.
However, the proposed framework still has several limitations. First, the disturbance database is generated using simplified correction formulas rather than full CFD simulations or real-ship measurements, which reduces the physical fidelity of the generated disturbances to some extent but enables controllable and large-scale multi-source data generation. Second, the quantitative separability boundary among different disturbance sources has not yet been fully established. Although the residual-direction analysis indicates that the selected disturbance channels have distinguishable dominant signatures, a general criterion for deciding when disturbance sources should be separated into independent experts or merged into a shared correction module remains to be developed. Third, the present study does not aim to exhaustively model all environmental disturbances encountered during ship operation. Instead, representative parameter-modification disturbances and one directly measurable external load, namely wind, are used to verify the effectiveness of the proposed physics-structured residual learning framework.
Wave and current effects are also important in real navigation, but their explicit inclusion would require additional sea-state or current-field inputs and corresponding environmental load models, which would considerably expand the scope of the present study. Therefore, these effects are not treated as independent expert channels here. Future work will focus on collecting and constructing real-ship maneuvering datasets under diverse disturbance conditions, extending the framework to wave- and current-inclusive scenarios, conducting real-ship validation, and integrating the method with onboard navigation and control systems.
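The relative reductions quoted in the conclusions follow from a simple ratio of position RMSE values. The short sketch below recomputes it for the two wind-ablation pairs stated above; the function name `rmse_reduction` is illustrative and not taken from the paper.

```python
def rmse_reduction(baseline_m, proposed_m):
    """Relative position-RMSE reduction in percent."""
    return 100.0 * (baseline_m - proposed_m) / baseline_m


# Position RMSE pairs (m) without and with the wind expert, as quoted in the conclusions.
wind_bf5 = rmse_reduction(296.1, 64.2)
wind_bf10 = rmse_reduction(586.8, 99.5)

print(f"BF5:  {wind_bf5:.1f}% reduction")
print(f"BF10: {wind_bf10:.1f}% reduction")
```

Both values lie inside the 74.7–91.7% range reported for the nominal-MMG and wind-ablation baselines.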

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jmse14090808/s1, Supplementary text: Additional analyses on the robustness and generalizability of the proposed physics-structured residual learning framework; Table S1: Cross-vessel transfer performance on the Mariner benchmark ship based on the Abkowitz model; Table S2: Robustness test with unseen correction formulas; Table S3: Closed-loop prediction performance under unseen correction formulas.

Author Contributions

Conceptualization, Z.X.; Methodology, Z.X.; Software, Z.X.; Formal analysis, Z.X.; Investigation, Z.Y.; Resources, B.L.; Writing—original draft, Z.X.; Writing—review and editing, Z.Y., B.L. and X.W.; Supervision, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The minimal dataset supporting the findings of this study is publicly available in Zenodo at https://doi.org/10.5281/zenodo.19363618. This dataset includes representative figure files, numerical data for the main tables and trajectory plots, and the core evaluation and plotting scripts used to generate the key figures and summary statistics reported in the manuscript. The remaining raw data are not publicly available at this stage because they require further curation, documentation, and organization before they can be shared in a clear and reusable form. However, additional raw data may be made available from the corresponding author upon reasonable request.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (GPT-5.4 Thinking) for English translation and language polishing of the initial Chinese draft. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Disturbance Correction Formulation

The shallow-water resistance increment is modeled using the Lackenby formula [7]. The maneuvering derivatives and added masses are corrected based on the regression formulas of Kijima and Nakiri [6]. Using the standard variable $\sigma = 1/\left[1 - (T/h)^{2}\right]$, each parameter is scaled according to the power-law form $C = \sigma^{n_D}$, where the exponent $n_D$ is taken from the regression results reported by Kijima for tanker-type vessels with $C_b > 0.7$.
Table A1. Parameter correction coefficients and sources for the shallow-water condition.
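| Parameter | Correction coefficient $C$ | Source |
| --- | --- | --- |
| $R_0$ | $1 + 0.57\,(T/h)^{1.79}$ | Lackenby [7] |
| $m_x$ | $\sigma^{0.15}$ | Kijima [6] |
| $m_y$ | $\sigma^{1.00}$ | Kijima [6] |
| $J_z$ | $\sigma^{0.50}$ | Kijima [6] |
| $Y_v$ | $\sigma^{0.40}$ | Kijima [6] |
| $Y_r$ | $\sigma^{0.10}$ | Kijima [6] |
| $N_v$ | $\sigma^{0.30}$ | Kijima [6] |
| $N_r$ | $\sigma^{0.25}$ | Kijima [6] |
| $Y_{vvv}$, $Y_{vvr}$ | $\sigma^{0.40}$ | Follows $Y_v$ |
| $Y_{vrr}$, $Y_{rrr}$ | $\sigma^{0.10}$ | Follows $Y_r$ |
| $N_{vvv}$, $N_{vvr}$ | $\sigma^{0.30}$ | Follows $N_v$ |
| $N_{vrr}$, $N_{rrr}$ | $\sigma^{0.25}$ | Follows $N_r$ |
| $X_{vv}$, $X_{vr}$, $X_{rr}$ | $\sigma^{0.20}$ | Engineering estimate |
| $f_\alpha$ | $\sigma^{0.10}$ | Engineering estimate |
| $w_{p0}$ | $\sigma^{0.20}$ | Engineering estimate |
| $t_p$ | $\sigma^{0.10}$ | Engineering estimate |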
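The power-law scaling in Table A1 can be sketched in a few lines. The exponents below are transcribed from a subset of the table; the function names are illustrative, and the resistance row is read here as $1 + 0.57\,(T/h)^{1.79}$, which is an interpretation of the tabulated entry. The sketch takes $\sigma$ as an input so that it does not depend on how $\sigma$ is evaluated from $T/h$.

```python
# Power-law exponents n_D transcribed from Table A1 (linear terms only).
SHALLOW_EXPONENTS = {
    "m_x": 0.15, "m_y": 1.00, "J_z": 0.50,
    "Y_v": 0.40, "Y_r": 0.10, "N_v": 0.30, "N_r": 0.25,
}


def shallow_water_coefficients(sigma):
    """Return the multiplicative correction C = sigma ** n_D for each parameter."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    return {name: sigma ** n for name, n in SHALLOW_EXPONENTS.items()}


def resistance_factor(draft_to_depth):
    """Resistance correction from the R_0 row, read as 1 + 0.57 * (T/h) ** 1.79."""
    return 1.0 + 0.57 * draft_to_depth ** 1.79


deep = shallow_water_coefficients(1.0)      # sigma = 1 leaves every parameter unchanged
shallow = shallow_water_coefficients(4.0)   # arbitrary illustrative value of sigma
```

Because the exponents differ by parameter, a single value of $\sigma$ changes the sway added mass far more strongly than, for example, $Y_r$.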
Based on the six-point interpolation of Table I in Schultz [9] and the formula of Townsin [10], the fouling intensity is defined as $\lambda_{hf} \in [0, 1]$, corresponding to the Schultz fouling grades from AF (clean) to F (heavy calcareous fouling). The resistance increment $\Delta R_T$ and the friction-coefficient increment $\Delta C_F$ are obtained by linear interpolation from the table below.
Table A2. Hull Fouling Levels and Corresponding Roughness and Resistance Variations.
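| $\lambda_{hf}$ | Grade | $k_s$ (μm) | $\Delta C_F$ | $\Delta R_T$ |
| --- | --- | --- | --- | --- |
| 0.00 | AF | 30 | 0% | 0% |
| 0.15 | B | 100 | 10% | 5% |
| 0.30 | C | 300 | 21% | 11% |
| 0.55 | D | 1000 | 35% | 21% |
| 0.80 | E | 3000 | 52% | 34% |
| 1.00 | F | 10,000 | 80% | 55% |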
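The linear interpolation over Table A2 can be illustrated as follows. The rows are transcribed from the table with percentages written as fractions; the function name `fouling_increments` is hypothetical, and rejecting values outside $[0, 1]$ is a design choice of this sketch.

```python
from bisect import bisect_right

# (lambda_hf, delta_C_F, delta_R_T) rows transcribed from Table A2, as fractions.
FOULING_TABLE = [
    (0.00, 0.00, 0.00),
    (0.15, 0.10, 0.05),
    (0.30, 0.21, 0.11),
    (0.55, 0.35, 0.21),
    (0.80, 0.52, 0.34),
    (1.00, 0.80, 0.55),
]


def fouling_increments(lam_hf):
    """Linearly interpolate (delta_C_F, delta_R_T) for lam_hf in [0, 1]."""
    if not 0.0 <= lam_hf <= 1.0:
        raise ValueError("lam_hf must lie in [0, 1]")
    knots = [row[0] for row in FOULING_TABLE]
    # Index of the upper bracketing row, clamped so lam_hf = 1 uses the last segment.
    i = min(bisect_right(knots, lam_hf), len(knots) - 1)
    lo, hi = FOULING_TABLE[i - 1], FOULING_TABLE[i]
    w = (lam_hf - lo[0]) / (hi[0] - lo[0])
    return (lo[1] + w * (hi[1] - lo[1]), lo[2] + w * (hi[2] - lo[2]))


delta_cf, delta_rt = fouling_increments(0.20)  # fouling level of the triple-source case
```

For $\lambda_{hf} = 0.20$, the value lies one third of the way between grades B and C, so the interpolated increments are about 13.7% for $\Delta C_F$ and 7% for $\Delta R_T$.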
The correction of each parameter is formulated using the interpolated $\Delta C_F$ as an intermediate variable.
Table A3. Parameter Correction Coefficients Based on the Interpolated $\Delta C_F$ and Their Sources.
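| Parameter | Correction coefficient $C$ | Source |
| --- | --- | --- |
| $R_0$ | $1 + \Delta R_T$ | Schultz [9] |
| $w_{p0}$ | $1 + 0.20 \cdot \Delta C_F$ | Townsin [10] |
| $t_p$ | $1 + 0.15 \cdot \Delta C_F$ | Townsin [10] |
| $Y_v$ | $1 + 0.10 \cdot \Delta C_F$ | Engineering estimate |
| $N_v$ | $1 + 0.10 \cdot \Delta C_F$ | Engineering estimate |
| $X_{vv}$ | $1 + 0.12 \cdot \Delta C_F$ | Engineering estimate |
| $Y_{vvv}$ | $1 + 0.10 \cdot \Delta C_F$ | Engineering estimate |
| $N_{vvv}$ | $1 + 0.08 \cdot \Delta C_F$ | Engineering estimate |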
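A short sketch of how the Table A3 coefficients could be evaluated is given below. The gains are transcribed from the table; the function name is illustrative, and the increments are assumed to enter as fractions (for example, 0.80 for 80%), since a percentage value would produce implausibly large corrections.

```python
# Gains k in C = 1 + k * delta_C_F, transcribed from Table A3.
FOULING_GAINS = {
    "w_p0": 0.20, "t_p": 0.15, "Y_v": 0.10, "N_v": 0.10,
    "X_vv": 0.12, "Y_vvv": 0.10, "N_vvv": 0.08,
}


def fouling_coefficients(delta_cf, delta_rt):
    """Multiplicative corrections for the fouled hull; inputs are fractions, not percent."""
    coeffs = {name: 1.0 + k * delta_cf for name, k in FOULING_GAINS.items()}
    coeffs["R_0"] = 1.0 + delta_rt  # resistance uses delta_R_T directly
    return coeffs


# Heaviest tabulated fouling grade (F): delta_C_F = 80%, delta_R_T = 55%.
heavy = fouling_coefficients(delta_cf=0.80, delta_rt=0.55)
```

Under these assumptions, grade F raises the resistance coefficient by 55% while the maneuvering derivatives change by less than 10%.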
Rudder degradation is parameterized by the rudder lift-gradient degradation factor $d_f \in [0, 0.25]$, with the cascading effects governed by the MMG rudder–hull interaction model [2].
Table A4. Parameterization of Rudder Degradation and Associated MMG Interaction Effects.
| Parameter | Correction coefficient $C$ | Source |
| --- | --- | --- |
| $f_\alpha$ | $1 - 1.00\,d_f$ | Primary effect: reduced rudder lift |
| $\varepsilon$ | $1 - 0.20\,d_f$ | Degraded flow field at the rudder |
| $\gamma_R$ | $1 - 0.15\,d_f$ | Weakened flow-straightening effect |
| $a_H$ | $1 - 0.10\,d_f$ | Reduced interaction force |
| $t_R$ | $1 + 0.08\,d_f$ | Reduced interaction force |
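The linear corrections in Table A4 can be written compactly as signed gains on $d_f$. The sketch below transcribes those gains; the function name and the range check are illustrative choices, not part of the paper.

```python
# Signed gains g in C = 1 + g * d_f, transcribed from Table A4.
RUDDER_GAINS = {
    "f_alpha": -1.00,  # rudder lift gradient
    "epsilon": -0.20,  # flow field at the rudder
    "gamma_R": -0.15,  # flow-straightening coefficient
    "a_H": -0.10,      # rudder-hull interaction force factor
    "t_R": +0.08,      # steering resistance deduction factor
}


def rudder_coefficients(d_f):
    """Multiplicative corrections for a degradation factor d_f in [0, 0.25]."""
    if not 0.0 <= d_f <= 0.25:
        raise ValueError("d_f must lie in [0, 0.25]")
    return {name: 1.0 + g * d_f for name, g in RUDDER_GAINS.items()}


worst = rudder_coefficients(0.25)  # upper end of the stated range
```

At the upper bound $d_f = 0.25$, the lift gradient drops to 75% of its nominal value while the secondary interaction terms change by only a few percent.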

Appendix B. Teacher-Forced Residual-Scale Sensitivity Analysis

Table A5. Teacher-forced residual-scale sensitivity under identical degradation of residual information. Here, α denotes the common scaling factor applied to the ground-truth residual sequence before inference. α = 1.00 means full residual information, whereas α = 0.00 means that the residual channels are fully removed. Min α for Pos. RMSE ≤ threshold denotes the smallest residual scale at which the mean position RMSE still remains below the specified threshold across five random seeds. Smaller values indicate stronger robustness to residual-information degradation.
| Scenario | Model | Pos. RMSE at α = 1.00 (m) | Heading RMSE at α = 1.00 (deg) | Min α for Pos. RMSE ≤ 10 m | Min α for Pos. RMSE ≤ 20 m | Pos. RMSE at α = 0.00 (m) | Heading RMSE at α = 0.00 (deg) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Single severe shallow-water scenario (h/T = 1.2) | BiLSTM | 15.65 ± 4.57 | 2.97 ± 0.64 | N/A | 0.70 | 31.99 ± 6.28 | 5.43 ± 2.36 |
| Single severe shallow-water scenario (h/T = 1.2) | Ours | 1.29 ± 0.57 | 0.29 ± 0.13 | 0.60 | 0.30 | 27.88 ± 3.85 | 3.75 ± 2.10 |
| Triple-source disturbance scenario | BiLSTM | 15.89 ± 3.08 | 2.92 ± 0.37 | N/A | 0.55 | 28.86 ± 4.34 | 4.27 ± 1.92 |
| Triple-source disturbance scenario | Ours | 4.16 ± 1.54 | 0.32 ± 0.08 | 0.65 | 0.25 | 25.50 ± 2.51 | 3.12 ± 0.94 |
N/A indicates that the model did not satisfy the corresponding RMSE threshold within the tested range of α; therefore, the minimum α value is not applicable.
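The "Min α" columns of Table A5 can be derived from a sweep over residual scales. The sketch below shows one way to do this, assuming the metric means the smallest tested α for which the mean position RMSE stays at or below the threshold for that α and every larger tested α. The sweep values are hypothetical and are not results from the paper.

```python
def min_alpha_below(sweep, threshold_m):
    """Smallest alpha whose RMSE, and that of all larger alphas, is <= threshold.

    sweep maps each tested alpha to the mean position RMSE in metres.
    Returns None when even the largest alpha exceeds the threshold (N/A in Table A5).
    """
    best = None
    for alpha in sorted(sweep, reverse=True):
        if sweep[alpha] > threshold_m:
            break
        best = alpha
    return best


# Hypothetical sweep, for illustration only.
demo_sweep = {1.00: 4.0, 0.75: 7.5, 0.50: 12.0, 0.25: 19.0, 0.00: 26.0}

alpha_10 = min_alpha_below(demo_sweep, 10.0)
alpha_20 = min_alpha_below(demo_sweep, 20.0)
alpha_3 = min_alpha_below(demo_sweep, 3.0)
```

Scanning from the largest α downward makes the N/A case explicit: if the model misses the threshold even with full residual information, no α qualifies.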

Appendix C. Validation with MARIN Experimental Data

The mismatch scale in the MARIN data is substantially larger than that covered by the training disturbance space. The RMS residuals of the MARIN free-running trajectories reach approximately $(68\text{–}104) \times 10^{-3}$ in the surge and sway channels, whereas the shallow-water training condition at $h/T = 1.5$ produces only $3.3 \times 10^{-3}$ and $2.9 \times 10^{-3}$, respectively. This corresponds to a roughly 20–35 times larger residual magnitude.
Figure A1. Encoder-estimated condition parameters λ applied to MARIN free-running model test data (KVLCC2, deep water). (a) 10°/10° zigzag, (b) −10°/−10° zigzag, (c) 20°/20° zigzag, and (d) −20°/−20° zigzag.

References

  1. Abkowitz, M.A. Measurement of Hydrodynamic Characteristics from Ship Maneuvering Trials by System Identification. SNAME Trans. 1980, 88, 283–318.
  2. Yasukawa, H.; Yoshimura, Y. Introduction of MMG Standard Method for Ship Maneuvering Predictions. J. Mar. Sci. Technol. 2015, 20, 37–52.
  3. Explanatory Notes to the Standards for Ship Manoeuvrability; Polski Rejestr Statków: Gdańsk, Poland, 2008.
  4. Kołodziej, R.; Hoffmann, P. Numerical Estimation of Hull Hydrodynamic Derivatives in Ship Maneuvering Prediction. Pol. Marit. Res. 2021, 28, 46–53.
  5. Yuan, X.; Li, Z.; Wang, G.; Xue, W. Full-Scale Maneuvering Trials Correction and Motion Modelling Considering the Actual Sea Conditions. Sensors 2020, 20, 3963.
  6. Kijima, K.; Nakiri, Y. Prediction Method of Ship Manoeuvrability in Deep and Shallow Waters. In Proceedings of the MARSIM and ICSM 1990, Tokyo, Japan, 4–7 June 1990; pp. 311–318.
  7. Lackenby, H. The Effect of Shallow Water on Ship Speed. Nav. Eng. J. 1964, 76, 21–26.
  8. Li, J.; Wang, Q.; Dong, K.; Wang, X. Numerical Simulations of a Ship’s Maneuverability in Shallow Water. J. Mar. Sci. Eng. 2024, 12, 1076.
  9. Schultz, M.P. Effects of Coating Roughness and Biofouling on Ship Resistance and Powering. Biofouling 2007, 23, 331–341.
  10. Townsin, R.L. The Ship Hull Fouling Penalty. Biofouling 2003, 19, 9–15.
  11. Choi, J.-E.; Kim, J.-H.; Lee, H.-G. Computational Investigation of Cavitation on a Semi-Spade Rudder. J. Mar. Sci. Technol. 2010, 15, 64–77.
  12. Hao, J.; Zhang, M.; Huang, X. The Influence of Surface Roughness on Cloud Cavitation Flow around Hydrofoils. Acta Mech. Sin. 2018, 34, 10–21.
  13. Yoshimura, Y.; Nakamura, M.; Taniguchi, T.; Yasukawa, H. Empirical Formulas of Hydrodynamic Parameters for Predicting Ship Maneuvering Based on the MMG-Model. Ocean Eng. 2025, 337, 121831.
  14. Kim, D.; Tezdogan, T.; Incecik, A. Hydrodynamic Analysis of Ship Manoeuvrability in Shallow Water Using High-Fidelity URANS Computations. Appl. Ocean Res. 2022, 123, 103176.
  15. Maljković, M.; Pavić, I.; Meštrović, T.; Perkovič, M. Ship Maneuvering in Shallow and Narrow Waters: Predictive Methods and Model Development Review. J. Mar. Sci. Eng. 2024, 12, 1450.
  16. Fossen, T.I. How to Incorporate Wind, Waves and Ocean Currents in the Marine Craft Equations of Motion. IFAC Proc. Vol. 2012, 45, 126–131.
  17. Gokarn, R.P. Environmental Effects. In A Study of Ship Manoeuvrability; Springer: Singapore, 2024; pp. 69–84.
  18. Choi, W.; Min, G.; Han, S.; Yun, H.; Terziev, M.; Dai, S.; Kim, D.; Song, S. Resistance and Speed Penalty of a Naval Ship with Hull Roughness. Ocean Eng. 2024, 312, 119058.
  19. Farkas, A.; Degiuli, N.; Martić, I.; Dejhalla, R. Impact of Hard Fouling on the Ship Performance of Different Ship Forms. J. Mar. Sci. Eng. 2020, 8, 748.
  20. Zhu, M.; Hahn, A.; Wen, Y.; Bolles, A. Parameter Identification of Ship Maneuvering Models Using Recursive Least Square Method Based on Support Vector Machines. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2017, 11, 23–29.
  21. Haseltalab, A.; Negenborn, R.R. Adaptive Control for Autonomous Ships with Uncertain Model and Unknown Propeller Dynamics. Control Eng. Pract. 2019, 91, 104116.
  22. Menges, D.; Rasheed, A. An Environmental Disturbance Observer Framework for Autonomous Surface Vessels. Ocean Eng. 2023, 285, 115412.
  23. Xu, H.; Guedes Soares, C. Review of System Identification for Manoeuvring Modelling of Marine Surface Ships. J. Mar. Sci. Appl. 2025, 24, 459–478.
  24. Zhang, C.; Yu, S. Disturbance Observer-Based Prescribed Performance Super-Twisting Sliding Mode Control for Autonomous Surface Vessels. ISA Trans. 2023, 135, 13–22.
  25. Liu, D.; Gao, X.; Huo, C.; Su, W. Research on Maneuvering Motion Prediction for Intelligent Ships Based on LSTM-Multi-Head Attention Model. J. Mar. Sci. Eng. 2025, 13, 503.
  26. Guo, S.; Zhuang, S.; Wang, J.; Peng, X.; Liu, Y. Deep Learning-Based Non-Parametric System Identification and Interpretability Analysis for Improving Ship Motion Prediction. J. Mar. Sci. Eng. 2025, 13, 2017.
  27. Wang, Z.; Xu, H.; Xia, L.; Zou, Z.; Soares, C.G. Kernel-Based Support Vector Regression for Nonparametric Modeling of Ship Maneuvering Motion. Ocean Eng. 2020, 216, 107994.
  28. Moreira, L.; Soares, C.G. Simulating Ship Manoeuvrability with Artificial Neural Networks Trained by a Short Noisy Data Set. J. Mar. Sci. Eng. 2023, 11, 15.
  29. Guzelbulut, C. Identification of Ship Maneuvering Behavior Using Singular Value Decomposition-Based Hydrodynamic Variations. J. Mar. Sci. Eng. 2025, 13, 496.
  30. Jiang, Y.; Hou, X.-R.; Wang, X.-G.; Wang, Z.-H.; Yang, Z.-L.; Zou, Z.-J. Identification Modeling and Prediction of Ship Maneuvering Motion Based on LSTM Deep Neural Network. J. Mar. Sci. Technol. 2022, 27, 125–137.
  31. Wang, Z.; Cheng, J.; Xu, L.; Hao, L.; Peng, Y. Hybrid Physics-ML Modeling for Marine Vehicle Maneuvering Motions in the Presence of Environmental Disturbances. arXiv 2024, arXiv:2411.13908.
  32. An, G.; Xiang, G.; Xiang, X.; Guedes Soares, C. Physics Informed Neural Networks Based Identification Modelling of Ship Maneuvering Motion and Associated Optimal Excitation Design. Eng. Appl. Comput. Fluid Mech. 2025, 19, 2566860.
  33. Zhang, Y.-W.; Xia, W.-K.; Zhu, M.-Y.; Zhang, X.-Y.; Liu, J.-D. IKN-NeuralODE Continuous-Time Modeling Method for Ship Maneuvering Motion. J. Mar. Sci. Eng. 2026, 14, 546.
  34. Moreira, L.; Soares, C.G. Investigation of Vessel Manoeuvring Abilities in Shallow Depths by Applying Neural Networks. J. Mar. Sci. Eng. 2024, 12, 1664.
  35. Wakita, K.; Akimoto, Y.; Maki, A. Probabilistic Prediction of Ship Maneuvering Motion Using Ensemble Learning with Feedforward Neural Networks. arXiv 2024, arXiv:2412.00363.
  36. Jiang, Z.; Ma, Y.; Li, W. A Data-Driven Method for Ship Motion Forecast. J. Mar. Sci. Eng. 2024, 12, 291.
  37. Andreas, J.; Rohrbach, M.; Darrell, T.; Klein, D. Neural Module Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 39–48.
  38. Chakraborty, D.; Maulik, R.; Harrington, P.; Foster, D.; Nabian, M.A.; Choudhry, S. MoWE: A Mixture of Weather Experts. arXiv 2025, arXiv:2509.09052.
  39. Mian, Z.; Deng, X.; Dong, X.; Tian, Y.; Cao, T.; Chen, K.; Jaber, T.A. A Literature Review of Fault Diagnosis Based on Ensemble Learning. Eng. Appl. Artif. Intell. 2024, 127, 107357.
  40. Lai, J.; Zhang, Y.; Zhao, C.; Wang, J.; Yan, Y.; Chen, M.; Ji, L.; Guo, J.; Han, B.; Shi, Y.; et al. Multi-Expert Ensemble ECG Diagnostic Algorithm Using Mutually Exclusive–Symbiotic Correlation between 254 Hierarchical Multiple Labels. npj Cardiovasc. Health 2024, 1, 8.
  41. Ma, Z.; Jiang, G.; Chen, J. Physics-Informed Ensemble Learning with Residual Modeling for Enhanced Building Energy Prediction. Energy Build. 2024, 323, 114853.
  42. Fujiwara, T.; Ueno, M.; Nimura, T. Estimation of Wind Forces and Moments Acting on Ships. J. Soc. Nav. Archit. Jpn. 1998, 1998, 77–90.
Figure 1. Framework of the proposed physics-structured residual learning method. The CNN-SE-BiLSTM encoder estimates disturbance intensities from a 60 s residual-state window. The labels SW, HF, RD, and Wind denote the MLP experts associated with shallow-water effects, hull fouling, rudder degradation, and wind loads, respectively. These experts generate residual acceleration corrections that are added to the nominal MMG dynamics for autoregressive closed-loop prediction.
Figure 2. Definition of the coordinate system used for ship maneuvering motion, redrawn based on Yasukawa and Yoshimura [2].
Figure 3. Multilayer perceptron (MLP) architecture used in this study.
Figure 4. CNN block of the proposed encoder.
Figure 5. Structure of the SE channel attention block.
Figure 6. Structure of the BiLSTM block.
Figure 7. Distinguishability analysis of residual signals under different disturbance conditions based on singular values, column cosine similarity, and Jacobian structure. (a) Singular values of the Jacobian matrix over time. (b) Pairwise column cosine similarity between disturbance types. (c) Row-normalized heatmap of the mean absolute Jacobian magnitude, $|J|$, averaged over the trajectory ($N = 193$). The numbers in each cell denote the corresponding unnormalized mean $|J_{ij}|$, while the colormap highlights the relative sensitivity pattern within each residual channel.
Figure 8. Scatter plots of predicted and true disturbance intensities for the three channels.
Figure 9. Closed-loop trajectory and state prediction results under the severe shallow-water random steering scenario (h/T = 1.2, $\lambda_{sw}$ = 1.00).
Figure 10. Closed-loop trajectory and state prediction results under the triple-source disturbance scenario (SW: h/T = 1.5, $\lambda_{sw}$ = 0.92; HF: $\lambda_{hf}$ = 0.20; RD: $\lambda_{rd}$ = 0.60).
Figure 11. Closed-loop state prediction results under the severe shallow-water 10°/10° zigzag maneuver (h/T = 1.2, $\lambda_{sw}$ = 1.00). Observation phase: 0–400 s; autonomous prediction phase: 400–800 s.
Figure 12. Wind ablation study for closed-loop state prediction under the compound 10°/10° zigzag maneuver (base condition: SW: h/T = 1.5, $\lambda_{sw}$ = 0.92; HF: $\lambda_{hf}$ = 0.20; RD: $\lambda_{rd}$ = 0.20). Results for BF5 and BF10 wind intensities are compared with and without the dedicated wind expert module. Observation phase: 0–120 s; autonomous prediction phase: 120–800 s.
Figure 13. Tracking performance of disturbance intensity estimation under time-varying severe shallow-water conditions. (a) Prescribed time-varying water-depth-to-draft ratio (h/T). (b) Estimated disturbance intensities ($\lambda$) compared with the true values and UKF results. (c) Tracking errors between the estimated and true disturbance intensities.
Figure 14. Counterfactual channel intervention for the disturbance-intensity encoder. (a) Donor-source preference gap distribution across 90 matched intervention cases, with case-level mean gaps annotated above each box. The black diamond denotes the mean value of each group. (b) Window-level donor top-1 identification rate under each intervention setting.
Table 1. List of symbols.
| Symbol | Physical meaning |
| --- | --- |
| $u$, $v_m$, $r$ | Surge velocity, sway velocity at midship, yaw rate |
| $m$, $m_x$, $m_y$ | Mass of the ship, added mass in the surge direction, added mass in the sway direction |
| $\dot{u}$, $\dot{v}_m$, $\dot{r}$ | Surge acceleration, midship sway acceleration, yaw angular acceleration |
| $X_H$, $Y_H$, $N_H$ | Hydrodynamic surge force, sway force, and yaw moment acting on the hull |
| $X_P$ | Propeller thrust in the surge direction |
| $X_R$, $Y_R$, $N_R$ | Surge force, sway force, and yaw moment generated by the rudder |
| $I_{zz}$, $J_{zz}$ | Yaw moment of inertia, moment of inertia about the z-axis |
| $X_G$ | Longitudinal position of the center of gravity |
| $\Psi$, $\delta$, $\delta_{\mathrm{cmd}}$, $n$ | Heading angle, rudder angle, rudder angle command, propeller speed |
| $h/T$ | Water-depth-to-draft ratio |
Table 2. Comparison between Wind-Induced Residuals and Shallow-Water Residuals in turning 20° (initial 90° beam wind).
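| Condition | $\lambda$ | $\Delta a_{\mathrm{surge}}$ | $\Delta a_{\mathrm{sway}}$ | $\Delta a_{\mathrm{yaw}}$ | $\lVert \Delta a \rVert$ | Ratio to SW (h/T = 1.2) |
| --- | --- | --- | --- | --- | --- | --- |
| BF 1 (0.8 m/s) | | $4.63 \times 10^{-5}$ | $5.41 \times 10^{-5}$ | $1.01 \times 10^{-7}$ | $7.12 \times 10^{-5}$ | 0.5% |
| BF 3 (4.3 m/s) | | $4.26 \times 10^{-5}$ | $1.33 \times 10^{-4}$ | $2.84 \times 10^{-7}$ | $1.39 \times 10^{-4}$ | 0.9% |
| BF 5 (9.3 m/s) | 0.0 | $2.52 \times 10^{-5}$ | $2.81 \times 10^{-4}$ | $8.99 \times 10^{-7}$ | $2.82 \times 10^{-4}$ | 1.8% |
| BF 8 (18.9 m/s) | 0.6 | $7.66 \times 10^{-6}$ | $9.21 \times 10^{-4}$ | $4.92 \times 10^{-6}$ | $9.21 \times 10^{-4}$ | 5.9% |
| BF 10 (26.4 m/s) | 1.0 | $3.47 \times 10^{-6}$ | $1.74 \times 10^{-3}$ | $1.11 \times 10^{-5}$ | $1.74 \times 10^{-3}$ | 11.1% |
| SW h/T = 3.0 | 0.53 | $1.15 \times 10^{-3}$ | $1.44 \times 10^{-3}$ | $5.56 \times 10^{-6}$ | $1.84 \times 10^{-3}$ | 11.7% |
| SW h/T = 1.5 | 0.92 | $5.05 \times 10^{-3}$ | $6.60 \times 10^{-3}$ | $2.49 \times 10^{-5}$ | $8.31 \times 10^{-3}$ | 52.9% |
| SW h/T = 1.2 | 1.00 | $1.07 \times 10^{-2}$ | $1.15 \times 10^{-2}$ | $4.29 \times 10^{-5}$ | $1.57 \times 10^{-2}$ | 100% |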
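The last two columns of Table 2 can be reproduced from the per-channel residuals. The sketch below assumes that the combined magnitude is the Euclidean norm of the three tabulated channels, which matches the tabulated values to the printed precision; the function name is illustrative.

```python
import math


def residual_norm(surge, sway, yaw):
    """Euclidean norm of the three residual-acceleration channels (assumed definition)."""
    return math.sqrt(surge ** 2 + sway ** 2 + yaw ** 2)


# Channel values transcribed from Table 2.
sw_12 = residual_norm(1.07e-2, 1.15e-2, 4.29e-5)   # shallow water, h/T = 1.2
bf_10 = residual_norm(3.47e-6, 1.74e-3, 1.11e-5)   # wind, BF 10

ratio_percent = 100.0 * bf_10 / sw_12
print(f"|da| SW h/T=1.2: {sw_12:.3e}, BF 10: {bf_10:.3e}, ratio: {ratio_percent:.1f}%")
```

Under this reading, even the strongest tabulated wind case produces a residual of roughly one ninth of the severe shallow-water residual.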
Table 3. Configuration of the Expert Networks.
| Item | Value |
| --- | --- |
| Hidden dimensions | [256, 256, 128] |
| Input dim (SW/HF/RD/Wind) | 7D/10D (wind) |
| Optimizer | AdamW (lr = $1 \times 10^{-3}$, weight decay = $1 \times 10^{-5}$) |
| LR schedule | ReduceLROnPlateau (factor = 0.5) |
| Early stopping | patience = 40 epochs |
| I/O normalization | Z-score (per-channel) |
| Training data | 600 s episodes, dt = 0.2 s, ~630 K samples per expert |
Table 4. Detailed architecture and parameter configuration of the proposed CNN–SE–BiLSTM model.
| Layer | Output Size | Kernel/Hidden Size + Stride | Parameters |
| --- | --- | --- | --- |
| Input | 10 × 300 | | |
| Conv1 + BN + ReLU + Dropout | 64 × 150 | k = 7; s = 2 | 4.7 K |
| Conv2 + BN + ReLU + Dropout | 128 × 75 | k = 5; s = 2 | 41.3 K |
| Conv3 + BN + ReLU + Dropout | 128 × 75 | k = 3; s = 1 | 49.5 K |
| SE Block | 128 × 75 | r = 8 | 4.2 K |
| BiLSTM (2 layers) | 256 | h = 128 | 660 K |
| FC1 + ReLU + Dropout | 128 | | 33 K |
| FC2 + ReLU + Dropout | 64 | | 8.3 K |
| FC3 + Sigmoid | 3 | | 0.2 K |
| Total | | | ~801 K |
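
The per-layer sizes and parameter counts in Table 4 can be reproduced with plain arithmetic. The sketch below is one accounting that matches the table, under assumptions the table does not state: "same"-style padding of k // 2, biases on every convolution and linear layer, learnable scale and shift in each batch-norm layer, an SE block built from two biased linear layers with reduction r, and an LSTM parameterization with two bias vectors per gate set (as in PyTorch). All function names are illustrative.

```python
def conv_out_len(length, k, s):
    """Output length of a 1-D convolution with padding k // 2 (assumed)."""
    p = k // 2
    return (length + 2 * p - k) // s + 1


def conv_bn_params(c_in, c_out, k):
    """Conv1d weights and bias plus batch-norm scale and shift."""
    return (c_in * c_out * k + c_out) + 2 * c_out


def se_params(c, r):
    """Squeeze-and-excitation: two biased linear layers, c -> c/r -> c."""
    h = c // r
    return (c * h + h) + (h * c + c)


def bilstm_params(input_size, hidden, layers):
    """Bidirectional LSTM with two bias vectors per direction and layer."""
    total, in_size = 0, input_size
    for _ in range(layers):
        per_direction = 4 * hidden * (in_size + hidden) + 8 * hidden
        total += 2 * per_direction
        in_size = 2 * hidden  # next layer sees both directions
    return total


def linear_params(n_in, n_out):
    """Fully connected layer with bias."""
    return n_in * n_out + n_out


def encoder_param_table():
    """Learnable parameters per row of Table 4."""
    return {
        "Conv1": conv_bn_params(10, 64, 7),
        "Conv2": conv_bn_params(64, 128, 5),
        "Conv3": conv_bn_params(128, 128, 3),
        "SE": se_params(128, 8),
        "BiLSTM": bilstm_params(128, 128, 2),
        "FC1": linear_params(256, 128),
        "FC2": linear_params(128, 64),
        "FC3": linear_params(64, 3),
    }


def bn_buffer_count(channels=(64, 128, 128)):
    """Non-learnable batch-norm buffers: running mean, variance, step counter."""
    return sum(2 * c + 1 for c in channels)


if __name__ == "__main__":
    table = encoder_param_table()
    for name, n in table.items():
        print(f"{name:7s} {n:>8,d}")
    learnable = sum(table.values())
    print("learnable:", learnable)
    print("with BN buffers:", learnable + bn_buffer_count())
```

With these assumptions the learnable total is 800,595, consistent with the ~801 K in Table 4. Adding the batch-norm buffers gives 801,238, which coincides with the encoder count listed in Table 9; whether the authors counted parameters this way is not stated.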
Table 5. Estimation accuracy of the proposed model for different disturbance categories.
| Disturbance Category | R² | MAE |
| --- | --- | --- |
| Shallow-water effects λ_sw | 0.990 | 0.012 |
| Hull fouling λ_hf | 0.993 | 0.013 |
| Rudder degradation λ_rd | 0.990 | 0.012 |
| Overall (3 channels) | 0.991 | 0.013 |
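
For reference, the two accuracy measures in Table 5 are sketched below in their standard forms: the coefficient of determination and the mean absolute error. The sample values are hypothetical and serve only to exercise the functions; they are not taken from the paper.

```python
def mae(y_true, y_pred):
    """Mean absolute error between true and estimated values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical latent-state values in [0, 1], for illustration only.
    lam_true = [0.0, 0.5, 1.0]
    lam_est = [0.1, 0.5, 0.9]
    print(mae(lam_true, lam_est), r2_score(lam_true, lam_est))
```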
Table 6. Closed-loop prediction performance comparison under the severe shallow-water random steering scenario (h/T = 1.2, λ_sw = 1.00), N = 10.
| Method | Position RMSE | Ψ RMSE | Final Position Error | nRMSE u | nRMSE v_m | nRMSE r |
| --- | --- | --- | --- | --- | --- | --- |
| Nominal MMG | 68.3 ± 28.4 | 7.1 ± 4.7 | 155.1 ± 71.1 | 1.84 ± 0.79 | 2.81 ± 1.32 | 1.15 ± 0.59 |
| BiLSTM | 255.3 ± 43.2 | 40.4 ± 9.0 | 639.9 ± 99.2 | 4.73 ± 4.17 | 11.67 ± 6.62 | 5.32 ± 1.77 |
| MLP | 55.5 ± 32.0 | 6.7 ± 5.3 | 130.9 ± 83.7 | 1.00 ± 0.53 | 2.66 ± 1.45 | 1.06 ± 0.63 |
| ConcatMLP (True λ) | 8.5 ± 5.8 | 1.1 ± 0.6 | 20.5 ± 13.2 | 0.10 ± 0.04 | 0.41 ± 0.20 | 0.16 ± 0.07 |
| UKF | 9.8 ± 14.1 | 1.1 ± 1.4 | 22.7 ± 32.2 | 0.15 ± 0.07 | 0.32 ± 0.18 | 0.17 ± 0.07 |
| Oracle λ | 12.0 ± 7.0 | 1.5 ± 0.7 | 28.2 ± 15.7 | 0.21 ± 0.16 | 0.49 ± 0.28 | 0.20 ± 0.09 |
| Ours | 8.3 ± 3.0 | 1.0 ± 0.5 | 19.4 ± 6.4 | 0.19 ± 0.13 | 0.38 ± 0.23 | 0.17 ± 0.07 |
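
The closed-loop tables report a position RMSE and a final position error without restating their definitions. A minimal sketch of a common reading is given below: the RMSE is taken over the planar distance between true and predicted positions at every step, and the final position error is that distance at the last step. These definitions and the toy trajectory are assumptions for illustration, not values from the paper.

```python
import math


def position_rmse(true_xy, pred_xy):
    """Root-mean-square planar distance over a whole trajectory."""
    squared = [(xt - xp) ** 2 + (yt - yp) ** 2
               for (xt, yt), (xp, yp) in zip(true_xy, pred_xy)]
    return math.sqrt(sum(squared) / len(squared))


def final_position_error(true_xy, pred_xy):
    """Planar distance between true and predicted positions at the last step."""
    (xt, yt), (xp, yp) = true_xy[-1], pred_xy[-1]
    return math.hypot(xt - xp, yt - yp)


if __name__ == "__main__":
    # Hypothetical three-step trajectory with growing lateral drift.
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    pred = [(0.0, 0.0), (1.0, 3.0), (2.0, 4.0)]
    print(position_rmse(truth, pred), final_position_error(truth, pred))
```

In an autoregressive rollout the error typically accumulates, so the final position error is usually larger than the trajectory RMSE, as is the case in every row of the table.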
Table 7. Closed-loop prediction performance comparison under the triple-source disturbance scenario (SW: h/T = 1.5, λ_sw = 0.92; HF: λ_hf = 0.20; RD: λ_rd = 0.60), N = 10.
| Method | Position RMSE | Ψ RMSE | Final Position Error | nRMSE u | nRMSE v_m | nRMSE r |
| --- | --- | --- | --- | --- | --- | --- |
| Nominal MMG | 56.9 ± 25.1 | 4.9 ± 3.9 | 123.6 ± 62.5 | 1.69 ± 0.69 | 1.44 ± 0.50 | 0.85 ± 0.35 |
| BiLSTM | 227.7 ± 60.7 | 34.1 ± 9.2 | 574.2 ± 144.8 | 2.94 ± 2.92 | 7.18 ± 4.79 | 4.34 ± 1.73 |
| MLP | 47.7 ± 27.7 | 5.8 ± 3.5 | 111.6 ± 68.4 | 0.84 ± 0.48 | 1.38 ± 0.58 | 0.81 ± 0.38 |
| ConcatMLP (True λ) | 1.5 ± 0.6 | 0.1 ± 0.1 | 3.3 ± 1.6 | 0.05 ± 0.02 | 0.06 ± 0.03 | 0.02 ± 0.01 |
| UKF | 19.5 ± 13.2 | 2.0 ± 1.7 | 46.5 ± 31.7 | 0.53 ± 0.30 | 0.46 ± 0.23 | 0.27 ± 0.15 |
| Oracle λ | 3.7 ± 1.1 | 0.3 ± 0.2 | 7.8 ± 2.3 | 0.16 ± 0.04 | 0.10 ± 0.05 | 0.05 ± 0.02 |
| Ours | 14.4 ± 6.7 | 1.2 ± 0.9 | 33.4 ± 15.9 | 0.51 ± 0.28 | 0.31 ± 0.11 | 0.19 ± 0.10 |
Table 8. Closed-loop prediction performance comparison under the severe shallow-water 10°/10° zigzag maneuver (h/T = 1.2, λ_sw = 1.00), N = 10.
| Method | Position RMSE | Ψ RMSE | Final Position Error | nRMSE u | nRMSE v_m | nRMSE r |
| --- | --- | --- | --- | --- | --- | --- |
| Nominal MMG | 256.1 ± 99.6 | 16.7 ± 7.0 | 557.5 ± 270.3 | 5.28 ± 1.17 | 1.81 ± 0.26 | 0.96 ± 0.19 |
| BiLSTM | 811.5 ± 86.3 | 62.1 ± 6.1 | 1913.4 ± 170.1 | 11.62 ± 7.23 | 4.02 ± 0.15 | 2.94 ± 0.13 |
| MLP | 189.7 ± 84.1 | 14.0 ± 6.2 | 448.5 ± 226.5 | 3.14 ± 1.47 | 1.53 ± 0.30 | 0.75 ± 0.20 |
| ConcatMLP (True λ) | 5.7 ± 4.4 | 0.5 ± 0.5 | 13.9 ± 13.3 | 0.23 ± 0.10 | 0.10 ± 0.03 | 0.04 ± 0.02 |
| UKF | 20.1 ± 6.9 | 1.7 ± 1.1 | 46.6 ± 27.0 | 1.00 ± 0.38 | 0.24 ± 0.10 | 0.14 ± 0.05 |
| Oracle λ | 17.3 ± 6.6 | 0.9 ± 0.4 | 38.3 ± 14.0 | 1.13 ± 0.22 | 0.09 ± 0.03 | 0.05 ± 0.01 |
| Ours | 21.2 ± 13.1 | 2.3 ± 0.7 | 59.4 ± 30.0 | 0.88 ± 0.45 | 0.28 ± 0.02 | 0.17 ± 0.02 |
Table 9. Architecture and training configuration of all compared methods.
| Method | Architecture | Hidden Config | Params | Input | Output | Training Traj |
| --- | --- | --- | --- | --- | --- | --- |
| BiLSTM | CNN-SE-BiLSTM | Conv [64, 128, 256], SE, BiLSTM (h = 256, L = 2) | 2,792,931 | 10-channel sliding window | 3-DOF residual (Δu̇, Δv̇_m, Δṙ) | 4730 (same as Encoder) |
| MLP | 4-layer MLP | [1024, 1024, 512, 256] | 1,719,299 | (u, v, r, δ, δ̇, n) | 3-DOF residual (Δu̇, Δv̇_m, Δṙ) | 4730 (same as Encoder) |
| ConcatMLP (True λ) | 3-layer MLP | [512, 512, 256] | 403,459 | 11D (state + true λ) | 3-DOF residual (Δu̇, Δv̇_m, Δṙ) | 700 (multi-condition) |
| Ours Encoder | CNN-SE-BiLSTM | Conv [64, 128, 128], SE, BiLSTM (h = 128, L = 2) | 801,238 | 10-channel sliding window | λ | 4730 |
| Ours Expert | Per-condition MLP | [256, 256, 128] | 3 × 102,403 + 1 × 103,171 | SW/HF/RD: 7D; Wind: 10D | 3-DOF residual (Δu̇, Δv̇_m, Δṙ) | 210 |
| Ours Total | | | ~1.21 M | | | |
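
The MLP parameter counts in Table 9 can be reproduced with a short calculation. The sketch below is one accounting that matches all four MLP entries exactly; it assumes biased linear layers, a 3-output head, and a normalization layer with two learnable parameters per unit after every hidden layer. The table does not state the normalization, so that part is an inference from the numbers rather than a documented design choice.

```python
def mlp_params(n_in, hidden, n_out=3, norm_params_per_unit=2):
    """Parameter count of an MLP with an assumed per-layer normalization."""
    total, prev = 0, n_in
    for width in hidden:
        total += prev * width + width          # linear weights and bias
        total += norm_params_per_unit * width  # assumed scale and shift
        prev = width
    total += prev * n_out + n_out              # output head
    return total


EXPERT_HIDDEN = [256, 256, 128]       # from Table 3 and Table 9
ENCODER_PARAMS = 801_238              # from Table 9


def framework_total():
    """Three 7-D experts, one 10-D wind expert, and the encoder."""
    experts = 3 * mlp_params(7, EXPERT_HIDDEN) + mlp_params(10, EXPERT_HIDDEN)
    return experts, experts + ENCODER_PARAMS


if __name__ == "__main__":
    print("expert (7D):  ", mlp_params(7, EXPERT_HIDDEN))
    print("expert (10D): ", mlp_params(10, EXPERT_HIDDEN))
    print("ConcatMLP:    ", mlp_params(11, [512, 512, 256]))
    print("MLP baseline: ", mlp_params(6, [1024, 1024, 512, 256]))
    print("experts, total:", framework_total())
```

Under these assumptions the four experts sum to 410,380 parameters and the full framework to 1,211,618, in line with the ~410 K and ~1.21 M figures reported in Tables 12 and 9.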
Table 10. Wind expert ablation results under the compound zigzag maneuver with BF5 and BF10 wind intensities.
| Config | Position RMSE | Endpoint Error |
| --- | --- | --- |
| BF5 No Expert | 296.1 m | 763.0 m |
| BF5 With Expert | 64.2 m | 170.9 m |
| BF10 No Expert | 586.8 m | 1389.9 m |
| BF10 With Expert | 99.5 m | 163.7 m |
Table 11. Summary of position-RMSE reductions in the main closed-loop prediction and wind-ablation scenarios.
| Scenario | Reference Comparison | Position RMSE | Position-RMSE Reduction |
| --- | --- | --- | --- |
| Severe shallow-water random steering | Nominal MMG → Ours | 68.3 m → 8.3 m | 87.8% |
| Triple-source random steering | Nominal MMG → Ours | 56.9 m → 14.4 m | 74.7% |
| Severe shallow-water 10°/10° zigzag | Nominal MMG → Ours | 256.1 m → 21.2 m | 91.7% |
| BF5 wind-inclusive zigzag | No Wind Expert → With Wind Expert | 296.1 m → 64.2 m | 78.3% |
| BF10 wind-inclusive zigzag | No Wind Expert → With Wind Expert | 586.8 m → 99.5 m | 83.0% |
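
The reductions in Table 11 follow from the relative change in position RMSE, (reference − proposed) / reference. The short sketch below recomputes them from the tabulated values; the dictionary keys are shorthand labels introduced here for illustration.

```python
# (reference RMSE in m, proposed RMSE in m, reported reduction in %), from Table 11
SCENARIOS = {
    "SW random steering": (68.3, 8.3, 87.8),
    "Triple-source random steering": (56.9, 14.4, 74.7),
    "SW 10/10 zigzag": (256.1, 21.2, 91.7),
    "BF5 zigzag": (296.1, 64.2, 78.3),
    "BF10 zigzag": (586.8, 99.5, 83.0),
}


def reduction_percent(reference, proposed):
    """Relative RMSE reduction with respect to the reference, in percent."""
    return 100.0 * (reference - proposed) / reference


def recompute(scenarios=SCENARIOS):
    """Map each scenario to its recomputed reduction, rounded to one decimal."""
    return {name: round(reduction_percent(ref, new), 1)
            for name, (ref, new, _) in scenarios.items()}


if __name__ == "__main__":
    for name, value in recompute().items():
        print(f"{name:32s} {value:.1f}%")
```

The recomputed values span 74.7% to 91.7%, matching the range quoted in the abstract.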
Table 12. Runtime statistics of the proposed framework for online deployment. The encoder is updated once every 10 s (50 steps); its cost is amortized over the update interval.
| Module | Parameters | Single-Run Forward Time |
| --- | --- | --- |
| Encoder (CNN–SE–BiLSTM) | 801 K | 2.70 ± 0.08 ms |
| Four expert networks (total) | ~410 K | 0.48 ± 0.03 ms |
| MMG RK4 single-step integration | | 0.123 ± 0.005 ms |
| Per-step cost (experts + RK4) | | 0.60 ± 0.03 ms |
| Amortized per-step cost (incl. encoder) | | 0.71 ± 0.03 ms |
Xu, Z.; Yao, Z.; Luo, B.; Wang, X. Physics-Structured Residual Learning for Ship Maneuvering Prediction: Multi-Source Disturbance Decomposition and Compensation. J. Mar. Sci. Eng. 2026, 14, 808. https://doi.org/10.3390/jmse14090808
