1. Introduction
With the continued acceleration of economic globalization, cross-border trade has become increasingly interconnected, time-sensitive, and complex. Modern supply chains span continents, linking production facilities, distribution hubs, and consumer markets into vast networks whose stability depends on the seamless flow of logistics. Maritime transport underpins the global trading system, with international shipping accounting for more than 80% of world trade by volume [1], making the stability of international commerce and the broader global economy critically dependent on maritime logistics. From containerized consumer goods to essential bulk commodities such as energy resources and agricultural products, maritime operations play a decisive role in shaping the resilience and competitiveness of global supply chains.
At the same time, this central role exposes maritime transport to increased vulnerability under dynamic ocean conditions. Excessive rolling or pitching can destabilize cargo, forcing vessels to reduce speed or alter routes, thereby disrupting shipping schedules [2]. Severe vessel motions, such as excessive roll, pitch, and heave induced by waves and wind, may lead to container displacement, higher insurance claims, and increased operating costs. Here, roll refers to the rotational motion about the longitudinal axis that can cause lateral cargo instability, pitch denotes the rotational motion about the transverse axis affecting longitudinal load balance, and heave represents the vertical oscillatory motion that increases dynamic loading on cargo securing systems. Persistent oscillations further elevate fuel consumption and complicate berthing operations, triggering port congestion and demurrage. For perishable or high-value cargoes, including food, pharmaceuticals, and energy products, such uncertainties result in significant financial losses and may even pose humanitarian risks. Consequently, effective global logistics management requires not only addressing the technical challenges of vessel motion prediction but also recognizing its strategic importance in ensuring resilient, efficient, and reliable supply chains.
Despite its importance, ship motion forecasting remains a formidable challenge. Accurately modeling complex maritime environments while satisfying the high precision requirements of modern logistics continues to be difficult for existing approaches.
Existing studies on ship motion forecasting can be broadly categorized into physics-based, statistical, and data-driven methods. Physics-based models provide physically interpretable insights into vessel motion and maneuvering behaviors. For instance, Perera and Soares integrated an extended Kalman filter with a physics-based ship maneuvering model to estimate and predict vessel motion states, demonstrating the effectiveness of combining physical dynamics with statistical filtering for capturing nonlinear ship behavior [3]. However, such physics-based or physics-informed approaches often rely on computationally intensive simulations and detailed environmental inputs, which limits their scalability and practicality for real-time deployment in complex maritime environments. Statistical and traditional machine learning methods offer computational efficiency and interpretability. For example, Ma et al. investigated the application of ARIMA models for ship trajectory prediction, demonstrating the effectiveness of classical statistical approaches in capturing short-term trends and local dynamic patterns in vessel motion [4]. In addition, Zheng et al. proposed a support vector machine (SVM)-based collision risk assessment method for maritime traffic, illustrating the practicality of conventional machine learning techniques in safety-related decision-making tasks [5]. However, these methods typically rely on linear assumptions or shallow model structures, which restricts their ability to capture complex nonlinear dynamics and cross-variable interactions in highly coupled maritime environments. More recently, deep learning approaches, particularly recurrent architectures such as RNNs, LSTMs, and GRUs, have improved temporal modeling performance and become common baselines for ship motion prediction. For example, Murray and Perera proposed a data-driven framework that leverages historical AIS data to predict future vessel behavior, demonstrating the capability of recurrent neural networks to model sequential dependencies in maritime traffic patterns [6]. Similarly, Wang et al. developed single-input single-output (SISO) and multi-input single-output (MISO) deep learning models for ship roll motion prediction, highlighting the effectiveness of recurrent architectures in learning temporal dynamics from multivariate ship motion signals [7]. Nevertheless, these recurrent models are prone to error accumulation and vanishing gradients when applied to long prediction horizons, which undermines their effectiveness in capturing long-term temporal dependencies.
To address these limitations, we propose DSformer [8], a non-recurrent, data-driven framework for ship motion prediction. By integrating a dual-sampling mechanism that captures both global trends and local fluctuations with a time-varying variable attention module, DSformer explicitly models long-range temporal dependencies and cross-variable interactions in multivariate ship motion time series.
Specifically, DSformer represents the first ship motion prediction framework that completely abandons recurrent architectures, thereby avoiding error accumulation and vanishing gradient issues commonly observed in existing deep learning models. In addition, its data-driven dual-attention mechanism is designed to jointly capture temporal autocorrelation and cross-variable dependencies. Furthermore, the proposed framework is validated using real-world datasets encompassing real sea states rather than relying solely on simulated environments, demonstrating robust predictive performance under realistic operating conditions.
The proposed framework is designed to support ship motion forecasting across different temporal horizons, thereby enabling a wide range of practical maritime applications.
For short-term forecasting tasks that demand high precision, such as those observed in electricity load balancing, traffic flow monitoring, and short-horizon weather prediction, accurate modeling requires sensitivity to rapid temporal fluctuations and abrupt dynamic changes. DSformer addresses this requirement through its local sampling mechanism and time-focused attention, which emphasize short-term temporal dependencies and capture high-frequency variations present in these benchmark datasets [8].
For long-term forecasting tasks in domains such as taxation analysis, energy demand forecasting, and disease monitoring, reliable prediction depends on capturing global trends, periodic patterns, and accumulated temporal effects over extended horizons. The global sampling strategy and channel-focused attention of the proposed model enable it to learn long-range temporal dependencies and cross-variable interactions, providing stable and consistent forecasts that are essential for long-term planning and policy-oriented decision making [8].
By jointly modeling short-term dynamics and long-term temporal structures within a unified non-recurrent framework, the proposed approach offers a flexible and scalable solution that bridges operational safety requirements and strategic logistics optimization in maritime transportation.
Compared with existing machine learning models, the proposed model offers a more reliable and scalable solution for maritime applications by enabling stable long-horizon prediction under real sea conditions, which is critical for operational decision-making in shipping and logistics.
In summary, while this study primarily addresses the technical challenges of ship motion forecasting through advanced machine intelligence, it offers a vital perspective on operational efficiency and safety. By providing high-precision predictions, our model establishes a technical foundation for critical maritime scenarios. Specifically, it has the potential to identify safe operational windows for helicopter landings and offshore wind farm maintenance, assist harbor pilots during complex berthing maneuvers, and mitigate risks of cargo damage due to parametric rolling. This work illustrates that accurate vessel motion prediction is not merely a theoretical pursuit but a key enabler for smarter, safer, and more efficient maritime logistics.
2. Materials and Methods
2.1. DSformer
Accurate ship motion prediction demands models that can simultaneously capture both long-term trajectory trends and fine-scale short-term dynamics, while leveraging correlations across multi-sensor streams under uncertain marine environments. To address this challenge, we propose DSformer, a dual-structured architecture that integrates a Dual Sampling Module and a Dual-Focus Module for hierarchical feature extraction and fusion.
As shown in Figure 1, the Dual Sampling Module simultaneously encodes global and local motion patterns through down-sampling and segment sampling. The former emphasizes low-frequency signals to capture global voyage trends, while the latter preserves high-frequency signatures linked to short-term maneuvers. Within the Dual-Focus Module, temporal attention integrates these complementary representations to uncover latent temporal dependencies, while variable attention captures cross-sensor correlations (e.g., between speed, heading, and wind). This dual-attention mechanism enables robust state representation across varying sea conditions.
The fused features are first aggregated through normalization and residual operations, and subsequently decoded by a multilayer perceptron (MLP) to produce final predictions. The hierarchical and parallel architecture not only enhances predictive accuracy and stability but also improves scalability and facilitates optimization on real-world ship datasets.
2.2. Dual Sampling Module
The primary function of this module is to transform the original multivariate time-series data $X \in \mathbb{R}^{N \times H}$, where $N$ represents the number of variables and $H$ denotes the number of time steps, into two three-dimensional feature tensors $X_{rs} \in \mathbb{R}^{N \times C \times (H/C)}$ and $X_{ps} \in \mathbb{R}^{N \times C \times (H/C)}$ (here, the subscripts "rs" and "ps" denote Reduction Sampling and Partitioned Sampling, respectively), each with a distinct emphasis, where $C$ is the number of transformed subsequences. In ship motion prediction, $X_{rs}$ is more suitable for capturing the macro trends of overall ship movement, while $X_{ps}$ highlights local maneuver details. Figure 2 illustrates the two sampling methods.
Reduction Sampling: As shown in Figure 2a, for the $i$-th variable, $x^{i}$, the raw sequence with an initial length of $H$ is processed using Reduction Sampling to obtain $C$ one-dimensional subsequences, each of length $H/C$. This operation yields a set of subsequences that span extended temporal intervals. To mitigate the information loss introduced by Reduction Sampling, these $C$ subsequences are concatenated to form a feature matrix $X_{rs}^{i} \in \mathbb{R}^{C \times (H/C)}$. Finally, stacking the feature matrices from the $N$ variables yields the three-dimensional feature tensor $X_{rs}$. These subsequences are evenly distributed over time, emphasizing long-term trends and mitigating the interference of local noise. Each subsequence characterizes the ship’s motion pattern over an extended temporal scale, with the $j$-th subsequence composed of samples drawn at evenly spaced time steps of the raw sequence.
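Under one plausible reading of Reduction Sampling, the $j$-th subsequence of the $i$-th variable can be written as shown here. The notation is illustrative rather than taken from the source: $x^{i}_{t}$ denotes the sample of variable $i$ at time step $t$, $H$ the sequence length, and $C$ the number of subsequences, and the sampling stride is assumed to equal $C$.

```latex
X_{rs}^{i,j} = \left[\, x^{i}_{j},\; x^{i}_{j+C},\; x^{i}_{j+2C},\; \dots,\; x^{i}_{j+\left(\frac{H}{C}-1\right)C} \,\right], \qquad j = 1, \dots, C
```

Each subsequence then has $H/C$ samples, and the $C$ subsequences together cover every time step exactly once.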
Partitioned Sampling: As shown in Figure 2b, Partitioned Sampling divides the original sequence into $C$ segments based on temporal continuity, with each segment containing a consecutive data length of $H/C$. For the $i$-th variable, $x^{i}$, these $C$ segments are concatenated to form a feature matrix $X_{ps}^{i} \in \mathbb{R}^{C \times (H/C)}$, to avoid potential information loss. This process is critical for capturing the fine-grained maneuvers and short-term environmental disturbances of the ship. By combining the feature matrices $X_{ps}^{i}$ from the $N$ variables, the final tensor is denoted as $X_{ps}$. During this process, each segment preserves its local continuity, enabling a comprehensive representation of local features, with the $j$-th segment composed of $H/C$ consecutive samples of the raw sequence.
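Under the same illustrative notation ($x^{i}_{t}$ for the sample of variable $i$ at time step $t$, $H$ for the sequence length, $C$ for the number of segments, none of which is confirmed by the source), the $j$-th segment produced by Partitioned Sampling can be sketched as:

```latex
X_{ps}^{i,j} = \left[\, x^{i}_{(j-1)\frac{H}{C}+1},\; x^{i}_{(j-1)\frac{H}{C}+2},\; \dots,\; x^{i}_{j\frac{H}{C}} \,\right], \qquad j = 1, \dots, C
```

In this form each segment is a contiguous window of $H/C$ samples, so short-term continuity inside a segment is preserved.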
In summary, the Dual Sampling Module effectively extracts global trend information while preserving local detailed features, thereby providing rich and complementary input for the subsequent exploration by the Dual-Focus Module.
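The two sampling schemes can be illustrated with a minimal NumPy sketch. It assumes that the sequence length is divisible by the number of subsequences and that Reduction Sampling uses a stride equal to that number; the function names `reduction_sampling` and `partitioned_sampling` are hypothetical and do not come from the DSformer code base.

```python
import numpy as np


def reduction_sampling(x: np.ndarray, c: int) -> np.ndarray:
    """Strided sampling: subsequence j holds x[:, j], x[:, j + c], x[:, j + 2c], ...

    x has shape (N, H); the result has shape (N, c, H // c).
    Assumption: the stride equals the number of subsequences c.
    """
    n, h = x.shape
    if h % c != 0:
        raise ValueError("sequence length must be divisible by c")
    # Reshape to (N, H // c, c) so the last axis is the offset inside each stride,
    # then swap axes so that samples sharing an offset form one subsequence.
    return x.reshape(n, h // c, c).swapaxes(1, 2)


def partitioned_sampling(x: np.ndarray, c: int) -> np.ndarray:
    """Contiguous segments: segment j holds H // c consecutive samples.

    x has shape (N, H); the result has shape (N, c, H // c).
    """
    n, h = x.shape
    if h % c != 0:
        raise ValueError("sequence length must be divisible by c")
    return x.reshape(n, c, h // c)


# Toy input: one variable, twelve time steps labelled 0..11.
x = np.arange(12).reshape(1, 12)
x_rs = reduction_sampling(x, c=3)
x_ps = partitioned_sampling(x, c=3)
print(x_rs[0])
print(x_ps[0])
```

On the toy input, the first reduction-sampled subsequence spans the whole window with gaps between samples, whereas the first partitioned segment covers only the earliest quarter of the window without gaps, which mirrors the global versus local emphasis described above.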
2.3. Dual-Focus Module
Ship motion data exhibit pronounced temporal autocorrelation and cross-variable dependencies, offering inherent structural priors that can be used in model design. For example, the sustained positive autocorrelation over hundreds of time steps in Figure 3 motivates modeling long-range temporal dependencies, while the strong positive and negative correlations among motion and velocity variables in Figure 4 motivate explicit cross-variable interaction modeling.
Figure 3 illustrates the autocorrelation structure across key ship motion variables, revealing clear periodicity and long-range dependencies over time. This indicates that temporal dynamics are not random fluctuations but rather coherent patterns driven by vessel motion and environmental forces. Figure 4 further shows the cross-variable correlation matrix, which highlights strong couplings among different physical quantities (e.g., yaw, pitch, roll angles and velocities). Such correlations reflect the intrinsic coupling among different degrees of freedom in ship dynamics and the environmental conditions that govern them. Since the dataset contains long continuous time series, even weak correlations tend to be statistically significant. Therefore, we focus on the magnitude and sign of the correlation coefficients to highlight practically meaningful inter-variable dependencies rather than reporting p-values.
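The two diagnostics discussed here, the autocorrelation of a single variable and the Pearson correlation between variables, can be computed as in the following sketch. The signals are synthetic stand-ins invented for illustration (a 100-step sinusoid labelled `roll` and a noisy, sign-flipped copy labelled `pitch`); they are not the Env01 to Env03 measurements, and `autocorrelation` is a hypothetical helper.

```python
import numpy as np


def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation of a 1-D series for lags 0..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / denom for k in range(max_lag + 1)]
    )


# Synthetic example signals (assumed, for illustration only).
rng = np.random.default_rng(0)
t = np.arange(2000)
roll = np.sin(2 * np.pi * t / 100)
pitch = -roll + 0.1 * rng.standard_normal(t.size)

acf = autocorrelation(roll, max_lag=200)
corr = np.corrcoef(np.stack([roll, pitch]))  # 2 x 2 Pearson correlation matrix

print(acf[[0, 50, 100]])
print(corr[0, 1])
```

A periodic signal produces an autocorrelation that stays large at multiples of its period, and the off-diagonal entry of the correlation matrix carries both the strength and the sign of the coupling, which are the two quantities the text relies on.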
Building on these data-driven observations, the Dual-Focus Module incorporates two complementary attention mechanisms: the Time-Focused Path, which learns long-range dependencies along the time axis, and the Channel-Focused Path, which captures cross-variable interactions.
This dual-attention design enables the model to effectively leverage the intrinsic structure of maritime time-series data, capturing both the dynamic evolution of vessel motion over time and the inter-variable coupling that arises under complex sea conditions.
The Dual-Focus Module comprises two parallel components, the Time-Focused Path and the Channel-Focused Path, which work together to model temporal dynamics and inter-variable interactions. The Time-Focused Path captures contextual dependencies along the temporal axis, revealing periodic patterns and long-term trends in navigation trajectories. In parallel, the Channel-Focused Path models intrinsic relationships among variables and extracts key interaction signals that are essential for accurate ship state estimation from heterogeneous sensor streams. The outputs of both paths are subsequently fused and normalized before being projected into a two-dimensional tensor for MLP-based decoding.
Time-Focused Path: The two tensors generated by the Dual Sampling Module, one from Reduction Sampling and one from Partitioned Sampling, are used as inputs. For clarity, the computation is illustrated using the reduction-sampled tensor; the same procedure is applied to the partition-sampled tensor, so that each branch yields its own temporal representation. In the Time-Focused Path, the three-dimensional input is linearly transformed to generate the query, key, and value matrices. A multi-head temporal attention mechanism then computes the similarity between the query and key matrices and applies weighted aggregation over the value matrix to produce an initial temporal representation. In this computation, each linear transformation is implemented as a fully connected layer, and Softmax denotes the normalized exponential function.
Residual connections and layer normalization are then applied to further refine this representation.
This helps maintain stable gradients and consistent feature distributions across samples. By capturing long-range temporal dependencies, the Time-Focused Path encodes navigation periodicity, environmental fluctuations, and dynamic ship responses.
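The Time-Focused Path can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses a single attention head instead of multiple heads, standard scaled dot-product attention, random projection weights, a layer normalization without learnable scale and shift, and it treats the subsequences of each variable as the attention tokens. The name `temporal_attention_block` is hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable normalized exponential."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(z: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize the last axis to zero mean and unit variance (no learnable parameters)."""
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)


def temporal_attention_block(x, w_q, w_k, w_v):
    """x: (N, C, L) = (variables, subsequences, subsequence length).

    Attention runs over the C subsequences of each variable independently.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v                      # each (N, C, L)
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(x.shape[-1])   # (N, C, C)
    attn = softmax(scores, axis=-1)                          # rows sum to 1
    return layer_norm(x + attn @ v)                          # residual, then layer norm


rng = np.random.default_rng(0)
n_vars, n_sub, sub_len = 9, 4, 16
x = rng.standard_normal((n_vars, n_sub, sub_len))
w_q, w_k, w_v = (0.1 * rng.standard_normal((sub_len, sub_len)) for _ in range(3))
out = temporal_attention_block(x, w_q, w_k, w_v)
print(out.shape)
```

The output keeps the shape of the input, so the same block can be applied unchanged to the reduction-sampled and the partition-sampled tensors.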
Channel-Focused Path: The same two tensors from the Dual Sampling Module serve as inputs. For clarity, the computation is illustrated using the reduction-sampled tensor; the same procedure is applied to the partition-sampled tensor, yielding one channel representation per branch. The Channel-Focused Path extracts inter-variable dependencies across sensor streams by treating the variable dimension as a sequence for attention computation. Since raw sensor data are stored in a fixed order that is unsuitable for attention operations, we first reorder the variable dimension, enabling each sensor representation to participate independently in the attention process. This transformation ensures accurate similarity estimation among variables and prevents information entanglement due to improper data arrangement.
To make the reordering operation explicit, the reduction-sampled tensor is permuted by moving the variable dimension to the last axis. After this permutation, the tensor has shape (number of subsequences) × (subsequence length) × (number of variables). In this representation, each variable corresponds to one attention token, while the subsequence and temporal dimensions jointly form the feature embedding associated with each token.
Given the reordered input, we apply a linear projection to produce the query, key, and value matrices and compute multi-head attention along the variable dimension.
Unlike the Time-Focused Path, this component employs only the attention layer to reduce computational complexity while capturing latent correlations among sensor channels. The resulting representation emphasizes inter-variable interactions, such as those between roll and pitch, that are crucial for accurate ship state estimation.
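A minimal NumPy sketch of attention along the variable dimension is given here. It is an illustration under assumptions rather than the authors' code: a single head replaces the multi-head mechanism, the projection weights are random, and the flattening of the subsequence and temporal axes into one embedding per variable follows the description in the text. The name `channel_attention` is hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable normalized exponential."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_attention(x, w_q, w_k, w_v):
    """x: (N, C, L) = (variables, subsequences, subsequence length).

    Each of the N variables becomes one token whose embedding is its
    flattened (C * L) feature block. Only the attention layer is applied.
    """
    n, c, l = x.shape
    permuted = np.moveaxis(x, 0, -1)          # (C, L, N): variable axis moved last
    tokens = permuted.reshape(c * l, n).T     # (N, C * L): one row per variable
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N) variable-to-variable weights
    return attn @ v, attn


rng = np.random.default_rng(0)
n_vars, n_sub, sub_len, d_model = 9, 4, 16, 32
x = rng.standard_normal((n_vars, n_sub, sub_len))
w_q, w_k, w_v = (
    0.1 * rng.standard_normal((n_sub * sub_len, d_model)) for _ in range(3)
)
features, attn = channel_attention(x, w_q, w_k, w_v)
print(features.shape, attn.shape)
```

The attention matrix has one row and one column per measured variable, so entry (a, b) can be read as how strongly variable a draws on variable b, for example roll on pitch.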
The Time-Focused and Channel-Focused outputs of the reduction-sampled and partition-sampled branches are then combined to obtain one fused feature per branch.
Information Flow: The two fused outputs from the Dual-Focus Module are first combined and layer-normalized, producing a two-dimensional tensor that encodes both global and local temporal structures, as well as cross-variable relationships.
This integration allows the subsequent prediction to directly utilize the refined high-level representations for accurate forecasting. Specifically, an MLP decoder maps the refined high-level features to the future ship state.
2.4. Loss Function
To improve convergence and robustness, a hybrid loss combining $L_1$ and $L_2$ objectives is used during training. The $L_1$ term emphasizes absolute deviations, which increases resistance to outliers and allows the model to capture abrupt state transitions in ship dynamics [9]. In contrast, the $L_2$ term minimizes the mean squared error, encouraging smooth and stable long-horizon forecasts [10]. The joint loss function is a weighted combination of the two terms, where the weighting factor $\lambda$ controls the trade-off between local robustness and global stability.
In this study, the weighting factor $\lambda$ is treated as a fixed hyperparameter that balances robustness and smoothness. Unless otherwise specified, $\lambda$ is set to 0.5 for all experiments, assigning equal importance to the $L_1$ and $L_2$ components. This choice follows common practice in hybrid loss design for time-series forecasting and robust regression, where equal weighting provides a stable and reliable trade-off between sensitivity to transient disturbances and overall prediction smoothness.
We further verified that the proposed model is not highly sensitive to moderate variations of $\lambda$. Preliminary experiments with values ranging from 0.3 to 0.7 exhibited similar convergence behavior and prediction accuracy across all three datasets. As a result, a fixed $\lambda$ is adopted for all datasets to ensure experimental consistency and to avoid dataset-specific tuning, which could otherwise obscure the evaluation of the proposed model.
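A hybrid loss of this kind can be sketched as a convex combination of the mean absolute error and the mean squared error. The exact form used in the study is an assumption here: the sketch weights the absolute-error term by the weighting factor and the squared-error term by one minus that factor, which gives equal importance at 0.5 as described. The name `hybrid_loss` and the toy values are illustrative.

```python
import numpy as np


def hybrid_loss(pred: np.ndarray, target: np.ndarray, lam: float = 0.5) -> float:
    """Assumed form: lam * L1 + (1 - lam) * L2, with L1 = MAE and L2 = MSE."""
    err = pred - target
    l1 = np.mean(np.abs(err))   # robust to outliers
    l2 = np.mean(err ** 2)      # penalizes large errors, favors smooth fits
    return float(lam * l1 + (1.0 - lam) * l2)


# Toy example: errors are 0, -1, and 2.
pred = np.array([1.0, 2.0, 4.0])
target = np.array([1.0, 3.0, 2.0])
loss = hybrid_loss(pred, target)             # 0.5 * 1.0 + 0.5 * (5 / 3)
loss_l1_only = hybrid_loss(pred, target, lam=1.0)
print(loss, loss_l1_only)
```

Sweeping `lam` between 0.3 and 0.7 on held-out data is one simple way to reproduce the sensitivity check described above.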
3. Results
All experiments were implemented using the open-source time-series forecasting framework BasicTS (https://github.com/GestaltCogTeam/BasicTS, accessed on 15 November 2025), with customized model configurations and extensions to support multivariate ship motion data. The efficacy of the proposed method has been verified under testing conditions distributionally similar to the training phase, ensuring reliable performance within the intended operational domain.
3.1. Experimental Setup
Dataset. To rigorously evaluate the performance of DSformer, we conducted comparative experiments on three real-world ship motion datasets, denoted as Env01, Env02, and Env03 (see Table 1). All datasets were collected from the same vessel to ensure consistent platform characteristics while exposing the model to diverse environmental and operational conditions.
The data were acquired from a 20-ton class vessel with a length of 17 m and a beam of 3 m. A medium-precision fiber-optic inertial navigation system (INS) was installed near the vessel’s center of mass to measure six-degree-of-freedom (6-DOF) motion responses. The INS was a commercially available, factory-calibrated system, and all sensor outputs were recorded using a dedicated onboard data acquisition computer at a sampling frequency of 100 Hz.
All motion quantities were expressed in a body-fixed coordinate frame, where the origin is located at the vessel’s center of mass. The X-axis points forward along the vessel’s longitudinal direction, the Y-axis points to starboard along the transverse direction, and the Z-axis points upward along the vertical direction. The measured motion variables include: (1) roll, pitch, and yaw angles; (2) roll, pitch, and yaw angular rates; (3) surge, sway, and heave velocities. These nine variables jointly characterize the vessel’s translational and rotational motion dynamics.
The three datasets correspond to different environmental conditions and sea states. A summary of the environmental parameters, including wind speed, wind direction, water speed, temperature, humidity, and sea conditions, is provided in Table 2. Each dataset was segmented into training, validation, and test subsets using an 8:1:1 ratio.
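An 8:1:1 split of a recorded series can be sketched as follows. The source states only the ratio; splitting in chronological order (earliest samples for training, latest for testing) is an assumption made here because it avoids leaking future samples into training, and `chronological_split` is a hypothetical helper. The zero-filled array stands in for a real recording with nine motion variables.

```python
import numpy as np


def chronological_split(series: np.ndarray, ratios=(0.8, 0.1, 0.1)):
    """Split along the time axis (axis 0) without shuffling."""
    t = series.shape[0]
    n_train = int(t * ratios[0])
    n_val = int(t * ratios[1])
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test_set = series[n_train + n_val:]   # remainder goes to the test subset
    return train, val, test_set


# Placeholder data: 1000 time steps, 9 motion variables.
data = np.zeros((1000, 9))
train, val, test_set = chronological_split(data)
print(train.shape, val.shape, test_set.shape)
```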
Table 2 summarizes the average environmental conditions of the three datasets. Compared to Env01 and Env02, Env03 experiences substantially higher wind speeds, elevated humidity levels, and minimal current velocities. Such conditions amplify nonlinearly coupled and high-intensity vessel responses [11], increasing the complexity of modeling and prediction. In contrast, Env01 and Env02 were collected under more moderate and balanced conditions, resulting in smoother temporal dynamics [12].
Beyond average environmental levels, dataset complexity in this study is primarily defined in terms of the dynamic variability and non-stationarity of ship motion responses, which directly determine the difficulty of motion forecasting. To quantitatively assess this aspect, we analyze the short-term variation intensity of key motion variables by examining the distributions of the absolute first-order differences in pitch rate and roll rate.
As illustrated in Figure 5, Env03 exhibits significantly broader distributions and heavier tails in both |Pitch Rate| and |Roll Rate| compared with Env01 and Env02, indicating more abrupt temporal changes and stronger non-stationary dynamics. Such behavior reflects more irregular and volatile motion responses induced by complex environmental forcing.
Taken together, the harsher environmental conditions summarized in Table 2 and the elevated dynamic variability observed in Figure 5 jointly demonstrate that Env03 represents the most dynamically complex dataset in this study. Consequently, Env03 is treated as a more challenging benchmark for evaluating the robustness and effectiveness of ship motion forecasting models.
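The short-term variation intensity used in this comparison can be computed from the absolute first-order differences of a signal. The sketch uses two synthetic series invented for illustration (a smooth sinusoid and the same sinusoid with added noise), not the recorded pitch-rate or roll-rate data, and the summary statistics chosen in `variation_intensity` are one possible way to describe the spread and tail of the distribution.

```python
import numpy as np


def variation_intensity(x: np.ndarray) -> dict:
    """Summarize the distribution of |x[t+1] - x[t]|."""
    d = np.abs(np.diff(x))
    return {
        "mean": float(d.mean()),
        "p99": float(np.quantile(d, 0.99)),  # tail heaviness
        "max": float(d.max()),
    }


# Synthetic stand-ins for a calm and a rough record (assumed, not measured).
rng = np.random.default_rng(0)
calm = np.sin(np.linspace(0.0, 20.0, 2000))
rough = calm + 0.2 * rng.standard_normal(2000)

calm_stats = variation_intensity(calm)
rough_stats = variation_intensity(rough)
print(calm_stats)
print(rough_stats)
```

A broader distribution and a larger upper quantile for one record relative to another indicate more abrupt step-to-step changes, which is the sense in which Env03 is described as more complex.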
Summary statistics, including the mean and standard deviation (Table 3), further highlight these contrasts. The pronounced variability in both environmental and motion characteristics makes this dataset suite particularly suitable for evaluating the robustness and generalization of ship motion forecasting models.
Baselines. To demonstrate the effectiveness of DSformer, we selected 13 state-of-the-art (SOTA) models, recognized for their strong performance in time series forecasting, as benchmarks. These include PatchTST [13], LightTS [14], NLinear [15], DLinear [15], CycleNet [16], Crossformer [17], iTransformer [18], SOFTS [19], Autoformer [20], Informer [21], Fredformer [22], ETSformer [23], and Pyraformer [24].
Hyperparameters. The primary hyperparameter values for DSformer are shown in Table 4. All models were implemented in PyTorch 2.0.1 and trained for a fixed number of epochs (e.g., 100), with the best-performing checkpoint selected based on validation loss. The optimizer and remaining model hyperparameters retained their default values as defined in the PyTorch and BasicTS frameworks.
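Evaluation. Evaluation metrics are vital for assessing model performance. To provide a more comprehensive validation of DSformer, we adopted five metrics in our experiments: mean absolute error (MAE), mean squared error (MSE), mean absolute percentage error (MAPE), root mean squared error (RMSE), and weighted absolute percentage error (WAPE).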
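For reference, the five metrics can be written out as in the following sketch, which uses their common textbook definitions. It is not the BasicTS implementation: masking of missing values is omitted, MAPE and WAPE are returned as fractions rather than percentages, and the small constant `eps` guarding against division by zero is an assumption. The function name `forecast_metrics` and the toy values are illustrative.

```python
import numpy as np


def forecast_metrics(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8) -> dict:
    """Common definitions of MAE, MSE, RMSE, MAPE, and WAPE."""
    err = y_pred - y_true
    abs_err = np.abs(err)
    mse = float(np.mean(err ** 2))
    return {
        "MAE": float(np.mean(abs_err)),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        # Per-sample relative error, averaged; unstable when y_true is near zero.
        "MAPE": float(np.mean(abs_err / (np.abs(y_true) + eps))),
        # Total absolute error relative to total absolute magnitude.
        "WAPE": float(np.sum(abs_err) / (np.sum(np.abs(y_true)) + eps)),
    }


# Toy example: errors are -1, 1, and 2.
y_true = np.array([2.0, 4.0, 8.0])
y_pred = np.array([1.0, 5.0, 10.0])
metrics = forecast_metrics(y_true, y_pred)
print(metrics)
```

Because motion variables such as roll or pitch angle oscillate around zero, MAPE can be dominated by samples with small true values, whereas WAPE normalizes by the aggregate magnitude and is less affected.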
3.2. Prediction Performance
Table 5, Table 6 and Table 7 summarize the predictive performance of DSformer and the 13 baseline models on the Env01, Env02, and Env03 datasets, with the best scores highlighted in red. Figure 6 visualizes the results via a logarithmic radar chart. Several key observations can be made:
Consistent superiority: DSformer achieves the lowest error across all metrics on Env01 and on every metric except MAPE on Env03, demonstrating a clear accuracy advantage.
Global–local balance: Models such as PatchTST and LightTS perform competitively in capturing global trends but still fall short of DSformer in overall predictive accuracy.
Architectural limitations: CycleNet, Crossformer, and iTransformer show higher errors, likely resulting from insufficient integration of local and global information.
Dual-path strength: DSformer’s dual sampling and dual-focus design enables effective extraction of multi-scale temporal and cross-variable dependencies, which underpins its strong performance in long-horizon prediction.
Robust stability: On Env02, although Crossformer approaches DSformer on certain metrics, DSformer maintains more balanced and stable performance.
Resilience under complexity: On Env03, Fredformer slightly outperforms DSformer in MAPE; however, DSformer matches or exceeds its performance on the other metrics, demonstrating greater robustness under complex conditions.
Overall advantage: Despite close competition on certain metrics, DSformer consistently delivers the most balanced and effective feature integration, highlighting its strength in complex multivariate long-sequence prediction tasks.
3.3. Sampling Interval
The sampling interval is a key hyperparameter in DSformer, controlling the trade-off between capturing dynamic information and suppressing noise. By setting the temporal resolution of the input data, it directly influences the model’s sensitivity to high-frequency details and random disturbances. We analyze its impact on prediction performance, measured by MAPE, across different environmental conditions.
Mild environmental conditions (Env01 & Env02). Under stable conditions with low wind speeds (2.9–3.0 m/s), vessel motion exhibits regular, low-noise dynamics. Dense sampling (interval = 3) effectively captures fine-scale fluctuations, improving predictive accuracy without introducing disruptive noise. As reported in Kwon [25], benign weather conditions minimize external disturbances, making high-frequency sampling advantageous for preserving subtle vessel behaviors. Both Env01 and Env02 exhibit reduced prediction errors at dense sampling intervals, with only marginal fluctuations at longer intervals (5–6).
Complex environmental conditions (Env03). In contrast, the Env03 dataset, characterized by strong winds (8.2 m/s) and high humidity, exhibits substantial high-frequency random fluctuations. While dense sampling preserves fine-grained temporal variations, it also increases the model’s sensitivity to irrelevant perturbations, which can obscure underlying motion patterns. Previous studies [26,27] have shown that harsh environmental conditions significantly increase system uncertainty and measurement noise. A moderately sparse sampling interval (interval = 5) effectively reduces sensitivity to high-frequency disturbances while retaining dominant motion trends, thereby enhancing model robustness and improving prediction accuracy. However, excessively sparse sampling (e.g., interval = 6) begins to remove informative short-term variations, resulting in degraded predictive performance.
To assess this trade-off between disturbance suppression and information preservation more systematically, we conduct an extended ablation study with sampling intervals ranging from 3 to 8 under diverse environmental conditions (see Figure 7).
The experimental results exhibit a clear unimodal performance pattern. When the sampling interval is small (e.g., 3 or 4), the model operates at high temporal resolution but remains highly sensitive to high-frequency fluctuations and measurement noise, which adversely affects long-horizon prediction stability. As the sampling interval increases, the impact of stochastic disturbances is progressively reduced, leading to improved forecasting accuracy and reaching an optimum at interval = 5.
Beyond this point, further enlarging the sampling interval (interval ≥ 6) causes a noticeable decline in performance. Excessively large intervals overly coarsen the temporal representation, resulting in the loss of informative short-term dynamics that are essential for accurately modeling ship motion responses under rapidly changing sea states. This loss of temporal detail becomes particularly pronounced in complex environments, where abrupt maneuvers and environmental disturbances play a critical role.
Overall, interval = 5 achieves a favorable balance between reducing sensitivity to high-frequency disturbances and preserving meaningful temporal information. This behavior reflects an empirically observed regularization effect induced by temporal sparsification and explains the consistently superior performance achieved at this interval across different datasets.
3.4. Efficiency
This section evaluates the operational efficiency of DSformer relative to a set of representative baseline models, including PatchTST, Crossformer, Autoformer, Informer, Fredformer, ETSformer, and Pyraformer. All experiments were performed under identical conditions on an NVIDIA RTX 4090D (24 GB) GPU. We measured the average training time per epoch across the three datasets (Table 8) and integrated multiple error metrics (MAE, MSE, MAPE, and WAPE) to assess overall performance (Figure 8).
The results show that DSformer consistently achieves lower prediction errors while exhibiting the shortest training time per epoch among the compared models. However, training time alone does not fully characterize computational efficiency without considering model complexity. Therefore, we further examine the number of trainable parameters reported in Table 8.
Despite having a comparable or even larger parameter count than several baseline models, DSformer maintains the fastest training speed per epoch. This indicates that the efficiency advantage of DSformer does not arise from aggressive parameter reduction, but rather from a more efficient architectural design that enables effective utilization of model capacity.
The high efficiency primarily stems from the Dual Sampling Module. Unlike Transformer variants that rely on complex embedding structures and thereby increase computational cost, DSformer reduces the input sequence length through two complementary strategies:
Reduction Sampling: enlarges the sampling interval to extract essential global trends while substantially reducing sequence length and computational load.
Partitioned Sampling: divides the sequence into consecutive subsequences to preserve fine-grained local dynamics while improving computational efficiency through parallel processing.
This dual module reduces redundant information, preserves critical multi-scale features, and allows downstream attention modules to operate on more compact and information-rich sequences, thereby improving computational speed and memory efficiency.
Beyond raw speed, DSformer offers practical advantages for maritime deployment:
Low latency and real-time adaptability: Rapid inference capability enables low-latency forecasting under dynamically changing sea states, supporting timely operational decision-making in practical maritime scenarios.
Low resource demand and onboard feasibility: Reduced computational overhead makes DSformer suitable for deployment on compact and resource-constrained maritime systems, which are typically required to operate under harsh, humid, and saline environmental conditions.
3.5. Qualitative Time-Series Prediction Analysis
While aggregated metrics such as MAE and RMSE provide an overall measure of prediction accuracy, they do not fully reflect a model’s ability to capture temporal dynamics and physically meaningful motion patterns. To further assess the practical reliability of the proposed method, we present a time-domain comparison between measured ship motion signals and model predictions in Figure 9.
As shown in Figure 9, DSformer produces predictions that closely follow the measured trajectories in both amplitude and phase, particularly during rapid oscillations and transition regions. In contrast, several baseline models exhibit noticeable phase lag, amplitude attenuation, or over-smoothing effects. These discrepancies, although sometimes masked in averaged error metrics, may accumulate over time and adversely affect real-world applications such as motion compensation, route planning, and control systems.
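Phase lag and amplitude attenuation of the kind described above can also be quantified rather than only inspected visually. The following is a hedged sketch of one possible diagnostic, not a procedure used in this study: it estimates the lag from a brute-force cross-correlation and the attenuation from a ratio of standard deviations. The function names and the synthetic sine-wave signals are assumptions for illustration only.

```python
import math
import statistics


def best_lag(measured, predicted, max_lag):
    """Return the shift k (in samples) at which predicted[t + k] best matches
    measured[t]. A positive k means the prediction lags the measurement.

    max_lag must be smaller than the sequence length.
    """
    m_mean = statistics.fmean(measured)
    p_mean = statistics.fmean(predicted)

    def score(k):
        # Mean product of the centered signals over the overlapping samples.
        pairs = [
            (measured[t] - m_mean, predicted[t + k] - p_mean)
            for t in range(len(measured))
            if 0 <= t + k < len(predicted)
        ]
        return sum(m * p for m, p in pairs) / len(pairs)

    return max(range(-max_lag, max_lag + 1), key=score)


def amplitude_ratio(measured, predicted):
    """Ratio of standard deviations; values below 1 indicate attenuation."""
    return statistics.pstdev(predicted) / statistics.pstdev(measured)


# Synthetic check: a prediction delayed by 3 samples and damped to 80 %.
period = 20
measured = [math.sin(2 * math.pi * t / period) for t in range(200)]
predicted = [0.8 * math.sin(2 * math.pi * (t - 3) / period) for t in range(200)]

lag = best_lag(measured, predicted, max_lag=5)
ratio = amplitude_ratio(measured, predicted)
```

On this synthetic pair the sketch recovers a lag of 3 samples and an amplitude ratio of 0.8. For periodic signals the search range `max_lag` should stay below half the dominant period, since shifts a full period apart are indistinguishable.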
4. Conclusions
Maritime transportation remains an essential component of global logistics, yet existing physical and statistical approaches continue to struggle with capturing the complexity of multivariate, long-horizon time series under real-world sea conditions. These limitations hinder precise real-time monitoring and adaptive decision-making, both of which are critical for supply chain resilience and operational efficiency. To address this, we introduce DSformer, a ship motion forecasting architecture that combines dual sampling of global and local dynamics with temporal–variable attention to achieve deep multi-source information fusion. By effectively suppressing noise and capturing complex inter-variable couplings, this architecture contributes to advancing intelligent maritime logistics systems.
Extensive experiments conducted on real-world datasets confirm DSformer’s superior performance in prediction accuracy, noise suppression, and trend modeling across both moderate and complex sea states. Sensitivity analysis indicates that dense sampling enhances accuracy in low-noise conditions, while moderate sampling improves robustness in harsh environments, providing practical guidance for operational deployment in logistics.
Beyond predictive accuracy, DSformer demonstrates outstanding runtime efficiency, enabling near-real-time predictions and early warnings. With its low computational requirements, DSformer can be integrated into onboard systems, making it well suited for time-sensitive logistics operations, such as dynamic route adjustments, schedule optimization, and accident prevention under volatile sea states.
In future work, applying DSformer across different vessel types, routes, and environmental conditions will further increase its utility for smart shipping and global supply chain management. A systematic exploration of hyperparameter-performance interactions will inform more precise and adaptive deployment strategies, enabling low-latency, high-efficiency forecasting at scale.
In essence, DSformer not only advances multivariate long-sequence forecasting but also applies its algorithmic innovation to maritime logistics management, offering a scalable solution to enhance supply chain resilience, safety, and efficiency in the context of globalized trade.