1. Introduction
The maritime and ocean engineering communities have increasingly recognized the importance of developing accurate and computationally efficient models for the prediction and control of marine system dynamics. In particular, the capability to accurately predict ship motions is essential for supporting design and operations. This holds particularly for seakeeping and maneuvering in adverse weather conditions, for which the availability of reliable predictive tools helps develop and operate vessels, ensuring the safety of structures, payload, and crew. In this regard, commercial and military ships must meet the International Maritime Organization (IMO) Guidelines and NATO Standardization Agreements (STANAG), respectively. Both regulatory frameworks emphasize the need for accurate assessments of vessel motions and loads under a wide range of sea conditions, highlighting the importance of robust modeling and simulation tools to support compliance and operational readiness.
The complexity and nonlinear nature of the hydrodynamic phenomena involved pose significant challenges to predicting ship responses in waves. Recent works [1,2] demonstrated the ability of computational fluid dynamics (CFD) methods with unsteady Reynolds-averaged Navier–Stokes (URANS) formulations to assess ship performance in waves and extreme sea conditions. Along with the high fidelity of such simulations, however, comes their high computational cost. This is particularly true when simulations aim to achieve statistical convergence of relevant quantities of interest and complex fluid–structure interactions are investigated. Real-time applications, such as control, fault detection, and digital twinning in general, are also limited by the computational resources required by such models.
In this context, data-driven modeling techniques have emerged as powerful alternatives or complements to traditional first-principles approaches. They promise to reduce the computational cost while keeping the fidelity of their estimates comparable to the original data sources. This requires the methods to be properly trained and/or calibrated, which is generally not trivial [3]. Within this framework, system identification provides a structured approach for constructing predictive models able to incorporate key characteristics of the system from high-fidelity numerical tools and/or experiments. Here, we interpret system identification broadly as the development of reduced-order models (ROMs) capable of predicting system responses, independent of whether their parameters correspond directly to physical quantities.
Equation-based reduced-order models (ROMs), such as the Maneuvering Modeling Group (MMG) model [4,5], have been developed as efficient physics-based approaches, characterized by fast evaluation, and have demonstrated good agreement with experiments and CFD for the maneuvering of displacement ships [6] and twin-hull configurations [7,8]; they have recently been studied also for planing hulls [9]. Physics-based models are typically classifiable as white-box (fully equation-based) or grey-box (physics-based with data tuning). This characteristic makes their solution highly interpretable, offering insights into the identified system and the involved physical phenomena. Despite the promising results, such models typically exhibit limited adaptability to unmodeled dynamics (white-box) or require a large amount of data from CFD computations or experimental fluid dynamics (EFD) for their training and for the definition of forcing terms (grey-box).
Data-driven machine learning techniques have gained popularity due to their ability to model complex input–output relations in an automated manner directly from data, without requiring prior knowledge of the system (equation-free approaches). In particular, recurrent neural networks (RNNs) [10] and long short-term memory networks (LSTMs) [3], along with their bidirectional LSTM (BiLSTM) [11,12] variant, have been demonstrated to be effective for building equation-free data-driven models for ship motions, providing multi-step-ahead forecasting of the ship's motion in several sailing conditions, including calm water and waves [13]. The strength of machine and deep learning methods lies in their ability to capture relevant hidden and nonlinear input–output relations directly from available data, their compactness, and the fast evaluation they enable. However, deep learning models typically require large datasets to train (more complex architectures usually require more expensive training) and to generalize effectively. In addition, while powerful, such models are often considered black-box approaches and pose challenges in terms of the physical interpretability of their results.
Among the available methods, the dynamic mode decomposition (DMD) [14,15,16,17] and its methodological variants have recently gained attention due to their ability to extract dominant dynamic features directly from experimental or numerical data, with little or no assumption on the underlying physics, providing interpretable, low-dimensional representations of nonlinear systems. DMD can be classified somewhere between a black-box and a grey-box approach to reduced-order modeling. With the former, DMD shares a data-driven and equation-free structure, like other machine learning techniques such as RNN, LSTM, and BiLSTM. However, DMD-based methods retain a certain level of interpretability thanks to the linear nature of the model. DMD can be considered a method to build a finite-dimensional approximation of the Koopman operator [18], which, in turn, describes a nonlinear dynamical system as a possibly infinite-dimensional linear system [19]. The reduced-order linear model is obtained by DMD from a small set of snapshots of the dynamical system under analysis. DMD obtains the model with a direct procedure (linear algebra) that, from a machine learning perspective, constitutes the training phase. Its data-driven nature, the fast non-iterative training, and its data-lean characteristics have contributed to the popularity of DMD as a reduced-order modeling technique in several fields, such as fluid dynamics and aeroacoustics [14,20,21,22,23], epidemiology [24], neuroscience [25], and finance [26].
DMD was applied for the first time to the forecasting of ship dynamics in [27], which gave the proof of concept for short-term forecasting of trajectories, motions, and loads of maneuvering ships in waves. In [28], the approach was systematically assessed on the same test cases and first extended to an augmented DMD, in which the system state is augmented with lagged copies of the original states and their derivatives. This approach enabled the modeling of memory effects in the system, improving accuracy over the tested cases compared to the standard formulation. Ref. [29] systematically explored the use of Hankel-DMD (HDMD) for short-term forecasting of ship motions, highlighting its potential for real-time prediction and control applications and for digital twinning. One of the first efforts toward a DMD-based, data-driven system identification model for ship motions was conducted in [30], using the dynamic mode decomposition with control (DMDc). DMDc incorporates the control variables (e.g., rudder angle) and forcing inputs (e.g., wave elevation) in the system regression, separating their effect from the free evolution of the system. Furthermore, Hankel-DMD with control (HDMDc) was applied to the prediction of both ship motions and forces, demonstrating the capability of the method to achieve good accuracy without degradation over the prediction time. The HDMDc was then tested on several ship test cases, introducing methodological advancements to face specific challenges, such as the embedding of nonlinear observables in the state and input vectors to address the extremely nonlinear responses of planing hulls in slamming [31], and the use of a Tikhonov-regularized least-squares formulation for improving the numerical stability of the DMD regression when using noisy experimental data [32].
Quantifying the uncertainty associated with the predictions of a ROM has become increasingly relevant for data-driven modeling, and it is a key characteristic for its usage in the context of, e.g., multifidelity analysis and optimization. Few approaches for introducing uncertainty quantification in DMD analysis have been presented in the literature so far. Ref. [33] first introduced a probabilistic model by modeling the measurement noise as Gaussian and treating the dynamic modes and eigenvalues as random variables, whose posterior distributions were inferred through Gibbs sampling. Later, ref. [34] presented the bagging optimized-DMD, where Breiman's statistical resampling strategy was applied to training data, generating ensembles of DMD models and estimating confidence intervals for the extracted modes and eigenvalues. A similar approach was used in [32] and called a frequentist approach: several training signals were used, producing an ensemble of HDMDc models and estimating confidence intervals for time-resolved predictions of ship motions. Refs. [29,30] considered the uncertainty arising from the selection of HDMD and HDMDc hyperparameters: the length of the training sequence and the number of time-lagged copies of the state were separately considered as probabilistic variables with uniform distributions within a suitable range. Monte Carlo sampling was applied to obtain an ensemble of predictions forming a posterior distribution. This concept, referred to as Bayesian, was further developed in [29,30,31,32], where all relevant hyperparameters are considered as stochastic variables at once.
HDMDc and its stochastic extensions enabled the construction of robust, uncertainty-aware ROMs capable of capturing the essential dynamics of marine systems under realistic operating conditions. In practical marine applications, it is crucial that ROMs remain valid across a variety of sea states, wave spectra, and loading conditions. Testing the transferability of DMD-derived models beyond their training datasets, therefore, represents a key step toward their reliable adoption in real-world scenarios. However, despite the growing body of literature on DMD-based modeling in ship dynamics, most studies to date have focused on fixed or well-controlled experimental conditions, with limited exploration of the generalization capabilities of such models when exposed to unseen environments. In [30,31,32], for example, the test set for the DMD-based ROMs was composed of ship dynamic responses to unseen forcing wave signals, which, however, represented different realizations of the same sea state, characterized by an identical spectral distribution of wave energy. Consequently, although the wave sequences used for testing were unseen during training, they were statistically consistent with the training conditions, thus not probing the generalization capability of the models to different sea states.
The present work aims to address this gap by investigating the specific capability of HDMDc and BHDMDc system identification to generalize beyond the training conditions. A dedicated experimental campaign was performed at the CNR-INM towing tank facility, collecting data from a Codevintec CK-14e autonomous surface vehicle (ASV) subject to irregular and regular head wave conditions. The ASV features a recessed moon pool, which induces significant nonlinearities in the hydrodynamic response, primarily associated with sloshing and piston-like oscillations of the internal free surface. Specifically, the DMD-based ROMs are learned using data from the irregular wave condition and subsequently applied to predict the vessel response in irregular and regular wave conditions. Statistical and probabilistic analyses were employed to quantify prediction accuracy and uncertainty.
The remainder of this paper is organized as follows. Section 2 describes the experimental setup and data acquisition. Section 3 presents the HDMDc and BHDMDc methodologies adopted in the study. Section 4 introduces the statistical variables and performance metrics, and Section 5 details the system identification setup. Section 6 discusses the identification results and the validation of the models under different sea conditions. Finally, Section 7 summarizes the main findings and outlines perspectives for future research on data-driven modeling of marine systems.
3. HDMDc
DMD [14,36] was originally presented to decompose high-dimensional time-resolved data into a set of spatiotemporal coherent structures, characterized by fixed spatial structures (modes) and associated temporal dynamics, providing a linear reduced-order representation of possibly nonlinear system dynamics. The original DMD characterizes naturally evolving dynamical systems. In contrast, its extension to forced systems, called DMD with control (DMDc) [37], accounts for the influence of forcing inputs in the analysis, helping disambiguate them from the unforced dynamics of the system.
The standard DMD and DMDc formulations approximate the Koopman operator, creating a best-fit linear model that links sequential data snapshots of measurements [14,15,38]. This model provides a locally linear (in time) representation of the dynamics, which, however, is unable to capture many essential features of nonlinear systems. The augmentation of the system state is thus the subject of several DMD algorithmic variants [17,39,40,41] that aim to find a coordinate system (or embedding) spanning a Koopman-invariant subspace, in order to search for an approximation of the Koopman operator valid also far from fixed points and periodic orbits in a larger space. However, there is no general rule for defining these observables and guaranteeing that they will form a closed subspace under the Koopman operator [42].
The HDMD [43] is a specific version of the DMD algorithm developed to deal with nonlinear systems for which only partial observations are available [17]. By incorporating time-lagged information in the data used to learn the model, HDMD and HDMDc increase the dimensionality of the system. Including time-delayed data in the analysis, the HDMD and its extension to externally forced systems, HDMDc, can extract linear modes and the associated input operator defined in a space of augmented dimensionality. Such modes are capable of reflecting the nonlinearities in the time evolution of the original system through complex relations between present and past states. The state vector is thus augmented by embedding $s$ time-delayed copies of the original variables; the HDMDc involves, in addition, augmenting the input vector with $z$ time-delayed copies of the original forcing inputs. The use of time-delayed copies as additional observables in DMD has been connected to the Koopman operator as a universal linearizing basis [44], yielding the true Koopman eigenfunctions and eigenvalues in the limit of infinite-time observations [43].
The HDMDc identifies a representation of the dynamics as an externally forced linear system:

$$\bar{\mathbf{x}}_{j+1} = \bar{\mathbf{A}}\,\bar{\mathbf{x}}_j + \bar{\mathbf{B}}\,\bar{\mathbf{u}}_j \qquad (3)$$

The vectors $\bar{\mathbf{x}}_j$ and $\bar{\mathbf{u}}_j$ are referred to as the extended state and input vectors, respectively, at the time instant $j$. They are obtained starting from the original state and input vectors $\mathbf{x}_j$ and $\mathbf{u}_j$, respectively, which are augmented by embedding a number $s$ and $z$ of time-lagged (delayed) copies of the original state and input variables, such that:

$$\bar{\mathbf{x}}_j = \begin{bmatrix} \mathbf{x}_j \\ \mathbf{x}_{j-1} \\ \vdots \\ \mathbf{x}_{j-s} \end{bmatrix}, \qquad \bar{\mathbf{u}}_j = \begin{bmatrix} \mathbf{u}_j \\ \mathbf{u}_{j-1} \\ \vdots \\ \mathbf{u}_{j-z} \end{bmatrix}$$

As a consequence, the extended state matrix $\bar{\mathbf{A}}$ and the extended system input matrix $\bar{\mathbf{B}}$ are defined accordingly.
The procedure to extract the extended matrices from data starts by introducing the vector $\bar{\boldsymbol{\omega}}_j$, which stacks the extended state and input vectors:

$$\bar{\boldsymbol{\omega}}_j = \begin{bmatrix} \bar{\mathbf{x}}_j \\ \bar{\mathbf{u}}_j \end{bmatrix}$$

such that Equation (3) can be rewritten as follows:

$$\bar{\mathbf{x}}_{j+1} = \bar{\boldsymbol{\Omega}}\,\bar{\boldsymbol{\omega}}_j, \qquad \bar{\boldsymbol{\Omega}} = \left[\bar{\mathbf{A}},\ \bar{\mathbf{B}}\right]$$

Data from $m$ snapshots are re-arranged in two augmented data matrices $\bar{\mathbf{Y}} = \left[\bar{\mathbf{x}}_2, \dots, \bar{\mathbf{x}}_m\right]$ and $\bar{\mathbf{Z}} = \left[\bar{\boldsymbol{\omega}}_1, \dots, \bar{\boldsymbol{\omega}}_{m-1}\right]$. These are built from the matrices containing the state and input snapshots at the $m$ considered time instants, together with the Hankel matrices containing the extended delayed state and input snapshots.
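The construction of the delay-embedded (Hankel) data blocks can be sketched as follows; the function name `hankel_embed` and the toy dimensions are illustrative and not taken from the paper.

```python
# Sketch of the delay embedding used by HDMD/HDMDc: given a time series
# X of shape (N, m) (N variables, m snapshots) and a number of lags,
# stack lagged copies of the columns to form the extended snapshots.
import numpy as np

def hankel_embed(X, lags):
    """Return a ((lags + 1) * N, m - lags) matrix whose j-th column
    stacks [x_{j+lags}; x_{j+lags-1}; ...; x_j]."""
    N, m = X.shape
    rows = [X[:, lags - k : m - k] for k in range(lags + 1)]
    return np.vstack(rows)

# toy example: 2 state variables, 6 snapshots, s = 2 delays
X = np.arange(12, dtype=float).reshape(2, 6)
H = hankel_embed(X, lags=2)
```

Each column of `H` then plays the role of one extended state snapshot in the regression.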
The augmented matrix $\bar{\boldsymbol{\Omega}}$ is approximated by solving the following regularized least-squares minimization:

$$\bar{\boldsymbol{\Omega}} = \underset{\boldsymbol{\Omega}}{\arg\min} \left\| \bar{\mathbf{Y}} - \boldsymbol{\Omega}\,\bar{\mathbf{Z}} \right\|_F^2 + \lambda \left\| \boldsymbol{\Omega} \right\|_F^2$$

whose solution is given by the following equation:

$$\bar{\boldsymbol{\Omega}} = \bar{\mathbf{Y}}\,\bar{\mathbf{Z}}^{\mathsf{T}} \left( \bar{\mathbf{Z}}\,\bar{\mathbf{Z}}^{\mathsf{T}} + \lambda \mathbf{I} \right)^{-1}$$

where $\lambda$ is a regularization factor. The above Tikhonov-regularized formulation extends the one presented in [45] to the HDMDc. It is suitable for improving the numerical stability of the DMD regression, owing to its robustness in high-dimensional spaces and its accuracy for applications involving noisy data. A similar effect can be pursued with the exact-DMD formulation by SVD rank truncation (see [31]), which also reduces the model size. The Tikhonov-regularized formulation is preferred here for its more robust behavior with noisy data. Once the matrices $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ are obtained, Equation (3) can be used to calculate the time evolution of the augmented state vector from an initial condition, where the tilde indicates the HDMDc estimation. By isolating its first $N$ components, the predicted time evolution of the original state variables is extracted.
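The regression and the subsequent free-running prediction can be sketched as below, assuming the standard DMDc form x_{j+1} = A x_j + B u_j; the function names and the toy system are illustrative, not the paper's implementation.

```python
# Sketch of the Tikhonov-regularized least-squares step of (H)DMDc:
# Omega = Y Z^T (Z Z^T + lam I)^-1, where Z stacks state and input
# snapshots and Y holds the one-step-shifted states.
import numpy as np

def hdmdc_regress(Z, Y, lam=1e-6):
    """Solve min ||Y - Omega Z||_F^2 + lam ||Omega||_F^2."""
    G = Z @ Z.T + lam * np.eye(Z.shape[0])
    return Y @ Z.T @ np.linalg.inv(G)

def rollout(A, B, x0, U):
    """Propagate x_{j+1} = A x_j + B u_j from an initial condition."""
    X = [x0]
    for j in range(U.shape[1]):
        X.append(A @ X[-1] + B @ U[:, j])
    return np.column_stack(X)

# toy forced linear system to exercise the regression
rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[0.5], [1.0]])
U = rng.standard_normal((1, 200))
X = np.zeros((2, 201))
for j in range(200):
    X[:, j + 1] = A_true @ X[:, j] + B_true[:, 0] * U[0, j]

Z = np.vstack([X[:, :-1], U])               # stacked state/input snapshots
Omega = hdmdc_regress(Z, X[:, 1:], lam=1e-8)
A_id, B_id = Omega[:, :2], Omega[:, 2:]     # identified A and B
X_pred = rollout(A_id, B_id, X[:, 0], U)    # free-running prediction
```

On this noiseless toy system the regression recovers the true matrices almost exactly; with noisy data, a larger regularization factor trades bias for numerical stability, as discussed above.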
Uncertainty Quantification in HDMDc Through Ensembling
DMD-based models can be further extended to provide uncertainty estimation of their predictions. To this aim, an ensembling approach is applied, i.e., the combination of predictions coming from different models to obtain a prediction along with its statistics. The Bayesian formulation for HDMDc, which leads to the BHDMDc, was introduced in [30]. The driving rationale behind the BHDMDc is the observation, highlighted by the authors in previous works [13,28], that the final prediction from DMD-based models can vary strongly for different hyperparameter settings. Furthermore, there is no general rule for determining their optimal values.
The dimensions and the values within the matrices $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ depend on the four hyperparameters $m$, $s$, $z$, and $\lambda$. In place of the first three, we equivalently use their respective time lengths, i.e., the observation time length, the maximum delay time in the augmented state, and the maximum delay time in the augmented input, which are obtained by multiplying $m$, $s$, and $z$ by the step of the temporal discretization.
In BHDMDc, the hyperparameters are considered as stochastic variables with given prior distributions. Through uncertainty propagation, the solution also depends on their values: the expected value of the solution at a given time $t$ and its standard deviation are defined by integrating over the hyperparameter space between the respective lower and upper bounds, weighted by the given probability density functions. In practice, a uniform probability density function is assigned to each hyperparameter, and a set of realizations is obtained through Monte Carlo sampling, yielding a posterior distribution on the prediction.
We note that the proposed method is termed “Bayesian” in a broader sense than classical Bayesian inference. In particular, posterior distributions are not inferred over the Koopman operator or dynamic mode decomposition matrices themselves. Instead, in BHDMDc, the hyperparameters of the method are treated in a Bayesian manner, assigning them given prior distributions and propagating their uncertainty through Monte Carlo sampling. This results in a predictive distribution for the system variables. While this does not constitute full Bayesian parameter inference, it is consistent with Bayesian principles of uncertainty propagation and model averaging and aligns with how “Bayesian” is used in the related literature for uncertainty-aware reduced-order modeling.
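The uncertainty-propagation loop can be illustrated with a minimal sketch: hyperparameters are drawn from uniform priors, each realization yields a prediction, and the ensemble mean and standard deviation form the Bayesian estimate. The `fit_and_predict` function below is a stand-in for a full HDMDc fit; all names and values are hypothetical.

```python
# Sketch of BHDMDc-style ensembling via Monte Carlo sampling of the
# hyperparameter priors; the model itself is a placeholder.
import numpy as np

rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 200)

def fit_and_predict(ell, tau_s, tau_z, lam):
    # placeholder "prediction" that depends mildly on the hyperparameters,
    # mimicking the model-to-model variability of real HDMDc fits
    return np.exp(-lam * t) * np.sin(2.0 * np.pi * t / tau_s) * (ell / (20.0 * tau_z))

# uniform priors centered on nominal hyperparameter values
nominal = {"ell": 20.0, "tau_s": 2.0, "tau_z": 1.0, "lam": 0.05}
samples = {k: rng.uniform(0.5 * v, 1.5 * v, size=100) for k, v in nominal.items()}

ensemble = np.array([
    fit_and_predict(samples["ell"][i], samples["tau_s"][i],
                    samples["tau_z"][i], samples["lam"][i])
    for i in range(100)
])
mean_pred = ensemble.mean(axis=0)  # expected value of the prediction
std_pred = ensemble.std(axis=0)    # predictive standard deviation
```

The ensemble mean and standard deviation play the roles of the expected value and uncertainty band of the Bayesian prediction.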
4. Statistical Variables and Performance Metrics
To compare the predictions made by the deterministic and Bayesian models with the ground truth from the experiments, four error indices are employed: the average normalized root mean square error (ANRMSE) [13], its time-resolved version, the normalized average minimum/maximum absolute error (NAMMAE) [13], and the Jensen–Shannon divergence (JSD) [46].
The ANRMSE quantifies the root mean square error between the predicted values and the measured (reference) values at different time steps, normalizing the result for each variable by $k$ times the standard deviation of the measured values and averaging over the $N$ variables in the state vector; the standard deviation of the measured values is evaluated in the considered time window for each variable. For the same time window, a time-resolved variant of the ANRMSE is evaluated as the average across the $N$ variables of the time evolution of the square-root difference between the reference and predicted signals, normalized by the standard deviation in the considered time interval.
This is used to monitor potential trends in the prediction error, i.e., whether the accuracy decreases or increases for longer predictions.
The NAMMAE metric [13] provides an engineering-oriented assessment of prediction accuracy. It measures the absolute difference between the minimum and maximum values of the predicted and measured time series, normalized by $k$ times the standard deviation of the measured values, and averaged over the $N$ variables.
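A possible reading of the two metrics can be sketched as follows, assuming `meas` and `pred` are arrays of shape (N, N_t); the value of `k` and the averaging of the min/max discrepancies are assumptions, not the paper's exact definitions.

```python
# Sketch of the ANRMSE and NAMMAE error metrics as described in the text;
# k is the normalization factor multiplying the measured standard deviation.
import numpy as np

def anrmse(meas, pred, k=2.0):
    """RMSE per variable, normalized by k*sigma of the measurement,
    averaged over the N variables."""
    sigma = meas.std(axis=1)
    rmse = np.sqrt(((pred - meas) ** 2).mean(axis=1))
    return (rmse / (k * sigma)).mean()

def nammae(meas, pred, k=2.0):
    """Average of the absolute min/max discrepancies, normalized by
    k*sigma and averaged over the N variables."""
    sigma = meas.std(axis=1)
    err = 0.5 * (np.abs(pred.min(axis=1) - meas.min(axis=1))
                 + np.abs(pred.max(axis=1) - meas.max(axis=1)))
    return (err / (k * sigma)).mean()

# toy two-variable reference signal
t = np.linspace(0.0, 10.0, 500)
meas = np.vstack([np.sin(t), np.cos(t)])
```

Both metrics vanish for a perfect prediction and grow with the normalized discrepancy.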
In addition to the direct comparison of DMD-predicted and experimentally measured time histories, calculating the probability density function (PDF) of the predicted variables is of interest for the application in irregular waves. To statistically assess the PDF estimate, obtaining confidence intervals, a moving block bootstrap (MBB) method is applied to the time histories from the EFD and the BHDMDc-based predictions of each variable. For the MBB analysis of a generic variable, a single time series is obtained for the EFD and the BHDMDc, respectively, by joining all the measured or predicted series. From this single time series, a set of overlapping moving blocks is defined, each of an optimal block length [47]. From the original set of blocks, a number of blocks are drawn at random with replacement and concatenated in the order they are picked, forming a new bootstrapped series of the original size. The PDF of each bootstrapped time series is obtained using kernel density estimation [48] with a normal kernel function, whose bandwidth [49] is set based on the inter-quartile range of the variable. The expected value and a confidence interval are calculated for the PDF of each variable for both the EFD measurements and the DMD-based predictions, with the quantile function evaluated at the probabilities defining the lower and upper bounds of the 95% confidence interval of the PDFs.
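The block-resampling step of the MBB can be sketched as below; the block length and the synthetic series are illustrative, and the optimal block-length rule of [47] is not reproduced.

```python
# Sketch of a moving block bootstrap (MBB) resample: overlapping blocks
# of length b are drawn with replacement from the joined series and
# concatenated until the bootstrapped series matches the original length.
import numpy as np

def mbb_resample(y, b, rng):
    n = len(y)
    n_blocks = int(np.ceil(n / b))
    starts = rng.integers(0, n - b + 1, size=n_blocks)
    return np.concatenate([y[s:s + b] for s in starts])[:n]

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0.0, 20.0, 500))  # stand-in for a joined time series
boot = mbb_resample(y, b=50, rng=rng)    # one bootstrapped replicate
```

Repeating the resampling and estimating the PDF of each replicate yields the ensemble from which the expected PDF and its confidence interval are computed.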
The so-obtained PDFs of each variable from the different sources are then compared using the JSD [46]. The Jensen–Shannon divergence [50] is a symmetric measure of the similarity between two probability distributions $V$ and $W$, based on the Kullback–Leibler divergence ($D$) [51]. Here, the two distributions are the PDFs of the variable from the EFD and the HDMDc, respectively. The JSD quantifies the average discrepancy of each distribution with respect to their mixture $M$, defined over the common domain. It is always finite and bounded.
For each variable, the expected value and the quantile function of the JSD are calculated on the PDFs evaluated from the bootstrapped time series, defining the lower and upper bounds of its 95% confidence interval.
5. System Identification Setup
In the present work, no equation-based motion model of the ASV is assumed a priori. The reduced-order model is obtained in a fully data-driven and equation-free manner, where the state-space representation of Equation (3) is identified directly from the experimental measurements through the (Bayesian) Hankel-DMDc (see Box 1 for a summary of the system identification workflow using HDMDc). The variables in the state vector correspond to the surge, heave, and pitch degrees of freedom, which dominate the symmetric head-sea response of the moored vessel, together with their first and second time derivatives and the measured mooring loads. These quantities reflect the relevant physics of the phenomenon to be predicted by the system identification procedure. The input vector is composed of observables based on the wave elevations measured by the Kenek probes and on the model surge $x$.
During the experiments, the relative position between the model and the wave probes was not constant, and large deviations from the rest position occurred in the x-direction. The elastic restoring force from the moorings was not sufficient to counteract the wave forces and keep the surge oscillating around the rest position. Hence, directly using the wave elevation in the input vector as measured by the probes would lead to a non-negligible phase error in the prediction. The HDMDc and BHDMDc are, in fact, able to learn a single phase relation between input and output, which, however, would be insufficient due to the dynamic change in the relative position between the probes and the model. For this reason, the wave-elevation observables were approximated as delayed copies of the measured signals, with a delay that depends on the surge $x$ and the effective phase velocity $c$. In this work, a simple estimation of $c$ is obtained through signal correlation between the two probe signals. The cross-correlation operator applied to the measured discrete wave elevation signals results in a time-discrete vector collecting the correlation values for varying numbers $i$ of shift samples. The temporal shift is hence found from the discrete integer shift between the signals at which the cross-correlation reaches its maximum.
Box 1. System identification workflow using HDMDc.
Inputs. Training state and input time series; training window length; delay-embedding lengths for the state and input; testing input time series for the forecast horizon.
Outputs. Deterministic prediction over the forecast horizon and the identified system matrices.
An average encounter wave period is calculated as the average encounter period of the measured signal in the three EFD runs with irregular waves. For ease of processing with DMD, the data were downsampled to 32 time steps per average encounter period. This reduced the size of the data matrices to be handled while preserving sufficient temporal resolution to retain the fidelity of the original signal.
Leveraging results from previous studies by the authors [30,31,32], reasonable values for the HDMDc hyperparameters are defined for the deterministic analysis. The ranges of variation for the uniform hyperparameter distributions in the Bayesian analysis were obtained considering a ±50% interval centered on the deterministic values, defining the respective priors. The Bayesian predictions are obtained using 100 Monte Carlo realizations of the hyperparameters; $s$, $z$, and $m$ are taken as the nearest integers to the calculated values.
All the analyses are based on normalized data using the Z-score standardization: the mean of the time series of each variable is removed and the amplitude is scaled with the standard deviation evaluated on the training signal.
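The training-based Z-score standardization can be sketched as follows: the mean and standard deviation are computed on the training window only and then applied unchanged to the test data.

```python
# Sketch of Z-score standardization using training-window statistics.
import numpy as np

def zscore_fit(train):
    """Return per-variable mean and std of a (N, N_t) training array."""
    return train.mean(axis=1, keepdims=True), train.std(axis=1, keepdims=True)

def zscore_apply(data, mu, sigma):
    return (data - mu) / sigma

rng = np.random.default_rng(3)
train = rng.standard_normal((3, 200))
mu, sigma = zscore_fit(train)
z_train = zscore_apply(train, mu, sigma)  # zero mean, unit std per variable
```

Applying the training statistics to the test data avoids leaking information from the testing window into the model.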
The data from each experimental run in irregular waves were split into training (the first half) and testing (the second half) sets. Data from the regular wave tests are used entirely for testing.
6. Results and Discussion
In order to statistically assess the performance of HDMDc and BHDMDc, the ANRMSE and NAMMAE were evaluated for several training and testing sequences. In particular, 10 training sequences and 10 test sequences were randomly selected and combined in a full-factorial analysis. The same training and test sequences were used for the deterministic and Bayesian versions of the ROM to guarantee a fair comparison. The length of the test time histories was fixed for the irregular waves case, with a shorter length used for the regular waves case due to the reduced signal lengths.
For both the irregular-to-irregular and irregular-to-regular cases, results are presented as box–violin plots comparing the ANRMSE and NAMMAE for the deterministic and Bayesian ROMs, see Figure 5 and Figure 9, respectively.
The boxes show the first, second (equivalent to the median value), and third quartiles, while the whiskers extend from the box to the farthest data point lying within 1.5 times the interquartile range, defined as the difference between the third and the first quartiles from the box. The density of the data distribution is additionally represented by the violin shape, providing a visual summary of the distribution beyond the quartiles. The values of single realizations, including outliers, are plotted with dots.
In addition, representative test sequences are shown in Figure 6 and Figure 7 for the irregular-to-irregular case and in Figure 10, Figure 11 and Figure 12 for the irregular-to-regular case. The figures compare the experimental measurements (EFD), the deterministic ROM prediction, and the Bayesian ROM prediction (mean value as a solid line and 95% confidence interval as a shaded area of the same color).
Finally, the MBB analysis is applied to the irregular-to-irregular case, and PDFs from bootstrapped sequences for the EFD and BHDMDc are obtained for each predicted variable, along with their 95% confidence intervals, and are shown in Figure 8. The differences between the EFD and BHDMDc distributions are evaluated using the JSD, and the results are summarized in Table 2.
6.1. Irregular-to-Irregular
The HDMDc and BHDMDc ROMs produce an overall reliable and robust prediction for unseen irregular input waves. No trend in the time-resolved error was observed throughout the testing window, suggesting that the same level of accuracy can be maintained for arbitrarily long sequences.
Observing the time-resolved predictions in Figure 6 and Figure 7 evidences that the great majority of the ANRMSE and NAMMAE errors arises from the prediction of the mooring forces. This is also confirmed by Table 2, which shows that the JSD for the mooring-load PDFs is an order of magnitude higher than for the other variables. Comparing the box–violin plots in Figure 5, it can be noted that the BHDMDc model reduced both the ANRMSE and NAMMAE errors, lowering the average error and also reducing the dispersion of the results.
The uncertainty in the Bayesian model is very low, as can be observed in Figure 6 and Figure 7. This is consistent with the high accuracy achieved for the vessel dynamics predictions and indicates the robustness of the model to hyperparameter changes within the identified ranges (confirming the rules of thumb for the hyperparameter values identified in the literature [31,32]). However, the confidence intervals are not sufficiently extended to cover the ground truth for the mooring loads, whose predictions are less accurate and for which larger uncertainties would have been expected.
The remarkable accuracy obtained for the ship dynamics is also reflected in the MBB analysis, as the PDFs from EFD and BHDMDc data are very close, and their confidence intervals are almost always overlapping. The MBB analysis also confirms a reduced accuracy in the estimation of mooring forces.
The difficulty in obtaining accurate predictions of mooring forces is due to a combination of measurement noise and artifacts arising from the physical behavior of the mooring lines. Specifically, the mooring lines rested at the water surface, alternately emerging or being submerged by waves (particularly the bow mooring line), which induced sudden variations in tension that do not necessarily reflect the system dynamics. In addition, insufficient pretensioning, mainly due to an initial underestimation of the ASV mass and wave forces, caused the lines to be slack at times, further contributing to spurious oscillations in both the mooring lines and the measured loads. These factors complicated the learning of the system response by the DMD-based system identification, with a consequent increased error in the prediction of the mooring forces.
Nevertheless, the DMD-based models capture the low-frequency content of the loads, as can be seen in
Figure 6 and
Figure 7, achieving a fair estimation of the load peaks. Regularization plays a key role here, preventing the identification of spurious and unstable dynamics, to which the method is particularly sensitive in noisy data environments.
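The effect of regularization can be illustrated with a minimal Tikhonov-regularized DMDc least-squares fit; this is a sketch of the underlying idea, not the full Hankel/Bayesian formulation used in this work:

```python
import numpy as np

def regularized_dmdc(X, U, alpha=1e-2):
    """Tikhonov-regularized DMD with control (minimal sketch).

    Fits x_{k+1} ~ A x_k + B u_k in least squares; the L2 penalty
    `alpha` damps spurious/unstable modes that measurement noise would
    otherwise excite.  X: (n_states, m) snapshots; U: (n_inputs, m)."""
    X1, X2 = X[:, :-1], X[:, 1:]
    Omega = np.vstack([X1, U[:, :-1]])        # stacked state-input data
    H = Omega @ Omega.T + alpha * np.eye(Omega.shape[0])
    AB = np.linalg.solve(H, Omega @ X2.T).T   # ridge solution X2 W^T H^-1
    n = X.shape[0]
    return AB[:, :n], AB[:, n:]               # A, B
```

Increasing `alpha` biases the identified operator toward smaller spectral radius, trading a little accuracy on clean data for stability on noisy data.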
6.2. Irregular-to-Regular
The ROM learned in irregular wave conditions is tested in regular waves.
Figure 10,
Figure 11 and
Figure 12 show the results of a selected test sequence for
,
, and
, respectively. The DMD-based models achieve remarkable accuracy in these cases as well for the variables linked to the ship dynamics, with the largest errors occurring in the reproduction of the mooring forces. Part of the error clearly arises from an imperfect identification of the load-related subsystem, already noted in the irregular-wave results. In addition, the measurement issue evidenced for the mooring forces was even more prominent in the regular-wave case: during the regular-wave tests, the shorter-wavelength excitations pushed the model downstream more effectively, amplifying the slackness of the stern line (loads on the stern mooring are consequently better predicted for higher
).
The box–violin plots showing the statistical analysis of ANRMSE and NAMMAE results are presented in
Figure 9. It can be noted that the ANRMSE is slightly higher than for the irregular-to-irregular wave predictions; however, the NAMMAE is lower. This is consistent with a more pronounced phase error in the prediction of
and
, while the models still provide a reasonable prediction of the loads’ extrema.
The ROMs trained on P-M irregular waves were able to accurately predict the vessel responses to regular head waves. On the one hand, the regular wave frequencies lie on the descending side of the spectral peak of the training sea state
Hz,
Hz, and
Hz, as can be seen from
Figure 2. On the other hand, irregular and regular wave excitations differ substantially in frequency content and phase coherence. The successful prediction of the regular-wave responses is therefore an indication that the models have generalized beyond the specific type of excitation seen during training, at least as long as the new excitation remains within the spectral band covered by the training data. This highlights the importance of selecting informative training datasets, i.e., datasets that encompass a sufficiently rich representation of the system dynamics, enabling the development of ROMs capable of generalizing across multiple operating conditions. In this sense, irregular sea states appear promising due to their broadband spectral content, which inherently provides a more comprehensive excitation of the system.
7. Conclusions
This work presented the system identification of a small ASV in moored conditions using the HDMDc and its uncertainty-aware extension, BHDMDc. The methods were applied to experimental data collected in the CNR-INM towing tank under both irregular and regular head wave conditions. The ASV under investigation features a recessed moon pool, which induces nonlinear responses due to sloshing, thereby increasing the modeling challenge.
The deterministic HDMDc formulation successfully captured the dominant dynamics of the vessel motions, providing accurate predictions of surge, heave, and pitch across all tested conditions. The prediction of mooring forces was more challenging and exhibited larger errors, mainly due to measurement noise and extremely nonlinear effects related to the partial slackening and intermittent immersion of the mooring lines. Nevertheless, the models were able to reproduce the low-frequency components and general trends of the mooring loads, offering a meaningful approximation of their temporal evolution.
The Bayesian extension introduced uncertainty quantification by propagating the variability of the HDMDc hyperparameters through Monte Carlo sampling. The Bayesian model consistently improved the prediction accuracy compared to the deterministic version, while also improving robustness as measured by the ANRMSE and NAMMAE.
A key outcome of this study is the generalization capability of the HDMDc and BHDMDc models trained exclusively on irregular wave data. The trained models were able to accurately predict the ASV response in regular wave conditions, a different excitation regime, although still characterized by frequencies within the spectral range of the training sea state. This highlights the importance of constructing informative training datasets that encompass a sufficiently rich representation of the system dynamics. In particular, irregular sea states, due to their broadband spectral content, inherently provide a more comprehensive excitation of the system, enabling the development of ROMs capable of generalizing across multiple operating conditions within the same frequency band. Further extending the approach to include, e.g., different sea states, wave directions, and operating conditions would represent a key enabling technology for digital twins of marine systems, where reliable real-time predictions and uncertainty quantification are essential for control, monitoring, and decision support applications. In this context, the interpolation of parametric reduced-order models [
52,
53] may represent a viable strategy to overcome the intrinsic limitations of using a single, albeit informative, training set, by combining multiple locally trained models to construct a global predictive model that retains accuracy across a broader range of operating conditions.