1. Introduction
Accurate wave predictions have become increasingly important in recent decades due to the range of activities they affect, such as search and rescue, tourism, shipping, and renewable energy. Numerical Wave Prediction (NWP) models, which are gradually being employed by operational centers to successfully mimic environmental conditions on a worldwide scale, are a dependable and efficient way to accomplish these goals.
However, when forecasting wave parameters in a specific geographic region of interest, NWP models often struggle to give adequate results. This is due to the complex interplay between multiple factors, including the strong reliance on initial and lateral boundary conditions, the challenge of capturing small-scale phenomena, and the parametrization of certain wave processes [1].
To avoid such issues, one feasible option would be to increase the NWP model’s resolution; however, the efficiency of this methodology is uncertain, and the computational cost would surely increase dramatically. A different strategy is to utilize post-processing algorithms to improve the direct output of the NWP model in use, or to employ assimilation systems to enhance its initial conditions. Galanis et al. [2] introduced a strategy that enhances the effect of data assimilation on predicting ocean waves, demonstrating improved accuracy via integrated modeling techniques. Famelis et al. [3] investigated both classical and Quasi-Newton methods to optimize the prediction of meteorological parameters, while Famelis and Tsitouras [4] proposed a quadratic shooting solution for environmental parameter prediction, which effectively addresses complex boundary conditions.
Building on these foundational advancements, Dong et al. [5] developed a hybrid data assimilation system incorporating machine learning to augment numerical weather prediction models, addressing limitations inherent in traditional methods. Similarly, Rojas-Campos et al. [6] applied deep learning techniques to post-process NWP precipitation forecasts, significantly improving predictive accuracy. Furthermore, Krasnopolsky [7] conducted a comprehensive review of machine learning applications in data assimilation and model physics, emphasizing the transformative potential of these technologies.
More recently, Kordatos et al. [8] explored the application of Radial Basis Function neural networks for predicting significant wave height, demonstrating their efficacy in improving forecasts through spatially nested datasets. Collectively, these studies illustrate the critical role that advanced numerical methods and machine learning play in enhancing the accuracy and reliability of environmental predictions, with broad implications for sectors such as marine operations and climate research.
The proposed methodology belongs to the class of post-processing algorithms. More precisely, it aims to improve the predictions of an NWP model by reducing the systematic and non-systematic parts of the simulation error. Systematic errors, also known as biases, are consistent and predictable deviations caused by inherent deficiencies in the model, such as flawed parameterizations or incomplete representation of physical processes. These errors persist over time or under specific conditions, making them identifiable and correctable through techniques like bias correction or model calibration.
On the other hand, non-systematic errors are random and unpredictable deviations arising from factors such as incomplete observations, numerical noise, or unresolved small-scale phenomena (wave shoaling, wave refraction, diffraction, etc.). Their lack of a consistent pattern makes them more challenging to mitigate, underscoring the chaotic and stochastic nature of the simulated system. Addressing both types of errors is crucial for enhancing the accuracy and reliability of environmental predictions.
The first objective has been extensively discussed by several researchers, who have developed various tools to address it, such as ANN mechanisms [9,10] or sophisticated statistical models [11,12,13]. In our approach, however, we utilize the Kalman filter (KF) algorithm to remove such errors [14,15,16]. The Kalman filter is considered the fastest sequential approach [17] that combines recent forecasts with recursively observed data. Thus, its low CPU and memory demands provide a significant benefit for every application.
In many cases, though, KFs are unable to detect and, hence, decrease the non-systematic part of the forecast error [18], resulting in poor and unstable final predictions. To tackle this challenge, a Radial Basis Function neural network (RBF nn) is applied in this work, acting as an additional filter after the Kalman filter’s initial implementation, with the goal of constraining the associated forecast uncertainty.
Under this framework, this study introduces a novel dual filter that uniquely combines Radial Basis Function neural networks with Kalman filters to enhance significant wave height forecasts obtained from the WAve Model (WAM). Unlike existing methodologies [19,20], the produced system is designed to simultaneously eliminate systematic biases and constrain the variability of the remaining non-systematic errors, resulting in more accurate and reliable final predictions. Moreover, another innovative aspect of the proposed system is its self-adaptiveness, which automatically determines the optimal RBF structure through hyperparameter optimization. This advanced capability ensures the robustness of the method across diverse regions and temporal scales, as illustrated via various case studies.
The suggested methodology was evaluated using an innovative time-window process application. Specifically, the first case study concerns the areas of Mykonos and Crete in the Aegean Sea for the years 2007–2009, while the second concerns the region 46002 in the Pacific Ocean for the years 2011–2013. In every case, the obtained results are compared to those derived from the standard Kalman filter to assess the efficacy of the suggested dual filter over classic methodologies.
The rest of the paper is organized as follows: in Section 2, the main properties of the WAM model are described, along with a comprehensive analysis of the suggested methodology. Section 3 and Section 4 focus on the main elements of the Kalman filters and Radial Basis Function neural networks, while the time-window process application, together with the obtained results, is presented in Section 5. Finally, the conclusions drawn from the dual filter implementation are extensively discussed in Section 6.
3. Kalman Filters
Kalman filtering [30] is a set of mathematical formulations that compose a powerful and computationally efficient algorithm estimating the evolution of an unknown state vector $\mathbf{x}_t$ at time $t$, given information about a recorded vector $\mathbf{y}_t$ at the same time. It is assumed that the evolution of the state from time $t$ to $t+1$ is given by the following system equation:

$$\mathbf{x}_{t+1} = F_t \mathbf{x}_t + \mathbf{w}_t, \quad (3)$$

while the connection between $\mathbf{x}_t$ and the observable vector $\mathbf{y}_t$ is given by the measurement equation:

$$\mathbf{y}_t = H_t \mathbf{x}_t + \mathbf{v}_t. \quad (4)$$

Combining Equations (3) and (4), the following state-measurement model is constructed:

$$\begin{cases} \mathbf{x}_{t+1} = F_t \mathbf{x}_t + \mathbf{w}_t, \\ \mathbf{y}_t = H_t \mathbf{x}_t + \mathbf{v}_t, \end{cases}$$

where the variables $\mathbf{w}_t$, $\mathbf{v}_t$ are random vectors that follow the normal distribution with a zero mean and are mutually independent, which means that $E(\mathbf{w}_t \cdot \mathbf{v}_s^T) = 0$ for any $t, s$, and also time-independent, which implies that $E(\mathbf{w}_t \cdot \mathbf{w}_s^T) = 0$ and $E(\mathbf{v}_t \cdot \mathbf{v}_s^T) = 0$ for all $t \neq s$. The quantities $F_t$ and $H_t$ express the system and the measurement coefficient matrices, respectively, and need to be determined before the implementation of the filter.
After the state-space model is established, the Kalman filter algorithm applies the following steps:

Step 1: Based on the vector $\mathbf{x}_t$ and its error covariance matrix $P_t$, the optimal estimate for time $t+1$ can be found by

$$\hat{\mathbf{x}}_{t+1} = F_t \mathbf{x}_t, \quad (5)$$

$$\hat{P}_{t+1} = F_t P_t F_t^T + W_t. \quad (6)$$

Step 2: When $\mathbf{y}_{t+1}$ is available, the corrected value of the state at time $t+1$ is calculated based on the following equations:

$$\mathbf{x}_{t+1} = \hat{\mathbf{x}}_{t+1} + K_{t+1} \left( \mathbf{y}_{t+1} - H_{t+1} \hat{\mathbf{x}}_{t+1} \right), \quad (7)$$

where

$$K_{t+1} = \hat{P}_{t+1} H_{t+1}^T \left( H_{t+1} \hat{P}_{t+1} H_{t+1}^T + V_{t+1} \right)^{-1}. \quad (8)$$

Step 3: The new value of the covariance matrix of the unknown state is given by

$$P_{t+1} = \left( I - K_{t+1} H_{t+1} \right) \hat{P}_{t+1}. \quad (9)$$

Equation (8) is known as the Kalman Gain, and it is the crucial parameter of the filter since it determines the way the filter will adjust to any possible new conditions [31]. For instance, a relatively small Kalman gain suggests high uncertainty in the measurements, meaning that only a small segment of the observation will be utilized for the new state prediction. Equations (5) and (6) form the prediction phase, while Equations (7) and (9) perform the correction phase. Finally, the parameters $W_t$ and $V_t$ are the covariance matrices of the random vectors $\mathbf{w}_t$, $\mathbf{v}_t$, respectively, also known as the system and measurement noise covariance matrices.
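The two phases above can be sketched in a few lines of NumPy; the function and variable names mirror the section's notation but are otherwise illustrative.

```python
import numpy as np

def kf_predict(x, P, F, W):
    """Prediction phase, Equations (5)-(6): propagate the state and its covariance."""
    x_hat = F @ x
    P_hat = F @ P @ F.T + W
    return x_hat, P_hat

def kf_correct(x_hat, P_hat, y, H, V):
    """Correction phase, Equations (7)-(9): blend the prediction with the observation."""
    S = H @ P_hat @ H.T + V                    # innovation covariance
    K = P_hat @ H.T @ np.linalg.inv(S)         # Kalman gain, Equation (8)
    x = x_hat + K @ (y - H @ x_hat)            # corrected state, Equation (7)
    P = (np.eye(len(x_hat)) - K @ H) @ P_hat   # corrected covariance, Equation (9)
    return x, P
```

A large measurement covariance V shrinks the gain K, so the corrected state stays close to the model prediction, which is exactly the behavior of a small Kalman gain described above.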
To implement the Kalman filter’s algorithm, initial values must be defined for the state vector $\mathbf{x}_0$ and its error covariance matrix $P_0$ at time $t = 0$. However, their effect on the efficiency of the filter is not significant, as it has been proven that, very soon, both $\mathbf{x}_t$ and $P_t$ converge to their actual values [32]. On the other hand, that is not the case with the covariance matrices $W_t$ and $V_t$, as the selected calculation method crucially affects the filter’s performance.
Researchers have developed several methods to update these quantities. Some studies apply covariance matrices that are fixed and defined prior to the usage of the filtering process [33,34], while others update them within the procedure using the past seven values of $\mathbf{w}_t$ and $\mathbf{v}_t$ [35,36]. Here, the former strategy is applied.
Non-Linear Kalman Filter
Through the KF, this study aims to decode and thus eliminate the systematic error of the simulation, which is described as the difference between the observed measurement and the corresponding forecast from the wave numerical model WAM. Here, that bias ($y_t$) is expressed as a polynomial [19,37] of the model’s previous direct output $m_t$:

$$y_t = x_{0,t} + x_{1,t} m_t + \dots + x_{n-1,t} m_t^{n-1}, \quad (10)$$

where $n-1$ expresses the degree of the polynomial and $n$ is the dimension of the state vector.

This work proposes a quadratic polynomial, i.e., $n = 3$, as Bogdanovs et al. [37] observed that employing greater polynomial degrees results in a substantial estimation error deviation. Therefore, Equation (10) is transformed to

$$y_t = x_{0,t} + x_{1,t} m_t + x_{2,t} m_t^2.$$
The equation above forms the measurement equation, with state vector $\mathbf{x}_t = (x_{0,t}, x_{1,t}, x_{2,t})^T$ and measurement transition matrix $H_t = (1, m_t, m_t^2)$. Furthermore, regarding the progression of the state vector over time, it is assumed that its change is random due to the lack of accurate information; therefore, the system’s transition matrix is equal to the identity matrix $I_3$. Based on the aforementioned, the system Equation (3) and the measurement Equation (4) for this study become

$$\mathbf{x}_{t+1} = \mathbf{x}_t + \mathbf{w}_t$$

and

$$y_t = (1, m_t, m_t^2) \, \mathbf{x}_t + v_t.$$
The initial value for the vector $\mathbf{x}_0$ at time $t = 0$ is considered zero unless other indications about its prior condition are available, whereas its corresponding error covariance matrix $P_0$ is set to be diagonal with relatively large values, which dictates low trust in the initial guesses [25].
Crucial for the three-dimensional filter’s successful implementation is the selection of the covariance matrices. In general, a safe strategy is to assume initial values close to zero and later adaptively update and estimate them. However, as it is unclear which adaptation rule to apply, this study utilizes fixed covariance matrices that were defined before the use of the filter. Specifically, various tests are conducted with different combinations of $W$ and $V$ to determine the optimal one. The results show that, for the environmental parameter of significant wave height, the best values are scalar multiples of the identity matrix $I$.
Once the filtering process is complete, the systematic error of the simulation is obtained through the optimal state vector $\mathbf{x}_t$, which is then added to WAM’s direct output to produce the “corrected” forecasts for the second stage of the dual filter (the Radial Basis Function neural network implementation).
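As a sketch of how the quadratic bias model plugs into the filter, the loop below runs the random-walk state equation with measurement row (1, m, m²) over a forecast series. The noise covariances and initial covariance are illustrative placeholders, not the tuned values used in the study.

```python
import numpy as np

def correct_forecasts(forecasts, observations):
    """Estimate the bias polynomial y = x0 + x1*m + x2*m^2 online and
    add it to the raw model output (illustrative noise settings)."""
    x = np.zeros(3)          # polynomial coefficients, zero initial guess
    P = np.eye(3) * 10.0     # large diagonal: low trust in the initial state
    W = np.eye(3) * 1e-3     # system noise covariance (placeholder value)
    r = 1e-2                 # measurement noise variance (placeholder value)
    corrected = []
    for m, obs in zip(forecasts, observations):
        H = np.array([1.0, m, m * m])
        P = P + W                          # prediction phase: F is the identity
        y = obs - m                        # observed bias
        K = P @ H / (H @ P @ H + r)        # Kalman gain
        x = x + K * (y - H @ x)            # corrected coefficients
        P = (np.eye(3) - np.outer(K, H)) @ P
        corrected.append(m + H @ x)        # raw forecast plus estimated bias
    return np.array(corrected)
```

With a bias that truly is polynomial in the forecast, the coefficients converge within a handful of steps and the corrected series tracks the observations closely.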
4. Radial Basis Function Neural Networks
While the polynomial variation of the non-linear Kalman filter algorithm is effective in mitigating systematic deviations, it struggles to address the stochastic and unpredictable nature of the remaining white noise. To overcome this obstacle, the proposed methodology sequentially combines the quadratic KF with an RBF neural network, which acts as a secondary filter to constrain the non-systematic part of the forecast error.
Radial Basis Function neural networks [38,39] are a special type of ANN that has been widely utilized in the academic community [40,41,42,43] due to their simple design and training algorithms, which are distinguished by their high accuracy and minimal computational cost [39]. A standard RBF structure consists of three layers: the input layer, the hidden layer with several neurons (clusters) and radial basis functions as activation functions ($\varphi$), and the linear output layer (Figure 4).
Despite the simplicity of the architecture, choosing the activation function and the network’s parameters may be a difficult task. In terms of activation functions, this work employs the Gaussian [44,45,46], $\varphi(n) = e^{-n^2}$, and the Multiquadric [47,48,49], $\varphi(n) = \sqrt{n^2 + 1}$, as there are insufficient indications of which one is best suited to the wave parameter under study.
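Assuming the standard forms of these two radial functions (Gaussian $e^{-n^2}$ and Multiquadric $\sqrt{n^2+1}$), their contrasting responses are easy to verify numerically:

```python
import numpy as np

def gaussian(n):
    """Local response: decays toward zero as the scaled distance n grows."""
    return np.exp(-n ** 2)

def multiquadric(n):
    """Global response: grows without bound as the scaled distance n grows."""
    return np.sqrt(n ** 2 + 1.0)
```

At the center (n = 0) both functions return 1, but far from it the Gaussian vanishes while the Multiquadric keeps increasing, which is the local-versus-global distinction at issue here.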
The major distinction between these transfer functions is their response. The Gaussian has a local response, which means that the neuron’s output is closer to zero as the distance from the center point increases, while the Multiquadric exhibits the opposite behavior and is therefore characterized by a global response. More information about their main properties can be found in Hagan et al. [17].
When the activation function is specified, the network’s parameters must be defined through the training process. Typically, there are two strategies for training an RBF neural network: the first approach applies non-linear, gradient-based optimization procedures to determine all the network parameters in one step [50,51], whereas the second approach divides the training process into two phases.
The first phase determines the number and locations of the hidden node centroids, while the second phase specifies the synaptic weights. This two-stage procedure exploits the linear interconnection of the hidden and output layers, which allows the use of linear regression to calculate the weights [52]. Hence, it is frequently faster than optimizing all RBF network parameters simultaneously [53].
This study applies the two-stage approach. To demonstrate the training process, let $\mathbf{p}_q$, $q = 1, \dots, Q$, denote the input vectors of a matrix $P$, with $Q$ being the number of training patterns and $R$ being the dimension of the input vectors.
Initially, the RBF network calculates the distance between the $q$th input vector and each centroid $\mathbf{c}_i$ in the hidden layer. Afterward, that outcome is multiplied by an offset parameter $b_i$, known as the width, which scales the activation function, causing it to either widen or narrow. As a result, the network input for the $i$th hidden layer neuron can be computed as

$$n_i = b_i \, \| \mathbf{p}_q - \mathbf{c}_i \|,$$

where $\| \cdot \|$ represents the Euclidean distance.
The produced quantity is transformed via the transfer function (here, the Gaussian or the Multiquadric) and generates the output of the $i$th neuron, which is then multiplied by the corresponding synaptic weight $w_i$. Extending this process to each neuron in the hidden layer and summing up the results, the direct output of the RBF network is obtained by

$$\hat{y}(\mathbf{p}_q) = \sum_{i=1}^{K} w_i \, \varphi\big( b_i \, \| \mathbf{p}_q - \mathbf{c}_i \| \big), \quad (11)$$

where $K$ expresses the number of centroids.
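Equation (11) amounts to a weighted sum of radial activations over the scaled distances; a minimal NumPy version (names are illustrative) is:

```python
import numpy as np

def rbf_output(p, centers, widths, weights, phi=lambda n: np.exp(-n ** 2)):
    """Network response of Equation (11) for one input vector p:
    scaled Euclidean distances -> activation -> weighted sum."""
    n = widths * np.linalg.norm(p - centers, axis=1)   # n_i = b_i * ||p - c_i||
    return float(weights @ phi(n))
```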
The next step of the illustrated process is the determination of the locations of the hidden layer centers. Here, the Kmeans++ algorithm is implemented [54]. Kmeans++ is an improved version of the classic Kmeans [55] that identifies a set of centroids with an $O(\log K)$ approximation of the optimum center set [56]. However, Kmeans++ does not instantly define the optimum number of clusters (neurons); instead, this quantity should be specified prior to applying the method, which creates uncertainty regarding its optimal value.
To avoid this major drawback and define the network size, the proposed methodology trains the Radial Basis Function neural network for multiple cluster counts ranging from 10 to 70. The optimal number is the one that minimizes the Sum-Squared-Error (SSE):

$$SSE = \sum_{q=1}^{Q} e_q^2, \quad (12)$$

where $e_q$ is the $q$th training error, i.e., $e_q = t_q - \hat{y}(\mathbf{p}_q)$, with $t_q$ being the corresponding scalar target for the $q$th input vector.
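The selection criterion of Equation (12) is simply the sum of squared residuals; for concreteness:

```python
import numpy as np

def sse(targets, outputs):
    """Sum-Squared-Error of Equation (12): sum of squared residuals e_q."""
    e = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    return float(e @ e)
```

In the proposed scheme this quantity is evaluated once per candidate cluster count (10 to 70), and the count with the smallest value wins.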
Based on the established centroids, the width of each cluster can be determined through the following formula [17]:

$$b_i = \frac{1}{\sqrt{2}\, d_i},$$

where $d_i$ presents the average distance between the associated center of the $i$th cluster and its neighbors and is computed by

$$d_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \| \mathbf{p}_j^i - \mathbf{c}_i \|.$$

Here, the quantity $n_i$ expresses the number of input vectors that are closest to the related center. Therefore, $\mathbf{p}_1^i$ and $\mathbf{p}_2^i$ are the nearest and the next nearest input vectors to the center $\mathbf{c}_i$.
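One common width rule of this kind, following Hagan et al., sets $b_i = 1/(\sqrt{2}\,d_i)$ with $d_i$ the average distance between center $i$ and the input vectors assigned to it; the sketch below uses that form, which is an assumption where the extracted text does not reproduce the exact expression.

```python
import numpy as np

def cluster_widths(centers, X, labels):
    """Width b_i = 1 / (sqrt(2) * d_i), where d_i is the average distance
    between center i and its member input vectors (assumed form)."""
    b = np.empty(len(centers))
    for i, c in enumerate(centers):
        members = X[labels == i]
        d_i = np.linalg.norm(members - c, axis=1).mean()
        b[i] = 1.0 / (np.sqrt(2.0) * d_i)
    return b
```

Tight clusters (small average distance) thus get large widths, i.e., narrow basis functions, and loose clusters get wide ones.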
That concludes the first phase of the two-stage training algorithm. The next and final step includes the estimation, through linear regression, of the synaptic weights that connect the hidden layer with the output layer. To present this process, the network’s response for the matrix $P$ based on Equation (11) is expressed as

$$\hat{\mathbf{y}} = \Phi \mathbf{w},$$

where $\Phi$ is the radial functions’ output matrix and $\mathbf{w}$ is the synaptic weights vector. Thus, the vector of weights that optimizes the performance of the RBF architecture, i.e., minimizes Equation (12), is given by

$$\mathbf{w}^{*} = (\Phi^{T} \Phi)^{-1} \Phi^{T} \mathbf{t},$$

where $\mathbf{t}$ presents the scalar target values of the $Q$ training patterns, i.e., $\mathbf{t} = (t_1, \dots, t_Q)^{T}$.
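The weight vector of this normal-equations form is an ordinary linear least-squares fit; using lstsq rather than an explicit matrix inverse is the numerically safer equivalent:

```python
import numpy as np

def train_weights(Phi, t):
    """Solve min_w ||t - Phi w||^2, i.e., w* = (Phi^T Phi)^(-1) Phi^T t."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w
```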
Aside from the analysis of the training algorithm, another issue that needs to be clarified for the successful implementation of the RBF network is the treatment of overfitting. Overfitting is a phenomenon in which an ANN memorizes the properties of a known data set, inhibiting the formation of models that effectively extrapolate from observed to unseen data [57].
To address this issue, this work applies the L2 regularization strategy [52]. The primary aim of this procedure is to reduce the network’s dependency on specific “routes”, not by removing network weights but by constraining their magnitude. To accomplish this, a penalty parameter $\lambda$ is added to the Sum-Squared-Error to penalize large weights. Hence, Equation (12) is transformed into

$$SSE_{\lambda} = \sum_{q=1}^{Q} e_q^2 + \lambda \, \| \mathbf{w} \|^2, \quad (13)$$

and the corresponding optimal vector $\mathbf{w}^{*}$ is transformed to

$$\mathbf{w}^{*} = (\Phi^{T} \Phi + \lambda I)^{-1} \Phi^{T} \mathbf{t}.$$
The determination of the penalty parameter $\lambda$ is not an easy task, as its value crucially affects the generalization capabilities of the RBF network; therefore, the choice cannot be random. Several approaches have been developed to define this parameter [58], but this work suggests an alternative strategy. More specifically, for every number of clusters, multiple trainings are conducted over a range of values of that parameter. The ideal value is the one that minimizes Equation (13).
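The regularized solution together with a grid search over the penalty can be sketched as follows; evaluating the SSE on a held-out validation set mirrors Algorithm 1, while the specific grid values are illustrative, as the paper's exact range is not reproduced here.

```python
import numpy as np

def ridge_weights(Phi, t, lam):
    """Regularized solution w* = (Phi^T Phi + lam*I)^(-1) Phi^T t."""
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(k), Phi.T @ t)

def pick_lambda(Phi_tr, t_tr, Phi_val, t_val, lams):
    """Return the penalty whose weights give the smallest validation SSE."""
    best_lam, best_sse = None, np.inf
    for lam in lams:
        w = ridge_weights(Phi_tr, t_tr, lam)
        sse = float(np.sum((t_val - Phi_val @ w) ** 2))
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam
```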
The developed dual filter is outlined in Algorithm 1, while the main characteristics of the Radial Basis Function neural network are summarized in Table 1. Detailed results about the number of clusters, penalty parameters, and activation functions from the RBF’s training process can be found in Appendix A.
Algorithm 1: Combine KFs and RBFNNs.
Based on the training data set: {Inputs, Targets} → {Model’s Forecast, Observations}
for each element in the training set do
    Apply the non-linear Kalman filter and obtain the corrected forecast
endfor
Create the input data for the RBF network
Create the training and validation datasets for the RBF  % distinct datasets for each training; the same for every topology
for each cluster count do
    for each penalty parameter do
        for each activation function do
            Form the RBF structure  % number of clusters, regularization parameter λ, activation function
            while train ≤ maxtrain  % conduct multiple trainings for each structure
                Determine the centroids from the training dataset using Kmeans++ and compute the widths
                Train the network using linear least squares on the training dataset
                performance → network’s performance  % SSE based on the validation dataset
                if performance < Initial Value
                    Set Initial Value equal to performance
                    Store the best results for the combination in a cell array  % number of clusters, performance, penalty parameter, activation function, training time, best centers, widths, and external weights
                endif
                train → train + 1
            endwhile
            train → 1
        endfor
    endfor
    readjust Initial Value
endfor
Define the optimal RBF network structure:
if several indices in the total SSE vector display similar results  % their absolute difference remains smaller than a specified threshold
    position → the index with minimum training time
else
    position → the minimum SSE index
endif
Best RBFNN structure → best results{position}
The produced dual filter is constructed primarily as a self-adaptive computational system that simultaneously targets the systematic and non-systematic parts of the forecast error. Nevertheless, the proposed method can also indirectly boost the computational efficiency of WAM by minimizing the need for high-resolution simulations or repeated runs of the numerical wave prediction model. That is partly owing to the use of Kalman filters and Radial Basis Function neural networks, which are highly efficient post-processing techniques. Their relatively low computational cost, along with their capacity to generate enhanced predictions, makes the overall framework efficient compared to the original numerical models. Therefore, the developed dual filter accomplishes significant error reductions without increasing the computational demands of the core model.
6. Conclusions
The motivation of this research was to develop a novel post-processing algorithm that combines Radial Basis Function neural networks and Kalman filters to improve the forecasts of a numerical wave model regarding the parameter of significant wave height. To accomplish this, the produced model targets the simulation’s systematic error alongside the remaining non-systematic part of that error.
Initially, a non-linear Kalman filter is applied to decode and, as a result, eliminate the bias between the recorded observations and the direct outputs of the WAM system. Afterward, a Radial Basis Function neural network is utilized, acting as an additional filter, with the goal of detecting and reducing the variability in the non-systematic part of that bias and the accompanying anticipated uncertainty.
The suggested methodology was applied via a time-window process involving several regions and time periods. The first case study concerns the areas of Mykonos and Heraklion (Crete) in the Aegean Sea from 2007 to 2009, while the second case focuses on the region 46002 in the Pacific Ocean between 2011 and 2013. For every case study, the extracted results were compared to those obtained by the classic Kalman filter to determine the degree of improvement offered by the suggested dual filter.
The results revealed that combining RBF neural networks and KFs significantly improved the forecasting capabilities of the simulation system in use. Specifically, the recorded systematic errors decreased considerably, with an average reduction of 53% in the Bias index, whereas the Rmse evaluation indicator and, thus, the related forecast uncertainty were reduced by 28%. In contrast, the standard Kalman filter implementation resulted in a 73% and 37% increase in the relevant indices.
Furthermore, the use of Kalman filters in conjunction with Radial Basis Function neural networks exhibited stable behavior regardless of forecasting horizon and geographical region, providing a smooth and efficient tool that avoids the limitations of classic Kalman filters, which substitute initial systematic deviations with comparable over- and under-estimation periods, leading to lower mean error values but no meaningful gain in the forecasts.
The suggested methodology is applicable to similar simulations in fields such as economics or signal processing, as it is independent of the type of data and therefore can be extended beyond environmental applications.