An EEMD-BiLSTM Algorithm Integrated with Boruta Random Forest Optimiser for Signiﬁcant Wave Height Forecasting along Coastal Areas of Queensland, Australia

: Using advanced deep learning (DL) algorithms for forecasting signiﬁcant wave height of coastal sea waves over a relatively short period can generate important information on its impact and behaviour. This is vital for prior planning and decision making for events such as search and rescue and wave surges along the coastal environment. Short-term 24 h forecasting could provide adequate time for relevant groups to take precautionary action. This study uses features of ocean waves such as zero up crossing wave period (Tz), peak energy wave period (Tp), sea surface temperature (SST) and signiﬁcant lags for signiﬁcant wave height (Hs) forecasting. The dataset was collected from 2014 to 2019 at 30 min intervals along the coastal regions of major cities in Queensland, Australia. The novelty of this study is the development and application of a highly accurate hybrid Boruta random forest (BRF)–ensemble empirical mode decomposition (EEMD)–bidirectional long short-term memory (BiLSTM) algorithm to predict signiﬁcant wave height (Hs). The EEMD–BiLSTM model outperforms all other models with a higher Pearson’s correlation (R) value of 0.9961 (BiLSTM—0.991, EEMD-support vector regression (SVR)—0.9852, SVR—0.9801) and comparatively lower relative mean square error (RMSE) of 0.0214 (BiLSTM—0.0248, EEMD-SVR—0.043, SVR—0.0507) for Cairns and similarly a higher Pearson’s correlation (R) value of 0.9965 (BiLSTM—0.9903, EEMD–SVR— 0.9953, SVR—0.9935) and comparatively lower RMSE of 0.0413 (BiLSTM—0.075, EEMD-SVR—0.0481, SVR—0.057) for Gold Coast.


Introduction
About 60% of the world's population live around coastal areas [1] and, therefore, to understand how wave factors affect the coastal areas is extremely important. There are many factors that influence the wave impact on the coastal areas which includes sea level rise and changes in frequency of floods and storms. The effects of sea level rise have impacted many island nations in the South Pacific. These islands are most vulnerable to sea level rise leading to inundation of their land in the future [2]. Furthermore, as many South Pacific islands have already experienced significant sea level trends and inundations, it is extremely important to understand how this will project into the coming years for Australia. For example, according to [3], which analysed 16 years of sea-level data from the Australian project for sea-level trends for Tuvalu, if the increasing rate of rise continues, the loss of land will be significant in the next 50 years.
The Great Barrier Reef in Queensland, Australia is the world's largest coral reef system which is made up of more than 2900 individual reefs and 900 islands stretching for over 2300 kilometres over an area of approximately 344,400 square kilometres [4]. It is a popular This superior ability is enhanced with the incorporation of many neural layers to overcome problems with relatively complex function approximation due to its capability for non-linear mapping [30,31]. The problem of overfitting is also addressed with the use of hyperparameter tuning of BiLSTM parameters in the model development phase. The BiLSTM model has been successfully used in other recent studies which have shown accurate and high-quality results [32][33][34][35]. However, a hybrid Boruta random forest-ensemble empirical mode decomposition (BRF-EEMD)-BiLSTM model has not been applied in the context of sea wave forecasting to date in any previous study. Therefore, the prediction of Hs using this model is new and will provide useful insights on how sea wave parameters can help to efficiently perform short-term forecasting.
A key novelty in this study is the hybrid nature of the BiLSTM model which is incorporated with EEMD and Boruta random forest algorithms to increase the accuracy and reliability of significant wave height forecasting. It has been shown in many past studies [36][37][38] that EEMD, which is an improved version of EMD, can effectively help to break down signals into their components and mitigates modelling complexity. In [39], EEMD and long short-term memory (LSTM) algorithms have produced superior results when compared with a list of benchmark models to forecast crude oil price. In another study [40], EEMD and LSTM are combined for short-term wind speed prediction. The proposed approach in this study also outperforms other comparable models. A study [41] based on forecasting energy demand which also proposes a EEMD hybrid technique with multimodel ensemble BiLSTM shows that the hybrid-based method outperforms all the stateof-the-art techniques used for comparison. Furthermore, an EEMD-Particle Swarm Optimisation (PSO)-support vector machine (SVM) hybrid method to predict rainfall-runoff in watersheds in [37] shows that this approach effectively attains better results than the benchmark standalone model.
In addition to the data decomposition method, the uses of input selection techniques such as BRF optimiser increases the efficiency of modelling by screening and selecting the significant inputs. Feature selection is an important step in the application of machinelearning methods as datasets often have too many variables to build forecasting models [42]. An obvious reason for using a feature selection method is to overcome the computational load on algorithms by selecting more significant inputs. Due to its iterative ability, BRF can deal with the fluctuating nature of a random forest's importance measure and the interactions between the attributes [42]. Many studies have effectively used BRF feature selection for the improvement of machine learning models. A study [43] for future This superior ability is enhanced with the incorporation of many neural layers to overcome problems with relatively complex function approximation due to its capability for non-linear mapping [30,31]. The problem of overfitting is also addressed with the use of hyperparameter tuning of BiLSTM parameters in the model development phase. The BiLSTM model has been successfully used in other recent studies which have shown accurate and high-quality results [32][33][34][35]. However, a hybrid Boruta random forest-ensemble empirical mode decomposition (BRF-EEMD)-BiLSTM model has not been applied in the context of sea wave forecasting to date in any previous study. Therefore, the prediction of Hs using this model is new and will provide useful insights on how sea wave parameters can help to efficiently perform short-term forecasting.
A key novelty in this study is the hybrid nature of the BiLSTM model which is incorporated with EEMD and Boruta random forest algorithms to increase the accuracy and reliability of significant wave height forecasting. It has been shown in many past studies [36][37][38] that EEMD, which is an improved version of EMD, can effectively help to break down signals into their components and mitigates modelling complexity. In [39], EEMD and long short-term memory (LSTM) algorithms have produced superior results when compared with a list of benchmark models to forecast crude oil price. In another study [40], EEMD and LSTM are combined for short-term wind speed prediction. The proposed approach in this study also outperforms other comparable models. A study [41] based on forecasting energy demand which also proposes a EEMD hybrid technique with multi-model ensemble BiLSTM shows that the hybrid-based method outperforms all the state-of-the-art techniques used for comparison. Furthermore, an EEMD-Particle Swarm Optimisation (PSO)-support vector machine (SVM) hybrid method to predict rainfallrunoff in watersheds in [37] shows that this approach effectively attains better results than the benchmark standalone model.
In addition to the data decomposition method, the uses of input selection techniques such as BRF optimiser increases the efficiency of modelling by screening and selecting the significant inputs. Feature selection is an important step in the application of machinelearning methods as datasets often have too many variables to build forecasting models [42]. An obvious reason for using a feature selection method is to overcome the computational load on algorithms by selecting more significant inputs. Due to its iterative ability, BRF can deal with the fluctuating nature of a random forest's importance measure and the interactions between the attributes [42]. Many studies have effectively used BRF feature selection for the improvement of machine learning models. A study [43] for future soil moisture estimation successfully uses Boruta random forest (BRF) feature selection to select and capture significant inputs. The hybrid BRF-LSTM model outperforms the standalone models (LSTM, SVR and Multivariate Adaptive Regression Splines (MARS)) in this study. A similar approach is used in [44], which found superior results by a hybrid EEMD-Boruta-ELM model when forecasting soil moisture. The importance of input selection by BRF for data-driven streamflow forecasting is also demonstrated in [45].

Study Area and Data
This study utilised 30 min interval recorded wave features data (Table 1) for a period from 2014-2019 acquired from the Queensland government open data portal for coastal areas in Queensland. The wave features used in this study are significant wave height (Hs), maximum wave height (Hmax), zero up crossing wave period (Tz), peak energy wave period (Tp) and approximation of sea surface temperature (SST). Table 1 and Figure 1 show the selected sites and geographical location. There are no set criteria for data partition, however datasets are normally divided into training, validation and testing. This study uses a data partition of 60% training, 20% validation and 20% testing (see Table 2) for best results.

Data Preparation
The initial step in data preparation is to identify outliers, missing and 'incorrect' data values in the dataset. The accuracy of data modelling depends significantly upon the accuracy and reliability of data. Outliers are data points that are in a minority and have patterns which are quite different to the majority of other data points in the sample [46]. Any presence of outliers in the data will significantly affect how the machine learning models will effectively train the model for forecasting. Cook's distance [47] can be effectively used to detect and remove the outliers to improve any dataset for machine learning modelling. This method of removing outliers is mostly used to detect the influence of data points in a regression analysis [48]. Cook's distance D i of observation k is given as: where,ŷ i is the ith fitted response value,ŷ i(k) is the ith fitted response value when the fit excludes observation k, MSE is the mean squared error, and p is the number of coefficients in the regression model. The next important step is to determine the stationarity of the time series data. In order to fit a stationary model, it is highly important to determine that the data is a realisation of a stationary process [49]. This study uses two statistical tests to confirm the stationarity of the wave data (see Table 3), the augmented Dickey Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests. The augmented Dickey Fuller test is for larger datasets and tests the null hypothesis that unit root is present in the data. A greater negative value of the ADF statistic than the critical value confirms that null hypothesis can be rejected, and no unit root is present. To complement the ADF test, the KPSS test can be used to confirm ADF test. If the KPSS statistic is less than the critical value, then the null hypothesis is accepted confirming that the series is stationary. The following results confirm the stationarity criteria for the Gold Coast dataset and the same procedure was used for Cairns to confirm stationarity.
Considering, Hs as the time series variable for the 30 min interval, the significant lags are then used with the other wave feature variables of zero up crossing wave period (Tz), peak energy wave period (Tp), and sea surface temperature (SST) as the inputs to predict the significant wave height (Hs) (see Table 4).

Data Normalisation
All of the model input data were normalized [50][51][52] to make the range of [0,1] for modelling by Equation (3): After the forecasting using the trained model, values are then returned to the original values using Equation (4).
where x is the input data value, x min is the overall minimum and x max is the overall maximum value.

Data Decomposition by Ensemble Empirical Mode Decomposition (EEMD)
The EEMD method is an adaptive method that decomposes its original signal into components with amplitude and frequency-modulated parameters with a noise-assisted and analysis technique [53]. It is based on the commonly used empirical mode decomposition (EMD) which is a self-adaptive decomposition technique [54]. This method is suitable for both non-linear and non-stable signals as well. EEMD is largely improved from EMD by the addition of Gaussian white noise into the raw series. Hence, it can attribute signals with different time scales to the reference time scales.
The algorithm splits the original wave signals into intrinsic mode functions (IMFs). The training, validation and testing data are split separately by the algorithm. This is done to ensure no leakage of information occurs from the training IMFs into the validation and testing phase of modeling.
Given, n-dimensional length L of a dataset, the application in this study follows the algorithm in [54,55]. The procedure considers two important conditions:

•
The n-dimensional length either has an equal number of extrema and zero crossings, or they differ at most by one.

•
The mean value at any point which is defined by local maxima and the envelope defined by the local minima are zero.
The EEMD is implemented as follows [54]: • The white noise series are added to the wave data; • Wave dataset is then decomposed with added white noise into its IMFs (see Figure 2); • Steps 1 and 2 are repeated with different Gaussian white noise series.

•
Since the mean value of added noise is zero, the average over all corresponding IMFs will be the final decomposition. to ensure no leakage of information occurs from the training IMFs into the validation and testing phase of modeling. Given, n-dimensional length L of a dataset, the application in this study follows the algorithm in [54,55]. The procedure considers two important conditions:


The n-dimensional length either has an equal number of extrema and zero crossings, or they differ at most by one.  The mean value at any point which is defined by local maxima and the envelope defined by the local minima are zero. The EEMD is implemented as follows [54]:  The white noise series are added to the wave data;  Wave dataset is then decomposed with added white noise into its IMFs (see Figure 2);  Steps 1 and 2 are repeated with different Gaussian white noise series.  Since the mean value of added noise is zero, the average over all corresponding IMFs will be the final decomposition.    Figure 3 shows the partial autocorrelation function (PACF) plot of the Hs time series to show the antecedent behaviour in terms of the lag of Hs in hours. The partial autocorrelation (PACF) function analysis is also used to determine the lags of the wave prediction model for the target variable Hs. The PACF analysis method is widely used as it provides partial correlation of the stationary time series with its own lagged values. It helps to de-termine how many past lags can be useful to include in the forecasting model. The de-composition of original input series of 48 lags produces 6 IMFs of each lag.
Remote Sens. 2021, 13, x FOR PEER REVIEW 7 of 20 model for the target variable Hs. The PACF analysis method is widely used as it provides partial correlation of the stationary time series with its own lagged values. It helps to determine how many past lags can be useful to include in the forecasting model. The decomposition of original input series of 48 lags produces 6 IMFs of each lag.

Feature Selection by Boruta Random Forest Optimiser (BRF)
Boruta is a feature selection method where the algorithm is built around the random forest classification. It is based on an ensemble where selection is made according to the voting of multiple weak classifiers known as decision trees. The following steps summarize the feature selection process [42]:  the information system is extended by the addition of all variables in consideration, minimum of five shadow attributes are added;  the added attributes are shuffled so that their correlation with the response are removed;  the random forest classifier is applied to gather the z-scores;  the maximum z-score among the shadow attributes is found and every attribute that has a better score than this is taken as a hit;  a two-sided test of equality is performed with attributes that attained an undetermined importance;  the attributes that have significantly lower z-score than the maximum z-score among the shadow attributes are removed;  the attributes that have significantly higher z-score are selected;  all shadow attributes are then removed.
The correlogram in Figure 4 shows the covariance between the objective variable (Hs) in terms of the cross-correlation coefficient (rcross) for Cairns. The input lags are then selected based on their significance when the IMFs are screened by the Boruta algorithm (see Figure 5).

Feature Selection by Boruta Random Forest Optimiser (BRF)
Boruta is a feature selection method where the algorithm is built around the random forest classification. It is based on an ensemble where selection is made according to the voting of multiple weak classifiers known as decision trees. The following steps summarize the feature selection process [42]: • the information system is extended by the addition of all variables in consideration, minimum of five shadow attributes are added; • the added attributes are shuffled so that their correlation with the response are removed; • the random forest classifier is applied to gather the z-scores; • the maximum z-score among the shadow attributes is found and every attribute that has a better score than this is taken as a hit; • a two-sided test of equality is performed with attributes that attained an undetermined importance; • the attributes that have significantly lower z-score than the maximum z-score among the shadow attributes are removed; • the attributes that have significantly higher z-score are selected; • all shadow attributes are then removed.
The correlogram in Figure 4 shows the covariance between the objective variable (Hs) in terms of the cross-correlation coefficient (rcross) for Cairns. The input lags are then selected based on their significance when the IMFs are screened by the Boruta algorithm (see Figure 5).        Figure 6 shows the overview of the workflow for the hybrid model development. The wave predictors and target variable (Hs) pass through a series of model screening process so that significant features are extracted from the raw data.
Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 20 Figure 6 shows the overview of the workflow for the hybrid model development. The wave predictors and target variable (Hs) pass through a series of model screening process so that significant features are extracted from the raw data.  Figure 7 shows the proposed BiLSTM architecture snapshot used for significant wave height modelling. The input connects to five layers, two BiLSTM layers, a unidirectional LSTM layer, a dense layer and an activation layer which finally connects to the output.  Figure 7 shows the proposed BiLSTM architecture snapshot used for significant wave height modelling. The input connects to five layers, two BiLSTM layers, a unidirectional LSTM layer, a dense layer and an activation layer which finally connects to the output.
Model design variables and hyperparameters are set manually with a pre-determined value before the training phase. These can be considered as 'knobs' which can be turned to tune the deep learning models. These values can be found by a trial and error approach, doing 100% manual search or alternatively by grid-search as in this study. Table 5 shows the model variables and hyperparameters which were used for BiLSTM.  Figure 7 shows the proposed BiLSTM architecture snapshot used for significant wave height modelling. The input connects to five layers, two BiLSTM layers, a unidirectional LSTM layer, a dense layer and an activation layer which finally connects to the output.

Support Vector Regression (SVR) Model Development
The penalty or the cost function C in SVR determines if the data are of a good quality by the narrow distance between any two hyperplanes. If the data are noisy, a smaller value of C is preferred so that support vectors are not penalised. Therefore, it is important to deduce an optimal value of C. Hence, a cross-validated grid search was done on the γ and C values, at different values of ε. For different combinations for ε, γ, and C, Table 6 shows the optimal scores obtained from the grid search procedure. Table 6. Optimal values obtained from grid search for ε, γ, and C in the support vector regression (SVR) model. Each LSTM network unit is a special kind of recurrent neural network (RNN) (see Figure 8) and is an effective model for many sequential learning problems [56][57][58][59][60][61]. It was designed to model temporal information and long-term memory more than conventional RNNs. RNN is an artificial neural network and is a strong dynamic system that takes the input sequence one at a time (see Figure 9), which has the capability of holding historical information of already processed inputs in the hidden units [62]. However, the back-propagation with time method also presents an issue with vanishing and exploding gradient [63,64]. In addition to this, RNN does not have the capability to model the longrange context dependency between the inputs and outputs. Hence, to overcome these problems, LSTM was introduced which has special memory units in the hidden layer).

Epsilon (ε) Gamma (γ) Parameter (C) Kernel
input sequence one at a time (see Figure 9), which has the capability of holding historical information of already processed inputs in the hidden units [62]. However, the back-propagation with time method also presents an issue with vanishing and exploding gradient [63,64]. In addition to this, RNN does not have the capability to model the long-range context dependency between the inputs and outputs. Hence, to overcome these problems, LSTM was introduced which has special memory units in the hidden layer).  The BiLSTM model (see Figure 10) is a DL network architecture and is an extension of the single unidirectional LSTM where each training sequence is fed into both, forward and backward recurrent nets that are separately connected to the same output layer. Hence, for every point in a given sequence, the network structure takes full sequential information about all points before and after [65,66]. The following gates and functions form the basis of the overall architecture: Forget Gates: Input Gates: Output Gates: Sigmoid Function: Cell Input State: Hypertangent Function: where, , , and are bias vectors. The , , , and are weight matrices which connects the previous cell output state to the gates and the input cell state. The , Figure 9. The long short-term memory (LSTM) cell block representation on how the input is processed within the architecture.
The BiLSTM model (see Figure 10) is a DL network architecture and is an extension of the single unidirectional LSTM where each training sequence is fed into both, forward and backward recurrent nets that are separately connected to the same output layer. Hence, for every point in a given sequence, the network structure takes full sequential information about all points before and after [65,66]. The following gates and functions form the basis of the overall architecture: Forget Gates: Input Gates: Output Gates: Sigmoid Function: Cell Input State: The BiLSTM network takes data and processes it in both directions with separate hidden layers ( Figure 10). It takes output from these two sequences and combines them using a merge mode which could be a concatenating 'concat', summation 'sum', average 'ave' or a multiplication 'mul' function. Finally, the sequence gives an output of prediction vectors: = [… − , , + … ]. Each is calculated using the merge mode as follows: The support vector (SV) algorithm was developed by Vapnik and co-workers in Russia [67]. The algorithm is based on non-linear generalisation of the generalised portrait algorithm developed in Russia in the past three decades [67,68]. The support vector machine (SVM) can also be applied using regression (SVR) and is based on the same principles (see Figure 11). This regression version was developed by Vapnik, Steven Golowich, and Alex Smola in 1997 [69]. It can be effectively used for prediction to minimize the error, find the optimum solution and avoid the "curse of dimensionality" [70]. where C = penalty parameter, = an insensitive loss function and , * = slack variables.
The BiLSTM network takes data and processes it in both directions with separate hidden layers ( Figure 10). It takes output from these two sequences and combines them using a merge mode which could be a concatenating 'concat', summation 'sum', average 'ave' or a multiplication 'mul' function. Finally, the sequence gives an output of prediction vectors: Y t = [. . . y t−1 ,y t , y t+1 . . .]. Each y t is calculated using the merge mode as follows: The support vector (SV) algorithm was developed by Vapnik and co-workers in Russia [67]. The algorithm is based on non-linear generalisation of the generalised portrait algorithm developed in Russia in the past three decades [67,68]. The support vector machine (SVM) can also be applied using regression (SVR) and is based on the same principles (see Figure 11). This regression version was developed by Vapnik, Steven Golowich, and Alex Smola in 1997 [69]. It can be effectively used for prediction to minimize the error, find the optimum solution and avoid the "curse of dimensionality" [70]. Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 20 Figure 11. The SVR model structure shows how the error and the hyperplane are arranged around the data points.

Results and Discussion
Several statistical metrics were used in this study to evaluate the performance of the stacked BiLSTM and the SVR models. The commonly used model score metrics such as the Pearson's correlation coefficient (R), Nash-Sutcliffe coefficient (NS), Willmott's index of agreement (WI), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
The mathematical forms are as follows: − Data observed, − Data simulated, − Mean of observed data, − Mean of simulated data

Pearson's Correlation Coefficient (R)
2. Nash-Sutcliffe Coefficient (NS) 3. Willmott's Index of agreement (WI) 4. Root Mean Square Error (RMSE) Figure 11. The SVR model structure shows how the error and the hyperplane are arranged around the data points.
Subject to: min where C = penalty parameter, ε = an insensitive loss function and ξ i , ξ i * = slack variables.

Results and Discussion
Several statistical metrics were used in this study to evaluate the performance of the stacked BiLSTM and the SVR models. The commonly used model score metrics such as the Pearson's correlation coefficient (R), Nash-Sutcliffe coefficient (NS), Willmott's index of agreement (WI), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
The mathematical forms are as follows: Pearson's Correlation Coefficient (R) 2 Nash-Sutcliffe Coefficient (NS) 4 Root Mean Square Error (RMSE) 5 Mean Absolute Error (MAE) 6 Mean Absolute Percentage Error (MAPE) Results obtained from the standalone and hybrid models for forecasting significant wave height (Hs) were assessed to validate their effectiveness in the 24 h forecasting. The forecasted values using all models in this study were analysed in terms of the predictive accuracy. The comparison was made based on the six key statistical performance criteria (Equations (13)- (18)). The performance metrics of all models in the testing phases are shown in Table 7 with the best model in bold. For all the study locations in the testing phase of significant wave height prediction Hs, the EEMD-BiLSTM model outperformed all other benchmark models with higher Pearson's correlation coefficient (R) value and lower RMSE, MAPE and MAE (see Table 7). The Taylor diagram (see Figure 12) shows the statistical summary of how the observed values match with the forecasting values. Figure 12 shows the comparison of the models in terms of Pearson's correlation, RMSE and standard deviation. The illustration shows the objective model (red dot) outperforming all other benchmark models in terms of these important performance measuring metrics.
In addition to this, to further evaluate and validate the accuracy of the models, dimensionless metrics such as WI [71] and NS [72] have been calculated. These metrics overcome the insensitivity of correlation-based measures to differences in the observed and model-simulated means and variance [71][72][73]. WI is a dimensionless measurement of model accuracy which is bounded by 0 and 1, meaning no agreement for 0, and a perfect fit for 1 [71,74]. The values obtained show consistent higher scores of more than 95% for standalone and greater than 98% for hybrid models. NS is another normalised metric that is widely used in evaluating forecasted results [75]. It ranges from −∞ to 1, an efficiency of 1 corresponds to a perfect match of simulated data to the observed data. EEMD-BiLSTM NS achieves a value of greater than 0.95 (Figure 13), these values confirm the efficiency of the prediction models as used in many past research studies [75][76][77][78]. The curve fitting between the observed significant wave height (Hs obs ) and predicted significant wave height (Hs pred ) for 30-min prediction are shown in Figure 13 for both the models at the two coastal sites. The scatter plot shows the relationship between the normalized predicted and observed values. EEMD-BiLSTM provides a better curve fitting between (Hs obs ) and (Hs pred ), a higher quality of fit R 2 and a small residual error. For all the study locations in the testing phase of significant wave height prediction , the EEMD-BiLSTM model outperformed all other benchmark models with higher Pearson's correlation coefficient (R) value and lower RMSE, MAPE and MAE (see Table  7). The Taylor diagram (see Figure 12) shows the statistical summary of how the observed values match with the forecasting values. Figure 12 shows the comparison of the models in terms of Pearson's correlation, RMSE and standard deviation. The illustration shows the objective model (red dot) outperforming all other benchmark models in terms of these important performance measuring metrics.  In addition to this, to further evaluate and validate the accuracy of the models, dimensionless metrics such as WI [71] and NS [72] have been calculated. These metrics overcome the insensitivity of correlation-based measures to differences in the observed and model-simulated means and variance [71][72][73]. WI is a dimensionless measurement of model accuracy which is bounded by 0 and 1, meaning no agreement for 0, and a perfect fit for 1 [71,74]. The values obtained show consistent higher scores of more than 95% for standalone and greater than 98% for hybrid models. NS is another normalised metric that is widely used in evaluating forecasted results [75]. It ranges from −∞ to 1, an efficiency of 1 corresponds to a perfect match of simulated data to the observed data. EEMD-BiLSTM NS achieves a value of greater than 0.95 ( Figure 13), these values confirm the efficiency of the prediction models as used in many past research studies [75][76][77][78]. The curve fitting between the observed significant wave height ( ) and predicted significant wave height ( ) for 30-min prediction are shown in Figure 13 for both the models at the two coastal sites. The scatter plot shows the relationship between the normalized predicted and observed values. EEMD-BiLSTM provides a better curve fitting between ( ) and ( ), a higher quality of fit R 2 and a small residual error. The histograms in Figure 14 compare the 24 h absolute forecasting error. These show the difference between the forecasting model value and the actual value in the testing period. The EEMD models where signal decomposition was used shows high frequency of values with lower absolute forecasting error whereas the standalone models have The histograms in Figure 14 compare the 24 h absolute forecasting error. These show the difference between the forecasting model value and the actual value in the testing period. The EEMD models where signal decomposition was used shows high frequency of values with lower absolute forecasting error whereas the standalone models have higher absolute forecasting error with larger values. The objective model, EEMD-BiLSTM, has a lower range of forecasting error when compared with all benchmark models used in the study for both study sites.

Conclusions
Since the physical models present many challenges of computational complexity and convergence issues of solutions, data-driven models can be an important alternative in providing ocean wave predictions and assessment. The results have shown that the hybrid EEMD-BiLSTM model that is enhanced with the Boruta feature selection optimiser can provide valuable and accurate insight into 24-hour forecasting, much needed for prior preparation for coastal wave events. This information on wave heights and notable changes in the sea level could be very helpful for assessing wave activities along coastal areas. According to [79], coastal vulnerability is in the medium range if the mean significant wave height is between 1.25 m and 1.4 m. The analysis of wave data in this study shows Gold Coast with a high frequency of Hs with height greater than 2 m when compared with Cairns. Thus, any accurate and reliable forecasting will help to ensure prior

Conclusions
Since the physical models present many challenges of computational complexity and convergence issues of solutions, data-driven models can be an important alternative in providing ocean wave predictions and assessment. The results have shown that the hybrid EEMD-BiLSTM model that is enhanced with the Boruta feature selection optimiser can provide valuable and accurate insight into 24-hour forecasting, much needed for prior preparation for coastal wave events. This information on wave heights and notable changes in the sea level could be very helpful for assessing wave activities along coastal areas. According to [79], coastal vulnerability is in the medium range if the mean significant wave height is between 1.25 m and 1.4 m. The analysis of wave data in this study shows Gold Coast with a high frequency of Hs with height greater than 2 m when compared with Cairns. Thus, any accurate and reliable forecasting will help to ensure prior planning and preparation. The Australian Bureau of Meteorology (BOM) also uses this concept of significant wave height for warnings regarding ocean swells.
This study has also shown that a new and advanced hybrid methodology that combines two LSTM layers where LSTM units have taken the dependence between consecutive events into computation on a relevant time stamp can be effectively implemented with an effective signal decomposition and feature selection technique for the marine environment. It represents a new paradigm in significant wave height forecasting where all timesteps of the input features are utilised through the forward and backward feed network structure. To validate the merits of the proposed BRF-EEMD-BiLSTM model, a comprehensive evaluation of the model was carried out through calculation of statistical metrics and visualisation with comparison with benchmark models. Considering these results, the newly proposed hybrid model with half-hourly wave input data can be an effective tool for short-term prediction of Hs. The overall analysis and assessment of wave features at the two coastal sites of Queensland shows high wave impact and more coastal vulnerability towards the South East region. The availability of forecasting ability of artificial intelligence models such as those used in this study will further enable more reliable and accurate future wave warning systems. This study can also be extended at medium to long time-horizon Hs forecasting.

Data Availability Statement:
The data that support the findings of this study are openly available in Queensland Government Open Data Portal at https://www.data.qld.gov.au/dataset/coastal-datasystem-historical-wave-data (accessed on 1 January 2020).