Forecasting of Extreme Storm Tide Events Using NARX Neural Network ‐ Based Models

: The extreme values of high tides are generally caused by a combination of astronomical and meteorological causes, as well as by the conformation of the sea basin. One place where the extreme values of the tide have a considerable practical interest is the city of Venice. The MOSE (MOdulo Sperimentale Elettromeccanico) system was created to protect Venice from flooding caused by the highest tides. Proper operation of the protection system requires an adequate forecast model of the highest tides, which is able to provide reliable forecasts even some days in advance. Nonlinear Autoregressive Exogenous (NARX) neural networks are particularly effective in predicting time series of hydrological quantities. In this work, the effectiveness of two distinct NARX ‐ based models was demonstrated in predicting the extreme values of high tides in Venice. The first model requires as input values the astronomical tide, barometric pressure, wind speed, and direction, as well as previously observed sea level values. The second model instead takes, as input values, the astronomical tide and the previously observed sea level values, which implicitly take into account the weather conditions. Both models proved capable of predicting the extreme values of high tides with great accuracy, even greater than that of the models currently used.


Introduction
Extreme tidal events strongly affect the socio-economic and environmental aspects of many locations around the world. The Bay of Fundy, which separates the eastern Maritime Provinces of Nova Scotia and New Brunswick, Canada, is characterized by the world's largest tidal ranges. High tide can be up to over 16 m higher than low tide [1]. The tides in the Bay of Fundy are so high because of an unusual combination of factors: the shape of the bay and the seiches. It is very important for people living in the Bay of Fundy to know the tide heights in order to be able to move or sail along the coast safely.
The approach of a storm to the shore leads to an increase in the sea level above the normal value. For less intense storms that cannot be qualified as hurricanes, or which do not cross the coast, this rise is generally lower than 1 m. For very intense hurricanes, the storm surge may exceed 5 m. On 29 August 2005, the storm surge induced by Hurricane Katrina, exceeding 8 m in some places, caused multiple failures to various flood protection structures of greater New Orleans area, submerging 80% of the city [2]. Due to the storm surge, the sea penetrated 10 km inland in many areas and up to 19 km inland along bays and rivers, devastating the coasts of Mississippi and Alabama and making Katrina one of the most destructive natural disasters in the history of the United States.
Unlike what happens in the oceans, in the Mediterranean Sea the maximum tidal fluctuations generally do not exceed a few tens of centimeters, except for some particular cases, where shape and geometry of coastline play a major role along with storm events. The most emblematic and interesting case is represented by the Venice lagoon, where the storm tide may overcome 1.5 m in correspondence with critical meteorological conditions, exposing its extraordinary historic-architectural heritage to the flooding risk. When the storm tide exceeds 150 cm, 70% of the Historic Centre of Venice is flooded, while a storm tide exceeding 110 cm causes the flooding of 12% of the same Historic Centre.
In order to protect Venice and its lagoon from storm tides exceeding 110 cm, the MOSE system, consisting of 4 mobile barriers, was designed and it is nearing completion by the Interregional Superintendency for Public Works in Veneto-Trentino Alto Adige-Friuli Venezia Giulia of the Italian Ministry of Infrastructures. The 78 flap gates, installed at the bottom of the inlets, allow to temporarily separate the lagoon from the sea during the events of severe storm tide. When the system is fully operational, the mobile barriers will be raised if the storm tide exceeds 110 cm. The procedures for activating the MOSE barriers are still being fine-tuned in detail. However, the lifting procedure of the mobile gates is quite complex and takes about 5-6 h in total. In order to cope with extreme storm tide events, it is advisable to have forecast models that are accurate even with advance times long enough to allow the MOSE (MOdulo Sperimentale Elettromeccanico) system to be activated safely.
Currently, the prediction of extreme storm tide levels in Venice is carried out using statistical [3] or hydrodynamic models [4]. In the last two decades, there has been a significant increase in the use of Machine Learning (ML) algorithms in the prediction of hydrological quantities [5][6][7][8][9][10][11][12]. Few applications so far have involved sea level prediction. Imani et al. (2018) used the Extreme Learning Machine (ELM) for the prediction of the sea level fluctuation at Chiayi coast, Taiwan [13]. Liu et al. (2019) developed a combined tidal forecasting model based on the Support Vector Regression (SVR) and on the Autoregressive Integrated Moving Average, providing a simulation of the measured tidal data at the station of Bay Waveland Yacht Club, USA [14]. Most of the studies in the existing literature are based on Artificial Neural Networks (ANN). Jain et al. (2007) developed a tidal prediction model based on the Feed Forward with Levenberg-Marquardt Back Propagation neural network, reaching a forecasting horizon of 24 h, and used it for the tide prediction on the west Indian Ocean coasts [15]. Karimi et al. (2013) predicted the sea level fluctuation in Darwin Harbor, Australia, based on Adaptive Neuro-Fuzzy Inference System (ANFIS) and ANN [16]. Riazi et al. (2020) proposed a Deep Learning approach for the tide level prediction based on the forces (e.g., astronomical forces) that affects the tide level and applied it for the prediction of the future tides for six stations located on the Queensland coast, Australia [17]. However, no studies based on these procedures and focused on the extreme values of storm tides were reported in literature.
In this study, the Nonlinear Autoregressive Exogenous (NARX) neural networks were used for the prediction of the extreme storm tide events in Venice historic center. NARX networks are a dynamic recurrent neural networks, particularly suitable for the prediction of time series which highlight seasonal components, of quantities relating to natural phenomena [18]. NARX networks have so far found a fair application in some forecasting problems of hydrological time series. Lee and Resdi (2016) developed a NARX-based model capable of catchment-wide simultaneous prediction of river stages at multiple gauging stations in Kemaman basin, Malaysia [19]. Guzman et al. (2017) used NARX networks to simulate daily groundwater levels at a local scale in the Mississippi River Valley Alluvial aquifer [20]. Rjeily et al. (2017) employed NARX network to forecast flooding events within urban drainage systems [21]. Yang et al. (2019) applied NARX and other recurrent neural networks to simulate the operation of three multi-purpose reservoirs located in the upper Chao Phraya River basin, Thailand [18]. Di Nunno and Granata (2020) used NARX network to predict the daily groundwater level fluctuation for 76 monitored wells located in Apulia, Southern Italy [22]. Furthermore, NARX networks are able to optimize time performance in comparison with other ANNs [23]. In the future, this ability will allow a continuous and quick retrain of the network with new data, in order to take into account, in the storm tides forecasting, the effects related to the sea-level rise and subsidence which affect the Venice Lagoon.
This study reports the updates of the research whose first results are shown in Di Nunno et al. (2021), in which NARX networks were used to forecast tide oscillations in 19 tide gauge stations covering the entire Venetian Lagoon [24], while here the attention focused on the extreme storm tide events in Venice historic city center.
The NARX-based models were developed with tide data collected at the Punta della Salute tide gauge, located at the end of Canal Grande, and meteorological data collected at the Piattaforma CNR, located outside of the Venetian Lagoon. The accuracy of the models was assessed as the input variables and lag times varied, considering a forecasting horizon of up to 72 h, that it is enough time to activate the MOSE barriers. Furthermore, a sensitivity analysis to the training time series length and to the duration of the extreme tidal events was also performed. Finally, a comparison with the prediction performed using statistical and deterministic models was also made in order to evaluate the ability of the NARX network to provide reliable predictions with respect to other well-known models.

Study Area and Extreme Events
The Venice Lagoon is located in the Upper Adriatic and extends from the Sile river in the north to the Brenta river in the south, covering an area of 550 km 2 . Only the 8% is represented by land, including the city of Venice and many smaller islands while about 11% is permanently covered by open water and 80% consists of mudflats, tidal shallows, and salt marshes. The lagoon is connected to the Adriatic Sea by means of three inlets: Lido, Malamocco, and Chioggia ( Figure 1). In ordinary conditions, the observed sea level in the Lagoon is little different from that induced by the astronomical tide since meteorological effects are small. In the case of particularly adverse weather conditions, with significant drop pressure and strong sirocco wind, the meteorological effects become significant: if they are in phase with a high astronomical tide, they can lead to the phenomenon of high water, which is conventionally defined as a tide level greater than 80 cm with reference to the Punta della Salute tide gauge. At this tide level, the flooding of the lower part of the city begins. The storm surge may be enhanced by a severe low pressure on the upper Adriatic and contemporary high pressure on the lower Adriatic. The situation worsens further if a strong wind from North-East (Bora) blows in the northern Adriatic together with sirocco wind in middle Adriatic, leading to the convergence of wind-induced marine currents. In addition, due to the shape of the Adriatic Sea, a storm surge leads to the formation of seiches whose amplitude is progressively decreasing [25]. The concomitant action of the above factors can lead to extreme values of storm tide levels, as observed more and more frequently in recent decades.
The most well-known and impressive extreme event occurred on November 4, 1966. The storm tide level was the highest recorded, 194 cm [26]. The surge is even more impressive since the astronomical contribution to the observed peak was less than its mean level, so it was negative. Those days were characterized by exceptionally adverse weather conditions that caused extensive civil and hydrogeological damage in various areas of Italy, such as the famous flood in Florence. The sirocco wind was strong and persistent, reaching a speed of 52 knots in Venice, of 58 knots in lower Adriatic Sea. The pressure drop was 30 hPa in 2 days. The flood has been extremely long in time: the tide level persisted over 110 cm for 22 h.
The second most severe event of exceptional high water occurred in Venice on 12 November 2019 and resulted to the flooding of almost 90% of the city. In November 2019, there was an anomalous persistence of sirocco winds on the Adriatic Sea, which caused an accumulation of water masses in the Venetian Lagoon. The simultaneous persistence of low pressure on the upper Adriatic induced a significant rise in sea level, which was about 30 cm higher than the average value of the last 25 years.
In recent years, also due to eustatism and subsidence, as well as due to climate change, the extreme storm tide events affecting Venice have become increasingly frequent. The good availability of data relating to water level, wind characteristics, and barometric pressure has allowed to develop data-driven models capable of accurately predicting storm tide levels with forecast horizons of many hours. It is shown in the following that models based on NARX networks allow to predict even extreme events with high accuracy.

NARX Model Architectures
NARX is a recurrent dynamic network composed of interconnected nodes. Each node represents an artificial neuron that receives one or more inputs and elaborates them to produce an output. These sums pass through a non-linear activation function. The main advantages of the NARX networks with respect to other ANNs approaches are the faster convergence in reaching the optimal weights of the connections between neurons and input parameters [28] and the reduced number of the latter to calibrate and make the model effective [20]. The defining equation for the NARX model is: where u(t) and y(t) are, respectively, the input and output values at time t, nu and ny are the input and output network layers, and f is the non-linear function, approximating by the Feedforward Neural Network (FNN). The NARX architectures include 3 different and sequential layers ( Figure 2). The first is the input layer, which contains the input parameters of the neural network. The second is the hidden layer, which is the computational step between input and output. The third is the output layer, which leads to the predicted value y(t). Moreover, the output value is then fed back to the input values as part of the NARX architectures.
For the hidden layer, a sigmoid activation function f1 was used, while a linear activation function f2, with only one neuron n, was used for the output layer. A preliminary analysis was carried out to assess the optimal number of hidden nodes, which was found to be equal to 3 ( Figure 2, indicated as h1, h2, and h3). For the output layer, the weight w and bias b of the NARX model were optimized based on the Bayesian Regularization training algorithm. The NARX process was stopped if one of the following parameters was reached: the maximum number of epochs, settled equal to 1000; the Levenberg-Marquardt adjustment parameter, settled equal to 1 × 10 −10 ; the error gradient was below a minimal value, settled equal to 1 × 10 −7 .
The Bayesian Regularization allows to achieve the best predictions in comparison with other two training algorithms preliminarily tested, the Levenberg-Marquardt and the Scaled Conjugate Gradient. It consists of a Gauss-Newton approximation to the Hessian matrix, based on the Bayesian technique proposed by MacKay (1992) [29], and implemented in the Levenberg-Marquardt algorithm, in order to reduce the probability of overfitting and the computational overhead [30]. Despite a slow convergence with respect to the direct application of Levenberg-Marquardt algorithm, the Bayesian Regularization algorithm usually leads to improved forecasting [22].
A normalization of the input values was also performed, in order to have a common range between 0 and 1 and improve the modeling performance. The water level, the astronomical tide, the barometric pressure and the wind speed were normalized with respect to the, respectively, maximum values along the time series, while the wind direction was divided by 360, equal to the complete angle in degree: where i indicates the temporal step and * indicates the dimensionless form of each parameter.
The two different NARX-based models were implemented in MATLAB ® 2020a environment [31]: Model A and Model B. The two models were different in the input variables. Both the models included the lagged values of the water level htide(t-ta) among the inputs.
In addition, Model A considered astronomical tide hastr(t-ta), wind speed vwind(t-ta), wind direction αwind(t-ta), and barometric pressure Patm(t-ta) as exogenous input parameters, whereas Model B considered only hastr(t-ta) as an exogenous input variable. The astronomical tide hastr(t-ta) values were computed through the following: where A0 is the average sea level, An is the amplitude, σn the angular frequency, kn the phase delay of component n, and N is the number of harmonics used to evaluate the astronomical tide height. These values can be found on the Venice Municipality website [32].
It should be noted that, despite the meteorological parameters were neglected for Model B, their influence is implicitly expressed by including previously observed water level values, which in turn also depend on meteorological factors. Different lag time ta values were considered, in order to assess the performance of the models as ta increases: 12, 24, 48, and 72 h.

Evaluation Metrics
The performances of the NARX-based models were evaluated by means of three evaluation metrics: the coefficient of determination R 2 , which assess how well the model replicates observed outcomes and predicts future outcomes, the Mean Absolute Error (MAE), which provides the average error magnitude for the predicted values, and the Relative Absolute Error (RAE), equal to the ratio between absolute error and absolute value of the difference between average and each measured values. These metrics are defined as: where is the total number of measured data, is the predicted value for the i-th data, is the measured value for the i-th data, and is the averaged value of the measured data. The performances of the NARX-based model were computed considering only the extreme tidal events, with htide equal or greater than 110 cm.

Training and Testing
The NARX-based models were trained using all water level data, collected between January 2009 and June 2012. After the training stage, the models were employed to simulate all the storm tide events characterized by a level higher than 110 cm that occurred from January 2009 to December 2020. Overall, 52 extreme storm tide events were identified. Of these, 43 events are classified as very high storm tide, with sea level between 110 cm and 140 cm, and 9 events as exceptional storm tide, with storm tide higher than 140 cm.
The NARX-based model showed a very good fit for both Model A and Model B and for all lag time ( Table 1). The best results were achieved for the lower ta, equal to 12 h (Model A-R 2 = 0.950, MAE = 1.96 cm and RAE = 23.03%; Model B-R 2 = 0.941, MAE = 2.14 cm, and RAE = 25.13%). The prediction performance reduces as the lag time increases, with the NARX models providing the less accurate forecasts for ta = 72 h (Model A-R 2 = 0.899, MAE = 2.91 cm, and RAE = 34.25%; Model B-R 2 = 0.860, MAE = 3.29 cm, and RAE = 38.75%). In Figure 3 the comparison between measured and predicted extreme storm tide levels for both Model A and Model B, and for different lag times is reported, highlighting a slight tendency to underestimate these extreme events. Overall, regardless of the lag time, the NARX network-based models are able to forecast extreme storm tide events, including the exceptional storm tides. Moreover, the model performance was not significantly affected by the additional input parameters: Model B is only slightly less accurate than model A. This obviously does not mean that the weather parameters have little influence on extreme events: the fundamental influence of the weather parameters is taken into account by means of the lagged values of the storm tide level [24].

Time Series Analysis
A further analysis was carried out on the time series related to the extreme storm tide events, in order to evaluate the ability of NARX to provide reliable predictions in the hours before and after the water level peaks, but also the sea level during the whole storm tide event. As an example, the modeling of the highest storm tide in the entire investigated period, equal to 185 cm, is here described. This event took place from 11 to 14 November  , and for all the tested lag times, the NARX network has proven to be able to forecast that extreme event by accurately simulating the storm tide trend. For the 85 cm peak, Model A provides the best prediction with the lower ta, equal to 12 h, with an overestimation of the storm tide equal to 0.29 cm. Model B also provides very good outcomes: the less accurate forecast was obtained for ta = 24 h, with an underestimation of 3.08 cm. For the third peak, equal to 129 cm, a very effective prediction was achieved with Model A also for ta = 72 h, with a slight underestimation of just 0.02 cm. For the same lag time, Model B provides a very accurate forecast too, with an underestimation of 0.61 cm. For the exceptional tide peak = 185 cm, the best forecast was achieved with Model A and ta = 48 h, with an underestimation of 0.2 cm. However, Model B also provided accurate predictions for all lag times, and the worst forecast was obtained for ta = 24 h, with an overestimation of 5.99 cm. Following this peak, there is a further exceptional storm tide, corresponding to a 145 cm sea level, the last of the considered period. For the latter, the best forecasts were again achieved with Model A, with an underestimation of 0.27 cm for ta = 24 h. However, accurate predictions were also obtained with Model B, with an underestimation of 0.28 cm in the case of ta = 48 h. Figure 4 reports the comparison between measured and predicted sea level for both Model A and Model B and for the different lag times.

Sensitivity Analysis
The sensitivity of the NARX-based models to the training time series length and to the duration of the extreme tidal events is discussed in this section.
Model B was considered to assess the prediction accuracy as the training time series length decreases. A total of four different training time series lengths were investigated:   expected, a performance reduction was observed as the length of the time series reduced (Table 2), with the worst predictions obtained with the shorter training time series (R 2 = 0.820, MAE = 3.96 cm, and RAE = 40.57%). However, predictions obtained even with a reduced time series length were still quite accurate.  The prediction accuracy for extreme tidal events with different duration was evaluated considering three different events.   Figure 6 reports the comparison between measured and predicted tide for an extreme storm tidal event that occurred between 10 November 2012 and 14 November 2012 (indicated as Event 1), with a duration of 48 h. An exceptional storm tide was recorded during this event, with htide higher than 140 cm. As confirmed by the metrics, reported in Table 3, both models A and B were able to provide accurate predictions of this event, for all lag times. Figure 7 reports a second event (Event 2) which occurred between 27 October 2018 and 3 November 2018, with a duration of 168 h. The two consecutive exceptional storm tides were measured during this period, with also two very high storm tides (htide higher than 110 cm) before and after the exceptional storm tides. Additionally, in this case, predictions were very accurate, with a slight underestimation of the exceptional storm tides for Model B and ta = 72 h.
The third event (Event 3) is shown in Figure 8. It occurs between 20 December 2019 and 29 December 2019, with a duration of 216 h. During this period, one exceptional storm tide was measured, preceded by two very high storm tides, and followed by another very high storm tide. In this case, it is observed that both Model A and Model B are able to provide an accurate prediction of the peaks, for all lag times.
The analysis of the investigated events indicates that the accuracy of the model is not affected by the duration of the storm tide event (Table 3).

Comparison with Other Models
The accuracy of NARX-based models in predicting extreme events was compared with that of two statistical models, referred as ISPRA statistical models (ISPRA-STAT2008 [33]), and one deterministic model, referred as Shallow water Hydrodynamic Finite Element Model (SHYFEM, [4,34]).
To evaluate the prediction accuracy of the deterministic model, the results of a modeling carried out in the period 1 October 2012-1 October 2013 for the tide gauge of Punta della Salute were considered. The evaluation of the prediction accuracy for the statistical models was instead carried out for the period 18 March 2012-30 June 2013 for the same tide gauge [35]. The comparison was conducted based on an accuracy index, expressed as: where εm indicates the mean error and 2σ the confidence interval, with σ representing the standard deviation of the errors, i.e., the differences between predicted and measured levels. Statistical models take into account the hourly time series of the sea level and of the meteorological parameters, such as barometric pressure, wind speed, and wind direction observed in the previous hours. For the first model, referred as STAT-1, the meteorological parameters were provided by the European center ECMWF, while for the second model, referred as STAT-2, those parameters were provided by the BOLAM limited area hydrostatic model [36]. The SHYFEM model consists of a two-dimensional hydrodynamic model which allows to forecast the sea level over the entire Mediterranean basin, based on the wind speed, wind direction, and atmospheric pressure fields provided both by ECMWF (SHYFEM 5) and BOLAM (SHYFEM 6), as for the statistical models [37]. Table 4 reports the values of the accuracy index, showing a comparison between NARX-based models, SHYFEM, and statistical models, and grouping the measured levels in height classes. As for the statistical models, the comparison was made regardless of the lag times, while for the SHYFEM model data relating to lag times equal to 24, 48, and 72 h were available.
For htide in the range of 80 to 100 cm, SHYFEM model highlights mean errors which, for ta = 72 h, reach values closer to 0 with respect to the NARX models (Model A-εm = −0.9 cm, SHYFEM 6-εm = −0.3 cm). However, both NARX-based models exhibit significantly narrower confidence interval 2σ in comparison with SHYFEM and statistical models. In particular, for htide between 80 and 100 cm, the lower value of 2σ was computed for Model A and ta = 24 h, equal to 5.5 cm (Model B and ta = 24 h-2σ = 6.1 cm). Considering the same htide interval and lag time, deterministic and statistical models shows a wider confidence interval, with 2σ equal to 12.9, 16.8, 16.1, and 16.6 cm, respectively, for SHYFEM 5, SHYFEM 6, STAT-1, and STAT-2. Moreover, NARX models do not show a marked increase in 2σ as the lag time increases, with 2σ = 6.1 cm for Model A and ta = 72 h (Model B and ta = 72 h-2σ = 6.2 cm). The same cannot be said for the deterministic models that highlight a relevant increase in 2σ as the lag time increases, with 2σ equal to 18.5 cm (SHYFEM 5) and 20.9 cm (SHYFEM 6) for ta = 72 h.
As the storm tide level increases, statistical models exhibit a marked widening of the confidence interval. In particular, for htide > 120 cm, STAT-1 and STAT-2 show 2σ values, respectively, equal to 27 and 26.2 cm. Even the deterministic models exhibit a widening of the confidence interval. For ta = 24 h, 2σ equal to 26.2 and 26.1 cm were, respectively, computed for SHYFEM 5 and SHYFEM 6. As the lag time increases, instead, SHYFEM 5 and SHYFEM 6 show an opposite trend, with a narrowing of the confidence interval for the first model (2σ = 16.8 cm) and a widening of the same for the second model (2σ = 47.0 cm). NARX-based models instead are characterized by confidence intervals similar to those computed for htide between 80 and 100 cm. As for Model A, 2σ values were equal to 5.1 and 5.2 cm, respectively, for lag times of 24 and 72 h. Model B shows a greater widening of the confidence interval as the lag time increases, passing from 5.3 cm for ta = 24 h to 7 cm for ta = 72 h.
Overall, the NARX-based models highlight good accuracy for all lag times (from this point of view, the NARX-based models would seem to outperform those based on different Machine Learning algorithms, such as M5P and RF [38]) and sea level with a narrow confidence interval in comparison with the statistical and deterministic models, confirming the ability of the NARX network to provide reliable predictions of extreme storm tide events.

Conclusions
This study assessed the ability of NARX network-based models to forecast extreme storm tide events in Venice. A total of two models were built, including the lagged values of the measured sea level and astronomical tide as input values. In the first one, Model A, also the meteorological parameters related to wind features and barometric pressure were taken into account. The prediction accuracy of this model was very good. Meteorological parameters are not included as input in Model B, however, their influence is implicitly expressed by previously observed sea level values, leading to a still high prediction accuracy.
The ability to forecast even the exceptional storm tides, linked with a narrow confidence interval compared to different statistical and deterministic models, makes the NARX networks a reliable tool for the extreme storm tide events forecasting in Venice. Moreover, the limited effect of the lag time increase on the forecasting performance demonstrates the great capability of NARX networks to predict extreme events with a forecasting horizon of 1-3 days, that allows a timely activation of the MOSE barriers.
However, it should be noted that such an approach leads to satisfactory results in appropriately monitored sites, for which adequate datasets are available, including even extreme events. Therefore, where sufficient data are not available, with particular reference to extreme events, an approach based on Machine Learning algorithms cannot be used.

Appendix A. Ensemble Model
In order to evaluate the temporally transferability of the model, a further NARX modeling was performed. Several models were built to predict the storm tide using different subsets for training and testing. In addition, the outcomes of each model were combined within an ensemble model in order to provide a single final forecast of the target values. In this case, the ensemble approach is equivalent to a cross validation for the time series.
In particular, 10 different networks were trained for the different lag times and for both Model A and B, the results of which were averaged to obtain the final predictions. The temporal continuity of the training datasets was preserved, since it was a time series modeling. After the training stage, in agreement with the NARX modeling described in Section 3.1., the models were used for the prediction of all tidal events with level higher than 110 cm from January 2009 to December 2020. Figure A1 illustrates the subsets used for training and testing the different networks.
For ta = 12 h the performances achieved by each network were always high, with R 2 values always higher than 0.93 for both Model A and Model B. A reduction in the performance was observed as the lag time increases. However, R 2 values were still higher than 0.76 for all network and for both Model A and Model B highlighting how, despite a long horizon forecasting, NARX modeling was still able to provide accurate predictions.
The ensemble model exhibited very good fit for both Model A and B and for all lag time (Table A1) Figure A2, the comparison between measured and predicted storm tides for the ensemble models (both Model A and Model B) and for different lag times is reported. All individual networks and the ensemble model showed absolutely comparable performance, ensuring the temporal transferability of NARX-based models.   Figure A2. Comparison between measured and predicted storm tide for different lag times-Ensemble model.

Appendix B. NARX Model with Only the Lagged Tide Level as Input Variable
In this Appendix, the predictions obtained with a third NARX-based model, indicated as Model C, are shown. This model differs from Models A and B in the input variables. In particular, Model C included only the lagged values of the tide level htide(t-ta) as input. As an example, Figure A3 reports the comparison between measured and predicted sea level for the same event investigated in Section 3.2., for which Models A and B showed good accuracy. Model C clearly failed to provide accurate predictions, leading to significant underestimation of the highest peaks. Table A2 reports some metrics that attest to the poor performance of Model C., with the best results for ta = 12 h (R 2 = 0.397, MAE = 17.77 cm, and RAE = 79.22%) and the worst ones for ta = 72 h (R 2 = 0.310, MAE = 18.72 cm, and RAE = 83.54%).
This further modeling shows the indispensability of the astronomical tide among the input variables.