An Experimental Assessment of Extreme Wave Evaluation by Integrating Model and Wave Buoy Data

: Calculating the signiﬁcant wave height (SWH) in a given location as a function of the return time is an essential tool of coastal and ocean engineering; such a calculation can be carried out by making use of the now widely available weather and wave model chains, which often lead to underestimating the results, or by means of in situ experimental data (mostly, wave buoys), which are only available in a limited number of sites. A procedure is hereby tested whereby the curves of extreme SWH as a function of the return time deriving from model data are integrated with the similar curves computed from buoy data. A considerable improvement in accuracy is gained by making use of this integrated procedure in all locations where buoy data series are not available or are not long enough for a correct estimation. A useful and general design tool has therefore been provided to derive the extreme value SWH for any point in a given area.


Introduction
The design of offshore structures, ships, and coastal works is based on an estimate of the extreme values of sea state parameters such as the significant wave height (SWH). Such an estimate is naturally a function of the probability of the extreme events, and it is normally carried out by fitting appropriate extreme value distribution curves to experimental data series. Data, whenever possible, are provided by in situ wave meters (buoys, or sometimes pressure gauges or wave stacks) and more recently by satellite altimeter data.
However, while the advantage of using experimental data are obvious, only very few wave meters sites are available all over the world, and even less have been kept long enough to provide a reliable historical series. The problem of evaluating extreme events in locations where no data are available remains therefore open.
The best available solution to this problem is nowadays provided by simulated data: during the last 20 years, many state and international meteorological centres, as well as some research institutions and private companies have started systematically running global and regional wave spectral generation and propagation models. Such simulation systems (henceforth indicated as "model") are in turn driven by meteorological forecasting systems and constantly validated through the acquisition ("assimilation") of measured data from both wave meters and satellite altimeters at fixed steps in time. Both forecast and analysis data are published in near real time, thus providing long time series of simulated data, also often indicated as "synthetic data", which are nowadays an important source of information for statistical analyses, to the point that the use of wave data produced by such model chains has now become commonplace. Estimating the extreme values of SWH for high return times through synthetic and test a methodology which integrates of synthetic and experimental data in order to yield SWH(T R ) curves for such locations.

Methods
In order to overcome the problems described in Section 1 to estimate SWH extreme values and return times from model data, a new procedure-in the following indicated as "integrated"-was proposed [4].

Integrated Procedure
The basic idea is that that the parameters of any SWH(T R ) function, which links SWH with its return time T R , are themselves randomly distributed and that the distribution of such parameters can be estimated by integrating the data from the model with those from the buoys in the area. A somewhat similar approach with rainfall data is reported in [31].
Thus, if SWH m (T R , X, Y) is the significant wave height for the generic location X Y and for a given T R estimated from a wave spectral generation and propagation, it can be assumed that the "true" value of SWH b is given by the following Equation (1): In the Equation (1), E represents the error originating from many sources, but mostly due to the meteorological uncertainty. It should here be remembered that model data derive from a chain made up of two parts: a wave model which takes explicitly into account the physical aspects of the wave formation and propagation, which is basically a deterministic algorithm; and a meteorological part, which provides the input winds and involves necessarily a higher degree of randomness.
It should be made clear here that the value SWH b is "true" in a purely conventional sense, and it is assumed here to be the value computed from buoy recorded data, i.e., the value that would be used for design or research purposes if experimental data were available. Additionally, the procedure proposed here is independent from the particular form of extreme value distribution SWH(T R ), which is adopted and fitted to the data: there are indeed many alternatives, and the relevant state of the research on the field has been briefly discussed in Section 1.
In particular, according to current ocean engineering, the Weibull distribution was adopted (Equation (2)) with the Peak Over Threshold (POT) method in the form described for instance in [6].
where H P are the peak values of SWH while A, B and k are the distribution parameters also respectively known as scale, position, and shape parameters. While A and B can be computed with the least square method, the shape parameter k, following the usual practise as indicated in [6,32], is chosen with the best fit criterion among the following 4 values (0.75; 1.00; 1.40 and 2.00).
In any case, since the present paper is not aimed at evaluating, discussing or recommending one particular form of SWH(T R ) or any particular procedure to estimate its parameters, the only requirement is that such form and procedure should be uniform throughout the whole analysis.
Once the distribution parameters are known in a given location, the SWH return value for a return period T R (in years) is computed by making use of Equation (3): where λ is mean frequency of the recorded extreme events: it is given by ratio between the total number of events N T and the length n of the observation period expressed in years. The same operation is carried out in each available buoy location with both the historical experimental direct datasets SWH b (T R ) and with the model data series SWH m (T R ); since however most of the times the buoy positions do not exactly coincide with model grid points, a spatial bi-linear interpolation procedure (co-location) is used, as described in [33].
Assuming then the error E in Equation (1) to be represented by an appropriate probability distribution E(µ, σ), which is of course unknown, the problem is reduced to the search of the appropriate parameters; i.e., the average µ and the root mean square σ of its distribution. They can be estimated, again for each T R , by taking into account the "true" values SWH bi (T R ) computed at the locations i where the wave buoys are available. E i (T R ), again for each location i is thus evaluated as: The relative error e i at location i is then given by Equation (5): Its expected value µ(T R ) in the area can then be estimated as: and its root mean square σ(T R ) as: where the sums over the index i being extended to N b wave buoys considered in the region. Therefore, for a generic location t where no buoy data are available, an estimated SWH st of the "true" SWH bt , can be obtained from SWH mt (T R ) by following Equation (8) SWH st (T R ) = SWH mt (T R ) + µ(T R ) × SWH mt (T R ), where SWH mt (T R ) is evaluated by using model recorded data at location t. The integrated procedure, is thus schematically illustrated in Figure 1 for a general test location.
Water 2020, 12, x FOR PEER REVIEW 4 of 14 Assuming then the error E in Equation (1) to be represented by an appropriate probability distribution E(μ, σ), which is of course unknown, the problem is reduced to the search of the appropriate parameters; i.e., the average μ and the root mean square σ of its distribution. They can be estimated, again for each TR, by taking into account the "true" values SWHbi(TR) computed at the locations i where the wave buoys are available. Ei(TR), again for each location i is thus evaluated as: Ei(TR) = SWHbi(TR, Xi, Yi) − SWHmi(TR, Xi, Yi).
Its expected value μ(TR) in the area can then be estimated as: and its root mean square σ(TR) as: where the sums over the index i being extended to Nb wave buoys considered in the region. Therefore, for a generic location t where no buoy data are available, an estimated SWHst of the "true" SWHbt, can be obtained from SWHmt(TR) by following Equation (8) SWHst(TR) = SWHmt(TR) + μ(TR) × SWHmt(TR), where SWHmt(TR) is evaluated by using model recorded data at location t. The integrated procedure, is thus schematically illustrated in Figure 1 for a general test location. The computational cost of the method is not very high, since it amounts to computing 2Nb + 1 EVPD, an easily standardized and commonly available algorithm, as opposed to a single one, as it would be needed for a conventional procedure based on model data only. The estimation of Ei(TR), ei(TR) and SWHst(TR) is straightforward and can be carried out in a single EXCEL ® file. The computational cost of the method is not very high, since it amounts to computing 2N b + 1 EVPD, an easily standardized and commonly available algorithm, as opposed to a single one, as it would be needed for a conventional procedure based on model data only. The estimation of E i (T R ), e i (T R ) and SWH st (T R ) is straightforward and can be carried out in a single EXCEL ® file.

Validation
In order to verify the soundness of the procedure and the reliability of the results in a given test location, enough historical data must be available to estimate the experimental true value of the SWH b (T R ) with a given T R return time in order to compare it with the estimated value. It is worth recalling that, as stated above, "true" means the value that would be computed from an experimental time series.
In order to do so, the spatial distribution parameters have to be evaluated by making use of a number of wave meters that should not include the test location.
Assuming then that in a given area there are N b buoys available, the procedure is applied by taking one of them to provide the "true" values at the test location t, while the remaining N a = N b − 1 series are used to estimate the relative error distribution e(T R ) according to Equation (5). The procedure highlighted in Figure 1 can thus be applied N b times, each time choosing in rotation one of the available data series to be taken as test location. Such a methodology, normally called "jackknife" is well known and has been tested in various application [34,35].
For each of the N b available test locations a corrected estimated return time curve (SWH st ) is thus obtained, and it can be compared with the "true" SWH bt curve. The following Figure 2 reports an example of the results for NOAA buoy 46014 located along the US Pacific Coast. Here N b = 8 and therefore N a = 7.
Water 2020, 12, x FOR PEER REVIEW 5 of 14 recalling that, as stated above, "true" means the value that would be computed from an experimental time series. In order to do so, the spatial distribution parameters have to be evaluated by making use of a number of wave meters that should not include the test location.
Assuming then that in a given area there are Nb buoys available, the procedure is applied by taking one of them to provide the "true" values at the test location t, while the remaining Na = Nb − 1 series are used to estimate the relative error distribution e(TR) according to Equation (5). The procedure highlighted in Figure 1 can thus be applied Nb times, each time choosing in rotation one of the available data series to be taken as test location. Such a methodology, normally called "jackknife" is well known and has been tested in various application [34,35].
For each of the Nb available test locations a corrected estimated return time curve (SWHst) is thus obtained, and it can be compared with the "true" SWHbt curve. The following Figure 2 reports an example of the results for NOAA buoy 46014 located along the US Pacific Coast. Here Nb = 8 and therefore Na = 7. In Figure 2, the estimated SWHst is much closer to the buoy value SWHbt than the model SWHmt, thus providing a better and safer evaluation of the sea state design value. The same computation can be carried out by rotating the test position among the Nb available buoy locations: this kind of analysis is performed over three test areas described in Section 2.2.
For each test location t the model error (ERM) and the integrated procedure error (ERS), both normalized with the buoy value and indicated as "relative errors", are computed and defined respectively as: In Figure 2, the estimated SWH st is much closer to the buoy value SWH bt than the model SWH mt , thus providing a better and safer evaluation of the sea state design value. The same computation can be carried out by rotating the test position among the N b available buoy locations: this kind of analysis is performed over three test areas described in Section 2.2. For each test location t the model error (ERM) and the integrated procedure error (ERS), both normalized with the buoy value and indicated as "relative errors", are computed and defined respectively as: Improvement, which is the difference between the absolute relative errors, will show how accurate the integrated procedure is compared to the simple application of the model data; all values will be reported in percentage.

Study Sites
Three areas have been considered for this work: the West (Pacific) Coast and the East (Atlantic) Coast of the United States and the Gulf of Mexico, indicated in the following respectively as PC, AC and GoM. The reason for this choice is the availability of long and good quality wave buoy records given by NOAA (National Oceanic and Atmospheric Administration) National Data Buoy Center (NDBC) [36]. Figure 3 reports the areas considered with the geographical location of the buoys, and their relative NOAA-NDBC identification code number. All the data records are approximately 30 years long and the sampling rate is one hour.
Water 2020, 12, x FOR PEER REVIEW 6 of 14

Study Sites
Three areas have been considered for this work: the West (Pacific) Coast and the East (Atlantic) Coast of the United States and the Gulf of Mexico, indicated in the following respectively as PC, AC and GoM. The reason for this choice is the availability of long and good quality wave buoy records given by NOAA (National Oceanic and Atmospheric Administration) National Data Buoy Center (NDBC) [36]. Figure 3 reports the areas considered with the geographical location of the buoys, and their relative NOAA-NDBC identification code number. All the data records are approximately 30 years long and the sampling rate is one hour. Model data are obtained from the climate forecast system (CFS), run by NOAA National Center for Environmental Prediction (NCEP) [37], which produced the CFS Reanalysis (CFSR) dataset, i.e., a reanalysis of the sea and atmosphere state for the period of 1979 to 2009, on all grid points and with a 3-hourly time resolution. For this study we used the two following 10' × 10' nested grid data sets:


Gulf of Mexico and NW Atlantic 10 min (ecg_10m) for buoys located in Gulf of Mexico and along US Atlantic Coast;  US West Coast 10 min (wc_10m) for buoys located along US Pacific Coast.
The integrated procedure had already been applied for some of the GoM area buoys in [4], but here for the first time a jackknife validation as described in Section 2.2 is carried out. The model data have also been updated to the most recent version (Phase 2) of CFSR dataset, which was not yet available at the time.

Results
The most important result of the work is the comparison between SWH(TR) evaluated through the different procedures for each zone: Model data are obtained from the climate forecast system (CFS), run by NOAA National Center for Environmental Prediction (NCEP) [37], which produced the CFS Reanalysis (CFSR) dataset, i.e., a reanalysis of the sea and atmosphere state for the period of 1979 to 2009, on all grid points and with a 3-hourly time resolution. For this study we used the two following 10' × 10' nested grid data sets:

•
Gulf of Mexico and NW Atlantic 10 min (ecg_10m) for buoys located in Gulf of Mexico and along US Atlantic Coast; • US West Coast 10 min (wc_10m) for buoys located along US Pacific Coast.
The integrated procedure had already been applied for some of the GoM area buoys in [4], but here for the first time a jackknife validation as described in Section 2.2 is carried out. The model data have also been updated to the most recent version (Phase 2) of CFSR dataset, which was not yet available at the time.

Results
The most important result of the work is the comparison between SWH(T R ) evaluated through the different procedures for each zone: • SWH bt through wave meter data; • SWH mt through model data; • SWH st through integrated procedure proposed here.
The metrics "ERM", "ERS" and "Improvement" as defined in Section 2.2 are also reported.

Pacific Coast
Results are reported in Tables 1 and 2, while Figure 4 provides some comparisons in graphical forms. The percentage errors for T R = 100 are very similar to those computed for T R = 50. This derives from the similarity of the SWH(T R ) curves for high values of T R , as noted in Section 2.2 and as shown in Figure 2. The analysis of the results can therefore be limited to just one value of the return periods, i.e., T R = 50 years. Figure 4 compares the results of the integrated procedure versus the model for T R = 50 years and provides an insight into the behaviour of the errors. Figure 4 shows that the values computed with the integrated procedure (green circles) are visibly closer to the values computed with buoy data (blue line) than those computed with the model data (red circles). The low value of the R 2 between SWH st and the identity line reflects the dispersion of the error; obviously its average and its extreme values represent a net improvement over the simple model data.   Figure 4 shows that the values computed with the integrated procedure (green circles) are visibly closer to the values computed with buoy data (blue line) than those computed with the model data (red circles). The low value of the R 2 between SWHst and the identity line reflects the dispersion of the error; obviously its average and its extreme values represent a net improvement over the simple model data.

Atlantic Coast
Numerical results are reported in the following Tables 3 and 4 with the same format and symbols as in Tables 1 and 2.

Atlantic Coast
Numerical results are reported in the following Tables 3 and 4 with the same format and symbols as in Tables 1 and 2. The results of the integrated procedure are compared with the model values and the respective errors are plotted against the true buoy values SWH bt in Figure 5. The similarity between the results for T R = 100 and 50 years is also confirmed. The results of the integrated procedure are compared with the model values and the respective errors are plotted against the true buoy values SWHbt in Figure 5. The similarity between the results for TR = 100 and 50 years is also confirmed. As in the previous case (Figure 4), the improvement is evident with the green circles being much closer to the blue line than the red circles.

Gulf of Mexico
The following Tables 5 and 6 report again the numerical result as in Tables 1-4. As in the previous case (Figure 4), the improvement is evident with the green circles being much closer to the blue line than the red circles.

Gulf of Mexico
The following Tables 5 and 6 report again the numerical result as in Tables 1-4.     Note that there is here a single instance (buoy 42002) where for both the TR considered the absolute values of ERS are greater than correspondent ERM values with consequent negative improvements (−2.74% and −4.04%); however, even in this case, as in all the others, the model results lead to underestimating the SWHbt(TR) while the integrated procedure overestimates it, certainly a Note that there is here a single instance (buoy 42002) where for both the T R considered the absolute values of ERS are greater than correspondent ERM values with consequent negative improvements (−2.74% and −4.04%); however, even in this case, as in all the others, the model results lead to underestimating the SWH bt (T R ) while the integrated procedure overestimates it, certainly a safer error than the former one. The effectiveness of the integrated procedure in removing the bias of the model is thus confirmed again.

Discussion
In considering the results, it should be borne in mind that the object of the investigation is inherently stochastic, since the formation of waves is a natural process driven by meteorological phenomena. The unknown functions SWH(T R ) are the outcome of a statistical elaboration on extreme wave data, and as such they are unavoidably affected by random variations.
The following Table 7, which summarises the main results of Section 3, shows that the SWH mt values, computed through the simple use of model data are systematically affected by an error that is never lower in absolute value than 9.70%, and on average of the order of 20%. Besides, the errors are always negative, i.e., the SWH mt present a negative bias over the buoy values, so that using model results as design parameters would seriously put any coastal or offshore construction at risk. It also appears that, as remarked in Section 3.1, the model errors are consistent between the 50 and 100 years return times, since the SWH(T R ) curves portray a similar behaviour for the high values of T R , which are of interest in practical applications.
A further consideration is that, while the average extreme SWH in the three areas are not too far apart from each other, the model errors are relatively lower in the GoM-most likely reflecting a slightly better performance of the NOAA modelling system in the meteorological conditions of that area.
By applying the integrated procedure, the error ERS decreases to about 0.3-0.5%-a drastic and consistent improvement-even though there are some variations between the various test sites. Only one location (buoy 42002 in GoM, see Tables 5 and 6) shows a negative improvement, i.e., an error of the integrated procedure which is, in absolute value, slightly (4%) greater than the model error. Additionally, in this case however, the resulting error has a positive value, i.e., the SWH computed with the integrated procedure is higher than the buoy-based value-certainly a safer result.
An inherent negative bias in the model has thus been completely removed. It is worth mentioning that the presence of such a negative bias confirms what had been highlighted in previous work [1,3,25,38], i.e., that the weather/wave models underestimate the extremes despite the constant assimilation of satellite measurement, which due to their coarse temporal and spatial resolution are likely to miss the strongest peaks of the storms.
Considering the inherently stochastic nature of the problem, the results are more than satisfying. Applying the integrated methodology to any given location requires of course the handling of a considerable amount of data. Simply using model data only requires downloading a single series from one of the available weather/wave systems, such as NOAA [37] or ECMWF [39]; integrating it, if N b wave buoy are available, requires downloading N b extra model data series as well as N b buoy data series. On the other hand, the computational effort is not particularly heavy since the extreme value fitting procedures, briefly recalled in Section 2.1, are nowadays fully standardized.

Conclusions
A new procedure, based on integrating wave model data with those obtained with experimental wave data series, has been proposed and tested to evaluate SWH values for a given return time T R in a given location.
Extreme SWH values derived from model time series are integrated with the corresponding extreme value series from available buoys in the same geographical area.
While of course the calibration or the assimilation of measured wave data with weather/wave model results is nothing new, this is the first time that an integration is carried out to determine the parameters of the SWH extreme value distributions. Such parameters are themselves randomly distributed and they have been estimated by making use of the differences between the SWH obtained from the model and those computed from the in-situ data. In other words, the model data are used as indicators, and the buoy data are used for the correction of biases and the evaluation of uncertainties.
By making use of a jackknife procedure, 24 tests have been carried out on three areas along the coasts of the United States, where long records of good quality buoy data are available. It has been shown that the use of model data always causes a relevant undervaluation of the SWH values for a given return time T R and that a considerable improvement in accuracy is gained by making use of this integrated procedure in place of using model data.
In a location where there are no buoy wave meters, or where the data series are not long enough for a correct estimate, the integration of the buoy data in the area with model archive data provides a useful and general design tool to derive the extreme value SWH.
Compared with the simple application of a model on the location of interest, the integrated procedure requires a much heavier amount of data, since it implies accessing many time series rather than a single one; on the other hand, the computational burden is not excessive, since it only requires well known and widely accepted procedures for EVPD.
Extension of the method to any other area in the world is certainly possible, provided that a number of wave meter data series in the general area are accessible: while model data are available all over the world, how to identify and evaluate the relevant wave meter buoys is a matter of specific investigation, which should be carried out along the lines outlined here.
Funding: This research received no external funding.