Characterization of Extreme Wave Conditions for Wave Energy Converter Design and Project Risk Assessment

: Best practices and international standards for determining n -year return period extreme wave (sea states) conditions allow wave energy converter designers and project developers the option to apply simple univariate or more complex bivariate extreme value analysis methods. The present study compares extreme sea state estimates derived from univariate and bivariate methods and investigates the performance of spectral wave models for predicting extreme sea states at buoy locations within several regional wave climates along the US East and West Coasts. Two common third-generation spectral wave models are evaluated, a WAVEWATCH III ® model with a grid resolution of 4 arc-minutes (6–7 km), and a Simulating WAves Nearshore model, with a coastal resolution of 200–300 m. Both models are used to generate multi-year hindcasts, from which extreme sea state statistics used for wave conditions characterization can be derived and compared to those based on in-situ observations at National Data Buoy Center stations. Comparison of results using di ﬀ erent univariate and bivariate methods from the same data source indicates reasonable agreement on average. Discrepancies are predominantly random. Large discrepancies are common and increase with return period. There is a systematic underbias for extreme signiﬁcant wave heights derived from model hindcasts compared to those derived from buoy measurements. This underbias is dependent on model spatial resolution. However, simple linear corrections can e ﬀ ectively compensate for this bias. A similar approach is not possible for correcting model-derived environmental contours, but other methods, e.g., machine learning, should be explored. V.S.N., M.N.A., T.W., Z.Y. Investigation,


Introduction
Best practices for determining the site-specific environmental conditions, design load cases (DLC), and load responses for maritime structures and their subsystems (e.g., offshore oil platforms, offshore floating wind turbines, wave energy converters, and mooring systems) are found in a variety of international standards, e.g., [1][2][3]. Some of the most common load types experienced by maritime structures include hydrostatic, hydrodynamic (currents, waves), and aerodynamic (wind) loads. Design conditions include transport, operation under normal conditions occurring every year, and survival under extreme or abnormal environmental conditions for a range of return periods (e.g., 1, 5, 50, and are accepted for wave energy resource assessment and ocean engineering design when validated with buoy measurements [1,2]. Model hindcasts also provide a means to address the noted limitations of spatial coverage and measurement duration from buoy observations. Model performance studies have demonstrated that the third-generation spectral wave models WWIII and SWAN can accurately predict a host of wave energy resource statistics, and that their accuracy is significantly improved with grid refinement [19,20]. Further, these models can examine the spatial variation of wave characteristics at fine resolutions over larger regions encompassing different wave climates, e.g., [20,21]. These models, however, underpredict large waves and extreme n-year significant wave heights. The main source of this underbias is due to limitations of most wind reanalysis datasets, namely their inability to resolve fine-scale high-energy wind gusts, e.g., [22,23]. Model performance studies commonly emphasize predictions of common sea state statistics (bulk parameters) on average, e.g., H s and peak period, T p . Several studies, however, have evaluated model performance predicting extreme significant wave heights at return periods equal to or greater than 1-year, e.g., [7,19,24,25]; and some have examined the use of simple linear correction methods to model-derived estimates of 100-year significant wave heights, H s(100) , to reduce model underbias relative to observation-derived estimates [24,25].
Few studies have compared n-year return period significant wave heights, H s(n) , derived using different univariate extreme value analysis methods, e.g., [9], or compared H s(n) derived using univariate methods with the maximum n-year contour H s values derived using bivariate methods, e.g., [7]. Peak-over-threshold (POT) methods should be used with care and their results should be checked with other methods, e.g., the annual maxima (AM) method, as a threshold value selection can be subjective [1,9]. On the other hand, the AM method, the most common of the so-called block maxima methods, can be subject to large uncertainty due to the small sample population [9]. In comparing H s(100) values estimated by univariate analysis (POT method) with maximum values of 100-year environmental contours estimated using bivariate analysis, [7] found values agreed within one meter for twelve of fifteen sites; but no systematic bias was observed.
The present study investigates characterization of extreme wave conditions using different data sources, buoy measurements vs. wave model outputs, and different methods allowed by design standards, univariate methods (AM and POT) and the bivariate environmental contour method. The goal is to understand how estimates of extreme wave characteristics are affected by both factors, and to identify bias trends that may be adjusted. Compared to previous investigations, the present study is much broader in scope in terms of the range of return periods evaluated, the different data sources and methods used, and the number of wave sites and wave climates considered. We compare model skill predicting extreme significant wave heights with 1, 5 and 50-year recurrence intervals, H s (1) , H s (5) , and H s(50) , at approximately two dozen buoy locations spanning several regional wave climates along the US East and West Coasts. We also investigate the use of simple linear corrections to model-derived extreme wave heights to improve their agreement with extreme wave heights derived from buoy measurements. While this model performance evaluation does not consider spectral details that can identify sources of model deficiencies, e.g., [26], it does provide an initial assessment of model performance trends, including bias, and how performance may vary among different sites within a regional wave climate and between different wave climates.

Study Region and Buoy Observations
The study region, shown in Figure 1, encompasses potential wave energy project sites within the economic exclusion zones (EEZ) along the US East and US West Coasts [27]. Only National Data Buoy Center (NDBC) stations with hourly H s and T e time series spanning long deployment periods of at least 12.5 years were selected, to meet the ISO standard for estimating 50-year return period events using the POT method, which recommends periods of record (POR) at a quarter of the desired return period [15]. Missing data records in these time series reduced the POR to 12.21 years at Station 46050 and to 10.62 years at Station 46054 ( Figure 1, Table A1). Also, data in these time series were removed to match modelled hindcast time series generated at three-hour intervals. Depths at these stations vary from 30 m (44009) to 4048 m (41002) and are classified as intermediate and deep wave sites with normalized peak frequencies of f P > 0.05 g/h. resolution of 200 m for coastal areas extending 20 km offshore and a resolution of 1-3 km for the inner-shelf regions [21,29]. At thirty years or more, these model hindcasts exceed minimum requirements for estimating extreme wave heights with return periods up to fifty years. and time series outputs from these model hindcasts at three-hour intervals are used to estimate ( ) , ( ) , and ( ) and environmental contours ( , ) for comparison with those derived from measured (observation) ( , ) time series at NDBC stations at eight sites along the East Coast and thirteen sites along the West Coast. For the initial comparison with observation-derived ( ) estimates, the model-derived time series are limited to the observation POR at each station. These POR-limited model-derived estimates are also used to determine the linear correction, to adjust raw model-derived estimates to better match observation-derived values.

Univariate Extreme Value Analysis
The AM method is applied to estimate ( ) and ( ) from time series generated from models and buoy observations for comparison, at three-hour intervals from all data sources at corresponding intervals and spanning the same period of record (POR) to ensure consistency. This method, which fits the yearly maxima to a Gumbel distribution, is an accepted standard detailed in [1]. It is simple to implement and requires no user inputs that can introduce user bias like peak-over-threshold (POT) methods. It does, however, require a minimum POR of approximately 20 years [1]. While this requirement is not met for ten sites, time series for these sites strengthen the correlation between the model-and measurement-derived ( ) by doubling the population of samples used to derive the linear correction for model-derived ( ) estimates. As the same POR is

Modelled Hindcast Data Sources
The present study utilizes modelled hindcast data from several wave model studies to estimate extreme wave statistics: a WWIII 30-year model (v.5.08) hindcast study encompassing all US coastal waters with a spatial resolution of 4 minutes (6-7 km) [28], a high-resolution regional SWAN 32-year model hindcast study for the US West Coast with a spatial resolution of 200-300 m [20], and a high-resolution regional SWAN 32-year model hindcast study for the US East Coast with a spatial resolution of 200 m for coastal areas extending 20 km offshore and a resolution of 1-3 km for the inner-shelf regions [21,29]. At thirty years or more, these model hindcasts exceed minimum requirements for estimating extreme wave heights with return periods up to fifty years. H s and T e time series outputs from these model hindcasts at three-hour intervals are used to estimate H s(1) , H s (5) , and H s(50) and environmental contours (H s , T e ) n for comparison with those derived from measured (observation) (H s , T e ) time series at NDBC stations at eight sites along the East Coast and thirteen sites along the West Coast. For the initial comparison with observation-derived H s(n) estimates, the model-derived H s time series are limited to the observation POR at each station. These POR-limited model-derived estimates are also used to determine the linear correction, to adjust raw model-derived estimates to better match observation-derived values.

Univariate Extreme Value Analysis
The AM method is applied to estimate H s (5) and H s(50) from H s time series generated from models and buoy observations for comparison, at three-hour intervals from all data sources at corresponding intervals and spanning the same period of record (POR) to ensure consistency. This method, which fits the yearly maxima H s to a Gumbel distribution, is an accepted standard detailed in [1]. It is simple to implement and requires no user inputs that can introduce user bias like peak-over-threshold (POT) methods. It does, however, require a minimum POR of approximately 20 years [1]. While this requirement is not met for ten sites, H s time series for these sites strengthen the correlation between the model-and measurement-derived H s(n) by doubling the population of samples used to derive the linear correction for model-derived H s(n) estimates. As the same POR is used for model and observation data sources, the bias introduced by relaxing this requirement is consistent for both data sources. Following comparisons of extreme wave height estimates between buoy observations and modelled data sources that are limited to the buoy POR, estimates of extreme wave height using the entire 30 years of the modelled hindcast data sources are also provided herein to examine the dependence of POR length; especially for the buoy stations that did not meet the minimum POR requirements for the AM method.
For low return periods below 5 years, POT methods are recommended [4]. We apply a POT method to estimate H s(1) utilizing the generalized Pareto distribution model (GPD), which has been broadly applied for estimating extreme wave heights, e.g., [7,24,30,31]. For a Type I GPD model, the shape parameter is zero [30], and the cumulative distribution function is where x is the extreme significant wave height associated with an individual storm, µ is the threshold significant wave height used to filter the sample population of significant wave heights in the time series, and σ, the scaling parameter, is the mean value of the excess (x − µ).
As with other studies, the threshold is selected to be low enough to maintain a minimum population of samples to ensure a robust model fit that limits variance, while high enough to be characterized as a tail sample by the GPD model. Quantile-quantile plots are used to identify the threshold that provides the best-modelled distribution fit [30]. The Wald-Wolfowitz Runs (WWR) test is used to check that samples are independent [32]. The most notable disadvantage of POT method is its sensitivity to the threshold value selected. Despite best efforts to objectively select threshold values, it is difficult not to introduce user bias. Threshold values selected in the present study were 99-percentile H s values on average, with a standard deviation of several tenths of a meter. For comparison, [7] found that 98-percentile values provided the best regional model fits along the Canadian Pacific Coast when using a Type II GPD model. Vanem [9], in comparing estimates of H s(100) and H s (20) using block maxima and POT methods at a wave site in the North Atlantic, selected 99.5 and 99.95-percentile values for his study to examine the sensitivity of extreme significant wave height to the threshold value.

Linear Correction of Modelled Extreme Wave Heights
A simple linear correction method proposed by Stephens and Gorman [25] is applied to improve the agreement between H s(n) values derived from modelled and measured (observation) data sources at the observation-limited POR, where modelled H s(n) values are scaled by the average relative bias among the study sites, where N is the number of sites, and i is the site index. This correction scaling constant, s, should not be confused with the average mean bias, B, which considers bias relative to the measurement-derived value to evaluate the model performance.
To compare the goodness of fit between the model and measurement-derived H s(n) values, a linear regression best-fit line is plotted along with the line of equivalence, and r 2 , and slope (m) values are reported (Figures 2-4). The mean absolute relative bias (B) is calculated as a summary statistic to quantify this comparison, Here, the bias is relative to the measured data source (observation) rather than the modelled data source as calculated for the scaling factor using (2).

Bivariate Extreme Value Analysis (Environmental Contours)
Environmental contours for 1-, 5-and 50-year sea states within the (H s , T e ) parameter space are created using an inverse first-order reliability method (IFORM) with principal components analysis (PCA) and inverse Gaussian and normal distribution models as described in [14]. This method is similar to the traditional IFORM summarized in [1]. PCA is applied prior to the IFORM to generate an uncorrelated representation of (H s , T e ) in terms of principal components (C 1 , C 2 ) that result in better fitting contours. The probability density function of the principal component C 1 is parameterized with an inverse Gaussian (Wald) distribution model with parameters µ and λ determined by maximum likelihood estimation (MLE). Values of C 2 are binned based on their corresponding C 1 values. Probability density functions of C 2 conditioned on C 1 are parameterized with a normal (Gaussian) distribution model with estimates of the C 2 normal distribution parameters µ and σ as a function of C 1 for each bin modelled with the fitting functions with linear coefficients β 0 , β 1 , and quadratic coefficients γ 0 , γ 1 , γ 2 , determined by a least square regression procedure. The fitted inverse Gaussian distribution model for C 1 and fitted normal distributions models for C 2 , as a function of the mean value of C 1 for each bin are used to construct an environmental contour in principal component space for a given return period using the IFORM, which is then transformed back to the (H s , T e ) parameter space using the equations where v 1,1 , v 1,2 , v 2,1 , and v 2,2 are the PCA rotation coefficients and s is the required adjustment applied to the principal component, C 2 i , to ensure it is greater than zero in the principal components space.
When negative values of H s i occur at very low values, an artifact of the PCA not being constrained by limitations of the wave physics, they are set to zero. As these sea state occurrences delineate very low wave heights at the base of the contour line, they are not significant for characterizing extreme wave loads.

Comparison of Univariate Methods
Estimates of H s(n) derived using both AM and POT methods are compared at all study buoy stations, and for all three data sources, in scatter plots shown in Figure 2. Again, for this comparison, the periods of record of model-derived data sources were reduced to match those of the observation-derived data sources for each buoy station-not the entire 30-year POR. The perfect agreement line and slope and r 2 values of linear regression fits are shown to quantify how well estimates are correlated using AM and POT methods applied to the same data source.
The comparison of H s(1) estimates derived using the AM and POT methods, Figure 2a, is presented to illustrate the problem that arises when using the AM method for low return period events, and why it is not recommended practice [4]. The present study adopts the POT method for estimating H s(1) in subsequent analyses. H s(1) estimates derived using the AM method are significantly lower than those derived using the POT method, with all points falling below the line of equivalence. As indicated by the low values for regression slope, this underbias increases with the H s(1) magnitude. Low r 2 values indicate large scatter for all data sources, i.e., large and frequent discrepancies between estimates when using these two different methods. By comparison, Figure 2b,c shows that H s (5) and H s(50) estimates are in much better agreement. Points are nearly equally distributed about the line of equivalence, slopes are close to 1 and high r 2 values show significant correlations between results using these two different methods applied to the same data source. Agreement is relatively better for H s (5) estimates, but the good agreement for both H s (5) , and H s(50) support using the AM method for the present study due to its simpler implementation. Nevertheless, while discrepancies between estimates using the AM and POT method are random, they can vary significantly, similar to [9]. The magnitude and frequency of these large discrepancies increase with return period, but are less sensitive to the POR or data source. In some cases, H s(50) estimates are in perfect agreement, while, in other cases, H s(50) estimates differ by more than 2 m.
Low values indicate large scatter for all data sources, i.e., large and frequent discrepancies between estimates when using these two different methods. By comparison, Figure  2b,c shows that ( ) and ( ) estimates are in much better agreement. Points are nearly equally distributed about the line of equivalence, slopes are close to 1 and high values show significant correlations between results using these two different methods applied to the same data source. Agreement is relatively better for ( ) estimates, but the good agreement for both ( ) , and ( ) support using the AM method for the present study due to its simpler implementation. Nevertheless, while discrepancies between estimates using the AM and POT method are random, they can vary significantly, similar to [9]. The magnitude and frequency of these large discrepancies increase with return period, but are less sensitive to the POR or data source. In some cases, ( ) estimates are in perfect agreement, while, in other cases, ( ) estimates differ by more than 2 m. for these comparisons, the periods of record of model-derived data sources were reduced to match those of the observation-derived data sources for each buoy station given in Tables A1, A2 and A3.  . Note that, for these comparisons, the periods of record of model-derived data sources were reduced to match those of the observation-derived data sources for each buoy station given in Tables A1-A3. As shown in Tables A1-A3, raw model-derived H s(n) estimates using full 30-year POR are generally within 10% of values using the lower buoy station limited POR. However, discrepancies are particularly large, equal to or exceeding 10%, for several stations. These large discrepancies are not generally associated with low periods of record. Raw model-derived H s(1) estimates using the POT method, as expected, are least sensitive to the POR length. Discrepancies are all below 10% with the exception of SWAN model-derived H s(1) estimates at Stations 46026 and 46054. As the return period increases and the AM method is employed, the number of large discrepancies increases. More large discrepancies are observed for SWAN model-derived estimates than WWIII.

Linear Correction of Modelled Extreme Wave Heights
Corrections applied to the model-derived H s(n) , with the scaling factors calculated using Equation (2), significantly improve the agreement between model and observation-derived values. The average relative absolute bias for WWIII is reduced to 3%-4%, and that for SWAN to 6%-7%. Comparing scatter plots with raw model-derived estimates, Figure 3a-c, to corrected model estimates, Figure 3d  (AM method). Note that raw estimates (a,b,c) are adjusted by linear correction with scaling factor, s, using (1) to generate corrected estimates (d,e,f). Note that, for these comparisons, the periods of record of model-derived data sources were reduced to match those of the observation-derived data sources for each buoy station given in Tables A1, A2 and A3.

Environmental Contours
Environmental contours delineating the 1-year, 5-year and 50-year extreme sea states are shown for five selected buoy stations along the East Coast, Figure 4, and five selected stations along the West Coast, Figure 5. These stations were selected to highlight the effect of latitude on extreme sea states for these two US wave climate regions. Each plot compares contours generated using three different data sources: buoy observations, WWIII hindcast data, and SWAN hindcast data. For all stations, these comparisons show that the model-derived contours capture expected trends that affect the contour size as measured by the range of ( ) and ( ) . As expected, the maximum values of ( ) increase with return period and with latitude along both coasts, as more energetic wave climates are found in northern latitudes [21,29]. There is no similar north-south trend in maximum values of ( ) .
WWIII model-derived contours are generally smaller than those based on buoy observations, but the WWIII model-based 5-year and 50-year contours at East Coast Station 44025 are nearly identical to those derived from observations. For the West Coast, WWIII and SWAN model-derived contours are similar, but both underpredict the ( ) and ( ) range compared to the contour based on observations. . Note that raw estimates (a-c) are adjusted by linear correction with scaling factor, s, using (1) to generate corrected estimates (d-f). Note that, for these comparisons, the periods of record of model-derived data sources were reduced to match those of the observation-derived data sources for each buoy station given in Tables A1-A3.

Environmental Contours
Environmental contours delineating the 1-year, 5-year and 50-year extreme sea states are shown for five selected buoy stations along the East Coast, Figure 4, and five selected stations along the West Coast, Figure 5. These stations were selected to highlight the effect of latitude on extreme sea states for these two US wave climate regions. Each plot compares contours generated using three different data sources: buoy observations, WWIII hindcast data, and SWAN hindcast data. For all stations, these comparisons show that the model-derived contours capture expected trends that affect the contour size as measured by the range of H s(n) and T e(n) . As expected, the maximum values of H s(n) increase with return period and with latitude along both coasts, as more energetic wave climates are found in northern latitudes [21,29]. There is no similar north-south trend in maximum values of T e(n) .
WWIII model-derived contours are generally smaller than those based on buoy observations, but the WWIII model-based 5-year and 50-year contours at East Coast Station 44025 are nearly identical to those derived from observations. For the West Coast, WWIII and SWAN model-derived contours are similar, but both underpredict the H s(n) and T e(n) range compared to the contour based on observations.
For the East Coast, SWAN model-derived contours are generally in good agreement. Similar to [7], discrepancies are small over a wide range of periods and wave heights for East Coast Station 44005 for 1-year and 5-year contours, and 41004 and 41010 for all n-year contours. At the East Coast stations, SWAN model-derived contours exhibit a longer T e(n) range than WWIII contours, and are in good agreement with buoy contours, e.g., Stations 44005, 41004, and 41010 for 1-year and 5-year contours. However, at a few stations, the SWAN contours overestimate the T e(n) range, e.g., 50-year contours at Stations 44005 and 44025. For the East Coast, SWAN model-derived contours are generally in good agreement. Similar to [7], discrepancies are small over a wide range of periods and wave heights for East Coast Station 44005 for 1-year and 5-year contours, and 41004 and 41010 for all n-year contours. At the East Coast stations, SWAN model-derived contours exhibit a longer ( ) range than WWIII contours, and are in good agreement with buoy contours, e.g., Stations 44005, 41004, and 41010 for 1-year and 5-year contours. However, at a few stations, the SWAN contours overestimate the ( ) range, e.g., 50-year contours at Stations 44005 and 44025.   Maximum values of ( ) on each contour are compared with values derived using POT and AM methods in Figure 6 for the three different data sources evaluated. Again, results using the AM method are shown to highlight differences at this low return period and why this method should not be used for low return period estimates. As shown in Figure 6a,b,c, the maximum values of ( ) on each contour agree well with ( ) values estimated using the POT method, similar to [7]. Values estimated using the AM method agree less well, likely because the univariate POT methods are more consistent with those used in environmental contour methods; this disagreement is more pronounced for 50-year estimates compared to 5-year, and shows sensitivity to data source at this higher return period. Maximum values on 5-year and 50-year contours exhibit a systematic underbias compared to estimates of ( ) and ( ) based on univariate methods, when using observation and SWAN model hindcast datasources. Maximum values of H s(n) on each contour are compared with values derived using POT and AM methods in Figure 6 for the three different data sources evaluated. Again, results using the AM method are shown to highlight differences at this low return period and why this method should not be used for low return period estimates. As shown in Figure 6a-c, the maximum values of H s(n) on each contour agree well with H s(n) values estimated using the POT method, similar to [7]. Values estimated using the AM method agree less well, likely because the univariate POT methods are more consistent with those used in environmental contour methods; this disagreement is more pronounced for 50-year estimates compared to 5-year, and shows sensitivity to data source at this higher return period. Maximum H s values on 5-year and 50-year contours exhibit a systematic underbias compared to estimates of H s (5) and H s(50) based on univariate methods, when using observation and SWAN model hindcast datasources.

Discussion
The present study demonstrates that extreme significant wave height estimates using the AM method and POT method can vary significantly, but estimates using the simpler AM method at higher return periods starting at n = 5-years are generally in good agreement with the POT method, even when the minimum 20-year POR requirement is not met [1]. The AM method should, therefore, be considered for estimating higher return period events to avoid the inherent user bias problems of the POT method. The POT method should always be used for low return period events, e.g., n = 1-year [4]. For estimating n-year significant wave heights between one and five years, the results of both methods should be compared with a preference for the POT method and good engineering judgment and understanding of the risks of overdesign and underdesign.
Raw model-derived estimates of extreme significant wave height with return periods equal to or greater than 1 year exhibit a systematic underbias compared to observation-derived estimates. This finding, along with similar observations made in previous studies for ( ) [24,25], indicates that raw model-derived estimates should not be used to characterize extreme wave or sea state conditions with return periods equal to or greater than 1 year for WEC design or project risk assessment. While raw model-derived estimates are still valuable for the relative comparison of project risks between sites, or the spatial distribution of extreme wave conditions, they should be corrected when used to build DLCs for WEC design.
Applications of grid refinement and improved model physics and atmospheric forcing data can significantly reduce model bias in predicting extreme wave heights, including those with high return periods up to 50 years. However, simple linear correction methods applied to model-derived estimates, ( ) , as demonstrated herein and in other studies [24,25], may offer a more economical approach to offset their observed underbias relative to observation-derived estimates. Without correction, design load cases for extreme sea states based on ( ) , ( ) , and ( ) could be

Discussion
The present study demonstrates that extreme significant wave height estimates using the AM method and POT method can vary significantly, but estimates using the simpler AM method at higher return periods starting at n = 5-years are generally in good agreement with the POT method, even when the minimum 20-year POR requirement is not met [1]. The AM method should, therefore, be considered for estimating higher return period events to avoid the inherent user bias problems of the POT method. The POT method should always be used for low return period events, e.g., n = 1-year [4]. For estimating n-year significant wave heights between one and five years, the results of both methods should be compared with a preference for the POT method and good engineering judgment and understanding of the risks of overdesign and underdesign.
Raw model-derived estimates of extreme significant wave height with return periods equal to or greater than 1 year exhibit a systematic underbias compared to observation-derived estimates. This finding, along with similar observations made in previous studies for H s(100) [24,25], indicates that raw model-derived estimates should not be used to characterize extreme wave or sea state conditions with return periods equal to or greater than 1 year for WEC design or project risk assessment. While raw model-derived estimates are still valuable for the relative comparison of project risks between sites, or the spatial distribution of extreme wave conditions, they should be corrected when used to build DLCs for WEC design.
Applications of grid refinement and improved model physics and atmospheric forcing data can significantly reduce model bias in predicting extreme wave heights, including those with high return periods up to 50 years. However, simple linear correction methods applied to model-derived estimates, H s(n) , as demonstrated herein and in other studies [24,25], may offer a more economical approach to offset their observed underbias relative to observation-derived estimates. Without correction, design load cases for extreme sea states based on H s(1) , H s (5) , and H s(50) could be significantly underestimated [3], and likewise for wave energy project risks based on H s(50) and H s(50) /H s(mean) [6].
Model-derived environmental contours predict similar trends to those derived by buoy observations, but the frequency and magnitude of discrepancies between model-and observation-derived contours is mixed, similar to [7]. In some cases, model-derived contours are in excellent agreement with buoy-derived contours, but generally they underestimate the H s and T e range, similar to model-derived estimates using univariate methods. In contrast to [7], results herein indicate a tendency for maximum H s values on contours to exhibit a systematic underbias compared to estimates based on univariate methods. Unlike univariate methods, there is no discernible improvement in the agreement of model-derived contours with observation-derived contours by increasing model spatial resolution. Further, simple linear correction methods cannot be applied to improve this agreement.
To summarize the main findings of the present study and their implications for design and project risk assessment, consider extreme wave height characterization based on H s(50) for NDBC Stations 46029 amd 46050. The various estimates of H s(50) are given in Table A3. Buoy observations at Station 46029 are limited to only fourteen years at this site, below the 20-year minimum requirement for estimating a 50-year return period event using the AM method [1], but above the 12.5-year minimum requirement for estimating this event using the POT method [15]. The observation-derived estimate of H s(50) = 14.3 m using the POT method described herein, therefore, follows the best-practice standard. Using the AM method, which does not follow best practice, H s(50) = 14.2 m. Finally, the maximum value of H s on the 50-year contour generated by bivariate analysis is 15.3 m. In this case, one would typically select H s(50) = 15.3 m to characterize the extreme wave condition for WEC design and project risk assessment. It follows best practice and it is a significantly larger and more conservative value than values given using the univariate methods.
For comparison, at Station 46050, with a POR of twelve years, H s(50) = 14.9 m using the POT method and H s(50) = 15.1 m using the AM method. The maximum value of H s on the 50-year contour is 14.5 m. In this case, the minimum requirement for the POT method is closely met, and H s(50) = 14.9 m would be justified; however, one could also make a case for choosing the more conservative value H s(50) = 15.1 m based on the AM method, even though the buoy POR does not satisfy the 20-year minimum requirement. Corrected model-derived estimates base on the full 30-year records, which do satisfy standard requirements for using the AM method, could also be evaluated. For Station 46029, H s(50) = 13.8 m, using model-corrected WWIII hindcast data, and H s(50) = 14.7 m, using model-corrected SWAN hindcast data. With these results, the largest and most conservative SWAN model-derived estimate H s(50) = 14.7 m would be justified if the reduced risk of failure is worth the added cost. For Station 46050, H s(50) = 14.2 m using the AM method based on the 30-year model-corrected SWAN hindcast data. As this is less than the observation-derived H s(50) = 14.9 m using the POT method, one would typically choose the larger, more conservative value; however, both values satisfy standard practice requirements, and one could alternatively choose the lower value if the added structural design cost is deemed not worth the reduced risk of failure.

Conclusions
The results presented herein demonstrate several challenges in estimating n-year return period extreme significant wave height, H s(n) , or sea state, (H s , T e ) n , to characterize extreme wave conditions for WEC design and project risk assessment. Periods of record for observation-derived estimates typically do not meet the minimum requirement to estimate 50-year return period events, i.e., twelve and a half years for the POT method [15], or twenty years for the AM method [1]. When different methods, e.g., POT and AM, can be used, discrepancies between them are random and can exceed 20%. While validated model hindcasts extend the POR and spatial coverage of data sources, model-derived estimates are systematically underbiased compared to those derived from buoy observations. Within the scope of the present study, which limits its investigation to coastal waters along the US East and West Coasts, buoy observations that are less than approximately two decades, model hindcast data limited to three decades, and the assumption of stationary wave climate trends inherent in most extreme value analyses, results support the use of simple linear corrections to model-derived estimates, H s(n) , as a practical way to extend the POR and spatial coverage of extreme wave statistics, while correcting for underbiased discrepancies compared to observation-derived estimates.
Additional efforts are needed to more rigorously verify these correction methods using a split-sample method, with a population of sea state statistics derived from one set of observations to train the correction scaling, and a population of sea state statistics derived from another set of observations to validate it. Although model underbias is not observed to be affected by differences in regional wave climates, regional effects on scaling factors should not be ruled out and, therefore, should be investigated further. While these simple linear correction methods cannot improve agreement between model-and observation-derived environmental contours, other correction or training techniques, e.g., machine learning, should be investigated to realize similar benefits for bivariate extreme sea state analysis.
Characterization of extreme wave conditions for WEC design and project risk assessment is improved by comparing estimates using different methods and data sources. In this comparison, standard best practices should be followed to the extent possible, but engineering judgment that balances risk of failure and cost must be exercised.
Although the effect of nonstationary wave climate trends on extreme wave conditions is not the focus of the present investigation, it may have significant implications for WEC design and project risk assessment. Simple linear adjustments to extreme wave height estimates, H s(n) , based on reported estimates of their projected increases could be applied, but further research is needed to reach consensus on the magnitude and regional distribution of nonstationary trends in extreme wave height. Funding: Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This study was also partially funded by the U.S. Department of Energy, Office of Energy Efficiency & Renewable Energy, Water Power Technologies Office under Contract DE-AC05-76RL01830 to Pacific Northwest National Laboratory. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Acknowledgments:
The authors thank the reviewers for their thoughtful comments and suggestions, which greatly improved this manuscript; and Arun Chawla and NOAA for providing the WWIII hindcast data used to estimate extreme wave heights and environmental contours.

Conflicts of Interest:
The authors declare no conflict of Interest.