Statistical Analysis of Wave Climate Data Using Mixed Distributions and Extreme Wave Prediction

Wei Li *, Jan Isberg, Rafael Waters, Jens Engström, Olle Svensson and Mats Leijon Division of Electricity, Department of Engineering Sciences, Swedish Centre for Electricity Energy Conversion, Uppsala University, Box 534, Uppsala SE-751 21, Sweden; Jan.Isberg@angstrom.uu.se (J.I.); rafael.waters@angstrom.uu.se (R.W.); jens.engstrom@angstrom.uu.se (J.E.); olle.svensson@angstrom.uu.se (O.S.); mats.leijon@angstrom.uu.se (M.L.) * Correspondence: Wei.Li@angstrom.uu.se; Tel./Fax: +46-184-715-849


Introduction
As a renewable energy source, ocean wave energy is considered to have the potential to supply large amounts of clean energy to meet the world's rapidly increasing energy demand. The development of wave energy conversion technologies could also reduce the consumption of fossil fuels in the long term and provide a reliable route to the sustainable development of human society. Therefore, ocean wave energy test sites have been established in many countries to study the most feasible technologies for harvesting wave energy and to test full-scale wave energy converters experimentally [1]. Because wave conditions differ from site to site, knowledge of the local wave climate at a specific test site is essential to the successful development of wave energy conversion technology [2]. Investigating various aspects of the wave climate at a wave energy test site provides significant information for the design, construction and performance optimization of wave energy conversion systems [3,4].
The Lysekil wave energy test site (58°11.700′N, 11°22.450′E) is located about 2 km off the west coast of Sweden near the city of Lysekil (Figure 1) and covers an area of 40,000 m². It was established in 2004 by the Swedish Centre for Renewable Electric Energy Conversion and Uppsala University to develop ocean wave energy conversion technologies and to test the wave energy converters developed by Uppsala University (Uppsala, Sweden) [5,6]. The Uppsala University wave energy converter is based on a direct-driven linear generator moored by a gravity-based foundation standing on the seafloor. The linear generator is connected by a rope to a point-absorber buoy floating on the surface, which captures energy from the motion of the waves [7]. By 2015, more than twelve full-scale linear wave energy converters had been deployed and tested at the Lysekil site. In the near future, a wave farm is planned near the test site to supply electricity to the local community's electric grid. Detailed wave climate information for the site is therefore critical to the further development and optimization of the wave energy conversion technology. For wave climate analysis, different probabilistic methods have been proposed to describe the long-term wave distribution [8][9][10][11][12][13][14][15]. The lognormal, Rayleigh and Weibull distributions are the most commonly used models for long-term wave distribution modelling [16,17]. Wave distribution modelling is crucial for estimating the wave energy resource at wave energy test sites and for evaluating the energy production of wave energy converters [18,19]. In extreme wave analysis, the Peak over Threshold (POT) method is recommended by the Maritime Hydraulics group of the International Association for Hydraulic Research [20].
The POT method fits a Generalized Pareto Distribution to the wave data exceeding a chosen threshold, and it has been widely used to predict extreme wave heights [21,22]. Long-term extreme wave prediction is of significant importance for the survivability assessment of wave energy converters. A previous study of the wave climate on the west coast of Sweden by Waters et al. [23] was based on wave data generated by the SWAN and WAM wave models. In this paper, wave climate studies based on 9 years of observations with a wave measurement buoy at the Lysekil wave energy test site are presented. A detailed statistical analysis of the measured wave data is carried out to reveal the characteristics of the wave climate at the test site. The long-term extreme waves are predicted by applying the POT method to the measured wave data. Furthermore, a new mixed-distribution model is proposed to describe the long-term behavior of the wave height more accurately. Assessed with the R-squared goodness-of-fit measure, the mixed-distribution model fits the measured wave data at the test site very closely. The mixed-distribution model is also fitted to observed wave data from different locations to show its general applicability.

Statistical Analysis of Measured Wave Data
The wave data used in this study were measured by a Datawell Waverider wave measurement buoy deployed by Uppsala University at the Lysekil test site in 2005, at a water depth of 25 m. The data consist of time series of surface elevation measured over 30 min intervals and sampled at 2.56 Hz. A set of continuous observations over the 9 years from 2005 to 2013 is analyzed for the wave climate investigation. The statistical analysis of the recorded wave data in Figure 2 shows that the wave climate at the test site is relatively calm, with the most frequent significant wave height $H_S$ in the range 0.2-0.5 m, which is common for the wave climate on the Swedish west coast. The significant wave height $H_S$ is the average height of the highest one-third of the waves, measured from trough to crest. Significant wave heights between 4 and 5 m rarely occur, and during the 9 years of observation no significant wave height exceeding 5 m was recorded. The monthly mean wave height $H$ and the monthly mean significant wave height at the test site from 2005 to 2013 are shown in Figure 3. The winter season from November to February clearly has higher waves than the summer season from April to August. The yearly mean wave height and the yearly mean significant wave height are 0.47 m and 0.75 m, respectively, as indicated in Figure 3.
For the investigation of extreme wave conditions, the monthly maximum wave height and the monthly maximum significant wave height are shown in Figure 4. The monthly maximum wave height at the test site is in the range 6-9 m and is highest in the winter months from December to February. The average wave power density is 2.39 kW/m, with an average peak wave period $T_p$ of 4.7 s.
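To make the definition of $H_S$ concrete, the sketch below computes $H_S$ (the mean of the highest third of the individual wave heights) and $H_{max}$ for one 30 min record sampled at 2.56 Hz. It is a minimal illustration on a synthetic regular wave, not the processing chain used for the Lysekil data: the use of a zero up-crossing analysis to separate individual waves and the function names are assumptions.

```python
import numpy as np

FS_HZ = 2.56          # Waverider sampling rate
RECORD_S = 30 * 60    # one 30 min record, in seconds

def wave_heights(eta):
    """Trough-to-crest heights of the waves between successive zero up-crossings."""
    eta = np.asarray(eta, dtype=float)
    eta = eta - eta.mean()                                  # remove mean water level
    up = np.flatnonzero((eta[:-1] < 0) & (eta[1:] >= 0))    # sample before each up-crossing
    return np.array([eta[a + 1:b + 1].max() - eta[a + 1:b + 1].min()
                     for a, b in zip(up[:-1], up[1:])])

def record_statistics(eta):
    """(H_S, H_max) for one record; H_S is the mean of the highest third of the waves."""
    h = np.sort(wave_heights(eta))[::-1]                    # heights in descending order
    n_third = max(1, len(h) // 3)
    return h[:n_third].mean(), h[0]

# Synthetic regular 5 s wave with 0.5 m amplitude (not Lysekil data)
t = np.arange(0.0, RECORD_S, 1.0 / FS_HZ)
eta = 0.5 * np.sin(2.0 * np.pi * t / 5.0 + 0.3)
hs, hmax = record_statistics(eta)    # both close to 1.0 m for a regular wave
```

For a regular wave every individual height is the same, so $H_S$ and $H_{max}$ coincide; with irregular sea-state records the same functions give $H_{max} > H_S$.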
For the design of a wave energy converter for a given wave climate, the statistical analysis indicates the range of frequently occurring wave heights, which relates to the wave energy the converter harvests. The variation of the wave climate throughout the year provides valuable information for optimizing the performance of wave energy converters, and the estimation of the maximum wave heights in different months is essential for their survivability.

Extreme Wave Estimation
From the offshore engineering perspective, estimating the extreme wave conditions at the wave energy converter test site is very important for the construction of the wave energy converter and the safe design of its mooring system. Extreme waves can cause unexpected damage to deployed wave energy converters, as we have experienced many times in our sea experiments at the Lysekil test site. A survivability assessment based on extreme wave estimation is therefore crucial to long-term, stable energy production by wave energy converters. In offshore engineering, 25-year, 50-year and 100-year return sea waves are commonly estimated for structural design. Considering the fairly calm wave climate at the Lysekil test site and the 25-year design lifetime of the developed wave energy converter, a 25-year return sea wave was used to estimate the extreme waves at the Lysekil wave energy converter test site.
The POT method, which is widely used in modelling extreme waves [24,25], is applied here to extrapolate the return sea values. In the POT approach, the maximum significant wave height $H_S$ in each of a large number of storms is extracted from the measured wave data. The distribution of the selected $H_S$ values exceeding a chosen threshold can be described by the Generalized Pareto Distribution (GPD). Let $x$ be the selected significant wave height $H_S$; the probability density function of the GPD is then given by:
$$f(x) = \begin{cases} \dfrac{1}{\sigma}\left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1-1/k}, & k \neq 0 \\ \dfrac{1}{\sigma}\exp\left(-\dfrac{x-\mu}{\sigma}\right), & k = 0 \end{cases} \tag{1}$$
The corresponding cumulative distribution function of the GPD is:
$$F(x) = \begin{cases} 1 - \left(1 + k\,\dfrac{x-\mu}{\sigma}\right)^{-1/k}, & k \neq 0 \\ 1 - \exp\left(-\dfrac{x-\mu}{\sigma}\right), & k = 0 \end{cases} \tag{2}$$
defined for $x \geq \mu$ with $1 + k(x-\mu)/\sigma > 0$, where $k$ and $\sigma$ are the shape and scale parameters, respectively, and $\mu$ is the threshold parameter in Equations (1) and (2) [26][27][28][29]. The parameters of the GPD are estimated here by the Maximum Likelihood Estimation (MLE) method. The return sea value $x_N$ is defined by:
$$F(x_N) = 1 - \frac{1}{N} \tag{3}$$
where $N$ is the return period and $F(x)$ is the cumulative distribution function of the GPD [16]. The POT method is applied to the measured wave data at the Lysekil test site for the extreme wave estimation. Given the length of the measured wave time series, adjacent maxima separated by less than 48 h are treated as belonging to the same storm. The selected storm peaks are then used as input to the POT method with the selected threshold. A suitable threshold for a given wave data time series is found numerically by searching for a stable and approximately linear relation between the chosen threshold and the parameters of the corresponding GPD. The quantile-quantile (Q-Q) plot of the significant wave height above a threshold of 1.5 m versus the GPD is shown in Figure 5. The GPD fits the data above the chosen threshold very well, except for a slight deviation close to the tail of the distribution.
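The storm selection and threshold search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the 48 h rule is read as merging each exceedance that follows the previous one by less than 48 h into the same storm, and `scipy.stats.genpareto` (whose shape parameter has the same sign convention as $k$ in Equation (1)) stands in for whatever MLE implementation was used. One way to make "stable and approximately linear" concrete is that, above a suitable threshold $\mu$, the fitted $k$ stays roughly constant while $\sigma$ grows linearly with $\mu$, so the modified scale $\sigma - k\mu$ is also roughly constant.

```python
import numpy as np
from scipy.stats import genpareto

def decluster_peaks(times_h, hs, threshold, window_h=48.0):
    """Return one peak H_S per storm from the exceedances of `threshold`.

    Exceedances that follow the previous exceedance by less than `window_h`
    hours are treated as the same storm (assumed reading of the 48 h rule).
    """
    times_h = np.asarray(times_h, dtype=float)
    hs = np.asarray(hs, dtype=float)
    peaks, storm = [], []
    for i in np.flatnonzero(hs > threshold):
        if storm and times_h[i] - times_h[storm[-1]] >= window_h:
            peaks.append(hs[storm].max())    # close the previous storm
            storm = []
        storm.append(i)
    if storm:
        peaks.append(hs[storm].max())
    return np.array(peaks)

def threshold_scan(peaks, thresholds, min_excesses=30):
    """Fit a GPD by MLE above each candidate threshold (given in ascending order).

    Returns rows (mu, number of peaks above mu, k, sigma, sigma - k*mu).
    """
    peaks = np.asarray(peaks, dtype=float)
    rows = []
    for mu in thresholds:
        above = peaks[peaks > mu]
        if above.size < min_excesses:         # too few excesses: stop the scan
            break
        k, _, sigma = genpareto.fit(above, floc=mu)   # returns shape, loc, scale
        rows.append((mu, above.size, k, sigma, sigma - k * mu))
    return rows

# Toy series of 30 min records (times in hours): two storms 99 h apart
times = np.array([0.0, 0.5, 1.0, 100.0, 100.5])
hs_series = np.array([1.6, 2.4, 1.8, 1.7, 3.1])
storm_peaks = decluster_peaks(times, hs_series, threshold=1.5)   # [2.4, 3.1]

# Synthetic storm peaks drawn from a GPD with k = 0.1, sigma = 0.4 above 1.0 m
synthetic = 1.0 + genpareto.rvs(0.1, scale=0.4, size=2000, random_state=0)
rows = threshold_scan(synthetic, np.linspace(1.0, 2.0, 6))
```

The number of excesses in `rows` shrinks as the threshold rises, which is the bias/variance trade-off in threshold choice: a low threshold admits non-extreme data, a high one leaves few points for the fit.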
The goodness of fit in Figure 5 is nevertheless sufficient to give a reliable prediction of the 25-year return value. The return values for different periods are shown in Figure 6. The return values increase with the return period, and the differences between them are smaller at long return periods than at short ones. To obtain more reliable predicted return values for wave energy converter design, long return periods should be considered. To investigate the maximum wave height at the test site over the design lifetime of the wave energy converter, the POT method is also applied to the maximum wave heights selected from the measured wave data, with a chosen threshold of 3.5 m. As shown in Figure 7, the maximum wave height behaves similarly to the significant wave height in the Q-Q plot versus the GPD. A much higher threshold is reasonable for the maximum wave height data than for the significant wave height data in Figure 5, since the maximum wave heights $H_{max}$ are much higher than the significant wave heights $H_S$. A threshold that is too low is likely to introduce a large bias into the model; conversely, a threshold that is too high leaves few excesses, and the estimates from the model become unreliable. Figure 8 gives the return values of the maximum wave height for different periods. The predicted return values of the significant wave height and the maximum wave height at the Lysekil test site are compared in Table 1. The predicted significant wave height and maximum wave height at the Lysekil test site differ very little between the 25-year and 50-year return periods. This result is related to the generally calm wave climate at the Lysekil test site on the west coast of Sweden.
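Once a threshold is chosen, fitting Equation (2) by MLE and solving Equation (3) for return values at several periods can be sketched as below, again on synthetic storm peaks rather than the Lysekil records. The code assumes that $N$ in Equation (3) is counted in storm events, so a return period in years is converted using a mean storm rate; the value of 20 storms per year and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import genpareto

def fit_gpd(peaks, threshold):
    """MLE of the GPD shape k and scale sigma for storm peaks above mu = threshold."""
    peaks = np.asarray(peaks, dtype=float)
    k, _, sigma = genpareto.fit(peaks[peaks > threshold], floc=threshold)
    return k, sigma

def return_value(n_events, k, sigma, threshold):
    """Solve F(x_N) = 1 - 1/N for x_N, with F the fitted GPD CDF of Equation (2)."""
    return genpareto.ppf(1.0 - 1.0 / n_events, k, loc=threshold, scale=sigma)

# Synthetic storm peaks above a 1.5 m threshold (not the Lysekil data)
peaks = 1.5 + genpareto.rvs(-0.1, scale=0.5, size=400, random_state=1)
k, sigma = fit_gpd(peaks, 1.5)

STORMS_PER_YEAR = 20   # hypothetical mean number of storms per year
levels = {years: return_value(years * STORMS_PER_YEAR, k, sigma, 1.5)
          for years in (10, 25, 50, 100)}
```

With a negative fitted shape the GPD has a finite upper bound $\mu - \sigma/k$, so the return values flatten out at long periods.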
Since the 25-year and 50-year return values differ so little, it is sufficient to apply 25-year return values to the design and survivability assessment of wave energy converters at the Lysekil test site.
The relation between the significant wave height and the maximum wave height in every 30 min interval of the measured wave data is shown in Figure 9. The maximum wave height is given approximately by $H_{max} \approx 1.77H_S$, a least-squares fitted line shown as the blue solid line in Figure 9. In offshore engineering, $H_{max} \approx 2H_S$ [25] is commonly used to relate the maximum wave height to the significant wave height in recorded wave data; it is an approximation that assumes Rayleigh-distributed wave heights. In practice, however, the statistics of real waves differ slightly from the Rayleigh distribution, which usually results in lower extremes, as the fitted expression for the Lysekil site illustrates.
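The least-squares line of Figure 9 and a Rayleigh reference can be computed with a few lines of NumPy. This sketch assumes the fitted line passes through the origin (as the form $H_{max} \approx 1.77H_S$ suggests) and uses the standard large-sample approximation for the most probable maximum of $n$ Rayleigh-distributed heights, $H_{max}/H_S \approx \sqrt{\ln n / 2}$; the function names and sample values are hypothetical.

```python
import numpy as np

def ratio_through_origin(hs, hmax):
    """Least-squares slope a of H_max = a * H_S for a line through the origin."""
    hs = np.asarray(hs, dtype=float)
    hmax = np.asarray(hmax, dtype=float)
    return float(hs @ hmax / (hs @ hs))   # minimises sum((hmax - a*hs)**2)

def rayleigh_most_probable_ratio(n_waves):
    """Large-n approximation of the most probable H_max / H_S for n Rayleigh heights."""
    return float(np.sqrt(np.log(n_waves) / 2.0))

# Hypothetical 30 min record pairs lying exactly on H_max = 1.77 H_S
hs = np.array([0.5, 1.0, 2.0, 3.0])
a = ratio_through_origin(hs, 1.77 * hs)        # 1.77 (up to rounding)
r = rayleigh_most_probable_ratio(1000)          # about 1.86
```

The Rayleigh ratio depends only weakly on the number of waves in a record, which is why a single constant such as 2 is used as a rule of thumb.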

Mixed-Distribution Model
For wave climate modelling, fitting probability distributions to measured wave climate data is a common and simple way to describe the characteristics of the wave climate. A probability distribution that describes the wave climate at a given wave energy test site accurately gives a more effective estimate of the wave energy resource and a better prediction of the energy production of a given wave energy converter. The lognormal, Rayleigh and Weibull distributions are the most commonly used distributions for wave height modelling. The probability density function of the lognormal distribution is given in Equation (4):
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right], \quad x > 0 \tag{4}$$
where the two parameters $\mu$ and $\sigma$ are, respectively, the mean and standard deviation of the variable's natural logarithm. The Rayleigh distribution is described by Equation (5):
$$f(x) = \frac{x}{b^2}\exp\left(-\frac{x^2}{2b^2}\right), \quad x \geq 0 \tag{5}$$
where $b$ is the scale parameter of the distribution. The Weibull distribution is expressed in Equation (6):
$$f(x) = \begin{cases} \dfrac{\varepsilon}{\delta}\left(\dfrac{x}{\delta}\right)^{\varepsilon-1}\exp\left[-\left(\dfrac{x}{\delta}\right)^{\varepsilon}\right], & x > 0 \\ 0, & x \leq 0 \end{cases} \tag{6}$$
where $\delta$ is the scale parameter and $\varepsilon$ is the shape parameter; the Weibull density is nonzero only for positive values of $x$. In practice, however, these distributions do not always fit the measured wave data as well as desired. Here we propose a methodology that uses mixed probability distributions to fit the recorded wave data. Let $P_1(x), P_2(x), \ldots, P_n(x)$ be a finite set of probability density functions and $w_1, w_2, \ldots, w_n$ weights such that $w_i \geq 0$ and $\sum_i w_i = 1$. The mixed probability density $F(x)$ is defined as the weighted sum of this finite set of probability density functions:
$$F(x) = \sum_{i=1}^{n} w_i P_i(x) \tag{7}$$
For simplicity, the lognormal, Rayleigh and Weibull distributions are chosen as the basic probability functions, forming three different mixed distributions with n = 2 basic probability functions in each mix. The weight of each basic distribution and its parameters in the mixed distribution are estimated numerically by the MLE method.
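A minimal sketch of fitting one of the n = 2 mixtures of Equation (7), the lognormal-Weibull mix, by maximum likelihood is given below. The use of `scipy.optimize.minimize` with Nelder-Mead, the logistic and logarithmic reparameterization (which enforces $w_1 + w_2 = 1$ and positive $\sigma$, $\delta$, $\varepsilon$), the starting values and the function names are all assumptions for illustration, and the sample is synthetic. Other pairs, for example with `scipy.stats.rayleigh`, can be substituted in the same way.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm, weibull_min

def mixed_pdf(x, w, mu, sigma, delta, eps):
    """Two-component mixture F(x) = w*P1(x) + (1 - w)*P2(x):
    P1 lognormal (mu, sigma of ln x), P2 Weibull (scale delta, shape eps)."""
    return (w * lognorm.pdf(x, s=sigma, scale=np.exp(mu))
            + (1.0 - w) * weibull_min.pdf(x, eps, scale=delta))

def unpack(theta):
    """Map unconstrained optimiser variables to valid parameters."""
    w = 1.0 / (1.0 + np.exp(-theta[0]))          # 0 < w < 1, so the weights sum to 1
    return w, theta[1], np.exp(theta[2]), np.exp(theta[3]), np.exp(theta[4])

def fit_mixed(hs):
    """Maximum-likelihood fit of the weight and the four component parameters."""
    hs = np.asarray(hs, dtype=float)
    log_hs = np.log(hs)

    def neg_log_lik(theta):
        dens = mixed_pdf(hs, *unpack(theta))
        return -np.sum(np.log(np.maximum(dens, 1e-300)))   # guard against log(0)

    theta0 = np.array([0.0, log_hs.mean(), np.log(log_hs.std()),
                       np.log(hs.mean()), np.log(1.5)])
    res = minimize(neg_log_lik, theta0, method="Nelder-Mead",
                   options={"maxiter": 20000, "maxfev": 20000,
                            "xatol": 1e-6, "fatol": 1e-8})
    return unpack(res.x), -res.fun

# Synthetic H_S sample (not measured data): 60 % lognormal, 40 % Weibull
rng = np.random.default_rng(0)
hs = np.concatenate([rng.lognormal(mean=-1.0, sigma=0.4, size=1800),
                     1.2 * rng.weibull(2.0, size=1200)])
params, log_lik = fit_mixed(hs)
```

Comparing `log_lik` with the log-likelihood of a single lognormal fit (`lognorm.fit(hs, floc=0)`) shows the gain from mixing on this sample.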
This proposed mixed-distribution model is applied to the measured wave data at the Lysekil test site to model the significant wave height. Figure 10 shows the kernel density estimate (KDE) of the significant wave heights at the Lysekil test site, the fitted lognormal, Rayleigh and Weibull distributions, and the corresponding Q-Q plot of each distribution versus the measured significant wave heights. A Q-Q plot is a graphical method that plots the quantiles of two probability distributions against each other for comparison. If the two distributions are similar, the points in the Q-Q plot lie approximately on the line y = x. The R-squared statistic (coefficient of determination) between each fitted distribution and the KDE of the measured wave data is also used to measure the goodness of fit. The R-squared values and the corresponding parameters of each fitted distribution are presented in Table 2. R-squared measures how well the fit explains the variation in the measured data. It typically lies between 0 and 1, with a value closer to 1 indicating that a greater proportion of the variance is accounted for by the model. Alongside R-squared, the root mean squared error (RMSE) estimates the standard deviation of the random component in the data, with a value closer to 0 indicating a fit that is more useful for prediction. Comparing the fitted lognormal distribution with the KDE of the measured wave data at the Lysekil test site in Figure 10a, the lognormal distribution gives an acceptable representation of the measured wave data, and the R-squared value of 0.968 and RMSE of 0.065 in Table 2 also indicate a fairly good fit. However, its accuracy is limited, since a clear, if slight, deviation occurs over the 0.5-3 m range of significant wave height.
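The R-squared and RMSE comparison against a KDE can be sketched as follows. The settings are assumptions, since they are not stated in the text: the KDE is `scipy.stats.gaussian_kde` with its default bandwidth, both densities are evaluated on an evenly spaced grid of 200 points spanning the data, and RMSE uses the residual degrees of freedom (grid points minus fitted parameters), consistent with its description above as an estimate of the standard deviation of the random component. The function name is hypothetical and the sample is synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde, lognorm

def r2_rmse_against_kde(hs, fitted_pdf, n_params, n_grid=200):
    """R-squared and RMSE of a fitted density against a Gaussian KDE of the data."""
    hs = np.asarray(hs, dtype=float)
    grid = np.linspace(hs.min(), hs.max(), n_grid)
    y = gaussian_kde(hs)(grid)                  # reference density from the data
    y_fit = fitted_pdf(grid)                    # fitted density on the same grid
    sse = np.sum((y - y_fit) ** 2)              # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)           # total sum of squares
    r2 = 1.0 - sse / sst
    rmse = np.sqrt(sse / (n_grid - n_params))   # residual degrees of freedom
    return r2, rmse

# Synthetic lognormal H_S sample and a single lognormal fit
rng = np.random.default_rng(0)
hs = rng.lognormal(mean=-0.5, sigma=0.5, size=2000)
s, loc, scale = lognorm.fit(hs, floc=0)
r2, rmse = r2_rmse_against_kde(hs, lambda x: lognorm.pdf(x, s, loc, scale), n_params=2)
```

Because the comparison is made on the density curve rather than on individual observations, a high R-squared can coexist with visible deviations in sparsely populated ranges, which is why the Q-Q plots are used alongside it.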
The Q-Q plot in Figure 10b shows a discrepancy for significant wave heights above 3 m, indicating a poor fit of the lognormal distribution in the range of larger significant wave heights. The fitted Rayleigh distribution fits poorly over the entire range of significant wave heights (Figure 10c,d), with a corresponding R-squared value of 0.382 and RMSE of 0.248. The fitted Weibull distribution gives a fairly good fit (Figure 10e,f), but its accuracy is still poor in the tail, which represents the range of larger significant wave heights, and there is also a slight deviation in the 1-2.5 m range of significant wave height. The results in Figure 10 and Table 2 show that the lognormal, Rayleigh and Weibull distributions represent the measured wave data at the Lysekil test site with limited accuracy. For a more reliable estimate of the wave energy resource at the test site and prediction of the power production of the wave energy converter, the accuracy of modelling the measured wave data with a single probability distribution needs to be improved.
Figure 11 compares the Lognormal-Rayleigh, Lognormal-Weibull and Rayleigh-Weibull mixed distributions of the significant wave heights with the kernel density estimate of the significant wave heights, together with the corresponding Q-Q plots of the mixed distributions versus the significant wave heights. The R-squared values and the corresponding parameters of each fitted mixed distribution are presented in Table 3. In general the mixed distributions fit well, a clear improvement over the single distributions. The fitted mixed distributions in Figure 11a,c,e agree closely with the kernel density estimate of the significant wave heights, in contrast with the single distributions in Figure 10. The Q-Q plots in Figure 11b,d,f show very small deviations over the entire range of significant wave heights, and the high R-squared values and low RMSE confirm that the mixed distributions represent the measured wave data from the Lysekil test site accurately. Comparing Tables 2 and 3, the lognormal distribution has a fairly good R-squared value, even though its deviation is clearly visible in Figure 10a,b. The reason is that the measured wave data set is very large and the deviating data make up only a small percentage of it. It is therefore advisable to combine graphical comparison with the R-squared statistic when assessing the goodness of fit of the candidate distributions. An advantage of the mixed-distribution model is that the fitting deviation of a single distribution can be offset by mixing different distributions; the additional parameters introduced by mixing also lead to an overall better fit.
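Because the CDF of a mixed distribution has no closed-form inverse, the model quantiles for its Q-Q plot must be found numerically. A sketch under stated assumptions: root finding with `scipy.optimize.brentq` on the bracket [1e-12, 100] m, plotting positions $(i - 0.5)/n$, and illustrative lognormal-Weibull parameter values rather than those in Table 3. The function names are hypothetical and the sample is synthetic.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import lognorm, weibull_min

def mixed_cdf(x, w, mu, sigma, delta, eps):
    """CDF of the lognormal-Weibull mixture: w*F1(x) + (1 - w)*F2(x)."""
    return (w * lognorm.cdf(x, s=sigma, scale=np.exp(mu))
            + (1.0 - w) * weibull_min.cdf(x, eps, scale=delta))

def mixed_ppf(p, params, upper=100.0):
    """Quantile of the mixture at probability p, found by bracketed root finding."""
    return brentq(lambda x: mixed_cdf(x, *params) - p, 1e-12, upper)

def qq_points(hs, params, n_points=99):
    """Model quantiles and empirical quantiles at plotting positions (i - 0.5)/n."""
    probs = (np.arange(1, n_points + 1) - 0.5) / n_points
    model_q = np.array([mixed_ppf(p, params) for p in probs])
    data_q = np.quantile(hs, probs)
    return model_q, data_q

params = (0.6, -1.0, 0.4, 1.2, 2.0)   # hypothetical (w, mu, sigma, delta, eps)
rng = np.random.default_rng(0)
hs = np.concatenate([rng.lognormal(-1.0, 0.4, 12000), 1.2 * rng.weibull(2.0, 8000)])
model_q, data_q = qq_points(hs, params)
```

Plotting `data_q` against `model_q` gives the Q-Q plot; points near the line y = x indicate agreement, and the largest departures typically appear at the upper plotting positions, where few observations are available.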
The proposed mixed distributions represent the measured wave data from the Lysekil test site with high accuracy. This will improve the accuracy of the wave energy resource estimate at the test site and provide a reliable prediction of wave energy converter performance.

Application of the Mixed-Distribution Model to Different Measured Wave Data
The mixed-distribution model describes the measured wave data at the Lysekil test site well, as shown in Figure 11. To show the general applicability of this methodology, we therefore applied the mixed-distribution model to wave data measured at four other locations. Table 4 gives the relevant information on the measured wave data sets for the four locations. The wave data sets for Sites 1-3 were obtained from the Swedish Meteorological and Hydrological Institute (SMHI) and the data for Site 4 from the Irish Marine Institute. Site 1 is located off the northeast coast of Sweden in the Gulf of Bothnia, and Sites 2 and 3 are on the Swedish coast in the Baltic Sea and the North Sea, respectively. Site 4 is located in Galway Bay, Ireland. Figure 12 shows the locations of the four chosen sites. As Table 4 shows, the four chosen sites have moderate wave climates similar to that of the Lysekil test site. The results of modelling the significant wave height with the mixed distributions for the four sites are shown in Figures 13 and 14, and the R-squared values and corresponding parameters of each fitted mixed distribution are presented in Table 5. For each site, the fitted mixed distributions are compared with the kernel density estimate of the significant wave height, and inset Q-Q plots show the goodness of fit.
The goodness-of-fit results in Figures 13 and 14 and Table 5 show that the mixed-distribution method also fits the measured wave data at the four chosen sites well. The R-squared values and RMSE indicate high accuracy for all the mixed distributions. The Q-Q plots show a good fit over the overall range of significant wave heights, despite some deviations in the tails towards larger wave heights. Since large wave heights occur extremely rarely in the overall wave height statistics, this deficiency has no significant impact on the wave climate modelling results from the mixed distributions, as the R-squared values in Table 5 show. This is also why the POT method is recommended for extreme wave modelling and prediction. The good fit at the four chosen sites gives a simple and reliable illustration of the generality of the mixed-distribution method.
As discussed, the mixed distributions fit the measured wave data well. One reason may be the larger number of parameters obtained by mixing different distributions, which leads to an overall better fit. Intuitively, real ocean waves are also formed by multiple physical processes, which may give the waves different probabilistic characteristics; a mixture of different wave-related probability distributions may therefore approximate the true distribution of the considered waves more closely than any single distribution. The proposed mixed distribution aims to use distribution functions to describe the long-term behavior of significant wave height data.
It reveals the long-term characteristics of the wave climate and is useful for wave climate analysis. Researchers have used different distributions to describe the long-term behavior of measured wave data; however, the quality of the fit depends strongly on the chosen distribution, and it is sometimes difficult to find a single distribution that fits the measured wave data well. The mixed-distribution method presented here has the advantage of fitting general measured wave data closely. For simplicity, we take n = 2 in this paper, and the results with n = 2 are already good. Taking n ≥ 3 would fit the sample even better, since more parameters would be involved in the fitting process.

Conclusions
This paper provides a detailed wave climate analysis of the Lysekil wave energy test site, covering statistical analysis of the wave climate and extreme wave prediction based on wave data recorded at the test site. This study is of significant importance for wave energy converter development at the Lysekil test site, especially for the survivability assessment of wave energy converters. A new mixed-distribution methodology is proposed to model the significant wave height, and the mixed-distribution model fits the wave data from the Lysekil site well. The proposed method describes the characteristics of the significant wave height at the Lysekil test site with high accuracy and will give more reliable estimates of the wave energy resource at the test site and of wave energy converter performance. In addition, wave data sets from four other sites were analyzed with the proposed model, demonstrating the general applicability of the method. We note that the wave climates at the Lysekil test site and the four chosen sites are generally moderate, and the mixed-distribution model may require modification for sites with much more severe wave climates.