The Effect of Averaging, Sampling, and Time Series Length on Wind Power Density Estimations

Abstract: The Wind Power Density (WPD) is widely used for wind resource characterization. However, there is a significant level of uncertainty associated with its estimation. Here, we analyze the effect of sampling frequencies, averaging periods, and the length of time series on the WPD estimation. We perform this analysis using four approaches. First, we analytically evaluate the impact of assuming that the WPD can simply be computed from the cube of the mean wind speed. Second, the wind speed time series from two meteorological stations are used to assess the effect of sampling and averaging on the WPD. Third, we use numerical weather prediction model outputs and observational data to demonstrate that the error in the WPD estimate is also dependent on the length of the time series. Finally, artificial time series are generated to control the characteristics of the wind speed distribution, and we analyze the sensitivity of the WPD to variations of these characteristics. The WPD estimation error is expressed mathematically using a numerical-data-driven model. This numerical-data-driven model can then be used to predict the WPD estimation errors at other sites. We demonstrate that substantial errors can be introduced by choosing too short time series. Furthermore, averaging leads to an underestimation of the WPD. The error introduced by sampling is strongly site-dependent.


Introduction
The need to replace fossil fuels and the continuously increasing demand for energy drive the harvesting of renewable energy sources. Amongst those, wind energy is currently one of the most efficient sources. Before new sites can be used for the development of wind energy projects, a resource assessment has to be carried out to identify potential sites and to predict the available energy. For such assessments, the wind power density is a useful metric because it characterizes the energy content of the wind climatology. It is a relatively simple metric, and hence, it can easily be computed for large areas, guiding the exploration of potential sites. It is also useful when attempting to predict future changes in wind resources using simulations of future climate scenarios.
Although the Wind Power Density (WPD) is widely used, and although it is understood that averaging is detrimental and that high sampling frequencies and long time series improve the WPD estimate, it is not well known how long the time series should be, what sampling/averaging is permissible, and what errors to expect in WPD estimations. Here, we address these questions.
The WPD is defined as:

$$\mathrm{WPD} = \frac{1}{2N} \sum_{i=1}^{N} \rho \, v_i^3,$$

where v, ρ, and N denote the wind speed, air density, and the number of samples, respectively. It is generally computed from a time series of wind speed observations. WPD estimates depend on the length of the time series, averaging effects that reduce the weights of the velocity fluctuations, and sampling effects, i.e., the wind speed variability might not be adequately represented by the sampling used. Wind speed time series are discrete and are characterized by their sampling frequency and averaging period. If they are derived from model outputs, they may be "instantaneous" model realizations, meaning that they represent the model state at a particular time. Wind speed observations are generally recorded using cup and sonic anemometers operating at frequencies within the range 1-20 Hz, and the reported data are usually the time series of the mean values within each 10 min period; thus, 10 min is the usual sampling rate. Satellite data generally have a much lower sampling frequency, of around two measurements per day. Despite their low sampling frequency, satellite data have been used in various studies [1][2][3][4][5][6][7][8]. A review of the various data sources from remote sensing, reanalysis, and mesoscale models was presented by Wang et al. [9]. Mesoscale models are frequently used in wind resource characterization [10][11][12]. However, sampling in mesoscale model outputs is generally performed at low frequencies. Hourly data are commonly considered high frequency, and six hourly data are common, due to the massive amounts of data generated by these models. Lower sampling frequencies are probably not very useful for wind energy applications, as the details of the diurnal cycle would undoubtedly be missing [13,14].
Data derived from climate models are also frequently used for WPD estimation [15][16][17][18][19][20]. Climate models often only provide low sampling and generally generate temporally averaged outputs because this preserves the climate information without the storage burden. For wind energy, however, this is problematic, as we will show. The extent of the errors introduced by the sampling frequency in the WPD estimate is not well known. This study illustrates the effect of sampling frequency and averaging on the WPD estimate. A method that estimates the error and relates it to three wind speed time series statistical characteristics, namely the Weibull shape parameter, the mean wind speed, and the exponential autocorrelation decay base, is outlined here.
The paper is organized as follows. In Section 2.1, we introduce the observational, model, and artificial datasets, as well as the three observational sites used in this study to illustrate the problem. In Section 2.2, we introduce the error norm applied throughout this publication. In Section 3, we present the results of the analysis, first in general analytic terms, by evaluating the impact of averaging. We then focus on sampling frequency and time series length, using observational time series and model data. This section concludes with the introduction of artificial time series, from which a numerical-data-driven model is constructed to generalize the analysis. We provide this model as Supplementary Material to the article, so that the reader can evaluate errors at other sites of interest for wind energy resource exploitation. This model is compared to the error obtained by analyzing the observations. Lastly, in Section 4, we present some conclusions.

Data and Methodology
The analysis was performed using four approaches. In the first approach, we compared the errors in the WPD estimates when assuming a Gaussian and a Weibull distribution for the wind speed. In the second approach, we used observational data from meteorological stations and model outputs to analyze the effect of sampling at three different sites. Generally, the analysis of observational data has been limited because long wind speed time series are scarce or proprietary. Moreover, the findings are site-dependent and cannot be directly transferred to other sites. In the third approach, we used data from a numerical weather prediction model to analyze the effect of the length of the time series. Finally, in the fourth approach, we derived a numerical-data-driven model from the results of the error analyses based on the artificial datasets, and the performance of this model was compared to the observational data from the third site.

Data
Three observational datasets were analyzed. The first observational dataset consisted of wind speed measurements at Østerild [21], Northern Denmark, taken over one year at a 106 m height. The second observational dataset consisted of measurements at La Rumorosa, Mexico. At La Rumorosa, which will be referred to as the M01 mast data, wind speeds were measured over one year at an 80 m height. The third observational dataset consisted of wind speed measurements at Høvsøre [22], Western Denmark, taken over a 14 year period at a 100 m height. All measurements were averaged and recorded every 10 min. Data were obtained from simulations using the Weather Research and Forecasting (WRF) model with the configuration of the Mexican Wind Atlas project [23] to explore the dependence of the WPD on the length of wind speed time series. Here, 10 m instantaneous winds, sampled at a frequency of one hour and evaluated at the nearest grid point to the Rumorosa M01 mast location, were used as a case study.

Methods
Meteorological data from the three sites were analyzed. Furthermore, artificial time series were generated. The analysis of these artificial time series provided the error due to sampling and averaging as a function of the time series characteristics. These artificial data could then be evaluated, here using five-dimensional linear interpolation, to compute the error in WPD estimations for a range of time series characteristics, thus potentially representing a range of sites. This interpolation is the numerical-data-driven model. The artificial hourly wind speed time series were generated through a Markov random walk method combined with a Weibull distribution of the wind speed and a transition matrix following Veers and McNerney [24]. They depended on the mean wind speed, the Weibull shape parameter, and the autocorrelation decay base. Sixty (60) wind speed time series with a sampling interval of one hour and a 20 year length formed the basis of the artificial time series analysis. The 60 time series comprised every combination of three parameters with the following values: a Weibull shape parameter of 1.9, 2.0, 2.2, or 2.4; a mean wind speed of 6.0, 10.0, or 14.0 m/s; and an exponential autocorrelation decay base of 1.4, 1.5, 1.6, 2.0, or 2.4.
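The parameter set described above can be enumerated in a few lines. The following is only a minimal sketch of the bookkeeping, not the generator of Veers and McNerney [24]: the names `parameter_grid` and `weibull_scale` are hypothetical, and the only relation used is the standard identity between the mean of a Weibull distribution and its scale and shape parameters.

```python
import itertools
import math

# Parameter values quoted in the text.
SHAPES = (1.9, 2.0, 2.2, 2.4)            # Weibull shape parameter k
MEAN_SPEEDS = (6.0, 10.0, 14.0)          # mean wind speed in m/s
DECAY_BASES = (1.4, 1.5, 1.6, 2.0, 2.4)  # exponential autocorrelation decay base


def weibull_scale(mean_speed, shape):
    """Weibull scale parameter that yields the requested mean wind speed."""
    return mean_speed / math.gamma(1.0 + 1.0 / shape)


def parameter_grid():
    """Every combination of shape, mean speed, and decay base (hypothetical helper)."""
    return [
        {"shape": k, "mean_speed": v, "decay_base": b, "scale": weibull_scale(v, k)}
        for k, v, b in itertools.product(SHAPES, MEAN_SPEEDS, DECAY_BASES)
    ]


grid = parameter_grid()
print(len(grid))  # 4 * 3 * 5 = 60 combinations
```

Each entry of `grid` would then be handed to whichever time series generator is in use; the scale parameter is precomputed because most Weibull samplers are parameterized by scale and shape rather than by the mean.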
The observational and artificial data were analyzed using different averaging periods, $t_1$, sampling intervals, $t_2$, and time series lengths, $l$. In the following, this is expressed as $\overline{v}^{t_1}|_{t_2}$, where the averaging period, $t_1$, appears as a superscript to the right of the averaged quantity, indicated with an over-line, and the sampling interval, $t_2$, appears as a subscript to the vertical bar, commonly used as "evaluated at." The error in estimating the WPD, i.e., the WPD error, is here the deviation of the WPD estimate from the WPD obtained with a theoretically optimum time series of several decades in length and high sampling frequency. Hence, the WPD error, ε, is defined as a function of $t_1$, $t_2$, and $l$, and a monotone operator is applied to it, with δt and δl denoting the step in the sampling interval and time series length used when evaluating the errors, respectively. In this study, δl was one year and δt one hour. The monotone operator enforces the fact that with a shorter time series length, the error has to be larger, and equally, a longer sampling interval will also increase the error. For each $l = l_{ref}$, the period of the time series with the largest error has to be used. Not using the largest error would lead to a potentially large underestimation and non-monotone behavior, as illustrated in Figure 1. To achieve this, a sliding window was applied through a windowing operator, $W$, with dt = 30 days. Two implementations, using a sliding window search and only using the first $l$ years, i.e., for n = 0, $W(x) = x(t_s = 0)$, are contrasted in Figure 1. The use of the windowing operator was motivated by the following three immediate observations that could be made by comparing the graphs in that figure. First, the error introduced by not using a sliding window analysis can be observed in the top left panel: the error for a five year time series was nearly as low as the error using the full 20 years.
While this observation was valid for this illustrative synthetic dataset and the particular period analyzed, the occurrence of this minimum did not help derive an estimate of the potential errors that could be incurred by using a time series that is too short.
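The idea behind the windowing and monotone operators can be illustrated with a small sketch. The exact operator definitions are not reproduced here, so the following makes its own assumptions: the error is taken as the ratio of the windowed mean cubed wind speed to that of the full series, the "worst" window is the one whose ratio deviates most from one, and monotonicity is enforced on deviations ordered by increasing time series length. All function names are hypothetical.

```python
def mean_cube(speeds):
    """Mean of the cubed wind speeds; proportional to the WPD at constant density."""
    return sum(v ** 3 for v in speeds) / len(speeds)


def first_window_ratio(speeds, window):
    """Ratio obtained when only the first `window` samples are used (no search)."""
    return mean_cube(speeds[:window]) / mean_cube(speeds)


def worst_window_ratio(speeds, window, step):
    """Slide a window over the series and keep the ratio that deviates most from one."""
    reference = mean_cube(speeds)
    worst = 1.0
    for start in range(0, len(speeds) - window + 1, step):
        ratio = mean_cube(speeds[start:start + window]) / reference
        if abs(ratio - 1.0) > abs(worst - 1.0):
            worst = ratio
    return worst


def enforce_monotone(deviations):
    """Given deviations ordered by increasing length, make them non-increasing.

    Sweeping from the longest length backwards, each shorter length is assigned
    at least the deviation found for every longer length.
    """
    out = list(deviations)
    for i in range(len(out) - 2, -1, -1):
        out[i] = max(out[i], out[i + 1])
    return out


toy = [3.0, 3.0, 3.0, 3.0, 6.0]  # toy series with a single strong event at the end
print(first_window_ratio(toy, 2), worst_window_ratio(toy, 2, 1))
print(enforce_monotone([0.30, 0.05, 0.20, 0.10]))
```

In the toy series, the first window alone misses the strong event and the search finds a window with a larger deviation, which mirrors the contrast between the two implementations described above. With hourly data, a step of 720 samples would correspond to an offset of 30 days.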

Results and Discussion
We first illustrate the error of estimating the WPD as the cube of the mean wind speed and highlight how the asymmetry of the wind speed distribution amplifies this error. Then, using the observational data, we demonstrate the strong detrimental impact of averaging on WPD estimates. Due to the generally short time series available from observational data, we demonstrate using long time series from model outputs that seasonal changes strongly impact the WPD estimates. Due to the limitations of data availability, artificial wind speed time series were generated with different characteristics, mimicking different site climatologies. Apart from the expected dependence on the Weibull shape parameter, we also found a strong dependence on the autocorrelation decay base and the time series length.

The Problem of the Cube of the Mean Wind Speed for WPD Calculation
Assuming that changes in air density are negligible, the error of estimating the WPD from the cube of the mean wind speed, $\overline{v}^3$, can be estimated by comparing it with the third moment of the wind speed, $\int v^3 p(v)\,dv$, where $p(v)$ is the probability density function (pdf) of the wind speed. For a Gaussian pdf, the third moment is $\overline{v}^3 + 3\overline{v}\sigma_v^2$, where $\sigma_v^2$ is the variance of the pdf. For such a distribution, the error is a function of both the mean wind speed and the variance of the pdf (see Figure 2, left). For a Weibull distribution, which is widely used to represent wind speed distributions [25][26][27][28][29][30][31][32][33], the third moment is $\lambda^3 \Gamma(1 + 3/k)$, where λ, k, and Γ are the scale parameter, shape parameter, and Gamma function, respectively. Since for a Weibull distribution $\overline{v} = \lambda \Gamma(1 + 1/k)$, the scale parameter cancels when the two quantities are compared, and the error, $\varepsilon_w$, is only dependent on k, as shown in Figure 2, right. From these plots, it is clear that the error is non-linear in the shape parameter and that the broader the wind speed distribution (smaller value of k), the larger the error.
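The dependence on the shape parameter can be checked numerically. The sketch below assumes that the error is expressed as the ratio of the cube of the mean wind speed to the third moment of the distribution, so that a value below one indicates an underestimate; this normalization is an assumption made for illustration, and the function names are hypothetical. The moments themselves are the standard Gaussian and Weibull results.

```python
import math


def weibull_cube_ratio(shape):
    """Cube of the Weibull mean divided by the Weibull third moment.

    The scale parameter cancels, so the ratio depends on the shape k only.
    """
    return math.gamma(1.0 + 1.0 / shape) ** 3 / math.gamma(1.0 + 3.0 / shape)


def gaussian_cube_ratio(mean_speed, sigma):
    """Cube of the mean divided by the Gaussian third moment, mean^3 + 3*mean*sigma^2."""
    return mean_speed ** 3 / (mean_speed ** 3 + 3.0 * mean_speed * sigma ** 2)


for k in (1.9, 2.0, 2.2, 2.4):
    print(k, round(weibull_cube_ratio(k), 3))
```

For k = 2 (a Rayleigh distribution) the ratio is exactly π/6, roughly 0.52, which means that under this assumed normalization the cube of the mean wind speed captures only about half of the third moment. The ratio moves towards one as k increases and the distribution narrows.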

Impact of Sampling and Averaging
Using data from the M01 and the Østerild masts, we now analyze the dependence of the WPD on sampling and averaging by constructing subsets from the observational datasets using different sampling frequencies and averaging periods. Figure 3 shows the error for different sampling intervals and averaging periods at the two masts, from 0 up to nearly 50 h to mimic the worst-case scenario. The combined sampling and averaging error was always below one, where one represents here the best possible estimation of the WPD, i.e., using the wind speed time series constructed from 10 min averages and sampled every 10 min [34]. Using 24 h sampling/averaging periods resulted in an error between 10% and 15% for both observational datasets. If only the sampling frequency was changed, the error could be either larger or smaller than one, because the sampling selected a particular subset of data points from the distribution. This selection may result in one of three outcomes: the subset gave a good representation of the original wind speed distribution, leading to a similar WPD estimation and a small sampling error; the subset was biased towards higher wind speeds, leading to an overestimation of the WPD and a sampling error over one; or the subset was biased towards lower wind speeds, leading to an underestimation of the WPD and a sampling error below one. If the sampling interval was small, around one or two hours, the error was below 1%. A 6 h sampling interval increased the error to approximately 2.5%, and a 24 h interval increased it to 10-15%. The two sites showed similar behavior for the WPD estimation error. We now need to address two further questions: first, what is the impact of the length of the time series on the WPD estimation, and second, what is the impact of the wind speed distribution characteristics on the WPD estimation?
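The different behavior of averaging and sampling can be reproduced with a toy example. The following sketch uses an idealized, purely diurnal wind speed signal rather than the mast data, assumes a constant air density of 1.225 kg/m^3, and expresses the error as the ratio of the estimated WPD to the WPD of the full hourly series; all names are hypothetical.

```python
import math

AIR_DENSITY = 1.225  # kg/m^3, assumed constant for this illustration


def wpd(speeds, rho=AIR_DENSITY):
    """Wind power density: half the density times the mean cubed wind speed."""
    return 0.5 * rho * sum(v ** 3 for v in speeds) / len(speeds)


def block_average(speeds, n):
    """Non-overlapping means over n consecutive samples (incomplete tail dropped)."""
    usable = len(speeds) - len(speeds) % n
    return [sum(speeds[i:i + n]) / n for i in range(0, usable, n)]


def subsample(speeds, n, offset=0):
    """Keep every n-th sample, starting at index `offset`."""
    return speeds[offset::n]


def diurnal_series(n_hours, mean=8.0, amplitude=3.0):
    """Idealized hourly wind speed with a sinusoidal 24 h cycle (toy data)."""
    return [mean + amplitude * math.sin(2.0 * math.pi * h / 24.0) for h in range(n_hours)]


hourly = diurnal_series(240)
reference = wpd(hourly)
averaged = wpd(block_average(hourly, 24)) / reference
at_peak = wpd(subsample(hourly, 24, offset=6)) / reference
at_trough = wpd(subsample(hourly, 24, offset=18)) / reference
print(round(averaged, 3), round(at_peak, 3), round(at_trough, 3))  # 0.826 2.147 0.202
```

Averaging over 24 h removes the diurnal fluctuation and therefore gives a ratio below one, as it must for any series of non-negative speeds because the cube is a convex function. Sampling once per day can land on either side of one depending on the hour that happens to be selected, which is why the sampling error is site- and phase-dependent. The extreme values here are a consequence of the idealized signal and should not be read as representative of real sites.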

Impact of the Length of the Time Series
The impact of the time series length was difficult to estimate because long time series are rare, and the error measure intrinsically assumed that the long time series used as the reference was indeed close to the exact WPD. We first estimated the WPD error using model data, then used artificial time series, and finally, compared the error obtained from the artificial time series with the error obtained from a long observational time series.

Impact of the Length of the Time Series Using Model and Observational Data
In order to assess the impact of the length of the time series on WPD estimates, we analyzed hourly instantaneous WRF model outputs of wind speeds at a 10 m height for La Rumorosa over ten years from the grid point closest to the M01 mast and 14 years of measurements from the Høvsøre met mast. For different sampling frequencies, in our example 1, 2, 3, 6, and 12 h data, the error in the estimation of the WPD, ε, is shown in Figure 4. The longer sampling interval introduced a bias, which was most probably a result of the impact of the diurnal cycle on the wind speed, which, with increasing sampling period, was less and less represented. The figure also shows that for the sampling intervals tested, we needed at least four years of data to reduce the error below 10% for the WRF data and 3.5 years for the Høvsøre data. Furthermore, the intra-annual cycle of the wind speed influenced the error, as one could find seasonal fluctuations, which, even with a nearly 10 year time series length, were around 5% for the WRF data. It is worth noting that only two sites were tested; at other sites, the required time series length to have errors below 10% may vary, depending on the local variability of the winds and climatology. The error behavior at other sites could be analyzed using the numerical-data-driven model presented next.

Impact of the Time Series Length Using an Artificial Time Series
The realism of the error derived from the artificial time series can be demonstrated by comparing the numerical-data-driven model (see Appendix A) with observations from Høvsøre. The observations from Høvsøre were reduced to hourly sampling to match the sampling of the artificial time series. At Høvsøre, the 14 year time series had a Weibull shape parameter of 2.19 and a mean wind speed of 9.47 m/s. With these two parameters fixed, the autocorrelation decay base remained a free parameter. Figure 5 shows the autocorrelation decay of several artificial time series together with that estimated from the Høvsøre data. An autocorrelation decay base of 1.6 fit the observations well, at least within the range 0-20 h.

Figure 5. Autocorrelation decay as a function of the lag time for the Høvsøre observations and artificial time series with exponential decay bases of 1.5, 1.6, and 1.8.
In order to estimate the sensitivity of the predicted error to the input parameters, the numerical-data-driven model was evaluated at k = 2.0, 2.2, and 2.4, v = 8.5, 9.5, and 10.5 m/s, and exponential autocorrelation decay bases of 1.5, 1.6, and 1.7. The resulting error estimates are shown alongside the observational data in Figure 6. It can be seen that for a length of one year, ε was in the range of 1.12 to 1.35. The observations gave an ε of 1.33, which lies within this range. The model error decreased faster with increasing time series length than that estimated from the observational data. This accelerated error reduction was probably due to climatological effects, which were not present in the artificial time series. In general, climatological variability in Northern Europe can lead to inter-decadal variations in wind energy of up to 30%, and for inter-annual variations in power, mean relative standard deviations of approximately 13% over 22 years have been reported [35]. After three years, the model dropped below 5% error, which was achieved by the observations only after seven years. This result highlighted that the model provided a lower bound of the likely WPD error.

Variability of the WPD Error Using Synthetic Data
In the previous sections, one realization of a synthetic wind speed time series per set of wind speed characteristics was used to estimate the WPD error. However, the time series were generated using a stochastic process, and hence, these time series also inherited a stochastic behavior. The WPD error could thus exhibit such stochastic behavior, and here, we explore this by generating 1000 time series with a length of 20 years, which were computed with the Høvsøre site parameters (Weibull shape parameter of 2.19, mean wind speed of 9.47 m/s, and autocorrelation decay of 1.6).
The resulting histograms of ε, based on 20 year long time series, are shown in Figure 7. The standard deviations of ε for the 1, 6, 11, and 16 year long time series were 0.048, 0.018, 0.010, and 0.006, respectively. The mean of ε decreased from 1.25 for l = 1 year to 1.07, 1.04, and 1.02, for l = 6, 11, and 16 years, respectively.
These results for the distribution of ε reiterated that for short time series, the errors were potentially large. For one year time series, the errors reached up to approximately 40%, and the uncertainty, here defined as the spread of the predictions made by a thousand time series evaluations, was equally large, ranging from 15 to 40%. The use of longer time series improved the maximum observed errors, and the uncertainty was reduced to values around 5%.
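The narrowing of the distribution of ε with increasing time series length can be illustrated with a simplified ensemble experiment. This sketch is not the generator used in the study: it draws independent Weibull samples with the standard library sampler, so it ignores autocorrelation and seasonal effects entirely, and it uses the analytic Weibull third moment as the reference. The sample counts, the seed, and the function names are arbitrary, hypothetical choices, and the resulting spreads are therefore not comparable to the values reported above.

```python
import math
import random
import statistics


def weibull_scale(mean_speed, shape):
    """Weibull scale parameter that yields the requested mean wind speed."""
    return mean_speed / math.gamma(1.0 + 1.0 / shape)


def ratio_spread(n_samples, n_realizations, mean_speed=9.47, shape=2.19, seed=0):
    """Mean and standard deviation of the mean-cube ratio over many realizations.

    Each realization holds n_samples independent Weibull draws (no autocorrelation).
    """
    rng = random.Random(seed)
    scale = weibull_scale(mean_speed, shape)
    exact_third_moment = scale ** 3 * math.gamma(1.0 + 3.0 / shape)
    ratios = []
    for _ in range(n_realizations):
        cube_sum = sum(rng.weibullvariate(scale, shape) ** 3 for _ in range(n_samples))
        ratios.append(cube_sum / n_samples / exact_third_moment)
    return statistics.mean(ratios), statistics.stdev(ratios)


short_mean, short_std = ratio_spread(n_samples=100, n_realizations=200)
long_mean, long_std = ratio_spread(n_samples=1600, n_realizations=200)
print(short_std, long_std)
```

For independent samples the spread shrinks roughly with the square root of the number of samples. Autocorrelated and seasonally varying winds contain fewer effectively independent samples per year, which is one reason why real and more realistic artificial time series need to be considerably longer to reach the same spread.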

Conclusions
The findings of this study clearly showed that a sufficient length of the wind speed time series used for useful wind resource estimation at a site was essential. Several years of wind speed time series data were required. Using four years of wind speed time series could be sufficient for some locations, e.g., the Høvsøre site presented here, but for other sites, the same length could still bear a 25% error in the final WPD estimate.
This study was limited by the fact that the analysis was only performed at three locations. Other locations with different wind climatologies and seasonal changes are likely to show slightly different length requirements. Furthermore, at the hub height, where modern large wind turbines operate, the winds are likely to show less variability. The study, however, highlighted important characteristics inherent to the estimation of WPD, which were expected to be transferable to other sites, and some of these limitations were rectified by the use of artificial time series data.
The estimation of the WPD at a site is vital to guide the exploration and development of wind energy. The examples presented in this work helped to quantify the uncertainties in WPD estimations, their strong dependence on the source data, and how they were pre-processed. The results herein are particularly relevant for the studies attempting to evaluate the impact of future climate scenarios on wind resources. If the evaluation periods in those climate impact studies are too short, the sampling is coarse, and potentially inappropriate averaging is used, any sensitivity to climate change may well be orders of magnitude smaller than the error incurred by the limitations mentioned above.
It is also instructive to see the impact of the exponential decay base on the WPD error. If there were no dependence on the exponential decay, then a one year time series of hourly data would yield the same error as a two year time series with 2 h sampling. However, because the wind evolves in time, in contrast to a purely stochastic process, the sampling frequency affected not only the amount of data over a given period, but also the representation of its fundamental characteristics.
The presented study started with an analysis of the error incurred when the WPD is estimated from the cube of the mean wind speed. It was shown how averaging led to substantial underestimates (>10%) when the averaging periods were too long (>10 h). We also showed how this was in part due to the distribution of the wind speeds, which followed a Weibull distribution closely. This effect was non-linear and could, therefore, not easily be corrected. We also showed that the length of the time series was crucial. A time series that is too short would introduce errors due to seasonal and inter-decadal oscillations.
This study presented a numerical-data-driven model, which could be used to analyze different sites of interest and their sensitivity to sampling and time series length. This model was shown to provide good results with slight underestimates of the WPD. Despite this tendency to underestimate, the results were nevertheless beneficial, as they could guide us to find the minimum requirements for both sampling frequencies and time series length. In other words, the numerical-data-driven model could be used to optimize the time series length requirement, so that the error incurred by analyzing a time series that is too short does not substantially exceed an application-dependent upper limit.
The stochastic analysis presented in Section 3.3.3, however, urges caution concerning short time series and the numerical-data-driven model, as there, the distribution of ε could be quite broad. This problem will be investigated further in future work, alongside the evaluation of the sensitivity to the site parameters and possible ways to integrate more site characteristics into the model and construct a system that can reliably predict time series requirements for a large variety of sites. This study demonstrated the need and motivation for this longer term aim and provided a first step towards it.