The Influence of the Wind Measurement Campaign Duration on a Measure-Correlate-Predict (MCP)-Based Wind Resource Assessment

Driven by the energy auction system, wind power in Brazil is expanding within the country's electric energy mix. Because of the stochastic nature and variability of wind, the wind measurement campaign of a wind farm project must last at least 36 months for the project to take part in energy auctions. In this respect, the influence of this duration on a measure-correlate-predict (MCP)-based wind resource assessment was studied to assess the accuracy of generation forecasts. For this purpose, three databases containing time series of wind speed belonging to a site were considered. Campaigns with durations varying from 2 to 6 years were simulated to evaluate the behavior of the uncertainty in the long-term wind resource and to analyze how it affects the estimated power output of a wind farm. As the wind measurement campaign length increased, the uncertainty in the long-term wind resource diminished, thereby reducing the overall uncertainty that pervades wind power harnessing. Longer monitoring campaigns implied larger quantities of data, thus enabling a better assessment of wind speed variability at the target location. Consequently, the estimated energy production decreased, improving the accuracy of the energy generation prediction by not overestimating it, which could benefit the reliability of the Brazilian electric system.


Introduction
As of 2009, the expansion of wind power in the Brazilian electrical energy mix [1,2] was bolstered by a bidding mechanism that promoted competitiveness among different energy sources. This auction-based systematic concept favored the reduction of electricity prices, making wind power generation competitive with other well-explored power sources within the Brazilian electric power sector, such as hydro and thermal [3,4]. Furthermore, it was shown that it would be preferable, at competitive prices, to meet the Brazilian new demand for energy with only renewable sources such as wind instead of thermal power [5].
Due to the stochastic nature of wind resources (i.e., a characteristic that determines variability in their availability), predicting wind power generation is an uncertain process; knowledge of the historical behavior of the wind is therefore essential to estimate how much electricity a wind farm can produce and deliver to the grid [6–8]. Wind resources vary on different temporal scales, from the shortest (seconds), to intermediate (hours to days), to longer timescales (months to years to decades) [9]. Neglecting interannual variability when designing a wind farm can lead to overconfidence in the wind resources [10,11]. Several authors address interannual variability as an issue to be dealt with if a better balance between electricity production and demand is sought [10–15]. Countries like Brazil, which lack access to wind data recorded over a broad area and over long, multi-decadal periods, face the task of estimating their wind power potential from short-term datasets produced by recent wind measurement campaigns, which cannot represent the interannual variability of the wind. To overcome this challenge, reanalysis data and measure-correlate-predict (MCP) methods are needed.
MCP is a technique that is able to hindcast wind resources in a given target location where periodical measurements of wind speed and direction are carried out over a short period of time (usually from 12 to 36 months) [7]. The methodology is an important step of the wind resource assessment (WRA), which can historically characterize the local wind behavior. However, wind data that is representative for longer timescales is required for the MCP and WRA to provide accurate knowledge about the local long-term wind variability and also to estimate the power output of a wind farm throughout its multidecadal lifespan [7,16]. Even though presenting significant uncertainties, reanalysis databases are commonly used to fill the gap in locations lacking long-term reference datasets because they result from a combination of different meteorological observations and numerical weather prediction models, providing coverage for large areas and long periods [17,18]. The MCP, as much as other WRA steps and reanalysis databases themselves, is subject to an uncertainty degree inherent to the unpredictability of the wind behavior. Thereby, the identification and quantification of this group of uncertainties is relevant when it comes to reliably determining the local wind energy potential [19], which in turn directly affects the financial structure of wind power projects [20,21].
In Brazil, there is a minimum duration regarding the wind measurement campaign for a project to be able to participate in energy auctions. This minimum length for short-term datasets was extended from 24 to 36 whole months from 2017 onwards [22]. The duration of the campaign is a relevant matter, since it indirectly dictates the volume of data to be used in the MCP, determining the concurrent period of data between target and reference locations. Longer periods of measurement tend to better outline the local wind variability [16,23]. Therefore, this recent Brazilian measure could improve the understanding of wind resources natural variability and, consequently, lead to a probable reduction regarding the group of uncertainties that pervades the WRA. Moreover, it is expected that new wind farm projects could more accurately reflect the reality of wind power generation in Brazil.
Few authors have studied the length of the concurrent data between short and long-term datasets. In [16], Rogers et al. stated that the longer the short-term dataset, the smaller the standard deviation of MCP predictions of mean wind speed becomes. Also, in [23], Taylor et al. affirmed that by increasing the duration of data collection it is possible to reduce the uncertainty in wind speed. Yet no research proposed to evaluate if longer short-term datasets could improve the WRA by influencing its overall uncertainty and, ultimately, the wind power output estimation. Given the aforementioned, the contribution of this article regards the assessment of how the uncertainty related to the long-term historical characterization of wind resources in a target location is affected by the increase in the wind measurement campaign timespan. Subsequently, the research aims to analyze if this longer monitoring period could promote enhancements in the estimation of power generation in a hypothetical scenario of a wind farm set in that target location.

Wind Data
Anemometric data containing periodic records of wind speed, fundamental for the study of wind resources, were obtained for this research. In total, three databases were used, each consisting of numerical time series of wind speed measured at different heights above ground level, also including information about air density. Table 1 summarizes the data according to the following parameters: the altitude of the target and reference locations, the measurement height (above ground level), the dates of the first and last measurements, the recording interval, and the average wind speed over the whole measurement campaign. Other target and reference locations were considered for this research but, owing to poor correlation, they did not meet the criteria for an MCP-based WRA, as described in Appendix A.
Databases 1 and 2 (D1 and D2) both represent the target location and are 18 km apart. Their measurements were carried out by anemometers installed on two meteorological masts; each tower has three wind speed sensors at three different heights (42, 66 and 81 m above ground level), and each database holds six complete years of short-term records at 10 min intervals. Database 3 (D3) represents the reference location and is situated 20 km from D1 and 38 km from D2. It comprises a long-term measurement period of 30 years with records of wind speed at a height of 50 m (above ground level) every 60 min. The D3 data were obtained from a reanalysis product named Modern Era Retrospective-analysis for Research and Applications (MERRA) [24]. This time series derives from three-dimensional climate data on a global scale, which combine past observations from different measurement systems with numerical modeling to reconstruct the evolution of the Earth's climate over time [17,18,25]. The choice of MERRA as the provider of wind reference data is justified by two reasons: its proximity to D1 and D2, and the fact that it yields satisfactory results concerning the representativeness of the long-term wind regime for a target location [26,27].
Each of these databases illustrates the wind behavior of a specific geographic location. However, they are located in the same micro region of Bahia (one of the Brazilian Northeastern states), where the terrain complexity (Figure 1) implies wind flow changes, such as accelerations, decelerations and wind direction deviations, making the targeted wind regime unique. The fact that all three datasets are close to each other favors the possibility of a good correlation between data from both target and reference locations [28].

MCP Methods
The measure-correlate-predict (MCP) technique is applied to promote the historical characterization of wind resources. The MCP is an important tool to retrospectively reproduce long-term wind conditions of the target location where only short-term data is available. For that to be performed, a correlation between concurrent short-term data for two different locations is initially established; one of them being the target location, where wind resources are to be exploited, and the other being the reference location, where consistent long-term data is available. Secondly, the resulting correlation is employed over the full long-term reference time series to synthesize a long-term time series for the target location. The simultaneity whereby the short-term data from both the target and the reference location were recorded is vital for applying the MCP methods [25].
Various MCP methods have been proposed in the scientific literature, most of which were reviewed in [25]. Some of them are based on linear [16,29–35] and non-linear [36–38] regression techniques; others are based on a probabilistic distribution of wind data for the target location [39–45], but all of these consider one single reference station. Nevertheless, some methods take into account multiple reference stations [46–50]. Among such different MCP methods, designing a wind farm would require the proper selection of a method that better suits the datasets for the target and reference stations, since different methods differ from each other on the relationship they establish between the datasets representing the target and reference locations [25]. Hence, comparing distinct methods is necessary before electing a single one to proceed with the WRA. Several studies worked on this matter of comparing different MCP methods according to metrics such as mean absolute error, mean absolute relative error or mean squared error [51–61]. Nonetheless, there is no consensus about one single best MCP model, reinforcing the necessity of testing several of them for every set of wind data.
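The error metrics mentioned above can be illustrated with a minimal sketch. The wind speed values and the two candidate "methods" below are hypothetical and only show how two predictions with the same mean absolute error can still be ranked differently by the root mean squared error.

```python
import math

def mae(predicted, observed):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences."""
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))

# Hypothetical wind speeds (m/s): target measurements vs. two candidate predictions.
observed = [6.0, 7.0, 8.0, 9.0]
candidates = {
    "A": [6.5, 7.5, 7.5, 8.5],   # an error of 0.5 m/s on every record
    "B": [6.0, 7.0, 8.0, 11.0],  # a single error of 2.0 m/s
}

# (MAE, RMSE) per candidate method.
scores = {name: (mae(pred, observed), rmse(pred, observed))
          for name, pred in candidates.items()}
best_by_rmse = min(scores, key=lambda name: scores[name][1])
```

Because RMSE penalizes large individual errors more strongly than MAE, reporting both gives a fuller picture when methods are compared.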
To perform the MCP-based historical characterization of the wind regime represented by D1, D2 and D3, this research used Windographer, a software package widely used by the wind power industry. This software can run tests with the datasets and automatically compare the errors (mean absolute error (MAE) and root mean squared error (RMSE)) of methods such as Linear Least Squares [16], Matrix Time Series [44], SpeedSort [33], Total Least Squares [32], Variance Ratio [16], Vertical Slice [34,36] and Weibull Fit [38]. Because it produced relatively smaller errors, the Vertical Slice method was selected to proceed with the MCP calculation and to synthesize long-term time series of wind data. In this method, the scatter plot of paired values (wind speed at the target location versus concurrent wind speed at the reference location) is divided into equally sized vertical strips. The algorithm then calculates, for each strip, the mean target wind speed and the mean reference wind speed, and marks the resulting pair on the diagram. The fit is obtained by connecting consecutive pairs with straight line segments, the first of which starts at (0, 0) [25,34,36].
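The Vertical Slice procedure described above can be sketched as follows. This is an illustration, not Windographer's implementation: the function names, the number of strips, the handling of empty strips, and the extrapolation beyond the last knot along the final segment are all assumptions, and the data are synthetic.

```python
from bisect import bisect_right

def fit_vertical_slice(ref, tgt, n_slices=4):
    """Return the knots (x, y) of a piecewise-linear fit that starts at (0, 0)."""
    width = max(ref) / n_slices
    sums = [[0.0, 0.0, 0] for _ in range(n_slices)]  # sum of ref, sum of tgt, count
    for r, t in zip(ref, tgt):
        i = min(int(r / width), n_slices - 1)  # the maximum falls in the last strip
        sums[i][0] += r
        sums[i][1] += t
        sums[i][2] += 1
    knots = [(0.0, 0.0)]
    for sum_r, sum_t, n in sums:
        # Skip empty strips and any knot that would not advance along the x axis.
        if n and sum_r / n > knots[-1][0]:
            knots.append((sum_r / n, sum_t / n))
    return knots

def predict(knots, r):
    """Interpolate along the knots; beyond the last knot, extend the last segment."""
    xs = [x for x, _ in knots]
    i = min(max(bisect_right(xs, r), 1), len(knots) - 1)
    (x0, y0), (x1, y1) = knots[i - 1], knots[i]
    return y0 + (y1 - y0) * (r - x0) / (x1 - x0)

# Synthetic concurrent period: the target is exactly 1.2 times the reference.
ref_short = [1, 2, 3, 4, 5, 6, 7, 8]
tgt_short = [1.2 * r for r in ref_short]
knots = fit_vertical_slice(ref_short, tgt_short)

# Apply the fit to a (hypothetical) long-term reference series.
long_term_ref = [3.0, 5.0, 10.0]
synthesized = [predict(knots, r) for r in long_term_ref]
```

With perfectly proportional data every knot lies on the same line, so the synthesized values simply reproduce the 1.2 ratio; with real data each segment has its own slope.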
The MCP should be executed with data recorded at similar measurement heights for both the target and reference locations, so that the synthesized data can later be adjusted to the wind turbine hub height [25,62]. Thus, the Vertical Slice method was applied to extend back in time the time series measured at 42 m (for both D1 and D2), using as reference the wind data from D3, which are given at 50 m above ground level. The original data of D1 and D2 (six years of wind speed measurements each) were cut into sub-datasets, providing short-term time series of different lengths that simulate wind monitoring campaigns of different durations. In total, five short-term time series were analyzed for each target location dataset, D1 and D2, differing in their time spans (N_T, the integer number of years of measurement), which ranged from 2 to 6 years. The 10 resulting MCP-synthesized long-term time series contain wind speed data corresponding to measurements at 42 m in height, preserving the original measurement height from D1 and D2. All the data were then vertically extrapolated (using Equations (1) and (2) [6–8]) to the turbine hub height, taken as 81 m above ground level. In Equations (1) and (2), V(h) is the average wind speed for a given height h, β is a constant intrinsic to the local vertical wind profile and α is the power law exponent, also intrinsic to the local vertical wind profile. The characterization of the wind speed vertical profile at the target location (represented by D1 and D2) is presented in Table 2. The schematic view of the process described in Section 2.2 is shown in Figure 2.
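Equations (1) and (2) are not reproduced in this text. A form consistent with the symbols defined above is the power-law wind profile; the expressions below are an assumption based on those definitions, not a transcription of the original equations, and h_1 and h_2 are illustrative names for the measurement and hub heights.

```latex
V(h) = \beta \, h^{\alpha},
\qquad
V(h_2) = V(h_1) \left( \frac{h_2}{h_1} \right)^{\alpha}
```

Under this form, extrapolating from the 42 m measurement height to the 81 m hub height multiplies the wind speed by (81/42)^α, so the result depends only on the exponent α fitted to the local profile.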

Long-Term Wind Resource Uncertainty
The uncertainty associated with the long-term historical characterization of wind resources at the target location (σ_H, the historical uncertainty) concerns the year-on-year variability of the wind speed [19]. Equation (3), suggested in [7], is used to calculate σ_H, where σ_R is the standard deviation of the yearly averaged wind speeds at the reference location; r² is the squared Pearson correlation between the concurrent wind speed data from the target and reference locations, provided that both are averaged over the same interval; and N_R and N_T are, respectively, the integer number of years of measurement of the reference database (D3) and the integer number of years of concurrent data between the target database (D1 or D2) and D3.
Given that D3 contains wind speed data from 1985 to 2014, N_R is fixed at 30. N_T, however, varies from 2 to 6, since D1 and D2 contain wind data from 2009 to 2014. N_T indirectly represents the length of the wind monitoring campaign; therefore, for each pair of time series considered in the calculation of σ_H, five different periods were explored, as mentioned before. r² was also affected by the duration of the wind monitoring campaign, because it depends on the quantity of data contained in the target location time series, as shown in Table 3.
Table 3. Squared Pearson correlation (r²) between data from D1 or D2 at different heights and different wind monitoring campaign lengths in combination with data from D3.
The original data from D1 and D2 were hourly averaged to match D3's 60 min recording interval. The historical uncertainty was then calculated for each time series length (at the target location) and each measurement height. For each of the three measurement heights of D1 and D2 and each of the five measurement periods, one value of σ_H resulted, giving a total of 30 values, as seen in Figure 3.
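The sensitivity of the historical uncertainty to the campaign length can be sketched numerically. Equation (3) is not reproduced in this text, so the sketch assumes the form σ_H = σ_R · sqrt(r²/N_R + (1 − r²)/N_T), which is consistent with the symbols defined above but should be checked against [7]. The values of σ_R and r² are hypothetical, and r² is held fixed here although in the study it also changes with the campaign length.

```python
import math

def historical_uncertainty(sigma_r, r2, n_r, n_t):
    """Assumed form: sigma_H = sigma_R * sqrt(r2 / N_R + (1 - r2) / N_T)."""
    return sigma_r * math.sqrt(r2 / n_r + (1.0 - r2) / n_t)

SIGMA_R = 0.05  # hypothetical: 5% interannual standard deviation at the reference
R2 = 0.7        # hypothetical squared Pearson correlation, held fixed for simplicity
N_R = 30        # years covered by D3 (1985 to 2014), as stated in the text

# One value of sigma_H per simulated campaign length (2 to 6 years).
sigma_h = {n_t: historical_uncertainty(SIGMA_R, R2, N_R, n_t) for n_t in range(2, 7)}

# Relative reduction with respect to the 2-year campaign.
reduction_vs_2y = {n_t: 1.0 - sigma_h[n_t] / sigma_h[2] for n_t in sigma_h}
```

Under this assumed form, the term in 1/N_T dominates whenever N_T is much smaller than N_R, which is why each additional year of measurement lowers σ_H, with diminishing returns.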

Wind Power Output Estimation
The estimation of wind farm power production is linked to the uncertainties identified and quantified during the WRA. In Brazil, regulation requires that projects willing to participate in an energy auction adopt a probability of occurrence of 90% (P90) for the total energy production estimated for their wind farms throughout their lifespans [22]. Equation (4) indicates how those calculations are made, where E represents the maximum energy the wind farm can deliver to the grid; L_B and L_W are the losses due to blade degradation and wake effect, respectively; E_t is the energy output of a wind turbine; P90 is a constant related to the normal distribution for the 90% probability of occurrence and is equal to 1.28155; OU is the overall uncertainty; FU is the forced unavailability of the wind farm; PU is the programmed unavailability of the wind farm; and CL is the estimated internal consumption and electrical losses from the wind farm to its connection with the grid [22].
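The origin of the constant 1.28155 can be checked with a short sketch. It does not reproduce Equation (4), which also includes losses and unavailability. It only assumes, as is common practice, that the annual energy is normally distributed around a central estimate with a relative standard deviation equal to OU; the function name and the sample OU values are hypothetical.

```python
from statistics import NormalDist

# z-value exceeded with 10% probability under a standard normal distribution.
Z_90 = NormalDist().inv_cdf(0.90)

def p90_factor(overall_uncertainty, z=Z_90):
    """Multiplier applied to the central estimate under the normal assumption."""
    return 1.0 - z * overall_uncertainty

# Hypothetical overall uncertainties: a lower OU gives a higher P90 multiplier.
factor_high_ou = p90_factor(0.15)
factor_low_ou = p90_factor(0.12)
```

This is the mechanism by which, taken in isolation, a reduction in the overall uncertainty raises the energy estimate at the 90% probability level.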
The OU is calculated by Equation (5), where u is an uncertainty of type k and U is the total number of uncertainties identified during the WRA [19]. All the parameters used for calculating OU are displayed in Table 4, and their values were chosen arbitrarily within a typical range [63,64]. E_t is obtained via Equation (6), which takes into account the characteristics of the wind turbine (C_p is the power coefficient and A is the circular area swept by the turbine blades), the instantaneous air density ρ (given in D1 and D2), the wind speed at the turbine height (V, given by the resulting MCP-synthesized time series extrapolated to 81 m), and the number of hours in a year (8760) [65].
Table 4. Losses and uncertainties (columns: considered value [%]; typical range [%]).
Each of the uncertainty types (k) presented in Table 4 has its own degree of importance and a typical range that varies according to the specifications of the target location. Altogether, they constitute the overall uncertainty, which directly influences the power output estimation of a wind farm. The subject selected for scrutiny, however, was the uncertainty related to the historical characterization of wind resources, and one of the goals of this work is to assess the sensitivity of the wind power output estimation to different values of this uncertainty factor, which, in turn, are sensitive to the wind monitoring campaign timespan. The process of energy output estimation considering all the variables addressed in this research is illustrated in Figure 4. Other influencing factors, such as turbulence, were disregarded in this analysis. Although D1, D2 and D3 seem to be located on terrain of relevant roughness (Figure 1), which could provoke enough turbulence to disturb wind turbine dynamics [66,67], this work does not propose a general model of the factors that influence the wind farm power output estimation. It addresses one single uncertainty factor (yellow box in Figure 4) among several (orange box in Figure 4), associated with the increase in the wind monitoring campaign duration.
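The aggregation of uncertainty types and the per-turbine energy can be sketched as follows. Equations (5) and (6) are not reproduced in this text, so both functions are assumptions: the first uses the usual root-sum-square combination of independent uncertainties, and the second uses the standard kinetic power expression 0.5·ρ·A·C_p·V³ with a constant C_p. The component values, rotor diameter and C_p are hypothetical. A real estimate would use the turbine power curve, which limits output to the rated power.

```python
import math

def overall_uncertainty(components):
    """Root-sum-square combination, assuming independent uncertainty types."""
    return math.sqrt(sum(u ** 2 for u in components))

def annual_energy_mwh(speeds, densities, cp, rotor_diameter_m, hours=8760):
    """Mean of 0.5 * rho * A * Cp * V^3 over the series, times the hours in a year."""
    area = math.pi * (rotor_diameter_m / 2.0) ** 2
    powers_w = [0.5 * rho * area * cp * v ** 3 for v, rho in zip(speeds, densities)]
    mean_power_w = sum(powers_w) / len(powers_w)
    return mean_power_w * hours / 1e6  # Wh converted to MWh

# Hypothetical components: historical uncertainty plus one other type.
ou = overall_uncertainty([0.03, 0.04])

# Hypothetical records: the cubic dependence makes energy very sensitive to speed.
energy_at_4 = annual_energy_mwh([4.0], [1.2], cp=0.4, rotor_diameter_m=80.0)
energy_at_8 = annual_energy_mwh([8.0], [1.2], cp=0.4, rotor_diameter_m=80.0)
```

In a root-sum-square combination, reducing a single component lowers the total by less than the same proportion, which helps explain why a sizeable drop in the historical uncertainty translates into a smaller change in OU.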

Results and Discussion
In each of the six combinations between time series from the target location (at their respective measurement heights) and time series from the reference location, the historical uncertainty clearly tended to decrease as the wind monitoring campaign lengthened. Relative to the 2-year duration, the cases showed average reductions of 18%, 29%, 35%, and 40% when, respectively, one, two, three, and four years were added to the monitoring campaign, as observed in Table 5.

The long-term wind resource uncertainty enters the OU as a single type of uncertainty among those presented in Table 4. When aggregated in Equation (5), all these uncertainty types result in the OU. The calculation of OU is associated with the long-term time series synthesized by the MCP Vertical Slice method and took into consideration sub-datasets from D1 or D2 (both only at 42 m, since these were the sub-datasets input to the MCP analysis) as well as D3. The values of the historical characterization uncertainty thus varied with the duration of the wind monitoring campaign and, in combination with the fixed values considered for the other uncertainty types, resulted in different values of OU, as shown in Table 6.
The MCP-synthesized time series were then used to estimate the wind power production of a hypothetical wind farm of 15 wind turbines. The estimation was made with a Windographer tool that automatically evaluates Equation (6). A 2 MW wind turbine with a hub height of 80 m above ground level (Alstom ECO 80/2000 Class II) was chosen arbitrarily; this hub height is approximately the height of the extrapolated MCP-synthesized data. Its power curve as a function of air density is shown in Figure 5 [65]. A wind farm with 15 of these turbines would therefore have an installed capacity of 30 MW. The results are shown in Table 7 and Figure 6.
The values in Table 7 and Figure 6 show that increasing the duration of the wind monitoring campaign reduced the estimated energy generated by the hypothetical wind farm in all the cases analyzed. Consequently, the capacity factors also decreased. These results suggest that longer periods of wind data collection are more representative of the yearly variability to which each wind regime is subject.
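The link between estimated energy and capacity factor can be made explicit with a minimal sketch. The function name and the annual energy figure are hypothetical; only the farm layout (15 turbines of 2 MW) and the 8760 hours per year come from the text.

```python
def capacity_factor(annual_energy_mwh, n_turbines=15, rated_mw=2.0, hours=8760):
    """Ratio of estimated energy to the energy at continuous rated output."""
    installed_mw = n_turbines * rated_mw  # 30 MW for the hypothetical farm
    return annual_energy_mwh / (installed_mw * hours)

# Hypothetical annual estimate of 105,120 MWh for the 30 MW farm.
cf = capacity_factor(105_120.0)
```

Since the installed capacity is fixed, the capacity factor is directly proportional to the energy estimate, so any reduction in estimated energy lowers it by the same proportion.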
Conversely, the reduction of the historical uncertainty (and thus of the overall uncertainty) acts in the opposite direction, since in isolation it increases the estimated energy production. Yet this effect is outweighed by that of using larger datasets, which can ultimately provide an energy estimate that is less prone to error and thus more reliable.

Conclusions
This study shows that a longer wind monitoring campaign may allow a better interpretation of the variability in the availability of wind resources at a given target location when the MCP technique is applied. Hence, it is possible to reduce the uncertainty related to the historical characterization of the wind regime at that location, contributing to a lower overall uncertainty of a WRA. Under the recent increase from 24 to 36 months in the minimum length of wind data recording required for a wind farm project to participate in an energy auction in Brazil, the uncertainty related to the historical characterization of wind resources for the data used in this research could be reduced by approximately 18%.
This reduction in the group of uncertainties contributes to an increase in the power output estimate of a wind farm, owing to the lower overall uncertainty. Taken alone, this could lead to an overestimation, which, in turn, may affect the financial structure of a wind farm project. Nevertheless, for the data considered in this research, larger datasets led to a reduction in the estimated energy generation of the wind farm, even after accounting for the reduction in overall uncertainty.
Recognizing the outcomes that should follow this extension of the minimum length of the wind measurement campaign is an important contribution to the state of the art of the WRA. The recent Brazilian regulation appears to help enhance a WRA, resulting in wind power output estimates that are more accurate and less prone to overestimation. Even if the WRA ends up underestimating energy production, the electric sector as a whole may benefit in terms of reliability and of the guaranteed power expected to be delivered to the grid. For that reason, the findings of this paper should help wind power industry stakeholders understand the changes in the Brazilian wind power sector guidelines, and may also inform decision-making in other countries and their power sectors' regulatory frameworks. Lastly, extending the wind monitoring campaign adds expenses to a project, so it is not financially sound to extend it indefinitely. The optimum duration, considering costs and accurate power generation estimates, is an interesting topic for further discussion.

Appendix A
Access to an extra database for a second target location (situated close to the hydropower plant of Porto Primavera, in the state of São Paulo, Brazil) was granted in an attempt to generalize the results of this research. The database consisted of anemometric wind speed and direction recordings for a period of 24 months at a height of 10 m above ground level. The information in it, however, was insufficient to perform certain steps of the WRA, such as the vertical extrapolation and the MCP analysis. The former would require measurements of wind speed at different heights so that the vertical profile could be outlined properly. As for the latter, long-term reference data (given by MERRA reanalysis) for a location about 50 km from that target location presented a squared Pearson correlation of 0.17 between reference and target wind speed data, which is considered too low for the MCP analysis to be reliable [25]. Moreover, since the emphasis of this research was to explore the influence of wind monitoring campaigns with durations of two or more whole years, this database was disregarded, as it comprised only 24 months of measurement. The information on both target and reference databases is presented in Table A1.