Forecasting the Long-Term Wind Data via Measure-Correlate-Predict (MCP) Methods

The current study aims to forecast and analyze wind data, such as wind speed, at a test site called “Urumsill” on Deokjeok Island, South Korea. Measured wind data are available at the test site for only two years (2015 and 2016), which makes it impossible to analyze the long-term wind characteristics directly. To overcome this problem, two measure-correlate-predict (MCP) techniques were adopted using long-term wind data (2000–2016) measured by a meteorological mast (met-mast) installed at a distance of 3 km from the test site. The wind data measured at the test site in 2016 were selected as training data to build the MCP models, whereas the wind data of 2015 were used as test data to assess the accuracy of the MCP models. The wind data at both sites were measured at a height of 10 m and showed good agreement for the year 2016 (training period). Using the comparison results of the year 2016, wind speed predictions were made at the test site for all of the years considered (2000–2016). The forecasted values of wind speed had a maximum relative error in the range of ±0.8 m/s for the test year of 2015. The predicted wind data were further analyzed by estimating the mean wind speed and the Weibull shape and scale parameters, on a seasonal and an annual basis, in order to understand the wind behavior in the region. The accuracy of the forecasted wind data and the possible sources of error are also discussed.


Introduction
Wind energy is one of the cleanest sources for electricity production in terms of greenhouse gas (GHG) emissions. The development of new wind farms, both on-shore and off-shore, has increased rapidly over the last two decades [1]. Wind farm development is a complex process that requires a great deal of experience, analysis, and pre-feasibility studies of the selected site. Wind farm developers usually require long-term measured wind data in order to design and plan wind farms at a particular site. However, the non-availability of long-term "good quality" measured wind data hinders this process [2]. It is impractical, and in most cases nearly impossible, to obtain measured long-term wind data at every planned wind farm site [3]. Many researchers have pointed out that short-term wind data sets are insufficient to predict the behavior of wind over the entire life span of a wind farm (usually 20 years) [4][5][6]. One way to address this challenge is to use computer-simulated wind data for the planned wind farm sites [5]. However, there are major issues with the availability and accuracy of such wind data. In this scenario, measure-correlate-predict (MCP) methods could be an alternative.

Wind Data Collection at Two Sites
Prior to applying MCP methods to the data sets, it should be ensured that the two sites are neither too far from nor too close to each other. Figure 1 was prepared to provide an understanding of the exact position and other geographical characteristics of the two sites. As can be seen, Urumsill (test site) is at least 3 km from the met-mast position (reference site) on Deokjeok Island. The height difference between the two sites is approximately 60 m (test site: 80 m above sea level (ASL); reference site: 20 m ASL), and the mountains between the two sites rise to more than 100 m ASL. The test site is situated at a higher altitude and in a steeper area than the reference site. In fact, the met-mast at the reference site is installed on a very flat surface, whereas the "wind master" tower at the test site is surrounded by steep mountains (see Figure 2a,b). The green color in Figure 1 represents the sea surface, and north is taken as the zero degree of the wind angle. This convention is used frequently in subsequent sections. Table 1 presents the important information about the wind data sets collected at the two sites. Originally, the wind data were recorded at one-second intervals at both sites. However, due to the large amount of data, the original data were averaged over 10 min intervals, and the standard deviation from the mean value of each parameter is also stated in Table 1. Figure 2a shows the met-mast installed by the KMA (Korea Meteorological Administration) to capture the weather data at the reference site, and Figure 2b shows a vertical tower called the "wind master", used to record wind data at the test site. Anemometers were installed on both towers at the same height of 10 m, and the data averaging time interval was 10 min in both cases. The wind data used in the current study were recorded for seventeen years (2000–2016) by the met-mast, whereas only two years (2015 and 2016) of wind data were available at the test site.
Automatic weather systems (AWS) were designed and installed at both sites in accordance with the regulations of the WMO (World Meteorological Organization). Different sensors were attached to the AWS tower to measure wind direction, wind speed, and temperature (see Table 1). All the collected data were transferred to a data logger connected to the main computer, which stored the data and distributed it automatically when instructed. Figure 2c explains the overall process of wind data collection and storage in the computer.
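The reduction of the one-second records to 10 min averages, together with the standard deviation of each interval, can be sketched as follows. This is only an illustrative sketch: the function name `block_statistics`, the choice of the population standard deviation, and the decision to discard a trailing incomplete interval are assumptions, not details taken from the measurement system.

```python
from statistics import mean, pstdev


def block_statistics(samples, block_size=600):
    """Return (mean, standard deviation) for each full block of samples.

    With 1 s sampling, block_size=600 corresponds to a 10 min interval.
    Assumption: a trailing incomplete block is discarded.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(samples) // block_size
    stats = []
    for b in range(n_blocks):
        block = samples[b * block_size:(b + 1) * block_size]
        stats.append((mean(block), pstdev(block)))
    return stats


# Illustrative values only (not data from the study); a short block
# size keeps the example readable.
raw_speeds = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
averaged = block_statistics(raw_speeds, block_size=3)
```

In this toy example the seventh sample does not fill a complete block and is therefore not included in `averaged`.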

MCP Methods
MCP algorithms compare the measured wind data sets of two sites (test and reference) over a concurrent time period (usually known as the training period), in order to establish a relationship between the two data sets. Such comparisons lead to a mathematical model that links the wind speeds at the two sites. Finally, this model is applied to the long-term reference data to predict the wind data at the test site. There are many types of MCP models (e.g., linear regression, matrix method, Weibull scale, and wind index) [18]. These models have been used extensively by researchers around the world to forecast long-term wind data. For instance, Ko and Huh [19] used an MCP method to predict the long-term wind data at a site called "Seongsan" on Jeju Island, South Korea. They discussed two MCP methods, linear and matrix, but used only the matrix method for analysis due to its robustness [20]. Sreevalsan et al. [21] adopted the linear MCP methodology to estimate the long-term wind data at a test site named Majlahati. They also applied fast Fourier transform (FFT) analysis in their study to obtain the results. The linear regression model is flexible and robust, and thus was deployed in the current study.

Method 1: The Linear Regression Model
The classical regression type of MCP technique was adopted in the current study to forecast wind speed data at the test site. Such methods compare the wind data of two sites (i.e., the test site and the reference site) over a concurrent period of time. A best-fit polynomial curve of degree n is drawn between the two data sets. In the current case, the linear regression model was considered, as it gives reasonable fits for wind energy estimations. The mathematical form of such a regression model is as follows:
$$V_{TS} = m\,V_{RS} + a \qquad (1)$$
where $V_{TS}$ and $V_{RS}$ are the wind speeds at the test site (Urumsill) and the reference site (met-mast site), respectively, and $m$ and $a$ are the slope and the y-intercept of the best-fitting straight line. The values of the slope and the y-intercept were determined using the results of Figure 4.
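As a minimal sketch of this step, the slope and the y-intercept of the straight line relating the test-site and reference-site wind speeds can be obtained by ordinary least squares on the concurrent (training) samples and then applied to the long-term reference series. The function names and the sample values are hypothetical and serve only as an illustration; they are not the values fitted in this study.

```python
from statistics import mean


def fit_linear_mcp(v_rs, v_ts):
    """Least-squares fit of V_TS = m * V_RS + a on concurrent samples."""
    if len(v_rs) != len(v_ts) or len(v_rs) < 2:
        raise ValueError("need two equally long series with at least two samples")
    mean_rs, mean_ts = mean(v_rs), mean(v_ts)
    s_xy = sum((x - mean_rs) * (y - mean_ts) for x, y in zip(v_rs, v_ts))
    s_xx = sum((x - mean_rs) ** 2 for x in v_rs)
    if s_xx == 0:
        raise ValueError("reference wind speeds are constant; slope is undefined")
    m = s_xy / s_xx
    a = mean_ts - m * mean_rs
    return m, a


def predict_linear(v_rs, m, a):
    """Apply the fitted line to reference-site wind speeds."""
    return [m * v + a for v in v_rs]


# Illustrative values only (not data from the study).
train_rs = [1.0, 2.0, 3.0, 4.0]   # reference site, training period
train_ts = [1.5, 2.5, 3.5, 4.5]   # test site, same time stamps
m, a = fit_linear_mcp(train_rs, train_ts)
long_term_ts = predict_linear([2.5, 5.0], m, a)
```

Note that a straight line with a non-zero intercept can produce small negative wind speeds at very low reference speeds; whether and how such values are clipped is a design choice that this sketch leaves open.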

Method 2: The Variance Ratio Model
The linear regression MCP model works on the basis of mean wind speeds. That is, it predicts a mean wind speed nearly equal to the measured mean wind speed at the test site over a concurrent time period. However, the variance of the wind speeds predicted by such a technique is usually less than the measured variance. This creates uncertainties, biases, and errors in the predicted wind data. In order to tackle these issues, the variance ratio method was introduced [22], and it can be defined mathematically as follows:
$$V_{TS} = \left(\bar{v}_{TS} - \frac{\sigma_{TS}}{\sigma_{RS}}\,\bar{v}_{RS}\right) + \frac{\sigma_{TS}}{\sigma_{RS}}\,V_{RS} \qquad (2)$$
where $\bar{v}$ and $\sigma$ are the mean wind speed and the standard deviation of the wind speed, respectively, and the subscripts TS and RS denote the test site and the reference site.
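The variance ratio idea can be sketched in a few lines: the slope is set to the ratio of the standard deviations of the two sites, and the offset is chosen so that the predicted mean matches the measured mean over the training period. The function names and sample values below are hypothetical. The population standard deviation is used here as an assumption; because both series have the same length, the sample standard deviation would give the same ratio.

```python
from statistics import mean, pstdev


def fit_variance_ratio(v_rs, v_ts):
    """Return (slope, offset) of the variance ratio MCP model.

    slope  = sigma_TS / sigma_RS
    offset = mean_TS - slope * mean_RS
    """
    if len(v_rs) != len(v_ts) or len(v_rs) < 2:
        raise ValueError("need two equally long series with at least two samples")
    sigma_rs = pstdev(v_rs)
    if sigma_rs == 0:
        raise ValueError("reference wind speeds are constant; ratio is undefined")
    slope = pstdev(v_ts) / sigma_rs
    offset = mean(v_ts) - slope * mean(v_rs)
    return slope, offset


def predict_variance_ratio(v_rs, slope, offset):
    """Apply the variance ratio model to reference-site wind speeds."""
    return [offset + slope * v for v in v_rs]


# Illustrative values only (not data from the study).
train_rs = [2.0, 3.0, 5.0, 4.0]
train_ts = [1.0, 2.5, 3.0, 3.5]
slope, offset = fit_variance_ratio(train_rs, train_ts)
hindcast = predict_variance_ratio(train_rs, slope, offset)
```

By construction, `hindcast` has the same mean and the same standard deviation as `train_ts`, which is the property that distinguishes this model from the least-squares line.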

Estimation of Errors in the Predicted Wind Data
In order to measure the variability in the estimated wind speed, two statistical parameters were introduced (i.e., the coefficient of variation, CV, and the range of variation, RV). CV determines the fluctuations in the estimated wind data around a mean value, whereas RV indicates the extremes of the wind data time series. They are calculated as follows.
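As a rough sketch of how these two quantities might be computed, the example below assumes the common definition CV = σ/v̄ and takes RV as the pair of relative deviations (v − v̄)/v̄ evaluated at the minimum and maximum of the series. The RV definition in particular is an assumption, chosen because it yields a lower bound of −1 for calm conditions, which is consistent with the RV range reported in the Conclusions. The function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(speeds):
    """CV = standard deviation / mean (assumed definition)."""
    v_mean = mean(speeds)
    if v_mean == 0:
        raise ValueError("mean wind speed is zero; CV is undefined")
    return pstdev(speeds) / v_mean


def range_of_variation(speeds):
    """RV as (lower, upper) relative deviations from the mean (assumed definition)."""
    v_mean = mean(speeds)
    if v_mean == 0:
        raise ValueError("mean wind speed is zero; RV is undefined")
    return ((min(speeds) - v_mean) / v_mean, (max(speeds) - v_mean) / v_mean)


# Illustrative values only (not data from the study).
speeds = [0.0, 1.0, 2.0, 5.0]
cv = coefficient_of_variation(speeds)
rv = range_of_variation(speeds)
```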
In order to measure the accuracy of the predicted wind data, three statistical parameters were introduced: the ratio of mean wind speeds ($R_v$), the ratio of wind speed variances ($R_{\sigma^2}$), and the maximum absolute error (MAE) [22], calculated as follows:
$$R_v = \frac{\bar{v}_{est}}{\bar{v}_{mes}}, \qquad R_{\sigma^2} = \frac{\sigma^2_{est}}{\sigma^2_{mes}}, \qquad \mathrm{MAE} = \max_i \left| v_{est,i} - v_{mes,i} \right|$$
where $v_{est}$ and $v_{mes}$ are the estimated and measured wind speeds, respectively. Apart from the above-mentioned parameters, additional error measures, such as the bias error, can also be used to estimate the accuracy of the forecasted wind data.
The values of all of these parameters were estimated by comparing the forecasted wind data of the year 2015 with the measured test-site wind data of the same year.
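The accuracy parameters named above can be sketched as follows for a pair of concurrent series. The ratios are taken as estimated over measured, and the bias is taken as the mean of the differences between estimated and measured wind speeds; both conventions, as well as the function name `accuracy_metrics`, are assumptions made for illustration only.

```python
from statistics import mean, pvariance


def accuracy_metrics(v_est, v_mes):
    """Compare estimated and measured wind speeds sample by sample."""
    if len(v_est) != len(v_mes) or not v_est:
        raise ValueError("need two non-empty series of equal length")
    errors = [e - m for e, m in zip(v_est, v_mes)]
    return {
        "R_v": mean(v_est) / mean(v_mes),             # ratio of mean wind speeds
        "R_var": pvariance(v_est) / pvariance(v_mes),  # ratio of variances
        "MAE": max(abs(err) for err in errors),        # maximum absolute error
        "bias": mean(errors),                          # assumed: mean difference
    }


# Illustrative values only (not data from the study).
estimated = [2.0, 3.0, 4.0]
measured = [1.0, 3.0, 5.0]
metrics = accuracy_metrics(estimated, measured)
```

In this toy case the means agree (ratio of 1 and zero bias) while the estimated variance is only a quarter of the measured variance, which illustrates why a mean-based check alone is not sufficient.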

Estimation of Weibull Parameters
Once the wind data time series has been forecasted, it is useful to analyze the data so that the wind potential at the test site can be estimated. Probability density functions (PDFs) are frequently used by researchers and scientists to estimate the wind potential. Over the past few years, many researchers have tried different techniques. However, previous studies indicate that Weibull distribution models are the most suitable for the estimation of wind potential [23][24][25]. After comparing 12 probability density models, Carta et al. [26] concluded that the Weibull model has several advantages over other wind speed distribution models. Corotis et al. [27] compared the performance of Chi-squared and Weibull distributions in fitting the observed wind speed and power histograms. They concluded that both distributions can accurately predict the wind potential, but that the Weibull distribution performs better. Hennessey [28] stated that, along with providing an accurate representation of the wind speed distribution, the Weibull model can also easily estimate wind potential. Deaves and Lines [29] used a Weibull density function to fit low wind speed data, and they concluded that the Weibull model was applicable over the complete range of wind speed. The Weibull model has thus been widely applied by researchers involved in wind speed data analysis. Therefore, a Weibull PDF is used in this study to estimate the wind potential and to analyze wind characteristics. For this purpose, the Weibull probability density function (PDF) and the cumulative distribution function (CDF) were defined as follows, respectively [30]:
$$f(v) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1}\exp\left[-\left(\frac{v}{c}\right)^{k}\right]$$
$$F(v) = 1 - \exp\left[-\left(\frac{v}{c}\right)^{k}\right]$$
$k$ and $c$ are called the Weibull shape and scale parameters, respectively. The accurate estimation of these two parameters is of immense importance in wind energy analysis. There are numerous methods to calculate $k$ and $c$; the current study adopts five methodologies to estimate these parameters [31], including:
• maximum likelihood method (MLM)
• power density (PD) method
For the PD method,
$$E_{pf} = \frac{\overline{v^3}}{\bar{v}^3}, \qquad k = 1 + \frac{3.69}{E_{pf}^2}$$
where $\overline{v^3}$ is the mean of the cube of the wind speed, $\bar{v}^3$ is the cube of the mean wind speed, and $E_{pf}$ is called the energy pattern factor, whereas $c$ can be determined from Equation (17).
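A minimal sketch of the power density method is given below. The energy pattern factor follows the definition in the text (mean of the cubed wind speed divided by the cube of the mean wind speed). The relation k = 1 + 3.69/E_pf² is the form commonly quoted for this method, and the scale parameter is obtained here from c = v̄/Γ(1 + 1/k); both are stated as assumptions, since the corresponding equations of the study are not reproduced in this sketch. The function name and sample values are hypothetical.

```python
import math
from statistics import mean


def weibull_power_density(speeds):
    """Estimate Weibull (k, c) with the power density method.

    Assumptions: k = 1 + 3.69 / E_pf**2 and c = mean / Gamma(1 + 1/k).
    """
    v_mean = mean(speeds)
    if v_mean <= 0:
        raise ValueError("mean wind speed must be positive")
    # Energy pattern factor: mean of cubes over cube of mean.
    e_pf = mean([v ** 3 for v in speeds]) / v_mean ** 3
    k = 1.0 + 3.69 / e_pf ** 2
    c = v_mean / math.gamma(1.0 + 1.0 / k)
    return k, c


# Illustrative values only (not data from the study).
k_pd, c_pd = weibull_power_density([1.0, 2.0, 3.0, 4.0])
```

For these sample values the energy pattern factor is 25/15.625 = 1.6, so the shape parameter evaluates to 1 + 3.69/2.56.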
In order to evaluate the performance of each of the five methods adopted, a root mean square error (RMSE) analysis was conducted. RMSE can be defined as below:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^2}$$
where $y_i$ is the actual value of the ith bin of wind speed, $x_i$ is the value forecasted by the Weibull method, and $n$ is the total number of wind speed data points in a particular wind data time series.
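The RMSE comparison between an observed wind speed histogram and a fitted Weibull curve can be sketched as follows. The bin centers, observed frequencies, and Weibull parameters are made-up illustrative numbers, and the helper names are hypothetical; the sketch assumes positive bin centers and a bin width of 1 m/s, so that the PDF value at the bin center approximates the bin frequency.

```python
import math


def weibull_pdf(v, k, c):
    """Weibull probability density; v is assumed to be positive."""
    if v < 0:
        return 0.0
    return (k / c) * (v / c) ** (k - 1) * math.exp(-((v / c) ** k))


def weibull_cdf(v, k, c):
    """Weibull cumulative distribution function."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))


def rmse(actual, forecast):
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(y - x) ** 2 for y, x in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only (not data from the study).
bin_centers = [0.5, 1.5, 2.5, 3.5]          # m/s, 1 m/s wide bins
observed_freq = [0.2, 0.4, 0.3, 0.1]        # relative frequency per bin
k_fit, c_fit = 1.8, 2.0                     # hypothetical fitted parameters
fitted_freq = [weibull_pdf(v, k_fit, c_fit) for v in bin_centers]
fit_error = rmse(observed_freq, fitted_freq)
```

Repeating the last two lines for each estimation method and each year gives a table of RMSE values from which the method with the lowest error can be selected.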

Wind Data Predictions at Test Site
Prior to applying any MCP model to the met-mast data, it is highly recommended to first compare the time series wind data of both locations over a concurrent time period. The aim of this preliminary step was to determine whether the two wind data sets had similar patterns with respect to time. Figure 3 shows the season-wise results of such comparisons over a concurrent time period of five days. As can be seen in the figure, both data sets had similar patterns with respect to time, although wind data at the reference site had slightly higher magnitude. Therefore, generalizing the results of Figure 3, it can be concluded that MCP methods can be applied to the remainder of the wind data sets as well. Figure 4 shows the wind speed data comparison for both sites measured during the training year 2016. All the wind data presented in this figure are 10 min averages measured at a height of 10 m. A best-fit straight line was passed through the data in order to obtain the slope and the y-intercept of Equation (1). The value of the square of the correlation coefficient ($R^2$) was 0.86, which is considered to indicate a reliable fit for the wind data. Wind speed data at the test site seemed to have a slightly higher magnitude than that of the reference site.
After obtaining the values of the slope and the y-intercept from Figure 4, the following equation of a straight line can be obtained.
Using the results of Figure 4 and Equation (26), a wind data time series was forecasted at the test site for all the considered years (2000–2016) according to method 1. Similarly, in order to forecast wind data using method 2, Table 2 was prepared, which contains the values of the mean wind speeds and the standard deviations at both sites. Inserting the values of Table 2 into Equation (2), the following wind data forecasting equation was obtained for method 2.
After forecasting the seventeen years (2000–2016) of wind data at the test site using both methods, the accuracy of both methods was assessed. For this purpose, the measured wind data of 2015 at the test site were used as test data and compared with the forecasted data of the same year. Table 3 presents the month-wise values of CV and RV in the predicted wind data at the test site for the test year 2015. Experimental values of these variables are also presented so that the forecasted and measured data can be compared. From the results of Table 3, it is clear that method 2 (variance ratio) was more accurate than method 1 (linear regression). Therefore, the following sections contain results that were obtained by adopting method 2 only.

Forecasted vs. Measured Wind Data at Test Site (Test Year 2015 Only)
Before analyzing the forecasted wind data, it was useful to first compare it with the measured wind data at the test site. Doing so increased the confidence in the results obtained from the analysis of the forecasted wind data. Figure 5 shows the season-wise comparison of the predicted and the measured wind speed data at the test site. Because the measured data at the test site were available only for the years 2015 and 2016, Figure 5 was prepared using wind data of the year 2015 only (test data), for both the measured and the predicted case. As is clear from Figure 5, the wind data predictions showed good agreement with the measured data, so the forecasted wind data could be used for further analysis. Although the predicted wind data were slightly over-estimated, the minimum r-squared ($R^2$) value was 0.79, which is acceptable. Similarly, Table 4 displays the monthly values of the mean wind speeds, estimated using the measured and forecasted wind data for the year 2015 at the test site. Along with the mean wind speed, other statistical parameters ($R_v$, $R_{\sigma^2}$, MAE, etc.) are also listed in Table 4, so that the accuracy of the forecasted wind data can be assessed.
Furthermore, Figure 6 shows the season-wise plots of the relative error estimated in the forecasted wind data for the test year 2015 on an hourly basis. It is clear from Figure 6 that the forecasted values of wind speed had a maximum relative error in the range of ±0.8 m/s for the test year of 2015, which can be considered acceptable. Therefore, the results of Figures 5 and 6 and Table 4 suggest that the wind data forecasting was completed successfully and that the small amount of error can be neglected.
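The r-squared value used to judge the agreement between measured and forecasted wind speeds can be computed as the square of the Pearson correlation coefficient. The sketch below is a plain standard-library implementation with a hypothetical function name and made-up sample values; it is meant only to make the quantity concrete.

```python
from statistics import mean


def r_squared(x, y):
    """Square of the Pearson correlation coefficient of two series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two samples")
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (s_xy * s_xy) / (s_xx * s_yy)


# Illustrative values only (not data from the study).
measured = [1.0, 2.0, 3.0, 4.0]
forecasted = [1.0, 3.0, 2.0, 4.0]
agreement = r_squared(measured, forecasted)
```

Because this quantity is insensitive to a constant offset or scale factor between the two series, it is usually read together with the mean ratio and the error measures rather than on its own.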

Analysis of Predicted Wind Data
After forecasting the wind data and measuring its accuracy, this section presents some useful analysis and results, which were prepared using the forecasted wind data. Figure 7 shows the annual variation in the Weibull shape and scale parameters, estimated by all the methods adopted in the current study. The shape parameter showed smaller fluctuations than the scale parameter and had an overall mean value of approximately 1.8. On the other hand, the scale parameter dropped suddenly to a value of 1 m/s in 2009, and this behavior continued in the following years. Table 5 contains the annual RMSE values estimated by all methods. In an ideal case, the RMSE value would be zero, but this is not achievable in practice. The lowest RMSE value therefore indicates the most accurate method for estimating the Weibull parameters. From Table 5, it is clear that the empirical method produced the overall lowest values of RMSE. Therefore, all of the following results are presented using the Weibull values estimated by the empirical method. Figure 8 shows the season-wise plots of the Weibull PDF and CDF, prepared using all the forecasted wind data.
It can be noted that most of the wind speeds were in the range of 1–4 m/s, and 95% of the total winds were below 6 m/s, regardless of the season. Relatively lower wind speeds were observed during the winter and the summer than in the other seasons. The most frequently occurring wind speeds were 2 m/s in the winter and the summer and 3 m/s in the spring and the autumn. Similarly, Figure 9 shows the wind rose diagrams prepared using all the forecasted wind data on a seasonal basis. The wind rose diagrams summarize the wind characteristics for a specific time period. It is very important to assess the prevailing wind directions in order to determine a wind turbine layout that minimizes the wake loss. From the wind rose diagrams, it is clear that during all the seasons most winds came from either the north-east (NE) (45°) or the south-west (SW) (180–225°), with a maximum magnitude below 4 m/s, and mostly between 1 m/s and 3 m/s.
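The directional part of a wind rose can be sketched by assigning each wind direction to a compass sector and counting the relative frequency per sector. The example follows the convention that north is zero degrees; the use of eight 45° sectors centered on the compass points, the function names, and the sample directions are assumptions made for illustration.

```python
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def sector_of(direction_deg):
    """Map a wind direction in degrees (north = 0, clockwise) to a sector name."""
    width = 360.0 / len(SECTORS)
    # Shift by half a sector so that each sector is centered on its compass point.
    index = int(((direction_deg % 360.0) + width / 2.0) // width) % len(SECTORS)
    return SECTORS[index]


def sector_frequencies(directions_deg):
    """Relative frequency of wind directions per sector."""
    if not directions_deg:
        raise ValueError("need at least one wind direction")
    counts = Counter(sector_of(d) for d in directions_deg)
    total = len(directions_deg)
    return {name: counts[name] / total for name in SECTORS}


# Illustrative values only (not data from the study).
directions = [40.0, 50.0, 200.0, 230.0, 350.0]
frequencies = sector_frequencies(directions)
```

Combining these sector counts with wind speed classes (for example 1 m/s wide bins) gives the two-dimensional table that a wind rose diagram visualizes.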

Conclusions
The current study aimed to forecast and analyze the long-term wind data at a test site called "Urumsill" on Deokjeok Island, South Korea, at which only two years (2015 and 2016) of measured wind data were originally available. The reference wind data were recorded by a met-mast continuously since the year 2000, and that met-mast was located at a distance of 3 km from Urumsill. At the test site, measured wind data of 2016 were used as training data to build MCP models, whereas wind data of 2015 were used to measure the accuracy in the forecasted data (test data).
The measured wind data of both sites showed a similar pattern in wind speed and angle over a concurrent time period of five days during all four seasons of the year 2016. When the measured wind data of both sites were compared against each other for the year 2016, the r-squared ($R^2$) value was found to be 0.86. Similarly, on a monthly basis, the maximum value of CV was less than 0.712 and RV was found to be within the range [−1, +4.676] in the predicted wind data of the test year 2015. Measured and forecasted wind data at the test site were also compared against each other on a seasonal basis for the year 2015, and the minimum $R^2$ value was found to be 0.79 (spring) in this case, which can be considered acceptable. Furthermore, the maximum difference between the monthly mean wind speeds of the two types of data (measured vs. forecasted) was 0.541 m/s, and the values of the statistical error-indicating parameters ($R_v$, $R_{\sigma^2}$, MAE, bias error, etc.) were also within an admissible range (over 90% accuracy). Finally, it was found that the forecasted values of wind speed had a maximum relative error in the range of ±0.8 m/s for the test year of 2015.
The Weibull shape (k) and scale (c) parameters were estimated for all the years (2000-2016) using five different methods according to the forecasted wind data. The empirical method was found to be the most suitable in this case, as it produced the lowest value of RMSE. Overall, the mean values of k and c were found to be 1.81 and 1.75 m/s, respectively. Most of the winds seemed to be in the range of 1 m/s to 3 m/s, and blew from either the north-east direction or the south-west direction.
Author Contributions: S.A. and S.-M.L. collected the data whereas C.-M.J. provided his technical support throughout the study. S.A. analyzed the data and wrote the paper.