A Statistical Modeling Methodology for Long-Term Wind Generation and Power Ramp Simulations in New Generation Locations

: In future power systems, a large share of the energy will be generated with wind power plants (WPPs) and other renewable energy sources. With the increasing wind power penetration, the variability of the net generation in the system increases. Consequently, it is imperative to be able to assess and model the behavior of the WPP generation in detail. This paper presents an improved methodology for the detailed statistical modeling of wind power generation from multiple new WPPs without measurement data. A vector autoregressive based methodology, which can be applied to long-term Monte Carlo simulations of existing and new WPPs, is proposed. The proposed model improves the performance of the existing methodology and can more accurately analyze the temporal correlation structure of aggregated wind generation at the system level. This enables the model to assess the impact of new WPPs on the wind power ramp rates in a power system. To evaluate the performance of the proposed methodology, it is veriﬁed against hourly wind speed measurements from six locations in Finland and the aggregated wind power generation from Finland in 2015. Furthermore, a case study analyzing the impact of the geographical distribution of WPPs on wind power ramps is included.


Introduction
Wind power generation modeling has been a popular research topic in recent years since the installed wind generation capacities have been increasing rapidly in numerous countries.In the future, if CO 2 emissions are to be reduced globally, an even larger share of electricity will have to be generated from renewable sources including wind power.However, the increasing penetration of wind power in the power systems gives rise to increasing variation and power ramps in the net power generation.Methods for analyzing the effects of a large number of WPPs on the power ramps in future power systems are, thus, becoming increasingly important.
Long-term stochastic simulation of wind power to analyze WPPs and their variability has been a popular approach [1][2][3].A frequently applied method in such modeling is to utilize Monte Carlo simulations for scenario analysis [2][3][4].It has been common to utilize copulas in the stochastic modeling of wind generation in multiple WPPs.This allows the separation of modeling from the dependency structures of the WPPs, i.e., the temporal and spatial correlations from the modeling of the marginal Energies 2018, 11, 2442 2 of 18 probability distributions, i.e., margins of the individual WPPs [2][3][4][5].A similar modeling approach is also used in this paper.
It is relatively straightforward to fit a time series model, e.g., an autoregressive model, or an artificial neural network (ANN) to a dataset covering existing locations [6][7][8][9][10], whereby the parameters of the time series model are estimated so that the important spatial and temporal characteristics of the data are modeled with some level of accuracy.However, as was shown in Reference [11], it is not straightforward to estimate the coefficients of a vector autoregressive (VAR) model when modeling new locations without measurement data (i.e., non-measured locations).This is because, in a full VAR model (i.e., a VAR model where no coefficients are assumed to be zero), the individual coefficients of the model cannot be said to contribute either to only spatial correlations or to only temporal correlations [8].
To overcome this challenge, one approach is to specify a simpler time series model where the coefficients of the model specify either temporal or spatial characteristics [11][12][13].For example, in References [14,15], the VAR model was simplified so that the non-diagonal components of the VAR coefficient matrices were assumed to be zero (i.e., the simplified VAR model).This allows for a straightforward estimation of the model for new locations.This paper will show, however, that such a simplification may result in significant errors in simulated ramp rates.To tackle this problem, the use of the full VAR model is proposed along with a new estimation method that is suitable for modeling new locations.
In previous research, a simplified VAR model has been unable to model the shape of the cross-correlation function (XCF) between two WPP locations [11,12].Failure to model the XCFs between individual WPPs correctly, especially near lag 0, affects the modeling of the autocorrelation function (ACF) of the aggregated wind power generation, which will be shown in this paper.However, the proposed full VAR model can capture the correct shape of the spatial correlation structure between two locations with greater accuracy, which is important for the assessment of the aggregated power generation of multiple WPPs.In addition, the proposed VAR model can be estimated using only wind speed measurement data, which is not the case with many previously published approaches, such as in References [12,16].
The simplified VAR model is also used in wind and solar forecast simulations [17][18][19].While this paper focuses on the simulation of wind generation, the presented full VAR model can also be considered for forecast simulations, where the use of a simplified time series model can have similar drawbacks, to those demonstrated in this paper for wind generation.
Furthermore, for the wind power generation model used to transform the simulated wind speeds to wind power instead of a commonly used third degree power curve [20], a state-of-the-art approach based on logistic functions [21,22] was used.The utilized approach allows for a deterministic estimation of detailed turbine specific power curves from the turbine data sheet parameters, which is essential when modeling non-measured WPPs.
To summarize the contribution to the literature, the proposed VAR model enables a more accurate modeling of both the temporal correlations in individual WPPs and the spatial correlations between the WPPs in new locations.It is demonstrated that the autocorrelations of the aggregated power generation depend on the spatial correlations between the individual WPP locations, which can be modeled accurately with the full VAR model.Consequently, the proposed model can be utilized in the analysis of aggregated generation in future energy systems, including the detailed analysis of wind power ramp rates.It is shown that, with the new methodology, the estimation of the ACF of the aggregated generation and ramp rate probability distribution are much more accurate than with the previous approaches [12,15] that utilized the simplified VAR model.This paper is organized as follows.Section 2 demonstrates the relation between the autocorrelations of an aggregated time series and the cross-correlations of the individual time series.Section 3 presents the proposed full VAR model and the specification of the model parameters for the analysis of temporal and spatial dependency structures.Section 4 describes the estimation data and the estimation of the simulation parameters and illustrates the overall structure of the methodology to model non-measured WPP locations.
Energies 2018, 11, 2442 3 of 18 Section 5 presents the simulation results for six out-of-sample wind speed locations.Section 6 describes the power generation model used in the power simulations.Section 7 introduces the simulation results for out-of-sample modeling of the aggregated wind power generation in Finland in 2015.Section 8 presents the results for a case study assessing the impact of the geographical distribution of the WPP locations on the wind power ramp rates and Section 9 concludes the paper.

The Impact of the Cross-Correlations of Individual Time Series on the Autocorrelations of Their Sum
This section shows that the autocorrelation of a sum of multiple time series depends on the cross-correlations between the individual time series.Consequently, it highlights the importance of using a full VAR model in the modeling of aggregated wind power generation, since it is able to model the shape of the XCF near lag 0, which is shown in Section 5.2.Next, a simple example to demonstrate the dependency between the cross-correlations of two stationary time series y 1,t and y 2,t and the autocorrelation of their sum y t = y 1,t + y 2,t is presented for lag h = 1.The autocorrelation of the aggregated time series y t can be specified for lag h = 1 as shown in the equation below.
cor(y t , y t−1 ) = cov(y t , y t−1 ) Var(y t ) On the other hand, according to the properties of covariance and the variance of a sum of two random variables is given by Using Equations ( 2) and (3) and the relation between covariance and correlation cov(y 1,t , y 2,t ) = cor(y 1,t , y 2,t )σ y1 σ y2 , Equation (1) can be expressed with the correlations between y 1,t and y 2,t as shown below.
cor(y t , y t−1 ) = where σ is the standard deviation (STD) of the corresponding time series.As shown in Equation ( 4), the cross-correlations between time series y 1,t and y 2,t with lags −1 and 1 (as XCF is asymmetric with respect to 0) affect the autocorrelation of the aggregated time series y t at lag 1.Consequently, a similar dependency between the cross-correlations of individual time series and the autocorrelations of their aggregate occurs with multiple time series and for all lags.Hence, it is crucial to be able to model the cross-correlations between individual non-measured WPP locations as accurately as possible in order to capture the correct behavior of the ACF of the aggregated time series for the WPP locations.

The Modeling of the Temporal and Spatial Dependency Structures
This section presents the full VAR model and the specification of the model parameters.

The Vector Autoregressive Model
A k-dimensional VAR model with order p, i.e., a VAR k (p) model for a process z t = [z 1,t , z 2,t , . . . ,z k,t ] is specified by the equation below.
Energies 2018, 11, 2442 where c is a k × 1-dimensional vector of intercept terms, A 1 , .., A p are k × k-dimensional VAR coefficient matrices and u t = [u 1,t , u 2,t , . . . ,u k,t ] is the error term of the VAR model [23].The error term u t is a white noise process, i.e., it has zero mean and no correlation between the components when lag h = 0.
In the VAR model presented in this paper, the margins of the error terms u t are assumed to follow Student's t-distribution and the correlations between the components of u t are modeled with a Gaussian copula.The use of t-distributions for the error term margins is justified in References [12,15,16].In addition, the intercept terms are assumed to be zero (c = 0).In the proposed model, the VAR coefficient matrices A 1 , .., A p are fully specified without any components assigned to zero, which is shown below.
where, e.g., a h;1,k , is the coefficient between z 1,t and z k,t at lag h.

The Specification of the VAR Model Parameters
In this section, the method to specify the required VAR model parameters, i.e., the VAR coefficient matrices and the covariance matrix of the error terms ∑ u , is described in detail.The process to specify the model parameters starts with the estimation of autocorrelation matrices R z (h) for the process z t , which is described in detail in Section 4.2.After the autocorrelation matrices R z (h) are estimated, they can be transformed to autocovariance matrices Γ z (h) using the equation below.
where D is a k × k-dimensional diagonal matrix with the STDs of the components of z t on the diagonal.Next, the VAR coefficient matrices are specified utilizing the obtained autocovariance matrices Γ z (h) and the Yule-Walker equations.The Yule-Walker equations, for lags h > 0, can be formulated as shown below.
where A 1 , . . ., A p are the VAR coefficient matrices and p is the order of the VAR model [23].Equation (8), describing the relation between VAR parameters and autocovariance matrices as a function of lag h, can be written in the following form using basic matrix multiplication.
According to [23], Equation ( 9) can be further formulated by the equation below.
which, on the other hand, is equivalent to the formula below.
[Γ z (1), . . . ,Γ z (p)] = AΓ Z (0), (11) where Γ Z (0) is the autocovariance matrix of the process z t at lag 0 and A is the kp × kp-dimensional VAR coefficient matrix of z t both in the VAR kp (1) form, i.e., in the state-space representation.The state-space representation allows any VAR k (p) process to be written in a kp-dimensional VAR kp (1) form [22].A is composed from the VAR coefficient matrices A 1 , . . ., A p as shown below.
where I k is a k × k-dimensional identity matrix.Since matrix A is needed to specify the VAR model for the simulations, it can be solved from Equation (11) with the following formulation.
where Γ Z (0) is a nonsingular matrix.Equation (13) defines the relation between the VAR coefficient matrix A and the autocovariance matrices of the process z t without any additional variables.Therefore, with Equation (13), the VAR coefficient matrices from 1 to p can be estimated using the estimated autocovariance matrices.The only model parameter not yet specified is the covariance matrix of the error terms ∑ u , which can be determined from the Yule-Walker equations if the aforementioned VAR coefficient matrix A, which is specified in Equation (12), and the autocovariance matrices Γ z (h) are given.According to [23], the VAR kp (1) form of the kp × kp-dimensional covariance matrix of the error terms ∑ U can be obtained by using the equation below.
The k × k-dimensional covariance matrix of the error terms ∑ u can be obtained from the first k columns and k rows of ∑ U .

The Simulation of Non-Measured WPP Locations
This section presents the overall Monte Carlo (MC) simulation procedure for the modeling of non-measured WPP locations.First, the used data and the marginal distributions are described and, then, the estimation of the simulation parameters is presented.Lastly, the structure of the MC simulation procedure is presented step by step.

The Data and the Marginal Distributions
This paper utilizes the same wind speed datasets for the estimation of the VAR model as seen in References [12,15].The datasets consist of hourly wind speed measurement time series from 12 high altitude locations (height from 74 to 150 m above the surrounding ground level) and 19 low altitude locations in Finland.The length of the measured time series varies from one to three years depending on location.It should be noted that the low altitude measurement data are used only in the estimation of the spatial and spatiotemporal correlations (as all 19 locations have the same measurement period of three years), which is explained in Section 4.2.
Empirical cumulative distribution functions (ECDFs) [3,16] are applied as margins for the wind speed measurement locations.The wind speed data from k measured locations denoted as y t = [y 1,t , y 2,t , . . . ,y k,t ] , are transformed to data with normally distributed margins z wD t (where wD denotes with day structures, meaning that the monthly diurnal variations are still present in the data).The transformation to Gaussian data is performed by using the probability integral transformation.
Energies 2018, 11, 2442 6 of 18 where F −1 N is the inverse cumulative distribution function (CDF) of the standard normal distribution and Fi is the estimated ECDF margin for location i.
Next, the monthly diurnal structures, i.e., the day structures, are estimated from z wD t by calculating mean values for each hour of the day of each month.These means are subtracted from z wD t and z t = [z 1,t , z 2,t , . . . ,z k,t ] is obtained.The VAR model is then estimated from z t .A similar approach has been used in Reference [15].The VAR model identification was achieved by assessing the ACFs and partial autocorrelation functions (PACFs) of z t , which indicated that a suitable order for the VAR model was p = 4.The adequacy of the VAR k (4) model, fitted for z t , was verified by checking that the model residuals u t had no statistically significant ACF, PACF, or XCF values (except XCF with lag h = 0, which is acceptable).The margins of u t had too high kurtoses in all estimation locations to follow a normal distribution.Hence, t-distributions were fitted for the margins of the residuals.
When p = 4, the autocovariance matrices Γ z (h) are needed for lags h = 0, . . ., 4 for the estimation of the VAR coefficient matrices, which is shown in Equation ( 13), and for the estimation of the covariance matrix of the error terms ∑ u , which is shown in Equation ( 14).The autocovariance matrices Γ z (h) are obtained from the autocorrelation matrices R z (h) with Equation ( 7).The STDs for the diagonal matrix D for new locations, which are used to convert R z (h) to Γ z (h) in ( 7), are averages of the STDs calculated from z t used in the estimation.
The autocorrelation matrices R z (h) specify both the temporal and spatial correlations of z t .For the non-measured WPP locations, the temporal correlations (the diagonal components) of R z (h) for the lags h = 0, . . ., 4 are the average values of the ACFs calculated from the components of z t used in estimation.
The spatial and spatiotemporal correlations (the off-diagonal components) of R z (h) are estimated by utilizing the distances between the non-measured WPP locations, which was done in [15].Figure 1 illustrates the correlations (for h = 0) estimated from z t calculated for the 19 low altitude wind speed measurement locations against the distances between the measurement locations represented as blue circles.Figure 1 also shows the five curves fitted for these spatial and spatiotemporal correlations for lags h = 0, . . ., 4. These five curves linking the distances between the locations to the corresponding spatial and spatiotemporal correlations are used to estimate the off-diagonal elements of R z (h) for lags h = 0, . . ., 4 for the new locations.

The Estimation of the VAR Model and Simulation Parameters for Non-Measured WPP Locations
The VAR model identification was achieved by assessing the ACFs and partial autocorrelation functions (PACFs) of  , which indicated that a suitable order for the VAR model was p = 4.The adequacy of the VARk (4) model, fitted for  , was verified by checking that the model residuals  had no statistically significant ACF, PACF, or XCF values (except XCF with lag ℎ = 0, which is acceptable).The margins of  had too high kurtoses in all estimation locations to follow a normal distribution.Hence, t-distributions were fitted for the margins of the residuals.
When p = 4, the autocovariance matrices  (ℎ) are needed for lags ℎ = 0, … ,4 for the estimation of the VAR coefficient matrices, which is shown in Equation ( 13), and for the estimation of the covariance matrix of the error terms ∑ , which is shown in Equation ( 14).The autocovariance matrices  (ℎ) are obtained from the autocorrelation matrices  (ℎ) with Equation ( 7).The STDs for the diagonal matrix D for new locations, which are used to convert  (ℎ) to  (ℎ) in ( 7), are averages of the STDs calculated from  used in the estimation.
The autocorrelation matrices  (ℎ) specify both the temporal and spatial correlations of  .For the non-measured WPP locations, the temporal correlations (the diagonal components) of  (ℎ) for the lags ℎ = 0, … ,4 are the average values of the ACFs calculated from the components of  used in estimation.
The spatial and spatiotemporal correlations (the off-diagonal components) of  (ℎ) are estimated by utilizing the distances between the non-measured WPP locations, which was done in [15].Figure 1 illustrates the correlations (for h = 0) estimated from  calculated for the 19 low altitude wind speed measurement locations against the distances between the measurement locations represented as blue circles.Figure 1 also shows the five curves fitted for these spatial and spatiotemporal correlations for lags ℎ = 0, … ,4.These five curves linking the distances between the locations to the corresponding spatial and spatiotemporal correlations are used to estimate the off-diagonal elements of  (ℎ) for lags ℎ = 0, … ,4 for the new locations.The covariance matrix ∑ , which is estimated and shown in Section 3.2., is modeled as a Gaussian copula.The mean value of the degrees of freedom from the fitted t-distributions (5.8068) is used to specify the margins of ∑ for the non-measured WPP locations.
The correlation matrix of the error terms  , which is derived from ∑ , is used as the correlation The covariance matrix ∑ u , which is estimated and shown in Section 3.2., is modeled as a Gaussian copula.The mean value of the degrees of freedom from the fitted t-distributions (5.8068) is used to specify the margins of ∑ u for the non-measured WPP locations.
The correlation matrix of the error terms R u , which is derived from ∑ u , is used as the correlation matrix for the Gaussian copula, R Gauss .This approximation showed very similar R Gauss and R u in the simulations and gave valid results in the applications, which are shown in Sections 5 and 7.
The monthly diurnal variations, i.e., the day structures, are estimated from z t and used in the estimation, which is described in Section 4.1.The averages of the estimated day structures are used for the non-measured WPP locations.Weibull distributions commonly used to describe the wind speed conditions in a specific location [12] are applied as margins for the new WPP locations.The Weibull distribution parameters for each new location are obtained from the Wind Atlas database [24].

The Estimation of the VAR Model and Simulation Parameters for Non-Measured WPP Locations
The simulation procedure is illustrated in Figure 2. The procedure begins with the drawing of a sample from the Gaussian copula for the k simulated WPP locations for the length of the simulation and transforming the margins to t-distributions.
for the non-measured WPP locations.Weibull distributions commonly used to describe the wind speed conditions in a specific location [12] are applied as margins for the new WPP locations.The Weibull distribution parameters for each new location are obtained from the Wind Atlas database [24].

The Estimation of the VAR Model and Simulation Parameters for Non-Measured WPP Locations
The simulation procedure is illustrated in Figure 2. The procedure begins with the drawing of a sample from the Gaussian copula for the k simulated WPP locations for the length of the simulation and transforming the margins to t-distributions.
Next, the correlated t-distributed random numbers are used as innovations for the VARk (4) model.The model produces a simulated multivariate time series  with the appropriate spatial and temporal dependency structures.Then the estimated monthly diurnal structures are added to the simulated data, which yields  .As Figure 2 shows, the simulated  is further transformed to wind speeds with the formula below.
where  , is the resulting simulated wind speed in location i at time t,  is the estimated inverse CDF of the Weibull distributed wind speed margin for location i, and  is the CDF of the standard normal distribution.The last step in the procedure is to transform the multivariate wind speed time series  to the power time series for each individual WPP using the wind power generation model presented in Section 6.The power time series for each WPP can be aggregated to a single time series depicting the aggregated wind power generation for the whole simulated system.Next, the correlated t-distributed random numbers are used as innovations for the VAR k (4) model.The model produces a simulated multivariate time series z t with the appropriate spatial and temporal dependency structures.Then the estimated monthly diurnal structures are added to the simulated data, which yields z wD t .As Figure 2 shows, the simulated z wD t is further transformed to wind speeds with the formula below.
where y i,t is the resulting simulated wind speed in location i at time t, F−1 i is the estimated inverse CDF of the Weibull distributed wind speed margin for location i, and F N is the CDF of the standard normal distribution.The last step in the procedure is to transform the multivariate wind speed time series y t to the power time series for each individual WPP using the wind power generation model presented in Section 6.The power time series for each WPP can be aggregated to a single time series depicting the aggregated wind power generation for the whole simulated system.

Monte Carlo Simulation Results for the Six Out-of-Sample Wind Speed Locations
This section introduces the MC simulation results for the long-term out-of-sample simulations for six high altitude wind speed locations in Finland.The proposed full VAR model is compared with the simplified VAR model, which is utilized with the same overall methodology.

The Simulation Setup for Out-of-Sample Wind Speed Simulations
The performance of the VAR model is compared with the simplified VAR model, which has been used previously in similar modeling.The simplified VAR model is a VAR model in which the off-diagonal components of the VAR-parameter matrices are assumed to be zeros [14,15].
The two models are estimated with the same data set of measured high altitude wind speed locations in Finland, which is presented in Section 4.1.The measured data from six simulated locations are left out of the estimation procedure, which provides the six out-of-sample locations used in the model validation.
Additionally, 100 MC simulation runs with a time resolution of one hour and a length of one year were performed.This yielded 8.76 × 10 5 samples for each of the six locations.The results presented in this section also consider the aggregated time series of these six locations to assess the models' ability to simulate the properties of an aggregate of several locations.The reader should note that wind speed aggregation from several locations is carried out only for model evaluation purposes.

The Spatial Correlation Structures
Figure 3 illustrates the XCFs between two of the six test locations, which is estimated both from the measured data and from the simulated data from the two models.Both models can capture the peak of the XCFs at lag 0 relatively well.However, only the full VAR model can capture the round shape of the XCF estimated from the measured data.The XCF estimated from the simulation data of the simplified VAR model produces XCF values that are too low for most lags and a too peaked shape of the XCF around lag 0. This problem is the root cause for the poorer performance of the simplified VAR model.This is the next part presented in this section.The high XCF values for the full VAR model between lags 10 and 24 are caused by the average diurnal structures that are used for the out-of-sample test locations, as in one of the two considered locations the utilized average diurnal structure deviates from the actual diurnal structure.

Monte Carlo Simulation Results for the Six Out-of-Sample Wind Speed Locations
This section introduces the MC simulation results for the long-term out-of-sample simulations for six high altitude wind speed locations in Finland.The proposed full VAR model is compared with the simplified VAR model, which is utilized with the same overall methodology.

The Simulation Setup for Out-of-Sample Wind Speed Simulations
The performance of the VAR model is compared with the simplified VAR model, which has been used previously in similar modeling.The simplified VAR model is a VAR model in which the off-diagonal components of the VAR-parameter matrices are assumed to be zeros [14,15].
The two models are estimated with the same data set of measured high altitude wind speed locations in Finland, which is presented in Section 4.1.The measured data from six simulated locations are left out of the estimation procedure, which provides the six out-of-sample locations used in the model validation.
Additionally, 100 MC simulation runs with a time resolution of one hour and a length of one year were performed.This yielded 8.76 × 10 samples for each of the six locations.The results presented in this section also consider the aggregated time series of these six locations to assess the models' ability to simulate the properties of an aggregate of several locations.The reader should note that wind speed aggregation from several locations is carried out only for model evaluation purposes.

The Spatial Correlation Structures
Figure 3 illustrates the XCFs between two of the six test locations, which is estimated both from the measured data and from the simulated data from the two models.Both models can capture the peak of the XCFs at lag 0 relatively well.However, only the full VAR model can capture the round shape of the XCF estimated from the measured data.The XCF estimated from the simulation data of the simplified VAR model produces XCF values that are too low for most lags and a too peaked shape of the XCF around lag 0. This problem is the root cause for the poorer performance of the simplified VAR model.This is the next part presented in this section.The high XCF values for the full VAR model between lags 10 and 24 are caused by the average diurnal structures that are used for the outof-sample test locations, as in one of the two considered locations the utilized average diurnal structure deviates from the actual diurnal structure.

The Temporal Correlation Structure of the Aggregate
The ACFs estimated from the aggregated measurement data and simulated data from the models are shown in Figure 4.It can be observed that the ACF estimated from the full VAR model

The Temporal Correlation Structure of the Aggregate
The ACFs estimated from the aggregated measurement data and simulated data from the models are shown in Figure 4.It can be observed that the ACF estimated from the full VAR model has a very similar shape to the ACF estimated from the aggregated measurements and the most critical lags (the ones close to lag 0) are very close to each other.The simplified VAR model produces ACFs that are too low compared to the aggregated measurement data for all observed lags.This is caused by the relation between the ACF of the aggregated time series and the XCFs between the individual locations, which is discussed in Section 2.

The One Hour Ramp Rates of the Aggregate
The one-hour ramp rate PDFs and the STDs of the one-hour ramp rates estimated from the aggregated measurement data and simulated data from the two models are illustrated in Figure 5.The PDFs and STDs of the measurements and full VAR model simulations are very similar.The simplified VAR model is unable to capture the correct shape of the ramp rate PDF since the slope is too gentle and consequently the STD is also too large.This is due to the same reason as seen with the low ACF values compared with the measurement data in the previous subsection; namely, the simplified VAR model's limited ability to model the XCFs between the individual locations, which results in ramp rate PDFs that have too gentle sloping.

The One Hour Ramp Rates of the Aggregate
The one-hour ramp rate PDFs and the STDs of the one-hour ramp rates estimated from the aggregated measurement data and simulated data from the two models are illustrated in Figure 5.The PDFs and STDs of the measurements and full VAR model simulations are very similar.The simplified VAR model is unable to capture the correct shape of the ramp rate PDF since the slope is too gentle and consequently the STD is also too large.This is due to the same reason as seen with the low ACF values compared with the measurement data in the previous subsection; namely, the simplified VAR model's limited ability to model the XCFs between the individual locations, which results in ramp rate PDFs that have too gentle sloping.

The One Hour Ramp Rates of the Aggregate
The one-hour ramp rate PDFs and the STDs of the one-hour ramp rates estimated from the aggregated measurement data and simulated data from the two models are illustrated in Figure 5.The PDFs and STDs of the measurements and full VAR model simulations are very similar.The simplified VAR model is unable to capture the correct shape of the ramp rate PDF since the slope is too gentle and consequently the STD is also too large.This is due to the same reason as seen with the low ACF values compared with the measurement data in the previous subsection; namely, the simplified VAR model's limited ability to model the XCFs between the individual locations, which results in ramp rate PDFs that have too gentle sloping.

The Wind Power Generation Model
This section presents the power generation model to transform the simulated wind speeds to generated power.The paper utilizes an improved state-of-the-art power generation model based on a logistic function in the methodology [21].For the wind turbine model, a three-parameter logistic power curve estimated using a deterministic process is utilized.This is proposed in Reference [22], where it is shown to perform better than other approaches.This particular approach was chosen as the parameters of the logistic function, which can be obtained using only information available from the wind turbine data sheets without any requirements for measured power generation data.The implemented logistic function to transform the wind speeds to generated power can be written using the formula below.
where P r is the rated power of the wind turbine, u is the wind speed, u ip is the wind speed in the inflection point, which is the point in the power curve where the gradient of the power is at its maximum, and s is the slope of the power curve in the inflection point, i.e., the value of the gradient.Equation ( 17) is applied between the specific cut-in and cut-out speeds specified for a turbine.An example of the shape of the power curve produced by the utilized turbine model is illustrated for the Gamesa G128 4.5 MW wind turbine [25] in Figure 6.
In addition, the wake effect, i.e., the slowed down and turbulent wind caused by turbines in the downwind direction inside a wind farm, is considered using a wake coefficient, which is presented in Reference [26], before aggregating the power generation of the individual turbines to give the power generation of the wind farm.The wake coefficient for a wind farm is determined based on the number of turbines in the farm and the distance between the turbines.When the farm topology is not known, e.g., with new wind farms, a regular array topology with the distance of nine times the rotor diameter between the turbines is assumed.A similar approach was used with new wind farms in Reference [12].

The Wind Power Generation Model
This section presents the power generation model to transform the simulated wind speeds to generated power.The paper utilizes an improved state-of-the-art power generation model based on a logistic function in the methodology [21].For the wind turbine model, a three-parameter logistic power curve estimated using a deterministic process is utilized.This is proposed in Reference [22], where it is shown to perform better than other approaches.This particular approach was chosen as the parameters of the logistic function, which can be obtained using only information available from the wind turbine data sheets without any requirements for measured power generation data.The implemented logistic function to transform the wind speeds to generated power can be written using the formula below.
where  is the rated power of the wind turbine, u is the wind speed,  is the wind speed in the inflection point, which is the point in the power curve where the gradient of the power is at its maximum, and s is the slope of the power curve in the inflection point, i.e., the value of the gradient.Equation ( 17) is applied between the specific cut-in and cut-out speeds specified for a turbine.An example of the shape of the power curve produced by the utilized turbine model is illustrated for the Gamesa G128 4.5 MW wind turbine [25] in Figure 6.
In addition, the wake effect, i.e., the slowed down and turbulent wind caused by turbines in the downwind direction inside a wind farm, is considered using a wake coefficient, which is presented in Reference [26], before aggregating the power generation of the individual turbines to give the power generation of the wind farm.The wake coefficient for a wind farm is determined based on the number of turbines in the farm and the distance between the turbines.When the farm topology is not known, e.g., with new wind farms, a regular array topology with the distance of nine times the rotor diameter between the turbines is assumed.A similar approach was used with new wind farms in Reference [12].Figure 6.An example of the three-parameter logistic power curve used to transform the wind speeds to wind power.The example power curve is estimated for a Gamesa G128 4.5 MW wind turbine.The power generation shuts down at the cut-out speed, which is 27 m/s for this turbine.

Monte Carlo Simulation Results for the Wind Power Generation Structure of Finland in 2015
This section introduces the MC simulation results for the long-term simulations for wind power generation in Finland during 2015.The performance of the full VAR model is compared with the simplified VAR model.

Monte Carlo Simulation Results for the Wind Power Generation Structure of Finland in 2015
This section introduces the MC simulation results for the long-term simulations for wind power generation in Finland during 2015.The performance of the full VAR model is compared with the simplified VAR model.

The Temporal Correlation Structures
The ACFs estimated from the measured power generation data and simulated data from the full VAR and simplified VAR models are illustrated in Figure 9.It can be seen from the figure that the ACFs estimated from the measurements and from the full VAR model simulations have a very similar shape.The ACF produced by the simplified VAR model is notably lower for all lags compared to the ACF of the measurements.

The One Hour Ramp Rates
Figure 10 illustrates the one-hour ramp rate PDFs and the STDs of the ramp rates estimated from the measured power generation data and aggregated data simulated with the two models.The PDFs and the STDs estimated from the measurements and from the full VAR model simulations are close to each other, even though the full VAR model estimate of the PDF is slightly more spread, which can be seen from both the PDF and the STD.However, the simulation results are still close to the measured data, especially considering that this is a complete out-of-sample simulation test.Similarly, as with the wind speeds in Section 5, the simplified VAR model is unable to capture the correct peaked shape of the one-hour ramp rate PDF since the slope of the PDF is significantly too gentle and, consequently, the STD is too large.

The Temporal Correlation Structures
The ACFs estimated from the measured power generation data and simulated data from the full VAR and simplified VAR models are illustrated in Figure 9.It can be seen from the figure that the ACFs estimated from the measurements and from the full VAR model simulations have a very similar shape.The ACF produced by the simplified VAR model is notably lower for all lags compared to the ACF of the measurements.The presented simulation results are averages of the 100 MC runs.

The Temporal Correlation Structures
The ACFs estimated from the measured power generation data and simulated data from the full VAR and simplified VAR models are illustrated in Figure 9.It can be seen from the figure that the ACFs estimated from the measurements and from the full VAR model simulations have a very similar shape.The ACF produced by the simplified VAR model is notably lower for all lags compared to the ACF of the measurements.

The One Hour Ramp Rates
Figure 10 illustrates the one-hour ramp rate PDFs and the STDs of the ramp rates estimated from the measured power generation data and aggregated data simulated with the two models.The PDFs and the STDs estimated from the measurements and from the full VAR model simulations are close to each other, even though the full VAR model estimate of the PDF is slightly more spread, which can be seen from both the PDF and the STD.However, the simulation results are still close to the measured data, especially considering that this is a complete out-of-sample simulation test.Similarly, as with the wind speeds in Section 5, the simplified VAR model is unable to capture the correct peaked shape of the one-hour ramp rate PDF since the slope of the PDF is significantly too gentle and, consequently, the STD is too large.

The One Hour Ramp Rates
Figure 10 illustrates the one-hour ramp rate PDFs and the STDs of the ramp rates estimated from the measured power generation data and aggregated data simulated with the two models.The PDFs and the STDs estimated from the measurements and from the full VAR model simulations are close to each other, even though the full VAR model estimate of the PDF is slightly more spread, which can be seen from both the PDF and the STD.However, the simulation results are still close to the measured data, especially considering that this is a complete out-of-sample simulation test.Similarly, as with the wind speeds in Section 5, the simplified VAR model is unable to capture the correct peaked shape of the one-hour ramp rate PDF since the slope of the PDF is significantly too gentle and, consequently, the STD is too large.

The Numerical Comparison
The most relevant numerical statistics are presented in Table 1.The statistics are estimated from the measured aggregated power generation and from the aggregated data simulated with the two simulation models.According to Table 1, both models perform well in most aspects of the modeling.There are, however, major differences in the modeling of the ACF and the STD of the ramp rates.The full VAR model also performs well in the modeling of the ACF at lag 1 (and also with other lags, as shown in Figure 9) and the STD of the one-hour ramp rate.
To summarize the results for the wind power generation in Finland in 2015, the full VAR model performs well in all studied measures and surpasses the simplified VAR model in the modeling of aggregate power generation of a large system.Most notably, the temporal dependency structures and ramp rates of the aggregate generation are modeled correctly, which is not the case with the simplified VAR.Therefore, the full VAR model can be considered as the superior choice for the modeling of wind power generation in multiple non-measured WPP locations.Table 1.The numerical statistics estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.σ stands for standard deviation and the capacity factor is the ratio between installed capacity (calculated with the maximum installed capacity at the end of 2015) and the actual output.The <20% and >80% rows show the percentage of hours in a simulation run when the aggregated generation is within the limits.The presented statistics estimated from the simulation results are averages of the 100 MC runs.

The Numerical Comparison
The most relevant numerical statistics are presented in Table 1.The statistics are estimated from the measured aggregated power generation and from the aggregated data simulated with the two simulation models.According to Table 1, both models perform well in most aspects of the modeling.There are, however, major differences in the modeling of the ACF and the STD of the ramp rates.The full VAR model also performs well in the modeling of the ACF at lag 1 (and also with other lags, as shown in Figure 9) and the STD of the one-hour ramp rate.
To summarize the results for the wind power generation in Finland in 2015, the full VAR model performs well in all studied measures and surpasses the simplified VAR model in the modeling of aggregate power generation of a large system.Most notably, the temporal dependency structures and ramp rates of the aggregate generation are modeled correctly, which is not the case with the simplified VAR.Therefore, the full VAR model can be considered as the superior choice for the modeling of wind power generation in multiple non-measured WPP locations.Table 1.The numerical statistics estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.σ stands for standard deviation and the capacity factor is the ratio between installed capacity (calculated with the maximum installed capacity at the end of 2015) and the actual output.The <20% and >80% rows show the percentage of hours in a simulation run when the aggregated generation is within the limits.The presented statistics estimated from the simulation results are averages of the 100 MC runs.

The Comparison of the Ramp Rates of the Wind Generation with Different Geographical Distributions
In this section, two wind power generation scenarios are studied to analyze the effect of geographical distribution of the WPP locations on the aggregated wind power ramps.Two distinct geographical distributions of the WPPs are considered; dispersed and concentrated.

The Simulation Setup for the Scenarios
The two wind power generation scenarios consist of 12 WPPs with 20 Gamesa G128-4.5 turbines with 4.5 MW nominal power [25] in each, which results in 90 MW of total installed capacity per WPP.Hence, the aggregated installed capacity in both scenarios is 1080 MW.The hub height of the wind turbines is considered to be 140 m for all turbines.
In the first of the two scenarios, the locations of the WPPs are geographically dispersed and, in the second scenario, the locations are geographically concentrated.In the concentrated scenario, the average distance between the WPPs is 59.6 km and, in the dispersed scenario, the average distance is 285.6 km.In both scenarios, the WPPs are considered to be located in Southern and Central Finland.
To enable a more straightforward comparison of the scenarios in terms of the impact caused by the geographical distribution of the WPPs on the wind power ramps, the generation conditions are fixed in each WPP location.This means that the Weibull parameters describing the wind speed conditions are similar for each WPP in both scenarios.Consequently, the annual generated energy and the hourly mean generation are very similar for both scenarios.The utilized Weibull parameters (A = 8.5 and B = 2.3) are the averages of the parameters obtained for a height of 140 m from the Finnish Wind Atlas database [24] for the WPPs in Section 7.

The Simulation Results for the Scenarios
This section presents the aggregated MC simulation results for the two scenarios.A total of 100 MC simulation runs were performed with an hourly time resolution, resulting in 8.76 × 10 5 simulated samples for each WPP in both scenarios.The simulation results for the aggregated power generation are illustrated for one-hour ramp rate PDFs in Figure 11 and for power generation PDFs in Figure 12.Table 2 presents the numerical results obtained for the scenarios.
As Figure 11 shows, the geographical distribution does have a significant impact on the shape of the one-hour ramp rate PDFs.It is clear that the dispersed scenario yields smaller hourly volatility for the one-hour ramp rates compared to the concentrated scenario.The power generation PDFs in Figure 12 show that, in the geographically dispersed scenario, the hourly generation has smaller volatility and fewer hours with a very large or small aggregated generation, which is in line with previous research, as shown in References [11,12,14,16].
In addition, according to Table 2, the yearly energy and hourly mean generation are similar in both scenarios, which is expected.In the geographically concentrated scenario, extreme generation events, with aggregated generation less than 20% or more than 80% of the installed generation capacity, are more likely (also visible in Figure 12).In addition, the STD of the aggregated generation is notably smaller in the geographically dispersed scenario.The probabilities for ramp rates being more than 10% or 20% of the installed capacity are larger with the concentrated distribution, which is also visible from Figure 11.In addition, Table 3 presents the STDs of the ramp rates for different ramp lengths from 1 hour to 10 hours.The ramp rates of the dispersed generation are notably lower, regardless of the length of the ramp rate.The differences between the two scenarios increase when looking at longer ramp lengths.

Table 2.
The numerical statistics estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The <10%, <20%, and >80% rows show the percentage of hours in a simulation run when the aggregated generation or the one-hour ramp rates are within the respective limits.The presented statistics estimated from the simulation results are averages of the 100 MC runs.

Table
The numerical statistics estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The <10%, <20%, and >80% rows show the percentage of hours in a simulation run when the aggregated generation or the one-hour ramp rates are within the respective limits.The presented statistics estimated from the simulation results are averages of the 100 MC runs.

Concentrated Geographical Distribution
Yearly Energy (TWh)  To summarize, the STDs of the ramp rates and the probabilities for very large up or down power ramps are notably smaller with the geographically dispersed scenario.Hence, the dispersed placement of WPPs results in smaller ramp rates and, thus, easier manageability of the power ramps from the perspective of the power system operator.

Conclusions
This paper has presented an improved VAR model based methodology to model wind power generation in multiple new WPP locations.It was shown that the detailed modeling of temporal correlations in individual WPP locations and the spatial correlations between the WPPs are essential in order to properly model the dependency structures of the aggregated power generation.It was demonstrated that the proposed model is able to model the correlational structures with required accuracy for the analysis of aggregated wind power generation and power ramps.Furthermore, an estimation method suitable for the modeling of new WPP locations was presented for the proposed full VAR model.
The ability of the proposed methodology to model the aggregated generation of non-measured WPP locations was verified both in wind speed and wind power domains against measured data.An aggregate of six out-of-sample wind locations and the aggregated wind power generation in Finland in 2015 were modeled and the simulations compared well with the measured data.The model was able to simulate aggregated time series with similar statistical qualities and dependency structures to those present in the measured data in both studied cases and was found to be superior to the simplified VAR model.
The presented VAR model extends the applicability of the existing methodology to cover detailed long-term simulations and scenario analyses of the volatility of the future wind power generation including wind power ramp analysis.The detailed analysis of power ramps and especially the assessment of the impact of new WPPs on the aggregated power ramps are crucial topics for both transmission system operators and power producers.
Lastly, a case study assessing the impact of the geographical distribution of the WPP locations on the aggregated ramp rates was presented.The paper quantifiably verified that the impact of the geographical distribution is significant on the aggregated ramp rates and that dispersed geographical distribution of wind turbines results in smaller volatility in aggregate generational output.

4. 2 .
The Estimation of the VAR Model and Simulation Parameters for Non-Measured WPP Locations

Figure 1 .
Figure1.The spatial correlations ρ (i.e., XCFs at lag 0) estimated from  between wind speed measurement locations plotted against distances between the locations with fitted curves for the spatial correlations at lags 0, 1, 2, 3, and 4.

Figure 1 .
Figure1.The spatial correlations ρ (i.e., XCFs at lag 0) estimated from z t between wind speed measurement locations plotted against distances between the locations with fitted curves for the spatial correlations at lags 0, 1, 2, 3, and 4.

Figure 2 .
Figure 2. A flowchart of the MC simulation procedure to simulate wind power generation in nonmeasured WPP locations.White background indicates operations and blue indicates simulated data.

Figure 2 .
Figure 2. A flowchart of the MC simulation procedure to simulate wind power generation in non-measured WPP locations.White background indicates operations and blue indicates simulated data.

Figure 3 .
Figure 3.The XCFs estimated from the measurement data and from the simulated data from the two models for two of the six wind speed test locations.The presented simulation results are averages of the 100 MC runs.

Figure 3 .
Figure 3.The XCFs estimated from the measurement data and from the simulated data from the two models for two of the six wind speed test locations.The presented simulation results are averages of the 100 MC runs.

Energies 2018 ,
11,  x FOR PEER REVIEW 9 of 18 has a very similar shape to the ACF estimated from the aggregated measurements and the most critical lags (the ones close to lag 0) are very close to each other.The simplified VAR model produces ACFs that are too low compared to the aggregated measurement data for all observed lags.This is caused by the relation between the ACF of the aggregated time series and the XCFs between the individual locations, which is discussed in Section 2.

Figure 4 .
Figure 4.The ACFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The presented simulation results are averages of the 100 MC runs.

Figure 5 .
Figure 5.The one-hour ramp rate PDFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The dotted lines with corresponding colors illustrate the standard deviations of the one-hour ramp rates in the three cases.The presented simulation results are averages of the 100 MC runs.

Figure 4 .
Figure 4.The ACFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The presented simulation results are averages of the 100 MC runs.

Energies 2018 ,
11,  x FOR PEER REVIEW 9 of 18 has a very similar shape to the ACF estimated from the aggregated measurements and the most critical lags (the ones close to lag 0) are very close to each other.The simplified VAR model produces ACFs that are too low compared to the aggregated measurement data for all observed lags.This is caused by the relation between the ACF of the aggregated time series and the XCFs between the individual locations, which is discussed in Section 2.

Figure 4 .
Figure 4.The ACFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The presented simulation results are averages of the 100 MC runs.

Figure 5 .
Figure 5.The one-hour ramp rate PDFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The dotted lines with corresponding colors illustrate the standard deviations of the one-hour ramp rates in the three cases.The presented simulation results are averages of the 100 MC runs.

Figure 5 .
Figure 5.The one-hour ramp rate PDFs estimated from the aggregated measurement data and from the aggregated simulated data from the two models for the six wind speed test locations.The dotted lines with corresponding colors illustrate the standard deviations of the one-hour ramp rates in the three cases.The presented simulation results are averages of the 100 MC runs.

Figure 6 .
Figure 6.An example of the three-parameter logistic power curve used to transform the wind speeds to wind power.The example power curve is estimated for a Gamesa G128 4.5 MW wind turbine.The power generation shuts down at the cut-out speed, which is 27 m/s for this turbine.

Figure 8 .
Figure 8.The PDFs estimated from the measured aggregated wind power generations in Finland in 2015 and from the aggregated simulated data from the two models.The dotted lines illustrate the average power generation during 2015 with the corresponding colors (the dotted lines are overlapping).The presented simulation results are averages of the 100 MC runs.

Figure 9 .
Figure 9.The ACFs estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.The presented simulation results are averages of the 100 MC runs.

Figure 8 .
Figure 8.The PDFs estimated from the measured aggregated wind power generations in Finland in 2015 and from the aggregated simulated data from the two models.The dotted lines illustrate the average power generation during 2015 with the corresponding colors (the dotted lines are overlapping).The presented simulation results are averages of the 100 MC runs.

Energies 2018 , 18 Figure 8 .
Figure 8.The PDFs estimated from the measured aggregated wind power generations in Finland in 2015 and from the aggregated simulated data from the two models.The dotted lines illustrate the average power generation during 2015 with the corresponding colors (the dotted lines are overlapping).The presented simulation results are averages of the 100 MC runs.

Figure 9 .
Figure 9.The ACFs estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.The presented simulation results are averages of the 100 MC runs.

Figure 9 .
Figure 9.The ACFs estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.The presented simulation results are averages of the 100 MC runs.

Energies 2018 , 18 Figure 10 .
Figure 10.The one-hour ramp rate PDFs estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.The dotted lines illustrate the standard deviations of the one-hour ramp rates in the four cases with the corresponding colors.The presented simulation results are averages of the 100 MC runs.

Figure 10 .
Figure 10.The one-hour ramp rate PDFs estimated from the measured aggregated wind power generation in Finland in 2015 and from the aggregated simulated data from the two models.The dotted lines illustrate the standard deviations of the one-hour ramp rates in the four cases with the corresponding colors.The presented simulation results are averages of the 100 MC runs.

Figure 11 .
Figure 11.The one-hour ramp rate PDFs estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The dotted lines with the same colors illustrate the standard deviations of the one-hour ramp rates.The presented simulation results are averages of the 100 MC runs.

Figure 12 .
Figure 12.The PDFs estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The dotted lines illustrate the corresponding hourly mean power generation (the dashed lines are overlapping).The presented simulation results are averages of 100 MC runs.

Figure 11 . 18 Figure 11 .
Figure 11.The one-hour ramp rate PDFs estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The dotted lines with the same colors illustrate the standard deviations of the one-hour ramp rates.The presented simulation results are averages of the 100 MC runs.

Figure 12 .
Figure 12.The PDFs estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The dotted lines illustrate the corresponding hourly mean power generation (the dashed lines are overlapping).The presented simulation results are averages of 100 MC runs.

Figure 12 .
Figure 12.The PDFs estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The dotted lines illustrate the corresponding hourly mean power generation (the dashed lines are overlapping).The presented simulation results are averages of 100 MC runs.

Table 2 .
The numerical statistics estimated from the aggregated wind power generation with dispersed and concentrated geographical distribution.The <10%, <20%, and >80% rows show the percentage of hours in a simulation run when the aggregated generation or the one-hour ramp rates are within the respective limits.The presented statistics estimated from the simulation results are averages of the 100 MC runs.

Table 3 .
The standard deviations (σ) of the ramp rates for different ramp lengths with dispersed and concentrated geographical distribution.The results are averages of the 100 MC runs.