A New Wind Speed Scenario Generation Method Based on Principal Component and R-Vine Copula Theories

Abstract: The intermittent and uncertain nature of wind power poses enormous obstacles to the planning and stable operation of power systems. In this context, scenario generation is an effective technique for studying wind power uncertainty, and an accurate wind speed scenario generation method is of great significance for evaluating the impact of wind power on the power system. With several wind farms, accurate scenario generation requires capturing the correlation between wind speeds precisely while retaining the statistical properties of the wind speed data as fully as possible. To this end, this research proposed a new scenario generation method based on principal component (PC) and R-vine copula theories that incorporates the spatiotemporal correlation of wind speeds. By integrating PC theory, the method avoids the curse of dimensionality induced by using the R-vine copula alone while retaining its flexibility. Simulation results using the historical wind speeds of three adjacent wind farms as samples showed that the method described in this article effectively preserves the statistical properties of wind speed data. Eight evaluation indicators covering three facets of scenario generation were used to compare the proposed method holistically with two other commonly used scenario generation methods, and the results indicated that the proposed method was more accurate. Additionally, the validity and necessity of applying the R-vine copula in this model were demonstrated through comparisons with C-vine and D-vine copulas.


Introduction
To address issues such as the energy crisis and greenhouse gas emissions, the development and utilization of renewable energy have accelerated in recent decades, and renewable energy sources, including wind power, are becoming increasingly integrated into the power grid. However, wind energy exhibits a high degree of randomness and intermittency, which poses challenges for the safe and stable operation, as well as the planning, of the power grid [1].
Wind power modeling methods have been widely used in the existing literature. In the study of generation expansion planning, Sahragard et al. [2] considered wind power penetration by using a conversion model from wind speed to wind power. Band et al. [3] took the wind energy in the Gulf of Oman as the research object and assessed the change in its power generation potential using a relevant wind power model. One of the current challenges is how to account for the uncertainty of wind energy generation in the optimization and planning of grid dispatching with wind energy integration. Additionally, when a grid contains several wind farms, it is critical to capture the temporal and spatial correlation between the wind farms accurately. As a result, in-depth research on wind energy uncertainty modeling for multiple wind farms remains necessary.
Scenario generation has been extensively studied as a way to deal with the uncertainty associated with wind energy, and three types of scenario generation techniques exist. The first type simulates wind power time series using Markov chain models and autoregressive moving average (ARMA) models. D'Amico et al. [4] developed first- and second-order semi-Markov chains to generate synthetic wind speed time series, which reproduce the statistical properties of wind speed data more accurately than a simple Markov chain. Chen et al. [5] used Fourier series and an ARMA model to determine the seasonal trend and the temporal autocorrelation of wind speed, respectively. Sim et al. [6] compared and analyzed the ability of autoregressive integrated moving average (ARIMA) models to predict wind speed and to generate wind speed time series from historical values. Sun et al. [7] proposed a method for generating multi-wind-farm scenarios based on truncated multivariate Gaussian mixture models and Markov chain quasi-Monte Carlo sampling. Morales et al. [8] decoupled the multivariate ARMA model, simplifying parameter estimation while preserving the statistical properties of wind farm wind speed data sets. Abedi et al. [9] combined ARMA and fuzzy models to preserve the spatiotemporal correlation of wind farms while studying the impact of wind power correlation on the joint energy and reserve market. Duong et al. [10] demonstrated the efficacy of a hybrid method based on principal component analysis (PCA) and the ARMA model for generating data sets that preserve the temporal and spatial correlation of wind farms.
The second category of wind energy scenario generation methods is machine learning, represented by generative adversarial networks (GANs). Recent research has applied the Wasserstein GAN (WGAN) [11], a conditional improved WGAN combined with an unsupervised labeling model [12], a faster and more stable improved GAN [13], a controllable GAN with new evaluation indexes [14], and a GAN combined with reinforcement learning without manual labeling [15] to the generation of wind power scenarios. These studies demonstrate that such methods can effectively preserve the spatiotemporal correlation of wind energy output.
The third category of wind power scenario generation methods is spatiotemporal modeling of wind power based on copula theory. Copula theory has attracted the attention of numerous researchers because it can be applied to both linear and nonlinear correlations. By truncating the D-vine copula, Haghi et al. [16] reduced the computational burden while maintaining the flexibility of selecting appropriate copula functions for the varied correlation characteristics of wind farms. Similarly, Becker [17] examined the temporal autocorrelation of wind energy forecast errors using a D-vine copula. Lin et al. [18] used the t copula to construct the joint distribution of multiple wind farms and combined it with the imprecise Dirichlet model to increase the rationality of the generated scenarios. Borujeni et al. [19] also used the t copula to derive the joint probability distribution across hours when examining the tail dependence structure of wind speed data. Eryilmaz et al. [20] investigated the reliability of a wind power system with two wind farms, using the Gumbel copula to capture the correlation between the wind speeds at the two farms and evaluating the system's total capacity in relation to the turbines' reliability. To fully account for the impact of wind power uncertainty on economic dispatch, Li et al. [21] sampled the joint distribution of multiple wind farms established via a D-vine copula and obtained wind speed data sets that took spatial correlation into account. Deng et al. [22] developed a wind speed scenario generation method that accounted for the tail dependence structure of wind speed data; to reduce the parameter computation, they used the t copula and the C-vine copula to establish the joint distributions of the spatial and temporal correlation of the tail structure, respectively. Qiu et al. [23] used C-vine and D-vine copulas to generate wind speed scenarios from clustered multivariate wind speed data sets and retained the scenarios that were more consistent with the historical data. To model the correlation between multiple wind farms, Xu et al. [24] proposed a simplified C-vine copula structure similar to truncation, and probabilistic small-signal stability analysis was used to demonstrate the method's effectiveness in modeling highly correlated wind power. Henderson et al. [25] estimated copula parameters using a Bayesian approach and demonstrated, across multiple copula functions, how the size of the wind speed data set affects the uncertainty of parameter estimation.
Wang et al. [26] used Gaussian/t copulas to capture the temporal correlation between multi-period wind power forecast errors in order to accurately assess the capacity of the energy storage system that must be configured in a system containing wind farms. As another application to the correlation of wind power forecast errors, Li et al. [27] developed a dynamic copula to model the correlation between wind power forecast errors and wind power fluctuations; combining the ARIMA model with the generalized autoregressive conditional heteroscedasticity (GARCH) model improved the accuracy of the wind power forecast error range obtained with this method. Philippe et al. [28] used step-by-step Gaussian copula and Archimedean copula modeling to estimate temporal and spatial correlations, significantly reducing the number of estimated parameters while maintaining scenario accuracy. Given the ability of copula functions to model the dependence structure and the marginal distributions separately, Wang et al. [29] combined the highly flexible R-vine copula with probabilistic forecasting to improve forecast quality for multiple wind farms with incomplete sample data. In another application of the R-vine copula, Wang et al. [30] proposed a distance-weighted kernel density estimation method to improve the accuracy of the marginal distributions and combined it with the R-vine copula to model the spatiotemporal correlation of wind farms accurately. Additionally, in probabilistic power flow studies, copula theory has frequently been used to investigate the correlation between wind speeds in order to fully account for the uncertainty of wind energy [31][32][33].
Although extensive research has been conducted on scenario generation methods for dealing with wind energy uncertainty, some gaps remain in the existing research. For multiple wind farms, scenario generation methods based on Markov and ARMA models [4][5][6][7] lack the flexibility to account for nonlinear correlation and spatial correlation. Additionally, because the ARMA model requires stationary data, the input data must be transformed or assumed to be stationary, which limits its applicability. Although GAN-based scenario generation methods [11][12][13][14][15] are effective at capturing correlation in the data, they are time-consuming to train, and their efficiency varies significantly with the computing hardware.
As a result, copula theory has been widely applied, as it can effectively resolve the aforementioned issues. However, existing studies have shown drawbacks such as the loss of statistical characteristics of the data [16,24], the lack of flexibility when a single copula is applied [18][19][20][28][31][32][33], and the consideration of only temporal or only spatial correlation [20,21,23,24]. Furthermore, most applications of vine copula theory have studied the C-vine and D-vine copulas, which have special structures [16][17][21][22][23][24], and there is little research on the R-vine copula, which has a more general structure. Because the few scenario generation methods involving the R-vine copula applied it directly [29,30], they suffer from the curse of dimensionality when dealing with high-dimensional data. Although the independence test mentioned in [30] can help reduce the number of variables, it contributes little to reducing the computational burden when applied to correlated data. In light of these limitations and research gaps, we proposed a novel method combining the R-vine copula and PC theory that alleviates the R-vine copula's curse of dimensionality while retaining the accuracy of the model. The R-vine copula can capture correlations with a variety of characteristics, modeling the spatial and temporal correlations separately alleviates the curse of dimensionality, and PC theory establishes the conditions for this separate modeling. The simulation first verified the effectiveness of the proposed scenario generation process and then compared the proposed method comprehensively with other methods, using evaluation indicators from three aspects of scenario generation to verify its accuracy. Finally, by comparing the results with those obtained with C-vine and D-vine copulas, the advantages of using the R-vine copula in this model were evaluated, together with its necessity and effectiveness for improving accuracy.
This article is structured as follows. Section 2 discusses the theoretical foundations of PC theory and the R-vine copula. Section 3 presents the wind speed scenario generation method. Section 4 introduces the wind speed data used in this study and evaluates the effectiveness of the method. Section 5 draws conclusions.

Principal Component Generation Process
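PC theory extracts the primary characteristics of historical data by transforming the correlated historical data of each dimension into a set of uncorrelated PC values. Consider a matrix $X$ containing $n$-dimensional sample data:

$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_m = \left[x_m(1), x_m(2), \ldots, x_m(N_s)\right], \quad m = 1, 2, \ldots, n$$

where $n$ is the dimension of the input data, $x_m$ is the $m$-th dimensional vector of input data, and $N_s$ is the number of samples in each dimension. The basic generation process of the PCs is as follows [10]:

1. Centralize the input data:

$$\bar{x}_m = \frac{1}{N_s}\sum_{t=1}^{N_s} x_m(t), \qquad \bar{x} = \left[\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\right]^{T} \tag{4}$$

$$X' = X - \bar{x}\,\mathbf{1}^{T}$$

where $\mathbf{1}$ is a column vector of $N_s$ ones.

2. Calculate the covariance matrix of the input data:

$$C = \frac{1}{N_s - 1}\, X' X'^{T}$$

3. Calculate the eigenvalue vector $\lambda$ and the eigenvector matrix $U$ of $C$, with the eigenvalues sorted in descending order:

$$\lambda = \left[\lambda_1, \lambda_2, \ldots, \lambda_n\right], \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$$

$$U = \left[u_1, u_2, \ldots, u_n\right] \tag{9}$$

where $u_k$ is the unit eigenvector corresponding to $\lambda_k$.

4. Calculate the principal components:

$$Z = U^{T} X' \tag{10}$$

Each row of $Z$ is a PC, and the order of a PC is the index of the row it occupies.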
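The four steps above map onto a few NumPy calls. The sketch below is a minimal illustration on synthetic, hypothetical data (not the NREL samples), assuming the $1/(N_s - 1)$ covariance normalization (the scale does not affect $U$). It also checks the two properties the method relies on later: the PCs are mutually uncorrelated, and the inverse transform recovers the original data exactly because $U$ is orthogonal.

```python
import numpy as np

def pc_transform(X):
    """PC generation in the spirit of Eqs. (4)-(10).

    X has shape (n, N_s): one row per dimension (e.g. wind farm),
    one column per sample.
    """
    x_mean = X.mean(axis=1, keepdims=True)      # row means, cf. Eq. (4)
    Xc = X - x_mean                             # centralized data X'
    C = Xc @ Xc.T / (X.shape[1] - 1)            # covariance matrix C
    lam, U = np.linalg.eigh(C)                  # eigh: C is symmetric
    order = np.argsort(lam)[::-1]               # largest eigenvalue first
    lam, U = lam[order], U[:, order]
    Z = U.T @ Xc                                # Eq. (10): each row is a PC
    return Z, U, x_mean, lam

def pc_inverse(Z, U, x_mean):
    """Inverse transform: U is orthogonal, so X = U Z + mean."""
    return U @ Z + x_mean

# Synthetic example: three strongly correlated "wind speed" rows
rng = np.random.default_rng(0)
base = 8.0 * rng.weibull(2.0, size=5000)
X = np.vstack([base + rng.normal(0.0, s, size=5000) for s in (0.5, 1.0, 1.5)])
Z, U, x_mean, lam = pc_transform(X)
print(np.round(np.corrcoef(X), 3))   # strong off-diagonal correlation
print(np.round(np.corrcoef(Z), 3))   # off-diagonal entries are ~0
```

Using `eigh` rather than `eig` exploits the symmetry of $C$ and returns real, orthonormal eigenvectors, which is exactly what makes the inverse a simple matrix product.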

Copula Theory
A copula is a function that combines marginal distribution functions into a joint distribution capturing all of the dependence information between variables. In addition to linear correlation, copula functions have the distinct advantage of being able to describe nonlinear correlation. Sklar's theorem, the theoretical foundation of copula functions, defines the joint distribution of variables. According to Sklar's theorem [23], for $n$ random variables with marginal CDFs $F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$, there exists a copula function $C$ (unique when the marginals are continuous) that gives their joint distribution:

$$F(x_1, x_2, \ldots, x_n) = C\big(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)\big) \tag{11}$$

The joint probability density can be formulated as:

$$f(x_1, x_2, \ldots, x_n) = c\big(F_1(x_1), F_2(x_2), \ldots, F_n(x_n)\big)\prod_{k=1}^{n} f_k(x_k) \tag{12}$$

where $C$ is the copula function, $c$ is the density function of $C$, and $f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$ are the marginal probability density functions. As illustrated in Figure 1, copula functions can be classified into two families: the elliptical copula family and the Archimedean copula family. The Archimedean family contains a number of copula functions, the most frequently used of which are the Gumbel, Clayton, and Frank copulas. The Gaussian copula and the t copula are the most frequently encountered members of the elliptical family.
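Sklar's theorem also works generatively: draw the dependence from a copula, then impose any marginals through inverse CDFs. The sketch below assumes a Gaussian copula and two hypothetical Weibull marginals (a common wind speed model, but not one fitted in this article). It checks that strictly increasing marginal transforms leave the ranks, and therefore the copula, unchanged, which is why the dependence structure and the marginals can be modeled separately.

```python
import math
import numpy as np

# Standard normal CDF from math.erf, vectorized (avoids a SciPy dependency)
norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def weibull_ppf(u, shape, scale):
    """Inverse CDF of a Weibull distribution, a hypothetical wind speed marginal."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

def sample_with_gaussian_copula(rho, n, rng):
    """Dependence from a bivariate Gaussian copula, margins from two Weibulls."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)
    u = norm_cdf(z)                              # copula sample on (0, 1)^2
    x = np.column_stack([weibull_ppf(u[:, 0], 2.0, 8.0),
                         weibull_ppf(u[:, 1], 2.5, 7.0)])
    return u, x

def ranks(a):
    """Zero-based rank of each element."""
    return np.argsort(np.argsort(a))

rng = np.random.default_rng(1)
u, x = sample_with_gaussian_copula(0.8, 2000, rng)
# Strictly increasing marginal transforms preserve ranks, so rank-based
# dependence (the copula itself) is identical for u and x.
print(all(np.array_equal(ranks(u[:, k]), ranks(x[:, k])) for k in range(2)))
```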

R-Vine Copula
Copula functions, as discussed in Section 2.2.1, exhibit a variety of characteristics [22].While Gaussian, t, and Frank copulas all have a symmetric structure, only the t copula is suitable for capturing upper and lower tail dependency structures.Both the Clayton copula and the Gumbel copula are asymmetric structures that can be used to represent lower and upper tail dependency structures, respectively.However, only the Gaussian and t copulas are suitable for describing the joint distribution of multivariate variables, whereas the other functions are restricted to establishing correlation between binary variables.As a result, a single copula function has limitations in capturing the correlation of multivariate variables with varying tail dependency structures.
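The symmetric/asymmetric tail behavior described above can be made concrete with the standard closed-form Kendall's $\tau$ relations and tail dependence coefficients $\lambda_L$ (lower) and $\lambda_U$ (upper) of these bivariate families. These are textbook results listed for reference, not quantities estimated in this study; $t_{\nu+1}$ denotes the CDF of Student's t distribution with $\nu+1$ degrees of freedom, and Frank's $\tau$ (which involves a Debye function) is omitted.

```latex
\begin{aligned}
&\text{Clayton } (\theta>0): && \tau = \frac{\theta}{\theta+2}, && \lambda_L = 2^{-1/\theta}, && \lambda_U = 0\\
&\text{Gumbel } (\theta\ge 1): && \tau = 1-\frac{1}{\theta}, && \lambda_L = 0, && \lambda_U = 2-2^{1/\theta}\\
&\text{Frank}: && && \lambda_L = 0, && \lambda_U = 0\\
&\text{Gaussian } (|\rho|<1): && \tau = \frac{2}{\pi}\arcsin\rho, && \lambda_L = 0, && \lambda_U = 0\\
&\text{Student } t\ (\rho,\nu): && \tau = \frac{2}{\pi}\arcsin\rho, && \lambda_L = \lambda_U = 2\,t_{\nu+1}\!\left(-\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}}\right)
\end{aligned}
```

For example, a Clayton copula with $\theta = 2$ has $\tau = 0.5$ and $\lambda_L = 2^{-1/2} \approx 0.71$ but no upper tail dependence, whereas a Gaussian copula with the same $\tau$ has no tail dependence at all.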
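The emergence of the vine copula solves the problem of constructing high-dimensional copulas. The fundamental concept of the vine copula is to decompose the multivariate copula density into a product of bivariate (pair) copula densities using the recursive relation in Equation (13) [22]:

$$f(x \mid \mathbf{v}) = c_{x v_j \mid \mathbf{v}_{-j}}\big(F(x \mid \mathbf{v}_{-j}),\, F(v_j \mid \mathbf{v}_{-j})\big)\, f(x \mid \mathbf{v}_{-j}) \tag{13}$$

where $\mathbf{v}$ is a vector composed of a set of variables, $v_j$ is a variable in $\mathbf{v}$, and $\mathbf{v}_{-j}$ is $\mathbf{v}$ with $v_j$ removed. As Equation (13) shows, the decomposition is not unique for $n$-variate ($n \ge 3$) copula functions, because any $v_j$ may be chosen at each step.

Different decompositions produce sets of trees with distinct structures; the number of trees is one less than the number of variables, and the resulting structure is referred to as an R-vine structure. The $n-1$ trees are denoted $T_1, T_2, \ldots, T_{n-1}$: the nodes of $T_1$ are the variables, and the edges of each tree $T_i$ become the nodes of $T_{i+1}$. It should be stressed that two edges of tree $T_i$ can be connected in tree $T_{i+1}$ only if they share a common node (the proximity condition) [34]. Taking $n = 4$ as an example, Figure 2 shows one R-vine structure for four-dimensional variables. Accordingly, the R-vine copula joint probability density of Equation (12) can be written as [34]:

$$f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f_k(x_k) \prod_{i=1}^{n-1} \prod_{e \in E_i} c_{j(e), k(e) \mid D(e)}\big(F(x_{j(e)} \mid \mathbf{x}_{D(e)}),\, F(x_{k(e)} \mid \mathbf{x}_{D(e)})\big)$$

where $E_i$ is the edge set of tree $T_i$, $j(e)$ and $k(e)$ are the conditioned variables of edge $e$, and $D(e)$ is its conditioning set.

Formulation of Wind Speed Scenarios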

Reification of the Structure of the R-Vine Copula Model
This research used the approach proposed by Dißmann et al. [34] to determine the structure of the R-vine copula model. The development of an R-vine copula model is divided into three stages: the first determines the specific structure of each tree of the vine, the second estimates the parameters of the bivariate copula corresponding to each edge, and the third evaluates the goodness of fit of each bivariate copula. The trees of the R-vine copula model are constructed as maximum spanning trees, i.e., with the minimum spanning tree algorithms of Prim and Kruskal applied with the opposite (maximization) criterion. Prim's algorithm, which is more suitable for dense networks, is used to find the maximum spanning tree at each tree level of the R-vine copula model. The weight assigned to each edge is the absolute value of the empirical Kendall's rank correlation coefficient $\tau$ of the corresponding variable pair, which may be calculated as:

$$\tau = \frac{2}{n(n-1)}\sum_{1 \le i < j \le n} \operatorname{sign}(x_i - x_j)\,\operatorname{sign}(y_i - y_j) \tag{17}$$

where $x_i$ and $y_i$ ($i = 1, 2, \ldots, n$) are the samples of the two variables of the bivariate copula, and $n$ is the total number of samples in each sample set.
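The first stage of the structure selection can be sketched in a few lines: compute Eq. (17) for every pair, weight edges by $|\tau|$, grow a maximum spanning tree with Prim's algorithm, and then list which edges may be joined in the next tree under the proximity condition. This is a minimal, dense O(d³) illustration on synthetic data with hypothetical helper names, not the implementation used in [34]; only the first tree is built, and ties in $\tau$ are not corrected for.

```python
from itertools import combinations
import numpy as np

def kendall_tau(x, y):
    """Empirical Kendall's tau, Eq. (17); O(n^2), no tie correction."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    s = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    return s.sum() / (n * (n - 1))    # full matrix counts each pair twice

def prim_max_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix, maximizing weight."""
    d = W.shape[0]
    in_tree, edges = [0], []
    while len(in_tree) < d:
        best = None
        for i in in_tree:
            for j in range(d):
                if j in in_tree:
                    continue
                if best is None or W[i, j] > W[best]:
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def candidate_edges(tree_edges):
    """Proximity condition: two edges of T_i may be joined in T_{i+1}
    only if they share a node (edges given as frozensets of two nodes)."""
    return [(a, b) for a, b in combinations(tree_edges, 2) if a & b]

# First tree T_1 on synthetic data (4 hypothetical variables)
rng = np.random.default_rng(2)
base = rng.normal(size=300)
data = np.column_stack([base + rng.normal(0.0, s, size=300)
                        for s in (0.3, 0.6, 2.0, 0.4)])
d = data.shape[1]
W = np.zeros((d, d))
for i, j in combinations(range(d), 2):
    W[i, j] = W[j, i] = abs(kendall_tau(data[:, i], data[:, j]))
T1 = [frozenset(e) for e in prim_max_spanning_tree(W)]
print(T1)
print(candidate_edges(T1))   # pairs of T_1 edges eligible to be joined in T_2
```

Because an R-vine on $d$ variables has $d(d-1)/2$ pair copulas, each 24-dimensional temporal vine of Step 4 in Section 3 needs 276 of them (828 for three PC orders), whereas modeling all $3 \times 24 = 72$ farm-hour variables in one vine would need 2556; this is the dimensionality burden that the PC-based separation is meant to avoid.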
The Akaike information criterion (AIC) is used to determine the optimal bivariate copula for each edge of each tree of the R-vine copula model and may be expressed as follows [28]:

$$\mathrm{AIC} = -2\sum_{i=1}^{n} \ln c(u_i, v_i \mid \theta) + 2k$$

where $k$ is the number of parameters of the corresponding copula pair, $c$ is the copula density function with parameter vector $\theta$, and $u_i$ and $v_i$ are the CDF values of the $i$-th values of the two sample sets. Maximum likelihood estimation (MLE) is applied to estimate the parameters of the bivariate copulas. For the two marginal CDFs $u_i = F_1(x_i)$ and $v_i = F_2(y_i)$, the maximum likelihood estimate of the copula parameters can be expressed as [19]:

$$\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} \ln c\big(F_1(x_i), F_2(y_i) \mid \theta\big)$$

A more extensive discussion of R-vine copula model construction and parameter calculation is provided in [34]. Figure 3 depicts the flowchart for building the R-vine copula model structure.
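As a minimal sketch of the per-edge selection step, the code below fits two candidate one-parameter families (Clayton and Frank) plus the independence copula by MLE and picks the smallest AIC. The grid search stands in for a numerical optimizer, and the candidate set is deliberately small; actual R-vine software considers many more families, including rotated versions. The synthetic Clayton data are only there to show that the criterion recovers the generating family.

```python
import numpy as np

def clayton_logpdf(u, v, theta):
    """Log density of the bivariate Clayton copula (theta > 0)."""
    s = u ** -theta + v ** -theta - 1.0
    return (np.log1p(theta) - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(s))

def frank_logpdf(u, v, theta):
    """Log density of the bivariate Frank copula (theta > 0)."""
    a = -np.expm1(-theta)                                  # 1 - e^{-theta}
    den = a - (-np.expm1(-theta * u)) * (-np.expm1(-theta * v))
    return np.log(theta * a) - theta * (u + v) - 2.0 * np.log(den)

def mle_grid(logpdf, u, v, grid):
    """Maximum likelihood by grid search over a scalar parameter."""
    ll = np.array([logpdf(u, v, t).sum() for t in grid])
    k = int(np.argmax(ll))
    return grid[k], ll[k]

def select_by_aic(u, v):
    """AIC = -2 log-likelihood + 2k for each candidate family; smallest wins."""
    results = {"independence": (None, 0.0)}                 # c = 1, k = 0
    candidates = {"clayton": (clayton_logpdf, np.linspace(0.05, 15.0, 600)),
                  "frank": (frank_logpdf, np.linspace(0.05, 30.0, 600))}
    for name, (logpdf, grid) in candidates.items():
        theta, ll = mle_grid(logpdf, u, v, grid)
        results[name] = (theta, -2.0 * ll + 2.0 * 1)        # k = 1 parameter
    best = min(results, key=lambda name: results[name][1])
    return best, results

def pseudo_obs(x):
    """Rank-based CDF values in (0, 1)."""
    return (np.argsort(np.argsort(x)) + 1.0) / (len(x) + 1.0)

def sample_clayton(theta, n, rng):
    """Conditional-inversion sampler for the Clayton copula."""
    u, w = rng.uniform(size=n), rng.uniform(size=n)
    v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

rng = np.random.default_rng(3)
u, v = sample_clayton(3.0, 2000, rng)
best, results = select_by_aic(pseudo_obs(u), pseudo_obs(v))
print(best, results[best])
```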

Procedure of Scenario Generation Method
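Based on Sections 2 and 3.1, the wind speed scenarios are generated from the historical wind speeds of $p$ wind farms over $q$ years as follows:

Step 1: The historical wind speeds of each wind farm are grouped by hour of the day, giving the data sets $W_k = [w_{k,1}; w_{k,2}; \ldots; w_{k,24}]$ ($k = 1, 2, \ldots, p$), where $w_{k,i}$ contains the wind speeds of wind farm $k$ at hour $i$.

Step 2: For the wind speed data sets of the $p$ wind farms at the same hour $i$, the PC generation process introduced in Section 2.1 is used to transform the $p$-dimensional spatially correlated wind speed sets into PCs $Z_i = [z_{1,i}; z_{2,i}; \ldots; z_{p,i}]$ ($i = 1, 2, \ldots, 24$).

Step 3: The kernel density estimation (KDE) method [21] is applied to estimate the marginal cumulative distribution function (CDF) of each PC, and the PC values are transformed into CDF values.

Step 4: Combining Sections 2.2 and 3.1, the corresponding R-vine copula structure is determined for the transformed matrix of the PCs of the same order across the 24 hours, $[z_{j,1}; z_{j,2}; \ldots; z_{j,24}]$ ($j = 1, 2, \ldots, p$).

Step 5: According to the fitted R-vine copula models, 24-dimensional temporally correlated PC scenarios $S_1; S_2; \ldots; S_p$ are generated, with the sampled CDF values mapped back to PC values through the inverse of the KDE marginal CDFs.

Step 6: Through the data reconstruction in Equation (21) [10], the wind speed scenario matrix of each hour is obtained:

$$X^{s}_i = U_i Z^{s}_i + \bar{x}_i\,\mathbf{1}^{T}, \qquad i = 1, 2, \ldots, 24 \tag{21}$$

where $Z^{s}_i$ is the matrix of generated PC scenarios for hour $i$ (its $j$-th row taken from $S_j$), $\mathbf{1}$ is a column vector of ones, and $\bar{x}_i$ and $U_i$ are the mean and the eigenvector matrix obtained when the PCs are generated from the wind farms' data of the $i$-th hour, as given in Equations (4) and (9). The pseudo code of the scenario generation is as follows: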
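Steps 3 and 5 hinge on a KDE marginal CDF and its inverse. The sketch below illustrates only that component, not the article's pseudo code, and it assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth, neither of which the text specifies. The helper names are hypothetical, and the inverse is obtained numerically by interpolating the CDF on a grid.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed; the article does not state one)."""
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-0.2)

def kde_cdf(samples, h):
    """Gaussian-kernel KDE of the CDF: F(t) = mean_i Phi((t - x_i) / h)."""
    samples = np.asarray(samples, float)

    def F(t):
        t = np.atleast_1d(np.asarray(t, float))
        z = (t[:, None] - samples[None, :]) / (h * math.sqrt(2.0))
        return 0.5 * (1.0 + _erf(z)).mean(axis=1)

    return F

def kde_inverse_cdf(F, lo, hi, grid_size=801):
    """Numerical inverse of the increasing function F by grid interpolation."""
    grid = np.linspace(lo, hi, grid_size)
    vals = F(grid)                                 # precomputed CDF on the grid
    return lambda p: np.interp(p, vals, grid)

rng = np.random.default_rng(4)
pc = rng.normal(0.0, 2.0, size=400)            # synthetic first-PC values
h = silverman_bandwidth(pc)
F = kde_cdf(pc, h)
u = F(pc)                                      # Step 3: PC values -> (0, 1)
lo, hi = pc.min() - 3 * h, pc.max() + 3 * h
F_inv = kde_inverse_cdf(F, lo, hi)
pc_back = F_inv(u)                             # Step 5: (0, 1) -> PC values
print(np.max(np.abs(pc_back - pc)))            # bounded by the grid spacing
```

Since the KDE CDF is strictly increasing, the round-trip error is at most one grid spacing, so the grid size directly controls the accuracy of the inverse mapping.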

Results and Analysis
This section begins by introducing the wind speed measurements. In addition to the five major copula functions illustrated in Figure 1, the R-vine structure also employs two further copula functions, namely the Joe copula and the Ali-Mikhail-Haq copula. The validity of the scenario generation approach used in this article is then established, and 1000 generated wind speed scenarios are compared with the samples using three-dimensional figures. Finally, the other two models are used to generate the same number of scenarios, and several metrics are used to compare and examine the accuracy of the three models.

Data Sources
The wind speeds of three adjacent wind farms were simulated and studied in this article. The wind speed data were provided by the National Renewable Energy Laboratory (NREL), and the three wind farms are identified by site IDs 604171, 605146, and 606121 [35]. The data cover a six-year period from 1 January 2007 to 31 December 2012 for wind turbines with a hub height of 100 m, with a sampling interval of one hour. The Kendall's rank correlation coefficients between the wind speed sample sets of each pair of the three wind farms were 0.8992, 0.7872, and 0.8743, indicating strong spatial correlations among the three wind farms.

Evaluation of the Process for Generating Wind Speed Scenarios
Wind speed samples were transformed into the corresponding PC values using the orthogonal transformation of PC theory and then converted into CDF values using KDE. Because the distributions of PC values of different orders differ considerably, and the first PC usually contains most of the information in the samples, the first PCs at hours 7, 15, and 23 are taken as examples so that a unified legend can be used. Figure 4 illustrates the KDE probability density plots and the corresponding frequency histograms of these PC values. For consistency with these examples, the subsequent analysis also used the first PCs of the three time stamps [22]. As illustrated in Figure 4, the KDE method obtained continuous probability distributions from the discrete samples while retaining the statistical properties of the numerical distribution.

To demonstrate that the orthogonal transformation of PC theory eliminated the correlation between PCs in the same time period, Figure 5 shows scatter plots of the wind speed samples of the three wind farms and the corresponding PCs at the 7th, 15th, and 23rd hours, all transformed into CDF values using KDE. Converting all data into CDF values before plotting places the comparisons on a common scale. As illustrated in Figure 5, the three PCs created from the sample data of three wind farms with clear spatial correlation were dispersed uniformly in the value space with no visible correlation. This result provides a sound rationale for modeling only the temporal correlation between PCs. To examine the temporal correlation of the PCs, Figure 6 presents scatter plots of the CDF values of the wind speed samples and of the corresponding first PCs for each wind farm (WF) across the 7th, 15th, and 23rd hours. The distribution space and degree of dispersion of the samples and the PCs were nearly identical, which demonstrates that the PCs still reflect the temporal correlation of the samples. In summary, the PC generation process applied to the three wind farm samples in each period preserved the temporal correlation between the samples while eliminating the spatial correlation between wind farms, which supports considering only the temporal correlation among the PCs in the subsequent modeling.

The final wind speed scenarios were derived by applying the inverse transformations to the PC scenarios generated using the R-vine copula. Figure 7 depicts scatter plots of the wind speed samples and scenarios for each wind farm at the 7th, 15th, and 23rd hours. As illustrated in Figure 7, the generated wind speed scenarios were concentrated in the lower half of the axis range, and their distribution space and degree of dispersion were essentially identical to those of the wind speed samples. Figure 8 illustrates the relationship between the wind speed samples and scenarios of the three wind farms at the 7th, 15th, and 23rd hours, respectively, showing the same linear relationship for samples and scenarios. This comparison shows that the wind speed scenarios generated by the proposed method retained the correlation characteristics of the wind speed samples and covered nearly the same value interval as the samples. Owing to space constraints, only the 7th, 15th, and 23rd hours, chosen at random, are shown as graphical examples; the results for the hours not shown were consistent with those displayed. Overall, the comparison between the final wind speed scenarios and the wind speed samples demonstrates that the method effectively captured the correlation and numerical characteristics of the sample data.

Comparison of Typical Scenario Generation Methods
After comparing the wind speed samples with the generated scenarios, this part compares the proposed model, referred to as the PC-R-vine model, with two other approaches using the same samples, applying multiple assessment criteria to evaluate each method's performance. Because [10] was the first study to apply PC theory to wind speed scenario generation, its PCA-ARMA model was used as one of the comparison approaches. Additionally, the Hourly Mixed Copula Model (HMCM) [28], which has been shown to outperform several earlier approaches for generating wind speed scenarios, was compared with the proposed method.
Evaluation indicators for scenario generation fall into three types: output-based, distribution-based, and event-based evaluation [36]. All three types were used to compare the advantages and disadvantages of the models fully.

Output-based evaluation
Whereas the mean absolute error only indicates the magnitude of the error, the mean relative error indicates the accuracy of the scenarios relative to the scale of the data. This article therefore used the average relative error to compare the mean, standard deviation, skewness, and kurtosis of the scenarios with those of the samples. The mean reflects the overall numerical level of the data, and its relative error $E_{\mathrm{mean}}$ between scenarios and samples can be expressed as follows [22]:

$$E_{\mathrm{mean}} = \frac{1}{24p}\sum_{N=1}^{p}\sum_{i=1}^{24}\frac{\left|\mu^{s}_{N,i} - \mu_{N,i}\right|}{\left|\mu_{N,i}\right|} \tag{22}$$

Similarly, the relative errors of the standard deviation, skewness, and kurtosis are given by Equations (23)-(25):

$$E_{G} = \frac{1}{24p}\sum_{N=1}^{p}\sum_{i=1}^{24}\frac{\left|G^{s}_{N,i} - G_{N,i}\right|}{\left|G_{N,i}\right|}, \qquad G \in \{\mathrm{Std}, \mathrm{Skew}, \mathrm{Kur}\}$$

where $\mu_{N,i}$, $\mathrm{Std}_{N,i}$, $\mathrm{Skew}_{N,i}$, and $\mathrm{Kur}_{N,i}$ are the samples' mean, standard deviation, skewness, and kurtosis of wind farm $N$ at the $i$-th hour, the superscript $s$ denotes the same statistics computed from the scenarios, and $p$ is the number of wind farms. The Euclidean distance between the correlation coefficients of the wind speed samples and those of the scenarios indicates the degree to which the original correlation has been restored. This article assessed this distance in terms of both temporal and spatial correlation, as defined in [28].

Table 1 presents these metrics for the three models discussed in this section. As shown in Table 1, all evaluation metrics of the PC-R-vine model except the standard deviation error were better than those of the other two models. Specifically, compared with the PCA-ARMA and HMCM models, the PC-R-vine model produced the lowest relative errors in the mean, skewness, and kurtosis of the wind speed scenarios, indicating that the numerical range and probability distribution characteristics of the wind speed data it generated were closer to those of the wind speed samples. For the standard deviation, the relative errors of the PC-R-vine and HMCM models were much smaller than that of the PCA-ARMA model, but the PC-R-vine model's error was larger than that of the HMCM model. This indicates that the fluctuation of the wind speeds generated by the PC-R-vine model deviated from that of the samples significantly less than for the PCA-ARMA model, but slightly more than for the HMCM model. Although HMCM had a lower relative error for the standard deviation, its mean had a higher relative error: while the standard deviation of its scenarios was closer to that of the samples, their numerical level was lower than that of the sample data. To some extent, this model generated small fluctuations around low values, which distorted the numerical distribution of the scenarios; this is also why the relative errors of skewness and kurtosis for HMCM were larger. This also shows that an effective model test requires a thorough evaluation using a variety of indicators. Additionally, the Euclidean distances for temporal and spatial correlations of the PC-R-vine model were significantly smaller than those of the PCA-ARMA and HMCM models, indicating that the wind speed scenarios generated by the PC-R-vine model better retained the correlation characteristics of the wind speed samples.
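The output-based indicators can be computed along the following lines. This is a minimal sketch under stated assumptions: each column is one farm-hour combination, the relative errors are averaged over all columns, kurtosis is the non-excess (Pearson) form, and the correlation distance uses Pearson correlation matrices with the Frobenius norm. The exact aggregation in [22] and [28] may differ, and the data are synthetic. Note that the relative skewness error becomes unstable when the sample skewness is close to zero.

```python
import numpy as np

def moments(x):
    """Column-wise mean, std, skewness and (non-excess) kurtosis."""
    m = x.mean(axis=0)
    s = x.std(axis=0)
    z = (x - m) / s
    return m, s, (z ** 3).mean(axis=0), (z ** 4).mean(axis=0)

def mean_relative_errors(samples, scenarios):
    """Average relative error of each statistic over all columns (farm-hours)."""
    names = ("mean", "std", "skew", "kurt")
    return {name: float(np.mean(np.abs(b - a) / np.abs(a)))
            for name, a, b in zip(names, moments(samples), moments(scenarios))}

def correlation_distance(samples, scenarios):
    """Euclidean (Frobenius) distance between Pearson correlation matrices."""
    r = np.corrcoef(samples, rowvar=False)
    r_s = np.corrcoef(scenarios, rowvar=False)
    return float(np.linalg.norm(r - r_s))

rng = np.random.default_rng(5)
samples = 8.0 * rng.weibull(2.0, size=(500, 6))     # synthetic: 6 farm-hour columns
scenarios = 8.0 * rng.weibull(2.0, size=(1000, 6))  # synthetic "scenarios"
print(mean_relative_errors(samples, scenarios))
print(correlation_distance(samples, scenarios))
```

For temporal correlation the columns would be the 24 hours of one wind farm; for spatial correlation, the $p$ wind farms at one hour.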
In summary, although the PC-R-vine model did not outperform the HMCM model in terms of the standard deviation, it outperformed both models on the other assessment criteria. Therefore, the wind speed scenarios generated by the PC-R-vine model retained a higher degree of statistical fidelity to the wind speed samples.

Distribution-based evaluation
In addition to numerical characteristics, the distribution differences between scenarios and samples must be investigated. To examine the consistency between the wind speed samples and scenarios of each wind farm, quantile-quantile plots (QQ plots) between the wind speed values generated by each model and the corresponding sample data are shown in Figures 9-11. As illustrated in Figure 9, for wind farm 1, the difference in probability distribution between the scenarios generated by the HMCM model and the wind speed samples was the most noticeable. Compared with the PC-R-vine model, the distribution of the PCA-ARMA model deviated more prominently in the right corner of Figure 9b. Comparing Figure 10a-c, the discrepancy between the probability distributions of the wind speed scenarios from the PC-R-vine model and the wind speed samples was the smallest for wind farm 2. This result was also supported by the wind speed data for wind farm 3, as illustrated in Figure 11a-c. It can therefore be inferred that the probability distribution of the PC-R-vine model's wind speed scenarios is the most similar to that of the wind speed samples.
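A QQ plot pairs the empirical quantiles of two data sets at the same probability levels; points on the 45-degree line mean matching distributions. The sketch below computes those pairs and, as a hypothetical scalar summary not used in the article, the largest vertical distance from the line. All data are synthetic.

```python
import numpy as np

def qq_points(samples, scenarios, levels=np.linspace(0.01, 0.99, 99)):
    """Paired empirical quantiles; plotting them against each other gives a QQ plot."""
    return np.quantile(samples, levels), np.quantile(scenarios, levels)

def max_qq_deviation(samples, scenarios):
    """Largest distance of the QQ points from the 45-degree line y = x."""
    q_sample, q_scen = qq_points(samples, scenarios)
    return float(np.max(np.abs(q_scen - q_sample)))

rng = np.random.default_rng(6)
obs = 8.0 * rng.weibull(2.0, size=5000)          # synthetic "samples"
good = 8.0 * rng.weibull(2.0, size=5000)         # same distribution
biased = 6.0 * rng.weibull(2.0, size=5000)       # scaled-down distribution
print(max_qq_deviation(obs, good), max_qq_deviation(obs, biased))
```

The scaled-down set bends away from the line mostly at the upper quantiles, which is the kind of tail discrepancy the figures are used to detect.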

Event-based evaluation
The purpose of the event-based evaluation index is to conduct a comparative analysis over the entire data set. The concept of coverage rate was used in this article to determine the degree to which the sample data were included in the generated scenarios. Because most of the sample data fell within the numerical range of the generated scenarios, this article modified the concept of coverage rate into an uncovered percentage metric (UPM), defined following [36], in which SN denotes the number of scenarios. Table 2 summarizes the UPM results for the three models. As shown in Table 2, the PC-R-vine method produced the lowest UPM value, 0.22%, indicating that the wind speed scenarios generated by the PC-R-vine model covered more wind speed samples than those of the other two methods. Consequently, the wind speed scenarios generated by the proposed method were more representative of the data as a whole, and their overall accuracy and reliability were higher.
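The exact UPM formula of [36] is not reproduced in this text, so the sketch below uses an assumed, hypothetical definition: a sample value counts as uncovered when it lies outside the [min, max] envelope of the SN scenarios at the same hour, and UPM is the percentage of uncovered sample values.

```python
import numpy as np

def uncovered_percentage(samples, scenarios):
    """Hypothetical UPM: share (in %) of sample values lying outside the
    [min, max] envelope of the SN scenarios at the same hour.

    samples: (days, hours); scenarios: (SN, hours).
    """
    lo = scenarios.min(axis=0)
    hi = scenarios.max(axis=0)
    outside = (samples < lo) | (samples > hi)
    return 100.0 * float(outside.mean())

rng = np.random.default_rng(7)
samples = 8.0 * rng.weibull(2.0, size=(365, 24))      # synthetic daily profiles
scenarios = 8.0 * rng.weibull(2.0, size=(1000, 24))   # SN = 1000 synthetic scenarios
print(uncovered_percentage(samples, scenarios))
```

Under this definition, UPM is simply one minus the coverage rate, expressed in percent.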

Model      PC-R-vine   PCA-ARMA   HMCM
UPM (%)    0.22        0.46       0.49

The proposed method was compared with two other models using three types of evaluation metrics. The results indicated that the proposed model outperformed the other two models in numerous aspects of these three indicator categories. In general, the PC-R-vine model was superior to the PCA-ARMA model in most dimensions. This is mainly owing to the flexibility of the copula theory used in the former, which allows nonlinear correlations to be captured, whereas the latter is limited to linear correlations. Additionally, the PC-R-vine model is preferable to the HMCM model because the R-vine copula is flexible in the selection of copula functions, and PC theory and its inverse transformation ensure the effective retention of correlations.
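The distinction between linear and nonlinear correlation can be seen with a small synthetic example: a strictly increasing but nonlinear transform leaves rank-based (copula-type) dependence measures such as Spearman's rho unchanged, while it changes Pearson's linear correlation. This sketch only illustrates that property; it is not an implementation of either model.

```python
import numpy as np

def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    # Spearman's rho = Pearson correlation of the ranks (continuous data, no ties)
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return pearson(ranks_a, ranks_b)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)
y_nonlinear = np.exp(2.0 * y)       # strictly increasing, nonlinear transform of y

pearson_lin, pearson_nl = pearson(x, y), pearson(x, y_nonlinear)
spearman_lin, spearman_nl = spearman(x, y), spearman(x, y_nonlinear)
```

The rank-based coefficient is identical for `y` and `y_nonlinear`, whereas the Pearson coefficient drops sharply, even though the dependence between the variables is the same.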
It should be noted that both the PC and R-vine copula theories used in this article are applicable to high-dimensional data, so the results are expected to remain stable and accurate as the number of wind farms increases. This model is therefore applicable to a larger number of wind farms. In addition, the primary goal of scenario generation methods is to ensure model accuracy, followed by acceptable runtime; the runtime of a scenario generation process depends on the specific method used and is acceptable as long as it remains within a reasonable range. While scenario generation methods that do not use a vine structure have some advantages in runtime, their accuracy is frequently inferior to that of methods combined with vine copulas. Since the purpose of this article was to provide a scenario generation method that maintains accuracy at an acceptable computational cost, no detailed comparison of model running times was made. Although the running time of the proposed method was slightly longer than those of the HMCM and PCA-ARMA models, it was well within an acceptable range, and the method's accuracy was significantly higher than that of the other two models, which is consistent with the article's objectives.

Comparison of Several Vine Structure Models
To demonstrate the importance and effectiveness of the R-vine in this article, the most frequently used C-vine and D-vine structures were applied to the scenario generation process introduced in Section 3.2: the R-vine copula was replaced with C-vine and D-vine copulas, and the accuracy of the wind speed scenarios generated by each was compared and analyzed. Because C-vine and D-vine copula techniques are mature and space is limited, their structures are not described in detail here. The accuracy of the C-vine and D-vine models was evaluated using the indicators introduced in Section 4.3, and the results are shown in Table 3. As shown in Table 3, there was little difference between the results obtained with the C-vine and D-vine, because the fixed structures of the two limit how much they can differ. D-vine had the smallest mean relative error in skewness, and C-vine had the smallest Euclidean distance with respect to temporal correlation. Taken individually, this indicates that D-vine deviated the least from the sample distribution and C-vine was the most capable of capturing temporal correlations. Nonetheless, as demonstrated in Section 4.3, their poor performance on the other indicators means that these single-indicator advantages did not translate into accuracy of the final results.
Compared with the C-vine and D-vine copulas, the R-vine copula performed better in terms of mean, standard deviation, kurtosis, Euclidean distance with respect to spatial correlation, and UPM, indicating that the R-vine results were superior to those of the other two structures in numerical level, degree of fluctuation, ability to capture spatial correlation, and degree of sample data inclusion. Although the R-vine copula performed slightly worse than the C-vine and D-vine copulas in capturing temporal correlation, the scenarios it generated were closer to the numerical level of the sample data. While the C-vine and D-vine retained the greatest degree of temporal correlation, their errors relative to the sample data were larger, resulting in a greater difference between the distribution and numerical characteristics of their final results and those of the sample data. The results obtained by the R-vine were therefore more precise than those obtained by the C-vine and D-vine.
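One common way to give an R-vine a data-driven structure, assumed here for illustration since the selection procedure is only summarized in Figure 3, is the sequential method of Dißmann et al., whose first tree is the maximum spanning tree of the absolute Kendall's tau values. The sketch below contrasts that free choice with the star that a C-vine imposes on its first tree; only the first tree is shown, and the hand-written weight matrix is hypothetical.

```python
import numpy as np
from scipy.stats import kendalltau

def abs_tau_matrix(data):
    """|Kendall's tau| between every pair of columns of `data`."""
    n = data.shape[1]
    tau = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            t, _ = kendalltau(data[:, i], data[:, j])
            tau[i, j] = tau[j, i] = abs(t)
    return tau

def rvine_first_tree(weights):
    """Maximum spanning tree (Prim's algorithm): free choice of first-tree edges."""
    n = weights.shape[0]
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # strongest edge connecting the current tree to a new variable
        i, j = max(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: weights[e[0], e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges

def cvine_first_tree(weights):
    """C-vine: star around the variable with the largest total dependence."""
    root = int(np.argmax(weights.sum(axis=1)))
    return [(root, j) for j in range(weights.shape[0]) if j != root]

w = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.8, 0.1],
              [0.1, 0.8, 1.0, 0.7],
              [0.1, 0.1, 0.7, 1.0]])
```

With these weights the R-vine recovers the path 0-1-2-3, whereas the C-vine is forced into a star around variable 1 and must include the weak edge (1, 3). A D-vine would impose a path regardless of the data, which is the kind of fixed-structure restriction discussed above.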
In addition, for the wind speed data of p wind farms with time resolution T, the number of parameters to be estimated by the proposed method can be expressed in terms of p, T, and two constants C and D. It can be seen from this expression that the method greatly reduces the number of parameters to be estimated and hence the computational burden. Therefore, the proposed method is an effective scenario generation method that takes into account both accuracy and computational efficiency.
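As a rough, illustrative count (a simplification, not the paper's own parameter expression), the number of pair copulas in an n-dimensional regular vine is n(n - 1)/2. Fitting one T-dimensional vine per principal component, as in the R-vine step of Algorithm 1, can be compared with a hypothetical single vine over all p·T wind speed variables; the PCA loadings and the number of parameters per copula family are ignored here.

```python
def pair_copulas(n: int) -> int:
    """Number of bivariate (pair) copulas in an n-dimensional regular vine."""
    return n * (n - 1) // 2

def joint_vine(p: int, T: int) -> int:
    # Hypothetical baseline: a single vine over all p*T wind-speed variables
    return pair_copulas(p * T)

def per_pc_vines(p: int, T: int) -> int:
    # One T-dimensional vine per principal component (R-vine step of Algorithm 1)
    return p * pair_copulas(T)

print(joint_vine(3, 24), per_pc_vines(3, 24))  # 2556 828
```

For p = 3 wind farms and T = 24 h, the per-component vines need 828 pair copulas against 2556 for the joint vine, and the gap widens quadratically as p grows.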

Conclusions
An accurate wind speed scenario generation approach can significantly increase the reliability with which wind power output is incorporated when developing a power system that includes wind power. This article provided a new method for creating wind speed scenarios for multiple wind farms that takes into account the spatiotemporal correlation of wind speeds. First, PC theory was used to transform the 24 h wind speed data of each wind farm into PC values, thereby temporarily eliminating the spatial correlation between wind farms. The simulation results indicated that the temporal correlation of the wind speed data was preserved in the PCs during this stage. Then, the R-vine copula was used to capture the temporal correlation of the resulting PCs, and PC scenarios accounting for this temporal correlation were constructed. Finally, using the inverse of the PC transformation, the PC scenarios were converted into the final wind speed scenarios, restoring the spatial correlation between wind speed data. The simulation study of wind speed data from three wind farms provided by NREL demonstrated that the proposed method captured the correlation relationships more accurately than the HMCM and PCA-ARMA models. Additionally, the statistical properties of the generated scenarios were closer to those of the original data, and the gap between the probability distributions of the generated scenarios and the original data was smaller. These results suggest that the proposed strategy can produce more detailed wind speed scenarios. The method can be applied to scenario-based stochastic programming problems in power systems, supporting the development of power systems with wind power.
Although the combination of PC and R-vine copula theories increased computational efficiency while maintaining high accuracy, it has some drawbacks. Because PC theory only transforms linear correlations, it is limited in modeling the complex nonlinear correlations among wind farms, which restricts its applicability in such cases. In balancing accuracy against computational efficiency, the proposed method sacrifices some accuracy in capturing complex nonlinear correlations. Future research could enhance the ability of PC theory to handle nonlinearly correlated data sets by combining it with other transformations, so that correlated data can be converted into fully independent data, and could extend its applicability to data sets with different correlation structures, thereby further increasing the accuracy of the proposed method.
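The limitation can be illustrated with a toy two-variable example, not wind data: when the second variable is a nonlinear function of the first, PCA produces components that are uncorrelated but still strongly dependent. The uniform input and the squared-score check are assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=5000)
data = np.column_stack([x, x**2])                  # second variable depends on x nonlinearly
centered = data - data.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]                  # largest-variance component first
pcs = centered @ eigvecs[:, order]                 # principal component scores

linear_corr = np.corrcoef(pcs[:, 0], pcs[:, 1])[0, 1]          # ~0: PCs are uncorrelated
nonlinear_dep = np.corrcoef(pcs[:, 0] ** 2, pcs[:, 1])[0, 1]   # far from 0: still dependent
```

Zero linear correlation between the components therefore does not imply independence, which is why the PCs passed to the copula stage may still carry nonlinear cross-dependence.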

Figure 1. The main types of copula functions.

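$$f(x_1,\dots,x_n)=\prod_{k=1}^{n} f_k(x_k)\prod_{i=1}^{n-1}\prod_{e\in E_i} c_{a(e),b(e);D(e)}\left(F(x_{a(e)}\mid x_{D(e)}),\,F(x_{b(e)}\mid x_{D(e)})\right)$$
where $f_k$ is the marginal density of $x_k$, $E_i$ is the edge set of the $i$th tree, $D(e)$ is the set corresponding to the conditioning variables, $a(e)$ and $b(e)$ are the two other variables, excluding the variables in $D(e)$, $x_{D(e)}$ is the set of variables corresponding to $D(e)$, and $c_{a(e),b(e);D(e)}$ is the copula probability density of variables $a(e)$ and $b(e)$ conditional on $D(e)$.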
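For three variables every regular vine is both a C-vine and a D-vine, so the R-vine density reduces to three pair copulas: two in the first tree and one conditional copula in the second. The sketch below evaluates it assuming Gaussian pair copulas and standard normal margins, a deliberately simple choice (rather than the copula families selected in the model) that allows the result to be checked against the trivariate normal density.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

def gauss_pair_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    x, y = norm.ppf(u), norm.ppf(v)
    return np.exp(-(rho**2 * (x**2 + y**2) - 2 * rho * x * y)
                  / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2)

def h_func(u, v, rho):
    """Conditional CDF F(u | v) of the Gaussian pair copula (h-function)."""
    return norm.cdf((norm.ppf(u) - rho * norm.ppf(v)) / np.sqrt(1 - rho**2))

def vine3_density(x, rho12, rho23, rho13_2):
    """3-D vine density: tree 1 edges {1,2}, {2,3}; tree 2 edge {1,3 | 2}."""
    u = norm.cdf(x)                              # margins F_k: standard normal here
    margins = np.prod(norm.pdf(x))               # product of f_k(x_k)
    tree1 = gauss_pair_density(u[0], u[1], rho12) * gauss_pair_density(u[1], u[2], rho23)
    u1_given_2 = h_func(u[0], u[1], rho12)       # F(x1 | x2), D(e) = {2}
    u3_given_2 = h_func(u[2], u[1], rho23)       # F(x3 | x2), D(e) = {2}
    tree2 = gauss_pair_density(u1_given_2, u3_given_2, rho13_2)
    return margins * tree1 * tree2

rho12, rho23, rho13_2 = 0.6, -0.3, 0.4
rho13 = rho13_2 * np.sqrt((1 - rho12**2) * (1 - rho23**2)) + rho12 * rho23
point = np.array([0.3, -1.1, 0.8])
```

With Gaussian pair copulas, the second-tree parameter `rho13_2` is the partial correlation of x1 and x3 given x2, so the vine reproduces a trivariate normal whose correlation between x1 and x3 is `rho13`.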

Figure 3. The flowchart depicting the formation of the R-vine copula model.

$W_N$ ($N = 1, 2, \dots, p$) represents the historical data of the $N$th wind farm divided by time $T = 1, 2, \dots, 24$ h, where the $i$th ($i = 1, 2, \dots, 24$) row $W_N^i$ is the historical wind speed set of wind farm $N$ at the $i$th hour over $365q$ days.
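The division into hourly sample sets amounts to reshaping each farm's hourly series into a 24 × 365q matrix. A minimal sketch, assuming the series starts at the first hour of the first day and has no gaps; the function name is hypothetical.

```python
import numpy as np

def split_by_hour(hourly_speeds, hours_per_day=24):
    """Rearrange an hourly series into a (hours_per_day, n_days) matrix.

    Row i (0-based) holds the wind speeds observed at hour i + 1 on every day,
    i.e. the sample set W_N^i of one wind farm N.
    """
    hourly_speeds = np.asarray(hourly_speeds, dtype=float)
    n_days = hourly_speeds.size // hours_per_day          # drop any incomplete final day
    days = hourly_speeds[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    return days.T
```

Stacking the same hour's row from each of the p farms then gives the p-column matrix to which PC theory is applied for that hour.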

$Z$ is transformed into values uniformly distributed over [0,1].
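The transformation to [0,1] and its inverse (needed later to map generated values back) can be sketched with a Gaussian-kernel KDE. The kernel, Silverman's rule-of-thumb bandwidth, and root-finding inversion with `brentq` are assumptions; the text does not specify them.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * data.size ** (-0.2)

def kde_cdf(values, data, bw):
    """Gaussian-kernel KDE estimate of the CDF, evaluated at `values`."""
    z = (np.atleast_1d(values)[:, None] - np.asarray(data)[None, :]) / bw
    return norm.cdf(z).mean(axis=1)

def kde_inverse_cdf(probs, data, bw):
    """Numerically invert the KDE CDF (maps [0,1] values back to PC values)."""
    lo, hi = np.min(data) - 10 * bw, np.max(data) + 10 * bw   # CDF ~0 at lo, ~1 at hi
    return np.array([brentq(lambda v: kde_cdf(v, data, bw)[0] - p, lo, hi)
                     for p in np.atleast_1d(probs)])

rng = np.random.default_rng(7)
pc_values = rng.normal(size=365)        # stand-in for one PC at one hour
bw = silverman_bandwidth(pc_values)
u = kde_cdf(pc_values, pc_values, bw)   # uniform-scale values for the copula stage
back = kde_inverse_cdf(u[:5], pc_values, bw)
```

A smooth KDE CDF, rather than the empirical CDF, keeps the transformed values strictly inside (0, 1), which copula densities require.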
$S_i$, the scenarios of each time stamp expressed as PC values, are transformed into wind speed values $WS_i$ that restore the spatial correlation among wind farms.

Algorithm 1. Mixed method to generate wind speed scenarios based on PC and R-vine copula theories.
Input: Historical wind speed of p wind farms in q years.
Output: Wind speed scenarios of p wind farms.
1: for i = 1 : p do
2:     Divide the samples of wind farm i into 24 sample sets by time T = 1, 2, ..., 24 h.
3: end for
4: for j = 1 : 24 do
5:     Apply PC theory to the matrix consisting of the sample sets of the p wind farms at the jth hour.
6:     Transform the values of each PC into CDF values using KDE.
7: end for
8: for k = 1 : p do
9:     Apply the R-vine copula model to generate PC scenarios according to the matrix consisting of the kth PCs of each hour.
10: end for
11: for l = 1 : 24 do
12:     Restore the p generated PCs of the lth hour to wind speed scenarios by data reconstruction of PC theory and the inverse transformation of the kernel density function.
13: end for
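The PC step of Algorithm 1 and its data reconstruction can be sketched for one hour with an eigendecomposition of the sample covariance matrix. This assumes that "PC theory" means standard PCA on centered, unstandardized wind speeds with all p components retained, which makes the reconstruction exact; the synthetic covariance matrix is illustrative only. In the full method, the generated PC scenarios (after the inverse KDE transform) would replace `scores` before calling `pc_inverse`.

```python
import numpy as np

def pc_transform(X):
    """PCA of one hour's data X (n_days x p farms): returns scores, loadings, mean."""
    mean = X.mean(axis=0)
    centered = X - mean
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest-variance component first
    loadings = eigvecs[:, order]                # orthonormal p x p matrix
    scores = centered @ loadings                # mutually uncorrelated PC values
    return scores, loadings, mean

def pc_inverse(scores, loadings, mean):
    """Data reconstruction: map (generated) PC values back to wind speeds."""
    return scores @ loadings.T + mean

# Synthetic data for p = 3 correlated farms at one hour (illustrative numbers only)
rng = np.random.default_rng(5)
cov = np.array([[4.0, 3.0, 2.5], [3.0, 4.5, 3.2], [2.5, 3.2, 5.0]])
X = rng.multivariate_normal([8.0, 7.5, 9.0], cov, size=365)
scores, loadings, mean = pc_transform(X)
```

Because the loadings are orthonormal, the spatial correlation removed by `pc_transform` is reinstated by `pc_inverse`, so each column of `scores` can be modeled on its own across the 24 hours.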

Figure 4. Histogram of frequency and probability plot of the first PC (a) at the 7th hour, (b) at the 15th hour, (c) at the 23rd hour.

Figure 5. Sample scatter plots with related PCs (a) at the 7th hour, (b) at the 15th hour, (c) at the 23rd hour.


where the temporal-correlation terms are the Spearman's correlation coefficients between the $i$th-hour and $j$th-hour wind speed data of wind farm $N$ in the samples and scenarios, respectively, and the spatial-correlation terms are the Spearman's correlation coefficients between the wind speed data of wind farms $N$ and $M$ at the $i$th hour in the samples and scenarios, respectively. The smaller the six evaluation metrics above, the more favorable the outcome.
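The correlation metrics compare Spearman's coefficients of the samples and the scenarios. A sketch, assuming the Euclidean distance is taken over all entries of the two correlation matrices (a Frobenius norm): for temporal correlation the columns are the 24 hours of one wind farm, and for spatial correlation they are the p wind farms at one hour. Function names are hypothetical.

```python
import numpy as np

def spearman_matrix(data):
    """Spearman correlation matrix between the columns of `data` (no ties assumed)."""
    ranks = np.argsort(np.argsort(data, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def correlation_distance(samples, scenarios):
    """Euclidean (Frobenius) distance between sample and scenario Spearman matrices."""
    diff = spearman_matrix(samples) - spearman_matrix(scenarios)
    return float(np.linalg.norm(diff))

rng = np.random.default_rng(11)
samples = rng.normal(size=(365, 24)).cumsum(axis=1)   # temporally correlated toy data
independent = rng.normal(size=(365, 24))              # scenarios with no temporal link
```

Because the metric depends only on ranks, scenarios that differ from the samples by a monotone transform score a distance of zero, while scenarios that ignore the hour-to-hour dependence score a large distance.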

Figure 9. QQ plot for WF 1 of samples and scenarios generated using (a) the PC-R-vine model, (b) the PCA-ARMA model, (c) the HMCM model.

Figure 10. QQ plot for WF 2 of samples and scenarios generated using (a) the PC-R-vine model, (b) the PCA-ARMA model, (c) the HMCM model.

Figure 11. QQ plot for WF 3 of samples and scenarios generated using (a) the PC-R-vine model, (b) the PCA-ARMA model, (c) the HMCM model.

Table 2. Each model's uncovered percentage results.