A Typical Scenario Generation Method Based on KDE-Copula for PV Hosting Capacity Analysis in Distribution Networks

Zhao, Bo; Jiang, Minglei; Wang, Xuyang; Wang, Ruizhang; Xiong, Jingyao; Yang, Nan; Li, Zhenhua

doi:10.3390/pr14040617


by Bo Zhao ¹, Minglei Jiang ¹, Xuyang Wang ², Ruizhang Wang ², Jingyao Xiong ³, Nan Yang ³,* and Zhenhua Li ³

¹ Economic and Technical Research Institute, State Grid Jilin Electric Power Co., Ltd., Jilin 130022, China

² State Grid Economic and Technical Research Institute Co., Ltd., Beijing 102209, China

³ College of New Energy and Electrical Engineering, China Three Gorges University, Yichang 443002, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(4), 617; https://doi.org/10.3390/pr14040617

Submission received: 9 January 2026 / Revised: 1 February 2026 / Accepted: 6 February 2026 / Published: 10 February 2026

(This article belongs to the Special Issue Applications of Smart Microgrids in Renewable Energy Development)


Abstract

Wind-solar power generation is inherently uncertain, and this uncertainty makes the assessment of hosting capacity considerably more difficult. To address this, typical scenarios must be created that precisely capture the statistical characteristics and interrelationships of wind-solar power. In this research, we systematically integrate several scenario generation techniques into a holistic framework grounded in kernel density estimation (KDE) and Copula functions. The proposed approach represents the stochastic nature of wind and solar output by constructing their respective probability density functions (PDFs). It describes the potential spatiotemporal complementarity between them by using Copula functions to establish a joint probability distribution model. A large number of wind-solar output scenarios are generated through Monte Carlo simulation, and the K-means clustering algorithm is then employed to reduce the number of scenarios. The findings reveal that the integrated KDE-Copula framework achieves higher fitting accuracy for the marginal distributions and correlation structures of wind-solar power generation. As a result, the generated scenarios are more representative and reliable, offering strong support for photovoltaic (PV) hosting capacity analysis (HCA) and the formulation of typical plans. We validate the proposed method using historical wind-solar data from several representative regions in China, including Inner Mongolia, northern Hebei, the Beijing–Tianjin–Hebei region, and Hubei Province. This validation demonstrates the method’s applicability under various geographical and climatic conditions.

Keywords:

wind-PV uncertainty; kernel density estimation (KDE); Copula function; scenario generation; Monte Carlo simulation; HCA

1. Introduction

Driven by the worldwide shift towards a low-carbon and clean energy system, the share of wind and photovoltaic (PV) power in the power system has been rising steadily. For China, large-scale deployment of wind-solar power generation is a pivotal approach. It not only supports the low-carbon transformation of the power sector but also helps realize the nation’s “dual-carbon” goals. In this context, Schindler et al. [1] systematically assessed the spatiotemporal complementarity of wind-solar resources, providing a foundation for regional planning. Yang et al. [2,3] further advanced planning models that integrate electric vehicles and multi-agent coordination, enhancing the economic and reliability dimensions of renewable integration. Meanwhile, Vechkinzova et al. [4] and Nema et al. [5] reviewed global trends and hybrid system developments, respectively, highlighting the technological and market pathways toward decarbonization.

Nevertheless, the electricity generated from these renewable sources depends heavily on meteorological and environmental factors and exhibits marked randomness, volatility, and intermittency. These characteristics introduce considerable uncertainty into system planning and operation, as documented in several key studies. For instance, Yang et al. [6] developed a metric-based decision-making framework for security-constrained unit commitment, addressing renewable uncertainty through deep learning. Stiebler [7] provided a systematic treatment of wind energy conversion and its grid integration challenges. Yang et al. [8] further examined pricing strategies under spatiotemporal renewable variability. Complementarity and variability in wind-solar generation were also quantitatively analyzed by Couto et al. [9], while Suchitra et al. [10] focused on the optimal design of hybrid systems in distribution networks, emphasizing reliability under uncertainty.

To enhance the dependability of planning strategies under such uncertainties, generating representative scenarios that accurately capture both the marginal characteristics and the spatiotemporal dependencies of wind-solar outputs is essential. This study therefore focuses on typical scenario generation for joint wind-solar output, with the core purpose of providing a statistically consistent and representative scenario set for downstream applications such as PV hosting capacity analysis (HCA) in distribution networks. The proposed method integrates kernel density estimation (KDE) and Copula theory to explicitly model marginal distributions and correlation structures, offering a unified and interpretable framework for uncertainty modeling in renewable-rich power systems.

To achieve this, a review of existing scenario generation approaches is necessary. Current approaches for generating typical operational scenarios fall broadly into three types: predictive, optimization-driven, and sampling-centered methods.

Predictive methods use large amounts of historical data to train forecasting models, which are then used for scenario generation. Techniques in this category include auto-regressive moving average (ARMA) models and various machine learning methods [11]. Generative adversarial networks (GANs), which have developed rapidly in recent years, have been widely adopted for generating wind-solar output scenarios [12]. For example, references [13,14,15] use the adversarial learning mechanism of GANs to handle uncertainty and correlation in modeling wind-solar power output.

Optimization-based methods mainly concentrate on scenario reduction by aggregating historical data samples. For instance, reference [16] proposes a method for generating typical wind power scenarios based on an enhanced K-means clustering algorithm. It clusters and reduces wind power data within a specific time frame to extract a representative set of typical scenarios. The basic K-means algorithm suffers from highly random clustering parameters and initial cluster centers. To address this, reference [17] improves the algorithm by combining a clustering validity index (the Davies–Bouldin index, DBI) with hierarchical clustering. It then uses the improved algorithm to generate typical daily output scenarios of photovoltaic power. Reference [18] employs a heuristic moment-matching approach, applying matrix and cubic transformations to historical data of multiple random variables. This allows representative typical scenarios to be extracted from a large initial scenario set while preserving the stochastic characteristics of the historical data, which improves the effectiveness of scenario reduction.

Sampling-centered methods extract stochastic features from historical output data and generate scenarios that conform to these statistical characteristics through sampling. References [19,20,21,22] build on the Monte Carlo method and introduce enhancements such as Copula function sampling and Latin hypercube sampling to generate a series of scenarios from historical data.

Although these studies generate scenarios from various angles, each method has drawbacks. Scenarios generated by predictive methods often lack diversity and show weak correlations among multiple variables. Sampling methods rely heavily on historical data; if the dataset is not comprehensive, the resulting scenarios may lack representativeness. Moreover, optimization-based models usually involve complex solution procedures and may fit probability distributions poorly.

To tackle the problems mentioned above, recent research has combined various wind-solar typical scenario generation methods [23,24,25]. These hybrid strategies mainly employ optimization-based frameworks and incorporate prediction- and sampling-based techniques as supplementary elements. This combination aims to lower computational complexity while enhancing the quality and representativeness of the generated scenarios. For example, in reference [26], sampling is carried out using the Frank–Copula function. This sampling result is then combined with the Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (CWGAN-GP) for prediction purposes. Additionally, an improved K-Means clustering method, which is optimized through the whale optimization algorithm (WOA), is applied to cluster the generated output data. The integration of these three types of methods helps to preserve the diversity of scenarios while maintaining computational efficiency. However, these methods still face shortcomings in correlation modeling and show inadequate accuracy when it comes to fitting marginal distributions.

A structured comparison of these method categories, highlighting their inherent trade-offs, is summarized in Appendix A (Table A1).

In light of the aforementioned shortcomings, this study proposes a novel and integrated framework for generating typical wind-solar scenarios. The core contributions and novelty of this work are summarized as follows:

(1): A Unified Modeling Framework: We develop a holistic methodology that seamlessly integrates Kernel Density Estimation (KDE) and Copula theory. This framework uniquely couples nonparametric marginal fitting with correlation structure modeling in a single, statistically consistent process, addressing the common decoupling issue in hybrid methods.
(2): Enhanced Accuracy in Both Marginal and Dependence Characterization: The proposed method eliminates restrictive parametric assumptions for wind-solar outputs through KDE, achieving superior fitting accuracy for complex, real-world data distributions. Simultaneously, it employs a rigorous, metric-based Copula selection criterion to optimally capture the spatiotemporal complementarity between wind-solar resources, overcoming the correlation modeling weaknesses found in many existing approaches.
(3): A Practical and Interpretable Tool for Grid Planning: Beyond theoretical modeling, the framework is designed for engineering applicability. It yields more representative, interpretable, and computationally efficient typical scenarios compared to direct historical clustering or “black-box” generative models. This directly enhances the reliability of critical grid planning studies, such as PV HCA.

The effectiveness and advantages of this framework are empirically validated using multi-regional data from China in the subsequent sections.

2. KDE and Copula Theory Principles

2.1. Principle of KDE

In KDE, the distance between the evaluation point $x$ and each neighboring sample is calculated to measure their proximity. This distance determines each sample's contribution to the estimated density $\hat{f}(x)$. Let $X_{1}, X_{2}, \dots, X_{n}$ denote independent and identically distributed samples drawn from $X$, and let $f(x)$ represent the unknown density function of $X$. The estimation process [27] of $f(x)$ is given as follows:

\hat{f} (x) = \frac{1}{n h} \sum_{i = 1}^{n} K (\frac{x - X_{i}}{h})

(1)

where $n$ is the sample size, $h$ represents the bandwidth, $K(x)$ stands for the kernel function, and $\hat{f}(x)$ denotes the estimated probability density at point $x$.
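A minimal Python/NumPy sketch of Equation (1) with a Gaussian kernel is shown below. It evaluates the estimator on a grid and checks that the estimate integrates to approximately one. The function names `gaussian_kernel` and `kde`, the bandwidth value, and the Beta-distributed sample are hypothetical placeholders, not the authors' data or code.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal kernel: non-negative, symmetric, and integrates to 1 (Eq. (2))."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """Eq. (1): f_hat(x) = 1 / (n h) * sum_i K((x - X_i) / h), evaluated at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    u = (x[:, None] - samples[None, :]) / h      # shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (samples.size * h)

# Hypothetical normalized output samples in [0, 1]; not the paper's data.
rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=1000)
grid = np.linspace(-0.5, 1.5, 2001)
density = kde(grid, samples, h=0.05)
integral = density.sum() * (grid[1] - grid[0])   # Riemann sum, should be close to 1
print(f"integral of f_hat over the grid: {integral:.4f}")
```

The later fitted expressions for PV and wind output have the same form with a specific bandwidth. For example, a call like `kde(x, pv_samples, h=0.0522)` would correspond to the PV case, where `pv_samples` is a hypothetical array of normalized hourly outputs.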

The kernel function $K(x)$ must satisfy three fundamental conditions: non-negativity, symmetry, and normalization, as presented in Equation (2):

\begin{cases} K (x) \geq 0 \\ K (x) = K (- x) \\ \int_{- \infty}^{+ \infty} K (x) \, d x = 1 \end{cases}

(2)

In addition to the kernel function, the bandwidth $h$ is a critical parameter that requires careful optimization. The performance of the estimator $\hat{f}(x)$ is often evaluated using the Mean Integrated Squared Error (MISE) with respect to the true density $f(x)$, which is defined as

\begin{aligned} MISE(h) &= \int E[\hat{f}_{h}(x) - f(x)]^{2} \, dx \\ &= \frac{R(K)}{nh} + \frac{h^{4}}{4}[u_{2}(K)]^{2} R(f^{″}) + o(h^{4}) + o\left(\frac{1}{nh}\right) \end{aligned}

(3)

where $f^{″}(\cdot)$ denotes the second derivative of the true density function $f(x)$, and $u$ is a dummy variable of integration. The quantities $u_{2}(K)$, $R(K)$, and $R(f^{″})$ are respectively defined as

\begin{cases} u_{2} (K) = \int u^{2} K (u) \, d u \\ R (K) = \int K^{2} (u) \, d u \\ R (f^{″}) = \int {f^{″}}^{2} (u) \, d u \end{cases}

(4)

Retaining only the leading terms of Equation (3) gives the Asymptotic Mean Integrated Squared Error (AMISE):

AMISE(h) = \frac{R(K)}{nh} + \frac{h^{4}}{4}[u_{2}(K)]^{2} R(f^{″})

(5)

The asymptotically optimal bandwidth $h_{0}$, which minimizes the AMISE with respect to $h$, can be expressed as

h_{0} = {(\frac{R (K)}{u_{2}^{2} (K) R (f^{″})})}^{\frac{1}{5}} n^{- \frac{1}{5}}

(6)

Since $R(f^{″})$ is generally unknown, it must be estimated. Silverman’s rule-of-thumb [28] assumes that the underlying density $f(x)$ follows a normal distribution $N(0, σ^{2})$, where $σ$ denotes the standard deviation. Under this assumption, $R(f^{″})$ can be evaluated analytically as

R (f^{″}) = \frac{(3 π^{- 0.5} σ^{- 5})}{8}

(7)

The quantity $R(f^{″})$ measures the overall curvature of the density function and directly affects the bias term in AMISE-based bandwidth selection. This study adopts the Gaussian kernel, for which $R(K) = 1/(2\sqrt{π})$ and $u_{2}(K) = 1$. Substituting these values and Equation (7) into Equation (6) gives $h_{0} = (4/3)^{1/5} σ n^{-1/5} \approx 1.06 σ n^{-1/5}$, which is the well-known Silverman rule-of-thumb. Accordingly, the optimal bandwidth $h$ can be expressed as

h = 1.06 σ n^{- \frac{1}{5}}

(8)

where $σ$ and $n$ are obtained from the historical samples of each marginal variable (wind/PV) within each season (Section 3.1).
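The following sketch, assuming NumPy, computes the rule-of-thumb bandwidth of Equation (8) for one marginal sample, such as the PV outputs of a single season. It also recovers the 1.06 constant numerically from Equations (6) and (7). The function names and the seasonal sample are hypothetical. Using the sample standard deviation with `ddof=1` is our assumption, since the text does not specify the estimator of $σ$.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth of Eq. (8): h = 1.06 * sigma * n^(-1/5)."""
    x = np.asarray(samples, dtype=float)
    sigma = x.std(ddof=1)  # assumption: unbiased sample standard deviation
    return 1.06 * sigma * x.size ** (-1.0 / 5.0)

def gaussian_amise_constant():
    """Recover the constant in Eq. (8) from Eqs. (6) and (7) with sigma = 1."""
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))    # R(K): integral of K^2 for the Gaussian kernel
    u2_k = 1.0                            # u_2(K): integral of u^2 K(u) du
    r_f2 = 3.0 / (8.0 * np.sqrt(np.pi))   # R(f'') from Eq. (7) with sigma = 1
    return (r_k / (u2_k ** 2 * r_f2)) ** 0.2

print(round(gaussian_amise_constant(), 4))  # 1.0592, i.e. (4/3)^(1/5)

# Hypothetical seasonal subset of normalized PV output for one hour of the day.
rng = np.random.default_rng(0)
spring_pv = rng.beta(2.0, 3.0, size=92)
h_spring = silverman_bandwidth(spring_pv)
```

Because the season-specific subsets differ in spread and size, each season and variable receives its own bandwidth.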

2.2. KDE Model Verification Method

(1): Goodness-of-fit test

Goodness-of-fit testing assesses each fitting method against the original data, which makes it possible to identify the most appropriate fitting function. Two commonly used goodness-of-fit tests are the Pearson $χ^{2}$ and Kolmogorov–Smirnov (K-S) tests.

1.: Pearson $χ^{2}$

In the Pearson $χ^{2}$ test, let $X_{1}, X_{2}, \dots, X_{n}$ denote a sample of size $n$ drawn from population $X$. The PDF and cumulative distribution function (CDF) of $X$ are $f(x)$ and $F(x)$, respectively. The entire range of data is divided into $k$ mutually exclusive subintervals, and the number of sample observations falling into each subinterval is counted. The Pearson $χ^{2}$ test statistic is then given by

χ^{2} = \sum_{i = 1}^{k} \frac{{(v_{i} - n p_{i})}^{2}}{n p_{i}}

(9)

where $v_{i}$ denotes the number of samples in the $i$-th interval and $p_{i}$ represents the theoretical probability that a sample falls within the $i$-th interval.

As the sample size $n$ tends to infinity, the distribution of the test statistic $χ^{2}$ converges to a chi-square distribution with $k - 1$ degrees of freedom. For a given significance level $α$, the corresponding critical value $χ_{k-1}^{2}(α)$ is the chi-square quantile satisfying

P (χ_{k - 1}^{2} < χ_{k - 1}^{2} (α)) = 1 - α

(10)

If the calculated statistic satisfies $χ^{2} < χ_{k-1}^{2}(α)$, the probability distribution $F(x)$ meets the goodness-of-fit requirement at significance level $α$.
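Equation (9) can be computed directly from binned counts, as the short NumPy sketch below shows. In the KDE setting, each $p_{i}$ would be the fitted CDF difference across the $i$-th interval. Here the counts and probabilities are hypothetical. The hard-coded 7.815 is the tabulated $χ^{2}$ quantile for 3 degrees of freedom at $α = 0.05$. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, k - 1)` returns the same critical value.

```python
import numpy as np

def pearson_chi2(observed_counts, expected_probs):
    """Eq. (9): chi2 = sum_i (v_i - n p_i)^2 / (n p_i) over k intervals."""
    v = np.asarray(observed_counts, dtype=float)
    p = np.asarray(expected_probs, dtype=float)
    n = v.sum()                      # total sample size
    expected = n * p                 # expected counts n * p_i
    return float(np.sum((v - expected) ** 2 / expected))

# Hypothetical histogram with k = 4 intervals (not the paper's data).
counts = [18, 22, 31, 29]
probs = [0.2, 0.2, 0.3, 0.3]         # e.g. F(b_i) - F(b_{i-1}) from a fitted CDF
stat = pearson_chi2(counts, probs)
critical = 7.815                     # chi-square quantile, k - 1 = 3 df, alpha = 0.05
print(f"chi2 = {stat:.4f}, accept = {stat < critical}")
```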

2.: K-S Test

As in the Pearson $χ^{2}$ test, let $X_{1}, X_{2}, \dots, X_{n}$ denote a random sample of size $n$ drawn from population $X$. The PDF and CDF of $X$ are denoted by $f(x)$ and $F(x)$, respectively.

Based on the original samples sorted in ascending order, $X_{(1)} \leq X_{(2)} \leq \dots \leq X_{(n)}$, the empirical CDF $F_{0}(x)$ is constructed as

F_{0} (x) = \begin{cases} 0, & x < X_{(1)} \\ \frac{k}{n}, & X_{(k)} \leq x < X_{(k + 1)}, \; k = 1, 2, \dots, n - 1 \\ 1, & x \geq X_{(n)} \end{cases}

(11)

The test statistic is the largest absolute deviation between the fitted CDF $F(x)$ and the empirical CDF:

D_{n} = \sup_{x} | F (x) - F_{0} (x) |

(12)

If $D_{n} < D_{n}(n, α)$ holds, where $D_{n}(n, α)$ is the K-S critical value for sample size $n$ at significance level $α$, the probability distribution $F(x)$ satisfies the goodness-of-fit requirement at that level.
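To make Equations (11) and (12) concrete, the sketch below computes $D_{n}$ between a Gaussian-KDE CDF and the empirical CDF. Because $F_{0}$ is a step function, the supremum is checked on both sides of each jump. The function names, the standard-normal stand-in data, and the use of $1.36/\sqrt{n}$ as the 5% critical value are assumptions. That large-sample approximation is a common choice, but the paper does not state which critical-value formula it uses.

```python
import math
import numpy as np

def kde_cdf(x, samples, h):
    """CDF of a Gaussian KDE: F(x) = (1/n) * sum_i Phi((x - X_i) / h)."""
    z = (np.atleast_1d(np.asarray(x, dtype=float))[:, None]
         - np.asarray(samples, dtype=float)[None, :]) / h
    erf = np.vectorize(math.erf)
    return (0.5 * (1.0 + erf(z / math.sqrt(2.0)))).mean(axis=1)

def ks_statistic(samples, cdf):
    """D_n = sup_x |F(x) - F_0(x)|, checked just before and at each jump of F_0."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    f = np.asarray(cdf(x), dtype=float)
    i = np.arange(1, n + 1)
    return float(max(np.max(i / n - f), np.max(f - (i - 1) / n)))

rng = np.random.default_rng(1)
data = rng.normal(size=400)               # hypothetical stand-in for one output series
h = 1.06 * data.std(ddof=1) * data.size ** -0.2
d_n = ks_statistic(data, lambda t: kde_cdf(t, data, h))
critical = 1.36 / math.sqrt(data.size)    # large-sample 5% approximation (assumption)
print(f"D_n = {d_n:.4f}, critical = {critical:.4f}, accept = {d_n < critical}")
```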

(2): Fitting accuracy test

The goodness-of-fit test determines whether the probability model of wind-solar output adequately represents the actual data. It is a qualitative (pass/fail) test. If both conditions $χ^{2} < χ_{k-1}^{2}(α)$ and $D_{n} < D_{n}(n, α)$ are satisfied, the model can be considered to fit the actual output adequately. The fitting accuracy test, by contrast, quantitatively measures the discrepancy between the modeled probability distribution of wind-solar output and the frequency distribution of the actual data. Two indicators, the mean absolute percentage error (MAPE) and the root mean square error (RMSE), are employed to evaluate the model’s fitting accuracy.

MAPE = \frac{100\%}{k} \sum_{i = 1}^{k} \left| \frac{p_{gi} - p_{oi}}{p_{oi}} \right|

(13)

R M S E = \sqrt{\frac{1}{k} \sum_{i = 1}^{k} {(p_{g i} - p_{o i})}^{2}}

(14)

In these formulas, $k$ represents the number of intervals, $p_{oi}$ denotes the probability of the $i$-th interval in the frequency histogram, and $p_{gi}$ represents the probability of the $i$-th interval in the wind-solar output probability model.
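Equations (13) and (14) translate directly into NumPy, as in the minimal sketch below. The probability vectors are hypothetical. Skipping histogram bins with $p_{oi} = 0$ in the MAPE is our assumption to avoid division by zero, and it effectively reduces $k$ for that indicator.

```python
import numpy as np

def mape(p_model, p_obs):
    """Eq. (13): mean absolute percentage error over the histogram intervals, in %."""
    p_model = np.asarray(p_model, dtype=float)
    p_obs = np.asarray(p_obs, dtype=float)
    mask = p_obs > 0   # assumption: empty bins are skipped to avoid division by zero
    return float(100.0 * np.mean(np.abs((p_model[mask] - p_obs[mask]) / p_obs[mask])))

def rmse(p_model, p_obs):
    """Eq. (14): root mean square error between interval probabilities."""
    d = np.asarray(p_model, dtype=float) - np.asarray(p_obs, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

# Hypothetical interval probabilities (model vs. histogram), k = 3.
p_g = [0.25, 0.25, 0.50]
p_o = [0.20, 0.30, 0.50]
print(f"MAPE = {mape(p_g, p_o):.2f} %, RMSE = {rmse(p_g, p_o):.4f}")
```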

2.3. Theoretical Principles of the Copula Function

(1): The definition and basic properties of the Copula function

The mathematical definition of the Copula function [29,30] is given as follows:

F (x_{1}, x_{2}, \dots, x_{n}) = C (F_{X_{1}} (x_{1}), F_{X_{2}} (x_{2}), \dots, F_{X_{n}} (x_{n}))

(15)

where $n$ represents the number of variables, $F_{X_{i}}(x_{i})$ denotes the marginal distribution function of the variable $X_{i}$, $C(\cdot)$ represents the Copula function, and $F(x_{1}, x_{2}, \dots, x_{n})$ denotes the joint distribution function of the $n$ variables.

Copula functions are generally classified into two families: Archimedean Copulas and elliptical Copulas. The Archimedean Copula family can be expressed as follows:

C (u_{1}, u_{2}, \dots, u_{n}) = φ^{- 1} [φ (u_{1}) + φ (u_{2}) + \dots + φ (u_{n})]

(16)

where $φ(u)$ denotes the generator of the Archimedean Copula function and $φ^{-1}(u)$ denotes its inverse function. The generator $φ(u)$ satisfies

\sum_{i = 1}^{n} φ (u_{i}) \leq φ (0)

(17)

where $φ(u)$ possesses the three properties shown in Equation (18):

\begin{cases} φ (1) = 0 \\ φ^{'} (u) < 0 \\ φ^{″} (u) > 0 \end{cases}

(18)

Different generator functions give rise to different forms of the Archimedean Copula. Among these forms, the most commonly utilized ones are the Gumbel Copula, Clayton Copula, and Frank Copula.

The generator of the Gumbel Copula is $φ(x) = (-\ln x)^{θ}$, where $θ \in [1, +\infty)$. The corresponding Copula expression is given by Equation (19):

C (u, v) = \exp [- {[{(- \ln u)}^{θ} + {(- \ln v)}^{θ}]}^{1 / θ}]

(19)

The generator of the Clayton Copula is $φ(x) = (1/θ)(x^{-θ} - 1)$, with $θ \in [-1, 0) \cup (0, +\infty)$. The corresponding Copula expression is defined as

C (u, v) = \max [{(u^{- θ} + v^{- θ} - 1)}^{- 1 / θ}, 0]

(20)

The generator of the Frank Copula function is $φ(x) = -\ln \frac{e^{-θx} - 1}{e^{-θ} - 1}$, where $θ \in (-\infty, 0) \cup (0, +\infty)$. The corresponding Copula function is given by

C (u, v) = (- \frac{1}{θ}) \ln [1 + \frac{(e^{- θ u} - 1) (e^{- θ v} - 1)}{e^{- θ} - 1}]

(21)
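The three Archimedean Copulas in Equations (19) to (21) can be evaluated with a few NumPy one-liners, as sketched below. The function names and the parameter values used in the check are illustrative only. The Clayton form uses $\max(\cdot, 0)$ inside the power, which agrees with Equation (20) on the valid region and also covers $θ < 0$. A quick sanity check is the boundary condition $C(u, 1) = u$, which every Copula must satisfy.

```python
import numpy as np

def gumbel_cdf(u, v, theta):
    """Gumbel Copula, Eq. (19), theta >= 1."""
    return np.exp(-(((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta)))

def clayton_cdf(u, v, theta):
    """Clayton Copula, Eq. (20), theta in [-1, 0) U (0, inf)."""
    return np.maximum(u ** (-theta) + v ** (-theta) - 1.0, 0.0) ** (-1.0 / theta)

def frank_cdf(u, v, theta):
    """Frank Copula, Eq. (21), theta != 0; expm1/log1p keep small theta accurate."""
    return -np.log1p(np.expm1(-theta * u) * np.expm1(-theta * v) / np.expm1(-theta)) / theta

# Boundary check C(u, 1) = u with illustrative parameters.
for cdf, theta in [(gumbel_cdf, 2.0), (clayton_cdf, 1.5), (frank_cdf, 0.2605)]:
    print(cdf.__name__, float(cdf(0.3, 1.0, theta)))
```

The Gumbel Copula with $θ = 1$ reduces to the independence Copula $uv$, which offers a second easy check.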

The elliptical Copula family mainly comprises the Gaussian Copula and the t-Copula.

In the bivariate Gaussian Copula with arguments $u$ and $v$, the parameter $ρ \in [-1, 1]$ is the linear correlation coefficient between the standard normal variables $Φ^{-1}(u)$ and $Φ^{-1}(v)$. The corresponding Gaussian Copula is defined as

C (u, v) = \int_{- \infty}^{Φ^{- 1} (u)} \int_{- \infty}^{Φ^{- 1} (v)} \frac{1}{2 π \sqrt{1 - ρ^{2}}} \exp [- \frac{s^{2} - 2 ρ s t + t^{2}}{2 (1 - ρ^{2})}] d s d t

(22)

where $Φ(\cdot)$ represents the standard normal CDF and $Φ^{-1}(\cdot)$ denotes its inverse function.

Similarly, for the bivariate t-Copula with correlation parameter $ρ$ and $k$ degrees of freedom, the Copula function can be expressed as

C (u, v) = \int_{- \infty}^{t_{k}^{- 1} (u)} \int_{- \infty}^{t_{k}^{- 1} (v)} \frac{1}{2 π \sqrt{1 - ρ^{2}}} {[1 + \frac{s^{2} - 2 ρ s r + r^{2}}{k (1 - ρ^{2})}]}^{- (k + 2) / 2} d s d r

(23)

where $t_{k}^{-1}(\cdot)$ denotes the inverse CDF of the univariate Student's t distribution with $k$ degrees of freedom.
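A common way to sample from the Gaussian Copula in Equation (22) is shown below; it is not necessarily the authors' implementation. The method draws correlated standard normal pairs and maps each component through $Φ$. The NumPy sketch uses `math.erf` for $Φ$. The value $ρ = 0.5$ and the function names are hypothetical.

```python
import math
import numpy as np

def standard_normal_cdf(z):
    """Phi(z), applied element-wise via math.erf."""
    return np.vectorize(lambda s: 0.5 * (1.0 + math.erf(s / math.sqrt(2.0))))(z)

def sample_gaussian_copula(n, rho, rng):
    """Draw n pairs (u, v) from the bivariate Gaussian Copula of Eq. (22)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)  # correlated N(0, 1) pairs
    return standard_normal_cdf(z[:, 0]), standard_normal_cdf(z[:, 1]), z

rng = np.random.default_rng(7)
u, v, z = sample_gaussian_copula(20_000, rho=0.5, rng=rng)  # rho = 0.5 is hypothetical
print(f"corr(z1, z2) = {np.corrcoef(z[:, 0], z[:, 1])[0, 1]:.3f}")
```

The t-Copula of Equation (23) follows the same pattern with two changes. First, the normal pair is divided by $\sqrt{W/k}$, where $W$ is a chi-square draw with $k$ degrees of freedom. Second, $Φ$ is replaced by the Student's t CDF.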

(2): Correlation Coefficient of the Copula function

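As statistical measures, rank correlation coefficients reflect the degree of monotonic dependence between variables. Unlike the linear (Pearson) coefficient, they are invariant under strictly increasing transformations of the variables. They therefore depend only on the Copula and not on the marginal distributions. The most commonly used coefficients are Kendall’s rank correlation coefficient and Spearman’s rank correlation coefficient.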

1.: Kendall’s rank correlation coefficient

Let $(X_{1}, Y_{1})$ and $(X_{2}, Y_{2})$ be two independent and identically distributed random vectors. Their Kendall rank correlation coefficient $τ$ is defined as

τ = P [(X_{1} - X_{2}) (Y_{1} - Y_{2}) > 0] - P [(X_{1} - X_{2}) (Y_{1} - Y_{2}) < 0]

(24)

where $P[(X_{1} - X_{2})(Y_{1} - Y_{2}) > 0]$ represents the probability that $X$ and $Y$ move in the same direction (concordance), while $P[(X_{1} - X_{2})(Y_{1} - Y_{2}) < 0]$ represents the probability that they move in opposite directions (discordance).

Kendall’s rank correlation coefficient $τ$ measures the concordance between $X$ and $Y$. When $τ = 1$, $X$ and $Y$ are perfectly concordant; when $τ = -1$, they are completely discordant. If $X$ and $Y$ are independent, then $τ = 0$. The converse does not hold in general, so $τ = 0$ alone does not establish independence.
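The sample counterpart of Equation (24) counts concordant and discordant pairs, as the minimal pure-Python sketch below shows (the tau-a variant, in which tied pairs count as neither). The function name and test vectors are illustrative. For data with ties, `scipy.stats.kendalltau`, which by default computes the tie-adjusted tau-b, is a more robust choice.

```python
import itertools

def kendall_tau(x, y):
    """Sample version of Eq. (24): (concordant - discordant) / number of pairs (tau-a)."""
    n = len(x)
    s = 0
    for i, j in itertools.combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant, 0 tie
    return s / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # one discordant pair out of six
```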

2.: Spearman’s rank correlation coefficient

Let $(X_{1}, Y_{1})$, $(X_{2}, Y_{2})$, and $(X_{3}, Y_{3})$ be three independent and identically distributed random vectors. The Spearman rank correlation coefficient $ρ_{s}$ between $X$ and $Y$ is defined as

ρ_{s} = 3 {P [(X_{1} - X_{2}) (Y_{1} - Y_{3}) > 0] - P [(X_{1} - X_{2}) (Y_{1} - Y_{3}) < 0]}

(25)

From the above formula, $ρ_{s}$ is three times the difference between the probabilities of concordance and discordance for the pairs $(X_{1}, Y_{1})$ and $(X_{2}, Y_{3})$, whose components $X_{2}$ and $Y_{3}$ are independent. It therefore reflects the strength of the monotonic relationship between $X$ and $Y$.
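In practice, $ρ_{s}$ is estimated from a sample as the Pearson correlation of the ranks. The short NumPy sketch below assumes no ties, because `argsort` breaks ties arbitrarily. The function name is illustrative. `scipy.stats.spearmanr` handles ties by average ranks.

```python
import numpy as np

def spearman_rho(x, y):
    """Sample Spearman rho: Pearson correlation of the ranks (no ties assumed)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

print(spearman_rho([1, 2, 3, 4], [1, 3, 2, 4]))
```

Without ties this agrees with the classical formula $1 - 6\sum d_{i}^{2} / (n(n^{2} - 1))$. For the example above, that gives $1 - 12/60 = 0.8$.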

(3): Optimal selection of the Copula function

Let $(x_{i}, y_{i})\ (i = 1, 2, \dots, n)$ denote a sample of the bivariate random variable $(X, Y)$, and let $F_{n}(x)$ and $G_{n}(y)$ denote the empirical distribution functions of the one-dimensional variables $X$ and $Y$, respectively. The empirical Copula function for the sample is defined as

C_{n} (u, v) = \frac{1}{n} \sum_{i = 1}^{n} I_{[F_{n} (x_{i}) \leq u]} I_{[G_{n} (y_{i}) \leq v]}

(26)

where $I_{[\cdot]}$ denotes the indicator function. For example, $I_{[F_{n}(x_{i}) \leq u]} = 1$ when $F_{n}(x_{i}) \leq u$, and 0 otherwise.

The optimal Copula is selected using the squared Euclidean distance, which is defined as

d^{2} = \sum_{i = 1}^{n} | C_{n} (u_{i}, v_{i}) - C (u_{i}, v_{i}) |^{2}

(27)

where $u_{i} = F(x_{i})$ and $v_{i} = G(y_{i})$.

In practice, several candidate Copula families (the t-Copula and typical Archimedean Copulas such as Clayton, Gumbel, and Frank) are fitted to the dataset, and their parameters are estimated during the fitting process. The Copula with the smallest squared Euclidean distance to the empirical Copula in Equation (27) is then selected. The squared Euclidean distance measures how closely each candidate Copula matches the empirical Copula function, so a smaller $d^{2}$ value indicates better fitting performance.
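The selection rule built on Equations (26) and (27) can be sketched in NumPy as follows. Two choices here are assumptions rather than the paper's stated procedure. Rank-based pseudo-observations stand in for $u_{i}$ and $v_{i}$, and the candidate parameters are fixed, illustrative values rather than fitted ones. In practice each candidate's parameter would first be estimated, for example by inverting Kendall's $τ$ or by maximum pseudo-likelihood. All function names are hypothetical.

```python
import numpy as np

def pseudo_observations(x):
    """Empirical CDF values F_n(x_i) = rank(x_i) / n (assumes no ties)."""
    x = np.asarray(x, dtype=float)
    return (np.argsort(np.argsort(x)) + 1.0) / x.size

def empirical_copula(u, v):
    """Eq. (26) at each sample point: C_n(u_i, v_i) = (1/n) sum_j I[u_j <= u_i] I[v_j <= v_i]."""
    below = (u[None, :] <= u[:, None]) & (v[None, :] <= v[:, None])
    return below.mean(axis=1)

def squared_distance(u, v, copula_cdf):
    """Eq. (27): d^2 = sum_i |C_n(u_i, v_i) - C(u_i, v_i)|^2."""
    return float(np.sum((empirical_copula(u, v) - copula_cdf(u, v)) ** 2))

def frank_cdf(u, v, theta):
    """Frank Copula, Eq. (21)."""
    return -np.log1p(np.expm1(-theta * u) * np.expm1(-theta * v) / np.expm1(-theta)) / theta

def select_copula(x, y, candidates):
    """Return the name of the candidate with the smallest d^2, plus all scores."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    scores = {name: squared_distance(u, v, cdf) for name, cdf in candidates.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(3)
x = rng.uniform(size=300)
y = x + 0.3 * rng.normal(size=300)          # hypothetical positively dependent pair
candidates = {                               # parameter values are illustrative only
    "independence": lambda u, v: u * v,
    "frank(theta=5)": lambda u, v: frank_cdf(u, v, 5.0),
    "frank(theta=-5)": lambda u, v: frank_cdf(u, v, -5.0),
}
best, scores = select_copula(x, y, candidates)
print(best, {k: round(s, 3) for k, s in scores.items()})
```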

3. Generation and Reduction of Wind-Solar Power Output Scenarios

3.1. Generation of Wind-Solar Power Output Scenarios Based on Monte Carlo Simulation

Scenario analysis addresses system uncertainty by constructing deterministic scenarios from probabilistic models of wind-solar output. It consists of two main steps: scenario generation and scenario reduction. Scenario generation produces a series of scenarios by sampling from models built on the characteristics of historical data, with the aim of capturing the inherent uncertainties of the system [31]. In this study, combined wind-solar output scenarios are generated with a Monte Carlo simulation (MCS)-based approach. To preserve seasonal characteristics, the historical dataset is first partitioned into four seasonal subsets (spring, summer, autumn, and winter). For each season, KDE-based marginal fitting and Copula-based dependence modeling are conducted independently, and the corresponding MCS scenarios are generated from the season-specific joint model. The generated scenarios of each season are then reduced separately using K-means clustering to obtain season-specific typical scenarios and their occurrence probabilities.

After the Copula function is selected according to Section 2.3, the joint distribution is sampled to generate a large number of samples [32]. The main steps are as follows:

(1): First, the historical data of $n$ days are divided into hourly intervals, with each day consisting of 24 time periods.
(2): KDE is used to fit the data of each time period nonparametrically, yielding the marginal PDF (and hence the marginal CDF) of the wind and PV outputs.
(3): The Copula function links the marginal distributions of wind-solar outputs at each time step to form a joint distribution. The Monte Carlo method is then applied to generate daily wind-solar power output curves.
(4): Step (3) is repeated to generate a large number of wind-solar power output scenarios. To reduce redundancy among the generated scenarios, K-means clustering is then used for scenario reduction, and representative typical scenarios are extracted.
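Steps (2) and (3) can be sketched for a single time period as follows. Uniform pairs are drawn from the Frank Copula by conditional inversion, which solves $\partial C/\partial u = w$ for $v$. Each uniform is then mapped to an output level through a grid-based inverse of the Gaussian-KDE CDF. The function names, the synthetic Beta-distributed histories, the bandwidth 0.05, and the clipping to $[0, 1]$ are assumptions for illustration. Repeating the sampling for all 24 periods would yield daily curves. Drawing each hour independently, as a naive loop would, ignores correlation across hours, and the text does not specify how consecutive hours are linked.

```python
import numpy as np

def kde_inverse_cdf(samples, h, grid_size=2000):
    """Return a function u -> x approximating the inverse CDF of a Gaussian KDE on a grid."""
    x = np.asarray(samples, dtype=float)
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, grid_size)
    z = (grid[:, None] - x[None, :]) / h
    pdf = np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]                       # normalize the numerical integral so it ends at 1
    return lambda u: np.interp(u, cdf, grid)

def sample_frank(n, theta, rng):
    """Draw n pairs (u, v) from the Frank Copula by conditional inversion."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)              # w plays the role of dC(u, v)/du
    v = -np.log1p(w * np.expm1(-theta) / (w + (1.0 - w) * np.exp(-theta * u))) / theta
    return u, v

rng = np.random.default_rng(42)
pv_hist = rng.beta(2.0, 3.0, size=365)       # hypothetical PV history for one hour
wind_hist = rng.beta(2.0, 5.0, size=365)     # hypothetical wind history for one hour
u, v = sample_frank(10_000, theta=0.2605, rng=rng)
# Map uniforms to outputs; clipping to [0, 1] is an assumption for normalized power.
pv = np.clip(kde_inverse_cdf(pv_hist, 0.05)(u), 0.0, 1.0)
wind = np.clip(kde_inverse_cdf(wind_hist, 0.05)(v), 0.0, 1.0)
```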

The steps for generating the entire set of scenarios are presented in Figure 1.

3.2. Typical Scenario Reduction of Wind-Solar Output Based on K-Means

The $N$ generated wind-solar power scenarios are highly similar to one another, so scenario reduction is required to improve computational efficiency. Accordingly, this study employs the K-means clustering method to aggregate and reduce scenarios with similar characteristics. Let the initial reserved set be $C_{0} = T$, where $T = \{1, 2, \dots, N\}$, and let the initial reduction set be $R_{0} = \emptyset$. The specific reduction steps are as follows:

(1): Randomly select $K$ scenarios from the scenario set $T$ as the initial cluster centers, denoted $\{c_{1}, c_{2}, \dots, c_{K}\}$.
(2): Assign each scenario $x_{j} \in T$ to the cluster $C_{k}$ whose center $c_{k}$ is nearest. The Euclidean distance between $x_{j}$ and each cluster center $c_{k}$ is calculated as follows, where $D$ denotes the number of entries in each scenario vector:

$d_{j, k} = \sqrt{\sum_{i = 1}^{D} {(x_{j, i} - c_{k, i})}^{2}}$

(28)
(3): Recalculate each cluster center as the mean of all scenarios within the corresponding cluster:

$c_{k} = \frac{1}{| C_{k} |} \sum_{x_{j} \in C_{k}} x_{j}$

(29)
(4): Steps (2) and (3) are repeated until the cluster centers converge or the maximum number of iterations is reached.
(5): Select one representative scenario from each cluster to form the reduction set $R$ .
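Steps (1) to (5) can be sketched with NumPy as below. Two details are our assumptions rather than the authors' stated procedure. First, the representative of each cluster is the actual scenario closest to its center. Second, its probability is the share of scenarios in that cluster. The function name and the synthetic 48-entry scenarios (24 PV plus 24 wind values) are hypothetical. For production use, `sklearn.cluster.KMeans` offers a well-tested alternative.

```python
import numpy as np

def kmeans_reduce(scenarios, k, max_iter=100, seed=0):
    """Reduce N scenarios (rows of an N x D array) to k typical scenarios.

    Returns (representatives, probabilities, labels).
    """
    x = np.asarray(scenarios, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]              # step (1)
    for _ in range(max_iter):                                             # step (4)
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)  # Eq. (28)
        labels = dist.argmin(axis=1)                                      # step (2)
        new_centers = np.array([                                          # step (3), Eq. (29)
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    # Step (5), assumption: keep the actual scenario closest to each cluster center.
    reps = np.array([
        x[labels == j][dist[labels == j, j].argmin()] if np.any(labels == j) else centers[j]
        for j in range(k)
    ])
    probs = np.bincount(labels, minlength=k) / len(x)  # assumption: cluster share
    return reps, probs, labels

rng = np.random.default_rng(5)
hours = np.arange(24)
pv_shape = np.clip(np.sin(np.pi * (hours - 6) / 12), 0.0, None)        # hypothetical PV profile
scenarios = np.hstack([pv_shape * rng.uniform(0.3, 1.0, (1000, 1)),    # 24 PV values
                       rng.uniform(0.0, 1.0, (1000, 24))])             # 24 wind values
reps, probs, labels = kmeans_reduce(scenarios, k=4)
print(reps.shape, probs)
```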

Determination of the Number of Clusters (K): The selection of K = 4 for the final typical scenarios was informed by a combination of quantitative analysis and practical considerations. We employed the Elbow Method by plotting the within-cluster sum of squares (WCSS) against the number of clusters (K) and observed a distinct “elbow point” at K = 4. This was further validated by calculating the Silhouette Coefficient for different K values; K = 4 yielded a favorable balance between cluster cohesion and separation. This number also aligns with the practical need for a parsimonious yet representative set of scenarios for seasonal analysis (Spring, Summer, Autumn, Winter) and matches the granularity often used in operational planning studies.

4. Simulation Analysis

4.1. Case Illustration

The historical photovoltaic and wind power data used in this section were derived from the Xihe Energy Meteorological Big Data Platform. The typical scenarios in Section 3, which focus solely on distributed photovoltaic outputs, were constructed on the basis of solar irradiance data from a representative region in Hubei Province. In contrast, the wind-solar resource data used in Section 4 were collected from multiple representative regions. For confidentiality reasons, the names of the studied regions are anonymized in this paper. Specifically, Area A corresponds to a region in Inner Mongolia, Area B corresponds to a region in northern Hebei, and Area E corresponds to the Beijing–Tianjin–Hebei region. The hourly PV output and wind power of each region throughout 2023 are shown in Appendix B. In addition, the Gaussian function was selected as the kernel function in the KDE process due to its favorable smoothness and numerical stability. As a symmetric and infinitely differentiable kernel, it yields smooth density estimates and is widely adopted as a standard choice in KDE applications. Moreover, the Gaussian kernel is consistent with Silverman’s rule-of-thumb for bandwidth selection, which enables a closed-form and widely accepted bandwidth expression.

4.2. Simulation Process

Taking the construction of typical wind-solar scenarios for a representative area of Inner Mongolia as an example, KDE was applied to fit the photovoltaic and wind power outputs of this area. The normalized fitting results for the photovoltaic and wind power outputs of this area are presented in Figure 2.

The goodness-of-fit and fitting accuracy of the results presented in Figure 2 were evaluated, and the corresponding outcomes are summarized in Table 1. Specifically, the significance level of the Pearson test was set to 0.05, with a critical value of 28.869, while the critical value for the K-S test was 0.0202.

The significance level $α = 0.05$ was adopted as a commonly used criterion in goodness-of-fit testing. For the Pearson $χ^{2}$ test, the critical value was determined from the $χ^{2}$ distribution at this significance level with $k - 1$ degrees of freedom, where $k$ is the number of histogram intervals used in the test. The critical value of 28.869 corresponds to 18 degrees of freedom. For the K-S test, the critical value was computed according to the standard Kolmogorov–Smirnov criterion based on the sample size at the same significance level.

As presented in Table 1, the KDE results for PV and wind power outputs met the requirements of both the goodness-of-fit test and the fitting accuracy test. The specific expressions for the KDE of PV and wind power outputs were as follows:

For the PV power output, the KDE was expressed as

\hat{f} (x) = 4.25 \times 10^{- 3} \sum_{i = 1}^{8760} K (\frac{x - X_{i}}{0.0522})

(30)

For the wind power output, the KDE was given by

\hat{f} (x) = 2.82 \times 10^{- 3} \sum_{i = 1}^{8760} K (\frac{x - X_{i}}{0.0405})

(31)

The joint probability density of wind-solar outputs for a representative region in Inner Mongolia was analyzed using four types of Copula functions, namely, t-Copula, Clayton-Copula, Gumbel-Copula, and Frank-Copula. The corresponding Copula density functions and distribution functions are presented in Figure 3.

According to the selection criterion in Section 2.3, the Frank-Copula was chosen because it yielded the smallest squared Euclidean distance to the empirical Copula of the sample (0.1710; Table 2). Its Kendall’s τ (0.0399) and Spearman’s ρ (0.0592) were also the closest to the empirical values (0.0355 and 0.0399), indicating that it provided the best fit among the four candidate models. Furthermore, because the Frank-Copula can capture both positive and negative dependence between variables, it is well suited to modeling the complementary relationship between wind and solar power outputs, which often exhibit negative correlation. The Frank-Copula was therefore adopted with the parameter θ = 0.2605, giving the following joint wind-solar distribution function:

C (u, v) = - 3.839 \ln [1 - 4.353 (e^{- 0.2605 u} - 1) (e^{- 0.2605 v} - 1)]

(32)
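Equation (32) is the Frank Copula $C(u, v) = -\frac{1}{\theta}\ln\left[1 + \frac{(e^{-\theta u} - 1)(e^{-\theta v} - 1)}{e^{-\theta} - 1}\right]$ evaluated at θ = 0.2605, so its leading coefficient is 1/θ ≈ 3.839.

The sketch below implements this CDF together with one common form of the squared Euclidean distance used for Copula selection:

$d^{2} = \sum_{i}[\hat{C}_{n}(u_{i}, v_{i}) - C(u_{i}, v_{i})]^{2}$

Here $\hat{C}_{n}$ is the empirical Copula built from rank-based pseudo-observations. The exact distance definition is given in Section 2.3 and may differ in detail. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def frank_cdf(u: float, v: float, theta: float) -> float:
    """Frank Copula CDF; expm1/log1p keep it accurate for small theta (theta != 0)."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def pseudo_observations(xs: Sequence[float]) -> List[float]:
    """Rank-transform a sample to (0, 1): u_i = rank_i / (n + 1).

    Ties (e.g. many zero PV values at night) get distinct ranks in input order;
    a more careful version would use averaged ranks.
    """
    n = len(xs)
    ranks = [0] * n
    for r, idx in enumerate(sorted(range(n), key=lambda j: xs[j]), start=1):
        ranks[idx] = r
    return [r / (n + 1) for r in ranks]


def empirical_copula(u: float, v: float, us: Sequence[float], vs: Sequence[float]) -> float:
    """C_n(u, v): share of pseudo-observations with U_j <= u and V_j <= v."""
    return sum(1 for a, b in zip(us, vs) if a <= u and b <= v) / len(us)


def squared_euclidean_distance(
    xs: Sequence[float], ys: Sequence[float], copula: Callable[[float, float], float]
) -> float:
    """Sum of squared gaps between the empirical and a candidate Copula at the sample points."""
    us, vs = pseudo_observations(xs), pseudo_observations(ys)
    return sum((empirical_copula(a, b, us, vs) - copula(a, b)) ** 2 for a, b in zip(us, vs))


if __name__ == "__main__":
    theta = 0.2605
    print(f"1/theta = {1 / theta:.3f}")                    # leading coefficient of Eq. (32)
    print(f"C(0.3, 1) = {frank_cdf(0.3, 1.0, theta):.6f}")  # uniform margins: C(u, 1) = u
    pv = [0.0, 0.2, 0.5, 0.7, 0.4, 0.1]    # hypothetical paired samples
    wind = [0.6, 0.3, 0.2, 0.1, 0.5, 0.4]
    d2 = squared_euclidean_distance(pv, wind, lambda a, b: frank_cdf(a, b, theta))
    print(f"squared Euclidean distance = {d2:.4f}")
```

Running the same distance function for each candidate Copula and keeping the smallest value reproduces the selection rule in outline.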

Monte Carlo sampling from the joint wind-solar probability model generated 400 wind-solar power output scenarios (Figure 4). After scenario reduction by K-means clustering, four typical scenarios were obtained (Figure 5), with probabilities of 0.25, 0.3425, 0.175, and 0.2325, respectively.
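The Monte Carlo step can be sketched as conditional inversion of the Frank Copula:

1. Draw u and w independently from U(0, 1).
2. Solve C(v | u) = w for v in closed form.
3. Map (u, v) back to power outputs through the inverse marginal CDFs.

The sketch assumes Gaussian-kernel KDE marginals, inverted numerically by bisection. The historical samples and function names are hypothetical. How the paired samples are assembled into 24 h profiles (Section 3.1) is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple


def frank_h(v: float, u: float, theta: float) -> float:
    """Conditional CDF C(v | u) = dC(u, v)/du of the Frank Copula (theta != 0)."""
    a = math.exp(-theta * u)
    b = math.exp(-theta * v)
    g = math.exp(-theta)
    return a * (b - 1.0) / ((g - 1.0) + (a - 1.0) * (b - 1.0))


def frank_h_inverse(w: float, u: float, theta: float) -> float:
    """Closed-form solution v of C(v | u) = w for the Frank Copula."""
    a = math.exp(-theta * u)
    g = math.exp(-theta)
    return -math.log(1.0 + w * (g - 1.0) / (w + (1.0 - w) * a)) / theta


def sample_frank(n: int, theta: float, seed: int = 0) -> List[Tuple[float, float]]:
    """Draw n dependent uniform pairs (u, v) by conditional inversion."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u, w = rng.random(), rng.random()  # two independent U(0, 1) draws
        pairs.append((u, frank_h_inverse(w, u, theta)))
    return pairs


def kde_cdf(x: float, data: Sequence[float], h: float) -> float:
    """CDF of a Gaussian KDE: average of Phi((x - X_i) / h)."""
    z = 1.0 / (h * math.sqrt(2.0))
    return sum(0.5 * (1.0 + math.erf((x - xi) * z)) for xi in data) / len(data)


def kde_quantile(p: float, data: Sequence[float], h: float,
                 lo: float = -1.0, hi: float = 2.0) -> float:
    """Invert kde_cdf by bisection; the bracket assumes outputs normalized to about [0, 1]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    pv_hist = [0.0, 0.05, 0.2, 0.45, 0.6, 0.3, 0.1]    # hypothetical normalized PV samples
    wind_hist = [0.4, 0.35, 0.2, 0.15, 0.3, 0.5, 0.55]  # hypothetical normalized wind samples
    scenarios = [
        # Gaussian KDE leaks mass outside [0, 1], so quantiles are clipped to that range.
        (min(1.0, max(0.0, kde_quantile(u, pv_hist, 0.0522))),
         min(1.0, max(0.0, kde_quantile(v, wind_hist, 0.0405))))
        for u, v in sample_frank(400, 0.2605, seed=42)
    ]
    print(len(scenarios), scenarios[0])
```

Conditional inversion is exact for the Frank family because its conditional CDF has a closed-form inverse. Other Copulas may need numerical root finding at this step.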

To provide a historical-only benchmark, a baseline set of scenarios was obtained by directly clustering the observed daily profiles without applying KDE-based marginal reconstruction, Copula-based dependence modeling, or Monte Carlo sampling. Specifically, the historical PV and wind output profiles were grouped using the same K-means reduction procedure as the proposed framework, yielding baseline typical scenarios and their occurrence probabilities.
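The K-means reduction, applied to both the generated scenario pool and the historical baseline, can be sketched as follows:

- Each scenario is a vector, e.g. a 24 h profile.
- Lloyd's algorithm assigns each scenario to its nearest centroid and recomputes the centroids.
- A typical scenario's probability is the share of scenarios in its cluster. For 400 scenarios, 0.3425 corresponds to 137 members.

The deterministic farthest-first initialization is an assumption made here for reproducibility; this section does not specify the initialization used.

```python
from typing import List, Sequence, Tuple

Profile = Sequence[float]


def _sq_dist(a: Profile, b: Profile) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_reduce(profiles: Sequence[Profile], k: int, max_iter: int = 100
                  ) -> Tuple[List[List[float]], List[float], List[int]]:
    """Reduce profiles to k typical scenarios with occurrence probabilities.

    Returns (centroids, probabilities, labels); probability = cluster size / N.
    """
    # Deterministic farthest-first initialization (assumed, for reproducibility).
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in profiles]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    n = len(profiles)
    probabilities = [labels.count(j) / n for j in range(k)]
    return centroids, probabilities, labels


if __name__ == "__main__":
    # Hypothetical daily profiles: six low-output days and four high-output days.
    low = [[0.01 * i] * 24 for i in range(6)]
    high = [[0.9 + 0.01 * i] * 24 for i in range(4)]
    typical, probs, _ = kmeans_reduce(low + high, k=2)
    print(probs)
```

Each centroid is a mean profile rather than an observed day. This is one reason typical curves obtained this way tend to be smoother than individual realizations.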

Unlike direct clustering of historical profiles, the proposed KDE-Copula framework first calibrates the marginal distributions using KDE and then models the wind-solar dependence with the selected Copula. As a result, K-means reduction operates on a statistically calibrated scenario pool. Clustering raw historical profiles directly does not explicitly enforce marginal goodness of fit or dependence consistency, so the resulting typical scenarios can inherit irregularities present in the original observations. This matters most when the scenario set is compressed into a small number of typical scenarios, where preserving distributional characteristics and dependence is crucial.

Figures 5 and 6 compare the typical daily wind-solar output patterns obtained by (i) directly clustering historical profiles (baseline, Figure 6) and (ii) clustering scenarios generated by the proposed KDE-Copula framework (proposed, Figure 5).

For PV output (Figures 5a and 6a), both methods reproduced the expected bell-shaped diurnal profile. However, the baseline typical curves exhibited more local irregularities and inter-scenario crossings during the ramp-up and ramp-down periods, because the cluster representatives were inherited directly from a finite set of historical realizations. In contrast, the proposed typical PV curves were smoother and more consistently separated, with a more stable peak time and clearer stratification of scenario levels. This is consistent with generating the scenario pool from a calibrated marginal model before reduction.

For wind power output (Figures 5b and 6b), the difference was more pronounced. The baseline typical curves showed stronger intra-day fluctuations and more frequent crossings between scenarios, whereas the proposed curves followed more coherent temporal trajectories and maintained a more stable ordering over the day.

Overall, the proposed method yielded typical scenarios that were more interpretable and less affected by idiosyncratic artifacts in individual historical samples. This is because scenario reduction was performed on a large scenario pool generated from explicitly calibrated marginals and dependence modeling, rather than directly on raw observations.

The approximate computation time of each step in generating the typical scenarios was as follows:

- KDE fitting of the marginal distribution for each season: 5–10 s.
- Copula selection and parameter estimation: 3–5 s.
- Monte Carlo generation of 400 joint samples: 2–4 s.
- K-means scenario reduction: 1–2 s.

The entire process took no more than 30 s on a standard workstation (Intel i7 CPU, 16 GB RAM), making it suitable for the rapid evaluations required in practical engineering applications.

4.3. Simulation Results

Following the above steps, typical scenarios for the four seasons (spring, summer, autumn, and winter) were generated for a representative area in northern Hebei and another in the Beijing–Tianjin–Hebei region. The typical seasonal days for spring, summer, autumn, and winter correspond to Scenarios 1, 2, 3, and 4, respectively, as presented in Figure 7. For a representative area of Hubei Province, the analysis concerns distributed PV only, so the joint wind-solar distribution was not considered. Four typical scenarios of distributed PV output were generated there for the entire year (Figure 8). The scenario probabilities for all regions are presented in Table 3.

To further verify the accuracy of the proposed method, key statistics (e.g., mean, variance, and peak value) of the generated scenarios were compared with those of the historical data. As Table 4 shows, the proposed method outperformed the direct clustering baseline in both MAPE (6.8% vs. 12.5%) and RMSE (0.31 vs. 0.45). It also achieved the lowest MAPE among the compared methods. Its RMSE was close to that of the GAN-based method (0.30), while it required about 15% of that method's computation time (28 s vs. 185 s).
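For reference, the two accuracy metrics in Table 4 can be computed as below, applied here to a vector of key statistics. The specific statistics and data are hypothetical. MAPE skips zero-valued reference points (e.g., night-time PV) to avoid division by zero; this is an assumption, since the section does not state how zeros were handled.

```python
import math
import statistics
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error (%), skipping points where the reference is zero."""
    if len(actual) != len(predicted):
        raise ValueError("length mismatch")
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every reference value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def key_statistics(series: Sequence[float]) -> List[float]:
    """Mean, population variance, and peak of a series (hypothetical choice of statistics)."""
    return [statistics.fmean(series), statistics.pvariance(series), max(series)]


if __name__ == "__main__":
    historical = [0.0, 0.1, 0.4, 0.7, 0.5, 0.2]   # hypothetical normalized outputs
    generated = [0.0, 0.12, 0.38, 0.66, 0.52, 0.18]
    hist_stats, gen_stats = key_statistics(historical), key_statistics(generated)
    print(f"MAPE = {mape(hist_stats, gen_stats):.2f}%, RMSE = {rmse(hist_stats, gen_stats):.4f}")
```

MAPE is scale-free but sensitive to small reference values. RMSE keeps the units of the data, so the two metrics complement each other.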

4.4. Sensitivity Analysis of Key Parameters

(1): Sensitivity to KDE Bandwidth Selection

The bandwidth h is a crucial KDE parameter that governs the trade-off between bias and variance. We compared the widely adopted Silverman’s rule of thumb with a cross-validation (CV)-based method. As Table 5 shows, CV occasionally yielded a marginally better fit (slightly lower MAPE for wind power), but the difference was negligible (MAPE difference < 0.5%). Silverman’s rule provided stable results at a significantly lower computational cost, which justifies its use in our framework for practical engineering applications.
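Table 5 does not specify which cross-validation criterion was used. The sketch below assumes leave-one-out likelihood CV over a bandwidth grid with a Gaussian kernel, and compares the result with the normal-reference Silverman rule. Each grid point costs O(n²) kernel evaluations, whereas the rule of thumb needs a single pass over the data. The grid and data are hypothetical.

```python
import math
from typing import Sequence


def silverman_h(samples: Sequence[float]) -> float:
    """Normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5)."""
    n = len(samples)
    mean = sum(samples) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    return 1.06 * sigma * n ** -0.2


def loo_log_likelihood(samples: Sequence[float], h: float) -> float:
    """Sum over i of log f_{-i}(X_i) for a Gaussian KDE that leaves point i out.

    Repeated values (e.g. many zero PV readings at night) drive this criterion
    toward h -> 0, so such points are usually removed before CV.
    """
    n = len(samples)
    c = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(samples):
        s = sum(math.exp(-0.5 * ((xi - xj) / h) ** 2)
                for j, xj in enumerate(samples) if j != i)
        total += math.log(max(c * s, 1e-300))  # guard against log(0) for very small h
    return total


def cv_bandwidth(samples: Sequence[float], grid: Sequence[float]) -> float:
    """Pick the grid bandwidth with the largest leave-one-out log-likelihood."""
    return max(grid, key=lambda h: loo_log_likelihood(samples, h))


if __name__ == "__main__":
    data = [0.5 + 0.4 * math.sin(k) for k in range(60)]  # hypothetical normalized outputs
    grid = [0.01 * k for k in range(1, 31)]
    print(f"Silverman h = {silverman_h(data):.4f}, CV h = {cv_bandwidth(data, grid):.4f}")
```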

(2): Sensitivity to Monte Carlo Sample Size

The number of initial scenarios (N) generated by Monte Carlo simulation (MCS) affected the statistical stability of the final reduced set. We tested N = 200, 500, 1000, and 2000. The key statistical moments (mean and standard deviation) of the final typical scenarios stabilized for N ≥ 1000, while the computational time increased linearly with N. A sample size of N = 1000 was therefore adopted as the default, offering a robust compromise between statistical reliability and computational overhead.
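The stabilization check described above can be sketched as a loop over candidate sample sizes. It stops once every moment changes by less than a relative tolerance between consecutive sizes. The 2% tolerance, the Beta-distributed stand-in for the full generation-plus-reduction pipeline, and the function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Optional, Sequence, Tuple

Moments = Tuple[float, ...]


def first_stable_size(sizes: Sequence[int], moments_for: Callable[[int], Moments],
                      rel_tol: float = 0.02) -> Optional[int]:
    """Return the first N whose moments differ from those at the previous N by
    at most rel_tol (relative); None if no tested size qualifies."""
    previous: Optional[Moments] = None
    for n in sizes:
        current = moments_for(n)
        if previous is not None and all(
            abs(c - p) <= rel_tol * abs(p) for c, p in zip(current, previous)
        ):
            return n
        previous = current
    return None


def toy_moments(n: int, seed: int = 7) -> Moments:
    """Hypothetical stand-in: mean and standard deviation of n Beta(2, 5) draws.

    In the full framework this would run scenario generation and K-means
    reduction for N = n and return moments of the reduced typical scenarios.
    """
    rng = random.Random(seed)
    draws = [rng.betavariate(2.0, 5.0) for _ in range(n)]
    return statistics.fmean(draws), statistics.stdev(draws)


if __name__ == "__main__":
    print(first_stable_size([200, 500, 1000, 2000], toy_moments))
```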

(3): Sensitivity to Cluster Number K in K-means

The number of typical scenarios (K) is a critical user-defined parameter. We evaluated K = 3, 4, 5, and 6 using two criteria: the elbow method, based on the within-cluster sum of squares (WCSS), and the silhouette coefficient. The WCSS curve showed a distinct elbow at K = 4, and the silhouette coefficient reached its maximum (0.61) at K = 4, indicating cohesive and well-separated clusters. Choosing K = 5 or 6 produced smaller, less distinct clusters with overlapping characteristics, which reduced interpretability. Therefore, K = 4 was selected as optimal for capturing the fundamental operational modes within a season (e.g., high PV–low wind and low PV–high wind).
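Both selection criteria can be computed from any clustering result, as sketched below in pure Python. The silhouette calculation is O(n²). The labels could come from any K-means run for K = 3 to 6. The function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def wcss(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Within-cluster sum of squared Euclidean distances to each cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient s_i = (b_i - a_i) / max(a_i, b_i)."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores: List[float] = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # hypothetical features
    labs = [0, 0, 1, 1]
    print(f"WCSS = {wcss(pts, labs):.3f}, silhouette = {silhouette(pts, labs):.3f}")
```

WCSS always decreases as K grows, so it is read for an elbow. The silhouette coefficient can be maximized directly.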

5. Discussion

This method provides a more reliable data foundation for PV HCA in distribution networks, and the generated typical scenarios can enhance the robustness and economic efficiency of planning schemes. However, this study still has certain limitations: firstly, the method’s performance is highly dependent on the quality and completeness of historical data; secondly, the constructed joint probability model is static in nature.

This implies that the selected Frank-Copula and its dependence parameter θ (with the implied Kendall’s τ) represent an ‘average’ dependence structure over the modeling window. Long-horizon dependence and cross-seasonal transitions may therefore be underrepresented. Consequently, the model may not fully capture the evolving nature of wind-PV complementarity across time scales, such as seasonal changes in dependence patterns or variations under different synoptic weather regimes. The seasonally partitioned modeling (Section 3.1) mitigates cross-season heterogeneity by fitting separate models for spring, summer, autumn, and winter, but it still assumes constant dependence within each 3-month season. Future research could incorporate time-varying or regime-switching Copula models to better reflect cross-day or cross-seasonal variations in dependence. Techniques such as rolling-window Copula fitting, or incorporating meteorological covariates into the dependence parameter, could further enhance the temporal adaptability of the scenario generation framework.
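The rolling-window idea, a future-work direction rather than part of the proposed method, can be illustrated as follows:

1. Estimate Kendall's τ on each sliding window.
2. Invert the Frank relation $\tau(\theta) = 1 - \frac{4}{\theta}[1 - D_{1}(\theta)]$, where $D_{1}$ is the first Debye function.

This yields a time-varying θ. The window length, the Simpson-rule integration, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a by counting concordant minus discordant pairs (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return 2.0 * s / (n * (n - 1))


def debye1(theta: float, steps: int = 200) -> float:
    """D1(theta) = (1/theta) * int_0^theta t / (e^t - 1) dt for theta > 0 (Simpson's rule)."""
    h = theta / steps  # steps must be even for Simpson's rule

    def g(t: float) -> float:
        return 1.0 if t == 0.0 else t / math.expm1(t)  # integrand; its limit at t = 0 is 1

    inner = sum((4.0 if k % 2 else 2.0) * g(k * h) for k in range(1, steps))
    return (g(0.0) + inner + g(theta)) * h / 3.0 / theta


def frank_tau(theta: float) -> float:
    """Kendall's tau of the Frank Copula: 1 - (4/theta) * (1 - D1(theta))."""
    if theta == 0.0:
        return 0.0
    if theta < 0.0:
        return -frank_tau(-theta)  # tau is an odd function of theta
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def frank_theta_from_tau(tau: float, lo: float = -50.0, hi: float = 50.0) -> float:
    """Invert frank_tau by bisection (tau increases with theta)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rolling_theta(x: Sequence[float], y: Sequence[float], window: int, step: int) -> List[float]:
    """Frank parameter re-estimated by tau inversion on each sliding window."""
    return [
        frank_theta_from_tau(kendall_tau(x[s:s + window], y[s:s + window]))
        for s in range(0, len(x) - window + 1, step)
    ]
```

For small θ, τ ≈ θ/9. Weak dependence therefore translates into a small positive or negative θ in each window.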

5.1. Direct Application to Hosting Capacity Assessment

The typical scenarios generated by the proposed framework are designed to be directly integrated into a probabilistic HCA workflow. In standard HCA, numerous power flow simulations are performed to determine the maximum amount of new PV generation that can be accommodated without violating grid constraints (e.g., voltage limits and thermal ratings). Instead of using the full historical dataset or a large number of Monte Carlo samples, our method yields a small set of representative scenarios with statistical weights (e.g., the four scenarios in Figure 5 and their probabilities in Table 3). These scenarios preserve key stochastic patterns and the dependence structure between wind and PV outputs.

A practical implementation in HCA can proceed as follows:

(i) For each typical scenario (e.g., a 24 h wind-solar profile), conduct deterministic power flow analyses while gradually increasing the simulated PV penetration at candidate nodes.

(ii) For each scenario, identify the maximum penetration level before any constraint violation occurs.

(iii) Evaluate the overall hosting capacity in a probabilistic manner. A conservative estimate can be obtained by taking the minimum penetration limit across scenarios. Alternatively, a risk-informed hosting capacity can be computed by weighting each scenario-specific penetration limit by its occurrence probability (Table 3), which accounts for the likelihood of different generation patterns.
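Steps (i) to (iii) can be sketched as below. No specific solver is prescribed here, so the power-flow check is represented by a caller-supplied predicate. The toy voltage model in the usage example, the step size, and all names are hypothetical. Only the scenario probabilities are taken from Table 3.

```python
from typing import Callable, Sequence, Tuple


def max_penetration(violates: Callable[[float], bool], step: float = 0.05,
                    cap: float = 5.0) -> float:
    """Steps (i)-(ii): raise PV penetration in fixed steps until the first violation.

    `violates(level)` stands in for a deterministic power-flow run of one typical
    scenario at that penetration level. Returns the last level that passed
    (0.0 if even the base case fails).
    """
    last_ok = 0.0
    for k in range(int(round(cap / step)) + 1):
        level = k * step  # integer counter avoids accumulating rounding error
        if violates(level):
            return last_ok
        last_ok = level
    return last_ok


def hosting_capacity(limits: Sequence[float], probs: Sequence[float]) -> Tuple[float, float]:
    """Step (iii): conservative (minimum) and probability-weighted hosting capacity."""
    if len(limits) != len(probs) or abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("need one probability per scenario, summing to 1")
    return min(limits), sum(lim * p for lim, p in zip(limits, probs))


if __name__ == "__main__":
    probs = [0.25, 0.3425, 0.175, 0.2325]  # Inner Mongolia scenario probabilities (Table 3)
    peaks = [0.9, 1.0, 0.7, 0.8]           # hypothetical per-unit PV peaks of each scenario
    # Toy voltage model (hypothetical): 0.04 p.u. rise per unit penetration at peak, 1.05 p.u. limit.
    limits = [max_penetration(lambda lvl, pk=pk: 1.0 + 0.04 * lvl * pk > 1.05) for pk in peaks]
    print(hosting_capacity(limits, probs))
```

The minimum guards against the most restrictive scenario. The weighted value reflects how often each generation pattern occurs.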

By replacing a large number of random simulations with a few representative scenarios, the proposed approach reduces the computational burden of probabilistic HCA, while still accounting for critical spatiotemporal dependence and representative extreme patterns. This links the scenario generation method to a core planning application and improves its practical relevance.

5.2. Trade-Off Between Computational Efficiency and Accuracy

The KDE-Copula framework proposed in this study ensures statistical accuracy while remaining computationally efficient. Compared with running extensive Monte Carlo simulations directly on historical data, KDE and Copula modeling significantly reduce the required number of simulations and hence the computational burden. K-means clustering then compresses the scenario set to a few typical scenarios (four in this study), greatly improving the efficiency of subsequent planning analyses such as HCA. Marginal fitting and correlation modeling add some computational overhead, but the framework still achieves a good balance between accuracy and efficiency. As Table 4 shows, its MAPE (6.8%) is lower than that of the traditional Copula (8.2%) and GAN-based (7.1%) methods, and its computation time (28 s) is far below that of the GAN-based method (185 s).

6. Conclusions

This study proposes an integrated KDE-Copula framework that contributes to the field in the three aspects outlined in the Introduction. First, by unifying KDE and Copula theory, it provides a statistically consistent modeling process. Second, it improves the accuracy of both marginal and dependence characterization. Third, it delivers a practical tool for grid planning, notably by directly facilitating efficient HCA. The framework generates typical scenarios that capture the statistical characteristics and dependence structures of wind-solar power output. KDE achieved superior fitting accuracy for the marginal distributions of wind-solar power output compared with two alternatives: traditional methods that rely on parametric assumptions, and certain machine learning models that struggle to characterize marginal distributions precisely. Meanwhile, the Frank-Copula provided the best fit among the four candidate Copulas (smallest squared Euclidean distance to the empirical Copula). It captured the complementarity between wind and solar power outputs and addressed the correlation-modeling deficiencies of traditional methods.

Author Contributions

Conceptualization, N.Y., M.J., and Z.L.; methodology, N.Y. and Z.L.; software, J.X.; validation, B.Z., X.W., and R.W.; formal analysis, J.X.; investigation, J.X.; resources, B.Z., M.J., X.W., and R.W.; data curation, B.Z., X.W., and R.W.; writing—original draft preparation, J.X.; writing—review and editing, J.X., N.Y., and M.J.; visualization, J.X.; supervision, N.Y. and Z.L.; project administration, M.J., N.Y., and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the science and technology project of State Grid Jilin Electric Power Co., Ltd.: Research on Target Network Frame Planning and Project Implementation Strategy for Rural Distribution Networks in Jilin Province under Multi-Element Access, with the funding number 2024-117.

Data Availability Statement

The data supporting the findings of this study were provided by our industry partners and are not publicly available due to commercial confidentiality and restrictions in the data-sharing agreements. Derived data supporting the findings of this study are available from the corresponding author (Nan Yang) upon reasonable request, subject to permission from the data providers.

Conflicts of Interest

This study received research funding from State Grid Jilin Electric Power Co., Ltd. Authors Bo Zhao and Minglei Jiang are employees of State Grid Jilin Electric Power Co., Ltd., or its subsidiary research institute. Authors Xuyang Wang and Ruizhang Wang are employees of State Grid Economic and Technical Research Institute Co., Ltd., an affiliated entity. The funder provided practical engineering data, problem definition, and scenario validation support pertinent to the research topic. Beyond these disclosed employment relationships and research funding, authors Jingyao Xiong, Nan Yang, and Zhenhua Li declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The authors collectively declare no other financial conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PV	photovoltaic
KDE	kernel density estimation
PDFs	probability density functions
ARMA	auto-regressive moving average
GANs	generative adversarial networks
DBI	density-based index
CWGAN-GP	Conditional Wasserstein Generative Adversarial Network with Gradient Penalty
WOA	Whale Optimization Algorithm
MISE	mean integrated squared error
AMISE	asymptotic mean integrated squared error
K-S	Kolmogorov–Smirnov
CDF	cumulative distribution function
MAPE	mean absolute percentage error
RMSE	root mean square error
HCA	hosting capacity analysis

Appendix A

Table A1. Scenario generation methods: a comparison.

Category	Core Mechanism	References	Key Advantages	Main Limitations
Predictive model and generative model	Learn data distribution via adversarial training (GANs), denoising (diffusion models), or variational inference	[12,13,14,15,33,34,35]	Capable of capturing highly complex, non-parametric distributions; excellent at generating diverse and realistic-looking scenarios	Extremely high computational cost for training and sampling; often act as “black boxes” with poor interpretability; may fail to preserve tail dependencies or physical constraints
Method based on optimization and clustering	Reduce scenario set size by minimizing distance metrics (clustering) or matching statistical moments	[16,17,18]	Conceptually simple and computationally efficient for scenario reduction; good at preserving key statistical properties (e.g., moments) of the original set	Quality heavily depends on initialization and pre-defined cluster number (K); tends to oversimplify the original distribution; marginal fitting is not its primary objective
Sampling and statistical methods	Draw samples from parametric or nonparametric distributions (e.g., KDE, Copula)	[19,20,21,22,29,30,36,37,38]	Strong statistical foundation; models are typically interpretable and tunable; non-parametric variants (e.g., KDE) offer distributional flexibility	Parametric versions rely on often unrealistic distributional assumptions; standard models (e.g., static Copula) struggle with time-varying or high-dimensional dependencies
Hybrid and integrated frameworks	Combine elements from above categories to leverage their respective strengths	[23,24,25,26,37,39,40,41]	Aim to achieve a better trade-off among accuracy, diversity, efficiency, and physical plausibility	Design can be complex and ad-hoc; may inherit limitations from constituent methods; achieving a tight, principled coupling between marginal distributions and dependence structures remains a key challenge

Appendix B

Figure A1. Hourly solar irradiance data for the entire year (8760 hourly samples, 365 days × 24 h) corresponding to the typical scenarios used in this study.

Figure A2. Hourly wind speed data for the entire year (8760 hourly samples, 365 days × 24 h) corresponding to the typical scenarios used in this study.

References

1. Schindler, D.; Behr, H.D.; Jung, C. On the spatiotemporal variability and potential of complementarity of wind and solar resources. Energy Convers. Manag. 2020, 218, 113016.
2. Yang, N.; Lin, D.; Ding, L.; Yang, C.; Zhang, L.; Yang, Y.; Ye, X.; Xiong, Z.; Huang, Y. Optimal Planning for Charging Stations within Multi-coupled Networks Considering Load-Balance Effects. Energy 2025, 336, 138481.
3. Yang, N.; Xiong, Z.; Ding, L.; Liu, Y.; Wu, L.; Liu, Z.; Shen, X.; Zhu, B.; Li, Z.; Huang, Y. A Game-Based Power System Planning Approach Considering Real Options and Coordination of All Types of Participants. Energy 2024, 312, 133400.
4. Vechkinzova, E.; Steblyakova, L.P.; Roslyakova, N.; Omarova, B. Prospects for the development of hydrogen energy: Overview of global trends and the Russian market state. Energies 2022, 15, 8503.
5. Nema, P.; Nema, R.K.; Rangnekar, S. A current and future state of art development of hybrid energy system using wind and PV-solar: A review. Renew. Sustain. Energy Rev. 2009, 13, 2096–2103.
6. Yang, N.; Hao, J.; Li, Z.; Ye, D.; Xing, C.; Zhang, Z.; Wang, C.; Huang, Y.; Zhang, L. Data-Driven Decision-Making for SCUC: An Improved Deep Learning Approach Based on Sample Coding and Seq2Seq Technique. Prot. Control Mod. Power Syst. 2025, 10, 13–24.
7. Stiebler, M. Wind Energy Systems for Electric Power Generation; Springer: Berlin/Heidelberg, Germany, 2008.
8. Yang, N.; Shen, X.; Liang, P.; Ding, L.; Yan, J.; Xing, C.; Wang, C.; Zhang, L. Spatial-temporal Optimal Pricing for Charging Stations: A Model-Driven Approach Based on Group Price Response Behavior of EVs. IEEE Trans. Transp. Electrif. 2024, 10, 8869–8880.
9. Couto, A.; Estanqueiro, A. Exploring wind and solar PV generation complementarity to meet electricity demand. Energies 2020, 13, 4132.
10. Suchitra, D.; Jegatheesan, R.; Deepika, T.J. Optimal design of hybrid power generation system and its integration in the distribution network. Int. J. Electr. Power Energy Syst. 2016, 82, 136–149.
11. Gao, F.; Pei, S.; Li, L.; Zha, P.; Wang, Q. Controllable and interpretable wind-solar scenario generation method based on improved generative adversarial network. Electr. Power Autom. Equip. 2025, 1–12.
12. Chen, W.; Tu, J.; Tan, Y.; Jin, F. Wind power scenario transfer generation using deep convolution generative adversarial network. Tech. Autom. Appl. 2025, 1–6.
13. Yang, N.; Yang, C.; Wu, L.; Shen, X.; Jia, J.; Li, Z.; Chen, D.; Zhu, B.; Liu, S. Intelligent Data-Driven Decision-Making Method for Dynamic Multisequence: An E-Seq2Seq-Based SCUC Expert System. IEEE Trans. Ind. Inf. 2022, 18, 3126–3137.
14. Cai, Z.; Liu, X.; Li, C.; Tai, N.; Huang, W.; Huang, S.; Wei, J. Dual Alternative Iteration Algorithm-Based Hierarchical MPC Strategy for Frequency Regulation Control and Active Power Allocation of Wind-storage Coupling System. Prot. Control Mod. Power Syst. 2025, 10, 160–175.
15. Sun, K.; Zhang, D.; Li, Y.; Yan, J. Scenario generation method of wind power output considering spatiotemporal uncertainty. Electr. Power Autom. Equip. 2024, 44, 101–107.
16. Liao, P.; Qi, J.; Sun, S.; Zhi, L.; Xue, D. Application of typical wind power scenarios based on improved k-means clustering in day-ahead dispatching. Electr. Eng. Mater. 2020, 46–52.
17. Shu, H.; Li, C.; Dai, Y.; Tang, Y.; Han, Y. An adaptive single-phase reclosing technique for wind farm transmission lines based on SOD transformation of CVT secondary voltage. Prot. Control Mod. Power Syst. 2025, 10, 58–71.
18. Liu, J.; Wei, L.; Li, H. Combined scenario generation method of wind power and load based on heuristic moment matching method. Jilin Electr. Power 2021, 49, 16–20.
19. Yu, Y.; Liu, Z.; Zhao, G.; Liu, W.; Deng, Y.; Wen, D.; Mo, L. Spatiotemporal correlation scenario generation of wind-solar complementary system based on SGMM-MCopula. Sci. Technol. Eng. 2025, 25, 4156–4167.
20. Cao, H.; Mu, C.; Yang, Y. Long-term optimization scheduling method for hydro–wind-PV multi-energy complementary systems considering multi-uncertainty. Yangtze River 2024, 55, 26–34.
21. Li, Y. Research on Profit Optimization of Park Microgrid Based on Probabilistic Income Analysis Algorithm. Electr. Energy Manag. Technol. 2023, 35–40.
22. Zhang, X.; Kuang, Y.; Zhou, L.; Zhao, X.; Zhang, Y.; Cheng, J. Wind-PV uncertainty-accounted collaborative optimal scheduling for active distribution networks and multi-microgrid systems. Water Resour. Hydropower Eng. 2025, 56, 782–790.
23. Huang, W.; Li, Y.; Li, J.; Wu, F.; Wang, Z. Multi-time scale joint optimal scheduling for wind-photovoltaic–electrochemical energy storage–pumped storage considering renewable energy uncertainty. Electr. Power Autom. Equip. 2023, 43, 91–98.
24. Zheng, Y.; Gong, J.; Mei, G. Economic risk game model of microgrid considering wind and photovoltaic uncertainties. Electr. Meas. Instrum. 2023, 60, 107–114.
25. Liang, K.; Li, F.; Zhang, G. Active distribution network congestion management strategy considering wind power uncertainty. Sci. Technol. Eng. 2023, 23, 10345–10354.
26. Wang, C. Optimal Energy Storage System Configuration Considering Wind and Solar Power Uncertainty. Master’s Thesis, Kunming University of Science and Technology, Kunming, China, 2024. Available online: https://link.cnki.net/doi/10.27200/d.cnki.gkmlu.2024.000637 (accessed on 1 February 2026).
27. Song, Y.; Li, H. Typical scene generation of wind and photovoltaic power output based on kernel density estimation and Copula function. Electr. Eng. 2022, 23, 56–63. Available online: https://dqjs.cesmedia.cn/EN/Y2022/V23/I1/56 (accessed on 1 February 2026).
28. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: New York, NY, USA, 2018.
29. Zhao, J.; Yuan, Y.; Fu, Z.; Sun, C.; Qian, K.; Xu, W. Reliability assessment of wind-PV hybrid generation system based on Copula theory. Electr. Power Autom. Equip. 2013, 33, 124–129.
30. Gao, F.; Bao, D.; Di, Y.; Zhang, S. Scene generation method considering dynamic correlation of wind and photovoltaic outputs. Acta Energiae Solaris Sin. 2024, 45, 256–264. Available online: https://qikan.cqvip.com/Qikan/Article/Detail?id=7112889518 (accessed on 1 February 2026).
31. Bie, Z.; Wang, X. The application of Monte Carlo method in evaluating the reliability of power systems. Autom. Electr. Power Syst. 1997, 68–75.
32. Xu, J.; Wei, G.; Jin, Y.; Zhang, G.; Zhang, K.; Sun, H. Economic analysis on integration topology of Rudong offshore wind farm in Jiangsu Province. High Volt. Eng. 2017, 43, 74–81. Available online: https://qikan.cqvip.com/Qikan/Article/Detail?id=671096166 (accessed on 1 February 2026).
33. Chen, Y.; Wang, Y.; Kirschen, D.; Zhang, B. Model-Free Renewable Scenario Generation Using Generative Adversarial Networks. IEEE Trans. Power Syst. 2019, 33, 3265–3275.
34. Dong, W.; Chen, X.; Yang, Q. Data-driven scenario generation of renewable energy production based on controllable generative adversarial networks with interpretability. Appl. Energy 2022, 308, 118387.
35. Capel, E.H.; Dumas, J. Denoising Diffusion Probabilistic Models for Probabilistic Energy Forecasting. In Proceedings of the 2023 IEEE Belgrade PowerTech; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
36. Zhu, J.; He, Y.; Yang, X.; Yang, S. Ultra-short-term wind power probabilistic forecasting based on an evolutionary non-crossing multi-output quantile regression deep neural network. Energy Convers. Manag. 2024, 301, 118062.
37. Hoang, K.T.; Thilker, C.A.; Knudsen, B.R.; Imsland, L.S. A hierarchical framework for minimising emissions in hybrid gas-renewable energy systems under forecast uncertainty. Appl. Energy 2024, 373, 123796.
38. Huang, Y.; Zhang, Z.; Li, X.; Xie, J.; Lee, K.Y. Layered-Vine Copula-Based Wind Speed Prediction Using Spatial Correlation and Meteorological Influence. IEEE Trans. Instrum. Meas. 2023, 72, 1010312.
39. Liang, J.; Tang, W. Sequence Generative Adversarial Networks for Wind Power Scenario Generation. IEEE J. Sel. Areas Commun. 2020, 38, 110–118.
40. Wang, Y.; Liu, Y.; Yang, Q. Operational Scenario Generation and Forecasting for Integrated Energy Systems. IEEE Trans. Ind. Inform. 2024, 20, 2920–2931.
41. Goyal, S.K.; Gangil, G.; Saraswat, A. A Review on Uncertainty Modelling Approaches for Stochastic Optimization in Power System. In Proceedings of the 2024 4th International Conference on Innovative Practices in Technology and Management (ICIPTM); IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.

Figure 1. Scenario generation process.

Figure 2. KDE fitting curves of (a) wind power output and (b) photovoltaic power output.

Figure 3. Joint probability density and distribution functions of wind-solar power outputs in Region A under different Copulas.

Figure 4. Correlated generation of 400 scenarios of (a) photovoltaic output and (b) wind power output.

Figure 5. Typical (a) photovoltaic output and (b) wind output scenarios of Region A.

Figure 6. Baseline (a) photovoltaic output and (b) wind power output scenarios of Region A derived from historical observations.

Figure 7. A typical scenario of (a) photovoltaic output and (b) wind power output in northern Hebei and a typical scenario of (c) photovoltaic output and (d) wind power output in Beijing–Tianjin–Hebei.

Figure 8. A typical scenario of photovoltaic output in a certain area of Hubei Province.

Table 1. Results of goodness-of-fit and accuracy tests.

Output	Pearson $\chi^{2}$	K-S	MAPE (%)	RMSE
PV output fitting result	13.23	0.0188	6.84	0.3105
Wind power fitting result	2.29	0.0145	13.96	0.0354

Table 2. Correlation coefficients and squared Euclidean distance of each Copula function.

Copula Function Type	Kendall Rank Correlation Coefficient	Spearman Rank Correlation Coefficient	Square Euclidean Distance
t-Copula	0.0749	0.1082	0.1930
Clayton-Copula	0.0582	0.0879	0.1991
Gumbel-Copula	0.0931	0.1385	0.2220
Frank-Copula	0.0399	0.0592	0.1710
Sample Copula	0.0355	0.0399	0

Table 3. Probabilities corresponding to each typical scenario.

Area	Scenario 1	Scenario 2	Scenario 3	Scenario 4
A certain place in Inner Mongolia	0.2500	0.3425	0.1750	0.2325
A certain place in northern Hebei	0.2425	0.2350	0.2925	0.2300
A certain place in the Beijing–Tianjin–Hebei region	0.2025	0.2200	0.2675	0.3100
A certain place in Hubei	0.4370	0.2160	0.1420	0.2050

Table 4. Comparison of calculation accuracy and time.

Method	MAPE (%)	RMSE	Calculation Time (s)
Direct clustering (baseline)	12.5	0.45	10
Traditional Copula	8.2	0.32	25
KDE-Copula	6.8	0.31	28
GAN-based	7.1	0.30	185

Table 5. Results of parameter sensitivity analysis.

Parameter Tested	Test Options	Key Metric(s) Evaluated	Recommendation
KDE Bandwidth Method	Silverman vs. Cross-Validation	MAPE (PV/Wind), Comp. Time	Performance difference < 0.5%. Silverman is recommended for efficiency.
MCS Sample Size (N)	200, 500, 1000, 2000	Stat. Moment Stability, Time	Moments stabilize for N ≥ 1000. Recommended default.
Cluster Number (K)	3, 4, 5, 6	WCSS, Silhouette Coefficient	K = 4 gives clear elbow and max Silhouette (0.61). Optimal.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhao, B.; Jiang, M.; Wang, X.; Wang, R.; Xiong, J.; Yang, N.; Li, Z. A Typical Scenario Generation Method Based on KDE-Copula for PV Hosting Capacity Analysis in Distribution Networks. Processes 2026, 14, 617. https://doi.org/10.3390/pr14040617

