1. Introduction
As the share of renewable energy in power systems continues to grow, the large-scale integration of photovoltaic (PV) generation has become a significant trend in the development of future distribution networks [1,2]. However, the high randomness of PV output poses challenges to the stable operation of the grid, making PV hosting capacity a key metric for assessing the grid's ability to accommodate PV generation [3,4]. Accurately evaluating PV hosting capacity helps optimize the allocation of PV resources, enhances grid scheduling flexibility, and provides critical support for renewable energy integration and grid stability. Yet PV output is strongly influenced by weather variations, and meteorological uncertainties directly affect the fluctuation characteristics of PV power, challenging the accuracy of hosting capacity assessments [5,6]. Therefore, there is an urgent need for a reliable scenario generation method that fully captures the impact of meteorological uncertainties on PV hosting capacity, ensuring the reliability of the assessment results.
The core objective of a PV hosting capacity assessment is to calculate the maximum PV output the grid can accommodate. Existing research mainly focuses on optimizing calculation methods [7] and improving assessment models [8]. The former enhances assessment accuracy and computational efficiency through optimization techniques, while the latter improves the applicability of results by constructing mathematical models that better reflect grid operational characteristics. For example, some studies have introduced improved grey wolf optimization algorithms [9] or data-driven approaches based on neural networks [10] to enhance the accuracy and efficiency of the calculation. Meanwhile, other works have incorporated voltage sensitivity analysis [11], urban spatial distribution and demographic features [12], and rooftop PV potential estimations based on satellite imagery [13] to better reflect practical grid constraints and spatial deployment characteristics. Although existing research has improved the accuracy and applicability of hosting capacity assessments, most methods are built around specific PV scenarios, making them difficult to adapt to complex weather conditions. Therefore, developing a PV hosting capacity assessment method that can adapt to diverse scenario variations and effectively handle the uncertainties of PV power fluctuations remains a key challenge in enhancing the adaptability of assessment methods.
Researchers have begun to focus on generating PV power scenarios to enhance the adaptability of PV hosting capacity assessments. Existing approaches can generally be classified into two categories: those based on adjusting historical data characteristics and those that are data-driven. The first category typically employs density-based clustering methods [14] or random perturbation simulation strategies [15] to extract and refine structural features from existing samples and construct representative scenarios. The second category comprises data-driven methods, often relying on recurrent neural network structures [16] or hybrid models combining deep learning and clustering algorithms [17] to expand the sample distribution and better capture complex fluctuation patterns. In general, methods based on historical data offer clear modeling logic and are easy to implement, making them suitable for scenarios with high-quality data and relatively stable fluctuations; however, they may generalize poorly under complex meteorological conditions. In contrast, data-driven methods exhibit stronger capabilities in nonlinear modeling and in generating diverse samples, making them more suitable for assessments under high uncertainty, although they impose greater demands on model design and training quality.
In recent years, deep generative models have become one of the key methods for PV power scenario generation owing to their advantages in feature extraction and nonlinear relationship modeling [18,19]. Generative adversarial networks (GANs), a representative model family in this field, use adversarial training between a generator and a discriminator, enhancing the quality of generated samples through iterative competition [20,21]. With the continuous refinement of GAN techniques, several variants have emerged and are widely applied to complex tasks such as renewable energy scenario generation [22,23]. Building upon this foundation, recent studies have further improved scenario generation performance by incorporating conditional constraints [24], enhancing the correlation between feature representations and control encoding [25], and optimizing the structural mapping between inputs and outputs [26]. In summary, GAN models show strong potential for PV power scenario generation and can effectively support PV hosting capacity assessment models, especially under complex meteorological conditions and source–load fluctuations, thereby helping to improve the accuracy and applicability of the assessments.
This study proposes a PV hosting capacity assessment model for distribution networks considering source–load uncertainty. The main contributions of this work are summarized as follows:
A PV hosting capacity evaluation model is introduced, employing a multi-scenario optimization framework to assess the maximum PV integration capacity while considering system constraints such as voltage stability, power flow balance, and line transmission capacity;
A scenario generation method based on WGAN-GP is introduced to simulate the random fluctuations in PV power, ensuring that the generated PV power scenarios exhibit reasonable physical characteristics. A load uncertainty modeling method is proposed, which constructs fluctuation ranges from historical data and introduces random disturbances to account for load uncertainty;
A physics-data joint-driven strategy is proposed to control the generated samples, ensuring that the generated scenarios exhibit diversity and align with actual meteorological conditions and the operational rules of the power system, thereby enhancing the data’s physical plausibility and reliability;
To enhance the structural controllability of scenario generation, this paper improves the original WGAN-GP framework by introducing a target-driven weighted sampling mechanism. This modification enables the generator to focus more effectively on key structural samples, thereby improving its ability to represent critical features and increasing the accuracy and diversity of generated scenarios.
The paper is structured as follows:
Section 2 introduces the PV system model and the PV hosting capacity assessment model;
Section 3 describes the WGAN-GP-based scenario generation method and load uncertainty modeling;
Section 4 presents a case study analysis and the results;
Section 5 provides the discussion; and
Section 6 concludes the study.
3. Source–Load Uncertainty Scenario Generation and Optimization Method
3.1. PV Scenario Generation Method Based on WGAN-GP
The assessment of PV hosting capacity depends on constructing representative scenarios that capture PV output characteristics under varying meteorological conditions. To this end, this study employs the WGAN-GP for scenario generation, leveraging adversarial training between the generator and discriminator to produce diverse and realistic sample data [28].
3.1.1. Generator Design
The core task of the generator is to generate samples that follow the target distribution based on the noise input to meet the scenario requirements of the PV hosting capacity assessment. To ensure the diversity and authenticity of the generated data, the generator’s design includes data input, a network structure, and a loss function.
1. Data input
The input to the generator is random noise z, which follows a specific probability distribution to ensure that the generator explores a more expansive sample space, thereby increasing the diversity and generalization ability of the generated samples.
2. Network structure
The generator uses a multi-layer, fully connected network for data mapping. The input layer transforms the random noise to provide a basis for subsequent feature learning. The hidden layers employ nonlinear activation functions (such as ReLU or Leaky ReLU) to extract data distribution features and learn underlying patterns. The output layer generates samples that conform to the target distribution, ensuring the continuity and stability of the data.
3. Loss function
The training objective of the generator is to minimize the Wasserstein distance between the generated data and the real data, ensuring that the generated data approaches the actual samples in both distribution shape and dynamic characteristics. The loss function $L_G$ is expressed as follows:

$$L_G = -\,\mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right]$$

where $\tilde{x} = G(z)$ represents the generated samples from the generator; $P_g$ is the distribution of the generated samples; and $D(\tilde{x})$ is the score assigned by the discriminator to the generated samples. By minimizing this loss function, the generator continuously adjusts its generation strategy to align its output data more closely with the actual data.
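For illustration, the generator described above can be sketched as follows. This is a minimal PyTorch sketch, not the exact architecture used in this study; the layer widths, noise dimension, and 24-step output length are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Fully connected generator: maps noise z to a daily PV profile.
    Layer widths and the 24-step output length are illustrative assumptions."""
    def __init__(self, noise_dim=100, out_len=24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, out_len),        # one value per time step
        )

    def forward(self, z):
        return self.net(z)

def generator_loss(discriminator, fake_samples):
    # L_G = -E[D(G(z))]: the generator maximizes the critic score of its samples
    return -discriminator(fake_samples).mean()
```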
3.1.2. Discriminator Design
The core task of the discriminator is to measure the distributional difference between the generated data and real data, providing an optimization direction for the generator. The WGAN-GP discriminator computes the Wasserstein distance to assess the closeness of the data distributions and introduces a gradient penalty to ensure the model satisfies the 1-Lipschitz continuity condition. This approach improves training stability and ensures the authenticity and diversity of the generated data.
1. Data input
The input to the discriminator consists of both real data and data generated by the generator. By feature extraction and comparison, the discriminator measures the closeness between the generated and real data.
2. Network structure
The discriminator employs a deep convolutional network to extract temporal features and learn data distribution patterns. The input layer receives and normalizes the time-series data to ensure numerical stability. The convolutional layers capture time-dependent characteristics, enhancing the discriminator's ability to represent dynamic changes in the data. The fully connected layers map the extracted features to a lower-dimensional space and compute scores. The output layer produces the score used to assess the authenticity of the input data; a higher score indicates that the data are closer to the real distribution.
3. Loss function
The optimization objective of the discriminator is expressed through the Wasserstein distance, maximizing the score gap between real and generated data while introducing a gradient penalty term to ensure stable training. The loss function is given by the following:

$$L_D = -\,\mathbb{E}_{x \sim P_r}\left[ D(x) \right] + \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right] + \lambda_{GP} L_{GP} \quad (16)$$

$$L_{GP} = \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\left[ \left( \left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1 \right)^2 \right] \quad (17)$$

where $x$ represents the real samples; $P_r$ is the real data distribution; $\hat{x}$ is a random interpolation between real and generated samples, with $P_{\hat{x}}$ its distribution; $\nabla_{\hat{x}} D(\hat{x})$ is the gradient of the discriminator with respect to the interpolated sample; and $\lambda_{GP}$ is the gradient penalty strength coefficient. The first term in Equation (16) increases the score for real data to strengthen the discriminator's ability. The second term reduces the score for generated data, widening the score gap between real and generated samples. The third term, $L_{GP}$, applies a gradient penalty to constrain the gradient norm, ensuring it satisfies the 1-Lipschitz condition, as defined in Equation (17). Specifically, the penalty forces the gradient norm of the discriminator output with respect to the interpolated input to remain close to 1, thereby softly enforcing the Lipschitz constraint. This regularization improves training stability and helps prevent vanishing or exploding gradients.
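The gradient penalty of Equation (17) admits a standard implementation; the sketch below assumes a PyTorch critic operating on batches of flattened daily profiles, with the interpolation coefficient drawn uniformly per sample.

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """L_GP: penalize deviation of the critic's gradient norm from 1
    at random interpolations between real and generated samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)  # per-sample epsilon
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```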
The operational flow of WGAN-GP is shown in Figure 2, and its core mechanism includes the following iterative optimization process. First, the generator receives random noise as input and generates samples through nonlinear mapping. Subsequently, the discriminator is trained to distinguish between real and generated data, calculating the Wasserstein distance to measure the distributional difference between the two while applying a gradient penalty term to ensure the model satisfies the Lipschitz continuity constraint. After the discriminator has been trained, the feedback information is used to optimize the generator, enabling it to progressively generate data that closely approximates the real data distribution, thereby improving its authenticity and fitting accuracy. The generator and discriminator are optimized alternately in an iterative training process until the statistical characteristics of the generated data sufficiently match those of the real data.
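A condensed sketch of this alternating schedule is shown below. The learning rates, Adam parameters, and five critic updates per generator update are conventional WGAN-GP defaults, not values reported in this paper; `gradient_penalty` is the helper sketched above.

```python
import torch

def train_wgan_gp(gen, critic, loader, noise_dim=100, n_critic=5, epochs=200):
    """loader is assumed to yield (batch, T) tensors of real daily profiles."""
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4, betas=(0.0, 0.9))
    opt_d = torch.optim.Adam(critic.parameters(), lr=1e-4, betas=(0.0, 0.9))
    for _ in range(epochs):
        for i, real in enumerate(loader):
            z = torch.randn(real.size(0), noise_dim)
            fake = gen(z).detach()
            # Critic step: widen the score gap, with gradient penalty (Eq. 16)
            loss_d = critic(fake).mean() - critic(real).mean() \
                     + gradient_penalty(critic, real, fake)
            opt_d.zero_grad()
            loss_d.backward()
            opt_d.step()
            if i % n_critic == 0:
                # Generator step: maximize critic score of fresh samples
                z = torch.randn(real.size(0), noise_dim)
                loss_g = -critic(gen(z)).mean()
                opt_g.zero_grad()
                loss_g.backward()
                opt_g.step()
```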
3.1.3. Sample Weight Assignment
During the optimization of the discriminator, appropriate weight allocation helps strengthen the model's learning of target-feature samples while preserving the diversity of the generated data. Improper weight adjustment may bias the model toward specific categories, unbalancing the data distribution. To address this, this paper introduces target feature screening, ensuring that weight adjustments are applied only to samples that match the defined target features. This allows the model to focus on learning key features while maintaining data diversity. The screening rule is defined as follows:

$$S_d = \begin{cases} 1, & F_{\min} \le F_d \le F_{\max} \\ 0, & \text{otherwise} \end{cases}$$

where $F_d$ represents the statistical feature value of sample $d$ (such as the maximum irradiance), and $F_{\min}$ and $F_{\max}$ are the screening thresholds used to retain samples whose feature values fall within this interval. $S_d$ is the target feature screening indicator: $S_d = 1$ indicates that the sample matches the target feature and participates in the weight adjustment process; otherwise, the sample does not affect the weight calculation.
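A minimal sketch of this screening step follows, assuming daily irradiance profiles stored as an (N, T) array and taking the daily maximum as the feature, as in the example above; the thresholds are placeholders.

```python
import numpy as np

def target_screen(samples, f_min, f_max):
    """S_d = 1 if F_min <= F_d <= F_max, else 0.
    samples: (N, T) array of daily irradiance profiles;
    the feature here is the per-sample maximum irradiance."""
    features = samples.max(axis=1)  # F_d: maximum irradiance of each sample
    return ((features >= f_min) & (features <= f_max)).astype(int)
```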
After completing the target feature screening, a dynamic adjustment strategy based on generation error feedback is adopted. This allows the sampling weights to be adaptively optimized throughout training, ensuring that the model sufficiently learns from samples within the target feature interval while avoiding mode collapse. The dynamic weight is calculated using the following formula:

$$\omega_d^{m} = \omega_d^{m-1} \cdot \exp\!\left( \beta \left( e_d^{m} - \bar{e}^{m} \right) \right)$$

where $\omega_d^{m}$ represents the weight of sample $d$ at the $m$-th training iteration; $\beta$ is the weight adjustment coefficient; $\bar{e}^{m}$ is the average generation error within the target feature interval, defined below; and $e_d^{m}$ denotes the generation error of sample $d$ at iteration $m$, which is specifically defined as follows:
$$e_d^{m} = \left| D\!\left( x_d^{m} \right) - D\!\left( \tilde{x}_d^{m} \right) \right|$$

where $\tilde{x}_d^{m}$ and $x_d^{m}$ denote the generated sample and the real sample at the $m$-th training iteration, respectively, and $D(\cdot)$ represents the score assigned by the WGAN-GP discriminator.
The average generation error within the target feature interval, $\bar{e}^{m}$, is calculated as follows:

$$\bar{e}^{m} = \frac{1}{N_S} \sum_{d : S_d = 1} e_d^{m}$$

where $N_S$ represents the number of samples that meet the target feature condition. The generation error of each sample and the average error within the target interval jointly determine the adjustment direction of the sampling weight. When an individual sample's error exceeds the average error (i.e., $e_d^{m} > \bar{e}^{m}$), its sampling weight is increased to strengthen the model's learning on that sample. Conversely, when the sample error is below the average (i.e., $e_d^{m} < \bar{e}^{m}$), the sampling weight is decreased to prevent the model from overfitting to specific samples and to maintain the overall diversity of the data distribution.
Based on this, the loss function of the discriminator is modified as follows:

$$L_D^{w} = -\,\mathbb{E}_{x_d \sim P_r}\left[ \omega_d D(x_d) \right] + \mathbb{E}_{\tilde{x} \sim P_g}\left[ D(\tilde{x}) \right] + \lambda_{GP} L_{GP}$$

where $\omega_d$ affects only the expectation over real samples; increasing the weights of key target samples encourages the discriminator to focus more on the distributional characteristics of samples that exhibit the desired features.
This strategy enhances the model’s attention to key samples during the early stages of training, enabling faster adaptations of the generated data to the target distribution. In the later stages, the sample weights are dynamically adjusted to ensure that the generated data not only reflect the target features but also maintain diversity, thereby effectively preventing mode collapse. Through proper weight assignment and adaptive adjustment, the generator can effectively learn the distribution patterns of target feature samples and ultimately produce scenarios that meet the intended structural objectives.
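The weight update and weighted critic loss can be sketched as follows. The multiplicative exponential update is one plausible reading of the rule stated above (weights grow when a sample's error exceeds the interval average and shrink otherwise), with β a small tuning constant; it is not necessarily the exact update used in this study.

```python
import torch

def update_weights(weights, errors, screen, beta=0.1):
    """Increase w_d when e_d exceeds the target-interval average error,
    decrease it otherwise. Only samples with S_d = 1 are adjusted;
    weights are renormalized afterwards."""
    mask = screen.bool()
    avg_err = errors[mask].mean()                   # average error over N_S samples
    factor = torch.exp(beta * (errors - avg_err))   # >1 above average, <1 below
    weights = torch.where(mask, weights * factor, weights)
    return weights / weights.sum()

def weighted_critic_loss(critic, real, fake, w_real, gp):
    # w_d scales only the real-sample expectation, as in the modified loss above
    return -(w_real * critic(real).squeeze(-1)).mean() \
           + critic(fake).mean() + gp
```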
3.1.4. Probability Adjustment Based on Sample Weight
In PV power scenario generation, samples with specific target characteristics are often underrepresented in the dataset. Even when assigned higher training weights, these samples may still have limited influence on the model due to their low selection frequency. To improve the match between generated data and target characteristics, this study introduces a probability adjustment strategy based on dynamic sample weights. By increasing the selection probability of target–feature samples, the model’s learning effectiveness for those features is enhanced, thereby improving both the quality and diversity of the generated data.
The selection probability $P(d)$ of sample $d$ is determined by its weight $\omega_d$ and is calculated as follows:

$$P(d) = \frac{\omega_d}{\sum_{j=1}^{N} \omega_j}$$

where $N$ is the total number of samples. This adjustment strategy increases the occurrence frequency of target-specific samples within each training batch, thereby enhancing their influence on the optimization of the discriminator. As a result, the discriminator becomes more sensitive to key feature distributions, while the generator improves its ability to learn from target samples, ensuring that the final generated outputs align with the expected structural characteristics.
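In practice, this weight-proportional selection can be realized with a weighted sampler; the sketch below uses PyTorch's WeightedRandomSampler, with an illustrative batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def make_weighted_loader(real_data, weights, batch_size=64):
    """Draw training samples with probability P(d) = w_d / sum_j w_j, so
    target-feature samples appear more often in each training batch."""
    sampler = WeightedRandomSampler(weights=weights,
                                    num_samples=len(real_data),
                                    replacement=True)
    return DataLoader(TensorDataset(real_data),
                      batch_size=batch_size, sampler=sampler)
```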
3.2. Load Uncertainty Modeling Method
In PV hosting capacity assessments for distribution networks, load fluctuation uncertainty is one of the key factors affecting PV integration capability. Load fluctuations are influenced not only by seasonal changes but also by daily electricity consumption patterns. In particular, within a given region, the daily load profile typically exhibits a regular pattern of variation, and because this daily trend is stable, long-term variation is relatively limited. Modeling with historical data can therefore effectively capture the main characteristics of load fluctuations. Based on this regularity, random load fluctuations can be simulated by introducing appropriate random disturbances into the historical data, without relying on complex scenario generation methods. This approach accurately reflects load fluctuation characteristics and efficiently supports the assessment of PV integration capacity.
The modeling of load fluctuation uncertainty primarily depends on a statistical analysis of historical data. By collecting and analyzing daily load data over a period, the load distribution range and mean can be calculated. Specifically, the upper and lower bounds of load fluctuation are determined from statistical parameters such as the minimum, maximum, and mean values of the historical load data. These parameters provide the foundation for modeling load fluctuations and help capture the range and trend of load variations. Once the fluctuation range and mean are obtained, random perturbation methods are used to simulate load variability: during each simulation, random disturbances are introduced within a range based on the historical mean and fluctuation bounds. The size of the disturbance is determined by the characteristics of load fluctuations, typically using a normal or uniform distribution to generate the disturbance values, ensuring that the simulated load data fluctuate within the historical distribution range. In this way, the simulated load data reflect fluctuations within a reasonable range and avoid generating load values outside the actual range.
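A minimal sketch of this perturbation scheme is given below, using the normal-disturbance variant; the relative disturbance scale sigma is a placeholder, and the clipping bounds come from the historical minima and maxima.

```python
import numpy as np

def simulate_load(history, n_scenarios=100, sigma=0.05, rng=None):
    """history: (days, T) array of daily load profiles.
    Perturb the historical mean profile with relative Gaussian noise
    and clip the result to the observed fluctuation range."""
    rng = rng if rng is not None else np.random.default_rng()
    mean_profile = history.mean(axis=0)                 # per-interval mean load
    lo, hi = history.min(axis=0), history.max(axis=0)   # fluctuation bounds
    noise = rng.normal(0.0, sigma, size=(n_scenarios, history.shape[1]))
    scenarios = mean_profile * (1.0 + noise)            # relative disturbance
    return np.clip(scenarios, lo, hi)                   # stay within history
```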
This modeling method can realistically reflect the uncertainty of load fluctuations, especially during high or low network load periods, providing more accurate load scenarios for PV hosting capacity assessments. By incorporating load fluctuation uncertainty, the proposed PV hosting capacity evaluation model can simulate PV integration capacity under different load conditions, thereby providing a more comprehensive assessment of the impact of PV integration on the operation of the distribution network. This method offers reliable support for grid dispatch optimization and PV integration planning.
3.3. Optimization Generation Strategy Driven Jointly by Physics and Data
In the WGAN-GP-based scenario generation process, relying solely on data-driven methods may produce data that violate physical laws, such as non-zero irradiance at night. In contrast, load data, being constrained by historical observations, naturally conform to the statistical distribution and require no additional physical constraints. Therefore, to ensure the physical validity of the generated data while retaining the flexibility of data-driven approaches, constraints must be applied to irradiance so that it aligns with the physical characteristics of PV generation, enhancing the reliability of the generated data.
This paper adopts a hard constraint clipping strategy to achieve this, in which irradiance data that do not conform to physical laws are corrected after sample generation. Specifically, all generated irradiance values during nighttime hours (i.e., from sunset to sunrise) are forcibly set to zero to ensure compliance with the diurnal variation pattern. Additionally, to prevent non-physical samples, a non-negativity constraint is applied to all irradiance data, clipping any negative values to zero and ensuring the physical reasonableness of the data. This strategy allows WGAN-GP to retain the advantages of data-driven methods while strictly adhering to the physical characteristics of PV irradiance, thus enhancing the authenticity of the generated data and providing more reliable scenario support for PV hosting capacity assessments.
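This post-processing step can be sketched as follows; the sunrise and sunset hour indices are placeholders that would in practice come from the site's solar geometry.

```python
import numpy as np

def enforce_physical_irradiance(profiles, sunrise=6, sunset=19):
    """Zero out nighttime hours and clip negative values, as described above.
    profiles: (N, 24) generated hourly irradiance; sunrise/sunset are
    illustrative hour indices, not site-specific values from this study."""
    profiles = np.maximum(profiles, 0.0)   # non-negativity constraint
    profiles[:, :sunrise] = 0.0            # before sunrise
    profiles[:, sunset:] = 0.0             # after sunset
    return profiles
```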
The core objective of the PV hosting capacity assessment model is to calculate the maximum PV output that the grid can accommodate while ensuring safe and stable network operation. This paper employs the SSA to determine the optimal PV integration capacity, subject to constraints such as voltage stability, power flow balance, and line transmission capacity. Regarding uncertainty modeling, PV power fluctuations are simulated through scenario generation to accurately represent random variations, while load fluctuations are modeled by constructing fluctuation ranges from historical data and introducing random disturbances.
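As an illustration of how such a constrained search might be posed for the SSA (or any metaheuristic), the penalty-based fitness below is a sketch only: `run_power_flow` is a hypothetical solver, and the 0.95-1.05 p.u. voltage band and unity line-loading limit are common assumptions rather than values taken from this paper.

```python
import numpy as np

def hosting_fitness(pv_capacity, scenarios, run_power_flow, penalty=1e6):
    """Penalty-based objective for a metaheuristic such as the SSA.
    pv_capacity: candidate per-node PV capacities (numpy array).
    run_power_flow: hypothetical solver returning per-bus voltages (p.u.)
    and per-line loadings for one scenario -- a placeholder, not a real API."""
    worst_violation = 0
    for sc in scenarios:
        v, loading = run_power_flow(pv_capacity, sc)
        viol = ((v < 0.95) | (v > 1.05)).sum() + (loading > 1.0).sum()
        worst_violation = max(worst_violation, viol)
    # Maximize total capacity (minimize its negative), penalizing violations
    return -pv_capacity.sum() + penalty * worst_violation
```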
The scenario generation component proposes a PV power scenario generation method based on WGAN-GP. This method combines adversarial training between the generator and the discriminator with a physics-data joint-driven strategy, ensuring the physical plausibility and diversity of the generated scenarios. Finally, the generated PV power scenarios are integrated with the assessment model, and through a multi-scenario optimization framework, the PV integration capacity under various meteorological and load conditions is accurately evaluated.
5. Discussion
In PV hosting capacity assessments, the breadth and depth of uncertainty modeling directly determine the accuracy and applicability of the evaluation method. Although numerous studies have examined the impact of source-side or load-side uncertainty on PV integration capacity, most existing approaches exhibit the following limitations: (1) they often model only one side of the uncertainty, ignoring the coupled evolution of source and load fluctuations; or (2) they frequently adopt fixed typical day samples or static scenarios, which fail to reflect the dynamic changes and temporal patterns present in real-world operating conditions. To address these issues, this paper establishes a joint source–load modeling framework: on the source side, solar irradiance and temperature are simultaneously introduced as input variables, and WGAN-GP is used to generate high-quality scenarios with both physical consistency and temporal structure; on the load side, intraday variation sequences are generated using historical statistical distributions combined with perturbation mechanisms. This joint modeling strategy significantly enhances the evaluation model’s structural adaptability and scenario generalization under multi-source uncertainty.
In addition, this paper conducts a comprehensive analysis and comparison of existing studies that have attempted to incorporate uncertainty modeling. One class of methods is based on adjusting historical data features, using techniques such as clustering, association mining, or random perturbation to construct scenarios. However, these methods are inherently limited by the distribution of available samples and generalize poorly under data scarcity or unseen operating states. Another class involves data-driven modeling using generative networks such as the VAE or conventional GANs. While such methods can increase sample diversity, they often lack physical constraints, producing unrealistic nighttime output or numerical anomalies, and they struggle to accurately express temporal structure.
To overcome these challenges, this paper proposes a scenario generation method based on WGAN-GP. By optimizing the adversarial training process between the generator and discriminator, the method not only stabilizes training but also reinforces the physical consistency of the generated samples. Additionally, the method introduces a goal-driven weighted sampling strategy to enhance the learning of key structural features, effectively guiding the generator to prioritize samples with target characteristics and improve modeling precision in critical regions. On the source side, the proposed approach effectively learns intraday trends and localized fluctuations in meteorological data while ensuring that the generated sequences remain physically reasonable regarding their value range, structural patterns, and real-world interpretability. Compared with the VAE and traditional GANs, the proposed method significantly improves sample quality. Specifically, it outperforms the baseline models in MSE (numerical accuracy), ACC (temporal correlation), and SSIM (structural similarity), verifying its comprehensive advantage in scenario generation quality and assessment adaptability. The improved performance in these evaluation metrics further strengthens the reliability and practical usability of the hosting capacity evaluation results.
Furthermore, to enable efficient multi-scenario optimization, this study employs the SSA as the solver for the hosting capacity model. Although the SSA is not a newly proposed algorithm, its lightweight structure and mature mechanisms make it well suited for complex, high-dimensional problems. In the case study, SSA achieves comparable optimization results to other mainstream methods such as PSO and GWO. Its simple parameter configuration and stable performance contribute to maintaining computational stability under uncertainty, making it a practical and reliable choice for distributed PV hosting assessments.
This study offers a practical and robust framework across three key dimensions: modeling structure, scenario generation, and optimization solving. The proposed method is adaptable to source–load uncertainty and provides a reliable modeling foundation for future PV planning and dispatch strategies in distribution networks.