1. Introduction
With the promotion of the dual carbon strategy, the installed capacity of distributed PV has increased dramatically. This large-scale grid connection has led to a series of issues such as voltage limit violations and reverse power flow, and existing policies have become increasingly stringent in restricting additional PV capacity. Industrial and commercial users adopting the “self-generation and self-consumption + surplus electricity to the grid” model can obtain considerable benefits, which has led to the widespread grid connection of distributed PV in industrial parks. Research on methods for selecting PV grid connection locations and planning optimal capacity is therefore of great significance for attracting user investment and expanding the grid’s acceptance capacity for PV while ensuring the safe and stable operation of the power grid.
In the early stage of PV grid connection, when PV penetration in the distribution network was low, site selection and capacity determination methods took an overall grid planning perspective, emphasizing economic factors while also considering safety when determining the grid connection location and capacity of distributed PV. Reference [1] considers the impact of distributed PV access on distribution network voltage and applies an improved particle swarm algorithm with a parameter self-adaptation strategy to the site selection and capacity determination of distributed PV. References [2,3] jointly consider the investment cost of distributed PV and indicators of safe grid operation, using a multi-objective particle swarm algorithm to obtain a Pareto solution set and then selecting the optimal PV configuration through a decision-making procedure. Reference [4] takes economic performance as the planning objective and treats voltage deviation, voltage fluctuation, and harmonic indices as constraints, solving the model with a multi-population genetic algorithm. Reference [5] first divides the year into multiple scenarios through clustering and then uses an improved particle swarm algorithm to site and size multiple types of distributed energy sources, aiming to minimize the annual comprehensive cost.
The problems caused by high-penetration PV grid connection, such as voltage violations and reverse power flow in the distribution network, prompt grid operators to limit the maximum allowable capacity of distributed PV according to the load-carrying capacity of each distribution transformer network [6]. Traditional methods [7,8,9], based on the capacity-to-load ratio, empirical formulas, load adjustment, and rules, primarily focus on the operational safety of the distribution network: they use the total investment and operational costs of distributed PV as the objective function, with voltage deviation, line capacity, and distribution capacity as constraints. Because these constraints are set too rigidly, the resulting maximum allowable capacity is overly conservative. Distributed PV capacity planning methods considering source–load uncertainty [10,11], hierarchical multi-objective capacity planning methods for distributed energy systems [12], and maximum capacity planning methods for distributed PV [13] all take the maximum integration capacity of distributed PV as the objective function. By treating grid security as a constraint, these methods improve on traditional approaches while ensuring the safe operation of the distribution network. However, all of them plan solely from the grid-wide perspective and do not comprehensively consider the costs and benefits of PV grid integration from the user’s perspective.
With declining PV investment costs and the gradual removal of national PV subsidies, the increasingly competitive PV market requires a more rational user–grid transaction model. The transaction model has now expanded from the simple benchmark feed-in tariff to include two additional modes: direct trading and electricity consignment. Research on the economic efficiency of these transaction models has recently advanced the study of optimal PV integration capacity in distribution networks. Reference [14] considers the impact of each PV user on incremental system network losses to establish an annual dynamic wheeling cost, thereby determining the PV integration capacity at each node and the threshold at which fee collection begins. Reference [15] first derives the initial maximum admissible PV capacity from a full-year time-series assessment and then establishes a PV plant revenue evaluation model that accounts for curtailment and emphasizes economic profitability; the resulting revenue-maximizing PV capacity exceeds the more conservative maximum admissible capacity. References [16,17] refine the wheeling cost mechanism, proposing, respectively, a wheeling cost model that includes basic usage charges, congestion, and cross-subsidy costs, and a model that accounts for losses of grid franchise rights; both use fee variations to guide users in optimizing PV integration, providing an effective approach.
This paper first fully considers source–load uncertainty and uses the k-means algorithm to cluster the annual light radiance and load data of a given area, obtaining multiple typical daily scenarios that serve as the basis for subsequent distributed PV grid connection capacity planning. Second, from the global perspective of power grid planning, a distributed PV site selection and capacity planning model that optimizes the annual comprehensive operation cost is established, and the resulting range of connection nodes is used as a reference for subsequent detailed planning. Third, from the perspective of grid-connected users, an optimal capacity planning model that emphasizes economic benefits is developed for industrial and commercial users within the access node range; guided by an improved wheeling cost and considering both the costs and benefits of PV integration, it determines the optimal capacity allocation for each node using an improved gray wolf optimization algorithm. For industrial and commercial users outside the access range, the optimal grid connection locations are determined by a comprehensive index that considers node capacity margins and system voltage stability. The method is applied to the IEEE 33-node test system, and the simulation results show that it can increase the distribution network’s acceptance of distributed PV, improve the overall benefits for users and the grid, and effectively promote the transformation of the distribution network into a clean grid.
2. A Method for Constructing Multi-Typical Daily Scenarios Considering Source–Load Uncertainty
Currently, stochastic programming, fuzzy programming, robust optimization, and scenario analysis methods [18,19,20,21] are commonly used to account for the impact of source–load uncertainty on planning, improving the adaptability and flexibility of the plan and enhancing the system’s ability to handle complex uncertainties. Scenario analysis retains multiple representative scenarios through scenario generation and reduction. The planning model evaluates the objective function under each scenario, and the results are weighted by the proportion of each scenario and summed. This method fully reflects the characteristics of each scenario, has clear principles and strong operability, and has been widely used in power system analysis [22]. The k-means algorithm, an efficient unsupervised machine learning method, is widely used for scenario generation and reduction. It measures the Euclidean distance between historical data points and automatically groups them into several clusters; the centroid of each cluster is taken as a typical scenario, and the proportion of original data points in the cluster is taken as the probability of that scenario. The k-means algorithm is therefore an effective technique for converting raw uncertain data into the input format required for scenario analysis [23,24].
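The weighting step described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes each day has already been given a cluster label, takes each scenario's probability as its share of days, and returns the probability-weighted sum of hypothetical per-scenario objective values.

```python
from collections import Counter


def scenario_probabilities(labels):
    """Probability of each typical day = share of original days in its cluster."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def expected_cost(cost_per_scenario, probs):
    """Weight each scenario's objective value by its probability and sum."""
    return sum(probs[s] * cost_per_scenario[s] for s in probs)


# Hypothetical example: 10 days grouped into 3 typical days
labels = [0, 0, 1, 2, 0, 1, 1, 0, 2, 0]
probs = scenario_probabilities(labels)
costs = {0: 100.0, 1: 80.0, 2: 120.0}  # illustrative per-scenario costs
print(probs, expected_cost(costs, probs))
```

In the planning models the per-scenario value would be the cost or loss evaluated under each typical day; the weighting itself does not change.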
The real load and light radiance data for a specific region are normalized, and a year-round dataset of daily load and light radiance profiles is constructed. The k-means clustering algorithm, one of the most commonly used unsupervised clustering methods, is then applied to generate multiple typical daily scenarios. After the number of clusters k is specified, the algorithm iteratively updates the k cluster centroids until the partition stops changing; the result is a locally optimal clustering that depends on the initial centroids. The steps are as follows:
(1) k samples are randomly selected from the dataset as the initial mean vectors u1, …, uk, which serve as the centroids of the initial clusters;
(2) Based on the current mean vectors, each sample is assigned to the cluster whose mean vector is closest to it, yielding the initial clustering, as shown in the following equation:
$$J = \sum_{i=1}^{N}\sum_{j=1}^{k} r_{ij}\,\lVert x_i - u_j \rVert^{2} \qquad (1)$$
Equation (1) is the objective function of the k-means algorithm: the sum of the squared Euclidean distances between all N data points and the mean vectors of their assigned clusters, where the decision variable is the cluster membership code rij. When data point xi is assigned to cluster j, rij = 1; otherwise rij = 0. For fixed mean vectors, J is minimized by assigning every data point to its closest cluster;
(3) The cluster membership codes rij determined in step (2) are used to update the mean vectors: each mean vector uj is recomputed as the average of the samples assigned to cluster j, which minimizes J for the fixed memberships;
(4) Steps (2) and (3) are repeated until the preset number of iterations is reached or the cluster mean vectors no longer change, and the clustering result is then output.
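Steps (1) to (4) can be sketched in plain Python as follows. This is an illustrative implementation of standard (Lloyd-style) k-means under the squared-distance objective of Equation (1), not the authors' code; the function names, the fixed random seed, and the toy two-feature data are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step (1): pick k distinct samples as the initial mean vectors
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(max_iter):
        # Step (2): assign each sample to its nearest mean vector (r_ij = 1)
        labels = [min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in data]
        # Step (3): recompute each mean vector as the average of its members
        new_centroids = []
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centre
        # Step (4): stop once the mean vectors no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def kmeans_objective(data, centroids, labels):
    """J from Equation (1): sum of squared distances to assigned centroids."""
    return sum(sq_dist(x, centroids[lab]) for x, lab in zip(data, labels))


# Toy normalized profiles with two obvious groups (hypothetical data)
data = [[0.10, 0.20], [0.15, 0.22], [0.12, 0.18],
        [0.90, 0.80], [0.88, 0.85], [0.92, 0.79]]
centroids, labels = kmeans(data, 2)
print(labels, kmeans_objective(data, centroids, labels))
```

Because the result depends on the initial samples, practical use typically runs several seeds and keeps the partition with the smallest J.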
In the k-means clustering algorithm, the number of clusters should be chosen by considering both the characteristics of the original data and the computational cost. The Davies–Bouldin index (DB index) is selected to evaluate clustering quality and compare the performance of different numbers of clusters [25]. The DB index is a widely used metric that quantitatively assesses both the separation between clusters and the compactness within clusters; lower values indicate better clustering, and values closer to zero correspond to compact, well-separated clusters. The formula for the DB index is
$$DB = \frac{1}{k}\sum_{i=1}^{k}\max_{j \ne i}\frac{\sigma_i + \sigma_j}{d(u_i, u_j)}$$
In the formula, σi is the average distance from all points in cluster i to the mean vector of that cluster, and d(ui,uj) is the distance between the mean vectors of clusters i and j. The DB index thus averages, over all clusters, the worst-case ratio of intra-cluster scatter to inter-cluster separation, so a smaller DB value indicates better clustering performance.
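The DB index can be computed directly from a clustering result. The sketch below follows the standard definition, taking σi as the mean Euclidean distance of cluster i's members to its centroid; it assumes k ≥ 2 and no empty clusters, and the function name and toy data are hypothetical.

```python
import math


def davies_bouldin(data, centroids, labels):
    k = len(centroids)
    # sigma_i: average distance from the members of cluster i to its mean vector
    sigma = []
    for i in range(k):
        members = [x for x, lab in zip(data, labels) if lab == i]
        sigma.append(sum(math.dist(x, centroids[i]) for x in members) / len(members))
    # For each cluster, take the worst (largest) scatter-to-separation ratio
    total = 0.0
    for i in range(k):
        total += max((sigma[i] + sigma[j]) / math.dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k


# Two tight, well-separated 1-D clusters (hypothetical data)
data = [[0.0], [2.0], [10.0], [12.0]]
db = davies_bouldin(data, [[1.0], [11.0]], [0, 0, 1, 1])
print(db)
```

To choose the number of typical days, one would cluster the data for each candidate k and keep the k with the smallest DB value, trading this off against computational cost as described above.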
3. Optimal Planning Method for Distributed PV Grid Integration
Distributed PV planning includes both grid connection location decision-making and optimal capacity planning [26]. First, a site selection and capacity determination model considering source–load uncertainty is constructed to determine the optimal range of access nodes and the corresponding capacities. For users within the optimal access node range, a distributed PV optimal grid connection capacity planning model is proposed that maximizes user benefits while accounting for the wheeling cost, yielding the optimal connection capacity. For users at non-optimal access points, a comprehensive index that considers the capacity margin of grid nodes and the impact of PV integration on voltage stability is established to evaluate candidate access locations and make the connection decision. The overall flowchart of the optimal planning method for distributed PV grid integration is shown in Figure 1.
3.1. A Distributed PV Site Selection and Capacity Planning Model Considering Source–Load Uncertainty
Based on the typical daily scenarios obtained above, a distributed PV site selection and capacity planning model is constructed, with the total investment and operating cost across all scenarios as the objective function and the secure operation of the grid as the constraint. The objective function is
In the equation, S represents the set of typical daily scenarios, ps is the proportion of scenario s among all scenarios, CPVI and CPVOM are the investment cost and the operation and maintenance cost of distributed PV, and Cbuy is the cost of purchasing electricity from the upper-level grid. The cost calculation formulas are as follows:
In this equation, r represents the discount rate, n is the service life of the PV equipment, cinv_PV is the construction cost per kilowatt of PV equipment, and Pi,PV is the PV installation capacity at node i.
In this equation, cope_PV represents the operation and maintenance cost per kilowatt of PV equipment.
In this equation, Psb(t) represents the inflow power at the interconnection node between the main grid and the distribution network at time t. When its value is negative, it indicates that the distribution network is delivering power back to the main grid, in which case the distribution network’s electricity purchase cost is considered to be 0.
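The cost formulas themselves did not survive extraction, so the following is only a sketch under a common assumption: the investment cost is annualized with the capital recovery factor r(1+r)^n / ((1+r)^n − 1) using the r, n, cinv_PV, and Pi,PV defined above; O&M cost scales linearly with installed capacity; and purchase cost charges only positive inflow Psb(t), treating reverse flow as zero cost as the text states. The price argument c_buy, the hourly step, and all numbers are hypothetical. Dividing an annual figure by 365 is one simple way to obtain the daily cost used when comparing typical days in Section 3.2.

```python
def capital_recovery_factor(r, n):
    """Annualizes an up-front investment over n years at discount rate r."""
    g = (1 + r) ** n
    return r * g / (g - 1)


def annual_investment_cost(capacities_kw, c_inv_per_kw, r, n):
    # Assumed form: CRF times total construction cost over all PV nodes
    return capital_recovery_factor(r, n) * c_inv_per_kw * sum(capacities_kw)


def annual_om_cost(capacities_kw, c_ope_per_kw):
    # Assumed form: per-kW O&M cost times total installed capacity
    return c_ope_per_kw * sum(capacities_kw)


def purchase_cost(p_sb_kw, c_buy, dt_h=1.0):
    # Reverse flow (negative inflow) costs nothing, as stated in the text
    return sum(c_buy * max(p, 0.0) * dt_h for p in p_sb_kw)


crf = capital_recovery_factor(0.08, 25)          # hypothetical r and n
inv = annual_investment_cost([100, 50], 4000, 0.08, 25)
print(round(crf, 5), round(inv / 365, 2), purchase_cost([100, -50, 20], 0.5))
```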
The constraints are as follows:
Equation (8) is an equality constraint requiring that the node voltages, branch currents, and active and reactive power in the system strictly satisfy the power flow equations. Equations (9)–(11) are inequality constraints requiring that node voltages and branch power remain within their permitted ranges and that the grid-connected PV capacity not exceed a set limit, defined as a proportion of the total load of the distribution network.
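As an illustration of how the inequality constraints might be checked for a candidate plan after a power flow has been solved, the sketch below tests voltage limits, branch limits, and a PV cap expressed as a share of total load. The exact forms of Equations (9)–(11) are not recoverable here, so the limits v_min = 0.95 p.u., v_max = 1.05 p.u., and pv_ratio = 0.3 are hypothetical placeholders, as is the function name.

```python
def constraint_violations(voltages_pu, branch_flows_kw, branch_limits_kw,
                          total_pv_kw, total_load_kw,
                          v_min=0.95, v_max=1.05, pv_ratio=0.3):
    """List violations of voltage, branch, and PV-cap limits (placeholder values)."""
    violations = []
    for node, v in enumerate(voltages_pu):
        if not v_min <= v <= v_max:
            violations.append(f"voltage at node {node}: {v:.3f} p.u.")
    for b, (flow, limit) in enumerate(zip(branch_flows_kw, branch_limits_kw)):
        if abs(flow) > limit:
            violations.append(f"branch {b}: |{flow}| kW > {limit} kW")
    if total_pv_kw > pv_ratio * total_load_kw:
        violations.append(f"PV {total_pv_kw} kW exceeds {pv_ratio:.0%} of load")
    return violations


print(constraint_violations([1.0, 0.94, 1.02], [50, -120], [100, 100], 200, 1000))
```

In a metaheuristic search such as the one described next, a non-empty list would typically be converted into a penalty added to the objective.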
The improved particle swarm optimization algorithm, which adaptively adjusts the inertia weight and learning factors, is used to solve the distributed PV site selection and sizing model [27], yielding the optimal locations and capacities for distributed PV integration in the distribution network.
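A minimal sketch of a particle swarm with time-varying parameters is shown below. It uses a linearly decreasing inertia weight and time-varying learning factors (cognitive factor shrinking, social factor growing), which is one common adaptive scheme; it is not necessarily the exact strategy of [27]. All parameter defaults are hypothetical, and the sphere function stands in for the scenario-weighted planning cost, which in practice would include constraint penalties.

```python
import random


def adaptive_pso(f, dim, lo, hi, n_particles=30, iters=200,
                 w_max=0.9, w_min=0.4, c_start=2.5, c_end=0.5, seed=1):
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [f(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    vmax = 0.2 * (hi - lo)  # velocity clamp
    for t in range(iters):
        frac = t / max(iters - 1, 1)
        w = w_max - (w_max - w_min) * frac        # inertia weight shrinks over time
        c1 = c_start - (c_start - c_end) * frac   # cognitive factor shrinks
        c2 = c_end + (c_start - c_end) * frac     # social factor grows
        for i in range(n_particles):
            for d in range(dim):
                v[i][d] = (w * v[i][d]
                           + c1 * rng.random() * (pbest[i][d] - x[i][d])
                           + c2 * rng.random() * (gbest[d] - x[i][d]))
                v[i][d] = max(-vmax, min(vmax, v[i][d]))
                x[i][d] = max(lo, min(hi, x[i][d] + v[i][d]))
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


best, val = adaptive_pso(lambda p: sum(c * c for c in p), dim=2, lo=-5.0, hi=5.0)
print(best, val)
```

The large early cognitive factor favors exploration of each particle's own history, while the late emphasis on the social term and small inertia favors convergence around the global best.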
3.2. Optimal Distributed PV Capacity Planning Model for Distribution Networks Guided by Wheeling Cost
The current wheeling cost policy is based on the difference in transmission and distribution prices between voltage levels: the wheeling cost equals the transmission and distribution price at the user’s voltage level minus that at the highest voltage level involved in the power transaction. This policy aims to compensate the grid for the additional costs incurred when distributed energy is fed back to the upper-level grid, but it does not consider other impacts of distributed energy on the grid. Improvements to the wheeling cost structure that account for grid losses and electricity sales rights have been proposed [14,15,16,17]. This paper improves the wheeling cost by considering the impact of distributed PV on grid losses and on the loss of the grid’s electricity sales rights, uses the resulting dynamic wheeling cost to guide grid-connected users toward reasonable PV installation, and, with the goal of maximizing user benefits, constructs an optimal distributed PV capacity planning model for the distribution network.
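The baseline price-difference rule can be written as a one-line calculation. The sketch below is illustrative only: the function name and the per-kWh prices are hypothetical, and it covers only the existing voltage-level rule, not the improved wheeling cost developed in this section.

```python
def base_wheeling_fee(td_price_user_level, td_price_highest_level, energy_kwh):
    # Unit fee: T&D price at the user's voltage level minus the T&D price at the
    # highest voltage level involved in the transaction (both per kWh)
    unit_fee = td_price_user_level - td_price_highest_level
    return unit_fee * energy_kwh


# Hypothetical prices per kWh and traded energy
print(base_wheeling_fee(0.25, 0.125, 1000))
```

Because lower voltage levels carry higher transmission and distribution prices, the unit fee is non-negative for a user connected below the highest traded voltage level.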
3.2.1. Objective Function
Considering the PV power generation revenue, investment and operation maintenance costs, and wheeling cost, the objective function of the planning model under multiple scenarios is as follows:
In the equation, Cuser represents the total user revenue, Crev_PV represents the distributed PV generation revenue, Cinv represents the distributed PV investment cost, Cope represents the operation and maintenance cost, and Cint represents the wheeling cost.
The revenue from PV power generation falls into two cases: when the PV output in a period is less than the load in that period, the revenue is the reduction in the user’s commercial electricity purchase cost; when the PV output exceeds the load, the revenue is the sum of the avoided cost of the entire load and the revenue from selling the surplus electricity to the grid, as follows:
In the equation, the time index runs over the set of 24 time periods in a day and the node index over the set of nodes; cpur(t) is the electricity price for the user to purchase from the grid at time t; the two remaining time-varying terms are the PV output and the load at node i at time t under typical daily scenario s; and cPV is the electricity selling price. The PV output is calculated from meteorological data using Formula (14):
In the equation, r is the light radiance (W/m²), i.e., the light power incident on a surface per unit area; A is the area of a single PV panel (m²); and η is the photoelectric conversion efficiency of the PV panel.
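The two-case revenue rule and the PV output formula can be sketched for a single node and a single typical day. This assumes the PV output is the product r·A·η per panel, scaled by a hypothetical panel count n_panels and converted from W to kW; the paper's Formula (14) and revenue equation are not reproduced here, and all names and numbers are illustrative.

```python
def pv_output_kw(irradiance_w_m2, panel_area_m2, efficiency, n_panels=1):
    # Assumed form: r * A * eta per panel, times a hypothetical panel count, W -> kW
    return irradiance_w_m2 * panel_area_m2 * efficiency * n_panels / 1000.0


def pv_revenue(pv_kw, load_kw, c_pur, c_pv, dt_h=1.0):
    """Daily revenue: avoided purchases on self-consumed PV plus sales of surplus."""
    total = 0.0
    for t, (p, l) in enumerate(zip(pv_kw, load_kw)):
        self_used = min(p, l)        # PV that displaces purchases at c_pur[t]
        surplus = max(p - l, 0.0)    # PV exported and sold at c_pv
        total += (self_used * c_pur[t] + surplus * c_pv) * dt_h
    return total


print(pv_output_kw(1000, 2.0, 0.2, n_panels=10))
print(pv_revenue([0.0, 3.0, 5.0], [2.0, 2.0, 2.0], [1.0, 1.0, 1.0], 0.5))
```

In the full model this daily revenue would be summed over nodes and weighted by each typical day's probability ps.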
The PV investment and operation and maintenance costs are given below; for convenience in comparing multiple typical daily scenarios, the investment cost is converted to a daily cost.
As power sources integrated into the distribution network, distributed PV systems affect network losses, and this impact should be considered. In addition, PV participation in grid-connected electricity sales affects the grid’s franchise rights. Based on these two considerations, the wheeling cost is designed as follows:
In the equation, CPVloss represents the network loss cost that the PV user must share for the distribution network, and Cmag is the cost of the loss of the grid’s franchise rights caused by the PV user’s participation in direct electricity sales.
In the equation, the three power terms are the output of the grid-connected distributed PV system, the total power supply of the distribution network, and the network loss, respectively, and cbuy is the unit price at which the grid purchases power from the upper level.
In the equation, csell(t) is the time-of-use electricity price the grid charges the user at time t, and Cgreen is the grid’s green certificate revenue, given by Equation (20), whose terms are the total electricity fed into the grid at each moment and kgreen, the new energy quota coefficient. The new energy quota coefficient is the required ratio of renewable energy generation to the total power fed into the grid; the grid can trade any renewable generation exceeding this ratio in the green certificate market to obtain revenue. Therefore, the franchise loss should include only the portion within the new energy quota. Subtracting the green certificate revenue from the grid’s network fee cost reflects the room the grid has to offer discounts to users, further reducing the user’s wheeling cost.
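The wheeling cost and the user objective can be sketched under explicit assumptions. The loss term below allocates each hour's network loss to PV in proportion to PV's share of total supply, priced at cbuy, which is one reading of the variables defined above rather than the paper's exact equation. Cmag is passed in as a precomputed number because its formula is not recoverable here, Cint is assumed to be CPVloss + Cmag, and the user benefit is assumed to be Crev_PV − Cinv − Cope − Cint. All function names and numbers are hypothetical.

```python
def pv_loss_share_cost(pv_kw, supply_kw, loss_kw, c_buy, dt_h=1.0):
    # Assumed allocation: PV pays for losses in proportion to its share of supply
    return sum(p / s * loss * c_buy * dt_h
               for p, s, loss in zip(pv_kw, supply_kw, loss_kw) if s > 0)


def wheeling_cost(c_pv_loss, c_mag):
    # Assumed composition: loss-sharing cost plus franchise-loss cost
    return c_pv_loss + c_mag


def user_benefit(c_rev_pv, c_inv, c_ope, c_int):
    # Assumed objective: revenue minus investment, O&M, and wheeling costs
    return c_rev_pv - c_inv - c_ope - c_int


c_loss = pv_loss_share_cost([0.0, 50.0], [200.0, 200.0], [10.0, 8.0], c_buy=0.5)
c_int = wheeling_cost(c_loss, c_mag=4.0)
print(c_loss, c_int, user_benefit(100.0, 20.0, 5.0, c_int))
```

Because the wheeling cost grows with exported PV and with the network loss it causes, maximizing this benefit over node capacities is what lets the fee steer users toward moderate, well-placed installations.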
3.2.2. Constraints
In addition to the power flow constraints, branch power constraints, and distribution transformer capacity constraints of the maximum admissible capacity model, the optimal capacity planning model includes a reverse power flow constraint: the power fed back from the distribution network to the upper-level grid cannot exceed the rated capacity of the transformer at the connection point with the main grid, as shown below:
In the equation, the total installed PV capacity of the distribution network is bounded by PT,E, the rated capacity of the transformer at the connection point between the distribution network and the upper-level grid.
Under the above constraints, the decision variables of the planning model are the PV capacities installed at each node within the optimal access range. The distributed PV planning scheme is obtained by optimization with the improved GWO algorithm; the flowchart is shown in Figure 2.
3.3. Grid-Connected Location Decision-Making Methods
For users at non-optimal connection points, the grid offers several candidate grid connection nodes based on the user’s geographical location, but these candidates do not account for each node’s capacity to accommodate PV or the impact of PV integration on the distribution network’s voltage. To address this issue, a grid connection location decision-making method that takes node capacity margins and voltage stability into account is proposed.
3.3.1. Model for Calculating the Maximum Access Capacity of Distributed PV at Each Node
To account for the capacity margin at each node, the maximum permissible capacity of distributed PV at each node must be calculated. The maximum permissible capacity is the maximum PV capacity that can be connected at a node under constraints on system voltage, network loss, power flow, and so on [13]. The decision variables of the maximum permissible capacity model are the distributed PV capacities connected at each node of the distribution network, and the objective function minimizes the network loss of the entire distribution network across multiple scenarios:
In addition to the constraints of the site selection and capacity determination model described above, the power fed back into the grid by the PV system must not exceed the rated power of the distribution transformer at node i; that is, the connected PV capacity PPV,i must be less than the rated capacity PT,i of the distribution transformer at node i. This adds the following distribution transformer capacity constraint:
The improved gray wolf optimization (GWO) algorithm [28] is used to solve the model; it is designed to reduce the risk of becoming trapped in local optima during the search, and it yields the maximum permissible capacity at each node of the distribution network.
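For orientation, the sketch below implements the standard (unimproved) GWO update: the three best wolves (alpha, beta, delta) guide the pack, and the control parameter a decreases linearly from 2 to 0, shifting the search from exploration to exploitation. The specific improvements of [28] are not reproduced; population size, iteration count, and seed are hypothetical, and the sphere function stands in for the scenario-weighted network-loss objective with constraint penalties.

```python
import random


def gwo(f, dim, lo, hi, n_wolves=20, iters=200, seed=2):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    best_pos, best_val = None, float("inf")
    for t in range(iters):
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        if f(alpha) < best_val:
            best_pos, best_val = alpha[:], f(alpha)
        a = 2.0 - 2.0 * t / iters  # control parameter decreases linearly from 2 to 0
        new_wolves = []
        for x in wolves:
            new_x = []
            for d in range(dim):
                pos = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a   # |A| > 1 explores, |A| < 1 exploits
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - x[d])
                    pos += leader[d] - A * D
                new_x.append(min(hi, max(lo, pos / 3)))  # average of the three moves
            new_wolves.append(new_x)
        wolves = new_wolves
    for x in wolves:  # also check the final population
        val = f(x)
        if val < best_val:
            best_pos, best_val = x[:], val
    return best_pos, best_val


pos, val = gwo(lambda p: sum(c * c for c in p), dim=3, lo=-10.0, hi=10.0)
print(pos, val)
```

For the capacity model, each wolf's position would be the vector of PV capacities at the candidate nodes, clipped to the bounds implied by the transformer capacity constraint.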
3.3.2. Decision-Making Method for Grid-Connected Location Based on Comprehensive Evaluation Indexes
The static voltage stability (SVS) index is selected to assess the impact of distributed PV integration on the system voltage [29,30].
The branch circuit model of a radial distribution network is shown in Figure 3.
Assuming that power flows from node i to node j, the voltage relationship between the two nodes is
In the equation, the voltage phasors of nodes i and j are related through Rij and Xij, the resistance and reactance between the two nodes, and Pij and Qij are the active and reactive power flowing into node j through branch ij. When node j is connected to distributed PV, the active power Pij is expressed as
In the equation, PL,j represents the load at node j, PC,j is the transmission power flowing from node j to other branches, and PPV,j is the PV output connected to node j.
Expanding Equation (24) and simplifying it into an equation in the voltage at node j, we obtain
At this point, determining the stability of the voltage at node j reduces to determining whether Uj in Equation (26) has a real solution; if no power flow solution exists, the node voltage is unstable. Therefore, based on the discriminant of the equation, the condition for voltage stability is
Equation (28) is the widely used SVS index for assessing the static voltage stability of nodes. If SVSj < 1, the voltage at node j has a power flow solution and is stable, and a smaller index indicates better stability. When SVSj = 1, the voltage at node j is in a critical stable state, and any further increase in the load at the node will lead to voltage instability and eventual collapse. Therefore, the static voltage stability index of the entire distribution network is taken as the maximum SVSj over all nodes.
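The SVS derivation can be reproduced numerically. Taking Uj as the phase reference and Pij, Qij as the power received at node j, the branch equation becomes Uj^4 + [2(PijRij + QijXij) − Ui^2]Uj^2 + (Pij^2 + Qij^2)(Rij^2 + Xij^2) = 0, a quadratic in Uj^2. A real solution requires Ui^4 − 4(PijRij + QijXij)Ui^2 − 4(PijXij − QijRij)^2 ≥ 0, which rearranges to the ratio below being at most 1. This is a commonly used form of the SVS index and may differ in normalization from the paper's Equation (28). The form of Pij (load plus downstream power minus PV output) is likewise an assumption based on the variables defined above; values are in per unit and hypothetical.

```python
def branch_active_power(p_load_j, p_downstream_j, p_pv_j):
    # Assumed form of P_ij: load at j plus power sent on to other branches, minus PV at j
    return p_load_j + p_downstream_j - p_pv_j


def svs_index(u_i, p_ij, q_ij, r_ij, x_ij):
    # From U_j^4 + [2(PR + QX) - U_i^2] U_j^2 + (P^2 + Q^2)(R^2 + X^2) = 0:
    # a real U_j^2 exists when U_i^4 - 4(PR + QX) U_i^2 - 4(PX - QR)^2 >= 0,
    # i.e. when the ratio returned here is at most 1 (smaller = more stable).
    num = 4 * (p_ij * x_ij - q_ij * r_ij) ** 2 + 4 * (p_ij * r_ij + q_ij * x_ij) * u_i ** 2
    return num / u_i ** 4


def network_svs(node_indices):
    # The network-level index is the worst (largest) nodal value
    return max(node_indices)


no_pv = svs_index(1.0, branch_active_power(0.05, 0.0, 0.0), 0.02, 0.1, 0.05)
with_pv = svs_index(1.0, branch_active_power(0.05, 0.0, 0.03), 0.02, 0.1, 0.05)
print(no_pv, with_pv, network_svs([no_pv, with_pv]))
```

In this toy branch, PV at node j reduces the power drawn through the branch and lowers SVSj; with large reverse flows the index can rise again, which is why the network-wide maximum is monitored.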
After considering the impact of distributed PV grid integration on the voltage level of the distribution network, the node capacity margin index Cm(t) of each scheme is calculated using the maximum admissible capacity model as
In the equation, PPVt(i) represents the maximum admissible PV capacity at connection point i in grid connection scheme t, PPV,0 represents the PV capacity applied for by the user, and PPV,max represents the largest maximum admissible PV capacity among all nodes. The larger the Cm(t) index, the greater the capacity margin of the connection point selected in the scheme, leaving room for future user connection demands.
Similarly, the static voltage stability index of the distribution network for scheme t, CSVS(t), is given by
Finally, by assigning weights and summing Equations (29) and (30), the comprehensive index for evaluating the grid connection scheme is obtained:
In the equation, w1 and w2 are the weights of the two indices, and their sum equals 1. The scheme with the largest comprehensive index C(t) among the alternatives is selected as the final grid connection scheme. The flowchart is shown in Figure 4.
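The final selection step, a weighted sum of the two indices followed by choosing the largest score, can be sketched as below. It assumes Cm(t) and CSVS(t) have already been computed and oriented so that larger is better, as the selection rule requires; the equal weights, scheme names, and index values are hypothetical.

```python
def select_scheme(cm, csvs, w1=0.5, w2=0.5):
    """Return the scheme with the largest C(t) = w1*Cm(t) + w2*CSVS(t), plus all scores."""
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    scores = {t: w1 * cm[t] + w2 * csvs[t] for t in cm}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical candidate schemes with precomputed index values
best, scores = select_scheme({"A": 0.6, "B": 0.4}, {"A": 0.5, "B": 0.8})
print(best, scores)
```

Shifting weight toward w1 favors connection points with more spare capacity for future users, while a larger w2 favors schemes that keep the network further from voltage collapse.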
5. Conclusions
With the gradual rollback of PV preferential policies and improvements in the market trading mechanism, the development pattern of distributed PV grid integration has changed, and there is an urgent need for reasonable planning of distributed PV grid connection. This paper proposes an optimal grid integration capacity planning method for distributed PV and conducts multi-scenario simulations of typical days in the IEEE 33-node system after fully considering the uncertainty of load and PV output, leading to the following conclusions:
The scenario analysis method is used to construct multiple typical daily scenarios from annual load and sunlight data: k-means clustering is performed, and the DB index is used to select the optimal number of clusters, yielding typical daily scenarios that effectively reflect the annual load and sunlight information and fully account for source–load uncertainty.
After the optimal PV access node range is obtained with the traditional site selection and capacity determination method, users at nodes within this range are guided primarily through the wheeling cost charged to them, and a distributed PV optimal capacity planning model that maximizes users’ economic benefits is used to optimize PV grid connection planning.
For users applying for grid connection outside this range, the maximum admissible capacity of each node is first calculated. A comprehensive indicator that considers both node capacity margin and the static voltage stability of the distribution network is then used to quantitatively evaluate the candidate grid connection schemes and decide the user’s grid connection plan.
Based on the actual load conditions and local weather data of a region in China, multiple typical daily scenarios are set up in the modified IEEE 33-node system, and the proposed PV optimal capacity planning method is compared with the traditional PV site selection and capacity determination method. The results show that the proposed method can guide users’ grid connection under more realistic conditions and improves the PV grid connection scheme in grid planning more than the traditional method does. By incorporating the wheeling cost strategy, the method increases the distributed PV acceptance capacity by 20.14%. In addition, user benefits and the operational safety and economic efficiency of the distribution network are significantly improved, with total revenue increasing by 27.77%. This research explores methods for integrating distributed PV into the distribution grid under these new circumstances.
This study has achieved certain results in the planning of distributed PV access capacity, but there is still room for further research. Future work may explore the coordinated planning of PV-storage systems, the optimized configuration of energy storage under dynamic electricity pricing, and regional adaptive capacity allocation schemes. In terms of algorithms, the performance of intelligent algorithms can be compared with traditional methods to assess their applicability in large-scale scenarios. Furthermore, with the development of the distributed energy market, it is worth researching PV-storage coordinated planning methods under multi-agent game theory in the market environment, to enhance the practical application value of the results.