Developing a Decision Tree Algorithm for Wind Power Plants Siting and Sizing in Distribution Networks

Abstract: The interconnection of wind power plants (WPPs) with distribution networks poses many challenges related to voltage stability at the point of common coupling (PCC). In a distribution network-connected WPP, the short-circuit ratio (SCR) and the impedance angle ratio seen at the PCC (X/R PCC) are the most important parameters affecting the stability of the PCC voltage (V PCC). Hence, design engineers need to conduct the WPP siting and sizing assessment considering the SCR and X/R PCC seen at each potential PCC site to ensure that the voltage stability requirements defined by grid codes are met. In the literature, optimal siting and sizing of distributed generation (DG) in distribution networks has been carried out using analytical, numerical and heuristic approaches. The majority of these methods require performing computational tasks or simulating the whole distribution network, which is complex and time-consuming. In addition, other works proposed to simplify WPP siting and sizing have limited accuracy. To address these issues, this paper develops a decision tree algorithm-based model for WPP siting and sizing in distribution networks. The proposed model eliminates the need to simulate the whole system and provides higher accuracy than similar previous works. For this purpose, the model predicts key voltage stability criteria at a given interconnection point, including the V PCC profile and the maximum permissible wind power generation, using the SCR and X/R PCC values seen at that point. The results confirmed that the proposed model provides noticeably high accuracy in predicting the voltage stability criteria under the various validation scenarios considered.


Introduction
Wind power is one of the most sustainable, abundant and cost-effective energy sources [1,2]. A large portion of wind power is injected into distribution systems through small wind power plants (WPPs). According to the voltage regulation requirements defined by grid codes in various countries, such as Australia, the UK and Canada, the interconnection of WPPs to distribution networks must ensure that the steady-state voltage at the point of common coupling (PCC) is maintained between 95% and 105% of the rated grid voltage [3]. At a given distribution network-connected WPP, the steady-state voltage at the PCC (V PCC) is significantly affected by the short-circuit capacity (SCC), the short-circuit ratio (SCR) and the overall system impedance angle ratio seen at that site, expressed by X/R PCC. These parameters are explained as follows:
• SCC: The amount of power that flows through a specified point when a short-circuit fault occurs at that point is expressed by SCC. The value of SCC depends on the rated voltage (V rated) and the short-circuit impedance (Z sc) and is given as in (1) [4].
$$\mathrm{SCC} = \frac{V_{rated}^{2}}{Z_{sc}} \quad (1)$$
• SCR: The ratio between the grid's SCC and the power injected by the WPP is given by SCR. At the PCC bus of a distribution system connected to a WPP, SCR quantifies the strength of the bus against the power quality issues caused by wind power penetration. The value of SCR is calculated as shown in (2) [4].
$$\mathrm{SCR} = \frac{\mathrm{SCC}}{P_{wind}} \quad (2)$$
Given that wind turbine generators are generally installed in areas located far from the distribution substation, e.g., on hilltops and close to the ocean, the output electrical power is transmitted to the grid through long lines. This results in a large short-circuit impedance (Z sc) and a small SCC and SCR [1,5]. Typically, the SCR value is less than 10 in distribution grid-connected WPPs. A small SCR, in turn, causes high voltage variations and power quality issues at the PCC [4]. Hence, there is a tradeoff between the value of SCR and the voltage stability in distribution systems connected to WPPs.
• X/R PCC : The grid impedance angle ratio seen at the PCC bus is defined by the X/R PCC . The value of the X/R PCC is determined by the ratio of Thevenin equivalent reactance and Thevenin equivalent resistance seen from that specified point [4]. The internal reactance of distribution lines is small, making the equivalent X/R value seen at the PCC small. The majority of existing approaches proposed for mitigating the voltage stability issues through reactive power compensation are applicable to power transmission networks where the X/R ratio is large [6]. Hence, these methods are not appropriate for distribution networks.
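The three PCC parameters above can be illustrated with a short numerical sketch. It assumes the usual relation SCC = V rated² / |Z sc| with Z sc taken as the Thevenin impedance R + jX seen at the PCC; the function names and feeder values are hypothetical and chosen only for illustration.

```python
import math


def short_circuit_capacity(v_rated_kv, r_ohm, x_ohm):
    """SCC in MVA from the rated line voltage (kV) and Thevenin R, X (ohm)."""
    z_sc = math.hypot(r_ohm, x_ohm)   # |Z_sc| = sqrt(R^2 + X^2)
    return v_rated_kv ** 2 / z_sc     # kV^2 / ohm = MVA


def short_circuit_ratio(scc_mva, p_wind_mw):
    """SCR = SCC / P_wind."""
    return scc_mva / p_wind_mw


def xr_ratio(r_ohm, x_ohm):
    """X/R ratio seen at the PCC."""
    return x_ohm / r_ohm


# Illustrative feeder (assumed values, not taken from the paper):
# 11 kV network, Thevenin impedance 4 + j3 ohm, 4 MW wind plant.
scc = short_circuit_capacity(11.0, 4.0, 3.0)
scr = short_circuit_ratio(scc, 4.0)
xr = xr_ratio(4.0, 3.0)
print(f"SCC = {scc:.1f} MVA, SCR = {scr:.2f}, X/R = {xr:.2f}")
```

A longer line raises both R and X, which lowers SCC and, for the same P wind, lowers SCR; this is the weak-grid situation described above.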
Given the significance of the three aforementioned parameters in the PCC voltage stability in distribution systems connected to WPP, designers should select an optimal PCC site where the values of SCC and X/R PCC ensure the V PCC stability requirements defined by the grid codes. In addition, given the relation between wind power penetration and SCR, engineers need to define the maximum power that can be injected by WPP, ensuring that V PCC is maintained within the standard range, i.e., 0.95 pu < V PCC < 1.05 pu.
Different approaches have been proposed in the literature for siting and sizing of distributed generators (DGs) in distribution networks. The authors in [7] applied analytical methods for sizing and siting of DGs to minimize power losses in the system. Such analytical methods require calculating the system bus impedance matrix, the inverse of the bus admittance matrix and the Jacobian matrix. Given the large size of distribution networks, calculating these matrices is computationally demanding [1]. In [8], A. Keane and M. O'Malley proposed a linear programming (LP)-based DG allocation method for harvesting maximum DG energy and minimizing the voltage variations in an Irish 38-kV seven-bus radial distribution network. A mixed-integer nonlinear programming (MINLP)-based method was studied in [9] to determine the optimal combination of different renewable DGs with minimum power loss in an IEEE-RTS 41-bus test system. Similarly, dynamic programming (DP) was used in [10] for optimal allocation of DGs for power-loss reduction and reliability improvement in a 9-bus radial test distribution system. The main drawback of the methods proposed in [8-10] is that they rely on simulation of the whole distribution system, which is a complicated and time-consuming task and requires the specifications of each system component [11].
Heuristic methods have been commonly used in optimal distributed generation placement (ODGP) because of their simplicity, generality, flexibility and superiority in solving optimization problems [12]. Ali et al. [13] investigated four DG sizing and siting methods based on simulated annealing (SA), variable search environment descending (VSED), genetic algorithm (GA) and hybrid genetic algorithm (HGA) to minimize the power loss and improve the voltage profile in the IEEE standard 34-bus test distribution system. Similarly, particle swarm optimization (PSO) was applied for sizing and siting of DGs in [14] to improve the voltage profile and minimize the cost of power losses in four different bus systems: 12-, 15-, 33- and 69-bus systems. Ant colony optimization (ACO) was used in [12] to determine the optimal sizing and placement of multiple DGs using a 69-bus distribution system. The artificial bee colony (ABC) algorithm was proposed for optimal placement and sizing of DGs in [15] to improve the voltage profile in IEEE 33-, 69- and 229-bus systems. In addition, other heuristic methods, such as harmony search (HS), differential evolution (DE) and Tabu search (TS), have been applied for optimal DG sizing and siting. These methods can deal with large and complex ODGP problems and provide a near-optimal solution. However, similar to the analytical methods discussed earlier, the heuristic methods also rely on simulation of the whole distribution system, which is complex and time-consuming [11]. Moreover, the accuracy of the heuristic methods depends on the tuning of optimization parameters, such as crossover and mutation in GA [13] and the acceleration constants (c1, c2) in PSO [14]. Improper tuning of these parameters may lead to higher computational effort and adversely affect the accuracy of the prediction [4].
In addition, using analytical approaches and artificial intelligence (AI)-based methods for WPP siting and sizing produces unrealistic results as the reactive power exchanged between the grid and WPP is considered to be zero in these methods [4].
To address the aforementioned issues related to using analytical and AI-based methods, it is required to simplify the WPP sizing and siting in distribution systems using more efficient and accurate approaches. As a suitable approach for WPP sizing in the distribution systems, the author in [5] developed a mathematical relation between V PCC and SCR for a test system with 0 ≤ SCR ≤ 2.5. Referring to [5], V PCC can be taken as a quadratic function of SCR. However, the equation proposed in [5] did not consider the relation between V PCC and X/R PCC ratio. Given the significant effect of the X/R PCC ratio on V PCC stability, the lack of consideration of the relationship between these parameters adversely impacts the accuracy and validation of the relation proposed in [5]. In addition, in the majority of actual distribution networks, the SCR value is more than 2.5 [16]. Given that the mathematical model has been tested for 0 ≤ SCR ≤ 2.5, the validity of the proposed relation in [5] for a system with SCR > 2.5 is ambiguous. The aforementioned issues concerned with the mathematical relation proposed in [5] were addressed and removed by a more comprehensive mathematical model proposed in our previous work presented in [1]. The model expressed the mathematical relation between the V PCC variation, SCR and X/R PCC ratio for various test distribution networks connected to induction generator (IG) and doubly-fed induction generator (DFIG)-based WPPs. For IG-based WPPs, two mathematical relations were developed regarding the range of the X/R PCC : an exponential function for WPPs with the X/R PCC < 2 and a quadratic function for WPPs with the X/R PCC > 2. Furthermore, for DFIG-based WPPs, a mathematical relation was developed considering that the X/R PCC < 2. The mathematical method presented in [1] is one of the most valuable and comprehensive approaches expressing the relationships between V PCC and the main PCC parameters of distribution network connected WPPs. 
Such a mathematical model enables the prediction of the key V PCC stability criteria, including V PCC profile, step-V PCC variation and maximum permissible size of WPP. Taking advantage of the predicted V PCC parameters, the design engineers can easily find the best bus for the interconnection of a WPP without carrying out complex and time-demanding computational tasks and simulating the test systems. However, the results obtained in [1] demonstrated that the accuracy of the mathematical relations is adversely impacted when SCR and X/R PCC ratios are small. In addition, for IG-based WPPs, the accuracy of the proposed relations is low when the X/R PCC is around 2. Hence, although the method proposed in [1] simplifies the WPP siting and sizing process compared to the other existing methods, its accuracy is impacted by small SCR and X/R PCC ratios, which, in turn, limits the method applicability. To address this issue and increase the prediction accuracy, the mathematical model proposed in [1] was replaced by a decision tree algorithm-based method in this paper. Therefore, in this work, a decision tree algorithm method was developed to model the relation between V PCC variation (dV PCC ), SCR and X/R PCC . The input parameters of the proposed decision tree-based model are SCR and X/R PCC, which are the baseline characteristics of distribution feeders and easily available in any power system network. Using the values of input parameters, the model precisely predicts the P wind -dV PCC characteristic, which can then be used for optimal WPP siting and sizing. The decision tree algorithm is one of the supervised learning algorithms and can be implemented for regression and classification problems [17]. The accuracy of the decision tree algorithm in predicting output parameters is enhanced by training decision trees with a large training data set [17]. 
In this study, the X/R PCC -dV PCC data points were initially obtained using simulation test systems with different SCR values. The simulation results were then extended to enlarge the training data set, and the extended data were used to develop the decision tree algorithm-based model. The proposed model makes it possible to plot P wind versus dV PCC and provides the design engineer with the information needed to carry out an initial predictive assessment of the key power quality parameters at the PCC of WPPs, including the V PCC profile and the maximum permissible power that can be injected into the distribution network (P wind _max). Using the power quality parameters predicted by the proposed decision tree algorithm, WPP planning engineers can estimate the optimal size of a WPP and select the most appropriate site for its interconnection to the distribution network, where the voltage stability requirements defined by the grid codes are satisfied with very high accuracy. Hence, the main contribution of this work is to simplify the WPP sizing and siting analysis while achieving noticeably higher accuracy than similar methods recently published in the literature. The aims of this study were to:
• Develop a novel voltage stability decision tree algorithm-based model predicting the key power quality components at a given PCC bus, i.e., V PCC and P wind, based on the values of SCR and X/R PCC seen at that bus;
• Simplify the siting and sizing of IG- and DFIG-based WPPs in weak distribution networks;
• Increase the prediction accuracy compared to the voltage stability mathematical model presented in [1].
The paper structure is as follows: Section 2 outlines and discusses the methodology and different steps followed to develop the decision tree algorithm-based model. Section 3 presents the validation results obtained and compares the accuracy of the proposed model with similar previous techniques. Section 4 explains the significance and novelty of the work and its application in predicting the key voltage stability criteria and analyzing the WPP siting and sizing. Finally, Section 5 summarizes the highlights of this work.

Methodology
The overall methodology used in developing the decision tree algorithm for predicting the aforementioned power quality parameters is as follows:
• Data collection and extension: In this study, the X/R PCC -dV PCC characteristics were required for test systems with different SCR values. For this purpose, the X/R PCC -dV PCC data points were obtained by simulating the test systems from the authors' previous work [1]. As discussed earlier, higher prediction accuracy can be achieved by increasing the number of data points. However, the amount of simulation data obtained from the test systems is small due to the limited capability of the MATLAB/Simulink solver in providing X/R PCC -dV PCC data points. Hence, the simulation data were extended to obtain a large training data set. In this work, the extension of the simulation data was conducted using Microsoft Excel.
• Developing the decision tree algorithm: The extended data were then used to train the decision tree in MATLAB (R2014a, MathWorks) to formulate a model for predicting dV PCC from the values of SCR and X/R PCC. A boosted regression decision tree was used to predict the voltage profile from the given network parameters (SCR and X/R PCC).

Data Collection and Extension
Modeling of decision tree requires a training data set based on which it creates a model for the prediction of unknown feature from known features [18]. The aim of this study was to develop a model that predicts voltage variations in response to changes in wind power generation (P wind ) at a given PCC bus with specific SCC and X/R values. Hence, to develop such a predictive model, it is required to obtain a training data set, which includes X/R PCC -dV PCC values for a range of SCR ratios. In this study, the initial training data set was obtained using four simulation test systems considered in the authors' previous work presented in [1]. The test systems were simulated based on the IEEE 9-bus and IEEE 37-bus distribution network models. Given that the power quality issues in distribution network-connected WPPs are mainly related to the PCC sites with SCR < 10, the SCR range considered in this study is 4 < SCR < 10. In addition, the range of the X/R PCC considered is based on the analysis results gained using actual distribution systems presented in [19].
In this study, the analysis was carried out for both IG- and DFIG-based WPPs. Figures 1 and 2 show the single-line block diagrams of the test systems. The specifications of the test systems were discussed and presented in the authors' previous work [1]. As shown in Figures 1 and 2, the PCC sites in the 37-bus and 9-bus test systems are Bus 6 and Bus 9, respectively. The line lengths differ among the four test systems considered, resulting in different SCC and SCR values. For each test system, the SCC, P wind and corresponding SCR values are presented in Table 1. From Table 1, it can be observed that Test 1 is weaker than the other test systems, as it has the lowest SCR value. On the other hand, Test 4 has the highest SCR value, making this test system stiffer than the other systems considered.
For each test system shown in Table 1, the X/R PCC ratio was changed to monitor the corresponding V PCC value, while the values of SCC, P wind and SCR are constant. Having the X/R PCC -dV PCC data points, the V PCC variation was calculated using (3).
where V PCC signifies the PCC voltage value after the P wind is generated and injected into the test distribution systems and V initial is the PCC voltage value before the WPP connection when the P wind = 0. Referring to [1], the V initial value at the PCC of test systems was considered to be 0.98 p.u. Figures 3 and 4 show the X/R PCC -dV PCC characteristics for the IG and DFIG-based test WPPs, respectively.  As discussed, for the X/R PCC -dV PCC curve characteristics presented in Figures 3 and 4, the mathematical functions of graphs with the best fit were developed in [1]. However, as discussed by the authors in [1], the prediction error of the method presented in [1] is high for interconnection points with a small SCC, SCR and X/R PCC ratio. In this study, the mathematical model developed in [1] was replaced by a decision tree algorithm to improve the prediction accuracy. The algorithm was developed using the X/R PCC -dV PCC data points presented in Figures 3 and 4. As discussed earlier, to obtain the X/R PCC -dV PCC data points, in each simulation test system with the characteristics shown in Table 1, the X/R PCC ratio was changed, and the corresponding dV PCC was monitored. However, MATLAB/Simulink solver was not able to show the difference in dV PCC value when the X/R PCC was slightly changed. For example, when the X/R PCC was changed by 1%, the dV PCC value obtained by the simulation models was constant, meaning that the change in the X/R PCC was not reflected in the dV PCC value. The limited capability of the MATLAB/Simulink solver in providing exclusive dV PCC value for each X/R PCC value resulted in collecting a small number of the X/R PCC -dV PCC data points. Hence, only 15 data points were obtained from each simulation test system. On the other hand, the prediction accuracy of the algorithm is increased if the larger data set is used to train the decision tree algorithm [17]. 
Given that the number of data points obtained from the simulation models is not sufficient for training the decision tree algorithm, the data were extended to obtain a larger training data set.
In this study, Microsoft Excel (Microsoft Office 2013) was used to extend the simulated data by fitting a trendline between dV PCC and X/R PCC for each SCR level. Higher-order polynomials were fitted, maximizing the R² value by trial and error. The R² value represents the goodness of fit and lies between 0 and 1; a value closer to 1 indicates a better fit [20]. The best-fit polynomial was then used to determine dV PCC from X/R PCC, forming a large data set for each SCR level. Finally, the extended X/R PCC -dV PCC data were obtained for both IG- and DFIG-based WPPs, as shown in Figures 5 and 6, respectively. The data presented in Figures 5 and 6 form a large data set (on average 500 data points for each test system), which is appropriate for use as the training data set in the next step.
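The trendline-based extension described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Excel procedure used in the study: the 15 data points are synthetic, the order-selection rule (accept a higher order only if R² improves noticeably) is an assumed stand-in for the trial-and-error step, and all function names are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            f = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= f * a[col][k]
            b[row] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# 15 synthetic (X/R, dV) points standing in for one SCR level; not paper data.
xr_sim = [0.5 + 0.25 * i for i in range(15)]
dv_sim = [3.0 - 1.2 * x + 0.15 * x ** 2 for x in xr_sim]

# Try increasing polynomial orders; keep a higher order only if R^2 improves.
best_degree, best_r2, best_coeffs = None, -1.0, None
for degree in range(1, 5):
    coeffs = polyfit(xr_sim, dv_sim, degree)
    r2 = r_squared(xr_sim, dv_sim, coeffs)
    if r2 > best_r2 + 1e-6:
        best_degree, best_r2, best_coeffs = degree, r2, coeffs

# Evaluate the fitted trendline on a dense X/R grid to extend the data set.
xr_dense = [0.5 + 3.5 * i / 499 for i in range(500)]
extended = [(x, polyval(best_coeffs, x)) for x in xr_dense]
```

Note that the extended points lie exactly on the fitted polynomial, so they add sampling density but no new information beyond the original simulated points.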

Decision Tree Algorithm
The decision tree algorithm is a machine-learning (ML) algorithm that develops regression or classification models in the form of a tree structure [21]. The model is trained to predict the class or value of a target parameter by learning decision rules inferred from prior data (training data) [22]. As discussed, better prediction accuracy can be achieved by using a larger training data set [17]. The training data set must contain the values of the feature and response variables. Feature variables are those given to the model as input, and the response variable is the model output.
As discussed in the previous section, the proposed decision tree-based model is developed using the training data set containing 90% of extended values of the X/R PCC -dV PCC obtained for each SCR value shown in Figures 5 and 6, whereas 10% of extended data were set aside as test data set.
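A 90%/10% hold-out split of this kind can be sketched as follows. The text does not state how the 10% was selected, so the seeded random shuffle here is an assumption, and the rows are placeholders rather than the paper's data.

```python
import random


def split_train_test(rows, test_fraction=0.10, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as test data."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows (SCR, X/R, dV) for one SCR level; values are illustrative.
rows = [(6.0, 0.5 + 0.007 * i, 0.0) for i in range(500)]
train, test = split_train_test(rows)
print(len(train), len(test))
```

Applying the split separately to the data of each SCR level keeps every level represented in both the training and the test set.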
The block diagram of the proposed algorithm is presented in Figure 7. As shown in Figure 7, the input parameters of the proposed model are X/R PCC and the ratio between SCC and P wind , i.e., SCR. The model can predict the P wind -dV PCC characteristic curve using the value of input parameters.
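One way to read the block diagram is that, for a fixed site (SCC and X/R PCC), each candidate P wind value is converted to an SCR value and passed to the trained model. The sketch below shows that loop under this assumption; `placeholder_model` is a hypothetical stand-in with made-up behavior, not the decision tree developed in the paper.

```python
def p_wind_dv_curve(scc_mva, xr_pcc, p_wind_values_mw, predict_dv):
    """Return [(P_wind, dV_PCC)] for one site by sweeping P_wind."""
    curve = []
    for p_wind in p_wind_values_mw:
        scr = scc_mva / p_wind                 # SCR = SCC / P_wind, as in (2)
        curve.append((p_wind, predict_dv(scr, xr_pcc)))
    return curve


def placeholder_model(scr, xr_pcc):
    # Hypothetical stand-in for the trained model: dV grows as SCR falls.
    return 10.0 / scr / (1.0 + xr_pcc)


curve = p_wind_dv_curve(20.0, 1.0, [2.0, 4.0, 5.0], placeholder_model)
print(curve)
```

Any trained regressor exposing a two-argument prediction call could be passed in place of `placeholder_model`.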
In this study, MATLAB (R2014a, MathWorks) was used to create the decision tree algorithm. This study aimed to develop a model from a numerical data set, which suits a regression tree model. Ensemble learning was used to train the decision trees with the "input variables" and "response variables" provided in the extended data set. Ensemble learning combines multiple learning algorithms to obtain a better solution and increase the efficiency of the prediction [23].
To develop the decision tree algorithm in MATLAB, the algorithm parameters must be defined carefully. The minimum leaf size denotes the minimum number of training samples per leaf of the decision tree. Large minimum leaf sizes increase the prediction error, whereas small values lead to overfitting [18]. The minimum leaf size should be at least 5 to prevent noise due to overfitting, while increasing the leaf size further makes the model deviate from the actual pattern of the data and increases the prediction error [18]. The deviation of the decision tree model from the actual pattern of the data can be quantified using the MATLAB function "loss()". As an example, when the minimum leaf size is increased from 5 to 10, the value calculated by the loss() function increases, which indicates that the developed model has drifted from the actual pattern. Hence, in this study, the minimum leaf size was set to 5 to ensure that the actual pattern of the data is reflected in the prediction. The other decision tree parameters should be selected so that they provide a better prediction for the regression data. Furthermore, the least-squares boosting method ("LSBoost") was selected for boosting the decision tree algorithm. The pseudo-code for developing the decision tree algorithm is provided in Table 2.
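The idea behind least-squares boosting with a minimum leaf size can be shown with a from-scratch Python sketch. This is not the MATLAB implementation used in the study: the tree depth, number of rounds, learning rate and the synthetic (SCR, X/R) to dV data are all assumptions made for illustration. Each round fits a small regression tree to the current residuals and adds a shrunken copy of it to the ensemble.

```python
def sse(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(X, y, min_leaf=5, depth=3):
    """Regression tree as nested tuples; a leaf is the mean response."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean
    best = None
    for f in range(len(X[0])):
        order = sorted(range(len(y)), key=lambda i: X[i][f])
        # Every candidate split leaves at least min_leaf samples on each side.
        for k in range(min_leaf, len(y) - min_leaf + 1):
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue                       # cannot split between ties
            left, right = order[:k], order[k:]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, (lo + hi) / 2, left, right)
    if best is None:
        return mean
    _, f, thr, left, right = best
    return (f, thr,
            fit_tree([X[i] for i in left], [y[i] for i in left], min_leaf, depth - 1),
            fit_tree([X[i] for i in right], [y[i] for i in right], min_leaf, depth - 1))


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def fit_lsboost(X, y, n_rounds=50, learn_rate=0.1, min_leaf=5):
    """Least-squares boosting: each tree is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, min_leaf)
        trees.append(tree)
        pred = [pi + learn_rate * predict_tree(tree, xi) for pi, xi in zip(pred, X)]
    return base, learn_rate, trees


def predict(model, x):
    base, learn_rate, trees = model
    return base + learn_rate * sum(predict_tree(t, x) for t in trees)


# Synthetic training grid: (SCR, X/R) -> dV; the response function is made up.
X = [(float(scr), 0.5 + 0.4 * j) for scr in range(4, 11) for j in range(10)]
y = [8.0 / scr - 0.3 * xr for scr, xr in X]

model = fit_lsboost(X, y, n_rounds=50, learn_rate=0.1, min_leaf=5)
mse_base = sum((yi - sum(y) / len(y)) ** 2 for yi in y) / len(y)
mse_model = sum((yi - predict(model, xi)) ** 2 for yi, xi in zip(y, X)) / len(y)
print(f"baseline MSE = {mse_base:.4f}, boosted MSE = {mse_model:.6f}")
```

Rerunning the sketch with `min_leaf=10` and comparing the resulting error on held-out data mirrors the leaf-size comparison described above.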

Results and Discussion
This section provides the analysis studies carried out to verify the accuracy of the proposed decision tree-based model in predicting the P wind -dV PCC characteristic for different test systems. In this regard, the P wind -dV PCC characteristics plotted by the proposed model were compared with the reference characteristics given by the IEEE test systems presented in Figures 5 and 6. In addition, the P wind -dV PCC characteristics gained by the proposed decision tree algorithm-based model were compared with the results obtained by one of the most efficient methods presented in [1], which is capable of simplifying the WPP sizing and siting. Both IG and DFIG-based WPPs were considered in the verification analysis.

IG-Based WPPs
Nine test systems were considered with different SCC and X/R PCC values, as shown in Table 3. For each scenario, the test system was run to obtain the dV PCC for various P wind values. From the simulation results, the reference P wind -dV PCC characteristic was plotted for each scenario. In addition, the P wind -dV PCC characteristics were obtained for each scenario using the decision tree-based model developed in this paper and the mathematical model proposed in [1], considering the SCC and X/R PCC ratios presented in Table 3. Given that the analysis was carried out for IG-based WPPs, the mathematical relations were taken from [1]: (4) for X/R PCC < 2 and (5) for X/R PCC > 2. Figure 8a-i shows the results obtained for the scenarios stated in Table 3. As shown in Figure 8a-i, for all scenarios considered in Table 3, the P wind -dV PCC characteristics predicted by the decision tree-based model developed in this paper follow the reference graphs obtained from the simulation test models, even when large wind power generation weakens the PCC feeder. As discussed earlier, at weak PCC sites, large wind power penetration makes the SCR value small. The results therefore show that small SCR values do not adversely affect the accuracy of the proposed model in predicting the P wind -dV PCC characteristics. On the other hand, the results demonstrate that the curves predicted by the mathematical relations proposed in [1] deviate considerably from the reference graphs, especially when the SCR value is small due to large wind power generation. Hence, as the authors mentioned in [1], the accuracy of the mathematical model is adversely affected at weak PCC sites, where the grid's SCC and SCR are small.
For example, Figure 8c,d shows the results for the scenarios with a small grid SCC, i.e., Scenario 3 with an SCC of 15 MVA and Scenario 4 with an SCC of 18 MVA, where the highest error between the reference graphs and the characteristics predicted by the proposed decision tree algorithm is less than 0.5%. However, Figure 8c,d also shows that the error between the reference curve and the curve predicted by the equations proposed in [1] is more than 1% when the wind power generation is around 4 MW, which corresponds to an SCR of around 4.
Another advantage of the proposed decision tree-based model over the mathematical method developed in [1] is that it provides significantly higher accuracy in predicting the P wind -dV PCC characteristic when the X/R ratio is around 2. The results presented in Figure 8h,i for the scenarios with an X/R ratio close to 2, i.e., Scenarios 8 and 9, show that the error between the curves plotted by the decision tree-based model and the corresponding reference curves is negligible. However, the results in Figure 8h,i show that the curves plotted by the mathematical model deviate noticeably from the reference graphs when the X/R PCC ratio is around 2 and the wind power generation is large.
Generally, the results in Figure 8a-i demonstrate that the P wind -dV PCC characteristics plotted by the proposed decision tree-based model follow the corresponding reference graphs for different ranges of the X/R PCC ratio and wind power penetration, whereas the accuracy of the mathematical model in predicting the characteristics is decreased when wind power generation is increased and/or the X/R PCC ratio is around 2.

DFIG-Based WPPs
This section discusses the analysis conducted to validate the accuracy of the decision tree-based model developed in this work in predicting the P wind -dV PCC characteristics for DFIG-based WPPs. Nine test systems were considered with the SCC and X/R PCC values shown in Table 4. Similar to the verification studies carried out for the IG-based WPPs, for each test system shown in Table 4, the curves predicted using the proposed model are compared with the reference curves obtained from the simulation results and with the curves plotted by the mathematical model proposed in [1]. Referring to [1], for the DFIG-based WPPs, the dV PCC is given by (6). Referring to Figure 9a-i, the prediction of the P wind -dV PCC characteristics using the proposed decision tree-based model has a minimal error, while the error between the reference and predicted results is noticeable in most cases when the graphs are plotted using the mathematical equation. The results shown in Figure 9b for the scenario with the smallest X/R (Scenario 2) confirm the discussion presented in [1] regarding the low prediction accuracy of the mathematical equation for weak PCC sites with a small X/R PCC ratio. From Figure 9b, the error between the reference results obtained from the simulation models and the graphs predicted by the mathematical model is over 1% when the wind power penetration is large, whereas the graph plotted using the proposed decision tree algorithm closely tracks the reference curve for any level of wind power penetration. From Figure 9e, it can be seen that the highest error of the proposed decision tree-based model in predicting the P wind -dV PCC curve is less than 0.5% in Scenario 5, while the error is greater than 1% in this scenario when the characteristic is predicted using the mathematical relation.
The next section compares the model proposed in this work with the mathematical model developed in [1] by calculating the exact value of the prediction error (PE) for each IG and DFIG-based scenario presented in Tables 3 and 4.

Comparison of Decision Tree Model and Mathematical Model for Different Ranges of X/R PCC
To compare the prediction accuracy of the decision tree-based model proposed in this paper and the mathematical method developed in [1], the error between the predicted and reference P wind -dV PCC characteristics, referred to as the prediction error (PE), was evaluated for the scenarios considered in Tables 3 and 4. For each scenario, the prediction error is given by (7) [24], where N is the number of P wind -dV PCC data points; ∆V p is the dV PCC value obtained by the predictive models, i.e., the decision tree-based model proposed in this paper and the mathematical model presented in [1], for a given P wind value; and ∆V r is the reference dV PCC obtained using the test simulation systems for each level of wind power penetration.
As an example, the PE values of both the proposed decision tree algorithm and the mathematical model developed in [1] were calculated for test system 2 in Table 3, as shown in (8) for the proposed decision tree-based model and in (9) for the mathematical model presented in [1]. The curve characteristics shown in Figure 8b were used to find the ∆V p and ∆V r values, as shown in Table 5.
To address the randomness of the prediction by the decision tree algorithm, three tries were performed for each test system listed in Tables 3 and 4, and the PE value was taken as the mean value of the tries. Figures 10 and 11 show the PE values calculated for the IG- and DFIG-based test systems, respectively.
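The averaging over tries can be sketched as below. Because the exact expression of (7) is not reproduced here, the sketch assumes that PE is the mean absolute difference between ∆V p and ∆V r over the N data points; this assumed form may differ from (7), and the numbers are illustrative rather than taken from Table 5.

```python
def prediction_error(dv_pred, dv_ref):
    """ASSUMED form of PE: mean absolute difference over the N data points."""
    n = len(dv_ref)
    return sum(abs(p - r) for p, r in zip(dv_pred, dv_ref)) / n


def mean_pe_over_tries(tries, dv_ref):
    """Average PE over repeated model runs to smooth out run-to-run variation."""
    return sum(prediction_error(t, dv_ref) for t in tries) / len(tries)


# Illustrative values only (in %), not results from the paper.
dv_ref = [1.0, 2.0, 3.0]
tries = [
    [1.1, 2.0, 3.0],
    [1.0, 2.2, 3.0],
    [1.0, 2.0, 3.3],
]
pe = mean_pe_over_tries(tries, dv_ref)
print(f"mean PE over {len(tries)} tries = {pe:.4f}")
```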
For the IG-based scenarios, the PE values resulting from the proposed decision tree algorithm are negligible for any X/R PCC ratio, whereas the maximum prediction error of the mathematical model is 2.5% when the X/R PCC ratio is around 2. This confirms the finding discussed in Section 3.1 that the accuracy of the mathematical model is adversely affected when the X/R PCC ratio approaches 2. For the DFIG-based scenarios, the maximum PE of the proposed decision tree algorithm is less than 0.1% in the scenarios with a small X/R PCC ratio. However, the maximum PE value of the mathematical method is 1%, which occurred in the scenario with the smallest X/R PCC ratio. Therefore, as mentioned in the previous section, the accuracy of the mathematical model in predicting the P wind -dV PCC characteristics is low at interconnection sites with a small X/R PCC ratio, while the proposed decision tree algorithm overcomes this issue.

Significance of the Proposed Decision Tree-Based Model
The decision tree-based model proposed in this study retains the advantages of similar methods proposed for simplifying WPP sizing and siting, while providing significantly higher accuracy. As the results in the previous section show, the proposed model can accurately predict the P wind -dV PCC characteristic for any X/R PCC ratio and any SCC and SCR values. Consequently, for a potential WPP interconnection site, design engineers can calculate the V PCC profile from the V initial value using (3) and plot the P wind versus V PCC characteristic.
For example, Figures 12 and 13 show the P wind -V PCC characteristic for one of the IG-based scenarios (Scenario 4 in Table 3) and one of the DFIG-based scenarios (Scenario 2 in Table 4), respectively. The V initial value at the PCC of test systems used for the scenarios considered in Figures 12 and 13 is 1 pu and 0.98 pu, respectively.
As mentioned in Section 1, the V PCC profile must be maintained between 95% and 105% of the network nominal voltage to satisfy the steady-state voltage stability requirements defined by the grid codes [3]. Therefore, after plotting the V PCC -P wind characteristics for the potential WPP interconnection sites, designers and planners can determine the best PCC site, where the X/R PCC and SCC values ensure that the grid code requirements concerning the magnitude of the steady-state V PCC are met.
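As a minimal sketch of this screening step, the Python snippet below checks whether the V PCC profile at each candidate site stays within the 95% to 105% band. It assumes that the V PCC profile is obtained by adding dV PCC to V initial in per unit, which is an interpretation of (3) rather than a quotation of it. The site names and curve values are hypothetical placeholders, not results from this study.

```python
V_MIN_PU, V_MAX_PU = 0.95, 1.05  # steady-state band required by the grid codes [3]


def vpcc_profile(v_initial_pu, dv_pcc_pu):
    """ASSUMPTION: V_PCC = V_initial + dV_PCC, with all quantities in per unit."""
    return [v_initial_pu + dv for dv in dv_pcc_pu]


def site_complies(v_initial_pu, dv_pcc_pu):
    """True if every point of the V_PCC profile lies inside the permitted band."""
    profile = vpcc_profile(v_initial_pu, dv_pcc_pu)
    return all(V_MIN_PU <= v <= V_MAX_PU for v in profile)


# Hypothetical candidate sites: (V_initial in pu, predicted dV_PCC curve in pu).
sites = {
    "site_A": (1.00, [0.00, 0.01, 0.03, 0.04]),
    "site_B": (0.98, [0.00, 0.03, 0.06, 0.09]),
}
compliant = [name for name, (v0, dv) in sites.items() if site_complies(v0, dv)]
print(compliant)
```

In this invented example only the first site keeps V PCC inside the band over the whole range of wind power considered.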
In addition, predicting the P wind -dV PCC characteristic with the proposed model makes it possible to estimate the maximum permissible size of the WPP, called P wind _max, that still satisfies the steady-state V PCC requirements defined by the grid codes. For example, from Figures 12 and 13, the P wind _max values at the PCC of the test systems considered are 3.6 MW and 5.2 MW, respectively. The results presented in Figures 12 and 13 confirm that the P wind _max predicted by the proposed decision tree algorithm closely tracks the reference P wind _max values obtained from the simulation models.
Referring to Section 1, most works published in the literature regarding DG siting and sizing rely on simulating whole distribution networks with complicated structures and/or carrying out complex and time-consuming computational tasks. The authors' previous work proposed an efficient mathematical method for simplifying WPP siting and sizing by removing the need to simulate the whole test system and conduct complex calculations [1]. However, the accuracy of that mathematical method was limited. For the IG-based WPPs, the accuracy of the mathematical model in predicting the P wind -V PCC characteristic is reduced as the SCR decreases or as the value of X/R PCC moves toward 2. In addition, for the DFIG-based WPPs, the accuracy of the mathematical model in predicting the P wind -V PCC characteristic is low if the PCC site has a small X/R ratio. The proposed decision tree-based model addresses these issues by simplifying WPP sizing and siting through predicting P wind -V PCC characteristics with noticeably high accuracy. Like the mathematical model developed in [1], the model proposed in this paper requires only two PCC parameters, i.e., X/R PCC and SCC, to predict the P wind -V PCC characteristic. The predicted P wind -V PCC characteristic can be used for optimal WPP sizing and siting.
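The sizing step described in this section can be sketched in Python as follows. The hypothetical helper below takes a sampled P wind -V PCC characteristic and returns the largest P wind reached before the curve first leaves the permitted band, locating the crossing by linear interpolation between samples. The interpolation and the sample values are assumptions made for illustration; they are not the procedure or the data behind the values read from Figures 12 and 13.

```python
def p_wind_max(p_wind_mw, v_pcc_pu, v_min=0.95, v_max=1.05):
    """Largest P_wind before the V_PCC curve first leaves [v_min, v_max].

    Points are assumed to be sorted by increasing P_wind. The band crossing
    between two samples is located by linear interpolation (an assumption).
    The result is capped at the last sampled P_wind if the band is never left.
    """
    if len(p_wind_mw) != len(v_pcc_pu) or not p_wind_mw:
        raise ValueError("need two non-empty sequences of equal length")
    if not (v_min <= v_pcc_pu[0] <= v_max):
        return 0.0  # the band is already violated at the first sample
    for i in range(1, len(p_wind_mw)):
        v_prev, v_now = v_pcc_pu[i - 1], v_pcc_pu[i]
        if v_min <= v_now <= v_max:
            continue
        limit = v_max if v_now > v_max else v_min
        frac = (limit - v_prev) / (v_now - v_prev)
        return p_wind_mw[i - 1] + frac * (p_wind_mw[i] - p_wind_mw[i - 1])
    return p_wind_mw[-1]


# Hypothetical characteristic: P_wind in MW, V_PCC in pu.
p_samples = [0.0, 2.0, 4.0, 6.0]
v_samples = [1.00, 1.02, 1.04, 1.06]
print(round(p_wind_max(p_samples, v_samples), 3))
```

Applying the same helper to each candidate site gives one P wind _max value per site, which can then be compared when selecting the interconnection point.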
Given that the X/R PCC and SCC are the baseline characteristics of a distribution feeder, their values are generally available or can easily be calculated using fundamental power system analysis methods. More importantly, the verification results shown in Section 3 demonstrated that the proposed decision tree-based model eliminates the issues concerned with the limited accuracy of the mathematical model presented in [1] by providing a negligible prediction error for any SCR and X/R PCC ratios.

Conclusions
In this study, a novel decision tree algorithm-based model was developed to predict the voltage behavior in response to wind power injection at a potential feeder for connecting IG and DFIG-based WPPs in a distribution network. For this purpose, the proposed model plots the wind power versus PCC voltage (P wind -V PCC ) characteristic at a given potential interconnection point using the distribution system baseline parameters seen at that point, namely SCC and X/R PCC . Using the plotted P wind -V PCC characteristic, design engineers can carry out an initial predictive assessment of the critical voltage stability criteria, including the V PCC value and P wind _max, to determine the optimal WPP connection site and its maximum permissible size while meeting the grid code requirements. The proposed model simplifies the siting and sizing of WPPs by removing the need to simulate the whole distribution system and to perform complex calculations, which is one of its main advantages over the majority of existing approaches. In addition, the proposed model was benchmarked against one of the latest mathematical methods developed for simplifying WPP sizing and siting to confirm its accuracy in predicting the P wind -V PCC characteristic and the voltage stability criteria.
For the IG-based WPPs, the verification results obtained using the mathematical method demonstrated that the error between the predicted and reference results was large when the SCR is small (SCR ≤ 4). However, the largest error between the reference characteristics and the corresponding curves plotted by the proposed decision tree algorithm was negligible even when the PCC site was weak and its SCR was smaller than 4. In addition, for the IG-based WPPs, the prediction error of the mathematical model in predicting P wind -dV PCC curve characteristics is around 2.5% when the X/R PCC ratio tends to 2, whereas the curves predicted by the proposed decision tree algorithm closely track the reference characteristics when the X/R PCC ratio is around 2.
For the DFIG-based WPPs, the proposed model resolved the low prediction accuracy of the mathematical model when the X/R ratio is small. In this respect, the highest prediction error between the reference results and the data predicted by the mathematical model was around 1% when the X/R PCC ratio is around 0.5, while the proposed model provided an accuracy of almost 100% over the whole range of the X/R PCC ratio.
Generally, the verification studies demonstrated that the proposed decision tree-based model is superior to the previous similar methods. In this study, the test systems used for the verification analysis were based on IEEE standard distribution network models, which are widely used by other researchers for power system analysis. The proposed model was developed considering a number of practical factors to increase its accuracy in actual applications. These include developing the model using the real-world range of X/R PCC and reducing the uncertainty due to load deviations by considering the V initial parameter. In addition, validating the presented model on actual systems is important and will be addressed in future studies to complement this research. Practical verification of the proposed model requires values of V PCC , X/R PCC and SCC obtained from an actual distribution network. However, the authors did not have access to such values. In addition, simulating and modeling real-world distribution systems requires professional engineering software, such as PSS/e, which is not currently available to the authors. Therefore, as one of the extensions to this research, the authors intend to validate the proposed model using an actual case where a wind power plant is being proposed for integration.
Data Availability Statement: Publicly available datasets were analyzed in this study. This data can be found here: https://www.mdpi.com/1996-1073/13/13/3485 (accessed on 18 April 2021).

Conflicts of Interest:
The authors declare no conflict of interest.

IG: Induction generator
I sc: Short-circuit current
DFIG: Double-fed induction generator
dV PCC: Voltage variation concerning the voltage value before wind power plant connection at the point of common coupling
PE: Prediction error
PCC: Point of common coupling
P wind: Power generated by wind power plant
SCC: Short-circuit capacity
SCR: Short-circuit ratio
V initial: Voltage at distribution feeder before the connection of wind power plant
WPP: Wind power plant
X/R PCC: Short-circuit impedance angle ratio seen at the point of common coupling
Z sc: Short-circuit impedance