A Novel Reactive Power Optimization in Distribution Network Based on Typical Scenarios Partitioning and Load Distribution Matching Method

Featured Application: This work is a prospective study of reactive power optimization against the background of big data. It is supported by the Science and Technology Project of State Grid Corporation of China (SGCC) (EPRIPDKJ (2015) 1495) and the Beijing Natural Science Foundation (3172039). The research results will be applied in SGCC demonstration applications in the future.
Abstract: This paper proposes an entropy weight optimum seeking method (EWOSM), based on typical scenarios partitioning and load distribution matching, to solve the reactive power optimization problem in distribution networks against the background of big data. First, the mathematical model of reactive power optimization is presented to analyze the relationship between the data source and the optimization schemes in a distribution network, which illustrates the feasibility of using a large amount of historical data to solve reactive power optimization. Then, the typical scenarios partitioning method and the load distribution matching method are presented; together they rapidly select, from the historical database, loads whose distributions are the same as or similar to that of the load to be optimized, and the corresponding historical optimization schemes are used as the alternatives. Because reactive power optimization is a multi-objective problem, a multi-attribute decision making method based on the entropy weight method is used to select the optimal scheme from the alternatives: the objective weights of the evaluation indexes are determined by the entropy weight method, which transforms the multi-attribute decision making problem into a single-attribute decision making problem. Finally, the proposed method is tested on several systems of different scales and compared with existing methods to demonstrate its validity and superiority.


Introduction
Reactive power optimization is an effective means of ensuring the safe and economic operation of power systems. A reasonable reactive power distribution can reduce network loss [1,2], improve voltage quality [3,4], and maintain the normal operation of the power grid. [...] the reactive power optimization method based on the entropy weight method. Case simulations on different systems and comparisons of the results are provided in Section 4. Section 5 summarizes the work and presents the conclusions.

Relationship between the Data Source and the Optimal Schemes in Reactive Power Optimization
The database of a distribution network has accumulated a large amount of historical data from different systems, such as SCADA (Supervisory Control and Data Acquisition), GIS (Geographic Information System) and EMS (Energy Management System). According to its use, the data can also be divided into operational monitoring data, marketing data and management data. Moreover, only part of the data is well structured; more of it is unstructured or semi-structured. Faced with such a large amount of multi-source, heterogeneous data, it is necessary to consider how to select the effective data and make full use of it to solve the reactive power optimization problem. Therefore, data fusion and data cleaning should first be performed to make all the data well structured [33,34]; then, the mathematical model of reactive power optimization is presented to analyze the relationship between the data source and the optimal schemes, which guides us to the data that have a decisive effect on reactive power optimization.
Reactive power optimization is a multi-objective programming problem. Generally, the network loss, the node voltage offset and the minimum module-eigenvalue of the Jacobian matrix are chosen as the objective functions to evaluate the economy and security of the system. The minimum module-eigenvalue measures the static voltage stability of the system: the smaller it is, the less stable the system is, and it decreases to 0 when voltage collapse occurs. The objective functions are expressed as follows:
$$f = w_1 f_1 + w_2 f_2 + w_3 f_3 \tag{1}$$
$$f_1 = \sum_{i=1}^{n}\sum_{j=1}^{n} V_i V_j G_{ij}\cos\theta_{ij} \tag{2}$$
$$f_2 = \sum_{i=1}^{n}\left(\frac{V_i - V_i^{B}}{\Delta V_i^{\max}}\right)^2 \tag{3}$$
$$f_3 = \min\left|\mathrm{eig}(J)\right| \tag{4}$$
where $f$ is the objective function; $f_1$, $f_2$ and $f_3$ are respectively the network loss, the node voltage offset and the minimum module-eigenvalue; $w_1$, $w_2$ and $w_3$ are the corresponding weights of $f_1$, $f_2$ and $f_3$, with $w_1 + w_2 + w_3 = 1$; $V_i$ and $V_j$ are the voltage amplitudes of node i and node j; $G_{ij}$ and $\theta_{ij}$ are respectively the conductance and the voltage phase angle difference between node i and node j, the sum also covering the terms i = j; n is the number of nodes in the system; $V_i^{B}$ is the ideal voltage amplitude of node i, usually 1.0 p.u.; $\Delta V_i^{\max}$ is the maximum allowable voltage offset of node i, generally ±7% in distribution networks; J is the Jacobian matrix at the converged power flow; and eig(J) denotes the eigenvalues of J.
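To make the three indexes concrete, the following is a minimal NumPy sketch of Equations (2)-(4) for one power-flow solution. It assumes that the voltage magnitudes `V`, angles `theta` (radians), the matrix `G` (taken here as the real part of the bus admittance matrix) and the converged Jacobian `J` are already available; the function name `reactive_opt_indices` is hypothetical.

```python
import numpy as np

def reactive_opt_indices(V, theta, G, J, V_ref=1.0, dV_max=0.07):
    """Return (f1, f2, f3) of Eqs. (2)-(4) for one converged power flow.

    V, theta : bus voltage magnitudes (p.u.) and angles (rad), shape (n,)
    G        : real part of the bus admittance matrix, shape (n, n)  (assumption)
    J        : power-flow Jacobian at the converged operating point
    """
    V = np.asarray(V, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # theta_ij = theta_i - theta_j for every node pair, diagonal i = j included
    theta_ij = theta[:, None] - theta[None, :]
    # Eq. (2): total active loss as the double sum over all node pairs
    f1 = float(np.sum(np.outer(V, V) * G * np.cos(theta_ij)))
    # Eq. (3): voltage offsets normalised by the allowed offset, then squared
    f2 = float(np.sum(((V - V_ref) / dV_max) ** 2))
    # Eq. (4): smallest modulus among the Jacobian eigenvalues
    f3 = float(np.min(np.abs(np.linalg.eigvals(J))))
    return f1, f2, f3
```

For a two-node line with unit series conductance, `f1` reduces to $V_1^2 + V_2^2 - 2V_1V_2\cos\theta_{12}$, the familiar branch-loss expression.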
The constraints of the reactive power optimization model contain power balance constraints, voltage constraints and control variable constraints. The power balance constraints are as follows:
$$P_{Gi} - P_{Li} - V_i\sum_{j=1}^{n} V_j\left(G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}\right) = 0 \tag{5}$$
$$Q_{Gi} - Q_{Li} - V_i\sum_{j=1}^{n} V_j\left(G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}\right) = 0 \tag{6}$$
where $P_{Gi}$ and $Q_{Gi}$ represent the active and reactive power of node i; $P_{Li}$ and $Q_{Li}$ represent the active and reactive load of node i; and $B_{ij}$ represents the susceptance between node i and node j. The voltage and control variable constraints are expressed as follows:
$$V_{i,\min} \le V_i \le V_{i,\max},\quad i = 1,\ldots,n \tag{7}$$
$$Q_{ci_q} = K_{ci_q}\, q_{ci_q},\quad i_q = 1,\ldots,n_q \tag{8}$$
$$T_{i_T} = 1 + K_{Ti_T}\,\Delta T_{i_T},\quad i_T = 1,\ldots,n_T \tag{9}$$
where $V_{i,\min}$ and $V_{i,\max}$ represent the lower and upper voltage bounds of node i, respectively; $n_q$ and $n_T$ are respectively the numbers of capacitor compensation nodes and transformer nodes; if the capacitors are divided into equal groups with single-group capacity $q_{ci_q}$, $Q_{ci_q}$ is the compensation capacity of node $i_q$ with $K_{ci_q}$ groups in operation; $T_{i_T}$ is the transformer ratio with tap position $K_{Ti_T}$ (counted from the nominal tap) and minimum adjustment $\Delta T_{i_T}$. As expressed in Equations (1)-(4), the objective function f can be written as a function of the system active load P, reactive load Q, capacitor compensation capacity $Q_c$ and transformer tap T:
$$f = F(P, Q, Q_c, T) \tag{10}$$
where F(·) expresses a mapping from P, Q, $Q_c$ and T to the optimal objective function. Generally, the reactive power control scheme of a distribution network consists of the compensation capacity $Q_c$ and the transformer tap T, which are determined by the load level and the load distribution over the nodes while the network topology remains unchanged. Moreover, reactive load depends on the existence of active load, so it can be obtained from the power factor. Therefore, Equation (10) can be simplified as follows:
$$(Q_c, T) = g(P_1, P_2, \ldots, P_n; Q_1, Q_2, \ldots, Q_n) \tag{11}$$
where g(·) expresses a mapping from the load distribution to the reactive power control scheme, and $(P_1, \ldots, P_n)$ and $(Q_1, \ldots, Q_n)$ respectively express the active and reactive load distributions over the nodes.
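The power balance and voltage constraints (5)-(7) can be checked for a candidate operating point as sketched below. This is an illustration under stated assumptions, not the paper's code: `G` and `B` are taken as the real and imaginary parts of the bus admittance matrix, and the function name `check_constraints` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def check_constraints(V, theta, G, B, P_G, Q_G, P_L, Q_L, V_min, V_max, tol=1e-6):
    """Return True if Eqs. (5)-(7) hold at one operating point (sketch)."""
    V = np.asarray(V, dtype=float)
    theta_ij = np.subtract.outer(theta, theta)   # theta_i - theta_j
    VV = np.outer(V, V)                          # V_i * V_j
    # Node injections from the polar power-flow equations (sum over j per row i)
    P_inj = np.sum(VV * (G * np.cos(theta_ij) + B * np.sin(theta_ij)), axis=1)
    Q_inj = np.sum(VV * (G * np.sin(theta_ij) - B * np.cos(theta_ij)), axis=1)
    # Eqs. (5)-(6): generation minus load must equal the injection at every node
    balanced = (np.allclose(np.asarray(P_G) - np.asarray(P_L), P_inj, atol=tol)
                and np.allclose(np.asarray(Q_G) - np.asarray(Q_L), Q_inj, atol=tol))
    # Eq. (7): every voltage magnitude must lie inside its bounds
    within_bounds = bool(np.all((V >= V_min) & (V <= V_max)))
    return balanced and within_bounds
```

The discrete constraints (8)-(9) are satisfied by construction when the control variables are only ever chosen from the admissible group counts and tap positions.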
Since the compensation capacity of the capacitors and the transformer taps are both finite discrete variables, the number of reactive power control schemes is limited. Although power load is a continuous variable with some randomness, it exhibits strong regularity and periodicity, and the optimal reactive compensation scheme remains unchanged within a certain range of load fluctuation. Therefore, if the amount of load data and the number of load distribution patterns accumulated in the historical database are large enough, and all historical control schemes are assumed to be optimal, then the control schemes corresponding to the existing load distributions are all contained in the database. Hence, when reactive power optimization is to be carried out for a certain future time, similar network topologies, load distributions and the corresponding control schemes can be selected from the historical database, and the optimal control scheme is among these schemes. This paper addresses how to find the optimal scheme in the historical database.
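The finiteness argument can be made concrete by counting the discrete control space. The sketch below assumes each capacitor node can switch 0 to $K_{\max}$ groups and each transformer has a fixed list of tap positions; the function names and the sizes used in the usage note are hypothetical.

```python
import math
from itertools import product

def count_schemes(cap_max_groups, tap_positions):
    """Number of discrete control schemes: prod(K_max + 1) over capacitors times prod(#taps)."""
    return (math.prod(k + 1 for k in cap_max_groups)
            * math.prod(len(t) for t in tap_positions))

def enumerate_schemes(cap_max_groups, tap_positions):
    """Yield every (capacitor groups..., tap positions...) combination."""
    yield from product(*(range(k + 1) for k in cap_max_groups),
                       *(list(t) for t in tap_positions))
```

For example, two capacitor nodes with at most 2 and 3 groups and one transformer with 17 tap positions (`range(-8, 9)`) give 3 × 4 × 17 = 204 schemes; the count grows multiplicatively but always stays finite.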

The Method of Typical Scenarios Partitioning
In the previous section, it was concluded that the reactive power control scheme of a distribution network is determined by the network topology and the load distribution. How to make full and effective use of these two types of data in the historical database therefore becomes the key to solving the reactive power optimization problem. This paper presents a typical scenarios partitioning method that contains typical topology scenarios partitioning and typical load scenarios partitioning.
First, the typical topology scenarios partitioning method is presented. The topology of a distribution network is changed by the coordinated operation of section switches and tie switches. Although combinations of switch states can produce many network topologies, the historical operating data show that the annual durations of different topologies differ markedly, and only a limited number of typical topologies last for a long time. Therefore, the topologies with long durations in the historical database are selected as the typical topology scenarios, and the other topologies are treated as special scenarios.
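The selection of typical topologies can be sketched as a duration count over historical hours. The paper does not state how "long duration" is quantified, so the per-topology threshold `min_hours`, the hourly-label input format and the function name are assumptions of this sketch.

```python
from collections import Counter

def partition_topologies(hourly_topology_ids, min_hours):
    """Split topologies into typical (long-duration) and special scenarios.

    hourly_topology_ids: one topology label per historical hour (hypothetical format)
    min_hours: assumed duration threshold for a topology to count as typical
    """
    duration = Counter(hourly_topology_ids)
    # Typical scenarios, longest first
    typical = sorted((t for t, h in duration.items() if h >= min_hours),
                     key=lambda t: -duration[t])
    special = sorted(t for t in duration if t not in typical)
    # Share of total running time covered by the typical scenarios
    coverage = sum(duration[t] for t in typical) / len(hourly_topology_ids)
    return typical, special, coverage
```

The returned `coverage` corresponds to the kind of figure reported for the case study, where the typical topologies account for most of the running time.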
Second, the typical load scenarios partitioning method is as follows. A load with strong regularity and smooth variation fluctuates around its typical load curve during a specific operating cycle; when the amount of data is large enough, such a load statistically follows an approximately normal distribution around its typical load curve within a specific scenario, so similar load distributions can be found in the typical load scenario that corresponds to the load distribution to be optimized. Because loads show different characteristics in different seasons and on weekdays and weekends, they can be preliminarily divided into eight typical load scenarios, that is, weekdays and weekends in each of the four seasons. Moreover, different types of load are affected by different factors, so the eight typical load scenarios can be further refined according to weather, holidays and other factors to increase the number of typical scenarios. For example, the typical scenarios of residential load in summer can be further subdivided based on temperature and humidity; national legal holidays also have a great influence on commercial load, which can therefore be subdivided into Spring Festival, May Day, National Day and other scenarios. Refining the typical load scenarios not only improves the accuracy of load distribution matching but also reduces the computing time of the method. The typical daily load curves of the eight scenarios in a certain region are shown in Figure 1.
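A day can be mapped to one of the eight basic load scenarios with the standard library alone. The season boundaries below (meteorological seasons) are an assumption, since the paper does not state them, and the optional `holidays` dictionary only illustrates the refinement step; all names are hypothetical.

```python
from datetime import date

# Assumed meteorological seasons; the paper does not state its season boundaries.
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}

def load_scenario(day, holidays=None):
    """Label a calendar day with one of the eight basic typical load scenarios.

    `holidays` is an optional {date: name} dict used to split off refined holiday scenarios.
    """
    if holidays and day in holidays:
        return holidays[day]
    day_type = "weekend" if day.weekday() >= 5 else "weekday"   # Saturday=5, Sunday=6
    return f"{SEASON_OF_MONTH[day.month]}-{day_type}"
```

For instance, `load_scenario(date(2015, 7, 15))` gives `"summer-weekday"`; further refinements (temperature bands, specific festivals) would simply add more labels.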

The Method of Load Distribution Matching
This paper proposes a load distribution matching method that can quickly find, in massive historical data, the loads whose distributions are similar to that of the load to be optimized, together with their corresponding control schemes. When reactive power optimization is to be performed for the power flow at time t on a certain day, the first step is to select the samples that have both the same topology scenario and the same load scenario as the load at time t; these samples form a set. Second, the load distribution matching method is applied to this sample set, which yields a smaller sample set that contains the optimal scheme.
Assume that there are $N_0$ days in the historical database with the same topology scenario and load scenario as time t. The load distribution at time t on each of these $N_0$ days is then matched in turn against the moment to be optimized, and the loads whose distributions are similar to that at time t are selected. Suppose that $N_e$ similar load distributions are selected; the corresponding historical optimal schemes then form the alternative set.
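Building the $N_0$-day sample set amounts to intersecting two sets of days. The sketch assumes two hypothetical dictionaries that map each historical day to the topology scenario in force at time t and to its load scenario label; the function name is also hypothetical.

```python
def candidate_days(topology_of_day, load_scenario_of_day, target_topology, target_load_scenario):
    """Return the N0 historical days that share both scenarios with the time to be optimized."""
    same_topology = {d for d, s in topology_of_day.items() if s == target_topology}
    same_load = {d for d, s in load_scenario_of_day.items() if s == target_load_scenario}
    # Intersection of the typical topology scenario and the typical load scenario
    return sorted(same_topology & same_load)
```

Using sets keeps the intersection linear in the number of stored days, which matters when several years of history are scanned.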
A matching rule is required for load distribution matching: if a load distribution satisfies the rule, it is judged to be similar to the moment to be optimized, and its corresponding historical optimal scheme can be an alternative. According to the Pauta criterion (the 3σ rule) in statistics, when the amount of statistical data is large enough, the probability that a random variable following the normal distribution $N(\mu, \sigma^2)$ falls outside $\mu \pm 3\sigma$ is 0.27%, which is regarded as a small-probability event that is practically impossible.
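The 0.27% figure follows from the normal tail probability $P(|X-\mu| > k\sigma) = \mathrm{erfc}(k/\sqrt{2})$. A short check with the standard library (the function name is only illustrative):

```python
import math

def prob_outside(k):
    """P(|X - mu| > k*sigma) for a normal random variable X ~ N(mu, sigma^2)."""
    return math.erfc(k / math.sqrt(2))

print(f"{prob_outside(3):.4%}")   # prints 0.2700%
```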
The Pauta criterion is applied to load distribution matching in this paper. In a system with n nodes, two load intervals $(P - 3\sigma_P, P + 3\sigma_P)$ and $(Q - 3\sigma_Q, Q + 3\sigma_Q)$ are generated, whose centers are the loads to be optimized at each node, $P = (P_1, \ldots, P_n)$ and $Q = (Q_1, \ldots, Q_n)$, and whose variances are $\sigma_P^2$ and $\sigma_Q^2$. The standard deviations $\sigma_P = (\sigma_{P1}, \ldots, \sigma_{Pn})$ and $\sigma_Q = (\sigma_{Q1}, \ldots, \sigma_{Qn})$ are empirical values that can be calculated from a large amount of historical load data in the different typical scenarios. Taking active load as an example to illustrate the matching process, the active load range is divided into three regions by the interval $(P - 3\sigma_P, P + 3\sigma_P)$, as shown in Figure 2. The load distributions in the typical scenario are read from the historical database in turn. If a load distribution lies completely within the interval $(P - 3\sigma_P, P + 3\sigma_P)$, that is, within Region 1 in Figure 2, the corresponding historical optimal scheme is taken as an alternative; otherwise, the corresponding historical scheme cannot be used.
After the active load distribution matching is finished and $N_m$ historical loads are selected, reactive load distribution matching is carried out on these $N_m$ historical loads.
In summary, when reactive power optimization is carried out for the load at time t, the load distribution matching process is as shown in Figure 3, and the final result is a set of alternatives composed of $N_e$ historical reactive power control schemes.
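The two-stage matching rule can be sketched with vectorised comparisons. The array layout (one historical load distribution per row) and the function name are assumptions; the open interval of the text is reproduced with a strict inequality, and a sample is kept only if every node lies inside the band.

```python
import numpy as np

def match_load_distributions(P_hist, Q_hist, P, Q, sigma_P, sigma_Q, k=3.0):
    """Indices of historical samples whose whole load vector lies inside the k-sigma band.

    P_hist, Q_hist: arrays of shape (N0, n), one historical load distribution per row
    P, Q: load to be optimized at each of the n nodes; sigma_P, sigma_Q: empirical std. devs.
    """
    P_hist, Q_hist = np.asarray(P_hist, float), np.asarray(Q_hist, float)
    # Stage 1: active load strictly inside (P - k*sigma_P, P + k*sigma_P) at every node
    in_P = np.all(np.abs(P_hist - P) < k * np.asarray(sigma_P), axis=1)
    idx_m = np.flatnonzero(in_P)                      # the N_m active-load matches
    # Stage 2: reactive-load matching only on those N_m samples
    in_Q = np.all(np.abs(Q_hist[idx_m] - Q) < k * np.asarray(sigma_Q), axis=1)
    return idx_m[in_Q]                                # the N_e final matches
```

The returned indices point to the historical control schemes that form the alternative set; running the active-load stage first shrinks the data the reactive-load stage has to examine.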
Because the proposed method depends on historical data, it is suitable for systems whose load changes smoothly and which have accumulated enough historical data. In a system with frequently changing load or little historical data, there may be no load in the interval $(P - 3\sigma_P, P + 3\sigma_P)$ to be matched, or only a few matched load samples. There are two solutions in this case. The first is offline simulation: a number of load distributions are generated randomly in the interval $(P - 3\sigma_P, P + 3\sigma_P)$, and a conventional method is used to optimize the reactive power of each load distribution; the historical database is extended in this way, and the proposed method can then be used. The second is to use a conventional method directly for reactive power optimization and save the optimal results into the database; once enough load data have accumulated in the database, the proposed method can be used.
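The first solution (offline simulation) starts by drawing random load distributions inside the interval. The paper does not specify the sampling distribution, so uniform sampling with NumPy's `Generator.uniform` is an assumption here, and the function name is hypothetical; each sample would then be optimized by a conventional method (not shown) before being stored.

```python
import numpy as np

def sample_load_distributions(P, sigma_P, num, k=3.0, seed=None):
    """Draw `num` random active-load distributions inside (P - k*sigma_P, P + k*sigma_P).

    Uniform sampling is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    P, sigma_P = np.asarray(P, float), np.asarray(sigma_P, float)
    low, high = P - k * sigma_P, P + k * sigma_P
    # Bounds of shape (n,) broadcast to an array of shape (num, n)
    return rng.uniform(low, high, size=(num, P.size))
```

Reactive load for each sample could then be derived from the power factor, as noted in the discussion of Equation (11).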

Introduction to Multi-Attribute Decision Making Problem Based on Entropy Weight Method
When selecting the optimal control scheme from the alternatives, several reactive power optimization evaluation indexes, including network loss, voltage offset and power factor, should be considered comprehensively. This is therefore a multi-attribute decision making problem, also called multiple objective decision making with finite alternatives: the optimal scheme is selected from the alternatives, or the alternatives are ranked, with multiple attributes taken into account. The entropy weight method, based on information entropy, can solve this problem: the objective weights of the attribute indexes are determined from the information entropy calculated over the alternatives, and the best scheme is then selected.
The concept of entropy originated in thermodynamics, where it was initially a physical quantity reflecting the direction of natural thermal processes. Information entropy was later proposed as related research developed, opening up a new way for quantitative decision methods. In information theory, the amount and quality of information obtained in decision-making are important factors that determine the accuracy and reliability of the final decision, and entropy is a good measure of the useful information provided by data.
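In a multi-attribute decision making problem with M evaluation indexes and N alternatives, the evaluation index matrix is expressed as follows:
$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ y_{21} & y_{22} & \cdots & y_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ y_{M1} & y_{M2} & \cdots & y_{MN} \end{bmatrix} \tag{12}$$
where $y_{ij}$ is the value of the i-th evaluation index for the j-th alternative. Matrix $R = \{r_{ij}\}$ is obtained by standardizing matrix Y. The larger an element of R is, the better the evaluation result, so all evaluation indexes should be standardized according to this rule. An evaluation index for which a smaller value is better is standardized with Equation (13), and one for which a larger value is better with Equation (14):
$$r_{ij} = \frac{\max_j\{y_{ij}\} - y_{ij}}{\max_j\{y_{ij}\} - \min_j\{y_{ij}\}} \tag{13}$$
$$r_{ij} = \frac{y_{ij} - \min_j\{y_{ij}\}}{\max_j\{y_{ij}\} - \min_j\{y_{ij}\}} \tag{14}$$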
where $\max_j\{y_{ij}\}$ and $\min_j\{y_{ij}\}$ are respectively the maximum and minimum values of the i-th row of matrix Y. The maximum value of the elements of R is 1 and the minimum is 0, that is, $0 \le r_{ij} \le 1$. The proportion $p_{ij}$ of the j-th alternative under the i-th index is calculated according to Equation (15), where $r_{ij}$ is the i-th evaluation index of the j-th alternative:
$$p_{ij} = \frac{r_{ij}}{\sum_{j=1}^{N} r_{ij}} \tag{15}$$
The entropy of the i-th evaluation index is expressed as follows [35]:
$$H_i = -k\sum_{j=1}^{N} p_{ij}\ln p_{ij} \tag{16}$$
where $p_{ij}\ln p_{ij} = 0$ if $p_{ij} = 0$, and $k = 1/\ln N$ so that $0 \le H_i \le 1$.
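Entropy is a measure of uncertainty: the smaller the entropy, the more useful the information carried by the corresponding evaluation index. Therefore, the entropy weight $w_i$ of the i-th index is:
$$w_i = \frac{1 - H_i}{M - \sum_{i=1}^{M} H_i} \tag{17}$$
so that
$$0 \le w_i \le 1,\qquad \sum_{i=1}^{M} w_i = 1 \tag{18}$$
After the entropy weight of each evaluation index is determined, the multi-attribute decision making problem is transformed into a single-attribute decision making problem, and the optimal scheme can be selected from the alternatives according to the total evaluation value of each alternative:
$$Z_j = \sum_{i=1}^{M} w_i r_{ij},\quad j = 1,\ldots,N \tag{19}$$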
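The entropy weight computation of Equations (13)-(17) and the weighted total score can be sketched in NumPy as follows. The function name is hypothetical, and the handling of an index that is constant across all alternatives (given uniform proportions, hence entropy 1 and weight 0) is an assumption of this sketch that avoids a 0/0 division.

```python
import numpy as np

def entropy_weights(Y, benefit):
    """Entropy weight method on an M x N index matrix Y (rows: indexes, columns: alternatives).

    benefit[i] is True if a larger value of index i is better (Eq. (14)) and False if a
    smaller value is better (Eq. (13)). Requires N >= 2. Returns the weights w (Eq. (17))
    and the total evaluation value of every alternative (Eq. (19)).
    """
    Y = np.asarray(Y, dtype=float)
    N = Y.shape[1]
    y_max = Y.max(axis=1, keepdims=True)
    y_min = Y.min(axis=1, keepdims=True)
    span = np.where(y_max > y_min, y_max - y_min, 1.0)     # avoid 0/0 for a constant row
    benefit = np.asarray(benefit, dtype=bool)[:, None]
    R = np.where(benefit, (Y - y_min) / span, (y_max - Y) / span)   # Eqs. (13)-(14)
    # Eq. (15); a row with all r_ij = 0 carries no information, so it is given uniform
    # proportions (entropy 1, weight 0). This convention is an assumption of the sketch.
    row_sum = R.sum(axis=1, keepdims=True)
    p = np.where(row_sum > 0, R / np.where(row_sum > 0, row_sum, 1.0), 1.0 / N)
    # Eq. (16) with the convention p ln p = 0 for p = 0
    plogp = np.zeros_like(p)
    nz = p > 0
    plogp[nz] = p[nz] * np.log(p[nz])
    H = -plogp.sum(axis=1) / np.log(N)
    w = (1.0 - H) / np.sum(1.0 - H)                        # Eq. (17)
    Z = w @ R                                              # Eq. (19)
    return w, Z
```

For the three reactive power indexes one would pass `benefit=[False, False, True]`, since only the minimum module-eigenvalue is "larger is better"; the alternative with the largest entry of `Z` is selected.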

Specific Steps of EWOSM
The proposed EWOSM consists of two procedures. First, the typical scenarios partitioning and load distribution matching methods are used to select $N_e$ alternative schemes from the historical database. Second, the optimal scheme is selected from the alternatives with the entropy weight method, in the following steps.
Step 1: Based on the power load at the time to be optimized, calculate the power flow for each of the $N_e$ alternatives and check the results against Equations (5)-(9) to ensure that the constraints are satisfied; any alternative control scheme that violates the constraints is removed. Three indexes, namely the network loss, the node voltage offset and the minimum module-eigenvalue of the Jacobian matrix, denoted $y_1$, $y_2$ and $y_3$, are used to evaluate the control effect of each alternative.
Step 2: This is a multi-attribute decision making problem with 3 evaluation indexes and $N_e$ alternatives; the evaluation index matrix is formed as in Equation (12).
Step 3: Because smaller network loss and node voltage offset mean better reactive power control, these two indexes are standardized according to Equation (13); conversely, a larger minimum module-eigenvalue means a more stable system, so this index is standardized according to Equation (14). Then calculate the proportion $p_{ij}$ of each alternative j under index i with Equation (15).
Step 4: Calculate the entropy of each evaluation index and the corresponding entropy weight according to Equations (16) and (17), respectively.
Step 5: Substitute the objective weights $w_1$, $w_2$ and $w_3$ of the three evaluation indexes into Equation (19) to calculate the total evaluation value of each alternative. The scheme with the highest evaluation value is the optimal control scheme.
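Steps 1-5 can be strung together as below, assuming the power flows have already been run and each alternative's three indexes and constraint-check result are available as plain sequences (hypothetical inputs; the function name is also hypothetical). The entropy calculation is repeated here so that the block is self-contained.

```python
import numpy as np

def select_scheme(schemes, loss, offset, min_eig, feasible):
    """Steps 1-5 of EWOSM on precomputed power-flow results (sketch).

    All arguments are sequences over the N_e alternatives; feasible[j] records whether
    alternative j passed the constraint check of Eqs. (5)-(9).
    """
    keep = np.flatnonzero(feasible)                       # Step 1: drop infeasible schemes
    if keep.size == 0:
        raise ValueError("no feasible alternative")
    if keep.size == 1:
        return schemes[keep[0]], 1.0
    Y = np.vstack([loss, offset, min_eig]).astype(float)[:, keep]   # Step 2: Eq. (12)
    lo, hi = Y.min(axis=1, keepdims=True), Y.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Step 3: loss and offset are "smaller is better" (Eq. (13)),
    # the minimum module-eigenvalue is "larger is better" (Eq. (14))
    R = np.vstack([(hi[:2] - Y[:2]) / span[:2], (Y[2:] - lo[2:]) / span[2:]])
    n = keep.size
    s = R.sum(axis=1, keepdims=True)
    p = np.where(s > 0, R / np.where(s > 0, s, 1.0), 1.0 / n)       # Eq. (15)
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -np.nansum(p * np.log(p), axis=1) / np.log(n)        # Step 4: Eq. (16)
    d = 1.0 - H
    w = d / d.sum() if d.sum() > 0 else np.full(3, 1.0 / 3)      # Step 4: Eq. (17)
    score = w @ R                                                # Step 5: Eq. (19)
    best = int(np.argmax(score))
    return schemes[keep[best]], float(score[best])
```

Note that an infeasible alternative is discarded in Step 1 even if it would score best on every index, which is exactly why the constraint check precedes the ranking.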
In summary, the reactive power optimization process in a distribution network based on the entropy weight method is shown in Figure 4.
Figure 4. Flow chart of reactive power optimization in distribution network based on entropy-weight method.
The proposed EWOSM is based on the analysis and comparison of a large number of historical reactive power optimization schemes. Under ideal conditions two assumptions hold: the historical database is large enough to contain all load distributions, and the corresponding historical control schemes are optimal; the result of EWOSM is then the optimal scheme. In practical engineering applications, however, these two assumptions hardly hold, so the result cannot be guaranteed to be optimal and is better regarded as a suboptimal feasible solution. Therefore, hybrid methods combining EWOSM with existing methods, such as the Genetic Algorithm (GA), neighborhood search and Sequential Quadratic Programming (SQP), are proposed to ensure optimality and practicability. For example, a neighborhood search algorithm can be used to find the best solution in the neighborhood of the EWOSM result; alternatively, the EWOSM result can be taken as the initial solution of an existing optimization algorithm to speed up convergence and improve efficiency.
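One way to realise the neighborhood refinement is a best-improvement local search over the integer control vector, started from the EWOSM result. In this sketch the neighborhood (changing one capacitor group count or tap position by ±1), the function name and the user-supplied `objective` (for instance a power flow followed by a weighted loss and voltage-offset evaluation) are all assumptions; the search stops at a local optimum with respect to that neighborhood.

```python
import itertools

def neighborhood_search(x0, lower, upper, objective, max_iter=100):
    """Best-improvement local search around an initial discrete scheme (e.g. the EWOSM result).

    x0, lower, upper: integer control vectors (capacitor groups and tap positions);
    objective: hypothetical user-supplied callable returning the value to minimise.
    Neighbours differ from the current scheme by +/-1 in exactly one variable.
    """
    x, fx = tuple(x0), objective(tuple(x0))
    for _ in range(max_iter):
        best, f_best = x, fx
        for i, step in itertools.product(range(len(x)), (-1, 1)):
            y = list(x)
            y[i] += step
            if lower[i] <= y[i] <= upper[i]:          # stay inside the admissible range
                fy = objective(tuple(y))
                if fy < f_best:
                    best, f_best = tuple(y), fy
        if best == x:                                 # no improving neighbour: stop
            break
        x, fx = best, f_best
    return x, fx
```

Because the starting point already comes from a similar historical case, only a few iterations (and thus few power-flow evaluations) would typically be expected.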

Case Descriptions of the 173 Nodes System
To demonstrate its effectiveness, the proposed EWOSM was tested on a practical 173-node distribution system. The head of the system is the slack bus, located on the low-voltage side of a 220 kV/110 kV step-down substation. Two 110 kV lines connect the slack bus to two 110 kV/10 kV step-down substations, and the substations are connected by five 10 kV medium-voltage lines, named lines A, B, C, D and E. The single-line diagram of the test system is shown in Figure 5. In Figure 5, T1 and T2 are the main transformers of the substations; both are SFZ11-12500/110 models with taps ranging from 0.9 p.u. to 1.1 p.u. in 17 steps. Loads are connected to the medium-voltage distribution network through distribution transformers; taking T3-T7 as examples, their models are S11-400/10, S11-630/10, S11-630/10, S11-630/10 and S11-800/10 respectively, and their taps are fixed at 1.0 p.u. Shunt capacitors are installed at thirteen nodes in the system: Qc1 and Qc2 are the centralized compensations on the low-voltage side of the substations, and Qc3-Qc13 are feeder compensations.
The capacity of capacitors is equal grouped and each group is 50 kvar. The configuration information of capacitors is listed in Table 1. The proposed method EWOSM was coded in MATLAB R2009a (MathWorks, Natick, MA, USA) and run on an Intel i5-3230M 2.6 GHz notebook with 4 GB RAM (Dell, Round Rock, TX, USA). The historical load data in the test system is from the actual historical database from year 2011 to 2015, and the corresponding reactive power control schemes are also read from the historical database. The missing part of control schemes are calculated by Sequential Quadratic Programming (SQP) based on literature [36], and then the historical database is completed.
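The control variables described above take discrete values. The following is a minimal sketch, not the authors' code, assuming evenly spaced tap positions (the text gives only the range and the number of positions): 17 positions between 0.9 and 1.1 p.u. then correspond to a step of 0.2/16 = 0.0125 p.u. (1.25%), and a bank of n equal 50 kvar groups can output 0, 50, ..., 50n kvar. The function names and the group count used in the usage lines are hypothetical.

```python
# Hypothetical helpers (not from the paper) that enumerate the discrete
# settings of the control variables of the 173-node system.

def tap_positions(lo=0.9, hi=1.1, n_positions=17):
    """Evenly spaced OLTC tap ratios in p.u., from lo to hi inclusive.

    Even spacing is an assumption: 17 positions over 0.9-1.1 p.u.
    gives a step of (1.1 - 0.9) / 16 = 0.0125 p.u., i.e. 1.25%.
    """
    step = (hi - lo) / (n_positions - 1)
    return [round(lo + k * step, 4) for k in range(n_positions)]


def capacitor_settings(n_groups, group_kvar=50):
    """Possible outputs (kvar) of a bank switched in n_groups equal groups."""
    return [k * group_kvar for k in range(n_groups + 1)]


taps = tap_positions()
print(len(taps), taps[0], taps[8], taps[-1])  # 17 0.9 1.0 1.1
print(capacitor_settings(4))                  # [0, 50, 100, 150, 200]
```

The same enumeration applies to the OLTC range 1 ± 8 × 1.25% used for the smaller test systems, since that range also spans 0.9 to 1.1 p.u. in 17 positions.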
In Figure 5, the tie lines are represented by red dashed lines, and S1-S6 are the tie switches on the corresponding tie lines. The network topology is changed by the coordinated operation of the section switches and tie switches.

The Typical Scenario Partitioning of the 173-Node System
There are eight typical topology scenarios in the historical database for 2011 to 2015, and their combined duration accounts for 94.85% of the total running time of the system; the remaining time is accounted for by other topology scenarios. Details of the typical topology scenarios are given in Table 2. According to season and weekday/weekend information, there are eight typical load scenarios, detailed in Table 3. The total load forecast curve for one day is shown in Figure 6, and three load levels, at 2 o'clock, 10 o'clock and 17 o'clock, are selected for reactive power optimization.
Appl. Sci. 2017, 7, 787 12 of 19
The high load level (17 o'clock) is taken as an example to illustrate the calculation process. First, according to Table 2, the topology at that time belongs to the first typical topology scenario, which lasted 10,512 h (i.e., 438 days) from 2011 to 2015. Second, according to the date, the day belongs to the typical summer weekday scenario, which lasted 328 days over the same period. Taking the intersection of the two typical scenarios leaves 217 days in the database whose topology scenario and load scenario both match those of the time to be optimized.
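The scenario filtering above is a set intersection over historical days. A minimal sketch follows, assuming each day is labelled with the topology scenario at the hour being optimized and with one load scenario; the function names and the toy records are hypothetical, not the 2011 to 2015 database.

```python
from datetime import date

HOURS_PER_DAY = 24


def hours_to_days(hours):
    """Convert a scenario duration in hours to days."""
    return hours / HOURS_PER_DAY


def candidate_days(topology_of_day, load_scenario_of_day, topo_id, load_id):
    """Return, in date order, the days whose topology scenario (at the hour
    being optimized) is topo_id AND whose load scenario is load_id."""
    topo_days = {d for d, t in topology_of_day.items() if t == topo_id}
    load_days = {d for d, s in load_scenario_of_day.items() if s == load_id}
    return sorted(topo_days & load_days)


# Toy records with made-up labels.
topology = {date(2015, 7, 1): 1, date(2015, 7, 2): 1,
            date(2015, 7, 4): 2, date(2015, 1, 5): 1}
load = {date(2015, 7, 1): "summer weekday", date(2015, 7, 2): "summer weekday",
        date(2015, 7, 4): "summer weekend", date(2015, 1, 5): "winter weekday"}

print(hours_to_days(10_512))  # 438.0
print(candidate_days(topology, load, 1, "summer weekday"))
# [datetime.date(2015, 7, 1), datetime.date(2015, 7, 2)]
```

Using sets keeps the intersection cheap even for five years of daily records; only the surviving days go on to the node-by-node load matching.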

The Load Distribution Matching of the 173-Node System
Then, the load distributions at 17 o'clock on the 217 days are matched using the proposed load distribution matching method, with σ set to 1% of the load on the corresponding node. As a result, 59 alternatives are selected, and the matching results are shown in Figure 7.
As shown in Figure 7, the black symbols * represent the load distribution at 17 o'clock; the red symbols ∨ and magenta symbols ∧ indicate the upper and lower limits of the load distribution matching, that is, the ±1% deviation of the load on each node at 17 o'clock; and the blue points represent the 59 matched load distributions. The enlarged inset at the top-left corner of Figure 7 shows clearly that all 59 matched load distributions lie within ±1% of the load on each node at 17 o'clock.
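The matching criterion illustrated in Figure 7 can be sketched as a per-node tolerance test: a historical distribution is accepted only if the load on every node lies within ±σ of the target load, with σ equal to 1% of that node's load. This is a minimal sketch under that reading; the paper's own matching procedure may include further steps, and the function name and toy data are hypothetical.

```python
def match_load_distributions(target, history, rel_tol=0.01):
    """Return the keys of historical load distributions whose load on EVERY
    node i lies within target[i] +/- sigma_i, with sigma_i = rel_tol*|target[i]|."""
    sigma = [rel_tol * abs(p) for p in target]
    matches = []
    for key, loads in history.items():
        if len(loads) != len(target):
            continue  # different node set: cannot be compared node by node
        if all(abs(h - p) <= s for h, p, s in zip(loads, target, sigma)):
            matches.append(key)
    return matches


# Toy 3-node example; values are illustrative only.
target = [100.0, 250.0, 80.0]          # load distribution at the hour to optimize
history = {
    "day1": [100.5, 249.0, 80.4],      # inside the +/-1% band on all nodes
    "day2": [103.0, 250.0, 80.0],      # first node is 3% high -> rejected
    "day3": [99.2, 252.4, 79.3],       # inside the band on all nodes
}
print(match_load_distributions(target, history))  # ['day1', 'day3']
```

Because every node must be checked for every candidate day, the cost of this step grows with the number of nodes, which is consistent with the scaling behaviour reported for EWOSM later in the section.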

The Entropy Weight Method of the 173-Node System
Power flow is calculated for each of the 59 alternatives with the load at 17 o'clock, and the corresponding network loss, node voltage offset and minimum module-eigenvalue, calculated according to Equations (2)-(4), are shown in Figure 8.
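The three indexes of the alternatives form a decision matrix to which the entropy weight method is applied. The sketch below uses a common textbook form (min-max normalization, normalized entropy per index, weights proportional to 1 - entropy, then a weighted sum that turns the multi-attribute problem into a single score); the paper's own normalization is defined in its earlier equations and may differ. Treating network loss and voltage offset as smaller-is-better and the minimum module-eigenvalue as larger-is-better is an assumption, and the function names and toy data are hypothetical; the sketch does not reproduce the reported weights of 0.5402, 0.2106 and 0.2492.

```python
import math


def entropy_weights(matrix, benefit):
    """Entropy weight method (common textbook form).

    matrix[i][j]: value of index j for alternative i (m alternatives, m >= 2)
    benefit[j]:   True if larger is better, False if smaller is better
    Returns (weights, normalized_matrix)."""
    m, n = len(matrix), len(matrix[0])
    # 1) min-max normalization so that a larger r always means "better"
    r = [[0.0] * n for _ in range(m)]
    for j in range(n):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        for i in range(m):
            if hi == lo:
                r[i][j] = 1.0  # constant index: carries no information
            elif benefit[j]:
                r[i][j] = (matrix[i][j] - lo) / (hi - lo)
            else:
                r[i][j] = (hi - matrix[i][j]) / (hi - lo)
    # 2) normalized entropy of each index, with 0*ln(0) taken as 0
    divergence = []
    for j in range(n):
        total = sum(r[i][j] for i in range(m))  # > 0: the best alternative has r = 1
        e = -sum((r[i][j] / total) * math.log(r[i][j] / total)
                 for i in range(m) if r[i][j] > 0) / math.log(m)
        divergence.append(1.0 - e)
    # 3) objective weights proportional to the degree of divergence
    s = sum(divergence)
    if s == 0:
        return [1.0 / n] * n, r
    return [d / s for d in divergence], r


def best_alternative(matrix, benefit):
    """Collapse the indexes into one weighted score and return the 0-based
    index of the best alternative together with the weights."""
    w, r = entropy_weights(matrix, benefit)
    scores = [sum(wj * rij for wj, rij in zip(w, row)) for row in r]
    return max(range(len(scores)), key=scores.__getitem__), w


# Toy matrix: [network loss, node voltage offset, minimum module-eigenvalue]
alternatives = [
    [120.0, 0.030, 0.50],
    [110.0, 0.025, 0.55],  # best on all three indexes
    [118.0, 0.034, 0.47],
]
best, weights = best_alternative(alternatives, benefit=[False, False, True])
print(best)                    # 1
print(round(sum(weights), 6))  # 1.0
```

Indexes whose values vary more across the alternatives have lower entropy and therefore receive larger weights, which is why the weights are called objective: they come from the data rather than from operator preference.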

Figure 8. Network loss, node voltage offset and minimum module-eigenvalue of the 59 alternatives (x-axis: the number of alternatives).
As shown in Figure 8, some of the 59 alternatives have similar or identical network loss or node voltage offset, because some of the alternatives share the same or similar control schemes. The weights of network loss, node voltage offset and minimum module-eigenvalue are 0.5402, 0.2106 and 0.2492 respectively, and the optimal scheme is the thirty-second alternative. The specific optimal scheme and results are shown in Table 4.
Table 4. Optimal results of GA, SQP and EWOSM under three load levels.
Qc1/kvar   950   400   400   1000  300   350   950   250   200
Qc2/kvar   1000  800   850   900   550   550   950   350   350
Qc3/kvar   600   800   850   800   700   650   450   550   500
Qc4/kvar   900   550   550   450   450   400   500   350   300
Qc5/kvar   1050  950   1000  1150  800   800   600   600   600
Qc6/kvar   600   450   450   250   400   350   350   300   300
Qc7/kvar   600   500   500   450   400   400   300   300   300
Qc8/kvar   500   500   500   450   450   450   350   350   300
Qc9/kvar   1200  800   800   700   650   600   450   500   500
Qc10/kvar  1900  1150  1200  1150  950   950   650   700   700
Qc11/kvar  650   1150  1150  850   950   900   850   700   700
Qc12/kvar  1900  1550  1650  1450  1350  1300  900   1050  1000
Qc13/kvar  800   750   800   550   650   650   450   500   450
T1         1 + 6 × …
To verify its validity and effectiveness, the proposed EWOSM is compared with conventional reactive power optimization methods. Genetic Algorithm (GA) and SQP are chosen as representatives of existing artificial intelligence algorithms and traditional mathematical methods, respectively. The optimal results of the three methods under the three load levels are shown in Table 4. The GA, based on [37,38], is used as a conventional reactive power optimization solver with the following parameters: crossover rate 0.8, mutation rate 0.2, population size 20 and maximum number of iterations 100; the algorithm also stops if there is no improvement for more than 50 consecutive generations. The SQP method is based on [36].
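The GA settings quoted above include two stopping rules: an iteration cap and a stall limit. The sketch below captures them as a configuration object plus a termination check, assuming "no evolution" means that the best objective value (minimized) has not improved; the field and function names are hypothetical and the GA operators themselves are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAConfig:
    """GA settings quoted in the text (field names are hypothetical)."""
    crossover_rate: float = 0.8
    mutation_rate: float = 0.2
    population_size: int = 20
    max_iterations: int = 100
    stall_limit: int = 50  # stop after MORE than this many non-improving generations


def stopping_generation(cfg, best_per_generation):
    """Return the 1-based generation at which the GA stops, given the best
    objective value (to be minimized) found in each generation."""
    best = float("inf")
    stall = 0
    for gen, value in enumerate(best_per_generation, start=1):
        if value < best:
            best, stall = value, 0  # improvement: reset the stall counter
        else:
            stall += 1
        if gen >= cfg.max_iterations or stall > cfg.stall_limit:
            return gen
    return len(best_per_generation)


cfg = GAConfig()
improving_then_flat = [100 - k for k in range(10)] + [91] * 200
always_improving = [1000 - k for k in range(300)]
print(stopping_generation(cfg, improving_then_flat))  # 61
print(stopping_generation(cfg, always_improving))     # 100
```

In the first usage line the best value stops improving after generation 10, so the 51st non-improving generation (generation 61) triggers the stall rule before the 100-iteration cap is reached.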
It can be seen from Table 4 that, under all three load levels, the network losses obtained by GA, SQP and EWOSM differ by less than 1% and the minimum module-eigenvalues differ by less than 1.5%, which demonstrates the validity and effectiveness of the proposed EWOSM. In addition, the three methods are applied to test systems of different scales, including the IEEE 33-node standard distribution system [39], the PG&E 69-node system [40] and a practical 292-node distribution system; owing to space limitations, these results are listed in Appendix A (Table A1), and they further confirm the effectiveness of EWOSM.

The Influence of System Scale and the Number of Control Variables on the Computation Time
To verify the superiority of the proposed method in computing speed, several test systems are simulated with the historical data for 2011 to 2015 to analyze the influence of system scale and of the number of control variables on computation time. To isolate the impact of these two factors, the network topologies of the following test systems are assumed to remain unchanged.

Analysis of the Influence of System Scale on the Computation Time
First, the impact of system scale on computation time is tested on three systems: the IEEE 33-node standard distribution system, the PG&E 69-node system and a practical 292-node distribution system. Each test system contains one on-load tap changer (OLTC) and three shunt capacitor compensation nodes; the configuration of the capacitors is shown in Table 5. The capacity of a single capacitor group is 50 kvar, and the OLTC tap range is 1 ± 8 × 1.25%. The proposed EWOSM, GA and SQP are each used for reactive power optimization on the three test systems, and their computation times are compared in Figure 9.
As shown in Figure 9, when the number of control variables is the same, the computation time of EWOSM and GA increases with system scale, whereas that of SQP has little relevance to system scale. For EWOSM, the reason is that the loads of all nodes must be matched in the load distribution matching step; its computation time is therefore positively correlated with the number of system nodes, but it is still much shorter than that of GA.

Analysis of the Influence of the Number of Control Variables on the Computation Time
Next, the influence of the number of control variables on computation time is tested on the 173-node system described in Section 4.1. The system contains five 10 kV medium-voltage lines and fifteen control variables. The number of lines taking part in reactive power optimization is adjusted, which changes the number of control variables accordingly. The proposed EWOSM, GA and SQP are each used to calculate the optimal results. The setting of the control variables is shown in Table 6, and the computation times are compared in Figure 10.

It can be seen from Figure 10 that the computation time of SQP increases with the number of control variables, while the computation times of EWOSM and GA are less affected by it. When the number of control variables is small, the computation-time advantage of the proposed method over the existing methods is not obvious, but the advantage grows markedly as the number of control variables increases.

The Combination of EWOSM and SQP Method
Because the proposed EWOSM relies on the analysis and comparison of a large amount of historical data, and in practical applications some historical data may be missing and some historical control schemes may not be optimal, the result of EWOSM may be a suboptimal feasible solution rather than the global optimum. In practical scenarios where the global optimal solution is required, EWOSM can be combined with GA, neighborhood search, SQP or other existing methods to speed up convergence and ensure global optimization.
In this case, the objective function values, computation times and numbers of iterations to convergence of three methods are compared to demonstrate the effectiveness of the proposed hybrid method, in which the result of EWOSM is used as the initial solution of SQP. The computation times and results of EWOSM, the hybrid method and SQP are shown in Table 7. It can be seen from Table 7 that the three methods yield almost the same objective function value, which confirms the validity of the proposed hybrid method. In terms of computation time, the hybrid method takes 7.41 s, 62% less than SQP. In terms of convergence, the hybrid method converges in 9 iterations, far fewer than the 42 iterations required by SQP. These results show that the hybrid method markedly speeds up convergence and reduces computation time.
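The hybrid method amounts to warm-starting an iterative solver with the EWOSM scheme. The sketch below does not implement SQP or the method of [36]; as a deliberately simpler stand-in it uses a fixed-step projected gradient method on a toy box-constrained quadratic, to show how an initial point close to the optimum reduces the number of iterations. The function name and all data are hypothetical.

```python
def projected_gradient(a, c, lo, hi, x0, tol=1e-8, max_iter=10_000):
    """Minimize sum_i a_i * (x_i - c_i)**2 subject to lo_i <= x_i <= hi_i
    with a fixed-step projected gradient method.
    Returns (x, number_of_iterations)."""
    step = 1.0 / (2.0 * max(a))  # 1/L, L = Lipschitz constant of the gradient
    x = list(x0)
    for k in range(1, max_iter + 1):
        new_x = [
            min(max(xi - step * 2.0 * ai * (xi - ci), li), ui)  # gradient step, then clip to box
            for xi, ai, ci, li, ui in zip(x, a, c, lo, hi)
        ]
        moved = max(abs(n - o) for n, o in zip(new_x, x))
        x = new_x
        if moved < tol:  # converged: the iterate no longer changes
            return x, k
    return x, max_iter


# Toy problem whose optimum c lies inside the box [0, 1]^3.
a, c = [4.0, 1.0, 0.5], [0.3, 0.6, 0.45]
lo, hi = [0.0] * 3, [1.0] * 3
x_cold, k_cold = projected_gradient(a, c, lo, hi, x0=[0.0, 0.0, 0.0])
# A warm start close to the optimum plays the role of the EWOSM scheme.
x_warm, k_warm = projected_gradient(a, c, lo, hi, x0=[0.301, 0.599, 0.451])
print(k_cold, k_warm)  # the warm start needs fewer iterations
```

Both runs reach the same optimum; only the distance the solver must travel changes, which mirrors the observation that the hybrid method and SQP give almost the same objective value while the hybrid method converges in fewer iterations.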

Conclusions
A reactive power optimization method for distribution networks based on EWOSM is presented. The proposed method is tested on several systems of different scales and compared with GA and SQP, and the results demonstrate the validity and effectiveness of EWOSM. The contributions and novelties can be summarized as follows: (1) The proposed EWOSM can rapidly and accurately select the optimal scheme from a large amount of historical data, and its computation-time advantage over existing methods is significant. (2) The proposed EWOSM can be combined with existing methods to speed up convergence and ensure global optimization. (3) Because the proposed EWOSM relies on the analysis of a large amount of historical data, it is more suitable for distribution systems with relatively stable loads and complete historical databases; otherwise, it needs to cooperate with existing methods. The application of big data theory and methods to reactive power optimization requires further research and refinement.