Data-Driven Decentralized Algorithm for Wind Farm Control with Population-Games Assistance

Abstract: In wind farms, turbines that operate close to one another interact in ways that degrade their power generation. Wakes caused by upstream turbines are mainly responsible for these interactions, and the phenomena involved are complex, especially when the number of turbines is high. To deal with these issues, control strategies that maximize the energy captured by the wind farm are needed. In this work, an algorithm is proposed that uses multiple gradients estimated from measurements, which are classified by a simple distributed population-games-based algorithm. The update of the decision variables is computed as a superposition of the estimated gradients together with the classification of the measurements. In order to maximize the energy captured while maintaining the individual power generation, several constraints are considered in the proposed algorithm. The proposed control scheme reduces the communication needed, which increases the reliability of the wind farm operation. The control scheme is validated in simulation on a benchmark corresponding to the Horns Rev wind farm.


Introduction
Nowadays, it is rare to find wind turbines operating in isolation. Instead, they are conveniently arranged in groups known as wind farms, which inject power into the electrical grid in a way and magnitude comparable to non-renewable energy sources.
Initially, control strategies designed for wind farms were mainly based on aggregate models, which represent those arrays as a large equivalent wind turbine [1-4]. The drawback of this modeling approach lies in its neglect of the interactions among turbines caused by the wakes. A turbine located in the path of wakes produced by upstream turbines suffers from a reduced incoming wind speed and will probably be exposed to a more turbulent air flow [5]. Consequently, using a single control signal for all the turbines in the farm will not be as effective as computing an individual signal for each turbine depending on its operating conditions. In particular, when the main objective is to maximize the energy capture, control strategies aimed at reducing the wake effects have shown promising results [6].
Control strategies considering non-aggregate models have been an active research topic in recent years (see [7] and references therein). One of their main difficulties is accounting for the complex dynamics of all the physical phenomena involved in the turbine interactions and the wake effects. High-fidelity models can only be used to validate strategies, and simplifications must be made to obtain control-oriented models suitable for model-based control techniques; see, for instance, [8,9]. For this reason, some reported approaches propose the use of data-driven control techniques. Moreover, since reliability is also an important feature, control strategies should also reduce the usage of the communication channel. Among others, the use of safe experimentation dynamics to design a distributed control strategy aimed at maximizing the energy capture from local information and the total power produced by the wind farm is reported in [2]. In that approach, control variables are randomly perturbed in order to optimize the total power, and the resulting values are retained only if the total generated power improves. Another viewpoint is reported in [10], where the control strategy is fed with local information only and its convergence is improved by adding information about the gradient of the objective function, regardless of whether the resulting optimum is local. In [11], a Bayesian ascent algorithm is proposed that includes two different phases, i.e., a learning phase and an optimization phase. More recently, Ciri et al. [12] have applied extremum-seeking control to improve the power production by looking for the optimal gain of the torque controller at each individual wind turbine.
Thus, the main contribution of this paper is the implementation of a population-dynamics approach with a time-varying set of strategies within a gradient-estimation algorithm, which together form a data-driven decentralized control scheme. The scheme comprises a single controller for each wind turbine. The proposed approach is inspired by a population game [13], where the strategies are aligned with the domain-space coordinates of an unknown function to be maximized. The gradient estimation is performed from a single measurement (or multiple measurements), as in [14,15], corresponding to the population game strategies. Finally, the set of strategies is updated as a function of the gradient estimates and the entire routine is run again. Furthermore, the proposed algorithm also allows set-points to be established for the desired power generation. Moreover, unlike other works related to evolutionary dynamics, e.g., [16-19], where the evolutionary games work as the optimization algorithm, this paper proposes to take advantage of this game-theoretical approach to assist a heuristic algorithm.
One of the main features of the proposed approach is the use of local information only, i.e., each turbine only requires information about its own control signal, apart from the total power produced by the whole wind farm (following the approach reported in [2]). However, unlike [2], the proposed control algorithm avoids random testing to obtain improvements in the total generated power. Instead, gradient estimation is used [10,20], but with significant differences:
• the algorithm proposed in this paper uses historical information of the system evolution to compute multiple directions of the gradient estimations in a decentralized fashion and for every single wind turbine, i.e., it is not necessary to share information among turbines; and
• the proposed approach produces global solutions due to the availability of the total generated power.
Moreover, unlike [2,20,21], the individual saturation of the power generation in the turbines is considered, and it is shown that the proposed data-driven algorithm can handle this situation while maximizing the total power. Preliminary results related to this work were reported in [21]. As in [11,12], the key aspect of the algorithm is the appropriate computation of an ascent direction.
This paper is structured as follows. In Section 2, the problem of controlling wind farms and the wake modeling are briefly presented. Preliminary concepts used in the design of the proposed algorithm are stated in Section 3, while Section 4 describes the proposed algorithm in depth. Section 5 presents and explains the data-driven decentralized control scheme for wind farms. Next, in Section 6, the main results are presented and discussed; they were obtained with a case study of 80 wind turbines in which the total power generation is maximized by using the axial induction factors. Finally, some conclusions are drawn in Section 7.

Notation
Calligraphic letters are used to denote sets, e.g., $\mathcal{S}$. Column vectors are denoted with bold font, e.g., $\mathbf{y}$. Every sub-index refers to elements corresponding to strategies in a population, e.g., $s_{i,k}$ refers to a vector associated with the $i$th strategy at time instant $k$. In addition, $|\cdot|$ denotes the cardinality of a set, $\mathbb{R}_{\geq 0}$ denotes the set of all non-negative real numbers, and $\mathbb{R}_{>0}$ denotes the set of all strictly positive real numbers. Finally, both continuous-time and discrete-time frameworks are used throughout this paper, denoted by $t$ and $k$, respectively. Discrete time is written as a sub-index, e.g., $\mathcal{S}_k$, whereas continuous time is written as an argument, e.g., $p_i(t)$.

Problem Statement
The power generated by a turbine $i \in \mathcal{W}$ is given by

$$P_i = \frac{1}{2} \rho A\, C_P(a_i)\, V_i^3, \qquad (1)$$

with $\rho$ the air density, $A$ the area swept by the rotor, $V_i$ the wind speed experienced by the turbine, and $\mathcal{W} = \{1, \ldots, m\}$ a set indexing the turbines in the farm. The power coefficient $C_P$ depends on the axial induction factor $a_i$ according to

$$C_P(a_i) = 4 a_i (1 - a_i)^2. \qquad (2)$$

In wind farm control applications, the axial induction factor $a_i$ is used as the control signal to regulate the power generated by each turbine. The maximum individual power production is achieved when $a_i = 1/3$.

For wind turbines within a farm, the wind speed faced by each turbine is a combination of the free-stream speed and the wakes produced by nearby turbines. Both the wind farm layout and the predominant wind direction determine the interactions caused by the wake effect. An illustration of the wake produced downstream in a simple layout of $m$ turbines aligned with the free-stream wind speed $V_\infty$ can be seen in Figure 1.
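As a quick numerical check of the statement that an isolated turbine produces maximum power at an axial induction factor of 1/3, the following minimal sketch evaluates the power expression, assuming the standard actuator-disc power coefficient $C_P(a) = 4a(1-a)^2$. The air density value and the function names are illustrative assumptions; the 80 m rotor diameter is the one quoted in the case study.

```python
import math


def power_coefficient(a):
    """Power coefficient C_P(a) = 4 a (1 - a)^2 (actuator-disc theory)."""
    return 4.0 * a * (1.0 - a) ** 2


def turbine_power(a, wind_speed, rho=1.225, rotor_diameter=80.0):
    """Power P = 0.5 * rho * A * C_P(a) * V^3 in watts.

    rho = 1.225 kg/m^3 is an assumed air density, not a value from the paper.
    """
    area = math.pi * (rotor_diameter / 2.0) ** 2
    return 0.5 * rho * area * power_coefficient(a) * wind_speed ** 3


# Scan induction factors in [0, 0.5] and keep the one that maximizes C_P.
grid = [i / 3000.0 for i in range(0, 1501)]
a_best = max(grid, key=power_coefficient)
```

The scan returns the grid point closest to 1/3, where the power coefficient equals 16/27 (the Betz limit).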
Figure 1. Example of wind farm layout and the corresponding wake effect.
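The speed deficit caused by the wake effect can be modeled as proposed in [5]. When all turbines are aligned with $V_\infty$, the wind speed faced by the turbine $i \in \mathcal{W}$ is given by

$$V_i = V_\infty \left(1 - 2 \sqrt{\sum_{j \in \mathcal{W}:\, x_j < x_i} (a_j c_{ji})^2}\right), \qquad (3)$$

where

$$c_{ji} = \left(\frac{D_j}{D_j + 2\beta (x_i - x_j)}\right)^2,$$

with $D_j$ the diameter of the wind rotor, $x_j$ the position of the turbine, and $\beta$ a coefficient indicating the expansion of the wake.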
The control objective considered in this paper is to maximize the total generated power [2,20], i.e.,

$$\max_{a}\; P_T(a), \quad \text{with } P_T(a) = \sum_{i \in \mathcal{W}} P_i. \qquad (4)$$

This objective consists in finding the vector of axial induction factors $a = [a_1\; a_2\; \ldots\; a_m] \in \mathbb{R}^m$ that maximizes the total power $P_T$ produced by the entire wind farm. In general, a greedy strategy, in which every turbine works at $a_i = 1/3$, does not achieve the optimal value due to the wind speed deficit caused by the wake effects. In order to maximize the total energy capture, the most upstream turbines must reduce their generation. Thus, the wind speed deficit faced by downstream turbines is lower and the total power is higher. As the quantification of the wake effect is quite complex and depends on uncertain parameters, it is advisable to use control algorithms that do not rely on wake modeling. This is the purpose of the algorithm proposed in this paper.
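The claim that the greedy setting is suboptimal can be illustrated with two aligned turbines. The sketch below assumes a Park-type wake model in which each upstream turbine $j$ contributes a deficit term $a_j c_{ji}$ with $c_{ji} = (D/(D + 2\beta(x_i - x_j)))^2$. The wake-expansion value `beta=0.075`, the air density, and the 10 m/s free-stream speed are illustrative assumptions, and the function names are hypothetical.

```python
import math


def power_coefficient(a):
    return 4.0 * a * (1.0 - a) ** 2


def wind_speeds(a, positions, v_inf, diameter=80.0, beta=0.075):
    """Wind speed at each turbine of a single row aligned with the free stream."""
    speeds = []
    for x_i in positions:
        deficit_sq = 0.0
        for a_j, x_j in zip(a, positions):
            if x_j < x_i:  # only upstream turbines contribute to the wake
                c_ji = (diameter / (diameter + 2.0 * beta * (x_i - x_j))) ** 2
                deficit_sq += (a_j * c_ji) ** 2
        speeds.append(v_inf * (1.0 - 2.0 * math.sqrt(deficit_sq)))
    return speeds


def total_power(a, positions, v_inf, rho=1.225, diameter=80.0, beta=0.075):
    """Sum of the individual powers 0.5 * rho * A * C_P(a_i) * V_i^3."""
    area = math.pi * (diameter / 2.0) ** 2
    speeds = wind_speeds(a, positions, v_inf, diameter, beta)
    return sum(0.5 * rho * area * power_coefficient(a_i) * v ** 3
               for a_i, v in zip(a, speeds))


positions = [0.0, 400.0]  # two turbines, five rotor diameters apart
greedy = total_power([1.0 / 3.0, 1.0 / 3.0], positions, v_inf=10.0)
derated = total_power([0.25, 1.0 / 3.0], positions, v_inf=10.0)
```

Under these assumptions, lowering the induction factor of the upstream turbine from 1/3 to 0.25 yields a higher total power than the greedy setting, because the gain at the downstream turbine exceeds the loss at the upstream one.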

Preliminary Concepts
The proposed strategy uses a data set and is composed of two main elements: (i) the estimation of the gradient; and (ii) the computation of the relevance of each datum. Figure 2 presents the main steps of the proposed heuristic algorithm, which are explained in detail in Sections 3 and 4.

Gradient Estimation
Consider an unknown function $f(y)$, where $y \in \mathbb{R}^m$, i.e., $f : \mathbb{R}^m \to \mathbb{R}$. The objective is to find the vector $y$ that maximizes $f$. To that end, a data-driven algorithm based on gradient estimation (which uses the distributed gradient estimation proposed in [14,15]) and on evolutionary game-theoretical concepts is proposed. In particular, several gradients are estimated and each one of them is assigned a quality in order to determine a direction for the vector $y$.
Let $c, d \in \mathbb{R}^m$ be two arbitrary vectors in the domain of the function $f$. It is assumed that the function values $f(c)$ and $f(d)$ are available through measurements, and that this information is known at the coordinates $c$ and $d$, respectively. Therefore, an increasing rate and direction of the function $f$ between the vectors $c$ and $d$ can be estimated by a function $g(c, d)$, with $g : \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}^m$, which is a modification of the original one presented in [14]. The modification is needed because this problem requires capturing the information regarding the increasing directions, which is why the magnitude has been slightly modified.
Remark 1. In order to estimate the gradients for the unknown function $f(y)$, it is necessary
1. to be able to capture measurements of the unknown function $f$, and
2. to know the correspondence of the measurement with the element in the domain of $f$, i.e., for a measurement $f(d)$ the element $d$ in the domain of $f$ is known.
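To make the idea of estimating an increasing direction from two measured points concrete, the following sketch uses a plain finite-difference form. This form is an assumption chosen for illustration and is not the modified function $g$ of the paper; the function name `estimate_gradient` and the test function are hypothetical.

```python
import math


def estimate_gradient(c, d, f_c, f_d):
    """Ascent estimate seen from c, using a second measured point d.

    Hypothetical form: the unit vector from c to d scaled by the measured
    slope. It points towards d when f(d) > f(c) and away from d otherwise.
    """
    diff = [d_k - c_k for c_k, d_k in zip(c, d)]
    dist = math.sqrt(sum(x * x for x in diff))
    if dist == 0.0:
        return [0.0] * len(c)
    slope = (f_d - f_c) / dist
    return [slope * x / dist for x in diff]


def f(y):
    """Illustrative concave function with its maximum at (1, 2)."""
    return -((y[0] - 1.0) ** 2) - (y[1] - 2.0) ** 2


c, d = [0.0, 0.0], [1.0, 0.0]
g_c = estimate_gradient(c, d, f(c), f(d))  # estimate held at c
g_d = estimate_gradient(d, c, f(d), f(c))  # estimate held at d
```

Only the two measurements and their coordinates are needed, which matches the two requirements of Remark 1. In this example the lower-valued point `c` obtains a direction towards `d`, while the higher-valued point `d` obtains a direction away from `c`.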

Population-Game Role
The proposed algorithm uses multiple estimated gradients based on measurements. The decision variables are updated by computing a superposition of the gradients. Thus, an appropriate increasing direction for the unknown function is identified in order to maximize it. One key step within the algorithm is to classify the multiple measurements according to their quality (those elements in the domain with higher-value measurements are considered to be of better quality, see Section 4). Such classification is performed by using a population-games-based algorithm whose setting changes at every iteration. Next, the background on evolutionary dynamics and population games is introduced.
Consider a population with a finite and large number of rational agents. Within the population, there are $n$ available strategies at every discrete time $k \in \mathbb{Z}_{\geq 0}$ (with a sampling time given by $\tau \geq t_f$), each associated with a coordinate. At time instant $k$ and during a fixed period of time $\tau$, agents can choose among the $n$ available strategies. Let $\mathcal{I} = \{1, \ldots, n\}$ be the set of indexes corresponding to the $n$ available strategies, and $\mathcal{S}_k = \{s_k^1, \ldots, s_k^n\}$ the set of strategies, where $s_k^i \in \mathbb{R}^m$, for all $i \in \mathcal{I}$, and $\mathcal{S}_k \subset \mathbb{R}^m$. Note that the set of strategies $\mathcal{S}_k$ varies along the discrete time with a sampling time given by $\tau$.
At each discrete-time step $k$, a strategic interaction in continuous time occurs for a period of time equal to $\tau$ (the sampling time). In this interaction at time instant $k$, the scalar value $p_i(t) \in \mathbb{R}_{\geq 0}$, with $0 \leq t \leq \tau$, corresponds to the proportion of agents that are selecting the strategy $s_k^i \in \mathcal{S}_k$. The proportions of agents selecting the different strategies in the population compose a strategic distribution or population state, denoted by $p(t) \in \mathbb{R}^n$, $0 \leq t \leq \tau$. The population state evolves in the simplex $\Delta = \{p(t) \in \mathbb{R}^n_{\geq 0} : \sum_{i \in \mathcal{I}} p_i(t) = 1\}$. Agents have incentives to select among the available strategies according to a fitness function $f_i(p_i(t))$, for all $i \in \mathcal{I}$, and the fitness function for the whole population is the vector-valued function $f : \Delta \to \mathbb{R}^n$. Notice that a constant value $\pi$ enters all the fitness functions to ensure that $f_i(p_i(t))$ is decreasing with respect to $p_i(t)$. Then, it is ensured that the population game is stable according to Definition 1 (adapted from [13]).
Definition 1. A population game $f : \Delta \to \mathbb{R}^n$ is a stable game if $z^\top Df(p)\, z \leq 0$, for all $z \in \Delta_T$, $p \in \Delta$, where $\Delta_T$ is the tangent space of the simplex, given by $\Delta_T = \{z \in \mathbb{R}^n : \sum_{i \in \mathcal{I}} z_i = 0\}$.
The objective within the population is to achieve a Nash equilibrium, denoted by $p^* \in \Delta$. Formally, the set of Nash equilibria is defined next.
Definition 2. A population state $p^* \in \Delta$ is a Nash equilibrium if each used strategy entails the maximum benefit for the proportion of agents selecting it, i.e., the set $\{p^* \in \Delta : p_i^* > 0 \Rightarrow f_i(p^*) \geq f_j(p^*), \text{ for all } i, j \in \mathcal{I}\}$ corresponds to the Nash equilibria.
Additionally, suppose that the possible interaction among agents choosing different strategies is given by an undirected non-complete communication graph denoted by G = (I, E ), where I is the set of vertices representing the strategies and E ⊂ {(i, j) : i, j ∈ I} is the set of edges or links determining possible communication and information sharing among strategies.The set of neighbors of the node i ∈ I is given by N i = {j : (i, j) ∈ E }.It should be clear that each strategy estimates a gradient based only on information from its neighbors.
For the proposed algorithm, the proportion of agents $p_i(\tau)$ represents a quality assigned to the strategy $s_k^i$, evaluated at time $\tau$, which is a tuning parameter of the proposed algorithm. In other words, the proportion provides information about how well the strategy $s_k^i$ maximizes the function $f$ with respect to the other available strategies $\mathcal{S}_k \setminus \{s_k^i\}$ at time instant $k \in \mathbb{Z}_{\geq 0}$. Afterwards, the set of strategies $\mathcal{S}_k$ is updated based on gradient estimations over the function $f$, and the qualities of all the strategies are given by $p \in \mathbb{R}^n$.
The Nash equilibrium for the population game $f$ is obtained by solving the maximization of the potential function using the distributed population dynamics presented in [18]. In this case, the distributed projection dynamics (DPD) are chosen, which are defined in terms of the Laplacian matrix $L$ of the connected graph $\mathcal{G}$. Other distributed population dynamics could have been used instead (e.g., distributed replicator dynamics or distributed Smith dynamics). If the sampling time $\tau$ is large enough, then $p(\tau) = p^* \in \Delta$ is the Nash equilibrium of the game. Otherwise, the quality of the strategies at time $k \in \mathbb{Z}_{\geq 0}$ corresponds to a transient value of the proportion of agents in the population.
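The role of the population dynamics as a classifier of measurements can be sketched numerically. The example below makes two assumptions that are not taken from the paper: the dynamics are written as $\dot{p}_i = \sum_{j \in \mathcal{N}_i} (f_i - f_j)$, a common graph-Laplacian form of distributed projection dynamics, and the fitness is $f_i(p_i) = -(\pi - f(s^i))\, p_i$, one simple choice that is decreasing in $p_i$ whenever $\pi > f(s^i)$. The function name, the measured values, and the forward-Euler integration are likewise illustrative.

```python
def simulate_dpd(values, edges, upper_bound, horizon=20.0, dt=0.01):
    """Integrate the assumed distributed projection dynamics with forward Euler.

    values[i] is the measured f(s^i); upper_bound plays the role of pi.
    Each node only uses the fitness of its neighbors in the graph.
    """
    n = len(values)
    neighbors = {i: set() for i in range(n)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    p = [1.0 / n] * n  # start in the interior of the simplex
    for _ in range(int(round(horizon / dt))):
        # Assumed fitness: decreasing in p_i because upper_bound > values[i].
        fitness = [-(upper_bound - values[i]) * p[i] for i in range(n)]
        p_dot = [sum(fitness[i] - fitness[j] for j in neighbors[i])
                 for i in range(n)]
        p = [p[i] + dt * p_dot[i] for i in range(n)]
    return p


values = [1.0, 2.0, 3.0]   # hypothetical measurements f(s^1), f(s^2), f(s^3)
edges = [(0, 1), (0, 2)]   # star graph centered at the first strategy
p_final = simulate_dpd(values, edges, upper_bound=4.0)
```

With this assumed fitness, the equilibrium equalizes $(\pi - f(s^i))\, p_i$ across strategies, so `p_final` approaches (2/11, 3/11, 6/11): the strategy with the highest measurement receives the largest proportion, which is the ordering the algorithm uses as a quality index.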
Next, in Section 4 the combination of these two concepts is shown in order to derive a data-driven optimization algorithm.

Algorithms According to Information Availability
The proposed approach can be implemented in different ways depending on the availability of information. The variant is selected according to the information scheme of the application.

Using Multiple Measurements at Each Iteration
Consider the smallest population game, involving only two strategies, $\mathcal{S}_k = \{s_k^1, s_k^2\}$, with an associated communication graph $\mathcal{G}$ as introduced in Section 3. The initial condition for the strategic set $\mathcal{S}_0$ consists of two arbitrary strategies denoted by $s_0^1 \in \mathbb{R}^m$ and $s_0^2 \in \mathbb{R}^m$. The initial population state is arbitrarily selected in the relative interior of the simplex, i.e., $p(0) \in \operatorname{int} \Delta$. Furthermore, the fitness functions corresponding to the strategies are given by $f_i(p_i(t))$, with $\pi \in \mathbb{R}_{>0}$ an upper bound for the fitness functions such that $\pi > f(s_k^i)$, for all $i \in \mathcal{I}$.
Remark 2. The value f (s i k ) is measurable in the corresponding node i ∈ I, for all k ∈ Z ≥0 .
Since both strategies $s_k^1 \in \mathcal{S}_k$ and $s_k^2 \in \mathcal{S}_k$ have information about each other, the gradients $g(s_k^1, s_k^2)$ and $g(s_k^2, s_k^1)$ can be computed at the nodes $1 \in \mathcal{I}$ and $2 \in \mathcal{I}$, respectively. Therefore, agents have the required information to make a decision within the population.
The update of the strategic set $\mathcal{S}_k$ is performed as follows. For instance, assume that $f(s_k^2) > f(s_k^1)$; then, the strategy $s_k^1$ evolves towards the strategy $s_k^2$ in order to be closer to it in the domain. In contrast, $s_k^2$ is updated in the direction opposite to the strategy $s_k^1$, getting farther from it in the domain.
Thus, according to the strategic update procedure, those strategies with better quality (i.e., whose image under the function $f : \mathbb{R}^m \to \mathbb{R}$ is higher) change more smoothly over time, whereas those with lower values undergo bigger changes. The update rate for the strategy $s_k^i$, for all $i \in \mathcal{I}$, is denoted by $\theta_i(\tau)$, where $\gamma \in \mathbb{R}_{>0}$ is a common tunable parameter for all the strategies, which determines the update rate according to (7). For instance, consider the case involving two strategies $i, j \in \mathcal{I}$ and, without loss of generality, let $f(s_k^i) > f(s_k^j)$. Then, it is expected that $p_i(\tau) > p_j(\tau)$, i.e., the strategy $s_k^i$ has a better quality than the strategy $s_k^j$. Consequently, a bigger update rate is assigned to the strategy $s_k^j$, i.e., $\theta_i(\tau) < \theta_j(\tau)$. In addition, an exploration term is included that allows potentially better-quality strategies to be identified in the domain of the function $f$. This exploration is applied within the strategic update by means of a random value $\delta \in [-\varepsilon, \varepsilon]^m$, where $\varepsilon \in \mathbb{R}_{>0}$. Hence, the strategic update combines the superposition of the estimated gradients, the update rate, and the exploration term.
In order to illustrate how the strategic update is performed according to a superposition of gradients, Figure 3 shows an example for a function $f$ with domain $\mathbb{R}^2$. Figure 3a shows a population case involving four strategies whose information interaction is determined by a star communication graph, i.e., $\mathcal{G} = (\mathcal{I}, \mathcal{E})$, where $\mathcal{I} = \{1, \ldots, 4\}$ and $\mathcal{E} = \{(1, 2), (1, 3), (1, 4)\}$. In this example, the superposition of the three estimated gradients is computed at the node $1 \in \mathcal{I}$.
Notice that the proposed approach requires the availability of $n$ measurements corresponding to $n$ different images of the function $f$, which should be captured at the same time instant $k \in \mathbb{Z}_{\geq 0}$, i.e., it is necessary to capture the measurements corresponding to the values $f(s_k^i)$, for all $i \in \mathcal{I}$.
Figure 4 presents a general scheme for the algorithm. Nevertheless, a subtle modification can be applied to the algorithm so that less information is required at each time instant, as discussed in the next section.
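The interplay between quality, update rate, and exploration can be sketched as follows. The update-rate form $\theta_i = \gamma (1 - p_i)$ is a hypothetical choice that only reproduces the stated ordering (a better quality gives a smaller rate), and the additive update is likewise an assumption, not the paper's Equation (7) or its update rule. All function names are illustrative.

```python
import random


def update_rates(p, gamma=1.0):
    """Hypothetical update rate theta_i = gamma * (1 - p_i).

    A larger proportion p_i (better quality) gives a smaller update rate.
    """
    return [gamma * (1.0 - p_i) for p_i in p]


def exploration(m, eps, rng):
    """Random exploration vector delta drawn from [-eps, eps]^m."""
    return [rng.uniform(-eps, eps) for _ in range(m)]


def update_strategy(s_i, gradients, theta_i, delta):
    """Move s_i along the superposition (sum) of its estimated gradients."""
    m = len(s_i)
    direction = [sum(g[k] for g in gradients) for k in range(m)]
    return [s_i[k] + theta_i * direction[k] + delta[k] for k in range(m)]


rng = random.Random(0)                      # seeded for reproducibility
theta = update_rates([0.2, 0.3, 0.5])       # qualities from the population game
delta = exploration(2, 0.001, rng)
s_new = update_strategy([0.0, 0.0], [[1.0, 0.0], [0.0, 0.5]], theta[0], delta)
```

Here the strategy with quality 0.2 moves with the largest rate, along the sum of the two gradients it holds, plus a perturbation bounded by ε = 0.001 in each coordinate.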

Using a Single Measurement at Each Iteration
Consider the same population game that has been introduced in Section 3, i.e., there is a set of $n$ strategies at each time instant $k \in \mathbb{Z}_{\geq 0}$ denoted by $\mathcal{S}_k = \{s_k^1, \ldots, s_k^n\}$. However, different from Section 3, $n - 1$ strategies are preserved within the set $\mathcal{S}_k$ in the strategy update (i.e., $n - 1$ strategies do not change at each time instant), and only one of them is modified. Formally, $\mathcal{S}_{k+1} \setminus \mathcal{S}_k = \{s_{k+1}^1\}$, which means that there is only one new strategy when comparing the strategic sets $\mathcal{S}_{k+1}$ and $\mathcal{S}_k$.
It is quite important to highlight that even though there is only one new datum at each iteration, the estimation of all the gradients is different.
To initialize the algorithm, it is assumed that the values of the function $f$ for the $n$ initial strategies are known from measurements, i.e., $f(s_0^1), \ldots, f(s_0^n)$ are initially known. Notice that this initialization condition can be met after a time of at least $n\tau$, i.e., the $n$ required initial measurements can be captured in $n$ iterations with sampling time $\tau$.
Although there is only one new measurement at each time instant, all strategies can be updated. Therefore, the information limitation is handled by means of the algorithm in Equation (8). To illustrate this procedure, Figure 3b presents an example with only one available measurement at each time instant, i.e., the new measurable information is given by $f(s_k^1)$. Note that, at the next time instant $k + 1$, the values of the function $f$ at the strategies of time instant $k$ are already known by the algorithm.

Data-Driven Decentralized Control of Wind Farms
The power generated by the wind turbine $i \in \mathcal{W}$, denoted by $P_i(a)$, depends on the behavior of other turbines as shown in Equation (3). Therefore, the unknown function $f$ in Equation (5) corresponds to the total power function $P_T = \sum_{i \in \mathcal{W}} P_i$ in Equation (4). In turn, each strategy belonging to the domain of the unknown function is a vector of axial induction factors. For this specific control problem, there is only one available measurement at each time instant $k \in \mathbb{Z}_{\geq 0}$, i.e., the only available information is the total generated power for the currently established axial induction factors. Consequently, the appropriate approach for this data-driven control problem is the one introduced in Section 4.2. The control scheme corresponds to a decentralized architecture since individual decisions are made at each wind turbine without requiring communication with other turbines.
Figure 5 shows the traditional decentralized control scheme, where the information about the total generated power $P_T(a)$ is provided to each wind turbine $i \in \mathcal{W}$. In addition, each turbine also has information about its current axial coefficient. Then, the algorithm in Equation (8) can be written for each turbine $j \in \mathcal{W}$, which yields the algorithm in Equation (9). Notice that in the algorithm in Equation (9), each turbine has the required information since $P_T(a_k^1)$ is measured and $P_T(a_k^\ell)$ has been stored for all $\ell \in \mathcal{N}_1$.
Now suppose that it is necessary to limit the individual power generation to a value $P_i^s$, i.e., the admissible power generation for each wind turbine is $P_i(a_i, V_i) \leq P_i^s$. Therefore, the admissibility of the axial coefficient $a_i$ must be verified: if $P_i(a_i, V_i) > P_i^s$, Equations (1) and (2) are solved in a decentralized manner to find a new admissible value for $a_i$, i.e., an axial coefficient $a_i^s$ such that $P_i(a_i^s, V_i) = P_i^s$.
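One way to carry out this admissibility check locally is a bisection on the power expression, sketched below under the assumption that the power follows $P = \frac{1}{2}\rho A \cdot 4a(1-a)^2 V^3$ and that the candidate induction factor lies in $[0, 1/3]$, where the power is increasing in $a$. The air density, the wind speeds, and the function names are illustrative assumptions; the 2 MW limit and 80 m rotor diameter are those of the case study.

```python
import math


def turbine_power(a, wind_speed, rho=1.225, rotor_diameter=80.0):
    """P = 0.5 * rho * A * 4 a (1 - a)^2 * V^3 in watts (rho is assumed)."""
    area = math.pi * (rotor_diameter / 2.0) ** 2
    return 0.5 * rho * area * 4.0 * a * (1.0 - a) ** 2 * wind_speed ** 3


def admissible_induction(a, wind_speed, p_limit, iterations=60):
    """Return a if it respects the limit, else a_s in [0, a] with P(a_s) = p_limit.

    Assumes 0 <= a <= 1/3, where the power is increasing in a.
    """
    if turbine_power(a, wind_speed) <= p_limit:
        return a
    low, high = 0.0, a
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if turbine_power(mid, wind_speed) > p_limit:
            high = mid  # still above the limit: shrink from above
        else:
            low = mid   # admissible: try a larger induction factor
    return 0.5 * (low + high)


# Above rated wind speed the greedy value is saturated to the 2 MW limit.
a_s = admissible_induction(1.0 / 3.0, wind_speed=14.0, p_limit=2.0e6)
# At a lower wind speed the greedy value is already admissible.
a_free = admissible_induction(1.0 / 3.0, wind_speed=8.0, p_limit=2.0e6)
```

The computation uses only the turbine's own wind speed and power limit, so it fits the decentralized scheme: no information from other turbines is required.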

Case Study and Simulation Results
The effectiveness of the proposed control algorithm was evaluated by using the Horns Rev wind farm, which has 80 turbines of 2 MW each, with a rotor diameter of 80 m. This farm is arranged in an 8 × 10 layout with a separation among turbines of five rotor diameters (400 m) (see Figure 6). The model parameters were chosen as β = 0. The proposed decentralized algorithm considered three data at each iteration (one from a current measurement and two from historical data), i.e., n = 3, with τ = 0.01 s, ε = 0.001 and γ = 1 as the selected tuning parameters. The communication graph considered for each algorithm was G = (I, E), where I = {1, 2, 3} and E = {(1, 2), (1, 3)}. The decentralized scheme was composed of 80 different algorithms as in Equation (9), i.e., each turbine had its individual algorithm. Two scenarios were tested for four different wind speed directions, i.e., 0°, 15°, 30°, and 45°:
• Scenario 1: the free-stream wind speed was below the rated value and all wind turbines were working in maximization of the energy capture.
• Scenario 2: the free-stream wind speed was above the rated value and some turbines were working in power limitation (at 2 MW).
It is important to state that, when the wind speed faced by a turbine was above the rated value, the axial coefficient (for the corresponding turbine) was imposed by the turbine controller and by the wind farm control. The control algorithm then computed the axial induction factors for the rest of the wind turbines in order to generate maximum power for the wind speed conditions.
Figure 7a shows the total power produced by the wind farm in scenario 1 for four different wind speed directions. Dashed lines indicate the total power obtained with a greedy control strategy (a i = 1/3 ∀ i) for each of the four directions. The proposed algorithm was capable of increasing the generation with respect to the greedy case in the four directions, although the increase was more marked at 0° and 45°. This was a consequence of the wind farm layout, in which the air flow disturbance caused by upstream turbines is more pronounced for these directions. Figure 7b,c show the power and the axial induction factors for wind turbines 1-10 (for a wind speed direction of 0°). These figures show that the algorithm reduces the axial coefficients of the upstream turbines and increases those of the last turbines. This produces an increase of the power generated by the last turbines and of the total power generated by the entire wind farm.
Results corresponding to scenario 2 are shown in Figure 8. In this scenario, the free-stream wind speed was above the rated value and therefore the total power production should be close to the rated value (160 MW). It can be observed in Figure 8 that this is the case for the wind speed directions of 15°, 30° and 45°, but not for 0°. In this last case, the wake effects reduce the wind speed faced by the last wind turbines, causing a total power lower than 160 MW. Nevertheless, observing the powers and axial induction factors for wind turbines 1-10, it can be seen that the proposed algorithm in this situation still seeks to maximize the power generation by reducing the generation in the first turbines and increasing it in the last ones.

Conclusions
A data-driven control for wind farms under a decentralized scheme has been presented. In the proposed algorithm, multiple gradient estimations are used in order to improve and speed up the maximization of the total generated power. The control is organized in a decentralized scheme, in which m algorithms produce one control variable for each wind turbine. In spite of the limited communication among agents, the proposed control is able to converge to the global solution. The control strategy was evaluated by simulation on the well-known Horns Rev wind farm layout under low and high wind speed conditions (in the latter, the control actions saturate). In both cases, the algorithm was capable of increasing the total power compared with the greedy case and for different wind directions (see Figures 7 and 8).
Even though it has been presented for this specific application, this algorithm may be useful in other engineering problems in which data-driven approaches (i.e., only measured information is available) and the maximization of a given objective are required. Moreover, the versatility of the algorithm to consider any parameter as the decision variable within the optimization problem has been shown. In addition, the minimization of the variance of the power generation remains an appealing open problem. Finally, the proposed algorithm uses gradients estimated from measurements taken at known parameter values; i.e., in the maximization of an unknown function f, it is known that the measurement f(a) corresponds to the variable a. In this regard, other decision variables could be easily considered by following the same procedure.

Figure 2. General steps of the proposed heuristic approach.

Figure 3. Example of gradient estimation with four strategies $\mathcal{S}_k = \{s_k^1, s_k^2, s_k^3, s_k^4\}$, i.e., n = 4, and $f : \mathbb{R}^2 \to \mathbb{R}$, i.e., m = 2. Vectors illustrate the direction of the strategy updates and the superposition of influences on the strategy with index 1. (a) Multiple measurements available at every iteration; (b) one measurement available at every iteration.

Figure 4. General scheme for the gradient-estimation-based algorithm with population-games assistance.

Figure 5. Typical decentralized control scheme. Each wind turbine has information about the generated power and its own axial induction factor.

Figure 6. Horns Rev wind farm of 80 turbines facing a main wind speed with 45° direction.

Figure 7. (a) Total power for scenario 1 (free-stream wind speed of 10 m/s) for four wind speed directions. (b) Power generated by wind turbines 1-10 with a wind direction of 45°. (c) Axial coefficients for wind turbines 1-10 with a wind direction of 45°.
Figure 8.