# Exploring Reward Strategies for Wind Turbine Pitch Control by Reinforcement Learning


## Abstract


## Featured Application

**Wind Turbine Pitch Control.**


## 1. Introduction

## 2. Wind Turbine Model Description

where $J$ is the inertia (kg m^{2}), $R$ is the radius or blade length (m), $\rho $ is the air density (kg/m^{3}), $v$ is the wind speed (m/s), ${K}_{f}$ is the friction coefficient (N m/rad/s), ${\theta}_{ref}$ is the reference for the pitch (rad), and $\theta $ is the pitch (rad).
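The aerodynamic power captured by the rotor can be computed from the parameters listed above using the standard power-capture equation $P=\frac{1}{2}\rho \pi {R}^{2}{C}_{p}{v}^{3}$. The sketch below is illustrative only; the power coefficient value is an assumption, not a result from the paper.

```python
import math

def aerodynamic_power(cp, rho, radius, v):
    """Standard rotor power capture: P = 0.5 * rho * A * Cp * v^3,
    with swept area A = pi * R^2. The Cp value passed in is illustrative."""
    area = math.pi * radius ** 2      # swept area (m^2)
    return 0.5 * rho * area * cp * v ** 3

# Using the table values rho = 1.223 kg/m^3 and R = 3.2 m, at v = 10 m/s:
p = aerodynamic_power(cp=0.4, rho=1.223, radius=3.2, v=10.0)
```

For the small turbine modelled here, this yields a few kilowatts at rated wind speeds, consistent with the mean power values reported in the results tables.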

## 3. RL-Inspired Controller

- $S$ is a finite set of states perceived by the interpreter. This set is built from variables of the environment, which must be observable by the interpreter and may differ from the state variables of the environment.
- $A$ is a finite set of actions to be conducted by the agent.
- ${s}_{t}$ is the state at $t$
- ${a}_{t}$ is the action performed by the agent when the interpreter perceives the state ${s}_{t}$
- ${r}_{t+1}$ is the reward received after action ${a}_{t}$ is carried out
- ${s}_{t+1}$ is the state after action ${a}_{t}$ is carried out
- The environment or world is a Markov process: $MDP=\langle {s}_{0},{a}_{0},{r}_{1},{s}_{1},{a}_{1},{r}_{2},{s}_{2},{a}_{2}\dots \rangle $
- $\pi :S\times A\to \left[0,1\right]$ is the policy; this function provides the probability of selection of an action $a$ for every pair $\left(s,a\right)$
- ${p}_{s{s}^{\prime}}^{a}=\mathrm{Pr}\left\{{s}_{t+1}={s}^{\prime}|{s}_{t}=s\wedge {a}_{t}=a\right\}$ is the probability that the state changes from $s$ to ${s}^{\prime}$ under action $a$
- ${p}^{\pi}\left({s}^{\prime},{a}^{\prime}\right)$ is the probability of selecting action ${a}^{\prime}$ at state ${s}^{\prime}$ under policy $\pi $
- ${r}_{s}^{a}=E\left\{{r}_{t+1}|{s}_{t}=s\wedge {a}_{t}=a\right\}$ is the expected one-step reward
- ${Q}_{\left(s,a\right)}^{\pi}={r}_{s}^{a}+\gamma {\sum}_{{s}^{\prime}}{p}_{s{s}^{\prime}}^{a}{\sum}_{{a}^{\prime}}{p}^{\pi}\left({s}^{\prime},{a}^{\prime}\right){Q}_{\left({s}^{\prime},{a}^{\prime}\right)}^{\pi}$ is the expected sum of discounted rewards
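The recursive definition of ${Q}^{\pi}$ above can be evaluated by fixed-point iteration on a tabular MDP. The sketch below uses a tiny two-state, two-action example whose transition probabilities, rewards, and policy are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative 2-state, 2-action MDP (all numbers are assumptions for the demo):
# P[a, s, s2] = p^a_{ss'}, r[s, a] = r^a_s, pi[s, a] = policy probabilities.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],
               [0.5, 0.5]])
gamma = 0.9

# Fixed-point iteration of
# Q(s,a) = r^a_s + gamma * sum_{s'} p^a_{ss'} * sum_{a'} pi(s',a') * Q(s',a')
Q = np.zeros((2, 2))
for _ in range(1000):
    V = (pi * Q).sum(axis=1)                     # expected Q under pi at each s'
    Q = r + gamma * np.einsum('asp,p->sa', P, V)
```

Because the update is a $\gamma $-contraction, the iteration converges to the unique ${Q}^{\pi}$ regardless of the initial table.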

#### 3.1. Policy Update Algorithm

- One-reward (OR): only the last reward received. As it takes into account just the last reward (smallest memory), it may be very useful when the system to be controlled changes frequently, Equation (9):$$OR:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right):={r}_{t}$$
- Summation of all previous rewards (SAR). It may cause an overflow in the long term, which could be avoided by saturating the values within some limits, Equation (10):$$SAR:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right):={T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i-1}\right)+{r}_{t}$$
- Mean of all previous rewards (MAR). This policy gives more opportunities to not-yet-selected actions than SAR, especially when many rewards share the same sign, Equation (11):$$MAR:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right):=\frac{1}{i}\left[{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i-1}\right)+{r}_{t}\right]$$
- Only learning with learning rate (OL-LR). It accumulates only a fraction of each reward, given by the learning rate parameter $\alpha \in \left[0,1\right]$, Equation (12):$$OL\text{-}LR:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right):={T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i-1}\right)+\alpha \cdot {r}_{t}$$
- Learning and forgetting with learning rate (LF-LR). The previous methods do not forget any past reward; this may be effective for steady systems, but for changing models it might be advantageous to forget some previous rewards, Equation (13). The forgetting factor is modelled as the complementary learning rate $\left(1-\alpha \right)$:$$LF\text{-}LR:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right):=\left(1-\alpha \right){T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i-1}\right)+\alpha \cdot {r}_{t}$$
- Q-learning (QL). The discount factor $\gamma \in \left[0,1\right]$ is included in the policy function, Equations (14) and (15):$${a}_{max}=\underset{a}{\mathrm{arg\,max}}\left({T}_{\left({s}_{t},a\right)}^{\pi}\left({t}_{i-1}\right)\right),$$$${Q}_{L}:{T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i}\right)=\left(1-\alpha \right)\cdot {T}_{\left({s}_{t-1},{a}_{t-1}\right)}^{\pi}\left({t}_{i-1}\right)+\alpha \left[{r}_{t}-\gamma \cdot {T}_{\left({s}_{t-1},{a}_{max}\right)}^{\pi}\left({t}_{i-1}\right)\right]$$
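The six update rules above can be sketched as a single dispatch function over a tabular policy. The data structures and names below are assumptions for illustration; the update formulas follow Equations (9)–(15) as written in the text.

```python
def update_T(method, T, s_prev, a_prev, r_t, i, alpha=0.1, gamma=0.9,
             s_t=None, actions=()):
    """One step of each policy-table update from Equations (9)-(15).
    T maps (state, action) -> value; i >= 1 is the iteration index."""
    key = (s_prev, a_prev)
    prev = T.get(key, 0.0)
    if method == 'OR':        # Eq. (9): keep only the last reward
        T[key] = r_t
    elif method == 'SAR':     # Eq. (10): accumulate every reward
        T[key] = prev + r_t
    elif method == 'MAR':     # Eq. (11): divide the accumulated value by i
        T[key] = (prev + r_t) / i
    elif method == 'OL-LR':   # Eq. (12): add a fraction alpha of the reward
        T[key] = prev + alpha * r_t
    elif method == 'LF-LR':   # Eq. (13): learn with alpha, forget with 1 - alpha
        T[key] = (1 - alpha) * prev + alpha * r_t
    elif method == 'QL':      # Eqs. (14)-(15), as written in the text
        a_max = max(actions, key=lambda a: T.get((s_t, a), 0.0))
        T[key] = (1 - alpha) * prev + alpha * (r_t - gamma * T.get((s_prev, a_max), 0.0))
    return T[key]
```

For example, `update_T('LF-LR', {('s0','a0'): 1.0}, 's0', 'a0', 3.0, 1, alpha=0.5)` blends the stored value 1.0 with the new reward 3.0 and returns 2.0.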

#### 3.2. Exploring Reward Strategies

#### 3.2.1. Only Positive (O-P) Reward Strategies

#### 3.2.2. Positive Negative (P-N) Reward Strategies

## 4. Simulation Results and Discussion

#### 4.1. Influence of the Reward Window

Part of the reward is computed over the ${Tw}_{2}$ window, and the remaining part from the beginning of the ${Tw}_{1}$ window (Figure 11). An action performed at ${t}_{i-1}$ produces an effect that is evaluated when the reward is calculated at ${t}_{i}$. When the size of the window grows during the control period, the ${Tw}_{1}$ part also grows, but ${Tw}_{2}$ remains invariant. To produce positive rewards it is necessary to reduce the MSE; therefore, during ${Tw}_{2}$, the increases in ${Tw}_{1}$ must be compensated. A larger ${Tw}_{1}$ gives a larger accumulated error in this part, which is more difficult to compensate during ${Tw}_{2}$ since the squared error can only be positive. It can then be concluded that the optimal window size for ∆MSE-RS is the control period, in this case, 100 ms.

For ∆Mean-RS, the window is likewise divided into a ${Tw}_{2}$ part and a growing ${Tw}_{1}$ part (Figure 11). However, unlike ∆MSE-RS, a longer ${Tw}_{1}$ produces less accumulated error in ${Tw}_{1}$ since, in this case, the positive errors compensate for the negative ones, and the accumulated error tends to 0. Therefore, ${Tw}_{2}$ has a greater influence on the window, and learning is faster.
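A windowed ∆MSE reward of the kind analyzed in this section can be sketched as follows. The class and variable names are assumptions for illustration, not the paper's implementation: the reward is positive whenever the MSE over the sliding window drops between consecutive reward instants.

```python
from collections import deque

class DeltaMSEReward:
    """Illustrative windowed Delta-MSE reward: positive when the mean squared
    error over the sliding window Tw decreases between reward calculations."""
    def __init__(self, window_size):
        self.errors = deque(maxlen=window_size)  # sliding window Tw
        self.prev_mse = None

    def step(self, error):
        self.errors.append(error)
        mse = sum(e * e for e in self.errors) / len(self.errors)
        reward = 0.0 if self.prev_mse is None else self.prev_mse - mse
        self.prev_mse = mse
        return reward
```

With a shrinking tracking error the windowed MSE falls and the rewards stay positive; a shorter window lets old errors leave the buffer sooner, which mirrors the discussion of window size above.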

#### 4.2. Influence of the Size of the Reward

#### 4.3. Combination of Individual Reward Strategies

## 5. Conclusions and Future Works

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 2.** Comparison of power output (**up-left**), generator torque (**up-right**), and pitch angle for proportional-integral-derivative (PID) and different reinforcement learning (RL) control strategies.

**Figure 3.** Distribution of the error when the ∆Mean Squared Error Reward Strategy (∆MSE-RS) and summation of all previous rewards (SAR) are applied for the first 25 iterations.

**Figure 4.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the one-reward (OR) update policy.

**Figure 5.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the SAR update policy.

**Figure 6.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the mean of all previous rewards (MAR) update policy.

**Figure 7.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the only learning with learning rate (OL-LR) update policy.

**Figure 8.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the learning and forgetting with learning rate (LF-LR) update policy.

**Figure 9.** Evolution of the MSE (**left**) and error centroid radius (**right**) for different reward strategies and the Q-learning (QL) update policy.

**Figure 10.** Evolution of the MSE for ∆MSE-RS (**left**) and ∆MEAN-RS (**right**) and different temporal windows.

**Figure 13.** Evolution of the MSE (**left**) and variance (**right**) for different reward sizes when the velocity reward strategy (VRS) is applied.

**Figure 14.** Evolution of the MSE (**left**) and variance (**right**) for different reward sizes when ∆MSE-RS is applied.

**Figure 15.** Evolution of the MSE (**left**) and variance (**right**) for different reward sizes when ∆Mean-RS is applied.

**Figure 16.** Evolution of the MSE (**left**) and variance (**right**) for different combinations of the position reward strategy (PRS) and velocity reward strategy (VRS).

**Figure 17.** Evolution of the MSE (**left**) and variance (**right**) for combinations of MSE-RS, Mean-RS, and VRS.

**Figure 18.** Evolution of the MSE (**left**) and variance (**right**) for combinations of PRS, MSE-RS, Mean-RS, and ∆MSE-RS.

**Figure 19.** Evolution of the MSE (**left**) and variance (**right**) for combinations of PRS, MSE-RS, Mean-RS, and ∆Mean-RS.

| Parameter | Description | Value/Units |
|---|---|---|
| ${L}_{a}$ | Inductance of the armature | 13.5 mH |
| ${K}_{g}$ | Constant of the generator | 23.31 |
| ${K}_{\varphi}$ | Magnetic flux coupling constant | 0.264 V/rad/s |
| ${R}_{a}$ | Resistance of the armature | 0.275 Ω |
| ${R}_{L}$ | Resistance of the load | 8 Ω |
| $J$ | Inertia | 6.53 kg m^{2} |
| $R$ | Radius of the blade | 3.2 m |
| $\rho $ | Density of the air | 1.223 kg/m^{3} |
| ${K}_{f}$ | Friction coefficient | 0.025 N m/rad/s |
| $[{c}_{1},{c}_{2},{c}_{3}]$ | ${C}_{p}$ constants | $\left[0.73, 151, 0.58\right]$ |
| $[{c}_{4},{c}_{5},{c}_{6}]$ | ${C}_{p}$ constants | $\left[0.002, 2.14, 13.2\right]$ |
| $[{c}_{7},{c}_{8},{c}_{9}]$ | ${C}_{p}$ constants | $\left[18.4, -0.02, -0.003\right]$ |
| $[{K}_{\theta},{T}_{\theta}]$ | Pitch actuator constants | $\left[0.15, 2\right]$ |

| Reward \ Policy Update | OR | SAR | MAR | OL-LR | LF-LR | QL |
|---|---|---|---|---|---|---|
| PRS | 402.86 | 407.59 | 403.22 | 404.11 | 402.28 | 406.53 |
| VRS | 306.71 | 265.62 | 271.56 | 266.37 | 281.39 | 287.83 |
| MSE-RS | 405.24 | 407.81 | 403.03 | 403.22 | 405.88 | 405.43 |
| MEAN-RS | 401.07 | 402.86 | 404.10 | 405.73 | 401.63 | 405.32 |
| ∆MSE-RS | 308.49 | 270.21 | 273.08 | 274.71 | 282.84 | 274.33 |
| ∆MEAN-RS | 307.50 | 272.93 | 274.47 | 277.17 | 287.18 | 284.73 |
| PID | 394.09 | | | | | |

| Reward \ Policy Update | OR | SAR | MAR | OL-LR | LF-LR | QL |
|---|---|---|---|---|---|---|
| PRS | 6.80 | 6.79 | 6.80 | 6.80 | 6.80 | 6.79 |
| VRS | 7.22 | 7.06 | 7.13 | 7.09 | 7.17 | 7.18 |
| MSE-RS | 6.79 | 6.79 | 6.80 | 6.80 | 6.79 | 6.79 |
| MEAN-RS | 6.80 | 6.80 | 6.80 | 6.79 | 6.80 | 6.79 |
| ∆MSE-RS | 7.21 | 7.06 | 7.07 | 7.04 | 7.16 | 7.14 |
| ∆MEAN-RS | 7.21 | 7.05 | 7.12 | 7.03 | 7.17 | 7.16 |
| PID | 7.25 | | | | | |

| Reward \ Policy Update | OR | SAR | MAR | OL-LR | LF-LR | QL |
|---|---|---|---|---|---|---|
| PRS | 123 | 126 | 123 | 123 | 123 | 125 |
| VRS | 193 | 125 | 152 | 137 | 156 | 174 |
| MSE-RS | 124 | 124 | 123 | 123 | 124 | 124 |
| MEAN-RS | 122 | 123 | 124 | 124 | 122 | 124 |
| ∆MSE-RS | 183 | 127 | 130 | 122 | 159 | 150 |
| ∆MEAN-RS | 180 | 124 | 149 | 121 | 168 | 165 |
| PID | 273 | | | | | |

| Reward | MSE [W] | Mean [kW] | Var [kW] |
|---|---|---|---|
| PRS | 404.10 | 6.80 | 124 |
| MSE-RS | 403.03 | 6.80 | 123 |
| MEAN-RS | 404.81 | 6.80 | 124 |
| VRS | 272.90 | 7.08 | 115 |
| ∆MSE-RS | 267.91 | 7.07 | 135 |
| ∆MEAN-RS | 269.61 | 7.08 | 137 |
| PRS·VRS | 262.20 | 7.07 | 131 |
| VRS·MSE-RS | 265.50 | 7.05 | 127 |
| VRS·MEAN-RS | 270.30 | 7.04 | 117 |
| PRS·∆MSE-RS | 264.99 | 7.80 | 125 |
| MSE-RS·∆MSE-RS | 284.39 | 7.00 | 111 |
| MEAN-RS·∆MSE-RS | 274.10 | 7.03 | 118 |
| PRS·∆MEAN-RS | 266.61 | 7.04 | 119 |
| MSE-RS·∆MEAN-RS | 280.54 | 7.01 | 115 |
| MEAN-RS·∆MEAN-RS | 285.60 | 7.00 | 116 |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sierra-García, J.E.; Santos, M.
Exploring Reward Strategies for Wind Turbine Pitch Control by Reinforcement Learning. *Appl. Sci.* **2020**, *10*, 7462.
https://doi.org/10.3390/app10217462
