Digital Twins-Assisted Design of Next-Generation Advanced Controllers for Power Systems and Electronics: Wind Turbine as a Case Study

This paper proposes a novel adaptive controller based on the digital twin (DT) concept by integrating software-in-loop (SIL) and hardware-in-loop (HIL) environments. This work aims to reduce the difference between the SIL controller and its physical controller counterpart using the DT concept. To highlight the applicability of the suggested methodology, the regulation control of a horizontal-axis variable speed wind turbine (WT) is considered for design and assessment purposes. In the presented digital twin framework, the active disturbance rejection controller (ADRC) is implemented for the pitch angle control of the WT plant in both SIL and HIL environments. The design of the ADRC controllers in the DT framework is accomplished by adopting the deep deterministic policy gradient (DDPG) in two stages: (i) by employing a fitness evaluation of wind speed error, the internal coefficients of the HIL controller are adjusted based on DDPG for the regulation of the WT plant, and (ii) the difference between the rotor speed waveforms in HIL and SIL is reduced by DDPG to obtain similar output behavior of the system in these environments. Several DT-based examinations are conducted to validate the effectiveness, high dynamic performance, robustness and adaptability of the suggested method in comparison to prevalent state-of-the-art techniques. The suggested controller is seen to be significantly more efficient, especially in the compensation of high aerodynamic variations, unknown uncertainties and mechanical stresses on the plant drive train.


Introduction and Preliminaries
Undeniably, renewable energy can help countries meet their development goals through the provision of access to clean, secure, reliable and affordable energy [1,2]. Wind energy is one of the fastest growing and most promising energy sources, and its development has progressed tremendously worldwide. Therefore, wind turbine (WT) power generation has been growing over the past decades [3,4]. Nowadays, multi-megawatt WTs are common in both off-shore and on-shore physical products of wind farms [5]. On the other hand, new-generation information technologies such as digital twins (DT) [6,7] are increasingly being applied in the wind turbine industry.
Nowadays, reinforcement learning (RL) algorithms have been used increasingly in a wide range of systems, such as robotics, energy management, etc. [35]. RL is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its actions and experiences, and deep RL builds on artificial neural networks with representation learning. The most popular RL algorithms are SARSA [36] and the deep Q network (DQN) [37,38]. The key advances of DQN were the inclusion of an experience replay buffer (to overcome data correlation) and a different approach for the target Q-network, whose weights change with the update of the main Q-network to break the correlation between both networks. However, DQN was not designed for continuous states, which are deeply relevant to VS-WT control systems. Recently, to solve the DQN problems, a new deep RL algorithm, called deep deterministic policy gradients (DDPG) [39-42], has achieved good performance in many simulated continuous control problems. The main advantage of the DDPG algorithm is that it can generate continuous actions, which is very valuable in a practical process.
In this work, a new application of a DT-based control strategy is introduced and implemented on a WT plant. The DT controller in this application is a virtual replica of the physical controller and can update itself according to the measured information from its pre-designed physical counterpart. The ADRC controller has been adopted in the HIL and SIL environments, and the parameters of the established controllers have been adjusted by the DDPG algorithm in the DT manner. For the adaptive realization of the DT concept, the DDPG algorithm is applied to the WT system in two stages: (i) for the regulation of the wind speed in the HIL environment and (ii) for minimization of the difference between the system output behaviors of the HIL and SIL environments. Several scenarios in the context of the WT have been carried out to validate the correctness and applicability of the suggested DT controller method. This paper is organized as follows. Section 2 establishes the nonlinear model of variable speed wind turbines and derives its state-space equations. Then, Section 3 introduces the optimized control system with an integrated control algorithm combining ADRC with reinforcement learning. In Section 4, we describe and discuss the digital twin concept for implementing the proposed controller in the HIL and SIL environments. The results of simulation in the MATLAB platform and of implementing this controller on DSP hardware as a digital twin concept are presented in Section 5. Finally, the concluding remarks are summarized in Section 6.


Variable Speed Wind Turbine Model
The two-mass model structure of the WT depicted in Figure 2, which is commonly used in the literature, is considered in the current work to illustrate the WT dynamics [43]. The total power of the wind has a direct relation with wind speed, as in the following equation [23]:

P_w = (1/2) ρ A V³

where ρ is the air density (kg/m³), A is the swept area of the turbine (m²), and V is the wind speed (m/s). It has long been proven that if the wind speed were zero after passing the turbine, the total wind energy would be absorbed by the turbine [23]. However, due to the wind losses, it is practically impossible to transfer all the energy. For this reason, the power coefficient (C_P) is introduced, which represents the aerodynamic efficiency of the wind turbine. Using C_P, the aerodynamic power of the turbine (P_a) can be expressed as follows:

P_a = (1/2) ρ A V³ C_P(λ, β)

The power coefficient is a nonlinear function that depends on two paramount factors, the tip speed ratio (λ) and the blade pitch angle (β), as in the following commonly used numerical equation:

C_P(λ, β) = 0.5176 (116/λ_i − 0.4β − 5) e^(−21/λ_i) + 0.0068λ

The parameter λ_i can be calculated as follows:

1/λ_i = 1/(λ + 0.08β) − 0.035/(β³ + 1)

The parameter λ is calculated from the blade tip speed and the wind speed upstream of the rotor as:

λ = ω_r R / V

with ω_r being the rotor angular speed and R the rotor radius. The power coefficient curves of the wind turbine are shown as a function of λ and β in Figure 3.
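As an illustration, the aerodynamic relations above can be sketched in Python; the C_P coefficients below are the widely used generic values, not necessarily those of the turbine in [43]:

```python
import math

def tip_speed_ratio(omega_r, R, V):
    """lambda = omega_r * R / V (blade tip speed over upstream wind speed)."""
    return omega_r * R / V

def power_coefficient(lam, beta):
    """Numerical C_P(lambda, beta) approximation; the coefficients are the
    common generic values and are assumptions, not taken from [43]."""
    lam_i = 1.0 / (1.0 / (lam + 0.08 * beta) - 0.035 / (beta**3 + 1.0))
    return (0.5176 * (116.0 / lam_i - 0.4 * beta - 5.0)
            * math.exp(-21.0 / lam_i) + 0.0068 * lam)

def aerodynamic_power(rho, A, V, lam, beta):
    """P_a = 0.5 * rho * A * V^3 * C_P(lambda, beta)."""
    return 0.5 * rho * A * V**3 * power_coefficient(lam, beta)
```

Note that any physically meaningful C_P stays below the Betz limit of about 0.593, which is a quick sanity check on such a sketch.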

The nonlinear wind turbine model can be written in a generalized nonlinear form as follows [43]:

ẋ = G(x) + B u

with the state vector x = [ω_r, ω_g, δ, β]ᵀ and the control input u = β_r, where ω_r is the rotor speed, ω_g is the generator speed and δ is the twist angle; τ_β is the time constant of the pitch actuator and β_r is the pitch angle control. T_g is the generator torque, J_r and J_g are the rotor and generator inertias, N_g is the gear ratio, and D_s and K_s are the drive-train damping and spring constants, respectively. The objective of this paper is to develop a novel digital twin-based pitch angle controller for rotor speed regulation in Region III of the wind turbine operation, by restricting the power derived from the wind turbine. The parameters of the wind turbine system are borrowed from [43].
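A minimal sketch of the two-mass drive-train dynamics, assuming the state ordering x = [ω_r, ω_g, δ, β] and treating the aerodynamic torque T_a as a given parameter; the exact vectors in [43] may differ:

```python
def wt_state_derivative(x, u, p):
    """Two-mass WT model sketch. x = [omega_r, omega_g, delta, beta],
    u = beta_r (pitch command). The aerodynamic torque Ta and all
    mechanical parameters come from the dict p (assumed interface)."""
    omega_r, omega_g, delta, beta = x
    # Shaft torque transmitted through the flexible drive train
    shaft = p["Ks"] * delta + p["Ds"] * (omega_r - omega_g / p["Ng"])
    d_omega_r = (p["Ta"] - shaft) / p["Jr"]          # rotor-side dynamics
    d_omega_g = (shaft / p["Ng"] - p["Tg"]) / p["Jg"]  # generator-side dynamics
    d_delta = omega_r - omega_g / p["Ng"]            # twist-angle dynamics
    d_beta = (u - beta) / p["tau_beta"]              # first-order pitch actuator
    return [d_omega_r, d_omega_g, d_delta, d_beta]
```

At a torque-balanced operating point (shaft torque equal to both T_a and N_g·T_g, matched speeds, settled pitch), all four derivatives vanish, which is a useful consistency check on the model.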

ADRC Technique
The ADRC is generally regarded as a model-free technique, since it does not need complete knowledge of the system. This method is introduced to deal with general nonlinear uncertain plants, as it can eliminate the impact of internal/external disturbances in real time. Since the ADRC originates from the traditional PID controller, it shares the PID controller's benefits of fast response and strong robustness. The block diagram of the ADRC controller is illustrated in Figure 4; it consists of a tracking differentiator (TD), a nonlinear state error feedback (NLSEF) law and an extended state observer (ESO). In Figure 4, w denotes the external disturbances, y is the output signal, and u is the control signal.
The TD is established to improve the ideal transient process and to provide smooth signals for the controller. One feasible second-order TD can be formulated as:

dv_1/dt = v_2
dv_2/dt = fhan(v_1 − v, v_2, r, h_0)

in which fhan(·) is the time-optimal synthesis function.
where the term v denotes the control objective, and v_1 and v_2 are the desired trajectory and its derivative, respectively. Likewise, r and h_0 are the speed and filtering factors, respectively. The ESO is adopted as a type of robust control that can estimate all the external disturbances and internal perturbations and then compensate for them to obtain the correct response. The relationship between input and output in the ESO is illustrated below:

e = z_1 − y
dz_1/dt = z_2 − β_1 e
dz_2/dt = z_3 − β_2 fal(e, α_1, δ) + b_0 u
dz_3/dt = −β_3 fal(e, α_2, δ)

where z_1, z_2 and z_3 are the observer output signals and β_1, β_2 and β_3 denote the design variables of the ESO.
The NLSEF is established in the ADRC structure to obtain the control input of the system by combining the estimated states (z_1, z_2) of the ESO with the output signals of the TD through the errors e_1 = v_1 − z_1 and e_2 = v_2 − z_2. The control law of the NLSEF is described by:

u_0 = k_1 fal(e_1, α_1, δ) + k_2 fal(e_2, α_2, δ)

where k_1 and k_2 are the proportional and differential parameters, respectively, and fal(.) is a nonlinear function, which is expressed as:

fal(e, α, δ) = |e|^α sign(e),   |e| > δ
fal(e, α, δ) = e / δ^(1−α),    |e| ≤ δ

Finally, the controller is achieved by:

u = u_0 − z_3 / b_0
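The fal(.) nonlinearity, one Euler step of the ESO, and the NLSEF law can be sketched as follows; the exponents α, threshold δ and observer gains below are illustrative choices, not values from the paper:

```python
import math

def fal(e, alpha, delta):
    """Nonlinear fal function used by both the ESO and the NLSEF."""
    if abs(e) > delta:
        return abs(e) ** alpha * math.copysign(1.0, e)
    return e / (delta ** (1.0 - alpha))  # linear segment near zero

def eso_step(z, y, u, h, b0, betas, alphas=(0.5, 0.25), delta=0.01):
    """One Euler step of the third-order ESO. z = [z1, z2, z3];
    betas = (beta1, beta2, beta3) are observer gains (assumed values)."""
    z1, z2, z3 = z
    e = z1 - y
    z1 += h * (z2 - betas[0] * e)
    z2 += h * (z3 - betas[1] * fal(e, alphas[0], delta) + b0 * u)
    z3 += h * (-betas[2] * fal(e, alphas[1], delta))
    return [z1, z2, z3]

def nlsef(e1, e2, k1, k2, alphas=(0.75, 1.25), delta=0.01):
    """NLSEF law: u0 = k1*fal(e1, a1, d) + k2*fal(e2, a2, d)."""
    return k1 * fal(e1, alphas[0], delta) + k2 * fal(e2, alphas[1], delta)
```

Note that fal is continuous at |e| = δ, since δ^α equals δ/δ^(1−α); this continuity is what lets the ESO gains stay fixed while the effective gain shrinks for large errors.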

Deep Reinforcement Learning
Reinforcement Learning (RL) is a widely used methodology in machine learning due to its potential to learn in highly complex environments. It is applicable in many research and practical areas, such as game theory, control theory, simulation-based optimization, multi-agent systems and statistics. Practically, there are numerous challenges to implementing RL in practical control problems, such as the temporal correlation of data, divergence of learning, and the continuous nature of inputs and outputs. Recently, the Deep Q-Network (DQN) has revealed a new set of possibilities to solve most of the mentioned problems; however, the control actions of DQN are restricted to a small action space, which decreases its applicability. Building on the advances of DQN and the actor-critic paradigm, deep deterministic policy gradients (DDPG) was proposed by Lillicrap et al. [44] as an algorithm that solves continuous control problems by integrating neural networks into the RL paradigm.
Reinforcement Learning is concerned with how agents ought to take actions in an environment to maximize the reward. More specifically, all RL applications involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment. This interaction process is formulated as a Markov Decision Process (MDP), which is described by the concepts below:
• Environment: The space through which the agent moves and which responds to the agent. The environment takes the agent's current state and action as input and returns as output the agent's reward and its next state.
• Agent: An agent tries to find an optimal policy mapping the state of the environment to an action that will, in turn, maximize the accumulated future rewards.
• State (s ∈ S): S is the state space, i.e., all possible states of the agent in the environment.
• Policy (π): The policy is the strategy that the agent employs to determine the next action based on the current state. It maps states to the actions that promise the highest reward.
• Action (a ∈ A): A is the set of all possible moves that the agent can make.
• Reward (r ∈ R): A reward is the feedback by which the success or failure of an agent's actions in a given state is evaluated.
• Value function (V^π): The expected long-term return with a discount, as opposed to the short-term reward.
• Q-value or action-value (Q): The Q-value is similar to V^π, except that it takes an extra parameter, the current action a.
The final goal of an RL subject is to learn a policy π : S → A that maximizes the expectation of the long-term discounted reward:

J = E[G_t],   G_t = Σ_{k=0}^{∞} γ^k r_{t+k}

where G_t is the total long-term discounted reward at each step, and γ ∈ (0, 1] is the discount factor, which dampens the rewards' effect on the agent's choice of action, making future rewards worth less than immediate rewards. Considering the target policy π : S → A, which maps each state to a deterministic action, the value function V^π is formulated as a depiction of the total discounted reward G_t for each s ∈ S.
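The discounted return G_t can be computed by a simple backward recursion, which also illustrates why γ < 1 makes distant rewards matter less:

```python
def discounted_return(rewards, gamma):
    """G_t = sum_k gamma^k * r_{t+k}, computed backward:
    g <- r + gamma * g, starting from the last reward."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, three unit rewards with γ = 0.5 give 1 + 0.5 + 0.25 = 1.75.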
Using the Bellman equation, V^π can be recursively described as:

V^π(s_t) = E[r_t + γ V^π(s_{t+1})]

The action-value function Q^π is represented analogously, based on the Bellman equation:

Q^π(s_t, a_t) = E[r_t + γ Q^π(s_{t+1}, a_{t+1})]

The policy that maximizes the action-value function or the value function is the optimal policy (π* = argmax_a Q*(s, a)). The DDPG algorithm contains two neural networks, the critic Q(s_t, a_t | θ^Q) and the actor μ(s_t | θ^μ), which have proven to perform well on continuous problems. Both functions, Q(s_t, a_t) and μ(s_t), are approximated by the aforementioned neural networks, where θ^Q and θ^μ are the weights of the critic and actor networks, respectively. The critic network is updated by minimizing the loss function via stochastic gradient descent:

L(θ^Q) = E[(y_t − Q(s_t, a_t | θ^Q))²],   where y_t = r_t + γ Q(s_{t+1}, μ(s_{t+1}) | θ^Q)

The actor network's coefficients θ^μ are updated based on the following policy gradient:

∇_{θ^μ} J ≈ E_{s∼ρ^β}[∇_a Q(s, a | θ^Q)|_{a=μ(s)} ∇_{θ^μ} μ(s | θ^μ)]   (22)

In Equation (22), ρ^β is the discounted state distribution under a behavior policy β that is distinct from the current policy π.
In the DDPG algorithm, a replay buffer is used to weaken the correlations existing in the input experiences, and target networks are exploited to stabilize the training procedure. According to the replay buffer mechanism, which uses a finite-size memory, the experience tuple e_t = (s_t, a_t, r_t, s_{t+1}) of each time step is saved in an R-sized experience memory D = {e_1, e_2, . . . , e_R}. In each step of the training process, a mini-batch of previously saved experiences is uniformly sampled from the memory to update the neural networks. In terms of the stability of the DDPG learning method, two additional neural networks, Q′(s, a | θ^Q′) and μ′(s | θ^μ′), named target networks, are also adopted for the critic and actor to avoid instability of DDPG learning. Both sets of weights, θ^Q′ and θ^μ′, are slowly updated from the current networks at each time step. Moreover, in the training phase, a Laplacian exploration noise (N), which is represented in Equation (23), is added to the actions provided by the agent (i.e., a_t = μ(s_t | θ^μ) + N) for exploration purposes. The pseudo-code for the standard DDPG algorithm is presented in Algorithm 1 [45], and the DDPG-based online learning framework is illustrated in Figure 5 [46].
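The two stabilizing mechanisms above, the target-network critic target y_i and the soft update of the target weights, reduce to a few lines. The terminal-state (done) handling and the soft-update rate τ are standard DDPG details assumed here rather than taken from the text:

```python
def critic_targets(rewards, next_q_target, dones, gamma):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})) for a sampled minibatch.
    next_q_target holds the target-network value for each next state;
    terminal transitions (done=True) drop the bootstrap term (assumption)."""
    return [r + gamma * q * (0.0 if d else 1.0)
            for r, q, d in zip(rewards, next_q_target, dones)]

def soft_update(target_w, online_w, tau):
    """theta' <- tau * theta + (1 - tau) * theta': the target network
    slowly tracks the online network to keep the targets y_i stable."""
    return [tau * w + (1.0 - tau) * tw for tw, w in zip(target_w, online_w)]
```

With a small τ (e.g. 0.001), the targets change slowly between updates, which is what breaks the feedback loop that destabilizes naive Q-learning with function approximation.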


Algorithm 1: Framework of the DDPG for the WT system.
1: Randomly initialize the critic Q(s, a | θ^Q) and actor μ(s | θ^μ) networks with weights θ^Q and θ^μ
2: Initialize the target networks Q′ and μ′ with weights θ^Q′ ← θ^Q, θ^μ′ ← θ^μ
3: Set up an empty replay buffer R
4: for episode = 1 to M do
5: Initialize a Laplacian noise process N for exploration
6: Receive the initial observation state s_1
7: for t = 1 to T do
8: Apply the action a_t = μ(s_t | θ^μ) + N to the environment
9: Observe the next state s_{t+1} and reward r_t
10: Store the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer R
11: Sample a random minibatch of K transitions (s_i, a_i, r_i, s_{i+1}) from R
12: Set y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^μ′) | θ^Q′)
13: Update the critic by minimizing the loss: L = (1/K) Σ_i (y_i − Q(s_i, a_i | θ^Q))²
14: Update the actor policy using the sampled policy gradient
15: Update the target networks: θ^Q′ ← τ θ^Q + (1 − τ) θ^Q′, θ^μ′ ← τ θ^μ + (1 − τ) θ^μ′
16: end for
17: end for


Digital Twin Controller of WT System
For the control problem of a typical WT system, the digital twin (DT) method has been suggested as a significant strategy for design, testing, and improvement. Digital twin technology in WT systems is composed of physical products, virtual products, and the connection data that ties the physical and virtual products together. The DT is a digital replica or representation of a physical object or an intangible system that can be examined, altered, and tested without interacting with it in the real world, thereby avoiding negative consequences. It is a bridge between the digital world and the physical world. Digital twins have achieved remarkable popularity in recent years, mainly in the industrial field.
Under this approach, a digital twin of the wind turbine pitch angle controller proposed in this paper has been defined and implemented on a Texas Instruments (TI) digital signal processor (DSP) computing device. As shown in Figure 6, the hardware-in-loop (HIL) [47,48] concept enables testing of the controller algorithms on the actual controller hardware deployed on the wind turbine. In contrast, the software-in-the-loop (SIL) concept allows testing of the algorithms but neglects the test of the controller hardware. The purpose of using digital twins (DT) here is to design the controllers in such a way that the system in the SIL environment behaves similarly to the HIL one. In the suggested methodology, the ADRC controller is adopted in both the SIL and HIL environments for the pitch angle control of the WT plant. In this work, the design of the DT-based control strategy has been realized in two stages by the DDPG algorithm, as depicted in Figure 7, and these are discussed in the following sub-sections.

Design of the HIL Controller
Firstly, the DDPG scheme with the actor-critic architecture is applied as a parameter tuner to provide the regulative signals that set the NLSEF gains of the HIL setup adaptively. The following equation shows the coefficients of the self-adapting ADRC method:

k_1^HIL = k_{1,0}^HIL + Δk_1^HIL,   k_2^HIL = k_{2,0}^HIL + Δk_2^HIL   (24)

where k_{1,0}^HIL and k_{2,0}^HIL are the NLSEF initial coefficients in the HIL setup, and Δk_1^HIL and Δk_2^HIL are the regulatory signals, which are tuned by the DDPG algorithm. The state variables, comprising the rotor speed of the HIL setup ω_r^HIL, its error (e^HIL = ω_r^HIL − ω_ref^HIL) and its error integral, are expressed as s_t^HIL = {ω_r^HIL, e^HIL, ∫e^HIL dt}. Equation (25) shows the reward function for designing the HIL controller, which is the basis for evaluating the DDPG control actions (k_1 and k_2).


With an increase in the rotor speed error resulting from a system perturbation, r_t^HIL decreases, and the weights of the actor and critic networks need to be updated accordingly. More specifically, to mitigate the effect of the perturbation, the actor network senses the state variables s_t^HIL and then generates two continuous regulative signals. The critic network then receives s_t^HIL, k_{1,0}^HIL and k_{2,0}^HIL, and the weights of the critic network are trained. The function Q(s_t, a_t) is then derived in the output layer, which leads to an updated DDPG network with adapted regulative signals to feed the controller.
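The gain adaptation of Equation (24), the agent's state vector, and a squared-error reward of the kind described can be sketched as follows; the reward shape is illustrative, since the exact form of Equation (25) is not reproduced here:

```python
def adapted_gains(k10, k20, dk1, dk2):
    """Eq. (24): NLSEF gains = initial values + DDPG regulative signals."""
    return k10 + dk1, k20 + dk2

def hil_state(omega_r, omega_ref, err_int):
    """State fed to the agent: rotor speed, its error, and the running
    error integral (discrete accumulation assumed for illustration)."""
    e = omega_r - omega_ref
    return (omega_r, e, err_int + e)

def hil_reward(e, weight=1.0):
    """Illustrative reward: decreases as the squared rotor-speed error
    grows, so a perturbation lowers r_t and triggers network updates."""
    return -weight * e * e
```

Under this shaping, the reward is maximal (zero) only at perfect speed regulation, which matches the qualitative behavior described for r_t^HIL above.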

Design of the Digital Twin Controller Based on the System Output Specification of the HIL Setup
In the second step, the rotor speed output of the HIL environment is introduced as the reference input for the rotor speed regulation of the SIL environment. To do this, similarly to the HIL controller design, the NLSEF coefficients are adaptively regulated by employing the DDPG method, given as:

k_1^SIL = k_{1,0}^SIL + Δk_1^SIL,   k_2^SIL = k_{2,0}^SIL + Δk_2^SIL   (26)

where k_{1,0}^SIL and k_{2,0}^SIL are the NLSEF initial coefficients of the SIL setup, and Δk_1^SIL and Δk_2^SIL are the regulatory signals, tuned by the DDPG algorithm according to the pre-designed HIL controller.
For the design of the SIL controller, the state variables are chosen as s_t^SIL = {ω_r^SIL, e^SIL, ∫e^SIL dt}, where ω_r^SIL and e^SIL are the rotor speed and rotor speed error in the SIL. A reward function, given in Equation (27), is also constructed for the optimal setting of the DT controller in the SIL setup. Based on the reward function of (27), the actor and critic networks of the DDPG scheme are trained in a way that minimizes the difference between the output responses of the WT system in the SIL and HIL environments. To do this, the actor takes the state variable s_t^SIL and generates continuous regulatory signals. Likewise, s_t^SIL, k_{1,0}^SIL and k_{2,0}^SIL are considered as the inputs of the critic network, and a continuous Q-value is produced at the output of the network.
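A sketch of the twin-matching reward idea: the SIL agent is penalized by the gap between the SIL and HIL rotor speeds, so maximizing the reward pulls the SIL response toward the HIL one. The exact form of Equation (27) may differ; this is an illustrative quadratic penalty:

```python
def twin_matching_reward(omega_sil, omega_hil, weight=1.0):
    """Illustrative reward for the digital-twin stage: zero when the SIL
    rotor speed exactly matches the HIL output (used here as the
    reference), increasingly negative as the two waveforms diverge."""
    gap = omega_sil - omega_hil
    return -weight * gap * gap
```

Because the HIL output is the reference, an agent maximizing this reward is implicitly minimizing the difference between the HIL and SIL output waveforms, which is precisely the design goal of the second stage.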

Experimental Results
For the optimal design of the ADRC controller based on the digital twin concept, the actor and critic networks are trained over 200 episodes. The DDPG learning agent interacts with the environment at a frequency of 10 kHz, which corresponds to one training step. The weights of the actor and critic networks are optimized with base learning rates of 10^−4 and 10^−3, respectively, employing the Adam optimizer.
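For reference, the reported training setup can be collected into a small configuration sketch; only the episode count, interaction rate and learning rates come from the text, while anything else (network sizes, buffer size, τ) would be additional design choices not listed here:

```python
# Training configuration reported in the text (values stated above);
# this is a summary dict, not an actual training script.
CONFIG = {
    "episodes": 200,          # actor/critic trained over 200 episodes
    "agent_step_hz": 10_000,  # agent-environment interaction at 10 kHz
    "actor_lr": 1e-4,         # Adam base learning rate, actor network
    "critic_lr": 1e-3,        # Adam base learning rate, critic network
}
```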
In the following, the effectiveness and efficiency of the proposed control system are tested by Real-Time SIL (RT-SIL) MATLAB simulation experiments, as well as on a Real-Time HIL (RT-HIL) board. The output results are evaluated and verified under the following three typical scenarios of the WT process: (i) step changes of wind speed, (ii) random changes of wind speed and (iii) parametric uncertainty in the turbine model. In the comparative analysis of the real-time setup, the output results of the proposed method are compared with the ADRC and PI controllers in the HIL and SIL environments.

Scenario I: The Step Changes in Wind Speed
In the first scenario, a multi-step variation of wind speed (varying within [12 m/s, 21 m/s]) is applied to the nonlinear WT plant, as depicted in Figure 8. The average accumulated reward for the fully simulated training phase of 200 episodes in HIL is depicted in Figure 9. As shown in this figure, the reward chart follows an upward trend from episode 5 and is almost constant from episode 20 onwards. This indicates that the rotor speed error at the HIL output is significantly reduced and that the DDPG algorithm calculates the controller coefficients k_1^HIL and k_2^HIL accurately.
Inventions 2020, 5, x FOR PEER REVIEW 12 of 19 figure, the reward chart follows an upward trend since episode 5 and has been almost constant since episode 20 onwards. This indicates that the rotor speed error at the HIL output is significantly reduced, and the DDPG algorithm calculates the coefficients controller and accurately.  The HIL output comparative results of ADRC-DDPG, ADRC and PI controllers under the multistep disturbance are presented in Figure 10. From Figure 10, it is clear that the ADRC-DDPG and ADRC controllers obtain satisfactory performance to control the WT system, but the rotor speed outcomes of the PI controller experience large deviations. It is also observed that the transient specifications of the rotor speed in terms of settling time and overshoot have been remarkably ameliorated in the suggested controller compared to the other two types of pitch angle control strategies. figure, the reward chart follows an upward trend since episode 5 and has been almost constant since episode 20 onwards. This indicates that the rotor speed error at the HIL output is significantly reduced, and the DDPG algorithm calculates the coefficients controller and accurately.  The HIL output comparative results of ADRC-DDPG, ADRC and PI controllers under the multistep disturbance are presented in Figure 10. From Figure 10, it is clear that the ADRC-DDPG and ADRC controllers obtain satisfactory performance to control the WT system, but the rotor speed outcomes of the PI controller experience large deviations. It is also observed that the transient specifications of the rotor speed in terms of settling time and overshoot have been remarkably ameliorated in the suggested controller compared to the other two types of pitch angle control strategies. The HIL output comparative results of ADRC-DDPG, ADRC and PI controllers under the multi-step disturbance are presented in Figure 10. 
From Figure 10, it is clear that the ADRC-DDPG and ADRC controllers achieve satisfactory control of the WT system, whereas the rotor speed obtained with the PI controller exhibits large deviations. It is also observed that the transient specifications of the rotor speed, in terms of settling time and overshoot, are remarkably improved by the suggested controller compared with the other two pitch angle control strategies.
Besides, the average reward of the DDPG agent when minimizing the difference between the output responses of the HIL and SIL environments is illustrated in Figure 11. From Figure 11, it is noted that the average reward eventually increases and stabilizes over the 200 episodes, which confirms the correctness and usefulness of the DDPG agent for the studied digital twin-based WT system. The SIL responses of the ADRC-DDPG, ADRC and PI controllers under the considered wind speed disturbance are shown in Figure 12. The outcomes of Figure 12 reveal that the suggested controller achieves a superior rotor speed response compared with the other two pitch angle control strategies. By comparing the curves of Figures 10 and 12, it can be inferred that under the action of the ADRC-DDPG, the difference between the rotor speed waveforms of the HIL and SIL is further reduced.
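The twin-matching objective above can be made concrete with a small sketch of the reward signal. The paper only states that the difference between the HIL and SIL rotor-speed waveforms is minimized; the negative mean-squared-difference shaping below is an assumed, illustrative choice, with the waveforms taken as equal-length sample lists.

```python
def twin_matching_reward(omega_hil, omega_sil):
    """Reward used to train the SIL controller so its rotor-speed response
    tracks the HIL response (digital-twin matching). The exact shaping is
    an assumption: negative mean squared waveform difference, so a higher
    reward means the two environments behave more alike."""
    assert len(omega_hil) == len(omega_sil)
    sq = sum((h - s) ** 2 for h, s in zip(omega_hil, omega_sil))
    return -sq / len(omega_hil)
```

With this shaping, identical HIL and SIL waveforms yield the maximum reward of zero, which matches the stabilizing trend seen in Figure 11.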
The performance indices corresponding to the dynamic specifications of the WT system under the multi-step wind speed, such as settling time, overshoot and output error, are furnished in Table 1. From the quantitative analysis of Table 1, it is noticed that with the ADRC-DDPG, the considered dynamic specifications are greatly improved, outperforming the ADRC and PI controllers on the same investigated plant.

The Random Changes in Wind Speed
To study the feasibility of the adaptive ADRC-DDPG controller under a more realistic operating condition of the WT plant, a random variation of wind speed (fluctuating within [12 m/s, 18 m/s]) is applied to the system, as depicted in Figure 13.
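A random wind profile of this kind can be generated, for example, as a bounded random walk. The [12 m/s, 18 m/s] band comes from the text; the random-walk model, step size, and seeding are illustrative assumptions rather than the profile actually used in Figure 13.

```python
import random

def random_wind_profile(n_steps, v_min=12.0, v_max=18.0, seed=0):
    """Bounded random-walk wind-speed profile in m/s (illustrative model).
    Each step adds a small Gaussian fluctuation and clips the result to
    the [v_min, v_max] band stated in the text."""
    rng = random.Random(seed)          # seeded for reproducibility
    v = rng.uniform(v_min, v_max)      # random starting speed in the band
    profile = []
    for _ in range(n_steps):
        v += rng.gauss(0.0, 0.3)       # small random fluctuation
        v = min(max(v, v_min), v_max)  # keep within the band
        profile.append(v)
    return profile
```

Such a profile exercises the controller across the whole operating band without the discrete jumps of the multi-step scenario.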

The Parametric Uncertainty in the Turbine Model
In this scenario, the robustness and superiority of the suggested controller are evaluated by imposing uncertainties on the WT model in both the HIL and SIL environments as follows: R_b = +20%, J_r = +40% and t_B = +60%. The effects of these variations on the output rotor speed are quantified using two standard error criteria: the Mean Square Error (MSE) and the Root Mean Square Error (RMSE). Figure 16a,b depicts the MSE and RMSE bar charts for the designed pitch angle controllers in both the HIL and SIL. From this figure, not only does the suggested ADRC-DDPG controller show the best performance against uncertainties but, more importantly, in line with the digital twin concept, the SIL output variations almost completely follow the HIL output variations.
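The two error criteria used in this scenario are standard; a minimal implementation over sampled rotor-speed waveforms (taken here as equal-length lists, an assumption about the data layout) is:

```python
import math

def mse(reference, measured):
    """Mean Square Error between the reference and measured rotor speed."""
    n = len(reference)
    return sum((r - m) ** 2 for r, m in zip(reference, measured)) / n

def rmse(reference, measured):
    """Root Mean Square Error; same units as the rotor speed itself,
    which makes it easier to read off the bar charts of Figure 16."""
    return math.sqrt(mse(reference, measured))
```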

Conclusion
This paper concentrates on developing a novel adaptive ADRC controller based on the digital twin concept for pitch control of a nonlinear variable-speed WT plant. In this application, to regulate the rotor speed of the WT in HIL, the ADRC controller is first designed by the DDPG algorithm for this environment. Then, the output response of the HIL is taken as the reference for the design of the SIL controller. To do this, the ADRC of the SIL controller is designed by the DDPG algorithm, minimizing the difference between the rotor speed waveforms of the HIL and SIL. To verify the efficiency of the suggested digital twin controller, critical examinations are carried out for pitch angle control of the WT plant in both the SIL and HIL environments. Comprehensive examinations demonstrate the improved dynamic behavior of the digital twin-based system compared with state-of-the-art schemes.