Abstract
In the three-motor hybrid architecture, the auxiliary drive uses an electrically excited synchronous motor (EESM), which offers high torque density, a wide speed range, and strong resistance to demagnetization. However, the strong electromagnetic coupling between the field winding and the armature winding makes current control difficult, and traditional PID control has limitations in dynamic response and disturbance rejection. To solve this problem, a linear active disturbance rejection control (LADRC) method for the rotor of the EESM is proposed in this paper; a linear extended state observer (LESO) is used to estimate and compensate, in real time, the internal and external disturbances of the system (such as winding coupling and parameter perturbation). The method uses only the input and output of the system and does not depend on any mechanical parameters; as a result, the torque response is improved by 50% and the steady-state fluctuation is reduced by 10.2%. In addition, an adaptive dynamic programming (ADP) parameter optimization strategy is proposed to solve the bandwidth-parameter tuning problem of the LADRC algorithm under complex operating conditions, and a mathematical analysis of its optimality properties is given. Finally, the proposed method is compared with the traditional PI controller under several operating conditions of the EESM, and its effectiveness is validated by the corresponding results.
1. Introduction
With the pursuit of extreme performance and efficiency in new energy vehicles, the three-motor drive architecture is becoming increasingly popular. In this architecture, the electrically excited synchronous motor (EESM) used as the auxiliary drive shows significant advantages: its excitation current can be flexibly regulated, giving it excellent flux-weakening control capability and achieving both high-efficiency cruising and rapid dynamic response [,]. However, this also significantly increases the complexity of the motor controller. In addition to the torque distribution and dynamic coordination control mechanism between the main drive and auxiliary drive, each electrically excited motor requires an independent excitation current control loop, which is deeply coupled with the original torque/current control. This necessitates the design of advanced multivariable, multi-objective real-time optimization control strategies that integrate excitation regulation, efficiency maximization, dynamic power demand response, and redundancy management under fault conditions, ensuring that the three-motor control system achieves the optimal balance among performance, efficiency, and robustness.
Owing to its linear, fixed-gain structure, the PID controller lacks the capability to estimate and compensate for internal cross-coupling and external disturbances in real time, particularly under dynamic load conditions and d-axis current coupling disturbances, rendering it inadequate for the advanced requirements of multi-motor coordination, nonlinear decoupling, and multi-objective optimization. Especially given the auxiliary drive's demand for wide-speed operation and fast dynamic response, modern control strategies are required to break through performance bottlenecks while ensuring system robustness. Sliding mode control (SMC) [,] demonstrates improved robustness and faster response than PID, but its performance depends heavily on the design of the sliding surface and the switching gain. The chattering phenomenon inherent in SMC generates harmonic noise; moreover, chattering becomes more severe when the system states approach the sliding surface, limiting its practical application in high-precision scenarios. Model predictive control (MPC) shows good performance, but its superiority diminishes under parameter uncertainties []. Its core limitation lies in its heavy reliance on an accurate mathematical model of the machine, a critical drawback in practical applications, where machine parameters are subject to thermal drift and magnetic saturation. Additionally, for low-cost digital processors, the computational load of solving optimization problems online is a challenge. Against this backdrop, linear active disturbance rejection control (LADRC) [,,] has been proposed; it realizes disturbance-observer design and feedback control with a linear structure, without relying on precise system models, thus greatly simplifying engineering implementation. At the same time, the absence of complex computational procedures substantially reduces the overall computational burden.
Nevertheless, under the complex operating conditions of new energy vehicles, where different motor speeds and throttle depths correspond to varying torque, and frequent variations occur in the stator d-axis current and rotor excitation current, engineers still need to tune bandwidth parameters according to specific operating points to achieve optimal performance.
In the design of complex control systems, achieving a balance between optimal performance and robust stability has always been a central challenge for control engineers. To address this problem, the control theory community has developed various methods, each reflecting different design philosophies. Among them, Adaptive Dynamic Programming (ADP) and Linear Active Disturbance Rejection Control (LADRC) represent two important paradigms, i.e., performance-oriented and robustness-oriented strategies, respectively. The integration of these two approaches forms a solid theoretical foundation for constructing a new generation of intelligent and robust control architectures. From the perspective of control objectives, ADP emphasizes the optimization of control performance. At its core lies the minimization of a cost function, achieved through iterative learning to approximate the optimal control policy, allowing the system to reach the desired optimal performance. Theoretically, ADP can handle a wide range of systems, including nonlinear, time-varying, and uncertain ones. Its strength lies in unifying multiple objectives—such as control performance, energy consumption, and error minimization—within a single optimization framework. This makes ADP particularly suitable for EESM control problems, which require adaptive and dynamically optimized strategies.
In contrast, LADRC adopts a fundamentally different design philosophy. It does not aim to derive mathematically optimal control laws, but instead focuses on ensuring system stability and dynamic performance under uncertain disturbances and incomplete models. This is achieved through the use of an Extended State Observer (ESO) that estimates the “total disturbance” in real time and compensates it via feedback. A key advantage of LADRC is its minimal dependence on accurate system models. Controller design requires only the system order and a desired bandwidth. This makes LADRC highly adaptable and deployable in EESM control problems, especially those with frequent parameter variations and strong external disturbances.
The complementarity between ADP and LADRC goes beyond a simple functional combination—it reflects a deeper synergy between control philosophies and engineering implementation. ADP focuses on global performance optimization and intelligent strategy learning, while LADRC emphasizes real-time responsiveness and disturbance rejection. By integrating their strengths, this hybrid approach can overcome the limitations of using either method alone, achieving the dual goals of performance optimality and robust stability. Such a framework offers a solid theoretical foundation for next-generation EESM intelligent adaptive control systems and holds great promise for future applications in complex system control. Compared to manually tuning the bandwidth parameter, Adaptive Dynamic Programming (ADP) can automatically and iteratively design an LADRC with optimal parameters. In 1977, Werbos [] first proposed the concept of Adaptive Critic Designs (ACDs), which integrates theories such as dynamic programming, reinforcement learning, and neural networks, making it a highly valuable and applicable method. The core idea of this theory is to use function approximation structures (e.g., neural networks) to iteratively and forward-in-time approximate the Bellman optimality conditions, thereby obtaining an optimal control policy.
The structure of the adaptive critic design algorithm originates from the actor–critic framework in reinforcement learning and consists of a model network, a critic network, and an action (or actor) network. The model network is used to model the dynamic system, the critic network approximates the optimal performance index function, and the action network approximates the optimal control strategy. The combination of the critic and action networks constitutes an agent. When the agent applies an action to the dynamic system, the environment provides rewards at different stages, which are used to adjust the critic network. The agent’s task is to learn a control policy that maximizes the cumulative rewards over time. This method effectively overcomes the limitations of traditional dynamic programming, allowing for online learning without requiring a known system model. In the past decade or so, adaptive dynamic programming has become a hot topic in intelligent control and computational intelligence research. The U.S. National Science Foundation held forums on approximate dynamic programming in 2002 and 2006. The IEEE Computational Intelligence Society established a dedicated Technical Committee on Adaptive Dynamic Programming and Reinforcement Learning in 2008, and international workshops on this topic were held in 2007, 2009, and 2011. Many major journals have published special issues on adaptive dynamic programming [,,,,,,,], and important review articles include [] and [], with key monographs listed in [,,].
2. LADRC of EESM
2.1. Model Description of Electrically Excited Motor
In the drive control of the electrically excited motor, an accurate model of the control object is required. Since we focus on the dynamic response performance of the motor, the end-region iron losses are neglected. The voltage equation of the electrically excited motor is as follows:
$$\boldsymbol{u} = \boldsymbol{R}\boldsymbol{i} + \boldsymbol{\omega}\boldsymbol{\psi} + \frac{\mathrm{d}\boldsymbol{\psi}}{\mathrm{d}t}$$
in which the resistance matrix R consists of the stator resistance $R_s$ and the rotor resistance $R_f$, while the matrix ω primarily contains the electrical angular speed $\omega_e$.
The vectors in the voltage equation each consist of three components: the d-axis, q-axis, and f-axis components:
$$\boldsymbol{u} = \begin{bmatrix} u_d & u_q & u_f \end{bmatrix}^{T},\quad \boldsymbol{i} = \begin{bmatrix} i_d & i_q & i_f \end{bmatrix}^{T},\quad \boldsymbol{\psi} = \begin{bmatrix} \psi_d & \psi_q & \psi_f \end{bmatrix}^{T}$$
Ignoring the effects of magnetic saturation and motor temperature, from the definition of the inductance, the differential term of the flux linkage can be reconstructed as
$$\frac{\mathrm{d}\boldsymbol{\psi}}{\mathrm{d}t} = \boldsymbol{L}\frac{\mathrm{d}\boldsymbol{i}}{\mathrm{d}t}$$
where L is the incremental inductance matrix:
The incremental inductance matrix includes the self-inductances and the mutual inductances. The mutual inductance between the d and q axes is much smaller than the other components and can therefore be approximately ignored.
Based on the above equations, the current derivative can be derived as
$$\frac{\mathrm{d}\boldsymbol{i}}{\mathrm{d}t} = \boldsymbol{L}^{-1}\left(\boldsymbol{u} - \boldsymbol{R}\boldsymbol{i} - \boldsymbol{\omega}\boldsymbol{\psi}\right)$$
It can be seen from (7) that in the EESM model, the inputs are the voltage u and the speed ω, and the outputs are the current i and the electromagnetic torque $T_e$. At the same time, the flux linkage can be expressed by the apparent inductance as
In this paper, the interior electrically excited synchronous machine is considered; therefore $L_d \neq L_q$. Combining (8) with the definition of the apparent inductance, (7) can be rewritten as
The basics of instantaneous power theory are introduced in Appendix A. According to the instantaneous power theory [], the stator instantaneous power is calculated from the stator terminal voltage $u_{dq}$ and current $i_{dq}$:
$$p = \frac{3}{2}\left(u_d i_d + u_q i_q\right)$$
Based on (8), the motor power can be expressed in quadratic form as
Neglecting the coupling between the d and q axes, the d-axis and q-axis components can be expressed as follows:
The instantaneous torque of the motor can be derived from the instantaneous power:
Since the values of Ld and Lq are not equal and p represents the number of pole pairs, the reluctance torque component cannot be ignored, and the electromagnetic torque can be expressed as
$$T_e = \frac{3}{2}p\left[M_{af}\, i_f i_q + \left(L_d - L_q\right) i_d i_q\right]$$
As can be seen, $T_e$ consists of two components: one is the synchronous torque generated by the interaction between the excitation current and the q-axis current, and the other is the reluctance torque component.
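To make the two-component torque decomposition concrete, the following is a hedged Python sketch (the function name, and the symbol M_af for the field–armature mutual inductance, are our own; the 3/2 factor assumes the equal-amplitude dq transformation used in this paper):

```python
# Hedged sketch: dq-frame torque of a salient EESM,
# T_e = 1.5 * p * (psi_d*i_q - psi_q*i_d), with psi_d = L_d*i_d + M_af*i_f
# and psi_q = L_q*i_q (symbols assumed for illustration, not from the paper).
def electromagnetic_torque(p, M_af, L_d, L_q, i_d, i_q, i_f):
    sync = M_af * i_f * i_q          # synchronous torque: excitation x q-axis current
    rel = (L_d - L_q) * i_d * i_q    # reluctance torque from dq saliency
    return 1.5 * p * (sync + rel)
```

With $i_d = 0$ only the synchronous term survives, matching the decomposition described above.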
2.2. Problem Statements
The control architecture of the electrically excited motor is shown in Figure 1. Substituting (2), (3), and (6) into Formula (7), the stator voltage equation is obtained as
$$\begin{aligned} u_d &= R_s i_d + L_d\frac{\mathrm{d}i_d}{\mathrm{d}t} + M_{af}\frac{\mathrm{d}i_f}{\mathrm{d}t} - \omega_e L_q i_q \\ u_q &= R_s i_q + L_q\frac{\mathrm{d}i_q}{\mathrm{d}t} + \omega_e\left(L_d i_d + M_{af} i_f\right) \end{aligned}$$
Figure 1.
Control architecture diagram.
The rotor voltage equation can be expressed as
$$u_f = R_f i_f + L_f\frac{\mathrm{d}i_f}{\mathrm{d}t} + M_{af}\frac{\mathrm{d}i_d}{\mathrm{d}t}$$
Based on the voltage equations of the stator and rotor, there is strong coupling between the d-axis and the f-axis, and fluctuations of these currents will also cause fluctuations of the electromagnetic torque.
Formula (16) can be transformed into
$$\frac{\mathrm{d}i_f}{\mathrm{d}t} = \frac{1}{L_f}\left(u_f - R_f i_f - M_{af}\frac{\mathrm{d}i_d}{\mathrm{d}t}\right)$$
which is abstracted as
$$\dot{y} = f + b_0 u$$
where y and u denote the output and input of the system, respectively, and f represents the total disturbance of the system, whose lumped coefficients are unknown quantities. Combining (17) and (18) yields $b_0 = 1/L_f$. It should be noted that the parameter $b_0$ can be estimated and is mainly related to the system parameters. The above state-space equation can be expressed as
$$\dot{x}_1 = x_2 + b_0 u,\qquad \dot{x}_2 = \dot{f},\qquad y = x_1$$
among which $x_1$ indicates the excitation current and $x_2$ indicates the disturbance f. Then the linear extended state observer (LESO) can be designed as
$$\dot{z}_1 = z_2 + \beta_1\left(y - z_1\right) + b_0 u,\qquad \dot{z}_2 = \beta_2\left(y - z_1\right)$$
In order to reduce the problem to a unit-gain integral control problem subject to the disturbance estimate, the controller is designed as follows:
$$u = \frac{k_p\left(i_f^{*} - z_1\right) - z_2}{b_0}$$
where $k_p$ represents the proportional gain of the controller, $i_f^{*}$ the excitation current reference, $z_1$ the observed value of the excitation current, $y$ the sensor measurement, and $z_2$ the observed value of the perturbation.
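The LESO and the control law above can be condensed into a short discrete-time sketch (a minimal illustration, assuming the common gain parameterization $\beta_1 = 2\omega_o$, $\beta_2 = \omega_o^2$, $k_p = \omega_c$ and a forward-Euler discretization; all numeric values are illustrative, not the bench values):

```python
# Minimal sketch of a first-order LADRC loop: a discretized LESO estimates the
# output (z1) and the total disturbance (z2); the control law cancels z2.
class LADRC1:
    def __init__(self, b0, wo, wc, dt):
        self.b0, self.dt = b0, dt
        self.beta1, self.beta2 = 2.0 * wo, wo ** 2  # LESO gains from observer bandwidth
        self.kp = wc                                # controller bandwidth
        self.z1 = 0.0                               # estimate of the excitation current
        self.z2 = 0.0                               # estimate of the total disturbance f
        self.u = 0.0

    def step(self, r, y):
        e = y - self.z1
        # forward-Euler discretized LESO
        self.z1 += self.dt * (self.z2 + self.beta1 * e + self.b0 * self.u)
        self.z2 += self.dt * (self.beta2 * e)
        # disturbance-compensating control law u = (kp*(r - z1) - z2) / b0
        self.u = (self.kp * (r - self.z1) - self.z2) / self.b0
        return self.u
```

Against a toy first-order plant dy/dt = -a*y + b0*u + d, this loop drives y to the reference while z2 absorbs the unknown term -a*y + d, which is the mechanism the text describes.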
2.3. Proof of Stability
From the state space Equation (19), the state variables are taken as
$$x_1 = i_f,\qquad x_2 = f$$
The equation above can also be written in matrix form
Equation (19) can be expressed in the standard state-space form as
$$\dot{\boldsymbol{x}} = \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B}u + \boldsymbol{E}\dot{f},\qquad y = \boldsymbol{C}\boldsymbol{x}$$
where the matrices and vectors A, B, C, E can be indicated as
$$\boldsymbol{A} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\quad \boldsymbol{B} = \begin{bmatrix} b_0 \\ 0 \end{bmatrix},\quad \boldsymbol{C} = \begin{bmatrix} 1 & 0 \end{bmatrix},\quad \boldsymbol{E} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
The state-space observer (20) can be reconstructed as
$$\dot{\boldsymbol{z}} = \boldsymbol{A}\boldsymbol{z} + \boldsymbol{B}u + \boldsymbol{L}\left(y - \boldsymbol{C}\boldsymbol{z}\right)$$
where L is the observer gain vector
$$\boldsymbol{L} = \begin{bmatrix} \beta_1 & \beta_2 \end{bmatrix}^{T}$$
Let $\boldsymbol{e} = \boldsymbol{x} - \boldsymbol{z}$; combining (25) and (26), the error dynamics can be written in the form
$$\dot{\boldsymbol{e}} = \left(\boldsymbol{A} - \boldsymbol{L}\boldsymbol{C}\right)\boldsymbol{e} + \boldsymbol{E}\dot{f}$$
where E is defined in (25), and
$$\boldsymbol{A} - \boldsymbol{L}\boldsymbol{C} = \begin{bmatrix} -\beta_1 & 1 \\ -\beta_2 & 0 \end{bmatrix}$$
The characteristic polynomial of the LESO can be expressed as
$$\lambda(s) = \det\left[s\boldsymbol{I} - \left(\boldsymbol{A} - \boldsymbol{L}\boldsymbol{C}\right)\right] = s^{2} + \beta_1 s + \beta_2$$
With the bandwidth parameterization $\beta_1 = 2\omega_o$ and $\beta_2 = \omega_o^{2}$, this reduces to $\lambda(s) = \left(s + \omega_o\right)^{2}$.
The roots of the characteristic polynomial lie in the left half of the s-plane; therefore, the LESO is BIBO stable and the estimation error of the perturbation is also bounded.
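The pole location can be checked numerically. The sketch below assumes the common bandwidth parameterization $\beta_1 = 2\omega_o$, $\beta_2 = \omega_o^2$; the value of $\omega_o$ is illustrative:

```python
import numpy as np

# Both roots of s^2 + beta1*s + beta2 sit at s = -wo when beta1 = 2*wo and
# beta2 = wo**2, i.e. strictly in the left half-plane for any wo > 0.
wo = 300.0                        # illustrative observer bandwidth, rad/s
beta1, beta2 = 2.0 * wo, wo ** 2
roots = np.roots([1.0, beta1, beta2])
print(roots)                      # both roots at (or numerically near) -wo
```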
3. Adaptive Dynamic Programming
3.1. Problem Statements
The motor system is described by the following discrete-time dynamic equation:
$$x_{k+1} = F\left(x_k, u_k, w_k\right)$$
where $x_k$ is the system state, which includes the EESM current information; $u_k$ is the LADRC action, which specifies the control voltage when the system occupies state $x_k$; and $w_k$ is the environment disturbance. Here, we denote the set of possible system states as $\mathcal{X}$ and the controller parameter space as $\mathcal{U}$. Given the current system state $x_k$ and the current action $u_k$, the next system state is determined by a probability distribution $P\left(x_{k+1} \mid x_k, u_k\right)$.
The expected total reward for initial state $x_0$ under the LADRC is defined as
$$J\left(x_0\right) = \mathbb{E}\left[\sum_{k=0}^{\infty} U\left(x_k, u_k\right)\right]$$
In (30), $U\left(x_k, u_k\right)$ is the utility function, which evaluates metrics such as the regulation time, overshoot magnitude, and fluctuation level of the excitation current. The shorter the settling time, the smaller the overshoot, and the lower the fluctuation level, the smaller the corresponding utility function and performance index function $J$.
The goal of the presented algorithm is to find the optimal LADRC to minimize the performance index function (30).
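As one concrete (assumed) form of such a utility, the sketch below scores a recorded excitation-current step response by settling time, overshoot, and steady-state fluctuation; the weights and the 2% settling band are our own choices for illustration, not values from the paper:

```python
import numpy as np

def utility(trace, ref, dt, band=0.02):
    """Cost of one step response: weighted sum of settling time, overshoot and
    steady-state fluctuation (weights and 2% band are illustrative)."""
    trace = np.asarray(trace, dtype=float)
    err = np.abs(trace - ref)
    outside = np.nonzero(err > band * abs(ref))[0]
    t_settle = (outside[-1] + 1) * dt if outside.size else 0.0  # last exit from band
    overshoot = max(trace.max() - ref, 0.0) / abs(ref)
    tail = trace[int(0.8 * len(trace)):]   # treat the last 20% as steady state
    fluctuation = tail.max() - tail.min()
    return t_settle + 10.0 * overshoot + fluctuation
```

A response that reaches the reference instantly and stays there scores zero; slower, oscillatory, or overshooting responses score higher, which is the ordering the text requires of $U$.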
3.2. ADP-Based LADRC Optimization Procedure
In this section, we develop an adaptive dynamic programming-based algorithm to obtain the optimal LADRC and the optimal performance index function (30) for the motor system.
For all $x_k \in \mathcal{X}$, let
$$V_0\left(x_k\right) = \Psi\left(x_k\right)$$
where $\Psi(\cdot)$ is an arbitrary positive semi-definite function. Then, for all $x_k \in \mathcal{X}$, the iterative LADRC is computed as
$$u_0\left(x_k\right) = \arg\min_{u_k}\left\{U\left(x_k, u_k\right) + V_0\left(x_{k+1}\right)\right\}$$
For all $i = 1, 2, \ldots$, let $V_i$ be the iterative value function that satisfies the following equation
$$V_i\left(x_k\right) = \min_{u_k}\left\{U\left(x_k, u_k\right) + V_{i-1}\left(x_{k+1}\right)\right\}$$
The iterative LADRC is computed as
$$u_i\left(x_k\right) = \arg\min_{u_k}\left\{U\left(x_k, u_k\right) + V_i\left(x_{k+1}\right)\right\}$$
The algorithm will iterate between (33) and (34).
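A toy, fully discretized version of this alternation between value update and greedy policy improvement can be sketched as follows; the tabular value function, the function names, and the small example problem are illustrative assumptions (the actual algorithm approximates the value function with a critic network):

```python
import numpy as np

# Tabular sketch of the iteration between the value update (33) and the
# greedy policy improvement (34) over a finite set of candidate actions.
def adp_iterate(states, actions, step_cost, next_state, n_iter=50):
    """step_cost(x, u): utility U(x, u); next_state(x, u): successor state index."""
    V = np.zeros(len(states))               # V_0 = 0, a positive semi-definite start
    policy = np.zeros(len(states), dtype=int)
    for _ in range(n_iter):
        # Q(x, u) = U(x, u) + V(next state): one-step lookahead cost
        Q = np.array([[step_cost(x, u) + V[next_state(x, u)] for u in actions]
                      for x in states])
        policy = Q.argmin(axis=1)           # greedy action choice, cf. (34)
        V = Q.min(axis=1)                   # value-function update, cf. (33)
    return V, policy
```

On a three-state toy problem where one action moves the state toward a zero-cost absorbing state, the iteration converges in a few sweeps to the expected values and the greedy policy.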
3.3. Algorithm Properties
Theorem 1.
For , let and be obtained by (31)–(34). Given constants , , , and that satisfy
and
respectively, for , the iterative value function satisfies
Proof.
First, we prove that
holds for .
According to (38), the left-hand side of the inequality (39) obviously holds for . Let . Based on the left-hand side of (38), it is easy to obtain
By adding to and subtracting the same term from (41), (41) can easily be transformed into
Since , (42) can be developed into
Combining similar terms of (43), we can obtain
According to the Bellman equation, (44) becomes
Assume (40) holds for . Then for , we have
By adding to and subtracting the same term from (46), (46) can easily be transformed into
Since , (47) can be developed into
Combining similar terms of (48), we can obtain
According to the Bellman equation, we obtain
The proof of
follows similar steps. The proof is completed. □
4. Experimental Results
4.1. Initial Process
To validate the effectiveness of the proposed ADP-LADRC strategy, an experimental test bench for an electrically excited synchronous machine was established. The core components of the platform include a 150 kW EESM, an inverter and a control board equipped with the Texas Instruments C2000 DSP (The EESM, inverter and control board are produced and manufactured by BYD Company in Shenzhen, China. The Texas Instruments C2000 DSP is from Texas Instruments Company in Dallas, TX, USA). The rotor current and speed were measured by a Hall-effect current sensor and a resolver, respectively. The detailed connection diagram of the system is shown in Figure 2.
Figure 2.
Experimental equipment layout.
Table 1.
AVL bench parameters.
Table 2.
EESM parameters.
To guarantee the accuracy of the results, the coefficient calibration of the system sensor devices, along with the motor parameter calibration, must be carried out; the relevant processes are listed in Table 3.
Table 3.
Process preparation.
4.2. Results
From (17) and (18), the parameter b0 can be determined from the actual measured value of the rotor's self-inductance. The initial bandwidth parameters were tuned through a series of step-response tests: starting from a conservative value to ensure stability, the bandwidth was gradually increased until a proper excitation-current response time was achieved, while carefully avoiding excessive overshoot and current chattering. The resulting initial value represents a balance between dynamic performance and robustness. To ensure the rapid convergence of the disturbance estimate and the state estimate, it is generally set that []. For a fair comparison, the PI parameters were tuned to meet the required performance. Meanwhile, the torque data recorded by the test-bench host computer are sampled at 1 kHz, which, considering the system's time constant, is sufficient to accurately track the dynamic variations in the motor torque.
To verify the effectiveness of the proposed ADP-LADRC with respect to bandwidth parameter tuning, tests were conducted at a speed of 4000 rpm. Step torque references of 50 N·m, 100 N·m, and 300 N·m were applied via the test bench host computer. The initial values of the LADRC-related bandwidth parameters were set as . It should be noted that the excitation current was set to 4 A by default after the PWM was enabled. In Figure 3, the performance comparison between the initial parameter values and the ADP-LADRC iteratively optimized bandwidth parameters at 4000 rpm is shown.
Figure 3.
ADP-LADRC iteration. (a) The current tracking under the initial and the first iterative ADP-LADRC; (b) The current tracking under the first and second iterative ADP-LADRC; (c) the current tracking under the second iterative and converged ADP-LADRC.
Figure 3a–c show the dynamic behaviors of the excitation current feedback during the ADP-LADRC iteration process. It can be observed that when using only the initial bandwidth parameters, the excitation current exhibits large steady-state fluctuations, and the feedback current is significantly affected by noise. With successive iterations of the ADP-LADRC algorithm, the steady-state fluctuations gradually decrease, and the final steady-state error is reduced to within 0.2 A.
To compare the torque dynamic responses of the ADP-LADRC and PI control algorithms, step torque commands ranging from 50 N·m to 100 N·m were applied at speeds between 1000 rpm and 4000 rpm. Due to mechanical friction and leakage effects of the test bench, there is an approximate 1.2–1.8 N·m deviation between the commanded and actual torque. The torque data were recorded by the test bench at 10 Hz. As shown in Figure 4a–c, the ADP-LADRC outperforms the PI controller in terms of torque response time, overshoot suppression, and the magnitude of steady-state fluctuations.
Figure 4.
Torque dynamic response comparison of ADP-LADRC and PI. (a) Dynamic response under speed of 1000 rpm and torque command of 50/100 N·m; (b) dynamic response under speed of 2000 rpm and torque command of 50/100 N·m; (c) dynamic response under speed of 3000 rpm and torque command of 50/100 N·m; (d) dynamic response under speed of 4000 rpm and torque command of 50/100 N·m.
In order to compare the anti-disturbance performance of the ADP-LADRC with that of the PI controller, we set the motor speed to 4000 rpm and the target excitation current to 8.5 A. Subsequently, a step torque command of 100 N·m was applied and held for a period of time; the command was then immediately reset. As shown in Figure 5, under the same conditions, the overshoots of ADP-LADRC and PI are 0.2% and 8%, respectively. The current fluctuation amplitudes of ADP-LADRC and PI were 2.86 A and 1.26 A, respectively. Furthermore, the disturbance regulation time of ADP-LADRC was approximately 905 ms shorter than that of PI control.
Figure 5.
Anti-disturbance performance comparison of ADP-LADRC and PI.
5. Discussion
The core findings of this study indicate that the control strategy combining linear active disturbance rejection control (LADRC) with adaptive dynamic programming (ADP) can effectively suppress the interference of the stator d-axis component on the excitation current control of electrically excited motors. Compared with traditional PID control and fixed-parameter LADRC, the proposed ADP-LADRC shows significant advantages in dynamic response speed and disturbance rejection. More importantly, the introduction of the ADP algorithm solves the parameter-tuning problem in the application of LADRC. The bandwidth parameters of traditional LADRC usually rely on expert experience or trial and error, making it difficult to achieve the best performance. In this study, the ADP framework, through the interactive learning of the evaluation (critic) network and the execution (action) network, optimizes the control parameters of the LADRC online, enabling it to adapt to different operating conditions. Since the LESO converts the first-order system into a cascaded-integrator form, only two discretized difference equations are needed to implement the ADP-LADRC on the DSP; the computational cost is therefore extremely low and fixed. MPC, in contrast, uses the system model to predict states and inputs over the next N steps, so its computational load depends strongly on the prediction horizon N and the problem scale; moreover, the convergence time of the iterative optimization causes fluctuations in computation time. Therefore, LADRC provides better real-time performance at lower computational cost.
However, this study has some limitations, which can serve as directions for further exploration in the future. The learning rate and structure of the ADP neural network still need to be manually set at present. In the future, research can be conducted on its adaptive adjustment strategies to further improve the convergence speed. Additionally, it can be considered to combine the ideas of predictive control with the existing framework to cope with more stringent constraint conditions.
6. Conclusions
This paper addresses the dynamic response and disturbance rejection challenges in the rotor current control of electrically excited motors and proposes an ADP-optimized Linear Active Disturbance Rejection Control (ADP-LADRC) strategy. By employing an Extended State Observer (ESO) for real-time estimation of internal and external disturbances, together with effective suppression of unmodeled dynamics and high-frequency noise, the electromagnetic torque response speed is increased by approximately 50% and the steady-state current fluctuation amplitude is reduced by 10.2%, meeting the requirements of high-dynamic operating conditions and improving torque output smoothness. The linearized error-feedback design avoids the overshoot caused by integral saturation in traditional PI controllers, enabling the rotor current to converge rapidly without overshoot under step commands and thereby enhancing system stability. Furthermore, to address the difficulty of tuning the LADRC bandwidth parameters, the ADP algorithm is introduced to construct a critic–action dual-network structure, using the system tracking error and control energy consumption as the cost function to iteratively approximate the optimal control law online. The ADP method can complete global parameter optimization within 0.5 s, improving tuning efficiency by more than 90% compared with conventional trial-and-error tuning.
Author Contributions
Conceptualization, H.L.; Formal analysis, J.Z.; Data curation, H.P.; Writing—original draft, H.L.; Writing—review & editing, J.Z. and H.P.; Visualization, H.L. and H.P. All authors have read and agreed to the published version of the manuscript.
Funding
The authors gratefully thank the anonymous reviewers for their valuable comments, as well as all the authors listed in the references. This work was supported by the National Key Research and Development Program of China (No. 2024YFB2505100).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Research data is unavailable due to privacy restrictions.
Conflicts of Interest
Authors Heping Ling and Hua Pan were employed by the BYD Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Appendix A
Instantaneous Power Definition
When converting from the three-phase ABC stationary frame to a two-phase stationary frame, considering the equal-amplitude transformation, according to the theory in [], the instantaneous power can be expressed in the following complex form
where p and q, respectively, represent the three-phase instantaneous active power and the three-phase instantaneous reactive power.
Converting the above equation to the dq coordinate system
Equation (10) is obtained.
References
- Petit, Y.L. Electric Vehicle Life Cycle Analysis and Raw Material Availability. Transp. Environ. 2017. Available online: https://www.transportenvironment.org/articles (accessed on 26 October 2017).
- Widmer, J.D.; Martin, R.; Kimiabeigi, M. Electric Vehicle Traction Motors without Rare Earth Magnets. Sustain. Mater. Technol. 2015, 3, 7–13. [Google Scholar] [CrossRef]
- Zhang, X.; Li, Z. Sliding-mode observer-based mechanical parameter estimation for permanent magnet synchronous motor. IEEE Trans. Power Electron. 2015, 31, 5732–5745. [Google Scholar] [CrossRef]
- Liang, D.; Li, J.; Qu, R.; Kong, W. Adaptive second-order sliding-mode observer for PMSM sensorless control considering VSI nonlinearity. IEEE Trans. Power Electron. 2017, 33, 8994–9004. [Google Scholar] [CrossRef]
- Borhan, H.; Vahidi, A.; Phillips, A.M.; Kuang, M.L.; Kolmanovsky, I.V.; Di Cairano, S. MPC-Based Energy Management of a Power-Split Hybrid Electric Vehicle. IEEE Trans. Control Syst. Technol. 2012, 20, 593–603. [Google Scholar] [CrossRef]
- Xue, W.; Huang, Y. On frequency-domain analysis of ADRC for uncertain system. In Proceedings of the 2013 American Control Conference, Washington, DC, USA, 17–19 June 2013; IEEE: New York, NY, USA, 2013; pp. 6637–6642. [Google Scholar]
- Gao, Z. Scaling and bandwidth-parameterization based controller tuning. In Proceedings of the American Control Conference, Denver, CO, USA, 4–6 June 2003; IEEE: New York, NY, USA, 2003; Volume 6, pp. 4989–4996. [Google Scholar]
- Wang, G.; Liu, R.; Zhao, N.; Ding, D.; Xu, D. Enhanced linear ADRC strategy for HF pulse voltage signal injection-based sensorless IPMSM drives. IEEE Trans. Power Electron. 2018, 34, 514–525. [Google Scholar] [CrossRef]
- Werbos, P.J. Advanced forecasting methods for global crisis warning and models of intelligence. Gen. Syst. Yearb. 1977, 22, 25–38. [Google Scholar]
- Werbos, P.J. Approximate dynamic programming for real-time control and neural modeling. In Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches; White, D.A., Sofge, D.A., Eds.; Van Nostrand: New York, NY, USA, 1992; Chapter. 13. [Google Scholar]
- Murray, J.J.; Cox, C.J.; Lendaris, G.G.; Saeks, R. Adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2002, 32, 140–153. [Google Scholar] [CrossRef]
- Saeks, R.E.; Cox, C.J.; Mathia, K.; Maren, A.J. Asymptotic dynamic programming: Preliminary concepts and results. In Proceedings of the International Conference on Neural Networks (ICNN’97), Houston, TX, USA, 12 June 1997; pp. 2273–2278. [Google Scholar]
- Bertsekas, D.P.; Tsitsiklis, J.N. Neuro-Dynamic Programming; Athena Scientific: Belmont, MA, USA, 1996. [Google Scholar]
- Enns, R.; Si, J. Helicopter trimming and tracking control using direct neural dynamic programming. IEEE Trans. Neural Netw. 2003, 14, 929–939. [Google Scholar] [PubMed]
- Lewis, F.L.; Huang, J.; Parisini, T.; Prokhorov, D.V.; Wunsch, D.C. Special Issue on neural networks for feedback control systems. IEEE Trans. Neural Netw. 2007, 18, 969–972. [Google Scholar] [CrossRef] [PubMed]
- Lewis, F.L.; Lendaris, G.; Liu, D. Special issue on approximate dynamic programming and reinforcement learning for feedback control. IEEE Trans. Syst. Man. Cybern. B Cybern. 2008, 38, 896–897. [Google Scholar] [CrossRef]
- Ferrari, S.; Jagannathan, S.; Lewis, F.L. Special issue on approximate dynamic programming and reinforcement learning. J. Control Theory Appl. 2011, 9, 309. [Google Scholar] [CrossRef]
- Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50. [Google Scholar] [CrossRef]
- Wang, F.Y.; Zhang, H.; Liu, D. Adaptive dynamic programming: An introduction. IEEE Comput. Intell. Mag. 2009, 4, 39–47. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning—An Introduction; MIT Press: Cambridge, MA, USA, 1998. [Google Scholar]
- Si, J.; Barto, A.; Powel, W.; Wunsch, D. Handbook of Learning and Approximate Dynamic Programming; IEEE: Piscataway, NJ, USA, 2004. [Google Scholar]
- Lewis, F.L.; Liu, D. Approximate Dynamic Programming and Reinforcement Learning for Feedback Control; Wiley: Hoboken, NJ, USA, 2012. [Google Scholar]
- Akagi, H.; Watanabe, E.H.; Aredes, M. The Instantaneous Power Theory. In Instantaneous Power Theory and Applications to Power Conditioning; Wiley-IEEE Press: Tokyo, Japan; Rio de Janeiro, Brazil, 2007; pp. 41–107. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).