Soft Actor-Critic-Based Power Optimization Method for UAV Wireless Charging Systems
Highlights
- A reinforcement learning-based power optimization framework is proposed for UAV wireless charging systems to mitigate power degradation caused by landing position variations.
- The learned SAC controller maintains strong performance in both the training and expanded evaluation regions and remains effective under measurement noise and model mismatch, with its practical feasibility supported by hardware validation.
- Measurement-based current observations can support power optimization without explicit online mutual-inductance identification.
- The proposed framework provides a practical data-driven alternative to model-dependent WPT optimization in position-uncertain UAV charging scenarios.
Abstract
1. Introduction
- A data-driven, SAC-based control framework is proposed for MTSR-WPT systems that achieves power optimization under uncertain UAV landing positions, formulating the problem as bounded-amplitude, direct phase control without explicit online parameter identification.
- A physics-informed dual-current state representation is constructed from measurable current responses, so that the controller can exploit position-related coupling information without requiring explicit mutual-inductance estimation.
- A deployment-oriented validation chain is provided, including nominal regional evaluation, robustness to measurement noise and model mismatch, and hardware verification on a physical prototype.
2. Modeling and Analysis of the MTSR-WPT System
3. SAC-Based Power Optimization Method for MTSR-WPT Systems
3.1. Problem Formulation
3.2. SAC-WPT Interaction Framework
- (1) State Observation: The agent’s observation is a dual-current state vector composed of three parts: $I_1$ (the transmitter current features measured under the fixed zero-phase reference excitation, which implicitly reflect the receiver’s position-dependent coupling state), $I_2$ (the current features measured under the present phase configuration $\Phi$, which reflect the immediate effect of the action), and the voltage-source phase values themselves. This design decouples the “landing position effect” from the “action effect,” giving the agent structured physical prior knowledge.
- (2) Agent Decision: The SAC actor network (policy $\pi_\theta$) takes the state $s_t$ and outputs a phase adjustment action $a_t$.
- (3) Action Execution and Learning: The phase controller applies the adjustment, yielding the new phase vector $\Phi_{t+1}$, which is fed to the system. The environment evolves, producing a new state $s_{t+1}$ and a reward $r_t$. The experience tuple $(s_t, a_t, r_t, s_{t+1}, d_t)$ is stored in a replay buffer and used to update the agent’s neural network parameters.
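The three steps above can be sketched as a plain interaction loop. This is a minimal illustration, not the authors' implementation: `ToyWPTEnv`, its cosine "current features", the power-difference reward, and the random policy are all hypothetical stand-ins chosen so that the example runs without a circuit model.

```python
import math
import random

def wrap_deg(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

class ToyWPTEnv:
    """Hypothetical stand-in for the MTSR-WPT system (not the paper's model)."""

    def __init__(self, n_phases=4, seed=0):
        self.rng = random.Random(seed)
        self.n = n_phases

    def _currents(self, phi):
        # Toy current features that depend on the hidden position and the phases.
        return [math.cos(math.radians(p - o)) for p, o in zip(phi, self.opt)]

    def _power(self):
        return sum(self._currents(self.phi)) / self.n

    def _state(self):
        # Dual-current layout: [I1 (zero-phase), I2 (present phases), phases].
        return self.i_ref + self._currents(self.phi) + list(self.phi)

    def reset(self):
        # A hidden optimum plays the role of the unknown landing position.
        self.opt = [self.rng.uniform(-180.0, 180.0) for _ in range(self.n)]
        self.phi = [0.0] * self.n
        self.i_ref = self._currents(self.phi)  # measured once, at zero phase
        return self._state()

    def step(self, action):
        before = self._power()
        self.phi = [wrap_deg(p + a) for p, a in zip(self.phi, action)]
        return self._state(), self._power() - before

def run_episode(env, policy, steps=5):
    """Observe, act, and store (s, a, r, s') tuples as in steps (1) to (3)."""
    buffer = []
    state = env.reset()
    for _ in range(steps):
        action = policy(state)
        next_state, reward = env.step(action)
        buffer.append((state, action, reward, next_state))
        state = next_state
    return buffer

policy_rng = random.Random(1)
random_policy = lambda s: [policy_rng.uniform(-30.0, 30.0) for _ in range(4)]
buffer = run_episode(ToyWPTEnv(), random_policy)
```

Note that the first block of the state (the zero-phase reference) stays constant within an episode, while the second block changes with every action; this is the separation of position effect and action effect described above.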
3.3. SAC Algorithm Design
3.3.1. Maximum Entropy Objective and Exploration
3.3.2. Handling Periodic Phase Actions
- (1) Action Output and Decoding: The policy network outputs a raw action within a bounded range, with one component per adjustable phase. The actual phase adjustment is obtained from the raw action through scaling and periodic wrapping.
- (2) Sine-Cosine Encoding of Phases: Two types of phase information are involved in state construction:
  - Transmitter Coil Current Phases: Contained within $I_1$ and $I_2$.
  - Voltage Source Phases: Appended at the end of the state vector.
- (3) State Dimension Specification: Under this unified encoding scheme, the total dimension of the state vector is fixed by the number of transmitter coils; for the 5T1R system of Section 4.1 it is 15 + 15 + 8 = 38.
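A minimal sketch of the decoding and encoding steps follows. The raw-action range of [-1, 1] and the per-step scale `max_step_deg` are assumptions made for illustration; the source does not state either value here.

```python
import math

def wrap_deg(angle):
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def decode_action(raw_action, max_step_deg=30.0):
    """Clip raw outputs to [-1, 1] (assumed range) and scale to degrees."""
    return [max(-1.0, min(1.0, a)) * max_step_deg for a in raw_action]

def apply_adjustment(phases_deg, delta_deg):
    """Add the adjustment and wrap each phase back into the valid range."""
    return [wrap_deg(p + d) for p, d in zip(phases_deg, delta_deg)]

def encode_phases(phases_deg):
    """Sine-cosine encoding: each phase contributes (sin, cos) to the state."""
    encoded = []
    for p in phases_deg:
        r = math.radians(p)
        encoded += [math.sin(r), math.cos(r)]
    return encoded

delta = decode_action([1.0, -0.5, 0.0, 2.0])           # last entry is clipped
phases = apply_adjustment([170.0, -20.0, 0.0, 90.0], delta)
encoded = encode_phases(phases)
print(phases)        # [-160.0, -35.0, 0.0, 120.0]
print(len(encoded))  # 8
```

The encoding removes the artificial jump at the ends of the range: +180° and −180° describe the same physical phase and map to the same (sin, cos) pair up to rounding.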
3.3.3. Value and Policy Learning with the Dual-Current State
- (1) Critic Update: The critics are updated by minimizing the mean-squared Bellman error. The target value for a transition is computed from the smaller of the two target-critic estimates minus the entropy term (Equations (16) and (17); Algorithm 1, line 15).
- (2) Actor Update: The actor is updated to maximize the expected Q-value and the policy entropy (Equation (18)).
- (3) Adaptive Temperature Adjustment: The temperature parameter $\alpha$, which controls the exploration-exploitation trade-off, can be learned automatically by minimizing the temperature loss (Equation (19)).
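For reference, the standard SAC forms of these three updates are written out below. They are assumed to correspond to Equations (16) to (19); the paper's exact notation and constant factors (for example a factor of 1/2 in the critic loss) may differ. Here $\bar{\omega}_j$ denotes the target-critic parameters and $\bar{\mathcal{H}}$ the target entropy.

```latex
\begin{aligned}
y &= r + \gamma (1 - d)\Big[\min_{j=1,2} Q_{\bar{\omega}_j}(s', a') - \alpha \log \pi_\theta(a' \mid s')\Big],
  \qquad a' \sim \pi_\theta(\cdot \mid s') \\
L_Q(\omega_i) &= \mathbb{E}_{(s,a,r,s',d)\sim \mathcal{D}}\Big[\big(Q_{\omega_i}(s,a) - y\big)^2\Big],
  \qquad i = 1, 2 \\
L_\pi(\theta) &= \mathbb{E}_{s\sim \mathcal{D},\, a\sim \pi_\theta}\Big[\alpha \log \pi_\theta(a \mid s) - \min_{j=1,2} Q_{\omega_j}(s,a)\Big] \\
L(\alpha) &= \mathbb{E}_{s\sim \mathcal{D},\, a\sim \pi_\theta}\Big[-\alpha\big(\log \pi_\theta(a \mid s) + \bar{\mathcal{H}}\big)\Big]
\end{aligned}
```

Minimizing $L_\pi(\theta)$ is equivalent to maximizing the expected Q-value plus $\alpha$ times the policy entropy, which is the objective stated in item (2).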
3.4. Algorithm Implementation
| Algorithm 1: SAC-Based Phase Control for MTSR-WPT Systems | |
| Input: MTSR-WPT environment Env; SAC agent with Actor πθ and Critics Qω1, Qω2; replay buffer D. Hyperparameters: discount γ, target update rate τ, target entropy. Output: trained SAC agent model {πθ, Qω1, Qω2}. | |
| 1: | Initialize actor network θ, critic networks ω1, ω2, and target parameters ω̄1 ← ω1, ω̄2 ← ω2 |
| 2: | for episode = 1 to M do |
| 3: | Sample receiver position (xr, yr) within working area |
| 4: | Reset Env at (xr, yr), obtain initial state s0 = [I1(0°), I2(Φ0), Φ0_encoded]T |
| 5: | for t = 0 to T − 1 do |
| 6: | Select action at ∼πθ(·|st) |
| 7: | Execute action at, observe reward rt, next state st+1, and done flag dt |
| 8: | Wrap phase components in Φt+1 to [−180°, 180°] |
| 9: | Construct dual-current state st+1 = [I1(0°), I2(Φt+1), Φt+1_encoded]T |
| 10: | Store transition (st, at, rt, st+1, dt) in D |
| 11: | # Agent Update |
| 12: | if |D| > batch_size then |
| 13: | Batch ← sample_hybrid_batch(D, current_episode_ratio = 0.5) |
| 14: | # Update Critic networks (Equations (16) and (17)) |
| 15: | y = r + γ(1 − d) · (min_j Qω̄j(s′, a′) − α log πθ(a′|s′)), with a′ ∼ πθ(·|s′) and j = 1, 2 |
| 16: | Update ω1, ω2 by ∇ωi LQ(ωi) for i = 1, 2 |
| 17: | # Update Actor network (Equation (18)) |
| 18: | Update θ by ∇θ Lπ(θ) |
| 19: | # Update temperature parameter α (Equation (19)) |
| 20: | Update α by ∇α L(α) if auto_alpha is True |
| 21: | # Soft update target networks |
| 22: | ω̄i ← τ ωi + (1 − τ) ω̄i, for i = 1, 2 |
| 23: | end if |
| 24: | end for |
| 25: | end for |
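
Lines 15 and 22 of Algorithm 1 reduce to two small numeric operations, sketched here on scalars and flat parameter lists. The default values γ = 0.98, α = 0.15, and τ = 0.005 are taken from the hyperparameter table; the function names and the sample inputs are illustrative assumptions.

```python
def soft_q_target(reward, done, q1_next, q2_next, logp_next,
                  gamma=0.98, alpha=0.15):
    """Line 15: clipped double-Q target with the entropy bonus."""
    min_q = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * (min_q - alpha * logp_next)

def soft_update(target_params, online_params, tau=0.005):
    """Line 22: move each target parameter a fraction tau toward the online one."""
    return [tau * w + (1.0 - tau) * w_bar
            for w_bar, w in zip(target_params, online_params)]

# Illustrative values only: 1.0 + 0.98 * (1.5 - 0.15 * (-1.0)) = 2.617
y = soft_q_target(reward=1.0, done=0.0, q1_next=2.0, q2_next=1.5, logp_next=-1.0)
y_terminal = soft_q_target(reward=1.0, done=1.0, q1_next=2.0, q2_next=1.5, logp_next=-1.0)
new_target = soft_update([0.0, 1.0], [1.0, 1.0])
```

With the done flag set, the bootstrap term vanishes and the target equals the immediate reward. With τ = 0.005 the target critics change slowly, which stabilizes the bootstrapped targets.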
- (1) Application of Dual-Current State: The constructed state incorporates both I1 and I2, enabling the critic and actor networks to learn and make decisions based on decoupled position and action effects.
- (2) Hybrid Experience Replay: Line 13 employs a sampling strategy that mixes data from the current episode with historical data from the buffer, balancing rapid adaptation to new positions with training stability.
- (3) Periodic Wrapping: The wrap function in line 8 keeps the phase values within the valid range [−180°, 180°].
- (4) Unified State Encoding: The phase information in the state is included through sine-cosine encoding. The current-phase information is contained within I1 and I2, while the voltage source phase information is appended as encoded values.
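One way to realize the hybrid sampling of line 13 is sketched below. The source only names `sample_hybrid_batch` and its `current_episode_ratio` argument; the buffer layout (a list of `(episode_id, transition)` pairs) and the extra parameters are assumptions made for illustration.

```python
import random

def sample_hybrid_batch(buffer, current_episode, batch_size=64,
                        current_episode_ratio=0.5, rng=random):
    """Mix transitions from the current episode with older ones.

    buffer: list of (episode_id, transition) pairs (assumed layout).
    Returns fewer than batch_size items if either pool is too small.
    """
    current = [t for ep, t in buffer if ep == current_episode]
    history = [t for ep, t in buffer if ep != current_episode]
    n_current = min(len(current), int(batch_size * current_episode_ratio))
    n_history = min(len(history), batch_size - n_current)
    batch = rng.sample(current, n_current) + rng.sample(history, n_history)
    rng.shuffle(batch)
    return batch

# Three episodes of 40 placeholder transitions each; episode 2 is the current one.
buffer = [(ep, f"ep{ep}-step{k}") for ep in range(3) for k in range(40)]
batch = sample_hybrid_batch(buffer, current_episode=2, rng=random.Random(0))
n_from_current = sum(1 for t in batch if t.startswith("ep2-"))
```

With a ratio of 0.5 and a batch size of 64, half of the batch comes from the position currently being optimized and half from earlier positions.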
4. Simulation Results and Analysis
4.1. Simulation Setup
- (1) State Space: The proposed dual-current state representation is employed. For the 5T1R system, the state vector contains the zero-phase current response (15 dimensions), the current response under the present phase configuration (15 dimensions), and the encoded voltage source phases (8 dimensions), for a total of 38 dimensions.
- (2) Phase Space: The system controls four independent transmitter voltage phases, each within the continuous range [−180°, 180°].
- (3) Reward Function: The reward function follows Equation (12), directly incentivizing an increase in the received charging power.
- (4) Algorithm Hyperparameters: The main SAC hyperparameters are summarized in Table 2.
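The 38-dimensional state can be assembled as sketched below. The per-coil layout (amplitude, sine of phase, cosine of phase) is an assumption that is consistent with 15 dimensions for five transmitter coils and 8 dimensions for four source phases; the numeric measurements are placeholders, not data from the paper.

```python
import math

def encode_current(amplitude, phase_deg):
    """One coil current -> [amplitude, sin(phase), cos(phase)] (assumed layout)."""
    r = math.radians(phase_deg)
    return [amplitude, math.sin(r), math.cos(r)]

def build_state(i_ref, i_now, source_phases_deg):
    """Concatenate zero-phase response, present response, and encoded phases."""
    state = []
    for amplitude, phase_deg in i_ref:        # I1: zero-phase reference response
        state += encode_current(amplitude, phase_deg)
    for amplitude, phase_deg in i_now:        # I2: response under present phases
        state += encode_current(amplitude, phase_deg)
    for phase_deg in source_phases_deg:       # four controllable source phases
        r = math.radians(phase_deg)
        state += [math.sin(r), math.cos(r)]
    return state

# Placeholder (amplitude in A, phase in degrees) pairs for five transmitter coils.
i_ref = [(0.10, 5.0), (0.12, -8.0), (0.09, 12.0), (0.11, 0.0), (0.10, -3.0)]
i_now = [(0.14, 40.0), (0.08, -60.0), (0.13, 95.0), (0.10, 10.0), (0.12, -120.0)]
state = build_state(i_ref, i_now, [45.0, -45.0, 135.0, -135.0])
print(len(state))  # 38
```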
4.2. SAC Agent Training and Performance
4.2.1. Sensitivity to Reward Coefficients
4.2.2. Performance in the Training and Expanded Evaluation Regions
- (1) Performance in the Training Region
- (2) Performance in the Expanded Evaluation Region
4.2.3. Robustness Under Practical Imperfections
- (1) Robustness to Measurement Noise
- (2) Robustness to Model Mismatch and Fine-Tuning
5. Experimental Validation
5.1. Experimental Platform Overview
5.2. Experimental Procedure: Static Point Validation
- (1) Baseline Measurement: Set all transmitter phases to zero, measure and record the initial load power, and simultaneously record the zero-phase current features $I_1$ as the position feature.
- (2) Phase Control: Load the trained SAC agent for online phase control. Based on the dual-current state acquired in real time, the agent outputs one phase adjustment action per control cycle.
- (3) Data Recording: Record the transfer power, the current state, and the action at each step of the control process.
5.3. Experimental Results and Analysis
6. Discussion
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| UAV | unmanned aerial vehicle |
| WPT | magnetically coupled resonant wireless power transfer |
| MTSR | multi-transmitter single-receiver |
| RL | reinforcement learning |
| SAC | Soft Actor-Critic |
| KVL | Kirchhoff’s voltage law |
| PTE | power transfer efficiency |
| SNR | signal-to-noise ratio |
| 5T1R | five-transmitter single-receiver |

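| Parameter Type | Parameter Value |
|---|---|
| Coil inductances (μH) | 56.0/56.0 |
| Capacitances (pF) | 220/220 |
| Resistances (Ω) | 50/3.5/3.5/50 |
| Height of the receiver plane (mm) | 100 |
| Voltage amplitude (V) | 8 |
| Frequencies (MHz) | 1.45 |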
| Hyperparameter | Setting |
|---|---|
| State/Action dimension | 38/4 |
| * Reward coefficient / | 10/−5 |
| Initial learning rate actor/critic/α | 1 × 10⁻⁴/1 × 10⁻⁴/1 × 10⁻⁴ |
| Hidden layers structure of actor/critic | [256, 256] |
| Discount factor γ | 0.98 |
| Target entropy | −4 |
| Initial temperature α | 0.15 |
| Soft update coefficient τ | 0.005 |
| Capacity of replay buffer | 5000 |
| Batch size | 64 |
| Total episodes | 500 |
| Extending coefficient κ | 0.15 |
| Sampling accuracy of training region | 0.1 mm |
| Optimizer of actor/critic | Adaptive Moment Estimation (Adam) |
| Key phase for grid search | [−135, −45, 45, 135] |
| Noise scales for multi-scale perturbation | [60, 30, 15, 7.5] |
| Metric | Training Region | Evaluation Region |
|---|---|---|
| (mean ± std) | 0.15 ± 0.08 W | 0.11 ± 0.08 W |
| (mean ± std) | 0.44 ± 0.04 W | 0.43 ± 0.05 W |
| (mean ± std) | 0.43 ± 0.04 W | 0.42 ± 0.06 W |
| (mean ± std) | 0.46 ± 0.04 W | 0.45 ± 0.05 W |
| Mean Gain | 2.87/+187.2% | 3.92/+291.6% |
| Mean achieved ratio | 96.9% | 95.6% |
| Mean stable ratio | 95.9% | 94.4% |
| Steps to 95%Pmax (p90) | 2.0 | 2.0 |
| PTE (zero-phase) | 89.4% | 83.5% |
| PTE (after optimization) | 93.4% | 93.0% |
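
The two figures in the Mean Gain row are two views of the same quantity. As a quick consistency check, assuming the first figure is the mean ratio of optimized power to zero-phase power (the symbol $g$ is introduced here for illustration only), the percentage gain follows as:

```latex
\text{Gain}_{\%} = (g - 1) \times 100\%, \qquad
(2.87 - 1) \times 100\% = 187\%, \qquad
(3.92 - 1) \times 100\% = 292\%
```

These values agree with the tabulated +187.2% and +291.6% to within the rounding of $g$ to two decimals.
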
| SNR_VI | Ratio (Mean/p10) | Stability (Mean/p10) | Steps to 95%Pmax (p90) |
|---|---|---|---|
| clean | 0.97/0.92 | 0.98/0.95 | 2.0 |
| 40 dB | 0.97/0.92 | 0.98/0.95 | 2.0 |
| 30 dB | 0.96/0.92 | 0.98/0.95 | 2.0 |
| 20 dB | 0.95/0.90 | 0.97/0.93 | 2.9 |
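
A common way to emulate a given measurement SNR is to add zero-mean white Gaussian noise scaled to the signal power. The sketch below assumes that model; the paper's actual noise-injection procedure is not shown in this section, and the sinusoidal test signal is a placeholder.

```python
import math
import random

def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]

def measured_snr_db(clean, noisy):
    """Empirical SNR of a noisy record against its clean reference."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)

rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * k / 100.0) for k in range(10000)]
noisy = add_awgn(clean, snr_db=20.0, rng=rng)
snr_check = measured_snr_db(clean, noisy)  # close to 20 dB for a long record
```

At 20 dB the noise power is 1% of the signal power, so the noise standard deviation is 10% of the signal RMS value.
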
| Case | Ratio (Mean/p10) | Stability (Mean/p10) | Steps to 95%Pmax (p90) |
|---|---|---|---|
| Nominal policy on nominal model | 0.97/0.92 | 0.98/0.95 | 2.0 |
| Nominal policy on shifted model | 0.54/0.20 | 0.72/0.14 | 4.3 |
| Fine-tuned policy on shifted model | 0.93/0.84 | 0.92/0.80 | 3.0 |
| Parameter Type | Parameter Value |
|---|---|
| Coil inductances (μH) | 57.2/57.1/57.7/57.5/57.6/56.8 |
| Capacitances (pF) | 227.3/226.4/230.1/231.4/223.7/229.3 |
| Resistances (Ω) | 3.7/3.6/4.7/3.8/3.8/3.5/51.5 |
| Height of the receiver plane (mm) | 100 |
| Voltage amplitude (V) | 8 |
| Frequencies (MHz) | 1.45 |
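
As a rough sanity check on the measured prototype values, the resonant frequency of each coil can be estimated from $f_0 = 1/(2\pi\sqrt{LC})$. The sketch assumes that each coil forms a simple LC resonator with its compensation capacitor and that the i-th inductance pairs with the i-th capacitance in the table; neither assumption is confirmed by this section.

```python
import math

def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC pair: 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

inductances_uh = [57.2, 57.1, 57.7, 57.5, 57.6, 56.8]
capacitances_pf = [227.3, 226.4, 230.1, 231.4, 223.7, 229.3]

frequencies_mhz = [
    resonant_frequency_hz(l_uh * 1e-6, c_pf * 1e-12) / 1e6
    for l_uh, c_pf in zip(inductances_uh, capacitances_pf)
]
for index, f_mhz in enumerate(frequencies_mhz, start=1):
    print(f"coil {index}: {f_mhz:.3f} MHz")
```

Under these assumptions the estimates come out at about 1.4 MHz, a few percent below the 1.45 MHz operating frequency.
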
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Dai, Z.; Yang, Y.; Luo, Y.; Lin, Z.; Yang, G. Soft Actor-Critic-Based Power Optimization Method for UAV Wireless Charging Systems. Drones 2026, 10, 218. https://doi.org/10.3390/drones10030218

