Optimal Configuration of Transformer–Energy Storage Deeply Integrated System Based on Enhanced Q-Learning with Hybrid Guidance
Abstract
1. Introduction
- (1) Reward shaping incorporating grid physical constraints.
- (2) Dynamic action space pruning to balance exploration and exploitation.
- (3) Transformation of passive devices into grid-proactive co-benefit entities.
2. Proposed System Configuration
2.1. Q-Learning Fundamentals
- (1) Curse of Dimensionality: The joint state–action space of transformer–ESSs scales exponentially with network size (e.g., a 30-bus system has more than 10⁵ states).
- (2) Slow Convergence: Random exploration (ϵ-greedy) in sparse-reward environments wastes more than 60% of iterations on invalid actions (e.g., actions that violate SOC limits) [21].
- (3) Physical Constraint Ignorance: Standard reward functions r_t fail to embed grid operational rules (e.g., voltage safety margins).
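For reference, the baseline that these limitations apply to can be sketched as a textbook tabular Q-learning step with ϵ-greedy action selection. This is a minimal illustration, not the authors' implementation: the state labels, the three-action set, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
import random
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical toy setup: three storage actions, all Q-values start at zero.
Q = defaultdict(float)
actions = ["charge", "idle", "discharge"]
new_q = q_learning_update(Q, "s0", "charge", reward=1.0,
                          next_state="s1", actions=actions)
greedy_action = epsilon_greedy(Q, "s0", actions, epsilon=0.0)
```

With purely random exploration, nothing in `epsilon_greedy` prevents the agent from drawing an action that violates an operating limit, which is the inefficiency described in item (2).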
2.2. Hybrid Guidance Mechanism
- (1) Knowledge-Guided Exploration
  - (a) Reward Shaping: Augment the immediate reward r_t with feasibility indicators such as equipment investment cost, network loss reduction, voltage deviation, renewable energy generation output, and load demand. Each indicator is weighted according to the specific application scenario.
  - (b) Adaptive ϵ-Decay: Reduce the random exploration probability ϵ when the gradient ∇Q(s_t) exceeds a threshold τ.
- (2) Action Space Pruning
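The three guidance elements listed above can be sketched as small helper functions. This is an illustrative sketch under stated assumptions, not the paper's formulation: the indicator names and weights are hypothetical, the "gradient" of Q is approximated by the magnitude of the latest Q-value change, the multiplicative decay factor is an assumption, and actions are modelled as SOC increments so that pruning can be shown against SOC limits.

```python
def shaped_reward(base_reward, indicators, weights):
    """Add a weighted sum of feasibility indicators to the immediate reward."""
    return base_reward + sum(weights[k] * indicators[k] for k in weights)


def adapt_epsilon(epsilon, q_change, tau, decay=0.9, eps_min=0.01):
    """Shrink epsilon only when the Q-value change signal exceeds tau.

    Assumption: |q_change| stands in for the gradient of Q at the current state.
    """
    if abs(q_change) > tau:
        return max(eps_min, epsilon * decay)
    return epsilon


def prune_actions(soc, actions, soc_min=0.1, soc_max=0.9):
    """Keep only actions (SOC increments) that leave SOC inside its limits."""
    return [a for a in actions if soc_min <= soc + a <= soc_max]


# Hypothetical numbers for illustration only.
reward = shaped_reward(
    base_reward=1.0,
    indicators={"loss_reduction": 0.2, "voltage_deviation": 0.05},
    weights={"loss_reduction": 2.0, "voltage_deviation": -10.0},
)
eps_after_large_change = adapt_epsilon(0.5, q_change=0.3, tau=0.1)
eps_after_small_change = adapt_epsilon(0.5, q_change=0.05, tau=0.1)
valid_actions = prune_actions(soc=0.85, actions=[-0.1, 0.0, 0.1])
```

In this toy case the charging increment of 0.1 is pruned because it would push the SOC above the assumed upper limit of 0.9, so the agent never spends an iteration on it.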
2.3. Advantage Analysis
3. System Modelling
3.1. Optimization Model for Siting and Sizing of the TES-DIS
- (1) Objective Function
  - (a) Comprehensive Economic Cost Index (f1)
  - (b) Voltage Deviation Index (f2)
- (2) Siting and Sizing Constraints
  - (a) Power Flow Constraints
  - (b) Voltage Angle Constraints
  - (c) Power Balance Constraints
  - (d) Line Thermal Constraints
  - (e) Node Voltage Constraints
  - (f) Energy Storage System Constraints
  - (g) Transformer Capacity Expansion Constraints
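To make the role of the constraint set concrete, a candidate configuration can be screened with simple bound checks before it is scored. The sketch below covers only three of the listed constraint families (node voltage, line thermal, and storage SOC) and uses a mean absolute deviation from 1.0 p.u. as a stand-in voltage deviation measure. All limits, the deviation formula, and the sample values are assumptions for illustration; the paper's own expressions for f1, f2, and the constraints are not reproduced here.

```python
def voltage_deviation(voltages, v_ref=1.0):
    """Mean absolute deviation of node voltages from v_ref (assumed form)."""
    return sum(abs(v - v_ref) for v in voltages) / len(voltages)


def feasibility(voltages, line_flows, line_limits, soc_profile,
                v_min=0.95, v_max=1.05, soc_min=0.1, soc_max=0.9):
    """Return a per-constraint pass/fail report for one candidate solution."""
    return {
        "voltage": all(v_min <= v <= v_max for v in voltages),
        "thermal": all(abs(f) <= lim for f, lim in zip(line_flows, line_limits)),
        "soc": all(soc_min <= s <= soc_max for s in soc_profile),
    }


# Hypothetical per-unit values for a three-node, two-line example.
report = feasibility(
    voltages=[1.0, 0.97, 1.04],
    line_flows=[0.8, -1.2],
    line_limits=[1.0, 1.0],
    soc_profile=[0.2, 0.5, 0.9],
)
deviation = voltage_deviation([1.0, 0.97, 1.04])
is_feasible = all(report.values())
```

Here the second line exceeds its assumed thermal limit, so the candidate is rejected even though its voltages and SOC profile are within bounds.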
3.2. Optimization Model for Siting and Sizing of the TES-DIS
- (1) Algorithmic Framework
  - (a) Curse of Dimensionality: In large-scale practical grids, candidate locations increase combinatorially with system size. When coupled with continuous/discrete capacity variables, the solution space renders exhaustive search or conventional metaheuristics (e.g., Genetic Algorithms, Particle Swarm Optimization) computationally intractable.
  - (b) Dynamic Dependencies: TES-DIS operational strategies (charging/discharging) are affected by real-time electricity prices, load fluctuations, and renewable generation outputs. Accurate benefit assessment requires long-term simulation of multi-factor dynamic interactions, which static optimization cannot adequately address.
  - (c) Exploration–Exploitation Trade-off: Standard methods (e.g., ϵ-greedy Q-Learning) exhibit inefficient random exploration in vast spaces, susceptibility to local optima, slow convergence, and unstable solution quality.
  - (a) Markov Decision Process (MDP) Framework: Directly maps the hybrid decision process (discrete actions for siting, continuous/discrete actions for sizing), inherently mitigating dimensionality concerns.
  - (b) Reward Function Design: Enables autonomous learning of the TES-DIS’s long-term comprehensive value within dynamic environments.
  - (c) Harmonizing Function H(s,a): Nodal electrical properties are evaluated to strategically guide exploration toward high-potential regions, with its weighting factor adaptively tuned during the learning process. This tuning progresses from prioritizing heuristic exploration for rapid identification of promising regions in initial phases to emphasizing exploitation of high-Q-value actions for refined optimization in final phases. Consequently, search efficiency and solution quality are significantly enhanced while premature convergence is prevented. Crucially, the integration of grid prior knowledge actively constrains the effective search space, enabling tractable optimization for large-scale practical power systems.
- (2) Key Algorithmic Innovations
  - (a) Reward Function Design
  - (b) Heuristic Function H(s,a) Design (Core Enhancement)
  - (c) Enhanced Action Selection Strategy
- (3) Cascade Analysis
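The interplay between H(s,a) and the learned Q-values described above can be sketched as a weighted action score whose heuristic weight shrinks over training. This is a minimal sketch under assumptions: the additive score Q + ξ·H, the linear schedule for ξ, and the node names and numeric values are all hypothetical and are not taken from the paper.

```python
def heuristic_weight(episode, total_episodes, xi_start=1.0, xi_end=0.0):
    """Linearly move the heuristic weight from xi_start to xi_end (assumed schedule)."""
    frac = min(1.0, episode / max(1, total_episodes - 1))
    return xi_start + (xi_end - xi_start) * frac


def select_action(q_values, h_values, xi):
    """Choose the action maximizing Q(s,a) + xi * H(s,a)."""
    return max(q_values, key=lambda a: q_values[a] + xi * h_values.get(a, 0.0))


# Hypothetical candidate siting nodes with learned and heuristic scores.
q_values = {"node_5": 0.2, "node_12": 0.5}
h_values = {"node_5": 0.9, "node_12": 0.1}

early_xi = heuristic_weight(episode=0, total_episodes=100)
late_xi = heuristic_weight(episode=99, total_episodes=100)
early_choice = select_action(q_values, h_values, early_xi)
late_choice = select_action(q_values, h_values, late_xi)
```

Early in training the heuristic dominates and steers the agent to the node that prior knowledge rates highly; by the final episode the weight reaches zero and the choice is driven by the Q-values alone.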
4. Case Study
4.1. Case Study Setup
4.2. Optimal Configuration and Benefit Comparison of Different Capacity Expansion Strategies
4.3. Comparison of the Results of Different Optimization Algorithms
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, W.L.; Lyu, Z.P.; Liu, H.T. An Overview of Morphological Development and Ol Technology of Power Electronics Dominated Distribution Area. Proc. CSEE 2023, 43, 4899–4921.
- Zhang, T.C.; Li, G.Y.; Wang, J.X.; Wei, W.; Zhou, M. Coordinated Operation Method of Renewable Energy Power Systems Based on Feasible Region Projection Theory. Trans. China Electrotech. Soc. 2024, 39, 2784–2796.
- Xiao, X.Y.; Zheng, Z.X. New Power Systems Dominated by Renewable Energy Towards the Goal of Emission Peak & Carbon Neutrality: Contribution, Key Techniques, and Challenges. Adv. Eng. Sci. 2022, 54, 47–59.
- Sheng, G.H.; Qian, Y.; Luo, L.G.; Song, H.; Liu, Y.D.; Jiang, X.C. Key Technologies and Application Prospects for Operation and Maintenance of Power Equipment in New Type Power System. High Volt. Eng. 2021, 47, 3072–3084.
- Ding, M.; Wang, W.S.; Wang, X.L.; Song, Y.T.; Chen, D.Z.; Sun, M. A Review on the Effect of Large-Scale PV Generation on Power Systems. Proc. CSEE 2014, 34, 1–14.
- Liang, D.L.; Liu, Y.B.; Kou, P.; Cai, S.L.; Zhou, K.; Zhang, M.K. Analysis of Development Trend for Intelligent Distribution Transformer. Autom. Electr. Power Syst. 2020, 44, 1–14.
- Ci, S.; Zhou, Y.L.; Wang, H.J.; Shi, Q.L. Modeling and Operation Control of Digital Energy Storage System Based on Reconfigurable Battery Network: A Case Study of Base Station Energy Storage Application. J. Glob. Energy Interconnect. 2021, 4, 427–435.
- Nazemi, M.; Dehghanian, P.; Lu, X.N.; Chen, C. Uncertainty-aware Deployment of Mobile Energy Storage Systems for Distribution Grid Resilience. IEEE Trans. Smart Grid 2021, 12, 3200–3214.
- Kim, J.; Dvorkin, Y. Enhancing Distribution System Resilience with Mobile Energy Storage and Microgrids. IEEE Trans. Smart Grid 2019, 10, 4996–5006.
- Walker, A.; Kwon, S. Analysis on Impact of Shared Energy Storage in Residential Community: Individual Versus Shared Energy Storage. Appl. Energy 2021, 282, 116172.
- Dai, R.; Esmaeilbeigi, R.; Charkhgard, H. The Utilization of Shared Energy Storage in Energy Systems: A Comprehensive Review. IEEE Trans. Smart Grid 2021, 12, 3163–3174.
- Li, X.S.; Fang, Z.J.; Li, F.; Xie, S.J.; Cheng, S. Game-Theoretic Optimal Dispatch of Distribution Network with Multi-Microgrid Leasing Shared Energy Storage. Proc. CSEE 2022, 42, 6611–6625.
- Kang, C.Q.; Liu, J.K.; Zhang, N. New Form of Energy Storage for Future Power Systems: Cloud Energy Storage. Autom. Electr. Power Syst. 2017, 41, 2–8.
- Guo, Y.Z.; Wang, C.T.; Shi, Y.H.; Shang, J.Y.; Yang, H. Comprehensive Optimal Allocation of Electricity/Heat Cloud Energy Storage in Regional Integrated Energy System. Power Syst. Technol. 2020, 44, 1611–1623.
- Huber, J.E.; Kolar, J.W. Applicability of Solid-State Transformers in Today’s and Future Distribution Grids. IEEE Trans. Smart Grid 2019, 10, 317–326.
- Ji, C.; Zhong, C.L.; Li, K.M.; Xu, M.Z.; Shao, J.; Zheng, F. Research on Multiple Objection Operation Strategy Optimization of Distribution Network Including Distributed Energy Storage. In Proceedings of the International Conference on Information Science & Control Engineering, Changsha, China, 21–23 July 2017; pp. 1163–1167.
- Datta, U.; Kalam, A.; Shi, J. Smart Control of BESS in PV Integrated EV Charging Station for Reducing Transformer Overloading and Providing Battery-to-Grid Service. J. Energy Storage 2020, 28, 101224.
- Damousis, I.G.; Bakirtzis, A.G.; Dokopoulos, P.S. A solution to the unit-commitment problem using integer-coded genetic algorithm. IEEE Trans. Power Syst. 2004, 19, 1165–1172.
- Dahat, S.A.; Isasare, M.S.; Argelwar, R.P.; Shanu, T. Co-ordinated tuning of PSS with TCSC damping controller in single machine power system using PSO. In Proceedings of the 2018 2nd International Conference on Inventive Systems and Control (ICISC), Coimbatore, India, 19–20 January 2018; pp. 301–306.
- Xiao, Z.F.; Li, J.N. Two-player Optimization Control Based on Off-policy Q-learning Algorithm. Control Eng. China 2022, 29, 1874–1880.
- Luo, X.C.; Li, L.; Wei, Z.L.; Ge, J.B.; Yang, L.J. Applications of life cycle cost theory in Decision-Making of investment for distribution transformers renovation. Power Syst. Technol. 2011, 35, 207–211.
Cost Item | Proposed Method | Traditional Q-Learning |
---|---|---|
Energy storage-related costs (yuan) | 14,310,000 | 0 |
Transformer-related costs (yuan) | 5,430,000 | 67,450,000 |
Total (yuan) | 19,740,000 | 67,450,000 |
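The totals in the table above follow directly from the two cost components, and the relative saving can be checked with a few lines of arithmetic. The figures are copied from the table; the dictionary keys and variable names are illustrative only.

```python
# Cost figures copied from the table above (yuan).
proposed = {"energy_storage": 14_310_000, "transformer": 5_430_000}
traditional = {"energy_storage": 0, "transformer": 67_450_000}

proposed_total = sum(proposed.values())
traditional_total = sum(traditional.values())

# Absolute and relative saving of the proposed configuration.
saving = traditional_total - proposed_total
saving_fraction = saving / traditional_total

print(f"Proposed total:    {proposed_total:,} yuan")
print(f"Traditional total: {traditional_total:,} yuan")
print(f"Saving:            {saving:,} yuan ({saving_fraction:.1%})")
```

On these figures the proposed configuration costs 47,710,000 yuan less than the alternative, a reduction of about 70.7%.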
Algorithm Type | Average Capacity Expansion Cost (yuan) | Average Transformer Load Factor | Average Voltage Offset (p.u.) | Average Running Time (s) |
---|---|---|---|---|
Improved Q-Learning | 18,840,000 | 0.879 | −0.0339 | 55 |
Unimproved Q-Learning | 19,120,000 | 0.881 | −0.0401 | 493 |
PSO Algorithm | 23,534,000 | 0.883 | −0.0356 | 1021 |
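The algorithm comparison above can be condensed into two derived quantities: how much faster the improved algorithm runs and how much cheaper its solution is relative to each alternative. The sketch below uses only the cost and running-time columns of the table; the dictionary keys are shorthand for the three table rows and are not names used in the paper.

```python
# Average cost (yuan) and running time (s) copied from the table above.
results = {
    "improved": {"cost": 18_840_000, "time_s": 55},      # first row
    "unimproved": {"cost": 19_120_000, "time_s": 493},   # second row
    "pso": {"cost": 23_534_000, "time_s": 1021},         # third row
}

base = results["improved"]
comparison = {}
for name, r in results.items():
    if name == "improved":
        continue
    comparison[name] = {
        # How many times longer the alternative runs than the improved algorithm.
        "speedup": r["time_s"] / base["time_s"],
        # Fraction by which the improved algorithm lowers the alternative's cost.
        "cost_reduction": (r["cost"] - base["cost"]) / r["cost"],
    }

for name, c in comparison.items():
    print(f"{name}: {c['speedup']:.1f}x faster, {c['cost_reduction']:.1%} cheaper")
```

On the tabulated averages, the improved algorithm runs roughly 9 times faster than the unimproved one and roughly 18.6 times faster than PSO, with cost reductions of about 1.5% and 19.9% respectively.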
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Li, Z.; You, L.; Kang, Y.; Tan, D.; Cai, X.; Xiong, H.; Liu, Y. Optimal Configuration of Transformer–Energy Storage Deeply Integrated System Based on Enhanced Q-Learning with Hybrid Guidance. Processes 2025, 13, 3267. https://doi.org/10.3390/pr13103267