Hybrid AC/DC Transmission Grid Planning Based on Improved Multi-Step Backtracking Reinforcement Learning

Wang, Zhe; Dai, Yuxin; Yang, Wenxin; Yang, Yunzhang; Zhang, Zhiqi; Hu, Yahan; Liao, Jianquan; Wu, Tianchi

doi:10.3390/pr14010011

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article

by Zhe Wang ¹, Yuxin Dai ¹, Wenxin Yang ¹, Yunzhang Yang ¹, Zhiqi Zhang ¹, Yahan Hu ¹, Jianquan Liao ²,* and Tianchi Wu ²

¹ State Grid Shaanxi Electric Power Company Limited Research Institute, Xi’an 710000, China

² College of Electrical Engineering, Sichuan University, Chengdu 610065, China

* Author to whom correspondence should be addressed.

Processes 2026, 14(1), 11; https://doi.org/10.3390/pr14010011

Submission received: 19 November 2025 / Revised: 15 December 2025 / Accepted: 16 December 2025 / Published: 19 December 2025

(This article belongs to the Special Issue Advances in Optimal Operation of Modern Power Systems for Flexibility Enhancement)

Abstract

Hybrid AC/DC transmission expansion planning must balance investment cost, supply reliability and AC/DC stability, which challenges conventional mathematical programming and heuristic methods. This paper proposes a multi-objective planning framework based on an improved multi-step backtracking α-Q(λ) reinforcement learning algorithm with eligibility traces and an adaptive learning factor. A tri-objective model minimises annual economic cost, expected power shortage and a comprehensive electrical index that combines electrical betweenness, commutation-failure margin and effective short-circuit ratio. The mixed-integer planning problem is reformulated as an interactive learning process, where the state encodes candidate line construction decisions, the action builds or cancels lines, and the eligibility-trace matrix is used to quantify line importance. Case studies on the Garver-6 system, the IEEE 24-bus reliability test system and a 500 kV regional hybrid AC/DC grid show that, compared with classical Q-learning, the proposed method yields lower annual cost, reduced expected power shortage and improved AC/DC stability; in the 500 kV system, the expected annual power shortage is reduced from 70,810 MWh to 28,320 MWh.

Keywords: transmission network planning; reinforcement learning; multi-step backtracking; multi-objective optimisation; eligibility trace; α-Q(λ) algorithm; AC/DC stability
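The abstract describes the planning problem as a learning process in which the state encodes line construction decisions, actions build or cancel lines, and eligibility traces carry each update back over earlier decisions. The following is a minimal sketch of a generic tabular Q(λ) backup under those assumptions, not the authors' α-Q(λ) algorithm: the adaptive learning factor is replaced by a fixed `alpha`, traces are not reset after exploratory actions, and the function names, parameter values and rewards are hypothetical placeholders.

```python
from collections import defaultdict


def toggle(state, line):
    """Return a new state in which candidate line `line` is flipped:
    built (1) if it was absent, cancelled (0) if it was built."""
    s = list(state)
    s[line] = 1 - s[line]
    return tuple(s)


def q_lambda_step(Q, E, state, action, reward, next_state, n_lines,
                  alpha=0.1, gamma=0.9, lam=0.8):
    """One tabular Q(lambda) backup with accumulating eligibility traces.

    Q and E are defaultdict(float) tables keyed by (state, action).
    alpha is a fixed learning rate here (an assumption; the paper uses
    an adaptive learning factor whose form the abstract does not give).
    """
    # Greedy value of the successor state over all toggle actions.
    best_next = max(Q[(next_state, a)] for a in range(n_lines))
    # Temporal-difference error for the transition just taken.
    delta = reward + gamma * best_next - Q[(state, action)]
    # Mark the visited pair as eligible for credit.
    E[(state, action)] += 1.0
    # Multi-step backtracking: every previously visited pair receives a
    # share of the error proportional to its decayed trace.
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam
    return delta
```

Calling `q_lambda_step` twice along a path such as `(0, 0, 0)` → `(0, 1, 0)` → `(0, 1, 1)` shows the effect: the second call also adjusts the value of the first decision through its decayed trace, which one-step Q-learning would not do. The accumulated trace values in `E` are the quantity the abstract refers to when it says the eligibility-trace matrix quantifies line importance; how the paper maps traces to a ranking is not stated in the abstract.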
