Article

Faulty Links’ Fast Recovery Method Based on Deep Reinforcement Learning

1 Department of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450007, China
2 College of Electronics & Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518005, China
3 Henan Xinda Wangyu Technology Co., Ltd., Zhengzhou 450003, China
4 Zhengzhou Xinda Institute of Advanced Technology, Zhengzhou 450001, China
5 Henan Jiuyu Tenglong Information Engineering Co., Ltd., Zhengzhou 450005, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(5), 241; https://doi.org/10.3390/a18050241
Submission received: 22 February 2025 / Revised: 2 April 2025 / Accepted: 20 April 2025 / Published: 24 April 2025
(This article belongs to the Section Evolutionary Algorithms and Machine Learning)

Abstract

Aiming to address the high recovery delay and link congestion issues in the communication network of Wide-Area Measurement Systems (WAMSs), this paper introduces Software-Defined Networking (SDN) and proposes a deep reinforcement learning-based faulty-link fast recovery method (DDPG-LBBP). The DDPG-LBBP method takes delay and link utilization as the optimization objectives and uses a gated recurrent neural network to accelerate algorithm convergence and output the optimal link weights for load balancing. By designing maximally disjoint backup paths, the method ensures the independence of the primary and backup paths, effectively preventing secondary failures caused by path overlap. The experiment compares DDPG-LBBP with the (1+2ε)-BPCA, FFRLI, and LIR methods using the IEEE 30 and IEEE 57 benchmark power system communication network topologies. Experimental results show that DDPG-LBBP outperforms the others in faulty-link recovery delay, packet loss rate, and recovery success rate. Specifically, compared to the superior algorithm (1+2ε)-BPCA, recovery delay is decreased by about 12.26% and recovery success rate is improved by about 6.91%. Additionally, packet loss rate is decreased by about 15.31% compared to the superior FFRLI method.

1. Introduction

A WAMS, as an essential component of a smart grid [1], serves as a critical infrastructure for the real-time monitoring, protection, and control of power grids. The WAMS relies on the synchronized Phasor Measurement Unit (PMU) to collect phasor data of each node in the power topology at high speed. The data are then transmitted via a high-speed communication network to the Phasor Data Concentrator (PDC), which analyzes the power system’s operational status based on phasor data collected at the same timestamp and takes appropriate measures to ensure the safe operation of the power system in real time. WAMSs have been widely deployed in power systems worldwide [2]. Since the WAMS integrates electrical, computer, and communication technologies, the communication infrastructure is a decisive factor in its successful operation in power systems [3]. During the transmission of phasor data, communication network link failures caused by network congestion or by adverse weather conditions affecting fiber optic and satellite link terminals prevent the PDC from fully collecting the phasor data. As a result, the WAMS is unable to accurately monitor and control the operation of the power system [4]. Therefore, it is urgent to address the issue of rapid recovery from phasor transmission link failures in the communication network of the WAMS to ensure the real-time transmission of phasor data within the system.
In recent years, with the rapid development of SDN technology [5], the SDN-based WAMS communication network has attracted significant attention from researchers worldwide, providing new perspectives for addressing the rapid recovery of phasor transmission link failures in WAMS communication networks [6]. Leveraging its three-layer architecture, which separates the data plane, control plane, and application plane, SDN demonstrates distinct advantages in the rapid recovery of faulty links. On the one hand, controllers in the control plane can monitor the operational status of the network topology in real time, enabling the rapid detection and localization of faulty links within the network topology. On the other hand, the centralized configuration and control of the entire network topology enabled by the OpenFlow protocol used in SDN simplifies the distribution and installation of flow rules for primary and backup paths within the network topology [7]. Currently, fault recovery methods can be divided into two categories, control plane-based and data plane-based fast faulty-link recovery methods, depending on whether the participation of controllers in the control plane is required during the faulty-link recovery process [8].
(1)
Control Plane-Based Approach for Fast Recovery of Faulty Links
The control plane-based approach for fast recovery of faulty links relies on the SDN controller throughout the recovery process: after a link failure is detected, the controller computes new data forwarding paths and installs the corresponding forwarding rules in the switches. In a related study, Muthumanikandan V et al. used the BFD protocol to detect link failures; when a link failure occurred, the FRT method was used to compute the shortest recovery path, thereby reducing the number of flow entries required for the recovery path [9]. Zheng L et al. designed rounding algorithms with bounded approximation factors to address the faulty-link recovery problem; the SDN controller pre-computed alternative paths before the link failure occurred, enabling fast recovery [10]. Li J et al. proposed a backup path search algorithm that considered both backup path length and link load balancing, which reduced the backup path length and the maximum link utilization during fault recovery [11]. Astaneh S A et al. introduced a path risk metric to re-route the end-to-end traffic of a failed link by minimizing traffic operations, thereby ensuring fast recovery of the traffic [12]. Duan T et al. proposed the Hybrid Fast Path Recovery Algorithm (HFPR-A); when a communication link fails, HFPR-A quickly identifies the shortest path or an approximate shortest path between nodes using the fast path recovery algorithm [13]. The control plane-based faulty-link recovery algorithms mentioned above first locate the position of the faulty link in the network topology and then design a recovery algorithm that satisfies constraints such as the shortest recovery path and link load. Finally, the SDN controller calculates new forwarding paths for the network topology and sends the updated flow table entries to the switches. However, these methods introduce significant latency during the interaction between the controller and the switches, making them unsuitable for WAMSs, where end-to-end data transmission latency is tightly constrained.
(2)
Data Plane-Based Approach for Fast Recovery of Faulty Links
To address the issue of prolonged recovery delay in control plane-based faulty-link recovery methods, the data plane-based approach for fast recovery of faulty links pre-computes the primary and backup paths using the SDN controller and installs them in the switches in advance. When a link failure occurs, the switch forwards phasor data packets along the backup path without the need for interaction with the controller, thus reducing the recovery delay. In reference [14], a ring-based single-link failure recovery method (RSFR) was proposed, where backup paths were designed based on node importance and link performance within the network topology. These paths were periodically updated to reduce the consumption of backup resources. In reference [15], backup paths were pre-stored in the switches. When a link failure occurred, the disrupted data packets were aggregated into a single flow entry using the VLAN ID, thereby reducing the number of traffic packets. In reference [16], the objective of minimizing the disrupted traffic caused by link failures was considered. This was achieved by formulating the selection of backup paths as an integer programming problem, which reduced the need for backup flow rules and conserved backup bandwidth. The FFRLI fault-link recovery algorithm was proposed in reference [17] and used Markov chains to assess the importance of links. It preinstalled backup paths for primary routes within switches, while storing backup paths for secondary links in the controller, thereby effectively reducing the consumption of backup resources. Based on the mathematical analysis of WAMS topology characteristics, reference [18] proposed the (1+2ε)-BPCA backup path construction algorithm and backup path installation method, which effectively reduced data transmission delays during and after failover. The data plane-based fault recovery methods mentioned above address the prolonged recovery time issue associated with control plane-based fault recovery by pre-embedding the backup path in the switch. However, the pre-calculated backup paths do not fully account for link load conditions. As a result, when a link fails, the data being routed through the backup path are vulnerable to secondary link failures due to congestion within the backup path.
To address the aforementioned issues, this paper proposes a fast fault recovery method for communication links in WAMS networks, based on the data plane of SDN. The method, named Load Balancing Backup Path Based on Deep Deterministic Policy Gradient (DDPG-LBBP), leverages deep reinforcement learning for efficient fault recovery. Firstly, the DDPG-LBBP fault recovery method inputs the state features of the communication network in the WAMS into an improved DDPG algorithm that incorporates a Gated Recurrent Unit (GRU). The algorithm framework is built with optimization objectives such as link delay and link utilization. During the training process, the gated recurrent neural network is employed to effectively mitigate the problems of gradient vanishing or explosion, thereby improving the training efficiency. After the algorithm converges, the optimal link weights for the network topology are output. Secondly, the maximum disjoint backup paths are designed to prevent the overlap of nodes and links between the primary and backup paths, thereby avoiding secondary network link failures caused by overlapping links and nodes in both paths. Finally, a backup tag label is added to the header of the phasor packets on the faulty link, which guides the phasor packets through the backup path, enabling the rapid recovery of the faulty link. The contributions of this paper can be summarized as follows:
  • We summarize the transmission characteristics of the wide-area measurement system (WAMS) communication network, analyze the reasons why conventional faulty-link recovery algorithms are unsuitable for WAMSs, and emphasize the importance of minimizing fault recovery time and ensuring network link load balancing during the fault recovery process.
  • We propose a rapid fault recovery method based on deep reinforcement learning (DDPG-LBBP) and design a suitable algorithmic framework for WAMS networks. During training, the Gated Recurrent Unit (GRU) is employed to mitigate gradient issues and enhance efficiency. After convergence, the model outputs optimal link weights to achieve link load balancing.
  • We design the maximum disjoint backup path and introduce the backup label mechanism. The former prevents the overlap of nodes and links between the primary path and the backup path, eliminating secondary network link failures. The latter guides the phasor data packets to be transmitted through the backup path, enabling the rapid recovery of the faulty link and enhancing the overall reliability of the WAMS communication network.
  • The IEEE 30 and IEEE 57 benchmark power system communication networks are adopted as experimental network topologies. Under different data traffic intensities, the DDPG-LBBP algorithm is compared with the (1+2ε)-BPCA, FFRLI, and LIR methods, respectively. The experimental results show that the DDPG-LBBP fault recovery method has the advantages of low faulty-link recovery delay, low packet loss rate, and high recovery success rate. Compared with the (1+2ε)-BPCA algorithm, the recovery delay is reduced by about 12.26%, and the faulty-link recovery success rate is increased by about 6.91%. Compared with the FFRLI method, the packet loss rate in the network topology after faulty-link recovery is reduced by about 15.31%.

2. Wide-Area Measurement System Communication Network SDN Architecture

2.1. System Architecture Design

Based on SDN’s capability to acquire real-time global network topology and link status information, this paper designed an SDN-based WAMS communication network architecture to guide data packet forwarding within the network topology. As illustrated in Figure 1, the architecture primarily consists of the DDPG-LBBP algorithm in the application plane, SDN controllers in the control plane, and OpenFlow switches in the data plane.
(1)
Application plane: The DDPG-LBBP algorithm resides in the application plane, where it is responsible for formulating path optimization strategies within the network topology. Firstly, the DDPG-LBBP algorithm obtains network status information, such as link bandwidth, utilization, delay, and packet loss rate, through the northbound interface. Based on this information, the algorithm leverages a neural network to produce training outputs, which are used to calculate action values—specifically, link weights. The controller then computes the traffic transmission path from the source node to the destination node using these link weights. Next, the controller gathers network topology status information and forwards it to the DDPG-LBBP algorithm for reward value computation. The algorithm updates the parameters within the neural network based on feedback from the reward value. After multiple iterations, the training process is completed. Finally, upon the convergence of the DDPG-LBBP algorithm’s training, the weights of all links in the network topology are determined and output. These weights are then utilized via the northbound interface to guide the controller in formulating the optimal path strategy between the source and destination nodes.
(2)
Control plane: The control plane is composed of SDN controller entities, which are responsible for tasks such as acquiring link information and generating flow table entries. The controller periodically acquires network topology information via the southbound interface and provides real-time network status updates to the DDPG-LBBP algorithm through the northbound interface. Based on the link weights generated by the application plane, the controller calculates the data transmission paths between nodes, issues flow table entries to each switch, and directs data forwarding to the data plane.
(3)
Data plane: The data plane mainly consists of network forwarding devices, such as OpenFlow switches. It features flexible flow tables and dynamic packet processing capabilities. It is responsible for dynamically configuring and managing the behavior of the data plane based on instructions from the control plane. It can handle packet forwarding, routing, and processing according to the control plane’s commands.

2.2. System Modeling

For the definitions of equation-related terms and abbreviations, please refer to Table A1 in Appendix A. The communication network topology of the WAMS is modelled as a directed graph $G = (V, E)$, where $V = \{v_1, v_2, \dots, v_n\}$ represents the set of switches in the network topology, $E = \{e_{1,2}, e_{1,3}, \dots, e_{i,j}\}$ represents the set of links in the network topology, and the link connecting switches $v_i$ and $v_j$ in the network topology is marked $e_{i,j}$. The link weight of link $e_{i,j}$ is marked $weight(e_{i,j})$, and the path from source node $s$ to destination node $d$ is marked $p_{s,d}$.
The network delay of the WAMS communication network consists of two parts, $delay(v_i)$ and $delay_t(e_{i,j})$, where $delay(v_i)$ represents the delay for the PMU device to convert the voltage and current phasor data into data packets and transmit them to the switch connected to the PMU, and $delay_t(e_{i,j})$ represents the delay function of the link $e_{i,j}$ at time $t$. The calculation process of the data transmission delay $Delay_t(p_{s,d})$ of the phasor data from the source node $s$ to the destination node $d$ in the network topology is shown in Equation (1):

$$Delay_t(p_{s,d}) = delay(v_i) + \sum_{e_{i,j} \in p_{s,d}} delay_t(e_{i,j}) \tag{1}$$
Assuming that link $e_{i,j}$ is interconnected via port $m$ of switch $v_i$ and port $n$ of switch $v_j$, the utilized bandwidth $bandwidth_{e_{i,j}}^{t}$ of link $e_{i,j}$ at time $t$ is determined by the aggregate number of bytes forwarded through ports $m$ and $n$. The computational process for this determination is shown in Equation (2):

$$bandwidth_{e_{i,j}}^{t} = \frac{\left(b_{i,m}^{t} - b_{i,m}^{t-T}\right) + \left(b_{j,n}^{t} - b_{j,n}^{t-T}\right)}{T} \tag{2}$$

In Equation (2), $b_{i,m}^{t}$ represents the quantity of bytes transmitted by port $m$ of switch $v_i$ at time $t$, $b_{i,m}^{t-T}$ denotes the number of bytes relayed by port $m$ of switch $v_i$ at time $t-T$, and $T$ signifies the interval for polling the switch status parameters.
The maximal attainable bandwidth of link $e_{i,j}$ is denoted by $\max bandwidth_{e_{i,j}}$, while the utilization rate of link $e_{i,j}$ at time $t$ is indicated by $R_{e_{i,j}}^{t}$. The utilization of path $p_{s,d}$ at time $t$, $Bandwidth_{p_{s,d}}^{t}$, is ascertained by the maximum utilization among the set of links comprising the path. The link utilization $R_{e_{i,j}}^{t}$ and the path utilization $Bandwidth_{p_{s,d}}^{t}$ are quantified using Equation (3):

$$R_{e_{i,j}}^{t} = \frac{bandwidth_{e_{i,j}}^{t}}{\max bandwidth_{e_{i,j}}}, \qquad Bandwidth_{p_{s,d}}^{t} = \max\left\{ R_{e_{i,j}}^{t} \,\middle|\, e_{i,j} \in p_{s,d} \right\} \tag{3}$$
Due to the disparate dimensions of the data transmission delay $Delay_t(p_{s,d})$, link utilization $Bandwidth_{p_{s,d}}^{t}$, and packet loss rate $Loss_{e_{i,j}}^{t}$, the link transmission delay $Delay_t(p_{s,d})$ is normalized to establish a uniform metric, thereby transforming the dimensional expression into a scalar value, as delineated in Equation (4):

$$Delay_t'(p_{s,d}) = \frac{\max Delay(p_{s,d}) - Delay_t(p_{s,d})}{\max Delay(p_{s,d})} \tag{4}$$

In Equation (4), $Delay_t'(p_{s,d})$ represents the normalized value of the data transmission delay, while $\max Delay(p_{s,d})$ denotes the maximum tolerable transmission delay for data transfer within the WAMS.
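For illustration, the following minimal Python sketch evaluates Equations (1)–(4) from polled port byte counters; all function and variable names, as well as the example values, are illustrative and not part of the controller implementation described in this paper.

```python
# Illustrative sketch of Equations (1)-(4); names and values are hypothetical.

def path_delay(pmu_delay, link_delays):
    """Equation (1): PMU packetization delay plus the sum of link delays on p_{s,d}."""
    return pmu_delay + sum(link_delays)

def link_bandwidth(b_im_t, b_im_t_T, b_jn_t, b_jn_t_T, T):
    """Equation (2): utilized bandwidth of e_{i,j} from the byte counters of ports m and n
    sampled at times t and t-T (polling interval T)."""
    return ((b_im_t - b_im_t_T) + (b_jn_t - b_jn_t_T)) / T

def link_utilization(used_bw, max_bw):
    """Equation (3), first part: utilization R of a single link."""
    return used_bw / max_bw

def path_utilization(link_utils):
    """Equation (3), second part: path utilization is the maximum link utilization."""
    return max(link_utils)

def normalized_delay(delay, max_delay):
    """Equation (4): delay normalized against the maximum tolerable WAMS delay."""
    return (max_delay - delay) / max_delay

# Example: a 3-hop path whose byte counters were polled every T = 1 s.
d = path_delay(pmu_delay=0.5e-3, link_delays=[1e-3, 1e-3, 1e-3])
u = path_utilization([link_utilization(2e8, 1e9), link_utilization(3e8, 1e9)])
print(d, u, normalized_delay(d, max_delay=10e-3))
```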

3. Faulty-Path Recovery Scheme Based on DDPG-LBBP

3.1. DDPG-LBBP Data Transmission Path Optimization Algorithm

DDPG-LBBP combines DDPG in deep reinforcement learning with the Gated Recurrent Unit (GRU) to optimize the data transmission paths by leveraging the interactive capabilities of the deep reinforcement learning environment and the decision-making capabilities of deep learning. The DDPG algorithm, based on the actor–critic framework, employs a deep neural network to adapt the policy network and the Q-network. Specifically, the actor represents the policy network and is responsible for generating action strategies, and the critic represents the Q-network and is responsible for evaluating the value of the strategy. In the DDPG algorithm, both the policy function and the value function utilize a dual neural network architecture, which includes an online network and a target network. This structure helps improve the stability of the algorithm. DDPG-LBBP incorporates the DDPG algorithm into the optimization of data transmission paths. It leverages the experience replay mechanism of DDPG to store the states, actions, rewards, and other pertinent information gathered during each time step of the agent’s interaction with the environment in an experience replay pool D. When updating the parameters, the agent randomly samples from this experience replay buffer, reducing the correlation between samples and contributing to the enhancement of the algorithm’s convergence and stability. DDPG-LBBP replaces the extension network in the DDPG algorithm with a GRU, integrates the GRU update process with the online network and target network in the DDPG algorithm, and uses the update and reset mechanism of the GRU model to improve the operating efficiency of the algorithm. The DDPG-LBBP algorithm framework is shown in Figure 2.
In Figure 2, the actor and critic networks in the DDPG-LBBP algorithm are structured as dual online and target networks with identical architectures. Notably, the GRU is employed to replace the original online policy network, target policy network, online Q-network, and target Q-network utilized in the base DDPG algorithm. This substitution aims to enhance the capabilities of the networks for sequential decision-making and long-term dependency modeling. A gate control mechanism is introduced in the GRU model, with reset gates and update gates as the core, which is conducive to timely detection of dynamic changes in the network topology, thereby more accurately adjusting the output actions of the agent. The parameter update process of the actor network and the critic network in the DDPG-LBBP algorithm is as follows:
(1)
Actor network update
In the DDPG-LBBP algorithm, the actor network is responsible for outputting the agent’s actions. The actor network includes the online policy network and the target policy network. Among them, the online policy network is used to output the action $a_t = \mu(s_t \mid \theta^{\mu})$ based on the current state $s_t$ and the policy parameters $\theta^{\mu}$. After the action interacts with the environment, the next state $s_{t+1}$ and the reward value $r_t$ are obtained, and the interaction information $(s_t, a_t, r_t, s_{t+1})$ at that moment is stored in the experience replay pool D. The parameters of the target policy network are updated via a soft update mechanism, which is grounded in the parameters of the online policy network. To update the online policy network, the policy gradient $\nabla_{\theta^{\mu}} G_{\mu}$ is computed: the online Q-network provides the action gradient $grad_Q$, a subset of data $(s_t, a_t, r_t, s_{t+1})$ is sampled from the experience replay buffer, and these sampled data are then utilized to propagate the policy gradient backward. The policy gradient is calculated in accordance with Equation (5):
$$\nabla_{\theta^{\mu}} G_{\mu}(\theta^{\mu}) = grad_Q \cdot grad_{\mu} \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{s = s_t,\, a = \mu(s_t \mid \theta^{\mu})} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu}) \big|_{s_t} \tag{5}$$
In Equation (5), the variable $grad_Q$ represents the action gradient derived from the online Q-network. This gradient serves as a crucial factor in guiding the adjustment of the online policy network’s parameters, thereby fine-tuning the policy’s direction. $grad_{\mu}$ represents the parameter gradient of the online policy network. This gradient ensures that the online policy network learns to select actions that yield high rewards. Meanwhile, $N$ represents the number of randomly sampled experiences from the experience replay buffer.
(2)
Critic network updates
The critic network comprises the online Q-network and the target Q-network, both tasked with estimating the action-value function. Specifically, the online Q-network computes the Q value for a given state $s_t$ and action $a_t$, sampled from the experience replay buffer. The target Q-network is used to calculate the Q value of the state $s_{t+1}$ sampled from the experience replay pool and the action $a_{t+1}$ selected by the target policy network. The online Q-network updates its network parameters based on the two Q values to minimize the error. The updating process is shown in Equation (6):
$$L = \frac{1}{N} \sum_{i} \left( y_t - Q(s_t, a_t \mid \theta^{Q}) \right)^{2} \tag{6}$$
In Equation (6), $N$ represents the number of randomly sampled transitions, and $y_t$ represents the target Q value computed by the target Q-network for the state $s_{t+1}$ and the selected action $a_{t+1}$.
The target Q-network performs iterative training based on state $s_{t+1}$ and action $a_{t+1}$ to obtain the target value function $Q'(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$. The parameters $\theta^{Q'}$ are obtained by periodically copying the online Q-network parameters $\theta^{Q}$. The target Q-network calculates the target return value $y_t$ for the online Q-network. The calculation process of $y_t$ is shown in Equation (7):
$$y_t = r_t + \gamma Q'\!\left(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right) \tag{7}$$
In Equation (7), $r_t$ signifies the reward value accrued through executing action $a_t$ in state $s_t$. Furthermore, $\theta^{Q'}$ denotes the parameters of the target Q-network, $\theta^{\mu'}$ represents the parameters of the target policy network, and $\gamma \in (0, 1)$ stands for the discount factor, which factors in the temporal decay of future rewards.
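The update rules in Equations (5)–(7) can be illustrated with a compact NumPy sketch. The linear "networks" and shapes below are stand-ins for the GRU-based actor and critic of DDPG-LBBP and are purely illustrative; in practice the gradients in Equation (5) would be produced by automatic differentiation.

```python
# Minimal NumPy sketch of the DDPG update targets in Equations (5)-(7).
# All names, shapes, and the linear "networks" are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, N = 3, 2, 32          # (delay, utilization, loss) style state
gamma, tau = 0.99, 0.005                      # discount factor and soft-update rate

# Online and target parameter sets (actor: theta_mu, critic: theta_Q).
theta_mu   = rng.normal(size=(state_dim, action_dim))
theta_mu_t = theta_mu.copy()
theta_Q    = rng.normal(size=(state_dim + action_dim,))
theta_Q_t  = theta_Q.copy()

def actor(s, p):     return np.tanh(s @ p)                                # a = mu(s | theta_mu)
def critic(s, a, p): return np.concatenate([s, a], axis=-1) @ p           # Q(s, a | theta_Q)

# A random minibatch (s_t, a_t, r_t, s_{t+1}) sampled from the replay pool D.
s, a = rng.normal(size=(N, state_dim)), rng.normal(size=(N, action_dim))
r, s_next = rng.normal(size=N), rng.normal(size=(N, state_dim))

# Equation (7): target return y_t from the target actor and target critic.
y = r + gamma * critic(s_next, actor(s_next, theta_mu_t), theta_Q_t)

# Equation (6): critic loss L = (1/N) * sum_i (y_t - Q(s_t, a_t))^2.
critic_loss = np.mean((y - critic(s, a, theta_Q)) ** 2)

# Equation (5), conceptually: the actor is updated to increase Q(s, mu(s));
# here we only evaluate that objective, gradients would come from autodiff.
actor_objective = np.mean(critic(s, actor(s, theta_mu), theta_Q))

# Soft update of the target networks: theta' <- tau*theta + (1-tau)*theta'.
theta_Q_t  = tau * theta_Q  + (1 - tau) * theta_Q_t
theta_mu_t = tau * theta_mu + (1 - tau) * theta_mu_t
print(critic_loss, actor_objective)
```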

3.2. DDPG-LBBP Agent Interacts with the Environment

The GRU neural network is used in the DDPG-LBBP algorithm framework for the neural network training process of the DDPG algorithm. The GRU neural network, as an efficient recurrent neural network architecture, primarily comprises two gating mechanisms: a reset gate and an update gate. These gates enable the GRU to efficiently handle and extract relevant information from the input multi-dimensional network topology data, resulting in a superior training effect. The DDPG-LBBP algorithm leverages the sophisticated gating mechanisms of the GRU neural network, specifically the update gate and the reset gate. The update gate assesses the level of impact that historical state information has on the current state, while the reset gate modulates the degree of interaction between the current input and the historical state information. This approach streamlines the algorithm’s architecture, enhancing training efficiency while maintaining model accuracy. The interaction process between the DDPG-LBBP agent and the environment is shown in Figure 3.
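The gating behavior described above can be illustrated with a minimal NumPy sketch of a single GRU step; the weight shapes, dimensions, and input sequence are illustrative assumptions rather than the parameters used in DDPG-LBBP.

```python
# Minimal NumPy sketch of one GRU step with the reset and update gates
# described above; weight shapes and names are illustrative assumptions.
import numpy as np

def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W_h, U_h):
    z = sigmoid(x_t @ W_z + h_prev @ U_z)                # update gate: how much history to keep
    r = sigmoid(x_t @ W_r + h_prev @ U_r)                # reset gate: how much history to mix in
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h)     # candidate state from current input
    return (1 - z) * h_prev + z * h_cand                 # new hidden state

rng = np.random.default_rng(0)
in_dim, hid = 3, 8                                       # per-link state: delay, utilization, loss
params = [rng.normal(scale=0.1, size=s) for s in [(in_dim, hid), (hid, hid)] * 3]
h = np.zeros(hid)
for x in rng.normal(size=(5, in_dim)):                   # a short sequence of network states
    h = gru_step(x, h, *params)
print(h.shape)
```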
The DDPG-LBBP agent interacts repeatedly with the network environment, adjusting the action output according to the dynamic changes in data transmission delay, link occupancy rate, packet loss rate and reward value in the network topology, and dynamically adjusting the link weight in the network topology. During the interaction between the agent and the environment, the following procedures are integral to the mapping of states, actions, and reward values:
(1)
State mapping
The state serves as a representation of the real-time conditions within the network environment where the agent operates. The DDPG-LBBP algorithm incorporates the data transmission delay, link occupancy, and packet loss rate within the network topology as input state information. The SDN controller gathers and processes this state information, subsequently transmitting the actual network status to the agent. Specifically, the state of the nodes in the network topology at time $t$ is denoted as $s_t = [delay_t(e_{i,j}), bandwidth_{e_{i,j}}^{t}, Loss_{e_{i,j}}^{t}]$, where $delay_t(e_{i,j})$ represents the data transmission delay, $bandwidth_{e_{i,j}}^{t}$ signifies the link utilization, and $Loss_{e_{i,j}}^{t}$ indicates the packet loss rate.
(2)
Action mapping
The action represents a strategy formulated by the agent based on the evolving state and reward values. The DDPG-LBBP algorithm is trained by taking the state $s_t$ and reward value $r_t$ at time $t$ as inputs, and it outputs the optimized action values after the iterative training converges. The action set output by the DDPG-LBBP agent is denoted as $action = \{a_{w_{1,2}}, a_{w_{1,3}}, \dots, a_{w_{i,j}}\}$, wherein each action value $a_{w_{i,j}}$ within the set corresponds to the link weight $weight(e_{i,j})$ assigned to the specific link $e_{i,j}$.
(3)
Reward value mapping
The reward value is the immediate benefit fed back to the agent for the action $a_t$ taken in the current state $s_t$. The main optimization goals of the DDPG-LBBP algorithm are low link utilization, low transmission delay, and low packet loss rate, so the link utilization, transmission delay, and packet loss rate are standardized and used as the basis for calculating the reward value. The specific calculation process of the reward value is shown in Equation (8):

$$reward = \eta \cdot Delay_t(p_{s,d}) + \tau \cdot Loss_{p_{s,d}}^{t} + \xi \cdot Bandwidth_{p_{s,d}}^{t} \tag{8}$$

In Equation (8), the reward factors $\eta, \tau, \xi \in (0, 1)$, with $\eta + \tau + \xi = 1$, are set according to the relative importance of transmission delay, packet loss rate, and link utilization within the specific network topology. For the purposes of this article, $\eta$ was set to 0.4, $\tau$ was designated as 0.3, and $\xi$ was assigned a value of 0.3, reflecting the relative weights given to these network performance metrics in determining the overall reward value.
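As an illustration, a minimal Python sketch of Equation (8) with the weights stated above; the metric values passed in are made-up examples, and the function name is illustrative.

```python
# Illustrative computation of the reward in Equation (8) with the weights used
# in this paper (eta = 0.4, tau = 0.3, xi = 0.3); the metric values are made up.

def reward(delay_norm, loss_norm, bandwidth_norm, eta=0.4, tau=0.3, xi=0.3):
    """Weighted combination of the normalized delay, packet loss rate, and
    link utilization of path p_{s,d}; eta + tau + xi must equal 1."""
    assert abs(eta + tau + xi - 1.0) < 1e-9
    return eta * delay_norm + tau * loss_norm + xi * bandwidth_norm

# Example: a path whose normalized metrics have already been computed.
print(reward(delay_norm=0.85, loss_norm=0.02, bandwidth_norm=0.35))
```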
In order to ensure the rapid recovery of faulty links in the communication network of the wide-area measurement system, this paper calculated the link weight value based on the DDPG algorithm improved by the GRU neural network. The specific process is shown in Algorithm 1.
Algorithm 1 Improved DDPG algorithm process based on GRU
Random initialization: parameters $\theta^{\mu}$, $\theta^{\mu'}$, $\theta^{Q}$, $\theta^{Q'}$, and the experience replay pool D
Input: Link status information in the network topology $s_t = [delay_t(e_{i,j}), bandwidth_{e_{i,j}}^{t}, Loss_{e_{i,j}}^{t}]$
Output: Weight $weight(e_{i,j})$ of each link $e_{i,j}$
(1)    For episode = 1, M do:
(2)     Initialize $s_1$ and initialize the noise strategy
(3)     For t = 1, T do
(4)        Select action $a_t = \mu(s_t \mid \theta^{\mu})$ according to the current policy
(5)        Execute $a_t = \mu(s_t \mid \theta^{\mu})$ to make the SDN controller build phasor data transmission paths
(6)        Obtain $r_t$ and $s_{t+1}$
(7)        Store transition $(s_t, a_t, r_t, s_{t+1})$ in D
(8)        Sample a random minibatch of N transitions $(s_t, a_t, r_t, s_{t+1})$ from D
(9)        Calculate the target return value $y_t = r_t + \gamma Q'(s_{t+1}, \mu'(s_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$
(10)      Update the critic GRU network parameters by minimizing the loss $L = \frac{1}{N} \sum_{i} (y_t - Q(s_t, a_t \mid \theta^{Q}))^{2}$
(11)      Update the parameters of the actor GRU network using the critic GRU network: $\nabla_{\theta^{\mu}} G_{\mu} \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})|_{s = s_t, a = \mu(s_t \mid \theta^{\mu})} \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})|_{s_t}$
(12)      Update the target networks using $\theta^{Q'} \leftarrow \tau \theta^{Q} + (1 - \tau) \theta^{Q'}$ and $\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1 - \tau) \theta^{\mu'}$
(13)     End for
(14)  End for
In Algorithm 1, the loop body describes the parameter update process using the phasor flow state information during the training of the improved DDPG algorithm. Line (2) initializes the state $s_1$ retrieved from the communication network environment of the WAMS; this initial state serves as the starting point for the subsequent optimization process. In lines (4) through (7), the agent executes action $a_t$, receives the reward value $r_t$ and the next state $s_{t+1}$, and stores the experience tuple $(s_t, a_t, r_t, s_{t+1})$ in the experience replay pool D, facilitating the utilization of these data in future training iterations. Lines (8) through (12) randomly select training samples from the experience replay pool D to facilitate the training of both the critic GRU network and the actor GRU network. This approach ensures that the networks are updated based on a diverse set of experiences, enhancing their generalization capabilities. After several rounds of iteration, the improved DDPG algorithm based on the GRU neural network converges, and finally, the link weight $weight(e_{i,j})$ of every link in the wide-area measurement system communication network is output.
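As a sketch of the experience replay pool D referenced in Algorithm 1, the following minimal Python class stores and uniformly samples transitions; the capacity, field layout, and example values are illustrative assumptions.

```python
# Minimal sketch of the experience replay pool D used in Algorithm 1: transitions
# (s_t, a_t, r_t, s_{t+1}) are stored and sampled uniformly at random to break the
# correlation between consecutive samples. Names and capacity are illustrative.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are discarded first

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

# Usage: store one transition after each agent-environment interaction, then
# sample a minibatch of N transitions for the updates in lines (8)-(12).
D = ReplayBuffer()
D.store([1.2, 0.4, 0.01], [0.7], 0.83, [1.1, 0.5, 0.02])
```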

3.3. Method for Implementing Maximum Disjoint Backup Paths

Due to the physical layout limitations of nodes and links in the communication network topology of the wide-area measurement system, the primary path and the backup path may share overlapping nodes and links, causing link congestion on the backup path during fault recovery, prolonging the faulty-link recovery time, and possibly even causing a secondary link failure. In order to solve the problem of secondary network link failures caused by overlapping links and nodes in the primary path and backup path, this paper designed a maximum disjoint backup path that avoids such overlap. The new link weights in the network topology, calculated with the method described in Section 3.2, are used to construct the paths that restore the faulty link, improve the reliability of the backup path, and ensure that the phasor data can quickly resume transmission along the backup path. In the communication network of the wide-area measurement system, based on the characteristic that all PMUs transmit phasor data to the PDC, the Dijkstra algorithm is used to construct the shortest path tree $T$ with the switch connected to the PDC as the destination node $d$. Each source node $s$ transmits data to the root node $d$ along the shortest path, and the path $p_{s,d}$ from the source node $s$ to the root node $d$ in the network topology is expressed as $p_{s,d} = e_{s,i} :: e_{i,i+1} :: \dots :: e_{j,d}$. The weight of path $p_{s,d}$ is calculated based on the action values $a_{w_{i,j}}$ issued by the DDPG-LBBP agent. The specific calculation process of the weight of path $p_{s,d}$ is shown in Equation (9):
$$d(p_{s,d}) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{w_{i,j}}, \qquad i, j, n \in N \tag{9}$$
Based on the fact that each PMU in the WAMS periodically sends the collected phasor information to the PDC, the DDPG-LBBP algorithm proposed in this paper predicts the traffic changes in the network topology and computes the maximum disjoint backup path $bp_{s,d}$ for each path in the shortest path tree $T$. The algorithm incorporates the data transmission delay, link utilization, and packet loss rate into the path calculation process, reducing the likelihood of network congestion caused by the backup path. The steps of the maximum disjoint backup path algorithm are as follows: (1) Traverse the shortest path tree $T$ in the WAMS communication network topology and initialize the weights of the links in tree $T$ to infinity. (2) Traverse all neighbor nodes of the starting node $s$ in the network topology, select the unvisited neighbor node $v$ with the smallest link weight, mark it as the current node, and add it to the node set. (3) Repeat step (2) until all nodes have been visited and the shortest path $bp_{s,d}$ from the starting node $s$ to the destination node $d$ has been found. (4) After repeating the above steps for every source node, a new shortest path tree is obtained, which represents the maximum disjoint backup paths from all switches connected to PMUs to the destination node $d$. The specific implementation process is shown in Algorithm 2, and the maximum disjoint backup path network topology is shown in Figure 4.
Algorithm 2 Maximum disjoint backup path algorithm process
Input: Network topology $G = (V, E)$, weight $weight(e_{i,j})$ of each link $e_{i,j}$ in the topology
Output: The maximum disjoint backup path $bp_{s,d}$ from start node $s$ to destination node $d$
(1)    Initialize the shortest path tree $T$
(2)    For each neighbor $v$ of $s$
(3)     If $v$ is not visited
(4)       $d(p_{s,v}) = \min weight(e_{s,v})$
(5)     End if
(6)     Add $v$ to the visited node collection
(7)     While there are unvisited nodes
(8)       Update the neighbor weights of $v$
(9)       Find the maximum disjoint backup path $bp_{s,d}$
(10)     End while
(11)  End for
As shown in Figure 4, the data forwarding paths in the communication network topology are represented by lines with arrows and are calculated based on the action values generated by the DDPG-LBBP agent. The black line represents the data transmission path from node $s$ to destination node $d$, the red line represents the maximum disjoint backup path calculated for the primary path between the nodes, and the red node 7 represents the common destination node $d$. The data transmission process in the network topology is as follows: (1) node 1 transmits data to node 7 through the primary path $p_{1,7} = e_{1,5} :: e_{5,6} :: e_{6,7}$; (2) when link $e_{1,5}$ in the path fails, the primary path must be quickly switched to the maximum disjoint backup path $bp_{1,7} = e_{1,8} :: e_{8,9} :: e_{9,6} :: e_{6,7}$, which ensures data transmission from node 1 to node 7.
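A minimal sketch of the maximally disjoint backup path idea follows, reproducing the Figure 4 example with networkx (assumed available). It penalizes primary-path links with a very large weight so that Dijkstra reuses a shared link only when the topology leaves no alternative, as with link e6,7 here; node numbers and weights are illustrative.

```python
# Sketch of the maximum disjoint backup path idea from Algorithm 2, using the
# Figure 4 example (primary path 1-5-6-7, backup path 1-8-9-6-7). Links already
# used by the primary path are given an effectively infinite weight so that
# Dijkstra avoids them wherever possible. Edge list and weights are illustrative.
import networkx as nx

G = nx.Graph()
edges = [(1, 5), (5, 6), (6, 7), (1, 8), (8, 9), (9, 6)]
G.add_edges_from(edges, weight=1.0)          # weights would come from the DDPG-LBBP agent

s, d = 1, 7
primary = nx.dijkstra_path(G, s, d, weight="weight")          # [1, 5, 6, 7]

# Penalize every link on the primary path so the backup path is maximally disjoint.
INF = 1e9
for u, v in zip(primary, primary[1:]):
    G[u][v]["weight"] = INF

backup = nx.dijkstra_path(G, s, d, weight="weight")           # [1, 8, 9, 6, 7]
print(primary, backup)
```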

3.4. Backup Path Installation Method Implementation

The DDPG-LBBP algorithm enables rapid recovery of failed links by calculating the maximum disjoint backup paths for the WAMS communication networks. To prevent conflicts between packet forwarding rules based on the backup and main paths, and to meet the high real-time requirements of the WAMS communication networks, this paper designed a backup tag and incorporated it into the backup path installation process. Based on the pre-calculated maximum disjoint backup paths in the WAMS communication network topology, a backup tag was added to the header of the phase packet in the faulty link, guiding the phase packet to complete data transmission along the backup path. For a faulty link in the main path, the phase data from the faulty link can be redirected to the PDC based on the obtained maximum disjoint path, thus resuming the transmission of phase data. To reduce the backup path storage cost at the switch, when a link fails, the switch upstream of the faulty link adds a backup tag label to the header of the phase packet. Based on this backup tag label, the packet redirects traffic from the failed link to the backup path and directs the phase data packet to the PMU (or PDC). At the PMU (or PDC), the backup tag label is removed from the phase packet, which then continues along the primary path to the PDC. The backup tag label installation process is shown in Figure 5.
The phase data in the WAMS communication network are transmitted along the main path. The OpenFlow switch performs a matching operation on the phase data packet based on the flow table entry. The matching field in the flow table entry matches the source and destination IP addresses of the packet. After a successful match, the packet is processed according to the instructions in the flow table entry. The red node 8 in Figure 5 represents the switch connected to the PDC, and the blue node 4 represents the node connected to the PMU. When a failure occurs in the primary path, the switch upstream of the failed link detects that the current forwarding port is unavailable. This switch then adds a backup tag to the phase packet’s header, directing the packet to be transmitted along the backup path. When the phase packet reaches the switch connected to the PMU, the switch strips the backup tag from the packet.
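A hedged Ryu (OpenFlow 1.3) sketch of how such backup-tag flow rules could look is given below, using a VLAN ID as the backup tag; the tag value, port numbers, and priorities are assumptions for illustration and not the exact rules installed by DDPG-LBBP.

```python
# Hedged Ryu/OpenFlow 1.3 sketch of the backup tag mechanism described above:
# the switch upstream of the failed link pushes a VLAN tag that steers matching
# phasor packets onto the backup path, and the switch where the backup path
# rejoins the primary path pops the tag. Using a VLAN ID as the "backup tag" is
# an assumption of this sketch; ports, priorities, and the tag value are illustrative.
from ryu.lib.packet import ether_types

BACKUP_TAG = 100          # illustrative backup tag value
OFPVID_PRESENT = 0x1000   # OpenFlow convention: marks the VLAN ID as present

def install_backup_redirect(datapath, in_port, backup_port):
    """On the switch upstream of the failed link: tag packets and send them to the backup path."""
    ofp, parser = datapath.ofproto, datapath.ofproto_parser
    match = parser.OFPMatch(in_port=in_port)
    actions = [parser.OFPActionPushVlan(ether_types.ETH_TYPE_8021Q),
               parser.OFPActionSetField(vlan_vid=(OFPVID_PRESENT | BACKUP_TAG)),
               parser.OFPActionOutput(backup_port)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=200,
                                        match=match, instructions=inst))

def install_backup_strip(datapath, out_port):
    """On the switch where the backup path rejoins the primary path: strip the tag and forward."""
    ofp, parser = datapath.ofproto, datapath.ofproto_parser
    match = parser.OFPMatch(vlan_vid=(OFPVID_PRESENT | BACKUP_TAG))
    actions = [parser.OFPActionPopVlan(), parser.OFPActionOutput(out_port)]
    inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
    datapath.send_msg(parser.OFPFlowMod(datapath=datapath, priority=200,
                                        match=match, instructions=inst))
```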

4. Experiment and Result Analysis

4.1. Experimental Environment and Parameter Configuration

The experimental hardware environment used high-performance hardware, including an Intel Core i7-13700H 5.0 GHz processor (Intel Corporation, Santa Clara, CA, USA), an NVIDIA RTX 4060 graphics card, 16 GB of memory, and a 1 TB solid-state drive, to ensure high-performance and stable operation of the experiment. The experimental software environment included Ubuntu 18.04, Python 3.6.9, TensorFlow 1.8.0, and Gym 0.26.2, as well as simulation tools and controllers, including the OpenFlow protocol V1.3, the Mininet simulation tool V2.3.0, and the Ryu controller V4.34, configured with an SDN-integrated wide-area measurement communication network topology. Among them, the OpenFlow protocol played a pivotal role in ensuring secure communication between the Ryu controller and the OpenFlow switches. By utilizing this protocol, the Ryu controller was able to monitor and gather real-time status information from the network topology, thereby facilitating network management and optimization. The Mininet simulation tool simulated OpenFlow switches and network links to create a virtual network environment with realistic wide-area measurement communication network characteristics. The Ryu controller collected real-time topology operating status information in the wide-area measurement communication network, transmitted the status information to the DDPG agent for iterative training of the agent, and provided the data transmission paths between nodes in the topology to each OpenFlow switch. The software functions and interaction relationships in the experiment are shown in Table 1.
In order to verify the effectiveness of the DDPG-LBBP algorithm proposed in this paper, the experimental process was based on the IEEE 30 and IEEE 57 benchmark power system communication network topologies, which are commonly used in the power system field. The performance of the DDPG-LBBP algorithm was tested, following the simplified methods commonly applied in IEEE benchmark power system communication network research [19]. The communication network topology was defined to match the node locations and interconnections of the IEEE benchmark power system topology. This paper referred to the PDC and PMU layout algorithms in [20,21] to determine the positions of the PMU and PDC in the IEEE 30 and IEEE 57 benchmark power systems, as shown in Figure 6. In order to simulate realistic scenarios more vividly and conduct a comprehensive evaluation of the DDPG-LBBP algorithm’s performance, based on the results of preliminary pre-experiments, we regulated the transmission rate of phasor data in the WAMS network to set different levels of data traffic intensity. Specifically, the traffic intensity was set within the range of 20–80% of the link’s maximum capacity. This range encompasses the typical operating conditions of the actual power system communication network, from low-load to relatively high-load scenarios. The bandwidth of each link was set to 1 Gbps, with a link delay defined as 1 ms/200 km. The maximum number of flow rules in the OpenFlow switch was capped at 1000. The Iperf tool was employed to simulate and generate phasor data flow within the WAMS network topology [22] and to compute the recovery delay of the failed components.
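As an illustration of the simulation setup, a minimal Mininet sketch with TCLink links configured for 1 Gbps bandwidth and 1 ms delay and a remote Ryu controller is shown below; the two-switch topology, host names, and controller address are assumptions, and the full IEEE 30/57 topologies would be built analogously from their edge lists.

```python
# Hedged Mininet sketch of the link parameters used in the experiments: 1 Gbps
# links with a 1 ms delay, attached to a remote Ryu controller. Node names and
# the controller IP/port are illustrative assumptions.
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.link import TCLink

def build_demo_net():
    net = Mininet(controller=RemoteController, switch=OVSSwitch, link=TCLink)
    net.addController('c0', ip='127.0.0.1', port=6653)        # remote Ryu controller
    s1, s2 = net.addSwitch('s1'), net.addSwitch('s2')
    h_pmu, h_pdc = net.addHost('h1'), net.addHost('h2')       # PMU and PDC endpoints
    for a, b in [(h_pmu, s1), (s1, s2), (s2, h_pdc)]:
        net.addLink(a, b, bw=1000, delay='1ms')                # 1 Gbps, 1 ms per link
    return net

if __name__ == '__main__':
    net = build_demo_net()
    net.start()
    net.pingAll()          # phasor traffic itself would be generated with Iperf
    net.stop()
```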
In Figure 6, the red node represents the switch connected to the PDC, and the green nodes represent the nodes connected to PMUs. Figure 6a shows the IEEE 30 benchmark power system communication network topology, where a PDC was deployed at node 17 and 10 PMUs were installed at nodes {1, 2, 6, 9, 10, 12, 15, 19, 25, 27}. In Figure 6b, the IEEE 57 benchmark power system communication network topology is shown, with a PDC deployed at node 22 and 18 PMUs installed at nodes {1, 4, 8, 10, 20, 21, 24, 28, 31, 32, 36, 41, 44, 46, 49, 52, 55, 57}.

4.2. Comparative Evaluation of DDPG-LBBP

In order to evaluate the performance of the proposed faulty-link fast recovery method, the DDPG-LBBP algorithm, the experiment compared DDPG-LBBP with other current representative faulty-link recovery methods. The algorithms compared included the (1+2 ε )-BPCA backup path recovery algorithm ((1+2 ε )-Approximation Backup Path Construction Algorithm) [18], FFRLI fault recovery method (Fast Fault Recovery Scheme Based on Link Importance) [17] and LIR faulty-link recovery method (Low Interruption Ratio Link Fault Recovery Scheme) [16]. The main comparison contents included the faulty-link recovery delay, packet loss rate after the faulty link was restored, and faulty-link recovery success rate.

4.3. Faulty-Link Recovery Delay

The fault recovery delay of the DDPG-LBBP algorithm in the power system communication network was tested on the IEEE 30 and IEEE 57 benchmarks, with the faulty-link recovery delay taken as the key indicator for experimental comparison. The faulty-link recovery delay was determined by the switch port's switching time and the transmission delay of the phasor data packet to the destination node $d$ along the backup path. To ensure the authenticity of the experimental results, a random algorithm was used in the IEEE 30 and IEEE 57 benchmark power system communication networks to simulate the generation of faulty links in the network topology. Multiple experiments were performed for each of the DDPG-LBBP, (1+2ε)-BPCA, FFRLI, and LIR faulty-link recovery methods, and the results were analyzed statistically to ensure their credibility. The experimental results are shown in Figure 7 and Figure 8, where Figure 7 shows the recovery delay of each faulty-link recovery method in the IEEE 30 benchmark test power system communication network, and Figure 8 shows the recovery delay of each faulty-link recovery method in the IEEE 57 benchmark test power system communication network.
As shown in Figure 7, in the IEEE 30 benchmark test power system communication network, the fault recovery delay of the DDPG-LBBP faulty-link recovery method proposed in this paper was mainly distributed in the range of 1.9 ms~3.1 ms, only 7.9% of the fault recovery delay exceeded 3.1 ms, and the maximum fault recovery delay was 4.6 ms. The fault recovery delay of the (1+2 ε )-BPCA backup path recovery algorithm was mainly distributed in the range of 2.8 ms~4.4 ms, and the maximum fault recovery delay was 5.4 ms. For the FFRLI and LIR faulty-link recovery methods, 22.1% and 68.4% of the faulty-link recovery delays exceeded 4.5 ms, respectively. The LIR faulty-link recovery method aims to reduce backup flow rules as the optimization objective. It does not consider link utilization when designing the backup path, which leads to link congestion during path recovery and results in a higher link recovery delay compared to other algorithms. The FFRLI fault recovery method calculates the backup path according to the importance of the link and recovers the faulty link faster than the LIR fault link recovery method.
As shown in Figure 8, in the IEEE 57 benchmark test power system communication network, the fault recovery delay of the DDPG-LBBP faulty-link recovery method proposed in this paper was mainly distributed in the range of 6.9 ms~8.2 ms, only 8.7% of the fault recovery delay exceeded 8.2 ms, and the maximum fault recovery delay was 9.4 ms. The fault recovery delay of the (1+2 ε )-BPCA backup path recovery algorithm was mainly distributed in the range of 7.4 ms~9 ms, and the maximum fault recovery delay was 10 ms. For the FFRLI and LIR faulty-link recovery methods, 24.2% and 61.3% of the faulty-link recovery delays exceeded 9.4 ms, respectively. The (1+2 ε )-BPCA backup path recovery algorithm uses a backup path approximation algorithm and optimizes the backup path installation process to reduce the faulty-link recovery delay. The DDPG-LBBP algorithm, which is grounded in the enhanced DDPG algorithm and the GRU, demonstrated substantial performance enhancements. Regarding the recovery latency of failed links, our algorithm dynamically fine-tunes the routing strategy in accordance with real-time network states, including link utilization and data transfer latency. Compared with the above three faulty-link recovery algorithms, the DDPG-LBBP algorithm proposed in this paper comprehensively considers the network topology status information to compute disjoint backup paths, resulting in the shortest fault recovery delay. Compared with the better (1+2 ε )-BPCA backup path recovery algorithm, the faulty-link recovery delay was reduced by about 12.26%.
Figure 7 and Figure 8 show the corresponding faulty-link recovery delays of the four faulty-link recovery algorithms in the IEEE 30 and IEEE 57 benchmark test power system communication networks. It can be seen from the figures that the faulty-link recovery delays of the four faulty-link recovery algorithms in the IEEE 57 communication network were higher than those in the IEEE 30 communication network. The reason for this phenomenon is that some links in the IEEE 57 communication network were longer, resulting in larger link delays, which led to high faulty-link recovery delays in the IEEE 57 communication network. Since the DDPG-LBBP algorithm comprehensively considers the network topology status information to calculate disjoint backup paths, it can effectively reduce the fault recovery delay compared with the other three fault recovery algorithms.

4.4. Packet Loss Rate After a Faulty-Link Is Restored

Considering the fact that the phasor data flow in the communication network of the wide-area measurement system changes dynamically in real time, this paper designed experimental environments with different traffic loads on the links in the network topology. The Wireshark packet capture tool was used to analyze the phasor data packet loss in the network topology. The phasor packet loss rate is calculated as the ratio of phasor packets that did not reach the destination node $d$ to the total number of phasor packets sent. To ensure the reliability of the experimental results, the packet loss rate was determined by averaging the results from several experiments in the IEEE 30 and IEEE 57 benchmark test power system communication networks. The packet loss rates of the DDPG-LBBP, FFRLI, (1+2ε)-BPCA, and LIR faulty-link recovery algorithms were then compared. The experimental results are shown in Figure 9.
As shown in Figure 9, the packet loss rate of the DDPG-LBBP algorithm fluctuated around 3.8% with the continuous increase in phasor data traffic in the IEEE 30 and IEEE 57 benchmark test power system communication networks, which was lower than that of other faulty-link recovery algorithms. The packet loss rates of the FFRLI, (1+2 ε )-BPCA, and LIR faulty-link recovery algorithms showed an overall upward trend. The FFRLI fault recovery method formulates a backup path based on the importance of the link in the network topology, which can ensure the phasor data transmission of the main path, and thus had a lower packet loss rate than (1+2 ε )-BPCA and LIR. The (1+2 ε )-BPCA backup path recovery algorithm achieves faulty-link recovery using the faulty-link approximation backup path algorithm, without assessing the criticality of the link. This led to a higher packet loss rate in the network topology after fault recovery compared to the FFRLI fault recovery method. The LIR faulty-link recovery method was susceptible to excessive network link load during the experiment, resulting in the maximum phasor packet loss rate of the algorithm.
In terms of the packet loss rate, the DDPG-LBBP algorithm demonstrated remarkable adaptive capabilities. Confronted with varying traffic intensities, the algorithm intelligently optimized link utilization and data transmission strategies. In high-traffic-intensity scenarios, it balanced the traffic loads among multiple links to avert congestion. In low-traffic-intensity scenarios, it enhanced data transmission efficiency, thereby reducing the packet loss rate. The DDPG-LBBP algorithm proposed in this paper uses the improved DDPG method to optimize low link utilization, low transmission delay and low packet loss rate as its objectives. It ensures the load balancing of the backup path while achieving fast fault recovery and reducing the packet loss rate of the network topology. Compared with the better FFRLI fault recovery method, the packet loss rate in the network topology was reduced by about 15.31% after the faulty link was restored.

4.5. Faulty-Link Recovery Success Rate

To verify the faulty-link recovery performance of the DDPG-LBBP algorithm, the experiment used the faulty-link recovery success rate as the evaluation standard. In the IEEE 30 and IEEE 57 benchmark test power system communication networks, network topology environments with different traffic loads were set up. The ratio of the number of links successfully recovered during the experiment to the total number of faulty links in the topology was defined as the recovery success rate. The faulty-link recovery success rate of the DDPG-LBBP algorithm was experimentally compared with that of the (1+2ε)-BPCA, FFRLI, and LIR faulty-link recovery algorithms. The experimental results are shown in Figure 10.
As can be seen from Figure 10, the faulty-link recovery success rate of the DDPG-LBBP algorithm in the IEEE 30 and IEEE 57 benchmark test power system communication networks remained at 94.5%, and the faulty-link recovery success rates of the other three faulty-link recovery algorithms showed a significant downward trend as the traffic intensity in the network topology increased. Since the backup path calculated by the LIR fault link recovery method cannot be adjusted in time according to the network topology, this algorithm had the lowest recovery success rate compared to other fault recovery methods. The (1+2 ε )-BPCA backup path recovery algorithm and the FFRLI fault recovery method had a higher faulty-link recovery success rate than the LIR faulty-link recovery method.
In terms of faulty-link recovery success rate, the DDPG-LBBP algorithm exhibited a distinct and highly efficient operational mechanism. This algorithm periodically updates non-overlapping backup paths based on the operational state of the network topology. During the installation of backup paths, a backup label mechanism is incorporated to precisely direct phasor data packets to be transmitted along the backup paths. This effectively minimizes data packet transmission errors and losses, significantly enhancing the recovery success rate of fault links. Compared with the better performing (1+2 ε )-BPCA fault recovery algorithm, it improved the success rate of the faulty-link recovery by about 6.91%.

4.6. Ablation Experiments

To deeply analyze the impact mechanisms of key elements in the DDPG-LBBP algorithm on its performance, this study meticulously designed and conducted ablation experiments, focusing on the roles of the GRU, reward regulation, and path design in the algorithm. The experimental environment was consistent with that described in Section 4.1, and the experiments were carried out based on the communication network topology of the IEEE 57 benchmark test power system. To achieve precise analysis, this study set up three groups of comparative experiments. The first group was DDPG-RNN with the GRU removed, aiming to observe the impact of the GRU absence on the algorithm’s performance. The second group was DDPG-GRU with a basic reward mechanism, which was used to explore the differences in the algorithm’s performance under different reward mechanisms. The third group was the complete DDPG-LBBP algorithm, which served as a control group to provide a benchmark for comparative analysis. Four key indicators, namely, training time, average recovery delay, average packet loss rate, and average recovery success rate, were selected to comprehensively and systematically evaluate the algorithm’s performance from multiple dimensions. Each experiment was independently run 2000 times to ensure the reliability and stability of the experimental results. To deeply explore the information behind the experimental data, an Analysis of Variance (ANOVA) was used to process the experimental data and to determine whether the differences in the mean values of the evaluation indicators under different algorithm configurations were significant, thereby clarifying the impacts of key elements on the algorithm’s performance. The results of the ablation experiments are shown in Table 2.
As shown in Table 2, in terms of training time, DDPG-RNN had the longest training time, with a mean value of 356.8 s due to the absence of the GRU. In contrast, the training times of DDPG-GRU and DDPG-LBBP were significantly shortened to 214.6 s and 187.2 s, respectively, because of the presence of the GRU. Moreover, DDPG-LBBP’s training time was further reduced thanks to its optimized reward regulation mechanism. The analysis of variance showed that there were significant differences in training time among the three (F (2, 5997) = 1023.6, p < 0.001). Regarding the average recovery delay, DDPG-RNN performed the worst at 10.5 ms. The value of DDPG-GRU decreased to 8.3 ms, and DDPG-LBBP performed best at only 6.8 ms. The ANOVA indicated significant differences (F (2, 5997) = 876.5, p < 0.001), suggesting that the synergy between GRU and the reward regulation mechanism could accelerate the recovery of faulty links. In terms of the average packet loss rate, DDPG-RNN had a rate of 6.5%, DDPG-GRU had a rate of 4.2%, and DDPG-LBBP had the lowest rate of 3.5%. The significant differences (F (2, 5997) = 765.4, p < 0.001) reflected the role of the reward regulation mechanism in balancing network load and reducing the packet loss rate. The average recovery success rate of DDPG-RNN was 85.2%, DDPG-GRU’s rate increased to 91.5%, and DDPG-LBBP’s rate reached the highest at 94.5%. The significant differences (F (2, 5997) = 689.3, p < 0.001) indicated that the synergy of all elements could improve the recovery success rate.
In summary, by adjusting and comparing the key elements of the DDPG-LBBP algorithm, the ablation experiments quantified the impacts of the GRU, the reward regulation mechanism, and the backup path design on performance. The GRU, with its gating mechanism, effectively alleviated gradient vanishing and explosion; it not only significantly shortened the training time but also strengthened the algorithm's ability to capture long-term dependencies in sequential data, thereby reducing the recovery delay and packet loss rate and increasing the recovery success rate. The reward regulation mechanism jointly considered propagation delay, link utilization, and packet loss rate in the network topology; while balancing the network load, it shortened the training time, reduced the recovery delay, and lowered the packet loss rate, improving performance in all aspects. The maximum disjoint backup path design worked in concert with the other two elements: by avoiding overlapping nodes and links between the primary and backup paths, it fundamentally eliminated backup path congestion and the secondary failures caused by path overlap, further improving the reliability and stability of faulty-link recovery, ensuring the fast and accurate transmission of phasor data, and greatly enhancing the overall practicality of the algorithm.
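As a complement, the following sketch shows one straightforward way to compute a maximally disjoint backup path with NetworkX: try a fully node-disjoint path first and fall back to an edge-disjoint one if the topology does not allow full disjointness. The toy graph, the fallback rule, and the use of shortest-path search over learned link weights are assumptions for illustration, not the paper's exact procedure.

```python
# Hedged sketch of a maximally disjoint backup path: prefer node-disjoint,
# fall back to edge-disjoint. Edge weights stand in for the link weights
# output by the DDPG-LBBP agent; a full implementation would also handle
# the case where no disjoint path exists at all.
import networkx as nx

def max_disjoint_backup(graph, primary_path, src, dst):
    # Try a path that avoids every intermediate node of the primary path.
    pruned = graph.copy()
    pruned.remove_nodes_from(primary_path[1:-1])
    try:
        return nx.shortest_path(pruned, src, dst, weight="weight")
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        pass
    # Fall back: avoid only the primary path's links.
    pruned = graph.copy()
    pruned.remove_edges_from(zip(primary_path, primary_path[1:]))
    return nx.shortest_path(pruned, src, dst, weight="weight")

# Toy topology for demonstration only.
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 1.0), (2, 5, 1.0), (1, 3, 1.5),
                           (3, 4, 1.0), (4, 5, 1.2), (2, 4, 2.0)])
primary = nx.shortest_path(G, 1, 5, weight="weight")
print("primary:", primary, "backup:", max_disjoint_backup(G, primary, 1, 5))
```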

5. Conclusions

Aiming at the high faulty-link recovery delay and link congestion problems of current fast link failure recovery algorithms applied to WAMS communication networks, this paper introduced SDN into the WAMS communication network and proposed a deep reinforcement learning-based faulty-link fast recovery method, DDPG-LBBP. The DDPG-LBBP method achieved two objectives: (1) by improving the DDPG algorithm, it realized load balancing of the links in the communication network of the wide-area measurement system and output the optimal link weights for the design of the backup paths; (2) by using maximally disjoint paths, it avoided overlapping nodes and links between the primary and backup paths, thereby preventing secondary network link failures. This paper compared the DDPG-LBBP fault recovery method with (1+2ε)-BPCA, FFRLI, and LIR. The results showed that DDPG-LBBP achieves a low faulty-link recovery delay, a low packet loss rate, and a high recovery success rate. Compared with the (1+2ε)-BPCA algorithm, the recovery delay was reduced by about 12.26% and the faulty-link recovery success rate was increased by about 6.91%; compared with the FFRLI method, the packet loss rate in the network topology after faulty-link recovery was reduced by about 15.31%.
This paper mainly evaluated the algorithm on the IEEE 30 and IEEE 57 benchmark power system communication networks, and the improved DDPG algorithm required long training times across these IEEE benchmark topologies. It is therefore important to study how the proposed fault recovery algorithm can quickly adapt to IEEE benchmark power system communication networks of different scales. In subsequent research, we will further study a fast fault recovery algorithm that can be easily migrated across different networks.

Author Contributions

Conceptualization, W.H., W.G. and Y.L.; methodology, W.H., W.G. and Y.L.; software, W.H., W.G. and Y.L.; validation, W.H., W.G. and Y.L.; formal analysis, Q.L., J.Z. and X.H.; investigation, W.H., W.G. and Y.L.; resources, Q.L., J.Z. and X.H.; data curation, Q.L., J.Z. and X.H.; writing—original draft preparation, W.H., W.G. and Y.L.; writing—review and editing, W.H., W.G., Y.L., Q.L., J.Z. and X.H.; supervision, W.H. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Science and Technology Programs in Henan Province (No. 241100210100), the Henan Provincial Science and Technology Research Project (No. 242102211068, No. 232102210078), the Special Project for Research and Development in Key areas of Guangdong Province (No. 2021ZDZX1098), the China Higher Education Institution Industry-University-Research Innovation Fund (No. 2021FNB3001, No. 2022IT020), and the Stabilization Support Program of Science, Technology and Innovation Commission of Shenzhen Municipality (No. 20231128083944001).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

Author Qingsong Lv was employed by the company Henan Xinda Wangyu Technology Co., Ltd. Author Jia Zhang was employed by the company Zhengzhou Xinda Institute of Advanced Technology. Author Xi He was employed by the company Henan Jiuyu Tenglong Information Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships.

Appendix A

Table A1. Glossary of equation terminology and abbreviations.
Symbol / Abbreviation | Meaning
$v$ | Switch
$e_{i,j}$ | Link connecting switches $v_i$ and $v_j$ in the network topology
$p_{s,d}$ | Path from source node $s$ to destination node $d$
$b_{i,m}^{t}$ | Quantity of bytes transmitted by port $m$ of switch $v_i$ at time $t$
$R_{e_{i,j}}^{t}$ | Utilization rate of link $e_{i,j}$ at time $t$
$loss$ | Packet loss rate
$s$ | State
$r$ | Action
$grad$ | Gradient
$\gamma$ | Discount factor
$D$ | Experience replay pool
WAMS | Wide-area measurement system
SDN | Software-Defined Networking
DDPG-LBBP | Load Balancing Backup Path Based on Deep Deterministic Policy Gradient
PMU | Phase Measurement Unit
PDC | Phasor Data Concentrator
CNN | Convolutional Neural Network
GRU | Gated Recurrent Unit

References

  1. Hassan, M.A.M.; Abdalla, O.H.; Fayek, H.H.; Toha, S.F. Optimal WAMS Configuration in Nordic Power System. IJCSNS Int. J. Comput. Sci. Netw. Secur. 2023, 23, 130. [Google Scholar]
  2. Arpanahi, M.K.; Torkzadeh, R.; Safavizadeh, A.; Ashrafzadeh, A.; Eghtedarnia, F. A novel comprehensive optimal PMU placement considering practical issues in design and implementation of a wide-area measurement system. Electr. Power Syst. Res. 2023, 214, 108940. [Google Scholar] [CrossRef]
  3. Fayek, H.H.; Abdalla, O.H. Operation of the Egyptian Power Grid with Maximum Penetration Level of Renewable Energies Using Corona Virus Optimization Algorithm. Smart Cities 2022, 5, 34–53. [Google Scholar] [CrossRef]
  4. Vahidi, S.; Ghafouri, M.; Au, M.; Kassouf, M.; Mohammadi, A.; Debbabi, M. Security of wide-area monitoring, protection, and control (WAMPAC) systems of the smart grid: A survey on challenges and opportunities. IEEE Commun. Surv. Tutor. 2023, 25, 1294–1335. [Google Scholar] [CrossRef]
  5. Maleh, Y.; Qasmaoui, Y.; El Gholami, K.; Sadqi, Y.; Mounir, S. A comprehensive survey on SDN security: Threats, mitigations, and future directions. J. Reliab. Intell. Environ. 2023, 9, 201–239. [Google Scholar] [CrossRef]
  6. Jia, H.; Hou, W.; Wan, S.; Wang, T.; Xiang, H. Fast Communication Path Restoration for Power System Observability Recovery in a Sdn-Enabled Wide-Area Monitoring System. Available online: https://ssrn.com/abstract=4625443 (accessed on 22 February 2025).
  7. Qu, Y.; Chen, G.; Liu, X.; Yan, J.; Chen, B.; Jin, D. Cyber-resilience enhancement of PMU networks using software-defined networking. In Proceedings of the 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), Tempe, AZ, USA, 11–13 November 2020. [Google Scholar]
  8. Kelian, V.H.; Mohd Warip, M.N.; Ahmad, R.B.; Phaklen, E.; Faiz, Z.F.; Zaizu, I.M. Traffic Engineering Provisioning of Multipath Link Failure Recovery in Distributed SDN Controller Environment, Proceedings of the AIP Conference Proceedings; AIP Publishing: Tokyo, Japan, 2024; p. 2898. [Google Scholar]
  9. Muthumanikandan, V.; Valliyammai, C. Link failure recovery using shortest path fast rerouting technique in SDN. Wirel. Pers. Commun. 2017, 97, 2475–2495. [Google Scholar] [CrossRef]
  10. Zheng, L.; Xu, H.; Chen, S.; Huang, L. Performance guaranteed single link failure recovery in SDN overlay networks. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS), Hong Kong, China, 2–4 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 703–708. [Google Scholar]
  11. Li, J.; Qi, X.; Ma, W.; Liu, L. Path selection for link failure protection in hybrid SDNs. Future Gener. Comput. Syst. 2022, 137, 201–215. [Google Scholar] [CrossRef]
  12. Astaneh, S.A.; Shah Heydari, S.; Taghavi Motlagh, S.; Izaddoost, A. Trade-offs between Risk and Operational Cost in SDN Failure Recovery Plan. Future Internet 2022, 14, 263. [Google Scholar] [CrossRef]
  13. Duan, T.; Dinavahi, V. Fast path recovery for single link failure in SDN-enabled wide area measurement system. IEEE Trans. Smart Grid 2021, 13, 1645–1653. [Google Scholar] [CrossRef]
  14. Wang, Y.; Feng, S.; Guo, H.; Qiu, X.; An, H. A single-link failure recovery approach based on resource sharing and performance prediction in SDN. IEEE Access 2019, 7, 174750–174763. [Google Scholar] [CrossRef]
  15. Nurwarsito, H.; Prasetyo, G. Implementation Failure Recovery Mechanism using VLAN ID in Software Defined Networks. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 178. [Google Scholar] [CrossRef]
  16. Liang, D.; Liu, Q.; Yan, B.; Hu, Y.; Zhao, B.; Hu, T. Low interruption ratio link fault recovery scheme for data plane in software-defined networks. Peer-to-Peer Netw. Appl. 2021, 14, 3806–3819. [Google Scholar] [CrossRef]
  17. Zhu, Z.; Yu, H.; Liu, Q.; Liu, D.; Mei, B. FFRLI: Fast fault recovery scheme based on link importance for data plane in SDN. Comput. Netw. 2023, 237, 110062. [Google Scholar] [CrossRef]
  18. Duan, T.; Dinavahi, V. Dataplane Based Fast Failover in SDN-Enabled Wide Area Measurement System of Smart Grid. IEEE Trans. Ind. Inform. 2022, 19, 8148–8158. [Google Scholar] [CrossRef]
  19. IEEE Task Force on Interfacing Techniques for Simulation Tools; Müller, S.C.; Georg, H.; Nutaro, J.J.; Widl, E.; Deng, Y.; Palensky, P.; Awais, M.U.; Chenine, M.; Küch, M.; et al. Interfacing power system and ICT simulators: Challenges, state-of-the-art, and case studies. IEEE Trans. Smart Grid 2016, 9, 14–24. [Google Scholar]
  20. Chakrabarti, S.; Kyriakides, E. Optimal placement of phasor measurement units for power system observability. IEEE Trans. Power Syst. 2008, 23, 1433–1440. [Google Scholar] [CrossRef]
  21. Zhu, X.; Wen, M.H.F.; Li, V.O.K.; Leung, K.-C. Optimal PMU-communication link placement for smart grid wide-area measurement systems. IEEE Trans. Smart Grid 2018, 10, 4446–4456. [Google Scholar] [CrossRef]
  22. Islam, M.T.; Islam, N.; Refat, M.A. Node to node performance evaluation through RYU SDN controller. Wirel. Pers. Commun. 2020, 112, 555–570. [Google Scholar] [CrossRef]
Figure 1. SDN-based architecture for WAMS communication networks.
Figure 2. DDPG-LBBP algorithm framework.
Figure 3. The interaction process between the DDPG-LBBP agent and the environment.
Figure 4. Maximum disjoint backup path network topology.
Figure 5. Backup trail flow table configuration information.
Figure 6. IEEE 30 and 57 benchmark power system communication network topologies. (a) IEEE 30 benchmark test. (b) IEEE 57 benchmark test.
Figure 7. Faulty-link recovery delay in IEEE 30 communication networks.
Figure 8. Faulty-link recovery delay in IEEE 57 communication networks.
Figure 9. Packet loss rate of different fault recovery algorithms.
Figure 10. Recovery success rate of different fault recovery algorithms.
Table 1. Software functions and interaction relationships.
Software Name | Function in the Experiment | Interaction with Other Components
Ubuntu 18.04 | Offered a stable runtime, managed resources, and eased software installation | Supported Python, enabled communication among SDN components, and supplied resources to Mininet
Python 3.6.9 | Allowed development of the algorithms and processed data | Invoked the other libraries and interacted with Mininet and Ryu
TensorFlow 1.8.0 | Supported the training of deep learning models | Built and optimized models within Python
Gym 0.26.2 | Provided a simulation environment for reinforcement learning | Facilitated interaction during algorithm training
OpenFlow protocol v1.3 | Enabled communication between the controller and switches and their configuration | Managed flow tables
Mininet simulation tool v2.3.0 | Simulated the network environment | Served as the test scenario for the algorithm
Ryu controller v4.34 | Collected network information and guided data forwarding | Communicated with switches and provided data for the algorithm
Table 2. The results of the ablation experiments.
Configuration | Training Time (s) | Average Recovery Delay (ms) | Average Packet Loss Rate (%) | Average Recovery Success Rate (%)
DDPG-RNN | 356.8 ± 23.5 | 10.5 ± 1.2 | 6.5 ± 0.8 | 85.2 ± 3.5
DDPG-GRU | 214.6 ± 15.8 | 8.3 ± 0.9 | 4.2 ± 0.6 | 91.5 ± 2.8
DDPG-LBBP | 187.2 ± 12.4 | 6.8 ± 0.7 | 3.5 ± 0.5 | 94.5 ± 2.2
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

