Article

A3C-R: A QoS-Oriented Energy-Saving Routing Algorithm for Software-Defined Networks

1 College of Electronics & Communication Engineering, Shenzhen Polytechnic University, Shenzhen 518005, China
2 College of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
3 College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450007, China
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(4), 158; https://doi.org/10.3390/fi17040158
Submission received: 20 February 2025 / Revised: 18 March 2025 / Accepted: 25 March 2025 / Published: 3 April 2025

Abstract

With the rapid growth of Internet applications and network traffic, existing routing algorithms often struggle to guarantee quality of service (QoS) indicators such as delay, bandwidth, and packet loss rate, as well as network energy consumption, for data flows with diverse business characteristics, leading to unbalanced traffic scheduling and unreasonable allocation of network resources. To address these problems, this paper proposes A3C-R, a QoS-oriented energy-saving routing algorithm for the software-defined network (SDN) environment. Building on the asynchronous updates of the asynchronous advantage Actor-Critic (A3C) algorithm and the independent interaction of multiple agents with the environment, A3C-R effectively improves the convergence of the routing algorithm. The algorithm first takes QoS indicators such as delay, bandwidth, and packet loss rate, together with the energy consumption of the links, as input. It then creates multiple agents for asynchronous training; the Actor and Critic in each agent are updated continuously, and the model parameters are periodically synchronized to the global model. After training converges, the algorithm outputs the link weights of the network topology, which are used to calculate intelligent routing strategies that meet QoS requirements while lowering network energy consumption. The experimental results indicate that, compared to the baseline algorithms ECMP, I-DQN, and DDPG-EEFS, A3C-R reduces delay by approximately 9.4%, increases throughput by approximately 7.0%, decreases the packet loss rate by approximately 9.5%, and improves the energy-saving percentage by approximately 10.8%.

1. Introduction

In recent years, the explosive growth of traffic from new network applications and services has generated a large number of data flows with distinct business characteristics. For example, online teaching and video conferencing place high demands on bandwidth, while online games and instant messaging place high demands on delay. Therefore, in order to ensure the real-time nature of network service interaction, network services in different scenarios impose higher requirements on quality of service (QoS) metrics such as network bandwidth, delay, packet loss rate, and throughput [1]. At the same time, with the continuous popularization of data-intensive applications and the growing number of network devices, the share of energy consumed by network infrastructure is gradually increasing, and energy-saving flow scheduling has become a common trend [2]. In the face of huge network QoS and energy-saving demands, the disadvantages of the traditional “best effort” network architecture and design concept are gradually being exposed, such as the difficulty of controlling network scale, implementing network management, guaranteeing network services, and updating network technologies [3]. Therefore, achieving more efficient QoS-differentiated services and a rational allocation of network energy resources with the available network resources is of great significance.
In order to improve the openness and innovation of the traditional network architecture and promote the optimal selection of traffic scheduling strategies, software-defined networking (SDN) [4] has developed rapidly. SDN decouples the control plane and the data forwarding plane of network equipment, which allows network resources to be utilized more fully and effectively [5]. Compared with traditional networks, SDN simplifies the complexity of network equipment, reduces the requirements of network management, configuration, and operation, effectively improves the flexibility of network control and management, and enhances the ability to rapidly support new network technologies, requirements, and protocols. SDN-oriented routing planning makes path selection flexible and adaptable, which enables rational utilization of network resources, effectively avoids network congestion, and improves the overall network service quality [6].
With the rise of artificial intelligence, machine learning technology has developed rapidly, and significant progress has been made in data processing and combinatorial optimization. Building on the convenience provided by SDN, intelligent routing optimization and network resource scheduling can be realized, pushing network operation, maintenance, and management in the direction of intelligence. On this basis, this paper proposes A3C-R, a QoS-oriented energy-saving routing optimization algorithm for the SDN environment. A3C-R distinguishes different types of service traffic, uses delay, bandwidth, and packet loss rate as QoS measurement indicators, and takes the energy consumption of the network into account. Thanks to the multi-threaded asynchronous updates of the A3C algorithm [7], multiple agents can be executed simultaneously. The variety of state samples they generate reduces the correlation between training samples and thereby reduces the computing power required as the agents explore the environment, so that an energy-saving path that satisfies the QoS requirements can be chosen whenever possible. The main contributions of this paper are as follows:
  • We investigated the research value and significance of SDN-oriented QoS and energy-saving routing schemes, analyzed the research status of existing routing algorithms, and proposed an A3C-R intelligent routing algorithm that takes into account both QoS and energy saving.
  • A routing optimization goal that takes into account both QoS and energy saving is designed, and the A3C-R training framework is built on A3C’s asynchronous training and advantage function. We design the state, action, and reward of the interaction between the intelligent routing agent and the network environment, carry out multi-agent model training with global parameter sharing, and improve algorithm efficiency and convergence.
  • A simulation experiment environment was built, and we verified the convergence of the A3C-R routing algorithm, as well as the QoS and energy-saving optimization effects of flows under different network traffic loads.

2. Related Research

With the continuous diversification of business traffic requirements and the increase in network energy consumption, routing and traffic scheduling for QoS and energy saving face severe challenges. With the continuous development of SDN and artificial intelligence technology, researchers worldwide have proposed a variety of machine-learning-based intelligent traffic scheduling methods for the SDN environment.
  1. Research Status of QoS Routing Technology
QoS routing aims to provide differentiated services under different business traffic requirements, network operating states, and available link resources, dynamically selecting transmission paths that match the network load and resource configuration for different business flows. With the rise of SDN, SDN-oriented QoS routing has continued to develop. Li S H [8] proposed a routing optimization solution based on traffic classification in the SDN environment. This solution identifies and classifies the service types of data flows entering the network and performs routing optimization and scheduling according to the QoS requirements of each service type, so as to provide a QoS guarantee for network applications of different service types. Compared with the traditional scheduling mechanism, it can effectively reduce network congestion and improve network service quality. However, the QoS computation efficiency of this algorithm still needs to be improved under complex network traffic. In order to further improve the computational efficiency of QoS routing, some researchers have introduced classic heuristic algorithms into SDN. Fei H et al. [9] proposed a novel energy-efficient, QoS-aware secure routing algorithm, the Levy Chaos Adaptive Snake Optimization-based Multi-Trust Routing Method (LCASO-MTRM). The method combines chaotic and adaptive operators and aims to reduce energy consumption, latency, and packet loss while increasing bandwidth. By introducing a link trust mechanism, LCASO-MTRM is able to assess the link trust level more accurately and reflect the current link status. Although the above QoS routing algorithms outperform the classical Dijkstra routing algorithm in QoS performance, they find it difficult to balance QoS parameters comprehensively under more complex business traffic. With the rise of artificial intelligence, Wang Y et al. [10] proposed a Dynamic Fuzzy Routing Algorithm based on Deep Reinforcement Learning (DFRDRL). It applies deep reinforcement learning to path routing decisions with the help of Target and Online Neural Networks (TONN) and introduces a delay mechanism that detects congestion based on real-time traffic forecasts and prioritizes low-resource-demand requests. The method achieves a significant improvement in network QoS performance such as delay, bandwidth, and link utilization. Sanchez L P A et al. [11] proposed DQS, a QoS-driven routing optimization method based on Deep Reinforcement Learning (DRL) for routing optimization in software-defined networks. The approach dynamically adapts routing decisions through a multi-objective, function-driven DRL agent that combines link and queuing metrics to reduce end-to-end delay and improve network performance. However, when facing multiple traffic types, it is difficult for these methods to meet the QoS requirements of different applications.
  2. Research Status of Energy-Saving Routing Technology
Energy-saving routing is a key means of addressing network energy consumption. It is of great significance for reducing network operation costs and improving the efficiency of network resource utilization, and the exploration of its optimization strategies has long been a research hotspot. Wang Z et al. [12] proposed an unequally clustered routing protocol based on a multi-hop threshold distance (UCRTD). The method determines the energy-saving threshold distance for multi-hop communication by analyzing the energy consumption model of WSNs and proposes an optimal relay node selection strategy based on distance and residual energy. However, existing non-uniform clustering protocols lack a theoretical basis for the multi-hop threshold distance, provide insufficient load balancing during data transmission, and perform poorly in complex and variable networks. Tang Q et al. [13] proposed a chaos particle swarm optimization (CPSO)-based cluster routing algorithm for wireless sensor networks (WSNs). The method optimizes network performance through clustering, cluster head selection, and energy management. However, its performance is strongly affected by parameter choices, and its adaptability and scalability still need to be improved in large-scale and dynamically changing network environments. Shu X et al. [14] proposed an energy-saving multi-agent deep reinforcement learning algorithm (EMADRL). By formulating MDVRP as a multi-agent reinforcement learning problem, a policy network model based on an encoder–decoder structure is designed and trained with a policy gradient algorithm, and local search strategies such as 2-opt search and sampling search are introduced to refine the solution. However, the EMADRL algorithm has low accuracy of Q-value estimation at the early stage of training, resulting in large fluctuations in the reward curve. Its convergence time is longer than some of the compared algorithms, and its convergence speed still needs to be improved, even though it can produce high-quality solutions quickly on practical high-computing-power platforms. Niranjana M I et al. [15] proposed a Grid Based Reliable Routing (GBRR) algorithm for wireless sensor networks that combines virtual grid cluster construction with an energy-efficient next-hop selection strategy. The main approach is to improve the reliability and energy efficiency of data transmission through mesh partitioning and intra-/inter-cluster communication optimization. Akyıldız O et al. [16] proposed a task offloading scheme (TOS-P4) based on the P4 programming language for task allocation and resource management in fog computing networks. This method improves efficiency and reduces task processing latency by performing task type identification and dynamic adjustment of load state in the data plane through P4 switches. However, in traditional fog computing task offloading, it is difficult for tasks to be accurately assigned to the right server based on type, and SDN controllers have limitations in network traffic management and programmability.
To sum up, although existing routing algorithms for QoS and energy saving can achieve their own optimization goals, they consider only a limited set of factors, and it is difficult for them to address multiple optimization goals such as QoS and energy saving at the same time. Moreover, the convergence efficiency of existing routing algorithms and the effectiveness of their models still need to be improved, which makes QoS-oriented energy-saving routing optimization of great research significance. Therefore, based on the efficient routing scheduling mechanism of SDN, this paper further studies how to allocate network resources reasonably through intelligent routing scheduling, realize differentiated services for QoS requirements, and take network energy consumption optimization into account.

3. Modeling of QoS and Energy-Efficient Routing

3.1. Modeling of QoS Routing

QoS is the guarantee of network transmission quality and can provide better service quality according to different business requirements [17]. Defining and measuring the QoS parameters mainly involves collecting network link state information, from which the port and flow information of each switch can be obtained and processed as the input of the intelligent routing module. The QoS parameter indicators designed in this paper include delay, bandwidth, and packet loss rate. These parameters are mainly calculated through the message exchange between the controller and the switches, which is shown in Figure 1. The specific calculation method of each QoS parameter is as follows:
  1. Delay
The SDN controller generates a Packet-out message and sends a data packet containing LLDP to switch S2 via switch S1, recording the timestamp t1. When the controller receives the corresponding Packet-in message from switch S2, the timestamp is t2, so the time of the message along the path “controller-S1-S2-controller” is $T_1 = t_2 - t_1$. Similarly, the “controller-S2-S1-controller” time can be measured as $T_2$. The controller can then measure the round-trip delays $t_{cs_1}$ and $t_{cs_2}$ between itself and switches S1 and S2, respectively, using echo packets, so the link delay $d_{S_1,S_2}$ between switches S1 and S2 is calculated as shown in (1).
$d_{S_1,S_2} = \frac{1}{2}\left(T_1 + T_2 - t_{cs_1} - t_{cs_2}\right)$
Therefore, the delay function of path P can be defined as Formula (2).
$Delay(P) = \sum_{(i,j) \in P} d_{i,j}$
where $d_{i,j}$ is the delay of link $(i,j)$.
  2. Bandwidth
Bandwidth measures the link’s ability to transmit data per unit of time. We can use rx_Bytes to measure the number of bytes received by a port and tx_Bytes to measure the number of bytes sent by a port. Suppose the number of bytes sent by switch S1 to S2 between times t1 and t2 (measured by tx_Bytes) is $W_t$, and the number of bytes received by switch S2 from S1 in the same interval (measured by rx_Bytes) is $W_r$. Then, the average bandwidth used by switches S1 and S2 to transmit data in the interval $[t_1, t_2]$ and the remaining bandwidth are shown in Formula (3).
$b_{average} = \frac{\min(W_t, W_r)}{t_2 - t_1}, \qquad b_{remain} = b_{total} - b_{average}$
Since the minimum bandwidth requirement must be considered when transmitting data flows, the bandwidth of each link must not be less than the minimum bandwidth when selecting a path for routing. The bandwidth of path P is defined as shown in Formula (4).
$B(P) = \min_{(i,j) \in P} b_{i,j}$
where $b_{i,j}$ is the available bandwidth on link $(i,j)$.
  3. Packet loss rate
The packet loss rate is the ratio of lost packets to the total number of packets sent within a period of time. We can use rx_packets to measure the number of packets received by a port and tx_packets to measure the number of packets sent by a port. Suppose the number of data packets sent by switch S1 to S2 between times t1 and t2 (measured by tx_packets) is $\Delta p_1$, and the number of data packets received by switch S2 from S1 in the same interval (measured by rx_packets) is $\Delta p_2$. The packet loss rate of the link between switches S1 and S2 in the interval $[t_1, t_2]$ is shown in Formula (5).
$l_{S_1,S_2} = \frac{\Delta p_1 - \Delta p_2}{\Delta p_1}$
Therefore, we can define the packet loss rate function of path P as Formula (6).
$Loss(P) = 1 - \prod_{(i,j) \in P} \left(1 - l_{i,j}\right)$
Since per-link packet loss rates cannot simply be multiplied along a path, each link’s loss rate $l_{i,j}$ is first converted into a transmission success rate $(1 - l_{i,j})$ before being combined, where $l_{i,j}$ is the packet loss rate on link $(i,j)$. A minimal sketch of these per-path QoS computations is given below.
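The following is a minimal sketch of Formulas (1)–(6) in Python. It assumes that the port counters (tx/rx bytes and packets) and the LLDP/echo timing values have already been collected elsewhere (for example, from OpenFlow port and flow statistics replies); the function names and data layout are illustrative, not part of the paper's implementation.

```python
# Sketch of the per-path QoS metrics in Formulas (1)-(6).

def link_delay(T1, T2, t_cs1, t_cs2):
    """Formula (1): link delay estimated from the two LLDP round trips."""
    return 0.5 * (T1 + T2 - t_cs1 - t_cs2)

def path_delay(delays, path):
    """Formula (2): sum of link delays along path P (list of (i, j) links)."""
    return sum(delays[link] for link in path)

def link_bandwidth(tx_bytes, rx_bytes, t1, t2, b_total):
    """Formula (3): average and remaining bandwidth of a link in [t1, t2]."""
    b_average = min(tx_bytes, rx_bytes) / (t2 - t1)
    return b_average, b_total - b_average

def path_bandwidth(bandwidths, path):
    """Formula (4): bottleneck (minimum) available bandwidth along P."""
    return min(bandwidths[link] for link in path)

def link_loss(tx_packets, rx_packets):
    """Formula (5): per-link packet loss rate."""
    return (tx_packets - rx_packets) / tx_packets

def path_loss(losses, path):
    """Formula (6): path loss via per-link transmission success rates."""
    success = 1.0
    for link in path:
        success *= (1.0 - losses[link])
    return 1.0 - success
```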

3.2. Modeling of Energy-Efficient Routing

The energy consumption of the switches in the network topology consists of two parts: fixed energy consumption and dynamic energy consumption. The fixed energy consumption is a constant determined by the switch hardware and is incurred whenever the switch is turned on, whereas the dynamic energy consumption is a variable related to the forwarding rate of the switch ports. This paper studies network energy consumption from the perspective of flow routing scheduling, aiming to spread traffic evenly across the network topology and to reduce the dynamic, rate-dependent energy consumption by improving link utilization. The network topology G is defined as a collection of forwarding nodes and links, expressed as $G = (V, E, C)$, where $V$ is the set of all switches and hosts (i.e., all transmission nodes), $E$ is the set of all links, and $C$ is the maximum bandwidth capacity of a link. To simplify the energy optimization, this paper abstracts the energy consumption of network ports as link energy consumption and adopts the energy consumption function of the commonly used rate-scaling model [18]. The energy consumption function $p(x_e)$ of a link in the network topology is calculated as shown in Formula (7).
$p(x_e) = \begin{cases} 0, & x_e = 0 \\ \sigma + \mu x_e^{\alpha}, & 0 < x_e \le C \end{cases}$
In Formula (7), $x_e$ is the transmission rate of the link, $\sigma$ represents the inherent energy consumption when the link is turned on, and $\mu x_e^{\alpha}$ represents the dynamic energy consumption of the link, where $\mu$ is the link rate correlation coefficient and $\alpha$ is the link rate correlation exponent with $\alpha > 1$.
We assume that each flow is independent and indivisible. The set of flows is denoted as $Flow = \{f_1, f_2, \ldots, f_i, \ldots, f_n \mid n \in \mathbb{N}^+\}$ with $f_i = (r_i, d_i, p_i, q_i, w_i)$, where $r_i$ is the start time of flow $f_i$, $d_i$ is its end time, $p_i$ is the starting node of its transmission path, $q_i$ is the terminating node of its transmission path, and $w_i$ is its data volume. Assuming the transmission rate of $f_i$ at time $t$ is $s_i(t)$, Formula (8) must be satisfied.
$\int_{r_i}^{d_i} s_i(t)\, dt = w_i, \qquad \forall f_i \in Flow$
If the link set of the transmission path of flow $f_i$ is denoted as $P_i$, our routing scheduling optimization can be regarded as solving Formula (9).
$S = \left\{\left(s_i(t), P_i\right) \mid f_i \in Flow,\ t \in [r_i, d_i]\right\}$
Define the set of active links in the interval $T = [t_1, t_2]$ as $E_a$, and denote the total number of active links as $L_a = |E_a|$. The total link energy consumption during the transmission of flows on the active links is recorded as $\phi_f(S)$, and the calculation process is shown in Formula (10).
$E_a = \left\{ e \in E \mid \exists\, t \in [t_1, t_2],\ x_e(t) > 0 \right\}, \qquad \phi_f(S) = (t_2 - t_1) \cdot L_a \cdot \sigma + \int_{t_1}^{t_2} \sum_{e \in E_a} \mu x_e(t)^{\alpha}\, dt$
We assume that the working status of the network devices and links in the current topology remains unchanged, so the fixed network energy consumption of the topology is constant and we only need to pay attention to the dynamic energy consumption during flow transmission. Reducing the dynamic energy consumption therefore requires keeping the flows uniform in time and space, which is equivalent to minimizing Formula (11).
$\phi_f = \int_{t_1}^{t_2} \sum_{e \in E_a} \mu x_e(t)^{\alpha}\, dt$
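As an illustration of the rate-scaling energy model above, the following is a minimal sketch of Formulas (7), (10), and (11), assuming link rates $x_e(t)$ are sampled at discrete intervals of length dt; the data layout and default parameter values are illustrative assumptions.

```python
# Sketch of the rate-scaling link energy model (Formulas (7), (10), (11)).

def link_power(x_e, sigma=1.0, mu=1.0, alpha=2.0):
    """Formula (7): instantaneous power of a link transmitting at rate x_e."""
    if x_e == 0:
        return 0.0
    return sigma + mu * (x_e ** alpha)

def dynamic_energy(rate_samples, dt, mu=1.0, alpha=2.0):
    """Formula (11): dynamic energy of the active links over [t1, t2].

    rate_samples is a list of dicts, one per time step, mapping each link e
    to its transmission rate x_e(t) at that step.
    """
    energy = 0.0
    for rates in rate_samples:
        energy += sum(mu * (x ** alpha) for x in rates.values() if x > 0) * dt
    return energy

def total_energy(rate_samples, dt, sigma=1.0, mu=1.0, alpha=2.0):
    """Formula (10): fixed plus dynamic energy over the active link set E_a."""
    active_links = {e for rates in rate_samples for e, x in rates.items() if x > 0}
    fixed = len(rate_samples) * dt * len(active_links) * sigma
    return fixed + dynamic_energy(rate_samples, dt, mu, alpha)
```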
We use the delay, bandwidth, and packet loss rate in QoS as the main optimization goals and reduce network energy consumption. In order to facilitate the optimization training, we need to normalize the energy consumption and QoS data and use the method of Z-score [19] for processing. The Z-score method is a widely used statistical technique for data standardization. It transforms the original data into a standard normal distribution, which is beneficial for comparing data from different sources and detecting outliers. In our study, we used the Z-score method for two main purposes. First, to pre-process the experimental data. Given that the network performance data (such as delay, bandwidth utilization) collected from various sources and under different measurement conditions may have significant differences in scale and distribution, Z-score standardization helps to unify these data onto a common scale, ensuring more stable and accurate subsequent data analysis and model training. Second, we utilized the Z-score to identify outliers in the dataset. Data points with an absolute Z-value greater than 3 (or less than −3) are often considered as outliers. For such outliers, we conducted further investigations to determine whether they were caused by measurement errors or special network events, thus ensuring the quality of the data. The calculation process is shown in Formula (12).
$y^* = \frac{y - u}{\sigma}$
where $y^*$ represents the standardized target value, $y$ the original data, $u$ the mean, and $\sigma$ the standard deviation. Therefore, Formula (13) can be used to express the optimization process.
$\min\left(\delta \cdot \phi_f + \beta \cdot Delay(P) + \lambda \cdot \frac{1}{D(P)} + \gamma \cdot Loss(P)\right)$
Among them, $\phi_f$, $Delay(P)$, $D(P)$, and $Loss(P)$ are the normalized energy consumption, delay, bandwidth, and packet loss rate, respectively, and $\delta$, $\beta$, $\lambda$, and $\gamma$ are the corresponding weights.
In order to ensure that the above traffic scheduling process is not affected by the environment, we stipulate that the traffic transmission process must be completed before the latest deadline, the bandwidth occupation of each link does not exceed the link capacity C, and each flow is received by the destination node.
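The following is a small sketch of the Z-score standardization (Formula (12)) and the weighted objective (Formula (13)), written with NumPy for brevity. The function names, the equal weights of 0.25, and the use of raw normalized values as inputs are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def z_score(values):
    """Formula (12): standardize a series of raw measurements."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def routing_cost(energy, delay, bandwidth, loss,
                 delta=0.25, beta=0.25, lambda_=0.25, gamma=0.25):
    """Formula (13): weighted cost of a candidate path (lower is better).

    All four inputs are assumed to already be normalized; bandwidth enters as
    1/D(P) so that larger available bandwidth lowers the cost.
    """
    return delta * energy + beta * delay + lambda_ * (1.0 / bandwidth) + gamma * loss
```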

4. QoS and Energy-Saving Routing Based on A3C

4.1. A3C-R Routing Framework

A3C-R uses the asynchronous multi-threaded training mechanism of the A3C algorithm to execute multiple agents, comprising a Global network and multiple Worker threads whose network structures are identical. Each Worker thread includes an Actor network and a Critic network and interacts with the environment independently. The Actor network realizes the mapping from state to action, and the Critic network evaluates how good the chosen action is. Because the multiple Workers of the A3C algorithm explore the environment independently, the correlation between training data is broken, storage space is saved, computing power requirements are reduced, and clear advantages in training speed and convergence are obtained. The asynchronous update framework of the A3C algorithm is shown in Figure 2.
In the traditional Actor-Critic architecture, the Actor network refers to the policy function $\pi_\theta(a|s)$, and the Critic network refers to the value function $V_\phi(s)$. The policy function $\pi_\theta(a|s)$ defines the probability distribution over actions $a$ given the current state $s$ and guides the agent’s decision-making process. The value function $V_\phi(s)$ represents the expected cumulative reward that the agent can obtain starting from state $s$; it evaluates the quality of the current state and helps the agent optimize its policy. Both functions are learned continuously during training. The return starting at time $t$ is recorded as $R_\tau(t:T)$, and the calculation of its approximation $\hat{R}_\tau(t:T)$ is shown in Formula (14).
$\hat{R}_\tau(t:T) = r_{t+1} + \gamma V_\phi(s_{t+1})$
Among them, $s_{t+1}$ and $r_{t+1}$ are the state and reward at time $t+1$, respectively, and the policy function $\pi_\theta(a|s)$ and the value function $V_\phi(s)$ are learned during the update process. On the one hand, the parameter $\phi$ is updated so that the value function $V_\phi(s_t)$ approaches the real return value $\hat{R}_\tau(t:T)$, as shown in Formula (15).
$\min_\phi \left(\hat{R}_\tau(t:T) - V_\phi(s_t)\right)^2$
On the other hand, the value function serves as a baseline to reduce the variance of the policy gradient; the update process is shown in Formula (16).
$\theta \leftarrow \theta + \alpha \gamma^t \left(\hat{R}_\tau(t:T) - V_\phi(s_t)\right) \nabla_\theta \log \pi_\theta(a_t|s_t)$
In the traditional Actor-Critic setup, the estimated value in a given state can appear favorable regardless of which action is taken, which ignores the differences between actions and makes the training process unstable. Therefore, the A3C algorithm introduces an advantage function, shown in Formula (17).
$A(s, a) = Q(s, a) - V(s)$
Among them, $Q(s, a)$ is the action-value function, $V(s)$ is the estimated cumulative state-value function in state $s$, and the training sequence is $(s_t, a_t, r_t, s_{t+1})$. The calculation of $Q(s, a)$ is shown in Formula (18).
$Q(s, a) = r_t + \gamma V(s_{t+1})$
The policy update of the Actor network will adopt the policy gradient training method. The calculation process is shown in Formula (19).
$\theta \leftarrow \theta + \alpha \sum_t \left[ \nabla_\theta \log \pi_\theta(a_t|s_t)\, A(s_t, a_t) + \beta \nabla_\theta H\left(\pi_\theta(\cdot|s_t)\right) \right]$
Among them, $\alpha$ is the learning rate, and the entropy of the policy $\pi$ is introduced to improve exploration by discouraging premature convergence to a suboptimal deterministic policy. The entropy function $H(\pi_\theta(\cdot|s_t))$ measures the randomness of the policy $\pi_\theta$ and encourages exploration during training; the parameter $\beta$ controls the influence of the entropy regularization term.
The Critic network will update the parameters using the temporal difference method. The calculation process is shown in Formula (20).
$\phi \leftarrow \phi + \alpha \sum_t \nabla_\phi \left( r_t + \gamma V(s_{t+1}; \phi) - V(s_t; \phi) \right)^2$
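As an illustration of Formulas (17)–(20), the following is a minimal Actor-Critic sketch written in PyTorch for brevity (the paper's experiments use TensorFlow 1.8). The state is the four-dimensional vector {φ_f, Delay(P), D(P), Loss(P)} and the action is a vector of link weights, modeled here by a Gaussian policy; the layer sizes, the Gaussian parameterization, and the hyperparameter values are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim=4, n_links=48, hidden=128):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_links)              # mean of link weights
        self.log_std = nn.Parameter(torch.zeros(n_links)) # Gaussian policy std
        self.value = nn.Linear(hidden, 1)                 # Critic V(s)

    def forward(self, state):
        h = self.shared(state)
        dist = torch.distributions.Normal(self.mu(h), self.log_std.exp())
        return dist, self.value(h).squeeze(-1)

def a3c_losses(model, state, action, r_t, next_state, gamma=0.9, beta=0.01):
    dist, v_s = model(state)
    with torch.no_grad():
        _, v_next = model(next_state)
        q_sa = r_t + gamma * v_next          # Formula (18)
        advantage = q_sa - v_s               # Formula (17), detached for the Actor
    log_prob = dist.log_prob(action).sum(-1)
    entropy = dist.entropy().sum(-1)
    actor_loss = -(log_prob * advantage + beta * entropy)     # Formula (19)
    critic_loss = (r_t + gamma * v_next - v_s).pow(2)         # Formula (20)
    return actor_loss.mean(), critic_loss.mean()

# Illustrative usage with a random state transition.
model = ActorCritic()
state, next_state = torch.randn(4), torch.randn(4)
dist, _ = model(state)
action = dist.sample()
a_loss, c_loss = a3c_losses(model, state, action, torch.tensor(1.0), next_state)
(a_loss + c_loss).backward()   # gradients that a Worker would push to the global model
```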

4.2. Design of A3C-R Routing and Environment Interaction

We realize the interaction between A3C-R routing and the network within the SDN architecture, which mainly includes the data plane, the control plane, and the intelligent plane, together with the southbound and northbound interfaces. The training process of the A3C-R algorithm is located in the intelligent plane. First, the data plane collects network status information and transmits it through the southbound interface to the control plane, which extracts valid status information. The effective state information is then passed through the northbound interface to the A3C-R routing algorithm in the intelligent plane for asynchronous training. After the A3C-R training converges, the algorithm outputs the link weights of the network topology; the control plane generates flow tables from the link weight information and installs them in the data plane, completing the interaction between A3C-R routing and the environment. The interaction process is shown in Figure 3. The specific state, action, and reward used during this interaction are designed as follows:
  1. State
The state mainly includes QoS information such as network delay, bandwidth, and packet loss rate, as well as the network link energy consumption. The various network performance indicators are collected by the network information collection module, and the normalized link energy consumption, delay, bandwidth, and packet loss rate are then calculated as $\phi_f$, $Delay(P)$, $D(P)$, and $Loss(P)$, respectively. The state is recorded as the set $s = \{\phi_f, Delay(P), D(P), Loss(P)\}$.
  2. Action
The action is the set of link weights of the network topology. Through the continuous interaction between the A3C-R routing algorithm and the environment, the link weights best suited to the current network environment are gradually learned. The action output by the A3C-R routing algorithm is recorded as $a = \{w_1, w_2, \ldots, w_n\}$, where $w$ is the weight value of each link and $n$ is the total number of links in the network topology. Through interaction with the network environment and continuous learning, the A3C-R algorithm generates link weights that jointly reflect the QoS and energy-saving requirements. To calculate the optimal routing path, the Dijkstra algorithm takes the optimized link weights produced by A3C-R as input and computes the optimal path from the source node to the destination node in the network topology. This path is not only the shortest in terms of link weights but, because the weights are learned by A3C-R, also satisfies the QoS and energy-saving requirements. Finally, based on the link weights optimized by A3C-R and the path calculated by Dijkstra, the routing strategy is installed in the network.
  3. Reward
The design of the reward mainly takes into account the energy consumption of the current network link, as well as network performance indicators such as network delay, bandwidth, and packet loss rate. The design of the reward is shown in Formula (21).
$r = \delta \cdot \phi_f + \beta \cdot Delay(P) + \lambda \cdot \frac{1}{D(P)} + \gamma \cdot Loss(P), \qquad \delta, \beta, \lambda, \gamma \in (0, 1)$
Among them, $\delta$, $\beta$, $\lambda$, and $\gamma$ are the weight parameters of link energy consumption, delay, bandwidth, and packet loss rate, respectively. In order to balance the various performance indicators, we set each of these parameters to 0.25, so that their sum is 1. A small sketch of the reward computation and the weight-based path calculation is given below.
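The following is a minimal sketch, assuming normalized metric values and an illustrative adjacency/weight representation, of how the reward in Formula (21) and the Dijkstra path computation on the A3C-R link weights fit together; it is not the paper's implementation.

```python
import heapq

def reward(phi_f, delay, bandwidth, loss, weights=(0.25, 0.25, 0.25, 0.25)):
    """Formula (21): reward built from the normalized network indicators."""
    delta, beta, lambda_, gamma = weights
    return delta * phi_f + beta * delay + lambda_ * (1.0 / bandwidth) + gamma * loss

def dijkstra(adj, link_weights, src, dst):
    """Shortest path under the link weights output by the A3C-R agent.

    adj maps each node to its neighbors; link_weights maps a link (u, v) to
    the weight learned by the agent.
    """
    dist, prev = {src: 0.0}, {}
    heap, visited = [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in adj[u]:
            nd = d + link_weights[(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the path from dst back to src (assumes dst is reachable).
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))
```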

4.3. A3C-R Routing Algorithm Process

During the training of the QoS-oriented energy-saving routing algorithm A3C-R, the network status, including the network energy consumption and QoS indicators such as delay, bandwidth, and packet loss rate, constitutes the main input. Through the asynchronous training mechanism of the A3C algorithm, multiple agents interact with the environment independently, and the experience collected during these interactions is sampled to update the whole network. Finally, the algorithm outputs the link weights that satisfy the QoS and network energy consumption requirements, from which the optimal routing strategy is calculated. The A3C-R routing algorithm process oriented to QoS and energy saving is shown in Algorithm 1.
Algorithm 1 A3C-R Routing Algorithm Process for QoS and Energy Saving
Input: The set of network states $\{\phi_f, Delay(P), D(P), Loss(P)\}$
Output: The set of network link weights $\{w_1, w_2, \ldots, w_n\}$
Initialize: Critic parameter $\phi$ and Actor parameter $\theta$ of the global network; Critic parameter $\phi'$ and Actor parameter $\theta'$ of each Worker thread network.
Initialize: Global shared counter $T$, global maximum shared count $T_{max}$, thread step size $t = 1$.
(1) Repeat:
(2) Reset $\phi' = \phi$, $\theta' = \theta$
(3) $t_{start} = t$, $s_t = \{\phi_f, Delay(P), D(P), Loss(P)\}$
(4) Repeat
(5) The Worker thread executes $a_t \sim \pi(a_t \mid s_t; \theta')$
(6) Obtain $r_t$ and $s_{t+1}$ from action $a_t$
(7) $t \leftarrow t + 1$, $T \leftarrow T + 1$
(8) Until $s_t$ is the terminal state, or $t - t_{start} = t_{max}$
(9) $R = \begin{cases} 0, & s_t \text{ is terminal} \\ V(s_t; \phi'), & \text{otherwise} \end{cases}$
(10) For $i \in \{t-1, t-2, \ldots, t_{start}\}$ do
(11) Calculate $R \leftarrow r_i + \gamma R$ at each moment
(12) $d\theta \leftarrow d\theta + \nabla_{\theta'} \log \pi(a_i \mid s_i; \theta')\left(R - V(s_i; \phi')\right) + c\, \nabla_{\theta'} H\left(\pi(s_i, \theta')\right)$
(13) $d\phi \leftarrow d\phi + \partial \left(R - V(s_i; \phi')\right)^2 / \partial \phi'$
(14) End For
(15) Asynchronously update the global network: $\theta$ using $d\theta$, $\phi$ using $d\phi$
(16) Until $T > T_{max}$
The A3C-R algorithm, designed for QoS-oriented and energy-efficient intelligent routing, begins with a series of rigorous initialization steps. First, the Critic and Actor parameters of the global network are initialized, where the Critic evaluates state values, and the Actor generates decision actions. Simultaneously, the Critic and Actor parameters for each Worker thread are also initialized, establishing a solid foundation for multi-agent asynchronous training. In addition to parameter initialization, key global variables such as the global shared counter, maximum shared update steps, and thread step size are configured. These settings play a crucial role in defining the iteration process and termination conditions, ultimately influencing the algorithm’s convergence efficiency.
The core training process is detailed in Algorithm 1, lines 1–15, which describes the parameter update mechanism during training. Specifically, in lines 2–3, the agent retrieves the initial network state, including key indicators such as energy consumption, latency, bandwidth, and packet loss rate, which serve as the basis for decision making. In lines 4–9, the Worker thread executes actions based on the current state, then obtains the corresponding reward value to assess the effectiveness of the action, along with the next state for further evaluation. Finally, in lines 10–15, the agent updates global network parameters based on reward values, state information, and policy gradients, continuously refining the routing strategy. This parameter update mechanism enables A3C-R to dynamically adjust routing decisions, balancing QoS requirements and energy efficiency. As training progresses, the algorithm iteratively optimizes the routing policy, ultimately converging toward an optimal solution that enhances overall network performance.
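The following brief sketch illustrates two pieces of Algorithm 1: the backward return accumulation of lines (9)–(11) and the way several Worker threads could be launched for asynchronous training. The train_worker body and its arguments are hypothetical placeholders for the per-thread rollout and gradient push; only n_step_returns is a concrete computation.

```python
import threading

def n_step_returns(rewards, v_last, terminal, gamma=0.9):
    """Lines (9)-(11): bootstrap with V(s_t; phi') unless s_t is terminal."""
    R = 0.0 if terminal else v_last
    returns = []
    for r in reversed(rewards):      # i in {t-1, t-2, ..., t_start}
        R = r + gamma * R            # line (11): R <- r_i + gamma * R
        returns.append(R)
    return list(reversed(returns))

def train_worker(worker_id, gamma=0.9, t_max=5):
    # Hypothetical worker body: synchronize with the global model (line (2)),
    # roll out up to t_max steps (lines (4)-(8)), compute returns with
    # n_step_returns, accumulate d_theta / d_phi (lines (12)-(13)), and push
    # them asynchronously to the global network (line (15)).
    pass

if __name__ == "__main__":
    print(n_step_returns([1.0, 0.5, 0.2], v_last=2.0, terminal=False))
    workers = [threading.Thread(target=train_worker, args=(i,)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```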

5. Experimental Evaluation

5.1. Experimental Environment and Comparison Algorithms

In this paper, the network simulation software Mininet is used to test the QoS and energy-saving performance of the A3C-R routing algorithm. The experiments use the RYU controller (version 4.34) and Open vSwitch (version 2.9.8). The RYU controller was selected for its support of OpenFlow 1.3, which allows dynamic flow table management and fine-grained control of network flows; its flexible, scalable, and modular architecture facilitates seamless integration with the A3C-R algorithm, enabling QoS-aware and energy-efficient routing. We use OS3E as the network topology. The OS3E topology consists of 38 nodes, which simulate various network devices such as switches, routers, and hosts, interconnected by 48 links. The bandwidth of each link is uniformly set to 200 Mbps, which serves as a standardized parameter for testing the performance of the different routing algorithms. This setting enables a fair comparison of how the algorithms affect network performance metrics such as delay, throughput, packet loss rate, and energy-saving efficiency under the same bandwidth condition. The experimental platform runs the Linux operating system Ubuntu 18.04 with TensorFlow 1.8.0 and Python 3.5.0; the hardware consists of an i5-10600KF CPU, 16 GB of DDR4 memory, and two GTX 1080 8 GB graphics cards. During the training of A3C-R, the neural network uses the ReLU activation function. In pre-experiments, we tried a number of different combinations of learning rates. When the learning rate of the Actor network is too large, training is unstable and easily skips over the optimal solution; when it is too small, training is slow and falls into local optima. Testing showed that a learning rate of 0.002 enables the Actor network to explore and learn the optimal policy efficiently while maintaining training speed. For the Critic network, a learning rate that is too large leads to large fluctuations in the value estimates, while one that is too small cannot guide the Actor network in a timely manner. A learning rate of 0.001 was ultimately chosen, allowing the Critic network to learn a stable value function and provide reliable feedback to the Actor network. The algorithm trains for 100,000 steps, with the learning rates of the Actor and Critic set to 0.002 and 0.001, respectively; the batch size is 128, and the discount factor is 0.9. Specifically, the discount factor (0.9) is chosen based on related deep reinforcement learning (DRL) research on SDN routing optimization [20], which suggests that values between 0.8 and 0.99 balance long-term and short-term rewards effectively. We also conducted initial experiments with discount factors of 0.7, 0.8, 0.9, and 0.99 and found that 0.9 led to better convergence stability and long-term performance. Similarly, for the batch size, we tested values of 32, 64, 128, and 256 and found that a batch size of 128 achieved a good trade-off between training stability and computational efficiency.
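As an illustration of how such an emulated testbed can be brought up, the following is a minimal Mininet sketch using a remote RYU controller and 200 Mbps TCLink links. The two-switch fragment, node names, and controller port are illustrative assumptions; the full 38-node/48-link OS3E topology used in the experiments would be built in the same way from its adjacency list.

```python
from mininet.net import Mininet
from mininet.node import RemoteController, OVSSwitch
from mininet.link import TCLink

# Emulated network attached to an external (RYU) controller.
net = Mininet(controller=RemoteController, switch=OVSSwitch, link=TCLink)
net.addController('c0', ip='127.0.0.1', port=6633)   # RYU assumed to listen locally

s1 = net.addSwitch('s1')
s2 = net.addSwitch('s2')
h1 = net.addHost('h1')
h2 = net.addHost('h2')

# All links use the 200 Mbps bandwidth described above.
net.addLink(h1, s1, bw=200)
net.addLink(s1, s2, bw=200)
net.addLink(s2, h2, bw=200)

net.start()
net.pingAll()   # quick connectivity check
net.stop()
```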
This experiment compares A3C-R with a traditional routing algorithm, an intelligent routing algorithm, and an energy-saving routing algorithm. The specific baselines are as follows: (1) the traditional multi-path routing algorithm ECMP, a widely used routing algorithm that distributes traffic across multiple equal-cost paths and serves as a baseline for traditional routing performance; comparing with ECMP allows us to demonstrate how A3C-R improves QoS and energy efficiency beyond conventional approaches. (2) I-DQN, a fast-converging DQN routing algorithm considering multi-type QoS requirements [21]; I-DQN is a deep reinforcement learning-based routing algorithm designed to optimize multiple QoS metrics such as delay and throughput, and since A3C-R also targets QoS improvements, it serves as a strong benchmark for evaluating QoS optimization effectiveness. (3) DDPG-EEFS, an energy-saving flow scheduling mechanism based on a DRL algorithm [22]; this algorithm optimizes network energy consumption using deep reinforcement learning, and given that A3C-R also aims to reduce energy consumption, comparing with DDPG-EEFS allows us to assess A3C-R’s energy-saving performance relative to other energy-efficient routing strategies. The comparison covers QoS performance indicators such as network delay, throughput, and packet loss rate, as well as the network energy-saving effect. All comparison experiments were conducted under the same experimental conditions, including network topology, traffic patterns, and configuration settings, to ensure a fair and consistent evaluation across all algorithms.

5.2. Convergence of Routing Algorithms

This experiment verified the convergence of the A3C-R algorithm through the reward values of the training process, using the reward defined in Formula (21). In the energy consumption function, parameter α is set to 2 and parameter μ is set to 1. The rewards of the A3C-R training process are shown in Figure 4. The whole training process comprises 100,000 interaction steps, and the reward of the A3C-R routing algorithm stabilizes at about 40,000 steps, showing a good convergence effect. The reason is that the A3C-R routing algorithm is based on A3C’s asynchronous training method, in which multiple agents interact with the environment and update the model parameters simultaneously; this speeds up training and improves the sample utilization efficiency. In addition, the A3C algorithm introduces the advantage function, which measures the advantage of performing a certain action relative to the average level and helps the agent improve its strategy, thereby improving convergence.

5.3. Performance Analysis of Routing Algorithms

This experiment compares the routing QoS and energy-saving indicators under different network loads. Since the link bandwidth of the experimental topology is 200 Mbps by default, we increase the traffic load from 20 Mbps to 180 Mbps. We used a combination of different traffic types to mimic real-world network traffic. Specifically, we employed three main types of traffic: delay-sensitive traffic, which is characteristic of real-time applications such as online gaming, video conferencing, and instant messaging; bandwidth-sensitive traffic, representing services such as high-definition video streaming and large-file transfers; and packet-loss-sensitive traffic, corresponding to applications such as financial transaction data transmission and medical data transfer. These traffic types were randomly generated and mixed in certain proportions to create a complex and realistic traffic load. During the experiment, we measured QoS indicators such as delay, throughput, and packet loss rate under different network traffic loads, as well as the network energy-saving effect. We use the percentage of energy saving (PES) of the network links to evaluate the energy efficiency of each routing algorithm. PES is defined as the percentage reduction in energy consumption achieved by the routing algorithm relative to a baseline energy consumption, typically the total network energy consumption under full link load. The calculation process is shown in Formula (22) [23], where $\phi_f^{routing\ algorithm}$ represents the link energy consumption of the current routing algorithm and $\phi_f^{full\ load}$ represents the energy consumption under full link load; a higher PES indicates better energy-saving performance. Through repeated experiments, we found that parameter α affects the dynamic energy consumption part of the link: when α is too small, the algorithm is not sensitive enough to link traffic changes to optimize energy consumption effectively; when it is too large, it pays excessive attention to energy consumption at the expense of QoS performance. Testing showed that setting α to 2 optimizes network energy consumption to the greatest extent while ensuring QoS. Parameter μ affects the inherent energy consumption part of the link: when it is too small, the algorithm does not take the inherent link energy consumption into account sufficiently; when it is too large, it leads to an unreasonable allocation of network resources and affects QoS. Setting μ to 1 strikes a better balance between the inherent and dynamic energy consumption of the links, ensuring that the algorithm optimizes energy consumption while maintaining good network performance. Therefore, in the energy consumption function, parameter α is set to 2 and parameter μ is set to 1. The routing algorithm performance test results are shown in Figure 5.
$PES = \left(1 - \frac{\phi_f^{routing\ algorithm}}{\phi_f^{full\ load}}\right) \times 100\%$
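The following is a tiny sketch of the PES calculation in Formula (22); the two energy values in the example are made-up illustrative numbers, assumed to be measured with the same φ_f definition as in Section 3.2.

```python
def pes(phi_routing, phi_full_load):
    """Formula (22): percentage of energy saving relative to full-load consumption."""
    return (1.0 - phi_routing / phi_full_load) * 100.0

# Example: 7.2 energy units under the tested routing algorithm vs. 9.0 at full load.
print(f"PES = {pes(7.2, 9.0):.1f}%")   # PES = 20.0%
```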
It can be seen from Figure 5 that as the traffic load increases, the A3C-R routing algorithm outperforms the other routing algorithms in QoS performance indicators such as delay, throughput, and packet loss rate, as well as in the energy-saving effect. The reason is that the traditional routing algorithm ECMP mainly uses equal-cost multi-path transmission and does not comprehensively consider the delay, throughput, packet loss rate, and energy indicators of each path in the current network, so it easily causes link congestion when network traffic increases. The I-DQN algorithm considers multiple types of QoS requirements in its modeling, so it achieves better delay, throughput, and packet loss rate than the ECMP algorithm; because it also transmits traffic more evenly, its energy-saving effect is likewise better than that of ECMP. The DDPG-EEFS algorithm takes the minimum network energy consumption and the average flow completion time as its joint optimization goal, so compared with I-DQN it achieves a better energy-saving effect and delay optimization, although its throughput and packet loss rate are slightly weaker than those of I-DQN. Our A3C-R algorithm takes QoS and energy consumption as a joint optimization goal and uses the asynchronous update advantage of the A3C algorithm to fully perceive the network status and select energy-saving paths that meet the QoS requirements. Therefore, the A3C-R algorithm achieves better QoS performance and energy-saving results than the ECMP, I-DQN, and DDPG-EEFS algorithms.

6. Conclusions

The experimental results show that, compared with the ECMP, I-DQN, and DDPG-EEFS algorithms, the A3C-R algorithm reduces delay by approximately 9.42%, increases throughput by approximately 7.04%, reduces the packet loss rate by approximately 9.51%, and increases the energy-saving percentage by approximately 10.78%. The A3C-R algorithm in this paper only considers QoS and energy-saving optimization in the SDN data plane. As network devices and traffic grow ever more complex, the performance and energy-saving optimization of the SDN control plane remains an important research topic.
In the future, we will further investigate the joint optimization of both the SDN control and data planes to achieve a more comprehensive balance between QoS and energy efficiency. Furthermore, incorporating real-world deployment and validation in large-scale SDN environments will be a crucial step to assess the practicality and scalability of our approach.

Author Contributions

Conceptualization by S.W., R.S. and W.H.; methodology by S.W., R.S. and X.Z.; software by R.S. and X.Z.; validation by S.W., R.S., W.H. and H.L.; formal analysis by S.W. and R.S.; investigation by R.S. and X.Z.; data curation by R.S., W.H. and H.L.; writing—original draft preparation by S.W. and R.S.; writing—review and editing by S.W., R.S., X.Z., W.H. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guangdong Province Key Field Special Project “Research on Key Technologies of Zero-Trust Business Data Security Monitoring for Industry-Specific Networks” No.: 2021ZDZX1098, China University Industry-University-Research Innovation Fund “Zero-Trust API Gateway System for Industry Private Networks” No.: 2021FNB3001, “Research on Key Technologies for Security Monitoring of Private Network Big Data under Elastic Cloud Platform” No.: 2022IT020, Shenzhen Science and Technology Innovation Commission Stable Support Plan “Research and Application of Key Technologies for 5G Network Protection Based on Active Defense” No.: 20231128083944001.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, L.; Lu, Y.; Zhang, D.; Cheng, H.; Dong, P. DSOQR: Deep Reinforcement Learning for Online QoS Routing in SDN-Based Networks. Secur. Commun. Netw. 2022, 2022, 4457645. [Google Scholar]
  2. Wang, S.; Yuan, J.; Zhang, X.; Qian, Z.; Li, X.; You, I. QoS-aware flow scheduling for energy-efficient cloud data centre network. Int. J. Ad Hoc Ubiquitous Comput. 2020, 34, 141–153. [Google Scholar] [CrossRef]
  3. Lin, Y.J. Research on the development of time-sensitive networks and their security technologies. In Proceedings of the 4th International Conference on Informatics Engineering & Information Science (ICIEIS2021), Tianjin, China, 19–21 November 2021; SPIE: Bellingham, WA, USA, 2022; Volume 12161, pp. 90–95. [Google Scholar]
  4. Keshari, S.K.; Kansal, V.; Kumar, S. A systematic review of quality of services (QoS) in software defined networking (SDN). Wirel. Pers. Commun. 2021, 116, 2593–2614. [Google Scholar] [CrossRef]
  5. Rana, D.S.; Dhondiyal, S.A.; Chamoli, S.K. Software defined networking (SDN) challenges, issues and solution. Int. J. Comput. Sci. Eng. 2019, 7, 884–889. [Google Scholar] [CrossRef]
  6. Chenhui, W.; Hong, N.; Lei, L. A Routing Strategy with Optimizing Linear Programming in Hybrid SDN. IEICE Trans. Commun. 2022, 105, 569–579. [Google Scholar]
  7. Ding, Y.; Guo, J.; Li, X.; Shi, X.; Yu, P. Data Transmission Evaluation and Allocation Mechanism of the Optimal Routing Path: An Asynchronous Advantage Actor-Critic (A3C) Approach. Wirel. Commun. Mob. Comput. 2021, 2021, 6685722. [Google Scholar] [CrossRef]
  8. Li, S.H. Research and Implementation of Routing Optimization Technology Based on Traffic Classification in SDN. Master’s Thesis, Beijing University of Posts and Telecommunications, Beijing, China, 2021. [Google Scholar]
  9. Fei, H.; Jia, D.; Zhang, B.; Li, C.; Zhang, Y.; Luo, T.; Zhou, J. A novel energy efficient QoS secure routing algorithm for WSNs. Sci. Rep. 2024, 14, 25969. [Google Scholar]
  10. Wang, Y.; Othman, M.; Choo, W.O.; Liu, R.; Wang, X. DFRDRL: A dynamic fuzzy routing algorithm based on deep reinforcement learning with guaranteed latency and bandwidth for software-defined networks. J. Big Data 2024, 11, 150. [Google Scholar] [CrossRef]
  11. Sanchez, L.P.A.; Shen, Y.; Guo, M. DQS: A QoS-driven routing optimization approach in SDN using deep reinforcement learning. J. Parallel Distrib. Comput. 2024, 188, 104851. [Google Scholar]
  12. Wang, Z.; Zeng, W.; Yang, S.; He, D.; Chan, S. UCRTD: An Unequally Clustered Routing Protocol Based on Multi Hop Threshold Distance for Wireless Sensor Networks. IEEE Internet Things J. 2024, 11, 29001–29019. [Google Scholar]
  13. Tang, Q.; Nie, F. Clustering routing algorithm of wireless sensor network based on swarm intelligence. Wirel. Netw. 2024, 30, 7227–7238. [Google Scholar] [CrossRef]
  14. Shu, X.; Lin, A.; Wen, X. Energy-Saving Multi-Agent Deep Reinforcement Learning Algorithm for Drone Routing Problem. Sensors 2024, 24, 6698. [Google Scholar] [CrossRef] [PubMed]
  15. Niranjana, M.I.; Daisy, J.; RamNivas, D.; Gayathree, K.; Vignesh, M.; Parthipan, V. Grid Based Reliable Routing Algorithm with Energy Efficient in Wireless Sensor Networks Using Image Processing. In Proceedings of the 2024 5th International Conference on Communication, Computing & Industry 6.0 (C2I6), Bengaluru, India, 6–7 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  16. Akyıldız, O.; Kök, İ.; Okay, F.Y.; Özdemir, S. A p4-assisted task offloading scheme for fog networks: An intelligent transportation system scenario. Internet Things 2023, 22, 100695. [Google Scholar] [CrossRef]
  17. Qadir, G.A.; Zeebaree, S.R.M. Evaluation of QoS in Distributed Systems: A Review. Int. J. Sci. Bus. 2021, 5, 89–101. [Google Scholar]
  18. Wang, L.; Zhang, F.; Zheng, K.; Vasilakos, A.V.; Ren, S.; Liu, Z. Energy-Efficient Flow Scheduling and Routing with Hard Deadlines in Data Center Networks. In Proceedings of the 2014 IEEE 34th International Conference on Distributed Computing Systems, Madrid, Spain, 30 June–3 July 2014; IEEE: Piscataway, NJ, USA, 2014. [Google Scholar]
  19. Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; PmLR: Cambridge, MA, USA, 2016; pp. 1928–1937. [Google Scholar]
  20. Zheng, X.; Huang, W.; Wang, S.; Zhang, J.; Zhang, H. Research on Energy-Saving Routing Technology Based on Deep Reinforcement Learning. Electronics 2022, 11, 2035. [Google Scholar] [CrossRef]
  21. Pradhan, A.; Bisoy, S.K. Intelligent Action Performed Load Balancing Decision Made in Cloud Datacenter Based on Improved DQN Algorithm. In Proceedings of the 2022 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 9–11 March 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  22. Yao, Z.; Wang, Y.; Meng, L.; Qiu, X.; Yu, P. DDPG-Based Energy-Efficient Flow Scheduling Algorithm in Software-Defined Data Centers. Wirel. Commun. Mob. Comput. 2021, 2021, 6629852. [Google Scholar]
  23. Qiu, H.; Lv, C.; Zhou, D. Energy-saving routing algorithm for mobile blockchain Device-to-Device network in 5G edge computing environment. In Proceedings of the AIIPCC 2022; The Third International Conference on Artificial Intelligence, Information Processing and Cloud Computing, Online, 21–22 June 2022; VDE: Osaka, Japan, 2022; pp. 1–10. [Google Scholar]
Figure 1. Message passing between controller and switch.
Figure 2. Asynchronous update framework of the A3C algorithm.
Figure 3. The interaction process between A3C-R routing and the environment.
Figure 4. Reward values of the training process.
Figure 5. Performance comparison of routing algorithms under different traffic loads.

