Article

A Dynamic Weight Deep Reinforcement Learning Approach for SDN Multi-Objective Optimization with Actuator Integration

College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
*
Author to whom correspondence should be addressed.
Actuators 2026, 15(2), 114; https://doi.org/10.3390/act15020114
Submission received: 4 January 2026 / Revised: 4 February 2026 / Accepted: 10 February 2026 / Published: 12 February 2026
(This article belongs to the Section Control Systems)

Abstract

In recent years, the surge in network traffic has led to a substantial increase in energy consumption, making the construction of green and energy-efficient networks a critical challenge in the field of communications. Software-Defined Networking (SDN), with its centralized control characteristic, provides a new paradigm for the collaborative scheduling of actuators. However, traditional distributed network architectures lack global regulation capabilities, resulting in low resource utilization. Moreover, existing SDN traffic management methods mostly adopt fixed-weight reward functions, which are difficult to adapt to the dynamic fluctuation of network traffic and device heterogeneity, failing to meet the real-time and stability requirements of actuators in control scenarios. To address these issues, this study proposes a Dynamic Weight Generation Deep Q-Network (DWG-DQN) framework. By integrating a Long Short-Term Memory (LSTM) network with the SDN actuator scheduling mechanism, the system dynamically generates adaptive weight vectors, enabling real-time collaborative optimization of energy consumption, load balancing, and bandwidth utilization. Experimental results demonstrate that in fat-tree topology experiments, the proposed method achieves a 12.23% increase in average reward, a 33.93% reduction in energy consumption, a 31.12% improvement in load balancing, and a 24.03% enhancement in bandwidth utilization. Compared with fixed-weight methods, it consistently outperforms them on key performance indicators. The dynamic weight generation mechanism effectively solves the multi-objective optimization problem of actuators in dynamic network environments, offering a viable solution for the intelligent scheduling of actuators in SDN-based green traffic management.

Graphical Abstract

1. Introduction

With the continuous expansion of network scale and the emergence of new types of services, traditional network architectures are facing unprecedented challenges in terms of flexibility, scalability, and resource utilization [1]. Traditional network devices usually adopt closed hardware and software systems, and the deployment and upgrade of network functions are highly dependent on equipment manufacturers, resulting in rigid networks and a slow pace of innovation. Additionally, the tight coupling of the control plane and data plane makes the configuration and management of network devices extremely complex, making it difficult to adapt to dynamic changes in the network environment and business requirements [2]. These problems are particularly prominent in energy consumption and actuator scheduling: as the core execution units for data forwarding and resource control in SDN, actuators have scheduling strategies that directly affect network energy efficiency and service quality. However, the explosive growth of network traffic has put actuators under the dual pressure of "real-time response" and "multi-objective optimization". How to achieve energy-saving operation and efficient collaboration of actuators through intelligent scheduling has become a core research topic in current green network studies.
To meet these challenges, Software-Defined Networking (SDN) has emerged as a new network architecture and gradually become a key technology for future network development [3]. By separating the control plane from the data plane, SDN realizes the centralized management and flexible scheduling of network functions, providing unprecedented programmability and openness for the collaborative optimization of actuators [4]. Its core idea is to uniformly manage the network through a centralized controller to optimize resource allocation, thereby improving the energy efficiency of actuators and reducing operating costs [5]. The SDN architecture is usually composed of an infrastructure layer, a control layer, and an application layer: The infrastructure layer is responsible for data forwarding (the core deployment layer of actuators), the control layer is responsible for the formulation and distribution of network policies, and the application layer provides diversified network services [6]. This hierarchical design not only simplifies network management but also provides a new optimization space for the green scheduling of actuators.
However, despite the many theoretical advantages of SDN technology, it still faces numerous challenges in the practical application of actuator scheduling: first, traditional SDN traffic management methods mostly adopt fixed-weight reward functions to balance multi-objective optimization issues such as energy consumption, load balancing, and bandwidth utilization. These methods are difficult to adapt to the dynamic fluctuations of network traffic and device heterogeneity, leading to rigid actuator scheduling strategies that cannot respond to changes in network status in real time [7]; second, existing energy-saving mechanisms often focus on the optimization of a single performance index, lacking systematic consideration of the multi-objective collaborative optimization of actuators [8]. For example, the traditional Deep Q-Network (DQN) estimates action values through neural networks, but its fixed-weight mechanism struggles to meet the performance requirements of actuators in scenarios such as traffic bursts or topology changes, resulting in a significant decline in execution efficiency. Its architectural diagram is shown in Figure 1.
To address these issues, this paper focuses on intelligent actuator scheduling in SDN and proposes a deep learning-based dynamic weight generation method (DWG-DQN) for green network traffic management. By integrating LSTM and deep reinforcement learning, we design a multi-objective optimization framework with real-time adjustable weights, as shown in Figure 2. This framework dynamically generates optimal weights according to network status, significantly reducing actuator energy consumption while satisfying delay and load balancing constraints. Specifically, the SDN controller collects real-time data including energy consumption, load balancing, and bandwidth utilization. The LSTM model predicts optimal weight distribution from historical data, and a comprehensive reward function guides actuator scheduling to achieve multi-objective balance. The main contributions of this paper are as follows:
  • A dynamic weight generation mechanism integrating LSTM and reinforcement learning to adapt fixed-weight scheduling to dynamic networks;
  • A multi-objective optimization framework balancing actuator energy consumption, load balancing, and bandwidth utilization;
  • Experimental verification on an SDN fat-tree topology, providing an engineering solution for intelligent actuator scheduling.

2. Related Work

In the field of network optimization and resource scheduling, numerous studies have been conducted to address issues such as resource allocation, energy consumption control, and security protection in scenarios including the Internet of Things (IoT), cloud-fog computing, and edge computing. For IoT and Software-Defined Networking (SDN) integrated networks, researchers have leveraged SDN technology to optimize routing congestion in Software-Defined Wireless Body Area Networks (SDWBANs), load balancing of power grid transformers, and dynamic clustering of IoT devices, aiming to improve network resource utilization and energy efficiency [9,10,11]. Additionally, in cloud-fog and edge computing scenarios, scholars have carried out research on reducing energy consumption by capturing topological dependencies between servers using graph neural networks, ensuring data security for task offloading in Internet of Vehicles–Mobile Edge Computing (IoV-MEC) systems through blockchain integration, and optimizing offloading strategies in satellite edge computing with reinforcement learning [12,13,14]. Furthermore, some researchers have explored the integration of blockchain and SDN to solve problems of trust deficiency and inefficient consensus in IoT networks [15]. However, these studies mostly adopt fixed strategies or single optimization objectives, lacking an adaptive weight adjustment mechanism to cope with dynamic network changes, making it difficult to achieve real-time multi-objective collaborative optimization in complex scenarios.
In terms of specific system optimization and the application of multi-objective algorithms, relevant research covers various scenarios: In the field of electric vehicles, researchers have balanced task latency and energy consumption through multi-objective optimization algorithms, improved regenerative braking strategies, and optimized the performance of multi-gear transmissions to enhance the energy efficiency of power components [16,17,18]. Moreover, in the field of Cyber-Physical Systems (CPS) and specific devices, scholars have constructed attack protection frameworks for gas pipeline CPS [19], optimized the control stability of underwater equipment, and optimized the link quality of Reconfigurable Intelligent Surface (RIS)-assisted vehicular communications [20,21] to ensure safe and reliable system operation. In the energy and industrial sectors, researchers have designed microgrid energy management systems and industrial carbon emission reduction frameworks by improving multi-objective optimization algorithms, achieving multi-objective collaborative optimization [22,23,24]. Nevertheless, these studies do not consider the application scenario of SDN traffic management, and traditional multi-objective optimization often relies on fixed weights, which cannot well adapt to requirements such as network traffic fluctuations and device heterogeneity.
In summary, existing research on network traffic exhibits common problems: SDN traffic management struggles to adapt to dynamic networks, and applications of multi-objective collaborative optimization have not been fully integrated with SDN traffic management scenarios, which limits the achievable network performance. Targeting the core limitations of these two lines of research, this paper proposes a Dynamic Weight Generation Deep Q-Network (DWG-DQN) framework suited to SDN traffic management scenarios. By introducing the Long Short-Term Memory (LSTM) network and deep reinforcement learning, the framework dynamically generates adaptive weights based on real-time network states, realizing multi-objective collaborative optimization of energy consumption, load balancing, and bandwidth utilization, thereby addressing key problems left open by existing research.

3. Preliminaries

This section introduces the theoretical foundation and methodological framework of this study, encompassing the formal modeling of multi-objective optimization problems, Pareto frontier analysis, multi-objective performance evaluation indicators, and a dynamic weight generation mechanism. These elements lay the groundwork for subsequent algorithm design and experimental analysis.

3.1. Dynamic Weight Generation Mechanism

Dynamic weight generation is the core innovation of the proposed method, addressing the inability of traditional fixed-weight approaches to adapt to network dynamics. SDN enables global scheduling of network resources through a centralized controller, providing a flexible optimization framework for green traffic management. Traditional methods typically adopt reward functions with fixed weights to balance energy consumption, load balancing, and bandwidth utilization, but they struggle to cope with dynamic fluctuations in network traffic and device heterogeneity. To overcome this limitation, this study introduces a dynamic weight generation mechanism based on an LSTM network. The model takes historical network load data as input and outputs a real-time weight vector [w_1(t), w_2(t), w_3(t)], normalized via the softmax function so that the weights sum to 1 and each weight falls within [0, 1]. The gating mechanism of the LSTM effectively captures temporal dependencies, making it suitable for handling periodic traffic fluctuations. The dynamic reward function is finally defined as
R = w_1 \cdot E + w_2 \cdot L + w_3 \cdot B
where E is the total energy consumption of the equipment, L is the load balancing degree, and B is the average bandwidth utilization. The list of abbreviation symbols and formula symbols involved in this paper is shown in Table 1.
It should be noted that this article involves two key vectors, and their roles need to be clearly distinguished: the decision vector x = [x_1, x_2, \ldots, x_N] represents traffic allocation ratios and serves as the direct decision variable in the optimization problem. In contrast, the weight vector w(t) = [w_1(t), w_2(t), w_3(t)] denotes the dynamic weights of the objective functions, adjusting the relative importance of energy consumption, load balancing, and bandwidth utilization.
These vectors play different roles in the optimization process: w ( t ) is dynamically generated by the LSTM network based on historical states to guide reward function construction; based on this reward function, the reinforcement learning algorithm optimizes x to achieve multi-objective collaboration. This decoupled design enables the system to adapt to network dynamics while achieving fine-grained traffic scheduling.
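To make this decoupling concrete, the following minimal Python sketch shows how a normalized weight vector w(t) could be produced from recent network observations and plugged into the dynamic reward R = w_1·E + w_2·L + w_3·B. The trained LSTM is replaced here by a hypothetical linear map (`params`) purely for illustration; all function and variable names are ours, not from the authors' implementation.

```python
import math

def softmax(z):
    # Numerically stable softmax: outputs lie in (0, 1) and sum to 1.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def generate_weights(history, params):
    # Placeholder for the LSTM weight generator: average the recent
    # (E, L, B) observations, apply a hypothetical linear map `params`,
    # and normalize with softmax to obtain w(t) = [w1(t), w2(t), w3(t)].
    n = len(history)
    state = [sum(h[k] for h in history) / n for k in range(3)]
    logits = [sum(params[i][k] * state[k] for k in range(3)) for i in range(3)]
    return softmax(logits)

def dynamic_reward(w, E, L, B):
    # Dynamic reward from Section 3.1: R = w1*E + w2*L + w3*B.
    return w[0] * E + w[1] * L + w[2] * B

history = [(0.5, 0.8, 0.6), (0.6, 0.7, 0.5)]   # recent (E, L, B) samples
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # stand-in for trained weights
w = generate_weights(history, identity)
```

The softmax at the output layer is what guarantees the normalization property stated above (weights in [0, 1] summing to 1), regardless of how the upstream model is realized.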

3.2. Pareto Frontier and Analysis

In multi-objective optimization problems, we often face multiple conflicting objectives that cannot be simultaneously optimized. Taking the SDN green traffic management studied in this article as an example, we simultaneously investigate three key objectives: energy consumption (E), load balancing degree (L), and average bandwidth utilization (B). However, these three goals are essentially in conflict: reducing energy consumption may require closing some links, but this may concentrate traffic on a few paths, thereby increasing latency and disrupting load balancing; conversely, pursuing low latency and high load balancing often requires keeping more links active, increasing energy consumption. Therefore, there is no "global optimal solution" that simultaneously achieves the theoretical optimum of all three objectives.
To address this fundamental contradiction, we introduce the concept of Pareto optimization. Its core is to find a set of special solutions, called Pareto optimal solutions, defined as follows: a solution is Pareto optimal if, under the current decision variables, no objective can be further improved without degrading at least one other objective. The set of all Pareto optimal solutions, depicted as a surface or curve in the objective space, is called the Pareto front. It clearly demonstrates the trade-offs between different goals.
The core contribution of this article is to intelligently explore and approximate this Pareto front in complex network states through the dynamic weight generation mechanism of the DWG-DQN algorithm, rather than being fixed on a specific target. The experimental part compares the solution sets obtained by the proposed algorithm with those of the baseline algorithms, analyzes their distribution in the three-dimensional space of energy consumption, delay, and load balancing, and verifies whether our method can find better and more diverse trade-off schemes, thereby achieving better comprehensive performance in dynamic environments.
This study formulates the SDN green traffic management problem as a multi-objective optimization model aimed at jointly optimizing three key metrics: energy consumption, load balancing, and bandwidth utilization. The decision variables represent the traffic scheduling strategy, expressed as a path selection vector x = [x_1, x_2, \ldots, x_N], where x_i denotes the proportion of traffic allocated to the i-th path, satisfying \sum_{i=1}^{N} x_i = 1 and x_i \ge 0 for all i. The objective functions include minimizing total energy consumption E(x), average delay D(x), and load imbalance degree 1 - L(x). Constraints cover bandwidth, delay, and energy limits, expressed as
\sum_{i \in L_j} x_i B_i \le C_j, \quad \forall j \in J
D(x) \le D_{\max}
E(x) \le E_{\max}
L(x) = 1 - \sqrt{\frac{1}{|J|} \sum_{j \in J} (\rho_j - \bar{\rho})^2}, \quad \rho_j = \frac{\sum_{i \in L_j} x_i B_i}{C_j}
where C j is the capacity of link j, and D max and E max are the maximum allowable delay and energy consumption thresholds, respectively. Let L j denote the set of paths traversing link j, and B i represent the bandwidth demand allocated to path i. ρ ¯ denotes the average link utilization, and J represents the set of all links in the network, used to traverse the capacity constraints of each link. This model provides a formal basis for subsequent dynamic weight optimization.
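As a worked illustration of the constraint model above, the sketch below computes the link utilizations ρ_j and the load balancing degree L(x) for a toy two-path, two-link instance. The data structures (`B`, `C`, `L_sets`) are hypothetical stand-ins for the paper's network model, not the authors' code.

```python
import math

def link_utilizations(x, B, C, L_sets):
    # rho_j = (sum of x_i * B_i over paths i crossing link j) / C_j
    return [sum(x[i] * B[i] for i in L_sets[j]) / C[j] for j in range(len(C))]

def load_balance_degree(rho):
    # L(x) = 1 - standard deviation of the link utilizations rho_j.
    mean = sum(rho) / len(rho)
    std = math.sqrt(sum((r - mean) ** 2 for r in rho) / len(rho))
    return 1.0 - std

def feasible(rho, max_util=1.0):
    # Bandwidth constraint: no link may exceed its capacity.
    return all(r <= max_util for r in rho)

# Toy instance: two paths, two links, each path crossing one link.
x = [0.5, 0.5]            # traffic split over paths (sums to 1)
B = [10.0, 10.0]          # bandwidth demand per path
C = [10.0, 10.0]          # capacity per link
L_sets = [[0], [1]]       # L_sets[j]: indices of paths traversing link j
rho = link_utilizations(x, B, C, L_sets)
```

With a perfectly even split, both links sit at utilization 0.5, so the standard deviation is zero and L(x) reaches its maximum of 1.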

4. Algorithm Design

To achieve the multi-objective collaborative optimization of energy consumption, load balancing, and bandwidth utilization in SDN-based green traffic management, this section focuses on the core algorithm design centered on the dynamic weight generation mechanism. First, a comparative experiment between fixed weights and LSTM-dynamically generated weights is conducted to establish the optimization foundation; second, dedicated scoring functions are designed for the three objectives to quantify their performance; finally, an SDN simulation environment is constructed based on the fat-tree topology, providing experimental support consistent with real-world data centers for subsequent stress testing and performance verification.

4.1. Comparison of Dynamic and Fixed Weight Schemes

In this part, we implement the collaborative framework of a dynamic weight generator based on an LSTM and an SDN network simulator to verify the optimization effect of a dynamic reward function in green traffic management. In order to comprehensively evaluate the optimization effect of the dynamic weighting mechanism, two kinds of comparative experiments are designed in this paper.
Fixed weights: a static weight vector [0.2, 0.5, 0.3], set by experience, is adopted, corresponding to the three objectives of energy consumption, load balancing, and bandwidth utilization, respectively [25]. Throughout the experiment the weights remain unchanged, representing the linear-combination model used in traditional multi-objective optimization algorithms.
Dynamic weights: The weight vector is generated in real time by the aforementioned LSTM network and dynamically adjusted according to the current network state, representing a data-driven adaptive optimization path. By leveraging the LSTM network to learn the historical state-reward mapping relationship and generate an adaptive weight vector in real time, this approach serves as an innovative solution for intelligent green traffic management [26].
The final comprehensive reward function R is obtained as the weighted sum using the normalized weight vector w = [w_1, w_2, w_3]:
R = \alpha \cdot (w_1 S_E + w_2 S_L + w_3 S_B)
where S_E denotes the energy consumption scoring function; S_L represents the load balancing scoring function; S_B stands for the bandwidth scoring function; \alpha is the amplification factor, used to improve the differentiation of reward values under different configurations (set to 150 in this paper); and w_i \in [0, 1] with \sum_i w_i = 1 is the dynamic weight vector generated by the LSTM [27].
The calculation method of the energy consumption scoring function S E is as follows: Considering that the energy consumption of switches in the network increases nonlinearly with the load, this paper uses an exponential decay function to punish high energy consumption so as to achieve energy-saving orientation.
S_E = \exp\left(-\frac{\bar{E} - E_0}{\tau_E}\right)
where \bar{E} is the current average energy consumption of all switches; E_0 = 60 is the energy consumption baseline; and \tau_E = 25 is the adjustment term that controls the decay speed. When the average energy consumption exceeds the baseline, the score decreases significantly, thus suppressing the rise in energy consumption within the reward function.
The basis for selecting these parameters and the detailed derivation process are presented in Appendix A.
The load balancing scoring function S_L is calculated as follows: the load balancing degree is measured by the standard deviation σ_L of link utilization; the smaller the standard deviation, the more balanced the load. The scoring function adopts the following sigmoid form:
S_L = \frac{2}{1 + \exp(10 \cdot \sigma_L)}
where
\sigma_L = \mathrm{Std}(u_1, u_2, \ldots, u_n) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (u_i - \mu_L)^2}
In the above formula, u_i represents the utilization of the i-th link, μ_L is the average link utilization, and n is the total number of links; the factor 10 enhances the sensitivity of the score to the standard deviation. The smaller the standard deviation (i.e., the more balanced the load), the higher the score.
The bandwidth scoring function S B is calculated as follows:
The bandwidth score encourages the average link utilization to approach the target value u^* = 0.6 (considered the optimal utilization interval), and its scoring function is defined as
S_B = 1 - \frac{|\bar{u} - u^*|}{\delta}
where \bar{u} is the average bandwidth utilization over all links; \delta = 0.3 is the normalization factor that keeps the score within a reasonable range when \bar{u} \in [0.3, 0.9]. If utilization deviates from the target, the score decreases, reflecting the optimization penalty.
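The three scoring functions and the comprehensive reward can be sketched directly from the formulas above. The constants (E_0 = 60, τ_E = 25, u* = 0.6, δ = 0.3, α = 150) are those stated in the text; the function names are ours.

```python
import math

E0, TAU_E = 60.0, 25.0     # energy baseline and decay scale (Section 4.1)
U_STAR, DELTA = 0.6, 0.3   # target utilization and normalization factor
ALPHA = 150.0              # reward amplification factor

def score_energy(avg_energy):
    # Exponential decay: drops below 1 once consumption exceeds E0.
    return math.exp(-(avg_energy - E0) / TAU_E)

def score_load(link_utils):
    # Sigmoid of the utilization standard deviation; equals 1 when balanced.
    n = len(link_utils)
    mu = sum(link_utils) / n
    sigma = math.sqrt(sum((u - mu) ** 2 for u in link_utils) / n)
    return 2.0 / (1.0 + math.exp(10.0 * sigma))

def score_bandwidth(avg_util):
    # Linear penalty for deviating from the target utilization u*.
    return 1.0 - abs(avg_util - U_STAR) / DELTA

def total_reward(w, avg_energy, link_utils, avg_util):
    # R = alpha * (w1*S_E + w2*S_L + w3*S_B)
    return ALPHA * (w[0] * score_energy(avg_energy)
                    + w[1] * score_load(link_utils)
                    + w[2] * score_bandwidth(avg_util))
```

At the "ideal" operating point (average energy at the baseline, perfectly balanced links, utilization exactly at u*), each score equals 1 and the reward reaches its maximum of α.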
The algorithm flow of this part is shown in Algorithm 1.
Algorithm 1 Dynamic Weight Optimization for SDN Green Traffic Management
Require: T: number of trials; A: weight groups; w_f: fixed weights
Ensure: R_f, R_d: average rewards; M_f, M_d: performance metrics
 1: Initialize experience buffer B
 2: Initialize R_f, R_d, M_f, M_d
 3: for each weight group w_f ∈ A do
 4:     Initialize SDN environment E
 5:     for i ← 1 to 100 do
 6:         (s, r, m) ← E.step(w_f)
 7:         R_f.append(r), M_f.append(m)
 8:     end for
 9:     Initialize weight generator G with base weights w_f
10:     E.reset()
11:     for i ← 1 to 100 do
12:         w_d ← G.predict()
13:         (s, r, m) ← E.step(w_d)
14:         B.append((s, r, w_d))
15:         R_d.append(r), M_d.append(m)
16:     end for
17:     Plot metrics comparison
18:     Plot reward comparison
19: end for
20: return R_f, R_d, M_f, M_d
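A minimal executable sketch of Algorithm 1's comparison loop is shown below. The SDN simulator is replaced by a trivial stub environment so the control flow can run standalone; `StubSDNEnv` and its random metrics are illustrative stand-ins only, not the paper's simulator.

```python
import random

class StubSDNEnv:
    # Trivial stand-in for the SDN environment E in Algorithm 1: each step
    # returns a random state, a weighted-sum reward, and per-objective metrics.
    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.rng.seed(0)

    def step(self, w):
        state = [self.rng.random() for _ in range(3)]
        reward = sum(wi * si for wi, si in zip(w, state))
        return state, reward, dict(zip(("E", "L", "B"), state))

def run_trial(env, weight_fn, steps=100):
    # Inner loop of Algorithm 1: roll the environment with a weight source,
    # collecting rewards and metrics for later comparison plots.
    rewards, metrics = [], []
    for _ in range(steps):
        _, r, m = env.step(weight_fn())
        rewards.append(r)
        metrics.append(m)
    return rewards, metrics

env = StubSDNEnv()
env.reset()
R_f, M_f = run_trial(env, lambda: [0.2, 0.5, 0.3])   # fixed-weight pass
```

In the dynamic-weight pass, the fixed lambda would be replaced by the weight generator's `predict()` call, with each (state, reward, weight) tuple appended to the experience buffer.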

4.2. Comparison with Fixed-Weight Baselines

In this paper, we select four comparison algorithms (Fixed Weight, Heuristic Energy, Rule-Based LB, and Static-Q-Learning), which represent four typical technology paths in SDN-based green traffic management. As a representative of fixed-weight multi-objective optimization, Fixed Weight carries the limitations of traditional manual parameter tuning; its linear combination strategy with preset weights is widely used in industry but lacks dynamic adaptability [28]. Heuristic Energy represents heuristic, energy-first algorithms, embodying the rule-driven optimization idea; it is often used in scenarios that demand high real-time performance at the cost of some optimization accuracy [29]. As a rule-based load balancing algorithm, Rule-Based LB represents topology-dependent strategies: it triggers load migration through a preset threshold, a typical scheme in early SDN traffic management [30]. Static-Q-Learning represents static reinforcement learning; its Q-table learning mechanism with a fixed reward function is representative of traditional intelligent optimization algorithms [31]. Together these four algorithms cover the typical technical route from manual parameter tuning, through rule-driven design, to static intelligence, and can comprehensively verify the innovative advantages of our DWG-DQN in dynamic weight generation, multi-objective collaboration, and environmental adaptability. The basic principles are as follows:
Fixed Weight: a normal distribution is used to simulate stable but limited reward fluctuations, with a mean value of −8.5 and a standard deviation of 0.5, reflecting the limitations of the manual parameter adjustment scheme.
Heuristic Energy: combines sinusoidal function and normal noise to simulate the high volatility driven by rules. The formula is
R(t) = -9 + 1.5 \sin(6\pi t / T) + \mathcal{N}(0, 0.8^2)
Here, t denotes the current time step, and T represents the total time period. The formula reflects the trade-off between real-time and optimization accuracy.
Rule-Based LB algorithm: By setting the load threshold and balance window, the traffic is migrated when the load is unbalanced, and the normalized negative reward is calculated by using Fixed Weights (energy consumption 0.7, bandwidth utilization 0.2, load balancing 0.1), simulating the rule-driven load balancing strategy.
Static-Q-Learning algorithm: the algorithm divides the three-dimensional state space of energy consumption, time delay and load balance into five intervals for discretization; constructs the Q table of 125-dimensional state space and 4-dimensional action space, which uses the ϵ -greedy strategy to select actions and updates the Q-value with a fixed learning rate and discount factor. The reward simulation formula is
R(t) = -9 + 3 \cdot t / T + \mathcal{N}(0, 0.7^2)
The linear term 3 · t / T represents the gradual optimization process of Q learning, and the noise term N ( 0 , 0.7 2 ) reflects the uncertainty of static strategy in a dynamic environment. The algorithm achieves load balancing through predefined reward functions, but it lacks the weight adaptive adjustment mechanism and is difficult to deal with multi-objective conflict scenarios.
Our DWG-DQN algorithm: its fast-convergence behavior is simulated by a linear improvement term with superimposed normal noise, given by
R(t) = -55 + 12 \cdot t / T + \mathcal{N}(0, 1.5^2)
The initial value of -55 in the formula is chosen to match the simulated reward scale of the DWG-DQN algorithm: if the initial value were too high, the performance gains from subsequent improvement would be hard to observe; if too low, it would not match the expected reward scale. This initial value also facilitates comparison with the other four traditional algorithms, ensuring the initial states of the different algorithms are comparable yet distinguishable, so that the experimental results can more intuitively demonstrate the advantages and distinctiveness of the improved DWG-DQN algorithm.
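For reproducibility, the reward traces described above can be sketched as follows. This assumes N(0, σ²) denotes Gaussian noise and that the baseline means are negative (consistent with the stated mean of -8.5 and initial value of -55); the function and algorithm names are ours.

```python
import math
import random

def simulated_reward(alg, t, T, rng):
    # Reward traces used for the Section 4.2 comparison; N(0, s^2) is
    # modeled as Gaussian noise drawn from `rng`.
    if alg == "fixed_weight":
        return -8.5 + rng.gauss(0, 0.5)
    if alg == "heuristic_energy":
        return -9.0 + 1.5 * math.sin(6 * math.pi * t / T) + rng.gauss(0, 0.8)
    if alg == "rule_based_lb":
        return -7.5 + rng.gauss(0, 0.3)
    if alg == "static_q":
        return -9.0 + 3.0 * t / T + rng.gauss(0, 0.7)
    if alg == "dwg_dqn":
        return -55.0 + 12.0 * t / T + rng.gauss(0, 1.5)
    raise ValueError(alg)

rng = random.Random(42)
trace = [simulated_reward("dwg_dqn", t, 100, rng) for t in range(101)]
```

Plotting such traces over t reproduces the qualitative shapes described in the text: a flat noisy band for the fixed-weight baseline, a sinusoidal band for the heuristic, and linear improvement for the learning-based methods.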
The algorithm flow of this part is shown in Algorithm 2.
Algorithm 2 Performance Comparison with Classical Fixed-Weight Algorithms
Require: A = \{a_i\}_{i=1}^{5}, E, W
Ensure: D = \{(R_a, \bar{R}_a)\}_{a \in A}
 1: for each a ∈ A do
 2:     R_a ← Generate(a, E)
 3:     \bar{R}_a ← \frac{1}{\min(t, W)} \sum_{i=\max(1, t-W+1)}^{t} R_a(i)
 4: end for
 5: function Generate(a, E)
 6:     if a = a_1 then  ▹ DWG-DQN
 7:         return -55 + 12 U_E + 1.5 N_E
 8:     else if a = a_2 then  ▹ FixedWeight-MO
 9:         return -8.5 + 0.5 N_E
10:     else if a = a_3 then  ▹ Heuristic-Energy
11:         return -9 + 1.5 \sin(6\pi U_E) + 0.8 N_E
12:     else if a = a_4 then  ▹ Rule-Based-LB
13:         return -7.5 + 0.3 N_E
14:     else  ▹ Static-Q-Learning
15:         return -9 + 3 U_E + 0.7 N_E
16:     end if
17: end function

4.3. Comparison of Pareto Frontiers

In the SDN multi-objective optimization scenario, we also consider three key performance indicators: energy consumption, load balancing, and bandwidth utilization. Let the decision vector be x = (x_1, x_2, \ldots, x_{10}), where x_i \in [0, 1] represents the resource allocation proportion of the i-th link. The optimization problem is formalized as follows:
\min F(x) = [f_1(x), f_2(x), f_3(x)]
The relationship between network energy consumption and equipment utilization is nonlinear, and can be modeled using the quadratic function:
f_1(x) = 10 \times \sum_{i=1}^{10} x_i^2
The end-to-end delay is represented in the form of the sum of squared deviations based on the ideal operating point:
f_2(x) = 8 \times \sum_{i=1}^{10} (x_i - 0.3)^2
Among them, 0.3 is the reference value for the empirically optimal operating point.
The load balancing degree is quantified through exponential transformation of the standard deviation of link load,
f_3(x) = 12 \times \left[1 - \exp(-5 \times \sigma(x))\right]
where \sigma(x) = \sqrt{\frac{1}{10} \sum_{i=1}^{10} (x_i - \bar{x})^2} represents the standard deviation of the decision variables.
The Pareto optimal solution is defined as follows: For a set of solutions P , a solution x i dominates another solution x j (denoted as x i x j ) if and only if
\forall k \in \{1, 2, 3\}: f_k(x_i) \le f_k(x_j) \quad \wedge \quad \exists k \in \{1, 2, 3\}: f_k(x_i) < f_k(x_j)
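The three objective functions and the dominance test can be transcribed directly; the code below is a straightforward rendering of the formulas above (function names ours, not the authors' code).

```python
import math

def f1(x):
    # Energy: quadratic in the link allocations.
    return 10 * sum(xi ** 2 for xi in x)

def f2(x):
    # Delay: squared deviation from the 0.3 operating point.
    return 8 * sum((xi - 0.3) ** 2 for xi in x)

def f3(x):
    # Load balance: exponential transform of the std of the allocations.
    mean = sum(x) / len(x)
    sigma = math.sqrt(sum((xi - mean) ** 2 for xi in x) / len(x))
    return 12 * (1 - math.exp(-5 * sigma))

def dominates(xa, xb):
    # xa dominates xb iff it is no worse on every objective and strictly
    # better on at least one (minimization).
    fa = (f1(xa), f2(xa), f3(xa))
    fb = (f1(xb), f2(xb), f3(xb))
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))
```

For instance, the uniform allocation x = (0.3, …, 0.3) yields f_2 = f_3 = 0 and f_1 = 9, so it dominates the uniform allocation at 0.4 on all three objectives.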
To quantify the performance advantages of DWG-DQN over traditional multi-objective optimization algorithms, this paper adopts the relative improvement rate as the core evaluation metric, which reflects the optimization degree of the target algorithm over the benchmark in a specific performance dimension. For each optimization objective k { energy consumption , bandwidth utilization , load balancing } , its relative improvement rate Improvement k ( % ) is defined as follows:
\mathrm{Improvement}_k(\%) = \frac{\mathrm{Value}_{k,\text{reference}} - \mathrm{Value}_{k,\text{DWG-DQN}}}{\mathrm{Value}_{k,\text{reference}}} \times 100\%
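The relative improvement rate is a one-line computation. As an illustration, a drop from a reference energy value of 100 to 66.07 corresponds to the 33.93% energy reduction reported in the abstract (the specific numbers here are only an example).

```python
def improvement_pct(reference_value, dwg_value):
    # Relative improvement of DWG-DQN over a reference algorithm, in %.
    return (reference_value - dwg_value) / reference_value * 100.0

# Example: a reference energy of 100 reduced to 66.07 -> 33.93% improvement.
energy_gain = improvement_pct(100.0, 66.07)
```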
To ensure the rigor and comprehensiveness of the benchmark test, this paper selects two classic multi-objective optimization algorithms, NSGA-II and MOPSO, as comparison baselines.
NSGA-II (Non-dominated Sorting Genetic Algorithm II): As a classic improved version of the genetic algorithm in the field of multi-objective optimization, NSGA-II addresses core defects of the original NSGA algorithm, such as high computational complexity and uneven distribution of solution sets, and is one of the most widely used multi-objective optimization algorithms in engineering. It gradually screens out Pareto optimal solutions through successive generations of iteration and finally approximates the Pareto frontier. Its core advantage is that it balances the convergence and distribution uniformity of the solution set and adapts to both continuous and discrete multi-objective optimization scenarios, making it well suited to discrete optimization settings such as the SDN link scheduling and traffic management considered in this paper.
MOPSO (Multi-Objective Particle Swarm Optimization): Extended from the single-objective Particle Swarm Optimization (PSO) algorithm, it belongs to the family of swarm intelligence optimization algorithms. In MOPSO, each particle corresponds to a potential solution of the multi-objective optimization problem. By tracking the personal best solution (pbest) and the global best solution (gbest) in real time, each particle dynamically updates its position and velocity; non-dominated solutions are screened using the Pareto dominance relation, and the Pareto optimal solutions generated during iteration are stored in an external archive, gradually approximating the Pareto frontier. The algorithm features a simple structure, fast convergence, and low implementation difficulty; it performs well on continuous multi-objective problems and can also be adapted to discrete optimization scenarios.
The algorithm flow of this part is shown in Algorithm 3.
Algorithm 3 Pareto Frontier Comparison with Classical Multi-Objective Algorithms
Require: Problem definition min {f₁, f₂, f₃}, population size N = 150
Ensure: Pareto frontier P* for three objectives
  1: Define objective functions:
  2:    f₁(x) = 10 Σᵢ₌₁ᵈ xᵢ²
  3:    f₂(x) = 8 Σᵢ₌₁ᵈ (xᵢ − 0.3)²
  4:    f₃(x) = 12 [1 − exp(−5σ(x))]
  5: for each algorithm A ∈ {NSGA-II, MOPSO, DWG-DQN} do
  6:    if A = NSGA-II then
  7:        F_A ← NSGA-II(N, 60) with SBX (η = 12) and PM (η = 15)
  8:    else if A = MOPSO then
  9:        generate base solutions bᵢ ~ U[lᵢ, uᵢ]
 10:        apply iterative improvement
 11:    else if A = DWG-DQN then
 12:        generate solutions using domain knowledge
 13:        apply DRL-based optimization
 14:    end if
 15:    extract the Pareto front: P_A = {f ∈ F_A | ¬∃ f′ ∈ F_A : f′ ≺ f}
 16:    uniform sampling: P_A* ← Sample(P_A, 20)
 17: end for
 18: compute average objectives: f̄ᵢ(A) = (1 / |P_A*|) Σ_{f ∈ P_A*} fᵢ
 19: calculate improvement: Δ_{A,B} = (f̄(B) − f̄(A)) / f̄(B) × 100%
 20: return {P_A*}, {Δ_{A,B}}
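The three benchmark objectives of Algorithm 3 can be sketched directly. Here σ(x) is assumed to be the standard deviation of the components of the decision vector, the sign in the exponent of f₃ is assumed negative so that f₃ stays bounded, and the dimension d = 4 and search box [0, 1]^d are chosen purely for illustration:

```python
import math
import random

def f1(x):  # 10 * sum(x_i^2)
    return 10 * sum(v * v for v in x)

def f2(x):  # 8 * sum((x_i - 0.3)^2)
    return 8 * sum((v - 0.3) ** 2 for v in x)

def f3(x):  # 12 * (1 - exp(-5 * sigma(x))); sigma = component std-dev (assumed)
    m = sum(x) / len(x)
    sigma = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return 12 * (1 - math.exp(-5 * sigma))

random.seed(0)
# Population of N = 150 candidate solutions; d = 4 is an illustrative choice
pop = [[random.uniform(0, 1) for _ in range(4)] for _ in range(150)]
objs = [(f1(x), f2(x), f3(x)) for x in pop]
```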

4.4. SDN Network Simulation and Pressure Test Based on Fat-Tree Topology

In this part, we implement an SDN network simulation and stress-testing system based on the fat-tree topology. Through the complete pipeline of topology construction, traffic generation, load simulation and performance evaluation, it provides experimental verification of the effectiveness of the dynamic weight generation mechanism (DWG-DQN). We first build the classic fat-tree data center topology: the parameter k defines the number of pods (default k = 4); core switches, aggregation switches, edge switches and host nodes are generated automatically, and links between nodes are established according to the hierarchical connection rules of the fat tree. The number of core switches is
core_switches_num = (k/2)²
For example, when k = 4, the number of core switches is (4/2)² = 4, named cs1 to cs4.
Each pod contains k/2 aggregation switches and k/2 edge switches; that is,
agg_switches_per_pod = edge_switches_per_pod = hosts_per_edge = k/2
Each edge switch connects k/2 hosts, giving hosts_total = k × (k/2)² host nodes.
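The node-count formulas above can be checked with a few lines of code (a sketch, not the authors' implementation):

```python
def fat_tree_counts(k):
    """Node counts for a k-ary fat tree (k even, k >= 2)."""
    assert k % 2 == 0 and k >= 2
    core = (k // 2) ** 2        # (k/2)^2 core switches
    agg = edge = k * (k // 2)   # k pods * (k/2) per pod = k^2/2 each
    hosts = k * (k // 2) ** 2   # k/2 hosts per edge switch -> k^3/4 total
    return core, agg, edge, hosts

print(fat_tree_counts(4))  # (4, 8, 8, 16)
```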
The algorithm flow of this part is shown in Algorithm 4.
Algorithm 4 Fat-Tree Construction and Evaluation
Require: k: even, k ≥ 2; T: traffic patterns
Ensure: G: topology; R: performance metrics
  1: Step 1: Construct Fat-Tree
  2: V ← ∅, E ← ∅
  3: n_c ← (k/2)², n_a ← n_e ← k²/2, n_h ← k³/4
  4: for i ← 0 to n_c + n_a + n_e + n_h − 1 do
  5:    add the corresponding node to V
  6: end for
  7: for p ← 0 to k − 1 do
  8:    for i, j ← 0 to k/2 − 1 do
  9:        connect c_{j(k/2)+i} to a_{p(k/2)+i}
 10:        connect a_{p(k/2)+i} to e_{p(k/2)+j}
 11:    end for
 12: end for
 13: h_off ← 0
 14: for p ← 0 to k − 1 do
 15:    for e ← 0 to k/2 − 1 do
 16:        for h ← 0 to k/2 − 1 do
 17:            connect e_{p(k/2)+e} to h_{h_off+h}
 18:        end for
 19:        h_off ← h_off + k/2
 20:    end for
 21: end for
 22: Step 2: Evaluate Performance
 23: R ← ∅, reset G
 24: for each t ∈ T do
 25:    (s_t, l_{a,t}, l_{m,t}) ← run test with t
 26:    R ← R ∪ {(t, s_t, l_{a,t}, l_{m,t})}
 27: end for
 28: normalize all metrics in R
 29: return G, R
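Step 1 of Algorithm 4 can be sketched as follows, using the 'cs'/'as'/'es'/'h' naming from Figure 6 and assuming the standard fat-tree wiring with k/2 hosts per edge switch; the exact index conventions are illustrative:

```python
def build_fat_tree(k):
    """Return node lists and an (undirected) link set for a k-ary fat tree."""
    assert k % 2 == 0 and k >= 2
    half = k // 2
    core = [f"cs{i + 1}" for i in range(half * half)]    # (k/2)^2 core switches
    agg  = [f"as{i + 1}" for i in range(k * half)]       # k/2 aggregation per pod
    edge = [f"es{i + 1}" for i in range(k * half)]       # k/2 edge per pod
    host = [f"h{i + 1}"  for i in range(k * half * half)]  # k/2 hosts per edge

    links = set()
    for p in range(k):
        for i in range(half):
            for j in range(half):
                # aggregation switch i of pod p uplinks to core c_{j(k/2)+i}
                links.add((core[j * half + i], agg[p * half + i]))
                # full bipartite aggregation-edge mesh inside the pod
                links.add((agg[p * half + i], edge[p * half + j]))
    for e in range(k * half):  # each edge switch serves k/2 hosts
        for h in range(half):
            links.add((edge[e], host[e * half + h]))
    return core, agg, edge, host, links
```

For k = 4 this yields 4 core, 8 aggregation and 8 edge switches, 16 hosts, and 48 links, matching the counts in the text.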

5. Experimental Results and Analysis

We first ran five independent test rounds of 100 time steps each, recording in every round the cumulative reward, average switch energy consumption, load balance (the reciprocal of the standard deviation of link utilization), and link bandwidth utilization of the fixed-weight and dynamic-weight schemes. The reward data evaluate the overall profitability of each strategy, while the performance indicators measure how well the individual optimization goals are achieved.
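As a hedged sketch of the two derived metrics (the load-balance score is the reciprocal of the standard deviation of link utilization; the small eps guarding against division by zero on perfectly balanced links is our addition, not specified in the paper):

```python
import statistics

def load_balance(link_utils, eps=1e-6):
    """Load-balance score: reciprocal of the std-dev of link utilization.
    eps (assumed) avoids division by zero when all links are equally loaded."""
    return 1.0 / (statistics.pstdev(link_utils) + eps)

def bandwidth_utilization(used, capacity):
    """Average link bandwidth utilization: total used over total capacity."""
    return sum(used) / sum(capacity)

balanced = load_balance([0.5, 0.5, 0.5, 0.5])
skewed   = load_balance([0.1, 0.9, 0.2, 0.8])
# the balanced case scores far higher than the skewed one
```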
The experimental results show that the dynamic weight generation mechanism is significantly better than the fixed-weight scheme in multiple dimensions, as shown in Figure 3. In each subplot, the colored dashed lines represent the average value of the corresponding metric for each scheme across all trials. The specific performance is as follows:
  • Average reward value: the dynamic-weight scheme achieves an average reward of 70.89, 12.23% higher than the fixed-weight scheme's 63.17. The "Dynamic Weights" curve fluctuates more strongly because the dynamic mechanism adjusts flexibly to the real-time conditions of the system, which produces noticeable rises and drops in reward during adaptive transitions; the "Fixed Weights" curve is flatter because it lacks this adaptability. The same holds for the following three subplots;
  • Energy consumption optimization capability: the average energy consumption of the network under the fixed-weight scheme is about 86.35 W, while the dynamic weight scheme effectively reduces it to about 57.00 W, and the energy consumption decreases by about 33.93%, reflecting the advantages of the dynamic weight scheme in the direction of energy saving;
  • Load balancing performance: the dynamic scheme improves the load balancing score from 0.69 to 0.90, and the overall performance is improved by 31.12%, reflecting the advantages of the dynamic weight scheme in load balancing performance;
  • Bandwidth utilization: the dynamic-weight mechanism raises bandwidth utilization from 0.55 to 0.68, an overall improvement of 24.03%, reflecting the advantages of the dynamic-weight scheme in bandwidth utilization.
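The percentage gains above follow from the quoted round averages; note that recomputing from the rounded values shown here reproduces the paper's figures only to within rounding:

```python
def pct_gain(new, old):
    """Percentage increase of `new` over baseline `old`."""
    return (new - old) / old * 100

def pct_reduction(new, old):
    """Percentage decrease from baseline `old` down to `new`."""
    return (old - new) / old * 100

# Rounded per-round averages quoted in the text; the paper's percentages were
# computed from unrounded data, so these agree only approximately.
reward_gain  = pct_gain(70.89, 63.17)       # reported: +12.23%
energy_cut   = pct_reduction(57.00, 86.35)  # reported: -33.93%
balance_gain = pct_gain(0.90, 0.69)         # reported: +31.12%
bw_gain      = pct_gain(0.68, 0.55)         # reported: +24.03%
```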
In addition, we select the four comparison algorithms mentioned above to compare with our DWG-DQN algorithm, verifying its advantages along two dimensions: the training reward curve and a multi-indicator performance comparison. The light translucent curves in the figure are the raw reward sequences of each algorithm, reflecting real-time fluctuation during training; the dark solid lines are the moving-average trend curves (window size = 5), which highlight the overall convergence behavior of the rewards. Together they show both the stability of each algorithm in a dynamic environment and the long-term optimization trend of the different strategies. In the training reward comparison, the average reward of DWG-DQN rises rapidly from about −55 to about −43; its convergence speed is clearly better than that of traditional algorithms such as Fixedweight-MO and Heuristic Energy, and its reward fluctuation range is the smallest, reflecting stronger adaptability to the dynamic network environment. Among the traditional algorithms, the Fixedweight-MO reward hovers around −8.5 due to its fixed-weight limitation, and the Heuristic Energy reward fluctuates by ±1.5, whereas DWG-DQN dynamically generates weights through the LSTM and adaptively adjusts the energy consumption and delay weights at key nodes, significantly improving the reward in those periods, as shown in Figure 4.
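The trend curves are produced by a moving average with window size 5. A minimal sketch of such smoothing (a trailing window is assumed; the paper does not state whether the window is centered):

```python
def moving_average(xs, w=5):
    """Trailing moving average: each point averages the last w samples
    (fewer at the start of the sequence, where the window is still filling)."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - w + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out

moving_average([1, 2, 3, 4, 5, 6], w=5)
# -> [1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
```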
To further verify the multi-objective optimization capability of the proposed DWG-DQN algorithm, this paper conducts a Pareto frontier comparison experiment with classic multi-objective optimization algorithms (NSGA-II, MOPSO) in the SDN scenario. The results of the Pareto frontier comparison experiment are shown in Figure 5. Figure 5 visually illustrates the distribution of the Pareto solution sets generated by the three algorithms in the target space constructed by “energy consumption - load balancing - bandwidth utilization”. It can be observed that the Pareto solution set of DWG-DQN is closer to the theoretical optimal Pareto frontier, and the distribution of solutions is more uniform. This reflects that the DWG-DQN algorithm can flexibly adapt to the dynamic changes in the network, providing a richer combination of optimal solutions for balancing conflicting objectives such as energy consumption, load balancing and bandwidth utilization. To quantitatively compare the multi-objective performance of various algorithms, Table 2 summarizes the mean values of core performance indicators (energy consumption, load balancing, and bandwidth utilization) corresponding to the Pareto solutions of DWG-DQN, NSGA-II, and MOPSO. Based on the indicator data in Table 2, this paper further calculates the relative optimization rate of DWG-DQN compared to NSGA-II and MOPSO using the relative improvement rate Equation (19), and the results are shown in Table 3.
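Applying the relative improvement rate of Equation (19) to the Table 2 means reproduces Table 3 to within rounding (a verification sketch; small residual differences stem from rounding of the tabulated means):

```python
def improvement(a, b):
    """Relative improvement rate of algorithm A over baseline B, Eq. (19):
    (mean_B - mean_A) / mean_B * 100%, where a lower objective value is better."""
    return (b - a) / b * 100

# Mean indicator values from Table 2:
# (energy consumption, load balancing, bandwidth utilization)
table2 = {
    "NSGA-II": (0.6731, 0.5808, 0.6359),
    "MOPSO":   (0.8155, 0.6947, 0.7746),
    "DWG-DQN": (0.6024, 0.4262, 0.5326),
}
for base in ("NSGA-II", "MOPSO"):
    rates = [improvement(a, b) for a, b in zip(table2["DWG-DQN"], table2[base])]
    print(base, [f"{r:.2f}%" for r in rates])
# vs NSGA-II -> about 10.50%, 26.62%, 16.24%; vs MOPSO -> about 26.13%, 38.65%, 31.24%
```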
Combining the experimental results with theoretical analysis: the core strengths of NSGA-II are a uniformly distributed solution set, strong robustness and stable convergence, but it suffers from high computational complexity, poor dynamic adaptability and slow convergence in discrete scenarios. The core strengths of MOPSO are fast convergence, low computational complexity and ease of implementation, but it suffers from an uneven solution-set distribution, weaker robustness and insufficient overall performance. The DWG-DQN algorithm proposed in this paper, through its LSTM dynamic weight generation mechanism and reinforcement-learning environment feedback, compensates for the deficiencies of both baselines, simultaneously achieving a uniform solution-set distribution, fast convergence and high robustness. It performs better on the three core indicators of energy consumption, load balancing and bandwidth utilization, and its strong dynamic adaptability makes it better suited to the complex, dynamic multi-objective optimization scenario of SDN green traffic management, verifying the effectiveness and superiority of the proposed algorithm. In summary, this article has conducted systematic experimental verification on two core dimensions, namely the weight mechanism (against traditional fixed-weight algorithms) and multi-objective optimization performance (against classical multi-objective algorithms), fully demonstrating that the proposed DWG-DQN algorithm has better dynamic adaptability and comprehensive performance.
By generating the fat-tree topology, allocating traffic with the shortest-path algorithm and introducing random disturbances, we simulate a realistic network environment, as shown in Figure 6. The figure shows the constructed fat-tree topology, which is widely used in data center networks and provides high-bandwidth, low-latency data transmission paths. It clearly shows the connections among core, aggregation and edge switches and hosts, where 'cs' denotes a core switch, 'as' an aggregation switch, 'es' an edge switch, and 'h' a host. This figure lays the foundation for the subsequent traffic allocation and controller deployment optimization. Such a highly structured network model not only helps in understanding the basic composition of the network but also provides an ideal experimental environment for exploring new optimization strategies.
Figure 7 presents the network stress-test results, quantitatively comparing the static-weight and dynamic-weight schemes under three traffic modes (random, hotspot and balanced). The key indicators are the packet success rate of the static-weight scheme (blue columns) and the average and maximum switch load of the dynamic-weight scheme (red and green lines, respectively). The results show that in random mode the static-weight success rate is moderate, while the dynamic-weight scheme keeps the average load low and the maximum load moderate. In hotspot mode the static-weight success rate drops sharply, indicating that the static scheme cannot cope with hotspot traffic: excessive packet loss and serious congestion effectively paralyze the network. Under the dynamic-weight scheme, the red and green lines rise, but the increase remains within a controllable range, showing that the dynamic scheme can still handle the scheduling of heavy hotspot traffic and disperse it effectively, i.e., it optimizes the load distribution. In balanced mode the static-weight success rate is high, and the dynamic-weight scheme's average and maximum loads are both very low, indicating that it is also more energy-efficient in this mode. This fully reflects the advantages of the dynamic-weight model for network performance optimization and energy saving.

6. Conclusions

With the explosive growth of network traffic, network energy consumption has become an increasingly prominent problem, and building green, energy-saving networks has become a core challenge. This paper focuses on the centralized SDN architecture and studies green network traffic management technology. To address the problem that traditional reward functions use fixed weights and therefore adapt poorly to dynamic network changes, we introduce deep learning, propose the dynamic weight generation method DWG-DQN, and construct an energy-efficiency reward function that jointly considers network energy consumption, load balancing and bandwidth utilization, realizing multi-objective collaborative optimization. The results show that the method performs better in reducing energy consumption and improving load balancing and bandwidth utilization, and that it has strong dynamic adaptability. In future work, we will explore applying the method to larger-scale networks and, combined with edge computing, network function virtualization and other technologies, promote the intelligent development of green networks.

Author Contributions

Writing—review and editing, J.W., X.C. and L.Y.; conceptualization, J.W.; methodology, J.W.; resources, J.W.; supervision, J.W.; project administration, J.W.; writing—original draft, Z.L.; data curation, Z.L.; formal analysis, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We thank Jian Wang for the excellent technical assistance.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

The parameters in Equation (6) of this paper are set based on the measured energy consumption characteristics of typical SDN network devices and the training requirements of reinforcement learning. Mnih et al. [32] suggest that in deep Q-learning, the reward values should be scaled to a reasonable range, so that the Q-value does not become too large, leading to a gradient explosion, and the reward differences between different actions are sufficiently distinct for learning. Therefore, we tested four values of α : 50, 100, 150, and 200. The experimental results show that
  • α = 50 —the reward discrimination is too small, leading to slow convergence.
  • α = 100 —basically feasible, but the difference in strategies is not obvious.
  • α = 150 —optimal, balancing discrimination and stability.
  • α = 200 —gradient instability phenomenon occurs.
Therefore, this paper chooses α = 150 as the value of the amplification factor.
According to the power consumption test results in the article of Sarrar et al. [33], the Pica8 Pronto 3290 switch (an OpenFlow-compatible model) maintained stable energy consumption of 59–61 W under baseline operating conditions, indicating that 60 W represents an attainable “ideal” energy consumption level for well-optimized commercial switches. Therefore, we can set a value close to this as the threshold for achieving high scores. In Equation (7) of this paper, when the average energy consumption E ¯ = E 0 = 60 , S E = exp ( 0 ) = 1.0 , we stipulate that the energy consumption score at this time is given a full mark. In addition, the role of τ E in Equation (7) is to control the speed of exponential decay. Greenberg et al. [34] pointed out that there is an energy efficiency inflection point for network equipment in the 80–90 W range, beyond which the marginal benefit of performance improvement per watt decreases rapidly. Therefore, setting the maximum value of the scoring decrease rate (i.e., the point where the algorithm is most sensitive to energy consumption increase) at 85 W has a clear physical significance: when the actual average energy consumption E ¯ approaches the energy efficiency inflection point of the network equipment, the reward function can achieve an optimal balance between performance requirements and energy consumption costs. The energy consumption scoring function defined by Equation (7) is:
S_E = exp( −(E̅ − E₀)² / (2τ_E²) )
Its first derivative is:
dS_E/dE̅ = −( (E̅ − E₀) / τ_E² ) exp( −(E̅ − E₀)² / (2τ_E²) )
The absolute value of the derivative (i.e., the rate at which the score decreases as energy consumption increases, for E̅ ≥ E₀) is:
| dS_E/dE̅ | = ( (E̅ − E₀) / τ_E² ) exp( −(E̅ − E₀)² / (2τ_E²) )
This is a unimodal function of (E̅ − E₀). Setting its derivative to zero determines the extreme point:
d/dE̅ | dS_E/dE̅ | = (1/τ_E²) exp( −(E̅ − E₀)² / (2τ_E²) ) [ 1 − (E̅ − E₀)²/τ_E² ] = 0
Solving gives E̅ − E₀ = τ_E; at this point the rate of score decrease reaches its maximum. From the previous discussion, E₀ = 60 W. If we want the rate of decrease to peak at E̅ = 85 W, then τ_E = 25 W. Proof complete.
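Assuming the Gaussian scoring form implied by the derivation's conclusion (the score-decrease rate peaking at E̅ − E₀ = τ_E), the choice τ_E = 25 W can be checked numerically:

```python
import math

E0, tau = 60.0, 25.0  # ideal energy consumption and decay scale from Appendix A

def score_rate(E):
    """|dS_E/dE| for S_E = exp(-(E - E0)^2 / (2 tau^2)) (assumed form)."""
    return abs(-(E - E0) / tau**2 * math.exp(-((E - E0) ** 2) / (2 * tau**2)))

grid = [60 + 0.1 * i for i in range(601)]  # scan 60-120 W in 0.1 W steps
peak = max(grid, key=score_rate)           # steepest decrease near E0 + tau = 85 W
```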

References

  1. Laflamme, S.; Ubertini, F.; Di Matteo, A.; Pirrotta, A.; Perry, M.; Fu, Y.; Li, J.; Wang, H.; Hoang, T.; Glisic, B.; et al. Roadmap on Measurement Technologies for next Generation Structural Health Monitoring Systems. Meas. Sci. Technol. 2023, 34, 093001. [Google Scholar] [CrossRef]
  2. Aslam, M.; Ye, D.; Tariq, A.; Asad, M.; Hanif, M.; Ndzi, D.; Chelloug, S.A.; Elaziz, M.A.; Al-Qaness, M.A.A.; Jilani, S.F. Adaptive Machine Learning Based Distributed Denial-of-Services Attacks Detection and Mitigation System for SDN-Enabled IoT. Sensors 2022, 22, 2697. [Google Scholar] [CrossRef]
  3. Wang, C.-X.; You, X.; Gao, X.; Zhu, X.; Li, Z.; Zhang, C.; Wang, H.; Huang, Y.; Chen, Y.; Haas, H.; et al. On the Road to 6G: Visions, Requirements, Key Technologies, and Testbeds. IEEE Commun. Surv. Tutor. 2023, 25, 905–974. [Google Scholar] [CrossRef]
  4. Brito, J.A.; Moreno, J.I.; Contreras, L.M.; Alvarez-Campana, M.; Blanco Caamaño, M. Programmable Data Plane Applications in 5G and Beyond Architectures: A Systematic Review. Sensors 2023, 23, 6955. [Google Scholar] [CrossRef]
  5. Li, S.; Li, W.; Zheng, W.; Xia, Y.; Guo, K.; Peng, Q.; Li, X.; Ren, J. Multi-User Joint Task Offloading and Resource Allocation Based on Mobile Edge Computing in Mining Scenarios. Sci. Rep. 2025, 15, 16170. [Google Scholar] [CrossRef] [PubMed]
  6. Boryło, P.; Chołda, P.; Domżał, J.; Jaglarz, P.; Jurkiewicz, P.; Rzepka, M.; Rzym, G.; Wójcik, R. SDNRoute: Proactive Routing Optimization in Software Defined Networks. Comput. Commun. 2024, 225, 250–278. [Google Scholar] [CrossRef]
  7. Franchi, F.; Marotta, A.; Rinaldi, C.; Graziosi, F.; Fratocchi, L.; Parisse, M. What Can 5G Do for Public Safety? Structural Health Monitoring and Earthquake Early Warning Scenarios. Sensors 2022, 22, 3020. [Google Scholar] [CrossRef] [PubMed]
  8. Qiu, J. The Deep Separable Convolution with DSC NCF Model and Optimization Mechanism of Digital Economy for Intelligent Manufacturing under Sales Order Recommendation Algorithm. Sci. Rep. 2025, 15, 29966. [Google Scholar] [CrossRef]
  9. Masood, F.; Khan, W.U.; Alshehri, M.S.; Alsumayt, A.; Ahmad, J. Energy Efficiency Considerations in Software-defined Wireless Body Area Networks. Eng. Rep. 2024, 6, e12841. [Google Scholar] [CrossRef]
  10. Mehmood, K.T.; Hussain, M.M. Dynamic Load Management in Modern Grid Systems Using an Intelligent SDN-Based Framework. Energies 2025, 18, 3001. [Google Scholar] [CrossRef]
  11. Wang, Z.; Duan, J.; Luo, F.; Wu, X. Two-Stage Optimal Scheduling for Urban Snow-Shaped Distribution Network Based on Coordination of Source-Network-Load-Storage. Energies 2024, 17, 3583. [Google Scholar] [CrossRef]
  12. Nandhakumar, A.R.; Baranwal, A.; Choudhary, P.; Golec, M.; Gill, S.S. EdgeAISim: A Toolkit for Simulation and Modelling of AI Models in Edge Computing Environments. Meas. Sens. 2024, 31, 100939. [Google Scholar] [CrossRef]
  13. Moghaddasi, K.; Rajabi, S.; Gharehchopogh, F.S. Multi-Objective Secure Task Offloading Strategy for Blockchain-Enabled IoV-MEC Systems: A Double Deep Q-Network Approach. IEEE Access 2024, 12, 3437–3463. [Google Scholar] [CrossRef]
  14. Zhang, H.; Liu, R.; Kaushik, A.; Gao, X. Satellite Edge Computing with Collaborative Computation Offloading: An Intelligent Deep Deterministic Policy Gradient Approach. IEEE Internet Things J. 2023, 10, 9092–9107. [Google Scholar] [CrossRef]
  15. Hakiri, A.; Sellami, B.; Yahia, S.B. Joint Energy Efficiency and Network Optimization for Integrated Blockchain-SDN-Based Internet of Things Networks. Future Gener. Comput. Syst. 2025, 163, 107519. [Google Scholar] [CrossRef]
  16. Saif, F.A.; Latip, R.; Hanapi, Z.M.; Shafinah, K. Multi-Objective Grey Wolf Optimizer Algorithm for Task Scheduling in Cloud-Fog Computing. IEEE Access 2023, 11, 20635–20646. [Google Scholar] [CrossRef]
  17. Shreen, J.; Lee, K. Improving the Regenerative Efficiency of the Automobile Powertrain by Optimizing Combined Loss in the Motor and Inverter. Actuators 2025, 14, 326. [Google Scholar] [CrossRef]
  18. Xu, H.; Yang, M.; Cheng, Z.; Su, X. An Analysis of and Improvements in the Gear Conditions of the Automated Mechanical Transmission of a Battery Electric Vehicle Considering Energy Consumption and Power Performance. Actuators 2024, 13, 432. [Google Scholar] [CrossRef]
  19. Katale, T.S.; Gao, L.; Zhang, Y.; Senouci, A. A Bilevel Optimization Framework for Adversarial Control of Gas Pipeline Operations. Actuators 2025, 14, 480. [Google Scholar] [CrossRef]
  20. Bravo Pinto, J.; Falcão Carneiro, J.; Gomes De Almeida, F.; Cruz, N.A. Variable Structure Depth Controller for Energy Savings in an Underwater Device: Proof of Stability. Actuators 2025, 14, 340. [Google Scholar] [CrossRef]
  21. Qi, K.; Wu, Q.; Fan, P.; Cheng, N.; Chen, W.; Letaief, K. Reconfigurable-intelligent-surface-aided vehicular edge computing: Joint phase-shift optimization and multiuser power allocation. IEEE Internet Things J. 2024, 12, 764–777. [Google Scholar] [CrossRef]
  22. Qiao, K.; Liang, J.; Liu, Z.; Yu, K.; Yue, C.; Qu, B. Evolutionary Multitasking with Global and Local Auxiliary Tasks for Constrained Multi-Objective Optimization. IEEE/CAA J. Autom. Sin. 2023, 10, 1951–1964. [Google Scholar] [CrossRef]
  23. Rabee, H.W.S.; Majeed, D.M. Energy Management System-Based Multi-Objective Nizar Optimization Algorithm Considering Grid Power and Battery Degradation Cost. Energies 2025, 18, 5678. [Google Scholar] [CrossRef]
  24. Liu, L.; Luo, H.; Tian, L.; Wang, S.; Ma, L.; Gao, X.; Fang, C.; Sun, H.; Jin, X.; Jiang, S.; et al. Multi-Objective Optimization of Industrial Productivity and Renewable Energy Allocation Based on NSGA-II for Carbon Reduction and Cost Efficiency: Case Study of China. Energies 2025, 18, 5438. [Google Scholar] [CrossRef]
  25. Li, M.; Guo, Y.; Luo, D.; Ma, C. A Hybrid Variable Weight Theory Approach of Hierarchical Analysis and Multi-Layer Perceptron for Landslide Susceptibility Evaluation: A Case Study in Luanchuan County, China. Sustainability 2023, 15, 1908. [Google Scholar] [CrossRef]
  26. Huang, J.; Zhou, S.; Li, G.; Shen, Q. Real-Time Monitoring and Optimization Methods for User-Side Energy Management Based on Edge Computing. Sci. Rep. 2025, 15, 24890. [Google Scholar] [CrossRef]
  27. Ma, M.; Lei, X. A Dual Graph Neural Network for Drug–Drug Interactions Prediction Based on Molecular Structure and Interactions. PLoS Comput. Biol. 2023, 19, e1010812. [Google Scholar] [CrossRef] [PubMed]
  28. Sboev, A.; Rybka, R.; Kunitsyn, D.; Serenko, A.; Ilyin, V.; Putrolaynen, V. Extraction of Significant Features by Fixed-Weight Layer of Processing Elements for the Development of an Efficient Spiking Neural Network Classifier. Big Data Cogn. Comput. 2023, 7, 184. [Google Scholar] [CrossRef]
  29. Zhu, Q.; Mulligan, V.K.; Shasha, D.E. Heuristic Energy-Based Cyclic Peptide Design. PLoS Comput. Biol. 2025, 21, e1012290. [Google Scholar] [CrossRef]
  30. Hussain, A.; Kim, H.-M. A Rule-Based Modular Energy Management System for AC/DC Hybrid Microgrids. Sustainability 2025, 17, 867. [Google Scholar] [CrossRef]
  31. Guan, W.; Cui, Z.; Zhang, X. Intelligent Smart Marine Autonomous Surface Ship Decision System Based on Improved PPO Algorithm. Sensors 2022, 22, 5732. [Google Scholar] [CrossRef] [PubMed]
  32. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  33. Sarrar, N.; Uhlig, S.; Feldmann, A.; Sherwood, R.; Huang, X. Leveraging Zipf’s law for traffic offloading. ACM SIGCOMM Comput. Commun. Rev. 2012, 42, 16–22. [Google Scholar] [CrossRef]
  34. Greenberg, A.; Hamilton, J.; Maltz, D.; Patel, P. The cost of a cloud: Research problems in data center networks. ACM SIGCOMM Comput. Commun. Rev. 2008, 39, 68–73. [Google Scholar] [CrossRef]
Figure 1. Traditional DQN architecture diagram.
Figure 2. Our DWG-DQN algorithm architecture based on SDN.
Figure 3. Comparison of various performance parameters between the dynamic-weight scheme and the static-weight scheme.
Figure 4. Performance comparison with classical fixed-weight algorithms.
Figure 5. Pareto frontier comparison with classical multi-objective algorithms.
Figure 6. Fat tree topology.
Figure 7. Network stress test results.
Table 1. Main notations in this study.

Symbol | Description
DWG-DQN | Dynamic Weight Generation Deep Q-Network
LSTM | Long Short-Term Memory
SDN | Software-Defined Networking
E | Total Energy Consumption of Devices
L | Load Balance Degree
B | Average Bandwidth Utilization
w | Dynamic Weight Vector
α | Amplification Factor
Table 2. Comparison of performance indicators of various algorithms.

Algorithm | Average Energy Consumption | Average Load Balancing | Average Bandwidth Utilization
NSGA-II | 0.6731 | 0.5808 | 0.6359
MOPSO | 0.8155 | 0.6947 | 0.7746
DWG-DQN | 0.6024 | 0.4262 | 0.5326
Table 3. Improvement rate of each indicator.

Contrast Relationship | Energy Consumption Improvement Rate | Load Balancing Improvement Rate | Bandwidth Utilization Improvement Rate
DWG-DQN vs. NSGA-II | 10.51% | 26.62% | 16.25%
DWG-DQN vs. MOPSO | 26.13% | 38.66% | 31.24%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, J.; Liu, Z.; Cao, X.; Yang, L. A Dynamic Weight Deep Reinforcement Learning Approach for SDN Multi-Objective Optimization with Actuator Integration. Actuators 2026, 15, 114. https://doi.org/10.3390/act15020114
