1. Introduction
With the continuous expansion of network scale and the emergence of new types of services, traditional network architectures face unprecedented challenges in flexibility, scalability, and resource utilization [1]. Traditional network devices usually adopt closed hardware and software systems, and the deployment and upgrade of network functions depend heavily on equipment manufacturers, resulting in rigid networks and a slow pace of innovation. Additionally, the tight coupling of the control plane and data plane makes the configuration and management of network devices extremely complex and difficult to adapt to dynamic changes in the network environment and business requirements [2]. These problems are particularly prominent in energy consumption and actuator scheduling: as the core execution units for data forwarding and resource control in SDN, actuators have scheduling strategies that directly affect network energy efficiency and service quality. However, the explosive growth of network traffic has put actuators under the dual pressure of "real-time response" and "multi-objective optimization". How to achieve energy-saving operation and efficient collaboration of actuators through intelligent scheduling has become a core topic in current green network research.
To meet these challenges, Software-Defined Networking (SDN) has emerged as a new network architecture and gradually become a key technology for future network development [3]. By separating the control plane from the data plane, SDN realizes centralized management and flexible scheduling of network functions, providing unprecedented programmability and openness for the collaborative optimization of actuators [4]. Its core idea is to manage the network uniformly through a centralized controller to optimize resource allocation, thereby improving the energy efficiency of actuators and reducing operating costs [5]. The SDN architecture usually consists of an infrastructure layer, a control layer, and an application layer: the infrastructure layer is responsible for data forwarding (the core deployment layer of actuators), the control layer formulates and distributes network policies, and the application layer provides diversified network services [6]. This hierarchical design not only simplifies network management but also opens a new optimization space for the green scheduling of actuators.
However, despite its many theoretical advantages, SDN still faces numerous challenges in the practical scheduling of actuators. First, traditional SDN traffic management methods mostly adopt fixed-weight reward functions to balance multi-objective optimization goals such as energy consumption, load balancing, and bandwidth utilization. These methods struggle to adapt to the dynamic fluctuations of network traffic and device heterogeneity, leading to rigid actuator scheduling strategies that cannot respond to changes in network status in real time [7]. Second, existing energy-saving mechanisms often focus on optimizing a single performance index and lack systematic consideration of the multi-objective collaborative optimization of actuators [8]. For example, the traditional Deep Q-Network (DQN) estimates action values through neural networks, but its fixed-weight mechanism struggles to meet the performance requirements of actuators in scenarios such as traffic bursts or topology changes, resulting in a significant decline in execution efficiency. Its architecture is shown in Figure 1.
To address these issues, this paper focuses on intelligent actuator scheduling in SDN and proposes a deep learning-based dynamic weight generation method (DWG-DQN) for green network traffic management. By integrating LSTM and deep reinforcement learning, we design a multi-objective optimization framework with real-time adjustable weights, as shown in Figure 2. This framework dynamically generates optimal weights according to network status, significantly reducing actuator energy consumption while satisfying delay and load balancing constraints. Specifically, the SDN controller collects real-time data including energy consumption, load balancing, and bandwidth utilization. The LSTM model predicts the optimal weight distribution from historical data, and a comprehensive reward function guides actuator scheduling to achieve multi-objective balance. The main contributions of this paper are as follows:
A dynamic weight generation mechanism integrating LSTM and reinforcement learning to adapt fixed-weight scheduling to dynamic networks;
A multi-objective optimization framework balancing actuator energy consumption, load balancing, and bandwidth utilization;
Experimental verification on an SDN fat-tree topology, providing an engineering solution for intelligent actuator scheduling.
2. Related Work
In the field of network optimization and resource scheduling, numerous studies have addressed resource allocation, energy consumption control, and security protection in scenarios including the Internet of Things (IoT), cloud-fog computing, and edge computing. For IoT and Software-Defined Networking (SDN) integrated networks, researchers have leveraged SDN technology to optimize routing congestion in Software-Defined Wireless Body Area Networks (SDWBANs), load balancing of power grid transformers, and dynamic clustering of IoT devices, aiming to improve network resource utilization and energy efficiency [9,10,11]. Additionally, in cloud-fog and edge computing scenarios, scholars have studied reducing energy consumption by capturing topological dependencies between servers with graph neural networks, securing task offloading in Internet of Vehicles–Mobile Edge Computing (IoV-MEC) systems through blockchain integration, and optimizing offloading strategies in satellite edge computing with reinforcement learning [12,13,14]. Furthermore, some researchers have explored the integration of blockchain and SDN to solve trust deficiency and inefficient consensus in IoT networks [15]. However, these studies mostly adopt fixed strategies or single optimization objectives and lack an adaptive weight adjustment mechanism to cope with dynamic network changes, making real-time multi-objective collaborative optimization difficult in complex scenarios.
In terms of system optimization and the application of multi-objective algorithms, relevant research covers various scenarios. In the field of electric vehicles, researchers have balanced task latency and energy consumption through multi-objective optimization algorithms, improved regenerative braking strategies, and optimized the performance of multi-gear transmissions to enhance the energy efficiency of power components [16,17,18]. In the field of Cyber-Physical Systems (CPS) and specific devices, scholars have constructed attack protection frameworks for gas pipeline CPS [19], optimized the control stability of underwater equipment, and optimized the link quality of Reconfigurable Intelligent Surface (RIS)-assisted vehicular communications [20,21] to ensure safe and reliable system operation. In the energy and industrial sectors, researchers have designed microgrid energy management systems and industrial carbon emission reduction frameworks by improving multi-objective optimization algorithms, achieving multi-objective collaborative optimization [22,23,24]. Nevertheless, these studies do not consider the SDN traffic management scenario, and traditional multi-objective optimization often relies on fixed weights, which cannot adapt well to network traffic fluctuations and device heterogeneity.
In summary, existing research on network traffic suffers from two common problems: SDN traffic management schemes struggle to adapt to dynamic networks, and applications of multi-objective collaborative optimization rarely target the SDN traffic management scenario, which restricts the achievable network performance. Targeting the core limitations of these two lines of research, this paper proposes a Dynamic Weight Generation Deep Q-Network (DWG-DQN) framework for SDN traffic management. By introducing the Long Short-Term Memory (LSTM) network and deep reinforcement learning, the framework dynamically generates adaptive weights based on real-time network states, realizing multi-objective collaborative optimization of energy consumption, load balancing, and bandwidth utilization. It effectively addresses the key problems in existing research and fills the relevant technical gaps.
3. Preliminaries
This section introduces the theoretical foundation and methodological framework of this study, encompassing the formal modeling of multi-objective optimization problems, Pareto frontier analysis, multi-objective performance evaluation indicators, and a dynamic weight generation mechanism. These elements lay the groundwork for subsequent algorithm design and experimental analysis.
3.1. Dynamic Weight Generation Mechanism
Dynamic weight generation is the core innovation of the proposed method, addressing the inability of traditional fixed-weight approaches to adapt to network dynamics. SDN enables global scheduling of network resources through a centralized controller, providing a flexible optimization framework for green traffic management. Traditional methods typically adopt reward functions with fixed weights to balance energy consumption, load balancing, and bandwidth utilization, but they struggle to cope with dynamic fluctuations in network traffic and device heterogeneity. To overcome this limitation, this study introduces a dynamic weight generation mechanism based on an LSTM network. The model takes historical network load data as input and outputs a real-time weight vector $\mathbf{w} = (w_1, w_2, w_3)$, normalized via the softmax function to ensure the sum of weights equals 1 and each weight falls within $[0, 1]$. The gating mechanism of LSTM effectively captures temporal dependencies, making it suitable for handling periodic traffic fluctuations. The dynamic reward function is finally defined as
$$R = w_1 S_E + w_2 S_L + w_3 S_B,$$
where the scoring terms $S_E$, $S_L$, and $S_B$ (detailed in Section 4) are functions of $E$, the total energy consumption of the equipment, $L$, the load balancing degree, and $B$, the average bandwidth utilization, respectively. The list of abbreviation symbols and formula symbols involved in this paper is shown in Table 1.
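As a minimal sketch of the weight normalization and reward combination described above, the snippet below passes hypothetical LSTM output logits through a softmax and forms the weighted score sum; the logit values and the per-objective scores are illustrative placeholders, not outputs of the trained model.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    z = z - z.max()                 # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical logits from the LSTM head for (energy, load balance, bandwidth)
logits = [0.8, 1.5, 0.4]
w = softmax(logits)                 # weights sum to 1, each within [0, 1]

# Illustrative per-objective scores in [0, 1] (placeholders, not the
# paper's actual scoring functions)
scores = np.array([0.7, 0.9, 0.6])
R = float(np.dot(w, scores))        # dynamic reward: weighted score sum
```

Because the weights form a convex combination, the reward always stays within the range of the individual scores, which keeps the reward scale stable as the weights shift.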
It should be noted that this article involves two key vectors whose roles must be clearly distinguished. The decision vector $\mathbf{x}$ represents traffic allocation ratios and serves as the direct decision variable in the optimization problem. In contrast, the weight vector $\mathbf{w}$ denotes the dynamic weights of the objective functions, adjusting the relative importance of energy consumption, load balancing, and bandwidth utilization. These vectors play different roles in the optimization process: $\mathbf{w}$ is dynamically generated by the LSTM network based on historical states to guide reward function construction; based on this reward function, the reinforcement learning algorithm optimizes $\mathbf{x}$ to achieve multi-objective collaboration. This decoupled design enables the system to adapt to network dynamics while achieving fine-grained traffic scheduling.
3.2. Pareto Frontier and Analysis
In multi-objective optimization problems, we often face multiple conflicting objectives that cannot be optimized simultaneously. Taking the SDN green traffic management studied in this article as an example, we simultaneously investigate three key objectives: energy consumption ($E$), load balancing degree ($L$), and average bandwidth utilization ($B$). However, these three goals are essentially contradictory: reducing energy consumption may require closing some links, but this can concentrate traffic on a few paths, increasing latency and disrupting load balancing; conversely, pursuing low latency and high load balancing often requires keeping more links active, which increases energy consumption. Therefore, no "global optimal solution" can simultaneously achieve the theoretical optimum of all three objectives.
To address this fundamental contradiction, we introduce the concept of Pareto optimization. Its core is to find a set of special solutions called Pareto optimal solutions, defined as follows: under the current decision variables, if a solution cannot further improve any one objective without harming at least one other objective, then it is a Pareto optimal solution. The set of all Pareto optimal solutions, depicted as a surface or curve in the objective space, is called the Pareto front. It clearly demonstrates the trade-offs between different goals.
The core contribution of this article is to intelligently explore and approximate this Pareto front under complex network states through the dynamic weight generation mechanism of the DWG-DQN algorithm, rather than converging to one fixed objective. The experimental part compares the solution sets obtained by the proposed algorithm with those of the baseline algorithms, analyzes their distribution in the three-dimensional space of energy consumption, delay, and load balancing, and verifies whether our method can find better and more diverse trade-off schemes, thereby achieving better comprehensive performance in dynamic environments.
This study formulates the SDN green traffic management problem as a multi-objective optimization model aimed at jointly optimizing three key metrics: energy consumption, load balancing, and bandwidth utilization. The decision variables represent the traffic scheduling strategy, expressed as a path selection vector $\mathbf{x} = (x_1, x_2, \dots, x_n)$, where $x_i$ denotes the proportion of traffic allocated to the $i$-th path, satisfying $\sum_i x_i = 1$ and $x_i \ge 0$ for all $i$. The objective functions include minimizing total energy consumption $E(\mathbf{x})$, average delay $D(\mathbf{x})$, and load imbalance degree $L(\mathbf{x})$. Constraints cover bandwidth, delay, and energy limits, expressed as
$$\sum_{i \in P_j} b_i x_i \le C_j \quad \forall j \in \mathcal{L}, \qquad D(\mathbf{x}) \le D_{\max}, \qquad E(\mathbf{x}) \le E_{\max},$$
where $C_j$ is the capacity of link $j$, and $D_{\max}$ and $E_{\max}$ are the maximum allowable delay and energy consumption thresholds, respectively. Here $P_j$ denotes the set of paths traversing link $j$, $b_i$ represents the bandwidth demand allocated to path $i$, $U$ denotes the average link utilization, and $\mathcal{L}$ represents the set of all links in the network, used to traverse the capacity constraints of each link. This model provides a formal basis for subsequent dynamic weight optimization.
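The constraint structure above can be checked programmatically; the sketch below is a hedged illustration (the helper name `feasible`, the sample capacities, and the precomputed delay/energy values are ours, not from the paper).

```python
import numpy as np

def feasible(x, b, C, paths_on_link, D, E, D_max, E_max):
    """Check a candidate allocation against the model's constraints.

    x: allocation proportions per path (must sum to 1, nonnegative)
    b: bandwidth demand per path; C[j]: capacity of link j
    paths_on_link[j]: indices of paths traversing link j
    D, E: delay and energy of the allocation; D_max, E_max: thresholds
    """
    if not np.isclose(x.sum(), 1.0) or (x < 0).any():
        return False
    for j, C_j in enumerate(C):
        load = sum(b[i] * x[i] for i in paths_on_link[j])
        if load > C_j:          # link capacity constraint violated
            return False
    return D <= D_max and E <= E_max

# Toy instance: three paths, two links
x = np.array([0.5, 0.3, 0.2])
b = np.array([10.0, 10.0, 10.0])
C = [8.0, 8.0]
paths_on_link = {0: [0, 1], 1: [1, 2]}
print(feasible(x, b, C, paths_on_link, D=4.0, E=80.0, D_max=5.0, E_max=100.0))  # → True
```

Shifting more traffic onto a single link (e.g., `x = [0.8, 0.1, 0.1]`) pushes the first link over its 8.0 capacity and the check fails.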
4. Algorithm Design
To achieve the multi-objective collaborative optimization of energy consumption, load balancing, and bandwidth utilization in SDN-based green traffic management, this section presents the core algorithm design centered on the dynamic weight generation mechanism. First, a comparative experiment between fixed weights and LSTM-generated dynamic weights verifies the optimization foundation; second, dedicated scoring functions are designed for the three objectives to quantify their performance; finally, an SDN simulation environment is constructed on a fat-tree topology, providing experimental support consistent with real-world data centers for subsequent stress testing and performance verification.
4.1. Comparison of Dynamic and Fixed Weight Schemes
In this part, we implement the collaborative framework of a dynamic weight generator based on an LSTM and an SDN network simulator to verify the optimization effect of a dynamic reward function in green traffic management. In order to comprehensively evaluate the optimization effect of the dynamic weighting mechanism, two kinds of comparative experiments are designed in this paper.
Fixed weights: a static weight vector [0.2, 0.5, 0.3] set by experience is adopted, corresponding to the three objectives of energy consumption, load balancing, and bandwidth utilization, respectively [25]. Throughout the experiment, the weights remain unchanged, representing the linear combination model in traditional multi-objective optimization algorithms.
Dynamic weights: the weight vector is generated in real time by the aforementioned LSTM network and dynamically adjusted according to the current network state, representing a data-driven adaptive optimization path. By learning the historical state-reward mapping and generating an adaptive weight vector in real time, this approach serves as an innovative solution for intelligent green traffic management [26].
The final comprehensive reward function $R$ is obtained as the weighted sum of the scoring functions under the normalized weights:
$$R = \alpha \left( w_1 S_E + w_2 S_L + w_3 S_B \right),$$
where $S_E$ denotes the energy consumption scoring function, $S_L$ represents the load balancing scoring function, $S_B$ stands for the bandwidth scoring function, $\alpha$ is the amplification factor used to improve the differentiation of reward values under different configurations (set to 150 in this paper), and $\mathbf{w} = (w_1, w_2, w_3)$ is the dynamic weight vector generated by the LSTM [27].
The energy consumption scoring function $S_E$ is calculated as follows. Considering that the energy consumption of switches in the network increases nonlinearly with the load, this paper uses an exponential decay function to penalize high energy consumption and thus achieve an energy-saving orientation:
$$S_E = \exp\!\left(-\frac{\bar{P} - P_0}{\tau}\right),$$
where $\bar{P}$ is the current average energy consumption of all switches, $P_0$ is the energy consumption baseline, and $\tau$ is the adjustment term that controls the attenuation speed. When the average energy consumption is higher than the baseline, the score decreases significantly, thus suppressing the rise of energy consumption in the reward function. The basis for selecting these parameters and the detailed derivation process are presented in Appendix A.
The load balancing scoring function $S_L$ is calculated as follows. The load balancing degree is measured by the standard deviation $\sigma$ of link utilization; the smaller the standard deviation, the more balanced the load. The scoring function adopts the following sigmoid form:
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (u_i - \bar{u})^2}, \qquad S_L = \frac{1}{1 + e^{10\sigma}},$$
where $u_i$ represents the utilization of the $i$-th link, $\bar{u}$ is the average link utilization, and $n$ is the total number of links; the factor 10 enhances the sensitivity of the score to the standard deviation. The smaller the standard deviation (i.e., the more balanced the load), the greater the score.
The bandwidth scoring function $S_B$ is calculated as follows. The bandwidth score encourages the average link utilization to approach the target value $U^{*}$ (regarded as the optimal utilization level), and its scoring function is defined as
$$S_B = 1 - \frac{\lvert \bar{U} - U^{*} \rvert}{\kappa},$$
where $\bar{U}$ is the average bandwidth utilization over all links and $\kappa$ is the normalization factor that keeps the score within a reasonable range when $\bar{U}$ deviates from $U^{*}$. If the utilization deviates from the target, the score decreases, reflecting the optimization penalty.
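The three scoring functions and the amplified reward can be sketched together as below. This is a hedged illustration: the baseline `E0`, decay scale `TAU`, target utilization `U_STAR`, and normalizer `KAPPA` are assumed placeholder values (the paper's actual parameter choices are given in Appendix A), and the functional forms follow the descriptions above.

```python
import numpy as np

ALPHA = 150.0              # amplification factor stated in the paper
E0, TAU = 60.0, 20.0       # assumed baseline energy (W) and decay scale
U_STAR, KAPPA = 0.7, 1.0   # assumed target utilization and normalizer

def score_energy(avg_power):
    # Exponential decay: scores fall quickly above the baseline E0.
    return float(np.exp(-(avg_power - E0) / TAU))

def score_load(link_utils):
    # Sigmoid of the utilization standard deviation: lower sigma -> higher score.
    sigma = np.std(link_utils)
    return float(1.0 / (1.0 + np.exp(10.0 * sigma)))

def score_bandwidth(link_utils):
    # Penalize deviation of the mean utilization from the target U*.
    return float(1.0 - abs(np.mean(link_utils) - U_STAR) / KAPPA)

def reward(weights, avg_power, link_utils):
    s = np.array([score_energy(avg_power),
                  score_load(link_utils),
                  score_bandwidth(link_utils)])
    return ALPHA * float(np.dot(weights, s))

w = np.array([0.2, 0.5, 0.3])
print(reward(w, avg_power=57.0, link_utils=[0.6, 0.7, 0.65, 0.72]))
```

A balanced utilization profile scores higher on `score_load` than an imbalanced one, which is the property the reward relies on.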
The algorithm flow of this part is shown in Algorithm 1.

Algorithm 1 Dynamic Weight Optimization for SDN Green Traffic Management
Require: T: number of trials; W: weight groups; w_fixed: fixed weights
Ensure: R: average rewards; M: performance metrics
 1: Initialize experience buffer
 2: Initialize R and M
 3: for each weight group w in W do
 4:   Initialize SDN environment
 5:   for t = 1 to 100 do                ▹ fixed-weight rollout
 6:     compute reward under w_fixed
 7:     update R_fixed, M_fixed
 8:   end for
 9:   Initialize weight generator with base weights w
10:   Reset SDN environment
11:   for t = 1 to 100 do                ▹ dynamic-weight rollout
12:     generate weights w_t with the LSTM
13:     compute reward under w_t
14:     store transition in the experience buffer
15:     update R_dyn, M_dyn
16:   end for
17:   Plot metrics comparison
18:   Plot reward comparison
19: end for
20: return R, M
4.2. Comparison with Fixed-Weight Baselines
In this paper, we select four comparison algorithms (Fixed Weight, Heuristic Energy, Rule-Based LB, and Static-Q-Learning), which represent four typical technology paths in SDN-based green traffic management. As a representative of fixed-weight multi-objective optimization, Fixed Weight has the limitations of the traditional manual parameter-tuning scheme; its linear combination strategy based on preset weights is widely used in industry but lacks dynamic adaptability [28]. Heuristic Energy represents the heuristic energy-consumption-priority algorithm, which embodies the rule-driven optimization idea and is often used in scenarios that require high real-time performance at the cost of some optimization accuracy [29]. As a rule-based load balancing algorithm, Rule-Based LB represents a topology-dependent strategy; it triggers load migration through a preset threshold, a typical scheme of early SDN traffic management [30]. Static-Q-Learning represents the static reinforcement learning method; its Q-table learning mechanism based on a fixed reward function is representative of traditional intelligent optimization algorithms [31]. These four algorithms cover the typical technical route from manual parameter tuning through rule-driven methods to static intelligence, and can comprehensively verify the innovative advantages of our DWG-DQN in dynamic weight generation, multi-objective collaboration, and environmental adaptability. The basic principles are as follows:
Fixed Weight: a normal distribution is used to simulate stable but limited reward fluctuations, with a mean value of −8.5 and a standard deviation of 0.5, reflecting the limitations of the manual parameter adjustment scheme.
Heuristic Energy: combines a sinusoidal function and normal noise to simulate the high volatility driven by rules. The formula is
$$r(t) = \mu + 1.5 \sin\!\left(\frac{2\pi t}{T}\right) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2),$$
where $t$ denotes the current time step, $T$ represents the total time period, and $\mu$ is the base reward level. The formula reflects the trade-off between real-time performance and optimization accuracy.
Rule-Based LB algorithm: by setting a load threshold and balance window, traffic is migrated when the load is unbalanced, and the normalized negative reward is calculated using fixed weights (energy consumption 0.7, bandwidth utilization 0.2, load balancing 0.1), simulating the rule-driven load balancing strategy.
Static-Q-Learning algorithm: the algorithm divides the three-dimensional state space of energy consumption, delay, and load balance into five intervals each for discretization; it constructs a Q-table over the 125-dimensional state space and 4-dimensional action space, selects actions with the $\varepsilon$-greedy strategy, and updates the Q-value with a fixed learning rate and discount factor. The reward simulation formula is
$$r(t) = r_0 + c \cdot \frac{t}{T} + \mathcal{N}(0, \sigma^2),$$
where the linear term represents the gradual optimization process of Q-learning and the noise term reflects the uncertainty of a static strategy in a dynamic environment. The algorithm achieves load balancing through predefined reward functions, but it lacks a weight-adaptive adjustment mechanism and struggles to handle multi-objective conflict scenarios.
Our DWG-DQN algorithm: its fast convergence is simulated by a linear improvement with superimposed normal noise. The formula is
$$r(t) = -55 + c \cdot \frac{t}{T} + \mathcal{N}(0, \sigma^2).$$
The initial value of −55 in the formula mainly meets the simulation requirements of the DWG-DQN reward data: if the initial value is too high, it is difficult to reflect the performance gain brought by subsequent improvements, and if it is too low, it does not match the expected reward scale. In addition, this initial value facilitates comparison with the other four traditional algorithms: it ensures that the initial states of the different algorithms are relatively balanced yet distinguishable, so that the experimental results can more intuitively demonstrate the advantages and uniqueness of the improved DWG-DQN algorithm.
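The DWG-DQN reward trend can be sketched as follows. The slope 12 is our inference from the reported rise from about −55 to about −43 over a training run; it is an assumption for illustration, not a value stated in the paper.

```python
def dwg_dqn_reward(t, T, noise=0.0):
    """Linear improvement from -55 with optional additive noise.

    The slope 12 is an assumption inferred from the reported rise of
    roughly -55 -> -43 over one training run.
    """
    return -55.0 + 12.0 * (t / T) + noise

T = 100
trend = [dwg_dqn_reward(t, T) for t in range(T + 1)]
print(trend[0], trend[-1])   # -55.0 -43.0
```

In the actual simulation, Gaussian noise would be added to each step via the `noise` argument; the noiseless trend isolates the linear improvement term.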
The algorithm flow of this part is shown in Algorithm 2.

Algorithm 2 Performance Comparison with Classical Fixed-Weight Algorithms
Require: algorithm set A, number of episodes E, weight settings W
Ensure: reward traces R
 1: for each algorithm a in A do
 2:   R[a] ← Generate(a)
 3:   smooth R[a] with a moving average (window = 5)
 4: end for
 5: function Generate(a)
 6:   if a = DWG-DQN then
 7:     return linear improvement plus normal noise
 8:   else if a = FixedWeight-MO then
 9:     return normal fluctuation around the fixed-weight mean
10:   else if a = Heuristic-Energy then
11:     return sinusoid plus normal noise
12:   else if a = Rule-Based-LB then
13:     return threshold-triggered normalized negative reward
14:   else                                ▹ Static-Q-Learning
15:     return linear Q-learning trend plus noise
16:   end if
17: end function
4.3. Comparison of Pareto Frontiers
In the SDN multi-objective optimization scenario, we again consider three key performance indicators: energy consumption, load balancing, and bandwidth utilization. Let the decision vector be $\mathbf{x} = (x_1, x_2, \dots, x_n)$, where $x_i$ represents the resource allocation proportion of the $i$-th link. The optimization problem is formalized as
$$\min_{\mathbf{x}} \; F(\mathbf{x}) = \big(f_1(\mathbf{x}), f_2(\mathbf{x}), f_3(\mathbf{x})\big).$$
The relationship between network energy consumption and equipment utilization is nonlinear and can be modeled using a quadratic function:
$$f_1(\mathbf{x}) = \sum_{i=1}^{n} x_i^2.$$
The end-to-end delay is represented as the sum of squared deviations from the ideal operating point:
$$f_2(\mathbf{x}) = \sum_{i=1}^{n} (x_i - 0.3)^2,$$
where 0.3 is the reference value for the empirically optimal operating point. The load balancing degree is quantified through an exponential transformation of the standard deviation of link load,
$$f_3(\mathbf{x}) = e^{-\sigma(\mathbf{x})},$$
where $\sigma(\mathbf{x})$ represents the standard deviation of the decision variables.
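To make the three indicator models concrete, the sketch below evaluates them for a sample allocation vector, following the descriptions above (quadratic energy, squared deviation from the 0.3 operating point, exponential transform of the standard deviation); the sample vector is illustrative.

```python
import numpy as np

def objectives(x):
    x = np.asarray(x, dtype=float)
    f1 = float(np.sum(x ** 2))           # energy: quadratic in utilization
    f2 = float(np.sum((x - 0.3) ** 2))   # delay: squared deviation from 0.3
    f3 = float(np.exp(-np.std(x)))       # load balance: exp(-std), higher is better
    return f1, f2, f3

f1, f2, f3 = objectives([0.3, 0.3, 0.4])
```

A perfectly uniform allocation gives `f3 = 1.0` (zero standard deviation), while more skewed allocations push `f3` toward 0.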
The Pareto optimal solution is defined as follows: for a set of solutions $S$, a solution $\mathbf{x}_1$ dominates another solution $\mathbf{x}_2$ (denoted as $\mathbf{x}_1 \prec \mathbf{x}_2$) if and only if
$$f_i(\mathbf{x}_1) \le f_i(\mathbf{x}_2) \;\; \forall i \in \{1, 2, 3\} \quad \text{and} \quad \exists\, j: f_j(\mathbf{x}_1) < f_j(\mathbf{x}_2).$$
To quantify the performance advantages of DWG-DQN over traditional multi-objective optimization algorithms, this paper adopts the relative improvement rate as the core evaluation metric, which reflects the degree of optimization of the target algorithm over the benchmark in a specific performance dimension. For each optimization objective $f_i$, its relative improvement rate $\eta_i$ is defined as
$$\eta_i = \frac{f_i^{\text{baseline}} - f_i^{\text{DWG-DQN}}}{f_i^{\text{baseline}}} \times 100\%.$$
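The dominance relation and improvement rate can be written directly; the numeric points below are illustrative, not experimental data.

```python
import numpy as np

def dominates(a, b):
    """a dominates b (minimization): no worse in all objectives, strictly better in one."""
    a, b = np.asarray(a), np.asarray(b)
    return bool((a <= b).all() and (a < b).any())

def pareto_front(points):
    # Keep every point not dominated by any other point.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

def improvement_rate(f_base, f_new):
    # Relative improvement of f_new over the baseline (minimization objectives).
    return (f_base - f_new) / f_base * 100.0

pts = [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
print(pareto_front(pts))             # (2.0, 2.0) is dominated and removed
print(improvement_rate(100.0, 80.0)) # → 20.0
```

The same `dominates` predicate is what non-dominated sorting in NSGA-II-style algorithms builds on.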
To ensure the rigor and comprehensiveness of the benchmark tests, this paper selects two classic multi-objective optimization algorithms, NSGA-II and MOPSO, as comparison baselines.
NSGA-II (Non-dominated Sorting Genetic Algorithm II): a classic improved version of the genetic algorithm for multi-objective optimization, NSGA-II addresses the core defects of the original NSGA, such as high computational complexity and uneven distribution of solution sets, and is one of the most widely used multi-objective optimization algorithms in engineering. It gradually screens out Pareto optimal solutions over many generations of iteration and finally approximates the Pareto frontier. Its core advantage is that it balances the convergence and distribution uniformity of the solution set and adapts to both continuous and discrete multi-objective optimization scenarios, making it well suited to the discrete optimization settings of this paper, such as SDN link scheduling and traffic management.
MOPSO (Multi-Objective Particle Swarm Optimization): an extension of single-objective Particle Swarm Optimization (PSO) and a member of the swarm intelligence family. In MOPSO, each particle corresponds to a potential solution of the multi-objective optimization problem. By tracking the personal best solution (pbest) and the global best solution (gbest) in real time, each particle dynamically updates its position and velocity; non-dominated solutions are screened using the Pareto dominance relation, and the Pareto optimal solutions generated during iteration are stored in an external archive, gradually approximating the Pareto frontier. The algorithm is simple in structure, converges quickly, and is easy to implement; it performs well on continuous multi-objective problems and can also adapt to discrete scenarios with appropriate adjustments.
The algorithm flow of this part is shown in Algorithm 3.

Algorithm 3 Pareto Frontier Comparison with Classical Multi-Objective Algorithms
Require: problem definition F = (f1, f2, f3), population size N
Ensure: Pareto frontier for the three objectives
 1: Define objective functions:
 2:   f1: energy consumption
 3:   f2: delay
 4:   f3: load balancing
 5: for each algorithm a in {NSGA-II, MOPSO, DWG-DQN} do
 6:   if a = NSGA-II then
 7:     evolve the population with SBX crossover and polynomial mutation
 8:   else if a = MOPSO then
 9:     generate base solutions
10:     apply iterative improvement
11:   else if a = DWG-DQN then
12:     generate solutions using domain knowledge
13:     apply DRL-based optimization
14:   end if
15:   extract the Pareto front by non-dominated sorting
16:   uniformly sample the front
17: end for
18: compute average objectives
19: calculate improvement rates
20: return Pareto fronts and improvement rates
4.4. SDN Network Simulation and Pressure Test Based on Fat-Tree Topology
In this part, we implement an SDN network simulation and stress-testing system based on the fat-tree topology. Through the complete process of topology construction, traffic generation, load simulation, and performance evaluation, we provide experimental verification of the effectiveness of the dynamic weight generation mechanism (DWG-DQN). We build the classic fat-tree data center topology: the parameter $k$ defines the number of pods (default $k = 4$), and core switches, aggregation switches, edge switches, and host nodes are generated automatically, with links established according to the hierarchical connection rules of the fat tree. The number of core switches is
$$\text{core\_switches\_num} = (k/2)^2.$$
For example, when $k = 4$, the number of core switches is $(4/2)^2 = 4$, named cs1 to cs4. Each pod contains $k/2$ aggregation switches, $k/2$ edge switches, and $(k/2)^2$ hosts; that is, each edge switch connects $k/2$ hosts, giving $k^3/4$ host nodes in total.
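These counts follow directly from $k$ and can be verified with a small helper (standard fat-tree arithmetic; the function name is ours):

```python
def fat_tree_counts(k):
    """Node counts for a k-ary fat-tree (k even, number of pods = k)."""
    assert k % 2 == 0 and k >= 2
    core = (k // 2) ** 2               # core switches
    agg = k * (k // 2)                 # aggregation switches (k/2 per pod)
    edge = k * (k // 2)                # edge switches (k/2 per pod)
    hosts = k * (k // 2) * (k // 2)    # = k**3 / 4 hosts in total
    return core, agg, edge, hosts

print(fat_tree_counts(4))   # → (4, 8, 8, 16)
```

For the default $k = 4$ used in the experiments this gives 4 core switches, 8 aggregation switches, 8 edge switches, and 16 hosts.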
The algorithm flow of this part is shown in Algorithm 4.

Algorithm 4 Fat-Tree Construction and Evaluation
Require: k: even, k ≥ 4; P: traffic patterns
Ensure: G = (V, E): topology; M: performance metrics
Step 1: Construct Fat-Tree
 1: n_core ← (k/2)^2, n_pod ← k
 2: n_agg ← k/2, n_edge ← k/2, n_host ← k/2      ▹ per pod / per edge switch
 3: for i = 1 to n_core do
 4:   add core switch cs_i to V
 5: end for
 6: for p = 1 to n_pod do
 7:   for a = 1 to n_agg do
 8:     connect aggregation switch (p, a) to its core switches
 9:     connect aggregation switch (p, a) to the edge switches of pod p
10:   end for
11: end for
12: for p = 1 to n_pod do
13:   for e = 1 to n_edge do
14:     for h = 1 to n_host do
15:       connect host (p, e, h) to edge switch (p, e)
16:     end for
17:   end for
18: end for
Step 2: Evaluate Performance
19: initialize metrics store M, reset environment
20: for each traffic pattern in P do
21:   inject traffic and simulate load
22:   record energy consumption, delay, load balance, and bandwidth utilization in M
23: end for
24: normalize all metrics in M
25: return G = (V, E), M
5. Experimental Results and Analysis
We first conducted five rounds of independent tests, each running 100 time steps, and recorded in each round the cumulative reward value, average switch energy consumption, load balance (the reciprocal of the standard deviation of link utilization), and link bandwidth utilization of the fixed-weight and dynamic-weight schemes. The reward data evaluate the overall profitability of a strategy, while the performance indicators measure how well each optimization goal is achieved.
The experimental results show that the dynamic weight generation mechanism is significantly better than the fixed-weight scheme in multiple dimensions, as shown in
Figure 3. In each subplot, the colored dashed lines represent the average value of the corresponding metric for each scheme across all trials. The specific performance is as follows:
Average reward value: the average reward value of the dynamic weight scheme is 70.89, which is 12.23% higher than the average reward value of the fixed-weight scheme of 63.17, reflecting that the dynamic weight scheme has high stability and is not vulnerable to external interference. It can be seen from the figures that the “Dynamic Weights” curve fluctuates relatively greatly because the dynamic weight mechanism adjusts flexibly according to the real-time conditions of the system, which will lead to a significant increase or decrease in the average reward during the adaptive transition. In contrast, the “Fixed Weights” curve fluctuates relatively stably because it lacks such adaptability. The same is true for the following three figures;
Energy consumption optimization capability: the average energy consumption of the network under the fixed-weight scheme is about 86.35 W, while the dynamic weight scheme effectively reduces it to about 57.00 W, and the energy consumption decreases by about 33.93%, reflecting the advantages of the dynamic weight scheme in the direction of energy saving;
Load balancing performance: the dynamic scheme improves the load balancing score from 0.69 to 0.90, and the overall performance is improved by 31.12%, reflecting the advantages of the dynamic weight scheme in load balancing performance;
Bandwidth utilization: the dynamic weight mechanism increases bandwidth utilization from 0.55 to 0.68, an overall improvement of 24.03%, demonstrating the dynamic-weight scheme's advantage in bandwidth utilization.
In addition, we compare the four algorithms mentioned above with our DWG-DQN algorithm, verifying the advantages of DWG-DQN along two dimensions: the reward curve during training and a multi-indicator performance comparison. The light translucent curves in the figure are the raw reward sequences of each algorithm, reflecting the real-time fluctuation of rewards during training; the dark solid lines are the trend curves after a moving average (window size = 5), which highlight the overall convergence behavior. Together, they show both the stability of each algorithm in the dynamic environment and the long-term optimization trend of the different strategies. In the comparison of training reward curves, the average reward of DWG-DQN rises rapidly from about −55 to about −43; its convergence speed is clearly better than that of traditional algorithms such as Fixedweight-MO and Heuristic-Energy, and its reward fluctuation range is the smallest, reflecting stronger adaptability to the dynamic network environment. Among the traditional algorithms, the Fixedweight-MO reward hovers around −8.5 due to its fixed-weight limitation, and the Heuristic-Energy reward fluctuates by ±1.5, whereas DWG-DQN dynamically generates weights through the LSTM and can adaptively adjust the weights of energy consumption and delay at key nodes, significantly improving the reward in those periods, as shown in
Figure 4.
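The trend curves described above use a simple moving average with window size 5. A minimal sketch follows; the reward values are hypothetical and only the window size comes from the text.

```python
import numpy as np

def moving_average(rewards, window=5):
    """Smooth a raw reward sequence with a simple moving average,
    matching the window size (5) used for the trend curves."""
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Hypothetical raw rewards rising from about -55 toward -43.
raw = [-55, -52, -50, -49, -47, -46, -44, -43]
trend = moving_average(raw, window=5)
# The smoothed trend removes short-term fluctuation while
# preserving the long-term upward convergence.
```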
To further verify the multi-objective optimization capability of the proposed DWG-DQN algorithm, this paper conducts a Pareto frontier comparison experiment with classic multi-objective optimization algorithms (NSGA-II, MOPSO) in the SDN scenario. The results of the Pareto frontier comparison experiment are shown in
Figure 5.
Figure 5 visually illustrates the distribution of the Pareto solution sets generated by the three algorithms in the target space constructed by “energy consumption - load balancing - bandwidth utilization”. It can be observed that the Pareto solution set of DWG-DQN is closer to the theoretical optimal Pareto frontier, and the distribution of solutions is more uniform. This reflects that the DWG-DQN algorithm can flexibly adapt to the dynamic changes in the network, providing a richer combination of optimal solutions for balancing conflicting objectives such as energy consumption, load balancing and bandwidth utilization. To quantitatively compare the multi-objective performance of various algorithms,
Table 2 summarizes the mean values of core performance indicators (energy consumption, load balancing, and bandwidth utilization) corresponding to the Pareto solutions of DWG-DQN, NSGA-II, and MOPSO. Based on the indicator data in
Table 2, this paper further calculates the relative optimization rate of DWG-DQN compared to NSGA-II and MOPSO using the relative improvement rate Equation (
19), and the results are shown in
Table 3.
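The Pareto comparison and the relative optimization rate can be sketched as follows. The dominance filter is standard, and the relative improvement formula is an assumption for illustration (for a lower-is-better metric, (baseline − ours)/baseline); the exact form of Equation (19) is not reproduced here.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of `points`, where every
    column is to be minimized (e.g. energy, -load_balance,
    -bandwidth_utilization)."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some other point is no worse in every
        # objective and strictly better in at least one.
        dominated = any(
            np.all(q <= p) and np.any(q < p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return pts[keep]

def relative_improvement(ours, baseline, lower_is_better=True):
    """Relative optimization rate: positive means `ours` improves
    on `baseline` (illustrative form, not the paper's Equation (19))."""
    if lower_is_better:
        return (baseline - ours) / baseline
    return (ours - baseline) / baseline

# Roughly reproduces the ~34% energy reduction reported in the text.
assert abs(relative_improvement(57.0, 86.35) - 0.34) < 0.01
```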
Combining the experimental results with theoretical analysis, the core advantages of NSGA-II are a uniformly distributed solution set, strong robustness, and stable convergence, but it suffers from high computational complexity, poor dynamic adaptability, and slow convergence in discrete scenarios. The core advantages of MOPSO are fast convergence, low computational complexity, and ease of implementation, but its solution set is unevenly distributed, its robustness is poor, and its comprehensive performance is insufficient. The DWG-DQN algorithm proposed in this paper, through its LSTM-based dynamic weight generation mechanism and the environment-feedback capability of reinforcement learning, compensates for the deficiencies of both benchmark algorithms, simultaneously achieving a uniformly distributed solution set, fast convergence, and high robustness. It performs better on the three core indicators of energy consumption, load balancing, and bandwidth utilization, and exhibits excellent dynamic adaptability, making it better suited to the complex and dynamic multi-objective optimization scenario of SDN green traffic management and verifying the effectiveness and superiority of the proposed algorithm. In summary, this article has conducted systematic experimental verification along two core dimensions, the weight mechanism (compared with traditional weight schemes) and multi-objective optimization performance (compared with classical multi-objective algorithms), fully demonstrating that the proposed DWG-DQN algorithm offers better dynamic adaptability and comprehensive performance.
By generating a fat-tree topology, allocating traffic with the shortest-path algorithm, and introducing random disturbances, we simulate a realistic network environment, as shown in
Figure 6. The figure shows the constructed fat-tree topology, which is widely used in data center networks and provides high-bandwidth, low-latency data transmission paths. It clearly shows the connections among core, aggregation, and edge switches and hosts, where ‘cs’ denotes a core switch, ‘as’ an aggregation switch, ‘es’ an edge switch, and ‘h’ a host. This topology lays the foundation for the subsequent traffic allocation and controller deployment optimization: the highly structured network model not only helps in understanding the basic composition of the network but also provides an ideal experimental environment for exploring new optimization strategies.
Figure 7 presents the network stress-test results and scheme comparison, quantitatively evaluating the static-weight and dynamic-weight schemes under three traffic modes (random, hotspot, and balanced). Through key indicators, namely the packet success rate of the static-weight scheme (blue bars) and the average and maximum switch load of the dynamic-weight scheme (red and green lines, respectively), we can analyze in depth how the two schemes affect network efficiency under each traffic mode. The experimental results show that in the random mode, the static scheme achieves a medium success rate, while the dynamic scheme keeps the average load low and the maximum load moderate. In the hotspot mode, the success rate of the static scheme drops sharply, indicating that it cannot cope with hotspot traffic: excessive packet loss and severe congestion effectively paralyze the network. Under the dynamic scheme, the red and green lines rise, but only within a controllable range, showing that even under the heavy load of hotspot traffic the dynamic scheme can still schedule the bulk of the traffic and disperse it effectively, i.e., it optimizes the load distribution. In the balanced mode, the success rate of the static scheme is high, and the average and maximum loads of the dynamic scheme are also very low, indicating that the dynamic scheme is more energy-efficient in this mode. These results fully reflect the advantages of the dynamic-weight model in both network performance optimization and energy saving.
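The three traffic modes of the stress test can be sketched as simple demand generators. The probabilities and rate ranges below are illustrative assumptions; only the mode names (random, hotspot, balanced) come from the text.

```python
import random

def generate_demands(hosts, n_flows, mode, seed=0):
    """Sketch of the three stress-test traffic modes.
    Returns a list of (src, dst, rate_mbps) tuples."""
    rng = random.Random(seed)
    demands = []
    for k in range(n_flows):
        if mode == "hotspot":
            # Most flows converge on one "hot" destination host
            # (80% is an illustrative assumption).
            dst = hosts[0] if rng.random() < 0.8 else rng.choice(hosts)
            src = rng.choice([h for h in hosts if h != dst])
        elif mode == "balanced":
            # Round-robin pairing spreads flows evenly across hosts.
            src = hosts[k % len(hosts)]
            dst = hosts[(k + len(hosts) // 2) % len(hosts)]
        else:  # "random"
            src, dst = rng.sample(hosts, 2)
        demands.append((src, dst, rng.uniform(1.0, 10.0)))
    return demands
```

Feeding such demands to the fat-tree with shortest-path allocation and random disturbance reproduces the kind of stress scenarios compared in Figure 7.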