Adaptive Congestion Detection and Traffic Control in Software-Defined Networks via Data-Driven Multi-Agent Reinforcement Learning
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript presents a framework that uses Multi-Agent Reinforcement Learning (MARL) for adaptive congestion detection and traffic control in Software-Defined Networks (SDNs). The dual-agent system, integrating a Q-learning-based Congestion Classification Agent and a Deep Q-Network-based Decision-Making Agent, is implemented and validated using a simulated Mininet-Ryu environment. However, the manuscript may be further improved. I have the following suggestions.
The authors are encouraged to use practical network traces such as MAWI or CAIDA to validate generalization, benchmarking against more sophisticated learning- or heuristic-based baselines.
Please include an ablation study to assess the individual contribution of each agent.
Please enhance the reward function design by incorporating normalized, multi-objective reward functions that consider additional QoS metrics such as jitter, link utilization, and flow completion time, beyond just delay and packet loss.
The authors may use cooperative learning methods such as centralized training with decentralized execution (CTDE), parameter sharing, or attention-based communication to improve inter-agent coordination.
Author Response
Comment: The authors are encouraged to use practical network traces such as MAWI or CAIDA to validate generalization.
Response: We appreciate this suggestion and have integrated MAWI traces into our simulation to evaluate the robustness of our framework under real-world traffic conditions. This addition validates our model beyond synthetic environments. Please note that we did not remove the simulation with synthetic traffic; rather, we improved it to include the enhanced reward.
We added the subsection 'Experimental Validation with MAWI Dataset' to Section 5 (Simulation and Results Analysis).
Comment: Benchmarking against more sophisticated learning- or heuristic-based baselines.
Response: We compared our MARL framework with:
- a Double DQN agent (a more sophisticated learning baseline; see the sketch below),
- a non-learning heuristic-based agent using randomized action selection,
- an ablation variant using only the DQN decision module without congestion classification (the ablation study is included in this section).
These comparisons clearly demonstrate the benefit of the multi-agent design in both stability and performance.
We added the subsection 'Comparative Evaluation of Our Data-Driven MARL vs. Ablation, Double DQN, and Random Baselines' to Section 5 (Simulation and Results Analysis); this new subsection replaces the previous one, 'Comparative Evaluation of our Data-driven MARL and Non-Learning approaches'.
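For concreteness, the key difference between the Double DQN baseline and a standard DQN can be sketched as follows: the online network selects the greedy next action, while the target network evaluates it, which reduces overestimation bias. This is a minimal illustrative PyTorch fragment, not an excerpt of our implementation; tensor shapes and the discount factor are assumptions.

```python
import torch

def double_dqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: online net selects the action, target net evaluates it."""
    with torch.no_grad():
        # Greedy next action according to the online network...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but its value is taken from the target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```

A standard DQN would instead take `target_net(next_states).max(dim=1)` directly, letting the same network both select and evaluate the action.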
Comment: Please include an ablation study to assess the individual contribution of each agent.
Response: An ablation study was conducted to isolate the contributions of the classification and decision-making agents. When the classifier was removed, the system lacked adaptability to varying congestion states, leading to less effective decisions under fluctuating network conditions. Conversely, using only the classifier agent is not beneficial, as the purpose of our proposed system is both to classify congestion and to actively resolve it. Therefore, we also evaluated a version of the system with only the decision-making agent and compared it with the aforementioned baselines.
Please see the subsection 'Comparative Evaluation of Our Data-Driven MARL vs. Ablation, Double DQN, and Random Baselines' in Section 5 (Simulation and Results Analysis).
Comment: Please enhance the reward function design by incorporating normalized, multi-objective reward functions that consider additional QoS metrics such as jitter, link utilization, and flow completion time, beyond just delay and packet loss.
Response: We revised the reward design to incorporate multiple QoS metrics (delay, packet loss, jitter, link utilization), with thresholds that reflect realistic operational targets. The revised function uses a tiered structure to better guide learning and penalize poor decisions.
Accordingly, we regenerated all the figures in the subsection 'Experimental Evaluation with Synthetic Traffic' of Section 5 and reviewed the corresponding result interpretations. We also updated the reward function in the subsection 'Implementation of the Framework' of Section 4 to highlight the enhanced reward design, which now incorporates the previously mentioned metrics.
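To illustrate the revised design, a minimal sketch of a normalized, multi-objective reward with a tiered penalty is given below. The thresholds and weights here are placeholders chosen for illustration, not the exact values used in the paper.

```python
def compute_reward(delay_ms, loss_pct, jitter_ms, utilization):
    """Normalized multi-objective reward; thresholds and weights are illustrative."""
    # Normalize each QoS metric to [0, 1], where 1 is best.
    d = max(0.0, 1.0 - delay_ms / 100.0)    # operational target: delay < 100 ms
    l = max(0.0, 1.0 - loss_pct / 5.0)      # operational target: loss < 5%
    j = max(0.0, 1.0 - jitter_ms / 20.0)    # operational target: jitter < 20 ms
    u = max(0.0, 1.0 - abs(utilization - 0.7) / 0.7)  # target: ~70% link utilization

    reward = 0.4 * d + 0.3 * l + 0.2 * j + 0.1 * u

    # Tiered penalty: strongly discourage clearly poor operating points.
    if delay_ms > 200.0 or loss_pct > 10.0:
        reward -= 1.0
    return reward
```

Weighting delay and loss most heavily reflects their primary role in the original reward, with jitter and utilization as the newly added terms.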
Comment: The authors may use cooperative learning methods such as centralized training with decentralized execution (CTDE), parameter sharing, or attention-based communication to improve inter-agent coordination.
Response: We agree this is an interesting direction. Although not implemented in this version, we highlight centralized training with decentralized execution (CTDE) and parameter sharing as promising extensions in our future work section.
Please see Section 6.
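Although CTDE is deferred to future work, the parameter-sharing direction can be illustrated with a short, purely hypothetical sketch: all agents act from their own local observations but share one set of network weights, which centralized training would update jointly. None of this is part of the current implementation.

```python
import torch.nn as nn

class SharedPolicy(nn.Module):
    """One policy network shared by all agents; sizes are illustrative."""
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, local_obs):
        # Decentralized execution: each agent feeds only its own observation
        # through the same shared weights.
        return self.net(local_obs)
```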
Additional Modification : We improved the introduction to explicitly outline our main contributions, including the use of the MAWI dataset to validate our models. Additionally, we revised the conclusion (see Section 6) and the abstract to align them with the required modifications.
Thank you again for your valuable feedback. We hope we have addressed all your comments clearly and have not missed anything. Please let us know if any further clarification is needed.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
In this study, we present a data-driven framework that performs both congestion classification and adaptive decision-making in SDNs by utilizing multi-agent reinforcement learning (MARL). The proposed framework performs decision-making by dynamically selecting optimal bandwidth and routing strategies using deep Q-learning and a congestion classification agent that evaluates network conditions based on delay and packet loss. It continuously monitors traffic conditions, adjusts policies through reward feedback, and improves behavior over time. Experiments demonstrate that the MARL-based system significantly improves throughput, reduces end-to-end delay, and provides an effective approach for intelligent congestion control in programmable network infrastructures. Through this, we expect that the proposed framework will enable efficient congestion management in software-defined networks (SDNs). Overall, this is a well-written paper, but please consider the following points and revise them accordingly.
- In order to clearly understand the contribution of this study, the authors need to describe the academic contributions of the proposed framework in the Introduction, in line with recent paper trends; it would be good to place this between Line 68 and Line 69.
- Figure 1 seems very important, but I would like to improve visibility by making the letters a bit larger or increasing the image resolution.
- In Figures 4, 5 and 6, the text is relatively small and difficult to understand, so please provide images with improved visibility or higher resolution.
- In Section 5, it is understandable to use simulations to measure packet loss, delay, throughput, reinforcement learning rewards, etc., when evaluating the effectiveness of the proposed framework; however, to verify its validity, it will be necessary to apply it to an actual system in the future to prove its real-world effectiveness. The authors need to describe the limitations of this study in Section 6.
Author Response
Comment: In order to clearly understand the contribution of this study, the authors need to describe the academic contributions of the proposed framework in the Introduction, in line with recent paper trends; it would be good to place this between Line 68 and Line 69.
Response: Lines 55–78.
We have revised the introduction to explicitly outline our main contributions and the motivation behind them, including:
- A novel MARL architecture specifically designed for SDN congestion management, motivated by the need for scalable and adaptive control in dynamic network environments.
- The integration of a Q-learning-based classifier with a DQN-based decision-making agent for congestion detection and adaptive routing (see the sketch after this list).
- A realistic validation setup using MAWI traffic data to demonstrate the practical effectiveness of our approach.
- A comparative study with advanced baselines and ablation variants.
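As an illustration of the second contribution, the classifier component can be sketched as tabular Q-learning over discretized delay/loss observations. The bin widths, class labels, and hyperparameters below are illustrative assumptions, not our exact implementation.

```python
import numpy as np

N_DELAY_BINS, N_LOSS_BINS = 5, 5
N_CLASSES = 3  # e.g., low / moderate / high congestion (illustrative labels)
Q = np.zeros((N_DELAY_BINS * N_LOSS_BINS, N_CLASSES))

def state_id(delay_ms, loss_pct):
    # Discretize raw delay/loss measurements into a single tabular state index.
    d = min(int(delay_ms // 50), N_DELAY_BINS - 1)  # 50 ms buckets
    l = min(int(loss_pct // 2), N_LOSS_BINS - 1)    # 2% buckets
    return d * N_LOSS_BINS + l

def update(s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Standard one-step Q-learning update over classification actions.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```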
Comment: Figure 1 seems very important, but I would like to improve visibility by making the letters a bit larger or increasing the image resolution.
Response: Figure 1 has been updated with:
- larger font sizes,
- improved resolution.
Comment: In Figures 4, 5 and 6, the text is relatively small and difficult to understand, so please provide images with improved visibility or higher resolution.
Response: We regenerated Figures 4–6. Please note that the order of the figures has been changed to maintain a consistent structure, and modifications were made to the code to incorporate suggestions from Reviewer 1.
- Figure 4 (old) --> Figure 5 (new)
- Figure 5 (old) --> Figure 6 (new)
- Figure 6 (old) --> Figure 2 (new)
Comment: In Section 5, it is understandable to use simulations to measure packet loss, delay, throughput, reinforcement learning rewards, etc., when evaluating the effectiveness of the proposed framework; however, to verify its validity, it will be necessary to apply it to an actual system in the future to prove its real-world effectiveness. The authors need to describe the limitations of this study in Section 6.
Response: We acknowledge the limitation of relying only on simulation. To address this, we have integrated real-world traffic traces from the MAWI dataset into our evaluation to ensure realistic traffic behavior, along with additional QoS metrics (delay, jitter, loss, and link utilization). Additionally, we state in Section 6 that future work will focus on deploying the framework in a real SDN testbed (e.g., ONOS or P4 environments) to validate its operational effectiveness. We also plan to explore cooperative learning techniques, such as centralized training with decentralized execution (CTDE), parameter sharing, and attention-based coordination, to improve agent collaboration. Furthermore, we aim to adopt Double DQN to reduce overestimation bias and enhance decision stability under dynamic traffic conditions.
Thank you again for your valuable feedback. We hope we have addressed all your comments clearly and have not missed anything. Please let us know if any further clarification is needed.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper, overall, is quite an interesting discussion, but with some issues I'd like the authors to address.
First and foremost, I think there is far too little discussion on the authors' actual implementation of their system. The discussion of the framework, implementation, and testing amounts to basically half a page in Section 4.2. That is far, FAR too little detail to judge the work presented here adequately.
By contrast, the initial background sections are far too extensive, measuring about 8 pages of the entire paper, or close to half! That's too much. I would strongly advise reducing that discussion and then expanding the details on your implementation, code, model design, training, etc., to no less than three pages.
Additionally, the results you present do not support your claims. This is particularly obvious for the latency and throughput graphs, Figures 7 and 8. In particular, the delay average seems identical between the DD-MARL and the conventional approach. I see absolutely no difference in the averages. The same is true for the throughput, which seems to be averaging out to exactly the same. So if the goal was to show that it is more responsive, between data generated and data successfully received, then these graphs do nothing to make that point. Packet Loss in Figure 9 shares that same problem. It simply does not support the claims.
Consequently, this reviewer is too uncertain regarding the actual accomplishments of this work. Taken together, the lack of implementation details and the lack of results demonstrating the benefits of the proposed approach cast significant doubt over the actual work. This needs to be remedied before I can consider accepting this paper.
Author Response
Comment: First and foremost, I think there is far too little discussion on the authors' actual implementation of their system. The discussion of the framework, implementation, and testing amounts to basically half a page in Section 4.2. That is far, FAR too little detail to judge the work presented here adequately.
Response: Thank you for this valuable feedback. While Section 4.2 presents the system components, we would like to highlight the whole of Section 4, which provides more detailed information about the simulation and implementation, especially in Sections 4.1 and 4.2.
- In Section 4.1, we describe the multi-agent framework, including the roles of the Congestion Classification Agent and the Decision-Making Agent, and how they interact with the SDN controller and the network environment. This section also presents the overall system architecture and explains how the agents cooperate to detect congestion and decide optimal actions.
- In Section 4.2, we added details about the synthetic traffic generation using tools like iPerf and D-ITG. We also incorporated the MAWI dataset into the Mininet simulation using tcpreplay and netcat (nc), command-line tools widely used for network testing, traffic replay, and debugging.
- We mention that a Python script is used to collect key traffic metrics (e.g., delay, packet loss, throughput) in real time. This script interacts with the open REST API of the Ryu controller, allowing us to extend functionality and extract detailed performance data during execution (a minimal sketch of this collection step follows this list).
- We also highlight the architecture of the neural network used for the Decision-Making Agent and provide more detail on the reward function, which now includes multiple metrics (delay, throughput, packet loss, jitter, and link utilization). We also explain the penalty mechanism, which helps discourage poor decisions and improves the learning process.
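The metric-collection step mentioned above can be sketched as follows, assuming Ryu's ofctl_rest application is enabled on its default port 8080; the datapath ID and field handling are illustrative, not a verbatim excerpt of our script.

```python
import requests

RYU_REST = "http://127.0.0.1:8080"  # default ofctl_rest endpoint (assumed setup)

def port_stats(dpid=1):
    # Query the Ryu controller's REST API for per-port counters of one switch.
    resp = requests.get(f"{RYU_REST}/stats/port/{dpid}", timeout=2)
    resp.raise_for_status()
    return resp.json()[str(dpid)]  # list of per-port statistics dicts

# Example: derive simple loss/throughput indicators from the raw counters.
for port in port_stats(dpid=1):
    print(port["port_no"], port["rx_packets"], port["tx_packets"], port["rx_dropped"])
```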
Comment: By contrast, the initial background sections are far too extensive, measuring about 8 pages of the entire paper, or close to half! That's too much. I would strongly advise reducing that discussion and then expanding the details on your implementation, code, model design, training, etc., to no less than three pages.
Response: We appreciate the reviewer's recommendation and fully agree that a more balanced focus on implementation and experimentation strengthens the contribution. In response, we expanded the implementation and evaluation sections (Section 5.2 and Section 5.3) by including:
- the MAWI traffic dataset in the experimental setup, which allows us to evaluate our framework under more realistic and heterogeneous network conditions beyond synthetic traffic.
We also added more models for comparison to better show how our approach performs. These include:
- a Double DQN agent, to compare with a stronger single-agent learning method;
- an ablation version, using only one of our two agents, to see how each part works on its own;
- a non-learning baseline, which uses fixed decisions, to show the benefit of learning-based control.
These changes make our experiments stronger and more complete.
We decided to keep the background section as it is because it helps explain key ideas in SDN, reinforcement learning, and multi-agent systems. This is important for readers who may not know these topics well and helps them understand our framework and design choices. We believe this makes the paper clearer and more useful for a wider audience, while still showing the technical work clearly.
Comment: Additionally, the results you present do not support your claims. This is particularly obvious for the latency and throughput graphs, Figures 7 and 8. In particular, the delay average seems identical between the DD-MARL and the conventional approach. I see absolutely no difference in the averages. The same is true for the throughput, which seems to be averaging out to exactly the same. So if the goal was to show that it is more responsive, between data generated and data successfully received, then these graphs do nothing to make that point. Packet Loss in Figure 9 shares that same problem. It simply does not support the claims.
Response: We understand the concern about the graphs looking similar. To make our results clearer, we updated Section 5.6 to highlight:
- how our MARL agent reacts better and learns faster than the basic method;
- that it chooses smarter actions over time, as shown in the reward and action charts: Figure 10 (Cumulative Reward) and Figure 11 (Action Distribution);
- additional tests using the MAWI dataset to include more realistic traffic.
Although the averages may seem similar, our revised analysis emphasizes how MARL maintains more stable and adaptive control during dynamic shifts.
Comment: Consequently, this reviewer is too uncertain regarding the actual accomplishments of this work. Taken together, the lack of implementation details and the lack of results demonstrating the benefits of the proposed approach cast significant doubt over the actual work. This needs to be remedied before I can consider accepting this paper.
Response: We have made several improvements in both methodology and explanation, including:
- an enhanced reward design and expanded discussion of adaptive action selection;
- the integration of real-world MAWI traffic traces to improve generalization;
- a clearer articulation of how our multi-agent design enables specialized, cooperative learning that improves performance stability.
These changes make our work more realistic, flexible, and clear, and directly address the main concerns raised.
Thank you again for your valuable feedback. We hope we have addressed all your comments clearly and have not missed anything. Please let us know if any further clarification is needed.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The revision is satisfactory.
Reviewer 3 Report
Comments and Suggestions for Authors
I wish the authors would have used the opportunity to expand their framework description with more details and specifics. I would have liked to see algorithms, code snippets, etc. None of that, however, was added. The authors instead focused on adding more text that unfortunately does very little to address my concerns with this paper. Additionally, the graphs still lack sufficient insights to support the claims made in the paper. Overall, my initial review still stands. The lack of progress to address them is concerning.
Author Response
Comment: I wish the authors would have used the opportunity to expand their framework description with more details and specifics. I would have liked to see algorithms, code snippets, etc. None of that, however, was added. The authors instead focused on adding more text that unfortunately does very little to address my concerns with this paper. Additionally, the graphs still lack sufficient insights to support the claims made in the paper. Overall, my initial review still stands. The lack of progress to address them is concerning.
Response:
Thank you again for your continued feedback and for giving us a second chance to improve our paper. Your comments helped us look more closely at how we present our work and explain our methods.
We understand your concern about the lack of code snippets and algorithms in the earlier version. In this revision, we have expanded Section 4.2 to include more detailed explanations, such as:
- a clearer description of the architecture of the classifier and decision-making agents;
- a step-by-step explanation of how the agents learn and make decisions (a condensed sketch of one such step follows this list);
- a full overview of the training and evaluation process (offline and online);
- how we process the dataset and extract features.
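To make the step-by-step description concrete, here is a condensed sketch of one epsilon-greedy decision and one replay-based learning step for the decision-making agent. The state dimension, action count, layer sizes, and hyperparameters are illustrative assumptions rather than a verbatim excerpt of our code, and terminal-state handling is omitted for brevity.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class DecisionNet(nn.Module):
    """DQN for the decision-making agent; layer sizes are illustrative."""
    def __init__(self, state_dim=6, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

net = DecisionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)  # replay memory of (state, action, reward, next_state)

def act(state, eps=0.1):
    # Epsilon-greedy choice among bandwidth/routing actions.
    if random.random() < eps:
        return random.randrange(4)
    with torch.no_grad():
        return int(net(torch.as_tensor(state, dtype=torch.float32)).argmax())

def learn(batch_size=32, gamma=0.99):
    # One gradient step on a sampled minibatch from the replay memory.
    if len(buffer) < batch_size:
        return
    s, a, r, s2 = zip(*random.sample(buffer, batch_size))
    s, r, s2 = (torch.tensor(x, dtype=torch.float32) for x in (s, r, s2))
    a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
    q = net(s).gather(1, a).squeeze(1)
    with torch.no_grad():
        target = r + gamma * net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```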
We also improved the figures by:
- adding more descriptive captions and explanations in the text;
- improving resolution and formatting for easier reading;
- making sure the graphs better match the results we discuss.
To support reproducibility, we have also shared our code and sample data on GitHub.
Finally, we want to note that we carefully responded to all comments from the first review round. You can find all our responses in the attached document.
We hope these updates better address your concerns and show the practical value of our work.
Thank you again for your helpful feedback.
Author Response File: Author Response.pdf
Round 3
Reviewer 3 Report
Comments and Suggestions for Authors
While I still have reservations about this paper, I don't want to stand in the way of its publication. In evaluating the suggestions by the other reviewers, and the feedback and changes from the authors, I can see the progress of this paper.
Additionally, the authors have addressed several of my concerns. Hence, at this stage I have no additional suggestions for the authors.