Energy-Aware MPTCP Scheduling in Heterogeneous Wireless Networks Using Multi-Agent Deep Reinforcement Learning Techniques
Abstract
1. Introduction
- Mathematical Model Development: Formulate a comprehensive mathematical model that encapsulates key components of MPTCP in heterogeneous wireless networks. This includes factors such as energy consumption, network performance, and the interactions among multiple agents.
- Energy-Efficient Scheduling Algorithm Design: Propose a novel multi-agent deep reinforcement learning-based algorithm, termed the Energy-Efficient Multi-agent Deep Deterministic Policy Gradient (EE-MADDPG). This algorithm aims to optimize energy consumption while ensuring satisfactory network performance.
- Performance Evaluation of the Proposed Algorithm: Execute extensive simulations to assess the performance of the proposed EE-MADDPG algorithm in terms of energy efficiency, network throughput, latency, and scalability. Furthermore, compare these results with existing centralized, distributed, and single-agent reinforcement learning-based methodologies.
- Examination of Various Network Parameters’ Impact: Conduct an in-depth analysis of the effects of different network parameters on the performance of the proposed EE-MADDPG algorithm. These parameters include the number of agents, network size, traffic load, and path diversity.
2. Related Work
3. Problem and Mathematical Formulation
3.1. Problem Formulation
- State space (S): The state space represents the network parameters relevant to MPTCP operations, such as link utilization, congestion status, delay, and buffer occupancy. These parameters can be collected and processed by each agent to form a state representation that captures the current network conditions.
- Action space (A): It consists of actions that each agent can perform to influence the MPTCP operations. These actions may include adjusting the sending rate, selecting paths, managing sub-flows, and modifying congestion control parameters.
- Transition function (P): It describes the probability of transitioning from one state to another given a specific action. In the context of MPTCP networks, this function is largely determined by network dynamics, such as traffic patterns, link capabilities, and congestion control mechanisms.
- Reward function (R): It quantifies the desirability of taking a specific action in a given state. The reward function should reflect energy efficiency, QoS, and network performance objectives. The formulation could be a weighted sum of energy consumption, delay, throughput, and packet loss metrics.
3.2. Mathematical Model
- $\mathcal{S}_i$: State space for agent $i$. The state represents the network parameters relevant to the MPTCP operation for agent $i$, such as link utilization, congestion status, delay, and buffer occupancy. The state can be represented as a vector: $s_i = [u_i, c_i, d_i, b_i]$.
- $\mathcal{A}_i$: Action space for agent $i$. The action consists of actions that each agent can perform to influence the MPTCP operation, such as adjusting the sending rate $v_i$, selecting paths $p_i$, managing sub-flows $f_i$, and modifying congestion control parameters $w_i$. The action can be represented as a vector: $a_i = [v_i, p_i, f_i, w_i]$.
- $\mathcal{P}_i$: Transition function for agent $i$. The transition function describes the probability of transitioning from state $s_i$ to state $s_i'$ given action $a_i$ for agent $i$. In the context of MPTCP networks, this function is largely determined by the network dynamics.
- $\mathcal{R}_i$: Reward function for agent $i$. The reward function quantifies the desirability of taking action $a_i$ in state $s_i$ for agent $i$. The reward should reflect energy efficiency $EE_i$, QoS $QoS_i$, and network performance $NP_i$ objectives. A possible formulation could be a weighted sum of these metrics, e.g., $r_i = \alpha\, EE_i + \beta\, QoS_i + \lambda\, NP_i$ with non-negative weights $\alpha, \beta, \lambda$ (a small illustrative sketch of this state, action, and reward representation follows this list).
- $\gamma$: Discount factor. The discount factor determines the relative importance of immediate and future rewards in the reinforcement learning process.
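To make the per-agent representation concrete, the following is a minimal sketch in Python. The function names, the weight values, and the assumption that each metric is normalized to [0, 1] are illustrative choices, not the authors' exact formulation:

```python
import numpy as np

def build_state(link_util, congestion, delay_ms, buffer_occ):
    """State vector s_i = [u_i, c_i, d_i, b_i] for one agent (one MPTCP path)."""
    return np.array([link_util, congestion, delay_ms, buffer_occ], dtype=np.float32)

def build_action(send_rate, path_sel, subflow_mgmt, cc_param):
    """Action vector a_i = [v_i, p_i, f_i, w_i] for one agent."""
    return np.array([send_rate, path_sel, subflow_mgmt, cc_param], dtype=np.float32)

def reward(energy_eff, qos, net_perf, alpha=0.4, beta=0.3, lam=0.3):
    """Weighted-sum reward r_i = alpha*EE_i + beta*QoS_i + lambda*NP_i.
    Weights are illustrative; the components are assumed normalized to [0, 1]."""
    return alpha * energy_eff + beta * qos + lam * net_perf

# Example: a lightly loaded WiFi path with good QoS.
s = build_state(link_util=0.35, congestion=0.1, delay_ms=45.0, buffer_occ=0.2)
a = build_action(send_rate=8.0, path_sel=1.0, subflow_mgmt=1.0, cc_param=0.5)
print(reward(energy_eff=0.7, qos=0.8, net_perf=0.6))   # -> 0.7
```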
4. Optimizing Energy Efficiency in MPTCP Using MADDPG
Training Process
1. Initialize the environment and observe the initial state $s_i^0$ for each agent $i$.
2. For each time step $t$ from $0$ to $T$:
   - Each agent selects an action using its corresponding actor network: $a_i^t = \mu_i(s_i^t \mid \theta_i^{\mu})$.
   - The set of actions $(a_1^t, \dots, a_N^t)$ is executed in the environment, transitioning it to the next state $s_i^{t+1}$ and providing each agent with a reward $r_i^t$.
   - The experience tuple $(s_i^t, a_i^t, r_i^t, s_i^{t+1})$ is stored in the replay buffer $R$.
3. Update the networks by sampling a random mini-batch of experiences from $R$ and performing the following steps for each agent $i$. Equations (3) and (4) [28] are a more general formulation of the update rules for the critic and actor networks in reinforcement learning, and they are also used in the MADDPG algorithm. The critic network is trained by minimizing the loss
   $$L(\theta_i^{Q}) = \mathbb{E}\big[\big(Q_i(s_i, a_i \mid \theta_i^{Q}) - y_i\big)^2\big], \qquad y_i = r_i + \gamma\, Q_i'\big(s_i', \mu_i'(s_i') \mid \theta_i^{Q'}\big), \quad (3)$$
   where the goal of updating the critic network is to improve the accuracy of the Q-value estimates. The actor network is updated by maximizing the expected return, i.e., by ascending the policy gradient
   $$\nabla_{\theta_i^{\mu}} J(\mu_i) = \mathbb{E}\big[\nabla_{a_i} Q_i(s_i, a_i \mid \theta_i^{Q})\big|_{a_i = \mu_i(s_i)}\, \nabla_{\theta_i^{\mu}} \mu_i(s_i \mid \theta_i^{\mu})\big]. \quad (4)$$
4. Update the target networks by interpolating their weights with those of the corresponding online networks: $\theta_i' \leftarrow \tau\, \theta_i + (1 - \tau)\, \theta_i'$.
5. Repeat steps 2–4 until the agents reach an optimal joint policy that cannot be further improved. Equations (7) and (8) [28] specialize these update rules to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, in which each agent's critic is centralized and conditions on the joint state $x = (s_1, \dots, s_N)$ and the joint action. The critic network is updated by minimizing the loss
   $$L(\theta_i) = \mathbb{E}_{x, a, r, x'}\big[\big(Q_i^{\mu}(x, a_1, \dots, a_N) - y\big)^2\big], \qquad y = r_i + \gamma\, Q_i^{\mu'}\big(x', a_1', \dots, a_N'\big)\big|_{a_j' = \mu_j'(s_j')}, \quad (7)$$
   which again improves the accuracy of the Q-value estimates, and the actor network is enhanced by ascending the corresponding policy gradient
   $$\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{x, a \sim R}\big[\nabla_{\theta_i} \mu_i(s_i)\, \nabla_{a_i} Q_i^{\mu}(x, a_1, \dots, a_N)\big|_{a_i = \mu_i(s_i)}\big]. \quad (8)$$
   A minimal code sketch of these update steps is given after Algorithm 1 below.
Algorithm 1. MADDPG for energy efficiency in MPTCP.
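As a concrete illustration of steps 3 and 4, the following is a minimal PyTorch sketch of one MADDPG update (centralized critics, decentralized actors, soft target updates). The network sizes, tensor layout, two-agent setup, and the `update` helper are assumptions loosely based on the hyper-parameter table included later in this article; this is a sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 2, 4, 4     # illustrative per-agent observation/action sizes
GAMMA, TAU = 0.99, 0.001                 # discount factor and target-network update rate

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

# Decentralized actors mu_i(s_i); centralized critics Q_i(x, a_1..a_N).
actors         = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
critics        = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]
target_actors  = [mlp(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]
target_critics = [mlp(N_AGENTS * (OBS_DIM + ACT_DIM), 1) for _ in range(N_AGENTS)]
for src, dst in zip(actors + critics, target_actors + target_critics):
    dst.load_state_dict(src.state_dict())

actor_opt  = [torch.optim.Adam(a.parameters(), lr=0.001) for a in actors]
critic_opt = [torch.optim.Adam(c.parameters(), lr=0.01) for c in critics]

def update(batch):
    """One MADDPG update from a replay mini-batch.
    batch tensors: obs/next_obs [B, N, OBS_DIM], act [B, N, ACT_DIM], rew [B, N]."""
    obs, act, rew, next_obs = batch["obs"], batch["act"], batch["rew"], batch["next_obs"]
    B = obs.shape[0]
    x = torch.cat([obs.reshape(B, -1), act.reshape(B, -1)], dim=1)   # joint critic input
    with torch.no_grad():                                            # target actions a'_j = mu'_j(s'_j)
        next_act = torch.cat([target_actors[j](next_obs[:, j]) for j in range(N_AGENTS)], dim=1)
        x_next = torch.cat([next_obs.reshape(B, -1), next_act], dim=1)
    for i in range(N_AGENTS):
        # Critic update (Eq. (7)): minimize (Q_i(x, a_1..a_N) - y)^2 with y = r_i + gamma * Q'_i(x', a'_1..a'_N).
        with torch.no_grad():
            y = rew[:, i:i + 1] + GAMMA * target_critics[i](x_next)
        critic_loss = nn.functional.mse_loss(critics[i](x), y)
        critic_opt[i].zero_grad(); critic_loss.backward(); critic_opt[i].step()

        # Actor update (Eq. (8)): ascend Q_i with agent i's action replaced by mu_i(s_i).
        acts = [act[:, j] for j in range(N_AGENTS)]
        acts[i] = actors[i](obs[:, i])
        actor_loss = -critics[i](torch.cat([obs.reshape(B, -1)] + acts, dim=1)).mean()
        actor_opt[i].zero_grad(); actor_loss.backward(); actor_opt[i].step()

    # Soft target update: theta' <- tau * theta + (1 - tau) * theta'.
    for src, dst in zip(actors + critics, target_actors + target_critics):
        for p, tp in zip(src.parameters(), dst.parameters()):
            tp.data.mul_(1 - TAU).add_(TAU * p.data)

# Example call with a random mini-batch (B = 128, matching the reported batch size).
batch = {"obs":      torch.rand(128, N_AGENTS, OBS_DIM),
         "act":      torch.rand(128, N_AGENTS, ACT_DIM),
         "rew":      torch.rand(128, N_AGENTS),
         "next_obs": torch.rand(128, N_AGENTS, OBS_DIM)}
update(batch)
```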
5. Methodology
6. Implementation
7. Results and Discussion
7.1. Throughput
7.2. Delay
7.3. Energy Consumption
7.4. Loss Function
- Throughput: a larger positive change relative to Round Robin is better.
- Delay: a larger negative change (i.e., a greater delay reduction) is better.
- Energy consumption: a larger negative change (i.e., a greater reduction in energy consumption) is better.
- EE-MADDPG achieves the largest throughput improvement over Round Robin (+239.57%).
- EE-MADDPG achieves the largest delay reduction relative to Round Robin (−70.33%).
- EE-MADDPG achieves the largest reduction in energy consumption relative to Round Robin (−62.11%), as the short computation sketch below illustrates.
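For clarity, the percentages in the comparison table at the end of this article are relative changes with respect to the Round Robin baseline. The snippet below is a minimal sketch of that computation; the raw metric values are made up purely for illustration and are not taken from the paper:

```python
def relative_change(value, baseline):
    """Percentage change relative to the Round Robin baseline; positive means an increase."""
    return (value - baseline) / baseline * 100.0

# Hypothetical raw metrics (illustrative only), with Round Robin as the baseline.
baseline_throughput, ee_maddpg_throughput = 10.0, 33.96   # Mbps
baseline_delay, ee_maddpg_delay = 120.0, 35.6             # ms

print(relative_change(ee_maddpg_throughput, baseline_throughput))  # ~ +239.6 %
print(relative_change(ee_maddpg_delay, baseline_delay))            # ~ -70.3 %
```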
- Scalability: MADDPG, as an extension of DDPG, is an actor–critic method that is highly scalable with the number of agents and the size of the action space. This makes it suitable for complex environments like MPTCP, where there can be multiple paths (agents) and many possible actions (data allocation across paths).
- Continuous Action Spaces: Unlike some other multi-agent reinforcement learning methods, MADDPG can handle continuous action spaces effectively. This is crucial for the MPTCP problem, where the action could be the amount of data to send over each path, which is a continuous variable (see the small sketch after this list).
- Policy-Based Method: MADDPG is a policy-based method, meaning it directly learns the optimal policy that maps states to actions. This is beneficial in a dynamic environment like MPTCP, where it is important to quickly adapt to changes in network conditions.
- Stability and Convergence: The use of a target network and soft updates in DDPG (and thus MADDPG) helps stabilize learning and ensure convergence, which is important for achieving reliable performance.
- Multi-Agent Coordination: MADDPG considers the actions and policies of other agents during learning, which is essential for coordinating data transmission over multiple paths in MPTCP.
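As an example of the continuous-action point above, an actor's raw outputs can be mapped to a per-path data-allocation vector with a softmax. This is an illustrative mapping under that assumption, not necessarily the one used in the paper:

```python
import numpy as np

def allocation_from_logits(logits):
    """Map unconstrained actor outputs to per-path data fractions that sum to 1."""
    z = np.exp(logits - np.max(logits))   # numerically stable softmax
    return z / z.sum()

# Two paths (e.g., 4G and WiFi): the second path receives most of the traffic.
print(allocation_from_logits(np.array([0.2, 1.5])))   # ~ [0.21, 0.79]
```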
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Notations
Symbol | Description
---|---
$N$ | Number of agents in the MPTCP network
$i$ | Index of an agent ($i = 1, \dots, N$)
$\mathcal{S}_i$ | State space for agent $i$
$s_i$ | State of agent $i$
$\mathcal{A}_i$ | Action space for agent $i$
$a_i$ | Action of agent $i$
$\mathcal{P}_i$ | Transition function for agent $i$
$\mathcal{R}_i$ | Reward function for agent $i$
$\gamma$ | Discount factor
$\pi_i$ | Policy of agent $i$
$u_i$ | Link utilization for agent $i$
$c_i$ | Congestion status for agent $i$
$d_i$ | Delay for agent $i$
$b_i$ | Buffer occupancy for agent $i$
$v_i$ | Sending rate for agent $i$
$p_i$ | Path selection for agent $i$
$f_i$ | Sub-flow management for agent $i$
$w_i$ | Congestion control parameter for agent $i$
References
- Selvaraju, S.; Balador, A.; Fotouhi, H.; Vahabi, M.; Bjorkman, M. Network Management in Heterogeneous IoT Networks. In Proceedings of the 2021 International Wireless Communications And Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021; pp. 1581–1586. [Google Scholar] [CrossRef]
- Tomar, P.; Kumar, G.; Verma, L.; Sharma, V.; Kanellopoulos, D.; Rawat, S.; Alotaibi, Y. CMT-SCTP and MPTCP Multipath Transport Protocols: A Comprehensive Review. Electronics 2022, 11, 2384. [Google Scholar] [CrossRef]
- Guan, Z.; Li, Y.; Yu, S.; Yang, Z. Deep reinforcement learning-based full-duplex link scheduling in federated learning-based computing for IoMT. Trans. Emerg. Telecommun. Technol. 2023, 34, e4724. [Google Scholar] [CrossRef]
- Sefati, S.; Halunga, S. Ultra-reliability and low-latency communications on the internet of things based on 5G network: Literature review, classification, and future research view. Trans. Emerg. Telecommun. Technol. 2023, 34, e4770. [Google Scholar] [CrossRef]
- Zhang, W.; Yang, D.; Wu, W.; Peng, H.; Zhang, N.; Zhang, H.; Shen, X. Optimizing federated learning in distributed industrial IoT: A multi-agent approach. IEEE J. Sel. Areas Commun. 2021, 39, 3688–3703. [Google Scholar] [CrossRef]
- Celic, L.; Magjarevic, R. Seamless connectivity architecture and methods for IoT and wearable devices. Autom. J. Control. Meas. Electron. Comput. Commun. 2020, 61, 21–34. [Google Scholar] [CrossRef]
- Goyal, P.; Rishiwal, V.; Negi, A. A comprehensive survey on QoS for video transmission in heterogeneous mobile ad hoc network. Trans. Emerg. Telecommun. Technol. 2023, 34, e4775. [Google Scholar] [CrossRef]
- Ford, A.; Raiciu, C.; Handley, M.; Barre, S.; Iyengar, J. TCP Extensions for Multipath Operation with Multiple Addresses. (RFC Editor, 2013, Volume 1). Available online: https://rfc-editor.org/rfc/rfc6824.txt (accessed on 15 January 2017).
- Li, M.; Lukyanenko, A.; Ou, Z.; Ylä-Jääski, A.; Tarkoma, S.; Coudron, M.; Secci, S. Multipath Transmission for the Internet: A Survey. IEEE Commun. Surv. Tutorials 2016, 18, 2887–2925. [Google Scholar] [CrossRef]
- Wang, H.; Jiang, J.; Li, J.; Ahmed, M.; Peng, M. High Energy Efficient Heterogeneous Networks: Cooperative and Cognitive Techniques. Int. J. Antennas Propag. 2013, 2013, 231794. [Google Scholar] [CrossRef]
- Scharf, M.; Kiesel, S. NXG03-5: Head-of-line Blocking in TCP and SCTP: Analysis and Measurements. In Proceedings of the IEEE Globecom 2006, San Francisco, CA, USA, 27 November–1 December 2006; pp. 1–5. [Google Scholar] [CrossRef]
- Guleria, K.; Verma, A. Comprehensive review for energy efficient hierarchical routing protocols on wireless sensor networks. Wirel. Netw. 2019, 25, 1159–1183. [Google Scholar] [CrossRef]
- Warrier, M.; Kumar, A. Energy efficient routing in Wireless Sensor Networks: A survey. In Proceedings of the 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, 23–25 March 2016; pp. 1987–1992. [Google Scholar] [CrossRef]
- Paasch, C.; Ferlin, S.; Alay, O.; Bonaventure, O. Experimental Evaluation of Multipath TCP Schedulers. In Proceedings of the 2014 ACM SIGCOMM Workshop On Capacity Sharing Workshop, Chicago, IL, USA, 18 August 2014; pp. 27–32. [Google Scholar] [CrossRef]
- Partov, B.; Leith, D. Experimental Evaluation of Multi-Path Schedulers for LTE/Wi-Fi Devices. In Proceedings of the Tenth ACM International Workshop on Wireless Network Testbeds, Experimental Evaluation, and Characterization, New York, NY, USA, 3–7 October 2016; pp. 41–48. [Google Scholar] [CrossRef]
- Navaratnarajah, S.; Saeed, A.; Dianati, M.; Imran, M. Energy efficiency in heterogeneous wireless access networks. IEEE Wirel. Commun. 2013, 20, 37–43. [Google Scholar] [CrossRef]
- Light, J. Green Networking: A Simulation of Energy Efficient Methods. Procedia Comput. Sci. 2020, 171, 1489–1497. [Google Scholar] [CrossRef]
- Suraweera, H.A.; Yang, J.; Zappone, A.; Thompson, J.S. (Eds.) Green Communications for Energy-Efficient Wireless Systems and Networks, 1st ed.; The Institution of Engineering and Technology: London, UK, 2021; ISBN 978-1-83953-067-8. eISBN 978-1-83953-068-5. [Google Scholar]
- Wu, J.; Cheng, B.; Wang, M. Energy Minimization for Quality-Constrained Video with Multipath TCP over Heterogeneous Wireless Networks. In Proceedings of the 2016 IEEE 36th International Conference On Distributed Computing Systems (ICDCS), Nara, Japan, 27–30 June 2016; pp. 487–496. [Google Scholar] [CrossRef]
- Chaturvedi, R.; Chand, S. An Adaptive and Efficient Packet Scheduler for Multipath TCP. Iran. J. Sci. Technol. Trans. Electr. Eng. 2021, 45, 349–365. [Google Scholar] [CrossRef]
- Li, W.; Zhang, H.; Gao, S.; Xue, C.; Wang, X.; Lu, S. SmartCC: A Reinforcement Learning Approach for Multipath TCP Congestion Control in Heterogeneous Networks. IEEE J. Sel. Areas Commun. 2019, 37, 2621–2633. [Google Scholar] [CrossRef]
- Luo, J.; Su, X.; Liu, B. A Reinforcement Learning Approach for Multipath TCP Data Scheduling. In Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019; pp. 276–280. [Google Scholar] [CrossRef]
- Wu, H.; Alay, Ö.; Brunstrom, A.; Ferlin, S.; Caso, G. Peekaboo: Learning-Based Multipath Scheduling for Dynamic Heterogeneous Environments. IEEE J. Sel. Areas Commun. 2020, 38, 2295–2310. [Google Scholar] [CrossRef]
- Ouamri, M.; Azni, M.; Singh, D.; Almughalles, W.; Muthanna, M. Request delay and survivability optimization for software defined-wide area networking (SD-WAN) using multi-agent deep reinforcement learning. Trans. Emerg. Telecommun. Technol. 2023, 34, e4776. [Google Scholar] [CrossRef]
- Zhang, C.; Patras, P.; Haddadi, H. Deep learning in mobile and wireless networking: A survey. IEEE Commun. Surv. Tutorials 2019, 21, 2224–2287. [Google Scholar] [CrossRef]
- Zhang, H.; Li, W.; Gao, S.; Wang, X.; Ye, B. ReLeS: A Neural Adaptive Multipath Scheduler based on Deep Reinforcement Learning. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference On Computer Communications, Paris, France, 29 April–2 May 2019; pp. 1648–1656. [Google Scholar] [CrossRef]
- Busoniu, L.; Babuska, R.; De Schutter, B. A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Trans. Syst. Man, Cybern. Part C (Appl. Rev.) 2008, 38, 156–172. [Google Scholar] [CrossRef]
- Neto, G. From Single-Agent to Multi-Agent Reinforcement Learning: Foundational Concepts and Methods. 2005. Available online: https://api.semanticscholar.org/CorpusID:12184463 (accessed on 3 June 2022).
- Dorri, A.; Kanhere, S.; Jurdak, R. Multi-Agent Systems: A Survey. IEEE Access 2018, 6, 28573–28593. [Google Scholar] [CrossRef]
- Wu, J.; Tan, R.; Wang, M. Energy-Efficient Multipath TCP for Quality-Guaranteed Video Over Heterogeneous Wireless Networks. IEEE Trans. Multimed. 2019, 21, 1593–1608. [Google Scholar] [CrossRef]
- Dong, P.; Shen, R.; Li, Y.; Nie, C.; Xie, J.; Gao, K.; Zhang, L. An Energy-Saving Scheduling Algorithm for Multipath TCP in Wireless Networks. Electronics 2022, 11, 490. [Google Scholar] [CrossRef]
- Raiciu, C.; Niculescu, D.; Bagnulo, M.; Handley, M. Opportunistic Mobility with Multipath TCP. In Proceedings of the Sixth International Workshop on MobiArch, Bethesda, MD, USA, 28 June 2011; pp. 7–12. [Google Scholar] [CrossRef]
- Pluntke, C.; Eggert, L.; Kiukkonen, N. Saving Mobile Device Energy with Multipath TCP. In Proceedings of the Sixth International Workshop on MobiArch, Bethesda, MD, USA, 28 June 2011; pp. 1–6. [Google Scholar] [CrossRef]
- Chen, S.; Yuan, Z.; Muntean, G. An energy-aware multipath-TCP-based content delivery scheme in heterogeneous wireless networks. In Proceedings of the 2013 IEEE Wireless Communications And Networking Conference (WCNC), Shanghai, China, 7–10 April 2013; pp. 1291–1296. [Google Scholar] [CrossRef]
- Cengiz, K.; Dag, T. A review on the recent energy-efficient approaches for the Internet protocol stack. EURASIP J. Wirel. Commun. Netw. 2015, 1–17. [Google Scholar] [CrossRef]
- Cao, Y.; Chen, S.; Liu, Q.; Zuo, Y.; Wang, H.; Huang, M. QoE-driven energy-aware multipath content delivery approach for MPTCP-based mobile phones. China Commun. 2017, 14, 90–103. [Google Scholar] [CrossRef]
- Zhao, J.; Liu, J.; Wang, H.; Xu, C. Multipath TCP for datacenters: From energy efficiency perspective. In Proceedings of the IEEE INFOCOM 2017-IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017; pp. 1–9. [Google Scholar] [CrossRef]
- Morawski, M.; Ignaciuk, P. Energy-efficient scheduler for MPTCP data transfer with independent and coupled channels. Comput. Commun. 2018, 132, 56–64. [Google Scholar] [CrossRef]
- Zhao, J.; Liu, J.; Wang, H.; Xu, C.; Gong, W.; Xu, C. Measurement, Analysis, and Enhancement of Multipath TCP Energy Efficiency for Datacenters. IEEE/ACM Trans. Netw. 2020, 28, 57–70. [Google Scholar] [CrossRef]
- Bertsekas, D. Nonlinear Programming; Athena Scientific: Nashua, NH, USA, 2016; ISBN 1886529051/978-1886529052. Available online: https://www.amazon.com/Nonlinear-Programming-3rd-Dimitri-Bertsekas/dp/1886529051 (accessed on 3 June 2022).
- Yang, D.; Zhang, W.; Ye, Q.; Zhang, C.; Zhang, N.; Huang, C.; Zhang, H.; Shen, X. DetFed: Dynamic Resource Scheduling for Deterministic Federated Learning over Time-sensitive Networks. IEEE Trans. Mob. Comput. 2023. [Google Scholar] [CrossRef]
- Yang, D.; Cheng, Z.; Zhang, W.; Zhang, H.; Shen, X. Burst-Aware Time-Triggered Flow Scheduling With Enhanced Multi-CQF in Time-Sensitive Networks. IEEE/ACM Trans. Netw. 2023. [Google Scholar] [CrossRef]
- Chahlaoui, F.; Dahmouni, H. A Taxonomy of Load Balancing Mechanisms in Centralized and Distributed SDN Architectures. SN Comput. Sci. 2020, 1, 268. [Google Scholar] [CrossRef]
- Dong, P.; Shen, R.; Wang, Q.; Zuo, Y.; Li, Y.; Zhang, D.; Zhang, L.; Yang, W. Multipath TCP Meets Reinforcement Learning: A Novel Energy-Efficient Scheduling Approach in Heterogeneous Wireless Networks. IEEE Wirel. Commun. 2023, 30, 138–146. [Google Scholar] [CrossRef]
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. arXiv 2020, arXiv:1706.02275. [Google Scholar] [CrossRef]
- He, B.; Wang, J.; Qi, Q.; Sun, H.; Liao, J.; Du, C.; Yang, X.; Han, Z. DeepCC: Multi-Agent Deep Reinforcement Learning Congestion Control for Multi-Path TCP Based on Self-Attention. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4770–4788. [Google Scholar] [CrossRef]
- Hu, F.; Deng, Y.; Hamid Aghvami, A. Scalable Multi-Agent Reinforcement Learning for Dynamic Coordinated Multipoint Clustering. IEEE Trans. Commun. 2023, 71, 101–114. [Google Scholar] [CrossRef]
- Sinan Nasir, Y.; Guo, D. Deep Actor-Critic Learning for Distributed Power Control in Wireless Mobile Networks. In Proceedings of the 2020 54th Asilomar Conference On Signals, Systems, and Computers, Pacific Grove, CA, USA, 1–4 November 2020; pp. 398–402. [Google Scholar] [CrossRef]
- Lim, Y.; Chen, Y.; Nahum, E.; Towsley, D.; Gibbens, R. Design, implementation, and evaluation of energy-aware multi-path TCP. In Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies, Heidelberg, Germany, 1–4 December 2015; pp. 1–13. [Google Scholar] [CrossRef]
- Dong, Z.; Cao, Y.; Xiong, N.; Dong, P. EE-MPTCP: An Energy-Efficient Multipath TCP Scheduler for IoT-Based Power Grid Monitoring Systems. Electronics 2022, 11, 3104. [Google Scholar] [CrossRef]
- Dong, P.; Wu, J.; Liu, Y.; Li, X.; Wang, X. Reducing transport latency for short flows with multipath TCP. J. Netw. Comput. Appl. 2018, 108, 20–36. [Google Scholar] [CrossRef]
Parameter | 4G RTT (ms) | 4G Loss Rate | 4G BW (Mbps) | Congestion Window
---|---|---|---|---
Value (Range) | [100, 200] | [0, 0.5] | [15, 20] | [0, 10]

Parameter | WiFi RTT (ms) | WiFi Loss Rate | WiFi BW (Mbps) | Congestion Window
---|---|---|---|---
Value (Range) | [40, 60] | [0, 1] | [5, 10] | [0, 10]
Hyper-Parameter | Value | Hyper-Parameter | Value
---|---|---|---
Number of agents | 2 | Replay buffer size | 100,000
Actor learning rate | 0.001 | Batch size | 128
Critic learning rate | 0.01 | Number of hidden layers | 2
Discount factor | 0.99 | Hidden layer size | 128
Target network update rate | 0.001 | |
Algorithm | Throughput (%) | Delay (%) | Energy Consumption (%) |
---|---|---|---|
Round Robin | 0.000000 | −0.000000 | −0.000000 |
EMTCP | 52.672159 | −19.297768 | −12.515934 |
DMPTCP | 93.496327 | −40.201063 | −24.852117 |
RELES | 146.105111 | −48.632261 | −37.386715 |
EE-MPTCP | 198.793723 | −63.679567 | −49.735080 |
EE-MADDPG | 239.571282 | −70.327133 | −62.109398 |