An Intelligent TCP Congestion Control Method Based on Deep Q Network
Abstract
1. Introduction
- (1) We propose TCP-DQN. By leveraging the state-of-the-art DQN algorithm, training of the congestion control policy is more stable and converges faster than with the conventional Q-learning method.
- (2) We implemented TCP-DQN and compared it with CUBIC, the state-of-the-art congestion control algorithm used as the default congestion control method in the Linux kernel. The results show that TCP-DQN can improve throughput by 200% compared with CUBIC.
2. Related Works
2.1. Traditional Congestion Control Protocols
2.2. Reinforcement Learning and Its Applications
3. Background
DQN
- (1) Each experience may be used for multiple parameter updates. Combined with experience replay, learning is more sample-efficient.
- (2) The samples of Q-learning are generated along continuous trajectories of actions, so consecutive samples are strongly correlated with each other, which causes oscillation during training. DQN randomly samples from the experience pool to eliminate the correlation between consecutive samples and enhance the stability of the training process.
- (3) DQN randomly picks samples from the experience pool for training, which reduces the risk that Q-learning falls into a local optimum. In addition, DQN constructs two neural networks with the same structure: the Q-network and the target-Q network. Training targets are generated by the target-Q network, while the Q-network is used for parameter updates. After every c parameter updates, the parameters of the Q-network are copied into the target-Q network. This makes training more stable because it keeps the target function fixed for a while. The DQN algorithm is described as follows (Algorithm 1):
Algorithm 1. DQN
Initialize replay memory D to capacity N
Initialize the action-value function Q with random weights θ
Initialize the target action-value function Q̂ with weights θ⁻ = θ
For episode = 1, M do
  Initialize sequence s_1 = {x_1} and preprocessed sequence φ_1 = φ(s_1)
  For t = 1, T do
    With probability ε select a random action a_t
    Otherwise select a_t = argmax_a Q(φ(s_t), a; θ)
    Execute action a_t in the emulator and observe reward r_t and next state x_{t+1}
    Set s_{t+1} = s_t, a_t, x_{t+1} and preprocess φ_{t+1} = φ(s_{t+1})
    Store transition (φ_t, a_t, r_t, φ_{t+1}) in D
    Sample a random minibatch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D
    Set y_j = r_j if the episode terminates at step j + 1; otherwise y_j = r_j + γ max_{a′} Q̂(φ_{j+1}, a′; θ⁻)
    Perform a gradient descent step on (y_j − Q(φ_j, a_j; θ))² with respect to the network parameters θ
    Every c steps copy the Q-network parameters into the target-Q network (Q̂ = Q)
  End for
End for
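The loop above can be made concrete with a short sketch. The following PyTorch code is a minimal illustration, not the authors' implementation: it shows the three ingredients discussed above, namely an experience pool sampled at random, an ε-greedy behaviour policy, and a separate target-Q network with parameters θ⁻ that are only refreshed periodically. The state dimension, network sizes, and hyperparameters are assumptions made for the example.

```python
# Minimal DQN sketch (illustrative; sizes and hyperparameters are assumed).
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 7, 3              # e.g., 7 observed features, 3 discrete actions (assumed)
GAMMA, EPSILON, SYNC_EVERY = 0.99, 0.1, 100

def make_q_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net, target_net = make_q_net(), make_q_net()
target_net.load_state_dict(q_net.state_dict())       # θ⁻ ← θ
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                         # experience pool D

def select_action(state):
    """ε-greedy action selection on the Q-network."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

def train_step(batch_size=32):
    """One gradient step on a random minibatch sampled from the replay pool."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)         # transitions (s, a, r, s', done)
    s, a, r, s2, done = map(torch.as_tensor, zip(*batch))
    s, s2, r = s.float(), s2.float(), r.float()
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)        # Q(s, a; θ)
    with torch.no_grad():                                           # target uses θ⁻
        y = r + GAMMA * target_net(s2).max(dim=1).values * (1 - done.float())
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In use, each observed transition (s, a, r, s′, done) is appended to `replay`, `train_step()` is called after every interaction, and `target_net.load_state_dict(q_net.state_dict())` is executed every SYNC_EVERY updates, mirroring the c-step copy described above.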
4. Framework
4.1. Problem Formulation
4.2. State Space
- (1) The relative time t: the time that has elapsed since the TCP connection was established. In the CUBIC algorithm, the window size is designed as a cubic function of time t, so we consider t an important parameter for determining the congestion window.
- (2) Congestion window size. The congestion control algorithm adjusts the new window size based on the current one. If the current congestion window is small, the agent increases the window quickly; if the window is already large, the agent stops increasing it or increases it only slowly.
- (3) Number of unacknowledged bytes: the number of bytes sent but not yet confirmed by the receiver. If the network link is compared to a water pipe, the unacknowledged bytes can be understood as the water stored in the pipe. If there is already enough water in the pipe, injection should be stopped or reduced; otherwise, more water should be injected. Therefore, the injection rate (congestion window size) should be determined according to the water volume (number of unacknowledged bytes) in the pipe.
- (4) Number of ACK packets. This parameter indirectly reflects the congestion situation. If the number of received ACK packets is normal, the network is in good condition and congestion has not occurred, so the congestion window can be increased. Otherwise, the network is congested and the congestion window should be maintained or reduced.
- (5) Average RTT (round-trip time): the average RTT during the observation period. RTT is the total time from sending a data packet to receiving its ACK. The RTT value is closely related to network congestion: if congestion is serious, the RTT increases significantly. The congestion control algorithm should therefore adjust the congestion window according to the RTT.
- (6) Throughput: the throughput during the observation period, defined as the number of data bytes confirmed by the receiver per second. This parameter directly reflects the network status. High throughput indicates that enough packets have been pushed into the network link; otherwise, there is spare bandwidth and more packets can be sent on the link.
- (7) Number of lost packets. The number of lost packets indicates the congestion situation of the current network. If the number of lost packets is small, the agent should increase the congestion window; otherwise, the agent should decrease it. A short sketch of how these seven features can be assembled into a state vector is given after this list.
4.3. Action Space
4.4. Reward Function
5. Results and Discussion
5.1. Experimental Environment
- (1) CPU: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10 GHz;
- (2) Memory: 32 GB DDR4;
- (3) GPU: NVIDIA Titan V;
- (4) Operating system: Red Hat 4.8.5-28.
5.2. Throughput Comparison
5.3. RTT Comparison
5.4. Packet Loss Rate Comparison
6. Conclusions
- (1) This paper proposes a TCP congestion control algorithm based on DQN, which abstracts reinforcement-learning-based TCP congestion control into a partially observable Markov decision process. With reference to mainstream algorithms, the state space and action space are designed appropriately, together with a suitable reward function.
- (2) The throughput of TCP-DQN averages up to 7 MB/s when the bandwidth is 10 MB/s. Compared with the HighSpeed, CUBIC, and NewReno algorithms, the throughput of TCP-DQN reaches more than 2–3 times theirs. The average RTT of TCP-DQN is 58 ms when the minimal RTT of the physical link is 56 ms.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Floyd, S.; Henderson, T. RFC2582: The NewReno Modification to TCP’s Fast Recovery Algorithm. 1999. Available online: https://dl.acm.org/doi/10.17487/RFC2582 (accessed on 29 September 2021).
- Brakmo, L.S.; Peterson, L.L. TCP Vegas—End-to-End Congestion Avoidance on a Global Internet. IEEE J. Sel. Areas Commun. 1995, 13, 1465–1480.
- Floyd, S. HighSpeed TCP for Large Congestion Windows. RFC 3649, 2003. Available online: https://www.hjp.at/doc/rfc/rfc3649.html (accessed on 29 September 2021).
- Ha, S.; Rhee, I.; Xu, L. CUBIC: A new TCP-friendly high-speed TCP variant. ACM SIGOPS Oper. Syst. Rev. 2008, 42, 64–74.
- Xiao, L.; Lu, X.; Xu, D.; Tang, Y.; Wang, L.; Zhuang, W. UAV Relay in VANETs Against Smart Jamming with Reinforcement Learning. IEEE Trans. Veh. Technol. 2018, 67, 4087–4097.
- Niroui, F.; Zhang, K.; Kashino, Z.; Nejat, G. Deep Reinforcement Learning Robot for Search and Rescue Applications: Exploration in Unknown Cluttered Environments. IEEE Robot. Autom. Lett. 2019, 4, 610–617.
- Huang, S.; Lv, B.; Wang, R.; Huang, K. Scheduling for Mobile Edge Computing with Random User Arrivals: An Approximate MDP and Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2020, 69, 7735–7750.
- Cao, Z.; Lin, C.; Zhou, M.; Huang, R. Scheduling Semiconductor Testing Facility by Using Cuckoo Search Algorithm with Reinforcement Learning and Surrogate Modeling. IEEE Trans. Autom. Sci. Eng. 2019, 16, 825–837.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Li, W.; Zhou, F.; Chowdhury, K.R.; Meleis, W.M. QTCP: Adaptive Congestion Control with Reinforcement Learning. IEEE Trans. Netw. Sci. Eng. 2018, 6, 445–458.
- Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
- Sun, G.; Zhou, R.; Sun, J.; Yu, H.; Vasilakos, A.V. Energy-efficient provisioning for service function chains to support delay-sensitive applications in network function virtualization. IEEE Internet Things J. 2020, 7, 6116–6131.
- Sun, G.; Li, Y.; Yu, H.; Vasilakos, A.V.; Du, X.; Guizani, M. Energy-efficient and traffic-aware service function chaining orchestration in multi-domain networks. Future Gener. Comput. Syst. 2019, 91, 347–360.
- Brakmo, L.S.; O’Malley, S.W.; Peterson, L.L. TCP Vegas: New techniques for congestion detection and avoidance. In Proceedings of the Conference on Communications Architectures, Protocols and Applications, London, UK, 31 August–2 September 1994; pp. 24–35.
- Gerla, M.; Sanadidi, M.; Wang, R.; Zanella, A.; Casetti, C.; Mascolo, S. TCP Westwood: Congestion window control using bandwidth estimation. In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM ’01); IEEE: New York, NY, USA, 2001; Volume 3, pp. 1698–1702.
- Tan, K.; Song, J.; Zhang, Q.; Sridharan, M. A compound TCP approach for high-speed and long distance networks. In Proceedings of the IEEE INFOCOM 2006, Barcelona, Spain, 23–29 April 2006.
- Cardwell, N.; Cheng, Y.; Gunn, C.S.; Yeganeh, S.H.; Jacobson, V. BBR: Congestion-based congestion control. Queue 2016, 14, 20–53.
- Busch, C.; Kannan, R.; Vasilakos, A.V. Approximating Congestion + Dilation in Networks via “Quality of Routing” Games. IEEE Trans. Comput. 2011, 61, 1270–1283.
- Liu, L.; Song, Y.; Zhang, H.; Ma, H.; Vasilakos, A.V. Physarum optimization: A biology-inspired algorithm for the Steiner tree problem in networks. IEEE Trans. Comput. 2013, 64, 818–831.
- Dvir, A.; Vasilakos, A.V. Backpressure-based routing protocol for DTNs. In Proceedings of the ACM SIGCOMM 2010 Conference, New Delhi, India, 30 August–3 September 2010; pp. 405–406.
- Ji, Y.; Wang, J.; Xu, J.; Fang, X.; Zhang, H. Real-time energy management of a microgrid using deep reinforcement learning. Energies 2019, 12, 2291.
- Fang, Y.; Huang, C.; Xu, Y.; Li, Y. RLXSS: Optimizing XSS detection model to defend against adversarial attacks based on reinforcement learning. Future Internet 2019, 11, 177.
- Yi, J.-H.; Xing, L.-N.; Wang, G.-G.; Dong, J.; Vasilakos, A.V.; Alavi, A.H.; Wang, L. Behavior of crossover operators in NSGA-III for large-scale optimization problems. Inf. Sci. 2020, 509, 470–487.
- Le, T.; Szepesvári, C.; Zheng, R. Sequential learning for multi-channel wireless network monitoring with channel switching costs. IEEE Trans. Signal Process. 2014, 62, 5919–5929.
- Liu, N.; Li, Z.; Xu, J.; Xu, Z.; Lin, S.; Qiu, Q.; Tang, J.; Wang, Y. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017; pp. 372–382.
- Xu, Z.; Wang, Y.; Tang, J.; Wang, J.; Gursoy, M.C. A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs. In Proceedings of the 2017 IEEE International Conference on Communications (ICC), Paris, France, 21–25 May 2017; pp. 1–6.
- Lu, T.; Chen, X.; McElroy, M.B.; Nielsen, C.P.; Wu, Q.; Ai, Q. A Reinforcement Learning-Based Decision System for Electricity Pricing Plan Selection by Smart Grid End Users. IEEE Trans. Smart Grid 2021, 12, 2176–2187.
- Sp, A.; Paa, B.; Jmma, B. Energy-conscious optimization of Edge Computing through Deep Reinforcement Learning and two-phase immersion cooling. Future Gener. Comput. Syst. 2021, 125, 891–907.
- Jung, C.; Shim, D.H. Incorporating Multi-Context into the Traversability Map for Urban Autonomous Driving Using Deep Inverse Reinforcement Learning. IEEE Robot. Autom. Lett. 2021, 6, 1662–1669.
- Deltetto, D.; Coraci, D.; Pinto, G.; Piscitelli, M.; Capozzoli, A. Exploring the Potentialities of Deep Reinforcement Learning for Incentive-Based Demand Response in a Cluster of Small Commercial Buildings. Energies 2021, 14, 2933.
- Fischer, F.; Bachinski, M.; Klar, M.; Fleig, A.; Müller, J. Reinforcement learning control of a biomechanical model of the upper extremity. Sci. Rep. 2021, 11, 14445.
- Habachi, O.; Shiang, H.-P.; Van Der Schaar, M.; Hayel, Y. Online learning based congestion control for adaptive multimedia transmission. IEEE Trans. Signal Process. 2013, 61, 1460–1469.
- Hemmati, M.; Yassine, A.; Shirmohammadi, S. An online learning approach to QoE-fair distributed rate allocation in multi-user video streaming. In Proceedings of the 2014 8th International Conference on Signal Processing and Communication Systems (ICSPCS), Gold Coast, Australia, 15–17 December 2014; pp. 1–6.
- Van Der Hooft, J.; Petrangeli, S.; Claeys, M.; Famaey, J.; De Turck, F. A learning-based algorithm for improved bandwidth-awareness of adaptive streaming clients. In Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), Ottawa, ON, Canada, 11–15 May 2015; pp. 131–138.
- Cui, L.; Yuan, Z.; Ming, Z.; Yang, S. Improving the Congestion Control Performance for Mobile Networks in High-Speed Railway via Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2020, 69, 5864–5875.
- Xiao, K.; Mao, S.; Tugnait, J.K. TCP-Drinc: Smart congestion control based on deep reinforcement learning. IEEE Access 2019, 7, 11892–11904.
- Bachl, M.; Zseby, T.; Fabini, J. Rax: Deep reinforcement learning for congestion control. In Proceedings of the ICC 2019–2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6.
- Wang, S.; Bi, J.; Wu, J.; Vasilakos, A.V.; Fan, Q. VNE-TD: A virtual network embedding algorithm based on temporal-difference learning. Comput. Netw. 2019, 161, 251–263.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).