# Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention


## Abstract


## 1. Introduction

## 2. SARL and MARL

## 3. Review of Similar Work

## 4. Main Concept

## 5. Proposed MADDPG Framework

**Algorithm 1.** Markov Decision Process (MDP) [44]

An MDP is a 5-tuple $\left(S,A,P,R,\gamma \right)$, where:

- $S$ is a set of states;
- $A$ is a set of actions;
- $P\left(s,a,s'\right)$ is the probability that action $a$ in state $s$ at time $t$ will lead to state $s'$ at time $t+1$;
- $R\left(s,a,s'\right)$ is the immediate reward received after a transition from state $s$ to $s'$ due to action $a$;
- $\gamma$ is the discount factor, which is used to generate a discounted reward.
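As an illustration of the 5-tuple in Algorithm 1, the following minimal Python sketch stores $S$, $A$, $P$, $R$ and $\gamma$ in one container and shows how $\gamma$ turns a reward sequence into a discounted reward. The class name `MDP`, the helper `discounted_return`, and the two-state toy link model (states, actions, probabilities and rewards) are assumptions made up for this example and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Transition = Tuple[str, str, str]  # (s, a, s')


@dataclass
class MDP:
    states: List[str]            # S
    actions: List[str]           # A
    P: Dict[Transition, float]   # P(s, a, s'): transition probability
    R: Dict[Transition, float]   # R(s, a, s'): immediate reward
    gamma: float                 # discount factor

    def expected_reward(self, s: str, a: str) -> float:
        """Expected immediate reward of taking action a in state s."""
        return sum(
            prob * self.R.get((s0, a0, s1), 0.0)
            for (s0, a0, s1), prob in self.P.items()
            if s0 == s and a0 == a
        )


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical toy instance: a link that is either "normal" or "congested".
toy = MDP(
    states=["normal", "congested"],
    actions=["hold", "reroute"],
    P={
        ("normal", "hold", "normal"): 0.9,
        ("normal", "hold", "congested"): 0.1,
        ("congested", "reroute", "normal"): 0.8,
        ("congested", "reroute", "congested"): 0.2,
    },
    R={
        ("normal", "hold", "normal"): 1.0,
        ("normal", "hold", "congested"): -1.0,
        ("congested", "reroute", "normal"): 1.0,
        ("congested", "reroute", "congested"): -1.0,
    },
    gamma=0.5,
)
```

With these toy values, `discounted_return([1.0, 1.0, 1.0], toy.gamma)` evaluates to $1 + 0.5 + 0.25 = 1.75$, which shows how rewards further in the future contribute less.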

**Algorithm 2.** Proposed MADDPG [46]

1. **for** episode $=1$ to $M$ **do**
2. Initialize a random process $\mathcal{N}$ for action exploration
3. Receive the initial state $x=\left[Bk_{1}(t),Bk_{2}(t),\dots ,Bk_{N}(t);\ Gk_{1}(t),Gk_{2}(t),\dots ,Gk_{N}(t);\ Ck_{1}(t),Ck_{2}(t),\dots ,Ck_{N}(t)\right]$
4. **for** $t=1$ to max-episode-length **do**
5. For agent $1$, select action $a_{agent1}(t)$
6. Execute $a_{agent1}(t)=\left[a_{k_{1\dots n}}^{avail}(t),\ a_{k_{1\dots n}}^{Bincrease}(t)\,TH\right]$ and observe $R_{agent1}(t)=\frac{1}{U}$; for agent $2$, select action $a_{agent2}(t)$, execute $a_{agent2}(t)=\left[a_{k_{1\dots n}}^{flow_{i}}(t)>RR\right]$ and observe $R_{agent2}(t)=\begin{cases}1, & flow_{i}(t)\,RR\neq k_{1\dots n}\\ -1, & z_{i}\cong loss\end{cases}$ and the new state $x'$
7. Store $\left(x,a_{1},r_{1},x'\right)\dots \left(x,a_{2},r_{2},x'\right)$ in replay buffer $\mathcal{D}$
8. $x\leftarrow x'$
9. **for** agent $i=1$ to $2$ **do**
10. Sample a random minibatch of $S$ samples $\left(x^{j},a_{1}^{j},r_{1}^{j},{x'}^{j}\dots x^{j},a_{2}^{j},r_{2}^{j},{x'}^{j}\right)$ from $\mathcal{D}$
11. Set $y^{j}=r_{i}^{j}+\gamma \,Q_{i}^{\mu'}\left({x'}^{j},a'_{1},a'_{2}\right)\big|_{a'_{k}=\mu'_{k}(o_{k}^{j})}$
12. Update the critic by minimizing the loss $\mathcal{L}\left(\theta_{i}\right)=\frac{1}{S}\sum_{j}\left(y^{j}-Q_{i}^{\mu}\left(x^{j},a_{1}^{j},a_{2}^{j}\right)\right)^{2}$
13. Update the actor using the sampled policy gradient: $\nabla_{\theta_{i}}J\approx \frac{1}{S}\sum_{j}\nabla_{\theta_{i}}\mu_{i}(o_{i}^{j})\,\nabla_{a_{i}}Q_{i}^{\mu}\left(x^{j},a_{1}^{j},a_{2}^{j}\right)\big|_{a_{i}=\mu_{i}(o_{i}^{j})}$
14. **end for**
15. Update the target network parameters for each agent $i$: $\theta'_{i}\leftarrow \tau \theta_{i}+\left(1-\tau \right)\theta'_{i}$
16. **end for**
17. **end for**
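The numeric parts of Algorithm 2 (the replay buffer of steps 7 and 10, the target value of step 11, the critic loss of step 12 and the soft target update of step 15) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `ReplayBuffer`, `td_target`, `critic_loss` and `soft_update` are hypothetical, network parameters are represented as flat lists of floats, and the critic outputs are passed in as plain numbers rather than computed by a neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer of joint transitions (x, actions, rewards, x')."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def store(self, x, actions, rewards, x_next):
        # Step 7: one entry holds the actions and rewards of both agents.
        self.storage.append((x, actions, rewards, x_next))

    def sample(self, batch_size, rng=random):
        # Step 10: uniform random minibatch without replacement.
        return rng.sample(list(self.storage), batch_size)


def td_target(reward, gamma, target_q_next):
    """Step 11: y = r_i + gamma * Q'_i(x', a'_1, a'_2)."""
    return reward + gamma * target_q_next


def critic_loss(targets, q_values):
    """Step 12: mean squared error between targets y^j and critic outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, tau):
    """Step 15: theta' <- tau * theta + (1 - tau) * theta', element-wise."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_params, target_params)
    ]


# Illustrative values only; none of these numbers come from the article.
buffer = ReplayBuffer(capacity=100)
for step in range(10):
    buffer.store(x=[step], actions=(0.1, 0.2), rewards=(1.0, -1.0), x_next=[step + 1])
batch = buffer.sample(4, rng=random.Random(0))

y = td_target(reward=1.0, gamma=0.9, target_q_next=2.0)
loss = critic_loss([y, 0.0], [2.0, 1.0])
new_target = soft_update(target_params=[0.0, 1.0], online_params=[1.0, 1.0], tau=0.01)
```

A small $\tau$ makes the target parameters track the online parameters slowly, which is what keeps the target value in step 11 from shifting abruptly between updates.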

## 6. MADDPG and Deep Neural Network (DNN)

## 7. Experimental Results and Performance Evaluation

#### 7.1. Experimental Set up

Intel® Pentium® CPU G2030 @ 3.00 GHz and 8 GB RAM capacity. The software requirements also include Mininet-2.0 and Python-2.7.

#### 7.2. Experimental Results

#### 7.2.1. Reward

#### 7.2.2. Jitter

#### 7.2.3. End to End Delay

#### 7.2.4. Packet Loss

#### 7.2.5. Bandwidth Usage

#### 7.2.6. DDoS Detection Rate

#### 7.2.7. Global Reward

#### 7.2.8. Jitter

#### 7.2.9. Delay vs. No. of Switches

#### 7.2.10. Packet Loss

#### 7.2.11. Bandwidth Usage

#### 7.2.12. Intrusion Detection Rate

## 8. Conclusions and Future Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Santos, A.F.C.; Teles, Í.P.; Siqueira, O.M.P.; de Oliveira, A.A. Big data: A systematic review. Adv. Intell. Syst. Comput. **2018**, 558, 501–506.
- Ocean, B. Optical Studies on Sol-Gel Derived Lead Chloride Crystals. J. Appl. Eng. Comput. Sci. **2013**, 2, 5.
- Mehmood, Y.; Haider, N.; Imran, M.; Timm-Giel, A.; Guizani, M. M2M Communications in 5G: State-of-the-Art Architecture, Recent Advances, and Research Challenges. IEEE Commun. Mag. **2017**, 55, 194–201.
- Mattisson, S. Overview of 5G requirements and future wireless networks. In Proceedings of the ESSCIRC 2017-43rd IEEE European Solid State Circuits Conference, Leuven, Belgium, 11–14 September 2017; pp. 1–6.
- Xiao, L.; Wan, X.; Lu, X.; Zhang, Y.; Wu, D. IoT Security Techniques Based on Machine Learning: How Do IoT Devices Use AI to Enhance Security? IEEE Signal Process. Mag. **2018**, 35, 41–49.
- Ni, J.; Zhang, K.; Vasilakos, A.V. Security and Privacy for Mobile Edge Caching: Challenges and Solutions. IEEE Wirel. Commun. **2020**, 1–7.
- Vishwakarma, R.; Jain, A.K. A survey of DDoS attacking techniques and defence mechanisms in the IoT network. Telecommun. Syst. **2020**, 73, 3–25.
- Xia, W.; Wen, Y.; Foh, C.H.; Niyato, D.; Xie, H. A Survey on Software-Defined Networking. IEEE Commun. Surv. Tutor. **2015**, 17, 27–51.
- Wickboldt, J.; De Jesus, W.; Isolani, P.; Both, C.; Rochol, J.; Granville, L. Software-defined networking: Management requirements and challenges. IEEE Commun. Mag. **2015**, 53, 278–285.
- Hamdan, M.; Hassan, E.; Abdelaziz, A.; Elhigazi, A.; Mohammed, B.; Khan, S.; Vasilakos, A.V.; Marsono, M.N. A comprehensive survey of load balancing techniques in software-defined network. J. Netw. Comput. Appl. **2021**, 174.
- Ray, S. A Quick Review of Machine Learning Algorithms. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 35–39.
- Almseidin, M.; Alzubi, M.; Kovacs, S.; Alkasassbeh, M. Evaluation of machine learning algorithms for intrusion detection system. In Proceedings of the 2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 14–16 September 2017; pp. 277–282.
- Kuzhippallil, M.A.; Joseph, C.; Kannan, A. Comparative Analysis of Machine Learning Techniques for Indian Liver Disease Patients. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Subotica, Serbia, 14–16 September 2017; pp. 778–782.
- Merkert, J.; Mueller, M.; Hubl, M. A survey of the application of machine learning in decision support systems. In Proceedings of the European Conference on Information Systems 2015, Münster, Germany, 26–29 May 2015; pp. 1–15.
- Mishra, N.K.; Celebi, M.E. An Overview of Melanoma Detection in Dermoscopy Images Using Image Processing and Machine Learning. arXiv **2016**, arXiv:1601.07843.
- Amruthnath, N.; Gupta, T. A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In Proceedings of the 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), Singapore, 26–28 April 2018; pp. 355–361.
- Recht, B. A Tour of Reinforcement Learning: The View from Continuous Control. Annu. Rev. Control Robot. Auton. Syst. **2019**, 2, 253–279.
- Asiain, E.; Clempner, J.B.; Poznyak, A.S. Controller exploitation-exploration reinforcement learning architecture for computing near-optimal policies. Soft Comput. **2019**, 23, 3591–3604.
- Bhunia, S.S.; Gurusamy, M. Dynamic attack detection and mitigation in IoT using SDN. In Proceedings of the 2017 27th International Telecommunication Networks and Applications Conference (ITNAC), Melbourne, VIC, Australia, 22–24 November 2017; pp. 1–6.
- Zhang, J.; Ye, M.; Guo, Z.; Yen, C.Y.; Chao, H.J. CFR-RL: Traffic Engineering with Reinforcement Learning in SDN. IEEE J. Sel. Areas Commun. **2020**, 38, 2249–2259.
- Rischke, J.; Sossalla, P.; Salah, H.; Fitzek, F.H.P.; Reisslein, M. QR-SDN: Towards reinforcement learning states, actions, and rewards for direct flow routing in software-defined networks. IEEE Access **2020**, 8, 174773–174791.
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. **2020**, 50, 3826–3839.
- Liu, X.; Xie, L.; Wang, Y.; Zou, J.; Xiong, J.; Ying, Z.; Vasilakos, A.V. Privacy and Security Issues in Deep Learning: A Survey. IEEE Access **2020**, 9.
- Xu, G.; Li, H.; Ren, H.; Yang, K.; Deng, R.H. Data Security Issues in Deep Learning: Attacks, Countermeasures, and Opportunities. IEEE Commun. Mag. **2019**, 57, 116–122.
- Fan, J.; Wang, Z.; Xie, Y.; Yang, Z. A Theoretical Analysis of Deep Q-Learning. arXiv **2019**, arXiv:1901.00137.
- Tafazzol, S.; Fathi, E.; Rezaei, M.; Asali, E. Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning. arXiv **2021**, arXiv:2105.00499.
- Hou, Y.; Liu, L.; Wei, Q.; Xu, X.; Chen, C. A novel DDPG method with prioritized experience replay. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 316–321.
- Isyaku, B.; Mohd Zahid, M.S.; Bte Kamat, M.; Abu Bakar, K.; Ghaleb, F.A. Software Defined Networking Flow Table Management of OpenFlow Switches Performance and Security Challenges: A Survey. Future Internet **2020**, 12, 147.
- Naderializadeh, N.; Sydir, J.; Simsek, M.; Nikopour, H. Resource Management in Wireless Networks via Multi-Agent Deep Reinforcement Learning. IEEE Trans. Wirel. Commun. **2021**, 20, 3507–3523.
- Bedawy, A.; Yorino, N.; Mahmoud, K.; Zoka, Y.; Sasaki, Y. Optimal Voltage Control Strategy for Voltage Regulators in Active Unbalanced Distribution Systems Using Multi-Agents. IEEE Trans. Power Syst. **2020**, 35, 1023–1035.
- Dharmadhikari, C.; Kulkarni, S.; Temkar, S.; Bendale, S.; Student, B.E. A Study of DDoS Attacks in Software Defined Networks. Int. Res. J. Eng. Technol. **2019**, 448–453. Available online: www.irjet.net (accessed on 20 May 2020).
- Akbari, I.; Tahoun, E.; Salahuddin, M.A.; Limam, N.; Boutaba, R. ATMoS: Autonomous Threat Mitigation in SDN using Reinforcement Learning. In Proceedings of the NOMS 2020–2020 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020.
- Phan, T.V.; Gias, T.M.R.; Islam, S.T.; Huong, T.T.; Thanh, N.H.; Bauschert, T. Q-MIND: Defeating stealthy dos attacks in SDN with a machine-learning based defense framework. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019.
- Liu, Y.; Dong, M.; Ota, K.; Li, J.; Wu, J. Deep Reinforcement Learning based Smart Mitigation of DDoS Flooding in Software-Defined Networks. In Proceedings of the 2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Barcelona, Spain, 17–19 September 2018.
- Guo, X.; Lin, H.; Li, Z.; Peng, M. Deep-Reinforcement-Learning-Based QoS-Aware Secure Routing for SDN-IoT. IEEE Internet Things J. **2020**, 7, 6242–6251.
- Phan, T.V.; Islam, S.T.; Nguyen, T.G.; Bauschert, T. Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning. In Proceedings of the 2019 15th International Conference on Network and Service Management (CNSM), Halifax, NS, Canada, 21–25 October 2019.
- Yao, Z.; Wang, Y.; Qiu, X. DQN-based energy-efficient routing algorithm in software-defined data centers. Int. J. Distrib. Sens. Netw. **2020**, 16.
- Stampa, G.; Arias, M.; Sanchez-Charles, D.; Muntes-Mulero, V.; Cabellos, A. A Deep-Reinforcement Learning Approach for Software-Defined Networking Routing Optimization. arXiv **2017**, arXiv:1709.07080.
- Yuan, T.; da Rocha Neto, W.; Rothenberg, C.E.; Obraczka, K.; Barakat, C.; Turletti, T. Dynamic Controller Assignment in Software Defined Internet of Vehicles through Multi-Agent Deep Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. **2021**, 18, 585–596.
- Wu, T.; Zhou, P.; Wang, B.; Li, A.; Tang, X.; Xu, Z.; Chen, K.; Ding, X. Joint Traffic Control and Multi-Channel Reassignment for Core Backbone Network in SDN-IoT: A Multi-Agent Deep Reinforcement Learning Approach. IEEE Trans. Netw. Sci. Eng. **2021**, 8, 231–245.
- Yu, C.; Lan, J.; Guo, Z.; Hu, Y. DROM: Optimizing the Routing in Software-Defined Networks with Deep Reinforcement Learning. IEEE Access **2018**, 6, 64533–64539.
- Gordon, H.; Batula, C.; Tushir, B.; Dezfouli, B.; Liu, Y. Securing Smart Homes via Software-Defined Networking and Low-Cost Traffic Classification. arXiv **2021**, arXiv:2104.00296.
- Van Otterlo, M.; Wiering, M. Reinforcement learning and markov decision processes. Adapt. Learn. Optim. **2012**, 12, 3–42.
- Imani, M.; Ghoreishi, S.F.; Braga-Neto, U.M. Bayesian control of large MDPs with unknown dynamics in data-poor environments. Adv. Neural Inf. Process. Syst. **2018**, 2018-Decem, 8146–8156.
- Asadollahi, S.; Sameer, M. Ryu Controller’s Scalability Experiment on Software Defined Networks. In Proceedings of the 2018 IEEE International Conference on Current Trends in Advanced Computing (ICCTAC), Bangalore, India, 1–2 February 2018; pp. 3–7.
- Li, S. Multi-Agent Deep Deterministic Policy Gradient for Traffic Signal Control on Urban Road Network. In Proceedings of the 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Dalian, China, 25–27 August 2020; pp. 896–900.

| Scenario | Parameter | Value |
|---|---|---|
| Scenario 1 | No. of devices | 20 |
| | No. of controllers | 1 |
| | No. of switches | 5 |
| Scenario 2 | No. of devices | 60 |
| | No. of controllers | 1 |
| | No. of switches | 8 |
| Both scenarios | Bandwidth | 50–100 Mbps |
| | Traffic type | TCP |
| | Queue type | Tail Drop |
| | Transmission rate | 2 Mbps |
| | SDN controller | Ryu |
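For readers who want to reuse these settings in a script, the simulation parameters above could be captured as plain configuration objects. The sketch below is only one possible encoding: the `Scenario` class, its field names and the `devices_per_switch` helper are hypothetical, and it assumes that the bandwidth, traffic type, queue type, transmission rate and controller rows apply to both scenarios and that devices are spread evenly across switches.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Scenario:
    name: str
    devices: int
    controllers: int
    switches: int
    # Rows listed without a scenario label; assumed shared by both scenarios.
    bandwidth_mbps: Tuple[int, int] = (50, 100)
    traffic_type: str = "TCP"
    queue_type: str = "Tail Drop"
    transmission_rate_mbps: int = 2
    sdn_controller: str = "Ryu"

    def devices_per_switch(self) -> float:
        """Average device count per switch, assuming an even spread."""
        return self.devices / self.switches


SCENARIOS = [
    Scenario(name="Scenario 1", devices=20, controllers=1, switches=5),
    Scenario(name="Scenario 2", devices=60, controllers=1, switches=8),
]
```

Under the even-spread assumption, Scenario 2 places almost twice as many devices behind each switch as Scenario 1 (7.5 versus 4).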

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Dake, D.K.; Gadze, J.D.; Klogo, G.S.; Nunoo-Mensah, H.
Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention. *Technologies* **2021**, *9*, 44.
https://doi.org/10.3390/technologies9030044
