An EnergyEfficient Routing Protocol with Reinforcement Learning in SoftwareDefined Wireless Sensor Networks
Abstract
:1. Introduction
 Identify the limitations of current RLbased routing schemes.
 Propose an intelligent multiobjective routing protocol named DOSRL for SDWSNIoT.
 Present distinctive features of the proposed routing scheme.
 Implement and evaluate the performance of the proposed solution in comparison to the OSPF and SDNQ routing schemes in terms of energy efficiency and other related parameters.
2. Background and Related Work
2.1. SDWSNIoT
 Adaptive Optimization: Machine learning algorithms can analyze realtime network conditions and historical data to make intelligent decisions for optimizing energy consumption in IoT applications. By dynamically adapting to changing network conditions, the algorithms can optimize the operation of sensor nodes, reducing unnecessary energy consumption [21].
 Intelligent Routing: Machine learning algorithms can learn from past routing patterns and make predictions about future network conditions. This enables them to intelligently route data, considering factors such as energy constraints, network congestion, and node availability. By choosing efficient routing paths, energy consumption can be minimized while maintaining effective data transmission [22,23,24].
 Proactive Resource Allocation: Machine learning algorithms can analyze data patterns and predict resource demands in advance. This allows for proactive resource allocation, ensuring that resources, such as energy, computational power, and memory, are efficiently allocated to meet the demands of IoT applications. By optimizing resource allocation, energy efficiency can be significantly improved [25].
 Anomaly Detection: Machine learning algorithms can detect anomalies and abnormal behavior in the network. By identifying unusual patterns or events that may lead to energy wastage or inefficiency, proactive measures can be taken to mitigate such issues and optimize energy efficiency in IoT applications [26].
 Predictive Maintenance: Machine learning algorithms can analyze sensor data and predict potential failures or maintenance requirements. By proactively addressing maintenance needs, energy can be saved by preventing unexpected downtime and optimizing the overall performance of IoT devices [27].
2.2. Reinforcement LearningBased SDWSNIoT
3. Methodology
3.1. Preliminary
3.2. Correlated MultiObjective EnergyEfficient Routing Scheme for SDWSNIoT
3.2.1. Dynamic Objective Selection RL with Shaped Rewards
Algorithm 1 Dynamic Objective Selection (DOS) 

3.2.2. Components and Architecture of the Proposed DOSRL for the SDWSNIoT Scheme
3.2.3. The DOSRL Routing Scheme
 The Reward Function (R): In reinforcement learning, the agent evaluates the effectiveness of its actions and improves its policy by relying on rewards collected from the environment. The rewards obtained are typically dependent on the actions taken, with varying actions resulting in differing rewards. To implement the proposed DOSRL scheme, we define three reward functions, each corresponding to one of the correlated objectives: the selection of routing paths with sufficient energy for data forwarding (${o}_{1}$), load balancing (${o}_{2}$), and the selection of paths with a good link quality for a reliable data transfer (${o}_{3}$). Route selection based on any of these objectives is expected to improve the energy efficiency of the IoT network system. Details on each objective of the DOSRL scheme and its related reward functions are provided below:
 (a)
 Energyconsumption: To achieve objective ${o}_{1}$, we consider the energyconsumption parameter, which is a critical factor in determining the overall energy efficiency of an SDWSN: for instance, in a scenario where node i forwards a packet to node j. The reward received by node i for selecting node j as its relay node in terms of energy consumption is estimated by the reward function $R\_{E}_{i,j}$ for the stateaction pair, $\left({s}_{i},{s}_{j}\right)$ as:$$R\_{E}_{i,j}=Remaining\_Energy({s}_{j})Remaining\_Energy({s}_{i})$$
 (b)
 Load balance: To optimize energy efficiency and network performance in the SDWSN, the careful selection of relay nodes and balanced workload distribution are crucial. Otherwise, some nodes along overused paths may become overloaded, leading to bottlenecks and delays, which can result in a degraded network performance. For objective ${o}_{2}$, we utilize the parameteravailable buffer length to estimate the degree of queue congestion in relay nodes. The reward $R\_{Q}_{i,j}$ for load balancing as observed by node i when selecting node j is computed as follows:$$R\_{Q}_{i,j}=\frac{Available\_Buffer\_Length\left({s}_{j}\right)}{Max\_Buffer\_Length}\phantom{\rule{1.em}{0ex}}\frac{Load\left({s}_{i}\right)}{Max\_Load}\phantom{\rule{1.em}{0ex}}$$
 (c)
 Link quality: A wireless link can be measured by retrieving useful information from either the sender or receiver side. To achieve ${o}_{3}$, we use simple measurements to estimate the link quality based on the parameter packet reception ratio (PRR), measured as the ratio of the total number of packets successfully received to the total number of packets transmitted through a specific wireless link between two nodes. Unlike other sophisticated techniques, this approach involves a low computation and communication overhead. Instead of using an instant value of the PRR, we calculate an averageovertime using an ExponentiallyWeighted Moving Average (EWMA) filter. Suppose node i is forwarding data packets to node j, the reward $R\_L{Q}_{i,j}$ received by node j based on PRR using EWMA is estimated as follows:$$R\_L{Q}_{i,j}=PRR\_es{t}_{i,j}(k1)\xb7{\alpha}_{EWMA}+PRR\_sampl{e}_{i,j}\left(k\right)\xb7(1{\alpha}_{EWMA}),$$
 The State Space (S): We define the state space as a graph corresponding to the global topology created by the RNs on the data plane, as seen by the intelligent RL controller. Each state in the state space corresponds to an RN, and a state transition refers to a link connecting two RNs. The intelligent RL controller uses the partial maps created by the topology discovery module to create a global topology. Therefore, the cardinality of the set of states depends on the number of nodes that can actively participate in routing.
 The Action Space (A): The action space, denoted as A, includes all possible actions that an agent can undertake from a given state of the RL environment. It defines the choices available to the agent at each time step, presenting the range of options to the agent. In our specific problem, the discrete action space comprises a finite number of actions that the RL agent can select when in a particular state ${s}_{i}\in S$. The cardinality of A at state i is determined by the number of nodes eligible to participate in the routing process from that specific state.
 The Optimal Policy (P): The policy determines how the learning agent should behave when it is at a given state with the purpose of maximizing the reward value in the learning process. Our proposed scheme estimates the Qfunction of every objective o simultaneously and decides, before every action selection decision, which objective estimate ${o}_{best}$ an agent will consider in its decisionmaking process. We use the concept of confidence on computed Qvalues by representing each action as a distribution and using a normal distribution of Qvalues and the mean values to keep track of the variance as shown in Equation (2). The agent approximates the optimal Qfunction by visiting all pairs of actionstates and stores the updated Qvalues in the Qtable. In our proposed scheme, the approximated Qvalue $Q({s}_{t},{a}_{t})$ represents the expected cumulative reward when the RL agent is in the state ${s}_{t}$ and takes action ${a}_{t}$, transitioning to a new state ${s}_{t+1}$ while maximizing the cumulative rewards for an objective ${o}_{n}$, where $n\in N$ and N denote the set of all objectives. The Qlearning equation to update Qvalues is designed as shown below in Equation (7):$${Q}^{n}({s}_{t},{a}_{t})\leftarrow (1\alpha )\xb7{Q}^{n}({s}_{t},{a}_{t})+\alpha \xb7[{R}_{t+1}^{n}+\gamma \xb7\underset{\forall a\in A}{max}\left\{{Q}^{n}({s}_{t+1},a)\right\}].$$To steer the exploration behavior of the learning agent by incorporating some heuristic knowledge on the problem domain, we introduce an extra reward ${F}_{t+1}$ onto the reward received from the environment ${R}_{t+1}$. The newly added shaping reward function F is included when updating the Qlearning rule as follows:$${Q}^{n}({s}_{t},{a}_{t})\leftarrow (1\alpha )\xb7{Q}^{n}({s}_{t},{a}_{t})+\alpha \xb7[{R}_{t+1}^{n}+{F}_{t+1}+\gamma \xb7\underset{\forall a\in A}{max}{Q}^{n}({s}_{t+1},a)].$$To avoid changes on the optimal policy, F is implemented as the difference of some potential value, ${F}_{t+1}$ = $\gamma \varphi (t+1)\varphi \left(t\right)$ over the state space, where $\varphi $ is a potential function that provides some hints on states. In our study, we define $\varphi $ as the Normalized Euclidean Distance between the current state(s) and the goal state(G) expressed as:$$\varphi =1\frac{\mathrm{distance}({S}_{t},G)}{{max}_{\forall x\in S}\left(\mathrm{distance}(x,G)\right)},$$
 In DOS Qrouting: To find the best actionvalue of the Qfunction, the learning agents use an action selection mechanism to tradeoff between the exploitation and exploration of available action space. To achieve this, our proposed scheme uses the $\u03f5$greedy exploration and exploitation method, whereby $\u03f5\in [0,1]$, allowing the agent to exploit with probability pr = $\u03f5$ and explore with probability pr = 1 −$\u03f5$. The agent action selection is determined by a randomly generated number $x\in [0,1]$, of which if $x<\u03f5$, the agent exploits it by taking an action that returns the most expected optimal value; otherwise, it explores it by selecting a random action based on the most confident objective ${o}_{best}$ as observed from the current state by the learning agent as shown below:$$\mathrm{actionSelection}=\left(\right)open="\{"\; close>\begin{array}{cc}\underset{a\in A}{max}\left\{Q(s,a,{o}_{\mathrm{best}})\right\},\hfill & \mathrm{if}\phantom{\rule{4.pt}{0ex}}x\u03f5\hfill \\ \mathrm{randomAction}(Q(s,a,{o}_{\mathrm{best}}),\cdots ,Q(s,{a}_{n},{o}_{\mathrm{best}})),\hfill & \mathrm{otherwise}\hfill \end{array}$$
3.2.4. DOSRL Routing Algorithm
Algorithm 2 The DOSRL with Shaped Rewards Routing Scheme 

 Initialization and Setup: The intelligent controller is initialized and assigned an IP address. Partial topologies received from the controllers are used to create a global topology containing the eligible RNs. Set the learning rate $\alpha $, discount factor $\gamma $, and exploration rate $\epsilon $, and define the potential function $\varphi $ to provide hints on states for shaping rewards (Lines 2 to 7).
 LoopFree Route Calculation: Loopfree routes are computed using the Spanning Tree Protocol (STP) algorithm for all given sourcesink pairs. This step ensures that the Qlearning algorithm focuses only on exploring states that lead to the end goal. Here, a source node is defined as any RN that receives data from the SN(s) at the sensing layer (Line 8).
 QRouting Exploration: The set of computed loopfree routes serves as input for the SDN intelligent controller, which uses the Qrouting algorithm to find the best path for routing. The Qlearning process runs for a specified number of episodes or until the final state is reached. For each state ${S}_{t}$, compute Qvalues for every action ${a}_{t}$ for each objective o simultaneously, based on the current Qtable values (Lines 9 to 13).
 QLearning Process: During the learning process, the RL agent dynamically chooses the learning objective from the current state, takes the next action, and moves to the next state. Regarding the randomly generated number x ranging between 0 and 1, according to the exploration and exploitation $\epsilon $greedy method, if x < $\epsilon $, then exploit with the probability $pr=\epsilon $ by selecting the action with the highest Qvalue for the most confident objective ${o}_{best}$. Otherwise, explore with the probability $pr=1\epsilon $ by randomly selecting an action for the most confident objective ${o}_{best}$ (Lines 14 to 19).
 Optimal Route Determination: The RL agent uses the generated Qtable to determine the most rewarding route between the source and sink nodes. This is based on the stateaction combinations that received the highest Qvalues after completing a transition. Finally, all found optimum routes are stored and sent to respective controllers. These routes are installed or updated in the routing table of the RNs for immediate use in the network.
3.2.5. Algorithm Time Complexity Analysis
4. Performance Evaluation
4.1. Simulation Environment
4.2. Simulation Settings
4.3. Simulation Parameters Setup
4.4. Performance Evaluation
 PDR: This refers to the ratio of successfully delivered packets to the total number of packets sent within the network. It is a measure of the effectiveness of the routing protocols and communication infrastructure in ensuring that packets reach their intended destinations without loss or errors. A higher PDR indicates a more reliable and efficient network, whereas a lower PDR suggests potential issues such as packet loss, congestion, or faulty routing. Monitoring and optimizing the PDR is crucial in evaluating and improving the overall performance and reliability of the WSNs.
 E2E: This is the time it takes for a packet to travel from the source node to the destination node. It includes processing, queuing, transmission, and propagation delays. A lower delay is better for faster and more efficient communication. Minimizing the endtoend delay is crucial in the WSNs to ensure timely and reliable data delivery.
 EE: This refers to the ability of the network to perform its intended tasks and communication while utilizing minimal energy resources. We defined this metric as the ratio of the PDR to the average energy consumed by the RNs. It is a critical consideration in the WSNs due to the limited and often nonrechargeable energy sources available to the sensor nodes. By enhancing energy efficiency, the WSNs can achieve a longer network lifetime, extended monitoring capabilities, and reduced maintenance requirements.
4.4.1. Packet Delivery Ratio of Routing Protocols
4.4.2. EndToEnd Delay of Routing Protocols
4.4.3. Energy Efficiency of Routing Protocols
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
 Silva, B.N.; Khan, M.; Han, K. Towards sustainable smart cities: A review of trends, architectures, components, and open challenges in smart cities. Sustain. Cities Soc. 2018, 38, 697–713. [Google Scholar] [CrossRef]
 Jin, J.; Gubbi, J.; Marusic, S.; Palaniswami, M. An information framework for creating a smart city through internet of things. IEEE Internet Things J. 2014, 1, 112–121. [Google Scholar] [CrossRef]
 Syed, A.S.; SierraSosa, D.; Kumar, A.; Elmaghraby, A. IoT in smart cities: A survey of technologies, practices and challenges. Smart Cities 2021, 4, 429–475. [Google Scholar] [CrossRef]
 Mainetti, L.; Patrono, L.; Vilei, A. Evolution of wireless sensor networks towards the internet of things: A survey. In Proceedings of the IEEE SoftCOM 2011, 19th International Conference on Software, Telecommunications and Computer Networks, Split, Croatia, 15–17 September 2011; pp. 1–6. [Google Scholar]
 Ngu, A.H.; Gutierrez, M.; Metsis, V.; Nepal, S.; Sheng, Q.Z. IoT middleware: A survey on issues and enabling technologies. IEEE Internet Things J. 2016, 4, 1–20. [Google Scholar] [CrossRef]
 Ijemaru, G.K.; Ang, K.L.M.; Seng, J.K. Wireless power transfer and energy harvesting in distributed sensor networks: Survey, opportunities, and challenges. Int. J. Distrib. Sens. Netw. 2022, 18, 15501477211067740. [Google Scholar] [CrossRef]
 Amini, N.; Vahdatpour, A.; Xu, W.; Gerla, M.; Sarrafzadeh, M. Cluster size optimization in sensor networks with decentralized clusterbased protocols. Comput. Commun. 2012, 35, 207–220. [Google Scholar] [CrossRef] [PubMed]
 Kobo, H.I.; AbuMahfouz, A.M.; Hancke, G.P. A survey on softwaredefined wireless sensor networks: Challenges and design requirements. IEEE Access 2017, 5, 1872–1899. [Google Scholar] [CrossRef]
 Zhao, Y.; Li, Y.; Zhang, X.; Geng, G.; Zhang, W.; Sun, Y. A survey of networking applications applying the software defined networking concept based on machine learning. IEEE Access 2019, 7, 95397–95417. [Google Scholar] [CrossRef]
 Da Xu, L.; He, W.; Li, S. Internet of things in industries: A survey. IEEE Trans. Ind. Inform. 2014, 10, 2233–2243. [Google Scholar]
 Sharma, N.; Shamkuwar, M.; Singh, I. The history, present, and future with IoT. In Internet of Things and Big Data Analytics for Smart Generation; Springer: Cham, Switzerland, 2019; pp. 27–51. [Google Scholar]
 Al Nuaimi, E.; Al Neyadi, H.; Mohamed, N.; AlJaroodi, J. Applications of big data to smart cities. J. Internet Serv. Appl. 2015, 6, 1–15. [Google Scholar] [CrossRef]
 Nishi, H.; Nakamura, Y. IoTbased monitoring for smart community. Urban Syst. Des. 2020, 335–344. [Google Scholar]
 AlTurjman, F.; Zahmatkesh, H.; Shahroze, R. An overview of security and privacy in smart cities’ IoT communications. Trans. Emerg. Telecommun. Technol. 2022, 33, e3677. [Google Scholar] [CrossRef]
 Bukar, U.A.; Othman, M. Architectural design, improvement, and challenges of distributed softwaredefined wireless sensor networks. Wirel. Pers. Commun. 2022, 122, 2395–2439. [Google Scholar] [CrossRef]
 Faheem, M.; Butt, R.A.; Raza, B.; Ashraf, M.W.; Ngadi, M.A.; Gungor, V.C. Energy efficient and reliable data gathering using internet of softwaredefined mobile sinks for WSNsbased smart grid applications. Comput. Stand. Interfaces 2019, 66, 103341. [Google Scholar] [CrossRef]
 Luo, T.; Tan, H.P.; Quek, T.Q. Sensor OpenFlow: Enabling softwaredefined wireless sensor networks. IEEE Commun. Lett. 2012, 16, 1896–1899. [Google Scholar] [CrossRef]
 Mathebula, I.; Isong, B.; Gasela, N.; AbuMahfouz, A.M. Analysis of SDNbased security challenges and solution approaches for SDWSN usage. In Proceedings of the 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), Vancouver, BC, Canada, 12–14 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1288–1293. [Google Scholar]
 JuradoLasso, F.F.; Marchegiani, L.; Jurado, J.F.; AbuMahfouz, A.M.; Fafoutis, X. A survey on machine learning softwaredefined wireless sensor networks (mlSDWSNS): Current status and major challenges. IEEE Access 2022, 10, 23560–23592. [Google Scholar] [CrossRef]
 Shah, S.F.A.; Iqbal, M.; Aziz, Z.; Rana, T.A.; Khalid, A.; Cheah, Y.N.; Arif, M. The role of machine learning and the internet of things in smart buildings for energy efficiency. Appl. Sci. 2022, 12, 7882. [Google Scholar] [CrossRef]
 Suryadevara, N.K. Energy and latency reductions at the fog gateway using a machine learning classifier. Sustain. Comput. Inform. Syst. 2021, 31, 100582. [Google Scholar] [CrossRef]
 Musaddiq, A.; Nain, Z.; Ahmad Qadri, Y.; Ali, R.; Kim, S.W. Reinforcement learningenabled crosslayer optimization for lowpower and lossy networks under heterogeneous traffic patterns. Sensors 2020, 20, 4158. [Google Scholar] [CrossRef]
 Mao, Q.; Hu, F.; Hao, Q. Deep learning for intelligent wireless networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2018, 20, 2595–2621. [Google Scholar] [CrossRef]
 Alsheikh, M.A.; Lin, S.; Niyato, D.; Tan, H.P. Machine learning in wireless sensor networks: Algorithms, strategies, and applications. IEEE Commun. Surv. Tutor. 2014, 16, 1996–2018. [Google Scholar] [CrossRef]
 Wang, M.; Cui, Y.; Wang, X.; Xiao, S.; Jiang, J. Machine learning for networking: Workflow, advances and opportunities. IEEE Netw. 2017, 32, 92–99. [Google Scholar] [CrossRef]
 DeMedeiros, K.; Hendawi, A.; Alvarez, M. A survey of AIbased anomaly detection in IoT and sensor networks. Sensors 2023, 23, 1352. [Google Scholar] [CrossRef] [PubMed]
 Kanawaday, A.; Sane, A. Machine learning for predictive maintenance of industrial machines using IoT sensor data. In Proceedings of the 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 24–26 November 2017; pp. 87–90. [Google Scholar]
 Sellami, B.; Hakiri, A.; Yahia, S.B.; Berthou, P. Energyaware task scheduling and offloading using deep reinforcement learning in SDNenabled IoT network. Comput. Netw. 2022, 210, 108957. [Google Scholar] [CrossRef]
 Ouhab, A.; Abreu, T.; Slimani, H.; Mellouk, A. Energyefficient clustering and routing algorithm for largescale SDNbased IoT monitoring. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6. [Google Scholar]
 Xu, S.; Liu, Q.; Gong, B.; Qi, F.; Guo, S.; Qiu, X.; Yang, C. RJCC: Reinforcementlearningbased joint communicationalandcomputational resource allocation mechanism for smart city IoT. IEEE Internet Things J. 2020, 7, 8059–8076. [Google Scholar] [CrossRef]
 Yao, H.; Mai, T.; Xu, X.; Zhang, P.; Li, M.; Liu, Y. NetworkAI: An intelligent network architecture for selflearning control strategies in software defined networks. IEEE Internet Things J. 2018, 5, 4319–4327. [Google Scholar] [CrossRef]
 Younus, M.U.; Khan, M.K.; Bhatti, A.R. Improving the softwaredefined wireless sensor networks routing performance using reinforcement learning. IEEE Internet Things J. 2021, 9, 3495–3508. [Google Scholar] [CrossRef]
 Yu, C.; Lan, J.; Guo, Z.; Hu, Y. DROM: Optimizing the routing in softwaredefined networks with deep reinforcement learning. IEEE Access 2018, 6, 64533–64539. [Google Scholar] [CrossRef]
 Andres, A.; VillarRodriguez, E.; Martinez, A.D.; Del Ser, J. Collaborative exploration and reinforcement learning between heterogeneously skilled agents in environments with sparse rewards. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Virtual, 18–22 July 2021; pp. 1–10. [Google Scholar]
 Mammeri, Z. Reinforcement learning based routing in networks: Review and classification of approaches. IEEE Access 2019, 7, 55916–55950. [Google Scholar] [CrossRef]
 Marler, R.T.; Arora, J.S. The weighted sum method for multiobjective optimization: New insights. Struct. Multidiscip. Optim. 2010, 41, 853–862. [Google Scholar] [CrossRef]
 Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
 Brys, T.; Van Moffaert, K.; Nowé, A.; Taylor, M.E. Adaptive objective selection for correlated objectives in multiobjective reinforcement learning. In Proceedings of the 2014 International Conference on Autonomous Agents and MultiAgent Systems, Paris, France, 5–9 May 2014; pp. 1349–1350. [Google Scholar]
 Alkama, L.; BoualloucheMedjkoune, L. IEEE 802.15. 4 historical revolution versions: A survey. Computing 2021, 103, 99–131. [Google Scholar] [CrossRef]
 Maleki, M.; Hakami, V.; Dehghan, M. A modelbased reinforcement learning algorithm for routing in energy harvesting mobile adhoc networks. Wirel. Pers. Commun. 2017, 95, 3119–3139. [Google Scholar] [CrossRef]
 Henderson, T.R.; Lacage, M.; Riley, G.F.; Dowell, C.; Kopena, J. Network simulations with the ns3 simulator. SIGCOMM Demonstr. 2008, 14, 527. [Google Scholar]
 Yin, H.; Liu, P.; Liu, K.; Cao, L.; Zhang, L.; Gao, Y.; Hei, X. ns3ai: Fostering artificial intelligence algorithms for networking research. In Proceedings of the 2020 Workshop on ns3, Gaithersburg, MD, USA, 17–18 June 2020; pp. 57–64. [Google Scholar]
 Gawłowicz, P.; Zubow, A. Ns3 meets openai gym: The playground for machine learning in networking research. In Proceedings of the 22nd International ACM Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Miami Beach, FL, USA, 25–29 November 2019; pp. 113–120. [Google Scholar]
Parameters  Value  Description 

$\alpha $  0.1  Learning rate 
$\delta $  0.99  The discount factor 
$\u03f5$  0.3  Exploration rate 
$\phi $  0.01  Minimum exploration rate 
$\beta $  0.995  Exploration decay rate 
Parameters  Value/Description 

Traffic type  UDP 
Learning rate $\alpha $  0.3 
Number of RNs  30, 45, and 60 
Discount factor $\delta $  0.85 
Packet size (Bits)  512 
Simulation Time (s)  200 
Number of source nodes  2, 4, 6, 8, and 10 
Deployment of sensor nodes  Random 
Packet generation rate (pkts/sec)  10 
The initial energy of sensor nodes  5 Joules 
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. 
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Godfrey, D.; Suh, B.; Lim, B.H.; Lee, K.C.; Kim, K.I. An EnergyEfficient Routing Protocol with Reinforcement Learning in SoftwareDefined Wireless Sensor Networks. Sensors 2023, 23, 8435. https://doi.org/10.3390/s23208435
Godfrey D, Suh B, Lim BH, Lee KC, Kim KI. An EnergyEfficient Routing Protocol with Reinforcement Learning in SoftwareDefined Wireless Sensor Networks. Sensors. 2023; 23(20):8435. https://doi.org/10.3390/s23208435
Chicago/Turabian StyleGodfrey, Daniel, BeomKyu Suh, Byung Hyun Lim, KyuChul Lee, and KiIl Kim. 2023. "An EnergyEfficient Routing Protocol with Reinforcement Learning in SoftwareDefined Wireless Sensor Networks" Sensors 23, no. 20: 8435. https://doi.org/10.3390/s23208435