Toward an Adaptive Threshold on Cooperative Bandwidth Management Based on Hierarchical Reinforcement Learning
Abstract
1. Introduction
2. Materials and Methods
- Each fragment has only a single fog node, and all IoT devices in a fragment are connected to the fog node of that fragment;
- In each time step, every fog node sends its help list to all neighboring fog nodes; the list contains the node's situation indicator (zm) and its lowest-priority devices together with their current bandwidth;
- Fog node cooperation is performed through a dedicated wireless interface; therefore, communication between fog nodes uses frequencies different from those assigned to communication between a fog node and its connected IoT devices.
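The help-list exchange described in the assumptions above can be illustrated with a minimal Python sketch. The `HelpList` class, the helpers `build_help_list` and `neighbor_situation`, the parameter `k`, and the convention that a smaller number means a lower priority are assumptions made only for illustration; the text states only that each fog node shares zm and its lowest-priority devices with their current bandwidth.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class HelpList:
    """What one fog node advertises to its neighbors in one time step."""
    fog_id: str
    z_m: int  # situation indicator of the sending fragment (assumed binary)
    # (device id, priority, current bandwidth) of the lowest-priority devices
    candidates: list = field(default_factory=list)


def build_help_list(fog_id, z_m, devices, k=2):
    """Advertise the k lowest-priority devices.

    `devices` maps device id -> (priority, bandwidth); a smaller priority
    number is assumed to mean a less important device.
    """
    ranked = sorted(devices.items(), key=lambda item: item[1][0])
    return HelpList(fog_id, z_m, [(d, p, bw) for d, (p, bw) in ranked[:k]])


def neighbor_situation(received):
    """Average the z_m values received from all neighboring fog nodes."""
    return mean(h.z_m for h in received)


devices = {"camera": (3, 4.0), "thermostat": (1, 1.0), "lock": (2, 2.0)}
own_list = build_help_list("fog-A", z_m=0, devices=devices)
z_n = neighbor_situation([HelpList("fog-B", 1), HelpList("fog-C", 0)])
```

The averaged value corresponds to the quantity the paper calls zn; how that average is mapped to a binary indicator is not specified here, so the sketch leaves it as a plain mean.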
3. Modeling and Results
3.1. First Learning Hierarchy Level: Learning the Best Threshold Value
3.1.1. State
3.1.2. Action
3.1.3. Policy
3.1.4. Reward Function
Algorithm 1: Learning the best threshold level in the first learning hierarchy level.
1. Input: Initialize time step (n), α, γ,, Threshold, and matrices. Make as the accurate threshold matrix based on the transition states.
2. While not converged, do
   1. Generate two binary random values as
   2. Determine the accurate threshold, Th, based on
   3. With probability ɛ, randomly choose a value for the threshold from levels.
   4. Otherwise,
   5. If )
   6. else
   7. End
   8. Update the matrix
   9.
   10. Increase the time step n by 1
3. End while
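Because the equations of Algorithm 1 are not reproduced here, the following Python sketch shows only one plausible reading of its loop: tabular ɛ-greedy Q-learning over the state (zn, zm, Thold) with the action Thnew. The +1/−1 reward, the standard Q-learning update, the fixed number of steps in place of a convergence test, the three threshold levels, and the names `learn_threshold`, `greedy`, and `toy_target` are all assumptions rather than the authors' formulation. In particular, `toy_target` is a placeholder for the accurate threshold matrix.

```python
import random


def greedy(q, state, levels):
    """Return the level with the highest Q-value in `state` (ties: first level)."""
    return max(levels, key=lambda a: q.get((state, a), 0.0))


def learn_threshold(target, levels=(1, 2, 3), alpha=0.5, gamma=0.9,
                    epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy tabular Q-learning of the threshold level (assumed form)."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> Q-value; missing entries count as 0.0
    th_old = levels[0]
    z_n, z_m = rng.randint(0, 1), rng.randint(0, 1)  # two binary random values
    for _ in range(steps):
        state = (z_n, z_m, th_old)
        correct = target(z_n, z_m, th_old)       # accurate threshold for this state
        if rng.random() < epsilon:               # explore with probability epsilon
            th_new = rng.choice(levels)
        else:                                    # otherwise exploit the Q-table
            th_new = greedy(q, state, levels)
        reward = 1.0 if th_new == correct else -1.0   # assumed reward shape
        # Draw the next situation so the update can bootstrap from it.
        z_n, z_m = rng.randint(0, 1), rng.randint(0, 1)
        next_state = (z_n, z_m, th_new)
        best_next = max(q.get((next_state, a), 0.0) for a in levels)
        old = q.get((state, th_new), 0.0)
        q[(state, th_new)] = old + alpha * (reward + gamma * best_next - old)
        th_old = th_new
    return q


def toy_target(z_n, z_m, th_old):
    """Hypothetical stand-in target: the level rises with both indicators."""
    return 1 + z_n + z_m


q_table = learn_threshold(toy_target, gamma=0.0)
learned = greedy(q_table, (1, 1, 2), (1, 2, 3))
```

With `gamma=0.0` the learner is purely myopic, which makes the learned greedy policy easy to compare against the target; the default `gamma=0.9` lets the choice of Thnew also account for the state it leads to.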
3.2. Second Learning Hierarchy Level: Learning the Best Helper
3.2.1. State
3.2.2. Action
3.2.3. Policy
3.2.4. Reward Function
Algorithm 2: Second-level algorithm of the learning hierarchy.
1. Initialization
2. While not converged, do
   1. For each element i of the learner’s flag, do
      1. If flag(i) changes from 1 to 0, do
         Find device u that helped device i among all devices, including neighbors’ helpers, and return the borrowed bandwidth from di to du
      2. Else if flag(i) changes from 0 to 1, do
         a. Make a feasible device set, including the node’s own feasible devices and neighbors’ feasible help lists.
         b. With probability ɛ, randomly choose device k among the feasible device set, otherwise: , where j is the index of the selected.
         c. Increase the bandwidth of di and decrease the bandwidth of dj by as much as di needs.
         d. Calculate penalty or punish(i) or blame(i) based on the learner’s action and then calculate RnQ:
         e. Update Q matrix:
      3. End if
   2. End for
   3. Make the next flag based on the learner’s device priorities.
   4. Repeat lines 3 to 26 for neighboring fog nodes
   5. Calculate SBMn using countpun considering all fragments
   6. Calculate the average SBM
   7. Increase the time step n by 1
3. End while
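The bookkeeping steps of Algorithm 2 (choosing a helper, lending bandwidth, returning it, and counting better candidates that were passed over) can be sketched in Python as follows. The function names, the dictionary-based data layout, and the convention that a smaller number means a lower priority are hypothetical. The reward RnQ and the Q-matrix update are deliberately left out because their equations are not given here.

```python
import random


def choose_helper(q_row, feasible, epsilon, rng):
    """Epsilon-greedy choice among feasible helper ids; q_row maps id -> Q-value."""
    if rng.random() < epsilon:
        return rng.choice(feasible)
    return max(feasible, key=lambda k: q_row.get(k, 0.0))


def lend_bandwidth(bandwidth, needy, helper, amount):
    """Move `amount` from the helper to the needy device and record the loan."""
    bandwidth[helper] -= amount
    bandwidth[needy] += amount
    return (needy, helper, amount)


def return_bandwidth(bandwidth, loan):
    """Give the borrowed bandwidth back once flag(i) drops from 1 to 0."""
    needy, helper, amount = loan
    bandwidth[needy] -= amount
    bandwidth[helper] += amount


def count_lower_priority(feasible, priority, chosen):
    """Count feasible helpers whose priority is lower than the chosen helper's."""
    return sum(1 for k in feasible if priority[k] < priority[chosen])


rng = random.Random(0)
bandwidth = {"d1": 2.0, "d2": 5.0, "d3": 4.0}
priority = {"d1": 3, "d2": 1, "d3": 2}
feasible = ["d2", "d3"]

helper = choose_helper({"d2": 0.9, "d3": 0.2}, feasible, epsilon=0.0, rng=rng)
loan = lend_bandwidth(bandwidth, "d1", helper, 1.5)
missed = count_lower_priority(feasible, priority, helper)
```

Keeping the loan as an explicit record makes the later return step a simple reversal, which is one way to realize the "return the borrowed bandwidth" step without searching all devices.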
3.3. Results
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
Parameters | Meaning |
---|---|
n | Entire system time steps |
nd | Number of Internet of Things devices connected to the fog node in each fragment |
nb | Number of neighboring fragments |
nQ | Learning time steps |
ɛ | Probability of random selection |
α | Learning coefficient |
γ | Constant discount factor |
p | Predefined device priorities |
zm | Situation indicator of the fragment |
zn | Average of zm of all neighbors |
Thl | Threshold length |
Thold | Last threshold level |
Thnew | New selected threshold level |
Threshold | Contains Thnew based on zn, zm, and Thold and is continuously updated while learning the best threshold level |
NTh | Number of Threshold elements |
Thc | Correct threshold levels, based on the internal and external situations, as the target threshold matrix |
RTh | Received reward of learning the best threshold level through reinforcement learning |
QTh | Represents the Q-value of the selected threshold level |
punish | Number of feasible helpers with lower priority than the emergency device priority |
countpun | Total feasible helpers with lower priority than the selected helpers for all needy devices in time step n |
penalty | Number of all devices with lower priority than the current needy device priority |
blame | Punishment when choosing a neighboring device when feasible devices have lower priority than the threshold |
flag | flag(i) = 0 indicates that device i is in a normal situation, and flag(i) = 1 indicates an emergency |
track | track(i) = 1 indicates that device i has received extra bandwidth |
fnort | fnort(i) = 0 indicates that needy device i has not received extra bandwidth |
a | Selected action (helper) |
RnQ | Received reward for the fog node |
Qnew | Represents the Q-value of the selected device k for helping device i |
SBM | Successful bandwidth management |
No. | zn | zm | Policy: Decreasing Thold | Policy: Helping Neighbors |
---|---|---|---|---|
1 | 0 | 0 | 0 | 0 |
2 | 0 | 1 | 0 | 1 |
3 | 1 | 0 | 1 | 0 |
4 | 1 | 1 | 1 | 1 |
No. | State: zn | State: zm | State: Thold | Action: Thnew |
---|---|---|---|---|
1 | 0 | 0 | X | 2 |
2 | 0 | 1 | 1, 2 | 1 |
2 | 0 | 1 | 3 | 2 |
3 | 1 | 0 | 1 | 2 |
3 | 1 | 0 | 2, 3 | 3 |
4 | 1 | 1 | 1 | 1 |
4 | 1 | 1 | 2, 3 | 3 |
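The state-to-action table above can be read as a lookup from (zn, zm, Thold) to the target Thnew. The short Python sketch below encodes that reading. It assumes three threshold levels (1 to 3), that "X" means any Thold, and that each continuation row belongs to the state listed directly above it. The names `TARGET_RULES` and `target_threshold` are illustrative only.

```python
# (z_n, z_m) -> {Th_old: Th_new}; "X" in the table is expanded to all levels.
TARGET_RULES = {
    (0, 0): {1: 2, 2: 2, 3: 2},
    (0, 1): {1: 1, 2: 1, 3: 2},
    (1, 0): {1: 2, 2: 3, 3: 3},
    (1, 1): {1: 1, 2: 3, 3: 3},
}


def target_threshold(z_n, z_m, th_old):
    """Look up the target threshold level for a state, as read from the table."""
    return TARGET_RULES[(z_n, z_m)][th_old]


example = target_threshold(0, 1, 3)
```

A dictionary keeps the rule set explicit and easy to audit against the table, at the cost of listing every Thold level separately.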
Metric | Fixed Threshold Learning with Neighbor Cooperation | Adaptive Threshold Learning with Neighbor Cooperation |
---|---|---|
Final SBM convergence: | 275.89 | 275.95 |
Total time steps: | 864 s | 358 s |
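To put the two columns of the comparison above into proportion, the following Python sketch computes the relative difference between the reported values. Only the four numbers come from the table; the variable names and the helper `relative_reduction` are illustrative.

```python
def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


# Values copied from the comparison table.
fixed = {"final_sbm": 275.89, "total_time": 864}
adaptive = {"final_sbm": 275.95, "total_time": 358}

time_saving = relative_reduction(fixed["total_time"], adaptive["total_time"])
speedup = fixed["total_time"] / adaptive["total_time"]
sbm_gap = adaptive["final_sbm"] - fixed["final_sbm"]

print(f"time reduction: {time_saving:.1%}")
print(f"speed-up factor: {speedup:.2f}")
print(f"SBM difference: {sbm_gap:.2f}")
```

On these figures the adaptive variant converges in about 58.6% less time (a factor of roughly 2.41), while the final SBM values differ by only 0.06.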
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mobasheri, M.; Kim, Y.; Kim, W. Toward an Adaptive Threshold on Cooperative Bandwidth Management Based on Hierarchical Reinforcement Learning. Sensors 2021, 21, 7053. https://doi.org/10.3390/s21217053