Sensors | Article | Open Access

31 July 2020

An Edge Based Multi-Agent Auto Communication Method for Traffic Light Control

Qiang Wu, Jianqing Wu, Jun Shen, Binbin Yong and Qingguo Zhou
1 School of Information & Engineering, Lanzhou University, Lanzhou 730000, China
2 School of Computing and Information Technology, University of Wollongong, Wollongong 2522, Australia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Internet of Things, Big Data and Smart Systems

Abstract

With smart city infrastructures growing, the Internet of Things (IoT) has been widely used in intelligent transportation systems (ITS). Traditional adaptive traffic signal control based on reinforcement learning (RL) has expanded from a single intersection to multiple intersections. In this paper, we propose a multi-agent auto communication (MAAC) algorithm, an innovative adaptive global traffic light control method based on multi-agent reinforcement learning (MARL) and an auto communication protocol in an edge computing architecture. The MAAC algorithm combines a multi-agent auto communication protocol with MARL, allowing an agent to communicate its learned strategies to others so as to achieve global optimization in traffic signal control. In addition, we present a practicable edge computing architecture for industrial deployment on the IoT, considering the limits of network transmission bandwidth. We demonstrate that our algorithm outperforms other methods by over 17% in experiments in a realistic traffic simulation environment.

1. Introduction

Traffic congestion causes a series of severe negative impacts, such as longer waiting times, higher fuel costs, and serious air pollution. According to a 2014 report [1], traffic jams cost the US up to $124 billion a year. The shortage of traffic infrastructure, the growing number of vehicles, and inefficient traffic signal control are key underlying causes of traffic congestion. Among these, the traffic light control problem seems the most tractable. However, the internal operation of a real urban transportation environment cannot be accurately calculated and analyzed mathematically due to its complexity and uncertainty. Reinforcement learning (RL), which is data-driven, model-free, and self-learning, is therefore well suited to research on adaptive traffic light control algorithms [2,3,4].
The rapid development of artificial intelligence (AI) and deep learning (DL) has played a vital role in many fields. In recent years, DL has achieved great success in image classification [5,6,7,8], machine translation [9,10,11,12], healthcare [13], smart cities [14], time-series forecasting [15], the game of Go [16], etc. Intelligent transportation systems (ITS) have also benefited from these AI achievements.
Traditional adaptive traffic light control methods [2,3] achieve local optimization by adapting to a single intersection with RL. However, global optimization is needed to achieve dynamic multi-intersection control in large smart city infrastructures. Multi-agent reinforcement learning (MARL) is therefore increasingly used to study more complex traffic light control problems [17,18,19].
Although existing methods have effectively improved the efficiency of traffic signal control, they still have the following problems: (1) a lack of communication between each traffic light and the others; (2) insufficient consideration of the limited network transmission bandwidth. The contributions of this paper are summarized as follows:
  • We present an auto communication protocol (ACP) between agents in MARL based on attention mechanism;
  • We propose a multi-agent auto communication (MAAC) algorithm based on MARL and ACP in traffic light control;
  • We build a practicable edge computing architecture for industrial deployment on the Internet of Things (IoT), considering the limits of network transmission bandwidth;
  • The experiments show that the MAAC framework outperforms baseline models by over 17%.
The remainder of this paper is organized as follows: Section 2 introduces related work, including multi-agent systems, RL, IoT, edge computing, and the basic concepts of communication theory. Section 3 formulates the traffic light control problem. Section 4 details the MAAC model and our edge computing architecture for IoT. Section 5 presents experiments in a traffic simulation environment and compares our method with others. Section 6 concludes the paper and discusses future work.

3. Preliminary

3.1. Multi-Agent Communication Model

The multi-agent communication model (as shown in Figure 5) follows the Shannon communication model [41]: the perception and behavior of agents can be modeled as information reception and transmission. Each agent acts as a communication transceiver, encoding and decoding the agent's internal structural information, while the environment serves as the communication channel between the agents. In practical modeling, a continuous matrix is generally used for multi-agent communication [18,42].
Figure 5. The structure of multi-agent communication model based on Shannon communication model.

3.1.1. Shannon Communication Model

The basic problem of communication is to reproduce at one point a message sent from another point. In 1948, Shannon proposed the Shannon communication model [41], which marked the beginning of modern communication theory. It is a linear communication model consisting of six parts: sender, encoder, channel, noise, decoder, and receiver, as shown in Figure 6.
Figure 6. An illustration of Shannon communication model.

3.1.2. Communications Protocol

The communication protocol [43], also called the transmission protocol, defines the agreed rules by which the two parties in a communication carry out end-to-end information transmission so that both can understand the received information. A communication protocol is mainly composed of syntax, semantics, and timing: the syntax covers the data format, encoding, and signal levels; the semantics specify the data content, including control information; and the timing specifies rate matching and the sequencing of communications.
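To make these three components concrete, the following is a small illustrative sketch (our own, not any standard protocol) of an inter-agent message for a traffic control network: the JSON wire format is the syntax, the payload fields carry the semantics, and the sequence number and timestamp support the timing.

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class AgentMessage:
    # Timing: sequence number and timestamp allow the receiver to
    # order messages and match rates.
    seq: int
    timestamp: float
    # Semantics: content both parties must interpret identically, here
    # a (hypothetical) congestion estimate and the sender's signal phase.
    sender_id: str
    congestion_level: float
    current_phase: int

    def encode(self) -> bytes:
        # Syntax: an agreed JSON/UTF-8 data format.
        return json.dumps(asdict(self)).encode("utf-8")

    @staticmethod
    def decode(raw: bytes) -> "AgentMessage":
        return AgentMessage(**json.loads(raw.decode("utf-8")))

msg = AgentMessage(seq=1, timestamp=time.time(),
                   sender_id="node_0", congestion_level=0.42, current_phase=2)
assert AgentMessage.decode(msg.encode()) == msg
```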

3.2. Problem Definition

In the problem of multi-agent traffic signal control, we model it as a Markov Decision Process (MDP) $\langle x, \pi, R, \gamma \rangle$, where $x$ is the state of all intersections, $\pi$ is the policy that generates actions, $R$ is the reward from all intersections, and $\gamma$ is the discount factor. Furthermore, we define $Agent_i$ ($i \in N$) as the agent that controls the change (duration) of the traffic lights at intersection $i$; $\pi_i$ is the set of acceptable traffic light duration control strategies for $Agent_i$; and the reward $R_i$ reflects the traffic congestion level at the intersection of $Agent_i$, given the other agents ($Agent_{-i}$ with policies $\pi_{-i}$). The reward can be calculated from specific indicators such as vehicle queue length: the lower the congestion level, the greater the reward. Finally, $(c_1, \ldots, c_N)$ forms the communication matrix $C$ between agents, so the objective function of our multi-agent traffic signal problem is:
$$R_i(x; \pi_i, \pi_{-i}, C) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma_i^t \, r_i(x_t, \pi_{i,t}, \pi_{-i,t}, C_t) \right] \tag{4}$$
In Equation (4), $\pi_i$ is the policy of $Agent_i$, $\pi_{-i}$ denotes the policies of the other agents, and $t$ is the time step. The problem is to find a strategy that maximizes this objective.
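As a concrete illustration of Equation (4), the following minimal Python sketch (our own; the reward values are made up) computes the discounted return that each agent seeks to maximize.

```python
from typing import Sequence

def discounted_return(rewards: Sequence[float], gamma: float = 0.992) -> float:
    """Monte Carlo estimate of the objective in Equation (4) for one agent
    over one episode: the sum over t of gamma^t * r_t.

    Each entry of `rewards` stands for r_i(x_t, pi_{i,t}, pi_{-i,t}, C_t);
    since the reward grows as congestion falls, shorter queues at the
    intersection yield a larger return.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Example: negative average waiting times used as per-step rewards.
print(discounted_return([-12.0, -9.5, -7.0, -6.2]))
```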

4. Methodology

In this section, we will detail the multi-agent auto communication (MAAC) model and an edge computing architecture for IoT.

4.1. MAAC Model

The structure of the MAAC model is shown in Figure 7.
Figure 7. The structure of multi-agent auto communication (MAAC) model.
Each agent can be modeled as a distributed, partially observable Markov decision process (Dec-POMDP). The strategy of each agent, $\pi_{\theta_i}$, is generated by a neural network. In each time step, $Agent_i$ observes the local environment $x_t$ and the communication messages sent by the other agents, $(c_1, \ldots, c_{i-1}, c_{i+1}, \ldots, c_N)$. By combining this time-series information, the agent generates the next action $a_{t+1}$ and the next outgoing communication message $c_{t+1}$ through its internal processing mechanism (with parameters $\theta_i$).
The joint actions of all agents $(a_1, \ldots, a_N)$ interact with the environment so as to maximize the centralized value function $J(\theta) = \mathbb{E}[R]$. The MAAC algorithm improves the neural network parameter set $\theta_i$ of each agent by optimizing this central value function. The overall architecture of the MAAC model can thus be regarded as a distributed MARL model with automatic communication capabilities.

4.1.1. Internal Communication Module in Agent

The internal communication module (ICM) in an agent is an important part of the MAAC model (as shown in Figure 8).
Figure 8. The internal communication module in an agent.
Each agent is divided into two sub-modules: a receiving end and a sending end. The receiving end receives the information of other agents, processes it with an attention mechanism, and passes the processed information to the sending end; the sending end observes the external environment and uses the attention-processed information to generate outgoing information with a neural network.
  • Receiving End
    $Agent_i$ uses the attention mechanism to filter the information received from the other agents ($Agent_{-i}$). It first generates its own message $c_t$ from the combined message $C = (c_1, \ldots, c_N)$ after receiving the information of $Agent_{-i}$, then picks important messages and ignores unimportant ones (a code sketch follows this list). Herein, we introduce the parameter set $W_q, W_k, W_v$, from which the following are calculated separately (and can be computed in parallel):
    $$q_i = W_q \cdot C, \qquad k_i = W_k \cdot C, \qquad v_i = W_v \cdot C$$
    Then, we calculate the information weights $\hat{\alpha}_i = \mathrm{softmax}(q_i \cdot k_i)$. Finally, we obtain the weighted information after selection: $\hat{C} = \sum_{i=1}^{N} \alpha_i c_i$.
  • Sending End
    The sending end receives $\hat{C}$, the other agents' information processed by the receiving end's attention mechanism, and, together with the observed local environment $x_t$, generates the next action $a_{t+1}$ and the next communication message $c_{t+1}$ through a neural network.
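The following NumPy sketch mirrors the receiving-end computation above. The message dimension and the randomly initialized $W_q, W_k, W_v$ are illustrative; in the model these matrices are learned.

```python
import numpy as np

def receive_with_attention(C, W_q, W_k, W_v, i):
    """Receiving end of Agent_i: attention-weight the incoming messages.

    C has shape (N, d), one d-dimensional message c_j per agent;
    W_q, W_k, W_v have shape (d, d).
    """
    Q, K, V = C @ W_q, C @ W_k, C @ W_v      # queries, keys, values
    scores = Q[i] @ K.T                      # q_i . k_j for every agent j
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()              # softmax over agents
    # The paper weights the raw messages c_j; a standard attention
    # layer would weight the values V instead.
    return alpha @ C                         # C_hat, shape (d,)

rng = np.random.default_rng(7)
N, d = 6, 8                                  # six agents, as in Section 5
C = rng.normal(size=(N, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
print(receive_with_attention(C, W_q, W_k, W_v, i=0).shape)  # (8,)
```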

4.1.2. MAAC Algorithm

At time $t$ in the MAAC model, the environment input is $X_t = (x_t^1, \ldots, x_t^N)$ and the corresponding communication input is $C_t = (c_t^1, \ldots, c_t^N)$. The agents $(Agent_1, \ldots, Agent_N)$ interact with each other, and each agent processes information internally with a receiver and a transmitter. The receiver takes in the agent's own environmental information $x_t$ and communication information $c_t$, and generates the action and outgoing interaction message $(a_{t+1}, c_{t+1})$ for step $t+1$. The MAAC model collects all agent actions to form a joint action $(a_1, \ldots, a_N)$, interacts with the environment, and optimizes the objective policy of each agent:
$$\nabla_{\theta_i} V(\theta_i) = \mathbb{E}\left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i^t \mid c_i^t)\, \hat{Q}(t, a_t^1, \ldots, a_t^N) \right]$$
The calculation steps of MAAC at time $t$ are shown in Figure 9:
Figure 9. The computation process of the MAAC algorithm.
In the MAAC algorithm (as shown in Algorithm 1), the parameter set of each $Agent_i$ is $\theta_i$, which is divided into the sender parameters $\theta_{Sender}^i$ and the receiver parameters $\theta_{Receiver}^i$. Both are optimized against the overall multi-agent objective function, which iteratively updates the receiver and sender parameter sets in each agent's communication module.
Algorithm 1 MAAC learning algorithm process
1: Initialize the communication matrix of all agents $C_0$
2: Initialize the agent parameters $\theta_{Sender}^i$ and $\theta_{Receiver}^i$
3: repeat
4:  Receiver of $Agent_i$: use the attention mechanism to generate the communication matrix $\hat{C}_t$
5:  Sender of $Agent_i$: choose an action $a_{t+1}^i$ from the policy selection network, or choose a random action (e.g., $\epsilon$-greedy exploration)
6:  Sender of $Agent_i$: generate its own message $c_{t+1}^i$ from the receiver's communication matrix $\hat{C}_t$
7:  Collect the joint actions of all agents and execute $(a_{t+1}^1, \ldots, a_{t+1}^N)$; obtain the reward $R_{t+1}$ and the next state $X_{t+1}$ from the environment
8:  Update the strategic value function of each agent:
   $\nabla_{\theta_i} V(\theta_i) = \mathbb{E}\left[ \nabla_{\theta_i} \log \pi_{\theta_i}(a_i^t \mid c_i^t)\, \hat{Q}(t, a_t^1, \ldots, a_t^N) \right]$
9: until the end of the episode
10: return $\theta_{Sender}^i$ and $\theta_{Receiver}^i$ for each agent
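The Python sketch below mirrors the control flow of Algorithm 1. Every interface in it (`receive`, `send`, `update`, `env.reset`, `env.step`, `env.random_action`) is our own schematic placeholder rather than the authors' implementation, and the gradient update of line 8 is hidden behind `agent.update`.

```python
import numpy as np

rng = np.random.default_rng(0)

def maac_episode(agents, env, C, epsilon=0.1):
    """One episode of the MAAC loop (schematic).

    Each agent exposes receive(x, C) -> C_hat (line 4) and
    send(x, C_hat) -> (action, message) (lines 5-6); env.step
    executes the joint action (line 7).
    """
    X = env.reset()
    done = False
    while not done:
        actions, messages = [], []
        for i, agent in enumerate(agents):
            C_hat = agent.receive(X[i], C)        # attention over messages
            a, c_next = agent.send(X[i], C_hat)   # policy and message nets
            if rng.random() < epsilon:            # epsilon-greedy exploration
                a = env.random_action()
            actions.append(a)
            messages.append(c_next)
        X, rewards, done = env.step(actions)      # execute the joint action
        C = np.stack(messages)                    # next communication matrix
        for agent, r in zip(agents, rewards):
            agent.update(r)                       # gradient step on theta_i (line 8)
    return C
```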

4.2. Edge Computing Structure

In order to deploy the MAAC algorithm in an industrial-scale environment, we must take network delay into consideration. We propose an edge computing architecture with a device near every traffic light. An edge computing device needs the following capabilities: (1) it can detect vehicle information (location, direction, velocity) from the surveillance video of its intersection in real time and record that information; (2) it can run the traffic signal control algorithm to control the nearby traffic light (see Figure 10).
Figure 10. The edge devices are deployed near the traffic lights.
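As an illustration only, the sketch below shows the loop such an edge node could run; `camera`, `light`, and `controller` and their methods are hypothetical placeholders for the device's vision, signal, and policy interfaces.

```python
import time

CYCLE = 45  # seconds, matching the traffic control timing cycle in Section 5

def edge_node_loop(camera, light, controller):
    """Hypothetical main loop of one edge computing node.

    (1) Detect vehicle information (location, direction, velocity)
        from the local surveillance video and record it.
    (2) Run the control algorithm locally and actuate the nearby
        traffic light, avoiding a round trip to a central server.
    """
    while True:
        vehicles = camera.detect_vehicles()       # local, no network hop
        controller.record(vehicles)               # keep the vehicle log
        phase = controller.act(vehicles)          # e.g., a MAAC policy output
        light.set_phase(phase)
        time.sleep(CYCLE)
```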

5. Experiments

In this section, we first describe the urban traffic simulation built on our edge computing architecture. Then, we apply the MAAC algorithm and the baseline algorithms in the simulation environment to compare the performance of all models.

5.1. Simulation Environment and Settings

We use CityFlow [44], an open-source traffic simulator, as our experiment environment. We assume there are six traffic lights (intersection nodes, i.e., edge computing nodes) in one section of a city (as shown in Figure 11).
Figure 11. The experiment environment for multi-intersection traffic signal control.
We control the traffic lights dynamically at runtime through the CityFlow [44] Python interface (a usage sketch follows).
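For reference, here is a minimal sketch of such runtime control through the CityFlow Python interface; the config path, intersection id, and the stub `choose_phase` policy are our own placeholders.

```python
import cityflow

def choose_phase(waiting_counts):
    # Placeholder: in our experiments this is the control algorithm's action.
    return 0

# config.json points at the road network and flow files (paths illustrative).
eng = cityflow.Engine("config.json", thread_num=1)
for step in range(900):                              # one 15 min episode, 1 s steps
    eng.next_step()                                  # advance the simulation
    waiting = eng.get_lane_waiting_vehicle_count()   # per-lane queue lengths
    eng.set_tl_phase("intersection_1_1", choose_phase(waiting))
```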
Here are the settings of our experiments (as shown in Table 2).
Table 2. The detailed simulation parameters.
  • The directions
    One traffic light at $node_0$ has four neighbor nodes ($node_1$, $node_2$, $node_3$, $node_4$), four entries (in), and four exits (out). The road length is set to 350 m and the vehicle speed limit to 30 km/h.
  • Traffic light agent
    We deploy each traffic signal control algorithm in a Docker container [45].
  • Communication delay setting
    The communication delay from the center to a traffic light is set to 1 s (a 1 s sleep in the code).
  • Traffic control timing cycle
    We initially set the traffic light time cycle to 45 s, with a green interval $g_t = 20$ s, red interval $r_t = 20$ s, and yellow interval $y_t = 5$ s.
  • Episode
    One episode is set to 15 min (900 s), comprising 20 traffic light time cycles.
  • Vehicle simulation setting
    We assume vehicles arrive at road entrances according to a Bernoulli process with probability $P_{in} = 1/15$ at each intersection. Every vehicle is assigned a random destination node other than its entry node (we set $random(seed) = 7$). In one episode, there are approximately 400 vehicles (a sketch of this arrival process follows the list).
  • Hyper-parameter setting
    The learning rate is set to 0.001; $\gamma$ is set to 0.992; the reward is the average waiting time at the intersection.
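Below is a small sketch (our own) of the Bernoulli arrival process from the vehicle simulation setting above; the node names are illustrative.

```python
import random

random.seed(7)                # matching random(seed) = 7 above
P_IN = 1 / 15                 # per-second Bernoulli arrival probability

def arrivals(entry_nodes, episode_seconds=900):
    """Sample (time, origin, destination) triples for one episode."""
    schedule = []
    for t in range(episode_seconds):
        for node in entry_nodes:
            if random.random() < P_IN:
                # Random destination, excluding the entry node itself.
                dest = random.choice([n for n in entry_nodes if n != node])
                schedule.append((t, node, dest))
    return schedule

plan = arrivals(["node1", "node2", "node3", "node4"])
print(len(plan))  # about 900 * 4 / 15 = 240 arrivals for four entries
```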

5.2. Baseline Methods

  • Fixed-time
    In this method, all traffic lights follow the fixed traffic control timing cycle described in Section 5.1.
  • Q-learning (Center)
    The Q-learning algorithm [46] is deployed at the center (in Docker) to generate traffic light control actions; the underlying tabular update is sketched after this list. The delay from the traffic light agent to an intersection is set to 1.0 s.
  • Q-learning (Edge)
    The Q-learning algorithm [46] is here deployed on an edge device (in Docker) to generate traffic light control actions. The delay from the traffic light agent to an intersection is set to 0.1 s.
  • Nash Q-learning
    Nash Q-learning [47] extends Q-learning to non-cooperative MARL. Each agent maintains Q-functions over joint actions and performs updates assuming Nash equilibrium behavior over the current Q-values.
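For reference, here is the tabular update rule [46] underlying the two Q-learning baselines, as a short sketch; the state and action encodings are left abstract.

```python
from collections import defaultdict

Q = defaultdict(float)           # Q[(state, action)] -> estimated value
alpha, gamma = 0.001, 0.992      # learning rate and discount from Section 5.1

def q_update(s, a, r, s_next, actions):
    """One Q-learning step [46]:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```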

5.3. Evaluation

We define $t_m$ as the time from when a vehicle enters an entry of an intersection until it passes through, where $m \in \{1, \ldots, M\}$ and $M$ is the number of vehicles. In the simulations, we record this time for all vehicles at each intersection in every episode. Finally, we accumulate all recorded times over all intersections, $T_e = \sum_{i=1}^{I} \sum_{m=1}^{M} t_m$, to evaluate the traffic network, where $e$ indexes the episode and $I$ is the number of intersections each vehicle passes through.
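A minimal sketch of computing $T_e$ from recorded per-intersection travel times follows (the numbers below are made up).

```python
def total_travel_time(times_per_intersection):
    """Compute T_e = sum over intersections i and vehicles m of t_m.

    `times_per_intersection` holds one list per intersection with each
    vehicle's time t_m from entering an entry until passing through.
    """
    return sum(sum(times) for times in times_per_intersection)

# Example with two intersections and three recorded vehicles each:
print(total_travel_time([[30.5, 12.0, 44.2], [25.0, 18.3, 9.9]]))
```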

5.4. Results

We applied five methods: the Fixed-time method [20], Q-learning (Center) [46], Q-learning (Edge) [46], Nash Q-learning [47], and our MAAC method. All were trained for 1000 episodes in CityFlow [44] on the edge computing architecture we designed. As shown in Figure 12, the algorithms converged at around the 600-episode mark, and the MAAC method converged fastest among the compared models.
Figure 12. The training process of five methods.
After training, we tested the algorithms for 500 episodes. As shown in Table 3, MAAC performed best among the traffic signal control algorithms.
Table 3. The results of the five methods.
As shown in Table 4, our method does not sacrifice the waiting time at some intersections to ensure overall performance; rather, the performance of every intersection is optimized to a different degree. Comparing Q-learning (Center) with Q-learning (Edge), we can also see that the edge computing structure reduces the network delay in the deployment environment.
Table 4. The experiment results of average waiting time at each intersection.
As shown in Table 5, the delay time and delay rate of Q-learning (Center) are the highest, which confirms that the edge computing structure we propose is useful for reducing network delay. MAAC still outperforms the other methods when the network delay time is taken into account.
Table 5. The delay time of five methods.

6. Conclusions

In this work, we proposed a multi-agent auto communication (MAAC) algorithm based on multi-agent reinforcement learning (MARL) and an auto communication protocol (ACP) between agents using the attention mechanism. We also built a practicable edge computing structure for industrial deployment on the IoT, considering the limits of network transmission bandwidth.
In the simulation environment, the experiments showed that the MAAC framework outperformed baseline models by over 17%. Moreover, the edge computing structure reduces network delay when deploying the algorithm at industrial scale.
In future research, we will build a simulation environment much closer to the real world and take the communication from vehicle to traffic light into consideration to improve the MAAC method.

Author Contributions

Conceptualization, Q.W. and J.W.; methodology, Q.W. and J.S.; software, B.Y.; validation, J.S. and Q.Z.; formal analysis, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

This work was supported by Ministry of Education—China Mobile Research Foundation under Grant No. MCM20170206, The Fundamental Research Funds for the Central Universities under Grant No. lzujbky-2019-kb51 and lzujbky-2018-k12, National Natural Science Foundation of China under Grant No. 61402210, Major National Project of High Resolution Earth Observation System under Grant No. 30-Y20A34-9010-15/17, State Grid Corporation of China Science and Technology Project under Grant No. SGGSKY00WYJS2000062, Program for New Century Excellent Talents in University under Grant No. NCET-12-0250, Strategic Priority Research Program of the Chinese Academy of Sciences with Grant No. XDA03030100, Google Research Awards and Google Faculty Award. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Jetson TX1 used for this research. Jianqing Wu also would like to gratefully acknowledge financial support from the China Scholarship Council (201608320168). Jun Shen’s collaboration was supported by University of Wollongong’s University Internationalization Committee Linkage grant and Chinese Ministry of Education’s International Expert Fund “Chunhui Project” awarded to Lanzhou University.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Abdulhai, B.; Pringle, R.; Karakoulas, G.J. Reinforcement learning for true adaptive traffic signal control. J. Transp. Eng. 2003, 129, 278–285. [Google Scholar] [CrossRef]
  2. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction. MIT Press 2005, 16, 285–286. [Google Scholar]
  3. Ghazal, B.; ElKhatib, K.; Chahine, K.; Kherfan, M. Smart traffic light control system. In Proceedings of the 2016 3rd International Conference on Electrical, Electronics, Computer Engineering and their Applications (EECEA), Beirut, Lebanon, 21–23 April 2016; pp. 140–145. [Google Scholar]
  4. Wei, H.; Zheng, G.; Yao, H.; Li, Z. IntelliLight: A Reinforcement Learning Approach for Intelligent Traffic Light Control; Association for Computing Machinery: New York, NY, USA, 2018; pp. 2496–2505. [Google Scholar]
  5. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems; The MIT Press: Cambridge, MA, USA, 2012; pp. 1097–1105. [Google Scholar]
  6. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2818–2826. [Google Scholar]
  7. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  9. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.R.; Jaitly, N.; Senior, N.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag. 2012, 29, 82–97. [Google Scholar] [CrossRef]
  10. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
  11. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N. Attention is All you Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  13. Zhou, R.; Li, X.; Yong, B.; Shen, Z.; Wang, C.; Zhou, Q.; Li, K.C. Arrhythmia recognition and classification through deep learning-based approach. Int. J. Comput. Sci. Eng. 2019, 19, 506–517. [Google Scholar] [CrossRef]
  14. Wu, Q.; Shen, J.; Yong, B.; Wu, J.; Li, F.; Wang, J.; Zhou, Q. Smart fog based workflow for traffic control networks. Future Gener. Comput. Syst. 2019, 97, 825–835. [Google Scholar] [CrossRef]
  15. Yong, B.; Xu, Z.; Shen, J.; Chen, H.; Wu, J.; Li, F.; Zhou, Q. A novel Monte Carlo-based neural network model for electricity load forecasting. Int. J. Embed. Syst. 2020, 12, 522–533. [Google Scholar] [CrossRef]
  16. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354. [Google Scholar] [CrossRef]
  17. Bazzan, A.L. Opportunities for multi-agent systems and multi-agent reinforcement learning in traffic control. Auton. Agents Multi-Agent Syst. 2009, 18, 342. [Google Scholar] [CrossRef]
  18. Sukhbaatar, S.; Fergus, R. Learning multiagent communication with backpropagation. NIPS 2016, 2244–2252. [Google Scholar]
  19. Hoshen, Y. Attentional multi-agent predictive modeling. Neural Inf. Process. Syst. (NIPS) 2017, 2701–2711. [Google Scholar]
  20. Webster, F. Traffic signal settings. H.M. Station. Off. 1958, 39, 45. [Google Scholar]
  21. Thorpe, T.L. Vehicle Traffic Light Control Using SARSA; Colorado State University: Fort Collins, CO, USA, 1997. [Google Scholar]
  22. Chiou, S.W. An efficient algorithm for computing traffic equilibria using TRANSYT model. Appl. Math. Model. 2010, 34, 3390–3399. [Google Scholar] [CrossRef]
  23. Chaib-Draa, B.; Moulin, B.; Millot, P. Trends in distributed artificial intelligence. Artif. Intell. Rev. 1992, 6, 35–66. [Google Scholar] [CrossRef]
  24. Durfee, E.H.; Lesser, V.R. Negotiating task decomposition and allocation using partial global planning. Distrib. Artif. Intell. 1989, 2, 229–243. [Google Scholar]
  25. Minsky, M. The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind; Simon & Schuster: New York, NY, USA, 2007. [Google Scholar]
  26. Peng, P.; Yuan, Q.; Wen, Y.; Yang, Y.; Tang, Z.; Long, H.; Wang, J. Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games. arXiv 2017, arXiv:1703.10069. [Google Scholar]
  27. Chen, C.; Wei, H.; Xu, N.; Zheng, G.; Yang, M.; Xiong, Y.; Xu, K.; Li, Z. Toward a Thousand Lights: Decentralized Deep Reinforcement Learning for Large-Scale Traffic Signal Control. AAAI 2020, 34, 3414–3421. [Google Scholar] [CrossRef]
  28. Prashanth, L.A.; Bhatnagar, S. Reinforcement learning with function approximation for traffic signal control. IEEE Trans. Intell. Transp. Syst. 2010, 12, 412–421. [Google Scholar]
  29. Mousavi, S.S.; Schukat, M.; Howley, E. Traffic light control using deep policy-gradient and value-function-based reinforcement learning. IET Intell. Transp. Syst. 2017, 11, 417–423. [Google Scholar] [CrossRef]
  30. El-Tantawy, S.; Abdulhai, B. Towards multi-agent reinforcement learning for integrated network of optimal traffic controllers (MARLIN-OTC). Transp. Lett. 2010, 2, 89–110. [Google Scholar] [CrossRef]
  31. Weinberg, M.; Rosenschein, J.S. Best-response multiagent learning in non-stationary environments. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA, 19–23 August 2004. [Google Scholar]
  32. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.S.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Comput. Sci. 2015, 37, 2048–2057. [Google Scholar]
  33. Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
  34. Pottie, G.J. Wireless sensor networks. In Proceedings of the 1998 Information Theory Workshop (Cat. No.98EX131), Killarney, Ireland, 22–26 June 1998; pp. 139–140. [Google Scholar] [CrossRef]
  35. Jell, A.; Vogel, T.; Ostler, D.; Marahrens, N.; Wilhelm, D.; Samm, N.; Eichinger, J.; Weigel, W.; Feussner, H.; Friess, H.; et al. 5th-Generation Mobile Communication: Data Highway for Surgery 4.0. Surg. Technol. Int. 2019, 35, 36–42. [Google Scholar]
  36. Atzori, L.; Iera, A.; Morabito, G. The internet of things: A survey. Inf. Syst. Front. 2015, 17, 243–259. [Google Scholar]
  37. de C. Neto, J.M.; Neto, S.F.G.; M. de Santana, P.; de Sousa, V.A., Jr. Multi-Cell LTE-U/Wi-Fi Coexistence Evaluation Using a Reinforcement Learning Framework. Sensors 2020, 20, 1855. [Google Scholar]
  38. Faheem, M.; Butt, R.A.; Raza, B.; Alquhayz, H.; Ashraf, M.W.; Shah, S.B.; Ngadi, M.A.; Gungor, V.C. A Cross-Layer QoS Channel-Aware Routing Protocol for the Internet of Underwater Acoustic Sensor Networks. Sensors 2019, 19, 4762. [Google Scholar] [CrossRef]
  39. Zikria, Y.B.; Afzal, M.K.; Kim, S.W. Internet of Multimedia Things (IoMT): Opportunities, Challenges and Solution. Sensors 2020, 20, 2334. [Google Scholar] [CrossRef]
  40. Yong, B.; Xu, Z.; Wang, X.; Cheng, L.; Li, X.; Wu, X.; Zhou, Q. IoT-based intelligent fitness system. J. Parallel Distrib. Comput. 2017, 8, 279–292. [Google Scholar]
  41. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  42. Foerster, J.; Assael, Y.M.; de Freitas, N.; Whiteson, S. Learning to communicate with deep multi-agent reinforcement learning. NIPS 2016, 2145–2153. [Google Scholar]
  43. West, C.H. General Technique for Communications Protocol Validation. IBM J. Res. Dev. 1978, 22, 393–404. [Google Scholar] [CrossRef]
  44. Wei, H.; Chen, C.; Zheng, G.; Wu, K.; Xu, K.; Gayah, V.; Li, Z. Presslight: Learning max pressure control for signalized intersections in arterial network. Int. Conf. Knowl. Discov. Data Min. (KDD) 2019, 1290–1298. [Google Scholar] [CrossRef]
  45. Boettiger, C. An introduction to Docker for reproducible research. ACM SIGOPS Oper. Syst. Rev. 2015, 49, 71–79. [Google Scholar] [CrossRef]
  46. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  47. Hu, J.; Wellman, M.P. Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res. 2004, 4, 1039–1069. [Google Scholar]
