Data-Driven Multi-Agent Vehicle Routing in a Congested City
Abstract
1. Introduction
- (1) Finding the best route while accounting for congestion requires the driver to search through several possible alternatives.
- (2) The specific amount of congestion that will be encountered on any given road segment is unknown to the driver.
- (3) The tragedy of the commons must be avoided if the system is to approach a system-optimum state.
- First, we show that the fastest route in a congested road network can be determined with less exploration than would otherwise be necessary, by combining the data a driver acquires through direct experience of a route with the data collected by all road users.
- Second, we show that this method adapts to changes in the congestion problem by constructing a new reinforcement-learning-based approach that teaches each driver which travel data best match the current congestion.
- Finally, we demonstrate, through a multi-agent simulation, that drivers can reach an equilibrium point that approaches the system optimum while being directed through a control mechanism.
2. Literature Review
3. The Theoretical Framework
3.1. Building the Potential Routes List (Step 1)
3.2. Requesting Travel Times (Step 2)
3.3. Retrieving Travel Times (Steps 3, 4)
3.4. Building the Route Estimate (Step 5)
3.5. Applying Direct Experience (Step 6)
3.6. Driving the Route and Re-Routing (Step 7)
3.7. Applying the Learning Algorithm (Step 8)
- (1) After the agent’s vehicle completes a route, the agent requests the actual travel times for its alternate routes from the database. The times used are the most recent road-segment completion-time averages, giving the agent the travel time it would likely have achieved had it selected a given alternate route.
- (2) The agent selects the route with the fastest actual travel time (this list includes the route it just completed) and uses the SAW formula, rewritten to solve for the weight, to find the new weight (a sketch of this rearrangement follows this list).
- (3) The newly calculated weight represents the weighting value that would have allowed the agent to select the fastest route, given the previously available travel time data. However, the agent may adjust it, depending on its learning strategy.
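As a sketch of the rearrangement in item (2): assuming the SAW estimate for a route r is a convex combination of the agent's directly experienced time and the TIS-reported time (the symbols below are illustrative, not the authors' notation), the weight that would have reproduced the fastest route's actual time is:

```latex
% Assumed SAW estimate with weight w in [0, 1]:
%   \hat{t}_r = w \, t_r^{\mathrm{exp}} + (1 - w) \, t_r^{\mathrm{TIS}}
% Solved for the weight that reproduces the fastest route's actual time:
w = \frac{t_r^{\mathrm{act}} - t_r^{\mathrm{TIS}}}{t_r^{\mathrm{exp}} - t_r^{\mathrm{TIS}}},
\qquad t_r^{\mathrm{exp}} \neq t_r^{\mathrm{TIS}}
```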
4. Experimental Design
- (1) Will such a multi-agent system achieve user equilibrium with fewer routing episodes than either a centralized real-time traffic information system or direct agent experience alone?
- (2) Will re-routing while en route result in lower total route times than when no re-routing is used?
- (3) Will the weighting factor reach an equilibrium point at which the agent no longer adjusts it between routing episodes?
4.1. Simulation Hardware and Software
4.2. Simulation Configurations
- (1) Each agent simulated individually: Each agent is provided with the five fastest routes from the modified A* algorithm and runs each one as the sole agent in the simulation. The fastest of the five is then taken as the best possible route time for the agent to travel from its origin to its destination without delays due to traffic congestion.
- (2) Simulation with direct experience but no re-routing: The simulation is run with 100 agents that are limited to the travel time data they can collect directly. The agents start by exploring the five fastest potential routes from the modified A* algorithm to determine the fastest one, then settle on a route as they gain further experience. The SAW formula is used to estimate the fastest route, but each agent makes its selection with an epsilon-greedy algorithm [8], with the epsilon value varied as one of the simulation parameters (a selection sketch follows this list). The simulations are run 60 times with the SAW weight fixed, so that the same weight value is used for all simulations in a given parameter set.
- (3) Simulation with a TIS but no re-routing: The simulation is run with 100 agents that can learn a new SAW weight, at varying rates, from all travel time data available through the TIS. The SAW formula is used to estimate the fastest routes from a list of potential routes, and each agent selects the fastest estimate. After each simulation, the agent reviews the actual travel time for each potential route and determines what SAW weight would have led it to select the fastest route (a weight-update sketch follows this list). The simulations are run 60 times for each parameter set.
- (4) Simulation with direct experience and re-routing: The simulation is run with 100 agents limited to the travel time data they can collect directly. This set of simulations is identical to method (2), except that a route performance factor of 1.5 is set for each set of simulations. The performance factor allows the agent to calculate a new route from the next intersection it will occupy to its destination. In these simulations, the agent only attempts to re-route if its total time on the route so far exceeds 1.5 times the expected route time to that point (see the trigger sketch after this list). While an agent can consider re-routing at each intersection, each agent may select a re-route only once per simulation. The simulations are run 60 times for each parameter set.
- (5) Simulation with a TIS and re-routing: The simulation is run with 100 agents that can learn a new SAW weight, at varying rates, from all travel time data available through the TIS. This set of simulations is identical to method (3), except that a route performance factor of 1.5 is set for each set of simulations. The simulations are run 60 times for each parameter set.
- (6) Simulation with a combination of a TIS and direct experience: The simulation is run with 100 agents allowed to learn a new SAW weight, at varying rates, from all travel time data available through the TIS. Re-routing is not allowed. This set of simulations differs from method (3) in that agents also learn from direct experience. After an agent is given a list of potential routes with estimated route times from the SAW formula, it reviews its previous route experience. If the route with the fastest estimated time has never been used, the agent always selects it. If the best estimated route is the same as the route used in the previous simulation, the agent selects it again. If the best estimated route differs from the previous route, the agent compares the estimated route time against the previous route's actual time, modified by an exploration factor of 0.5; if the estimate beats the adjusted previous time, the agent switches to the new route (a sketch of this rule follows the list). The simulations are run 60 times for each parameter set.
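The following sketches illustrate the decision rules described above. They are minimal illustrations under stated assumptions, not the authors' implementation, and all function and parameter names are ours. First, the epsilon-greedy route selection of configuration (2) [8]:

```python
import random

def epsilon_greedy_route(estimates, epsilon, rng=random):
    """Explore a random route with probability epsilon; otherwise
    exploit the route with the fastest estimated travel time."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                       # explore
    return min(range(len(estimates)), key=lambda r: estimates[r])  # exploit

# e.g., five candidate routes with SAW-estimated times (seconds)
choice = epsilon_greedy_route([752.0, 741.5, 803.2, 790.0, 768.4], epsilon=0.1)
```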
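Configuration (3)'s weight learning is specified only by a "weight step" parameter; one plausible reading is that the agent moves its SAW weight toward the retrospectively best weight (the rearranged formula in Section 3.7) at that rate:

```python
def retrospective_weight(t_exp, t_tis, t_act):
    """Weight that would have reproduced the fastest route's actual
    time under the assumed SAW estimate w*t_exp + (1-w)*t_tis."""
    if t_exp == t_tis:
        return None  # every weight yields the same estimate
    w = (t_act - t_tis) / (t_exp - t_tis)
    return min(1.0, max(0.0, w))  # clamp into the valid range [0, 1]

def update_saw_weight(w, target_w, weight_step):
    """Move the current weight a fraction weight_step of the way
    toward the retrospective target (one reading of 'weight step')."""
    return w + weight_step * (target_w - w)
```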
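The re-routing gate of configurations (4) and (5) transcribes directly:

```python
def should_reroute(elapsed, expected_so_far, already_rerouted,
                   performance_factor=1.5):
    """At an intersection, re-route only if the actual time so far
    exceeds 1.5x the expected time to this point, and only once
    per simulation."""
    if already_rerouted:
        return False
    return elapsed > performance_factor * expected_so_far
```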
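Finally, configuration (6)'s acceptance rule is sketched below. The text does not state how the previous route's time is "modified" by the exploration factor; multiplication is assumed here, which makes switching deliberately conservative:

```python
def select_route(best_route, best_est_time, prev_route, prev_actual_time,
                 used_before, exploration_factor=0.5):
    """Accept the best-estimated route if it is untried or unchanged;
    otherwise switch only if its estimate beats the previous route's
    actual time scaled by the exploration factor (assumed multiplicative)."""
    if not used_before or best_route == prev_route:
        return best_route
    adjusted_prev = exploration_factor * prev_actual_time
    return best_route if best_est_time < adjusted_prev else prev_route
```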
4.3. Experimental Analysis
5. Simulation Results
- The total route time for all agents on the 60th simulation. As this is the final simulation run for a given set of parameters, it represents the point at which the agents will no longer be able to modify their routes.
- The price of anarchy at the 60th simulation (a sketch of the definition used here follows this list).
- The minimum price of anarchy across all simulations.
- The mean price of anarchy. This value is presented to show the difference between the final simulation results and the average for the method.
- The median price of anarchy. This value is presented to show the overall effectiveness of the method across all simulations.
- The number of times user equilibrium was achieved. Equilibrium may last for a single pair of simulations or may be repeated across multiple simulations.
- Where re-routing is used, the minimum number of re-routes across all simulations.
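The price of anarchy (POA) reported throughout this section can be read as a minimal sketch, assuming it is computed as the realized total travel time divided by the congestion-free baseline obtained when each agent is simulated individually (Section 5.1), consistent with the selfish-routing definition cited in the references:

```latex
\mathrm{POA} = \frac{\sum_{i=1}^{N} t_i^{\mathrm{routed}}}
                    {\sum_{i=1}^{N} t_i^{\mathrm{solo}}} \geq 1
```

where $t_i^{\mathrm{solo}}$ is agent $i$'s fastest route time when simulated without other traffic.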
5.1. Each Agent Simulated Individually
5.2. Simulation with Direct Experience but No Re-Routing
5.3. Simulation with a TIS but No Re-Routing
5.4. Simulation with Direct Experience and Re-Routing
5.5. Simulation with a TIS and Re-Routing
5.6. Simulation with a Combination of a TIS and Direct Experience
6. Discussion
6.1. Revisiting the Research Objectives
6.2. User Equilibrium
6.3. Re-Routing Effectiveness
6.4. Weighting Factor
6.4.1. Localization of the SAW Weight
6.4.2. Changes in Weighting Factor at User Equilibrium
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Mandayam, C.V.; Prabhakar, B. Traffic congestion: Models, costs and optimal transport. ACM Sigmetrics Perform. Eval. Rev. 2014, 42, 553–554.
- Bazzan, A.L.; Chira, C. A Hybrid Evolutionary and Multiagent Reinforcement Learning Approach to Accelerate the Computation of Traffic Assignment. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, Singapore, 4–8 May 2015; pp. 1723–1724.
- Hardin, G. The tragedy of the commons. Science 1968, 162, 1243–1248.
- Dijkstra, E.W. A note on two problems in connexion with graphs. Numer. Math. 1959, 1, 269–271.
- Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107.
- Thathachar, M.; Sastry, P.S. A new approach to the design of reinforcement schemes for learning automata. IEEE Trans. Syst. Man Cybern. 1985, 1, 168–175.
- Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 2002, 47, 235–256.
- Sutton, R.S.; Barto, A.G. Multi-armed Bandits. In Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018; Chapter 2; pp. 25–46.
- Google. Google Traffic. Available online: https://www.google.ca/maps/ (accessed on 10 August 2021).
- Waze Mobile. Waze. Available online: https://www.waze.com/ (accessed on 10 August 2021).
- Roughgarden, T. Stackelberg scheduling strategies. SIAM J. Comput. 2004, 33, 332–350.
- Desjardins, C.; Laumônier, J.; Chaib-draa, B. Learning agents for collaborative driving. In Multi-Agent Systems for Traffic and Transportation Engineering; IGI Global: Hershey, PA, USA, 2009; pp. 240–260.
- Bazzan, A.L. Opportunities for multiagent systems and multiagent reinforcement learning in traffic control. Auton. Agents Multi-Agent Syst. 2009, 18, 342.
- Wang, S.; Djahel, S.; Zhang, Z.; McManis, J. Next road rerouting: A multiagent system for mitigating unexpected urban traffic congestion. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2888–2899.
- Horng, G.J.; Cheng, S.T. Using intelligent vehicle infrastructure integration for reducing congestion in smart city. Wirel. Pers. Commun. 2016, 91, 861–883.
- de Oliveira, D.; Bazzan, A.L. Multiagent learning on traffic lights control: Effects of using shared information. In Multi-Agent Systems for Traffic and Transportation Engineering; IGI Global: Hershey, PA, USA, 2009; pp. 307–321.
- Wardrop, J. Some theoretical aspects of road traffic research. Proc. Inst. Civ. Eng. 1952, 1, 325–378.
- Levy, N.; Klein, I.; Ben-Elia, E. Emergence of cooperation and a fair system optimum in road networks: A game-theoretic and agent-based modelling approach. Res. Transp. Econ. 2018, 68, 46–55.
- Tumer, K.; Proper, S. Coordinating actions in congestion games: Impact of top–down and bottom–up utilities. Auton. Agents Multi-Agent Syst. 2013, 27, 419–443.
- Shoham, Y.; Leyton-Brown, K. Congestion Games. In Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations; Cambridge University Press: Cambridge, UK, 2009; Section 6.4; pp. 201–213.
- Shoham, Y.; Leyton-Brown, K. Teams of Selfish Agents: An Introduction to Coalitional Game Theory. In Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations; Cambridge University Press: Cambridge, UK, 2009; Chapter 12; pp. 383–408.
- Yamashita, T.; Izumi, K.; Kurumatani, K.; Nakashima, H. Smooth traffic flow with a cooperative car navigation system. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, Utrecht, The Netherlands, 25–29 July 2005; pp. 478–485.
- Levy, N.; Ben-Elia, E. Emergence of system optimum: A fair and altruistic agent-based route-choice model. Procedia Comput. Sci. 2016, 83, 928–933.
- Erev, I.; Ert, E.; Roth, A.E. A choice prediction competition for market entry games: An introduction. Games 2010, 1, 117–136.
- Lattimore, T.; Szepesvári, C. Bandit Algorithms; Cambridge University Press: Cambridge, UK, 2020.
- Zhou, J.; Lai, X.; Chow, J.Y.J. Multi-armed bandit on-time arrival algorithms for sequential reliable route selection under uncertainty. Transp. Res. Rec. 2019, 2673, 673–682.
- Yoon, G.; Chow, J.Y.J. Contextual bandit-based sequential transit route design under demand uncertainty. Transp. Res. Rec. 2020, 2674, 613–625.
- Sutton, R.S.; Barto, A.G. Upper-Confidence-Bound Action Selection. In Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018; Section 2.7; pp. 35–36.
- DLR – Institute of Transportation Systems. SUMO. Available online: http://www.dlr.de/ts/en/desktopdefault.aspx/tabid-9883/16931read-41000/ (accessed on 10 August 2021).
- Shoham, Y.; Leyton-Brown, K. Selfish routing and the price of anarchy. In Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations; Cambridge University Press: Cambridge, UK, 2009; Section 6.4.5; pp. 180–181.
Simulation Parameters | Total Time on 60th Sim | POA 60th Sim | Mean POA | Median POA |
---|---|---|---|---|
Direct exp., no re-routing, SAW weight = 0.25, ε = 0 | 4014.7 | 1.17 | 1.182 | 1.179 |
TIS, no re-routing, SAW weight = 0.5, Weight step = 0.1 | 3824.9 | 1.115 | 1.118 | 1.114 |
Direct exp., with re-routing, SAW weight = 0.5, ε = 0 | 3934.4 | 1.147 | 1.162 | 1.149 |
TIS with re-routing, SAW weight = 0.5, Weight step = 0.8 | 3819.9 | 1.113 | 1.121 | 1.118 |
TIS with Direct exp., SAW weight = 0.5, Weight step = 0.9 | 3722.4 | 1.085 | 1.092 | 1.085 |