# Zone-Agnostic Greedy Taxi Dispatch Algorithm Based on Contextual Matching Matrix for Efficient Maximization of Revenue and Profit


## Abstract


## 1. Introduction

## 2. Related Works

#### 2.1. Taxi Dispatching Systems

#### 2.2. Reinforcement Learning

## 3. Modeling the Taxi Dispatch Problem

## 4. Methodology

#### 4.1. CMM (Contextual Matching Matrix)

Algorithm 1: Making a match with the Contextual Matching Matrix (CMM)

#### 4.2. M-Greedy

Algorithm 2: Simulation in M-Greedy

Algorithm 3: Dispatch process in M-Greedy

#### 4.3. IQL (Independent Q-Learning)

- State S: S is composed of $<O,\Psi ,\overline{OD_{i}},Out_{i}>$ and is further divided into a global state and a partially observable state. The global state is composed of $<O,\Psi>$, where $O$ is the proportion of fleets to be assigned to every destination zone and $\Psi$ is the distribution of supplies over all zones. The partially observable state is composed of $<\overline{OD_{i}},Out_{i}>$, where $\overline{OD_{i}}$ is the average OD distance from $Z_{i}$ to every zone and $Out_{i}$ is the order proportion of $Z_{i}$.
- Action A: A set of possible fleet distributions chosen by an agent.
- Reward R: R is based on the estimated profit after every decision time step.
- Discount factor $\gamma $: The reward of the predicted future is discounted with a factor set to 0.99.
- State transition probability function T: A taxi matched with a passenger becomes idle after the estimated time to travel to the destination zone. Assuming that a taxi drives 400 m at each time step (as defined in Equation (3)), we choose uniformly at random a future time step (from 0 to 5) at which the taxi is added back to the supply pool after the idle period. Passenger demands are generated randomly for the succeeding states.
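The transition and update rules above can be sketched as follows. This is an illustrative reading rather than the paper's implementation; names such as `travel_steps`, `return_step`, and `q_update` are assumptions.

```python
import random

# Illustrative sketch of the IQL transition and update described above;
# constant and function names are assumptions, not the paper's code.
STEP_DISTANCE_M = 400  # a taxi drives 400 m per simulation time step
GAMMA = 0.99           # discount factor for predicted future rewards

def travel_steps(od_distance_m: float) -> int:
    """Estimated time steps for a matched taxi to reach its destination zone."""
    return max(1, round(od_distance_m / STEP_DISTANCE_M))

def return_step(t: int, od_distance_m: float) -> int:
    """Step at which the taxi rejoins the supply pool: travel time plus a
    uniformly random idle delay of 0 to 5 steps."""
    return t + travel_steps(od_distance_m) + random.randint(0, 5)

def q_update(q, state, action, profit, next_state, actions, alpha=0.1):
    """Standard independent Q-learning update; the reward is the estimated
    profit observed after the decision time step."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (profit + GAMMA * best_next - old)
```

The update is the ordinary tabular Q-learning rule with the discount factor set to 0.99 as stated above; any function approximation used by the actual agents is omitted.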

Algorithm 4: Dispatch in IQL

#### 4.4. DDR (Distribution Difference Reward)

#### 4.5. Z-CMM (Zone-Agnostic CMM)

## 5. Evaluation

## 6. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest


**Figure 2.** Consider an example where 100 passenger orders exceed the supply of 60 taxis in zone B. We can assign more taxis to the zone with the highest expected revenue- and profit-generation opportunities (for example, zone C). Such an approach generates a better result than the naïve approach of dividing the fleet in proportion to the current demand in each zone without predicting the situation at the drop-off zones.
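The naïve baseline that the caption argues against can be sketched as follows: dividing the available taxis among zones in proportion to current demand only. The function name and the zone figures are illustrative, following the example in Figure 2.

```python
# Minimal sketch of the naive demand-proportional split (the baseline the
# caption argues against), not the paper's dispatch algorithm.
def proportional_dispatch(supply: int, demand: dict) -> dict:
    """Split `supply` taxis among zones in proportion to current demand."""
    total = sum(demand.values())
    alloc = {z: supply * d // total for z, d in demand.items()}
    # hand out any rounding remainder to the highest-demand zones
    rest = supply - sum(alloc.values())
    for z in sorted(demand, key=demand.get, reverse=True)[:rest]:
        alloc[z] += 1
    return alloc

demand = {"A": 20, "B": 100, "C": 40}
print(proportional_dispatch(60, demand))  # {'A': 7, 'B': 38, 'C': 15}
```

Such a split ignores where the dropped-off taxis will end up, which is exactly the information the predictive approaches in this paper exploit.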

**Figure 3.** Illustration of demand during (**a**) 30 s (12:00:00 to 12:00:30 a.m.), (**b**) 10 min (12:00 to 12:10 a.m.), and (**c**) 24 h (12:00 a.m. to 12:00 a.m. the next day).

**Figure 4.** Seoul is divided into (**a**) 2, (**b**) 4, (**c**) 8, and (**d**) 25 zones. The matching process occurs independently within each zone.

**Figure 6.** In every look-ahead state, M-Greedy chooses the top-$k$ distributions that yield the highest revenue and profit. In this figure, each block represents a top-$k$ case after simulating all cases in the preceding state. The platform then looks ahead $D$ time steps into the future. As a result, M-Greedy chooses the best case among the $k^{D}$ cases once the computation on the last look-ahead state is finished. In this paper, we set $k$ to 3 and $D$ to 3.
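The look-ahead in this caption can be sketched roughly as follows, assuming a `simulate` function that returns the immediate profit and resulting fleet state for a candidate distribution; all names here are illustrative stand-ins for the paper's simulator.

```python
# Hedged sketch of the Figure 6 look-ahead: keep the top-K candidates at
# each state and recurse D levels deep, comparing K**D candidate paths.
K, D = 3, 3  # values used in the paper

def lookahead(state, candidates, simulate, depth=D):
    """Best cumulative profit reachable in `depth` steps, expanding only
    the top-K candidate distributions at each look-ahead state."""
    if depth == 0:
        return 0.0
    # simulate every candidate distribution from this state
    results = [simulate(state, c) for c in candidates]
    # keep the top-K by immediate profit, then recurse on each survivor
    top = sorted(results, key=lambda r: r[0], reverse=True)[:K]
    return max(p + lookahead(s, candidates, simulate, depth - 1)
               for p, s in top)
```

With $k = 3$ and $D = 3$ this explores $3^{3} = 27$ candidate sequences, matching the $k^{D}$ count stated in the caption.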

**Figure 10.** Comparison of the cumulative profit by each methodology during 24 h. The Z-CMM methodology recorded the highest profit.

**Figure 11.** Comparison of the cumulative revenue by each methodology during 24 h. The Z-CMM methodology recorded the highest revenue among the methodologies.

**Figure 12.** Zone-sensitive approaches do not allow a taxi to pick up a passenger in another service zone, even when that passenger is nearer than any passenger within the taxi's own zone. This is because zone-sensitive approaches must determine the fleet proportions for the different destination zones before matching taxis and passengers individually.

| Notation | Definition |
|---|---|
| $e$ | Episode time |
| $E_{max}$ | Maximum episode time |
| $t$ | Simulation time step |
| $T_{max}$ | Maximum number of simulation time steps |
| $N_{z}$ | Number of zones in Seoul |
| $N_{taxi}^{idle}$ | Number of idle taxis |
| $d$ | Number of time steps simulated so far |
| $D$ | Number of time steps to simulate |
| $k$ | Number of top-$k$ cases to be chosen in a single simulation step |
| $S_{t}$ | Fleet state at time step $t$ |
| $S_{t,n}^{\prime}$ | Fleet state at $t$ in the $n$th-best simulation case |
| $Z_{i}$ | Zone identified by $i$ |
| $OD$ | Vector of order distances from origin zone to destination zone |
| $OD_{matched}$ | Sum of OD distances of matched orders |
| $PD$ | Matrix of distances from orders to idle taxis |
| $PD_{Normalized}$ | Normalized $PD$ |
| $PD_{matched}$ | Sum of PD distances of matched orders |
| $Out_{i}$ | Out degree of $Z_{i}$ |
| $O_{i}$ | Ratio of orders in $Z_{i}$ to the total orders |
| $\Psi_{i}$ | Ratio of supplies in $Z_{i}$ to the total supplies |
| $p_{i}^{j}$ | Distribution with the $j$th-highest profit chosen by the agent of $Z_{i}$ |
| $\Delta$ | Sum of the differences between the ratio of orders and the ratio of supplies in each zone |
| $Pro$ | Calculated profit in a time step |
| $Rev$ | Calculated revenue in a time step |
| $Re$ | Calculated reward in a time step |
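The distribution-difference term $\Delta$ from the notation table can be sketched as follows. This is one plausible reading that assumes absolute differences; the function name `delta` is illustrative, and the exact DDR reward formula is not reproduced here.

```python
# Sketch of Delta: the sum over zones of |O_i - Psi_i|, where O_i is
# zone i's share of total orders and Psi_i its share of total supplies.
# Assumes absolute differences; an illustration, not the paper's formula.
def delta(orders: list, supplies: list) -> float:
    total_o, total_s = sum(orders), sum(supplies)
    return sum(abs(o / total_o - s / total_s)
               for o, s in zip(orders, supplies))
```

A perfectly matched distribution yields $\Delta = 0$, so a reward that penalizes $\Delta$ pushes agents toward aligning supply with demand across zones.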

| Methods | Zone-Based Matching | Ratio of Taxis between Destination Zones | Myopic | Far-Sighted | Taxi–Passenger Spatial Distribution |
|---|---|---|---|---|---|
| CMM | O | X | X | X | X |
| M-Greedy | O | O | O | X | X |
| IQL | O | O | X | O | X |
| DDR | O | O | X | O | O |
| Z-CMM | X | X | X | X | X |

| Pick-Up Timestep | Pick-Up Zone Number | Destination Zone Number | Pick-Up Latitude | Pick-Up Longitude | Drop-Off Latitude | Drop-Off Longitude |
|---|---|---|---|---|---|---|
| 0 | Zone 1 | Zone 1 | 37.616380 | 127.12491 | 37.4931945 | 127.029665 |
| 0 | Zone 1 | Zone 3 | 37.612991 | 126.92186 | 37.463410 | 126.527850 |
| … | … | … | … | … | … | … |
| 2879 | Zone 25 | Zone 25 | 37.493194 | 126.90832 | 37.6083009 | 127.034780 |

| Zone Number | Latitude | Longitude |
|---|---|---|
| Zone 1 | 37.40802 | 127.1442006 |
| Zone 1 | 37.795449 | 126.19642 |
| … | … | … |
| Zone 25 | 37.119776 | 126.048993 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, Y.; Yoon, Y.
Zone-Agnostic Greedy Taxi Dispatch Algorithm Based on Contextual Matching Matrix for Efficient Maximization of Revenue and Profit. *Electronics* **2021**, *10*, 2653.
https://doi.org/10.3390/electronics10212653
