Taxi behavior is characterized by dynamic, discrete, time-dependent events, namely customer pick-up, customer drop-off, cruising, and parking, within the spatial and temporal domain. Simulation models, which are simplifications of a real-world system, can help us understand the effects of changes in such dynamic behavior. In this regard, agent-based modeling and simulation, in which each taxi behaves as an agent, can capture such dynamic behavior by reconstructing complex patterns: the complex system is decomposed down to the level of single agents that are governed by sets of behavior rules [1]. The advantage of agent-based modeling is that, rather than describing the entire system with a single equation, the system is modeled as a collection of autonomous taxi agents with rules governing them, which allows complex individual agents to behave more naturally [2]. In this way, agent-based modeling and simulation can highlight the effect of a change in taxi services and its impact on drivers' income and profitability, by optimizing parameters derived from the simulation (number of trips, passenger waiting time). For example, what would be the impact on taxi service behavior if the number of agents, i.e., taxis, were increased in a region of low taxi demand or decreased in a region of high taxi demand? Understanding such causality could support better management of taxi fleets with regard to operational cost, as well as improve taxi drivers' income. Moreover, many big cities, such as London and New York, have recently made plans to adopt electric taxis [3], and understanding discrete taxi behavior through agent-based modeling could help optimize the locations of charging stations, which are crucial for such electric vehicles.
As taxi services operate throughout the city, spatial and temporal information from these vehicles can be an asset for many aspects of urban management. Such information could contribute mainly to better decision-making at both the government and local levels. In addition, continuing advances in collecting moving trajectory data in space and time have opened up a wide range of studies in the field of spatial information science [4]. One of the primary technologies for retrieving traffic data is stationary equipment, such as loop detectors and automatic vehicle identification devices. However, such equipment is limited to specific road sections. In contrast, a probe car, also known as a probe vehicle or floating car, uses vehicles in regular traffic to gather traffic information, and has been an emerging ITS technology for modeling vehicle behavior [5]. Big cities, like New York and Beijing, already have taxis equipped with GPS sensors that send spatial and temporal data to a data center, where the data are processed to extract traffic information [8]. A taxi driver's mobility intelligence is an essential factor in maximizing both profit and reliability in every possible scenario, and knowledge about the service can be an advantage for the driver [9]. However, understanding such stochastic dynamics of taxi behavior requires micro-level simulation models, which can be further analyzed to optimize taxi services by adjusting parameters such as demand and supply, or by altering the dispatching algorithm [10].
In this paper, agent-based modeling and simulation (ABMS), which in recent years has been applied in many areas, such as evacuation flow, traffic, and customer flow management [2], was implemented to emulate taxi behavior in Bangkok, Thailand. ABMS describes the dynamic actions of an entity, here a taxi agent, governed by behavior rules and properties, similar to the work presented in [6].
The contributions of this paper are summarized as follows:
Proposed a taxi agent in the spatial and temporal domain, based on stay point clusters of probe GPS data and a kernel density of their timestamps.
Formulated a concept of free taxi movement, based on the movement direction of the taxi, to represent taxis searching for passengers.
Developed an agent-based simulation model based on multiple parameters derived from probe GPS taxi data: taxi stay point clusters, trip information (origin and destination), taxi demand information, free taxi movement, and network travel time. The agents' parameters were mapped onto both a grid network and the road network: the grid network served as the base for querying, searching, and retrieving the taxi agents' parameters, while the actual movement of taxi agents took place on the road network, using routing and interpolation.
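To make the grid-based agent loop more concrete, the following is a minimal Python sketch of a discrete-time taxi agent that alternates between vacant and occupied states. It is an illustration under stated assumptions, not the authors' implementation: the pick-up probabilities (`PICKUP_PROB`), the origin-destination probabilities (`OD_PROB`), the 2 × 2 grid, the one-step trip duration, and the uniform neighbour move (used here in place of direction-based free movement) are all hypothetical stand-ins for the parameters that the model derives from probe data.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

Cell = Tuple[int, int]

# Hypothetical per-step probability that a vacant taxi picks up a passenger in each cell.
PICKUP_PROB = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.05}

# Hypothetical P(destination cell | origin cell) for passenger trips.
OD_PROB = {
    (0, 0): {(1, 1): 0.7, (0, 1): 0.3},
    (0, 1): {(0, 0): 1.0},
    (1, 0): {(0, 0): 0.5, (1, 1): 0.5},
    (1, 1): {(1, 0): 1.0},
}


@dataclass
class TaxiAgent:
    cell: Cell
    occupied: bool = False
    destination: Optional[Cell] = None
    trips: int = 0

    def step(self, rng: random.Random) -> None:
        if self.occupied:
            # Simplification: every passenger trip finishes within one time step.
            self.cell = self.destination
            self.occupied = False
            self.destination = None
            self.trips += 1
        elif rng.random() < PICKUP_PROB[self.cell]:
            # Draw the drop-off cell from the OD probabilities of the pick-up cell.
            dests = OD_PROB[self.cell]
            self.destination = rng.choices(list(dests), weights=list(dests.values()))[0]
            self.occupied = True
        else:
            # Vacant taxi: move to a random 4-neighbour cell that lies inside the grid.
            r, c = self.cell
            neighbours = [n for n in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                          if n in PICKUP_PROB]
            self.cell = rng.choice(neighbours)


def simulate(n_agents: int, n_steps: int, seed: int = 0) -> list:
    """Run all agents for n_steps synchronous time steps with a seeded RNG."""
    rng = random.Random(seed)
    cells = list(PICKUP_PROB)
    agents = [TaxiAgent(cell=rng.choice(cells)) for _ in range(n_agents)]
    for _ in range(n_steps):
        for agent in agents:
            agent.step(rng)
    return agents


agents = simulate(n_agents=10, n_steps=50)
print(sum(a.trips for a in agents), "completed trips")
```

Counting completed trips per agent, as above, is one simple way to read service indicators such as the number of passenger trips off a simulation run.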
The motivation for simulating taxi behavior is to optimize taxi service operations (the subject of future study) by increasing the number of passenger trips, reducing the time drivers wait for their next passenger, and making passenger trips longer, as well as by determining the optimum working time in the spatial and temporal domain. However, identifying and evaluating such optimization parameters requires knowledge of real taxi behavior. The proposed agent-based modeling and simulation recreates real taxi behavior, from which optimization parameters could be derived that would improve the taxi service both for drivers, in terms of monetary profit, and for passengers, in terms of the service level of the taxi.
Also, spatial data such as probe GPS taxi data are enormous in volume owing to their ubiquitous nature, and in most cases are deemed confidential, so obtaining raw spatial data is often difficult. However, the simulation and modeling techniques proposed in this study could recreate such spatial data from derived secondary data, with properties similar to those of the real data. In this regard, such techniques are not limited to vehicle behavior; they could also be used to simulate human mobility behavior from GPS or call detail record data.
2. Literature Review
In computer modeling, the term “model” describes an abstract or simplified representation of a real-world system that already exists or is planned for the future. A simulation model is typically defined as a mathematical process or algorithm that depends on various input parameters which, when processed with mathematical expressions, yield one or more outputs that encapsulate the behavior and performance of a system in real-world scenarios [11].
Taxi service simulation is a dynamic process involving changing demand and supply, as well as the urban traffic environment, which implies stochastic behavior in the taxi services that govern the movement and distribution of taxis [10]. A model of bilateral searching and meeting between taxis and customers in a network was proposed in [15], which considered the stochastic micro-level searching behavior of both taxis and customers as they search for each other, based on customer origin-destination (OD) data. The model featured location-dependent variation in the level of taxi service and stochastic microscopic searching behavior: taxis searched for passengers locally in the network following a Markov chain approach, in which the transition probability, or link choice probability, was specified by the customer pick-up rate within the network. An hourly zone-based origin-destination matrix of occupied vehicles was developed for evaluating taxi service behavior, and was then used to evaluate time-based taxi demand and supply for a given location [7].
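The Markov chain idea described for [15], where a vacant taxi's link choice probability is specified by the customer pick-up rate, can be illustrated with a small Python sketch. This is a simplified reading rather than the formulation of [15]: the three-link network (`SUCCESSORS`), the pick-up rates (`PICKUP_RATE`), and the rule that transition probabilities are proportional to the downstream links' pick-up rates are assumptions made for illustration.

```python
# Hypothetical network: for each link, the downstream links a vacant taxi can enter.
SUCCESSORS = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
# Hypothetical customer pick-up rates (pick-ups per hour) on each link.
PICKUP_RATE = {"a": 2.0, "b": 6.0, "c": 2.0}


def transition_probs(successors: dict, rate: dict) -> dict:
    """Link choice probability proportional to the pick-up rate of each downstream link."""
    probs = {}
    for link, nexts in successors.items():
        total = sum(rate[n] for n in nexts)
        probs[link] = {n: rate[n] / total for n in nexts}
    return probs


def propagate(dist: dict, probs: dict, steps: int) -> dict:
    """Location distribution of a vacant taxi after a number of link transitions."""
    for _ in range(steps):
        new = {link: 0.0 for link in probs}
        for link, p in dist.items():
            for nxt, q in probs[link].items():
                new[nxt] += p * q
        dist = new
    return dist


P = transition_probs(SUCCESSORS, PICKUP_RATE)
print(propagate({"a": 1.0}, P, steps=3))
```

Because each row of transition probabilities sums to one, the propagated distribution stays a probability distribution over links at every step.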
A probabilistic model of time-dependent taxi behavior on road segments and at parking spaces was devised for taxi-passenger recommendation, in which the probability of picking up a passenger was estimated when the taxi went to a specific parking space [16]. The model was primarily a recommendation system that suggested to the taxi driver a location where they would pick up a passenger. Moreover, [17] proposed passenger-finding strategies based on large real-world taxi data, comparing two strategies, looking for a passenger or waiting for one, which were analyzed using the average number of pick-ups over a given period and location. That model also focused only on predicting potential passengers for the events before pick-up and after drop-off. A time-dependent taxi behavior model was proposed which incorporated taxi pick-up, drop-off, cruising, and parking, for both taxi drivers and passengers. This model was also primarily a recommendation system, developed considering the queue length at parking places, along with day type and weather conditions. Given the current location and time of a taxi driver or passenger, the model provided the top parking places along with routes to them [8].
Time-dependent logit-based search models were proposed using global positioning data from urban taxis, in which profit per unit time was used as the factor characterizing taxi drivers' search behavior [18]. The local customer search behavior of vacant taxis was studied using a cell/grid-based approach, which showed that customer search decisions were significantly affected by the probability of successfully picking up a customer along the search route [19]. The model was further improved by introducing discrete choice behavior representing how taxi customers hail vacant taxis on the street, proposed by [20], which adopted a multinomial logit approach to model customers' preferences when hailing vacant taxis on streets. Furthermore, prediction models have been studied which apply learning algorithms to GPS data. Real-time streaming data were used to predict taxi passenger demand at a given taxi stand [9], in which the model predicted the passenger demand at the taxi stand for a given future period.
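The multinomial logit approach mentioned above assigns each alternative $i$ a choice probability $P(i) = e^{V_i} / \sum_j e^{V_j}$, where $V_i$ is its systematic utility. The Python sketch below computes these probabilities; the linear utility in profit per unit time and the coefficient `beta` (in the hypothetical helper `utilities_from_profit`) are illustrative choices, not the specifications estimated in [18] or [20].

```python
import math


def mnl_probabilities(utilities: dict) -> dict:
    """Multinomial logit: P(i) = exp(V_i) / sum_j exp(V_j)."""
    m = max(utilities.values())  # subtracting the maximum avoids overflow; P is unchanged
    expv = {k: math.exp(v - m) for k, v in utilities.items()}
    total = sum(expv.values())
    return {k: e / total for k, e in expv.items()}


def utilities_from_profit(profit_per_min: dict, beta: float = 0.5) -> dict:
    """Hypothetical linear utility: V_i = beta * expected profit per minute in zone i."""
    return {zone: beta * p for zone, p in profit_per_min.items()}


probs = mnl_probabilities(utilities_from_profit({"zone_a": 4.0, "zone_b": 3.0, "zone_c": 1.0}))
print({z: round(p, 3) for z, p in probs.items()})
```

Only utility differences matter in this model, which is why shifting all utilities by their maximum leaves the probabilities unchanged.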
Existing taxi behavior models primarily focus on passenger-finding strategies through demand prediction with a recommendation system, while few studies provide insight into the effect of an oversupply of taxis in a given area, or vice versa [21]. This is why modeling real taxi behavior is important, as such a model would replicate the real-world system. The agent-based simulation model proposed in this research primarily focuses on understanding real taxi behavior using GPS data from probe taxis, from which further investigation could be made into an efficient passenger-finding system, into managing taxi fleet operations, i.e., optimizing taxi service operation, and into the impact of an oversupply or undersupply of taxis relative to existing demand. Agent-based simulation has proved to be a powerful tool in computational science for analyzing complex problems in which random or stochastic behavior, like taxi behavior, can be represented together with behavioral rules. In this regard, [12] proposed a discrete event simulation model for the behavior of agents operating in a city road network, in which agents make their own decisions about making trips. Similarly, [6] provided a multi-agent-based simulation which modeled taxi drivers' strategies as decentralized discrete events, focusing only on the taxi driver's behavior, and was designed to produce an aggregated pattern of taxi movement similar to that of the real world.
3. System Overview and Preprocessing
The overall system is shown in Figure 1, which has preprocessing, data preparation, and taxi behavior modeling as its stages, and an indexing and processing platform as its tool for handling big data. The preprocessing stage consists of preparing the grid network and road network, and cleaning and map-matching the raw probe GPS data.
In the data preparation stage, multiple secondary datasets were extracted from the cleaned probe data, including stay points, passenger trip data, origin-destination probability data, demand probability data, direction probability data, and travel time data for the grid network and road network. In the taxi behavior modeling stage, agent-based modeling was implemented to simulate taxi behavior in the urban area.
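Two of these secondary datasets, origin-destination probability and direction probability, can be illustrated with a Python sketch that counts events per grid cell and normalizes them. The input shapes, the cell identifiers, and the choice of eight 45° compass sectors for direction are assumptions for illustration; the paper's actual HiveQL aggregation and direction binning are not reproduced here.

```python
import math
from collections import Counter, defaultdict

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def normalize(counts: dict) -> dict:
    """Turn {key: Counter} into {key: {value: probability}}."""
    out = {}
    for key, c in counts.items():
        total = sum(c.values())
        out[key] = {v: n / total for v, n in c.items()}
    return out


def od_probability(trips) -> dict:
    """trips: iterable of (origin_cell, destination_cell); returns P(destination | origin)."""
    counts = defaultdict(Counter)
    for origin, dest in trips:
        counts[origin][dest] += 1
    return normalize(counts)


def bearing_deg(lon1, lat1, lon2, lat2) -> float:
    """Initial great-circle bearing from point 1 to point 2, clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def sector(bearing: float) -> str:
    """Map a bearing to one of eight 45-degree sectors centred on N, NE, E, ..."""
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]


def direction_probability(moves) -> dict:
    """moves: iterable of (cell, lon1, lat1, lon2, lat2) for vacant taxis; returns P(sector | cell)."""
    counts = defaultdict(Counter)
    for cell, lon1, lat1, lon2, lat2 in moves:
        counts[cell][sector(bearing_deg(lon1, lat1, lon2, lat2))] += 1
    return normalize(counts)


print(od_probability([("g1", "g2"), ("g1", "g2"), ("g1", "g3"), ("g2", "g1")]))
```

Keying both tables by grid cell matches the role the grid network plays as the base for querying agent parameters.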
Managing a large volume of data requires an efficient indexing technique that can handle indexing, search, and retrieval jobs [22]. In this research, both spatial and non-spatial indexing techniques were implemented for simulation purposes. The STR tree, i.e., the sort-tile-recursive R-tree from the Java Topology Suite (JTS), was used to index and search spatial data. For non-spatial data, the Lucene index and search engine, which works with vector space model algorithms, was used for all query, search, and retrieval tasks during the simulation [23].
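The sort-tile-recursive idea behind the STR tree can be sketched in a few lines of Python: sort rectangles by the x-coordinate of their centres, cut them into about $\lceil\sqrt{P}\rceil$ vertical slices (with $P$ the number of leaf nodes), sort each slice by y, pack runs of the node capacity into nodes, and repeat on the node bounding boxes until a single root remains. This plain-Python sketch only illustrates the packing and a window query; it is not the JTS `STRtree` API, and the node capacity of 4 is an arbitrary choice.

```python
import math

# A rectangle is (min_x, min_y, max_x, max_y); a node is (bounding_box, children, is_leaf).


def bbox(boxes):
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def str_pack(items, capacity, box):
    """Sort-Tile-Recursive grouping of items into runs of at most `capacity`."""
    n_nodes = math.ceil(len(items) / capacity)
    n_slices = math.ceil(math.sqrt(n_nodes))
    slice_size = n_slices * capacity
    by_x = sorted(items, key=lambda it: (box(it)[0] + box(it)[2]) / 2)
    groups = []
    for i in range(0, len(by_x), slice_size):
        vertical = sorted(by_x[i:i + slice_size], key=lambda it: (box(it)[1] + box(it)[3]) / 2)
        groups.extend(vertical[j:j + capacity] for j in range(0, len(vertical), capacity))
    return groups


def build_str_tree(rects, capacity=4):
    if not rects or capacity < 2:
        raise ValueError("need at least one rectangle and capacity >= 2")
    level = [(bbox(g), g, True) for g in str_pack(rects, capacity, box=lambda r: r)]
    while len(level) > 1:  # pack node bounding boxes until a single root remains
        groups = str_pack(level, capacity, box=lambda node: node[0])
        level = [(bbox([node[0] for node in g]), g, False) for g in groups]
    return level[0]


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def query(node, window):
    """Return all indexed rectangles that intersect `window`."""
    box, children, is_leaf = node
    if not intersects(box, window):
        return []
    if is_leaf:
        return [r for r in children if intersects(r, window)]
    return [r for child in children for r in query(child, window)]


cells = [(i, j, i + 1, j + 1) for i in range(10) for j in range(10)]
tree = build_str_tree(cells)
print(sorted(query(tree, (2.5, 2.5, 3.5, 3.5))))
```

Because the tree is bulk-loaded once from a static set, this style of index suits data such as grid cells and road segments that do not change during a simulation run.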
In addition to indexing the large volume of data, all preprocessing of the data used for simulation, including cleaning, retrieving trip information and origin-destination data, stay point extraction, and movement direction extraction, was conducted on the Apache Hadoop/Hive large-scale distributed computing system [24]. The GPS probe data preprocessed for 1 June 2015 to 31 July 2015 totaled about 2.2 billion rows, stored in the Hadoop Distributed File System (HDFS). Each data row consisted of a GPS data point with the specification described in Table 1. For spatial data processing, Apache Hive-based queries in HiveQL (Hive Query Language) were developed, including Hive UDFs (User-Defined Functions) and Hive UDAFs (User-Defined Aggregation Functions).
3.1. Vehicle Probe Data
In this research, the vehicles are taxis running in and around Bangkok, Thailand, whose data were provided by Toyota Tsusho Nexty Electronics (Thailand) Co., Ltd., Bangkok, Thailand. The probe GPS data were collected from approximately 10,000 taxis, with a sampling interval of 3 s or 5 s, from 1 June 2015 to 31 July 2015.
Each collected probe data point belongs to a spatial trajectory generated by a moving taxi in geographical space, such that trajectory $T_i = \{p_1, p_2, \ldots, p_j\}$, where $p_j = (x_j, y_j, t_j)$, such that $x_j$ = longitude, $y_j$ = latitude, and $t_j$ = timestamp. Table 1 shows the data specification and sample data of the collected probe data.
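Since the taxi agent is built on stay point clusters, it may help to see how stay points can be extracted from a trajectory $T_i = \{p_1, \ldots, p_j\}$ of points $p_j = (x_j, y_j, t_j)$. The Python sketch below uses a common distance/time-threshold rule: a stay point is recorded when consecutive points remain within `dist_m` of an anchor point for at least `time_s` seconds. The paper does not state its extraction rule at this point, so the rule and the 100 m / 300 s thresholds are assumptions.

```python
import math
from typing import List, NamedTuple, Tuple


class GPSPoint(NamedTuple):
    x: float  # longitude (degrees)
    y: float  # latitude (degrees)
    t: float  # timestamp (seconds)


def haversine_m(p: GPSPoint, q: GPSPoint) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(p.y), math.radians(q.y)
    dphi = phi2 - phi1
    dlmb = math.radians(q.x - p.x)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def stay_points(traj: List[GPSPoint], dist_m: float = 100.0,
                time_s: float = 300.0) -> List[Tuple[float, float, float, float]]:
    """Return (mean_lon, mean_lat, arrival_t, leave_t) for each detected stay."""
    result, i, n = [], 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the window while points stay within dist_m of the anchor point traj[i].
        while j < n and haversine_m(traj[i], traj[j]) <= dist_m:
            j += 1
        if traj[j - 1].t - traj[i].t >= time_s:
            seg = traj[i:j]
            result.append((sum(p.x for p in seg) / len(seg), sum(p.y for p in seg) / len(seg),
                           traj[i].t, traj[j - 1].t))
            i = j  # continue scanning after the stay
        else:
            i += 1
    return result


# Ten minutes parked (one point per minute), then driving away about 1 km per minute.
traj = ([GPSPoint(100.5, 13.7, 60.0 * k) for k in range(10)]
        + [GPSPoint(100.5 + 0.01 * k, 13.7, 600.0 + 60.0 * k) for k in range(1, 5)])
print(stay_points(traj))
```

The arrival and leave times of each stay are what a kernel density over timestamps would then be estimated from.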
3.2. Prepare Road Network and Grid Network
OpenStreetMap (OSM) data of Thailand were used for the road network, after topological errors had been cleaned [25]. Here, the road network is represented by $R$, such that $R = \{r_1, r_2, \ldots, r_n\}$, where $r_1, \ldots, r_n$ are the individual road segments. A total of 228,416 OSM road network features were extracted for Bangkok and the surrounding provinces. Following the preparation of the OSM road network data, the taxi probe GPS data were preprocessed to remove erroneous records. The cleaned GPS data were then map-matched to the OSM road network $R$ with a probabilistic map-matching process [25], which mapped the GPS data onto road segments and was the subject of previous research work.
A small grid size of 500 × 500 meters was chosen for the grid network in order to preserve spatial patterns and characteristics within each grid cell [26]. However, the selection of an optimum grid size is still subjective, as a larger grid size could be suitable for suburban or rural areas but not for dense urban areas [27]. The 500 × 500 meter grid network was constructed to cover the whole Bangkok region as well as the surrounding provinces. Here, the grid network is represented by $G$, such that $G = \{g_1, g_2, \ldots, g_m\}$, where $g_1, \ldots, g_m$ are the individual grid cells. A total of 64,620 grid network features were prepared for Bangkok and the surrounding provinces. In addition to the road network map matching, the cleaned GPS data were also mapped to the grid network $G$. Figure 2 shows the OSM road network and grid network in Bangkok and the surrounding provinces.
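Mapping a GPS point to a 500 × 500 m cell of $G$ amounts to projecting it to metric coordinates and flooring by the cell size. The Python sketch below uses a local equirectangular approximation around a hypothetical south-west grid origin (`ORIGIN_LON`, `ORIGIN_LAT`); the paper does not state its projection or origin here, and a projected coordinate system such as UTM would typically be more accurate over a region the size of Bangkok and its surrounding provinces.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
CELL_M = 500.0
# Hypothetical south-west corner of the grid (not given in the paper).
ORIGIN_LON, ORIGIN_LAT = 99.9, 13.3


def to_local_m(lon: float, lat: float, lon0: float = ORIGIN_LON, lat0: float = ORIGIN_LAT):
    """Equirectangular approximation: metres east and north of the grid origin."""
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


def cell_of(lon: float, lat: float, cell_m: float = CELL_M):
    """(column, row) index of the grid cell containing the point."""
    x, y = to_local_m(lon, lat)
    return int(math.floor(x / cell_m)), int(math.floor(y / cell_m))


print(cell_of(100.5, 13.75))
```

Using `math.floor` rather than `int` truncation keeps points west or south of the origin in negative cells instead of folding them into cell 0.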
The grid network was used because it simplifies computation while maintaining both the spatial and temporal relevance of the aggregated dataset. A grid network also splits the given spatial region into disjoint areas, which makes it easy to inspect for further qualitative analysis [27]. The road network was used during the preprocessing steps of cleaning and map-matching the probe taxi data, and later for routing and interpolation of the simulated taxi agent trajectories, as described in Section 5.
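Once a simulated trip has been routed on the road network, the agent's trajectory still has to be interpolated in time. A minimal Python sketch, assuming the route is already available as a polyline in projected metres, that the agent moves at constant speed along it, and that the total travel time comes from network travel time data; the 5 s default step only mirrors the probe sampling interval and is otherwise arbitrary.

```python
import math
from bisect import bisect_right


def interpolate_route(route, start_t, travel_time_s, step_s=5.0):
    """Return (t, x, y) samples every step_s seconds along a polyline, at constant speed.

    route: list of (x, y) vertices in metres (a projected CRS); at least two vertices.
    """
    if len(route) < 2 or travel_time_s <= 0:
        raise ValueError("need a route with two or more vertices and a positive travel time")
    seg = [math.dist(a, b) for a, b in zip(route, route[1:])]
    cum = [0.0]
    for d in seg:
        cum.append(cum[-1] + d)  # cumulative distance at each vertex
    times = [k * step_s for k in range(int(travel_time_s // step_s) + 1)]
    if times[-1] < travel_time_s:
        times.append(travel_time_s)  # always end exactly at the destination
    samples = []
    for t in times:
        s = cum[-1] * t / travel_time_s  # distance covered at time t
        i = min(bisect_right(cum, s) - 1, len(seg) - 1)  # segment containing s
        frac = (s - cum[i]) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = route[i], route[i + 1]
        samples.append((start_t + t, x0 + frac * (x1 - x0), y0 + frac * (y1 - y0)))
    return samples


for sample in interpolate_route([(0, 0), (100, 0), (100, 100)], start_t=0.0,
                                travel_time_s=40.0, step_s=10.0):
    print(sample)
```

A constant-speed assumption is the simplest choice; per-segment travel times would allow speed to vary along the route.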
Modeling taxi service is an important aspect of understanding taxi service levels in a city. This paper proposes a data-driven agent-based simulation model to study simulated taxi behavior in a large-scale urban area using taxi probe vehicle data. Analysis of the taxi agent simulation showed significant similarity with the real taxi data, indicating that the simulated results could retain the real nature of taxi service behavior. A previous study on agent-based modeling for taxi behavior analysis compared the measured and modeled travel distance, travel time, and cost to validate its model. However, for travel time, the results were scattered between the measured and modeled data [12], for which two issues, regarding road network density and routing, were mentioned. In the agent-based modeling presented here, taxi service modeling was categorized by weekday and weekend. With the increasing use of GPS probe data, service modeling could be extended with other factors, such as daily and monthly variation. More importantly, such simulation can help understand and predict the effect of having a large number of taxis in a spatial and temporal domain with low demand, and vice versa. Understanding such taxi behavior in the city can significantly help in managing and dispatching taxi fleets so as to increase drivers' monetary profit.
A limitation of the current agent-based model is that it uses an offline learning method, which is constrained in time and resources when it must learn from a high-speed streaming dataset. Despite its many use cases, offline learning is limited in how it can handle new data. The model therefore needs to be improved to accommodate learning from high-speed streaming data, as proposed in [36], where the OD matrix constantly evolves over time as new data are added and outdated data are removed. The model could also be improved with respect to the free movement of vacant taxis, by replacing movement directed by a direction angle with a search for the next road network node at each time interval. Furthermore, the current agent-based model describes taxi behavior only for trips within 2 h in duration and 100 km in distance. Although such trips accounted for about 98% of all trips, the model could be further improved to encapsulate both short and long trips, in terms of both time and distance.
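The streaming alternative mentioned above can be illustrated with an OD matrix maintained over a sliding time window: each new trip increments its origin-destination count, and trips older than the window are evicted. This Python sketch is only an illustration of that idea, not the method of [36]; the class name `SlidingODMatrix`, the one-hour window, the cell labels, and the assumption that trips arrive in timestamp order are hypothetical.

```python
from collections import Counter, deque


class SlidingODMatrix:
    """OD trip counts over the most recent `window_s` seconds (trips must arrive in time order)."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.trips = deque()  # (timestamp, origin, destination), oldest first
        self.counts = Counter()

    def add(self, t: float, origin: str, dest: str) -> None:
        self.trips.append((t, origin, dest))
        self.counts[(origin, dest)] += 1
        self._expire(t)

    def _expire(self, now: float) -> None:
        # Evict trips that have fallen out of the window and decrement their counts.
        while self.trips and self.trips[0][0] <= now - self.window_s:
            _, o, d = self.trips.popleft()
            self.counts[(o, d)] -= 1
            if self.counts[(o, d)] == 0:
                del self.counts[(o, d)]

    def prob(self, origin: str, dest: str) -> float:
        """P(dest | origin) within the current window (linear scan, fine for a sketch)."""
        total = sum(n for (o, _), n in self.counts.items() if o == origin)
        return self.counts.get((origin, dest), 0) / total if total else 0.0


od = SlidingODMatrix(window_s=3600)
for t, o, d in [(0, "A", "B"), (100, "A", "C"), (200, "A", "B"), (3650, "A", "C")]:
    od.add(t, o, d)
print(od.prob("A", "B"))
```

Each trip is added and evicted exactly once, so keeping the matrix current costs amortized constant time per trip rather than a full offline recomputation.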