1. Introduction
Unlicensed taxis are vehicles that provide passenger transport services without legally obtaining the necessary operating permits, and they pose significant safety risks to the transportation sector [1]. Currently, many drivers in China are involved in unlicensed taxi operations. Because these drivers do not undergo the necessary assessments and qualification checks, their quality is inconsistent, and some even have criminal records, posing a serious threat to passenger safety. In addition, most unlicensed taxis are low-end or poorly maintained vehicles whose condition is difficult to guarantee, which creates further risks for passengers. Traditional methods for identifying unlicensed taxis rely primarily on manual roadside inspections and passenger reports. These methods not only consume substantial human and material resources but can also disrupt normal traffic order and lead to social conflicts. Therefore, supplementing traditional methods with traffic big data analysis to identify unlicensed taxi operations is crucial for optimizing road transportation management.
Traffic surveillance bayonet (checkpoint) data contain a vast amount of vehicle travel information from which travel characteristics can be effectively extracted, enabling the identification of unlicensed vehicles and offering a new approach to managing them. Some scholars have used big data mining techniques to extract vehicle travel information from traffic surveillance bayonet data. Liu et al. [2] analyzed vehicle travel characteristics by applying frequent sequential pattern mining algorithms to bayonet vehicle trajectory sequences. Wang et al. [3] used big data mining techniques to collect hourly data on vehicle-type composition, road types, and the time-varying travel characteristics and technical parameters of non-local vehicles in the road network inside and outside the First Ring Road of Foshan City. Chen et al. [4] proposed a method for completing vehicle origin-destination (OD) data based on license plate recognition data, which supplements and restores OD information at detection points by assigning link flows, thereby obtaining complete vehicle travel paths. Wei et al. [5] established a vehicle travel-demand evaluation index system based on traffic surveillance bayonet data and analyzed vehicle travel characteristics from multiple dimensions. Long et al. [6] extracted vehicle travel-path information from traffic surveillance bayonet data and analyzed vehicle travel patterns at both the individual and aggregate levels. Ruan et al. [7] combined the K-shortest path algorithm with grey relational analysis to make path decisions and reconstruct travel paths from license plate recognition data. Yao et al. [8] used bayonet license plate recognition data and applied factor analysis to integrate vehicle travel stability factors, efficiently identifying commuter vehicles. Wu et al. [9] introduced a heuristic optimal scheduling approach to identify abnormal traffic events, and their results demonstrated that this method outperforms existing algorithms. These studies, based on traffic surveillance bayonet data, define feature indicators according to vehicle travel patterns in order to extract vehicle travel characteristics, such as those of commuter vehicles, and thus provide a methodological reference for extracting the features of suspected unlicensed taxis and identifying them.
Scholars in China and abroad have conducted extensive research on identifying unlicensed taxis. Lin et al. [10] proposed a reporting system for unlicensed vehicles based on Near Field Communication (NFC) technology and Android, encouraging passengers to report suspected unlicensed taxis and creating an effective environment for their governance. Wang et al. [11] obtained trajectory data of unlicensed and normal vehicles through simulation experiments and used Convolutional Neural Networks (CNNs) for feature learning and recognition, improving the identification rate of suspected unlicensed taxis. Li et al. [12] used vehicle mileage as an identification indicator to establish a model for identifying unlicensed taxis and identified suspected unlicensed taxis through a case study. Shuai et al. [13] extracted vehicle Radio Frequency Identification (RFID) data, proposed a k-medoids-based algorithm for identifying unlicensed taxis, and validated it through experiments. Ma et al. [14] collected vehicle operation data using RFID technology and applied the self-organizing map (SOM) neural network clustering algorithm to establish a mathematical model for identifying unlicensed taxis, achieving favorable results. Tian et al. [15] developed a method for identifying unlicensed taxis based on electronic toll collection (ETC) data using an improved K-means++ algorithm and applied it to identify unlicensed taxis in Guiyang. These studies focus mainly on algorithmic models but rely largely on simulated data for theoretical validation, so their applicability requires further testing.
With the development of big data analysis techniques, some scholars have identified unlicensed taxis by analyzing various types of traffic data and extracting different feature indicators. Zhao et al. [16] analyzed vehicle refueling data, extracted temporal and spatial refueling characteristics, and defined vehicles with abnormal refueling patterns as suspected unlicensed taxis. Chen et al. [17] used Electronic Registration Identification data to build an ensemble-learning detection model that identifies suspected unlicensed taxis. Yuan et al. [18] extracted vehicle travel data from traffic surveillance bayonets, developed a detection model for coarse-grained identification of unlicensed taxis, and then applied a Support Vector Machine classifier trained on the extracted features for fine-grained identification. Wang et al. [19] extracted two types of behavioral features, daily behaviors and sustainable behaviors, and compared three machine learning methods in terms of accuracy and the number of unlicensed taxis identified. Tian et al. [20] proposed two indicators, “path irregularity” and “time irregularity”, and developed a model to distinguish between commercial and non-commercial vehicles. Huang et al. [21] employed a random forest algorithm to develop a classifier for identifying unlicensed taxis and compared its performance with that of other models to validate the accuracy of the proposed method. Juan et al. [22] used mobile GPS data to develop a decision-tree classification algorithm that can be applied to urban traffic monitoring. Although these studies used machine learning algorithms to identify suspected unlicensed taxis in their study areas, they did not further analyze the spatiotemporal trajectory characteristics of these vehicles, which limits their ability to support precise management by the authorities.
Therefore, this study extends previous research by using traffic surveillance bayonet data to collect vehicle passage information. Spatiotemporal characteristic indicators of private cars, confirmed unlicensed taxis, and compliant taxis were selected and compared using analysis of variance (ANOVA) to determine the identification indicators for unlicensed taxis. On this basis, binary logistic regression was used to develop an unlicensed taxi identification model from the significant identification indicators. The fine-grained spatiotemporal characteristics of the suspected unlicensed taxis were then extracted, and governance measures for these vehicles were proposed. The results enrich the methods for identifying and regulating unlicensed taxis and provide strategic guidance for traffic management authorities in formulating policies against illegal operations, thereby improving law enforcement efficiency and travel safety. The main contributions of this study are as follows:
- (1) This study addresses a limitation of previous research, which was constrained by a narrow range of data types and small sample sizes. Unlike prior studies that mined data from the vehicle's perspective, this study adopts the perspective of traffic managers and uses traffic surveillance bayonets distributed across the road network to collect vehicle passage data. This approach makes it convenient to gather accurate, multidimensional traffic data on vehicles on the road, providing a robust data foundation for identifying unlicensed taxis and supporting practical application.
- (2) Rather than relying on a single vehicle characteristic, this study incorporates multiple features, such as daily average mileage, daily operating time, and the ratio of operating days. This multidimensional cross-analysis improves identification accuracy and model robustness and mitigates the bias associated with relying on a single indicator.
- (3) To develop effective solutions for the precise regulation of unlicensed taxis, this study analyzed the spatiotemporal distribution of suspected unlicensed taxis, including their operating start and end points and times, to identify distribution patterns. This analysis provides a strong basis for traffic management authorities to implement targeted regulation, thereby improving management efficiency.
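The features named in contribution (2) can be derived from raw bayonet passage records. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each record is a hypothetical `(plate, bayonet_id, timestamp)` tuple, that a road-distance lookup between bayonet pairs (`dist`) is available, and that consecutive passages separated by no more than a chosen gap threshold belong to one continuous trip. An operating day is assumed to be any day with at least one passage, and daily averages are taken over those days.

```python
from collections import defaultdict
from datetime import datetime


def daily_features(records, dist, gap_threshold_s, study_days):
    """Sketch: per-plate daily average mileage (km), daily average operating
    time (h), and ratio of operating days from bayonet passage records.

    Assumptions (hypothetical, not from the source):
      records: iterable of (plate, bayonet_id, timestamp) tuples
      dist: {(bayonet_a, bayonet_b): road distance in km}, looked up in either order
      gap_threshold_s: max seconds between consecutive passages of one trip
      study_days: number of days in the observation window
    """
    passes_by_plate = defaultdict(list)
    for plate, bayonet, ts in records:
        passes_by_plate[plate].append((ts, bayonet))

    features = {}
    for plate, passes in passes_by_plate.items():
        passes.sort()  # chronological order
        active_days = {ts.date() for ts, _ in passes}
        total_km = 0.0
        total_s = 0.0
        for (t0, b0), (t1, b1) in zip(passes, passes[1:]):
            gap = (t1 - t0).total_seconds()
            d = dist.get((b0, b1), dist.get((b1, b0)))
            # Count only short gaps between bayonets with a known distance;
            # longer gaps are treated as a stop or a break between trips.
            if gap <= gap_threshold_s and d is not None:
                total_km += d
                total_s += gap
        n_days = len(active_days)  # at least 1 for any plate that was seen
        features[plate] = {
            "daily_avg_km": total_km / n_days,
            "daily_avg_hours": total_s / 3600.0 / n_days,
            "operating_day_ratio": n_days / study_days,
        }
    return features


# Toy example with hypothetical bayonets K1..K3 on a single day.
records = [
    ("A1", "K1", datetime(2024, 5, 1, 8, 0)),
    ("A1", "K2", datetime(2024, 5, 1, 8, 10)),
    ("A1", "K3", datetime(2024, 5, 1, 8, 25)),
    ("A1", "K1", datetime(2024, 5, 1, 12, 0)),  # 3.5 h gap: not counted
]
dist = {("K1", "K2"): 3.0, ("K2", "K3"): 4.5, ("K3", "K1"): 6.0}
features = daily_features(records, dist, gap_threshold_s=1800, study_days=2)
print(features["A1"])
```

Because only passages past bayonets are observed, the distance between consecutive bayonets stands in for the route actually driven, and the gap threshold decides whether a long interval is read as continuous driving or as a stop.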
6. Conclusions
This study uses traffic surveillance bayonets to collect vehicle passage information and analyzes the spatiotemporal characteristics of private cars, confirmed unlicensed taxis, and compliant taxis. Based on this analysis, identification indicators are established, and operating characteristic metrics are calculated for each vehicle using the method described above. The unlicensed taxi identification model is then applied to identify suspected unlicensed taxis, enabling traffic management authorities to develop targeted regulatory strategies for these vehicles. The main conclusions are as follows:
- (1) Based on traffic surveillance bayonet data, a distance matrix was constructed and a driving interval threshold was set. Vehicle mileage and operating time were then calculated, and analysis of variance was used to compare private cars, unlicensed taxis, and compliant taxis, leading to the determination of the unlicensed taxi identification indicators.
- (2) Based on these identification indicators, a binary logistic regression model was established. The model parameters were estimated by maximum likelihood, and the model's goodness of fit and predictive power were evaluated using the Hosmer–Lemeshow test and ROC curve analysis. The results show that the model can effectively predict the probability that a vehicle is engaged in unlicensed taxi operations (recall R = 89.26%, precision P = 94.74%, accuracy ACC = 99.10%, F1 = 91.91%, AUC = 0.994).
- (3) Using information on the identified suspected unlicensed taxis, their daily operating start and end times and the distribution of their start and end locations were analyzed to provide a basis for precise management by traffic authorities. The results show that the operating characteristics of suspected unlicensed taxis differ from those of private cars and exhibit regular patterns in daily start and end times and in location distribution.
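For readers reproducing the evaluation in conclusion (2), the sketch below shows how a fitted binary logistic regression model maps indicator values to a probability, P(y = 1 | x) = 1 / (1 + exp(-(β0 + Σ βi xi))), and how recall, precision, accuracy, F1, and a rank-based AUC follow from labels and predicted probabilities. The coefficients, the 0.5 cutoff, and the toy data are hypothetical; the paper's fitted values are not reproduced here.

```python
import math


def logistic_probability(intercept, coefs, x):
    """P(unlicensed = 1 | x) for a binary logistic regression model,
    computed in a numerically stable way."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _safe_div(num, den):
    return num / den if den else 0.0


def classification_metrics(y_true, y_prob, cutoff=0.5):
    """Recall (R), precision (P), accuracy (ACC) and F1 at a probability cutoff."""
    tp = fp = tn = fn = 0
    for y, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    r = _safe_div(tp, tp + fn)
    p = _safe_div(tp, tp + fp)
    acc = _safe_div(tp + tn, tp + fp + tn + fn)
    f1 = _safe_div(2 * p * r, p + r)
    return {"R": r, "P": p, "ACC": acc, "F1": f1}


def rank_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random
    negative (ties count one half), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(y_true, y_prob) if y == 1]
    neg = [s for y, s in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return _safe_div(wins, len(pos) * len(neg))


# Toy labels (1 = unlicensed taxi) and predicted probabilities.
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.6, 0.1]
metrics = classification_metrics(y_true, y_prob)
auc = rank_auc(y_true, y_prob)
print(metrics, auc)

# F1 implied by the reported precision and recall (harmonic mean).
reported_f1 = 2 * 0.9474 * 0.8926 / (0.9474 + 0.8926)
print(round(reported_f1, 4))
```

With the reported P = 94.74% and R = 89.26%, the harmonic mean is about 91.92%, consistent with the reported F1 of 91.91% once rounding of P and R is allowed for. When unlicensed taxis are a small share of all vehicles, accuracy is dominated by correctly rejected private cars, so R, P, and F1 are more informative than ACC alone.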
The study provides precise data support for traffic management authorities, enhancing law enforcement efficiency and contributing to improved travel safety. It also offers a foundation for formulating more rational traffic management policies and for promoting the standardization and sustainable development of the transport industry. Nevertheless, there is room for further work. Future research will build on existing studies and on the results from traffic management authorities to explore the characteristics of groups involved in illegal operations. This will inform the development of a preventive identification method for unlicensed taxis that supports regulation, enabling proactive supervision at stages such as new vehicle purchases and second-hand vehicle transactions.