P-Ride: A Shareability Prediction Based Framework in Ridesharing

Ridesharing services aim to reduce travel costs for users and to optimize revenue for drivers and platforms by sharing available seats. Existing works can be roughly classified into two types, i.e., online-based and batch-based methods. The former mainly focus on responding quickly to requests, while the latter focus on meticulously enumerating request combinations to improve service quality. However, online-based methods deliver poor service quality because they neglect the sharing relationships between requests, while batch-based methods suffer from poor efficiency. To obtain better service quality more efficiently, we propose P-Ride, a shareability prediction-based framework. Specifically, we first introduce the k-clique listing strategy from graph theory on the shareability graph to reduce infeasible request combinations. Moreover, we extend the shareability graph to a hypergraph structure to represent higher-order shareable relationships among requests. Furthermore, we devise a shareability prediction model that predicts the shareable relationships of request combinations of arbitrary size, which further filters candidate request combinations with GPU acceleration. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed P-Ride framework.


Introduction
With the rapid development of the mobile internet and the sharing economy, ridesharing has become an important transportation mode. In ridesharing services, passengers share the available seats in vehicles in exchange for discounted fares, while drivers and platforms earn higher revenues through better vehicle utilization. Therefore, ridesharing service providers (e.g., Didi [1] and Uber [2]) constantly strive to improve service quality, such as higher platform service rates [3,4], higher revenue [5,6], and reduced driving costs [5-8].
The ridesharing problem mainly involves two issues: request-vehicle matching and route planning. Existing solutions to the ridesharing problem can be roughly classified into two categories: online-based [7-10] and batch-based [11-14] methods. For request-vehicle matching, online-based methods follow a first-come, first-served strategy: for each request, they select the vehicle with the lowest service cost and immediately assign the request to it. Batch-based approaches instead group requests before assignment: they meticulously group the requests into various combinations, then select an appropriate vehicle for each group and assign the entire group at once. For route planning, the insertion method [9,10,13,15,16] has been widely adopted in online-based methods. It updates a route by inserting the source and destination of a request into proper positions of the vehicle's current route. Because the insertion method does not reorder the vehicle's existing waypoints, the resulting route is only locally optimal. In contrast, the batch-based method enumerates all feasible travel routes.

Example 1. In the ridesharing example shown in Figure 1a, there are four requests $r_1, \ldots, r_4$ on a road network consisting of seven nodes $a, \ldots, g$. The value next to each edge indicates the distance between the two connected nodes. The details of the requests are shown in Table 1, where the release time is the time at which the request is submitted to the platform, and the deadline is the expected latest arrival time at the destination. Suppose that the platform allows each vehicle to serve at most two requests in a trip, and that it takes one unit of time to travel one unit of distance in the road network.
From the information given above, $r_1$ can share with $r_2$ and $r_3$; $r_2$ can share with $r_1$, $r_3$, and $r_4$; and $r_3$ can share with $r_1$ and $r_2$. However, $r_4$ can only share with $r_2$. We can construct the shareability graph of these four requests shown in Figure 1b, where each edge indicates that the two connected requests are shareable.
If the platform makes allocations with an online framework [7,8,16] (i.e., each request is served or discarded as soon as it arrives), it will choose requests $r_1$ and $r_2$ to share a vehicle. As a result, requests $r_3$ and $r_4$ can no longer share with any other request when they arrive, because their only shareable partners ($r_1$ and $r_2$) have already been paired.
A platform that adopts a batch-based framework [11-14] checks all possible request groups and takes the optimal one into service. Observing the shareability graph, we find that the requests in a shareable group are pairwise connected. For example, request $r_1$ cannot share with $r_4$, so we can prune every request group that contains both $r_1$ and $r_4$ during request group enumeration. Nevertheless, we still need to check the shareability of the request groups $\{r_1, r_2\}$, $\{r_1, r_3\}$, $\{r_2, r_3\}$, $\{r_2, r_4\}$, and $\{r_1, r_2, r_3\}$. With the hypergraph-based shareability prediction model proposed in this paper, we can predict the shareability of these request groups simultaneously in a fixed time, which significantly reduces the computational cost of batch-based methods.

We summarize the main contributions of this paper as follows.
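The pruning rule in this example can be checked mechanically. The sketch below is a minimal illustration that assumes only the edge set of Figure 1b as described in the text: it enumerates all groups of two or three requests (matching the groups listed above) and keeps those whose members are pairwise adjacent. This naive filter still touches every subset, and pairwise adjacency is only a necessary condition, so the surviving groups must still be checked for shareability. The helper names are illustrative.

```python
from itertools import combinations

# Shareability graph of Example 1 (Figure 1b): an edge means the two requests are shareable.
EDGES = {frozenset(e) for e in [("r1", "r2"), ("r1", "r3"), ("r2", "r3"), ("r2", "r4")]}
REQUESTS = ["r1", "r2", "r3", "r4"]


def pairwise_connected(group, edges):
    """True if every two requests in `group` are adjacent in the shareability graph."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))


def candidate_groups(requests, edges, max_size):
    """Enumerate all groups of size 2..max_size and keep only the pairwise-connected ones."""
    kept, total = [], 0
    for k in range(2, max_size + 1):
        for group in combinations(requests, k):
            total += 1
            if pairwise_connected(group, edges):
                kept.append(group)
    return kept, total


kept, total = candidate_groups(REQUESTS, EDGES, max_size=3)
print(f"{total} groups enumerated, {len(kept)} left to check:", kept)
```

Of the ten enumerated groups, exactly the five groups named in the text survive; every group containing both $r_1$ and $r_4$, or both $r_3$ and $r_4$, is discarded.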
• We study the dynamic ridesharing problem and optimize the efficiency of batch-based methods.
• We propose a request group enumeration strategy based on k-clique listing on the shareability graph to optimize request group enumeration for batch-based methods.
• We devise the P-Ride ridesharing framework with a shareability prediction model that supports batch prediction of the shareable relationships among an arbitrary number of requests in a fixed time.
• Through extensive experiments, we demonstrate that the proposed method significantly reduces the computational cost of batch-based methods, and that the P-Ride framework significantly improves efficiency with little impact on service quality.

Literature Review
The ridesharing problem can be formulated as a variant of the Dial-a-Ride Problem (DARP) [20,21], which plans vehicle routes and trip schedules for n requests with specific sources and destinations under practical constraints. Existing works on ridesharing services are categorized as static or dynamic, depending on whether all requests are known in advance. Most existing works [22,23] on DARP consider a static environment. For the dynamic ridesharing problem, existing solutions mainly operate in online mode [7,8,15,16] or batch mode [11-13,24].
In online mode, insertion [25] is the state-of-the-art route planning operation in existing works [26,27]; it inserts the pickup and drop-off locations of a new request into the vehicle's schedule without reordering. Tong et al. [16] proposed a dynamic programming-based insertion method that checks the constraints in constant time and dispatches requests in linear time. Huang et al. [8] proposed the kinetic tree structure, which traces all feasible routes of each vehicle to reduce the total driving distance. The kinetic tree always provides the optimal vehicle schedule whenever the schedule changes (i.e., when a new request arrives).
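To make the insertion operation concrete, the following sketch inserts a new request's pickup and drop-off into an existing route without reordering the existing stops, and keeps the cheapest placement that meets every deadline. It is an illustration under simplifying assumptions, not the method of [16] or [25]: locations are plain numbers with a caller-supplied cost function, vehicle capacity is ignored for brevity, and all names are hypothetical. This naive version checks $O(n^2)$ slot pairs with an $O(n)$ scan each, whereas the dynamic programming approach of Tong et al. [16] brings dispatching down to linear time.

```python
from math import inf
from typing import Callable, List, Optional, Tuple

# A stop is (location, latest allowed arrival time at that location).
Stop = Tuple[float, float]
Cost = Callable[[float, float], float]


def route_cost(start: float, route: List[Stop], cost: Cost) -> float:
    """Total travel cost of visiting the stops in order, starting from `start`."""
    total, prev = 0.0, start
    for loc, _ in route:
        total += cost(prev, loc)
        prev = loc
    return total


def meets_deadlines(start: float, now: float, route: List[Stop], cost: Cost) -> bool:
    """True if every stop is reached no later than its deadline."""
    clock, prev = now, start
    for loc, deadline in route:
        clock += cost(prev, loc)
        if clock > deadline:
            return False
        prev = loc
    return True


def insert_request(start: float, now: float, route: List[Stop],
                   pickup: Stop, dropoff: Stop, cost: Cost) -> Optional[List[Stop]]:
    """Naive insertion: try every pickup slot i and drop-off slot j >= i without
    reordering existing stops; return the cheapest feasible route, or None."""
    best, best_cost = None, inf
    for i in range(len(route) + 1):
        for j in range(i, len(route) + 1):
            candidate = route[:i] + [pickup] + route[i:j] + [dropoff] + route[j:]
            if meets_deadlines(start, now, candidate, cost):
                c = route_cost(start, candidate, cost)
                if c < best_cost:
                    best, best_cost = candidate, c
    return best
```

For example, with the vehicle at location 0 at time 0, an existing stop `(10, 100)`, and `cost = lambda a, b: abs(a - b)`, inserting pickup `(2, 50)` and drop-off `(5, 50)` yields `[(2, 50), (5, 50), (10, 100)]`: both new stops are placed before the existing one.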
Batch-based algorithms partition the requests into groups and assign the groups to appropriate vehicles. Alonso-Mora et al. [11] proposed the RTV-Graph to model the relationships and constraints among requests, trips, and vehicles, where trips are groups of shareable requests. The RTV-Graph approach assigns trips to vehicles by minimizing a utility function through linear programming. However, the time cost of enumerating trips when building the RTV-Graph grows exponentially. Zeng et al. [12] proposed an index called the additive tree to prune infeasible groups during group enumeration, and greedily chose the most profitable request group for each vehicle. Although batch-based methods achieve better service quality than online-based methods by meticulously enumerating request groups, this brute-force enumeration incurs significant computational cost. Therefore, the ridesharing problem calls for an efficient way to analyze the shareable relationships among requests and to identify shareable request groups.
The shareability graph structure adopted in this paper is intuitive, and similar structures have been used in existing works, which design request-vehicle matching in different ways. Wang et al. [28] formulate a tree cover problem to serve urban demand with as few vehicles as possible. Alonso-Mora et al. [11] optimally assign vehicles to shareable groups of customers through linear programming. Zhang et al. [29] formulate passenger matching as a monopartite matching problem and solve it with the Irving-Tan algorithm. However, existing works based on shareability networks mainly represent shareable relationships with traditional graph structures, which can only express binary (pairwise) shareability relations among requests. In ridesharing scenarios, a shareable relationship often involves three or more requests, which cannot be properly represented in existing shareability graphs. Therefore, in this paper, we propose the shareability hypergraph to represent higher-order shareable relationships among requests, and we devise a shareability prediction model that identifies shareable request groups of arbitrary size on the shareability hypergraph for fast screening of shareable request groups.

Preliminary
In this section, we introduce and analyze the dynamic ridesharing problem studied in this paper. We use a directed weighted graph to represent the road network, where each node represents an intersection and each edge represents the road between two intersections. Each edge (u, v) is associated with a weight cost(u, v), the cost of traveling from u to v; in this paper, cost(u, v) is the average travel time.

Definitions
Definition 1 (Request). Let $r_i = \langle s_i, e_i, n_i, t_i, d_i \rangle$ denote an online request released at time $t_i$, which contains $n_i$ passengers departing from $s_i$ who must arrive at $e_i$ before the deadline $d_i$.
Each vehicle $v_j$ may be simultaneously assigned a set $R_j$ of mutually shareable requests. Therefore, we also need to plan a route $S_j$ for each vehicle $v_j$, which consists of a sequence of pickup and drop-off locations of the requests $r \in R_j$. We define the route of a vehicle as follows.
Definition 2 (Route). Given a set R of m requests, let $S = \langle o_1, \ldots, o_{2m} \rangle$ denote a route, where each $o_x$ is the source $s_i$ or the destination $e_i$ of a request $r_i \in R$.
We call a route feasible if and only if it satisfies the following three constraints:
• Sequential constraint. For each request $r_i \in R$, the pickup location $s_i$ precedes the drop-off location $e_i$ in the route.
• Capacity constraint. At any location $o_x \in S$, the total number of requests on the vehicle does not exceed the capacity of the vehicle.
• Deadline constraint. For any location $o_x \in S$, $\sum_{k=1}^{x} \mathrm{cost}(o_{k-1}, o_k) \le ddl(o_x)$, where $o_0$ is the vehicle's current location and $ddl(o_x)$ is given by Equation (1) according to the location type (source or destination).
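The three constraints can be checked in a single pass over the route. The sketch below is a minimal illustration under stated assumptions: the vehicle starts empty at $o_0$ (`origin`), the capacity counts the passengers $n_i$ of Definition 1 (setting every $n_i = 1$ recovers a per-request count), `cost` is any travel-cost function, and, because Equation (1) is not reproduced here, we assume the common choice $ddl(s_i) = d_i - \mathrm{cost}(s_i, e_i)$ for sources and $ddl(e_i) = d_i$ for destinations. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    s: float  # source s_i
    e: float  # destination e_i
    n: int    # number of passengers n_i
    t: float  # release time t_i
    d: float  # deadline d_i


Waypoint = Tuple[str, str]  # (request id, "s" for pickup or "e" for drop-off)


def is_feasible(route: List[Waypoint], requests: Dict[str, Request], capacity: int,
                cost: Callable[[float, float], float], origin: float, now: float = 0.0) -> bool:
    """Check the sequential, capacity and deadline constraints for an initially empty vehicle.
    With now = 0 the running clock is exactly the sum of leg costs in the deadline inequality."""
    picked, dropped = set(), set()
    load, clock, prev = 0, now, origin  # prev starts at o_0
    for rid, kind in route:
        r = requests[rid]
        loc = r.s if kind == "s" else r.e
        clock += cost(prev, loc)
        prev = loc
        if kind == "s":
            picked.add(rid)
            load += r.n
            if load > capacity:          # capacity constraint
                return False
            ddl = r.d - cost(r.s, r.e)   # ASSUMED ddl(s_i): latest pickup time
        else:
            if rid not in picked:        # sequential constraint
                return False
            dropped.add(rid)
            load -= r.n
            ddl = r.d                    # ASSUMED ddl(e_i) = d_i
        if clock > ddl:                  # deadline constraint
            return False
    return picked == dropped == set(requests)  # every request picked up and dropped off
```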
Definition 3 (Shareable). Given a pair of requests $r_a$ and $r_b$, we call them shareable if and only if there exists a feasible route S that serves $r_a$ and $r_b$ simultaneously.
We can extend the concept of shareability to multiple requests: the requests in a set R are shareable if there exists a feasible route that serves all requests $r \in R$ at the same time. With the definitions above, we define the Dynamic Ridesharing Problem as follows.
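Definition 3 and its multi-request extension suggest a direct but exponential test: try every ordering of the 2m waypoints and accept the group if any ordering is feasible. The sketch below is a minimal brute-force illustration, not the paper's implementation. It assumes the vehicle starts empty at `origin`, the capacity counts passengers, and, since Equation (1) is not reproduced here, the source deadline has the assumed form $ddl(s_i) = d_i - \mathrm{cost}(s_i, e_i)$. All names are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

# request id -> (source, destination, passengers, deadline)
Req = Tuple[float, float, int, float]


def _feasible(order, reqs, capacity, cost, origin, now) -> bool:
    """Sequential, capacity and deadline checks for one waypoint order."""
    seen, load, clock, prev = set(), 0, now, origin
    for rid, kind in order:
        s, e, n, d = reqs[rid]
        if kind == "e" and rid not in seen:
            return False                            # drop-off before pickup
        loc = s if kind == "s" else e
        clock += cost(prev, loc)
        prev = loc
        load += n if kind == "s" else -n
        ddl = d - cost(s, e) if kind == "s" else d  # ASSUMED form of Equation (1)
        if load > capacity or clock > ddl:
            return False
        seen.add(rid)
    return True


def find_feasible_route(reqs: Dict[str, Req], capacity: int,
                        cost: Callable[[float, float], float],
                        origin: float, now: float = 0.0) -> Optional[List[Tuple[str, str]]]:
    """Return some feasible route serving all requests, or None; tries all (2m)! orders."""
    waypoints = [(rid, kind) for rid in reqs for kind in ("s", "e")]
    for order in permutations(waypoints):
        if _feasible(order, reqs, capacity, cost, origin, now):
            return list(order)
    return None


def shareable(reqs: Dict[str, Req], capacity: int,
              cost: Callable[[float, float], float], origin: float, now: float = 0.0) -> bool:
    """Shareability in the sense of Definition 3: some feasible route serves every request."""
    return find_feasible_route(reqs, capacity, cost, origin, now) is not None
```

On a line with `cost = lambda a, b: abs(a - b)`, two requests heading in the same direction from location 0 are shareable, while two requests with tight deadlines heading in opposite directions are not, even though each is feasible on its own.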
Definition 4 (Dynamic Ridesharing Problem). Given a set R of n online requests and a vehicle set W with maximum capacity c, the Dynamic Ridesharing Problem is to plan a feasible route for each vehicle $w \in W$ to serve the requests $r \in R$ such that a given utility function is minimized.
In this paper, we adopt the unified cost UC defined in [16] as the utility function to optimize. Specifically, the unified cost adopts the evaluation of total revenue in [16], where varying the penalty coefficient β corresponds to trading off income per unit time against fare per unit distance. Table 2 summarizes the commonly used symbols.

Symbol | Description
$R$ | a set of m time-constrained requests
$r_i$ | request i
$S_j$ | the planned route for vehicle $v_j$
$Q$ | a candidate request group with size $|Q| \le c$

Hardness of Dynamic Ridesharing Problem
By Theorem 1 below, the Dynamic Ridesharing Problem is NP-hard and therefore intractable. Moreover, Tong et al. [16] proved that there is no polynomial-time algorithm with a constant competitive ratio for the dynamic ridesharing problem.

Theorem 1 (Hardness of the Dynamic Ridesharing Problem). The Dynamic Ridesharing Problem defined in Definition 4 is NP-hard.
Proof. We prove the theorem by a reduction from the URR problem defined in [9], which has been proved to be NP-hard. The URR problem can be briefly described as follows: given a set R of m requests and a set V of n vehicles, where each request is associated with a source location $s_i$, a destination $e_i$, a pickup deadline $rt_i^-$, and a drop-off deadline $rt_i^+$, the URR problem assigns requests to vehicles to maximize the utility function u under capacity and time constraints.
For a given URR problem, we can transform it into an instance of the BDRP (the Dynamic Ridesharing Problem of Definition 4): we partition the requests $r_i \in R$ into singleton sets $G_i$, and the route of $G_i$ is the shortest path from $s_i$ to $e_i$. In addition, we set the utility value for each vehicle-group pair as Then, for this BDRP instance, we arrange a request group with a route for each given vehicle such that the total utility value $\sum_{v_j \in V} -\mu(v_j, G_{v_j})$ is minimized. This shows that the URR problem can be solved in polynomial time if and only if the transformed BDRP can be solved in polynomial time.
In this way, we can reduce the URR problem to the BDRP. Since the URR problem has been proved to be NP-hard, BDRP is also NP-hard. This completes the proof of the theorem.

Brute-Force Solution
The existing batch-based methods [11-14] for the Dynamic Ridesharing Problem follow a two-phase framework: (1) enumerating the shareable request groups among the requests in each batch; and (2) matching request groups with vehicles to minimize the utility function. We summarize the batch-based methods in Algorithm 1.

Algorithm 1 Brute-Force Solution
Require: A set R of n requests, a set W of m vehicles, and a batch period τ
Ensure: The set S of planned routes for vehicles w ∈ W
1: t ← current timestamp;
2: for every time period τ do
3:   $R^-$ ← requests released in the current batch window;
4:   for $w_j$ ∈ W do
5:     G ← initialize an empty set for candidate shareable groups;
6:     for k ∈ [1..c] do
7:       $G'$ ← enumerating all request groups g ⊆ $R^-$ with |g| = k;
8:       G ← G ∪ {g ∈ $G'$ | g is shareable};
9:     end for
10:    $g^*$ ← $\arg\min_{g \in G} UC(g, w_j)$;
11:    $\mathcal{S}$ ← enumerating routes for serving r ∈ $g^*$;
12:    $S_j$ ← $\arg\min_{S \in \mathcal{S}} \mu(S)$;
13:  end for
14:  t ← t + τ;
15: end for
16: return S = {$S_j$ | $w_j$ ∈ W};

First, we retrieve all the requests $R^-$ in the current batch window (line 3). Then we try to select a request group for each vehicle w ∈ W (lines 4-13). Specifically, we first enumerate request groups of size up to the vehicle capacity c (lines 5-9). After that, we select the group $g^* \in G$ with the minimum unified cost (line 10). In the route planning phase, we enumerate the routes $\mathcal{S}$ that can serve all requests in $g^*$ simultaneously (line 11) and select the optimal route $S_j$ to assign to vehicle $w_j$ (line 12). Finally, we update the timestamp t = t + τ and wait for the next trigger (line 14).
Complexity Analysis. For each vehicle w, we need to enumerate up to $\sum_{i=1}^{c} \binom{n}{i}$ request groups. Since $c \ll n$ in practice, this sum is $O(n^c)$. Then, to decide whether the requests in a group are shareable, we need to examine up to $A_{2c}^{2c} = (2c)!$ candidate routes $\langle o_1, \ldots, o_{2c} \rangle$ and check each route against the deadline constraint in linear time. Therefore, the time complexity of the Brute-Force algorithm is $O(m \cdot n^c \cdot (2c)! \cdot 2c)$.
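The counts in this analysis are easy to evaluate numerically. The helper functions below are a small illustration with hypothetical names. Besides the $(2c)!$ upper bound used above, they also compute $(2c)!/2^c$, the number of orderings in which every pickup precedes its own drop-off, which shows how much the sequential constraint alone trims the route space.

```python
from math import comb, factorial


def num_groups(n: int, c: int) -> int:
    """sum_{i=1}^{c} C(n, i): request groups of size at most c among n requests."""
    return sum(comb(n, i) for i in range(1, c + 1))


def num_orderings(c: int) -> int:
    """A_{2c}^{2c} = (2c)!: all orderings of the 2c waypoints of c requests."""
    return factorial(2 * c)


def num_sequential_orderings(c: int) -> int:
    """(2c)! / 2^c: orderings in which every pickup precedes its own drop-off."""
    return factorial(2 * c) // 2 ** c


def brute_force_work(m: int, n: int, c: int) -> int:
    """m * sum_i C(n, i) * (2c)! * 2c, the operation count behind O(m n^c (2c)! 2c)."""
    return m * num_groups(n, c) * num_orderings(c) * 2 * c
```

For instance, with n = 10 requests and c = 3, a single vehicle already faces 175 groups and up to 720 orderings per group, of which only 90 respect the sequential constraint.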

Shareability Graph
Shareable request group enumeration is a fundamental operation in batch-based methods (e.g., line 7 in Algorithm 1). To optimize its efficiency, we first define the following shareability graph, which intuitively represents the shareable relationships between requests.
Definition 5 (Shareability Graph). Given a set of requests R, $SG = \langle R, E \rangle$ denotes the shareability graph of R, where an edge $e = (r_a, r_b) \in E$ indicates that requests $r_a$ and $r_b$ are shareable.
In graph theory, the clique [17-19] is an extensively studied subgraph structure: a k-clique is a subset of k nodes in which every two distinct nodes are adjacent. On the shareability graph, Theorem 2 below allows us to enumerate only the request groups that form k-cliques instead of arbitrary groups, which reduces the search space by pruning infeasible groups.
Theorem 2. Given a feasible route S for k requests, the corresponding nodes of these k requests form a k-clique in the shareability graph.
Proof. We prove it by contradiction. Suppose that there is a feasible route S for k requests whose corresponding nodes do not form a k-clique in the shareability graph. Then there are at least two nodes $r_a$ and $r_b$ that are not connected. We derive a subroute $S'$ from S by removing every location $o_x$ other than the sources and destinations of $r_a$ and $r_b$. Since the travel costs between waypoints are shortest-path travel times and thus satisfy the triangle inequality, removing waypoints never delays the remaining ones, and the number of requests on board never increases; hence $S'$ is also a feasible route. By Definition 3, $r_a$ and $r_b$ are then shareable, so the shareability graph must contain an edge between them, which contradicts our assumption. Therefore, these k requests form a k-clique in the shareability graph.
By Theorem 2, a shareable request group of size k must constitute a k-clique in the shareability graph. The converse does not hold, so the listed cliques are candidates that still need to be checked. Therefore, we can efficiently enumerate candidate shareable request groups with state-of-the-art k-clique listing algorithms [18] from graph theory.
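A k-clique listing routine fits in a few lines. The sketch below is a minimal ordering-based illustration and is not necessarily the algorithm of [18]: it ranks nodes by degree, orients every edge toward the higher-ranked endpoint so that each clique is generated exactly once (in increasing rank order), and grows cliques by intersecting out-neighborhoods. The adjacency-dictionary format and the function name are assumptions.

```python
from typing import Dict, Hashable, Iterator, List, Set, Tuple


def k_cliques(adj: Dict[Hashable, Set[Hashable]], k: int) -> Iterator[Tuple[Hashable, ...]]:
    """Yield every k-clique (k >= 1) of an undirected graph exactly once."""
    # Rank by (degree, node); ties are broken by node, so nodes must be comparable.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}
    # Orient each edge from the lower-ranked to the higher-ranked endpoint (a DAG).
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}

    def extend(clique: List[Hashable], candidates: Set[Hashable]):
        if len(clique) == k:
            yield tuple(clique)
            return
        if len(clique) + len(candidates) < k:
            return  # not enough common neighbors left to reach size k
        for u in sorted(candidates, key=rank.__getitem__):
            # Every node added later must be adjacent to all nodes already in the clique.
            yield from extend(clique + [u], candidates & out[u])

    yield from extend([], set(adj))
```

On the shareability graph of Example 1, `k_cliques(adj, 2)` yields the four edges and `k_cliques(adj, 3)` yields the single group $\{r_1, r_2, r_3\}$, without ever generating a group that contains both $r_1$ and $r_4$.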

Shareability Prediction with Hyper Graph
The dynamic shareability graph proposed in Section 4.1 intuitively represents the shareable relationship between pairs of requests. In practice, however, a sharing relationship in the ridesharing problem often involves three, four, or even more requests. Because an edge of a traditional graph connects only two nodes, such higher-order shareable relationships cannot be expressed by a traditional graph, even though they are pervasive in the request groups examined by most batch-based algorithms. We therefore propose the shareability hypergraph to represent higher-order shareability relations as follows.
Definition 6 (Shareability Hypergraph). Let $HG = \langle R, E \rangle$ denote the shareability hypergraph for a given set of requests R, where $E \subseteq \mathcal{P}(R)$ and $\mathcal{P}(R)$ is the power set of R. Each hyperedge $e \in E$ indicates that the requests in e are shareable.
The shareability hypergraph HG intuitively represents shareable request groups of different sizes, but enumerating all hyperedges of HG can be extremely expensive: we need to perform $\sum_{m=2}^{c} \binom{n}{m}$ shareability checks to determine which hyperedges exist. Thus, constructing the shareability hypergraph by brute-force enumeration of all hyperedges is impractical in an online scenario. However, a city's historical shareable request groups are informative about which shareable hyperedges exist. Therefore, we propose the following shareability prediction model based on the hyperedge prediction model Hyper-SAGNN [30], as shown in Figure 2. The model is trained on the historical shareable request groups of a city and predicts the shareability of given requests in batches in a fixed time. Because of the request constraints defined in Section 3.1, shareable requests usually satisfy two conditions: (1) temporal locality, i.e., the requests are released at similar times; and (2) spatial locality, i.e., the requests have similar sources and destinations.

For online ridesharing platforms, the shareability prediction model primarily serves to quickly predict the shareability of requests released close together in time, so the request groups to be predicted already possess temporal locality. To satisfy spatial locality, we divide the city into grids. Specifically, we first divide the road network into row × col grids with a fixed grid size δ (a parameter), and map the source and destination of each request $r_i$ to grids $g_{s_i}$ and $g_{e_i}$, respectively. Then, all requests whose sources fall into the same grid and whose destinations fall into the same grid are encoded as the same node $n_{r_i} = g_{s_i} \times N + g_{e_i}$ in the shareability hypergraph HG, where $N = col \times row$. Thus, each node of the shareability hypergraph represents a class of requests that satisfy both temporal and spatial locality.
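The node encoding can be sketched as follows. This illustration rests on assumptions the text does not fix: coordinates are planar (for example, projected to meters) inside a known bounding box, cells are numbered row by row, and points on the far boundary are clamped into the last cell. Because each cell index lies in $[0, N)$, the code $g_{s_i} \times N + g_{e_i}$ is distinct for every (source cell, destination cell) pair. The function names are hypothetical.

```python
import math
from typing import Callable, Tuple


def make_encoder(min_x: float, min_y: float, max_x: float, max_y: float,
                 delta: float) -> Tuple[Callable[[float, float, float, float], int], int]:
    """Build a row x col grid of cell size delta and return (encode, N)."""
    col = max(1, math.ceil((max_x - min_x) / delta))
    row = max(1, math.ceil((max_y - min_y) / delta))
    N = row * col

    def cell(x: float, y: float) -> int:
        # Row-major cell id; points on the max boundary are clamped into the last cell.
        c = min(int((x - min_x) // delta), col - 1)
        r = min(int((y - min_y) // delta), row - 1)
        return r * col + c

    def encode(sx: float, sy: float, ex: float, ey: float) -> int:
        # n_{r_i} = g_{s_i} * N + g_{e_i}
        return cell(sx, sy) * N + cell(ex, ey)

    return encode, N
```

With a 3 km × 3 km box and δ = 1.5 km, the grid has N = 4 cells, and any two requests that start in the lower-left cell and end in the upper-right cell are mapped to the same node.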
With the hypergraph nodes generated by the above steps, we enumerate the hyperedges present in the historical request data of a specific city, i.e., the shareable request groups (as shown on the left of Figure 2, where each colored edge represents a hyperedge). Then, we generate a walk for each node on the constructed hypergraph with a biased random walk method and extract the node features $(\vec{x}_1, \ldots, \vec{x}_k)$ with a skip-gram model, so that nodes with similar contexts have similar embeddings. We feed these node features into Hyper-SAGNN [30], a self-attention-based graph neural network for hypergraphs that supports hyperedge prediction of arbitrary size. Specifically, Hyper-SAGNN feeds the node features into both a static embedding network and a multi-head attention layer to generate the static and dynamic embeddings of the nodes in the hypergraph. Then, a position-wise feed-forward layer with a sigmoid activation function produces the probability scores $(p_1, p_2, \ldots, p_k)$. Finally, the average of these probability scores is taken as the probability that a hyperedge exists among the nodes $(x_1, x_2, \ldots, x_k)$.
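The final scoring step can be illustrated in isolation. The sketch below takes, for each node, a vector $o_i$ that stands in for whatever Hyper-SAGNN derives from the node's static and dynamic embeddings (the Hyper-SAGNN paper uses their element-wise squared difference). It applies one shared position-wise layer with weights `w` and bias `b`, followed by a sigmoid, and averages the k scores. The weights are hypothetical placeholders rather than trained parameters.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hyperedge_probability(node_vectors: Sequence[Sequence[float]],
                          w: Sequence[float], b: float) -> Tuple[float, List[float]]:
    """Score each node with one shared (position-wise) layer, then average the scores."""
    scores = [sigmoid(sum(wj * oj for wj, oj in zip(w, o)) + b) for o in node_vectors]
    return sum(scores) / len(scores), scores
```

Because the same layer is applied to every node and the scores are averaged, the result does not depend on the order of the nodes and works for any group size k, which is what allows a single model to score request groups of arbitrary size.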
Based on this prediction model, we encode online requests as nodes of the corresponding hypergraph in constant time and determine the shareability of request groups, so that we can quickly build a shareability network in batch mode. Moreover, determining the shareability of requests is a fundamental operation of many upper-level request dispatching algorithms, whose complexity can be greatly reduced by such a prediction model. For example, the batch-based method in Algorithm 1 has to enumerate up to $\sum_{i=1}^{c} \binom{n}{i}$ request groups (line 7). With the shareability prediction model, we can efficiently predict in batches on GPU devices whether these request groups are shareable, which greatly reduces the search space of request groups and improves the efficiency of the algorithm.
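The batch filtering step can be expressed as a single call to the model over all candidate groups. In the sketch below, `encode` maps a request to its hypergraph node and `predict_batch` stands for the trained model (for example, a GPU-backed forward pass); both are hypothetical callables, and the 0.5 threshold is an assumed default rather than a value given in the text.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

Group = Tuple[Hashable, ...]


def filter_shareable(groups: List[Group], encode: Callable[[Hashable], int],
                     predict_batch: Callable[[List[Tuple[int, ...]]], Sequence[float]],
                     threshold: float = 0.5) -> List[Group]:
    """Keep the candidate groups whose predicted hyperedge probability reaches `threshold`."""
    node_groups = [tuple(encode(r) for r in g) for g in groups]  # requests -> hypergraph nodes
    probs = predict_batch(node_groups)  # a single batched model call, e.g. on a GPU
    return [g for g, p in zip(groups, probs) if p >= threshold]
```

Batching matters here because the per-call overhead of a GPU model is amortized over all candidate groups of a round, instead of being paid once per group.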

P-Ride: Shareability Prediction Based Ridesharing Framework
Based on Theorem 2 in Section 4.1 and the shareability prediction model proposed in Section 4.2, we devise the online ridesharing framework P-Ride, shown in Figure 3. P-Ride adopts batch-based processing and optimizes the request group enumeration and route planning of existing batch-based methods. In request group enumeration, P-Ride first enumerates request groups by k-clique listing on the shareability graph, and then checks the filtered candidate request groups in batches with the shareability prediction model. In request-vehicle matching, P-Ride selects the optimal request group among all feasible request groups for each vehicle. Ma et al. [31] showed that reordering waypoints barely changes effectiveness but requires more time and space. Therefore, we generate service routes for the request groups with the insertion method instead of enumerating all feasible routes.

Request-Vehicle Matching
The detailed steps of the P-Ride framework are shown in Algorithm 2. First, we extract all requests $R^-$ within the batch window and construct the shareability graph SG for $R^-$ (lines 3-4). Then, we select the most appropriate request group for each vehicle and plan a service route for it (lines 5-15). In particular, we first search for all feasible candidate request groups (lines 6-10). Based on Theorem 2, we enumerate the candidate request groups $G'$ in SG with a k-clique listing algorithm (line 8). Then, the shareability prediction model M, pre-trained on historical shareable request groups, evaluates the candidate groups $G'$ in bulk, and we retain only the groups that M reports as shareable (line 9). After that, we pick the request group $g^*$ with the minimum unified cost $UC(g, w)$ and plan a service route $S_j$ for it (lines 11-14). More specifically, we insert the source and destination of each request in $g^*$ into the appropriate positions of the vehicle's route $S_j$ in turn (lines 12-14).

Algorithm 2 P-Ride
Require: A set R of n requests, a set W of m vehicles, a batch period τ, and a shareability prediction model M
Ensure: The set S of planned routes for vehicles w ∈ W
1: t ← current timestamp;
2: for every time period τ do
3:   $R^-$ ← requests released in the current batch window;
4:   SG ← building the shareability graph for $R^-$;
5:   for $w_j$ ∈ W do
6:     G ← initialize an empty set for candidate shareable groups;
7:     for k ∈ [1..c] do
8:       $G'$ ← listing k-cliques g in SG;
9:       G ← G ∪ M.eval($G'$);
10:    end for
11:    $g^*$ ← $\arg\min_{g \in G} UC(g, w_j)$;
12:    for $r_i$ ∈ $g^*$ do
13:      $S_j$ ← insert $s_i$ and $e_i$ into $S_j$ by insertion;
14:    end for
15:  end for
16:  t ← t + τ;
17: end for
18: return S = {$S_j$ | $w_j$ ∈ W};
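The control flow of one batch of Algorithm 2 (lines 5-15) can be mirrored in a short Python skeleton. Everything model- or cost-specific is passed in as a hypothetical callable: `predict` plays the role of M.eval on a list of groups, `unified_cost` stands for UC(g, w), and `insert` for the insertion step. A naive pairwise-adjacency filter stands in for a real k-clique lister. One assumption goes beyond the pseudocode: requests assigned to a vehicle are removed from the shareability graph so that no request is assigned twice.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Set, Tuple

Group = Tuple[Hashable, ...]


def naive_k_cliques(adj: Dict[Hashable, Set[Hashable]], k: int) -> List[Group]:
    """Stand-in for a real k-clique lister: k-subsets whose members are pairwise adjacent."""
    return [g for g in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(g, 2))]


def p_ride_batch(vehicles: List[dict], adj: Dict[Hashable, Set[Hashable]], capacity: int,
                 predict: Callable[[List[Group]], Sequence[bool]],
                 unified_cost: Callable[[Group, dict], float],
                 insert: Callable[[list, Hashable], list]) -> Dict[str, list]:
    """One batch of Algorithm 2, lines 5-15 (SG = adj is built beforehand, lines 3-4)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy of SG
    routes = {}
    for w in vehicles:                                  # line 5
        candidates: List[Group] = []                    # line 6: G
        for k in range(1, capacity + 1):                # line 7
            cliques = naive_k_cliques(adj, k)           # line 8: G'
            if cliques:
                keep = predict(cliques)                 # line 9: one batched M.eval(G')
                candidates += [g for g, ok in zip(cliques, keep) if ok]
        if not candidates:
            continue                                    # nothing to assign to this vehicle
        best = min(candidates, key=lambda g: unified_cost(g, w))  # line 11: g*
        route = list(w["route"])
        for r in best:                                  # lines 12-14: insertion
            route = insert(route, r)
        routes[w["id"]] = route
        for r in best:                                  # ASSUMPTION: drop assigned requests from SG
            for nb in adj.pop(r, set()):
                if nb in adj:
                    adj[nb].discard(r)
    return routes
```

On the shareability graph of Example 1 with two empty vehicles, a capacity of 3, a predictor that accepts every group, and a cost that prefers larger groups, the first vehicle receives $\{r_1, r_2, r_3\}$ and the second receives $r_4$ alone.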

Data Set
In the experiments, we use two real-life request datasets [16,32] from Chengdu (denoted CHD) and Xi'an (denoted XIA), China, to demonstrate the effectiveness and efficiency of the proposed methods. Both datasets are available from the Didi GAIA [33] platform. The request datasets contain the latitude and longitude of the pickup and drop-off locations and the release time, but not the number of passengers of each request. Therefore, following [16], we generate the number of passengers for the CHD and XIA datasets based on the distribution observed in NYC. Additionally, we set the deadline of each request $r_i$ as $d_i = t_i + \gamma \cdot cost(r_i)$, a configuration commonly used in existing works [8,9,34]. We extracted request data from 1 to 29 November 2016 for CHD and from 1 to 30 October 2016 for XIA to train the shareability prediction model proposed in Section 4.2.
In the experiments on the effects of different parameters, we used the data of CHD on 30 November 2016 and of XIA on 31 October 2016 for testing. The distribution of the sources and destinations of the testing requests is shown in Figure 4. The road networks of both cities are downloaded from Geofabrik [35] and extracted with Osmconverter [36] using the city boundaries on OpenStreetMap [37] for CHD [38] and XIA [39], respectively (as shown in Figure 5). In addition, we carefully trimmed the road networks according to the distribution boundaries of the requests' sources and destinations to exclude as many irrelevant regions as possible. The weight of each edge in the road network is the average travel time of the road segment. The details of the road networks are shown in Table 3, where # denotes the number of the corresponding elements. As the distribution of testing requests in Figure 4 shows, the two datasets have different distribution characteristics owing to their different road network structures: the requests in CHD are distributed in a star shape, while those in XIA are distributed in a grid shape. Since the road network of Chengdu is more diverse, more candidate request groups are available in CHD, which results in generally higher service rates in CHD than in XIA under the same parameter settings (see panels c and d of Figures 6-9). The detailed experimental parameters are shown in Table 4 (default parameters are in bold).

Environment Settings
Implementation. We simulate ridesharing and vehicle movement based on the release times of the requests. The request datasets consist of sequences of GPS track points of the vehicle serving each request. Therefore, we pre-map the pickup and drop-off locations of the requests to the nearest nodes of the road network with a VP-Tree [40]. Specifically, we map each request's source and destination in the GPS track points to the nearest road network node within 1 km through the VP-Tree, and we discard requests with noisy pickup or drop-off locations, i.e., those with no road network node within 1 km of the GPS track points. The initial location of each vehicle is set to its earliest GPS track point in the dataset. Additionally, we update the locations of the vehicles every second according to their assigned routes. In the request-vehicle matching phase, we use a grid index to prune requests that are too far from each vehicle for every tested algorithm. Note that in this pruning we approximate the travel cost by the Euclidean distance divided by the maximum speed (i.e., $cost(s_i, e_i) \approx euclidean(o_{s_i}, o_{e_i}) / v_{max}$).
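The Euclidean approximation is safe for pruning because it never overestimates the road travel time, so it can only discard requests that are genuinely unreachable in time. The sketch below is a hypothetical helper that assumes planar coordinates and that $v_{max}$ upper-bounds every road speed; `latest_pickup` stands for the source deadline $ddl(s_i)$. The grid-index lookup that selects nearby requests in the first place is omitted.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def may_serve_in_time(vehicle: Point, pickup: Point, now: float,
                      latest_pickup: float, v_max: float) -> bool:
    """Optimistic reachability test used only for pruning.

    Euclidean distance / v_max is a lower bound on the road travel time when v_max
    bounds every road speed, so a False result can be pruned without losing candidates.
    """
    lower_bound = math.dist(vehicle, pickup) / v_max
    return now + lower_bound <= latest_pickup
```

For example, a pickup 5 km away with $v_{max}$ = 10 m/s needs at least 500 s, so the request is kept when 600 s remain before its pickup deadline and pruned when only 400 s remain.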
For the training data of the shareability prediction model, we set the cell size δ of the road network division to 1.5 km and map the sources $s_i$ and destinations $e_i$ of historical requests $r_i$ to the corresponding cells $g_{s_i}$ and $g_{e_i}$. All requests with the same source cell and the same destination cell are encoded as the node $n_{r_i} = g_{s_i} \times N + g_{e_i}$ in the shareability hypergraph $HG = \langle R, E \rangle$, where $N = col \times row$. For each subset Q in the power set $\mathcal{P}(R)$, we first try to construct a feasible route offline with the insertion method, and we add Q as a hyperedge of HG when such a route can serve all requests $r \in Q$ simultaneously. These steps yield the hypergraph HG used to train the shareability prediction model. The other training parameters of the shareability prediction model are the same as those of Hyper-SAGNN [30].
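The offline hyperedge generation can be sketched as a loop over candidate groups. In the sketch below, `is_shareable` is a hypothetical callable standing in for the offline insertion-based check described above, `encode` maps a historical request to its node, and group sizes are limited to 2 through `max_size` (an assumed cap such as the capacity c, since enumerating the full power set is exponential). Each hyperedge is stored as a sorted tuple of node ids, so two requests from the same grid pair contribute the same node twice; collapsing such repeats into a set is an equally plausible design choice.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, Set, Tuple


def build_hyperedges(requests: Iterable[Hashable], encode: Callable[[Hashable], int],
                     is_shareable: Callable[[Tuple[Hashable, ...]], bool],
                     max_size: int) -> Set[Tuple[int, ...]]:
    """Turn shareable historical request groups into hyperedges over grid-pair nodes."""
    requests = list(requests)
    hyperedges: Set[Tuple[int, ...]] = set()
    for k in range(2, max_size + 1):
        for group in combinations(requests, k):  # exponential: offline use only
            if is_shareable(group):
                # Sorted tuple = multiset of node ids; repeats are kept, not collapsed.
                hyperedges.add(tuple(sorted(encode(r) for r in group)))
    return hyperedges
```

Because many historical groups map to the same node tuple, the resulting hyperedge set is typically far smaller than the number of shareable request groups it was built from.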
Running environments. All algorithms are implemented in C++ and compiled with -O3 optimization. They run on a single server equipped with an Intel(R) Xeon(R) Gold 6258R CPU @ 2.70 GHz, an NVIDIA Tesla A100 GPU with 80 GB of memory, and 1 TB of RAM. All algorithms run in a single thread.

Approaches and Measurements
We compare the following three algorithms in our experimental study.
• pruneGDP [16]. It inserts each request into the vehicles' current schedules sequentially and selects the vehicle with the least increased distance to serve it.
• BF. The Brute-Force method shown in Algorithm 1. It runs in batch mode and enumerates all request groups among each vehicle's candidate requests.
• P-Ride. The prediction-based ridesharing framework proposed in this paper. It predicts the shareability of request groups in batch mode with the shareability prediction model proposed in Section 4.2, trained on historical shareable requests, which significantly reduces unnecessary request group enumeration.
We report the unified cost, service rate, and overall running time of all algorithms. Specifically, the unified cost follows the total-revenue evaluation in [16], and varying the penalty coefficient $p_r$ corresponds to balancing income per unit time against fare per unit distance. The service rate measures how many requests the platform can accept with a limited number of vehicles. The overall running time measures the efficiency of each algorithm in processing the same set of requests. Experiments that did not complete within 12 h were terminated early.
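The two quality metrics can be computed as sketched below. This section does not restate the unified-cost formula, so the sketch assumes the form used in [16] as we understand it: a weight α times the total travel distance of all vehicle routes, plus the penalty $p_r$ of every rejected request. The names `RequestOutcome`, `unifiedCost`, and `serviceRate`, and the treatment of α as a single scalar, are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RequestOutcome {
    bool served;     // accepted and served by some vehicle
    double penalty;  // p_r, charged only if the request is rejected
};

// Assumed form (after [16]): UC = alpha * sum_w D(S_w) + sum_{rejected r} p_r,
// where D(S_w) is the travel distance of vehicle w's route.
double unifiedCost(double alpha, const std::vector<double>& routeDistances,
                   const std::vector<RequestOutcome>& requests) {
    assert(alpha >= 0.0);
    double travel = 0.0;
    for (double d : routeDistances) travel += d;
    double penalties = 0.0;
    for (const RequestOutcome& r : requests)
        if (!r.served) penalties += r.penalty;
    return alpha * travel + penalties;
}

// Service rate: fraction of all requests that were served.
double serviceRate(const std::vector<RequestOutcome>& requests) {
    if (requests.empty()) return 0.0;
    std::size_t served = 0;
    for (const RequestOutcome& r : requests)
        if (r.served) ++served;
    return static_cast<double>(served) / static_cast<double>(requests.size());
}
```

Under this assumed form a lower unified cost is better: serving more requests removes penalty terms but may lengthen routes, and $p_r$ sets how the two are traded off.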

Experimental Results
Effect of the number of vehicles. Figure 6 shows the results of varying the number of vehicles from 0.5 K to 2.5 K. As the number of vehicles increases, the service quality of all evaluated methods improves. BF achieves the lowest unified cost, mainly owing to its brute-force enumeration strategy, and P-Ride performs very similarly. In terms of overall running time, however, the high time complexity of brute-force enumeration makes BF take up to nearly 40 min and 2.65 h on the two test datasets, respectively. In contrast, P-Ride runs 10.35× and 4.39× faster than BF on the CHD and XIA datasets, respectively (Figure 6e,f). This speedup mainly results from the clique enumeration strategy proposed in Section 4.1, which avoids unnecessary enumeration of request groups; in addition, P-Ride further filters the candidate request groups using the shareability prediction model proposed in Section 4.2. Owing to its linear time complexity, the online algorithm pruneGDP has the shortest overall running time. However, it performs poorly in service quality (service rate and unified cost) because it does not analyze the shareable relationships among requests. Note that on the CHD dataset, the results of BF at |W| = 0.5 K are not presented: with so few vehicles, most requests cannot be served and accumulate on the platform, and BF repeatedly reprocesses these unexpired requests in every round of calculation. This backlog is also the main reason for the significant increase in the running time of P-Ride in Figure 6f.
Effect of the number of requests. Figure 7 presents the results of varying the number of requests from 10 K to 90 K. As the number of requests grows, the numbers of both accepted and rejected requests increase significantly, so the unified costs of all tested algorithms grow. As for the service rate (Figure 7c,d), BF and pruneGDP become increasingly inadequate as the number of requests continues to increase. P-Ride performs best, improving the service rate over the other methods by 2.91% to 35.85% on CHD and by 6.93% to 38.99% on XIA at |R| = 90 K. As for running time, the insertion-based method pruneGDP is still the fastest, while P-Ride is up to 19.36× and 15.37× faster than BF on the two datasets, respectively (Figure 7e,f). When |R| = 10 K, the platform has enough vehicles to serve all requests, so requests can be allocated quickly; therefore, the running-time gap between BF and P-Ride is greatly reduced in Figure 7e,f.
Effect of the deadline. Figure 8 presents the results of varying the request deadline parameter γ from 1.2 to 2.0. As deadlines are gradually relaxed, the service quality of all tested methods increases. P-Ride and BF perform similarly when deadlines are strict, i.e., γ = 1.2 or γ = 1.3, because a tight deadline greatly reduces the number of candidate request groups for each request, leaving little room for request-group enumeration strategies to yield noticeable improvements. When γ = 1.5, the running time of BF increases significantly because of a sharp increase in the number of request groups; in this case, P-Ride achieves a similar service rate and unified cost with only about 0.6% of BF's running time. However, when γ ≥ 1.8, neither BF nor P-Ride can process all requests within the time limit on either dataset, owing to the dramatic increase in the number of candidate request groups. This is primarily because pruning during request-group enumeration, however aggressive, cannot reduce the number of feasible request groups themselves. Figure 9e,f shows similar results for the same reason.
Effect of the vehicle's capacity constraint. Figure 9 illustrates the results of varying the vehicle capacity c from 2 to 6. BF and P-Ride achieve similar service quality in terms of unified cost. However, since the number of request groups BF must enumerate grows sharply with vehicle capacity (e.g., when c = 6, BF needs to enumerate $\binom{n}{6}$ different request groups for n candidate requests), BF cannot finish within the given time limit when c ≥ 5 and c ≥ 6 on the CHD and XIA datasets, respectively. When c = 2, we observe that BF runs faster than P-Ride. This is because a request group then contains at most two requests, so the cost of constructing the shareability graph already exceeds that of BF's direct enumeration. However, the advantage of P-Ride grows as the capacity constraint increases: when c ≥ 4, P-Ride runs up to 19.36× faster than BF on the CHD dataset, and on the XIA dataset it runs 712.88× faster than BF (Figure 9f). Therefore, P-Ride is better suited to request groups of diverse sizes.
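The capacity results above hinge on how many request groups must be examined. Following the reasoning behind the clique-based strategy (a group can only be shareable if every pair of its requests is shareable), each candidate group of size k is a k-clique of the shareability graph, so listing k-cliques replaces BF's enumeration of all $\binom{n}{k}$ subsets. The C++ sketch below is a basic ordered-neighbourhood k-clique lister offered only as an illustration: it orders vertices by id, whereas the paper relies on a state-of-the-art k-clique listing algorithm that is not reproduced here, and practical listers usually use a degeneracy ordering. `Graph`, `extend`, `listKCliques`, and `binom` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Undirected shareability graph as adjacency lists: an edge (u, v) means
// requests u and v can share a vehicle. Vertices are 0..n-1.
using Graph = std::vector<std::vector<int>>;

// Extend `clique` with vertices from `cand`, which holds (in ascending order)
// the common neighbours of all clique members that have a larger id than
// every member, so each k-clique is emitted exactly once.
void extend(const Graph& g, std::vector<int>& clique, const std::vector<int>& cand,
            int k, std::vector<std::vector<int>>& out) {
    if (static_cast<int>(clique.size()) == k) {
        out.push_back(clique);
        return;
    }
    // Prune: not enough candidates left to reach size k.
    if (clique.size() + cand.size() < static_cast<std::size_t>(k)) return;
    for (std::size_t idx = 0; idx < cand.size(); ++idx) {
        const int v = cand[idx];
        std::vector<int> next;  // later candidates that are also adjacent to v
        for (std::size_t j = idx + 1; j < cand.size(); ++j)
            if (std::binary_search(g[v].begin(), g[v].end(), cand[j]))
                next.push_back(cand[j]);
        clique.push_back(v);
        extend(g, clique, next, k, out);
        clique.pop_back();
    }
}

// List all k-cliques, i.e., candidate request groups of size k.
std::vector<std::vector<int>> listKCliques(Graph g, int k) {
    std::vector<std::vector<int>> out;
    if (k <= 0) return out;
    for (auto& adj : g) std::sort(adj.begin(), adj.end());  // needed by binary_search
    for (int u = 0; u < static_cast<int>(g.size()); ++u) {
        std::vector<int> cand;
        for (int v : g[u])
            if (v > u) cand.push_back(v);  // only higher-id neighbours
        std::vector<int> clique{u};
        extend(g, clique, cand, k, out);
    }
    return out;
}

// Number of size-k subsets of n requests that brute force would enumerate.
std::uint64_t binom(std::uint64_t n, std::uint64_t k) {
    if (k > n) return 0;
    std::uint64_t r = 1;
    for (std::uint64_t i = 1; i <= k; ++i) r = r * (n - k + i) / i;  // exact at every step
    return r;
}
```

On a toy graph with the triangle {0, 1, 2} plus the edges 2-3 and 3-4, only one 3-clique exists, whereas brute force would examine $\binom{5}{3} = 10$ triples. For k = 2 the cliques are simply the edges of the graph, so building the graph already performs the pairwise checks that BF's enumeration would, which is consistent with BF being faster at c = 2.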

Summary of the experimental study:
• The group-based methods (i.e., BF and P-Ride) achieve superior service quality (i.e., higher service rates and lower unified costs) compared with the online-based method (i.e., pruneGDP). For example, P-Ride achieves a service-rate improvement of up to 38.99% over the other tested algorithms (serving approximately 35,091 more requests for the platform).
• P-Ride runs considerably faster than BF in most cases. For example, P-Ride runs up to 712.88× faster than BF in Figure 9f: it processes the XIA requests in 1.8 min, whereas BF takes up to 20.2 h.

Conclusions
In this paper, we study the dynamic ridesharing problem and optimize request-group enumeration with the shareability graph. Concretely, we first propose an efficient request-group enumeration strategy based on k-cliques in the shareability graph, which enables efficient enumeration of shareable request groups using state-of-the-art k-clique listing algorithms from graph theory. Then, to represent higher-order shareability relations, we extend the structure of the shareability graph [11,28,29] to a hypergraph. Furthermore, we devise a shareability prediction model that uses historical shareable relationships to filter out infeasible request groups, which significantly reduces the computational cost that existing batch-based methods [11][12][13][14] incur in enumerating request groups. Extensive experiments demonstrate that P-Ride achieves a higher service rate and lower unified cost than online-based methods, and a shorter running time than batch-based methods.
Author Contributions: Conceptualization, Y.C.; methodology, Y.C.; writing-original draft preparation, Y.C.; writing-review and editing, Y.C. and L.W.; supervision, L.W. All authors have read and agreed to the published version of the manuscript.