Customer incentive rebalancing plan in free-float bike-sharing system with limited information

: Free-ﬂoat bike-sharing (FFBS) systems have increased in popularity as a sustainable travel mode in recent years, especially in the urban areas of China. Despite the convenience such systems offer to customers, it is not easy to maintain an effective balance in the distribution of bikes. This study considers the dynamic rebalancing problem for FFBS systems, whereby user-based tactics are employed by incentivizing users to perform repositioning activities. Motivated by the fact that the problem is frequently faced by FFBS system operators entering a new market with limited information on travel demand, we adopt the ranking and selection approach to select the optimal incentive plan. We describe the system dynamics in detail, and formulate a proﬁt maximization problem with a constraint on customer service level. Through numerical studies, we ﬁrst establish that our procedure can select the optimal incentive plan in a wide range of scenarios. Second, under our incentive plan, the proﬁt and service level can be improved signiﬁcantly compared with the scenario without incentive provision. Third, in most cases, our procedure can achieve the optimal solution with a reasonable sample size.


Introduction
Bike-sharing systems (BSSs) have been widely adopted around the world, serving as "last-mile" connections to public transit networks for urban residents. As a typical environmentally sustainable travel mode, bike-sharing can largely contribute to the mitigation of traffic congestion, improvement of air quality, and reduction of carbon emissions in urban areas. Zhang and Mi [1] report that bike-sharing in Shanghai decreased carbon emission by 25,240 tons in 2016. Mobike, the largest Chinese BSS operator, claimed in 2017 that the total carbon emission reduction enabled by all of its users had reached 540,000 tons, which was measured by the total distance traveled by the "Mobikers" [2].
In recent years, a relatively new model of BSS, known as the free-float bike-sharing (FFBS) system has increasingly gained its popularity, especially in many Chinese cities. As opposed to the traditional station-based bike-sharing (SBBS) systems, where bikes are stored at specific kiosks (see Figure 1a for a representative system, Citi Bike in New York City), FFBS systems are dockless, and allow users to pick up and return bikes anywhere within their service region. In particular, owing to advanced information technology, the Chinese market has witnessed a surge in launches of privately operated FFBS systems since 2015. According to Chinese Ministry of Transport, by 2017 the number of dockless bikes and their users have reached 16 million and 130 million, respectively [3]. Mobike https://www.mobike.com ( Figure 1b) is a leading FFBS system operator, and has expanded its presence in numerous domestic and overseas cities. Despite the convenience and flexibility provided to users and its contribution to the sustainability of urban transportation, FFBS systems also face numerous challenges. Owing to the fluctuating and asymmetric demand for rides throughout the day, the spatial distribution of bikes is highly imbalanced [4]. It is commonly observed that parking areas and even sidewalks near bus stations are packed with bikes ( Figure 2 shows a bus stop blocked by an excessive number of bikes parked there), while the availability of bikes is rather low in some other locations. It is necessary for the operator to relocate bikes across areas at appropriate times to rebalance the system. A widely adopted rebalancing tactic is the operator-based approach, characterized by a fleet of trucks and staff dedicated to manually transferring bikes across different regions. However, this approach is often implemented at certain times during the day, so that bike numbers cannot be adjusted in real time. It is associated with considerable costs and an environmentally unfriendly operating mode since truck routing is extensively used in rebalancing. In this study, we consider a different strategy for rebalancing BSSs, namely a user-based approach where the system operator can provide users with monetary incentives to induce repositioning activities that are beneficial for the system balance, such as changing the destinations of certain rides. Such an approach has in fact already been practiced in BSSs that are in operation, such as Vélib in Paris [5] and Citi Bike [6]. Mobike also offering cash rewards to encourage users to move bikes from low-traffic areas to high-traffic spots like a subway stop or business district [7]. Given the increasing practical adoption of user-based rebalancing and the existing gap in research on this for BSS, we set out to address research questions regarding when system operators should provide customer incentives, and how large these incentives should be.
One problem faced by system operators, particularly those newly starting their businesses in certain cities, is the lack of information on demand and customer behaviors [8], e.g., their arrival rates, usage patterns, and willingness to accept the incentives. This problem occurs fairly commonly, given the recent rapid expansion of BSSs in China. Inferring this information from historical operational data in other cities, however, may not be effective. This is because that customers can be heterogeneous, and different incentive plans result in varied estimations, which further affect the estimation quality and consequent decision-making.
From a theoretical perspective, methodologies such as robust optimization (RO) can be deployed to resolve problems of limited demand information. The key idea of RO is to identify an uncertainty set of the unknown model parameters, and then optimize against worst-case realizations within this set [9][10][11]. However, for our problem settings where there exist various stochastic components regarding demand patterns, the structure of uncertainty set is very complicated and prevents the derivation of concise analytical results by the RO approach. In contrast, we adopt the ranking and selection (R&S) method, which is a data-driven approach in the simulation optimization (SO) field, with the advantage of determining the optimal solution with a certain probability by means of limited experiments.
For example, when a BSS operator enters a new market, limited information exists regarding customer demand and behavior, and it is difficult to construct an RO model. However, with the R&S approach in our study, no assumptions are made regarding the factors such as customer types, arrival rates, and ride times. We assume that a firm can only observe real-time data of these quantities. In certain cases, these variables can be estimated using historical data, which is somewhat useful for theoretical analysis, and offers certain benefits. However, the main drawback of this method is that the quality of estimation may significantly affect the results. Furthermore, when a firm enters a new market, there is little possibility of obtaining similar historical data, while the R&S approach fully utilizing the realized system outcomes data can be more suitable in such a setting.
Focusing on a typical FFBS system in the urban area, we describe its operational processes in detail, including the system's network structure, system users' demand characteristics, how the incentive provision interacts with ride patterns, and the associated costs incurred. The operator's profit maximization problem is formulated with a service-level constraint. That is, the system operator will guarantee a desirable service availability to customers. Owing to its complexity, it is challenging to solve this optimization directly, and we use discrete event simulations to compute the profit and deploy R&S, which is an extensively adopted SO approach, to derive an optimal incentive plan. This paper makes the following contributions to the literature on BSS. First, we develop a network model to describe the operational processes of a BSS, and describe the manner in which an incentive program can influence customer travel behavior, and thus the number of bikes across different areas of the system. A formal profit optimization model with a constraint on the customer service level is constructed to reflect the operator objective. Second, considering the limited information problem in decision making, we present an R&S procedure as the main methodology to determine the optimal incentive plan, which is feasible even when the model's random variables can follow any type of distribution. R&S takes full advantage of the observed real-time demand and significantly shortens the time required to derive the optimal incentive plan with a certain probability. This is very helpful for system operators with little relevant historical data, which is common in practice. To the best of our knowledge, we are among the first to tackle the BSS rebalancing problem with SO approach such as R&S. Third, through extensive numerical experiments, the validity and effectiveness of our procedure are tested against different model variants, parameter settings, and customer behavior scenarios. We find that the system operator can identify the optimal incentive plans with reasonable sample sizes.
The remainder of this paper is organized as follows. We review related literature on the BSS rebalancing problem, simulation optimization and R&S approach in Section 2. In Section 3, we construct a network model for the FFBS system, and describe its operational processes in detail, particularly the manner in which the incentive program facilitates the system balance through its interactions between the operator and customers. The operator profit optimization problem is also formally presented. The discrete event simulation process and the R&S procedures are demonstrated in Section 4, followed by comprehensive numerical studies for testing the validity and effectiveness of our procedures, as well as several further extensions, in Section 5. Finally, we conclude our paper in Section 6.

Literature Review
Our paper is closely related to two research streams: studies on rebalancing operations in bike-sharing systems, and the R&S method in SO literature.
The study of BSS has been an emerging research area in recent years. Demaio [12] and Shaheen et al. [13] discuss the history and future directions of BSS in detail. Readers are referred to [14][15][16] for more recent reviews. Gavalas et al. [17] examine existing literature concerning the design and management of vehicle-sharing systems, and classify them into strategic, tactical, and operational levels, based on the planning horizon. Under a similar framework, Laporte et al. [18] provide an extensive literature review of shared mobility systems. Various topics are discussed regarding the economic, societal and environmental issues of BSS, such as its externalities on the environment and public health [19], user behavior characteristics [20,21], host city resident acceptability patterns [22,23], strategies to improve its performance, profitability, marketing appeal [24][25][26] and its role in normalising cycling [27]. In the remainder of this section, we mainly review the literature on rebalancing operations given is closer relevance to our work.
BSS rebalancing approaches can be classified into operator-based and user-based strategies, where the former corresponds to the optimal dispatching of trucks to reposition bikes across different areas in a system, and the latter emphasizes directing customers to select more appropriate origins/destinations to realize a more balanced system (see [28] for a specific subdivision of rebalancing strategies). Certain studies consider static rebalancing, which is generally carried out at night when few customers are using the system. Chemla et al. [29] treat this as a pickup and delivery problem and present efficient algorithms. See [30][31][32][33][34], among others, for further research in this stream. In dynamic rebalancing, which is often performed during the daytime, operator transfers and user trips occur simultaneously, so more real-time information must be taken into account. Representative literature in this stream includes [5,[35][36][37][38].
Although various user-based rebalancing schemes exist, such as system regulation [39] and parking reservation policies [40,41], offering users monetary incentives is the most commonly studied scheme in the literature, and we do not distinguish between incentive-based and user-based approaches in this paper. In this line of research, Febbraro et al. [42] first propose the use of a fare discount to motivate users to change their destinations. In addition, Pfrommer et al. [43] examine the combination of dynamic truck routing and an incentive scheme design in bike redistribution, using model-based predictive control principles. Singla et al. [4] extend their work by incorporating a budget constraint on incentive provision and a learning procedure regarding a user utility function in optimal pricing into the model. [44] propose a deep reinforcement learning algorithm to solve a incentive-based rebalancing problem, which consider both spatial and temporal features in the system. Other related research includes [45][46][47][48][49][50]. Almost all of these studies conduct analyses on SBBS systems. Reiss and Bogenberger [51] study a user-based approach to a FFBS system, where the authors discuss the advantages and applicable scenarios for both operator-based and user-based relocation strategies for the FFBS system in Munich, and validat their results through an empirical analysis using GPS-booking data. However, in contrast to their analysis based on certain given relocation policies, our study constructs a more detailed quantitative model, and derives the optimal incentive scheme.
Given the complexity of our model due to the various stochastic components therein, and the assumption that limited information is available regarding customer demand, it is challenging to obtain analytical results of the optimal incentive plan. Consequently, we employ R&S from the SO field to address the problem. A SO problem is a nonlinear optimization problem in which the objective function is defined implicitly through a simulation and can only be observed with error [52].
The problem is frequently seen in various real-world applications such as manufacturing line design, inventory control, healthcare system operations [53]. Readers are referred to [54,55] for more detailed introduction to SO. R&S is a widely used approach in solving SO problems with finite solutions. It aims to select the superior option from a set of systems, which has the largest mean value under the sequential analysis framework. As opposed to the fixed sample size analysis, where a certain amount of data is required prior to carrying out the analysis, R&S uses the information from every new sample, and can save numerous samples with a similar probability of correct selection (PCS). Essentially, there are two approaches in R&S: the frequentist and Bayesian approaches. The frequentist approach can select the optimal system satisfying a required PCS (see [56] for a basic introduction). The Bayesian approach focuses on maximizing the posterior PCS given a fixed total sample size. Comprehensive literature reviews on R&S are provided in [57,58].
Using the frequentist approach, Bechhofer and Sobel [59] first propose a procedure based on the indifference-zone parameter δ, where δ is the smallest difference of concern. Paulson [60] improves these results by introducing the sequential analysis method. To improve the performance, Fabian [61] develops a tighter bound of selection. Based on these advancements, Kim [62] presents an effective procedure called the Kim-Nelson (KN) method. In addition to focusing on improving the bound, Hong [63] employs a variance-dependence sampling method to reduce the required sample size.
In this study, the best incentive plan must be selected under a service-level constraint. Most R&S procedures adopt a non-constrained optimization framework, such as the KN procedure [62]. Andradottir and Kim [64] and Hong et al. [65] present two types of procedures, taking the constraint into consideration. In this paper, we employ the AK+ procedure described in [64] to solve our incentive plan selection problem. We provide a brief description of the AK+ procedure in Section 4.

Network Model: Operational Process and Incentive Plan
To describe the FFBS system, we construct a directed network model which consists of several subnetworks representing different service regions in the system. Multiple nodes exist close to one another in each subnetwork, indicating the locations where bikes are mostly parked. Bike rides between different nodes are modeled as arcs connecting them. Such a model corresponds to the commonly observed operational phenomenon in which most bikes are accumulated at bus/subway stations, shopping malls, and other points of interest (POIs) [66] in urban areas. Furthermore, we assume that the geographical distances between nodes within a subnetwork are sufficiently close (e.g., less than 500 m), so that under an incentive-based rebalancing tactic users are only willing to change their destinations to other nodes within the same subnetwork. That is, we mainly focus on intra-subnetwork destination changes, i.e., the change of destination happens from one node in a certain subnetwork to another node that is also within the same subnetwork. On the other hand, the inter-subnetwork changes, which refer to the case that the customer's original destination and the destination he/she changes to are two nodes in two different subnetworks, are be accepted by customers. Similar region division techniques have also been adopted in the existing studies on free-float vehicle sharing systems [8,67,68]. Figure 3 illustrates a typical FFBS network, where the blue circles represent subnetworks and the black dots represent nodes therein.
To provide a model description that is concise and easy to understand, in the following we focus on a single-region model including a subnetwork with two nodes, denoted as nodes 1 and 2. All other nodes in the network are summarized as an "external node" denoted as node 0, as illustrated in Figure 4. Generalizations to arbitrary numbers of nodes and subnetworks can easily be derived (see Figure 5 for an example with three regions and two nodes in each region). In fact, regardless of the complexity of the entire network model, it can always be decomposed into several single-region models, because we only allow intra-subnetwork destination changes. So despite the simplicity of our focal model, it is quite representative and serves as a baseline for the analysis of the general settings.   We consider the model operating on a day-to-day basis in a continuous time setting. The time index is t ∈ [0, T], where T denotes the end of the operational process in a day. The customer demand for rides is modeled as a Poisson process N(t) with rate λ, as commonly assumed in the literature [36,69]. Each generated demand corresponds to a trip from node i to node j, with i, j ∈ {0, 1, 2}. The probability of such a trip is P ij , which satisfies ∑ i,j P ij = 1. It should be noted that P 12 = P 21 = 0, as we assume that nodes 1 and 2 are sufficiently close such that customers prefer to walk between them instead of riding a bike. The demand flows among different nodes and the corresponding probabilities are illustrated in Figure 4.
We now describe the operational processes during a single day in detail. Let B i (t) be the number of bikes at node i(i = 1, 2) at time t. We assume that both nodes have the same number of bikes at the beginning of the day for ease of analysis, so we let B 1 (0) = B 2 (0) = n. Furthermore, we assume that B 0 (t) is a sufficiently large number with little fluctuation to denote the number of bikes in the remainder of the system. The system evolves in the following manner. At time t, when a customer arrives in the system and generates the demand for a ride from i to j, he observes B i (t), and will leave the system if B i (t) = 0, i.e., if no bike is available at node i (note that this unmet demand phenomenon only occurs when i = 1, 2). An associated penalty cost c u is incurred by the operator, owing to the lost customer demand and loss of company goodwill. Otherwise, if B i (t) > 0 the customer picks up a bike and pays the price p for this trip, with a stochastic traveling time denoted by S. Note that in the practice of FBSSs such as Mobike, fixed price is charged per ride, regardless of the distance of the ride. So our assumption that p is node-independent is reasonable.
To maintain a high customer service level, the operator offers an incentive to customers to induce activities that are beneficial to the system balance when the bike distribution is highly uneven. In particular, when B 3−i (t) is considerably less than B i (t), customers traveling from node 0 to i(i = 1, 2) are motivated to change destinations from i to 3 − i. The incentive plan is characterized by two thresholds, θ i,1 and θ i,2 (which are called the lower and upper thresholds, respectively) and the incentive value I i . For analytical simplicity, we assume that θ i,1 = θ 1 , θ i,2 = θ 2 , (i = 1, 2) in the remainder of this paper. When any of the following three scenarios occur, I i is offered, and the charge for the trip from 0 to 3 − i will be p − I i : . Specifically, condition (1) means the number of bikes at node 3 − i is less than the lower control threshold θ 1 while this is not the case for node i; Condition (2) implies that the ride destination (node i) has more bikes than the upper threshold θ 2 while B 3−i (t) is less than this threshold; Condition (3) means the number of bikes at both nodes are less than the lower threshold, but bike insufficiency is more serious at node 3 − i. Note that in FFBS systems such as those operated by Mobike, each bike is equipped with a GPS device, so the operator can acquire real-time information on the bike distribution at each node, which facilitates the implementation of this incentive program. For a given I i , the customer will change their destination with probability of acceptance P a (I i ), which increases in I i and satisfies P a (0) = 0, P a (Ī i ) = 1, whereĪ i is the incentive under which the customer will definitely accept the offer. Customers accepting the incentive start their rides from node 0 to 3 − i and pay p − I i for the ride. We illustrate the above process with a flow diagram in Figure 6.
This interaction between the system operator and a customer is replicated in each transaction as time passes in a single day, with revenues from rides and penalty costs owing to unsatisfied demand accumulated. At t = T, we assume that the city administration department will check the number of bikes parked at each node. Because excessive bikes cause congestion and inconvenience, associated penalty costs are incurred when B i (T) > K i , where K i is the "capacity limit" of node i, such as the maximum parking number. Note that even for FFBS systems, bikes at a specific node can only be parked at limited spaces, such as sidewalks near the POI. Thus the existence of K i is reasonable. With a unit overage cost c o , the total penalty cost is proportional to the number of bikes exceeding the limit, i.e., c o (B i (T) − K i ) + , where x + = max{x, 0}. Finally, the operator will implement the truck-based rebalancing approach if the numbers of bikes at each node are extremely large or small. In our model, we assume that when B i (T) > K i or B i (T) = 0 for any i, a fleet of trucks and associated staff will be sent to reposition the bikes, and B i (T) is set to its initial condition n with cost c T . We illustrate the daily operational process (including riding transactions, checks on bike overage, and truck-based relocations) of the FFBS system using a flow diagram in Figure 7. The major decision variables and model parameters are summarized in Table 1.   Monetary incentive offered to customers whose destination is node i θ 1 Lower control threshold in incentive provision for each node θ 2 Upper control threshold in incentive provision for each node P a (I) Customer probability of changing destinations when offered incentive I c u Penalty cost per unmet customer riding demand c o Overage cost per bike when B i (T) > K i c T Truck-based rebalancing cost at time T γ Customer service level ω Minimum required service level π Total daily profit

The Stochastic Optimization Model
The operator aims at deploying the incentive plan optimally, i.e., determining I i (i = 1, 2) and θ 1 , θ 2 to maximize the total expected daily profit, while maintaining the expected satisfactory service level for customers. Similar models have been considered in [4,33,43]. We first present the mathematical expression of the total expected profit as follows: Here 1(x) denotes the indicator function of an event x, i.e., 1(x) = 1 if x is true, and 1(x) = 0 otherwise. 1 i (t) is the indicator function of the three conditions that triggers the provision of incentive plan at time t. Thus when one of the 3 inequalities introduced in Section 3.1 is satisfied, is the indicator function of a customer accepting incentive I i . The expectation in (1) is taken with regard to the arrival process N(t), traveling time S, and 1 a (I i ), note that 1 i (t) is determined by B i (t), which is further determined by the random variables above. The first term in the square bracket is the total profit for a certain sample path of random variables during the whole time horizon. The integrand represents the monetary change due to a specific customer arrival: The first term is the revenue collected when demand into the subnetwork arises (so incentive can be provided and accepted); The second term corresponds to the revenue earned when demand to nodes outside the subnetwork arises and bikes are available at the nodes; The third term is the penalty cost for lost demand. The second and third terms in the square bracket describe the overage and rebalancing costs at time T, note that 1 − ∏ 2 i=1 1{0 < B i (T) ≤ K i } is the indicator of the event that rebalancing is required for at least one node.
We define customer service level as the ratio of satisfied demand over the total potential demand, with its mathematical formulation as follows: γ denotes the service level. The numerator is the expected number of customers that arrives and get served (note that the term P i0 1(B i (t) = 0) corresponds to the scenario when demand cannot be satisfied due to lack of bikes at node i), and the denominator is the total demand. Note that this performance measure of service is also used in [33,49]. We exert a constraint γ ≥ ω to the profit optimization problem above, such that the operator should at least achieve the minimum service level ω ∈ (0, 1).
The generalization of this model to a multi-region model is not difficult to construct, and we omit it here for brevity of presentation. Notice that this optimization problem is highly complicated, owing to all the above stochastic components and various customer demand types with different O/Ds. The description of the system dynamics, as well as the corresponding model formulation and solutions, will be more intricate for the multi-region extension. Furthermore, in our focal scenario where historical demand information is limited, estimation on demand information is very difficult and can be inaccurate. With all of the above-mentioned complexities, it is very challenging to derive the optimal solutions analytically for this problem. Consequently, we adopt an alternative approach in simulation optimization to solve the problem, namely the R&S approach, which will be specified in detail in the following section.

Procedures for Optimization Based on R&S
In this section, we reformulate the model in Section 3 under the SO framework, and present the related R&S procedures. We can rewrite the optimization model as It should be noted that these sets contain a finite number of elements, corresponding to discretized feasible solutions. This is quite reasonable, because in practice incentives can only be provided with certain values (for example, if the charge per ride is $1, the incentive can be $0.1, $0.2, etc.), while the number of bikes is naturally an integer. We add a constraint γ j ≥ ω for each feasible solution j, which is to ensure the optimal solution derived can provide the desirable service level. Now, we can deploy the R&S procedure to determine the optimal solution numerically. In a more general manner and using the terminologies in the R&S field, we let k be the number of elements in J and i be an arbitrary element in J . Then, the feasible set can be presented abstractly as {1, 2, . . . , k}, and we want to determine The R&S approach aims to select the optimal solution from a set of solutions, and it belongs to the field of sequential analysis theoretically. As opposed to the fixed sample size analysis, R&S compares contending solutions following every observation. In this manner, it can take advantage of each observation. If the statistical variable reaches a given threshold, we can declare the results with a required probability for correct selection. Sequential analysis usually can curtail the comparison process and save numerous samples. We use the AK+ procedure presented by [64] as the main R&S algorithm.
AK+ is a fully sequential ranking and selection procedure which can solve constrained SO problem. Before the invention of AK+ procedure, most ranking and selection procures were focusing on unconstrained SO problems. AK+ procedure can choose the system with best or worst performance under certain constraint among a set of systems. In our model setting, it means that the procedure can choose the best incentive plan within those plan candidates that can achieve desired service level. Besides, as proved in Theorem 4 in [64], AK+ procedure can guarantee that the probability of correct selection is larger or equal to 1 − α, which is given artificially. In our problem, it means that we can choose the best incentive plan with a desired probability of correct selection. This is the reason why the chosen plan is optimal.
We provide two procedures for conducting SO by means of R&S. Procedure 1 provides an illustration of the daily operational process of the FFBS system, with which one sample consisting of a profit and service level is generated. Procedure 2 is our main procedure, describing the selection of the optimal solution through sequential sampling and comparison. Because the system operator has no information regarding the statistical distribution of the stochastic components in the model, it is necessary to obtain n 0 samples to estimate related information. This procedure will determine the feasibility of solutions and compare the performances of different solutions simultaneously. In the feasibility check, solutions that cannot satisfy the service level constraint will be eliminated. In the comparison step, inferior solutions with a low average daily profit will be deleted. Using these two steps, we can select the optimal system with the highest average daily profit and required service level. Note that for Procedure 1, since it focuses on a single ride with pre-determined origin and destination, we omit the corresponding subscripts somewhere for brevity. For Procedure 2, given that it is mainly base on AK+ procedure [64], we omit some unnecessary details in its specifications.

Procedure 1 (Daily Operations Procedure)
Setup. Set the incentive I, the initial amount of bikes B i (0) for each node, price per ride p, unit penalty cost c u , and the probability of incentive acceptance P a (I).
where d denotes the other nodes in the same region as d, which can be an alternative destination. If one of the three conditions holds, generate P ∼ U[0, 1]; otherwise, set r m = p. If P is generated and P ≥ P a (I), then set r m = p − I. If P < P a (I), set r m = p. The system dynamics of bike numbers follow the description in Section 3.1. Let m = m + 1. Profit calculation. If m ≤ M, return to Single ride profit calculation. Otherwise, calculate the total cost C, including the overage and truck-based relocation costs. The total profit for a single day

Procedure 2 (Customer Incentive Plan Procedure (CIPP))
Setup. Select the initial sample size n 0 ≥ 2, overall confidence level 0 < 1 − α < 1, service level ω, and indifference zone parameters δ and . Solving β and η from the following equations: The two equations are necessary for the proof of the probability of correct selection in our R&S procedures, see [64] for more details. Define h 2 = 2η(n 0 − 1) and function R(a, b, c, d) = max 0, cd 2b − ab 2 , which will be used in the following procedures.
Initialization. Let I = {1, 2, . . . , k} be the initial contending plan set. Let F = ∅ and SS i = ∅, where F and SS i denote the set of feasible solutions and the set that records the solutions inferior to solution i, respectively. Run Procedure 2 n 0 times and obtain observations π ij and γ ij , j = 1, 2, ..., n 0 from each system (feasible solution) i. Define a counter r = n 0 , which is the variable recording the sample size needed for the procedure. For system i, calculate its service level variance S 2 i for i = 1, 2, . . . , k with whereγ = 1 n 0 ∑ n 0 j=1 γ ij is the initial sample mean of service level. For all system pairs i, l ∈ I, calculate the profit variance S 2 il between systems by means of (10) For any pair of i and l ∈ I ∪ F and i = l, i / ∈ SS l , l / ∈ SS i , and If i ∈ F, then eliminate l from I or F. If i / ∈ F, add index l to SS i . Stopping rule. If |I| = 0 and |F| = 1, then stop and select the system with an index in F as superior. If |I| = |F| = 0, then declare that no feasible system exists. Otherwise, take one more observation of all systems in I or F by Procedure 2, set r = r + 1, and return to Feasibility check.

Numerical Investigations
In this section, we conduct comprehensive numerical experiments to test the validity and effectiveness of our procedures. We first present two representative cases. Then we provide certain extensions to explore the case with unequal incentive plans, and examine the influence of the threshold for incentives. It should be mentioned that for any given parameters, the optimal solutions derived from our procedure may not be unique. If the difference in profits for any two optimal solutions is smaller than δ, the indifference zone parameter, then we view these two solutions as equivalent. A more precise indifference zone parameter increases the total sample size required by the procedure. Certain specifications for the parameter settings are provided in the following subsection.

Parameter Settings
1. R&S parameters In this paper, we test the case with δ = 10. This means that for any two incentive plans, if the difference in daily profit is smaller than 10, we regard the two plans as equivalent.
The minimum required service level is ω = 0.8, and the indifference zone parameter for the service level is set to 0.01. The initial sample size n 0 is set to 30, and the overall confidence level is 1 − α = 0.95. We test 1000 macro-replications in all experiments. 2. Parameters for FBSS operations For the parameter settings for Procedure 2, we set the time unit per day in minutes, so T = 24 × 60 = 1440. Customer arrivals follow a Poisson process with a rate of λ = 1, and the ride time of a trip S is a uniform random variable following a distribution S ∼ U[0, 30]. Let the price of a single trip p = 1. The values for the unit penalty cost for unsatisfied customer demand are c u ∈ {0.5, 1, 1.5}, corresponding to different significance levels of customer satisfaction, low, middle, and high, respectively. The capacity limit K for each node is 10, and the lower and upper thresholds for providing incentives are set as θ 1 = 1 and θ 2 = 9, respectively. The initial number of bikes for each node is five. The acceptance probability when customers are offered an incentive is assumed to be proportional to the incentive value, P a (I) = I/Ī, whereĪ = p + c u . So customers exhibit a uniformly distributed willingness to accept an offer. The unit overage cost at the end of decision horizon is assumed to be c o = 1, and the truck-based rebalancing cost c T = 50.

Example 1 (Single-Region Model)
First, we consider the single-region model specified in Section 3.1, which focuses on the operation of a certain region, with all other regions regarded as an "external node". The cases that a region includes two and five nodes are both studied. Let P ij denote the probability of a trip from node i to j. In the two-node test, let P 01 = 0.4, P 02 = 0.1, P 10 = 0.1, and P 20 = 0.4. This setting implies that the inflow will be greater than the outflow at node 1, while the opposite is true at node 2. In this manner, an imbalance arises in the net inflows between the two nodes. For the five-node case, the probabilities are P 01 = 0.16, P 02 = 0.14, P 03 = 0.12, P 04 = 0.06, P 05 = 0.02, P 10 = 0.02, P 20 = 0.06, P 30 = 0.12, P 40 = 0.14, and P 50 = 0.16, which also reflects an imbalance in the net inflows among nodes (note that P ij = 0, i, j ∈ {1, 2, 3, 4, 5}).
We let the feasible set of incentives be {0, 0.5, 1, 1.5}, which is comparable with the price per ride p = 1, and corresponds to operational practice where a price discount is often selected from limited candidates, to simplify the decision. We present the numerical results of the optimal incentive plans (including the counts of all acceptable solutions), daily profits, and service levels (both the mean values and confidence intervals (CIs)) under different values of c u for the two-node and five-node cases, respectively, in Tables 2 and 3. The results demonstrate the statistical validity of our procedure. We also establish that the optimal incentive increases and the daily profit decreases as c u increases, for both tests. Therefore, when an unsatisfied customer induces a higher cost, the operator will accordingly set higher incentive, but the earned profit will decrease. In contrast, the discrepancies among service levels with different values of c u are not highly significant, partly owing to our minimum service level constraint. To illustrate the benefit provided by the incentive program, we compare the two performance measures (profit and service level) with those under the case where the operator does not provide any incentive to customers, which are shown in Tables 4 and 5. In general, we find that using operator-based rebalancing only indicates a considerable profit loss and lower service levels. Taking the model with two nodes when c u = 1.5 as an example. The average daily profit is 309.67 when the operator provides an incentive I = 1.5. However, the average daily profit without the customer incentive-based rebalancing plan is only −77.53 from Table 4. The service level is also improved from 70.48% to 91.98% by incentive provision. We notice that model results under other parameter settings are similar. The effectiveness of our procedure is verified by the results displayed in Tables 6 and 7, which show the statistics of the total sample size (total number of samples drawn in a single macro-replication) and iteration time (value of r in CIPP) for each c u in both tests. It is clear that the procedure can select the optimal solution within a relatively short time, particularly when the penalty cost is small, which demonstrates the merit of the sequential analysis paradigm of R&S. From a practical perspective, this is very important for system operators, who wish to determine the optimal incentive plan quickly. For example, suppose that they can implement a similar procedure using a daily incentive plan test corresponding to one replication in our experiment. Usually, the optimal plan can be acquired within a couple of months by utilizing the daily operational data.

Example 2 (Multi-Region Model)
Next, we present numerical experiments for the multi-region model, which differs from the single-region model in that the dynamics of bike numbers across the system become more complex, and the incentive plan provided for a single node will also influence other nodes across different regions. However, we can still deploy our procedure to carry out simulations and numerical optimization, as the underlying rationale remains the same. In this example, we consider a three-region model, in which each region has two nodes, and an external node is again incorporated to represent other regions. However, the notation becomes slightly more complicated. Let P o,d denote the probability of a trip from the origin o to destination d, where both o and d can be elements in {0, ij}, i = 1, 2, 3; j = 1, 2, where i is the region index and j is the corresponding node index. Now, let P be a 7 × 7 probability matrix storing all the values of P o,d , with its columns and rows arranged in the order {11, 12, 21, 22, 31, 32, 0}. Then, for our experiment P is set to be Note that we have P 11,12 = P 12,11 = P 21,22 = P 22,21 = P 31,32 = P 32,31 = 0, since the mutual distances between nodes in the same subnetwork are very close, as we mentioned in Section 3.
As with the previous example, we present the operational performance results (Table 8), the effectiveness tests for the R&S procedure (Table 9), and the counterparts when no incentives are provided (Table 10) under different c u values. We find that many of the discoveries from the single-region model still apply here, such as the influence of the penalty cost on the optimal incentive and daily profit, and the substantial benefit of introducing an incentive program (although the differences among performance measures are smaller). The consistency between the different models supports our argument that to a certain extent the single-region model provides an effective representation of the complete network model. We also find that comparatively more samples may be required on average to achieve the optimal plan when c u is larger.

Procedure Validity
Our procedure selects more than one solution in certain cases. For example, when there is a single region with five nodes and c u = 1.5, the procedure selects I * = 1 as the optimal solution 658 times and I * = 1.5 becomes optimal 342 times. It should be mentioned that the difference in the average daily profit for these two solutions is rather small, at 0.62. Therefore, these two solutions are both optimal, because we set the indifference zone parameter δ to 10. If the firm wants to select a unique solution, then it can choose a significantly smaller δ. In summary, our procedure can select the optimal solution with the required probability of correct selection under a suitable indifference zone assumption.

Sample Size and Iteration Time
The sample size represents the total number observations collected by the procedure, which is the sum of the observation numbers for all solutions. The iteration time can be regarded as the number of days of observation. Because the R&S method follows the sequential analysis paradigm, certain solutions can be eliminated during the procedure. The total sample size is smaller than the product of the iteration time and number of solutions. In our opinion, the iteration time is the most important performance measure, as it represents the number of observation days required for decision making. For example, in Table 6, the average iteration time for the case in which the penalty cost c u = 1.5 is 54.66, which means that an optimal decision can be reached after 55 days, given that the firm has no information regarding the market.
The iteration time can be influenced by several factors, such as the indifference zone parameter δ, and the mean and variance of the daily profit. A larger δ results in a larger tolerance level for an acceptable selection. In this paper, we set δ = 10, so that if the difference between the average daily profits for two solutions is smaller than 10, we can accept either as the optimal solution. A firm can set this parameter according to the actual scenario and its operation style. However, it should be kept in mind that a significant decrease in δ will lead to an increase in the total sample size and number of iterations. Moreover, the mean and variance of the average daily profit play a vital role. If the means of the average daily profits of two solutions are quite close, additional samples will be required to determine which solution is superior. The variance measures the fluctuation in daily profit. For example, in Table 6 if the penalty cost c u = 1, the iteration time will be significantly larger than c u = 0.5 or 1.5. The reason for this is that in the cast that c u = 1, the difference between the two solutions I = 1 and I = 1.5 is smaller than δ, which increases the difficulty in selecting the optimal solution. A higher variance will increase the total sample size and number of iterations.

Application to More Complex Situations
We first study a single-region model with two nodes, and then consider one with five nodes and the extension with three regions with two nodes. In this manner, we can demonstrate that our procedure can be applied to different cases and even more complex situations. In business practice, there may be many regions and nodes in a BSS, but the analytical procedure remains almost the same. Our procedure can also be applied for cases with various demand rates and ride time distribution patterns, as described below. The procedure can select the optimal incentive plan with the desired PCS, regardless of the true distribution.

Extension 1 (Thresholds for Incentives)
It is worth studying the conditions under which the incentive plan will be initiated. In our model formulation in Section 3, the control thresholds were introduced to trigger the implementation of incentive provision, and in the numerical studies above we simply set pre-determined values for θ i (i = 1, 2). It is interesting to determine how varying the thresholds can influence our results. To reduce the computation time, we consider a "symmetric" threshold pattern, where θ 1 and θ 2 satisfy We conduct comprehensive numerical experiments to investigate equal and unequal threshold scenarios, and our results are recorded in Tables 11 and 12. We find that for each c u , the optimal incentive is unique and stable, while there may be several optimal thresholds, and their distributions are not highly skewed (for example, when c u = 1 the number of solutions (1, 1) and (1,2) are comparable). This finding shows that compared with the provided incentive value, the impact of adjusting when the incentive should be initiated is not highly significant. Thus, when computational resources are limited, focusing on incentive value optimization can offer a rather satisfactory solution, saving on the corresponding sample sizes required to determine the optimal plan. All of the numerical studies conducted above assume that arrival and ride times follow Poisson and uniform distributions, respectively. Here, we relax this assumption in order to test the universality of our procedure under other distributions. We consider two specific settings: the first assumes a renewal arrival process with a binomially distributed inter-arrival time τ ∼ B(10, 0.1) and uniformly distributed ride time S ∼ U[0, 30] (denoted by BAUR), and the second assumes a Poisson arrival time with rate λ = 1 and exponentially distributed ride time with mean 15 (denoted by PAER). Numerical experiments are carried out for the single-region and multi-region models. The settings for other model parameters are the same as those above.

Single-Region Model
In this section, we examine the cases of a single-region model with two nodes for BAUR and PAER. Tables 13-15 present the results for BAUR. From the numerical studies, we conclude that the procedure can operate effectively in the BAUR scenario, demonstrated by both the validity and improvements in daily profit and service level. A similar conclusion can be reached for PAER from the results in Tables 16-18.

Multi-Region Model
Furthermore, we test the procedure under BAUR and PAER for the multi-region model, where we set the number of regions as three with two nodes in each region. All other parameters are the same as those set before, except for the arrival rate and ride time distributions. From the numerical results presents in Tables 19-24, we determine that the procedure can select the optimal incentive plan with a probability of 1 under both the BAUR and PAER scenarios. Both the daily profit and service can be significantly improved using the incentive plan selected by the CIPP.

Conclusions
In this study, we present a rebalancing scheme based on customer incentives for free-float bike-sharing systems, an environmentally sustainable travel mode that gains its increasing popularity. We describe the dynamic operational processes of the system under a continuous-time setting, with the network structure and demand patterns specified in detail. A stochastic optimization model incorporating a service level constraint is constructed, which describes how monetary incentives change customers' riding behavior and influence the revenue accumulation/cost incurrence of the system operator. Given that the problem is often faced by firms with limited demand information so a clear understanding of the stochastic components in the model is unavailable, we adopt the R&S approach as the main methodology, which is widely used in solving simulation optimization problems. Using our proposed procedure, the system operator can select the optimal incentive plan with a reasonable sample size. By means of comprehensive numerical studies, we determine that the optimal customer incentive-based rebalancing plan can not only significantly improve the average daily profit, but can also achieve a higher service level. The overall performance of the procedure, including its validity and effectiveness, is tested for various parametric settings and model variants, demonstrating the wide applicability of our approach across different scenarios.
It should be pointed out that in order to provide a concise description of the complicated system and to facilitate the tractability of the problem, several somewhat strong assumptions are adopted in this paper. For example, we assume the cost/demand parameters and the incentive plans provided are node-independent and stationary in time. These settings may to some extent limit the practical applicability of our model. Notwithstanding, our study is among the first attempts to employ simulation optimization methodologies such as R&S, in solving rebalancing decisions in FFBS systems. Our results imply that well system performance can be attained even if little information on demand is available, which can also provide insights for incentive-based rebalancing in the real world. Given the increasing attention paid to FFBS systems by both researchers and practitioners, we believe there are abundant research opportunities in the future, and we list several below.
First, one direction for future research may focus on more general extensions of our model, which can incorporate non-stationary demand process, node and region-specific incentive plans, etc. Second, most of the parameter settings in our numerical investigations can reflect the operational practice, but are still mostly artificial. Given that empirical analysis based on data from real-world BSSs are critical and has also been used in existing studies, another possible extension can be case studies with computational procedures adjusted to a large-scale model, which will increase the applicability of our methodology. Finally, apart from including company profit and customer service level into our decision model, given that the critical role played by FFBS systems in reducing carbon emission and air pollution, we may consider some environmental performance measures in future research.