A Dynamic Strategy for Home Pick-Up Service with Uncertain Customer Requests and Its Implementation

Abstract: In this paper, a home service problem is studied in which a capacitated vehicle collects customers' parcels in one pick-up tour. We consider a situation where customers who have scheduled their services in advance may call to cancel their appointments, while customers without appointments must also be visited if they request service, as long as capacity allows. To handle the changes that occur over the tour, a dynamic strategy is needed to guide the vehicle to visit customers efficiently. Aiming to minimize the vehicle's total expected travel distance, we model this problem as a multi-dimensional Markov Decision Process (MDP) with a finite state space of exponential size. We solve this MDP exactly via dynamic programming, whose computational complexity is exponential. To keep the complexity from growing further, we develop a fast method for looking up the record of an already-examined state. Although such a method generally wastes a large amount of memory, by exploiting critical structural properties of the state space we obtain an O(1) look-up method without any waste of memory. Computational experiments demonstrate the effectiveness of our model and of the developed solution method. For larger instances, two well-performing heuristics are proposed.


Introduction
Nowadays, home pick-up service is very common; it involves passenger pick-up services, such as company staff commutes, as well as item pick-up services, such as household trash and express parcels. Among these home services, home pick-up express parcel service is becoming more and more popular, especially in China. In 2019, there were already almost 1200 express firms providing home pick-up parcel services in China [1], and the number is continually increasing. Hence, to handle the constantly increasing demand and the growing competitive pressure, service providers must optimize their activities to decrease costs, improve service quality and enhance productivity. In addition, as noted in [2], this business operates in a low-margin market and is very challenging. Hence, providers have to serve customers faster and more flexibly, and therefore need to deal with dynamic changes and uncertainty.
In this paper, we focus on dynamic changes and uncertainty in customer requests occurring in home parcel pick-up services. One common source of uncertainty is the arrival of new customer requests during the pick-up tour. Another common but often ignored source of uncertainty in home pick-up service is the cancellation of a pre-scheduled service. For example, the United Parcel Service (UPS), a well-known express company, clearly states on its webpage [3] that "you can cancel or modify a UPS On-Call Pickup any time before the driver arrives". Thus, in order to bring our study closer to the practical situation, we take both types of uncertainty in customer requests into account, i.e., cancellations of pre-scheduled services and new requests for unscheduled services. Aiming to minimize the vehicle's total travel distance within one pick-up tour, we seek a dynamic strategy that guides the vehicle to collect parcels in response to the various realizations of customers' requests.
Clearly, the presented problem is a variant of the well-known Vehicle Routing Problem (VRP). More specifically, as noted in [4], the problem belongs to the class of Dynamic Vehicle Routing Problems with Stochastic Customers (DVRPSC). As one of the hardest problems in operations research, the VRP has received considerable attention. For the latest research on the VRP, see [5][6][7][8][9][10][11][12][13][14][15][16][17][18]; for recent reviews of its development, see [19][20][21]. Since the literature related to the VRP is vast, it is difficult to review all of its developments; hence, we confine our attention to the following review of DVRPSC. DVRPSC problems mainly arise in environments where some customer requests are known in advance while others are revealed during the operational process. Due to its practical relevance, DVRPSC has attracted a lot of attention. For thorough reviews, we refer to Larsen's PhD dissertation [22] and the literature review [23] by Pillac et al. Generally, such problems are solved by computing sequences of deterministic problems (routing among the customer requests that are already known). As noted in [24], although such sequential deterministic approaches allow for the direct use of existing methodologies developed for static optimization problems, the research community has realized that it is necessary to look for algorithms that anticipate customer requests that might become known, leading to behaviors where vehicles are kept close to customer groups that are likely to place orders. Potvin et al. [25] and Ichoua et al. [26] investigate classes of policy function approximations, which introduce rules for reserving vehicles for orders that have not yet become known. In [27,28], similar algorithmic strategies are referred to as online algorithms, which are often implemented as simple rules that compute in real time to react to new information as new customer requests arrive.
Another strategy for dealing with this uncertainty is to generate scenarios of potential future customer requests. This strategy has been used by Bent and Van Hentenryck in [29] and by Hvattum et al. in [30], where multiple scenarios of future customer requests are used to improve current decisions. Yet another strategy is based on sampling, such as Monte Carlo sampling of future customer requests; this idea is investigated by Bent and Van Hentenryck in [29] and by Mercier and Van Hentenryck in [31,32]. We also mention that Powell et al. provide a detailed tutorial on approximate dynamic programming for DVRPSC in [24].
Obviously, the existing studies on DVRPSC either produce a preprocessed decision through a static approach (e.g., [2,33]), or generate an online decision by an approximated dynamic approach (e.g., [29,34]). To the best of our knowledge, there are no studies that focus on an exact dynamic approach to simultaneously deal with two types of stochastic customers: customers whose requests might be cancelled, and customers who will make new requests in the operational stage.
Thus, in this paper, in response to the two aforementioned types of uncertainty in customer service requests and subject to the vehicle's limited capacity, we study an exact dynamic strategy for collecting customers' parcels within one pick-up tour, together with its implementation. Since the Markov Decision Process (MDP) is an ideal dynamic optimization tool that has been well studied both in theory and in applications (e.g., [35][36][37][38]), we formulate this problem as a multi-dimensional MDP with a finite state space of exponential size, where the defined state consists of four components in terms of two sets and two numbers. In order to facilitate look-up and storage operations in the implementation, we design a key consisting of only three numbers for each state. Then, using Dynamic Programming (DP), an exact one-tour dynamic strategy with the minimum expected travel distance is obtained, whose computational complexity is exponential. Since solving the MDP by DP requires repeatedly looking up the records of already-examined states, in order to keep the complexity from growing further, we develop a fast method for retrieving an already-examined state's record. Although such a method generally results in a huge waste of memory, in this paper, based on the designed state key and by exploiting some critical structural properties of the state space, we obtain an O(1) look-up method without any waste of memory. Finally, for larger instances that cannot be solved by DP, we propose two well-performing heuristic methods, which are evaluated by comparing their results with those of the DP strategy on instances small enough to be solved exactly.
The main contribution of this paper is the development of an MDP model that simultaneously considers two types of uncertainty in customer service requests occurring in vehicle routing problems with pick-up services. To the best of our knowledge, this is the first work to tackle uncertainties due to both cancellations and new requests with an exact dynamic approach. Second, for the complex, heterogeneous state in which numbers and sets coexist, we develop a mapping technique that yields a key consisting of only three numbers for each state, which greatly facilitates the operations during the implementation. Furthermore, when solving via dynamic programming, to avoid increasing the computing complexity, we exploit critical structural properties of the state space and obtain an O(1) look-up method without any waste of memory. Finally, two well-performing heuristics are proposed for larger instances.
The remainder of this paper is organized as follows. Section 2 presents the basic setting, the development of the mathematical model, and the solution method. Section 3 describes the mapping technique, exploits the structural properties of the state space, presents the O(1) look-up method, and reports the experimental results. In Section 4, two heuristic methods are proposed and evaluated. Section 5 concludes this paper with a discussion of future research directions.

The Basic Setting
The background setting for this study is as follows: a capacitated vehicle is used to collect customers' parcels within one pick-up tour. Specifically, the vehicle starts from a depot, visits customers one by one, and finally returns to the depot. There is a set of customers whose pick-up services have been scheduled before the vehicle starts. Hence, unless they call to cancel their appointments, those customers must be visited. We define them as "D-customers". Customers who do not have pre-scheduled appointments may call for a pick-up service during the courier's shift; however, they will be visited only if there is some remaining capacity in the vehicle. We define such customers as "P-customers". We define k as the vehicle's capacity for P-customers' requests. Note that k can be zero, and it need not be the same across tours. Actually, k depends on the vehicle's capacity limit and the total capacity of the D-customers' parcels in the current tour. Certainly, if many unscheduled services are requested by P-customers, the excess will be dealt with in another tour. We say that each D-customer is active until his service is finished or cancelled, while a P-customer becomes active when his call for service is accepted and becomes inactive after his service is finished.
Given that calls for cancellations from D-customers and for unscheduled services from P-customers occur during the vehicle's shift and are stochastic over time, we aim to find an effective strategy that tells the vehicle how to visit customers in response to those calls so that the total expected travel distance is minimized. Before presenting our MDP model, we give the following mild assumptions that support our model development. Note that the first three generally match reality. Assumptions 4 and 5 are very common among MDP models with stochastic arrivals. The last two assumptions are introduced to reduce the complexity of our MDP for better tractability. Nevertheless, in most instances, Assumption 6 also matches reality; for example, in China, most express firms have basic requirements on customers' parcel capacity [39].
1. Pick-up service is limited to a certain geographical district, so that the locations of customers are known in advance. Moreover, the travel distance between any two customers, or between a customer and the depot, is known in advance and is proportional to the travel time at rate 1. The service time of each customer is also known in advance.
2. When the vehicle finishes serving one customer, the dispatcher instructs it which active customer to visit next. At the same time, that next customer, either an active D-customer or an active P-customer, is informed as well.
3. If there is no active customer after the vehicle finishes serving one customer, it returns to the depot.
4. Calls for pick-up service from each inactive P-customer follow a Poisson process with rate λ_p.
5. During the vehicle's shift, every D-customer can independently call to cancel his appointment before he is announced as the upcoming customer. Once the service is cancelled, this D-customer becomes inactive and will not call for service again. We assume that every D-customer's cancellation calls are generated by a Poisson process with the same rate λ_d.
6. Every active P-customer's parcel has the same unit capacity. Thus, "k capacity for P-customers" means the vehicle can pick up at most k P-customers' parcels.
7. A cancelled appointment from a D-customer does not open a new pick-up slot for P-customers, i.e., k is fixed regardless of cancellations.

Modelling via Markov Decision Process
Taking reference from the literature [40], where "a decision epoch begins when the vehicle arrives at a location and observes new customer requests", we adapt this idea and specify that a decision is made when the vehicle arrives at a customer and finishes his pick-up service. Based on this, we divide the whole service tour into multiple stages and define one stage as the travel from one customer to his subsequent customer, including the service of the latter. We require that the first stage starts at the depot and the last ends at the depot. Similar to [40], we define that a decision epoch begins when the vehicle finishes serving one customer. Clearly, under this definition, combined with Assumption 3, the P-customers' requests arriving in the last stage will only be serviced in another tour.

Definition of State, Action and Cost
• State: Again following the literature [40], where a state contains the vehicle's current location and each customer's current status, we define the state to contain the vehicle's current location and each customer's status. Since each customer has exactly two statuses (active and inactive), we only record the customers who are active, and because two different types of customers are considered, we record the active customers of each type separately. In addition, in [40] the current status of the main constraint, time, is also included as a component of the state. In our problem the vehicle's capacity acts as the main constraint, so, combined with Assumption 6, the state should also contain the current number of already accepted P-customer service requests. To sum up, we define a state at the start of each stage as a quadruple containing (a) the number of already accepted P-customer service requests; (b) the set of currently active D-customers; (c) the customer whose service has just been finished, i.e., the position of the vehicle; and (d) the set of currently active P-customers. Such a state clearly satisfies the Markov property. The state space consists of all possible states over all stages.
• Action: In response to each state, we define its action as the selection of the active customer that the vehicle will visit directly next. The subsequent state results from this action together with the new calls that arrive from active D-customers (cancellations) and inactive P-customers (requests for unscheduled services) during the vehicle's travel and service time.
• Cost: We define the cost of a state as the expected future contribution to the tour length, assuming that all subsequent actions are optimal.
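As an illustration of the state definition, here is a minimal sketch in Python, assuming frozensets for the two customer sets so that a state can serve as a hashable memoization key (all names are illustrative, not from the paper's implementation):

```python
from typing import NamedTuple, FrozenSet

class State(NamedTuple):
    """One MDP state at the start of a stage."""
    c: int                # (a) number of already accepted P-customer requests
    dset: FrozenSet[int]  # (b) currently active D-customers
    location: int         # (c) customer just served (0 stands for the depot o)
    pset: FrozenSet[int]  # (d) currently active P-customers

# The initial state of a tour: no accepted P-requests, all D-customers
# active, vehicle at the depot, no active P-customers.
initial = State(c=0, dset=frozenset({1, 2, 3}), location=0, pset=frozenset())
```

Because the tuple is hashable, it can index a dictionary of already-computed costs, which is the role the state's key plays later in the paper.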

Notations
At the beginning of the whole service tour:
D    the set of customers whose pick-up services have been scheduled in advance, |D| = d;
P    the set of all potential pick-up requesting customers, |P| = p;
d_ij    the travel distance (time) between customers i and j;
t_i    the service time needed by customer i ∈ D ∪ P;
λ_d    the Poisson rate at which each active D-customer generates calls for service cancellation;
λ_p    the Poisson rate at which each P-customer generates calls for unscheduled services;
k    the vehicle's capacity for P-customers' pick-up services.
At the beginning of one stage:
location    the customer whose service has just been finished; for the first stage, we define its location to be o, which denotes the depot;
Dset    the set of all active D-customers at the present moment, Dset ⊆ D;
Pset    the set of all active P-customers at the present moment, Pset ⊆ P;
c    the number of all pick-up requests already accepted from P-customers;
s = (c, Dset, location, Pset)    one state describing the historically aggregated situation;
∆P = P \ Pset    the set of currently inactive P-customers;
∆D = Dset    the set of currently active D-customers;
∆D^j = Dset \ {j}, j ∈ Dset    the subset of Dset without element j;
h    the number of active D-customers who call to cancel their scheduled services in the current stage;
l    the number of inactive P-customers whose calls for pick-up services are accepted in the current stage;
∆P_l (|∆P_l| = l, l ≤ |∆P|)    one subset of ∆P of size l;
G^P_l    the collection of all ∆P_l s;
∆D_h (∆D^j_h)    one subset of ∆D (of ∆D^j) of size h;
G^D_h    the collection of all ∆D_h s;
G^{D_j}_h    the collection of all ∆D^j_h s;
C_s    the cost of state s;
s_a    the optimal action at state s, which is an active customer from Pset ∪ Dset to be visited next.

State Transfer Equation and Probability
A state can be expressed as (c, Dset, location, Pset). Given a state s = (c, Dset, i, Pset) and an action j, j ∈ Pset ∪ Dset, the subsequent state depends on both h, the number of active D-customers cancelling their scheduled services, and l, the number of accepted requests for unscheduled services from inactive P-customers, during the current stage. Thus, the state transfer equation can be obtained as Formulas (1)-(3). Based on those equations, the action s_a is exactly the customer j (j ∈ Dset ∪ Pset) at which C_s is attained. The initial state is (0, D, o, ∅) and its cost is the objective value.

In the formulas above, P_d^1 = Pr(h | s, j) for j ∈ Dset and P_d^2 = Pr(h | s, j) for j ∈ Pset give the probabilities that, given state s and action j, h specific active D-customers cancel their scheduled services during the time T_ij, where T_ij = d_ij + t_j. P_p = Pr(l | s, j) gives the probability that, given state s and action j, the calls of l specific inactive P-customers for unscheduled services are accepted during T_ij. Next, we discuss how to compute P_d^1, P_d^2 and P_p in the two cases l < k − c and l = k − c.

(i) In the case of l < k − c. Clearly, for a given s, |∆P| is fixed. Since the capacity has not been reached (c + l < k), no P-customer's call is rejected. Thus, the l particular P-customers generate calls that are all accepted, while the other |∆P| − l P-customers generate no calls. Hence, the probability that the calls of these l P-customers are accepted during T_ij is

P_p = (1 − e^{−λ_p T_ij})^l (e^{−λ_p T_ij})^{|∆P| − l},

in which the term 1 − e^{−λ_p T_ij} is the probability that one P-customer generates at least one call during T_ij. Since there is no limit on how many active D-customers can cancel their services, exactly those h customers generate calls. Thus, we get P_d^1 (for j ∈ Dset, where j itself, being the informed upcoming customer, cannot cancel) and P_d^2 (for j ∈ Pset) as

P_d^1 = (1 − e^{−λ_d T_ij})^h (e^{−λ_d T_ij})^{|Dset| − 1 − h},    P_d^2 = (1 − e^{−λ_d T_ij})^h (e^{−λ_d T_ij})^{|Dset| − h}.

(ii) In the case of l = k − c. We use B to denote the event that at least l inactive P-customers generate calls during T_ij. Its complementary event B̄ can be written as the union of l disjoint sub-events B_q (0 ≤ q ≤ l − 1), where B_q represents that exactly q inactive P-customers generate calls during T_ij. Thus, the probability of B_q is

Pr(B_q) = C(|∆P|, q) (1 − e^{−λ_p T_ij})^q (e^{−λ_p T_ij})^{|∆P| − q},

where C(·, ·) denotes the binomial coefficient. Then the probability of event B is Pr(B) = 1 − Σ_{q=0}^{l−1} Pr(B_q). Thus, in the case of l = k − c, we get P_p = Pr(B). Clearly, P_d^1 and P_d^2 are the same as in the case of l < k − c.
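The case analysis above can be sketched as follows; this is an illustrative reconstruction of the probability terms under the stated Poisson assumptions, with hypothetical function names:

```python
from math import comb, exp

def p_call(lam: float, T: float) -> float:
    """Probability that a Poisson(lam) customer places at least one call in time T."""
    return 1.0 - exp(-lam * T)

def prob_specific_p(l: int, n_inactive: int, lam_p: float, T: float, remaining: int) -> float:
    """Probability term for l specific inactive P-customers being accepted during T.
    `n_inactive` plays the role of |dP|, `remaining` the role of k - c."""
    a = p_call(lam_p, T)
    if l < remaining:
        # below capacity: exactly these l call, the other n_inactive - l stay silent
        return a ** l * (1.0 - a) ** (n_inactive - l)
    # l == remaining: capacity is filled; probability that at least l of the
    # n_inactive inactive P-customers generate calls (event B in the text)
    return 1.0 - sum(comb(n_inactive, q) * a ** q * (1.0 - a) ** (n_inactive - q)
                     for q in range(l))

def prob_specific_d(h: int, n_cancellable: int, lam_d: float, T: float) -> float:
    """Probability that h specific active D-customers cancel during T;
    n_cancellable is |Dset| - 1 if the action j is a D-customer, else |Dset|."""
    b = p_call(lam_d, T)
    return b ** h * (1.0 - b) ** (n_cancellable - h)
```

A quick sanity check: below capacity, summing the term over all subsets of each size l recovers a total probability of 1, as a Bernoulli-per-customer model should.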
Time complexity of probability computation. Since in the implementation the bit width of each variable is subject to the computer's basic word-size restriction, we assume that one multiplication and one division each take O(1) time. In addition, we assume that computing e^x takes O(1) time. Thus, given a state s and an alternative decision j, j ∈ Dset ∪ Pset, and assuming that all values of C(|∆P|, s), 0 ≤ s ≤ k, are given, the time consumed on computing all related P_d^1 s and P_d^2 s is O(d) and the time consumed on computing all related P_p s is O(k).
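The analysis above assumes the binomial coefficients C(|∆P|, s) are available in advance; a minimal sketch of that precomputation via Pascal's rule, with an illustrative function name:

```python
def binomial_table(p: int, k: int):
    """Pascal's triangle rows 0..p, truncated at column k: C[n][j] = C(n, j).
    Runs in O(p*k) time, matching the 'values given in advance' assumption."""
    C = [[0] * (k + 1) for _ in range(p + 1)]
    for n in range(p + 1):
        C[n][0] = 1
        for j in range(1, min(n, k) + 1):
            # Pascal's rule: C(n, j) = C(n-1, j-1) + C(n-1, j)
            C[n][j] = C[n - 1][j - 1] + C[n - 1][j]
    return C
```

After this one-off table build, every coefficient needed by the probability formulas is a constant-time array access.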

Solving via Dynamic Programming
In this paper, the cost of a state has been defined as the expected remaining length of the service tour given that all subsequent actions are optimal. The goal is to find the costs and optimal actions at all states, which will define the optimal dynamic strategy.

Algorithm
We divide all states into five categories and then solve the problem via dynamic programming in six steps. The overall framework is stated in Algorithm 1 (A1). The detailed operations for the last four steps are described in Algorithm 2 (A2).
Step 3: For s with c = k and |Dset| ≠ 0, compute C_s and s_a.
Step 4: For s with c < k, location ≠ o and |Dset ∪ Pset| ≠ 0, compute C_s and s_a.
Step 5: For the initial state (0, D, o, ∅), compute its cost and action. The cost is the optimal objective value.
From A1, we can see that in Step 1 states' costs and actions can be obtained directly; these act as the initial values for the dynamic programming process. The probability of a service being cancelled must be taken into account from Step 3 onward, and the probability of a new service request from Step 4 onward. In Step 5, only the initial state (0, D, o, ∅) needs to be computed. From A2, we can see that a state's cost cannot be obtained directly but depends on the costs of other, already computed states, which are stored in memory.
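The look-up-then-compute pattern described here can be sketched as a memoized recursion; `successors_with_probs` and `transition_length` are hypothetical helpers standing in for the enumeration of h and l and for the travel-plus-service time T_ij:

```python
# state -> (cost, action); stands in for the 3-D storage matrices of Section 3
memory = {}

def cost(state, successors_with_probs, transition_length):
    """Expected remaining tour length at `state`, with optimal subsequent actions.
    A hedged sketch of the pattern Algorithm 2 relies on, not the paper's code."""
    if state in memory:                       # the O(1) look-up step
        return memory[state][0]
    best_cost, best_action = float("inf"), None
    for action, outcomes in successors_with_probs(state):
        # expected remaining length if `action` is taken now:
        # travel/service length to each outcome plus that outcome's own cost
        expected = sum(prob * (transition_length(state, action, nxt)
                               + cost(nxt, successors_with_probs, transition_length))
                       for prob, nxt in outcomes)
        if expected < best_cost:
            best_cost, best_action = expected, action
    memory[state] = (best_cost, best_action)
    return best_cost
```

States whose costs are known directly (Step 1 of A1) are simply seeded into `memory` before the recursion starts.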

Algorithm 2
Operations for a given state in Steps 2-5.
For a given state s = (c, Dset, i, Pset) and each alternative action j ∈ Dset ∪ Pset, iterate over all possible numbers h of cancellations and l of accepted new requests, and over the corresponding subsets ∆D^j_h (or ∆D_h) and ∆P_l. For each combination, look up from memory the cost of the resulting state, e.g., s_2 = (c + l, Dset \ ∆D_h, j, Pset ∪ ∆P_l \ {j}), weight it by the corresponding transition probability, and accumulate the expectation. Finally, obtain C_s as the minimum expected value over all actions and s_a as the j at which C_s is attained.

Complexity Analysis
The values of C(|∆P|, s), 0 ≤ s ≤ k, can be computed in O(pk), so the time complexity of this preparation is O(d + pk). Throughout, we suppose that looking up an already obtained state's cost in memory takes O(1) time, which will be proved in Section 3. In Step 1, states' costs and actions are obtained directly, so the time consumed there is dominated by the later steps. In Step 2, given a state, the time consumed is O(|Pset|); since |Pset| can vary from 1 to k, and the number of states with |Pset| = k is d·C(p, k), the total time consumed in Step 2 follows by summing over all such states. In Step 3, given a state (c, Dset, i, Pset) with c = k and |Dset| ≠ 0, under a specific action j ∈ Dset ∪ Pset, the time consumed on computing probabilities is O(|Dset|) and there are in total O(2^{|Dset|}) look-up operations; hence, the probability computation dominates the time consumed for one state's cost and action. Since in this step there are in total no more than (d + p)(p + 1)^k 2^d states, the time complexity of Step 3 follows. In Step 4, given a state (c, Dset, i, Pset) with |Dset| + |Pset| ≠ 0 and c < k, an analysis similar to that of Step 3 gives the time consumed on computing its cost and action; since in Step 4 there are in total no more than k(d + p)(p + 1)^k 2^d states, the time complexity of Step 4 follows. In Step 5, only the initial state needs to be computed, and since the process is similar to that for one given state in Step 4, its time is completely dominated by the time complexity of Step 4.
To sum up, including the time consumed for computing the transition probabilities and the O(1) time for looking up already computed states' costs, the total time complexity of solving this problem by dynamic programming is O(k(d + k)(d + p)(p + 1)^{2k} 4^d).

State's Key and Its Computation
Clearly, looking up an already obtained state's cost in memory involves two steps: one is to design the state's key, and the other is to look up the record in memory by using the key. Next, we first define the state's key and describe how to obtain it. Second, based on the key, we give a specific method for storing all states' costs and actions, and describe how to look up a specific state's cost and action in memory. Finally, the time complexity of all these operations is analysed.
We index the D-customers from 1 to d and the P-customers from d + 1 to d + p. In order to make the description clearer, in this section we give a specific example and exemplify our presentation through it. For a given state (c, Dset, location, Pset), we define a triple i_1|i_2|i_3, in which i_1 = c, i_2 = Σ_{j∈Dset} 2^{j−1} (noting that Σ_∅ = 0), and i_3 is determined by the state's location and Pset components. Clearly, for any two states whose Dsets are complementary, the sum of their i_2 values is 2^d − 1. In order to save computing time, in our implementation we compute the array of powers 2^j, j = 0, 1, …, d − 1, once in advance in O(d) time. Then, for any given state, its i_2 can be obtained with no more than d addition operations.
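The computation of i_2 described above amounts to a bitmask over Dset; a minimal sketch, with illustrative names:

```python
def powers_of_two(d: int):
    """Precompute 2^0 .. 2^(d-1) once, in O(d), as the text suggests."""
    return [1 << j for j in range(d)]

def key_i2(dset, pow2):
    """i2 = sum over j in Dset of 2^(j-1); the empty set gives 0."""
    return sum(pow2[j - 1] for j in dset)
```

The complementary-set property mentioned in the text follows directly: two complementary Dsets partition the d bits, so their i_2 values sum to 2^d − 1.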
It is easy to see that any two different states with the same Dset and c must have different unions of location and Pset, while states with different Dsets or different c may share the same union. Thus, we first gather all possible distinct unions of location and Pset that can appear in any state and arrange them by a certain rule, so that a one-to-one correspondence between these unions and the natural numbers is formed. We then take these natural numbers as the states' i_3 values. Note that a possible union is one in which the number of P-customers is no more than the capacity k.
All possible unions of location and Pset are arranged according to the following rules: (1) unions whose locations are D-customers are indexed earlier than unions whose locations are P-customers; (2) unions with a smaller size of Pset are indexed earlier; (3) unions with a smaller location index are indexed earlier; and (4) unions with the same location and the same size of Pset are ordered lexicographically by their Psets. Under these rules, there are in total Σ_{j=0}^{k} d·C(p, j) + Σ_{j=1}^{k} j·C(p, j) different indices for i_3, varying from 0 to Σ_{j=0}^{k} d·C(p, j) + Σ_{j=1}^{k} j·C(p, j) − 1.
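The count of i_3 indices claimed above can be checked by brute-force enumeration of the unions; a small illustrative script, not from the paper:

```python
from itertools import combinations
from math import comb

def count_unions_bruteforce(d: int, p: int, k: int) -> int:
    """Enumerate all (location, Pset) unions: the location is any customer,
    and a union may involve at most k P-customers in total (the location
    counts as one of them when it is a P-customer)."""
    D = list(range(1, d + 1))
    P = list(range(d + 1, d + p + 1))
    count = 0
    for loc in D:                               # D-customer locations
        for size in range(0, k + 1):
            count += sum(1 for _ in combinations(P, size))
    for loc in P:                               # P-customer locations
        others = [x for x in P if x != loc]
        for size in range(0, k):                # |Pset| < k in this case
            count += sum(1 for _ in combinations(others, size))
    return count

def count_unions_formula(d: int, p: int, k: int) -> int:
    """Closed form from the text: sum_{j=0}^{k} d*C(p,j) + sum_{j=1}^{k} j*C(p,j)."""
    return (sum(d * comb(p, j) for j in range(k + 1))
            + sum(j * comb(p, j) for j in range(1, k + 1)))
```

The two counts agree because choosing a P-customer location plus a size-l Pset is the same as choosing l + 1 P-customers and then one of them as the location: p·C(p−1, l) = (l+1)·C(p, l+1).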
Example: Indices of i_2 and i_3 are shown in Tables 1 and 2, respectively. According to the arranging rules, the number of unions with a D-customer location and a specific size l (l ≤ k) of Pset is d·C(p, l), and the number of unions with a P-customer location and a specific size l (l < k) of Pset is (l + 1)·C(p, l + 1). Therefore, given a union of location and Pset as (i, {j_1, j_2, …, j_l}), i_3's lower bound, denoted by i_{3,L}, can be quickly obtained by Formula (14). Obviously, if l = 0, the lower bound is exactly the value of i_3. Then, based on the fourth arranging rule, we get the exact i_3 value through an iterative method, which assumes that all customers in Pset are arranged in increasing order of their indices. As a preliminary, we first give an algorithm to compute the rank of a specific combination given that all C(p, l) combinations are lexicographically ordered. This algorithm, named kSubsetLexRank and shown as Algorithm 3 (A3), was proposed by Kreher and Stinson [41].
Algorithm 3 (A3): kSubsetLexRank, after Kreher and Stinson [41].
Input: a combination {j_1 < j_2 < … < j_l} ⊆ {1, …, p}; define j_0 = 0.
1. r ← 0
2. For i from 1 to l:
3.   For g from j_{i−1} + 1 to j_i − 1:
4.     r ← r + C(p − g, l − i)
5.   End for
6. End for
7. Output r

Then, by using the kSubsetLexRank algorithm, the method to find the exact i_3 is given as Algorithm 4 (A4). In order to save computing time, similarly to the computation of i_2, during the implementation we also compute the values of C(p − g, j), j = 0, 1, …, min{p − g, k}, 0 ≤ g < p, once in advance in O(pk) time. Next, based on A2, we show that the time consumed on computing states' key values is dominated by the complexity O((d + k)(p + 1)^k 2^d) obtained in Formula (13). In A2, taking one state s as input, the computation of key values has two aspects. The first is the key of the given state itself, so that, by using the key, the state's cost and action obtained in Step 33 can be stored in memory. The second is the keys of all possible states that can result from the given state, so that, by using these keys, their already known costs can be looked up from memory in Steps 7 and 23. Since the first computation can be done in O(d + p) after the given state's cost and action have been obtained, its time is obviously dominated by O((d + k)(p + 1)^k 2^d). For the second computation, since the lower bound of a state's i_3 depends only on the location element and the size of Pset, in A2, given a state (c, Dset, i, Pset), for each alternative decision j, j ∈ Dset (or j ∈ Pset), we can compute all possible subsequent states' lower bounds i_{3,L} in advance in Step 2 (or Step 18). Then, since in our implementation the ∆D^j_h s in Step 5 (or the ∆D_h s in Step 21) and the ∆P_l s in Step 6 (or Step 22) are generated lexicographically, in Step 7 (or Step 23) a state's i_2 and i_3 can be obtained directly from its immediate predecessor with a constant number of operations.
Thus, the complexity O((d + k)(p + 1)^k 2^d) clearly also dominates the time consumed for the second computation.
Until now, for the defined state with four components in terms of two numbers and two sets, we have successfully designed a key consisting of only three numbers. Moreover, the computation of the key does not increase the overall complexity. Thus, we can say that the key can be obtained in O(1) time. If we can directly use the key to look up (store) a state's record (cost and action) in memory without any additional computation, then an O(1) look-up method for a state's record is obtained. Obviously, we can achieve this by using a three-dimensional matrix to store states' records, where the key i_1|i_2|i_3 is taken as the indices in the three dimensions. However, such a storage method results in a huge waste of memory, as exemplified below. Hence, in order to obtain an O(1) look-up (storage) method without any waste of memory, we need to propose a specific storage method.
Example: We use a three-dimensional matrix to store states' records and obtain the storage result shown in Tables A1 and A2 in Appendix A, where a state's record is represented by the state itself in the form (c, Dset, location, Pset), gray cells are idle memory, and cells between the two red lines are used for states with P-customer locations. From Tables A1 and A2, we can see that this direct storage method wastes 56.88% of the allocated memory.

Structural Properties and Storage Method
In this section, we first exploit the structural properties of the state space. Then, based on these properties, we propose a special storage method such that the O(1) look-up method still holds while no memory is wasted at all. The example from the last section is still used to exemplify our presentation.

Structural Properties
Observation 1. For each state, there is exactly one triple i_1|i_2|i_3 that can be obtained from this state.
Observation 1 follows directly from the definition of the triple i_1|i_2|i_3; it is also the reason why we take i_1|i_2|i_3 as the state's key.

Observation 2. Not every triple i_1|i_2|i_3 is the key of a state; for some triples, there is no corresponding state.
It is also easy to see Observation 2. In our example, we can see that although i 1 can vary from 0 to 2, i 2 can vary from 0 to 3, and i 3 can vary from 0 to 22, there are no states corresponding to triples in the form of i 1 = 0 |i 2 |i 3 ≥ 2.
For one triple i 1 |i 2 |i 3 , we say that it is valid if there is one state corresponding to it; otherwise, we say that it is invalid. In addition, we define a triple as D-triple if its i 3 is obtained from a union where the location is a D-customer, while we define a triple as P-triple if its i 3 is obtained from a union where the location is a P-customer .  Observations 3-5 can be easily seen from the definition of i 1 |i 2 |i 3 . We can also further verify them through the example as shown in Tables A1 and A2 (Appendix A).

Lemma 1. The total number of valid D-triples with i_1 = i and i_1 = k − i, 1 ≤ i ≤ k − 1, is no more than the number of valid D-triples with i_1 = k.

Proof of Lemma 1. The number of all D-triples (valid and invalid) with i_1 = k, as well as the total number of all D-triples with i_1 = i and i_1 = k − i, can be counted analogously to the P-triple counts in the proof of Lemma 2 below. Combined with Observation 3, this lemma holds.

Lemma 2. The total number of valid P-triples with i_1 = i and i_1 = k − i, 1 ≤ i ≤ k − 1, is no more than the number of valid P-triples with i_1 = k.
Proof of Lemma 2. The number of all P-triples (valid and invalid) with i_1 = k is ∑_{j=1}^{k} j·(p choose j). The total number of all P-triples with i_1 = i and i_1 = k − i is ∑_{j=1}^{i} j·(p choose j) + ∑_{j=1}^{k−i} j·(p choose j). Combined with Observation 4, this lemma holds.
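The counting formula in the proof of Lemma 2 can be checked numerically. The sketch below evaluates ∑_{j=1}^{k} j·(p choose j), the number of all P-triples with i_1 = k given in the proof, and verifies the claimed inequality for one parameter choice; the concrete values of p and k are illustrative only, and k ≤ p is assumed.

```python
from math import comb

def num_p_triples(p, k):
    """Number of all P-triples (valid and invalid) with i1 = k,
    following the formula sum_{j=1}^{k} j * C(p, j) from the proof."""
    return sum(j * comb(p, j) for j in range(1, k + 1))

# Illustrative check of the inequality behind Lemma 2: for 1 <= i <= k-1,
# the triples counted for i1 = i plus those for i1 = k - i never exceed
# those counted for i1 = k. Values of p and k are hypothetical.
p, k = 6, 4
for i in range(1, k):
    assert num_p_triples(p, i) + num_p_triples(p, k - i) <= num_p_triples(p, k)

print(num_p_triples(6, 4))  # 1*6 + 2*15 + 3*20 + 4*15 = 156
```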

Storage Method
We store valid D-triples and valid P-triples separately, each within its own three-dimensional matrix. First, we describe how to store D-triples, and then how to store P-triples. Note that "storing a valid triple" always means "storing the record (cost and action) of the state corresponding to this triple"; in the remainder of this paper, we do not distinguish between the two.
To store valid D-triples, we define three numbers D_{1,d}, D_{2,d} and D_{3,d}, computed from the instance parameters, and show that all valid D-triples can be stored in a three-dimensional matrix of size D_{1,d} × D_{2,d} × D_{3,d}. Given one valid D-triple i_1|i_2|i_3, its cell indices (I_1^d, I_2^d, I_3^d) are assigned by a set of index-assignment rules, with separate cases for k being even and k being odd; under each first-dimension index I_1^d, the D-triples with i_1 = I_1^d and with i_1 = k − I_1^d are stored. Lemma 1 and Observation 1 guarantee that no two triples are mapped to the same cell. Thus, all valid D-triples can indeed be stored in a three-dimensional matrix of size D_{1,d} × D_{2,d} × D_{3,d}.

To store valid P-triples, we analogously define three numbers D_{1,p}, D_{2,p} and D_{3,p}, and show that all valid P-triples can be stored in a three-dimensional matrix of size D_{1,p} × D_{2,p} × D_{3,p}. Given one valid P-triple i_1|i_2|i_3, its cell indices (I_1^p, I_2^p, I_3^p) are assigned by analogous rules, again with separate cases for k being even and k being odd; under each first-dimension index I_1^p, the P-triples with i_1 = I_1^p and with i_1 = k − I_1^p are stored. Observation 4, Observation 5 and Lemma 2 guarantee that no two triples are mapped to the same cell. Thus, all valid P-triples can be stored in a three-dimensional matrix of size D_{1,p} × D_{2,p} × D_{3,p}.

Example: D_{1,p} = 2, D_{2,p} = 4 and D_{3,p} = 9, and k is even. The storage result of all valid P-triples in the 2 × 4 × 9 matrix is shown in Table A4 (Appendix A), where cells stored by operation o have a white background and cells stored by operation p have a green background. The gray cells are allocated but wasted memory space.
This storage method already yields an O(1) looking-up method that saves a large amount of memory space. However, some space is still wasted (the gray cells in Tables A3 and A4 in Appendix A within our example). Thus, in order to reach the situation where no memory space is wasted, we perform the following operations.
Operations: We let the two matrices D_{1,d} × D_{2,d} × D_{3,d} and D_{1,p} × D_{2,p} × D_{3,p} have dynamic sizes D_{3,d} and D_{3,p} in their third dimensions, where D_{3,d} (D_{3,p}) is set to be a function of the index I_1^d (I_1^p) in the first dimension. From Lemmas 1 and 2, we can see that, in the updated matrices, there are no overlapped cells and, furthermore, all idle cells existing in the original matrices have been removed.
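A dynamic third dimension of this kind can be realized as a jagged array, where each row's length in the third dimension is a function of the first index. The size function below is purely illustrative (the paper's actual formulas for D_{3,d} and D_{3,p} are determined by the index-assignment rules); the point is only that O(1) indexed access survives while no cell is left idle.

```python
# Jagged 3-D storage: the third dimension's length depends on the first
# index, so only cells that will actually hold a record are allocated.
D1, D2 = 2, 4                       # hypothetical first/second dimensions

def third_dim(i1):
    # Hypothetical stand-in for the paper's size function of I_1;
    # any function computable in O(1) works here.
    return 3 + 2 * i1

table = [[[None] * third_dim(i1) for _ in range(D2)] for i1 in range(D1)]

def store(i1, i2, i3, record):
    table[i1][i2][i3] = record      # still O(1): plain indexed access

def lookup(i1, i2, i3):
    return table[i1][i2][i3]

store(1, 3, 4, ("cost", "action"))  # placeholder (cost, action) record
assert lookup(1, 3, 4) == ("cost", "action")

# The allocated cell count now equals the number of addressable keys,
# so no cell can remain idle by construction.
allocated = sum(D2 * third_dim(i1) for i1 in range(D1))
print(allocated)  # 4*3 + 4*5 = 32
```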
Since D_{1,d}, D_{1,p}, D_{2,d}, D_{2,p}, D_{3,d} and D_{3,p} can all be computed once in advance, by the rules a) through p), for any valid triple i_1|i_2|i_3 we can store it in, or look it up from, memory in O(1) time, with no memory space wasted. In addition, we have already shown that, for any state, its key i_1|i_2|i_3 can be obtained in O(1) time. Therefore, we conclude that an O(1) looking-up method for an already-examined state's record, without any wasted memory space, has been obtained.

Experimentation
The computational experiments are carried out on a computer with an Intel(R) Core(TM) i7-7700 CPU at 3.60 GHz and 16.0 GB RAM in the computing laboratory of Hillman Library at the University of Pittsburgh (Pittsburgh, PA, USA). For each instance, the positions of the depot and all customers are randomly generated in a unit square, so that the distance between any two points is at most √2. The service times for customers are randomly generated from a uniform distribution on [0, 1].
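The instance generation just described can be sketched as follows; the seed and instance size are arbitrary choices for reproducibility of the sketch, not values from the experiments.

```python
import random
from math import dist, sqrt

random.seed(0)  # seed chosen only so this sketch is reproducible

def generate_instance(n):
    """Randomly place a depot and n customers in the unit square and draw
    service times uniformly from [0, 1], as in the experimental setup."""
    points = [(random.random(), random.random()) for _ in range(n + 1)]
    service_times = [random.uniform(0.0, 1.0) for _ in range(n)]
    return points, service_times

points, service_times = generate_instance(8)

# In a unit square, no two points can be farther apart than sqrt(2).
assert all(dist(a, b) <= sqrt(2) for a in points for b in points)
assert all(0.0 <= s <= 1.0 for s in service_times)
```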
As shown in Table 3, the values of d + k are set to be no more than 10, the values of λ_d and λ_p are each set to 0.02 and 0.01, the value of k ranges from 1 to 4, the value of p is set to 30 and 40 respectively, and the total time consumed is no more than 10 min. The column "Memory (MB)" shows the memory consumed for storing the costs and actions of states in the DP implementation.

Heuristics
Since solving the formulated MDP for an exact dynamic strategy requires exponential time, finding the strategy through Dynamic Programming is clearly not applicable for larger instances. Thus, in this section, we propose two heuristics that find an approximate dynamic strategy for larger instances. We evaluate these heuristics by comparing their computational results with those obtained by the Dynamic Programming strategy on instances that remain computable.
Heuristic 1: Before the vehicle sets off, find the optimal TSP sequence of all D-customers, i.e., the sequence that yields the minimum total travel distance if no cancellation or unscheduled request occurs. After visiting a customer, if some scheduled services have been cancelled or new P-customers' calls have been accepted, re-compute the optimal TSP sequence over all currently active customers.
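Heuristic 1 can be sketched as below. This is a toy illustration with brute-force TSP enumeration (viable only for small customer sets), assumed planar Euclidean distances, and an assumed return to the depot at the end of the tour; the function names and coordinates are hypothetical.

```python
from itertools import permutations
from math import dist

def tour_length(start, order, depot):
    """Travel distance when starting at `start`, visiting `order` in
    sequence and finishing at `depot` (finishing at the depot is an
    assumption of this sketch)."""
    total, prev = 0.0, start
    for p in order:
        total += dist(prev, p)
        prev = p
    return total + dist(prev, depot)

def best_order(start, customers, depot):
    """Exact TSP by enumeration; viable only for small customer sets."""
    return min(permutations(customers),
               key=lambda o: tour_length(start, o, depot))

# Heuristic 1 in outline: whenever cancellations or newly accepted
# P-customers change the active set, re-solve the TSP from the vehicle's
# current position over all currently active customers.
depot = (0.0, 0.0)
active = [(0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]
order = best_order(depot, active, depot)
assert set(order) == set(active)
```

The repeated call to `best_order` after every change is what makes this heuristic accurate but expensive; a polynomial TSP heuristic could replace the enumeration for larger instances.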
For each combination of d, p, k, λ_p and λ_d, we randomly generate 10 instances. For each instance, we randomly simulate 2000 realizations of all customers' call-occurring situations. For each realization, we apply both the heuristic method and the obtained exact strategy, yielding two total travel distances: Dis_Heu1 from Heuristic 1 and Dis_MDP from the MDP strategy. Then, for each instance, we compute its two average total travel distances over the 2000 realizations, denoted Dis_Heu1 and Dis_MDP. We take (Dis_Heu1 − Dis_MDP)/Dis_MDP as the performance of Heuristic 1 on this instance. Finally, for each combination of d, p, k, λ_d and λ_p, we compute the average value and the worst (maximal) value of (Dis_Heu1 − Dis_MDP)/Dis_MDP over its 10 instances to evaluate the heuristic's performance on this combination. The concrete results for various combinations are shown in Table 4.
From Table 4, we can see that, over all these combinations, the worst average performance of Heuristic 1 is 0.0783. Moreover, there is no apparent tendency for the average performance to deteriorate as the problem's complexity increases. For the worst performance, however, there is a noticeable tendency to deteriorate as the problem's complexity increases.

Heuristic 2: Before the vehicle sets off, compute the optimal TSP sequence of all D-customers and keep this sequence unchanged throughout. After visiting a customer, if some active D-customers' calls for service cancellation have been received or some P-customers' calls for service have been accepted, first delete these D-customers directly from the remaining sequence, and then optimally insert the newly accepted P-customers into the currently remaining sequence in the order of their appearance.
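The update step of Heuristic 2 can be sketched as below. This is a hedged illustration: cheapest-position insertion, a route assumed to run from the vehicle's current position through the sequence to the depot, and planar Euclidean distances are all assumptions of the sketch, and the coordinates are hypothetical.

```python
from math import dist

def remove_cancelled(sequence, cancelled):
    """Drop cancelled D-customers from the remaining sequence."""
    cancelled = set(cancelled)
    return [c for c in sequence if c not in cancelled]

def insert_cheapest(sequence, newcomer, current_pos, depot):
    """Place `newcomer` at the position that yields the shortest route.
    The route is assumed to run current_pos -> sequence -> depot."""
    best_seq, best_len = None, float("inf")
    for i in range(len(sequence) + 1):
        cand = sequence[:i] + [newcomer] + sequence[i:]
        length, prev = 0.0, current_pos
        for p in cand + [depot]:
            length += dist(prev, p)
            prev = p
        if length < best_len:
            best_seq, best_len = cand, length
    return best_seq

# Heuristic 2 in outline: keep the initial sequence, delete cancellations,
# then insert accepted P-customers one by one in their arrival order.
def heuristic2_update(sequence, cancelled, new_p, current_pos, depot):
    seq = remove_cancelled(sequence, cancelled)
    for customer in new_p:           # insertion follows arrival order
        seq = insert_cheapest(seq, customer, current_pos, depot)
    return seq

seq = heuristic2_update([(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
                        cancelled=[(1.0, 1.0)], new_p=[(0.5, 0.5)],
                        current_pos=(0.0, 0.0), depot=(0.0, 0.0))
assert (1.0, 1.0) not in seq and (0.5, 0.5) in seq
```

Because the base sequence is never re-optimized, each update costs only one pass over the insertion positions per new customer, which is what makes Heuristic 2 cheaper than Heuristic 1.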
As for Heuristic 1, we obtain the performance of Heuristic 2 on various combinations, shown in Table 5. From Table 5, we can see that the worst average performance of Heuristic 2 is 0.0845. As with Heuristic 1, there is no apparent tendency for the average performance to deteriorate with the problem's complexity, whereas the worst performance does show such a tendency.
Comparing the two heuristics, we conclude that Heuristic 1 performs slightly better than Heuristic 2 in both average and worst performance. However, since Heuristic 1 may need to compute an optimal TSP sequence many times, it consumes much more time than Heuristic 2, especially when the number of customers is large. Thus, Heuristic 1 is recommended for instances with a moderate number of customers, while Heuristic 2 is recommended for instances with a very large number of customers.

Conclusions and Further Discussion
This is the first work to focus on an exact dynamic strategy for home pick-up services that considers a capacitated vehicle, stochastic cancellations of pre-scheduled services, and stochastic requests for unscheduled services within one tour. Aiming to minimize the vehicle's total expected travel distance, we formulate the problem as a multi-dimensional MDP, where the defined state consists of four components in terms of two numbers and two sets. To facilitate operations on states, we have designed a key consisting of only three numbers for each state. When solving via Dynamic Programming, in order to prevent the complexity from continually increasing, we propose, based on our designed key, an O(1)-time looking-up method for a historic state's record. Although in general such a method results in a huge waste of memory, by exploiting the structural properties of the state space, we obtain an O(1) looking-up method without any waste of memory. Finally, for larger instances, which are challenging for DP, two well-performing heuristic methods are proposed.
To the best of our knowledge, this is the first study of a vehicle routing problem with exactly the aforementioned two types of uncertainty in customer requests. As we are not aware of any similar study in the existing literature, we do not compare the performance of our proposed model and exact DP solution method with others. Nevertheless, numerical studies of our exact DP and the two heuristic algorithms show that DP derives exact solutions with clearly better performance, while the heuristics scale better to large instances.
In addition, although a dynamic strategy within one tour is studied, the approach by which the strategy is obtained does not depend on a specific tour and hence can be reused tour after tour; in this sense, the strategy can also be seen as a sustainable strategy. Moreover, since the exploited structural properties yield an O(1) looking-up method without any waste of memory, the strategy for moderate instances can be computed efficiently on general computational devices such as a personal mobile phone or a laptop, which makes it more convenient to use in real life.

Future Research
Although this paper considers the objective of minimizing the vehicle's total travel distance, home pick-up service providers are in fact confronted with multiple, often conflicting, objectives. Thus, future research could consider customers' preferences, or a weighted objective combining the provider's interest and customers' preferences. Furthermore, more practical characteristics of vehicles and customers could be considered. For example, a customer may request that his service be completed within a time window; some customers may have parcels whose size is uncertain until the vehicle arrives at the customer's location; or the vehicle's travel may be interrupted by unforeseen situations, making the travel time uncertain. Clearly, these situations are more involved and demand more advanced modeling tools and computational strategies to provide effective decision support. Hence, new structural properties and a new storage method would need to be exploited, which is also a direction for future research.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: