Online Optimization of Pickup and Delivery Problem Considering Feasibility

: A pickup and delivery problem by multiple agents has many applications, such as food delivery service and disaster rescue. In this problem, there are cases where fuels must be considered (e.g., the case of using drones as agents). In addition, there are cases where demand forecasting should be considered (e.g., the case where a large number of orders are carried by a small number of agents). In this paper, we consider an online pickup and delivery problem considering fuel and demand forecasting. First, the pickup and delivery problem with fuel constraints is formulated. The information on demand forecasting is included in the cost function. Based on the orders, the agents’ paths (e.g., the paths from stores to customers) are calculated. We suppose that the target area is given by an undirected graph. Using a given graph, several constraints such as the moves and fuels of the agents are introduced. This problem is reduced to a mixed integer linear programming (MILP) problem. Next, in online optimization, the MILP problem is solved depending on the acceptance of orders. Owing to new orders, the calculated future paths may be changed. Finally, by using a numerical example, we present the effectiveness of the proposed method.


Introduction
There has been much attention paid to control technologies for realizing a smart city [1].In a smart city, many services using control technologies should be developed, such as transportation, energy distribution, healthcare, environmental monitoring, business, commerce, emergency response, and social activities.Moreover, in control technologies for a smart city, a cyber-physical system (CPS) plays an important role.A CPS is a system where physical and information components are deeply connected through a communication network.In a smart city, multiple physical components such as mobile robots are controlled by information systems, such as cloud servers.Hence, a plant in a smart city is modeled as a CPS.Thus, in a smart city and a CPS, it is important to develop a control method for multiple agents.In this paper, we focus on a control method for multiple agents.
Control of multiple agents has several applications and has been widely studied.In the control of multiple agents, there are several problem settings, such as the achievement of cooperative tasks and surveillance (patrol).A control specification in cooperative tasks may be given by linear temporal logic formulas [2][3][4][5][6][7][8][9][10][11][12][13][14][15][16].Temporal logic is a logical system of rules and symbolism for representing, and reasoning about, propositions qualified in terms of time (for example, "I will eventually be hungry").The surveillance problem is to find agents' paths patrolling a given area as evenly as possible [8,[17][18][19][20][21][22][23][24][25][26][27][28].In problem settings on the control of multiple agents, model predictive control (MPC) is frequently used (see, e.g., [9,22,27]).MPC is a control method in which the control input is generated by solving at each time the finite-time optimal control problem (see, e.g., [29,30]).In other words, the control input is calculated based on the prediction using a mathematical model.MPC was frequently used in the control of chemical plants.In recent years, with the development of computers, MPC has been used in the control of several systems, such as automobiles.In the case where a target area is unknown and is changed in time, a modelfree approach such as machine learning is effective (see, e.g., [31,32]).In this paper, we suppose that a target area is fixed and use a model-based approach.
In this paper, we focus on the pickup and delivery problem as one of the control problems of multiple agents.In recent years, the food delivery market has been expanding through the coronavirus epidemic.Efficient delivery from the restaurant to the customer is becoming more and more important.Such a problem of efficiently delivering goods received at a certain location to the desired location is called a pickup and delivery problem (see, e.g., [33][34][35][36][37][38][39][40][41]).Especially, the problem of updating routes by solving optimization problems at regular intervals is called an online pickup and delivery problem, which has been studied extensively in recent years (see, e.g., [42][43][44]).
In pickup and delivery problems, it is important to use electric vehicles and drones as agents.Electric vehicles and drones have the disadvantage of severe battery constraints.Hence, fuel constraints must be considered in pickup and delivery problems.In [45,46], a modeling method to handle the fueling of vehicles has been proposed.A fuel constraint in the proposed method is basically the same as in [45,46].Refueling is different.In this paper, the fuel of a vehicle immediately becomes full by changing the battery.Furthermore, when electric vehicles and drones are used, demand forecasting is important (see, e.g., [47,48]).This is because more efficient moves are needed due to battery constraints.For example, many agents can be placed in advance at points where demand is concentrated.To the best of our knowledge, few results have been obtained.
In this paper, we propose an online pickup and delivery problem in which demand forecasting and fuel constraints are considered.First, we formulate the pickup and delivery problem.In this paper, the target area is given by an undirected graph.Constraints including fuel constraints are explained.Next, after the demand forecast is explained, the cost function is introduced.In a certain term in the cost function, we use the result of demand forecasting.Then, agents' paths can be calculated according to the result of demand forecasting.That is, based on the policy of MPC, the real order and demand forecast are used.Even if the result of demand forecasting is different from real orders, the problem can be solved.This is because the result of demand forecasting is used in the cost function, not a constraint.Hence, we can guarantee the feasibility.Third, we explain the procedure of online optimization.Finally, we present a numerical example.Through a numerical example, we demonstrate that the feasibility is guaranteed by the proposed method.
The conference paper [49] is a preliminary version of this paper.In [49], the authors have considered both demand forecasting and fuel constraints.However, depending on the result of the demand forecasting, the optimization problem becomes infeasible.In this paper, this technical issue is overcome.
This paper is organized as follows.In Section 2, we formulate the pickup and delivery problem studied in this paper.After a mathematical model of the target area and the definition of the orders are explained, the constraints considered in this paper are explained.Demand forecasting in this paper is also explained.Finally, the cost function is explained.In Section 3, we explain the procedure of online optimization of the pickup and delivery problem.In Section 4, we present a numerical example to show the effectiveness of the proposed method.In Section 5, we conclude this paper.
The main contributions of this paper are highlighted as follows: (i) For pickup and delivery problems, fuel constraints are introduced to utilize electric vehicles and drones as agents.(ii) A pickup and delivery problem considering demand forecasting is newly formulated.(iii) The effectiveness of the proposed method is clarified through a numerical example.

Pickup and Delivery Problem
In this section, we formulate the pickup and delivery problem studied in this paper.First, we describe the preliminaries of the problem, which are the target area and the order notation.Next, we describe the constraints in the pickup and delivery problem.Finally, we describe the cost function in the pickup and delivery problem.

Preliminaries
We introduce some notations.First, the target area in the pickup and delivery problem is given by the following undirected graph: where there are the following: Without the loss of generality, the set V can be represented by Let A = [a ij ] ∈ {0, 1} (N+M+J)×(N+M+J) denote the adjacency matrix of G, where if there is the edge between v i and v j , then a ij = 1; otherwise, a ij = 0.An example of an undirected graph G = (V, E ) is shown in Figure 1.We suppose that according to a given graph, an agent moves from some vertex to other one in unit time.Next, notations about orders are introduced.We assume that new orders are received sequentially.Let τ i , i = 0, 1, 2, . . .denote the time that the delivery route may be changed.The initial update time τ 0 is set as τ 0 = 0. We assume that at the update time τ i , there is at least one new order.Let O i denote the set of orders that are newly added at time t = τ i .The element o k of O i is given by the following form: where there are the following: • s k ∈ S: the store that received the order; • c k ∈ C: the customer who placed the order; • T s o k : the time that the goods are available for pickup; • T c o k : the time limit to complete the delivery.

Constraints
We formulate the following constraints in the pickup and delivery problem at the update time τ i .

•
Constraints on Agent Location: an agent can be located at only one vertex.

•
Constraints on Agent Movement: an agent can move according to a given graph.

•
Constraints on Orders: an order is assigned to an agent.

•
Constraints on the Set of Orders: an agent must receive the goods before delivery and must deliver those until a certain time.

•
Constraints on Agent Fuel: the fuel remaining for an agent must be equal to or greater than a certain value.

•
Constraints on Luggage: the number of goods that an agent can handle once is constrained.
We explain the details of these constraints.

Constraints on Agent Location
First, let N denote the finite set of agents.We introduce a binary variable vector p l (t) ∈ {0, 1} |V | for the agent l ∈ N , where t ∈ {0, 1, 2, . . ., } is discrete time.The j-th element p l,j (t) of p l (t) is 1 if the agent l is located at the vertex v j ; otherwise, it is 0. The constraint on the location of the agent l is given by which implies that at time t, the agent l is located at only one vertex.

Constraints on Agent Movement
The movement of the agent l is constrained by the following inequality: where the inequality sign (≤) holds element-wise.Equation (4) represents that the agent l moves only between vertices for which edges exist (see, e.g., [50][51][52]).We remark that (4) becomes the constraint on the agent movement when (3) is imposed.For example, consider the undirected graph shown in Figure 1.Suppose that at time t, the agent l is located at the vertex s 1 .Then, the location candidates at time t + 1 are the vertices s 1 , r 1 , r 2 , and r 5 .Based on the notation of (1), the vertices s 1 , r 1 , r 2 , and r 5 correspond to 1, 5, 6, and 9, respectively.Because the first row of the adjacency matrix A is given by 1 0 0 0 1 1 0 0 1 0 , the above situation can be represented by the following inequality: When at time t the agent l is located at the vertex s 1 , p l,1 (t) = 1 holds.We remark that from (3), p l,j (t) = 0, j ̸ = 1 holds.Hence, under (3), either p l,1 (t + 1), p l,5 (t + 1), p l,6 (t + 1), or p l,9 (t + 1) must be 1.This implies that the location candidates at time t + 1 are the vertices s 1 , r 1 , r 2 , and r 5 .

Constraints on Orders
For the agent l and the order o k , we introduce a binary variable y l,k as follows: Similarly, if c k corresponds to the i-th element p l,i (t) of p l (t), then delivering the goods of order o k at time t can be represented by the following expression: The relationship between y l,k , z pick l,k (t), and z deli l,k (t) is represented by the following expressions: We define two sets of orders P and D to represent the status of each order.The set P is the set of orders where the agent has not received the goods at the store.The set D is the set of orders where the agent has received the goods at the store and is in the process of delivery.Here, because there is only one agent responsible for each order, the following constraints must be imposed:

Constraints on the Set of Orders
The constraint on P is that the agent receives the goods before delivery and that the delivery time of the goods is equal to or earlier than the time limit τ i + T f .This constraint is represented by the following expressions: On the other hand, the constraint on D is that the time to deliver the goods must be equal to or earlier than the time limit τ i + T f .Hence, the constraint on D is expressed by the following expression: 2.2.5.Constraints on Agent Fuel When agents are implemented by using electric vehicles and drones, we need to consider fuel constraints.
We introduce the continuous variable q l (t) ∈ R as a variable representing the fuel of the agent l at time t.Let q 0 denote the agent's maximum fuel.Suppose that the initial fuel is given by q l (0) = q 0 .The fuel q l (t) at time t is defined by q l (t) = q 0 if the agent l is located at the vertex in S, q l (t − 1) − 1 otherwise, (13) which implies that the fuel is fully charged at the store vertex and decreases by one at the vertex, except for the store vertex.
The constraint on the fuel of the agent l is given by where c, c f ≥ 0 are given constants.According to the policy of model predictive control, the terminal constraint may be different with other constraints.This difference can be realized via a method for choosing c and c f .

Constraints on Luggage
We introduce a variable X l (t) that represents the number of pieces of luggage carried by agent l at time t.Here, we assume that the weights of the packages are the same.Using z pick l,k (t) and z deli l,k (t), the variable X l (τ i ) can then be defined as where (X l (τ i )) i−1 is the value obtained via optimization at t = τ i−1 .In addition, X l (t) can be expressed as when t = τ i + 1, . . ., τ i + T f .Assume that the agent l can carry up to c w pieces of luggage, where c w is a given constant.The constraint on the number of pieces of luggage that the agent l carries at time t can be expressed as The variable X l (t) implies the number of pieces of luggage.In the case where the weights of the packages are different, the variable X l (t) is regarded as a total weight for the agent l via a simple modification.

Demand Forecast
In the conventional pickup and delivery problem, the paths of the agents are calculated at τ i , τ i+1 , . . ., max k T c o k (max k T c o k represents the slowest time limit in the set of orders).In this paper, we consider using the demand forecast.It is expected that better paths are obtained by considering not only the time interval where there are orders but also the future where there are no orders.
In this paper, we assume that the number of agents (denoted by d j (t)) required in the store s j at time t is given as the result of the demand forecast.In the next subsection, we explain how to use d j (t).

Problem Formulation
Under the above preparations, we formulate the pickup and delivery problem at the update time τ i .The cost function to be minimized is defined as follows: We describe c i,1 , c i,2 , and c i,3 in detail.At first, c i,1 is defined as follows: p l (t) ∈ {0, 1} |V | is a binary variable vector for the agent l ∈ N , where t ∈ {0, 1, 2, . . ., } is discrete time.The j-th element p l,j (t) of p l (t) is 1 if the agent l is located at the vertex v j ; otherwise, it is 0. The scalar w 1 is a constant representing the weights and is determined through trial and error.In other words, Equation ( 20) implies the sum of the time spent at the vertices, except for the store for all agents.By reducing the value of this term, the running cost of the agents can be reduced.Next, c i,2 is defined as follows: where T k is a variable that represents the time when the agent completes the delivery of order o k to the customer.Time T k can be expressed as where z deli l,k (t) is a binary variable that is 1 if agent l delivers order o k at time t and 0 if it does not.In (22), the column vectors are all zero, except for one element.Therefore, from this expression, we can find T k when the value of z deli l,k (t) is equal to 1.The scalar α k is a variable that determines whether the delivery time of the order o k to be delivered meets the delivery deadline.The α k is defined as follows: where α k = 1 implies that the time T k when the order is delivered to the customer is later than the time limit T c o k to complete the delivery.The scalar w 2 is a constant representing the weights and is determined through trial and error.
In other words, Equation ( 21) is a summation of the delays in the delivery time.By reducing the value of this equation, we can obtain agent trajectories that satisfy the delivery time constraints of the order.
Finally, c i,3 is defined as follows: where σ j (t) is a decision variable that represents the number of agents in store s j at time t.The constant d j (t) is given in advance (see Section 2.3).We assume that the demand forecast is a prediction of when, where, and how many orders will be placed.The α j (t) is a binary variable that determines whether the number of agents in store s j at time t satisfies the required number of agents in store s j obtained from the demand forecast.The α j (t) is defined as follows: where α j (t) = 1 implies that the number of agents in store s j is less than the number of agents required for store s j obtained from the demand forecast.The scalar w 3 is a constant representing the weights and is determined through trial and error.In other words, Equation ( 24) is the sum of the number of agents that are less than the number of agents required for each store.By reducing the value of this equation, the agent trajectory can be obtained with more consideration given to demand forecasting.The pickup and delivery problem at the update time τ i is given as follows. Problem subject to (3)-( 18), ( 22), ( 23), (25).
Via a simple calculation (see, e.g., [52]), the constraint conditions (3)-( 18), ( 22), (23), and (25) can be transformed into linear constraint conditions.In addition, the cost function is also transformed into a linear cost function.Hence, this problem is reduced to a mixed integer linear programming (MILP) problem.

Online Optimization
For O i , P, and D, the online optimization of the pickup and delivery problem is performed as follows.

Procedure for online optimization of the pickup and delivery problem:
Step 1: If i = 0, set P, D = ∅, and t = 0.
Step 2: Step 3: For o k ∈ P ∪ D, perform the following procedure.
Step 5: For each order, if o k ∈ D, then update y l,k , l ∈ N to (y l,k ) i−1 obtained via optimization at t = τ i−1 , that is, Step 6: For o k ∈ P, impose Equations ( 10) and ( 11) as a constraint.For o k ∈ D, impose (12) as a constraint.
Step 7: Solve the pickup and delivery problem at time t = τ i , adding the constraints in Equation (26).
Step 8: Based on the calculation results, move the agent.Update t := t + 1.
Step 9: If there are no orders at time t, return to Step 8.If there is an order, return to Step 2. The outline of this procedure is shown in Figure 2.
Step 5: Hold an agent's path that has already moved.
Step 8: Move agents based on the computa�on result.Update Step 9: Are there new orders?In this procedure, the agents' predicted paths may be changed by solving the optimization problem that is generated by receiving a new order.Because the agents' predicted paths are changed online (real time), we call this procedure an online optimization.

Numerical Example
In this section, we present a numerical example where the optimization problem is solvable even when the demand forecast deviates significantly from the actual order by considering the delivery constraints in the cost function.Here, we compare the proposed method with the existing method [49].

Preliminaries
Suppose that the target area is given by Figure 3.The number of vertices is set to |S| = 10, |C| = 10, and |R| = 16, and the graph is randomly arranged.The number of agents is set to 10, and the initial position is assumed to be one agent in each store.The maximum fuel of an agent is set to q 0 = 20.All w 1 , w 2 , and w 3 in ( 20), (21), and (24) are set to 1.In addition, we set T f = 20.We used a computer with an AMD Ryzen 5 5600X 6-Core Processor 3.70 GHz CPU and 32 GB of memory.The optimization problem at each update time was written using PuLP, which is a modeler written in Python.The solver to compute the optimization problem was IBM ILOG CPLEX 20  The order contents are given in Table 1.In the case of this order, the optimization problem is solved at times 0, 5, and 8.However, in the existing method, τ i + T f in the Equations ( 11) and ( 12) representing the delivery time constraints is T c o k .In other words, the existing method constrains the order delivery to be completed by the order delivery date, not within the time interval.In [49], the cost function J i is defined as follows, and the delivery time constraint is not considered in the cost function.
The existing methods for demand forecasting consider the following inequalities.
∑ l p l,j (t) ≥ d j (t), (j = 1, 2, . . ., N) The d j (t) used in Equations ( 24) and ( 28) are set to d 5 (8) = 4 and 0 in all other cases.This means that at time 8, store s 5 needs four agents, and there are no constraints for the other times or stores.However, the actual orders are not for store s 5 but for store s 4 , which means that the predictions of the stores that will receive the orders in the demand forecast are all wrong.

Calculation Results
Table 2 shows the results obtained for the case solved by the existing method.Table 3 shows the cost function values at each time for the existing method.Table 4 shows the results obtained for the case solved by the proposed method.Table 5 shows the value of the cost function at each time for the proposed method.Figures 4-8 shows the obtained trajectories of the agents.The trajectories of the agents obtained at times 0 and 5 for the existing methods are shown in Figures 4 and 6.The trajectories of the agents obtained at times 0, 5, and 7 for the proposed method are shown in Figures 5, 7, and 8.The solid line in Figures 4-8 shows that the agent has luggage, while the dotted line shows that the agent does not have luggage.The thickness of the arrows in the figure indicates the number of pieces of luggage the agent is carrying.From Tables 2 and 4, only the proposed method can solve the problem at time 8 for the orders given in Figure 1.This is because in the case of the existing method, Equations ( 11) and ( 12) are not satisfied for orders o 2 and o 7 .On the other hand, in the case of the proposed method, we add to the cost function the amount of time past the delivery time constraint T c o k for the orders o 2 and o 7 from Table 4 and Table 5, respectively.Thus, the result at time 8 is solvable.Therefore, even when the demand forecast is significantly off and the delivery time constraints for all the orders are not met, an optimal solution that satisfies the conditions as much as possible can be obtained.

Conclusions
In this paper, we proposed an online pickup and delivery problem in which demand forecasting and fuel constraints were considered.First, we formulated the pickup and delivery problem with fuel constraints.Next, we introduced the cost function considering demand forecasting.After that, we explained the procedure of online optimization.Finally, through a numerical example, we presented the effectiveness of the proposed method.
In future work, we will consider practical applications.The problem formulation in this paper is relatively simpler than real food delivery services.It is important to extend the proposed method to the practical setting.We will consider many real scenarios and validate the proposed method.For large-scale scenarios, the computation time to solve an MILP problem becomes large (in this paper, as the first step, an MILP problem is solved by a conventional solver).It is also important to develop a distributed solution method, such that the problem is decomposed into small subproblems, and a parallel algorithm based on swarm optimization.

Figure 2 .
Figure 2. Outline of the procedure of online optimization.

Figure 3 .
Figure 3. Target area, where an agent can move to an up/down/left/right vertex.

Figure 4 .
Figure 4. Results at time 0 with existing method.

Figure 5 .
Figure 5. Results at time 0 with proposed method.

Figure 6 .
Figure 6.Results at 5 with existing method.

Figure 7 .
Figure 7. Results at time 5 with proposed method.

Figure
Figure Results at time 8 with proposed method.
the agent l is responsible for the order o k , 0 otherwise.As variables related to the agent l's order receipt and delivery at time t, we also introduce binary variables z Assume that the vertex s k in the order o k corresponds to the vertex v i (i.e., the i-th element p l,i (t) of p l (t)).Then, receiving the goods of the order o k at time t can be represented by the following expression: .1.0,which can be called from Python.The simulation result is outputted as the computation result of the Python program.

Table 2 .
Results with the existing method.

Table 3 .
Values of the cost function at each time in the existing method.

Table 4 .
Results with the proposed method.

Table 5 .
Values of the cost function at each time in the proposed method.