A Multi-Depot Vehicle Routing Problem with Stochastic Road Capacity and Reduced Two-Stage Stochastic Integer Linear Programming Models for Rollout Algorithm

: A matheuristic approach based on a reduced two-stage Stochastic Integer Linear Programming (SILP) model is presented. The proposed approach is suitable for obtaining a policy constructed dynamically on the go during the rollout algorithm. The rollout algorithm is part of the Approximate Dynamic Programming (ADP) lookahead solution approach for a Markov Decision Processes (MDP) framed Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity (MDDVRPSRC). First, a Deterministic Multi-Depot VRP with Road Capacity (D-MDVRPRC) is presented. Then an extension, MDVRPSRC-2S, is presented as an ofﬂine two-stage SILP model of the MDDVRPSRC. These models are validated using small simulated instances with CPLEX. Next, two reduced versions of the MDVRPSRC-2S model (MDVRPSRC-2S1 and MDVRPSRC-2S2) are derived. They have a speciﬁc task in routing: replenishment and delivering supplies. These reduced models are to be utilised interchangeably depending on the capacity of the vehicle, and repeatedly during the execution of rollout in reinforcement learning. As a result, it is shown that a base policy consisting of an exact optimal decision at each decision epoch can be obtained constructively through these reduced two-stage stochastic integer linear programming models. The results obtained from the resulting rollout policy with CPLEX execution during rollout are also presented to validate the reduced model and the matheuristic algorithm. This approach is proposed as a simple implementation when performing rollout for the lookahead approach in ADP.


Introduction
Realistic modelling goes a long way in supporting effective relief operation in disasters. Such models allow uncertainties and dynamic updates with richer characteristics to be incorporated along with the model objectives. Most importantly, such models can be applied to various humanitarian operations during a disaster, such as the 2015 Nepal earthquake.
In this event, a 7.8 magnitude earthquake was registered, followed by several aftershocks resulting in the loss of roughly nine thousand lives. The remainder of the survivors were in dire need of relief aid [1] and were depending on the efficient delivery of relief supplies for their survival. Among the critical relief aid is medical supplies [2,3]. Adding to the challenges is the fact that Nepal is a landlocked country without any sea access, as the country is bordered by both China and India. As such, the only access to receive international aid is through the Tribhuvan Airport, which was flooded with relief aid. This caused a severe bottleneck, hampering efficient humanitarian operations [4]. On top of that, efficient relief operations were also hindered by landslides and tremors dismantling the transportation network, especially the access to remote places where medical supplies are scarce [5,6]. These challenges can be summarised as a bottleneck problem due to a single collection point and uncertainties in the road network, further constrained by limited transport vehicles. Multi-depots, instead of one single distribution point, could help alleviate the bottleneck problem. Limited vehicles could be made to perform split delivery and multi-trip operations as a means of compensating for the small number of vehicles available. After all, according to [7], 50% cost savings could be achieved through split delivery operations. In addition to the cost incurred by the operation, uncertainties in the road network had to be addressed and travel time was a concern. Given all these observations, the Multi-Depot Vehicle Routing Problem with Stochastic Road Capacity is proposed.
For this case, such a model is considered offline as defined by [8]. From the results presented in this paper, the offline computation gives a very optimistic solution given no dynamic elements are accounted for. For the online computation, changes towards parameters could be observed and re-computation can also be performed such that more accurate actions or decisions can be made efficiently and optimally, or at least close to optimality. An exact solution approach, such as the two-stage Stochastic Integer Linear Programming (SILP) model, is considered an offline computation approach. In this approach, an optimal route is computed prior to executing it in real life or real time. Alternatively, this exact approach could be used multiple times consecutively such that any new changes of parameter in real life could be adapted and re-computation could be conducted. Hence, the routing decision could be made in segments. For example, one may compute offline the whole route from the start of the mission to the end. However, one can also perform such computations at every node by making small adjustments to the previous parameters and constraints. Although this method is costly, as a complete route needs to be computed iteratively exactly, one can reduce the following model such that the computation would not be so punitive. A better reduction can lead to the separation of the goal of the reduced model after the initial computation is performed on the original main model. Furthermore, one may also incorporate the whole strategy within a larger solution framework such as Approximate Dynamic Programming (ADP) to allow for a more systematic update of the parameter at each decision point of time. This method is adopted in this work to incorporate dynamic elements of the problem, such as a deteriorating road capacity.
The curse of dimensionality [9] has been the moving factor in the development of ADP when solving the Bellman equation [10]. The key has always been about computing the values of the state s k in the vast explosion of state space S or the value of the action a k ∈ A(s k ) given the state an agent is in. In ADP, these values are approximated to counter the curse of dimensionality. One of the non-parametric approaches of ADP includes the lookahead approach, which simulates the uncertain parameters through Monte Carlo simulation. The lookahead approach can be conducted through the rollout algorithm, where possible future events unfold from the point of decision onwards. The idea of a one step lookahead can be made analogous to a chess player that simulates a series of future events before making the best move. A two step lookahead would then mean that a chess player is thinking for two consecutive moves instead from the current state of the game. Any uncertain parameters are sampled instead via Monte Carlo simulation, typically based on a known stochastic distribution.
Besides uncertain parameter sampling, a future lookahead can only occur via a transitioning of states from the point of making decision s k onwards. This series of transitioning states is enabled by taking a set of actions according to a certain base policy throughout the lookahead. In a Vehicle Routing Problem (VRP), the base policy is the complete or partial route computed to allow the vehicle to perform humanitarian operations. This policy can be computed by using a construction heuristic, other specialised heuristics and even metaheuristics. Moreover, for cases of deterministic lookahead, an exact complete route could be computed through an Integer Linear Programming (ILP) approach.
In this study, the means to enable the computation of a base policy via an ILP approach for a Multi-Depot Dynamic Vehicle Routing Problem with Stochastic Road Capacity (MD-DVRPSRC) dynamically is proposed by performing ILP iteratively at every decision point. Without such an approach, computing a complete route from the current state would lead to a non viable route as the road capacity might change when the next state is observed. This will prevent the vehicle from passing through certain roads on the pre-computed route. Here, the Deterministic Multi-Depot VRP with Road Capacity (D-MDVRPRC) and the two-stage Stochastic Integer Liner Programing (SILP) model, (MDVRPSRC-2S) for offline computation are presented and validated. The MDVRPSRC-2S is then reduced into two models that function separately based on the status of the operation for each vehicle: delivery (MDVRPSRC-2S1) or replenishment (MDVRPSRC-2S2). It is shown that these models could be used for MDDVRPSRC in the lookahead approach to construct the base policy on the go, while performing rollout. This is done by iteratively solving the reduced problem models.
The contribution of this paper is threefold. First, both deterministic and two-stage SILP models are presented along with two reduced models of the latter. Additionally, it is also shown that these models can be applied in the rollout setting. Next, it is also shown that the tremor progression of a disaster's epicenter can be simulated, influencing the damage to the road network and thus the capacity of the road. Finally, the base policy for the rollout can be constructed dynamically on the go by solving the reduced models.
This paper is organised as follows: Section 2 describes the literature review focusing on the proposed models and rollout method in VRP. Section 3 describes the D-MDVRPRC and MDVRPSRC-2S problems, where the considerations for the time delay, road damage and random road capacity are elaborated. The two reduced models (MDVRPSRC-2S1 and MDVRPSRC-2S2) derived from MDVRPSRC-2S are presented in Section 4 with explanations of how they can be utilised in the rollout. Section 5 shows the numerical validation of D-MDVRPRC and MDVRPSRC-2S computed offline. MDVRPSRC-2S1 and MDVRPSRC-2S2 are validated through the rollout application and computational results are discussed. The research concludes in Section 6.

Literature Review
The MDVRPSRC describes the operation of split delivery throughout multi-trip operations from multi-depots while having to consider the stochastic road capacity within the network.
Different combinations and variations of these problem characteristics are especially addressed by the humanitarian operations applications. The review paper [11] had shown that those addressing split delivery VRP would also typically incorporate a multi-trip operation as a means to address the vehicle limitation problem during such a chaotic time. This could also be seen in [12][13][14][15][16]. Furthermore, a multi-depot problem is also adopted by several works to address a more realistic problem. Work that addresses both multi-depot and split delivery in the humanitarian operations setting can be found in [16][17][18]. As for works that addressed both multi-depot and multi-trip operations, this is seen in [19]. For a more in-depth, rich VRP that incorporates multi-depot, split delivery, multi-trip and road capacity for the application of humanitarian operations, refer to the work in [11].
Meanwhile, outside the field of humanitarian operations, the same trend could be observed where problem characteristics such as split delivery are usually addressed together with multi-trip operations [20]. In their work [20], the authors expand the Mixed Integer Linear Programming (MILP) model of VRP with multiple trips and split the delivery with time windows as well as heterogeneity of the vehicle fleet for optimised waste collection operations. A simple rule is applied to the heterogeneous vehicle fleet, where the largest vehicle should attend the largest collection point of waste.
In [21], similar considerations of split delivery, multiple trips and heterogeneous vehicles are addressed for the Fuel Replenishment Problem (FRP) with different types of fuel. The problem is formulated as an MILP model, where the solutions are compared with exact computations through a lower bound computed by column generation. Another FRP is addressed by [22], which considers split delivery, multiple trips, multi products and time windows as well as a heterogeneous fleet. The loads of vehicle and travel times are balanced and the model is solved with the sequential insertion heuristic methods. Ref. [23] resembles our proposed model the most, with the exclusion of road capacity consideration. Their work consists of solving problems involving split delivery and multi-trip as well as multi-depot operations for rice distribution among the poor using a homogeneous fleet of vehicles. Finally, [24] proposed an integer multi commodity flow model based on time network space planning for both the vehicle and product planning. Through this model, a split delivery, multi-trip and soft time windows problem is considered to replenish inventory in the supply chain. Other combinations of different problem characteristics could be observed from [25], which modeled a distribution problem considering multi-trip time windows as well as multi-products. Meanwhile, [26] addressed the usage of multidepot to ease the travel times, multi-trip as well as time windows with a release date in the last mile of the distribution operation. In their work, [27] addressed the multi-trip with time windows for Pickup and Delivery VRP (PDVRP). Here, the roll containers are delivered while the emptied roll containers are collected according to the availability of the nurses in multiple hospitals as well as the work time of the drivers. This problem seeks to minimise the cost of travel as well as the usage of vehicles. This is solved using a Genetic Algorithm (GA) and the route-first cluster-second approach.
A standalone problem characteristic model can be seen in [28], which deals with delivery and packaging optimisation amidst the Corona Virus Disease (2019) (COVID-19). The proposed model addressed the difficulties of supply and demand while managing social isolation and movement control. Although the sole focus from the perspective of VRP is split delivery, an interesting consideration is given to reducing the number of split deliveries as to reduce the risk of infection by multiple contacts, in the event that a demand point hosted an infected person. Ref. [29] instead focused on the optimisation of production as well as delivery by incorporating the job scheduling problem and VRP into the proposed model. The VRP portion of the problem addresses the uncertain travel time where historical data might be lacking as well as multiple trips with time window operations. On the other hand, [30] proposed a split delivery VRP that aims to minimise the total distance travelled. For this problem, customers are separated into two groups depending on which is more suitable for a split delivery, which leads to an optimised route. In the humanitarian settings, more standalone problem characteristics of VRP in terms of split delivery, multi-trip and multi-depot can be observed in [11].
While still rarely addressed within the field of study, road capacity is highlighted in several works with deterministic and binary road capacity. Integer considerations, as well as deterministic to dynamic or stochastic road capacity, are also seen. In humanitarian operation settings, the characteristics of the road are a crucial factor albeit rarely addressed. As mentioned in [11] the flow model could be the best approach when incorporating road conditions such as road flow, road capacity and road impedance in a VRP. Such work can be seen in [31][32][33][34][35]. Moreover [36] associated road conditions by levels through circle-based areas of disaster from the point of disaster. Disaster affected roads are implied in [37] by restricting the number of vehicles on a specific arc through a constraint based on the travel time along the arc that is affected by disaster. On the other hand, [38,39] assumed a binary road capacity in their problem model. Uncertain road capacity is observed in [40]. Ref. [41] on the other hand implicitly incorporated road capacity by efficiently providing a rescue path during disaster through the three objectives that are optimised. Factors, such as the travel time and the reliability of the road, as well as the safety of the road, are optimised considering the road congestion and road capacity. Meanwhile, a crude binary form of road capacity is considered in [42]. In this particular problem, road capacity is rather exclusive to the type of vehicle used, namely vans and motorcycles for delivering goods. Narrow alleys are exclusive to motorcycles and some roads may prohibit the motorcycles but could be used by the vans. Through the model, the carbon emissions and the cost of replanning are reduced despite the addition of travel distance. All flow models utilised to describe a VRP often address road capacity either explicitly or implicitly. Ref. [43], for example, computes the optimal path for rescue emergency vehicles, taking into consideration the traffic congestion and the road capacity. Here, the road capacity is implicitly taken into consideration when the travel time and reliability of a path is computed and optimised. A dynamic road capacity is considered in [44] under the dynamic constraint for the flow of a vehicle and capacity of the road. This work addressed the flow of emergency materials problem where the decision of the dynamic origin of vehicles and the vehicle path are computed. In [45], the deterministic road capacity is incorporated within the lower level model to compute the time travel of a link. The optimised path consists of chosen links acting as the input to the upper model.
In terms of reflecting the damage on the road network due to the disaster, only the work of [36] can be considered. Here, the region of the network is superimposed on a circular region with the disaster point as the center. The outer ring network has less damage and thus less delay while the inner ring, which is closer to the disaster, suffered from more damage which would cause more delays. Our work differs from [36] as we denote the damage on each individual road based on the number of intersections of the circular radius dispersion from the epicenter of the earthquake. This circular radius dispersion is meant to represent the earthquake tremor radius through its wave-like motion and eruption. While a circular-based region might seem more natural to reflect the damage on the road network, ours has the ability to discern individual road damage (at the expense of computation time) should there be more than one epicenter, as was the case with the Nepal 2015 Earthquake.
The Reinforcement Learning (RL) adoption for stochastic VRP is seen in the work of [46], which provided the fundamentals or a framework for modelling a single stochastic VRP via Markov Decision Processes (MDP). This work is extended and modified by the same author in [47]. The former and latter are without computational results as the complex Operations Research problem then tends to prohibit an exact solution through dynamic programming. This necessitates the ADP approach for solving it. The need to approximate stems from the curse of dimensionality that plagues the framework of RL or Neuro Dynamic Programming (NDP) [48,49]. Among the ADP approaches is the lookahead approach for approximating the value of states rather than approximating the function for the value function itself. The rollout algorithm is one of the lookahead approaches proposed by [48], applied as part of the policy iteration scheme. In RL application for Operations Research, such a method was used from as early as 1998, where [50] developed a rollout algorithm specialised for sequencing problems and also for single VRP [51]. Furthermore, [49] compared the policy obtained from two policy iteration approaches, namely the optimistic approximate policy iteration and rollout policy where the latter outperformed the former for the single VRP. In [52], the rollout with a cyclic heuristic is applied for the application of the Travelling Salesman Problem with stochastic travel time and single VRP with Stochastic Demand.
Ref. [53] presented both the two-stage stochastic programming approach and the RL approach to solve for the single VRP with Stochastic Demand, the same as in [51]. However, [54] applied the Monte Carlo simulation instead of going through all the outcome spaces when performing the single stage and two-stage rollout, respectively. Such methods reduce computation by up to 65% as compared to [51]. Furthermore, [54] investigated a different approach with different base policies and different methods for estimating the value of state.
Ref. [55] eventually emerged with a multi-vehicle VRP with Stochastic Demand by employing the strategy in clustering the customers first for each respective vehicle. A deterministic a priori solution is computed for each vehicle for their respective customers. Should a failure occur, re-optimising through rollout is achieved due to the smaller parti-tioned customer nodes. Ref. [56] extends the idea by introducing a dynamic decomposition mechanism that adapts the clusters at each decision point in the Dynamic and Stochastic VRP for the problem of VRP with Stochastic Demand and Duration Limit. Furthermore, [56] proposed families of rollout algorithms including post decision rollout and pre decision rollout as well as hybrid rollout based on the extension of the MDP framework. The rollouts stem from the pre and post decision states as advocated by [57] to tackle the curse of dimensionality for larger and practical instances. Ref. [58] then went on to extend the work by addressing the problem of pre-emptive replenishment where the fixed route heuristic is adapted into a restocking fixed route heuristic. This was assisted by dynamic programming in evaluating the iterated policies within the local search process. Using this computed policy, the rollout is performed producing restocking-based rollout policies for the problem of VRP with Stochastic Demand and Duration Limit with pre-emptive replenishment. The same idea of applying dynamic programming to evaluating policies or routes computed by heuristics is highlighted by [59] for a single VRP with Stochastic Demand with restocking, this time with the hybrid of backward and forward recursion dynamic programming.
The first to deviate from the stochastic demand problem is [60], which addressed the VRP with stochastic customer requests, albeit only for a single vehicle. The post decision rollout algorithm is applied via the cheapest insertion heuristic and the result is compared using a value function approximation method as well as a greedy heuristic. This was later combined with the offline ADP through Value Function Approximation (VFA) and the online lookahead with Post Decision Rollout (PDS-RA) in [61]. Here, the VFA lookup table, which incorporates the temporal aspect of a customer request as state variables, forms a base policy to execute rollout that looks ahead through the transition of a detailed state that includes the spatial aspect of a customer request. Thus, both the temporal and spatial aspect of a stochastic customer request is addressed.
Finally, we direct the readers to [62][63][64], who made use of mathematical programming to obtain a base policy for the rollout of the problem involving a scheduling or inventory routing problem. For the scheduling problem, [64] applied a deterministic quadratic programming solution approach to obtain a feasible base heuristic for the rollout in solving the constrained MDP model. Meanwhile, [62] made use of an MILP rather than any heuristic to obtain a base policy for the rollout trajectories in order to estimate the value of the possible next state such that the rollout policy could be obtained for the single vehicle inventory routing problem. The same approach is applied in [63,65].
Although we too include the mathematical programming approach within the rollout structure, we differ from the previous mentioned works by constructing the base policy iteratively and dynamically using mathematical programming for every transition within the horizon of rollout. This resulted in a complete base policy for every rollout episode. To the best of our knowledge, such a method has not been published within the scholarly field.

Problem Description
In the event of an earthquake, such as those in 2015 in Nepal, emergency shelters are set up at strategic locations to accommodate as many victims who are seeking medical help as possible.
To assist in the humanitarian logistic operations of medical supply delivery, homogeneous capacitated transport vehicles must travel from their respective depots to these emergency shelters to deliver medical supplies. Each of them are assigned to deliver medical supplies to these shelters and travels through connecting nodes and return back to any depot (not necessarily the depot from which it departed from).
During this operation, a vehicle is allowed to visit both the node and shelter as many times as possible if it is more efficient to do so. Emergency shelters that have been satisfied may serve as connecting nodes for vehicles to pass, en-route, to perform its task. Split delivery and multiple trips are allowed in order to satisfy all demands given the limited numbers of vehicles.
The decision maker must also take into account the road capacity along the routes that lead to the emergency shelters. These road capacities depend on whether the respective road is a highway, city road or a normal road. Thus the number of vehicles passing a road must be limited to that road capacity. Each road capacity within the network is stochastic, following the Poisson distribution representing the uncertain condition of the road when the disaster strikes. Moreover, each road's capacity varies over the distribution whose mean value reflects the damage sustained by the road respectively. These damages are reflected by the intersection of an earthquake radial tremor circle and the road.
Each distance covered along the roads is proportional to the cost of the operation which is to be minimised so that the saved cost could be allocated to other critical humanitarian aid supplies. Additionally, the time travel along the road has an additional delay time depending on the damage sustained by the road.
The objective is to minimise the total travel time as well as the distance which represents the cost. Here, it is assumed that the highway road has the highest capacity, followed by the normal road and finally the city road in general. However, this is assuming no damage has been sustained by these roads. Otherwise, the capacity would reflect the damage sustained accordingly. Additionally, all vehicles are assumed to be travelling at a constant speed of 90 km per hour or per 60 min. From this assumption, the time travel or the delayed time travelled T i,j is computed. Finally, it is also assumed that the sampled road capacities and delayed time travel remained at the same value (non dynamic) despite multiple trips for offline computation. The problem is modelled as an undirected, incomplete graph GG = (N, E). For both D-MDVRPRC and MDVRPSRC-2S, the parameters and variables are presented in Table 1.

Parameters
maximum capacity of vehicles after replenishment at depot penalty incurred when staying at current node (i) due to random event z 0 P z Probability of edge (i, j) with random event z

Variables
x i,j,u,m,g 1, if vehicle m travel the edge (i, j) ∈ E in the order of u in a route of trip g ∈ G 0, otherwise q i,m,g quantity served where q i,m,g ∈ Z at shelter i by vehicle m on the trip g y i,j,u,m,g,z 1, if m stays at i on trip g due to z 0 0, otherwise

Deterministic Integer Linear Programming Model (D-MDVRPRC)
In the deterministic model, the road capacity is deterministic. However, the road capacity is subjected to damages sustained by the road. Furthermore, the travel time along the damaged road is extended, representing difficulties when travelling along the damaged road.
The index u is introduced to represent the consecutive order of edges travelled, forming a route in trip g such that any nodes in the network could be visited many times while ignoring the subtour restriction.
The objective is to minimise the total travel cost and total travel time: The constraints are formulated as follows: with Constraint (2) ensuring one vehicle m departs only from one depot to other destinations for every one trip, if more than one trip is involved.
Constraint (3) ensures vehicle m returns to only (any) one depot for every one trip, if more than one trip is involved.
Constraint (4) ensures the vehicle that reaches a node, besides a depot, also comes out from that node (flow conservation).
Constraint (5) is the road capacity constraint, which should be valid for each one of the trips g, not the collective of all trips.
Additionally, constraints (6)-(8) are adopted from [66] to expand the delivery among shelters in |G| trips. With this formulation, the split delivery is also addressed: Constraint (6) ensures that the demand at each of the shelters is satisfied through the course of all trips.
Constraint (7) states that each vehicle can only deliver its maximum capacity Q during one trip.
Constraint (8) links the routing decision variable to the quantity served decision variable with the possibility of multiple visits to one shelter in one trip, if it is optimal to do so.
Meanwhile, Constraint (9) prevents a premature return to the depot in case the shelter is not one step away from the depot.
Furthermore, Constraint (10) ensures only one variable x i,j,u,m,g would carry the value of one at a specific order u. Lastly, Constraints (10) and (11) ensure that a vehicle can serve as many shelters provided that it does not violate its maximum capacity when departing from a depot at each trip. This also leads to the following trips (after the first trip) utilizing just enough vehicles to serve the remaining demands.

Time Delay and Damage Determination
The delayed travel time for travelling along a damaged road or edge is determined based on the damage sustained by the road (i, j). The damage sustained by an edge or road on the other hand is determined by the interception of the radial circle of an earthquake tremor and the edge. For example, an edge (i, j), having a damage of p i,j = 4, would have four interceptions from the dispersion radial circle of different radius. Figure 1 illustrates the earthquake tremor radial dispersion originating from an epicenter affecting the road network based on the interceptions of the radial circle lines and the edges. Edge (9, 2), for example, sustained damage of p 9,2 = 4. Algorithm 1 shows how the radial dispersion could be simulated using Fibonacci's numbers.

Algorithm 1 Pseudocode for determining damages on all edges in road network instance.
Require: Instance files Ensure: edges : damages ← sum of intensity or interception 1: create graph network GG = (N, E) using Networkx library (Python): 2: add edges (i, j) ∈ E to the network 3: extracting list undirected edges (i, j) ∈ E from the GG 4: Create even outward radial radius apart by 5 unit up to outermost node in GG 5: extracting Fibonacci sequence up to max radius of tremor 6: propagation tremor's radius = radius ∈ Fibonacci sequence ⊂ the radius list 7: initialise intensity list and damages dictionary 8: for line i, j in undirected edges list E do 9: reset intensity list 10: for epicenter in list of selected epicenter in Online Python GUI do 11: for radius ∈ radius propagation of earthquake do 12: create 2D line between 2 nodes i and j of undirected line (i, j) using sympify library 13: create 2D circle line of radius using sympify library from epicenter 14: Obtain list of intersection coordinates using sympify library Python 15: intensity list ← length of list of intersection coordinates 16: end for 17: end for 18: p i,j = sum(intensity list) 19: damages : line i, j ← p i,j 20: end for 21: return damages As mentioned earlier, a constant speed 90 km/hour is assumed for all vehicles. Based on the distance, which gives the cost C i,j , the time T i,j without considering damage, p i,j is computed in minutes as: For each edge (i, j) ∈ E sustaining damages of p i,j , the time is updated, with delay, as below:

Two-Stage Stochastic Integer Linear Programming Model (MDVRPSRC-2S)
The deterministic problem model can be extended into a two-stage SILP model. The solution computed based on the two-stage SILP model would result in routes for all vehicles for all trips, along with a recourse decision should random event z ∈ Z occur due to random road capacityr i,j , ∀(i, j) ∈ E. Here, the variable y i,j,u,m,g,z is introduced as the recourse rule should a failure of a computed route occur due to road capacityr i,j = 0 for edge (i, j) in the computed route.
Two random events are considered where Here, the recourse rule is such that: where the variable addresses vehicle m ∈ M journeying a computed route if it should stop at current nodes i in the event of z on the trip g ∈ G. For example, the route: (0-7-4-7-0) in Figure 1 for one trip is computed for vehicle m that is currently at a depot (node 0). This vehicle is assigned to serve node 4, which is the shelter. If random event z 0 occurred where r 0,7 = 0, a recourse rule would prevent m from moving. Instead, m remains at the depot (node 0). Otherwise, if z 1 is observed where r 0,7 > 0, m should proceed according to the pre computed route to node 7. Therefore, part of the solution computed by the two-stage programming would be as such: (0[0,7]-7[7,4]-4 [4,7]-7[7,0]-0). Although such a rule is rather obvious, the aim is to compute the best route for all trips. This is done while taking into consideration the random events z (as is done in objective (15)) that might happen once the route is computed as well as incorporating the defined recourse rule y i,j,u,m,g,z . The route computed by this approach is thus different than the route computed deterministically where logical rules or decisions could be applied later in real time practice when a route failure occurs.
Here, the objective is to minimise the total travel cost and total travel time, taking into consideration of the probability of event z and the recourse variable y: The Constraints (1)-(11) hold true for the two-stage SILP model except for Constraint (5). Furthermore, Constraint (16) links the recourse rule to the routing decision variable x i,j,u,m,g and the stochastic decision variable y i,j,u,m,g,z . Here, if random event z 0 occurs, a recourse decision rule would dictate that vehicle m remains at i despite binary x i,j,u,m,g computed as "True": Meanwhile, Constraint (17) ensures that vehicle m proceeds with path (i, j) for random event z 1 : and finally constraint (5) is adapted for random road capacity:

Random Road Capacity
Among the many random distributions chosen, Poisson distribution suits the MDVRPSRC-2S well as it is applied for a discrete random distribution for online computation. The Poisson distribution denotes the probability distribution of a random value at a given interval that is given by Equation (19) as: The general formulation of Poisson distribution is adapted for the case of random road capacityr i,j in Equation (20): The updated random road capacityr i,j for all edges (i, j) ∈ E is observed during the triggered event (for online computation, this is when any vehicle arrives at their destination). This updated random road capacity consists of all road capacities for each of the edges that are observed by the agent in the online simulation. These random road capacities are assumed to remain unchanged throughout the duration onwards until the next triggered event. For the offline computation, a suitable interval is chosen by considering when a delivery mission is normally executed in the aftermath of the earthquake disaster.
In the online computation, the applied duration in updating the random road capacities is the duration between two consecutive decision points (T k − T k−1 ). At both decision points, one or multiple vehicles first arrive at the destination that was previously assigned in decision point (k − 1).
Based on Equation (20), it is clear that the task is to find a suitable mean value µ i,j,k in order to obtain the probabilities of the stochastic road capacity for all edges at time k. In this distribution model, a suitable mean value µ i,j,k is the road capacity over an interval of time. µ i,j,k is computed by considering the dynamic updated maximum road capacity of the said edge (i, j), r i,j,max,T k minus the deteriorating capacity of the road, (δ i,j ).
Given the severity of a disaster such as an earthquake, the road capacity tends to degrade over time for the online and dynamic cases. Taking that into consideration, the random Poisson distribution at time T k is said to have a mean road capacity µ i,j,k of edge (i, j) over the interval of T k − T k−1 . In the offline computation case, such as the two-stage SILP approach, a suitable interval value is chosen and T k is considered as the time at which the road capacity is at its full capacity r i,j,max . Mean road capacity µ i,j,k is then computed as: where the value of δ i,j,N is a normalised value of δ i,j in the range [0, r i,j,max,T k ] ∈ R. Furthermore, the mean road capacity of an edge (i, j), µ i,j,k computed in Equation (21) is also the maximum capacity of the edge when sampling is made at time T k . This means an assumption is made that the edge has an average of the maximum road or edge capacity during the interval of T k − T k−1 . Therefore, in the dynamic and online operation, the maximum road capacity in the next observation k + 1 is updated in Equation (22). This is also the reason why the sampled random road capacity is bounded by the value of the average road capacity.
The random Poisson distribution mean µ i,j,k is thus translated as the mean road capacity observed over the interval of T k − T k−1 at the time T k , following the two assumptions of Poisson distribution: • The mean does not change over observed duration T k − T k−1 . • The events remain independent of each other.
The value of δ i,j is determined by the maximum road capacity of the edge at time T k , a constant damage parameter from the interception of the edge and the radial line of tremor dispersion of the earthquake, p i,j . This also includes the interval between the current decision point and previous decision point T k − T k−1 . Here, δ i,j is assumed to be proportional to the damage rate p i,j ∈ R: It makes sense to relate that the more damage an edge sustained, the more likely that the road capacity of the edges deteriorates from its maximum capacity.
At the same time, δ i,j is proportional to the time interval between the current and previous decision points given as: Equation (24) is proposed based on the assumption that should the interval between two decision points be fairly small, such as five minutes, it would not make sense that the road capacity would change so much from its previous road capacity. However, if the time interval is large, such as 100 min, it stands to reason that the road capacity might have reduced over time. As such, the relationship between δ i,j and the damages of the respective edges p i,j , as well as the interval, can be formulated as found in Equation (25): or where P is a constant. The value of µ i,j,k in Equation (20) should remain a positive value within the capacity of the edges. Thus, the value of δ i,j is normalised in range (0, r i,j,max,T k ]. Furthermore, the value obtained in Equation (26) should be normalised, given by the general normalisation equation: where both the minimum range and minimum value of δ i,j are 0, and the maximum value of δ i,j is given as: where an extreme interval time assumes that the decision is next triggered after a vehicle arrives at the farthest destination. This is denoted by the maximum value in the time matrix. To account for any delay associated by the arrival time, the maximum value of the time matrix is multiplied by two. Once the value of µ i,j,k is obtained from Equation (21), the random stochastic road capacityr i,j can be computed using the numpy library in Python: Finally, the random road capacity is bounded by µ i,j,k as the upper limit:

Online MDP MDDVRPSRC Lookahead Approach
In the online, stochastic and dynamic versions of the problem, the MDDVRPSRC is modelled in MDP with a PDS structure [57]. The ADP solution approach is adopted to compute for a near optimal policy. In this paper, the solution approach is briefly explained instead of the MDP model formulation as the MDP model is not the focus of this paper.
The stochastic and dynamic problems (formulated in MDP) seek an optimal policy π that maximises the total rewards R(s k , A π (s k )). The total rewards are obtained upon making a decision, starting from the initial state s 0 to the next state s 1 until the end state s K . Based on the principle of optimality [10], a policy is optimal if the decision or action taken at every decision point is optimal (a ) such that π : s k → a k , where a k = A π (s k ). The Bellman equation [10] can be rewritten such that a is computed in Equation (29) as: where the conventional approach is to estimate the next state value in order to compute for the optimal decision or action as in Equation (30): However, such an approach leads to the computation of expectation, where the outcome space needs to be included. By introducing the PDS structure [57], as in Equation (31): the exhaustive computation can be reduced as in Equation (32): The task is then to first approximate the value of post decision state instead and then compute the optimal action as in Equation (33): The approximated value of the post decision state seen in Equation (33) could be obtained through the averaged rollout return after the N number of the Monte Carlo simulation for each PDS following the basic pseudo Algorithm 2 (see [8]). Here, the expected rollout return B(π B(s a k k ) , k + 1, K) [67] is obtained per rollout episode until the end horizon K following a policy π B(s a k k ) , which is obtained by B given the current state s k : In this approach, B is typically a heuristic. However, recent advancements of the rollout show that B could also be by any method to obtain policy π B(s k+1 ) , including mathematical programming. R(s k , a k ) = R(s k , a k ) + λ k R(s k , a k ) 8: s k = S M,W (s a , W k+1 (ω(k + 1))) 9: 15: In the online MDDVRPSRC, the decision a k is a vector in a decision space A(s k ) at state s k . The decision is to assign the next destination for all vehicles at every decision point k, where A(s k ) is defined as: where M is a set of arrived vehicles and q m is the dynamic capacity of vehicle m. Furthermore, O m is defined as the set of immediate locations a vehicle can go to next based on the road network. From this set, an optimal a m is selected via computation as seen in Equation (34).
To solve Equation (33) however, the decision space A(s k ) could be too large for obtaining good solution within reasonable computation efforts. We thus propose that for every decision point k: Although a m is computed for each vehicle by performing the rollout, it is observed for a short horizon and moderate or small Monte Carlo simulations; this proposed approach yields good results for the MDDVRPSRC. This is seen in the online computation results presented in Section 5.
Furthermore, matheuristic is applied where the base policy π β(s a k ) obtained is constructed on the go, based on the iterative computation of the reduced two-stage SILP problems (MDVRPSRC-2S1 and MDVRPSRC-2S1 in Section 4.2). As such, the proposed algorithm deviates from the PDS-RA algorithm in Algorithm 2 as illustrated in Algorithm 3. Algorithm 3 is then extended to Algorithm 4 to compute for a m .
Here, the base policy π β(s a k ) is constructed on the go based on every first decision of the mathematical programming solution policy at the lookahead rollout decision time k, which is iterated over K decision points in one Monte Carlo episode. Ideally, the rollout would be performed to lookahead until the end decision point, which is the end horizon of the MDP problem. However, K in the rollout algorithm could mean any determined horizon as far as how one wished to lookahead in future subject to computation constraint. At every rollout decision point k, CPLEX is called in the Python program to compute π CP k (s k ) given the rollout current state s k (Figure 2). The policy π CP k (s k ) is simply a route with the goal to either replenish at a depot (MDVRPSRC-2S2) or to serve a shelter (MDVRPSRC-2S1) depending on the capacity of the vehicle in the rollout lookahead state s k . Depending on the capacity of the vehicle, one of the two reduced two-stage SILP model is solved (Figure 2). The first decision within the computed policy is the action or decision for the lookahead rollout state at which CPLEX is executed for, π CP k (s k ).
Randomly changing road capacity at every decision point may disrupt a route computed offline. Hence the policy computed by CPLEX could not be applied directly throughout the one episode rollout (π β(s a k ) = π CP k (s k ) ). Instead, the first decision of the policy π CP k (s k ) is taken as part of the base policy for the lookahead state k. The process continues until horizon K where the base policy π β(s a k ) would then be fully constructed for one Monte Carlo simulation episode. This lookahead simulated episode is constructed based on the policy computed on the go to transition to the next lookahead rollout state s k+1 as well as by the sampled random information W k+1 of episode n observed during the rollout decision point k which in this work is the road capacities:r i,j , ∀(i, j) ∈ E ← W k+1 (n(k + 1)). In Algorithms 2-4, the post decision state during lookahead episode s a is differentiated from s a k to avoid confusion. s k = S M,W (s a , W k+1 (n(k + 1))) 8: R(s k , a m ) = R(s k , a m ) + λ k R(s k , a m ) 8: s k = S M,W (s a , W k+1 (n(k + 1))) 9: π CP k (s k ) ← CPLEX k (s k ) 10:

Reduced Two-Stage Stochastic Integer Linear Programming Model
In order to execute CPLEX in the rollout algorithm iteratively with multiple Monte Carlo episodes and a suitable horizon of lookahead, the ILP model of the MDVRPSRC-2S needed to be substantially reduced. Since it is a lookahead approach, the setup of the problem should retain the same characteristics, such as the uncertain road capacity. However, the problem task is separated into two and reduced substantially due to the online solution that is required instead of the offline solution.
Since the rollout is executed only for a single vehicle, the problem in Section 3.4 is reduced to a single vehicle version of the problem. In this reduced version, the vehicle has only a single task, which is to either reach one emergency shelter (MDVRPSRC-2S1) and deliver the medical supply or to replenish its capacity if the total demand is still not fully satisfied (MDVRPSRC-2S2). These two tasks eliminate the requirement for both multi-trip and split delivery. Therefore, both the decision variable q i,m,g and parameter or index g ∈ G are no longer required. Due to flexibility needed in reusing edges within a completed route, index u is still needed to arrange the edges forming the route and for allowing some subtour.
The same approach is taken when considering the random road capacities as well as their probability values from the random distribution as in Section 3.5. The online approach is taken where the interval is taken as the dynamic T k − T k−1 . Therefore, the deterioration factor changes dynamically as in Equation (26). Furthermore, the mean µ i,j,k of the distribution of all edges changes accordingly based on the dynamic maximum road capacity r i,j,max,T k due to Equation (22). The road capacity observed at the point of decision in the lookahead episode n is:r i,j ∀(i, j) ∈ E ← W k+1 (n(k + 1)), where W k+1 is obtained from the random distribution. Finally, the time delay for a damaged road is considered as elaborated in Section 3.3.
For both MDVRPSRC-2S1 and MDVRPSRC-2S2, the parameters and variables in Table 2 are applied.
penalty incurred when staying at current node (i) due tor i,j = 0 P z Probability of edge (i, j) with random event z The model where the vehicle is en-route for delivery (if the vehicle capacity is not empty) will have the objective of minimising the total travel cost and total travel time, taking into consideration the probability of event z and the recourse variable y: Constraint (36) ensures only one vehicle m departs from its current location at decision point k. Vehicle m is not allowed to travel to depot since it still has capacity and there is still an unserved shelter.
Constraint (37) ensures one vehicle m arrives at an unserved shelter.
Constraint (38) ensures the vehicle that reaches a node (besides any unserved shelters and depot) also comes out from that node (flow conservation).
Meanwhile Constraint (39) links the recourse variable to the decision variable x.
and finally constraint (5) is adapted to form the random road capacity constraint (41):

Model for Replenishment (MDVRPSRC-2S2)
This model has a vehicle en-route for replenishment (if the vehicle capacity empty) with the objective to minimise the total travel cost and total travel time, taking into consideration of the probability of event z and the recourse variable y: Constraint (45) ensures the vehicle that reaches a node (besides depot) also comes out from that node (flow conservation). Furthermore, Constraints (39)- (41) are also applied for this model.

Computational Results and Discussions
The computational results obtained here are geared towards: The offline deterministic and stochastic models (D-MDVRPRC and MDVRPSRC-2S) are developed in Python 2.7 where CPLEX is executed through the DOCPLEX API for Python. Furthermore, the laptop computer used is running on Intel R Core TM i7-7500U CPU at 2.70-2.90 GHz with 20 GB RAM. In Table 3, the relevant parameters configured to run both the offline and online models are shown. To illustrate the scenario of a disaster, the disaster in Nepal 2015 is referred to, and the simulated test instances are developed. The road network forms the undirected, incomplete graph. Since the purpose of the test is to validate a problem with a road capacity issue, typical benchmarks, such as Perl, Gaskell, and Christofides, are not considered suitable. Therefore, new simulated test instances are created to highlight some issues with the road capacities and the delivery process. Among the instances created, 18 were applied and the characteristics are shown in Table 4. More difficult instances are available for future work, which can be applied for online computation. Computation within reasonable time for both deterministic and stochastic model proved to be prohibitive for instance D6N16S6 (with 6 depots, 16 connecting nodes and 6 shelters) onwards. As such, only the results with listed instances are presented in this paper.
Three road networks that were tested can be seen in Figures A1-A3 with an epicenter of an earthquake roughly at the lower center part of all the road networks in the test instances (specific coordinates: (460, 180)). The blue, brown and yellow nodes are the depots, connecting nodes and emergency shelters respectively. The problem is modelled as the undirected, incomplete graph, where no node is completely connected to all other nodes and the distance is computed based on the Euclidean distance formulation. The number at the center of the edge represents the capacity of the road. The tuple (6,7,8) represents the maximum road capacity of a highway city road, a normal road, and a highway respectively without taking the damage factor into consideration. A tighter maximum road capacity tuple (2,3,4) is also introduced in some of the instances tested to study the difference in results as compared to the original maximum road capacity setting (6,7,8). In the road network, the inner part edges are city roads while the outer part of the edges is the highway. In between the outer and inner regions are the normal roads. In the D-MDVRPRC model, the road capacity does not change after computing the maximum road capacity with respect to the damages at the initial stage. Meanwhile, in the MDVRPSRC-2S model, the capacity of the road is sampled randomly as described in Section 3.5 for offline computation. In the online simulation (rollout using MDVRPSRC-2S21 and MDVRPSRC-2S2), the road capacity changes, as can be seen in Figure A4 as compared to Figure A1. In Figure A4, it could be seen that shelters are all served and the vehicles are en-route back to the depot. At the simulated time (1152 min), which translated into an almost full day (19 hours and 12 min), the edges near the epicenter show a decrease in road capacity as compared to other edges although the setting for road capacity applied is (6,7,8). In particular, edge (9, 2) or (2,9) has zero capacity since the remaining capacity is used by the vehicles en-route to depot 2. Furthermore, the vehicle travelling along the damaged road experienced longer travel time than usual as described in Section 3.3. What can be also observed is that the road capacity of the edges, which experienced no damage (no interception on the edge), is still stochastic with the upper bound of the maximum computed road capacity. Finally, in all figures, the vehicle is denoted in either blue and green colors, differentiating the en-route vehicle and the arrived vehicle respectively with a capacity status shown within the square marker.
Based on these simulated test instances, the validation results for both the deterministic (D) and stochastic (S) MDVRPSRC-2S models are presented in Tables 5-13. Furthermore, the application of the matheuristic rollout (using the MDVRPSRC-2S1 and MDVRPSRC-2S2 models) is also included in the tables addressing both the stochastic and dynamic (SDY) MDP version of the problem. Trend of the results could be observed in Figures A5-A22. The abbreviation used in the tables is listed in Table 14.   Both the D-MDVRPRC and MDVRPSRC-2S models are solved offline, computing the route for each instance tested. Meanwhile, the matheuristic rollout (using MDVRPSRC-2S1 and MDVRPSRC-2S2) is adopted to compute the route online by simulation, as seen by the agent or decision support system through RL. Each respective computed route per instance is assessed by the total distance covered and total time travel by all vehicles utilised as well as the computation time needed to compute the routes. In Tables 5-12, an average of three independent runs is listed for the validation of the D-MDVRPRC and MDVRPSRC-2S models. Meanwhile, due to the high variability of the online stochastic and dynamic problems in MDDVRPSRC (including random vehicle at depots), an average of ten readings is recorded to demonstrate the applicability of matheuristic rollout (using MDVRPSRC-2S1 and MDVRPSRC-2S2) in the MDDVRPSRC.
Furthermore, for stochastic models (MDVRPSRC-2S2 and MDDVRPSRC), the measurements of mean, variance, standard deviation, coefficient of variation and the best results are recorded to investigate the distribution of the data obtained. With the exception of computation time, these measurements understandably do not apply for the deterministic model (D-MDVRPRC) where multiple runs would always yield the same result for total distance and total travel time covered. It should be noted once again that the purpose of this section is to mainly validate the models and the matheuristic proposed. The online computational results are expected to be different as compared to offline computational results due to the reasons discussed below in this section. However, it is interesting to note and observe how they differ when compared side by side.
The results presented in the tables are translated into figures in Appendix A where observable trends could be seen. From the figures, it is obvious that addressing the online problem would result in covering much more total distance and thus would incur larger costs as compared to the results obtained through the offline computation of both the D-MDVRPRC and MDVRPSRC-2S as seen in Figures A5 and A6 and especially Figures A14 and A15.
The offline computation approach, however, is optimistic and unlikely to be realistic as both the D-MDVRPRC and MDVRPSRC-2S assume non dynamic changes with regards to road capacity. Therefore, the results obtained never consider some of the roads becoming increasingly less available over time for the vehicles. This is especially true for the roads in close proximity to the epicenter of the disaster. Addressing the online problem (using MDVRPSRC-2S1 and MDVRPSRC-2S2), however, took into account the dynamic reduction of road capacity over time due to damaged roads. Therefore, vehicle routes are developed constructively depending on the status of the road capacity at the time of decision point, T k . Furthermore, the D-MDVRPRC and MDVRPSRC-2S models dispatch vehicles according to the total demands since they assume all the vehicles are able to travel safely back and forth from the depot. However, MDDVRPSRC takes into account the worst case scenario where the vehicle might be stranded at their last location due to no available road capacity to the neighboring location. Since the road capacity decreases over time, the vehicle might get stranded at any time throughout the mission. As a result, they could not make it back to the depot or resume delivery to shelters which are not fully served while en-route. This was observed a few times when running the smallest instance for the online computation where the roads to emergency shelters were limited and prone to damage. Thus, all vehicles are dispatched as long as total demands are not fully served. In a normal case scenario, the demand might be fully served when other vehicles with fresh capacity are dispatched and en-route, resulting in unnecessary trips. At this point, vehicles with non-delivered supplies have to return back to the depot from their assigned locations. This translates to additional distance or cost, total travel time as well as computation time.
Moreover, unlike D-MDVRPRC and MDVRPSRC-2S models, a realistic look at the problem assumes that, at the beginning, the vehicles are randomly distributed among the existing depots thus reflecting a more realistic situation during the disaster event. So, some vehicles might depart from depots that are situated at a location that is less ideal for an optimal delivery. The offline computation problem model has the luxury of choosing the departure depots for a respective vehicle in an optimal manner. All the aforementioned difference between offline and online models factored in towards a larger variance observed in the online model.
In Figures A5-A10, the total distance, total travel time and computation time are charted for every instance with respective maximum road capacity settings classified by their problem model: Deterministic (D), Stochastic programming (S) and the online MDDVRPSRC, which is both stochastic and dynamic (SDY). In Figure A5, especially, both offline computation models show close to no difference in terms of total distance travelled by the vehicle, even with the increase of vehicles from four, eight and 15 for all three instances. The SDY model, however, shows a clear increase of total distance travelled with the increased number of vehicles. Such observations could be explained by the difference in the mode of operations and assumptions taken between the offline and online modelling approach as explained earlier. The same reasoning would still apply for the slight increase of total distance for both S and D models seen in Figure A6, which addressed the instance with a tighter maximum road capacity setting. In both Figures A7 and A8, however, all three problem models solved for the routes show a decrease in time travel with the increase of vehicles. This result is to be expected as more vehicles may contribute to serving the total demands faster, proportional to the number of vehicles available. Meanwhile, Figures A9 and A10, as expected, show an overall increase of computation time for the D, S and SDY models in this order. An exception, however, is observed for a tighter maximum road capacity (2,3,4). Here, the two-stage SILP model shows a huge jump in computation time for the smallest instance with 8 vehicles. This could be explained by a stochastic value of the road capacities based on the maximum capacity setting which might be sampled with small capacities where routing computation could be challenging.
In Figures A11-A13, the assessment measurement values are compared side by side per instance, between the maximum road capacity settings (6,7,8) and (2,3,4). In general, the trend formed by the measured values could be said to be roughly similar where occasionally the setting (2, 3, 4) would slightly increase the value of measurements. This is due to the consideration of the stochastic road capacity, which holds more influence for a tighter maximum road capacity setting. Roads with a maximum road capacity of two, for example, are more likely to be randomly sampled as zero during computation in comparison with roads with a maximum road capacity of six. Having one road perceived as unavailable for the offline computation would drastically deviate from the ideal optimal road configuration. This results in a slight difference in the value measured when compared to the results of the deterministic model. Here, it is also observed that an increase of vehicles for an instance with a tighter road capacity setting would also increase the total distance, especially for smaller instances. This is due to limited roads within the network (for smaller instances) that have a tighter capacity limit but need to accommodate a larger number of vehicles in order to reach the emergency shelters. For some vehicles, a direct or obvious route is not possible due to the capacity thus leading to a longer route detour resulting in an increase in total distance covered. Larger road networks, on the other hand, provided wider options for alternative routes even if a direct route is not possible.
Meanwhile, the counterparts of Figures A5-A10 (Figures A14-A19) are shown to observe the trends for all three instances, with increasing the amount of vehicles grouped by the model characteristics (D, S and SDY) for each maximum road capacity settings. Here, the general similarity of D and S model characteristics in terms of the results obtained for total distance, time travel as well as computation is even more apparent. Meanwhile, Figures A20-A22 are the counterparts of Figures A11-A13 where the trends obtained from Figures A14-A19 are put side by side based on their respective maximum road capacity settings. As discussed for the earlier figures, the MDVRPSRC-2S model tends to produce optimal assessment values similar to those of deterministic values when the stochastic sampling of random road capacity permits it. Otherwise, slight differences are observed, suggesting the limitation of ideal routes due to a tighter road capacity and thus road availability. It is also seen that the increase in the number of vehicles has close to no influence on decreasing the total distance travelled for the case of the two offline models. This is due to the flexibility of dispatching vehicles according to the total demand as well as the optimistic assumption that the vehicle dispatched will make it back to the depot.

Conclusions
This paper addressed the problem of Multi-Depot VRP with Stochastic Road Capacity during a post-disaster event for medical supplies' delivery. Apart from multi-depot, split delivery is allowed to account for limited homogeneous vehicles when disaster strikes. The D-MDVRPRC model represents the deterministic version of the problem. The MDVRPSRC-2S model is based on the two-stage SILP in dealing with stochastic road capacity. Both of these models are validated by offline computations using CPLEX. Moreover, in this work, the reduced models from MDVRPSRC-2S: MDVRPSRC-2S1 and MDVRPSRC-2S2 are presented for iterative application during the rollout approach in ADP to solve MDDVRPSRC online. Through solving MDVRPSRC-2S1 or MDVRPSRC-2S2 for each vehicle at each decision point in the rollout, the base policy for rollout is thus constructed on the go.
All proposed models and the novel construction of the base policy in the rollout are validated through the results presented. From the results, the differences in the idealistic assumptions of the offline model and the more realistic online model are shown based on the solutions computed. Additionally, the dynamic consideration of deteriorating stochastic road capacity, which leads to difficulties in road navigation, is shown in the online computation of the problem. Based on the presented work, a few future research directions are noted. MDVRPSRC-2S addressed the problem by incorporating a recourse rule, which is to either to stay at the current position or to proceed with the computed route if the road capacity is available for the vehicle. This problem model could be extended with the option of navigating to other destinations instead of staying at the current position when the optimal path is not available. However, the model would be progressively complex as the extended version may need to be addressed by multi-stage stochastic programming instead of adapting to the multiple possible changes of routes at each route failure. Furthermore, the results obtained from the proposed rollout are based on a single vehicle rollout. More research can be carried out to study the practicality of such an approach (constructive matheuristic rollout) for the case of multiple-vehicle rollout in terms of reduced model complexity as well as additional computation time. The proposed solution approach, on the other hand, is shown to produce good results for a short rollout horizon and a minimal Monte Carlo simulation for this specific problem.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: