Development of a Shipment Policy for Collection Centers

Natural disasters represent a latent threat for every country in the world. Due to climate change and other factors, statistics show that they continue to be on the rise. This situation presents a challenge for the communities and the humanitarian organizations to be better prepared and react faster to natural disasters. In some countries, in-kind donations represent a high percentage of the supply for the operations, which presents additional challenges. This research proposes a Markov Decision Process (MDP) model to resemble operations in collection centers, where in-kind donations are received, sorted, packed, and sent to the affected areas. The decision addressed is when to send a shipment considering the uncertainty of the donations’ supply and the demand, as well as the logistics costs and the penalty of unsatisfied demand. As a result of the MDP a Monotone Optimal Non-Decreasing Policy (MONDP) is proposed, which provides valuable insights for decision-makers within this field. Moreover, the necessary conditions to prove the existence of such MONDP are presented.


Introduction
In a world where natural and man-made disasters are on the rise in both number and intensity, the efficiency of humanitarian operations gains importance for preparing for and responding to the communities' needs. An estimated 80% of disaster relief operations involve making aid, food, and other resources available to the affected people in a timely and adequate manner [1], which is the very definition of logistics in the context of the disaster relief supply chain. An essential part of success in any supply chain is the adequate management of inventories, and of the policies that govern them, at every echelon. In the case of disaster relief supply chains, proper inventory control ranges from the collection centers to on-site distribution centers and the last-mile distribution points, and it can make a significant difference in attending to the affected communities in a timely and adequate fashion.
Classic inventory strategies and decision-making policies hardly adjust to the diverse conditions faced in a crisis such as properly responding to a natural disaster. This in turn challenges the researchers in the field of humanitarian logistics to develop customized policies, strategies, and models to efficiently manage the inventories of supplies in the specific case of emergency and disaster response [2,3].
In many countries, during the immediate aftermath of a disaster, some key inventory items are positioned at the origin of the relief supply chain through donations to the collection centers. For example, in Mexico, the percentage of in-kind donations received in the aftermath of a disaster can go up to 80% [4]. These donations range from human resources, housing, and medical supplies to food, clothing, and personal hygiene items. They can come from governments, the private sector, or citizens. Therefore, in-kind donations can be the backbone of relief operations since they constitute a large percentage of the available resources to work with. However, they also create a highly variable starting point for the disaster relief supply chain due to their arbitrary nature. Most of the research done in disaster relief operations assumes that the resources to be sent to the affected areas are in the quantity and nature required. Additionally, despite the importance of in-kind donations in some countries, little research has been conducted to address their efficient management and distribution. Donations that are addressed in the body of research on disaster relief operations are mostly financial.
Some researchers treat in-kind donations in disaster relief operations as a material convergence problem [5,6], while others study the psychological drivers of donating in cash or in-kind [7,8]. Only a few authors, such as Cook and Lodree [9], address this problem with uncertainty in both the in-kind donations and the demand. However, their research focuses solely on minimizing the unsatisfied demand. In this research, we seek to integrate the minimization of unsatisfied demand with the logistics costs that it entails.
Collection centers are usually installed by public organizations and private sectors to receive donations, and they can also function as a place of preparation, packaging, and transit of shipments. They may be located outside or within the affected region [10].
In this work, the operation of such collection centers is modeled with the aim of developing efficient policies for their decision-making process. Depending on the strength and impact of the disaster, organizations may evaluate the suitability of opening and running collection centers for a certain period of time. In this research, we address events with an impact large enough to call for nationwide aid collection. Hence, we assume that relief operations are being held at a national level and collection centers are positioned throughout the country.
Collection centers are confronted by the uncertainty in the quantity and nature of the in-kind donations that will be received daily and similarly, the quantity and nature of the aid that will be required in the affected areas. These two variables directly impact the operations conducted in the collection centers, and the decisions made regarding the size and frequency of the shipments, i.e., the dispatching policy of the donations to the affected areas. The research presented in this document addresses the creation of efficient decision-making policies to this end.
Making such a decision can become especially complicated when the costs of shipping and distribution are also added to the decision process. This decision is currently addressed by the stakeholders based mostly on their experience. They may send a shipment once the vehicle is full, which could take from one to several days to get to its destination, therefore, creating a lag in satisfying the demand. In other cases, the decision is made considering the demand alone, which can lead to making smaller shipments and ignoring the long-term cost implications.
The goal of this research was to develop a Markov Decision Process (MDP) model of the daily operations in collection centers and to use it to decide when to send a shipment to the affected areas based on the available inventory and the current accumulated demand, while considering the tradeoff between the implications of unsatisfied demand and the cost of frequent shipments. Such a model provides valuable insights to facilitate the analysis and, most importantly, decision-making among the key stakeholders involved in these operations.
The output of the MDP developed through this research is a policy or a sequence of decision rules where the decision, at any period of time, may depend on the available information on the system at the time [11]. This policy is derived for the developed MDP as a Monotone Optimal Non-Decreasing Policy (MONDP) after validating the necessary conditions for its existence. These conditions provide useful and valuable insights for decision makers in this field.
The remainder of this paper is organized as follows. Section 2 describes the details of the addressed problem. Section 3 presents the elements of the proposed Markov Decision Process Model. Section 4 states the conditions for the existence of a MONDP. In Section 5, these conditions are mathematically verified. Finally, Section 6 presents the conclusions as well as some future research.

Problem Description
This section presents the setting of the mathematical model including the problem statement, the notation used in the MDP model, and the sequence of events occurring in each period of time.

Problem Statement
When a natural disaster strikes and national assistance is required, different disaster relief protocols are put into place to respond to the emergency. One of them is to enable collection centers to receive in-kind donations. Figure 1 presents an example of a national disaster relief supply chain, based on our field observations working with the Mexican Red Cross operations. The dotted lines going from right to left represent the information flow along the supply chain from the affected areas to the donors. Information regarding the required goods is initially assessed and constantly updated by the onsite team and travels up to potential donors. Donors (presented on the left of Figure 1) receive the requirements and seek to provide the different products to cover the assessed needs. The products are received through the collection centers, consolidated, sorted, and packed to be shipped to the onsite distribution center to eventually be distributed to smaller warehouses or directly to the points of distribution.
The Mexican Red Cross requires specific items to be packaged into predesigned aid kits such as food, hygiene, house cleaning or baby care kits. These aid kits have been predefined to work for one standard family and last for a specific period of time.
Therefore, in our model, we assume that in each decision epoch a random number of donations is received in the form of aid kits, increasing the available inventory, while the accumulated demand is updated when a request is received for a certain number of aid kits from the affected areas. The key decision is whether a shipment should be sent in the current period or not, while the size of the shipment is naturally defined based on the current available inventory and the accumulated demand.

Notation
This section presents the notation to be used in the formulation and validation of the MDP. Details of this notation are presented in Table 1.

Table 1. Notation.
$D^*_t$: Optimal level of demand to send a shipment at the beginning of period $t$ (variable).
$p_t(I_{t+1} \mid (I_t, D_t), a_t)$: Probability of arriving at state $I_{t+1}$ given that the current state is $(I_t, D_t)$ and action $a_t$ was chosen (transition probability function).
$q_t(D_{t+1} \mid (I_t, D_t), a_t)$: Probability of arriving at state $D_{t+1}$ given that the current state is $(I_t, D_t)$ and action $a_t$ was chosen (transition probability function).
$P_t((I_{t+1}, D_{t+1}) \mid (I_t, D_t), a_t)$: Joint probability of arriving at state $(I_{t+1}, D_{t+1})$ given that the current state is $(I_t, D_t)$ and action $a_t$ was chosen (joint transition probability).
$r_t((I_t, D_t), a_t)$: Total cost incurred in period $t$ when inventory and demand are $(I_t, D_t)$, respectively, and action $a_t$ is chosen (cost function).

Figure 2 depicts the numbered sequence of events occurring in each discrete period of the decision process; a description of each event follows.

1. The $t$-th period starts with the information on the inventory on hand and the accumulated demand, $(I_t, D_t)$.
2. The size of the possible shipment, $Q_t$, is calculated from the available inventory and the accumulated demand up to period $t$, as given by Equation (1).
3. Once the size of the shipment is determined, the decision maker chooses an action ($a_t$): either to send a shipment of size $Q_t$ or not to send a shipment in that period.
4. After the decision is made, new donations ($i_t$) are received and the demand of that period ($d_t$) arrives before the end of the current period.
5. At the end of the day, the collection center closes; it opens the next day, starting a new period $t+1$.
6. At the start of period $t+1$, the available inventory ($I_{t+1}$) and accumulated demand ($D_{t+1}$) are updated with the information of the previous period, adding the new donations and demand and subtracting the sent donations. If the action taken was not to send, $Q_t$ is zero. The inventory update is given by Equation (2), and the accumulated demand is updated analogously.
7. With this updated information, the shipment decision is made as in period $t$; the same sequence applies for any $t$.
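The sequence of events above can be sketched as a one-period state update. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `step` and its argument names are hypothetical, and clamping the accumulated demand at zero when a shipment exceeds it is an assumption made here for the sketch.

```python
from typing import Tuple


def step(inventory: int, demand: int, send: bool, shipment_size: int,
         donations: int, new_demand: int) -> Tuple[int, int]:
    """Advance the state (I_t, D_t) to (I_{t+1}, D_{t+1}) for one period."""
    if shipment_size > inventory:
        raise ValueError("a shipment cannot exceed the inventory on hand")
    # Q_t counts as zero when the action is not to send.
    shipped = shipment_size if send else 0
    # Inventory: what was left after the shipment plus the donations received.
    next_inventory = inventory - shipped + donations
    # Assumption: unsatisfied demand cannot become negative when the
    # shipment is larger than the demand accumulated so far.
    next_demand = max(demand + new_demand - shipped, 0)
    return next_inventory, next_demand
```

For instance, `step(10, 6, True, 6, 3, 2)` ships six kits, then adds three donated kits and two newly demanded kits to the remaining state.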

Markov Decision Process Model
This section presents the Markov Decision Process Model proposed for the collection center operations. It starts with the assumptions under which the model was developed and continues with the five elements of the MDP model, as well as the value function to be minimized.

1. The problem is set over a finite horizon of length $T$ in discrete time. The value of $T$ is determined by the decision-maker based on several factors such as the type of disaster, the scope of the impact on the affected population, etc.
2. Time is discrete and each of the time epochs or stages ($t$) can represent one working day or any period that fits the scenario.
3. The available inventory ($I_t$) and the accumulated demand ($D_t$) are assumed to be Markovian random variables: their values depend only on the previous inventory, the previous demand, and the action ($a_t$) taken in the previous period. This is a reasonable assumption, given the nature and dynamics of inventory management and shipments made under these kinds of events. New inventory levels will depend on random donations and shipments that took place in the previous period.
4. The total number of families in the affected area ($N$) is known at the beginning of the emergency, although not all of them are necessarily in need of rescue or response. This figure can be obtained from the government's population census, which organizations can access freely.
5. The affected population that generates the demand of the collection center can be modelled as a binomial random variable ($q_t$), since each family demands a kit independently of the situation faced by the other affected families [12,13]. This implies that each of the $N$ families demands a kit in period $t$ with a certain probability ($s_t$).
6. The average number of donors ($\mu$) and the average amount of donations per donor ($\lambda$) are defined based on the type of disaster that took place and the specific characteristics of the affected population.
7. The approximate proportion of the families that will demand a kit in period $t$ ($s_t$), and the behavior of this proportion throughout the time horizon, can be estimated by the organization. According to the Florida Post-Disaster Redevelopment Planning Guidebook (2010), this value depends mostly on two factors: the socioeconomic status of the affected region (i.e., if the socioeconomic level is low, the demand will be higher) and the impact of the disaster.
8. The number of kits received during a period $t$ ($i_t$) is modeled as a Compound Poisson Process, since the number of donors arriving in each period is random, as is the amount of donations each will provide.
9. The initial inventory is zero since all the prepositioned products are assumed to be sent in the initial response phase.
10. The shipments are made for national disasters; therefore, the shipping time is considered to be one day, and it is assumed that the shipment arrives complete.
11. The transition probability function for demand, $q_t(d_t \mid (I_t, D_t), a_t)$, has an increasing failure rate (IFR). In the context of our problem, this means that the higher the demand, the more probable it is that the demand will keep increasing.
12. The shipment cannot be much larger than the current demand, because a surplus of supplies can compromise their integrity since they are exposed to theft and damage.
13. Limitations in transport capacity are not considered in this MDP. According to interviews carried out with the leader of the Mexican Red Cross, transport capacity has not been a binding constraint: the organization not only has its own truck fleet but also has agreements with the government and some multinationals, such as Walmart, to assist with their infrastructure if necessary.
It is also worth mentioning that, considering the operations of the Mexican Red Cross, standard kits are designed according to the population's basic needs, considering the average family size and eating habits of the country. The design and use of kits happens also at an international level with the International Federation of the Red Cross. There may be other items demanded beyond these kits, in which supply, including donations, and demand may be handled by other organizations supporting the affected population.

Decision Epochs
Our decision epochs are assumed to be discrete stages where the decision is made at the beginning of each epoch t and T − 1 is the last epoch when a decision is made. Therefore, the last shipment would be sent in T − 1 if the decision for that last epoch dictates so.

State Variables
The state of the system in each period t is defined by the tuple (I t , D t ). I t describes the inventory on hand at stage t which is suitable to be sent to the affected area.
This variable carries the accumulated donations from period 1 minus the quantity of kits sent in the previous periods and is computed as presented in Equation (4), which is equivalent to Equation (2) introduced previously.
The second variable, $D_t$, describes the cumulative unsatisfied demand (in number of kits) in the affected area at stage $t$. The ordered elements of $D_t$ are represented as:
This variable carries the cumulative demand from the beginning minus the number of kits that have already been shipped in the previous periods and is computed as follows:
The states of the system are grouped by the level of inventory and are partially ordered according to the accumulated demand.

Actions
The goal of the MDP is to decide whether to send a shipment in period t. This decision is made at the beginning of the period and has the following two possible alternatives.
The action a t = {0} represents the decision of not sending a shipment in the current period. This implies that the available inventory will continue to be stocked in the collection center for one more period and the demand will be accumulated for one more period, as well.
The action a t = {1} represents sending a shipment of size Q t in the current period. The size of the shipment depends on the current available inventory and the current demand. If the available inventory is less than the accumulated demand, the shipment will be the size of the inventory trying to satisfy as much demand with what is available. On the other hand, if the inventory is greater than the demand, the shipment will be the size of the accumulated demand plus the expected demand of the next period. The size of the shipment is previously presented in Equation (1).
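The shipment-size rule described above can be sketched as follows. This is a hedged illustration, not the paper's Equation (1): the function and parameter names are hypothetical, the expected demand of the next period is passed in as an integer number of kits, and two details are assumptions made here (the shipment is capped at the inventory on hand, and the tie $I_t = D_t$ is treated as sending the whole inventory).

```python
def shipment_size(inventory: int, demand: int, expected_next_demand: int) -> int:
    """Size of the candidate shipment Q_t for the state (I_t, D_t)."""
    if inventory <= demand:
        # Not enough stock to cover the backlog: send everything available.
        return inventory
    # Enough stock: cover the backlog plus the expected demand of the next
    # period, but never more than what is physically on hand (assumption).
    return min(inventory, demand + expected_next_demand)
```

Capping at the inventory keeps the rule well defined when the stock lies between the accumulated demand and the accumulated demand plus the expected demand, a case the text does not spell out.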

Transition Probabilities
The transition probabilities for this model are composed of two independent probabilities since the model includes two state variables: The available inventory (I t ) and the accumulated demand (D t ). These two variables have different and independent behaviors, therefore are modelled separately and then a joint probability function is presented.

Donations Probability Function
$p_t(I_{t+1} \mid (I_t, D_t), a_t)$ is the probability of reaching the state of $I_{t+1}$ donations, given that the current state is $(I_t, D_t)$ and action $a_t$ is taken.
As presented in assumption 8, the value of the received donations in period t (i t ) is modeled as a Compound Poisson Process. The value of i t directly impacts the state variable of the available inventory for the next period (I t+1 ). Therefore, the state space for the next period depends on the current state, the action taken, and the donations received.
In the case of choosing action 0, not sending a shipment in that period, the possible future states depend exclusively on the number of donations received ($i_t$). Hence, the future states will have the current amount plus the received donations ($I_t + i_t$). The possible states range from receiving no donations whatsoever, so that the inventory level stays at $I_t$, to receiving enough donations to fill the collection center's capacity ($W$).
For the case of choosing action 1, sending a shipment of size Q t , the possible future states will depend on the size of the shipment made (Q t ) and the quantity of donations received (i t ). The future states could range from the remaining inventory after the shipment (I t − Q t ), assuming there are no additional donations, to the full capacity of the collection center, considering that they received W − (I t − Q t ).
Since in both cases there is a limited number of feasible states in the state space, the traditional Compound Poisson Distribution is adjusted to its conditional probability by dividing it by the probabilities of the feasible states.
The probability function is computed as follows: For a t = {0}: For a t = {1}:
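The truncation described above (dividing the compound Poisson probability by the total probability of the feasible states) can be sketched numerically. This is an illustration under assumptions that are not in the source: the number of kits per donor is taken to be Poisson with mean `lam`, the compound distribution is evaluated with the Panjer recursion (a standard technique swapped in here for convenience), and all function names are hypothetical.

```python
import math
from typing import Dict, List


def poisson_pmf(k: int, rate: float) -> float:
    """P(K = k) for K ~ Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def compound_poisson_pmf(max_kits: int, mu: float, lam: float) -> List[float]:
    """P(i_t = k) for k = 0..max_kits using the Panjer recursion.

    mu:  mean number of donors per period.
    lam: mean kits per donor (assumed Poisson distributed in this sketch).
    """
    f = [poisson_pmf(j, lam) for j in range(max_kits + 1)]
    g = [math.exp(-mu * (1.0 - f[0]))]  # probability of receiving zero kits
    for k in range(1, max_kits + 1):
        g.append(mu / k * sum(j * f[j] * g[k - j] for j in range(1, k + 1)))
    return g


def inventory_transition(inventory: int, shipped: int, capacity: int,
                         mu: float, lam: float) -> Dict[int, float]:
    """Map each feasible I_{t+1} to its conditional probability."""
    base = inventory - shipped   # stock left after the (possibly zero) shipment
    room = capacity - base       # largest donation the center can absorb
    g = compound_poisson_pmf(room, mu, lam)
    total = sum(g)               # probability mass of the feasible states
    return {base + k: g[k] / total for k in range(room + 1)}
```

Passing `shipped=0` corresponds to action 0 and `shipped=Q_t` to action 1, so a single function covers both cases of the truncated distribution.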

Demand Probability Function
$q_t(D_{t+1} \mid (I_t, D_t), a_t)$ is the probability of arriving at state $D_{t+1}$ kits demanded, given that the current state is $(I_t, D_t)$ and action $a_t$ was chosen.
As presented in assumption 5, the value of the demanded kits in period $t$ ($d_t$) is modeled as a binomial random variable, where $s_t$ represents the probability of a family requesting a kit at time $t$. The value of $d_t$ directly impacts the state variable of the accumulated demand for the next period ($D_{t+1}$). Therefore, the state space for the next period depends on the current state, the action taken, and the demand received.
Similar to the case of available inventory, the feasible future states are determined by the current state, the action taken, and the demand in period t. For the case of choosing action 0, the demand will continue increasing and the possible states will range from the current demand (considering that no more kits were demanded during that period) to the total number of families in the affected area (N), assuming each would need a kit.
In the case of action 1 (i.e., sending a shipment of size Q t ), the possible future states are divided in two categories that depend directly on the size of the shipment: (a) If the shipment sent is greater than the accumulated demand in period t or (b) if the shipment is less than or equal to the accumulated demand in period t.
In case (a), all the demand will be satisfied completely and there will be extra inventory after satisfying the demand. Therefore, the possible future state can go from having zero accumulated demand, considering that the demand in the next period is less than or equal to the remaining shipment, to having the maximum demand possible. The maximum demand possible is the total number of families N minus the number of families that have been served with the shipment.
In case (b), the demand was not completely satisfied. Therefore, the future states range from staying at the same level of demand, i.e., $d_t$ equal to zero, to having all the families that were not served with $Q_t$ demanding kits.
The probability distribution function is computed as: For a t = {0}: For a t = {1}: For case a: For case b:

Joint Transition Probability Function
$P_t((I_{t+1}, D_{t+1}) \mid (I_t, D_t), a_t)$ represents the joint probability of having a certain amount of demand and donations given the current state and the action taken. This is the product of the independent probabilities $p_t(i_t \mid (I_t, D_t), a_t)$ and $q_t(d_t \mid (I_t, D_t), a_t)$, as follows:

$$P_t((I_{t+1}, D_{t+1}) \mid (I_t, D_t), a_t) = p_t(i_t \mid (I_t, D_t), a_t) \cdot q_t(d_t \mid (I_t, D_t), a_t)$$

Cost Function
$r_t((I_t, D_t), a_t)$ is the total immediate cost incurred in period $t$ when action $a_t$ is taken. The cost function for action 1 is formed by the cost of the unsatisfied demand, if any, after the shipment is delivered (first term); the fixed and variable costs of sending a shipment (second and third terms); and the cost of holding the rest of the inventory, if any (last term).
On the other hand, the cost function for action 0 is formed by the cost of unsatisfied accumulated demand in the first term and the holding cost of the available inventory at the collection center.
r T (I T , D T ) is the immediate cost incurred in the final period, when no decision is made.
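The structure of the immediate cost for the two actions can be sketched as below. This is a hedged illustration of the terms named in the text, not the paper's Equation (7): the parameter names in `CostParams`, the linear form of every term, and the function name are assumptions introduced here. The terminal cost is left out because its exact form is not described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    """Hypothetical cost parameters (names are not from the source)."""
    penalty: float   # cost per kit of unsatisfied demand
    fixed: float     # fixed cost of sending a shipment
    variable: float  # cost per kit shipped
    holding: float   # cost per kit held at the collection center


def immediate_cost(inventory: int, demand: int, action: int,
                   shipment: int, c: CostParams) -> float:
    """Immediate cost r_t((I_t, D_t), a_t) for action 0 (hold) or 1 (send)."""
    if action == 0:
        # Unsatisfied accumulated demand plus holding of the whole inventory.
        return c.penalty * demand + c.holding * inventory
    unmet = max(demand - shipment, 0)  # demand still uncovered after shipping
    return (c.penalty * unmet                       # unsatisfied demand
            + c.fixed                               # fixed shipping cost
            + c.variable * shipment                 # variable shipping cost
            + c.holding * (inventory - shipment))   # holding of what remains
```

Comparing the two branches for the same state makes the tradeoff explicit: sending removes penalty and holding terms but adds the fixed and variable shipping costs.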

Value Function
This problem can be formulated as the following optimality equations:
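A generic backward-induction routine illustrates how optimality equations of this kind are typically solved. The sketch assumes the standard finite-horizon, cost-minimizing form, in which the terminal value equals the terminal cost and each earlier value is the minimum over actions of the immediate cost plus the expected value of the next state. The function signature and argument names are hypothetical, and the callables `cost`, `terminal_cost`, and `transition` are placeholders to be supplied by the user.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable


def backward_induction(
    states: Sequence[State],
    actions: Sequence[int],
    horizon: int,
    cost: Callable[[int, State, int], float],
    terminal_cost: Callable[[State], float],
    transition: Callable[[int, State, int], Dict[State, float]],
) -> Tuple[Dict[State, float], List[Dict[State, int]]]:
    """Solve a finite-horizon cost-minimizing MDP by backward induction.

    Returns the value function at t = 1 and one decision rule per epoch
    t = 1, ..., horizon - 1 (policy[0] is the rule for t = 1).
    """
    value = {s: terminal_cost(s) for s in states}  # u_T(s)
    policy: List[Dict[State, int]] = []
    for t in range(horizon - 1, 0, -1):            # t = T-1, ..., 1
        new_value: Dict[State, float] = {}
        rule: Dict[State, int] = {}
        for s in states:
            best_action, best_cost = actions[0], float("inf")
            for a in actions:
                expected = sum(p * value[s2]
                               for s2, p in transition(t, s, a).items())
                total = cost(t, s, a) + expected
                if total < best_cost:              # ties keep the earlier action
                    best_action, best_cost = a, total
            new_value[s], rule[s] = best_cost, best_action
        value = new_value
        policy.insert(0, rule)
    return value, policy
```

In the collection center setting, each state would be a pair of inventory and accumulated demand, and `transition` would return the joint transition probabilities for the chosen action.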

Monotone Optimal Non-Decreasing Policy
To establish the set of conditions to assure that a Monotone Optimal Nondecreasing Policy (MONDP) exists, a physical interpretation and a natural ordering of the states is necessary. In this case, the states are ordered by the accumulated unsatisfied demand (D t ), i.e., a larger size of unsatisfied demand represents a higher level in the states of the system. Figure 3 visually represents the behavior of the partially ordered groups. The groups are formed according to the current level of inventory, since the availability of inventory determines the maximum shipment size to cover the demand. Therefore, it depends on the level of the demand if the shipment is sent in that period.
Each group of states $(I_t, D_t)$ is partially ordered according to the following criteria:
• At each period of time $t$, a group of states is generated for each value of $I_t$. They are defined as I…
• Each group of states generated at time $t$ has a logical order according to the levels of the accumulated unsatisfied demand $D_t$.
In the MONDP, the threshold where the action of "not sending" changes to "send" is represented by a control limit with the following structure, where $D^*_t$ represents the demand level from which it is optimal to start sending:

$$a^*_t(I_t, D_t) = \begin{cases} 0, & D_t < D^*_t \\ 1, & D_t \ge D^*_t \end{cases}$$

Theorem 4.7.4 of Puterman [14] provides conditions under which there exist monotone nondecreasing optimal policies in all the states for $t = 1, \ldots, T-1$. These conditions, presented in the context of our problem, are:

5. $u_T(I_T, D_T)$ is non-decreasing in $(I_T, D_T)$.
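Given a decision rule for one inventory group, the control limit can be read off directly once monotonicity is confirmed. The helper below is a small sketch with a hypothetical name and input layout: it takes the actions listed by increasing accumulated demand, starting at demand level zero, and is not part of the original model.

```python
from typing import Optional, Sequence


def control_limit(actions_by_demand: Sequence[int]) -> Optional[int]:
    """Return the control limit D* of a monotone 0/1 decision rule.

    actions_by_demand[d] is the action (0 = hold, 1 = send) chosen when the
    accumulated demand equals d, for a fixed inventory level.
    Returns None when the rule never sends; raises ValueError when the rule
    is not monotone nondecreasing in the demand level.
    """
    for lower, upper in zip(actions_by_demand, actions_by_demand[1:]):
        if lower > upper:
            raise ValueError("decision rule is not monotone nondecreasing")
    for level, action in enumerate(actions_by_demand):
        if action == 1:
            return level  # first demand level at which sending is chosen
    return None
```

A rule produced by solving the MDP numerically can be passed through this check to verify that it has the threshold structure before reporting a single demand level to decision makers.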

Mathematical Verification for the MOND Policy
For a setting to present this monotone behavior, it is necessary but not sufficient to meet the conditions mentioned in the previous section. The setting of the problem must meet the model parameters that fit the policy, as well.
The mathematical proof of the five conditions and the lemmas that sustain them are presented in this section.

Condition 1
Equation (8a) is non-decreasing in $D_t$ for each level of inventory $I_t$ and action $a = 0$. This implies that the cost incurred in the collection center increases with the accumulated unsatisfied demand for any fixed level of inventory.
For a = 0 and for all I t This condition implies that the larger the accumulated demand D t (in number of kits), under the decision of not sending, the higher the costs will be.
This reduces to: This inequality stands since r t (I t , D t ) < r t (I t , D t + 1).

Condition 2
Condition 2 states that $q_t[D_{t+1} = D_l \mid (I_t, D_t), a_t]$ is a non-decreasing function of $D_t$ for all $D_l$ and $a_t \in \{0, 1\}$. In the context of the problem, this implies that a higher accumulated demand in the next period is more probable when the current accumulated demand is high.
This condition is stated as follows:
In this instance, there are three possible cases:
Case 1: $D_l < D_t$ for $a_t = 0$.
Case 2: $D_t \le D_l \le D_t + i$ for $a_t = 0$.
Case 3: $D_t + i < D_l$ for $a_t = 0$.

Case 1
In the first case, the accumulated unsatisfied demand in t−1 is greater than D l for both sides of the inequality. The probability of the accumulated unsatisfied demand being higher than or equal to D t is 1. Therefore, the inequality is satisfied as an equality.

Case 2
In the second case, the accumulated unsatisfied demand is higher than $D_l$ on the right-hand side of the inequality, so the probability of having at least $D_l$ in the next period is 1. Therefore, the inequality corresponding to case 2 is satisfied independently of the value of the probability term on the left-hand side, since its maximum value is 1.

Case 3
The condition can be stated as:
It can be rewritten as:
Equation (11) represents the comparison between two random variables $X$ and $Y$ as defined in Lemma 1 as:
This inequality is proved through Lemma 1. With this, Condition 2 is verified as one of the conditions for the existence of a Monotone Optimal Nondecreasing Policy.

Proof of Lemma 1
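Recall that $X_n$ is a binomial random variable with parameters $n$ and $p$, where $n = 1, 2, \ldots$ and $0 < p < 1$. Its expected value is $\mu_n = E[X_n] = np$. Then, for any $n$, any $m = 1, 2, \ldots, n-1$, and any integer $l$:

$$P(X_n > l) \ge P(X_m > l)$$

Proof: Consider $P(X_n > l) - P(X_{n-1} > l)$, where $l$ is an integer. Writing $X_n$ as $X_{n-1}$ plus one additional independent Bernoulli trial with success probability $p$, the event $X_n > l$ gains probability only when $X_{n-1} = l$ and the additional trial succeeds. Therefore:

$$P(X_n > l) - P(X_{n-1} > l) = p \, P(X_{n-1} = l) \ge 0$$

This implies that the probability of experiencing more than $l$ successes does not decrease, and strictly increases for $0 \le l \le n-1$, when an additional experiment is added to the sequence of Bernoulli trials. Applying this relation repeatedly from $m$ up to $n$ gives:

$$P(X_n > l) \ge P(X_{n-1} > l) \ge \cdots \ge P(X_m > l)$$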
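The monotonicity claimed in Lemma 1 can be checked numerically for specific parameter values. The sketch below is only a sanity check, not a substitute for the proof; the function names are hypothetical and the tolerance guards against floating-point rounding.

```python
import math


def binomial_tail(n: int, p: float, l: int) -> float:
    """P(X_n > l) for X_n ~ Binomial(n, p)."""
    start = max(l + 1, 0)  # successes strictly greater than l
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(start, n + 1))


def tail_is_nondecreasing_in_n(p: float, max_n: int, tol: float = 1e-12) -> bool:
    """Check P(X_n > l) >= P(X_{n-1} > l) for n = 2..max_n and all relevant l."""
    return all(
        binomial_tail(n, p, l) + tol >= binomial_tail(n - 1, p, l)
        for n in range(2, max_n + 1)
        for l in range(-1, n + 1)
    )
```

The same helper also confirms the one-step identity behind the lemma, since the difference between consecutive tails equals $p$ times the probability of exactly $l$ successes in $n-1$ trials.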

Condition 3
Condition 3 requires Equation (8a) to be a superadditive function. Puterman states that this happens when we have partially ordered sets $X$ and $Y$ and $g(x, y)$ is a real-valued function on $X \times Y$. A superadditive function then satisfies the following inequality:
To demonstrate that Equation (8a) is superadditive, it must be shown that both Equation (7a) and $\sum_{j=0}^{N} P_t\big((I_{t+1}, D^{j}_{t+1}) \mid (I_t, D_t), a_t\big) \times u^*_{t+1}(I_{t+1}, D_{t+1})$ are superadditive. Proposition 1 demonstrates that the cost function is superadditive.

Proposition 1
The function $r_t((I_t, D_t), a_t)$ is superadditive. This is represented as:

$$r_t((I_t, D_t), 1) + r_t((I_t, D_t + 1), 0) \ge r_t((I_t, D_t), 0) + r_t((I_t, D_t + 1), 1)$$

Proof: This instance presents two possible cases:
This statement is true, therefore, the condition is met.
Case 2: For A and B. Proof by Lemma 2 … which is superadditive by the previously stated definition and the assumed grouping and ordering of states and actions, thus completing the proof.

Proof of Lemma 2 (Adapted from [15]): Let $H$ be an IFR transition probability matrix and $V(h)$ be a nondecreasing function. Then, the following holds:
Since the value of the demand does not depend on the on-hand inventory and it does not change for each inventory group, this condition is satisfied as an equality.

Condition 5
Equation (7c) is non-decreasing in $(I_T, D_T)$. This implies that the final cost is greater when the accumulated demand in $T$ is greater.

Conclusions
Natural disasters present a latent threat for every country in the world, challenging the communities and the humanitarian organizations to be better prepared and react faster to the situation. Along with the government and the humanitarian organizations, citizens also either volunteer or are asked to help with their donations.
In many countries, including emerging economies such as Mexico, around 80% of donations are in-kind. Therefore, the efficient management of in-kind donations has become a key element of disaster relief operations in such countries. Other countries, especially developing economies, present similar donation behavior and are, in consequence, suitable for implementing the insights and recommendations presented in this research.
With the importance of in-kind donations comes the importance of collection centers, where these donations are received, sorted, packed, and shipped to the affected areas. An important decision is when to make the shipment, considering the tradeoff between resource limitations and the urgency of aid. This tradeoff is addressed by developing a decision process model.
Within their operations, collection centers face a high level of uncertainty in demand as well as in supply, since the latter depends mainly on the donations of companies or the community. This complex problem was modeled with a Markov Decision Process to address the uncertainty and complexity of the decision-making process. A Monotone Optimal Non-Decreasing Policy is developed for the use of decision makers.
The existence of such a policy is mathematically proved by five conditions and the verification of such conditions is presented throughout this paper. The proof of these conditions and the MONDP represents a valuable insight for decision makers in humanitarian operations since it helps in making better decisions in times of crisis.
Among the limitations of this work, which present useful avenues for future research, are the parameters and assumptions considered to estimate the number of donors and the average amount of donations per donor. Moreover, the estimate of the proportion of families that may demand a kit at a particular point in time can be further improved by considering other factors beyond the socioeconomic status of the impacted region. In a similar manner, the number of kits received during a particular period can be modeled with processes other than a Compound Poisson Process to explore the system dynamics under such conditions. This research can motivate further research in inventory management for disaster relief situations, especially considering the particularities of in-kind donations, such as variability and variety. Moreover, linking the model with humanitarian organizations and real-time data presents an important line of research for future applications of this model.