Integrated Inventory Transshipment and Missing-Data Treatment Using Improved Imputation-Level Adjustment for Efficient Cross-Filling

Abstract: This research investigates an integrated problem of transshipment for cross-filling and imputation for missing demand data. Transshipment for cross-filling has proved effective in mitigating shortages with relatively low inventory, thus reducing resource consumption in inventory management. Although accurate demand data are critical for cross-filling decision making, some demand data are inevitably incomplete. These missing data should be treated for effective transshipment operations. Despite their importance, these missing data issues have not been adequately studied for transshipment problems. This paper addresses how transshipment can be conducted under missing demand conditions. A novel integrated problem is established to combine demand-data imputation processes and transshipment decisions. Imputation strategies and new algorithms suitable for transshipment are developed to handle missing demand data. Diverse demand and transshipment cases are analyzed for cost-effectiveness. The analysis shows that conventional straightforward imputation methods result in inferior transshipment decisions. The study also reveals that imputed values should be adjusted to appropriate levels for transshipment to be effective. A strong interplay between imputation processes and shortage prevention is also found for transshipment with missing demand. This study demonstrates how inventory transshipment can be carried out successfully with appropriate treatment of missing demand data in practice.


Introduction
This paper investigates a novel problem of integrating inventory transshipment for conducting cross-filling among distribution centers and data imputation for handling incomplete demand data in transshipment. Although accurate demand data are critical in cross-filling decision making, missing values in demand data are inevitable. These incomplete values should be handled properly to generate transshipment decisions, because transshipment decisions are strongly based on the demand data. However, conventional imputation methods often lead to incorrect transshipment decisions. In addition, the impact of inventory and imputation parameters on transshipment has not been fully investigated. It is also challenging to streamline statistical imputation processes and the transshipment optimization process involving vehicle routing and inventory management. This study presents an integrated approach to solve these problems.
Cross-filling among distribution centers (DCs) has demonstrated its effectiveness in preventing shortages with comparatively low inventory levels at DCs [1][2][3]. A shortage at DCs often leads to a shortage at retail stores, resulting in sales loss and customer dissatisfaction. A high inventory level often incurs additional storage and handling activities as well as high holding costs and a substantial risk of inventory obsolescence. These often result in great financial loss, energy consumption, greenhouse gas emissions, and waste disposal. This is especially true for cold chains or perishable items such as medicine or fresh food. A high inventory level is also undesirable for short-product-lifecycle items such as high-technology products. To avoid shortages without high inventory levels, cross-filling has often been used as an effective strategy. Cross-filling means transshipping product items from one DC to other DCs experiencing a shortage of the same product items. Figure 1 illustrates the transshipments among DCs and their routes. In Figure 1, DCs 1 to 4 cross-fill each other's inventory by transshipping. DC 5 does not participate in this specific transshipment, because shortage mitigation by transporting goods to or from DC 5 is probably not cost-effective. Transshipment (cross-filling) functions as a pooling strategy at the DC level to reduce the overall inventory levels for both financial and environmental sustainability. Accurate demand data are critical in cross-filling decision making and operations. All strategic and operational plans are based on demand data. In particular, for cross-filling, the quantity and route of transshipment are determined by the amount of demand to fill and the possible inventory and shortage. Thus, correct and timely demand data are indispensable for optimal transshipment operations.
Inevitable missing values in demand data should be handled properly for effective transshipment operations. Each DC receives orders from its retailers typically on a regular basis, e.g., daily. These demand data for a DC may contain missing values due to many reasons, such as monitoring failures or data-logging errors. A late rush order may arrive after an ordering time window has closed. These missing demand values should be treated properly for effective transshipment planning; simply ignoring them will lead to possible shortages and flawed transshipment routes. Thus, these missing demand data should be replaced with appropriate values to be included in optimal transshipment planning.
Although missing demand data strongly affect the quality of transshipment decisions, the missing data issues in the transshipment problem have not been thoroughly investigated. Previous studies conducted extensive research on transshipment policies or optimal transshipment routes. However, few of these studies have fully addressed how to handle missing demand data for transshipment. Preliminary studies revealed that conventional simple imputation resulted in high shortage penalties. In addition, this problem is not a simple trade-off between shortage cost and transshipment cost. Missing demand values and their replacements affect transshipment routes and amounts. Conversely, vehicle routes and inventory parameters affect how much the imputed values influence transshipment. Therefore, neither the imputation problem nor the transshipment problem should be solved separately. However, little research has investigated this interwoven problem of missing demand treatment, inventory transshipment, vehicle routing, and shortage prevention.
This study investigates how inventory transshipment can be carried out effectively under missing demand data conditions. In other words, the primary goal of this study is to answer the question of what the most effective inventory and imputation strategies are to improve transshipment decisions on the condition that some of the demand data are missing. First, this paper characterizes the problem of inventory transshipment with missing demand data. This research also compares different approaches to handling missing demand data. Several imputation methods are proposed to handle the missing data, considering different inventory strategies. This paper also demonstrates how the imputation methods can be incorporated into transshipment decisions and operations for cross-filling. In addition, this paper analyzes the qualitative and quantitative impact of the data missingness and imputation methods on the cost saving by transshipment.
One of the primary contributions of this study is to show how imputation approaches can be integrated into transshipment decisions and operations for cross-filling. The newly developed integrated approach combines a statistical imputation process and adjustment algorithm with transshipment decision procedures, thus overcoming the limitations of the current separated methods. This study also uncovers the characteristics of different demand imputation and adjustment methods for the application to inventory transshipment. The proper imputation levels and adjustment are also obtained to achieve cost-effective transshipment. In addition, this study provides managerial insights into the effect of missing data and imputation methods on the inventory transshipment.
The remainder of this paper is organized as follows: Section 2 reviews the related literature. Section 3 describes the transshipment model and treatment methods with missing demand data. Section 4 reports case studies and analyzes the results. The conclusions and future research directions are presented in Section 5.

Review of the State-of-the-Art Research
Transshipment, or cross-filling, has been considered an important inventory pooling strategy [1,3,4] for supply chains. Diverse transshipment problems have been studied, depending on inventory conditions and mathematical models [2]. Rim and Vu developed a genetic algorithm to solve transshipment vehicle routing with simultaneous pickup and delivery (TVRSPD) problems [1]. A mixed-integer program was suggested for a generalized transshipment problem [5]. These studies seek optimal transshipment routes and amounts in response to shortages caused by high demand.
The transshipment problems have been studied for a variety of aspects and properties. The impact of lateral transshipment on the bullwhip effect was investigated in a two-tiered supply chain system [6]. Heuristic policies were suggested for a multilocation system with combined transshipment and production control [7]. A problem of group configuration was investigated for within-groups transshipment [8]. DCs' behaviors were examined for their willingness to share all inventory in a complete network [9] or complete pooling condition [10]. A combined control policy for production and transshipment was proposed for two location firms [11]. A transshipment policy with two thresholds for a logistics service system was suggested [12]. Transshipment between two newsvendor retailers was investigated for the impact on the optimal inventory [13]. Heuristic algorithms were also proposed for a transshipment inventory-routing problem [14]. Other diverse aspects of supply chains have been described, including sustainability and new technologies in supply chain network design [15]. Other topics include closed-loop supply chains [16] and healthcare logistics and service networks [17].
Extensive research has been conducted to address demand uncertainty. Demand is not always completely observable, or inventory records may contain errors [18]. Firms cannot accurately forecast demand or inventory in a volatile market. The newsvendor problem with incomplete demand information has been considered for a long time [19]. A base-stock list-price policy was studied using K-approximate convexity with incomplete demand information [20]. Extensive studies related to demand uncertainty exist in the literature, e.g., a method based on order statistics [21], stocking policy based on bootstrapping [22], and demand rate uncertainty model using relative entropy [23]. Other examples include a transshipment policy described by demand uncertainty and transfer lead time fuzziness [24].
A missing data problem is one of the common real-world data uncertainty issues. For example, missing data appear in survey research due to inappropriate survey frames, as well as null or irrelevant responses [25]. Incomplete data also appear in sensor networks owing to power outages, integrity attacks, or bit errors in transmission [26,27]. Financial data also contain missing or partial data due to market closures for holidays, failure to collect the data within the prescribed time frame, or recording errors [28]. Missing data issues also occur in many other fields, such as industrial database processes [29], telecommunications and computer network management [30], and transportation management systems [31].
Missing data can be handled using various approaches. The arrays containing missing values may be deleted, or missing values may be imputed [32]. Numerous approaches to impute missing data exist, such as mean [33], regression [34], and multiple imputations [35]. Different machine learning approaches have been employed for forecasting and backcasting missing demand data due to improper database management systems [36]. The application of clustering and missing data imputation was examined for a large electricity demand dataset [37]. The mechanisms of missing data are often categorized into the following three types and considered accordingly for imputation: missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR) [38].
Although transshipment problems, data uncertainty, and missing-data treatment have been investigated extensively, none of the above studies have properly addressed the problem of transshipment under missing demand data. This study is performed to address this gap.

Transshipment Model and Missing-Data Treatment
Section 3.1 defines the problem of this paper: an integrated missing-data treatment and transshipment decision problem. Section 3.2 describes the steps of data imputation and transshipment. Section 3.3 presents the two stages of the missing data treatment approaches. Section 3.4 describes the combined mathematical representation of the problem.

Problem Definition and Assumptions
This section describes the transshipment processes and the treatment of missing demand data for the transshipment decisions. We assume that a central decision maker (headquarters, HQ) collects all information and makes a cross-filling decision. The HQ acquires the demand information for DC $i$ from the retailers located in the region served by DC $i$. A missing demand case occurs when a retailer does not provide demand data within the ordering time window. If a case involving missing demand data occurs for item $k$ at DC $i$, we set the indicator variable $M_{ik} = 1$. We assume that $n = \sum_{i} \sum_{k} M_{ik}$ demand values are missing. For planning transshipment, the missing demand data cases may be ignored, or the missing values are to be estimated (imputed).
This study assumes an inventory policy by which all DCs replenish daily up to a certain level ($U$), because it is known that such policies result in optimal long-term performance [39]. The order-up-to level for item $k$ at DC $i$ is set as $U_{ik} = \mu_{ik} + z_{\alpha} \sigma_{ik}$, where $z_{\alpha}$ is the z-score of the standard normal distribution corresponding to in-stock probability $\alpha$. In this paper, the in-stock probability is referred to as the service level, which is often defined as the probability of not stocking out during a time period [40]. We assume that the upstream suppliers (manufacturers) provide sufficient product items to cover the order-up-to level. Using the target in-stock probability, order-up-to level, and ignored or estimated demand, the expected inventory and shortage levels before transshipment ($I^-_{ik}$ and $S^-_{ik}$, respectively) are generated.
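The order-up-to level can be computed directly from the formula above. The following is a minimal sketch; the function name `order_up_to_level` and the sample values are illustrative assumptions and do not come from the paper's tables.

```python
from statistics import NormalDist


def order_up_to_level(mu: float, sigma: float, alpha: float) -> float:
    """Return U = mu + z_alpha * sigma for a target in-stock probability alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # z-score of the standard normal distribution at probability alpha
    z_alpha = NormalDist().inv_cdf(alpha)
    return mu + z_alpha * sigma


# Illustrative values only (assumed, not taken from the paper).
u_example = order_up_to_level(mu=55.0, sigma=16.5, alpha=0.85)
```

With $\alpha = 0.5$ the z-score is zero and the level reduces to the mean demand, which is a quick sanity check on the implementation.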
Using the inventory and shortage information as well as the demand data, the HQ plans the optimal transshipment among DCs. Then, the DCs carry out the transshipment planned by the HQ. The HQ also places a replenishment order to the upstream suppliers for the DCs. After the transshipment is completed, each DC delivers the product items to its retailers. Figure 2 illustrates the relationship in transshipment operations and information exchange. The main decisions for the transshipment include the imputation of missing demand values and planning of the optimal transshipment amounts and routes. Thus, this problem can be considered an integrated problem of statistical treatment of missing values and transshipment vehicle routing with simultaneous pickup and delivery (TVRSPD) [1,5].

Notations
The symbols used in this paper are described in Table 1.
Table 1. Notations.
$d_{ik}$: Demand data including both non-missing values ($d^c_{ik}$) and imputed ones ($d^m_{ik}$)
$D$: Matrix of demand data including both non-missing values ($d^c_{ik}$) and imputed ones ($d^m_{ik}$)
Decision variables
$X_{ijk}$: Quantity of product item $k$ to be transshipped from DC $i$ to $j$; non-negative integer
$N_{ij}$: Number of trucks used for transshipment between DCs $i$ and $j$, $1 \le i \ne j \le Y$; non-negative integer
$I^+_{ik}$, $S^+_{ik}$: Expected inventory and shortage, respectively, of product item $k$ at DC $i$ if transshipment is carried out; non-negative integer

Timeline of the Operations and Information
This section describes the timeline of the operations, information stream, and decisions for the inventory management with transshipment under the condition of missing demand data. The timeline clarifies the relationship among operations [1,5] and decisions as well as calculations of relevant values.
The timeline is organized in nine steps, as follows, and is illustrated in Figure 3. The time on the right-hand side of Figure 3 is only for illustrating relative times. The nine steps of the timeline are defined as follows.
Step 1: The previous day's replenishment order arrives at each DC from the upstream suppliers.
Step 2: Demand data for the next day are given for DC i from the retailers located in its region by late afternoon (e.g., 4pm).
Step 3: Missing data cases are handled.
(a) Identify the missingness ($M_{ik}$) of daily demand data ($d_{ik}$).
(b) Impute the incomplete demand data ($d^m_{ik}$) to obtain complete daily demand $D$.
Step 4: Expected inventory ($I^-$) and shortage ($S^-$) are evaluated for each product item at DC $i$ by assigning its on-hand inventory to the daily demand: for $d_{ik} > U_{ik}$, $I^-_{ik} = 0$ and $S^-_{ik} = d_{ik} - U_{ik}$; otherwise, $I^-_{ik} = U_{ik} - d_{ik}$ and $S^-_{ik} = 0$.
Step 5: Determine the optimal transshipment plan ($X_{ijk}$ and $N_{ij}$) and compute the expected $I^+_{ik}$ and $S^+_{ik}$ for the case in which transshipment is carried out.
Step 6: DC $i$ places a replenishment order to the upstream supplier so that the inventory level after replenishment on the next day becomes $U_{ik}$.
Step 7: Transshipments are carried out among DCs.
Step 8: The true value of the missing demand may become available.
Step 9: Deliveries from DCs to the retailers are carried out.
Go to Step 1 for the next day's operation.
Without loss of generality, this study assumes that the nine steps repeat daily. This time interval can be set to any length if necessary.
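Steps 3(a) and 4 of the timeline can be sketched in a few lines. This is a minimal illustration under stated assumptions: a missing demand value is represented by `None`, the on-hand inventory of each item equals its order-up-to level after the replenishment in Step 1, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def missing_indicator(demand: List[List[Optional[float]]]) -> List[List[int]]:
    """Step 3(a): M[i][k] = 1 where the demand of item k at DC i is missing."""
    return [[1 if d is None else 0 for d in row] for row in demand]


def expected_inventory_and_shortage(d: float, u: float) -> Tuple[float, float]:
    """Step 4: assign on-hand inventory u to demand d.

    Returns (I_minus, S_minus), the expected inventory and shortage
    before transshipment. Assumes on-hand inventory equals u.
    """
    if d > u:
        return 0.0, d - u   # demand exceeds stock: shortage, no leftover
    return u - d, 0.0       # stock covers demand: leftover, no shortage


# Number of missing values n = sum of all indicators (illustrative data).
m = missing_indicator([[52.0, None], [None, 61.0]])
n_missing = sum(sum(row) for row in m)
```

Exactly one of the two returned values is positive for any demand that differs from the order-up-to level, which is why a DC is treated either as a potential sender or as a shortage DC for a given item.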

Treatment of Missing Demand Data
This section describes the imputation strategies and algorithm used for missing demand data. If demand data are missing for item $k$ at DC $i$, then indicator variable $M_{ik}$ is set to 1. We assume that $n = \sum_{i} \sum_{k} M_{ik}$ demand values are missing or incomplete.
In our transshipment problem, missing data occur when retailers do not provide demand data within the ordering time window. Missing data are the result of external factors (lateness of demand information), indicating that the missing mechanism in this case is MAR.
The missing demand data cases can be handled in a variety of ways. They may be simply disregarded during transshipment planning. The missing values can be estimated and replaced (imputed) for a more effective transshipment planning. Without loss of generality, this study investigates four representative imputation methods corresponding to common protection levels against shortage.
The imputation process consists of two stages. The first stage is to set a temporary imputation value, depending on the shortage mitigation strategies. The second stage is to adjust the imputed value to a valid one.

First Stage of the Imputation Process: Value Selection and Calculation
As the first stage, the temporary imputed value $d^{m*}_{ik}$ is set using one of the methods described in this section. These imputation treatments are characterized by shortage protection levels. Facing missing demand data, a decision maker can select from diverse options. The decision maker may decide to disregard the missing values as if the demand does not exist (imputation with "0"). The decision maker may replace them with the average of the known data. The decision maker may impute them with higher values for better protection (reduced shortage).
Four main strategies and corresponding imputation methods are explained below:
(1) No protection (imputation with "0"): In this method, we replace the missing values at $M_{ik} = 1$ with zeros, such that $d^{m*}_{ik} = 0$. In other words, we assume that no demand exists if these demand values are not available from the downstream retailers. This imputation has a consequence: the calculated expected shortage becomes zero, because the inventory level at the time of the imputation calculation is positive. This imputation method is simple and easy to use. However, it carries the risk that an actual shortage case is designated as a no-shortage case.
(2) Low protection (imputation with average): In this method, the missing data are imputed with the mean of the known demand data. For example, for item $k$, the values from all the other DCs that have received demand data can be used to calculate the mean. Although positive demand values are used to impute the missing values, the resulting imputed values are usually lower than $U_{ik}$. Thus, similar to the zero-imputation method above, this imputation method also often leads to evaluating no or little expected shortage for the missing data points. Using this imputation method, some actual shortages may not be prepared for adequately.
(3) Medium protection: In this method, the missing values are imputed with values higher than $U_{ik}$ based on the target in-stock probability $\alpha$, i.e., $d^{m*}_{ik} = U_{ik}/\alpha$.
(4) High protection: Here, we assume that a missing data case has an underlying highly fluctuating demand, and a shortage is highly expected with the current on-hand inventory. Thus, the imputed values are set higher than those in the previous methods. For example, with the target in-stock probability $\alpha$ and fluctuation level $\beta$, the missing value can be imputed as $d^{m*}_{ik} = U_{ik}(1 + \beta)/\alpha$. Parameter $\beta$ can be determined using a certain ratio to the demand variability or in-stock probability. This imputation method incurs extra transshipment costs but allows more shortage cases to be taken care of.
These four imputation methods are representative cases of common inventory strategies in cross-filling and are chosen to reveal the efficacy of the imputation methods. Although numerous imputation methods exist, it is not possible that this single study covers all such cases. The details of these methods are illustrated with numerical examples in Section 4.
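The first-stage value selection can be sketched as a single function. This is a hedged illustration: the method labels (`"none"`, `"low"`, `"medium"`, `"high"`) and the function name are assumptions introduced here, the low-protection mean is taken over the known demands of the same item at the other DCs, and the medium- and high-protection formulas follow the expressions $U_{ik}/\alpha$ and $U_{ik}(1+\beta)/\alpha$ used in this paper.

```python
from statistics import mean
from typing import Sequence


def impute_first_stage(method: str, known_demands: Sequence[float], u: float,
                       alpha: float = 0.85, beta: float = 0.3) -> float:
    """Return the temporary imputed demand for one missing (DC, item) entry.

    known_demands: observed demands of the same item at the other DCs
    u:             order-up-to level of the entry being imputed
    alpha, beta:   in-stock probability and fluctuation level (defaults are
                   the values used in the numerical section)
    """
    if method == "none":      # (1) no protection: treat demand as zero
        return 0.0
    if method == "low":       # (2) low protection: mean of known demands
        return mean(known_demands)
    if method == "medium":    # (3) medium protection: U / alpha
        return u / alpha
    if method == "high":      # (4) high protection: U (1 + beta) / alpha
        return u * (1.0 + beta) / alpha
    raise ValueError(f"unknown imputation method: {method}")
```

For $0 < \alpha < 1$ and $\beta > 0$, the medium and high values always exceed $U_{ik}$, so both methods flag an expected shortage at the missing point, whereas the zero method never does.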

Second Stage of the Imputation Process: Adjustment and Validation
As the second stage, the temporary imputed value is adjusted to a valid one, if necessary. In particular, the imputed value needs to be limited below an upper bound so that the total expected inventory ($I^-$) is sufficient to cover the total shortage ($S^-$) of item $k$ at a DC.
The algorithm presented below finds such an upper bound and adjusts the imputed value not to exceed the bound. The flowchart of the algorithm is shown in Figure 4.
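As an illustration of the bounding idea only, and not a reproduction of the algorithm in Figure 4, the following sketch caps the imputed demands of a single item so that the total expected shortage does not exceed the total expected inventory across DCs. The greedy, index-ordered reduction and the function name `adjust_imputed` are assumptions made for this example.

```python
from typing import List


def adjust_imputed(demand: List[float], missing: List[int],
                   u: List[float]) -> List[float]:
    """Cap imputed demands of one item so that total S- <= total I-.

    demand[i]:  known or temporarily imputed demand at DC i
    missing[i]: 1 if demand[i] is an imputed value, 0 otherwise
    u[i]:       order-up-to level at DC i
    Assumption: the excess is removed greedily in DC index order, and an
    imputed value is never lowered below its order-up-to level.
    """
    adjusted = list(demand)
    total_inventory = sum(max(ui - di, 0.0) for di, ui in zip(adjusted, u))
    total_shortage = sum(max(di - ui, 0.0) for di, ui in zip(adjusted, u))
    excess = total_shortage - total_inventory
    for i, is_missing in enumerate(missing):
        if excess <= 0:
            break
        if is_missing:
            # Lowering an imputed demand that lies above u[i] reduces the
            # shortage one-for-one and leaves the total inventory unchanged.
            cut = min(max(adjusted[i] - u[i], 0.0), excess)
            adjusted[i] -= cut
            excess -= cut
    return adjusted
```

For example, with order-up-to levels of 70 at three DCs, known demands of 60 and 65, and an imputed demand of 107, the available inventory is 15 units while the imputed shortage is 37 units, so the sketch lowers the imputed value to 85.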

Calculation of Transshipment Quantity
Given the imputed demand data and the expected shortage among DCs, the transshipment plan among DCs is determined. Transshipment planning mainly involves the calculation of transshipping amounts and routes. This calculation itself is not the main concern of this study. Thus, various methods, such as those presented in [1,5], can be used, provided that the transshipping amounts and routes are correctly calculated. The following mathematical model is a combination of a mixed-integer program (MIP) by Rim and Jiang [5], selected for convenience, and the new constraints for processing the missing demand data (Equations (5) to (8)), as suggested in Section 3.3.
The integrated optimal transshipment and missing-data treatment is expressed as a mixed-integer program. The objective function in Equation (2) aims to minimize the total costs, including trucking (the first term on the right-hand side), handling (second term), resulting shortage (third term), and inventory holding (fourth term) costs. Note that the objective function can be extended to include various measures.
Equations (3) and (4) state that a DC can simultaneously ship out and ship in each product item through a series of DCs, up to at most the order-up-to level $U_{ik}$. Equation (3) expresses that the total number of items transshipped from location $i$ cannot exceed the maximum inventory amount restricted by the order-up-to level.
Equations (9) and (10) restrict the transshipment volumes not to exceed the trucking volume capacity in either direction between a pair of DCs. In Equation (9), $v_k$ represents the volume of product item $k$. The left-hand side expresses the total volume of all items. The right-hand side expresses the total transportation capacity between locations $i$ and $j$.
Equation (11) prohibits transshipment between DCs that are too far apart.
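The constraints described above can be checked for a candidate plan with a short routine. This is a sketch under stated assumptions rather than the model of [5]: the names `truck_capacity` and `max_distance` are hypothetical parameters, truck counts are treated per ordered pair $(i, j)$, and only the ship-out limit, the volume capacity, and the distance restriction are covered.

```python
from typing import Dict, List, Tuple


def check_plan(x: Dict[Tuple[int, int, int], int],
               n_trucks: Dict[Tuple[int, int], int],
               u: Dict[Tuple[int, int], float],
               volume: Dict[int, float],
               truck_capacity: float,
               distance: Dict[Tuple[int, int], float],
               max_distance: float) -> List[str]:
    """Return a list of violated constraints for plan x[(i, j, k)] = quantity."""
    violations: List[str] = []
    shipped_out: Dict[Tuple[int, int], int] = {}
    load: Dict[Tuple[int, int], float] = {}
    for (i, j, k), qty in x.items():
        shipped_out[(i, k)] = shipped_out.get((i, k), 0) + qty
        load[(i, j)] = load.get((i, j), 0.0) + volume[k] * qty
        # Distance restriction: no shipment between DCs that are too far apart.
        if qty > 0 and distance[(i, j)] > max_distance:
            violations.append(f"distance {i}->{j} for item {k}")
    # Ship-out limit: total shipped from DC i of item k cannot exceed U_ik.
    for (i, k), total in shipped_out.items():
        if total > u[(i, k)]:
            violations.append(f"ship-out DC {i} item {k}")
    # Volume capacity: total volume on (i, j) must fit in the assigned trucks.
    for (i, j), vol in load.items():
        if vol > truck_capacity * n_trucks.get((i, j), 0):
            violations.append(f"capacity {i}->{j}")
    return violations
```

A checker of this kind is useful for validating heuristic or near-optimal plans before their costs are compared with an exact solution.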

Numerical Analysis and Discussion
This section analyzes the imputation scenarios suggested in Section 3 using numerical simulations and discusses the impact of the imputation methods and managerial implications.

Parameters and Demand Data
To illustrate the characteristics of the problem and the suggested imputation approaches, this study investigates sets of cases with seven DCs and five inventory items. Although there can be many DCs and a DC may manage numerous inventory items, these case sets can represent the transshipment of critical items and key DCs for a supply chain. The size of these cases is also suitable for illustrating the effectiveness of different imputation approaches.
Realistic cost parameters for TVRSPD [1] were adopted to evaluate the transshipment performance under practical conditions. We use the representative cost factors listed in Table 2 to simulate the transshipment cases and the suggested data imputation approaches. The terms and symbols are defined in Section 3.1. Some parameters are changed for sensitivity analysis in later sections. Many cases with different parameter values are generated for statistical analysis, as shown in later sections. The demand of item $k$ at DC $i$ is assumed to follow a normal distribution $N(\mu_{ik}, \sigma^2_{ik})$, where $\mu_{ik}$ is generated from a uniform distribution over (50, 60). The normality assumption is reasonable for many practical cases. To analyze more generalized cases not restricted by a specific standard deviation value, demand variability is represented by its ratio to the mean, the coefficient of variation (CV). This paper considers demand variability that is not very high, meaning a CV not exceeding 0.5. Thus, the representative standard deviation $\sigma_{ik}$ is set as $c \cdot \mu_{ik} = 0.3\mu_{ik}$, where $c$ is the CV. Using these parameters, the demand values were simulated. The target in-stock probability is set as $\alpha = 0.85$. This specific probability was chosen based on the transshipment problem characteristics as well as our experience and preliminary studies. If the probability is one of the common service levels in industry, such as 95% or 99%, then transshipment is probably not necessary. This is because high safety stock levels virtually prevent shortages, leaving little opportunity for cross-filling. Recall that the primary purpose of transshipment is to reduce inventory levels. Thus, a slightly lower value among the common probabilities was chosen. In fact, we conducted preliminary studies with different probabilities, and the results verified the appropriateness of this choice.
This paper presents only such meaningful cases, and a broader analysis would be beyond the scope of this single paper. The corresponding order-up-to level is calculated as $U_{ik} = \mu_{ik} + \sigma_{ik} z_{\alpha}$. Subsequently, using $U_{ik}$ and the demand value, the expected inventory and shortage before transshipment ($I^-$ and $S^-$) are computed. In this numerical example, parameter $\beta$ for the high-protection imputation method is chosen to be equal to the CV of the demand ($\beta = c = 0.3$). This parameter choice enables us to examine the effect of an inventory increase comparable to the demand variability.
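The demand-generation setup described in this section can be sketched as follows. This is an assumed reconstruction, not the authors' simulation code: the function name, the fixed seed, and the truncation of negative demand draws at zero are choices made for this example.

```python
import random
from statistics import NormalDist
from typing import Dict, List


def simulate_case(num_dcs: int = 7, num_items: int = 5, cv: float = 0.3,
                  alpha: float = 0.85, seed: int = 0) -> List[List[Dict[str, float]]]:
    """Generate mean, standard deviation, order-up-to level, and demand per (DC, item)."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(alpha)
    case: List[List[Dict[str, float]]] = []
    for _ in range(num_dcs):
        row = []
        for _ in range(num_items):
            mu = rng.uniform(50.0, 60.0)   # mean demand drawn from (50, 60)
            sigma = cv * mu                # sigma = c * mu with c = 0.3
            u = mu + z_alpha * sigma       # order-up-to level
            # Assumption: negative draws are truncated at zero.
            d = max(0.0, rng.gauss(mu, sigma))
            row.append({"mu": mu, "sigma": sigma, "U": u, "demand": d})
        case.append(row)
    return case
```

Changing `alpha` or `cv` in the call reproduces the kind of parameter variation used for sensitivity analysis.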

Simulation with Imputation Scenarios
Although simulations were conducted for a large number of cases in this study, the analysis in this section presents a few important cases representing the main characteristics of the problem. Missing demand data can occur at three types of DCs, which include the shortage DCs and the DCs not on the transshipment route. In this section, we focus on the cases of missing data at the shortage DCs, since shortage prevention is the hallmark of cross-filling. The effect of missing data is minimal or irrelevant in the other two DC types. Table 3 summarizes the data used for the simulations and analyses. In Table 3, the rows represent DCs, and the columns represent inventory items. The shaded matrix elements indicate that these are on the transshipment routes. In the upper left corner of Table 3, the order-up-to levels are presented. Below the order-up-to levels, the original demand data are shown. Below these values, two matrices provide the values of estimated inventory ($I^-_{ik}$) and shortage ($S^-_{ik}$) before transshipment. In the matrix of the estimated shortage before transshipment ($S^-_{ik}$), elements $S^-_{51}$, $S^-_{12}$, $S^-_{34}$, and $S^-_{75}$ present shortage values, as shown at the bottom left of Table 3. Namely, in the original data without missing demand values, a shortage before transshipment is expected at these four locations. The remaining part of Table 3 indicates the location of the missing demand and presents the values imputed by the four methods and the corresponding expected inventory and shortage before transshipment. The simulation assumes that some of the demand data are missing at the shortage DCs. For $K = 5$ and $Y = 7$, three ($n = 3$) random missing values exist among those data points. In Table 3, the imputed data are indicated by the bold boxes. In the simulation, $M_{51}$, $M_{34}$, and $M_{75}$ are randomly chosen as missing points, as shown in Table 3.
For the original non-missing demand data case and the four imputed demand data cases, the optimal transshipment routes and amounts were determined. For these optimization problems, exact solutions were found using the commercial MIP optimization software CPLEX®. Because the problem sizes are not large, the solution times were short on ordinary workstations. If the problem size is large, other algorithms [1] could be used to find near-optimal solutions. The optimal costs were also computed. The comparison of these results is presented in the subsequent sections.

Result of Each Imputation Scenario
In this section, for each of the four imputation methods, the simulated results in Section 4.2 are analyzed in terms of shortage protection and cost. The differences are also summarized. Analyses of more diverse cases are shown in Section 4.4.

Zero Imputation
Consider the case of imputation with "0" in Table 3 (b). The imputation with "0" at the shortage DCs hides the true shortages. As a result, the true shortage cases are not considered in transshipment planning, and a shortage penalty is incurred when the true demand is realized. Although this imputation method is fast and easy, applying it has a negative effect on fulfillment and customer satisfaction. Thus, disregarding the missing demand data may not be a favorable approach if the shortage cost rate is high.

Low-Protection Imputation
The case of low-protection imputation is shown in Table 3 (c). The missing data points are imputed by the mean of the known demand values of item $k$ among the other DCs. After imputing the values, only one shortage, $S^-_{12} = 14$, is expected before transshipment. This imputation suggests no expected shortage at the missing data points, because the imputed demand values are lower than the order-up-to level $U_{ik}$. The true shortages at the missing data points remain unprotected after the transshipment. Thus, in the specific simulations presented in this section, the transshipment and shortage costs are not different from those of the method of imputation with zero (Section 4.3.1). The cases with different costs are presented in a later section. This average imputation approach may not be suitable for handling the missing data at the shortage DCs unless the shortage cost rate is low.

Medium-Protection Imputation
The case of the medium-protection imputation is shown in Table 3 (d). The missing demand data are imputed by values that are higher than $U_{ik}$ based on the target in-stock probability $\alpha$, i.e., $d^m_{ik} = U_{ik}/0.85$, where 0.85 is the target in-stock probability. The following two possible situations may occur in this imputation.
Situation 1: The estimated shortage before transshipment is higher than the actual amount. This triggers transshipping an extra unnecessary amount of inventory. Thus, an additional transshipment cost is incurred. In Table 3, the expected value of $S^-_{75}$ is 89 but the actual one is 79. A total of 10 extra units of shortage are to be covered, compared with the true value.
Situation 2: The estimated shortage before transshipment is lower than the actual amount. A shortage penalty occurs when the true demand is realized. In Table 3, even though shortages are expected at missing points $S^-_{51}$ and $S^-_{34}$, they are still 21 and 6 units fewer than the true values, respectively. These remaining shortages incur a shortage penalty of approximately $2800.
In general, this imputation method may cover significant shortages in the missing demand cases, unless the demand fluctuates significantly. However, this imputation cannot guarantee the prevention of all shortage cases.
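The two situations above can be illustrated with a minimal sketch. It assumes, for illustration only, that transshipment covers exactly the estimated shortage; the function name `compare_shortage` is hypothetical and not part of the study.

```python
from typing import Tuple


def compare_shortage(estimated: float, actual: float) -> Tuple[float, float]:
    """Return (extra_units, uncovered_units) for one DC-item pair.

    Assumption: transshipment delivers exactly the estimated shortage.
    extra_units     > 0 corresponds to Situation 1 (overestimate).
    uncovered_units > 0 corresponds to Situation 2 (underestimate).
    """
    extra_units = max(estimated - actual, 0)
    uncovered_units = max(actual - estimated, 0)
    return extra_units, uncovered_units


# Situation 1 from Table 3: estimated 89, actual 79 -> 10 extra units shipped.
situation_1 = compare_shortage(89, 79)
```

Only the uncovered units lead to a shortage penalty, whereas the extra units lead to an additional transshipment cost.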

High-Protection Imputation
The high-protection imputation case is shown in Table 3 (e). The missing points have the highest imputed values among the four imputation methods. The imputed values are set to $d^m_{ik} = U_{ik}(1 + 0.3)/0.85$. As a result, the calculated shortage before transshipment is mostly higher than the true one. An extra transshipment cost can be incurred if the expected shortage amount is higher than the true value. Even in this case, the cost is not high compared with the shortage penalty of the true shortage amount. Nor does this incur a large additional inventory holding cost, because the inventory level remains U until delivery to retailers commences.
Table 4 shows the transshipment cost, the actual shortage penalty cost when the true demand appears, and the computed total cost. No cost differences occur between the first two methods, because all the missing data cases are considered to have no shortages, as explained in Section 4.3.2. With the high-protection imputation, the transshipment cost is higher than those of the other methods because of the extra shortage amount it attempts to cover. However, this case results in the lowest total cost. Please note that the decision in this study is not simply a comparison of the shortage cost and transportation cost. Because vehicle routing is involved and imputation introduces uncertainty, cost alone cannot determine which imputation method is better. Further details are explained in Sections 4.4 and 4.5.
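The four imputation levels of this section can be summarized in a minimal sketch. The function names and the `level` labels are hypothetical; the constants 0.85 and 0.3 are those used in the text, and the shortage is assumed to be the amount by which the (imputed) demand exceeds the order-up-to level U.

```python
from typing import Sequence

ALPHA = 0.85   # target in-stock probability used in the text
MARGIN = 0.3   # extra margin used for the high-protection level


def impute(level: str, U: float, known: Sequence[float] = ()) -> float:
    """Return an imputed demand for one missing (DC, item) point.

    level: "zero", "low", "medium" or "high" (illustrative labels).
    U:     order-up-to level of the item at this DC.
    known: known demands of the same item at the other DCs
           (only needed for the "low", i.e. mean, imputation).
    """
    if level == "zero":
        return 0.0
    if level == "low":
        if not known:
            raise ValueError("mean imputation needs at least one known demand")
        return sum(known) / len(known)
    if level == "medium":
        return U / ALPHA
    if level == "high":
        return U * (1 + MARGIN) / ALPHA
    raise ValueError("unknown imputation level: " + level)


def expected_shortage(demand: float, U: float) -> float:
    """Shortage before transshipment, assuming on-hand inventory equals U."""
    return max(demand - U, 0.0)
```

Under these assumptions, the zero level never signals a shortage at a missing point, and the mean level signals one only when the average of the known demands exceeds U. The medium and high levels always signal a shortage, of about 18% and 53% of U, respectively.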

Trend and Statistical Analysis
Sections 4.2 and 4.3 described the cases in which transshipment has a strong effect on cost saving. This section describes the trend and statistical analysis of the cost reduction by the four imputation methods with a variety of random demand values. The cost savings are measured relative to the cases of no transshipment. For each imputation level, 20 samples were simulated to cover diverse demand conditions. Figure 5 shows the cost savings using the transshipment and missing data treatment approaches. As shown in Figure 5, the average cost saving increases with the imputation level. The cost saving is generally the largest with the high-value imputation method. While the low imputation value cases suffer significant shortage costs, the high imputation cases have lower shortage costs but additional trucking and handling costs. Because the shortage cost rate is higher than the other rates, the cost saving trend in Figure 5 is consistent with the expected trend.
In addition to the overall trend, subtle differences exist between the neighboring imputation methods. There are trivial differences in the average cost savings between the imputation with "0" and low-protection imputation methods. Slight differences are also observed between the medium and high imputation methods. The cost savings of the first two come from the transshipment itself, not from additional shortage coverage based on the imputed values. This suggests that an inventory level equivalent to the average demand may not effectively hedge against shortages. This is usually the reason why safety stock is necessary. A detailed comparison of these two cases is presented in Section 4.5.
The results also indicate the effectiveness of the medium protection compared with the other methods. In many cases, the cost saving exceeds 70% in methods (3) and (4). However, the differences between the medium- and high-protection approaches are not significant. The cost saving by method (4) is only 3% higher than that by method (3).
Highly imputed values lead to predicting larger shortages beyond the nominal variation range, which may result in superfluous transshipment volume. This indicates a possibly negative effect due to extra transshipment, or at least the possibility of diminishing returns on inventory investment. The cost saving does not always increase as the imputation level increases, even with a high shortage cost. A detailed comparison of these two cases is also provided in Section 4.5.
Figure 5 also shows the distribution of the cost savings from the 20 samples. The dispersed values reveal the variation within each imputation method. In fact, wide ranges are observed in the cost saving values. The variation among the 20 samples has several sources: random demand values, the resulting shortage quantities, product item prices, missing-data locations, etc. Owing to the wide statistical variability, there are even a few cases in which the highest cost-saving value from the first two methods is higher than the lowest value of the medium- and high-protection methods. This means that in some low-imputation cases, the demands at the missing data points were high by chance, and they were handled by transshipment. Conversely, in some high-imputation cases, the demands at the missing data points were low by chance, and they did not have to be handled by transshipment. In such cases, the high imputed values lead to limited cost saving.
The variations within each imputation method, however, decrease as the imputation level increases. In the low-protection imputation methods, there are high chances that at least some of the actual shortages are not covered by transshipment. Thus, more variability exists in the sense that high shortage costs may or may not occur. The standard deviation is 16% among the 20 samples. In contrast, in the high-protection case, the variations in cost savings are relatively low; the standard deviation is 9.7%, which is lower than the abovementioned value. This variation reduction occurs because of the increased effectiveness of the medium and high imputation methods. As the imputation level increases, the cost savings approach the possible limit. Namely, the shortages have more chances to be considered in transshipment planning and covered by cross-filling. Thus, it is less likely that a large shortage cost occurs. This result implies that the high imputation method can facilitate more stable operations throughout DC networks.
To examine the robustness of the trend, a sensitivity analysis was conducted. Figure 6 summarizes the cost savings by the transshipment with different shortage cost rates and missing data treatment approaches. The shortage cost rate (γ) is defined as the ratio of the shortage cost to the item price. The trend appears to be consistent. Regardless of the shortage cost rate, the average cost saving exceeds 60% in methods (3) and (4). The cost savings in methods (3) and (4) are higher than those in methods (1) and (2). One special case is γ = 0.2, in which the cost saving by method (4) is lower than that by method (3). This probably includes cases in which highly imputed values lead to predicting too much shortage and consequently extra transshipment volume, and thus a net negative cost saving. However, no significant differences exist between the medium- and high-protection methods. The sensitivity analysis result also implies that the medium-protection method can be effective.

Detailed Comparative Analysis
This section provides a more detailed analysis of the cost values by the different imputation methods. Whereas Section 4.4 has provided the analysis of the overall trends, this section presents more detailed analyses. Section 4.5.1 compares the zero and average imputation methods. Section 4.5.2 analyzes the high imputation method and imputation adjustment algorithm.

Comparison of the Zero- and Low-Protection Imputation Methods
In the numerical simulation results of Section 4.3, little difference was observed between the imputation methods with zero and with the average. This occurred for the following reason. The individual demands are usually lower than the order-up-to level U. For the imputation with average, these demand values are averaged to impute the missing data of item k. Thus, the imputed value (the average) is also most likely lower than U, and no shortage is suggested for most of the missing demand cases. Hence, the mean imputation method usually ends up indicating the same no-shortage cases as the zero-imputation method. In other words, in many cases, the average-imputation method generates a result similar to that of the zero-imputation method.
However, although the amount is not large, the two methods usually generate different cost savings. Table 5 illustrates a case in which the average imputation improves the cost saving compared with the zero imputation. In Table 5, the missing demand values are at locations $M_{12}$, $M_{24}$, and $M_{74}$. With the zero-imputation method, no shortage is suggested at the three missing points that actually have shortages. Thus, no transshipment operations are performed for these locations. By contrast, with the low-protection method (imputation by average), the missing values are replaced with the average value. Thus, in matrix D, $d^m_{74}$ becomes 72, which is greater than $U_{74} = 68$ by 4. This leads to the calculated shortage of $S^-_{74} = 4$, and transshipment is carried out to prevent this expected shortage. Although the difference is not large, the two methods lead to different transshipment operations and cost values. Table 6 summarizes the improved cost saving of the average imputation method compared with the zero-imputation method.
Table 6. Cost saving for the example in Table 5.
In this section, we describe (1) a more detailed analysis of the high-protection cases and (2) how to adjust the imputation values using the algorithm depicted in the flowchart in Figure 4.
As can be seen in Figures 5 and 6, the medium- and high-protection approaches do not differ significantly. In some cases, the higher imputed values even lead to worse cost saving. This trend reflects the fact that the problem of this study is not simply a function of transportation and shortage costs. Because the shortage cost rate is higher than the approximate transportation cost, it is tempting to conclude that strong shortage prevention, and hence high imputed values, are preferred. The problem is not so simple. High imputed values lead to estimating large shortages and transshipping large volumes. However, the large shortage may not be realized. In other words, high imputed values may result in an unnecessary cost increase due to extra transshipment without an effective cost reduction by shortage mitigation.
In addition, this analysis concerns the cases in which, for some imputed demand data, the total inventory $I^-$ of an item k from all DCs may not be enough to cover the estimated shortage. In naïve transshipment planning, most stocks are transshipped to the shortage locations just by the least-cost plan. Such transshipment may incur an additional shortage penalty in the sending DCs. If some shortage values by imputation are arbitrarily high, the situation becomes worse. Thus, an upper bound of the imputed values must be specified so that the total shortage to cover is at most equal to the available inventory.
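One way to enforce such a bound is sketched below. It is an illustration under stated assumptions, not the algorithm of Figure 4 itself: the excess of the estimated total shortage over the available inventory is split as evenly as possible among the missing points of the item (quotient q and remainder l), and the first l missing points each absorb one extra unit. The function names and the floor at zero are hypothetical choices.

```python
from typing import List, Sequence


def spread_deduction(excess: int, n_missing: int) -> List[int]:
    """Split `excess` units over `n_missing` points as evenly as possible.

    q, l = divmod(excess, n_missing); the first l points get q + 1 units,
    the remaining points get q units, so the deductions sum to `excess`.
    """
    if n_missing <= 0:
        raise ValueError("need at least one missing point")
    q, l = divmod(excess, n_missing)
    return [q + 1 if idx < l else q for idx in range(n_missing)]


def adjust_imputed(imputed: Sequence[int],
                   total_shortage: int,
                   available: int) -> List[int]:
    """Lower the imputed demands of one item so that the estimated total
    shortage does not exceed the available inventory.

    Assumption: values are floored at zero, so the full excess may not be
    removed if an imputed value is smaller than its share of the deduction.
    """
    excess = max(total_shortage - available, 0)
    if excess == 0 or not imputed:
        return list(imputed)
    deductions = spread_deduction(excess, len(imputed))
    return [max(d - x, 0) for d, x in zip(imputed, deductions)]
```

For instance, an excess of 21 units over two missing points gives q = 10 and l = 1, so the deductions are 11 and 10 units.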
Even though this case can occur in any of the four imputation methods, it is much more likely in the high-protection method because of the high imputed values. The example in this section demonstrates these cases. In the example of Table 7, there are 21 units of shortage that even the stock of all DCs cannot cover. In Table 7, with the temporary imputed values, $S^+_{24} = 16$ and $S^+_{44} = 5$ (known shortage). Thus, we deduct the following values from the temporary imputed values so that the total deducted value is equal to 21. We compute the number of missing points for inventory item 4: $v = \sum_i M_{i4} = 2$. We compute the division 21/2 as quotient 10 with remainder 1 (q = 10 and l = 1 after 21 is divided by v = 2). For the first l = 1 index, where the missing data are located, g = 2 (DC 2), and because q > 0,

Insight for Practice
This study provides insights on how to make transshipment more effective by proper treatment of missing data in practice. First, the results imply that demand-data imputation in transshipment decision-making is not simply to recover the missing values. Sophisticated data imputation methods exist in the literature. The majority of them focus on recovering the missing values as close to the original ones as possible, especially around mean values.
The numerical studies in this paper suggest that such an approach may fail for specific applications such as transshipment for cross-filling. Transshipment decisions based on average values were inferior to those based on the higher values. This means that a common straightforward imputation did not improve decision making; we should have a better and customized approach to handle missing values. Therefore, this study indicates that managers should not focus on simply how to recover the demand values themselves, but how to set appropriate shortage values for effective transshipment.
The results also suggest that a simple trade-off between inventory and transshipment costs does not always work for cross-filling. Although a shortage cost higher than the transportation cost may favor a higher inventory, this trend may stop at a certain point. In cross-filling under a missing-demand condition, a high inventory may lead to unnecessary transshipment. This incurs extra transportation cost but seldom contributes to shortage prevention. This means that managers should find the points from which diminishing returns on inventory investment become apparent. These points should be estimated by a combination of optimization and statistical analysis because of the complex interrelation of the parameters.
This study also implies that handling missing data is not just a one-step imputation for practitioners. The imputed values should be adjusted before being used. The algorithm in Figure 4 was devised for such a case. Without such algorithms, the optimization may generate solutions that are infeasible in practice. This indicates that practitioners should be advised on the practical validity of the statistically processed numbers.

Conclusions
This study investigated an integrated problem of demand data imputation and transshipment operation decisions, and the result analysis verified that the integrated approach enhances transshipment performance successfully. First, this paper established a combined problem of inventory transshipment for cross-filling among distribution centers and data imputation for treating missing demand data. This research formulated imputation methods corresponding to principal inventory management strategies. This research also developed algorithms keeping imputed values within practical inventory bounds. In addition, by combining a mathematical program and the developed algorithm, this research provided a mathematical model that allows simultaneous transshipment decisions and demand imputation. Therefore, this study contributes to establishing a theoretical foundation for advanced modeling of the transshipment problem.
Second, this paper uncovered key theoretical characteristics of the integrated imputation-transshipment problem and the effect of imputation parameters on transshipment performance. Numerical simulations were conducted for diverse demand and imputation cases with practical parameters. A comprehensive analysis of the results reveals that imputed values above a particular level are critical for effective shortage mitigation and cost reduction by transshipment. Moreover, the statistical analysis shows that the consistency of transshipment performance improves as the imputation level increases.
Third, this paper confirms that in practice the transshipment effectiveness can be improved by proper data treatment as suggested in this study. This study shows that demand data imputation can be incorporated into a holistic transshipment process in industry. The method in this paper is straightforward to implement for practical application. This study also demonstrates that transshipment can be performed efficiently even under missing demand data conditions.
The results imply that the novel method proposed in this study performs better for transshipment decision-making than the conventional imputation methods that simply recover the missing values. For example, whereas imputation with the demand average did not result in significant cost savings, imputed and adjusted values sufficiently higher than the average help reduce the shortage cost by 80% on average. This means that specialized guidelines and algorithms are necessary for a specific transshipment problem. This paper also provides managerial insights into how cost parameters, inventory levels, and imputation levels jointly influence transshipment operations and cost reduction in inventory management. A simple trade-off does not exist between inventory and transshipment costs in cross-filling. This means that an appropriate balance should be struck for a specific transshipment condition by the combination of optimization and statistical analysis.
Hence, this study lays the groundwork for sophisticated transshipment operations with data treatment, and the integrated approach can be extended in a variety of directions. This study focused on the key characteristics of the integrated transshipment and imputation problem, and it does not address all possible extensions. For example, this paper did not consider diverse probability distributions of the demand, nor a variety of vehicle routing conditions. These limitations can be addressed in future work that investigates diverse problem conditions. A wide range of demand properties can be considered with different missing data conditions. Diverse imputation methods can be employed for large demand-data sets to take advantage of advanced statistical learning methods. Diverse cross-filling situations, such as vehicle availability and distance restrictions, can also be reflected in transshipment modeling. Future research may also consider a variety of performance measures such as financial risks or various sustainability indexes.