Article

An Integrated Framework for Internal Replenishment Processes of Warehouses Using Approximate Dynamic Programming

by İrem Kalafat 1,2,*, Mustafa Hekimoğlu 1, Ahmet Deniz Yücekaya 1, Gökhan Kirkil 1, Volkan Ş. Ediger 1 and Şenda Yıldırım 1,2
1 Department of Industrial Engineering, Faculty of Engineering and Natural Sciences, Kadir Has University, 34083 Istanbul, Türkiye
2 Advanced Analytics Department, Dogus Technology, 34398 Istanbul, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7767; https://doi.org/10.3390/app15147767
Submission received: 4 June 2025 / Revised: 21 June 2025 / Accepted: 26 June 2025 / Published: 10 July 2025
(This article belongs to the Special Issue Advances in AI and Optimization for Scheduling Problems in Industry)

Abstract

Warehouses are vital in linking production to consumption, often using a forward–reserve layout to balance picking efficiency and bulk storage. However, replenishing the forward area from reserve storage is prone to delays and congestion, especially during high-demand periods. This study investigates the strategic use of buffer areas—intermediate zones between forward and reserve locations—to enhance flexibility and reduce bottlenecks. Although buffer zones are common in practice, they often lack a structured decision-making framework. We address this gap by developing an optimization model that integrates demand forecasts to guide daily replenishment decisions. To handle the computational complexity arising from large state and action spaces, we implement an approximate dynamic programming (ADP) approach using certainty-equivalent control within a rolling-horizon framework. A real-world case study from an automotive spare parts warehouse demonstrates the model’s effectiveness. Results show that strategically integrating buffer zones with an ADP model significantly improves replenishment timing, reduces direct picking by up to 90%, minimizes congestion, and enhances overall flow of intra-warehouse inventory management.

1. Introduction

Efficient warehouse management directly influences critical performance metrics such as operating costs, throughput, and customer satisfaction. Among all warehouse activities, order picking is one of the most resource-intensive and cost-sensitive processes, particularly in spare parts supply chains where timely and accurate order fulfillment is paramount [1]. To address these challenges, many warehouses employ a forward–reserve area configuration. The forward area stocks a limited quantity of spare parts for rapid order picking, while the reserve area holds bulk inventory to replenish the forward area as needed. This internal replenishment practice aims to balance accessibility and storage capacity, but it also introduces operational complexities.
The forward–reserve replenishment process often faces significant challenges, such as delays, congestion, and capacity constraints, particularly in high-demand environments. To alleviate these issues, some warehouses utilize buffer areas as interim holding spaces. The usage of a buffer area is a well-established practice in warehouse management, valued for enhancing overall warehouse flexibility. Despite the benefits of reduced handling times and improved responsiveness, buffer areas can contribute to congestion if not properly managed. This study introduces a replenishment policy that strategically incorporates buffer areas into forward reserve operations. The policy prioritizes order picking from the buffer area. Additionally, it restricts returning stock to the reserve area to avoid operational disruptions. By focusing on reducing excess buffer inventory, the approach ensures a more efficient flow of goods, maximizing the benefits of buffer areas. Thus, buffer areas not only relieve pressure on reserve areas but also enable faster replenishment of forward areas, enhancing throughput and operational fluidity.
The significance of this study lies in its ability to address a critical gap in warehouse management practices: the absence of structured strategies for buffer area utilization. Although buffer areas are commonly used across industries, they are often implemented reactively as short-term solutions rather than as complementary components of a well-designed replenishment strategy. This study’s novelty revolves around the development of a tractable, integrated, and adaptive framework for this complex, real-world problem. The methodological novelty lies in the synthesis of three components: (1) the explicit mathematical modeling of unstructured buffer areas within the replenishment problem, (2) the use of a rolling-horizon mixed-integer linear programming (MILP) model as a high-quality policy evaluation engine, and (3) its integration with a dual-loop, adaptive forecasting module. This combination provides a practical approximate dynamic programming (ADP) framework for optimizing a high-dimensional, stochastic system where traditional dynamic programming is computationally infeasible.

Motivational Case Study

The warehouse system of interest includes a reserve area, a forward area, and a buffer area. Internal replenishment in this system is carried out using cages with constant batch sizes. The reserve area consists of narrow corridors that accommodate the bulk storage of all parts. This area is equipped with specialized machines capable of retrieving large cages that carry spare parts, which are not suitable for order picking. Parts are therefore picked directly from this area only when necessary. In certain instances of replenishment from reserve to forward, the cage is added to the existing inventory on the shelf, potentially exceeding its capacity. In such cases, the warehouse personnel use buffer areas instead of returning the excess inventory to the reserve area. Warehouse managers determine these buffer areas arbitrarily, and they are located next to the shelves without any physical designation as special zones. Their primary purpose is to prevent flow obstruction in the reserve area by avoiding the transportation of partially filled cages. However, if these areas are not effectively managed, they can accumulate a significant amount of stock, leading to congestion in the warehouse. To address this issue, the buffer area is given priority during order picking: when inventory is present in both the buffer area and the shelves of the forward area, pickers are instructed to take the stock in the buffer area first. Figure 1 shows the overview of the system.
The findings reveal that incorporating a strategic buffer area policy within the forward–reserve replenishment framework significantly enhances warehouse operations. Key results include a reduction in direct picking activities of up to 90% in some scenarios, alongside optimized buffer area utilization and decreased handling costs. The use of demand forecasts, combined with the integration of ADP, enables the system to dynamically adapt to fluctuating demand, enhancing replenishment efficiency even under labor constraints. By showcasing these benefits, this study presents a scalable and adaptable model for optimizing warehouse operations in high-demand, multi-product environments.
This paper contributes to the forward–reserve area problem by showing that internal replenishment processes can be carried out more smoothly with a well-managed buffer area.
The specific objectives of the work in this paper can be further described as follows:
  • To overcome the computational challenges of in-warehouse replenishment processes and progressively improve policies by adopting an ADP approach with certainty equivalence.
  • To evaluate the impact of the ADP method in managing intuitively placed buffer points for internal replenishment and develop strategies to alleviate warehouse congestion.
  • To demonstrate that internal replenishment processes can be carried out more smoothly in a well-managed buffer area.
This paper is structured as follows: Section 2 reviews the relevant literature on the forward–reserve problem and identifies existing gaps. Section 3 details the methodology, including the ADP framework and the MILP model. Section 4 presents the experimental results from different scenarios and analyzes the model’s performance. Section 5 discusses managerial insights, limitations, and future research directions. Finally, Section 6 provides concluding remarks.

2. Literature Review

The forward–reserve area problem has been studied repeatedly in the literature. Ref. [2] solved the problem of selecting items to store in automated storage and retrieval systems and allocating space for inventory keeping. They developed a heuristic solution and applied it to a naval supply center’s small-sized items, which are assumed to be continuously divisible. Their paper was the first to solve the internal replenishment problem, that is, restocking a primary area from a secondary location in the facility, using a mathematical model. The model focuses on maximizing profit by allocating sufficient items in the volume base, considering the replenishment cost. Ref. [3] considered busy and idle periods in a distribution center. Order picking operations occur in the busy period, while replenishment of the forward area from the reserve area may take place in the idle period. They also assumed that one trip from the reserve to the forward area is sufficient, replenishment is performed with one unit load, and order picking from the reserve area is possible at a higher cost. They formulated the model as a binary programming problem that minimizes the total labor time and aims to prevent accidents in busy periods. They solved it with greedy knapsack and dynamic programming. A constraint limiting replenishment capacity is also added, and the computational outcomes are compared.
Ref. [4] presented a mathematical model and a heuristic for assigning items and deciding the size of a warehouse’s five functional areas: receiving, shipping, forward, reserve, and cross-docking. They mention that determining the sizes of the areas and assigning products to those areas affect warehouse management on strategic, tactical, and operational levels. The model is formulated as a mixed-integer linear programming model (MILP) and solved with the proposed heuristic and branch and bound algorithms. The computational results show that the heuristic algorithms take much less time. Ref. [5] applied three storage strategies to the forward area, analyzed the results, and determined which was better. The first one is the equal space strategy, which allocates equivalent space for each item; the other ones are the equal time strategies, which allocate space for each item that is sufficient for the same amount of time, and the optimal strategy. They applied strategies to small parts of the data by extending the model in [2] with new inventory level constraints based on the strategies. Ref. [6] considered a forward reserve area problem in a picker-to-belt system where order pickers collect orders from the forward area and put them on the conveyor. It is assumed that if the pickers face a stockout, the belt stops, and they must perform emergency replenishment. They have compared and evaluated the computational results of replenishment methods, and six heuristic policies have been proposed. Ref. [7] presented the results of early heuristic methodology studies and an alternative branch and bound algorithm that can solve the problem optimally. They applied the algorithms to two different data sets that belong to different warehouses, and the computational performance of the algorithms was presented. It is shown that when the number of stock-keeping units (SKUs) gets smaller, the optimality difference is higher for the aforementioned greedy heuristic in Hackman’s study. 
It is also observed that even though the objective value is equal, the model’s outcomes differ between the optimal and the heuristic solutions. Ref. [8] used simulation and what-if analysis to assign items to forward areas and allocate space to them. In order to achieve effective solutions in an acceptable amount of time, several storage policies have been implemented.
In earlier studies, the forward pick area was assumed to be continuously divisible; therefore, the models were fluid. Ref. [9] demonstrated a discrete space allocation for forward reserve problems and analyzed the difference between the discrete and the fluid models. They developed three models which serve different purposes. The first model solves the problem of dividing the forward area for a given set of items, which is defined in the paper as a discrete forward–reserve allocation problem (DFRAP). The second model unifies the allocation and assignment problem, and the model’s outcome shows which items must be held in the forward area and the size that belongs to the particular items. The final model has an approach to the problem that considers the forward area size as a variable and how the products must be allocated. The results showed that the gap between the discrete and the fluid models is small and can be ignored.
The case study presented by [10] has been applied to a cosmetic firm’s warehouse where they use a wave picking strategy. The objective is to avoid stockouts in an environment where picking and replenishment operations are performed simultaneously. They proposed three prioritizing policies for replenishments and simulated to compare the computational results and stockouts. Ref. [11] proposed an alternative storage allocation policy to reduce the picking time for the picking area of a warehouse. The proposed policy assumes several empty locations in the picking area that can be used for items whose requested quantities are higher than those in their stock. The study compared the proposed policy with the conventional warehouse layout by simulation and analyzed its performance regarding operator travel and order fulfillment time. The study also considered the impact of congestion on the proposed policy. The paper provides insights to improve the performance of the warehouse order-picking system.
Ref. [12] identified the best storage strategies for an automated storage and retrieval system. The research used analytical and numerical methods to compare the performance of different storage strategies. The paper also presents the factors affecting the performance of storage strategies and the experimental results used to determine the best strategies.
Ref. [13] aimed to design efficient work schedules for human pickers in mobile rack warehouses using human–robot co-coordinated order picking systems. The article discusses the challenges of designing good work schedules for human pickers in such systems. It proposes a model that allows mobile racks with different workloads to be assigned to pickers and schedules the racks assigned to each picker to minimize the expected total picking time. The paper uses a stochastic dynamic programming model and an approximate dynamic programming-based branch-and-price solution approach to solve the problem. The results show that the proposed approach can solve a moderate-sized problem of 50 racks in less than 2 min and produces high-quality solutions with picking times that are 10% shorter than those that do not account for schedule-induced fluctuations in the pickers’ work states. The article concludes that the proposed approach can be applied to other warehouses to detect fluctuations in picker work states, incorporate these fluctuations into robot schedules, and increase the productivity of a picking system.
Ref. [14] considered the storage replenishment and routing problem (SRRP) in a warehouse environment where items are stored in separate forward and reserve storage areas and replenished using a common reserve storage area. Drawing on the inventory routing problem (IRP) literature, the authors propose an MIP model to solve the SRRP. They also propose a priori routing heuristics, which are based on the shortest path algorithm and graph theory. To further improve the replenishment routes, they propose an a posteriori routing step. Ref. [15] examined the ADP method for solving multidimensional knapsack problems. ADP provides fast and effective solutions to complex and large-scale optimization problems using approximate calculations of the value function. The results show that this approach provides high-quality and practical solutions, especially under multidimensional constraints. Ref. [16] developed an optimization tool intended to replace Zara’s manual and experience-based pricing process with a more systematic approach based on demand forecasting and price optimization models. In the study, the certainty equivalent approach was used by substituting expected values for uncertain future sales, and an MIP model was formulated with this approach. The developed model was tested in controlled field experiments in Zara’s Irish and Belgian stores, and this new process increased revenue from discount sales by 6%. As a result, this method began to be used in Zara’s discount decisions worldwide.
A broader view of buffer management reveals its strategic importance across industries. In manufacturing, buffers act as decoupling points in Just-in-Time (JIT) systems, absorbing variability to maintain a smooth production flow [17]. In supply chain management, inventory buffers are strategically placed to hedge against demand and lead-time uncertainty, as analyzed in the seminal work by Simchi-Levi et al. [18]. While the function of these formal, planned buffers is well understood, the strategic management of informal, dynamic buffer zones within a warehouse, which arise from operational constraints and are the subject of our study, remains underexplored in the literature.
Recent developments in warehouse automation, such as the deployment of Automated Storage and Retrieval Systems (AS/RS) and Autonomous Mobile Robots (AMRs), are changing the landscape of internal logistics. A comprehensive review in [19] shows that these technologies excel at executing physical movements efficiently. However, they require a high-level strategy to provide the decision logic for what to move, when to move it, and where to place it. The hierarchical nature of such systems, where strategic planning directs tactical execution, is a key theme in the automation literature. Our ADP framework is designed to serve as this strategic layer, providing optimized decisions that can then be passed to an automated system’s execution layer, thereby addressing a critical need in the evolution towards smart warehouses.
In the context of evaluating stochastic optimization models like ours, it is a standard practice to benchmark performance against a theoretical optimum. This is often achieved by solving the problem with perfect future information, a method known in the literature as a “dynamic oracle” or perfect information benchmark [20]. This oracle provides a theoretical upper bound on performance, and the gap between the stochastic solution and the oracle’s performance quantifies the “cost of uncertainty” or the value of perfect information. This benchmark is crucial for contextualizing the performance of any forecast-driven model.
Despite this extensive body of work, a critical gap remains in the literature regarding the strategic and optimized use of buffer areas within the forward–reserve replenishment context. While buffers are acknowledged as operational tools, they are rarely integrated as a formal component of a dynamic, forecast-driven optimization model. Most studies focus on the direct link between the forward and reserve areas, overlooking the potential of a managed intermediate zone to absorb variability and reduce congestion.
This study fills this gap by explicitly modeling the buffer area and using an ADP framework to make intelligent, proactive decisions about its use, thereby justifying its contribution to the field. Alternative optimization methods, such as purely deterministic models, often struggle with the uncertainty of real-world demand, while simple heuristics may fail to find high-quality solutions. Our ADP approach provides a balance of computational tractability and responsiveness to dynamic conditions.

3. Methodology

3.1. Application of Approximate Dynamic Programming (ADP)

This study utilizes ADP to manage and optimize the decision-making process under uncertainty to address the complexities of internal replenishment in a spare parts warehouse. ADP is particularly suitable for environments where the state and decision spaces are too large for traditional dynamic programming (DP) techniques due to the curse of dimensionality. It simplifies the problem by approximating the value function, which assesses future costs associated with different states and decisions. While a formal mathematical proof of convergence for this type of applied framework is beyond the scope of this paper, we demonstrate its empirical stability in Section 4.3, where results are shown to be consistent across varying planning horizons, suggesting the policy does not diverge. The DP formulation for this problem is presented in Appendix A.
1. Demand Forecasting
The efficacy of the ADP framework is fundamentally dependent on the quality of its demand forecasts, which serve as the primary input for decision-making. To ensure predictive reliability, we conducted a comparative analysis of state-of-the-art machine learning models, including Extreme Gradient Boosting (XGBoost) and LightGBM, using time-series cross-validation. Evaluation metrics included Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). LightGBM was ultimately selected due to its superior performance across a holdout test set. To maintain robustness in a dynamic warehouse environment, the model operates within a dual-loop learning framework:
  • Daily Operational Loop: At the end of each day, actual demand is captured and used to update the inventory state for the next day’s MILP optimization run.
  • Periodic Retraining Loop: The same demand data is added to the historical training set, and the LightGBM model is retrained periodically to reflect evolving demand patterns.
This adaptive retraining strategy ensures that the forecasts remain aligned with changes in seasonality, customer behavior, or product life cycles. Inaccurate forecasts can lead to over-replenishment or stockouts, especially in a cost-sensitive MILP environment; hence, this mechanism is vital for sustaining high-quality replenishment decisions over time.
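The model comparison described above can be illustrated with a small sketch of time-series (rolling-origin) cross-validation scored by MAE and RMSE. This is not the authors' pipeline: the two forecasters below are deliberately simple stand-ins for XGBoost and LightGBM, and the demand series, the function names, and the `min_train` parameter are all assumptions made for illustration.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_forecast(history):
    """Stand-in model 1: repeat the last observed demand."""
    return history[-1]

def moving_average_forecast(history, window=3):
    """Stand-in model 2: mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def rolling_origin_cv(series, model, min_train=4):
    """One-step-ahead evaluation: fit on series[:k], predict series[k].

    Each fold only sees data that precedes the day being predicted,
    which is what distinguishes time-series CV from shuffled k-fold CV.
    """
    actual, predicted = [], []
    for k in range(min_train, len(series)):
        predicted.append(model(series[:k]))
        actual.append(series[k])
    return mae(actual, predicted), rmse(actual, predicted)

# Made-up daily demand for one spare part (illustrative only).
daily_demand = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

scores = {
    "naive": rolling_origin_cv(daily_demand, naive_forecast),
    "moving_average": rolling_origin_cv(daily_demand, moving_average_forecast),
}
# Select the candidate with the lowest RMSE (index 1 of each score tuple).
best_model = min(scores, key=lambda name: scores[name][1])
```

In the periodic retraining loop, the same evaluation would simply be re-run on the extended history after new actual demand has been appended.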
2. ADP Model with Certainty Equivalent Approach
The ADP model incorporates state variables for inventory levels, decision variables for replenishment and picking, and an objective to minimize total expected costs. Value function approximation is achieved by implementing a certainty-equivalent control policy, a practical approach for complex problems chosen here for its interpretability, data efficiency, and stability. In this approach, the stochastic future demand is replaced by deterministic point forecasts from the machine learning model. This simplification allows the problem to be resolved at each step by the finite-horizon MILP. The solution to this MILP provides the optimal actions, and its objective value serves as the practical approximation of the true value function for the current state. This method of using a deterministic optimization model within a larger stochastic framework is a practical approach for complex problems, and a similar strategy was successfully used by Caro et al. [16].
While this certainty-equivalent approach simplifies the problem at each step, uncertainty is managed over the long term through a rolling-horizon implementation. This mechanism is clarified by the daily operational cycle:
  • Forecast and Optimize: At the start of the day, we generate a forecast for the planning horizon. The MILP is then solved using the current inventory state and this forecast, producing an optimal plan.
  • Implement: Only the decisions for the immediate first day are implemented.
  • Observe and Update: At the end of the day, we observe the actual demand and update the inventory levels to reflect the true state of the system.
  • Repeat: On the next day, the entire process repeats from the new, true state. The old plan is discarded, and a new optimization is performed.
This constant cycle of re-planning based on real-world feedback allows the system to continuously correct its course, effectively managing uncertainty without needing to model the full probability distribution of demand.
3. Rolling-Horizon Implementation
The implementation of ADP follows a rolling-horizon approach, which is particularly effective for managing operations in dynamic environments like warehouses. The optimization model is solved over discrete time periods (here, days). Decisions are made at the beginning of each period based on the system’s current state, using demand predictions as inputs. At the end of each period, the state information is updated based on the actual demand fulfilled and the replenishment actions taken. This updated state information is then used to make decisions in the next period. A feedback loop is incorporated whereby the outcomes of previous decisions inform updates of the value function approximations, refining the decision-making process over time.
To effectively apply ADP, certain assumptions are made about the operational context, including but not limited to the predictability of demand variations, the reliability of supply chain logistics, and the stability of replenishment lead times. These assumptions help simplify the model while ensuring it remains robust enough to handle real-world complexities. This detailed application of the ADP model with certainty equivalence seeks to optimize internal replenishment processes by effectively handling uncertainties in demand and supply, thus enhancing operational efficiency and reducing costs in the spare parts warehouse setting.

3.2. ADP Pseudo Code

The core of our framework is the iterative solving of the MILP model within the ADP loop. The MILP serves as the policy function in our ADP framework. It is solved iteratively, once per day, in our rolling-horizon simulation. At the beginning of each day $t$, the current state (inventory levels) provides the initial boundary for the MILP. The MILP is then solved to find an optimal set of actions over the planning horizon.
In this approach, the MILP’s finite-horizon objective function acts as the practical approximation for the theoretical ADP value function, $V(S)$. Since computing $V(S)$ for all possible states is intractable, we instead approximate the total cost-to-go by solving the detailed MILP over a finite lookahead period. This is a common and effective technique in ADP, allowing us to leverage the power of exact optimization for tactical planning within a larger, adaptive framework.
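As a hedged illustration of this approximation, the exact recursion and its certainty-equivalent lookahead can be written side by side. The symbols $c$ (one-period cost), $x_k$ (decisions), $\hat{D}_k$ (point forecast), and $f$ (state transition) are generic notation assumed here and are not taken from the paper's Appendix A.

```latex
\begin{align*}
V(S_t) &= \min_{x_t} \; \mathbb{E}\!\left[ c(S_t, x_t, D_t) + V(S_{t+1}) \right], \\
\hat{V}(S_t) &= \min_{x_t,\dots,x_{t+T-1}} \; \sum_{k=t}^{t+T-1} c(S_k, x_k, \hat{D}_k)
\quad \text{s.t.} \quad S_{k+1} = f(S_k, x_k, \hat{D}_k).
\end{align*}
```

In the second line the random demand $D_k$ is replaced by its forecast $\hat{D}_k$, so the expectation disappears and the problem becomes a deterministic finite-horizon program. Only the first decision $x_t$ of the resulting plan is implemented before the problem is solved again from the newly observed state.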
Step 0: Initialization
0.1 Load historical demand data from the database;
0.2 Initialize state variables: forward inventory, buffer inventory and reserve inventory;
0.3 Define the planning horizon T (e.g., the next seven days);
0.4 Set MILP parameters.
Step 1: Main Loop
For each spare part (i = 1, 2, 3, ..., N):
   For each planning day (t = 1, 2, ..., T):
      Step 1.1: Generate forecasts (certainty equivalent);
      Step 1.2: Solve MILP model with recursive state transitions (1)–(17).
Step 2: Update Initial States (Inventory levels)
2.1 Record the actual demand at the end of the day;
2.2 Update the initial states and historical data with the recorded actual demand (iteratively refining value approximation).
The ADP process begins with Step 0: Initialization, where historical demand data is loaded, state variables are set, and MILP parameters are configured. This establishes the foundation for the iterative optimization. In Step 1: Main Loop, demand forecasts are treated as fixed values and serve as inputs to the MILP model, which determines optimized replenishment and buffer allocation decisions over the finite planning horizon. By solving the MILP iteratively each day, the approach captures the dynamic nature of inventory management while maintaining computational efficiency.
In Step 2: Update Initial States, when the actual demand at the end of the day is observed, we update the initial states of the model for the next day’s run. These iterative updates continually improve the accuracy of the model’s future decisions. It is important to note that optimal decisions are made not with a complex, intractable DP model but with the simple MILP model that we run continuously. This demonstrates that ADP offers a much more practical solution than traditional DP. While this methodology does not solve the DP problem exactly, it uses ADP principles to approximate the value function and optimize decisions iteratively. The rolling-horizon framework recalibrates the optimization at each step, balancing immediate and future costs dynamically. Although this method does not provide a truly optimal solution, it provides a much more feasible and practical solution, avoiding the difficulties of DP. The DP formulation of this problem and the DP pseudo code are explained in detail in Appendix A.
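The daily cycle in Steps 0 to 2 can be sketched in Python as follows. This is a minimal illustration for a single part under strong simplifications: the `plan` function is a rule-based stand-in for the MILP, the `forecast` function is a hypothetical moving average rather than the LightGBM model, and shelf capacity and the buffer area are ignored. All function and parameter names are assumptions.

```python
def forecast(history, horizon):
    """Hypothetical point forecast: rounded mean of the last 3 days, repeated."""
    recent = history[-3:]
    level = round(sum(recent) / len(recent))
    return [level] * horizon

def plan(forward_stock, demand_forecast, batch_size):
    """Stand-in for the MILP: replenish on any day the forecast exceeds shelf stock.

    Returns one replenish/do-not-replenish decision per day of the horizon.
    """
    decisions, stock = [], forward_stock
    for expected in demand_forecast:
        replenish = stock < expected
        if replenish:
            stock += batch_size
        stock -= expected
        decisions.append(replenish)
    return decisions

def simulate(actual_demand, history, forward_stock, batch_size, horizon=7):
    """Rolling-horizon loop: plan, implement day 1 only, observe, update, repeat."""
    log = []
    for demand in actual_demand:
        # Step 1: forecast (certainty equivalent) and optimize over the horizon.
        decisions = plan(forward_stock, forecast(history, horizon), batch_size)
        # Implement only the first-day decision; the rest of the plan is discarded.
        replenish_today = decisions[0]
        if replenish_today:
            forward_stock += batch_size
        # Step 2: observe actual demand and update the true state.
        picked = min(forward_stock, demand)
        direct_pick = demand - picked  # shortfall served from the reserve area
        forward_stock -= picked
        history = history + [demand]
        log.append((replenish_today, direct_pick, forward_stock))
    return log

log = simulate(actual_demand=[5, 7, 4], history=[4, 5, 6],
               forward_stock=3, batch_size=10, horizon=3)
```

Because the plan is rebuilt every day from the observed state, a forecast error on one day (for example, an actual demand of 7 against a forecast of 5) is absorbed by the next day's plan rather than propagated through the horizon.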

3.3. MILP Model for Forward Area Replenishment Problem with Buffer Area

A MILP model has been developed to address the specific challenges presented in this paper. The model is structured around a key policy that prioritizes order picking from the buffer area whenever the required spare part is available there. Giving precedence to buffer stock decreases the inventory in the buffer area, which reduces congestion and improves operational flow. If the buffer stock is insufficient, the remaining quantity is picked from the forward area, ensuring that the forward area is replenished only when necessary. This policy-driven approach improves warehouse efficiency by reducing unnecessary movements in the warehouse and streamlining the replenishment process.
In developing the mathematical model, certain assumptions related to warehouse operations were made to reflect realistic business practices. These assumptions include the operational characteristics and warehouse policies that are critical to this study, as outlined below.
  • Each spare part has a specific batch size for replenishment.
  • Due to the involvement of a limited number of staff members in the internal replenishment process, the number of replenishments that can be accomplished daily is limited.
  • The stock needed to cover the demand is available in the warehouse in the forward, reserve, or buffer areas.
  • A dedicated rack with a capacity of a specific batch size is allocated in the forward area for each part.
  • A complete cage is transferred from the reserve area to the forward area for replenishment purposes, i.e., all transferred cages are full.
  • If there is any surplus beyond the forward area’s capacity, the remaining stock must be located in the buffer area, which is adjacent to the shelf.
  • If a cage is transferred from the reserve area while some parts exist in the associated shelf in the forward area, this partially filled cage in the forward area is transferred to the buffer area, and the new full cage is located in the forward area.
  • In cases where the order picker exclusively visits the forward area and encounters insufficient stock, they proceed to the reserve area to fulfill the order and perform direct picking.
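The picking priority and the direct-picking fallback described above can be sketched in a few lines of Python. This is a minimal illustration for a single part on a single day, not the authors' implementation; the function name `allocate_demand` and its argument names are hypothetical.

```python
def allocate_demand(demand, buffer_stock, forward_stock):
    """Split one part's daily demand across the three sources.

    Priority follows the stated policy: buffer area first, then the
    forward area; only the unmet remainder is picked directly from
    the reserve area. All names are illustrative assumptions.
    """
    from_buffer = min(demand, buffer_stock)
    remaining = demand - from_buffer
    from_forward = min(remaining, forward_stock)
    direct_pick = remaining - from_forward
    return from_buffer, from_forward, direct_pick


# Demand of 12 units with 5 units in the buffer and 4 in the forward area.
print(allocate_demand(12, 5, 4))  # (5, 4, 3)
```

The three returned quantities always sum to the demand, which mirrors the demand-balance requirement of the model.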
This section presents the mathematical model of the internal replenishment problem together with its notation, and then explains the optimization methods used to solve it. The notation used in the model is listed in the Abbreviations section.
The framework aims to determine shelf inventory levels efficiently and to improve product positioning. The details of the MILP model are given below.
Model:
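\[
\min \sum_{i=1,\,t=1}^{N,\,T} \left[ Q_{it}\, c_{r} + C_{it}^{D}\, c_{d} + \left( \mathit{WAR}_{it} - D_{it}^{W} + \mathit{WBR}_{it} \right) c_{w} + X_{it}^{W}\, c_{wx} \right] \tag{1}
\]
Subject to:
\begin{align}
\mathit{XAR}_{it} &= \mathit{XBR}_{it} + \mathit{QN}_{it}\, C_{i} - X_{it}^{W} && \forall i \in N,\ 1 \le t \le T \tag{2} \\
\mathit{XBR}_{i,t+1} &= \mathit{XAR}_{it} - D_{it}^{S} && \forall i \in N,\ 1 \le t \le T \tag{3} \\
\mathit{WAR}_{it} &= X_{it}^{W} + \mathit{WBR}_{it} && \forall i \in N,\ 1 \le t \le T \tag{4} \\
\mathit{WBR}_{i,t+1} &= \mathit{WAR}_{it} - D_{it}^{W} + C_{it}^{D}\, C_{i} - D_{it}^{P} && \forall i \in N,\ 1 \le t \le T \tag{5} \\
X_{it}^{W} &\le Q_{it}\, C_{i} && \forall i \in N,\ 1 \le t \le T \tag{6} \\
\mathit{QN}_{it} &\le Q_{it}\, M_{i} && \forall i \in N,\ 1 \le t \le T \tag{7} \\
\mathit{QN}_{it}\, C_{i} + \mathit{XBR}_{it} - X_{it}^{W} &\le F_{i} && \forall i \in N,\ 1 \le t \le T \tag{8} \\
D_{it}^{W} + D_{it}^{S} + D_{it}^{P} &= D_{it} && \forall i \in N,\ 1 \le t \le T \tag{9} \\
X_{it}^{W} &\le \mathit{XBR}_{it} && \forall i \in N,\ 1 \le t \le T \tag{10} \\
D_{it}^{P} &\le C_{it}^{D}\, C_{i} && \forall i \in N,\ 1 \le t \le T \tag{11} \\
D_{it}^{W} &\le \mathit{WAR}_{it} && \forall i \in N,\ 1 \le t \le T \tag{12} \\
\mathit{QN}_{it}\, C_{i} &\le F_{i} && \forall i \in N,\ 1 \le t \le T \tag{13} \\
\mathit{XAR}_{it} &\le F_{i} && \forall i \in N,\ 1 \le t \le T \tag{14} \\
\sum_{i=1}^{N} Q_{it} &\le \tau && 1 \le t \le T \tag{15} \\
\mathit{XBR}_{it},\ \mathit{XAR}_{it},\ \mathit{WBR}_{it},\ \mathit{WAR}_{it},\ \mathit{QN}_{it},\ X_{it}^{W},\ D_{it}^{S},\ D_{it}^{W},\ D_{it}^{P},\ C_{it}^{D} &\ge 0 && \forall i \in N,\ 1 \le t \le T \tag{16} \\
Q_{it} &\in \{0, 1\} \tag{17}
\end{align}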
The objective function (1) minimizes overall costs, which include the costs associated with internal replenishment ($c_{r}$), direct picking ($c_{d}$), inventory holding in the buffer area ($c_{w}$), and transferring inventory from the forward area to the buffer area ($c_{wx}$). Based on the seventh assumption listed above (a partially filled cage is moved to the buffer area when a new full cage arrives), the cost of transferring inventory from the forward area to the buffer area is included to discourage unnecessary handling activities.
In this model, the cost structure is defined with the assumption that direct picking incurs the highest cost, followed by forward-to-buffer area transfer, then buffer area usage, with replenishment having the lowest cost. This hierarchy reflects the labor intensity and resource requirements of each operation.
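As a rough illustration of how the cost terms combine, the sketch below evaluates the contribution of one part in one period to the objective function (1), as that objective is read here. The cost values are placeholders chosen only to respect the stated ordering (direct picking highest, replenishment lowest); they are not taken from the case study, and the names `Costs` and `period_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    # Placeholder values, assumed only to satisfy c_d > c_wx > c_w > c_r.
    c_d: float = 10.0   # direct picking
    c_wx: float = 4.0   # forward-to-buffer transfer
    c_w: float = 2.0    # buffer holding
    c_r: float = 1.0    # internal replenishment


def period_cost(q, cd, war, dw, wbr, xw, costs=Costs()):
    """Cost of one part in one period, following the terms of objective (1)."""
    return (q * costs.c_r
            + cd * costs.c_d
            + (war - dw + wbr) * costs.c_w
            + xw * costs.c_wx)


# One replenishment, no direct picking, 3 units moved into an empty buffer,
# 1 of which is consumed from the buffer during the day.
print(period_cost(q=1, cd=0, war=3, dw=1, wbr=0, xw=3))  # 17.0
```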
Constraints (2) and (3) determine the stock levels in the forward area from the inventory of the preceding period and the replenishment quantity of the current period. Constraints (4) and (5) model the inventory movement between the buffer and forward areas. Constraints (6) and (7) are either-or constraints that ensure the variables $X_{it}^{W}$ and $\mathit{QN}_{it}$ and the binary variable $Q_{it}$ are assigned consistently. Equation (18) defines the replenishment decision. This binary decision is critical because it triggers the sequence of subsequent inventory adjustments and movements.
\[
Q_{it} =
\begin{cases}
1, & \text{if the forward area is replenished} \\
0, & \text{if the forward area is not replenished}
\end{cases}
\tag{18}
\]
Equation (19) links the replenishment decision to stock relocation from the forward area to the buffer area. If a positive quantity of at most one batch is moved ($0 < X_{it}^{W} \le C_{i}$), the binary variable is set to 1, indicating replenishment. If no stock is moved, the variable is set to 0.
\[
Q_{it} =
\begin{cases}
1, & \text{if } C_{i} \ge X_{it}^{W} > 0 \\
0, & \text{if } X_{it}^{W} = 0
\end{cases}
\tag{19}
\]
Equation (20) shows the corresponding logic for $\mathit{QN}_{it}$. Including $\mathit{QN}_{it}$ is particularly important because it ties the replenishment decision to the physical quantity of inventory moved, so that the model reflects the practical limits of the warehouse's operational capacity. Together, these equations fine-tune inventory management and enhance the overall efficiency of warehouse operations, leading to a reasonable balance between availability and the cost implications of inventory holding and movement.
\[
Q_{it} =
\begin{cases}
1, & \text{if } M_{i} \ge \mathit{QN}_{it} > 0 \\
0, & \text{if } \mathit{QN}_{it} = 0
\end{cases}
\tag{20}
\]
Constraint (9) ensures that the total demand is met through a combination of stock from the forward area, stock from the buffer area, and direct picking. Constraint (10) ensures that the inventory transferred from the forward area to the buffer area does not exceed the existing stock in the forward area. Constraint (11) ensures that enough cages are brought from the reserve area to cover the demand met by direct picking. Constraint (12) limits the demand covered from the buffer area to the stock available in that area.
Constraints (8), (13), and (14) ensure that the quantity of parts brought from the reserve area for replenishment is limited to the capacity of the forward area. Finally, constraint (15) ensures compliance with the daily replenishment capacity, preventing overstocking and optimizing resource allocation. The objective function and the constraint set in (1)–(20) constitute our mathematical model for effectively managing warehouse inventory.
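To make the interplay of the balance equations and the capacity constraints concrete, the following sketch applies them to a single part for a single day. It is an illustrative checker under the reading of constraints (2)-(14) and the nonnegativity condition (16) used here, not the authors' solver code; the function `step` and its argument names are hypothetical, and the daily capacity constraint (15) is omitted because it couples all parts.

```python
def step(xbr, wbr, q, qn, xw, ds, dw, dp, cd, demand, C, F, M):
    """Advance one part by one day and return next-day (XBR, WBR).

    Raises ValueError if the candidate decision violates a constraint.
    """
    xar = xbr + qn * C - xw   # (2) forward stock after replenishment
    war = xw + wbr            # (4) buffer stock after the transfer
    next_xbr = xar - ds       # (3) forward stock carried to the next day
    next_wbr = war - dw + cd * C - dp   # (5) buffer stock carried over
    checks = {
        "(6)": xw <= q * C,
        "(7)": qn <= q * M,
        "(8)": qn * C + xbr - xw <= F,
        "(9)": dw + ds + dp == demand,
        "(10)": xw <= xbr,
        "(11)": dp <= cd * C,
        "(12)": dw <= war,
        "(13)": qn * C <= F,
        "(14)": xar <= F,
        "(16)": min(next_xbr, next_wbr) >= 0,
    }
    violated = [name for name, ok in checks.items() if not ok]
    if violated:
        raise ValueError("violated constraints: " + ", ".join(violated))
    return next_xbr, next_wbr


# A shelf holding 3 units receives one full cage of 10; the 3 leftover units
# move to the buffer and are picked first to serve a demand of 6.
print(step(xbr=3, wbr=0, q=1, qn=1, xw=3, ds=3, dw=3, dp=0, cd=0,
           demand=6, C=10, F=10, M=1))  # (7, 0)
```

Setting `q=0` while keeping `qn=1` in the same call raises an error on constraint (7), which is how the either-or constraints tie the physical movement to the binary decision.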

4. Results and Analysis

The results presented in this section are based on a one-month simulation conducted for spare parts groups of 50, 200, and 300 items. To provide robust validation for our findings, we performed 30 independent simulation runs for each scenario. In each run, the actual demand was generated stochastically to reflect real-world uncertainty, while the model only had access to its deterministic forecast. Different random seeds were used to ensure demand variability across the runs. All reported metrics for these scenarios show the mean and the 95% confidence interval. The proposed optimization model’s performance is evaluated against three distinct benchmarks:
  • (s,S) Policy: A standard inventory control policy from the academic literature, serving as a rigorous algorithmic benchmark.
  • Business as Usual: The current, manual practice in the case study warehouse, serving as a practical baseline.
  • Simulation with Known Demand (Simulation KD): A theoretical, perfect-information case, corresponding to the “dynamic oracle” benchmark discussed in the literature [20], used to establish an upper bound on performance.
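For readers unfamiliar with the first benchmark, the classic (s,S) rule can be written in a few lines. The sketch below uses one common convention (order when stock is at or below $s$); how the rule was parameterized for the case study is not stated in this section, so the function name and the example values are purely illustrative.

```python
def s_S_order_quantity(inventory, s, S):
    """Classic (s,S) rule: when stock is at or below s, order up to S."""
    if s > S:
        raise ValueError("reorder point s must not exceed order-up-to level S")
    return S - inventory if inventory <= s else 0


# With s = 4 and S = 20, a stock of 3 triggers an order of 17 units,
# while a stock of 9 triggers no order.
print(s_S_order_quantity(3, 4, 20))  # 17
print(s_S_order_quantity(9, 4, 20))  # 0
```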
Our proposed model, Simulation with Predicted Demand (Simulation PD), is driven by the ADP framework with machine learning forecasts. The findings illustrate the benefits of integrating predictive demand models and buffer area strategies into warehouse operations.
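The mean and 95% confidence interval reported over 30 runs can be computed as sketched below. The text does not state which interval was used; this example assumes a two-sided Student-t interval, with the critical value 2.045 for 29 degrees of freedom hard-coded as a default. The function name `mean_ci95` and the synthetic data are illustrative only.

```python
import math
import random
import statistics


def mean_ci95(samples, t_crit=2.045):
    """Return the mean and the half-width of a two-sided 95% interval.

    t_crit defaults to the Student-t quantile for 29 degrees of freedom
    (30 runs); pass another value for a different number of runs.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Synthetic direct-picking percentages for 30 seeded runs (not study data).
rng = random.Random(42)
runs = [rng.gauss(14.0, 1.5) for _ in range(30)]
mean, half_width = mean_ci95(runs)
print(f"{mean:.2f} +/- {half_width:.2f}")
```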

4.1. Metrics and Scenarios

The performance of the optimization model was evaluated using four key metrics:
  • Direct Picking: This represents the percentage of items picked directly from the reserve area, bypassing forward and buffer areas. A lower percentage indicates a more streamlined replenishment process.
  • Replenishment: This tracks the frequency of stock restocking activities, ensuring forward areas remain adequately stocked to meet demand.
  • Buffer Area Usage: This measures the percentage of inventory stored temporarily in buffer areas. Lower usage reflects improved stock flow without reliance on intermediary storage.
  • Forward-to-Buffer Transfers: This reflects the movement of inventory from forward areas to buffer areas due to capacity constraints or imbalances.

4.2. Comparative Analysis of Handling Activities

The proposed solution framework was applied to a case with all part groups, and the results are presented in Table 1. The most striking achievement is the dramatic and statistically significant reduction in direct picking activities. For the 50-part group, our proposed model (Simulation PD) averages 13.8% direct picking, which is a substantial improvement over both the 19.5% from the standard (s,S) policy and the 28.1% from Business as Usual. This signifies a more streamlined picking process that translates into faster fulfillment and reduced labor costs, both key factors for warehouse efficiency. Another notable outcome is the optimized use of buffer space, a critical resource in managing inventory flow. While buffer usage under Business as Usual is excessively high (56.8%), our model maintains a healthier balance (18.9%) compared to the more reactive (s,S) policy (25.4%), ensuring that space is used only when truly necessary, which avoids congestion and enhances throughput. These performance trends hold and are often amplified as the part group size increases. For instance, in the 300-part group, our model reduces direct picking to just 5.5%, compared to 8.9% for the (s,S) policy and 19.2% for Business as Usual, demonstrating the scalability of the model’s benefits. In essence, these results highlight the potential for real-world implementation, promising significant gains in operational performance and efficiency over both current practice and standard academic benchmarks.

4.3. Summary of Handling Activities: Total and Average Metrics

The findings presented in Table 2 offer valuable insights into the impact of optimization on warehouse handling activities. The key takeaway is the significant reduction in direct picking in our proposed Simulation PD compared to both the (s,S) policy and Business as Usual. This illustrates the efficiency gains achieved through predictive and demand-driven approaches, particularly Simulation KD, which leverages known demand to minimize unnecessary direct picking, one of the most labor-intensive and costly activities in warehouse operations. The results also highlight the role of replenishment in improving operational efficiency. In Simulation KD, higher replenishment rates indicate that the system is replenishing stock at more optimal times and quantities, which in turn reduces the reliance on direct picking. The minimal increase in replenishment activities demonstrates that this process is not overly burdensome and instead adds to the overall efficiency by ensuring the forward area is adequately stocked.
The buffer area plays a crucial role in operational efficiency, especially in smaller part groups. In the 50-part scenario, Simulation KD shows a higher reliance on buffer usage, as this model optimizes buffer space to balance demand fluctuations. As the part group size increases, buffer usage in Simulation KD decreases, suggesting that for larger part groups, the optimization model relies more on strategic replenishment and less on buffer storage. This reflects the flexibility of the optimization model, adjusting to the scale of the operation and using buffers only when necessary. If the buffer area were not used at all, we would expect a significant increase in direct picking activities and overall handling costs. Without the buffer, the system would rely more on direct fulfillment from the forward or reserve areas, leading to inefficiencies and increased labor costs. The results make it clear that buffer areas, when used strategically, help smooth out inventory management, reduce unnecessary work, and improve overall efficiency. In summary, the findings show that the use of optimization, particularly Simulation KD, enhances warehouse operations by reducing direct picking and strategically utilizing the buffer area.
The buffer’s role diminishes in larger part groups due to more effective replenishment planning, but its presence still adds value to smaller-scale operations. This demonstrates that optimized replenishment and buffer management can significantly improve warehouse efficiency, minimizing costs and unnecessary handling activities across varying scales of operation.

4.4. Time Horizon Impact on Operations

Table 3 illustrates how handling activities vary across different planning horizons (1 week, 2 weeks, and 3 weeks) for the 50-part group. The results show that the optimization model maintains consistent performance regardless of the length of the planning period. For direct picking, the percentage remains stable across all planning horizons, with only a slight decrease from 13.83% in the 1-week and 2-week horizons to 13.73% in the 3-week horizon. This stability suggests that the optimization model is effective in consistently managing direct picking activities, regardless of the planning period. Replenishment activities also show minimal variation, staying nearly identical across the different horizons, with a minor increase to 33.97% in the 3-week horizon. This indicates that the model can reliably manage replenishment needs over extended periods without significant changes in the handling activity percentage. Buffer area usage and forward-to-buffer area transfers exhibit a similarly stable pattern, with only slight variations as the planning horizon increases. For instance, buffer area usage increases marginally from 18.91% in the 1-week and 2-week horizons to 19.067% in the 3-week horizon. This consistency across metrics underscores the robustness of the optimization model in handling inventory activities effectively, even as the planning horizon extends.
Overall, Table 3 demonstrates that the optimization model’s performance remains steady over different planning periods, ensuring reliable and efficient handling activities regardless of the planning horizon.
Table 4 presents the sum and mean values of handling activities across different planning horizons (1 week, 2 weeks, and 3 weeks) for the 50-part group. The key takeaway is that both the total handling activities and average efficiency remain consistent, regardless of the planning period. Direct picking activities exhibit minimal variation, with a slight decrease in the 3-week horizon, indicating that the model effectively manages operations over longer periods without significant changes in the workload. Similarly, replenishment activities maintain stable sum and mean values across all planning horizons, suggesting that the optimization model ensures consistent efficiency. While buffer usage shows a minor increase over the 3-week horizon, its impact on overall performance is minimal. In summary, the results in Table 4 highlight the model’s effectiveness in maintaining consistent handling activities and operational efficiency across varying planning horizons.

5. Discussion

5.1. Cost and Efficiency Trends

Total cost trends were observed over time in four scenarios—Simulation PD, (s,S) Policy, Simulation KD, and Business as Usual—for three different parts groups (50, 200, and 300 parts). This analysis provides valuable information about the performance of each scenario and the effectiveness of the optimization models. It highlights how each approach impacts costs, particularly in the context of an automotive parts warehouse.
Figure 2 presents the total daily costs for the scenarios of the 50-part group in a line graph. The Simulation KD scenario, which has the perfect knowledge of demand, consistently outperforms all other approaches, establishing a theoretical lower bound on costs. This outcome underscores the significant advantage of accurate demand forecasting in inventory management. The Simulation KD scenario’s lower total costs suggest that when demand can be known with certainty, inventory systems can be optimized to a level of efficiency that minimizes the total cost, likely through more effective stock positioning and reduced necessity for last-minute adjustments that typically incur higher costs. In contrast, the Simulation PD scenario, which relies on predicted demand, yields a total cost profile that exceeds that of the Business as Usual and (s,S) Policy scenarios at various points. This observation suggests that while predictive models aim to improve standard practices, the inherent uncertainty and potential inaccuracies in demand forecasting may lead to suboptimal decision-making that can inadvertently increase costs. It is important to note, however, that the higher costs associated with the Simulation PD model do not necessarily denote a flawed model; instead, they may reflect the complex trade-offs between different operational objectives, such as service level targets, risk of stockouts, and the cost of holding inventory. Moreover, the peaks in the cost graph for Simulation PD could indicate moments where the predictive model may have over- or under-estimated demand, leading to increased costs due to excess stock or emergency restocking. These variations in the Simulation PD scenario emphasize the critical role of forecasting accuracy and the potential cost implications of deviations from actual demand.
Notably, the (s,S) Policy scenario falls consistently between Simulation PD and Business as Usual, validating its role as a reasonable academic benchmark but highlighting its limitations in adapting to dynamic demand. Meanwhile, the Business as Usual approach remains the most costly throughout, with significant variability and frequent cost spikes, illustrating the inefficiencies of manual decision-making and lack of systematic forecasting.
This graph does not dismiss predictive models’ value but highlights the complexity and challenges associated with demand forecasting. It also illustrates the substantial benefits realized when demand is known precisely, presenting a best-case scenario for optimization. Thus, these findings serve as a catalyst for investing in the accuracy and robustness of demand forecasting methods within predictive models to bridge the gap between the idealized outcomes of known demand and the fluctuating results of predicted demand scenarios.
For the 200-part group, it is shown in Figure 3 that the Simulation PD scenario fluctuates significantly and, like in the 50-part set, sometimes exceeds the Business as Usual scenario in terms of total cost. This could indicate that the predictive model may struggle with scale, where the complexity of managing a more extensive inventory set might amplify the cost of inaccuracies in demand forecasting. The Simulation KD scenario maintains a significant advantage over other approaches, reinforcing that accurate demand knowledge allows for superior inventory optimization. The costs are consistently lower across the observed time frame, suggesting scalable efficiency gains when demand is precisely known. Business as Usual exhibits the highest costs among the three scenarios, which may signal inefficiencies that are magnified as the inventory set increases. This scenario appears to lack the dynamic adjustments possible in the simulated scenarios, leading to less optimized cost outcomes. The (s,S) Policy provides a consistently moderate performance, superior to Business as Usual but generally inferior to Simulation PD and far from the efficiency of Simulation KD. Its performance reflects a rule-based approach that, while systematic, lacks responsiveness to real-time demand dynamics.
As shown in Figure 4, in the 300-part group, the Simulation PD scenario again shows variability in total costs, with peaks potentially reflecting moments of over- or under-estimation of demand by the predictive model. This variability seems less pronounced than in the 200-part group, suggesting some degree of scalability in the model’s predictive accuracy or possibly adaptive mechanisms that become more effective with larger data sets. Simulation KD continues to outperform the other scenarios, suggesting that the benefits of known demand in cost optimization are evident and consistent across different inventory sizes. The model’s cost efficiency is apparent, with the lowest costs throughout, indicating that the ability to forecast demand precisely correlates directly with cost savings. Business as Usual remains relatively stable but at a higher cost level compared to Simulation KD, yet it occasionally dips below the costs observed in Simulation PD. This might reflect the inherent inefficiencies in a system that does not utilize optimization models to respond to changing demand patterns.
The forward-reserve replenishment optimization analysis highlights consistent trends and critical insights across all part groups (50, 200, and 300 parts). Simulation PD consistently demonstrates lower direct picking frequencies than the Simulation KD and Business as Usual scenarios. This pattern suggests the efficacy of predictive demand optimization in reducing direct picking occurrences, thereby potentially enhancing operational efficiency and cost-effectiveness.
Furthermore, Simulation KD consistently exhibits lower overall costs allocated to direct picking and higher costs allocated to pulling inventory from forward to buffer areas compared to Simulation PD and Business as Usual. This indicates a strategic allocation of resources towards optimizing inventory movement scenarios with known demand, potentially resulting in more streamlined operations and reduced overhead costs.
In contrast, the Business as Usual scenario consistently relies on traditional inventory management practices, with higher direct picking frequencies and higher costs allocated to buffer area usage. This suggests a potential opportunity for optimization through the adoption of predictive analytics and more efficient inventory management strategies.
The cost profile for the Simulation PD scenario highlights the practical limits of forecast-driven optimization. The observed cost spikes, particularly when compared to the stable, low costs of the Simulation KD scenario, can be attributed to the model’s reaction to forecast inaccuracies. When the predicted demand deviates significantly from actual demand, the certainty-equivalent model makes decisions that are suboptimal in hindsight. For instance, over-forecasting can lead to excessive replenishment and high inventory holding costs, while under-forecasting can trigger costly, last-minute direct picking to avoid stockouts. This analysis underscores that while the ADP framework is robust, its performance is intrinsically linked to the quality of its forecast inputs.
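The mechanism described above can be illustrated with a toy certainty-equivalent rolling-horizon loop. This is a deliberately simplified sketch, not the paper's ADP implementation: it tracks a single part, ignores the buffer area and shelf capacity, and replaces the MILP with a hypothetical stand-in rule (`plan_first_day`) that replenishes one cage whenever the forecast for the current day exceeds the forward stock. All names and numbers are assumptions.

```python
def plan_first_day(stock, window):
    """Stand-in planner: treat the forecast as certain and replenish today
    if today's forecast exceeds the stock. The actual framework would
    optimize the whole window and apply only the first-day decision."""
    return window[0] > stock


def rolling_horizon(actual, forecast, horizon, cage_size, start_stock):
    """Simulate day by day; return (replenishments, direct-picked units)."""
    stock, replenishments, direct_picked = start_stock, 0, 0
    for day, demand in enumerate(actual):
        window = forecast[day:day + horizon]   # forecasts for the look-ahead
        if plan_first_day(stock, window):
            stock += cage_size                 # one full cage arrives
            replenishments += 1
        served = min(stock, demand)            # actual demand is revealed
        direct_picked += demand - served       # shortfall is picked directly
        stock -= served
    return replenishments, direct_picked


actual = [4, 6, 3, 8, 5]
# Perfect information (analogous to Simulation KD): no direct picking.
print(rolling_horizon(actual, actual, 3, 10, 5))            # (3, 0)
# Under-forecast on days 2 and 4 (analogous to Simulation PD).
print(rolling_horizon(actual, [4, 1, 3, 2, 5], 3, 10, 5))   # (2, 6)
```

In this toy setting, under-forecasting skips a replenishment and pushes six units into direct picking, which is the kind of cost spike discussed for the Simulation PD scenario.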

5.2. Managerial Insights

The results from this study provide actionable insights for warehouse managers looking to optimize operations in dynamic environments. Key takeaways include the following:
  • Buffer areas should be viewed as a flexible resource for managing demand fluctuations, but they should be used thoughtfully. Overusing them can cause inefficiencies, while underusing them can lead to congestion in forward areas. Managers should focus on clearing out excess inventory in buffer areas first, as shown in the model, to keep operations smooth and efficient.
  • The variability in Simulation PD outcomes highlights the critical need for robust and accurate demand forecasting systems. Investments in machine learning or advanced analytics can close the gap between predictive and perfect knowledge scenarios.
  • The significant cost savings and efficiency gains observed for larger part groups indicate that the proposed model is particularly beneficial for high-volume warehouses.
  • Managers overseeing complex operations should prioritize strategic replenishment planning, which reduces direct picking and minimizes labor costs.
  • While Simulation KD demonstrates the lowest costs, achieving perfect demand knowledge is often unrealistic. Managers should aim to combine predictive tools with adaptive strategies to mitigate forecasting errors and maintain flexibility.
These insights serve as a roadmap for implementing advanced replenishment strategies that align with real-world constraints and opportunities.
Furthermore, managers should be aware of the model’s sensitivity to cost parameters. The optimal policies for replenishment and buffer use will shift based on the relative costs of different operations. For example, if the cost of direct picking were to decrease significantly, the model would naturally favor more direct fulfillment from the reserve area and rely less on maintaining high stock levels in the forward and buffer areas. Conversely, an increase in inventory holding costs would incentivize the model to maintain leaner inventories and use the buffer more sparingly. A sensitivity analysis of these cost parameters could provide valuable insights for tailoring the strategy to a specific operational environment.
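A very small version of such a sensitivity analysis is sketched below. It compares two hypothetical one-day plans for a single part under the cost terms of objective (1) while the direct-picking cost is varied. The plans, the cost values, and the function name `plan_cost` are invented for illustration; `buffer_units` stands for the buffer inventory term charged at $c_{w}$.

```python
def plan_cost(plan, c_r, c_d, c_w, c_wx):
    """Cost of a one-day plan using the four cost terms of objective (1)."""
    return (plan["Q"] * c_r + plan["CD"] * c_d
            + plan["buffer_units"] * c_w + plan["XW"] * c_wx)


# Two hypothetical ways of serving the same demand.
plans = {
    "replenish": {"Q": 1, "CD": 0, "buffer_units": 2, "XW": 3},
    "direct_pick": {"Q": 0, "CD": 1, "buffer_units": 4, "XW": 0},
}


def cheapest_plan(c_d, c_r=1.0, c_w=2.0, c_wx=4.0):
    """Name of the cheaper plan for a given direct-picking cost."""
    return min(plans, key=lambda name: plan_cost(plans[name], c_r, c_d, c_w, c_wx))


# Sweep the direct-picking cost; the preferred plan switches near c_d = 9.
print([(c_d, cheapest_plan(c_d)) for c_d in (6, 8, 12)])
# [(6, 'direct_pick'), (8, 'direct_pick'), (12, 'replenish')]
```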

5.3. Limitations and Future Research

Our study, while providing a robust framework, has certain limitations that open avenues for future research. First, the model’s performance is highly dependent on the accuracy of demand forecasts. The certainty-equivalent approach does not explicitly model the stochastic nature of demand, which can lead to suboptimal decisions as discussed. Future work could explore more advanced stochastic optimization techniques or two-stage programming to better account for uncertainty.
Second, practical implementation requires a sophisticated IT infrastructure. This includes systems for real-time inventory tracking, automated data pipelines for the demand forecasting model, and sufficient computational power to solve the MILP on a daily basis.
For future research, several promising directions can extend the contributions of this study. First, to provide a more comprehensive benchmark, our ADP model could be evaluated against established warehouse heuristics and purely deterministic models to better quantify the value added by the adaptive framework. Furthermore, the model could be enhanced by explicitly incorporating sustainability objectives. Inspired by the work of Stanković et al. [21] on optimizing dock door allocation to save energy, our objective function could be augmented to include energy consumption costs associated with internal vehicle movements, creating a trade-off between traditional operational costs and environmental impact. Building on this, the advanced stochastic optimization methods discussed by Zhong et al. [22] could be explored to manage the dual uncertainties of demand and energy pricing, moving beyond our certainty-equivalent approach to a more distributionally robust framework. Finally, the integration of our framework with warehouse automation presents a significant opportunity. The current ADP model could be adapted to generate optimal task sequences for an Automated Storage and Retrieval System (AS/RS), where the model’s decisions would directly command robotic agents. This extension would involve reformulating cost functions to reflect robotic operational parameters and could leverage specialized optimization techniques, such as the cross-entropy method proposed by Foumani et al. [23], to solve the resulting complex scheduling problems.

6. Conclusions

This study comprehensively examined the internal replenishment process in an automotive spare parts warehouse, a critical component of the supply chain that directly impacts operational efficiency, costs, and customer satisfaction. One of the most significant challenges faced with this problem is that the system expands into an infinite state space over time due to its nonstationary distribution and multi-product structure. This creates computational difficulties, especially the curse of dimensionality, making it nearly impossible to solve with traditional methods like DP. The DP approach for this problem is highly complex and computationally costly, as it must consider each situation and decision.
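The scale of the problem can be illustrated with a simple count. The sketch below assumes, purely for illustration, that each part's forward and buffer stock take integer values between zero and a fixed capacity; the real system is larger still because demand is nonstationary. The function name and the capacities are hypothetical.

```python
def state_space_size(n_parts, forward_capacity, buffer_capacity):
    """Number of joint inventory states under an integer discretization
    (an illustrative assumption, not the paper's state definition)."""
    per_part = (forward_capacity + 1) * (buffer_capacity + 1)
    return per_part ** n_parts


print(state_space_size(1, 10, 10))             # 121
print(state_space_size(2, 10, 10))             # 14641
print(len(str(state_space_size(50, 10, 10))))  # 105 (number of digits)
```

Even with only eleven stock levels per area, the 50-part group already yields a number of joint states with more than one hundred digits, which is why exact DP is not practical here.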
By combining advanced methodologies such as ADP with machine learning techniques for demand forecasting, the research has developed a robust model that addresses the complexities of inventory management while improving decision-making under uncertainty. The findings indicate that maintaining optimal stock levels in the forward area significantly reduces the inefficiencies of direct picking from the reserve area, leading to a more effective replenishment process. The ADP application, designed to operate as a daily system, prioritizes parts that require more urgent replenishment and adapts to limited staff capacity, ensuring timely and efficient replenishment according to fluctuating demand patterns. Operating the system daily allows the warehouse to continuously adjust replenishment and buffer area management according to real-time demand. In this way, demand fluctuations are addressed immediately, stock levels are kept balanced, and labor and other costs are reduced. Daily operation makes warehouse processes more flexible and dynamic, creating a structure that is more resistant to sudden changes and that increases customer satisfaction.
The study also emphasizes the crucial role of buffer areas in managing excess stock and preventing congestion. In an automotive warehouse environment, where parts vary significantly in size, demand frequency, and urgency, buffers are particularly vital for managing demand changes, facilitating efficient picking, and reducing congestion in smaller part groups. As part groups increase, the optimization model shifts focus to precise replenishment timing, reducing buffer dependence while keeping the forward area adequately stocked. Without buffer area utilization, handling costs would rise dramatically, especially in the dynamic environment of an automotive spare parts warehouse, where rapid access to parts is critical. The findings clearly demonstrate that strategic replenishment and buffer integration are essential for reducing labor costs, optimizing space usage, and ensuring smooth operations in a demand-sensitive warehouse.
The research suggests strategies that enable more efficient use of space and resources by prioritizing critical areas during order picking, thus supporting smoother operations even with limited staff.
In conclusion, this study’s successful application of integrated ADP and machine learning models highlights the potential to improve warehouse operations and provides a basis for further innovation in inventory management applications. As businesses continue to meet the challenges of the evolving supply chain environment, the methodologies developed here offer strategic advantages, increased operational efficiency, and greater resilience to market fluctuations. This research contributes to a deeper understanding of internal replenishment processes, offering a compelling example of how advanced analytical tools can effectively manage complex logistics operations.

Author Contributions

İ.K. (Corresponding author): conceptualization, formal analysis, investigation, methodology, validation, visualization, writing—original draft, writing—review & editing. M.H.: supervision, methodology, writing—review & editing. A.D.Y.: supervision, methodology, writing—review & editing. G.K.: supervision, methodology, writing—review & editing. V.Ş.E.: supervision, methodology, writing—review & editing. Ş.Y.: conceptualization, formal analysis, investigation. All authors have read and agreed to the published version of the manuscript.

Funding

This study is funded by the Scientific and Technological Research Council of Turkey (TUBITAK) under the 2244 Industrial PhD Fellowship Program, grant number 119C085.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are proprietary and were provided under a research agreement. Due to confidentiality agreements and competitive considerations, the raw data cannot be made openly available. However, simulated datasets and the core model code can be made available to academic researchers upon reasonable request.

Conflicts of Interest

Authors İrem Kalafat and Şenda Yıldırım are employed by Doğuş Technology as salaried staff members. However, this research was conducted independently and was not funded by the company. The authors declare no other conflicts of interest. Authors Mustafa Hekimoğlu, Ahmet Deniz Yücekaya, Gökhan Kirkil and Volkan Ş. Ediger declare no conflicts of interest.

Abbreviations

The following abbreviations and notation are used in this manuscript.
Parameters:
$N$: Set of spare parts in the warehouse;
$T$: Length of the planning horizon in days;
$D_{it}$: Demand for product $i$ at time $t$;
$C_i$: Average batch size for the replenishment of product $i$;
$F_i$: Maximum inventory capacity of the forward area for product $i$;
$\tau$: Daily replenishment capacity (maximum number of replenishments per day);
$M_i$: Maximum number of batches that can be needed for the replenishment of product $i$.
Costs:
$c_d$: Cost of direct picking from the reserve area;
$c_w$: Cost of keeping inventory in the buffer area;
$c_{wx}$: Cost of pulling inventory from the forward area to the buffer area;
$c_r$: Cost of internal replenishment.
Decision Variables:
$XBR_{it}$: Inventory level of product $i$ in the forward area at time $t$ before replenishment;
$XAR_{it}$: Inventory level of product $i$ in the forward area at time $t$ after replenishment;
$X^{W}_{it}$: Amount of product $i$ that is pulled into the buffer area for replenishment at time $t$;
$Q_{it}$: Binary replenishment decision for product $i$ at time $t$;
$QN_{it}$: Number of cages needed for replenishing product $i$ at time $t$ from the reserve area;
$WBR_{it}$: Inventory level of product $i$ in the buffer area at time $t$ before replenishment;
$WAR_{it}$: Inventory level of product $i$ in the buffer area at time $t$ after replenishment;
$D^{W}_{it}$: Amount of demand covered from the buffer area for product $i$ at time $t$;
$D^{S}_{it}$: Amount of demand covered from the forward area for product $i$ at time $t$;
$D^{P}_{it}$: Amount of demand covered from the reserve area (direct picking) for product $i$ at time $t$;
$C^{D}_{it}$: Number of batches needed for direct picking for product $i$ at time $t$.

Appendix A

A. Dynamic Programming Formulation
1. State Variables:
$S_t = (XBR_{it},\ XAR_{it},\ WBR_{it},\ WAR_{it})$: State at time $t$, representing the inventory levels before and after replenishment in both the forward and buffer areas for each product $i$.
2. Decision Variables:
$a_t = (Q_{it},\ QN_{it},\ X^{W}_{it},\ D^{S}_{it},\ D^{W}_{it},\ D^{P}_{it},\ C^{D}_{it})$: Action at time $t$, representing the decisions for each product $i$.
3. Value Function:
$V_t(S_t)$: The minimum cost-to-go from state $S_t$ at time $t$ to the end of the planning horizon.
4. Bellman Equation:
$$\mathrm{cost}_t = \sum_{i=1}^{N} \left( Q_{it} c_r + C^{D}_{it} c_d + (WAR_{it} - D^{W}_{it} + WBR_{it}) c_w + X^{W}_{it} c_{wx} \right)$$
$$V_t(S_t) = \min_{a_t} \left\{ \mathrm{cost}_t + \mathbb{E}\left[ V_{t+1}(S_{t+1}) \right] \right\}$$
$$V_t(S_t) = \min_{a_t} \left\{ \sum_{i=1}^{N} \left( Q_{it} c_r + C^{D}_{it} c_d + (WAR_{it} - D^{W}_{it} + WBR_{it}) c_w + X^{W}_{it} c_{wx} \right) + \mathbb{E}\left[ V_{t+1}(XBR_{i,t+1},\ XAR_{i,t+1},\ WBR_{i,t+1},\ WAR_{i,t+1}) \right] \right\}$$
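5. Constraints
State Transition Equations:
$$XAR_{it} = XBR_{it} + QN_{it} C_i - X^{W}_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
$$XBR_{i,t+1} = XAR_{it} - D^{S}_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
$$WAR_{it} = X^{W}_{it} + WBR_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
$$WBR_{i,t+1} = WAR_{it} - D^{W}_{it} + C^{D}_{it} C_i - D^{P}_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
Capacity Constraints:
$$X^{W}_{it} \le Q_{it} C_i \qquad \forall i \in N,\ 1 \le t \le T$$
$$QN_{it} \le Q_{it} M_i \qquad \forall i \in N,\ 1 \le t \le T$$
$$QN_{it} C_i + XBR_{it} - X^{W}_{it} \le F_i \qquad \forall i \in N,\ 1 \le t \le T$$
$$QN_{it} C_i \le F_i \qquad \forall i \in N,\ 1 \le t \le T$$
$$XAR_{it} \le F_i \qquad \forall i \in N,\ 1 \le t \le T$$
Demand Fulfillment:
$$D^{W}_{it} + D^{S}_{it} + D^{P}_{it} = D_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
$$X^{W}_{it} \le XBR_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
$$D^{P}_{it} \le C^{D}_{it} C_i \qquad \forall i \in N,\ 1 \le t \le T$$
$$D^{W}_{it} \le WAR_{it} \qquad \forall i \in N,\ 1 \le t \le T$$
Total Replenishment Capacity:
$$\sum_{i=1}^{N} Q_{it} \le \tau \qquad 1 \le t \le T$$
Non-negativity and binary:
$$XBR_{it},\ XAR_{it},\ WBR_{it},\ WAR_{it},\ QN_{it},\ X^{W}_{it},\ D^{S}_{it},\ D^{W}_{it},\ D^{P}_{it},\ C^{D}_{it} \ge 0 \qquad \forall i \in N,\ 1 \le t \le T$$
$$Q_{it} \in \{0, 1\}$$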
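The formulation above can be illustrated with a minimal Python sketch of a single period for one product. It is only an illustration under stated assumptions: the names `Params` and `step` and all numeric values are hypothetical, the signs and inequality directions follow the constraints as reconstructed in this appendix, and the shared capacity limit $\tau$ is omitted because it couples several products.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Per-product parameters and unit costs (illustrative values only)."""
    C: int        # average batch size C_i
    F: int        # forward-area capacity F_i
    M: int        # maximum number of batches M_i
    c_r: float    # cost of internal replenishment
    c_d: float    # cost of direct picking
    c_w: float    # cost of keeping inventory in the buffer area
    c_wx: float   # cost of pulling inventory from forward to buffer


def step(xbr, wbr, q, qn, xw, d_s, d_w, d_p, n_dp, demand, p):
    """Apply one period for one product.

    Returns (cost, xbr_next, wbr_next); raises ValueError if the action
    violates a constraint. n_dp plays the role of C^D_it.
    """
    xar = xbr + qn * p.C - xw            # forward level after replenishment
    war = xw + wbr                       # buffer level after replenishment
    xbr_next = xar - d_s                 # forward level at t + 1
    wbr_next = war - d_w + n_dp * p.C - d_p  # buffer level at t + 1
    feasible = (
        q in (0, 1)
        and min(qn, xw, d_s, d_w, d_p, n_dp) >= 0
        and xw <= q * p.C                # pulling requires a replenishment
        and qn <= q * p.M                # cages require a replenishment
        and qn * p.C <= p.F
        and xar <= p.F                   # forward-area capacity
        and xw <= xbr
        and d_w + d_s + d_p == demand    # demand is fully covered
        and d_p <= n_dp * p.C
        and d_w <= war
        and xbr_next >= 0
        and wbr_next >= 0
    )
    if not feasible:
        raise ValueError("action violates a constraint")
    cost = q * p.c_r + n_dp * p.c_d + (war - d_w + wbr) * p.c_w + xw * p.c_wx
    return cost, xbr_next, wbr_next


p = Params(C=10, F=30, M=2, c_r=5.0, c_d=8.0, c_w=1.0, c_wx=0.5)
print(step(xbr=4, wbr=0, q=1, qn=1, xw=4, d_s=6, d_w=0, d_p=0,
           n_dp=0, demand=6, p=p))       # (11.0, 4, 4)
```

In this example one cage of 10 units is replenished, the 4 units already in the forward area are pulled into the buffer, and the demand of 6 is served entirely from the forward area.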
B. Dynamic Programming Pseudo Code
This algorithm outlines the standard, computationally intractable backward induction method for solving the full DP formulation of the problem, based on foundational methods described in the literature [24].
Step 0: Initialization
    0.1 Load historical demand data from the database
    0.2 Define state variables, decision variables, and planning horizon $T$
    0.3 Initialize value function $V_T(S_T) = 0$ for the final stage $T$
Step 1: Backward Induction
    For t = T − 1, T − 2, ..., 1:
        For each state $S_t$:
            $V_t(S_t) = \infty$
            For each action $a_t = (Q_{it}, QN_{it}, X^{W}_{it}, D^{S}_{it}, D^{W}_{it}, D^{P}_{it}, C^{D}_{it})$:
                Compute immediate cost:
                    $\mathrm{cost}_t = \sum_{i=1}^{N} \left( Q_{it} c_r + C^{D}_{it} c_d + (WAR_{it} - D^{W}_{it} + WBR_{it}) c_w + X^{W}_{it} c_{wx} \right)$
                Compute the next state $S_{t+1}$ according to the given constraints:
                    $XAR_{it} = XBR_{it} + QN_{it} C_i - X^{W}_{it}$
                    $XBR_{i,t+1} = XAR_{it} - D^{S}_{it}$
                    $WAR_{it} = X^{W}_{it} + WBR_{it}$
                    $WBR_{i,t+1} = WAR_{it} - D^{W}_{it} + C^{D}_{it} C_i - D^{P}_{it}$
                Evaluate the next value:
                    Next Value = $V_{t+1}(S_{t+1})$
                Update value function:
                    $V_t(S_t) = \min(V_t(S_t),\ \mathrm{cost}_t + \text{Next Value})$
Step 2: Forward Simulation
    Initialize starting state $S_0$
    For t = 1, 2, ..., T:
        Determine optimal action $a_t^*$ based on $V_t(S_t)$
        Apply optimal action $a_t^*$:
            Execute optimal replenishment decisions $Q_{it}^*, QN_{it}^*, X^{W*}_{it}, D^{S*}_{it}, D^{W*}_{it}, D^{P*}_{it}, C^{D*}_{it}$
        Update state $S_{t+1}$
        Record actual demand and update historical data
        Update initial states for the next iteration
End
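The recursion in the pseudo code can be sketched in Python for a toy instance with a single product and a deterministic demand sequence. This is a minimal illustration, not the authors' implementation. All numeric values and the names `actions`, `V`, and `policy` are hypothetical. Two simplifications are assumed: the number of direct-picking batches is taken as the smallest value that covers $D^{P}_{it}$, and the value function is evaluated by memoized recursion over reachable states rather than by a table over the full state space. Time is indexed from 0 here for convenience.

```python
from functools import lru_cache
from math import ceil

# Toy single-product instance (all values are illustrative assumptions).
C, F, M = 2, 4, 2                        # batch size, forward capacity, max batches
c_r, c_d, c_w, c_wx = 3.0, 5.0, 0.5, 0.2
demand = [3, 1, 4]                       # deterministic demand per period
T = len(demand)


def actions(xbr, wbr, d):
    """Yield (cost, xbr_next, wbr_next) for every feasible action."""
    for q in (0, 1):
        for qn in range(q * M + 1):              # QN <= Q * M
            if qn * C > F:
                continue
            for xw in range(min(xbr, q * C) + 1):  # X^W <= XBR and X^W <= Q * C
                xar = xbr + qn * C - xw
                if xar > F:                       # forward-area capacity
                    continue
                war = xw + wbr
                for d_s in range(min(d, xar) + 1):
                    for d_w in range(min(d - d_s, war) + 1):
                        d_p = d - d_s - d_w       # remainder is picked directly
                        n_dp = ceil(d_p / C)      # assumption: fewest batches
                        cost = (q * c_r + n_dp * c_d
                                + (war - d_w + wbr) * c_w + xw * c_wx)
                        yield cost, xar - d_s, war - d_w + n_dp * C - d_p


@lru_cache(maxsize=None)
def V(t, xbr, wbr):
    """Minimum cost-to-go from state (xbr, wbr) at period t."""
    if t == T:
        return 0.0                                # terminal condition
    return min(cost + V(t + 1, x, w)
               for cost, x, w in actions(xbr, wbr, demand[t]))


def policy(xbr, wbr):
    """Forward simulation: follow the minimizing action in each period."""
    plan = []
    for t in range(T):
        cost, xbr, wbr = min(actions(xbr, wbr, demand[t]),
                             key=lambda a: a[0] + V(t + 1, a[1], a[2]))
        plan.append((cost, xbr, wbr))
    return plan


plan = policy(2, 0)
print(V(0, 2, 0), plan)
```

Even in this toy setting the action loop enumerates every split of demand across the forward area, the buffer, and direct picking, which gives a feel for why the full multi-product recursion is intractable and an approximate method is needed.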

References

  1. De Koster, M.B.M. Warehouse Assessment in a Single Tour. In Warehousing in the Global Supply Chain; Springer: London, UK, 2012; pp. 457–473.
  2. Hackman, S.T.; Rosenblatt, M.J.; Olin, J.M. Allocating Items to an Automated Storage and Retrieval System. IIE Trans. 1990, 22, 7–14.
  3. van den Berg, J.P.; Sharp, G.P.; Gademann, A.J.R.M.; Pochet, Y. Forward-Reserve Allocation in a Warehouse with Unit-Load Replenishments. Eur. J. Oper. Res. 1998, 111, 98–113.
  4. Heragu, S.S.; Du, L.; Mantel, R.J.; Schuur, P.C. Mathematical Model for Warehouse Design and Product Allocation. Int. J. Prod. Res. 2005, 43, 327–338.
  5. Bartholdi, J.J.; Hackman, S.T. Allocating Space in a Forward Pick Area of a Distribution Center for Small Parts. IIE Trans. 2008, 40, 1046–1053.
  6. Gagliardi, J.-P.; Ruiz, A.; Renaud, J. Space Allocation and Stock Replenishment Synchronization in a Distribution Center. Int. J. Prod. Econ. 2008, 115, 19–27.
  7. Gu, J.; Goetschalckx, M.; McGinnis, L.F. Solving the Forward-Reserve Allocation Problem in Warehouse Order Picking Systems. J. Oper. Res. Soc. 2010, 61, 1013–1021.
  8. Accorsi, R.; Manzini, R.; Bortolini, M. A Hierarchical Procedure for Storage Allocation and Assignment within an Order-Picking System. A Case Study. Int. J. Logist. Res. Appl. 2012, 15, 351–364.
  9. Walter, R.; Boysen, N.; Scholl, A. The Discrete Forward–Reserve Problem: Allocating Space, Selecting Products, and Area Sizing in Forward Order Picking. Eur. J. Oper. Res. 2013, 229, 585–594.
  10. de Vries, H.; Carrasco-Gallego, R.; Farenhorst-Yuan, T.; Dekker, R. Prioritizing Replenishments of the Piece Picking Area. Eur. J. Oper. Res. 2014, 236, 126–134.
  11. Bahrami, B.; Aghezzaf, E.-H.; Limère, V. Enhancing the Order Picking Process through a New Storage Assignment Strategy in Forward-Reserve Area. Int. J. Prod. Res. 2019, 57, 6593–6614.
  12. Wu, W.; de Koster, R.B.M.; Yu, Y. Forward-Reserve Storage Strategies with Order Picking: When Do They Pay Off? IISE Trans. 2020, 52, 961–976.
  13. Jiang, M.; Huang, G.Q. Intralogistics Synchronization in Robotic Forward-Reserve Warehouses for e-Commerce Last-Mile Delivery. Transp. Res. Part E Logist. Transp. Rev. 2022, 158, 102619.
  14. Çelik, M.; Archetti, C.; Süral, H. Inventory Routing in a Warehouse: The Storage Replenishment Routing Problem. Eur. J. Oper. Res. 2022, 301, 1117–1132.
  15. Bertsimas, D.; Demir, R. An Approximate Dynamic Programming Approach to Multidimensional Knapsack Problems. Manag. Sci. 2002, 48, 550–565.
  16. Caro, F.; Gallien, J. Clearance Pricing Optimization for a Fast-Fashion Retailer. Oper. Res. 2012, 60, 1404–1422.
  17. Hopp, W.J.; Spearman, M.L. Factory Physics; Waveland Press: Long Grove, IL, USA, 2011.
  18. Simchi-Levi, D.; Kaminsky, P.; Simchi-Levi, E. Designing and Managing the Supply Chain: Concepts, Strategies, and Case Studies; McGraw-Hill: New York, NY, USA, 2008.
  19. Azadeh, K.; de Koster, R.; Roy, D. Robotized and Automated Warehouse Systems: A Review and Recent Developments. Transp. Sci. 2019, 53, 917–945.
  20. Besbes, O.; Gur, Y.; Zeevi, A. Non-Stationary Stochastic Optimization. Oper. Res. 2015, 63, 1227–1244.
  21. Stanković, R.; Rogić, K.; Šafran, M. Saving Energy by Optimizing Warehouse Dock Door Allocation. Energies 2022, 15, 5862.
  22. Zhong, J.; Xie, S.; Li, Y.; Liu, J.; Zhou, B. Synergistic Operation Framework for the Energy Hub Merging Stochastic Distributionally Robust Chance-Constrained Optimization and Stackelberg Game. IEEE Trans. Smart Grid 2025, 16, 1037–1050.
  23. Foumani, M.; Moeini, A.; Haythorpe, M.; Smith-Miles, K. A cross-entropy method for optimising robotic automated storage and retrieval systems. Int. J. Prod. Res. 2018, 56, 6450–6472.
  24. Powell, W.B. Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2011.
Figure 1. Internal replenishment process.
Figure 2. Total cost comparison over time for 50 parts.
Figure 3. Total cost comparison over time for 200 parts.
Figure 4. Total cost comparison over time for 300 parts.
Table 1. Handling activities for different part groups (50, 200, 300 parts).
| Part Group | Scenario | Direct Picking (%) | Replenishment (%) | Buffer Area Usage (%) | Forward-to-Buffer Area Transfer (%) |
|---|---|---|---|---|---|
| 50 Parts | Simulation PD | 13.83 ± 0.7 | 33.75 ± 1.1 | 18.91 ± 1.5 | 15.75 ± 1.2 |
| 50 Parts | (s,S) Policy | 19.5 ± 1.1 | 29.1 ± 1.8 | 25.4 ± 2.1 | 18.3 ± 1.9 |
| 50 Parts | Business as Usual | 28.06 ± 1.3 | 32.47 ± 1.4 | 56.78 ± 2.5 | 23.98 ± 2.0 |
| 50 Parts | Simulation KD | 2.66 | 36.16 | 15.41 | 22.5 |
| 200 Parts | Simulation PD | 6.83 ± 0.4 | 17.85 ± 0.9 | 7.708 ± 0.8 | 6.5 ± 0.6 |
| 200 Parts | (s,S) Policy | 10.2 ± 0.8 | 15.5 ± 1.2 | 12.1 ± 1.4 | 9.8 ± 1.0 |
| 200 Parts | Business as Usual | 22.78 ± 1.2 | 17.41 ± 1.0 | 46.05 ± 2.8 | 12.01 ± 1.1 |
| 200 Parts | Simulation KD | 2 | 18.66 | 0.52 | 11.08 |
| 300 Parts | Simulation PD | 5.541 ± 0.3 | 14.55 ± 0.8 | 5.70 ± 0.7 | 4.75 ± 0.5 |
| 300 Parts | (s,S) Policy | 8.9 ± 0.6 | 13.1 ± 1.0 | 9.5 ± 1.1 | 7.2 ± 0.8 |
| 300 Parts | Business as Usual | 19.15 ± 1.1 | 14.24 ± 0.9 | 43.88 ± 2.9 | 10.30 ± 1.0 |
| 300 Parts | Simulation KD | 3.11 | 14.33 | 1.02 | 7.83 |
Table 2. Sum and mean of handling activities for different part groups (50, 200, 300 parts).
| Part Group | Scenario | Direct Picking (Sum/Mean) | Replenishment (Sum/Mean) | Buffer Usage (Sum/Mean) | Forward-to-Buffer Transfer (Sum/Mean) |
|---|---|---|---|---|---|
| 50 Parts | Simulation PD | 18,822/15.69 ± 0.9 | 405/0.34 ± 0.03 | 3067/2.55 ± 0.21 | 3231/2.69 ± 0.24 |
| 50 Parts | (s,S) Policy | 26,520/22.1 ± 1.2 | 361/0.3 ± 0.04 | 57,720/48.1 ± 4.5 | 4944/4.12 ± 0.38 |
| 50 Parts | Business as Usual | 42,633/35.49 ± 1.5 | 390/0.32 ± 0.03 | 131,074/109.13 ± 5.1 | 23,479/19.54 ± 1.7 |
| 50 Parts | Simulation KD | 9711/8.09 | 434/0.36 | 15,658/13.04 | 3750/3.13 |
| 200 Parts | Simulation PD | 32,381/6.74 ± 0.5 | 857/0.17 ± 0.02 | 4623/0.96 ± 0.1 | 5142/1.07 ± 0.11 |
| 200 Parts | (s,S) Policy | 47,040/9.8 ± 0.7 | 768/0.16 ± 0.02 | 7200/1.5 ± 0.18 | 6969/1.45 ± 0.15 |
| 200 Parts | Business as Usual | 66,492/13.84 ± 0.9 | 836/0.17 ± 0.02 | 478,690/99.70 ± 8.2 | 55,486/11.55 ± 1.0 |
| 200 Parts | Simulation KD | 26,842/5.59 | 896/0.19 | 234/0.05 | 9296/1.94 |
| 300 Parts | Simulation PD | 33,124/4.60 ± 0.4 | 1044/0.15 ± 0.02 | 5592/0.78 ± 0.09 | 5879/0.81 ± 0.08 |
| 300 Parts | (s,S) Policy | 48,600/8.1 ± 0.6 | 936/0.13 ± 0.02 | 8280/1.15 ± 0.14 | 7560/1.05 ± 0.12 |
| 300 Parts | Business as Usual | 66,611/9.25 ± 0.7 | 1026/0.14 ± 0.02 | 302,231/41.97 ± 4.5 | 46,077/6.39 ± 0.6 |
| 300 Parts | Simulation KD | 27,302/3.79 | 1032/0.14 | 237/0.03 | 10,641/1.47 |
Table 3. Handling activities for different planning horizons with predicted demands (50 parts).
| Planning Horizon | Direct Picking (%) | Replenishment (%) | Buffer Area Usage (%) | Forward-to-Buffer Area Transfer (%) |
|---|---|---|---|---|
| 1 Week | 13.83 ± 0.7 | 33.75 ± 1.1 | 18.91 ± 1.5 | 18.91 ± 1.2 |
| 2 Weeks | 13.83 ± 0.7 | 33.75 ± 1.1 | 18.91 ± 1.5 | 18.91 ± 1.2 |
| 3 Weeks | 13.73 ± 0.7 | 33.97 ± 1.1 | 19.067 ± 1.5 | 19.067 ± 1.2 |
Table 4. Sum and mean of handling activities for different planning horizons (50 parts).
| Planning Horizon | Direct Picking (Sum/Mean) | Replenishment (Sum/Mean) | Buffer Usage (Sum/Mean) | Forward-to-Buffer Transfer (Sum/Mean) |
|---|---|---|---|---|
| 1 Week | 18,822/15.685 ± 0.9 | 405/0.3375 ± 0.03 | 3067/2.55 ± 0.21 | 3231/2.692 ± 0.24 |
| 2 Weeks | 18,822/15.685 ± 0.9 | 405/0.3375 ± 0.03 | 3067/2.55 ± 0.21 | 3231/2.692 ± 0.24 |
| 3 Weeks | 18,710/15.592 ± 0.9 | 409/0.341 ± 0.03 | 3095/2.575 ± 0.21 | 3241/2.701 ± 0.24 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kalafat, İ.; Hekimoğlu, M.; Yücekaya, A.D.; Kirkil, G.; Ediger, V.Ş.; Yıldırım, Ş. An Integrated Framework for Internal Replenishment Processes of Warehouses Using Approximate Dynamic Programming. Appl. Sci. 2025, 15, 7767. https://doi.org/10.3390/app15147767
