1. Introduction
In the era of Industry 4.0, Logistics 4.0 has emerged as a paradigm that integrates core technologies with logistics sector needs [1]. The rapid development of smart cities and the surge in e-commerce order volumes have intensified the demand for automated packing strategies that can adapt to diverse goods while ensuring stability and optimal space utilization [2]. Currently, logistics companies predominantly rely on manual operations that frequently fail to achieve symmetric load distribution, resulting in transportation instability, increased costs, and safety hazards [3]. This operational reality calls for intelligent methods capable of maintaining geometric and temporal symmetry in real-time decision-making scenarios [4].
The Three-Dimensional Container Loading Problem (TDCLP) represents a fundamental challenge in logistics optimization, where symmetry plays a crucial role in ensuring balanced spatial distribution and center-of-gravity stability. Junqueira et al. [5] identified two key modeling paradigms in which symmetry constraints are essential. However, modern logistics operations present an even more complex challenge, characterized by “dual unknown sequences,” where both containers and items arrive dynamically and stochastically. This temporal uncertainty, combined with the requirement for continuous symmetry maintenance across geometric, temporal, and computational dimensions, creates a problem space that existing methods have not adequately addressed [6].
To systematically understand the evolution and limitations of current solutions, we organize existing approaches into four main categories: traditional optimization algorithms, metaheuristic methods, deep learning approaches, and industrial applications. Classical optimization methods have laid the foundation for container loading problems through various algorithmic frameworks. Heuristic algorithms based on layer generation [7] implicitly consider horizontal symmetry, while multilayer tree search algorithms [8] maintain vertical balance through ternary and quaternary structures. Branch-and-price algorithms [9] attempt to balance asymmetric product characteristics with symmetric packing goals, and multi-objective algorithms for large-scale cargo with stacking constraints [10] explicitly optimize symmetry metrics alongside efficiency. Constructive heuristics for irregularly shaped items [11] further extend these capabilities. Despite these contributions, traditional approaches suffer from combinatorial explosion when symmetry requirements are incorporated and cannot handle dynamic arrivals in real time.
The emergence of metaheuristic methods has brought adaptive capabilities to packing problems. Genetic algorithms with prioritized strategies [12] maximize space utilization through symmetric placement patterns, while Particle Swarm Optimization (PSO) [13] demonstrates stability by maintaining load symmetry with adaptive weights and parallel computing capabilities. The Aquila Optimizer [14] achieves convergence through symmetric perturbation strategies, and dynamic fusion policy optimization [15] enhances adaptability. A further PSO variant [16] adds differential mutation to the same adaptive-weight, parallel design. Nevertheless, these metaheuristic approaches remain limited to scenarios with known containers and are generally unsuitable for real-time applications because of slow convergence, particularly in large-scale problems [17].
Recent advances in deep learning have introduced new paradigms for solving complex packing optimization problems. Multimodal deep reinforcement learning [18] and the attention-based Attend2Pack method [19] enhance spatial understanding, while deep policy dynamic programming [20], originally developed for vehicle routing, provides transferable insights. Tsang et al. [21] modeled 3D bin packing with weight distribution symmetry as explicit constraints, while online 3D bin packing with a buffer [22] improves real-time performance. Reinforcement learning frameworks [23] enhance performance through symmetric action spaces. Actor-Critic frameworks [24] reduce complexity using balanced stacking trees, and methods integrating heuristics with deep learning [25] improve online packing by exploiting spatial symmetries. Adjustable robust reinforcement learning [26] addresses uncertainty in online scenarios. Among the most promising architectures, Transformer-based methods [27] excel in offline scenarios through self-attention mechanisms, and the GOPT method [28] targets multi-environment generalization. However, their computational complexity severely hinders real-time applications, and they struggle with temporal causality in online settings. Graph Neural Networks [29], as in the TAP-Net architecture, effectively capture geometric constraints by modeling spatial relationships, yet the repeated graph reconstruction required in dynamic scenarios incurs prohibitive computational costs. Critically, all existing deep learning methods assume that containers are known a priori and therefore fail to address the dual unknown sequences characteristic of real logistics operations [30].
Industrial deployments reveal persistent challenges in maintaining symmetry during dynamic operations. Mainstream warehouse management systems (WMS), including SAP EWM and Manhattan systems [31], struggle with asymmetric item flows and irregular shapes. Amazon’s robotic centers employ learning-based suggestions but require human intervention for symmetry verification, while JD Logistics achieves automation but faces efficiency bottlenecks when handling asymmetric fragile goods [32]. Recent learning-augmented algorithms [33] incorporate prediction mechanisms for temporal symmetry through consistency-robustness tradeoffs, and smart logistics concepts [34] enhance collaboration through symmetric information flow between physical and digital systems. However, these industrial solutions primarily rely on traditional heuristics and simple switching strategies, lacking the sophistication to handle complex temporal dependencies in time-series logistics [35].
Table 1 compares the key characteristics of the algorithms discussed above.
Synthesizing existing research reveals several critical gaps that directly motivate our proposed Second-Order Dual Pointer Adversarial Network (So-DPAN). First, the fundamental assumption of known containers in all existing online methods creates a significant disconnect with real logistics operations, where both containers and items arrive stochastically; our work is the first to formalize and solve this “dual unknown sequences” problem. Second, the computational complexity of state-of-the-art deep learning methods, O(n²) for Transformers and substantial graph reconstruction overhead for GNNs, prohibits real-time deployment, while our O(n) dual pointer mechanism directly addresses this limitation. Third, existing algorithms treat symmetry as a byproduct rather than a design objective, leading to suboptimal stability and load distribution, whereas our framework explicitly optimizes for multi-dimensional symmetry. Fourth, current approaches lack mechanisms to capture complex temporal dependencies in time-series logistics, a capability provided by our second-order structure. Finally, the gap between academic research and industrial deployment remains substantial, and our architecture is specifically designed for compatibility with existing WMS infrastructure while maintaining theoretical rigor.
The core innovation of this study lies in systematically introducing symmetry-aware time-series characteristics into 3D packing problems. Unlike traditional static formulations, our approach features “dual unknown sequences” with both containers and items arriving dynamically, requiring continuous symmetry maintenance throughout the packing process. The So-DPAN architecture decomposes this complex problem into symmetric sub-problems of sequence matching and spatial optimization, ensuring geometric symmetry in placement, temporal symmetry in processing, and computational symmetry in algorithm design.
The main contributions are as follows: (1) introducing time-series concepts with explicit symmetry constraints into 3D packing, establishing mathematical models that balance efficiency with stability through symmetric optimization; (2) proposing the So-DPAN architecture, which handles dynamic matching through symmetric dual pointer mechanisms, achieves linear time complexity compared with the quadratic complexity of Transformer-based methods, avoids the graph reconstruction overhead of GNN approaches, and enhances robustness via adversarial training; (3) designing a phased reinforcement learning scheme that maintains symmetry across decomposed sub-problems and improves convergence through balanced Actor-Critic frameworks; (4) developing deployable algorithms that preserve symmetry metrics while integrating with existing systems and meeting real-time requirements.
2. Problem Description and Modeling
This paper proposes an executable model for automated packing scenarios. By converting placement parameters into continuous variables and combining integer programming with evolutionary algorithms, the model achieves continuous optimization in discrete scenarios. Each container is treated as a fragmented space and the goods as discrete units occupying that space, with the aim of improving space utilization and loading efficiency. The fitness function considers objectives such as avoiding overlap, respecting container boundaries, minimizing wasted space, and maximizing benefits, providing a novel solution for the three-dimensional packing problem.
2.1. Problem Description
This study introduces the concept of time series into the traditional packing problem, constructing a dynamic decision-making environment. In this environment, goods and containers continuously emerge as discrete events along the time series, and the system needs to match and optimize their loading in real-time.
Unlike the traditional online packing problem, where “containers are known and goods are unknown,” the time-series packing problem investigated in this study features both containers and goods as dynamically generated, unknown sequences. This “dual unknown sequence” matching decision mode not only aligns more closely with real-world logistics scenarios but also significantly increases the complexity of the problem. In this framework, the system must jointly consider the multidimensional attributes of goods and containers (such as size, shape, weight, and arrival time) to achieve a globally optimal matching strategy.
The problem is modeled as a time-series-based three-dimensional online packing problem, breaking the limitations of traditional methods that focus solely on single scenarios or batch processing. By establishing a dynamic matching mechanism in the continuous time domain, this method more accurately describes the ongoing operational needs in real logistics environments, providing a new approach to real-time decision-making in dynamic logistics networks.
Definition 1.
The attributes of item i are represented by a 10-dimensional vector with the following components:
Identifier of item i, generally ordered by arrival time; items arriving simultaneously may share an identifier.
Length, width, and height of item i (three components).
Weight of item i.
Arrival time of item i.
Shape characteristics of item i.
Priority level for handling item i.
Stackability, indicating whether other items can be placed on top of item i.
Orientation constraints, such as “must be upright” or “must not be inverted.”
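As an illustration of Definition 1, the ten attributes can be held in a single record. The Python sketch below is not the authors' implementation; the field names (`item_id`, `stackable`, and so on) and the string encodings for shape and orientation are assumptions chosen for readability.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Item:
    """One arriving item; field names are hypothetical, not the paper's notation."""
    item_id: int        # identifier, usually ordered by arrival time (may repeat)
    length: float
    width: float
    height: float
    weight: float
    arrival_time: float
    shape: str          # shape descriptor, e.g. "cuboid" (assumed encoding)
    priority: int       # handling priority level
    stackable: bool     # True if other items may be placed on top of this one
    orientation: str    # e.g. "upright_only", "no_invert", "free" (assumed encoding)

    @property
    def volume(self) -> float:
        return self.length * self.width * self.height


box = Item(1, 0.4, 0.3, 0.2, 5.0, 0.0, "cuboid", 1, True, "upright_only")
print(len(fields(Item)), round(box.volume, 3))  # 10 0.024
```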
Definition 2.
The attributes of container C are represented by a 6-dimensional vector with the following components:
Identifier of container C, generally ordered by time of appearance.
Length, width, and height of the container (three components).
Maximum load capacity of the container.
Current loading status of the container.
As shown in Figure 1, the target container ID corresponds to a specific container within the sequence. The arrival time indicates the point in time when the item enters the process. The loading status records information on the items currently loaded, providing data to support subsequent packing optimization. Each container can hold multiple items, and placement decisions depend on factors such as the size, weight, and arrival time of each item.
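Definition 2 and the loading-status description can be sketched in the same way. In this hypothetical Python record, the loading status is assumed to be a list of `(item_id, weight, position)` entries, and `loaded_weight`, `remaining_capacity`, and `load` are illustrative helpers rather than part of the paper's model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Position = Tuple[float, float, float]


@dataclass
class Container:
    """Container record with six attributes (hypothetical field names)."""
    container_id: int
    length: float
    width: float
    height: float
    max_load: float
    # Loading status: one (item_id, weight, position) entry per loaded item.
    load_state: List[Tuple[int, float, Position]] = field(default_factory=list)

    def loaded_weight(self) -> float:
        return sum(w for _, w, _ in self.load_state)

    def remaining_capacity(self) -> float:
        return self.max_load - self.loaded_weight()

    def load(self, item_id: int, weight: float, position: Position) -> None:
        # Reject any load that would exceed the maximum load capacity.
        if weight > self.remaining_capacity():
            raise ValueError("maximum load capacity exceeded")
        self.load_state.append((item_id, weight, position))


c = Container(7, 2.0, 1.5, 1.5, 100.0)
c.load(1, 30.0, (0.0, 0.0, 0.0))
c.load(2, 25.0, (0.5, 0.0, 0.0))
print(c.remaining_capacity())  # 45.0
```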
2.2. Objectives and Constraints
2.2.1. Basic Symbol Definition
Container, with defined length, width, and height.
Number of items.
The i-th item, with defined length, width, height, and weight.
Position coordinates of the item in the container, one per axis.
Orientation of the item, usually represented as a triplet indicating orientation in three dimensions.
2.2.2. Formal Constraints
- (1)
Container Space Constraints
Each item must be fully within the container:
- (2)
Container Shape Constraints
Ensure that the total volume of all items is less than or equal to the volume of the container, and in each dimension (length, width, height), the item size does not exceed the corresponding container dimension:
- (a)
Total volume constraint
- (b)
Dimension constraints
- (3)
Center of Gravity Constraints
The overall center of gravity of the loaded items should lie within a certain range. Ideally, the center of gravity should be at the geometric center of the container when it is fully loaded; it is therefore constrained to lie within a spherical region centered on this ideal position, as follows:
- (4)
Item Orientation Constraints
Ensure that items are placed in an appropriate orientation such as “must be upright” or “must not be inverted.”
- (5)
Item Stackability Constraints
To ensure stability and safety during stacking, a binary stacking factor is defined for each item, indicating whether other items can be stacked on it: a value of 1 means stacking is allowed, and 0 means it is not.
If one item is stacked on top of another, the stacking conditions on both items must be met.
- (6)
Complete Support Constraint
Each non-bottom item must be fully or partially supported by at least one item below it to ensure stability (as shown in Figure 2). Specifically, for each upper item, there must exist a lower item such that a portion of its base overlaps directly with the top of the lower item.
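Define a binary support function between an upper item and a lower item, equal to 1 when part of the upper item's base rests directly on the top of the lower item and 0 otherwise.
For each non-bottom item, there must be at least one lower item for which the support function equals 1.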
- (7)
No Overlap Constraint
No two items should overlap in space:
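For all pairs of items i and j, the two items must be separated along at least one of the three axes; that is, along the length, width, or height direction, one item must lie entirely on one side of the other.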
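Constraints (1), (2), (3), (6), and (7) can be checked mechanically for axis-aligned boxes. The following is a minimal Python sketch, assuming that each item is described by its minimum corner and its dimensions after orientation has been applied, that its weight acts at the box center, and that the center-of-gravity target is the container's geometric center. The class and function names are hypothetical; the orientation (4) and stackability (5) checks are omitted because their exact conditions are not given above.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List

EPS = 1e-9  # tolerance for floating-point contact and boundary tests


@dataclass(frozen=True)
class Placed:
    """An item already placed in the container (hypothetical representation)."""
    x: float  # minimum corner along the length axis
    y: float  # minimum corner along the width axis
    z: float  # minimum corner along the height axis
    l: float  # dimensions after the item's orientation has been applied
    w: float
    h: float
    m: float  # weight, assumed to act at the box center


def inside(b: Placed, L: float, W: float, H: float) -> bool:
    # Constraint (1): the box lies entirely within [0, L] x [0, W] x [0, H].
    return (min(b.x, b.y, b.z) >= -EPS and b.x + b.l <= L + EPS
            and b.y + b.w <= W + EPS and b.z + b.h <= H + EPS)


def volume_and_dims_ok(bs: List[Placed], L: float, W: float, H: float) -> bool:
    # (2a) total item volume <= container volume; (2b) per-dimension limits.
    total = sum(b.l * b.w * b.h for b in bs)
    return total <= L * W * H + EPS and all(b.l <= L and b.w <= W and b.h <= H for b in bs)


def cog_ok(bs: List[Placed], L: float, W: float, H: float, radius: float) -> bool:
    # Constraint (3): weighted center of gravity inside a sphere around the
    # container's geometric center (the assumed ideal position).
    M = sum(b.m for b in bs)
    g = (sum(b.m * (b.x + b.l / 2) for b in bs) / M,
         sum(b.m * (b.y + b.w / 2) for b in bs) / M,
         sum(b.m * (b.z + b.h / 2) for b in bs) / M)
    c = (L / 2, W / 2, H / 2)
    return sum((gi - ci) ** 2 for gi, ci in zip(g, c)) <= radius ** 2


def xy_overlap_area(a: Placed, b: Placed) -> float:
    dx = min(a.x + a.l, b.x + b.l) - max(a.x, b.x)
    dy = min(a.y + a.w, b.y + b.w) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def supported(u: Placed, bs: List[Placed]) -> bool:
    # Constraint (6): on the floor, or part of its base rests on the top face
    # of at least one lower box (partial support is allowed).
    if abs(u.z) <= EPS:
        return True
    return any(abs(d.z + d.h - u.z) <= EPS and xy_overlap_area(u, d) > EPS
               for d in bs if d is not u)


def disjoint(a: Placed, b: Placed) -> bool:
    # Constraint (7): two boxes do not overlap if separated along some axis.
    return (a.x + a.l <= b.x + EPS or b.x + b.l <= a.x + EPS
            or a.y + a.w <= b.y + EPS or b.y + b.w <= a.y + EPS
            or a.z + a.h <= b.z + EPS or b.z + b.h <= a.z + EPS)


def feasible(bs: List[Placed], L: float, W: float, H: float, radius: float) -> bool:
    return (all(inside(b, L, W, H) for b in bs)
            and volume_and_dims_ok(bs, L, W, H)
            and cog_ok(bs, L, W, H, radius)
            and all(supported(b, bs) for b in bs)
            and all(disjoint(a, b) for a, b in combinations(bs, 2)))


bottom = Placed(0.0, 0.0, 0.0, 1.0, 1.0, 0.5, 10.0)
top = Placed(0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 5.0)
print(feasible([bottom, top], 1.0, 1.0, 1.0, radius=0.5))  # True
```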
2.2.3. Formal Objectives
This study constructs an optimization model for the time-series-based three-dimensional online packing problem, focusing on designing a multi-dimensional, comprehensive objective function. This function integrates core performance metrics such as space utilization, item waiting time, and processing cost, ensuring a balanced optimization strategy through reasonable weight distribution.
Considering the dynamic nature of the online packing problem, the objective function introduces a time window constraint to handle the arrival and departure times of items. It also incorporates dynamic readjustment costs to account for the economic impact of strategy adjustments in real-time environments. Additionally, by considering the mass and volume characteristics of the items, the objective function further optimizes the stability and safety of the packing process.
The weight coefficients assigned to the different optimization objectives ensure that the objective function reflects multiple performance indicators. The specific calculation method for each indicator will be determined based on practical application scenarios and available data.
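The structure of this objective, a weighted combination of indicators, can be sketched as a linear scalarization. The snippet below assumes that indicators to be maximized (space utilization, stability) enter with a positive sign and those to be minimized (waiting time, processing and readjustment costs) with a negative sign; the indicator names, values, and weights are hypothetical and do not come from the paper.

```python
from typing import Dict


def weighted_objective(metrics: Dict[str, float], weights: Dict[str, float],
                       maximize: Dict[str, bool]) -> float:
    """Scalarize several performance indicators into one score (higher is better)."""
    score = 0.0
    for name, value in metrics.items():
        sign = 1.0 if maximize[name] else -1.0  # costs and delays are penalized
        score += sign * weights[name] * value
    return score


metrics = {"utilization": 0.90, "waiting_time": 2.0, "processing_cost": 1.5,
           "readjust_cost": 0.5, "stability": 0.8}
weights = {"utilization": 1.0, "waiting_time": 0.1, "processing_cost": 0.1,
           "readjust_cost": 0.2, "stability": 0.5}
maximize = {"utilization": True, "waiting_time": False, "processing_cost": False,
            "readjust_cost": False, "stability": True}
print(round(weighted_objective(metrics, weights, maximize), 3))  # 0.85
```

A linear weighting keeps each weight interpretable as a trade-off rate between indicators; other scalarizations are possible.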
3. Algorithm Designs
The entire algorithm consists of two components: a dual pointer network-based algorithm for matching goods and containers, which is trained using an Actor–Critic approach, and a three-dimensional online packing algorithm for determining physical placement within containers. Together, these form an integrated solution for the sequence of “goods arrival–container selection–placement”. The matching mechanism relies on the dual pointer network introduced in prior work, which addressed static matching problems. In contrast, the current So-DPAN architecture extends this foundation with three major innovations: (1) second-order optimization for capturing temporal dependencies through historical matching patterns; (2) an adversarial training framework that improves robustness against distribution shifts in item and container sequences; and (3) hierarchical integration with spatial packing decisions, where matching outcomes directly guide placement optimization. These enhancements enable So-DPAN to effectively handle dynamic environments with dual unknown sequences, a scenario not supported by the original model.
This section is divided into two parts: the time-series sequence matching algorithm and the three-dimensional space loading positioning algorithm. The time-series sequence matching part introduces the core of the dual pointer network algorithm and the training details of the Actor–Critic network, while the three-dimensional space loading positioning part focuses on combining deep reinforcement learning with search algorithms to solve the three-dimensional packing problem. An overview of the entire algorithm framework is shown in Figure 3.
As illustrated in Figure 3, the So-DPAN architecture introduces several key innovations: a dual-stream processing structure that enables parallel handling of sequence matching and spatial placement, facilitating joint optimization through interactive learning; second-order temporal modeling that captures changing trends in item arrival rates for better anticipation of logistics dynamics; an adversarial training mechanism that enhances robustness against real-world distribution shifts; and a hierarchical decision integration that ensures spatial placement respects matching constraints. The model employs a dual-pointer mechanism with bidirectional attention (tracking both upcoming items and available container capacity), which reduces computational complexity to linear scale in the sequence length.
3.1. Time-Series Sequence Matching Algorithm
Standard data normalization and scaling cannot fully capture the relationships between various attributes in the time-series online packing problem, nor the interactions between goods and containers. Therefore, specialized preprocessing methods are required. This subsection will introduce preprocessing methods suitable for matching goods and containers in the time-series online packing problem.
3.1.1. Feature Encoding
Preprocessing transforms item attribute information into unit-free data that neural networks can process. Additional attributes are added to the item attributes, while certain properties are removed from the container attributes. The goods and container descriptions used for training the neural networks are referred to as “goods features” and “container features,” respectively, and are represented by vectors defined as follows:
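One simple way to make attributes unit-free is to divide each by a reference quantity. The sketch below assumes that lengths are scaled by a reference container's dimensions, weight by its maximum load, and arrival time by a planning horizon, and that one derived attribute (relative volume) is appended; these choices and the dictionary keys are illustrative and are not the paper's feature definitions.

```python
from typing import Dict, List


def encode_item(item: Dict[str, float], ref: Dict[str, float]) -> List[float]:
    """Hypothetical unit-free encoding of an item's raw attributes."""
    rel_l = item["length"] / ref["length"]   # relative length
    rel_w = item["width"] / ref["width"]     # relative width
    rel_h = item["height"] / ref["height"]   # relative height
    return [rel_l, rel_w, rel_h,
            item["weight"] / ref["max_load"],       # fraction of load capacity
            item["arrival_time"] / ref["horizon"],  # position within the horizon
            rel_l * rel_w * rel_h]                  # added attribute: relative volume


ref = {"length": 2.0, "width": 1.0, "height": 1.0, "max_load": 100.0, "horizon": 60.0}
item = {"length": 0.5, "width": 0.5, "height": 0.4, "weight": 20.0, "arrival_time": 15.0}
features = encode_item(item, ref)
print(len(features))  # 6
```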
3.1.2. Feature Embedding
After this preprocessing step, container features are embedded with a fully connected layer, while goods features are embedded with Self Attention and Multi-Head Attention mechanisms followed by a fully connected layer, as shown in Figure 4:
Considering the spatial correlation among the three-dimensional coordinates of goods, this study uses Self Attention and Multi-Head Attention mechanisms to process the coordinates. Specifically, the three-dimensional coordinates of the goods are converted into QKV triplets, and coordinate embeddings are generated through the attention mechanism; these are then fed into a fully connected layer together with the container features to produce the goods and container feature representations. The Self Attention computation process is shown in Figure 5:
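$$Q = CW^{Q}, \quad K = CW^{K}, \quad V = CW^{V}, \qquad \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
where $Q$, $K$, and $V$ are matrices obtained through linear transformations of the item coordinates, $d_k$ is the output dimension, $C$ is the matrix formed by the item coordinates, and the projection matrices $W^{Q}$, $W^{K}$, and $W^{V}$ are the model parameters.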
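Each head is a stack of multiple Self Attention modules with residual connections between them. Let $\mathrm{head}_i$ denote the output of the i-th attention head; the concatenation and linear transformation are formulated as follows:
$$\mathrm{MultiHead}(C) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}$$
where $h$ is the number of heads and $W^{O}$ is the output projection matrix.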
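The attention computation described above can be sketched in NumPy using the standard scaled dot-product and multi-head formulation. The sizes (5 items, 4 heads, key dimension 8, model dimension 16) and the random projection matrices are assumptions for illustration, and each head here uses a single attention layer rather than the stacked, residual-connected modules described in the text.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(C: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over item coordinates C (n x 3)."""
    Q, K, V = C @ Wq, C @ Wk, C @ Wv           # linear projections of the coordinates
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))  # (n x n), each row sums to 1
    return weights @ V                         # (n x d_k)


def multi_head(C: np.ndarray, heads, Wo: np.ndarray) -> np.ndarray:
    """Concatenate the per-head outputs and apply the output projection Wo."""
    return np.concatenate([self_attention(C, *h) for h in heads], axis=-1) @ Wo


rng = np.random.default_rng(0)
n, d_in, d_k, n_heads, d_model = 5, 3, 8, 4, 16
C = rng.random((n, d_in))  # 5 items, (x, y, z) coordinates each
heads = [tuple(rng.standard_normal((d_in, d_k)) for _ in range(3)) for _ in range(n_heads)]
Wo = rng.standard_normal((n_heads * d_k, d_model))
print(multi_head(C, heads, Wo).shape)  # (5, 16)
```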
3.1.3. Sequence Mapping and Matching Results Mapping
This study focuses on the time-series three-dimensional online packing problem, which involves efficiently assigning goods arriving in a time series to containers that also appear over time. Sequence mapping and matching are based on a greedy algorithm that converts the two time-ordered sequences into matching results, as shown in Algorithm 1:
Algorithm 1 Time Sequence Mapping Algorithm
Input: sequence of items arriving over time; sequence of containers appearing over time.
Output: matching matrix D
Initialize D as a zero matrix with one row per item and one column per container.
for each item in the cargo sequence, following its time of arrival do
    for each container in the container sequence, following its time of appearance do
        Extract the 3D dimensions of the item: (length, width, height)
        Extract the 3D dimensions of the container: (length, width, height) and its current load state
        if item length < container length and item width < container width and item height < container height then
            Set the entry for this item and container in matrix D to 1
            Break and move to the next item for matching
        end if
    end for
end for
Algorithm 1 iterates through the item sequence, attempting to find a suitable container for each item by searching the container sequence in order. The search for an item stops as soon as a suitable container is found, and the matching result is recorded.
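Algorithm 1 translates directly into a short Python function. This sketch assumes that items and containers are given as `(length, width, height)` tuples already sorted by arrival time; like the pseudocode, it applies only the strict dimension comparison and does not use the container's current load state. The function name is illustrative.

```python
from typing import List, Sequence, Tuple

Dims = Tuple[float, float, float]  # (length, width, height)


def time_sequence_mapping(items: Sequence[Dims], containers: Sequence[Dims]) -> List[List[int]]:
    """Greedy first-fit matching of time-ordered items to time-ordered containers.
    Returns the n x m matrix D with D[i][j] = 1 if item i is matched to container j."""
    n, m = len(items), len(containers)
    D = [[0] * m for _ in range(n)]
    for i, (l, w, h) in enumerate(items):           # items in order of arrival
        for j, (L, W, H) in enumerate(containers):  # containers in order of appearance
            if l < L and w < W and h < H:           # strict fit test, as in Algorithm 1
                D[i][j] = 1
                break                               # move on to the next item
    return D


items = [(0.5, 0.4, 0.3), (1.2, 0.8, 0.6), (3.0, 1.0, 1.0)]
containers = [(1.0, 1.0, 1.0), (2.0, 1.5, 1.5)]
for row in time_sequence_mapping(items, containers):
    print(row)
# [1, 0]
# [0, 1]
# [0, 0]   <- no container is large enough for the third item
```

Because each item takes the first container that fits, the result depends on the order in which containers appear. A fuller version would compare each item against the free space recorded in the loading status rather than the container's full dimensions.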
3.2. Three-Dimensional Packing Positioning Algorithm
Within the So-DPAN framework, this section designs the action space for the three-dimensional packing problem using a human-like strategy. The human-like strategy simulates the intuition, experience, and decision-making process of human experts solving three-dimensional packing problems, and the action space design seeks to capture and imitate these decision-making patterns.
3.2.1. Human-like Placement Strategy Action Space
In the three-dimensional packing problem, simulating human experts’ spatial planning capabilities is crucial for action space design. This involves understanding and imitating human intuition and experience when deciding the placement location and orientation of items. To this end, a comprehensive placement strategy algorithm was developed to simulate human decision-making during spatial planning and item placement (see Figure 6 for details).
Firstly, the algorithm needs to consider the internal spatial layout of the container, which involves modeling the container’s interior in three dimensions and keeping track of the real-time spatial position of already placed items. When placing items, human experts determine the optimal placement based on the container’s spatial layout and current fill status. To simulate this, three-dimensional spatial analysis techniques are employed to model the container’s interior precisely and update space changes after item placement in real time. The algorithm determines placement based on the characteristics of the items, such as size, shape, weight, and specific handling requirements (e.g., fragility or orientation). Imitating human intuition is particularly important here. For instance, heavier items are typically placed at the bottom to increase stability, while regular-shaped items are prioritized to improve space utilization. Additionally, the algorithm considers the interaction between items. In practical packing scenarios, human experts assess the fit between items to avoid unusable spaces between irregular-shaped items. Therefore, an optimization module is included in the algorithm to evaluate the impact of different placement plans on overall space utilization.
Combining the outputs of the above modules, we designed the following placement action strategy.
To describe each step of this placement action algorithm formally, we introduce a series of functions that precisely define its logic:
- (1)
Spatial layout analysis
Define the 3D spatial state of the container and its remaining spatial volume, together with a function that calculates the space utilization around each item.
- (2)
Supporting considerations
Given the weight of the cargo and the weight distribution of the container, define a function that evaluates the weight distribution of the shipment after placement.
- (3)
Shape adaptation analysis
Given the geometric shape of the goods and a shape fitness score, define a function that evaluates the match between the shape of the goods and the remaining space.
- (4)
Mechanisms for the protection of fragile items
Given an indicator of cargo fragility and a risk score for fragile cargo placement, define a function that evaluates the risk of placing fragile goods.
- (5)
Optimization solver
Define a function that finds the optimal placement scheme under the given constraints.
- (6)
Multi-scheme evaluation
Define a function that evaluates multiple candidate placement actions.
Combining the above steps, the final placement decision is expressed as a composition of these functions.
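How these modules feed a single decision can be sketched as a filter-then-rank step: the optimization solver discards infeasible candidates, and multi-scheme evaluation ranks the remainder. In this hypothetical Python sketch, each candidate carries precomputed scores for utilization, weight balance, shape fit, and fragile-item risk; the weighted-sum ranking rule, its weights, and the `overlaps` feasibility flag are assumptions, not the paper's formula.

```python
from typing import Callable, Dict, Optional, Sequence

Candidate = Dict[str, float]  # hypothetical: per-candidate module scores


def choose_placement(cands: Sequence[Candidate],
                     feasible: Callable[[Candidate], bool],
                     weights: Dict[str, float]) -> Optional[Candidate]:
    """Return the feasible candidate with the best weighted score, or None.
    Utilization, balance, and shape fit count positively; fragile-item risk
    counts negatively (an assumed combination rule)."""
    def score(c: Candidate) -> float:
        return (weights["util"] * c["util"] + weights["balance"] * c["balance"]
                + weights["fit"] * c["fit"] - weights["risk"] * c["risk"])

    ok = [c for c in cands if feasible(c)]  # role of the optimization solver
    return max(ok, key=score) if ok else None  # role of multi-scheme evaluation


cands = [
    {"id": 0, "util": 0.8, "balance": 0.6, "fit": 0.7, "risk": 0.0, "overlaps": 0},
    {"id": 1, "util": 0.9, "balance": 0.4, "fit": 0.9, "risk": 0.5, "overlaps": 0},
    {"id": 2, "util": 1.0, "balance": 0.9, "fit": 1.0, "risk": 0.0, "overlaps": 1},
]
w = {"util": 1.0, "balance": 0.5, "fit": 0.5, "risk": 1.0}
best = choose_placement(cands, lambda c: c["overlaps"] == 0, w)
print(best["id"])  # 0
```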
3.2.2. Human-like Micro-Adjustment Mechanism
To formalize each step of the space adjustment algorithm, we introduce a series of functions that precisely define its logic. The steps are described formally as follows:
- (a)
Local Space Assessment
Define a function that evaluates the utilization of space around a given item.
- (b)
Dynamic Weight Distribution Analysis
Define a function that analyzes the weight distribution under the current cargo layout.
This algorithm optimizes center-of-gravity adjustment in time-series-based three-dimensional online packing to enhance loading safety and stability. It first evaluates whether the center of gravity of the packing configuration meets safety standards. If not, it calculates the required center-of-gravity shift and identifies adjustable items. By simulating different item movement plans, the algorithm finds layouts that shift the center of gravity into a safe area, iterating as needed so that the final configuration both uses space efficiently and remains stable. The pseudocode is given in Algorithm 2:
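Algorithm 2 Time Sequence Center of Gravity Update Algorithm
Input: current cargo layout (weight and position vector of each item) and the safety range of the center of gravity
Output: an adjusted layout if the center of gravity is outside the safety range
Compute the center of gravity as the weight-averaged position of all items, using the position vector of each item i
if the center of gravity is within the safety range then
    return the current layout unchanged
else
    Compute the shift required to bring the center of gravity within the safety range, using the desired center of gravity as the target
    Initialize an empty set of adjustments
    for each item i do
        if item i is movable then
            Propose a new position for item i by moving it along the required shift, scaled by a factor that adjusts item movement
            Recompute the center of gravity based on the new layout
            if the new center of gravity is within the safety range then
                Add the move of item i to the set of adjustments
            end if
        end if
    end for
    if the set of adjustments is not empty then
        Execute all adjustments in the set
        return the updated layout
    else
        return “Adjustment not possible”
    end if
end if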
- (c)
Geometry Matching and Adjustment
Define a function that evaluates the efficiency of the current cargo's geometric layout.
- (d)
Safety Considerations
Define a function that evaluates the impact of an adjusted cargo layout on the safety of fragile or high-value cargo.
- (e)
Optimization Solver
Define a function that finds the optimal cargo adjustment solution subject to specific constraints.
Combining the above steps, the final adjustment decision can be expressed as a composition of these functions, in which each function represents a key step of the algorithm and together they form a comprehensive, multi-dimensional adjustment strategy. This formal description allows the space adjustment algorithm to mimic the fine-tuning strategies of human experts across these aspects, providing a more refined and efficient solution to the 3D packing problem.
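Algorithm 2 above can be sketched in Python as follows. This is a sequential variant under stated assumptions: moves are applied one at a time to the running layout (so that individually helpful moves cannot overshoot when combined), each item's mass acts at its given position vector, and container bounds and collisions are not rechecked after a move. The names `adjust_cog` and `alpha` (the movement scaling factor) are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def center_of_gravity(masses: Sequence[float], pos: Sequence[Vec]) -> Vec:
    # Weight-averaged position of all items.
    M = sum(masses)
    return tuple(sum(m * p[k] for m, p in zip(masses, pos)) / M for k in range(3))


def dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def adjust_cog(masses: Sequence[float], pos: Sequence[Vec], movable: Sequence[bool],
               target: Vec, radius: float, alpha: float = 1.0) -> Optional[List[Vec]]:
    """Sequential center-of-gravity correction in the spirit of Algorithm 2.
    Returns updated positions, or None when adjustment is not possible."""
    pos = list(pos)
    g = center_of_gravity(masses, pos)
    if dist(g, target) <= radius:
        return pos  # already within the safety range: no adjustment
    M = sum(masses)
    for i, can_move in enumerate(movable):
        if not can_move:
            continue
        shift = tuple(t - x for t, x in zip(target, g))  # required COG shift
        # Moving item i by d shifts the COG by (m_i / M) * d, so alpha = 1
        # would cancel the whole remaining shift with this single item.
        step = alpha * M / masses[i]
        trial = list(pos)
        trial[i] = tuple(p + step * s for p, s in zip(pos[i], shift))
        g_new = center_of_gravity(masses, trial)
        if dist(g_new, target) < dist(g, target):  # keep only moves that help
            pos, g = trial, g_new
            if dist(g, target) <= radius:
                return pos
    return None  # "Adjustment not possible"


masses = [10.0, 10.0]
positions = [(0.2, 0.5, 0.25), (0.4, 0.5, 0.25)]
print(adjust_cog(masses, positions, [False, True], (0.5, 0.5, 0.25), radius=0.05))
```

Applying moves sequentially trades the pseudocode's "collect, then execute all" structure for a guarantee that every accepted move reduces the distance to the target.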
3.2.3. Placement Decision Algorithm Design
The model adopts an attention-based neural network with residual connections as the encoder to extract item features. The actor network, built upon a pointer mechanism [14], serves as the decoder to select leaf nodes and output actions, while the critic network evaluates these actions and predicts future packing states and rewards. The actor incorporates a multi-head attention layer (8 heads, 64 dimensions each) and takes a combined feature vector of size 256 as input, processed through fully connected layers. The critic shares part of the encoder and uses a value head to estimate state values, trained with TD(λ) (λ = 0.95) to reduce variance.
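For reference, TD(λ) targets for the critic can be computed with the standard backward recursion for λ-returns. This minimal Python sketch uses λ = 0.95 and a discount factor of 1, matching the values stated in the text; the function name and the convention that `bootstrap` is the value estimate after the last step (0 at termination) are assumptions, not the authors' code.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float], bootstrap: float,
                   gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """Backward recursion G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}).
    values[t] = V(s_t); bootstrap = V(s_T) after the final step."""
    T = len(rewards)
    G = [0.0] * T
    next_return, next_value = bootstrap, bootstrap
    for t in reversed(range(T)):
        G[t] = rewards[t] + gamma * ((1 - lam) * next_value + lam * next_return)
        next_return, next_value = G[t], values[t]
    return G


print(lambda_returns([0.0, 0.0, 1.0], [0.5, 0.6, 0.8], bootstrap=0.0))
```

With `lam=1.0` the recursion reduces to Monte Carlo returns, and with `lam=0.0` to one-step TD targets; λ = 0.95 sits close to the low-bias end while still using the critic's estimates to reduce variance.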
The entire network is trained using a composite loss function that incorporates both actor and critic objectives, as illustrated in Figure 7. The ACKTR algorithm [23] is employed for natural gradient updates, using a Kronecker-factored approximation of the Fisher Information Matrix to update parameters efficiently. This approach replaces conventional gradient descent and enhances the stability and convergence of the agent during training. The loss functions, comprising an actor (policy) term and a critic (value) term, are defined as follows:
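A composite actor-critic objective of the kind used with A2C and ACKTR is commonly written as a policy-gradient term plus a weighted value-regression term minus an entropy bonus. The sketch below computes that scalar in plain Python for one batch; the coefficients 0.5 and 0.01 are assumed defaults rather than values reported here, and the Kronecker-factored natural-gradient update itself is not shown. In an autodiff framework the advantage would be treated as a constant in the actor term.

```python
import math
from typing import Sequence


def actor_critic_loss(log_probs: Sequence[float], returns: Sequence[float],
                      values: Sequence[float], entropies: Sequence[float],
                      value_coef: float = 0.5, entropy_coef: float = 0.01) -> float:
    """Composite actor-critic loss averaged over a batch of transitions."""
    n = len(returns)
    adv = [g - v for g, v in zip(returns, values)]              # advantage estimates
    actor = -sum(lp * a for lp, a in zip(log_probs, adv)) / n  # policy-gradient term
    critic = sum(a * a for a in adv) / n                       # value regression (MSE)
    entropy = sum(entropies) / n                               # exploration bonus
    return actor + value_coef * critic - entropy_coef * entropy


loss = actor_critic_loss(log_probs=[math.log(0.5), math.log(0.25)],
                         returns=[1.0, 0.0], values=[0.5, 0.5], entropies=[0.7, 0.7])
print(round(loss, 4))
```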
The discount factor is set to 1 because each packing episode is finite and involves only limited data. The reward function is based on space utilization, as our primary optimization objective is to maximize the space utilization of the container. Additional attribute constraints can be added to the reward function if required for specific item properties, such as category or density. The packing process terminates when an item cannot be placed. When symmetry and stability constraints are critical (e.g., in transporting fragile goods or heavy machinery), the reward function can be extended with a symmetry reward component that measures the deviation of the center of gravity from the container’s geometric center, with a parameter controlling its sensitivity.
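A symmetry-aware reward of this kind can be sketched as a utilization term plus a weighted bonus that decays with the distance between the center of gravity and the container's geometric center. The Gaussian decay, the weight `beta`, and the sensitivity `sigma` below are assumptions for illustration; they are not the paper's stated formula.

```python
import math
from typing import Sequence


def symmetry_reward(cog: Sequence[float], center: Sequence[float], sigma: float) -> float:
    """Assumed Gaussian form: 1 when the COG sits at the geometric center,
    decaying with squared distance; sigma controls sensitivity."""
    d2 = sum((g - c) ** 2 for g, c in zip(cog, center))
    return math.exp(-d2 / (2 * sigma ** 2))


def step_reward(placed_volume: float, container_volume: float,
                cog: Sequence[float], center: Sequence[float],
                beta: float = 0.2, sigma: float = 0.1) -> float:
    # Utilization term for the item just placed plus a weighted symmetry term.
    return placed_volume / container_volume + beta * symmetry_reward(cog, center, sigma)


print(round(step_reward(0.3, 1.0, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), 3))  # 0.5
```

A smaller `sigma` penalizes off-center loading more sharply, which may suit fragile or heavy cargo; a larger one keeps the reward dominated by space utilization.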
5. Conclusions
This paper has addressed a critical gap in logistics optimization by formulating and solving the time-series-based three-dimensional online packing problem with dual unknown sequences. Our primary contribution, the Second-Order Dual Pointer Adversarial Network (So-DPAN), represents a fundamental advancement in handling dynamic logistics scenarios where both containers and items arrive stochastically over time. The proposed architecture decomposes the complex spatiotemporal optimization problem into manageable sequence matching and spatial arrangement components, achieving superior performance compared to existing approaches. In particular, the 93.2% space utilization achieved by So-DPAN substantially surpasses the 85–87% range reported by recent Transformer-based approaches [27,28], while our dual pointer mechanism overcomes the sequence matching challenges identified by Tsang et al. [21]. The key innovations manifest in three dimensions: first, the problem formulation captures real-world logistics dynamics through explicit modeling of temporal dependencies and dual uncertainty, moving beyond traditional assumptions of known container sets; second, the So-DPAN architecture leverages adversarial training and second-order optimization to achieve robust performance across diverse operational conditions; third, the phased reinforcement learning strategy with ACKTR optimization demonstrates significant improvements in convergence speed and solution quality, making real-time deployment feasible.
Our comprehensive experimental evaluation reveals that So-DPAN consistently outperforms baseline algorithms, including DQN, DDPG, and traditional heuristics, across multiple metrics and problem scales. The computational efficiency of So-DPAN, which maintains 0.48 s solving times for 5000-item instances, directly addresses the scalability concerns raised by Montes-Franco et al. [17] regarding hybrid algorithms with exponential complexity and overcomes the computational bottlenecks identified by Bonet Filella et al. [32] as barriers to real-time optimization. Compared to the learning-augmented algorithms proposed by Grigorescu et al. [33] and Angelopoulos et al. [35], which achieve consistency-robustness tradeoffs through prediction mechanisms, our adversarial training component enhances robustness without requiring explicit prediction models. Furthermore, our symmetry-aware reward design advances beyond the implicit stability considerations in previous work [5,10] by providing explicit mathematical formulations for balance optimization, addressing the gap identified by Junqueira et al. [5] between theoretical models and practical stability requirements. The algorithm’s ability to generalize across different container configurations and item distributions addresses crucial requirements for practical deployment, while its compatibility with existing warehouse management systems facilitates immediate industrial adoption.
The generalization capabilities demonstrated across diverse problem scales extend the findings of Pan et al. [26] on adjustable robust reinforcement learning, showing that architectural innovations can achieve robustness without sacrificing average-case performance. While their AR2L framework balances scenarios through adjustable parameters, our dual pointer mechanism inherently maintains consistent performance across varying complexity levels without parameter tuning, addressing the generalization challenges highlighted by Hadjidj and Oulamara [30]. The successful integration of spatial and temporal constraints also validates theoretical predictions from the smart logistics literature [34] regarding holistic optimization approaches. Despite these contributions, several limitations warrant acknowledgment: the current implementation assumes deterministic item properties, whereas real operations involve measurement uncertainties, and our approach does not explicitly model seasonal patterns or long-term trends. Future research should extend So-DPAN to handle stochastic properties, incorporate predictive demand forecasting, integrate with robotic manipulation systems, and include sustainability metrics. This research establishes a foundation for addressing complex logistics challenges in Industry 4.0, providing both theoretical insights and practical tools for next-generation smart logistics systems.