MixedPalletBoxes Dataset: A Synthetic Benchmark Dataset for Warehouse Applications

Adamos Daios; Ioannis Kostavelis

doi:10.3390/asi9010014

and

Department of Supply Chain Management, International Hellenic University, Kanellopoulou 2, 601 32 Katerini, Greece

^*

Author to whom correspondence should be addressed.

Appl. Syst. Innov.2026, 9(1), 14;https://doi.org/10.3390/asi9010014

Version Notes

Order Reprints

Abstract

Mixed palletizing remains a core challenge in distribution centers and modern warehouse operations, particularly within robotic handling and automation systems. Progress in this domain has been hindered by the lack of realistic, freely available datasets for rigorous algorithmic benchmarking. This work addresses this gap by introducing MixedPalletBoxes, a family of seven synthetic datasets designed to evaluate algorithm scalability, adaptability and performance variability across a broad spectrum of workload sizes (500–100,000 records) generated via an open source Python script. These datasets enable the assessment of algorithmic behavior under varying operational complexities and scales. Each box instance is richly annotated with geometric dimensions, material properties, load capacities, environmental tolerances and handling flags. To support dynamic experimentation, the dataset is accompanied by a FastAPI-based tool that enables the on-demand creation of randomized daily picking lists simulating realistic inbound orders. Performance is analyzed through metrics such as pallet count, volume utilization, item distribution per pallet and runtime. Across all dataset sizes, the distributions of the physical attributes remain consistent, confirming stable generation behavior. The proposed framework combines standardization, feature richness and scalability, offering a transparent and extensible platform for benchmarking and advancing robotic mixed palletizing solutions. All datasets, generation code and evaluation scripts are publicly released to foster open collaboration and accelerate innovation in data-driven warehouse automation research.

Keywords:

mixed palletizing; benchmark dataset; warehouse logistics; logistics data analytics; robotic manipulation; automation systems

1. Introduction

The role of artificial intelligence in supply chain management is steadily increasing, driving improvements in efficiency, resilience and decision making through data-driven insights and automation [1]. Among the core daily operations of modern distribution centers, mixed palletizing is understood as the accurate and efficient arrangement of packages onto pallets for dispatch and is increasingly supported by Industry 4.0 technologies to improve speed, flexibility and cost effectiveness [2]. The variety of packages is extremely wide and the presence of constraints makes this problem computationally demanding and operationally complex [3]. In previous research [4], solutions that tackle this challenge were reviewed and classified, such as exact and heuristic algorithms [5,6], big data analytics, reinforcement learning [7] and simulation. In addition, mixed palletizing tasks are predominantly executed by robotic systems, which leverage AI-driven perception, motion planning and control to efficiently handle diverse types of packages. To address the identified lack of suitable benchmarks, the present work introduces the MixedPalletBoxes dataset, a synthetic benchmark designed to support the development, training and evaluation of robotic mixed palletizing solutions. Researchers have pointed out a widespread lack of realistic datasets to validate mixed palletizing solutions [8,9], and only a limited number of approaches have been proposed in the literature [10,11].

The paper at hand aims to tackle this gap by introducing an open group of datasets that are publicly accessible on Github. This bundle contains various record datasets, spanning 500 to 100,000, that contain geometric dimensions, material properties, load capacities, environmental tolerances and handling flags (fragility, waterproofing, fire retardancy). These datasets are fixed and reproducible, and researchers are welcome to download and apply them in their evaluations. Additionally, the Python script box_generator used to generate these datasets is also made available for any changes and tweaks necessary to the generator code. Examples of alterations are different ranges in dimensions, adding/removing constraints and setting a different number of generated records. Also available is a software tool box_filter, which is an Application Programming Interface (API) that can be used offline to construct random picking lists from any dataset. This simulates the process of inbound picking orders that distribution centers fulfill constantly in their daily operations. The box_filter tool can be adjusted to create random picking lists with dimension criteria and the presence or not of specific constraints.

This collection of datasets addresses core challenges in the research community for mixed palletizing, namely the following:

Standardization: By providing a unified data source with controlled variability, algorithms can be compared more reliably without differences in input distributions affecting the results.
Rich Feature Space: The dataset extends beyond geometric description by incorporating material properties, load capacity, fragility and environmental tolerances, allowing algorithmic evaluation to account for practical handling constraints, such as limiting heavy loads on fragile items and respecting temperature-sensitive stacking conditions.
Scalability: Researchers can easily adjust the dataset size or customize parameter distributions, such as increasing the proportion of metal boxes or narrowing the range of temperature tolerances, to simulate various industrial scenarios.
Benchmarking and Extension: The dataset establishes a foundation for community-wide benchmarks, with potential future enhancements like irregular box shapes, dynamic loading sequences, or integration with 3D spatial simulations.

1.1. Research Objectives and Contributions

A coherent research gap arises from the absence of publicly accessible benchmark datasets that combine geometric descriptions with operationally meaningful attributes within a scalable and reproducible structure. Widely used resources such as PackLib2 [12] and the BED-BPP dataset [9] have provided important foundations for the evaluation of three-dimensional packing and palletizing algorithms. However, these datasets primarily emphasize geometric feasibility and fixed instance collections, which limits their suitability for analyses that require repeated experimentation under variable warehouse conditions. Related instance generators and software toolboxes [10,11] introduce parameterized generation mechanisms, yet they typically focus on geometry-driven variability and offer limited support for scenario-oriented evaluation.

In this context, the objective of the present study is the introduction of a synthetic benchmark framework designed to support systematic and controlled experimentation under warehouse-relevant assumptions. The proposed dataset integrates geometric, material, environmental and handling attributes within a unified generation process, while maintaining stable parameter distributions across multiple catalog sizes. This design enables comparative algorithmic evaluation without scale-induced distortion of physical or operational characteristics. Furthermore, the inclusion of an API for dynamic picking list generation supports scenario-driven experimentation, reflecting the structured variability observed in daily warehouse operations. These contributions establish a reproducible and extensible benchmark framework that complements existing datasets and supports more comprehensive evaluation of mixed palletizing algorithms.

1.2. Limitations of Existing Benchmark Datasets

Despite their widespread adoption, existing benchmark datasets for three-dimensional packing and palletizing exhibit structural limitations that restrict their suitability for evaluating algorithms under realistic warehouse dynamics. PackLib2 [12], while providing an extensive collection of benchmark instances, relies on static problem definitions and fixed instance sets. This design does not support dynamic order composition or scenario variation, which are intrinsic characteristics of warehouse operations where daily picking lists continuously change in size and composition.

A related limitation concerns the evaluation of algorithm generalization. The BED-BPP dataset [9] offers carefully curated benchmarks for robotic bin packing, with a strong emphasis on perception and geometric feasibility. However, its reliance on predefined scenes constrains the assessment of algorithm robustness under unseen combinations of item attributes, order configurations or operational constraints. As a result, algorithm performance is validated primarily within fixed environments, providing limited insight into adaptability and scalability across variable warehouse scenarios. Similar constraints are observed in existing instance generators and software toolboxes, where randomization is introduced without strong coupling between attributes or repeated scenario control [10,11].

These limitations indicate a gap between existing benchmarks and the requirements of warehouse-oriented algorithm evaluation. Addressing this gap requires benchmark frameworks that support controlled variability, repeated scenario generation and the inclusion of operational constraints beyond geometry alone. Such characteristics enable more comprehensive validation of algorithm behavior, particularly with respect to generalization, robustness and performance stability under realistic and dynamically evolving warehouse conditions.

After establishing the scope and positioning of the dataset, the following sections present its construction, evaluation and implications. Section 2 describes the construction of the MixedPalletBoxes datasets, outlining the problem-driven synthetic generation framework and the role of the box_filter API in producing scenario-based picking lists. Section 3 presents the experimental evaluation, detailing the benchmarking setup across multiple mixed palletizing algorithms, the resulting performance metrics and an interpretation of algorithm behavior in relation to dataset characteristics. Section 4 discusses the findings, focusing on algorithm generalization, controlled variability and the implications of synthetic datasets for warehouse-oriented evaluation. Finally, Section 5 summarizes the main contributions, reflects on limitations and outlines directions for future research.

2. Dataset Creation and Feature Specification

The dataset introduced in this work is constructed as a problem-driven synthetic dataset, intended for hypothesis-oriented evaluation rather than general-purpose benchmarking. Its design is guided by modeling assumptions that reflect the structured characteristics of warehouse operations, including spatial layout, item attributes and task composition. By coupling controlled random parameter distributions with scenario-driven instance generation, the framework supports systematic analysis of algorithm behavior under well-defined conditions, with particular relevance to the validation of hypotheses related to warehouse robotic processes such as robot path planning.

The construction of the MixedPalletBoxes datasets follows a structured generation pipeline instead of a simple random sampling procedure. The process begins with the definition of a parameter space that combines dimensional ranges, material-specific rules, environmental limits and handling constraints. Within this space, controlled randomization is applied to create item instances that remain physically coherent and operationally plausible while preserving diversity across catalog sizes. Derived attributes are computed from the sampled dimensions and material properties, and consistency checks are applied to exclude infeasible combinations. The resulting records are exported in tabular form and can be accessed through an API that generates dynamic picking lists under user-defined filters. In this way, the generator constitutes the core methodological artifact and the static datasets represent reproducible snapshots of its parameterized behavior. To enable thorough benchmarking and comparative analysis of mixed palletizing algorithms, a synthetic dataset of rectangular boxes was created. Generated using a parameterized Python 3.11.13 script, the dataset encompasses a comprehensive range of geometric, material, environmental and handling characteristics for each box. The source code is freely accessible on GitHub (see Section Data Availability).

2.1. Synthetic Box Generation Procedure

The array of datasets was constructed by sampling between 500 and 100,000 individual box instances, resulting in seven distinct, unchanging datasets. The generation process was guided by design principles that balance variability with operational plausibility, ensuring that the resulting instances remain physically feasible and representative of warehouse packaging conditions. Parameter ranges were selected to reflect commonly encountered industrial constraints, while attribute dependencies were preserved to maintain coherence between geometric, material and handling characteristics. Rather than introducing unconstrained randomization, the framework emphasizes structured variability, enabling controlled exploration of algorithm behavior under realistic conditions and avoiding artificial combinations that would reduce interpretability. Details of parameter sampling procedures and attribute assignment are provided in Appendix A.

The final dataset comprises 15 columns (Table 1) and is exported as an Excel file (boxes_db_****.xlsx, where * refers to the number of the records). Table 2 depicts a sample dataset preview.

Table 1. Attributes in the synthetic dataset.

Table 2. Sample dataset preview, an extracted instance.

Anomaly prevention is embedded in the design of the generator. All item attributes are sampled within material-dependent ranges that incorporate feasible dimensional, thermal and load capacity limits. These ranges ensure that the resulting combinations of dimensions, thickness, temperature tolerance and maximum load capacity remain physically coherent and operationally plausible. Derived properties are computed directly from the sampled values, which further maintains consistency between geometric and mechanical characteristics. Through this range-based construction, the generator naturally avoids infeasible or contradictory items and produces catalogs that remain internally consistent without requiring post-processing exclusion steps.

The summary statistics across the seven MixedPalletBoxes datasets (generated by a Python script available on GitHub) demonstrate a high degree of internal consistency and realistic scaling behavior (Table 3). The average box length remains stable across all dataset sizes (≈58.5–60.6 cm), and both width and height show minimal variation, suggesting that the synthetic generator preserves distributional characteristics as volume increases. The average maximum load capacity varies modestly, from 77.2 kg to 88.2 kg, which reflects natural fluctuations due to material and dimensional sampling. Fragility rates range from 20.3% to 24.8%, and stackability remains consistently above 70%, aligning with general expectations for mixed warehousing and distribution environments. These results confirm that dataset scaling does not introduce bias or skew in critical physical and handling attributes, reinforcing their suitability for benchmarking both lightweight and large-scale mixed palletizing algorithms.

Table 3. Summary statistics across all seven MixedPalletBoxes datasets.

2.2. Box Filter API

A FastAPI service is also available on GitHub, enabling users to easily download and utilize it for generating random picking lists based on a specific dataset. There is a comprehensive guide that walks through the process of transforming an Excel-based box dataset into an SQLite database and deploying a FastAPI server. The server allows for randomized box selection via an API. The steps include downloading the required files, setting up a Python virtual environment, installing necessary dependencies and running the API server locally. Once operational, devices like Arduino can interact with the server to fetch picking list data using HTTP endpoints. To encourage transparency in research and reproducibility, the complete Python script (box_generator.py), the associated datasets (boxes_database) and the picking list generator (box_filter_api) are available in the Github repository. Users can clone the repository to customize dataset generation parameters or explore the logic behind the dataset creation process.

From a methodological perspective, these design choices collectively define the contribution of the proposed dataset framework. The structured definition of the parameter space used for synthetic instance generation constitutes a central methodological element. Rather than introducing unconstrained randomness, parameter ranges are deliberately bounded according to physical feasibility, material properties and operational relevance. This approach ensures that variability reflects meaningful warehouse conditions while excluding parameter combinations that would be unrealistic or operationally irrelevant. By constraining randomness within explicitly defined domains, the framework supports controlled experimentation without sacrificing diversity and prevents the emergence of meaningless randomness that would otherwise obscure the relationship between instance characteristics and algorithmic performance.

3. Experimental Evaluation

3.1. Multi-Algorithm Evaluation on Synthetic Picking Lists

In many small- to medium-sized warehouses, the daily handling volume of distinct packages rarely exceeds a few hundred, making a catalog of 500 synthetic box models quite representative of real-world operations [13]. This research, based on a survey of 215 warehouses in the Netherlands and Flanders, highlights that task complexity and market dynamics are primary drivers of warehouse management decisions. It suggests that warehouses with fewer stock keeping units (SKUs) often exhibit less complex planning and control systems, aligning with our approach of using a 500-item catalog to mirror the SKU diversity typical in small- to medium-sized warehouses. Larger distributors may catalog tens of thousands of SKUs, but most palletized shipments tend to concentrate around a limited set of recurring case dimensions, which the proposed dataset is designed to reflect. Importantly, while our demonstration uses 500 items, the same generator script can be used to produce larger or smaller catalogs, enabling for different approaches for testing and validating algorithms.

Custom Python implementations of established mixed palletizing algorithms were applied to the 500-record dataset. The first fit decreasing (FFD) heuristic, a classic bin packing strategy that places each item into the first bin where it fits, is commonly used in similar contexts [14,15]. The extreme point (EP) method identifies and utilizes available extreme points in the packing space to position items efficiently [16,17,18]. The guillotine cut approach recursively partitions the packing area using straight line cuts, producing rectangular subregions [19]. The layered (shelf) strategy organizes items in horizontal layers or shelves to simplify placement and stacking [20,21,22]. The Genetic Algorithm (GA) applies evolutionary principles such as selection, crossover and mutation to explore the solution space [23,24,25]. Finally, the tabu search (TS) metaheuristic iteratively improves solutions while avoiding cycles through a tabu list of forbidden moves [26,27,28].

The algorithms utilized geometric dimensions (length, width, height) and box volume for packing decisions, with only the TS algorithm additionally considering special attributes like fragility and stackability as constraints. To reflect real-world variability in workload, our box_filter API generated four randomized picking lists of varying sizes for testing:

Light orders (20 items): Small, low-density shipments that reflect quick fulfillment tasks or partial replenishment needs, often associated with flexible, last-minute logistics.
Typical orders (35 items): Standard warehouse orders representing the bulk of daily operations. These are balanced in volume and complexity, often resembling single shipment batches.
Heavy orders (50 items): Larger and more demanding requests that reflect substantial restocking events or outbound shipments to retail hubs with diverse inventory needs.
Overflow orders (100 items): High-volume, complex orders simulating peak demand scenarios or bulk consolidation loads, requiring efficient coordination and robust algorithmic strategies.

For each combination of picking list and algorithm, detailed reports were produced summarizing the number of pallets used, average pallet utilization, average box count per pallet and algorithm runtime (Table 4). In all scenarios, the pallet size is 1.2 m × 1.00 m × 0.85 m and pallet volume was limited to a 1.02 m³ capacity—a simple yet practical constraint that serves as an effective baseline for initial dataset validation and benchmarking of palletizing algorithms. This comprehensive testing framework validates both the realism of our synthetic datasets and the performance of the palletizing methods under diverse operational conditions. All experiments were conducted in a Google Colab environment running Python 3.11.13 (GCC 11.4.0) on a Linux 6.1.123+ (x86_64) platform with glibc 2.35. All Python scripts used for the experiments are also accessible on Github at release v1.0.

Table 4. Performance metrics across algorithms and picking list sizes.

3.2. Results

Our randomized picking lists (ranging from 20 to 100 items) from the 500-record synthetic dataset serve as a rigorous foundation for validating and benchmarking mixed palletizing algorithms. By applying six distinct algorithms, including FFD, EP, guillotine cut, layered shelf, GA and TS, the analysis examined the capability of the dataset to emulate realistic and diverse warehouse packing challenges. Across all scenarios and algorithmic approaches, the datasets consistently enabled the following:

Diverse packing outcomes: The number of pallets used, pallet utilization rates and box distributions varied meaningfully, illustrating the dataset’s capacity to represent a broad spectrum of workload complexities, from under loaded pallets to near-full and multi-pallet orders.
Robust performance metrics: Metrics such as average pallet utilization (ranging from 32% to over 90%), box counts per pallet and packing runtimes showcased the datasets’ ability to support comprehensive performance analysis across heuristic and metaheuristic methods.
Scalability validation: Increasing order sizes led to expected shifts in packing complexity [29] and resource usage, confirming that the datasets scale naturally and realistically with workload size, which is crucial for testing algorithm scalability and robustness.
Algorithm agnostic testing: The datasets function effectively as a universal testbed, supporting a variety of packing strategies with differing spatial, volume and computational characteristics, thus providing an objective ground for evaluating new and existing palletizing solutions.

Overall, this multi-algorithm evaluation confirms that our datasets are both versatile and realistic, capable of supporting the development, validation and benchmarking of mixed palletizing algorithms across diverse operational scenarios.

3.3. Algorithm Behavior Explanation

The performance patterns reported in Table 4 can be directly interpreted in relation to the structured properties of the generated dataset and the characteristics of the evaluated algorithms. For small picking lists, algorithms based on greedy or deterministic placement strategies, such as FFD, achieve high-volume utilization due to limited geometric heterogeneity and reduced combinatorial complexity. As order size increases, performance degradation becomes evident, reflecting the sensitivity of such heuristics to increased dimensional diversity and reduced placement flexibility.

Algorithms relying on geometric partitioning principles, including EP and layered strategies, exhibit consistently lower utilization across all order sizes. This behavior can be attributed to fragmentation effects and restrictive placement assumptions, which become more pronounced under heterogeneous box dimensions, which is a defining feature of the synthetic dataset. While these approaches maintain stable runtimes, their limited adaptability to diverse geometric configurations constrains packing efficiency.

Metaheuristic methods demonstrate a contrasting behavior. The GA achieves high utilization, particularly for larger orders, indicating effective exploration of the structured parameter space introduced by the dataset. However, this improvement is accompanied by a substantial increase in computational cost, highlighting the trade-off between solution quality and scalability. TS exhibits balanced performance, benefiting from its ability to incorporate additional constraints such as fragility and stackability, which are explicitly encoded in the dataset. This explains its strong performance in medium to large picking lists, where constraint interactions play a more prominent role.

These observations confirm that the dataset parameters actively shape algorithmic behavior, rather than serving as passive descriptors. Differences in performance across algorithms can be consistently explained by their interaction with geometric variability, order size and constraint structure, demonstrating the dataset’s suitability for controlled and interpretable evaluation of warehouse-oriented packing algorithms. This form of parameter-conditioned interpretation aligns with explainability practices commonly adopted in post hoc analysis of algorithmic performance.

4. Discussion

The proposed dataset is intended to support systematic evaluation of mixed palletizing algorithms under conditions that resemble real warehouse operations, while maintaining full control over problem structure. By combining geometric variation with material properties, handling constraints and environmental limits, the dataset enables analysis of algorithm behavior beyond purely spatial feasibility. From this perspective, the experimental results demonstrate that the dataset can expose meaningful differences in algorithm performance that arise from interaction between instance parameters and algorithm design.

Rather than focusing solely on performance rankings, the results highlight how different algorithmic strategies respond to changes in order size, geometric heterogeneity and constraint composition. Greedy- and geometry-based heuristics benefit from regularity and limited variability, whereas metaheuristic approaches exhibit greater robustness under heterogeneous conditions at the cost of increased computational effort. These observations support the theoretical expectation that algorithm performance in palletizing tasks is strongly shaped by constraint structure and problem scale, reinforcing the relevance of controlled synthetic benchmarks for systematic analysis.

The dataset also plays an important role in evaluating algorithm generalization. Because parameter ranges and distributions are explicitly defined, performance differences can be attributed to algorithmic behavior rather than unintended dataset bias. This is particularly relevant for adaptive- or learning-based methods, where uncontrolled correlations in real-world data may lead to overfitting. The ability to generate multiple datasets with consistent statistical properties allows for comparative evaluation across scenarios, supporting more reliable validation of algorithm robustness.

Several limitations of the current framework should be acknowledged. The dataset models static packing scenarios and does not explicitly represent dynamic effects such as moving obstacles, time-varying velocities or pallet stability under motion. In addition, some real-world factors, including irregular package shapes and correlated item properties, are simplified or omitted. These limitations define clear directions for future work.

Future extensions may focus on multimodal data generation, incorporating temporal, kinematic or sensory attributes alongside existing geometric and material features. Such extensions would enable evaluation of perception-aware and motion-constrained palletizing strategies, particularly in robotic settings. Further work may also introduce dynamic elements, such as obstacle motion or sequence-dependent constraints, to better reflect operational warehouse environments.

5. Conclusions

This work introduces the MixedPalletBoxes dataset, a scalable synthetic benchmark comprising 500 to 100,000 box instances annotated with geometric, material, environmental and handling attributes. The dataset is designed to support reproducible and controlled evaluation of mixed palletizing algorithms, establishing a common reference framework for warehouse optimization research.

A central contribution of this work lies in demonstrating how synthetic datasets can be used to validate algorithm generalization under controlled conditions. By explicitly defining parameter ranges and distributions, the dataset reduces hidden correlations and mitigates unintended bias that often arises in real-world data collections. This enables performance differences to be attributed to algorithmic design choices rather than artifacts of dataset composition, supporting more reliable assessment of robustness across varying workload sizes and instance configurations.

An important implication of this framework is that effective palletizing solutions are inherently context-dependent. Each warehouse operates within its own distinct SKU universe, shaped by product mix, packaging standards and operational constraints. Algorithms trained or tuned on facility-specific datasets implicitly learn compatibility patterns and packing regularities unique to that environment. Consequently, universal algorithms evaluated on generic datasets may fail to generalize effectively in practice, whereas problem-driven synthetic datasets provide a principled means of tailoring and validating solutions for specific operational contexts.

While the dataset captures a wide range of realistic static attributes, it remains synthetic and does not model all aspects of real-world operations, such as deformable items, correlated feature distributions or dynamic effects during execution. These limitations reflect deliberate design choices that prioritize interpretability and controlled evaluation.

Future work may extend this framework by integrating real industrial data to refine parameter distributions and by incorporating additional modalities, such as temporal, kinematic or sensory information. Such extensions would further enhance the applicability of the dataset to robotic palletizing systems operating in dynamic environments.

By openly releasing the dataset, generation tools and evaluation pipeline, this work provides a flexible foundation for bias-aware benchmarking and facility-specific validation of palletizing algorithms in automated warehouse systems.

Author Contributions

Conceptualization, A.D.; methodology, A.D.; validation, I.K.; formal analysis, A.D.; investigation, A.D.; resources, A.D.; writing—original draft preparation, A.D.; writing—review and editing, I.K.; visualization, A.D.; supervision, I.K.; project administration, I.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data can be accessed on 25 December 2025 at https://github.com/Robotics-Logistics/MixedPalletBoxes/releases/tag/v1.0.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

API	Application Programming Interface
SKU	Stock Keeping Unit
FFD	First Fit Decreasing
EP	Extreme Point
GA	Genetic Algorithm
TS	Tabu Search

Appendix A

For each box, the script proceeds through the following steps:

Dimension Sampling: Lengths are sampled uniformly between 20 cm and 100 cm. While PackLib² [12] includes benchmarks with a wide range of box sizes, it does not specify a unified dimension range across all instances. The selected ranges correspond to commonly observed industrial carton dimensions and are intended to maintain practical feasibility. The width is determined to be between 5% of the chosen length (with a minimum of 5 cm) and the length minus 1 cm, ensuring realistic proportions. Similarly, the height is chosen to fall between 5% of the smaller dimension (either length or width) and that smaller dimension, maintaining feasible internal clearance after accounting for wall thickness.
Material and Wall Thickness: Five discrete materials—cardboard, plastic, wood, metal and composite—are selected according to predefined probabilities (e.g., 55% cardboard, 5% metal), reflecting common packaging distributions. For each material, wall thickness is drawn from a material-specific set of options (e.g., cardboard: 0.1–0.5 cm in 0.05 cm increments; metal: 0.1–0.3 cm).
Derived Metrics: External volume (L) and internal volume (L) are computed from the sampled dimensions. The maximum load capacity [30] (kg) is calculated using the formula: Loadmax = min(External Volume × Thickness × M, 500), where M represents a material specific multiplier (e.g., 2.5 for cardboard, 12 for metal). The 500 kg limit ensures that the values remain within realistic limits.
Special Attributes: Fragility [31], stackability [32], waterproofing and fire retardancy are assigned via Bernoulli trials, with material-dependent probabilities (e.g., plastic is 95% waterproof; composite is 65% stackable). Temperature Tolerance: Minimum and maximum operating temperatures are determined from material-specific ranges (e.g., metal: –45 °C to –20 °C for the minimum and 80 °C to 120 °C for the maximum), ensuring that the minimum value is always less than the maximum.
Aesthetic and Origin Attributes: Color is chosen from a palette of eight hues (e.g., white, deep brown, light yellow). Country of origin is sampled uniformly from a nine-country list (e.g., Greece, China, USA). Each box is uniquely identified by a six-digit identifier (e.g., BX000123).

References

Daios, A.; Kladovasilakis, N.; Kelemis, A.; Kostavelis, I. AI applications in supply chain management: A survey. Appl. Sci. 2025, 15, 2775. [Google Scholar] [CrossRef]
Daios, A.; Kostavelis, I. Industry 4.0 Technologies in Distribution Centers: A Survey. In Proceedings of the Olympus International Conference on Supply Chains, Katerini, Greece, 24–26 May 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 3–11. [Google Scholar] [CrossRef]
de Carvalho, P.R.; Elhedhli, S. A data-driven approach for mixed-case palletization with support. Optim. Eng. 2022, 23, 1587–1610. [Google Scholar] [CrossRef]
Daios, A.; Kladovasilakis, N.; Kostavelis, I. Mixed Palletizing for Smart Warehouse Environments: Sustainability Review of Existing Methods. Sustainability 2024, 16, 1278. [Google Scholar] [CrossRef]
Ananno, A.A.; Ribeiro, L. A multi-heuristic algorithm for multi-container 3-d bin packing problem optimization using real world constraints. IEEE Access 2024, 12, 42105–42130. [Google Scholar] [CrossRef]
Zhao, H.; Xu, J.; Yu, K.; Hu, R.; Zhu, C.; Du, B.; Xu, K. Deliberate planning of 3d bin packing on packing configuration trees. Int. J. Robot. Res. 2025, 02783649251380619. [Google Scholar] [CrossRef]
Tsang, Y.; Mo, D.; Chung, K.; Lee, C. A deep reinforcement learning approach for online and concurrent 3D bin packing optimisation with bin replacement strategies. Comput. Ind. 2025, 164, 104202. [Google Scholar] [CrossRef]
Elhedhli, S.; Gzara, F.; Yildiz, B. Three-dimensional bin packing and mixed-case palletization. Informs J. Optim. 2019, 1, 323–352. [Google Scholar] [CrossRef]
Kagerer, F.; Beinhofer, M.; Stricker, S.; Nüchter, A. BED-BPP: Benchmarking dataset for robotic bin packing problems. Int. J. Robot. Res. 2023, 42, 1007–1014. [Google Scholar] [CrossRef]
Ribeiro, L.; Ananno, A.A. A software toolbox for realistic dataset generation for testing online and offline 3D bin packing algorithms. Processes 2023, 11, 1909. [Google Scholar] [CrossRef]
Osaba, E.; Villar-Rodriguez, E.; Romero, S.V. Benchmark dataset and instance generator for real-world three-dimensional bin packing problems. Data Brief 2023, 49, 109309. [Google Scholar] [CrossRef]
Fekete, S.P.; Van der Veen, J.C. PackLib2: An integrated library of multi-dimensional packing problems. Eur. J. Oper. Res. 2007, 183, 1131–1135. [Google Scholar] [CrossRef]
Faber, N.; De Koster, M.; Smidts, A. Organizing warehouse management. Int. J. Oper. Prod. Manag. 2013, 33, 1230–1256. [Google Scholar] [CrossRef]
Johnson, D.S. Near-Optimal bin Packing Algorithms. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1973. [Google Scholar]
Tadic, L.; Afric, P.; Sikic, L.; Kurdija, A.S.; Klemo, V.; Delac, G.; Silic, M. Analysis and Comparison of Exact and Approximate Bin Packing Algorithms. In Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2019; pp. 919–924. [Google Scholar] [CrossRef]
Crainic, T.G.; Perboli, G.; Tadei, R. Extreme point-based heuristics for three-dimensional bin packing. Informs J. Comput. 2008, 20, 368–384. [Google Scholar] [CrossRef]
Mahvash, B.; Awasthi, A.; Chauhan, S. A column generation-based heuristic for the three-dimensional bin packing problem with rotation. J. Oper. Res. Soc. 2018, 69, 78–90. [Google Scholar] [CrossRef]
Iori, M.; Locatelli, M.; Moreira, M.C.O.; Silveira, T. Solution of a Practical Pallet Building Problem with Visibility and Contiguity Constraints. In Proceedings of the ICEIS (1), Virtual, 5–7 May 2020; pp. 327–338. [Google Scholar] [CrossRef]
Lodi, A.; Martello, S.; Vigo, D. Recent advances on two-dimensional bin packing problems. Discret. Appl. Math. 2002, 123, 379–396. [Google Scholar] [CrossRef]
Bischoff, E.E.; Ratcliff, M. Issues in the development of approaches to container loading. Omega 1995, 23, 377–390. [Google Scholar] [CrossRef]
Calzavara, G.; Iori, M.; Locatelli, M.; Moreira, M.C.; Silveira, T. Mathematical models and heuristic algorithms for pallet building problems with practical constraints. Ann. Oper. Res. 2021, 350, 5–36. [Google Scholar] [CrossRef]
Tresca, G.; Cavone, G.; Carli, R.; Cerviotti, A.; Dotoli, M. Automating bin packing: A layer building matheuristics for cost effective logistics. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1599–1613. [Google Scholar] [CrossRef]
Bortfeldt, A.; Gehring, H. A hybrid genetic algorithm for the container loading problem. Eur. J. Oper. Res. 2001, 131, 143–161. [Google Scholar] [CrossRef]
Lau, H.C.; Chan, T.; Tsui, W.; Ho, G.T.; Choy, K.L. An AI approach for optimizing multi-pallet loading operations. Expert Syst. Appl. 2009, 36, 4296–4312. [Google Scholar] [CrossRef]
Ancora, G.; Palli, G.; Melchiorri, C. A hybrid genetic algorithm for pallet loading in real-world applications. IFAC-PapersOnLine 2020, 53, 10006–10010. [Google Scholar] [CrossRef]
Lodi, A.; Martello, S.; Vigo, D. Heuristic algorithms for the three-dimensional bin packing problem. Eur. J. Oper. Res. 2002, 141, 410–420. [Google Scholar] [CrossRef]
Álvarez-Valdés, R.; Parreño, F.; Tamarit, J.M. A tabu search algorithm for the pallet loading problem. Spectrum 2005, 27, 43–61. [Google Scholar] [CrossRef]
Leon, P.; Cueva, R.; Tupia, M.; Paiva Dias, G. A taboo-search algorithm for 3d-binpacking problem in containers. In Proceedings of the New Knowledge in Information Systems and Technologies: Volume 1, Galicia, Spain, 16–19 April 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 229–240. [Google Scholar] [CrossRef]
Barros, H.; Pereira, T.; Ramos, A.G.; Ferreira, F.A. Complexity constraint in the distributor’s pallet loading problem. Mathematics 2021, 9, 1742. [Google Scholar] [CrossRef]
Gzara, F.; Elhedhli, S.; Yildiz, B.C. The pallet loading problem: Three-dimensional bin packing with practical constraints. Eur. J. Oper. Res. 2020, 287, 1062–1074. [Google Scholar] [CrossRef]
Ancora, G.; Palli, G.; Melchiorri, C. Combining Hybrid Genetic Algorithms and Feedforward Neural Networks for Pallet Loading in Real-World Applications. In Proceedings of the Human-Friendly Robotics 2021: HFR: 14th International Workshop on Human-Friendly Robotics, Bologna, Italy, 28–29 October 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–14. [Google Scholar] [CrossRef]
Iori, M.; Locatelli, M.; Moreira, M.C.; Silveira, T. A mixed approach for pallet building problem with practical constraints. In Proceedings of the International Conference on Enterprise Information Systems, Prague, Czech Republic, 5–7 May 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 122–139. [Google Scholar] [CrossRef]

Table 1. Attributes in the synthetic dataset.

Attribute	Description	Generation Method/Values
Box ID	Unique identifier (BX######) ¹	Incremented number
Length	External dimension (cm)	Uniform integer between 20 and 100
Width	External dimension (cm)	Random integer between 5% of length and length − 1
Height	External dimension (cm)	Random integer between 5% of min(length, width) and that value
Thickness	Wall thickness (cm)	Discrete choices per material type
External/Internal Volume	Volume in liters	Calculated
Max Load Capacity	Maximum vertical load (kg)	Computed: volume × thickness × material multiplier
Material	Packaging material	Weighted choice: cardboard (55%), plastic (25%), etc.
Fragile	“Yes”/“No” label	Bernoulli distribution by material (e.g., 30% for cardboard)
Stackable	“Yes”/“No” label	Same as above (probabilistic by material)
Waterproof	“Yes”/“No” label	Same as above (probabilistic by material)
Fire Retardant	“Yes”/“No” label	Same as above (probabilistic by material)
Min/Max Temperature	Operating temperature bounds (°C)	Random int in material dependent range (min < max)
Color	Box color	Random choice from predefined list
Country of Origin	Manufacturing country	Random choice from country list

¹ BX######: unique Box ID, where ###### is a six-digit number.

Table 2. Sample dataset preview, an extracted instance.

Box ID	BX000001	BX000002
Length (cm)	75	90
Width (cm)	60	450
Height (cm)	40	50
Thickness (cm)	0.4	0.2
External Volume (L)	180.00	202.50
Internal Volume (L)	167.78	197.28
Max Load (kg)	350.00	450.00
Material	Cardboard	Plastic
Fragile	Yes	No
Stackable	Yes	Yes
Waterproof	No	Yes
Fire Retardant	No	Yes
Min Temp (°C)	−30	−20
Max Temp (°C)	60	55
Color	Brown	Gray
Country	Germany	Japan

Table 3. Summary statistics across all seven MixedPalletBoxes datasets.

Dataset	Items	Avg Length (cm)	Avg Width (cm)	Avg Height (cm)	Avg Max Load (kg)	Fragile %	Stack-Able %
boxes_db_500	500	58.54	33.46	19.52	88.18	24.80	73.60
boxes_db_1.000	1000	58.82	31.82	18.40	77.19	20.30	70.40
boxes_db_3.000	3000	60.26	31.52	18.11	77.68	22.47	71.90
boxes_db_5.000	5000	60.21	32.20	18.59	81.55	21.64	72.34
boxes_db_10.000	10,000	60.58	32.14	18.70	82.71	22.18	72.88
boxes_db_50.000	50,000	59.94	32.00	18.53	81.56	22.15	72.56
boxes_db_100.000	100,000	60.00	31.99	18.51	81.29	21.97	72.59

Table 4. Performance metrics across algorithms and picking list sizes.

Algorithm	Order Size	Pallets Used	Volume Utilization	Boxes per Pallet	Runtime (s)
FFD (1D)	20	2	84.53%	10.00	0.023
	35	2	91.18%	17.50	0.031
	50	4	80.42%	12.50	0.058
	100	10	77.03%	10.00	0.143
EP	20	4	41.44%	5.00	0.032
	35	3	59.59%	11.67	0.056
	50	5	63.08%	10.00	0.079
	100	13	58.09%	7.69	0.164
Guillotine Cut (py3dbp)	20	3	55.25%	6.67	0.003
	35	3	59.59%	11.67	0.011
	50	4	75.14%	12.25	0.018
	100	11	68.65%	9.09	0.098
Layered (Shelf)	20	3	45.82%	5.67	0.024
	35	4	32.42%	7.25	0.031
	50	6	35.57%	5.83	0.053
	100	13	35.91%	5.69	0.055
GA with 100 generations	20	2	82.74%	10.00	7.90
	35	2	89.34%	17.50	12.77
	50	4	78.85%	12.50	17.70
	100	8	94.40%	12.50	37.05
TS	20	3	55.25%	6.67	0.03
	35	2	89.39%	17.50	0.12
	50	4	78.85%	12.50	1.50
	100	9	83.91%	11.11	6.67

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Article metric data becomes available approximately 24 hours after publication online.