MixedPalletBoxes Dataset: A Synthetic Benchmark Dataset for Warehouse Applications
Abstract
1. Introduction
- Standardization: By providing a unified data source with controlled variability, algorithms can be compared more reliably without differences in input distributions affecting the results.
- Rich Feature Space: The dataset extends beyond geometric description by incorporating material properties, load capacity, fragility and environmental tolerances, allowing algorithmic evaluation to account for practical handling constraints, such as limiting heavy loads on fragile items and respecting temperature-sensitive stacking conditions.
- Scalability: Researchers can easily adjust the dataset size or customize parameter distributions, such as increasing the proportion of metal boxes or narrowing the range of temperature tolerances, to simulate various industrial scenarios.
- Benchmarking and Extension: The dataset establishes a foundation for community-wide benchmarks, with potential future enhancements like irregular box shapes, dynamic loading sequences, or integration with 3D spatial simulations.
1.1. Research Objectives and Contributions
1.2. Limitations of Existing Benchmark Datasets
2. Dataset Creation and Feature Specification
2.1. Synthetic Box Generation Procedure
2.2. Box Filter API
3. Experimental Evaluation
3.1. Multi-Algorithm Evaluation on Synthetic Picking Lists
- Light orders (20 items): Small, low-density shipments that reflect quick fulfillment tasks or partial replenishment needs, often associated with flexible, last-minute logistics.
- Typical orders (35 items): Standard warehouse orders representing the bulk of daily operations. These are balanced in volume and complexity, often resembling single shipment batches.
- Heavy orders (50 items): Larger and more demanding requests that reflect substantial restocking events or outbound shipments to retail hubs with diverse inventory needs.
- Overflow orders (100 items): High-volume, complex orders simulating peak demand scenarios or bulk consolidation loads, requiring efficient coordination and robust algorithmic strategies.
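As an illustration of how picking lists for the four order classes could be drawn from a generated box database, the following Python sketch samples box IDs for each class. The function name `sample_picking_list`, sampling without replacement, and the 500-box ID pool are assumptions made for this example; the text above does not specify how the lists were built.

```python
import random

# Order classes and item counts as listed above.
ORDER_SIZES = {"light": 20, "typical": 35, "heavy": 50, "overflow": 100}


def sample_picking_list(box_ids, order_class, rng):
    """Draw a picking list for one order class.

    Assumption: boxes are drawn without replacement, so each box ID
    appears at most once per order.
    """
    size = ORDER_SIZES[order_class]
    if size > len(box_ids):
        raise ValueError("order size exceeds the number of available boxes")
    return rng.sample(box_ids, size)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the orders are reproducible
    # Hypothetical ID pool in the BX###### format, sized like a 500-box database.
    pool = [f"BX{i:06d}" for i in range(1, 501)]
    for order_class in ORDER_SIZES:
        order = sample_picking_list(pool, order_class, rng)
        print(order_class, len(order), order[:3])
```

Fixing the random seed keeps the same picking lists available to every algorithm under comparison, which is what makes the per-class results comparable.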
3.2. Results
- Diverse packing outcomes: The number of pallets used, pallet utilization rates and box distributions varied meaningfully, illustrating the dataset’s capacity to represent a broad spectrum of workload complexities, from underloaded pallets to near-full and multi-pallet orders.
- Robust performance metrics: Metrics such as average pallet utilization (ranging from 32% to over 90%), box counts per pallet and packing runtimes showcased the datasets’ ability to support comprehensive performance analysis across heuristic and metaheuristic methods.
- Scalability validation: Increasing order sizes led to expected shifts in packing complexity [29] and resource usage, confirming that the datasets scale naturally and realistically with workload size, which is crucial for testing algorithm scalability and robustness.
- Algorithm-agnostic testing: The datasets function effectively as a universal testbed, supporting a variety of packing strategies with differing spatial, volume and computational characteristics, thus providing an objective basis for evaluating new and existing palletizing solutions.
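The reported metrics can be illustrated with a minimal sketch of volume-based First Fit Decreasing (FFD). It assumes a one-dimensional model in which each box is reduced to its external volume, and the pallet capacity is a hypothetical parameter; it is not the evaluation code behind the results above.

```python
def first_fit_decreasing(volumes_l, pallet_capacity_l):
    """Pack box volumes (liters) into pallets using 1D First Fit Decreasing."""
    pallets = []  # each pallet is a list of the box volumes placed on it
    for volume in sorted(volumes_l, reverse=True):
        if volume > pallet_capacity_l:
            raise ValueError("box volume exceeds pallet capacity")
        for pallet in pallets:
            # Place the box on the first pallet that still has room.
            if sum(pallet) + volume <= pallet_capacity_l:
                pallet.append(volume)
                break
        else:
            # No existing pallet fits: open a new one.
            pallets.append([volume])
    return pallets


def summarize(pallets, pallet_capacity_l):
    """Compute pallets used, average volume utilization and boxes per pallet."""
    n_pallets = len(pallets)
    n_boxes = sum(len(p) for p in pallets)
    packed = sum(sum(p) for p in pallets)
    return {
        "pallets_used": n_pallets,
        "volume_utilization": packed / (n_pallets * pallet_capacity_l),
        "boxes_per_pallet": n_boxes / n_pallets,
    }


if __name__ == "__main__":
    # Toy volumes and capacity chosen only to make the arithmetic easy to follow.
    pallets = first_fit_decreasing([60, 50, 40, 30, 20], pallet_capacity_l=100)
    print(summarize(pallets, 100))
```

Because every pallet has the same capacity, the utilization computed here equals the mean of the per-pallet utilizations.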
3.3. Algorithm Behavior Explanation
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| API | Application Programming Interface |
| SKU | Stock Keeping Unit |
| FFD | First Fit Decreasing |
| EP | Extreme Point |
| GA | Genetic Algorithm |
| TS | Tabu Search |
Appendix A
- Dimension Sampling: Lengths are sampled uniformly between 20 cm and 100 cm. PackLib2 [12] includes benchmarks with a wide range of box sizes but does not specify a unified dimension range across all instances. The selected ranges correspond to commonly observed industrial carton dimensions and are intended to keep instances practically feasible. The width is drawn between 5% of the chosen length (with a minimum of 5 cm) and the length minus 1 cm, ensuring realistic proportions. Similarly, the height is drawn between 5% of the smaller of length and width and that smaller dimension itself, which keeps the internal clearance feasible after accounting for wall thickness.
- Material and Wall Thickness: Five discrete materials—cardboard, plastic, wood, metal and composite—are selected according to predefined probabilities (e.g., 55% cardboard, 5% metal), reflecting common packaging distributions. For each material, wall thickness is drawn from a material-specific set of options (e.g., cardboard: 0.1–0.5 cm in 0.05 cm increments; metal: 0.1–0.3 cm).
- Derived Metrics: External and internal volumes (both in liters) are computed from the sampled dimensions. The maximum load capacity [30] (kg) is calculated as $\mathrm{Load}_{\max} = \min(\text{External Volume} \times \text{Thickness} \times M,\ 500)$, where $M$ is a material-specific multiplier (e.g., 2.5 for cardboard, 12 for metal). The 500 kg cap keeps the values within a realistic range.
- Special Attributes: Fragility [31], stackability [32], waterproofing and fire retardancy are assigned via Bernoulli trials with material-dependent probabilities (e.g., plastic is 95% waterproof; composite is 65% stackable).
- Temperature Tolerance: Minimum and maximum operating temperatures are drawn from material-specific ranges (e.g., metal: −45 °C to −20 °C for the minimum and 80 °C to 120 °C for the maximum), ensuring that the minimum is always less than the maximum.
- Aesthetic and Origin Attributes: Color is chosen from a palette of eight hues (e.g., white, deep brown, light yellow). Country of origin is sampled uniformly from a nine-country list (e.g., Greece, China, USA). Each box is uniquely identified by a six-digit identifier (e.g., BX000123).
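The sampling rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' generator: it covers only cardboard and metal (the two materials whose multipliers and thickness ranges are given as examples), assumes a 0.05 cm thickness step for metal, rounds the 5% lower bounds up to whole centimeters, and omits internal volume because its exact formula is not stated. All function and field names are hypothetical.

```python
import math
import random

# Multipliers and thickness ranges follow the examples in the text.
# Assumption: metal thickness also uses 0.05 cm increments.
MATERIALS = {
    "cardboard": {
        "multiplier": 2.5,
        "thicknesses": [round(0.1 + 0.05 * i, 2) for i in range(9)],  # 0.10..0.50
    },
    "metal": {
        "multiplier": 12.0,
        "thicknesses": [round(0.1 + 0.05 * i, 2) for i in range(5)],  # 0.10..0.30
    },
}
LOAD_CAP_KG = 500.0


def sample_dimensions(rng):
    """Sample integer length, width and height (cm) following the stated bounds."""
    length = rng.randint(20, 100)
    width_min = max(5, math.ceil(0.05 * length))
    width = rng.randint(width_min, length - 1)
    smaller = min(length, width)
    height_min = max(1, math.ceil(0.05 * smaller))
    height = rng.randint(height_min, smaller)
    return length, width, height


def max_load_kg(external_volume_l, thickness_cm, multiplier):
    """Load_max = min(external volume x thickness x M, 500)."""
    return min(external_volume_l * thickness_cm * multiplier, LOAD_CAP_KG)


def generate_box(index, material, rng):
    """Generate one box record for the given material."""
    spec = MATERIALS[material]
    length, width, height = sample_dimensions(rng)
    thickness = rng.choice(spec["thicknesses"])
    external_volume_l = length * width * height / 1000.0  # cm^3 to liters
    return {
        "box_id": f"BX{index:06d}",
        "material": material,
        "length_cm": length,
        "width_cm": width,
        "height_cm": height,
        "thickness_cm": thickness,
        "external_volume_l": external_volume_l,
        "max_load_kg": max_load_kg(external_volume_l, thickness, spec["multiplier"]),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate_box(1, "cardboard", rng))
```

Material selection by weighted choice, the Bernoulli attributes, temperature ranges, color and country of origin would be added in the same style once their full parameter tables are known.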
References
1. Daios, A.; Kladovasilakis, N.; Kelemis, A.; Kostavelis, I. AI applications in supply chain management: A survey. Appl. Sci. 2025, 15, 2775.
2. Daios, A.; Kostavelis, I. Industry 4.0 Technologies in Distribution Centers: A Survey. In Proceedings of the Olympus International Conference on Supply Chains, Katerini, Greece, 24–26 May 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 3–11.
3. de Carvalho, P.R.; Elhedhli, S. A data-driven approach for mixed-case palletization with support. Optim. Eng. 2022, 23, 1587–1610.
4. Daios, A.; Kladovasilakis, N.; Kostavelis, I. Mixed Palletizing for Smart Warehouse Environments: Sustainability Review of Existing Methods. Sustainability 2024, 16, 1278.
5. Ananno, A.A.; Ribeiro, L. A multi-heuristic algorithm for multi-container 3-d bin packing problem optimization using real world constraints. IEEE Access 2024, 12, 42105–42130.
6. Zhao, H.; Xu, J.; Yu, K.; Hu, R.; Zhu, C.; Du, B.; Xu, K. Deliberate planning of 3d bin packing on packing configuration trees. Int. J. Robot. Res. 2025, 02783649251380619.
7. Tsang, Y.; Mo, D.; Chung, K.; Lee, C. A deep reinforcement learning approach for online and concurrent 3D bin packing optimisation with bin replacement strategies. Comput. Ind. 2025, 164, 104202.
8. Elhedhli, S.; Gzara, F.; Yildiz, B. Three-dimensional bin packing and mixed-case palletization. INFORMS J. Optim. 2019, 1, 323–352.
9. Kagerer, F.; Beinhofer, M.; Stricker, S.; Nüchter, A. BED-BPP: Benchmarking dataset for robotic bin packing problems. Int. J. Robot. Res. 2023, 42, 1007–1014.
10. Ribeiro, L.; Ananno, A.A. A software toolbox for realistic dataset generation for testing online and offline 3D bin packing algorithms. Processes 2023, 11, 1909.
11. Osaba, E.; Villar-Rodriguez, E.; Romero, S.V. Benchmark dataset and instance generator for real-world three-dimensional bin packing problems. Data Brief 2023, 49, 109309.
12. Fekete, S.P.; Van der Veen, J.C. PackLib2: An integrated library of multi-dimensional packing problems. Eur. J. Oper. Res. 2007, 183, 1131–1135.
13. Faber, N.; De Koster, M.; Smidts, A. Organizing warehouse management. Int. J. Oper. Prod. Manag. 2013, 33, 1230–1256.
14. Johnson, D.S. Near-Optimal Bin Packing Algorithms. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1973.
15. Tadic, L.; Afric, P.; Sikic, L.; Kurdija, A.S.; Klemo, V.; Delac, G.; Silic, M. Analysis and Comparison of Exact and Approximate Bin Packing Algorithms. In Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2019; pp. 919–924.
16. Crainic, T.G.; Perboli, G.; Tadei, R. Extreme point-based heuristics for three-dimensional bin packing. INFORMS J. Comput. 2008, 20, 368–384.
17. Mahvash, B.; Awasthi, A.; Chauhan, S. A column generation-based heuristic for the three-dimensional bin packing problem with rotation. J. Oper. Res. Soc. 2018, 69, 78–90.
18. Iori, M.; Locatelli, M.; Moreira, M.C.O.; Silveira, T. Solution of a Practical Pallet Building Problem with Visibility and Contiguity Constraints. In Proceedings of the ICEIS (1), Virtual, 5–7 May 2020; pp. 327–338.
19. Lodi, A.; Martello, S.; Vigo, D. Recent advances on two-dimensional bin packing problems. Discret. Appl. Math. 2002, 123, 379–396.
20. Bischoff, E.E.; Ratcliff, M. Issues in the development of approaches to container loading. Omega 1995, 23, 377–390.
21. Calzavara, G.; Iori, M.; Locatelli, M.; Moreira, M.C.; Silveira, T. Mathematical models and heuristic algorithms for pallet building problems with practical constraints. Ann. Oper. Res. 2021, 350, 5–36.
22. Tresca, G.; Cavone, G.; Carli, R.; Cerviotti, A.; Dotoli, M. Automating bin packing: A layer building matheuristics for cost effective logistics. IEEE Trans. Autom. Sci. Eng. 2022, 19, 1599–1613.
23. Bortfeldt, A.; Gehring, H. A hybrid genetic algorithm for the container loading problem. Eur. J. Oper. Res. 2001, 131, 143–161.
24. Lau, H.C.; Chan, T.; Tsui, W.; Ho, G.T.; Choy, K.L. An AI approach for optimizing multi-pallet loading operations. Expert Syst. Appl. 2009, 36, 4296–4312.
25. Ancora, G.; Palli, G.; Melchiorri, C. A hybrid genetic algorithm for pallet loading in real-world applications. IFAC-PapersOnLine 2020, 53, 10006–10010.
26. Lodi, A.; Martello, S.; Vigo, D. Heuristic algorithms for the three-dimensional bin packing problem. Eur. J. Oper. Res. 2002, 141, 410–420.
27. Álvarez-Valdés, R.; Parreño, F.; Tamarit, J.M. A tabu search algorithm for the pallet loading problem. OR Spectrum 2005, 27, 43–61.
28. Leon, P.; Cueva, R.; Tupia, M.; Paiva Dias, G. A taboo-search algorithm for 3d-binpacking problem in containers. In Proceedings of the New Knowledge in Information Systems and Technologies: Volume 1, Galicia, Spain, 16–19 April 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 229–240.
29. Barros, H.; Pereira, T.; Ramos, A.G.; Ferreira, F.A. Complexity constraint in the distributor’s pallet loading problem. Mathematics 2021, 9, 1742.
30. Gzara, F.; Elhedhli, S.; Yildiz, B.C. The pallet loading problem: Three-dimensional bin packing with practical constraints. Eur. J. Oper. Res. 2020, 287, 1062–1074.
31. Ancora, G.; Palli, G.; Melchiorri, C. Combining Hybrid Genetic Algorithms and Feedforward Neural Networks for Pallet Loading in Real-World Applications. In Proceedings of the Human-Friendly Robotics 2021: HFR: 14th International Workshop on Human-Friendly Robotics, Bologna, Italy, 28–29 October 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–14.
32. Iori, M.; Locatelli, M.; Moreira, M.C.; Silveira, T. A mixed approach for pallet building problem with practical constraints. In Proceedings of the International Conference on Enterprise Information Systems, Prague, Czech Republic, 5–7 May 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 122–139.
| Attribute | Description | Generation Method/Values |
|---|---|---|
| Box ID | Unique identifier (BX######) | Incremented number |
| Length | External dimension (cm) | Uniform integer between 20 and 100 |
| Width | External dimension (cm) | Random integer between 5% of length (minimum 5 cm) and length − 1 |
| Height | External dimension (cm) | Random integer between 5% of min(length, width) and that value |
| Thickness | Wall thickness (cm) | Discrete choices per material type |
| External/Internal Volume | Volume in liters | Calculated |
| Max Load Capacity | Maximum vertical load (kg) | Computed: min(external volume × thickness × material multiplier, 500) |
| Material | Packaging material | Weighted choice: cardboard (55%), plastic (25%), etc. |
| Fragile | “Yes”/“No” label | Bernoulli distribution by material (e.g., 30% for cardboard) |
| Stackable | “Yes”/“No” label | Same as above (probabilistic by material) |
| Waterproof | “Yes”/“No” label | Same as above (probabilistic by material) |
| Fire Retardant | “Yes”/“No” label | Same as above (probabilistic by material) |
| Min/Max Temperature | Operating temperature bounds (°C) | Random int in material dependent range (min < max) |
| Color | Box color | Random choice from predefined list |
| Country of Origin | Manufacturing country | Random choice from country list |

| Box ID | BX000001 | BX000002 |
|---|---|---|
| Length (cm) | 75 | 90 |
| Width (cm) | 60 | 45 |
| Height (cm) | 40 | 50 |
| Thickness (cm) | 0.4 | 0.2 |
| External Volume (L) | 180.00 | 202.50 |
| Internal Volume (L) | 167.78 | 197.28 |
| Max Load (kg) | 350.00 | 450.00 |
| Material | Cardboard | Plastic |
| Fragile | Yes | No |
| Stackable | Yes | Yes |
| Waterproof | No | Yes |
| Fire Retardant | No | Yes |
| Min Temp (°C) | −30 | −20 |
| Max Temp (°C) | 60 | 55 |
| Color | Brown | Gray |
| Country | Germany | Japan |

| Dataset | Items | Avg Length (cm) | Avg Width (cm) | Avg Height (cm) | Avg Max Load (kg) | Fragile % | Stackable % |
|---|---|---|---|---|---|---|---|
| boxes_db_500 | 500 | 58.54 | 33.46 | 19.52 | 88.18 | 24.80 | 73.60 |
| boxes_db_1.000 | 1000 | 58.82 | 31.82 | 18.40 | 77.19 | 20.30 | 70.40 |
| boxes_db_3.000 | 3000 | 60.26 | 31.52 | 18.11 | 77.68 | 22.47 | 71.90 |
| boxes_db_5.000 | 5000 | 60.21 | 32.20 | 18.59 | 81.55 | 21.64 | 72.34 |
| boxes_db_10.000 | 10,000 | 60.58 | 32.14 | 18.70 | 82.71 | 22.18 | 72.88 |
| boxes_db_50.000 | 50,000 | 59.94 | 32.00 | 18.53 | 81.56 | 22.15 | 72.56 |
| boxes_db_100.000 | 100,000 | 60.00 | 31.99 | 18.51 | 81.29 | 21.97 | 72.59 |

| Algorithm | Order Size | Pallets Used | Volume Utilization | Boxes per Pallet | Runtime (s) |
|---|---|---|---|---|---|
| FFD (1D) | 20 | 2 | 84.53% | 10.00 | 0.023 |
| FFD (1D) | 35 | 2 | 91.18% | 17.50 | 0.031 |
| FFD (1D) | 50 | 4 | 80.42% | 12.50 | 0.058 |
| FFD (1D) | 100 | 10 | 77.03% | 10.00 | 0.143 |
| EP | 20 | 4 | 41.44% | 5.00 | 0.032 |
| EP | 35 | 3 | 59.59% | 11.67 | 0.056 |
| EP | 50 | 5 | 63.08% | 10.00 | 0.079 |
| EP | 100 | 13 | 58.09% | 7.69 | 0.164 |
| Guillotine Cut (py3dbp) | 20 | 3 | 55.25% | 6.67 | 0.003 |
| Guillotine Cut (py3dbp) | 35 | 3 | 59.59% | 11.67 | 0.011 |
| Guillotine Cut (py3dbp) | 50 | 4 | 75.14% | 12.25 | 0.018 |
| Guillotine Cut (py3dbp) | 100 | 11 | 68.65% | 9.09 | 0.098 |
| Layered (Shelf) | 20 | 3 | 45.82% | 5.67 | 0.024 |
| Layered (Shelf) | 35 | 4 | 32.42% | 7.25 | 0.031 |
| Layered (Shelf) | 50 | 6 | 35.57% | 5.83 | 0.053 |
| Layered (Shelf) | 100 | 13 | 35.91% | 5.69 | 0.055 |
| GA with 100 generations | 20 | 2 | 82.74% | 10.00 | 7.90 |
| GA with 100 generations | 35 | 2 | 89.34% | 17.50 | 12.77 |
| GA with 100 generations | 50 | 4 | 78.85% | 12.50 | 17.70 |
| GA with 100 generations | 100 | 8 | 94.40% | 12.50 | 37.05 |
| TS | 20 | 3 | 55.25% | 6.67 | 0.03 |
| TS | 35 | 2 | 89.39% | 17.50 | 0.12 |
| TS | 50 | 4 | 78.85% | 12.50 | 1.50 |
| TS | 100 | 9 | 83.91% | 11.11 | 6.67 |
© 2025 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Daios, A.; Kostavelis, I. MixedPalletBoxes Dataset: A Synthetic Benchmark Dataset for Warehouse Applications. Appl. Syst. Innov. 2026, 9, 14. https://doi.org/10.3390/asi9010014
