Blockchain-Based Zero-Trust Supply Chain Security Integrated with Deep Reinforcement Learning for Inventory Optimization
Abstract
1. Introduction
- We formulate the supply chain inventory optimization problem as a Markov Decision Process (MDP) and apply the Soft Actor-Critic (SAC) algorithm with prioritized experience replay to learn adaptive policies under demand uncertainty.
- We design a blockchain architecture with smart contracts that enables secure, transparent, and traceable record-keeping and automated execution of supply chain transactions.
- We integrate the SAC-based inventory optimization model with the blockchain-based zero-trust mechanism, creating a unified framework for secure and efficient supply chain management.
- We conduct experiments using real-world supply chain data to evaluate the performance of the proposed framework in terms of reward maximization, inventory stability, and security metrics.
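To make the MDP framing in the first contribution concrete, the following is a minimal sketch of a toy single-item inventory environment. It is not the paper's formulation: the class name `ToyInventoryEnv`, the state (current stock), the action (order quantity), the uniform demand model, and every cost parameter are hypothetical choices made only for illustration.

```python
import random


class ToyInventoryEnv:
    """Hypothetical single-item inventory MDP (illustrative, not the paper's model)."""

    def __init__(self, capacity=100, max_demand=20, price=5.0,
                 order_cost=3.0, holding_cost=0.1, stockout_cost=2.0, seed=0):
        self.capacity = capacity            # warehouse size (assumed)
        self.max_demand = max_demand        # upper bound of uniform demand (assumed)
        self.price = price                  # revenue per unit sold (assumed)
        self.order_cost = order_cost        # cost per unit ordered (assumed)
        self.holding_cost = holding_cost    # cost per unit left in stock (assumed)
        self.stockout_cost = stockout_cost  # penalty per unit of unmet demand (assumed)
        self.rng = random.Random(seed)
        self.inventory = capacity // 2

    def reset(self):
        """Start an episode with a half-full warehouse and return the state."""
        self.inventory = self.capacity // 2
        return self.inventory

    def step(self, order_qty):
        """Apply one order, draw a random demand, return (state, reward, demand)."""
        # Action: units ordered, clipped to the free warehouse space.
        order_qty = max(0, min(int(order_qty), self.capacity - self.inventory))
        stock = self.inventory + order_qty
        demand = self.rng.randint(0, self.max_demand)  # demand uncertainty
        sold = min(stock, demand)
        unmet = demand - sold
        self.inventory = stock - sold
        # Reward: sales revenue minus ordering, holding, and stockout costs.
        reward = (self.price * sold
                  - self.order_cost * order_qty
                  - self.holding_cost * self.inventory
                  - self.stockout_cost * unmet)
        return self.inventory, reward, demand
```

A learning agent would call `reset()` once per episode and `step(order_qty)` once per time step, which yields the (s, a, r, s’) transitions that an off-policy algorithm such as SAC stores in its replay buffer.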
2. Related Work
3. Methodology
3.1. Problem Formulation
3.2. Soft Actor–Critic with Prioritized Experience Replay
- Compute the TD error for each transition when it is added to the replay buffer.
- Assign a priority value to each transition based on its TD error, using a priority function (e.g., proportional or rank-based).
- When sampling a batch of transitions from the replay buffer, use the priority values to determine the sampling probabilities.
- Update the priorities of the sampled transitions based on their updated TD errors after the learning step.
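The four steps above can be sketched as a small proportional prioritized replay buffer. This is a minimal illustration, not the authors' implementation: the class name, the exponents `alpha` and `beta`, the offset `eps`, and the linear-time sampling (instead of a sum-tree) are assumptions. The importance-sampling weights returned by `sample` are not mentioned in the list above; they follow the usual prioritized-replay practice of correcting for the non-uniform sampling.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay with simple O(n) sampling (sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha    # how strongly TD error shapes sampling (0 = uniform)
        self.eps = eps        # keeps every priority strictly positive
        self.transitions = []
        self.priorities = []
        self.next_slot = 0    # slot to overwrite once the buffer is full

    def _priority(self, td_error):
        # Proportional variant: p = (|TD error| + eps) ** alpha
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        """Store a transition with a priority derived from its TD error."""
        p = self._priority(td_error)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(p)
        else:
            self.transitions[self.next_slot] = transition
            self.priorities[self.next_slot] = p
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        """Draw indices with probability P(i) = p_i / sum_k p_k."""
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        n = len(self.transitions)
        indices = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights, normalized so the largest is 1.
        weights = [(n * probs[i]) ** (-beta) for i in indices]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        batch = [self.transitions[i] for i in indices]
        return indices, batch, weights

    def update_priorities(self, indices, td_errors):
        """Refresh priorities of sampled transitions after a learning step."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = self._priority(err)
```

In a training loop, `sample` would supply the batch for the critic and actor updates, and `update_priorities` would be called with the recomputed TD errors of that same batch.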
- Initialize the actor network with parameters θ, the critic networks with parameters ϕ1 and ϕ2, and the replay buffer D.
- Set the target network parameters equal to the main network parameters: ϕtar,1 ← ϕ1, ϕtar,2 ← ϕ2.
- For each episode:
  - For each time step:
    - i. Observe the current state s and select an action a ~ πθ(·|s) using the actor network.
    - ii. Execute the action a in the environment, observe the next state s’ and the one-step reward r, and store the transition (s, a, r, s’) in the replay buffer D.
    - iii. Sample a batch B of transitions from the replay buffer D based on their priority scores.
    - iv. Compute the target values y(r, s’) using the target critic networks and the entropy-regularized policy.
    - v. Update the critic networks by minimizing the mean-squared Bellman error using the sampled transitions.
    - vi. Update the actor network by maximizing the expected Q-value minus the temperature-weighted log-probability of the action, α log πθ(a|s), which is equivalent to adding an entropy bonus to the Q-value.
    - vii. Update the target networks using a soft update rule: ϕtar,i ← ρϕtar,i + (1 − ρ)ϕi, for i = 1, 2.
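The numerical core of steps iv to vii can be written out on plain Python numbers. This is a sketch under stated assumptions rather than the authors' code: the discount factor `gamma`, the entropy temperature `alpha`, the terminal flag `done`, and the use of the minimum of the two critics follow the standard SAC formulation and are not specified in the list above. Network forward passes and gradient steps are deliberately left out.

```python
def sac_target(reward, done, q1_next, q2_next, log_prob_next,
               gamma=0.99, alpha=0.2):
    """Step iv: entropy-regularized target y for one transition.

    q1_next and q2_next are the two target-critic values at (s', a'),
    where a' is sampled from the current policy at s', and log_prob_next
    is log pi(a'|s'). gamma, alpha, and done are assumptions of this sketch.
    """
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    return reward + gamma * (1.0 - float(done)) * soft_value


def critic_loss(q_values, targets):
    """Step v: mean-squared Bellman error over a batch."""
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / len(targets)


def actor_objective(q1, q2, log_prob, alpha=0.2):
    """Step vi: per-sample quantity the actor maximizes, Q - alpha * log pi."""
    return min(q1, q2) - alpha * log_prob


def soft_update(target_params, main_params, rho=0.995):
    """Step vii: phi_tar <- rho * phi_tar + (1 - rho) * phi, elementwise."""
    return [rho * t + (1.0 - rho) * p
            for t, p in zip(target_params, main_params)]
```

With prioritized replay, the absolute difference between each critic prediction and its target from `sac_target` is the TD error that would be fed back as the new priority of the sampled transition.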
3.3. Blockchain-Based Framework for Factory Supply Chain Management
4. Experiment
4.1. Simulation Environment and Parameters
4.2. Results and Analysis
5. Conclusions and Future Directions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
| Model | Average Reward | Training Time (min) |
|---|---|---|
| PPO | 0.76 | 13.6 |
| DDPG | 0.71 | 12.9 |
| SAC | 0.87 | 13.2 |
| TD3 | 0.74 | 12.4 |
| SAC-rainbow (our proposed algorithm) | 0.92 | 10.7 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, Z.; Chen, X.; Sun, T.; Wang, X.; Wu, Y.C.; Zhou, M. Blockchain-Based Zero-Trust Supply Chain Security Integrated with Deep Reinforcement Learning for Inventory Optimization. Future Internet 2024, 16, 163. https://doi.org/10.3390/fi16050163