LLM-Assisted Reinforcement Learning for U-Shaped and Circular Hybrid Disassembly Line Balancing in IoT-Enabled Smart Manufacturing
Abstract
1. Introduction
- We propose a novel hybrid disassembly system combining U-shaped and circular disassembly line layouts, considering worker posture constraints, and formulate a profit-maximization mathematical model.
- We design a QLoRA-based instruction-fine-tuning approach for LLMs and redesign the Duel-DQN algorithm to efficiently solve the complex UC-HDLBP.
- We demonstrate that the framework can be extended to address cybersecurity, privacy, and trust challenges in IoT-enabled disassembly environments, making it suitable for intelligent, secure manufacturing applications.
2. Problem Description
2.1. U-Shaped and Circular Disassembly Line Balance Problem
2.2. Mathematical Model
- The relevant parameters of ELOPs are constant and known.
- ELOPs may undergo complete or partial disassembly.
- The disassembly line and workstation parameters are known and constant.
- The time required to perform each disassembly task is fixed.
- Task assignments must satisfy both precedence and conflict constraints.
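The last assumption can be illustrated with a small feasibility check. This is a minimal sketch, not the paper's formulation: it assumes AND-type precedence (every listed immediate predecessor must be completed first), and the function name `is_feasible` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def is_feasible(sequence: List[int],
                predecessors: Dict[int, Set[int]],
                conflicts: Dict[int, Set[int]]) -> bool:
    """Return True if `sequence` respects precedence and conflict constraints."""
    done: Set[int] = set()
    for task in sequence:
        # A task can be executed at most once.
        if task in done:
            return False
        # Conflict constraint: a task may not be selected together with
        # any task it conflicts with.
        if conflicts.get(task, set()) & done:
            return False
        # Precedence constraint (assumed AND-type): every listed immediate
        # predecessor must already have been executed.
        if not predecessors.get(task, set()) <= done:
            return False
        done.add(task)
    return True


# Hypothetical toy data, not taken from the paper's products.
PREDECESSORS = {2: {1}, 3: {1}, 5: {2}, 7: {3}}
CONFLICTS = {2: {3}, 3: {2}}

print(is_feasible([1, 3, 7], PREDECESSORS, CONFLICTS))
print(is_feasible([1, 2, 3], PREDECESSORS, CONFLICTS))
```

Because partial disassembly is allowed, a short sequence such as `[1]` also passes the check; only ordering and conflict violations are rejected.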
2.2.1. Symbol Definition
Description | Symbol |
---|---|
Set of all ELOPs to be disassembled, {1, 2, …, P}. | |
Set of all components/parts in product p. | |
Set of all tasks in product p. | |
Set of all disassembly lines. | |
U-shaped disassembly line edge selection set. | |
Set of all U-shaped workstations. | |
Set of all circular workstations. | |
Set of relationships between component i and task j in product p. | |
Set of tasks that conflict with task j in product p. | |
Set of immediate predecessor tasks of task j in product p. | |
Number of products. | P |
Number of components/parts in product p. | |
Number of tasks in product p. | |
Number of U-shaped workstations. | |
Number of circular workstations. | |
Value of component i in product p. | |
Time to execute the j-th task of the p-th product. | |
Unit time cost of executing the j-th task of the p-th product. | |
Unit time cost of opening the l-th disassembly line. | |
Fixed cost of opening the w-th workstation in the l-th disassembly line. | |
Posture required to execute the j-th task of the p-th product. | |
Configuration for opening the w-th workstation in the l-th disassembly line. | |
2.2.2. Relation Matrix
2.2.3. Decision Variables
2.2.4. Maximizing Profit Model
2.2.5. Constraints
- (1) U-shaped Disassembly Line Constraint:
- (2) Circular Disassembly Line Constraint:
- (3) Hybrid Disassembly Line Constraint:
3. Large Language Model Assisted Duel-DQN
3.1. Instruction-Fine-Tuning
3.1.1. Instruction Set Construction
- Instruction: The high-level prompt describing the disassembly task.
- Input: Specific product-related information provided to the LLM.
- Output: The expected disassembly sequence and line assignment.
An Example of Instruction:
instruction: The “personal computer” contains the following disassembly sequences: [1, 3, 7, 11], [1, 2, 5, 10], and [1, 3, 6, 9]. It also includes two disassembly lines: 0 and 1. Only output the selected disassembly lines and disassembly sequences.
input: Select a disassembly sequence for the personal computer and allocate the disassembly line.
output: 1,[1, 3, 7, 11]
Algorithm 1 Instruction Set Generation Algorithm
Require: Name (the name of product p) and the set of conflict matrices of the products
Ensure: Instruction set
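The steps of Algorithm 1 are not preserved in this version of the text, so the following is only a sketch of how instruction records in the instruction/input/output layout described above could be assembled. It assumes the feasible disassembly sequences have already been derived (for instance from the conflict matrices) and emits one record per (line, sequence) pair. That enumeration and the function name `build_instruction_set` are assumptions, not the authors' procedure.

```python
import json
from itertools import product
from typing import Dict, List


def build_instruction_set(name: str,
                          sequences: List[List[int]],
                          lines: List[int]) -> List[Dict[str, str]]:
    """Build one instruction/input/output record per (line, sequence) pair."""
    seq_text = ", ".join(str(seq) for seq in sequences)
    line_text = ", ".join(str(line) for line in lines)
    instruction = (
        f'The "{name}" contains the following disassembly sequences: '
        f"{seq_text}. It also includes {len(lines)} disassembly lines: "
        f"{line_text}. Only output the selected disassembly lines and "
        "disassembly sequences."
    )
    question = (
        f"Select a disassembly sequence for the {name} "
        "and allocate the disassembly line."
    )
    records = []
    # Assumption: every (line, sequence) pair is a valid target answer.
    for line, seq in product(lines, sequences):
        records.append({
            "instruction": instruction,
            "input": question,
            "output": f"{line},{seq}",
        })
    return records


records = build_instruction_set(
    "personal computer",
    [[1, 3, 7, 11], [1, 2, 5, 10], [1, 3, 6, 9]],
    [0, 1],
)
print(json.dumps(records[0], indent=2))
```

The records are plain dictionaries, so they can be written to a JSON file and consumed by most instruction-tuning pipelines without further conversion.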
3.1.2. Instruction Set Generation
3.1.3. Expected Fine-Tuning Result
Q: Select a disassembly sequence for the personal computer and allocate the disassembly line.
A: 0, [1, 2, 5, 10]
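An answer in this form has to be converted into a line index and a task list before it can be passed to the reinforcement learning stage. The following is a minimal parsing sketch, assuming the model replies with exactly "line, [sequence]" as in the example above; `parse_answer` is a hypothetical helper name.

```python
import ast
from typing import List, Tuple


def parse_answer(text: str) -> Tuple[int, List[int]]:
    """Split an answer such as '0, [1, 2, 5, 10]' into (line, sequence)."""
    # Split only on the first comma; later commas belong to the list.
    line_part, seq_part = text.split(",", 1)
    line = int(line_part.strip())
    # literal_eval parses the bracketed list without executing code.
    seq = ast.literal_eval(seq_part.strip())
    if not (isinstance(seq, list) and all(isinstance(t, int) for t in seq)):
        raise ValueError(f"not a task list: {seq_part!r}")
    return line, seq


print(parse_answer("0, [1, 2, 5, 10]"))
```

Malformed replies raise an exception (`ValueError` or `SyntaxError`), which lets a caller count them as illegal recommendations instead of silently accepting them.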
3.2. Design of the Duel-DQN Algorithm
3.2.1. State Space
3.2.2. Action Space
3.2.3. Reward Design
3.2.4. Train Algorithm
Algorithm 2 Task Allocation Phase Algorithm
Require: LLM, IS, set of disassembly sequences, action A, selected disassembly line l, set of workstations, state
Ensure:
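For readers unfamiliar with the dueling architecture [18], its defining step is the aggregation of a scalar state value and per-action advantages into Q-values, with the mean advantage subtracted for identifiability. The sketch below shows only that standard aggregation in plain Python. It does not reproduce the redesigned network of this paper, and the function names are illustrative.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) as Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
print(q, greedy_action(q))
```

Subtracting the mean advantage keeps the value and advantage streams from drifting against each other, since adding a constant to one and subtracting it from the other would otherwise leave the Q-values unchanged.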
4. Experiment and Results
4.1. Experimental Cases
4.2. Comparison of Fine-Tuning Effects of Trained Products
- Load the fine-tuned model and the original model, respectively.
- Pose 15,000 questions, limited in scope to the disassembly sequences of the four products in Table 2. Use the AND/OR diagrams of the corresponding products to verify the results, and record the number of legal disassembly sequences recommended by each model.
- Before the experiment, describe each product's AND/OR diagram to the model so that it understands the product structure. During the test, prompt the model to provide disassembly sequences.
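The counting step of this protocol can be sketched as follows. The sketch assumes that the legal sequences of a product have already been enumerated from its AND/OR diagram. The three personal-computer sequences are taken from the instruction example in Section 3.1.1 and may not be the complete legal set, and the list of model answers is invented for illustration.

```python
from typing import Iterable, List, Set, Tuple


def count_legal(recommendations: Iterable[List[int]],
                legal: Set[Tuple[int, ...]]) -> int:
    """Count recommended sequences that appear in the legal set."""
    return sum(1 for seq in recommendations if tuple(seq) in legal)


# Sequences listed for the personal computer in the instruction example.
LEGAL_PC = {(1, 3, 7, 11), (1, 2, 5, 10), (1, 3, 6, 9)}

# Hypothetical model answers; the last one is not in the legal set.
answers = [[1, 2, 5, 10], [1, 3, 7, 11], [1, 2, 7, 11]]

legal_count = count_legal(answers, LEGAL_PC)
legal_rate = legal_count / len(answers)
print(f"{legal_count} of {len(answers)} legal ({legal_rate:.1%})")
```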
4.3. Comparison of the Fine-Tuning Effects for Untrained Products
4.4. Comparison of Iteration Efficiency
4.5. Comparison and Analysis of Results
5. Cybersecurity and Privacy in IoT-Enabled Disassembly Systems
- Privacy-Aware Task Sequencing: The fine-tuned LLM can be enhanced to prioritize the early disassembly of components containing confidential or security-sensitive data (e.g., memory chips in smart devices). This enables safe disposal or isolation of sensitive components early in the workflow.
- Federated Learning for Worker Privacy: Our reinforcement learning agent can be integrated with federated learning frameworks [19] to enable decentralized training across IoT-enabled workstations, where posture, fatigue, or biometric data are processed locally. This prevents raw data transmission and improves compliance with privacy regulations.
- Secure Communication for Resource-Constrained Devices: Lightweight cryptographic protocols [20] and attribute-based encryption [21] can be implemented within the control layer to secure the transmission of disassembly instructions between edge devices and central coordinators. Our framework supports these protocols due to its modular decision-making architecture.
- Trust-Aware Reward Design: Privacy or safety constraints can be embedded into the reinforcement learning reward function—for example, penalizing task allocations that require a worker to reveal identifiable biometric data or exceed ergonomic risk thresholds. The LLM’s instruction-tuning mechanism enables these soft constraints to be flexibly encoded.
- Practical Integration: These features have not yet been implemented in the current experiment but are technically feasible given the modular structure of our architecture. For example, by combining federated optimization with edge inference on IoT devices, future extensions can support privacy-by-design learning loops. Our goal in this section is to articulate how LLM-based decision engines can act as gateways to integrate optimization, learning, and secure communication for intelligent disassembly environments.
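As a concrete reading of the trust-aware reward idea, the sketch below subtracts fixed penalties from a task's profit when an allocation exceeds an ergonomic risk threshold or requires identifiable biometric data. None of this is implemented in the reported experiments. The function name, the threshold, and the penalty weights are hypothetical placeholders.

```python
def shaped_reward(profit: float,
                  ergonomic_risk: float,
                  uses_biometric_data: bool,
                  risk_threshold: float = 0.7,
                  risk_penalty: float = 50.0,
                  privacy_penalty: float = 100.0) -> float:
    """Return the profit reduced by soft-constraint penalties."""
    reward = profit
    # Penalize allocations whose ergonomic risk exceeds the threshold.
    if ergonomic_risk > risk_threshold:
        reward -= risk_penalty
    # Penalize allocations that expose identifiable biometric data.
    if uses_biometric_data:
        reward -= privacy_penalty
    return reward


print(shaped_reward(200.0, ergonomic_risk=0.5, uses_biometric_data=False))
print(shaped_reward(200.0, ergonomic_risk=0.9, uses_biometric_data=True))
```

Additive penalties keep the constraints soft: the agent can still choose a penalized allocation when its profit is high enough, which is the behavior a hard feasibility mask would rule out.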
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ELOPs | End-of-Life Products |
LLM | Large Language Model |
DQN | Deep Q-Network |
Duel-DQN | Dueling Deep Q-Network |
DSP | Disassembly Sequence Problem |
DLBP | Disassembly Line Balancing Problem |
HDLBP | Hybrid Disassembly Line Balancing Problem |
UC-HDLBP | U-shaped and Circular Hybrid Disassembly Line Balancing Problem |
IoT | Internet of Things |
References
1. Gungor, A.; Gupta, S.M.; Pochampally, K.; Kamarthi, S.V. Complications in disassembly line balancing. In Proceedings of the Environmentally Conscious Manufacturing, Tokyo, Japan, 11–15 December 2001; pp. 289–298.
2. Altekin, F.T.; Kandiller, L.; Ozdemirel, N.E. Profit oriented disassembly-line balancing. Int. J. Prod. Res. 2008, 46, 2675–2693.
3. Agrawal, S.; Tiwari, M.K. A collaborative ant colony algorithm to stochastic mixed-model U-shaped disassembly line balancing and sequencing problem. Int. J. Prod. Res. 2008, 46, 1405–1429.
4. Altekin, F.T. A comparison of piecewise linear programming formulations for stochastic disassembly line balancing. Int. J. Prod. Res. 2017, 55, 7412–7434.
5. Wang, K.; Li, X.; Gao, L.; Garg, A. Partial disassembly line balancing for energy consumption and profit under uncertainty. Robot. Comput.-Integr. Manuf. 2019, 59, 235–251.
6. Li, Z.; Janardhanan, M.N. Modelling and solving profit-oriented U-shaped partial disassembly line balancing problem. Expert Syst. Appl. 2021, 183, 115431.
7. Tang, Y.; Zhou, M.C.; Gao, M. Fuzzy-Petri-Net Based Disassembly Planning Considering Human Factors. IEEE Trans. Syst. Man Cybern.-Part A 2006, 36, 718–726.
8. Kara, Y.; Atasagun, Y.; Gökçen, H.; Hezer, S.; Demirel, N. An integrated model to incorporate ergonomics and resource restrictions into assembly line balancing. Int. J. Comput. Integr. Manuf. 2014, 27, 997–1007.
9. Guo, X.; Wei, T.; Wang, J.; Liu, S.; Qin, S.; Qi, L. Multi-objective U-shaped Disassembly Line Balancing Problem Considering Human Fatigue Index and An Efficient Solution. IEEE Trans. Comput. Soc. Syst. 2022, 10, 2061–2073.
10. Chau, M.Q.; Nguyen, X.P.; Huynh, T.T.; Chu, V.D.; Le, T.H.; Nguyen, T.P.; Nguyen, D.T. Prospects of application of IoT-based advanced technologies in remanufacturing process towards sustainable development and energy-efficient use. Energy Sources Part A Recover. Util. Environ. Eff. 2021, 1–25.
11. Wong, K.S.; Kim, M.H. Privacy protection for data-driven smart manufacturing systems. Int. J. Web Serv. Res. (IJWSR) 2017, 14, 17–32.
12. Avikal, S.; Jain, R.; Mishra, P.K. A Kano model, AHP and M-TOPSIS method-based technique for disassembly line balancing under fuzzy environment. Appl. Soft Comput. 2014, 25, 519–529.
13. Zhu, L.X.; Zhang, Z.Q.; Guan, C. Multi-objective partial parallel disassembly line balancing problem using hybrid group neighbourhood search algorithm. J. Manuf. Syst. 2020, 56, 252–269.
14. Altekin, F.T.; Bayındır, Z.P.; Gümüşkaya, V. Remedial actions for disassembly lines with stochastic task times. Comput. Ind. Eng. 2016, 99, 78–96.
15. Gu, J.; Wang, J.; Guo, X.; Liu, G.; Qin, S.; Bi, Z. A Metaverse Based Teaching Building Evacuation Training System with Deep Reinforcement Learning. IEEE Trans. Syst. Man Cybern. Syst. 2022, 53, 2209–2219.
16. Guo, X.; Jiao, C.; Ji, P.; Wang, J.; Qin, S.; Hu, B.; Qi, L.; Lang, X. Large Language Model-Assisted Reinforcement Learning for Hybrid Disassembly Line Problem. Mathematics 2024, 12, 4000.
17. Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst. 2024, 36, 10088–10115.
18. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.V.; Lanctot, M.; Freitas, N.D. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning, Volume 48 (ICML’16), New York, NY, USA, 20–22 June 2016; pp. 1995–2003.
19. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19.
20. Ali, A.; Zhang, L.; Arshad, J.; Mahmood, Y. A lightweight encryption scheme for industrial internet of things using physically unclonable functions. IEEE Trans. Ind. Inform. 2021, 17, 3980–3989.
21. Zhang, R.; Liu, L.; Chen, Y.; Nepal, S. Attribute-based encryption for cloud computing access control: A survey. IEEE Access 2018, 6, 49428–49449.
| Action Value | U-Shaped Disassembly Line | Circular Disassembly Line |
|---|---|---|
| | Select entrance side. | Select workstation. |
| | Select exit side. | Select workstation. |
| | Do not perform the disassembly task. | Do not perform the disassembly task. |
Case ID | PC Quantity | Washing Machine Quantity | Radio Quantity | Mechanical Module Quantity | U-Shaped Workstations | Circular Workstations |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 5 | 5 |
2 | 2 | 2 | 2 | 3 | 6 | 6 |
3 | 3 | 5 | 5 | 6 | 8 | 8 |
4 | 10 | 10 | 10 | 10 | 10 | 10 |
Case ID | CPLEX Profit | LLM-Assisted Duel-DQN Profit | DQN Profit | Duel-DQN Profit |
---|---|---|---|---|
1 | 1344 | 1299 | 1036 | 1296 |
2 | 2369 | 2266 | 1447 | 1452 |
3 | 5562 | 5458 | 3518 | 1685 |
4 | 12,624 | 12,246 | 141 | 139 |
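The distance between the LLM-assisted Duel-DQN and the CPLEX result can be read off the profit table as a relative gap. The short script below performs that arithmetic on the tabulated values only. The helper name `gap_percent` is illustrative, and the gap is defined here simply as the shortfall relative to the CPLEX profit.

```python
# (CPLEX profit, LLM-assisted Duel-DQN profit) per case, from the profit table.
PROFIT = {
    1: (1344, 1299),
    2: (2369, 2266),
    3: (5562, 5458),
    4: (12624, 12246),
}


def gap_percent(reference: float, value: float) -> float:
    """Shortfall of `value` relative to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


GAPS = {case: gap_percent(cplex, llm) for case, (cplex, llm) in PROFIT.items()}

for case, gap in GAPS.items():
    print(f"Case {case}: {gap:.2f}% below the CPLEX profit")
```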
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Guo, X.; Jiao, C.; Wang, J.; Qin, S.; Hu, B.; Qi, L.; Lang, X.; Zhang, Z. LLM-Assisted Reinforcement Learning for U-Shaped and Circular Hybrid Disassembly Line Balancing in IoT-Enabled Smart Manufacturing. Electronics 2025, 14, 2290. https://doi.org/10.3390/electronics14112290