Abstract
Modern manufacturing robots must dynamically balance multiple conflicting objectives amid rapidly evolving production demands. Traditional control approaches lack the adaptability required for real-time decision-making in Industry 4.0 environments. This study presents an adaptive multi-objective reinforcement learning (MORL) framework that integrates dynamic preference weighting with Pareto-optimal policy discovery, enabling real-time adaptation without manual reconfiguration. Experimental validation employed a UR5 manipulator with an RG2 gripper performing quality-aware object sorting in CoppeliaSim under realistic physics (Bullet engine, friction μ = 0.4), manipulating 12 objects of four geometric types on a dynamic conveyor. Across thirty independent runs per algorithm against seven baselines (30,000+ manipulation cycles), the framework achieved improvements of 24.59% to 34.75% (p < 0.001, d = 0.89–1.52), a hypervolume of 0.076 ± 0.015 with a 19.7% coefficient of variation (the lowest among all methods), and 95% of optimal performance within 180 episodes, five times faster than evolutionary baselines. Four independent verification methods (WFG, PyMOO, Monte Carlo, HSO) confirmed measurement reliability (<0.26% variance). The framework remains compatible with edge deployment (<2 GB RAM, <50 ms latency) and integrates seamlessly with Manufacturing Execution Systems and digital twins. This research establishes new benchmarks for adaptive robotic control in sustainable Industry 4.0/5.0 manufacturing.