Abstract
Control of the tin smelting process is hampered by difficult parameter tuning, heavy reliance on operator experience, and the limited ability of traditional model-driven control methods to handle complex nonlinear dynamics. To address these problems, this paper proposes a data-driven control approach based on improved deep reinforcement learning (RL). With the aim of reducing the tin entrainment rate in smelting slag and CO emissions in exhaust gas, we construct a data-driven environment model with an 8-dimensional state space (including furnace temperature, pressure, and gas composition) and an 8-dimensional action space (including lance parameters such as material flow, oxygen content, and backpressure). We design a Dual-Action Discriminative Deep Deterministic Policy Gradient (DADDPG) algorithm, in which the online Actor network simultaneously generates a deterministic action and an exploratory random action, and the Critic network selects the higher-value action for execution, improving the efficiency of policy exploration. Combined with a composite reward function that integrates the real-time Sn and CO content, their variations, and continuous penalty terms for safety constraints, the approach achieves multi-objective dynamic optimization. Experiments on data from a real tin smelting production line validate the environment model, and the results show that the tin content in slag is reduced to between 3.5% and 4% and the CO content in exhaust gas to between 2000 and 2700 ppm.
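The dual-action selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the linear `actor` and `critic` functions are hypothetical stand-ins for trained networks, the exploratory action is assumed to be the deterministic action plus Gaussian noise, and the names `select_action` and `noise_std` are introduced only for illustration. Only the 8-dimensional state and action sizes come from the abstract.

```python
import numpy as np

STATE_DIM, ACTION_DIM = 8, 8  # dimensions stated in the abstract

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the trained Actor and Critic networks.
W_actor = rng.normal(scale=0.1, size=(ACTION_DIM, STATE_DIM))
W_critic = rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM)


def actor(state):
    """Deterministic policy output, squashed to [-1, 1]."""
    return np.tanh(W_actor @ state)


def critic(state, action):
    """Scalar value estimate Q(s, a) for a state-action pair."""
    return float(W_critic @ np.concatenate([state, action]))


def select_action(state, noise_std=0.2):
    """Generate two candidate actions and keep the one the critic rates higher."""
    deterministic = actor(state)
    # Assumption: exploration is additive Gaussian noise, clipped to the action range.
    noise = rng.normal(scale=noise_std, size=ACTION_DIM)
    exploratory = np.clip(deterministic + noise, -1.0, 1.0)

    q_det = critic(state, deterministic)
    q_exp = critic(state, exploratory)
    if q_exp > q_det:
        return exploratory, q_exp
    return deterministic, q_det
```

Calling `select_action(state)` with an 8-element state vector returns the chosen action together with its critic value. Under these assumptions the returned value is never lower than that of the purely deterministic action, which is the sense in which the critic filters out unhelpful exploratory moves.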