Article

Suppressing High-Frequency Action Noise in DRL-Based Process Control: A Dual Strategy for Thermal Regeneration Column

State Key Laboratory of Materials-Oriented Chemical Engineering, College of Chemical Engineering, Jiangsu National Synergetic Innovation Center for Advanced Materials, Jiangsu Collaborative Innovation Center for Advanced Inorganic Function Composites, Nanjing Tech University, Nanjing 211816, China
Processes 2026, 14(10), 1598; https://doi.org/10.3390/pr14101598
Submission received: 17 April 2026 / Revised: 8 May 2026 / Accepted: 12 May 2026 / Published: 14 May 2026

Abstract

Stochastic-policy reinforcement learning (RL) algorithms are widely used in industrial control because of their strong exploration ability and high sample efficiency. However, they often produce large action fluctuations and high-frequency noise, which makes them unsuitable for steady-state chemical processes. To address this problem, this study takes a thermal regeneration column (TRC) as the research object and adopts the Soft Actor-Critic (SAC) algorithm as the baseline. Three strategies are introduced to improve SAC: an action-amplitude-constrained reward function, a low-pass filter, and a Kalman filter. Experimental results show that combining the action-amplitude-constrained reward function with the Kalman filter achieves the best performance. Compared with the standard SAC algorithm, the fluctuation amplitudes of steam consumption, cooling-water consumption, sulfur concentration, and methanol makeup rate are reduced by 85.50%, 82.81%, 90.84%, and 85.49%, respectively, and the fluctuation amplitude of the reward function decreases by 90.68%. The method not only optimizes operating costs but also ensures stable operation of the TRC.
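The two components reported to work best together, an action-amplitude penalty in the reward and Kalman filtering of the policy output, can be sketched as follows. The abstract does not give the authors' exact formulation, so the penalty weight, the noise variances, and the constant-state Kalman model below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def shaped_reward(base_reward, action, prev_action, penalty_weight=0.1):
    """Action-amplitude-constrained reward: subtract a penalty proportional
    to the step-to-step change in the action vector (weight is assumed)."""
    return base_reward - penalty_weight * float(np.sum(np.abs(action - prev_action)))

class ScalarKalmanFilter:
    """1-D Kalman filter that smooths a noisy action signal, assuming a
    constant-state process model (x_k = x_{k-1} + process noise)."""
    def __init__(self, process_var=1e-4, measurement_var=1e-2):
        self.q = process_var       # process noise variance (assumed)
        self.r = measurement_var   # measurement noise variance (assumed)
        self.x = None              # current state estimate
        self.p = 1.0               # current estimate variance

    def step(self, z):
        if self.x is None:         # initialize on the first observation
            self.x = z
            return self.x
        self.p += self.q                  # predict: uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # update toward the measurement z
        self.p *= (1.0 - k)               # shrink the estimate variance
        return self.x
```

In a control loop, the raw SAC action would pass through `ScalarKalmanFilter.step` (one filter per manipulated variable) before being applied to the plant, while `shaped_reward` discourages the policy from producing large action jumps in the first place.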
Keywords: deep reinforcement learning; Soft Actor-Critic; TRC; Kalman filter; action constraint; stable control

Share and Cite

MDPI and ACS Style

Si, S.; Pan, J.; Wan, H.; Guan, G. Suppressing High-Frequency Action Noise in DRL-Based Process Control: A Dual Strategy for Thermal Regeneration Column. Processes 2026, 14, 1598. https://doi.org/10.3390/pr14101598


