Intelligent Control of the Main Steam Flow Rate for the Municipal Solid Waste Incineration Process

Pian, Jinxiang; Liu, Jianyong; Tang, Jian; Hou, Jing

doi:10.3390/su17136036

¹ School of Electrical and Control Engineering, Shenyang Jianzhu University, Shenyang 110168, China

² School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(13), 6036; https://doi.org/10.3390/su17136036

Submission received: 30 May 2025 / Revised: 25 June 2025 / Accepted: 30 June 2025 / Published: 1 July 2025

(This article belongs to the Section Waste and Recycling)


Abstract

Stable control of the main steam flow rate (MSFR) improves waste combustion efficiency and energy utilization, reduces environmental pollution, and is crucial for promoting the sustainable development of municipal solid waste incineration (MSWI). Developed countries benefit from stable municipal solid waste (MSW) composition, which enables advanced automated combustion control. In developing countries, however, fluctuating waste composition and calorific value cause frequent disturbances, limiting the use of foreign control methods. MSFR control technologies suited to developing countries are therefore needed. This study proposes a two-layer intelligent control method consisting of an optimization setting layer and a loop control layer. The optimization layer combines an MSFR prediction model (OPTICS and RBF) with an improved antlion optimizer (IALO) to determine the manipulated variable setpoints. The control layer applies reinforcement learning (actor–critic) to fine-tune the PI controller parameters. Experimental results show that the proposed method adaptively adjusts the manipulated variables, keeping the MSFR within the target range and maintaining efficient, stable MSWI operation.

Keywords: municipal solid waste incineration (MSWI); two-layer intelligent control method; main steam flow rate (MSFR); intelligent control; reinforcement learning

1. Introduction

Pollution control remains a critical issue for global sustainable development [1]. With the acceleration of urbanization, the annual generation of municipal solid waste (MSW) worldwide continues to grow steadily at a rate of 8–10% [2]. In developing countries where waste classification systems are not yet well-established, the problem of “waste surrounding cities” has become increasingly severe [3]. Therefore, effective measures must be taken to address the growing volume of waste in order to prevent further environmental deterioration. Compared to traditional waste management methods, municipal solid waste incineration (MSWI) offers significant advantages and can effectively reduce pollution. Traditional methods such as landfilling and open burning often lead to land and air pollution, whereas MSWI converts waste into energy through advanced combustion technologies. This not only alleviates environmental pressure but also recovers thermal energy, promoting sustainable urban development. However, during the incineration process, the stable control of the main steam flow rate (MSFR) is one of the key factors for the stable operation of the MSWI system. Stable control of MSFR can improve energy efficiency, reduce emissions, and promote resource recycling [4,5]. Therefore, stable control of the MSFR is crucial for both the MSWI operation and the sustainable development of cities.

In developed countries, waste separation systems are well implemented. This results in relatively uniform MSW composition and stable calorific values. Such conditions provide a solid foundation for applying automated combustion control strategies [6]. However, in developing countries, where waste classification is still under development, MSW composition and calorific value fluctuate widely. These fluctuations cause frequent disturbances in incineration, making it difficult to directly apply control strategies developed abroad [7]. Therefore, effective MSFR control technologies tailored to the complex conditions of developing countries are essential for improving incineration stability and pollution control efficiency.

Thanks to rigorous classification and standardized pretreatment, developed countries have created stable incineration environments. Highly automated control systems further ensure these environments are well controlled. This provides ideal conditions for advanced mechanistic model-based control strategies [8,9]. For example, a pioneering two-stage closed-loop identification method was proposed [10], which introduces an input sensitivity function to reduce the impact of disturbances on modeling accuracy. This approach produced a multiple-input multiple-output (MIMO) model that maintained MSFR prediction errors below 4% at the HVC incineration plant in the Netherlands. Building on this, a nonlinear model predictive control (MPC) system based on a three-zone combustion model was developed [11]. By optimizing grate speed and secondary air distribution, the system decoupled MSFR and carbon monoxide emissions and reduced steam flow response delay to 80 s, greatly improving dynamic performance. Another study applied MPC to MSFR control by constructing a precise predictive model [12] that accounted for variations in waste composition and equipment status. This method significantly enhanced MSFR control stability and accuracy.

The above mechanistic modeling and advanced control methods, grounded in stable operation conditions, provide valuable theoretical and practical foundations for global MSWI control technologies. However, in developing countries, kitchen waste accounts for more than 50% of MSW, leading to high moisture content and low calorific value. These factors severely reduce combustion stability and cause MSFR to be highly sensitive to fuel disturbances [13]. There is a strong coupling among furnace temperature, flue gas oxygen concentration, and MSFR during incineration [14]. Empirical data show that every 5% increase in moisture content expands the MSFR fluctuation range by ±1.2% [15]. This makes it difficult to directly transfer control strategies from developed countries to practical applications in developing countries.

To address these challenges of varying operating conditions in developing countries, researchers have carried out systematic technological innovations based on data-driven approaches and intelligent algorithms. Because traditional PI parameter tuning methods are unsuitable when process characteristics change [16,17], an event-triggered RBF–PID controller was developed in [18]. It adaptively adjusts the PID parameters based on dynamic furnace temperature changes, reducing the MSFR recovery time under load disturbances to 120 s while keeping fluctuations within ±2%. To better serve control objectives, a fuzzy-width forest regression soft sensing method was proposed in [19], maintaining MSFR prediction errors within ±3.5% even under calorific value fluctuations of ±15%. A Takagi–Sugeno (TS) fuzzy neural network was employed in [20] to achieve nonlinear decoupling among multiple variables, keeping MSFR prediction errors below 5% and further improving control accuracy. Additionally, advances in swarm intelligence have opened new perspectives for MSFR control [21,22]. For instance, the beetle antennae search–support vector machine (BAS–SVM) algorithm, inspired by swarm behavior, achieves global optimization of control variables. It balances multiple objectives, including steam flow stability, energy efficiency, and pollutant emissions [23]. This method enhances system adaptability, robustness, and optimization, greatly improving steam flow control effectiveness.

Although the previous studies have significantly improved system control performance, several key technical challenges remain. For example, in multivariable decoupling, relative gain array (RGA)-based methods reduce coupling to some extent [24], but MSWI involves complex pyrolysis, gasification, and combustion reactions. The strong nonlinearity and uncertainty often cause model mismatch, limiting these methods’ broad application. More importantly, existing control strategies mainly focus on precise tracking of target variables but lack the ability to dynamically optimize control setpoints [25]. However, in dynamic operating conditions, the predefined setpoints often deviate from their optimal values. In such cases, experienced operators must manually adjust parameters such as airflow based on empirical judgment. Yet, when the composition of waste varies significantly or combustion conditions fluctuate rapidly, expert intervention may be delayed or inconsistent due to differences in individual expertise, experience, and skill levels. This subjectivity can hinder effective steam flow regulation and compromise overall system performance. Reference [26] addresses the issue of varying operating conditions by proposing a two-layer control architecture for operational management and hydroelectricity production maximization in inland waterways. It eliminates the subjective errors introduced by manual settings.

Developing countries face challenges due to immature waste sorting systems, large fluctuations in waste composition and calorific value, and frequent changes in combustion conditions. This makes it difficult to achieve automated combustion control like in developed nations. To address this, we propose a two-layer intelligent control system for the MSFR. The system proposed in the paper accounts for computational and infrastructure limitations while handling changing conditions. Regarding hardware limitations, we use a two-layer structure in the system design. The optimization setting and loop control layers operate on separate devices. This reduces reliance on a single device and ensures better task distribution. The system can adjust control parameters in real-time, based on current conditions. A PI controller tracks the new set points, ensuring stable and efficient operation. On the other hand, the antlion optimization (ALO) and reinforcement learning algorithms are innovative but have existing applications. Local technicians in developing countries can perform routine maintenance after proper training, reducing reliance on external experts and lowering costs. Furthermore, the system uses buffering and data interpolation. This ensures stable operation even when data acquisition devices are incomplete or delayed. Therefore, the main contributions of this study are as follows:

(1): An RBF neural network prediction model is established in the optimization setting layer to accurately predict the MSFR. By introducing the OPTICS algorithm, the centers and widths of the RBF hidden layer are objectively determined, and the gradient descent algorithm is used to adaptively adjust the neural network parameters. The model’s accuracy and good convergence were validated through experiments comparing five models, including a BP neural network, an RBF neural network, an RBF neural network with the maximum matrix element algorithm, and an RBF neural network with K-means clustering.
(2): An IALO-based optimization setting method for the manipulated variables is proposed. The setpoints of the manipulated variables are adjusted dynamically in response to changes in operating conditions.
(3): In the control layer, reinforcement learning is introduced into the PI control system so that the PI parameters are tuned automatically and the setpoints are tracked quickly, especially when operating conditions change. Compared to the QDRNN-PID control system and the standard PID control system, the proposed intelligent optimization control method for MSFR demonstrated strong robustness and dynamic response capabilities.

The remainder of this article is organized as follows. Section 2 analyzes the MSWI process and control problems. Section 3 introduces the methodology of the MSFR control. Section 4 discusses experimental results. Finally, Section 5 concludes this article.

2. Control Problem Description in the MSWI Process

2.1. Description of MSWI Process Flow

A typical municipal solid waste incineration (MSWI) process flow is shown in Figure 1, which consists of six key systems working in synergy: the solid waste fermentation system, solid waste combustion system, waste heat recovery system, steam generation system, flue gas filtration system, and flue gas emission system. In the solid waste combustion system, pre-treated solid waste, after classification, is conveyed to the incinerator through a hopper. The solid waste undergoes complete combustion within the system, releasing substantial thermal energy, which is transferred as high-temperature flue gas to the waste heat recovery system. In this system, the high-temperature flue gas flows sequentially through the water-cooled wall, economizer, evaporator, and superheater of the waste heat boiler, enabling efficient heat exchange with water or steam. During this process, water is gradually heated into saturated steam. It is then further converted into high-temperature, high-pressure superheated steam. This steam is known as the main steam. This main steam not only heats the primary and secondary air for the combustion system but also serves as the key driving force for the turbine generator in the steam power generation system. Therefore, the stable control of the MSWI MSFR is crucial for the stable operation of both the solid waste incineration and steam generation systems [27,28,29,30].

The MSFR is closely linked to the heat exchange processes in the solid waste combustion and waste heat boiler systems. In the combustion system, high-precision temperature sensors monitor the furnace temperature in real time, pressure sensors detect furnace pressure, and airflow sensors measure the speeds of primary and secondary air. These data serve as critical inputs for precise combustion control. The flow control valves for both air streams are adjusted in real time based on sensor feedback, while the grate speed motor modulates grate speed as needed. Similarly, in the waste heat recovery system, temperature and pressure sensors monitor the steam’s state in real time. Typically, a throttling device measures the MSFR. However, due to fluctuations in solid waste composition, environmental temperature changes, and the dynamic behavior of steam temperature, pressure, and flow rate, accurately measuring the MSFR is often challenging.

2.2. Problem Analysis

To effectively control the MSFR, it is first essential to analyze the influencing factors from a mechanistic perspective. This involves considering the interactions among multiple variables and their dynamic variations to identify truly feasible and critical control variables [31]. The MSFR mechanism model established in reference [32] is shown in Equations (1) and (2). The MSFR is closely related to factors such as primary and secondary air.

y = \frac{(Q_{in} - Q_{loss}) + \sqrt{{(Q_{in} - Q_{loss})}^{2} - 4 (h_{sh} - h_{ws}) \cdot 5.82 {(S_{rated})}^{0.62}}}{2 (h_{sh} - h_{ws})},

(1)

Q_{in} = Q_{ncv} + Q_{air,I} + Q_{air,II},

(2)

where $Q_{in}$ is the input heat; $h_{sh}$ and $h_{ws}$ are the enthalpies of superheated steam and feedwater; $S_{rated}$ is the rated MSFR; $Q_{ncv}$ is the heat input from the as-received lower heating value (net calorific value) of the waste; $Q_{air,I}$ and $Q_{air,II}$ are the heat contributions from primary and secondary air; and $Q_{loss}$ is the total heat loss caused by physical heat loss from slag, incomplete combustion of solid materials, and flue gases in the incinerator.
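As a rough illustration of how Equations (1) and (2) are evaluated, the following minimal sketch computes the MSFR for one operating point. The function name, argument names, and numbers are hypothetical; the inputs are arbitrary unit-free values, not plant data, and the sketch assumes all quantities are supplied in mutually consistent units.

```python
import math


def msfr_mechanism(q_ncv, q_air_1, q_air_2, q_loss, h_sh, h_ws, s_rated):
    """Evaluate Equations (1) and (2) for one operating point."""
    q_in = q_ncv + q_air_1 + q_air_2            # Equation (2): total input heat
    net_heat = q_in - q_loss
    delta_h = h_sh - h_ws                       # enthalpy rise from feedwater to superheated steam
    discriminant = net_heat ** 2 - 4.0 * delta_h * 5.82 * s_rated ** 0.62
    if delta_h <= 0.0 or discriminant < 0.0:
        raise ValueError("inputs fall outside the range where Equation (1) is real-valued")
    return (net_heat + math.sqrt(discriminant)) / (2.0 * delta_h)   # Equation (1)


# Arbitrary illustrative inputs (assumed values, not measurements).
y = msfr_mechanism(q_ncv=100.0, q_air_1=5.0, q_air_2=3.0, q_loss=8.0,
                   h_sh=3.0, h_ws=1.0, s_rated=50.0)
print(y)
```

Equation (1) is the larger root of a quadratic in y, so the sketch rejects inputs for which the discriminant is negative instead of returning a complex value.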

Apart from airflow, the MSFR is also influenced by factors such as the boiler drum pressure, secondary air fan outlet pressure, economizer feedwater flow, and grate speed [11,33]. Firstly, the boiler drum pressure directly regulates the steam generation rate by altering the steam-water separation efficiency, steam storage conditions, and the phase change conditions of water, thus affecting the MSFR. Secondly, the secondary air fan outlet pressure determines the motive force for secondary air entering the furnace, which influences the oxygen supply during combustion and the airflow distribution, thereby affecting the combustion intensity and stability and ultimately the MSFR. Thirdly, the economizer feedwater flow, as the material basis for steam generation, determines the amount of water involved in steam production and also influences the system’s thermal balance. Additionally, the grate speed determines the residence time of the solid waste within the furnace, influencing the degree of combustion and the heat release process. Thus, the MSFR is primarily affected by the primary and secondary airflow rates, secondary air fan outlet pressure, boiler drum pressure, economizer feedwater flow, and grate speed.

Since abrupt changes in grate speed may cause shifts in the combustion zone and its distribution, affecting the stability of the combustion process, and given its slow response and complex impacts, it is typically not used as a direct control variable. However, changes in grate speed can comprehensively reflect information on the characteristics of solid waste, combustion conditions, and the system’s thermal balance, making it an important parameter for characterizing the operating conditions of municipal solid waste incineration (MSWI) [34]. Therefore, in this study, primary airflow rate, secondary airflow rate, secondary air fan outlet pressure, boiler drum pressure, and economizer feedwater flow are chosen as control variables.

The complex and variable composition of solid waste, coupled with drift in operating conditions, results in a highly coupled nonlinear relationship between the control variables and the MSFR, making it difficult to describe the relationship precisely through mathematical models. The relationship between the MSFR and its influencing factors is therefore expressed in general form in Equation (3) and Figure 2, where $f (\cdot)$ represents the complex nonlinear relationship between the MSFR and the six influencing factors; $S_{1}$, $S_{2}$, $S_{3}$, $S_{4}$, and $S_{5}$ are the control variables; and $Ω$ represents the operating conditions as characterized by the grate speed.

y = f (S_{1}, S_{2}, S_{3}, S_{4}, S_{5}, Ω).

(3)

3. Methodology

The objective of the MSFR control is given in Equations (4) and (5): to keep the steam flow within the target range required by the process and as close as possible to the target MSFR value.

\min \{|y - y^{*}|\},

(4)

- τ \leq y - y^{*} \leq τ, τ > 0,

(5)

where $y$ represents the measured MSFR, $y^{*}$ represents the target MSFR, and $τ$ is the allowable deviation. The absolute deviation between the target and the measured MSFR must not exceed $τ$.
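The two conditions in Equations (4) and (5) can be checked with a few lines of code. The sketch below is only illustrative; the function names and the numeric values are hypothetical and are not taken from the experiments.

```python
def msfr_deviation(y_measured, y_target):
    """Absolute deviation minimized in Equation (4)."""
    return abs(y_measured - y_target)


def within_target_band(y_measured, y_target, tau):
    """Constraint of Equation (5): the deviation must stay inside [-tau, tau]."""
    if tau <= 0.0:
        raise ValueError("tau must be positive")
    return -tau <= y_measured - y_target <= tau


# Assumed example values: target 70.0, allowable deviation 1.5.
print(msfr_deviation(71.0, 70.0), within_target_band(71.0, 70.0, 1.5))
```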

Achieving this objective relies on the coordinated regulation of several key parameters: primary airflow $S_{1}$, secondary airflow $S_{2}$, secondary air fan outlet pressure $S_{3}$, boiler drum pressure $S_{4}$, and economizer feedwater flow $S_{5}$. The grate speed serves as an indicator of changes in operating conditions. Since fluctuations in solid waste composition and changes in environmental temperature can lead to dynamic changes in operating conditions, the control system must not only track the loop setpoints well but also optimize the setpoints of the key manipulated variables in real time as operating conditions change. To this end, this article proposes a two-layer intelligent control method that includes an optimization setting layer and a loop control layer, aiming to maintain the MSFR within the target range while ensuring the combustion process operates in an optimized state. Essentially, this is a multivariable nonlinear optimization problem in a dynamic environment [35].

The antlion optimizer has been applied to the multivariable nonlinear optimization of manipulated variables in voltage source converters [36]. Inspired by this, the antlion optimizer is applied in the setpoint optimization layer [37]. Similarly, the RBF technique has been used to solve the nonlinear modeling problem of the alkali borosilicate glass transition temperature [38]. Given the nonlinear fitting capability of RBF neural networks, which have been widely applied in modeling complex nonlinear industrial processes [39], the RBF technique is used to model the nonlinear relationship between the MSFR and its influencing factors. In the loop control layer, reinforcement learning is incorporated. Reinforcement learning, as a function approximation-based approach, has achieved significant success in solving complex tasks with high-dimensional state spaces and nonlinear control problems, such as robot control [40]. Accordingly, a PI controller with variable parameters is developed to enhance the system’s adaptability to complex operating conditions [41,42].

3.1. Intelligent Control Strategy

In consideration of the multivariable, nonlinear, and dynamic characteristics of the MSFR control problem within the MSWI system, this article presents a two-layer intelligent control method, as depicted in Figure 3. The method comprises a manipulated variables optimization setpoint layer and a loop control layer. There are three key modules: the MSFR prediction module based on the improved RBF, the manipulated variables optimization setpoint module based on the improved antlion algorithm, and the PI loop control system module based on reinforcement learning. The inputs of the MSFR prediction module are the key manipulated variables ($S_{1}$ to $S_{5}$) and the grate speed, which characterizes the operating conditions. This module uses RBF techniques for modeling and a gradient descent algorithm for self-learning of the model’s key parameters, ultimately yielding the predicted MSFR for the MSWI. The manipulated variables optimization setpoint module takes the target MSFR as an input and optimizes the key manipulated variables ($S_{1}$ to $S_{5}$) via the improved antlion algorithm. The PI loop control module takes the grate speed, which represents the operating conditions, and the optimized key manipulated variables ($S_{1}$ to $S_{5}$) as inputs. It applies reinforcement learning to adapt the PI parameters as operating conditions change, thus achieving more precise and efficient MSFR control.

3.2. Intelligent Control Algorithm

3.2.1. MSFR Prediction Module Based on the Improved RBF

(1): Model Structure

The MSFR prediction module utilizes a three-layer RBF neural network composed of an input layer, a hidden layer, and an output layer. The input layer comprises six nodes, which correspond to the key manipulated variables ($S_{1}$ to $S_{5}$) and the grate speed ($Ω$), the latter of which characterizes the operating conditions. The output layer is composed of a single neuron that represents the predicted value of the MSFR ($\hat{y}$). The number of nodes in the hidden layer is determined by the OPTICS clustering algorithm. The hidden layer nodes employ a Gaussian radial basis function,

φ_{g} (x) = \exp (- \frac{{‖x - c_{g}‖}^{2}}{2 σ_{g}^{2}}),

(6)

\hat{y} = \sum_{g = 1}^{m} φ_{g} (x) ω_{g},

(7)

where $x = (u, Ω) \in R^{6}$ represents the inputs, with $u = (S_{1}, \dots, S_{5})$ the manipulated variables; m is the number of hidden layer nodes; $c_{g}$ is the center of the g-th radial basis function; $σ_{g}$ is the width parameter of the g-th neuron; and $ω_{g}$ is the connection weight between the g-th hidden layer node and the output layer node. The values of m and $c_{g}$ are determined using the OPTICS clustering algorithm, while $σ_{g}$ and $ω_{g}$ are determined using the gradient descent algorithm. The network output is the predicted MSFR value $\hat{y}$.
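The forward pass of Equations (6) and (7) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation; the function names and the two-node example network with scaled inputs are assumptions made for the example.

```python
import math


def gaussian_rbf(x, center, width):
    """Equation (6): Gaussian activation of one hidden node."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def rbf_predict(x, centers, widths, weights):
    """Equation (7): weighted sum of the hidden-node activations."""
    return sum(w * gaussian_rbf(x, c, s)
               for c, s, w in zip(centers, widths, weights))


# Hypothetical scaled input (S1..S5, grate speed) and a two-node network.
x = [0.5, 0.4, 0.6, 0.5, 0.3, 0.7]
centers = [[0.5, 0.4, 0.6, 0.5, 0.3, 0.7], [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]]
widths = [0.5, 0.5]
weights = [2.0, 1.0]
print(rbf_predict(x, centers, widths, weights))
```

A node whose center coincides with the input contributes its full weight, and the contribution decays with distance at a rate set by the width.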

(2): Determination of RBF hidden layer node number m and parameter $c_{g}$ based on the OPTICS clustering algorithm

The OPTICS clustering algorithm was chosen over DBSCAN and K-means++ because it can better handle data with varying densities, noise, and uncertain cluster numbers, and it does not require the number of clusters to be set in advance, offering greater flexibility and accuracy.

The Euclidean distance is used as the distance metric between sample data points:

D_{ij} = \mathrm{dist} (x_{i}, x_{j}) = \sqrt{\sum_{s = 1}^{6} {(x_{is} - x_{js})}^{2}},

(8)

where $x_{i}, x_{j} \in R^{6}$ represent the input vectors of two samples in a six-dimensional space, and $x_{is}, x_{js}$ are their values in the s-th dimension.

The core distance is defined as follows:

\mathrm{dist}_{core} (x_{i}) = \min \{ε : |ψ (x_{i}, ε)| \geq w\},

(9)

ψ (x_{i}, ε) = \{x_{j} \mid \mathrm{dist} (x_{i}, x_{j}) \leq ε\},

(10)

where $ε$ represents the neighborhood radius; $ψ (x_{i}, ε)$ is the set of samples within distance $ε$ of $x_{i}$; and $w$ is the minimum number of samples that this neighborhood must contain. The reachable distance is defined as follows:

\mathrm{dist}_{reach} (x_{o}, x_{j}) = \max (\mathrm{dist}_{core} (x_{j}), \mathrm{dist} (x_{o}, x_{j})).

(11)

The specific clustering process is shown in Figure 4 and is described as follows:

Step 1: Input the sample data of manipulated variables and operating conditions, initialize the reachable distances, and create an ordered queue O and a result queue R.
Step 2: Use Equation (8) to calculate the pairwise distances between the samples in the dataset D, and compute all the core distances using Equation (9).
Step 3: Select an object from D as the core object P, place it into R, and remove it from D. Use Equation (11) to calculate the reachable distance of other data points to the core object P, sort them by reachable distance, and add them to O.
Step 4: Select the data point with the smallest reachable distance from O as the new core object P, place it into R, and remove it from both O and D. Calculate the reachable distance of the remaining data points to the new core object, and update O.
Step 5: Check if O and D are empty. If not, repeat Step 4.
Step 6: Take $ε$ and, based on R, determine the number of clusters m.
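
The distance quantities used in Steps 2 to 4 can be sketched as follows. This is a simplified illustration of Equations (8), (9), and (11) only, not the full ordering procedure; the function names are hypothetical, the neighborhood is assumed to include the sample itself, and the sketch ignores any upper limit on the radius.

```python
import math


def euclidean(a, b):
    """Equation (8): Euclidean distance between two samples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def core_distance(i, data, w):
    """Equation (9): smallest radius around data[i] that contains at least w samples."""
    dists = sorted(euclidean(data[i], x) for x in data)   # includes the sample itself
    return dists[w - 1] if w <= len(dists) else math.inf


def reachability(o, j, data, w):
    """Equation (11): max of the core distance of data[j] and dist(data[o], data[j])."""
    return max(core_distance(j, data, w), euclidean(data[o], data[j]))


# Three hypothetical six-dimensional samples that differ only in the first coordinate.
data = [[0.0] * 6, [1.0] + [0.0] * 5, [3.0] + [0.0] * 5]
print(core_distance(0, data, 2), reachability(2, 0, data, 2))
```

Samples with small reachable distances to the current core object are processed first, which is what produces the ordered result queue R.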

After the number of clusters m is determined, the centers of the RBF hidden layer neurons are computed using Equation (12).

c_{g} = \frac{1}{N_{g}} \sum_{j = 1}^{N_{g}} x_{j},

(12)

where $N_{g}$ represents the total number of data points in the g-th cluster, and $x_{j}$ denotes a sample belonging to this cluster.
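Equation (12) is a per-cluster mean, which the following sketch computes from a list of samples and their cluster labels. The function name is hypothetical, and treating negative labels as noise to be skipped is an assumption of this example, not something stated in the text.

```python
def cluster_centers(samples, labels):
    """Equation (12): mean of the samples assigned to each cluster."""
    sums, counts = {}, {}
    for x, g in zip(samples, labels):
        if g < 0:                      # assumed convention: negative label marks noise
            continue
        if g not in sums:
            sums[g] = [0.0] * len(x)
            counts[g] = 0
        sums[g] = [s + v for s, v in zip(sums[g], x)]
        counts[g] += 1
    return {g: [s / counts[g] for s in sums[g]] for g in sums}


# Two-dimensional toy samples for brevity; the model itself uses six dimensions.
print(cluster_centers([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]], [0, 0, 1]))
```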

(3): Correction of RBF neural network parameters $σ_{g}$ and $ω_{g}$ based on the gradient descent algorithm.

The gradient descent algorithm is used to update the width parameter $σ_{g}$ of the neurons and the output layer weights $ω_{g}$. The loss function is defined as follows:

E_{loss} = \frac{1}{2} \sum_{i = 1}^{N} {‖y_{i} - {\hat{y}}_{i}‖}^{2},

(13)

where $y_{i}$ and ${\hat{y}}_{i}$ represent the measured and predicted values of the MSFR for the i-th sample, respectively, and N is the number of training samples. With k denoting the training iteration, the update formulas for the neuron width parameter $σ_{g}$ and the output layer weights $ω_{g}$ are as follows:

σ_{g}^{(k + 1)} = σ_{g}^{(k)} - η \frac{\partial E_{loss}}{\partial σ_{g}},

(14)

ω_{g}^{(k + 1)} = ω_{g}^{(k)} - η \frac{\partial E_{loss}}{\partial ω_{g}},

(15)

- η \frac{\partial E_{loss}}{\partial ω_{g}} = η \sum_{i = 1}^{N} (y_{i} - {\hat{y}}_{i}) φ_{g} (x_{i}),

(16)

- η \frac{\partial E_{loss}}{\partial σ_{g}} = η \frac{ω_{g}}{σ_{g}^{3}} \sum_{i = 1}^{N} (y_{i} - {\hat{y}}_{i}) φ_{g} (x_{i}) {‖x_{i} - c_{g}‖}^{2},

(17)

where $η$ denotes the learning rate. The update formulas are obtained by taking the partial derivatives of the loss function with respect to the neuron width parameter $σ_{g}$ and the output layer weights $ω_{g}$. The parameters are thus corrected along the negative gradient direction of the loss function and optimized iteratively.
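One batch update of the widths and weights can be sketched as below. The gradients are the analytic partial derivatives of the loss in Equation (13) for the network of Equations (6) and (7); the function names, the one-dimensional toy data, and the learning rate are assumptions made for illustration.

```python
import math


def gaussian_rbf(x, center, width):
    """Return the Gaussian activation and the squared distance to the center."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2.0 * width ** 2)), sq_dist


def gradient_step(samples, targets, centers, widths, weights, eta):
    """One batch update; returns (new widths, new weights, loss before the update)."""
    m = len(centers)
    grad_w = [0.0] * m
    grad_s = [0.0] * m
    loss = 0.0
    for x, y in zip(samples, targets):
        acts = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
        y_hat = sum(phi * w for (phi, _), w in zip(acts, weights))
        err = y_hat - y                               # dE/dy_hat for this sample
        loss += 0.5 * err ** 2
        for g, (phi, sq_dist) in enumerate(acts):
            grad_w[g] += err * phi                    # dE/d(omega_g)
            grad_s[g] += err * weights[g] * phi * sq_dist / widths[g] ** 3   # dE/d(sigma_g)
    new_widths = [s - eta * g for s, g in zip(widths, grad_s)]
    new_weights = [w - eta * g for w, g in zip(weights, grad_w)]
    return new_widths, new_weights, loss


# Toy problem: two one-dimensional samples, two hidden nodes, zero initial weights.
X, t, c = [[0.0], [1.0]], [1.0, 0.5], [[0.0], [1.0]]
s, w, loss0 = gradient_step(X, t, c, [0.5, 0.5], [0.0, 0.0], eta=0.1)
s, w, loss1 = gradient_step(X, t, c, s, w, eta=0.1)
print(loss0, loss1)
```

The centers stay fixed in this sketch because, as described above, they are set by the clustering step and are not part of the gradient update.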

3.2.2. Optimization of Manipulated Variables Setpoint Based on the IALO

The optimization of the manipulated variables setpoint based on the IALO for the MSFR is designed to optimize the manipulated variables based on the input target value of the MSFR. The optimization is performed using the IALO algorithm, with the strategy diagram shown in Figure 5. The module randomly generates candidate solutions for the manipulated variables and their optimized counterparts. The MSFR prediction model is then used to predict the MSFR for each solution, from which the fitness is calculated. The antlion algorithm attracts ants toward the optimal manipulated variables solutions by constructing traps. The ants adjust their positions through random walks, while the elite antlions guide the search process. The solutions for both ants and antlions are merged and sorted, and continuously updated until the termination conditions are met. The optimized manipulated variables are ultimately output to ensure the MSFR reaches the target value. In the event of a change in the operating conditions, such as furnace grate speed, the manipulated variables will be re-optimized.

The objective function $f_{obj}$ and fitness function $f_{fit}$ for the optimization of the MSFR are given by the following equations:

f_{fit} = \frac{1}{1 + f_{obj}},

(18)

f_{obj} = \frac{\sum_{i = 1}^{100} \sqrt{{(y_{i} - {\hat{y}}_{i})}^{2}}}{100},

(19)

where $y_{i}$ and ${\hat{y}}_{i}$ represent, respectively, the measured and predicted values of the MSFR for the i-th sample. The value of ${\hat{y}}_{i}$ is determined via the MSFR prediction model.
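Equations (18) and (19) amount to a mean absolute error mapped into the interval (0, 1]. A minimal sketch follows; the function names are hypothetical, and the sample count is taken from the length of the input rather than fixed at 100 as in Equation (19).

```python
def objective(reference, predicted):
    """Equation (19): mean absolute deviation (sqrt of a square is the absolute value)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(reference, predicted)) / len(reference)


def fitness(f_obj):
    """Equation (18): a smaller objective gives a fitness closer to 1."""
    return 1.0 / (1.0 + f_obj)


f = objective([1.0, 2.0], [1.0, 4.0])
print(f, fitness(f))
```

A perfect match gives an objective of 0 and a fitness of 1, so maximizing the fitness is equivalent to minimizing the deviation.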

Table 1 shows the relationship between the improved antlion optimizer and the MSFR parameter optimization problem. The ants represent candidate solutions for the manipulated variables, the antlion represents the optimized solution for the manipulated variables, and the elite antlion represents the global optimized solution for the manipulated variables.

The antlion optimization algorithm draws inspiration from the predatory behavior of antlions in the natural realm and their interactions with ants. The algorithm involves a series of steps to find the optimal solution, including the random walk of ants, the construction of traps by the antlions, ants being trapped, the antlions capturing their prey, and the reconstruction of traps. The process is described as follows:

(1): The random walk of ants and the normalization process.

$H (l) = [0, cumsum (2 r (l_{1}) - 1), \dots, cumsum (2 r (l_{l}) - 1)].$

(20)

$r (l) = \begin{cases} 1 & \text{if } rand > 0.5 \\ 0 & \text{if } rand \leq 0.5 \end{cases}.$

(21)

In these equations, $cumsum$ denotes the cumulative sum, and rand represents a random number uniformly distributed within the interval [0, 1].

$H_{s}^{l} = E [\frac{(H_{s}^{l} - a_{s}) \times (d_{s}^{l} - c_{s}^{l})}{(b_{s} - a_{s})} + c_{s}^{l}],$

(22)

where $a_{s}$ and $b_{s}$ represent, respectively, the minimum and maximum values of the random walk for the s-th dimension of the manipulated variables; $c_{s}^{l}$ and $d_{s}^{l}$ denote the minimum and maximum values of the manipulated variables in the l-th iteration for the s-th dimension; and E is the error between the predicted and target values of the MSFR. The five-dimensional manipulated variables ($S_{1}$ to $S_{5}$) are primary airflow, secondary airflow, secondary air fan outlet pressure, boiler drum pressure, and economizer feedwater flow.
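
The random walk of Equations (20) and (21) and the normalization of Equation (22) can be sketched for a single dimension as follows. The function names are hypothetical, and the factor E is exposed as a parameter `e` that defaults to 1.0, which is an assumption of this sketch and reduces the expression to plain min-max scaling.

```python
import random


def random_walk(steps, rng):
    """Equations (20)-(21): cumulative sum of +1/-1 steps, starting at 0."""
    walk = [0.0]
    for _ in range(steps):
        r = 1 if rng.random() > 0.5 else 0      # Equation (21)
        walk.append(walk[-1] + (2 * r - 1))     # 2 r(l) - 1 is either +1 or -1
    return walk


def normalize_walk(walk, c, d, e=1.0):
    """Equation (22): map the walk from [a, b] into [c, d], scaled by e."""
    a, b = min(walk), max(walk)
    if a == b:
        raise ValueError("walk must contain at least one step")
    return [e * ((h - a) * (d - c) / (b - a) + c) for h in walk]


rng = random.Random(0)                          # fixed seed for repeatability
walk = random_walk(50, rng)
scaled = normalize_walk(walk, c=2.0, d=6.0)
print(min(scaled), max(scaled))
```

The normalization keeps every position of the walk inside the current bounds of the manipulated variable, whatever range the raw walk happened to cover.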
Two improvements to the antlion optimization algorithm are proposed to tackle the problems of premature convergence and getting trapped in local optima during the optimization of the MSFR manipulated variables. Firstly, an adaptive step-size factor is introduced during the random movement of the ants, which dynamically adjusts the search step size to enhance search efficiency and avoid premature convergence. Secondly, during the update of the ants’ positions, crossover and mutation operations from genetic algorithms are integrated to increase the diversity of the population. This prevents the algorithm from converging to local optima and enhances its global search capability.
Improvement 1: The position update strategy is given by the following equation:

$H_{s}^{l} = E [\frac{(H_{s}^{l} - a_{s}) \times (d_{s}^{l} - c_{s}^{l})}{(b_{s} - a_{s})} + c_{s}^{l}] \times ξ,$

(23)

$ξ = \frac{L - l}{L},$

(24)

where L denotes the maximum number of iterations, and l represents the current iteration number.

Improvement 2: The crossover and mutation operations applied during the position update are given by Equations (25) and (26):

$H^{l'} = \begin{cases} H_{1:s}^{l} \cup H_{s+1:5}^{l-1} & \text{if } rand < 0.8 \\ X^{l} & \text{if } rand \geq 0.8 \end{cases}.$

(25)

$H_{s}^{l''} = lb_{s} + (ub_{s} - lb_{s}) \cdot rand().$

(26)

In Equation (26), $ub_{s}$ and $lb_{s}$ represent the upper and lower bounds of the search space for the s-th dimension of the manipulated variables.
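The adaptive step-size factor, crossover, and mutation of Equations (24)–(26) can be sketched as follows. This is a hedged illustration only: the function names, the uniformly drawn crossover point, and the choice to keep the current position when no crossover occurs are assumptions that the text does not specify, and the bounds are the ranges of Section 4.1 with the $S_{3}$ range read as [0, 0.5].

```python
import random

# Search bounds for S1..S5 as reported in Section 4.1
# (assumption: the S3 range is read as [0, 0.5]).
LB = [55.0, 0.0, 0.0, 4.4, 30.5]
UB = [75.0, 11.4, 0.5, 4.6, 35.3]


def step_factor(l, L):
    """Eq. (24): xi decays linearly from 1 to 0 over the run."""
    return (L - l) / L


def crossover(current, previous, rng, p_cross=0.8):
    """Eq. (25) as read here: splice current[:s] with previous[s:]."""
    if rng.random() < p_cross:
        s = rng.randint(1, len(current) - 1)  # hypothetical: uniform cut point
        return current[:s] + previous[s:]
    return list(current)  # assumption: otherwise the position is kept


def mutate(lb, ub, rng):
    """Eq. (26): resample each dimension uniformly within its bounds."""
    return [lo + (hi - lo) * rng.random() for lo, hi in zip(lb, ub)]


rng = random.Random(1)  # hypothetical seed
child = crossover([60.0, 1.0, 0.2, 4.5, 32.0], [70.0, 5.0, 0.4, 4.6, 34.0], rng)
mutant = mutate(LB, UB, rng)
```

Because the step factor shrinks toward zero as l approaches L, early iterations explore widely while late iterations refine the current solution.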
(2): The antlion constructs traps that affect the random movement of ants within the search domain. This effect can be represented as follows:

$\begin{array}{l} c_{s}^{l} = Antlion_{s}^{l} + c^{l} \\ d_{s}^{l} = Antlion_{s}^{l} + d^{l} \end{array},$

(27)

where $c_{s}^{l}$ and $d_{s}^{l}$ represent, respectively, the minimum and maximum values of the manipulated variables in the s-th dimension during the l-th iteration; and $Antlion_{s}^{l}$ represents the position of the antlion in the s-th dimension during the l-th iteration.
(3): To simulate the process of ants approaching the antlions, the boundaries of the random movement should be gradually narrowed.

$\begin{array}{l} c^{l} = \frac{c^{l}}{I} \\ d^{l} = \frac{d^{l}}{I} \end{array} .$

(28)

In Equation (28), $I = 1 + 10^{v} \frac{l}{L}$, where l represents the current iteration index, L represents the maximum number of iterations, and $v$ is set according to the following condition:

$v = \begin{cases} 2 & l > 0.1L \\ 3 & l > 0.5L \\ 4 & l > 0.75L \\ 5 & l > 0.9L \\ 6 & l > 0.95L \end{cases}.$

(29)
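The boundary shrinking of Equations (27)–(29) can be illustrated with the short sketch below. It assumes, since Equation (29) does not cover that case, that no shrinking is applied while l ≤ 0.1L; the function names and the numerical offsets in the usage lines are hypothetical.

```python
def shrink_ratio(l, L):
    """Eqs. (28)-(29): ratio I that narrows the walk bounds as l grows."""
    # Thresholds are checked from the latest stage backwards so that the
    # largest applicable exponent v is chosen.
    for threshold, v in ((0.95, 6), (0.9, 5), (0.75, 4), (0.5, 3), (0.1, 2)):
        if l > threshold * L:
            return 1.0 + (10 ** v) * l / L
    return 1.0  # assumption: no shrinking during the first 10% of iterations


def trap_bounds(antlion_s, c, d, l, L):
    """Eqs. (27)-(28): shrink the bounds, then centre them on the antlion."""
    ratio = shrink_ratio(l, L)
    return antlion_s + c / ratio, antlion_s + d / ratio


# Hypothetical usage: offsets of -10 and +10 around an antlion at 60.0.
lower, upper = trap_bounds(60.0, -10.0, 10.0, l=60, L=300)
```

At l = 60 and L = 300 the ratio equals 21, so the admissible interval around the antlion is already about twenty times narrower than the initial one.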
(4): During the process of the antlion capturing prey, if the value of the objective function of an ant is superior to that of the selected antlion, the position of the antlion will be replaced by the most recent position of the captured ant, thereby enhancing the probability of capturing ants. The equation is presented as follows:

$Antlion_{j}^{l} = Ant_{j}^{l} \quad \text{if } f(Ant_{j}^{l}) < f(Antlion_{j}^{l}).$

(30)

During the iteration process, the antlion with the highest fitness value in each generation is selected as the elite antlion. Additionally, another antlion is selected via the roulette wheel selection method. The random movements of all other ants are attracted to these two antlions. The equation is presented as follows:

${Ant}_{i}^{l} = \frac{H_{A}^{l} + H_{E}^{l}}{2},$

(31)

where $H_{A}^{l}$ denotes the position of the ant after a random walk around the antlion selected via the roulette wheel method in the l-th iteration, and $H_{E}^{l}$ denotes its position after a random walk around the elite antlion in the same iteration.
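A compact sketch of the capture rule in Equation (30), the roulette wheel selection, and the averaging in Equation (31) is given below. The roulette weighting 1 / (1 + f) for a minimization problem with non-negative objective values is an assumption, as the text does not state how the selection probabilities are formed; all function names are hypothetical.

```python
import random


def roulette_select(fitness, rng):
    """Pick an antlion index; lower objective values get higher probability."""
    weights = [1.0 / (1.0 + f) for f in fitness]  # assumed weighting
    return rng.choices(range(len(fitness)), weights=weights, k=1)[0]


def update_antlion(antlion, antlion_fit, ant, ant_fit):
    """Eq. (30): the antlion moves to the ant if the ant has a lower f."""
    if ant_fit < antlion_fit:
        return list(ant), ant_fit
    return antlion, antlion_fit


def combine_walks(walk_roulette, walk_elite):
    """Eq. (31): the ant position is the mean of the two random walks."""
    return [(a + e) / 2.0 for a, e in zip(walk_roulette, walk_elite)]


rng = random.Random(0)  # hypothetical seed
selected = roulette_select([0.1, 2.0, 5.0], rng)
new_position = combine_walks([1.0, 3.0], [3.0, 5.0])
```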

3.2.3. PI Control System for MSFR Based on Reinforcement Learning

Traditional PI control systems perform suboptimally when confronted with strongly nonlinear and time-varying systems. In such cases, operators need sufficient experience to adjust the parameters manually. However, manual adjustment is time-consuming and often fails to achieve the desired results, especially in complex environments. To address this issue, this study adopts a reinforcement learning framework based on the actor–critic architecture and proposes a PI tuning method, as illustrated in Figure 6. The control system consists of three components: the actor (policy neural network), the critic (value neural network), and the Stochastic Action Modifier. First, the actor generates reference PI parameters based on the setpoints of the manipulated variables and the grate speed provided by the optimization layer. Next, the critic computes the value function estimate from the same inputs. Finally, the Stochastic Action Modifier combines the reference PI parameters generated by the actor with the value function estimate produced by the critic to obtain the actual PI parameters used for control. When the grate speed changes, the optimization layer adjusts the corresponding setpoints of the manipulated variables, and the reinforcement learning system updates the PI parameters according to the variations in the grate speed and the manipulated variable setpoints.

An incremental PI controller is designed as defined in Equation (32),

$\begin{array}{ll} u(k) & = u(k-1) + Δ u(k) \\ & = u(k-1) + K_{I}(k) e(k) + K_{P}(k) Δ e(k) \end{array},$

(32)

where $Δ e(k) = e(k) - e(k-1)$ and $e(k) = y_{set}(k) - y_{r}(k)$, in which $y_{set}(k)$ is the setpoint of the manipulated variables and $y_{r}(k)$ is the actual value of the manipulated variables.
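The incremental PI law of Equation (32) can be written as a small stateful class. This is a minimal sketch: the class name `IncrementalPI`, the initial output `u0`, and the zero initial error are assumptions, and the gains are passed in at every step because they are retuned online in the proposed scheme.

```python
class IncrementalPI:
    """Eq. (32): u(k) = u(k-1) + K_I e(k) + K_P (e(k) - e(k-1))."""

    def __init__(self, u0=0.0):
        self.u_prev = u0   # u(k-1)
        self.e_prev = 0.0  # e(k-1), assumed zero before the first sample

    def step(self, setpoint, measurement, kp, ki):
        e = setpoint - measurement
        delta_u = ki * e + kp * (e - self.e_prev)
        self.u_prev += delta_u
        self.e_prev = e
        return self.u_prev


controller = IncrementalPI()
u1 = controller.step(1.0, 0.0, kp=2.0, ki=0.5)  # e = 1.0,  delta_u = 2.5
u2 = controller.step(1.0, 0.5, kp=2.0, ki=0.5)  # e = 0.5,  delta_u = -0.75
```

Only the increment is computed at each step, so a change in the gains affects future increments without causing a jump in the accumulated output.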

The RBF network is used to approximate both the policy function of the Actor and the value function of the Critic [43]. The policy and value neural networks share the input and hidden layers of the RBF network, which not only reduces memory requirements but also enhances learning efficiency. The AC learning structure based on RBF is illustrated in Figure 7.

(1): Determination of PI values based on the RBF neural network

The three inputs of the RBF are $e(k)$, $Δ e(k)$, and $Ω$, as shown in Equation (33).

$X(k) = [e(k), Δ e(k), Ω]^{T}.$

(33)

The radial basis function of the hidden layer adopts the Gaussian kernel function as follows:

$ϕ_{h}(k) = \exp\left(-\frac{‖X(k) - c_{h}(k)‖^{2}}{2 σ_{h}^{2}(k)}\right).$

(34)

The output layer has 3 nodes, and the outputs are $K_{P}^{'}(k)$, $K_{I}^{'}(k)$, and the value function of the value neural network. The mathematical expression is as follows:

$K_{P,I}^{'}(k) = \sum_{h=1}^{m} ω_{h}^{P,I}(k) ϕ_{h}(k),$

(35)

$V(k) = \sum_{h=1}^{m} v_{h}(k) ϕ_{h}(k),$

(36)

where $v_{h}(k)$ represents the weight between the h-th hidden unit and the critic output layer, and $ω_{h}^{P,I}(k)$ represents the weight between the h-th hidden unit and the output layer of the actor.

To resolve the exploration–exploitation dilemma, the outputs from the Actor are not directly passed to the PI controller. Instead, a Gaussian noise term $n_{k}(0, ς_{V}(k))$ is added to the recommended PI parameters $K_{P,I}^{'}(k)$ coming from the Actor. Thus, the actual PI parameters are modified as shown in Equation (37). The magnitude of the Gaussian noise depends on $V(k)$: when $V(k)$ is large, the noise term is small, and vice versa.

$K_{P,I}(k) = K_{P,I}^{'}(k) + n_{k}(0, ς_{V}(k)),$

(37)

where $ς_{V}(k) = \frac{1}{1 + \exp(2 V(k))}$.
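The forward pass of Equations (34)–(36) and the stochastic action modifier of Equation (37) can be sketched as follows. The sketch treats the second argument of the Gaussian noise term as a standard deviation, which is an assumption because the text does not say whether it is a variance or a standard deviation; the function names and the numerical values in the usage lines are hypothetical.

```python
import math
import random


def rbf_forward(x, centers, widths, w_p, w_i, v):
    """Eqs. (34)-(36): shared Gaussian hidden layer, actor and critic outputs."""
    phi = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * s ** 2))
        for c, s in zip(centers, widths)
    ]
    kp_ref = sum(w * p for w, p in zip(w_p, phi))  # K'_P(k)
    ki_ref = sum(w * p for w, p in zip(w_i, phi))  # K'_I(k)
    value = sum(w * p for w, p in zip(v, phi))     # V(k)
    return phi, kp_ref, ki_ref, value


def noise_scale(value):
    """Spread used in Eq. (37): 1 / (1 + exp(2 V)), small when V is large."""
    return 1.0 / (1.0 + math.exp(2.0 * value))


def explore(kp_ref, ki_ref, value, rng):
    """Eq. (37): add zero-mean Gaussian noise to the actor's recommendation."""
    s = noise_scale(value)  # assumed to act as a standard deviation
    return kp_ref + rng.gauss(0.0, s), ki_ref + rng.gauss(0.0, s)


# Hypothetical two-node network evaluated at X = [e, delta_e, grate speed].
phi, kp_ref, ki_ref, value = rbf_forward(
    [0.0, 0.0, 1.0],
    centers=[[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]],
    widths=[1.0, 1.0],
    w_p=[2.0, 0.0], w_i=[0.0, 1.0], v=[1.0, 1.0],
)
kp, ki = explore(kp_ref, ki_ref, value, random.Random(0))
```

A high value estimate signals that the current gains already perform well, so the noise shrinks and the controller exploits them; a low value estimate widens the noise and encourages exploration.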

(2): Parameter update method for $ω_{h}^{P, I}$ , $v_{h}$ , $c_{h}$ , and $σ_{h}$ based on the gradient descent algorithm

The error is calculated using the temporal difference algorithm as follows:

$δ_{TD}(k) = R(k) + γ V(k) - V(k-1),$

(38)

where $V(k)$ represents the value function, $R(k)$ is the reward at the k-th time step, and $γ$ denotes the discount factor. The reward $R(k)$ is defined as follows:

$R_{s_{i}}(k) = 0.5 [y^{*}(k) - y(k)]^{2} - 0.1 (|Δ K_{P}^{s_{i}}| + |Δ K_{I}^{s_{i}}|) - 0.6 \cdot |e_{s_{i}}|,$

(39)

where $Δ K_{P}^{s_{i}}$ and $Δ K_{I}^{s_{i}}$ are the changes in the proportional and integral gains of each manipulated variable at the current moment k; $e_{s_{i}}$ is the error value of each manipulated variable; and $y^{*}(k)$ and $y(k)$ are, respectively, the target value and measured value of the MSFR. The weights are updated using Equations (40) and (41),

$ω_{h}^{P,I}(k+1) = ω_{h}^{P,I}(k) + η^{'} δ_{TD}(k) \frac{K_{P,I}(k) - K_{P,I}^{'}(k)}{ς_{V}(k)} ϕ_{h}(k),$

(40)

$v_{h}(k+1) = v_{h}(k) - η^{''} δ_{TD}(k) ϕ_{h}(k),$

(41)

where $η^{'}$ and $η^{''}$ are the learning rates of the policy neural network and the value neural network, respectively. The center $c_{h}$ and width $σ_{h}$ of the neurons are updated using Equations (42) and (43),

$c_{h}(k+1) = c_{h}(k) + η_{c} δ_{TD} v_{h}(k) ϕ_{h}(k) \frac{X(k) - c_{h}(k)}{σ_{h}^{2}(k)},$

(42)

$σ_{h}(k+1) = σ_{h}(k) + η_{σ} δ_{TD} v_{h}(k) ϕ_{h}(k) \frac{‖X(k) - c_{h}(k)‖^{2}}{σ_{h}^{3}(k)}.$

(43)
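The temporal difference error of Equation (38) and the weight updates of Equations (40) and (41) can be sketched as below. The signs follow the equations exactly as printed; the function names and numerical values are hypothetical, and the center and width updates of Equations (42) and (43) are left out for brevity.

```python
def td_error(reward, value_now, value_prev, gamma):
    """Eq. (38): delta_TD(k) = R(k) + gamma * V(k) - V(k-1)."""
    return reward + gamma * value_now - value_prev


def update_actor_weights(w, phi, delta, k_actual, k_ref, noise_scale, lr):
    """Eq. (40): shift actor weights along the explored gain deviation."""
    factor = lr * delta * (k_actual - k_ref) / noise_scale
    return [wh + factor * ph for wh, ph in zip(w, phi)]


def update_critic_weights(v, phi, delta, lr):
    """Eq. (41), with the sign as printed in the article."""
    return [vh - lr * delta * ph for vh, ph in zip(v, phi)]


# Hypothetical numbers for a two-node hidden layer.
delta = td_error(reward=1.0, value_now=2.0, value_prev=1.5, gamma=0.5)
w_new = update_actor_weights([0.0, 0.0], [1.0, 0.5], delta, 1.2, 1.0, 0.5, 0.1)
v_new = update_critic_weights([1.0, 1.0], [1.0, 0.5], delta, 0.1)
```

The actor update is proportional to the difference between the applied and the recommended gain, so an exploratory deviation that produced a positive TD error pulls the recommendation toward it.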

3.3. Realization Steps of the Intelligent Control Method for MSFR

As shown in Figure 8, the steps for implementing the two-layer intelligent control method for the MSFR are as follows:

Step 1: Parameter Initialization: The ant population, antlion population, maximum number of iterations, and search range are initialized, and an initial solution is randomly generated within the search range.

Step 2: Collect the measured data, including the grate speed, the actual control values, parameters, and the measured MSFR.

Step 3: Determine whether the operating conditions have changed. If the grate speed changes, re-execute the optimization setting layer immediately. Otherwise, execute the optimization setting layer periodically.

Step 4: Output the optimized setting values of the manipulated variables and send them to the loop control layer to execute.

Step 5: Compute the error between the target and the actual manipulated variables.

Step 6: The loop control layer tunes the PI values of the loop control system according to Equations (35)–(37) and implements the optimized setting values of the manipulated variables based on Equation (32).

Step 7: Drive the actuator to implement the control quantities.

Step 8: Store the current data in the historical database, and adaptively update the learning parameters of the MSFR prediction model.

Step 9: Update the parameters of the intelligent control method.

Step 10: Check whether the historical data have been updated periodically. If so, update the structure of the steam flow prediction model; otherwise, return to Step 2.

Figure 8. The steps for implementing the two-layer intelligent control method.
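The triggering logic of Step 3 can be expressed as a small predicate. This is only a sketch: the function name, the tolerance `tol` used to decide that the grate speed has changed, and the `period` expressed in control steps are hypothetical, since the article does not give these values.

```python
def should_reoptimize(grate_speed, last_grate_speed, steps_since_last, period, tol=1e-6):
    """Step 3: True if the grate speed changed or the periodic timer elapsed."""
    changed = abs(grate_speed - last_grate_speed) > tol
    return changed or steps_since_last >= period


# A grate-speed change triggers the setting layer at once; otherwise the
# layer waits until `period` control steps have passed.
trigger_on_change = should_reoptimize(30.0, 25.0, steps_since_last=0, period=100)
trigger_on_timer = should_reoptimize(25.0, 25.0, steps_since_last=100, period=100)
no_trigger = should_reoptimize(25.0, 25.0, steps_since_last=99, period=100)
```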

4. Experiments and Analysis

In this section, we designed three experiments to validate the efficacy of the intelligent control approach for the MSFR presented in this article. All the experiments were conducted using on-site data collected from the DCS system of a forward-moving grate furnace in a waste incineration plant.

4.1. Verification of the MSFR Prediction Model Based on RBF

In this study, a total of 5000 sets of experimental data were used. The ranges of the manipulated variables in the samples (primary airflow, secondary airflow, secondary fan outlet air pressure, boiler drum pressure, and economizer feedwater flow) are $S_{1} \in [55, 75]$, $S_{2} \in [0, 11.4]$, $S_{3} \in [0, 0.5]$, $S_{4} \in [4.4, 4.6]$, and $S_{5} \in [30.5, 35.3]$, and the grate speed range is $Ω \in [25, 60]$. We took 3500 sets (70%) of the data as the training set for the MSFR prediction model, 750 sets (15%) as the validation set, and the remaining 750 sets (15%) as the test set.
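The 70/15/15 partition can be reproduced with a few lines of Python. The sketch assumes a random shuffle with a fixed seed; the article does not state whether the samples were shuffled or split chronologically, and the function name is hypothetical.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Return disjoint index lists for the training, validation and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random partition
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


train_idx, val_idx, test_idx = split_indices(5000)
```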

The OPTICS clustering algorithm uses six input variables of the MSWI system (primary airflow, secondary airflow, economizer feedwater flow, boiler drum pressure, secondary fan outlet air pressure, and grate speed under operating conditions) for clustering. The optimal combination of minimum sample size and neighborhood radius is determined through orthogonal experiments, with each parameter set at 5 levels. The minimum sample size is set at five levels: 4, 8, 12, 16, and 20. The initial neighborhood radius is set to infinity, and based on the reachable distance results obtained from the experiments, the neighborhood radius is divided into five levels: 0.11, 0.19, 0.27, 0.35, and 0.43. Table 2 shows the number of clusters for each parameter combination.

The neuron centers for each group of experiments are calculated using Equation (12). The model is trained using the training set data, with RMSE and MAE as performance evaluation metrics to determine the optimal parameter combination. Figure 9 shows the RMSE and MAE for each experimental combination. As can be seen from the figure, the model performs best when $MinPts = 4$ and $ε = 0.27$. Under this condition, Figure 10 shows the reachable distance sorting chart, while the corresponding neuron centers are listed in Table 3.
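For reference, the two evaluation metrics used to rank the parameter combinations can be computed as follows. The definitions are the standard ones for RMSE and MAE; the sample values in the usage lines are made up for illustration and are not taken from the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between measured and predicted MSFR."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error between measured and predicted MSFR."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Illustrative values only.
measured = [78.0, 79.0, 80.0]
forecast = [78.0, 79.0, 82.0]
model_rmse = rmse(measured, forecast)
model_mae = mae(measured, forecast)
```

RMSE penalizes large individual deviations more strongly than MAE, which is why the two metrics are usually reported together.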

To verify the effectiveness of the forecasting model, it is compared with the widely used BP neural network, RBF neural network, maximum matrix element (MME) RBF neural network, and K-means clustering-based RBF neural network models. The root mean square error is selected as the evaluation metric for the models. The training RMSE of each model is shown in Figure 11 and Table 4. As can be seen, the proposed OPTICS-RBF neural network outperforms the other four neural networks in terms of faster convergence and higher accuracy.

Experiments were conducted on the test set. Figure 12 and Table 5 present the errors between the predicted and actual values of the MSFR for five forecasting models on the test set. Figure 13 further visualizes the error distribution using a box plot. The results show that the OPTICS-RBF neural network has the smallest error range, which falls between [−0.420, 0.429].

To further validate the effectiveness of the proposed MSFR forecasting model, Figure 14 shows the autocorrelation function of the normalized deviation between the predicted MSFR and the actual MSFR. From the figure, it can be observed that the autocorrelation coefficients of the deviation sequence mostly fall within the 95% confidence interval, indicating that the residual sequence can be considered as white noise.
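The whiteness check described above can be sketched in a few lines. The sketch uses the sample autocorrelation function and the common large-sample bound of ±1.96/√N for the 95% confidence interval; this bound is an assumption, since the article does not state how its interval was computed, and the function names are hypothetical.

```python
import math


def autocorrelation(residuals, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    var = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / var
        for lag in range(max_lag + 1)
    ]


def white_noise_fraction(residuals, max_lag):
    """Fraction of lags 1..max_lag whose autocorrelation lies inside the bound."""
    bound = 1.96 / math.sqrt(len(residuals))  # assumed 95% bound
    acf = autocorrelation(residuals, max_lag)
    inside = sum(1 for r in acf[1:] if abs(r) <= bound)
    return inside / max_lag


# A strictly alternating sequence is strongly correlated, so no lag passes.
alternating = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
acf = autocorrelation(alternating, 10)
fraction = white_noise_fraction(alternating, 10)
```

For a residual sequence that behaves like white noise, about 95% of the lags are expected to fall inside the bound.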

4.2. Validation of the Intelligent Control Method

To verify the effectiveness of the IALO, it is compared with the standard ALO, the widely used Particle Swarm Optimization (PSO) algorithm, and the Artificial Bee Colony (ABC) optimization algorithm. First, comparative tests are performed on the standard test functions shown in Table 6, where F5 is a unimodal benchmark function and F12 is a multimodal benchmark function. A total of 50 candidate solutions are set, and the number of iterations is 1000. For the unimodal benchmark function F5, the fitness change curve in Figure 15 shows that ALO achieves a fitness value of 5.2827 at 957 iterations, PSO reaches a fitness value of 90.5459 at 988 iterations, ABC achieves a fitness value of 10.4729 at 922 iterations, and IALO achieves a fitness value of 1.9508 at 410 iterations. For the multimodal benchmark function F12, the fitness change curve in Figure 16 shows that ALO achieves a fitness value of 2.19 × 10⁻⁷ at 912 iterations, PSO reaches a fitness value of 0.0024 at 449 iterations, ABC achieves a fitness value of 2.12 × 10⁻⁵ at 872 iterations, and IALO achieves a fitness value of 4.69 × 10⁻¹¹ at 974 iterations. From the test results in Table 7, it can be concluded that the IALO proposed in this article exhibits better convergence and higher accuracy than ALO, PSO, and ABC.

When validating the optimization setting of the manipulated variables based on IALO, the maximum number of iterations is set to 300, and the ant and antlion populations are both set to 40. The target MSFR value is 79. The search ranges of the primary airflow, secondary airflow, secondary fan outlet air pressure, boiler drum pressure, and economizer feedwater flow are $[55, 75]$, $[0, 11.4]$, $[0, 0.5]$, $[4.4, 4.6]$, and $[30.5, 35.3]$, respectively. The optimization process is shown in Figure 17. After 243 iterations, the target values are found, where the corresponding parameters are primary airflow of 59.8, secondary airflow of 0.622, secondary fan outlet air pressure of 0.158, boiler steam drum pressure of 4.564, and economizer feedwater flow of 32.558.

4.3. Validation of the Effectiveness of the Intelligent Control Method Under Varying Operating Conditions

The purpose of this experiment is to verify whether the intelligent control method can effectively control the MSFR when operating conditions change.

First, the optimal hyperparameter combination for each loop of the reinforcement learning control system is determined through orthogonal experiments. Based on experience, the learning rate is set at three levels: 0.001, 0.01, and 0.02; the discount factor is set at three levels: 0.90, 0.95, and 0.99. The number of hidden layer center nodes and their positions in the RBF network are determined using the OPTICS algorithm. This algorithm performs clustering analysis on the input data, effectively identifying the appropriate center points and the number of nodes. Based on the optimization results from the OPTICS algorithm, the network’s topology is determined to be 3-8-4. The experimental design is shown in Table 8, using the standard L₍₂₇₎(3¹³) orthogonal table for combination design. A total of 100 groups are randomly selected from the dataset for the experiment, and the ISE and IAE are used as evaluation metrics. Figure 18 shows the experimental results of various hyperparameter combinations for the primary airflow PI control system. It can be observed that Experiment Group 11 performs the best. Table 9 presents the optimal hyperparameter combinations for all sub-loop PI control systems obtained through the orthogonal experiment.

Figure 19 shows that when the grate speed changes, the optimization setting module of manipulated variables based on IALO can recognize the change in operating conditions and automatically updates the setting values of the manipulated variables ($S_{1} - S_{5}$) to better fit the changed conditions. The updated setting values are then sent to the reinforcement learning-based PI MSFR control system to execute the new setting values.

The loop control system automatically tunes the PI parameters according to the changed operating conditions, enabling rapid tracking of the updated manipulated variables settings. Figure 20 shows the process of adaptive adjustment of PI parameters in the AC-PI control system, and Table 10 lists the PI setpoints of the PI control system. The actual manipulated variables of the control system are then sent to the object model of the MSFR in the MSWI process [44], and the controlled MSFR results are shown in Figure 21. It can be seen that, compared to traditional PI methods, the intelligent control method proposed in this article can control the MSFR within the target range.

Table 11 shows the comparison with traditional PID methods, such as QDRNN-PID from the literature [44]. The comparison is evaluated based on the integral square error (ISE), integral absolute error (IAE), maximum deviation, average quantization overshoot, average rise time, and average response time. It can be seen that the proposed method demonstrates good transient response, stability, and anti-interference ability, showing superior performance.
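The two integral criteria can be approximated from a sampled error sequence as shown below. The rectangular approximation and the sampling interval `dt` are assumptions, as the article does not specify how the integrals were discretized; the error values are illustrative only.

```python
def ise(errors, dt=1.0):
    """Integral square error, rectangular approximation with step dt."""
    return sum(e * e for e in errors) * dt


def iae(errors, dt=1.0):
    """Integral absolute error, rectangular approximation with step dt."""
    return sum(abs(e) for e in errors) * dt


def max_deviation(errors):
    """Largest absolute tracking error in the sequence."""
    return max(abs(e) for e in errors)


# Illustrative tracking errors (target MSFR minus measured MSFR).
tracking_errors = [1.0, -2.0, 0.5]
ise_value = ise(tracking_errors)
iae_value = iae(tracking_errors)
peak = max_deviation(tracking_errors)
```

ISE weights large excursions more heavily, whereas IAE accumulates all deviations linearly, so a controller with few large errors and one with many small errors can rank differently under the two criteria.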

4.4. Comprehensive Analysis

The experimental results show that when the grate speed changes, indicating a shift in the operating conditions of the solid waste incineration process, the proposed method effectively detects these changes and re-optimizes the key control variables (such as primary airflow, secondary airflow, secondary fan outlet air pressure, boiler drum pressure, and economizer feedwater flow). By employing the IALO method, the system can rapidly optimize the setpoints of the control variables based on the target MSFR, ensuring alignment with the current operating conditions. The loop control layer then implements the new optimized setpoints. In the loop control layer, the PI controller parameters can be dynamically adjusted in response to changing operating conditions. Experimental results demonstrate that when the working operating conditions change, the optimized setpoints are recalibrated accordingly, and the control loop can quickly track the new setpoints, thereby maintaining the MSFR within the target range.

5. Conclusions

The MSFR in the MSWI process is one of the key factors for the stable operation of the incineration system. Due to its strong coupling, nonlinearity, and multivariable nature, effectively controlling the MSFR presents significant challenges. To address the control needs of the MSWI process’s MSFR, a two-layer intelligent control method is proposed, consisting of an optimization setting layer and a loop control layer. This method can adaptively adjust the setpoints of key manipulated variables under varying operating conditions. The loop control system then implements these setpoints, ultimately ensuring that the MSFR is controlled within the target range. Simulations based on industrial sample data collected from the field were conducted to verify the effectiveness of the proposed method.

Although this study has made progress in the control strategy for MSFR, several issues require further attention. For example, the two-layer control structure used in this paper helps alleviate the computational burden by distributing algorithms across different devices; however, it inevitably increases the overall computational cost. The system is also highly sensitive to sensor noise, which can interfere with data collection and affect control decisions. This issue is particularly significant, and may become more pronounced, in developing countries, where computational resources are limited.

Additionally, there is a lack of systematic research on the trade-offs between energy recovery and toxic emissions (e.g., dioxins). This gap arises mainly from the asymmetry of monitoring data—high-frequency energy recovery data and low-frequency dioxin sampling data are difficult to correlate effectively.

Future research will involve empirical experiments in collaboration with industry partners. The goal is to develop a multi-objective optimization model that provides innovative solutions for achieving energy-environmental synergy. Moreover, another important aspect is balancing MSFR stability with emission control. Future studies could focus on developing a multi-objective function that integrates both MSFR stability and emission control. This approach would optimize both the control accuracy of the MSFR and emission control, offering a more comprehensive and effective solution.

Author Contributions

Conceptualization, J.L.; Methodology, J.P.; Validation, J.L.; Formal analysis, J.H.; Investigation, J.L.; Resources, J.H.; Data curation, J.L.; Writing—original draft, J.P. and J.L.; Writing—review & editing, J.P. and J.T.; Supervision, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Variable	Variable Name
$S_{1}$	optimal setpoint of the primary airflow
$S_{2}$	optimal setpoint of the secondary airflow
$S_{3}$	optimal setpoint of the secondary fan outlet air pressure
$S_{4}$	optimal setpoint of the boiler drum pressure
$S_{5}$	optimal setpoint of the economizer feedwater flow
$Ω$	grate speed
$y$	measured actual value of main steam flow rate
$\hat{y}$	prediction value of main steam flow rate
$y^{*}$	target value of main steam flow
$U_{sp}$	setpoints of the manipulated variables
$u$	actual value of the manipulated variables
$Q_{in}$	input heat
$Q_{loss}$	heat loss
$h_{sh}$	enthalpies of superheated steam
$h_{ws}$	enthalpies of superheated feedwater
$S_{rated}$	rated main steam flow
$Q_{air, I}$	heat contributions from primary air
$Q_{air, II}$	heat contributions from secondary air
$Q_{ncv}$	basic low-grade heat received
$x$	RBF neural network and OPTICS input
$m$	number of hidden layer nodes
$η$	learning rate of the RBF neural network in the forecasting model
$c_{g}$	the center of the g-th radial basis function
$ω_{g}$	connection weights between the g-th hidden layer node and the output layer node
$σ_{g}$	width parameter of the g-th neuron
$ε$	neighborhood range radius of the MSFR
$w$	number of MSFR samples that are within the range $ε$
$dist_{core}$	core distance of the two data points
$dist_{reach}$	the reachable distance between the two data points
$O$	ordered queue
$R$	result queue
$P$	core object
$D$	Euclidean distance set of pairwise data points
$E_{loss}$	loss function
$f_{obj}$	objective function value
$f_{fit}$	fitness function value
$a_{s}$	minimum values of the random walk for the s-th dimension of the manipulated variables
$b_{s}$	maximum values of the random walk for the s-th dimension of the manipulated variables
$c_{s}^{l}$	minimum values of the manipulated variables in the l-th iteration for the s-th dimension
$d_{s}^{l}$	maximum values of the manipulated variables in the l-th iteration for the s-th dimension
$L$	maximum number of iterations
$l$	current iteration number
$ξ$	adaptive step size factor
$H_{s}^{l}$	position of the ant in the l-th iteration
$H_{s}^{l^{'}}$	position of the ant after crossing in the l-th iteration
$H_{s}^{l^{″}}$	position of the ant after mutation in the l-th iteration
$H_{1 : s}^{l}$	position of the ant from dimensions 1 to s in the l-th iteration
$H_{s : 5}^{l - 1}$	position of the ant from dimensions s to 5 in the (l-1)-th iteration
$ub_{s}$	upper bounds of the search space for the s-th dimension of the manipulated variables
$lb_{s}$	lower bounds of the search space for the s-th dimension of the manipulated variables
$H_{A}^{l}$	the ant moves around the position of the antlion in the l-th iteration
$H_{E}^{l}$	the ant moves around the position of the elite antlion in the l-th iteration
$Ant_{i}^{l}$	the position of the i-th ant in the l-th iteration
$e (k)$	manipulated variable error
$Δ e (k)$	first-order difference in the manipulated variable error
$c_{h}$	the center of the h-th radial basis function
$v_{h}$	weight between the h-th hidden unit and the critic output layer
$ω_{h}^{P, I}$	weight between the h-th hidden unit and the output layer of the actor
$σ_{h}$	width parameter of the h-th neuron
$K_{P, I}^{'} (k)$	reference PI values
$Δ K_{P}^{s_{i}} (k)$	change in the proportional gain at time k
$Δ K_{I}^{s_{i}} (k)$	change in the integral gain at time k
$K_{P, I} (k)$	PI parameters
$V (k)$	value function
$n (k)$	Gaussian noise term
$η^{'}$	learning rate of the policy neural network
$η^{″}$	learning rate of the value neural network
$η_{c}$	learning rate for updating the neuron center in the gradient descent algorithm
$η_{σ}$	learning rate for updating the neuron width in the gradient descent algorithm
$γ$	discount factor
$δ_{TD}$	TD error
$R_{s_{i}} (k)$	manipulated variable reward at k-th time

References

Ali, K.; Kausar, N.; Amir, M. Impact of pollution prevention strategies on environment sustainability: Role of environmental management accounting and environmental proactivity. Environ. Sci. Pollut. Res. 2023, 30, 88891–88904.
Kumar, A.; Singh, E.; Mishra, R.; Lo, S.L.; Kumar, S. Global trends in municipal solid waste treatment technologies through the lens of sustainable energy development opportunity. Energy 2023, 275, 127471.
Tang, L.; Guo, J.; Wan, R.; Jia, M.; Qu, J.; Li, L.; Bo, X. Air pollutant emissions and reduction potentials from municipal solid waste incineration in China. Environ. Pollut. 2023, 319, 121021.
Adibimanesh, B.; Polesek-Karczewska, S.; Bagherzadeh, F.; Szczuko, P.; Shafighfard, T. Energy consumption optimization in wastewater treatment plants: Machine learning for monitoring incineration of sewage sludge. Sustain. Energy Technol. Assess. 2023, 56, 103040.
Zhao, Z.; Liu, S.; Wang, J.; Wei, Q.; Xu, H. Dynamic Modeling and Multi-objective Optimization of Waste Incinerator Combustion Process Based on ISSA-ELM-NARMAX. J. Chin. Soc. Power Eng. 2024, 44, 1264–1271.
Yamada, T.; Asari, M.; Miura, T.; Niijima, T.; Yano, J.; Sakai, S.I. Municipal solid waste composition and food loss reduction in Kyoto City. J. Mater. Cycles Waste Manag. 2017, 19, 1351–1360.
Lee, D.S.; Lee, S.T.; Chen, Y.T.; Su, P.Y. Artificial intelligence technique development for energy-efficient waste-to-energy: A case study of an incineration plant. Case Stud. Therm. Eng. 2024, 61, 105071.
Ozgen, S.; Wu, A.; Ruiz, F. Modeling approaches for data-driven model predictive control of acid gases in waste-to-energy plants. Waste Manag. 2025, 204, 114902.
Vilardi, G.; Verdone, N. Exergy analysis of municipal solid waste incineration processes: The use of O2-enriched air and the oxy-combustion process. Energy 2022, 239, 122147.
Rospawan, A.; Tsai, C.C.; Hung, C.C. Two-layer Intelligent Learning Control Using Output Recurrent Fuzzy Neural LSTM-BLS with RMSprop. IEEE Access 2025, 13, 34334–34349.
Bardi, S.; Astolfi, A. Modeling and control of a waste-to-energy plant. IEEE Control Syst. Mag. 2010, 30, 27–37.
Giantomassi, A.; Ippoliti, G.; Longhi, S.; Bertini, I.; Pizzuti, S. On-line steam production prediction for a municipal solid waste incinerator by fully tuned minimal RBF neural networks. J. Process Control 2011, 21, 164–172.
Wang, Q.; Gong, D.; Huang, Z.; Peng, C.; Chen, J.; Luo, J.; Zhu, J.; Luo, Q. Simulation Study on the Stabilization Control Performance of Low-Calorific-Value Waste Combustion in Waste Incinerators. ACS Omega 2024, 9, 48586–48597.
Dong, J.; Liu, W.; Zhong, B.; Yu, Z.; Zhang, X.; Zou, Q.; Ma, X. Initial steam flow prediction and control optimization for waste incineration plants based on machine learning. Energy Convers. Manag. 2023, 281, 116358.
Sun, J.; Meng, X.; Qiao, J.F. Soft Sensor Method of Main Steam Flow Based on MIV-RBF Neural Network. Control Eng. 2022, 29, 1829–1836.
Chai, T.Y.; Zhou, Z.; Zheng, R.; Liu, N.; Jia, Y. PID tuning intelligent system based on end-edge-cloud collaboration. Acta Autom. Sin. 2023, 49, 514–527.
Chai, T.Y.; Zhou, Z.; Cheng, S.; Jia, Y.; Song, Y. Industrial metaverse-based intelligent PID optimal tuning system for complex industrial processes. IEEE Trans. Cybern. 2024, 49, 514–527.
He, H.; Meng, X.; Tang, J.; Cui, C.L.; Qiao, J.F. ET–RBF–PID-based control method for furnace temperature of municipal waste incineration process. Control Theory Appl. 2022, 39, 2262–2273.
Xia, H.; Tang, J.; Qiao, J.F. Soft sensing method of dioxin emission in municipal solid waste incineration process based on broad hybrid forest regression. Acta Autom. Sin. 2023, 49, 343–365.
Ding, H.X.; Qiao, J.F.; Huang, W.M.; Yu, T. Event-triggered fuzzy neural multivariable control for a municipal solid waste incineration process. Sci. China Technol. Sci. 2023, 66, 3115–3128.
Cui, Y.; Meng, X.; Qiao, J.F. Multi-condition operational optimization with adaptive knowledge transfer for municipal solid waste incineration process. Expert Syst. Appl. 2024, 238, 121783.
Lin, X.; Wang, R.; Wen, C.; Chen, J.; Huang, Q.; Li, X.; Yan, J. Collaborative prediction and intelligent control of multiple pollutants emission from a large-scale waste incinerator. J. Environ. Manag. 2025, 379, 124874.
Chen, L.; Wang, C.; Zhong, R.; Li, Z.; Zhao, Z.; Zhou, Z. Prediction of Main Parameters of Steam in Waste Incinerators Based on BAS-SVM. Sustainability 2023, 15, 1132.
Hultgren, M.; Ikonen, E.; Kovács, J. Once-through circulating fluidized bed boiler control design with the dynamic relative gain array and partial relative gain. Ind. Eng. Chem. Res. 2017, 56, 14290–14303.
Huang, W.; Meng, X.; Qiao, J.F. Cooperative Dynamic Multiobjective Optimization With Multi-Time-Scale for MSWI Process. IEEE Trans. Ind. Electron. 2025, 1–13.
Pour, F.K.; Segovia, P.; Duviella, E.; Puig, V. A two-layer control architecture for operational management and hydroelectricity production maximization in inland waterways using model predictive control. Control Eng. Pract. 2022, 124, 105172.
Yao, Z.; Romero, C.; Baltrusaitis, J. Combustion optimization of a coal-fired power plant boiler using artificial intelligence neural networks. Fuel 2023, 344, 128145.
Trindade, A.B.; Palacio, J.C.E.; González, A.M.; Orozco, D.J.R.; Lora, E.E.S.; Renó, M.L.G. Advanced exergy analysis and environmental assessment of the steam cycle of an incineration system of municipal solid waste with energy recovery. Energy Convers. Manag. 2018, 157, 195–214.
Hou, G.; Gong, L.; Wang, M.; Yu, X.; Yang, Z.; Mou, X. A novel linear active disturbance rejection controller for main steam temperature control based on the simultaneous heat transfer search. ISA Trans. 2022, 122, 357–370.
Tugov, A.N. Municipal solid wastes-to-energy conversion: Global and domestic experience. Therm. Eng. 2022, 69, 909–924.
Ding, X.H.; Luo, B.; Zhou, H.T.; Chen, Y.H. Generalized solutions for advection–dispersion transport equations subject to time-and space-dependent internal and boundary sources. Comput. Geotech. 2025, 178, 106944.
Ding, H.; Tang, J.; Qiao, J.F. Dynamic modeling of multi-input and multi-output controlled object for municipal solid waste incineration process. Appl. Energy 2023, 339, 120982.
Magnanelli, E.; Tranas, O.L.; Carlsson, P.; Mosby, J.; Becidan, M. Dynamic modeling of municipal solid waste incineration. Energy 2020, 209, 118426.
Liu, J.; Xie, Z.; Guo, B.; Xu, Y.; Wang, Q.; Guo, X.; Bai, L.; Long, J. The effect of air distribution on the characteristics of waste combustion and NO generation in a grate incinerator. J. Energy Inst. 2024, 117, 101827.
Wang, Y.; Zhao, K.; Hao, Y.; Yao, Y. Short-term wind power prediction using a novel model based on butterfly optimization algorithm-variational mode decomposition-long short-term memory. Appl. Energy 2024, 366, 123313. [Google Scholar] [CrossRef]
Khan, M.A.; Yousaf, M.Z.; Khalid, S.; Fashihi, D.; Bokhari, S.A.H.; Insafmal, B.K.; Abbas, G. Applying Ant Lion Optimization Technique to Enhance Power Converters Performance via Effective Controller Tuning. In Proceedings of the 2023 2nd International Conference on Emerging Trends in Electrical, Control, and Telecommunication Engineering (ETECTE), Lahore, Pakistan, 27–29 November 2023; pp. 1–5. [Google Scholar]
Mirjalili, S. The ant lion optimizer. Adv. Eng. Softw. 2015, 83, 80–98.
Dos Santos Vitoria, L.; Cassar, D.R.; de Souza Lalic, S.; Nascimento, M.L.F. Using a simple radial basis function neural network to predict the glass transition temperature of alkali borate glasses. J. Non-Cryst. Solids 2024, 629, 122870.
Gao, Y.; Zhang, X.; Gao, Y.; Shang, T.; Yang, Q. Control of Underwater Manipulator Based on Neural Network and Fuzzy Compensation. Comput. Eng. Appl. 2022, 58, 317–323.
Knights, V.A.; Petrovska, O.; Kljusurić, J.G. Nonlinear Dynamics and Machine Learning for Robotic Control Systems in IoT Applications. Future Internet 2024, 16, 435.
Singh, B.; Kumar, R.; Singh, V.P. Reinforcement learning in robotic applications: A comprehensive survey. Artif. Intell. Rev. 2022, 55, 945–990.
Guan, Z.; Yamamoto, T. Design of a reinforcement learning PID controller. IEEJ Trans. Electr. Electron. Eng. 2021, 16, 1354–1360.
Zamfirache, I.A.; Precup, R.E.; Roman, R.C.; Petriu, E.M. Neural network-based control using actor-critic reinforcement learning and grey wolf optimizer with experimental servo system validation. Expert Syst. Appl. 2023, 225, 120112.
Ding, H.; Tang, J.; Qiao, J. MIMO modeling and multi-loop control based on neural network for municipal solid waste incineration. Control Eng. Pract. 2022, 127, 105280.

Figure 1. MSWI process flow diagram.

Figure 2. Schematic diagram of influencing factors of the MSFR.

Figure 3. MSFR intelligent control strategy diagram.

Figure 4. OPTICS algorithm clustering process diagram.

Figure 5. Optimization of the manipulated variables setpoints module of the MSFR.

Figure 6. PI control system of the MSFR based on reinforcement learning.

Figure 7. AC learning structure for RBF.

Figure 9. The RMSE of each experimental combination.

Figure 10. The reachable distance of the manipulated variable samples.

Figure 11. Comparison chart of five network prediction models.

Figure 12. Comparison when model testing.

Figure 13. Comparison when model testing.

Figure 14. Autocorrelation distribution of prediction deviation of the MSFR.

Figure 15. Comparison of fitness values performed on the unimodal reference function F5.

Figure 16. Comparison of fitness values performed on the multi-modal reference function F12.

Figure 17. Optimization setting process of the manipulated variables.

Figure 18. Experimental results of the hyperparameter combinations for the primary airflow PI control system.

Figure 19. Manipulated variables tracking results.

Figure 20. The self-tuning process of PI parameters in the AC-PI control system.

Figure 21. MSFR tracking results plot and error plot.

Table 1. The relationship between the improved antlion algorithm and the optimization of MSFR.

Parameters in the Ant-Lion Algorithm	Manipulated Variables Optimization Problem
Ant	A candidate solution for manipulated variables
Ant population	The set of candidate solutions for manipulated variables
Antlion	One round of iterative manipulated variables optimization solution
Antlion population	Manipulated variables optimization solution set
The movement of ants	Search for the optimization solution process of manipulated variables
The hunting behavior of antlions	Update the optimization solution of the manipulated variables
Elite antlion	The optimized solution of manipulated variables

Table 2. The number of clusters under each parameter combination.

Experiment Number	$MinPts$	$ε$	Number of Clusters
1	4	0.11	13
2	4	0.19	22
3	4	0.27	17
⋮	⋮	⋮	⋮
24	20	0.35	32
25	20	0.43	27

Table 3. The MSFR RBF neural network $c_{g}$.

	Primary Airflow $S_{1}$	Secondary Airflow $S_{2}$	Secondary Air Outlet Air Pressure $S_{3}$	Boiler Drum Pressure $S_{4}$	Economizer Feedwater Flow $S_{5}$	Grate Speed $Ω$
1	55.528	1.234	0.135	4.548	33.108	40
2	58.334	1.317	0.132	4.554	33.488	40
3	63.737	0.750	0.061	4.561	33.667	40
⋮	⋮	⋮	⋮	⋮	⋮	⋮
16	61.723	9.957	0.404	4.593	33.197	60
17	70.873	10.220	0.401	4.574	33.636	60

Table 4. Comparison of five network forecasting models.

Model	RMSE (t/h)	MAE (t/h)	Model Structure
BP Neural Network	0.7903	0.6319	6-17-1
RBF Neural Network	0.5208	0.4015	6-17-1
MME-RBF Neural Network	0.4143	0.3287	6-34-1
K-means-RBF Neural Network	0.2656	0.2231	6-21-1
OPTICS-RBF Neural Network	0.1380	0.1009	6-17-1

Table 5. Comparison of the prediction error for the MSFR.

	Error Range (t/h)	RMSE (t/h)	MAE (t/h)	$R^{2}$
BP Neural Network	[−0.611, 1.930]	0.8916	0.6828	0.5112
RBF Neural Network	[−1.350, 1.580]	0.5718	0.4379	0.6937
MME-RBF Neural Network	[−1.227, 1.267]	0.4767	0.3574	0.7663
K-means-RBF Neural Network	[−0.823, 0.660]	0.2812	0.2363	0.8271
OPTICS-RBF Neural Network	[−0.420, 0.429]	0.1692	0.1125	0.9218
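
The RMSE, MAE, and $R^{2}$ columns in Tables 4 and 5 can be computed from paired measured and predicted MSFR values. The following is a minimal sketch that assumes the standard textbook definitions of these three measures (the article's own formulas are not reproduced in this part of the text); the function name `regression_metrics` and the toy values in the usage note are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (rmse, mae, r2) for paired sequences of measured and predicted values.

    Assumes the standard definitions; not taken from the article itself.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root mean square error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all measured values are equal")
    r2 = 1.0 - ss_res / ss_tot                    # coefficient of determination
    return rmse, mae, r2
```

For example, `regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])` describes a model that always predicts the mean, so its $R^{2}$ is 0.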

Table 6. Unimodal reference function F5 and multi-modal reference function F12.

Function	Dimension	Range
$F_{5} (x) = \sum_{i = 1}^{n - 1} [100 {(x_{i + 1} - x_{i}^{2})}^{2} + {(x_{i} - 1)}^{2}]$	50	[−30, 30]
$F_{12} (x) = \frac{π}{n} \{10 \sin (π y_{1}) + \sum_{i = 1}^{n - 1} {(y_{i} - 1)}^{2} [1 + \sin^{2} (π y_{i + 1})] + {(y_{n} - 1)}^{2}\} + \sum_{i = 1}^{n} u (x_{i}, 10, 100, 4)$	50	[−50, 50]
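
For readers who want to reproduce the benchmark comparison, the two test functions can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: F5 is taken to be the standard Rosenbrock function, and F12 the widely used generalized penalized function with $y_{i} = 1 + (x_{i} + 1)/4$, the piecewise penalty $u$, and the coefficients $10 \sin^{2}(π y_{1})$ and $1 + 10 \sin^{2}(π y_{i+1})$. Those definitions of $y_{i}$ and $u$ are not given in Table 6, and the coefficients differ slightly from the expression printed there. All function names are hypothetical.

```python
import math


def rosenbrock_f5(x):
    """Standard Rosenbrock function (assumed form of F5); minimum 0 at x = (1, ..., 1)."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
        for i in range(len(x) - 1)
    )


def penalty_u(xi, a, k, m):
    """Piecewise penalty: zero inside [-a, a], polynomial growth outside (assumed definition)."""
    if xi > a:
        return k * (xi - a) ** m
    if xi < -a:
        return k * (-xi - a) ** m
    return 0.0


def penalized_f12(x):
    """Generalized penalized function (assumed form of F12); minimum 0 at x = (-1, ..., -1)."""
    n = len(x)
    y = [1.0 + (xi + 1.0) / 4.0 for xi in x]      # assumed change of variables
    core = 10.0 * math.sin(math.pi * y[0]) ** 2
    core += sum(
        (y[i] - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * y[i + 1]) ** 2)
        for i in range(n - 1)
    )
    core += (y[-1] - 1.0) ** 2
    return math.pi / n * core + sum(penalty_u(xi, 10.0, 100.0, 4) for xi in x)
```

With the dimension of 50 listed in Table 6, `rosenbrock_f5([1.0] * 50)` and `penalized_f12([-1.0] * 50)` evaluate the functions at their known minimizers, which gives a quick sanity check before running any optimizer on them.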

Table 7. Comparison of optimization results on the standard test functions.

	ALO	PSO	ABC	IALO
F5	5.2827	90.5459	10.4729	1.9508
F12	2.19 × 10⁻⁷	0.0024	2.12 × 10⁻⁵	4.69 × 10⁻¹¹

Table 8. Orthogonal table of hyperparameters for the primary airflow PI control system.

Experiment Number	$η^{'}$	$η^{″}$	$η_{c}$	$η_{σ}$	$γ$
1	0.001	0.001	0.001	0.001	0.9
2	0.001	0.001	0.01	0.01	0.95
3	0.001	0.001	0.02	0.02	0.99
⋮	⋮	⋮	⋮	⋮	⋮
26	0.02	0.02	0.01	0.001	0.99
27	0.02	0.02	0.02	0.01	0.9

Table 9. Hyperparameter settings for reinforcement learning in each control loop.

	$η^{'}$	$η^{″}$	$η_{c}$	$η_{σ}$	$γ$
Primary airflow PI control loop	0.001	0.01	0.01	0.001	0.99
Secondary airflow PI control loop	0.001	0.01	0.001	0.001	0.95
Economizer feedwater flow PI control loop	0.001	0.02	0.01	0.001	0.95
Boiler drum pressure PI control loop	0.001	0.01	0.001	0.02	0.99
Secondary fan outlet air pressure PI control loop	0.001	0.001	0.02	0.001	0.9

Table 10. The setpoint of the PI parameters in the PI control system.

	$K_{P}$	$K_{I}$
Primary airflow PI control loop	0.07472	0.05774
Secondary airflow PI control loop	0.08019	0.04623
Economizer feedwater flow PI control loop	0.07488	0.04777
Boiler drum pressure PI control loop	0.06457	0.05184
Secondary fan outlet air pressure PI control loop	0.06975	0.04693
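
To make the role of these gains concrete, the sketch below applies one pair of values from Table 10 in a discrete positional PI law, $u(k) = K_{P} e(k) + K_{I} \sum_{j \le k} e(j) Δt$. The positional form, the sampling period `dt`, and the class name `PIController` are assumptions made for illustration; the article's own controller form is not shown in this part of the text, and in the proposed method the gains are further adjusted online by the actor–critic learner rather than held fixed as they are here.

```python
class PIController:
    """Discrete positional PI controller with fixed gains (illustrative assumption)."""

    def __init__(self, kp, ki, dt=1.0):
        self.kp = kp            # proportional gain K_P
        self.ki = ki            # integral gain K_I
        self.dt = dt            # sampling period (assumed, in seconds)
        self.integral = 0.0     # running sum of error * dt

    def update(self, setpoint, measurement):
        """Return the control output for the current sample."""
        error = setpoint - measurement
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral


# Gains of the primary airflow loop as listed in Table 10.
primary_air_pi = PIController(kp=0.07472, ki=0.05774, dt=1.0)
first_output = primary_air_pi.update(setpoint=60.0, measurement=55.0)
second_output = primary_air_pi.update(setpoint=60.0, measurement=55.0)
```

With a constant error, the proportional term stays the same while the integral term grows each step, so `second_output` is larger than `first_output`.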

Table 11. Evaluation of the effectiveness of control of MSFR.

	ISE	IAE	$DEV^{max}$	$\bar{σ}$ (%)	${\bar{t}}_{r}$ (s)	${\bar{t}}_{s}$ (s)
Optimize the setting of PID	0.0671	0.0882	3.2901	1.43%	48.21	421.69
PID	0.0823	0.0988	3.4632	1.69%	46.23	445.41
QDRNN-PID	0.0716	0.0931	3.3772	1.51%	50.50	444.67
PID	0.1074	0.1130	3.5592	1.84%	46.83	475.83
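
The first three indices in Table 11 can be approximated from sampled setpoint and output trajectories. The sketch below assumes the usual definitions, ISE as the integral of the squared error, IAE as the integral of the absolute error, and $DEV^{max}$ as the largest absolute deviation, and uses a rectangle-rule sum with a sampling period `dt`. Any normalization the article applies to obtain the reported magnitudes is not shown here, so absolute values from this sketch may not match the table. The function name `control_indices` is hypothetical.

```python
def control_indices(setpoints, outputs, dt=1.0):
    """Return (ise, iae, dev_max) from sampled trajectories using a rectangle-rule sum."""
    if len(setpoints) != len(outputs) or len(setpoints) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [s - y for s, y in zip(setpoints, outputs)]
    ise = sum(e * e for e in errors) * dt        # integral of squared error
    iae = sum(abs(e) for e in errors) * dt       # integral of absolute error
    dev_max = max(abs(e) for e in errors)        # largest absolute deviation
    return ise, iae, dev_max
```

Because ISE squares the error, it weights large excursions more heavily than IAE, which is one reason the two indices are commonly reported together.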

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
