Next Article in Journal
A Study on the Impact of Local Policy Response on the Technological Innovation of the New Energy Vehicle Industry
Previous Article in Journal
Chain Leader Policy and Corporate Environmental Sustainability: A Multi-Level Analysis of Greenwashing Mitigation Mechanisms
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation

1
School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
2
Beijing Laboratory of Smart Environmental Protection, Beijing 100124, China
3
College of Information Engineering, Dalian Ocean University, Dalian 116023, China
*
Author to whom correspondence should be addressed.
Sustainability 2025, 17(19), 8872; https://doi.org/10.3390/su17198872
Submission received: 18 July 2025 / Revised: 30 September 2025 / Accepted: 30 September 2025 / Published: 4 October 2025
(This article belongs to the Section Waste and Recycling)

Abstract

Municipal solid waste (MSW) refers to solid and semi-solid waste generated during human production and daily activities. The process of incinerating such waste, known as municipal solid waste incineration (MSWI), serves as a critical method for reducing waste volume and recovering resources. Automatic online recognition of flame combustion status during MSWI is a key technical approach to ensuring system stability, addressing issues such as high pollution emissions, severe equipment wear, and low operational efficiency. However, when manually selecting optimized features and hyperparameters based on empirical experience, the MSWI flame combustion state recognition model suffers from high time consumption, strong dependency on expertise, and difficulty in adaptively obtaining optimal solutions. To address these challenges, this article proposes a method for constructing a flame combustion state recognition model optimized based on reinforcement learning (RL), long short-term memory (LSTM), and parallel differential evolution (PDE) algorithms, achieving collaborative optimization of deep features and model hyperparameters. First, the feature selection and hyperparameter optimization problem of the ViT-IDFC combustion state recognition model is transformed into an encoding design and optimization problem for the PDE algorithm. Then, the mutation and selection factors of the PDE algorithm are used as modeling inputs for LSTM, which predicts the optimal hyperparameters based on PDE outputs. Next, during the PDE-based optimization of the ViT-IDFC model, a policy gradient reinforcement learning method is applied to determine the parameters of the LSTM model. Finally, the optimized combustion state recognition model is obtained by identifying the feature selection parameters and hyperparameters of the ViT-IDFC model. Test results based on an industrial image dataset demonstrate that the proposed optimization algorithm improves the recognition performance of both left and right grate recognition models, with the left grate achieving a 0.51% increase in recognition accuracy and the right grate a 0.74% increase.

1. Introduction

Pollution control and prevention are crucial for the sustainable development of the global economy, the protection of the environment, and the safeguarding of public health, making them a global priority [1,2]. Municipal solid waste (MSW) is growing annually by 8% to 10% worldwide [3,4]. In developing countries such as China, many cities are at risk of being overwhelmed by MSW, leading to significant environmental challenges [5]. MSW incineration (MSWI) is a complex system that converts waste into energy (WTE) and plays a key role in addressing urban environmental issues [6,7,8] while supporting renewable energy recycling [9]. Although MSWI is a scientifically validated waste treatment method, emissions from these plants are a major source of pollution [10,11], often facing opposition due to the “Not in My Backyard” (NIMBY) effect [12,13]. The combustion stage of the MSWI process is the primary source of environmental indicators (EIs) such as NOx, CO, HCl, and SO2 [14]. Flame combustion state recognition technology enables early detection of abnormal combustion conditions, thereby guiding the control system to promptly adjust operational parameters such as grate speed and facilitating a rapid return to normal combustion. Under abnormal conditions, incomplete fuel combustion leads to increased carbon monoxide emissions, a significantly higher risk of dioxin formation due to suboptimal combustion environments, and reduced power generation efficiency [3]. Furthermore, deteriorating conditions may even cause unplanned shutdowns and equipment wear. Therefore, accurate identification of flame combustion states is crucial for pollution reduction and optimized control.
Due to differences in operating equipment, the uncertainty in the maintenance of MSWI plants, fluctuations in the composition of MSW, and strong interference from the environment within the incinerator, accurately identifying the flame combustion state is challenging [3]. Additionally, the reliance on experts for manual recognition of combustion status lacks stability, making it difficult to ensure consistent operational efficiency and pollution reduction in the MSWI process [15]. The process of expert manual identification is intelligently modeled by constructing a data-driven mapping function from image space to state space, thereby transforming human visual experience based on prior knowledge into a computable and optimizable mathematical model. Leveraging the powerful representational learning capability of deep neural networks, discriminative deep-level information is automatically extracted from raw data, and high-precision modeling is achieved by learning the underlying complex patterns and decision boundaries. By building a high-accuracy artificial intelligence-based combustion state recognition model, limitations of human experts—such as subjectivity, limited experience, and cognitive biases—are overcome in terms of generalization, robustness, and judgment consistency. This provides effective support for the intelligent operation of the MSWI process.
Extensive research has been conducted on the identification of MSWI flame combustion states. Zhou et al. proposed a PCA-K-means clustering approach to distinguish between abnormal and normal flames, using only two states for classification [16]. Huang, on the other hand, extracted key parameters—such as the grayscale mean, flame area ratio, and flame high-temperature rate—from combustion images through image processing technology. These parameters were then input into K-nearest neighbor (KNN) and convolutional neural network (CNN) models for partial burning state recognition [17]. However, this approach relies on only a few physical features. Zhou also extracted 12 physical features from flame images based on expert knowledge and used a backpropagation neural network (BPNN) for combustion state recognition [18]. Additionally, Qiao et al. constructed a combustion state recognition model based on flame image color moment features and applied the least squares support vector machine (LS-SVM) [19]. Guo et al. took a different approach by generating abnormal working condition images using deep convolutional adversarial networks (DCGANs) to construct a CNN for combustion state recognition. This method addresses the issue of limited abnormal combustion images [20]. However, it only classifies combustion states based on the position of the combustion line, which provides limited local information. In contrast, Pan et al. created a flame combustion state recognition database using global image features and employed an improved visual transformer-based deep forest classification (ViT-IDFC) algorithm for state recognition [21]. The key limitation of the research on MSWI flame state recognition models mentioned above is that determining hyperparameters often requires expert experience and repeated experimentation, which is time-consuming, resource-intensive, and computationally expensive. Furthermore, these methods do not guarantee that the model will achieve optimal recognition performance.
To address these issues, this article conducts research on flame combustion state recognition through the collaborative optimization of deep features and model hyperparameters. Evolutionary algorithms (EAs) are commonly used to solve the aforementioned optimization problems [22]. Among all possible combinations, EAs can achieve the search for a globally optimal or near-optimal solution, thereby enabling the model to reach its best performance. The optimization process simulates the mechanisms of “natural selection” and “genetic inheritance” observed in biological systems. Each “population” consists of multiple potential solutions, and the algorithm iteratively performs operations such as selection, crossover, and mutation. Based on the evaluation results of each solution, superior individuals are retained while inferior ones are eliminated, gradually evolving solutions with progressively improved performance. Among these, the Differential Evolution (DE) algorithm is a heuristic optimization technique primarily used to solve continuous problems in real space. Due to its automatic adaptation and ease of implementation, DE has been widely adopted. For example, Dilbag et al. optimized the hyperparameters of a remote sensing image visibility restoration model using dynamic DE [23], Singh et al. adjusted the initial parameters of a CNN using multi-objective DE [24], and Chen et al. introduced Gaussian–Cauchy mutation and parameter adaptation strategies in DE to solve the economic scheduling problem of large-scale cogeneration systems [25]. These studies achieved improved results by optimizing model parameters using the DE algorithm. However, DE faces challenges such as stagnation and premature convergence during the optimization process, which are closely linked to the control parameters and the crossover rate of the algorithm. Clearly, for different optimization problems, the most suitable parameter settings must strike a balance between maintaining population diversity and enhancing convergence speed. Typically, the control parameters of DE are set based on optimization experience obtained through repeated experiments.
Reinforcement learning (RL) achieves task goals through interactive learning between agents and the environment [26]. Unlike supervised and unsupervised learning, RL focuses on decision-making based on environmental feedback. In the RL framework, agents learn to maximize long-term cumulative rewards through interaction with the environment. Its advantages include the ability to make sequential decisions, adapt to environmental changes, balance exploration and exploitation, not requiring large amounts of labeled data, and having high generalizability [27]. Currently, RL is gradually becoming an important tool for solving model hyperparameter optimization. For example, Pan et al. designed a deep neural network, PFSPNet [28], to achieve an end-to-end output model that is not limited by problem size, aimed at solving the schedule ng problem in an assembly line workshop. They applied a deep RL execution evaluation strategy to optimize the network parameters of the model. The results showed that this approach outperformed existing heuristic algorithms. Tian et al. addressed the exploration and exploitation challenges faced by EAs in operator selection during the search process [29]. They treated decision variables as states and candidate operators as actions, using deep neural networks to learn and estimate the Q-value strategy for each action in a given state. Based on reinforcement learning, their proposed method was used for operator selection. Experimental results demonstrated that the method could successfully identify the optimal operator for each parent.
Furthermore, some scholars are focused on exploring how to combine RL with DE in the search process to achieve adaptive selection of optimal parameters. For example, Hu et al. proposed integrating RL with DE in photovoltaic models [30], evaluating fitness function values during iterations to determine action rewards for adjusting parameter values, and then using RL to adjust these parameters to identify the most suitable algorithm settings for the environmental model. Tan et al. introduced a hybrid mutation strategy DE algorithm based on deep Q-network (DQN) [31], called DEDQN, which implements adaptive selection of mutation strategies during the evolutionary process through RL. Sun et al. used Long Short-Term Memory (LSTM) as the parameter controller for DE and employed an RL algorithm to optimize the parameters of the LSTM [32]. These studies leverage the reward/punishment mechanism of RL, enabling agents to continuously learn the optimal parameter combinations through trial and error, thereby improving model performance. However, there is a lack of research on improving the operational efficiency of the DE algorithm and on performing deep feature and model hyperparameter collaborative optimization in the context of flame combustion state recognition.
In the MSWI process, variations in process parameters such as MSW heating value and air supply volume directly influence the flame combustion state, which manifests as differences in flame image data distribution. Building on prior knowledge from field experts, reference [21] proposed a global feature-oriented flame combustion state image dataset and introduced the ViT-IDFC recognition algorithm. However, the multilayer feature selection and hyperparameter determination of this model mainly depend on manual experience. When the distribution of the flame combustion state dataset changes, the effectiveness of manual parameter selection decreases significantly. Thus, there is a need to develop a method capable of adaptively selecting hyperparameters based on changes in dataset distribution. This approach ensures that when process parameter variations such as MSW heating value or air supply volume cause a decline in model recognition accuracy, this performance degradation signal promptly triggers the optimization framework, driving it to automatically adjust feature selection parameters and model hyperparameters to regain adaptation to new operating conditions. To improve the performance of the flame global characterization information combustion state recognition model for MSWI processes, it is crucial to address the following: (1) how to select appropriate feature selection parameters and recognition model hyperparameters for encoding, and (2) how to optimize the control parameters of the parallel DE (PDE) algorithm to accommodate changes in the image dataset distribution. parameters of the parallel differential evolution algorithm to adapt to changes in image dataset distribution.
In response to the above challenges, this article proposes a ViT-IDFC recognition model optimized by RL-LSTM-PDE. First, the hyperparameter selection process of the ViT-IDFC combustion state recognition model is encoded as a PDE optimization problem. Next, control parameters—such as the mutation factor and crossover factor of PDE—are modeled using LSTM, with the LSTM output providing the optimal hyperparameters for the ViT-IDFC combustion state recognition model. Finally, the network parameters of the LSTM model are derived through PDE during the empirical learning process, which optimizes both ViT features and IDFC model hyperparameters. The innovations of this article are as follows: (1) a novel optimization modeling strategy for deep features and model hyperparameters, which formalizes the selection process as an observable Markov Decision Process (MDP); (2) the use of a multi-layer optimization approach, combining PDE for model parameter optimization, LSTM for predicting the evolutionary parameters of PDE, and RL for optimizing PDE parameters based on model generalization performance, to solve the MDP problem; (3) the collaborative optimization of ViT-IDFC model features and hyperparameters in the MSWI process, which improves the model’s generalization performance.

2. Flame State Recognition Description for MSWI Process

The process flow of grate furnace type MSWI is shown in Figure 1.
As shown in Figure 1, the MSWI process is a complex, serial system consisting of six stages: solid waste fermentation, solid waste combustion, waste heat exchange, steam power generation, flue gas treatment, and flue gas emissions. These stages are interrelated and together form the overall framework of the MSWI process. Among them, solid waste combustion is the core of the entire MSWI process, and the assessment of the on-site MSW combustion status is generally based on domain experts’ visual observation of flame images. Correspondingly, the combustion control strategy is adjusted based on the combustion status.
The schematic diagram of the four combustion states of MSW based on global information is shown in Figure 2.
In Figure 2, the blue line represents the burnout line of the flame, the red line represents the edge of the flame, and the red arrow indicates the direction and height of the flame. As shown in Figure 2, the combustion state of the MSWI is primarily classified into four categories: channeling burning, smoldering, partial burning, and normal burning. Significant differences are observed in the distribution of burnout lines, flame edges, direction, and height across the different combustion states, which result from the combined effects of the current combustion control strategies and the components of the MSW.

3. Optimization Modeling Strategy

The optimization modeling strategy proposed in this article is shown in Figure 3.
In Figure 3, { X n } n = 1 N and { y n } n = 1 N represent the flame image dataset and its corresponding combustion state, respectively; [ S P 1 , S P 2 , , S P 12 ] represents the deep feature selection parameters of the ViT-IDFC combustion state recognition model submodule; M and J represent model hyperparameters such as the number of decision trees and the minimum number of partition samples for leaf nodes in IDFC; Θ F t and Θ CR t represent the mutation factor and crossover factor of PDE; and W represents the network parameter of LSTM.
The functions of each module are as follows:
(1)
PDE-based deep feature and hyperparameter optimization module: encodes the feature selection parameters [ S P 1 , S P 2 , , S P 12 ] and recognition model hyperparameters M and J of ViT-IDFC as decision variables for PDE optimization, and the fitness is characterized by the recognition;
(2)
LSTM-based PDE parameter prediction module: an LSTM model is constructed, with the mutation factor F and crossover factor C r of PDE as inputs, and optimized F * and C r * as the outputs;
(3)
RL-based LSTM parameter optimization module: treating LSTM as a Markov decision process and using RL optimization based on policy gradient algorithm to obtain the optimized control parameters of PDE.
Therefore, the optimization modeling process is as follows. Firstly, based on RL agents that can learn from experience, we obtain the network parameters of LSTM. Then, based on LSTM, we determine the F and C r parameters of PDE. Next, based on PDE, we determine the parameters of the ViT-IDFC model, such as [ S P 1 , S P 2 , , S P 12 ] , M and J . Finally, we obtain the optimized recognition model.

4. Algorithm Implementation

4.1. PDE-Based Deep Feature and Hyperparameter Optimization Module

4.1.1. ViT-IDFC Model Construction Submodule

Firstly, a combustion state recognition model based on ViT-IDFC is constructed, and the collected flame images on site are classified to obtain a flame image dataset, which is denoted as ( X n , y n ) n = 1 N (where X n R H × W × C represents the n th image, H , W and C respectively represent its height, width, and number of channels, y n represents the label, and N represents the sample size of the dataset). On this basis, pre-training of ViT is carried out based on the ImageNet dataset, and the ViT model is iteratively optimized until the network performance meets the predetermined requirements. The weights and bias parameters of the model are saved. Based on the above image dataset, transfer feature extraction is performed by inputting the flame image dataset ( X n Flame ) n = 1 N into the ViT model. Further, the feature sequences extracted by each encoding layer based on Transformer is represented using the following equation:
[ P n 1 ,   P n 2 ,   ,   P n 12 ] n = 1 N = f ViT ( X n Flame ) n = 1 N
where f ViT ( · ) represents the ViT deep feature extraction model composed of an embedding layer and an encoding layer.
Next, transfer feature selection is performed based on expert experience, and the selected flame image depth features are denoted as ( P n * - 12 ) n = 1 N . Furthermore, we construct the IDFC module and concatenate ( X n Flame , samll ) n = 1 N obtained by reducing ( P n * - 12 ) n = 1 N and ( X n Flame ) n = 1 N to obtain the following features:
( z n Flame ) n = 1 N = [ P n * 12 , flatten ( X n Flame , small ) ] n = 1 N
where flatten ( · ) represents the image flattening operation.
Finally, the recognition result is obtained by inputting ( z n Flame ) n = 1 N into the IDFC model as follows:
( y ^ n Flame ) n = 1 N = f IDFC ( ( z n Flame ) n = 1 N )
where f IDFC ( · ) represents the constructed IDFC model.

4.1.2. PDE-Based Optimization Sub-Module

DE is widely used to solve global optimal solutions in multidimensional spaces. To improve the problem of long optimization time in DE, the PDE algorithm uses multiple population parallelization and reduces the size of subpopulations to reduce evolution time, while further improving evolution efficiency by constraining the direction of subpopulation evolution in iterations [33]. It can be divided into four parts, i.e., optimization parameter encoding, parameter initialization, population partition, and parallel evolution. Details are shown as follows.
Step (1) Optimization parameter encoding. The deep feature [ P n 1 ,   P n 2 ,   ,   P n 12 ] n = 1 N in the ViT-IDFC recognition model represents the 12 layers of features extracted by the ViT model. Obviously, relying on expert experience to select features has a certain degree of subjective arbitrariness. To further select deep features that can represent global information, [ S P 1 , S P 2 , , S P 12 ] is used as the parameter for selecting deep features. We take the first layer feature as example; the criterion is defined as follows: when S P 1 = 1 is present, it is selected; when S P 1 = 0 occurs, it is deleted. In addition, the number of decision trees J and the minimum number of leaf nodes M of the base learner RF in the IDFC recognition model are encoded as model hyperparameters.
Step (2) Parameter initialization. Four PDE parameters, including population size f size ( N P ) , number of subpopulations A , subpopulation mutation factor F , and subpopulation crossover probability C R , need to be initialized, where f size ( N P ) and A are used to partition subpopulations; F and C R are used for the evolutionary process of each subpopulation. We use prediction accuracy as the fitness function to evaluate the fitness of the population, which is as follows:
Fitness N P = f Accuracy ( N P )
where f Accuracy ( · ) represents the fitness evaluation function.
Step (3) population partition. Randomly dividing the initial population into A subpopulations can be expressed as:
{ s u b _ N P 1 , , s u b _ N P α , , s u b _ N P A } = f random ( N P , A )
The size of the α th subpopulation s u b _ N P α is obtained as follows:
f size ( s u b _ N P α ) = 1 A f size ( N P )
Step (4) Parallel evolution. We execute DE in parallel pools for different subpopulations based on their partitioning results. The following sections describe the process from two aspects such as independent evolution of subpopulations and overall evolution of the population.
a.
Independent evolution of subpopulations
The evolutionary strategy of subpopulations is consistent with the standard DE strategy and can be divided into three parts such as “mutation”, “crossover”, and “selection”. The following section takes the t th generation of the α th subpopulation as an example.
Firstly, we adopt the DE/current to best/1 strategy for mutation operation. Specifically, two different individuals are differentiated and added to a third individual to obtain a mutated individual, as shown below:
V α m ( t + 1 ) = s u b _ N P α m ( t ) + Θ F t ( s u b _ N P α best ( t ) s u b _ N P α m ( t ) ) + Θ F t ( s u b _ N P α r 1 ( t ) s u b _ N P α r 2 ( t ) )
where Θ F t represents the variable factor, Θ F t = { F t , 0 < F t 1 } ; V α m ( t + 1 ) represents the m th mutant individual of the ( t + 1 ) th generation; s u b _ N P α m ( t ) represents the m th individual of the t th generation; s u b _ N P α r 1 ( t ) and s u b _ N P α r 2 ( t ) represent two random individuals of the t th generation; and s u b _ N P α best ( t ) is the optimal individual of the t th generation.
Here, the mutation operation of the PDE algorithm is referred to as a function M ( · ) , and the generation process of the t th generation mutation heterogeneous group V α ( t + 1 ) is as follows:
V α ( t + 1 ) = M ( s u b _ N P α ( t ) ; Θ F t )
Next, we perform crossover operations on each individual and their offspring’s mutated individuals. Here, the m th individual of the ( t + 1 ) th is represented as:
U α m ( t + 1 ) = { S P , m 1 , S P , m 2 , , S P , m 12 , J m , M m }
where U α m ( t + 1 ) is obtained through cross recombination of V α m ( t + 1 ) and s u b _ N P α m ( t ) . Taking allele S P , m 1 as an example, its crossover process is executed as follows:
U α m ( t + 1 ) S P , m 1 = V α m ( t + 1 ) S P , m 1   rand [ 0 , 1 ] < Θ CR t s u b _ N P α m ( t ) S P , m 1   otherwise
where Θ CR t is the cross factor, Θ CR t = { C R t , 0 < C R t 1 } .
When the set crossover probability Θ CR t is less than the generated random floating-point number rand in [ 0 , 1 ] , the crossover reassembly operation is performed.
Here, if we denote the crossover operation as a function C R ( · ) , the process of generating U α ( t + 1 ) can be expressed as follows:
U α ( t + 1 ) = C R ( V α ( t + 1 ) , s u b _ N P α ( t ) ; Θ CR t )
Finally, the fitness function evaluates and selects s u b _ N P α m ( t ) and U α m ( t + 1 ) , with the criterion of selecting the individual with a high score as the m th individual of the ( t + 1 ) th generation, as follows:
s u b _ N P α m ( t + 1 ) = s u b _ N P α m ( t )   Fitness α ( t ) Fitness α , U ( t + 1 ) U α m ( t + 1 )   otherwise
Here, the selection operation is denoted as function S ( · ) . Correspondingly, the fitness value Fitness α ( t + 1 ) corresponding to the ( t + 1 ) th generation population can be expressed as:
Fitness α ( t + 1 ) = S ( Fitness α ( t ) , Fitness α , U ( t + 1 ) , s u b _ N P α ( t ) , U α ( t + 1 ) )
Finally, the local optimal solution s u b _ N P α best ( t + 1 ) obtained after the evolution of each subpopulation can be expressed as:
s u b _ N P α best ( t + 1 ) = [ S P , best 1 , S P , best 2 , , S P , best 12 , J best , M best ]
b.
Overall evolution of the population
First, after completing a parallel evolution, the local optimal solutions of each subpopulation are summarized to obtain a set of local optimal solutions N P best ( t + 1 ) , as follows:
N P best ( t + 1 ) = { s u b _ N P 1 best ( t + 1 ) , , s u b _ N P α best ( t + 1 ) , , s u b _ N P A best ( t + 1 ) }
Then, based on the local optimal solution set N P best ( t + 1 ) , each subpopulation is updated by adding the optimal individual of each subpopulation to obtain a new population N P ( t + 1 ) .
Finally, based on whether the number of previous iterations has reached the pre-set threshold Θ PDE , perform the corresponding operations:
If not achieved, re-divide the population and perform the evolutionary process, i.e., return to step (3);
If reached, the model stops iterating. Based on the local optimal solution set N P best ( t + 1 ) , the optimal solution with the highest fitness value is selected as the optimal solution s u b _ N P best ( t + 1 ) of the model, as shown below:
s u b _ N P best ( t + 1 ) = arg max ( f Accuracy ( s N P ( t + 1 ) ) )
where s N P ( t + 1 ) N P best ( t + 1 ) .
The algorithm flowchart is shown in Figure 4.

4.2. LSTM-Based PDE Parameter Prediction Module

In the evolution process of PDE, the information of its t th generation population is determined from the 1st generation to the ( t 1 ) th generation. Due to the randomness of Θ F t and Θ CR t , the evolutionary process of PDE can also be viewed as a random time series. Therefore, to adapt to its temporal dependence, an LSTM network is used here to obtain the Θ F t and Θ CR t through online prediction.
Specifically, the fitness value Fitness ( t ) (denoted as F t ) of PDE and its statistical data U t are combined to form A t = [ F t , U t ] as the input for LSTM. Among them, U t is composed of the normalized histogram f ˙ t and the moving average f ¯ t of the histogram vectors from the past g generation, denoted as U t = [ f ˙ t , f ¯ t ] . The calculation process is shown below in detail.
Firstly, we normalize F t to obtain f ¯ as follows:
f ¯ = f min { F t } max { F t } min { F t }
Then, we calculate the histogram f ˙ t of f ¯ as follows:
f ˙ t = h i s t o g r a m ( { f ¯ } , b )
where b is the number of bars in the histogram.
Furthermore, the moving average value f ¯ t of the histogram vector of the previous generation g is calculated as follows:
f ¯ t = 1 g i = t g t 1 f ˙ i
Finally, the input A t = [ F t , U t ] of the LSTM network is obtained to predict Θ t = [ Θ F t , Θ CR t ] .
The LSTM network introduces gate structures including forget gates, input gates, and output gates, as well as a memory unit to selectively receive information input into the neural network. The gate structure is responsible for connecting other parts of the neural network to the memory unit to complete weight setting.
Specifically, the forget gate receives information from the previous moment to the current moment, and calculates the degree of information retention f t based on the output h t 1 from the previous moment and the input A t from the current moment as follows:
f t = σ ( W f · [ h t 1 , A t ] + b f )
where σ is the sigmoid activation function.
The input gate first calculates its activation value i t based on h t 1 and A t , as follows:
i t = σ ( W i · [ h t 1 , A t ] + b i )
Next, we calculate the candidate values C ˜ t that may be added to the cell state as follows,
C ˜ t = tanh ( W C · [ h t 1 , A t ] + b C )
Then, based on the results of the first two steps, we calculate the updated unit status C t as follows,
C t = f t C t 1 + i t C ˜ t
where represents Hadamard product.
The output gate calculates the degree of information output o t based on h t 1 and A t as follows:
o t = σ ( W o · [ h t 1 , A t ] + b o )
The short-term memory value h t at the current time is obtained as follows:
h t = o t tanh ( C t )
Finally, the current prediction result output by the network is as follows:
Θ ^ t = σ ( W h t )
where W is the output hidden layer weight.
Therefore, LSTM can be formalized as follows:
[ Θ ^ t , C t , h t ] = LSTM ( A t , h t 1 , C t 1 ; W )
where W = [ W f , W i , W c , W o , W , b f , b i , b c , b o ] , C t represents long-term memory value, and h t represents short-term memory value.
The algorithm flowchart is shown in Figure 5.

4.3. RL-Based LSTM Parameter Optimization Module

To optimize the evolutionary parameters of PDE, the LSTM used for PDE parameter prediction is first modeled as a Markov decision process (MDP). The definitions of environment, state, action, strategy, reward, and transfer are as follows.
(1)
Environment: Optimize the ViT-IDFC combustion state recognition model;
(2)
Status: Use fitness values F t , statistics U t , and short-term memory values h t as the status s t ;
(3)
Action: Define PDE parameters as actions, i.e., a t = Θ ^ t = { Θ F t , Θ CR t } ;
(4)
Strategy: Determined by LSTM parameters, as follows:
π ( a t | s t ) = 1 ( 2 π σ 2 ) exp { 1 2 σ 2 ( a t L S T M ( s t ; W ) ) 2 }
(5)
Reward: Refers to the relative improvement of the best fitness value, as follows:
r t + 1 = max { F t + 1 } - max { F t } max { F t }
(6)
Transition: Represented as a probability distribution p ( s t + 1 | a t , s t ) . Here, the RL based variant algorithm Policy Gradient (PG) is used to handle the situation where the transition probability is unknown. The principle of this algorithm is to update policy parameters through random gradient ascent to maximize the expected value of rewards. In this way, algorithms can optimize the behavior of agents to adapt to unknown transition probability environments, as follows:
r t + 1 = r t + α · r U ( r t )
where α represents the learning rate, and r U ( r ) represents the cumulative reward gradient.
For the proposed MDP, the joint probability of trajectory τ = { s 0 , a 0 , r 1 , , a T 1 , a T 1 , r T } is as follows,
q ( τ ; r ) = p ( s 0 ) t = 0 T 1 π ( a t | s t ; r ) p ( s t + 1 | a t , s t )
Furthermore, the cumulative reward gradient r U ( r ) can be obtained as follows:
r U ( r ) = τ r ( τ ) r q ( τ ; r ) = τ r ( τ ) r q ( τ ; r ) q ( τ ; r ) q ( τ ; r ) = τ r ( τ ) q ( τ ; r ) θ [ log q ( τ ; r ) ] = τ r ( τ ) q ( τ ; r ) [ t = 0 T 1 r log π ( a t | s t ; r ) ]
where r ( τ ) is the cumulative reward of τ .
The expected value of the above equation can be obtained through L trajectory samples, as follows:
U r ( r ) 1 L i = 1 L r ( τ i ) t = 0 T 1 r ln π ( a t ( i ) | s t ( i ) ; r )
where a t ( i ) represents the action value of the agent at time t of the i th trajectory, and s t ( i ) represents the state of the agent at time t of the i th trajectory.
Therefore, according to Equation (29), it can be calculated to evaluate the actions a t taken by the agent in the current state s t .
The algorithm flowchart is shown in Figure 6.

4.4. Pseudocode

Table 1 shows the pseudocode of the proposed algorithm.
The proposed algorithm flowchart is shown in Figure 7.

5. Experimental Study

5.1. Hardware Configuration and Evaluation Indicators

The recognition model described in this article is built on Python 3.9 and Pytorch 1.12.0 GPU framework. The training and testing hardware environments are shown in Table 2.
To evaluate the performance of the model, we used evaluation metrics such as confusion matrix, accuracy, precision, recall, F1-score and ROC-AUC. Table 3 shows the confusion matrix of the classification results, providing specific data for performance evaluation. Among them, TP represents the number of samples correctly predicted as positive by the model; FP denotes the number of samples incorrectly predicted as positive; TN indicates the number of samples correctly predicted as negative; and FN refers to the number of samples incorrectly predicted as negative.
Accuracy measures the proportion of correct predictions among all predictions made by the model, reflecting its overall performance; Precision measures the proportion of truly positive samples among those predicted as positive, indicating the accuracy of positive predictions; Recall measures the proportion of successfully predicted positive samples among all actual positive samples, reflecting the model’s ability to identify all relevant instances; F1-score reflects the model’s comprehensive performance in identifying positive-class samples, balancing both the accuracy of positive predictions and the coverage capability for positive instances; ROC-AUC indicates the model’s ability to distinguish between positive and negative samples across different classification thresholds. The value ranges of the above five metrics are all [0, 1], with higher values indicating better model performance. The definitions of Accuracy, Precision, Recall, F1-score and ROC-AUC are as follows.
Accuracy = T P + T N T P + T N + F P + F N
Precision = T P T P + F P
Recall = T P T P + F N
F 1 - score = 2 × T P 2 T P + F P + F N
FPR = F P F P + T N
ROC - AUC = 0 1   Recall ( x )   d ( FPR ( x ) )

5.2. Data Description

The flame image data used in this study was sourced from a MSWI power plant in Beijing. Due to the limited field of view of industrial cameras, one camera was installed at each end of the grate to capture flame videos.
We collected left and right single-channel flame videos from an MSWI power plant at a frame rate of 25 frames per second. Guided by expert experience, representative combustion state segments were then selected from the collected videos. The selected segment duration was 54 h and 49 min for the left grate and 44 h and 45 min for the right grate. These segments were sampled at a rate of one frame per minute to generate flame images. The specific distribution of the various typical combustion states is shown in Table 4, yielding 3289 images for the left grate and 2685 images for the right grate. These images cover flame states under multiple working conditions, including different waste input volumes, different grate speeds, and different air velocities, to ensure sample diversity and dataset representativeness. After randomly shuffling the dataset, it was split into training and testing sets in a 7:3 ratio.
Remark 1. 
The composition of waste shows significant variability due to residents’ living habits and seasonal consumption patterns. During combustion, water evaporation results in relatively low flame temperatures, which typically appear as low-brightness features in images. In contrast, dry waste produces higher image brightness and more dynamic flame contours. This variability in the dataset directly impacts the feature extraction logic of the model, affecting recognition stability across different scenarios. To mitigate the impact of such variability on the model’s generalization capability, future research will expand the dataset coverage to ensure a comprehensive sample distribution that accounts for fluctuations in seasonal waste composition and environmental conditions.

5.3. Experimental Result

5.3.1. Deep Features and Hyperparameter Selection Results

Table 5 shows the deep features and hyperparameter optimization results of ViT-IDFC based on PDE.
According to Table 5, after PDE optimization, both the feature selection parameters and model hyperparameters of the left and right grate recognition models have undergone changes. Among them, the feature selection parameters for both grates have selected 7 layers of deep features as input to the recognition model. However, the specific deep feature layers selected differ between the left and right grates, which is related to the distribution differences in the flame images from the two grates.

5.3.2. PDE Parameter Optimization Results

The optimization results of PDE parameters based on LSTM network are shown in Figure 8.
As shown in Figure 8, during the optimization process of PDE mutation factor and crossover factor based on LSTM network, F and CR gradually approach 1 in the optimization process of the left grate recognition model, while F and CR gradually approach 0.5 in the optimization process of the right grate recognition model. In the process of learning LSTM parameters based on RL, the reward value of RL is consistent with the accuracy value of the model and shows an increasing trend. This indicates that the agent gradually masters the experience of making the recognition model achieve the best performance during the learning process, and ultimately achieves the best recognition performance through parameter optimization.

5.3.3. Optimize Model Recognition Results

Table 6 shows the recognition results of the optimized model before and after deep feature and hyperparameter optimization.
As shown in Table 6, the RL-LSTM-PDE optimization algorithm has improved the recognition performance for both the left and right grate models through multi-level optimization based on the integration of RL, LSTM, and PDE. Specifically, the recognition accuracy increased by 0.51% for the left grate and 0.74% for the right grate.
The trend of RL reward value and recognition model accuracy value changing with epoch is shown in.
As shown in Figure 9, during the process of learning LSTM parameters based on RL, the reward value of RL is consistent with the accuracy value of the model and shows an increasing trend. This indicates that the agent gradually obtains the experience of achieving the optimal recognition effect of the model during the learning process, and ultimately achieves the best recognition effect through parameter optimization.
Figure 10 shows the confusion matrix of the optimized left/right grate recognition results.
In the matrix shown in Figure 10, C, M, P, and Z represent the four flame combustion states: Channeling burning, Smoldering, Partial burning, and Normal burning, respectively. The rows of the matrix correspond to the predicted combustion states, while the columns represent the true combustion states by the model.
The numbers along the diagonal indicate the count of correctly predicted samples. For example, the value in the first row and first column represents the number of samples correctly predicted as Channeling burning. The off-diagonal elements represent misclassified samples. For instance, the value in the first row and second column indicates the number of samples where Channeling burning was incorrectly predicted as Smoldering. The misclassification count for each combustion state is calculated by summing the off-diagonal elements in each row of the matrix. The optimal combustion state is determined as the category with the highest diagonal value and the lowest misclassification count.
From Figure 10, the combustion categories of both the left and right grate are prone to misidentification. This is because the combustion state often occurs in the transition stage of other combustion states, and therefore the flame image features are coupled with other categories. The Partial burning state is the least prone to misidentification, as its flame image features are relatively simple and differ significantly from other states.

5.3.4. Computational Complexity Analysis

The computational complexity of a single forward pass of the ViT-IDFC model is O ( m o d e l ViT - IDFC ) ; the computational complexity of the PDE algorithm is O ( Θ PDE × N P × m o d e l ViT - IDFC ) ; the computational complexity of the LSTM model is O ( Θ PDE × H 2 ) , where H is the size of the hidden layer; and the computational complexity of RL is O ( L × Θ PDE × N P × m o d e l ViT - IDFC ) .
During the training phase, the computational complexity of the LSTM model is significantly lower than that of RL and can be neglected. Therefore, the overall computational complexity of the proposed RL-LSTM-PDE method is denoted O ( L × Θ PDE × N P × m o d e l ViT - IDFC ) . During the inference phase, the time cost is completely decoupled from the optimization framework and is equivalent to the computational cost of a single ViT-IDFC model, denoted as O ( m o d e l ViT - IDFC ) .
Under the standard hardware configuration specified in Table 2, tests conducted on the left grate model demonstrated that completing the ViT-IDFC parameter optimization required approximately 4.2 h when the training epoch was set to 5. During this process, hardware resources were dedicated to core computational tasks including feature extraction, LSTM parameter inference, and PDE population optimization, with no redundant computations. This resource allocation aligns with the conventional load capacity of such hardware systems.

5.4. Method Comparison

To verify the superiority of the proposed method, it was compared with some other methods. The parameter settings are shown in Table 7, and the recognition results of the left/right grate flame images are shown in Table 8 and Table 9.
According to Table 8 and Table 9, compared to other methods, the method proposed in this article has the best optimization effect on the recognition model. The multi-level collaborative optimization architecture adopted in the proposed method achieves a three-layer interaction among RL, LSTM, and PDE. RL globally guides the optimization direction through a reward mechanism; LSTM dynamically interprets the optimization status and finely adjusts the hyperparameters of PDE; and PDE collaboratively explores the solution space with multiple subpopulations. This deeply coupled mechanism significantly enhances global optimization capability and model expressiveness, leading to comprehensive performance that surpasses compared models. Furthermore, the proposed method employs a parallel differential evolution framework, which significantly enhances operational efficiency compared to the RL-LSTM-DE approach. However, due to the incorporation of RL-LSTM networks for parameter prediction and optimization, the computational complexity of the proposed method is higher than that of conventional DE and PDE algorithms. Despite this increased complexity, the method achieves a substantial improvement in recognition accuracy. These results demonstrate that the proposed approach strikes an effective balance between computational efficiency and model performance.

5.5. Ablation Experiment

Table 10 and Table 11 present the ablation experimental results of the RL-LSTM-PDE method proposed in this study, based on the left and right grate flame images, respectively.
According to Table 10 and Table 11, when optimizing the recognition model parameters based on PDE and LSTM-PDE algorithms, the performance improvement is very limited, and even the model performance decreases, indicating that its global search ability is still very limited. When optimizing the recognition model based on the proposed RL-LSTM-PDE algorithm, the model performance can reach its optimal level.

6. Conclusions

To address the challenges of time consumption, high dependency, and difficulty in adaptively achieving optimal results when manually selecting optimization features and hyperparameters based on experience for the MSWI flame combustion state recognition model, an optimization method for collaboratively optimizing deep features and model hyperparameters is proposed. The optimization framework of the proposed method is essentially a general-purpose adaptive model optimizer, whose design is independent of the inherent characteristics of any specific species or dataset. By replacing the target dataset and correspondingly adjusting the preprocessing parameters of the model’s input layer, the framework can be quickly adapted to a variety of application scenarios, such as plant species recognition and combustion state classification in coal-fired power plants, demonstrating its capability for cross-scene migration and adaptation.
The main contributions of this study are as follows: (1) proposing a deep feature and hyperparameter optimization strategy for the recognition model based on RL-LSTM-PDE, which reduces the time-consuming process and inconsistencies caused by reliance on manual experience for feature and hyperparameter selection; (2) transforming the deep feature selection and hyperparameter optimization problem of the combustion state recognition model into a coding design and optimization problem for the PDE algorithm, reinterpreting the determination of mutation and selection factors for the PDE algorithm as an LSTM model prediction task, and framing the optimization of LSTM parameters as an RL problem within the context of optimizing the ViT-IDFC combustion state recognition model; (3) implementing the optimization of deep features and model hyperparameters for the ViT-IDFC combustion state recognition model based on the proposed strategy, and validating its effectiveness using real process data.
In the field of industrial AI applications, the integration of interpretable AI methods has become a crucial component for building trustworthy systems. The ViT-IDFC model used in this article not only visually identifies the key regions of flame images with varying depth features but also demonstrates intrinsic recognition causality through a decision tree-based recognition algorithm that offers traceability and interpretability. These features make the inference process of complex models transparent and traceable, helping operators understand the decision logic behind AI. This transparency is critical in enhancing the reliability of human–machine collaboration, particularly in industrial environments.
Future research should focus on reducing the time cost associated with the operation of this strategy. Additionally, we will collect flame image data under various operating conditions from multiple MSWI facilities and validate the methodology through extendable testing.

Author Contributions

Conceptualization, J.T.; methodology, J.T.; software, X.Y. and J.R.; validation, J.R.; formal analysis, J.T.; investigation, X.Y.; resources, X.Y. and J.R.; data curation, X.Y.; writing—original draft preparation, X.Y.; writing—review and editing, J.T. and W.W.; visualization, X.Y.; supervision, J.T. and W.W.; project administration, J.T.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the General Scientific Research Projects of Liaoning Province Science and Technology Joint Plan Project under Grant 2024JH2/102600083, and in part by Liaoning Provincial Department of Education under Grant JYTMS20230489.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Gómez-Sanabria, A.; Kiesewetter, G.; Klimont, Z.; Schoepp, W.; Haberl, H. Potential for future reductions of global GHG and air pollutants from circular waste management systems. Nat. Commun. 2022, 13, 106. [Google Scholar] [CrossRef]
  2. Ali, K.; Kausar, N.; Amir, M. Impact of pollution prevention strategies on environment sustainability: Role of environmental management accounting and environmental proactivity. Environ. Sci. Pollut. Res. 2023, 30, 88891–88904. [Google Scholar] [CrossRef]
  3. Tang, J.; Xia, H.; Yu, W.; Qiao, J.F. Research status and prospects of intelligent optimization control for municipal solid waste incineration process. Acta Autom. Sin. 2023, 49, 2019–2059. [Google Scholar]
  4. Song, Y.; Xian, X.; Zhang, C.; Zhu, F.; Yu, B.; Liu, J. Residual municipal solid waste to energy under carbon neutrality: Challenges and perspectives for China. Resour. Conserv. Recycl. 2023, 198, 107177. [Google Scholar] [CrossRef]
  5. Tang, L.; Guo, J.; Wan, R.; Jia, M.; Qu, J.; Li, L.; Bo, X. Air pollutant emissions and reduction potentials from municipal solid waste incineration in China. Environ. Pollut. 2023, 319, 121021. [Google Scholar] [CrossRef] [PubMed]
  6. Walser, T.; Limbach, L.K.; Brogioli, R.; Erismann, E.; Flamigni, L.; Hattendorf, B.; Juchli, M.; Krumeich, F.; Ludwig, C.; Prikopsky, K.; et al. Persistence of engineered nanoparticles in a municipal solid-waste incineration plant. Nat. Nanotechnol. 2012, 7, 520–524. [Google Scholar] [CrossRef] [PubMed]
  7. Istrate, I.R.; Galvez-Martos, J.L.; Vázquez, D.; Guillén-Gosálbez, G.; Dufour, J. Prospective analysis of the optimal capacity, economics and carbon footprint of energy recovery from municipal solid waste incineration. Resour. Conserv. Recycl. 2023, 193, 106943. [Google Scholar] [CrossRef]
  8. Fernando, Y.; Tseng, M.L.; Aziz, N.; Ikhsan, R.B.; Wahyuni-TD, I.S. Waste-to-energy supply chain management on circular economy capability: An empirical study. Sustain. Prod. Consum. 2022, 31, 26–38. [Google Scholar] [CrossRef]
  9. Kalyani, K.A.; Pandey, K.K. Waste to energy status in India: A short review. Renew. Sustain. Energy Rev. 2014, 31, 113–120. [Google Scholar] [CrossRef]
  10. Liu, J.; Hu, L.; Tang, L.; Ren, J. Utilisation of municipal solid waste incinerator (MSWI) fly ash with metakaolin for preparation of alkali-activated cementitious material. J. Hazard. Mater. 2021, 402, 123451. [Google Scholar] [CrossRef]
  11. Fan, C.; Wang, B.; Qi, Y.; Liu, Z. Characteristics and leaching behavior of MSWI fly ash in novel solidification/stabilization binders. Waste Manag. 2021, 131, 277–285. [Google Scholar] [CrossRef]
  12. Song, L.; Sun, Y.; Song, J.; Feng, Z.; Gao, J.; Yao, Q. From “not in my backyard” to “please in my backyard”: Transforming the local responses toward a waste-to-energy incineration project in China. Sustain. Prod. Consum. 2024, 45, 104–114. [Google Scholar] [CrossRef]
  13. Liang, X.; Kurniawan, T.A.; Goh, H.H.; Zhang, D.; Dai, W.; Liu, H.; Goh, K.C.; Othman, M.H.D. Conversion of landfilled waste-to-electricity (WTE) for energy efficiency improvement in Shenzhen (China): A strategy to contribute to resource recovery of unused methane for generating renewable energy on-site. J. Clean. Prod. 2022, 369, 133078. [Google Scholar] [CrossRef]
  14. Liang, D.; Wang, F.; Lv, G. The resource utilization and environmental assessment of MSWI fly ash with solidification and stabilization: A review. Waste Biomass Valorization 2024, 15, 37–56. [Google Scholar] [CrossRef]
  15. Cao, C.; Zhang, Q.; Li, M.H.; Wang, S.Y. Flame combustion state recognition in municipal solid waste incineration processes based on image multi-threshold segmentation and DQN-PL model. Energy 2025, 331, 136967. [Google Scholar] [CrossRef]
  16. Zhou, C.; Cao, Y.; Yang, S. Video Based Combustion State Identification for Municipal Solid Waste Incineration. IFAC-Pap. 2020, 53, 13448–13453. [Google Scholar] [CrossRef]
  17. Huang, S. Research on the Diagnosis Method of Large scale Household Waste Incineration Process Based on Machine Vision. Zhejiang University: Hangzhou, China, 2020. [Google Scholar]
  18. Zhou, Z.C. Research on Combustion State Diagnosis of Garbage Incinerator Based on Image Processing and Artificial Intelligence. Southeast University: Nanjing, China, 2015. [Google Scholar]
  19. Qiao, J.F.; Duan, H.S.; Tang, J.; Meng, X. MSWI combustion condition recognition based on flame image color features. Control Eng. 2022, 29, 1153–1161. [Google Scholar]
  20. Guo, H.T.; Tang, J.; Ding, H.X.; Qiao, J.F. Combustion states recognition method of MSWI process based on mixed data enhancement. Acta Autom. Sin. 2024, 50, 560–575. [Google Scholar]
  21. Pan, X.T.; Tang, J.; Xia, H.; Yu, W.; Qiao, J.F. Combustion state identification of MSWI processes using ViT-IDFC. Eng. Appl. Artif. Intell. 2023, 126, 106893. [Google Scholar] [CrossRef]
  22. Bian, C.; Zhou, Y.W.; Li, M.Q.; Qian, C. Stochastic population update can provably be helpful in multi-objective evolutionary algorithms. Artif. Intell. 2025, 341, 104308. [Google Scholar] [CrossRef]
  23. Singh, D.; Kaur, M.; Jabarulla, M.Y.; Kumar, V.; Lee, H. Evolving fusion-based visibility restoration model for hazy remote sensing images using dynamic differential evolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
  24. Singh, D.; Kumar, V.; Vaishali; Kaur, M. Classification of COVID-19 patients from chest CT images using multi-objective differential evolution–based convolutional neural networks. Eur. J. Clin. Microbiol. Infect. Dis. 2020, 39, 1379–1389. [Google Scholar] [CrossRef]
  25. Chen, X.; Shen, A. Self-adaptive differential evolution with Gaussian–Cauchy mutation for large-scale CHP economic dispatch problem. Neural Comput. Appl. 2022, 34, 11769–11787. [Google Scholar] [CrossRef]
  26. Liang, X.X.; Feng, Y.H.; Ma, Y.; Cheng, G.Q.; Huang, J.C.; Wang, Q.; Zhou, Y.Z. Deep multi-agent reinforcement learning: A survey. Acta Autom. Sin. 2020, 46, 2537–2557. [Google Scholar]
  27. Zhou, P.; Wang, X.; Chai, T. Multiobjective operation optimization of wastewater treatment process based on reinforcement self-learning and knowledge guidance. IEEE Trans. Cybern. 2022, 53, 6896–6909. [Google Scholar] [CrossRef]
  28. Pan, Z.X.; Wang, L.; Wang, J.J.; Lu, J.W. Deep reinforcement learning based optimization algorithm for permutation flow-shop scheduling. IEEE Trans. Emerg. Top. Comput. Intell. 2021, 7, 983–994. [Google Scholar] [CrossRef]
  29. Tian, Y.; Li, X.P.; Ma, H.P.; Zhang, X.Y.; Tian, K.C.; Jin, Y.C. Deep reinforcement learning based adaptive operator selection for evolutionary multi-objective optimization. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 7, 1051–1064. [Google Scholar] [CrossRef]
  30. Hu, Z.; Gong, W.; Li, S. Reinforcement learning-based differential evolution for parameters extraction of photovoltaic models. Energy Rep. 2021, 7, 916–928. [Google Scholar] [CrossRef]
  31. Tan, Z.P.; Li, K.S. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Appl. Soft Comput. 2021, 111, 107678. [Google Scholar] [CrossRef]
  32. Sun, J.; Liu, X.; Bäck, T.; Xu, Z. Learning Adaptive Differential Evolution Algorithm From Optimization Experiences by Policy Gradient. IEEE Trans. Evol. Comput. 2021, 25, 666–680. [Google Scholar] [CrossRef]
  33. Li, T.L.; Li, Y.F.; Wang, F.; Gong, C.; Zhang, J.R.; Ma, H. Improved parallel differential evolution algorithm with small population for multi-period optimal dispatch problem of microgrids. Energies 2025, 18, 3852. [Google Scholar] [CrossRef]
Figure 1. MSWI process flow.
Figure 1. MSWI process flow.
Sustainability 17 08872 g001
Figure 2. Typical flame combustion states during MSWI process.
Figure 2. Typical flame combustion states during MSWI process.
Sustainability 17 08872 g002
Figure 3. Optimization modeling strategy.
Figure 3. Optimization modeling strategy.
Sustainability 17 08872 g003
Figure 4. PDE algorithm flowchart.
Figure 4. PDE algorithm flowchart.
Sustainability 17 08872 g004
Figure 5. LSTM algorithm flowchart.
Figure 5. LSTM algorithm flowchart.
Sustainability 17 08872 g005
Figure 6. RL algorithm flowchart.
Figure 6. RL algorithm flowchart.
Sustainability 17 08872 g006
Figure 7. Algorithm flowchart.
Figure 7. Algorithm flowchart.
Sustainability 17 08872 g007
Figure 8. PDE parameter optimization results.
Figure 8. PDE parameter optimization results.
Sustainability 17 08872 g008
Figure 9. Relation between values of RL reward and recognition model accuracy and learning epoch.
Figure 9. Relation between values of RL reward and recognition model accuracy and learning epoch.
Sustainability 17 08872 g009
Figure 10. Confusion matrix of model recognition results.
Figure 10. Confusion matrix of model recognition results.
Sustainability 17 08872 g010
Table 1. Pseudocode for flame combustion state recognition model based on deep feature and hyperparameter collaborative optimization.
Table 1. Pseudocode for flame combustion state recognition model based on deep feature and hyperparameter collaborative optimization.
Input: Original flame image dataset { X n Flame , y n Flame } n = 1 N
Output: Flame image combustion state prediction output ( y ^ n Flame ) n = 1 N
Step1:Randomly initialize the population of PDE P 0 , initialize the long-term memory value C 0 = 0 and h 0 = 0 of LSTM
Step 2:Set the iteration number Epoch = 1
Step 3:Evaluate the fitness value of the population F 0 , calculate fitness value statistic U 0 , obtain input for LSTM A 0 = [ U 0 , F 0 ]
Step 4:Input A 0 , h 0 and C 0 into the LSTM network
Step 5:LSTM obtains the values of PDE parameters F 0 and C R 0 based on the optimization experience of trained agents
Step 6:PDE optimizes the deep features and hyperparameters of the ViT-IDFC recognition model based on F 0 and C R 0
Step 7:When the pre-set maximum iteration number of Epoch is reached, the process is stopped
Step 8:Return the deep feature selection parameters [ S P 1 ,   S P 2 ,   ,   S P 12 ] best and the model hyperparametersReturn the deep feature selection parameters that optimize the performance of the ViT-IDFC model, as well as the M best and J best hyperparameters of DFC
Step 9:Construct the optimal recognition model
Table 2. Hardware configuration.
Table 2. Hardware configuration.
DeviceConfiguration
SystemWindows10
Memory32 GB
GPUInvidia GeForce RTX3060Ti
CPU11th Gen Intel(R) Core(TM) i9-11900K
Table 3. Classification result confusion matrix.
Table 3. Classification result confusion matrix.
PositiveNegative
PositiveTP (true positive)FN (false negative)
NegativeFP (false positive)TN (true negative)
Table 4. Flame image dataset.
Table 4. Flame image dataset.
Data SetTotal NumberNormal BurningPartial BurningChanneling BurningSmolderingSize
Left grate 328965511761044414720 × 576
Right grate26855641002534585720 × 576
Table 5. Hyperparameter optimization results.
Table 5. Hyperparameter optimization results.
GrateBefore and After OptimizationParameter Settings
[ S P 1 ,   S P 2 ,   ,   S P 12 ] J M
LeftBefore optimization[0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]503
After optimization[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1]1001
RightBefore optimization[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]503
After optimization[0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]1001
Table 6. Results of hyperparameter optimization.
Table 6. Results of hyperparameter optimization.
GrateBefore and After OptimizationEvaluation Indicators
AccuracyPrecisionRecallF1-ScoreROC-AUC
LeftBefore optimization0.92890.93510.92250.92840.9925
After optimization0.93400.93890.92830.93330.9931
RightBefore optimization0.90300.91200.89610.90270.9870
After optimization0.91040.91690.90320.90920.9879
Table 7. Parameter setting statistics of comparison methods.
Table 7. Parameter setting statistics of comparison methods.
Method(Sub) Population SizeSub Population NumberEpochTrajectory LengthTrajectory Number
DE5-5--
PDE5335--
RL-LSTM-DE53535
This study102523
Table 8. Comparative experimental results of left grate flame image models.
Table 8. Comparative experimental results of left grate flame image models.
MethodParameter SettingsAccuracyPrecisionRecallF1-ScoreROC-AUC
( X ViT k ) k = 1 12 J M
DE[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]4350.91270.91950.90360.91090.9891
PDE[0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0]5110.92890.93190.92360.92760.9929
RL-LSTM-DE[1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1]9050.91880.92400.91060.91680.9905
This study[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1]10010.93400.93890.92830.93330.9931
Table 9. Comparative experimental results of right grate flame image models.
Table 9. Comparative experimental results of right grate flame image models.
MethodParameter SettingsAccuracyPrecisionRecallF1-ScoreROC-AUC
( X ViT k ) k = 1 12 J M
DE[1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]3710.88060.88880.86820.87500.9819
PDE[0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]3910.90050.90850.89060.89700.9848
RL-LSTM-DE[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]2720.88680.89310.87750.88350.9838
This study[0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]10010.91040.91690.90320.90920.9879
Table 10. Experimental results of left grate ablation.
Table 10. Experimental results of left grate ablation.
MethodParameter SettingsAccuracyPrecisionRecallF1-ScoreROC-AUC
( X ViT k ) k = 1 12 J M
PDE[0, 1, 0, 1, 0, 10, 1, 1, 0, 1, 0]5110.92890.93190.92360.92760.9929
LSTM-PDE[0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]5510.92990.93990.91830.92770.9915
RL-LSTM-PDE[1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1]10010.93400.93890.92830.93330.9931
Table 11. Results of right grate ablation experiment.
Table 11. Results of right grate ablation experiment.
MethodParameter SettingsAccuracyPrecisionRecallF1-ScoreROC-AUC
( X ViT k ) k = 1 12 J M
PDE[0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]3910.90050.90850.89060.89700.9848
LSTM-PDE[1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]2110.90300.90920.89490.90090.9869
RL-LSTM-PDE[0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1]10010.91040.91690.90320.90920.9879
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tang, J.; Yang, X.; Wang, W.; Rong, J. Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation. Sustainability 2025, 17, 8872. https://doi.org/10.3390/su17198872

AMA Style

Tang J, Yang X, Wang W, Rong J. Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation. Sustainability. 2025; 17(19):8872. https://doi.org/10.3390/su17198872

Chicago/Turabian Style

Tang, Jian, Xiaoxian Yang, Wei Wang, and Jian Rong. 2025. "Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation" Sustainability 17, no. 19: 8872. https://doi.org/10.3390/su17198872

APA Style

Tang, J., Yang, X., Wang, W., & Rong, J. (2025). Evaluating Municipal Solid Waste Incineration Through Determining Flame Combustion to Improve Combustion Processes for Environmental Sanitation. Sustainability, 17(19), 8872. https://doi.org/10.3390/su17198872

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop