1. Introduction
Pollution control and prevention are crucial for the sustainable development of the global economy, the protection of the environment, and the safeguarding of public health, making them a global priority [1,2]. Municipal solid waste (MSW) is growing annually by 8% to 10% worldwide [3,4]. In developing countries such as China, many cities are at risk of being overwhelmed by MSW, leading to significant environmental challenges [5]. MSW incineration (MSWI) is a complex waste-to-energy (WTE) process that plays a key role in addressing urban environmental issues [6,7,8] while supporting renewable energy recycling [9]. Although MSWI is a scientifically validated waste treatment method, emissions from these plants are a major source of pollution [10,11], and the plants often face opposition due to the “Not in My Backyard” (NIMBY) effect [12,13]. The combustion stage of the MSWI process is the primary source of environmental indicators (EIs) such as NOx, CO, HCl, and SO₂ [14]. Flame combustion state recognition technology enables early detection of abnormal combustion conditions, thereby guiding the control system to promptly adjust operational parameters such as grate speed and facilitating a rapid return to normal combustion. Under abnormal conditions, incomplete fuel combustion leads to increased carbon monoxide emissions, a significantly higher risk of dioxin formation due to suboptimal combustion environments, and reduced power generation efficiency [3]. Furthermore, deteriorating conditions may even cause unplanned shutdowns and equipment wear. Therefore, accurate identification of flame combustion states is crucial for pollution reduction and optimized control.
Due to differences in operating equipment, the uncertainty in the maintenance of MSWI plants, fluctuations in the composition of MSW, and strong interference from the environment within the incinerator, accurately identifying the flame combustion state is challenging [3]. Additionally, manual recognition of combustion status by experts lacks stability, making it difficult to ensure consistent operational efficiency and pollution reduction in the MSWI process [15]. Expert manual identification can instead be modeled by constructing a data-driven mapping function from image space to state space, thereby transforming human visual experience based on prior knowledge into a computable and optimizable mathematical model. Leveraging the representational learning capability of deep neural networks, discriminative deep-level information is automatically extracted from raw data, and high-precision modeling is achieved by learning the underlying complex patterns and decision boundaries. A high-accuracy artificial intelligence-based combustion state recognition model can thus overcome limitations of human experts, such as subjectivity, limited experience, and cognitive biases, in terms of generalization, robustness, and judgment consistency. This provides effective support for the intelligent operation of the MSWI process.
Extensive research has been conducted on the identification of MSWI flame combustion states. Zhou et al. proposed a PCA-K-means clustering approach to distinguish between abnormal and normal flames, using only two states for classification [16]. Huang extracted key parameters, such as the grayscale mean, flame area ratio, and flame high-temperature rate, from combustion images through image processing technology. These parameters were then input into K-nearest neighbor (KNN) and convolutional neural network (CNN) models for partial burning state recognition [17]. However, this approach relies on only a few physical features. Zhou also extracted 12 physical features from flame images based on expert knowledge and used a backpropagation neural network (BPNN) for combustion state recognition [18]. Additionally, Qiao et al. constructed a combustion state recognition model based on flame image color moment features and applied the least squares support vector machine (LS-SVM) [19]. Guo et al. took a different approach by generating abnormal working condition images using deep convolutional generative adversarial networks (DCGANs) to construct a CNN for combustion state recognition. This method addresses the issue of limited abnormal combustion images [20]. However, it only classifies combustion states based on the position of the combustion line, which provides limited local information. In contrast, Pan et al. created a flame combustion state recognition database using global image features and employed an improved vision transformer-based deep forest classification (ViT-IDFC) algorithm for state recognition [21]. The key limitation of the MSWI flame state recognition models discussed above is that determining hyperparameters often requires expert experience and repeated experimentation, which is time-consuming, resource-intensive, and computationally expensive. Furthermore, these methods do not guarantee that the model will achieve optimal recognition performance.
To address these issues, this article conducts research on flame combustion state recognition through the collaborative optimization of deep features and model hyperparameters. Evolutionary algorithms (EAs) are commonly used to solve such optimization problems [22]. EAs search among all possible combinations for a globally optimal or near-optimal solution, thereby enabling the model to reach its best performance. The optimization process simulates the mechanisms of “natural selection” and “genetic inheritance” observed in biological systems. Each “population” consists of multiple potential solutions, and the algorithm iteratively performs operations such as selection, crossover, and mutation. Based on the evaluation results of each solution, superior individuals are retained while inferior ones are eliminated, gradually evolving solutions with progressively improved performance. Among EAs, the Differential Evolution (DE) algorithm is a heuristic optimization technique primarily used to solve continuous problems in real space. Due to its automatic adaptation and ease of implementation, DE has been widely adopted. For example, Dilbag et al. optimized the hyperparameters of a remote sensing image visibility restoration model using dynamic DE [23], Singh et al. adjusted the initial parameters of a CNN using multi-objective DE [24], and Chen et al. introduced Gaussian–Cauchy mutation and parameter adaptation strategies in DE to solve the economic scheduling problem of large-scale cogeneration systems [25]. These studies achieved improved results by optimizing model parameters using the DE algorithm. However, DE faces challenges such as stagnation and premature convergence during the optimization process, which are closely linked to the control parameters of the algorithm, such as the mutation factor and the crossover rate. For each optimization problem, the most suitable parameter settings must strike a balance between maintaining population diversity and enhancing convergence speed. Typically, the control parameters of DE are set based on optimization experience obtained through repeated experiments.
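The mutation, crossover, and selection operations described above can be illustrated with a minimal sketch of the classic DE/rand/1/bin scheme. This is an illustration under stated assumptions, not the parallel DE used in this article: the `sphere` objective, the bounds, the population size, and the fixed values of the mutation factor `F` and crossover rate `CR` are all hypothetical choices.

```python
import random


def sphere(x):
    """Toy objective to minimise (assumption): sum of squares, optimum at 0."""
    return sum(v * v for v in x)


def differential_evolution(objective, dim, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """DE/rand/1/bin with a fixed mutation factor F and crossover rate CR.

    Individuals are replaced in place as soon as a better trial is found,
    a common simplification of the generation-wise update.
    """
    rng = random.Random(seed)
    low, high = bounds
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover: at least one gene always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            trial = [min(max(v, low), high) for v in trial]  # clip to the bounds
            # Greedy selection: keep the trial only if it is no worse.
            f_trial = objective(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


best_x, best_f = differential_evolution(sphere, dim=5, bounds=(-5.0, 5.0))
print(best_f)
```

A larger `F` widens the mutation step and preserves diversity, whereas a larger `CR` copies more genes from the mutant and tends to speed up convergence, which is the trade-off discussed above.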
Reinforcement learning (RL) achieves task goals through interactive learning between agents and the environment [26]. Unlike supervised and unsupervised learning, RL focuses on decision-making based on environmental feedback. In the RL framework, agents learn to maximize long-term cumulative rewards through interaction with the environment. Its advantages include the ability to make sequential decisions, adapt to environmental changes, and balance exploration and exploitation, as well as a low demand for labeled data and high generalizability [27]. Currently, RL is gradually becoming an important tool for solving model hyperparameter optimization problems. For example, Pan et al. designed a deep neural network, PFSPNet [28], to achieve an end-to-end output model that is not limited by problem size, aimed at solving the scheduling problem in an assembly line workshop. They applied a deep RL execution evaluation strategy to optimize the network parameters of the model. The results showed that this approach outperformed existing heuristic algorithms. Tian et al. addressed the exploration and exploitation challenges faced by EAs in operator selection during the search process [29]. They treated decision variables as states and candidate operators as actions, using deep neural networks to learn and estimate the Q-value of each action in a given state. Their RL-based method was then used for operator selection. Experimental results demonstrated that the method could successfully identify the optimal operator for each parent.
Furthermore, some scholars have explored how to combine RL with DE in the search process to achieve adaptive selection of optimal parameters. For example, Hu et al. proposed integrating RL with DE in photovoltaic models [30], evaluating fitness function values during iterations to determine action rewards and then using RL to adjust the parameter values so as to identify the most suitable algorithm settings for the environmental model. Tan et al. introduced a hybrid mutation strategy DE algorithm based on a deep Q-network (DQN) [31], called DEDQN, which implements adaptive selection of mutation strategies during the evolutionary process through RL. Sun et al. used Long Short-Term Memory (LSTM) as the parameter controller for DE and employed an RL algorithm to optimize the parameters of the LSTM [32]. These studies leverage the reward/punishment mechanism of RL, enabling agents to continuously learn the optimal parameter combinations through trial and error, thereby improving model performance. However, there is a lack of research on improving the operational efficiency of the DE algorithm and on performing deep feature and model hyperparameter collaborative optimization in the context of flame combustion state recognition.
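The reward-driven adjustment of DE parameters described above can be sketched with a deliberately simplified agent. The following is an epsilon-greedy bandit over a few candidate (mutation factor, crossover rate) pairs, not the DQN- or LSTM-based controllers of the cited works; the class name `ParameterBandit`, the candidate pairs, the reward definition, and the simulated rewards are all assumptions made for illustration.

```python
import random


class ParameterBandit:
    """Epsilon-greedy agent that picks one (F, CR) pair per generation."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)  # running mean reward per pair

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=self.values.__getitem__)

    def update(self, action, reward):
        # Incremental update of the mean reward observed for this pair.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


def improvement_reward(prev_best, new_best):
    """Reward (assumption): relative decrease of the minimised best fitness."""
    if prev_best <= 0:
        return 0.0
    return max(0.0, (prev_best - new_best) / prev_best)


# Simulated environment (hypothetical): each pair yields a fixed reward.
agent = ParameterBandit([(0.3, 0.5), (0.5, 0.9), (0.9, 0.1)], epsilon=0.2, seed=1)
simulated_rewards = [0.1, 0.5, 0.2]
for _ in range(1000):
    action = agent.select()
    agent.update(action, simulated_rewards[action])
print(agent.candidates[max(range(3), key=agent.values.__getitem__)])
```

In an actual DE run, the simulated reward would be replaced by `improvement_reward` computed from the best fitness before and after the generation that used the selected pair.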
In the MSWI process, variations in process parameters such as MSW heating value and air supply volume directly influence the flame combustion state, which manifests as differences in flame image data distribution. Building on prior knowledge from field experts, reference [21] proposed a global feature-oriented flame combustion state image dataset and introduced the ViT-IDFC recognition algorithm. However, the multilayer feature selection and hyperparameter determination of this model mainly depend on manual experience. When the distribution of the flame combustion state dataset changes, the effectiveness of manual parameter selection decreases significantly. Thus, there is a need to develop a method capable of adaptively selecting hyperparameters based on changes in dataset distribution. Such a method would ensure that when process parameter variations such as MSW heating value or air supply volume cause a decline in model recognition accuracy, this performance degradation signal promptly triggers the optimization framework, driving it to automatically adjust feature selection parameters and model hyperparameters to regain adaptation to new operating conditions. To improve the performance of the combustion state recognition model based on global flame features for MSWI processes, it is crucial to address the following: (1) how to select appropriate feature selection parameters and recognition model hyperparameters for encoding, and (2) how to optimize the control parameters of the parallel DE (PDE) algorithm to accommodate changes in the image dataset distribution.
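The idea that a drop in recognition accuracy should trigger re-optimization can be sketched as a rolling-accuracy monitor. The source does not specify how the degradation signal is computed, so the class name `DegradationTrigger`, the window length, and the tolerance below are purely hypothetical.

```python
from collections import deque


class DegradationTrigger:
    """Signals re-optimisation when rolling accuracy falls below a threshold."""

    def __init__(self, baseline_accuracy, tolerance=0.05, window=50):
        self.threshold = baseline_accuracy - tolerance
        self.recent = deque(maxlen=window)  # most recent correctness flags

    def observe(self, correct):
        """Record one prediction outcome; return True if retuning is needed."""
        self.recent.append(1.0 if correct else 0.0)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough evidence yet
        rolling_accuracy = sum(self.recent) / len(self.recent)
        return rolling_accuracy < self.threshold


trigger = DegradationTrigger(baseline_accuracy=0.9, tolerance=0.05, window=10)
outcomes = [True] * 10 + [False, False]
signals = [trigger.observe(ok) for ok in outcomes]
print(signals[-1])
```

When `observe` returns `True`, the optimization framework would be invoked on recent data to re-select feature selection parameters and model hyperparameters.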
In response to the above challenges, this article proposes a ViT-IDFC recognition model optimized by RL-LSTM-PDE. First, the hyperparameter selection process of the ViT-IDFC combustion state recognition model is encoded as a PDE optimization problem. Next, control parameters—such as the mutation factor and crossover factor of PDE—are modeled using LSTM, with the LSTM output providing the optimal hyperparameters for the ViT-IDFC combustion state recognition model. Finally, the network parameters of the LSTM model are derived through PDE during the empirical learning process, which optimizes both ViT features and IDFC model hyperparameters. The innovations of this article are as follows: (1) a novel optimization modeling strategy for deep features and model hyperparameters, which formalizes the selection process as an observable Markov Decision Process (MDP); (2) the use of a multi-layer optimization approach, combining PDE for model parameter optimization, LSTM for predicting the evolutionary parameters of PDE, and RL for optimizing PDE parameters based on model generalization performance, to solve the MDP problem; (3) the collaborative optimization of ViT-IDFC model features and hyperparameters in the MSWI process, which improves the model’s generalization performance.
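To make the role of RL in this scheme more concrete, the following minimal sketch applies a REINFORCE-style policy gradient update to a single control parameter. It is a strong simplification under stated assumptions: the LSTM controller is replaced by one learnable mean `mu` of a Gaussian policy over the mutation factor, and `toy_reward` is a hypothetical stand-in for the generalization performance of the recognition model.

```python
import random


def toy_reward(f_value):
    """Hypothetical reward: peaks when the mutation factor equals 0.6."""
    return -(f_value - 0.6) ** 2


def reinforce_mutation_factor(steps=2000, lr=0.01, sigma=0.1, seed=0):
    """REINFORCE with a Gaussian policy N(mu, sigma^2) and a running baseline."""
    rng = random.Random(seed)
    mu = 0.2        # policy mean: the only learnable parameter in this sketch
    baseline = 0.0  # running average reward, reduces gradient variance
    for _ in range(steps):
        action = rng.gauss(mu, sigma)               # sample a mutation factor
        reward = toy_reward(action)
        grad_log_prob = (action - mu) / sigma ** 2  # d log N(action) / d mu
        mu += lr * (reward - baseline) * grad_log_prob
        baseline += 0.1 * (reward - baseline)
    return mu


learned_mu = reinforce_mutation_factor()
print(round(learned_mu, 2))
```

In the full method, the sampled control parameters would drive PDE, and the reward would come from evaluating the resulting ViT-IDFC model rather than from a closed-form function.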
3. Optimization Modeling Strategy
The optimization modeling strategy proposed in this article is shown in Figure 3. The quantities involved in Figure 3 are the flame image dataset and its corresponding combustion states; the deep feature selection parameters of the ViT-IDFC combustion state recognition model submodule; the model hyperparameters of IDFC, such as the number of decision trees and the minimum number of partition samples for leaf nodes; the mutation factor and crossover factor of PDE; and the network parameters of the LSTM.
The functions of each module are as follows:
- (1) PDE-based deep feature and hyperparameter optimization module: encodes the feature selection parameters and recognition model hyperparameters of ViT-IDFC as decision variables for PDE optimization, with the fitness characterized by the recognition performance;
- (2) LSTM-based PDE parameter prediction module: an LSTM model is constructed, with the mutation factor and crossover factor of PDE as inputs and their optimized values as the outputs;
- (3) RL-based LSTM parameter optimization module: treats the LSTM as a Markov decision process and uses RL optimization based on a policy gradient algorithm to obtain the optimized control parameters of PDE.
Therefore, the optimization modeling process is as follows. First, based on RL agents that can learn from experience, we obtain the network parameters of the LSTM. Then, based on the LSTM, we determine the mutation factor and crossover factor of PDE. Next, based on PDE, we determine the parameters of the ViT-IDFC model, namely its deep feature selection parameters and the IDFC hyperparameters. Finally, we obtain the optimized recognition model.
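The encoding step of the first module can be illustrated by decoding a real-valued PDE individual into a model configuration. The sketch below is only one possible encoding under stated assumptions: the function name `decode_individual`, the use of a 0.5 threshold for selecting ViT feature layers, and the value ranges for the number of trees and the minimum leaf samples are hypothetical and not taken from the article.

```python
def decode_individual(vector, n_layers, tree_range=(50, 500), leaf_range=(1, 20)):
    """Map a real vector in [0, 1]^(n_layers + 2) to a model configuration.

    The first n_layers genes select which deep feature layers are used;
    the last two genes encode the number of decision trees and the minimum
    number of samples per leaf (both ranges are illustrative assumptions).
    """
    if len(vector) != n_layers + 2:
        raise ValueError("vector length must be n_layers + 2")

    mask = [gene >= 0.5 for gene in vector[:n_layers]]
    if not any(mask):
        # Keep the layer with the largest gene so that some feature remains.
        mask[max(range(n_layers), key=lambda k: vector[k])] = True

    def scale(gene, low, high):
        gene = min(max(gene, 0.0), 1.0)  # guard against out-of-range genes
        return low + round(gene * (high - low))

    return {
        "selected_layers": [k for k, used in enumerate(mask) if used],
        "n_trees": scale(vector[n_layers], *tree_range),
        "min_samples_leaf": scale(vector[n_layers + 1], *leaf_range),
    }


config = decode_individual([0.9, 0.1, 0.6, 0.0, 1.0], n_layers=3)
print(config)
```

The fitness of an individual would then be obtained by training and evaluating the recognition model with the decoded configuration, which is the expensive step that PDE parallelizes.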
6. Conclusions
Manually selecting features and hyperparameters for the MSWI flame combustion state recognition model is time-consuming, depends heavily on experience, and makes it difficult to reach optimal results adaptively. To address these challenges, a method for collaboratively optimizing deep features and model hyperparameters is proposed. The optimization framework of the proposed method is essentially a general-purpose adaptive model optimizer, whose design is independent of the inherent characteristics of any specific species or dataset. By replacing the target dataset and correspondingly adjusting the preprocessing parameters of the model’s input layer, the framework can be quickly adapted to a variety of application scenarios, such as plant species recognition and combustion state classification in coal-fired power plants, demonstrating its capability for cross-scene migration and adaptation.
The main contributions of this study are as follows: (1) proposing a deep feature and hyperparameter optimization strategy for the recognition model based on RL-LSTM-PDE, which reduces the time cost and inconsistencies caused by reliance on manual experience for feature and hyperparameter selection; (2) transforming the deep feature selection and hyperparameter optimization problem of the combustion state recognition model into a coding design and optimization problem for the PDE algorithm, reinterpreting the determination of the mutation and crossover factors of the PDE algorithm as an LSTM model prediction task, and framing the optimization of LSTM parameters as an RL problem within the context of optimizing the ViT-IDFC combustion state recognition model; (3) implementing the optimization of deep features and model hyperparameters for the ViT-IDFC combustion state recognition model based on the proposed strategy, and validating its effectiveness using real process data.
In the field of industrial AI applications, the integration of interpretable AI methods has become a crucial component for building trustworthy systems. The ViT-IDFC model used in this article not only visually identifies the key regions of flame images with varying depth features but also demonstrates intrinsic recognition causality through a decision tree-based recognition algorithm that offers traceability and interpretability. These features make the inference process of complex models transparent and traceable, helping operators understand the decision logic behind AI. This transparency is critical in enhancing the reliability of human–machine collaboration, particularly in industrial environments.
Future research should focus on reducing the time cost associated with the operation of this strategy. Additionally, we will collect flame image data under various operating conditions from multiple MSWI facilities and validate the methodology through extended testing.