Article

Benchmark Operational Condition Multimodal Dataset Construction for the Municipal Solid Waste Incineration Process

1 School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
2 Beijing Laboratory of Smart Environmental Protection, Beijing 100124, China
* Author to whom correspondence should be addressed.
Sustainability 2026, 18(5), 2282; https://doi.org/10.3390/su18052282
Submission received: 3 August 2025 / Revised: 20 September 2025 / Accepted: 25 September 2025 / Published: 27 February 2026
(This article belongs to the Section Waste and Recycling)

Abstract

Municipal solid waste incineration (MSWI) is a typical complex industrial process that supports the sustainable development of the global environment. It is operated in a “perception–prediction–control” mode by domain experts using multimodal information. To harness the complementary value of different modal data, prevent information conflicts or fusion failures caused by misalignment, and ensure the availability of multimodal datasets and the reliability of analytical conclusions, constructing a benchmark operational condition multimodal dataset is essential. The objective of this work was to create a multimodal benchmark dataset for the operational conditions of the MSWI process. Based on a description of the MSWI process and an analysis of the characteristics of the multimodal data, the process data are first preprocessed under different missing scenarios, including missing value handling and outlier handling. Then, single-frame images of the flame video are captured on a minute scale, and missing combustion lines are quantized using machine vision technology. Finally, the combustion line quantization (CLQ) values are aligned with the minute-scale process data through the multimodal time synchronization module. Taking an MSWI power plant in Beijing as the research object, combustion flame video and process data under benchmark operating conditions were collected. A hybrid missing value management strategy combining linear interpolation with the linear regression decision tree (LRDT) model improved data integrity, and a spatiotemporally aligned multimodal dataset was constructed. The standardized benchmark operating condition multimodal data support combustion state analysis during the incineration process, pollutant generation prediction, and process optimization.
Therefore, the objectives of ‘reduction, harmlessness, and resource utilization’ of municipal solid waste can be supported, along with addressing land resource shortages, protecting the ecological environment, and promoting the dual carbon goals. In addition, this work provides data and technical support for environmental and urban sustainable development.

1. Introduction

With the advancement of urbanization, the total amount of municipal solid waste (MSW) produced has increased rapidly [1]. This seriously affects the realization of sustainable development of the global environment and has led to the phenomenon of “garbage encirclement” in many cities in developing countries such as China [2]. To promote the sustainable development of the urban environment [3], the handling of MSW is extremely urgent. At present, municipal solid waste incineration (MSWI) technology has been widely applied worldwide due to advantages such as harmlessness, reduction, and resource utilization [4]. However, due to the uncertainty and difficulty of real-time detection of MSW components, differences across geographical regions and seasonal fluctuations, equipment wear and aging, and other factors, the operation of the MSWI process in developing countries such as China relies on a manual “perception–prediction–control” mode in which domain experts use multimodal data such as flame video and process data [5]. Clearly, a benchmark operating condition multimodal dataset is the basis for the academic community to effectively study intelligent optimization control algorithms.
Due to factors such as sensor failures and process interferences, approximately 2–3% of the data from an MSWI power plant in Beijing are missing, affecting subsequent analyses [6]. Therefore, it is necessary to fill in the missing or abnormal data of the MSWI process. The processing methods for missing data are mainly divided into the simple deletion method, the weighting method, and the filling method [7,8,9]. When the sample size is large enough and the proportion of missing data is less than 5%, the deletion method can be used [10]. In low-resource deep text matching scenarios, transfer learning utilizes resource-rich source domain data to alleviate the data scarcity problem in the target domain. Li et al. proposed a cross-domain deep text matching method based on meta-learned instance weighting, which better fits the data distribution in the target domain and thereby improves the model transfer effect [11]. Liang et al. proposed an adaptive K-nearest neighbor missing value filling method for non-uniformly distributed imbalanced datasets, and experiments demonstrated that it achieves high filling accuracy and classification accuracy [12].
To address the issue of data filling for missing features in the water network database, Kabir et al. used the average value of the data to replace the missing values [13]. However, this approach is only suitable for situations with small data scales and limited missing data. Yan et al. interpolated the missing values of the international soil moisture network time series using cyclone global navigation satellite system data, enhancing integrity by fusing satellite and ground data [14]. However, they relied on the accuracy of interpolated data, and the filling effect on long-term continuous missing data remains to be verified. Tzoumpas et al. proposed filling for missing values based on a convolutional neural network (CNN) and bidirectional long short-term memory (LSTM), which can capture local spatial features and long-term and short-term temporal dependencies and is suitable for data with both local fluctuations and temporal correlations [15]. However, it has high requirements for temporal continuity and feature stability, a complex structure, high training costs, and is prone to overfitting with small samples. Deng et al. proposed an expectation-maximization (EM) filling algorithm based on a random process for the abnormal phenomenon of remote monitoring data of concrete pump trucks and used the random process approximation method to compensate for the filling effect of EM [16]. However, this method ignores the local similarity of the data, and the convergence speed gradually slows down as the proportion of missing data increases. Zhao et al. proposed a missing data filling method based on the improved K-means [17], selecting the data with the smallest distance from the missing data as the nearest neighbor to complete the filling. Its limitation lies in how to define the similarity criteria to adapt to the dataset. 
For the missing data in the urban sewage treatment process, a data filling method based on the radial basis function (RBF) neural network and an improved support vector machine is used [18,19]. The literature [20] provides a review of the main methods for handling missing time series data from a data-dimensional perspective. Due to the error sources such as time scale error, phase jump, and ranging system error existing in the original laser ranging interferometry (LRI) measurement phase, Yin et al. carried out fine data preprocessing on the LRI measurement phase, effectively eliminating various errors in the original measurement phase [21]. For the MSWI process, different process variables exhibit varying distribution characteristics over time. Typically, domain experts can accurately infer discontinuous missing values based on the physical meaning of process variables and their distribution changes across different periods. Additionally, the MSWI process requires preprocessing to align process data and eliminate influencing factors, such as unit conversion, to ensure consistency and accuracy in the analysis.
The combustion line is a critical controlled process parameter that reflects the combustion stability and operational safety of the MSWI process. It is typically monitored in the “man-watch” mode at most sites in developing countries, such as China [1]. Combustion line quantization (CLQ) enhances control intelligence and automation through real-time feedback [22]. Moreover, the construction of a multimodal dataset requires matching the process data with flame images at corresponding moments. Data alignment is a core prerequisite for ensuring the effective association and integration of different modal information. Multimodal data (such as process data, images, etc.) often come from heterogeneous acquisition sources and inherently exhibit heterogeneity on the time scale. Without proper alignment, data from different modalities remain isolated information units and cannot establish accurate correlations. Only through time synchronization and alignment can multimodal data achieve one-to-one correspondence in time, providing a consistent informational foundation for subsequent cross-modal retrieval and fusion modeling.
The United Nations Sustainable Development Goals explicitly call for efficient urban solid waste management alongside ecological protection. As a core method for urban solid waste treatment, MSWI faces challenges such as difficult combustion monitoring, limited pollutant early warning, and constrained resource optimization due to the lack of multidimensional “combustion–parameters–emissions” data support, which restricts sustainable development. This study constructs a multimodal MSWI dataset to provide a data foundation for AI modeling and process optimization, promoting the transition of MSWI toward “proactive sustainable management,” resolving the conflict between “waste encirclement” and ecological protection, and supporting urban sustainable development.
In this article, using a specific MSWI power plant in Beijing as a case study, flame video and process data were collected over five days under benchmark operating conditions to construct a multimodal dataset. The innovations presented in this article are as follows: (1) the first construction of a benchmark operating condition multimodal dataset; (2) predicting and filling in missing process data; and (3) quantifying and recovering missing controlled variables based on flame images.

2. Description of MSWI Process

The process flow of a typical grate furnace MSWI mainly includes six process stages: storage and fermentation, solid waste combustion, waste heat exchange, steam power generation, flue gas treatment, and flue gas emission [23]. This study is based on a typical MSWI power plant with a grate furnace in Beijing. The process flow is shown in Figure 1.
The weighed MSW is unloaded into the solid waste pool by a material transport vehicle. After dehydration, fermentation, and homogenization by the grab bucket stirring, it is sent to the hopper and quantitatively supplied to the drying grate by the feeder. After being dried by primary air, it is incinerated in the combustion grate. On-site experts monitor the combustion status through the furnace flame video, adjusting the grate speed and air volume to maintain stability. Unburned components are completely combusted in the grate. The slag is cooled by the slag remover and then transported to the slag pit. The incineration heat is exchanged through the waste heat boiler to generate steam for power generation. The flue gas passes through a system composed of equipment such as activated carbon tanks, slaked lime tanks, and fly ash tanks for desulfurization, denitrification, dust removal, and dioxin (DXN) removal. The final flue gas, containing particulate matter, CO, CO2, NOx, SO2, HCl, and DXN, is discharged through the chimney by the induced draft fan [24].
In developing countries such as China, in addition to the differences in the composition and calorific value of incinerator fuels caused by factors like MSW classification systems and residents’ living habits, there are also disparities between developing and developed countries in terms of expert skill levels and equipment operation and maintenance technologies. Currently, MSWI power plants in developing countries mainly operate in a manual control mode, with frequent and strong intervention from domain experts (i.e., knowledge workers). Essentially, this is an optimization control mode based on the embodied intelligence of domain experts, following the “perception—prediction—control” process, as illustrated in Figure 2.
Essentially, this is an embodied intelligent mode shaped by the many strong disturbances present in the MSWI dynamic system, which can be briefly described as follows. Drawing on multimodal information such as on-site flame images, time-series process data, and operation records, domain experts apply their own “brain models” to monitor easily measurable process parameters in real time, such as furnace temperature, oxygen content in flue gas, and emission concentrations of conventional flue gas pollutants, and to predict difficult-to-measure process parameters such as emission concentrations of trace pollutants. Based on the process mechanisms mastered through professional education and training, together with long-term accumulated experience and knowledge, they reason about the current dynamic operating conditions and treatment priorities, and then empirically regulate the control behavior in the solid waste combustion, waste heat exchange, and flue gas purification stages.
However, with the continuous development of AI, embodied intelligence models can be studied to replace the domain experts’ “perception–prediction–control” process and thereby achieve intelligent control. For example, by processing flame video images to generate single-frame images of flame combustion lines, which are then quantized to obtain corresponding values, the “observable and unobservable” controlled variables/auxiliary variables (CVs/AVs) can be quantified. Furthermore, by embedding the process mechanisms and the long-term accumulated experience and knowledge of domain experts into the embodied intelligence model, AI-driven collaborative control can replace manual control by domain experts. This enables more efficient combustion of MSW and reduces pollutant emissions. Therefore, the construction of a multimodal dataset is crucial, as it serves as the foundation for developing a reproduction model of the domain experts’ embodied intelligence.

3. Materials and Methods

3.1. Data Collection and Selection Description

The data in this article are sourced from an MSWI power plant in Beijing. The structure of the edge verification platform for real-time data collection is shown in Figure 3.
To support the research on the intelligent algorithm of the MSWI process, the multi-modal data acquisition system on this platform was deployed at the industrial site, enabling real-time collection and storage of operational process data. From 17 to 18 October 2019, 8 November 2019, 22 November 2019, and 29 November 2019, process data and flame video images were collected for approximately 2 h each day. The determination of this 5-day data collection cycle is based on the following considerations: (1) avoiding abnormal periods caused by equipment maintenance or garbage feeding, as verified by the MSWI power plant’s operation log; (2) selecting dates when the ambient temperature is stable, thus eliminating the influence of extreme weather on combustion; (3) the 2 h daily collection window corresponds to the peak garbage feeding period at the power plant, ensuring that the data reflects typical working conditions; and (4) the incineration plant adopts a manual operation mode managed by field experts, and the 2 h collection window is the optimal duration for maintaining stable MSWI operation and the minimum sample duration required for detecting difficult-to-measure pollutants.
To achieve stable combustion, compliance with emission standards, and improved thermal efficiency in the MSWI process, numerous process variables must be continuously monitored and appropriately adjusted. Based on actual control objectives and regulatory requirements, the process variables considered in this study are categorized into three groups as follows:
(1) Environmental indicators (EIs): These variables reflect the environmental performance of the incineration process and are used as optimization targets.
(2) Controlled variables (CVs): These are key output variables directly regulated through control actions.
(3) Manipulated variables (MVs): These are input variables that can be directly adjusted by the control system.
In the MSWI system, there are strong coupling relationships among process variables. Relying solely on statistical correlations for variable screening often makes it difficult to identify variable combinations that are practically significant for control decisions. To address this, the study introduces a domain expert rule-guided method. This approach focuses on actual industrial control requirements, incorporating on-site operational experience and professional knowledge to identify key variables, thereby providing a solid foundation for subsequent modeling and optimization. The specific details of the selected variables are presented in Table 1.
To construct a multimodal dataset for the MSWI process, this article processes the MSWI process data and flame video. The proposed strategy consists of four modules: process data processing, flame image processing, combustion line quantification, and multimodal data synchronization modules, as shown in Figure 4.
As can be seen from Figure 4, the construction process of the multimodal dataset is as follows. When a single process datum is missing and accurate data exist both before and after it, a weighted average of the neighboring samples is adopted; that is, the missing value is set to the average of the accurate data one minute before and after that moment. If the missing data are located at the beginning or end of the series, or if two or more consecutive values are missing or obviously abnormal, the linear regression decision tree (LRDT) model is used for data filling. From the collected flame videos, one flame image is captured at the same time every minute, and the flame images are quantized to obtain the corresponding quantized values. Finally, a multimodal dataset is formed with the process data at the corresponding time.
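The routing rule described above (isolated interior gap versus boundary or multi-sample gap) can be sketched as a small helper. This is an illustrative sketch of the decision logic only, not the plant's production code; the function name and return format are our own.

```python
import numpy as np

def classify_gaps(series):
    """Route each run of missing values to a filling strategy, following
    the rule in Figure 4: an isolated interior gap is filled by averaging
    the samples one minute before and after ("neighbor_mean"); gaps at the
    series boundaries or runs of two or more consecutive NaNs are
    delegated to the LRDT model ("LRDT")."""
    isnan = np.isnan(series)
    plan = {}
    i, n = 0, len(series)
    while i < n:
        if isnan[i]:
            j = i
            while j < n and isnan[j]:
                j += 1  # extend the run of consecutive NaNs
            run = (i, j - 1)
            # boundary gap, or two or more consecutive NaNs -> LRDT model
            if i == 0 or j == n or (j - i) >= 2:
                plan[run] = "LRDT"
            else:
                plan[run] = "neighbor_mean"
            i = j
        else:
            i += 1
    return plan
```

For example, a series with one interior gap and one two-sample gap yields one `neighbor_mean` run and one `LRDT` run.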

3.2. Process Data Processing Module

3.2.1. Process Data Preprocessing Sub-Module

To facilitate the study of the MSWI process from an academic perspective and in combination with the experience of domain experts, this article selects 15 MVs, as detailed in Table 2.
The average grate speed in the constructed dataset is defined as the reference speed for 2 h. By using a grate length of 11 m, the grate speed in the constructed dataset is converted to match the units used in the mechanical dataset. The conversion formula is as follows:
$$V_{\text{Process}}^{\text{Trans}}\ (\text{m/h}) = \frac{V_{\text{Process}}^{\text{Ori}}\ (\%) \times 11}{2 \times \bar{V}_{\text{Process}}^{\text{Ori}}\ (\%)}$$
where $\bar{V}_{\text{Process}}^{\text{Ori}}$ and $V_{\text{Process}}^{\text{Ori}}$ are, respectively, the average grate speed and the actual sample value in the constructed dataset, both in %; and $V_{\text{Process}}^{\text{Trans}}$ is the unit-converted grate speed in m/h.
In the process of constructing the dataset, the average grate speed in this study is calculated by averaging the values of each column within the same feature over a period of (sample size/60) hours. During data processing, missing values and outliers were identified and addressed accordingly.
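The unit conversion above, in which the mean speed over the 2 h reference window corresponds to traversing the 11 m grate, can be sketched as follows. The function name and default arguments are our own; the formula follows the conversion equation in the text.

```python
def convert_grate_speed(v_ori_pct, v_mean_pct, grate_length_m=11.0, ref_hours=2.0):
    """Convert a grate speed sample from percent to m/h: scale the sample
    by the window-mean speed, with the mean corresponding to covering the
    11 m grate over the 2 h reference window (sketch of Equation (1))."""
    return (v_ori_pct * grate_length_m) / (ref_hours * v_mean_pct)
```

For a sample equal to the window mean, the result is 11 m / 2 h = 5.5 m/h.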

3.2.2. Missing Value Handling Sub-Module

Randomly distributed missing data can be classified into single missing values and consecutive multiple missing values. Empirically, MSWI process data exhibit linear characteristics within a certain range, so linear interpolation is used to fill randomly distributed single gaps. Based on experience, we check whether the time interval between the current missing value and the known values before and after it is equal; if so, the mean of the two known values is used as the filling value, and if not, the filling value is determined from experience. The samples in this study are collected at fixed one-minute intervals, so with linear interpolation the average of the data before and after the missing value is taken as the filling value.
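Since the samples are equally spaced, linear interpolation at an isolated gap reduces to the mean of the two neighbors. A minimal sketch (our own function name; longer runs are deliberately left untouched for the LRDT model):

```python
import numpy as np

def fill_single_gaps(x):
    """Fill isolated interior NaNs with the mean of the neighboring
    minute-scale samples, i.e., linear interpolation at the midpoint of
    an equally spaced series. Runs of two or more NaNs are left as-is."""
    x = x.astype(float).copy()
    for i in range(1, len(x) - 1):
        # fill only if both immediate neighbors are known
        if np.isnan(x[i]) and not np.isnan(x[i - 1]) and not np.isnan(x[i + 1]):
            x[i] = 0.5 * (x[i - 1] + x[i + 1])
    return x
```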
For two or more consecutive missing values in a randomly distributed feature dimension, feature selection is an important preprocessing step for data filling. The first problem to be solved is which known data to use when modeling the prediction of the filling values. Among the manipulated variables, some are collected directly from the original process data, while others are obtained by summing or averaging certain features of the original process data. Before the secondary filling, mutual information (MI) analysis is conducted between the missing feature and all features of the original process data, and the several features with the highest correlations are selected as the correlated input features.
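A simple histogram-based MI estimate illustrates the screening step. This is an illustrative sketch, not necessarily the exact estimator used in the paper; the bin count is an assumption.

```python
import numpy as np

def mutual_info(x, y, bins=16):
    """Histogram-based mutual information estimate between a candidate
    input feature x and the feature to be filled y. Features with the
    highest MI are kept as correlated inputs for the filling model."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probability estimate
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x
    py = pxy.sum(axis=0, keepdims=True)       # marginal of y
    mask = pxy > 0                            # avoid log(0)
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())
```

A feature identical to the target scores far higher than independent noise, which is the property the screening relies on.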
A missing feature prediction model based on LRDT is constructed using the correlated features as input. After training the model on the complete input and output features, the data types of the samples with missing output features are verified and converted to ensure all data are numerical. The statistical parameters (mean and standard deviation) of the training set are used to standardize the new data samples. The trained LRDT model is then used for prediction, and the prediction results are destandardized to obtain the predicted values on the original data scale.
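The standardize–predict–destandardize pipeline can be sketched generically. Here `predict_fn` is a stand-in for training the LRDT model (it takes standardized training data and returns a predictor); the function names are our own.

```python
import numpy as np

def fill_with_model(x_train_in, y_train, x_missing_in, predict_fn):
    """Standardize inputs and output with training-set statistics, train a
    model (LRDT in the paper; `predict_fn` is a stand-in), predict the
    missing feature, then destandardize back to the original scale."""
    mu_in, sd_in = x_train_in.mean(axis=0), x_train_in.std(axis=0)
    mu_y, sd_y = y_train.mean(), y_train.std()
    z_train = (x_train_in - mu_in) / sd_in      # standardized training inputs
    z_miss = (x_missing_in - mu_in) / sd_in     # new samples use the SAME stats
    model = predict_fn(z_train, (y_train - mu_y) / sd_y)
    return model(z_miss) * sd_y + mu_y          # destandardize predictions
```

With a linear stand-in model and exactly linear data, the filled values match the true ones, confirming that the scaling round-trip is lossless.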
The modeling and implementation process of the LRDT model is as described below.
First, the process dataset is denoted as follows:
$$\mathbf{D} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1(M-1)} & x_{1M} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2(M-1)} & x_{2M} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3(M-1)} & \mathrm{NaN} \\ x_{41} & x_{42} & x_{43} & \cdots & x_{4(M-1)} & \mathrm{NaN} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ x_{N1} & x_{N2} & x_{N3} & \cdots & x_{N(M-1)} & x_{NM} \end{bmatrix}_{N \times M}$$
where $N$ represents the total number of samples in the dataset and $M$ represents the number of features. The first $(M-1)$ columns are feature variables with a high correlation to the missing feature, and the $M$-th column is the missing feature.
The ( M 1 ) correlated input features are represented as follows:
$$\mathbf{x}_{\mathrm{pcc}} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1(M-1)} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2(M-1)} \\ x_{31} & x_{32} & x_{33} & \cdots & x_{3(M-1)} \\ x_{41} & x_{42} & x_{43} & \cdots & x_{4(M-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{N1} & x_{N2} & x_{N3} & \cdots & x_{N(M-1)} \end{bmatrix}_{N \times (M-1)}$$
The missing feature to be filled is denoted as follows:
$$\mathbf{x}_{\mathrm{ori}} = \left[ x_{1M}, x_{2M}, \mathrm{NaN}, \mathrm{NaN}, \ldots, x_{NM} \right]^{T}$$
Since the positions of missing values in the output feature $\mathbf{x}_{\mathrm{ori}}$ are random in actual data filling, this article assumes that the output feature is missing in the 3rd to $(N-1)$-th samples. The complete samples used for training are extracted and denoted as follows:
$$\mathbf{x}_{\mathrm{train}} = \begin{bmatrix} x_{11} & x_{12} & x_{13} & \cdots & x_{1(M-1)} & x_{1M} \\ x_{21} & x_{22} & x_{23} & \cdots & x_{2(M-1)} & x_{2M} \\ x_{N1} & x_{N2} & x_{N3} & \cdots & x_{N(M-1)} & x_{NM} \end{bmatrix}$$
The input feature dataset of the remaining missing samples for predicting the missing values is denoted as follows:
$$\mathbf{x}_{\mathrm{pre}}^{\mathrm{in}} = \begin{bmatrix} x_{31} & x_{32} & x_{33} & \cdots & x_{3(M-1)} \\ x_{41} & x_{42} & x_{43} & \cdots & x_{4(M-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{(N-1)1} & x_{(N-1)2} & x_{(N-1)3} & \cdots & x_{(N-1)(M-1)} \end{bmatrix}$$
Next, the data are traversed to calculate the mean squared error (MSE) of the target value, as shown below:
$$L_k^{\mathrm{MSE}}(n,i) = f_{\mathrm{MSE}}(\mathbf{x}_{\mathrm{left}}^{\mathrm{train}}) + f_{\mathrm{MSE}}(\mathbf{x}_{\mathrm{right}}^{\mathrm{train}}) = \frac{1}{5} \sum_{Targ=1}^{5} \left[ \left( (x_{Targ,\mathrm{Left}} - \bar{x}_{Targ,\mathrm{Left}})\, I(x_i \in \mathbf{x}_{\mathrm{left}}^{\mathrm{train}}) \right)^2 + \left( (x_{Targ,\mathrm{Right}} - \bar{x}_{Targ,\mathrm{Right}})\, I(x_i \in \mathbf{x}_{\mathrm{right}}^{\mathrm{train}}) \right)^2 \right]$$
where $L_k^{\mathrm{MSE}}(n,i)$ represents the MSE loss function value for the $k$-th feature of the $n$-th sample in the $i$-th iteration, and $I(\cdot)$ is the indicator function.
Then, based on the above calculation results, the feature with the minimum MSE is selected as the first non-leaf node $x_{\mathrm{nonleaf}}^{1}$, i.e., the segmentation variable:
$$x_{1,\mathrm{nonleaf}}^{\mathrm{MD}} = \min_{k=1,\ldots,K} L_k^{\mathrm{MSE}}(n,i)$$
Then, based on Formulas (7) and (8), the above process is repeated, and $T/2 - 1$ intermediate nodes $x_{\mathrm{nonleaf}}^{1}, x_{\mathrm{nonleaf}}^{3}, \ldots, x_{\mathrm{nonleaf}}^{T/2-1}$ are obtained according to the empirically set minimum sample size $\theta$.
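The per-node split selection, i.e., choosing the feature and threshold that minimize the summed squared error of the two child target sets, can be sketched with an exhaustive search. This is a generic CART-style sketch under a minimum-child-size constraint (our own `min_samples` name), not the paper's exact implementation.

```python
import numpy as np

def best_split(X, y, min_samples=2):
    """Exhaustive search for the (feature, threshold) pair minimizing the
    summed squared error of left/right child targets, i.e., the MSE
    criterion used to pick each non-leaf segmentation node."""
    best = (None, None, np.inf)  # (feature index, threshold, loss)
    for k in range(X.shape[1]):
        # candidate thresholds: all but the largest distinct value,
        # so the right child is never empty
        for thr in np.unique(X[:, k])[:-1]:
            left, right = y[X[:, k] <= thr], y[X[:, k] > thr]
            if len(left) < min_samples or len(right) < min_samples:
                continue  # respect the minimum sample size constraint
            loss = ((left - left.mean()) ** 2).sum() \
                 + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (k, thr, loss)
    return best
```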
In a typical regression decision tree, the predicted value of a leaf node in the classification and regression tree (CART) considers only the sample mean, so the relationship between the input features and the missing feature values is ignored. Therefore, the LRDT algorithm improves on this by calculating the predicted output of CART leaf nodes using linear regression, as follows:
$$\hat{\mathbf{X}}_{\mathrm{leaf}}^{t} = \mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{W}_{\mathrm{leaf}}^{t} = \mathbf{X}_{\mathrm{leaf}}^{t} \left[ \mathbf{w}_{\mathrm{leaf}}^{1}, \mathbf{w}_{\mathrm{leaf}}^{2} \right]$$
where $\hat{\mathbf{X}}_{\mathrm{leaf}}^{t}$, $\mathbf{X}_{\mathrm{leaf}}^{t}$, and $\mathbf{W}_{\mathrm{leaf}}^{t}$ are the output, input, and weight matrix of the $t$-th leaf node, respectively.
Since the number of samples is greater than the feature dimension, obtaining the weight vector can be classified as solving a class of overdetermined matrix equations. To ensure the convergence of the weight vector, a regularized least squares loss function is adopted here, as follows:
$$J(\mathbf{W}_{\mathrm{leaf}}^{t}) = \frac{1}{2} \begin{bmatrix} \left\| \mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{w}_{\mathrm{leaf}}^{1} - \hat{\mathbf{x}}_{\mathrm{leaf}}^{1} \right\|_2^2 + \lambda \left\| \mathbf{w}_{\mathrm{leaf}}^{1} \right\|_2^2 \\ \left\| \mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{w}_{\mathrm{leaf}}^{2} - \hat{\mathbf{x}}_{\mathrm{leaf}}^{2} \right\|_2^2 + \lambda \left\| \mathbf{w}_{\mathrm{leaf}}^{2} \right\|_2^2 \end{bmatrix}^{T}$$
where $\hat{\mathbf{x}}_{\mathrm{leaf}}^{t}$ is the true output of the leaf node, and $\lambda \geq 0$ represents the coefficient of the regularization term.
Furthermore, the gradient of the above loss function with respect to w leaf t can be expressed as follows:
$$\frac{\partial J(\mathbf{W}_{\mathrm{leaf}}^{t})}{\partial (\mathbf{w}_{\mathrm{leaf}}^{t})^{T}} = \frac{\partial}{\partial (\mathbf{w}_{\mathrm{leaf}}^{t})^{T}} \left[ (\mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{w}_{\mathrm{leaf}}^{t} - \hat{\mathbf{x}}_{\mathrm{leaf}}^{t})^{T} (\mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{w}_{\mathrm{leaf}}^{t} - \hat{\mathbf{x}}_{\mathrm{leaf}}^{t}) + \lambda (\mathbf{w}_{\mathrm{leaf}}^{t})^{T} \mathbf{w}_{\mathrm{leaf}}^{t} \right] = (\mathbf{X}_{\mathrm{leaf}}^{t})^{T} \mathbf{X}_{\mathrm{leaf}}^{t} \mathbf{w}_{\mathrm{leaf}}^{t} - (\mathbf{X}_{\mathrm{leaf}}^{t})^{T} \hat{\mathbf{x}}_{\mathrm{leaf}}^{t} + \lambda \mathbf{w}_{\mathrm{leaf}}^{t}$$
Then, by solving $\frac{\partial J(\mathbf{W}_{\mathrm{leaf}}^{t})}{\partial (\mathbf{w}_{\mathrm{leaf}}^{t})^{T}} = 0$, the weight vector is obtained as follows:
$$\mathbf{w}_{\mathrm{leaf}}^{t} = \left[ (\mathbf{X}_{\mathrm{leaf}}^{t})^{T} \mathbf{X}_{\mathrm{leaf}}^{t} + \lambda \mathbf{I} \right]^{-1} (\mathbf{X}_{\mathrm{leaf}}^{t})^{T} \hat{\mathbf{x}}_{\mathrm{leaf}}^{t}$$
Finally, based on the weight vector obtained, the predicted values of leaf nodes are calculated.
After the above training process is completed, the trained LRDT model is used to predict the outputs for $\mathbf{x}_{\mathrm{pre}}^{\mathrm{in}}$, and the prediction results are filled into the corresponding missing feature positions.
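The closed-form ridge solution for the leaf weights above translates directly into a few lines of linear algebra. A minimal sketch (our own function name; $\lambda$ value is illustrative), assuming, as the text states, that samples outnumber feature dimensions:

```python
import numpy as np

def leaf_weights(X_leaf, y_leaf, lam=1e-3):
    """Regularized least squares weights for a leaf node:
    w = (X^T X + lam * I)^(-1) X^T y, the closed-form solution of the
    ridge loss; solved via a linear system rather than explicit inverse."""
    d = X_leaf.shape[1]
    return np.linalg.solve(X_leaf.T @ X_leaf + lam * np.eye(d),
                           X_leaf.T @ y_leaf)
```

With near-zero regularization and exactly linear leaf data, the true weights are recovered; the $\lambda \mathbf{I}$ term keeps the system well-conditioned when leaf samples are few.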

3.2.3. Outlier Data Handling Sub-Module

Outliers are values in a time series whose difference from the other normal values within the same period is significantly large, exceeding the reasonable fluctuation range expected during normal operation, data collection, or the business scenario, and thus deviating from the overall distribution or temporal pattern of the data. In this article, a fixed-size time window is defined and the mean of the data within the window is calculated; if the difference between an observed value and the window mean exceeds a preset threshold, the value is identified as an outlier. Obvious outliers are treated as missing data by default, and data filling is then carried out. The boiler feed water volume and urea data are taken as examples. First, MI analysis was conducted between these two variables and the other features of the original dataset. Then, the eight features with the highest MI were taken as the correlated inputs, with the boiler feed water volume and urea as the outputs, and the LRDT model was used to fill in the missing feature data.
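The fixed-window detection rule can be sketched as follows. Window size and threshold are illustrative assumptions; the paper does not state its exact values.

```python
import numpy as np

def flag_outliers(x, window=5, threshold=20.0):
    """Flag samples whose absolute deviation from the mean of a
    fixed-size centered window exceeds a preset threshold; flagged points
    are then treated as missing and re-filled (e.g., by the LRDT model)."""
    flags = np.zeros(len(x), dtype=bool)
    half = window // 2
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)  # clip at edges
        if abs(x[i] - x[lo:hi].mean()) > threshold:
            flags[i] = True
    return flags
```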

3.3. Flame Image Processing Module

The video of the combustion flame at the MSWI power plant is transmitted to the video acquisition system via a dedicated cable. The analog signal is converted to a digital signal through a high-definition video card and is periodically stored in minute intervals. To ensure the temporal consistency of the image samples, the system automatically captures a single frame of the flame image from the complete combustion flame video at a fixed time each minute (e.g., the 30th second of each minute), forming a sequential image sequence. This processing module integrates functions such as video buffering, frame rate synchronization, and timed screenshot triggering. It effectively avoids image distortion caused by video transmission delays or storage fluctuations. Additionally, by standardizing the capture time, it eliminates the impact of intraday illumination changes on image features, providing a consistent visual data foundation for subsequent flame morphology analysis, temperature field inversion, and combustion state assessment.
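The timed screenshot trigger maps each minute to one frame index in the stored video. A minimal sketch of that mapping (function name and the 30-second default are our own, following the example in the text):

```python
def capture_frame_indices(fps, duration_min, second_offset=30):
    """Frame index of the single capture taken at a fixed second of each
    minute (e.g., the 30th second): index = (60 * minute + offset) * fps.
    Assumes a constant frame rate after frame-rate synchronization."""
    return [int((60 * m + second_offset) * fps) for m in range(duration_min)]
```

At 25 fps, a 2-minute clip yields captures at frames 750 and 2250.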

3.4. Combustion Line Quantification Module

An online flame combustion line quantification method using mechanism-based pseudo-labels and a generative adversarial network (GAN) [25] is proposed, comprising complete image library construction, dynamic combustion line quantification, and template library adaptive adjustment sub-modules. The functions of the sub-modules are described as follows.
(1) Complete image library offline construction sub-module. The data-driven CLQ algorithm requires data that align with the global distribution. However, in scenarios with limited data, conventional image generation algorithms [26] struggle to supplement the distinctive flame features in scarce images, making it even more challenging to capture missing image characteristics [27,28,29]. Hence, this article employed image processing, deep convolutional generative adversarial network (DCGAN), and cycle-consistent generative adversarial network (CycleGAN) [24] techniques, together with mechanism knowledge of furnace flame images, to construct a complete flame image library encompassing normal, abnormal, and highly abnormal combustion patterns. This library is enriched with grate position features, a substantial quantity of unlabeled flame images, and mechanism-based pseudo-labels.
(2) Dynamic combustion line quantification sub-module. This article aims to enhance quantification accuracy by incorporating a broader range of flame characteristics and providing more flame information for subsequent control purposes. Hence, this article employs the spatial convolutional neural network (SCNN) matching approach for quantification. This strategy not only yields quantified values but also provides corresponding templates. These matched templates can be directly linked to control strategies, enabling an “end-to-end” control approach. This constitutes the core idea. Firstly, the SCNN is trained on the complete flame image library. Then, the combustion line feature is extracted, and the corresponding template sub-library is loaded. Finally, online CLQ is realized based on SCNN multi-scale feature similarity matching.
(3) Template library adaptive adjustment sub-module. The time cost of directly matching against a complete image library is exceedingly high. Therefore, it is essential to construct a non-redundant flame template library. Additionally, employing time-reversed retrieval of the template library can significantly enhance retrieval efficiency. Hence, a redundancy discriminant mechanism is adopted to construct and update the typical template library from unadjusted images.

3.4.1. Complete Image Library Construction Sub-Module

This sub-section includes the construction of the normal, abnormal, and extremely abnormal flame image sub-libraries.
(1)
Normal flame image sub-library
Firstly, a combustion line edge feature extraction algorithm is designed, comprising reverse binarization, median filtering, 40 dilation operations using a (1, 5) structuring element, and edge feature calculation using bottom-up operators [30]. Then, a combustion line position calibration algorithm is designed to extract the characteristic value of the combustion line. Specifically, the mean $\mu$ and variance $\sigma$ of the vertical coordinates of the white pixels are calculated within the column range $[\theta_{\text{area}}, n_{\text{wide}}]$ and row range $[0, n_{\text{high}}]$. Next, when $\sigma$ is less than $\theta_\sigma$, the calibration result $res$ is $\mu$; otherwise, $res$ is marked as 0, indicating an unconventional combustion line. Finally, the flame image and its feature value are expressed as $\langle X_{\text{real},t}^{\text{NM}}, \langle \mu_{\text{real},t}^{\text{NM}}, \sigma_{\text{real},t}^{\text{NM}} \rangle \rangle$ to construct the normal flame image sub-library.
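The calibration step above can be sketched in a few lines. The function name, the pure-NumPy implementation, and the concrete window bounds are illustrative assumptions, not the authors' code:

```python
import numpy as np

def calibrate_combustion_line(feature_map, theta_area, n_wide, n_high, theta_sigma):
    """Calibrate the combustion line position from a binary edge feature map.

    feature_map: 2-D array (rows = vertical pixel coordinate), nonzero entries
    are white (edge) pixels. Returns (res, sigma): res is the mean vertical
    coordinate mu when the variance is below theta_sigma, otherwise 0
    (an unconventional combustion line).
    """
    # Restrict to the row window [0, n_high] and column window [theta_area, n_wide]
    window = feature_map[:n_high, theta_area:n_wide]
    rows, _ = np.nonzero(window)
    if rows.size == 0:
        return 0.0, 0.0
    mu = float(rows.mean())
    sigma = float(rows.var())
    res = mu if sigma < theta_sigma else 0.0
    return res, sigma
```

A flat combustion line at a fixed row yields zero variance and is calibrated to that row's coordinate.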
(2)
Abnormal flame image sub-library
The construction of the abnormal flame image sub-library for the real combustion line follows that of the normal flame image sub-library. The abnormal flame image sub-library for the generated combustion line is constructed as follows: First, two DCGANs are designed to generate the candidate sets of abnormal flame images for forward and backward combustion. Then, the generated samples are selected by the Fréchet Inception Distance (FID) to obtain a qualified set of abnormal flame images of the combustion line. The lower the FID score, the stronger the model's ability to generate diverse, high-quality images. The FID function is shown as follows:
$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2 + \mathrm{tr}\!\left( \mathrm{Cov}_r + \mathrm{Cov}_g - 2\left( \mathrm{Cov}_r \mathrm{Cov}_g \right)^{\frac{1}{2}} \right)$
where $\mu_r$ and $\mu_g$ represent the means of the multivariate normal distributions of the feature matrices of the real and generated image sets, respectively; $\mathrm{Cov}_r$ and $\mathrm{Cov}_g$ represent the covariance matrices of the feature matrices of the real and generated image sets, respectively; and $\mathrm{tr}(\cdot)$ represents the trace of the matrix.
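As a concrete reading of the FID formula, the following NumPy sketch computes it from two feature matrices (samples × dimensions). It relies on the identity $\mathrm{tr}((C_r C_g)^{1/2}) = \mathrm{tr}((C_r^{1/2} C_g C_r^{1/2})^{1/2})$, which keeps the matrix square root symmetric and computable with an eigendecomposition; the function and variable names are illustrative:

```python
import numpy as np

def fid_score(feat_real, feat_gen):
    """Fréchet Inception Distance between two feature matrices (samples x dims)."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)

    def psd_sqrt(mat):
        # Symmetric PSD square root via eigendecomposition
        vals, vecs = np.linalg.eigh(mat)
        vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
        return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

    sr = psd_sqrt(cov_r)
    cross = psd_sqrt(sr @ cov_g @ sr)  # tr(cross) == tr((Cov_r Cov_g)^{1/2})
    return float(np.sum((mu_r - mu_g) ** 2)
                 + np.trace(cov_r + cov_g - 2.0 * cross))
```

Identical feature sets give an FID of zero; shifting every feature by a constant raises the FID by the squared mean shift only, since the covariances are unchanged.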
Next, the qualified abnormal flame image set is expanded again by non-generative data enhancement to obtain the generated abnormal flame image set of combustion lines. Finally, the generated sub-library of abnormal flame images is constructed similarly to the former algorithm.
In the non-generative data enhancement, the process is as follows. Firstly, each image is randomly rotated by 0–5° and translated horizontally by a random fraction of 0–0.3 of the image width. Then, the missing pixels are filled in by mapping.
In the combustion position calibration, the process is as follows. Firstly, the combustion line edge feature extraction algorithm is used to extract the combustion line feature map. Then, the combustion line position calibration algorithm is used to extract the characteristic value of the combustion line. The flame image and characteristic value are expressed as $\langle X_{\text{generated},t}^{\text{FW}}, \langle \mu_{\text{generated},t}^{\text{FW}}, \sigma_{\text{generated},t}^{\text{FW}} \rangle \rangle$ and $\langle X_{\text{generated},t}^{\text{BC}}, \langle \mu_{\text{generated},t}^{\text{BC}}, \sigma_{\text{generated},t}^{\text{BC}} \rangle \rangle$ to construct the abnormal flame image sub-library for the generated combustion line.
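A minimal sketch of the horizontal-translation step of the augmentation, assuming a NumPy image array and using mirror filling as one plausible realization of the mapping-based pixel fill (the rotation step would be handled analogously with an image library):

```python
import numpy as np

def translate_horizontal(img, shift_frac):
    """Translate an image right by shift_frac of its width (the augmentation
    draws shift_frac uniformly from [0, 0.3]) and mirror-fill the exposed
    left columns as a simple stand-in for the paper's mapping-based fill."""
    h, w = img.shape[:2]
    shift = int(shift_frac * w)
    if shift == 0:
        return img.copy()
    out = np.empty_like(img)
    out[:, shift:] = img[:, :w - shift]     # shifted content
    out[:, :shift] = img[:, shift - 1::-1]  # mirror the border into the gap
    return out
```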
(3)
Extremely abnormal flame image sub-library
First, the flame image is moved up by n pixels based on mechanism knowledge to obtain the pseudo-labeled flame image of an extremely abnormal combustion line. Then, the pseudo-labeled image is converted into a candidate extremely abnormal flame image of the combustion line based on CycleGAN. Finally, the candidate extremely abnormal flame images are evaluated and selected by combustion position, and the generated extremely abnormal flame image sub-library is constructed.
$X$ represents the image set with an obvious MSW combustion position in the sub-module for acquiring pseudo-labeled flame images of extremely abnormal combustion lines. $X_{\text{False}}$ is the generated extremely abnormal flame image set with mechanism-based pseudo-labels. The image is moved up by n pixels as follows:
$X_{\text{False}} = X[:, n{:}, :, :] + X[:, n_{\text{wide}} - n{:}, :, :]$
where $n$ is in the range of 30–40 pixels.
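Reading the "+" in the shift equation as concatenation along the height axis (an assumption made here so that the output keeps the input shape), the pseudo-labeling step can be sketched as:

```python
import numpy as np

def shift_up(images, n):
    """Move a batch of images (N, H, W, C) up by n pixels; the exposed bottom
    rows are refilled with the original bottom n rows, so the output keeps
    the input shape. The paper draws n from 30-40 pixels."""
    top = images[:, n:, :, :]       # rows n..H-1 move to the top
    bottom = images[:, -n:, :, :]   # last n rows fill the exposed bottom
    return np.concatenate([top, bottom], axis=1)
```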
Generation networks are updated in the generation submodule of candidate flame images of extremely abnormal combustion lines based on CycleGAN, as follows:
$\min_{\theta_G} L_G = \min_{\theta_{G_{\text{False\_to\_Real}}},\, \theta_{G_{\text{Real\_to\_False}}}} \Bigg( \frac{1}{m} \sum_{i=1}^{m} \Big[ \big( Y_{\text{Generated\_Real}}[i] - Y_1[i] \big)^2 + 2.5\,\big| X_{\text{Reconstruct\_False}}[i] - X_{\text{False}}[i] \big| + 10\,\big| X_{\text{False\_id}}[i] - X_{\text{False}}[i] \big| \Big] + \frac{1}{k} \sum_{i=1}^{k} \Big[ \big( Y_{\text{Generated\_False}}[i] - Y_1[i] \big)^2 + 2.5\,\big| X_{\text{Reconstruct\_Real}}[i] - X_{\text{Real}}[i] \big| + 10\,\big| X_{\text{Real\_id}}[i] - X_{\text{Real}}[i] \big| \Big] \Bigg)$
Discrimination networks are updated as follows:
$\min_{\theta_{D_{\text{Real}}}} L_{D_{\text{Real}}} = \min_{\theta_{D_{\text{Real}}}} \left( 0.5 \cdot \frac{1}{m} \sum_{i=1}^{m} \big( Y_{\text{Generated\_Real}}[i] - Y_0[i] \big)^2 + 0.5 \cdot \frac{1}{k} \sum_{i=1}^{k} \big( Y_{\text{Real}}[i] - Y_1[i] \big)^2 \right)$
$\min_{\theta_{D_{\text{False}}}} L_{D_{\text{False}}} = \min_{\theta_{D_{\text{False}}}} \left( 0.5 \cdot \frac{1}{k} \sum_{i=1}^{k} \big( Y_{\text{Generated\_False}}[i] - Y_0[i] \big)^2 + 0.5 \cdot \frac{1}{m} \sum_{i=1}^{m} \big( Y_{\text{False}}[i] - Y_1[i] \big)^2 \right)$
The network parameters and the generated extremely abnormal flame image of the candidate combustion line are saved after each iteration.
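The two discriminator updates are least-squares GAN losses: generated samples are pushed toward the label 0 and real samples toward the label 1, each term weighted by 0.5. A NumPy sketch of one such loss (function and variable names are illustrative):

```python
import numpy as np

def lsgan_discriminator_loss(scores_generated, scores_real):
    """Least-squares discriminator loss matching the update rule:
    generated scores are regressed toward 0, real scores toward 1."""
    return (0.5 * np.mean((scores_generated - 0.0) ** 2)
            + 0.5 * np.mean((scores_real - 1.0) ** 2))
```

A perfect discriminator (generated scores at 0, real scores at 1) attains a loss of zero.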
In the sub-module of sample selection and combustion position calibration, the process is as follows. First, the generation models under different batches are loaded to generate candidate sample sets of extremely abnormal flame images of combustion lines. Then, the FID between the generated sample set and the pseudo-labeled sample set, and the FID between the generated and real sample sets, are calculated. The samples generated by the three generation models with the lowest sum of the two FIDs are selected as the candidate samples. Finally, the characteristic value of the combustion line is extracted, and a flame image is regarded as an extremely abnormal flame image of the combustion line when $\mu_{\text{generated},t}^{\text{exFW}}$ is less than 0.47. The flame image and characteristic value are expressed as $\langle X_{\text{generated},t}^{\text{exFW}}, \langle \mu_{\text{generated},t}^{\text{exFW}}, \sigma_{\text{generated},t}^{\text{exFW}} \rangle \rangle$ to construct the extremely abnormal flame image sub-library.

3.4.2. Dynamic Combustion Line Quantification Sub-Module

The aim of training the SCNN is to obtain a feature extractor; the procedure includes data preparation, network structure design, and the training process. For data preparation, the combustion line forward, normal, and backward sets are divided according to whether the MSW is located on the drying grate, burning grate, or burnout grate. On this basis, positive and negative samples are constructed: two pictures are randomly selected from the same subgroup for a positive sample $y_i^{\text{positive}}$, and two pictures are randomly selected from different subgroups for a negative sample $y_i^{\text{negative}}$. For the network structure, the SCNN comprises VGG16 and dense layers. For the training process, the relevant variables $X_i, X_j, X_k \in X = \{ X_{\text{FW}}, X_{\text{NM}}, X_{\text{BC}} \}$ and corresponding labels $y_i, y_j, y_k$ are obtained. $y_i$ is taken as follows:
$y_i = \begin{cases} 0, & X_i \in X_{\text{FW}} \\ 1, & X_i \in X_{\text{NM}} \\ 2, & X_i \in X_{\text{BC}} \end{cases}$
where $i \neq j$; $y_i = y_j$; and $y_i \neq y_k$. Then, $X_i^{\text{positive}}$ ($X_i$ and $X_j$) and $X_i^{\text{negative}}$ ($X_i$ and $X_k$) are fed into the SCNN to obtain the predicted results $\hat{y}_i^{\text{positive}}$ and $\hat{y}_i^{\text{negative}}$, shown as follows:
$\hat{y}_i^{\text{positive}} = f_{\text{Siamese}}(X_i^{\text{positive}}) = f_{\text{Dense}}\left( f_{\text{vgg16}}(X_i) - f_{\text{vgg16}}(X_j) \right)$
$\hat{y}_i^{\text{negative}} = f_{\text{Siamese}}(X_i^{\text{negative}}) = f_{\text{Dense}}\left( f_{\text{vgg16}}(X_i) - f_{\text{vgg16}}(X_k) \right)$
Finally, the Adam algorithm is used to minimize the cross-entropy loss function as follows:
$L_{\text{Siamese}}\left( Y, f_{\text{Siamese}}(X^{\text{positive}}), f_{\text{Siamese}}(X^{\text{negative}}) \right) = -\frac{1}{2n} \sum_{i=1}^{n} \left[ y_i^{\text{positive}} \log\left( \hat{y}_i^{\text{positive}} \right) + \left( 1 - y_i^{\text{negative}} \right) \log\left( 1 - \hat{y}_i^{\text{negative}} \right) \right]$
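With the positive-pair target fixed at 1 and the negative-pair target at 0, the loss reduces to a binary cross-entropy averaged over both pair types. A NumPy sketch of this reduced form (the clipping is added for numerical safety and is an assumption):

```python
import numpy as np

def siamese_pair_loss(y_hat_pos, y_hat_neg):
    """Cross-entropy over matched pairs with targets y_pos = 1 and y_neg = 0,
    i.e. -(1/2n) * sum[log(y_hat_pos) + log(1 - y_hat_neg)]."""
    y_hat_pos = np.clip(y_hat_pos, 1e-7, 1 - 1e-7)
    y_hat_neg = np.clip(y_hat_neg, 1e-7, 1 - 1e-7)
    n = len(y_hat_pos)
    return float(-np.sum(np.log(y_hat_pos) + np.log(1.0 - y_hat_neg)) / (2 * n))
```

An uninformative prediction of 0.5 for both pair types gives a loss of ln 2; confident correct predictions drive the loss toward zero.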
The combustion line features μ current and σ current are extracted from the flame image X current . At first, the combustion line feature map is obtained as follows:
$X_{\text{thresh\_dilate\_sobely},\,\text{current}} = f_{\text{image\_process}}\left( X_{\text{current}}, \theta_{\text{binarization}} \right)$
Then, the combustion line feature is extracted as follows:
$\left( \mu_{\text{current}}, \sigma_{\text{current}} \right) = f_{\text{combustion\_line\_calibration}}\left( X_{\text{thresh\_dilate\_sobely},\,\text{current}}, \theta_{\text{variance}} \right)$
When $\sigma_{\text{current}}$ is greater than $\theta_\sigma$, $\mathrm{CLQ}_{\text{current}}$ is 0; when $\sigma_{\text{current}}$ is less than $\theta_\sigma$, the SCNN similarity measurement module is entered.
The SCNN similarity measure includes template library loading, SCNN matching, and template overloading functions. First, the corresponding template sub-library $T_*$ is loaded as follows:
$T_* = \begin{cases} T_{\text{exFW}}, & 0 < \mu_{\text{current}} \le 0.47 \\ T_{\text{FW}}, & 0.47 < \mu_{\text{current}} \le 0.51 \\ T_{\text{NM}}, & 0.51 < \mu_{\text{current}} \le 0.736 \\ T_{\text{BC}}, & 0.736 < \mu_{\text{current}} \le 1 \end{cases}$
where the feature of template $T_{*,i}$ in $T_*$ is described as $\langle \mu_{*,i}, \sigma_{*,i} \rangle$; $i = 1, 2, \ldots, I$; and $I$ represents the number of templates in the current template sub-library. Then, the SCNN is used to measure the similarity between $X_{\text{current}}$ and $T_{*,i}$, shown as follows:
$\mathrm{SIM}\left( X_{\text{current}}, T_{*,i} \right) = f_{\text{Siamese}}\left( X_{\text{current}}, T_{*,i} \right) = f_{\text{Dense}}\left( f_{\text{vgg16}}(X_{\text{current}}) - f_{\text{vgg16}}(T_{*,i}) \right)$
$\mathrm{SIM}_{\max} = \max_{i \in \{1, 2, \ldots, I\}} \mathrm{SIM}\left( X_{\text{current}}, T_{*,i} \right)$
where a template with $\mathrm{SIM}(X_{\text{current}}, T_{*,i})$ greater than $\theta_{\text{sim}}$ is regarded as a matching template after the similarity is calculated.
When $\mathrm{SIM}(X_{\text{current}}, T_{*,i})$ is less than $\theta_{\text{sim}}$ for all templates, $\mu_{\text{current}}$ is taken as $\mathrm{CLQ}_{\text{current}}$. This process is as follows:
$\mathrm{CLQ}_{\text{current}} = \begin{cases} \displaystyle \sum_{i=1}^{I} w_i \mu_{*,i} \Big/ \sum_{i=1}^{I} w_i, & \displaystyle \sum_{i=1}^{I} w_i \neq 0 \\ \mu_{\text{current}}, & \displaystyle \sum_{i=1}^{I} w_i = 0 \end{cases}$
$w_i = \begin{cases} 1, & \mathrm{SIM}\left( X_{\text{current}}, T_{*,i} \right) > \theta_{\text{sim}} \\ 0, & \mathrm{SIM}\left( X_{\text{current}}, T_{*,i} \right) \le \theta_{\text{sim}} \end{cases}$
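The sub-library selection and the similarity-weighted quantification can be sketched together. The similarity scores would come from the SCNN; here they are passed in as plain numbers, and the function names are illustrative:

```python
import numpy as np

def select_sublibrary(mu_current):
    """Pick the template sub-library from the calibrated position mu_current,
    using the interval bounds 0.47 / 0.51 / 0.736 given above."""
    if mu_current <= 0.47:
        return "exFW"
    if mu_current <= 0.51:
        return "FW"
    if mu_current <= 0.736:
        return "NM"
    return "BC"

def quantify_clq(mu_current, template_mus, sims, theta_sim):
    """Similarity-weighted average of matched template positions; falls back
    to mu_current when no template similarity exceeds theta_sim."""
    w = (np.asarray(sims) > theta_sim).astype(float)
    if w.sum() == 0:
        return mu_current
    return float(np.dot(w, template_mus) / w.sum())
```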

3.4.3. Template Library Adaptive Adjustment Sub-Module

For the construction of the template library, $X_j$ in $L$ is loaded first. The definition of $L$ is as follows:
$L = \left\{ L_{\text{real}}^{\text{NM}}, L_{\text{real}}^{\text{FW}}, L_{\text{Generated}}^{\text{FW}}, L_{\text{real}}^{\text{BC}}, L_{\text{Generated}}^{\text{BC}}, L_{\text{real}}^{\text{exFW}}, L_{\text{Generated}}^{\text{exFW}} \right\}$
Then, the jth image L j and its corresponding feature in L are denoted as follows:
$L_j = \langle X_j, \langle \mu_j, \sigma_j \rangle \rangle$
where $X_j$ represents the jth image; $\langle \mu_j, \sigma_j \rangle$ is the combustion line feature of the jth image; $j = 1, 2, \ldots, J$; and $J$ represents the number of images in $L$. Further, the corresponding template sub-library image $T_{*,i}$ is loaded in reverse order according to the correspondence. The template's feature in $T_*$ is denoted as $\langle \mu_{*,i}, \sigma_{*,i} \rangle$, in which $i = I, I-1, \ldots, 1$, and $I$ represents the number of templates in the current template sub-library. Then, the similarity between $X_j$ and $T_{*,i}$ is measured by the SCNN as follows:
$\mathrm{SIM}\left( X_j, T_{*,i} \right) = f_{\text{Siamese}}\left( X_j, T_{*,i} \right)$
The next step is to judge whether $\mathrm{SIM}(X_j, T_{*,i})$ is greater than $\theta_{\text{sim}}$. If so, judge whether $j$ is equal to $J$: if true, the construction is complete; otherwise, the next image in $L$ is loaded and the procedure returns to step 1). If $\mathrm{SIM}(X_j, T_{*,i})$ is less than $\theta_{\text{sim}}$, judge whether the reverse traversal is complete: if true, this process is finished and $\langle X_j, \langle \mu_j, \sigma_j \rangle \rangle$ is saved to the template library; otherwise, the next template is loaded in reverse order.
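The redundancy-discrimination loop above can be sketched as follows, with a toy scalar similarity function standing in for the SCNN (all names are illustrative):

```python
def build_template_library(images, similarity, theta_sim):
    """Build a non-redundant template library: each candidate image is
    compared against the existing templates in reverse order, and is stored
    only when no template's similarity exceeds theta_sim."""
    templates = []
    for img in images:
        redundant = any(similarity(img, t) > theta_sim
                        for t in reversed(templates))
        if not redundant:
            templates.append(img)
    return templates
```

With the most recently added templates checked first, a redundant candidate is usually rejected after few comparisons, which is the retrieval-efficiency argument made for reverse-order traversal.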
For the update of the template library, the corresponding template sub-library image $T_{*,i}$ is first loaded in reverse order. Then, the similarity between $X_{\text{current}}$ and $T_{*,i}$ is measured by the SCNN as follows:
$\mathrm{SIM}\left( X_{\text{current}}, T_{*,i} \right) = f_{\text{Siamese}}\left( X_{\text{current}}, T_{*,i} \right)$
Further, it is judged whether $\mathrm{SIM}(X_{\text{current}}, T_{*,i})$ is greater than $\theta_{\text{sim}}$. If it exceeds $\theta_{\text{sim}}$, a matching template exists and the traversal is exited without modification; otherwise, judge whether the reverse-order traversal is over. When the traversal is over, the template library is updated with $\langle X_{\text{current}}, \langle \mu_{\text{current}}, \sigma_{\text{current}} \rangle \rangle$; otherwise, the reverse-order traversal continues.

3.5. Multimodal Data Synchronization Module

This article aligns the quantified values of the flame images with process data such as temperature, air volume, and grate speed during the incineration process at the second timestamp level. Ultimately, a spatio-temporal aligned multimodal dataset, containing visual features and process parameters, is formed. This dataset provides support for the joint analysis of combustion states, parameter optimization, and the prediction of pollutant generation.
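A minimal sketch of the timestamp join described above, assuming minute-level keys; the structure and function names are illustrative, not the authors' implementation:

```python
from datetime import datetime

def align_multimodal(process_rows, clq_values):
    """Join process-data rows and combustion line quantification values on a
    shared minute-level timestamp; samples missing either modality are dropped."""
    clq_by_minute = {ts.replace(second=0, microsecond=0): v
                     for ts, v in clq_values}
    aligned = []
    for ts, row in process_rows:
        key = ts.replace(second=0, microsecond=0)
        if key in clq_by_minute:
            aligned.append((key, row, clq_by_minute[key]))
    return aligned
```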

4. Results

4.1. Process Data Processing Results

The processing result of missing values in the process data takes the filling of the air volume of the second stage of the left burner 1 as an example. Its inputs are cumulative primary air volume of furnace No. 2, the flue gas temperature on the left side of the primary combustion chamber, the cumulative amount of the lime feeder, the cumulative volume of urea solution in furnace No. 2, the accumulation of feeding volume in the activated carbon storage silo, cumulative flow rate of urea solvent supply, the flue gas temperature at the left inlet of the economizer, and air volume setting at drying grate 2 on the right. The number of training samples is 40, and the number of validation samples is 26. The parameters used to construct the LRDT model are set with a minimum sample size of five and a regularization coefficient of 0.8. The prediction results of the filling model are shown in Figure 5, Figure 6 and Figure 7.
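The hybrid strategy fills isolated gaps by linear interpolation and leaves longer runs to the LRDT model. The single-gap part can be sketched as follows (pure Python, with `None` marking a missing value; names are illustrative):

```python
def fill_single_gaps(series):
    """Fill isolated missing values (None) with the mean of their two
    neighbors; longer missing runs are left for the model-based filler
    (the LRDT model in this work)."""
    out = list(series)
    for i in range(1, len(out) - 1):
        if out[i] is None and out[i - 1] is not None and out[i + 1] is not None:
            out[i] = (out[i - 1] + out[i + 1]) / 2.0
    return out
```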
To verify the validity of the LRDT model’s prediction, this article compares it with the K-Nearest Neighbors (KNN) and eXtreme Gradient Boosting (XGBoost) models. The parameters for the comparison experiment are as follows: for KNN, the number of neighbors (K) is set to 3; for XGBoost, the number of iterations is 200, the learning rate is 0.01, and the maximum split number is 3.
The evaluation indicators for the comparative experimental results are shown in Table 3 as follows.
From the RMSE, MAE, and R2 of the training and validation sets in Table 3, the LRDT algorithm is clearly effective for modeling the benchmark operating condition data. On the training set, the RMSE and MAE of LRDT are both lower than those of KNN and XGBoost, while its R2 is higher than both, indicating better fitting accuracy for data patterns under benchmark conditions. On the validation set, LRDT maintains its advantage: its RMSE and MAE are far lower than those of KNN, whose generalization ability is severely insufficient, and also superior to XGBoost, while its R2 is higher than both, indicating stronger generalization stability on unseen benchmark operating condition data. Overall, the LRDT algorithm achieves the best balance between fitting accuracy and generalization ability on the benchmark operating condition data.
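The three indicators in Table 3 can be computed as follows (NumPy sketch; the function name is illustrative):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE and R^2, the three indicators used to compare LRDT, KNN
    and XGBoost."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2
```

Note that predicting the mean of the targets yields R^2 = 0, which is the baseline against which both fitting accuracy and generalization are judged.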

4.2. Flame Image Processing Results

By capturing single-frame images of the flame video at a fixed moment in each minute, the flame images corresponding to the process data at those moments can be obtained. Some flame images of the left grate from 19:11:15 to 21:24:15 on 17 October 2019 are shown in Figure 8.
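Grabbing one frame per minute at a fixed second offset (e.g., hh:mm:15, as in Figure 8) reduces to computing frame indices from the frame rate; a sketch in which the frame rate, duration, and helper name are assumptions:

```python
def minute_frame_indices(fps, duration_s, second_offset=15):
    """Frame indices for grabbing one frame per minute at a fixed second
    offset, given the video frame rate (fps) and duration in seconds.
    These indices would then be passed to a video reader to extract frames."""
    indices = []
    t = second_offset
    while t < duration_s:
        indices.append(int(round(t * fps)))
        t += 60
    return indices
```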

4.3. Combustion Line Quantification Result

First, the original flame images during the incineration process are captured by an industrial camera installed in the observation window of the grate furnace. For complex operating conditions such as high temperature and dust interference, the images are pre-processed (e.g., denoising and contrast enhancement) to improve image quality. Then, using computer vision technology and deep learning methods (such as convolutional neural network feature extraction or edge detection algorithms), the dynamic boundaries of the flame combustion area are segmented from the pre-processed flame image. The contour of the combustion line representing the core combustion area is recognized and extracted, generating a clear combustion line image. Finally, quantitative analysis is conducted on the combustion line images. By calculating key characteristic parameters such as the length, spatial distribution uniformity, position offset, and dynamic fluctuation frequency of the combustion lines, the visual features in the images are transformed into quantifiable numerical indicators. Ultimately, quantitative values of the combustion lines, which accurately reflect the combustion state of solid waste in the grate furnace, are obtained.
The quantitative value of the combustion line serves as a crucial link between "process mechanism analysis" and "actual operation optimization" in the MSWI process, providing an operational foundation for optimizing incinerator operation. For example, based on the quantitative offset of the combustion line position, the grate speed or the proportion of combustion-supporting air distribution can be adjusted to prevent local coking caused by flame offset. Additionally, optimizing the secondary air temperature and air volume through quantitative monitoring of the combustion line temperature distribution can reduce the generation of pollutants such as CO and dioxins. The quantitative value can also be used for the verification and correction of incineration process models. By taking the quantitative value of the combustion line as the key verification index of the model output, the simulation accuracy of the numerical model for the actual combustion process can be improved, providing reliable quantitative references for the improvement of furnace design and the development of new incineration technologies (such as low-nitrogen combustion and efficient heat recovery). Ultimately, this will achieve a coordinated improvement in the harmlessness, reduction, and energy recovery efficiency of the MSWI process. The partial results of the quantification process of the flame combustion line are shown in Table 4.

4.4. Multimodal Data Synchronization Results

The multimodal dataset constructed in this article contains 780 samples. As shown in Table 1, 15 MVs, 5 CVs, and 7 EIs are included. Specifically, the CVs include images of the left and right grate flames, which are used to quantify the position of the combustion line. The structure of the multimodal dataset that has been constructed is shown in Figure 9.
For example, consider the period from 19:12:15 to 21:24:15 on 17 October 2019. The partial process data diagram and flame images for this period are shown in Figure 10, Figure 11 and Figure 12.
In the case of missing features on different dates, the real data from the current day is used for training and prediction, forming a complete multimodal reference working condition dataset. The processed flame images are then added at the corresponding moments to constitute the final multimodal dataset.

5. Discussion

5.1. Limitations and Scalability

There are some issues worth attention in the construction process of this multimodal dataset. Firstly, the issue of missing data is rather prominent. The absence of process data and quantized values may result from sensor failure, poor image quality leading to quantization failure, etc. Moreover, the missing patterns of different modal data may vary, which can affect the accuracy of the LRDT model filling. Especially when the missing ratio is high or the missing mechanism is related to the data distribution, it may introduce bias. Secondly, there may be deficiencies in the spatio-temporal alignment accuracy of multimodal data. The acquisition frequencies or timestamps of flame video frames, combustion line images, and process data may have minor differences, leading to deviations in the correlation of “the same moment”. In addition, the stability of data quality needs to be improved. The original flame images may be affected by dust and light fluctuations, generating noise or blurring. The robustness differences in the combustion line generation algorithms may lead to distortion of some image features, thereby affecting the reliability of quantified values. Moreover, process data may also have problems such as sensor drift.
The core achievement in constructing a benchmark operating condition multi-modal dataset for the MSWI process lies in overcoming the information limitations of single-modal data. This dataset integrates process parameters such as flame images, furnace temperature, and flue gas pollutant concentrations. By correlating flame visual features (such as shape and brightness) with real-time temperature distribution and pollutant emission data, the combustion state is upgraded from “indirect parameter inference” to “multi-dimensional intuitive quantification.” This provides comprehensive training samples for models used in combustion stability assessment and extreme abnormal flame identification. Additionally, through standardized spatiotemporal stamps and annotation guidelines, it offers a unified data benchmark for algorithm comparison, primary and supplementary modeling, and technical verification across different research teams.
The constructed multi-modal dataset still has significant limitations regarding scalability. First, data on extreme operating conditions is scarce. Due to the randomness in the composition of waste and the safety constraints of industrial operations, systematically collecting multimodal samples for extreme scenarios—such as high moisture content/low calorific value waste combustion and local coking in the furnace—is challenging. This results in insufficient coverage of edge operating conditions within the dataset. Second, there is insufficient spatio-temporal synchronization between modalities. The temporal resolution discrepancy between flame images and sensor data, as well as the spatial offset caused by grate movement, can lead to matching deviations between visual features and process parameters, which negatively impacts multimodal fusion. The third issue is the bottleneck of annotation and scalability. Visual annotation of abnormal flames relies heavily on expert judgment, and ensuring consistency in annotation is difficult. Additionally, the compatibility cost of the existing architecture for new modalities—such as infrared thermal imaging and online analysis of flue gas components—is relatively high. At the same time, some enterprises are unable to access the dataset due to the confidentiality requirements of core operational data, further limiting the diversity and scale of the samples. These challenges make it difficult to fully support model training and mechanism research under all operating conditions.
In the future, the research can be carried out in the following aspects. First, optimize the missing value handling strategy. The LRDT model can be improved by combining data missing mechanisms (such as random missing and non-random missing), or multiple filling methods can be integrated (such as introducing generative models to assist in the filling of image-derived quantified values), and the impact of different methods on downstream tasks can be evaluated through cross-validation. Second, enhance the consistency and correlation of multimodal data. Strengthen spatio-temporal alignment through high-precision time synchronization technology, and develop more robust algorithms for generating and quantifying burning lines to reduce error transmission in the image preprocessing stage. Third, expand the coverage and diversity of the dataset, collect multimodal data under different operating conditions (such as different garbage components and load fluctuations), enhance the representativeness of the dataset, and at the same time establish a data quality assessment system to quantitatively label image clarity, process data signal-to-noise ratio, etc. Fourth, deepen the integrated application of multimodal data, explore cross-modal feature associations based on a complete dataset (such as the potential mapping between flame visual features and pollutant emissions), and provide more comprehensive decision support for the optimized control of the incineration process.
Liao et al. utilized the long-term industrial heat source dataset of India to accurately locate the industrial heat source areas, analyze their layout, resource allocation, and factory operation conditions, and provide decision support for the Indian government’s industrialization construction and ecological monitoring, etc. [31]. Wang et al. constructed a dense control valve parts dataset for industrial target detection to meet the requirements of control valve manufacturing enterprises for parts target detection [32]. Therefore, industrial datasets play an indispensable role in industrial development and even national strategic development.

5.2. Application Prospects

Based on this benchmark operational condition multimodal dataset, the research work that can be carried out is shown in Figure 13 below.
Figure 13 includes AI modeling, AI maintenance, AI decision, AI optimization, AI control, and AI recommendation modeling in terms of reproducing the "perception-prediction-control" mode of domain experts in the actual MSWI industry field. The AI modeling algorithm extracts multi-dimensional data, such as operational indicators and auxiliary variables, from the real and virtual dual-controlled objects, and builds multi-purpose models. The AI maintenance algorithm provides multi-level monitoring: through full-process operation status monitoring, operating condition/flame status recognition, and environmental indicator early risk warning, it achieves "early perception" of system abnormalities. The AI decision algorithm outputs adaptive decisions for multiple demands (such as production and environmental indicators) and multiple operating conditions (such as operating conditions composed of differentiated CV and EI values). AI optimization algorithms focus on different objectives, such as environmental, economic, and quality indicators, and resolve multi-scale and multi-object conflicts. The AI control algorithm connects the PLC/DCS loop, links multi-position control, precisely outputs "material, air, water" instructions, and implements "control". The AI recommendation model is built around the MSWI scenario, deeply replicating the "perception-prediction-control" mode of industry experts and constructing a model based on the embodied intelligence of domain experts. These AI algorithms are studied based on the dual dimensions of real and virtual controlled objects. The real side connects on-site operation record data, real process data, and real flame videos to restore the actual incineration conditions. The virtual side relies on a knowledge-data-mechanism hybrid-driven AI model, integrating numerical simulation, mechanism knowledge, historical process data, and historical flame videos to build a digital twin scene.
The entire system is linked by algorithms, integrating the entire process from data perception, prediction analysis, and decision output to execution control.
AI serves as a tool for learning multimodal feature patterns from a dataset, quickly absorbing and processing multimodal information through algorithms. The models it constructs can replicate expert “perception-prediction-control” logic and respond rapidly to parameter adjustments, addressing operational challenges under complex conditions. This contributes to the advancement of the MSWI process toward greater intelligence and refinement. However, several challenges remain in AI modeling, maintenance, decision-making, optimization, and control within MSWI processes.
(1)
Difficulties in AI modeling
The core challenge lies in the “data-mechanism-generalization” problem: the data suffers from issues such as being “more conventional and less extreme,” “more single-modal and less cross-modal,” and “more unlabeled and less precisely labeled.” The scarcity of edge case samples prevents the model from fully learning the operating condition rules. The process itself is highly coupled and nonlinear, while most AI models are black boxes, making integration into industrial mechanisms difficult and often leading to “excellent fitting but incorrect physical meaning.” The dynamic drift of operating conditions causes the accuracy of laboratory models to drop sharply when transferred to real-world scenarios. Frequent retraining requires a large amount of new data, which contradicts the low maintenance needs of such systems.
(2)
Difficulties in AI Maintenance
Focus areas include “dynamic adaptation—human–machine collaboration—system compatibility.” Data drift caused by equipment aging and changes in raw materials can degrade the accuracy of the AI maintenance model, making real-time monitoring challenging. Recalibration requires synchronization with production rhythms. While operation and maintenance personnel understand the equipment, they often lack AI knowledge, and AI engineers are not always familiar with industrial operations, creating barriers to effective human–machine collaboration. Additionally, the interface between AI systems and MSWI industrial systems, such as DCS, is often closed and utilizes diverse protocols, making it difficult to directly integrate AI maintenance model results into automatic operation and maintenance. This often necessitates manual conversion, which can introduce errors.
(3)
Difficulties in AI decision-making
The challenges here are “explainability—multi-objective balance—real-time performance.” AI decisions are often black-box solutions, and explaining the underlying logic in MSWI process terms is difficult (e.g., suggesting an adjustment in grate speed without explaining its impact on pollutant emission concentrations). Domain experts may resist adopting these decisions due to a lack of trust. Multiple objectives, such as emission reduction, efficiency improvement, and energy consumption reduction, often conflict, making it challenging for AI to dynamically adjust the weights. This results in outputs that are “theoretically optimal but practically unfeasible.” Complex models demand significant computation time, and simplifying them compromises accuracy, leading to either “fast but inaccurate” or “accurate but slow” outcomes.
(4)
Difficulties in AI optimization
The central issue lies in the disconnection between “constraint adaptation—global optima—practical implementation.” Industrial hard constraints in the MSWI process, such as minimum furnace temperature and maximum steam flow rate, limit the optimization space, causing AI to become stuck in local optima. This leads to conflicts between optimization goals and safety. The optimization objectives change dynamically with varying operating conditions (for example, controlling NOx emissions when the calorific value of waste is high or stabilizing furnace temperature when it is low). As a result, it becomes difficult for the model to promptly adjust the switching weights based on changing operating conditions, leading to mismatched output parameters. The ideal parameters suggested by AI often exceed the equipment’s control accuracy and fail to account for response delays, making them difficult to implement in real MSWI process operations.
(5)
Difficulties in AI control
The core challenge is “dynamic response—safety robustness—multi-loop coupling.” The MSWI process is highly nonlinear, features large delays, and exhibits time-varying characteristics. AI control requires precise modeling of dynamic parameters, which change due to equipment aging and MSW variations, increasing the likelihood of control overshoot or oscillations. AI optimization relies on “exploration—trial and error,” but in actual MSWI environments, such trial and error is not feasible. Excessive restriction of exploration makes it difficult to identify the optimal strategy, leading to a compromise of “safety but low precision.” Multiple control loops, such as the furnace temperature, steam flow rate, and combustion line position, are interdependent, and optimizing a single loop can cause other parameters to exceed acceptable limits. Multi-loop collaborative control is complex and challenging to implement within real-time constraints in the MSWI process.
In other industrial fields, Wang et al. developed a multi-source data fusion method for wing aerodynamic loads based on pre-training and fine-tuning, providing support for wing design and performance optimization [33]. Yan et al. constructed a data-driven model based on industrial datasets, integrating real oil storage system operation data with virtual simulation data [34]. Through model training and optimization, they achieved precise control over the oil storage process, providing data support and experimental verification for the safe operation and efficiency improvement of the oil storage system.
The multimodal dataset can support real-time diagnosis of combustion status, prediction of pollutant emissions, and coordinated optimization of grate speed, air volume, and other variables. Therefore, the construction of industrial datasets is a necessary foundation for solving the problem of scattered and disordered industrial data and supporting the implementation of intelligent technologies. It is also an important cornerstone for promoting the digital transformation of industry and achieving efficient optimization and innovative development.

6. Conclusions

This article builds a benchmark operational condition multimodal dataset for the MSWI process to support research on AI algorithms aimed at achieving sustainable development of the urban environment. The innovations are as follows: (1) In data processing, a hybrid missing value handling strategy is proposed, combining linear interpolation (for single missing values) with the LRDT model (for continuous missing/abnormal data), which improves data integrity compared to a single method. (2) The true value of the combustion line position is obtained by quantifying the flame image. (3) For the first time, an MSWI benchmark operational condition multimodal dataset combining visual information with process parameters was constructed, integrating process data, flame images, and CLQ values, filling the gap left by the lack of an open benchmark operational condition multimodal dataset in existing research. This dataset provides crucial data support for analyzing combustion states, predicting pollutant generation, and optimizing processes in MSWI. It contributes to sustainable development in three key areas: environmentally, it supports early warnings of combustion anomalies to reduce pollutant emissions and minimize ecological risks; in urban contexts, it optimizes waste heat recovery to improve efficiency, reduces landfill space usage, and alleviates land pressure; and industrially, it establishes benchmarks for “operating conditions—emissions—energy efficiency,” driving the shift from broad, inefficient operations to precise, sustainable practices. In the future, expanding the modalities and integrating them with solid waste lifecycle management will further strengthen support for the sustainable and stable development of urban ecosystems.
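The hybrid missing-value strategy summarized in point (1) — linear interpolation for isolated gaps, a regression model for continuous runs — can be sketched as below. This is a minimal illustration, not the paper's implementation: the function name is invented, and a least-squares linear trend stands in for the LRDT model when no model object is supplied.

```python
import numpy as np

def hybrid_impute(series, run_threshold=1, model=None):
    """Fill NaN gaps in a 1-D series.

    Runs of missing values no longer than run_threshold are filled by
    linear interpolation between neighbors; longer runs are predicted
    by a regression model (a stand-in for the paper's LRDT model)
    trained on the observed samples, using the time index as input.
    """
    x = np.asarray(series, dtype=float).copy()
    isnan = np.isnan(x)
    idx = np.arange(len(x))

    # locate runs of consecutive missing values as (start, end) pairs
    runs, start = [], None
    for i, missing in enumerate(isnan):
        if missing and start is None:
            start = i
        elif not missing and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(x) - 1))

    for s, e in runs:
        if e - s + 1 <= run_threshold:
            # short gap: linear interpolation from observed neighbors
            x[s:e + 1] = np.interp(idx[s:e + 1], idx[~isnan], x[~isnan])
        elif model is None:
            # long gap, fallback: least-squares linear trend on observed points
            coef = np.polyfit(idx[~isnan], x[~isnan], 1)
            x[s:e + 1] = np.polyval(coef, idx[s:e + 1])
        else:
            # long gap: model-based prediction (e.g., a decision-tree regressor)
            model.fit(idx[~isnan].reshape(-1, 1), x[~isnan])
            x[s:e + 1] = model.predict(idx[s:e + 1].reshape(-1, 1))
    return x
```

Any regressor with a fit/predict interface can be passed as `model` to play the role of the LRDT component.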

Author Contributions

Conceptualization, J.T.; Methodology, J.T.; Data curation, Y.H.; Software, Y.H. and H.T.; Investigation, Y.H.; Resources, Y.H. and H.T.; Writing—original draft, Y.H.; Validation, H.T.; Writing—review & editing, J.T.; Visualization, Y.H.; Supervision, J.T.; Project administration, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbreviation | Full Name
MSW | Municipal solid waste
MSWI | MSW incineration
AI | Artificial intelligence
CNN | Convolutional neural network
LSTM | Long short-term memory
EM | Expectation-maximization
LRI | Laser ranging interferometry
DXN | Dioxin
CV | Controlled variable
EI | Environmental indicators
MV | Manipulated variable
AV | Auxiliary variable
CO | Carbon monoxide
NOx | Nitrogen oxides
SO2 | Sulfur dioxide
HCl | Hydrogen chloride
PM | Particulate matter
HF | Hydrogen fluoride
MI | Mutual information
LRDT | Linear regression decision tree
CART | Classification and regression tree
GAN | Generative adversarial network
SCNN | Spatial convolutional neural network
CLQ | Combustion line quantification
DCGAN | Deep convolutional generative adversarial network
CycleGAN | Cycle-consistent generative adversarial network
FID | Fréchet inception distance
KNN | K-nearest neighbors
XGBoost | eXtreme gradient boosting
Symbol | Meaning
V ¯ Process Ori The average grate speed in the constructed dataset
V Process Ori The actual sample value in the constructed dataset
V Process Trans The grate speed converted into units
N The total number of samples in the dataset
M The number of features in the dataset
D The process dataset
x p c c Correlated input features
x o r i Output feature
x t r a i n Complete samples of the data for training
x p r e i n The input feature dataset of the remaining missing samples for predicting the missing values
L k MSE ( n , i ) The MSE loss function value of the k -th eigenvalue of the n -th sample in the i -th iteration
x nonleaf 1 The first non-leaf node
x 1 , nonleaf MD The segmentation variable of the first non-leaf node
θ Minimum sample size
X ^ leaf t The output of the t -th leaf node
X leaf t The input of the t -th leaf node
W leaf t The weight matrix of the t -th leaf node
J ( W leaf t ) A regularized least squares loss function
x ^ leaf t The true output of the leaf node
λ The coefficient of the regularization term
X real NM A collection of real flame images with normal combustion lines
X real FW A set of real burning line abnormal forward shift flame images
X real BC A set of real flame images with abnormal backward shift in the burning line
X real exFW A set of extremely abnormal forward shift flame images of the real burning line
X Flame images with the combustion line position ranging from 0 to 60%
X current Real-time flame image
X real , t NM The t-th image of a normal and real flame on the combustion line
< μ real , t NM , σ real , t NM > Tuple < mean of X real , t NM , variance of X real , t NM >
X generated , t BC The t-th generated image of the abnormally backward shifting flame of the combustion line
< μ generated , t BC , σ generated , t BC > Tuple < mean of X generated , t BC , variance of X generated , t BC >
X generated , t FW The t-th generated flame image with the combustion line abnormally moving forward
< μ generated , t FW , σ generated , t FW > Tuple < mean of X generated , t FW , variance of X generated , t FW >
X real , t BC The t-th real image of the burning line with an abnormally backward shifting flame
< μ real , t BC , σ real , t BC > Tuple < mean of X real , t BC , variance of X real , t BC >
X real , t FW The t-th real image of a flame with an abnormally forward shift in the burning line
< μ real , t FW , σ real , t FW > Tuple < mean of X real , t FW , variance of X real , t FW >
X generated , t exFW The t-th generated image of a flame with an extremely abnormal forward shift in the burning line
< μ generated , t exFW , σ generated , t exFW > Tuple < mean of X generated , t exFW , variance of X generated , t exFW >
L real exFW The set of real extremely abnormal forward shift images of the burning line, stored as tuples < X real , t exFW , < μ real , t exFW , σ real , t exFW > >
L generated exFW The set of generated extremely abnormal forward shift images of the burning line, stored as tuples < X generated , t exFW , < μ generated , t exFW , σ generated , t exFW > >
L real FW The set of real abnormal forward shift images of the combustion line, stored as tuples < X real , t FW , < μ real , t FW , σ real , t FW > >
L generated FW The set of generated abnormal forward shift images of the combustion line, stored as tuples < X generated , t FW , < μ generated , t FW , σ generated , t FW > >
L real NM The set of real normal combustion line flame images, stored as tuples < X real , t NM , < μ real , t NM , σ real , t NM > >
L real BC The set of real abnormal backward shift images of the burning line, stored as tuples < X real , t BC , < μ real , t BC , σ real , t BC > >
L generated BC The set of generated abnormal backward shift images of the burning line, stored as tuples < X generated , t BC , < μ generated , t BC , σ generated , t BC > >
L Non The burning line does not exist in the image set
L An image library for SCNN training, L = { L real exFW , L generated exFW , L real FW , L generated FW , L real NM , L real BC , L generated BC }
f Siamese Siamese network algorithm
μ current The average value of the X current burning line
σ current The variance of the X current burning line
T The corresponding template sublibrary loaded
CLQ current The quantified value of the burning line of X current
T , i The i-th template in the corresponding template sublibrary loaded
T exFW Template library for extreme abnormal flame images with the combustion line moving forward
T FW Template library for abnormal flame images with forward shift in the combustion line
T NM A template library for normal flame images of the combustion line
T BC Template library for abnormal flame images with rearward burning lines
X thresh _ dilate _ sobely The edge feature image of the combustion line obtained by the combustion line edge feature extraction algorithm
f image _ process Morphological processing algorithm
n wide The width of the image resolution captured by the camera is 576
n high The length of the image resolution captured by the camera is 720
θ area Lower limit of pixels in the burning area
μ The mean value of the white pixels in some areas of X thresh _ dilate _ sobely
σ The variance of the white pixels in some areas of X thresh _ dilate _ sobely
θ σ The variance threshold; when the variance is greater than this threshold, the burning line is considered absent
f combustion _ line _ calibration Combustion line calibration algorithm
μ r The mean of the multivariate normal distribution of the feature matrix Z r
μ g The mean of the multivariate normal distribution of the feature matrix Ζ g
C o v r The covariance matrix of X r
C o v g The covariance matrix of X g
X False The generated set of pseudo-labeled images of extreme abnormal flames
X Real A set of unlabeled real flame images
X False The generated pseudo-labeled image of extreme abnormal flames
X Real Unmarked real flame images
G Fals e _ to _ Real ( . ) A generator for converting pseudo-labeled images into real images
G Real _ to _ False ( . ) A generator for converting real images into pseudo-labeled images
D Re al ( . ) A discriminator for determining whether it is a real image
D False ( . ) A discriminator for determining whether an image is falsely labeled
X Generated _ False The pseudo-labeled image generated by G Real _ to _ False ( X Real )
X Generated _ Real The real image generated by G Fals e _ to _ Real ( X False )
X Reconstrute _ Real Reconstructed real images by G Fals e _ to _ Real ( X Generated _ False )
X Reconstrute _ False Reconstructed pseudo-labeled images by G Real _ to _ False ( X Generated _ Real )
X Real _ id The identity verification image of the real image G Fals e _ to _ Real ( X Real )
X False _ id The authentication image of the falsely labeled image G Real _ to _ False ( X False )
y Generated _ False The predicted value of D False ( . ) for the generated pseudo-labeled image, i.e., D False ( X Generated _ False )
Y Generated _ False The set of y Generated _ False
y False The predicted value of D False ( . ) for the falsely labeled image, i.e., D False ( X False )
Y False The set of y False
y Generated _ Real The predicted value of D Real ( . ) for the generated real image, i.e., D Real ( X Generated _ Real )
Y Generated _ Real The set of y Generated _ Real
y Real The predicted value of D Real ( . ) for the real image, i.e., D Real ( X Real )
Y Real The set of y Real
Y 1 A vector with all elements being 1
Y 0 A vector with all elements being 0
m The number of images in X Real
k The number of images in X False
L D Real Update the loss function of D Real
L D False Update the loss function of D False
X Flame image set
X FW It includes a set of real and generated flame images with the burning line moving forward and extreme moving forward, with the burning line position ranging from 0% to 51%
X NM A set of normal flame images containing real combustion lines, with the position of the combustion lines ranging from 51% to 73.6%
X BC A set of flame images with real and generated burning lines shifted backward, with burning line positions ranging from 73.6% to 100%
X i , X j , X k The flame image in X , X i , X j , X k X = { X FW , X NM , X BC }
y i , y j , y k The label value corresponding to X i , X j , X k
X i positive Positive samples constructed for training the Siamese network
X i negative Negative samples constructed for training the Siamese network
y i positive The label value of the positive sample, y i positive = 1
y i negative The label value of the negative sample, y i negative = 0
y ^ i positive The predicted value of the Siamese network for positive samples
y ^ i negative The predicted value of the Siamese network for negative samples
f Siamese Siamese network
f Dense The fully connected layer in the Siamese network
f vgg 16 The VGG16 layer in the Siamese network
L Siamese The loss function of the Siamese network
< μ , i , σ , i > The characteristic description of T , i
SIM ( X current , T , i ) Measure the similarity between X current and T , i based on the Siamese network
SIM max The maximum similarity between T , i and X current in T is measured by using SCNN
θ sim Measure the similarity threshold
w i Weight parameter
L j The j-th image in L and its corresponding feature
I The number of templates in T
X j The j-th image in L
μ j The average burning line of X j
σ j The variance of the combustion line of X j
J The number of images in L
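The FID statistics defined above ( μ r , μ g , Cov r , Cov g ) combine into the standard Fréchet inception distance between the Gaussians fitted to real and generated image features. A minimal numpy sketch (the function name is illustrative; production code typically computes the matrix square root with scipy):

```python
import numpy as np

def fid(mu_r, cov_r, mu_g, cov_g):
    """FID = ||mu_r - mu_g||^2 + Tr(cov_r + cov_g - 2 (cov_r cov_g)^(1/2)).

    For positive semi-definite covariances, Tr((cov_r cov_g)^(1/2)) equals
    the sum of the square roots of the eigenvalues of cov_r @ cov_g, which
    are real and non-negative, so no full matrix square root is needed.
    """
    mu_r, mu_g = np.asarray(mu_r, float), np.asarray(mu_g, float)
    cov_r, cov_g = np.asarray(cov_r, float), np.asarray(cov_g, float)
    diff = mu_r - mu_g
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    # clip tiny negative values caused by floating-point error
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Identical feature distributions give FID = 0; larger values indicate that the generated flame images diverge from the real ones.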

References

  1. Li, B.; Liu, Y.; Xiong, X.; Zhang, M.; Duan, O. Study on the Behavior of Chlorine in the Melting Process of Municipal Solid Waste Incineration Fly Ash. Environ. Sanit. Eng. 2025, 33, 110–115. [Google Scholar]
  2. Chen, A.; Chen, J.; Cui, J.; Fan, C.; Han, W. Research on the Risk and Countermeasures of “Garbage Encirclement” in 31 Municipalities Directly under the Central Government and Provincial Capitals (Capital Cities) in China: An Empirical Study Based on the DIIS Method. Bull. Chin. Acad. Sci. 2019, 34, 797–806. [Google Scholar]
  3. National Development and Reform Commission. Guiding Opinions on Promoting High-Quality Development of Municipal Solid Waste Incineration Power Generation. 2021. Available online: https://www.ndrc.gov.cn/wsdwhfz/202111/t20211111_1303691.html (accessed on 2 August 2025).
  4. Meng, X.; Hou, Q.; Qiao, J. Intelligent Operation Optimization of Municipal Solid Waste Incineration Process Based on Multi-Objective Particle Swarm Optimization Algorithm. Acta Autom. Sin. 2024, 50, 2462–2473. [Google Scholar]
  5. Tang, J.; Xia, H.; Yu, W.; Qiao, J. Research Status and Prospect of Intelligent Optimization Control in Urban Solid Waste Incineration Process. Acta Autom. Sin. 2023, 49, 2019–2059. [Google Scholar]
  6. Tang, J.; Xu, W.; Xia, H.; Qiao, J. Missing Data Filling and Application for Urban Solid Waste Incineration Process. J. Beijing Univ. Technol. 2023, 49, 435–448. [Google Scholar]
  7. Liu, S. Incomplete big data filling algorithm based on machine learning. Int. Core J. Eng. 2021, 7, 461–466. [Google Scholar]
  8. Wu, S.; Feng, X.; Shan, Z. Missing data imputation approach based on incomplete data clustering. Chin. J. Comput. 2012, 35, 1726–1738. [Google Scholar] [CrossRef]
  9. Emmanuel, T.; Maupong, T.; Mpoeleng, D.; Semong, T.; Mphago, B.; Tabona, O. A survey on missing data in machine learning. J. Big Data 2021, 8, 140. [Google Scholar] [CrossRef]
  10. Deng, J.; Shan, L.; He, D.; Tang, R. Methods for Handling Missing Data and Their Development Trends. Stat. Decis. 2019, 35, 28–34. [Google Scholar]
  11. Li, B.; Huang, Q.; Lu, X.; Zhang, X.; Wang, B. Cross-Domain Deep Text Matching Based on Meta-Learning Instance Weighting Method. Softw. Guide 2025, 1–8. Available online: https://link.cnki.net/urlid/42.1671.TP.20250715.1722.028 (accessed on 2 August 2025).
  12. Liang, L.; Lin, J.; Huo, Y. Adaptive k-Nearest Neighbor Missing Value Imputation Method Based on Probability Density. J. South China Norm. Univ. 2024, 56, 80–90. [Google Scholar]
  13. Kabir, G.; Tesfamariam, S.; Hemsing, J.; Sadiq, R. Handling incomplete and missing data in water network database using imputation methods. Sustain. Resilient Infrastruct. 2020, 5, 365–377. [Google Scholar] [CrossRef]
  14. Yan, Q.; Hu, M.; Jin, S.; Huang, W. Gap Filling for ISMN Time Series Using CYGNSS Data. IEEE Geosci. Remote Sens. Lett. 2025, 22, 2500105. [Google Scholar] [CrossRef]
  15. Tzoumpas, K.; Estrada, A.; Miraglio, P.; Zambelli, P. A Data Filling Methodology for Time Series Based on CNN and (Bi)LSTM Neural Networks. IEEE Access 2024, 12, 31443–31460. [Google Scholar] [CrossRef]
  16. Zhu, H.; Deng, Z.; Tang, Z.; Zhu, H. An improved expectation maximization algorithm for missing data management of concrete pump truck. J. Cent. South Univ. (Sci. Technol.) 2021, 52, 443–450. [Google Scholar]
  17. Zhao, X.; Zhang, Y.; Yin, B.; Liu, H.; Zhang, K. Imputation of incomplete bus arrival time based on the improved k-means algorithm. J. Beijing Univ. Technol. 2018, 44, 135–143. [Google Scholar]
  18. Lu, S.W.; Wu, X.L.; Zheng, J.; He, Z.; Gu, J.; Han, H. Data-cleaning method based on dynamic fusion LOF for municipal wastewater treatment process. Control. Decis. 2022, 37, 1231–1240. [Google Scholar]
  19. Han, H.; Lu, S.; Wu, X.; Qiao, J. Abnormal data cleaning method for municipal wastewater treatment based on improved support vector machine. J. Beijing Univ. Technol. 2021, 47, 1011–1020. [Google Scholar]
  20. Hu, L.; Xu, X.; Xiao, P.; Wang, Y.; Liu, J. A Review of Missing Data Handling Methods for Time Series. Control Theory Appl. 2025, 1–15. Available online: https://link.cnki.net/urlid/44.1240.TP.20250716.1838.114 (accessed on 2 August 2025).
  21. Yin, H.; Zhu, Z.; Yan, Y.; Wang, C.; Zhong, M.; Feng, W.; Zhu, J.; Gu, D. Preprocessing of GRACE-FO Gravity Satellite Laser Interferometry Ranging Raw Data. Chin. J. Geophys. 2025, 68, 1615–1632. [Google Scholar]
  22. Guo, H. Quantitative Study on Flame Combustion Line for Grate Furnace Municipal Solid Waste Incineration Process. Master’s Thesis, Beijing University of Technology, Beijing, China, 2023. [Google Scholar]
  23. Huang, W.; Meng, X.; Qiao, J. Dynamic Collaborative Optimization Method for Municipal Solid Waste Incineration Process. Sci. Sin. 2025, 55, 1200–1220. [Google Scholar] [CrossRef]
  24. Cosimo, M.; Marco, M.; Nicolas, S. The relationship between municipal solid waste and greenhouse gas emissions: Evidence from Switzerland. Waste Manag. 2020, 113, 508–520. [Google Scholar] [CrossRef] [PubMed]
  25. Guo, H.; Tang, J.; Ding, H.; Qiao, J. Combustion State Recognition for MSWI Processes Based on Hybrid Data Augmentation. Acta Autom. Sin. 2024, 50, 560–575. [Google Scholar]
  26. Niu, Z.; Reformat, M.Z.; Tang, W.; Zhao, B. Electrical Equipment Identification Method with Synthetic Data Using Edge-Oriented Generative Adversarial Network. IEEE Access 2020, 8, 136487–136497. [Google Scholar] [CrossRef]
  27. Liu, X.; Ren, Y.; Wang, L. Low-Illumination LiDAR Image Missing Region Inpainting Algorithm Based on U-Net and GAN. Laser J. 2025, 46, 135–141. [Google Scholar]
  28. Lin, C.T.; Huang, S.W.; Wu, Y.Y.; Lai, S.H. GAN-Based Day-to-Night Image Style Transfer for Nighttime Vehicle Detection. IEEE Trans. Intell. Transp. Syst. 2021, 22, 951–963. [Google Scholar] [CrossRef]
  29. Zhang, C.; Tang, Y.; Zhao, C.; Sun, Q.; Ye, Z.; Kurths, J. Multitask GANs for Semantic Segmentation and Depth Completion With Cycle Consistency. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5404–5415. [Google Scholar] [CrossRef] [PubMed]
  30. Zhang, Y.; Han, X.; Zhang, H.; Zhao, L. Edge detection algorithm of image fusion based on improved Sobel operator. In Proceedings of the IEEE 3rd Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 3–5 October 2017; pp. 457–461. [Google Scholar]
  31. Liao, R.; Ma, C.; Xie, Y.; Sui, X.; Yang, J.; Li, T.; Zhang, P. Dataset of Active Industrial Heat Source Areas in India from 2012 to 2021. China Sci. Data 2025, 10, 164–174. [Google Scholar]
  32. Wang, L.; Bai, J.; Li, Y.; Li, W. Dense Control Valve Parts Dataset for Industrial Object Detection. Opt. Precis. Eng. 2024, 32, 1241–1251. [Google Scholar] [CrossRef]
  33. Wang, P.; Zeng, L.; Shao, X.; Li, J. Research on Multi-Source Data Fusion Modeling Method for Wing Aerodynamic Load Based on Pre-Training and Fine-Tuning. Acta Aeronaut. Astronaut. Sin. 2025, 1–15. Available online: https://link.cnki.net/urlid/11.1929.v.20250725.1653.016 (accessed on 2 August 2025).
  34. Yan, Y.; Liu, X.; Li, Z.; Sha, Y.; Zhang, W.; Wang, R. A Virtual-Real Combined Oil Storage Control Experiment Platform Based on Data-Driven Modeling. Exp. Technol. Manag. 2025, 42, 179–187. [Google Scholar]
Figure 1. MSWI process flow of typical grate furnace.
Figure 2. Optimization control mode of “perception—prediction—control” based on the behavior of the domain expert embodied intelligence.
Figure 3. Structure of the edge verification platform of an actual MSWI power plant.
Figure 4. Multimodal dataset construction strategy.
Figure 5. Training set prediction results.
Figure 6. Validation set prediction results.
Figure 7. New data prediction results.
Figure 8. Partial flame images of the grate on the left.
Figure 9. Structure of the constructed multimodal dataset.
Figure 10. Partial process data diagram.
Figure 11. Part of the flame images on the left side of the grate.
Figure 12. Part of the flame images on the right side of the grate.
Figure 13. Application prospect based on a multimodal dataset.
Table 1. Variable information for MVs, CVs and EIs.
Type | Process Variable Name | Value Range | Unit
MVs | Primary air temperature | [90.20, 182.82] | °C
MVs | Total primary air volume | [50.72, 79.58] | km3N
MVs | Primary air pressure | [1.63, 3.33] | kPa
MVs | Drying zone air volume | [9.60, 17.29] | km3N/h
MVs | Combustion stage 1 air volume | [20.05, 42.09] | km3N/h
MVs | Combustion stage 2 air volume | [12.39, 20.31] | km3N/h
MVs | Burnout zone exhaust air volume | [2.17, 9.70] | km3N/h
MVs | Secondary air temperature | [11.49, 22.99] | °C
MVs | Secondary air volume | [0, 44.80] | km3N/h
MVs | Waste feed rate | [3.39, 7.33] | m/h
MVs | Grate speed in the drying zone | [4.25, 6.90] | m/h
MVs | Grate speed in combustion stage 1 | [4.35, 6.93] | m/h
MVs | Grate speed in combustion stage 2 | [4.90, 6.66] | m/h
MVs | Boiler feedwater flow | [0.73, 1.73] | t/h
MVs | Urea dosage | [1.03, 3.90] | L/h
MVs | Quicklime dosage | [382, 410] | kg/h
MVs | Activated carbon dosage | [25.20, 25.20] | kg/h
CVs | Furnace temperature | [897.10, 1051.15] | °C
CVs | Flue gas oxygen content | [5.19, 11.76] | %
CVs | Steam flow rate | [56.39, 77.81] | t/h
CVs | Flue gas temperature at the burnout grate outlet | [591.80, 892.19] | °C
CVs | Quantified value of the combustion line | [0, 1] | %
EIs | Carbon monoxide (CO) | [0, 293.68] | mg/m3N
EIs | Nitrogen oxides (NOx) | [0, 430.69] | mg/m3N
EIs | Sulfur dioxide (SO2) | [0, 52.38] | mg/m3N
EIs | Hydrogen chloride (HCl) | [0, 6.45] | mg/m3N
EIs | Particulate matter (PM) | [1.49, 7.86] | mg/m3N
EIs | Hydrogen fluoride (HF) | [0, 0.20] | mg/m3N
EIs | Oxygen content in G3 flue gas | [7.69, 25.00] | %
Table 2. Processed 15 MVs and their original ones.
Processed MVs | Original MVs | Calculation Method
Primary air temperature | Primary air temperature | /
Total primary air volume | The air volume of the first drying section on the left, the air volume of the first drying section on the right, the air volume of the second drying section on the left, the air volume of the second drying section on the right, the air volume of the first combustion section on the left, the air volume of the first combustion section on the right, the air volume of the second combustion section on the left, and the air volume of the second combustion section on the right | Summation
Primary air pressure | Primary air pressure | /
Drying zone air volume | The air volume of the first drying section on the left, the air volume of the first drying section on the right, the air volume of the second drying section on the left, and the air volume of the second drying section on the right | Summation
Combustion stage 1 air volume | The air volume of the first stage of combustion on the left, the air volume of the first stage of combustion on the right, the air volume of the second stage of combustion on the left, and the air volume of the second stage of combustion on the right | Summation
Combustion stage 2 air volume | The air volume of the first stage of combustion on the left, the air volume of the first stage of combustion on the right, the air volume of the second stage of combustion on the left, and the air volume of the second stage of combustion on the right | Summation
Burnout zone exhaust air volume | The combustion air volume on the left and the combustion air volume on the right | Summation
Secondary air temperature | Secondary air temperature | /
Secondary air volume | Secondary air volume | /
Waste feed rate | The left inner feeding speed, left outer feeding speed, right inner feeding speed, and right outer feeding speed | Take the mean value
Grate speed in the drying zone | The left inner drying grate velocity, the left outer drying grate velocity, the right inner drying grate velocity, and the right outer drying grate velocity | Take the mean value
Grate speed in combustion stage 1 | The combustion speeds of the first section of the left inner grate, the first section of the left outer grate, the first section of the right inner grate, and the first section of the right outer grate | Take the mean value
Grate speed in combustion stage 2 | The combustion grate speed of the left inner burner, the combustion grate speed of the left outer burner, the combustion grate speed of the right inner burner, and the combustion grate speed of the right outer burner | Take the mean value
Boiler feedwater flow | The current cumulative amount and the cumulative amount of the previous minute | The current value minus the value of the previous minute
Urea dosage | The current cumulative amount and the cumulative amount of the previous minute | The current value minus the value of the previous minute
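The three calculation rules used in Table 2 — summation of section air volumes, averaging of grate speeds, and differencing of cumulative counters — can be sketched as below. The function names are illustrative, not the plant's actual tag names.

```python
import numpy as np

def total_air_volume(section_volumes):
    """Summation rule: e.g., total primary air volume as the sum of
    the left/right drying and combustion section air volumes."""
    return float(np.sum(section_volumes))

def mean_grate_speed(speeds):
    """Mean rule: e.g., grate speed as the mean of the inner/outer,
    left/right grate speeds."""
    return float(np.mean(speeds))

def flow_from_cumulative(cumulative):
    """Differencing rule: per-minute flow (e.g., boiler feedwater or
    urea dosage) as the current cumulative value minus the cumulative
    value of the previous minute."""
    return np.diff(np.asarray(cumulative, dtype=float))
```

For example, a minute-scale feedwater flow series is obtained by applying `flow_from_cumulative` to the raw cumulative counter before the missing-value and outlier processing steps.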
Table 3. The evaluation indicators for the comparative experimental results.
Dataset | Method | RMSE | MAE | R2
Training set | KNN | 1.4309 | 0.9461 | 0.2727
Training set | XGBoost | 0.7334 | 0.6160 | 0.7663
Training set | LRDT | 0.6829 | 0.5654 | 0.7974
Validation set | KNN | 1.6145 | 1.2778 | −1.1364
Validation set | XGBoost | 1.1402 | 0.9236 | 0.4635
Validation set | LRDT | 1.0356 | 0.7811 | 0.5575
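The three evaluation indicators in Table 3 are standard regression metrics and can be computed as follows. Note that R2 can be negative, as for KNN on the validation set, when the model performs worse than predicting the mean.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error between true and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error between true and predicted values."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)
```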
Table 4. The process of flame image quantization.
Original Flame Image | Image of the Burning Line | Quantified Value of the Combustion Line
Sustainability 18 02282 i001 | Sustainability 18 02282 i002 | 0.7287
Sustainability 18 02282 i003 | Sustainability 18 02282 i004 | 0.7112
Sustainability 18 02282 i005 | Sustainability 18 02282 i006 | 0.6946
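The edge-based quantification illustrated in Table 4 can be sketched with a toy numpy routine. This is only an assumption-laden simplification of the paper's pipeline (which also applies thresholding, dilation, morphological processing, and template matching): here the combustion line is taken to be the row with the strongest horizontal Sobel edge response, and CLQ is that row position normalized by the image height.

```python
import numpy as np

def quantify_combustion_line(gray):
    """Toy combustion line quantification (CLQ) on a 2-D grayscale image.

    Assumptions: the combustion line is the dominant horizontal edge,
    and CLQ is its row index divided by the image height (0 = top,
    1 = bottom of the frame).
    """
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    # vertical Sobel kernel, which responds to horizontal edges
    sobel_y = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=float)
    resp = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            resp[i, j] = np.sum(g[i:i + 3, j:j + 3] * sobel_y)
    # the row with the largest total edge response hosts the combustion line
    row_strength = np.abs(resp).sum(axis=1)
    line_row = int(np.argmax(row_strength)) + 1  # +1 for the border offset
    return line_row / h
```

On a synthetic image that is dark in its upper half and bright in its lower half, this returns a CLQ near the middle of the frame, consistent with the 0-to-1 range used for the quantified combustion line value.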

Share and Cite

MDPI and ACS Style

Hua, Y.; Tang, J.; Tian, H. Benchmark Operational Condition Multimodal Dataset Construction for the Municipal Solid Waste Incineration Process. Sustainability 2026, 18, 2282. https://doi.org/10.3390/su18052282
