Article

A Deep Learning Model for NOx Emissions Prediction of a 660 MW Coal-Fired Boiler Considering Multiscale Dynamic Characteristics

1
School of Power Engineering, Jiangxi Electric Vocational & College, Nanchang 330032, China
2
School of Chinese Language and Literature, Nanjing XiaoZhuang University, Nanjing 211171, China
3
School of Electrical Engineering, Nanjing Vocational University of Industry Technology, Nanjing 210023, China
*
Author to whom correspondence should be addressed.
Atmosphere 2025, 16(5), 533; https://doi.org/10.3390/atmos16050533
Submission received: 25 March 2025 / Revised: 15 April 2025 / Accepted: 28 April 2025 / Published: 30 April 2025
(This article belongs to the Section Air Quality)

Abstract

Coal-fired boilers contribute significantly to nitrogen oxide (NOx) emissions, posing critical environmental and health risks. Effective prediction of NOx emissions is essential for optimizing control measures and meeting stringent emission standards. This study applies a Multiscale Graph Convolutional Network (MSGNet) designed to capture multiscale dynamic relationships among the operational parameters of a 660 MW coal-fired boiler. MSGNet employs the Fast Fourier Transform (FFT) for automatic periodic pattern recognition, adaptive graph convolution for dynamic inter-variable relationships, and a multihead attention mechanism to assess temporal dependencies comprehensively. Compared with existing state-of-the-art methods, the proposed structure performs well, achieving an RMSE of 2.176 mg/m³, an MAE of 1.652 mg/m³, and an $R^2$ of 0.988. Experimental evaluations demonstrate that MSGNet achieves superior predictive performance compared with traditional methods such as LSTM, BiLSTM, and GRU. The results underscore MSGNet’s accuracy, stability, and generalization capability, highlighting its potential for advanced emission control and environmental management applications in thermal power generation.

1. Introduction

Coal-fired boilers play a crucial role in China’s electricity generation. However, nitrogen oxide (NOx) emissions from these boilers have attracted increasing attention [1,2,3]. NOx emissions are a major contributor to atmospheric pollution: they cause acid rain and photochemical smog and pose significant risks to human health and ecosystems [4,5,6]. With stricter environmental policies, controlling and optimizing NOx emissions has become a critical challenge in the power industry. Therefore, accurately predicting NOx emissions is essential for optimizing denitrification processes, improving environmental outcomes, ensuring equipment safety, and achieving ultra-low emission goals [7].
However, accurately modeling NOx generation and emissions from coal-fired boilers is challenging due to complex combustion processes, frequent load fluctuations, variability in fuel quality, and changing environmental conditions [8,9]. Traditional analytical models based on combustion mechanisms struggle to represent these complexities accurately [10]. Moreover, conventional continuous emission monitoring systems (CEMS) suffer from inherent limitations such as measurement delays, equipment aging, signal interference, and maintenance difficulties [11]. These limitations hinder real-time monitoring of NOx concentrations, limiting precise ammonia injection control.
To address these challenges, data-driven machine learning methods have recently gained prominence. Approaches such as Support Vector Machines (SVMs) [12], Convolutional Neural Networks (CNNs) [13], Long Short-Term Memory (LSTM) networks [14], and ensemble learning [15] have been widely employed for predicting NOx emissions from coal-fired boilers. Remarkable advancements have also been achieved through the application of intelligent algorithms such as Autoencoders (AE) [16], Artificial Neural Networks (ANNs) [17,18,19], Gaussian Processes (GPs) [20], improved SVM variants [21,22], and Extreme Learning Machines (ELMs) [23,24,25]. These approaches have successfully captured the complex nonlinear relationships between auxiliary variables and target NOx emissions, showing clear advantages in managing dynamic combustion systems. In addition, Pachauri et al. [26] proposed a stacked ensemble model that improved predictive accuracy for CO and NOx emissions. Tiep Nguyen et al. [27] developed a hyperparameter optimization framework that significantly enhanced deep learning models for regression tasks. Similarly, de Lima Nogueira et al. [28] demonstrated that a Random Forest regression model achieved high predictive accuracy for engine emissions, further supporting the potential of machine learning models in emissions prediction. These methods effectively utilize historical operational data, circumventing the uncertainties and complexities associated with traditional analytical models, thus achieving considerable predictive accuracy and efficiency in practical applications.
Nevertheless, existing machine learning models often overlook dynamic relationships between variables across different time scales during NOx generation and emission. Additionally, they fail to capture both short-term and long-term dependencies within variables, resulting in limited predictive performance under complex operational conditions. Particularly under rapid load fluctuations, single-scale or traditional structured models are inadequate in adapting to multiscale dynamic data features, further constraining their practical application.
In response, this paper proposes a novel prediction model based on a Multiscale Graph Convolutional deep learning Network (MSGNet) [29] to accurately predict NOx emissions from a 660 MW coal-fired boiler. Firstly, the Random Forest (RF) [30] algorithm is applied to screen auxiliary variables, effectively reducing input redundancy and mitigating noise interference, thus enhancing the model’s prediction stability. Subsequently, the MSGNet employs Fast Fourier Transform (FFT) [31] to automatically identify dominant periodic characteristics from historical operational data, establishing data sequences at various time scales accordingly. Moreover, an adaptive graph convolutional mechanism is utilized to capture dynamic interdependencies among variables across multiple scales, enabling the model to effectively extract both long-term and short-term temporal dependencies inherent in the data. Additionally, the integration of a multihead attention mechanism further enhances the sensitivity and adaptability of the model to dynamic features, significantly improving predictive performance under complex operating conditions. An adaptive weighted aggregation strategy is also designed, synthesizing predictions across different time scales to further improve accuracy and stability.
Prior studies have demonstrated the superior performance of MSGNet over traditional models in time series predictions across multiple standard datasets. Its capabilities make it particularly suitable for handling nonlinear, multiscale, and dynamic characteristics inherent in coal-fired boiler operation data. Thus, this research aims to apply the MSGNet algorithm for predicting NOx emissions from coal-fired boilers, striving to overcome limitations of traditional methods, enhance model accuracy and generalizability, and provide robust theoretical and technical support for efficient denitrification control strategies, operational optimization, and environmental management.
The remainder of this paper is structured as follows: Section 2 provides an overview of the thermal power plant under investigation. In Section 3, the proposed methodology is detailed, encompassing both data preprocessing procedures and the implementation of the MSGNet algorithm. Section 4 discusses the experimental setup along with the results obtained from historical operational data. Finally, Section 5 concludes this study by summarizing the key findings and implications.

2. Description of the Research Object

2.1. The Studied Boiler

In this research, a 660 MW tangentially coal-fired boiler equipped with a low-NOx concentric firing system (LNCFS) was selected as the study object. Figure 1 shows the overall flow chart of the coal-fired power plant and the cross-section of the boiler furnace. In the coal-fired boiler, the combustion system features burners installed at each corner of the boiler’s cross-section, which inject pulverized coal mixed with hot primary air to establish a swirling flame pattern. This arrangement aims to reduce NOx emissions and enhance combustion efficiency by controlling the combustion temperature and improving air-fuel mixing.
The specific arrangement of the combustion system is designed to optimize the air-fuel ratio, minimize temperature peaks, and promote stable combustion. The boiler employs six primary air nozzles labeled A to F, each paired with corresponding secondary air nozzles that are arranged systematically along the furnace’s vertical direction. The primary air nozzles direct the coal-air mixture into the furnace, while the secondary air nozzles provide additional air to ensure complete combustion. This staged air injection system allows for more efficient use of the fuel, reducing the amount of oxygen in the lower part of the furnace, which in turn lowers the formation of thermal NOx by limiting the high-temperature zones where NOx typically forms.
Additionally, the secondary air delivery system includes concentric firing nozzles (AI/AII-FI/FII) and auxiliary air nozzles (AA-EF), which help regulate the combustion environment and improve the distribution of air within the furnace. By controlling the air supply at different stages of the combustion process, the design reduces the formation of NOx emissions. Two layers of close-coupled overfire air (CCOFA) nozzles and five layers of separated overfire air (SOFA) nozzles are positioned above the main combustion zone to optimize combustion staging and further promote the uniform distribution of temperature. The use of overfire air is particularly effective at lowering NOx emissions because it reduces the peak temperature within the combustion zone, where thermal NOx formation is most prevalent.
During operation, the concentric firing air nozzles maintain a 25° deflection relative to the main injection axis, which helps improve the combustion air distribution and minimizes the formation of local hot spots. Similarly, the SOFA nozzles are angled at −12° to effectively introduce air above the main combustion zone, creating a controlled environment where fuel is burned more completely. Each primary burner nozzle (A–F) can adjust its vertical inclination angle to achieve optimal air-fuel mixing, ensuring that the combustion process is as efficient as possible. The horizontal deflection angle remains fixed to maintain flame stability and reduce pollutant formation, as this arrangement helps ensure that the flame is properly shaped and evenly distributed across the furnace.
These combustion system design features are key strategies that help reduce NOx emissions by controlling the temperature distribution within the furnace and enhancing the efficiency of the combustion process. This design is based on established methods proven to reduce NOx emissions in large-scale coal-fired boilers, and while this study does not propose new combustion system designs, it leverages these well-established techniques to optimize NOx emission prediction.

2.2. Variables Selection

The formation and emission of NOx in coal-fired boilers are primarily influenced by operating parameters and coal properties. However, real-time coal quality data are typically unavailable due to practical limitations in monitoring technology at power plants. Previous research [5,7] has indicated that operational parameters can effectively reflect variations in coal quality, thereby providing sufficient information for accurate NOx emission modeling. For instance, earlier studies have demonstrated successful NOx prediction outcomes by exclusively utilizing operational parameters without explicitly incorporating coal characteristics [32,33]. Thus, in the current research, the NOx emission prediction model predominantly considers boiler operational variables.
In this work, operational variables are selected by analyzing their correlation with historical NOx emission data recorded under various operational scenarios. As presented in Table 1, a comprehensive dataset comprising 32 variables, including boiler load, oxygen concentration, air flow rates, damper positions, and air nozzle configurations, is assembled. Data are collected from historical operation records at 30-s intervals, yielding a total of 14,000 valid samples. As illustrated in Figure 2, these samples are further normalized and systematically partitioned into training, validation, and testing subsets at a ratio of 8:1:1 to facilitate model evaluation. Specifically, considering the dynamic and temporal nature of the NOx prediction task, random partitioning would disrupt the inherent temporal continuity and correlations among operational parameters. Therefore, we employed a sequential data partitioning approach, preserving the chronological order of collected samples to accurately reflect real-world operational conditions. The training dataset contains the majority of samples, including significant load variations, enabling comprehensive learning of dynamic behaviors. Validation and testing datasets sequentially follow the training dataset to objectively assess the model’s predictive performance and generalization capability over successive operational periods, thus maintaining the temporal consistency necessary for dynamic model evaluation.
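The sequential 8:1:1 partition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `chronological_split` is hypothetical, and the list of integers merely stands in for the 14,000 time-ordered records.

```python
def chronological_split(samples, ratios=(8, 1, 1)):
    """Split an ordered sequence into train/validation/test parts without shuffling."""
    total = sum(ratios)
    n = len(samples)
    train_end = n * ratios[0] // total
    val_end = train_end + n * ratios[1] // total
    # Slicing keeps the original time order inside and across the subsets.
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


# Placeholder indices standing in for the 30-s operational records.
train, val, test = chronological_split(list(range(14000)))
print(len(train), len(val), len(test))  # 11200 1400 1400
```

Because the subsets are contiguous slices, every validation sample is later in time than every training sample, and every test sample is later than every validation sample.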

3. Methodology

3.1. Data Preprocessing

3.1.1. Data Normalization

First, the collected historical data samples, whose variables span different magnitudes, are normalized to a uniform scale. Without normalization, features with larger magnitudes could dominate the model’s learning process, leading to biased predictions and slower convergence. To avoid this, we apply the Z-score method to standardize all variables to zero mean and unit variance. This ensures that the model treats each variable equally, improving both convergence and predictive accuracy. The specific calculation is as follows:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $x_{norm}$ is the normalized data, $x$ denotes the original value, and $\mu$ and $\sigma$ are the mean and standard deviation of the collected data, respectively.
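A minimal sketch of the Z-score step in Python with NumPy follows. The function names, the array layout (rows are samples, columns are variables), and the example magnitudes are illustrative assumptions. The guard for constant columns is an addition that the text does not describe.

```python
import numpy as np


def zscore_fit(data):
    """Return the per-variable mean and standard deviation of a (samples, variables) array."""
    mu = data.mean(axis=0)
    sigma = data.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # assumption: avoid division by zero
    return mu, sigma


def zscore_apply(x, mu, sigma):
    """Apply x_norm = (x - mu) / sigma column by column."""
    return (x - mu) / sigma


rng = np.random.default_rng(0)
# Hypothetical data: three variables with very different magnitudes.
raw = rng.normal(loc=[660.0, 3.5, 0.02], scale=[50.0, 0.5, 0.005], size=(1000, 3))
mu, sigma = zscore_fit(raw)
normed = zscore_apply(raw, mu, sigma)
```

In a chronological split, one common choice is to fit `mu` and `sigma` on the training subset only and reuse them for validation and test data; whether the study does so is not stated.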

3.1.2. Auxiliary Variables Determination

As shown in Table 1, 31 auxiliary variables were initially selected for modeling based on the boiler operation mechanism. It is notable that too many variables will increase the complexity of the model and introduce more noise. Hence, it is necessary to fully evaluate the correlation between different variables and target NOx before modeling, and then screen the auxiliary variables to reduce the dimension of input parameters. In this work, the Random Forest (RF) algorithm [30] is employed, and Figure 3 presents the basic structure of RF.
Out-of-bag (OOB) [34] error estimation constitutes a notable attribute of Random Forests (RFs). Specifically, during each decision tree’s construction, a subset of data is randomly withheld from training, forming what is known as the out-of-bag samples. These samples subsequently serve to quantify the predictive accuracy of the corresponding tree. By aggregating the prediction errors computed from OOB samples across all trees, an unbiased measure of the overall RF model performance can be derived. Furthermore, OOB error serves as a crucial indicator for assessing variable importance and guiding model refinement. Evaluating variations in OOB errors between the original RF model and modified versions facilitates an in-depth understanding of the influence exerted by particular features. Consequently, OOB error emerges as an effective methodological tool for feature selection, mathematically represented as follows:
$$D_v = 1 + \frac{1}{B} \sum_{b=1}^{B} \left( R_{OOB,b} - R_{OOB,b,v} \right)$$
where $D_v$ represents the final error, $B$ is the number of all trees, $R_{OOB,b}$ denotes the OOB error for the $b$th of the $B$ trees, and $R_{OOB,b,v}$ represents the OOB error for a variant of the model.

3.2. Multiscale Graph Convolutional Deep Learning Network

This study employs a multiscale deep learning method termed MSGNet to realize the NOx emissions prediction. MSGNet aims to effectively capture dynamic inter-series correlations at different temporal scales in multivariate time series forecasting. As illustrated in Figure 4, the MSGNet algorithm primarily includes three core modules: scale identification, multiscale adaptive graph convolution, and multihead attention.

3.2.1. Data Embedding

For prediction based on multivariate temporal data, let $N$ be the dimension of the original data. Specifically, define a matrix $X_{t-L:t} \in \mathbb{R}^{N \times L}$ containing historical samples. Each entry $X_{\tau}^{i}$ represents the observed value of the $i$th variable at time point $\tau$, with $\tau \in [t-L, t-1]$. $L$ indicates the size of the sliding window, and $t$ denotes the start of the forecasting horizon. For the NOx emissions prediction objective, the future targets over the next $T$ time steps, $Y_{t:t+T}$, should be accurately estimated.
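The window notation can be made concrete with a short NumPy sketch. The function name `make_windows`, the `target_idx` argument, and the toy array are hypothetical; the sketch only illustrates how a look-back window of length L and a horizon of T steps are cut from a (time, N) record.

```python
import numpy as np


def make_windows(series, target_idx, L, T):
    """Build (X, Y) pairs from a (time, N) array.

    X[k] has shape (N, L): the L steps before time t.
    Y[k] has shape (T,): the target variable at steps t, ..., t + T - 1.
    """
    X, Y = [], []
    for t in range(L, series.shape[0] - T + 1):
        X.append(series[t - L:t].T)
        Y.append(series[t:t + T, target_idx])
    return np.stack(X), np.stack(Y)


data = np.arange(40, dtype=float).reshape(20, 2)  # 20 time steps, N = 2
X, Y = make_windows(data, target_idx=1, L=8, T=3)
print(X.shape, Y.shape)  # (10, 2, 8) (10, 3)
```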
Before the scale identification step, the original data are first embedded with a sliding window [35]. The embedded data $X_e \in \mathbb{R}^{d \times L}$ can be calculated via
$$X_e = \alpha \, \mathrm{Conv1D}(\hat{X}_{t-L:t}) + X_{pe} + \sum_{m=1}^{M} X_{m}^{se}$$
where $\hat{X}_{t-L:t}$ denotes the normalized values of $X_{t-L:t}$, $\mathrm{Conv1D}(\cdot)$ represents the 1D convolutional filters, $\alpha$ is a trade-off coefficient, $X_{pe}$ is defined as the positional embedding vector, and $X_{m}^{se}$ is the $m$th global timestamp embedding.

3.2.2. Scale Identification

The Fast Fourier Transform (FFT) [31,36] is a widely utilized computational technique designed to convert time-domain data into frequency-domain representations. By conducting such a transformation, FFT facilitates the identification and extraction of critical frequency-domain features from time series data, including inherent periodicities and fluctuation patterns. Conceptually, FFT decomposes the original time series into fundamental sinusoidal components. Given the embedded data $X_e$, the discrete Fourier transform $X_e^f$ can be represented as
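$$X_e^f = \sum_{i=0}^{n-1} X_e(i) \, e^{-\mathrm{i} 2 \pi f i / n}$$
where $X_e^f$ is the discrete Fourier transform corresponding to frequency $f$, $X_e(i)$ is the $i$th sample of the embedded sequence, $n$ is the sequence length, and the upright $\mathrm{i}$ in the exponent is the imaginary unit.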
Employing this transformation, FFT effectively characterizes the frequency-domain features hidden within the time series, enabling subsequent data analysis and interpretation. Notably, the FFT output is typically presented as a complex-valued array, wherein each frequency component encompasses both amplitude and phase information. To determine the amplitude spectrum $Amp(X_e^f)$, the absolute value of each complex frequency component is calculated as follows:
$$Amp(X_e^f) = |X_e^f|$$
Given that high-frequency components frequently introduce unwanted noise, focusing on significant periodic signals typically involves identifying dominant frequency components with larger amplitudes. After omitting the direct current component, the k frequencies exhibiting the most pronounced amplitudes are selected as prominent periodicities. This selection criterion is expressed as
$$f = \underset{k}{\operatorname{arg\,max}} \; Amp(X_e^f)$$
Subsequently, the period P associated with these primary frequencies is derived through the following formula:
$$P = \frac{n}{f_{main}}$$
where $f_{main}$ denotes the frequency with the dominant amplitude.
Time series data often exhibit clear periodic patterns, and such periodicities can lead to varied correlation structures across different temporal scales. MSGNet employs the FFT to identify these prominent periodic patterns automatically. The key scales are derived as follows:
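$$F = Avg\left(Amp(X_e^f)\right), \quad f_1, \dots, f_k = \operatorname{argmax}(F), \quad s_i = \frac{L}{f_i}$$
where $f_1, \dots, f_k$ are the frequencies with the largest averaged amplitudes, which indicate the important periodicities, and $s_i$ represents the corresponding time scales.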
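The scale identification step can be sketched with NumPy's real FFT. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `identify_scales` is hypothetical, the amplitude is averaged over the embedding channels, and rounding the scale up to an integer is an assumption made so that the result can be used as a segment length.

```python
import numpy as np


def identify_scales(x, k):
    """x: (d, L) embedded window. Return the top-k frequencies, their scales, and F."""
    L = x.shape[-1]
    amp = np.abs(np.fft.rfft(x, axis=-1))    # amplitude spectrum per channel
    F = amp.mean(axis=0)                     # average over channels
    # Skip index 0 (the direct current component), then take the k largest amplitudes.
    freqs = np.argsort(F[1:])[::-1][:k] + 1
    scales = np.ceil(L / freqs).astype(int)  # assumption: round the scale up
    return freqs, scales, F


t = np.arange(96)
# Synthetic signal with periods of 24 and 8 samples.
sig = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 8)
x = np.stack([sig, 2.0 * sig])
freqs, scales, F = identify_scales(x, k=2)
print(freqs.tolist(), scales.tolist())  # [4, 12] [24, 8]
```

With a window of 96 samples, the two sinusoids complete 4 and 12 cycles, so the detected frequencies map back to the periods 24 and 8.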
Based on the calculated time scales $s_1, s_2, \dots, s_k$ and the corresponding frequencies $f_1, f_2, \dots, f_k$, the 1D data are reshaped into 3D tensors for analyzing correlations at different time scales:
$$X_i = Reshape_{f_i, s_i}\left(Padding(X_{1D})\right), \quad i = 1, 2, \dots, k$$
where $Padding(\cdot)$ extends the data along the time dimension so that the $Reshape$ function can convert the 1D data into 3D tensors, and $X_i \in \mathbb{R}^{d \times s_i \times f_i}$ is the reshaped tensor with respect to time scale $s_i$.
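A small NumPy sketch of the padding and reshaping step is given here. It assumes zero padding at the end of the time axis and a layout of (d, scale, number of segments); the function name `to_scale_tensor` is hypothetical and the source does not specify the padding value.

```python
import numpy as np


def to_scale_tensor(x, scale):
    """x: (d, L). Zero-pad the time axis to a multiple of `scale`,
    then fold it into a (d, scale, n_segments) tensor."""
    d, L = x.shape
    n_seg = -(-L // scale)  # ceil(L / scale) using integer arithmetic
    padded = np.pad(x, ((0, 0), (0, n_seg * scale - L)))
    # Each length-`scale` segment becomes one column of the last axis.
    return padded.reshape(d, n_seg, scale).transpose(0, 2, 1)


x = np.arange(20, dtype=float).reshape(2, 10)
X_i = to_scale_tensor(x, scale=4)
print(X_i.shape)  # (2, 4, 3)
```

Here a window of length 10 is padded to 12 and folded into three segments of length 4, so positions that are one period apart end up in the same row.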

3.2.3. Multiscale Adaptive Graph Convolution

Graph convolution networks (GCNs) are then employed to capture inter-series correlations at each temporal scale. First of all, data at each scale is projected onto graph nodes via linear transformations:
$$H_i = W_i X_i$$
where $W_i$ is a weight matrix for the $i$th time scale.
For a given temporal scale $s_i$, a corresponding graph structure $G_i$ can be constructed from the provided time series data [37]. Consequently, for a dataset characterized by $k$ distinct temporal scales, a set of adjacency matrices $A_1, A_2, \dots, A_k$, with each matrix $A_i \in \mathbb{R}^{N \times N}$, can be derived. The adjacency matrices $A_i$ are computed as follows to dynamically capture inter-variable relations:
$$A_i = SoftMax\left(ReLU\left(E_i^1 (E_i^2)^{\top}\right)\right)$$
where $E_i^1$ and $E_i^2$ are two trainable matrices, the $SoftMax(\cdot)$ function is employed to normalize the values of edges between different nodes, and $ReLU(\cdot)$ is the activation function.
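The adaptive adjacency construction can be illustrated with NumPy. The sketch assumes that both embedding matrices have shape (N, h), so that their product is formed with a transpose, and it uses random values in place of trained parameters. The names `adaptive_adjacency`, `N`, and `h` are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_adjacency(E1, E2):
    """E1, E2: (N, h) node embeddings. Returns a row-normalized (N, N) matrix."""
    return softmax(np.maximum(E1 @ E2.T, 0.0), axis=1)


rng = np.random.default_rng(0)
N, h = 18, 8  # 18 variables, hypothetical embedding width
A = adaptive_adjacency(rng.normal(size=(N, h)), rng.normal(size=(N, h)))
print(A.shape)  # (18, 18)
```

Each row of `A` sums to one, so the entries of a row can be read as the relative influence of the other variables on that node at the given scale.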
Subsequently, a MixHop graph convolution operation is employed to capture high-order graph relations inherent in the temporal data:
$$H_i^{out} = \sigma\left( \big\Vert_{j \in \Gamma} (A_i)^j H_i \right)$$
where $\Gamma$ is a hyperparameter consisting of integer adjacency powers, $\Vert$ denotes column-wise concatenation of the terms for the different powers, and $\sigma(\cdot)$ is the sigmoid function. Further, a multilayer network can be used to remap $H_i^{out}$ to the original 3D tensor form as $\hat{X}_i$.
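A compact NumPy sketch of the MixHop operation follows. It assumes that the terms for the different adjacency powers are concatenated along the feature axis and that the set of powers is (0, 1, 2); both choices, and the function name `mixhop`, are illustrative rather than taken from the study.

```python
import numpy as np


def mixhop(A, H, powers=(0, 1, 2)):
    """Concatenate A^j H over the adjacency powers j, then apply a sigmoid."""
    blocks = [np.linalg.matrix_power(A, j) @ H for j in powers]
    Z = np.concatenate(blocks, axis=1)
    return 1.0 / (1.0 + np.exp(-Z))


A = np.full((3, 3), 1.0 / 3.0)                   # toy row-normalized adjacency
H = np.arange(6, dtype=float).reshape(3, 2)      # 3 nodes, 2 features each
H_out = mixhop(A, H)
print(H_out.shape)  # (3, 6)
```

The power 0 term keeps each node's own features, while higher powers mix in information from neighbors that are one or more hops away.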

3.2.4. Multihead Attention Mechanism

After capturing dynamic inter-variable correlations, the multihead attention (MHA) method is employed to extract intra-series temporal dependencies. MHA, commonly utilized in transformer architectures, simultaneously captures multiple contextual dependencies within sequences. Given the input query, key, and value matrices $Q$, $K$, $V$, it transforms these inputs into several independent subspaces through learned linear projections. The process is formally expressed as
$$MHA(Q, K, V) = Concat(head_1, \dots, head_k) W^{O}$$
where each individual attention head computes attention separately according to
$$head_i = SoftMax\left( \frac{(Q W_i^{Q}) (K W_i^{K})^{\top}}{\sqrt{d}} \right) V W_i^{V}$$
where $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$ denote projection parameters in the $i$th head, $W^{O}$ is the final output transformation matrix, and $d$ represents the scaling term.
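The attention computation can be sketched in NumPy as standard scaled dot-product attention with several heads. The shapes, the random weights, and the choice of the per-head width as the scaling term are assumptions for illustration. In a trained model the projection matrices are learned.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_attention(Q, K, V, Wq, Wk, Wv, Wo):
    """Q, K, V: (L, d_model). Wq, Wk, Wv: (heads, d_model, d_head).
    Wo: (heads * d_head, d_model)."""
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):
        q, k, v = Q @ wq, K @ wk, V @ wv
        weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
        heads.append(weights @ v)
    return np.concatenate(heads, axis=-1) @ Wo


rng = np.random.default_rng(0)
L, d_model, n_heads, d_head = 6, 8, 2, 4
X = rng.normal(size=(L, d_model))
Wq, Wk, Wv = (rng.normal(size=(n_heads, d_model, d_head)) for _ in range(3))
Wo = rng.normal(size=(n_heads * d_head, d_model))
out = multihead_attention(X, X, X, Wq, Wk, Wv, Wo)  # self-attention over time steps
print(out.shape)  # (6, 8)
```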
Then, for the obtained graph relations $\hat{X}_i$, the corresponding MHA process can be formulated as follows:
$$\hat{X}_i^{out} = MHA(\hat{X}_i)$$
Further, representations at different time scales are aggregated according to their FFT-based amplitude weights, emphasizing crucial scales:
$$\hat{\alpha}_1, \dots, \hat{\alpha}_k = SoftMax\left(F_{f_1}, \dots, F_{f_k}\right), \quad X_{out} = \sum_{i=1}^{k} \hat{\alpha}_i \hat{X}_i^{out}$$
where $\hat{\alpha}_i$ denotes the amplitude-based weight for each scale.
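The amplitude-weighted aggregation can be written in a few lines of NumPy. The function name `aggregate_scales` and the toy inputs are hypothetical; the sketch assumes that the outputs of all scales have already been mapped to a common shape.

```python
import numpy as np


def aggregate_scales(outputs, amplitudes):
    """outputs: list of k arrays with identical shape.
    amplitudes: the k averaged amplitudes of the selected frequencies."""
    a = np.asarray(amplitudes, dtype=float)
    w = np.exp(a - a.max())  # softmax over the k amplitudes
    w /= w.sum()
    return w, sum(wi * oi for wi, oi in zip(w, outputs))


outputs = [np.ones((2, 3)), 3.0 * np.ones((2, 3))]
weights, X_out = aggregate_scales(outputs, amplitudes=[1.0, 1.0])
print(weights.tolist(), X_out[0].tolist())  # [0.5, 0.5] [2.0, 2.0, 2.0]
```

With equal amplitudes the two scales contribute equally; a scale with a larger amplitude would receive a larger weight.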
Finally, linear projections along both temporal and variable dimensions are utilized to predict the target NOx emissions:
$$\hat{Y}_{t:t+T} = W_s X_{out} W_t + \mathrm{bias}$$
where $W_s$ and $W_t$ are the trainable matrices, and $T$ denotes the prediction horizon.

3.3. Evaluating Indices

The prediction performance on the training and testing sets is evaluated using five metrics: the coefficient of determination ($R^2$), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE). These metrics are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}, \quad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|, \quad \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%$$
where $y_i$ denotes the actual measurements, $\hat{y}_i$ represents the predictions, $\bar{y}$ is the mean of $y_i$, and $n$ is the number of samples.
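The five metrics can be computed with the Python standard library alone. The function name `regression_metrics` and the three sample values are made up for illustration; MAPE is computed with an absolute value, following the usual definition, and it is undefined when a measurement equals zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, MSE, and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "MAPE": 100 * sum(abs(e / yt) for e, yt in zip(errors, y_true)) / n,
    }


# Made-up measurements and predictions.
m = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(round(m["R2"], 4), round(m["MAE"], 3), round(m["MAPE"], 3))  # 0.99 6.667 5.0
```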

4. Results and Discussion

4.1. Analysis of Auxiliary Variables Determination

Figure 5 illustrates the variable importance coefficients derived from the RF algorithm, quantifying the contributions of different auxiliary variables to the NOx value. The importance values of these variables are presented in descending order using a bar chart. Among these, oxygen content is identified as the most influential variable, exhibiting a substantially higher importance coefficient compared with other factors. Following oxygen content, Coal Rate A and Secondary Air B demonstrate high relevance, with importance coefficients of 0.193 and 0.174, respectively. Moreover, cumulative importance analysis indicates that the first 18 variables collectively account for 95% of the total importance. The remaining variables possess comparatively lower significance and are consequently eliminated during the modeling process. These findings underscore that a limited subset of key variables is sufficient to effectively characterize the primary influencing factors driving NOx emissions in boiler combustion processes.
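The cumulative-importance cut-off can be sketched as follows. The function name `select_by_cumulative_importance`, the variable names, and the scores are hypothetical and are not the values of Figure 5. The sketch only shows how ranked importances are accumulated until a 95% threshold is reached.

```python
def select_by_cumulative_importance(importances, threshold=0.95):
    """Keep the highest-ranked variables until their normalized
    importances sum to at least `threshold`."""
    total = sum(importances.values())
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    kept, running = [], 0.0
    for name, value in ranked:
        kept.append(name)
        running += value / total
        if running >= threshold:
            break
    return kept


# Hypothetical importance scores for five placeholder variables.
scores = {"var_a": 0.50, "var_b": 0.30, "var_c": 0.16, "var_d": 0.03, "var_e": 0.01}
print(select_by_cumulative_importance(scores))  # ['var_a', 'var_b', 'var_c']
```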
From the results obtained through the RF importance evaluation, it is evident that the ranked variables align closely with the combustion mechanisms and theoretical foundations. Oxygen content emerges prominently as the most critical variable, in line with its direct influence on the combustion atmosphere and subsequent NOx formation pathways. Variables related to coal rate and secondary air distribution, such as Coal Rate A and Secondary Air B, also exhibit high importance, reinforcing their recognized roles in shaping combustion intensity, flame stability, and temperature distributions within the furnace. Furthermore, variables such as primary air and secondary dampers, integral to regulating air–coal flow dynamics and mixture uniformity, are also ranked among the significant predictors. This consistency between the RF-derived variable importance and the known physical combustion processes confirms the soundness of the selected variables, thus validating the robustness and interpretability of the modeling approach.
The model was trained and evaluated on a system with the following configuration: a 13th Gen Intel(R) Core(TM) i7-13700 processor, 32 GB RAM, Windows 11, and Python 3.10.9. The results presented in Table 2 illustrate the impact of variable selection on both model performance and computational efficiency. Specifically, the model trained with the reduced set of 18 variables performs better than the model trained with the original set of variables. In terms of training performance, the model using the reduced set of variables achieves an $R^2$ of 0.995, slightly higher than the 0.987 observed for the model with all variables. Similarly, the MSE and RMSE of the reduced-variable model are marginally lower than the corresponding values of the full-variable model, and its MAE and MAPE are also lower. In terms of testing performance, the reduced-variable model still outperforms the full-variable model. Moreover, its training time is substantially shorter: 131.6 s, compared with 225.4 s for the model using all variables. This reduction in computational cost shows that the number of input features can be reduced without sacrificing predictive accuracy. Overall, RF-based variable selection filters out unnecessary information, which reduces interference and improves model efficiency while maintaining strong predictive performance.

4.2. Comparison with Other Algorithms

To evaluate the predictive performance of the proposed algorithm, this study conducts a comparative analysis with three recurrent neural network models commonly employed for time series forecasting: the Long Short-Term Memory (LSTM) network, the Bidirectional Long Short-Term Memory (BiLSTM) network, and the Gated Recurrent Unit (GRU). LSTM addresses the issue of long-term dependencies by incorporating mechanisms such as forget gates, making it well suited for modeling longer sequences. BiLSTM enhances contextual understanding by integrating both forward and backward temporal information, though its bidirectional structure renders it less efficient for real-time applications compared with unidirectional models. GRU, characterized by a more streamlined architecture, offers faster training and computation than LSTM, albeit with a potentially reduced capacity to capture intricate long-term dependencies.
The training and testing comparison results are presented in Figure 6 and Table 3, which demonstrate the superior performance of the MSGNet algorithm in predicting NOx concentrations. In the training phase, MSGNet achieved the highest coefficient of determination, $R^2 = 0.995$, indicating its strong capability to capture intrinsic data patterns. In addition, MSGNet attained an RMSE of 2.176, an MAE of 1.652, and a MAPE of 0.526. These errors are significantly lower than those of LSTM, BiLSTM, and GRU, highlighting MSGNet’s predictive accuracy. During the testing phase, MSGNet continued to perform well, with an $R^2$ of 0.988 reflecting its generalization capability. In comparison, the $R^2$ values for the other models were notably lower: 0.933 for LSTM, 0.921 for BiLSTM, and 0.929 for GRU. Additionally, MSGNet exhibited significantly lower prediction errors in terms of RMSE (2.602), MAE (2.093), and MAPE (0.635), reinforcing its superior accuracy.
The scatter plots presented in Figure 6 further validate the quantitative analysis. The predicted values using MSGNet closely align with the actual measurements, adhering more closely to the diagonal line. This indicates higher accuracy, stability, and less data fluctuation compared with the other models. Overall, these analyses underscore MSGNet’s superior capability in multiscale dynamic feature extraction and attention mechanism, making it particularly suitable for predicting NOx concentrations in complex operational data of a 660 MW coal-fired boiler.
In contrast to conventional recurrent architectures such as LSTM, BiLSTM, and GRU, the proposed MSGNet introduces three key innovations that contribute to its superior performance. First, the FFT-based scale identification enables automatic decomposition of operational sequences into dominant periodic components, allowing the model to adaptively focus on key temporal cycles associated with NOx emission dynamics. Second, the multiscale adaptive graph convolution constructs inter-variable dependency graphs at each identified scale, capturing nonlinear correlations and evolving relationships between operational parameters more effectively than static input representations. Third, the multihead attention mechanism enhances the model’s capability to capture complex temporal dependencies by assigning adaptive weights to different time steps, thus improving sensitivity to transient events and load fluctuations. These design elements collectively allow MSGNet to extract both global and local dynamic patterns in the data, enabling robust prediction performance even under multistep forecasting scenarios. In practical applications, such architecture facilitates both accurate short-term prediction and stable long-term forecasting, making it particularly suitable for online emission monitoring and proactive control decision support.

4.3. Evaluation of Multistep Prediction Performance

Because the time scales are selected automatically, MSGNet is also expected to adapt well to multistep prediction. In this study, prediction horizons of 1 to 6 steps were tested. Figure 7 compares the multistep prediction results of LSTM, BiLSTM, GRU, and MSGNet on both the training and testing sets, using three metrics: $R^2$, MAE, and RMSE.
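One way to set up such a multistep task is to slice the series into input windows and multi-step targets, as sketched below. The article does not give its look-back length or windowing scheme, so the helper `make_windows`, the look-back of 12 samples, and the synthetic arrays are all assumptions made for illustration; the 31 input columns simply mirror the candidate variables listed in Table 1.

```python
import numpy as np

def make_windows(features, target, lookback, horizon):
    """Build (X, Y) pairs for direct multistep prediction.

    X[i] holds `lookback` consecutive rows of `features`;
    Y[i] holds the next `horizon` values of `target`.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(target) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for the requested window sizes")
    X = np.stack([features[i:i + lookback] for i in range(n)])
    Y = np.stack([target[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

# Synthetic placeholder data: 100 samples, 31 input variables.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 31))
target = np.arange(100, dtype=float)      # easy to inspect by eye

X, Y = make_windows(features, target, lookback=12, horizon=6)
# X has shape (83, 12, 31) and Y has shape (83, 6); Y[0] is target[12:18].
```

Column `h` of `Y` is the target for step `h + 1`, so a single model with six outputs can be evaluated separately at each of the six prediction steps.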
On the training set, MSGNet consistently outperforms the other algorithms, maintaining $R^2$ values close to 1 across all six prediction steps. In contrast, LSTM, BiLSTM, and GRU show a notable decline in $R^2$, particularly after the third prediction step. Regarding error metrics, MSGNet has markedly lower MAE and RMSE values, indicating more accurate and stable predictions. GRU shows the highest errors, with values increasing steadily across the prediction steps. LSTM and BiLSTM perform moderately but still fall notably behind MSGNet.
For the testing set, similar patterns emerge. MSGNet achieves higher $R^2$ scores at every prediction horizon, demonstrating better reliability and generalization capability. The error metrics (MAE and RMSE) confirm this result, with MSGNet showing the lowest and most stable trends. GRU degrades substantially, with errors increasing sharply as the prediction step advances. LSTM and BiLSTM again show intermediate performance, slightly better than GRU but clearly inferior to MSGNet.
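The per-step curves described for Figure 7 correspond to computing each metric column by column over the prediction horizon. The sketch below shows one way to do this, assuming predictions are stored as an array of shape (samples, steps); the helper name `per_step_scores` and the synthetic data, in which the noise is made to grow with the step index, are illustrative assumptions and do not reproduce the article's results.

```python
import numpy as np

def per_step_scores(y_true, y_pred):
    """Return R2, MAE and RMSE for each column (prediction step) of (n, H) arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err), axis=0),
        "RMSE": np.sqrt(np.mean(err ** 2, axis=0)),
    }

# Synthetic check: the noise standard deviation grows from 1 to 6 with the step.
rng = np.random.default_rng(1)
y_true = rng.normal(300.0, 30.0, size=(2000, 6))
noise_std = np.arange(1, 7, dtype=float)
y_pred = y_true + rng.normal(0.0, 1.0, size=y_true.shape) * noise_std
scores = per_step_scores(y_true, y_pred)
```

Each entry of `scores` is a length-6 array, one value per prediction step, which is the form needed to plot a metric against the step index for each algorithm.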
In summary, MSGNet effectively addresses the multiscale temporal dynamics inherent in NOx emissions data. Its superior predictive performance, robustness across multiple prediction steps, and stability in longer horizons underline its suitability and advantage for complex multistep prediction tasks in coal-fired boiler emission modeling.

5. Conclusions

This research developed and validated a novel MSGNet model for accurately predicting NOx emissions from a 660 MW coal-fired boiler, addressing limitations of existing predictive methods. Through multiscale feature extraction, adaptive graph convolution, and temporal attention mechanisms, MSGNet effectively captured the complex dynamic interactions of the operational parameters that influence NOx formation. The model exhibited high predictive accuracy and robustness, significantly outperforming conventional deep learning algorithms (LSTM, BiLSTM, and GRU). These findings confirm MSGNet’s practical applicability and its potential to enhance real-time NOx monitoring. The model can also provide a basis for denitrification optimization, particularly when integrated with control systems such as SCR or combustion regulation modules. In future work, we aim to integrate the prediction model into a closed-loop emission control framework to further enhance boiler operation efficiency.

Author Contributions

Conceptualization, J.H. and Y.J.; methodology, J.H. and H.Y.; software, J.H.; validation, J.H.; formal analysis, J.H.; investigation, J.H.; resources, J.H.; data curation, J.H.; writing—original draft preparation, J.H.; writing—review and editing, Y.J.; visualization, H.Y.; supervision, Y.J.; project administration, Y.J.; funding acquisition, Y.J. and H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 23KJD470005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data in this study are presented in terms of tables and figures. Further requests will be considered by the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yue, H.; Worrell, E.; Crijns-Graus, W.; Zhang, S. The potential of industrial electricity savings to reduce air pollution from coal-fired power generation in China. J. Clean. Prod. 2021, 301, 126978. [Google Scholar] [CrossRef]
  2. Wu, Z.; Zhang, Y.; Dong, Z. Prediction of NOx emission concentration from coal-fired power plant based on joint knowledge and data driven. Energy 2023, 271, 127044. [Google Scholar] [CrossRef]
  3. Liu, Y.; Zhou, J.; Fan, W. A novel robust dynamic method for NOx emissions prediction in a thermal power plant. Can. J. Chem. Eng. 2023, 101, 2391–2402. [Google Scholar] [CrossRef]
  4. Qiao, J. A novel online modeling for NOx generation prediction in coal-fired boiler. Sci. Total Environ. 2022, 847, 157542. [Google Scholar] [CrossRef]
  5. Zhu, Y.; Yu, C.; Fan, W.; Yu, H.; Jin, W.; Chen, S.; Liu, X. A novel NOx emission prediction model for multimodal operational utility boilers considering local features and prior knowledge. Energy 2023, 280, 128128. [Google Scholar] [CrossRef]
  6. Song, M.; Xue, J.; Gao, S.; Cheng, G.; Chen, J.; Lu, H.; Dong, Z. Prediction of NOx Concentration at SCR Inlet Based on BMIFS-LSTM. Atmosphere 2022, 13, 686. [Google Scholar] [CrossRef]
  7. Chen, S.; Yu, C.; Zhu, Y.; Fan, W.; Yu, H.; Zhang, T. NOx formation model for utility boilers using robust two-step steady-state detection and multimodal residual convolutional auto-encoder. J. Taiwan Inst. Chem. Eng. 2024, 155, 105252. [Google Scholar] [CrossRef]
  8. Yin, G.; Li, Q.; Zhao, Z.; Li, L.; Yao, L.; Weng, W.; Zheng, C.; Lu, J.; Gao, X. Dynamic NOx emission prediction based on composite models adapt to different operating conditions of coal-fired utility boilers. Environ. Sci. Pollut. Res. 2022, 29, 13541–13554. [Google Scholar] [CrossRef]
  9. Tang, Z.; Wang, S.; Li, Y. Dynamic NOX emission concentration prediction based on the combined feature selection algorithm and deep neural network. Energy 2024, 292, 130608. [Google Scholar] [CrossRef]
  10. Wang, Z.; Zhou, Y.; Zhu, Y.; Yu, H.; Fan, W. Forecast of NOx Emissions for a 660 MW Coal-Fired Boiler with Multilayered Gradient Boosting Decision Tree Considering Multiple Operating Modes. ACS Omega 2024, 9, 45884–45897. [Google Scholar] [CrossRef]
  11. Sun, C.; Li, B.; Chen, L.; Gao, Y.; Jin, J.; Gu, X.; Yang, Y.; Lou, Y.; Zhao, Y.; Liao, H. An improved hourly-resolved atmospheric NOx emission inventory of industrial sources based on Continuous Emission Monitoring System data: Case of Jiangsu Province, China. J. Clean. Prod. 2023, 419, 138192. [Google Scholar] [CrossRef]
  12. Fan, W.; Si, F.; Ren, S.; Yu, C.; Cui, Y.; Wang, P. Integration of continuous restricted Boltzmann machine and SVR in NOx emissions prediction of a tangential firing boiler. Chemom. Intell. Lab. Syst. 2019, 195, 103870. [Google Scholar] [CrossRef]
  13. Wang, Z.; Peng, X.; Zhou, H.; Cao, S.; Huang, W.; Yan, W.; Li, K.; Fan, S. A dynamic modeling method using channel-selection convolutional neural network: A case study of NOx emission. Energy 2024, 290, 130270. [Google Scholar] [CrossRef]
  14. Wang, X.; Liu, W.; Wang, Y.; Yang, G. A hybrid NOx emission prediction model based on CEEMDAN and AM-LSTM. Fuel 2022, 310, 122486. [Google Scholar] [CrossRef]
  15. Kotyla, M.; Banasiewicz, A.; Krot, P.; Śliwiński, P.; Zimroz, R. NOx Emission Prediction of Diesel Vehicles in Deep Underground Mines Using Ensemble Methods. Electronics 2024, 13, 1095. [Google Scholar] [CrossRef]
  16. Tang, Z.; Wang, S.; Chai, X.; Cao, S.; Ouyang, T.; Li, Y. Auto-encoder-extreme learning machine model for boiler NOx emission concentration prediction. Energy 2022, 256, 124552. [Google Scholar] [CrossRef]
  17. Tuttle, J.F.; Vesel, R.; Alagarsamy, S.; Blackburn, L.D.; Powell, K. Sustainable NOx emission reduction at a coal-fired power station through the use of online neural network modeling and particle swarm optimization. Control Eng. Pract. 2019, 93, 104167. [Google Scholar] [CrossRef]
  18. Yao, Z.; Romero, C.; Baltrusaitis, J. Combustion optimization of a coal-fired power plant boiler using artificial intelligence neural networks. Fuel 2023, 344, 128145. [Google Scholar] [CrossRef]
  19. Doner, N.; Ciddi, K.; Yalcin, I.B.; Sarivaz, M. Artificial neural network models for heat transfer in the freeboard of a bubbling fluidised bed combustion system. Case Stud. Therm. Eng. 2023, 49, 103145. [Google Scholar] [CrossRef]
  20. Wang, C.; Liu, Y.; Zheng, S.; Jiang, A. Optimizing combustion of coal fired boilers for reducing NOx emission using Gaussian Process. Energy 2018, 153, 149–158. [Google Scholar] [CrossRef]
  21. Duan, H.; Huang, Y.; Mehra, R.K.; Song, P.; Ma, F. Study on influencing factors of prediction accuracy of support vector machine (SVM) model for NOx emission of a hydrogen enriched compressed natural gas engine. Fuel 2018, 234, 954–964. [Google Scholar] [CrossRef]
  22. Tuttle, J.F.; Blackburn, L.D.; Powell, K.M. On-line classification of coal combustion quality using nonlinear SVM for improved neural network NOx emission rate prediction. Comput. Chem. Eng. 2020, 141, 106990. [Google Scholar] [CrossRef]
  23. Yang, T.; Ma, K.; Lv, Y.; Bai, Y. Real-time dynamic prediction model of NOx emission of coal-fired boilers under variable load conditions. Fuel 2020, 274, 117811. [Google Scholar] [CrossRef]
  24. Niu, P.; Ma, Y.; Li, G. Model NOx emission and thermal efficiency of CFBB based on an ameliorated extreme learning machine. Soft Comput. 2018, 22, 4685–4701. [Google Scholar] [CrossRef]
  25. Ouyang, T.; Wang, C.; Yu, Z.; Stach, R.; Mizaikoff, B.; Huang, G.B.; Wang, Q.J. NOx measurements in vehicle exhaust using advanced deep ELM networks. IEEE Trans. Instrum. Meas. 2020, 70, 7000310. [Google Scholar] [CrossRef]
  26. Pachauri, N. An emission predictive system for CO and NOx from gas turbine based on ensemble machine learning approach. Fuel 2024, 366, 131421. [Google Scholar] [CrossRef]
  27. Tiep, N.H.; Jeong, H.Y.; Kim, K.D.; Xuan Mung, N.; Dao, N.N.; Tran, H.N.; Hoang, V.K.; Ngoc Anh, N.; Vu, M.T. A New Hyperparameter Tuning Framework for Regression Tasks in Deep Neural Network: Combined-Sampling Algorithm to Search the Optimized Hyperparameters. Mathematics 2024, 12, 3892. [Google Scholar] [CrossRef]
  28. Cesar de Lima Nogueira, S.; Och, S.H.; Moura, L.M.; Domingues, E.; dos Santos Coelho, L.; Mariani, V.C. Prediction of the NOx and CO2 emissions from an experimental dual fuel engine using optimized random forest combined with feature engineering. Energy 2023, 280, 128066. [Google Scholar] [CrossRef]
  29. Cai, W.; Liang, Y.; Liu, X.; Feng, J.; Wu, Y. Msgnet: Learning multi-scale inter-series correlations for multivariate time series forecasting. Proc. AAAI Conf. Artif. Intell. 2024, 38, 11141–11149. [Google Scholar] [CrossRef]
  30. Kursa, M.B.; Rudnicki, W.R. The all relevant feature selection using random forest. arXiv 2011, arXiv:1106.5112. [Google Scholar]
  31. Liao, Y.; Li, H.; Cao, Y.; Liu, Z.; Wang, W.; Liu, X. Fast Fourier transform with multihead attention for specific emitter identification. IEEE Trans. Instrum. Meas. 2023, 73, 2503812. [Google Scholar] [CrossRef]
  32. Lv, Y.; Yang, T.; Liu, J. An adaptive least squares support vector machine model with a novel update for NOx emission prediction. Chemom. Intell. Lab. Syst. 2015, 145, 103–113. [Google Scholar] [CrossRef]
  33. Tang, Z.; Zhang, Z. The multi-objective optimization of combustion system operations based on deep data-driven models. Energy 2019, 182, 37–47. [Google Scholar] [CrossRef]
  34. Breiman, L. Out-of-Bag Estimation; UC Berkeley Statistics Department: Berkeley, CA, USA, 1996. [Google Scholar]
  35. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  36. Ye, H.; Chen, J.; Gong, S.; Jiang, F.; Zhang, T.; Chen, J.; Gao, X. ATFNet: Adaptive Time-Frequency Ensembled Network for Long-term Time Series Forecasting. arXiv 2024, arXiv:2404.05192. [Google Scholar]
  37. Zhu, Y.; Xu, W.; Zhang, J.; Liu, Q.; Wu, S.; Wang, L. Deep graph structure learning for robust representations: A survey. arXiv 2021, arXiv:2103.03036. [Google Scholar]
Figure 1. The schematic diagram of the coal-fired boiler.
Figure 2. Historical trends of coal rate A, secondary air B, unit load, oxygen content, and NOx emissions at SCR inlet.
Figure 3. The basic structure of RF.
Figure 4. The mechanism for the MSGNet algorithm (The red arrows indicate the steps of the algorithm).
Figure 5. The results of auxiliary variables determination.
Figure 6. Comparison results of different algorithms (the red dotted lines represent the ideal fit).
Figure 7. Comparison results of multistep NOx emissions prediction.
Table 1. Selected variables for constructing the NOx emissions prediction model.

| Index | Variable | Unit | Description |
|---|---|---|---|
| $x_1$ | $L_{unit}$ | MW | Unit output power |
| $x_2$ | $T_{cr}$ | t/h | Total coal rate |
| $x_3$ | $T_{ar}$ | t/h | Total air rate |
| $x_4$ | O2 | % | Oxygen content |
| $x_5$–$x_{10}$ | $A_{cr}$–$F_{cr}$ | t/h | Coal rate of each coal mill |
| $x_{11}$–$x_{16}$ | $A_{pr}$–$F_{pr}$ | t/h | Primary air rate of each burner |
| $x_{17}$–$x_{21}$ | $I_{so}$–$V_{so}$ | % | SOFA damper opening |
| $x_{22}$, $x_{23}$ | $I_{cc}$, $II_{cc}$ | % | CCOFA damper opening |
| $x_{24}$–$x_{29}$ | $A_{I}$–$F_{I}$ | % | Secondary damper opening |
| $x_{30}$, $x_{31}$ | $A_{sr}$, $B_{sr}$ | t/h | Secondary air rate of each burner |
| $y$ | NOx | mg/m³ | NOx concentration |
Table 2. Comparison of model performance and computational efficiency with reduced and full variable sets.

| Model Description | Training $R^2$ | Training MSE | Training RMSE | Training MAE | Training MAPE | Testing $R^2$ | Testing MSE | Testing RMSE | Testing MAE | Testing MAPE | Training Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Using Variables after RF | 0.995 | 4.733 | 2.176 | 1.652 | 0.526 | 0.988 | 6.771 | 2.602 | 2.093 | 0.635 | 131.6 |
| Using Original Variables | 0.987 | 4.873 | 2.209 | 1.701 | 0.603 | 0.972 | 7.048 | 3.021 | 2.532 | 0.724 | 225.4 |
Table 3. Performance metrics of both training and testing results.

| Algorithm | Training $R^2$ | Training MSE | Training RMSE | Training MAE | Training MAPE | Testing $R^2$ | Testing MSE | Testing RMSE | Testing MAE | Testing MAPE |
|---|---|---|---|---|---|---|---|---|---|---|
| LSTM | 0.977 | 21.412 | 4.627 | 3.462 | 1.099 | 0.933 | 39.68 | 6.299 | 4.954 | 1.502 |
| BiLSTM | 0.978 | 20.997 | 4.582 | 3.582 | 1.137 | 0.921 | 46.797 | 6.841 | 5.23 | 1.552 |
| GRU | 0.975 | 23.307 | 4.828 | 3.721 | 1.178 | 0.929 | 41.751 | 6.462 | 5.169 | 1.557 |
| MSGNet | 0.995 | 4.733 | 2.176 | 1.652 | 0.526 | 0.988 | 6.771 | 2.602 | 2.093 | 0.635 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Huang, J.; Ji, Y.; Yu, H. A Deep Learning Model for NOx Emissions Prediction of a 660 MW Coal-Fired Boiler Considering Multiscale Dynamic Characteristics. Atmosphere 2025, 16, 533. https://doi.org/10.3390/atmos16050533

