
Improving the Heat Transfer Efficiency of Economizers: A Comprehensive Strategy Based on Machine Learning and Quantile Ideas

1 School of Electrical and Control Engineering, North University of China, Taiyuan 030051, China
2 School of Computer Science and Technology, North University of China, Taiyuan 030051, China
3 School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China
* Authors to whom correspondence should be addressed.
Energies 2025, 18(16), 4227; https://doi.org/10.3390/en18164227
Submission received: 12 July 2025 / Revised: 1 August 2025 / Accepted: 4 August 2025 / Published: 8 August 2025

Abstract

Ash deposition on economizer heating surfaces degrades convective heat transfer efficiency and compromises boiler operational stability in coal-fired power plants. Conventional time-scheduled soot blowing strategies partially mitigate this issue but often cause excessive steam/energy consumption, conflicting with enterprise cost-saving and efficiency-enhancement goals. This study introduces an integrated framework combining real-time ash monitoring, dynamic process modeling, and predictive optimization to address these challenges. A modified soot blowing protocol was developed using combustion process parameters to quantify heating surface cleanliness via a cleanliness factor (CF) dataset. A comprehensive model of the attenuation of heat transfer efficiency was constructed by analyzing the full-cycle interaction between ash accumulation, blowing operations, and post-blowing refouling, incorporating steam consumption during blowing phases. An optimized subtraction-based mean value algorithm was applied to minimize the cumulative attenuation of heat transfer efficiency by determining optimal blowing initiation/cessation thresholds. Furthermore, a bidirectional gated recurrent unit network with quantile regression (BiGRU-QR) was implemented for probabilistic blowing time prediction, capturing data distribution characteristics and prediction uncertainties. Validation on a 300 MW supercritical boiler in Guizhou demonstrated a 3.96% energy efficiency improvement, providing a practical solution for sustainable coal-fired power generation operations.

1. Introduction

Achieving a green, low-carbon energy system [1,2] while ensuring energy security and sustainable development [3,4] is a global imperative. Within the critical transition of energy structures [5,6], coal-fired power remains indispensable, as evidenced by IEA data: coal dominates Asia-Pacific generation (57%); fossil fuels and nuclear power lead in the Americas (64%); and while natural gas prevails in Europe/Middle East (72%), coal still contributes 21%. In Africa, gas (42%) and coal/nuclear (34%) are the primary sources [7]. Given that existing renewable energy infrastructure cannot yet fully supplant thermal power, enhancing the thermal efficiency of coal-fired units by minimizing the attenuation of heat transfer efficiency is paramount. A major driver of efficiency degradation is ash fouling on boiler heat-exchanger surfaces, making regular soot removal operations essential. Consequently, determining the optimal timing (“When?”) and duration (“How long?”) of soot-blowing is a persistent challenge in power plant operation and optimization.
Research into boiler efficiency optimization via soot-blowing includes various approaches. Wen et al. [8] formulated soot-blowing as an equipment health management problem using the Hamilton–Jacobi–Bellman equation and Markov processes. While enabling sensitivity analysis, its computational complexity hinders practical engineering application. Shi et al. [9] employed the dynamic mass and energy balance for online thermal efficiency calculation and soft sensing to optimize soot-blowing frequency and duration without special instruments. However, this method assumes perfect cleaning after each cycle and lacks per-cycle effectiveness evaluation, limiting its robustness. Leveraging big data and deep learning, Pena et al. [10] developed probabilistic soot-blowing impact prediction models using ANNs and ANFIS, validated on a 350 MW plant. Xu et al. [11] combined the principles of heat balance, genetic algorithms (GAs), and back-propagation neural networks (BPNNs) for dynamic monitoring and optimization of lag in a 650 MW boiler, improving the net heat gain. Yet traditional BPNNs struggle to capture dynamic attenuation of heat transfer efficiency trends accurately, affecting prediction fidelity. Kumari et al. [12] used an extended Kalman filter for Cleanliness Factor (CF) estimation and GA-GPR for prediction on a 210 MW plant. However, dataset screening risks overlooking low-frequency variable influences, potentially biasing results.
This study focuses on the economizer of a 300 MW coal-fired boiler in Guizhou Province. In the boiler system, the economizer undertakes key functions such as recovering waste heat from flue gas to improve overall efficiency, protecting the steam drum from thermal shock and optimizing thermal distribution. Its heat exchange process is dominated by the mechanism of forced convection heat transfer generated by flue gas transversely scouring the tube bundle. Since the economizer is located in the low-temperature tail region of the flue gas flow (with an inlet temperature range of approximately 300–400 °C), the radiant capacity of the flue gas is significantly weakened and the contribution of radiant heat transfer is relatively minimal, which can usually be ignored in engineering design and calculation.
Based on this, this study constructs an economizer ash deposition monitoring model with key combustion parameters as the core and establishes the cumulative attenuation of heat transfer efficiency under the Cleanliness Factor (CF) curve of the economizer surface in each operation cycle as the optimization objective function. To optimize the soot-blowing strategy, an improved Subtraction-Average-Based Optimizer (SABO) is used to evaluate each operation cycle, aiming to determine the soot-blowing timing and duration that can minimize attenuation of heat transfer efficiency. Furthermore, the optimized soot-blowing trigger nodes and thresholds are applied to actual operation data to identify the target soot-blowing intervals that need to be predicted. Finally, a quantile regression-based interval prediction algorithm is used to predict the target soot-blowing cycles in the time series.
This method integrates thermodynamic principles and deep learning technologies, committed to building a comprehensive framework covering overall modeling, strategy optimization, and operation prediction, aiming to bridge the gap between mathematical models and artificial intelligence. This strategy enhances the engineering practicability of soot-blowing optimization without excessively increasing computational overhead, with the ultimate goal of improving the heat transfer efficiency of the economizer.
The main contribution of this paper can be summarized in the following three points:
(1)
In this paper, a dynamic multi-objective optimization model of the attenuation of heat transfer efficiency over the economizer's full operating process is established, with the soot-blowing node and soot-blowing duration as the optimization variables. Compared with other quantile models, this model achieves an accuracy improvement of up to 97.8% and is more intuitive.
(2)
The optimization algorithm is improved according to the characteristics of the cleanliness factor of the economizer in this 300 MW supercritical unit, so that the improved algorithm achieves faster convergence speed and higher convergence accuracy on this specific problem.
(3)
The interval prediction method based on quantile regression effectively reflects the overall distribution of the data and characterizes the uncertainty of the predicted point distribution, thereby improving prediction accuracy.
The integrated modeling–optimization–prediction approach provided in this paper also offers useful guidance for practical engineering. It optimizes and quantifies the soot-blowing nodes and durations for each cycle, and it gives the boiler operator more time to prepare for soot-blowing operations and to develop a more reasonable soot-blowing strategy.

2. Problem Description

2.1. Introduction to the Structure of the Boiler and Economizer

This study focuses on a 300 MW coal-fired power station boiler located in Guizhou Province of China. The boiler operates using a tangential combustion mode at four corners, as illustrated in the accompanying figure. The model of the boiler is HG-1210/25.73-HM6, characterized by a steam drum-type configuration. It is a once-through boiler that features primary intermediate reheating and operates under supercritical pressure with variable conditions. The boiler employs an atmospheric expansion start-up system that does not utilize a recirculation pump. It is designed with a single furnace, balanced ventilation, solid slag discharge, an all-steel frame, a fully suspended structure, a Π-type layout, and tight closure. Furthermore, the boiler incorporates a medium-speed mill positive-pressure direct-blowing pulverizing system, with each furnace equipped with five coal pulverizers. Under the BMCR working conditions, four coal pulverizers are operational while one mill remains on standby. The fineness of the pulverized coal is R90 = 18/20% (design/check coal type). The schematic diagram of the boiler heat transfer flow chart is shown in Figure 1.
The economizer is of the cast-iron type, and its internal structure is shown in Figure 2. It is installed in the vertical flue at the rear of the boiler and recovers waste heat from the exhaust gas: by absorbing heat from the high-temperature flue gas, the boiler feed water is heated toward saturation under the pressure of the natural circulation system, which lowers the flue gas temperature, saves energy, and improves efficiency; hence the name economizer. Its main functions include
  • Absorbing the heat of low-temperature flue gas, lowering the exhaust temperature, reducing the sensible heat loss of the flue gas, and saving fuel.
  • Raising the temperature of the boiler feed water so that the temperature difference between the feed water entering the steam drum and the drum wall is reduced; the thermal stress is correspondingly reduced, extending the service life of the steam drum.
Convective heat transfer [13] dominates the process: the flue gas heats the water, so the flue gas temperature falls while the feedwater temperature rises.
Figure 1. HG-1210/25.73-HM6 boiler.
Figure 2. Economizer internal structure diagram.

2.2. Grey Pollution Monitoring Model Construction

This study focuses on a W-shaped flame boiler system using Qianxi anthracite (Category WY3). This coal type is characterized by high ash content (actual ash content of coal fed into the furnace > 30%), medium to high sulfur content (S_t,d = 1.06∼4.87%), and low volatile content (V_daf < 10%). Its ash fusion properties (slagging index R_z = 2.18∼2.74) have a significant impact on the deposition behavior of heating surfaces.
  • Mechanism of ash fouling formation
    In the high-temperature zone of the furnace (about 1500 °C), low-melting-point ash melts to form liquid-phase ash slag. Approximately 20% of the molten ash droplets are entrained by flue gas into the convection flue. As the flue gas flows through the economizer finned tube bundles, deposition occurs through the following mechanisms:
    • Inertial–gravitational sedimentation: Under the condition of flue gas velocity of 6∼8 m/s (optimized design value), ash particles (with particle sizes of 1∼200 μm) deviate from the flow line and impinge on the tube wall in the eddy current zone on the leeward side of the tube bundle;
    • Thermophoretic force driving: When the temperature difference between flue gas and the tube wall exceeds 200 °C, thermophoretic force significantly promotes the migration of submicron particles to the low-temperature wall surface;
    • Turbulent diffusion–adhesion: Micron-sized particles diffuse to the boundary layer through turbulence and adhere to the surface of the deposition layer via van der Waals forces.
  • Evolution characteristics of the deposition layer
    Initial deposition takes molten ash droplets as the core (with a viscosity of 10^2∼10^3 Pa·s), forming a loose and porous structure, and its spatial distribution shows significant inhomogeneity:
    • Leeward side enrichment phenomenon: Due to the lower shear force on the leeward side of the finned tube (about 30% of that on the windward side), the deposition thickness can reach more than twice that on the windward side;
    • Regulatory effect of fin spacing: When the fin spacing is ≥35 mm, the “bridging” phenomenon of sediments can be effectively inhibited (Figure 3);
    • Dynamic equilibrium mechanism: The growth of the deposition layer is jointly regulated by coal ash properties (ash content > 30%), structural parameters (tube diameter ϕ 50/60 mm, fin height 1∼2.5 m), and flue gas velocity (6∼8 m/s) and finally reaches a state of equilibrium between the detachment and adhesion rates. The equilibrium thickness δ_eq satisfies:
      δ_eq ∝ (ρ_p · u^0.8 · A_ash) / (T_w^1.5 · S_f)
      where ρ_p is the particle density, u is the flue gas velocity, A_ash is the ash concentration, T_w is the wall temperature, and S_f is the fin spacing.
In this study, the boiler's cleanliness factor (CF) dataset is classified and screened for soot-blowing node optimization, ash fouling monitoring, fouling prediction, improvement of heat transfer efficiency, and development of a more rational soot-blowing strategy. Generally, owing to the complex working conditions of the heat transfer surfaces in a boiler, the ash accumulation state on those surfaces cannot be measured directly; instead, it is inferred from indirect parameters that reflect the fouling state. In this paper, the CF represents the ash fouling state:
CF = K_r / K_0
where K_r and K_0 are the actual heat transfer coefficient of the heat transfer surface and the theoretical heat transfer coefficient, respectively.
Figure 3. Schematic of the scaling of a single ribbed pipe.
The theoretical coefficient K 0 considers only gas-side thermal resistance based on the following justifications:
  • Metal wall thermal resistance is negligible due to the high thermal conductivity of carbon steel (40–60 W/(m·K)) versus ash deposits (0.1–1 W/(m·K)).
  • Water-side resistance is excluded as its heat transfer coefficient (3000–6000 W/(m2·K)) dominates the gas side (30–100 W/(m2·K)) in clean conditions.
  • This simplification aligns with industrial monitoring standards for economizers in coal-fired plants [14].
The theoretical heat transfer coefficient K_0 is thus defined as
K_0 = α_f + α_d
where α_f is the radiation heat transfer coefficient calculated by the modified Stefan–Boltzmann law for gas-tube systems [15]:
α_f = 5.7 × 10^-8 · ((a_gb + 1)/2) · a_h · T^3 · κ
and α_d is the convection coefficient from Grimson's correlation for staggered tube banks [16]:
α_d = 0.65 · C_s · C_z · (λ/d) · Re^0.64 · Pr^(1/3)
The temperature correction factor κ follows the Hottel–Whillier formulation [15]:
κ = [1 − (T_gb/T)^4] / [1 − T_gb/T]
where a_gb and a_h are the wall and gas emissivities; T and T_gb are the gas and wall temperatures (°C); C_s and C_z are the transverse/longitudinal tube arrangement factors; λ is the gas thermal conductivity; d is the tube diameter (m); Re = ωd/ν is the Reynolds number; ω is the gas velocity (m/s); ν is the kinematic viscosity; and Pr is the Prandtl number.
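These three relations can be transcribed directly into a short Python sketch. The function name and any numeric values used with it are our own illustrative assumptions, not plant data, and the temperature units follow the text:

```python
def k0_theoretical(a_gb, a_h, T, T_gb, C_s, C_z, lam, d, omega, nu, Pr):
    """Theoretical gas-side coefficient K_0 = alpha_f + alpha_d.

    Symbols follow the text: a_gb/a_h wall and gas emissivities, T/T_gb gas
    and wall temperatures, C_s/C_z tube-arrangement factors, lam gas thermal
    conductivity, d tube diameter (m), omega gas velocity (m/s), nu kinematic
    viscosity, Pr Prandtl number. Illustrative sketch only.
    """
    # Temperature correction factor: kappa = (1 - (T_gb/T)^4) / (1 - T_gb/T)
    kappa = (1.0 - (T_gb / T) ** 4) / (1.0 - T_gb / T)
    # Radiative part: alpha_f = 5.7e-8 * ((a_gb + 1)/2) * a_h * T^3 * kappa
    alpha_f = 5.7e-8 * ((a_gb + 1.0) / 2.0) * a_h * T ** 3 * kappa
    Re = omega * d / nu                      # Reynolds number
    # Convective part from Grimson's correlation
    alpha_d = 0.65 * C_s * C_z * (lam / d) * Re ** 0.64 * Pr ** (1.0 / 3.0)
    return alpha_f + alpha_d
```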
The values of CF lie in the interval [0, 1], with 1 corresponding to a clean heat transfer surface. The actual heat transfer coefficient is calculated from the measured data of multi-position sensors on the flue gas side and the working medium side (as shown in Figure 4).
At this point, it is important to highlight the construction process of the soot monitoring model. The theoretical heat transfer coefficient K_0 is the heat transfer efficiency of the heating surface in its original, deposit-free light-pipe state. It is the sum of the theoretical radiation and convective heat transfer coefficients given by the mechanism equations for α_f, α_d, and κ above, ignoring the thermal resistance between the working medium and the pipe wall as well as the internal thermal resistance of the metal.
The flue gas velocity ω is determined from the continuity equation:
ω = V_b / A_c
where A_c represents the cross-sectional area of the gas passage (in m²), which is distinct from the heat transfer surface area. This distinction is critical, as using the wrong area would introduce systematic errors in cleanliness factor (CF) calculations even for clean heat transfer surfaces. V_b denotes the volumetric flue gas flow rate at standard conditions (in m³/s), obtained by correcting the actual measured flow rate V_r using the ideal gas law:
V_b = p_r · V_r / [p_b · (1 + T_r/273.15)]
where V_r is the actual measured flue gas flow rate (m³/s); T_r is the measured flue gas temperature in the section (°C); p_r is the flue gas pressure (Pa); and p_b is the standard atmospheric pressure (Pa).
To account for flow non-uniformity effects, which are known to reduce CF values by 5–15% even in clean conditions [14], the following considerations apply:
  • The heat transfer surface area A_h depends on tube geometry and fin characteristics:
    A_h = N_t · π · d · L · η_f + N_f · (2 h_f L_f + t_f L_f)
    where N_t is the number of tubes, d the tube diameter, L the tube length, η_f the fin efficiency, N_f the fin count, h_f the fin height, L_f the fin length, and t_f the fin thickness.
  • Flow maldistribution is quantified by the non-uniformity coefficient ζ [17]:
    ζ = (ω_max − ω_min) / ω_avg
    with typical economizer values ranging from 0.2 to 0.4.
The actual heat transfer coefficient K_r incorporates these geometric and flow factors to minimize systematic deviations in CF evaluation.
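The continuity equation and the standard-condition correction can be sketched in a few lines; the function name is our own choice:

```python
def flue_gas_velocity(V_r, T_r, p_r, p_b, A_c):
    """Flue gas velocity omega = V_b / A_c from the continuity equation,
    with V_b the flow rate corrected to standard conditions:
    V_b = p_r * V_r / (p_b * (1 + T_r / 273.15)).

    V_r: measured volumetric flow (m^3/s); T_r: measured gas temperature
    (deg C); p_r: gas pressure (Pa); p_b: standard atmospheric pressure (Pa);
    A_c: duct cross-sectional area (m^2), not the heat transfer area.
    """
    V_b = p_r * V_r / (p_b * (1.0 + T_r / 273.15))
    return V_b / A_c
```

At p_r = p_b and T_r = 0 °C the correction factor is 1 and the velocity reduces to V_r / A_c, which is a convenient sanity check.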
The actual heat transfer coefficient is obtained through a dynamic energy balance [18]:
K_r = q_y / (A · ΔT_m)
where q_y is the heat released by the flue gas flowing through the heated surface (kJ/s), and ΔT_m is the mean temperature difference between the flue gas side and the working medium side of the heat transfer surface (°C).
Considering the energy balance between the flue gas side and the working medium side, the heat q_y released by the flue gas equals the heat q_q (kJ/s) absorbed by the working medium; that is,
q_y = q_q
The heat absorbed on the working medium side is
q_q = D · (h_out − h_in)
where D is the working medium mass flow rate through the heated surface (kg/s), and h_out and h_in are the enthalpies of the working medium at the outlet and inlet of the heated surface (kJ/kg), respectively. h_out and h_in are calculated using the IAPWS-IF97 formulation [12], with saturated steam parameters (b_h = 0), temperature T = 350 °C, and pressure P = 2.5 MPa.
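The energy-balance calculation of K_r and the resulting CF can be sketched as follows; the function names are ours, and the enthalpy values in any call would come from an IAPWS-IF97 property library:

```python
def actual_heat_transfer_coefficient(D, h_out, h_in, A, dT_m):
    """K_r = q_y / (A * dT_m) with q_y = q_q = D * (h_out - h_in).

    D: working-medium mass flow (kg/s); h_out, h_in: outlet/inlet enthalpies
    (kJ/kg); A: heat transfer area (m^2); dT_m: mean gas/working-medium
    temperature difference (deg C). Illustrative sketch only.
    """
    q_q = D * (h_out - h_in)   # heat absorbed by the working medium, kJ/s
    return q_q / (A * dT_m)

def cleanliness_factor(K_r, K_0):
    """CF = K_r / K_0, clipped to the physically meaningful range [0, 1]."""
    return max(0.0, min(1.0, K_r / K_0))
```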
The calculated CF image with the normalized load profile is shown in Figure 5.
Since the start-up phase is in a state of deep peak regulation (with a load below 30%), the abrupt changes in flue gas components caused by auxiliary fuel combustion and the local abnormal ash deposition resulting from flow field reconstruction under low loads are both difficult to effectively monitor or quantify. Therefore, this phase is not suitable as an object for soot-blowing optimization research. Based on this, this study selects the curves of the stable load segment in Figure 5 as the objective function for soot-blowing optimization in Section 3.

3. Full Process Modeling of Economizer Energy Efficiency

3.1. Raw Data Plotting

In the axes shown in Figure 6, the indigo blue curve is soot accumulation segment 1; the green curve is the soot-blowing segment; the yellow curve is soot accumulation segment 2; and the purple and red dotted lines mark the soot-blowing start point and endpoint, respectively. The vertical axis shows the cleanliness factor and the horizontal axis shows time, with a sampling period of 5 s.

3.2. Data Pre-Processing

Since the raw economizer cleanliness factor curve is nonlinear and non-stationary, computing the area directly by definite integration of the raw data would produce a large error, so the raw data are smoothed by polynomial fitting. The expression for the polynomial fit is
y = a_0 + a_1·x + a_2·x^2 + a_3·x^3 + a_4·x^4
After obtaining the fitted curve, the calculation error is reduced by computing the area enclosed by the boundaries and the fitted curve, which is convenient to integrate. The polynomial fitting results and expressions for the three stages of the economizer cleanliness factor trend are shown in Figure 7. The polynomials corresponding to ash accumulation segment 1, the soot-blowing segment, and ash accumulation segment 2 are Equations (18), (19) and (20), respectively.
y_1 = 0.78499 − 1.05678×10^-5·x + 8.63269×10^-10·x^2 − 4.37152×10^-14·x^3 + 6.99596×10^-19·x^4
y_2 = 1416.15676 − 0.16281·x + 7.02413×10^-6·x^2 − 1.34737×10^-10·x^3 + 9.69771×10^-16·x^4
y_3 = 161.41513 − 0.01616·x + 6.09207×10^-7·x^2 − 1.0198×10^-11·x^3 + 6.39346×10^-17·x^4
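This preprocessing step can be sketched with NumPy; the data array below is synthetic stand-in data (a noisy, slowly declining CF trend sampled every 5 s), not the plant measurements, and the variable names are ours:

```python
import numpy as np

# Fourth-order polynomial fit of a CF segment, as in the expression above.
t = np.arange(0.0, 1000.0, 5.0)                  # time axis, 5 s sampling
rng = np.random.default_rng(0)
cf = 0.95 - 2e-5 * t + rng.normal(0.0, 1e-3, t.size)   # synthetic CF data
coeffs = np.polyfit(t, cf, deg=4)                # a4 .. a0, highest power first
cf_fit = np.polyval(coeffs, t)                   # smoothed curve used for areas
```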

3.3. Optimization Problem Description

Figure 8 presents a schematic diagram of the reduction in the heat transfer coefficient of the heated surface of the economizer. The area of the curved trapezium enclosed by the fitted curve y_1 with x_0 = 0, y = 1, and x_1 = 32,725 is A_1.
The area of the curved trapezium enclosed by the fitted curve y_2 with x_1 = 32,725, x_2 = 36,660, and y = 1 is A_2. A_2 corresponds to the soot-blowing stage and includes a portion of steam loss Q_sb. Since the 300 MW unit of the Guizhou power plant under study uses steam soot-blowing, steam loss is inevitable during soot-blowing operations. In this study, the steam loss Q_sb is included in the fitness function of the optimization problem so that the entire soot-accumulation–soot-blowing–soot-accumulation cycle is optimized globally.
Q_sb = m_s · t_s · (h_si − h_so)
where m_s is the soot-blowing steam flow rate (kg/s), t_s is the soot-blowing time (s), h_si is the enthalpy of the soot-blowing steam source (kJ/kg), and h_so is the enthalpy of the condenser inlet steam (kJ/kg).
The area of the curved trapezium enclosed by the fitted curve y_3 with x_2 = 36,660, x_3 = 44,690, and y = 1 is A_3. The total area of attenuation of heat transfer efficiency for these three operating processes of the economizer is
A_g = A_1 + A_2 + A_3 + Q_sb
A_g is the sum of the attenuation of heat transfer efficiency over the three operating processes on the heated surface of the economizer and is also the objective function of the optimization problem. A_k represents the effective heat exchange, and A_z represents the total heat, A_z = A_k + A_g. When x_1 = 32,725 and x_2 = 36,660, the attenuation of heat transfer efficiency area A_g at the heated surface of the economizer before optimization is 12,836.7124. At this time, the percentage attenuation of heat transfer efficiency P_Q of the heated surface of the economizer is
P_Q = A_g / A_z = 12,836.7124 / 44,690 × 100% = 28.7239%
That is, the percentage reduction of the heat transfer coefficient for the original economizer dataset is 28.7239%.
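The area bookkeeping above can be sketched by integrating (1 − y_k(x)) for the fitted polynomials of Eqs. (18)–(20) over the stated boundaries. This is an illustrative reconstruction only: the steam-loss term Q_sb is omitted, so the result is not expected to reproduce the quoted 12,836.7124 exactly.

```python
from numpy.polynomial import Polynomial

# Coefficients of Eqs. (18)-(20), listed from the constant term upward.
y1 = [0.78499, -1.05678e-5, 8.63269e-10, -4.37152e-14, 6.99596e-19]
y2 = [1416.15676, -0.16281, 7.02413e-6, -1.34737e-10, 9.69771e-16]
y3 = [161.41513, -0.01616, 6.09207e-7, -1.0198e-11, 6.39346e-17]

def attenuation_area(coeffs, x_lo, x_hi):
    """Area between y = 1 and the fitted curve over [x_lo, x_hi]:
    the definite integral of (1 - y(x))."""
    antideriv = (1 - Polynomial(coeffs)).integ()
    return antideriv(x_hi) - antideriv(x_lo)

A1 = attenuation_area(y1, 0, 32725)       # ash accumulation segment 1
A2 = attenuation_area(y2, 32725, 36660)   # soot-blowing segment
A3 = attenuation_area(y3, 36660, 44690)   # ash accumulation segment 2
A_g_no_steam = A1 + A2 + A3               # A_g before adding steam loss Q_sb
```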

4. Improved Subtraction-Average-Based Optimizer and Optimization Results

4.1. Subtraction-Average-Based Optimizer

The idea of the SABO algorithm [19] is to update the positions of the population members in the search space using the arithmetic mean of "v-subtractions" over all search agents, combining the difference of the agents' positions with the sign of the difference of their objective function values. SABO is based on the special operation "−_v", called v-subtraction, from search agent A to search agent B, where v is a vector whose components are randomly generated from the interval [1, 2]. F(A) and F(B) are the fitness values of individuals A and B, respectively:
A −_v B = sign(F(A) − F(B)) · (A − v·B)
The displacement of any search agent X_i in the search space is computed as the arithmetic mean of the v-subtractions X_i −_v X_j over all search agents X_j; X_i^new denotes the updated position of individual i, and r_i is a random vector introduced to enhance exploration. 1/N is a normalization coefficient (the reciprocal of the population size), which ensures that the summation result is not affected by the population size N.
X_i^new = X_i + r_i · (1/N) · Σ_{j=1}^{N} (X_i −_v X_j),  i = 1, 2, …, N
F_i denotes the objective function value (fitness) at the current position X_i, and F_i^new the value at the new position X_i^new. The following rule determines whether to accept the new agent:
X_i = X_i^new if F_i^new < F_i; otherwise X_i is kept.
The illustration of the use of “v-subtraction” for exploration and mining is shown in Figure 9:
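The update and acceptance rules above can be sketched in Python. This is our minimal single-iteration reading of the equations, with NumPy-based names of our own choosing, not the authors' implementation:

```python
import numpy as np

def sabo_step(X, F, fitness, rng):
    """One SABO iteration (a minimal sketch of the update equations).

    X: (N, dim) agent positions; F: (N,) fitness values; fitness: callable
    mapping a position vector to a scalar; rng: np.random.Generator.
    """
    N, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(N):
        v = rng.uniform(1.0, 2.0, size=(N, dim))       # v ~ U[1, 2]
        # X_i -v X_j = sign(F_i - F_j) * (X_i - v * X_j), for every agent j
        diff = np.sign(F[i] - F)[:, None] * (X[i] - v * X)
        X_new[i] = X[i] + rng.random(dim) * diff.mean(axis=0)
    F_new = np.array([fitness(x) for x in X_new])
    improved = F_new < F                               # greedy acceptance
    X[improved], F[improved] = X_new[improved], F_new[improved]
    return X, F
```

Because the acceptance step is greedy, the best fitness in the population can never get worse from one iteration to the next.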

4.2. Golden Ratio Strategy

Traditional interval-partitioning methods are inefficient, requiring many function evaluations to narrow the search interval, and gradient-based algorithms easily fall into local optima. The golden ratio strategy is a global search aid that helps avoid premature convergence to local minima or maxima.
(1)
Initialization intervals: Set an initial search interval [a, b], where a and b are the upper and lower bounds of the solution space.
(2)
Determine the subinterval: the interval [a, b] is divided into two subintervals according to the golden ratio. Let c and d be two points in the interval such that
c = a + (1 − ϕ)·(b − a)
d = a + ϕ·(b − a)
where ϕ ≈ 0.618 is the golden ratio.
(3)
Evaluation function value: Compute f ( c ) and f ( d ) . f ( x ) is the objective function, and we want to find the minimal or maximal value of f ( x ) in this interval.
(4)
Update interval: Compare the values of f ( c ) and f ( d ) ; if f ( c ) < f ( d ) , the new search interval becomes [a, d]; on the contrary, if f ( c ) > f ( d ) the new search interval becomes [c, b]. After each iteration, the length of the interval is reduced according to the golden ratio.
(5)
Repeat steps (2) to (4) until the length of the search interval is less than a predefined threshold or an upper limit on the number of iterations is reached.
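Steps (1)–(5) correspond to the classical golden-section search, which can be sketched compactly (the function name is ours):

```python
def golden_section_search(f, a, b, tol=1e-6, max_iter=200):
    """Golden-ratio interval shrinking, following steps (1)-(5) above.

    Minimizes a unimodal function f on [a, b]: each iteration keeps the
    subinterval whose interior point has the smaller function value.
    """
    phi = (5 ** 0.5 - 1) / 2                  # golden ratio, ~0.618
    for _ in range(max_iter):
        if b - a < tol:
            break
        c = a + (1 - phi) * (b - a)
        d = a + phi * (b - a)
        if f(c) < f(d):
            b = d                             # minimum lies in [a, d]
        else:
            a = c                             # minimum lies in [c, b]
    return (a + b) / 2
```

For a unimodal objective such as (x − 2)² on [0, 5], the search converges to x ≈ 2. Note that this simple form re-evaluates both interior points each iteration, whereas classical implementations reuse one of the two evaluations.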

4.3. Piecewise Chaos Mapping

In many traditional optimization algorithms, the initialization of the population is not random enough, leading to premature convergence to a local optimum and failing to achieve the desired effect. To address this problem, piecewise chaos mapping has been added to the algorithm as a further improvement. Piecewise chaos mapping has good statistical properties and is a segmented mapping function. The piecewise chaos mapping formula is
x_{i+1} = x_i / P,                    0 ≤ x_i < P
x_{i+1} = (x_i − P) / (0.5 − P),      P ≤ x_i < 0.5
x_{i+1} = (1 − P − x_i) / (0.5 − P),  0.5 ≤ x_i < 1 − P
x_{i+1} = (1 − x_i) / P,              1 − P ≤ x_i < 1
P takes values in (0, 0.5) and is the control factor that divides this piecewise function into four segments. Generally, P = 0.3. The range of chaotic orbit state values is (0, 1). The effect of population initialization with the addition of piecewise chaos mapping is shown in Figure 10.
Among them, the chaos value refers to the sequence values generated during the iteration of the chaotic map, which have deterministic but unpredictable characteristics. They are generated by a simple mathematical Equation (29) yet exhibit complex random behaviors. In optimization algorithms, chaos values play a role in population initialization and dynamically adjusting parameters such as step sizes and weights to avoid premature convergence.
Dimension refers to the number of independent variables in a chaotic sequence, which determines the complexity of the state space of a chaotic system. In optimization algorithms, it typically functions in matching the solution space dimension, enhancing ergodicity, and avoiding correlation.
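A small sketch of the map of Eq. (29); P = 0.3 follows the text, while the seed x_0 = 0.13 and the function name are our own illustrative choices:

```python
def piecewise_chaos(x0=0.13, P=0.3, n=100):
    """Generate n values of the piecewise chaotic map with control factor P
    in (0, 0.5); orbit values stay in [0, 1] and can seed a population.
    """
    seq, x = [], x0
    for _ in range(n):
        if x < P:
            x = x / P
        elif x < 0.5:
            x = (x - P) / (0.5 - P)
        elif x < 1 - P:
            x = (1 - P - x) / (0.5 - P)
        else:
            x = (1 - x) / P
        seq.append(x)
    return seq
```

Each value in (0, 1) can then be scaled to the decision-variable bounds to initialize one coordinate of one individual.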

4.4. Roulette Wheel Selection

Since the objective problem with nonlinear constraints solved in this paper is a minimization, the probability that the optimal solution lies on the boundary is much higher. Traditional optimization algorithms are insensitive to boundary values, so the optimal solution is easily missed. Moreover, most optimization algorithms run only a single selection process, which makes it easy for non-optimal solutions to be retained.
The roulette strategy is one of the commonly used strategies in genetic algorithms.
(1)
The probability of an individual being selected is proportional to its fitness value (as shown in Equation (30)):
P(x_i) = f(x_i) / Σ_{j=1}^{N} f(x_j)
where x_i is a given individual.
(2)
Cumulative probability represents each individual's selection probability by a line segment; the segments are concatenated into a line of total length 1 (the sum of all individuals' probabilities), so that a longer segment corresponds to a higher probability of that individual being selected. Its mechanism is as follows:
  • Arbitrarily choose an ordering of all individuals (the order can be arbitrary because it is the segment lengths that represent the selection probabilities of particular individuals);
  • The cumulative probability of an individual (as shown in Equation (31)) is the cumulative sum of the probabilities up to and including that individual:
    Q(x_i) = Σ_{k=1}^{i} p(x_k)
(3)
Generate a random number in the interval [0, 1] and determine which segment it falls into; the individual owning that segment is selected. Clearly, an individual with a larger fitness value has a longer segment, so a randomly generated number is more likely to fall within it, and that individual is more likely to be selected. Figure 11 shows a simple example of four independent trials using the roulette wheel selection algorithm.
In this paper, we also extend the algorithm's single process to a multi-process scheme by converting the fitness of the individuals in each process into a probability, eliminating the individual with the smallest probability, and passing the remaining individuals into the next process. After many experiments, the optimal solution can typically be locked in within about three cycles.
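The selection mechanism of Eqs. (30) and (31) can be sketched as follows (function name ours):

```python
import random

def roulette_select(fitness_values, rng=None):
    """Roulette wheel selection: each individual's probability is its fitness
    share (Eq. (30)); cumulative probabilities (Eq. (31)) partition [0, 1)
    into segments, and a uniform random number picks the segment it falls in.
    """
    rng = rng or random.Random()
    total = sum(fitness_values)
    r, cum = rng.random(), 0.0
    for i, f in enumerate(fitness_values):
        cum += f / total              # cumulative probability Q(x_i)
        if r < cum:
            return i
    return len(fitness_values) - 1    # guard against floating-point round-off
```

An individual with zero fitness owns a zero-length segment and is never selected, while a fitter individual's longer segment is hit more often over repeated draws.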

4.5. Solving Targeted Problems with GRSABO

In Section 3.3 of this paper, the objective function A_g of the GRSABO algorithm is obtained. We now set x_1 (the soot-blowing starting point) and x_2 (the soot-blowing endpoint) as unknown variables and let x_1 and x_2 move freely along the x-axis subject to the constraints. The main task of the optimization part of this paper is to minimize the value of A_g by varying x_1 and x_2, as shown in Figure 12.
According to the operating regulations of thermal power plants, the interval between two soot-blowing operations cannot be less than 8 h (x_1 \ge 28{,}800 s), and the duration of each soot-blowing operation must lie in the range of 4500 to 5400 s (4500 \le x_2 - x_1 \le 5400); these are the two nonlinear constraints of the optimization problem.

4.6. Optimization Results and Validation

In this paper, six intelligent optimization algorithms, namely the whale optimization algorithm (WOA), the grey wolf optimizer (GWO), the subtraction-average-based optimizer (SABO), particle swarm optimization (PSO), the seagull optimization algorithm (SOA), and the sparrow search algorithm (SSA), are compared with the improved subtraction-average-based optimizer (GRSABO). Twenty independent trials of each optimization algorithm were run on the objective function, and the soot-blowing starting point x_1, the soot-blowing endpoint x_2, and the attenuation of heat transfer efficiency area A_g of the heated surface of the economizer obtained in each trial were recorded. The results of the seven algorithms are plotted as box plots of x_1, x_2, and A_g, respectively, and analyzed. Across all experiments, the minimum value of A_g is 11,057.3562, obtained at x_1 = 28,800 and x_2 = 33,300.
Figure 13a shows the soot-blowing starting points obtained by the seven optimization algorithms. The line y = 28,800 marks the value of x_1 at which A_g is smallest; the box plot shows that the results of the 20 GRSABO experiments converge more tightly around 28,800 and contain fewer outliers.
Figure 13b shows the soot-blowing endpoints obtained by the seven optimization algorithms. The line y = 33,300 marks the value of x_2 at which A_g is smallest; again, the box plot shows that the 20 GRSABO experiments converge more tightly around 33,300 with fewer outliers.
Figure 13c shows the attenuation of heat transfer efficiency area A_g of the heated surface of the economizer obtained by the seven optimization algorithms; the box plot shows that the 20 GRSABO experiments converge more tightly around y = 11,057.3562 with fewer outliers. The optimized attenuation ratio of the heat transfer efficiency of the heated surface of the economizer is given in Equation (32):
\frac{11{,}057.3562}{44{,}690} \times 100\% = 24.7423\%
In summary, GRSABO shows a clear advantage over the other optimization algorithms in convergence accuracy. From the optimization module we obtain the ideal soot-blowing starting point x_1 = 28,800, soot-blowing endpoint x_2 = 33,300, and attenuation of heat transfer efficiency area A_g = 11,057.3562 for the heated surface of the economizer. This soot-blowing threshold point is used in the prediction module to perform interval prediction. Table 1 shows the optimal solutions selected by the different optimization algorithms and the frequency with which each algorithm found the optimal solution in 20 independent trials. After algorithmic optimization, the thermal efficiency of the economizer is improved by approximately
28.7239\% - 24.7423\% = 3.9816\%

5. Optimization Results Applied to Interval Prediction

5.1. Integration and Application of Optimization Results

According to Section 4.6, we obtain the optimized soot-blowing start point (Figure 14a) and the corresponding soot-blowing threshold (Figure 14b). Substituting the soot-blowing threshold and start point into the original cleanliness factor dataset and intercepting the 1000 sets of data preceding the soot-blowing threshold yields the ash accumulation segment on which interval prediction is performed.

5.2. Wavelet Thresholding Method for Denoising

The wavelet threshold denoising method [20,21] is a time-frequency localization analysis method. After the wavelet transform, the signal is decomposed into several sub-bands of time-domain components, and noise components with small wavelet coefficients can be filtered out by selecting an appropriate threshold. Too large a threshold causes the loss of useful information, while too small a threshold leaves residual noise, degrading the accuracy of the prediction results.
The steps of wavelet threshold denoising are as follows:
(1)
According to the characteristics of the original signal and the application background, select an appropriate wavelet basis and number of decomposition layers, and apply wavelet decomposition to the noisy original signal to obtain the wavelet coefficients [22].
(2)
After a suitable threshold is selected, the threshold function processes the coefficients of each layer [19]. Because a hard threshold function causes the reconstructed signal to oscillate and a soft threshold function easily distorts nonlinear signals, the unbiased risk estimation threshold is selected as the threshold function in this paper.
The unbiased risk estimation threshold function is calculated as follows:
  • The absolute values of all elements of the original signal s(t) are first extracted and squared, and the resulting sequence is sorted from smallest to largest:
    y_k = \mathrm{sort}\left(\left|s_i\right|^2\right)
  • Let \lambda_j be the square root of the jth element of y_k:
    \lambda_j = \sqrt{y_j}
  • Then, the risk function associated with this threshold is shown in Equation (36):
    \mathrm{Risk}_j = \frac{1}{N}\left[(N - 2j) + \sum_{i=1}^{j} y_i + (N - j)\, y_{N-j}\right]
  • The corresponding risk curve can be obtained from the risk function; the value of j at which the risk is smallest is recorded as j_min, and the unbiased risk estimation threshold is obtained from it:
    \lambda = \sqrt{y_{j_{\min}}}
(3)
The denoised signal is obtained by processing the wavelet coefficients with the unbiased risk estimation threshold and reconstructing the signal.
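The threshold-selection steps above can be sketched with NumPy. This is an illustrative implementation of the risk function of Equation (36) only (not a full wavelet-denoising pipeline, and not the paper's code); the function name is a hypothetical choice:

```python
import numpy as np

def rigrsure_threshold(coeffs):
    """Unbiased risk estimation (rigrsure) threshold: sort the squared
    coefficient magnitudes, evaluate the risk at every candidate index,
    and return the square root of the sorted value at the minimum-risk
    index."""
    s = np.abs(np.asarray(coeffs, dtype=float))
    y = np.sort(s ** 2)                 # y_k = sort(|s_i|^2), ascending
    n = y.size
    j = np.arange(1, n + 1)
    # Risk_j = [(N - 2j) + sum_{i<=j} y_i + (N - j) * y_{N-j}] / N
    # (for j = N the last term vanishes, so the wrapped index is harmless)
    risk = ((n - 2 * j) + np.cumsum(y) + (n - j) * y[n - j - 1]) / n
    j_min = int(np.argmin(risk))        # 0-based index of minimum risk
    return float(np.sqrt(y[j_min]))
```

The returned value would then be passed to a soft-thresholding step on the detail coefficients before wavelet reconstruction.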
Here, we evaluate the denoising effect using three metrics: signal-to-noise ratio (SNR), root-mean-square error (RMSE), and running time. The signal-to-noise ratio (SNR) and root-mean-square error (RMSE) are shown in Equations (38) and (39):
\mathrm{SNR_{dB}} = 10 \log_{10} \frac{P_{signal}}{P_{noise}}
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}
where P_{signal} is the signal power, P_{noise} is the noise power, n is the number of observations, y_i is the true value of the ith observation, and \hat{y}_i is the corresponding predicted value.
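Equations (38) and (39) translate directly into NumPy; a minimal sketch (function names are illustrative) is:

```python
import numpy as np

def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB: signal power over the power of the
    residual noise (clean minus denoised), Equation (38)."""
    clean = np.asarray(clean, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean((clean - denoised) ** 2)
    return 10 * np.log10(p_signal / p_noise)

def rmse(y_true, y_pred):
    """Root-mean-square error between true and estimated samples, Equation (39)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```

A higher SNR and a lower RMSE indicate a better denoising result, which is how Table 2 compares the four threshold functions.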

5.3. Ensemble Empirical Mode Decomposition

The traditional empirical mode decomposition (EMD) [23] has significant advantages when dealing with nonlinear and non-smooth time series: it is based on the distribution of the extreme points of the signal itself, so no basis function needs to be chosen, and it is data-driven and adaptive. It is also not constrained by the Heisenberg uncertainty principle [24,25]. However, EMD is highly susceptible to mode aliasing, which leads to false time-frequency distributions and renders the IMFs physically meaningless. Ensemble empirical mode decomposition (EEMD) [26] is an improved version of EMD that reduces the mode-aliasing problem by adding Gaussian white noise to the original signal, performing EMD on multiple noisy versions of the signal, and finally averaging the results. The basic steps of EEMD are as follows:
(1)
Add Gaussian white noise: add randomly generated Gaussian white noise n_i(t) to the original signal x(t) to create a set of noise-added signals k_i(t), where i indexes the noise realizations [27,28]:
k_i(t) = x(t) + n_i(t), \quad i = 1, 2, \ldots, N
(2)
EMD decomposition: An EMD decomposition is performed for each noisy signal k i t to obtain a series of Intrinsic Mode Functions (IMFs) [29].
(3)
Averaging: the IMFs of the same order obtained from each noisy signal are averaged to obtain the final stable set of IMFs:
\mathrm{IMF}_j = \frac{1}{N} \sum_{i=1}^{N} \mathrm{IMF}_j^{(i)}, \quad j = 1, 2, \ldots, M
where \mathrm{IMF}_j^{(i)} is the jth IMF of the ith noisy signal and M is the number of IMF components.

5.4. t-Test

After the EEMD decomposition, the obtained IMF components need to be classified and integrated into the three key features used for interval prediction: high-frequency components, low-frequency components, and the trend term. The integration method, a t-test [30], is described below. The IMF components are preprocessed by denoting IMF1 as indicator 1, IMF1 + IMF2 as indicator 2, and so on, so that the sum of the first i IMF components forms indicator i. A t-test is then performed on each indicator to determine whether its mean differs significantly from 0. The specific steps of the t-test are as follows:
(1)
Set the hypotheses: null hypothesis (H_0): the sample mean is equal to 0; alternative hypothesis (H_a): the sample mean is not equal to 0.
(2)
Selecting the significance level: usually, α = 0.05 is chosen. The null hypothesis is rejected when, assuming it is true, the probability of observing data at least as extreme as the sample falls below 5% [31].
(3)
Calculating t-statistics: Statistics calculation using Equation (42).
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}
where \bar{x} is the sample mean, μ is the hypothesized overall mean, s is the sample standard deviation, and n is the sample size.
(4)
Determining sample degrees of freedom ( d f ): d f = n 1 .
(5)
Finding the t critical value: find the t critical value corresponding to the degree of freedom ( d f ) and significance level ( α ) in the t distribution table.
(6)
Comparing t-statistics and t-critical values: if the absolute value of the calculated t-statistic is greater than the t critical value, the null hypothesis is rejected, and the sample mean is considered to be significantly different from 0.
(7)
Conclusion: if the absolute value of the t-statistic is greater than the t-critical value, we conclude that the sample mean is significantly different from 0, in the direction indicated by the alternative hypothesis.
If the absolute value of the t-statistic is less than or equal to the t-critical value, the null hypothesis cannot be rejected; there is not enough evidence that the sample mean differs significantly from 0.
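Equation (42) and the degrees-of-freedom rule can be sketched in pure NumPy (the function name is illustrative; scipy.stats.ttest_1samp offers an equivalent p-value-based test):

```python
import numpy as np

def one_sample_t(sample, mu=0.0):
    """t-statistic for H0: mean(sample) == mu (Equation (42)),
    together with the degrees of freedom df = n - 1."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    t = (x.mean() - mu) / (x.std(ddof=1) / np.sqrt(n))  # sample std, ddof = 1
    return t, n - 1
```

The resulting |t| is then compared against the critical value looked up in a t-distribution table at the chosen significance level and degrees of freedom.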

5.5. Interval Forecasting

5.5.1. Quantile Regression

The main goal of traditional mean regression (OLS) [32] is to estimate the conditional mean of the dependent variable (response variable) with respect to one or more independent variables (explanatory variables). In the simplest case, OLS regression attempts to find a straight line (or hyperplane in multidimensional space) such that the sum of the squared distances (residuals) of all observations from this line is minimized; that is, OLS estimates the model parameters by minimizing the residual sum of squares (RSS). OLS regression usually requires several assumptions to hold, such as a zero-mean error term, homoskedasticity, and no autocorrelation. It is also susceptible to extreme values or outliers, because squaring amplifies large residuals. As a result, OLS regression cannot portray the uncertainty of the predicted points [33].
Quantile regression (QR) aims to estimate the relationship between the dependent and independent variables at different quantile levels, not just the mean. For example, it can estimate median regression (quantile τ = 0.5), upper-quartile regression (τ = 0.75), or the regression relationship at any other quantile level. Quantile regression minimizes a weighted sum of the absolute values of the residuals, with the weights depending on the sign of the residuals and the chosen quantile level. Quantile regression [34] is more robust to outliers because it uses the absolute value of the residuals rather than their square. It also allows for heterogeneity analysis, permitting analysis of how the effect on the dependent variable may vary across quantiles and thus providing more comprehensive information about the relationship between variables. At the same time, quantile regression does not require the error term to follow a particular distribution or homoskedasticity to hold [35].
Quantile regression usually proceeds in the following steps:
(1)
Data preparation: Determine the response variable Y and explanatory variable X. Split the dataset into a training set and a test set.
(2)
Model setting: set the form of the quantile regression model, which is usually a linear model Q_Y(\tau \mid X) = X^T \beta(\tau), where Q_Y(\tau \mid X) is the τth conditional quantile of Y for a given X and \beta(\tau) is the corresponding vector of regression coefficients.
(3)
Definition of loss function: Quantile regression uses a special loss function that adjusts the weights according to the sign of the residuals. The loss function L τ is defined as
L_\tau(u) = \begin{cases} \tau u, & u \ge 0 \\ (\tau - 1) u, & u < 0 \end{cases}
where u = Y - X^T \beta is the residual and \tau \in (0, 1) is the quantile to be estimated.
(4)
Parameter estimation: the parameter \beta(\tau) is estimated by minimizing the overall loss function:
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} L_\tau\left(Y_i - X_i^T \beta\right)
Here, Y i and X i are the response and explanatory variables for the ith observation, respectively.
(5)
Model validation: Using test set data to assess a model’s predictive power, some measure of error between predicted and actual values can be calculated, such as the mean absolute error (MAE) or quantile absolute deviation (QAD).
(6)
Model interpretation: analyze the estimates of \beta(\tau) to understand the effect of the explanatory variables on the response variable at a particular quantile level.
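The loss function of step (3) can be sketched as a vectorized NumPy function; the name pinball_loss and the interface are illustrative choices, not the paper's implementation:

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss: residuals above the prediction are
    weighted by tau, residuals below by (1 - tau), so the minimizer
    of the mean loss is the tau-quantile."""
    u = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.where(u >= 0, tau * u, (tau - 1) * u))
```

Minimizing this loss over a constant prediction yields the empirical τ-quantile of y_true, which is the property that quantile regression generalizes to conditional models.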

5.5.2. Gated Recurrent Unit

The gated recurrent unit (GRU) [36] is a type of recurrent neural network that improves on standard recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) [37] and can better capture dependencies over long time-step distances. The reset gate helps capture short-term dependencies in sequences, and the update gate helps capture long-term dependencies. When the reset gate is fully open, the gated recurrent unit reduces to a basic recurrent neural network; when the update gate is open, the unit can skip subsequences. Figure 15 shows the internal structure of the GRU.
Firstly, we obtain two gating signals from the previous hidden state h_{t-1} and the current input x_t, where r_t controls the reset gate and z_t controls the update gate:
r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)
z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)
Here, σ is the sigmoid function, which normalizes values into [0, 1] so that they act as gating signals; W_z, W_r, and W are weight matrices that are updated at each training iteration.
After obtaining the gating signals, the reset gate is applied to obtain the "reset" data:
h'_{t-1} = h_{t-1} \odot r_t
where ⊙ is the Hadamard product. Concatenating h'_{t-1} with the input x_t and normalizing the result to values in [−1, 1] with the tanh activation function yields the candidate state \tilde{h}_t, which contains the signal features of the current input x_t together with the new features recorded through learning.
\tilde{h}_t = \tanh\left(W \cdot [r_t \odot h_{t-1}, x_t]\right)
The most critical step of the GRU, "updating memory", is the step in which both "forgetting" and "remembering" take place:
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
The closer the gating signal z t is to 1, the more data is “remembered,” and the closer it is to 0, the more data is “forgotten”.

5.5.3. Bidirectional Gating Unit

A bidirectional gated recurrent unit (BiGRU) is essentially a two-layer GRU network: in the forward GRU layer, features are fed through the network in the forward time direction to mine the forward correlations of the data, while in the reverse GRU layer the input sequence is processed in reverse order to mine the backward correlations. This architecture extracts the input features bidirectionally, enhancing the completeness and global nature of the features. Figure 16 shows the internal structure of the BiGRU.

5.5.4. An Interval Prediction Method Incorporating EEMD-QRBiGRU

Since a single BiGRU model can only learn a deterministic mapping between the input features and the prediction target and cannot reflect information such as the uncertainty error distribution in its predictions, this paper combines it with quantile regression theory and proposes a BiGRU prediction model based on quantile regression. The model predicts the trend of the economizer cleanliness factor at different quantiles and thus achieves interval prediction.
The parameters in the BiGRU model are the weights W and bias vectors b of each neuron. The quantile regression method is introduced into the neural network to establish the BiGRU model based on the quantile regression loss function. The conditional quantile of the output response variable Y at the τ quantile is
Y(\tau) = Q_Y(\tau \mid X) = f\left( \sum_{j=1}^{J} w_j(\tau)\, h_j(\tau) + b(\tau) \right)
where J is the number of hidden-state units, f is the activation function of the output layer, h_j(\tau) is the output of the jth BiGRU hidden unit, and w_j(\tau) and b(\tau) are the weight and bias of the output layer, respectively.
Updating the network parameters with the Adam gradient descent algorithm yields l BiGRU models with different weights and biases. After a series of forward propagation and backward learning passes, the predicted values of the response variable at each quantile at time t + h can be obtained:
Y_{t+h \mid t} = \left[ Y_{t+h \mid t}(\tau_1),\; Y_{t+h \mid t}(\tau_2),\; \ldots,\; Y_{t+h \mid t}(\tau_l) \right]
This enables estimation of the probability density distribution of Y_{t+h \mid t}, as well as calculation of confidence intervals for Y_{t+h \mid t} from the discrete conditional quantiles.
T_{t+h}(\beta) = \left[ Y_{t+h \mid t}(\underline{\tau}),\; Y_{t+h \mid t}(\bar{\tau}) \right]
where T_{t+h}(\beta) is the prediction interval at the β significance level; \underline{\tau} and \bar{\tau} correspond to the lower and upper limits of the prediction interval, respectively; \beta = 1 - (\bar{\tau} - \underline{\tau}); and the confidence level of the interval is 1 - \beta.
In summary, the prediction process of the proposed EEMD-QRBiGRU is shown in Figure 17, and its execution steps are as follows:
  • Denoise the original data using the wavelet thresholding method.
  • Decompose the denoised data using EEMD to obtain nine sets of IMF components.
  • Classify the IMF components using the t-Test to obtain three features: high-frequency components, low-frequency components, and trend terms.
  • Determine the structure of the network, the number of nodes, and the number of quantile points l; initialize the network; and construct the training set and test set.
  • Input the training set into QRBiGRU, and train and update the BiGRU model under each quantile point τ .
  • Input the explanatory variable X_t from the test set into the trained QRBiGRU to obtain the conditional quantiles Y_{t+h \mid t} of the response variable and output the results.
Figure 17. EEMD–QRBiGRU flowchart.

6. Interval Prediction Result Display

6.1. Wavelet Thresholding Method Denoising Module

In this paper, four different threshold functions are used to process the raw data (as shown in Figure 18): Rigrsure (unbiased risk estimation threshold), Heursure (heuristic threshold), Minimax (minimax threshold), and Sqtwolog (fixed threshold). The evaluation metrics for the results of the four threshold functions are shown in Table 2.
By comparing the evaluation indexes, we find that the unbiased risk estimation threshold achieves the highest signal-to-noise ratio, the smallest root-mean-square error, and the shortest running time. Therefore, this paper uses the unbiased risk estimation threshold function to process the optimized time series.

6.2. Modal Decomposition of the Cleanliness Factor Time Series Using EEMD and Classification Using the t-Test

The data after wavelet threshold denoising still have nonlinear and non-smooth characteristics, so they are further subjected to ensemble empirical mode decomposition (EEMD) to obtain local signals at the different time scales contained in the original signal: the intrinsic mode function (IMF) components shown in Figure 19. After decomposition, eight intrinsic mode function components and a trend term are obtained.
After the ensemble empirical mode decomposition, a t-test (as shown in Table 3) is needed to classify the intrinsic mode function components into three categories: high-frequency components, low-frequency components, and the trend term. With RES known to be the trend term, IMF1 is defined as indicator 1, IMF1 + IMF2 as indicator 2, and so on, and the t-values of successive indicators are compared (indicator 1 versus indicator 2, indicator 2 versus indicator 3, and so forth). We find that the t-values at indicator 7 and indicator 8 are close to 0, which indicates that IMF1 to IMF7 are high-frequency components and IMF8 is a low-frequency component.

6.3. BiGRU Time Series Forecasting Based on Quantile Regression

After obtaining the three key features (the high-frequency component, the low-frequency component, and the trend term), interval prediction of the ash accumulation time series is performed using a bidirectional gated neural network based on quantile regression. The ratio of training set data to test set data is set to 8:2, and the experimental results are shown in Figure 20.
In this paper, we use the mean absolute percentage error (MAPE), mean square error (MSE), prediction interval coverage probability (PICP), and prediction interval normalized average width (PINAW) as the evaluation metrics for the test set and the training set.
\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|
The value of MAPE ranges over [0, +∞), with a MAPE of 0% indicating a perfect model and a MAPE greater than 100% indicating a poor-quality model.
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i - y_i\right)^2
The value of MSE ranges over [0, +∞); it equals 0 when the predicted values coincide exactly with the true values (a perfect model), and the larger the error, the larger the MSE value.
\mathrm{PICP} = \frac{1}{K} \sum_{i=1}^{K} a_i, \quad a_i = \begin{cases} 1, & y_i \in [L_i, U_i] \\ 0, & y_i \notin [L_i, U_i] \end{cases}
where a_i is a Boolean variable; L_i and U_i are the lower and upper bounds of the ith prediction interval, respectively; and y_i is the actual value. The closer the PICP is to 1, the higher the probability that the actual values fall within the prediction interval and the better the prediction. However, widening the prediction interval increases its coverage while providing less information, so this paper also introduces the prediction interval normalized average width (PINAW).
\mathrm{PINAW} = \frac{1}{KR} \sum_{i=1}^{K} \left(U_i - L_i\right)
where R = y_{\max} - y_{\min} is the range of the true values. A smaller PINAW means a better prediction; in practice, there is a trade-off between PICP and PINAW. The comparison of the evaluation metrics for the test set and the training set is shown in Table 4.
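The two interval-quality metrics above can be sketched as NumPy functions; the names and signatures are illustrative, not the paper's code:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction interval coverage probability: fraction of true values
    that fall inside their prediction intervals [lower_i, upper_i]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """Prediction interval normalized average width: mean interval width
    divided by the range R = y_max - y_min of the true values."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean(upper - lower) / (y.max() - y.min()))
```

A good interval forecast keeps PICP near the nominal confidence level while keeping PINAW small, which is the trade-off Table 4 evaluates.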

6.4. Comparison of the Results of 4 Prediction Models Based on Quantile Regression

The evaluation indexes of the above experiments show that the bidirectional gated recurrent unit based on quantile regression is very effective for interval prediction of the ash accumulation segments. To verify that it also outperforms alternative models, this paper carries out the following comparisons among QRBiGRU, QRBiLSTM, QRLSTM [38], QRGRU, and the original data.
For the key hyperparameters of the neural networks (including the number of hidden units, learning rate, dropout rate, and number of quantiles), this study performs systematic tuning using Bayesian optimization with the tree-structured Parzen estimator (TPE), random search, and grid search. Each model architecture undergoes independent hyperparameter optimization to determine its optimal configuration, and fair performance comparisons are conducted under these optimal configurations. Table 5 presents the hyperparameter optimization results for each network. Thirty Bayesian optimization trials were conducted using the Optuna framework, with the quantile loss on the validation set as the objective function.
From Figure 21 and Table 6, it can be seen that the prediction results of the QRBiGRU model fit the original data best, and all of its evaluation indicators are significantly better than those of the comparison models. Taking MSE as an example, the prediction error of QRBiGRU (1.437 × 10⁻⁷) is only 1/45.5 of that of the QRGRU model (6.534 × 10⁻⁶), an improvement in model accuracy of more than 97.8%.

7. Conclusions

This study proposes an integrated framework that combines deep learning and quantile regression to optimize the energy efficiency of economizers in coal-fired power plants. Based on key combustion parameters of the boiler, an ash deposition monitoring model is developed to characterize the convective heat transfer efficiency of the economizer by calculating its cleanliness factor. The derived attenuation of heat transfer efficiency function is optimized using the improved GRSABO algorithm to determine the optimal soot-blowing start and end points, thereby minimizing the total attenuation of heat transfer efficiency and steam loss ( Q s b ). The processed ash deposition data segments are further subjected to interval prediction through QRBiGRU, which effectively quantifies the prediction uncertainty.
Validation results on a 300 MW unit in Guizhou Province show that the QRBiGRU model outperforms benchmark models in prediction accuracy, can improve heat transfer efficiency by approximately 4%, and can robustly characterize the uncertainty distribution without incurring significant computational load, providing empirical support for the optimization of power plant operations.
It is worth noting that the current framework is based on the steady-state assumption and has limited adaptability to transient operating conditions such as start-up and deep peak regulation (e.g., load below 30%). It is difficult to capture the abrupt changes in flue gas components caused by auxiliary fuel combustion during the start-up phase and the abnormal local ash deposition induced by flow field reconstruction under low loads. Future research can focus on developing adaptive digital twin models that can adapt to diverse operating conditions and load fluctuations, which will be crucial for addressing the challenge of renewable energy intermittency. This study provides a methodological foundation for such advancements in the field of power plant digitalization.

Author Contributions

Conceptualization, Y.S. and N.W.; methodology, F.C.; software, J.W.; validation, J.J.; writing—original draft preparation, N.W.; validation, N.W.; writing—review and editing, B.W.; funding acquisition, F.C., Y.S. and N.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72071183; the Fundamental Research Program of Shanxi Province, grant number 202303021222084; and the Graduate School-level Science and Technology Project of North University of China, grant number 20242066.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to project data restriction.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CF: Cleanliness Factor
BiGRU: Bi-directional Gated Recurrent Units
QR: Quantile Regression
SABO: Subtraction-Average-Based Optimizer
WOA: Whale Optimization Algorithm
GWO: Grey Wolf Optimization
PSO: Particle Swarm Optimization
SOA: Seagull Optimization Algorithm
SSA: Sparrow Search Algorithm
GA: Golden Sine Algorithm
EEMD: Ensemble Empirical Mode Decomposition
LSTM: Long Short-Term Memory
GRU: Gated Recurrent Units
MAPE: Mean Absolute Percentage Error
MSE: Mean Square Error
PICP: Prediction Interval Coverage Probability
PINAW: Prediction Interval Normalized Average Width
TPE: Tree-structured Parzen Estimator

References

  1. Liu, P.; Peng, H. What drives the green and low-carbon energy transition in China? An empirical analysis based on a novel framework. Energy 2022, 239, 122450. [Google Scholar] [CrossRef]
  2. Zhao, C. Is low-carbon energy technology a catalyst for driving green total factor productivity development? The case of China. J. Clean. Prod. 2023, 13, 428. [Google Scholar] [CrossRef]
  3. Fang, D.; Shi, S.; Yu, Q. Evaluation of Sustainable Energy Security and an Empirical Analysis of China. Sustainability 2018, 10, 5. [Google Scholar] [CrossRef]
  4. Zakari, A.; Musibau, H.O. Sustainable economic development in OECD countries: Does energy security matter? Sustain. Dev. 2024, 1, 32. [Google Scholar] [CrossRef]
  5. Ning, Z.; Meimei, X.; Xiaoyu, W.; Hongcai, D.; Yunzhou, Z.; Lin, L.; Dong, Z. Comparison and Enlightenment of Energy Transition Between Domestic and International. Electr. Power 2021, 54, 113–119. [Google Scholar]
  6. Zhang, Y.N.; Liu, Y.N.; Ji, C.L. A research on sustainability evaluation and low-carbon economy in China. IOP Conf. Ser. Earth Environ. Sci. 2019, 233, 052008. [Google Scholar] [CrossRef]
Figure 4. Measuring points for calculating the actual heat transfer coefficient.
Figure 5. Cleanliness factor and load variation curves.
Figure 6. Economizer cleanliness factor raw data.
Figure 7. Comparison of raw data before and after fitting.
Figure 8. Attenuation of heat transfer efficiency at the heating surface of the economizer.
Figure 9. (a) “v-subtraction” extraction phase. (b) “v-subtraction” exploratory phase.
Figure 10. (a) Distribution of the population after chaotic mapping. (b) Population chaos values and corresponding frequencies.
Figure 11. Schematic diagram of the four independent test roulettes.
Figure 12. Schematic of the objective function.
Figure 13. (a) Soot-blowing starting points calculated by seven optimization algorithms. (b) Soot blowing end points calculated by seven optimization algorithms. (c) Attenuation of heat transfer efficiency areas calculated by seven optimization algorithms.
Figure 14. (a) Optimization results graphic. (b) Interval forecast target data.
Figure 15. Structure diagram of the GRU model.
Figure 16. BiGRU internal structure diagram.
Figure 18. Comparison of the processing results of four threshold functions.
Figure 19. EEMD decomposition of ash accumulation curve.
Figure 20. (a) Test set prediction results. (b) Error histogram. (c) Training set prediction results. (d) Data fitting curve.
Figure 21. Comparison of four improved models based on quantile regression.
Table 1. Comparison of optimal solutions of seven optimization algorithms and frequency of occurrence of optimal solutions.
| Optimization Algorithm | x₁ | x₂ | A_g |
|---|---|---|---|
| WOA | 29,000 | 33,500 | 11,324.74 |
| GWO | 28,890 | 33,432 | 11,187.51 |
| SABO | 28,800 | 33,300 | 11,057.36 |
| PSO | 28,800 | 33,308 | 11,058.48 |
| SOA | 28,800 | 33,400 | 11,071.07 |
| SSA | 28,800 | 33,378 | 11,068.12 |
| GRSABO | 28,800 | 33,300 | 11,057.36 |
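Table 1 compares metaheuristics that search for the soot-blowing start point x₁ and end point x₂ minimizing the attenuation-of-heat-transfer-efficiency area A_g. The structure of that search can be sketched with a toy objective; the fouling/recovery coefficients, the 40,000 s horizon, and the coarse exhaustive search below are illustrative assumptions, not the paper's GRSABO algorithm or its plant data:

```python
def attenuation_area(x1, x2, fouling_rate=2e-5, recovery_rate=8e-5, steam_cost=0.01):
    """Accumulated efficiency loss over one fouling/blowing/refouling cycle.

    The cleanliness factor (CF) falls while the surface fouls, rises during
    blowing between x1 and x2 (seconds), and falls again afterwards; blowing
    additionally pays a steam penalty. All coefficients are invented for this
    sketch -- they are not the plant model identified in the paper.
    """
    loss, cf, dt = 0.0, 1.0, 60               # 60 s integration step
    for t in range(0, 40000, dt):
        if x1 <= t <= x2:                     # soot blowing: CF recovers, steam is spent
            cf = min(1.0, cf + recovery_rate * dt)
            loss += steam_cost * dt
        else:                                 # fouling before blowing, refouling after
            cf = max(0.0, cf - fouling_rate * dt)
        loss += (1.0 - cf) * dt               # heat-transfer shortfall
    return loss

# Coarse exhaustive search for the best blowing window, standing in for the
# improved subtraction-average-based optimizer used in the paper:
candidates = [(x1, x2) for x1 in range(6000, 30000, 1200)
                       for x2 in range(x1 + 1200, 36000, 1200)]
best = min(candidates, key=lambda p: attenuation_area(*p))
print("best blowing window:", best)
```

Any population-based optimizer from Table 1 can replace the grid search; only the objective evaluation matters for the comparison.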
Table 2. Comparison of four threshold function evaluation indicators.
| Evaluation Indicator | Rigrsure | Minimax | Sqtwolog | Heursure |
|---|---|---|---|---|
| SNR | 60.7562 | 53.5176 | 49.7381 | 51.3049 |
| RMSE | 0.000595 | 0.001369 | 0.002116 | 0.001767 |
| Runtime (s) | 0.3429 | 0.3517 | 0.3712 | 0.3498 |
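The SNR and RMSE indicators used in Table 2 to rank the wavelet threshold rules have standard definitions. A minimal sketch follows; the sine test signal and the ±0.001 perturbation are assumptions for illustration, not the economizer data:

```python
import math

def snr_db(clean, denoised):
    """Signal-to-noise ratio in dB: clean-signal energy over residual-error energy."""
    signal = sum(x * x for x in clean)
    noise = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    return 10 * math.log10(signal / noise)

def rmse(clean, denoised):
    """Root-mean-square error between the clean reference and the denoised signal."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(clean, denoised)) / len(clean))

# Illustrative data: a sine reference and a lightly perturbed "denoised" version.
clean = [math.sin(0.01 * i) for i in range(1000)]
denoised = [x + 0.001 * (-1) ** i for i, x in enumerate(clean)]
print(f"SNR = {snr_db(clean, denoised):.2f} dB, RMSE = {rmse(clean, denoised):.6f}")
```

Higher SNR and lower RMSE both indicate a threshold rule that removes noise while preserving the underlying fouling trend, which is why Rigrsure dominates in Table 2.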
Table 3. t-value from the t-Test between indicators.
| Indicator Pair | 1,2 | 2,3 | 3,4 | 4,5 | 5,6 | 6,7 | 7,8 |
|---|---|---|---|---|---|---|---|
| t-value | 0.359666 | 0.65325 | 0.494376 | 0.796583 | 0.353716 | 0.828541 | 1.76 × 10⁻⁹ |
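Table 3 reports t-values between adjacent indicators. The paper's section shown here does not spell out which variant of the t-test was used; Welch's two-sample form is one plausible reading, sketched below with invented sample values:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances).

    One plausible form of the test behind Table 3; the exact variant used
    in the paper is an assumption here.
    """
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)   # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical repeated measurements of two indicators:
run_a = [0.360, 0.355, 0.362, 0.358]
run_b = [0.358, 0.357, 0.361, 0.356]
t = welch_t(run_a, run_b)
```

A t-value near zero, as for pair 7,8 in Table 3, indicates the two indicators are statistically indistinguishable at the chosen significance level.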
Table 4. Comparison of evaluation indexes related to the test set and training set.
| Evaluation Indicator | Training Set | Test Set |
|---|---|---|
| MAPE | 0.00037 | 0.00046 |
| MSE | 9.431 × 10⁻⁸ | 1.437 × 10⁻⁷ |
| PICP | 0.96875 | 0.98000 |
| PINAW | 0.00221 | 0.00264 |
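PICP and PINAW in Table 4 are the two standard quality measures for interval forecasts: coverage of the prediction band and its normalized width. A minimal sketch of both, using a five-point interval invented for illustration:

```python
def picp(lower, upper, y):
    """Prediction Interval Coverage Probability: share of targets inside the band."""
    return sum(lo <= t <= up for lo, up, t in zip(lower, upper, y)) / len(y)

def pinaw(lower, upper, y):
    """Prediction Interval Normalized Average Width: mean band width over target range."""
    avg_width = sum(up - lo for lo, up in zip(lower, upper)) / len(y)
    return avg_width / (max(y) - min(y))

# Illustrative targets (e.g., cleanliness-factor values) and quantile bounds:
y     = [0.90, 0.85, 0.80, 0.78, 0.75]
lower = [0.88, 0.83, 0.78, 0.74, 0.73]
upper = [0.92, 0.87, 0.82, 0.77, 0.77]
print(picp(lower, upper, y), pinaw(lower, upper, y))
```

A good interval forecaster keeps PICP close to the nominal confidence level while driving PINAW down, which is the trade-off Tables 4 and 6 quantify.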
Table 5. Hyperparameter optimization results.
Table 5. Hyperparameter optimization results.
ModelHidden UnitsLearning RateDropoutQuantileBatch Size
QRBiGRU1920.00120.35732
QRBiLSTM1760.00080.18716
QRGRU960.0030.42564
QRLSTM640.0050.255128
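All four quantile-regression networks in Tables 5 and 6 are trained on the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically according to the quantile level τ. A minimal sketch of that loss in its standard form (not code from the paper):

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau, 0 < tau < 1.

    Under-prediction (y > prediction) is weighted by tau, over-prediction
    by (1 - tau), so minimizing it yields the tau-th conditional quantile.
    """
    total = 0.0
    for y, q in zip(y_true, y_pred):
        err = y - q
        total += tau * err if err >= 0 else (tau - 1) * err
    return total / len(y_true)

# At tau = 0.9, under-predicting by 0.5 costs 0.9 * 0.5, while
# over-predicting by the same amount costs only 0.1 * 0.5:
under = pinball_loss([1.0], [0.5], 0.9)
over = pinball_loss([1.0], [1.5], 0.9)
```

Training one network per quantile level (or one network with multiple quantile outputs) produces the lower and upper bounds that PICP and PINAW then evaluate.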
Table 6. Comparison of results of different quantile regression methods.
Table 6. Comparison of results of different quantile regression methods.
Evaluation IndicatorsQRBiGRUQRBiLSTMQRGRUQRLSTM
MAPE0.000460.000800.002720.00149
MSE1.437 × 10 7 1.923 × 10 6 6.534 × 10 6 7.279 × 10 7
PICP0.980000.921370.815460.92231
PINAW0.002631.999755.214383.48572
Wang, N.; Shi, Y.; Cui, F.; Wen, J.; Jia, J.; Wang, B. Improving the Heat Transfer Efficiency of Economizers: A Comprehensive Strategy Based on Machine Learning and Quantile Ideas. Energies 2025, 18, 4227. https://doi.org/10.3390/en18164227