Prediction of Energy Consumption in a Coal-Fired Boiler Based on MIV-ISAO-LSSVM

Abstract: To address the large variation in boiler system energy consumption under the flexible peaking requirements of coal-fired units, an energy consumption prediction model for the boiler system is established based on a Least-Squares Support Vector Machine (LSSVM). First, the Mean Impact Value (MIV) algorithm is used to simplify the input features of the model and to determine the key operating parameters that affect energy consumption. Second, the Snow Ablation Optimizer (SAO), improved with a tent map, an adaptive t-distribution, and the opposites learning mechanism (ISAO), is introduced to determine the parameters of the prediction model. Then, using the operating data of an ultra-supercritical coal-fired unit in Xinjiang, China, a boiler energy consumption dataset under variable load is established based on the theory of fuel specific consumption. The proposed model is used to predict and analyze the boiler energy consumption, and it is compared with other common prediction methods. The results show that, compared with the LSSVM, BP, and ELM prediction models, the average Relative Root Mean Squared Error (aRRMSE) of the LSSVM model using ISAO is reduced by 2.13%, 18.12%, and 40.3%, respectively. The proposed model therefore predicts the energy consumption distribution of the boiler system of the ultra-supercritical coal-fired unit under variable load more accurately.


Introduction
With the rapid development of big data analysis and machine learning, combining practical engineering problems with data analysis and mechanism interpretation has become increasingly popular [1,2]. In this context, the historical operation data of coal-fired units are particularly important for optimizing unit operation and diagnosing unit status. Mechanism analysis alone can be misleading because it relies on experience, hence the need for a more comprehensive approach. Many researchers now focus on the intersection of mechanism analysis, data mining, machine learning, and related disciplines, and use extensive historical data to study the operation, control, design, and retrofitting of units.
Numerous scholars have applied machine learning and data mining techniques to thermal power units. Yang et al. [3] proposed a soft-sensor model for NOx concentration at the SCR outlet based on a gated recurrent unit neural network. Yang et al. [4] studied the optimization of a collaborative recovery system for flue gas waste heat and water in coal-fired units. Sun et al. [5] established an NOx prediction model for coal-fired units using Bayesian optimization combined with random forest, providing assistance for NOx emission control. Dai et al. [6] proposed a hierarchical clustering retrieval strategy based on the fuzzy C-means clustering algorithm to construct an offline database; they also employed a multi-attribute decision-making method that considers multiple objectives to extract optimal decision samples from the database and then guide the load scheduling scheme of power plants. Blackburn et al. [7] carried out dynamic optimization for variable load control of coal-fired units using the particle swarm optimization algorithm and LSTM. Wang et al. [8] used clustering and prediction algorithms to build a multi-objective regression model to predict and control boiler efficiency, improve boiler combustion efficiency, and reduce NOx emissions.
In the study of unit energy consumption, researchers typically employ data analysis and related methods to diagnose and optimize unit operation. Fu et al. [9] used the fuzzy rough set attribute reduction method to reduce the main characteristic parameters that affect unit energy consumption, and then established a sensitivity analysis model for unit energy consumption using support vector machines. Cai [10] proposed an intelligent algorithm that combines a generalized neural network and the mean impact value to classify the factors that affect the energy consumption of units, and established an energy consumption characteristic model for large coal-fired units. Xiao et al. [11] used RapidMiner, a data mining platform, to establish a prediction model for the exergy efficiency of the unit, and explored the room for reducing exergy loss with a random simulation method, so as to obtain the best working conditions of the unit under various loads and the corresponding control index parameters during actual operation. Sun [12] obtained the nonlinear relationship between power supply coal consumption and various operation control parameters using a random forest algorithm and, in combination with a genetic algorithm, proposed a unit operation optimization strategy. Wan et al. [13] developed a neural network-based model to predict steam flow at the outlet of cogeneration boilers. Wang et al. [14] developed a hierarchical energy efficiency index system for unit economic diagnosis. They analyzed the variation characteristics of key energy consumption indicators on the boiler and turbine sides through mechanism research and actual data trends, thus helping to analyze the causes of abnormal operation.
In many previous studies, researchers mainly used data analysis and related methods for unit diagnosis and operation optimization, and the accuracy of the underlying model has a great impact on the effectiveness of such work. Currently, the primary prediction models are the neural network and the support vector machine (SVM) [15]. The Least-Squares Support Vector Machine (LSSVM) model combines the strengths of both: it retains the strong generalization ability and global optimization of SVM, while avoiding the overfitting of neural networks and the time-consuming training of SVM [16]. However, the hyperparameters of the LSSVM model are often set based on experience, which can lead to low efficiency and a lack of precision. To address this issue, an optimizer is needed to obtain the best hyperparameters.
Since China announced the '30•60' decarbonization goal, traditional coal-fired units have increasingly participated in deep peak regulation to adapt to the intermittency and volatility of new energy power generation. This has increased the frequency and range of load adjustment during actual production. Exergy analysis reveals that the energy consumption of the boiler, which is the part of the unit with the greatest energy loss, has also changed significantly. To help staff understand the operational status of a coal-fired unit under flexible peak load regulation and to support the optimization of the boiler system's operation, this study takes a 660 MW coal-fired unit as the research subject. A prediction model for the boiler system's energy consumption and its distribution is established, and a prediction analysis of the boiler's energy consumption is conducted. The main contributions of the paper are as follows: (1) The boiler energy consumption dataset is constructed using the theory of fuel specific consumption. The mean impact value algorithm is used to select the input features, resulting in a more comprehensive model input with a stronger correlation to boiler energy consumption than manually specified inputs. (2) The snow ablation algorithm is improved using a tent map, an adaptive t-distribution, and the opposites learning mechanism to enhance its optimization ability and speed. (3) By adjusting the model's hyperparameters with an optimization algorithm instead of setting them manually, the prediction model can better align with the operation of a specific unit, resulting in improved performance.

Snow Ablation Optimizer
The Snow Ablation Optimizer (SAO) is a meta-heuristic algorithm inspired by the sublimation and melting behavior of snow, and it performs strongly in comparison with other optimizers [17]. The initial population is generated using the following formula:
$$X_{i,j} = LB_j + r \times (UB_j - LB_j) \qquad (1)$$
where i = 1, 2, ..., N; j = 1, 2, ..., Dim; N is the population size; Dim is the dimension of the search space; UB_j and LB_j are the upper and lower limits of the j-th dimension, respectively; and r is a random number between 0 and 1.
The algorithm iterates using two strategies.
(1) The exploitation phase. When snow is converted into liquid water by melting, search agents are encouraged to exploit high-quality solutions around the current best solution instead of spreading widely across the solution space. The algorithm updates the regions where the optimal solution may exist based on the distribution of the current solutions. The mathematical expression for this process is as follows:
$$X_{t+1} = m \times X_b + BM_t \otimes \left(\theta_1 \times (X_b - X_t) + (1 - \theta_1) \times (\bar{X} - X_t)\right)$$
where t is the current iteration number; X_{t+1} is the new individual based on X_t; X_b is the best individual in the population obtained in the t-th iteration; BM_t is a vector of Gaussian-distributed random numbers denoting Brownian motion; \bar{X} is the average of all individuals in the population; X_t is the individual in the population obtained in the t-th iteration; and m and θ_1 are algorithm parameters.
(2) The exploration phase. After snow or liquid water is converted into water vapor by sublimation or evaporation, the vapor moves irregularly through space. Using the randomness and irregularity of Brownian motion, the SAO algorithm makes it easier for individuals in the population to reach promising areas during exploration. The mathematical expression for this is as follows:
$$X_{t+1} = E_t + BM_t \otimes \left(\theta_2 \times (X_b - X_t) + (1 - \theta_2) \times (\bar{X} - X_t)\right)$$
where E_t is an individual randomly selected from among the three fittest individuals in the population and \bar{X}; θ_2 is an algorithm parameter.
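Both phases share one structure: an anchor point plus a Brownian-motion step toward a blend of the best individual and the population mean. The sketch below illustrates that structure under the assumption that the update follows the form used in the original SAO [17]; the function name `sao_step` and its arguments are hypothetical, and independent standard-normal draws per dimension stand in for BM_t.

```python
import random

def sao_step(x, best, centroid, elite, m, theta, exploit, rng=random):
    """Return one updated individual (illustrative sketch, not the paper's code).

    exploit=True  anchors the move at m * best (exploitation phase);
    exploit=False anchors the move at the elite individual (exploration phase).
    """
    anchor = [m * b for b in best] if exploit else list(elite)
    new_x = []
    for j in range(len(x)):
        bm = rng.gauss(0.0, 1.0)  # Brownian-motion component, assumed N(0, 1)
        # Blend of the pull toward the best individual and toward the mean
        pull = theta * (best[j] - x[j]) + (1.0 - theta) * (centroid[j] - x[j])
        new_x.append(anchor[j] + bm * pull)
    return new_x
```

When an individual already coincides with both the best individual and the population mean, the pull term vanishes and the result is the anchor itself, which is a convenient sanity check.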

Algorithm Improvement Strategy
The SAO algorithm has two defects. First, during execution the population is divided into two groups, each executing one of the two strategies described above. After each iteration, an individual executing the exploitation strategy is randomly selected to switch to the exploration strategy, so the exploitation phase ends after N/2 iterations. This may not be conducive to a comprehensive search. Therefore, in this paper, when t ≤ (2/3)T, exploitation and exploration occur in parallel with the number of individuals in a 1:2 ratio, and when t > (2/3)T, all individuals execute the exploration strategy. Second, the original algorithm completely discards the old individual after each iteration, even though the new individual may be worse than the old one. Therefore, this paper compares the fitness of the old and new individuals and retains the better one. The resulting improved Snow Ablation Optimizer (ISAO) is further enhanced using the following methods.
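The two fixes can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper names are hypothetical, the 1:2 split is rounded down to N // 3 exploiting individuals, all individuals are assumed to explore once two thirds of the iterations have passed, and fitness is assumed to be minimized.

```python
def exploitation_count(t, T, N):
    """Number of individuals assigned to exploitation at iteration t."""
    if t <= 2 * T / 3:
        return N // 3  # exploitation : exploration = 1 : 2
    return 0           # late phase: every individual explores

def greedy_keep(old, new, fitness):
    """Keep whichever individual has the lower (better) fitness."""
    return new if fitness(new) < fitness(old) else old
```

With T = 60 and N = 30, for example, 10 individuals exploit through iteration 40 and none do afterwards.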
(1) Tent map. The rule used to generate the initial population has a significant impact on the efficiency of an intelligent optimization algorithm. A uniformly distributed initial population expands the search range of the algorithm, which improves convergence speed and solution accuracy. To obtain a more comprehensive and balanced initial population, a tent map is used in this paper [18], which is defined as follows:
$$p_{n+1} = \begin{cases} p_n / \tau, & p_n < \tau \\ (1 - p_n) / (1 - \tau), & p_n \geq \tau \end{cases}$$
where p_n and p_{n+1} are tent sequence values and τ is the threshold, with a value of 0.7. The generated tent sequence values replace the random number r in Formula (1).
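As an illustration of tent-map initialization, the sketch below generates a chaotic sequence and substitutes it for the random number in the initialization formula. It assumes the common skew tent map with threshold τ = 0.7; the seed `p0` and the function names are hypothetical choices, not values from the paper.

```python
def tent_sequence(p0, length, tau=0.7):
    """Generate `length` values of the skew tent map starting from p0 in (0, 1)."""
    seq, p = [], p0
    for _ in range(length):
        p = p / tau if p < tau else (1.0 - p) / (1.0 - tau)
        seq.append(p)
    return seq

def init_population(n, lb, ub, p0=0.37):
    """Build n individuals, scaling tent values into [lb[j], ub[j]]."""
    dim = len(lb)
    chaos = tent_sequence(p0, n * dim)
    return [[lb[j] + chaos[i * dim + j] * (ub[j] - lb[j]) for j in range(dim)]
            for i in range(n)]
```

For the LSSVM problem discussed later, a call such as `init_population(20, [0.1, 0.01], [1000.0, 100.0])` would produce 20 candidate (σ, γ) pairs inside the stated search ranges.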
(2) Adaptive t-distribution. The probability density function of the t-distribution with n degrees of freedom is given by [19]:
$$p(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$$
When n = 1, p(n) ∼ C(0, 1), which is the Cauchy distribution. As n increases, the t-distribution approaches normality: when n → ∞, p(n) → N(0, 1), which is the Gaussian distribution. This paper uses the adaptive t-distribution and its mutation operator λ to update the potential regions of the optimal solution determined by the exploitation strategy, which improves the algorithm's global search ability. The degree-of-freedom parameter n is set to the current iteration number t, and the formula for updating the position is as follows: where X′_t is the updated individual and λ = 1 − t/(T − 1). As the number of iterations increases, the effect of the mutation becomes weaker.
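The two limiting cases can be checked numerically. The following sketch evaluates the standard Student t density in log space (to avoid overflow of the gamma function for large n) and compares it with the Cauchy and standard normal densities; the function names are illustrative.

```python
import math

def t_pdf(x, n):
    """Density of the Student t-distribution with n degrees of freedom."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(x * x / n))

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution C(0, 1)."""
    return 1.0 / (math.pi * (1.0 + x * x))

def normal_pdf(x):
    """Density of the standard normal distribution N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
```

Early iterations (small n) therefore draw heavy-tailed perturbations that favor global search, while late iterations (large n) draw nearly Gaussian perturbations that favor local refinement.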
(3) Opposites learning mechanism. During the algorithm's exploration stage, the opposites learning mechanism [20] is used to avoid becoming trapped in local optima. After each iteration, the best and worst solutions are selected and their corresponding opposite solutions are calculated. The better solution of each pair is then retained. The opposite solution X^op_t is obtained from the following formula: In this paper, θ = 1, so the above formula reduces to standard opposites learning.
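For the standard case (θ = 1), opposition-based learning reflects a solution about the center of the search interval, that is, x_op = LB + UB − x in each dimension. The sketch below shows this standard form only; the function names are hypothetical and fitness is assumed to be minimized.

```python
def opposite(x, lb, ub):
    """Standard opposite point: reflect x about the center of [lb, ub]."""
    return [lb[j] + ub[j] - x[j] for j in range(len(x))]

def keep_better(x, lb, ub, fitness):
    """Return the opposite point if it has lower fitness, otherwise x itself."""
    x_op = opposite(x, lb, ub)
    return x_op if fitness(x_op) < fitness(x) else x
```

For instance, with bounds [0, 10] a solution at 1 has its opposite at 9, so a poor solution near one boundary gets a single cheap probe near the other.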
(4) Quick search strategy. After implementing the aforementioned strategies, the algorithm's optimization ability was enhanced, albeit at a slight cost to its speed. Consequently, this paper employs the following formula to update the exploration stage, with the aim of bringing the entire population closer to the optimal solution and improving the algorithm's efficiency.

ISAO Optimization Ability Test
To test the effectiveness of ISAO, this paper selected several test functions from the CEC2017 standard function test library for optimization ability testing. Table 1 displays the expressions of the test functions and their corresponding parameters. ISAO was compared with commonly used optimizers such as AO [21], GSA [22], GWO [23], WOA [24], AVOA [25], and DBO [26]. Each function was tested 30 times, and the results were averaged. Table 2 shows that ISAO outperforms SAO and the other algorithms on multiple test functions, achieving a significantly lower optimal value and demonstrating a better ability to escape local optima. Figure 1 displays the fitness values of the different algorithms over the iterations. Although ISAO is not the fastest of all the optimizers, it requires fewer iterations to converge than SAO, as seen in graphs (a), (c), and (d). In general, the ISAO algorithm has advantages in terms of convergence speed and optimization ability.

Mean Impact Value
The selection of input features has a direct impact on the accuracy and computational efficiency of the model. Currently, the Mean Impact Value (MIV) algorithm is considered one of the best methods for data dimensionality reduction when combined with neural networks [27][28][29]. The MIV algorithm measures the importance of each independent variable to the dependent variable by comparing the absolute values of the MIV of the features. The sign of the MIV indicates a positive or negative correlation between the independent and dependent variables.
Since the MIV algorithm needs to be combined with a neural network to realize feature screening, a back propagation (BP) neural network was selected in this paper. First, a BP model was constructed using the training samples. Second, each feature in the training samples was increased and decreased by 10% of its original value to form two new training samples. Finally, the BP model was used to predict the outputs for the two new samples. The MIV of the feature for the output was obtained as the arithmetic mean of the difference between the two results, which reflects the influence of the feature on the output after a proportional increase or decrease (Figure 2).
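The perturbation procedure can be sketched as follows. This is a minimal illustration that treats the trained network as an arbitrary callable; `toy_model` is a hypothetical linear stand-in for the BP model, used only so that the result can be checked by hand.

```python
def miv(model, samples, feature, delta=0.10):
    """Mean impact value of one feature: the average difference between the
    model output with the feature raised by delta and lowered by delta."""
    diffs = []
    for row in samples:
        up, down = list(row), list(row)
        up[feature] = row[feature] * (1.0 + delta)
        down[feature] = row[feature] * (1.0 - delta)
        diffs.append(model(up) - model(down))
    return sum(diffs) / len(diffs)

def toy_model(row):
    """Hypothetical stand-in for a trained BP network."""
    return 3.0 * row[0] - 0.5 * row[1]

samples = [[1.0, 4.0], [2.0, 2.0], [3.0, 6.0]]
miv_first = miv(toy_model, samples, 0)   # positive: output rises with feature 0
miv_second = miv(toy_model, samples, 1)  # negative: output falls with feature 1
```

For this stand-in, the first feature has the larger absolute MIV and a positive sign, while the second has a smaller absolute MIV and a negative sign, which mirrors how the method ranks features and reads correlation direction.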

Least-Squares Support Vector Machine
Suykens et al. [30,31] developed the Least-Squares Support Vector Machine (LSSVM) by transforming the solution of the original quadratic programming problem into the solution of a set of linear equations. The mathematical theory is presented below.
For nonlinear regression samples {x_i, y_i} (i = 1, 2, 3, ..., n; x_i ∈ R^n), a mapping function φ(·) is introduced to map them to a high-dimensional space for linear regression. In this space, LSSVM is expressed as:
$$y(x) = \omega^T \varphi(x) + b$$
where ω is the weight vector and b is the offset.
The objective function and constraints are:
$$\min J(\omega, \zeta) = \frac{1}{2}\omega^T\omega + \frac{\gamma}{2}\sum_{i=1}^{n}\zeta_i^2 \quad \text{s.t.} \quad y_i = \omega^T\varphi(x_i) + b + \zeta_i, \quad i = 1, 2, \ldots, n$$
where ζ_i is the relaxation variable and γ is a regularization parameter.
Introducing the Lagrange multipliers α = [α_1, ..., α_n], we obtain:
$$L(\omega, b, \zeta, \alpha) = \frac{1}{2}\omega^T\omega + \frac{\gamma}{2}\sum_{i=1}^{n}\zeta_i^2 - \sum_{i=1}^{n}\alpha_i\left(\omega^T\varphi(x_i) + b + \zeta_i - y_i\right) \qquad (11)$$
The partial derivatives of Formula (11) are set to zero and solved, and the kernel function K(x_i, x_j) = φ(x_i)^T φ(x_j) is introduced. The regression function of the LSSVM model is then obtained as follows:
$$y(x) = \sum_{i=1}^{n}\alpha_i K(x, x_i) + b$$
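To make the "linear equations" concrete: setting the derivatives of the Lagrangian to zero and eliminating ω and ζ leaves one linear system in b and α, with first row [0, 1, ..., 1] and remaining rows [1, K + I/γ], and right-hand side [0, y]. The sketch below solves that system in pure Python. It is an illustration under stated assumptions, not the paper's code: the RBF kernel is assumed to use a 2σ² denominator, and `solve`, `lssvm_fit`, and `lssvm_predict` are hypothetical names.

```python
import math

def rbf(a, b, sigma):
    """RBF kernel, assuming the exp(-||a - b||^2 / (2 sigma^2)) convention."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def lssvm_fit(X, y, sigma, gamma):
    """Return (b, alpha) from the bordered system [[0, 1^T], [1, K + I/gamma]]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]

def lssvm_predict(x, X, b, alpha, sigma):
    """Evaluate the regression function: sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b
```

Two properties follow directly from the system and are useful checks: the multipliers sum to zero, and the training residual of sample i equals α_i/γ.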

MIV-ISAO-LSSVM
Ref. [32] investigated the effectiveness of different kernel functions when used with the LSSVM model for wind power prediction. The results indicated that the radial basis function (RBF) produced the lowest prediction errors. This paper therefore selected the RBF as the kernel function for the prediction model. The mathematical expression is as follows:
$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$
The prediction model based on the RBF kernel function requires the optimization of two hyperparameters: the kernel function parameter σ and the regularization parameter γ. The search ranges for σ and γ are set to [0.1, 1000] and [0.01, 100], respectively.
To evaluate the model's performance comprehensively, this paper adopted the average Relative Root Mean Squared Error (aRRMSE) [33,34] as the fitness function of the optimizers. For each target output, aRRMSE compares the squared error of the model's predictions with the squared deviation of the actual values from the test-set sample mean, and then averages the resulting ratios over all targets. The smaller it is, the better the performance of the model on the test set. The formula for calculating aRRMSE is as follows:
$$aRRMSE = \frac{1}{q}\sum_{j=1}^{q}\sqrt{\frac{\sum_{i=1}^{N}\left(y_i^{(j)} - \hat{y}_i^{(j)}\right)^2}{\sum_{i=1}^{N}\left(y_i^{(j)} - \bar{y}^{(j)}\right)^2}}$$
where q is the number of output targets of the model and N is the number of samples. The variable y_i^{(j)} is the actual value of target j for test set sample i, \hat{y}_i^{(j)} is the predicted value of target j for test set sample i, and \bar{y}^{(j)} is the average of the actual values of target j over all samples in the test set.
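A short sketch of the metric helps to pin down its scale. It assumes the usual multi-target definition of aRRMSE, with `y_true` and `y_pred` given as lists of rows (one row per sample, one column per target); the function name is illustrative.

```python
import math

def arrmse(y_true, y_pred):
    """Average relative RMSE over all targets (columns)."""
    n, q = len(y_true), len(y_true[0])
    total = 0.0
    for j in range(q):
        mean_j = sum(row[j] for row in y_true) / n
        num = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n))
        den = sum((y_true[i][j] - mean_j) ** 2 for i in range(n))
        total += math.sqrt(num / den)
    return total / q
```

A perfect prediction scores 0, and a model that always predicts the test-set mean of each target scores exactly 1, so values below 1 indicate an improvement over that trivial baseline.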
After determining the optimization parameters and fitness function, the ISAO algorithm was used to establish the LSSVM prediction model, as shown in Figure 3.
The model operates as follows: (1) Clean the dataset and select features; (2) Specify the population size, set the maximum number of iterations, and generate the initial population using the tent map;
The study is based on the operating data of a 660 MW ultra-supercritical coal-fired unit in Xinjiang, China, for the whole month of July 2022. The data collection interval is 5 min and the load range is 25-97% (the highest load is 640.77 MW and the lowest load is 164 MW). The dataset contains 166 features and a total of 8929 sample points, some of which are shown in Table 3. It should be noted that the data presented in the table are a simplified version of the actual data collected. For instance, the table shows only one measuring point for the steam temperature before the reheater desuperheater, whereas there are actually six measuring points; the value in the table is the arithmetic average of these six points.
According to the theory of fuel specific consumption [35,36], let the output power of the unit be P and the amount of fuel consumed be B, with specific exergies e_p and e_f, respectively, and let the sum of the irreversible losses in converting chemical energy to electrical energy be ∑ I_i. We then obtain:
$$B \cdot e_f = P \cdot e_p + \sum I_i$$
Dividing both sides by P • e_f gives the unit consumption analysis model of power generation:
$$b = b_{min} + \sum b_i$$
where b = B/P is the actual fuel consumption of the unit per unit of power generated; e_p is the specific exergy of electrical energy, 1 kW•h/(kW•h); e_f is the specific exergy of standard coal, 7000 kcal/kg; and b_min = e_p/e_f is the theoretical minimum unit consumption of power generation, 123 g/(kW•h). This means that without any energy loss, 123 g of standard coal can produce 1 kW•h of electricity. b_i = I_i/(P • e_f) is the additional energy consumption of the unit corresponding to the irreversible loss of the i-th link or piece of equipment.
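The 123 g/(kW•h) figure can be reproduced with a short calculation. The sketch below assumes the conversion factors 1 kW•h = 3600 kJ and 1 kcal = 4.1868 kJ, and takes the specific exergy of standard coal as 7000 kcal/kg; the function names are hypothetical.

```python
KJ_PER_KWH = 3600.0    # 1 kWh expressed in kJ
KJ_PER_KCAL = 4.1868   # assumed kcal-to-kJ conversion factor
E_F = 7000.0 * KJ_PER_KCAL  # specific exergy of standard coal, kJ/kg

def theoretical_min_consumption():
    """b_min = e_p / e_f, returned in g/(kWh)."""
    return 1000.0 * KJ_PER_KWH / E_F

def additional_consumption(loss_kj, power_kwh):
    """b_i = I_i / (P * e_f), returned in g/(kWh)."""
    return 1000.0 * loss_kj / (power_kwh * E_F)
```

Here `theoretical_min_consumption()` evaluates to roughly 122.8, which rounds to the 123 g/(kW•h) quoted above; each kJ of irreversible loss per kW•h generated adds about 0.034 g/(kW•h) on top of that.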
Using the above theory, we calculated the irreversible loss of each piece of boiler equipment, such as the economizer, water wall, low temperature superheater, platen superheater, final superheater, low temperature reheater, final reheater, and air preheater. Then, we calculated the energy consumption of each piece of equipment and summed them to obtain the overall boiler energy consumption. Some of the calculation results are shown in Table 4. Using the water wall as an example, the calculation results include both the irreversible loss of heat transfer and the irreversible loss of chemical energy from coal converted into heat energy.
Figure 4 presents a schematic diagram of the energy consumption of the boiler system at different loads. The average energy consumption of the unit, based on 872 sample points with over 95% load, was approximately 142.72 g/(kW•h). When the load was below 40%, the energy consumption began to rise significantly. The load dropped below 30% for 44 sample points, and the average energy consumption was about 178.28 g/(kW•h), which was an increase of 29.56 g/(kW•h) compared to 95% load or above, representing an increase of about 20.7%.
Additionally, 141 sample points within the load range of 626-628 MW were selected, that is, with the unit operating close to 95% load. Within this range, the boiler system's minimum energy consumption was 140.21 g/(kW•h) and the maximum was 149.86 g/(kW•h), a difference of 9.65 g/(kW•h). These results indicate a significant variation in energy consumption for the target unit's boiler system even at nearly constant load, highlighting a considerable potential for energy savings.

Removing Outliers
The accuracy of the measuring instruments and methods affects the unit's data and can introduce errors. These errors can increase model inaccuracies and weaken the effectiveness of data mining. Therefore, it is crucial to clean abnormal data before establishing an energy consumption prediction model. For instance, it was discovered that the coal feed measurement point of a coal mill did not read 0 when the mill was shut down, but instead showed a small value. Such clearly erroneous data were manually set to 0. In addition, taking the unit operating at 606-607 MW as an example, there are 730 samples in this range. Figure 5 shows the distribution of the energy consumption of the boiler system under this condition. The energy consumption of the boiler system differed because of differences in operating state and regulation mode, despite a similar unit load. However, it can be assumed that the energy consumption of the boiler system follows a normal distribution under the same working conditions, with particularly high or low energy consumption being rare.
Based on the analysis above, the 3-sigma criterion was selected for outlier detection in this paper. First, the dataset was sorted in descending order of load and divided into intervals of 50 samples. Then, the mean e and standard deviation µ of each interval were calculated, and the samples outside the range (e − 3µ, e + 3µ) were recorded as outliers. Finally, to ensure the continuity and coherence of the dataset, the Lagrange interpolation method was used to calculate a new value based on the two samples preceding and the two samples following the outlier, and the outlier was replaced with this new value. Taking the main steam flow rate and main steam pressure as examples, their distribution with unit load before and after data processing is shown in Figure 6. After the outliers were cleared, the distribution of the data was more stable and concentrated, which is more in line with the actual trend.
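The cleaning step can be sketched as below. This is an illustration under stated assumptions: the input series is assumed to be already sorted by load, the population standard deviation is used within each window of 50 samples, and neighbors that are themselves flagged are skipped when interpolating. The function names are hypothetical.

```python
import statistics

def flag_outliers(values, window=50):
    """Mark samples lying more than 3 standard deviations from their window mean."""
    flags = [False] * len(values)
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        if len(chunk) < 2:
            continue
        mean, std = statistics.fmean(chunk), statistics.pstdev(chunk)
        for k, v in enumerate(chunk):
            flags[start + k] = abs(v - mean) > 3.0 * std
    return flags

def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def repair(values, window=50):
    """Replace each outlier by interpolating its two neighbors on each side."""
    flags = flag_outliers(values, window)
    out = list(values)
    for i, bad in enumerate(flags):
        if bad:
            idx = [k for k in (i - 2, i - 1, i + 1, i + 2)
                   if 0 <= k < len(values) and not flags[k]]
            if len(idx) >= 2:
                out[i] = lagrange(idx, [values[k] for k in idx], i)
    return out
```

On a smoothly rising series with one spike, the spike is the only sample flagged and is replaced by a value on the underlying trend, while every other sample is left unchanged.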

Feature Selection

Data Standardization
There are various types of measurement data for thermal power units, such as pressure, temperature, flow rate, and other parameters. To eliminate the influence of different data dimensions and numerical magnitudes before establishing the prediction model, this paper used extreme value (min-max) standardization to process the dataset. The formula for this process is given below:
$$x'_{ij} = \frac{x_{ij} - x_{ij}^{min}}{x_{ij}^{max} - x_{ij}^{min}}$$
where x′_ij is the standardized data; x_ij is the value of the j-th characteristic variable of the i-th sample; x_ij^min is the minimum value of the j-th characteristic variable; and x_ij^max is the maximum value of the j-th characteristic variable.
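A minimal sketch of min-max standardization for one feature column follows. The function name is illustrative, and the handling of a constant column (mapped to zeros) is an assumption added to avoid division by zero, not something specified in the paper.

```python
def min_max_scale(column):
    """Scale one feature column to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # assumed convention for a constant column
    return [(v - lo) / (hi - lo) for v in column]
```

For example, `min_max_scale([10.0, 15.0, 20.0])` gives `[0.0, 0.5, 1.0]`, so a pressure in MPa and a temperature in degrees Celsius end up on the same scale.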

Feature Selection Result
The MIV algorithm can be used to obtain the MIV of each input feature for each output. However, since this paper studies a multi-input and multi-output prediction model, the MIVs of each feature for all outputs are summed to obtain the MIV of the feature for the entire output. To ensure the objectivity of the results, 10-fold cross-validation was used for the calculation, and the results were averaged.
As shown in Table 5, the features were ranked by the absolute value of their MIV. For each feature, we then calculated the cumulative sum of the MIVs from the first feature to the current feature as a percentage of the sum of all features' MIVs, that is, the degree of correlation of all variables from the first feature to the current feature with the model output. To keep the model accurate, a threshold of 90% was selected, which reduced the number of features in the filtered dataset by about 55%, from 166 to 74. The features of a coal-fired unit are strongly correlated with one another. To simplify the dataset further, we used the Spearman correlation to screen the 74 retained features a second time. The threshold was set at 0.95, and redundant features were removed, resulting in a final dataset of 26 features, which are listed in Table 6.
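The two screening stages can be sketched as follows. This is a minimal illustration with hypothetical function names; in particular, the rule for deciding which member of a highly correlated pair to drop (here, the one ranked lower by MIV) is an assumption, since the text does not specify it.

```python
import math

def select_by_cumulative_miv(miv, threshold=0.90):
    """Keep top-ranked features until their |MIV| share reaches the threshold."""
    order = sorted(miv, key=lambda name: abs(miv[name]), reverse=True)
    total = sum(abs(v) for v in miv.values())
    kept, running = [], 0.0
    for name in order:
        kept.append(name)
        running += abs(miv[name])
        if running / total >= threshold:
            break
    return kept

def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r, i = [0.0] * len(values), 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)

def drop_redundant(columns, ordered_names, limit=0.95):
    """Walk features in MIV order; skip any too correlated with a kept one."""
    kept = []
    for name in ordered_names:
        if all(abs(spearman(columns[name], columns[k])) < limit for k in kept):
            kept.append(name)
    return kept
```

In the study, the first stage corresponds to the reduction from 166 to 74 features and the second to the reduction from 74 to 26.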

Model Prediction Results
The multi-objective prediction model MIV-ISAO-LSSVM was established by randomly dividing the data into training and test sets in an 8:2 ratio. In addition to the aRRMSE mentioned earlier, the Mean Absolute Error (MAE) was used to evaluate the model's performance for each output [33]; it represents the average absolute error between the predicted and true values of the test set. The formula for calculating MAE is as follows:
$$MAE_j = \frac{1}{N}\sum_{i=1}^{N}\left|y_i^{(j)} - \hat{y}_i^{(j)}\right|$$
where N is the number of samples, y_i^{(j)} is the actual value of target j for test set sample i, and \hat{y}_i^{(j)} is the predicted value of target j for test set sample i.
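Because MAE is reported separately for each output, a per-target helper is convenient. The sketch below uses the same row-per-sample layout as the metric defined above; the function name is illustrative.

```python
def mae_per_target(y_true, y_pred):
    """Mean absolute error of each target (column), returned as a list."""
    n, q = len(y_true), len(y_true[0])
    return [sum(abs(y_true[i][j] - y_pred[i][j]) for i in range(n)) / n
            for j in range(q)]
```

Unlike aRRMSE, each entry keeps the unit of its target, so the values for different pieces of equipment are directly interpretable in g/(kW•h) but are not comparable across targets of different magnitude.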
To verify the effect of the LSSVM prediction model optimized by the ISAO algorithm, we established LSSVM, BP, and ELM prediction models based on the same training and test sets. For LSSVM, we set the hyperparameters σ and γ to 30 and 50, respectively, while BP and ELM each had a hidden layer containing seven nodes. We also used the optimization algorithms PSO, SSA, and WOA to optimize the LSSVM model. These three algorithms and the ISAO algorithm were all run with T = 50 and N = 20.
Table 7 shows the evaluation results of the prediction models. The aRRMSE of the ISAO-LSSVM model is reduced by 2.13%, 18.12%, and 40.3% compared to the LSSVM, BP, and ELM prediction models, respectively. This suggests that an optimization algorithm can be more effective at determining hyperparameters that match the actual conditions of the research unit, yielding better prediction results than manually set hyperparameters. A comparison of the LSSVM prediction model before and after optimization shows that the optimized model has a higher MAE for air preheater energy consumption, which indicates a larger deviation in its prediction of that output. This is because the optimization algorithm aims to improve the model's overall performance rather than any single output index.
The LSSVM prediction models optimized by the four algorithms performed the same on the evaluation indices. The hyperparameters determined by the different optimization algorithms are essentially the same: the values of σ and γ are 37.7441 and 13.1996 for ISAO, 37.8332 and 13.2014 for PSO, 37.7441 and 13.1996 for SSA, and 37.7347 and 13.1998 for WOA. For the research problem addressed in this paper, the choice of optimization algorithm therefore has minimal impact on the effectiveness of the prediction model. However, the ISAO algorithm requires the fewest iterations to find the optimal value, which can effectively reduce computing costs.

Conclusions
This paper proposes a boiler system energy consumption prediction model based on the LSSVM algorithm to help analyze the change in energy consumption of ultra-supercritical coal-fired units in the Xinjiang region under flexible peaking demand. The SAO algorithm was improved to enhance the prediction ability of the original LSSVM algorithm. In addition, the MIV algorithm was introduced to optimize the model input while keeping the input information comprehensive. Together, these steps yield an accurate prediction of the boiler system's energy consumption. The improved algorithm and model were verified using historical operation data from the target unit. The main conclusions are as follows:
(1) Using the specific consumption analysis method, we calculated and analyzed the energy consumption distribution of the boiler system in the target unit based on field measurement data under variable load conditions. The results indicate that when the load of the target unit is reduced to less than 30%, the energy consumption of the boiler system increases by approximately 20.7% compared to its consumption under 95% load operation.
(2) The improvement strategy proposed in this study gives the ISAO algorithm a faster convergence speed than the other optimization algorithms. Although the LSSVM prediction models obtained with the different optimization algorithms perform similarly, the ISAO algorithm can efficiently and accurately obtain the LSSVM hyperparameters for boiler system energy consumption prediction.
(3) A MIV-ISAO-LSSVM model was developed to predict the energy consumption of ultra-supercritical coal-fired boilers under various load conditions. Feature screening with the MIV algorithm, followed by Spearman correlation filtering, reduces the number of dataset features from 166 to 26. This greatly simplifies the model and identifies the main factors that affect the energy consumption of the boiler system. The hyperparameters of the LSSVM model are obtained through the ISAO optimization algorithm. The model demonstrated superior accuracy, reliability, and applicability compared to the other models.
The modelling approach in this paper can be applied to other boilers, but because different boilers have different structures and equipment, the inputs and outputs of the model will differ from those used here. The historical data used to train the model for predicting boiler system energy consumption should cover the full range of working conditions as far as possible. The number of samples should also be maximized to improve the accuracy and applicability of the model across all working conditions. In the future, we aim to integrate the prediction model presented in this paper with boiler control to offer parameter guidance for optimizing boiler operation and achieving low-energy operation without compromising the boiler's normal output.

Figure 1. Fitness values of different algorithms. Panels (a-f) display the fitness value iteration curves of different optimization algorithms applied to the test functions F3, F8, F13, F18, F23, and F29, respectively.


(3) Compare the fitness values of all individuals in the population, determine the three individuals with the smallest fitness values, and calculate the average of all individuals in the population X;
(4) Perform population iteration. The exploitation stage follows Formulas (2) and (6), while the exploration stage follows Formulas (3) and (8). The fitness values of both old and new individuals are compared, and only those with lower fitness values are kept;
(5) During the later stage of the algorithm, Formula (7) is used to generate opposing individuals, and their fitness values are also compared;
(6) Upon completion of the final iteration, the optimal individual's σ and γ values are extracted and used to train the LSSVM, resulting in an LSSVM prediction model that has been optimized by the ISAO algorithm.
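The bookkeeping in steps (3) to (6) can be sketched as below. Only the elite selection, the population mean, the greedy old-versus-new comparison, and the final extraction are taken from the steps above. The position updates of Formulas (2), (3), (6), (7), and (8) are not reproduced here, so the sketch accepts a user-supplied `propose` callable, and `random_walk_proposal` is a purely hypothetical stand-in that is not the ISAO update. All names are assumptions made for illustration.

```python
import numpy as np

def greedy_search(fitness, propose, pop, n_iter, rng):
    """Population loop with elite tracking and greedy replacement (lower fitness wins)."""
    pop = np.array(pop, dtype=float)
    fit = np.array([fitness(x) for x in pop])
    for t in range(n_iter):
        order = np.argsort(fit)
        elites = pop[order[:3]]        # step (3): three individuals with smallest fitness
        centroid = pop.mean(axis=0)    # step (3): average of all individuals
        # steps (4)/(5): candidate generation is delegated to `propose` (placeholder)
        cand = propose(pop, elites, centroid, t, rng)
        cand_fit = np.array([fitness(x) for x in cand])
        better = cand_fit < fit        # keep whichever of old/new has lower fitness
        pop = np.where(better[:, None], cand, pop)
        fit = np.where(better, cand_fit, fit)
    best = int(np.argmin(fit))         # step (6): extract the optimal individual
    return pop[best], float(fit[best])

def random_walk_proposal(pop, elites, centroid, t, rng):
    """Hypothetical stand-in update: pull toward the best elite and add Gaussian noise."""
    step = rng.uniform(0.0, 1.0, size=pop.shape) * (elites[0] - pop)
    return pop + step + 0.1 * rng.standard_normal(pop.shape)
```

For hyperparameter tuning, each individual would be a pair (σ, γ) and `fitness` would train an LSSVM with those values and return its prediction error; the best individual returned at the end is then used to train the final model.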

Figure 4. Coal consumption change chart of the unit boiler system.


Figure 5. Probability distribution of boiler energy consumption.

Figure 6. Before and after data processing.


Table 2. Comparison of different optimizers.
displays the specific flow chart.

Table 3. The original dataset.

Table 4. Energy consumption of boiler system equipment.

Table 5. MIV values for each feature.

Table 6. Model input features after filtering.

Table 7. Evaluation results of different algorithms.

Author Contributions: Conceptualization, X.M.; software, J.Z.; validation, X.Z.; resources, Z.C.; writing-original draft preparation, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding: Major Science and Technology Special Project of Xinjiang Uygur Autonomous Region, grant number 2022A01002-2; Xinjiang Uygur Autonomous Region Tianshan Talent Training Plan, grant number 2022TSYCCX0054; Key Research and Development Task Special Project of Xinjiang Uygur Autonomous Region, grant number 2022B03028-5; Xinjiang Uygur Autonomous Region Tianshan Talent Training Plan, grant number 2022TSYCJC0031; and China College Students' Innovative Entrepreneurial Training Plan Program, grant number 202310755014.