Interpretable Optimized Extreme Gradient Boosting for Prediction of Higher Heating Value from Elemental Composition of Coal Resource to Energy Conversion

García-Nieto, Paulino José; García-Gonzalo, Esperanza; Paredes-Sánchez, José Pablo; Menéndez-García, Luis Alfonso

doi:10.3390/bdcc10040112

by Paulino José García-Nieto ¹,*, Esperanza García-Gonzalo ¹, José Pablo Paredes-Sánchez ² and Luis Alfonso Menéndez-García ¹

¹ Department of Mathematics, Faculty of Sciences, University of Oviedo, C/Leopoldo Calvo Sotelo, 18, 33007 Oviedo, Spain
² Department of Energy, Polytechnic School of Engineering of Gijón, University of Oviedo, C/Luis Ortiz Berrocal, Campus de Gijón, 33203 Gijón, Spain
* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2026, 10(4), 112; https://doi.org/10.3390/bdcc10040112

Submission received: 15 February 2026 / Revised: 22 March 2026 / Accepted: 30 March 2026 / Published: 7 April 2026

(This article belongs to the Special Issue Smart Manufacturing in the AI Era)

Abstract

The higher heating value (HHV), also known as the gross calorific value, is a key metric for assessing a fuel’s primary energy potential in energy production systems. In this study, a machine learning model that combines extreme gradient boosting (XGBoost) with the differential evolution (DE) optimizer was developed to predict the HHV (dependent variable). The input variables were the constituents of the coal’s ultimate analysis: carbon (C), oxygen (O), hydrogen (H), nitrogen (N), and sulfur (S). For comparison, random forest regression (RFR), the M5 model tree, multivariate linear regression (MLR), and previously reported empirical correlations were also applied to the experimental dataset. The XGBoost approach produced the most accurate predictions. An initial XGBoost analysis was carried out to identify the relative contribution of each input variable to the coal HHV prediction. On the experimental samples, the XGBoost regression achieved a correlation coefficient of 0.9858 and a coefficient of determination of 0.9691. The close agreement between observed and predicted values shows that the DE/XGBoost-based approximation performed satisfactorily. Finally, the main conclusions of the investigation are summarized.

Keywords:

higher heating value (HHV); extreme gradient boosting (XGBoost); random forest regression (RFR); M5 model tree; multivariate linear regression (MLR); differential evolution (DE); ultimate analysis

MSC:

62J12; 62G08; 62J02; 68T07

1. Introduction

Coal is a carbonaceous sedimentary rock formed from the gradual transformation of plant matter under geological pressure and temperature conditions over millions of years. During this process, organic residues accumulate in oxygen-deficient environments such as swamps and wetlands, where decomposition is limited. Progressive burial and compaction lead to physical and chemical transformations that increase the carbon concentration and energy density of the material, eventually producing coal as a combustible fossil fuel [1,2].

As coalification progresses, plant material evolves through several stages (see Figure 1), including peat, lignite, bituminous coal, and anthracite, each characterized by increasing carbon content and calorific value. These stages reflect progressive loss of moisture and volatile components while the fixed carbon fraction increases. Consequently, the physicochemical properties of coal, including its heating value and suitability for energy applications, depend strongly on its elemental composition and coalification degree [3,4].

Coal classification and characterization commonly rely on parameters such as moisture content, ash content, volatile matter, and calorific value. In particular, ultimate analysis determines the weight percentages of the main elemental constituents of coal—carbon (C), hydrogen (H), oxygen (O), nitrogen (N), and sulfur (S)—which together describe the primary chemical structure of the fuel and strongly influence its energy performance.

Sustainable development, environmental preservation, and energy security are pressing global challenges. Long-term reliance on fossil fuels, especially coal, has exacerbated climate change, resource depletion, and the greenhouse effect. As a result, researchers worldwide are actively investigating efficient and environmentally friendly ways of using alternative fuels in the energy supply chain [3,4].

Coal plays a major role in the global energy supply. The widespread usage of this solid fossil fuel has been fueled by the growing need for power production and thermal energy uses; by 2030, coal consumption is expected to almost double [5]. Coal remains one of the most abundant and widely used fossil fuels worldwide and can be regarded as a long-term reservoir of stored solar energy. It contains intrinsic moisture and more than 50% organic content, mainly carbon [6].

Specifically, ultimate analysis quantifies the weight percentage of carbon, together with the weight percentages of hydrogen, sulfur, nitrogen, and oxygen. Furthermore, ultimate analysis accounts for trace elements that may be present in coal. Data obtained from the elemental analysis of coal are essential for understanding its combustion behavior, including its heating value and suitability for diverse applications.

Ultimate coal analysis is a valuable method for succinctly characterizing the primary organic elemental composition of coal. In this procedure, the combustion of a representative coal sample is used to determine the weight percentages of hydrogen, sulfur, carbon, and nitrogen. An analyzer system measures the total nitrogen, carbon, and hydrogen from the same sample, while the remaining values are used to calculate the total oxygen content. Figure 2 presents the standardized procedure for conducting ultimate analysis in accordance with established protocols.

In fact, Figure 2 depicts the main stages involved in the ultimate analysis of coal:

Coal sampling: After drying, grinding, and sieving, the coal sample is produced with uniformly small particles.
Laboratory test: To change the components of the coal sample into the appropriate oxides, the coal sample is burned.
Detection: The products of combustion are used to define the composition of the sample for analysis.
Data analysis: The elemental composition and possible applications are predicted using the analysis results.

The energy yield of a solid fuel is determined by its calorific value, often known as its higher heating value (HHV). Consequently, a variety of applications, such as fuel classification, evaluation of energy potential, assessment of productive use, and pricing in commodity markets, depend on the accurate determination of coal’s HHV [7]. Moreover, knowledge of the HHV is crucial for the proper design and operation of coal-based systems [8]. It is therefore desirable to develop and apply techniques that allow coal HHV to be estimated quickly and accurately, at a significantly lower cost than conventional laboratory measurements. With this goal, earlier studies have proposed empirically derived mathematical correlations for predicting coal HHV from the elemental components determined in the ultimate analysis [9,10,11,12,13,14].

In recent years, machine learning (ML) approaches have increasingly been applied to estimate the higher heating value (HHV) of coal from proximate or ultimate analysis data. Methods such as artificial neural networks, adaptive neuro-fuzzy inference systems, Gaussian process regression, and decision tree-based algorithms have shown promising predictive capability in comparison with traditional empirical correlations in the context of AI and energy-related regression modeling [15,16,17,18]. However, several limitations remain in the existing literature. Many studies focus primarily on predictive accuracy, while paying less attention to model interpretability, robustness to correlated predictors, and systematic hyperparameter optimization. In addition, the performance of some ML approaches may depend strongly on the specific dataset used, which may limit the generalizability of the models when applied to coal samples from different geological basins or coal ranks. Another common limitation is that several models operate as “black-box” predictors, providing limited insight into the relative contribution of elemental variables to the calorific value. Consequently, despite the progress achieved, there remains a need for predictive frameworks that combine high predictive performance with optimization strategies and interpretable analysis of feature importance.

Despite the increasing use of machine learning techniques for estimating the higher heating value (HHV) of coal, several limitations remain in the current literature. Many previous studies have focused on traditional models such as artificial neural networks, decision trees, Gaussian process regression, or hybrid metaheuristic approaches, often emphasizing predictive accuracy without providing a systematic framework for model optimization and interpretability. In particular, the application of extreme gradient boosting (XGBoost) to HHV prediction from ultimate analysis has received very limited attention, and the integration of advanced evolutionary optimization techniques for tuning its hyperparameters has not been sufficiently explored. Moreover, only a few studies have analyzed the relative importance of the elemental composition variables using explainable artificial intelligence tools, which are necessary to better understand the physical relevance of the predictors involved in energy conversion processes.

In this context, ensemble learning methods based on gradient boosting have recently attracted significant attention due to their strong predictive performance and ability to capture nonlinear interactions between variables. Among them, extreme gradient boosting (XGBoost) has proven particularly effective in various regression problems involving complex tabular datasets. Nevertheless, its potential for predicting coal HHV from elemental composition remains insufficiently explored, especially when combined with evolutionary optimization strategies capable of systematically tuning its hyperparameters. Furthermore, integrating explainable artificial intelligence techniques with such models can provide valuable insight into the physical relevance of the elemental predictors involved in fuel energy characterization. The ability of XGBoost to predict the HHV of coal across different coal types, deposits, and geographical regions has not yet been investigated. The comprehensive characterization of coal, commonly referred to as ultimate analysis, involves the precise determination of its elemental constituents.

This work addresses an application that has not been previously explored. Here, the XGBoost model [19,20,21,22,23,24,25] is employed, with its hyperparameters tuned by means of differential evolution (DE) [26,27,28,29,30,31,32,33], and the resulting model is then used for HHV estimation in coal samples from different deposits and geographical settings.

To predict the coal HHV output variable, the observed dataset was also modeled with random forest regression [34,35,36], M5 model trees [37,38,39], and multivariate linear regression [40,41]. The XGBoost methodology [19,20,21,22,23,24,25], a supervised learning method known for its robustness and its ability to handle nonlinear relationships, is especially well suited to regression problems.

In a number of domains, such as fault location in non-homogeneous multi-terminal direct current (MTDC) systems [42], building energy performance prediction [43], and predictive modeling of blood pressure during hemodialysis [44], extreme gradient boosting (XGBoost) has proven to be effective. Many factors highlight the advantages of the suggested XGBoost method [19,20,21,22,23,24,25]. (1) High predictive performance: XGBoost typically delivers highly accurate regression results by sequentially combining multiple decision trees, thereby correcting errors from previous iterations (boosting); (2) Effective handling of nonlinear relationships: It can capture complex and nonlinear interactions between explanatory variables and the target variable, which many linear models cannot achieve without additional transformations; (3) Integrated regularization to prevent overfitting: XGBoost incorporates L1 and L2 regularization on the trees, controlling model complexity and reducing overfitting, particularly in regressions involving numerous predictors; (4) Robust handling of missing values: The algorithm can automatically determine the optimal direction in a tree for null values without requiring prior imputation, simplifying data preprocessing; (5) High computational efficiency: XGBoost is optimized for speed and memory usage through parallelization, efficient tree pruning, and optimized data structures, making it suitable for large datasets; (6) Feature importance assessment: It provides metrics for feature importance, facilitating model interpretation and identifying which variables most strongly influence the prediction of the continuous variable; and (7) Flexibility in the loss function: It allows the definition of various objective functions for regression (for instance, the Huber loss function, mean squared error, and mean absolute error), enabling adaptation to different problem types and error distributions.

Several machine learning models for estimating coal HHV employing elemental analysis data collected from coal samples with various origins and locations are compared in this investigation. These approaches include the optimized DE/XGBoost-based model, the optimized DE/RFR-based model, the M5 model tree, and multivariate linear regression (MLR). Additionally, the study looks at how five input components—sulfur (S), hydrogen (H), oxygen (O), nitrogen (N), and carbon (C)—affect the accuracy of coal HHV as the objective variable.

By proposing an efficient and interpretable machine learning framework for HHV prediction based on the ultimate analysis of coal samples, the current work attempts to close this research gap. The novelty of this study lies in three main aspects. First, the hyperparameters of the XGBoost regression model are automatically optimized using the differential evolution (DE) algorithm, enabling a systematic search of the parameter space and improving predictive performance. Second, the proposed DE/XGBoost hybrid model is compared with several widely used approaches in HHV prediction, including random forest regression (RFR), M5 model trees, multivariate linear regression (MLR), and classical empirical correlations. Third, the interpretability of the predictive model is enhanced through the use of SHAP (SHapley Additive exPlanations) values, which allow the quantification and ranking of the influence of the elemental variables on the predicted HHV. These contributions offer a methodology for estimating the calorific value of coal from its elemental composition that is more precise, efficient, and interpretable.

The rest of the paper is structured as follows. First, the materials and methods used in this investigation are described. Next, the results are presented and discussed. Finally, the main conclusions are drawn.

2. Materials and Methods

Nowadays, the two main methods for estimating the heating value (HV) of fuels are calculations based on Dulong’s formulas and direct measurement using a bomb calorimeter [45]. The formula-based estimates are empirical models that rely on experimental data from proximate or ultimate analyses of solid fuels such as coal. To aid in describing the characteristics of fuel, a variety of models and mathematical expressions have been devised [46]. More advanced fuel models based on statistical machine learning techniques can also be employed in this area to further support energy research.
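To make the formula-based route concrete, the following is a minimal sketch of a Dulong-type estimate. The coefficients are a commonly quoted textbook variant (HHV in MJ/kg, elemental contents in wt%) and are an assumption here; they are not necessarily the empirical correlations evaluated in this article, and the sample composition is hypothetical.

```python
def dulong_hhv(c, h, o, s):
    """Estimate HHV in MJ/kg from elemental mass percentages with a Dulong-type formula.

    The coefficients are a commonly quoted textbook variant (assumption);
    they are not taken from this article.
    """
    return 0.3383 * c + 1.443 * (h - o / 8.0) + 0.0942 * s


# Hypothetical bituminous-like composition in wt%.
estimate = dulong_hhv(c=75.0, h=5.0, o=8.0, s=1.0)
print(f"Estimated HHV: {estimate:.2f} MJ/kg")
```

The term `h - o / 8.0` reflects the usual assumption that part of the hydrogen is already bound to oxygen and does not contribute to the heat released.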

2.1. Dataset for Experimentation

The dataset used in this study consists of experimental ultimate analyses of coal samples and their corresponding higher heating values (HHVs). These physicochemical variables form the basis of the dataset employed in the DE/XGBoost, DE-optimized RFR, M5 model tree, and MLR approaches.

Higher heating value (HHV), carbon (C) content, hydrogen (H) content, sulfur (S) content, oxygen (O) content, and nitrogen (N) content were the variables measured in the laboratory examinations of 318 coal samples. The samples cover different coal ranks, mainly sub-bituminous and bituminous coals, and represent a wide range of geological formations. The dataset, which includes a wide range of parent coals from different geographical origins reported in the worldwide scientific literature, is drawn from earlier investigations of coal as a solid fuel [47].

Based on the coal assessments, the main input parameters for this investigation were carefully chosen. One essential characteristic of coal-based energy systems is HHV. It is crucial to use sophisticated models within machine learning frameworks to assess energy performance and investigate feasible fuel substitutes. In the energy sector, these models play a crucial role in helping researchers characterize and manage solid fuels [48,49,50,51].

Before using a solid fuel in thermal applications, especially coal, it is crucial to determine its heating value (HV) [52]. HV measures how much heat can be obtained when a specific mass of fuel burns. However, because water in the fuel evaporates during combustion, part of this heat is retained as latent heat in the water vapor. The HHV represents the maximum energy that can be released during combustion, including the latent heat recovered when that water vapor condenses. An adiabatic bomb calorimeter is commonly employed for laboratory HHV measurement; however, this procedure is not always feasible and can be expensive [53]. Alternatively, the ultimate analysis of a solid fuel can provide valuable information on its chemical and physical properties. The weight percentages of H, O, N, C, and S can be determined using elemental analysis [54].

HHV is a crucial metric since it offers vital details about a fuel’s energy performance and efficiency. It is extensively used in transportation, industrial processes, heating, and power generation. These metrics are crucial for energy management that adheres to standard operating procedures in order to fully characterize the fuel behavior of coal. As a result, the model’s input variables were chosen as follows (see Table 1).

Ultimate analysis:

Oxygen (O): This is used in the elemental analysis to describe the coal’s oxygen concentration. It serves as a means of expressing the degree of coalification process progression.
Sulfur (S): Elemental analysis is employed to determine the sulfur content in coal. It illustrates how burning coal may contribute to the production of SO_x.
Nitrogen (N): This is the nitrogen component that makes up the atomic structure of coal. It shows that burning coal has the ability to produce NO_x.
Hydrogen (H): The hydrogen content of the coal is ascertained using elemental analysis. It illustrates the role of volatile components in coal.
Carbon (C): It shows the percentage of carbon that makes up the main atomic structure of the sample. It shows how the coalification process changes throughout time.

The five physicochemical predictors that were employed as input variables for the DE/XGBoost model under investigation are shown in Table 1. The target variable in this research is the HHV of coal, which is derived from test samples that represent different types of coal.

2.2. Mathematical Modeling Methods

2.2.1. Extreme Gradient Boosting (XGBoost) Regression Model

Extreme gradient boosting (XGBoost) is a boosting-based ML approach that has shown strong predictive performance in regression and classification tasks. In this article, we focus on the application of XGBoost to regression, a problem where the goal is to predict a continuous variable. Unlike other traditional ensemble approaches, the XGBoost model efficiently optimizes both predictive accuracy and computation time, thus achieving a balance between precision and speed [19,20,21,22,23,24,25].

The boosting algorithm is an approach in which several weak models (usually decision trees) are combined to form a strong model. In the case of regression, the goal is to minimize the loss function [19,20,21,22,23,24,25]:

$$L(\theta) = \sum_{i=1}^{N} L(y_{i}, \hat{y}_{i}) \tag{1}$$

where $y_{i}$ is the actual value of the response and $\hat{y}_{i}$ is the prediction of the model. The XGBoost process can be formulated as an iterative optimization algorithm, where in each iteration $t$, a model $h_{t}(x)$ is built that minimizes the residual loss function [19,20,21,22,23,24,25,42,43,44]:

$$F_{t}(x) = F_{t-1}(x) + \eta\, h_{t}(x) \tag{2}$$

where $\eta$ is the learning rate, $F_{t-1}(x)$ is the cumulative prediction at step $t-1$, and $h_{t}(x)$ is the model added at step $t$.
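The additive update of Equation (2) can be illustrated with a small pure-Python sketch. It assumes the squared-error loss, uses depth-1 trees (stumps) on a single feature as the weak models, and omits the regularization and second-order terms that XGBoost adds; the data values and the function names `fit_stump` and `boost` are hypothetical and serve only as an illustration.

```python
def fit_stump(x, r):
    """Fit a depth-1 regression tree (one threshold) to the residuals r by least squares."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2.0
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((ri - ml) ** 2 for ri in left) + sum((ri - mr) ** 2 for ri in right)
        if best is None or err < best[0]:
            best = (err, thr, ml, mr)
    _, thr, ml, mr = best
    return lambda xi: ml if xi <= thr else mr


def boost(x, y, n_rounds=50, eta=0.3):
    """Build F_t = F_{t-1} + eta * h_t, starting from the mean of y."""
    f0 = sum(y) / len(y)
    trees = []
    pred = [f0] * len(y)
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        trees.append(h)
        pred = [pi + eta * h(xi) for xi, pi in zip(x, pred)]
    return lambda xi: f0 + eta * sum(h(xi) for h in trees)


# Hypothetical carbon contents (wt%) and HHVs (MJ/kg).
x = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0]
y = [22.0, 25.0, 27.0, 30.0, 32.0, 35.0]
model = boost(x, y)
print([round(model(xi), 2) for xi in x])
```

A smaller `eta` makes each tree contribute less, so more rounds are needed to reach the same training error.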

In XGBoost, the base model is a decision tree. Given a set of features $x$, the decision tree performs partitions on the feature space to predict an output $\hat{y}$. Each tree is represented as a series of binary decisions that result in a prediction of the form [19,20,21,22,23,24,25,42,43,44]:

$$\hat{y} = \sum_{j=1}^{J} w_{j}\, I(x \in R_{j}) \tag{3}$$

where $R_{j}$ is a region of the feature space (a terminal node), $w_{j}$ is the prediction value at that node, and $I$ is an indicator function that takes the value 1 if $x$ belongs to $R_{j}$ and 0 otherwise.

The key to XGBoost’s success lies in its ability to optimize both the loss function and the complexity cost of trees, preventing overfitting. Instead of simply minimizing the loss function, XGBoost optimizes a regularized objective function composed of two terms [19,20,21,22,23,24,25,42,43,44]:

$$O(T) = \sum_{i=1}^{N} L(y_{i}, \hat{y}_{i}) + \Omega(f) \tag{4}$$

where $L$ is the loss function (commonly the mean squared error in regression), and $\Omega(f)$ is a regularization term that penalizes the complexity of the model [19,20,21,22,23,24,25]:

$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_{j}^{2} \tag{5}$$

This expression contains the following terms:

$T$ is the number of terminal nodes in the tree;
$\gamma$ is the regularization parameter that controls the complexity of the tree;
$\lambda$ is the regularization parameter that penalizes the weights of the terminal nodes.

The XGBoost optimization algorithm is based on a quadratic approximation of the loss function around the current prediction, allowing for efficient updating of the model parameters. At each step $t$, given a set of trees $\{h_{1}, h_{2}, \dots, h_{t-1}\}$, the prediction $F_{t}(x)$ is updated using a second-order approximation of the loss function [19,20,21,22,23,24,25]:

$$L_{t} \approx \sum_{i=1}^{N} \left[ L(y_{i}, F_{t-1}(x_{i})) + g_{i}\, \Delta F + \frac{1}{2} h_{i}\, \Delta F^{2} \right] \tag{6}$$

where $g_{i}$ and $h_{i}$ are the first and second derivatives of the loss function with respect to the prediction of the previous models, $F_{t-1}(x_{i})$, and $\Delta F$ is the change in the prediction that minimizes the error.
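As a worked step (a sketch, not reproduced from the article), the second-order expansion can be combined with the penalty of Equation (5) to obtain the leaf weight that the text gives as Equation (7). Here $g_{i}$ and $h_{i}$ denote the first and second derivatives of the loss with respect to the current prediction, $R_{j}$ is the set of samples in leaf $j$, every sample in leaf $j$ receives the same increment $\Delta F = w_{j}$, and the term that does not depend on the new tree is dropped. The symbols $G_{j}$, $H_{j}$ and $\tilde{L}_{t}$ are introduced here only for illustration.

```latex
\begin{aligned}
\tilde{L}_{t} &= \sum_{j=1}^{T} \Big[ G_{j} w_{j} + \tfrac{1}{2} (H_{j} + \lambda)\, w_{j}^{2} \Big] + \gamma T,
\qquad G_{j} = \sum_{i \in R_{j}} g_{i}, \quad H_{j} = \sum_{i \in R_{j}} h_{i}, \\
\frac{\partial \tilde{L}_{t}}{\partial w_{j}} &= G_{j} + (H_{j} + \lambda)\, w_{j} = 0
\;\;\Longrightarrow\;\; w_{j}^{*} = -\frac{G_{j}}{H_{j} + \lambda}, \\
\tilde{L}_{t}^{*} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_{j}^{2}}{H_{j} + \lambda} + \gamma T .
\end{aligned}
```

The last line is the quantity that can be compared across candidate partitions: a split is worthwhile only if it lowers $\tilde{L}_{t}^{*}$ by more than the cost $\gamma$ of the additional leaf.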

The optimization of the parameters of each tree $h_{t}(x)$ is performed using a feature-space partitioning algorithm. The solution consists of finding the partition $s_{j}$ at node $j$ that minimizes the regularized loss function. For a node $j$, the value $w_{j}$ is calculated as follows [19,20,21,22,23,24,25,42,43,44]:

$$w_{j} = -\frac{\sum_{i \in R_{j}} g_{i}}{\sum_{i \in R_{j}} h_{i} + \lambda} \tag{7}$$

where $R_{j}$ is the set of samples that fall on node $j$, $g_{i}$ is the gradient of the loss function, $h_{i}$ is its second derivative, and $\lambda$ is the regularization parameter.
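The leaf weight of Equation (7) and the resulting split criterion can be sketched in a few lines. The example assumes the loss $\frac{1}{2}(y - \hat{y})^{2}$, for which $g_{i} = \hat{y}_{i} - y_{i}$ and $h_{i} = 1$; the gain expression is the standard one for regularized gradient-boosted trees, and the data and function names are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Optimal leaf weight w_j = -sum(g) / (sum(h) + lambda), as in Equation (7)."""
    return -sum(g) / (sum(h) + lam)


def leaf_score(g, h, lam):
    """Objective reduction contributed by one leaf: 0.5 * G^2 / (H + lambda)."""
    return 0.5 * sum(g) ** 2 / (sum(h) + lam)


def split_gain(g, h, left_idx, lam, gamma):
    """Gain of splitting a node into two children, net of the per-leaf penalty gamma."""
    left = set(left_idx)
    g_left = [gi for i, gi in enumerate(g) if i in left]
    h_left = [hi for i, hi in enumerate(h) if i in left]
    g_right = [gi for i, gi in enumerate(g) if i not in left]
    h_right = [hi for i, hi in enumerate(h) if i not in left]
    return (leaf_score(g_left, h_left, lam)
            + leaf_score(g_right, h_right, lam)
            - leaf_score(g, h, lam)
            - gamma)


# Hypothetical targets and a constant current prediction of 28 MJ/kg.
y = [24.0, 25.0, 31.0, 32.0]
pred = [28.0] * len(y)
g = [p - t for p, t in zip(pred, y)]  # first derivative of 0.5 * (y - pred)^2
h = [1.0] * len(y)                    # second derivative of the same loss

print(leaf_weight(g[:2], h[:2], lam=1.0))            # weight of the left child
print(split_gain(g, h, [0, 1], lam=1.0, gamma=0.5))  # gain of splitting {0, 1} | {2, 3}
```

Larger values of `lam` shrink the leaf weights toward zero, and larger values of `gamma` make it harder for a split to show a positive gain.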

In summary, the XGBoost algorithm applied to regression is a highly efficient and flexible method that combines the power of decision trees with advanced optimization techniques to minimize the regularized loss function. XGBoost’s ability to handle large volumes of data and its robustness against overfitting make it a powerful tool in a variety of regression applications. Furthermore, second-order optimization and the regularization of tree complexity are key to its success in real-world scenarios. Figure 3 shows an illustration of an XGBoost regression process.

Additionally, the XGBoost method’s characteristic parameters are succinctly described as follows [19,20,21,22,23,24,25,42,43,44]:

Learning rate $(η)$ : It controls how much each new tree contributes to the final model, scaling the impact of the predictions that tree adds in each iteration of the boosting process. XGBoost builds trees sequentially, where each new tree corrects the errors of the previous ones. The learning rate determines how large that correction step is: a high learning rate produces large, rapid corrections, while a low learning rate produces smaller, more gradual corrections. This step-size shrinkage, applied at each update, helps prevent overfitting.
Maximum depth (max_depth): It sets the maximum depth that each decision tree built by XGBoost can reach, i.e., the maximum number of levels (or divisions) from the root node to a leaf.
Subsample (subsample ratio of the training instances): It indicates the fraction of observations from the training set that is randomly selected to train each decision tree within the boosting process. Setting it to 0.5 means that XGBoost randomly samples half of the training data before growing each tree, which helps prevent overfitting. Subsampling occurs once in every boosting iteration.
Colsample_bytree: This is the subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed. It specifies the fraction of variables (features) that are randomly selected to build each decision tree within the XGBoost model.
n_estimators: This indicates the total number of decision trees (estimators) that are built and sequentially added to form the final model.

It is therefore sensible to determine these hyperparameters with a systematic optimization method. The differential evolution (DE) optimizer, outlined below, performed effectively in this study.
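One way to connect these hyperparameters to a population-based optimizer is to encode each candidate as a point in the unit hypercube and decode it into keyword arguments. The sketch below assumes this encoding; the bounds are hypothetical (the ranges used by the authors are not stated in this section), and `decode` is an illustrative helper rather than part of any library. The keys match parameter names accepted by `xgboost.XGBRegressor`.

```python
# Hypothetical search bounds; the article does not state the ranges used here.
BOUNDS = {
    "learning_rate": (0.01, 0.5),
    "max_depth": (2, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "n_estimators": (50, 1000),
}
INTEGER_PARAMS = {"max_depth", "n_estimators"}


def decode(unit_vector):
    """Map a point in [0, 1]^5 (for example a DE candidate) to XGBoost-style keyword arguments."""
    params = {}
    for u, (name, (lo, hi)) in zip(unit_vector, BOUNDS.items()):
        u = min(max(u, 0.0), 1.0)  # clip candidates that left the unit cube
        value = lo + u * (hi - lo)
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params


print(decode([0.0, 0.5, 1.0, 0.5, 0.0]))
```

The decoded dictionary could then be passed to a regressor, for instance as `xgboost.XGBRegressor(**params)`, inside the objective function that the optimizer evaluates by cross-validation.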

2.2.2. Random Forest Regression (RFR)

Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$ be a measurable space of explanatory variables and let $\mathcal{Y} \subset \mathbb{R}$ be the response space. It is assumed that the observed data $D_{n} = \{(X_{i}, Y_{i})\}_{i=1}^{n}$ constitute an i.i.d. sample from an unknown joint distribution $P_{XY}$. The goal of regression is to estimate the true regression function [34,35,36]:

$$m(x) = E[Y \mid X = x] \tag{8}$$

minimizing the quadratic risk:

$$Q_{\mathrm{risk}}(m) = E\left[(Y - m(X))^{2}\right] \tag{9}$$

A regression tree induces a finite partition of the input space:

$$\mathcal{X} = \bigcup_{k=1}^{K} R_{k}, \qquad R_{k} \cap R_{k'} = \emptyset \ \text{ if } k \neq k' \tag{10}$$

where each $R_{k}$ is a region (cell) defined by axis-aligned recursive partitioning rules. The estimator associated with the tree $T$ is defined as follows [34,35,36]:

$$\hat{m}_{T}(x) = \sum_{k=1}^{K} \hat{c}_{k}\, I(x \in R_{k}) \tag{11}$$

where

$$\hat{c}_{k} = \frac{1}{|D_{k}|} \sum_{i : X_{i} \in R_{k}} Y_{i} \tag{12}$$

is the empirical mean of the observations that fall within the region $R_{k}$, and $|D_{k}|$ is the number of such observations. The tree construction is based on the local minimization of the empirical quadratic error [34,35,36]:

$$\sum_{k=1}^{K} \sum_{i : X_{i} \in R_{k}} (Y_{i} - \hat{c}_{k})^{2} \tag{13}$$
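The local minimization of Equation (13) amounts, at each node, to choosing the threshold that gives the smallest within-cell squared error. The following minimal sketch does this for a single feature; the data values and the function names `sse` and `best_split` are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (the leaf constant of Equation (12))."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) minimizing the within-cell squared error for one feature."""
    pairs = sorted(zip(x, y))
    best_thr, best_err = None, sse(list(y))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between identical feature values
        thr = (pairs[k - 1][0] + pairs[k][0]) / 2.0
        err = sse([yi for _, yi in pairs[:k]]) + sse([yi for _, yi in pairs[k:]])
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr, best_err


# Hypothetical carbon contents (wt%) and HHVs (MJ/kg) forming two clear groups.
carbon = [60.0, 62.0, 64.0, 80.0, 82.0, 84.0]
hhv = [23.0, 24.0, 25.0, 32.0, 33.0, 34.0]
print(best_split(carbon, hhv))
```

A full tree applies the same search recursively inside each resulting cell and over all features.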

From a statistical point of view, an individual regression tree presents the following:

Low bias, due to its high flexibility;
High variance, as a consequence of its strong dependence on sample disturbances.

Formally, for an estimator $\hat{m}_{T}(x)$, the variance $\mathrm{Var}[\hat{m}_{T}(x)]$ is high, which motivates the use of aggregation techniques. Let $B$ be the number of bootstrap replicates. For each $b \in \{1, \dots, B\}$, a bootstrap sample $D_{n}^{(b)} = \{(X_{i}^{(b)}, Y_{i}^{(b)})\}_{i=1}^{n}$ is generated by sampling with replacement from $D_{n}$. Then, a regression tree $T_{b}$ is fitted to each sample and the aggregate estimator is defined as follows [34,35,36]:

$$\hat{m}_{bag}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{m}_{T_{b}}(x) \tag{14}$$

Bagging reduces the variance of the estimator, which is given by the following equation [34,35,36]:

$$\mathrm{Var}(\hat{m}_{bag}(x)) = \frac{1}{B^{2}} \sum_{b, b'} \mathrm{Cov}\left(\hat{m}_{T_{b}}(x), \hat{m}_{T_{b'}}(x)\right) \tag{15}$$

Random forest introduces additional randomization in the process of building each tree, namely, a random selection of variables. At each node of the tree, instead of considering the complete set of variables $\{1, \dots, p\}$, a subset is randomly selected:

$$M \subset \{1, \dots, p\}, \qquad |M| = m_{try} \tag{16}$$

The best partition is chosen only from these variables. The random forest estimator for regression is defined as follows [34,35,36]:

$$\hat{m}_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{m}_{T_{b}^{RF}}(x) \tag{17}$$

where each $T_{b}^{RF}$ is a tree trained on a bootstrap sample and with a random selection of variables in each node.
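The two sources of randomness described above, bootstrap sampling and random variable selection, can be sketched with a deliberately small forest. To keep the example short, each tree is a single-split stump, so the random subset of variables is drawn once per tree (a full implementation redraws it at every node); the data and function names are hypothetical.

```python
import random


def fit_stump(rows, targets, features):
    """Best single split over the candidate features, by within-cell squared error."""
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, f, thr, ml, mr)
    if best is None:  # the sampled rows are identical on every candidate feature
        mean = sum(targets) / len(targets)
        return lambda r: mean
    _, f, thr, ml, mr = best
    return lambda r: ml if r[f] <= thr else mr


def random_forest(rows, targets, n_trees=25, mtry=2, seed=0):
    """Average n_trees stumps, each fitted to a bootstrap sample with mtry random features."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        feats = rng.sample(range(p), mtry)          # random subset M with |M| = mtry
        trees.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats))
    return lambda r: sum(t(r) for t in trees) / len(trees)


# Hypothetical (carbon wt%, hydrogen wt%) pairs and HHVs in MJ/kg.
rows = [(60.0, 4.0), (65.0, 4.5), (70.0, 5.0), (75.0, 5.2), (80.0, 5.5), (85.0, 5.8)]
targets = [23.0, 25.0, 27.0, 30.0, 32.0, 35.0]
forest = random_forest(rows, targets)
print([round(forest(r), 2) for r in rows])
```

Because every prediction is an average of leaf means, the forest output always lies within the range of the training targets.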

From a probabilistic perspective, random forest can be interpreted as a Monte Carlo-type approximation of an expected estimator [34,35,36]:

$$\hat{m}_{RF}(x) \approx E_{\Theta}\left[\hat{m}(x; \Theta)\right] \tag{18}$$

where Θ represents the set of random variables that govern the following:

The bootstrap sample;
The selection of variables;
The partitioning process.

Under standard assumptions, the mean squared error decomposition satisfies the following:

$$E\left[(\hat{m}_{RF}(x) - m(x))^{2}\right] = \mathrm{Bias}^{2}(\hat{m}_{RF}(x)) + \mathrm{Var}(\hat{m}_{RF}(x)) \tag{19}$$

Therefore, random forest maintains a bias comparable to that of a deep tree and significantly reduces variance by decreasing the correlation between trees [34,35,36]:

$$\mathrm{Var}(\hat{m}_{RF}(x)) \approx \rho \sigma^{2} + \frac{1 - \rho}{B} \sigma^{2} \tag{20}$$

where $\rho$ is the average correlation between individual trees and $\sigma^{2}$ is the variance of an individual tree.
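A short numerical illustration of Equation (20) shows why decorrelating the trees matters: as the number of trees grows, the variance approaches the floor $\rho \sigma^{2}$ rather than zero. The values of $\rho$ and $\sigma^{2}$ below are hypothetical.

```python
def forest_variance(rho, sigma2, n_trees):
    """Approximate variance of an average of n_trees equally correlated trees, Equation (20)."""
    return rho * sigma2 + (1.0 - rho) / n_trees * sigma2


# Hypothetical tree variance sigma^2 = 4.0 and two levels of between-tree correlation.
for rho in (0.6, 0.3):
    values = [forest_variance(rho, 4.0, b) for b in (1, 10, 100, 1000)]
    print(rho, [round(v, 4) for v in values])
```

Adding trees only shrinks the second term, so lowering $\rho$ through random variable selection is what reduces the limiting variance.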

Under appropriate conditions regarding the growth of the number of trees ($B \to \infty$), the tree depth, and the minimum leaf size, the random forest estimator has been shown to be consistent:

$$\hat{m}_{RF}(x) \xrightarrow{P} m(x) \quad \text{for almost all } x \in \mathcal{X} \tag{21}$$

Random forest can be viewed as a nonparametric, adaptive, and random-partitioned estimator that combines the following principles:

Adaptive histogram estimation;
Aggregation methods;
Variance reduction through structural decorrelation.

An illustration of this algorithm is shown in Figure 4.

Moreover, the following presents a succinct overview of the key parameters of the random forest regression (RFR) method [34,35,36]:

Node size: It sets the minimum number of observations that a terminal (leaf) node of a decision tree within the RFR must contain, that is to say, the minimum size of terminal nodes. Setting this number larger causes smaller trees to grow (and thus takes less time).
ntree: It indicates the total number of decision trees that are built to form the random forest, whose predictions are combined (averaged) to obtain the final prediction in regression.
nPerm: It indicates the number of times the out-of-bag data are permuted per tree when calculating permutation-based variable importance. It does not directly affect model training, only the estimation of variable importance.
mtry: It indicates the number of variables (predictors) that are randomly selected and evaluated as candidates at each split of a decision tree within the random forest.

2.2.3. M5 Model Tree

Decision trees are nonparametric predictive models that recursively divide the feature space into disjoint regions, with the goal of minimizing an error criterion [37,38,39]. While classification trees assign discrete labels, regression trees estimate continuous values, traditionally using the mean of the observations in each leaf.

However, this approach can be limited when the relationship between the predictor variables and the response variable is approximately linear in local regions of the input space, but not globally.

The M5 algorithm, originally proposed by Quinlan in 1992 [37] and later improved [38,39], introduces the concept of model trees. Unlike classical regression trees, M5 replaces constant values in the leaves with multivariate linear regression models, allowing for a more accurate capture of local linear relationships. The main objective of the M5 model tree is to combine the interpretability of decision trees with the predictive capacity of linear models. An M5 model tree consists of the following [38,39]:

Internal nodes, which represent binary divisions based on an explanatory variable;
Leaves, containing a linear regression model of the following form:

\hat{y} = β_{0} + \sum_{i = 1}^{p} β_{i} x_{i}

(22)

where

$\hat{y}$ is the predicted value of the target variable;
$x_{i}$ are the predictor variables;
$β_{i}$ are the coefficients estimated using least squares.

Each leaf thus defines a region of the input space where an approximately linear relationship is assumed.

The M5 algorithm selects the divisions using the expected reduction in the standard deviation of the target variable, instead of the classical variance [38,39]:

Δ S D = S D (T) - \sum_{k} \frac{|T_{k}|}{|T|} S D (T_{k})

(23)

where

T is the dataset in the node;
$T_{k}$ are the subsets resulting from the partition;
$S D (\cdot)$ is the standard deviation.

This criterion favors divisions that generate more homogeneous subsets in terms of the value to be predicted.
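The splitting criterion of Equation (23) can be sketched in a few lines. The example assumes the population standard deviation (actual M5 implementations may use a different estimator) and uses hypothetical target values.

```python
from statistics import pstdev


def sd_reduction(parent, subsets):
    """Standard deviation reduction of a candidate split, Eq. (23)."""
    n = len(parent)
    weighted_sd = sum(len(s) / n * pstdev(s) for s in subsets)
    return pstdev(parent) - weighted_sd


# Hypothetical target values reaching a node.
node = [20.0, 21.0, 22.0, 30.0, 31.0, 32.0]

# A split that separates low and high values versus one that mixes them.
good_split = sd_reduction(node, [[20.0, 21.0, 22.0], [30.0, 31.0, 32.0]])
poor_split = sd_reduction(node, [[20.0, 30.0, 22.0], [21.0, 31.0, 32.0]])
print(good_split, poor_split)
```

The split that yields homogeneous subsets gives the larger reduction and would therefore be preferred.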

Once the tree is constructed, a linear regression model is fitted to each leaf using the observations that reach that leaf. The following should be considered to avoid overfitting:

Irrelevant variables are eliminated through backward selection;
The model’s complexity is penalized when the number of observations is small.

M5 employs a pruning strategy based on estimated error, comparing the following at each internal node:

The estimated error of the entire subtree;
The estimated error of a single linear model fitted at that node.

If the linear model has a lower estimated error, the subtree is replaced by that model. This process improves the tree’s generalizability.

To avoid abrupt discontinuities between adjacent leaves, M5 applies a smoothing process that combines the prediction of the leaf model with that of its ancestor nodes [38,39]:

{\hat{y}}_{smooth} = \frac{n \cdot {\hat{y}}_{leaf} + k \cdot {\hat{y}}_{parent}}{n + k}

(24)

where

n is the number of instances in the leaf;
k is a smoothing parameter.
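A minimal sketch of the smoothing rule in Equation (24) follows. The default k = 15 is an assumption taken from common M5 descriptions, not from this study, and the prediction values are hypothetical.

```python
def smooth_prediction(y_leaf: float, y_parent: float, n: int, k: float = 15.0) -> float:
    """Blend a leaf prediction with its parent's prediction, Eq. (24)."""
    return (n * y_leaf + k * y_parent) / (n + k)


# Hypothetical predictions: a small leaf is pulled strongly toward its parent,
# whereas a large leaf keeps almost all of its own prediction.
small_leaf = smooth_prediction(y_leaf=30.0, y_parent=28.0, n=5)
large_leaf = smooth_prediction(y_leaf=30.0, y_parent=28.0, n=500)
print(small_leaf, large_leaf)
```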

This mechanism improves the stability of the predictions. Some advantages of the M5 model tree are as follows:

High interpretability;
Ability to model global nonlinear relationships using local linear approximations;
Good performance on complex regression problems.

Some limitations of the M5 model tree are as follows:

Sensitivity to noise in small datasets;
Assumption of local linearity;
Higher computational complexity than simple regression trees.

An illustration of a simple M5 model tree can be seen in Figure 5.

2.2.4. Multivariate Linear Regression (MLR)

Let

{\{x_{i}, y_{i}\}}_{i = 1}^{n}

be a sample of independent observations, where

x_{i} \in ℜ^{p}

is a vector of explanatory variables and

y_{i} \in ℜ

is the scalar response variable. The multivariate linear regression model is defined as follows [40,41]:

y_{i} = β_{0} + x_{i}^{T} β + ε_{i}, i = 1, 2, \dots, n

(25)

where

$β_{0} \in ℜ$ is the independent term or intercept;
$β \in ℜ^{p}$ is the parameter vector;
$ε_{i}$ is the term for a random error.

In matrix notation, the design matrix is defined as follows [40,41]:

X = \begin{bmatrix} 1 & x_{11} & \dots & x_{1 p} \\ 1 & x_{21} & \dots & x_{2 p} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{n 1} & \dots & x_{n p} \end{bmatrix} \in ℜ^{n \times (p + 1)}

(26)

With the parameter vector

θ = {(β_{0}, β^{T})}^{T}

and the response vector

y = {(y_{1}, y_{2}, \dots, y_{n})}^{T}

, the model is expressed as follows [40,41]:

y = X θ + ε

(27)

The ordinary least squares (OLS) estimator is defined as the solution to the following optimization problem [40,41]:

\hat{θ} = \arg \min_{θ \in ℜ^{p + 1}} {‖y - X θ‖}_{2}^{2}

(28)

The objective function is convex and differentiable. The first-order condition leads to the following normal equations [40,41]:

X^{T} X \hat{θ} = X^{T} y

(29)

Under the full-rank assumption (X has full column rank, so that X^{T} X is invertible), the estimator has a closed form [40,41]:

\hat{θ} = {(X^{T} X)}^{- 1} X^{T} y

(30)

The statistical properties of the OLS estimator are as follows [40,41]:

1. Unbiasedness: under the classical assumptions, OLS is an unbiased estimator, that is, $E [\hat{θ} |X] = θ$;
2. Variance–covariance matrix: $Var (\hat{θ} |X) = σ^{2} {(X^{T} X)}^{- 1}$, where $σ^{2}$ is the variance of the errors $ε_{i}$. An unbiased estimator of $σ^{2}$ is given by ${\hat{σ}}^{2} = \frac{1}{n - (p + 1)} {‖y - X \hat{θ}‖}_{2}^{2}$;
3. Gauss–Markov theorem: the OLS estimator is the best linear unbiased estimator, in the sense that it has minimum variance within the class of linear unbiased estimators.
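The normal equations (29) and the variance estimator above can be sketched without external libraries. The helper names and the toy data are hypothetical, and a production implementation would normally rely on a numerically more robust factorization (for example, QR) rather than on forming X^{T} X explicitly.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: X lacks full column rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def ols_fit(x_rows, y):
    """Return (theta_hat, sigma2_hat) from the normal equations X^T X theta = X^T y."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend the intercept column
    p1 = len(design[0])  # p + 1 parameters
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p1)] for i in range(p1)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p1)]
    theta = solve_linear_system(xtx, xty)
    fitted = [sum(t * v for t, v in zip(theta, r)) for r in design]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return theta, rss / (len(y) - p1)  # requires n > p + 1


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise).
x_rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
theta_hat, sigma2_hat = ols_fit(x_rows, y)
print(theta_hat, sigma2_hat)
```

Because the toy response is noise-free, the recovered coefficients match the generating ones and the residual variance estimate is essentially zero.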

An illustration of the multivariate linear regression model is shown in Figure 6.

2.2.5. Differential Evolution (DE) Optimization Algorithm

Let

f : Ω \subset ℜ^{d} \to ℜ

be a real objective function, defined over a compact domain

Ω = \prod_{j = 1}^{d} [l_{j}, u_{j}]

, where

l_{j} < u_{j}, \forall j = 1, \dots, d

. The global optimization problem is formulated as follows [26,27,28,29,30,31,32,33]:

\min_{x \in Ω} f (x)

(31)

However, the formulation is analogous for maximization problems.

The differential evolution (DE) algorithm belongs to the family of population-based stochastic methods for continuous, non-convex optimization. It requires neither derivative information nor assumptions of convexity or smoothness; f only needs to be evaluable [26,27,28,29,30,31,32,33]. In DE, the state of the algorithm at generation

g \in ℕ

is given by a finite population of real vectors

P^{(g)} = \{x_{i}^{(g)} \in Ω |i = 1, \dots, N\}

, where

N \in ℕ

is the fixed population size. Each individual

x_{i}^{(g)} = (x_{i, 1}^{(g)}, \dots, x_{i, d}^{(g)})

represents a candidate solution in the search space. The initial population

P^{(0)}

is typically generated by independent and uniform random sampling in

Ω

.

The distinctive feature of DE is the use of vector differences as an exploration mechanism. For each target vector

x_{i}^{(g)}

, a mutant vector

v_{i}^{(g)}

is constructed according to the general scheme [26,27,28,29,30,31,32,33]:

v_{i}^{(g)} = x_{r_{0}}^{(g)} + F \sum_{k = 1}^{K} (x_{r_{2 k - 1}}^{(g)} - x_{r_{2 k}}^{(g)})

(32)

where

$r_{0}, r_{1}, \dots, r_{2 K} \in \{1, \dots, N\}$ are mutually distinct indices and distinct from i;
F is the differential scaling factor;
K determines the number of vector differences.

The most common scheme, known as DE/rand/1, corresponds to the case

K = 1

[26,27,28,29,30,31,32,33]:

v_{i}^{(g)} = x_{r_{0}}^{(g)} + F (x_{r_{1}}^{(g)} - x_{r_{2}}^{(g)})

(33)

From a geometric point of view, this operator induces an adaptive exploration whose scale depends on the current dispersion of the population.

The mutant vector

v_{i}^{(g)}

is combined with the target vector

x_{i}^{(g)}

to produce a trial vector

u_{i}^{(g)}

. In the case of binomial recombination, it is defined component-wise as follows [26,27,28,29,30,31,32,33]:

u_{i j}^{(g)} = \begin{cases} v_{i j}^{(g)} & \text{if } r_{j} \leq C_{r} \text{ or } j = j^{*} \\ x_{i j}^{(g)} & \text{otherwise} \end{cases}

(34)

where

$r_{j} \sim U (0, 1)$ are independent random variables;
$C_{r} \in [0, 1]$ is the probability of recombination (crossover);
$j^{*} \in \{1, \dots, d\}$ is a randomly chosen index to ensure that at least one component comes from the mutant.

This operator introduces a controlled compromise between exploration (mutated components) and exploitation (inherited components).

Selection in the DE optimizer is deterministic and local. The trial vector competes exclusively with its corresponding target vector [26,27,28,29,30,31,32,33]:

x_{i}^{(g + 1)} = \begin{cases} u_{i}^{(g)} & \text{if } f (u_{i}^{(g)}) \leq f (x_{i}^{(g)}) \\ x_{i}^{(g)} & \text{otherwise} \end{cases}

(35)

This mechanism implements a form of one-to-one elite selection, ensuring that the best fitness of the population does not worsen over generations.

From a mathematical perspective, the DE optimizer can be interpreted as an adaptive search method that uses empirical estimates of descent directions implicit in the population structure, without resorting to explicit gradients. Its algorithmic simplicity contrasts with the richness of its stochastic dynamics, which explains both its practical effectiveness and the sustained theoretical interest in its analysis.
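The mutation, recombination and selection steps of Equations (33)–(35) can be put together in a short sketch. This is an illustrative implementation, not the metaheuristicOpt code used in the study: the function name, the default values of F, C_r and the population size, the clipping of mutants to the bounds, and the sphere test function are all assumptions made for the example.

```python
import random


def de_rand_1_bin(f, bounds, pop_size=20, F=0.6, Cr=0.9, generations=300, seed=0):
    """Minimise f over a box with DE/rand/1 mutation and binomial crossover."""
    rng = random.Random(seed)
    d = len(bounds)
    # Initial population P^(0): independent uniform samples in the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    history = [min(fit)]
    for _ in range(generations):
        new_pop, new_fit = [], []
        for i in range(pop_size):
            # Three mutually distinct indices, all different from i.
            others = [j for j in range(pop_size) if j != i]
            r0, r1, r2 = rng.sample(others, 3)
            # Mutation, Eq. (33): base vector plus one scaled difference.
            v = [pop[r0][j] + F * (pop[r1][j] - pop[r2][j]) for j in range(d)]
            # Assumption: mutants are clipped to the box (not stated in the text).
            v = [min(max(vj, lo), hi) for vj, (lo, hi) in zip(v, bounds)]
            # Binomial crossover, Eq. (34): j_star forces one mutant component.
            j_star = rng.randrange(d)
            u = [v[j] if (rng.random() <= Cr or j == j_star) else pop[i][j]
                 for j in range(d)]
            # One-to-one selection, Eq. (35): trial versus its own target.
            fu = f(u)
            if fu <= fit[i]:
                new_pop.append(u)
                new_fit.append(fu)
            else:
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit
        history.append(min(fit))
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best], history


def sphere(x):
    """Simple convex test function with its minimum at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_f, history = de_rand_1_bin(sphere, bounds=[(-5.0, 5.0)] * 3)
print(best_x, best_f)
```

In a hyperparameter-tuning setting, f would be replaced by the cross-validation error of the model for a given hyperparameter vector, and the bounds by the variation intervals of each hyperparameter.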

2.3. The Accuracy of This Approximation

The coefficient of determination (R²) is the main goodness-of-fit metric used in this study for the regression problem [55,56,57]. Let $t_{i}$ and $y_{i}$ denote the observed and predicted values, respectively. The following sums of squares are then defined [55,56,57]:

$SS_{reg} = \sum_{i = 1}^{n} {(y_{i} - \bar{t})}^{2}$ is the explained sum of squares;
$SS_{tot} = \sum_{i = 1}^{n} {(t_{i} - \bar{t})}^{2}$ is the total sum of squares, which is proportional to the sample variance;
$SS_{err} = \sum_{i = 1}^{n} {(t_{i} - y_{i})}^{2}$ is the residual sum of squares.

where

\bar{t}

represents the experimental data’s mean value, which can be obtained using the following:

\bar{t} = \frac{1}{n} \sum_{i = 1}^{n} t_{i}

(36)

Hence, the coefficient of determination is given as follows [55,56,57]:

R^{2} \equiv 1 - \frac{SS_{err}}{SS_{tot}}

(37)

The R² statistic approaches 1.0 as the discrepancy between the experimental and projected data decreases.
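As a worked example of Equations (36) and (37), the following sketch computes R² from observed and predicted values. The numbers are hypothetical HHV values chosen for easy arithmetic, not data from this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination, Eq. (37): 1 - SS_err / SS_tot."""
    t_bar = sum(observed) / len(observed)  # Eq. (36)
    ss_tot = sum((t - t_bar) ** 2 for t in observed)
    ss_err = sum((t - y) ** 2 for t, y in zip(observed, predicted))
    return 1.0 - ss_err / ss_tot


# Hypothetical observed (t) and predicted (y) values.
t = [24.0, 26.0, 28.0, 30.0, 32.0]
y = [24.5, 25.5, 28.0, 30.5, 31.5]
r2 = r_squared(t, y)
print(r2)
```

Here SS_tot = 40 and SS_err = 1, so R² = 1 − 1/40 = 0.975.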

To compare their predictions by means of the coefficient of determination R², several models were built: the DE/XGBoost-based approach described above and the DE/RFR, M5 model tree, and MLR approaches. In all of them, the HHV of the coal is the dependent variable to be predicted from the independent variables, which comprise the five physicochemical characteristics of the coal samples used in the experiment [47].

Moreover, as stated earlier, the learning rate

η

, maximum depth (max_depth), fraction of rows from the dataset used in each tree (subsample), fraction of randomly selected columns for each tree (colsample_bytree) and number of decision trees (n_estimators) are the XGBoost hyperparameters with the greatest impact on the XGBoost model. Similarly, the node size, total number of decision trees (ntree), number of random permutations carried out to determine variable importance (nPerm) and number of variables (predictors) randomly selected at each split (mtry) are the RFR hyperparameters that most notably affect the RFR model. In this study, the optimal hyperparameters for both the XGBoost and RFR approaches are identified with the differential evolution (DE) optimizer [26,27,28,29,30,31,32,33]. The optimizer searches a preselected, bounded region of the hyperparameter space, fitting the machine learning model for each candidate configuration.

The dataset is divided into two parts, with 80% of the data assigned to the training set and 20% to the testing set. The DE/XGBoost and DE/RFR models are then developed using the training set. To tune the hyperparameters of the DE/XGBoost and DE/RFR approaches, the DE algorithm is combined with ten-fold cross-validation [55,58,59]. Once the optimal hyperparameters have been determined, the entire training set is used to construct the final model, which is then used to predict the testing set. The goodness-of-fit of the approach is evaluated by comparing these predictions with the observed values. The process diagram for the best DE/XGBoost-based approach employed in this research is displayed in Figure 7.
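The data partitioning used in this procedure can be sketched as follows. The example covers only the index bookkeeping (an 80/20 hold-out split followed by ten folds on the training part); the sample count of 100, the random seed and the function names are hypothetical, and the model fitting inside each fold is omitted.

```python
import random


def train_test_indices(n, test_fraction=0.2, seed=0):
    """Shuffle 0..n-1 and return (train_idx, test_idx) for a hold-out split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_splits(train_idx, k=10):
    """Yield (fit_idx, validation_idx) pairs; each sample is validated exactly once."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit_idx, folds[i]


# Hypothetical dataset with 100 samples.
train_idx, test_idx = train_test_indices(100)
splits = list(k_fold_splits(train_idx, k=10))
print(len(train_idx), len(test_idx), len(splits))
```

For each candidate hyperparameter vector proposed by the optimizer, a model would be fitted on fit_idx and scored on validation_idx for every fold, and the average validation error would serve as the objective value; the test indices remain untouched until the final evaluation.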

Furthermore, a reliable estimate of the coefficient of determination (R²) in this situation is commonly obtained by cross-validation [55,58,59]. Specifically, this study used k-fold cross-validation (with

k = 10

) to evaluate the prediction performance of the DE/XGBoost-based model [58,59]. The regression modeling was carried out using the R software tools described below:

XGBoost approach: xgboost package in R software (version 1.7.11.1) [60,61];
DE optimizer: metaheuristicOpt package in R software (version 2.2.8) [62,63,64];
Additionally, the randomForest package in R project (version 4.7-1.2) was used to implement RFR [60,65], the cubist package for the M5 model tree in R software (version 1.10) [60,66] and MLR models [66].

Table 2 and Table 3 display the variation intervals of the five factors of the DE/XGBoost method and four factors of the DE/RFR method utilized in this investigation, respectively.

The DE optimizer works well for tuning the hyperparameters of the XGBoost and RFR models. By examining the cross-validation error at each iteration, DE determines the optimal hyperparameter values. The search space has five dimensions for XGBoost and four for RFR.

3. Results and Discussion

Figure 8 displays the correlation matrix for all variables considered in the ultimate analysis.

A strong negative Pearson correlation (r = −0.87) was observed between carbon (C) and oxygen (O) content, indicating substantial collinearity between these variables. This relationship, inherent to the stoichiometric structure of carbonaceous matter, poses challenges for model interpretation and stability, even for robust algorithms like XGBoost. While XGBoost can accommodate correlated predictors, the inclusion of oxygen may distort the assessed relative importance of variables, artificially inflating its perceived contribution due to its inverse relationship with carbon, the primary determinant of the higher heating value (HHV).

From a thermodynamic standpoint, the oxygen in coal does not directly contribute to energy release during combustion. Instead, it is associated with functional groups such as hydroxyl (–OH) or carbonyl (C=O), which reduce the availability of carbon and hydrogen for oxidation but do not generate significant heat. Oxygen thus acts as an inverse indicator of coal rank, reflecting lower energy quality. For instance, empirical formulas like Dulong’s often disregard or correct for oxygen to simplify calculations without compromising accuracy [45]. Thus, the decision to omit oxygen is justified by its redundancy, lack of direct energetic contribution, and adherence to the principle of parsimony, resulting in a more robust and theoretically consistent model. Consequently, oxygen is treated as a redundant variable, whose exclusion optimizes the performance of the DE/XGBoost model in both statistical and physical terms.

Table 4 and Table 5 list the optimal hyperparameters found by the differential evolution (DE) optimizer for the XGBoost-based and RFR-based models of the coal HHV, respectively.

For comparison purposes, this investigation also employed the M5 model tree and multivariate linear regression (MLR) models.

Figure 9 displays the first-order terms of the DE/XGBoost model. This figure makes it easier to grasp the relationship between each input variable and the response. For example, the first graph in Figure 9 plots the coal HHV on the Y-axis against the carbon content (C) on the X-axis, with the other input variables held constant. The second and third graphs in Figure 9 show the coal HHV on the Y-axis against the hydrogen and nitrogen contents on the X-axis, respectively, again with all other input variables held constant.

Similarly, the second-order terms of the DE/XGBoost model are shown in Figure 10. Figure 10a shows the coal HHV on the Z-axis as a function of the carbon content on the X-axis and the hydrogen content on the Y-axis, with all other variables held constant. Figure 10b,c show the analogous surfaces for the carbon–nitrogen and hydrogen–nitrogen pairs, respectively, again with the remaining variables held constant.

Table 6 compiles representative empirical correlations documented in the literature for estimating coal HHV. These expressions are formulated from the elemental composition obtained by ultimate analysis, using the mass fractions of the principal coal constituents as predictor variables. The formulas capture synergistic effects such as the oxidation of sulfur or the energetic contribution of hydrogen, which improves predictive accuracy over simple linear models by reflecting the complexity of the carbonaceous matrix.

Table 7 presents the coefficients of determination and correlation coefficients obtained on the test dataset for the DE/XGBoost, DE/RFR, M5 model tree, and multivariate linear regression models, together with the results for F6 [12], the best-performing empirical correlation.

These results indicate that the DE/XGBoost technique is the best model for predicting the coal HHV across different types of coal. It yielded a coefficient of determination of 0.9691 and a correlation coefficient of 0.9858. This goodness-of-fit indicates close agreement between the DE/XGBoost predictions and the experimentally measured values.

Importance of the Variables

The SHAP (Shapley Additive Explanation) value of a feature represents its contribution to a machine learning model’s prediction for a particular instance [67]. SHAP is founded on Shapley values from cooperative game theory, adapted to feature attribution in predictive models. Because the approach is model-agnostic, SHAP values can be computed for any machine learning model, including intricate models like XGBoost, which allows variable importance to be compared consistently across models. SHAP offers explanations that are both local (individual predictions) and global (overall feature importance), which helps to show how the input variables affect the model’s predictions at different levels of granularity. The Shapley formulation assigns feature importance in a fair and consistent manner, reflecting each feature’s actual contribution to the model’s output [67,68]. A SHAP value measures how much a feature shifts a model’s prediction relative to its baseline (or expected) output. Negative SHAP values indicate that a feature decreases the predicted response, whereas positive SHAP values indicate that it increases it. In a SHAP summary plot, the spread of points along the x-axis reflects how the contribution of that feature varies across individual observations.

A popular way to gauge a variable’s importance is to examine its average absolute SHAP value for every sample. This index measures the average contribution of the variable (in terms of magnitude, irrespective of direction) to the model’s predictions [68]:

Variable Importance = \frac{1}{n} \sum_{i = 1}^{n} |\text{SHAP value}_{i}|

(38)

Here, n is the number of samples. A larger mean absolute SHAP value indicates a greater impact of the variable on the model’s predictions.
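Equation (38) amounts to a column-wise mean of absolute values over a matrix of SHAP values. The sketch below uses a small hypothetical SHAP matrix (rows are samples, columns are features); the numbers are invented for illustration and are not results from this study.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, Eq. (38)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for four samples and four input variables.
features = ["C", "H", "N", "S"]
shap_rows = [
    [2.0, 0.5, -0.1, 0.05],
    [-1.5, -0.4, 0.2, -0.05],
    [1.0, 0.3, 0.1, 0.0],
    [-2.5, -0.6, -0.2, 0.1],
]
ranking = mean_abs_shap(shap_rows, features)
print(ranking)
```

Because absolute values are averaged, positive and negative contributions do not cancel, so the index reflects magnitude only and not the direction of the effect.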

The importance and effects of the input variables are displayed in the SHAP summary plot. Each point is the Shapley value for a particular observation and input variable. The x-axis displays the Shapley value, and the y-axis lists the input variables. Color indicates the value of the input variable. Points are jittered along the y-axis to show the distribution of the Shapley values for each variable. The variables are ordered by importance: the higher a variable appears on the y-axis, the more important it is. A positive relationship between the value of a feature and its SHAP value appears as a color gradient, with red dots toward the right. A complex or nonlinear relationship is indicated when blue and red points appear on both sides. For example, the SHAP values for predicting the coal HHV are shown in Figure 11.

An additional result of these analyses—the hierarchical relevance of the process variables (input factors) in forecasting the coal HHV (output-dependent factor) for this complex investigation—is displayed in Table 8 and Figure 12. According to the XGBoost framework, the process variable carbon amount (C) emerges as the major predictor of the output variable coal HHV. Hydrogen amount (H), nitrogen amount (N), and sulfur amount (S) come next, in decreasing order of significance.

Burning solid fuels such as coal efficiently requires knowledge of their elemental composition, obtained by ultimate analysis, because it determines both the thermal energy production (HHV) and aspects of energy management such as the potential for pollutant emissions (e.g., NO_x or SO₂). The ultimate analysis gives the C, H, O, N, and S percentages.

In energy research, carbon (C) and hydrogen (H) are the key parameters used to estimate HHV, whereas nitrogen (N) and sulfur (S) are relevant mainly for sustainable energy management during energy conversion. In this context, carbon (C), as the main component of all carbonaceous fuels, plays a decisive role in the material’s energy performance [69].

According to the DE/XGBoost-based ranking, carbon (C) is the key component of the proposed model, as it contributes directly to the energy released during combustion and is the most significant indicator of the degree of coalification. Carbon is one of the main energetic constituents of coal, as its oxidation produces a substantial amount of heat per unit mass. The higher the carbon content of a coal (for example, anthracite compared with lignite), the greater its calorific value. This makes carbon the dominant variable in the calculation of energy conversion and the evaluation of energy management in technologies such as co-combustion of fuels, oxy-fuel combustion, emissions from furnaces, and carbon capture and sequestration [70].

Hydrogen also contributes significantly to the HHV, as its combustion produces water and releases a greater amount of heat per unit mass than carbon, although its content in coal is usually lower. In the definition of HHV, it is assumed that the water formed from hydrogen combustion condenses, thereby recovering the latent heat of vaporization, which substantially increases its energetic contribution in the energy conversion processes [71]. In this sense, the oxygen in coal is bonded partly to hydrogen (for example, in hydroxyl groups) and mainly to carbon (for example, in carbonyl groups) [69].

Nitrogen does not significantly contribute to the fuel energy; instead, part of the available energy is dissipated in the formation of nitrogen oxides, which are pollutant compounds generated through chemical reactions. Sulfur does release heat upon oxidation, but its content in coal is relatively low and its specific energy is much smaller than that of carbon or hydrogen, so its contribution to energy production is limited; however, it is relevant in the control of energy production due to the sulfur–nitrogen interactions with air [72].

For all these reasons, understanding the fuel resource is essential because its elemental composition directly determines the HHV and thus the amount of energy that can be obtained during combustion. Accurate knowledge of the resource allows for better design and optimization of resources for energy conversion, improving energy efficiency and reducing environmental impacts [8,73,74,75].

In summary, this study effectively illustrates how to use the DE/XGBoost-based method to estimate the coal HHV as an output factor in accordance with the actual observed values. The DE/XGBoost model captures nonlinear interactions reflecting key combustion processes. For example, the synergistic effect of carbon (C) and hydrogen (H) on HHV aligns with their combined oxidation efficiency, while sulfur’s (S) nonlinear contribution—positive at low concentrations but inhibitory at high levels—stems from SO₂ formation competing with fuel oxidation. These dependencies, poorly addressed by linear empirical formulas (e.g., Dulong), underscore ML’s ability to integrate complex physicochemical mechanisms.

Figure 13 compares the experimental and predicted values of the coal HHV using the following models: the most accurate empirical correlation F6 (Figure 13a), the MLR method (Figure 13b), the M5 model tree (Figure 13c), the DE/RFR-based model (Figure 13d) and the DE/XGBoost-based model (Figure 13e). These findings show that the DE/XGBoost-based method attains the highest value of the goodness-of-fit statistic (R²) and offers the best fit to this regression problem.

4. Conclusions

By comparing the experimental and numerical outcomes, the principal discoveries of the research are summarized as follows:

First, the accurate estimation of coal HHV remains a challenging task because conventional approaches may involve complex heat-transfer phenomena associated with radiation, convection, and conduction, or depend on empirical and heuristic formulations that can lead to markedly different results. In this context, the development of advanced machine learning-based methods is essential. Among the approaches evaluated in this study, the DE-optimized XGBoost model turned out to be the best technique for precisely calculating HHV in distinct coal classes from diverse sources and geographical origins.
Second, the results demonstrate that the coal HHV can be accurately predicted for fuel-related applications by using the proposed DE-optimized XGBoost approach.
Third, a coefficient of determination of 0.9691 was achieved when the dependent variable, coal HHV, was predicted employing the DE/XGBoost model and the test subset comprising 20% of the observed data not utilized for training.
Fourth, the XGBoost-based framework could support the development of an inexpensive microcontroller-based device capable of providing dependable coal HHV predictions for fuel automation applications.
Fifth, the relative importance of the input variables for predicting the coal HHV was established, which is one of the principal findings of this investigation. Carbon (C) content is the most important coal HHV indicator, followed by H, N, and S, in that order.
Sixth, the ideas presented here may be extended in future research to include more independent variables, with the aim of developing hybrid models that combine ultimate and/or proximate analysis.
Lastly, the findings strongly discourage the widespread use of easily accessible mathematical methods that significantly rely on the expected behavior of the data. The question of coal HHV estimation in various coal kinds pertinent to the fuel sector is therefore effectively answered by a helpful DE/XGBoost-based approximation.

To sum up, our DE/XGBoost method could be successfully applied to other types of coal with comparable or dissimilar origins. Nonetheless, it is always crucial to consider the distinctive features of every deposit and basin. The practical implementation of these mathematical techniques in energy systems could help to optimize fuel delivery operations and enhance overall energy efficiency. Directions for future research include integrating additional fuel properties, adapted to the nature of organic fuels, and extending the models to a broader range of industrial solid fuels or wastes.

Author Contributions

Conceptualization: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; methodology: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; software: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; validation: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; formal analysis: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; investigation: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; data curation: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; writing—original draft preparation: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; writing—review and editing: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; visualization: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G.; supervision: P.J.G.-N., E.G.-G., J.P.P.-S. and L.A.M.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the Department of Mathematics of the University of Oviedo for its computational assistance. This research is part of the research line ‘Energy and Materials’ of the Cogersa Chair of Circular Economy, which aims to reduce the impact of fossil fuels in energy conversion or to replace them.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
ANFIS	Adaptive Neuro-Fuzzy Inference System
ANN	Artificial Neural Network
C	Carbon (chemical element)
CCS	Carbon Capture and Sequestration
DE	Differential Evolution
DE/XGBoost	Differential Evolution/Extreme Gradient Boosting
Eq	Equation
GA	Genetic Algorithm
H	Hydrogen
HHV	Higher Heating Value
HV	Heating Value
ML	Machine Learning
MLR	Multivariate Linear Regression
MLT	Machine Learning Techniques
MTDC	Multi-terminal Direct Current
N	Nitrogen
NOx	Nitrogen Oxides
O	Oxygen
OLS	Ordinary Least Squares
PDE	Partial Differential Equation
RFR	Random Forest Regression
r	Correlation Coefficient
R²	Coefficient of Determination
S	Sulfur
SHAP	Shapley Additive Explanation
SO₂	Sulfur Dioxide
SO_x	Sulfur Oxides
SS_reg	Sum of Squares Explained
SS_tot	Sum of Squares Total
SS_err	Residual Sum of Squares

References

1. Finkelman, R.B.; Dai, S.; French, D. The Importance of Minerals in Coal as the Hosts of Chemical Elements: A Review. Int. J. Coal Geol. 2019, 212, 103251.
2. Wang, X.; Tang, Y.; Wang, S.; Schobert, H.H. Clean Coal Geology in China: Research Advance and Its Future. Int. J. Coal Sci. Technol. 2020, 7, 299–310.
3. Jiang, L.; Xue, D.; Wei, Z.; Chen, Z.; Mirzayev, M.; Chen, Y.; Chen, S. Coal Decarbonization: A State-of-the-Art Review of Enhanced Hydrogen Production in Underground Coal Gasification. Energy Rev. 2022, 1, 100004.
4. Paredes-Sánchez, J.P.; López-Ochoa, L.M. Bioenergy as an Alternative to Fossil Fuels in Thermal Systems. In Advances in Sustainable Energy; Vasel, A., Ting, D.S.-K., Eds.; Lecture Notes in Energy; Springer International Publishing: Cham, Switzerland, 2019; Volume 70, pp. 149–168.
5. Seervi, K. Prediction of Calorific Value of Indian Coals by Artificial Neural Network. Bachelor’s Thesis, Department of Mining Engineering, National Institute of Technology, Rourkela, India, 2015.
6. Paredes-Sánchez, J.P.; Las-Heras-Casas, J.; Paredes-Sánchez, B.M. Solar Energy, the Future Ahead. In Advances in Sustainable Energy; Vasel, A., Ting, D.S.-K., Eds.; Lecture Notes in Energy; Springer International Publishing: Cham, Switzerland, 2019; Volume 70, pp. 113–132.
7. Akkaya, A.V. Proximate Analysis Based Multiple Regression Models for Higher Heating Value Estimation of Low Rank Coals. Fuel Process. Technol. 2009, 90, 165–170.
8. Paredes-Sánchez, B.M.; Paredes-Sánchez, J.P.; García-Nieto, P.J. Evaluation of Implementation of Biomass and Solar Resources by Energy Systems in the Coal-Mining Areas of Spain. Energies 2022, 15, 232.
9. Channiwala, S.A.; Parikh, P.P. A Unified Correlation for Estimating HHV of Solid, Liquid and Gaseous Fuels. Fuel 2002, 81, 1051–1063.
10. Mason, D.M.; Gandhi, K.N. Formulas for Calculating the Calorific Value of Coal and Coal Chars: Development, Tests, and Uses. Fuel Process. Technol. 1983, 7, 11–22.
11. Selvig, W.A.; Wilson, I.H. Calorific Value of Coal. In Chemistry of Coal; Lowry, H.H., Ed.; Wiley: New York, NY, USA, 1945; Volume 1, p. 139.
12. Given, P.H.; Weldon, D.; Zoeller, J.H. Calculation of Calorific Values of Coals from Ultimate Analyses: Theoretical Basis and Geochemical Implications. Fuel 1986, 65, 849–854.
13. Chelgani, S.C. Estimation of Gross Calorific Value Based on Coal Analysis Using an Explainable Artificial Intelligence. Mach. Learn. Appl. 2021, 6, 100116.
14. Matin, S.S.; Chelgani, S.C. Estimation of Coal Gross Calorific Value Based on Various Analyses by Random Forest Method. Fuel 2016, 177, 274–278.
15. Pekel, E.; Akkoyunlu, M.C.; Akkoyunlu, M.T.; Pusat, S. Decision Tree Regression Model to Predict Low-Rank Coal Moisture Content during Convective Drying Process. Int. J. Coal Prep. Util. 2020, 40, 505–512.
16. Entezari, A.; Aslani, A.; Zahedi, R.; Noorollahi, Y. Artificial intelligence and machine learning in energy systems: A bibliographic perspective. Energy Strat. Rev. 2023, 45, 101017.
17. Safari, A.; Daneshvar, M.; Anvari-Moghaddam, A. Energy Intelligence: A Systematic Review of Artificial Intelligence for Energy Management. Appl. Sci. 2024, 14, 11112.
18. Akkaya, A.V. Coal Higher Heating Value Prediction Using Constituents of Proximate Analysis: Gaussian Process Regression Model. Int. J. Coal Prep. Util. 2022, 42, 1952–1967.
19. Álvarez Antón, J.C.; García Nieto, P.J.; García Gonzalo, E.; González Vega, M.; Blanco Viejo, C. Data-driven state-of-charge prediction of a storage cell using ABC/GBRT, ABC/MLP and LASSO machine learning techniques. J. Comput. Appl. Math. 2023, 433, 115305.
20. Pritam Deka, P.; Weiner, J. XGBoost for Regression Predictive Modeling and Time Series Analysis: Learn How to Build, Evaluate, and Deploy Predictive Models with Expert Guidance; Packt Publishing: Birmingham, UK, 2024.
21. Ryan, M.; Massaron, L. Machine Learning for Tabular Data: XGBoost, Deep Learning, and AI; Manning: Shelter Island, NY, USA, 2025.
22. Han, J.; Pei, J.; Tong, H. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2022.
23. Quinto, B. Next-Generation Machine Learning with Spark: Covers XGBoost, LightGBM, Spark NLP, Distributed Deep Learning with Keras, and More; Apress: New York, NY, USA, 2020.
24. Sharma, N. XGBoost: The Extreme Gradient Boosting for Mining Applications; Grin Verlag: Berlin, Germany, 2018.
25. Kubat, M. An Introduction to Machine Learning; Springer: New York, NY, USA, 2021.
26. Storn, R.; Price, K. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Global Optim. 1997, 11, 341–359.
27. Feoktistov, V. Differential Evolution: In Search of Solutions; Springer Optimization and Its Applications; Springer: New York, NY, USA, 2006; Volume 5.
28. Price, K.V.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Natural Computing Series; Springer: Berlin/Heidelberg, Germany, 2005.
29. Rocca, P.; Oliveri, G.; Massa, A. Differential Evolution as Applied to Electromagnetics. IEEE Antennas Propag. Mag. 2011, 53, 38–49.
30. Chong, E.K.P.; Zak, S.H. An Introduction to Optimization, 4th ed.; Wiley: Hoboken, NJ, USA, 2013.
31. Eberhart, R.C.; Shi, Y.; Kennedy, J. Swarm Intelligence, 8th ed.; The Morgan Kaufmann Series in Evolutionary Computation; Morgan Kaufmann: San Francisco, CA, USA, 2009.
32. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
33. Aggarwal, C.C. Linear Algebra and Optimization for Machine Learning: A Textbook; Springer: Cham, Switzerland, 2020.
34. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2021.
35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
36. Genuer, R.; Poggi, J.-M. Random Forests with R; Springer: Cham, Switzerland, 2020.
37. Quinlan, J.R. Learning with Continuous Classes. In Proceedings of Australian Joint Conference on Artificial Intelligence; World Scientific Press: Singapore, 1992; pp. 343–348.
38. Pal, M. M5 model tree for land cover classification. Int. J. Remote Sens. 2006, 27, 825–831.
39. Rahimikhoob, A.; Asadi, M.; Mashal, M. A comparison between conventional and M5 model tree methods for converting pan evaporation to reference evapotranspiration for semi-arid region. Water Resour. Manag. 2013, 27, 4815–4826.
40. Weisberg, S. Applied Linear Regression; Wiley: New York, NY, USA, 2013.
41. Fox, J. Applied Regression Analysis and Generalized Linear Models; SAGE Publications: Los Angeles, CA, USA, 2015.
42. Esmaili, R.; Imani, A.; Moravej, Z.; Pazoki, M. A data-driven approach for fault location in non-homogeneous MTDC systems using Mathematical Morphology and XGBoost regression. Electr. Power Syst. Res. 2026, 250, 112185.
43. Kangalli Uyar, S.G.; Kagan Ozbay, B.; Dal, B. Interpretable building energy performance prediction using XGBoost Quantile Regression. Energy Build. 2025, 340, 115815.
44. Huang, J.-C.; Tsai, Y.-C.; Wu, P.-Y.; Lien, Y.-H.; Chien, C.-Y.; Kuo, C.-F.; Hung, J.-F.; Chen, S.-C.; Kuo, C.-H. Predictive modeling of blood pressure during hemodialysis: A comparison of linear model, random forest, support vector regression, XGBoost, LASSO regression and ensemble method. Comput. Methods Programs Biomed. 2020, 195, 105536.
45. Hosokai, S.; Matsuoka, K.; Kuramoto, K.; Suzuki, Y. Modification of Dulong’s Formula to Estimate Heating Value of Gas, Liquid and Solid Fuels. Fuel Process. Technol. 2016, 152, 399–405.
46. Xing, J.; Luo, K.; Wang, H.; Gao, Z.; Fan, J. A Comprehensive Study on Estimating Higher Heating Value of Biomass from Proximate and Ultimate Analysis with Machine Learning Approaches. Energy 2019, 188, 116077.
47. Richards, A.P.; Haycock, D.; Frandsen, J.; Fletcher, T.H. A Review of Coal Heating Value Correlations with Application to Coal Char, Tar, and Other Fuels. Fuel 2021, 283, 118942.
48. Li, Z.; Zhao, Y.; Lu, Z.; Dai, W.; Huang, J.; Cui, S.; Chen, B.; Wu, S.; Dong, L. Machine Learning Prediction of Calorific Value of Coal Based on the Hybrid Analysis. Int. J. Coal Prep. Util. 2023, 43, 577–598.
49. Speight, J.G. Synthetic Fuels Handbook: Properties, Process, and Performance; McGraw-Hill Education LLC: New York, NY, USA, 2020.
50. Speight, J.G. Coal-Fired Power Generation Handbook; Wiley-Scrivener Publishing: Beverly, MA, USA, 2021.
51. García Nieto, P.J.; García-Gonzalo, E.; Paredes-Sánchez, B.M.; Paredes-Sánchez, J.P. Forecast of the Higher Heating Value Based on Proximate Analysis by Using Support Vector Machines and Multilayer Perceptron in Bioenergy Resources. Fuel 2022, 317, 122824.
52. Yaka, H.; Insel, M.A.; Yucel, O.; Sadikoglu, H. A Comparison of Machine Learning Algorithms for Estimation of Higher Heating Values of Biomass and Fossil Fuels from Ultimate Analysis. Fuel 2022, 320, 123971.
53. Mandavgade, N.K.; Jaju, S.B.; Lakhe, R.R. Determination of Uncertainty in Gross Calorific Value of Coal Using Bomb Calorimeter. In Advanced Instrument Engineering: Measurement, Calibration, and Design; Lay-Ekuakille, A., Ed.; IGI Global: Hershey, PA, USA, 2013; pp. 292–299.
54. Boumanchar, I.; Charafeddine, K.; Chhiti, Y.; M’hamdi Alaoui, F.E.; Sahibed-dine, A.; Bentiss, F.; Jama, C.; Bensitel, M. Biomass Higher Heating Value Prediction from Ultimate Analysis Using Multiple Regression and Genetic Programming. Biomass Conv. Bioref. 2019, 9, 499–509.
55. Kuhn, M.; Johnson, K. Applied Predictive Modeling; Springer: New York, NY, USA, 2018.
56. Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer Texts in Statistics; Springer: New York, NY, USA, 2004.
57. Freedman, D.; Pisani, R.; Purves, R. Statistics; W.W. Norton & Company: New York, NY, USA, 2007.
58. Picard, R.R.; Cook, R.D. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 1984, 79, 575–583.
59. Efron, B.; Tibshirani, R. Improvements on Cross-Validation: The .632+ Bootstrap Method. J. Am. Stat. Assoc. 1997, 92, 548–560.
60. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
61. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
62. Mullen, K.; Ardia, D.; Gil, D.; Windover, D.; Cline, J. DEoptim: An R Package for Global Optimization by Differential Evolution. J. Stat. Softw. 2011, 40, 1–26.
63. Ardia, D.; Boudt, K.; Carl, P.; Mullen, K.M.; Peterson, B.G. Differential Evolution with DEoptim: An Application to Non-Convex Portfolio Optimization. R J. 2011, 3, 27–34.
64. Onwubolu, G.C.; Babu, B.V. New Optimization Techniques in Engineering; Studies in Fuzziness and Soft Computing; Springer: Berlin/Heidelberg, Germany, 2004; Volume 141.
65. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
66. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques, 3rd ed.; Morgan Kaufmann Series in Data Management Systems; Morgan Kaufmann: Burlington, MA, USA, 2011.
67. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17); Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777.
68. Molnar, C. Interpreting Machine Learning Models with SHAP: A Guide with Python Examples and Theory on Shapley Values; Christoph Molnar: Munich, Germany, 2023.
69. Mondal, C.; Pal, S.K.; Samanta, B.; Dutta, D.; Raj, S. Analysis and significance of prediction models for higher heating value of coal: An updated review. J. Therm. Anal. Calorim. 2023, 148, 7521–7538.
70. Yadav, S.; Mondal, S.S. A complete review based on various aspects of pulverized coal combustion. Int. J. Energy Res. 2019, 43, 3134–3165.
71. Tian, J.; Dong, R.; Jia, H.; Peng, Z.; Liu, Z.; Wang, L.; Yi, L.; Xu, J.; Jin, H.; Chen, B.; et al. Interpretable machine learning for predicting and evaluating hydrogen production from supercritical water gasification of coal. Fuel 2026, 404, 136173.
72. Jiang, Y.; Yang, X.; Ma, H. Modelling the mechanism of sulphur evolution in the coal combustion process: The effect of sulphur–nitrogen interactions and excess air coefficients. Processes 2023, 11, 1518.
73. Fan, L.; Meng, X.; Zhao, J.; Zhou, Y.; Chu, R.; Yu, S.; Li, W.; Wu, G.; Jiang, X.; Miao, Z. Reaction Site Evolution during Low-Temperature Oxidation of Low-Rank Coal. Fuel 2022, 327, 125195.
74. Yin, C.-Y. Prediction of Higher Heating Values of Biomass from Proximate and Ultimate Analyses. Fuel 2011, 90, 1128–1132.
75. Paredes-Sánchez, J.P.; Gutiérrez-Trashorras, A.J.; Xiberta-Bernat, J. Wood Residue to Energy from Forests in the Central Metropolitan Area of Asturias (NW Spain). Urban. For. Urban Green. 2015, 14, 195–199.

Figure 1. Main coal types and the process of their transformation.

Figure 2. Process diagram showing the experimental methodology.

Figure 3. A more detailed scheme of how XGBoost works in a regression scenario.

Figure 4. An illustration of a random forest regression process.

Figure 5. An illustration of an M5 tree model.

Figure 6. An illustration of a linear regression model.

Figure 7. Process diagram for the DE/XGBoost-based technique.

Figure 8. Correlation matrix for all variables studied in this process.

Figure 9. First-order terms of the coal HHV obtained with the DE/XGBoost technique: (a) coal HHV as a function of the carbon content; (b) coal HHV as a function of the hydrogen content; (c) coal HHV as a function of the nitrogen content.

Figure 10. Second-order terms of the coal HHV obtained with the DE/XGBoost technique for the three most important independent variables: (a) coal HHV as a function of the hydrogen and carbon concentrations; (b) coal HHV as a function of the nitrogen and carbon concentrations; (c) coal HHV as a function of the hydrogen and nitrogen concentrations.

Figure 11. SHAP values for predicting the coal HHV.

Figure 12. Relevance ranking of the process variables used in the best-fit DE/XGBoost-based approach for predicting the coal HHV, taking into account the SHAP criterion.

Figure 13. The coal HHV’s observed and predicted values for the test dataset utilizing the following: (a) Given et al.’s empirical correlation (F6) (R² = 0.2263); (b) multivariate linear regression (MLR) model (R² = 0.8201); (c) M5 model tree (R² = 0.9062); (d) DE/RFR-based model (R² = 0.9567); (e) DE/XGBoost-based model (R² = 0.9691).

Table 1. Physicochemical variables considered in this study.

Input Variables (wt%)	Symbol	Mean	Standard Deviation
Carbon content	C	78.85	8.11
Hydrogen content	H	5.01	0.95
Oxygen content	O	13.13	7.95
Nitrogen content	N	1.30	0.43
Sulfur content	S	1.72	1.88
Output variable
Higher heating value (MJ/kg)	HHV	30.84	4.03

In this Table, wt% refers to weight percentage.

Table 2. Intervals of variation for the five parameters of the DE/XGBoost-based technique that were fitted for this investigation.

XGBoost Hyperparameters	Lower Limit	Upper Limit
Learning rate	0.01	0.3
Max_depth	3	10
subsample	0.5	1.0
Colsample_bytree	0.5	1.0
n_estimators	100	1000
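
The bounds in Table 2 define a five-dimensional box that the DE optimizer searches. The snippet below is a minimal sketch of that idea, not the authors' implementation (the reference list cites the R package DEoptim). It assumes a basic DE/rand/1/bin scheme. The objective `toy_objective`, its target values, and the population size, mutation factor `f`, crossover rate `cr`, and number of generations are all hypothetical placeholders. In a real application the objective would be a cross-validated error of the XGBoost model.

```python
import random

# Search bounds (lower, upper) copied from Table 2.
BOUNDS = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "n_estimators": (100, 1000),
}
INTEGER_PARAMS = {"max_depth", "n_estimators"}


def decode(vector):
    """Map a real-valued DE vector to a hyperparameter dict (integers rounded)."""
    params = {}
    for value, name in zip(vector, BOUNDS):
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params


def toy_objective(params):
    """Hypothetical stand-in for a cross-validated error; NOT the paper's objective."""
    target = {"learning_rate": 0.1, "max_depth": 6, "subsample": 0.8,
              "colsample_bytree": 0.8, "n_estimators": 500}
    return sum(((params[k] - target[k]) / (BOUNDS[k][1] - BOUNDS[k][0])) ** 2
               for k in BOUNDS)


def differential_evolution(objective, pop_size=20, f=0.8, cr=0.9,
                           generations=60, seed=0):
    """Basic DE/rand/1/bin minimizer restricted to the box defined by BOUNDS."""
    rng = random.Random(seed)
    limits = list(BOUNDS.values())
    dim = len(limits)
    pop = [[rng.uniform(lo, hi) for lo, hi in limits] for _ in range(pop_size)]
    costs = [objective(decode(ind)) for ind in pop]
    initial_best = min(costs)
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(limits):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(decode(trial))
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return decode(pop[best]), costs[best], initial_best


best_params, best_cost, initial_best = differential_evolution(toy_objective)
print(best_params, best_cost)
```

Because selection is greedy, the best cost can never get worse from one generation to the next. Rounding `max_depth` and `n_estimators` inside `decode` is one simple way to handle integer hyperparameters in a continuous optimizer.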

Table 3. Intervals of variation for the four parameters of the DE/RFR-based technique that were fitted for this investigation.

RFR Hyperparameters	Lower Limit	Upper Limit
nodesize	1	10
ntree	10	300
nPerm	1	10
mtry	1	5

Table 4. The best-fitted DE/XGBoost approach’s optimal hyperparameters for coal higher heating value (HHV) forecasting.

Parameter	Values of Optimal Hyperparameters
Learning rate	0.0702
Max_depth	5
subsample	0.5077
Colsample_bytree	0.9512
n_estimators	864

Table 5. The best-fitted DE/RFR approach’s optimal hyperparameters for coal higher heating value (HHV) forecasting.

Parameter	Values of Optimal Hyperparameters
nodesize	3
ntree	45
nPerm	3
mtry	4

Table 6. Representative empirical correlations from the literature for estimating coal HHV from ultimate analysis data [9,10,11,12,13,14].

ID	HHV Model Equation ¹	Source
F1	$0.3550 \times C + 0.1331 \times H \times (O - S)$	Reference [9]
F2	$0.1909 \times S - 0.0984 \times O + 0.3403 \times C + 1.2432 \times H - 0.0628 \times N$	Reference [9]
F3	$(0.3333 \times C + H - O - 0.1250 \times S) \times (0.9875 + 0.0152 \times H)$	Reference [9]
F4	$0.0931 \times S + 0.3391 \times C - 0.1237 \times O + 1.4357 \times H$	Reference [9]
F5	$0.0941 \times S + 0.3360 \times C - 0.1530 \times O + 0.00072 \times O^{2} + 1.4180 \times H$	References [10,11]
F6	$0.0941 \times S + 0.3360 \times C - 0.1450 \times O + 1.4180 \times H$	Reference [12]
F7	$0.3850 \times C - 0.1100$	Reference [13]
F8	$0.2830 \times S + 0.4310 \times C + 0.6450 \times N + 0.367 \times H - 4.5420$	Reference [14]

¹ For consistency, all the literature equations are presented using a unified notation. Elemental variables denote mass percentages of the corresponding elements, and HHV is expressed in MJ kg⁻¹.
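
As a worked illustration of how the correlations in Table 6 are applied, the following sketch evaluates F6 at the mean composition reported in Table 1. The function name `hhv_f6` and the use of the mean composition as input are illustrative assumptions. The coefficients are those listed for F6, with inputs in wt% and the result in MJ/kg.

```python
def hhv_f6(c, h, o, s):
    """Empirical correlation F6 from Table 6; inputs in wt%, output in MJ/kg."""
    return 0.0941 * s + 0.3360 * c - 0.1450 * o + 1.4180 * h


# Mean elemental composition from Table 1 (wt%), used only as an example input.
mean_composition = {"c": 78.85, "h": 5.01, "o": 13.13, "s": 1.72}

estimate = hhv_f6(**mean_composition)
print(f"F6 estimate at the mean composition: {estimate:.2f} MJ/kg")
```

This gives about 31.86 MJ/kg, against a mean measured HHV of 30.84 MJ/kg in Table 1. A correlation evaluated at the mean composition need not reproduce the mean HHV, so this single point says nothing about its accuracy across the test set.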

Table 7. Correlation coefficients (r), coefficients of determination (R²), root mean square error (RMSE) and mean absolute error (MAE) for the DE/XGBoost, DE/RFR, M5 tree and multivariate linear models and the best empirical formula with the test data.

Model	R²	r	RMSE	MAE
XGBoost	0.9691	0.9858	0.2978	0.1897
RFR	0.9567	0.9802	0.3520	0.2019
M5 model tree	0.9062	0.9524	0.5186	0.3623
MLR	0.8201	0.9077	0.7182	0.5353
F6	0.2263	0.6989	1.4894	1.1400
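
The four goodness-of-fit statistics in Table 7 can be computed as in the sketch below. It assumes the usual definitions: R² = 1 − SS_err/SS_tot (using the sums of squares named in the abbreviation list) and r as the Pearson correlation between observed and predicted values. The function name `regression_metrics` and the sample data are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, Pearson r, RMSE and MAE for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    ss_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual SS
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total SS
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    return {
        "R2": 1.0 - ss_err / ss_tot,
        "r": cov / math.sqrt(ss_tot * ss_pred),
        "RMSE": math.sqrt(ss_err / n),
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical HHV values in MJ/kg, for illustration only.
observed = [28.0, 30.0, 32.0, 34.0]
predicted = [28.5, 29.5, 32.5, 33.5]

metrics = regression_metrics(observed, predicted)
print(metrics)
```

Under these definitions R² is generally not equal to r² for predictions on held-out data. This is consistent with Table 7, where for XGBoost r² ≈ 0.9718 while R² = 0.9691.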

Table 8. Relative importance of the variables for the best-fit DE/XGBoost-based model for the coal HHV forecasting.

Variable	Mean Absolute SHAP Value
Carbon content	0.6069
Hydrogen content	0.5486
Nitrogen content	0.5119
Sulfur content	0.2663

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

García-Nieto, P.J.; García-Gonzalo, E.; Paredes-Sánchez, J.P.; Menéndez-García, L.A. Interpretable Optimized Extreme Gradient Boosting for Prediction of Higher Heating Value from Elemental Composition of Coal Resource to Energy Conversion. Big Data Cogn. Comput. 2026, 10, 112. https://doi.org/10.3390/bdcc10040112
