A Novel Data-Driven Method to Estimate Methane Adsorption Isotherm on Coals Using the Gradient Boosting Decision Tree: A Case Study in the Qinshui Basin, China

Abstract: The accurate determination of methane adsorption isotherms in coals is crucial for both the evaluation of underground coalbed methane (CBM) reserves and the design of development strategies for enhancing CBM recovery. However, the experimental measurement of high-pressure methane adsorption isotherms is extremely tedious and time-consuming. This paper proposes the use of an ensemble machine learning (ML) method, namely the gradient boosting decision tree (GBDT), to accurately estimate methane adsorption isotherms based on coal properties in the Qinshui basin, China. The GBDT method was trained to correlate the adsorption amount with coal properties (ash, fixed carbon, moisture, vitrinite, and vitrinite reflectance) and experimental conditions (pressure, equilibrium moisture, and temperature). The results show that the estimated adsorption amounts agree well with the experimental ones, which demonstrates the accuracy and robustness of the GBDT method. A comparison of the GBDT with two commonly used ML methods, namely the artificial neural network (ANN) and support vector machine (SVM), confirms the superiority of GBDT in terms of generalization capability and robustness. Furthermore, relative importance scanning and univariate analysis based on the constructed GBDT model were conducted, which showed that the fixed carbon and ash contents are the primary factors that significantly affect the adsorption isotherms for the coal samples in this study.
Laboratory tests, including proximate analysis, maceral group identification, vitrinite reflectance determination, and adsorption isotherm measurements, were conducted on 165 coal samples retrieved from the Qinshui basin in China in order to develop a database for regression. It has been demonstrated that the GBDT is capable of not only reproducing the adsorption isotherms with reasonable accuracy, but also properly recovering the underlying relation between the input and output variables. By comparison, the BP-ANN suffers from over-fitting, whereas the SVM has difficulties in accurately estimating the adsorption isotherms in both the training and testing stages. These observations confirm the superiority of the GBDT over the other ML tools in solving the specific regression problem in this study. Furthermore, the relative importance scanning and univariate analysis based on the constructed GBDT model showed that the adsorption isotherms are primarily controlled by the fixed carbon and ash contents for the coals investigated in this study. Other factors, including vitrinite, inherent and equilibrium moistures, vitrinite reflectance, and temperature, exert minor or even negligible effects on the adsorption isotherm.


Introduction
As an unconventional hydrocarbon resource, coalbed methane (CBM) has been unlocked for commercial development in the USA, China, Australia, Canada, and India [1]. The recovery of CBM from coal seams has multiple favorable effects, such as the reduction of greenhouse gas release into the atmosphere, enhancement of underground coal mining safety, and addition to natural gas supply [2,3]. It is commonly believed that the majority of methane exists within coal seams via physical adsorption [4,5]. The accurate characterization of methane adsorption isotherms in coals is crucial

Samples and Experiments
A total of 165 coal samples were acquired using the downhole coring technique from 72 CBM wellbores in the No. 3 and No. 5 coal seams. After being transported to the laboratory in sealed tanks, the coal samples were subjected to proximate analysis, vitrinite reflectance, and adsorption isotherm measurements in order to develop a database for machine learning. Proximate analysis was conducted following the Chinese standard GB/T 212-2008 [26]. The maceral group was identified at 50× magnification under plane polarized reflected light with a fluorescence illuminator, following the Chinese standard GB/T 8899-2013 [27]. Vitrinite reflectance (Romax) was measured according to the Chinese standard GB/T 6948-2008 [28] at a magnification of 500× under oil immersion. More details on the analysis procedure are given in [29]. Methane adsorption isotherms were measured on 60-80 mesh moisture-equilibrated coal powders using the manometric method [7]. For each sample, the experimental temperature was set to be identical to the in situ temperature at which the sample was retrieved. Each adsorption isotherm comprises eight equilibrium pressures (ranging from ≈0.5 to ≈8.5 MPa) with the corresponding adsorption amounts, which results in a total of 8 × 165 = 1320 data points in the database. Table 1 summarizes the experimental data for the samples.

Basics of GBDT
The basic philosophy behind the GBDT is to use an ensemble of classification and regression trees (CARTs) to fit the training data samples through minimizing a regularized objective function. Each CART is comprised of a number of leaf nodes and each leaf node is associated with a binary decision rule structure and a continuous score. In GBDT, a number of CARTs are developed in a sequential manner in order to form an accurate ensemble model. For the completeness of this paper, the GBDT algorithm is briefly addressed, as follows. Readers are referred to [20,21] for more details on the GBDT algorithm.
For a given data set with $d$ dimensions and $n$ examples, $D = \{(x_i, y_i)\}$, $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, $i = 1, 2, \ldots, n$, the output $F$ is predicted as a sum of additive functions, which is written as

$$F(x) = \sum_{m=0}^{M} \beta_m \, h\left(x; \{R_{lm}\}_{l=1}^{L}\right)$$

where $h$ represents a tree with $L$ nodes; $R_{lm}$ represents the partitioned region defined by the terminal node $l$ of the $m$th tree; and $\{\beta_m\}_{0}^{M}$ are expansion coefficients that are jointly fit with $\{R_{lm}\}_{l=1}^{L}$ to the training data set by minimizing a regularized objective function:

$$\left(\beta_m, \{R_{lm}\}_{l=1}^{L}\right) = \arg\min_{\beta, \{R_l\}_{l=1}^{L}} \sum_{i=1}^{n} \psi\left(y_i, F_{m-1}(x_i) + \beta \, h\left(x_i; \{R_l\}_{l=1}^{L}\right)\right)$$

where $\psi$ is a differentiable loss function, which was assigned as the squared error in this study. The minimization of the loss function is achieved by iteratively adding leaf nodes that result in the steepest descent [21], which is mathematically expressed as:

$$F_m(x) = F_{m-1}(x) + \upsilon \, \beta_m \, h\left(x; \{R_{lm}\}_{l=1}^{L}\right)$$

where $\upsilon$ is the shrinkage factor in the range of (0, 1] that controls the learning rate of the training process. Empirically, small values of $\upsilon$ make the model more conservative and thus help to increase its generalization capability [22].
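The stagewise procedure described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this study: it assumes depth-1 trees (stumps), a squared-error loss, and a small synthetic Langmuir-shaped data set, and all names (`Stump`, `fit_stump`, `fit_gbdt`, `GBDT`) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one binary decision rule and two leaf scores."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


@dataclass
class GBDT:
    f0: float            # initial constant model F_0
    shrinkage: float     # shrinkage factor in (0, 1]
    trees: List[Stump]

    def predict(self, x: Sequence[float]) -> float:
        return self.f0 + self.shrinkage * sum(t.predict(x) for t in self.trees)


def fit_stump(X: List[Sequence[float]], residuals: List[float]) -> Stump:
    """Choose the single split that minimizes the squared error of the residuals."""
    mean_all = sum(residuals) / len(residuals)
    best = Stump(0, float("inf"), mean_all, mean_all)  # fallback: no split at all
    best_sse = sum((r - mean_all) ** 2 for r in residuals)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = Stump(j, thr, ml, mr), sse
    return best


def fit_gbdt(X, y, n_trees: int = 200, shrinkage: float = 0.1) -> GBDT:
    """Add trees one at a time, each fitted to the residuals of the current model."""
    f0 = sum(y) / len(y)
    preds = [f0] * len(y)
    trees: List[Stump] = []
    for _ in range(n_trees):
        # For the squared-error loss the negative gradient is proportional to y - F.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        tree = fit_stump(X, residuals)
        trees.append(tree)
        preds = [p + shrinkage * tree.predict(row) for p, row in zip(preds, X)]
    return GBDT(f0, shrinkage, trees)


def rmse(y, f) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


# Synthetic, Langmuir-shaped toy data (NOT the data of this study).
X = [[0.5 + k] for k in range(9)]            # pressure, MPa
y = [30.0 * row[0] / (2.0 + row[0]) for row in X]
model = fit_gbdt(X, y, n_trees=200, shrinkage=0.1)
```

Smaller shrinkage values make each tree contribute less, so more trees are needed to reach the same training error; this trade-off is why the shrinkage factor and the number of trees are tuned together.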

Input Features
A key first step in constructing a reliable regression model is to properly identify the input features (or independent variables) [30]. A popular method for identifying the input features is to conduct univariate correlation regression and feed the features that have a high degree of correlation with the output (e.g., a high correlation coefficient) into the estimation model [31,32]. The primary drawback of this method is that features with weak, but real, underlying correlations with the output may be excluded from the model, which tends to decrease the modeling accuracy. Beker et al. [33] argued that all features with either explicit (strong) or implicit (weak) correlations with the output variables should be included in a machine learning model in order to attain high modeling accuracy. In this regard, we assigned as model inputs those features that have been demonstrated empirically to exert potential effects on the adsorption amount and that are less expensive and faster to measure experimentally than the adsorption isotherm. Section 3 will discuss the effect of including these "less significant" features on the model accuracy.
In this study, the adsorption isotherm is represented with a series of discrete (adsorption amount versus equilibrium pressure) data points (Figure 3). Thus, the estimation of the adsorption isotherm is, in fact, transformed to the estimation of adsorption amounts at given equilibrium pressures. In this way, the equilibrium pressure is an essential input variable for the construction of the estimation model. An alternative option would be to use an adsorption model (e.g., the Langmuir type model) to represent the isotherm and then correlate the adsorption model parameters (e.g., Langmuir volume and Langmuir pressure) with certain input features. However, in our preliminary evaluation this alternative failed to accurately reproduce the adsorption isotherm, which is probably due to the weak correlation of the Langmuir pressure with input features, such as coal properties and experimental conditions, as mentioned earlier.
Energies 2020, 13, 5369
For the coal samples used in this study, the coal properties that exhibit strong control on methane adsorption capacity are ash (Figure 4a) and fixed carbon (Figure 4b) (with R² ≥ 0.6), which are therefore assigned as input features. The vitrinite reflectance (Figure 4c) exhibits a generally linear positive effect on the adsorption, although the correlation is relatively loose (R² = 0.36); it is also included in the input feature bank. Other factors, including inherent and equilibrium moisture, vitrinite content, and experimental temperature, which show weak correlations (with R² ≤ 0.1) with the adsorption capacity (Figure 4d through 4g), are also included in the model construction given the numerous reports of their potential effect on the adsorption isotherm (e.g., [1,4,15-18]).
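The univariate screening described above relies on the R² of a straight-line fit between one coal property and the adsorption capacity. A minimal sketch of that calculation follows, assuming an ordinary least-squares line (for which R² equals the squared Pearson correlation); the function name `linear_r2` and the numbers in the usage lines are hypothetical and are not taken from the database of this study.

```python
from typing import Sequence


def linear_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a least-squares straight-line fit of y on x.

    For simple linear regression this equals the squared Pearson
    correlation coefficient between x and y.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant column carries no linear information
    return (sxy * sxy) / (sxx * syy)


# Hypothetical values, for illustration only.
perfectly_linear = linear_r2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
uncorrelated = linear_r2([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
```

A low R² from this screen only rules out a strong linear trend; it does not rule out a nonlinear or interaction effect, which is the argument made above for keeping the weakly correlated features.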
As mentioned earlier, our goal is to develop an estimation model based on data that are less expensive and less time-consuming to obtain, so that the adsorption isotherms can be estimated rapidly with reasonable accuracy. Therefore, other coal properties that may influence methane adsorption isotherms, such as the micro-pore surface area/volume [34-36] and surface functional groups [13,37] of coals, are not considered, because obtaining such information requires experimental work that inevitably brings additional expense. Besides, the experimental determination of pore characteristics is rather complicated: techniques such as gas (N2/CO2) adsorption [38], focused ion beam scanning electron microscopy (FIB-SEM, [39]), and small-angle neutron scattering (SANS, [40]) require special experimental apparatus and may be even more time-consuming than the measurement of adsorption isotherms.
As a short summary, the input features for constructing the estimation model for the adsorption isotherm are: coal properties (including ash, fixed carbon, inherent moisture, vitrinite, and vitrinite reflectance) and experimental conditions (equilibrium pressure, equilibrium moisture, and temperature).

Determination of Optimal GBDT Hyperparameters
Prior to conducting GBDT regressions, the whole database comprising 165 samples and adsorption isotherms was randomly divided into three subsets, namely the training (99 samples, 60%), validation (33 samples, 20%), and testing (33 samples, 20%) sets (Figure 5). The training set was used for training the GBDT model, while the validation set was used for monitoring the performance and for determining the optimal model parameters (as addressed in the following paragraph). The testing set was treated as "unseen" during the model construction process and was used for testing the generalization capability of the constructed regression model. It should be noted again that each adsorption isotherm is represented with eight discrete (adsorption amount versus equilibrium pressure) data points, and thus the training, validation, and testing sets consist, in effect, of 99 × 8 = 792, 33 × 8 = 264, and 33 × 8 = 264 data points, respectively (Figure 5).
Figure 3. An example of the experimentally measured adsorption isotherm represented with discrete equilibrium points. P i and Y i represent the equilibrium pressure and the corresponding adsorption amount for the ith equilibrium point.
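The sample-level split described above, followed by expansion into individual data points, can be sketched as follows. The sketch assumes that samples are identified by integer indices and that the shuffle uses a fixed seed; the names `split_samples` and `expand_to_points` and the seed value are hypothetical.

```python
import random
from typing import List, Tuple

N_SAMPLES = 165
POINTS_PER_ISOTHERM = 8


def split_samples(n_samples: int, n_train: int, n_val: int, seed: int = 0):
    """Shuffle sample indices and cut them into training/validation/testing lists."""
    ids = list(range(n_samples))
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def expand_to_points(sample_ids: List[int]) -> List[Tuple[int, int]]:
    """Each sample contributes one (sample, equilibrium point) pair per pressure step."""
    return [(sid, k) for sid in sample_ids for k in range(POINTS_PER_ISOTHERM)]


train_ids, val_ids, test_ids = split_samples(N_SAMPLES, n_train=99, n_val=33)
train_pts = expand_to_points(train_ids)
val_pts = expand_to_points(val_ids)
test_pts = expand_to_points(test_ids)
```

Splitting by sample rather than by data point keeps all eight points of one isotherm in the same subset, so no isotherm in the testing set has been partly seen during training.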
The empirical results from [41,42] demonstrate that the accuracy and generalization capability of the GBDT can be significantly influenced by three parameters, namely the number of estimators, the shrinkage factor, and the maximal tree depth. As such, these parameters should be optimized in order to ensure the accuracy and robustness of the GBDT. In this study, the optimal values for the three parameters were determined through the exhaustive grid search method [43]. That is, all possible combinations of the parameter values were run sequentially, and the optimal parameterization was taken to be the one that results in the lowest root mean squared error (RMSE) for the validation set. Previous studies [41,42] suggested that a satisfactory performance of GBDT can be obtained with relatively small shrinkage factors (< 0.1) and low-level tree complexity (with tree depth < 6). As such, the shrinkage factor was varied from 0.005 to 0.105 with a step of 0.01, and the maximum tree depth was varied from two to seven with a step of 1 in this study. The optimal number of trees is a problem-dependent hyperparameter, which was set to vary from 500 to 5000 with a step of 500.
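The grid described above can be enumerated explicitly. The following sketch builds the three parameter ranges as stated in the text and selects the combination with the lowest validation RMSE; the scoring callable `validation_rmse` is a placeholder assumption (in practice it would train a GBDT and evaluate it on the validation set), and `toy_score` is a made-up stand-in used only to exercise the loop.

```python
from itertools import product
from typing import Callable, Tuple

# Ranges as stated in the text.
shrinkage_grid = [round(0.005 + 0.01 * k, 3) for k in range(11)]   # 0.005 ... 0.105
depth_grid = list(range(2, 8))                                     # 2 ... 7
n_trees_grid = list(range(500, 5001, 500))                         # 500 ... 5000

grid = list(product(shrinkage_grid, depth_grid, n_trees_grid))


def grid_search(validation_rmse: Callable[[float, int, int], float]) -> Tuple[float, float, int, int]:
    """Return (best RMSE, shrinkage, depth, number of trees) over the whole grid."""
    best = None
    for shrinkage, depth, n_trees in grid:
        score = validation_rmse(shrinkage, depth, n_trees)
        if best is None or score < best[0]:
            best = (score, shrinkage, depth, n_trees)
    return best


def toy_score(shrinkage: float, depth: int, n_trees: int) -> float:
    """Made-up score surface with a single minimum; NOT a real validation RMSE."""
    return abs(shrinkage - 0.015) + abs(depth - 3) + abs(n_trees - 1500) / 1000.0


best = grid_search(toy_score)
```

With 11 shrinkage values, 6 depths, and 10 tree counts, the exhaustive search covers 660 combinations, each requiring one full model fit.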

Evaluation Metrics
The performance of the GBDT estimation was quantitatively evaluated through four metrics, namely the average absolute error (AAE), average relative error (ARE), root mean square error (RMSE), and determination coefficient (R²). The definitions for these metrics are:

$$\mathrm{AAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - f_i \right|$$

$$\mathrm{ARE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| y_i - f_i \right|}{y_i} \times 100\%$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - f_i \right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{N} \left( y_i - f_i \right)^2}{\sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2}$$

where $y$ and $f$ are the measured and estimated adsorption amounts, respectively; $\bar{y}$ is the mean value of the measured adsorption amount; and $N$ is the number of data points.
Figure 5. Illustration of the database structure and division of the database into the training, validation, and testing sets. P: equilibrium pressure; A: ash; FC: fixed carbon; V: vitrinite; Romax: vitrinite reflectance; IM: inherent moisture; EM: equilibrium moisture; T: temperature; Y: adsorption amount. Subscript j denotes the jth sample; superscript i on "P" and "Y" denotes the ith equilibrium point on the adsorption isotherm.
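The four metrics can be computed directly from paired lists of measured and estimated adsorption amounts. The sketch below assumes the common definitions (mean absolute deviation for AAE, mean of |y − f| / y expressed in percent for ARE, and R² as one minus the ratio of residual to total sum of squares); the function name and the three-point example are hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluation_metrics(y: Sequence[float], f: Sequence[float]) -> Dict[str, float]:
    """AAE, ARE (in percent), RMSE and R^2 for measured y and estimated f."""
    n = len(y)
    abs_err = [abs(a - b) for a, b in zip(y, f)]
    sq_err = [(a - b) ** 2 for a, b in zip(y, f)]
    y_mean = sum(y) / n
    return {
        "AAE": sum(abs_err) / n,
        "ARE": 100.0 * sum(e / a for e, a in zip(abs_err, y)) / n,  # requires y > 0
        "RMSE": math.sqrt(sum(sq_err) / n),
        "R2": 1.0 - sum(sq_err) / sum((a - y_mean) ** 2 for a in y),
    }


# Made-up numbers, for illustration only.
metrics = evaluation_metrics([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
```

AAE and RMSE carry the unit of the adsorption amount (m³/t in this paper), whereas ARE and R² are dimensionless, which is why the four are reported together.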

Comparison with BP-ANN and SVM
The BP-ANN and SVM are powerful supervised machine learning algorithms that have been successfully applied to nonlinear regression problems in a variety of fields [32,44-46]. A popular version of the BP-ANN is the multilayer perceptron network (MLPN), which comprises one input layer, one or more hidden layers, and one output layer. The training of an MLPN is, in essence, an iterative process of updating the weights and biases of the nodes by using the back propagation algorithm in order to minimize an error function. The basic philosophy behind the SVM is to convert the nonlinear regression problem in the original input space into a linear approximation in a higher dimensional feature space by minimizing a regularized loss function. Mathematical details on the BP-ANN and SVM have been extensively addressed previously (see e.g., [24,39]) and are therefore not repeated in this paper.
The LIBSVM package [47] and the neural network module implemented in Matlab (V2019) were used for conducting the SVM and BP-ANN regressions, respectively. The data points and input variables are identical to those in the GBDT regression. A BP-ANN with three layers (one input, one hidden, and one output layer) has been shown to be capable of approximating any continuous function to arbitrary accuracy [32] and was therefore adopted in this study. It should be noted that (i) the performance of a BP-ANN can be significantly influenced by the number of neurons in the hidden layer [44] and (ii) for an SVM with a radial basis function (RBF) kernel, which is most frequently used for regression, the regression accuracy depends on the regularization and error goal parameters [48,49]. In order to attain a fair comparison, the parameters that may affect the BP-ANN and SVM performance were tuned and optimized using the exhaustive grid search, in a similar manner to the GBDT. Table 2 shows the optimal key model parameters for the BP-ANN and SVM.

Performance of the GBDT Estimation Model
The optimal hyperparameter values for the GBDT, as determined with the exhaustive grid search method, were 0.01 for the shrinkage factor, 3 for the tree depth, and 1500 for the number of trees. Figure 6 depicts the GBDT estimation results for the training, validation, and testing sets. Figure 6a shows that all of the training data points are grouped closely around the 45-degree line. Table 3 gives very low error metrics (an AAE of 0.33 m³/t, an ARE of 2.31%, and an RMSE of 0.42 m³/t) and a high R² of 0.993 for the training set. These evaluation metrics indicate that the GBDT is capable of accurately reproducing the adsorption amount based on the input variables. For the validation and testing sets, although the cross plots of the measured versus estimated values show a more scattered pattern than for the training set, the majority of the data points are distributed around the 45-degree line and the deviations are within small ranges (Figure 6b,c). The AAE, ARE, RMSE, and determination coefficient (R²) are calculated to be 0.83 m³/t, 5.97%, 1.00 m³/t, and 0.950 for the validation set, and 0.85 m³/t, 6.35%, 1.06 m³/t, and 0.946 for the testing set, respectively. The error metrics for the validation and testing sets are quite comparable, suggesting strong robustness of the constructed model (Table 3). In this regard, the GBDT model can be considered to have a strong generalization capability, as indicated by the relatively low error metrics and high R².
To further demonstrate the accuracy of the GBDT model in reproducing the adsorption isotherm of an individual coal sample, the estimated and measured adsorption isotherms were compared for typical samples in the testing set. As mentioned in Section 2.4.1, the methane adsorption capacity of the coal samples is predominantly controlled by the ash and fixed carbon contents. Therefore, four typical samples were selected from the testing set to illustrate the model accuracy: the samples with the highest ash content, the lowest ash content, the highest fixed carbon content, and the lowest fixed carbon content.
As can be seen from Figure 7, for the two samples with respective ash contents of 9.6% and 39.96% and the sample with low fixed carbon content (83.88%), the estimated adsorption isotherms are in excellent agreement with the measured ones. For the sample with high fixed carbon content (91.54%), the estimated equilibrium points at lower pressures (≤≈4.0 MPa) agree well with the measured ones, whereas certain deviations exist for the equilibrium points at higher pressures (>≈4.0 MPa). The maximum error occurs at an equilibrium pressure of ≈8.0 MPa, with the estimated and measured adsorption amounts being 23.71 and 25.23 m³/t, respectively. Such a discrepancy can be considered acceptable given the uncertainties associated with sample preparation, data acquisition, and measurement operations [12]. Previous reproducibility tests [50,51] showed that discrepancies between repeated adsorption isotherm measurements on the same coal sample may reach 10-15%, which is even higher than the GBDT estimation errors. It should also be pointed out that the estimated adsorption amount increases monotonically with increasing pressure (a basic characteristic of methane adsorption isotherms on coals), although no specific constraint was applied in the training process to enforce such monotonicity. These results confirm the reliability of the constructed GBDT model in estimating the methane adsorption isotherms on coals with reasonable accuracy.
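As a quick check of the largest deviation quoted for the high-fixed-carbon sample (23.71 m³/t estimated versus 25.23 m³/t measured), the relative error works out to about 6%, which is below the 10-15% reproducibility range cited from [50,51]. The sketch below performs that arithmetic and also shows one simple way to test an estimated isotherm for monotonicity; the helper names are hypothetical, and the example isotherm values are invented for illustration.

```python
from typing import Sequence


def relative_error_percent(measured: float, estimated: float) -> float:
    """Absolute deviation expressed as a percentage of the measured value."""
    return 100.0 * abs(estimated - measured) / measured


def is_monotonic_increasing(values: Sequence[float]) -> bool:
    """True if each adsorption amount is at least as large as the previous one."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Values quoted in the text for the point at about 8.0 MPa.
max_error = relative_error_percent(measured=25.23, estimated=23.71)  # about 6.0

# Invented isotherm (m^3/t at increasing pressure), for illustration only.
example_isotherm = [6.1, 12.4, 16.0, 18.7, 20.5, 21.9, 22.8, 23.7]
monotonic = is_monotonic_increasing(example_isotherm)
```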

Comparison with BP-ANN and SVM
adsorption amounts being 23.71 and 25.23 m³/t, respectively. We consider this discrepancy acceptable given the uncertainties associated with sample preparation, data acquisition, and measurement operations [12]. Previous reproducibility tests [50,51] showed that discrepancies in adsorption isotherm measurements on the same coal sample may reach 10%-15%, which is even higher than the GBDT estimation error. It should also be pointed out that the estimated adsorption amount increases monotonically with pressure (a basic characteristic of methane adsorption isotherms on coals), although no specific constraint was applied in the training process to enforce such monotonicity. These results confirm the reliability of the constructed GBDT model in estimating methane adsorption isotherms on coals with reasonable accuracy.

Comparison with BP-ANN and SVM
Figure 8 shows the cross plots of BP-ANN-estimated versus measured adsorption amounts for the training, validation, and testing sets. For the training set, the data points generally lie on the 45-degree line (Figure 8a), which suggests that the BP-ANN has an extraordinary capability to correlate the output with the input variables. Table 3 shows that the BP-ANN outperforms the GBDT in terms of the error metrics for the training set. However, Figure 8b,c show that a noticeable number of data points deviate severely from the 45-degree line for both the validation and testing sets, resulting in higher errors (AAE, ARE, and RMSE) and a lower R² than the GBDT (Table 3). These observations suggest that the generalization capability of the BP-ANN is highly questionable and that severe over-fitting occurs. As such, the BP-ANN should not be considered suitable for accurately estimating the adsorption isotherms.

Figure 9 depicts the estimation results of the SVM regression. A noticeable number of data points deviate severely from the 45-degree line for the training, validation, and testing sets. Thus, the SVM is capable neither of accurately learning the underlying correlations between the output and input variables nor of giving reasonable predictions. Comparison of the evaluation metrics for the SVM with those for the GBDT and BP-ANN (Table 3) suggests that the SVM generalizes better than the BP-ANN, but performs worse than the GBDT.
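The comparison above rests on four error metrics evaluated separately on the training, validation, and testing sets. As a minimal sketch, they could be computed as follows. The definitions of AAE and ARE used here (mean absolute error and mean absolute relative error in percent) are assumptions, since they are not restated in this section, and the numbers are illustrative values rather than data from this study.

```python
import math

def regression_metrics(measured, estimated):
    """Return AAE, ARE (%), RMSE, and R^2 for paired measured/estimated values.

    AAE and ARE follow assumed definitions (mean absolute error and
    mean absolute relative error); adjust them if the paper defines them differently.
    """
    n = len(measured)
    errors = [e - m for m, e in zip(measured, estimated)]
    aae = sum(abs(d) for d in errors) / n
    are = 100.0 * sum(abs(d) / abs(m) for d, m in zip(errors, measured)) / n
    ss_res = sum(d * d for d in errors)
    rmse = math.sqrt(ss_res / n)
    mean = sum(measured) / n
    ss_tot = sum((m - mean) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    return {"AAE": aae, "ARE": are, "RMSE": rmse, "R2": r2}

# Illustrative adsorption amounts only (not measurements from this study).
measured = [10.0, 20.0, 30.0]
estimated = [11.0, 19.0, 33.0]
metrics = regression_metrics(measured, estimated)
print(metrics)
```

Computing the same dictionary for each data split makes the over-fitting pattern described for the BP-ANN easy to spot: low training errors paired with markedly higher validation and testing errors.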

Relative Importance of Input Features
Once the estimation model has been constructed, it is of practical interest to quantify the effect of each input feature on the adsorption isotherm. In this section, the relative importance of each input variable is quantified using the mean decrease impurity importance (MDI) [52,53]. A significant advantage of the MDI over the conventional Pearson or Spearman coefficients is that the MDI does not require an a priori assumption of linear or monotonic dependence of the output on the input features and should, therefore, be more accurate in quantifying the effect of each input feature [54]. Figure 10 shows that pressure, fixed carbon, and ash are the three key factors that control the adsorption amount. The equilibrium moisture has a relative importance of ≈8.8%, while each of the remaining factors (temperature, vitrinite, vitrinite reflectance, and inherent moisture) has a relative importance of less than 3.0%, which suggests that these factors have a very minor or even negligible influence on the adsorption amount. Here, it is noted that the effect of vitrinite reflectance is significantly diluted when compared with the correlation analysis in Section 2.3, which is possibly due to the collinearity between the vitrinite reflectance and fixed carbon for the coal samples (Figure 11). The existence of collinearity may result in an abnormal response of the output to one or several of the collinear inputs [55]. As can be seen from Figure 4b,c, fixed carbon demonstrates an obviously stronger correlation with the adsorption capacity than vitrinite reflectance does; thus, given their collinearity, the effect of the vitrinite reflectance has a high risk of being overridden by that of the fixed carbon.
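To illustrate how an impurity-based importance is accumulated, the sketch below fits a deliberately simplified gradient boosting model made of one-split trees (stumps) and sums, per feature, the reduction in squared error achieved by each split. This is not the model or the implementation used in this study; the function names, the hyperparameters, and the synthetic data are all assumptions chosen for illustration.

```python
import random

def sse(s, s2, n):
    """Sum of squared deviations from the mean, from running sums."""
    return s2 - s * s / n if n else 0.0

def best_stump(X, r):
    """Find the single split that most reduces the squared error of residuals r."""
    n, total, total2 = len(r), sum(r), sum(v * v for v in r)
    parent = sse(total, total2, n)
    best = (0.0, None, None, total / n, total / n)  # gain, feature, threshold, left, right
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = s2 = 0.0
        for k in range(n - 1):
            i = order[k]
            s += r[i]
            s2 += r[i] * r[i]
            if X[order[k]][j] == X[order[k + 1]][j]:
                continue  # cannot split between equal feature values
            gain = parent - sse(s, s2, k + 1) - sse(total - s, total2 - s2, n - k - 1)
            if gain > best[0]:
                thr = 0.5 * (X[order[k]][j] + X[order[k + 1]][j])
                best = (gain, j, thr, s / (k + 1), (total - s) / (n - k - 1))
    return best

def boosted_mdi(X, y, n_rounds=100, lr=0.1):
    """Boost stumps on residuals and accumulate the impurity decrease per feature."""
    pred = [sum(y) / len(y)] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, j, thr, left, right = best_stump(X, residuals)
        if j is None:
            break  # no split reduces the error any further
        importance[j] += gain
        pred = [p + lr * (left if x[j] <= thr else right) for p, x in zip(pred, X)]
    total = sum(importance)
    return [v / total for v in importance] if total else importance

# Synthetic data (hypothetical): feature 0 dominates, feature 2 is pure noise.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(200)]
y = [5.0 * x[0] - 2.0 * x[1] + rng.gauss(0.0, 0.1) for x in X]
importance = boosted_mdi(X, y)
print([round(v, 3) for v in importance])
```

Because the importance is built from split gains rather than from a correlation coefficient, it makes no assumption that the output depends linearly or monotonically on a feature, which is the advantage of the MDI noted above.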

Univariate Analyses
A univariate analysis was conducted using the constructed GBDT model to further demonstrate how the adsorption isotherms are affected by the input features. The base values were set to 20%, 1.0%, 88%, 80%, 2.5%, 15%, and 35°C for ash, inherent moisture, fixed carbon, vitrinite, vitrinite reflectance, equilibrium moisture, and temperature, respectively. These values are approximately the averages listed in Table 1. Each input variable was varied over four values (all within the range covered by the coal samples in this study) while the others were held at their base values, and the corresponding adsorption isotherms (at pressures of 1 to 8 MPa with a step of 1 MPa) were calculated sequentially. The results are shown in Figure 12.
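The procedure described above can be sketched as a simple sweep: one feature is varied while the others stay at their base values, and the isotherm is evaluated at each pressure. In this sketch, `toy_model` is a hypothetical Langmuir-like stand-in for the trained GBDT, its coefficients are arbitrary, and the four fixed carbon values are illustrative. Only the base values and the pressure grid are taken from the text.

```python
def toy_model(features, pressure):
    """Hypothetical stand-in for the trained GBDT (illustrative only)."""
    v_max = 0.4 * features["fixed_carbon"] - 0.3 * features["ash"]
    return v_max * pressure / (2.0 + pressure)

# Base values as stated in the text (percent, except temperature in deg C).
BASE = {
    "ash": 20.0,
    "inherent_moisture": 1.0,
    "fixed_carbon": 88.0,
    "vitrinite": 80.0,
    "vitrinite_reflectance": 2.5,
    "equilibrium_moisture": 15.0,
    "temperature": 35.0,
}
PRESSURES = list(range(1, 9))  # 1 to 8 MPa with a step of 1 MPa

def univariate_sweep(predict, base, feature, values, pressures=PRESSURES):
    """Vary one feature, hold the others at base, and return one isotherm per value."""
    curves = {}
    for value in values:
        point = dict(base, **{feature: value})
        curves[value] = [predict(point, p) for p in pressures]
    return curves

# Four illustrative fixed carbon levels (assumed, not the values used in Figure 12).
curves = univariate_sweep(toy_model, BASE, "fixed_carbon", [80.0, 84.0, 88.0, 92.0])
for value, isotherm in curves.items():
    print(value, [round(v, 2) for v in isotherm])
```

Any fitted regressor can replace `toy_model`, provided it is wrapped in a function that accepts a feature dictionary and a pressure.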
• Fixed carbon
Figure 12a depicts the adsorption isotherms for varying fixed carbon. The isotherm clearly moves upwards as fixed carbon increases. Previous studies [16,56] observed that the methane adsorption capacity first decreases and then increases with increasing fixed carbon, with the minimum occurring at ≈60-80%. This parabolic trend may be attributed to the variations in the micro-pore surface areas that are associated with the coalification jump that occurs approximately in the range of 75-85% fixed carbon [17]. More recently, Chattaraj et al. [1] showed that, for Indian coals with a fixed carbon content of >75%, the methane adsorption capacity is in positive linear correlation with fixed carbon. The coal samples used in this study have generally high fixed carbon contents of >77%, which suggests that our findings are in line with these previous studies.
• Ash
Figure 12b illustrates that the adsorption isotherm exhibits an obvious negative correlation with ash at all pressures. It is well understood that an increase in ash content tends to lower the adsorption isotherm on coals, because (i) ash has no affinity for methane [7,18] and (ii) ash-rich samples are generally associated with lower microporosities [57] and, therefore, provide less adsorption space to accommodate gas molecules.
• Moisture
As shown in Figure 12c,d, the variations in adsorption isotherms caused by inherent and equilibrium moisture are obviously less significant than those caused by fixed carbon or ash, which is consistent with the ranking of relative importance. Additionally, the adsorption capacity does not decrease monotonically with increasing inherent or equilibrium moisture. It has been extensively reported in previous studies [14,58,59] that a coal sample in the moisture-equilibrated state has a significantly lower adsorption capacity than in the dry state. This is because water molecules occupy some of the adsorption sites on the coal surface, as coals have a preferential affinity for water over methane [7]. However, for a coal sample that is already in a moisture-equilibrated state, a further increase in moisture content does not affect the gas adsorption capacity [14,60]. Besides, as stated in [7,13], the moisture content may be governed by the coal rank. Thus, the effect of moisture content on the adsorption isotherm may be overridden by coal rank indicators, such as fixed carbon, for the coal samples in this study.
• Temperature
Figure 12e shows that there is no significant change in the adsorption isotherm with increasing temperature. Most previous studies [61,62] conclude that an increase in temperature may result in a noticeable reduction in methane adsorption capacity, because the sorptive surface coverage at a specific gas pressure decreases with increasing temperature, as derived from thermodynamics [7]. However, the experiments of Crosdale et al. [60] on moist coals showed no significant dependence of adsorption capacity on temperature. More recently, Guan et al. [63] showed that the adsorption capacities for both methane and CO₂ remained constant as the temperature was raised from 323 to 343 K. Our observations are consistent with [60], which may be attributed to the release of water molecules compensating for the reduction in sorptive surface coverage caused by the temperature increase [6].
• Vitrinite
To date, the effect of vitrinite content on the methane adsorption capacity remains controversial. Some studies [1,4,17] showed that vitrinite-rich (bright) coals have a higher methane adsorption capacity than inertinite-rich (dull) coals of equivalent rank, which may be attributed to the existence of more micro-pores in vitrinite that favor the accommodation of gas molecules [64]. Dutta et al. [18] and Feng et al. [16] stated that the methane adsorption capacity follows a "U-shaped" trend with vitrinite content. Other authors [13,65,66] found no obvious correlation between the adsorption capacity and vitrinite content, which also holds for the coal samples in this study (Figure 12f).

• Vitrinite reflectance
Vitrinite reflectance is a commonly used indicator of the coal rank (maturity), which numerous previous studies [15,16,18] have demonstrated to be closely correlated with the methane adsorption capacity of coals. For the coal samples investigated in this study, the vitrinite reflectance exerts a negligible effect on the adsorption isotherm (Figure 12g). This is in line with [67], where it was argued that the vitrinite reflectance alone cannot control the maximum sorption capacity and that simple lithotype analysis is insufficient for evaluating the effects of coal type. One explanation for this observation is that the influence of vitrinite reflectance on methane adsorption capacity is caused by the variations in macromolecular [68] and pore [56] structures during the coalification process as coal maturity increases. Besides, it is again noted that the vitrinite reflectance depends on the fixed carbon for the coals in this study (Figure 11). Thus, the effect of vitrinite reflectance may be overridden by that of the fixed carbon from the standpoint of statistical regression.
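The dependence between two input features, such as that between vitrinite reflectance and fixed carbon noted above, can be screened with a plain Pearson coefficient before interpreting importance scores. The sketch below is a minimal illustration; the paired values are synthetic and hypothetical, generated only to mimic two strongly collinear features, and do not reproduce the data in Figure 11.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Synthetic, hypothetical feature pair: reflectance rises with fixed carbon plus noise.
rng = random.Random(1)
fixed_carbon = [78.0 + 0.1 * i for i in range(100)]
reflectance = [0.15 * (fc - 70.0) + rng.gauss(0.0, 0.1) for fc in fixed_carbon]

r = pearson(fixed_carbon, reflectance)
print(round(r, 3))
```

A coefficient close to ±1 indicates that the two features carry largely the same information, in which case a tree-based model may attribute most of the shared effect to only one of them.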
As can be seen from the above discussion, the univariate analyses based on the GBDT model are in good agreement with numerous previous studies, which further confirms the validity of the constructed model. It can also be concluded that the GBDT has a remarkable capability of "automatically" identifying the truly important features and properly finding the underlying correlations between the output and each input feature, even though the model included both collinear features and features that exert minor or negligible effects on the output.

Influence of Input Features on the Model Accuracy
As mentioned earlier in Section 2.4.1, the constructed model includes not only features with convincing control on the adsorption capacity (equilibrium pressure, ash content, fixed carbon content, and vitrinite reflectance), but also features showing minor or negligible relevance to the output (vitrinite content, inherent moisture, equilibrium moisture, and temperature). To demonstrate the influence of input feature selection on the model accuracy, several estimation models with different input feature scenarios (Table 4) were constructed separately, following the same procedure described in Section 2.4.2.

Table 4. Input feature scenarios for analyzing the estimation accuracy (columns: Scenario No., Input Features).
We began the analysis by including only the equilibrium pressure and the three coal property parameters (fixed carbon, ash, and vitrinite reflectance) that show relatively strong correlations with adsorption capacity (Figure 4) in order to estimate the adsorption isotherm (Scenario #1 in Table 4). It can be seen from Figure 13 that this scenario produces the lowest estimation accuracy in terms of all the evaluation metrics, suggesting that these four key features alone are not sufficient for accurate estimation of the isotherm. With these four parameters held in the model, we then added the remaining less significant features (inherent moisture, equilibrium moisture, vitrinite, and temperature) to the model one at a time. Figure 13 shows that the inclusion of equilibrium moisture (Scenario #2) results in a more noticeable reduction in the estimation error than the inclusion of any of the other features (Scenarios #3, #4, and #5). To honor the contribution of equilibrium moisture to the estimation accuracy, we fixed equilibrium moisture together with the aforementioned four key parameters in the input feature bank; the feature bank was then expanded by sequentially adding one (Scenarios #6, #7, and #8) or two (Scenarios #9, #10, and #11) of the remaining features in order to further examine the effect of the input feature scenario on the estimation results. Figure 13 shows that the estimation error generally decreases as more input features are included in the model. The model that incorporates all available input features (the one addressed in Section 3.1, which is assigned as Scenario #12 in this section) demonstrates the highest estimation accuracy among all of the scenarios investigated. As indicated by the above results, all available features that may potentially affect the isotherm should be incorporated in the construction of the estimation model for the adsorption isotherm. Excluding features identified as insignificant by their correlation coefficients is highly questionable and tends to decrease the estimation accuracy. This finding is well supported by Beker et al. [33]. It is reiterated that the GBDT is highly robust to interference from insignificant features and has a strong capability to properly find the underlying correlations between the input features and the adsorption amount.
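The staged construction of the feature scenarios described above can be sketched with a short enumeration. The feature names are illustrative identifiers, and the ordering of scenarios within each group is an assumption that may not match the numbering in Table 4; only the grouping logic (four key features, then one added feature, then equilibrium moisture fixed plus one, two, or all of the remaining features) follows the text.

```python
from itertools import combinations

KEY = ["pressure", "fixed_carbon", "ash", "vitrinite_reflectance"]
MINOR = ["equilibrium_moisture", "inherent_moisture", "vitrinite", "temperature"]

def build_scenarios(key=KEY, minor=MINOR, promoted="equilibrium_moisture"):
    """Enumerate input feature sets in the staged order described in the text."""
    scenarios = [list(key)]                         # key features only
    scenarios += [key + [m] for m in minor]         # key features plus one minor feature
    core = key + [promoted]                         # equilibrium moisture is now fixed
    rest = [m for m in minor if m != promoted]
    for size in (1, 2, 3):                          # add one, two, then all remaining
        scenarios += [core + list(c) for c in combinations(rest, size)]
    return scenarios

scenarios = build_scenarios()
for number, features in enumerate(scenarios, start=1):
    print(number, features)
```

Each feature set would then be used to train and evaluate a separate model with the same procedure, so that the error metrics can be compared across scenarios.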
It should be noted that feeding more input features into the estimation model requires more effort to obtain the associated feature information. Generally, the proximate analysis parameters (ash, fixed carbon, and inherent moisture contents) are less expensive and easier to measure experimentally than the maceral analysis parameters (e.g., vitrinite content). Therefore, it is of practical significance to use as few maceral features as possible while ensuring relatively high modeling accuracy. As can be seen from Figure 13, Scenarios #7, #8, #9, and #11 yield high modeling accuracies comparable with that of Scenario #12. Among these four scenarios, only Scenario #8 does not include vitrinite, which is a required input feature for the other three (Table 4). As such, Scenario #8 may be the most "cost-effective" one, given its fewer input features and reasonably high modeling accuracy.

Conclusions
This paper proposed the use of a machine learning algorithm, namely the GBDT, to estimate methane adsorption isotherms on coals based on coal properties (ash, fixed carbon, inherent moisture, and vitrinite contents and vitrinite reflectance), equilibrium moisture content, and temperature. Laboratory tests, including proximate analysis, maceral group identification, vitrinite reflectance determination, and adsorption isotherm measurements, were conducted on 165 coal samples retrieved from the Qinshui basin in China in order to develop a database for regression. It has been demonstrated that the GBDT is capable of not only reproducing the adsorption isotherms with reasonable accuracy, but also properly recovering the underlying relations between the input and output variables. By comparison, the BP-ANN suffers from over-fitting, whereas the SVM has difficulties in accurately estimating the adsorption isotherms in both the training and testing stages. These observations confirm the superiority of the GBDT over the other ML tools in solving the specific regression problem in this study. Furthermore, the relative importance scanning and univariate analysis based on the constructed GBDT model showed that the adsorption isotherms are primarily controlled by the fixed carbon and ash contents for the coals investigated in this study. Other factors, including vitrinite, inherent and equilibrium moisture, vitrinite reflectance, and temperature, exert minor or even negligible effects on the adsorption isotherm.