A Novel Data-Driven Method to Estimate Methane Adsorption Isotherm on Coals Using the Gradient Boosting Decision Tree: A Case Study in the Qinshui Basin, China

Zhang, Jiyuan; Feng, Qihong; Zhang, Xianmin; Hu, Qiujia; Yang, Jiaosheng; Wang, Ning

doi:10.3390/en13205369

by Jiyuan Zhang ^1,2, Qihong Feng ^1,2,*, Xianmin Zhang ^1,2, Qiujia Hu ^3, Jiaosheng Yang ^4 and Ning Wang ^3

^1 Key Laboratory of Unconventional Oil & Gas Development, China University of Petroleum (East China), Qingdao 266580, China
^2 School of Petroleum Engineering, China University of Petroleum (East China), Qingdao 266580, China
^3 PetroChina Huabei Oilfield Co., Renqiu 062552, China
^4 PetroChina Research Institute of Petroleum Exploration and Development, Langfang Branch, Langfang 065007, China
^* Author to whom correspondence should be addressed.

Energies 2020, 13(20), 5369; https://doi.org/10.3390/en13205369

Submission received: 28 September 2020 / Revised: 8 October 2020 / Accepted: 12 October 2020 / Published: 15 October 2020

(This article belongs to the Special Issue Development of Unconventional Reservoirs 2020)

Abstract:

The accurate determination of methane adsorption isotherms in coals is crucial for both the evaluation of underground coalbed methane (CBM) reserves and the design of development strategies for enhancing CBM recovery. However, the experimental measurement of high-pressure methane adsorption isotherms is extremely tedious and time-consuming. This paper proposes the use of an ensemble machine learning (ML) method, namely the gradient boosting decision tree (GBDT), to accurately estimate methane adsorption isotherms based on coal properties in the Qinshui basin, China. The GBDT method was trained to correlate the adsorption amount with coal properties (ash, fixed carbon, moisture, vitrinite, and vitrinite reflectance) and experimental conditions (pressure, equilibrium moisture, and temperature). The results show that the estimated adsorption amounts agree well with the experimental ones, which demonstrates the accuracy and robustness of the GBDT method. A comparison of the GBDT with two commonly used ML methods, namely the artificial neural network (ANN) and support vector machine (SVM), confirms the superiority of GBDT in terms of generalization capability and robustness. Furthermore, relative importance scanning and univariate analysis based on the constructed GBDT model were conducted, which showed that the fixed carbon and ash contents are the primary factors that significantly affect the adsorption isotherms for the coal samples in this study.

Keywords:

methane adsorption isotherm; coal properties; machine learning; gradient boosting decision tree; estimation model

1. Introduction

As an unconventional hydrocarbon resource, coalbed methane (CBM) has been unlocked for commercial development in the USA, China, Australia, Canada, and India [1]. The recovery of CBM from coal seams has multiple favorable effects, such as the reduction of greenhouse gas release into the atmosphere, enhancement of underground coal mining safety, and addition to natural gas supply [2,3]. It is commonly believed that the majority of methane exists within coal seams via physical adsorption [4,5]. The accurate characterization of methane adsorption isotherms in coals is crucial for the successful development of CBM resources, because the isotherm determines the in situ level of gas saturation, which significantly affects CBM production rates [6].

To date, the experimental methods commonly used for measuring high-pressure methane adsorption isotherms have included the manometric, gravimetric, and volumetric methods [7]. Although these methods differ in the means by which the adsorption amount is determined, they all require procedures that typically include sample preparation, adsorption equilibration, and data reduction. Such tedious experimental procedures are not only time-consuming, but may also introduce various sources of uncertainty. Previous studies [8,9] showed that adsorption isotherms measured on the same sample in different laboratories may exhibit noticeable discrepancies, which may be attributed to uncertainties that stem from, e.g., the determination of the reference/pump and void volumes [10,11], the choice of equation of state (EoS) [8], and impurities in the measurement gas [9]. As such, [12] pointed out that extremely tedious procedures, including thorough calibration of the instrument, careful operation, and repeatability checks, are needed in order to ensure accuracy and consistency in adsorption isotherm measurements.

When compared with the measurement of adsorption isotherms, determinations of coal properties (e.g., proximate analysis, maceral group identification, vitrinite reflectance measurement) are much easier and faster. Numerous studies have shown that the methane adsorption capacity of coals is potentially affected by coal properties (e.g., ash, fixed carbon and inherent moisture contents, maceral group, vitrinite reflectance) and experimental conditions (e.g., sample particle size, equilibrium moisture, and temperature) [13,14,15]. As such, it should be viable to estimate the adsorption isotherm using mathematical regression techniques that are based on these influencing factors. Feng et al. [16] quantitatively correlated the Langmuir volume (V_L) with vitrinite reflectance, proximate parameters, vitrinite content, and temperature using the alternating conditional expectation (ACE) algorithm. More recently, Chattaraj et al. [1] applied multiple regression analysis to develop a predictive model for V_L based on proximate and ultimate parameters. It should be noted that only V_L was estimated in Feng et al. and Chattaraj et al.; neither study considered the estimation of the Langmuir pressure (P_L), which determines the curvature of an adsorption isotherm. In other words, the models proposed in [1,16] can only predict the maximum adsorption capacity rather than the full adsorption isotherm. The difficulty in accurately estimating P_L may be due to the uncertainties in its correlations with coal properties. For example, Laxminarayana and Crosdale [17] and Dutta et al. [18] found that the Langmuir pressure decreases with increasing vitrinite reflectance for Australian and Indian coals. However, Busch et al.’s [13] statistics on ≈1000 coal samples show a very scattered pattern between P_L and vitrinite reflectance. Zhang et al. [19] proposed the use of a deep neural network (DNN) to predict CO₂ adsorption on porous carbon based on surface area, micropore, and mesopore volumes. However, gas adsorption behavior on coals is more complicated than on porous carbons due to the higher degree of chemical and physical complexity of coals.

Given these issues, it is of practical significance to accurately estimate the adsorption isotherm from parameters that are easy and fast to determine, in order to reduce the time-consuming and expensive work of adsorption isotherm measurement. This paper proposes the use of the gradient boosting decision tree (GBDT) [20,21] to accurately estimate adsorption isotherms based on coal properties (ash, fixed carbon, moisture, vitrinite, and vitrinite reflectance) and experimental conditions (pressure, equilibrium moisture, and temperature) for coal samples acquired from the Qinshui basin. The GBDT is an ensemble method that combines a number of base estimators (decision trees) with the gradient boosting algorithm in order to improve the robustness over a single estimator. The GBDT has empirically proven to be highly efficient and promising for solving various regression and classification problems in the field of energy and petroleum engineering [22,23]. However, to the best of the authors' knowledge, the application of GBDT in estimating the adsorption isotherm has not yet been reported. The superiority of the GBDT in terms of accuracy and robustness was confirmed by comparison with the back-propagation artificial neural network (BP-ANN) and support vector machine (SVM). Sensitivity analysis was then conducted using the constructed GBDT model to analyze the effect of each input variable on the adsorption isotherm.

2. Materials and Methods

2.1. Geological Background of the Study Area

The study area is the Anze block in the southern Qinshui basin, North China (Figure 1), where commercial development of CBM resources has been ongoing for more than two decades. The Qinshui Basin is a large compound synclinal basin surrounded by the uplifts of Wutai Mountain, Taihang Mountain, Zhongtiao Mountain, and Huo Mountain [24]. The study area consists of the Pennsylvanian Benxi and Taiyuan formations, Permian Shanxi, Xiashihezi, Shangshihezi and Shiqianfeng formations, and Triassic to Quaternary deposits. The primary CBM-bearing formations are the No. 3 coal seam in the Shanxi formation and the No. 15 coal seam in the Taiyuan formation (Figure 2). The No. 3 and No. 15 coals are characterized by a high degree of metamorphism. The coal ranks are in the range of low volatile bituminous to anthracite, with R_o,m as high as 4.5% [25]. Maceral compositions are dominated by vitrinite with subordinate inertinite, while liptinite is microscopically unrecognizable. The lithotypes are primarily semi-bright and bright coals.

2.2. Samples and Experiments

A total of 165 coal samples were acquired using the downhole coring technique from 72 CBM wellbores in the No. 3 and No. 15 coal seams. After being transported to the laboratory in sealed tanks, the coal samples were subjected to proximate analysis, vitrinite reflectance, and adsorption isotherm measurements in order to develop a database for machine learning. Proximate analysis was conducted following the Chinese standard GB/T 212-2008 [26]. The maceral group was identified at 50× magnification under plane polarized reflected light with a fluorescence illuminator, following the Chinese standard GB/T 8899-2013 [27]. Vitrinite reflectance (R_o,max) was measured according to the Chinese standard GB/T 6948-2008 [28] at a magnification of 500× under oil immersion. More details on the analysis procedure are given in [29]. Methane adsorption isotherms were measured on 60–80 mesh moisture-equilibrated coal powders using the manometric method [7]. For each sample, the experimental temperature was set to be identical to the in situ temperature where the sample was retrieved. Each adsorption isotherm comprises eight equilibrium pressures (ranging from ≈0.5 to ≈8.5 MPa) with corresponding adsorption amounts, which results in a total of 8 × 165 = 1320 data points in the database. Table 1 summarizes experimental data for the samples.

2.3. Basics of GBDT

The basic philosophy behind the GBDT is to use an ensemble of classification and regression trees (CARTs) to fit the training data samples through minimizing a regularized objective function. Each CART is comprised of a number of leaf nodes and each leaf node is associated with a binary decision rule structure and a continuous score. In GBDT, a number of CARTs are developed in a sequential manner in order to form an accurate ensemble model. For the completeness of this paper, the GBDT algorithm is briefly addressed, as follows. Readers are referred to [20,21] for more details on the GBDT algorithm.

For a given data set with d dimensions and n examples, D = {(x_{i}, y_{i})} (x_{i} \in ℝ^{d}, y_{i} \in ℝ, i = 1, 2, \dots, n), the output F is predicted as the sum of additive functions, which is written as

F(x) = \sum_{m = 0}^{M} β_{m} h(x; \{R_{lm}\}_{l = 1}^{L})    (1)

where h represents a tree with L terminal nodes; R_{lm} represents the partitioned region that is defined by the terminal node l of the mth tree; and \{β_{m}\}_{0}^{M} are expansion coefficients that are jointly fit with \{R_{lm}\}_{l = 1}^{L} to the training data set by minimizing a regularized objective function:

L = \sum_{i = 1}^{n} ψ(y_{i}, F(x_{i}))    (2)

where ψ is a differentiable loss function, which was assigned as the squared error in this study.

The minimization of the loss function is achieved by iteratively adding leaf nodes that result in the steepest descent [21], which is mathematically expressed as:

γ_{lm} = \underset{γ}{argmin} \sum_{x_{i} \in R_{lm}} ψ(y_{i}, F_{m - 1}(x_{i}) + γ)    (3)

F_{m}(x) = F_{m - 1}(x) + υ γ_{lm} \mathbb{1}(x \in R_{lm})    (4)

where \mathbb{1}(·) is the indicator function and υ is the shrinkage factor in the range of (0, 1] that controls the learning rate of the training process. Empirically, small values of υ make the model more conservative and, thus, help in increasing the generalization capability [22].
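The update rule in Equations (3) and (4) can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: it assumes a single input feature, depth-1 trees (stumps), and a toy Langmuir-shaped data set, and all function names and numerical values are hypothetical. For the squared-error loss, the optimal leaf value in Equation (3) is the mean residual of the points falling in that leaf, which is what `fit_stump` computes.

```python
import math


def fit_stump(xs, residuals):
    """Best single split of a 1-D feature under squared error.

    Returns (threshold, left_value, right_value); each leaf value is the
    mean residual in its region, i.e. the argmin of Eq. (3) for squared loss.
    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        gl, gr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - gl) ** 2 for r in left) + sum((r - gr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, gl, gr)
    return best[1:]


def fit_gbdt(xs, ys, n_trees=200, shrinkage=0.1):
    """Start from the mean (F_0) and add shrunken stumps as in Eq. (4)."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, gl, gr = fit_stump(xs, residuals)
        stumps.append((t, gl, gr))
        preds = [p + shrinkage * (gl if x <= t else gr) for x, p in zip(xs, preds)]
    return f0, shrinkage, stumps


def predict(model, x):
    f0, shrinkage, stumps = model
    return f0 + sum(shrinkage * (gl if x <= t else gr) for t, gl, gr in stumps)


def rmse(ys, fs):
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fs)) / len(ys))


# Toy data (hypothetical): eight pressures in MPa and Langmuir-shaped amounts.
pressures = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.5]
amounts = [30.0 * p / (2.0 + p) for p in pressures]

model = fit_gbdt(pressures, amounts)
baseline_rmse = rmse(amounts, [sum(amounts) / len(amounts)] * len(amounts))
fitted_rmse = rmse(amounts, [predict(model, p) for p in pressures])
```

Comparing `fitted_rmse` with `baseline_rmse` shows how the sequence of small corrections improves on the constant initial model; a smaller `shrinkage` needs more trees to reach the same training error.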

2.4. Construction of the GBDT Estimation Model

2.4.1. Input Features

For constructing a reliable regression model, a key first step is to properly identify the input features (or independent variables) [30]. One of the most popular methods for identifying the input features is to conduct univariate correlation regression, with features that show a high degree of correlation with the output (e.g., a high correlation coefficient) being fed into the estimation model [31,32]. The primary drawback of this method is that features with weak, but real, underlying correlations with the output may be excluded from the model, which tends to decrease the modeling accuracy. Beker et al. [33] argued that all features with either explicit (strong) or implicit (weak) correlations with the output variables should be included in a machine learning model in order to attain high modeling accuracy. In this regard, we assigned as model inputs those features that have been demonstrated empirically to exert potential effects on the adsorption amount, and that are less expensive and more rapid to measure experimentally than the adsorption isotherm. Section 3 will discuss the effect of the inclusion of these “less significant” features on the model accuracy.

In this study, the adsorption isotherm is represented with a series of discrete (adsorption amount versus equilibrium pressure) data points (Figure 3). Thus, the estimation of the adsorption isotherm is, in fact, transformed to the estimation of adsorption amounts at given equilibrium pressures. In this way, the equilibrium pressure is an essential input variable for the construction of the estimation model. An alternative option would be to use an adsorption model (e.g., the Langmuir type model) to represent the isotherm and then correlate the adsorption model parameters (e.g., Langmuir volume and Langmuir pressure) with certain input features. However, our preliminary evaluation showed that this alternative failed to accurately reproduce the adsorption isotherm, which is probably due to the weak correlation of Langmuir pressure with input features, such as coal properties and experimental conditions, as mentioned earlier.
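The discrete representation can be sketched as follows. The snippet is illustrative only: the feature names and numerical values are hypothetical, and the standard Langmuir form V = V_L·P/(P_L + P) is assumed here merely to generate plausible adsorption amounts for one sample. Each isotherm is expanded into one row per equilibrium pressure, with the coal properties repeated on every row.

```python
def langmuir(pressure, v_l, p_l):
    """Standard Langmuir isotherm: adsorbed amount at a given pressure."""
    return v_l * pressure / (p_l + pressure)


def isotherm_to_rows(coal_features, pressures, amounts):
    """Expand one isotherm into (inputs, target) rows, one per pressure."""
    return [({**coal_features, "pressure": p}, v) for p, v in zip(pressures, amounts)]


# Hypothetical sample; the values are placeholders, not data from Table 1.
features = {"ash": 12.0, "fixed_carbon": 85.0, "temperature": 35.0}
pressures = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 8.5]  # MPa
amounts = [langmuir(p, v_l=30.0, p_l=2.0) for p in pressures]

rows = isotherm_to_rows(features, pressures, amounts)
half_capacity = langmuir(2.0, v_l=30.0, p_l=2.0)  # at P = P_L the amount is V_L / 2
```

With this layout the regression model never needs V_L or P_L explicitly; it learns the adsorption amount directly as a function of pressure and the other inputs.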

For the coal samples that were used in this study, the coal properties that exhibit strong control on methane adsorption capacity are ash (Figure 4a) and fixed carbon (Figure 4b) (with R² ≥ 0.6), which, therefore, are assigned as input features. The vitrinite reflectance (Figure 4c) exhibits a generally linear positive effect on the adsorption, although the correlation is relatively loose (R² = 0.36), and is also included in the input feature bank. Other factors, including inherent and equilibrium moisture, vitrinite content, and experimental temperature, which show weak correlations (with R² ≤ 0.1) with the adsorption capacity (Figure 4d through 4g), are also included in the model construction given the numerous reports of their potential effect on the adsorption isotherm (e.g., [1,4,15,16,17,18]).

As mentioned earlier, our goal is to develop an estimation model that is based on data that are less expensive and less time-consuming to obtain, so that the adsorption isotherms can be estimated quickly and with reasonable accuracy. Therefore, other coal properties that may exert potential influence on methane adsorption isotherms, such as micro-pore surface area/volume [34,35,36] and surface functional groups [13,37] of coals, are not considered, because obtaining such information requires experimental work that inevitably brings additional expense. Besides, the experimental determination of the pore characteristics is rather complicated: techniques such as gas (N₂/CO₂) adsorption [38], focused ion beam scanning electron microscopy (FIB-SEM, [39]), and small-angle neutron scattering (SANS, [40]) require special experimental apparatus and may be even more time-consuming than the measurement of adsorption isotherms.

As a short summary, the input features for constructing the estimation model for the adsorption isotherm are: coal properties (including ash, fixed carbon, inherent moisture, vitrinite, and vitrinite reflectance) and experimental conditions (equilibrium pressure, equilibrium moisture, and temperature).

2.4.2. Determination of Optimal GBDT Hyperparameters

Prior to conducting GBDT regressions, the whole database comprising 165 samples and adsorption isotherms was randomly divided into three sub-sets, namely the training (99 samples, 60%), validation (33 samples, 20%), and testing (33 samples, 20%) sets (Figure 5). The training set was used for training the GBDT model, while the validation set was used for monitoring the performance and for determining the optimal model parameters (which is addressed in the following paragraph). The testing set was assumed to be “unseen” during the model construction process and was used for testing the generalization capability of the constructed regression model. It should be noted again that each adsorption isotherm is represented with eight discrete (adsorption amount versus equilibrium pressure) data points and, thus, the training, validation, and testing sets are, in effect, constituted of 99 × 8 = 792, 33 × 8 = 264, and 33 × 8 = 264 data points, respectively (Figure 5).
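The sample counts above imply that the split is made at the level of coal samples, so that all eight points of an isotherm fall in the same sub-set. A minimal sketch of such a split is given below; the function name, the integer sample identifiers, and the random seed are assumptions made for illustration.

```python
import random

N_SAMPLES = 165
POINTS_PER_ISOTHERM = 8


def split_samples(sample_ids, n_train=99, n_val=33, seed=0):
    """Shuffle sample identifiers and cut them into three disjoint sub-sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


train_ids, val_ids, test_ids = split_samples(range(N_SAMPLES))

# Number of (pressure, adsorption amount) data points in each sub-set.
sizes = {
    "training": len(train_ids) * POINTS_PER_ISOTHERM,
    "validation": len(val_ids) * POINTS_PER_ISOTHERM,
    "testing": len(test_ids) * POINTS_PER_ISOTHERM,
}
```

Splitting by sample rather than by individual data point avoids the situation in which points from the same isotherm appear in both the training and testing sets.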

The empirical results from [41,42] demonstrate that the accuracy and generalization capability of the GBDT can be significantly influenced by three parameters, namely the number of estimators, the shrinkage factor, and the maximal tree depth. As such, these parameters should be optimized in order to ensure the accuracy and robustness of the GBDT. In this study, the optimal values for the three parameters were determined through the exhaustive grid search method [43]. That is, all possible combinations of the parameter values were run sequentially, and the optimal parameterization was determined to be the one that results in the lowest root mean squared error (RMSE) for the validation set. Previous studies [41,42] suggested that a satisfactory performance of GBDT can be obtained with relatively small shrinkage factors (< 0.1) and low-level tree complexity (with tree depth < 6). As such, the shrinkage factor was varied from 0.005 to 0.105 with a step of 0.01, and the maximum tree depth was varied from two to seven with a step of one in this study. The optimal number of trees is a problem-dependent hyperparameter, which was set to vary from 500 to 5000 with a step of 500.
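The exhaustive grid search can be sketched as below. The three grids follow the ranges stated above, which give 11 × 6 × 10 = 660 combinations. The scoring function `toy_rmse` is a stand-in with an arbitrary minimum, included only so that the sketch runs; in practice it would train a GBDT on the training set and return the RMSE on the validation set.

```python
from itertools import product

shrinkage_grid = [round(0.005 + 0.01 * k, 3) for k in range(11)]  # 0.005 ... 0.105
depth_grid = list(range(2, 8))                                    # 2 ... 7
n_trees_grid = list(range(500, 5001, 500))                        # 500 ... 5000


def grid_search(validation_rmse):
    """Return the (shrinkage, depth, n_trees) triple with the lowest score.

    validation_rmse is a caller-supplied function (hypothetical here) that
    trains a model with the given hyperparameters and returns its RMSE on
    the validation set.
    """
    return min(
        product(shrinkage_grid, depth_grid, n_trees_grid),
        key=lambda params: validation_rmse(*params),
    )


def toy_rmse(shrinkage, depth, n_trees):
    """Stand-in score for demonstration only; its minimum is arbitrary."""
    return abs(shrinkage - 0.055) + abs(depth - 4) + abs(n_trees - 2000) / 1000


n_combinations = len(shrinkage_grid) * len(depth_grid) * len(n_trees_grid)
best = grid_search(toy_rmse)
```

Because every combination requires a full training run, the cost of the search grows with the product of the grid sizes, which is why the ranges are kept narrow.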

2.4.3. Evaluation Metrics

The performance of the GBDT estimation was quantitatively evaluated through four metrics, namely average absolute error (AAE), average relative error (ARE), root mean square error (RMSE), and determination coefficient (R²). The definitions for these metrics are:

AAE = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - f_{i} |    (5)

ARE = \frac{1}{N} \sum_{i = 1}^{N} | \frac{y_{i} - f_{i}}{y_{i}} |    (6)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - f_{i})}^{2}}    (7)

R^{2} = 1 - \frac{\sum_{i = 1}^{N} {(y_{i} - f_{i})}^{2}}{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}    (8)

where y_{i} and f_{i} are the measured and estimated adsorption amounts, respectively; \bar{y} is the mean value of the measured adsorption amount; and N is the number of data points.
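The four metrics can be computed with a few lines of Python. The sketch below uses the standard definitions (RMSE as the square root of the mean squared error, and R² as one minus the ratio of the residual to the total sum of squares); the function names and the three-point example are hypothetical.

```python
import math


def aae(y, f):
    """Average absolute error."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def are(y, f):
    """Average relative error (as a fraction; multiply by 100 for percent)."""
    return sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)


def rmse(y, f):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


def r2(y, f):
    """Determination coefficient."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, f))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and estimated adsorption amounts (m³/t).
measured = [10.0, 20.0, 30.0]
estimated = [11.0, 19.0, 30.0]

scores = {
    "AAE": aae(measured, estimated),
    "ARE": are(measured, estimated),
    "RMSE": rmse(measured, estimated),
    "R2": r2(measured, estimated),
}
```

AAE and RMSE carry the unit of the adsorption amount, whereas ARE and R² are dimensionless, so the latter two are easier to compare across data sets.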

2.5. Comparison with BP-ANN and SVM

The BP-ANN and SVM are powerful supervised machine learning algorithms that have been successfully applied in solving nonlinear regression problems in a variety of fields [32,44,45,46]. The most popular version of the BP-ANN is the multilayer perceptron network (MLPN), which is comprised of one input layer, one or more hidden layers, and one output layer. The training of an MLPN is, in essence, an iterative process of updating the weights and biases of the nodes by using the back propagation algorithm in order to minimize an error function. The basic philosophy behind the SVM is to convert the nonlinear regression problem in the original input space into a linear approximation in a higher dimensional feature space by minimizing a regularized loss function. Mathematical details on the BP-ANN and SVM have been extensively addressed previously (see e.g., [24,39]) and, therefore, are not repeated in this paper.

The LIBSVM package [47] and the neural network module implemented in Matlab (V2019) were used for conducting SVM and BP-ANN regressions, respectively. The data points and input variables are identical to those in the GBDT regression. A BP-ANN with three layers (one input, one hidden, and one output layer) has proven to be capable of approximating any continuous function with any accuracy [32] and, therefore, was adopted in this study. It should be noted that (i) the performance of a BP-ANN can be significantly influenced by the number of neurons in the hidden layer [44] and (ii) for an SVM with a radial basis function (RBF) kernel, which is most frequently used for regression, the regression accuracy is associated with the regularization and error goal parameters [48,49]. In order to attain a fair comparison, parameters that may affect the BP-ANN and SVM performance were tuned and optimized using the exhaustive grid search, in a similar manner to the GBDT. Table 2 shows the optimal key model parameters for the BP-ANN and SVM.

3. Results

3.1. Performance of the GBDT Estimation Model

The optimal hyperparameter values for the GBDT, as determined with the exhaustive grid search method, were 0.01 for the shrinkage factor, 3 for the tree depth, and 1500 for the number of trees. Figure 6 depicts the GBDT estimation results for the training, validation, and testing sets. Figure 6a shows that all of the training data points are grouped closely around the 45-degree line. Table 3 demonstrates extremely low error metrics (an AAE of 0.33 m³/t, an ARE of 2.31%, and an RMSE of 0.42 m³/t) and a remarkably high R² of 0.993 for the training set. These evaluation metrics show that the GBDT is capable of accurately reproducing the adsorption amount based on the input variables. For the validation and testing sets, although the cross plots of the measured versus estimated values show a more scattered pattern than for the training set, the majority of the data points are distributed around the 45-degree line and the deviations are within small ranges (Figure 6b,c). The AAE, ARE, RMSE, and determination coefficient (R²) are calculated to be 0.83 m³/t, 5.97%, 1.00 m³/t, and 0.950 for the validation set, and 0.85 m³/t, 6.35%, 1.06 m³/t, and 0.946 for the testing set, respectively. The error metrics for the validation and testing sets are quite comparable, suggesting strong robustness of the constructed model (Table 3). In this regard, the GBDT model can be considered to have a strong generalization capability, as indicated by the relatively low error metrics and high R².

The comparison between the estimated and measured adsorption isotherms for typical samples in the testing set was conducted in order to further demonstrate the accuracy of the GBDT model in reproducing the adsorption isotherm for an individual coal sample. The methane adsorption capacity on the coal samples is predominantly controlled by the ash and fixed carbon contents, as mentioned in Section 2.4.1. Therefore, four typical samples—one with the highest ash content, one with the lowest ash content, one with the highest fixed carbon content and one with the lowest fixed carbon content—among all samples in the testing set were selected for illustrating the model accuracy.

For the two samples with respective ash contents of 9.6% and 39.96% and the sample with low fixed carbon content (83.88%), the estimated adsorption isotherms are in excellent agreement with the measured ones, as can be seen from Figure 7. For the sample with high fixed carbon content (91.54%), the estimated adsorption amounts at lower pressures (≤ ≈4.0 MPa) agree well with the measured ones, whereas certain deviations exist at higher pressures (> ≈4.0 MPa). The maximum error occurs at an equilibrium pressure of ≈8.0 MPa, with the estimated and measured adsorption amounts being 23.71 and 25.23 m³/t, respectively. Such a discrepancy can be considered acceptable given the uncertainties that are associated with sample preparation, data acquisition, and measurement operations [12]. Previous reproducibility tests [50,51] showed that discrepancies in adsorption isotherm measurements on the same coal sample may reach as high as 10–15%, which is even higher than the error of the GBDT estimation. It should also be pointed out that the estimated adsorption amount follows a monotonically increasing trend with increasing pressure (which is a basic characteristic of methane adsorption isotherms on coals), although no specific constraint was applied in the training process to compel such monotonicity. These results confirm the reliability of the constructed GBDT model in estimating the methane adsorption isotherms on coals with reasonable accuracy.
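As a quick check of the figures quoted above, the largest deviation (23.71 versus 25.23 m³/t) corresponds to a relative error of about 6%, which is below the 10–15% inter-laboratory range. The short sketch below reproduces this arithmetic and shows one simple way to test an estimated isotherm for monotonicity; the helper names and the example isotherm values are hypothetical.

```python
def relative_deviation(measured, estimated):
    """Absolute deviation relative to the measured value."""
    return abs(measured - estimated) / measured


def is_monotonic_increasing(values):
    """True if each value is at least as large as the previous one."""
    return all(b >= a for a, b in zip(values, values[1:]))


# Values quoted in the text for the high fixed carbon sample at about 8.0 MPa.
deviation = relative_deviation(measured=25.23, estimated=23.71)

# Hypothetical estimated isotherm, used only to exercise the check.
example_isotherm = [6.1, 12.4, 16.0, 18.7, 20.3, 21.6, 22.5, 23.7]
monotonic = is_monotonic_increasing(example_isotherm)
```

A check of this kind can be applied to every estimated isotherm after training, since the model itself does not enforce monotonicity.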

3.2. Comparison with BP-ANN and SVM

Figure 8 shows the cross plots of BP-ANN estimated versus measured adsorption amounts for the training, validation, and testing sets. As can be seen from Figure 8a, all of the training data points are generally located on the 45-degree line, which suggests that the BP-ANN has an extraordinary capability to accurately correlate the output with the input variables for the training set. Table 3 demonstrates that the BP-ANN outperforms the GBDT in terms of error metrics for the training set. However, Figure 8b,c demonstrate that a noticeable number of data points deviate severely from the 45-degree line for both the validation and testing sets, resulting in higher errors (AAE, ARE, and RMSE) and lower R² than the GBDT (Table 3). These observations suggest that the generalization capability of the BP-ANN is highly questionable and that severe over-fitting occurs. As such, the BP-ANN should not be considered suitable for accurately estimating the adsorption isotherms.

Figure 9 depicts the estimation results of the SVM regression. As shown, a noticeable number of data points deviate severely from the 45-degree line for the training, validation, and testing sets. Thus, it is concluded that the SVM is capable neither of accurately learning the underlying correlations between the output and input variables nor of giving reasonable predictions. Comparisons of the evaluation metrics for the SVM with those for the GBDT and BP-ANN (Table 3) suggest that the SVM has a better generalization capability than the BP-ANN, but performs worse than the GBDT.

4. Discussion

4.1. Analyses of Effects of Input Features on Adsorption Isotherms

4.1.1. Relative Importance of Input Features

Once the estimation model has been constructed, it is of practical interest to quantify the effect of each input feature on the adsorption isotherm. In this section, the relative importance of each input variable is quantified using the mean decrease impurity importance (MDI) [52,53]. A significant advantage of the MDI over conventional Pearson or Spearman coefficients is that the MDI does not require an a priori assumption of linear or monotonic dependence of the output on the input features and, therefore, should be more accurate in quantifying the effect of each input feature [54]. Figure 10 shows that equilibrium pressure, fixed carbon, and ash are the three key factors that control the adsorption amount. The equilibrium moisture has a relative importance of ≈8.8%, while the remaining factors (temperature, vitrinite, vitrinite reflectance, and inherent moisture) have relative importances of less than 3.0%, which suggests very minor or even negligible influences of these factors on the adsorption amount. Here, it is noted that the effect of vitrinite reflectance is significantly diluted when compared with the correlation analysis in Section 2.4.1, which is possibly due to the collinearity between the vitrinite reflectance and fixed carbon for the coal samples (Figure 11). The existence of collinearity may result in an abnormal response of the output to one or several of the collinear inputs [55]. As can be seen from Figure 4b,c, fixed carbon demonstrates an obviously stronger correlation with the adsorption capacity than vitrinite reflectance does and, thus, the effect of the vitrinite reflectance has a high risk of being overridden by that of the fixed carbon considering their collinearity.
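The building block of an impurity-based importance can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of [52,53]: impurity is taken as the variance of the target within a node, the decrease produced by a split is weighted by the fraction of samples reaching that node, and the per-feature totals are normalized to percentages. The function names and all numbers are hypothetical.

```python
def variance(values):
    """Population variance, used here as the node impurity for regression."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_decrease(parent, left, right, n_total):
    """Weighted variance reduction achieved by splitting parent into left/right."""
    n = len(parent)
    children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return (n / n_total) * (variance(parent) - children)


def relative_importance(split_records):
    """Sum the decreases per feature and express them as percentages.

    split_records is a list of (feature_name, decrease) pairs collected from
    every split of every tree in the ensemble.
    """
    totals = {}
    for name, decrease in split_records:
        totals[name] = totals.get(name, 0.0) + decrease
    grand_total = sum(totals.values())
    return {name: 100.0 * value / grand_total for name, value in totals.items()}


# Toy node: a split that separates low and high targets perfectly.
decrease = impurity_decrease(parent=[1.0, 1.0, 5.0, 5.0],
                             left=[1.0, 1.0], right=[5.0, 5.0], n_total=4)

# Hypothetical split records, for illustration only.
importance = relative_importance([("fixed_carbon", 2.0), ("ash", 1.0),
                                  ("fixed_carbon", 1.0)])
```

Because the measure only counts splits that a feature actually receives, two collinear features compete for the same splits, which is one way to understand why the importance of one of them can appear diluted.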

4.1.2. Univariate Analyses

The univariate analysis was conducted using the constructed GBDT model to further demonstrate how the adsorption isotherms are affected by the input features. The base values were set to 20%, 1.0%, 88%, 80%, 2.5%, 15%, and 35 °C for ash, inherent moisture, fixed carbon, vitrinite, vitrinite reflectance, equilibrium moisture, and temperature, respectively. These values are approximately the averages shown in Table 1. Each input variable was varied over four values (within the range covered by the coal samples in this study) and the corresponding adsorption isotherms (at pressures of 1 to 8 MPa with a step of 1 MPa) were sequentially calculated, as shown in Figure 12.
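The one-at-a-time procedure can be sketched as below. The base values are those listed above; the four ash values, the dictionary keys, and the stand-in predictor `toy_predict` are assumptions made so that the sketch runs, and in practice `predict` would be the trained GBDT model.

```python
BASE = {
    "ash": 20.0,                   # %
    "inherent_moisture": 1.0,      # %
    "fixed_carbon": 88.0,          # %
    "vitrinite": 80.0,             # %
    "vitrinite_reflectance": 2.5,  # %
    "equilibrium_moisture": 15.0,  # %
    "temperature": 35.0,           # °C
}
PRESSURES = list(range(1, 9))      # 1 to 8 MPa with a step of 1 MPa


def univariate_sweep(predict, feature, values):
    """Vary one feature while holding the others at their base values.

    Returns a dict mapping each tested value to its estimated isotherm,
    i.e. a list of adsorption amounts at PRESSURES.
    """
    curves = {}
    for value in values:
        inputs = dict(BASE)
        inputs[feature] = value
        curves[value] = [predict({**inputs, "pressure": p}) for p in PRESSURES]
    return curves


def toy_predict(row):
    """Stand-in for the trained model; the functional form is arbitrary."""
    capacity = 0.3 * row["fixed_carbon"] - 0.2 * row["ash"]
    return capacity * row["pressure"] / (2.0 + row["pressure"])


# Four hypothetical ash values within a plausible range.
curves = univariate_sweep(toy_predict, "ash", [10.0, 20.0, 30.0, 40.0])
```

Each entry of `curves` corresponds to one line in a panel such as those of Figure 12, and repeating the call for each feature gives the full set of panels.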

Fixed carbon

Figure 12a depicts the adsorption isotherm with reference to varying fixed carbon. It clearly demonstrates that the isotherm tends to move upwards as fixed carbon increases. Previous studies [16,56] observed that the methane adsorption capacity first decreases and then increases with increasing fixed carbon, with the minimum occurring at ≈60–80%. This parabolic trend may be attributed to the variations in the micro-pore surface areas that are associated with the coalification jump that occurs approximately in the range of 75–85% fixed carbon [17]. More recently, Chattaraj et al. [1] showed that, for Indian coals with a fixed carbon content of >75%, methane adsorption capacity is in positive linear correlation with fixed carbon. It is interesting to note that the coal samples used in this study have generally high fixed carbon contents of >77%, which suggests that our findings are in line with these previous studies.

Ash

Figure 12b illustrates that the adsorption isotherm exhibits an obvious negative correlation with ash at all pressures. It is well understood that an increase in ash content tends to decrease the adsorption isotherm on coals, because (i) ash has no affinity to methane adsorption [7,18] and (ii) ash-rich samples are generally associated with lower micro porosities [57] and, therefore, provide less adsorption space to accommodate gas molecules.

Moisture

As shown in Figure 12c,d, variations in adsorption isotherms caused by inherent and equilibrium moisture are obviously less significant than those caused by fixed carbon or ash, which is consistent with the ranking of relative importance. Additionally, it is noticeable that the adsorption capacity does not follow a monotonically decreasing trend with the increase in either inherent or equilibrium moisture. It has been extensively addressed in previous studies [14,58,59] that a coal sample in the moisture-equilibrium state has a significantly lower adsorption capability than in the dry state. This is due to the occupation of some adsorption sites on the coal surface by water molecules, because coals have a preferential affinity to water over methane [7]. However, for a coal sample that is already in a moisture-equilibrated state, a further increment in moisture content does not affect the gas adsorption capacity [14,60]. Besides, as stated in [7,13], the moisture content may be predominantly controlled by the coal rank. Thus, the effect of moisture content on the adsorption isotherm may be overridden by coal rank indicators such as fixed carbon for the coal samples in this study.

Temperature

Figure 12e shows that there is no significant change in the adsorption isotherm with increasing temperature. Most previous studies [61,62] conclude that an increase in temperature may result in a noticeable reduction in methane adsorption capacity, because the sorptive surface coverage at a specific gas pressure decreases with increasing temperature, as derived from thermodynamics [7]. However, Crosdale et al.’s [60] experiments on moist coals showed no significant dependence of adsorption capacity on temperature. More recently, Guan et al. [63] showed that the adsorption capacities for both methane and CO₂ remained constant as the temperature was raised from 323 to 343 K. Our observations are consistent with [60], which may be attributed to the release of water molecules compensating for the reduction in sorptive surface coverage caused by the temperature increase [6].
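
The thermodynamic expectation cited above from [7] can be illustrated with the Langmuir model. This is only a sketch: the symbols θ (fractional surface coverage), b (Langmuir affinity coefficient), b₀ (pre-exponential constant), and Q (heat of adsorption, taken as positive for exothermic adsorption) are assumptions introduced here and are not defined in the original text.

```latex
\theta(P,T) = \frac{b(T)\,P}{1 + b(T)\,P},
\qquad
b(T) = b_0 \exp\!\left(\frac{Q}{RT}\right), \quad Q > 0

\left.\frac{\partial \theta}{\partial T}\right|_{P}
  = \frac{P}{\left(1 + bP\right)^{2}}\,\frac{\mathrm{d}b}{\mathrm{d}T}
  = -\frac{Q}{RT^{2}}\,\frac{bP}{\left(1 + bP\right)^{2}} < 0
```

Under these assumptions the coverage at a fixed pressure always falls as temperature rises, so the flat response in Figure 12e implies an offsetting mechanism such as the water release suggested above.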

Vitrinite

The effect of vitrinite content on the methane adsorption capacity remains controversial. Some studies [1,4,17] showed that vitrinite-rich (bright) coals have a higher methane adsorption capacity than inertinite-rich (dull) coals of equivalent rank, which may be attributed to the greater number of micro-pores in vitrinite that can accommodate gas molecules [64]. Dutta et al. [18] and Feng et al. [16] stated that methane adsorption capacity follows a “U-shaped” trend with vitrinite content. Other authors [13,65,66] found no obvious correlation between the adsorption capacity and vitrinite content, which also holds for the coal samples in this study (Figure 12f).

Vitrinite reflectance

Vitrinite reflectance is a commonly used indicator of the coal rank (maturity), which numerous previous studies [15,16,18] have demonstrated to be closely correlated with the methane adsorption capacities of coals. For the coal samples investigated in this study, the vitrinite reflectance exerts a negligible effect on the adsorption isotherm (Figure 12g). This is in line with [67], who argued that the vitrinite reflectance alone cannot control the maximum sorption capacities and that simple lithotype analysis is insufficient for evaluating the effects of coal type. One explanation for this observation is that the influence of vitrinite reflectance on methane adsorption capacity is caused by the variations in macromolecular [68] and pore [56] structures during the coalification process as coal maturity increases. Besides, it is again noted that the vitrinite reflectance depends on the fixed carbon for the coals in this study (Figure 11). Thus, the effect of vitrinite reflectance may be overridden by that of the fixed carbon from the standpoint of statistical regression.

As can be seen from the above discussion, the univariate analyses based on the GBDT model are in good agreement with numerous previous studies, which further confirms the validity of the constructed model. It can also be concluded that the GBDT has a remarkable capability of “automatically” identifying the truly important features and properly finding the underlying correlations between the output and each input feature, even though both collinear features and features exerting minor or negligible effects on the output were included in the model.
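
The relative importance scan and the univariate analysis discussed above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's `GradientBoostingRegressor` as the GBDT implementation, uses synthetic stand-in data rather than the Qinshui measurements, and assumes that the other features are held at their mean values during the sweep. The helper `univariate_isotherm`, the `noise` feature, and all hyperparameter values are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in data (NOT the paper's measurements).
pressure = rng.uniform(0.5, 8.0, n)        # MPa
fixed_carbon = rng.uniform(78.0, 93.0, n)  # %, range loosely following Table 1
ash = rng.uniform(5.0, 50.0, n)            # %
noise_feature = rng.uniform(0.0, 1.0, n)   # deliberately irrelevant input

# Hypothetical ground truth: a Langmuir-type curve whose plateau rises with
# fixed carbon and falls with ash.
v_l = 20.0 + 0.8 * (fixed_carbon - 78.0) - 0.25 * ash
y = v_l * pressure / (2.0 + pressure) + rng.normal(0.0, 0.3, n)

X = np.column_stack([pressure, fixed_carbon, ash, noise_feature])
names = ["pressure", "fixed_carbon", "ash", "noise"]

# Hyperparameters are placeholders, not the tuned values from the paper.
model = GradientBoostingRegressor(
    n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0
)
model.fit(X, y)

# Relative importance of each input (impurity-based, sums to 1).
importance = dict(zip(names, model.feature_importances_))


def univariate_isotherm(model, X, col, value, pressures):
    """Predict an isotherm with feature `col` set to `value`.

    All other features are held at their column means (an assumption of this
    sketch); column 0 is swept over `pressures`.
    """
    grid = np.tile(X.mean(axis=0), (len(pressures), 1))
    grid[:, 0] = pressures
    grid[:, col] = value
    return model.predict(grid)


pressures = np.linspace(1.0, 8.0, 8)
low_fc = univariate_isotherm(model, X, 1, 80.0, pressures)
high_fc = univariate_isotherm(model, X, 1, 92.0, pressures)

for name, score in importance.items():
    print(f"{name:>12s}: {score:.3f}")
print("mean shift from FC 80% to 92%:", float(np.mean(high_fc - low_fc)))
```

On this synthetic data the irrelevant `noise` input receives a small importance score and the predicted isotherm moves upwards with fixed carbon, mirroring the qualitative behaviour described for Figures 10 and 12a.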

4.2. Influence of Input Features on the Model Accuracy

As mentioned in Section 2.4.1, the constructed model includes not only features with convincing control on the adsorption capacity (equilibrium pressure, ash content, fixed carbon content, and vitrinite reflectance), but also features showing minor or negligible relevance to the output (vitrinite content, inherent moisture, equilibrium moisture, and temperature). To demonstrate the influence of input feature selection on the model accuracy, several estimation models with different input feature scenarios (Table 4) were constructed separately, following the procedure described in Section 2.4.2.

We began the analysis by including only equilibrium pressure and the three coal property parameters (fixed carbon, ash, and vitrinite reflectance) that show relatively strong correlations with adsorption capacity (Figure 4) in order to estimate the adsorption isotherm (Scenario#1 in Table 4). Figure 13 shows that this scenario yields the lowest accuracy in terms of all the evaluation matrices, suggesting that these four key features alone are not sufficient for accurate estimation of the isotherm. With these four parameters held in the model, we then added the remaining less significant features (inherent moisture, equilibrium moisture, vitrinite, and temperature) one at a time. Figure 13 shows that including equilibrium moisture (Scenario#2) reduces the estimation error more noticeably than including any of the other features (Scenarios#3, #4, and #5). To honor the contribution of equilibrium moisture to the accuracy improvement, we fixed equilibrium moisture together with the aforementioned four key parameters in the input feature bank; the feature bank was then expanded by adding one (Scenarios#6, #7, and #8) or two (Scenarios#9, #10, and #11) of the remaining features in order to further examine the effect of input feature scenarios on the estimation results. Figure 13 shows that the estimation error exhibits a generally decreasing trend as more input features are included in the model. The model that incorporates all available input features (the one addressed in Section 3.1, designated Scenario#12 in this section) demonstrates the highest estimation accuracy among all of the scenarios investigated.

The above results indicate that all available features that may potentially affect the isotherm should be incorporated in the construction of the estimation model for the adsorption isotherm. Excluding features that are identified as insignificant by their correlation coefficients is highly questionable and tends to decrease the estimation accuracy. This finding is well supported by Beker et al. [33]. It is reiterated that the GBDT is highly robust to interference from insignificant features and has a strong capability to properly find the underlying correlations between the input features and the adsorption amount.

It should be noted that feeding more input features into the estimation model requires more effort to obtain the associated feature information. Generally, the proximate analysis parameters (ash, fixed carbon, and inherent moisture contents) are less expensive and easier to measure experimentally than the maceral analysis parameters (e.g., vitrinite content). Therefore, it is of practical significance to use as few maceral features as possible while ensuring relatively high modeling accuracy. As can be seen from Figure 13, Scenarios#7, #8, #9, and #11 result in high modeling accuracies comparable to that of Scenario#12. Among these four scenarios, only Scenario#8 does not include vitrinite, which is a required input feature for the other three (Table 4). As such, Scenario#8 may be the most “cost-effective” one, considering its fewer input features and reasonably high modeling accuracy.

5. Conclusions

This paper proposed the use of a machine learning algorithm, namely GBDT, to estimate methane adsorption isotherms on coals based on coal properties (ash, fixed carbon, inherent moisture, and vitrinite contents and vitrinite reflectance), equilibrium moisture content, and temperature. Laboratory tests, including proximate analysis, maceral group identification, vitrinite reflectance determination, and adsorption isotherm measurements, were conducted on 165 coal samples retrieved from the Qinshui basin in China in order to develop a database for regression. It has been demonstrated that the GBDT is capable of not only reproducing the adsorption isotherms with reasonable accuracy, but also properly recovering the underlying relation between the input and output variables. By comparison, the BP-ANN suffers from over-fitting, whereas the SVM has difficulties in accurately estimating the adsorption isotherms in both the training and testing stages. Such observations confirmed the superiority of the GBDT over the other ML tools in solving the specific regression problem in this study. Furthermore, the relative importance scanning and univariate analysis based on the constructed GBDT model showed that the adsorption isotherms are primarily controlled by the fixed carbon and ash contents for the coals investigated in this study. Other factors, including vitrinite, inherent and equilibrium moisture, vitrinite reflectance, and temperature, exert minor or even negligible effects on the adsorption isotherm.

Author Contributions

Conceptualization, Q.F.; methodology, J.Z.; software, X.Z.; validation, N.W.; formal analysis, X.Z.; investigation, J.Z.; data curation, J.Y.; writing—original draft preparation, J.Z.; writing—review and editing, X.Z.; visualization, J.Y.; supervision, Q.F.; project administration, Q.H.; funding acquisition, J.Z. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China National Natural Science Foundation, grant numbers 51904319 and U1810105, and the China Postdoctoral Science Foundation, grant number 2018M642727.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Chattaraj, S.; Mohanty, D.; Kumar, T.; Halder, G.; Mishra, K. Comparative study on sorption characteristics of coal seams from Barakar and Raniganj formations of Damodar Valley Basin, India. Int. J. Coal Geol. 2019, 212, 103202.
2. Zhang, J.; Feng, Q.; Zhang, X.; Hu, Q.; Wen, S.; Chen, D.; Zhai, Y.; Yan, X. Multi-fractured horizontal well for improved coalbed methane production in eastern Ordos basin, China: Field observations and numerical simulations. J. Pet. Sci. Eng. 2020, 194, 107488.
3. Zhang, J.; Feng, Q.; Zhang, X.; Bai, J.; Karacan, C.Ö.; Wang, Y.; Elsworth, D. A two-stage step-wise framework for fast optimization of well placement in coalbed methane reservoirs. Int. J. Coal Geol. 2020, 225.
4. Crosdale, P.J.; Beamish, B.B.; Valix, M. Coalbed methane sorption related to coal composition. Int. J. Coal Geol. 1998, 35, 147–158.
5. Kim, D.; Seo, Y.; Kim, J.; Han, J.; Lee, Y. Experimental and simulation studies on adsorption and diffusion characteristics of coalbed methane. Energies 2019, 12, 3445.
6. Peng, Z.; Liu, S.; Li, Y.-J.; Deng, Z.; Feng, H. Pore-scale lattice Boltzmann simulation of gas diffusion–adsorption kinetics considering adsorption-induced diffusivity change. Energies 2020, 13, 4927.
7. Busch, A.; Gensterblum, Y. CBM and CO₂-ECBM related sorption processes in coal: A review. Int. J. Coal Geol. 2011, 87, 49–71.
8. Gasparik, M.; Rexer, T.F.; Aplin, A.C.; Billemont, P.; De Weireld, G.; Gensterblum, Y.; Henry, M.; Krooss, B.M.; Liu, S.; Ma, X.; et al. First international inter-laboratory comparison of high-pressure CH₄, CO₂ and C₂H₆ sorption isotherms on carbonaceous shales. Int. J. Coal Geol. 2014, 132, 131–146.
9. Gensterblum, Y.; van Hemert, P.; Billemont, P.; Battistutta, E.; Busch, A.; Krooss, B.M.; De Weireld, G.; Wolf, K.-H. European inter-laboratory comparison of high pressure CO₂ sorption isotherms II: Natural coals. Int. J. Coal Geol. 2010, 84, 115–124.
10. Mavor, M.J.; Hartman, C.; Pratt, T.J. Uncertainty in sorption isotherm measurements. In Proceedings of the International Coalbed Methane Symposium, Tuscaloosa, AL, USA, 12–14 May 2004.
11. van Hemert, P.; Rudolph-Floter, S.; Wolf, K.-H.A.; Bruining, J. Estimate of equation of state uncertainty for manometric sorption experiments: Case study with Helium and carbon dioxide. SPE J. 2010, 15, 146–151.
12. Zlotea, C.; Moretto, P.; Steriotis, T. A round robin characterisation of the hydrogen sorption properties of a carbon based material. Int. J. Hydrogen Energy 2009, 34, 3044–3057.
13. Busch, A.; Han, F.; Magill, C.R. Paleofloral dependence of coal methane sorption capacity. Int. J. Coal Geol. 2019, 211.
14. Day, S.; Sakurovs, R.; Weir, S. Supercritical gas sorption on moist coals. Int. J. Coal Geol. 2008, 74, 203–214.
15. Weniger, P.; Kalkreuth, W.; Busch, A.; Krooss, B.M. High-pressure methane and carbon dioxide sorption on coal and shale samples from the Paraná Basin, Brazil. Int. J. Coal Geol. 2010, 84, 190–205.
16. Feng, Q.; Zhang, J.; Zhang, X.; Shu, C.; Wen, S.; Wang, S.; Li, J. The use of alternating conditional expectation to predict methane sorption capacity on coal. Int. J. Coal Geol. 2014, 121, 137–147.
17. Laxminarayana, C.; Crosdale, P.J. Role of coal type and rank on methane sorption characteristics of Bowen Basin, Australia coals. Int. J. Coal Geol. 1999, 40, 309–325.
18. Dutta, P.; Bhowmik, S.; Das, S. Methane and carbon dioxide sorption on a set of coals from India. Int. J. Coal Geol. 2011, 85, 289–299.
19. Zhang, Z.; Schott, J.A.; Liu, M.; Chen, H.; Lu, X.; Sumpter, B.G.; Fu, J.; Dai, S. Prediction of carbon dioxide adsorption via deep learning. Angew. Chem. Int. Ed. 2019, 131, 265–269.
20. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
21. Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378.
22. Luo, Z.; Sun, Z.; Ma, F.; Qin, Y.; Ma, S. Power Optimization for Wind Turbines Based on Stacking Model and Pitch Angle Adjustment. Energies 2020, 13, 4158.
23. Amar, M.; Shateri, M.; Hemmati-Sarapardeh, A.; Alamatsaz, A. Modeling oil-brine interfacial tension at high pressure and high salinity conditions. J. Pet. Sci. Eng. 2019, 183, 106413.
24. Cai, Y.; Liu, D.; Yao, Y.; Li, J.; Qiu, Y. Geological controls on prediction of coalbed methane of No. 3 coal seam in Southern Qinshui Basin, North China. Int. J. Coal Geol. 2011, 88, 101–112.
25. Song, Y.; Ma, X.; Liu, S.; Jiang, L.; Hong, F.; Qin, Y. Accumulation conditions and key technologies for exploration and development of Qinshui coalbed methane field. Pet. Res. 2018, 3, 320–335.
26. Chinese National Standard GB/T 212-2008. Proximate Analysis of Coal; Standardization Administration of China: Beijing, China, 2008.
27. Chinese National Standard GB/T 8899-2013. Determination of Maceral Composition and Minerals in Coal; Standardization Administration of China: Beijing, China, 2013.
28. Chinese National Standard GB/T 6948-2008. Method of Determining Microscopically the Reflectance of Vitrinite in Coal; Standardization Administration of China: Beijing, China, 2008.
29. Sanders, M.; Rimmer, S. Revisiting the thermally metamorphosed coals of the Transantarctic Mountains, Antarctica. Int. J. Coal Geol. 2020, 228, 103550.
30. Zhang, J.; Feng, Q.; Zhang, X.; Shu, C.; Wang, S.; Wu, K. A supervised learning approach for accurate modeling of CO₂-Brine interfacial tension with application in identifying the optimum sequestration depth in saline aquifers. Energy Fuels 2020, 34, 7353–7362.
31. Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
32. Feng, Q.; Zhang, J.; Zhang, X.; Wen, S. Proximate analysis based prediction of gross calorific value of coals: A comparison of support vector machine, alternating conditional expectation and artificial neural network. Fuel Process. Technol. 2015, 129, 120–129.
33. Beker, W.; Gajewska, E.P.; Badowski, T.; Grzybowski, B.A. Prediction of major regio-, site-, and diastereoisomers in Diels-Alder reactions by using machine-learning: The importance of physically meaningful descriptors. Angew. Chem. Int. Ed. 2019, 58.
34. An, F.H.; Cheng, Y.P.; Wu, D.M.; Wang, L. The effect of small micropores on methane adsorption of coals from Northern China. Adsorption 2013, 19, 83–90.
35. Clarkson, C.R.; Bustin, R.M. The effect of pore structure and gas pressure upon the transport properties of coal: A laboratory and modeling study. 1. Isotherms and pore volume distributions. Fuel 1999, 78, 1333–1344.
36. Liu, X.; He, X. Effect of pore characteristics on coalbed methane adsorption in middle-high rank coals. Adsorption 2017, 23, 3–12.
37. Hao, S.; Wen, J.; Yu, X.; Wei, C. Effect of the surface oxygen groups on methane adsorption on coals. Appl. Surf. Sci. 2013, 264, 433–442.
38. Jiang, J.; Yang, W.; Cheng, Y.; Zhao, K.; Zheng, S. Pore structure characterization of coal particles via MIP, N₂ and CO₂ adsorption: Effect of coalification on nanopores evolution. Powder Technol. 2019, 354, 136–148.
39. Li, Z.; Liu, D.; Cai, Y.; Ranjith, P.; Yao, Y. Multi-scale quantitative characterization of 3-D pore-fracture networks in bituminous and anthracite coals using FIB-SEM tomography and X-ray μ-CT. Fuel 2017, 209, 43–53.
40. Zhang, R.; Liu, S.; Bahadur, J.; Elsworth, D.; Melnichenko, Y.; He, L.; Wang, Y. Estimation and modeling of coal pore accessibility using small angle neutron scattering. Fuel 2015, 161, 323–332.
41. Elith, J.; Leathwick, J.R.; Hastie, T. A working guide to boosted regression trees. J. Anim. Ecol. 2008, 77, 802–813.
42. Ridgeway, G. Generalized Boosted Models: A Guide to the GBM Package. Available online: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.4024 (accessed on 27 September 2020).
43. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2012, 12, 2825–2830.
44. Zhang, J.; Feng, Q.; Wang, S.; Zhang, X.; Wang, S. Estimation of CO₂–brine interfacial tension using an artificial neural network. J. Supercrit. Fluids 2016, 107, 31–37.
45. Dixit, N.; McColgan, P.; Kusler, K. Machine learning-based probabilistic lithofacies prediction from conventional well logs: A case from the Umiat Oil Field of Alaska. Energies 2020, 13, 4862.
46. Jadidi, M.; Kostic, S.; Zimmer, L.; Dworkin, S.B. An artificial neural network for the low-cost prediction of soot emissions. Energies 2020, 13, 4787.
47. Chang, C.; Lin, C. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2019, 2, 1–27.
48. Khan, P.W.; Byun, Y.-C.; Lee, S.-J.; Kang, D.-H.; Kang, J.-Y.; Park, H.-S. Machine learning-based approach to predict energy consumption of renewable and nonrenewable power sources. Energies 2020, 13, 4870.
49. Memon, Z.A.; Trinchero, R.; Manfredi, P.; Canavero, F.G.; Stievano, I.S. Compressed machine learning models for the uncertainty quantification of power distribution networks. Energies 2020, 13, 4881.
50. Busch, A.; Gensterblum, Y.; Krooss, B.M. Methane and CO₂ sorption and desorption measurements on dry Argonne premium coals: Pure components and mixtures. Int. J. Coal Geol. 2003, 55, 205–224.
51. Li, D.; Liu, Q.; Weniger, P.; Gensterblum, Y.; Busch, A.; Krooss, B.M. High-pressure sorption isotherms and sorption kinetics of CH₄ and CO₂ on coals. Fuel 2010, 89, 569–580.
52. Breiman, L.; Friedman, J.; Stone, C.J.; Olshen, R.A. Classification and Regression Trees; Chapman & Hall (Wadsworth, Inc.): New York, NY, USA, 1984.
53. Louppe, G. Understanding Random Forests: From Theory to Practice. Ph.D. Thesis, University of Liège, Liège, Belgium, 2014.
54. Zhang, J.; Sun, Y.; Shang, L.; Feng, Q.; Gong, L.; Wu, K. A unified intelligent model for estimating (gas + n-alkane) interfacial tension. Fuel 2020, 282, 118783.
55. Tomaschek, F.; Hendrix, P.; Baayen, R.H. Strategies for addressing collinearity in multivariate linguistic data. J. Phon. 2018, 71, 249–267.
56. Levy, J.H.; Day, S.J.; Killingley, J.S. Methane capacities of Bowen Basin coals related to coal properties. Fuel 1997, 76, 813–819.
57. Chalmers, G.R.; Bustin, R.M. On the effects of petrographic composition on coalbed methane sorption. Int. J. Coal Geol. 2007, 69, 288–304.
58. Clarkson, C.R.; Bustin, R.M. Binary gas adsorption/desorption isotherms: Effect of moisture and coal composition upon carbon dioxide selectivity over methane. Int. J. Coal Geol. 2000, 42, 241–271.
59. Weniger, P.; Franců, J.; Hemza, P.; Krooss, B.M. Investigations on the methane and carbon dioxide sorption capacity of coals from the SW Upper Silesian Coal Basin, Czech Republic. Int. J. Coal Geol. 2012, 93, 23–39.
60. Crosdale, P.J.; Moore, T.A.; Mares, T.E. Influence of moisture content and temperature on methane adsorption isotherm analysis for coals from a low-rank, biogenically-sourced gas reservoir. Int. J. Coal Geol. 2008, 76, 166–174.
61. Krooss, B.M.; Van Bergen, F.; Gensterblum, Y.; Siemons, N.; Pagnier, H.J.M.; David, P. High-pressure methane and carbon dioxide adsorption on dry and moisture-equilibrated Pennsylvanian coals. Int. J. Coal Geol. 2002, 51, 69–92.
62. Pan, J.; Hou, Q.; Ju, Y.; Bai, H.; Zhao, Y. Coalbed methane sorption related to coal deformation structures at different temperatures and pressures. Fuel 2012, 102, 760–765.
63. Guan, C.; Liu, S.; Li, C.; Wang, Y.; Zhao, Y. The temperature effect on the methane and CO₂ adsorption capacities of Illinois coal. Fuel 2018, 211, 241–250.
64. Clarkson, C.R.; Bustin, R.M. Variation in micropore capacity and size distribution with composition in bituminous coal of the Western Canadian Sedimentary Basin: Implications for coalbed methane potential. Fuel 1996, 75, 1483–1498.
65. Carroll, R.E.; Pashin, J.C. Relationship of sorption capacity to coal quality: CO₂ sequestration potential of coalbed methane reservoirs in the Black Warrior basin. In Proceedings of the International Coalbed Methane Symposium, University of Alabama, Tuscaloosa, AL, USA, 5–7 May 2003.
66. Faiz, M.; Saghafi, A.; Sherwood, N.; Wang, I. The influence of petrological properties and burial history on coal seam methane reservoir characterisation, Sydney Basin, Australia. Int. J. Coal Geol. 2007, 70, 193–208.
67. Laxminarayana, C.; Crosdale, P.J. Controls on methane sorption capacity of Indian coals. AAPG Bull. 2002, 86, 201–212.
68. Guo, X.; Huan, X.; Huan, H. Structural characteristics of deformed coals with different deformation degrees and their effects on gas adsorption. Energy Fuels 2017, 31, 13374–13381.

Figure 1. Illustration of the study area where coal samples were retrieved. Reprinted with permission from [24]; 2011, Elsevier Ltd.

Figure 2. Stratigraphy of the coal-bearing strata in the study area. Reprinted with permission from [24]; 2011, Elsevier Ltd.

Figure 3. An example of the experimentally measured adsorption isotherm represented with discrete equilibrium points. Pⁱ and Yⁱ represent the equilibrium pressure and the corresponding adsorption amount for the ith equilibrium point.

Figure 4. Correlation analysis of input features with the adsorption capacity represented with Langmuir volume. (a) ash (a.r.); (b) fixed carbon (d.a.f.); (c) vitrinite reflectance; (d) inherent moisture (a.r.); (e) equilibrium moisture; (f) vitrinite (m.m.f) and (g) temperature.

Figure 5. Illustration of the database structure and division of the database into the training, validation and testing sets. P—equilibrium pressure; A—ash; FC—fixed carbon; V—vitrinite; R_omax—vitrinite reflectance; IM—inherent moisture; EM—equilibrium moisture; T—temperature; Y—adsorption amount. Subscript j denotes the jth sample; superscript i on “P” and “Y” denotes the ith equilibrium point on the adsorption isotherm.
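
The database layout described in the Figure 5 caption can be sketched as a flattening step followed by a three-way split. This is an illustrative sketch only: the sample values are made-up placeholders, only two of the sample-level features are carried for brevity, and splitting by whole sample (so that all points of one isotherm stay in one set) is an assumption of this sketch rather than a detail confirmed by the text. The names `flatten` and `split_by_sample` and the 60/20/20 fractions are hypothetical.

```python
import random

# Made-up placeholder samples (NOT measured data): sample-level properties plus
# the (pressure in MPa, adsorption amount in m^3/t) points of one isotherm each.
samples = [
    {
        "id": f"C{j}",
        "A": 10.0 + 5.0 * j,   # ash, %
        "FC": 80.0 + 1.0 * j,  # fixed carbon, %
        "points": [(p, round(20.0 * p / (2.0 + p), 2)) for p in (1.0, 2.0, 4.0, 8.0)],
    }
    for j in range(10)
]


def flatten(sample):
    """Return one row per equilibrium point; sample properties repeat on each row."""
    return [
        {"sample": sample["id"], "P": p, "A": sample["A"], "FC": sample["FC"], "Y": y}
        for p, y in sample["points"]
    ]


def split_by_sample(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Assign whole samples to train/val/test, then flatten each group to rows."""
    ids = [s["id"] for s in samples]
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_val = round(val_frac * len(ids))
    groups = {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }
    return {
        name: [row for s in samples if s["id"] in members for row in flatten(s)]
        for name, members in groups.items()
    }


sets = split_by_sample(samples)
for name, rows in sets.items():
    print(name, len(rows), "rows from", len({r["sample"] for r in rows}), "samples")
```

Splitting by sample rather than by row keeps points from the same isotherm out of both the training and testing sets, which gives a stricter estimate of how the model generalizes to unseen coals.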

Figure 6. Cross plot of the gradient boosting decision tree (GBDT) estimated versus measured adsorption amounts for the (a) training, (b) validation, and (c) testing sets. Open circles are data points; red lines are 45-degree lines.

Figure 7. Comparison of the estimated with measured adsorption isotherms for samples with (a) ash contents of 9.6% and 39.96%, respectively, and (b) fixed carbon contents of 83.88% and 91.54%, respectively.

Figure 8. Cross plot of the BP-ANN estimated versus measured adsorption amounts for the (a) training, (b) validation, and (c) testing sets. Open circles are data points; red lines are 45-degree lines.

Figure 9. Cross plot of the SVM estimated versus measured adsorption amounts for the (a) training, (b) validation, and (c) testing sets. Open circles are data points; red lines are 45-degree lines.

Figure 10. Relative importance of the input variables to the adsorption isotherm.

Figure 11. Dependence of vitrinite reflectance on fixed carbon of the coal samples.

Figure 12. Calculated adsorption isotherms using the constructed GBDT model with reference to varying (a) fixed carbon (d.a.f), (b) ash (a.d.), (c) inherent moisture (a.r.), (d) equilibrium moisture, (e) temperature, (f) vitrinite (m.m.f), and (g) vitrinite reflectance.

Figure 13. Error matrices of (a) average absolute error (AAE), (b) average relative error (ARE), (c) root mean squared error (RMSE) and (d) R² for different input feature scenarios.

Table 1. Summary of the experimental data.

Property	Maximum	Minimum	Average
Ash (a.d.), %	49.59	4.85	18.70
Moisture (a.r.), %	2.20	0.34	1.10
Fixed carbon (d.a.f.), %	93.08	78.15	87.74
Vitrinite (m.m.f), %	97.80	47.50	80.77
Vitrinite reflectance, %	3.18	1.67	2.39
Equilibrium moisture, %	33.90	6.00	14.22
Temperature, °C	45.0	24.0	33.71
Langmuir volume, m³/t	37.26	12.53	24.25
Langmuir pressure, MPa	2.90	1.52	2.03

Note: a.r.—as received; a.d.—air dry; d.a.f—dry ash free; m.m.f—mineral matter free.
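
The Langmuir volume and Langmuir pressure reported in Table 1 define a full isotherm through the standard Langmuir equation, V = V_L·P/(P_L + P). The sketch below evaluates it with the table's average values; pairing the two averages is only an illustration and does not represent any single measured sample. The function name `langmuir_adsorption` is hypothetical.

```python
def langmuir_adsorption(pressure_mpa, v_l, p_l):
    """Adsorbed amount (m^3/t) at `pressure_mpa` from the Langmuir equation.

    v_l: Langmuir volume (m^3/t), the plateau reached at very high pressure.
    p_l: Langmuir pressure (MPa), the pressure at which half of v_l is adsorbed.
    """
    if pressure_mpa < 0 or p_l <= 0:
        raise ValueError("pressure must be non-negative and p_l must be positive")
    return v_l * pressure_mpa / (p_l + pressure_mpa)


V_L_AVG = 24.25  # m^3/t, average Langmuir volume from Table 1
P_L_AVG = 2.03   # MPa, average Langmuir pressure from Table 1

isotherm = [
    (p, langmuir_adsorption(p, V_L_AVG, P_L_AVG))
    for p in (0.5, 1.0, 2.03, 4.0, 8.0)
]
for p, v in isotherm:
    print(f"P = {p:5.2f} MPa -> V = {v:6.3f} m^3/t")
```

At P equal to the Langmuir pressure, the adsorbed amount is exactly half of the Langmuir volume, which is a quick check on any fitted pair of parameters.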

Table 2. Modeling parameters for the back-propagation artificial neural network (BP-ANN) and support vector machine (SVM).

Method	Property	Value
BP-ANN	No. of hidden layers	1
	No. of nodes in each hidden layer	20
	Activation function for hidden layer(s)	Tangent
	Activation function for output layer	Linear
SVM	Kernel function	RBF
	Regularization parameter	86
	Error goal parameter	0.005

Table 3. Error statistics of the GBDT, BP-ANN, and SVM models.

Data Set	Error Matrices	GBDT	ANN	SVM
Training set	AAE, m³/t	0.33	0.21	0.71
	ARE, %	2.31	1.62	5.58
	RMSE, m³/t	0.42	0.28	1.01
	R², fraction	0.993	0.997	0.959
Validation set	AAE, m³/t	0.83	1.14	1.11
	ARE, %	5.97	8.10	9.12
	RMSE, m³/t	1.00	1.45	1.57
	R², fraction	0.950	0.895	0.877
Testing set	AAE, m³/t	0.85	1.26	0.96
	ARE, %	6.35	9.25	7.81
	RMSE, m³/t	1.06	1.81	1.23
	R², fraction	0.946	0.842	0.927
Whole set	AAE, m³/t	0.53	0.61	0.84
	ARE, %	3.85	4.44	6.74
	RMSE, m³/t	0.73	1.06	1.19
	R², fraction	0.977	0.952	0.940
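
The four error statistics in Table 3 can be computed as in the sketch below. The formulas are the common textbook definitions and are assumed here, since the paper's own definitions (Section 2.4.3) are not reproduced in this part of the text; the function name `error_metrics` and the three-point example are hypothetical.

```python
import math


def error_metrics(measured, estimated):
    """Return AAE, ARE (%), RMSE and R^2 for paired measured/estimated values."""
    n = len(measured)
    if n == 0 or n != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    abs_err = [abs(e - m) for m, e in zip(measured, estimated)]
    aae = sum(abs_err) / n                                              # average absolute error
    are = 100.0 * sum(a / abs(m) for a, m in zip(abs_err, measured)) / n  # average relative error, %
    ss_res = sum(a * a for a in abs_err)                                # residual sum of squares
    rmse = math.sqrt(ss_res / n)                                        # root mean squared error
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)                   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                                          # coefficient of determination
    return {"AAE": aae, "ARE": are, "RMSE": rmse, "R2": r2}


# Hypothetical adsorption amounts in m^3/t, for illustration only.
metrics = error_metrics([10.0, 20.0, 30.0], [11.0, 19.0, 30.0])
print(metrics)
```

Because ARE divides by the measured value, it weights errors at low pressure (small adsorption amounts) more heavily than AAE or RMSE do, which is worth keeping in mind when comparing the columns of Table 3.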

Table 4. Input feature scenarios for analyzing the estimation accuracy.

Scenario No.	Input Features *
1	P, A, R_o, FC
2	P, A, R_o, FC, EM
3	P, A, R_o, FC, IM
4	P, A, R_o, FC, V
5	P, A, R_o, FC, T
6	P, A, R_o, FC, EM, IM
7	P, A, R_o, FC, EM, V
8	P, A, R_o, FC, EM, T
9	P, A, R_o, FC, EM, IM, V
10	P, A, R_o, FC, EM, IM, T
11	P, A, R_o, FC, EM, V, T
12	P, A, R_o, FC, EM, IM, V, T

* Abbreviations: P—equilibrium pressure; A—ash; FC—fixed carbon; IM—inherent moisture; R_o—vitrinite reflectance; EM—equilibrium moisture; V—vitrinite; T—temperature.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

