Forecasting of Coalbed Methane Daily Production Based on T-LSTM Neural Networks

Abstract: Accurately forecasting the daily production of coalbed methane (CBM) is important for formulating associated drainage parameters and evaluating the economic benefit of CBM mining. Daily production of CBM depends on many factors, making it difficult to predict using conventional mathematical models. Because traditional methods do not reflect the long-term time series characteristics of CBM production, this study first used a long short-term memory neural network (LSTM) and transfer learning (TL) method for time series forecasting of CBM daily production. Based on the LSTM model, we introduced the idea of transfer learning and proposed a Transfer-LSTM (T-LSTM) CBM production forecasting model. This approach first uses a large amount of data similar to the target to pretrain the weights of the LSTM network, then uses transfer learning to fine-tune the LSTM network parameters a second time, so as to obtain the final T-LSTM model. Experiments were carried out using daily CBM production data for the Panhe Demonstration Zone in the southern Qinshui Basin in China. Based on the results, the idea of transfer learning can solve the problem of insufficient samples during LSTM training. Prediction results for wells that entered the stable period earlier were more accurate, whereas results for types with unstable production in the early stage require further exploration. Because the daily production data of CBM wells have symmetrical similarities, which can provide a reference for the prediction of other wells, our proposed T-LSTM network can achieve good results in production forecasting and can provide guidance for forecasting the production of CBM wells.


Introduction
As a high-quality energy source that can replace natural gas, coalbed methane (CBM) is an important energy reserve in China [1]. The rational use and drainage of CBM is significant for improving energy structure, protecting the environment, and promoting economic development [2]. Forecasting CBM daily production can not only help predict the economic benefits of CBM development [3], but also provide a basis for development of reasonable parameters for CBM drainage, which plays an important role in the orderly mining of CBM [4].
At present, methods for predicting the daily production of CBM mainly include type curves, decline curves, numerical simulations, material balance [5], and machine learning (e.g., neural networks, support vector machines (SVMs)). Li et al. [6] used the Weibull curve to make a segmented prediction of CBM production, obtaining a better fit in linear regression. Xu et al. [7] introduced the relationship between type curve-dimensionless production and time, proposing the type curve method of CBM production by analyzing the influence of permeability and Langmuir pressure on the type curve. Jang et al. [8] predicted production performance of CBM by combining falling curve analysis with material balance and flow analysis to establish a comprehensive production data model of CBM. In addition, several scholars have studied CBM production using multivariate regression methods. For example, Xu et al. [4] established a prediction model for outflows of CBM by combining multivariate regression with contour analysis of the coalbed floor. Chen et al. [9] used the main factors affecting the gas content of coal seams extracted through correlation analysis into the multivariate stepwise regression to produce a predicted value consistent with the measured value. Li et al. [10] combined stepwise regression with a variety of factors to make quantitative predictions about CBM resources.
Numerical simulations of unconventional gas reservoirs are summarized in the works of Cipolla et al. [11], which often employ relatively complex mathematical models. With developments in relevant technologies, geological parameters and human influence factors now require increasing consideration to improve model accuracy. For example, Zhao et al. [12] used a gray lattice Boltzmann method to perform numerical simulations, so as to address the problem of interlayer interference caused by changes in permeability and differences in pressure between coal seams during the mining process. Zhou [13] also used numerical simulations to predict the production of a horizontal CBM well in Australia. Yun et al. [1] used the C++-based Korean software CBMRS 1.0 to develop their own software dedicated to numerical CBM reservoir simulation. Accurate geological parameters and sufficient production data are essential to the use of numerical simulations for production forecasting [14]; without these, numerical simulation is inappropriate.
The material balance equation is also an important tool for estimating the reserves and performance of both conventional and unconventional gas reservoirs. King [15] introduced two material balance methods for unconventional gas reservoirs that considered the effects of adsorbed gas; these were used to estimate natural gas reserves and reservoir predictions. By contrast, the material balance equations proposed by Shi et al. [16] considered the effects of the difference between initial reservoir pressure and critical desorption pressure, pore and water compressibility, and dissolved and free gas factors. Sun et al. [17] proposed an improved flow material balance (FMB) equation method by taking into account the pressure-saturation relationship, which provides a reliable tool for extracting low-permeability CBM reservoir information. As evidenced by these advancements, material balance equation methods for CBM production forecasting can take into account numerous factors; however, CBM production is a complex and dynamic process that is not limited to the aforementioned factors, and the difficulty of obtaining these hinders the performance of these methods.
Machine learning methods avoid discussion of complex geological conditions as well as human factors, allowing for more convenient applications in production forecasting. For example, many scholars [18][19][20][21] have used BP neural networks to predict CBM production and have found that neural networks are more accurate than traditional methods. Xia et al. [22] proposed a mixed method based on a rough set (RS) and least squares support vector machine (LS-SVM) to predict CBM productivity. Existing methods for predicting CBM production have been used widely, and a majority have established a complex mathematical model for production forecasting. However, CBM production is a complex dynamic process, influenced by several factors. This, coupled with the unavailability of several factors, makes the process difficult to describe using a mathematical model [2]. Although machine learning methods such as BP neural networks and SVMs have a wide range of applications in production forecasting, predicting CBM production is a typical time series problem based on historical production data of gas wells. BP neural networks and other methods do not fully consider the time dependence of time series data, analyze only a single sample's data [23], and are very susceptible to errors.
Symmetry 2020, 12, 861
Although the process of CBM mining is extremely complicated, the daily production of CBM itself is the result of a combination of various factors, which to some extent reflects the internal mechanism of the system [24]. Accordingly, it is possible to abandon complicated research and find out the inherent change rule based on historical data by studying daily CBM production data, then predict future output. Aiming at the time series variation rule of CBM production data, this study proposes a prediction method of CBM production based on long short-term memory (LSTM) models. The main contribution of this paper is to propose a Transfer-LSTM (T-LSTM) model for predicting CBM production. It innovatively applies transfer learning (TL) to the LSTM network pretraining process and has achieved good results.

Related Work
LSTM is suitable for analyzing time series, being a deep learning recurrent neural network structure that can learn long-term sequence data [25]. With the development of deep learning, the LSTM model has achieved satisfactory results in many areas. In time series prediction, for example, many scholars [25][26][27][28] have applied the LSTM model to predict the fine particulate matter (PM2.5) concentration of air pollution and obtained more accurate predictions than with existing methods. Chen [29], Tian [30], and Li et al. [31] studied traffic flow using an LSTM model. Ma et al. [32] used LSTM and Beijing microwave traffic detection data to predict traffic speed and found that an LSTM network achieved better prediction accuracy and stability than traditional neural networks and other parametric or nonparametric algorithms. Fischer [33] and Kim et al. [34] used the LSTM model to make predictions about financial markets and found that the results were better than those of random forest, deep neural network (DNN), and other methods. Peng [35] and Fang et al. [36] used the LSTM model to predict electricity prices and electricity sales. Sagheer et al. [37] proposed a deep long short-term memory (DLSTM) oil production prediction method based on genetic algorithm optimization and compared it with statistical and software calculation methods, finding that the proposed DLSTM model was more accurate under different comparison standards. Wang [3] analyzed the time series of mine gas leakage based on a deep learning method but did not predict CBM production. Although LSTM models have been widely used in time series analysis, they are still relatively new in CBM production forecasting. In this study, the LSTM model was applied to predict CBM production, leading to a discussion of the feasibility of using deep learning for this task.

Data Description
The experimental datasets used in this study were the CBM production data of the Panhe Demonstration Zone of Qinnan CBM in Jincheng, Shanxi, China. The first phase of the demonstration area was put into operation in 2005. Through years of drainage, the CBM production of these CBM wells has entered a stable stage. The dataset consists of two parts. The first includes multiple indicators, such as CBM production, bottom hole temperature, casing pressure, water production, coal seam structure, and coal seam thickness, for 149 gas wells from 2005 to 2010. The second part contains the CBM production of another 7 gas wells from 2011 to 2014. Because these data are the earliest successful commercialized CBM drainage data in China, analysis of the production data of the early wells can guide the exploitation of new wells. In this study, we mainly tested the CBM production of the seven wells in the second part. Because CBM production data for seven wells are insufficient for deep learning network training, we used the first part of the data to pretrain the network and used transfer learning (see Section 3.2 of this paper) to predict the second part of the CBM production. The time series of CBM production in the seven wells is shown in Figure 1, where X indicates the number of days of CBM mining (1, 2, 3, ..., n) and y represents daily CBM production (m³). It can be seen from the curves that the CBM output types of the seven wells are different and vary greatly across phases.
The data were divided into training and test sets in a ratio of 80% to 20%, so that sufficient data were available to train the network. To avoid the impact of an excessive numerical range on the accuracy of the network, the CBM datasets were first normalized to values between 0 and 1 using the min-max normalization method. Normalization of the data is conducive to initialization and adjustment of the learning rate and can greatly improve a neural network's speed in finding an optimal solution [38].
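The split and scaling described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, and fitting the minimum and maximum on the training portion only is an assumption (the text does not say which portion was used), so scaled test values may fall slightly outside [0, 1].

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into train/test parts without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def min_max_scale(values, lo, hi):
    """Map values to [0, 1] relative to the range [lo, hi]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def min_max_inverse(scaled, lo, hi):
    """Undo min_max_scale to recover production in original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical daily production values (m^3), for illustration only.
production = [120.0, 150.0, 180.0, 210.0, 240.0, 300.0, 270.0, 330.0, 360.0, 420.0]

train, test = chronological_split(production)
lo, hi = min(train), max(train)  # assumption: range taken from training data
train_scaled = min_max_scale(train, lo, hi)
test_scaled = min_max_scale(test, lo, hi)
```

The inverse function is what an output stage would apply to turn network predictions back into production values.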

LSTM Neural Network
LSTM is a special kind of recurrent neural network (RNN) that can effectively solve the vanishing or exploding gradient problem of RNN when learning long-term dependencies in time series [39][40][41][42][43][44][45]. Unlike traditional neural networks, RNN has hidden nodes that are connected, with the current output of a time series related to the output before it, so that the time sequence characteristics of data can be taken into account. The structure of a typical RNN network is shown in Figure 2. The left side represents a network element, and the right side is the unfolded form of multiple network elements, where t denotes the time step, X the input values, and S_t the memory at time t. W, U, and V represent the weights of the input, the previous moment, and the output, respectively, and O represents the output values. RNN can capture timing features but can solve only short-sequence problems; it does not work for long-sequence data such as CBM daily production.
LSTM can overcome the shortcomings of RNN and features long-term memory. A typical LSTM network cell structure is shown in Figure 3, where Cell represents a memory cell; X_t and h_t represent the input and output at time t, respectively; and h_{t-1} represents the output at the previous time step. The LSTM network structure solves the RNN gradient problem through three "gates": a "forget gate", an "input gate", and an "output gate". The "forget gate" decides which information to discard and which to retain, the "input gate" controls the information input to the cell, and the "output gate" controls the output information. In the equations for the hidden layer of the LSTM network, t is the time step; i, f, o, and c are the input gate, forget gate, output gate, and memory cell, respectively; W and b are the corresponding weights and biases, respectively; and σ represents the sigmoid function.
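For reference, the gate computations described above are commonly written as shown below. This is the usual formulation without peephole connections, given here as a sketch: the weight names (W_{xi}, W_{hi}, and so on), the tanh activations, and the element-wise product ⊙ are assumptions of this illustration and may differ from the variant used in the study.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The additive update of c_t is what lets information, and gradients, persist across many time steps, which is the long-term memory property referred to in the text.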
According to the characteristics of CBM time series data, the overall framework for predicting CBM daily production using the LSTM model is shown in Figure 4. The framework was divided into three parts: input layer, hidden layer, and output layer. The input layer was responsible for normalizing the data and dividing the datasets, and its output served as the input to the hidden layer. The hidden layer was used to build the LSTM network structure and predict the time series data. After the prediction was completed, the output layer reversed the normalization of the hidden layer's output, and the resulting predictions were used as the output.
We used mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) as the three principal indicators to evaluate the accuracy of our experiments. In these indicators, P_i is the predicted CBM production, A_i is the actual CBM production, and n is the number of test samples.
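The three indicators can be computed as in the sketch below, which uses their standard textbook definitions. Expressing MAPE as a percentage and the sample values are assumptions made for illustration; MAPE is undefined when an actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: (1/n) * sum(|P_i - A_i|)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent: (100/n) * sum(|(P_i - A_i) / A_i|)."""
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: sqrt((1/n) * sum((P_i - A_i)^2))."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical actual and predicted production values (m^3).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

MAE and RMSE are in the units of production, whereas MAPE is scale-free, which makes it the more convenient indicator for comparing wells with different output levels.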

T-LSTM Model
Transfer learning applies knowledge learned in a similar domain to a target domain to make up for an insufficient number of samples in the actual training process. The similar domain refers to the existing knowledge, whereas the target domain is the new knowledge to be learned. In our study, the similar domain is the data for the 149 wells in the first part, and the target domain is the data for the 7 wells in the second part. A dataset of 7 gas wells is relatively sparse for deep learning training. Compared with our target data, a dataset of 149 gas wells is relatively large, so we use the transfer learning method to apply the knowledge learned on the 149 gas wells to the target domain. Although the data in the similar domain are not from the same period as our 7 target gas wells, they have similar periodic characteristics and are very similar to our task domain [6]. Accordingly, we can first use gas production data for the 149 wells to pretrain our network, fix the learned weights, then transfer them to the target domain, using gas production data in the target domain to fine-tune the network parameters. In this way, we form a new T-LSTM model. Through transfer learning, we can largely solve the problem of insufficient samples for deep learning. The specific process is shown in Figure 5.
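The two-stage workflow described above (pretrain on the similar domain, then fine-tune on the target domain) can be sketched as follows. To keep the example dependency-free, a two-parameter linear autoregressive model stands in for the LSTM; this is a deliberate simplification, not the T-LSTM itself, and the freezing of selected layers is not modeled. All series values, learning rates, and epoch counts are hypothetical.

```python
def make_pairs(series):
    """Turn a series into (previous value, next value) training pairs."""
    return list(zip(series[:-1], series[1:]))


def mse(params, pairs):
    """Mean squared error of the model y = w * x + b on the given pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


def train(params, pairs, lr, epochs):
    """Full-batch gradient descent on MSE, starting from the given params."""
    w, b = params
    n = len(pairs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


# Hypothetical normalized production curves (similar domain and target domain).
source_wells = [
    [0.10, 0.25, 0.45, 0.60, 0.70, 0.75, 0.78, 0.80],
    [0.05, 0.20, 0.40, 0.58, 0.69, 0.76, 0.79, 0.81],
]
target_well = [0.15, 0.35, 0.55, 0.68, 0.76, 0.80]

source_pairs = [pair for series in source_wells for pair in make_pairs(series)]
target_pairs = make_pairs(target_well)

# Stage 1: pretrain on the larger similar-domain dataset.
pretrained = train((0.0, 0.0), source_pairs, lr=0.1, epochs=500)

# Stage 2: fine-tune on the small target dataset, starting from the
# pretrained parameters instead of a fresh initialization.
fine_tuned = train(pretrained, target_pairs, lr=0.05, epochs=100)

loss_before = mse(pretrained, target_pairs)
loss_after = mse(fine_tuned, target_pairs)
```

The point of the design is the starting position of stage 2: the target data only need to adjust parameters that already encode the shared shape of the production curves, which is why a small target dataset can suffice.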


T-LSTM Network Training and Parameter Optimization
The training set was used as the input to the T-LSTM network model for training, and the model parameters were optimized through experiments. Here, well 4 is taken as an example to explain the setting of the parameters. First, the influence of the number of LSTM hidden layers on network training and prediction results was explored. The input layer and output layer of the network were each set to 1. Based on past experience, the initial learning rate was set to 0.001, the number of nodes in each layer to 128, and the number of training epochs to 300.
Cross-validation is needed for optimal parameter selection. For machine learning and deep learning, there are many traditional cross-validation methods, such as leave-one-out cross-validation and k-fold cross-validation [46,47]. However, these methods cannot be used in time series prediction, because time series data are time-dependent and must be kept in chronological order. To solve this problem, rolling-origin recalibration evaluation was adopted in this paper [48]. The training set was divided into two parts, with the first 50% serving as the training subset and the second 50% as the validation set. In the forward rolling validation, the data from the validation set were moved to the training subset in chronological order. Our cross-validation process is shown in Figure 6, with blue circles representing training data, red circles validation data, and hollow circles unused data. We split the validation set five times, each time using 10% of the data for validation and then moving it to the training subset in chronological order for the next training process.
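The rolling-origin scheme described above can be sketched as an index generator. The function name and defaults are hypothetical; the sketch assumes the last fold absorbs any remainder when the validation half does not divide evenly into five parts.

```python
def rolling_origin_splits(n_samples, initial_fraction=0.5, n_splits=5):
    """Return (train_indices, validation_indices) pairs in chronological order.

    The first `initial_fraction` of the samples forms the initial training
    subset; the rest is cut into `n_splits` consecutive validation folds.
    After each round, the fold just validated joins the training subset.
    """
    start = int(n_samples * initial_fraction)
    fold = (n_samples - start) // n_splits
    if fold == 0:
        raise ValueError("not enough samples for the requested number of splits")
    splits = []
    for k in range(n_splits):
        val_start = start + k * fold
        # The last fold takes whatever remains.
        val_end = n_samples if k == n_splits - 1 else val_start + fold
        splits.append((list(range(val_start)), list(range(val_start, val_end))))
    return splits


splits = rolling_origin_splits(100)
for train_idx, val_idx in splits:
    print(len(train_idx), val_idx[0], val_idx[-1])
```

Every validation fold lies strictly after its training indices, so the model is never evaluated on data that precede what it was trained on.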
The optimal number of hidden layers was selected by experiments from 1 to 5. To improve the generalization ability of the model and prevent overfitting, dropout was included in the LSTM model, with 20% of the nodes discarded. The loss function used was the mean squared error (MSE). The LSTM network was trained with each number of hidden layers (1, 2, 3, 4, 5), and the number of hidden layers was selected by comparing the loss curves of the models, as shown in Figure 7. With two hidden layers, the loss was at its minimum (less than 0.001); accordingly, two hidden layers were selected to train the LSTM model. The more hidden layers used, the longer the training time required, and the results show that using too many layers not only reduces training accuracy but also greatly increases training time. More hidden layers therefore do not necessarily produce better network training.
The learning rate was found to directly affect the training result of the model. If the learning rate was too high, the model was not trained accurately; conversely, too small a learning rate increased network convergence time. With the number of LSTM hidden layers fixed at 2, different learning rates were tested (LR = 0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, and 0.1) to explore the influence of the learning rate on the model. The variations in model loss for the different learning rates are shown in Figure 8. When the learning rate was large (LR = 0.1, 0.05), the loss did not gradually decrease with the epoch but rather increased to a certain value and then stopped changing, and the model fit was poor. At lower learning rates (LR = 0.01, 0.005), the loss initially decreased as the epoch increased; as the epoch continued to increase, however, the loss rose again and mostly oscillated. When the learning rate decreased to 0.001, the loss decreased with increasing epoch and gradually stabilized. When the learning rate decreased to 0.0001, the loss reached its minimum value (less than 0.001) and model accuracy its highest value (with RMSE, MAE, and MAPE at their minimum values). As the learning rate continued to decrease, the loss no longer decreased significantly, but the precision of the model started to decrease (RMSE, MAE, and MAPE became larger). These findings show that LR = 0.0001 gives the best fit and the highest accuracy. Accordingly, a learning rate of 0.0001 was adopted to train the LSTM network.
Based on the foregoing analysis, we used a network with two hidden layers and set the learning rate to 0.0001.

Results and Discussion
We trained the T-LSTM model through cross-validation and finally determined the optimal structure and parameters of the network. Then, the T-LSTM model was used to predict the CBM production of the seven wells. The final prediction of CBM production is shown in Figure 9 (No. 1-No. 7), with blue representing measured data, green the LSTM network output for the training set, and red the prediction of the T-LSTM network for the test set. It can be seen that time series analysis of daily CBM production by T-LSTM can achieve accurate results: the CBM production predicted by the T-LSTM network was close to the actual production. The predicted production of wells 2 and 4 was the most consistent with actual production. For these wells, the predicted values are evenly distributed on both sides of the true values, indicating that the predicted results are unbiased. By contrast, the prediction accuracy for wells 1, 6, and 7 was significantly lower than for the other wells: the predicted values deviated farther from the true values, the overall estimated values were higher than the true values, and the predicted results were biased.

The RMSE, MAE, and MAPE values of the seven CBM wells were statistically analyzed, as shown in Table 1.
For the LSTM network, each training run produced different results. To avoid the contingency of network training and develop a robust LSTM network, 30 independent repeat experiments were conducted on the CBM production data for each gas well. After setting the network parameters, training for each well was repeated 30 times, with RMSE, MAE, and MAPE values obtained for each repeat. Boxplots were then used to statistically analyze the results of the 30 experiments, with the RMSE, MAE, and MAPE distributions used to verify the stability of the T-LSTM network. Thirty independent repetitions sufficed to obtain a good distribution of these three indicators. The boxplots for the seven wells are shown in Figure 10. It can be seen that the RMSE, MAE, and MAPE results obtained from the 30 training runs were concentrated, with a small interval between the upper and lower quartiles indicating a small distribution range and good stability of the LSTM network. From the boxplots, we can clearly see that the prediction result for well 2 was the most accurate, featuring the smallest RMSE, MAE, and MAPE values. The prediction results for wells 1, 6, and 7 were relatively poor, featuring large values of RMSE, MAE, and MAPE. Based on analysis of the change curves of production with time, well 2 entered the stage of stable and continuous CBM production earlier, so its prediction result was relatively accurate. However, for types such as well 6, which have low or unstable output at the early stage, the neural network has not yet learned the time series characteristics of the curves well, making it difficult to capture the time at which output changes and thus to forecast these types with a high degree of accuracy.
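The quantities a boxplot displays for the 30 repeated runs can be computed as in the following sketch, which uses Python's standard `statistics.quantiles`. The RMSE values are placeholders generated for illustration, not results from the study, and the choice of the inclusive quartile method is an assumption.

```python
import statistics


def boxplot_summary(values):
    """Return the five-number summary and interquartile range of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Placeholder RMSE values for 30 independent training runs of one well.
rmse_runs = [50 + (run % 5) for run in range(30)]

summary = boxplot_summary(rmse_runs)
print(summary)
```

A small interquartile range relative to the median is the numerical counterpart of the narrow boxes that the text interprets as good training stability.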
In addition, we compared the T-LSTM model with other CBM production prediction cases in the literature, as shown in Table 2. The average relative error of the proposed T-LSTM model is 2.20%. The prediction error of the T-LSTM model was smaller than in most of the referenced cases, indicating that the T-LSTM model can predict CBM production better than many traditional methods. Moreover, the T-LSTM model is more convenient to operate than traditional methods. It does not require consideration of many complex geological factors and complex mathematical models, only input of historical production data, and thus can be more efficient than many traditional methods.
Table 2. Comparison of the average relative error between the Transfer-LSTM (T-LSTM) model and some traditional cases in the literature.

Conclusions
In this study, we proposed a time series forecasting method for CBM daily production based on a T-LSTM network and described parameter selection, model training, and production forecasting with the T-LSTM model. The experiments led to the following findings:
(1) The use of the T-LSTM model for time series forecasting of CBM production can provide accurate results. Compared with traditional methods, the LSTM model does not need to consider the complex mining process of CBM; instead, it learns patterns directly from the time series data to predict future output. Combining transfer learning with LSTM can solve the problem of insufficient training samples for deep learning. In the experiments, the CBM production curve predicted by the T-LSTM model was very close to the actual production curve and the error was small, suggesting a significant role for this model in practical applications.
(2) When training the LSTM model, the number of hidden layers and the learning rate are very important. With too few hidden layers, the model cannot be fully trained; too many layers increase the network training time and reduce efficiency. A learning rate that is too large or too small affects network convergence speed and can even lead to overfitting or underfitting. Accordingly, multiple experiments are needed to find the most suitable values.
(3) When predicting CBM production in seven gas wells, the prediction accuracy varied greatly between gas wells with different periodic trends. For gas wells that entered the stable production period earlier, the T-LSTM model's predictions were more accurate. However, for gas wells that showed unstable production for a long time in the early stage, or that had extremely low production followed by a sudden increase at a certain stage, the predictions were relatively poor. CBM production of these types of wells has not yet settled into a regular pattern, and as a result the neural network was not fully trained. This suggests that LSTM might not be suitable for predicting production of all types of CBM wells, which requires further exploration.
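The sensitivity to learning rate noted in finding (2) can be illustrated with a toy example. The sketch below is not the LSTM training procedure used in this study; it applies plain gradient descent to the one-dimensional function f(w) = w², and the function name, step count, and candidate learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimize f(w) = w**2 by gradient descent; return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w -= lr * grad      # each step multiplies w by (1 - 2 * lr)
    return abs(w)

# Hypothetical candidates: too small, moderate, and too large.
results = {lr: gradient_descent(lr) for lr in (0.01, 0.4, 1.1)}
for lr, final in results.items():
    print(f"lr={lr}: final |w| = {final:.3e}")
```

With the small rate the iterate is still far from the minimum after 50 steps, with the moderate rate it converges quickly, and with the large rate it diverges. In practice, the same kind of sweep would be run over the real network's validation error rather than a toy objective.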