4.1. Comparison of the Hyperparameter Tuning of the Learning Rate, Batch Size, and Number of Neurons
(1) Setting the hyperparameters
According to Algorithm 2, this experiment first fixed the values of the non-key parameters (random seed = 2 and 100 training steps), then set the value ranges of three parameters: batch size {2, 4, …, 48}, number of neurons {12, 16, …, 104}, and learning rate {0.001, 0.003, 0.005, 0.01}. Finally, the objective function was set to the minimum MSE.
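For illustration, this grid search amounts to an exhaustive loop over the three value ranges. The following minimal Python sketch assumes a hypothetical helper, train_and_evaluate, that trains the RNN of Algorithm 2 (below) with the fixed settings and returns the test MSE; it is a sketch, not the authors' implementation.

```python
import itertools

# Hyperparameter grid from the experiment.
batch_sizes = range(2, 49, 2)        # {2, 4, ..., 48}
neuron_counts = range(12, 105, 4)    # {12, 16, ..., 104}
learning_rates = [0.001, 0.003, 0.005, 0.01]

best_mse, best_params = float("inf"), None
for bs, n, lr in itertools.product(batch_sizes, neuron_counts, learning_rates):
    # train_and_evaluate is a hypothetical helper: it trains the RNN of
    # Algorithm 2 with seed = 2 and 100 training steps and returns the MSE.
    mse = train_and_evaluate(batch_size=bs, neurons=n, learning_rate=lr,
                             seed=2, steps=100)
    if mse < best_mse:
        best_mse, best_params = mse, (bs, n, lr)

print("minimum MSE:", best_mse, "(batch size, neurons, learning rate):", best_params)
```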
Algorithm 2. The training and prediction of a gas concentration prediction model for a working face based on the RNN.
Input: gas concentration series D of length n; window length m; train/test split point L; number of neurons h; batch size; learning rate; number of training steps; random seed
1. get X, Y from D by m
2. X = min-max(X)
3. get X_train, Y_train from X, Y by L
4. create the hidden layer H with h neural units
5. connect the input layer, H, and the output layer by X_train and L
6. initialize the weights W of H by seed
7. for each step in 1: steps
8.   Y_pred = H(X_train, W)
9.   loss = E(Y_pred, Y_train)
10.  update W by Adam with loss and the learning rate
11. get the trained network H with weights W
12. for each j in 0: (n − m − 1)
13.   y_pred_j = H(X_j, W)
14.   append X with y_pred[−1]
15. end for
16. error measure E(Y_pred, Y)
The batch size is the maximum number of samples used in each training update. As the batch size increases, convergence speeds up, but the optimizer can more easily miss the optimal solution. The number of neurons determines the learning capacity of the model: its expressive ability grows as the number of neurons increases, but the computational cost rises correspondingly, as does the probability of overfitting.
The RNN model training process mainly involves the input layer, the hidden layer, the output layer, and the network-training step, while the prediction process mainly involves the output layer. In Algorithm 2, H denotes the hidden-layer network of the RNN model, h denotes a neural unit in the hidden layer, and E denotes the error-measurement function.
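To make the procedure concrete, the following is a minimal Python sketch of Algorithm 2, assuming a Keras SimpleRNN implementation; the function name, variable names, and layer choices are assumptions for illustration, not the authors' code.

```python
import numpy as np
import tensorflow as tf

def run_algorithm2(series, m, L, neurons, batch_size, lr, steps, seed=2):
    """Sketch of Algorithm 2: sliding-window training and rolling prediction."""
    tf.random.set_seed(seed)                                      # step 6: fix the seed
    s = (series - series.min()) / (series.max() - series.min())   # step 2: min-max scaling
    X = np.array([s[i:i + m] for i in range(len(s) - m)])         # step 1: windows of length m
    Y = s[m:]
    X_tr, Y_tr = X[:L, :, None], Y[:L]                            # step 3: split at index L

    model = tf.keras.Sequential([                                 # steps 4-5: build the network
        tf.keras.layers.SimpleRNN(neurons, input_shape=(m, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr), loss="mse")
    model.fit(X_tr, Y_tr, epochs=steps, batch_size=batch_size, verbose=0)  # steps 7-11

    window, preds = list(s[L:L + m]), []                          # steps 12-15: rolling forecast
    for _ in range(len(s) - L - m):
        y_hat = model.predict(np.array(window)[None, :, None], verbose=0)[0, 0]
        preds.append(y_hat)
        window = window[1:] + [y_hat]                             # step 14: append the prediction
    return float(np.mean((np.array(preds) - s[L + m:]) ** 2))     # step 16: error measure
```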
(2) Analysis of experimental results
Figure 6 shows the training results of using the grid search method to optimize the RNN hyperparameters. In each sub-graph, the horizontal axis is the batch size and the vertical axis is the number of neurons; different sub-graphs correspond to different learning-rate values. mse denotes the minimum MSE within a sub-graph. The lighter the color of a square in the grid, the smaller the MSE.
Figure 6 shows that the MSE became larger as the batch size increased. When the learning rate was 0.001, the MSE could converge to the lowest range; with a batch size in the range [10, 20] and 68 neurons, higher accuracy was easily obtained, and the minimum MSE was 0.0191.
(3) Experimental results
The top five optimal parameter combinations of the RNN model and the corresponding model accuracies are shown in Table 1.
It can be seen that higher accuracy was more easily obtained when the learning rate was 0.001. When the batch size was 10 and the number of neurons was 68, the model's prediction accuracy was the best. The experimental results show that the smaller the learning rate, the higher the prediction accuracy. At the same time, selecting reasonable values for the batch size and the number of neurons can effectively avoid overfitting, increase the learning ability of the model, and improve its prediction accuracy.
4.2. Hyperparameter Tuning Comparison of Network Depth and Dropout Ratio
(1) Hyperparameter setting
According to the training results in Section 4.1, this experiment first fixed the parameter values: random seed = 2, 100 training steps, a batch size of 10, 68 hidden-layer neurons, and a learning rate of 0.001. It then set the value ranges of two parameters: the dropout ratio {0.1, 0.2, 0.3, 0.4, 0.5} and the network depth {1, 2, 3, 4, 5}. Finally, it set the objective function to the minimum MSE.
The dropout ratio is the proportion of neurons randomly deactivated to prevent overfitting. Decreasing the ratio increases the capacity of the model: fewer dropped parameters means greater co-adaptation between model parameters and greater model capacity, but this does not necessarily increase the model's accuracy. When the dropout ratio is too large, the number of effective parameters is reduced, weakening the model's ability to learn and express the data. An increase in network depth increases the model's capacity: under otherwise identical conditions, a deeper network has more parameters and a stronger fitting ability, but as the depth increases, the model becomes increasingly complex and increasingly likely to overfit.
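To illustrate how these two hyperparameters enter the model, the sketch below builds a stacked recurrent network of a given depth with a dropout layer after each recurrent layer, again assuming a Keras implementation; the builder name and its defaults are hypothetical.

```python
import tensorflow as tf

def build_rnn(depth, dropout_rate, neurons=68, window=10):
    """Hypothetical builder: a stack of `depth` recurrent layers, each
    followed by a Dropout layer that deactivates `dropout_rate` of units."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(window, 1)))
    for i in range(depth):
        # every recurrent layer except the last returns full sequences
        # so that the next recurrent layer can consume them
        model.add(tf.keras.layers.SimpleRNN(neurons,
                                            return_sequences=(i < depth - 1)))
        model.add(tf.keras.layers.Dropout(dropout_rate))
    model.add(tf.keras.layers.Dense(1))
    return model

# Grid from the experiment: dropout ratio {0.1, ..., 0.5} x depth {1, ..., 5}.
for rate in (0.1, 0.2, 0.3, 0.4, 0.5):
    for depth in (1, 2, 3, 4, 5):
        model = build_rnn(depth, rate)
        # ... compile with Adam(0.001), train for 100 steps, and record the MSE
```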
(2) Analysis of experimental results
Figure 7 shows the training results of using the grid search method to optimize the RNN hyperparameters. As the dropout ratio increased, the model's ability to prevent overfitting became stronger, but the problem of underfitting became increasingly serious. As shown in Figure 7, as the dropout ratio increased from 0.1 to 0.5, the model's error became larger (see the darker colors in the figure). As the network depth increased, the number of parameters grew, and the model's learning and expressive capability improved once the number of parameters reached a certain level; however, the model's complexity also grew, raising the probability of overfitting and thus the error at greater depths. With one to three layers, the model's error gradually decreased (see the lighter colors in the figure), and when the depth increased to five layers, the predictions began to deteriorate (see the darker colors in the figure). In addition, the experiment found that the smaller the dropout ratio, the smaller the training error; as the network depth increased, the training error followed a parabolic trend, first decreasing and then increasing.
(3) Experimental results
The detailed performance of the RNN gas concentration prediction method in terms of the dropout ratio and network depth is shown in Table 2.
The table shows that when the dropout ratio was 0.1 and the network depth was three layers, the model error reached its minimum of 0.0191. In addition, when the dropout ratio was 0.1, the probability of obtaining a lower error was higher.
Based on the optimization results of the previous two sections, this experiment first fixed the values of the key parameters: random seed = 2, 100 training steps, a batch size of 10, 68 hidden-layer neurons, a learning rate of 0.001, a dropout ratio of 0.1, and a network depth of three layers. Finally, the objective function was set to the highest prediction accuracy (minimum MSE) on the test set, and the early stopping method was used to find the best number of training epochs. The training effect of the early stopping method is shown in Figure 8.
This experiment used the early stopping method for training while monitoring the error on the validation set. When the validation error increased for five consecutive epochs, training was stopped, and the weights from the earlier best iteration were used as the model's final parameters. As shown in Figure 8, when the number of training epochs reached 80, the validation error was 0.0195; after the error continued to increase over five consecutive epochs, the validation error was 0.0201. The experiment found that after 85 training epochs, the RNN prediction model began to overfit: the training error continued to decrease, but the corresponding validation error gradually increased and oscillated. According to the principle of the early stopping method, the model was therefore trained for 80 epochs.
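In a Keras-style implementation, the early stopping rule described above (a patience of five epochs with the best weights restored) could be written as follows; model, X_train, Y_train, X_val, and Y_val are placeholders.

```python
import tensorflow as tf

# `model`, `X_train`, `Y_train`, `X_val`, and `Y_val` are placeholders.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch the validation-set error
    patience=5,                  # stop after five consecutive non-improving epochs
    restore_best_weights=True,   # roll back to the best validation-error weights
)
history = model.fit(X_train, Y_train,
                    validation_data=(X_val, Y_val),
                    epochs=200, batch_size=10,
                    callbacks=[early_stop], verbose=0)
```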
4.3. Comparison of Prediction Performances of Different Models
To evaluate the prediction performance of the three optimized prediction algorithms (SVR, BP, and RNN), the mean absolute percentage error (MAPE) was used as the evaluation index, as shown in Equation (39):

$$\mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - f_i}{y_i} \right| \times 100\% \tag{39}$$

where $f_i$ is the predicted value of the gas concentration, $y_i$ is the true value of the gas concentration, and $n$ is the number of prediction points.
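Equation (39) translates directly into code; the sketch below is a plain NumPy rendering of the formula.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, as in Equation (39)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Example: mape([0.52, 0.55, 0.60], [0.50, 0.56, 0.58]) is approximately 3.0 (%).
```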
The fitting diagram of the multi-parameter fusion prediction of the working face gas concentration using the SVR method is shown in Figure 9. The figure shows that the SVR method could learn the changing trend of the gas concentration time series and roughly fit it. However, as the prediction length increased, the prediction error of the SVR method gradually increased; in particular, at the inflection points of some gas concentration changes, the prediction error was too large.
Figure 10 shows the fitting diagram of the multi-parameter fusion prediction of the coal face gas concentration using PSO-Adam-BP. The parameters of the PSO-Adam model must be set according to the actual optimization problem. This paper used a three-layer neural network structure: the number of hidden-layer neurons was 48, the batch size was 10, the activation functions of the hidden layer and the output layer were the "tansig" and "relu" functions, a dropout layer with a dropout ratio of 0.1 was added to prevent overfitting, and the number of training epochs selected by the early stopping method was 20. The figure shows that the PSO-Adam-BP method could learn the changing trend of the gas concentration time series and essentially fit it. Its prediction performance was better than that of the SVR method, but as the prediction length increased, the prediction error of the PSO-Adam-BP method also gradually increased; in particular, at the time inflection points of some gas concentration changes, the prediction error was too large.
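For reference, the reported BP network configuration could be assembled as in the sketch below; "tansig" is MATLAB's name for the hyperbolic tangent sigmoid, so tanh is used for the hidden layer, and the PSO-based weight initialization is omitted (an assumption of this sketch).

```python
import tensorflow as tf

# "tansig" corresponds to the hyperbolic tangent sigmoid, hence tanh here;
# the PSO weight-initialization step is omitted (assumption of this sketch).
bp_model = tf.keras.Sequential([
    tf.keras.layers.Dense(48, activation="tanh"),  # hidden layer, 48 neurons
    tf.keras.layers.Dropout(0.1),                  # dropout ratio 0.1 against overfitting
    tf.keras.layers.Dense(1, activation="relu"),   # output layer
])
bp_model.compile(optimizer="adam", loss="mse")
# Training would run with a batch size of 10, stopped by the early
# stopping method at about 20 epochs, as reported above.
```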
To study the universality and generalizability of the RNN method, production monitoring data from a coal mine were used as the research object to compare and analyze the prediction performance of each method in other mine applications. The comparison of the prediction results of the different methods is shown in Figure 11.
The SVR, BP neural network, and RNN could all roughly fit the gas concentration change trend during prediction; however, the RNN performed best at the time inflection points of certain concentration changes. As shown in Figure 11, in stages A~D, all three prediction methods could fit the changing trend of the gas concentration, but at the time inflection points of certain gas concentration changes (points A, B, C, and D in the figure), the error of the RNN was clearly smaller than the prediction errors of the SVR and BP neural network models. In addition, when the prediction length was 40-50 units, the RNN prediction method could still fit the changing trend of the gas concentration, but the SVR and BP neural network methods could no longer achieve adequate predictions, especially at point E in the figure, where their prediction errors were relatively large.
Generally speaking, the RNN produced better predictions than the SVR and BP models.