An Automatic Classification Method of Well Testing Plot Based on Convolutional Neural Network (CNN)

Abstract: The precondition of well testing interpretation is to determine the appropriate well testing model. In numerous past attempts, automatic classification and identification of well testing plots have been limited to fully connected neural networks (FCNN). Compared with the FCNN, the convolutional neural network (CNN) has a better performance in the domain of image recognition. Utilizing the CNN, we develop a new automatic identification approach to evaluate the type of well testing curves. Field data in tight reservoirs such as the Ordos Basin exhibit various well test models. With those models, the corresponding well test curves are chosen as training samples. One-hot encoding, Xavier normal initialization, a regularization technique, and the Adam algorithm are combined to optimize the established model. The evaluation results show that the CNN performs better when the ReLU function is used. For the learning rate and dropout rate, the optimized values are 0.005 and 0.4, respectively. Meanwhile, when the number of training samples was greater than 2000, the performance of the established CNN tended to be stable. Compared with an FCNN of similar structure, the CNN is more suitable for the classification of well testing plots. Moreover, the practical application shows that the CNN can successfully classify 21 of the 25 field cases.


Introduction
Well testing generally has two major categories: transient rate analysis and transient pressure analysis. The main purpose of transient pressure analysis is to identify the type of the target reservoir and further quantitatively determine the reservoir properties. Muskat [1] first proposed a method of estimating the initial reservoir pressure and parameters using a buildup test plot. Because the compressibility of the formation fluid is difficult to study, this method can only analyze the results qualitatively. Van Everdingen and Hurst [2] used the Laplace integral method to obtain the analytical solution of the transient diffusion equation, which provides the mathematical basis of well testing. On this basis, Horner et al. [3] developed the classic "semi-log" analysis method, which can determine the permeability, skin factor, productivity index, and other parameters. These methods make full use of the mid- and late-period data in well testing, but a common disadvantage is that the early data are ignored.
In order to make reasonable use of the early data in well testing, Ramey et al. [4] first proposed a "plate analysis method" based on the log-log type plot. Further, Gringarten et al. [5] extended this method to various well test models such as the dual-porosity model and the fractured well model, and combinations of different parameters were used to greatly reduce the difficulty of curve classification.

Background
The Ordos Basin is the second largest sedimentary basin in China, and it contains abundant oil and gas reserves. In terms of geology, the Ordos Basin is a large-scale multicycle craton basin with simple tectonics, which is made up of the Yimeng uplift, the Weibei uplift, the Western margin thrust belt, the Tianhuan depression, and the Jinxi flexure belt [23,24]. This basin is a typical low-permeability formation with an average permeability of less than 1 mD. Except for the Chang 6 reservoir with developed horizontal bedding [25,26], the horizontal stress in most areas of the basin is greater than the vertical stress, which means that the fractures generated by hydraulic fracturing are mainly vertical fractures [27-30].

Concept of CNN
Traditional neural networks (like FCNNs) use matrix multiplication to describe the connection between input nodes and output nodes, wherein each individual weight of the weight matrix describes the interaction between one input unit and one output unit. For traditional neural networks, when the number of input nodes is quite large, the number of weights also becomes very large, and the training efficiency drops drastically. To address this issue, the convolution method is needed to reduce the number of weights and thus the training cost. The two main advantages of the convolution method are weight sharing and sparse connections, which effectively improve this situation. The calculation process of the convolution method is shown in Figure 1. The filter contains the weights to be optimized, and the forward propagation of the filter computes the output data as the inner product between the weights in the filter and the corresponding window of the input data. In a CNN, the same filter weights are shared across all positions of the input within a convolutional layer (CONV layer). This weight sharing makes the extracted features insensitive to the local position of image content and reduces the number of weights. The convolved data are then passed through the activation function to produce the output.
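The forward pass described above, a single shared filter sliding over the input, can be sketched in a few lines (an illustrative NumPy sketch, not the paper's implementation; the filter values are arbitrary):

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D convolution: slide a shared filter w over input x.

    The same weights are reused at every position (weight sharing),
    and each output depends only on a small window of the input
    (sparse connections).
    """
    k = len(w)
    out = np.empty(len(x) - k + 1)
    for i in range(len(out)):
        out[i] = np.dot(x[i:i + k], w) + b
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])   # arbitrary edge-detecting filter
y = conv1d(x, w, b=0.0)
print(y)  # [-2. -2. -2.]
```

Note that the three weights in `w` produce every output value, whereas a fully connected layer mapping 5 inputs to 3 outputs would need 15 weights plus biases.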


In addition to the CONV layer, the network often uses a pooling layer, which can adjust the output structure of the layer and reduce the size of the model. With the pooling layer, the calculation speed and the robustness of the model are improved. The pooling layer usually includes a max-pooling layer and an average-pooling layer, as shown in Figure 2, which output the maximum value and the average value in the filter area, respectively. Thus, no weights exist in the filter of a pooling layer.
To achieve different tasks, different layers need to be connected to form a CNN. AlexNet is a typical CNN proposed by Krizhevsky et al. [31], which has a simple model structure but an accurate image recognition rate. AlexNet fully demonstrates the superior performance of CNNs in dealing with complex data. As shown in Figure 3, the structure of AlexNet has 8 layers with weights, including 5 CONV layers and 3 fully connected layers (FC layers). Three max-pooling layers are utilized to adjust the output shape. Additionally, to reduce the dimension of the curve probabilistic prediction data, a flatten operation is used before the first FC layer. Finally, the FC layers are used to achieve dimensional reduction of the data and output the final results. In the calculation process of the FC layers, the softmax function is usually chosen to calculate the probability of each class from the data after dimension reduction. The class with the highest probability is the final result of the CNN. Equation (1) gives the mathematical expression of the softmax function.
softmax(a_l) = e^(a_l) / Σ_{i=1}^{c} e^(a_i), l = 1, 2, ..., c, (1)

where a_l is the output value of the lth node of the output layer and c is the total number of sample classes.
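Equation (1) can be sketched numerically as follows (a minimal NumPy sketch; the exponent shift is a standard numerical-stability trick, not part of the paper's formula):

```python
import numpy as np

def softmax(a):
    """Softmax of Equation (1): normalize raw outputs a_l into
    class probabilities that sum to 1."""
    e = np.exp(a - np.max(a))  # shift exponents for numerical stability
    return e / e.sum()

# Five raw output values, one per well test model class (made up):
a = np.array([2.0, 1.0, 0.1, 0.5, 1.5])
p = softmax(a)
print(round(float(p.sum()), 6))  # 1.0
print(int(np.argmax(p)))         # 0 -> index of the predicted class
```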

Sample Obtaining
The type curve of well testing is the log-log plot, which is based on the analysis of the time, pressure, and its derivative in log-log coordinates. The reservoir types are determined by the different shapes of the curve, and they are very critical to well testing interpretation results. Due to the non-uniqueness of interpretation results, it is difficult to quickly and accurately determine the reservoir type corresponding to a large amount of interpretation data. Automatic identification of well test curve types based on the CNN can significantly reduce the workload of identification, and it provides a reliable basis for accurate parameter inversion.
Production wells in unconventional reservoirs represented by the Ordos Basin are generally hydraulically fractured, so the vertically fractured model is one of the commonly used well test interpretation models in unconventional reservoirs. At the same time, hydraulic fracturing activates the natural fractures in the formation, so considerable amounts of buildup test data from the Ordos Basin are characterized by a dual-porosity model. On the other hand, large-scale hydraulic fracturing significantly improves the permeability of the near-well region, which means that the radial composite model is also used as a reservoir model for well test interpretation in unconventional reservoirs. The mathematical expressions of the above well testing models are given in Appendices B-D. With these mathematical expressions, Figure 4 shows that the typical well test curves for the above models can be roughly divided into the following categories. Under the same reservoir conditions, there is no doubt that the radial composite model with mobility ratio >1 and dispersion ratio >1 has the greatest productivity among the five well test models. The reason is that this model assumes that the area around the production well has been adequately stimulated by hydraulic fracturing, so an inner zone of high permeability is formed around the production well, which contributes to the largest productivity.

In this paper, the training set included 2725 well test curves for the five well test models, and 25 field buildup test cases were used to evaluate the generalization ability of the CNN. The pressure derivative-time curve data for each training sample were used for classification. There were 545 curves for each well test model type, and Table 1 shows the range of the corresponding parameters for the five well test models. Before training, improving, and evaluating the CNN model for well test plots, it was necessary to divide the data into a training set, a validation set, and a test set, whose quantities respectively accounted for 90.909%, 8.182%, and 0.909% of the total number of samples. The primary role of the validation set was to compare the performance of different neural network models. The test set was used to verify the generalization ability of the model based on the field data. The validation set and the test set were not involved in the training process of the network; the first time they were entered into the network was in the process of network verification. In total, 2500 of the theoretical curves were determined as the training set, and the remaining 225 curves were chosen as the validation set. Additionally, 25 field buildup test cases from the Ordos Basin were used as the test set (Figure 5).
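The split described above can be sketched as follows (a hypothetical helper; the paper does not describe its sampling procedure, so a simple seeded shuffle is assumed; the field cases form the separate test set):

```python
import random

def split_dataset(n_samples, n_train, n_val, seed=0):
    """Shuffle sample indices and split them into disjoint
    train/validation index lists; sizes follow the paper
    (2500 train and 225 validation out of 2725 curves)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val]

train_idx, val_idx = split_dataset(2725, 2500, 225)
print(len(train_idx), len(val_idx))  # 2500 225
```

Keeping the validation and test indices disjoint from the training set is what makes them a fair check of generalization.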

Structure of Neural Network Model
The neural network model has a strong ability of nonlinear representation, and its basic unit is the neuron. By designing different numbers of neurons and different numbers of layers, various mapping relations can be characterized.

Model Building of CNN
The CNN constructed in this paper was a five-layer deep network with weights, in which three layers were CONV layers and two layers were FC layers, as shown in Figure 6. Max-pooling and average-pooling layers were also placed between the CONV layers to compress the input data and reduce the overfitting problem. Table 2 shows the number of network weights in the different layers; the total number of weights was 76,583. In order to minimize the number of weights in the CNN, the input to the CNN was the data points of the pressure derivative-time plot, rather than the curve picture. Since the input data points of the pressure derivative-time curve are one-dimensional data with respect to time, we used layers containing one-dimensional (1D) filters, including CONV1D and max-pooling1D. Layers containing two-dimensional (2D) filters (such as CONV2D, max-pooling2D, and average-pooling2D) were used to transform the 1D data into the 2D data needed for the convolutional calculations. In the final layer, the flatten operation and the softmax activation function were used to output the result of the CNN.
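The per-layer weight counts behind totals such as 76,583 follow standard bookkeeping for CONV and FC layers, which can be sketched as follows (illustrative layer sizes only; the paper's exact per-layer sizes are in its Table 2 and are not reproduced here):

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights in a 1D CONV layer: one kernel per filter plus a bias.
    Thanks to weight sharing, the count is independent of the
    input length."""
    return (kernel_size * in_channels + 1) * filters

def dense_params(n_in, n_out):
    """Weights in a fully connected layer, including biases."""
    return (n_in + 1) * n_out

# A filter of width 3 over 16 input channels, producing 32 feature maps:
print(conv1d_params(3, 16, 32))   # 1568
# A dense layer like the FCNN's 488 -> 106 mapping:
print(dense_params(488, 106))     # 51834
```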




Model Building of FCNN
In contrast, an FCNN with a number of weights similar to that of the CNN was established. The input layer, hidden layer, and output layer of the FCNN had 488, 106, and 5 neurons, respectively. Figure 7 shows that the input layer consisted of 488 nodes that accepted the 244 data points (t, dP). Table 3 demonstrates that the FCNN had a total of 76,575 weights.

Evaluation Results for the CNN and FCNN
During the training process of the FCNN and CNN, the maximum value of the output corresponded to the type of curve being predicted, which was recorded as ŷ. In order to optimize the weights in the models, the cross-entropy between the predicted and the theoretical curve type was calculated. As shown in Equation (2), the cross-entropy is usually recorded as the loss function L:

L = -(1/m) Σ_{i=1}^{m} y_i ln(ŷ_i), (2)

When the loss function value is the smallest, the ratio of the number of accurately predicted training samples to the total number of samples (called the accuracy) is the largest, which means that the network model has the highest performance.
where y_i is the type of the ith training sample, ŷ_i is the predicted probability for the ith training sample, and m is the number of training samples. In order to obtain a robust and fast CNN, one-hot encoding, Xavier normal initialization, the ReLU activation function, the L2 regularization method, the Adam optimization algorithm, and the mini batch technique were combined to further construct the CNN.
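The loss of Equation (2) can be sketched numerically (a small NumPy sketch with made-up probabilities; the clip guard against log(0) is a common numerical safeguard, not part of the paper's formula):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot labels y_true and
    predicted class probabilities y_pred, as in Equation (2)."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)  # one-hot labels
y_pred = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])   # made-up outputs
print(round(cross_entropy(y_true, y_pred), 4))  # 0.2899
```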

One-Hot Encoding
In machine learning training tasks, the variety of sources of training data leads to complex data types; the training data can be roughly divided into categorical (type) data and numerical data.
The training of a neural network model is performed on numerical data. Therefore, in a classification task, the categorical data need to be converted into numerical data before they can be used to train the neural network model. One-hot encoding is a commonly used method for this conversion, which encodes each category as a binary vector with at most one valid value. As shown in Table 4, each column represents a category in the training sample data, and the unit containing "1" indicates the category to which the sample belongs.
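The encoding in Table 4 can be sketched as follows (the class names below are hypothetical placeholders, not the paper's labels):

```python
def one_hot(labels, classes):
    """Encode class labels as binary vectors with a single 1,
    one column per category (as in Table 4)."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[label] == j else 0 for j in range(len(classes))]
            for label in labels]

# Hypothetical names for the five well test model classes:
classes = ["homogeneous", "dual_porosity", "fractured",
           "composite_high_mobility", "composite_low_mobility"]
print(one_hot(["fractured", "homogeneous"], classes))
# [[0, 0, 1, 0, 0], [1, 0, 0, 0, 0]]
```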

Determination of Model Initialization
During the training process of a neural network model, proper initialization of the weights is essential to establish a robust model. Proper initialization causes the weights to be distributed over a reasonable range. If the initial weight values are too small, the effective information in the backpropagation process will be lost and the training of the neural network model may fail. If the initial weight values are too large, the weight fluctuations during backpropagation will increase, which may lead to instability or even divergence of the training process.
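Xavier normal initialization addresses this by scaling the weight variance to the layer size. A minimal sketch, assuming the Glorot-style variance 2/(n_in + n_out) (the FCNN's 488 x 106 layer is reused only as an example size):

```python
import numpy as np

def xavier_normal(n_in, n_out, rng=np.random.default_rng(0)):
    """Xavier (Glorot) normal initialization: draw weights from
    N(0, 2/(n_in + n_out)) so the signal variance is preserved in
    both the forward and backward pass."""
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

w = xavier_normal(488, 106)
print(w.shape)  # (488, 106)
```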

Selection of Activation Functions
The activation function of the neural network has a significant impact on the prediction performance of the model. When no activation function is used (i.e., f(x) = x), the input of each node is a linear function of the output of the nodes in the upper layer. In this case, regardless of the number of layers in the neural network, the output is only a linear combination of the inputs, and the hidden layers do not work. Only when the neural network model uses a nonlinear activation function is the output no longer a linear combination of the inputs, so that the network can approximate an arbitrary function. Table 5 shows the five commonly used activation functions (i.e., linear, tanh, sigmoid, ELU, and ReLU). As shown in Figure 9, the comparative results showed that the neural network model performed better when the ReLU function was used in the middle layers.

Table 5. The mathematical expressions of five commonly used activation functions.

Type: Equation
linear: f(x) = x
tanh: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
sigmoid: f(x) = 1 / (1 + e^(-x))
ELU: f(x) = x for x > 0; α(e^x - 1) for x ≤ 0
ReLU: f(x) = max(0, x)
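The nonlinear functions among these can be evaluated directly (a small sketch; α = 1 is assumed for the ELU):

```python
import math

def relu(x):
    """ReLU: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def elu(x, a=1.0):
    """ELU with scale a (a = 1 assumed here)."""
    return x if x > 0 else a * (math.exp(x) - 1)

def sigmoid(x):
    """Sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Evaluate each at a negative and a positive point:
for f in (relu, elu, sigmoid, math.tanh):
    print(f.__name__, round(f(-1.0), 4), round(f(1.0), 4))
```

Note the property that favors ReLU in deep networks: its gradient does not saturate for positive inputs, unlike sigmoid and tanh.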

Regularization Technique
Overfitting is a common problem in the training process of neural network models, and it greatly reduces their generalization ability [33]. The main reasons for the overfitting problem are insufficient training samples and a complex network structure. To overcome this problem, the dropout method and the L2 regularization method are used to dynamically adjust the network structure, which can effectively avoid the overfitting problem. (1) In the process of forward propagation, the dropout method makes each node stop working with a certain probability p (called the dropout rate), so that the relative importance of each node is balanced. After the introduction of the dropout method, each node of the neural network model contributes more equally to the output results, which avoids the situation where a few high-weight nodes fully control the output. (2) For the L2 regularization method, the sum of the squared weight values is added to the loss function, which constrains the size of the weights and reduces the complexity of the model. Therefore, Equation (2) is rewritten as Equation (3):

L = -(1/m) Σ_{i=1}^{m} y_i ln(ŷ_i) + (λ/2m) Σ_{j=1}^{n} w_j², (3)

where λ is the super-parameter used to control the level of weight decay and n is the number of weights. The classification accuracies of the model with the dropout method, with the L2 regularization method, and without any regularization method were compared. It can be seen from Figure 10 that the model had the highest accuracy in the validation set when using the dropout method, and its accuracy in the validation set was close to that for the training set.
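Both regularizers can be sketched in a few lines (inverted dropout with rescaling is a common formulation and is assumed here; p = 0.4 matches the optimized dropout rate reported in the abstract):

```python
import numpy as np

def dropout(activations, p, rng=np.random.default_rng(0)):
    """Inverted dropout: zero each node with probability p during
    training and rescale the survivors by 1/(1-p) so the expected
    output is unchanged."""
    mask = (rng.random(activations.shape) >= p) / (1.0 - p)
    return activations * mask

def l2_penalty(weights, lam, m):
    """L2 term of Equation (3): (lambda / 2m) * sum of squared weights."""
    return lam / (2 * m) * sum(float(np.sum(w ** 2)) for w in weights)

a = np.ones(10)
print(dropout(a, p=0.4))  # some entries dropped to 0, the rest ~1.667
print(l2_penalty([np.array([3.0, 4.0])], lam=0.1, m=100))  # 0.0125
```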


Adam Optimization Algorithm
To obtain the minimum value of the loss function, the weights in the network model need to be updated at each iteration step. Among the various optimization algorithms for weight updating, the Adam optimization algorithm proposed by Kingma and Ba [34] has the highest performance [35,36]. Compared to the classical gradient descent method, this method can avoid oscillation of the loss function, and a model trained with the Adam optimization algorithm converges faster. The Adam optimization algorithm updates the network model weights in the form of Equation (4), with the moment estimates given by Equations (5) and (6):

w_j^t = w_j^(t-1) - η_t · ω̂_j^t / (√(υ̂_j^t) + ε), (4)

ω_j^t = β_1 ω_j^(t-1) + (1 - β_1) g_j^t, with ω̂_j^t = ω_j^t / (1 - β_1^t), (5)

υ_j^t = β_2 υ_j^(t-1) + (1 - β_2) (g_j^t)², with υ̂_j^t = υ_j^t / (1 - β_2^t), (6)

where β_1 and β_2 are the exponential decay rates for the moment estimates and g_j^t is the gradient of the jth parameter at the tth time step. In this work, we used the parameters recommended by Kingma and Ba [34]: β_1 = 0.9, β_2 = 0.999, ω_j^0 = 0, υ_j^0 = 0, and ε = 10^(-8).


In Equation (4), η_t is the learning rate at the tth time step, w_j^t is the network weight of the jth feature of the training sample data at the tth time step, and ε is a small constant that avoids a zero denominator.
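One Adam update can be sketched as follows (a NumPy sketch of the bias-corrected moment updates; the recommended decay rates are used, and the learning rate 0.005 is the optimized value reported in the abstract):

```python
import numpy as np

def adam_step(w, g, omega, upsilon, t, eta=0.005,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient
    and squared gradient, bias correction, then a scaled weight step."""
    omega = beta1 * omega + (1 - beta1) * g            # first moment
    upsilon = beta2 * upsilon + (1 - beta2) * g ** 2   # second moment
    omega_hat = omega / (1 - beta1 ** t)               # bias correction
    upsilon_hat = upsilon / (1 - beta2 ** t)
    w = w - eta * omega_hat / (np.sqrt(upsilon_hat) + eps)
    return w, omega, upsilon

w, omega, upsilon = np.array([1.0]), np.zeros(1), np.zeros(1)
g = np.array([0.5])
w, omega, upsilon = adam_step(w, g, omega, upsilon, t=1)
print(w)  # first step moves by ~eta regardless of gradient scale
```

The first step moves the weight by almost exactly the learning rate, which illustrates Adam's scale invariance to the gradient magnitude.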

Mini Batch Technique
The premise of machine learning is a huge sample size. If, during each iteration, the optimization algorithm fits the model to all training samples at once, the computational requirement is enormous. In order to reduce the computational cost and improve efficiency, the mini batch technique was utilized: it randomly selects a small portion of the training samples from the training set for each iteration of the model. Meanwhile, the random selection of training samples also helps the mini batch technique to effectively prevent the neural network model from falling into local minima during the training process. For the mini batch technique, the gradient g_t in Equations (5) and (6) is as follows:

g_t = (1/b) Σ_{k=1}^{b} g_t^k, with g_t^k = ∇_{w^(t-1)} [(1/s) Σ_{r=1}^{s} L(x_{i_r}, y_{i_r}; w^(t-1))],

where b is the number of iterative steps of the mini batch method from the (t-1)th time step to the tth time step, g_t^k is the gradient of the kth iterative step from the (t-1)th time step to the tth time step, w^(t-1) is the weights at the (t-1)th time step, s is the number of training samples in one mini batch, i_1, ..., i_s are random numbers between 1 and m, x_{i_r} is the pressure derivative-time curve data of the i_r th training sample, y_{i_r} is the type of the i_r th training sample, and m is the total number of training samples.
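The batching itself can be sketched as follows (a hypothetical helper; the paper does not state its batch size, so 64 is an arbitrary choice):

```python
import random

def mini_batches(n_samples, batch_size, seed=0):
    """Randomly partition sample indices into mini batches; each
    optimizer step then averages the gradient over one batch
    instead of the full training set."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k:k + batch_size] for k in range(0, n_samples, batch_size)]

batches = mini_batches(2500, 64)
print(len(batches))      # 40 batches over the 2500 training curves
print(len(batches[-1]))  # 4 (the last batch is smaller)
```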

Comparison of Classification Performance for FCNN and CNN
We used the same techniques (including the regularization technique, Adam optimization algorithm, activation functions, and initialization methods) to optimize the FCNN model. Table 6 and Figure 11 compare the errors of the different models. The error on the test set verified the performance of well test plot classification on the field buildup test cases and demonstrated the generalization ability of the CNN. For the FCNN, after 100 iterations, the loss function value was 0.44 and the classification accuracy was 91.2%; in the validation set, its accuracy was 89.8%. For the CNN, the loss function value was 0.19, and the accuracies for the training set and validation set were 96.6% and 95.6%, respectively. The comparison of the FCNN and CNN showed that the CNN had a higher accuracy when the numbers of weights of the two models were close (76,583 versus 76,575 weights).
As shown in Figures 12-14, confusion matrix analysis is a method for judging the classification performance of a neural network model, which shows the accuracy of the classification. Mukhanov et al. [37] used the confusion matrix to evaluate the classification of waterlogging curves by the support vector machine technique. The confusion matrix separately counts the number of misclassified samples and the number of correctly classified samples for each class. It can be seen from Figures 12 and 13, and Tables 7 and 8, that the FCNN had different classification capabilities for the various types of well test curves in the training set and validation set. For the CNN, its classification results for the 5 well test models were 0.98, 0.94, 0.97, 0.95, and 0.98 on the training set and 0.97, 0.93, 0.96, 0.93, and 0.98 on the validation set, which were generally better than the FCNN results. Figure 13 and Table 8 also show that the FCNN forecasts for class 1, class 3, and class 5 in the validation set were basically correct, but there were large errors for the curves of class 2 and class 4. For the CNN, the stability of the forecasting results was high and the prediction errors of the various types of curves were almost the same, indicating the reliability of the CNN. Through the confusion matrix, the recall rate (Equation (11)) and precision rate (Equation (10)) of the model could be calculated. The precision rate is the ratio of the number of correctly predicted samples in a class (TP) to all retrieved items (the sum of TP and FP). The recall rate refers to TP as a percentage of all items that should be retrieved (the sum of TP and FN). The F1 value is the harmonic mean of the precision rate and recall rate, and the average of the F1 values over all classes is taken as the Score (Equation (13)). Table 8 summarizes the performance of the different network models on the validation set. It can be seen that the Score of the FCNN model was 0.81 and that of the CNN was 0.91, indicating that the overall performance of the CNN was better than that of the FCNN.
Precision = TP / (TP + FP)   (10)

Recall = TP / (TP + FN)   (11)

F1 = 2 × Precision × Recall / (Precision + Recall)   (12)

Score = (F1_1 + F1_2 + . . . + F1_c) / c   (13)

where TP is the number of correctly predicted samples in a class. For a certain type of training sample, FN is the difference between the total number of training samples of that type and the number successfully predicted, FP is the difference between the number of samples predicted as that type and the number successfully predicted, and c is the total number of sample classes. Finally, the classification ability of the CNN was verified on 25 field buildup test cases, among which 21 samples were successfully classified. Table 9 and Figure 14 show the confusion matrix of the model on the test set; its Score was 0.69. Appendix A lists the data of the 25 field buildup test cases.
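As an illustration only, the per-class precision, recall, and F1, and the averaged Score can be computed from a confusion matrix as sketched below; the function names are ours, not the paper's:

```python
def class_metrics(cm):
    """Per-class (precision, recall, F1) from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    c = len(cm)
    metrics = []
    for j in range(c):
        tp = cm[j][j]
        fp = sum(cm[i][j] for i in range(c)) - tp   # predicted as j, actually not j
        fn = sum(cm[j][i] for i in range(c)) - tp   # actually j, predicted as not j
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics.append((precision, recall, f1))
    return metrics

def score_from_cm(cm):
    """Score: the mean of the per-class F1 values."""
    m = class_metrics(cm)
    return sum(f1 for _, _, f1 in m) / len(m)
```

Averaging F1 per class (rather than pooling all samples) weights every well test model equally, which matters here because the classes need not be balanced in the validation set.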


Effects of Parameters on Classification Results
Sensitivity analysis is a key step in testing CNN performance and determining the impact of input parameters on the predictive results [38-43]. In order to study the influence of a series of CNN parameters on the prediction results and further optimize the established CNN, a sensitivity analysis was conducted.


Effect of the Learning Rate
The loss function is a function of the weights, and the learning rate determines the update speed of the weights in the CNN and hence the value of the loss function. If the learning rate is too large, the loss function oscillates and the CNN is hard to converge; if it is too small, the weight updates are also small and the model converges slowly. Therefore, there is an optimal learning rate for each CNN. To determine it, we varied the learning rate from 0.0001 to 0.03 while holding the remaining values constant. Figure 15 shows that the CNN had the highest accuracy on both the validation set and the training set when the learning rate was 0.005.
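To illustrate the trade-off described above, a toy sketch with plain gradient descent on the one-dimensional loss L(w) = w^2 shows the same behavior; this is a didactic stand-in, not the paper's CNN training:

```python
def final_weight_magnitude(lr, n_steps=100, w0=1.0):
    """Gradient descent on the toy loss L(w) = w**2 (gradient 2*w).

    A too-large learning rate makes the updates overshoot and diverge,
    a too-small one converges very slowly, and an intermediate value
    reaches the minimum fastest.
    """
    w = w0
    for _ in range(n_steps):
        w -= lr * 2.0 * w                  # weight update: w <- w - lr * dL/dw
    return abs(w)
```

Sweeping the learning rate over a grid (as the paper does over 0.0001-0.03) and keeping the value with the best validation accuracy is the same procedure, with the toy loss replaced by CNN training.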

Effect of the Dropout Rate
The key point of the dropout method for preventing overfitting is to make some nodes of the CNN stop working with a probability equal to the dropout rate. Therefore, the value of the dropout rate has a significant impact on the training of the CNN. As shown in Figure 16, as the dropout rate increased, the accuracy of the CNN on the training set continued to decrease. The accuracy on the validation set first increased and then decreased as the dropout rate increased. Meanwhile, the accuracy difference between the training set and the validation set was large for small dropout values, indicating that overfitting had occurred. As the dropout rate became larger, the accuracy difference became very small, but the CNN no longer fit the training data well. For the CNN in this paper, the optimal value of the dropout rate was 0.4.
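A minimal sketch of the dropout mechanism described above, using the common "inverted dropout" scaling (an assumption on our part; the paper does not state its scaling convention):

```python
import random

def dropout_layer(activations, rate, training=True, rng=random):
    """Inverted dropout: during training each node's output is zeroed with
    probability `rate`; survivors are scaled by 1/(1 - rate) so the expected
    output is unchanged. At inference time the layer passes values through."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]
```

With `rate=0.4`, about 40% of the nodes are silenced on each forward pass, which is what forces the network not to rely on any single feature detector.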


Effect of the Number of Training Samples
The performance of a CNN is strongly controlled by the number of training samples. A small sample size can make the training process difficult to converge. In general, CNNs require a fairly large training sample size, and the drawback is that a large sample size usually increases the demand on the CPU. To select as few samples as possible while ensuring the best performance of the CNN, the impact of sample size on the CNN learning curves was investigated. Figure 17 shows that as the number of training samples increased, the accuracy of the CNN model on the validation set also increased. When the number of training samples was greater than 2000, the increase in validation accuracy flattened out. Meanwhile, the CNN had similar accuracy on the training set and the validation set. Therefore, the number of training samples was finally set to 2500.
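The sample-size study described above amounts to a learning-curve sweep. A generic sketch follows, where `train_and_score` is a hypothetical callable that trains the CNN on n samples and returns its validation accuracy:

```python
def learning_curve(train_and_score, sizes):
    """Evaluate a train/score callable at increasing training-set sizes.

    Returns (size, accuracy) pairs for inspecting where accuracy plateaus.
    """
    return [(n, train_and_score(n)) for n in sizes]

def plateau_size(curve, tol=0.005):
    """First size after which the accuracy gain drops below `tol`."""
    for (n0, a0), (n1, a1) in zip(curve, curve[1:]):
        if a1 - a0 < tol:
            return n0
    return curve[-1][0]
```

Applied to the accuracies in Figure 17, this kind of criterion would flag the flattening beyond 2000 samples that motivated the final choice of 2500.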

Conclusions
In this paper, a CNN model was developed to classify well testing curves. In order to obtain the best curve classification performance, we optimized the CNN model before training in several respects, including the regularization technique, activation function, and optimization algorithm. The results show that the Xavier normal initialization worked best among the four initialization methods. Among the five activation functions, the CNN model performed best when the activation function of the convolution layers and the output layer was the ReLU function. Compared to the L2 regularization method, the dropout method performed better at avoiding overfitting. In addition, the use of the mini-batch technique and the Adam optimization algorithm kept the model out of local minima and made it converge quickly. Further, the impacts of key parameters of the CNN model on its performance were studied. It was found that when the learning rate was 0.005, the CNN had the highest accuracy on the validation set and the training set. For the dropout rate, the CNN best fit the training data without overfitting at a value of 0.4. The analysis of training sample numbers showed that the accuracy difference between the training set and the validation set could be ignored when the number of training samples was 2500. Finally, the classification results of the CNN and an FCNN with a similar structure on well testing curves were compared. For the validation set, the Scores of the FCNN and CNN were 0.81 and 0.91, respectively, indicating that the CNN had more robust performance in classifying well test curves. The 25 field cases from the Ordos Basin showed that the trained CNN could successfully classify 21 cases, which further proved the robustness of the model.


Figure 1. Schematic diagram of convolution in a convolutional neural network (CNN) (the elements in the matrices represent the pixel values of the input data and weights).

Figure 2. Schematic diagram of the pooling layer calculation process: (a) max-pooling layer; (b) average-pooling layer (the elements in the matrices are various pixels).
For training, improving, and evaluating the CNN model for well test plots, it was necessary to divide the data into a training set, a validation set, and a test set, accounting for 90.909%, 8.182%, and 0.909% of the total number of samples, respectively. The primary role of the validation set was to compare the performance of different neural network models. The test set was used to verify the generalization ability of the model on the field data. The validation set and test set were not involved in the training process of the network; they were first entered into the network during verification. In total, 2500 of the theoretical curves were used as the training set, and the remaining 225 curves were chosen as the validation set. Additionally, 25 field buildup test cases from the Ordos Basin were used as the test set. Figure 5 is a schematic diagram of the training data partition.
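A minimal sketch of the data partition described above; the function name and signature are ours, not the paper's:

```python
import random

def partition_curves(theoretical_curves, field_cases, rng=random):
    """Partition as in the text: 2500 shuffled theoretical curves for training
    and the remaining 225 for validation; the 25 field buildup cases form the
    test set and never enter training."""
    assert len(theoretical_curves) == 2725 and len(field_cases) == 25
    shuffled = list(theoretical_curves)
    rng.shuffle(shuffled)
    return shuffled[:2500], shuffled[2500:], list(field_cases)
```

Keeping the field cases entirely outside the theoretical-curve pool is what lets the test-set error measure generalization to real buildup data rather than fit to simulated curves.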

Figure 5. Data partition including training set, validation set, and test set.
The cross-entropy function is usually taken as the loss function L. When the loss function value is smallest, the ratio of the number of accurately predicted training samples to the total number of samples (called the accuracy) is largest, meaning that the network model has the highest performance.
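The loss and accuracy described above can be sketched as follows (a minimal illustration, assuming one-hot labels and softmax output probabilities; the function names are ours):

```python
import math

def cross_entropy_loss(labels_onehot, predictions):
    """Mean cross-entropy over m samples: L = -(1/m) * sum_i sum_l y_il * ln(a_il),
    where a_il is the network's output probability for class l of sample i."""
    total = 0.0
    for y, a in zip(labels_onehot, predictions):
        total -= sum(yl * math.log(al) for yl, al in zip(y, a) if yl)
    return total / len(labels_onehot)

def classification_accuracy(labels_onehot, predictions):
    """Fraction of samples whose highest-probability class matches the label."""
    hits = sum(y.index(1) == a.index(max(a)) for y, a in zip(labels_onehot, predictions))
    return hits / len(labels_onehot)
```

Note that the loss keeps falling as the predicted probability of the true class approaches 1, even after the argmax (and hence the accuracy) is already correct, which is why the two curves in Figure 11 are reported separately.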
The commonly used initialization methods for neural network models fall into four categories: (1) Random normal; (2) Random uniform; (3) Xavier normal; (4) Xavier uniform [32]. In this work, we compared the effects of the four initialization methods on the training results. After 100 iterations of the model, the Xavier normal initialized model had the highest accuracy on the training set and the validation set (Figure 8).
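A sketch of the four schemes, assuming the standard Glorot/Xavier formulas; the 0.05 scale for the two plain random schemes is an assumed default, not taken from the paper:

```python
import math
import random

def init_weights(n_in, n_out, method, rng=random):
    """Return an n_in x n_out weight matrix initialized by the named scheme."""
    if method == "random_normal":
        return [[rng.gauss(0.0, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    if method == "random_uniform":
        return [[rng.uniform(-0.05, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    if method == "xavier_normal":      # std = sqrt(2 / (fan_in + fan_out))
        std = math.sqrt(2.0 / (n_in + n_out))
        return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]
    if method == "xavier_uniform":     # limit = sqrt(6 / (fan_in + fan_out))
        limit = math.sqrt(6.0 / (n_in + n_out))
        return [[rng.uniform(-limit, limit) for _ in range(n_out)] for _ in range(n_in)]
    raise ValueError("unknown method: " + method)
```

The Xavier variants scale the spread of the initial weights to the layer's fan-in and fan-out, which keeps activation variances roughly constant across layers and is a plausible reason they trained better here than the fixed-scale schemes.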


Figure 8. Comparison results of CNN on training set and validation set under different initialization methods.

Figure 9. Comparison results of CNN on training set and validation set under different activation functions.

Figure 10. Comparison results of CNN on training set and validation set under different regularization techniques.

Figure 11. The changes in the accuracy and loss function curves for FCNN and CNN on the training set as the number of iterations increases.

Figure 12. The confusion matrices of FCNN and CNN on the training set.

Figure 13. The confusion matrices of FCNN and CNN on the validation set.

Figure 14. The confusion matrix of CNN on the test set.

Figure 15. Comparison of the training results of CNN on training set and validation set under different learning rates.

Figure 16. Comparison of the training results of CNN on training set and validation set under different dropout rates.

Figure 17. Comparison of the training results of CNN on training set and validation set under different numbers of training samples.

Table 1. The range of model parameters of various well test models in this paper.

Table 2. The layer shape and weights number of CNN.

Table 3. The layer shape and weights number of FCNN.

Table 4. The schematic diagram of one-hot encoding.

Table 5. The mathematical expressions of five commonly used activation functions.

Table 7. The evaluation result of FCNN and CNN on training set.

Table 8. The evaluation result of FCNN and CNN on validation set.

Table 9. The evaluation result of test set for CNN.