A Deep-Learning-Based Oil-Well-Testing Stage Interpretation Model Integrating Multi-Feature Extraction Methods

Abstract: The interpretation of well-testing data is a key means of decision-making support for oil and gas field development. However, conventional processing methods have many problems, such as the stochastic nature of the data, feature redundancies, the randomness of the initial weights or thresholds, and fluctuations in the generalization ability with slight changes in the network parameters. These result in a poor ability to characterize data features and a low generalization ability of the interpretation models. We propose a new integrated well-testing interpretation model based on a multi-feature extraction method and deep mutual information classifiers (MFE-DMIC). This model avoids the low classification accuracy caused by simple training structures, the lack of redundancy elimination, and non-optimal classifier configuration parameters. First, we obtained the initial features according to four classical feature extraction methods. Then, we eliminated feature redundancies using a deep belief network united with the maximum information coefficient (MIC) method to achieve feature purification. Finally, we calculated the interpretation results using a hybrid particle swarm optimization-support vector machine classification system. We used 572 well-testing field samples, covering five working stages, for model training and testing. The results show that the MFE-DMIC model achieved the highest total stage classification accuracy (98.18%) with the fewest features (nine) compared with the classical feature extraction and classification methods and their combinations. The proposed model can reduce the efforts of oil analysts and allows samples to be labeled accurately.


Introduction
During oil and gas exploitation, the use of an efficient stage interpretation scheme for well-testing data not only guides staff toward revising the production flow but can also provide an important means to manage reservoirs scientifically [1]. Due to temporary operation flow adjustments and the potential for uncertain events to occur at strata and borehole locations [2], well-testing data have the characteristics of being both stochastic and non-scheduled. Even when the data are from the same operation stage but different testing wells, a large difference exists in the length and curve shapes, which poses significant challenges when interpreting the data [3]. These differences directly contribute to an inaccurate matching relationship between the testing data being played back and the operation stage. This mismatch is primarily attributed to the poor information-characterizing ability of the features extracted from the data and to the low generalized interpretation ability of the models [1].


Multi-Feature Extraction
A complete well-testing process has five stages: lowering the oil string, the waiting stage, well opening, well closing, and pulling up the oil string. Although the well-testing procedure is scheduled and planned, the running time of each stage is affected by the on-site working conditions and is therefore uncertain; its configuration cannot be predicted, designed, or added to the program before lowering the electronic pressure gauge into the oil well. Some operating platforms need to adjust the frequency and timing of well opening and closing according to the surface pressure test results. In addition, the underground energy and pressure recovery capacity differ between test wells, which leads to different waiting times for each testing stage. As shown in Figure 2a,b, even when the data are from the well opening stage of two different testing wells, large differences exist in the position, length, and timing of the peaks and inflection points. Therefore, it is impossible to design an accurate operation schedule in advance to provide an operation guide and executive standard for each test stage.
Figure 2 also shows the similarity of the distinguishable features between different stages. A stepped pressure rise can be caused by lowering the oil string, pressure gauge problems, or a low sampling frequency. A step pressure drop occurs when the oil string is pulled up and in variable-rate drawdown tests. Continuous peaks appear due to screen plugging, multiple pressure-relieving operations during well closing, and mechanical vibrations. Furthermore, regarding the failure of well opening and closing, packer seal loss during the closing operation and geological hazards result in burrs similar to the curve peaks.
The above analysis shows that analyzing the data from the time-domain perspective alone is not enough. Time-frequency domain analysis enables multi-resolution feature extraction, which characterizes both the global and the local features better given the above similarities and uncertainties. We therefore propose a multi-feature extraction (MFE) method to detect and distinguish changes in the production operation, noise interference, and formation blockage. We obtained five types of features: the wavelet packet decomposition-approximate entropy feature (WPD-AE), the empirical mode decomposition-approximate entropy feature (EMD-AE), the fast Fourier transform (FFT) coefficient feature, the gradient feature, and the gradient extreme value feature.
The $n_0$th group of data is defined as $s(n_0, N, x, y)$ with data length $N$, where $y$ is the amplitude of the data, the index number is $x \in [0, N-1]$, and SD denotes the standard deviation of $s(n_0, N, x, y)$. The embedding dimension and tolerance threshold used in the approximate entropy (AE) calculation are denoted as $m$ and $r$, respectively.
WPD and EMD are two of the best methods for the nonlinear analysis of data, as they are conducive to analyzing the different types of stepped pressure rises/falls from the perspectives of the step width, the rise/fall slope of the curve, and the time length of each stage. Using WPD, we extracted the WPD coefficients $d_{il}^{N}(n_0)$, where $il \in [1, 2^{C_L}]$ and $C_L$ is the number of decomposed layers. Using EMD, we obtained multiple intrinsic mode function (IMF) components $c_{nl}^{N}(n_0)$ and a residual $r(n_0)$, where $nl$ is the number of IMF components. We then separately obtained the WPD-AE feature $\alpha(n_0) = \mathrm{ApEn}_{m,r}^{N}(n_0, d_{il})$ and the EMD-AE feature $\beta(n_0) = \mathrm{ApEn}_{m,r}^{N}(n_0, c_{nl})$ by calculating the approximate entropy of $d_{il}^{N}(n_0)$ and $c_{nl}^{N}(n_0)$.
FFT can detect the density of the peak group and the sampling frequency. We obtained the frequency distribution of the data in the frequency domain by calculating the top $k_0$ FFT coefficients.
Production experience [24] shows that gradient features and peak values can not only accurately predict the motion trend of the well-testing data but can also distinguish between peaks and burrs. Considering that global analysis is more advantageous than transient analysis for extracting well-testing stage features, we used linear regression (LR) to extract the following features: the three-interval regression parameters $\mathrm{reg}_3(n_0) = [ra_{3,1}^{n_0}, ra_{3,2}^{n_0}, ra_{3,3}^{n_0}]$ and the multi-interval extreme parameters $\mathrm{Reg}_{nd}(n_0) = [\min(ra_{nd,jd}^{n_0}), \max(ra_{nd,jd}^{n_0})]$, where $ra_{nd,jd}^{n_0}$ is the gradient value of the $jd$th interval after dividing the $n_0$th group of data into $nd$ parts. It can be expressed as the least-squares slope
$$ra_{nd,jd}^{n_0} = \frac{\sum_{kd=0}^{N/nd-1} (x_{kd}^{jd} - \bar{x})(y_{kd}^{jd} - \bar{y})}{\sum_{kd=0}^{N/nd-1} (x_{kd}^{jd} - \bar{x})^2},$$
where $nd \in \mathbb{Z}$, $nd > 3$, $jd \in [1, nd]$, $kd \in [0, N/nd - 1]$, and $\bar{x} = \frac{nd}{N}\sum_{kd=0}^{N/nd-1} x_{kd}^{jd}$ and $\bar{y} = \frac{nd}{N}\sum_{kd=0}^{N/nd-1} y_{kd}^{jd}$ represent the means of the index number and the amplitude of the $jd$th interval data, respectively.
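As an illustration of the AE computation used for the WPD-AE and EMD-AE features, a minimal approximate-entropy routine can be sketched as follows (a standard ApEn implementation; the function name and the choice of $r$ as a fraction of the standard deviation are ours, not the paper's):

```python
import numpy as np

def apen(signal, m=2, r_factor=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D signal.
    r is taken as r_factor times the standard deviation, a common choice."""
    s = np.asarray(signal, dtype=float)
    N = len(s)
    r = r_factor * s.std()

    def phi(m):
        # Build all overlapping m-length templates.
        templates = np.array([s[i:i + m] for i in range(N - m + 1)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # C_i^m(r): fraction of templates within tolerance r of template i.
        C = np.mean(dist <= r, axis=1)
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```

Applying such a routine to each WPD coefficient band or IMF component yields one entropy value per component, matching the $\alpha(n_0)$ and $\beta(n_0)$ features described above.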
Thus, as shown in Table 1, the MFE method yields a vector of 38 features for the $n_0$th group of data.
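A minimal sketch of the gradient features follows (our illustrative code, not the authors'; the split counts in `nd_list` are example values satisfying $nd > 3$, and the feature layout does not reproduce the paper's full 38-element vector):

```python
import numpy as np

def interval_slopes(y, nd):
    """Least-squares slope of each of the nd equal-length intervals of y."""
    y = np.asarray(y, dtype=float)
    N = len(y) - len(y) % nd          # trim so the series divides evenly
    x = np.arange(N)
    slopes = []
    for part_x, part_y in zip(np.split(x, nd), np.split(y[:N], nd)):
        xc = part_x - part_x.mean()
        slopes.append(np.dot(xc, part_y - part_y.mean()) / np.dot(xc, xc))
    return np.array(slopes)

def gradient_features(y, nd_list=(4, 5, 6)):
    """reg3: three-interval slopes; Reg_nd: min/max slope over finer splits."""
    reg3 = interval_slopes(y, 3)
    extremes = []
    for nd in nd_list:
        s = interval_slopes(y, nd)
        extremes.extend([s.min(), s.max()])
    return np.concatenate([reg3, extremes])
```

For a monotone ramp, all interval slopes equal the ramp's gradient, so the min/max extremes coincide; for stepped well-testing curves they separate the rising, flat, and falling intervals.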

DBN Feature Learning
The DBN used in this paper was composed of a multilayer restricted Boltzmann machine (RBM) and a back-propagation (BP) neural network. The RBM has the advantages of a simple structure, convenient network combination, and a flexible setting of the number of neurons in each layer. As shown in Figure 3, by training the RBM networks layer by layer and stacking the trained RBMs into a deep learning network, locally optimal initial parameter values can be obtained [25]. Introducing the BP network to fine-tune the DBN parameters prevents the poor generalization ability caused by the randomness of the initial weights or thresholds and by the non-global optimization of the parameters. The feature learning steps were as follows:
Step 1: Input the MFE features into the DBN and train each RBM layer for 100 iterations in an unsupervised manner.
Step 2: Add the fine-tuning BP network after the DBN and optimize the DBN weights over 100 iterations to decrease the difference between the highest-layer output of the network and the tagged data.

RBM Training
We used the contrastive divergence (CD) method [26] to obtain an optimal value of the parameter set θ = (w, a, b). The RBM training process and its parameter configuration are shown in Algorithm 2: for each training epoch and each data batch, the increments ∆w, ∆a, and ∆b are obtained using the CD method, and θ is updated accordingly until the corrected θ is obtained.
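The per-batch update in Algorithm 2 can be illustrated with a minimal CD-1 step for a Bernoulli RBM (a generic numpy sketch under standard RBM assumptions; the learning rate and sampling details are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, w, a, b, lr=0.1):
    """One contrastive-divergence (CD-1) step for a Bernoulli RBM.
    v0: (batch, n_visible); w: (n_visible, n_hidden); a, b: biases."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ w + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ w.T + a)
    ph1 = sigmoid(pv1 @ w + b)
    # Gradient estimates Δw, Δa, Δb from the two phases.
    batch = v0.shape[0]
    dw = (v0.T @ ph0 - pv1.T @ ph1) / batch
    da = (v0 - pv1).mean(axis=0)
    db = (ph0 - ph1).mean(axis=0)
    return w + lr * dw, a + lr * da, b + lr * db
```

Training each layer's (w, a, b) this way and then stacking the layers reproduces the greedy layer-wise scheme of Step 1.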

DBN Optimization
The corrected θ of each RBM layer guarantees the best mapping of the feature vectors for that layer only. Therefore, we used a gradient-descent-based BP network to obtain the global parameters θ_DBN of the DBN. We used the Polack-Ribiere conjugate gradient method [27] to compute the search directions, and the optimal function value was kept within a reasonable range based on the Wolfe-Powell stopping criteria [28] using polynomial approximations.
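The search-direction update can be sketched with a toy Polack-Ribiere CG minimizer (using a simple Armijo backtracking line search as a stand-in for the Wolfe-Powell criteria; all names and constants are illustrative):

```python
import numpy as np

def pr_conjugate_gradient(f, grad, x0, iters=50):
    """Minimize f with Polack-Ribiere conjugate gradients and Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                              # first direction: steepest descent
    for _ in range(iters):
        # Armijo backtracking line search (stand-in for Wolfe-Powell).
        t = 1.0
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
            t *= 0.5
        x = x + t * d
        g_new = grad(x)
        # Polack-Ribiere coefficient, clipped at zero (automatic restart).
        beta = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-30))
        d = -g_new + beta * d
        g = g_new
    return x
```

On a quadratic objective this reaches the minimizer quickly; in the DBN fine-tuning setting, f would be the reconstruction/label error and x the flattened network weights.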

MIC Purification
We constructed a new network structure with the same weights $\theta_{DBN}$ as the trained DBN model and again completed the mapping of the input features $fv$ from the first layer to the higher layers, yielding the highest-layer output features $y_{h\text{-}level}(n_0)$ for the $n_0$th group of data. Considering that feature redundancy still exists in $y_{h\text{-}level}(n_0)$, we introduced the MIC to realize feature purification. The main steps are as follows:
Step 1: Calculate the MIC value $M(y_{h\text{-}level}, n_0, I_{h\text{-}level})$ using the Minepy package, where $I_{h\text{-}level}$ denotes the index number of each feature in $y_{h\text{-}level}(n_0)$.
Step 2: Sort the feature indices by MIC value in descending order to obtain the rank, where $\vec{I}_{h\text{-}level}$ denotes the updated feature index order and $\vec{M}(y_{h\text{-}level}, n_0, \vec{I}_{h\text{-}level})$ denotes the corresponding MIC values.
Step 3: Calculate the SVM classification accuracy over this rank, select the top $n$ features that achieve the highest accuracy, and define these features as $y_{MIC}^{n}(n_0)$.
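The rank-then-select logic of Steps 2 and 3 can be illustrated with a simplified stand-in that ranks features by a histogram estimate of mutual information with the stage tag (the paper uses the true MIC from the Minepy package; this sketch only demonstrates the ranking step):

```python
import numpy as np

def mi_score(feature, labels, bins=8):
    """Histogram estimate of mutual information I(feature; label)."""
    f_bins = np.digitize(feature, np.histogram_bin_edges(feature, bins=bins))
    joint = np.zeros((f_bins.max() + 1, labels.max() + 1))
    for fb, lb in zip(f_bins, labels):
        joint[fb, lb] += 1
    joint /= joint.sum()
    pf, pl = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return np.sum(joint[nz] * np.log(joint[nz] / (pf[:, None] * pl[None, :])[nz]))

def rank_features(X, labels):
    """Return column indices sorted by score, largest first (Step 2)."""
    scores = np.array([mi_score(X[:, j], labels) for j in range(X.shape[1])])
    return np.argsort(-scores)
```

Step 3 would then grow the top-n subset in this order and keep the n giving the highest validation accuracy.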

PSO
The LIBSVM software package developed by Lin Chih-Jen [29] offers several advantages, including convenient modification and easy transplantation. Its SVM function requires only two parameters to be set: the penalty parameter c and the kernel function parameter g.
We introduced PSO to avoid overlearning and underfitting and used K-fold cross validation (K = 10) [30] to obtain the best parameters c and g. The proposed PSO-SVM training model is shown in Figure 4: the training set is combined with the radial basis kernel function to optimize c and g, and the testing set is used to validate the classification accuracy of the model.
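A generic PSO loop for the (c, g) search can be sketched as follows (the fitness here is a toy stand-in peaked at an arbitrary point; in the paper it is the K-fold cross-validation accuracy of the SVM, and the hyperparameter values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(fitness, bounds, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Maximize `fitness` over a box-bounded search space; returns best position."""
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(lo)
    pos = lo + rng.random((n_particles, dim)) * (hi - lo)
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Inertia + cognitive pull to pbest + social pull to gbest.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        better = vals > pbest_val
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        gbest = pbest[pbest_val.argmax()].copy()
    return gbest

# Toy stand-in for the K-CV accuracy surface over (c, g), peaked at (5, 7).
toy_fitness = lambda p: -((p[0] - 5.0) ** 2 + (p[1] - 7.0) ** 2)
best = pso(toy_fitness, bounds=[(0.0, 100.0), (0.0, 100.0)])
```

Replacing `toy_fitness` with a function that trains an SVM at (c, g) and returns the K-CV accuracy yields the PSO-SVM scheme of Figure 4.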



SVM Classification
After PSO, we used LIBSVM with the optimal c and g for the final feature classification. The predicted testing tag Tag is expressed through the decision functions $f(\cdot, i_c)$ $(i_c = 1, 2, \ldots, Dec)$ of the pairwise classifiers, where $Dec = \binom{j_c}{2}$ is the number of one-vs-one classifiers and $j_c$ is the number of categories. For a given test sample, we define $num_{j_c}$ as the number of classifiers that assign $y_{MIC}^{n}(n_0)$ to category $j_c$; the category with the largest $num_{j_c}$ is the category to which the sample belongs.
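The one-vs-one vote count can be sketched directly (the decision functions here are toy stand-ins; LIBSVM builds the $Dec = j_c(j_c - 1)/2$ pairwise classifiers internally):

```python
from itertools import combinations
from collections import Counter

def ovo_predict(sample, classes, decide):
    """Majority vote over all pairwise (one-vs-one) classifiers.
    `decide(sample, a, b)` returns the winning class of the pair (a, b)."""
    votes = Counter(decide(sample, a, b) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Toy stand-in for the SVM decision functions: each class j "owns" the
# prototype value j, and a pairwise decision picks the closer prototype.
decide = lambda s, a, b: a if abs(s - a) <= abs(s - b) else b
```

With $j_c = 5$ stages there are 5 · 4 / 2 = 10 pairwise classifiers, matching $Dec$ above.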

Data Source
We collected the dataset used in this paper from the well-testing platform in Huabei Oilfield, China. The reservoirs are characterized by special lithology, structural fractures, and strong edge water. A sample containing the data of a complete well-testing operation was used as a data sample, and a sample containing the data of one working stage was used as a stage sample. Based on the field data collected from 572 wells between 2009 and 2018, we obtained a total of 4004 stage samples of oil-testing data and the corresponding operation stages. We acquired all of the data using a downhole pressure storage gauge. For the proportions of the training, verification, and testing sets, we adopted the 6:2:2 mode [31]. Considering that this paper does not involve the selection of the hidden layers of the DBN, combining the verification set and the testing set does not affect the training effect, so the final ratio of the training set to the verification/testing set was 6:4. We randomly assigned 2402 stage samples to the training set and 1602 stage samples to the verification/testing set. Since each data sample contained two well opening and closing operations, the ratio of the sample numbers across the five stages was 1:1:2:2:1. Figure 5 shows the relationship between the different types of samples.
Figure 6 shows the process from data acquisition to data playback. First, we lowered the electronic pressure gauge to the required measurement depth in the oil well. Then, surface staff conducted the operations following the well-testing procedure. After the measurement was completed, the staff took the gauge out of the well, downloaded the memory data from the gauge, and interpreted the well-testing data using the method proposed in this paper.
We set the tag value for each stage as shown in Table 2. Surface staff identified the working stage based on the returned tag value and completed the interpretation work.


Table 2. Working stages and the corresponding stage tags.

Stage Order    Working Stage             Stage Tag
S1             Lower the oil string      1
S2             Waiting stage             2
S3             Well opening              3
S4             Well closing              4
S5             Pull up the oil string    5

Proposed Well-Testing Stage Classification Scheme
The purpose of this paper is to put forward a multi-method cooperative approach to realize high-precision classification of the well-testing stages. It mainly includes the following four aspects:
1. Propose the MFE multi-feature extraction method on the basis of the FFT, WPD, EMD, and gradient features, and determine which method can complete the initial extraction of the well-testing data features to the greatest extent.
2. Set the number of layers of the deep learning network, determine the optimal number of neuron nodes, and introduce the BP feedback network to optimize the parameters of the DBN network, so as to ensure an efficient configuration of the deep learning network.
3. Use MIC to analyze the priority of feature elements in the feature vector and eliminate redundant features.
4. Optimize the SVM classifier parameters.
We set up an integrated well-testing interpretation model named MFE-DMIC. The model workflow is shown in Figure 7. The standard well-testing operating procedure was used as a template; each data sample was divided into five stages and matched with stage labels, forming the training set. The multi-dimensional feature $attr$ extracted by the MFE method was learned by the parameter-optimized (including the network weight $\omega$ and the number of nodes $R$) DBN network, whose output was the reconstructed feature $y_{h\text{-}level}$. The MIC method then output the feature $y_{MIC}$ after removing redundancies and reducing the feature dimension. Finally, the SVM optimized by PSO was used for classification.
In Figure 7, we set up three hidden layers, acting on feature learning, feature reconstruction, and feature dimension reduction. The input layer was composed of approximately 10 times as many neurons as the number of feature elements, giving the DBN a strong learning ability and information processing capacity. The numbers of neurons in the three hidden layers $R_{hidd}$ declined in an equal ratio from input to output: 400, 200, and 100. The number of neurons in the first visible layer $R_{first}$ was set to 38. Here, $Attr_{train}$ (2402 × 38) is defined as the training set features and $Attr_{test}$ (1602 × 38) as the testing set features. The number of neurons in the highest layer $R_{high}$ was adjusted according to the simulation results on the training set. The main definitions are given in Appendix A at the end of the paper (see Table A1). The simulation algorithm of the MFE-DMIC model is presented in Algorithm 3.

Experimental Simulations and Results
We conducted all the training, testing, and validation experiments using MATLAB 2017a.
Figure 8 shows that when $R_{high}$ was 30 or 50 and the number of weight optimization iterations reached 30, the classification accuracy stabilized quickly. When $R_{high}$ was 30, the DBN achieved the highest system classification ability with fewer optimization iterations. Table 3 shows the identification efficiency before and after the MIC processing with different numbers of highest-layer neurons. To obtain the self-verification classification accuracy, we derived the training set and the verification set from $Attr_{train}$ (2402 × 38). We calculated the network output for the validation dataset using the sigmoid function and obtained the self-verification classification accuracy as the ratio of correct predictions to the total number of stage samples in the verification set. The minepy toolbox was used to calculate the MIC value of each feature element in the DBN output feature vector under the condition of optimal meshing.
According to the order of the MIC values, we evaluated the SVM classification accuracy of the feature element subsets on the stage samples in the testing set.

Table 3 compares the interpretation efficiency of the DBN with different numbers of highest-layer neurons. The BP network in the DBN optimization experiment had five layers in all. The number of fine-tuning BP optimization iterations for the DBN weight $w$ was in the range of 1 to 100.
We obtained the MIC classification accuracy using $y_{MIC}^{n}$. The input training features were $Attr_{train}$ (2402 × 38), and the testing features were $Attr_{test}$ (1602 × 38). Because the $y_{MIC}^{n}$ with the maximum self-verification classification rate was selected as the purified feature set, the SVM classification achieved the highest theoretical classification efficiency. Table 3 shows that the DBN with 30 highest-level neurons not only obtained the highest self-verification classification rate but also showed good classification characteristics for its corresponding feature $y_{MIC}^{9}$.
Figure 9 shows the optimal feature ranking when $R_{high}$ was 30 or 50. The horizontal axis represents the feature priority ranking obtained by the MIC method, and the vertical axis represents the classification accuracy obtained by the SVM classifier after combining the current feature with all higher-priority features. As the number of features increased, the classification accuracy of the system gradually increased and tended to stabilize. The features appearing after the classification rate stabilized had little influence on the classification and interpretation of the system; these features were classified as redundant information to be eliminated by the MIC operation. According to the results in Table 3, when the number of DBN highest-layer neurons was 30, the number of new features was the smallest (nine), giving the best redundancy elimination effect.
Figure 10 shows the trend of the loss value with the perplexity at different $R_{high}$ values. The horizontal axis represents the perplexity of the visualized t-distributed stochastic neighbor embedding (t-SNE) algorithm, and the vertical axis represents the loss value; the loss value decreased as the perplexity increased.
In the PSO-based [11] parameter optimization experiment, the population size was 20, and the termination generation was 200. Figure 11 shows the fitness curve of the parameters: both the optimal fitness value and the average fitness value remained above 93%, which illustrates that the particles in the optimization algorithm always maintained a strong optimization ability. Finally, the optimal values of the SVM parameters were c = 12.126 and g = 137.18.
As shown in Figure 12, we employed the grid search method to provide statistical information regarding the classification accuracy near the optimal parameters. For c and g, the range was [ -10 2 , 10 2 ], and the steps size was 0.1. When c and g were optimal, the SVM achieved the maximum K-CV classification accuracy of 99.71%. In the PSO-based [11] parameter optimization experiment, the population number was 20, and the termination algebra was 200. Figure 11 shows the fitness curve of the parameters. In Figure 9, both the optimal fitness value and the average fitness value are kept above 93%, which illustrates that the particles in the optimization algorithm always maintained the best optimization ability. Finally, the optimum values of the SVM parameters were c = 12.126 and g = 137.18.  In the PSO-based [11] parameter optimization experiment, the population number was 20, and the termination algebra was 200. Figure 11 shows the fitness curve of the parameters. In Figure 9, both the optimal fitness value and the average fitness value are kept above 93%, which illustrates that the particles in the optimization algorithm always maintained the best optimization ability. Finally, the optimum values of the SVM parameters were 12.126 c  and 137.18 g  . Figure 11. Fitness (accuracy) curve of the PSO based parameter optimization.
Figure 13 shows the stage tag corresponding to the predicted output and the actual standard tag. All training samples used in these methods were randomly selected. The rows correspond to the predicted class (Output Class), and the columns correspond to the true class (Target Class); the diagonal cells correspond to observations that were correctly classified. This paper introduced the t-SNE visualization method to visualize all data samples by minimizing the standardized Euclidean distance between the original data and the reconstructed data.
Figure 14 shows the t-SNE visualization classification effect of different dimensional features at perplexity = 100. In Figure 14, the serial numbers 1-5 correspond to the five well-testing stages.
We obtained the feature Attr (4004 x 38) presented in Figure 14a using the MFE method. Because of the lack of deep feature extraction, many mutual inclusions and overlaps existed. The features corresponding to Figure 14b,c took the output features y h-level of the DBN as input. When R high was 50, some points belonging to Stage 3 were separated by Stage 2 and Stage 4; when R high was 30, there was no interruption or isolation, and the points belonging to the same stage were relatively aggregated. Figure 14d presents the features y9 MIC extracted by MIC. Compared with Figure 14c, the aggregation degree within each class was similar, but y9 MIC had a larger between-class distance, which is more conducive to identifying the well-testing working stage.

Figure 14. The t-SNE visualization classification diagram of different dimensional features.
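The kind of t-SNE projection behind Figure 14 can be sketched as below. The features here are random stand-ins for the five stages (the real y9 MIC features are not reproduced), while the perplexity of 100 follows the paper; scikit-learn only requires that perplexity be smaller than the number of samples.

```python
# Hedged sketch of the t-SNE projection used for Figure 14; the
# five-cluster input is a synthetic stand-in for the stage features.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# 5 stages x 40 samples each, 9-dimensional features (stand-in for y9 MIC)
X = np.vstack([rng.normal(loc=3 * k, size=(40, 9)) for k in range(5)])
emb = TSNE(n_components=2, perplexity=100, random_state=0).fit_transform(X)
print(emb.shape)  # one 2-D point per sample, ready to scatter-plot by stage
```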
In order to compare the classification performance of this method with the classical feature extraction methods, we analyzed the latter. Here, the classical methods refer to the methods mentioned above and their combinations. Table 4 shows the SVM classification accuracy of the classical feature extraction methods used individually. Table 5 summarizes the SVM classification accuracy of different feature extraction and optimization algorithms for the five stages to be identified during well testing. The total classification accuracy is the ratio of the number of successfully classified samples to the total number of samples, and the average classification accuracy is the mean of the classification accuracies over the stages. In Table 5, A-B denotes the algorithm integrating methods A and B.
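The two accuracy measures just defined can be computed directly from a confusion matrix. The matrix below is hypothetical (not the paper's results) and serves only to make the two definitions concrete; note that they differ whenever the stages have unequal sample counts.

```python
# Total vs. average (per-stage) classification accuracy, computed from a
# hypothetical 5-stage confusion matrix (rows = true stage, cols = predicted).
import numpy as np

cm = np.array([[50,  2,  0,  0,  0],
               [ 1, 47,  2,  0,  0],
               [ 0,  1, 48,  1,  0],
               [ 0,  0,  0, 52,  0],
               [ 0,  0,  0,  1, 49]])

total_acc = cm.trace() / cm.sum()           # correct samples / all samples
per_stage = cm.diagonal() / cm.sum(axis=1)  # accuracy within each stage
average_acc = per_stage.mean()              # mean over the five stages
print(f"total={total_acc:.4f}, average={average_acc:.4f}")
```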

Discussion
Based on the accumulation of 30 million pressure sampling points from 572 field well-testing samples, numerous simulation results are presented to reflect the feature representation and generalization abilities of the proposed model. We proposed the MFE method based on the characteristics of well-testing data by integrating four widely accepted feature methods: FFT, WPD, EMD, and gradient. As shown in Table 4, the information-characterizing ability varied among the methods. The FFT, EMD-AE, and Reg10 features better characterized Stage 1 and Stage 5. Reg3 and Reg100 performed well in the recognition of Stage 2 and Stage 4. WPD-AE and Reg10 obtained relatively high accuracy for Stage 3. Even when we used the SVM classifier with the optimal parameters, the maximum classification accuracy of a single feature was only 58.2397%, which was too low to satisfy the requirements of practical production. Therefore, the classical methods had significant limitations in well-testing data interpretation and in identifying the working stage. The MFE, with added computing units, was able to extract more types of features than a single-feature extraction method, so it obtained a better classification effect; the total identification accuracy was 91.57%. Although the error at Stage 1 was too large to meet the production requirements, it is still a feasible method for capturing the variation and characteristics of the data in the time-frequency domain.
To obtain the deep expression of the multidimensional features, we first used a DBN supervisory network to learn the time-frequency characteristics obtained by the MFE method. To the best of our knowledge, this was the first time that deep feature learning with a DBN was performed on such data. We determined the DBN structure by seeking the configuration parameters corresponding to the highest classification accuracy, which was a self-comparison process. Finally, R high was determined to be 30 by comparing the classification accuracy, loss value, and t-SNE distribution obtained under different R high values in the simulation against the theoretical and practical tags. This process is a new method for the deep learning and reconstruction of well-testing data characteristics. As an important processing unit of the MFE-DMIC, the DBN retains the time-frequency characteristics without losing high-level feature expressions with a high distinguishing ability. Table 5 shows that the DBN network with 30 highest-layer neurons, combined with any method, achieved perfect classification results for S1, S4, and S5.
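A DBN of this kind is trained greedily, one restricted Boltzmann machine (RBM) per layer, with each layer's activations feeding the next. The sketch below uses scikit-learn's BernoulliRBM as a stand-in for the paper's DBN, random data in place of the 38-dimensional MFE features, and an arbitrary intermediate layer width; only the top-layer size R high = 30 is taken from the text.

```python
# Hedged sketch of greedy layer-wise DBN-style feature learning with
# stacked RBMs; input data and the intermediate layer width (64) are
# illustrative assumptions, while the top layer uses R high = 30.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = rng.random((100, 38))          # stand-in for the 38-D MFE features

layers = [64, 30]                  # hidden sizes; top layer R high = 30
h = X
for n_units in layers:
    rbm = BernoulliRBM(n_components=n_units, n_iter=10, random_state=0)
    h = rbm.fit_transform(h)       # pass activations up to the next RBM
print(h.shape)  # (100, 30): high-level features handed on to MIC/SVM
```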
Studies on the feature selection of well-testing data are limited. Researchers usually enter the extracted features directly into the classification system and seldom purify them from an MIC [32] perspective. Thus, based on the priority ordering of the feature vectors under the SVM classification, we demonstrated that sorting and screening with the MIC is an efficient method for identifying the correlations between features, eliminating redundant features, and reducing the number of feature dimensions (from 38 MFE features to 9). Compared with DBN30, the DBN30-MIC achieved the maximum classification accuracy more quickly (Figure 9) and had the best redundancy elimination effect (Table 3). Additionally, the comparison between DBN30 and DBN30-PSO in Table 5 shows that it is feasible to use the PSO-optimized parameters c and g for SVM classification. Although the comparison before and after the optimization of the SVM parameters showed only a 1% improvement, a high classification accuracy of 98.19% was obtained step by step by making full use of the advantages of each algorithm. From another point of view, we introduced the DBN into the analysis of well-testing data, and the results were satisfactory. This is because of the second-pass learning of the MFE features by the DBN network, whereby all integration methods using the DBN can extract features with a high representation ability.
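The score-sort-and-keep pattern behind the MIC screening can be sketched as follows. The paper scores features with the maximum information coefficient; since a standard MIC implementation is not assumed to be available here, scikit-learn's mutual-information estimator is used as a stand-in dependence measure, and the data are synthetic. Only the selection mechanics (rank by score, keep the top nine) mirror the text.

```python
# Hedged sketch of MIC-style feature screening: score each feature's
# dependence on the stage label, sort, and keep the top 9. The scoring
# function (mutual_info_classif) is a stand-in for MIC, and the data
# are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=300, n_features=30, n_informative=9,
                           random_state=0)
scores = mutual_info_classif(X, y, random_state=0)
top9 = np.argsort(scores)[::-1][:9]   # indices of the 9 strongest features
X_purified = X[:, np.sort(top9)]      # purified feature matrix
print(X_purified.shape)  # (300, 9)
```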
We also found that the classical methods had different classification effects when dealing with data from different stages. In Figure 13, the PSO-optimized decision-making system significantly improved the efficiency of stage classification, except for S1 (the stage of lowering the oil string). Similarly, in Table 5, PSO had the worst ability to classify Stage 1. A possible reason is that the string-lowering stage was affected by external factors such as mechanical friction and noise interference, causing data with characteristics similar to those of other stages to appear at this stage. These similar features need to be addressed through feature extraction and purification; they cannot be effectively distinguished only by optimizing the system. As for the MIC, PSO was better than the MIC at improving the classification accuracy of Stage 3 (see Table 5). This means that after the DBN learning, the features of Stage 3 had been fully excavated, and the main factor affecting the successful classification of Stage 3 data was the efficiency of the classifier. Therefore, it can be concluded that the MIC is helpful for S1 classification and the PSO is helpful for S3 classification.
With the use of the above-mentioned classical methods, the accuracy of the model classification gradually improved. Thus, the proposed MFE-DMIC method could achieve a performance superior to other classical methods and combination methods. The data processing route from multidimensional feature depth extraction to model optimization was reasonable, reliable, and suitable for autonomous feature learning and accurate labeling of the samples to be predicted.
In future work, three aspects need to be studied in depth. First, although the MFE method can effectively extract the features of well-testing data, it is not the only method suitable for that purpose. We should try more feasible methods to extract different kinds of features so as to characterize the data information to the greatest extent possible. Second, this study only optimized the number of neurons at the highest DBN layer and thus lacked global optimization. Although there is currently no scientific and universal method for determining the number of neurons in each hidden layer of a DBN network, an inappropriate setting will affect the efficiency of deep learning [33]. Hence, the influence of the DBN structure on the classification accuracy should be analyzed, and the number of hidden layers and the number of neurons in each layer should be adjusted according to the simulation results of the training samples. Third, the results shown in Figure 13 indicate that the classification results for the string-lowering stage should be improved. In future research, we will improve the identification of similar data and the fault tolerance of the decision models to reduce the misinterpretation of well-testing stages.

Conclusions
This paper extracted deep features from well-testing data to obtain high classification accuracy. We proposed a new MFE-DMIC model to solve two problems of current interpretation methods: the poor information-characterizing ability of the features and the low generalized interpretation ability of the model. First, we extracted multiple features using the MFE. Then, we conducted feature learning, reconstitution, and MIC purification using the DBN-MIC method. Lastly, we conducted classification using the PSO-optimized multi-SVM classification method. The proposed model achieved an average single-stage classification accuracy of 98.08% and a total accuracy of 98.19%, indicating excellent classification performance in the well-testing process with five working stages. This model can reduce the effort of oil analysts and lower the error rate of data interpretation, thus providing a reliable scientific method for both actual production and academic research on stochastic and non-scheduled data.