A New Model for Predicting Rate of Penetration Using an Artificial Neural Network.

The drilling rate of penetration (ROP) is defined as the speed of drilling through rock under the bit. ROP is affected by different interconnected factors, which makes it very difficult to infer the mutual effect of each individual parameter. A robust ROP is required to understand the complexity of the drilling process. Therefore, an artificial neural network (ANN) is used to predict ROP and capture the effect of the changes in the drilling parameters. Field data (4525 points) from three vertical onshore wells drilled in the same formation using the same conventional bottom hole assembly were used to train, test, and validate the ANN model. Data from Well A (1528 points) were utilized to train and test the model with a 70/30 data ratio. Data from Well B and Well C were used to test the model. An empirical equation was derived based on the weights and biases of the optimized ANN model and compared with four ROP models using the data set of Well C. The developed ANN model accurately predicted the ROP with a correlation coefficient (R) of 0.94 and an average absolute percentage error (AAPE) of 8.6%. The developed ANN model outperformed four existing models with the lowest AAPE and highest R value.


Introduction
The drilling rate of penetration (ROP) is a measure of the speed or the progress of the drill bit when it drills subsurface formation. ROP is usually reported in ft/h (field units) or m/h (SI units). The major portion of the well capital investment is consumed by drilling operations; thus, optimizing the ROP is a key aspect to reduce total well cost [1][2][3]. ROP modeling challenges arise from the fact that ROP is affected by many interconnected factors, which makes it very difficult to infer the mutual effect of each individual parameter. As a result, many oil and gas companies maintain data for the offset wells in the same field and set certain key performance indicators to assess the ROP for any newly drilled well [4].
In order to drill a well, three factors have to be established together. First of all, a certain load has to be applied on the bit, and this is known as the weight on bit (WOB). WOB can be achieved by the rig hoisting system by slacking some weight of the drill string against the hole being drilled. The drillstring rotation speed by the rig rotary equipment, such as the top drive or the kelly in older rigs, is measured in revolutions per minute (RPM). The action of WOB and drillstring rotation generates a torque (T) as a result of the interaction between the drilling bit and the drilled formation in addition to friction with the wellbore wall [5]. Finally, a drilling fluid has to be circulated within the wellbore and using the rig circulating system to clean the wellbore and cool the drill bit at circulation rate (Q). Surface pressure is generated at the stand pipe against any pressure losses during the mud circulation (friction, hydrostatic, drill bit nozzles ∆P, etc.). This pressure is known as stand pipe pressure (SPP) [6]. The Khoukhi and Alarfaj [43] compared extreme learning machines (ELM) and radial basis function networks (RBF) on Bourgoyne and Young's model and showed that RBF gave the best results in terms of accuracy and processing time. Amer et al. [44] used backpropagation feed-forward ANN to predict ROP with an R of 0.88 using 24 inputs, 50 neurons, and 1 layer. With more than 12,000 data points coming from six different wells, they obtained an R value of 0.88. Elkatatny [45] used an ANN feed-forward network to predict ROP on three offshore wells. The model was trained on 3333 data points from two wells with an R of 0.99 and average absolute percentage error (AAPE) of 5%. Then, 2700 data points were used to validate the model and resulted in the prediction of ROP with an R value of 0.99 and an AAPE of 4%. The rock unconfined compressive strength was constant in each section, which limited the model application to cases where there is no variation in the drilled rock strength. Kamel et al. [46] developed a technique for an automation rotary steerable system.
The objective of this study is to use ANN to develop a new real-time ROP model using field data including drilling parameters and changing formation strength from three onshore wells. After data filtering and screening, one well was used to train and test the ANN model, while the other two wells were used to validate the developed model. A new empirical ROP equation was also derived based on the optimized ANN model and compared with four common ROP models.The developed equation was simplified to present the simplest form that can be used in the rig site without the need for the ANN code or the MATLAB lisence.
In this manuscript, Section 2 represents the data description and Section 3 represents the development of the ANN model, including training, testing, validation, and the comparison with the published ROP correlations, in addition to converting the black box of the ANN to a white box by the development of a new ROP empirical correlation based on the optimized ANN model. Finaly, Section 4 represents the conclusion.

Data Description
The field data were collected from 16-inch intermediate holes in three different vertical wells drilled onshore in the same carbonate formation using the same conventional bottom hole assembly (BHA). Only the parameters that are available in real-time will allow the real time determination of ROP. The data were collected from a real-time sensor based on footage. It is worth mentioning that formation of unconfined compressive strength (UCS) is included in the work as a continuous log rather than the average value for each formation type. Offset well logs were used after performing depth calibration to generate the UCS curve. Additionally, having mud logging in this section helped to reflect actual formation tops if it was found different. Based on that, the drilling parameters collected are ROP in ft/h, Q in gpm, drillstring rotation speed in rpm, SPP in psi, T in Klbf-ft, WOB in Klbf, and UCS in psi.
Usually, field drilling data include all sorts of operations done on the 16-inch hole section such as drilling, tripping, and running the casing string. The first step was to capture only the drilling data where new footage was made and clean it from any other operations. This step requires a human interface using data filtering and elimination. For example, if the footage was 1000 ft, then suddenly the depth log shows 930 ft, then it is a trip out of the hole. Finally, only drilling data were considered in the data set used to develop the ROP model. The second step is to filter out any outliers by normalizing the data and checking the data cluster visually. Figure 1 shows one example of data clustering where the x-axis indicates the normalized ROP values and the y-axis indicates a normalized SPP. The values in blue color indicate very high SPP values at very low ROP, which may represent reaming operations or cement drilling. Based on that, the blue data cluster which represents a low ROP at a very high SPP was eliminated, as this is not normal in practical operation. Out of the 3311 data points in Well A, only 1528 data points (red data points in Figure 1) representing subsurface rocks penetration during drilling were used to train and test the ANN model. The red data points represent the common behavior of the ROP and SPP in normal drilling operation and indicate the progress of the drilling process.  For a better understanding of the cleaned data correlation, the correlation coefficients (R) of ROP with all other parameters from Well A are calculated and presented in Figure 2. The torque showed the maximum R value with ROP, 0.88, while the UCS has the lowest R value, −0.09. In addition, the ROP is a strong function of WOB where the R was 0.78, and it is a moderate function of Q, drillstring rotation, SPP where the R values were 0.53, 0.58, and 0.55, respectively. Although the R value of the UCS is very low, it is very important to include the UCS as an input parameter in building the ANN model, as it has a noticable effect on model accuracy [12]. The statistical parameters of the field data collected from the three wells are shown in Tables 1-3. For all parameters, the available data cover a wide range, which will help toward the development For a better understanding of the cleaned data correlation, the correlation coefficients (R) of ROP with all other parameters from Well A are calculated and presented in Figure 2. The torque showed the maximum R value with ROP, 0.88, while the UCS has the lowest R value, −0.09. In addition, the ROP is a strong function of WOB where the R was 0.78, and it is a moderate function of Q, drillstring rotation, SPP where the R values were 0.53, 0.58, and 0.55, respectively. Although the R value of the UCS is very low, it is very important to include the UCS as an input parameter in building the ANN model, as it has a noticable effect on model accuracy [12]. For a better understanding of the cleaned data correlation, the correlation coefficients (R) of ROP with all other parameters from Well A are calculated and presented in Figure 2. The torque showed the maximum R value with ROP, 0.88, while the UCS has the lowest R value, −0.09. In addition, the ROP is a strong function of WOB where the R was 0.78, and it is a moderate function of Q, drillstring rotation, SPP where the R values were 0.53, 0.58, and 0.55, respectively. Although the R value of the UCS is very low, it is very important to include the UCS as an input parameter in building the ANN model, as it has a noticable effect on model accuracy [12]. The statistical parameters of the field data collected from the three wells are shown in Tables 1-3. For all parameters, the available data cover a wide range, which will help toward the development The statistical parameters of the field data collected from the three wells are shown in Tables 1-3. For all parameters, the available data cover a wide range, which will help toward the development of an accurate prediction of ROP using ANN. Table 1 lists the data used for training the ANN model. The ROP ranges from 5 to 96.7 ft/h, which represents field practical values achieved usually in vertical sections. The flow rate ranges from 859.3 to 1110.6 gal/min, the drillstring rotation speed ranges from 76.9 to 157.9 RPM, the SPP ranges from 1396.3 to 2853.1 psi, the torgue ramges from 7.1 to 22 klbf-ft, the WOB ramges from 31.10 to 61.30 klbf, and the UCS ranges from 1821.1 to 42,819 psi, representing a different lithogy with different UCS values.    Table 2 lists the statistical parameters for Well B which are used for validating the developed ANN model. ROP ranges from 5.4 to 153.7 ft/h, the flow rate ranges from 807 to 1124 gal/min, the drillstring rotation speed ranges from 51 to 119 RPM, the SPP ranges fom 1357 to 3827 psi, the torque ranges from 3.8 to 22.4 klbf-ft, the WOB ranges from 6.5 to 58 klbf, and the UCS ranges from 1996.2 to 28,190 psi. It is clear that some parameters have a wider data range than the one used for training, and this is a challenge to the developed model to assess its ability to predict the ROP with a data range out of the training data range. Table 3 lists the statistical parameters of Well C, which are used for validation of the developed model and also for model comparison with the published ROP models. ROP ranges from 16.79 to 126.66 ft/h, the flow rate ranges from 697.64 to 1171.40 gal/min, the drillstring rotation speed ranges from 64.52 to 168.86 RPM, the SPP ranges fom 676.17 to 2628.60 psi, the torque ranges from 12.9 to 23.29 klbf-ft, the WOB ranges from 26.17 to 58.33 klbf, and the UCS ranges from 5087.20 to 28,190 psi. The same behavior was noted as Well B, as the data range of some parameters is outside the data range of the training parameters of Well A.
Surprisingly for all wells, however, the UCS profile in the field data represents different rock strength within the same formation, and it has the lowest R value with ROP. For example, in Well A, the UCS ranges from 1821.1 to 42,819 psi, while its correlation coefficient with ROP is −0.09. The same behavior was noticed for Wells B and C.

Model Development and Results
The ROP model was built using an ANN feed-forward network with the six input parameters discussed in the previous section. The ANN model consists of 12 neurons and one hidden layer, as shown in Figure 3. The optimum number of neurons and layers was selected based on achieving the minimum AAPE and maximum correlation coefficient as the two governing factors through several trials. During the trials, the minimum number of neurons was 6, and the maximum was 20. Only one hidden layer was used, as having two or more layers did not improve the results even with a different number of neurons. In addition, using one layer reduces the size of the ANN model correlation matrix of weights and biases. The Levenberg-Marquardt training function (trainlm) was used as a training function, while pure-linear was used as a transfer function.

Model Development and Results
The ROP model was built using an ANN feed-forward network with the six input parameters discussed in the previous section. The ANN model consists of 12 neurons and one hidden layer, as shown in Figure 3. The optimum number of neurons and layers was selected based on achieving the minimum AAPE and maximum correlation coefficient as the two governing factors through several trials. During the trials, the minimum number of neurons was 6, and the maximum was 20. Only one hidden layer was used, as having two or more layers did not improve the results even with a different number of neurons. In addition, using one layer reduces the size of the ANN model correlation matrix of weights and biases. The Levenberg-Marquardt training function (trainlm) was used as a training function, while pure-linear was used as a transfer function. The data collected from Well A (1528 data points) were used to train the ANN model using one hidden layer, 12 neurons, the trainlm training function, and pure-linear transfer function. Data from Well A were selected for training because of the wider range of parameters available in the data set compared to Wells B and C which allow the ANN model to learn better about ROP trends in this field. A total of 70% of the data was used for training and 30% for testing the developed model. For The data collected from Well A (1528 data points) were used to train the ANN model using one hidden layer, 12 neurons, the trainlm training function, and pure-linear transfer function. Data from Well A were selected for training because of the wider range of parameters available in the data set compared to Wells B and C which allow the ANN model to learn better about ROP trends in this field. A total of 70% of the data was used for training and 30% for testing the developed model. For the training dataset, an R value of 0.92 and an AAPE of 9.85% were achieved (Figure 4a), while for testing, an R value of 0.92 and an AAPE of 9.02% were achieved (Figure 4b). A complete ROP profile is shown in Figure 5 for both ANN model training and testing. It is clear from the visual check that the developed ANN model for ROP prediction has a high accuracy where the predicted and actual ROP lines are overlapped, and also the predicted line follows the trend of the actual ROP line when there is an increase or decrease. One of the main reasons for the change in ROP values is the change in UCS values, and the developed model was able to capture that affect as shown in Figure 5 for both training and testing.
Sensors 2020, 20, x FOR PEER REVIEW 7 of 19 Sensors 2020, 20, x; doi: FOR PEER REVIEW www.mdpi.com/journal/sensors the training dataset, an R value of 0.92 and an AAPE of 9.85% were achieved (Figure 4a), while for testing, an R value of 0.92 and an AAPE of 9.02% were achieved (Figure 4b). A complete ROP profile is shown in Figure 5 for both ANN model training and testing. It is clear from the visual check that the developed ANN model for ROP prediction has a high accuracy where the predicted and actual ROP lines are overlapped, and also the predicted line follows the trend of the actual ROP line when there is an increase or decrease. One of the main reasons for the change in ROP values is the change in UCS values, and the developed model was able to capture that affect as shown in Figure 5 for both training and testing. The datasets collected from Wells B and C, 2157 and 849 data points, respectively, were used to validate the developed ANN model for ROP prediction. The ROP model yielded an R of 0.95 and 0.94 between the actual and predicted ROP and AAPEs of 7.4% and 8.7% for Wells B and C, respectively. The calculated ROP was compared with the actual ROP for both Wells B and C, as shown in Figure 6. It is clear in Figure 6, which represents Well C data, that there is a big change in  The datasets collected from Wells B and C, 2157 and 849 data points, respectively, were used to validate the developed ANN model for ROP prediction. The ROP model yielded an R of 0.95 and 0.94 between the actual and predicted ROP and AAPEs of 7.4% and 8.7% for Wells B and C, respectively. The calculated ROP was compared with the actual ROP for both Wells B and C, as shown in Figure 6. It is clear in Figure 6, which represents Well C data, that there is a big change in the ROP values, which resulted from the change of the UCS, as this section consists of different carbonate formations where each formation has its own UCS value. This is the main reason the UCS was included as an input, and it was proven that it has a big effect. The developed ANN-ROP model was able to capture the big variation in the UCS and yielded a strong ROP model that can capture the increase and the decrease of Sensors 2020, 20, 2058 8 of 18 the actual ROP value over a wide depth range. Even though Well C has a greater data range than that of the training data, the model was able to predict the ROP with a high accuracy. Because of the big variation in the ROP of Well C, this well was used for comparing the developed ANN model with the previous ROP correlations.   Sensors 2020, 20, x; doi: FOR PEER REVIEW www.mdpi.com/journal/sensors Figure 6. Complete profile of actual ROP vs. predicted ROP from the developed ANN model using data from Wells B and C. Well-B data was used for evaluation the developed ROP model, while Well-C data was used for validating and comparison with other published ROP models.

ANN Model Empirical Correlation
To permit real-time prediction, the current black box model has to be converted to a white box model to enable ROP prediction outside the AI modeling software. An empirical correlation can be derived from the ANN model to represent the ANN structure based on the optimized ANN model. These types of correlations are based on sets of weights and biases that can be written as a matrix. The weights and biases can be transferred to an empirical equation through a mathematical operation

ANN Model Empirical Correlation
To permit real-time prediction, the current black box model has to be converted to a white box model to enable ROP prediction outside the AI modeling software. An empirical correlation can be derived from the ANN model to represent the ANN structure based on the optimized ANN model. These types of correlations are based on sets of weights and biases that can be written as a matrix. The weights and biases can be transferred to an empirical equation through a mathematical operation to enable easier utilization of the developed model by engineers. In the ANN model, each neuron handles the following functions [47]: i-Multiplication of the input parameters, x 1 , x 2 , x 3 , ... x n by the associated input weights; ii-Summation of the weight and input product to the bias value associated with the neuron; iii-The passage of the summation result, u, through an activation function (linear or nonlinear transformation), Φ. The transfer function can be logistic sigmoid (logsig), hyperbolic tangent sigmoidm (tansig), or linear (purelin), as described in Table 4. Table 4. Different transfer function definition.

Transfer Functions Definition
The neuron's output (y) is the result of the action of the activation function, as follows: For a one-layer ANN structure (plus one hidden), the equation can be written as: where m is the number of neurons and n is the number of inputs. If the linear transfer function was used on both the first layer and hidden layer, the above equation can be written as: In MATLAB language, Equation (3) can be written as: There will be only one coefficient (a) for each input (x). Thus, the coefficient matrix of the ANN model can be written as: a 1 a 2 a 3 . . . a n = (LW × IW) while a constant (c) can be represented by: Then, the final empirical correlation model will be: y = a 1 x 1 + a 2 x 2 + a 3 x 3 + . . . + a n x n + c Sensors 2020, 20, 2058 13 of 18 Substituting the x values with corresponding ROP model normalized input parameters results in the following ROP empirical correlation: ROP n = a 1 Q n + a 2 RPM n + a 3 SPP n + a 4 T n + a 5 WOB n + a 6 UCS n + c (8) where the values of a 1 -a 6 are listed in Table 5, and the normalized parameters can be defined from the following equations: Q n = 0.008Q − 7.8388 (10) UCS n = 0.0001UCS − 1.1516 (15)  Combining Equations (8)- (15) with constants in Table 5 gives the final ROP equation (Equation (16)), which can be used directly to calculate ROP from input parameters.

Model Comparison
To compare the obtained results from the ANN model with the previous ROP models, four ROP models were selected, and a comparison was performed between all of them using Well C data (849 data points). For the Maurer model [9] using Well C data, the formation drillability constant (K) was found using regression to be 6.7275 × 10 −7 , while for the Bingham [10] model, three constants K, a WOB , and b WOB were calculated using regression to be 0.2, 1.589, and 0.89, respectively. The third model is Bourgoyne and Young [11], for which the seven exponents, a 1 -a 7 , were found to be 2.3583, 2 × 10 −4 , −1.0 × 10 −4 , 1.06 × 10 −7 , 0.636, 0.6259, and −0.1109, respectively. The final ROP model is the Al-AbdulJabbar et al. [12] model, where only two constants, a (WOB) of 0.854 and b (UCS) , ranged between 1.189 and 1.275 depending on formation type. Figure 7 shows that the developed correlation outperformed all other models with the highest R value (0.94) as compared with the Maurer model [9] which yielded an R value of 0.72, the Bingham [10] model wich yielded an R value of 0.87, Bourgoyne and Young [11] which yielded an R value of 0.89, and Al-AbdulJabbar et al. [12] which yielded a high R value of 91 as it considered the effect of clustering based on the UCS value.
In terms of the average absolute percentage error, Figure 8 shows that the developed empirical correlation of ROP yielded the lowest AAPE, which was 8.72%, while the Maurer model [9] yielded an AAPE of 22.41%, the Bingham [10] model yielded an error of 21.27%, Bourgoyne and Young [11] yielded an AAPE of 17.07%, and Al-AbdulJabbar et al. [12] yielded a high AAPE of 10.83%. Based on these results, the developed ROP equation based on the optimized ANN model can be used as a robust and precise equation to predict ROP in future wells in the same field while the developed equation can be tuned for wells in other fields.    Al-AbdulJabbar et al. [12] AAPE (%) Figure 7. Cross-plot of actual vs. calculated ROP using different ROP correlations using the data set from Well C.     Al-AbdulJabbar et al. [12] AAPE (%) Figure 8. AAPE for different ROP correlations applied to Well C data for comparison with the developed correlation.

Conclusions
Field data collected during drilling operation of three vertical onshore wells drilled in the same carbonate formation using the same conventional bottom hole assembly were used to train, test, and validate an ANN model to predict ROP. ROP was successfully predicted as a function of T, Q, drillstring rotation, WOB, SPP, and UCS with an AAPE of 9.85% and 9.02% for training and testing, respectively, and the correlation coefficient was above 0.91 for training and testing. The model was able to calculate ROP for the selected sections in Wells B and C drilled in the same formation with an R value of 0.94 for the two wells and an AAPE of 7.4% and 8.7%, respectively. A simple and highly accurate ROP empirical equation was developed based on the weights and biases of the optimized ANN model so that field engineers can apply it in daily calculations very easily. In addition, the developed ROP equation was compared with four existing ROP correlations using data from Well C, which was not used during the model development. The new correlation outperformed all other correlations with the lowest AAPE and highest R value, which can allow field engineers to use it to predict accurate ROP in future wells or define a certain parameter to achieve a desired ROP.
The main limitation of the developed model is the data range of the used training data where the model can be uded to predict the ROP with a high accuracy for any new sets of data that have the same data range as that of Well A. Moreover, it is recommended to apply this model for the vertical section only, where carbonate formation exists. For other well profiles, there is a need to develop another ROP.

Conflicts of Interest:
The author declares no conflict of interest.