Intelligent Drilling of Oil and Gas Wells Using Response Surface Methodology and Artiﬁcial Bee Colony

: The oil and gas industry plays a vital role in meeting the ever-growing energy demand of the human race needed for its sustainable existence. Newer unconventional wells are drilled for the extraction of hydrocarbons that requires advanced innovations to encounter the challenges associated with the drilling operations. The type of drill bits utilized in any drilling operation has an economical inﬂuence on the overall drilling operation. The selection of suitable drill bits is a challenging task for driller while planning for new wells. Usually, when it comes to deciding the drill bit type, generally, the data of previously drilled wells present in similar geological formation are analyzed manually, making it subjective, erroneous, and time consuming. Therefore, the main objective of this study was to propose an automatic data-driven bit type selection method for drilling the target formation based on the Optimum Penetration Rate (ROP). Response Surface Methodology (RSM) and Artiﬁcial Bee Colony (ABC) have been utilized to develop a new data-driven modeling approach for the selection of optimum bit type. Data from three nearby Norwegian wells have been utilized for the testing of the proposed approach. RSM has been implemented to generate the objective function for ROP due to its strong data-ﬁtting characteristic, while ABC has been utilized to locate the global optimal value of ROP. The proposed model has been generated with a 95% conﬁdence level and compared with the existing model of Artiﬁcial Neural Network and Genetic Algorithm. The proposed approach can also be applied over any other geological ﬁeld to automate the drill bit selection, which can minimize human error and drilling cost. The United Nations Development Programme also promotes innovations that are economical for industrial sectors and human sustainability.


Introduction
Drilling operations are performed to extract natural oil and gas from underlying reservoirs. Wellbores are drilled from top to the target depths of the geological formations containing sweet spots of hydrocarbons. With ever-increasing energy demand, the oil and gas industry requires newer innovative technologies that are more economic and efficient for the sustainable development of human civilization. Innovations in the oil and gas industry will help in the proper extraction and utilization of natural resources with a longer sustainable period of hydrocarbon production. This will also encourage a global economy that influences human sustainability, as clearly mentioned in the sustainable development goals of the United Nation Development Programme (UNDP). However, the drilling of oil and gas wells is an expansive task that involves huge financial investments and electrical powers consumptions at the drill site. Drill sites may also have a hazardous impact on human health and the environment when accidents such as blowouts, hydrocarbon spillage, etc., occur. Drilling hydrocarbon wells is considered one of the most dangerous tasks in the oil and gas industry. Newer unconventional wells are drilled for the extraction of hydrocarbons that requires advanced innovations to encounter the challenges associated with the drilling operations. To achieve highly efficient drilling operations, operational parameters need to be optimized for operational cost minimization.
Drill bits and Optimum Penetration Rate (ROP) are the important drilling parameters that need to be optimized for the success of drilling operations due to their large impact on operational efficacy and cost. While optimizing the cost of drilling operations, selection of the most suitable drill bit types is one of the main concerns of drillers, as all other drilling parameters directly or indirectly rely on the drill bit, although the cost of bits is only 5% of the total operational cost but can influence 10-40% of the overall drilling costs [1]. It may seem easy, but selecting the right bit types for drilling operations is still one of the most challenging tasks due to its dependency on various factors. The performance of drill bits depends on various aspects such as bit design parameters, formation properties, and other operational field parameters [2,3]. The concept of drilling optimization is built on the usage of earlier drilled well data for optimizing operational variables for drilling the next well with minimum cost and time [1]. The drilling variables are gradually adjusted to achieve their best possible effective optimum values to decrease operational cost and time.
Drill bits are mostly selected based on the knowledge of bit data of previously drilled wells and from the types of bits available to driller from manufactures. A driller selects the drill bits for a new well depending upon his experience drilling earlier wells [1]. Drilling operations are also affected by various controllable and uncountable factors, which involves a high risk of human error that may increase the cost of overall drilling operations [1]. Thus, various empirical and data-driven models have been developed based on the known relationships between drilling variables to select the most suitable bit types. Recently, data-driven intelligent models have been utilized to find suitable types of drill bits [4][5][6][7][8][9]. These models are reported to be more accurate as they learn from previous well data, defying traditional methods for selecting the appropriate drill bit [9]. However, researchers reported several issues related to the models, which are critically reviewed in the 'Literature review' section of the paper [4][5][6][7][8][9].
In this work, a novel application of intelligent paradigms has been proposed for drill bit selection utilizing Response Surface Methodology (RSM) and Artificial Bee Colony (ABC) combination to provide a more reliable and efficient technique for automatic drill bit selection. The main contributions of this study are given below: • A comprehensive review of drill bit selection methodology for drilling oil and gas wells; • A novel application of RSM and ABC combination has been tested for the selection of suitable drill bit types based on optimum ROP values; • Comparison of proposed approach with the existing ANN, ANN, and GA-based drill bit selection models to test their efficacies and find a more reliable approach for bit selection.
The rest of the paper is organized as follows: Section 2 gives the literature review and establishes the main objectives of the research work. Section 3 briefly explains data-driven models applied in this study. Section 4 contains results from modeling, whereas Section 5 presents the results and discussions. Finally, Section 6 concludes the major findings of the research work along with the future scope.

Literature Review
Several methods have been applied for finding the optimum bit type mainly based on measured data taken from offset well logs with limited applicability. Out of these methods, the most popular method in use is cost per foot (C PF ) estimation for drilled intervals. The method is popular, as it is based on the operating cost of the drilling operation. C PF can be measured using a formula.
where Bit C is the bit cost in dollars, Rig C is the cost of rig per hour, Bit RT is the bit running time in hours, CN T is connection time in hours, Trip T is the trip time in hours, and DF is the sectional length of wellbore drilled in feet. C PF is used in combination with other methods as it does not depend on the operational parameters, but the drilling economy is highly affected by them. C PF also has one more disadvantage, as it can not be used in the case of directional and horizontal wells. C PF has been proven efficient in the analysis of historic drilling data obtained from the offset wells and current supervision of bit run [3]. Back in the 1960s, Teale formulated a notion of Specific Energy (SE) by establishing a relationship between bit energy requirement and its performance [2,3]. In rock drilling, SE is the energy spent by the machine to eradicate unit rock volume. The SE formula is given below.
where RPM is the rounds per minute in rpm, WOB is the weight on the bit in pounds, ROP is the drilling rate of penetration in feet per hour, and BD is the bit diameter in inches. As it is clear from the formula, SE is governed by only three parameters mainly, ROP, WOB (weight on bit), and RPM, which proves to be less advantageous. It also can not distinguish between various formations based on mechanical properties and vibrations effects on the dulling of a bit [3]. The International Association of Drilling Contractors (IADC) has adopted a new dull grading system for selecting bits on their degree of dullness for roller cone as well as fixed cutter bits. In this, the IADC used to report various parameters of drill bit such as teeth wear, bearing conditions, etc., on a scale from 1 to 8, where 1 is excellent and 8 is the poor condition. The dullness of the drill bit is a crucial factor as if the bit wears fast, it adds to drilling cost and greater time consumption, which indirectly affects the economics of drilling operation. So, one needs to carefully select the bit by evaluating data, as it will lead to extra cost to projects [9].
In 1964, Hightower selected a drill bit from geological information and logs from offset wells [10]. Sonic logs were used to find the right bit for drilling operations by estimating the formation strength to define the drillability of the formation. The rock strength utilized in this method was not measured directly from sonic logs but estimated indirectly from the theory of elasticity. Bourgoyne and Young [11] utilized a multiple regression approach for ROP modeling and considered drill bit types as an important factor influencing the drilling operation. Rabia et al. [12] proposed the selection of a bit based on mechanical specific energy. Fear et al. [13] selected drill bits based on the geology and rock properties of the formation. Perrin et al. [3] proposed a novel drilling index for the evaluation and selection of drill bit types for drilling operations. Uboldi et al. [14] utilized rock strength measurements as criteria for the choice of drill bits. Bahari and Seyed [1] applied mathematical correlations as objective functions for the optimization of various drilling variables and operational costs.
Recently, data-driven intelligent models have been utilized to find suitable types of drill bits. These models are reported to be more accurate as they learn from previous well data, defying traditional methods for selecting the appropriate drill bit [15]. Bilgesu et al. [16] used Artificial Neural Networks (ANN) for the prediction of drill bit types for drilling target geological formations. Yilmaz et al. [4] trained the ANN model using previously drilled wells offset data and predicted the drill bits types for the development wells required to be drilled internal and external of the same field. They also tested the trained ANN model for the prediction of drill bit types for the development wells that needed to be drilled in an adjoining field. Bahari et al. [5] utilized a Genetic Algorithm (GA) for the accurate computation of constants for the Bourgoyne-Young ROP model. Edalatkhah et al. [6] also selected the suitable drill bit types using ANN and GA for South Pars Field wells. Momeni et al. [7] applied ANN for the estimation of drilling ROP and bit types [14]. Momeni et al. [8] combined ANN and Genetic Algorithm (GA) for drill bit selection based on optimal ROP. They selected the drill bit types based on the optimum values of ROP and drilling variables. Abbas et al. [9] also supported the notion of drill bit selection depending upon optimum values of ROP using ANN and GA. Here, ANN was primarily utilized for the development of the objective function and GA was utilized for optimization of ROP objective function for the drill bit selection.
Various researchers have suggested that the selection of drill bits should be performed based on the optimum values of ROP. This condition results in the development of an unconstrained bounded optimization problem where a function of ROP is required to be defined using drilling variables. However, the exact relationship between ROP and drilling variables is unknown and undefined, which makes optimization of ROP a difficult task. According to Kolmogorov's theorem, multilayer feedforward perceptron (MLP) ANN architecture can be utilized to define any continuous function in its approximation form [17]. The approximation function (objective function) requires an activation function and input variables that are predefined during the training of the MLP neural network. Three-layered MLP architecture can be expanded in a mathematical form with connection weights and bias of neurons that will act as coefficients of approximation function. This technique helps solve real-field complex optimization problems, especially where the association between input and target variables is unknown such as bit selection based on optimum ROP values. In the case of complex approximation function, paradigms such as Ant Colony, Swarm Optimization, GA, etc. can be implemented to retort the optimization problem as stated in the literature [18]. However, researchers reported several issues with ANN such as overfitting, underfitting, stuck up in local minima/maxima, lack of proper guidelines for the selected network architecture, etc. [19]. This also opens the opportunity to investigate other techniques that can generate an approximation function to optimize ROP values for drill bit selection. More research work is required to automate the drill bit selection procedure for better reliability and efficacy.
This study aims to investigate RSM and ABC combination for automatic drill bit selection as well as overcome the issues related to the existing ANN-based bit selection model. RSM has been implemented to generate the objective function for ROP due to its strong data-fitting characteristic. Furthermore, the generated ROP function is optimized through ABC to acquire the optimal drilling variables along with drill bit types for target geological formations. ABC is strategically designed to locate the global optimum value of any given objective function more efficiently in the high-dimensional data space. ANN has a tendency to get stuck in local minima, so GA sometimes fails to converge when data become too much complex [19][20][21][22]. Thus, the reliability of the ANN and GA combination is a major concern for drill bit selection. However, RSM has been reported to be a reliable and popular technique for solving optimization problems in various engineering domains [23,24]. Researchers have also reported that ABC is a superior evolutionary optimization paradigm that has outperformed GA in certain applications with faster convergence and lesser iterations [20][21][22].

Materials and Methods
In this study, RSM and ABC have been utilized to develop a novel intelligent datadriven approach for the selection of suitable bit type. The performance of the existing ANN-based drill bit selection model has also been compared with the proposed approach to understand its pros and cons. This study involves the development of two separate objective functions for the same ROP and drilling variables using ANN and RSM techniques. The intelligent paradigms applied in this study are briefly explained below.

Artificial Neural Networks
ANNs are a nature-inspired intelligent paradigm that is designed based on human brain cells. It is constituted of multiple information processing units known as nodes. The interconnected nodes combine to form a layer, and layers combine to form neural networks. Neural nodes are also recognized as neuron units. Each neuron connection has been associated with weights that are attuned during the training to generate approximation function to minimize error for classification or estimation tasks. Generally, ANN comprises multilayers structure viz., input, hidden, and output layers [17]. However, the hidden layers may vary in number depending on the complexity of training data. Initially, the field data are provided to the input layer, which further transmits the raw data to hidden layers for their processing. The results acquired after processing in hidden layers are directed to the output layer, where predicted results are compared with actual target values. The deviation of prediction values from actual targets is provided as feedback to the model for updating associated weights and biases. The number of neurons and hidden layers may vary according to the complexity of problems and data types. ANN is primarily developed for handling classification and regression tasks; however, they are also applied for solving optimization problems. Figure 1 depicts the architecture of the ANN utilized in this study. Kolmogorov's theorem stated that multilayer feedforward perceptron (MLP) neural architecture can be utilized to define any continuous function in the form of approximation function [17,18,25]. There exist two stages in the MLP network, namely the learning stage and the prediction stage. Several neural network parameters are predetermined to define neural networks such as the number of neurons, number of layers, propagation rules, connections between neurons, activation function, learning rate, etc. The propagation rule in MLP is the weighted sum of inputs, which is given below.
where w mn is the connection weights associated with neuron m in the input layer and neuron n in the hidden layer, x m is the outcome from neuron m in the input layer and M represents the input layer neurons, t represents the associated patterns, and β n represents neuron bias. The activation function will be multiplied with Equation (3) to decide whether neurons should be activated or not. There are several popular activation functions used in neural networks such as sigmoid function, hyperbolic tangent function, Softmax function, Softsign, Rectified linear unit, Exponential linear units, etc. The outcome from the K th neuron with activation function existing in the input layer is given below [18].
Due to the three-layer architecture of MLP, the propagation rule is applied twice to transmit the values from the input to the output layer. Considering N to be the number of neurons in the output layer, the result of the K th output neuron can be described by the following equation [18].
Substituting Equation (4) in Equation (5), the outcome of K th neuron can be rewritten as given below.
Equation (6) of MLP is utilized to approximate the objective function in optimization as given in Equation (7) [18]. Constraints Equation (7) acts as the objective function and constraints for optimization problems where the relationship between input and response variable is undefined. In the case of complex approximation function, algorithms such as Ant Colony, Particle Swarm Optimization, ABC, GA, etc., can be applied to retort the optimization problem.

Response Surface Methodology
RSM is a set of statistical techniques that are quite helpful in optimizing, developing, and improving processes and useful in analyzing and modeling numerous problems in engineering [26,27]. RSM is particularly useful in real-world situations where various variables affect the performance, quality, and output of the desired process as in the case of drilling operations [27]. The target variable (Y) is termed as the response variable, while input variables are known as independent variables. An appropriate relationship can be identified between input and response variables using RSM. RSM helps to develop the correct approximate mathematical function, which satisfies a suitable relationship between the objective function and test factor group [26]. Interaction between the input variables can also be included in the response surface equation. The generalized relationship can be developed as given below.
where f is the unknown exact response function, which may be complex, and e is the error due to unaccountable factors that influence the response or output but are never included in the Equation (8) such as noise, interaction effects, etc. Let us consider that e is a statistical error that is distributed normally with variance σ 2 and zero mean. Then, the following error equation can be written as given below.
where x 1 , x 2 , . . . x n are known as natural variables in their natural units. Furthermore, these variables are changed into coded variables that are dimensionless and have zero mean and alike standard deviation. The coded form of Equation (9) is written as given below.
Here, function f is known and undefined. Thus, a suitable approximation function is required to be generated for modeling purposes. In RSM, the first or second-order polynomial equation is primarily generated as approximation functions in place of the actual response function. The second-order model is popularly applied for modeling various process operations due to its flexibility, diverse functional form, efficient approximation, and model coefficients that can be easily estimated through the least square estimation technique. A second-order response surface equation, based on Talyor series expansion, is given below: where x i and x j represent the input parameters, b 0 is the constant of the regression equation, Y is the predicted ROP response, β i is the linear coefficients, β ij are interaction coefficients, β ii are the coefficients of the square terms, and ε is the fitting error in the equation.
To generate a response surface, stepwise regression methods are utilized to reduce the computational burden.
Here, central composite design (CCD) has been considered for developing the secondorder approximation function for ROP, as shown in Equation (11). The input variables are converted into coded variables for fitting the data in CCD between two levels [−1, 1]. CCD contains the factorial point, central point, and axial points, which are mostly developed through sequential experimentation, as shown in Figure 2. Ten factors and full factorial design were utilized for the development of the response function. The range of design factors was assigned according to the range of field variables. The interaction, quadratic, and linear coefficients were estimated through the least square regression. Furthermore, the importance of each term was determined through an analysis of variance (ANOVA) test, whereas redundant terms were eliminated. Equation (11) will act as an ROP objective function (approximation function) for the target geological formations containing bit information as an input variable similar to ANN. Furthermore, this equation will be optimized using Artificial Bee Colony to find the optimum value of ROP along with its input variables (control variables) including BT. A comprehensive explanation of RSM is available in the cited literature [26,27].

Artificial Bee Colony
Karaboga (2005) proposed an Artificial Bee Colony (ABC) paradigm based on the natural behavior of bees when they search for nectar containing flowers [28]. There are three variations of honey bees generally present in natural hives, namely onlookers, employed bees, and scouts. Every bee has an assigned duty that it is required to perform. The scout bees perform a random hunt for the flowers that have nectar in their nearby environment and remember the location of the flower inside their internal memory [28]. This means scouts examine local search feature space for optimal solutions and remember it. After returning to their hive, scouts exchange information about flower location with other bees using the waggle dance technique [28]. After the waggle dance, employed bees begin their exploration for the nectar-having flowers depending upon the information achieved from scout bees. The employed bees extract the nectar from the target flowers, which are known as food sources [28]. Only one employed bee will be assigned to a single flower to exploit their nectar. Thus, each available food source contains an assigned employed bee that creates an initial solution [28]. The value of each solution is calculated to understand its significance. A new response is generated for each problem solution using the relationship as given below.
where S i,j is parameter j obtained from response i, B i,j is the parameter j in the new response, whereas i represents the number of solutions, δ i,j is an arbitrary number existing between [−1, 1], k is a random number from a single answer to the problem, IA signifies initial solutions to the given problem, and O is the total number of parameters required during optimization. After calculating a new answer for each solution, they are compared with the previous answer. If the difference is found to be higher between the current and earlier answer, only then it will be accepted; otherwise, it will be rejected [28]. The step length is adaptively decreased according to the difference between current and previous answers as the search reaches closer to the optimal solution. The waiting onlooker bees in the hive choose the best source depending upon the dance of employed bees. This helps in identifying the global solution existing in the search space along with local solutions [28]. Furthermore, employed bees of an abandoned food site start to behave like scout bees and hunt for newer sources of food. The probability of selection of source through an onlooker bee is given below.
where f itness i represents the fitness of the solution i examined via employed bees depending upon nectar quantity at location i, and FS is the number of the food source. After predefined iterations, there is no improvement in answer value using Equation (12); then, the employed bees convert into scouts and randomly begin the search for a newer source of food. The ABC algorithm has been utilized for solving various engineering problems such as ROP optimization [29], oil and gas well placement optimization [30], over break prediction in the tunnel [31], etc. A detailed description of ABC optimization can be found in other references [31][32][33][34]. Figure 4 depicts the flowchart of the Artificial Bee Colony paradigm that was originally proposed by Karaboga [28]. Figure 5 shows the generalized schema of the proposed approach of drill bit selection.

Data Description
The Volve oil and gas field is situated in the central North Sea near the Norwegian Continental Shelf. It was discovered in 1993 and its production shut down in September 2016 by its investors' companies. The ocean depth near the Volve field is in a range of 85 to 95 meters. This field contains Jurassic sandstone related to the Hugin formation reservoir. The depositional environment of this reservoir is analyzed as tidal to the shallow estuary. The average properties expected from this Hugin reservoir are as follows: porosity (0.2), permeability (910), water saturation (0.23), and shale volume (0.17). Geosteering was particularly utilized to increase the extent of the reservoir linking to various fault blocks. The peak production rate in the Volve field was recorded to be 56,000 barrels per day and produced a cumulative amount of 63 million barrels of oil with a recovery rate of 54% of the total reservoir estimated over 8 years [35]. Input data used in this research work were obtained from the Equinor company website openly available for research purposes [35]. Table 1 contains the geological prognosis of the Volve field. Table 2 contains a description of the drilling variables used in the study. Table 3 has been provided to inform about different drill bit types (IADC code) utilized for drilling the three wells. Wells A (F-4) and B (F-15) were utilized for the training of models and well C (F-12) was used for testing the developed models. The IADC code of drill bits cannot be utilized directly for the training of the ANN model. Thus, they are numbered in the bit-type column of Table 3. The location of the Norwegian Volve field situated in the North Sea is depicted in Figure 6.

Development of ROP Objective Function Using ANN
In this study, three-layered MLP architecture was utilized to generate an approximation function for the ROP in terms of operational drilling variables. ANN having a three-layered multilayer perceptron architecture can be utilized to produce the polynomial equation for solving optimization problems [18]. However, parameters of ANN are required to be determined beforehand during the training stage. Three popular training functions were applied to train the ANN, namely (a) Levenberg-Marquardt (LM), (b) Scaled Conjugate Gradient (SCG), and (c) Bayesian Regularization (BR). Hornik et al. [36] suggested that a singular hidden layer neural network model can be utilized for approximating any nonlinear function. Still, the optimum neurons in the hidden layer are required to be estimated before training the ANN model. Table 4 contains reported correlations to decide the number of neurons inside the hidden layer. N i and N o represent the number of predictor and target variables in Table 4. Several ANN models were generated with the neurons ranging from 2 to 20 which, were calculated using the correlations available in Table 4. The optimum values of the model parameters are given in Table 5. Table 4. The reported relationships to calculate the neurons inside the hidden layer of ANNs. The performance of developed ANN models are compared based on the coefficient of correlation (R 2 ) and root mean square error (RMSE) as mentioned below.

Relationships References
A. Coefficient of correlation (R 2 ): B. Root mean square error (RMSE): Zorlu et al. [43] suggested a ranking method to compare the performance of several neural networks together. Here, an integer (rank) was allocated to every network based upon the goodness of R 2 and RMSE values. Then, ranks allocated to every R 2 and RMSE were added to acquire the total rank for every network configuration separately. The network having the highest total rank was considered as the best model for this research work [44]. The results of three-layered neural architectures with three different combinations of training functions are recorded in Tables 6-8. The training algorithm LM has acquired the highest rank of 36 through its performance with (10-18-1) configuration. Therefore, the best performing ANN (10-18-1) alignment was considered for the development of the approximation function to obtain optimum values ROP and Bit Type (BT) along with other drilling variables. All the ANN variants are developed using MATLAB 2019 software. * Table 6 shows the ranking method for selecting the optimum number of neurons for the hidden layer. Here, 18 neurons have achieved the highest total rank of 36 by summing all the training and testing ratings together for the LM training function. The results are also compared with the total rank achieved in Tables 7 and 8 to decide the optimum neuron configuration in the hidden layer.   Figure 7 contains the Regression plot, Mean square error (MSE) plot, and Error plot of optimal ANN architecture [10-18-1]. Table 9 contains weights and bias associated with the selected neural network configuration (10-18-1). The developed ROP approximation function using Equation (7) is given below.
where the tangent sigmoid function is utilized in the hidden layer, and purline is the activation function for the output layer. w i,1 and w 2,i are the weights of input and hidden layers. The weights and bias (obtained during the training of 10-18-1 ANN configuration) associated with the Equation (16) have been provided in Table 9. Table 10 contains constraint bounds required for the optimization of the ROP Equation (16) and control variables. Here, measured depth (DT), Inclination (IN), and Azimuth (AZ) will remain constant because of their predefined nature while optimizing for a particular depth.     Drill bit selection based on optimum ROP values is a bound constrained maximization problem. The developed ROP objective function (Equation (16)) requires an optimization algorithm for determining optimum values of ROP and other operational variables along with BT. GA is an evolutionary paradigm that has been utilized for the optimization of Equation (16) [44]. During the optimization process, Equation (16) was maximized using GA with upper and lower bounds as shown in Table 10 according to the following steps: (a) Adjust the model parameters of GA (maximum no of iterations = 3000, crossover probability = 0.5, population size = 100, parents portion = 0.3, crossover-type = uniform, elite ratio = 0.01, variable type = real); (b) Set the upper and lower bounds for input variables existing in objective Equation (16) using Table 10; (c) Randomly generate the initial population for GA; (d) Several combinations of ROP and input variables will be generated during optimization. In the end, GA converges on the best combination of input variables having maximum ROP value; (e) Record the value of ROP and BT in the final solution produced by GA. GA will provide optimum ROP values along with suitable BT and other control variables.
The optimization of the ROP objective function (Equation (16)) has been carried out using Python package Geneticalgorithm 1.0.1 freely available online for research purposes.

Development of the ROP Function Using RSM
The CCD design was utilized for the generation of fitting Equation (17) with a facecentered configuration where alpha was one. The ten input variables were considered as factors that were coded in two levels (1, −1). There were 128 cube points, 10 center points, and 20 axial points present in the developed CCD design with a total of 158 base runs. The quadratic equation of RSM contains controllable input factors and uncontrollable factors. These uncontrollable or predefined factors such as Measured Depth (DT), AZ, etc., were held as constant, as they cannot be altered. The objective function of ROP developed using RSM is given below.
391 × x 5 x 9 + 9.156 × x 5 x 8 + 0.0723 × x 5 x 10 − 59.62 (17) where f (x) is the approximate objective function of ROP along with other drilling variables as shown in Table 2. This ROP Equation (17) shows predictor variables in the quadratic fitting equation. The R 2 , adjusted R 2 , and predicted R 2 for the above-mentioned objective function are 84.41%, 82.68%, and 81.23% respectively, as given in Table 6. Adjusted R 2 shows the goodness of fit for the developed regression model. Less difference be-tween R 2 and adjusted R 2 shows that important predictors were selected for fitting a quadratic polynomial of RSM. Contours and 3D surface plots were also generated to have a better visualization of the effects of various predictor variables with ROP, as shown in Figures 8 and 9. These plots were helpful for the manual search of optimal points in case of a lower number of input variables. Surface plots and contour plots are utilized to understand and determine the desirable values of the target variable by varying the input variables. Contour plots are two-dimensional representations where all the points having the same response (or target) values are connected. The surface plot provides three-dimensional visualization of the interaction between response (target variable) and input variables. These plots are helpful in the manual estimation of optimum values for the model parameters when the number of input variables is low. However, it is very difficult to find the global maximum point manually in our case, as the number of input variables is 10 (Equation (17)). Therefore, ABC and GA are used [26,27] to optimize Equations (16) and (17).
The significance level of 5% was used while developing an objective function for ROP (Equation (17)). It has been reported the 5% significance level balances Type 1 and Type 2 error during hypothesis testing of any regression coefficients [45,46]. The correctness of the developed ROP objective function was validated by analysis of variance (ANOVA) test, as shown in Table 11. This test demonstrates that coefficients satisfy 5% significance level criteria. All the terms having P values higher than the significance level were eligible for the null hypothesis, which resulted in zero value of coefficient terms and was eliminated from the ROP regression equation. Table 11 shows all the coefficients having lower P values.
A T-test was also performed to validate the significance of the regression coefficient of the ROP function. There are several missing interaction terms in the ROP objective function such as BT*BT, BT*BS, etc. due to their higher P values. The developed Equation (17) has been optimized using the ABC algorithm available in the python beecolpy 2.1 packages based on Karaboga and Basturk [22]. Figure 10 shows residual plots of errors resulted from the developed ROP objective function using RSM. Figure 10 is composed of four residual plots for the developed ROP objective function using RSM, which are explained below.
(a) A normal probability plot is used to verify the normal distribution of residual data.
Ideally, the fitted data should follow a straight line. However, it was found to be not true for the studied drilling datasets, as the trend did not follow a straight line. This verifies the complexity of the real field drilling datasets. (b) Histogram of residuals plot provides details about data skewness or outliers. The skewness is confirmed by a long tail in one direction; however, if a bar is distant away from the other bars, it indicates noise or outlier in the residual data. In the studied drilling datasets, no such problem with data distribution has been found. (c) The residual vs. fits plot checks the constant variance of the residual. It plots the fitted values on the x-axis and residual on the y-axis and verifies the model assumption of the random distribution of residuals as well as constant variance. It is expected that the data points should lie randomly on both sides of the 0 line with no pattern [26,27], which is true in our case. (d) The residual vs. order plot checks whether residuals are uncorrelated or not. These graphs are generated to inspect the goodness of fit of fitting Equation (17) and ANOVA test. Normally, a model residual must lies randomly on both sides of the centerline with no patterns, which is true for our drilling data as well. The generated ROP Equation (17) has satisfied all the required standard conditions to be a good fitting equation (approximate function/objective function).    The developed Equation (17) has been considered as an ROP objective function. During the optimization process, Equation (17) was maximized using ABC with upper and lower bounds, as shown in Table 10 according to the following steps: (a) Initialize the search boundaries using the range of parameters provided in Table 10 and code Equation (17)

Discussion
The selection of suitable drill bits is essential for a successful drilling operation to minimize the overall wellbore cost and increase the efficiency of the drilling operations. In this study, an alternative approach has been investigated for drill bit selection using RSM and ABC combination. RSM has been utilized to develop an objective function for ROP and to determine optimum values of drilling control variables using ABC. Ten drilling variables were considered as input variables for the development of the ROP objective function, namely; DT, BT, BS, WOB, RPM, TQ, MW, IN, AZ, and SPP. Three nearby Norwegian wells' data have been considered for testing the proposed approach of drill bit selection. The five geological zones of well C were utilized for the testing of data-driven drill bit selection techniques, as shown in Tables 12 and 13. Figure 5 shows the generalized schema of the proposed approach for drill bit selection. The developed objective Equations (16) and (17) for ROP were optimized with the upper and lower bounds provided in Table 10 to obtain the optimum value of BT and ROP. Table 12 contains the bit types selected on the optimum ROP values using different data-driven approaches. ANN has wrongly predicted the drill bit types for zones 1, 3, and 4; however, when combined with GA, its drill bit selection error reduces to zone 1 only.
ANN has been reported to have a tendency to get stuck in local minima, which is why it failed to predict the correct bit type for certain target formations [19,48]. When ANN is combined with GA, the optimization task is handled by GA, which is a strong optimizer and converses to the correct BT except in zone 1, as compared to actual BT. Equations (16) and (17) are multimodal equations developed through high-dimensional data comprised of large local optimal points. Therefore, detecting a globally optimum solution in the search space is a difficult task. In the case of GA, sometimes, premature convergence may happen due to strong selection pressure imposed by the selection operator and crossover operator if the initial population lacks desirable diversity. However, ABC utilizes a stochastic search technique that is good at maintaining diversity and escaping local optimal stagnation. Table 12 shows that the proposed RSM and ABC combination has precisely estimated the bit types for five target geological zones. Here, information about actual drill bits used in the real field for drilling the wells has been taken as a standard reference for comparison in Table 12. Table 14 contains the optimum values of drilling variables acquired through RSM and ABC combination at different target depths. Moreover, the performance of drill bits selection techniques must be compared based on the drilling cost involved in the drilling operations. It might be possible that the GA-suggested BT for zone 1 is more suited for an applied bit. Therefore, the cost per foot analysis should be done for each drill bit to check the economic feasibility of suggested drill bit types.
The oil and gas industry has direct as well as indirect impacts on the global economy, which decides the fate of the sustainable development of human beings. The revenue generated by the oil and gas drilling sector alone was approximately 3.3 trillion U.S. dollars, which was roughly 3.8% of Global Gross Domestic Product (GDP) in 2019 by IBISWorld data [49]. The upstream drilling sector is also known as the exploration and production industry (E&P industry). However, the contribution of industries midstream and downstream of the petroleum sectors was never included in the above-said data; otherwise, the GDP percentage share of the petroleum industry would be much greater. Therefore, any economical and innovative technological development in the E&P industry will have a global impact. This will also encourage a global economy that influences human sustainability, as clearly mentioned in the sustainable development goals of the UNDP.
The type of drill bits utilized in any drilling operation has an economical influence on the overall drilling operation. The drilling cost has a direct relationship with ROP and the running life of the drill bit that is needed to be minimized. Polycrystalline diamond compact (PDC) drill bits were primarily utilized for drilling the Norwegian wells considered in this study. Nearly 10-40% of the dryhole well cost is found dependent on the PDC drill bit [7]. The PDC bit life fluctuates with its design parameters such as cutter distribution, type of gauge protection material, etc., whereas design parameters such as nozzle placement, cutter shape, PDC type, etc. directly vary the bottom hole ROP drastically [7]. Therefore, the selection of a suitable BT is essential for the minimization of associated drilling costs. The selection performance of different data-driven approaches has also been compared based on cost per foot analysis. The prices of drill bits were identified from the manufactures catalogs. The totality of trip time and connection time was expected to be 6 hours per 1000 ft. Equation (1) has been utilized to perform the cost per foot analysis of the predicted drill bit. Table 13 shows the computed cost per foot results for five target zones of well C based on drill bits selected through different data-driven approaches. Figure 11 shows that the selection of the bit for five target drilling zones by ANN and GA combination and proposed approach has given nearly similar types of drill bits and cost per foot except for zone 1. The proposed approach has given a lesser cost per foot value for zone 1 as compared to ANN and GA combination, as shown in Table 13. Table 14 shows the optimum value of input drilling variables for certain depths of target zones using a combination of RSM and ABC. Therefore, the proposed RSM and ABC combination is found to be more reliable than ANN-based drill bit selection models and can also be utilized for the drill bit selection purposes. Moreover, these models are case-specific, as well as data-dependent in nature, and require calibration for other fields. Figure 11. The comparison of cost per foot calculated for five target geological zones. The ANN and Genetic Algorithm (GA) combination has given a similar cost per foot as compared to the proposed RSM and ABC approach, except in zone 1. In zone 1, RSM and ABC have given lower cost per foot results than the ANN and GA combination.
Mostly, PDC drill bits have been utilized along with roller cone bits for the drilling of the Norwegian wells considered in this study. These bits are widely applied in offshore conditions due to various benefits such as to reduce tripping time, to drill in non-hydrating formations, to achieve high RPM and ROP in directional drilling, etc. However, PDC bits are sensitive to fragile and soft formations as in the case of Volve wells considered in this study. These wells comprise softer rock formations that contain fine to medium sandstone, some interbedded claystone, siltstone, and limestone stringers, marl, argillaceous laminations, etc., as shown in Table 1. Soft, fragile, and fractured rock formations affect the stability of PDC bits with a large reduction of the bit life. Recently, hybrid drill bits (e.g., Kymera) have been developed that combined the properties of conventional PDC bit and roller cone bit types [49][50][51]. These hybrid bits seem to be a good solution for drilling problematic wellbore sections while maintaining the stability of drilling operations. It may be possible that the drilling cost per foot has been reduced more if hybrid drill bits are employed for drilling the Norwegian wells considered in this study.

Conclusions
UNDP promotes economic innovations that lead to the growth of any industrial sector. In this work, attempts have been made to develop a more reliable technology for drilling optimization and cost reduction of drilling operations. RSM and ABC combination has been proposed to select drill bit type based on the optimum values of ROP. RSM has been utilized for the development of the ROP objective function, which is further optimized using ABC to find the optimum values of ROP. The optimum values of operational variables are also determined in this research work for drilling the target formations. A comprehensive study has been done to test the efficacy of the proposed method with the existing ANN, ANN, and GA for drill bit selection and drilling optimization. The analysis of the results obtained leads to the following conclusions.

•
This study provides an alternate intelligent approach for bit selection based on optimum values of ROP.

•
The proposed drill bit selection approach is found to be more accurate than the ANN-based prediction of drill bit types.

•
The combination of RSM and ABC provides a more reliable bit selection modeling approach as compared to ANN based on cost per foot comparison.

•
The prediction correlation coefficient of the RSM objective function is found to be 81.23%, while 85.5% has been found for ANN during the estimation of ROP.

•
The ROP objective function developed through RSM is less complex than the ANNbased objective function due to the absence of an exponential function. • ANN requires more computational cost for the development of the ROP function and its optimization. • These models are case-specific data-dependent models and require calibration for other field data.

•
The proposed data-driven approach has presented optimism for the sustainable development of more efficient, robust, reliable, and economical technology that has shown potential for drilling optimization and cost reduction.
This research work indicates the potential of coupled intelligent techniques for the estimation and optimization of drilling variables. Using the proposed model, drilling engineers can effectively reduce the overall time and expenses that a company invests in a field by smartly selecting the optimum parameters of any newly planned oil and well.

Code Source
The results have been generated through Matlab 2019, MiniTab 2016, and python toolbox Geneticalgorithm 1.0.1. and beecolpy 2.1. Data Availability Statement: Data utilized in this research work can be downloaded from the Equinor website [35]. Equinor website database. https://www.equinor.com/en/how-and-why/ digitalisation-in-our-dna/volve-field-data-village-download.html.

Acknowledgments:
The authors are thankful to the reviewers for their valuable comments and suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.