1. Introduction
In the current oil market, as exploration and production increase, shallow and easily accessible oil-bearing formations are diminishing and are gradually being replaced by deeper and more unconventional formations. As drilling depth and difficulty increase, construction becomes more challenging, resulting in slower drilling, longer drilling cycles, and higher costs. Therefore, the global optimization of drilling parameters for the lowest cost and shortest drilling time is becoming increasingly important. In particular, accurate prediction of the rate of penetration (ROP) helps to calculate drilling costs and time, optimize drilling parameters, rationalize personnel arrangements, and provide a working basis for the drilling process. By optimizing drilling parameters and improving drilling speed, operators can reduce drilling time, which translates directly into cost savings: less time spent on the rig, reduced labor expenses, and lower overall drilling costs.
Expert-based models may struggle to adapt to dynamic drilling conditions and changing operational requirements. They often rely on fixed rules or heuristics derived from historical data or expert knowledge. As drilling technologies evolve and new challenges emerge, these models may not effectively capture the optimal drilling parameters for novel situations or unconventional drilling scenarios. Artificial intelligence technology has achieved rapid development in the past 30 years, and it has been widely applied in many fields of study such as materials science, the biomedical industry, and finance [1,2,3], resulting in fruitful outcomes. Gradually, artificial intelligence has become an independent branch that possesses its own theoretical framework and practical system. Since the birth of artificial intelligence, the theory and technology have increasingly matured, and the application field is constantly expanding. Intelligent algorithms are gradually being applied in the fields of oil drilling and development and have achieved good results [4,5].
In terms of ROP prediction models, researchers such as Mingze Xu, Abdulmalek, and Omogbolahan [6,7,8,9] established ROP prediction models based on KNN, Decision Tree, SVM, and other algorithms using an integrated algorithm, and took the goodness of fit as the evaluation index of ROP prediction. Ensemble machine learning [10,11] has been widely used in ROP modeling and has achieved results beyond conventional machine learning. Shengwa Liu and Melvin [12,13] used ANN to model the ROP of directional wells; given sufficient data quantity and high data quality, the ROP prediction accuracy can meet users’ needs. Omid [14] and Abiodun [15] used a variety of genetic algorithms to optimize the neural network algorithm and compared the algorithms in terms of relative error, R-squared, mean square error, and other metrics. Husam [16] used a recurrent neural network algorithm based on logging and path data to predict the ROP with an accuracy of up to 85%. Hundman [17] designed a spacecraft telemetry signal anomaly detection model based on LSTM, which identifies anomalies through prediction errors, and proposed a non-parametric dynamic method for determining the anomaly detection threshold, which balances the false alarm rate against the missed alarm rate. Malhotra et al. [18] proposed an LSTM-based auto-encoder model that reconstructs regular time series and uses the reconstruction errors for anomaly detection.
Regarding parameter optimization, in biomedical material science, the ridge regression method has been used to optimize the thermal conductivity of MWCNTs-SiO2/Water-EG nanofluid [19], and the orthogonal distance regression (ODR) algorithm in ANN modeling has been used to optimize synthesized hydroxyapatite/ethanolamine for bone tissue [20].
Heidari Varnamkhasti [21] used mechanical specific energy (MSE) and downhole vibration analysis for drilling optimization. The MSE method is used to identify drilling dysfunction, and it is combined with additional information, such as downhole drilling performance and borehole quality, to help find the source of abnormal drilling conditions. Mohammad Anemangely [22] used artificial intelligence to estimate geomechanical parameters from mechanical specific energy, and the results showed that intelligent models have higher accuracy and reliability than regression models in estimating geomechanical parameters. A comparison of the MLP-COA and MLP-PSO models in that work shows that COA outperforms the PSO algorithm in terms of model accuracy and reliability, and the results of the three models show that the proposed method has great potential in estimating the CCS, UCS, and φ parameters. The attention mechanism [23], as a novel deep learning mechanism, has been widely used in time series prediction in recent years, and it is thus reasonable to apply it to drilling time series prediction.
This paper presents a deep learning algorithm based on Bi-LSTM and the attention mechanism for predicting ROP, and introduces the particle swarm optimization (PSO) algorithm to create the PSO-Bi-LSTM method for optimizing drilling parameters. Using this method, the model performs multi-objective parameter optimization to enhance drilling efficiency: Bi-LSTM and the attention mechanism provide intelligent ROP prediction, and PSO searches the high-dimensional drilling parameter space quickly and accurately. At the same time, the model's real-time capability provides a deep learning foundation for drilling efficiency analysis and optimization, expanding the application scope of deep learning in petroleum engineering.
  2. Materials and Methods
  2.1. Long Short-Term Memory Model
Long Short-Term Memory (LSTM) [24], a neural network architecture introduced by Hochreiter and Schmidhuber in 1997, has become a prominent solution for retaining and utilizing both short- and long-term information. Over the years, LSTM has undergone significant modifications and refinements by researchers such as Felix Gers, Fred Cummins, and others, resulting in a more comprehensive and robust algorithm. These advancements have propelled LSTM to be widely adopted across diverse domains.
LSTM addresses a fundamental challenge in recurrent neural networks (RNNs) by effectively capturing dependencies between short-term and long-term information. Unlike traditional RNNs that often struggle with the vanishing gradient problem, LSTM offers a solution that enables the network to learn and propagate information over extended sequences. The key components of LSTM are the cell state ($c_t$) and the hidden state ($h_t$). The cell state evolves relatively slowly throughout the computation process, while the hidden state exhibits dynamic changes. Each node’s output, denoted as $c_t$, is computed by combining the previous cell state, $c_{t-1}$, with various input signals. In contrast, the hidden state, $h_t$, encapsulates the network’s output and undergoes substantial transformations.
To facilitate effective information flow and memory management, LSTM employs a specialized structure known as a memory cell, which incorporates three gates: the forget gate, the input gate, and the output gate. Each gate takes the current input and the previous hidden state as inputs and produces an activation between 0 and 1. The forget gate determines which information to discard from the cell state at the current time step. The input gate controls which new information is added to the cell state at the current time step. The output gate regulates how much information from the cell state is passed to the current time step’s hidden state for further computations or predictions. All three gates are implemented as neural network layers with learnable weights, and their activations are obtained by applying activation functions to the inputs and hidden states. By adjusting these activations, LSTM can selectively retain, forget, and output information, allowing the network to better capture and utilize important contextual information in long sequences.
Figure 1 depicts the specific structural diagram of LSTM, illustrating the arrangement of gates and connections.
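The gate mechanism described above can be illustrated with a minimal NumPy sketch of a single LSTM time step. This is the standard textbook LSTM formulation rather than the implementation used in this paper; the function name `lstm_step`, the stacked weight layout, the input dimension of 5, the hidden size of 4, and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; squashes gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell (illustrative layout, not the paper's code).

    W has shape (4*H, D+H) and stacks the weights of the forget gate,
    input gate, candidate update, and output gate; b has shape (4*H,).
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: what to discard from c_prev
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to add
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t, (f, i, o)

# Hypothetical sizes: 5 input features per record, 4 hidden units, 50 time steps.
rng = np.random.default_rng(0)
D, H = 5, 4
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(50, D)):
    h, c, gates = lstm_step(x_t, h, c, W, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`), gradients can flow across many time steps without repeatedly passing through a squashing nonlinearity, which is the usual explanation for why LSTM mitigates the vanishing gradient problem.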
In the field of petroleum drilling parameter optimization, traditional RNN methods fail to capture the long-term mapping relationships among drilling parameters. During backpropagation, the gradients continuously diminish, leading to the loss of drilling information in long sequences and making the model difficult to train. The gated mechanism of LSTM effectively addresses this issue by enabling the network to better utilize gradient information during training. Additionally, due to the progressive nature of drilling information as a temporal and depth-dependent sequence, the performance of traditional RNNs declines as the sequence length increases. In contrast, LSTM, through its controlled gated mechanism, can adapt more effectively to sequences of varying lengths while maintaining satisfactory performance. Therefore, the LSTM approach exhibits unique advantages in optimizing petroleum drilling parameters, as it is capable of handling the long-term relationships among drilling information and addressing the weak generalization ability of models during the drilling process. This is beneficial for improving the subsequent performance of drilling parameter optimization.
This distinctive architecture of LSTM, with its integrated memory cell and gate mechanisms, empowers the network to capture and leverage both short- and long-term dependencies. Consequently, LSTM has demonstrated remarkable efficacy in various applications, ranging from natural language processing and speech recognition to time series analysis and robotics. Its ability to effectively model sequential data and mitigate the vanishing gradient problem has cemented LSTM as a valuable and widely adopted solution in the field of deep learning.
  2.2. Particle Swarm Optimization
Particle swarm optimization (PSO) is an evolutionary computing technique that has gained significant attention in the field of optimization due to its remarkable features. Inspired by bird foraging behavior [25], PSO stands out for its collaborative nature and the exchange of information among individuals, making it a powerful tool for solving complex optimization problems. One of the notable advantages of PSO is its simplicity, which makes it easily implementable and accessible to researchers and practitioners from diverse backgrounds. Unlike other optimization algorithms that require intricate adjustments, PSO requires minimal parameter tuning, further enhancing its appeal. This simplicity, coupled with its effectiveness, has made PSO a preferred choice for achieving efficient optimization solutions in various domains.
In PSO, the dynamics of bird flocks are simulated, where each particle represents a potential solution and possesses attributes of velocity and position. Through iterative updates, particles navigate the search space based on their individual experiences and the shared information within the swarm. During this optimization process, particles retain their individual best solutions, reflecting the best objective function values encountered thus far. Additionally, they engage in communication with neighboring particles to determine the global best solution, which represents the overall best objective function value found within the swarm. The update of particle velocities and positions is guided by mathematical equations that strike a balance between exploration and exploitation. These equations consider both the particle’s individual experience and the influence of the global best solution, allowing particles to adapt their movements accordingly. This iterative process continues until a termination condition is met, typically defined by a maximum number of iterations or the achievement of a desired solution accuracy.
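The velocity and position updates described above can be sketched as follows. This is a minimal global-best PSO using the commonly cited update rule (inertia term plus cognitive and social terms); the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the test function `sphere` are illustrative assumptions, not settings reported in this paper.

```python
import random

def pso(objective, bounds, n_particles=30, n_iter=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # individual best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best in the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move, then clamp to the search bounds
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum at (1, 1, 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)

best_x, best_val = pso(sphere, [(-5.0, 5.0)] * 3)
```

To maximize a quantity such as a predicted ROP with this sketch, one would pass its negative as the objective. A neighborhood-based variant would replace `gbest` with the best position among each particle's neighbors.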
The effectiveness of PSO is evident through its successful applications in various domains. In function optimization, PSO has been employed to find optimal values for complex mathematical functions, often outperforming traditional optimization techniques. Moreover, PSO has proven to be highly effective in neural network training, where it aids in optimizing the weights and biases of the network to improve its performance. Additionally, PSO has been applied to fuzzy system control, enhancing the ability to tune fuzzy rule bases for better control system performance. The simplicity and efficiency of PSO have made it a popular choice among researchers and practitioners. Its ability to quickly converge to near-optimal solutions with minimal parameter tuning is highly advantageous, especially in scenarios where computational resources or time constraints are significant factors. Furthermore, the collaborative nature of PSO allows for the exploration of diverse solutions and the potential discovery of novel and unconventional solutions that would be challenging to find using traditional optimization methods.
In summary, PSO is an evolutionary computing technique that harnesses collaboration and information exchange to search for optimal solutions. Its simplicity and effectiveness have led to successful applications across diverse domains, including function optimization, neural network training, and fuzzy system control. Ongoing research and exploration of PSO’s capabilities continue to contribute to the field of optimization, opening new avenues for its application and advancement. As researchers delve deeper into the intricacies of PSO and explore its potential in combination with other optimization techniques, the field of optimization stands to benefit from further advancements and improvements in solving complex problems.
  3. Results and Discussion
  3.1. Data Processing
In this experiment, we selected 48,321 drilling data records from eight wells in Xinjiang, China, as the model training data. These data must be processed before they are used to train the model. Data processing mainly includes data preprocessing, cleaning, standardization, and segmentation.
The first step is preprocessing and cleaning the data. Data cleaning refers to using data analysis to turn incorrect data into data that meet the requirements. The cleaning process for drilling data mainly includes detecting and deleting erroneous data and removing or completing missing data. We found that some of the data collected from the wells contained errors or missing values, so these records were judged invalid. After deleting them, 41,242 records remained, which were used to train the model.
The second step is the standardization of data. Large differences in scale among the parameters can cause problems in the subsequent machine learning modeling, so we rescale the data to a common range. The standardization formula is shown in Equation (1).
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (1)$$
where $x_{\min}$ is the minimum value in the dataset, $x_{\max}$ is the maximum value in the dataset, and $x'$ is the normalization result.
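As a small worked example of the min-max scaling defined by the minimum, maximum, and normalization result above, the sketch below rescales one parameter column to the range [0, 1]. The function name `min_max_normalize` and the sample weight-on-bit values are illustrative assumptions, and the handling of a constant column is a design choice not discussed in the text.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical weight-on-bit samples in kN, for illustration only.
wob_kn = [25.0, 92.38, 113.0, 230.0]
scaled = min_max_normalize(wob_kn)
```

In practice, the minimum and maximum would be computed on the training data and reused for the test data, so that both are mapped with the same scale.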
The third and last step is the segmentation of the data. In this experiment, we use a sliding window to train the time series model, so the data must be divided into segments of equal length. We took every 50 records as a new array, divided the data accordingly, and labeled each segment with the corresponding ROP result.
  3.2. Construction of ROP Prediction Model
The drilling parameter optimization model is built upon the rate of penetration (ROP) prediction model and achieves its optimization objectives by adjusting the drilling parameters. As ROP is one of the objectives in drilling parameter optimization, the predicted ROP must be accurate for the optimization results to be credible. In this experiment, based on the LSTM, we introduced the bi-directional mechanism and bi-directional attention mechanism. To address the specific characteristics of ROP time series prediction, we proposed a sliding window dynamic updating mechanism. Finally, we implemented a prediction model for high-frequency ROP time series based on Bi-LSTM. The overall structure of the model is shown in Figure 2.
For this model, we also introduced the sliding window method to optimize parameters and update the model. The sliding window method takes a given sub-experimental window size and slides the window towards the future in time order with a fixed step size. The size of this window is generally much smaller than the total data size, and each window contains its own training data and label values. After training all the sub-experimental data, we predicted and calculated the total evaluation index on the test set.
Based on the above, this experiment uses the sliding window method to divide the data set, using drilling data from wells in Xinjiang, China. 
By moving a fixed-size window, sliding window technology can analyze and process input data, enabling various tasks such as feature extraction, object detection, and text segmentation. Although sliding window technology has advantages such as flexibility, multi-scale processing, and real-time performance, it requires repeated calculations on overlapping data, which can decrease computational efficiency. The window size is very important: an inappropriate window size can lead to low efficiency and information loss. Therefore, we designed a comparative experiment. From Table 1, it can be seen that the prediction effect is best when the window size is 50, and that, compared to the fastest model, this setting increases the calculation time by only 30 s.
So, the paper sets the sliding window length to 50 and the step size to 1. That is, for the 1242 pieces of test data, starting from the first one, every 50 pieces of data are taken as one data window, giving 1242 data windows in total. We input the segmented ROP data into the established Bi-LSTM model.
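The windowing step can be sketched as below, using a window length of 50 and a step size of 1 as in the text. The function name `sliding_windows`, the synthetic records, and the choice to label each window with the ROP of its last record are assumptions made for illustration; the text does not specify how labels are aligned with windows.

```python
def sliding_windows(features, targets, window=50, step=1):
    """Cut a record sequence into overlapping windows.

    Returns (X, y), where X[k] holds `window` consecutive feature records
    and y[k] is the target of the last record in that window (assumed
    labeling convention).
    """
    X, y = [], []
    for start in range(0, len(features) - window + 1, step):
        end = start + window
        X.append(features[start:end])
        y.append(targets[end - 1])
    return X, y

# Synthetic stand-in for 1242 records with two features each.
records = [[float(i), float(i) * 0.5] for i in range(1242)]
rop = [float(i) for i in range(1242)]
X, y = sliding_windows(records, rop)
```

Counting only complete windows, n records yield n - window + 1 windows, so this sketch produces 1193 windows from 1242 records. Obtaining one window per record, as in the count of 1242 given in the text, would require early windows to draw on preceding records or on padding; that detail is an assumption and is not shown here.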
We input the drilling data of the wells in Xinjiang, China into the Bi-LSTM model. The model is trained for 100 iterations with a batch size of 64 and an initial learning rate of 0.001. The network is optimized with the Adam algorithm, a momentum-based adaptive variant of mini-batch gradient descent.
At the same time, we compared several common machine learning ROP prediction methods. The prediction results on the test data are shown in Figure 3, and the prediction accuracy is shown in Table 2.
It can be seen that the trend of the predicted value of the Bi-LSTM method in this experiment is roughly consistent with the actual value, and the prediction accuracy is the highest, which means that Bi-LSTM can accurately predict the ROP. Therefore, after optimization, the Bi-LSTM method is selected as the ROP prediction model in the process of multi-objective parameter optimization.
  3.3. Construction of the PSO-Bi-LSTM Model
This study is based on the Bi-LSTM ROP prediction algorithm and the PSO algorithm, combining the high ROP prediction accuracy of Bi-LSTM with the fast and accurate search of high-dimensional spaces offered by the PSO algorithm. By designing a coupling relationship between the two, a PSO-Bi-LSTM drilling parameter optimization method is established.
The design flow chart of the PSO-Bi-LSTM algorithm is shown in Figure 4. First, data selection and normalization are performed so that the data set follows a unified standard. Then, the particle swarm is initialized to optimize the drilling parameters. The PSO algorithm is employed to search for drilling parameter combinations, with the objective of maximizing the ROP predicted by the Bi-LSTM prediction model and minimizing the MSE associated with that particular drilling parameter combination. After selecting the kernel function, the kernel parameters are input into the Bi-LSTM ROP prediction model and the MSE calculation model. The PSO drilling parameter optimization iteration is performed with the maximum ROP and the minimum MSE as objectives, and the algorithm completes by returning the optimal parameters at the end of the iteration. The goal of this model is to maximize drilling footage while maintaining the highest ROP, which means that the drilling tool works longer underground while drilling faster. This reduces the non-operating time of the drilling tool while increasing drilling efficiency. On-site personnel can directly apply the drilling parameters recommended by this model to reduce non-operating time and improve drilling speed in the field.
  3.4. The PSO-Bi-LSTM Model Drilling Parameters Optimization
In the experiment, we used a dataset from a well in Xinjiang, China. The input and target parameters selected are shown in Table 3. The statistical information of this dataset is shown in Table 4. The ROP prediction model takes weight on bit (WOB), rotational speed, flow rate, uniaxial compressive strength (UCS), and volume content of sandy particles (Vsand) as inputs and outputs ROP. The PSO-Bi-LSTM algorithm optimizes the three on-site controllable engineering parameters: WOB, rotational speed, and flow rate. The optimization ranges of WOB, rotational speed, and flow rate are 25 kN to 230 kN, 11 r/min to 63 r/min, and 25 L/s to 39 L/s, respectively. The goal is to achieve the dual-objective optimization of maximum ROP and minimum MSE while maximizing drilling speed and drill bit footage.
The Pareto optimal solution set is shown in Figure 5. It can be observed that as the ROP increases, the MSE slightly increases, and there is a parameter combination set that has a significant acceleration effect and almost does not affect the drill bit footage.
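For readers less familiar with Pareto optimality, the sketch below shows how a non-dominated set can be extracted from candidate (ROP, MSE) pairs when ROP is to be maximized and MSE minimized. The function name `pareto_front` and the candidate values are hypothetical and are not taken from Figure 5; the paper's own procedure for building the solution set may differ.

```python
def pareto_front(points):
    """Return the non-dominated (rop, mse) pairs, sorted by ROP.

    A point is dominated if another point has ROP at least as high and
    MSE at least as low, and is strictly better in one of the two.
    """
    front = []
    for i, (rop_i, mse_i) in enumerate(points):
        dominated = any(
            rop_j >= rop_i and mse_j <= mse_i
            and (rop_j > rop_i or mse_j < mse_i)
            for j, (rop_j, mse_j) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((rop_i, mse_i))
    return sorted(front)

# Hypothetical candidates: ROP in m/h, MSE in MPa.
candidates = [(6.0, 460.0), (8.0, 400.0), (9.0, 340.0),
              (10.0, 335.0), (11.0, 350.0), (7.0, 500.0)]
front = pareto_front(candidates)
```

With these made-up values, only (10.0, 335.0) and (11.0, 350.0) survive: moving between them gains ROP at the price of a slightly higher MSE, which is the kind of trade-off a Pareto set is meant to expose.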
The parallel representation of this optimal parameter combination set is shown in Figure 6. It can be seen that the preferred WOB is concentrated between 92.38 kN and 113 kN, the preferred flow rate is concentrated between 29 L/s and 30.35 L/s, and the preferred rotational speed is concentrated between 25 r/min and 34 r/min.
Within the optimal WOB range, WOB is set to 98 kN, 103 kN, and 108 kN, respectively. For each of these values, the effect of rotational speed and flow rate on ROP is shown in Figure 7, and their effect on MSE is shown in Figure 8. It can be seen that the highest ROP and the lower MSE both fall within the optimal range. We determined the optimized range of drilling parameter combinations from Figure 6 and, for WOB values randomly selected within this range, drew two-dimensional cloud maps of ROP and MSE over all combinations of rotational speed and flow rate. As the cloud maps show, the optimized drilling parameter combination obtained within this range performs best among all parameter combinations. The validation results are consistent with the PSO experimental results, indicating that the model has a good optimization effect and can achieve the dual-objective optimization of maximum ROP and minimum MSE.
The effect of the optimized parameter solution in practical application is shown in Figure 9 and Figure 10. It can be seen that, after the optimized parameter solution is applied, the average ROP increases from 6.05 m/h to 10.95 m/h, an increase of 81%. The average MSE decreases from 464.05 MPa to 333.87 MPa, so the energy loss of the bit is reduced by 28%. The practical application effect is good: it meets the requirements of field application and achieves the dual-objective optimization goal.
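The reported percentages follow directly from the before and after averages, as the short check below shows. The helper name `percent_change` is an illustrative assumption; the four input values are the averages quoted above.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

rop_gain = percent_change(6.05, 10.95)       # average ROP, m/h
mse_change = percent_change(464.05, 333.87)  # average MSE, MPa

print(f"ROP change: {rop_gain:+.1f}%")    # prints "ROP change: +81.0%"
print(f"MSE change: {mse_change:+.1f}%")  # prints "MSE change: -28.1%"
```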
  4. Conclusions
In summary, we successfully established an efficient and accurate multi-objective optimization model and applied it to actual drilling operations. We established the PSO-Bi-LSTM algorithm to optimize the controllable parameters in the drilling process. The optimization goal of this algorithm is to maximize ROP and minimize MSE, aiming at minimizing the energy loss of the bit while ensuring the maximum drilling speed.
In the experiment on optimizing the drilling parameters of a well in Xinjiang, the optimized WOB is between 92.38 kN and 113 kN, the optimized flow rate is between 29 L/s and 30.35 L/s, and the optimized rotational speed is between 25 r/min and 34 r/min. The optimization results show that the drilling speed is increased by 81% and the energy loss of the bit is reduced by 28%. The optimization effect is good, achieving the goal of multi-objective controllable parameter optimization.
This work provides an intelligent model basis for multi-objective optimization of drilling parameters, expands the application of the combination of multi-parameter search algorithms in high dimensional space and neural network models in the field of oil drilling, and provides new ideas and methods for reducing costs and increasing efficiency in drilling operations.