Intelligent Monitoring Model for Lost Circulation Based on Unsupervised Time Series Autoencoder

Abstract: Lost circulation, a common risk during the drilling process, significantly impacts drilling safety and efficiency. The presence of data noise and temporal evolution characteristics poses significant challenges to the accurate monitoring of lost circulation. Traditional supervised intelligent monitoring methods rely on large amounts of labeled data and often do not consider temporal fluctuations in the data, leading to insufficient accuracy and transferability. To address these issues, this paper proposes an unsupervised time series autoencoder (BiLSTM-AE) intelligent monitoring model for lost circulation, aiming to overcome the limitations of supervised algorithms. The BiLSTM-AE model employs BiLSTM for both the encoder and decoder, enabling it to comprehensively capture the temporal features and dynamic changes in the data. It learns the patterns of normal data sequences and thereby automatically identifies anomalous risk data points that deviate from those patterns during testing. Results show that the proposed model can efficiently identify and monitor lost circulation risks, achieving an accuracy of 92.51%, a missed alarm rate of 6.87%, and a false alarm rate of 7.71% on the test set. Compared to other models, the BiLSTM-AE model has higher accuracy and better timeliness, which is of great significance for improving drilling efficiency and ensuring drilling safety.


Introduction
As the development of conventional oil and gas resources continues to deepen, the oil and gas exploration and development industry is gradually expanding into complex areas, including unconventional, deep, and deep-sea resources. As a crucial step in oil and gas exploration and development, drilling consistently emphasizes safety and efficiency, a primary concern for domain experts. Complex oil and gas formations, characterized by high temperatures, high pressures, intricate layering, and instability, make lost circulation both high-risk and difficult to detect. Identifying lost circulation risks is further complicated by data noise, real-time fluctuations, and nonlinear mapping relationships. If lost circulation is not detected in time, delays in handling can easily result in severe loss and blowout accidents. Therefore, accurate and effective monitoring of lost circulation risks is significant for safe and efficient drilling.
Lost circulation monitoring methods are mainly divided into traditional and intelligent methods. Traditional methods primarily use sensor equipment to monitor various parameters during the drilling process, with engineers then judging whether lost circulation has occurred based on real-time changes in these parameters. For example, Shoujun Liu [1] used the mud pit level method and ultrasonic sensors to monitor changes in mud pit level to determine well leakage; Dong Tan [2] chose the flow difference method and proposed combining surface flow monitoring with downhole pressure prediction to monitor and predict drilling overflow in pressure-sensitive formations. Limited by sensor precision and engineers' experience, traditional methods suffer from strong subjectivity, low accuracy, and poor timeliness.
In recent years, with the rapid development of machine learning technology, intelligent monitoring methods have gradually become a research hotspot in drilling risk monitoring. For instance, Liu Biao et al. [3] analyzed various parameters and formation characteristics during the drilling process and used support vector regression to build an intelligent prediction model for lost circulation, achieving early identification of lost circulation risks. Yingzhuo X et al. [4] developed an innovative lost circulation prediction model based on deep learning and conducted a quantitative analysis of the impact of each feature on the model's prediction results. Song Yan [5] proposed an intelligent recognition method for lost circulation risk status based on extreme learning machines, achieving high-accuracy identification of lost circulation risks. Baek S et al. [6] established a lost circulation correlation model by analyzing formation conditions, engineering parameters, and operational dynamic parameters related to lost circulation, using this model to achieve lost circulation risk response and early warning for data anomalies with different weights. Li Changhua et al. [7] employed convolutional neural networks and multidimensional data fusion algorithms to establish a lost circulation prediction model, enhancing the reliability of lost circulation identification and prediction. Zheng Zhuo et al. [8] used the XGBoost algorithm to construct a real-time lost circulation early warning model tailored to the specific geological conditions of the Bohai Bay Basin, improving the accuracy of formation loss judgments. Sun Weifeng et al. [9] achieved accurate monitoring of minor drilling fluid losses by combining dilated causal convolutional networks and long short-term memory networks. Luo Ming et al. [10] proposed a lost circulation prediction method based on deep convolutional feature reconstruction, extracting key characterization parameters of lost circulation and constructing a temporal feature matrix to improve the prediction accuracy of lost circulation risks. Among time series learning methods, Li et al. [11] used LSTM and BiLSTM for leakage monitoring in 2023, achieving identification accuracies between 88.93% and 98.38%.
Currently, intelligent lost circulation monitoring mainly relies on supervised learning algorithms, which depend heavily on large amounts of lost circulation risk data during model training; they also show apparent limitations in parsing well leakage data with complex, nonlinear characteristics. Moreover, obtaining large amounts of labeled risk data is often difficult in practical applications. In addition, because a supervised model relies on its training data, if that data does not adequately cover all possible lost circulation scenarios, the model may generalize poorly in practice and fail to accurately identify new lost circulation risks. An unsupervised learning approach can therefore overcome the limitation of relying on large amounts of risk data.
Non-temporal learning methods, for their part, often struggle to reveal how parameters change over time, neglecting the temporal continuity and ordering of the data. Therefore, this study focuses on time-sequential data from the drilling process to explore the relationship of parameter changes over time.
To address the shortcomings of the supervised learning and non-time-series autoencoder algorithms discussed above, this study uses an unsupervised time series autoencoder built on bidirectional long short-term memory networks (BiLSTM-AE). Both the encoder and decoder of the model are BiLSTM networks, which capture temporal correlations. When time series data are input into the model, it encodes and compresses the data, decodes it, and then calculates the reconstruction error between the output and the input. If the reconstruction error exceeds a threshold, the data are flagged as risky; otherwise, they are considered normal and risk-free. BiLSTM-AE provides a more comprehensive contextual understanding of sequence data and can extract the characteristics of temporal variation. In practical applications, this method can identify lost circulation risk in real time, compensate for the lack of large amounts of labeled lost circulation risk data, mine the changing patterns in the data, and improve real-time monitoring, providing an efficient and accurate approach to lost circulation monitoring.

Bidirectional Long Short-Term Memory Network (BiLSTM)
Long Short-Term Memory (LSTM) networks are a particular type of recurrent neural network (RNN) specifically designed to address the long-term dependency issues traditional RNNs encounter when processing long sequences. LSTM was proposed by Hochreiter and Schmidhuber [12] in 1997 and later refined by Alex Graves et al. Through subsequent improvements and optimizations by various researchers, LSTM has become one of the mainstream methods for sequential data tasks such as language modeling, text generation, speech recognition, and machine translation.
Bidirectional Long Short-Term Memory (BiLSTM) networks are an extension of LSTM that better captures dependencies in sequence data in both the forward and backward directions [13]. BiLSTM was introduced by Schuster and Paliwal [14] in 1997. In a standard LSTM, the state of each unit is transmitted unidirectionally from front to back, meaning the output at the next moment can only be predicted from the temporal information of preceding moments. LSTM models can therefore only learn the features of risk characterization parameters from past moments and cannot learn features from future moments.
BiLSTM, on the other hand, has two state transmission paths: one running from front to back and one from back to front. Two independent LSTM layers operate in parallel on the same input sequence, one processing it forward and the other backward; their outputs are then merged, so the network can simultaneously consider contextual information from both past and future moments. BiLSTM outperforms unidirectional LSTM and other traditional RNN variants in many tasks because it can capture more contextual information.
The hidden states of the forward and backward LSTM layers at moment $t$ are computed as shown in Equation (1):

$$\overrightarrow{h}_t = \mathrm{LSTM}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad \overleftarrow{h}_t = \mathrm{LSTM}\left(x_t, \overleftarrow{h}_{t+1}\right) \tag{1}$$

where $x_t$ is the input at moment $t$, $\overrightarrow{h}_{t-1}$ is the forward LSTM hidden state at moment $t-1$, and $\overleftarrow{h}_{t+1}$ is the backward LSTM hidden state at moment $t+1$. The output of the BiLSTM network combines the two hidden states, forming the overall hidden state $h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$.

Autoencoders (AE)
Autoencoders (AE) are an important deep learning architecture in the field of unsupervised learning [15]. They can automatically learn effective features from large amounts of unlabeled data. Effective compression and reconstruction of data are achieved through two parts, the encoder and the decoder; the main structure is shown in Figure 1. This structure can learn low-dimensional feature representations from high-dimensional data, achieving dimensionality reduction and feature learning, and a denoising autoencoder variant can additionally improve robustness to noise. These properties make autoencoders an efficient tool for data dimensionality reduction, feature extraction, denoising, and related tasks. The autoencoder consists of two functions, the encoder and the decoder, denoted $\phi$ and $\psi$, respectively. The goal of the autoencoder is to approximate the input data by minimizing a cost function, as shown in Equation (2):

$$\phi, \psi = \arg\min_{\phi,\psi} \left\| X - (\psi \circ \phi)(X) \right\|^2 \tag{2}$$

In the encoding phase, the autoencoder maps the input vector $X$ to the intermediate vector $V$ through Equation (3):

$$V = \sigma(WX + b) \tag{3}$$

where $W$ is the weight matrix, $b$ is the bias vector, and $\sigma$ is the activation function of the encoding layer.

In the decoding stage, the autoencoder maps the intermediate vector $V \in \mathbb{R}^n$ to the output vector $\hat{X} \in \mathbb{R}^m$ by Equation (4), where $m > n$:

$$\hat{X} = \hat{\sigma}\left(\hat{W}V + \hat{b}\right) \tag{4}$$

The reconstruction error between the reconstructed vector $\hat{X}$ and the original input vector $X$ is then calculated using Equation (5):

$$E = \left\| X - \hat{X} \right\|^2 \tag{5}$$

where $\hat{W}$ is the weight matrix of the decoder, $\hat{b}$ is the bias vector, and $\hat{\sigma}$ is the activation function of the decoding layer. The goal of the autoencoder is to minimize this reconstruction error, which is achieved by adjusting the weight matrices $W$ and $\hat{W}$ and the bias vectors $b$ and $\hat{b}$ by means of gradient descent and backpropagation.

Data Cleaning
This study utilizes time series data collected from 13 wells at real drilling sites. The data collection interval is 5 s, comprising over 300,000 normal state data points and more than 5000 lost circulation state data points. A review of the raw data revealed anomalies and missing values. Anomalies can distort data trends, and missing values can produce invalid data formats; both can interfere with the construction and training of intelligent models. An improved sliding window 3σ method was used to filter out anomalies. By setting an appropriate sliding window size, this method avoids misjudging normal data as anomalous. Detected anomalies were replaced with missing values, and spline interpolation was then used to fill both the originally missing values and those generated by anomaly replacement. This approach preserves the overall trend of the data and performs well even over stretches of consecutive missing points.
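The cleaning pipeline above can be sketched in simplified form. The window size here is an illustrative assumption, and linear interpolation stands in for the spline interpolation used in the paper:

```python
from statistics import mean, stdev

def sliding_3sigma_clean(series, window=50):
    """Flag points outside mean ± 3σ of a trailing window, then fill
    the flagged points by interpolating between valid neighbours
    (a simple stand-in for the paper's spline interpolation)."""
    cleaned = list(series)
    flagged = []
    for i in range(len(series)):
        win = series[max(0, i - window):i]
        if len(win) < 2:
            continue  # not enough history to estimate the spread
        m, s = mean(win), stdev(win)
        if s > 0 and abs(series[i] - m) > 3 * s:
            cleaned[i] = None  # treat the anomaly as a missing value
            flagged.append(i)
    # fill each missing point from its nearest valid neighbours
    for i in flagged:
        left = next((j for j in range(i - 1, -1, -1) if cleaned[j] is not None), None)
        right = next((j for j in range(i + 1, len(cleaned)) if cleaned[j] is not None), None)
        if left is not None and right is not None:
            t = (i - left) / (right - left)
            cleaned[i] = cleaned[left] + t * (cleaned[right] - cleaned[left])
    return cleaned, flagged
```

Replacing anomalies with missing values first, then filling all gaps in one pass, mirrors the two-stage procedure described above.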
To further reduce noise and improve data quality, a smoothing filter was applied to the time series data. Each smoothed point is the average of the current point and the previous 99 data points (100 points in total). The aim is to reduce random fluctuations in the data, facilitating subsequent calculation, analysis, and visualization. Additionally, differences in dimension between parameters can affect data analysis and model training, so the min-max normalization method was used to render the data dimensionless, which helps shorten model training time and reduce overfitting [16]. The relevant calculation is shown in Equation (6):

$$x_{\mathrm{scale}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{6}$$

where $x$ is the data sample, $x_{\mathrm{scale}}$ is the normalized value of $x$, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the variable $x$.
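The 100-point trailing average and the min-max normalization of Equation (6) can be sketched as follows; the function names are illustrative:

```python
def trailing_mean(series, window=100):
    """Smooth each point as the mean of itself and the previous
    window-1 points (fewer at the start of the series)."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def min_max_scale(series):
    """Min-max normalization: x_scale = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(series), max(series)
    return [(x - x_min) / (x_max - x_min) for x in series]
```

Smoothing is applied before normalization so the scaling range reflects the denoised signal rather than residual spikes.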

Feature Selection
To reduce irrelevant parameter inputs and computational redundancy, the model's input features must be optimized. This study employs the Spearman correlation coefficient method [17] combined with knowledge of lost circulation identification mechanisms for feature selection. The Spearman correlation coefficient effectively measures nonlinear relationships between variables because it is based on rankings rather than raw values and makes few assumptions about the data distribution. Since the relationships among parameters in lost circulation monitoring are usually nonlinear, the Spearman coefficient can more accurately reflect these complex relationships, optimizing model input features and improving model accuracy and robustness. First, the Spearman correlation coefficient is calculated between each parameter and the lost circulation labels, optimizing the feature parameters from a data perspective. A coefficient greater than 0 indicates a positive correlation and one less than 0 a negative correlation; the closer the value is to 1 or −1, the stronger the correlation, while a value of 0 indicates no correlation between the two variables. The calculation is shown in Equation (7):

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{7}$$

where $r_s$ is the correlation coefficient between the two variables, $n$ is the number of data points, and $d_i$ is the rank difference between the $i$-th data points of the two variables. The absolute value of the coefficient is taken; the larger this value, the stronger the correlation. The correlation between the feature parameters and the lost circulation labels is shown in Figure 2. As the figure shows, parameters such as standpipe pressure, total pit volume, outlet flow rate, weight on bit, and torque correlate strongly with the lost circulation labels.
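Equation (7) can be sketched directly; this form assumes no tied ranks, which the rank-difference formula itself presumes:

```python
def spearman(x, y):
    """Spearman rank correlation via r_s = 1 - 6*sum(d_i^2) / (n(n^2-1)).
    Assumes no ties; with ties, Pearson correlation on the ranks is needed."""
    n = len(x)

    def ranks(v):
        order = sorted(range(n), key=lambda i: v[i])
        r = [0] * n
        for rank, i in enumerate(order, start=1):
            r[i] = rank  # rank 1 = smallest value
        return r

    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Because only ranks enter the formula, a monotonic but nonlinear relationship (e.g., a quadratic) still yields a coefficient of 1, which is why the method suits the nonlinear parameter relationships described above.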
Considering that the core of accurately identifying lost circulation is monitoring trends in the sequential variation of parameters, this study analyzes the dynamic parameter changes in multiple typical lost circulation cases [18]. Data visualization, shown in Figure 3, reveals that across different lost circulation cases, three parameters (outlet flow rate, total pit volume, and standpipe pressure) exhibit a pronounced downward trend and form a consistent pattern of change. In contrast, the variation patterns of parameters such as weight on bit and torque depend more on the specifics of each case. When experts at the drilling site assess lost circulation risks, they likewise focus on changes in outlet flow rate, total pit volume, and standpipe pressure. When lost circulation occurs, drilling fluid flows from the wellbore into the formation through the loss path, slowing the return of drilling fluid in the annulus. This reduces annular friction and consequently lowers the standpipe pressure [19]. Additionally, if the inlet flow rate does not change significantly, the outlet flow rate decreases. Finally, as some drilling fluid flows into the formation, less fluid returns to the surface, reducing the total pit volume [20].
Therefore, considering both the correlation of parameters and the mechanistic knowledge, this study uses total pit volume (TPIT), outlet flow rate (MFO), and standpipe pressure (SPP) as the input parameters for the subsequent intelligent model.

Time Series Sample Construction
In lost circulation monitoring, normal state data vastly outnumber lost circulation state data. This imbalance limits the training of supervised learning algorithms, which rely on large numbers of well-labeled positive and negative samples. Unsupervised learning algorithms, by contrast, require only normal state data for training and can therefore overcome the data imbalance.
To meet the input requirements of the model and conform to the identification mechanism of lost circulation risks, the model input must be a continuous data sequence over a period of time rather than independent samples at single time points. This study uses the sliding window method to construct fixed-length time series samples from the normal state data in the time series dataset, as illustrated in Figure 4. The sliding step of the window is set to 1, moving one time step at a time. The window length is set to 30 time steps; with a data collection interval of 5 s, each time series sample spans 150 s. For each window position, data from 30 consecutive time points starting at the current step are extracted to form one sample. This setup ensures that each sample contains enough information to capture parameter trends over time.
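The sliding-window sample construction above can be sketched as:

```python
def make_windows(series, window=30, step=1):
    """Slide a fixed-length window over the series with the given
    stride, producing overlapping time series samples."""
    return [series[i:i + window]
            for i in range(0, len(series) - window + 1, step)]
```

With step 1, consecutive samples overlap in 29 of their 30 points, so the model sees every possible 150 s segment of the normal data.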

Evaluation Metrics
An important part of applying machine learning methods to engineering problems is evaluating the performance of the algorithm models. This study uses four evaluation metrics to assess how well different models identify lost circulation risks: accuracy, recall, missing alarm rate, and false alarm rate [21]. The calculation formulas are shown in Table 1 and Equations (8)-(11). TP is the number of true positive samples predicted as positive; FN is the number of true positive samples predicted as negative; FP is the number of true negative samples predicted as positive; TN is the number of true negative samples predicted as negative.
Accuracy: Equation (8) is the ratio of correctly predicted samples to the total number of predicted samples:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

Recall: Equation (9) is the ratio of correctly predicted positive samples to actual positive samples:

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$

MissingAlarm: Equation (10) is the proportion of actual lost circulation samples predicted as non-lost circulation, i.e., the proportion of lost circulation samples that are missed:

$$\mathrm{MissingAlarm} = \frac{FN}{TP + FN} \tag{10}$$

FalseAlarm: Equation (11) is the proportion of samples predicted as lost circulation that are actually non-lost circulation, i.e., the ratio of false positives within the total samples predicted as lost circulation:

$$\mathrm{FalseAlarm} = \frac{FP}{TP + FP} \tag{11}$$
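The four metrics can be computed from the confusion-matrix counts as follows; the false alarm rate here follows the "within samples predicted as lost circulation" definition given above:

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, recall, missing alarm rate, and false alarm rate from
    confusion-matrix counts (positive class = lost circulation)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)         # fraction of real losses caught
    missing_alarm = fn / (tp + fn)  # real losses that went unreported
    false_alarm = fp / (tp + fp)    # alarms that were not real losses
    return accuracy, recall, missing_alarm, false_alarm
```

Note that recall and missing alarm rate are complementary: every actual loss is either caught or missed, so the two always sum to 1.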

Model Structure
The structure of the proposed BiLSTM-AE model is shown in Figure 5. The model consists of two parts, an encoder and a decoder, and uses BiLSTM networks for both.
First, the input layer receives the time series data, which then passes through an encoder containing a BiLSTM layer. Next, a fully connected layer maps the encoded vectors to a low-dimensional encoding. The encoded data are then expanded through a repeat vector layer to match the input requirements of the decoder.
The decoder consists of two BiLSTM layers, which aim to reconstruct the original input sequence into a high-dimensional space, thus completing the process of data encoding compression and decoding reconstruction.

Parameter Settings
In the proposed BiLSTM-AE model, there are five layers in addition to the input and output layers. The input layer consists of three neurons. The first BiLSTM layer processes each time step with 32 units and returns the full sequence, outputting a 32-dimensional vector per time step. The second LSTM layer processes the sequence with four units and returns only the output of the last time step. The third layer, a repeat vector layer, repeats the output of the second layer across the entire sequence length so that its dimension matches the sequence length. The fourth BiLSTM layer processes the input from the third layer and returns an eight-dimensional vector sequence. The fifth layer is an LSTM layer that processes the output of the fourth layer and returns a 16-dimensional vector sequence.
The output layer is a TimeDistributed dense layer that applies a Dense layer at each time step of the fifth layer, outputting vectors that match the number of input features. The loss function used during training is the mean absolute error (MAE), the average of the absolute differences between predicted and true values, as shown in Equation (12):

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \bar{y}_i \right| \tag{12}$$

1. For each sample $i$, calculate the absolute error between the true value $y_i$ and the predicted value $\bar{y}_i$, that is, $|y_i - \bar{y}_i|$.
2. Sum the absolute errors over all samples.
3. Divide the total error by the number of samples $n$ to obtain the mean absolute error.
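The three steps above amount to a one-line computation:

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i| over all samples."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
```

During inference, the same function applied per window yields the reconstruction error that is later compared against the alarm threshold.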
The Adam optimizer was used with a learning rate of 0.001, 50 epochs, and a batch size of 32. The specific model parameters are shown in Table 2.

Parameter Analysis
The time step length of the time series samples is crucial to the model's performance. An appropriate time step length can improve accuracy and reduce the risk of overfitting or underfitting. This study conducted comparative experiments with different time step lengths; the results are shown in Figure 7. With a time step length of 30, the model achieves its highest accuracy and recall on the test set, at 92% and 93%, respectively. With smaller or larger time step lengths, performance decreases to varying degrees. This indicates that overall model performance is optimal at a time step length of 30, yielding the fewest missed and false alarms. The selection of the reconstruction error threshold in the unsupervised model determines its final performance, requiring a balance between reducing false alarms and avoiding missed alarms. After repeated experiments and validation, this study selected a reconstruction error threshold of 1.041. The model's performance at different thresholds is shown in Figure 8. As the figure shows, performance changes significantly as the threshold is adjusted: as the threshold increases, accuracy generally increases, false alarm rates decrease, and missed alarm rates increase, indicating that raising the threshold helps reduce false alarms but makes the model prone to missed alarms. At a threshold of 1.041, the model achieves its highest accuracy of 92.51%, with a false alarm rate of 7.71% and a missed alarm rate of 6.87%, showing that this threshold balances missed and false alarms while maintaining high accuracy and good overall performance.
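Once the threshold is fixed, flagging a window as risky reduces to a simple comparison of its reconstruction error against that threshold; the error values in the example below are illustrative, while the default threshold is the 1.041 selected above:

```python
def flag_risks(reconstruction_errors, threshold=1.041):
    """Label each window as lost circulation risk (1) when its
    reconstruction error exceeds the threshold, else normal (0)."""
    return [1 if e > threshold else 0 for e in reconstruction_errors]
```

Raising `threshold` shifts windows from the risk class to the normal class, which is exactly the false-alarm/missed-alarm trade-off discussed above.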

Algorithm Performance Comparison
To further evaluate the performance of the BiLSTM-AE model, this paper constructs five comparison models: Random Forest, XGBoost, LSTM, BiLSTM, and LSTM-AE. These are classic intelligent models in the lost circulation field and have demonstrated strong performance in lost circulation monitoring. Random Forest and XGBoost are non-sequential models, while LSTM, BiLSTM, and LSTM-AE are sequential models. The confusion matrices of the different models on the test set are shown in Figure 9, and the performance comparison in Figure 10. The figures show that sequential models outperform non-sequential models overall, with higher accuracy. Among the sequential models, the unsupervised LSTM-AE and BiLSTM-AE outperform the supervised models in accuracy, recall, missing alarm rate, and false alarm rate. The BiLSTM-AE model performs best, with an accuracy of 92.51%, a missing alarm rate of 6.87%, a false alarm rate of 7.71%, and a recall of 93.13%. This comparison demonstrates the superior performance of the BiLSTM-AE model in lost circulation risk monitoring, highlighting the rationality and applicability of unsupervised algorithms and of accounting for temporal fluctuation characteristics in risk monitoring.

Case Analysis
A typical lost circulation case from an oil field was selected to visually demonstrate the effectiveness of the BiLSTM-AE model in lost circulation monitoring. The monitoring results of the different models are shown in Figure 11. The drilling site reported the occurrence of lost circulation only after the parameters had changed drastically. The LSTM model's alarm was slightly earlier than the reported time, while the BiLSTM-AE model issued an alarm at the early stage of the slow parameter decline. The BiLSTM-AE alarm time was close to the corrected actual onset of lost circulation and about 2 min earlier than the reported time. This highlights the rationality and timeliness of the proposed model, which is significant for timely plugging and for reducing drilling fluid leakage.

Conclusions
In this paper, a lost circulation monitoring model based on an unsupervised time series autoencoder (BiLSTM-AE) is proposed. This model comprehensively extracts contextual information from time series data. The experimental results show that the model achieves an accuracy of 92.51% on the test set, demonstrating its high accuracy and reliability in practical applications. The relevant patterns and insights obtained during the research can be summarized as follows:
1. Lost circulation exhibits significant temporal evolution characteristics. Compared to non-sequential algorithms, sequential algorithms that consider the temporal variation of sequence data perform better in lost circulation monitoring and diagnosis.
2. Existing data issues severely limit the effectiveness of supervised intelligent algorithms. The proposed unsupervised model effectively overcomes these limitations. One important advantage is that the model's training set consists only of normal drilling time series data, so the model detects anomalies by identifying deviations from normal patterns, showing flexibility and adaptability in handling lost circulation risks.
The data used in this study come primarily from normal drilling processes, with relatively little lost circulation data. Although unsupervised models can partially overcome this issue, the diversity and representativeness of the data remain essential factors affecting the model's generalization ability. In future research, we will further improve and optimize the model to meet lost circulation monitoring needs under different complex working conditions.

Figure 2 .
Figure 2. Feature Selection Using Spearman Correlation Coefficient Method.

Figure 3 .
Figure 3. Key Parameter Trends After Lost Circulation.

Figure 4 .
Figure 4. Constructing Sample Methods Using a Sliding Window.

Figure 6
Figure 6 shows the change in the training loss over epochs. During the first 10 epochs, the training loss decreases rapidly, indicating that the model learns a substantial amount of information in this period. As training continues, the loss converges to 0.0188 and gradually stabilizes, showing no significant signs of overfitting or underfitting.

Figure 7 .
Figure 7. Model Performance and Time Step Length.

Figure 8 .
Figure 8. Model Performance and Threshold Selection.

Figure 11 .
Figure 11. Model Prediction and Lost Circulation Labels.

Table 1 .
True Result vs. Forecast Result.