2.1. Data Preprocessing
The data utilized in this study consists of comprehensive well logging records collected from a representative drilling block, covering 45 candidate monitoring parameters. The dataset comprises records from 10 wells, with a total of 42,814 time-series samples. Among these, 829 samples are labeled as gas kick events, while 41,985 samples correspond to normal drilling conditions, resulting in a highly imbalanced dataset with approximately 1.94% positive samples.
To evaluate model generalization under realistic conditions, the dataset was divided at the well level, with data from 7 wells used for training and the remaining 3 wells reserved for testing. This well-based split strategy prevents data leakage across wells and better reflects real-world deployment scenarios. To address the class imbalance issue during model training, a down-sampling strategy was applied to the training set, where an equal number of normal samples were randomly selected to match the number of gas kick samples, while the test set retained the original data distribution.
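The well-level split and class-balancing strategy described above can be sketched as follows (a minimal illustration in pure Python; the field names `well` and `label` and the dict-based sample layout are assumptions, not the study's actual data schema):

```python
import random

def well_based_split_and_downsample(samples, train_wells, seed=0):
    """Split samples by well, then balance the training set by randomly
    down-sampling the normal class to match the number of kick samples.
    samples: list of dicts with keys 'well' and 'label' (1 = gas kick).
    The test set keeps its original, imbalanced distribution."""
    train = [s for s in samples if s["well"] in train_wells]
    test = [s for s in samples if s["well"] not in train_wells]

    kicks = [s for s in train if s["label"] == 1]
    normals = [s for s in train if s["label"] == 0]

    # Randomly select as many normal samples as there are kick samples.
    rng = random.Random(seed)
    normals_ds = rng.sample(normals, k=min(len(kicks), len(normals)))
    return kicks + normals_ds, test
```

Because the split happens before down-sampling, no record from a test well can influence training, which is the leakage guarantee the well-based strategy provides.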
To ensure data reliability and modeling robustness, a systematic data preprocessing pipeline was established, including outlier detection, data cleaning and imputation, feature correlation analysis, and gas kick labeling. Specifically, abnormal values were first identified and removed using statistical criteria, followed by feature-level preprocessing to mitigate noise and missing information. Subsequently, correlation-based analysis combined with drilling engineering knowledge was employed to select key monitoring parameters.
Finally, gas kick labels were determined based on multiple sources of operational evidence to ensure reliability. Confirmed gas kick intervals were identified through the integration of drilling operation logs, field engineer reports, and mud logging records that documented abnormal hydrocarbon readings. These operational records were further verified using engineering diagnostic indicators, including unexplained increases in outlet flow rate, pit volume gain, and decreases in standpipe pressure. Only intervals consistently confirmed across these cross-validated sources were labeled as gas kick events. This strategy ensures that the dataset reflects real operational occurrences of gas kicks rather than purely threshold-based automatic labeling. For the purpose of time-series modeling, the identified gas kick intervals were mapped onto the corresponding time stamps of the drilling sensor data, and samples within these intervals were labeled as kick events, while the remaining samples were labeled as normal drilling states.
The overall data preprocessing workflow is illustrated in Figure 2.
The box plot method, also known as Tukey’s boxplot, is a widely used statistical tool for detecting outliers in univariate data. It visualizes the data distribution through five key summary statistics: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. Outliers are typically defined as data points that fall outside the range of 1.5 times the interquartile range (IQR) from either the upper or lower quartile. Specifically, any value less than Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR is considered an outlier. This method provides an intuitive and efficient approach for identifying extreme deviations in the dataset, thereby ensuring data quality and reliability for subsequent modeling and analysis.
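Tukey's rule as described above can be implemented directly (a self-contained sketch; the linear-interpolation quantile used here is one common convention, and other quartile definitions shift the fences slightly):

```python
def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's boxplot rule).
    Returns a boolean mask aligned with the input list."""
    xs = sorted(values)

    def quantile(p):
        # Linear interpolation between the two closest order statistics.
        idx = p * (len(xs) - 1)
        lo, hi = int(idx), min(int(idx) + 1, len(xs) - 1)
        frac = idx - lo
        return xs[lo] * (1 - frac) + xs[hi] * frac

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [not (lower <= v <= upper) for v in values]
```

For a sensor channel such as outlet flow rate, the flagged points would be removed or imputed in the subsequent cleaning step.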
Correlation analysis determines the degree, direction, and strength of linear relationships between variables. It helps to understand the relationships among variables, predict trends in variable changes, select appropriate variables, and assess variable interactions, and it carries significant theoretical and practical importance. Feature selection reduces the dimensionality of the input data by retaining the most useful features from the original data, thereby eliminating redundant and irrelevant ones. Dimensionality reduction lowers the computational complexity and storage requirements of algorithms while simultaneously improving training speed and prediction accuracy.
This study conducted a correlation analysis of the data to identify the relationships between feature variables, using the Pearson correlation coefficient. The coefficient ranges from −1 to +1, with 0 indicating no correlation between two variables; a positive value indicates a positive correlation, a negative value indicates a negative correlation, and the magnitude represents the strength of the correlation [13]. The formula for calculating the correlation is shown as Equation (1).
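For reference, the standard form of the Pearson coefficient between variables $X$ and $Y$ over $n$ paired samples $(x_i, y_i)$, with sample means $\bar{x}$ and $\bar{y}$, is:

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
               \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```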
To provide a more intuitive understanding of the relationships among variables, a Pearson correlation heatmap of all numerical features is presented in Figure 3. The heatmap visually illustrates the pairwise correlations, enabling identification of strongly correlated parameter groups.
As shown in Figure 3, strong correlations can be observed among flow-related parameters (e.g., inlet and outlet flow rates), pressure-related variables (e.g., standpipe pressure and casing pressure), and pit volume measurements. In addition, hydrocarbon-related features (e.g., methane, ethane, and total hydrocarbon content) exhibit high inter-correlations, reflecting their shared physical origin during gas influx events.
In this study, four key parameters were selected for gas kick risk monitoring based on both correlation analysis results and domain knowledge: outlet flow rate, total pool volume, standpipe pressure (SPP), and total hydrocarbon content. These selections were not only guided by correlation analysis but also grounded in a detailed understanding of downhole physical mechanisms and operational experiences from drilling sites. Specifically, standpipe pressure serves as a critical indicator of pressure dynamics within the wellbore. During a gas kick event, the static fluid column is disrupted due to uncontrolled fluid influx, resulting in a decrease in bottom hole pressure (BHP), which is often accompanied by a noticeable decline in SPP. The outlet flow rate is another crucial diagnostic feature, as an unexplained increase—without a corresponding rise in the inlet flow—may signify formation fluid influx, thereby providing an early warning of a potential kick. Similarly, the total pool volume reflects surface fluid storage changes. As the influx progresses, whether from oil, water, or gas, the increase in return volume leads to a measurable rise in surface tank volume, especially during gas expansion. Lastly, the total hydrocarbon content directly captures the composition of returning fluids. During a gas invasion, gas entrained in the returning mud causes a spike in total hydrocarbon readings at the surface, which serves as an immediate and reliable signal of gas influx.
Collectively, these parameters provide complementary perspectives on wellbore stability, enabling timely and accurate identification of gas kick risks. Furthermore, the observed interdependencies among these variables provide additional justification for the use of graph-based models, which are capable of explicitly capturing such relational structures.
2.2. Graph-Based Monitoring Framework
Drilling operations are governed by strongly coupled hydraulic and thermodynamic processes, where multiple surface and downhole parameters interact dynamically. Gas kick events rarely manifest as isolated deviations in a single signal; instead, they emerge through subtle disturbance propagation across multiple correlated parameters. Conventional data-driven models generally treat input variables independently, which limits their capability to capture interaction-driven anomaly evolution. To address this limitation, this study constructs a graph-based monitoring network that explicitly models inter-parameter dependencies and their deviations under gas influx conditions.
Inspired by the idea of learning latent dependency structures among heterogeneous sensors, the proposed framework represents drilling parameters as nodes in a graph and models their relationships as learnable edges. Let the multivariate time-series input be denoted as X ∈ ℝ ^ (T × N), where T is the sliding window length and N is the number of selected parameters. Each parameter corresponds to a graph node whose feature representation is constructed from its temporal sequence within the window. This design enables the network to capture temporal patterns preceding gas kick events while preserving parameter-level structural relationships.
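The window-to-node mapping can be sketched as follows (pure Python, illustrative only; the dict-of-lists input layout and function name are assumptions, and the actual pipeline may normalize or embed the windows further):

```python
def sliding_window_node_features(series, window):
    """series: dict {param_name: [v_0, v_1, ...]} of equal-length signals.
    For each time step, build a node-feature matrix with one row per
    parameter (graph node) holding its last `window` values, i.e. the
    transpose of the X in R^(T x N) described in the text."""
    names = sorted(series)
    length = len(series[names[0]])
    windows = []
    for t in range(window, length + 1):
        # One row per node: that parameter's temporal window.
        x = [series[name][t - window:t] for name in names]
        windows.append(x)
    return names, windows
```

Each matrix in `windows` is one graph input: N nodes, each carrying a length-T temporal feature vector, so patterns preceding a kick are preserved per parameter.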
Unlike conventional graph neural networks that rely on predefined adjacency matrices, the interaction structure in this work is learned dynamically through an attention mechanism. This is particularly suitable for drilling environments where parameter dependencies vary with formation characteristics, operational conditions, and drilling stages. The adaptive graph therefore reflects data-driven interaction patterns rather than static assumptions derived solely from prior physical knowledge.
Information propagation across the graph is achieved through attention-based message passing. For each target node, representations from neighboring nodes are aggregated according to learned attention coefficients that quantify interaction importance. Specifically, the representation of node i at layer l + 1 is obtained by applying a nonlinear transformation to the weighted combination of neighboring node representations at layer l, where the weights are normalized attention scores derived from node embeddings. This mechanism allows the network to capture heterogeneous coupling relationships among drilling parameters and dynamically adjust interaction strengths across operating conditions.
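One common instantiation of this update is the GAT-style rule below; the exact scoring function is an assumption here, since the text specifies only "normalized attention scores derived from node embeddings":

```latex
h_i^{(l+1)} = \sigma\!\left( \sum_{j \in \mathcal{N}(i) \cup \{i\}}
  \alpha_{ij}^{(l)} \, W^{(l)} h_j^{(l)} \right),
\qquad
\alpha_{ij}^{(l)} =
\frac{\exp\!\left( \mathrm{LeakyReLU}\!\left( a^{\top}
  \big[\, W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_j^{(l)} \,\big] \right) \right)}
     {\sum_{k \in \mathcal{N}(i) \cup \{i\}}
  \exp\!\left( \mathrm{LeakyReLU}\!\left( a^{\top}
  \big[\, W^{(l)} h_i^{(l)} \,\|\, W^{(l)} h_k^{(l)} \,\big] \right) \right)}
```

Here $h_i^{(l)}$ is the embedding of node $i$ at layer $l$, $W^{(l)}$ and $a$ are learned parameters, $\|$ denotes concatenation, and $\sigma$ is a nonlinearity; the coefficients $\alpha_{ij}$ are the learned interaction strengths discussed above.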
To tailor the model for gas kick monitoring, several task-oriented adaptations are introduced. First, temporal sliding-window representations are incorporated at the node level to emphasize gradual deviation evolution rather than instantaneous fluctuations. Second, multiple graph neural layers are stacked to capture higher-order dependency propagation, reflecting cascading disturbance transmission among flow rate, pressure, density, and pit volume signals during gas influx development. Third, learned node embeddings are projected into a compact latent space and processed through fully connected layers to produce gas kick risk predictions. By modeling deviations from learned interaction patterns, the framework effectively distinguishes early-stage gas kick behavior from normal drilling variability.
An additional advantage of the proposed graph attention mechanism lies in its interpretability potential. The learned attention coefficients indicate the relative influence among parameters during prediction, providing insights into dominant interaction pathways associated with abnormal events. This interaction-level interpretability complements feature attribution analysis conducted separately using SHAP, enabling multi-perspective understanding of model decision behavior.
Figure 4 illustrates the overall modeling framework, including parameter-to-node mapping, adaptive graph learning, attention-based information propagation, and prediction generation.
Drilling time-series measurements are encoded as node features through sliding-window processing and mapped into a learnable dependency graph where parameters are treated as interconnected nodes. Attention-driven graph propagation captures heterogeneous interactions and deviation patterns among parameters, enabling robust representation learning under varying operational conditions. The resulting embeddings are processed by fully connected layers to produce risk predictions, while the attention weights offer insights into influential parameter interactions, supporting interpretability analysis.
2.3. Representation Enhancement
2.3.1. Generative Adversarial Network
Generative Adversarial Networks (GANs), first proposed by Goodfellow in 2014, are a class of neural network models based on adversarial learning mechanisms [14]. They are widely used for data generation tasks, including images, speech, and text. As shown in Figure 5, a GAN consists of two core components: a generator and a discriminator. The generator G maps random noise into synthetic samples, while the discriminator D evaluates whether the input data are real or generated, and provides feedback to guide the training of G.
Through this adversarial interaction, the generator progressively improves its ability to approximate the underlying data distribution, whereas the discriminator simultaneously enhances its capability to distinguish real samples from generated ones. With iterative training, this competitive process drives the generated data to become increasingly similar to real data, enabling high-quality sample generation [15].
To address common challenges in GAN training, such as instability, gradient explosion, and convergence issues, a variety of improved variants have been developed, including DCGAN [16], Wasserstein-GAN [17], and LSGAN [18].
In this study, the Conditional Tabular GAN (CTGAN) proposed by Xu et al. [19] is adopted as the data generation method. CTGAN is specifically developed for tabular data synthesis under conditional constraints. While it retains the generator–discriminator framework of conventional GANs, it incorporates conditional information to guide the data generation process. Owing to its flexibility, controllability, and capability of producing high-quality samples, CTGAN is well suited for handling diverse tabular data generation scenarios.
The generator in CTGAN is implemented as a feedforward neural network composed of multiple fully connected layers, where conditional normalization is applied to incorporate the influence of given conditions into intermediate feature representations. The output layer adopts a sigmoid activation to constrain generated values within a valid range, thereby producing samples consistent with the underlying data distribution. In this framework, the generator learns to map latent noise vectors, together with conditional inputs, into structured tabular samples that satisfy specified constraints.
The discriminator is designed as a binary classification network, also constructed with stacked fully connected layers. LeakyReLU activation functions are employed to improve nonlinearity and training stability. Its primary function is to distinguish real samples from those synthesized by the generator. In addition, residual connections are introduced to facilitate gradient propagation and accelerate convergence.
From an architectural perspective, CTGAN is parameterized by a generator–discriminator pair tailored for tabular data synthesis. The generator leverages residual-style blocks with fully connected transformations and normalization to enhance feature representation capability. In contrast, the discriminator integrates linear layers, LeakyReLU activations, and dropout regularization to mitigate overfitting. Furthermore, a gradient penalty (GP) strategy is incorporated to stabilize training. This is achieved by interpolating between real and generated samples, computing gradients on these interpolations, and enforcing a Lipschitz constraint through a penalty term.
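The gradient penalty described above follows the WGAN-GP formulation; in standard notation, with $\lambda$ the penalty coefficient and $\hat{x}$ a random interpolation between a real and a generated sample:

```latex
\mathcal{L}_{\mathrm{GP}} = \lambda \,
  \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
  \left[ \left( \left\lVert \nabla_{\hat{x}} D(\hat{x}) \right\rVert_2 - 1 \right)^2 \right],
\qquad
\hat{x} = \epsilon \, x_{\mathrm{real}} + (1 - \epsilon)\, x_{\mathrm{fake}},
\quad \epsilon \sim U[0, 1]
```

Penalizing deviations of the gradient norm from 1 enforces the Lipschitz constraint softly, which is what stabilizes discriminator training.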
In this study, the conditional variable corresponds to the gas kick label (i.e., normal vs. gas kick), which guides the generation of class-specific samples. The learning rate for both the generator and discriminator is set to 1 × 10−5, with 500 training iterations; these hyperparameters can be adjusted through experimentation.
To avoid potential information leakage, the dataset was first divided into training and testing sets before any data augmentation was performed. The CTGAN model was trained exclusively using the training data, and all synthetic samples were generated solely from the training set distribution. These generated samples were used only to augment the training dataset. The test set remained completely unseen during the augmentation process, ensuring a fair and unbiased evaluation.
Visualizations of the original and generated data are shown in Figure 6, for the total pool volume under normal drilling conditions and for the total hydrocarbon parameter during gas kick events. The generated samples were qualitatively compared with the real data distributions and show consistent trends in key parameters, indicating that the synthetic data reasonably preserves the underlying data characteristics.
Risk-related time series data are often limited in volume, which can lead to model overfitting or an inability to capture complex data patterns. By generating a substantial amount of synthetic data, CTGAN can expand small-sample time series datasets, helping the model better learn the underlying data distribution and patterns, improving its transfer and generalization abilities, and effectively reducing errors on the test set.
2.3.2. Shapelet Transformation
Shapelet transformation is a powerful and interpretable feature extraction method specifically designed for time series data. Unlike traditional feature engineering, which often relies on global statistics or domain knowledge, shapelet transformation focuses on identifying discriminative subsequences, known as shapelets, that are most representative of differences between classes or states in the data [20,21]. These shapelets capture localized temporal patterns that are critical for classification, anomaly detection, and pattern recognition tasks.
The core idea is to transform raw time series into a feature space where each feature represents the minimum distance between a time series instance and a specific shapelet. This enables traditional machine learning classifiers (e.g., decision trees, SVM, random forests) to operate effectively on time series data.
The transformation process involves the following steps:
Shapelet Discovery: The algorithm initially performs an exhaustive or heuristic scan across all possible subsequences of the time series to identify candidate shapelets. A scoring function—such as information gain, F-statistic, or accuracy improvement—is used to evaluate the discriminative power of each candidate. The top-k shapelets are selected and stored based on their performance.
Shapelet Selection: Since storing all candidate shapelets may introduce redundancy and increase computational burden, a cross-validation strategy is applied to determine the optimal subset of shapelets to be used in the transformed dataset. This ensures both efficiency and generalization of the resulting features.
Feature Transformation: Each time series in the dataset is transformed into a new feature vector, where each dimension corresponds to the shortest distance between the series and one of the selected shapelets. As a result, the original sequential data is projected into a static and fixed-length feature space.
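The feature transformation step above reduces to computing minimum subsequence distances, which can be sketched in a few lines (pure Python, illustrative; shapelet discovery and selection are assumed to have already produced the `shapelets` list):

```python
def min_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and every
    same-length subsequence of the series."""
    m = len(shapelet)
    best = float("inf")
    for start in range(len(series) - m + 1):
        d = sum((series[start + k] - shapelet[k]) ** 2 for k in range(m)) ** 0.5
        best = min(best, d)
    return best

def shapelet_transform(dataset, shapelets):
    """Project each time series onto a fixed-length feature vector:
    one min-distance per selected shapelet."""
    return [[min_distance(s, sh) for sh in shapelets] for s in dataset]
```

The output is a static feature matrix (rows = series, columns = shapelets), which any off-the-shelf classifier can consume, as noted below.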
The major advantage of this transformation is that it decouples the process of shapelet learning from model training, offering flexibility to use any off-the-shelf classifier. Moreover, the resulting shapelet-based features are often highly interpretable, as they highlight meaningful time series segments that are strongly associated with specific outcomes or labels.
As illustrated in Figure 7, two example shapelets extracted from a specific dataset demonstrate how localized time series segments can capture critical discriminative features. These shapelets are not only informative for classification tasks but also help reveal underlying physical or behavioral patterns embedded in the temporal data.
2.4. Interpretability Analysis
Accurate prediction alone is insufficient for intelligent monitoring systems deployed in safety-critical drilling operations. Engineers must understand the reasoning behind model outputs in order to validate alarms against physical mechanisms and build operational trust. Gas kick detection involves complex multi-parameter interactions, and purely black-box predictions hinder diagnosis, model validation, and adoption in field environments. Therefore, interpretability is incorporated as an integral component of the proposed framework.
To achieve transparent decision inspection, a multi-level interpretability strategy is developed that analyzes both individual feature contributions and structural interaction patterns. Feature-level attribution is conducted using SHAP, quantifying how each parameter influences prediction outcomes. Structural-level interpretation is obtained by examining attention weights learned within the graph neural network, revealing interaction pathways that govern anomaly propagation. A unified visualization summarizing both perspectives is presented in Figure 8.
As illustrated in Figure 8, the interpretability module links model predictions to both measurable physical indicators and learned parameter dependencies, enabling engineers to evaluate decision consistency from complementary viewpoints.
The left panel presents SHAP-based feature attribution, illustrating the global contribution distribution of drilling parameters influencing gas kick prediction. The right panel visualizes attention-derived interaction dependencies learned by the graph neural model, highlighting dominant parameter coupling pathways during anomaly characterization. Together, these perspectives provide complementary transparency at both feature and structural levels, supporting engineering validation and trustworthy deployment.
2.4.1. Feature Contribution Analysis Using SHAP
SHapley Additive exPlanations (SHAP) are employed to quantify the influence of individual drilling parameters on gas kick prediction. Rooted in cooperative game theory, SHAP evaluates feature contributions by computing their marginal effect on the model output across all possible feature coalitions. This framework provides consistent and theoretically grounded attribution while allowing direct comparison across heterogeneous measurements.
In this study, SHAP values are calculated for the trained monitoring model to perform both global and local interpretability analysis. Global attribution identifies parameters that most strongly affect prediction behavior across the dataset, enabling comparison with engineering knowledge regarding pressure dynamics, flow imbalance, fluid return variation, and hydrocarbon indicators. Local attribution examines individual alarm events, highlighting which features drive model responses in specific operational scenarios.
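The coalition-based attribution underlying SHAP can be made concrete with an exact Shapley computation for a small model (an illustrative sketch, not the study's implementation; practical SHAP libraries approximate this for high-dimensional models):

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values: for each feature i, average its marginal
    contribution over every coalition S of the remaining features,
    substituting baseline values for absent features."""
    n = len(x)

    def predict(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(len(rest) + 1):
            for S in combinations(rest, size):
                S = set(S)
                # Shapley kernel weight: |S|! (n-|S|-1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (predict(S | {i}) - predict(S))
    return phi
```

A useful sanity check is the efficiency property: the attributions sum to the difference between the prediction at `x` and at the baseline, which is what makes SHAP values directly comparable across heterogeneous drilling measurements.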
As shown in the left panel of Figure 8, the SHAP summary visualization reveals both the magnitude and direction of feature contributions. This representation enables validation of whether model attention aligns with physically meaningful indicators and assists in diagnosing potential bias or over-reliance on spurious correlations. Through feature-level attribution, the proposed framework improves transparency and strengthens confidence in automated monitoring outputs.
2.4.2. Interaction Interpretability via Graph Attention Mechanisms
While feature attribution reveals individual parameter effects, gas kick evolution is inherently governed by interactions among hydraulic, mechanical, and thermodynamic variables. To capture interpretability at the system level, the proposed graph neural model leverages attention mechanisms that assign adaptive importance weights to edges representing parameter dependencies.
During training, attention coefficients regulate message passing among nodes and reflect the relative influence of neighboring parameters on representation updates. Analysis of these learned coefficients enables identification of dominant interaction pathways that contribute to anomaly detection. For example, pressure–flow coupling or fluid–gas response relationships can be revealed through elevated attention intensity.
The right panel of Figure 8 visualizes the learned attention distribution as an interaction heatmap, illustrating structural dependencies discovered by the model. Unlike post hoc explanations, this interpretability is embedded within the learning process and reflects the internal reasoning of the graph representation itself. Consequently, attention-based inspection provides insights into disturbance propagation patterns and system-level deviation formation that precede observable anomalies.
By integrating SHAP attribution and attention-based structural inspection, the proposed framework establishes multi-granular interpretability across feature and interaction domains. This dual-perspective transparency facilitates engineering validation, supports trustworthy deployment, and enhances the practical applicability of graph-based gas kick monitoring in real-world drilling operations.