Remaining Useful Life Prediction and Operation Optimization of Offshore Electric Submersible Pump Systems Using a Dual-Stage Attention-Based Recurrent Neural Network

Lu, Xin; Han, Guoqing; Liu, Bin; Shangguan, Yangnan; Liang, Xingyuan

doi:10.3390/jmse14010075

by Xin Lu ¹,*, Guoqing Han ¹, Bin Liu ², Yangnan Shangguan ³ and Xingyuan Liang ¹

¹ School of Petroleum Engineering, China University of Petroleum-Beijing, Beijing 102249, China

² Research Institute of Petroleum Production, PetroChina Jidong Oilfield Company, Tangshan 063000, China

³ Research Institute of Exploration and Development, PetroChina Changqing Oilfield Company, Xi’an 710018, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(1), 75; https://doi.org/10.3390/jmse14010075

Submission received: 2 December 2025 / Revised: 26 December 2025 / Accepted: 27 December 2025 / Published: 30 December 2025

(This article belongs to the Special Issue Advances in Offshore Oil and Gas Exploration and Development)

Abstract

Electric Submersible Pumps (ESPs) serve as the primary artificial lift technology in offshore oilfields and play a crucial role in ensuring stable and efficient marine oil and gas production. However, the harsh offshore operating environment—characterized by high temperature, complex multiphase flow, and frequent load fluctuations—makes ESPs highly susceptible to accelerated degradation and unexpected failure. To enhance the operational reliability and efficiency of offshore production systems, this study develops a Remaining Useful Life (RUL) prediction method for offshore ESP systems using a Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN). The model integrates an input-attention mechanism to identify degradation-relevant offshore operating variables and a temporal-attention mechanism to capture long-term deterioration patterns in real marine production data. Using field data from a representative offshore oilfield in the Bohai Sea, the proposed method achieves an average prediction error of less than 28 days, demonstrating strong robustness under complex offshore conditions. Beyond prediction, an RUL-driven operation optimization strategy is formulated to guide controllable parameters—such as pump frequency and nozzle size—toward extending ESP lifespan and improving offshore production stability. The results show that combining predictive maintenance with operational optimization provides a practical and data-driven pathway for improving the safety, efficiency, and sustainability of offshore oil and gas development. This work aligns closely with the goals of marine resource development and offers a valuable engineering perspective for advancing offshore oilfield operations.

Keywords:

electric submersible pump; remaining useful life; DA-RNN; predictive maintenance; operation optimization

1. Introduction

Electric Submersible Pump (ESP) systems are among the most widely used artificial lift technologies in offshore oilfields [1]. By deploying a multistage centrifugal pump and a downhole motor into the well and delivering electrical power from the surface, ESPs provide high production rates and a high degree of automation. Globally, 15–20% of producing wells rely on ESPs [2]; these wells account for nearly 60% [3] of world oil production. However, continuous operation in environments characterized by high temperature, mechanical loading, and complex multiphase flow conditions makes ESPs highly susceptible to degradation and failure. Their operation and maintenance represent approximately 43% of global artificial-lift expenditure [4]; any failure may result in substantial workover costs and prolonged production shutdowns [5,6].

Remaining Useful Life (RUL) prediction for complex industrial equipment has been extensively studied, with approaches largely divided into physics-based models and data-driven models. Physics-based models describe degradation mechanisms through explicit mathematical formulations, such as crack-growth laws [7,8,9], small-crack fatigue theories [10,11], and energy-based fatigue life models [12,13], offering strong interpretability. However, for ESP systems, which feature multistage structures, nonlinear couplings, and diverse operating conditions, the construction of accurate mechanistic models remains extremely challenging. Data-driven methods have gained momentum with the advancement of digital oilfield technologies, enabling degradation behaviors to be learned directly from large-scale monitoring data without the need for detailed prior knowledge [14,15,16,17]. Nonetheless, these approaches still face key limitations, including weak adaptive feature selection, insufficient modeling of long-term temporal dependencies, limited generalization across varying operating conditions, and unstable uncertainty representation.

More critically, most existing studies focus on predicting the remaining life of equipment, while research on how to utilize these predictions to guide operational adjustments, mitigate degradation, and extend system lifespan remains scarce. In practice, field operations require not only an estimate of how long an ESP can continue running, but also actionable insights into how the system should be operated to make it run longer. Current studies in the literature provide limited guidance on how controllable parameters—such as operating frequency or choke settings—can be systematically optimized to slow degradation and enhance equipment longevity. This gap highlights an urgent need for frameworks that integrate lifespan prediction with lifespan-oriented operational decision-making.

The Dual-Stage Attention-based Recurrent Neural Network (DA-RNN) provides a promising pathway to address these challenges. Originally proposed for time series prediction, it combines an input attention mechanism that adaptively identifies the most relevant driving series with a temporal attention mechanism that highlights critical hidden states over time [18]. Applied to ESP monitoring data, the input attention selects the most degradation-relevant features, while the temporal attention highlights critical moments in the evolution of system degradation. Building on the encoder–decoder paradigm and long short-term memory (LSTM) [19] units for capturing long-range dependencies, attention-based recurrent architectures have demonstrated strong capabilities in modeling complex, multivariate temporal dynamics and extracting interpretable feature importance patterns [20,21]. Such interpretability is particularly valuable for industrial prognostics, as it reveals how different operating variables contribute to degradation progression and provides a principled basis for connecting RUL prediction with maintenance and operation decisions [22], enabling the transition from pure lifespan prediction to actionable lifespan optimization.

Building upon these considerations, this study develops an integrated framework for RUL prediction and operation optimization of ESP systems. While existing attention-based RUL prediction models mainly focus on enhancing prediction accuracy, the present framework extends RUL modeling toward operation-relevant analysis by explicitly integrating prediction results with operating-regime adjustment. A dual-stage attention-based recurrent neural network is employed to extract degradation-related features from multi-source time-series data and to generate accurate RUL predictions under complex offshore operating conditions. Based on the predicted RUL trajectories and the analysis of operating-parameter effects, an operation optimization strategy is further formulated to support lifespan-oriented adjustment of ESP operating regimes. Field data validation demonstrates that the proposed framework can effectively capture degradation trends and provide data-driven support for improving ESP reliability, operational safety, and maintenance decision-making.

The main contributions of this study are summarized as follows:

(1) An integrated framework is developed for RUL prediction and operation optimization of ESP systems, enabling degradation assessment and operating-regime adjustment to be addressed in a unified manner;
(2) A dual-stage attention-based recurrent neural network is employed not only to improve RUL prediction accuracy, but also to provide operationally meaningful interpretation of the influence of controllable operating variables on degradation evolution;
(3) An RUL-driven operation optimization strategy is proposed, in which predicted lifespan trajectories are embedded into an operation optimization process to guide lifespan-oriented adjustment of operating parameters, thereby enhancing the practical applicability of data-driven prognostics in offshore ESP management.

2. Theoretical Foundation

2.1. Remaining Useful Life of ESP Systems

The RUL of an Electric Submersible Pump system refers to the amount of time the system can continue to operate normally under its current health condition. The formal definition of RUL is given by [23]:

T_{RUL} (t) = t_{f} - t_{c}, \quad t_{f} \geq t_{c}

(1)

where T_{RUL} (t) denotes the remaining lifetime of the ESP system at time t, measured in days, t_{f} represents the failure time, and t_{c} denotes the current operating time. In this study, an ESP failure is defined as a condition in which the pump system is retrieved from the well due to confirmed functional loss or severe degradation such as motor burnout, electrical insulation failure, mechanical seizure, or other faults identified through field diagnosis and post-retrieval inspection. The failure time t_{f} is labeled based on actual field records, including workover reports and maintenance logs, which document the date of pump retrieval and confirmed failure events. Reliable RUL prediction enables proactive scheduling of preventive maintenance, reduces the probability of unexpected failures, and minimizes production downtime, ultimately improving the safety, reliability, and economic efficiency of ESP operations.

However, field operation records indicate that not all ESP retrieval events correspond to actual failures. Early pull-outs performed for preventive inspection, operational adjustment, or surface facility maintenance, as well as non-failure shutdowns caused by production planning or external constraints, are treated as censored cases in this study. Since reliable failure times cannot be defined for these events, such samples are excluded from the supervised RUL model training and evaluation. Only ESP instances with confirmed failure events and clearly documented failure times are retained to ensure the consistency and physical validity of the RUL labels. Accordingly, the RUL ground truth in this study is constructed as the time interval between the current operating time and the confirmed failure time for each retained ESP instance.
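The labeling rule above can be illustrated with a minimal sketch. The record layout, the field name `confirmed_failure`, and the well names are hypothetical and are not taken from the field dataset; the sketch only shows how censored runs are dropped and how Equation (1) turns a confirmed failure date into daily RUL labels.

```python
from datetime import date

def rul_labels(obs_dates, failure_date):
    """Return T_RUL = t_f - t_c in days for every observation date t_c <= t_f."""
    return [(failure_date - d).days for d in obs_dates if d <= failure_date]

# Hypothetical run records: 'confirmed_failure' marks a documented failure;
# preventive or planned pull-outs are treated as censored and excluded.
runs = [
    {"well": "A-1", "end": date(2023, 3, 10), "confirmed_failure": True},
    {"well": "B-2", "end": date(2023, 5, 1), "confirmed_failure": False},
]

labelled_runs = [r for r in runs if r["confirmed_failure"]]  # drop censored runs

obs_dates = [date(2023, 3, 1), date(2023, 3, 5), date(2023, 3, 10)]
labels = rul_labels(obs_dates, labelled_runs[0]["end"])
print(labels)  # [9, 5, 0]
```

Excluding censored runs keeps the labels physically meaningful, at the cost of discarding the partial information those runs carry.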

2.2. Dual-Stage Attention-Based Recurrent Neural Network (DA-RNN)

The DA-RNN is a neural architecture designed for time series prediction tasks. By integrating attention mechanisms with recurrent neural networks, DA-RNN enhances the ability to identify and extract informative patterns from multivariate input data. The network consists of an encoder and a decoder, each equipped with its own attention module.

The input-attention mechanism introduced in the encoder assigns adaptive importance weights to different driving sequences, while the temporal-attention mechanism in the decoder evaluates the relevance of encoder hidden states over time. Together, these two modules enable simultaneous feature-level and temporal-level selection within a unified encoder–decoder framework, allowing DA-RNN to extract degradation-relevant patterns from multivariate ESP operating data. The overall architecture of DA-RNN is shown in Figure 1.

The encoder is essentially a recurrent neural network (RNN) designed to map an input multivariate time series into a sequence of latent representations. For a time step t, the encoder updates its hidden state h_{t} \in \mathbb{R}^{m} according to:

h_{t} = f_{1} (h_{t - 1}, x_{t})

(2)

where m denotes the dimensionality of the hidden state, and f_{1} (\cdot) represents a nonlinear state-transition function. In this study, the long short-term memory (LSTM) architecture is adopted due to its capability to mitigate gradient vanishing and capture long-term temporal dependencies inherent in the degradation processes of ESP systems. Given an RUL-related feature sequence X = (x_{1}, x_{2}, \dots, x_{T}), where x_{t} \in \mathbb{R}^{n} and T is the total number of time steps, the update equations of an LSTM cell are expressed as:

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(3)

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(4)

{\tilde{s}}_{t} = \tanh (W_{s} \cdot [h_{t - 1}, x_{t}] + b_{s})

(5)

s_{t} = f_{t} ⊙ s_{t - 1} + i_{t} ⊙ {\tilde{s}}_{t}

(6)

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(7)

h_{t} = o_{t} ⊙ \tanh (s_{t})

(8)

where σ (\cdot) denotes the sigmoid function; W_{f}, W_{i}, W_{s}, W_{o} and b_{f}, b_{i}, b_{s}, b_{o} are the trainable weight matrices and bias terms; f_{t}, i_{t}, and o_{t} correspond to the forget, input, and output gates; {\tilde{s}}_{t} is the candidate cell state; and s_{t} is the internal memory cell that propagates long-term information.

To enable the encoder to automatically identify the most degradation-sensitive features, an input attention mechanism is incorporated. For the k-th feature at time step t, an alignment score is computed as:

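e_{t}^{k} = V_{e}^{⊤} \tanh (W_{e} [h_{t - 1}, s_{t - 1}] + U_{e} x^{k})

(9)

where x^{k} = {(x_{1}^{k}, x_{2}^{k}, \dots, x_{T}^{k})}^{⊤} denotes the k-th feature series over the input window, and V_{e}, W_{e}, and U_{e} are learnable parameters. The scores are normalized through a softmax function to obtain the attention weights:

α_{t}^{k} = \frac{\exp (e_{t}^{k})}{\sum_{i = 1}^{n} \exp (e_{t}^{i})}

(10)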

This input attention mechanism acts as a feature-level gating network that trains jointly with the rest of the DA-RNN model. It adaptively assigns higher weights to features highly correlated with ESP degradation—such as electrical load, operating pressure, or thermal stress—while suppressing noisy or weakly relevant measurements. The refined input vector is given by:

{\tilde{x}}_{t} = {(α_{t}^{1} x_{t}^{1}, α_{t}^{2} x_{t}^{2}, \dots, α_{t}^{n} x_{t}^{n})}^{⊤}

(11)

which is subsequently used in the LSTM update:

h_{t} = f_{1} (h_{t - 1}, {\tilde{x}}_{t})

(12)

By integrating attention-weighted inputs into the recurrent update, the encoder is able to focus selectively on degradation-relevant temporal patterns while filtering out irrelevant fluctuations. This design substantially enhances the ability of DA-RNN to extract robust and interpretable degradation features from multivariate ESP monitoring data, thereby strengthening the foundation for accurate RUL prediction.
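As an illustration of the input-attention step, the following sketch evaluates the alignment score, the softmax weights, and the reweighted input for a toy case. It uses plain Python lists instead of a deep learning framework, and all dimensions and parameter values are arbitrary assumptions chosen for readability; in the trained model these parameters are learned jointly with the LSTM.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def input_attention(h_prev, s_prev, series, x_t, W_e, U_e, v_e):
    """series[k] is the k-th feature over the window; x_t is the current sample."""
    state_term = matvec(W_e, h_prev + s_prev)      # W_e [h_{t-1}, s_{t-1}]
    scores = []
    for x_k in series:
        feat_term = matvec(U_e, x_k)               # U_e x^k
        hidden = [math.tanh(a + b) for a, b in zip(state_term, feat_term)]
        scores.append(sum(v * z for v, z in zip(v_e, hidden)))  # alignment score
    alpha = softmax(scores)                        # attention weights
    x_tilde = [a * x for a, x in zip(alpha, x_t)]  # reweighted input
    return alpha, x_tilde

# Toy setting (assumed): m = 1 hidden unit, T = 2 time steps, n = 2 features.
W_e = [[0.1, 0.1], [0.1, 0.1]]   # shape T x 2m
U_e = [[1.0, 0.0], [0.0, 1.0]]   # shape T x T
v_e = [1.0, 1.0]                 # length T
series = [[1.0, 1.0], [0.0, 0.0]]
alpha, x_tilde = input_attention([0.0], [0.0], series, [2.0, 3.0], W_e, U_e, v_e)
print(alpha, x_tilde)
```

With these toy parameters the first feature, whose series is non-zero, receives the larger weight, which is the gating behavior described above.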

In the decoder, an LSTM unit is employed to process the sequence of hidden states generated by the encoder and produce the prediction {\hat{y}}_{t}, representing the RUL of the ESP system at time t. However, as the input sequence length increases, the performance of conventional encoder–decoder architectures tends to degrade because the decoder is unable to fully exploit all historical information. To address this limitation, a temporal attention mechanism is integrated into the decoder, enabling it to adaptively identify the encoder hidden states that contribute most to the current prediction.

To compute the temporal attention, an alignment score is first constructed using the previous decoder hidden state d_{t - 1}, the previous decoder cell state s_{t - 1}, and the encoder hidden state at time step i, denoted by h_{i}. This alignment score is calculated by a feed-forward neural network and reflects the relevance between the encoder state at time i and the current prediction task. The computation is given by:

l_{t}^{i} = V_{d}^{⊤} \tanh (W_{d} [d_{t - 1}, s_{t - 1}] + U_{d} h_{i})

(13)

where V_{d}, W_{d}, and U_{d} are learnable parameters. The use of both d_{t - 1} and s_{t - 1} is essential: while d_{t - 1} captures short-term temporal variations, the cell state s_{t - 1} stores long-term degradation trends such as gradual increases in motor temperature or load. By combining these two types of information, the alignment score can more accurately determine the importance of each past time step. Normalization via softmax yields the attention weights:

β_{t}^{i} = \frac{\exp (l_{t}^{i})}{\sum_{j = 1}^{T} \exp (l_{t}^{j})}

(14)

The temporal attention weights are then used to construct a context vector c_{t}, which is a weighted sum of all encoder hidden states:

c_{t} = \sum_{i = 1}^{T} β_{t}^{i} h_{i}

(15)

The context vector serves as a condensed representation of the historical information most relevant to the current prediction, enabling the decoder to selectively emphasize key moments in the degradation process rather than relying solely on the final encoder output. After obtaining the context vector, the decoder concatenates it with the previous target value y_{t - 1} and applies a linear transformation to obtain the decoder input:

{\tilde{y}}_{t} = F ([y_{t - 1}, c_{t}])

(16)

where F (\cdot) is a trainable mapping. The LSTM unit then updates its hidden state based on this input:

d_{t} = f_{2} (d_{t - 1}, {\tilde{y}}_{t})

(17)

where f_{2} (\cdot) follows the same update equations as the LSTM described earlier. Finally, the predicted RUL at time T is obtained by concatenating the decoder hidden state d_{T} and the context vector c_{T} and projecting them through a linear layer:

{\hat{y}}_{T} = W_{y} [d_{T}, c_{T}] + b_{y}

(18)

By incorporating the temporal attention mechanism, the decoder is able to focus on the most informative moments in the historical sequence at each prediction step. This design complements the feature-level screening performed by the encoder and ensures that the predicted RUL is informed by both degradation-sensitive variables and critical temporal stages. As a result, the DA-RNN architecture provides a compact and interpretable representation for modeling the degradation dynamics of ESP systems.
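The decoder-side computation can be sketched in the same spirit. The example below takes the alignment scores as given, forms the temporal attention weights and the context vector of Equations (14) and (15), and applies the final linear projection of Equation (18). The encoder states, decoder state, and projection weights are illustrative numbers only, not values from the trained model.

```python
import math

def temporal_context(scores, encoder_states):
    """Softmax the alignment scores and return (beta, context vector)."""
    peak = max(scores)
    exps = [math.exp(l - peak) for l in scores]
    total = sum(exps)
    beta = [e / total for e in exps]                      # Eq. (14)
    m = len(encoder_states[0])
    context = [sum(b * h[j] for b, h in zip(beta, encoder_states))
               for j in range(m)]                         # Eq. (15)
    return beta, context

def predict_rul(d_T, c_T, W_y, b_y):
    """Linear projection of the concatenated [d_T, c_T], as in Eq. (18)."""
    z = d_T + c_T                                         # list concatenation
    return sum(w * v for w, v in zip(W_y, z)) + b_y

# Assumed toy values: T = 3 encoder states of size m = 2.
scores = [0.0, 0.0, math.log(2.0)]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
beta, context = temporal_context(scores, encoder_states)

rul_days = predict_rul([1.0], context, W_y=[10.0, 20.0, 20.0], b_y=5.0)
print(beta, context, rul_days)
```

Because the last score is larger by log 2, the last encoder state receives twice the weight of each of the other two, so the context vector is pulled toward it.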

3. Methodology

3.1. Feature Importance Analysis for RUL Prediction

This study analyzes the key factors influencing the RUL of electric submersible pumps using historical data from 712 ESP wells in the Bohai Oilfield. The dataset includes wellhead and downhole sensor measurements, electrical operating parameters, and fluid production variables. Summary statistics of these variables are presented in Table 1. Prior to correlation analysis, the raw data underwent preprocessing, including the removal of outliers beyond physically reasonable limits, imputation of missing values, and normalization of all variables to eliminate the effects of differing units and scales, thereby ensuring a consistent basis for subsequent analysis.

To quantify the relationship between individual variables and RUL, the Pearson correlation coefficient was employed. The coefficient ranges from −1 to 1, where larger absolute values indicate stronger linear relationships. The Pearson correlation coefficient is computed as:

r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2}} \sqrt{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(19)

where r denotes the correlation coefficient, X_{i} and Y_{i} are paired observations of two variables, \bar{X} and \bar{Y} represent their sample means, and n is the number of observations. Based on this formulation, correlations between RUL and 15 candidate variables were calculated and visualized in a heatmap shown in Figure 2. In the heatmap, the horizontal and vertical axes list the variables; the color intensity of each cell reflects the magnitude and direction of the correlation, with blue indicating strong negative correlation and red indicating strong positive correlation.

As shown in Figure 2, current and frequency exhibit the strongest negative correlations with RUL among all variables, highlighting their significant roles in representing the operating conditions and load levels of ESP systems. Moderate correlations are observed for wellhead pressure, pump intake pressure, pump intake temperature, motor temperature, oil production rate, gas production rate, and liquid production rate. These variables contain meaningful degradation-related information and are therefore retained as input features for DA-RNN model training. In this study, variables with absolute Pearson coefficients below 0.2 were excluded, as their weak associations contribute little to degradation modeling and may introduce noise without improving predictive performance.

It should be noted that Pearson correlation analysis was employed here as a preliminary feature screening step rather than a definitive measure of feature importance. The primary purpose of this step was to remove variables with negligible relevance to RUL and reduce input redundancy before model training. The final contribution of each retained feature to RUL prediction was subsequently learned in a data-driven manner by the dual-stage attention mechanism of the DA-RNN, which is capable of capturing nonlinear relationships and time-dependent effects beyond linear correlation. To avoid information leakage, the Pearson correlation-based feature screening was performed using the training dataset only. The selected feature set was then fixed and consistently applied to the testing dataset without further adjustment.
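The screening rule can be written compactly as follows. This is a minimal sketch that assumes each candidate variable is available as a list aligned with the training RUL labels; the variable names and numbers are invented for illustration, and constant series (zero variance) are assumed to have been removed beforehand.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def screen_features(train_features, train_rul, threshold=0.2):
    """Keep variables with |r| >= threshold, computed on training data only."""
    kept = {}
    for name, values in train_features.items():
        r = pearson(values, train_rul)
        if abs(r) >= threshold:
            kept[name] = r
    return kept

# Invented training data: one strongly related and one unrelated variable.
train_rul = [40.0, 30.0, 20.0, 10.0]
train_features = {
    "motor_current": [10.0, 20.0, 30.0, 40.0],
    "unrelated_signal": [1.0, -1.0, -1.0, 1.0],
}
kept = screen_features(train_features, train_rul)
print(sorted(kept))  # ['motor_current']
```

The returned feature names would then be fixed and reused unchanged on the test data, in line with the leakage precaution described above.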

After identifying the variables that exhibit strong relationships with ESP degradation, the next step was to construct an RUL prediction model capable of capturing the temporal degradation patterns embedded in these multivariate operating signals. This motivated the development of the DA-RNN-based prediction framework described in the subsequent section.

3.2. Construction of the DA-RNN-Based RUL Prediction Model

The DA-RNN-based RUL prediction model is illustrated in Figure 3. The modeling process begins with data preprocessing, including normalization and a sliding time-window operation. Normalization eliminates the effects of differing magnitudes across variables and enables the model to focus on temporal variations rather than absolute values. The sliding window divides the continuous time series into fixed-length segments, enabling the model to capture local dynamic behaviors while forming a stable structure suitable for batch training. After windowing, the data are formatted into a three-dimensional tensor representing the feature dimension, window length, and batch size, which preserves the temporal structure of the data and improves computational efficiency during training.

In designing the model inputs and outputs, multivariate features related to operating conditions, load behavior, and fluid characteristics are selected to describe the degradation patterns of the ESP system. The window length is chosen to balance the need to capture short-term dynamics with the requirement of stable model learning. The output is a one-step RUL estimate that advances as the window slides, thereby mimicking the continuous health evolution of the equipment. Ground-truth RUL labels are computed as the difference between the known failure time and the current timestamp and are aligned with the windowed input sequences to maintain temporal consistency during supervised learning. For multivariate time series from different wells, synchronization and alignment are performed only within each individual well based on its own time stamps, with sliding-window segmentation applied independently for each well and no enforced temporal alignment across wells. To prevent information leakage, the dataset was divided chronologically into 80% for training and 20% for testing, ensuring that the model was always evaluated on data from later operational periods that the network had never seen during training.
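A minimal sketch of the per-well windowing and chronological split is given below. It assumes that the split is applied to each well's raw time series before windowing, so that no training window overlaps the test period; the text does not state whether the split is made before or after windowing, and the window length of 3 is a placeholder rather than the value used in the study.

```python
def chronological_split(features, rul, train_fraction=0.8):
    """Split one well's time-ordered records into earlier (train) and later (test)."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], rul[:cut]), (features[cut:], rul[cut:])

def make_windows(features, rul, window):
    """Pair each fixed-length window with the RUL label at its last time step."""
    samples = []
    for end in range(window, len(features) + 1):
        samples.append((features[end - window:end], rul[end - 1]))
    return samples

# Toy single-well series: 20 daily records with 2 features; failure on the last day.
features = [[float(t), 2.0 * t] for t in range(20)]
rul = [19 - t for t in range(20)]

(train_x, train_y), (test_x, test_y) = chronological_split(features, rul)
train = make_windows(train_x, train_y, window=3)
test = make_windows(test_x, test_y, window=3)
print(len(train), len(test))  # 14 2
```

Each sample is a (window, label) pair; stacking the windows of a batch yields the three-dimensional tensor described earlier.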

The processed data are then fed into the DA-RNN architecture. In the encoder, the input-attention mechanism emphasizes the feature dimensions most relevant to degradation evolution, while the LSTM captures nonlinear temporal dependencies in the monitoring signals. In the decoder, the temporal-attention mechanism selects the historical time steps most relevant to the current prediction, enabling the model to focus on representative stages of the degradation trajectory and improve prediction accuracy. A single DA-RNN model is trained using data from multiple wells, with model parameters shared across wells, while each well is treated as an independent input sequence during training and inference. Attention weights are therefore computed separately for each well from its own input time series, allowing the model to adapt to different operating conditions rather than relying on averaged patterns across wells. Compared with conventional LSTM networks, this dual-stage attention structure improves the representation of complex industrial monitoring data. In contrast to Transformer-based architectures, which typically require large datasets and incur higher computational cost, DA-RNN exhibits more stable training behavior with lower parameter complexity, making it suitable for field applications with limited sampling frequency and gradually evolving degradation.

The model is trained using the Adam optimization algorithm, which updates parameters based on the error between predicted and actual RUL values. Early stopping is employed to prevent overfitting, and validation loss is monitored to ensure convergence stability. Gradient clipping is applied when necessary to avoid numerical issues such as exploding gradients. A systematic exploration of key hyperparameters—including window length, learning rate, and hidden-layer size—is conducted to balance convergence speed, generalization capability, and computational cost. The dual-stage attention structure further strengthens the model’s ability to capture salient degradation-related information, resulting in more robust and reliable predictions in field scenarios. Once the model satisfies the predefined accuracy requirements during training, the construction of the RUL prediction model is completed. The final model effectively characterizes the future degradation pattern of ESP systems and provides reliable support for maintenance planning, pump retrieval scheduling, and early-warning strategies in oilfield operations.
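Two of the training safeguards mentioned above, early stopping and gradient clipping, can be sketched independently of any framework. The patience value, the clipping threshold, and the validation losses below are assumptions made for illustration; the study does not report these settings.

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat gradient list so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm 5 -> norm 1

stopper = EarlyStopping(patience=2)
val_losses = [30.0, 25.0, 26.0, 27.0, 20.0]  # assumed per-epoch validation losses
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
print(clipped, stopped_at, stopper.best)
```

In this toy run training halts at epoch 3, after two consecutive epochs without improvement, and the later loss of 20.0 is never reached; this illustrates the trade-off involved in choosing the patience.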

With the DA-RNN model established and validated for extracting degradation-sensitive temporal features and generating reliable RUL predictions, it becomes possible to further integrate the predictive model into production decision-making. To achieve this, an optimization framework is developed to determine the ESP operating regime that maximizes equipment lifespan based on the predicted RUL.

3.3. Optimization of ESP Operating Regime Based on RUL Prediction

To further utilize the RUL prediction model for practical production decision-making, an optimization framework is developed to determine the ESP operating regime that maximizes the equipment’s remaining lifespan. Unlike model training, the objective here is to exploit the RUL predictions produced by the DA-RNN network to identify controllable operating parameters—such as pump frequency and surface choke setting—that yield the longest remaining life while meeting production and safety requirements. Production rates and pressure distribution are treated as response variables determined by a physics-based wellbore hydraulic model, while motor current, temperature, and efficiency are obtained from an empirical motor response model calibrated using historical operating data.

Each candidate operating regime V, representing a specific combination of controllable settings, is defined as a decision vector:

V = [f, d]

(20)

where f denotes the ESP operating frequency and d represents the surface choke opening. Given a candidate regime V, the DA-RNN model outputs an RUL trajectory over a finite prediction horizon:

L (V) = \{L_{k} (V)\}_{k = 1}^{K}

(21)

where L_{k} (V) denotes the predicted remaining life at time step t_{k} and K is the total number of prediction steps within the evaluation horizon. In this study, the prediction horizon covered by the K steps was set to 100 days, corresponding to a typical operational planning cycle in the studied offshore field. This horizon also represents a range over which the DA-RNN model maintains strong predictive accuracy, as evidenced by the consistently low MAE observed within this time window, thereby balancing short-term operational responsiveness and robustness against long-term prediction uncertainty. Let V^{0} denote the current operating regime and \{L_{k} (V^{0})\}_{k = 1}^{K} the corresponding baseline lifespan trajectory.

The improvement introduced by a candidate operating regime V is quantified by the cumulative difference between the predicted lifespan trajectories before and after optimization, formulated as:

J (V) = \sum_{k = 1}^{K} [L_{k} (V) - L_{k} (V^{0})] Δ t_{k}

(22)

where Δ t_{k} denotes the time interval associated with the k-th prediction step and is consistent with the sampling interval used in the sliding-window construction. A positive and larger value of J(V) indicates a more pronounced extension of the ESP lifespan relative to the baseline operating condition.
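As a small worked illustration of Equation (22), the sketch below evaluates J(V) for two short RUL trajectories with a uniform daily step. The trajectory values are invented, and a horizon of three steps is used only to keep the arithmetic visible.

```python
def lifespan_gain(candidate, baseline, dt_days=1.0):
    """Cumulative RUL difference between a candidate and the baseline regime."""
    if len(candidate) != len(baseline):
        raise ValueError("trajectories must cover the same prediction steps")
    return sum((lc - lb) * dt_days for lc, lb in zip(candidate, baseline))

# Invented predicted RUL trajectories (days) over K = 3 daily steps.
baseline = [200.0, 199.0, 198.0]    # current regime V^0
candidate = [205.0, 204.5, 204.0]   # candidate regime V

gain = lifespan_gain(candidate, baseline)
print(gain)  # 16.5
```

Here the step-wise gains are 5.0, 5.5, and 6.0 days, so J(V) = 16.5, a positive value that marks the candidate regime as an improvement over the baseline.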

The optimization problem for determining the optimal ESP operating regime is thus formulated as:

\begin{aligned}
\max_{V \in Ω} \quad & J (V) \\
\text{s.t.} \quad & q_{o} (V) \geq q_{o, \min} && \text{(minimum oil rate)} \\
& q_{l} (V) \geq q_{l, \min} && \text{(minimum liquid rate)} \\
& p_{wf} (V) \geq p_{wf, \min} && \text{(avoid liquid loading)} \\
& I (V) \leq I_{\max} && \text{(motor current limit)} \\
& T_{m} (V) \leq T_{m, \max} && \text{(motor temperature limit)} \\
& η (V) \geq η_{\min} && \text{(pump efficiency constraint)} \\
& V_{\min} \leq V \leq V_{\max} && \text{(engineering bounds)}
\end{aligned}

(23)

In the above formulation, q_{o} (V) and q_{l} (V) denote the oil and total liquid production rates, respectively; p_{wf} (V) is the flowing bottom-hole pressure; I (V) and T_{m} (V) represent the motor current and motor temperature; and η (V) denotes the pump efficiency under operating regime V.

In practice, the optimization is implemented using a discrete search strategy. The ESP operating frequency f and choke size d are discretized within prescribed engineering ranges [f_{\min}, f_{\max}] and [d_{\min}, d_{\max}], with increments Δ f and Δ d, respectively. The frequency increment Δ f is set to 1 Hz, while the choke size increment Δ d follows discrete choke specifications expressed in mm/64th that are commonly used in field operations. Under these settings, the feasible search space Ω_{feas} for a typical well contains approximately 180–250 candidate operating regimes; evaluating the objective function for all candidates requires on the order of 2 min per well on a standard engineering workstation, which is sufficiently efficient for routine batch operational planning. For each candidate pair (f_{m}, d_{n}), wellbore hydraulic calculations and motor electrical models are employed to determine the associated pressure distribution, temperature distribution, flow rate, and motor current, thereby forming the complete operating regime V (f_{m}, d_{n}) and its corresponding response quantities q_{o} (V), q_{l} (V), p_{wf} (V), I (V), T_{m} (V), and η (V). The objective value J (V (f_{m}, d_{n})) is evaluated for all feasible combinations. The optimal operating regime is obtained as:

V^{*} = \arg \max_{(f_{m}, d_{n}) \in Ω_{feas}} J (V (f_{m}, d_{n}))

(24)

where

Ω_{feas}

denotes the set of candidate frequency–choke combinations that satisfy all engineering and operational constraints in Equation (23). These constraints are defined based on equipment design limits, field operational guidelines, and historical operating envelopes derived from long-term offshore production data, ensuring that all candidate regimes remain physically feasible and operationally safe. For each individual well, the historical operating envelope is constructed by removing short-duration abnormal and extreme operating conditions and retaining the stable operating range observed during normal production, where operating frequency and choke size are limited to within ±10% of their long-term average values. This ensures that optimization is constrained within practically verified and stable field operating conditions. While the general form of the constraints is consistent across wells, the specific bounds of

Ω_{feas}

are allowed to vary from well to well to account for differences in reservoir inflow capacity, completion design, and equipment configuration. In addition, all optimized operating regimes are constructed within historical operating envelopes observed in field practice, and the recommended frequency and choke settings have been previously realized during normal operations. This ensures that the proposed optimization results are operationally feasible and do not rely on hypothetical or unrealizable control actions. The corresponding optimal regime

V^{*}

yields the recommended ESP operating parameters, including pump frequency and choke size, along with the associated pressure, temperature, and flow-rate distributions.
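The discrete search in Equations (23) and (24) can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_evaluate` is a hypothetical surrogate standing in for the wellbore hydraulic and motor electrical models, the objective `J` is a placeholder, and every numeric limit and choke value is invented for illustration. Only the 1 Hz frequency increment comes from the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Regime:
    """Response quantities of one candidate operating regime V(f, d)."""
    f: float        # pump frequency, Hz
    d: int          # choke size (discrete specification)
    q_o: float      # oil rate
    q_l: float      # total liquid rate
    p_wf: float     # flowing bottom-hole pressure
    current: float  # motor current I
    t_m: float      # motor temperature
    eta: float      # pump efficiency
    J: float        # objective value


@dataclass(frozen=True)
class Limits:
    q_o_min: float
    q_l_min: float
    p_wf_min: float
    i_max: float
    t_m_max: float
    eta_min: float


def is_feasible(v: Regime, lim: Limits) -> bool:
    """Inequality constraints of Equation (23)."""
    return (v.q_o >= lim.q_o_min and v.q_l >= lim.q_l_min
            and v.p_wf >= lim.p_wf_min and v.current <= lim.i_max
            and v.t_m <= lim.t_m_max and v.eta >= lim.eta_min)


def optimize(freqs: Iterable[float], chokes: Iterable[int],
             evaluate: Callable[[float, int], Regime],
             lim: Limits) -> Optional[Regime]:
    """Exhaustive search of Equation (24); returns None if nothing is feasible."""
    candidates = (evaluate(f, d) for f, d in product(freqs, chokes))
    feasible = [v for v in candidates if is_feasible(v, lim)]
    return max(feasible, key=lambda v: v.J, default=None)


def toy_evaluate(f: float, d: int) -> Regime:
    """Hypothetical surrogate; NOT the paper's hydraulic or electrical model."""
    q_l = 4.0 * f + 2.0 * d
    return Regime(f=f, d=d, q_o=0.5 * q_l, q_l=q_l,
                  p_wf=20.0 - 0.1 * f - 0.05 * d,
                  current=0.8 * f, t_m=60.0 + 1.2 * f,
                  eta=0.6 - 0.0005 * (f - 45.0) ** 2,
                  J=1000.0 - 10.0 * abs(f - 52.0) - 2.0 * abs(d - 32))


# All limits below are made-up illustration values.
limits = Limits(q_o_min=100.0, q_l_min=200.0, p_wf_min=12.0,
                i_max=40.5, t_m_max=121.0, eta_min=0.55)
freqs = range(35, 56)          # 1 Hz increments, as in the text
chokes = [24, 28, 32, 36, 40]  # made-up discrete choke specifications
best = optimize(freqs, chokes, toy_evaluate, limits)
```

In this toy setting the unconstrained maximum of `J` lies at 52 Hz, which violates the current and temperature limits, so the search returns the best regime that remains inside the feasible set. Filtering before maximizing mirrors the definition of Ω_{feas} given in the text.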

This optimization strategy enables the DA-RNN-based RUL prediction model to be directly integrated into ESP operational planning, allowing operating parameters to be adjusted in a condition-aware manner. By guiding operational decisions using predictive degradation trends rather than fixed rules, the proposed framework maximizes equipment lifespan while maintaining safe and efficient production. In this study, the optimization is intended for batch operational planning rather than real-time control and is executed over predefined planning horizons using periodically updated operating data, which is consistent with practical offshore operating workflows. The effectiveness of the overall workflow, including feature selection, RUL prediction, and operating-regime optimization, is evaluated in the following Materials and Methods section using field data.

4. Materials and Methods

4.1. Model Pre-Training and Performance Evaluation

During the training phase, the Adam optimizer was employed to update the parameters of the DA-RNN model. The model input comprised nine selected features, and training used a batch size of 128 and a window length of 10. Both the encoder and decoder were configured with 64 hidden units, and the learning rate was set to 0.001. The model was trained for 100 epochs, and the mean squared error was used as the loss function. The dataset was divided into training, validation, and testing sets using a chronological split to preserve the temporal structure of the sequences. Model training was performed using only earlier operational data, while validation and testing were conducted on subsequent unseen periods. All preprocessing parameters were fitted exclusively on the training data and then consistently applied to the validation and testing sets, thereby avoiding data leakage and ensuring realistic prognostic evaluation. To reduce the effect of random initialization, each experiment was repeated five times, and the average performance was reported. Gradient clipping was also applied to avoid numerical instability during backpropagation. All model training was conducted on a workstation equipped with an NVIDIA RTX 50-series GPU (24 GB memory), an Intel Core i9 CPU, and 64 GB RAM. Under this hardware configuration, the total training time for the DA-RNN model was on the order of several hours, depending on the hyperparameter settings.
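The leakage-free preprocessing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the 70/15/15 split proportions and the z-score normalization are assumptions (the text specifies only a chronological split and prior normalization), the series is synthetic, and the helper names are hypothetical. The window length of 10 follows the training setup above.

```python
from statistics import mean, pstdev


def chronological_split(samples, train_pct=70, val_pct=15):
    """Split time-ordered samples without shuffling (percentages are assumed)."""
    n = len(samples)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])


def fit_scaler(train):
    """Estimate normalization statistics from the training portion only."""
    sigma = pstdev(train)
    return (mean(train), sigma if sigma > 0 else 1.0)


def transform(values, scaler):
    """Apply previously fitted statistics to any portion of the data."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


def sliding_windows(values, window=10):
    """Overlapping fixed-length windows, oldest first."""
    return [values[i:i + window] for i in range(len(values) - window + 1)]


signal = [float(t) for t in range(100)]   # synthetic single-feature series
train, val, test = chronological_split(signal)
scaler = fit_scaler(train)                # statistics never see val or test
train_s, val_s, test_s = (transform(part, scaler) for part in (train, val, test))
train_windows = sliding_windows(train_s, window=10)
```

Because the scaler is fitted before the validation and testing portions are touched, later periods cannot influence the statistics applied to earlier ones, which is the property the chronological protocol is meant to guarantee.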

The training and validation errors of the DA-RNN model are shown in Figure 4, where the horizontal axis represents the number of training epochs and the vertical axis denotes the root-mean-squared error (RMSE). Since the training data were normalized in advance, the RMSE is dimensionless. As illustrated in Figure 4, both training and validation errors converge rapidly to a low level, demonstrating the stable learning and good fitting capability of the DA-RNN model.

The prediction performance of the DA-RNN model on the full dataset is shown in Figure 5, where the horizontal axis represents the true RUL (in days) and the vertical axis denotes the predicted RUL. The samples from both the training and testing sets lie close to the diagonal reference line

y = x

, indicating strong consistency between predicted values and actual observations. This demonstrates that the DA-RNN model achieves high prediction accuracy across a wide range of RUL values.

To further validate the advantages of DA-RNN in time-series prediction tasks, additional RUL prediction models were constructed using RNN, LSTM, GRU, and Transformer architectures for comparison. All baseline models were trained under the same hyperparameter settings, including learning rate, batch size, window length, and number of epochs, to ensure a fair and unbiased comparison. For the Transformer baseline, a time series-oriented configuration was adopted rather than a generic sequence-modeling setup. An encoder-only architecture with positional encoding was used to preserve temporal order, and self-attention was applied on fixed-length sliding windows consistent with the RUL prediction task. Key architectural parameters were adjusted through preliminary testing to ensure stable convergence and reasonable performance under long-horizon degradation conditions. This design ensured that the Transformer baseline was appropriately tuned for time-series prognostics and fairly evaluated in the comparison.

The performance of each model was evaluated using three widely adopted metrics: mean relative error (MRE), mean absolute error (MAE), and root mean squared error (RMSE). The calculation formulas for these metrics are given in Equations (25)–(27).

MRE = \sum_{i = 1}^{n} | \frac{y_{t} - y_{pre}}{y_{t}} | \times \frac{1}{n}

(25)

MAE = \frac{\sum_{i = 1}^{n} | y_{t} - y_{pre} |}{n}

(26)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{t} - y_{pre})}^{2}}

(27)

In these equations,

y_{t}

denotes the true RUL,

y_{pre}

denotes the predicted RUL, and n is the sample size. MRE measures the relative deviation between predicted and true values and reflects the model’s ability to remain accurate across different scales of RUL. MAE reflects the average absolute difference between predicted and actual values. RMSE introduces a squared penalty and is therefore more sensitive to large deviations; a lower RMSE indicates that even the worst prediction errors remain well controlled. All metrics are computed on a per-well basis, and the reported values are obtained by averaging the errors across all wells.
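As a concrete reading of Equations (25)–(27) and the per-well averaging, the following Python sketch computes the three metrics for each well and then averages across wells. The well names and RUL values are invented for illustration, and `average_over_wells` is a hypothetical helper rather than part of the published workflow.

```python
import math


def mre(y_true, y_pred):
    """Mean relative error, Equation (25); requires non-zero true RUL values."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error, Equation (26)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Equation (27)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def average_over_wells(wells, metric):
    """Compute the metric per well, then average the per-well values."""
    per_well = [metric(y_true, y_pred) for y_true, y_pred in wells.values()]
    return sum(per_well) / len(per_well)


# Invented (true RUL, predicted RUL) pairs in days, for illustration only.
wells = {
    "well-1": ([100.0, 200.0, 400.0], [110.0, 180.0, 400.0]),
    "well-2": ([50.0, 100.0], [60.0, 90.0]),
}
summary = {m.__name__: average_over_wells(wells, m) for m in (mre, mae, rmse)}
```

Note that MRE divides by the true RUL, so samples with a true RUL of zero (the failure instant) would have to be excluded or handled separately; how the study treats them is not stated.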

The comparative evaluation results are presented in Figure 6. The DA-RNN model consistently achieves lower MRE, MAE, and RMSE than the RNN, LSTM, GRU, and Transformer models, indicating superior predictive performance under both typical and extreme conditions. On the test set, its MAE is approximately 78% lower than that of the best-performing baseline model (LSTM), and it also outperforms all other evaluated architectures. Figure 6 also includes two single-attention variants for ablation analysis, namely InputAttOnly and TempAttOnly. The results show that removing either the input attention mechanism or the temporal attention mechanism leads to a clear increase in prediction errors on both training and test sets. Compared with these single-attention variants, the full DA-RNN achieves an additional MAE reduction of approximately 73–75%, demonstrating that the two attention components play complementary roles in enhancing RUL prediction accuracy. These results indicate that the dual-stage attention mechanism enables DA-RNN to better extract degradation patterns embedded in multivariate ESP operating data, by selectively emphasizing degradation-relevant features and capturing the temporal patterns that are critical for accurate RUL estimation. From an operational perspective, prediction errors should be interpreted in relation to the time scale of ESP operation. Since operating adjustments and maintenance planning are typically conducted over horizons of tens to hundreds of days, the MAE reduction achieved by the proposed DA-RNN represents a meaningful improvement in the timing accuracy of operational decisions rather than a purely numerical gain.

To ensure the stability and robustness of the DA-RNN model, a comprehensive hyperparameter optimization procedure was conducted. A k-fold cross-validation strategy was adopted, in which the dataset was divided into k subsets, with k − 1 subsets used for training and the remaining subset used for validation. This process was repeated k times so that each subset served once as the validation set. The average validation performance over the k iterations was taken as the evaluation metric for each hyperparameter configuration. Considering the relatively large size of the dataset used in this study and the computational cost associated with cross-validation, the value of k was set to 5.
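The fold construction described above can be sketched as follows. This is a minimal illustration with k = 5: the use of contiguous, unshuffled folds is an assumption (the text does not say how the subsets were formed), and `fit_and_score` is a hypothetical callback standing in for training the model on one index set and scoring it on the other.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


def cross_validate(n_samples, fit_and_score, k=5):
    """Average the validation score returned by `fit_and_score` over k folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / k


folds = list(k_fold_indices(23, k=5))
# Placeholder scorer: a real one would train on `tr` and evaluate on `va`.
mean_score = cross_validate(23, lambda tr, va: len(va) / (len(tr) + len(va)), k=5)
```

Each hyperparameter configuration would be passed through `cross_validate` once, and the returned average used to rank configurations.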

A grid search was then performed to examine the effect of key hyperparameters on the prediction performance of the DA-RNN model. The hyperparameters included the window length, learning rate, and the number of hidden units in the encoder and decoder. Prior to the grid search, the candidate range of window length (5–20) was determined based on the data resolution and practical ESP operating characteristics. The lower bound ensures sufficient recent context to capture short-term fluctuations and early degradation cues, while the upper bound avoids overly long windows that may dilute recent condition changes and increase computational cost. Specifically, the window length was varied from 5 to 20 with a step of 5, the learning rate ranged from 0.001 to 0.1 with a step of 0.001, and the number of hidden units in both the encoder and decoder ranged from 64 to 256 with a step of 64, resulting in a total of 6400 evaluated combinations. The grid search results are summarized in Table 2. As shown in the table, a window length of 20 and a learning rate of 0.001 yielded the lowest prediction errors, while using 64 hidden units in both the encoder and decoder provided the most stable performance. Based on these results, the final model configuration consistently adopts the optimal hyperparameter combination identified by the grid search, including a window length of 20, a learning rate of 0.001, and 64 hidden units in both the encoder and decoder. Grid search was chosen because the hyperparameter space is low-dimensional and well bounded, allowing for exhaustive evaluation without prohibitive computational costs.
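The size of the search space quoted above can be checked by enumerating the grid. The sketch below assumes that the encoder and decoder hidden sizes are varied independently, which is the reading that reproduces the stated total of 6400 combinations (4 × 100 × 4 × 4). The scoring function `toy_score` is an invented error surface, deliberately constructed with its minimum at the reported configuration so that the example is checkable; it carries no information about the actual model.

```python
from itertools import product

window_lengths = list(range(5, 21, 5))                         # 5, 10, 15, 20
learning_rates = [round(0.001 * i, 3) for i in range(1, 101)]   # 0.001 ... 0.100
hidden_sizes = list(range(64, 257, 64))                        # 64, 128, 192, 256

# One tuple per configuration: (window, learning rate, encoder units, decoder units).
grid = list(product(window_lengths, learning_rates, hidden_sizes, hidden_sizes))


def select_best(grid, score):
    """Return the configuration with the lowest score (e.g. mean CV error)."""
    return min(grid, key=score)


def toy_score(cfg):
    """Made-up error surface used only to exercise the search loop."""
    window, lr, enc, dec = cfg
    return (20 - window) + 100.0 * lr + (enc + dec) / 1000.0


best_cfg = select_best(grid, toy_score)
```

In an actual run, `toy_score` would be replaced by the cross-validated error of the DA-RNN trained with the given configuration.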

These findings suggest that a relatively long input window enables the DA-RNN model to capture richer temporal degradation information from the ESP operating data. A lower learning rate enhances training stability and reduces the risk of undesirable oscillation during optimization. Increasing the number of hidden units slightly improves representation capability, but excessively large hidden layers do not yield additional benefit. Overly complex configurations may even cause overfitting, which degrades prediction accuracy on the testing set. Based on these observations, the final hyperparameter settings of the DA-RNN model were selected as follows: window length of 20, learning rate of 0.001, 64 hidden units in the encoder, and 64 hidden units in the decoder.

In addition to the grid search, several supplementary analyses were conducted to further verify the reliability of the selected hyperparameters. First, each candidate configuration was trained multiple times using different random seeds. This ensured that the superior performance of the chosen configuration was not the result of a single initialization. Second, the learning curves corresponding to each configuration were examined to verify convergence stability. Hyperparameter combinations that exhibited irregular fluctuations in the validation error or failed to converge were excluded. Third, the sensitivity of the model to variations in window length, learning rate, and hidden units was analyzed. The final configuration demonstrated high robustness, as moderate perturbations to these hyperparameters did not lead to significant performance degradation.

These analyses confirm that the selected hyperparameter configuration provides a stable balance between model complexity and predictive accuracy. In addition, the variability across repeated experiments was examined to assess the robustness of the model performance. All reported results in this section were obtained by repeating the training process multiple times with different random initializations. The observed variation in prediction errors across runs remained limited, indicating stable convergence behavior and low sensitivity of the DA-RNN model to random initialization. The chosen settings form a reliable foundation for subsequent RUL prediction and the optimization of ESP operating regimes presented in the following sections.

4.2. Analysis of RUL Prediction Behavior Under Actual ESP Operating Conditions

To assess the practical applicability of the proposed RUL prediction framework, the model was applied to representative ESP operating data. The prediction results are presented in Figure 7, where the horizontal axis denotes the actual operating time of the ESP and the vertical axis represents the RUL in days. The predicted RUL curve closely matches the true degradation trajectory across the entire operating period. This indicates that the DA-RNN model can reliably capture both long-term degradation patterns and short-term dynamic variations present in real ESP operation. For the representative well shown in Figure 7, the MAE between the predicted and actual RUL is 23 days, quantitatively confirming the high prediction accuracy suggested by the visual agreement of the curves.

In addition to the overall agreement between predicted and true RUL values, the behavior of the model under different operating conditions provides further insights. Figure 8 illustrates RUL prediction results for ESPs with relatively short operational lifespans. For ESPs operating for fewer than 1000 days, the RUL decreases rapidly during early production. For these short-lifespan ESP wells, the MAE of the RUL predictions in Figure 8a–d is 57, 54, 68, and 70 days, respectively. Production data show that these wells typically exhibit limited inflow capacity at the beginning of operation. For example, the well corresponding to Figure 8a experienced low liquid production during the first 50 days and operated at an average pump frequency of approximately 55 Hz. In contrast, wells with longer lifespans, such as the example shown in Figure 8b, maintained more stable inflow conditions during early operation. The ESP in Figure 8b operated at an average frequency of 48 Hz in the first 200 days, and its RUL curve exhibited much smaller fluctuations, with a maximum deviation of only 14.6%. These observations demonstrate that early-stage inflow stability plays an essential role in shaping the degradation behavior of ESP systems.

Further interpretation of the degradation process is shown in Figure 8c. In this case, the predicted RUL exhibits a marked decline during the mid-life stage. According to field monitoring data, the pump intake temperature exceeded 120 °C during this period. The elevated temperature accelerated insulation ageing and increased the risk of gas lock. As inflow gradually weakened, the ESP was operated at high frequency for extended periods in an effort to maintain production. Prolonged high-frequency operation led to increased motor current and additional internal heating, which intensified electrical and mechanical degradation mechanisms. These factors collectively reduced pump efficiency and shortened the RUL of the ESP. The DA-RNN model successfully captured this accelerated degradation stage, as reflected by the steep drop in the predicted RUL curve. This demonstrates that the model responds sensitively to variations in operational parameters such as motor temperature, motor current, and intake pressure. This behavior further illustrates that the dual-stage attention mechanism enables the model to focus on combinations of feature patterns (such as concurrent temperature elevation and current surges) that typically precede accelerated deterioration, thereby generating RUL trajectories that closely reflect underlying physical responses.

A similar degradation pattern is observed in Figure 8d, where the RUL curve exhibits pronounced oscillations throughout the mid-to-late production period. Field data indicate that these fluctuations coincide with repeated cycles of inflow instability, during which the ESP experienced alternating periods of liquid fallback and transient gas interference. These conditions caused the motor current to rise sharply and intermittently, reflecting abrupt changes in hydraulic load. The frequent transitions between liquid-rich and gas-invaded flow not only increased the risk of partial gas lock but also forced the pump to operate away from its best-efficiency region. As a result, both electrical and mechanical stresses accumulated more rapidly, contributing to the accelerated decline in RUL observed in the prediction. The DA-RNN model captured these oscillatory degradation patterns effectively, demonstrating its capability to track fine-scale dynamic disturbances in operating parameters and to translate them into corresponding changes in the RUL trajectory.

Overall, the analysis confirms that the DA-RNN model can accurately reflect diverse degradation behavior under field operating conditions. The model captures both macro trends and micro dynamic responses in the degradation process, enabling early identification of adverse operating states and supporting timely intervention in ESP management. These findings highlight the model’s practical value for deployment in real production environments.

4.3. Analysis of Operating Strategy Optimization for ESP Systems

To further evaluate the practical value of the proposed RUL prediction framework, this section investigates how RUL-informed insights can guide the optimization of ESP operating strategies. Two representative wells, denoted as well-E and well-F, were selected to illustrate the impact of optimization on the predicted RUL trajectory under actual field conditions. Figure 9 and Figure 10 compare the predicted RUL under the historical operating regime with the predicted RUL under an optimized operating regime derived from RUL-based recommendations.

For well-E, the historical operation shows a steady decline in RUL, with occasional sharp drops corresponding to periods of elevated current draw and increased thermal stress. These abrupt declines are typically associated with transient increases in motor loading caused by inflow fluctuations or liquid-column oscillations. When the optimized operating strategy is applied, the predicted RUL curve becomes noticeably smoother toward the end of the operating timeline. Although the overall improvement is modest, with an increase of approximately 1.95% in predicted RUL, the optimized operating regime extends the RUL by approximately 18 days at the end of the 100-day optimization period, effectively suppressing the severe downward excursions observed under historical operation. The optimization primarily aligns pump frequency with real-time inflow capacity, reduces unnecessary high-frequency operation near the pump’s optimal efficiency region, and mitigates abrupt load transitions, thereby alleviating cumulative electrical and thermal stress.

For well-F, the effect of optimization is more pronounced. Under historical operation, the RUL exhibits a rapid decline after approximately 320 days, coinciding with frequent inflow instability and repeated frequency adjustments. These operational patterns drive the pump to traverse its best-efficiency region multiple times, amplifying cyclic loading and accelerating electrical and mechanical degradation. When the RUL-guided optimization strategy is applied, the degradation rate is significantly reduced, resulting in a 9.87% increase in predicted RUL. Over the optimization horizon, this improvement corresponds to an absolute extension of approximately 45 days in RUL, and the optimized RUL trajectory remains substantially flatter, indicating that stabilizing operating conditions and avoiding sustained high-stress states can effectively slow degradation progression in wells subject to volatile operating environments.

To extend the single-well analysis to the platform scale, a comparative evaluation was conducted using wells with similar background conditions, including comparable pump–motor capacity levels, commissioning age, pump-setting depth, well inclination, and operating envelopes. Based on whether the RUL-guided optimization strategy was applied, the selected wells were classified into Group A (RUL-guided operation) and Group B (experience-based operation). A concise platform-level comparison of the two groups is summarized in Table 3.

Table 3 shows that, under comparable geometric and operating conditions, wells operated with RUL-guided optimization exhibit lower annualized shutdown rates and fewer non-planned shutdown events than those operated under conventional strategies. All operating regimes considered are constrained by existing engineering limits and fall within historically observed field ranges, indicating that the comparison reflects feasible operational behavior rather than purely hypothetical control scenarios.

To further interpret these platform-level differences, aggregated operational behaviors of the two groups were examined. Wells operated under experience-based strategies tend to exhibit frequent short-term adjustments in operating frequency and current in response to inflow fluctuations, increasing exposure to transient high-load and high-stress conditions. In contrast, wells operated with RUL-guided optimization generally maintain more stable operating regimes, with fewer abrupt load transitions and reduced residence time in sustained high-stress states. This systematic difference in operational behavior provides a practical explanation for the lower shutdown-related event frequency observed at the platform scale and links the statistical trends in Table 3 to physically interpretable operating patterns.

This stability is also reflected in the response of the optimization results to small perturbations in key input variables. When inflow-related parameters, pressure conditions, and electrical load indicators vary within a narrow range representative of normal measurement uncertainty, the resulting changes in the optimization objective remain below approximately 2% and the recommended frequency–choke combinations do not exhibit material variation across the discretized operating space. This observation indicates that the RUL-guided optimization produces robust and stable operating recommendations rather than highly sensitive control decisions.

Beyond robustness, the reliability indicators in Table 3 also imply tangible operational and economic benefits of the RUL-guided optimization. Compared with experience-based operation, Group A shows a lower non-planned shutdown frequency (0.31 vs. 0.39 events per well year), i.e., 0.08 fewer unplanned events per well year, which corresponds to approximately 1.68 fewer events per year across the 21 wells in Group B. Under the reported operating conditions, the average oil rate is on the order of 164–167 m³/d. Assume a typical offshore downtime of D = 1–5 days per non-planned shutdown, an oil price of 75 USD/bbl (≈472 USD/m³), and a direct corrective-intervention cost of 30–80 k USD per event. The reduced shutdown frequency then implies about 13–67 m³ of protected oil production per well per year (≈6–32 k USD/yr in avoided production loss) and an additional ≈2–6 k USD/yr in reduced intervention costs, yielding combined savings of roughly 9–38 k USD per well per year. At the 21-well scale, this corresponds to ≈0.18–0.79 M USD per year. These estimates, while scenario-dependent, quantitatively link the reliability improvements to reduced downtime risk and a lower corrective-maintenance burden under engineering-feasible operating regimes.
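The arithmetic behind these estimates can be retraced with a short Python calculation. All inputs are the values quoted in the paragraph above; the only added constant is the standard conversion of about 6.2898 barrels per cubic metre, and the variable names are illustrative.

```python
BBL_PER_M3 = 6.2898                  # barrels per cubic metre (standard conversion)

rate_rul, rate_exp = 0.31, 0.39      # non-planned shutdowns per well-year
n_wells = 21
oil_rate = (164.0, 167.0)            # m3/d, low and high scenario
downtime = (1.0, 5.0)                # days lost per event, low and high
price_m3 = 75.0 * BBL_PER_M3         # USD per m3 at 75 USD/bbl
event_cost = (30e3, 80e3)            # USD per corrective intervention

delta = rate_exp - rate_rul                       # avoided events per well-year
events_platform = delta * n_wells                 # avoided events per year, 21 wells
oil = tuple(delta * d * q for d, q in zip(downtime, oil_rate))   # m3 per well-year
loss = tuple(v * price_m3 for v in oil)                          # USD per well-year
intervention = tuple(delta * c for c in event_cost)              # USD per well-year
total = tuple(a + b for a, b in zip(loss, intervention))         # USD per well-year
platform = tuple(t * n_wells for t in total)                     # USD per year

for label, (lo, hi) in {
    "protected oil (m3/well/yr)": oil,
    "avoided production loss (USD/well/yr)": loss,
    "avoided intervention cost (USD/well/yr)": intervention,
    "combined savings (USD/well/yr)": total,
    "combined savings, 21 wells (USD/yr)": platform,
}.items():
    print(f"{label}: {lo:,.0f} to {hi:,.0f}")
```

Under these inputs the per-well figures reproduce the quoted ranges, and the platform-scale upper bound evaluates to roughly 0.80 M USD, which agrees with the quoted ≈0.79 M USD to within rounding of intermediate values.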

In addition, the interaction between prediction-based optimization and subsequent operation is discussed as follows. Because optimized operating adjustments may influence subsequent system behavior, overly frequent interactions between RUL prediction and operating adjustment could increase operating fluctuations. In this study, the DA-RNN model was trained offline and applied without online updating; operating adjustments were implemented in a controlled manner. This design helps to maintain stable and realistic operating regimes consistent with field practice.

Overall, the comparative analysis demonstrates that the effectiveness of RUL-guided optimization depends on baseline operating stability. Wells with relatively stable inflow conditions tend to show incremental improvements, whereas wells subjected to volatile or rapidly deteriorating conditions can experience substantial extension in predicted RUL. Across both single-well cases and platform-level evaluation, the results consistently indicate that RUL-informed operational adjustment not only delays degradation but also stabilizes system behavior. These findings confirm that the proposed framework can translate degradation predictions into actionable operational guidance, supporting proactive, data-driven optimization of ESP performance. Importantly, the observed reduction in shutdown risk is associated with engineering-constrained and historically consistent operating adjustments, rather than representing a purely model-level counterfactual outcome.

5. Conclusions

This study proposed an RUL prediction framework for ESP systems based on a DA-RNN architecture. By integrating input attention and temporal attention mechanisms, the model effectively identifies key degradation-related variables and captures long-term dependencies in multivariate time series. Validation using data from 712 ESP wells demonstrated that the proposed method provides superior prediction accuracy compared with conventional RNN, LSTM, GRU, and Transformer models.

Through detailed analysis of field operating data, the model was shown to reliably reproduce both long-term degradation trends and short-term dynamic responses. The case studies demonstrated that the RUL framework can reveal degradation precursors associated with high-frequency operation, thermal stress, current fluctuations, and inflow instability. Furthermore, by linking RUL evolution with controllable operating parameters, the framework provides actionable insights for optimizing ESP operation. The optimization results from representative wells showed that RUL-informed adjustments help to stabilize degradation trajectories and extend equipment lifetime, with improvements ranging from incremental gains to substantial lifespan extension, depending on well conditions.

Although the present framework focuses on point-wise RUL estimation and does not explicitly incorporate uncertainty quantification, the proposed optimization strategy relies primarily on the relative variation and temporal trends of predicted RUL rather than absolute point values. In addition, failure definitions and failure-time labeling are based on field records and maintenance logs, which may involve ambiguity due to early pull-outs or non-failure shutdowns. Moreover, the model was developed and validated using data from a specific offshore oilfield and ESP configuration; its generalization to other fields or operating conditions may require further validation. Nevertheless, the consistent RUL response trends under different operating regimes provide practical guidance for lifespan-oriented decision-making, supporting the applicability of the proposed framework for operation optimization in real field environments.

Overall, the results demonstrate that integrating RUL prediction into ESP management workflows delivers meaningful operational benefits. The DA-RNN-based approach enables the early identification of adverse operating states, supports proactive adjustment of production strategies, and provides a reliable data-driven foundation for enhancing equipment reliability. Given its low computational cost and reliance on routinely collected field parameters, the proposed framework can be readily integrated into existing ESP monitoring infrastructures to support routine operational decision-making and life-extension management in ESP-driven oilfield production.

Author Contributions

Conceptualization, X.L. (Xin Lu) and G.H.; methodology, X.L. (Xin Lu); software, X.L. (Xin Lu); validation, X.L. (Xin Lu), B.L. and Y.S.; formal analysis, X.L. (Xin Lu); investigation, X.L. (Xin Lu), B.L. and Y.S.; resources, B.L. and Y.S.; data curation, B.L. and Y.S.; writing—original draft preparation, X.L. (Xin Lu); writing—review and editing, X.L. (Xin Lu) and G.H.; visualization, X.L. (Xin Lu); supervision, G.H.; project administration, G.H.; funding acquisition, G.H. and X.L. (Xingyuan Liang). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Foundation of China University of Petroleum, Beijing, grant number 2462023YJRC019. The APC was funded by the Science Foundation of China University of Petroleum, Beijing.

Data Availability Statement

The data used in this study were obtained from operational records of offshore electric submersible pump (ESP) systems and contain proprietary and confidential information related to field operations. Due to data confidentiality and restrictions imposed by the operating companies, the datasets are not publicly available. Aggregated data and representative examples supporting the findings of this study are included within the article. Further data may be made available from the corresponding author upon reasonable request, subject to approval by the data owners.

Conflicts of Interest

Author Bin Liu was employed by the company Petro China Jidong Oilfield Company; Author Yangnan Shangguan was employed by the company Petro China Changqing Oilfield Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Gupta, S.; Nikolaou, M.; Saputelli, L.; Bravo, C. ESP Health Monitoring KPI: A Real-Time Predictive Analytics Application. Available online: https://onepetro.org/SPEIE/proceedings-abstract/16IE/16IE/SPE-181009-MS/186735 (accessed on 28 November 2025).
Pham, S.T.; Vo, P.S.; Nguyen, D.N. Effective Electrical Submersible Pump Management Using Machine Learning. Open J. Civ. Eng. 2021, 11, 70–80.
Yang, J.; Wang, S.; Zheng, C.; Feng, G.; Du, G.; Tan, C.; Ma, D. Fault Diagnosis Method and Application of ESP Well Based on SPC Rules and Real-Time Data Fusion. Math. Probl. Eng. 2022, 2022, 8497299.
Hackworth, M.; Williams, S. Real-Time Decision Making for ESP Management and Optimization; OnePetro: Richardson, TX, USA, 2016.
Silvia, S.; Gilad, Y.; Wilson, T.A.; Akbari, B.; Furlong, E.R. Case Study: Predicting Electrical Submersible Pump Failures Using Artificial Intelligence and Physics-Based Hybrid Models. Available online: https://onepetro.org/SPEIOGS/proceedings-abstract/22AIS/22AIS/D021S004R003/515674 (accessed on 28 November 2025).
Abdalla, R.; Samara, H.; Perozo, N.; Carvajal, C.P.; Jaeger, P. Machine Learning Approach for Predictive Maintenance of the Electrical Submersible Pumps (ESPs). ACS Omega 2022, 7, 17641–17651.
Mohanty, J.R.; Verma, B.B.; Ray, P.K. Prediction of fatigue crack growth and residual life using an exponential model: Part I (constant amplitude loading). Int. J. Fatigue 2009, 31, 418–424.
Xie, M.; Wei, Z.; Zhao, J.; Wang, Y.; Liang, X.; Pei, X. Remaining useful life prediction of pipelines considering the crack coupling effect using genetic algorithm-back propagation neural network. Thin-Walled Struct. 2024, 204, 112330.
Tien, S.-C.; Wei, H.; Chen, J.; Liu, Y. Energy-based time derivative damage accumulation model under uniaxial and multiaxial random loadings. Fatigue Fract. Eng. Mater. Struct. 2022, 45, 159–173.
Zhang, X.-C.; Gong, J.-G.; Xuan, F.-Z. A deep learning based life prediction method for components under creep, fatigue and creep-fatigue conditions. Int. J. Fatigue 2021, 148, 106236.
Fatemi, A.; Yang, L. Cumulative fatigue damage and life prediction theories: A survey of the state of the art for homogeneous materials. Int. J. Fatigue 1998, 20, 9–34.
Khalil, Z.; Elghazouli, A.Y.; Martínez-Pañeda, E. A generalised phase field model for fatigue crack growth in elastic–plastic solids with an efficient monolithic solver. Comput. Methods Appl. Mech. Eng. 2022, 388, 114286.
Lucarini, S.; Dunne, F.P.E.; Martínez-Pañeda, E. An FFT-based crystal plasticity phase-field model for micromechanical fatigue cracking based on the stored energy density. Int. J. Fatigue 2023, 172, 107670.
Wen, H.; Zhang, L.; Sinha, J.K. From Envelope Spectra to Bearing Remaining Useful Life: An Intelligent Vibration-Based Prediction Model with Quantified Uncertainty. Sensors 2024, 24, 7257.
Faggioni, N.; Caviglia, A.; Guarnera, N.; Schininà, E.; Sansebastiano, E.; Chiti, R. Enhancing Predictive Maintenance in the Maritime Industry with Unsupervised Learning. In Proceedings of the iSCSS, Liverpool, UK, 5–7 November 2024.
Jiang, M.; Xing, T.; Zio, E.; Zhu, X. A Bayesian Data-Driven Framework for Aleatoric and Epistemic Uncertainty Quantification in Remaining Useful Life Predictions. IEEE Sens. J. 2024, 24, 42255–42267.
Li, T.; Dong, J. Multi-level attention graph feature fusion smooth prognostics approach for aircraft engines remaining useful life prediction. J. Control Decis. 2024, 1–17.
Qin, Y.; Song, D.; Chen, H.; Cheng, W.; Jiang, G.; Cottrell, G.W. A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017; pp. 2627–2633.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Deng, Y.; Guo, C.; Zhang, Z.; Zou, L.; Liu, X.; Lin, S. An Attention-Based Method for Remaining Useful Life Prediction of Rotating Machinery. Appl. Sci. 2023, 13, 2622.
Boujamza, A.; Lissane Elhaq, S. Attention-based LSTM for Remaining Useful Life Estimation of Aircraft Engines. IFAC-PapersOnLine 2022, 55, 450–455.
Jiang, F.; Hou, X.; Xia, M. Spatio-temporal attention-based hidden physics-informed neural network for remaining useful life prediction. Adv. Eng. Inform. 2025, 63, 102958.
Heng, A.; Zhang, S.; Tan, A.C.C.; Mathew, J. Rotating machinery prognostics: State of the art, challenges and opportunities. Mech. Syst. Signal Process. 2009, 23, 724–739.

Figure 1. Architecture of the DA-RNN model for ESP RUL prediction: (a) input attention module, which dynamically assigns weights to multivariate operating variables at each time step, highlighting features that are most relevant to equipment degradation; and (b) temporal attention module, which selectively emphasizes critical historical time steps in the hidden-state sequence, enabling the model to focus on key degradation-related temporal patterns. Together, the dual-stage attention mechanism enhances both interpretability and prediction accuracy.

Figure 2. Heat map of Pearson correlation coefficients between candidate operating variables and RUL. The color scale represents the magnitude and sign of the correlation coefficient calculated over the full dataset, with negative values indicating inverse relationships and positive values indicating direct relationships between individual variables and RUL.

Figure 3. Schematic overview of the DA-RNN-based RUL prediction framework. The figure illustrates the overall network architecture and information flow from windowed multivariate inputs to RUL outputs, including the encoder–decoder structure and the placement of the attention modules.

Figure 4. Training and validation loss curves of the DA-RNN model. The rapid convergence and stable behavior indicate effective learning of degradation patterns without overfitting.

Figure 5. Comparison between predicted and true RUL values across the full dataset. The close alignment with the diagonal demonstrates high prediction accuracy over a wide range of lifespan stages.

Figure 6. Performance comparison of DA-RNN with baseline models. DA-RNN consistently achieves lower prediction errors, highlighting the benefit of dual-stage attention in capturing degradation patterns.

Figure 7. RUL prediction results for a representative ESP well. The predicted trajectory closely follows the actual degradation trend, demonstrating the model’s capability to capture both long-term decline and short-term fluctuations.

Figure 8. Predicted and observed RUL trajectories for ESP wells with relatively short lifespans: (a) well-A; (b) well-B; (c) well-C; and (d) well-D. The four panels present representative cases of ESP wells operating in late-life stages, illustrating how different operating conditions are reflected in the predicted RUL trajectories.

Figure 9. Comparison of predicted RUL under historical and optimized operating regimes for well-E. The optimized strategy stabilizes the degradation trajectory and mitigates abrupt RUL drops.

Figure 10. Comparison of predicted RUL under historical and optimized operating regimes for well-F. The optimized regime substantially slows degradation progression in wells with volatile operating conditions.
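The correlation screening summarized in Figure 2 can be illustrated with a minimal sketch. It assumes the standard sample Pearson coefficient; the function name `pearson` and the toy motor-temperature and RUL values are hypothetical and are not taken from the field dataset.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of centered cross-products and squares (the 1/(n-1) factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy readings (illustrative only, not field data):
# motor temperature rises while RUL falls.
motor_temp_c = [80.0, 85.0, 90.0, 95.0, 100.0]
rul_days = [400.0, 350.0, 300.0, 250.0, 200.0]

r = pearson(motor_temp_c, rul_days)
print(f"Pearson r between motor temperature and RUL: {r:.2f}")
```

In this toy case the two series are perfectly inversely related, so the coefficient sits at the negative end of the color scale. Real operating variables would be expected to fall somewhere between the two extremes.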

Table 1. Basic overview of ESP wells in the Bohai Oilfield.

No.	Variable Symbol	Influencing Factors	Unit	Minimum	Mean	Maximum
1	V1	Oil pressure	MPa	1.06	2	4.59
2	V2	Casing pressure	MPa	0.69	1.71	4.35
3	V3	Pump inlet pressure	MPa	0.71	3.13	13.32
4	V4	Pump outlet pressure	MPa	1.74	10.82	13.79
5	V5	Pump inlet temperature	°C	40.21	58.83	79.91
6	V6	Motor temperature	°C	40.07	89.55	139.29
7	V7	Current	A	15.39	44.52	82.89
8	V8	Voltage	V	609.02	1840.84	2536.1
9	V9	Frequency	Hz	21	49.36	65.67
10	V10	Motor power	kW	13.87	104.96	250
11	V11	Oil	m³/d	0.59	68.21	236.36
12	V12	Gas	10⁴ m³	0.012	0.4	2.75
13	V13	Water	m³/d	0.17	237.83	1271.49
14	V14	Liquid	m³/d	2.66	306.55	1460.85
15	V15	GOR	m³/m³	1.78	57	331.28

Table 2. Performance of the DA-RNN model under different hyperparameter settings.

Window Length	Learning Rate	Number of Encoder Hidden Units	Number of Decoder Hidden Units	MRE	MAE	RMSE
20	0.001	64	64	0.0208	0.0043	0.0055
20	0.001	128	128	0.0254	0.0069	0.0091
10	0.001	64	64	0.0287	0.0072	0.0098
10	0.003	128	128	0.0561	0.0100	0.0108
15	0.002	64	64	0.0273	0.0082	0.0109
15	0.001	64	64	0.0369	0.0096	0.0112
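The three error metrics in Table 2 can be sketched as follows, assuming their conventional definitions: mean relative error (MRE), mean absolute error (MAE), and root mean square error (RMSE). The exact normalization behind the tabulated values is not restated here, so the function names, the handling of zero targets, and the toy RUL values below are all assumptions made for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mre(y_true, y_pred):
    """Mean relative error, |error| / |true value|, averaged over samples.

    Assumption: a zero true value makes the ratio undefined, so it is rejected.
    """
    if any(t == 0 for t in y_true):
        raise ValueError("MRE is undefined when a true value is zero")
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical RUL values in days (illustrative only, not field data).
true_rul = [100.0, 200.0, 400.0]
pred_rul = [110.0, 190.0, 400.0]

print(f"MAE  = {mae(true_rul, pred_rul):.3f}")
print(f"RMSE = {rmse(true_rul, pred_rul):.3f}")
print(f"MRE  = {mre(true_rul, pred_rul):.3f}")
```

RMSE is never smaller than MAE on the same data because it penalizes large errors more heavily, which is consistent with every row of Table 2.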

Table 3. Platform-level comparison of ESP wells with and without RUL-guided optimization.

Category	Metric	Group A: RUL-Guided Operation	Group B: Experience-Based Operation
Well characteristics	Number of wells	18	21
Well characteristics	ESP configuration	ESP wells with comparable pump–motor capacity levels	ESP wells with comparable pump–motor capacity levels
Well characteristics	Average commissioning age (years)	4.6	4.9
Well characteristics	Average pump-setting depth, PSD (m)	2350	2380
Well characteristics	Average well inclination (°)	21.4	22.7
Operating conditions	Average water cut (%)	61.3	63.1
Operating conditions	Average liquid rate (m³/d)	432	445
Reliability indicators	Annualized shutdown rate (events per well·year)	0.42	0.53
Reliability indicators	Non-planned shutdown events (events per well·year)	0.31	0.39
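The size of the gap between the two groups in Table 3 can be expressed as a relative reduction. The sketch below is plain arithmetic on the tabulated reliability indicators. The helper name `relative_reduction` is hypothetical, and the calculation says nothing about statistical significance or causation.

```python
def relative_reduction(baseline, treated):
    """Fractional reduction of `treated` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


# Values copied from Table 3 (events per well-year).
# Group B (experience-based operation) is taken as the baseline.
shutdown_rate = relative_reduction(baseline=0.53, treated=0.42)
unplanned_rate = relative_reduction(baseline=0.39, treated=0.31)

print(f"Annualized shutdown rate reduction: {shutdown_rate:.1%}")
print(f"Non-planned shutdown reduction:     {unplanned_rate:.1%}")
```

With the tabulated values, both indicators are roughly one fifth lower in Group A than in Group B (about 20.8% and 20.5%, respectively).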
