4.1. Model Pre-Training and Performance Evaluation
During training, the parameters of the DA-RNN model were updated with the Adam optimizer. The input comprised nine selected features. The batch size was 128, the window length was 10, and both the encoder and decoder had 64 hidden units. The learning rate was set to 0.001, the model was trained for 100 epochs, and the mean squared error served as the loss function. To preserve the temporal structure of the sequences, the dataset was divided chronologically into training, validation, and testing sets. The model was trained only on earlier operational data, while validation and testing used subsequent, unseen periods. All preprocessing statistics were computed exclusively from the training data and then applied unchanged to the validation and testing sets, which avoids data leakage and ensures a realistic prognostic evaluation. To reduce the effect of random initialization, each experiment was repeated five times and the average performance was reported. Gradient clipping was applied to avoid numerical instability during backpropagation. All models were trained on a workstation equipped with an NVIDIA RTX 50-series GPU (24 GB memory), an Intel Core i9 CPU, and 64 GB of RAM. On this hardware, training the DA-RNN model took on the order of several hours, depending on the hyperparameter settings.
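The leakage-free preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes a single well's time-ordered feature rows, min-max scaling as the normalization, and a hypothetical 70/15/15 split; the source does not state the split ratios or the scaler. Each window of 10 steps is paired with the RUL at its last step.

```python
from typing import List, Sequence, Tuple

def chronological_split(rows: Sequence[Sequence[float]], train_frac: float = 0.7,
                        val_frac: float = 0.15) -> Tuple[list, list, list]:
    """Split time-ordered rows into train/val/test without shuffling.
    The 70/15/15 fractions are hypothetical defaults."""
    n = len(rows)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (list(rows[:n_train]),
            list(rows[n_train:n_train + n_val]),
            list(rows[n_train + n_val:]))

def fit_minmax(train: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-feature min and max, computed on the training rows only (assumed min-max scaling)."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]

def apply_minmax(rows, lo, hi):
    """Scale any split with the training statistics; validation/test values may leave [0, 1]."""
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(r, lo, hi)]
            for r in rows]

def sliding_windows(rows, targets, window: int = 10):
    """Pair each run of `window` consecutive rows with the RUL at its last step.
    Assumes one well; windows should not cross well boundaries in multi-well data."""
    return [(rows[i - window + 1:i + 1], targets[i])
            for i in range(window - 1, len(rows))]
```

Because the scaler is fitted only on the training rows, later periods can fall outside the training range. This is expected, and it is exactly what a deployed model would encounter.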
The training and validation errors of the DA-RNN model are shown in Figure 4, where the horizontal axis represents the number of training epochs and the vertical axis the root-mean-squared error (RMSE). Because the data were normalized in advance, the RMSE is dimensionless. As Figure 4 illustrates, both training and validation errors converge rapidly to a low level, indicating stable learning and good fitting capability of the DA-RNN model.
The prediction performance of the DA-RNN model on the full dataset is shown in Figure 5, where the horizontal axis represents the true RUL (in days) and the vertical axis the predicted RUL. Samples from both the training and testing sets lie close to the diagonal reference line (predicted RUL = true RUL), indicating strong consistency between predicted values and actual observations. This shows that the DA-RNN model achieves high prediction accuracy across a wide range of RUL values.
To further validate the advantages of DA-RNN in time-series prediction, additional RUL prediction models based on RNN, LSTM, GRU, and Transformer architectures were constructed for comparison. All baseline models used the same learning rate, batch size, window length, and number of epochs to ensure a fair comparison. For the Transformer baseline, a time-series-oriented configuration was adopted instead of a generic sequence-modeling setup. An encoder-only architecture with positional encoding preserved temporal order, and self-attention was applied to fixed-length sliding windows consistent with the RUL prediction task. Key architectural parameters were adjusted through preliminary testing to ensure stable convergence and reasonable performance under long-horizon degradation conditions. As a result, the Transformer baseline was appropriately configured for time-series prognostics and fairly evaluated in the comparison.
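The source states only that positional encoding was used. The sketch below assumes the sinusoidal encoding of the original Transformer, which is one common choice, to show how temporal order is injected into a fixed-length window. `d_model` is a hypothetical embedding width, and the input window is assumed to be already projected to that width.

```python
import math
from typing import List

def sinusoidal_positional_encoding(window: int, d_model: int) -> List[List[float]]:
    """Assumed sinusoidal variant: PE[pos][2i] = sin(pos / 10000^(2i/d)),
    PE[pos][2i+1] = cos(pos / 10000^(2i/d))."""
    pe = [[0.0] * d_model for _ in range(window)]
    for pos in range(window):
        for i in range(0, d_model, 2):          # i is the even index 2i
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:                 # guard for odd d_model
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(window_rows: List[List[float]], pe: List[List[float]]) -> List[List[float]]:
    """Add the encoding element-wise to an input window already projected to d_model."""
    return [[x + p for x, p in zip(row, pe_row)] for row, pe_row in zip(window_rows, pe)]
```

Because self-attention is permutation-invariant, some positional signal of this kind is what lets the encoder distinguish early and late steps within a window.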
The performance of each model was evaluated using three widely adopted metrics: mean relative error (MRE), mean absolute error (MAE), and root-mean-squared error (RMSE), calculated as in Equations (25)–(27):

$$\mathrm{MRE}=\frac{1}{n}\sum_{i=1}^{n}\frac{\left|y_i-\hat{y}_i\right|}{y_i} \tag{25}$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right| \tag{26}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^{2}} \tag{27}$$

In these equations, $y_i$ denotes the true RUL, $\hat{y}_i$ the predicted RUL, and $n$ the sample size. MRE measures the relative deviation between predicted and true values and reflects the model’s ability to remain accurate across different scales of RUL. MAE reflects the average absolute difference between predicted and actual values. RMSE introduces a squared penalty and is therefore more sensitive to large deviations, so a lower RMSE indicates that even the worst prediction errors remain well controlled. All metrics are computed on a per-well basis, and the reported values are obtained by averaging the errors across all wells.
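The per-well evaluation protocol can be sketched as follows. The metrics are computed separately for each well and then averaged without weighting across wells. This is a minimal illustration, not the authors' evaluation code. One assumption is flagged in the code: MRE divides by the true RUL, which is zero at end of life, so such samples are skipped here.

```python
import math
from typing import Dict, List, Tuple

def rul_metrics(y_true: List[float], y_pred: List[float]) -> Tuple[float, float, float]:
    """Return (MRE, MAE, RMSE) for one well."""
    n = len(y_true)
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Assumption: samples with y_true == 0 are excluded from MRE (relative error undefined).
    rel = [e / t for e, t in zip(abs_err, y_true) if t != 0]
    mre = sum(rel) / len(rel) if rel else float("nan")
    mae = sum(abs_err) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    return mre, mae, rmse

def average_over_wells(per_well: Dict[str, Tuple[List[float], List[float]]]):
    """Compute the metrics well by well, then take the unweighted mean across wells."""
    scores = [rul_metrics(t, p) for t, p in per_well.values()]
    return tuple(sum(s[k] for s in scores) / len(scores) for k in range(3))
```

Averaging per well rather than pooling all samples keeps wells with long histories from dominating the reported error.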
The comparative evaluation results are presented in Figure 6. The DA-RNN model consistently achieves lower MRE, MAE, and RMSE than the RNN, LSTM, GRU, and Transformer models. MAE reflects typical errors and RMSE is dominated by large deviations, so this indicates superior predictive performance under both typical and extreme conditions. The reduction in prediction error is substantial. On the test set, the MAE of DA-RNN is approximately 78% lower than that of the best-performing baseline (LSTM), and DA-RNN outperforms every other evaluated architecture.
Figure 6 also includes two single-attention variants for ablation analysis, InputAttOnly and TempAttOnly. Removing either the input attention or the temporal attention mechanism clearly increases prediction errors on both the training and test sets. Relative to these single-attention variants, the full DA-RNN achieves a further MAE reduction of approximately 73–75%. This shows that the two attention components play complementary roles in RUL prediction. The gain can be attributed to the dual-stage design, which enables DA-RNN to better extract degradation patterns from multivariate ESP operating data. Input attention selectively emphasizes degradation-relevant features, while temporal attention captures the key temporal patterns that are critical for accurate RUL estimation. From an operational perspective, prediction errors should be interpreted relative to the time scale of ESP operation. Operating adjustments and maintenance planning are typically conducted over horizons of tens to hundreds of days. The MAE reduction achieved by DA-RNN therefore represents a meaningful improvement in the timing accuracy of operational decisions rather than a purely numerical gain.
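The following is a minimal sketch of how the two attention stages apply their weights, assuming softmax-normalized scores as in common attention formulations. The learned score functions of the actual DA-RNN are not shown. `feature_scores` and `step_scores` are hypothetical inputs standing in for them. Equal scores produce uniform weights, which is one simple, assumed way to picture what disabling a stage does. The InputAttOnly and TempAttOnly variants in Figure 6 may have been implemented differently.

```python
import math
from typing import List

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)                              # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def input_attention(x_t: List[float], feature_scores: List[float]) -> List[float]:
    """Stage 1: reweight the driving features at one time step by softmax weights."""
    alpha = softmax(feature_scores)
    return [a * x for a, x in zip(alpha, x_t)]

def temporal_attention(hidden: List[List[float]], step_scores: List[float]) -> List[float]:
    """Stage 2: context vector as the softmax-weighted sum of encoder hidden states."""
    beta = softmax(step_scores)
    dim = len(hidden[0])
    return [sum(b * h[j] for b, h in zip(beta, hidden)) for j in range(dim)]
```

Stage 1 decides which sensor channels matter at each step, and stage 2 decides which past steps matter for the current prediction. This is why removing either stage discards a different kind of information.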
To ensure the stability and robustness of the DA-RNN model, a comprehensive hyperparameter optimization was conducted using k-fold cross-validation. The dataset was divided into k subsets. In each iteration, k − 1 subsets were used for training and the remaining subset for validation. This process was repeated k times so that each subset served once as the validation set. The average validation performance over the k iterations was taken as the evaluation metric for each hyperparameter configuration. Given the relatively large dataset and the computational cost of cross-validation, k was set to 5.
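The following is a minimal sketch of the k-fold procedure with k = 5. It assumes contiguous, unshuffled folds so that each validation fold remains a block of consecutive samples. The source does not say how the folds were formed. `train_and_validate` is a hypothetical callable that trains on the given indices and returns a validation error.

```python
from typing import Callable, List, Tuple

def kfold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the validation set once."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)    # spread the remainder over the first folds
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]   # the other k - 1 folds
        splits.append((train, folds[f]))
    return splits

def cv_score(n_samples: int, config: dict,
             train_and_validate: Callable[[List[int], List[int], dict], float],
             k: int = 5) -> float:
    """Mean validation error of one hyperparameter configuration over the k folds."""
    errs = [train_and_validate(tr, va, config) for tr, va in kfold_indices(n_samples, k)]
    return sum(errs) / k
```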
A grid search was then performed to examine how key hyperparameters affect the prediction performance of the DA-RNN model. The hyperparameters were the window length, the learning rate, and the number of hidden units in the encoder and decoder. Before the search, the candidate range of window lengths (5–20) was determined from the data resolution and practical ESP operating characteristics. The lower bound ensures sufficient recent context to capture short-term fluctuations and early degradation cues. The upper bound avoids overly long windows that may dilute recent condition changes and increase computational cost. Specifically, the window length was varied from 5 to 20 in steps of 5, the learning rate from 0.001 to 0.1 in steps of 0.001, and the number of hidden units in both the encoder and decoder from 64 to 256 in steps of 64. This gives 4 × 100 × 4 × 4 = 6400 evaluated combinations. The grid search results are summarized in Table 2. A window length of 20 and a learning rate of 0.001 yielded the lowest prediction errors, while 64 hidden units in both the encoder and decoder provided the most stable performance. The final model therefore adopts this combination: a window length of 20, a learning rate of 0.001, and 64 hidden units in both the encoder and decoder. Grid search was chosen because the hyperparameter space is low-dimensional and well bounded, which permits exhaustive evaluation without prohibitive computational cost.
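The search space above can be enumerated as follows, which also confirms the count of 6400 combinations. The dictionary keys are illustrative names, not taken from the source. Learning rates are rounded to avoid floating-point drift from repeated 0.001 steps.

```python
from itertools import product

windows = [5, 10, 15, 20]                                      # 5-20, step 5
learning_rates = [round(0.001 * i, 3) for i in range(1, 101)]  # 0.001-0.1, step 0.001
hidden_units = [64, 128, 192, 256]                             # 64-256, step 64

# Encoder and decoder sizes are searched independently, hence hidden_units appears twice.
grid = [
    {"window": w, "lr": lr, "enc_hidden": he, "dec_hidden": hd}
    for w, lr, he, hd in product(windows, learning_rates, hidden_units, hidden_units)
]
print(len(grid))  # 4 * 100 * 4 * 4 = 6400
# With 5-fold cross-validation, each combination is trained 5 times (32,000 runs in total).
```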
These findings suggest that a relatively long input window enables the DA-RNN model to capture richer temporal degradation information from ESP operating data. A lower learning rate enhances training stability and reduces the risk of oscillation during optimization. Increasing the number of hidden units slightly improves representation capability, but excessively large hidden layers yield no additional benefit. Overly complex configurations may even cause overfitting, which degrades prediction accuracy on the testing set.
In addition to the grid search, several supplementary analyses were conducted to further verify the reliability of the selected hyperparameters. First, each candidate configuration was trained multiple times using different random seeds. This ensured that the superior performance of the chosen configuration was not the result of a single initialization. Second, the learning curves corresponding to each configuration were examined to verify convergence stability. Hyperparameter combinations that exhibited irregular fluctuations in the validation error or failed to converge were excluded. Third, the sensitivity of the model to variations in window length, learning rate, and hidden units was analyzed. The final configuration demonstrated high robustness, as moderate perturbations to these hyperparameters did not lead to significant performance degradation.
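The repeated-seed and one-at-a-time sensitivity checks can be sketched as follows. This is a minimal illustration under stated assumptions. `score(config, seed)` is a hypothetical callable returning a validation error, and the 10% tolerance that defines "no significant degradation" is a hypothetical threshold. The source does not give one.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

Score = Callable[[Dict, int], float]   # hypothetical: (config, seed) -> validation error

def seed_summary(score: Score, config: Dict, seeds: List[int]):
    """Mean and sample standard deviation of the validation error over several seeds."""
    errs = [score(config, s) for s in seeds]
    return mean(errs), stdev(errs)

def perturbation_check(score: Score, config: Dict, neighbors: Dict[str, List],
                       seeds: List[int], tol: float = 0.10):
    """Move one hyperparameter at a time to a neighboring grid value and report the
    relative change in mean error; True if every change stays within tol (assumed 10%)."""
    base, _ = seed_summary(score, config, seeds)
    changes = {}
    for name, values in neighbors.items():
        for v in values:
            trial = dict(config, **{name: v})
            m, _ = seed_summary(score, trial, seeds)
            changes[(name, v)] = (m - base) / base
    return all(abs(c) <= tol for c in changes.values()), changes
```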
These analyses confirm that the selected hyperparameter configuration provides a stable balance between model complexity and predictive accuracy. In addition, the variability across repeated experiments was examined to assess the robustness of the model performance. All reported results in this section were obtained by repeating the training process multiple times with different random initializations. The observed variation in prediction errors across runs remained limited, indicating stable convergence behavior and low sensitivity of the DA-RNN model to random initialization. The chosen settings form a reliable foundation for subsequent RUL prediction and the optimization of ESP operating regimes presented in the following sections.
4.2. Analysis of RUL Prediction Behavior Under Actual ESP Operating Conditions
To assess the practical applicability of the proposed RUL prediction framework, the model was applied to representative ESP operating data. The prediction results are presented in Figure 7, where the horizontal axis denotes the actual operating time of the ESP and the vertical axis the RUL in days. The predicted RUL curve closely matches the true degradation trajectory over the entire operating period. This indicates that the DA-RNN model can reliably capture both long-term degradation patterns and short-term dynamic variations in real ESP operation. For the representative well shown in Figure 7, the MAE between the predicted and actual RUL is 23 days, which quantitatively confirms the accuracy suggested by the visual agreement of the curves.
In addition to the overall agreement between predicted and true RUL values, the behavior of the model under different operating conditions provides further insights.
Figure 8 illustrates RUL prediction results for ESPs with relatively short operational lifespans. For ESPs operating for fewer than 1000 days, the RUL decreases rapidly during early production. For these short-lifespan wells, the MAE of the RUL predictions in Figure 8a–d is 57, 54, 68, and 70 days, respectively. Production data show that these wells typically exhibit limited inflow capacity at the beginning of operation. For example, the well in Figure 8a had low liquid production during the first 50 days and operated at an average pump frequency of approximately 55 Hz. In contrast, wells with comparatively longer lifespans, such as the example in Figure 8b, maintained more stable inflow conditions during early operation. The ESP in Figure 8b operated at an average frequency of 48 Hz during the first 200 days. Its RUL curve showed much smaller fluctuations, with a maximum deviation of only 14.6%. These observations indicate that early-stage inflow stability plays an essential role in shaping the degradation behavior of ESP systems.
Figure 8c provides further insight into the degradation process. In this case, the predicted RUL declines markedly during the mid-life stage. According to field monitoring data, the pump intake temperature exceeded 120 °C during this period. The elevated temperature accelerated insulation ageing and increased the risk of gas lock. As inflow gradually weakened, the ESP was operated at high frequency for extended periods in an effort to maintain production. Prolonged high-frequency operation increased motor current and internal heating, which intensified electrical and mechanical degradation. Together, these factors reduced pump efficiency and shortened the RUL of the ESP. The DA-RNN model captured this accelerated degradation stage, as reflected by the steep drop in the predicted RUL curve. This shows that the model responds sensitively to variations in operational parameters such as motor temperature, motor current, and intake pressure. It further suggests that the dual-stage attention mechanism lets the model focus on combinations of feature patterns that typically precede accelerated deterioration, such as concurrent temperature elevation and current surges. As a result, the predicted RUL trajectories closely reflect the underlying physical responses.
A related degradation pattern is observed in Figure 8d, where the RUL curve oscillates markedly throughout the mid-to-late production period. Field data indicate that these fluctuations coincide with repeated cycles of inflow instability, during which the ESP experienced alternating periods of liquid fallback and transient gas interference. These conditions caused the motor current to rise sharply and intermittently, reflecting abrupt changes in hydraulic load. The frequent transitions between liquid-rich and gas-invaded flow increased the risk of partial gas lock. They also forced the pump to operate away from its best-efficiency region. As a result, electrical and mechanical stresses accumulated more rapidly, contributing to the accelerated decline in predicted RUL. The DA-RNN model captured these oscillatory degradation patterns effectively. This demonstrates its ability to track fine-scale dynamic disturbances in operating parameters and translate them into corresponding changes in the RUL trajectory.
Overall, the analysis confirms that the DA-RNN model can accurately reflect diverse degradation behavior under field operating conditions. The model captures both macro trends and micro dynamic responses in the degradation process, enabling early identification of adverse operating states and supporting timely intervention in ESP management. These findings highlight the model’s practical value for deployment in real production environments.
4.3. Analysis of Operating Strategy Optimization for ESP Systems
To further evaluate the practical value of the proposed RUL prediction framework, this section investigates how RUL-informed insights can guide the optimization of ESP operating strategies. Two representative wells, denoted as well-E and well-F, were selected to illustrate the impact of optimization on the predicted RUL trajectory under actual field conditions.
Figure 9 and Figure 10 compare the predicted RUL under the historical operating regime with the predicted RUL under an optimized operating regime derived from RUL-based recommendations.
For well-E, the historical operation shows a steady decline in RUL, with occasional sharp drops corresponding to periods of elevated current draw and increased thermal stress. These abrupt declines are typically associated with transient increases in motor loading caused by inflow fluctuations or liquid-column oscillations. Under the optimized operating strategy, the predicted RUL curve becomes noticeably smoother toward the end of the operating timeline. The overall improvement is modest, an increase of approximately 1.95% in predicted RUL. Even so, the optimized regime extends the RUL by approximately 18 days by the end of the 100-day optimization period and suppresses the severe downward excursions observed under historical operation. The optimization primarily aligns pump frequency with real-time inflow capacity. It also curtails unnecessary high-frequency operation so that the pump runs closer to its best-efficiency region, and it mitigates abrupt load transitions. Together, these changes alleviate cumulative electrical and thermal stress.
For well-F, the effect of optimization is more pronounced. Under historical operation, the RUL declines rapidly after approximately 320 days, coinciding with frequent inflow instability and repeated frequency adjustments. These operating patterns repeatedly move the pump’s operating point across its best-efficiency region, amplifying cyclic loading and accelerating electrical and mechanical degradation. Under the RUL-guided optimization strategy, the degradation rate is significantly reduced, resulting in a 9.87% increase in predicted RUL. This corresponds to an absolute extension of approximately 45 days over the optimization horizon. The optimized RUL trajectory also remains substantially flatter. This indicates that stabilizing operating conditions and avoiding sustained high-stress states can effectively slow degradation in wells subject to volatile operating environments.
To extend the single-well analysis to the platform scale, a comparative evaluation was conducted using wells with similar background conditions, including comparable pump–motor capacity levels, commissioning age, pump-setting depth, well inclination, and operating envelopes. Based on whether the RUL-guided optimization strategy was applied, the selected wells were classified into Group A (RUL-guided operation) and Group B (experience-based operation). A concise platform-level comparison of the two groups is summarized in Table 3.
Table 3 shows that, under comparable geometric and operating conditions, wells operated with RUL-guided optimization exhibit lower annualized shutdown rates and fewer non-planned shutdown events than those operated under conventional strategies. All operating regimes considered are constrained by existing engineering limits and fall within historically observed field ranges, indicating that the comparison reflects feasible operational behavior rather than purely hypothetical control scenarios.
To further interpret these platform-level differences, aggregated operational behaviors of the two groups were examined. Wells operated under experience-based strategies tend to exhibit frequent short-term adjustments in operating frequency and current in response to inflow fluctuations, increasing exposure to transient high-load and high-stress conditions. In contrast, wells operated with RUL-guided optimization generally maintain more stable operating regimes, with fewer abrupt load transitions and reduced residence time in sustained high-stress states. This systematic difference in operational behavior provides a practical explanation for the lower shutdown-related event frequency observed at the platform scale and links the statistical trends in Table 3 to physically interpretable operating patterns.
This stability is also reflected in the response of the optimization results to small perturbations in key input variables. When inflow-related parameters, pressure conditions, and electrical load indicators vary within a narrow range representative of normal measurement uncertainty, the resulting changes in the optimization objective remain below approximately 2 percent and the recommended frequency–choke combinations do not exhibit material variation across the discretized operating space. This observation indicates that the RUL-guided optimization produces robust and stable operating recommendations rather than highly sensitive control decisions.
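The discretized frequency–choke search and the small-perturbation test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' optimizer. `predict_rul` and `predict_rate` are hypothetical stand-ins for the trained DA-RNN and a production model. The objective is assumed to be predicted RUL subject to a hypothetical minimum liquid-rate constraint. The candidate grids are assumed to be restricted to historically observed field ranges. The ±1% scaling of each state variable is a hypothetical proxy for measurement uncertainty.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical model signature: (frequency_hz, choke_pct, state) -> value
Model = Callable[[float, float, Dict[str, float]], float]

def best_operating_point(predict_rul: Model, predict_rate: Model, state: Dict[str, float],
                         freqs: Sequence[float], chokes: Sequence[float],
                         min_rate: float) -> Tuple[Optional[Tuple[float, float]], float]:
    """Exhaustively search the discretized grid and return the feasible pair with the
    highest predicted RUL (ties go to the first pair found) and its objective value."""
    feasible = [(f, c) for f, c in product(freqs, chokes)
                if predict_rate(f, c, state) >= min_rate]   # hypothetical production floor
    if not feasible:
        return None, float("nan")
    best = max(feasible, key=lambda fc: predict_rul(fc[0], fc[1], state))
    return best, predict_rul(best[0], best[1], state)

def perturbation_stability(predict_rul: Model, predict_rate: Model, state: Dict[str, float],
                           freqs: Sequence[float], chokes: Sequence[float], min_rate: float,
                           rel_noise: float = 0.01) -> Tuple[float, bool]:
    """Scale each state variable by (1 - rel_noise) and (1 + rel_noise) in turn, re-run the
    search, and return (largest relative objective change, whether the choice moved)."""
    base_pt, base_obj = best_operating_point(predict_rul, predict_rate, state,
                                             freqs, chokes, min_rate)
    if base_pt is None:
        raise ValueError("no feasible operating point for the unperturbed state")
    worst, moved = 0.0, False
    for key in state:
        for sign in (-1.0, 1.0):
            perturbed = dict(state)
            perturbed[key] = state[key] * (1.0 + sign * rel_noise)
            pt, obj = best_operating_point(predict_rul, predict_rate, perturbed,
                                           freqs, chokes, min_rate)
            if pt is None:                 # perturbation made every candidate infeasible
                return float("inf"), True
            worst = max(worst, abs(obj - base_obj) / abs(base_obj))
            moved = moved or pt != base_pt
    return worst, moved
```

Under this framing, a stable recommendation means a small `worst` value with `moved` remaining False.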
Beyond robustness, the reliability indicators in Table 3 also imply tangible operational and economic benefits of the RUL-guided optimization. Compared with experience-based operation, Group A shows a lower non-planned shutdown frequency (0.31 vs. 0.39 events per well-year), i.e., 0.08 fewer unplanned events per well-year. This corresponds to approximately 1.68 fewer events per year across the 21 wells in Group B. Under the reported operating conditions, the average oil rate is on the order of 164–167 m³/d. The estimate assumes a typical offshore downtime of D = 1–5 days per non-planned shutdown, an oil price of 75 USD/bbl (≈472 USD/m³), and a direct corrective-intervention cost of 30–80 k USD per event. Under these assumptions, the reduced shutdown frequency implies about 13–67 m³ of protected oil production per well per year (0.08 events × D days × oil rate), or ≈6–32 k USD/yr in avoided production loss. It also implies an additional ≈2–6 k USD/yr in reduced intervention costs, yielding combined savings of roughly 9–38 k USD per well per year. At the 21-well scale, this corresponds to ≈0.18–0.80 M USD per year. These estimates are scenario-dependent, but they quantitatively link the reliability improvements to reduced downtime risk and a lower corrective-maintenance burden under engineering-feasible operating regimes.
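The economic estimate above is simple arithmetic. The following sketch shows how each range follows from the stated assumptions: shutdown frequencies from Table 3, an oil rate of 164–167 m³/d, downtime of 1–5 days, 472 USD/m³, and 30–80 k USD per intervention. The function name is illustrative only. The lower bound combines the lowest rate, downtime, and intervention cost, and the upper bound combines the highest.

```python
def shutdown_savings(delta_events, oil_rate_m3_d, downtime_days, price_usd_m3, intervention_usd):
    """Per-well annual effect of avoiding `delta_events` non-planned shutdowns per well-year."""
    protected_oil_m3 = delta_events * downtime_days * oil_rate_m3_d   # m3 per well per year
    production_usd = protected_oil_m3 * price_usd_m3                  # avoided production loss
    intervention_saved_usd = delta_events * intervention_usd          # avoided corrective cost
    return (protected_oil_m3, production_usd, intervention_saved_usd,
            production_usd + intervention_saved_usd)

DELTA = 0.39 - 0.31      # fewer non-planned shutdowns per well-year (Group B minus Group A)
PRICE_USD_M3 = 472       # 75 USD/bbl expressed per cubic meter
WELLS = 21

low = shutdown_savings(DELTA, 164, 1, PRICE_USD_M3, 30_000)    # lower bound
high = shutdown_savings(DELTA, 167, 5, PRICE_USD_M3, 80_000)   # upper bound
for label, s in (("low", low), ("high", high)):
    print(f"{label}: {s[0]:.1f} m3/well/yr, {s[3] / 1e3:.1f} k USD/well/yr, "
          f"{s[3] * WELLS / 1e6:.2f} M USD/yr for {WELLS} wells")
```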
The interaction between prediction-based optimization and subsequent operation also warrants discussion. Optimized operating adjustments may influence subsequent system behavior, so overly frequent interaction between RUL prediction and operating adjustment could increase operating fluctuations. In this study, the DA-RNN model was trained offline and applied without online updating, and operating adjustments were implemented in a controlled manner. This design helps maintain stable and realistic operating regimes consistent with field practice.
Overall, the comparative analysis demonstrates that the effectiveness of RUL-guided optimization depends on baseline operating stability. Wells with relatively stable inflow conditions tend to show incremental improvements, whereas wells subjected to volatile or rapidly deteriorating conditions can experience substantial extension in predicted RUL. Across both single-well cases and platform-level evaluation, the results consistently indicate that RUL-informed operational adjustment not only delays degradation but also stabilizes system behavior. These findings confirm that the proposed framework can translate degradation predictions into actionable operational guidance, supporting proactive, data-driven optimization of ESP performance. Importantly, the observed reduction in shutdown risk is associated with engineering-constrained and historically consistent operating adjustments, rather than representing a purely model-level counterfactual outcome.