Article

Real-Time Efficiency Prediction in Nonlinear Fractional-Order Systems via Multimodal Fusion

School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Fractal Fract. 2025, 9(8), 545; https://doi.org/10.3390/fractalfract9080545
Submission received: 1 July 2025 / Revised: 7 August 2025 / Accepted: 17 August 2025 / Published: 19 August 2025
(This article belongs to the Special Issue Artificial Intelligence and Fractional Modelling for Energy Systems)

Abstract

Rod pump systems are complex nonlinear processes, and conventional efficiency prediction methods for such systems typically rely on high-order fractional partial differential equations, which severely constrain real-time inference. Motivated by the increasing availability of measured electrical power data, this paper introduces a series of prediction models for the efficiency of nonlinear fractional-order PDE systems based on multimodal feature fusion. First, three single-model prediction approaches (Progressive Cross-Fusion, Adaptive-Weight Late Fusion, and Two-Stage Progressive Feature Fusion) are presented; next, two ensemble approaches are developed, one based on a Parallel-Series Cascade Ensemble strategy and the other on Data Envelopment Analysis; finally, by balancing base-learner diversity with predictive accuracy, a multi-strategy ensemble prediction model is devised for online rod pump system efficiency estimation. Comprehensive experiments and ablation studies on data from 3938 oil wells demonstrate that the proposed methods deliver high predictive accuracy while meeting real-time performance requirements.

1. Introduction

With the sustained expansion of global energy demands, petroleum has established itself as a strategically vital energy resource that underpins the international economic framework. The beam pumping unit system [1], a predominant artificial lift technology for crude oil extraction, has been extensively deployed in stripper and marginal well operations due to its well-documented operational advantages. Characterized by mechanical robustness, operational reliability, and cost-effectiveness, this reciprocating pump system has become ubiquitous across major oilfield operations worldwide. However, prolonged oilfield depletion [2] has made the operational efficiency of beam pumping units an increasingly critical factor in production economics and energy optimization. As a result, the real-time efficiency monitoring and performance evaluation of beam pumping systems have emerged as key research priorities for improving oilfield production management and promoting sustainable energy utilization.
Traditional prediction models for estimating the efficiency of beam pumping systems include model-based approaches utilizing mathematical models and data-driven approaches based on historical data. The model-based approaches require solving intricate nonlinear fractional-order partial differential equations of the following representative form:
$$\rho A \frac{\partial^2 u(x,t)}{\partial t^2} = \frac{\partial}{\partial x}\!\left(EA\,\frac{\partial u(x,t)}{\partial x}\right) + EA\,\tau^{\alpha}\,\frac{\partial^{\alpha}}{\partial t^{\alpha}}\!\left(\frac{\partial u(x,t)}{\partial x}\right) + f(x,t)$$
where $u(x,t)$ is the axial displacement of the rod string, $\rho$ the rod density, $A$ the cross-sectional area, $E$ the elastic modulus, $\tau$ a relaxation parameter associated with the fractional order $\alpha$, and $f(x,t)$ the distributed external force.
However, these traditional models, while offering valuable insights by combining geological and mechanical data, have three main drawbacks. They require large, detailed datasets and complex equations, struggle with the system’s nonlinear and time-varying behavior, and demand heavy computation, making real-time prediction and adaptive control impractical.
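To make the computational burden concrete, the fractional term $\partial^{\alpha}/\partial t^{\alpha}$ is commonly discretized with the Grünwald–Letnikov scheme, in which every new time step sums over the entire solution history. The NumPy sketch below illustrates this (a generic discretization under standard assumptions, not the solver used by any cited model):

```python
import numpy as np

def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k),
    computed with the standard recurrence."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def frac_derivative_latest(u_hist: np.ndarray, alpha: float, dt: float) -> float:
    """Approximate d^alpha u / dt^alpha at the newest time step.
    The entire history u(t_0)...u(t_n) enters the sum, so the cost of
    each new step grows with the simulation length, which is what makes
    long-horizon real-time solution of the rod-string PDE expensive."""
    w = gl_weights(alpha, len(u_hist))
    return float(np.dot(w, u_hist[::-1])) / dt**alpha

# Toy check: for alpha close to 1 the result approaches a backward difference.
t = np.linspace(0.0, 1.0, 200)
u = np.sin(2.0 * np.pi * t)
print(frac_derivative_latest(u, 0.99, t[1] - t[0]))
```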
The data-driven soft sensing approaches for system efficiency estimation exhibit strong dependence on high-quality datasets while suffering from inherent limitations in physical interpretability, thereby significantly constraining parameter optimization and practical engineering applications. Furthermore, individual models demonstrate notable vulnerability to outliers, and conventional ensemble strategies are prone to either sensitivity issues or overfitting risks, ultimately compromising the real-time performance of efficiency prediction. The utilization of measured electrical power time-series data for system efficiency prediction enhances real-time monitoring capabilities while introducing new technical challenges. As a characteristic temporal data modality, the power sequences combine with existing system parameters to form a heterogeneous multimodal dataset. Consequently, developing effective multimodal feature fusion methodologies emerges as the critical pathway for achieving substantial improvements in prediction accuracy.
To address these challenges, we propose a series of prediction models for evaluating the system efficiency of beam pumping wells. First, three base learners are developed: a Progressive Cross-Fusion model, an Adaptive-Weight Late Fusion model, and a Two-Stage Progressive Feature Fusion model. Next, two ensemble frameworks are constructed using these base learners: a Parallel-Series Cascade Ensemble and a Data Envelopment Analysis-based ensemble. Finally, we introduce a multi-strategy ensemble prediction model for beam pumping well efficiency that balances base-learner diversity with predictive accuracy. The contributions of this work are as follows:
(1)
First, we propose three base fusion-based prediction models for real-time efficiency prediction in nonlinear fractional-order partial differential equation systems: Progressive Cross-Fusion, Adaptive-Weight Late Fusion, and Two-Stage Progressive Feature Fusion.
(2)
Second, we develop two ensemble strategies that integrate the base prediction models for real-time efficiency prediction in nonlinear fractional-order partial differential equation systems: a Parallel-Series Cascade strategy and a Data Envelopment Analysis strategy.
(3)
Finally, we introduce a multi-strategy ensemble prediction model for real-time efficiency prediction in nonlinear fractional-order partial differential equation systems.

2. Related Work

2.1. Mathematical Models

Prediction models for beam pumping system efficiency have predominantly relied on mathematical models grounded in the rod–string's longitudinal vibration. Gibbs [3] first derived the fundamental rod–string vibration equation. Lekia [4] then introduced a rod–fluid coupled vibration simulation model. Xing [5,6] proposed a strongly nonlinear longitudinal vibration simulation model, while Moreno [7] formulated a nonlinear vibration equation for directional wells. Tarmigh [8] presented a two-phase flow-based vibration equation, and Yin [9] obtained an analytical solution for multi-tapered rod–string vibration. Wang [10] developed a simplified solid–thermal vibration model, and Li [11] established an equivalent damping coefficient via friction energy-conservation principles, accounting for viscous and local damping losses. Ma [12] constructed a multiphase-flow simulation prediction model, whereas Langbauer [13] applied finite-element methods to derive the vibration equation. Lukasiewicz [14] addressed deviated wells, and Wang [15] proposed a gas–liquid separation-based model. Lekia [4] also formulated motion equations for surface equipment coupled with rod–fluid dynamics, and Wang [16] introduced an axial–transverse coupled vibration simulation for deviated wells. Finally, Dong [17] examined the effects of real-time power-frequency variation and motor–load torque fluctuation on crank motion, rod–string vibration, and system power parameters. Although these traditional models integrate geological and mechanical data to yield valuable theoretical insights, they exhibit three primary limitations. First, they demand extensive, high-fidelity datasets and intricate mathematical formulations. Second, they lack robustness in capturing the intrinsic nonlinear dynamics and time-varying characteristics of the system. Third, their substantial computational overhead renders real-time prediction and adaptive control impractical and cost-prohibitive.

2.2. Prediction Models of System Efficiency Based on Historical Data

With the advancement of machine-learning technologies, an increasing number of studies have tackled the prediction of beam pumping system efficiency using data-driven approaches. Tan [18] employed time-series models, while Ma [19,20] investigated both graph-neural-network architectures and stacking-ensemble frameworks. However, these methods depend on large historical datasets and typically utilize either a single-model strategy or conventional ensemble techniques; consequently, they [21,22,23] often lack real-time responsiveness, exhibit reduced prediction accuracy, and suffer from limited robustness.

2.3. Multimodal Feature Fusion Networks

In recent years, advancements in sensor technologies have enabled the acquisition of multimodal data characterized by heterogeneous modalities. To address multimodal feature-fusion challenges, various methodologies have been proposed and successfully applied across domains. Zhou [24] developed an adversarial-learning-assisted perception importance fusion network. Chen [25] and Wang [26] introduced multimodal fusion techniques for sentiment analysis, while Islam and Zhao [27,28] proposed multimodal human-recognition systems. Moreover, multimodal fusion [29] has been exploited for fault diagnosis, fake information detection [30] using progressive fusion networks, depression detection [31], and human recognition [27]. Despite these advances, existing frameworks are tailored to specific problem domains and do not adequately capture the heterogeneous data types (string-based, numerical, and sequential modalities) intrinsic to sucker rod pumping system efficiency prediction; consequently, current approaches remain insufficient for this application.

3. Basic Definitions

In this paper, “PCFE” denotes the Progressive Cross-Fusion Efficiency prediction model for rod pumping systems; “AWFE” denotes the Adaptive-Weight Late Fusion Efficiency prediction model; “TSPE” denotes the Two-Stage Progressive Feature-Fusion Efficiency prediction model; “EPCI” denotes the Parallel-Series Cascade Ensemble strategy model; “EDEA” denotes the Data Envelopment Analysis-based Online Efficiency prediction model; and “MEIE” denotes the Multi-strategy Ensemble Integration model for the online efficiency prediction of rod pumping systems. “QR” denotes the quantile regression layer.

4. Methodology

4.1. Prediction Models of Beam Pumping System Efficiency Based on Progressive Cross-Fusion

The factors influencing beam pumping system efficiency involve three types of data: string data, sequential data, and numerical data. Conventional feature processing methods often extract these features and concatenate them directly as the final feature set. This approach, however, frequently results in high dimensionality, information redundancy, and a failure to capture critical inter-feature relationships [32,33,34]. To overcome these limitations, we propose a prediction model for beam pumping system efficiency based on Progressive Cross-Fusion Efficiency (PCFE). The PCFE model comprises three modules: a feature extraction module, a feature fusion module, and a prediction module. The detailed structure and function of each module are described below, with the overall workflow illustrated in Figure 1 and Table 1.
In the feature extraction phase, we construct a multi-scale feature extraction framework through progressive integration of Residual Networks (ResNet), Transformer architecture, and Cross-Attention mechanisms. This integrated approach enables effective feature extraction for sequential data. For string-type data, feature extraction is performed using one-hot encoding. The mathematical formulation of the feature extraction process is defined as follows:
$$X = \{x_1, x_2, x_3, \ldots\}$$
$$Z = \{z_1, z_2, z_3, \ldots\}$$
$$F = \mathrm{ResNet}(X)$$
$$T = \mathrm{TransformerEncoder}(F)$$
$$X_1 = \mathrm{Concat}\left[\mathrm{softmax}\!\left(\frac{(T W_h^{Q})(T W_h^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) T W_h^{V}\right]_{h=1}^{H} W^{O}$$
$$Z_1 = \mathrm{Onehot}(Z)$$
where $X$ is the raw sequence data; $Z$ is the raw string data; $X_1$ denotes the features extracted from the sequence data; $Z_1$ denotes the features extracted from the string data; $\mathrm{ResNet}$ is a residual neural network; $\mathrm{TransformerEncoder}$ is the Transformer encoder; $\mathrm{Onehot}$ is the one-hot encoding mechanism; $W_h^{Q}$, $W_h^{K}$, $W_h^{V}$, and $W^{O}$ are learned weight matrices; $Q$, $K$, and $V$ denote the query, key, and value; $d_h$ is the per-head dimension; and $H$ is the number of attention heads.
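As a concrete illustration of this pipeline, the PyTorch sketch below chains a residual 1-D convolution block (standing in for the full ResNet backbone), a Transformer encoder, and multi-head self-attention for sequence features, plus one-hot encoding for string features. Layer sizes are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SequenceFeatureExtractor(nn.Module):
    """Sketch of the ResNet -> TransformerEncoder -> multi-head
    self-attention pipeline that maps raw sequences X to features X1.
    A single residual conv block stands in for the full ResNet."""

    def __init__(self, in_channels: int = 1, d_model: int = 8, n_heads: int = 8):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, d_model, kernel_size=7, stride=2, padding=3)
        self.res = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.conv(x)                       # x: (batch, channels, length)
        f = torch.relu(f + self.res(f))        # residual connection
        f = f.transpose(1, 2)                  # -> (batch, length, d_model)
        t = self.encoder(f)
        x1, _ = self.attn(t, t, t)             # multi-head self-attention
        return x1

extractor = SequenceFeatureExtractor()
x1 = extractor(torch.randn(4, 1, 128))         # e.g. electrical power curves

# String data Z -> Z1 via one-hot encoding of category indices.
z = torch.tensor([0, 2, 1])
z1 = torch.nn.functional.one_hot(z, num_classes=3).float()
```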
To better capture the structural characteristics of raw data and mitigate cross-modal disparities, this study proposes a progressive cross-feature fusion approach to enhance model robustness. The methodology integrates three components: Bidirectional Long Short-Term Memory (BiLSTM), Bidirectional Gated Recurrent Unit (BiGRU), and Cross-Attention mechanisms. Initially, features extracted from string data Z 1 undergo processing through the BiLSTM network, mathematically formulated as follows:
$$Z_2 = \left[\overrightarrow{\mathrm{LSTM}}(Z_{1,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{LSTM}}(Z_{1,t}, \overleftarrow{h}_{t-1})\right]$$
where $\mathrm{LSTM}$ denotes a long short-term memory network and the two directions together form the BiLSTM.
Subsequently, the output $Z_2$ from the BiLSTM network and the extracted sequence features $X_1$ are passed to the Cross-Attention module to enable deep interaction between the two, formulated as follows:
$$Z_3 = \mathrm{softmax}\!\left(\frac{(Z_2 W^{Q})(X_1 W^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) X_1 W^{V}$$
where $W^{Q}$, $W^{K}$, and $W^{V}$ are learned weight matrices.
Finally, the output $Z_3$ of the Cross-Attention module is fed into the BiGRU network for further processing. The output of the BiGRU is then combined with the numerical data features via another Cross-Attention module to enable deep interaction, ultimately producing the fused features:
$$Z_{\mathrm{BiGRU}} = \left[\overrightarrow{\mathrm{GRU}}(Z_{3,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(Z_{3,t}, \overleftarrow{h}_{t-1})\right]$$
$$Z_4 = \mathrm{softmax}\!\left(\frac{(S W^{Q})(Z_{\mathrm{BiGRU}} W^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) Z_{\mathrm{BiGRU}} W^{V}$$
where $Z_2$ and $Z_3$ are intermediate fusion variables; $Z_4$ is the final fused feature set; $S$ denotes the numerical data features; $\mathrm{GRU}$ is a gated recurrent unit network; and $W^{Q}$, $W^{K}$, and $W^{V}$ are learned weight matrices.
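For concreteness, a minimal single-head cross-attention module matching the $Z_3$ and $Z_4$ equations might be written as follows (dimensions are illustrative assumptions; the paper's multi-head variant is analogous):

```python
import math
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Single-head cross-attention in the spirit of the Z3/Z4 equations:
    queries come from one modality, keys and values from the other."""

    def __init__(self, d_q: int, d_kv: int, d_h: int):
        super().__init__()
        self.wq = nn.Linear(d_q, d_h, bias=False)
        self.wk = nn.Linear(d_kv, d_h, bias=False)
        self.wv = nn.Linear(d_kv, d_h, bias=False)

    def forward(self, query_feats: torch.Tensor, kv_feats: torch.Tensor) -> torch.Tensor:
        q, k, v = self.wq(query_feats), self.wk(kv_feats), self.wv(kv_feats)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        return torch.softmax(scores, dim=-1) @ v

# e.g. fusing BiLSTM string features (queries) with sequence features (keys/values):
fuse = CrossAttention(d_q=50, d_kv=8, d_h=32)
z3 = fuse(torch.randn(4, 10, 50), torch.randn(4, 20, 8))   # -> (4, 10, 32)
```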
During prediction, traditional single models often suffer from inaccurate predictions, vulnerability to disturbances, and a tendency to predict only conditional means—a limitation that may lead to local optima. To overcome this, we integrate CNN, BiGRU, an Attention mechanism, and a Quantile Regression (QR) layer, proposing a novel QRCNN–BiGRU–Attention model. The mathematical formulation of the model is presented below:
$$Z_z = \left[\overrightarrow{\mathrm{GRU}}(\mathrm{CNN}(Z_4), \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(\mathrm{CNN}(Z_4), \overleftarrow{h}_{t-1})\right]$$
$$Z_{\mathrm{Attention}} = \mathrm{softmax}\!\left(\frac{Z_z Z_z^{\mathsf{T}}}{\sqrt{d_k}}\right) Z_z$$
$$y = \mathrm{QR}(Z_{\mathrm{Attention}})$$
where $y$ is the final predicted value; $\mathrm{CNN}$ is a convolutional neural network; and $\mathrm{QR}$ is the quantile regression layer.

4.2. Adaptive-Weight Late-Fusion Prediction Models of Pumping Well System Efficiency

Today, widely used multi-modal feature fusion methods primarily include early fusion [32], intermediate fusion [33], and late fusion [34]. Late fusion strategies typically rely on summation, multiplication, and convolution operations. However, these approaches are often sensitive to scale and noise, struggle to handle heterogeneous features, and require additional parameter tuning. To address these limitations, we propose a prediction method for beam pumping system efficiency based on a late fusion strategy inspired by the human evolutionary optimization algorithm. The proposed model consists of three main modules: a feature extraction module, an intermediate prediction module, and an Adaptive Weighting-based Feature Fusion module (AWFE). Each module is described in detail below, with the overall workflow illustrated in Figure 2 and Table 2.
In the feature extraction stage, we progressively integrate ResNet, Transformer, and Cross-Attention to construct a multi-scale feature extraction approach, enabling effective feature extraction for sequential data. For string data, feature extraction is performed using one-hot encoding. The mathematical model for the entire feature extraction process is described as follows:
$$X = \{x_1, x_2, x_3, \ldots\}$$
$$Z = \{z_1, z_2, z_3, \ldots\}$$
$$F = \mathrm{ResNet}(X)$$
$$T = \mathrm{TransformerEncoder}(F)$$
$$X_1 = \mathrm{Concat}\left[\mathrm{softmax}\!\left(\frac{(T W_h^{Q})(T W_h^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) T W_h^{V}\right]_{h=1}^{H} W^{O}$$
$$Z_1 = \mathrm{Onehot}(Z)$$
where $X$ is the raw sequence data; $Z$ is the raw string data; $X_1$ denotes the features extracted from the sequence data; $Z_1$ denotes the features extracted from the string data; $\mathrm{ResNet}$ is a residual neural network; $\mathrm{TransformerEncoder}$ is the Transformer encoder; $\mathrm{Onehot}$ is the one-hot encoding mechanism; $W_h^{Q}$, $W_h^{K}$, $W_h^{V}$, and $W^{O}$ are learned weight matrices; and $Q$, $K$, and $V$ denote the query, key, and value, respectively.
In the intermediate prediction module, inspired by the Boosting ensemble learning framework, we integrate CNN, BiGRU, Cross-Attention, BiLSTM, and quantile regression models in a cascaded manner. This leads to the proposal of two ensemble prediction models: the QRCNN–BiGRU–Cross-Attention model and the QRCNN-BiLSTM-BiGRU model. Initially, we use the Cross-Attention module to facilitate interaction between the numerical data and the extracted string features, as well as the extracted sequence features. The mathematical formulation of this process is presented below:
$$M_1 = \left[\overrightarrow{\mathrm{LSTM}}(Z_{1,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{LSTM}}(Z_{1,t}, \overleftarrow{h}_{t-1})\right]$$
$$M_2 = \mathrm{FNN}(S)$$
$$M_3 = \mathrm{softmax}\!\left(\frac{(M_2 W^{Q})(M_1 W^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) M_1 W^{V}$$
$$M_4 = \mathrm{softmax}\!\left(\frac{(M_2 W^{Q})(X_1 W^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) X_1 W^{V}$$
where $M_1$ through $M_4$ are intermediate variables generated from the data; $\mathrm{FNN}$ is a fully connected neural network; and $W^{Q}$, $W^{K}$, and $W^{V}$ are learned weight matrices.
Subsequently, the fused intermediate variable is utilized as the input feature for the QRCNN–BiGRU–Cross-Attention ensemble prediction model to generate predictions. Similarly, the fused intermediate variable is also employed as the input feature for the QRCNN-BiLSTM-BiGRU ensemble prediction model. The mathematical formulations of these processes are presented below:
$$Z_{z1} = \left[\overrightarrow{\mathrm{GRU}}(\mathrm{CNN}(M_3), \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(\mathrm{CNN}(M_3), \overleftarrow{h}_{t-1})\right]$$
$$Z_{\mathrm{CrossAttention}\text{-}1} = \mathrm{softmax}\!\left(\frac{(Z_{z1} W^{Q})(Z_{z1} W^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) Z_{z1} W^{V}$$
$$y_{11} = \mathrm{QR}(Z_{\mathrm{CrossAttention}\text{-}1})$$
$$Z_{z2} = \left[\overrightarrow{\mathrm{LSTM}}(\mathrm{CNN}(M_4), \vec{h}_{t-1});\ \overleftarrow{\mathrm{LSTM}}(\mathrm{CNN}(M_4), \overleftarrow{h}_{t-1})\right]$$
$$Z_{z3} = \left[\overrightarrow{\mathrm{GRU}}(Z_{z2,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(Z_{z2,t}, \overleftarrow{h}_{t-1})\right]$$
$$y_{12} = \mathrm{QR}(Z_{z3})$$
where $y_{11}$ and $y_{12}$ are the intermediate prediction variables.
The final prediction is obtained by adaptively weighting the two intermediate predictions:
$$y = w_1 y_{11} + w_2 y_{12}$$
$$(w_1, w_2) = \mathrm{HEOA}(\mathrm{MSE})$$
where $y$ is the final predicted value; $w_1$ and $w_2$ are the fusion weights; and $\mathrm{HEOA}$ denotes the human evolutionary optimization algorithm, which selects the weights by minimizing the mean squared error.
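The sketch below illustrates this adaptive late-fusion step. Since HEOA implementations vary, a plain random search over the constraint $w_1 + w_2 = 1$ stands in for the optimizer; the function and variable names are illustrative:

```python
import numpy as np

def fuse_weights(y11: np.ndarray, y12: np.ndarray, y_true: np.ndarray,
                 n_iter: int = 1000, seed: int = 0) -> tuple:
    """Search for late-fusion weights (w1 + w2 = 1) minimizing MSE.
    Plain random search stands in for HEOA here."""
    rng = np.random.default_rng(seed)
    best_w1, best_mse = 0.5, np.inf
    for w1 in rng.uniform(0.0, 1.0, n_iter):
        mse = np.mean((w1 * y11 + (1.0 - w1) * y12 - y_true) ** 2)
        if mse < best_mse:
            best_w1, best_mse = w1, mse
    return best_w1, 1.0 - best_w1

w1, w2 = fuse_weights(np.array([0.70, 0.80]), np.array([0.60, 0.90]),
                      np.array([0.65, 0.85]))
```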

4.3. Prediction Models of Pumping Well System Efficiency Based on Two-Stage Progressive Feature Fusion

Currently, widely used multi-modal feature fusion methods primarily include early fusion [32], intermediate fusion [33], and late fusion [34]. Early feature fusion often suffers from drawbacks such as the curse of dimensionality and noise propagation. Intermediate feature fusion typically faces high implementation complexity. Late fusion, on the other hand, often fails to capture deep-level interactions between modalities. To address these challenges, we propose a prediction method for beam pumping system efficiency based on a Two-Stage Progressive Feature Fusion approach (TSPE). The model consists of three key phases: a feature extraction phase, a first-stage prediction model, and a second-stage prediction model. A detailed description of the model is provided below, with the workflow illustrated in Figure 3 and Table 3.

In the feature extraction stage, we progressively integrate ResNet, Transformer, and Cross-Attention to construct a multi-scale feature extraction approach, enabling effective feature extraction for sequential data. For string data, feature extraction is performed using one-hot encoding. The mathematical model for the entire feature extraction process is described as follows:
$$X = \{x_1, x_2, x_3, \ldots\}$$
$$Z = \{z_1, z_2, z_3, \ldots\}$$
$$F = \mathrm{ResNet}(X)$$
$$T = \mathrm{TransformerEncoder}(F)$$
$$X_1 = \mathrm{Concat}\left[\mathrm{softmax}\!\left(\frac{(T W_h^{Q})(T W_h^{K})^{\mathsf{T}}}{\sqrt{d_h}}\right) T W_h^{V}\right]_{h=1}^{H} W^{O}$$
$$Z_1 = \mathrm{Onehot}(Z)$$
where $X$ is the raw sequence data; $Z$ is the raw string data; $X_1$ denotes the features extracted from the sequence data; $Z_1$ denotes the features extracted from the string data; $\mathrm{ResNet}$ is a residual neural network; $\mathrm{TransformerEncoder}$ is the Transformer encoder; $\mathrm{Onehot}$ is the one-hot encoding mechanism; $W_h^{Q}$, $W_h^{K}$, $W_h^{V}$, and $W^{O}$ are learned weight matrices; and $Q$, $K$, and $V$ denote the query, key, and value, respectively.
In the first-stage prediction model phase, to address the limitations of single prediction models in terms of accuracy and robustness to disturbances, we propose an ensemble prediction model based on QRBiLSTM–BiGRU–Attention. Initially, the sequence data features X 1 , string data features Z 1 , and numerical data features S are concatenated. Subsequently, predictions are generated using the QRBiLSTM–BiGRU–Attention ensemble prediction model. The mathematical formulation of this process is presented below:
$$Z_{z5} = \left[\overrightarrow{\mathrm{LSTM}}([X_1, Z_1, S], \vec{h}_{t-1});\ \overleftarrow{\mathrm{LSTM}}([X_1, Z_1, S], \overleftarrow{h}_{t-1})\right]$$
$$Z_{z6} = \left[\overrightarrow{\mathrm{GRU}}(Z_{z5,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(Z_{z5,t}, \overleftarrow{h}_{t-1})\right]$$
$$Z_{\mathrm{Attention}\text{-}2} = \mathrm{softmax}\!\left(\frac{Z_{z6} Z_{z6}^{\mathsf{T}}}{\sqrt{d_k}}\right) Z_{z6}$$
$$y_{13} = \mathrm{QR}(Z_{\mathrm{Attention}\text{-}2})$$
where $y_{13}$ is the intermediate prediction variable.
In the second-stage prediction model phase, to further enhance prediction accuracy and improve robustness to disturbances, we propose an ensemble prediction model based on QRBiRNN-BiGRU-BiLSTM. Initially, the intermediate prediction variables, sequence data features, string data features, and numerical data features are concatenated. Subsequently, predictions are generated using the QRBiRNN-BiGRU-BiLSTM ensemble prediction model. The mathematical formulation of this process is presented below:
$$Z_{z7} = \left[\overrightarrow{\mathrm{RNN}}([X_1, Z_1, S, y_{13}], \vec{h}_{t-1});\ \overleftarrow{\mathrm{RNN}}([X_1, Z_1, S, y_{13}], \overleftarrow{h}_{t-1})\right]$$
$$Z_{z8} = \left[\overrightarrow{\mathrm{GRU}}(Z_{z7,t}, \vec{h}_{t-1});\ \overleftarrow{\mathrm{GRU}}(Z_{z7,t}, \overleftarrow{h}_{t-1})\right]$$
$$Z_{\mathrm{Attention}\text{-}3} = \mathrm{softmax}\!\left(\frac{Z_{z8} Z_{z8}^{\mathsf{T}}}{\sqrt{d_k}}\right) Z_{z8}$$
$$y = \mathrm{QR}(Z_{\mathrm{Attention}\text{-}3})$$
where $y$ is the final predicted value and $\mathrm{RNN}$ is a recurrent neural network.
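The two-stage mechanism reduces to a simple stacking pattern: the stage-one prediction is appended to the feature matrix before the stage-two model refines it. The sketch below shows that pattern with scikit-learn linear models standing in for the QR ensembles (an illustrative simplification, not the paper's networks):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def two_stage_predict(stage1, stage2, features: np.ndarray) -> np.ndarray:
    """TSPE-style progressive fusion: the first-stage prediction y13 is
    appended to the original features before the second stage refines it."""
    y13 = stage1.predict(features)
    augmented = np.column_stack([features, y13])
    return stage2.predict(augmented)

# Toy usage with linear models standing in for the QR ensembles.
X = np.random.rand(100, 5)
y = X.sum(axis=1)
stage1 = LinearRegression().fit(X, y)
stage2 = LinearRegression().fit(np.column_stack([X, stage1.predict(X)]), y)
print(two_stage_predict(stage1, stage2, X)[:3])
```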

4.4. Prediction Models of Pumping Well System Efficiency Based on the Parallel-Series Cascade Ensemble Strategy

Ensemble learning is a machine learning approach that combines the predictions of multiple models to enhance overall performance. Common strategies include Bagging and Boosting. The Bagging strategy reduces variance, exhibits strong robustness, and is easily parallelizable, but its ability to reduce bias is limited, making it suitable for high-variance models. In contrast, the Boosting strategy reduces bias and demonstrates strong predictive power, but it is slower to train, prone to overfitting, and sensitive to noise. To fully leverage the advantages of both Bagging and Boosting, we propose a prediction method for beam pumping system efficiency based on a Parallel-Series Cascade Ensemble learning strategy (EPCI). The model primarily consists of two components: an initial Adaptive-Weight Parallel Ensemble prediction model and a subsequent Series Ensemble prediction model. A detailed description of the entire model is provided below, with the workflow illustrated in Figure 4 and Table 4.
During the initial modeling phase, we propose an adaptive parallel ensemble prediction framework. This architecture integrates PCFE and AWFE modules through an adaptively weighted late fusion mechanism. The weight coefficients are optimized via the Genghis Khan Shark Optimization (GKSO) algorithm to enhance prediction accuracy. The mathematical formulation of this integrated process is defined as follows:
$$y_6 = w_1 y_{\mathrm{PCFE}} + w_2 y_{\mathrm{AWFE}}$$
$$(w_1, w_2) = \mathrm{GKSO}(\mathrm{MSE})$$
where $\mathrm{GKSO}$ is the Genghis Khan Shark Optimization algorithm and $w_1$ and $w_2$ are the weights it returns by minimizing the mean squared error.
In the later Series Ensemble prediction model phase, to fully leverage the advantages of various ensemble strategies, we innovatively integrate the two-stage Progressive Feature Fusion-based online prediction model for pumping well system efficiency with the initial Adaptive Parallel Ensemble prediction model in a serial manner. First, the prediction results from the initial Adaptive Parallel Ensemble prediction model are concatenated with the features extracted during the feature extraction phase.
$$X_5 = \mathrm{connection}(y_6, X_1, Z_1, S)$$
where $\mathrm{connection}$ denotes feature concatenation.
Subsequently, the Two-Stage Progressive Feature Fusion prediction model for pumping well system efficiency is employed as the base learner for training, yielding more accurate prediction results. The mathematical formulation of this process is presented below:
$$y = \mathrm{TSPE}(X_5)$$
where $\mathrm{TSPE}$ is the Two-Stage Progressive Feature Fusion prediction model.
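Functionally, EPCI composes the pieces defined above. Assuming the PCFE/AWFE predictions, the GKSO-found weights, and a fitted TSPE model are available, the orchestration might be sketched as:

```python
import numpy as np

def epci_predict(y_pcfe, y_awfe, tspe_model, extracted_feats, w1, w2):
    """EPCI sketch: the parallel stage averages the PCFE and AWFE outputs
    with GKSO-found weights (assumed given); the serial stage concatenates
    that estimate with the extracted features and passes the result to a
    fitted TSPE model."""
    y6 = w1 * y_pcfe + w2 * y_awfe                 # parallel ensemble
    x5 = np.column_stack([y6, extracted_feats])    # the "connection" step
    return tspe_model.predict(x5)                  # serial (cascade) stage
```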

4.5. Online Prediction Models of Pumping Well System Efficiency Based on the Data Envelopment Analysis Ensemble Strategy

Today, most methods for combining base learners rely on voting and averaging techniques. Simple ensemble methods such as voting and averaging offer certain advantages but also exhibit notable limitations, particularly in scenarios with low model diversity, significant performance disparities among base learners, or high levels of noise. These methods often cannot model relationships between base learners and cannot automatically adjust model weights, which restricts their potential for performance improvement. In contrast, the Data Envelopment Analysis (DEA) method can automatically compute data weights tailored to different decision-making units. Therefore, we introduce DEA and propose a prediction method for beam pumping system efficiency based on a DEA ensemble strategy (EDEA). The prediction model consists of two phases: a base learner prediction phase and a DEA-based ensemble phase. A detailed description of the entire model is provided below, with the workflow illustrated in Figure 5 and Table 5.
First, the decision-making units are defined as: the Progressive Cross-Fusion method, the Adaptive-Weight Late Fusion method, and the Two-stage Progressive Feature Fusion Prediction Model method. The input vectors consist of factors influencing the system efficiency of pumping wells. The output vectors include the coefficient of determination for each base learner.
Next, a linear programming model is established. Here, we have three base learners, each with 28 inputs and 1 output. The primary objective of the DEA model is to maximize the efficiency of each base learner by solving the linear programming problem and determining the optimal weight coefficients. The mathematical formulation of the linear programming model is as follows:
For a decision-making unit $k$, its efficiency $\eta_k$ is obtained by solving the following linear program:
$$\mathrm{Maximize:}\quad \eta_k = \frac{\lambda_1 R_1^2 + \lambda_2 R_2^2 + \lambda_3 R_3^2}{\sum_{i=1}^{28} \theta_i X_{ik}}$$
The constraints are as follows:
$$\sum_{i=1}^{3} \lambda_i = 1, \qquad 0 \le \lambda_i \le 1, \quad i = 1, 2, 3, \qquad \lambda_1 R_1^2 + \lambda_2 R_2^2 + \lambda_3 R_3^2 \le 1$$
where $\lambda_i$ is the weighting factor representing the contribution of base learner $i$ in the ensemble; $\theta_i$ is the weight assigned to the $i$-th input; $R_i^2$ is the coefficient of determination of base learner $i$; and $\eta_k$ is the efficiency of the $k$-th decision-making unit.
Finally, by solving the aforementioned linear programming problem, the efficiency $\eta_k$ of each base learner and the corresponding weight coefficients $\lambda_i$ can be obtained. These weight coefficients reflect the relative importance or contribution of each base learner within the ensemble. The final predicted value is derived by performing a weighted integration using the computed weight coefficients:
$$y = \lambda_1 y_{11} + \lambda_2 y_{12} + \lambda_3 y_{13}$$
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the weights and $y_{11}$, $y_{12}$, and $y_{13}$ are the predictions of the three base learners.
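Under the simplifying assumption that the input terms $X_{ik}$ are fixed, the weighting problem reduces to a small linear program; the sketch below solves it with scipy.optimize.linprog (the $R^2$ values and predictions shown are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def dea_weights(r2_scores):
    """Solve the simplified weighting LP: maximize the weighted sum of
    base-learner R^2 values subject to the simplex and cap constraints.
    linprog minimizes, so the objective is negated. Treating the input
    terms X_ik as fixed (an assumption of this sketch) reduces the DEA
    ratio model to a linear program."""
    r2 = np.asarray(r2_scores, dtype=float)
    res = linprog(c=-r2,
                  A_ub=[r2.tolist()], b_ub=[1.0],      # weighted R^2 <= 1
                  A_eq=[[1.0] * len(r2)], b_eq=[1.0],  # weights sum to 1
                  bounds=[(0.0, 1.0)] * len(r2))
    return res.x

lam = dea_weights([0.796, 0.763, 0.769])      # base-learner R^2 values
base_preds = np.array([0.70, 0.68, 0.71])     # y11, y12, y13 for one well
print(float(lam @ base_preds))                # DEA-weighted ensemble output
```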

4.6. Online Prediction Models of Pumping Well System Efficiency Based on the Integration of Multiple Ensemble Strategies

Currently, most studies focus on using single models as learners and employ specific ensemble strategies to integrate these base learners, aiming to explore the strengths and weaknesses of individual models. Alternatively, to improve ensemble accuracy, researchers have further investigated ensemble methods by incorporating accuracy and diversity analysis. However, these studies are typically based on single models, with limited exploration of integrating different ensemble strategies or balancing diversity and prediction accuracy in ensemble design. Therefore, by balancing the diversity and prediction accuracy among ensemble strategies, we propose a novel prediction method for beam pumping system efficiency based on the integration of multiple ensemble strategies (MEIE). A detailed description of the entire model is provided below, with the workflow illustrated in Figure 6 and Table 6.
Step 1: Initially, the entire dataset was categorized by data type into string data, sequential data, and numerical data, and was then partitioned into training, validation, and test sets.
Step 2: Subsequently, the bootstrap resampling method was applied to the training set to randomly draw samples with replacement, yielding four distinct training subsets. Each subset was then used to train both the EPCI and the EDEA prediction methods for rod pump system efficiency, resulting in eight base ensemble learners, as detailed below:
$$M = \{\mathrm{EPCI}_1, \mathrm{EDEA}_1, \mathrm{EPCI}_2, \mathrm{EDEA}_2, \mathrm{EPCI}_3, \mathrm{EDEA}_3, \mathrm{EPCI}_4, \mathrm{EDEA}_4\}$$
Step 3: Next, to balance the diversity and prediction accuracy among the base ensemble strategy learners, we propose a novel method for selecting base ensemble strategy learners that incorporates diversity analysis and accuracy analysis. In the diversity analysis module, an arbitrary model $\mathrm{model}_i$ is selected from $M$ to generate validation-set predictions $y_i$; another model $\mathrm{model}_j$ is then selected to produce validation-set predictions $y_j$. The Kendall rank correlation coefficient between $y_i$ and $y_j$ yields $z_1$. This process is repeated to compute the correlation coefficients $z = \{z_1, z_2, \ldots, z_7\}$ among the predicted values of all models.
Step 4: The base models in $M$ are reordered according to the magnitude of their correlation coefficients, yielding an ordered set $M_2$. To ensure maximum diversity, the base ensemble strategy learner with the highest correlation after sorting is removed, resulting in a new set of base learner models, $M_3$.
Step 5: The coefficients of determination of the remaining base ensemble strategy learners from Step 4 are arranged in descending order. To ensure the prediction accuracy of the ensemble, the learner with the smallest $R^2$ is discarded, leaving six base ensemble strategy learners. These learners form a new set of base learner models, $M_4$.
Step 6: Finally, leveraging the Blending paradigm, the validation-set predictions of the remaining six base learners were aggregated to form a new training dataset for the meta-learner, which was then trained and subsequently evaluated on the final test set.
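A compact sketch of this diversity-and-accuracy selection (Steps 3 to 5) is given below; it simplifies the pairwise procedure by dropping the learner with the highest mean Kendall correlation, then the least accurate survivor. The learner names and $R^2$ values in the usage are illustrative:

```python
import numpy as np
from scipy.stats import kendalltau

def select_learners(val_preds: dict, r2_scores: dict) -> list:
    """Drop the learner whose validation predictions are, on average, most
    rank-correlated with the others (diversity cut, Steps 3-4), then drop
    the least accurate survivor (accuracy cut, Step 5)."""
    names = list(val_preds)
    mean_tau = {a: np.mean([kendalltau(val_preds[a], val_preds[b])[0]
                            for b in names if b != a])
                for a in names}
    names.remove(max(mean_tau, key=mean_tau.get))           # diversity cut
    names.remove(min(names, key=lambda n: r2_scores[n]))    # accuracy cut
    return names   # survivors feed the Blending meta-learner (Step 6)

rng = np.random.default_rng(0)
preds = {f"m{i}": rng.random(50) for i in range(8)}    # 8 base learners
r2 = {f"m{i}": 0.80 + 0.01 * i for i in range(8)}
print(select_learners(preds, r2))                      # 6 names remain
```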

5. Experiment

5.1. Data Description

In our study, we randomly selected 3938 actual oil wells from a self-constructed database of a western Chinese oilfield. Due to the presence of unusable and missing data, we employed random sampling from the original dataset. When unusable data were encountered, they were discarded, and the sampling process was repeated until the required 3938 wells were obtained. The data types encompassed three categories: numerical, string, and sequential data. A sample of the dataset is presented in Table 7.
The analysis of Table 7 reveals that the dataset comprises a total of 29 features, including 28 influencing features and 1 predictive feature. The data types are categorized into three groups: “pumping unit model” and “Balancing method” are string types; “well inclination angle”, “dogleg severity”, and “electrical power curve” belong to the sequential data type; while the remaining features are numerical data types.

5.2. Data Pre-Processing and Evaluation Indicators

To comprehensively validate the rationality and predictive accuracy of the proposed model, the dataset was partitioned into training, validation, and test sets in a 0.7:0.15:0.15 ratio. Thereafter, to balance the impact of disparate feature units and scales on prediction performance, min–max normalization was applied to all input features:
$$X = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the original value and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the corresponding feature in the dataset.
To evaluate the validity and predictive accuracy of the proposed model, we use the quantile loss function and the coefficient of determination as evaluation metrics. The quantile loss is calculated as follows:
$$L_{\tau}(y_{ac,j}, y_{pr,j}) = \begin{cases} \tau\,(y_{ac,j} - y_{pr,j}), & y_{ac,j} \ge y_{pr,j} \\ (1 - \tau)\,(y_{pr,j} - y_{ac,j}), & y_{ac,j} < y_{pr,j} \end{cases}$$
The coefficient of determination is defined as
$$R^2 = 1 - \frac{\sum_{j=1}^{M} (y_{ac,j} - y_{pr,j})^2}{\sum_{j=1}^{M} (y_{ac,j} - \bar{y}_{ac})^2}$$
where M is the total number of samples; y a c , j is the true value; and y p r , j is the predicted value.
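For reference, both metrics are straightforward to compute; the NumPy sketch below mirrors the two formulas above (array values are illustrative):

```python
import numpy as np

def quantile_loss(y_true: np.ndarray, y_pred: np.ndarray, tau: float = 0.5) -> float:
    """Average pinball loss L_tau over all samples."""
    diff = y_true - y_pred
    return float(np.mean(np.where(diff >= 0, tau * diff, (tau - 1.0) * diff)))

def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([0.62, 0.55, 0.71])
y_pred = np.array([0.60, 0.57, 0.69])
print(quantile_loss(y_true, y_pred), r_squared(y_true, y_pred))
```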

5.3. Experimental Details

To validate the predictive accuracy of the proposed model, hyperparameters were tuned empirically via trial-and-error methods. The resulting parameter settings are as follows.
(1)
PCFE: To validate the accuracy of PCFE, the learning rate was 0.001, the number of training iterations was 500, the batch size was 256, and the optimizer was Adam. In the ResNet backbone, convolutional kernels measured 7 × 7 with a stride of 2 and padding of 3. The Transformer module comprised eight attention heads with an embedding dimension of eight. Both the BiLSTM and BiGRU subnets consisted of two hidden layers, each containing 25 units. In the QRCNN–BiGRU–Attention model, convolutional kernels measured 16 × 16 with a stride of 1 and no padding, the BiGRU component included two hidden layers of 12 units each, and the attention mechanism used four heads with an attention dimension of 24.
(2)
AWFE: To validate the accuracy of AWFE, the learning rate was 0.001, the number of training iterations was 100, the batch size was 128, and the optimizer was Adam. In the ResNet backbone, convolutional kernels measured 7 × 7 with a stride of 2 and padding of 3. The Transformer module comprised eight attention heads with an embedding dimension of eight. At the data-alignment layer, the BiLSTM consisted of one hidden layer of five units. In the QRCNN–BiGRU–Cross-Attention model, convolutional kernels measured 16 × 16 with a stride of 2 and padding of 1, the BiGRU featured one hidden layer of 12 units, and the Cross-Attention mechanism employed four attention heads with an embedding dimension of 24. In the QRCNN-BiLSTM-BiGRU variant, convolutional kernels measured 16 × 16 with a stride of 1 and automatic padding; the BiLSTM comprised one hidden layer of 12 units; and the Cross-Attention module again utilized four heads with an embedding dimension of 24. Finally, the human evolutionary algorithm was configured with a population size of 50 and 1000 iterations.
(3)
TSPE: To validate TSPE accuracy, the following hyperparameters were adopted, namely a learning rate of 0.001, 80 training iterations, and a batch size of 128, with Adam as the optimizer. The ResNet backbone utilized 7 × 7 convolutional kernels with stride 2 and padding 3. The Transformer module contained eight attention heads with an embedding dimension of 8. In the QRBiLSTM–BiGRU–Attention model, both BiLSTM and BiGRU subnetworks comprised two hidden layers of 20 units each. The QRBiRNN-BiGRU-BiLSTM variant featured single hidden layers of 64 units in each subnetwork (BiRNN, BiLSTM, and BiGRU).
(4)
EPCI: To validate EPCI accuracy, the following hyperparameters were adopted: a learning rate of 0.001; 1000 training iterations for PCFE, 100 for AWFE, and 376 for TSPE; and a uniform batch size of 128. All other parameters remained fixed.
(5)
EDEA: To validate EDEA accuracy, a learning rate of 0.001 was adopted. Training iterations were set to 500 for PCFE, 100 for AWFE, and 100 for TSPE, with a uniform batch size of 128. In the PCFE method, both BiLSTM and BiGRU modules contained two hidden layers of 32 units each. For the AWFE approach, the data-alignment layer’s BiLSTM module employed a single hidden layer with 10 units. All other hyperparameters remained constant.
(6)
MEIE: To validate the accuracy of the proposed multi-strategy ensemble prediction model method for rod pump system efficiency, the final ensemble model (QRBiLSTM–BiGRU–Attention) was configured with a learning rate of 0.001, a batch size of 64, and 500 training iterations. The BiLSTM component contained two hidden layers of 64 units each, while the BiGRU component featured one hidden layer with 64 units. All other hyperparameters remained unchanged.

5.4. Experimental Results and Analysis

To evaluate the accuracy of the proposed method, we conducted 10 experimental trials of six rod pump efficiency prediction models using the hyperparameters detailed in Section 5.3. Table 8 lists the mean and standard deviation of evaluation metrics for each model. The loss curves for training and validation sets, along with test-set prediction results, are shown in Figure 7.
As evidenced by Figure 7a and Table 8, the loss decreases rapidly during the initial 50 iterations. Beyond approximately 250 iterations, both training and validation loss curves asymptotically converge with near-complete overlap. The absence of validation loss rebound or further training loss reduction indicates minimal overfitting and confirms robust generalization capability. Data points align closely with the blue reference line, while the evaluation metrics ($R^2$ = 0.7961 and $L_{\tau}$ = 1.8927) demonstrate PCFE's consistent performance across varying target-value ranges.
From Figure 7b and Table 8, the loss decreases sharply within 10–20 iterations. Thereafter, both the training and validation loss curves level off, with the validation loss closely tracking the training loss and exhibiting no rebound or sustained increase, indicating that under the current hyperparameter configuration neither overfitting nor underfitting occurs. The scatter points align predominantly along the blue reference line, and the evaluation metrics are $R^2$ = 0.7627 and $L_{\tau}$ = 2.0835, demonstrating that the proposed AWFE maintains strong consistency across different target-value ranges.
As shown in Figure 7c and Table 8, the loss decreases sharply from approximately 8 to 3 within 0–20 iterations, followed by a brief plateau around iteration 10, and then continues to decline rapidly to about 2. Throughout the training, the validation loss closely tracks the training loss with minimal divergence, indicating that the proposed prediction method exhibits neither significant overfitting nor underfitting. The scatter points are predominantly aligned along the blue reference line. Moreover, the evaluation metrics are $R^2$ = 0.7693 and $L_{\tau}$ = 2.0637, demonstrating that the proposed TSPE maintains strong consistency across different target-value ranges.
From Figure 7d and Table 8, it can be observed that within the first 20 iterations, the loss function decreased rapidly from approximately 8 to about 2.5, indicating that the proposed method quickly captured the principal trend during the initial training phase. After 200 iterations, the loss curves stabilized, with the training and validation losses closely overlapping and exhibiting no significant divergence, demonstrating the absence of severe overfitting or underfitting and confirming the model's strong generalization capability. The scatter points are predominantly aligned along the blue reference line, and the evaluation metrics are $R^2$ = 0.8685 and $L_{\tau}$ = 1.5490, indicating that the proposed EPCI maintains good consistency across different target-value ranges.
From Figure 7e and Table 8, the blue scatter points closely follow the blue dashed reference line, indicating a high degree of agreement between the predicted and actual values. The overall coefficient of determination is 0.8581, and the loss metric is 1.7357. It is evident that prediction accuracy is highest in the medium-efficiency range, with a slight underestimation observed for a few extreme high-efficiency samples. Overall, the proposed EDEA demonstrates robust stability and accuracy across the entire efficiency spectrum.
From Figure 7f and Table 8, the scatter plot of the proposed method on the test set shows that most points are tightly distributed around the blue dashed reference line, indicating strong linear fitting performance across the full efficiency range. The overall coefficient of determination is 0.9335 and the loss metric is 1.2293. Predictions in the low-efficiency region exhibit no significant bias, while those in the medium-to-high-efficiency range show slight underestimation, and errors increase marginally for extreme high-efficiency samples. Overall, the model delivers accurate and stable predictions within the normal operating range.

6. Ablation Study

6.1. Ablation Study of the PCFE Prediction Models

To comprehensively evaluate the contribution of each component within the PCFE pre-prediction model framework to system efficiency prediction performance, ten ablation experiments were conducted for each element, and the mean and standard deviation of the evaluation metrics were computed. In the feature extraction branch ResNet–Transformer–Cross-Attention, we sequentially removed the Cross-Attention and Transformer modules. In the prediction branch QRCNN–BiGRU–Attention, we individually ablated the BiGRU and Attention components. Each ablation variant was then compared against the complete proposed method. Table 9 summarizes the evaluation metrics for all variants, and Table 10 describes the percent change for each ablation experiment.
As shown in Table 9 and Table 10, in the feature extraction branch, the removal of the Cross-Attention module reduces $R^2$ from 0.7961 to 0.7746, a reduction of 2.71%, and raises $L_{\tau}$ from 1.8927 to 1.9464, an increase of 2.83%. Further ablating the Transformer module decreases $R^2$ from 0.7746 to 0.7564, a reduction of 2.35%, and increases $L_{\tau}$ from 1.9464 to 2.0369, an increase of 4.65%. In the prediction branch, excluding the Attention component lowers $R^2$ from 0.7961 to 0.7734, a reduction of 2.85%, and raises $L_{\tau}$ from 1.8927 to 1.9564, an increase of 3.37%. Additionally, removing BiGRU reduces $R^2$ from 0.7734 to 0.7552, a reduction of 1.82%, and increases $L_{\tau}$ from 2.0264 to 2.0388, an increase of 4.21%. These results demonstrate that each key module (Cross-Attention, Transformer, Attention, and BiGRU) provides statistically significant improvements in feature extraction capability and predictive accuracy.

6.2. Ablation Study of the AWFE Prediction Models

To comprehensively assess the contribution of each component in the AWFE prediction framework to system efficiency forecasting, we conducted ten ablation experiments per component and computed the mean and standard deviation of the evaluation metrics. For the feature extraction module ResNet–Transformer–Cross-Attention, we removed the Cross-Attention component. In the prediction module QRCNN–BiGRU–Cross-Attention-1, we ablated the BiGRU, Cross-Attention, and CNN components separately. Similarly, for the prediction module QRCNN-BiLSTM-BiGRU-2, we sequentially removed the BiLSTM and BiGRU components. All ablated versions were systematically compared with the complete framework. The quantitative evaluation metrics of these variants are presented in Table 11. Table 12 describes the percent change for each ablation experiment.
In the feature extraction branch, ablating the Cross-Attention module reduces the $R^2$ value from 0.7923 to 0.7756, a reduction of 2.11%, and increases the $L_{\tau}$ from 1.8645 to 1.9900, an increase of 6.73%. In prediction branch 1, removing the Cross-Attention alone decreases the $R^2$ from 0.7923 to 0.7714, a reduction of 2.64%, and increases the $L_{\tau}$ from 1.8645 to 1.9904, an increase of 6.75%. Further ablating the BiGRU component reduces the $R^2$ from 0.7714 to 0.7523, a reduction of 2.47%, and increases the $L_{\tau}$ from 1.9904 to 2.1569, an increase of 8.37%. When the CNN component is also removed, the $R^2$ decreases from 0.7714 to 0.7544, a reduction of 2.20%, and the $L_{\tau}$ increases from 1.9904 to 2.1369, an increase of 7.36%. In prediction branch 2, ablating BiGRU reduces the $R^2$ from 0.7923 to 0.7743, a reduction of 2.27%, and increases the $L_{\tau}$ from 1.8645 to 1.9901, an increase of 6.74%. Subsequently removing the CNN component further decreases the $R^2$ from 0.7923 to 0.7708, a reduction of 2.72%, and increases the $L_{\tau}$ from 1.8645 to 2.0835, an increase of 11.74%. Finally, ablating the BiLSTM component decreases the $R^2$ from 0.7708 to 0.7633, a reduction of 0.97%, and increases the $L_{\tau}$ from 2.0835 to 2.1046, an increase of 1.01%. Collectively, these quantitative results indicate that each key component (Cross-Attention, BiGRU, CNN, and BiLSTM) provides statistically significant improvements in both feature extraction capability and prediction accuracy within the AWFE soft-sensor framework.

6.3. Ablation Study of the TSPE Prediction Models

To comprehensively assess the impact of each component in the TSPE prediction model on system efficiency forecasting, we conducted ten ablation experiments per component and reported the mean and standard deviation of the evaluation metrics. For the feature extraction branch ResNet–Transformer–Cross-Attention, the Transformer and Cross-Attention modules were removed. In the prediction branch QRBiLSTM–BiGRU–Attention, both the BiGRU layer and Attention mechanism were eliminated. Similarly, for the alternative prediction branch QRBiRNN-BiGRU-BiLSTM, the BiGRU and BiLSTM components were excluded. These ablated configurations were rigorously compared with the complete proposed method. Quantitative evaluation metrics for each ablation variant are summarized in Table 13. Table 14 describes the percent change for each ablation experiment.
In the feature extraction branch, ablating the Cross-Attention module produces an 8.00% relative decrease in $R^2$ and a 24.4% relative increase in $L_{\tau}$. The subsequent removal of the Transformer module results in a further 2.71% reduction in $R^2$ and a 13.01% increase in $L_{\tau}$. In prediction branch 1, omitting the Attention component causes a 6.50% drop in $R^2$ and a 16.96% rise in $L_{\tau}$. The sequential removal of the BiGRU and BiLSTM layers leads to additional relative decreases in $R^2$ of 1.45% and 1.42%, while the $L_{\tau}$ increases by 9.38% and 8.02%, respectively. In prediction branch 2, successive ablations of the BiGRU, BiLSTM, and BiRNN modules result in relative reductions in $R^2$ of 6.45%, 7.74%, and 7.82%, respectively, accompanied by increases in $L_{\tau}$ of 16.11%, 21.91%, and 27.85%. The further removal of each BiGRU, BiRNN, and BiLSTM component contributes additional $R^2$ declines of 2.50%, 3.89%, and 2.61%, along with $L_{\tau}$ increases of 10.57%, 14.70%, and 5.63%, respectively. These quantitative results confirm that each key module (Cross-Attention, Transformer, BiGRU, BiLSTM, and their combinations) contributes significantly to enhanced feature extraction capability and improved prediction accuracy.

6.4. Ablation Study of the EPCI Prediction Models

To assess the contributions of PCFE, AWFE, and TSPE to system efficiency forecasting within the Parallel-Series Cascade Ensemble framework, we conducted ten ablation experiments per variant and calculated the means and standard deviations of the evaluation metrics. Table 15 summarizes the evaluation metrics for all ablation scenarios. Table 16 describes the percent change for each ablation experiment.
In the EPCI framework, removing both the AWFE and TSPE modules produces a 10.17% decrease in $R^2$ and a 23.03% increase in $L_{\tau}$. Similarly, ablating both the PCFE and TSPE modules yields a 9.63% drop in $R^2$ and a 26.59% rise in $L_{\tau}$, while eliminating both the PCFE and AWFE modules results in a 15.72% reduction in $R^2$ and a 42.63% increase in $L_{\tau}$. These findings demonstrate that each key component (AWFE, TSPE, and PCFE) significantly enhances feature extraction and improves prediction accuracy.

6.5. Ablation Study of the EDEA Prediction Models

To comprehensively assess the EDEA framework’s pre-prediction performance, ten ablation studies were conducted on the PCFE, AWFE, and TSPE modules, and the mean and standard deviation of the evaluation metrics were calculated. Table 17 summarizes the evaluation metrics for each ablation variant. Table 18 describes the percent change for each ablation experiment.
In the EDEA framework, ablating the AWFE and TSPE modules results in a relative $R^2$ decrease of 10.69% and a corresponding $L_{\tau}$ increase of 12.85%. Similarly, removing the PCFE and TSPE modules causes an $R^2$ drop of 8.52% and an $L_{\tau}$ rise of 11.70%, while eliminating the PCFE and AWFE modules produces an $R^2$ reduction of 11.83% accompanied by a 23.87% increase in $L_{\tau}$. These quantitative findings confirm that each key component (AWFE, TSPE, and PCFE) makes a statistically significant contribution to enhancing feature extraction capability and improving prediction accuracy.

6.6. Ablation Study of the MEIE Prediction Models

To comprehensively assess each component’s contribution within the MEIE framework to system efficiency forecasting, ten ablation experiments were performed on the EPCI and EDEA variants, and the mean and standard deviation of the evaluation metrics were calculated. Table 19 summarizes the evaluation metrics for each ablation experiment. Table 20 describes the percent change for each ablation experiment.
From Table 20, in the MEIE framework, removing the EPCI and EDEA modules leads to relative $R^2$ decreases of 5.71% and 6.01%, respectively, and corresponding $L_{\tau}$ increases of 21.07% and 33.50%. These quantitative findings indicate that each key component makes a statistically significant contribution to enhancing feature extraction capability and improving prediction accuracy.

7. Conclusions

This study addresses the challenges of nonlinear dynamic characteristics and multi-source data fusion in predicting pumping well system efficiency by proposing an online prediction framework based on multi-strategy integration. By constructing three foundational prediction models—Progressive Cross-Fusion, Adaptive-Weight Late Fusion, and Two-Stage Progressive Feature Fusion—and combining them with Parallel-Series Cascade Ensemble strategies and Data Envelopment Analysis (DEA)-based ensemble strategies, the framework successfully enhances the accuracy and real-time performance of pumping well system efficiency prediction. Furthermore, to balance the diversity and prediction accuracy of base ensemble learners, a multi-ensemble strategy-based prediction model was proposed. Although the proposed models demonstrate strong performance in experiments, there remains room for improvement in handling extreme data and real-time deployment. Future research may focus on optimizing computational efficiency, further improving noise robustness, exploring additional ensemble strategy combinations to enhance adaptability and stability in real-world applications, and developing methods for the secure and reliable front-end deployment of the model in rod pumping systems.

Author Contributions

Both authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 51974276.

Data Availability Statement

The datasets utilized in this study were obtained from a third-party industrial partner under strict confidentiality agreements. Due to commercial sensitivity and contractual restrictions, the raw data cannot be made publicly available. However, aggregated data, processed results, or specific subsets necessary to replicate critical findings may be provided upon reasonable request, subject to approval by the data owner and compliance with confidentiality protocols. Researchers interested in accessing limited data for verification purposes may contact the corresponding author to initiate a formal data-sharing request process.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Luan, G.-H.; He, S.-L.; Yang, Z.; Yang, Z.; Zhao, H.-Y.; Hu, J.-H.; Xie, Q.; Shen, Y.-H. A prediction model for a new deep-rod pumping system. J. Pet. Sci. Eng. 2011, 80, 75–80. [Google Scholar] [CrossRef]
  2. Lv, X.X.; Wang, H.X.; Zhang, X.; Liu, Y.X.; Chen, S.S. An equivalent vibration model for optimization design of carbon/glass hybrid fiber sucker rod pumping system. J. Pet. Sci. Eng. 2021, 207, 109148. [Google Scholar] [CrossRef]
  3. Gibbs, S.G. Predicting the behavior of sucker rod pumping systems. J. Pet. Technol. 1965, 61, 769–778. [Google Scholar] [CrossRef]
  4. Lekia, S.D.L.; Evans, R.D. A coupled rod and fluid Dynamic model for predicting the behavior of sucker-rod pumping system. SPE 1965, 21664, 30–45. [Google Scholar]
  5. Xing, M.; Zhou, L.; Zhang, C.; Xue, K.; Zhang, Z. Simulation Analysis of Nonlinear Friction of Rod String in Sucker Rod Pumping System. J. Comput. Nonlinear Dyn. 2015, 14, 091008. [Google Scholar] [CrossRef]
  6. Xing, M. Response analysis of longitudinal vibration of sucker rod string considering rod buckling. Adv. Eng. Softw. 2019, 99, 49–58. [Google Scholar] [CrossRef]
  7. Moreno, G.A.; Garriz, A.E. Sucker rod string dynamics in deviated wells. J. Pet. Sci. Eng. 2020, 184, 106534. [Google Scholar] [CrossRef]
  8. Tarmigh, M.; Behbahani-Nejad, M.; Hajidavalloo, E. Two-way fluid-structure interaction for longitudinal vibration of a loaded elastic rod within a multiphase fluid flow. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 572. [Google Scholar] [CrossRef]
  9. Yin, J.; Dong, S.; Yang, Y. Predicting multi-tapered sucker-rod pumping systems with the analytical solution. J. Pet. Sci. Eng. 2021, 197, 108115. [Google Scholar]
  10. Wang, X.; Lv, L.; Li, S.; Pu, H.; Liu, Y.; Bian, B.; Li, D. Longitudinal vibration analysis of sucker rod based on a simplified thermo-solid model. J. Comput. Nonlinear Dyn. 2021, 196, 107951. [Google Scholar] [CrossRef]
  11. Li, Q.; Chen, B.; Huang, Z.; Tang, H.; Li, G.; He, L.; Sáez, A. Study on Equivalent Viscous Damping Coefficient of Sucker Rod Based on the Principle of Equal Friction Loss. Math. Probl. Eng. 2019, 2019, 9272751. [Google Scholar] [CrossRef]
  12. Ma, B.; Dong, S. Coupling Simulation of Longitudinal Vibration of Rod String and Multi-Phase Pipe Flow in Wellbore and Research on Downhole Energy Efficiency. Energies 2023, 16, 4988. [Google Scholar] [CrossRef]
  13. Langbauer, C.; Antretter, T. Finite Element Based Optimization and Improvement of the Sucker Rod Pumping System. In Proceedings of the Abu Dhabi International Petroleum Exhibition & Conference, Abu Dhabi, United Arab Emirates, 3–6 November 2017. [Google Scholar]
  14. Lukasiewicz, S.A. Dynamic Behavior of the Sucker Rod String in the Inclined Well. In Proceedings of the SPE Production Operations Symposium, Oklahoma City, OK, USA, 7–9 April 1991. [Google Scholar]
  15. Hongbo, W.; Shimin, D.; Yang, Z.; Shuqiang, W.; Xiurong, S. Coupling simulation of the pressure in pump and the longitudinal vibration of sucker rod string based on gas-liquid separation. Shiyou Xuebao/Acta Pet. Sin. 2023, 44, 394–404. [Google Scholar]
  16. Wang, H.; Dong, S. Research on the Coupled Axial-Transverse Nonlinear Vibration of Sucker Rod String in Deviated Wells. J. Vib. Eng. Technol. 2021, 9, 115–129. [Google Scholar] [CrossRef]
  17. Dong, S.; Li, W.; Houtian, B.; Wang, H.; Chen, J.; Liu, M. Optimizing the running parameters of a variable frequency beam pumping system and simulating its dynamic behaviors. Jixie Gongcheng Xuebao/J. Mech. Eng. 2016, 52, 63–70. [Google Scholar] [CrossRef]
  18. Tan, C.; Deng, H.; Feng, Z.; Li, B.; Peng, Z.; Feng, G. Data-driven system efficiency prediction and production parameter optimization for PW-LHM. J. Pet. Sci. Eng. 2022, 209, 109810. [Google Scholar] [CrossRef]
  19. Ma, B.; Dong, S. A novel hybrid efficiency prediction model for pumping well system based on MDS-SSA-GNN. Energy Sci. Eng. 2024, 12, 3272–3288. [Google Scholar] [CrossRef]
  20. Ma, B.; Dong, S. A Hybrid Prediction Model for Pumping Well System Efficiency Based on Stacking Integration Strategy. Int. J. Energy Res. 2024, 2024, 8868949. [Google Scholar] [CrossRef]
  21. Wang, X.; Kihara, D.; Luo, J.; Qi, G.-J. ENAET: Self-trained ensemble autoencoding transformations for semi-supervised learning. arXiv 2019, arXiv:1911:09265. [Google Scholar]
  22. Ju, C.; Bibaut, A.; van der Laan, M. The relative performance of ensemble methods with deep convolutional neural networks for image classification. J. Appl. Stat. 2018, 45, 2800–2818. [Google Scholar] [CrossRef] [PubMed]
  23. Wang, J.; Feng, K.; Wu, J. SVM-based deep stacking networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 1 February 2019; Volume 33, pp. 5273–5280. [Google Scholar]
  24. Zhou, W.; Zhu, Y.; Lei, J.; Wan, J.; Yu, L. APNet: Adversarial Learning Assistance and Perceived Importance Fusion Network for All-Day RGB-T Salient Object Detection. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 957–968. [Google Scholar] [CrossRef]
  25. Chen, C.; Li, Z.; Kou, K.L.; Du, J.; Li, C.; Wang, H. Comprehensive Multisource Learning Network for Cross-Subject Multimodal Emotion Recognition. In Proceedings of the IEEE Transactions on Emerging Topics in Computational Intelligence, Piscataway, NJ, USA, 27 June 2024; Volume 9, pp. 365–380. [Google Scholar]
  26. Wang, L.; Peng, J.; Zheng, C.; Zhao, T.; Zhu, L. A cross modal hierarchical fusion multimodal sentiment analysis method based on multi-task learning. Inf. Process. Manag. 2024, 61, 103675. [Google Scholar] [CrossRef]
  27. Islam, M.; Nooruddin, S.; Karray, F.; Muhammad, G. Multi-level feature fusion for multimodal human activity recognition in Internet of Healthcare Things. Inf. Fusion 2023, 94, 17–31. [Google Scholar] [CrossRef]
  28. Zhao, X.; Tang, C.; Hu, H.; Wang, W.; Qiao, S.; Tong, A. Attention mechanism based multimodal feature fusion network for human action recognition. J. Vis. Commun. Image Represent. 2025, 110, 104459. [Google Scholar] [CrossRef]
  29. Sun, C.; Chen, X. Deep Coupling Autoencoder for Fault Diagnosis with Multimodal Sensory Data. In Proceedings of the IEEE Transactions on Industrial Informatics, Porto, Portugal, 18–20 July 2018; Volume 14, pp. 1137–1145. [Google Scholar]
  30. Jing, J.; Wu, H.; Sun, J.; Fang, X.; Zhang, H. Multimodal fake news detection via progressive fusion networks. Inf. Process. Manag. 2023, 60, 103120. [Google Scholar] [CrossRef]
  31. Niu, M.; Tao, J.; Liu, B.; Huang, J.; Lian, Z. Multimodal Spatiotemporal Representation for Automatic Depression Level Detection. IEEE Trans. Affect. Comput. 2023, 14, 294–307. [Google Scholar] [CrossRef]
  32. Peng, S.; Zhu, J.; Wu, T.; Tang, A.; Kan, J.; Pecht, M. SOH early prediction of lithium-ion batteries based on voltage interval selection and features fusion. Energy 2024, 308, 132993. [Google Scholar] [CrossRef]
  33. Gandhi, A.; Adhvaryu, K.; Poria, S.; Cambria, E.; Hussain, A. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions. Inf. Fusion 2023, 91, 424–444. [Google Scholar] [CrossRef]
  34. Huang, J.; Zhang, F.; Safaei, B.; Qin, Z.; Chu, F. The flexible tensor singular value decomposition and its applications in multisensor signal fusion processing. Mech. Syst. Signal Process. 2024, 220, 111662. [Google Scholar] [CrossRef]
Figure 1. Online prediction model of pumping well system efficiency based on Asymptotic Cross-Fusion.
Figure 2. Online prediction model of pumping well system efficiency based on Adaptive-Weight Late-Fusion.
Figure 3. Online prediction model of pumping well system efficiency based on Two-Stage Progressive Feature Fusion.
Figure 4. Online prediction model of pumping well system efficiency based on the Parallel-Cascaded Ensemble strategy.
Figure 5. Online prediction model of pumping well system efficiency based on Data Envelopment Analysis.
Figure 6. Online prediction model of pumping well system efficiency based on the multi-strategy ensemble.
Figure 7. Loss curves and test-set prediction results for each model.
Table 1. Components and benefits.
Feature Extraction
  Residual Networks: By employing multi-layer convolutional stacks with shortcut connections, fine-grained local features across multiple receptive fields are efficiently extracted, while mitigating vanishing gradients and ensuring scalable network depth and performance.
  Transformer: This mechanism establishes long-range dependencies, enhancing the model's perception of global semantic information and compensating for the limited receptive field of pure convolutional networks.
  Cross-Attention: Dynamic weighted fusion across features from different modalities or hierarchical levels enables alignment and complementarity of critical features, thereby enhancing the discriminative power of the feature representations.
Feature Fusion
  BiLSTM: By capturing contextual information in both the forward and backward directions of the input sequence, it can more comprehensively model bidirectional dependencies.
  Cross-Attention: By computing attention weights across features from different branches or modalities, it enables the complementary alignment of critical features.
  BiGRU: By supporting bidirectional propagation, it preserves robust temporal modeling capability while enhancing training efficiency and generalization.
Predictive Model
  CNN: It can efficiently capture local spatiotemporal dependencies and feature patterns.
  BiGRU: By aggregating information from both the forward and backward directions of the sequence, it fully captures bidirectional dependencies.
  Attention: It can establish direct dependencies between all positions in a sequence or feature set, thereby overcoming the limitations of recurrent and convolutional networks in capturing long-range information.
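The composition in Table 1 can be made concrete with a short sketch. The PyTorch module below is illustrative only: layer sizes, depths, and the three-quantile head are assumptions, not the authors' implementation. It simply shows how a residual convolutional branch, a Transformer encoder branch, a cross-attention block, a BiGRU, and a quantile output of the kinds listed above can be composed.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Minimal sketch of the Table 1 pipeline; all sizes are assumptions."""
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Residual (shortcut) convolutional branch for fine-grained local features
        self.conv = nn.Conv1d(1, d_model, kernel_size=3, padding=1)
        self.res_conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        # Transformer encoder branch for long-range (global) dependencies
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Cross-attention: local features attend to global features
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # BiGRU plus a quantile head, as in the prediction stage of Table 1
        self.bigru = nn.GRU(d_model, d_model, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * d_model, 3)  # e.g., 0.1/0.5/0.9 quantiles

    def forward(self, x):                      # x: (batch, seq_len) power curve
        h = x.unsqueeze(1)                     # (batch, 1, seq_len)
        h = torch.relu(self.conv(h))
        h = torch.relu(self.res_conv(h)) + h   # shortcut connection
        local = h.transpose(1, 2)              # (batch, seq_len, d_model)
        global_feats = self.encoder(local)     # global semantic features
        fused, _ = self.cross_attn(local, global_feats, global_feats)
        seq, _ = self.bigru(fused)
        return self.head(seq[:, -1])           # quantile predictions

quantiles = CrossAttentionFusion()(torch.randn(8, 128))  # shape (8, 3)
```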
Table 2. Components and benefits.
Feature Extraction
  Residual Networks: By employing multi-layer convolutional stacks with shortcut connections, fine-grained local features across multiple receptive fields are efficiently extracted, while mitigating vanishing gradients and ensuring scalable network depth and performance.
  Transformer: This mechanism establishes long-range dependencies, enhancing the model's perception of global semantic information and compensating for the limited receptive field of pure convolutional networks.
  Cross-Attention: Dynamic weighted fusion across features from different modalities or hierarchical levels enables alignment and complementarity of critical features, thereby enhancing the discriminative power of the feature representations.
Predictive Model-1
  CNN: It can efficiently capture local spatiotemporal dependencies and feature patterns.
  BiGRU: By aggregating information from both the forward and backward directions of the sequence, it fully captures bidirectional dependencies.
  Cross-Attention: Dynamic weighted fusion across features from different modalities or hierarchical levels enables alignment and complementarity of critical features, thereby enhancing the discriminative power of the feature representations.
Predictive Model-2
  CNN: It can efficiently capture local spatiotemporal dependencies and feature patterns.
  BiGRU: By aggregating information from both the forward and backward directions of the sequence, it fully captures bidirectional dependencies.
  BiLSTM: By capturing contextual information in both the forward and backward directions of the input sequence, it can more comprehensively model bidirectional dependencies.
Table 3. Components and benefits.
Feature Extraction
  Residual Networks: By employing multi-layer convolutional stacks with shortcut connections, fine-grained local features across multiple receptive fields are efficiently extracted, while mitigating vanishing gradients and ensuring scalable network depth and performance.
  Transformer: This mechanism establishes long-range dependencies, enhancing the model's perception of global semantic information and compensating for the limited receptive field of pure convolutional networks.
  Cross-Attention: Dynamic weighted fusion across features from different modalities or hierarchical levels enables alignment and complementarity of critical features, thereby enhancing the discriminative power of the feature representations.
Predictive Model-1
  Attention: It can establish direct dependencies between all positions in a sequence or feature set, thereby overcoming the limitations of recurrent and convolutional networks in capturing long-range information.
  BiGRU: By aggregating information from both the forward and backward directions of the sequence, it fully captures bidirectional dependencies.
  BiLSTM: By capturing contextual information in both the forward and backward directions of the input sequence, it can more comprehensively model bidirectional dependencies.
Predictive Model-2
  BiRNN: By employing both forward and backward hidden states, it comprehensively captures contextual dependencies at both the beginning and end of the sequence.
  BiGRU: By aggregating information from both the forward and backward directions of the sequence, it fully captures bidirectional dependencies.
  BiLSTM: By capturing contextual information in both the forward and backward directions of the input sequence, it can more comprehensively model bidirectional dependencies.
Table 4. Components and benefits.
Models
  PCFE: This model seamlessly integrates heterogeneous sensor data through specialized encoders, dynamically fuses multi-scale features via Cross-Attention, and delivers robust real-time efficiency predictions with uncertainty quantification using a GRU–attention–quantile regression pipeline.
  AWFE: This model combines specialized encoders for sequence, string, and numeric data with cross-modal attention fusion and ensemble GRU/LSTM predictors, topped by a quantile regression output to deliver robust, real-time efficiency predictions with uncertainty quantification.
  TSPE: By employing dedicated encoders for numeric, string, and sequence data with Cross-Attention fusion, and integrating dual-branch GRU/LSTM networks with quantile regression, this model achieves end-to-end multimodal feature fusion, real-time high-precision efficiency prediction, and uncertainty quantification.
Weight Optimization
  GKSO: By mimicking sharks' dynamic foraging strategies, the shark-inspired optimizer effectively balances exploration and exploitation, reducing the risk of premature convergence to local optima.
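The weight-optimization step in Table 4 is a metaheuristic search over fusion weights. The sketch below is not the authors' shark-inspired GKSO; it is a generic population-based stand-in (the function names and the MAE objective are assumptions) that shows the role the optimizer plays: finding convex weights for the base learners that minimize a validation loss.

```python
import numpy as np

def optimize_fusion_weights(preds, y, iters=200, pop=30, seed=0):
    """Population-based random search over the weight simplex.

    preds: (n_models, n_samples) base-learner predictions on a validation set
    y:     (n_samples,) measured system efficiency
    Returns the convex weights minimizing mean absolute error."""
    rng = np.random.default_rng(seed)
    n_models = preds.shape[0]
    # Initialize a population of weight vectors on the simplex
    W = rng.dirichlet(np.ones(n_models), size=pop)

    def loss(w):
        return np.mean(np.abs(w @ preds - y))

    best = min(W, key=loss)
    for _ in range(iters):
        # Drift the population towards the current best ("foraging" step),
        # perturb it, and renormalize back onto the simplex
        W = np.abs(0.5 * W + 0.5 * best + rng.normal(0.0, 0.05, W.shape))
        W /= W.sum(axis=1, keepdims=True)
        cand = min(W, key=loss)
        if loss(cand) < loss(best):
            best = cand
    return best

# Hypothetical usage with three base learners (PCFE, AWFE, TSPE):
# w = optimize_fusion_weights(np.vstack([p1, p2, p3]), y_val)
# y_ens = w @ np.vstack([p1, p2, p3])
```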
Table 5. Components and benefits.
Models
  PCFE: This model seamlessly integrates heterogeneous sensor data through specialized encoders, dynamically fuses multi-scale features via Cross-Attention, and delivers robust real-time efficiency predictions with uncertainty quantification using a GRU–attention–quantile regression pipeline.
  AWFE: This model combines specialized encoders for sequence, string, and numeric data with cross-modal attention fusion and ensemble GRU/LSTM predictors, topped by a quantile regression output to deliver robust, real-time efficiency predictions with uncertainty quantification.
  TSPE: By employing dedicated encoders for numeric, string, and sequence data with Cross-Attention fusion, and integrating dual-branch GRU/LSTM networks with quantile regression, this model achieves end-to-end multimodal feature fusion, real-time high-precision efficiency prediction, and uncertainty quantification.
Weight Optimization
  DEA: DEA derives weights directly from the data without assuming a specific functional form, allowing each decision-making unit to be evaluated against its own "best-practice" frontier.
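The DEA weighting in Table 5 can be sketched with the classical input-oriented CCR model: each base model is treated as a decision-making unit, its efficiency score is the value of a small linear program, and the scores are normalized into fusion weights. Treating each model's validation loss as the DEA input and its R² as the DEA output is an assumption for illustration, not the authors' exact formulation.

```python
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(inputs, outputs):
    """Input-oriented CCR DEA via one LP per decision-making unit (DMU).

    inputs:  (n_dmu, n_in)  e.g., each model's validation loss
    outputs: (n_dmu, n_out) e.g., each model's validation R^2
    Returns an efficiency score in (0, 1] for every DMU."""
    n_dmu, n_in = inputs.shape
    n_out = outputs.shape[1]
    scores = []
    for k in range(n_dmu):
        # Decision variables: [u (output weights), v (input weights)]
        c = np.concatenate([-outputs[k], np.zeros(n_in)])  # maximize u.y_k
        # Frontier constraints: u.y_j - v.x_j <= 0 for every DMU j
        A_ub = np.hstack([outputs, -inputs])
        b_ub = np.zeros(n_dmu)
        # Normalization: v.x_k = 1
        A_eq = np.concatenate([np.zeros(n_out), inputs[k]])[None, :]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (n_out + n_in))
        scores.append(-res.fun)
    return np.array(scores)

# Hypothetical usage: losses as inputs, R^2 as outputs, scores -> fusion weights
eff = ccr_efficiency(np.array([[1.89], [2.08], [2.06]]),
                     np.array([[0.796], [0.763], [0.769]]))
weights = eff / eff.sum()
```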
Table 6. Components and benefits.
Models
  EPCI: The model achieves high-precision, robust system efficiency prediction by adaptively calibrating fusion weights with the Shark Optimization Algorithm to perform weighted integration of two complementary base learners.
  EDEA: By leveraging Data Envelopment Analysis to optimally compute fusion weights for PCFE, AWFE, and TSPE, this model adaptively integrates three complementary predictors to achieve unbiased, high-accuracy system efficiency forecasts.
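The two ensembles of Table 6 can themselves be blended; the final multi-strategy model (MEIE in the tables below) combines their outputs. A minimal sketch, assuming a simple convex blend whose coefficient is tuned on validation data (the combination rule here is an assumption, not the authors' stated formula):

```python
import numpy as np

def blend(y_epci, y_edea, y_val_epci, y_val_edea, y_val, grid=101):
    """Pick the convex coefficient that best blends the two ensembles
    on a validation set, then apply it to the test predictions."""
    alphas = np.linspace(0.0, 1.0, grid)
    errors = [np.mean(np.abs(a * y_val_epci + (1 - a) * y_val_edea - y_val))
              for a in alphas]
    a = alphas[int(np.argmin(errors))]
    return a * y_epci + (1 - a) * y_edea
```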
Table 7. Data characterization.
Rated power of the electric motor: 15 kW
Motor no-load power: 0.57 kW
Motor rated efficiency: 88.5%
Pump setting depth: 2250 m
Stroke length: 2 m
Balance degree: 95%
Saturation pressure: 5 MPa
Well fluid density: 815 kg/m³
Well fluid viscosity: 5 mPa·s
System efficiency: 22.34%
Number of centralizers: 750
Pump diameter: 28 mm
Stroke frequency: 3 min⁻¹
Number of rod string grades: 2
Equivalent diameter of rod string: 17.474 mm
Tubing specification: 62 mm
Submergence depth: 0.8 m
Pump clearance grade: 1
Dynamic fluid level: 2250 m
Water cut: 35%
Well inclination angle: 0.43, 0.43, 0.58, …
Dogleg severity: 0, 0.15, 0.29, …
Electrical power curve: 13, 13.2, 15.6, …
Balancing method: Crank balance
Pumping unit model: CYJY14-4.8-73HB
Relative density of natural gas: 0.6
Tubing pressure: 0.8 MPa
Gas–oil ratio: 25
Casing pressure: 0.8 MPa
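Table 7 mixes three modalities: numeric scalars (rated power, stroke length, and similar), categorical strings (balancing method, pumping unit model), and sampled sequences (well inclination angle, dogleg severity, electrical power curve). A minimal sketch of packing one such record for modality-specific encoders follows; the field names, the feature subset, and the fixed resampling length are illustrative assumptions.

```python
import numpy as np

def pack_well_record(record, cat_vocab, seq_len=128):
    """Split one well record into numeric, categorical, and sequence arrays.

    record:    dict of characteristic -> value, as in Table 7
    cat_vocab: dict mapping each categorical field to {value: index}"""
    numeric = np.array([record["rated_power_kw"], record["stroke_length_m"],
                        record["pump_depth_m"], record["water_cut"]],
                       dtype=np.float32)                 # scalar features
    categorical = np.array(
        [cat_vocab["balancing_method"][record["balancing_method"]],
         cat_vocab["pump_model"][record["pump_model"]]],
        dtype=np.int64)                                  # string features -> ids
    curve = np.asarray(record["power_curve"], dtype=np.float32)
    curve = np.interp(np.linspace(0, 1, seq_len),        # resample to fixed length
                      np.linspace(0, 1, len(curve)), curve)
    return numeric, categorical, curve

rec = {"rated_power_kw": 15.0, "stroke_length_m": 2.0, "pump_depth_m": 2250.0,
       "water_cut": 0.35, "balancing_method": "crank",
       "pump_model": "CYJY14-4.8-73HB", "power_curve": [13.0, 13.2, 15.6, 14.1]}
vocab = {"balancing_method": {"crank": 0}, "pump_model": {"CYJY14-4.8-73HB": 0}}
numeric, categorical, curve = pack_well_record(rec, vocab)
```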
Table 8. Evaluation indicators.
Method    R²    L_τ
PCFE    0.7961 ± 0.0076    1.8927 ± 0.0324
AWFE    0.7627 ± 0.0071    2.0835 ± 0.0258
TSPE    0.7693 ± 0.0085    2.0637 ± 0.0966
EPCI    0.8685 ± 0.00117    1.5490 ± 0.0221
EDEA    0.8581 ± 0.00114    1.7357 ± 0.0179
MEIE    0.9335 ± 0.00103    1.2293 ± 0.0073
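Throughout Tables 8–20, models are scored by the coefficient of determination R² and a quantile loss L_τ. Assuming L_τ denotes the mean pinball loss at quantile level τ (the standard loss for the quantile-regression heads described above), both indicators can be computed as:

```python
import numpy as np

def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def pinball_loss(y, y_hat, tau=0.5):
    """Mean pinball (quantile) loss at level tau -- assumed form of L_tau."""
    diff = y - y_hat
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```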
Table 9. Evaluation indicators.
Model    R²    L_τ
PCFE    0.7961 ± 0.0085    1.8927 ± 0.0329
ResNet    0.7564 ± 0.0119    2.0369 ± 0.0882
ResNet–Transformer    0.7746 ± 0.0101    1.9464 ± 0.0596
QRCNN    0.7552 ± 0.0155    2.0388 ± 0.1294
QRCNN-BiGRU    0.7734 ± 0.0103    1.9564 ± 0.0731
Table 10. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
ResNet    2.35    4.65
ResNet–Transformer    2.71    2.83
QRCNN    1.82    4.21
QRCNN-BiGRU    2.85    3.37
Table 11. Evaluation indicators.
Model    R²    L_τ
AWFE    0.7923 ± 0.0072    1.8645 ± 0.0262
ResNet–Transformer    0.7756 ± 0.0106    1.9900 ± 0.0285
QRCNN-GRU-1    0.7714 ± 0.0111    1.9904 ± 0.0294
QRCNN-1    0.75234 ± 0.0146    2.1569 ± 0.03127
QRGRU-1    0.7544 ± 0.0151    2.1369 ± 0.03016
QRCNN-BiLSTM-2    0.7743 ± 0.0109    1.9901 ± 0.0291
QRBiLSTM-BiGRU-2    0.7708 ± 0.0095    2.0835 ± 0.0233
QRGRU-2    0.7633 ± 0.0146    2.1046 ± 0.0332
Table 12. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
ResNet–Transformer    2.11    6.73
QRCNN-GRU-1    2.64    6.75
QRCNN-1    2.47    8.37
QRGRU-1    2.20    7.36
QRCNN-BiLSTM-2    2.27    6.74
QRBiLSTM-BiGRU-2    2.72    11.75
QRGRU-2    0.97    1.01
Table 13. Evaluation indicators.
Model    R²    L_τ
TSPE    0.8362 ± 0.0096    1.6590 ± 0.0928
ResNet    0.7485 ± 0.0226    2.3321 ± 0.1353
ResNet–Transformer    0.7693 ± 0.0224    2.0637 ± 0.1174
QRBiLSTM-BiGRU-1    0.7819 ± 0.0221    1.9403 ± 0.1068
QRBiRNN-BiGRU-2    0.7715 ± 0.0213    2.0224 ± 0.1047
QRBiRNN-BiLSTM-2    0.7823 ± 0.0207    1.9263 ± 0.1141
QRBiGRU-BiLSTM-2    0.7708 ± 0.0191    2.1210 ± 0.1219
QRBiLSTM-1    0.7706 ± 0.0294    2.1222 ± 0.1359
QRBiGRU-1    0.7708 ± 0.0327    2.0959 ± 0.1446
QRBiRNN-2    0.7519 ± 0.0251    2.2361 ± 0.1422
QRBiLSTM-2    0.7617 ± 0.0304    2.2094 ± 0.1863
QRBiGRU-2    0.7507 ± 0.0294    2.2476 ± 0.1272
Table 14. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
ResNet    2.71    13.01
ResNet–Transformer    8.00    24.40
QRBiLSTM-BiGRU-1    6.50    16.96
QRBiRNN-BiGRU-2    7.74    21.90
QRBiRNN-BiLSTM-2    6.45    16.11
QRBiGRU-BiLSTM-2    7.82    27.85
QRBiLSTM-1    1.45    9.38
QRBiGRU-1    1.42    8.02
QRBiRNN-2    2.50    10.57
QRBiLSTM-2    3.89    14.70
QRBiGRU-2    2.61    5.63
Table 15. Evaluation indicators.
Model    R²    L_τ
EPCI    0.8685 ± 0.00119    1.5490 ± 0.0243
PCFE    0.7802 ± 0.00152    1.9058 ± 0.1106
AWFE    0.7849 ± 0.00150    1.9608 ± 0.1757
TSPE    0.7320 ± 0.0016    2.2094 ± 0.1033
Table 16. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
PCFE    10.17    23.03
AWFE    9.63    26.59
TSPE    15.72    42.63
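The percent changes reported in Tables 16 and 18 are consistent with measuring each base model against the full ensemble; the relation below is inferred from the reported values rather than taken from an explicit definition:

```latex
\Delta R^2 = \frac{R^2_{\mathrm{ens}} - R^2_{\mathrm{base}}}{R^2_{\mathrm{ens}}} \times 100\%,
\qquad
\Delta L_\tau = \frac{L_{\tau,\mathrm{base}} - L_{\tau,\mathrm{ens}}}{L_{\tau,\mathrm{ens}}} \times 100\%.
```

For the PCFE row of Table 16, for example, (0.8685 − 0.7802)/0.8685 ≈ 10.17% and (1.9058 − 1.5490)/1.5490 ≈ 23.03%, matching the tabulated values.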
Table 17. Evaluation indicators.
Model    R²    L_τ
EDEA    0.8581 ± 0.00106    1.7357 ± 0.0196
PCFE    0.7664 ± 0.00164    1.9588 ± 0.1114
AWFE    0.7850 ± 0.00153    1.9387 ± 0.1801
TSPE    0.7566 ± 0.00158    2.1500 ± 0.1126
Table 18. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
PCFE    10.69    12.85
AWFE    8.52    11.70
TSPE    11.83    23.87
Table 19. Evaluation indicators.
Model    R²    L_τ
MEIE    0.9130 ± 0.00103    1.3002 ± 0.0076
EPCI    0.8609 ± 0.00111    1.5742 ± 0.0238
EDEA    0.8581 ± 0.00109    1.7357 ± 0.0191
Table 20. Percent change for each ablation experiment.
Model    Percentage change of R² (%)    Percentage change of L_τ (%)
EPCI    5.71    21.07
EDEA    6.01    33.50
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
