1. Introduction
As the key power component of rotating machinery, plunger pumps are widely used in many types of industrial equipment. After prolonged high-intensity operation, the internal parts of a plunger pump become worn or damaged, resulting in frequent failures during equipment operation. Condition monitoring and fault diagnosis of the plunger pump therefore help guarantee the normal operation of industrial equipment, reduce the frequency of unplanned shutdowns, and improve the working efficiency and safety of the equipment [1].
Due to the complex internal structure of the plunger pump, the harsh working environment, and processing errors, its fault modes are varied and difficult to diagnose accurately. In the early days, fault diagnosis of the plunger pump relied mainly on the observations and judgment of experienced on-site technicians, for example, listening for abnormal sound during operation and checking for oil leakage. With the rapid progress of sensing, detection, and information technology, sensors installed at key parts of the plunger pump can now collect a large number of detailed operating parameters and status information in real time, and these data can be used for in-depth fault identification and diagnosis. Intelligent fault diagnosis methods based on deep learning have gradually become the focus of current research [2]. By constructing multi-level, highly abstract fault diagnosis models and developing targeted learning algorithms, these methods can mine the internal distribution characteristics and evolution rules of fault data from complex data, so that the key characteristic information reflecting the essence of an equipment fault can be extracted efficiently. Chao et al. [3] fused the vibration data of three channels of the plunger pump, classified them using a convolutional neural network (CNN), and achieved good results. Eraliev et al. [4] proposed a stackable convolutional autoencoder (SCAE) that can extract more hierarchical features from vibration signals; experimental results verified the effectiveness of SCAE in small-sample plunger pump fault diagnosis. Tang et al. [5] converted plunger pump vibration signals into images using the continuous wavelet transform (CWT) and then classified them with a CNN, achieving high classification accuracy. Building on reference [5], Tang et al. [6] used the Bayesian optimization (BO) algorithm for adaptive learning of CNN hyperparameters and proposed an improved CNN (CNN-BO) to further improve the classification accuracy. Zhu et al. [7] established an adaptive convolutional neural network model to classify five typical working conditions of the plunger pump. Zhu et al. [8] used the synchrosqueezed wavelet transform (SWT) to represent plunger pump signals in two dimensions, time and frequency, and achieved intelligent plunger pump fault diagnosis with a Visual Geometry Group long short-term memory (VGG-LSTM) model. He et al. [9] proposed a deep multi-signal fusion adversarial model (MFAN) based on transfer learning to address the diverse working conditions and inconsistent data distributions of the plunger pump. A multi-signal fusion module was designed to assign weights to vibration and sound signals, and MFAN showed good performance in cross-domain fault diagnosis of the plunger pump. However, traditional deep learning models cannot distinguish which features are more important for the task at hand. Xu et al. [10] combined the channel attention mechanism of the squeeze-and-excitation network (SENet) with a one-dimensional convolutional neural network. SENet can adjust the weights of different input channels but does not take into account the dynamic adaptation of thresholds.
The continuous maturation and innovation of deep learning technology has greatly promoted the development of fault diagnosis. However, training deep neural networks requires a large amount of fault sample data with complete label information. In engineering practice, to ensure safe and continuous operation, equipment cannot be run in a faulty state for long during its operating cycle. Most of the data obtained during the operating cycle are unlabeled and correspond to a healthy state, while only a small part records the characteristic information of equipment failure. It is therefore necessary to study how to make the most of the limited fault sample information to improve the accuracy and effectiveness of fault diagnosis models [11]. Meta-learning alleviates these problems to some extent [12]. Its core idea is to learn general knowledge from multiple learning tasks, which provides favorable prior knowledge for solving new tasks and ultimately helps to improve the generalization ability of the model. Model-agnostic meta-learning (MAML) [13] and other meta-learning techniques train the model on a variety of learning tasks so that it can adapt to new tasks with only a small number of training samples. This approach has achieved good performance in computer vision [14], speech recognition [15], and reinforcement learning [16]. In the past few years, a series of studies have explored the application of meta-learning to fault diagnosis. Lin et al. [17] proposed generalized model-agnostic meta-learning (GMAML) for fault diagnosis driven by heterogeneous signals, which was verified on bearing vibration signals and sound signals. Luo et al. [18] proposed a meta-learning method for bearing signals under variable speed conditions; based on an elastic prototype network, the designed enhanced feature encoder and elastic measurer were used to complete cross-domain fault diagnosis under variable speed conditions. Liu et al. [19] proposed a continual learning model based on weight space meta-representation (WSMR) for the class-incremental fault diagnosis of plunger pumps, targeting the catastrophic forgetting that deep learning (DL) models suffer during continual learning, and used a modified WaveletKernelNet (MWKN) to reduce the forgetting of old knowledge. Li et al. [20] proposed an attention-based deep meta-transfer learning method (ADMTL) to address problems such as poor model generalization in low-frequency fine-grained fault diagnosis tasks; ADMTL introduced an attention mechanism to guide the learning of feature learners. These studies show that researchers have made progress in fault diagnosis by using meta-learning, successfully developing a variety of intelligent fault diagnosis techniques driven by small-sample data and providing new ideas for small-sample fault diagnosis. However, traditional meta-learning strategies lack a task-specific parameter selection strategy. Considering the degree of difference between tasks and the unique attributes of common plunger pump faults, we conduct further research under limited data to build a meta-learning model for small-sample plunger pump fault diagnosis, which addresses data scarcity while improving the model’s rapid adaptability and generalization performance when facing unknown plunger pump fault modes.
In this paper, a plunger pump fault diagnosis method based on meta-learning and an improved multi-channel convolutional neural network (MAML-MCCNN-ISENet) is proposed, which introduces MAML into the established MCCNN-ISENet deep learning model. The main contributions are as follows:
- (1) A soft threshold structure is added to SENet to form ISENet, which uses the weight distribution learned by each channel under the attention mechanism to automatically generate the corresponding threshold, enhancing overall recognition accuracy and stability.
- (2) To address the insufficient generalization ability of the model in small-sample scenarios, the MAML algorithm is improved by optimizing the learning of the model initialization parameters, and the improved MAML is combined with the deep learning model to improve its rapid adaptability and generalization performance in the face of unknown plunger pump failure modes.
- (3) The proposed method is verified on the plunger pump fault dataset and the centrifugal pump vibration dataset of the Sant Longowal Institute of Engineering and Technology, and its accuracy is higher than that of several small-sample fault diagnosis methods.
The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 describes the proposed MAML-MCCNN-ISENet in detail. Section 4 validates the effectiveness of the MAML-MCCNN-ISENet method on two datasets. Section 5 presents the conclusions and future research directions.
3. The Proposed Method
3.1. Improved SENet Model (ISENet)
The structure of SENet is enhanced in this paper by incorporating a soft threshold mechanism. The soft thresholding operation was originally proposed by Donoho and Johnstone [26] for wavelet denoising, and Zhang et al. [27] combined soft thresholding with residual networks to suppress noise interference. We introduce soft thresholds into SENet in order to dynamically adjust the threshold of each channel based on its learned weight distribution under the attention mechanism. This enables the model to emphasize important features while suppressing minor or noisy ones, which improves key feature retention and resistance to noise interference during information screening. As a result, overall recognition accuracy and stability are enhanced.
Soft thresholding is the core step of many noise reduction methods. It sets features whose absolute value is below a threshold $\tau$ to zero and shrinks features whose absolute value is greater than that threshold toward zero. The soft threshold function is expressed as follows:

$$y = \begin{cases} x - \tau, & x > \tau \\ 0, & -\tau \le x \le \tau \\ x + \tau, & x < -\tau \end{cases}$$

where $\tau$ is the threshold, and $x$ and $y$ are the input and output features, respectively. As can be seen from the above formula, soft thresholding sets every feature value in the interval $[-\tau, \tau]$ to zero, so adjusting $\tau$ is a flexible way to eliminate features in a specific range.
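The soft threshold function described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `soft_threshold`, the symbol `tau`, and the sample values are chosen here and are not taken from the original implementation.

```python
import numpy as np

def soft_threshold(x, tau):
    """Shrink x toward zero by tau; entries with |x| <= tau become zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Illustrative feature values (not measured data).
features = np.array([-2.0, -0.3, 0.0, 0.5, 1.5])
shrunk = soft_threshold(features, tau=0.5)
print(shrunk)
```

Values inside the interval from -0.5 to 0.5 are removed, while the remaining values keep their sign and lose 0.5 in magnitude.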
In previous studies [26], the threshold value was determined by experience. This paper makes the threshold more targeted by improving the SE module to automatically set the threshold value for each feature channel.
Figure 4 shows the improved SENet, which adds a soft-threshold output layer to the original SENet. For each of the $C$ channels, the weight $s_c$ obtained by the attention mechanism and the average of the absolute feature values on that channel, $a_c$, are used to set the channel threshold $\tau_c$, and the input on each channel is then soft-thresholded to obtain the output $y_c$. To make full use of the weight distribution obtained by attention, improve network performance, and enhance the weights related to classification, the output $y_c$ is then reweighted by $s_c$ along the feature channel.
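A minimal NumPy sketch of one forward pass through such a block is given below for a single sample. It rests on several assumptions that are not confirmed by the text: the channel threshold is taken as the product of the attention weight and the mean absolute feature value, the final step multiplies the thresholded output by the attention weight, and the excitation weights `w1` and `w2`, the reduction ratio `r`, and all sizes are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def isenet_block(x, w1, w2):
    """x: (C, L) feature map of one sample; w1: (C//r, C); w2: (C, C//r)."""
    z = x.mean(axis=1)                       # squeeze: per-channel average
    h = np.maximum(w1 @ z, 0.0)              # excitation: FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))      # FC + Sigmoid, weights in (0, 1)
    a = np.abs(x).mean(axis=1)               # mean absolute feature per channel
    tau = s * a                              # ASSUMPTION: threshold = weight * mean |x|
    y = soft_threshold(x, tau[:, None])      # channel-wise soft thresholding
    return s[:, None] * y, tau               # ASSUMPTION: reweight output by s

rng = np.random.default_rng(0)
C, L, r = 8, 64, 4                           # hypothetical sizes
x = rng.normal(size=(C, L))
w1 = rng.normal(scale=0.5, size=(C // r, C))
w2 = rng.normal(scale=0.5, size=(C, C // r))
out, tau = isenet_block(x, w1, w2)
print(out.shape, tau.round(3))
```

Because the attention weight lies between 0 and 1, the threshold under this assumption never exceeds the mean absolute value of its channel, so a channel cannot be zeroed out entirely by thresholding alone.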
The enhanced SENet architecture incorporates an innovative feature-weighting mechanism that dynamically adjusts channel-wise thresholds during classification. This adaptive approach provides two key advantages over the conventional structure: (1) sample-specific threshold determination for precise noise suppression, and (2) automated learning of cross-channel dependencies to eliminate feature redundancy. These modifications collectively improve the network’s discriminative feature extraction capability while maintaining computational efficiency.
3.2. Multi-Channel Convolutional Neural Networks Incorporating Attention Mechanisms (MCCNN-ISENet)
The proposed MCCNN-ISENet architecture integrates attention mechanisms to enhance feature learning from multi-channel input signals, as illustrated in Figure 5, with detailed layer configurations provided in Table 1. The network employs sequential convolutional blocks for hierarchical feature extraction, where each block contains a 1D convolutional layer followed by batch normalization, which stabilizes and accelerates training by standardizing activations and reducing internal covariate shift. The ReLU activation function introduces nonlinear transformations at low computational cost, and subsequent max-pooling layers progressively reduce the temporal dimension to expand receptive fields, improve feature robustness, and lower computation. A key innovation of this architecture lies in its channel interaction encoder, which combines the improved SENet (ISENet) attention mechanism with convolutional operations to model cross-channel dependencies and extract discriminative diagnostic features. This integrated approach dynamically adjusts feature representations based on inter-channel relationships while preserving important local patterns through the convolutional pathway. The dual-path design balances local feature extraction through convolutional processing with global channel relationship modeling via attention, which is intended to improve diagnostic performance across varying operating conditions. By jointly optimizing these complementary operations during end-to-end training, the network learns representations that capture both detailed fault characteristics and their contextual relationships within the multi-channel signal space.
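To make the data flow of one convolutional block concrete, the following NumPy sketch chains a 1D convolution, batch normalization, ReLU, and max pooling. The layer sizes are hypothetical because Table 1 is not reproduced here, and the batch normalization is a simplified training-mode version without the learnable scale and shift.

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1D cross-correlation. x: (B, C_in, L); kernels: (C_out, C_in, K)."""
    k = kernels.shape[2]
    # windows has shape (B, C_in, L - K + 1, K)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=2)
    return np.einsum("bclk,ock->bol", windows, kernels) + bias[None, :, None]

def batch_norm(x, eps=1e-5):
    """Standardize each channel over the batch and time axes."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool1d(x, size=2):
    b, c, length = x.shape
    l_out = length // size
    return x[:, :, :l_out * size].reshape(b, c, l_out, size).max(axis=3)

def conv_block(x, kernels, bias):
    return max_pool1d(np.maximum(batch_norm(conv1d(x, kernels, bias)), 0.0))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 128))                  # 4 samples, 3 signal channels, 128 points
kernels = rng.normal(scale=0.1, size=(16, 3, 5))  # 16 filters of width 5 (hypothetical)
out = conv_block(x, kernels, np.zeros(16))
print(out.shape)
```

With these sizes the convolution shortens each signal from 128 to 124 points and the pooling halves it to 62, which illustrates how stacked blocks trade temporal resolution for a larger receptive field.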
3.3. Improvement of the MAML Algorithm
The MAML framework aims to enable rapid model adaptation to new tasks by learning transferable initial parameters through multi-task training. By optimizing across diverse but related tasks, MAML obtains parameter initialization with strong cross-task generalization potential. To further improve performance, we integrate weight decay regularization and momentum-based optimization into the MAML update process, enhancing both generalization capacity and training stability. However, a key limitation persists: initialization parameters optimal for certain training tasks may not generalize effectively to novel tasks, potentially hindering rapid adaptation. This fundamental challenge necessitates developing more robust parameter initialization strategies that maintain adaptability across varying task distributions.
3.3.1. Weight Decay (L2 Regularization)
Weight decay serves as an effective regularization method to mitigate model overfitting. This L2 regularization approach improves model generalization by constraining weight magnitudes through a penalty term during optimization. The technique operates by augmenting the standard gradient update with an additional parameter-dependent penalty, effectively balancing model complexity and performance. Mathematically, L2 regularization modifies the loss function through the addition of a squared weight penalty term, expressed as

$$L_{\mathrm{reg}} = L + \frac{\lambda}{2}\sum_{i=1}^{m} w_i^{2}$$

where $L$ is the original loss function, $w_i$ is the $i$-th weight parameter, that is, the coefficient corresponding to each feature, $\lambda$ is the regularization strength (weight decay coefficient), which controls the influence of the regularization term, and $m$ is the total number of model parameters.
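As a small worked example, the sketch below adds the squared weight penalty to a loss value and its gradient. It assumes the common convention with a factor of one half in front of the penalty, so that the gradient contribution is simply the coefficient times the weight; the numbers are illustrative only.

```python
import numpy as np

def l2_regularized(loss, grad, w, lam):
    """Return (loss + lam/2 * sum(w^2), grad + lam * w)."""
    return loss + 0.5 * lam * np.sum(w ** 2), grad + lam * w

w = np.array([1.0, -2.0, 0.5])        # illustrative weights
loss = 0.8                            # illustrative unregularized loss
grad = np.array([0.1, 0.0, -0.2])     # illustrative unregularized gradient
reg_loss, reg_grad = l2_regularized(loss, grad, w, lam=0.01)
print(reg_loss, reg_grad)
```

The penalty pulls every weight toward zero in proportion to its magnitude, which is why the same term is often called weight decay.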
3.3.2. Momentum Term
The momentum technique accelerates gradient-based optimization by incorporating historical gradient information to guide parameter updates. This approach combines current gradient computations with exponentially decaying averages of past gradients, effectively reducing oscillations while maintaining consistent descent directions. By introducing a velocity term that accumulates gradient momentum, the method achieves faster convergence through smoother parameter trajectories and improved escape from local optima. The resulting update rule modifies standard gradient descent as follows:

$$v_t = \gamma v_{t-1} + \eta \nabla_{\theta} L(\theta_t), \qquad \theta_{t+1} = \theta_t - v_t$$

where $v_t$ is the momentum variable at time step $t$, $\gamma$ is the momentum factor (momentum term coefficient), $\eta$ is the learning rate, and $\nabla_{\theta} L(\theta_t)$ is the gradient of the loss function $L$ with respect to the parameter $\theta$ at time step $t$.
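The momentum update can be illustrated on a toy quadratic loss. The sketch assumes the classical form in which the velocity accumulates the learning rate times the gradient; the loss, the step size, and the momentum factor are chosen here for illustration.

```python
import numpy as np

def momentum_step(theta, v, grad, lr, gamma):
    """One update: v <- gamma * v + lr * grad; theta <- theta - v."""
    v = gamma * v + lr * grad
    return theta - v, v

# Toy ill-conditioned quadratic L(theta) = 0.5 * theta^T A theta, gradient A @ theta.
A = np.diag([1.0, 25.0])
theta, v = np.array([5.0, 5.0]), np.zeros(2)
for _ in range(200):
    theta, v = momentum_step(theta, v, A @ theta, lr=0.03, gamma=0.9)
print(np.linalg.norm(theta))
```

On this problem the velocity term lets the slowly converging direction make progress with a step size that is still small enough for the steep direction to remain stable.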
3.3.3. MAML Parameter Update and Optimization
The MAML algorithm facilitates few-shot learning on novel tasks, yet its limited training samples often induce overfitting, ultimately degrading meta-testing performance. To address this, we integrate L2 regularization into MAML’s optimization framework, constraining model complexity while improving feature discriminability. Furthermore, incorporating a momentum term accelerates inner-loop convergence, particularly in optimization landscapes with steep gradients or high curvature, thereby generating more stable and reliable parameter updates.
The two-level update logic of MAML is combined with L2 regularization and the momentum-based parameter update as follows.
In the MAML outer loop, the gradient of the regularized loss after adaptation to task $T_i$ is first calculated:

$$g_i^{\mathrm{outer}} = \nabla_{\theta}\left[ L_{T_i}\left(f_{\theta_i'}\right) + \frac{\lambda}{2}\lVert \theta_i' \rVert_2^2 \right]$$

where $\theta_i'$ is the parameter updated after the inner loop on task $T_i$, and $\lambda$ is the L2 regularization coefficient.
In the inner loop, the momentum term is introduced. The adapted parameter is initialized as $\theta_i' = \theta$, the momentum vector $v$ is initialized as zero or a small constant, and both are then updated as follows:
- (1) Calculate the loss gradient of the current task, including the regularization term:

$$g_i = \nabla_{\theta_i'}\left[ L_{T_i}\left(f_{\theta_i'}\right) + \frac{\lambda}{2}\lVert \theta_i' \rVert_2^2 \right] = \nabla_{\theta_i'} L_{T_i}\left(f_{\theta_i'}\right) + \lambda \theta_i'$$

- (2) Update the momentum vector according to Equation (5):

$$v \leftarrow \gamma v + \alpha g_i$$

- (3) Combine the parameters in the MAML algorithm with the momentum term:

$$\theta_i' \leftarrow \theta_i' - v$$

Repeat the above inner-loop steps until adaptation is complete to obtain the adapted parameter $\theta_i'$. Finally, the post-adaptation task loss gradients are used to update the meta-parameter $\theta$ in the outer loop:

$$\theta \leftarrow \theta - \beta \sum_{T_i} g_i^{\mathrm{outer}}$$

Here, $\beta$ is the outer-loop learning rate, $\gamma$ is the momentum factor, and $\alpha$ is the inner-loop learning rate. Through this dual updating mechanism, the MAML model can use L2 regularization to prevent overfitting when adapting to new tasks and use the momentum term to accelerate convergence and provide a more stable learning process.
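The two-level procedure can be sketched on a toy problem in which every task is a one-dimensional linear regression. This is not the authors' implementation: the model, the task distribution, and all hyperparameters are hypothetical, and the outer update uses the first-order approximation of MAML (the query gradient is evaluated at the adapted parameters without differentiating through the inner loop) to keep the example free of automatic differentiation.

```python
import numpy as np

def mse_grad(theta, x, y):
    """Gradient of the mean squared error of y_hat = theta[0] * x + theta[1]."""
    err = theta[0] * x + theta[1] - y
    return np.array([2.0 * np.mean(err * x), 2.0 * np.mean(err)])

def inner_adapt(theta, x, y, alpha=0.05, gamma=0.5, lam=1e-3, steps=5):
    """Inner loop with L2 regularization and momentum."""
    theta_i, v = theta.copy(), np.zeros_like(theta)
    for _ in range(steps):
        g = mse_grad(theta_i, x, y) + lam * theta_i   # (1) regularized task gradient
        v = gamma * v + alpha * g                     # (2) momentum update
        theta_i = theta_i - v                         # (3) parameter update
    return theta_i

def sample_task(rng, n=10):
    """Hypothetical task: y = a * x + b, returned as support and query sets."""
    a, b = rng.uniform(1.0, 3.0), rng.uniform(-1.0, 1.0)
    xs, xq = rng.uniform(-1.0, 1.0, n), rng.uniform(-1.0, 1.0, n)
    return (xs, a * xs + b), (xq, a * xq + b)

rng = np.random.default_rng(0)
theta, beta, lam, tasks_per_batch = np.zeros(2), 0.05, 1e-3, 4
for _ in range(300):                                  # outer loop
    meta_grad = np.zeros(2)
    for _ in range(tasks_per_batch):
        (xs, ys), (xq, yq) = sample_task(rng)
        theta_i = inner_adapt(theta, xs, ys, lam=lam)
        # First-order approximation of the regularized outer gradient.
        meta_grad += mse_grad(theta_i, xq, yq) + lam * theta_i
    theta = theta - beta * meta_grad / tasks_per_batch
print(theta)
```

In this toy setting the learned initialization settles near the centre of the task distribution, from which a few momentum steps reach any individual task more quickly than from a zero initialization.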
3.3.4. Parameter Initialization Based on Specific Tasks
The standard MAML implementation employs identical initialization parameters across all tasks, failing to account for their individual characteristics. This limitation can be addressed by developing task-specific initial parameters to reduce the adaptation loss for new tasks [28]. In conventional MAML, the parameter space becomes constrained once the initial weights and network architecture are fixed, as each task’s parameters are strictly derived from previous model iterations. To overcome this constraint, we propose an enhanced parameter selection strategy that incorporates task-relevant initialization parameters, thereby improving MAML’s adaptation effectiveness.
Specifically, after each iteration, the output features obtained by forward propagation with the initial parameters and the gradients obtained by backpropagation for each task are saved. To capture the features and losses of the different tasks as fully as possible, a feature set $F$ and a loss set $L$ are constructed for a model with $n$ tasks, where each element of $F$ is the output value obtained for one task using the current parameters. This yields the feature mapping and gradient of each task under the current parameters. To fuse the extracted information into the network, a fully connected structure is built: the $F$ and $L$ obtained above are concatenated as the input of a fully connected layer, followed by ReLU nonlinear activation. After a second fully connected layer, the mapping of each task is obtained through Sigmoid nonlinear activation, which realizes the weighting of each task. Figure 6 shows the update process.
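The weighting network described above (concatenation, fully connected layer, ReLU, fully connected layer, Sigmoid) can be sketched as follows. The array shapes, the random weights, and the way the resulting weights are applied to the task losses are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

def task_weights(F, L, W1, b1, W2, b2):
    """F: (n, d) per-task feature summaries; L: (n,) per-task losses."""
    z = np.concatenate([F, L[:, None]], axis=1)   # concatenate F and L: (n, d + 1)
    h = np.maximum(z @ W1 + b1, 0.0)              # fully connected layer + ReLU
    logits = (h @ W2 + b2)[:, 0]                  # second fully connected layer
    return 1.0 / (1.0 + np.exp(-logits))          # Sigmoid: one weight per task

rng = np.random.default_rng(0)
n, d, hidden = 5, 8, 4                            # hypothetical sizes
F = rng.normal(size=(n, d))
L = rng.uniform(0.1, 2.0, size=n)
W1, b1 = rng.normal(scale=0.3, size=(d + 1, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(scale=0.3, size=(hidden, 1)), np.zeros(1)
weights = task_weights(F, L, W1, b1, W2, b2)
# ASSUMPTION: the weights rescale each task's contribution to the meta-loss.
weighted_loss = np.sum(weights * L) / np.sum(weights)
print(weights.round(3), round(float(weighted_loss), 3))
```

In a trained model `W1`, `b1`, `W2`, and `b2` would be learned jointly with the meta-parameters rather than drawn at random.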
3.4. Fault Diagnosis Process of Plunger Pump Based on MAML-MCCNN-ISENet
To address plunger pump fault diagnosis under small-sample conditions, this paper presents an improved MAML-based diagnostic approach. The method employs an MCCNN-ISENet architecture as the core diagnostic model within the meta-learning framework. Through strategic integration of multi-task learning mechanisms, the optimized network achieves rapid adaptation to new tasks with limited samples while maintaining generalization performance. This design effectively mitigates overfitting risks and significantly improves diagnostic accuracy in data-scarce scenarios.
The MAML-MCCNN-ISENet plunger pump diagnostic framework operates through two sequential phases: meta-training and meta-testing. The meta-training phase employs diverse fault diagnosis tasks to optimize the MCCNN-ISENet model globally, yielding a robust parameter initialization that demonstrates strong cross-task generalization. These pre-trained parameters enable rapid adaptation to novel fault conditions, achieving convergence with minimal task-specific data during the meta-testing phase.
During meta-testing, the base model undergoes fine-tuning using the pre-trained initialization parameters from meta-training, enabling rapid adaptation to new fault diagnosis tasks. With only minimal training samples required for parameter updates, the model efficiently optimizes its settings to improve both convergence speed and diagnostic precision for previously unseen fault patterns. The complete workflow of the MAML-MCCNN-ISENet approach for plunger pump fault diagnosis is depicted in Figure 7.
The entire fault diagnosis process of a plunger pump based on MAML-MCCNN-ISENet is divided into the following stages: initial data acquisition, data preprocessing, dataset construction, task configuration, and model training and testing. The overall framework is shown in Figure 8.
Step 1: Construct a fault diagnosis test platform for plunger pumps to acquire vibration signals under various operational conditions and fault modes.
Step 2: The collected raw vibration signals undergo preprocessing via Adaptive Chirp Mode Decomposition (ACMD) to extract multidimensional feature components characterizing the plunger pump’s health conditions.
Step 3: The processed data is partitioned into training and test sets, with random sampling to create support–query pairs. Following meta-learning principles, multiple diagnostic meta-tasks are constructed within the training set. Each meta-task contains a number of support set samples for the initial update of the model parameters, and an independent query set sample to evaluate the model’s generalization performance on new fault types.
Step 4: A MAML-optimized diagnostic model is constructed using the MCCNN-ISENet feature encoder. The model undergoes multi-level optimization on the training set, where the meta-learning algorithm identifies optimal initial parameters enabling rapid adaptation to new fault diagnosis tasks.
Step 5: For new operational conditions or unseen fault types in the test set, the meta-trained parameters serve as initialization values, enabling rapid task adaptation through efficient gradient updates. This approach yields precise fault diagnosis based on the optimized model’s performance.
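The construction of meta-tasks in Step 3 can be sketched with the standard library alone. The N-way K-shot episode format, the function name `sample_episode`, and the class and sample counts below are assumptions for illustration and do not reflect the actual dataset.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way, k_shot, q_query, rng):
    """Return disjoint support and query index lists for one meta-task."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    classes = rng.sample(sorted(by_class), n_way)      # pick the fault classes
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + q_query)
        support += picked[:k_shot]                     # used for the inner-loop update
        query += picked[k_shot:]                       # used to evaluate the adapted model
    return support, query

# Hypothetical dataset: 5 health states with 20 samples each.
labels = [c for c in range(5) for _ in range(20)]
rng = random.Random(0)
support, query = sample_episode(labels, n_way=3, k_shot=5, q_query=10, rng=rng)
print(len(support), len(query))
```

Sampling the support and query indices from a single draw per class keeps the two sets disjoint, so the query loss measures generalization rather than memorization of the support samples.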