Few-Shot Learning Approaches for Fault Diagnosis Using Vibration Data: A Comprehensive Review

Abstract: Fault detection and diagnosis play a crucial role in ensuring the reliability and safety of modern industrial systems. For safety and cost considerations, critical equipment and systems in industrial operations are typically not allowed to operate in severe fault states. Moreover, obtaining labeled samples for fault diagnosis often requires significant human effort. This results in limited labeled data for many application scenarios. Thus, the focus of attention has shifted towards learning from a small amount of data. Few-shot learning has emerged as a solution to this challenge, aiming to develop models that can effectively solve problems with only a few samples. This approach has gained significant traction in various fields, such as computer vision, natural language processing, audio and speech, reinforcement learning, robotics, and data analysis. Surprisingly, despite its wide applicability, there have been limited investigations or reviews on applying few-shot learning to the field of mechanical fault diagnosis. In this paper, we provide a comprehensive review of the relevant work on few-shot learning in mechanical fault diagnosis from 2018 to September 2023. By examining the existing research, we aim to shed light on the potential of few-shot learning in this domain and offer valuable insights for future research directions.


Introduction
Rotating machinery, such as pumps, fans, generators, and compressors, is one of the most commonly used types of equipment in industrial production. Faults in these devices can lead to production downtime, losses, and safety accidents. Therefore, the timely and accurate diagnosis of faults in rotating machinery and its critical components is crucial for ensuring production safety and improving efficiency. Modern technology has advanced to the point where faults can be diagnosed using various signals, such as vibration, sound, temperature, and current. These techniques assist engineers in promptly detecting faults and taking measures to prevent production accidents and losses.
In the field of fault diagnosis for rotating machinery, machine learning and deep learning methods have been widely applied. For example, support vector machines (SVMs) [1]; decision trees [2]; random forests [3]; and various shallow or deep neural networks, including convolutional neural networks (CNNs) [4], recurrent neural networks (RNNs) [5], long short-term memory (LSTM) networks [6], and autoencoders (AEs) [7], can automatically extract features from the signals of rotating machinery and perform fault classification and prediction. Among these methods, deep learning, with its powerful feature extraction and representation capabilities, is one of the important directions in current fault diagnosis research. Despite the significant successes of deep learning in many domains, it still has limitations. One of them is that deep learning models usually require a large amount of training data to achieve high performance. In the industrial

The Systematic Review Process
This section presents an outline of the literature review procedure concerning the implementation of few-shot learning in mechanical fault diagnosis. The search for relevant literature predominantly occurred in two widely recognized academic databases, namely Web of Science and Scopus. These databases encompass a substantial collection of interdisciplinary research papers subjected to rigorous peer review, rendering them well-suited for unearthing a significant volume of studies on the subject of few-shot learning.
To avoid double-counting papers indexed in both databases, we employed a systematic approach during our database search and subsequent paper selection process. First, we conducted a thorough search in both databases to identify relevant papers. At this stage, some papers appeared in both databases. Upon identifying potentially relevant papers, we used reference management software to carefully compare the papers from both databases. In cases where we encountered the same paper in both Web of Science and Scopus, we ensured that it was counted as a single reference rather than being double-counted. In addition, we checked the abstract, introduction, and conclusion of each paper to make sure that all the papers we counted met our requirements. These processes allowed us to maintain the accuracy of our reference list and eliminate any duplications.
Specifically, during the literature review process, the following keywords were first used for the search: "Few-shot learning & fault diagnosis & vibration" and "Small sample learning & fault diagnosis & vibration", with a time span from 2018 to 30 September 2023. These searches returned more than 200 papers from the two databases. To be precise, using the keywords "Small sample learning & fault diagnosis & vibration" and a time span from 2018 to 30 September 2023, we identified 102 papers from Scopus and 183 papers from Web of Science; using the keywords "Few-shot learning & fault diagnosis & vibration" and the same time span, we identified 24 papers from Scopus and 46 papers from Web of Science.
Few-shot learning refers to a machine learning paradigm in which a model is trained to recognize and generalize from a very limited number of examples or data points; in many papers, the number of examples used was one, five, or fewer than twenty-five [13,14]. Small sample learning, on the other hand, has no clear definition of the sample size. Therefore, we ultimately applied the keywords "Few-shot learning & fault diagnosis & vibration" with a time span from 2018 to 30 September 2023.
In the next step, inclusion and exclusion criteria were established to systematically narrow down the scope and ensure high-quality evaluation. Work-in-progress papers, preprints, and other non-peer-reviewed publications were excluded. Only peer-reviewed journals and conference proceedings were considered, and works that did not align with the target domain or went beyond the defined scope were filtered based on abstracts and article browsing. The retrieved literature was manually filtered, and relevant target articles were selected. Specifically, the focus was solely on the application of few-shot learning in mechanical fault diagnosis, excluding its application in other fields, such as medicine. Regarding the type of signals considered, only vibration signals or multimodal signals containing vibrations were taken into account, while other signals such as images or speech were disregarded. After the screening process, a total of 41 papers were retained as references.
The literature search conducted in this manner showed that few-shot learning has been applied to various types of mechanical equipment in the field of fault diagnosis. These mainly include bearings, gearboxes, engines, pumps, suspension systems on trains, and pipelines. The distribution of few-shot learning applications in different types of mechanical equipment is shown in Table 1.

Table 1. Distribution of few-shot learning applications across types of mechanical equipment (equipment type: number of papers).
Bearings and gears: 2
Bearings for electric machines: 2
Bearings on machine tools: 1
Freight train bearings: 1
Magnetic flux leakage in a pipeline and bearings: 1
Wind turbines (bearings and gearboxes): 1
Electromechanical actuators: 1
Industrial rotating machinery: 1
Suspension systems on trains: 1
Aviation hydraulic pumps: 1
Autonomous underwater vehicles: 1
Industrial robots: 1

In addition to these classifications, a simple statistical analysis over time was performed, the results of which are shown in Figure 1. The statistics of the applied few-shot learning methods are shown in Figure 2. From the figure, it can be observed that meta-learning [15][16][17] is a prominent area of interest in few-shot learning methods, with over 50% of the articles focused on meta-learning. Within the domain of meta-learning, MAML (model-agnostic meta-learning) and its variations, along with metric-based meta-learning methods, collectively accounted for approximately 90% of the research contributions.

[Figure 2. Statistics of the applied few-shot learning methods. Recovered labels: data augmentation methods; others; GAN; panel (d).]
The statistics regarding the different journals in which the reviewed papers were published are presented in Table 2. It can be seen from the table that, from 2019 to 2023, the majority of research papers on few-shot fault diagnosis were published in Measurement Science and Technology, followed by ISA Transactions, Neural Computing and Applications, Measurement, IEEE Transactions on Industrial Informatics, IEEE Transactions on Instrumentation and Measurement, and Sensors. Other papers were also recognized by peers in other high-quality manufacturing journals and conferences.

Methods for Few-Shot Fault Diagnosis in Mechanical Systems
This section provides a detailed overview of the methods used in few-shot learning for machine fault diagnosis, specifically utilizing vibration data. We organized the content of this section according to the results presented in Figure 2 of Section 2. The approaches to few-shot learning are categorized into meta-learning, metric-based learning, data augmentation, and other methods. Since researchers have predominantly employed meta-learning algorithms in their studies, this section further subdivides meta-learning and primarily focuses on the application of metric-based meta-learning and learning initialization methods.

Meta-Learning
The idea of meta-learning was proposed as early as the 1990s [8,18]. Serving as a primary approach in few-shot learning, meta-learning, or learning to learn, involves acquiring knowledge from multiple tasks and adapting to new ones. It aims to learn task-level knowledge instead of individual sample-level information, promoting task-agnostic learning systems over task-specific models. Meta-learning consists of two stages: meta-training and meta-testing. In the meta-training stage, the meta-learner learns from various tasks to capture underlying patterns or knowledge. In the meta-testing stage, the meta-learner adapts its parameters based on new tasks, leveraging the meta-knowledge to make predictions swiftly using available labeled data. In our investigations, meta-learning was classified into metric-based meta-learning, learning initialization methods, and others.
As shown in Figure 3a, machine learning can be summarized in three steps: The first step involves defining a function with unknown parameters θ, where these unknown parameters can represent weights and biases. The second step entails formulating a loss function concerning the unknown parameters θ. The third step involves an optimization process to find a θ that minimizes the loss. Gradient descent is one of the most commonly used optimization methods. The process of meta-learning is shown in Figure 3b. Unlike traditional machine learning, meta-learning involves learning how to learn. Learning itself can be viewed as a function, represented as a learning algorithm denoted by F. This function takes a set of data as the input and produces a trained learning model as the output. For example, in a fault diagnosis task, the output would be a classifier. When test data are fed into the previously obtained classifier, it can classify the faults accordingly.
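The three steps of conventional machine learning described above can be sketched on a toy problem. The following is a minimal illustration, not taken from the reviewed papers: the linear model, the mean squared error loss, the synthetic data, and all names (`model`, `loss`, `grad`, `fit`) are assumptions chosen for brevity.

```python
import numpy as np

def model(theta, x):
    """Step 1: a function with unknown parameters theta = (weight, bias)."""
    w, b = theta
    return w * x + b

def loss(theta, x, y):
    """Step 2: a loss (mean squared error) expressed as a function of theta."""
    return np.mean((model(theta, x) - y) ** 2)

def grad(theta, x, y):
    """Analytic gradient of the loss with respect to (weight, bias)."""
    err = model(theta, x) - y
    return np.array([2 * np.mean(err * x), 2 * np.mean(err)])

def fit(x, y, lr=0.1, steps=500):
    """Step 3: gradient descent to find a theta that minimizes the loss."""
    theta = np.zeros(2)
    for _ in range(steps):
        theta = theta - lr * grad(theta, x, y)
    return theta

# Synthetic data (assumed for illustration): y = 3x - 0.5 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x - 0.5 + 0.01 * rng.normal(size=100)
theta_hat = fit(x, y)
```

In a fault diagnosis setting, the linear function would be replaced by a classifier and the squared error by a classification loss; the three-step structure stays the same.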
As depicted in the diagram, the first step in meta-learning is to define the learning algorithm F_ϕ, where the learnable components are denoted as ϕ. While traditional machine learning often focuses on learning weights and biases, in meta-learning, these learnable components can include network architectures, initial parameters, learning rates, and more. Different meta-learning methods aim to learn different learnable components.
The second step involves defining the loss function for the learning algorithm F_ϕ. In traditional machine learning, the loss function is derived from the training data, whereas in meta-learning, the loss function is derived from training tasks. This means that the loss function encapsulates the learning algorithm's ability to adapt and generalize across different tasks.
The third step is to use an optimization method to minimize the loss function. The choice of optimization algorithm depends on the nature of the learnable components ϕ. If the gradient ∂L(ϕ)/∂ϕ is computable, gradient descent can be used. However, if the gradient ∂L(ϕ)/∂ϕ is not computable, reinforcement learning algorithms or evolutionary algorithms can be employed to find solutions [19,20].
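One common way to write these three steps compactly is given below. The notation is an assumption introduced here for illustration and does not appear in the text above: N is the number of training tasks, each task n has its own training and test data, θ^{n*} is the model produced by the learning algorithm on task n, l^n is the loss of that model on the task's test data, and η is a step size.

```latex
\begin{aligned}
\theta^{n*} &= F_\phi\!\left(\mathcal{D}^{n}_{\text{train}}\right)
  && \text{(step 1: the learning algorithm outputs a model for task } n\text{)} \\
L(\phi) &= \sum_{n=1}^{N} l^{n}\!\left(\theta^{n*};\, \mathcal{D}^{n}_{\text{test}}\right)
  && \text{(step 2: the loss is accumulated over training tasks)} \\
\phi^{*} &= \arg\min_{\phi} L(\phi), \qquad
\phi \leftarrow \phi - \eta\, \frac{\partial L(\phi)}{\partial \phi}
  && \text{(step 3: gradient descent, if the gradient is computable)}
\end{aligned}
```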
In summary, meta-learning aims to enhance the efficiency and adaptability of learning algorithms by enabling them to learn from multiple tasks and generalize better to new, unseen tasks. The optimization of the learnable components plays a crucial role in achieving this objective. According to Figure 2, meta-learning was further classified into metric-based meta-learning, learning initialization methods, and others.

Metric-Based Meta-Learning
Metric-based meta-learning is a specific type of meta-learning approach that utilizes metric learning to achieve rapid adaptation to new tasks. In metric-based meta-learning, it is customary to employ a metric learning method, such as a prototypical network, to learn an embedding space in which the similarity between tasks is associated with the distances between samples in the embedding space. The term "metric" in this context refers to the measurement of similarity between tasks, rather than the distance between data samples.
A typical metric-based meta-learning method for fault diagnosis is shown in Figure 4. In the initial stage, supervised learning is conducted on source domain data to train a classification model (upper branch). Subsequently, the feature extractor is kept fixed, and episodic training is performed on the metric embedding module using a set of few-shot tasks randomly sampled from the source domain. Finally, the trained feature extractor and metric embedding are applied to perform testing on target tasks (lower branch). Metric-based meta-learning enables models to adapt swiftly to new fault patterns, crucial for few-shot scenarios. It performs well when the number of labeled examples is limited, making it suitable for fault diagnosis with small datasets.
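The episodic training mentioned above relies on repeatedly drawing small few-shot tasks from the source domain. A minimal sketch of such a sampler is shown below; the function name `sample_episode`, the N-way K-shot parameterization, and the synthetic label array are assumptions for illustration rather than details from the reviewed papers.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Return (support_idx, query_idx) index arrays for one N-way K-shot task."""
    # Pick which fault classes take part in this episode.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        # Shuffle the samples of class c, then split into support and query.
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])
        query.extend(idx[k_shot:k_shot + n_query])
    return np.array(support), np.array(query)

rng = np.random.default_rng(0)
# Hypothetical source domain: 10 fault classes with 50 vibration segments each.
labels = np.repeat(np.arange(10), 50)
support_idx, query_idx = sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=rng)
```

Each call yields a new task; the support indices play the role of the few labeled samples, and the query indices are used to compute the episode loss.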

Prototypical networks
Prototypical networks learn prototypes for each class based on feature representations of limited samples. Classification is performed by assigning a query sample to the class closest to its prototype in the feature space, making it effective for few-shot classification tasks.
Many researchers have applied this method for few-shot tasks. Tang et al. [21] presented an enhanced prototypical network with L2 prototype correction for few-shot cross-domain fault diagnosis. The method utilized L2 correction to refine prototype representations, enabling effective diagnosis across different domains with limited labeled data. Feng et al. [22] employed prototypical networks in the proposed semi-supervised meta-learning networks. The method learned class prototypes by averaging feature representations of limited labeled samples. Squeeze-and-excitation attention was integrated to enhance prototype quality, while the model was trained in a semi-supervised meta-learning framework for improved performance. In [23], a wavelet-prototypical network was employed for few-shot fault diagnosis. It fused time and frequency domain information using wavelet transforms.
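The prototype-based classification rule described above can be sketched in a few lines. This is a minimal illustration that assumes the embeddings have already been produced by some feature extractor; the function names and the toy two-dimensional embeddings are hypothetical, and the squared Euclidean distance is one common choice rather than the only one.

```python
import numpy as np

def class_prototypes(support_emb, support_labels):
    """Average the support embeddings of each class to obtain its prototype."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_emb, classes, protos):
    """Assign each query to the class whose prototype is nearest."""
    # Squared Euclidean distances, shape (n_query, n_classes).
    d2 = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Toy embeddings (assumed): two classes with two support samples each.
support_emb = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.2, 0.4], [5.0, 6.0]])

classes, protos = class_prototypes(support_emb, support_labels)
pred = classify(query_emb, classes, protos)
```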

Matching networks
Matching networks use attention mechanisms to establish correspondences between query and support samples. Attention weights are learned during training to classify query samples based on the labeled information aggregated from support samples, accommodating variable-length inputs in few-shot learning scenarios. Xu et al. [28] proposed a deep convolutional nearest-neighbor matching network (DC-NNMN) for cross-component few-shot fault diagnosis. The approach utilized matching networks with attention mechanisms to establish correspondences between query and support samples. DC-NNMN efficiently diagnosed faults across different components with limited labeled data, achieving accurate results. Wang et al. [14] proposed a feature space metric-based meta-learning model to distinguish failure attribution accurately under conditions of very limited data. A matching network and prototypical network were applied, respectively, in the proposed model to match the metric features to the support features using a public dataset. A similar approach was applied in [29] employing a matching network to match the metric features to the support features using experimental datasets.
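The attention-based aggregation of support labels can be sketched as follows. This is a simplified illustration under stated assumptions: embeddings are taken as given, attention is a softmax over cosine similarities, and the function names and toy vectors are hypothetical. It is not the implementation used in the cited studies.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def matching_predict(query_emb, support_emb, support_labels, n_classes):
    """Class probabilities as an attention-weighted sum of support labels."""
    sims = cosine_similarity(query_emb, support_emb)
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)
    one_hot = np.eye(n_classes)[support_labels]
    return attn @ one_hot  # shape (n_query, n_classes)

# Toy embeddings (assumed): one support sample per class.
support_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
support_labels = np.array([0, 1])
query_emb = np.array([[0.9, 0.1], [0.2, 0.8]])
probs = matching_predict(query_emb, support_emb, support_labels, n_classes=2)
```

Because every support sample contributes in proportion to its similarity to the query, no explicit class prototype is needed.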

Relation networks
Relation networks recognize relationships between samples using an embedding network to encode input samples into feature representations and a relation network to predict relationships between pairs of feature vectors, capturing complex sample relationships. Kang et al. [26] developed a few-shot rolling-bearing fault classification method using an improved relation network by introducing a residual shrinkage module and a scaled exponential linear unit activation function into the embedding module of the relation network. The approach enhanced the relation network's ability to capture complex relationships between samples in few-shot scenarios. Wang et al. [27] applied a relation network for the few-shot multiscene fault diagnosis of rolling bearings under compound-variable working conditions. Further studies using relation networks can be found in [30,31].
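The structure of a relation module, which scores pairs of feature vectors instead of using a fixed distance, can be sketched as below. This is only a structural illustration: the two-layer perceptron, its sizes, and the randomly initialized (untrained) weights are assumptions, so the resulting scores carry no diagnostic meaning.

```python
import numpy as np

def relation_scores(query_emb, class_emb, w1, b1, w2, b2):
    """Score every (query, class) pair with a small learned-style network."""
    n_q, n_c = query_emb.shape[0], class_emb.shape[0]
    # Row i * n_c + j holds the concatenation of query i and class feature j.
    pairs = np.concatenate(
        [np.repeat(query_emb, n_c, axis=0), np.tile(class_emb, (n_q, 1))], axis=1
    )
    hidden = np.maximum(pairs @ w1 + b1, 0.0)  # ReLU layer
    logits = hidden @ w2 + b2
    # Sigmoid maps each pair to a relation score in (0, 1).
    return (1.0 / (1.0 + np.exp(-logits))).reshape(n_q, n_c)

rng = np.random.default_rng(0)
dim, hidden_dim = 4, 8  # hypothetical sizes
query_emb = rng.normal(size=(3, dim))
class_emb = rng.normal(size=(2, dim))
# Untrained weights, drawn at random purely to make the sketch executable.
w1 = 0.1 * rng.normal(size=(2 * dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = 0.1 * rng.normal(size=(hidden_dim, 1))
b2 = np.zeros(1)
scores = relation_scores(query_emb, class_emb, w1, b1, w2, b2)
```

In training, these weights would be optimized together with the embedding network so that pairs from the same class score close to one.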
Other researchers have applied metric-based meta-learning methods comprising a subspace network with shared representation learning [32] and a cross-level fusion neural network [33].
Meta-learning, in general, focuses on training models to learn from different tasks or datasets so that they can adapt quickly to new, unseen tasks or datasets. Metric-based meta-learning, in particular, emphasizes the use of a metric or distance function to enable this rapid adaptation. For classification purposes, metric-based meta-learning could be placed in either Section 3.1 (Meta-Learning) or Section 3.2 (Metric Learning). However, our examination of the relevant literature revealed that the papers addressing metric-based meta-learning methods mainly considered task adaptation. Metric-based meta-learning primarily focuses on the ability of a model to adapt quickly to new tasks or datasets. It achieves this by learning a metric or similarity measure during meta-training, which helps in task-specific adaptation. This aligns closely with the core objective of meta-learning, which is to enable models to generalize effectively to new tasks. Additionally, metric-based meta-learning is often applied in few-shot learning scenarios, where the model needs to make predictions with a very limited number of labeled examples in the support set. This is a typical use case within meta-learning, as one of its primary applications is to excel in few-shot or low-data scenarios. Therefore, we included metric-based meta-learning in the broader category of meta-learning.

Learning Initialization Methods
In meta-learning, learning initialization methods refer to a category of algorithms that leverage pretrained models or knowledge obtained from previous tasks to facilitate faster and more effective learning on new, unseen tasks. The core idea behind these methods is to utilize the knowledge acquired from related tasks as a starting point for learning new tasks, thus enabling the model to adapt and generalize quickly with limited data. Among learning initialization methods, the prevailing approaches are model-agnostic meta-learning (MAML) [34][35][36][37][38][39] and Reptile [40].

MAML
MAML [41], model-agnostic meta-learning, is a meta-learning algorithm that enables fast adaptation to new tasks with limited data. It optimizes a model's initial parameters such that it can efficiently adapt to a new task using just a few gradient updates. MAML learns a more general initialization that facilitates quick fine-tuning on unseen tasks. By using gradient-based optimization to adapt models across tasks, MAML promotes better generalization and transfer learning, making it a powerful and flexible approach for few-shot learning scenarios.
Several papers have applied this method. Liu et al. [35] used MAML to train a meta-baseline model capable of quickly diagnosing faults in wind turbines with limited labeled data. MAML enabled the model to efficiently adapt to new turbine faults with just a few gradient updates, improving few-shot fault diagnosis performance. Yang et al. [36] applied MAML for few-shot fault diagnosis tasks in high-speed train suspension systems. The approach improved fault diagnosis performance by efficiently leveraging knowledge across different suspension system faults. Yu et al. [37] applied MAML to a learnable and interpretable framework based on model-free DA methods, D3AFS, for industrial scenarios with limited data. The good performance of the proposed model was verified using a magnetic flux leakage dataset in a pipeline and bearing datasets. Further papers addressing the MAML concept can be found in [34,38,39].
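The idea of optimizing an initialization so that a single gradient step adapts it well can be sketched on a toy problem. The example below is an illustration under strong assumptions and is not drawn from the cited works: tasks are hypothetical linear regressions whose weights scatter around a common `center`, and because the loss is quadratic, the second-order term of the meta-gradient can be written analytically instead of using automatic differentiation.

```python
import numpy as np

def grad_mse(theta, x, y):
    """Gradient of 0.5 * mean((x @ theta - y) ** 2) with respect to theta."""
    return x.T @ (x @ theta - y) / len(y)

def sample_task(rng, center, n_support=5, n_query=20):
    """Hypothetical toy task: a linear map whose weights scatter around center."""
    w = center + 0.1 * rng.normal(size=center.shape)
    xs = rng.normal(size=(n_support, center.size))
    xq = rng.normal(size=(n_query, center.size))
    return xs, xs @ w, xq, xq @ w

def maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.05):
    """One meta-update: adapt on each support set, evaluate on its query set."""
    meta_grad = np.zeros_like(theta)
    for xs, ys, xq, yq in tasks:
        # Inner loop: one gradient step on the support data.
        adapted = theta - inner_lr * grad_mse(theta, xs, ys)
        # Jacobian of the adapted parameters with respect to theta; for this
        # quadratic loss the Hessian is xs.T @ xs / n, so the expression is exact.
        jac = np.eye(theta.size) - inner_lr * (xs.T @ xs) / len(ys)
        # Outer gradient: chain rule through the inner update.
        meta_grad += jac.T @ grad_mse(adapted, xq, yq)
    return theta - outer_lr * meta_grad / len(tasks)

rng = np.random.default_rng(0)
center = np.array([2.0, -1.0])
theta = np.zeros(2)
for _ in range(300):
    theta = maml_step(theta, [sample_task(rng, center) for _ in range(8)])
```

After meta-training, `theta` sits near the region from which one support-set gradient step reaches each task's solution, which is the sense in which the initialization is "general".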

Reptile
Reptile [42] is a meta-learning algorithm that facilitates quick adaptation to new tasks by fine-tuning a model's parameters. It operates by repeatedly sampling tasks, training the model on each task for a few gradient steps, and then updating the model's parameters based on the changes accumulated across tasks. The algorithm encourages the model to learn a more general initialization that can be easily fine-tuned for different tasks. Unlike traditional meta-learning approaches, Reptile does not maintain an explicit representation of meta-parameters, making it more straightforward and efficient. Reptile has shown promising results in few-shot learning scenarios, demonstrating improved generalization across tasks. Pei et al. [40] employed Reptile to enhance the few-shot Wasserstein autoencoder (WAE). Reptile was utilized to further enhance the mapping ability of WAE from prior distribution to vibration signals when faced with a small dataset.
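The update described above (train on a sampled task for a few steps, then move the shared parameters toward the task-adapted ones) can be sketched as follows. The toy linear regression tasks, the step sizes, and the function names are assumptions for illustration; they do not reproduce the setup of the cited study.

```python
import numpy as np

def sample_task(rng, center, n=10):
    """Hypothetical toy task: a linear map whose weights scatter around center."""
    w = center + 0.1 * rng.normal(size=center.shape)
    x = rng.normal(size=(n, center.size))
    return x, x @ w

def train_on_task(theta, x, y, lr=0.05, steps=5):
    """A few plain gradient steps on one task, starting from theta."""
    for _ in range(steps):
        theta = theta - lr * (x.T @ (x @ theta - y)) / len(y)
    return theta

def reptile_step(theta, tasks, meta_lr=0.5):
    """Move theta toward the average of the task-adapted parameters."""
    deltas = [train_on_task(theta, x, y) - theta for x, y in tasks]
    return theta + meta_lr * np.mean(deltas, axis=0)

rng = np.random.default_rng(0)
center = np.array([2.0, -1.0])
theta = np.zeros(2)
for _ in range(200):
    theta = reptile_step(theta, [sample_task(rng, center) for _ in range(8)])
```

Only first-order gradients of the task loss are needed here, in contrast to the Jacobian term that appears when differentiating through an inner update.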
The abovementioned methods offer a flexible way to initialize models, enabling them to adapt quickly to new tasks. However, training can be computationally intensive, requiring substantial resources.

Other Methods
Other researchers have applied ensemble techniques in meta-learning. For example, Li et al. [43] proposed a light gradient-boosting-machine-based multiscale weighted ensemble model to perform effective few-shot fault diagnosis without requiring cross-domain data. Chen et al. [44] designed a meta-self-attention multiscale convolution neural network for the actuator fault diagnosis of autonomous underwater vehicles. Che et al. [45] employed ensemble meta-learning, enabling the model to adapt quickly to new fault conditions with few samples. By combining multiple models and leveraging meta-learning techniques, the proposed method achieved enhanced diagnostic performance in variable working conditions.
The future of meta-learning holds the potential for emphasizing fine-grained meta-learning and crafting more robust meta-learning models with enhanced transferability and generalization capabilities. In the realm of fine-grained meta-learning, forthcoming research could concentrate on devising techniques that enable adaptability to even more nuanced variations in fault patterns. This entails the development of models capable of swiftly generalizing to hitherto unencountered fault types or levels of severity. In the pursuit of crafting more robust meta-learning models possessing heightened transferability and generalization capabilities, enhancing the transferability of meta-learned models across different machinery types and operating conditions is crucial. Researchers could delve into methodologies aimed at ensuring that insights garnered from one machinery category can be effectively applied to others.

Metric Learning
Metric learning [46] is a machine learning task that focuses on learning a distance or similarity metric between data points in a given feature space. A typical example of metric learning can be found in Figure 5. The objective of metric learning is to ensure that the learned metric reflects the actual similarity or dissimilarity between data samples, making it possible to measure how similar or dissimilar they are to each other. Metric learning can be used in traditional machine learning, deep learning, or meta-learning. In the current paper, the application of metric learning in few-shot learning is called metric-based meta-learning (Section 3.1.1).
The process of metric learning typically involves a set of training samples, each with a corresponding label or similarity information. The learning algorithm (e.g., Siamese, prototypical, or relation networks) then uses these labeled data to optimize the metric in a way that brings similar samples closer together and dissimilar samples farther apart in the feature space.
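One widely used objective that pulls similar samples together and pushes dissimilar samples apart is the contrastive loss, commonly paired with Siamese networks. The sketch below is illustrative only: the margin value, the function name, and the toy embeddings are assumptions, and the reviewed papers may use different objectives.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Mean contrastive loss over pairs of embeddings."""
    dist = np.linalg.norm(emb_a - emb_b, axis=1)
    # Similar pairs are penalized by their squared distance.
    pos = same_class * dist ** 2
    # Dissimilar pairs are penalized only if closer than the margin.
    neg = (1 - same_class) * np.maximum(margin - dist, 0.0) ** 2
    return np.mean(pos + neg)

# Toy pairs (assumed): the first pair shares a class, the second does not.
emb_a = np.array([[0.0, 0.0], [0.0, 0.0]])
emb_b = np.array([[0.6, 0.8], [0.3, 0.4]])
same_class = np.array([1, 0])
loss_value = contrastive_loss(emb_a, emb_b, same_class)
```

Here the first pair lies at distance 1.0 and contributes 1.0, while the second lies at distance 0.5 inside the margin and contributes 0.25, giving a mean of 0.625.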
Lu et al. [47] applied multiple-kernel maximum mean discrepancy, which is an improved version of maximum mean discrepancy, for distribution discrepancy measurement and developed a multiview and multilevel network model for fault diagnosis. In another paper by the lead author [48], the fault diagnosis problem was treated as a similarity metric learning problem, with a transfer relation network (TRN) proposed to achieve this objective. The TRN model incorporated a relation learning module that captured knowledge and patterns shared between the source and target domains. Further papers addressing metric learning can be found in [49][50][51][52].
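To make the notion of a multiple-kernel maximum mean discrepancy more concrete, the sketch below computes a biased estimate of the squared discrepancy between two feature sets using several Gaussian kernels. It is a simplified illustration, not the formulation of the cited work: the bandwidths are arbitrary, and the kernels are combined with equal weights, whereas multiple-kernel variants typically choose or learn the kernel weights.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mk_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased squared MMD, averaged over kernels with equal (assumed) weights."""
    total = 0.0
    for bw in bandwidths:
        total += (gaussian_kernel(source, source, bw).mean()
                  + gaussian_kernel(target, target, bw).mean()
                  - 2.0 * gaussian_kernel(source, target, bw).mean())
    return total / len(bandwidths)

rng = np.random.default_rng(0)
# Hypothetical feature sets: one target drawn like the source, one shifted.
source = rng.normal(size=(100, 3))
similar = rng.normal(size=(100, 3))
shifted = rng.normal(loc=3.0, size=(100, 3))
d_similar = mk_mmd2(source, similar)
d_shifted = mk_mmd2(source, shifted)
```

A larger value indicates a larger gap between the source and target feature distributions, which is the quantity a domain adaptation loss would try to reduce.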
Metric learning can lead to improved feature representations for fault diagnosis, emphasizing relevant patterns. However, metric-based methods heavily depend on the quality and representativeness of the training data.
The future prospects for metric learning encompass two main directions: embedding for interpretability and the development of hybrid approaches. Embedding for interpretability involves the development of metric learning techniques that generate interpretable embeddings. This direction holds promise for gaining insights into the contributions of specific features to fault diagnosis, promoting transparency and actionable insights. The development of hybrid approaches entails the fusion of metric learning with other machine learning methods, such as deep learning and ensemble techniques, with the potential to enhance model performance. Hybrid models can effectively capture both global and local similarities in data, harnessing the strengths of diverse learning paradigms to improve fault diagnosis accuracy and robustness. These future developments in metric learning offer opportunities to enhance transparency and leverage complementary techniques for more effective fault detection and diagnosis in mechanical systems.

Data Augmentation Method
The core issue in few-shot fault diagnosis lies in the limited number of samples, which hinders the training of a reliable, highly generalizable diagnostic model. In the case of limited data, a generative model can be used to enhance sample diversity. According to the reviewed publications on the application of few-shot learning in the machine fault diagnosis field using vibration data from 2018 to 2023, the data augmentation methods are mainly GAN (generative adversarial network)-based methods [53,54].

GAN-Based Methods
A GAN is a type of unsupervised learning model that consists of two neural networks: a generator network and a discriminator network. The main idea behind GANs is to train the generator network to produce realistic data samples by learning from a training dataset. The generator takes random noise as the input and generates synthetic samples. The discriminator network, on the other hand, acts as a binary classifier that distinguishes between real data samples from the training set and the synthetic samples generated by the generator. During the training process, the generator and discriminator networks play a game against each other. The generator tries to produce realistic samples that can fool the discriminator, while the discriminator aims to correctly classify real and synthetic samples. The two networks are trained simultaneously, and their performance improves iteratively through an adversarial process.
As training progresses, the generator becomes better at generating realistic samples, and the discriminator becomes more skilled at distinguishing between real and synthetic data. Eventually, the generator network learns to generate samples that are difficult for the discriminator to differentiate from real data. It is worth mentioning that GANs face several challenges, including mode collapse (when the generator produces a limited set of similar samples) and instability during training [10]. However, they continue to be an active area of research, and many advancements and variations have been developed to address these issues and push the boundaries of generative modeling.
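The adversarial game described above can be made concrete with a minimal sketch of the two loss terms. It assumes the standard binary cross-entropy formulation with the commonly used non-saturating generator loss; the function names and example probabilities are illustrative and are not taken from any of the reviewed papers.

```python
import math


def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def discriminator_loss(d_real, d_fake):
    """Discriminator objective: real samples labelled 1, synthetic samples labelled 0."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator objective: push D(G(z)) towards 1."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability that a sample is real).
confident_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
confused_d = discriminator_loss(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

In this sketch a discriminator that separates the two sets well has a low loss while the generator's loss is high, and the situation reverses as the synthetic samples become harder to tell apart from real ones. At the point where the discriminator outputs 0.5 everywhere, its loss equals 2 ln 2.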
In the field of machine fault diagnosis, researchers usually use one-dimensional (1D) or two-dimensional (2D) samples. One-dimensional samples include time-domain data, frequency-domain data, and some extracted data features. Li et al. [53] explored two few-shot learning techniques involving parameter fine-tuning and a conditional Wasserstein generative adversarial network (C-WGAN) for diagnosing faults in freight train rolling bearings. They applied data segmentation and transformed the signals into the frequency domain, resulting in 1D frequency signals. To automatically extract features from the bearing vibration signals and classify fault types, they developed a one-dimensional convolutional neural network (1D-CNN). Gao et al. [55] developed a data augmentation model based on an integrated convolutional transformer GAN to improve diagnostic performance under limited-data conditions by generating high-quality signals. Wan et al. [56] proposed an unsupervised fault diagnosis method based on a quick self-attention convolutional generative adversarial network. Chen et al. [54] utilized a Wasserstein deep convolutional generative adversarial network (WDCGAN) to improve the performance of few-shot fault diagnosis in electrohydrostatic actuators. They achieved this by transforming multidimensional experimental data into 2D grayscale data and extracting local features, effectively emphasizing the time-series characteristics and correlations among different signals.
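The kind of preprocessing mentioned above, segmenting a vibration record and converting each segment into a 1D frequency signal, can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline of Li et al. [53]: the window length, step, sampling rate, and synthetic 50 Hz sine are hypothetical, and a naive discrete Fourier transform stands in for the FFT routine a practical implementation would use.

```python
import cmath
import math


def segment(signal, length, step):
    """Split a 1D signal into fixed-length, possibly overlapping windows."""
    return [signal[i:i + length] for i in range(0, len(signal) - length + 1, step)]


def magnitude_spectrum(window):
    """One-sided magnitude spectrum via a naive O(n^2) DFT."""
    n = len(window)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
        spectrum.append(abs(acc) / n)
    return spectrum


# Hypothetical record: a 50 Hz sine sampled at 1000 Hz for 0.4 s.
fs = 1000
signal = [math.sin(2 * math.pi * 50 * t / fs) for t in range(400)]

windows = segment(signal, length=200, step=100)      # 50% overlap
spectra = [magnitude_spectrum(w) for w in windows]   # one 1D frequency sample per window

peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
peak_hz = peak_bin * fs / 200                        # bin index -> frequency in Hz
```

Each entry of `spectra` is a 1D frequency-domain sample that could be fed to a classifier or a generative model. Overlapping windows are one simple way to obtain more samples from a short record.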

Other Data Augmentation Models
Xia et al. [57] proposed an augmentation-based discriminative meta-learning method to address the few-shot cross-machine domain shift problem. During the meta-training process, signal transformation was introduced to enhance meta-task diversity for robust feature learning, and multi-scale learning was integrated for adaptive feature embedding. In the meta-testing phase, sparse labeled fault data boosted model generalization using quasi-meta-training via data augmentation, and a new hyperbolic prototypical loss was designed to ensure more distinct feature representation with a hyperbolic decision boundary for separable category prototypes.
Zhao et al. [58] introduced a data augmentation technique called randomized wavelet expansion to create a collection of synthetic samples with characteristics comparable to those of the original samples. Subsequently, the synthesized samples were used as the training dataset for a deep CNN to achieve the few-shot fault diagnosis of aviation hydraulic pumps.
Wang et al. [59] proposed an extended convolutional adversarial autoencoder (ECAAE) as an end-to-end fault diagnosis approach for electromechanical actuators (EMAs) based solely on vibration signals. The ECAAE combined a set of CNNs for feature extraction with the data generation capabilities of adversarial autoencoders, allowing the model to utilize both unlabeled and unbalanced labeled samples. Through an adversarial training process and a hyperparameter-free signal conversion method, the ECAAE achieved robust and precise fault diagnosis for EMAs even with varying working conditions, unbalanced samples, and few-shot situations.
Hu et al. [60] introduced a self-supervised learning framework that combined both inter-instance learning and intra-temporal learning. This approach aimed to improve the model's ability to generalize from limited labeled data by leveraging the inherent structure and temporal information in the data.
In the field of few-shot fault diagnosis, the quantity and quality of data play a crucial role in the performance of models. However, in practical applications, acquiring a substantial amount of high-quality annotated data can be prohibitively expensive and challenging. In such scenarios, generative models can serve as an effective solution. By leveraging generative models, it becomes possible to generate a large number of new samples based on the distribution of existing data. Although these samples are synthesized, they retain certain characteristics and the distribution of the original data to an extent. It is important to note that while generative models can enhance the diversity of samples, the generated samples may not fully represent the true distribution of real data. Therefore, caution is required when utilizing generated samples. Additionally, the quality and efficacy of generative models are influenced by the design and training process of the model. Consequently, when employing generative models for data augmentation, the careful adjustment of the model parameters is necessary to ensure that the generated samples are statistically similar to the real data, thereby guaranteeing the effectiveness of data augmentation. Data augmentation enhances the diversity of training samples, potentially reducing overfitting, and helps models generalize better to unseen fault patterns. It is worth noting, however, that the effectiveness of data augmentation is limited by the quality and quantity of the original data.
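One simple way to act on the caution above is to compare summary statistics of real and generated signals before adding the generated ones to a training set. The sketch below is an assumption-laden illustration and not a procedure taken from the reviewed papers: the chosen features (RMS, kurtosis, crest factor), the 20% tolerance, and the test signals are all hypothetical, and agreement on a few statistics does not show that two distributions match.

```python
import math


def features(signal):
    """Common time-domain vibration statistics for one signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    kurtosis = sum((x - mean) ** 4 for x in signal) / n / var ** 2
    crest_factor = max(abs(x) for x in signal) / rms
    return {"rms": rms, "kurtosis": kurtosis, "crest_factor": crest_factor}


def relative_gaps(real, synthetic):
    """Relative difference of each statistic, measured against the real signal."""
    r, s = features(real), features(synthetic)
    return {name: abs(r[name] - s[name]) / abs(r[name]) for name in r}


def looks_similar(real, synthetic, tolerance=0.2):
    """Accept a synthetic signal only if every statistic is within the tolerance."""
    return all(gap <= tolerance for gap in relative_gaps(real, synthetic).values())


# Hypothetical signals: a reference sine, a slightly perturbed copy, and a badly scaled copy.
real = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
plausible = [1.05 * math.sin(2 * math.pi * 5 * t / 200 + 0.3) for t in range(200)]
implausible = [3.0 * x for x in real]

accept_plausible = looks_similar(real, plausible)
accept_implausible = looks_similar(real, implausible)
```

A check of this kind is only a coarse filter. In practice it would be complemented by frequency-domain comparisons and by evaluating whether the augmented training set actually improves diagnosis on held-out real data.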
The future outlook for data augmentation methods encompasses two main aspects: cross-modal data augmentation and self-supervised learning. Cross-modal data augmentation involves the exploration of techniques facilitating the integration of data from diverse modalities, such as vibration signals, acoustic signals, and temperature data. This integration holds the potential to yield more comprehensive and informative representations for fault diagnosis, allowing for a richer understanding of system behavior. In the context of self-supervised learning, the amalgamation of self-supervised learning methods with data augmentation stands as a promising approach. This synergy could significantly reduce the dependence on labeled data, enabling models to acquire valuable representations from unlabeled or weakly labeled data, thereby advancing the efficiency and versatility of fault diagnosis processes. These future directions for data augmentation methods offer the prospect of enhancing the depth and breadth of insights available for fault detection and diagnosis in mechanical systems.

Other Few-Shot Learning Methods
Taking into account the impact of noise on vibration signals, Ma et al. [61] introduced a multi-order graph embedding model. They presented an enhanced sine cosine algorithm strategy to optimize the feature extraction capability of a stacked denoising autoencoder. This method was then employed to address the few-shot diagnosis issue. Chen et al. [62] introduced a multi-channel calibrated transformer (MCSwin-T) model with shifted windows to tackle few-shot fault diagnosis in scenarios with sharp speed variation. The model employed multi-channel calibration to align feature representations across speeds, enabling effective fault pattern recognition with limited labeled data. Ma et al. [31] developed a feature extractor named MiniNet that struck an optimal balance between channel count and network depth during fault feature extraction. Leveraging MiniNet, the authors introduced a fault diagnosis model for few-shot samples that effectively enhanced both transferability and discriminability. Wang et al. [63] proposed a method based on two-dimensional (2D) images and cross-domain few-shot learning for bearing fault diagnosis. In the paper, the authors did not mention meta-learning or metric-based methods, although a distance-based classifier was employed to improve the classification capacity of the few samples. Therefore, we included this paper in the category of other few-shot learning methods.
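To illustrate what a distance-based classifier on a handful of labeled samples can look like, the following sketch assigns a query to the class whose mean feature vector is nearest in Euclidean distance. It is a generic nearest-prototype example and is not the classifier of Wang et al. [63]; the class names and two-dimensional feature vectors are hypothetical.

```python
import math


def class_prototypes(support):
    """Mean feature vector per class, from a dict mapping label -> list of vectors."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is closest to the query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-shot support set with already-extracted 2D features.
support = {
    "normal": [[0.1, 0.2], [0.0, 0.1]],
    "inner_race_fault": [[1.0, 0.9], [1.2, 1.1]],
}
prototypes = class_prototypes(support)

label_a = classify([0.9, 1.0], prototypes)
label_b = classify([0.05, 0.1], prototypes)
```

Because the decision depends only on distances to a few class means, such a classifier has no trainable parameters of its own and relies on the feature extractor to place samples of the same fault type close together.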

Discussion
The analysis of the literature revealed that meta-learning holds a central position among few-shot learning approaches, accounting for more than 50% of the studied articles. Among the various meta-learning techniques, MAML (model-agnostic meta-learning) and its modifications, alongside metric-based meta-learning methods, collectively contributed approximately 90% of the research in this field. In addition, through the review, we found that transformers are increasingly popular in few-shot learning; related applications can be found in [55,62,64].
Looking ahead, few-shot learning holds promising prospects in mechanical fault diagnosis.
Robustness and Generalization: Improving the robustness and generalization capabilities of few-shot learning models is an ongoing pursuit. Research will continue to address issues related to noisy data, domain shifts, and class imbalances to make few-shot learning more reliable in practical settings.
Unsupervised Few-shot Learning: Exploring unsupervised few-shot learning, where models can learn from unlabeled data in a few-shot scenario, is an exciting and challenging direction. This area has the potential to further reduce the reliance on labeled data.
Combining Few-shot Learning with Other Techniques: Integrating few-shot learning with other machine learning paradigms, such as transfer learning, semi-supervised learning, and reinforcement learning, could lead to more powerful and versatile models.
Few-shot Learning in Real-world Applications: Few-shot learning is valuable in real-world applications due to its data efficiency, rapid adaptation to new tasks, flexibility across domains, reduced annotation effort, and effective generalization to unseen data. These advantages make it a promising solution for addressing data scarcity and dynamic environments while minimizing manual labeling and achieving better performance.

Conclusions
Continuing the work of previous researchers, this paper focused on a comprehensive analysis of few-shot learning methods applied in the domain of mechanical fault diagnosis using vibration signals. The study particularly concentrated on advancements made within the past five years, spanning from 2018 to September 2023. We found that meta-learning is a prominent area of interest in few-shot learning methods, with over 50% of the articles focused on meta-learning. Within the domain of meta-learning, MAML (model-agnostic meta-learning) and its variations, along with metric-based meta-learning methods, collectively accounted for approximately 90% of the research contributions. In summary, the use of few-shot learning in real-world applications is motivated by its ability to handle data scarcity, adapt rapidly to new tasks, demonstrate flexibility across domains, reduce annotation effort, facilitate transfer learning, and effectively generalize to unseen data. These advantages make few-shot learning a promising approach for addressing various challenges in practical applications and pushing the boundaries of machine learning in real-world settings.

Figure 1. Number of publications on the application of few-shot learning in the machine fault diagnosis field using vibration data from 2018 to September 2023.

Figure 2. Statistics of few-shot learning methods in the machine fault diagnosis field using vibration data from 2018 to September 2023: (a) few-shot learning methods; (b) meta-learning; (c) metric learning; (d) data augmentation methods.

Figure 3. The differences between traditional learning and meta-learning: (a) the procedure of traditional machine learning; (b) the procedure of meta-learning.

Figure 5. Typical example of metric learning.

Table 1. The distribution of few-shot learning applications in different types of mechanical equipment.

Table 2. Number of publications by journal for few-shot learning in the machine fault diagnosis field using vibration data from 2018 to 2023.