A Model-Agnostic Meta-Baseline Method for Few-Shot Fault Diagnosis of Wind Turbines

Fault diagnosis technology helps improve the reliability of wind turbines and further reduces operation and maintenance costs at wind farms. However, in practice, wind turbines are not allowed to operate with faults, so few fault samples can be obtained. With such a small amount of training data, traditional fault diagnosis models under a deep learning framework, which require huge numbers of samples, struggle to maintain high accuracy and effectiveness. Few-shot learning can effectively address the overfitting caused by scarce fault samples during model training. Building on model-agnostic meta-learning (MAML), this paper proposes a model for few-shot fault diagnosis of the wind turbine drivetrain, named the model-agnostic meta-baseline (MAMB). The training data are input into the base classification model for pre-training; then, data randomly selected from the training set form multiple meta-learning tasks that are used to train the MAML; finally, the later layers of the model are fine-tuned at a smaller learning rate. The proposed model was evaluated on small samples of the bearing data from Case Western Reserve University (CWRU) and on generator bearing and gearbox vibration data from wind turbines under randomly changing operating conditions. The results verified that the proposed method was superior in one-shot, five-shot, and ten-shot wind turbine tasks.


Introduction
As the installed capacity of wind turbines increases rapidly, condition monitoring and fault diagnosis technology attracts more attention as a way to guarantee the operational reliability of wind turbines. The huge volume of vibration data collected from wind farms has prompted the development of intelligent diagnosis of wind turbines, driven by progress in artificial intelligence. However, in reality, wind turbines are not allowed to run with faults: when a fault occurs, the wind turbine has to shut down. Therefore, the collected operating data are mostly normal data under healthy status, with very little fault data. These data are clearly insufficient to train an intelligent classification model using traditional deep learning, due to the potential overfitting caused by sample imbalance and class imbalance. Data augmentation and regularization techniques can alleviate the overfitting caused by low data volume [1]. Data augmentation adds data through manual rules such as panning, flipping, cropping, and rotation; designing these rules relies heavily on domain knowledge and requires expensive labor. Regularization can be used to correct the direction of gradient descent. However, neither method can fundamentally solve the overfitting problem when data are extremely scarce.
Few-shot learning can train deep models with very limited data and solve the problem of overfitting caused by a small number of fault samples. Current few-shot learning makes full use of the advantages of deep neural networks in feature representation. The literature [25] compared the current spectrum of a faulty wind turbine motor with the current spectrum of a healthy motor to achieve fault diagnosis.
SCADA-based time series for anomaly detection in wind turbines is of great practical importance [26][27][28][29][30][31][32][33][34]. However, there are currently some limitations in using SCADA for fault diagnosis of wind turbines. SCADA cannot monitor vibrations at multiple measurement points and locations as a wind turbine condition monitoring system (CMS) can. Very precise fault location of wind turbines is not possible with SCADA data, for example, SCADA data cannot determine whether the inner or outer ring of a bearing is faulty or detect compound faults in multiple gears of a gearbox. The use of CMS can solve the limitations of SCADA and can effectively diagnose the specific fault location.
Our group has done a range of work on fault diagnosis for wind turbines using CMS. In the literature [35] the complex wavelet transform was used to extract weak faults in the wind turbine gearbox by analyzing the strips of the multiscale enveloping spectrogram (MuSEnS) on different scales. Conventional demodulation analysis, cyclic coherence function, complex wavelet transform, and spectral kurtosis were used to analyze the vibration signals of a real 2 MW wind turbine generator with a faulty bearing [36]. Empirical wavelet transform was utilized to adaptively find weak fault frequency in the planetary stage as well as evident fault characteristics in other ordinary stages [37]. The normalized multistage enveloping spectrogram was presented to reveal the fault characteristic frequencies of planetary gears and bearings [38]. The literature [39] reviewed almost all the research on the vibration-based diagnosis algorithm for wind turbines in the past decade.
The above research mainly addresses the problem of variable operating conditions, compound faults and weak faults in wind turbines from the perspective of signal processing. Signal processing often requires some prior knowledge and expert experience. In contrast, deep learning does not require too much human intervention and can effectively improve the intelligence of fault diagnosis. CMS is more expensive and requires additional hardware and software costs. Therefore, in reality, CMS does not have access to sufficient sample data as SCADA does. In practical situations, wind turbine fault samples are few, and the specific operating conditions at every moment cannot be accurately obtained. Therefore, wind turbines' few-shot learning requires a more powerful meta-learning model. In combination with convolutional neural network (CNN) pre-training, MAML, and fine-tuning, this paper makes full use of CNN's classification ability, MAML's generalization ability to learn new tasks, and fine-tuning's ability to further optimize parameters, so as to better solve the problem of wind turbine few-shot fault diagnosis under variable working conditions and noise.
In this paper, a novel few-shot fault diagnosis model of the wind turbine drivetrain based on model-agnostic meta-learning (MAML) is developed. Three types of vibration data are analyzed to verify the advantages of the proposed model: the few-shot case of the bearing data from Case Western Reserve University (CWRU), the few-shot case of the wind turbine generator bearing, and the few-shot case of the wind turbine gearbox. Each class of data contained data in both the x and y directions, all sampled over 1 s. All training data were input into the classifier to train a base model, i.e., the base classifier; then, randomly selected samples from the training datasets were used to build the meta-learning tasks, and the base classifier was further updated using MAML; finally, the optimal classifier was achieved by fine-tuning. The rest of this paper is organized as follows: Section 2 introduces the basic concepts of few-shot learning and MAML. In Section 3, a few-shot fault diagnosis model for wind turbines based on MAML is proposed. In Section 4, the on-site wind turbine datasets are input into the proposed model for training and testing, and the results are analyzed. Section 5 concludes the paper.


Few-Shot Learning Based on Meta-Learning
Meta-learning was originally driven by the human learning process, where humans can learn to recognize a new object with a few instances. The model contains the training set and the testing set. The training set comes from the source domain, the testing set shares the same label space and comes from the target domain, and the source domain does not intersect with the target domain. In the training meta-learning process, k data are selected from the training set as the support set S, and q data as the query set Q. If the support set contains c categories with k labeled data in each category, the few-shot problem is called a c-way k-shot. Since the number of labeled samples in the support set is extremely small, meta-learning is performed on the training set to extract transferable knowledge and classify the testing set.
The support set S and query set Q are extracted from the source domain data; the support set provides labeled samples from which the model generates prototype features, and the query set provides training samples used to update the model. Together, a support set and a query set form a meta-task, and multiple meta-tasks form a training set. For a c-way k-shot problem, during the training phase, c categories are randomly selected from the training set, and k samples are selected from each category (a total of k × c data) to construct a meta-task as the support set of the model; then a batch of samples from the remaining data in these c categories (n = q × c) is selected as the query set to update the model. During training, different meta-tasks are sampled each time, so overall the training covers different combinations of categories. This mechanism enables the model to learn the common parts of different meta-tasks, such as how to extract important features and compare sample similarities. Models trained through this mechanism perform better when classifying new, unseen meta-tasks.
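The c-way k-shot sampling described above can be sketched as follows. This is only an illustrative snippet, not the paper's implementation; the function name `sample_meta_task` and the toy dataset are hypothetical.

```python
import random

def sample_meta_task(data_by_class, c, k, q, rng=random):
    """Sample one c-way k-shot meta-task: a support set with k labeled
    samples per class and a query set with q samples per class, with no
    overlap between support and query within a class."""
    classes = rng.sample(sorted(data_by_class), c)         # pick c categories
    support, query = [], []
    for label in classes:
        samples = rng.sample(data_by_class[label], k + q)  # without replacement
        support += [(x, label) for x in samples[:k]]       # k * c support pairs
        query += [(x, label) for x in samples[k:]]         # q * c query pairs
    return support, query

# Example: a 4-way 5-shot task drawn from a toy dataset of 10 samples/class.
toy = {f"fault_{i}": list(range(10)) for i in range(6)}
S, Q = sample_meta_task(toy, c=4, k=5, q=3)
```

During meta-training, calling this sampler repeatedly yields tasks with different category combinations, which is what forces the model to learn task-general features.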

Model-Agnostic Meta-Learning
Finn et al. proposed model-agnostic meta-learning (MAML) [40], which is compatible with any model trained with gradient descent, by explicitly training the parameters of the model so that a new task requires only a small number of gradient steps and a small amount of training data to produce good generalization performance. The method has achieved good performance in computer vision [41][42][43], speech recognition [44,45], and reinforcement learning [46].
The MAML meta-gradient update involves a gradient through a gradient, i.e., MAML is based on a second-order gradient, which gives MAML the flexibility to adapt to different models. The MAML update process is shown in Figure 1. Define the model as $f$, the parameter of the model as $\phi$, and its initialization parameter as $\phi_0$. For discrete classification tasks with a cross-entropy loss, the loss is
$$\mathcal{L}_{T_i}(f_\phi) = \sum_{(x^{(j)},\, y^{(j)}) \sim T_i} \left[ -y^{(j)} \log f_\phi\!\left(x^{(j)}\right) - \left(1 - y^{(j)}\right) \log\!\left(1 - f_\phi\!\left(x^{(j)}\right)\right) \right],$$
where $x^{(j)}, y^{(j)}$ are an input/output pair sampled from task $T_i$. Figure 2 illustrates the process of the MAML update step by step. Assuming that the learning rate for a single-task $\theta$ update is $\gamma$ and the learning rate for the model $\phi$ update is $\eta$, the steps of MAML are as follows: (1) For each task $T_i$, compute the gradient on the support set $S$ and update the task parameters: $\theta_i = \phi - \gamma \nabla_\phi \mathcal{L}_{T_i}(f_\phi)$. (2) Calculate the sum of the losses of all tasks on the query set: $\mathcal{L} = \sum_i \mathcal{L}_{T_i}(f_{\theta_i})$. (3) Update the initialization parameters: $\phi \leftarrow \phi - \eta \nabla_\phi \sum_i \mathcal{L}_{T_i}(f_{\theta_i})$.
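These three steps can be traced on a deliberately tiny toy problem. This is a hedged illustration only, not the paper's model: each hypothetical task has a one-dimensional quadratic loss with optimum t, so the inner and outer gradients can be written in closed form, and the (1 - 2*gamma) factor in the code is exactly the "gradient through a gradient" that makes MAML second-order.

```python
def maml_step(phi, task_targets, gamma=0.1, eta=0.05):
    """One MAML meta-update on toy tasks with loss L_i(phi) = (phi - t_i)^2."""
    grad_outer = 0.0
    for t in task_targets:
        # (1) inner update on the support set: theta_i = phi - gamma * L_i'(phi)
        theta = phi - gamma * 2.0 * (phi - t)
        # (2) query loss of the adapted parameters is (theta - t)^2; its gradient
        # w.r.t. phi carries d(theta)/d(phi) = 1 - 2*gamma (second-order term)
        grad_outer += 2.0 * (theta - t) * (1.0 - 2.0 * gamma)
    # (3) meta-update of the initialization parameters
    return phi - eta * grad_outer

phi = 5.0
for _ in range(200):
    phi = maml_step(phi, task_targets=[1.0, 3.0])
# phi converges toward an initialization between the two task optima
```

With two symmetric tasks whose optima are 1.0 and 3.0, the meta-parameter settles at 2.0: a starting point from which either task can be reached in few gradient steps, which is the intuition behind Figure 3.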
As shown in Figure 3, the original intention of MAML is to find an appropriate parameter $\phi$ from which it is possible to descend to the optimum regardless of whether the loss curve is that of task 1 or task 2.


Fine-Tuning the Model
Due to the bias in the distribution of the source and target domains, direct classification of the target domain by the base model trained in the source domain usually does not achieve the desired effect. Fine-tuning the pre-trained model using the support set data in the target domain will be beneficial to further improve the classification accuracy of the test set by fine-tuning the parameters of the fully connected layer or the top few layers of the base model. Howard et al. proposed a general fine-tuning language model by varying the learning rate [47]. Nakamura et al. used an adaptive gradient optimizer for fine-tuning while using a lower learning rate during the few-shot retraining [48]. Gao et al. proposed a few-shot fine-tuning method (LM-BFF) for fine-tuning based on language model cues [49]. Chua et al. provided risk bounds on the best predictor found by fine-tuning via gradient descent [50].
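A minimal sketch of such selective fine-tuning, using a plain parameter dictionary with hypothetical block names: only the parameters listed as trainable receive the low-learning-rate gradient step, while the frozen early layers keep their pre-trained values.

```python
def finetune_step(params, grads, trainable, lr=5e-4):
    """One fine-tuning step: update only the parameters named in `trainable`
    (the top layers); all others (the frozen early layers) stay untouched."""
    return {name: (p - lr * grads[name]) if name in trainable else p
            for name, p in params.items()}

# Hypothetical 4-block model: freeze blocks 1-2, fine-tune blocks 3-4.
params = {"block1": 1.0, "block2": 1.0, "block3": 1.0, "block4": 1.0}
grads = {name: 2.0 for name in params}
params = finetune_step(params, grads, trainable={"block3", "block4"})
```

In a deep learning framework, the same effect is usually achieved by disabling gradient computation for the frozen layers and passing only the remaining parameters to the optimizer.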


Proposed MAMB for Few-Shot Fault Diagnosis
In this paper, based on MAML, we propose a model named the model-agnostic meta-baseline (MAMB), which performs few-shot fault detection for multiple faults of wind turbine generator bearings and gearboxes; the model structure is shown in Figure 4. A small number of existing fault samples of the wind turbine were used to build a meta-learning model, and the model was updated through meta-tasks, which could effectively detect the faults when the same faults occurred again.
The classifier model contained three convolution layers, three BatchNorm1d layers, three MaxPool1d layers, and one fully connected layer. The number of neurons in each layer is marked in Figure 4. The activation function of every layer was the rectified linear unit (ReLU), except for the last layer, which used Softmax. All the data went through the fast Fourier transform before being fed into the model. The fault diagnosis model comprised the following steps: In the first step, the baseline model was trained. All the training set data were input into the classifier model f, and the base model parameters were updated with a learning rate lr1 of 0.01.
In the second step, the meta-learning model was trained. Assuming a c-way k-shot learning task, k pieces of data of each class were randomly selected from the training data as the support set S, and another q pieces were selected as the query set Q; a support set and a query set together formed a meta-learning task, and N such tasks were constructed. The initial parameters of the MAML model were taken from the trained baseline model, and each task was used to update the MAML parameters. The learning rate for each task update was lr2 = 0.002, and the learning rate for the MAML update was lr3 = 0.001.
In the third step, the meta-learning model was fine-tuned. We randomly selected data from the training data for fine-tuning with a learning rate lr4 of 0.0005. As shown in Figure 4, only the last two blocks (green) were fine-tuned, while the first two blocks (black) were frozen.
In the last step, the test set data were fed into the fine-tuned model for classification, and the accuracy was computed. The feature embedding was visualized by t-distributed stochastic neighbor embedding (t-SNE) to test the effectiveness of the proposed model.
Backpropagation updates from the first step to the third step are carried out according to Equation (5).
The complete algorithm flow is shown in Algorithm 1.
As the working conditions of wind turbines change randomly, the conditions are not stable or constant for data collected over a period of time, and the conditions under which the data were recorded are unknown. Therefore, in this paper, we took the first 15 data of each class as training data (they span a short period of time and can be considered to come from a constant condition) and the next 240 data as testing data. Since the testing set has unknown conditions (perhaps the same as the training set, perhaps not), this paper does not explicitly subdivide the source and target domains; it only evaluates a large amount of testing data when the training model had only a small amount of data from a single working condition.
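The chronological split above can be sketched as follows; the function name and toy data are hypothetical, and the snippet only illustrates taking the first 15 samples per class for training and the next 240, recorded under unknown conditions, for testing.

```python
def split_by_time(samples_per_class, n_train=15, n_test=240):
    """Chronological split: the first n_train samples of each class (short
    span, roughly constant operating condition) train the model; the next
    n_test samples, under unknown and possibly different conditions, form
    the test set."""
    train, test = {}, {}
    for label, series in samples_per_class.items():
        train[label] = series[:n_train]
        test[label] = series[n_train:n_train + n_test]
    return train, test

# Toy example: 260 time-ordered samples per class, as in the case studies.
data = {"healthy": list(range(260)), "fault": list(range(260))}
tr, te = split_by_time(data)
```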
The optimizer used in this model was Adam, with 100 training epochs for the pre-trained base model, 200 epochs for the meta-learning update, and 100 epochs for fine-tuning. The batch size was 32. In the MAML training step, the sample size of the query set was 5.

Case Analysis
In this section, three few-shot learning cases are analyzed to verify the advantages of the proposed model, including the few-shot case of the bearing data from Case Western Reserve University (CWRU), the few-shot case of wind turbine generator bearing, and the few-shot case of wind turbine gearbox.
All three types of data were vibration data. Case 1 used the bearing data from Case Western Reserve University (CWRU), selected from 12DriveEndFault, with operating conditions of 1730, 1750, and 1772 rpm, a sampling frequency of 12 kHz, and a sampling time of 1 s. Case 2 used wind turbine generator drive-end bearing vibration data from field operation, with a sampling frequency of 25,600 Hz and a sampling time of 1 s. Case 3 used wind turbine gearbox vibration data from field operation, with a sampling frequency of 25,600 Hz and a sampling time of 1 s. The input channels provided to the model were the x and y directions of the vibration data.
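The FFT preprocessing applied to each two-channel sample can be sketched with NumPy. This is a hedged illustration: the paper does not specify its normalization, so the one-sided amplitude scaling below is an assumption, and the 120 Hz synthetic tone stands in for real vibration data.

```python
import numpy as np

def fft_features(x, y):
    """Turn one 1 s two-channel vibration sample (x- and y-direction) into
    magnitude spectra, one channel per direction."""
    spec = lambda s: np.abs(np.fft.rfft(s)) / len(s)  # one-sided amplitude
    return np.stack([spec(x), spec(y)])               # shape (2, n//2 + 1)

fs = 25_600                                           # field-data sample rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t)                       # synthetic 120 Hz tone
feats = fft_features(x, np.cos(2 * np.pi * 120 * t))
# with a 1 s window the bin resolution is 1 Hz, so the peak of channel 0
# sits at the 120 Hz bin
```

Feeding spectra rather than raw waveforms into the classifier makes fault characteristic frequencies directly visible to the convolution layers.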
To further validate the proposed MAMB model, we compared it with several few-shot or transfer learning algorithms: CNN, the Siamese net [7], and the MAML net [40]. To make a fair comparison, we used the same datasets, the same data preprocessing (fast Fourier transform), the same classifier model, the same epochs, and the same learning rates. Three case studies with one-shot, five-shot, and ten-shot settings were conducted.

Case 1: Fault Diagnosis of CWRU Datasets
In this case, a few-shot fault diagnosis of the CWRU drive-end datasets was conducted. The available samples are shown in Table 1. The samples contained one category of health data and three kinds of fault data, and each category contained 260 data. Under practical working conditions, healthy data are easy to collect, but fault data are not. Therefore, in this case, the training set contained 150 health data and 15 data for each of the three faults. Data from the 20th to the 260th of each class were used as a testing set to evaluate the classification accuracy. This example analyzed the results of four-way-one-shot, four-way-five-shot, and four-way-ten-shot tasks, respectively, and compared them with CNN, the Siamese net [7], and the MAML net [40]. The final t-SNE is shown in Figures 5-7. The accuracy is displayed at the top of each chart. The fault classification accuracy of the different algorithms on the CWRU dataset is shown in Table 2. The proposed MAMB model already showed relatively high classification accuracy (91.64%) in the four-way-one-shot task, and reached 95.78% and 97.21% in the four-way-five-shot and four-way-ten-shot tasks, respectively. The average accuracy was 14.4% higher than that of CNN, 21% higher than that of the Siamese net, and 9% higher than that of MAML.

Figure 7. Four-way-ten-shot diagnosis of the CWRU data for different algorithms.

Case 2: Fault Diagnosis of Generator Bearings for Wind Turbines
In this case, a few-shot fault diagnosis of the generator bearings for wind turbines was conducted. The available samples are shown in Table 3. The generator bearing data for the wind turbine included health data and three types of faults, and each category contained 260 data. The latter two faults were compound faults. Under actual operating conditions, wind turbines mostly develop compound faults, so studying the few-shot problem of compound faults has greater engineering significance. At the same time, the operating conditions of wind turbines change constantly; the first 15 data were taken for training, and the latter 240 data are usually in different operating conditions from the training data, so the model could also be further tested across operating conditions. This case analyzed the results of four-way-one-shot, four-way-five-shot, and four-way-ten-shot tasks, respectively, and compared them with CNN, the Siamese net [7], and the MAML net [40]. The final t-SNE is shown in Figures 8-10. The accuracy is displayed at the top of each chart.
Figure 10. Four-way-ten-shot diagnosis of the generator bearing data for the wind turbines using different algorithms.
The fault classification accuracy of the generator bearings for wind turbines using different algorithms is shown in Table 4. The proposed MAMB model showed relatively high classification accuracy (89.48%) in the four-way-one-shot task, and reached 95.73% and 96.4% in the four-way-five-shot and four-way-ten-shot tasks, respectively. The average accuracy was 24% higher than that of CNN, 21% higher than that of the Siamese net, and 22% higher than that of MAML.
As the operating conditions of wind turbines change all the time, the classification accuracy of CNN, Siamese net, and MAML was much lower than on the CWRU data. However, the proposed model incorporated the basic classification advantages of CNN and the learning advantages of MAML, and its test accuracy remained consistently high.

Case 3: Fault Diagnosis of Wind Turbine Gearbox
This case focused on few-shot fault diagnosis of the wind turbine gearbox. The available gearbox samples are shown in Table 5; the samples contained one category of health data and four kinds of fault data, and each category contained 260 data. Fault 2 is a compound fault.
In this example, the training set contained 150 health data and 15 data for each type of fault. Data from the 20th to the 260th of each class were used as a testing set to evaluate the classification accuracy. This example analyzed the results of five-way-one-shot, five-way-five-shot, and five-way-ten-shot tasks, respectively, and compared them with CNN, the Siamese net [7], and the MAML net [40]. The final t-SNE is shown in Figures 11-13. The accuracy is displayed at the top of each chart.
Figure 11. Five-way-one-shot diagnosis of the wind turbine gearboxes using different algorithms.

Figure 12. Five-way-five-shot diagnosis of the wind turbine gearboxes using different algorithms.
Figure 13. Five-way-ten-shot diagnosis of the wind turbine gearboxes using different algorithms.
The fault classification accuracy of wind turbine gearboxes using different algorithms is shown in Table 6. The proposed model reached 86.44%, 90.94%, and 91.18% in the five-way-one-shot, five-way-five-shot, and five-way-ten-shot tasks, respectively. The average accuracy was 14% higher than that of CNN, 21% higher than that of the Siamese net, and 10% higher than that of MAML.

The Impact of the Number of Training Data on the Results
This section analyses the effect of the number of training samples on the results using MAMB; the accuracy results are shown in Table 7. The sample size of each class was 15 or 20. It can be seen that the accuracy of the model improved as the number of training samples increased.

Conclusions
Fault diagnosis of wind turbines plays an important role in improving their reliability. However, the operating conditions of wind turbines change randomly, and multiple faults often occur simultaneously. When fault samples are few, ordinary deep learning can fall into overfitting, which in turn leads to low diagnostic accuracy.
Model-agnostic meta-baseline (MAMB)-based few-shot learning was presented in this paper to achieve few-shot diagnosis of compound faults of the wind turbine drivetrain under variable operating conditions. The model consists of four steps: pre-training the base model, training the MAML, fine-tuning, and testing. This paper analysed the diagnosis of one-shot, five-shot, and ten-shot tasks of single and compound faults in CWRU data, wind turbine generator bearings, and wind turbine gearboxes. The model was also compared with other algorithms to verify the accuracy and stability of the proposed method, and the results were visualized with t-SNE.
The proposed MAMB model combines the advantages of CNN in basic classification and MAML in learning new tasks. The results show that MAMB was superior to CNN, Siamese net, and MAML in classification accuracy on all three kinds of data, and especially on the wind turbine data. This shows that the proposed model can better handle variable operating conditions and compound fault diagnosis in wind turbines.
In the future, the recognition of unknown classes of wind turbines should be further considered through transfer learning.
Author Contributions: Methodology, X.L.; writing-original draft preparation, X.L.; writing-review and editing, W.T. and Y.L. All authors have read and agreed to the published version of the manuscript.