Research on an Improved Auxiliary Classifier Wasserstein Generative Adversarial Network with Gradient Penalty Fault Diagnosis Method for Tilting Pad Bearing of Rotating Equipment

Abstract: Research on fault diagnosis methods based on generative adversarial networks has achieved fruitful results, but most of the research objects are rolling bearings or gears, and the model test data are almost all derived from laboratory bench tests. In the industrial Internet environment, equipment fault diagnosis must cope with large amounts of data, unbalanced data samples, and inconsistent data file lengths. Moreover, there are few research results on the fault diagnosis of rotor systems composed of shafts, impellers or blades, couplings, and tilting pad bearings, and there are still shortcomings in the operational risk evaluation of rotor systems. To ensure the reliability and safety of rotor systems, an Improved Auxiliary Classifier Wasserstein Generative Adversarial Network with Gradient Penalty (IACWGAN-GP) model is constructed, a fault diagnosis method based on IACWGAN-GP for tilting pad bearings is proposed, and an intelligent fault diagnosis system platform for equipment in an industrial Internet environment is built. The verification results on engineering case data show that the fault diagnosis model based on IACWGAN-GP can adapt to sequential data files of any length, and that the automatic identification accuracy for early faults in tilting pad bearings reaches 98.7%.


Introduction
Centrifugal compressors, steam turbines, flue gas turbines, expanders and other high-speed rotating machinery are widely used in petrochemical, coal chemical, metallurgical and other industrial fields. When blade fracture, rotor imbalance, rubbing, surges or other faults occur [1], minor faults may cause equipment failure and production interruption, and serious faults may cause machine damage and fatal accidents, leading to huge economic losses or social impact for enterprises [2,3]. Timely and automatic identification of equipment failure types, so that control and preventive measures can be taken, is of great significance for reducing or avoiding economic losses and preventing catastrophic failures of rotating machinery [4].
Oil whirl faults caused by improper assembly clearance and contact area between the tilting pad bearing and the shaft are among the most common faults in rotating equipment. Oil whirl refers to severe fluctuations or vibrations in the lubricating oil film, which usually occur when the film cannot be stably maintained on the surface of mechanical parts. This failure may cause serious harm to mechanical equipment and systems. The specific hazards include: (1) Increased friction and wear: oil whirl destabilizes the lubricating oil film, so the contact area between mechanical parts increases, and friction and wear increase accordingly. Long-term friction and wear can damage parts and shorten their life. (2) Energy loss: oil whirl causes abnormal contact between mechanical parts, which leads to energy loss and affects the efficiency and performance of the mechanical system. (3) Vibration and noise: oil whirl can cause mechanical parts to vibrate and produce noise. These vibrations and noises not only affect the normal operation of the equipment, but may also affect the surrounding environment and the health of workers. (4) Heat accumulation: oil whirl may lead to local energy concentration and excessive heat accumulation. This may overheat the lubricating oil and further aggravate the damage to mechanical parts. (5) System failure: if the problem caused by oil whirl is not solved in time, components of the mechanical system may fail, which in turn affects the normal operation of the entire equipment and may require expensive maintenance and downtime. In addition, because shaft misalignment, rotor imbalance, surges, rubbing and other faults of rotating equipment may occur at the same time, it is challenging to accurately identify the early faults of tilting pad bearings.
Research on fault diagnosis methods based on artificial intelligence has achieved fruitful results. Zhong et al. [5] proposed a rolling bearing fault diagnosis method based on a convolutional autoencoder and the nearest-neighbor algorithm, which was verified on the experimental data set published by CWRU under different working conditions. Mohiuddin et al. [6] proposed an improved AlexNet-based intelligent fault diagnosis method for rolling bearings, which was verified on the CWRU data set under different working conditions and different signal-to-noise ratios. Cui et al. [7] proposed a CNN-based method for fault diagnosis of rolling bearings under sample imbalance, and verified it on a conventional rolling bearing fault data set collected in the laboratory. Zhang et al. [8] proposed a CNN-based multi-channel data fusion neural network for rolling bearing fault diagnosis, and verified the model on bearing data collected by eight vibration sensors on the SB25 aero-engine bearing test bench. Shen et al. [9] proposed an improved Gray Wolf optimizer algorithm based on a support vector machine and a swarm intelligence optimization algorithm for rolling bearing fault diagnosis. The proposed algorithm was verified on the CWRU data set and on data obtained from the mechanical transmission bearing life-cycle test platform independently developed by Nanjing Agricultural University. Huang et al. [10] proposed a rolling bearing fault detection method that uses an improved Gray Wolf algorithm to optimize multi-stable stochastic resonance parameters, and verified it on the published CWRU and MFPT data sets. Tian et al. [11] proposed a CNN-LSTM bearing fault diagnosis model based on hybrid particle swarm optimization, and verified it on the CWRU data set. However, most of the research objects of these studies are rolling bearings, and the model test data are almost all derived from laboratory bench tests. Moreover, there are few research results on the fault diagnosis of rotor systems composed of shafts, impellers or blades, couplings, and tilting pad bearings, and there are still shortcomings in the operational risk evaluation of rotor systems.
The traditional fault diagnosis method for rotating machinery relies on the experience and knowledge of external experts, who use spectrum analysis diagrams, Bode diagrams, Nyquist diagrams and other tools in the condition-monitoring and analysis software to carry out manual analysis one case at a time. This approach is inefficient and lags considerably behind the event, which often means that early faults are not detected in time. Industrial Internet-enabled equipment management technology has developed rapidly in China. The accumulated equipment state-aware data have laid a foundation for intelligent fault diagnosis based on artificial intelligence and big data analysis. This data-driven, deep learning intelligent fault diagnosis method [12][13][14] makes full use of the advantages of industrial big data and greatly reduces the dependence of the model on external experts. It has gradually become a development trend for equipment fault diagnosis technology in the industrial Internet environment [15,16].
During the service life-cycle of rotating equipment, the fault-free operating time is far greater than the time spent operating with faults, so normal-state data samples are significantly more abundant than fault-state data samples. The data samples follow a long-tail distribution and have low value density [17]. A specific piece of rotating equipment cannot experience every fault (rotor imbalance, axis misalignment, rubbing, oil film whirl, surges and so on) during its service life-cycle. Some equipment will not experience any fault at all in its whole life-cycle. The lack of equipment fault sample data is one of the challenging problems faced by fault diagnosis technology based on artificial intelligence and big data [18].
When model-training samples are insufficient, a generative adversarial network is considered one of the effective methods for solving the problem of data imbalance [19]. The Generative Adversarial Network (GAN) [20] is a deep learning model and one of the most promising recent approaches to unsupervised learning over complex distributions. The model produces good output through game learning between (at least) two modules in the framework: the generative model G and the discriminative model D. GAN models generally use deep neural networks as G and D. A good GAN application needs a good training method; otherwise, the output may be unsatisfactory because of the freedom of the neural network model. To improve the data generation capability of GANs and optimize the training process, the Deep Convolutional Generative Adversarial Network (DCGAN), which is based on a deep Convolutional Neural Network (CNN) and generates high-resolution images, has been proposed [21]. However, as the training time of the model increases, some filters of the model collapse and oscillate, resulting in mode collapse.
To solve the problem of GAN mode collapse, the Wasserstein generative adversarial network (WGAN) model was constructed, which overcomes the problem by improving the stability of training [22]. WGAN enforces the Lipschitz constraint by weight clipping, which can easily cause vanishing or exploding gradients and slow model convergence. To address this, Gulrajani et al. [23] proposed an improved Wasserstein GAN training method (WGAN-GP). By using a gradient penalty instead of weight clipping to enforce the Lipschitz constraint, vanishing and exploding gradients during model training can be avoided, and the slow convergence of WGAN can also be resolved. GAN, DCGAN, WGAN, and WGAN-GP are all unsupervised learning models that generate samples without category labels and cannot generate multiple types of samples using the same model.
To enhance the performance of GAN, the Auxiliary Classifier GAN (ACGAN), a supervised learning model that adds category labels to the generator and discriminator as well as a classifier to the output part of the discriminator, has been proposed [24]. With ACGAN, every generated sample has a corresponding category label. Because ACGAN is built on DCGAN, it still suffers from mode collapse. A Parallel Classification Wasserstein Generative Adversarial Network with Gradient Penalty (PCWGAN-GP) has been proposed by Yu et al. [25]. When healthy samples are fed into the PCWGAN-GP model, the model produces various failure samples of good quality, which can gradually expand the unbalanced data set until equilibrium is reached.
PCWGAN-GP is an unsupervised learning model that must be constructed and trained independently for each fault type to obtain a balanced data set. This increases both the workload of model construction and the model training time. Li et al. [26] built an ACWGAN-GP model based on a gradient penalty and an auxiliary classifier, which can generate good-quality samples from an unbalanced training set, and used the balanced data set to train Multilayer Perceptron (MLP), CNN, Support Vector Machine (SVM) and other classifiers for fault diagnosis. Cao et al. [27] constructed a fault diagnosis model based on ACWGAN-GP and homogeneous superposition ensemble learning, which significantly improved the classification accuracy and stability of the model.
ACWGAN-GP combines the advantages of WGAN-GP and ACGAN, so that the model can generate multi-class labeled samples while overcoming mode collapse and vanishing gradients. As a supervised learning model, however, ACWGAN-GP still needs training data sets that contain labeled samples for the complete variety of faults, and engineering application scenarios cannot always meet this need. Furthermore, the lengths of individual device state-aware data files differ, and the ACWGAN-GP model can only adapt to a single data file type, which does not meet the needs of engineering applications. Therefore, applying the ACWGAN-GP model to full-fault diagnosis of equipment requires improvements in the model structure, so that it can not only generate complete fault sample data, but also automatically adapt to different equipment state-aware data.
Rotating equipment is generally composed of shafts, impellers or blades, comb seals, couplings, tilting pad bearings and other components. Among the faults of these components, oil whirl faults caused by improper assembly clearance and contact area between tilting pad bearings and shafts are the most common. It is challenging to accurately identify the early faults of tilting pad bearings, because shaft misalignment, rotor imbalance, surges, rubbing and other faults of rotating equipment may occur at the same time. Given that the data samples of rotating equipment in engineering practice are unbalanced, this paper studies an improved auxiliary classifier Wasserstein generative adversarial network with a gradient penalty for fault diagnosis of tilting pad bearings. The contributions of this paper are as follows:
(1) An improved auxiliary classifier Wasserstein generative adversarial network with gradient penalty is developed, in which an input data-length adaptive layer is added before the 2D convolution layer of the discriminator.
(2) A fault diagnosis method based on IACWGAN-GP for tilting pad bearings is proposed, which is able to accurately identify early oil whirl faults of tilting pad bearings despite the interference of shaft misalignment, rotor imbalance, surges, rubbing and other faults that may occur simultaneously in rotating equipment.
(3) The application of an IACWGAN-GP-based fault diagnosis model in an industrial Internet environment via cloud-integrated prediction and health management systems, which include a cyber-physical system layer, a network layer and an application layer, is proposed. The application layer consists of micro-service systems such as early fault warning, health evaluation and fault diagnosis.

Generative Adversarial Network
As shown in Figure 1, a GAN is composed of two neural networks: the generator G and the discriminator D. To trick the discriminator D, the generator must learn the data distribution of real samples and create fake samples from random noise. The discriminator's task is to tell the real samples from the generated fake samples. The performance of G and D is continuously improved until these two adversarially trained neural networks reach Nash equilibrium. The GAN's loss function can be described as follows:

$$L = E_{x \sim P_{data}}[\log D(x)] + E_{z \sim P_Z}[\log(1 - D(G(z)))] \tag{1}$$

where $E_{x \sim P_{data}}$ and $E_{z \sim P_Z}$ stand for the expectation over x from the real data distribution $P_{data}$ and over z sampled from the random noise prior distribution $P_Z$, respectively. D(x) represents the discriminant result when the input of the discriminator is real data x, G(z) represents the generated data of the generator, and D(G(z)) represents the discriminant result when the input of the discriminator is the generated data G(z).
The binary minimax problem that describes the optimization procedure for D and G is represented by the following equation:

$$\min_G \max_D V(D, G) = E_{x \sim P_{data}}[\log D(x)] + E_{z \sim P_Z}[\log(1 - D(G(z)))] \tag{2}$$
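As a minimal numerical sketch of the value function in the minimax problem, the snippet below estimates V(D, G) from discriminator outputs on a batch of real and generated samples. The function name `gan_value` and the sample values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# An undecided discriminator outputs 0.5 everywhere, so V = -2 log 2.
v_undecided = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V toward its upper bound of 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator tries to raise this value, while the generator tries to lower it by making D(G(z)) larger.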

Auxiliary Classifier Generative Adversarial Network
ACGAN is a modified model of GAN with the structure shown in Figure 2. Unlike GAN, ACGAN can use label information to generate samples of specified types and to identify and classify the input samples. Specifically, the generator G generates new samples G(z, y) using random noise z and label y, while the discriminator D needs not only to determine whether the input samples are real or fake, but also to classify them. During adversarial training, the sample generation capability and the recognition capability of ACGAN are continuously optimized. Eventually, the model has a strong ability to generate new samples with corresponding labels.

Figure 2. Structure of ACGAN: the generator G takes random noise z and label y and outputs X gen; the discriminator D takes X real and X gen and outputs the real/fake decision and the category labels.

Since ACGAN needs to process both the source and the class label information of the input samples, the loss function of ACGAN contains two components, defined as follows:

$$L_{source} = E[\log P(S = real \mid X_{real})] + E[\log P(S = fake \mid X_{generated})] \tag{3}$$

$$L_{class} = E[\log P(Y = y \mid X_{real})] + E[\log P(Y = y \mid X_{generated})] \tag{4}$$

where $X_{generated} = G(z, y)$ represents the generated sample when the generator inputs are random noise z and sample label y, $P(S = real \mid X_{real})$ and $P(S = fake \mid X_{generated})$ represent the probabilities that the discriminator assigns to the correct source of the real and generated samples, $P(Y = y \mid X_{real})$ represents the conditional probability distribution of the real sample, and $P(Y = y \mid X_{generated})$ represents the conditional probability distribution of the generated sample.
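The objective of D is to maximize L source + L class, and, following the original ACGAN formulation [24], the objective of G is to maximize L class − L source, shown as follows:

$$\max_D \; L_{source} + L_{class} \tag{5}$$

$$\max_G \; L_{class} - L_{source} \tag{6}$$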
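To make the two loss components concrete, the sketch below evaluates the source and class log-likelihoods on a toy batch and forms the two objectives. It assumes the sign convention of the original ACGAN formulation, in which D maximizes L source + L class and G maximizes L class − L source; the function names and all probability values are hypothetical.

```python
import math


def mean_log(probabilities):
    """Average log-probability over a batch."""
    return sum(math.log(p) for p in probabilities) / len(probabilities)


def acgan_losses(src_real, src_fake, cls_real, cls_fake):
    """Return (L_source, L_class) for one batch.

    src_real: P(S = real | X_real) for each real sample.
    src_fake: P(S = fake | X_generated) for each generated sample.
    cls_real: P(Y = y | X_real), probability given to the true label.
    cls_fake: P(Y = y | X_generated), probability given to the conditioning label.
    """
    l_source = mean_log(src_real) + mean_log(src_fake)
    l_class = mean_log(cls_real) + mean_log(cls_fake)
    return l_source, l_class


l_source, l_class = acgan_losses(
    src_real=[0.9, 0.8], src_fake=[0.7, 0.6],
    cls_real=[0.95, 0.9], cls_fake=[0.85, 0.8],
)
d_objective = l_source + l_class  # maximized by D
g_objective = l_class - l_source  # maximized by G (assumed convention)
```

Both networks benefit from a larger class term, while they pull the source term in opposite directions.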

Wasserstein Distance and Gradient Penalty
GANs have attracted the attention of many researchers because of their powerful sample generation capability. However, vanishing gradients and mode collapse lead to unstable training. Many researchers have attempted to solve these problems, but the training instability of GAN was not resolved until WGAN was proposed.
Arjovsky et al. [22] proposed WGAN and identified the objective function, which takes the form of the J-S divergence, as the cause of the unstable training of GAN. They therefore suggested using the Wasserstein distance (WD) rather than the J-S divergence. The WGAN's objective function can be expressed as follows:

$$\min_G \max_{D \in \Omega} \; E_{x \sim P_{data}}[D(x)] - E_{z \sim P_Z}[D(G(z))] \tag{7}$$

where Ω denotes the set of 1-Lipschitz functions, which WGAN enforces approximately by clipping the discriminator weights to the range [−ω, ω].
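Although the WGAN training process is faster and more stable than that of the original GAN, the quality of the generated samples is occasionally unsatisfactory. According to Gulrajani et al. [23], the issue is caused by clipping the weights in WGAN to enforce the Lipschitz constraint on the discriminator. They therefore introduced a gradient penalty and proposed WGAN-GP. The WGAN-GP's loss function and objective function are defined as follows:

$$L = E_{z \sim P_Z}[D(G(z))] - E_{x \sim P_{data}}[D(x)] + \phi \, E_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \tag{8}$$

$$\min_G \max_D \; E_{x \sim P_{data}}[D(x)] - E_{z \sim P_Z}[D(G(z))] - \phi \, E_{\hat{x} \sim P_{\hat{x}}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \tag{9}$$

where $\hat{x} = \varepsilon x + (1 - \varepsilon) G(z) \sim P_{\hat{x}}$ with random numbers $\varepsilon \sim U(0, 1)$, and ϕ represents the penalty factor. Without meticulous adjustment of the hyperparameters, WGAN-GP performs better than WGAN and achieves steady training.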
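The gradient penalty can be illustrated with a toy critic whose input gradient is available in closed form, so that no automatic differentiation library is needed. The sketch below is an assumption-laden illustration, not the paper's implementation: the critic D(x) = tanh(w · x), the penalty factor value of 10, and all function names are hypothetical, and the real and generated batches are assumed to have equal size.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def critic(w, x):
    """Toy critic D(x) = tanh(w . x), a stand-in for a neural network."""
    return math.tanh(dot(w, x))


def critic_grad_norm(w, x):
    """L2 norm of dD/dx, which for this critic is (1 - tanh^2(w . x)) * ||w||."""
    return (1.0 - math.tanh(dot(w, x)) ** 2) * math.sqrt(dot(w, w))


def wgan_gp_critic_loss(w, real_batch, gen_batch, phi=10.0, rng=None):
    """E[D(G(z))] - E[D(x)] + phi * E[(||grad D(x_hat)||_2 - 1)^2]."""
    rng = rng or random.Random(0)
    mean_gen = sum(critic(w, g) for g in gen_batch) / len(gen_batch)
    mean_real = sum(critic(w, r) for r in real_batch) / len(real_batch)
    penalty = 0.0
    for r, g in zip(real_batch, gen_batch):
        eps = rng.random()  # epsilon ~ U(0, 1)
        x_hat = [eps * ri + (1.0 - eps) * gi for ri, gi in zip(r, g)]
        penalty += (critic_grad_norm(w, x_hat) - 1.0) ** 2
    penalty /= len(real_batch)
    return mean_gen - mean_real + phi * penalty


real = [[0.0, 5.0], [0.0, 1.0]]
fake = [[0.0, -3.0], [0.0, 2.0]]
# Gradient norm is exactly 1 on every interpolated point: no penalty.
loss_unit_gradient = wgan_gp_critic_loss([1.0, 0.0], real, fake)
# Gradient norm is 2 on every interpolated point: penalty of (2 - 1)^2 each.
loss_steep_gradient = wgan_gp_critic_loss([2.0, 0.0], real, fake)
```

In this toy setup the Wasserstein term is zero for both weight vectors, so the difference between the two losses comes entirely from the penalty on the gradient norm.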

Building an Improved Auxiliary Classifier Wasserstein Generative Adversarial Network with Gradient Penalty
To overcome the limitation that the input data length places on the neural network model, the IACWGAN-GP model has been designed. To avoid mode collapse and vanishing gradients during training, the Wasserstein distance and a gradient penalty are introduced into the loss function of the model. The model introduces category labels in the generator and discriminator and an auxiliary classifier in the discriminator, so that it can both generate multi-class labeled samples and classify samples. The generator uses three 2D deconvolution layers, and the discriminator uses three 2D convolution layers. Before the first convolutional layer of the discriminator, an input data-length adaptive layer designed in this paper is added, so that the model can automatically adapt to different device-status sensing data, which improves the applicability and generalization of the model to various types of data. The architecture of the IACWGAN-GP model is shown in Figure 3.
The 2D CNN has superior performance in feature extraction and classification compared to the 1D CNN [28], so both the generator and the discriminator of the IACWGAN-GP constructed in this paper use 2D convolutional structures. The vibration signal of rotating machinery is a 1D signal, which cannot be convolved directly in 2D. It is therefore necessary to convert the 1D data into 2D data, and the data length is usually taken as a square number, such as 784 (28 × 28) or 1024 (32 × 32). However, the length of engineering case data often does not meet this requirement. For example, 1D vibration engineering case data with a length of 1024 are halved to 512 after the Fourier transform, and 512 is not a square number. To overcome the limitation of the neural network model on the input data length, an input adaptive learning framework is designed. Specifically, the Input Adaptive Layer (IAL) is placed before the first convolutional layer of the discriminator, as shown in Figure 3.
A 1D signal of length m is defined as:

$$S_{input} = [a_1, a_2, \ldots, a_m] \tag{10}$$

where $S_{input}$ indicates the 1D input data, and $a_i$ indicates the value of node i of the 1D waveform data.
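To convert 1D data whose length m is not a square number into 2D data of the target size, $S_{input}$ with size (1, m) is multiplied by a weight matrix with size (m, n). This converts the input into 1D data whose length n is a square number, and the result is then corrected with a deviation, as shown in Equation (11):

$$S' = F_A(S_{input} K + b) \tag{11}$$

where $S' = [a'_1, a'_2, \ldots, a'_n]$ is the transformed 1D data; K is the weight matrix created by the kernel layer, as shown in Equation (12); b is the deviation vector created by the layer, as shown in Equation (13); and $F_A(\cdot)$ is the activation function of the neural network.

$$K = \begin{bmatrix} k_{11} & k_{12} & \cdots & k_{1n} \\ k_{21} & k_{22} & \cdots & k_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ k_{m1} & k_{m2} & \cdots & k_{mn} \end{bmatrix} \tag{12}$$

$$b = [b_1, b_2, \ldots, b_n] \tag{13}$$

By substituting Equations (12) and (13) into Equation (11), the value of the transformed i-th node is as follows:

$$a'_i = F_A\left(\sum_{j=1}^{m} a_j k_{ji} + b_i\right), \quad i = 1, 2, \ldots, n \tag{14}$$

Since n is a square number, it is easy to transform the 1D data of length n into 2D data of size $\sqrt{n} \times \sqrt{n}$:

$$S_{2D} = F_R(S') \tag{15}$$

where $F_R(\cdot)$ is the Reshape function. Equations (10)-(15) are the derivation process of IAL. IAL can be defined as follows:

$$S_{2D} = IA(S_{input}) = F_R(F_A(S_{input} K + b)) \tag{16}$$

where $IA(\cdot)$ is the input adaptive function whose input is 1D data of arbitrary length and whose output is 2D data of the desired size.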
With the introduction of IAL, 1D data whose data length is not a square number can be easily converted into 2D data required for 2D convolution models, enabling input data-length adaption and improving the applicability and generalization of neural network models to various types of data.
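The IAL transformation described above (a dense projection from length m to a square length n, followed by an activation and a reshape) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the weights and bias are random placeholders rather than learned parameters, tanh is an assumed choice of activation, and the function name `input_adaptive_layer` is hypothetical.

```python
import math
import random


def input_adaptive_layer(signal, weights, bias, activation=math.tanh):
    """Map a 1D signal of length m to a sqrt(n) x sqrt(n) grid.

    signal:  list of m values.
    weights: m rows of n values (weight matrix of size (m, n)).
    bias:    list of n values (deviation vector).
    """
    m, n = len(signal), len(bias)
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("target length n must be a square number")
    if len(weights) != m or any(len(row) != n for row in weights):
        raise ValueError("weight matrix must have shape (m, n)")
    # Dense projection: node i = activation(sum_j a_j * k_ji + b_i).
    transformed = [
        activation(sum(signal[j] * weights[j][i] for j in range(m)) + bias[i])
        for i in range(n)
    ]
    # Reshape the length-n vector row by row into a side x side grid.
    return [transformed[r * side:(r + 1) * side] for r in range(side)]


rng = random.Random(0)
m, n = 512, 1024  # 512 spectrum points mapped to a 32 x 32 grid
spectrum = [rng.random() for _ in range(m)]
kernel = [[rng.gauss(0.0, 0.01) for _ in range(n)] for _ in range(m)]
offset = [0.0] * n
grid = input_adaptive_layer(spectrum, kernel, offset)
```

In a trained model the kernel and offset would be learned jointly with the discriminator; the sketch only shows how a non-square length is brought to a shape that a 2D convolution can consume.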
Substituting Equation (8) into Equations (5) and (6) gives the objective functions of IACWGAN-GP, Equations (17) and (18), in which $\hat{x}$ is the random sample obtained by interpolating between the real sample and the generated sample, and ϕ is the gradient penalty factor.
Combining the advantages of WGAN-GP and ACGAN, the model can generate multi-class labeled samples and overcome the problems of mode collapse and vanishing gradients.

Virtual Sample Generation Module

Fault Virtual Sample Definition
Because the characteristics of frequency domain signals are more prominent than those of time domain signals, many researchers in the field of fault diagnosis of rotating machinery use frequency domain signals as the input to their models [29,30]. The frequency domain signal can be obtained by applying the Fourier transform to the time domain vibration signal of the equipment. The spectrum of an equipment fault signal contains fault characteristic frequencies that coincide with the fault mechanism, and these fault characteristic frequencies are the common characteristics of the fault. Besides the fault characteristic frequencies, the other frequency information in the spectrum reflects private characteristics of the equipment, such as working conditions and environmental noise. The spectrum of the normal-state signal of the equipment matches the private features in the fault signal. In other words, the normal-state signal of the equipment reflects the private characteristics of the equipment. As shown in Figure 5, the overlapping parts of the fault data are the common features, and the non-overlapping parts are the private features. Surge and oil whirl are taken as examples for illustration, as shown in Figure 6.
Figure 5. Common features (the overlap of fault data 1 and fault data 2) and private features of fault data.

Figure 6a shows the frequency spectrum of the oil whirl fault at a device speed of 10,600 rpm, in which f x = 66.25 Hz is the fault characteristic frequency of the oil whirl fault. Figure 6b shows the frequency spectrum of the normal-state data of the same equipment. Comparison with Figure 6a shows that, except for the fault characteristic frequency, the private features such as the low-frequency component, the 1st-order frequency f 1×, the 2nd-order frequency f 2×, the 3rd-order frequency f 3×, and the 4th-order frequency f 4× are all included in the normal-state spectrum. Figure 6c shows the frequency spectrum of the surge fault data at a device speed of 8300 rpm, in which f x = 17.29 Hz is the fault characteristic frequency of the surge fault. Figure 6d shows the frequency spectrum of the normal-state data of the equipment. Comparison with Figure 6c shows that, except for the fault characteristic frequency, the private features such as the low-frequency component, the 1st-order frequency f 1×, the 2nd-order frequency f 2×, the 3rd-order frequency f 3×, and the 4th-order frequency f 4× are all included in the normal-state spectrum.
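As a small worked check on the numbers quoted for Figure 6, the sketch below relates each fault characteristic frequency to the running-speed (1×) frequency, which is the shaft speed in rpm divided by 60. The resulting ratios are derived here from the quoted values and are not stated in the text; the function names are hypothetical.

```python
def rotation_frequency_hz(speed_rpm):
    """Running-speed (1x) frequency in Hz for a shaft speed given in rpm."""
    return speed_rpm / 60.0


def order_ratio(fault_hz, speed_rpm):
    """Fault characteristic frequency as a fraction of the 1x frequency."""
    return fault_hz / rotation_frequency_hz(speed_rpm)


# Oil whirl case: 66.25 Hz at 10,600 rpm, about 0.375 of the 1x frequency.
oil_whirl_ratio = order_ratio(66.25, 10600)

# Surge case: 17.29 Hz at 8300 rpm, about 0.125 of the 1x frequency.
surge_ratio = order_ratio(17.29, 8300)
```

Both fault characteristic frequencies therefore sit well below the 1× component, which is why they stand apart from the order-related private features in the spectra.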
The Pearson Correlation Coefficient (PCC) [31] is extensively used to measure the degree of correlation between two variables. Its value lies between −1 and 1, and the closer it is to ±1, the stronger the correlation. Two PCC values are calculated for each fault, as shown in Table 1: the PCC between the fault data and the normal data, and the PCC between the fault data with the fault characteristic frequency removed and the normal data. For the oil whirl, the PCC with the normal data of the device is 0.81824, and it rises to 0.90919 once the fault characteristic frequency is removed. For the surge, the PCC with the normal data of the device is 0.78202, and it rises to 0.98130 once the fault characteristic frequency is removed. Both the comparison of the spectrograms and the PCC calculation therefore show that fault data can be described as a collection of common features and private features. In other words, fault data can be composed from the fault characteristic frequency, which characterizes the common features of the fault, and the normal data, which characterizes the private features of the equipment. This result suggests a way to construct fault virtual samples: the fault characteristic frequencies are superimposed on the normal data spectrum of the equipment to form the fault virtual samples.
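The PCC comparison described above can be illustrated with a minimal sketch. The signals below are synthetic stand-ins rather than the engineering case data, and the choices of 32 samples per revolution, the component amplitudes, and "removing" the fault frequency by averaging the two neighbouring bins are all assumptions made for illustration only.

```python
import numpy as np

def amplitude_spectrum(x):
    """Single-sided amplitude spectrum of a real-valued signal
    (DC and Nyquist bins are not rescaled separately)."""
    return np.abs(np.fft.rfft(x)) * 2.0 / len(x)

def pcc(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def order_component(k, n=1024, samples_per_rev=32):
    """Unit-amplitude sine at k times the rotating frequency."""
    t = np.arange(n)
    return np.sin(2.0 * np.pi * k * t / samples_per_rev)

# Synthetic "private" content: 1x, 2x and 3x orders plus noise (assumed values).
rng = np.random.default_rng(0)
base = 10 * order_component(1) + 3 * order_component(2) + order_component(3)
normal = base + 0.2 * rng.standard_normal(1024)
# Synthetic fault: the same private content plus a 0.375x component.
fault = base + 6 * order_component(0.375) + 0.2 * rng.standard_normal(1024)

s_normal = amplitude_spectrum(normal)
s_fault = amplitude_spectrum(fault)

# "Remove" the fault characteristic frequency (assumption: replace its bin
# with the mean of the two neighbouring bins).
k_fault = round(0.375 * 32)  # 0.375x falls on bin 12 for 32 revolutions
s_removed = s_fault.copy()
s_removed[k_fault] = 0.5 * (s_fault[k_fault - 1] + s_fault[k_fault + 1])

pcc_fault = pcc(s_fault, s_normal)
pcc_removed = pcc(s_removed, s_normal)
print(f"PCC fault vs normal:          {pcc_fault:.3f}")
print(f"PCC fault (removed) vs normal: {pcc_removed:.3f}")
```

On such synthetic data the second value is higher than the first, which mirrors the trend reported in Table 1 without reproducing its numbers.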

Fault Mechanism-Based Virtual Sample Generation
Based on the fault virtual sample definition above, when the target device can only provide normal-state data, or when the fault type space is incomplete, fault virtual samples can be generated from the normal data and the fault mechanism characteristic frequencies. In this paper, we propose a fault mechanism-based virtual sample (FMVS) generation algorithm, whose pseudocode is given in Algorithm 1. In Algorithm 1, α and β represent the lower and upper bounds of the amplitude range of the fault mechanism characteristic frequency of the virtual samples; the amplitude takes a value within [α, β], which characterizes the severity of the fault; ν represents the speed of the device; γ represents the fault characteristic frequency parameter, which is determined from the fault mechanism; L represents the length of the normal data of the device; n represents the number of data groups of normal data and virtual fault samples; and $D_N$ represents the normal data of the device.
First, the fundamental frequency $f^{s}_{1\times}$ of the target device is calculated, and its amplitude is obtained by applying the Fourier transform to the normal data of the target device. Then, the fault characteristic frequency $f_c$ is calculated, and its amplitude is determined from the amplitude of the fundamental frequency together with α and β. Next, a sine function is used to construct a virtual signal containing only the fault characteristic frequency, and this virtual signal is Fourier transformed to obtain the virtual signal spectrum $f_y$ with frequency $f_c$ and amplitude within [α, β]. Finally, the virtual signal spectrum $f_y$ is added to the normal signal spectrum $f_N$ of the target device to obtain the fault virtual sample $V_{s_i}$ of the target device. Repeating this process for each fault type generates the virtual sample set Data covering the full set of fault types.
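The steps above can be sketched as follows. This is an illustrative reading of the description, not the authors' Algorithm 1: the function name `fmvs`, the interpretation of α and β as multiples of the fundamental amplitude, the linear spacing of amplitudes across the n groups, and the assumption of 32 samples per revolution are all choices made here for the sketch.

```python
import numpy as np

def amplitude_spectrum(x):
    """Single-sided amplitude spectrum of a real-valued signal."""
    return np.abs(np.fft.rfft(x)) * 2.0 / len(x)

def fmvs(d_normal, speed_rpm, gamma, alpha, beta, samples_per_rev=32):
    """Sketch of fault virtual sample generation.

    d_normal : (n, L) array of normal-state signals D_N
    speed_rpm: device speed (nu)
    gamma    : fault characteristic frequency as a multiple of 1x
    alpha, beta: amplitude bounds, here taken as multiples of the
                 fundamental amplitude (assumption)
    Returns an (n, L//2 + 1) array of virtual fault spectra.
    """
    d_normal = np.asarray(d_normal, dtype=float)
    n, length = d_normal.shape
    f1 = speed_rpm / 60.0                 # fundamental frequency, Hz
    fs = samples_per_rev * f1             # assumed synchronous sampling rate
    fc = gamma * f1                       # fault characteristic frequency
    t = np.arange(length) / fs
    freqs = np.fft.rfftfreq(length, d=1.0 / fs)
    k1 = int(np.argmin(np.abs(freqs - f1)))   # bin of the fundamental

    scales = np.linspace(alpha, beta, n)  # weak-to-strong severity (assumption)
    virtual = np.empty((n, length // 2 + 1))
    for i in range(n):
        f_n = amplitude_spectrum(d_normal[i])            # normal spectrum f_N
        a_fault = scales[i] * f_n[k1]                    # amplitude in [alpha, beta] * A_1x
        f_y = amplitude_spectrum(a_fault * np.sin(2.0 * np.pi * fc * t))
        virtual[i] = f_n + f_y                           # fault virtual sample
    return virtual

# Usage with synthetic normal data (a 1x component of amplitude 10 plus noise).
rng = np.random.default_rng(1)
t_idx = np.arange(1024)
normal = 10 * np.sin(2.0 * np.pi * t_idx / 32) + 0.1 * rng.standard_normal((5, 1024))
vs = fmvs(normal, speed_rpm=10600, gamma=0.375, alpha=0.5, beta=1.2)
print(vs.shape, vs[0, 12], vs[-1, 12])
```

Adding amplitude spectra, rather than complex spectra, follows the wording of the description; a phase-aware variant would add the time-domain signals before transforming.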

IACWGAN-GP Generation Module and Diagnosis Module
The IACWGAN-GP generation module and the diagnosis module adopt the same neural network structure, as shown in Figure 7. The generation module mainly uses the sample generation ability of the generator, and the diagnosis module mainly uses the sample classification ability of the discriminator. This makes full use of the capabilities of IACWGAN-GP and reduces the difficulty of constructing the model.

The number of fault virtual samples constructed in Section 3.2.1 is limited. To avoid the mode collapse problem caused by training classification networks with limited data sets [32], an IACWGAN-GP generation module is constructed for generating fault mechanism feature spectrum samples. The generation module is trained with the virtual samples generated in Section 3.2.1. Once trained, thanks to supervised learning and the sample generation capability of IACWGAN-GP, the prior knowledge in Section 3.2.1 is no longer required: only the fault sample labels and the number of required samples need to be input to generate fault mechanism feature spectrum samples of the corresponding fault types and quantities. Compared with the FMVS generation algorithm in Section 3.2.1, the number of inputs to the model is reduced from seven to two.

The IACWGAN-GP diagnosis module is trained with a complete fault data set consisting of the fault mechanism feature spectrum samples produced by the generation module and the equipment normal-state samples. The well-trained model can be used for real-time online diagnosis and identification of the full set of fault types.

Fault Severity Evaluation Module
In an actual engineering scenario, the faults of rotating equipment often develop gradually from weak to strong. Therefore, in addition to determining the type of failure of the equipment, it is also necessary to evaluate the severity of the failure of the target equipment. After spectrum analysis of the vibration signal, the fault type of the equipment can be identified from the fault characteristic frequency, and the severity of the fault can be judged from the amplitude ratio of the fault characteristic frequency to the fundamental frequency. Based on this method, a Fault Severity Evaluation (FSE) module is constructed in this paper.
The pseudocode of the FSE algorithm is shown in Algorithm 2. D represents the data to be measured; ν represents the speed of the device; γ represents the fault characteristic frequency parameter, which is determined from the fault mechanism; and G represents the result of the fault severity evaluation. Firstly, the device sample frequency $f_s$ and the fundamental frequency $f^{s}_{1\times}$ are calculated from the speed ν. Secondly, the fault characteristic frequency is calculated from γ and $f^{s}_{1\times}$, and the Fourier transform is applied to the data to be measured D, converting the time-domain vibration signal into a frequency-domain signal. Then, the amplitude of the fault characteristic frequency, amp, and the amplitude of the fundamental frequency, $amp_{1\times}$, are located in the spectrum, and the fault severity of the signal is evaluated from the amplitude ratio of the fault characteristic frequency to the fundamental frequency. Finally, the fault grade G is output as the evaluation result of the fault severity.
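A minimal sketch of the FSE procedure described above is given below. It is not the authors' Algorithm 2: the function name `fault_severity`, the grade thresholds (0.1, 0.3, 0.6) on the amplitude ratio, and the 32 samples per revolution are hypothetical values chosen only to make the example runnable, since the text does not state the grade boundaries.

```python
import numpy as np

def fault_severity(d, speed_rpm, gamma, thresholds=(0.1, 0.3, 0.6),
                   samples_per_rev=32):
    """Grade fault severity from the ratio amp / amp_1x.

    d          : 1-D vibration signal D
    speed_rpm  : device speed (nu)
    gamma      : fault characteristic frequency as a multiple of 1x
    thresholds : hypothetical ascending grade boundaries on the ratio
    Returns (grade, ratio); grade 0 means the ratio is below the first threshold.
    """
    d = np.asarray(d, dtype=float)
    f1 = speed_rpm / 60.0                    # fundamental frequency
    fs = samples_per_rev * f1                # assumed synchronous sampling rate
    fc = gamma * f1                          # fault characteristic frequency
    spectrum = np.abs(np.fft.rfft(d)) * 2.0 / len(d)
    freqs = np.fft.rfftfreq(len(d), d=1.0 / fs)
    amp = spectrum[np.argmin(np.abs(freqs - fc))]
    amp_1x = spectrum[np.argmin(np.abs(freqs - f1))]
    ratio = float(amp / amp_1x)              # assumes a non-zero 1x amplitude
    grade = int(np.searchsorted(thresholds, ratio, side="right"))
    return grade, ratio

# Usage: synthetic oil-whirl-like signal with amp / amp_1x = 0.5.
t_idx = np.arange(1024)
signal = (10 * np.sin(2.0 * np.pi * t_idx / 32)
          + 5 * np.sin(2.0 * np.pi * 0.375 * t_idx / 32))
grade, ratio = fault_severity(signal, speed_rpm=10600, gamma=0.375)
print(grade, round(ratio, 3))
```

In practice the thresholds would be set from engineering experience or standards for the specific machine; the sketch only shows where they enter the calculation.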

Fault Diagnosis Method Based on IACWGAN-GP
In this paper, a fault diagnosis method based on IACWGAN-GP for tilting pad bearings is proposed, which is able to accurately identify early oil-whirl faults of tilting pad bearings despite the interference of shaft misalignment, rotor imbalance, surges, rubbing and other faults that may occur simultaneously in rotating equipment. The specific steps are as follows:
Step 1: The virtual sample generation module uses the normal data of the equipment and the characteristic frequency of the fault mechanism to generate the virtual samples of the full fault types.
Step 2: Use the generated full fault type virtual samples to train the IACWGAN-GP generation module.
Step 3: The IACWGAN-GP generation module is used to generate the fault mechanism characteristic spectrum samples of different fault types, and the fault-type complete data set is formed together with the normal-state data of the equipment.
Step 4: Use the complete data set to train the IACWGAN-GP diagnostic module.
Step 5: The IACWGAN-GP diagnosis module is used to realize the intelligent diagnosis and identification of full fault types.
Step 6: The fault-severity evaluation module is used to evaluate the severity of the fault.
Among these, steps one to four are the model training phase, and steps five to six are the online monitoring phase.

Experiments and Analysis of Results
In this paper, model validation and comparison experiments are conducted using rotating equipment fault case data from petrochemical enterprises to verify the effectiveness of the rotating machinery fault diagnosis method based on IACWGAN-GP. The proposed IACWGAN-GP-based fault diagnosis model is used to generate fault mechanism feature spectrum samples of different fault types and to perform fault diagnosis. Subsequently, the validity of the fault virtual sample generation method and the fault diagnosis model is verified by the fault diagnosis accuracy.

Rotating Equipment Condition Monitoring System
The rotating equipment condition-monitoring system is composed of the shaft, impeller, bearing seat, tilting pad bearing, thrust pad bearing, shaft vibration, shaft displacement and temperature measurement units, transmitter, signal processing unit, database server and workstation, as shown in Figure 8. The main research object of this paper is the tilting pad bearing, as shown in Figure 9a. Its characteristic is that the bearing surface is composed of multiple pads, which can tilt in the bearing seat so as to adapt to different working conditions during rotation. It has the advantages of strong adaptability, load distribution, vibration suppression, tolerance of non-uniform deformation and long service life. In this study, non-contact eddy current displacement sensors are used to collect the condition-monitoring data of the rotor bearing system. The installation method for the eddy current sensors is shown in Figure 9b. The x-direction and y-direction eddy current sensors are arranged at 45° angles, and the angle between the two sensors is 90°. To avoid energy leakage during data analysis, a synchronous whole-cycle sampling method is used: the rotor is sampled 32 times per revolution over 32 revolutions, so a single sample contains 1024 sampling points, and each data file (100 × 1024) includes 100 samples of fault data.
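The following short sketch works through why synchronous whole-cycle sampling avoids leakage for the characteristic frequencies used in this paper. With 32 samples per revolution over 32 revolutions, the frequency resolution is one thirty-second of the rotating frequency, so any order that is a multiple of 1/32 falls exactly on an FFT bin. The helper names are illustrative, not from the paper.

```python
def sync_sampling(speed_rpm, samples_per_rev=32, revolutions=32):
    """Return (f1, fs, n, df) for synchronous whole-cycle sampling."""
    f1 = speed_rpm / 60.0            # rotating (fundamental) frequency, Hz
    fs = samples_per_rev * f1        # sampling frequency, Hz
    n = samples_per_rev * revolutions
    df = fs / n                      # frequency resolution = f1 / revolutions
    return f1, fs, n, df

def order_bin(order, revolutions=32):
    """FFT bin index of a component at `order` times the rotating frequency."""
    return order * revolutions

f1, fs, n, df = sync_sampling(10600)
print(n)                      # 1024 points per sample
print(round(0.375 * f1, 2))   # oil-whirl frequency at 10,600 rpm, Hz
print(order_bin(0.375))       # 12.0 -> exactly on a bin
print(order_bin(0.125))       # 4.0  -> exactly on a bin

f1_surge, _, _, _ = sync_sampling(8300)
print(round(0.125 * f1_surge, 2))  # surge frequency at 8300 rpm, Hz
```

The two printed frequencies agree with the values of 66.25 Hz and 17.29 Hz quoted for Figure 6.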

Engineering Case Data Validation

Rotor System Condition Monitoring Data Acquisition
Real engineering case data from mixed refrigerant compressor units, axial flow main air units, syngas compressor units and turbogenerator units have been collected. The failures include oil whirl, shaft misalignment, rotor imbalance, surges, and rubbing, which are five typical types of rotor system failure, as shown in Table 2. The oil-whirl fault data come from the syngas compressor units; the equipment speed is 10,600 rpm and the fault code is Class I. The shaft misalignment fault data come from the axial flow main air units; the equipment speed is 5900 rpm and the fault code is Class II. The rotor imbalance fault data come from the syngas compressor units; the equipment speed is 7700 rpm and the fault code is Class III. The surge fault data come from the mixed refrigerant compressor units; the equipment speed is 8300 rpm and the fault code is Class IV. The rubbing fault data come from the turbogenerator units; the equipment speed is 3000 rpm and the fault code is Class V. The normal-state data of the equipment are coded as Class VI.
Figure 10 shows the time domain waveforms and spectra of the engineering case data. Figure 10a shows the time domain waveform and spectrogram of Class I fault data with fault mechanism characteristic frequency $f_x = 0.375 f_{1\times}$; Figure 10b shows those of Class II fault data with $f_x = 2 f_{1\times}$; Figure 10c shows those of Class III fault data with $f_x = f_{1\times}$; Figure 10d shows those of Class IV fault data with $f_x = 0.125 f_{1\times}$; and Figure 10e shows those of Class V fault data with $f_x = 4 f_{1\times}$. Class I to Class V data are from different units with different operating conditions, so the private characteristics of the fault samples are not the same, but all of them show obvious fault mechanism characteristic frequencies. The typical fault characteristic frequencies of rotor systems [33] are shown in Table 3.
As shown in Table 4, data sets A to E each contain equipment normal-state data (Class VI) and one class of real fault data, and data set F contains equipment normal-state data and five classes of real fault data across equipment and operating conditions. To confirm the effectiveness of the proposed approach, the engineering case data are used for three purposes in this research: (1) for comparison with the generated fault mechanism feature spectrum samples, to verify the similarity between the virtual samples and the real samples; (2) as the test set for fault diagnosis, to calculate the fault diagnosis accuracy of the models; (3) data set F is used as the test data for the cross-device and cross-condition study, to verify the robustness and versatility of the proposed method.

[Table 4: columns Class I, Class II, Class III, Class IV, Class V, Class VI.]
Note: √ means the corresponding real fault data are available, × means the corresponding real fault data are not available.
As shown in Figure 11, the spectrum analysis of the oil-whirl fault data of different fault severity for the same equipment shows that the equipment has shifted from the normal state to the fault state, and that the fault has gradually developed from weak to strong. In Figure 11a the equipment is in the normal state, and the fault characteristic frequency $f_x$ of oil whirl has not yet appeared. In Figure 11b the fault characteristic frequency $f_x$ appears with a relatively small amplitude. From Figure 11b to Figure 11e, as the fault gradually develops from weak to strong, the amplitude of the fault characteristic frequency $f_x$ gradually increases.

Rotor System FMVS Generation
Data set A, whose device has a Class I failure, is taken as an example to show the construction process of the proposed fault diagnosis method based on IACWGAN-GP and to verify the effectiveness of the method. In the engineering application scenario where the target device can only provide normal data, the fault virtual samples are generated from the normal-state data of the device and the fault mechanism characteristic frequencies, giving complete virtual samples of five fault types, Class I to Class V, for the target device.
As shown in Figure 6b, the fundamental frequency of the device is 176.7 Hz with an amplitude of 11.82; the frequency can also be calculated from the speed as $f^{s}_{1\times} = \nu/60 = 10600/60 = 176.7$ Hz. The FMVS algorithm process is shown in Algorithm 1. In the algorithm, α and β are the lower and upper bounds of the amplitude range of the fault characteristic frequency of the virtual samples: α is 0.5 times the fundamental frequency amplitude and β is 1.2 times the fundamental frequency amplitude, so the amplitude of the fault characteristic frequency is taken within [5.91, 14.18], which can represent different fault severities. The speed is ν = 10,600 rpm. The fault characteristic frequency parameter γ is set according to the fault mechanism, and its values are shown in Table 5. The fault mechanism characteristic frequency of Class I is 0.375 times the fundamental frequency, usually less than 0.5 times the fundamental frequency. That of Class II is 2×. That of Class III is 1×. That of Class IV is a low-frequency component at 0.125 times the fundamental frequency, usually in the range of 1 to 30 Hz. That of Class V is 4×. The data length L of the normal-state data of the equipment is 1024, according to the engineering case data. The number of data groups of normal data and fault virtual samples is set to 100 here. The generated fault virtual samples are shown in Figure 12.
In Figure 12, I indicates the first group of fault virtual samples, with the smallest fault characteristic frequency amplitude, and II indicates the 100th group of fault virtual samples, with the largest fault characteristic frequency amplitude. Since the device can provide Class I real fault data, the real fault data can be compared with the generated fault virtual samples, as shown in Figure 13. When the real and virtual samples are drawn in the same coordinate system, the overlap is very high. The PCC of the real and virtual samples is calculated to be 0.98367, showing that the data distribution of the generated fault virtual samples closely resembles that of real fault samples.

IACWGAN-GP-Based Fault Diagnosis Model Training
The label set $Y = (y_1, y_2, y_3, \cdots, y_k)$ and the random noise vector $Z = (z_1, z_2, z_3, \cdots, z_m)$ are fed to the generator to produce the fake samples $\hat{X} = G(Z, Y)$. The generated samples $\hat{X}$ are then mixed with the real samples $X$ and used as the input to the discriminator for true-false discrimination and classification. The generator and discriminator are trained alternately until Nash equilibrium is reached.
The fault virtual samples generated in Section 4.2.2 are used as the training set for the IACWGAN-GP generation module. To avoid overfitting during network training, a dropout layer with a ratio of 0.5 is added after each layer of the discriminator network. The generator layers all use the ReLU activation function. The discriminator convolutional layers all use LeakyReLU, and the last two fully connected layers use sigmoid and softmax, respectively. The model uses the Adam optimizer, and the learning rates of the generator and discriminator are set to $10^{-4}$ and $2 \times 10^{-4}$, respectively. The batch size is 32. Because the Wasserstein distance and a gradient penalty are introduced in the model, a highly accurate discriminator does not cause the generator gradient to vanish, so the generator is optimized once for every five discriminator updates.
During the training of the model, the values of the loss function are recorded to characterize the performance of the model, as shown in Figure 14. As the number of iterations increases, both the discriminator loss and the generator loss drop sharply in the beginning stage and stabilize at about 400 iterations, showing that the model is well trained and can be used for sample generation.
To compare the generated samples with the fault virtual samples, they are drawn in the same coordinate system, as shown in Figure 15; the two sets of samples are visibly very similar. To further evaluate the quality of the generated samples, PCC and Cosine Similarity (CS) are calculated to measure the similarity between the generated and virtual samples [25]. The results are shown in Table 6: the PCC and CS between virtual and generated samples both exceed 0.9, further showing that the generated samples are highly similar to the virtual samples.
Using data set A as an example, Section 4.2.2 shows how the IACWGAN-GP-based fault diagnosis model constructs the full-fault-type data set A*. Data set A contains only real Class I fault data and Class VI normal data. To show more comprehensively the correlation between the fault mechanism feature spectrum samples generated by the proposed method and the real fault samples, the above process is repeated for data sets B to E. Based on the normal-state data of the equipment in each data set and the fault mechanism characteristic frequencies, fault mechanism feature spectrum samples of the five types from Class I to Class V are generated using the IACWGAN-GP generation module. The constructed complete fault data sets are shown in Table 7.
As shown in Figure 16, the virtual samples are highly similar to the real samples. The PCC and CS between the virtual samples and the real samples are calculated and, as shown in Table 8, both values are higher than 0.9, showing that they are highly positively correlated.
The IACWGAN-GP generation module is trained using the above virtual samples to obtain the generated fault mechanism feature spectrum samples. As shown in Figure 17, the generated samples and the real samples are closely comparable. The PCC and CS of the generated and real samples are calculated and, as shown in Table 9, these values are higher than 0.88, showing a high positive correlation between them.
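The PCC and CS checks reported in Tables 6, 8 and 9 can be sketched as below. The spectra are synthetic placeholders, and the helper `similarity_report`, which averages both metrics over a batch of samples against one reference spectrum, is an illustrative convenience rather than the evaluation procedure used in the paper.

```python
import numpy as np

def pcc(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_report(samples, reference):
    """Mean PCC and mean CS of each row of `samples` against `reference`."""
    pccs = [pcc(row, reference) for row in samples]
    css = [cosine_similarity(row, reference) for row in samples]
    return float(np.mean(pccs)), float(np.mean(css))

# Synthetic reference spectrum with peaks at bins 4, 32 and 64 (assumed values),
# and 20 noisy "generated" spectra around it.
rng = np.random.default_rng(2)
reference = np.zeros(513)
reference[[4, 32, 64]] = [3.0, 10.0, 2.0]
generated = reference + 0.05 * np.abs(rng.standard_normal((20, 513)))

mean_pcc, mean_cs = similarity_report(generated, reference)
print(f"mean PCC = {mean_pcc:.3f}, mean CS = {mean_cs:.3f}")
```

PCC is insensitive to a constant offset between spectra, whereas CS is not, which is one reason to report both.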

IACWGAN-GP-Based Fault Diagnosis Model Test and Methods Comparison
The IACWGAN-GP diagnosis module is trained using data sets A* to E*, respectively, and the well-trained fault diagnosis module is tested using real Class I to Class VI data; all test experiments are conducted 10 times and averaged, and the results are shown in Figure 18. The diagnosis accuracy of shaft misalignment faults is 93%, that of oil-whirl faults is 99%, and that of the remaining four classes is 100%, so the overall average fault diagnosis accuracy is (93 + 99 + 4 × 100)/6 ≈ 98.7%.
To demonstrate the efficacy of the fault mechanism feature spectrum samples generated by the proposed method, the classifiers MLP, CNN, and ACGAN are trained separately using the complete fault data sets in Table 7, and the real data in the data sets are then used as test data to calculate the fault diagnosis accuracy of these classification networks for comparison with the IACWGAN-GP diagnosis module. Since both CNN and ACGAN have convolution operations, an input adaption layer is introduced before the first convolution layer to lift the limitation on the input data length, giving IA-CNN and IA-ACGAN. The structures and parameters of the three classifiers MLP, IA-CNN, and IA-ACGAN are shown in Table 10.
The classifiers MLP, IA-CNN, and IA-ACGAN are trained using the complete data sets A* to E* in Table 7, respectively, and the real data in the data sets are then used as test data to calculate their fault diagnosis accuracy; the results are shown in Figure 19. When only the normal data of the equipment are provided and the method proposed in this paper is used to generate a complete fault data set for training, even the MLP, which has a very simple network structure, reaches a fault diagnosis accuracy of 0.927, and IACWGAN-GP achieves the highest accuracy, 0.987, reflecting the effectiveness of the proposed method.
t-SNE is used to reduce the dimensionality of the complete data sets A* to E* in Table 7 and to visualize their features, as shown in Figure 20. The figure shows that fault data of the same class cluster together, while fault data of different classes lie far apart, so all four classification networks in Table 11 achieve more than 92% fault diagnosis accuracy.

Fault Severity Evaluation
The oil-whirl data of different fault severity shown in Figure 11 amount to a total of 24,434 sets of data, each of length 1024. The amplitude ratio of the oil-whirl fault characteristic frequency to the fundamental frequency is shown in Figure 21a. The data are evaluated for fault severity, and the results are shown in Figure 21b. Because the amplitude ratio of engineering data fluctuates strongly, the fault severity evaluation results fluctuate as well. To reduce this fluctuation, a custom windowing function is used to correct the results. The custom windowing function is defined as follows:
$$G_i^{c} = \operatorname{mode}\left(\{\, G_j : j \in W_\tau(i) \,\}\right)$$
where $G_i$ is the fault severity evaluation result of the $i$th group of data, $G_i^{c}$ is the corrected fault severity evaluation result of the $i$th group of data, $\tau$ is the window scale factor, $W_\tau(i)$ is the set of group indices covered by the window of scale $\tau$ at group $i$, and mode is the mode function. That is, the mode of the evaluation results within the range of the window scale factor is taken as the corrected fault severity evaluation result.
load of the cloud server and the pressure on network bandwidth occupation caused by data uploading to the cloud, but it also enables fast and efficient potential fault-warning and predictive maintenance decisions.

Application Layer
The cloud design adopts an industrial micro-service architecture, which mainly includes modular components such as fault detection, fault diagnosis, and health evaluation.
The fault detection micro-service detects the occurrence of potential faults in rotating equipment. The fault diagnosis micro-service identifies the potential fault type and fault location of rotating equipment. The health evaluation micro-service evaluates the severity of rotating equipment faults.
Intelligentization is the essential feature of the modular components of industrial micro-services. Modular components such as fault detection, fault diagnosis, and health evaluation provide knowledge for making maintenance decisions and optimizing maintenance tasks.

Conclusions
To address the unbalanced data samples encountered in engineering practice for rotating equipment, this paper studies an improved auxiliary classifier Wasserstein generative adversarial network with gradient penalty for fault diagnosis of tilting pad bearings. The work can be summarized as follows:
(1) An improved auxiliary classifier Wasserstein generative adversarial network with gradient penalty is developed, in which an input data length adaptive layer is added before the 2D convolution layer of the discriminator. This overcomes the limitation of neural networks on the length of input data and improves their applicability and generalization to various types of data.
(2) A fault diagnosis method based on IACWGAN-GP for tilting pad bearings is proposed, which accurately identifies early oil-whirl faults of tilting pad bearings despite the interference of shaft misalignment, rotor imbalance, surges, rubbing, and other faults that may occur simultaneously in rotating equipment. The method can identify oil-whirl faults as they develop from weak to strong and evaluate the grade of the fault. The engineering case-data verification results show that, with only normal data of the equipment, the model achieves an accuracy of 98.7% in identifying upcoming faults. When the multilayer perceptron, CNN, and auxiliary classifier GAN fault diagnosis models are trained using full-fault virtual samples, their accuracies reach 92.7%, 97.7%, and 98.3%, respectively. The proposed method and the three comparison methods are also tested on cross-device and cross-condition engineering case data sets. The fault diagnosis accuracies of the proposed method and the three comparison methods are 98%, 60.8%, 31.8%, and 77.7%, respectively, so the proposed method shows better robustness.
(3) An application of the IACWGAN-GP-based fault diagnosis model in an industrial Internet environment is proposed, via a cloud-integrated prediction and health management system that comprises a cyber-physical system layer, a network layer, and an application layer. The application layer consists of micro-services such as early fault warning, health evaluation, and fault diagnosis.
In this paper, the typical fault diagnosis of rotor systems is studied, and the proposed fault diagnosis method achieves high fault diagnosis accuracy and robustness. However, the engineering case data involved in this paper contain only single-fault data. When the equipment has multiple faults at the same time, the proposed method can only draw a diagnosis conclusion for one of the faults. In addition, when a fault outside the fault categories included in the training data set occurs, the proposed method will assign it to a known fault based on the similarity of their fault characteristics, which may lead to incorrect diagnosis results. Future studies will collect more complex fault engineering case data, use the virtual sample generation module in the proposed method to generate complex virtual fault

3.2. Establishing Fault Diagnosis Model Based on IACWGAN-GP
The architecture of the IACWGAN-GP-based fault diagnosis model is shown in Figure 4. The model is divided into a model training phase and an online monitoring phase. The model training phase includes a virtual sample generation module, an IACWGAN-GP generation module, and an IACWGAN-GP diagnosis module. The virtual sample generation module generates virtual samples of all fault types, which are used to train the IACWGAN-GP generation module. The well-trained IACWGAN-GP generation module is then used to generate the required number and type of fault samples, which, together with the normal samples of the equipment, form a complete data set for training the IACWGAN-GP diagnosis module. The well-trained diagnosis module can be used for fault diagnosis in the online monitoring phase.
The online monitoring phase includes a fault diagnosis module and a fault-severity evaluation module. The fault diagnosis module uses the IACWGAN-GP diagnosis module trained in the model training phase as the classifier. The real-time original vibration signal is transformed into a frequency domain signal by FFT and then input into the classifier to obtain the real-time fault diagnosis result. Once the fault diagnosis result is obtained, the fault-severity evaluation module enables the fault-severity evaluator of the corresponding fault type, and the frequency domain signal is input into the evaluator to obtain the real-time fault-severity evaluation result.
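The data flow of the online monitoring phase can be sketched as follows. This is only an illustration of the described sequence (vibration signal, frequency domain transform, classifier, then the severity evaluator matching the diagnosed fault type). The names `monitor`, `toy_classifier`, and `toy_oil_whirl_severity` are hypothetical stand-ins: the trained IACWGAN-GP diagnosis module and the real evaluators are replaced by simple threshold rules, and a direct DFT replaces the FFT for self-containment.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Single-sided magnitude spectrum via a direct DFT (O(n^2), illustration only)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append(2.0 * abs(acc) / n)
    return spectrum


def monitor(signal, classifier, severity_evaluators):
    """Time signal -> spectrum -> fault type -> severity from the matching evaluator."""
    spectrum = magnitude_spectrum(signal)
    fault_type = classifier(spectrum)
    evaluator = severity_evaluators.get(fault_type)
    severity = evaluator(spectrum) if evaluator is not None else None
    return fault_type, severity


# Hypothetical stand-ins: bin 4 plays the role of the fundamental frequency,
# bin 2 the role of a sub-synchronous fault characteristic frequency.
def toy_classifier(spectrum):
    return "oil_whirl" if spectrum[2] > 0.1 * spectrum[4] else "normal"


def toy_oil_whirl_severity(spectrum):
    return "high" if spectrum[2] > 0.5 * spectrum[4] else "low"


n = 64
faulty = [math.sin(2 * math.pi * 4 * t / n) + 0.3 * math.sin(2 * math.pi * 2 * t / n)
          for t in range(n)]
healthy = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
evaluators = {"oil_whirl": toy_oil_whirl_severity}
faulty_result = monitor(faulty, toy_classifier, evaluators)
healthy_result = monitor(healthy, toy_classifier, evaluators)
```

Keeping one evaluator per fault type in a dictionary mirrors the description that the severity module enables only the evaluator of the diagnosed fault type.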

Figure 4. Architecture of fault diagnosis model based on IACWGAN-GP.

Figure 9. (a) Tilting pad bearings. (b) Installation method of non-contact eddy current sensor.

Figure 10. Time domain waveforms and spectrograms of engineering case data. (a) Class I. (b) Class II. (c) Class III. (d) Class IV. (e) Class V.

Figure 11. Oil whirl spectrum with different fault severity. (a) The normal state. (b) Grade I oil whirl. (c) Grade II oil whirl. (d) Grade III oil whirl. (e) Grade IV oil whirl.

Figure 13. Comparison of real and virtual samples.

Figure 14. Loss of discriminator and generator.

Figure 15. Comparison of virtual and generated samples. (a) Class I. (b) Class II. (c) Class III. (d) Class IV. (e) Class V.

Figure 16. Comparison of virtual and real samples. (a) Class II. (b) Class III. (c) Class IV. (d) Class V.

Figure 17. Comparison of generated and real samples. (a) Class I. (b) Class II. (c) Class III. (d) Class IV. (e) Class V.

Figure 19. Average diagnosis accuracy of comparison methods on engineering case data.

Figure 21. (a) Amplitude ratio of oil-whirl fault data. (b) The results of fault severity evaluation.

The fault severity evaluation results corrected by the custom windowing function are shown in Figure 22. As the window scale factor τ increases, the fluctuation of the evaluation results decreases, and the development trend of the oil whirl fault severity of the equipment becomes more obvious.
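The smoothing effect of the window scale factor τ can be reproduced with a small sketch. The exact window boundaries are an assumption here: the code uses a trailing window covering the most recent τ evaluation results, which suits online monitoring, and breaks ties in favor of the most recent grade. The function name `mode_window_correct` and the example grades are hypothetical.

```python
from collections import Counter


def mode_window_correct(grades, tau):
    """Replace each grade with the most frequent grade among the last tau results.

    Assumption: a trailing window of length tau; the paper's window may be
    placed differently (for example, centered on the current group).
    """
    if tau < 1:
        raise ValueError("tau must be at least 1")
    corrected = []
    for i in range(len(grades)):
        window = grades[max(0, i - tau + 1): i + 1]
        counts = Counter(window)
        best = max(counts.values())
        # Tie-break: keep the most recent grade among those tied for the top count.
        for grade in reversed(window):
            if counts[grade] == best:
                corrected.append(grade)
                break
    return corrected


def count_changes(grades):
    """Number of positions where the grade differs from the previous one."""
    return sum(1 for a, b in zip(grades, grades[1:]) if a != b)


# Hypothetical fluctuating severity grades for a fault developing from weak to strong.
raw = [0, 0, 1, 0, 0, 1, 1, 2, 1, 1, 2, 2, 2, 1, 2]
smoothed = mode_window_correct(raw, 3)
```

With these example grades, the raw sequence changes value 8 times while the corrected sequence with τ = 3 changes only twice, so the weak-to-strong trend becomes easier to read. A larger τ smooths more but also delays the response to a genuine change of grade.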

Table 1. PCC of fault data and normal data.

Table 2. Engineering case data introduction.

Table 3. Rotor system typical fault characteristic frequency.

Table 4. Introduction to the data sets.
Figure 11a shows the normal state, in which the fault characteristic frequency of oil whirl has not yet appeared. Figure 11b shows a fault characteristic frequency with a relatively small amplitude. From Figure 11b to Figure 11e, as the fault gradually develops from weak to strong, the amplitude of the fault characteristic frequency gradually increases.
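The growth of the fault characteristic frequency amplitude can be tracked as a ratio to the amplitude at the fundamental (rotation) frequency. The sketch below is a minimal illustration under stated assumptions: the sampling rate, the rotation frequency, and the value 0.45 for the ratio of oil-whirl frequency to rotation frequency are hypothetical example values, not parameters taken from this paper, and `amplitude_at` and `oil_whirl_ratio` are illustrative names.

```python
import cmath
import math


def amplitude_at(signal, sample_rate, freq):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * t / sample_rate)
              for t, x in enumerate(signal))
    return 2.0 * abs(acc) / n


def oil_whirl_ratio(signal, sample_rate, rotation_freq, whirl_factor):
    """Ratio of the amplitude at whirl_factor * rotation_freq to the 1x amplitude."""
    fundamental = amplitude_at(signal, sample_rate, rotation_freq)
    whirl = amplitude_at(signal, sample_rate, whirl_factor * rotation_freq)
    return whirl / fundamental


# Hypothetical example: 1024 samples at 1024 Hz, 100 Hz rotation frequency,
# whirl component at 0.45x with increasing amplitude (weak to strong fault).
sample_rate = 1024
rotation_freq = 100.0
whirl_factor = 0.45  # assumption for illustration only
ratios = []
for whirl_amp in (0.0, 0.1, 0.4):
    signal = [math.sin(2 * math.pi * rotation_freq * t / sample_rate)
              + whirl_amp * math.sin(2 * math.pi * whirl_factor * rotation_freq * t / sample_rate)
              for t in range(1024)]
    ratios.append(oil_whirl_ratio(signal, sample_rate, rotation_freq, whirl_factor))
```

In this synthetic example both components complete a whole number of cycles in the record, so the computed ratios closely follow the injected amplitudes. Real engineering data would show leakage and noise, which is one source of the fluctuation in the amplitude ratio.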

Table 5. Fault characteristic frequency parameters.

mechanism characteristic frequency of Class V is 4×. The data length of the normal-state data of the equipment is 1024 according to the engineering case data. The number of data groups of normal data and fault virtual samples is set to 100 here. The generated fault virtual samples are shown in Figure 12. Figure 12 I shows the first group of fault virtual samples, which has the smallest fault characteristic frequency amplitude, and Figure 12 II shows the 100th group of fault virtual samples, which has the largest fault characteristic frequency amplitude.
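One way to picture a set of 100 fault virtual samples whose fault characteristic frequency amplitude grows from the first group to the last is sketched below. This is not the virtual sample generation module itself: it assumes, for illustration only, that a virtual sample is a normal-state spectrum of length 1024 with a peak added at the fault characteristic frequency, and that the amplitude increases linearly across groups. The function name, the bin positions, and the amplitude range are hypothetical.

```python
def make_virtual_fault_samples(normal_spectrum, fault_bin, amp_min, amp_max, n_groups=100):
    """Build n_groups spectra with a fault peak growing linearly from amp_min to amp_max.

    Assumption: additive peak at a single bin and linear amplitude growth;
    the paper's generation rule may differ.
    """
    if n_groups < 2:
        raise ValueError("n_groups must be at least 2")
    samples = []
    for g in range(n_groups):
        amp = amp_min + (amp_max - amp_min) * g / (n_groups - 1)
        sample = list(normal_spectrum)  # copy so the normal data stay untouched
        sample[fault_bin] += amp
        samples.append(sample)
    return samples


# Hypothetical normal-state spectrum: low noise floor with a 1x peak at bin 100,
# so a 4x characteristic frequency (as stated for Class V) falls at bin 400.
normal = [0.01] * 1024
normal[100] = 1.0
virtual = make_virtual_fault_samples(normal, fault_bin=400, amp_min=0.05, amp_max=0.5)
```

The first and last entries of `virtual` correspond to the weakest and strongest fault groups, analogous to the two examples shown in Figure 12.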

Table 6. Similarity of virtual and generated samples.

The IACWGAN-GP generation module is used to generate 200 samples for each fault type, which are combined with the device's normal-state data to form a data set covering all fault types, used as the training set for the IACWGAN-GP diagnosis module. The constructed complete data set is shown as A* in Table 7, and the real data are used as test data to calculate the fault diagnosis accuracy of the fault diagnosis model. The fault diagnosis model is tested using real Class I and Class VI data. The test experiments are performed 10 times and averaged, and the fault diagnosis accuracy is 100% for both fault types.

Table 7. Full fault types data sets.

Table 8. Similarity of virtual sample and real sample.

Table 9. Similarity of generated sample and real sample.

Table 10. Structure and parameters of the classifiers.