1. Introduction
Bearings are essential components widely used in modern rotating machinery. The occurrence of bearing faults can result in significant downtime, increased maintenance costs, and even casualties. Therefore, it is critical to diagnose the bearing status precisely and quickly [1,2,3]. To explore the bearing status, a variety of signals are collected and used, such as acoustic signals [4], vibration signals [5], and current signals [6]. Among them, vibration signals contain abundant fault energy information, and their acquisition requires neither complex equipment nor specialized personnel.
Therefore, the vibration signal is widely used to monitor the bearing status. Bearing fault diagnosis techniques based on vibration signals can generally be categorized into two classes: signal-analysis-based and data-driven methods. In signal-analysis-based methods, the raw vibration signals are first analyzed using signal processing techniques such as time-domain analysis [7,8], frequency-domain analysis [9], and time–frequency-domain analysis [10]. Afterward, the bearing status is determined from features extracted in the different domains using expert knowledge.
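As a brief illustration of such time-domain features (a hedged sketch with a synthetic signal, not the exact feature set of any cited work), a few widely used statistics can be computed directly from a vibration segment:

```python
import numpy as np

def time_domain_features(x):
    """Compute a few common time-domain statistics of a vibration segment."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))                            # root mean square (energy)
    peak = np.max(np.abs(x))                                  # peak amplitude
    kurtosis = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2  # impulsiveness indicator
    crest = peak / rms                                        # peak relative to RMS
    return {"rms": rms, "peak": peak, "kurtosis": kurtosis, "crest": crest}

rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 4096)   # stand-in for a healthy, noise-like signal
print(time_domain_features(healthy))
```

Kurtosis, for instance, stays near 3 for a healthy (Gaussian-like) signal and rises sharply when impulsive fault impacts appear.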
In contrast to signal-analysis-based methods, data-driven methods depend only on the vibration signals themselves, and bearing fault diagnosis becomes a pattern recognition problem. To deal with the pattern recognition of high-dimensional data in bearing fault diagnosis, dimensionality reduction techniques have been widely employed, such as principal component analysis (PCA) [11], locality preserving projection (LPP) [12], and recurrence analysis (RA) [13,14]. Moreover, t-distributed stochastic neighbor embedding (t-SNE) is an efficient dimensionality reduction tool. t-SNE identifies close similarities between samples through the relative location of points in the mapped feature space, and the number of features in the reduced space is not restricted by the number of output dimensions [15]. In [15], t-SNE is combined with the multiscale distribution entropy method to extract low-dimensional nonlinear complexity features from the vibration signals of rolling bearings. Similarly, uniform manifold approximation and projection (UMAP) uses graph layout algorithms to arrange data in a low-dimensional space [16]. In [16], UMAP is combined with feature selection techniques to improve latent space visualization for chemical process data. As a popular unsupervised learning method, the autoencoder (AE) can learn effective features from unlabeled data by minimizing the error between the original input and the reconstructed input. The deep autoencoder (DAE) has been widely used to extract hierarchical features from vibration signals for bearing fault diagnosis [17]. The variational autoencoder (VAE) generates a latent representation of the data by imposing a distribution over the latent variables [18]. In [18], a semi-supervised learning scheme using VAE-based deep generative models was proposed to deal with the small-labeled-data problem in bearing fault diagnosis.
Compared to unsupervised dimensionality reduction techniques, the supervised method, Fisher discriminant analysis (FDA), also known as linear discriminant analysis (LDA), seeks a low-dimensional representation of the high-dimensional data that simultaneously maximizes the distance between different classes and minimizes the distance within the same class. More importantly, FDA can fully utilize the label information to directly offer classification results. Owing to its simplicity and efficiency, FDA has therefore gained considerable attention for bearing fault diagnosis in recent years.
Jin et al. [19] developed trace ratio linear discriminant analysis (TR-LDA) to handle non-Gaussian data by solving the trace ratio problem. Zhou et al. [20] employed LDA to reduce the dimensionality of 10 statistical features computed from the raw vibration signals and their transient components obtained by the transient-extracting transform (TET), improving the fault diagnosis performance. To address the nonlinearity contained in vibration signals, nonlinear variants of Fisher discriminant analysis have been proposed. One of the most frequent extensions is kernel Fisher discriminant analysis (KFDA) [21]. The idea behind KFDA is to map the original data onto a high-dimensional feature space in which the features are expected to be linearly separable for further classification. Van et al. [22] presented a wavelet kernel-based local Fisher discriminant analysis (WKLFDA) to extract nonlinear features from original vibration signals for bearing fault diagnosis. Jiang et al. [23] proposed a semi-supervised kernel marginal Fisher analysis (SSKMFA) method to investigate the inherent manifold structure embedded in the data, simultaneously taking intra-class compactness and inter-class separability into account. Tao et al. [24] developed a semi-supervised kernel local Fisher discriminant analysis (SSKLFDA) that introduces a regularization term with pseudo-labels obtained from unlabeled data into the supervised dimensionality reduction.
However, the large number of features produced by kernel methods increases the computational burden and may degrade the fault diagnosis performance, since all training data are involved in the calculation of the kernel matrix. To lessen the computational complexity of KFDA, Li et al. [25] presented a feature vector selection (FVS) approach that uses a subset of the samples to represent all of the data under geometrical considerations. Liu et al. [26] combined a kernel feature selection method with KFDA to reduce the computational burden and alleviate the impact of irrelevant features in fault diagnosis. Recently, random feature maps have been widely studied for large-scale kernel machines [27,28]. Rahimi and Recht suggested a random Fourier feature mapping approach for approximating nonlinear kernels by mapping the input data into a randomized low-dimensional feature space [27]. Fisher discriminant analysis using random Fourier feature mapping was developed to accelerate kernel Fisher discriminant analysis in [29,30]. In [30], a random feature map was introduced to map the input data to a finite dimension to accelerate FDA and kernel FDA; moreover, a theoretical guarantee was offered to prove that FDA algorithms using random projection generalize well. In [29], a randomized solution to linear discriminant analysis was developed for processing hyperspectral images to overcome the dimensionality problem. To the best of our knowledge, there has been little investigation of Fisher discriminant analysis using random feature maps in bearing fault diagnostics.
Motivated by the above discussions, a new bearing fault diagnosis method is proposed using randomized Fisher discriminant analysis (RFDA). Specifically, 12 time-domain features are first extracted from the original vibration signals. Then, an RFDA model is built on these 12 time-domain features for fault diagnosis. In RFDA, the high-dimensional data are mapped to a low-dimensional feature space using a random feature map, and the projection matrix of Fisher discriminant analysis is then calculated. To identify the state of the bearings, Bayesian inference is employed. The main contributions of this paper lie in the following aspects:
(1) RFDA, a nonlinear variant of FDA, is utilized for bearing fault diagnosis. The RFDA-based method can achieve similar performance to the KFDA-based method, while the computational burden is remarkably reduced.
(2) Two widely used bearing datasets are employed to validate the effectiveness of the proposed RFDA-based bearing fault diagnosis method. Results show the superior performance of the proposed method over other related methods.
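The overall workflow (time-domain features, random feature mapping, discriminant analysis, Bayesian class assignment) can be sketched with off-the-shelf scikit-learn building blocks. This is a hedged illustration, not the paper's implementation: `RBFSampler` provides a random Fourier feature map in the spirit of [27], `LinearDiscriminantAnalysis` supplies the discriminant projection together with Gaussian-posterior class probabilities, and the data below are synthetic placeholders rather than the bearing datasets studied in the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import RBFSampler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
# Synthetic stand-in for 12 time-domain features from three bearing states.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 12)) for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 50)

# Random Fourier feature map (D = 100 features), then linear FDA/LDA.
model = make_pipeline(
    RBFSampler(gamma=0.5, n_components=100, random_state=0),
    LinearDiscriminantAnalysis(),
)
model.fit(X, y)
proba = model.predict_proba(X[:1])   # posterior probabilities over bearing states
print(model.score(X, y), proba.shape)
```

The pipeline object keeps the random map fixed after fitting, so the same projection is reused for new monitoring samples.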
The remainder of this paper is structured as follows. In Section 2, a brief review of Fisher discriminant analysis and the random Fourier feature map is given. In Section 3, details of the proposed RFDA-based fault diagnosis method are described. In Section 4, two experimental bearing datasets are used to assess the performance of the proposed method in comparison with other related methods. Conclusions are drawn in Section 5.
2. Related Works
2.1. Fisher Discriminant Analysis
The aim of FDA is to find a linear transformation matrix that separates the projections in the low-dimensional space as much as possible. For this purpose, FDA measures the compactness of each class with the within-class scatter matrix and the distance between classes with the between-class scatter matrix. By maximizing the ratio of the between-class to the within-class scatter matrix, the optimal transformation matrix is calculated.
Denote the training dataset as $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where the sample $\mathbf{x}_i \in \mathbb{R}^m$ is an $m$-dimensional vector and $y_i$ represents the category of $\mathbf{x}_i$. $N$ samples are contained in the training dataset $D$. Assume that there are $k$ classes in $D$. Define $N_i$ as the number of samples of class $i$, and $X_i$ as the set of all samples belonging to class $i$. Then, the between-class scatter matrix $\mathbf{S}_b$ is defined as

$$\mathbf{S}_b = \sum_{i=1}^{k} N_i (\boldsymbol{\mu}_i - \boldsymbol{\mu})(\boldsymbol{\mu}_i - \boldsymbol{\mu})^{\mathrm{T}}, \qquad (1)$$

where $\boldsymbol{\mu}_i$ is the mean vector of class $i$, and $\boldsymbol{\mu}$ is the mean vector of all samples.
The within-class scatter matrix $\mathbf{S}_w$ is defined as

$$\mathbf{S}_w = \sum_{i=1}^{k} \sum_{\mathbf{x}_j \in X_i} (\mathbf{x}_j - \boldsymbol{\mu}_i)(\mathbf{x}_j - \boldsymbol{\mu}_i)^{\mathrm{T}}. \qquad (2)$$

To find the optimal projection matrix $\mathbf{W}$, the optimization problem of FDA is defined as

$$\max_{\mathbf{W}} \; \mathrm{tr}\!\left( (\mathbf{W}^{\mathrm{T}} \mathbf{S}_w \mathbf{W})^{-1} \mathbf{W}^{\mathrm{T}} \mathbf{S}_b \mathbf{W} \right), \qquad (3)$$

where $\mathrm{tr}(\cdot)$ is the trace operator.
Assuming that $\mathbf{S}_w$ is non-singular, the solution of optimization problem (3) can be feasibly obtained by using the generalized eigenvalue decomposition [31]

$$\mathbf{S}_b \mathbf{w} = \lambda \mathbf{S}_w \mathbf{w}, \qquad (4)$$

where $\mathbf{w}$ is an eigenvector and $\lambda$ is the corresponding eigenvalue. Generally, the eigenvectors corresponding to the largest $d$ eigenvalues are retained for the purpose of dimensionality reduction. Thus, the projection matrix is constructed as $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_d]$.
Using the derived projection matrix $\mathbf{W}$, the low-dimensional projection $\mathbf{z}$ of the original data $\mathbf{x}$ is

$$\mathbf{z} = \mathbf{W}^{\mathrm{T}} \mathbf{x}. \qquad (5)$$
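The scatter matrices and the generalized eigenvalue step described above can be sketched in a few lines of numpy/scipy; this is a minimal illustration on synthetic data (the small ridge added to the within-class scatter is an assumption to keep it non-singular, not part of the method as stated):

```python
import numpy as np
from scipy.linalg import eigh

def fda_fit(X, y, d):
    """Fit FDA: return the m x d projection matrix W from the scatter matrices."""
    mu = X.mean(axis=0)
    m = X.shape[1]
    Sb = np.zeros((m, m))
    Sw = np.zeros((m, m))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mu_c - mu, mu_c - mu)   # between-class scatter
        Sw += (Xc - mu_c).T @ (Xc - mu_c)                # within-class scatter
    # Generalized eigenproblem Sb w = lambda Sw w; eigh returns ascending eigenvalues.
    vals, vecs = eigh(Sb, Sw + 1e-8 * np.eye(m))         # tiny ridge keeps Sw invertible
    return vecs[:, ::-1][:, :d]                          # keep top-d eigenvectors

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.2, size=(40, 5)) for c in (0.0, 1.0)])
y = np.repeat([0, 1], 40)
W = fda_fit(X, y, d=1)
Z = X @ W      # low-dimensional projection of the data
print(Z.shape)
```

With two well-separated synthetic classes, the single retained direction already separates the projected samples almost perfectly.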
2.2. Random Fourier Feature Map
To address the issue of nonlinearity, kernel methods are usually employed to extend linear dimensionality reduction methods to their nonlinear variants. According to Cover's theorem, the original input data are mapped to a high-dimensional, even infinite-dimensional, reproducing kernel Hilbert space (RKHS) by a given nonlinear mapping function. Define the mapping function as

$$\phi: \mathbf{x} \in \mathbb{R}^m \mapsto \phi(\mathbf{x}). \qquad (6)$$

Since the nonlinear features are created from the implicit mapping function, the core of kernel methods relies on this implicit lifting through the kernel trick: the nonlinear features are generated by calculating the inner product between pairs of input points. The inner product between the lifted data points $\phi(\mathbf{x})$ and $\phi(\mathbf{y})$ is defined as

$$k(\mathbf{x}, \mathbf{y}) = \langle \phi(\mathbf{x}), \phi(\mathbf{y}) \rangle, \qquad (7)$$

where $\langle \cdot, \cdot \rangle$ represents the inner product operator.
In the kernel method, the kernel matrix can be constructed for the $N$ training samples $\{\mathbf{x}_i\}_{i=1}^{N}$ as

$$\mathbf{K} = \left[ k(\mathbf{x}_i, \mathbf{x}_j) \right]_{i,j=1}^{N} \in \mathbb{R}^{N \times N}. \qquad (8)$$

To generate the features for a testing data point, the kernel matrix must be evaluated. As the kernel matrix $\mathbf{K}$ shows, all training data are involved. As a result, the disadvantage of the kernel method is the large computational and storage cost incurred with large training sets.
To address this issue, shift-invariant kernels $k(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y})$ can be related to random nonlinear features with the help of Bochner's theorem [32]. Defining $\boldsymbol{\delta} = \mathbf{x} - \mathbf{y}$, the inner product $k(\boldsymbol{\delta})$ can be expressed as

$$k(\boldsymbol{\delta}) = \int p(\boldsymbol{\omega}) \, e^{j \boldsymbol{\omega}^{\mathrm{T}} \boldsymbol{\delta}} \, \mathrm{d}\boldsymbol{\omega} = E_{\boldsymbol{\omega}}\!\left[ e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}} \left( e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{y}} \right)^{*} \right], \qquad (9)$$

where $p(\boldsymbol{\omega})$ is the inverse Fourier transform of $k$, $\boldsymbol{\omega}$ is sampled from the distribution $p(\boldsymbol{\omega})$, $E_{\boldsymbol{\omega}}[\cdot]$ is the expectation operator, and $(\cdot)^{*}$ denotes the complex conjugate. Hence, $e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}} ( e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{y}} )^{*}$ can be considered an unbiased estimate of $k(\mathbf{x}, \mathbf{y})$.
Furthermore, the kernel $k(\mathbf{x}, \mathbf{y})$ is approximated as below,

$$k(\mathbf{x}, \mathbf{y}) \approx E_{\boldsymbol{\omega}}\!\left[ \zeta_{\boldsymbol{\omega}}(\mathbf{x}) \, \zeta_{\boldsymbol{\omega}}(\mathbf{y})^{*} \right], \qquad (10)$$

where $\zeta_{\boldsymbol{\omega}}(\mathbf{x}) = e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}}$.
According to Euler's formula, one obtains

$$e^{j \boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}} = \cos(\boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}) + j \sin(\boldsymbol{\omega}^{\mathrm{T}} \mathbf{x}). \qquad (11)$$

To obtain a real-valued random feature for $k$, the distribution $p(\boldsymbol{\omega})$ and the kernel $k$ should be real. Thus, only the real part of the exponential remains in Equation (11). Replacing $\zeta_{\boldsymbol{\omega}}(\mathbf{x})$ with $z_{\boldsymbol{\omega}}(\mathbf{x}) = \sqrt{2} \cos(\boldsymbol{\omega}^{\mathrm{T}} \mathbf{x} + b)$ [27] then gives

$$k(\mathbf{x}, \mathbf{y}) \approx E_{\boldsymbol{\omega}}\!\left[ z_{\boldsymbol{\omega}}(\mathbf{x}) \, z_{\boldsymbol{\omega}}(\mathbf{y}) \right]. \qquad (12)$$
Based on this approximation, the original data can be explicitly projected onto a low-dimensional Euclidean inner product space through the randomized feature map of [27], instead of the implicit mapping function $\phi(\cdot)$. Following this idea, the inner product $k(\mathbf{x}, \mathbf{y})$ is approximated by defining an explicit random feature map function $z(\cdot)$ such that

$$k(\mathbf{x}, \mathbf{y}) \approx z(\mathbf{x})^{\mathrm{T}} z(\mathbf{y}), \qquad (13)$$

where the randomized features are built from $z_{\boldsymbol{\omega}}(\mathbf{x}) = \sqrt{2} \cos(\boldsymbol{\omega}^{\mathrm{T}} \mathbf{x} + b)$, with $\boldsymbol{\omega}$ drawn from $p(\boldsymbol{\omega})$ and $b$ drawn from the uniform distribution on $[0, 2\pi]$. It is noted that the parameters $\boldsymbol{\omega}$ and $b$ are chosen so that the expectation of the inner product of $z(\cdot)$ is close to the shift-invariant kernel. For further explanation of the parameters $\boldsymbol{\omega}$ and $b$, refer to [27,33].
Thus, the inner product is expressed as

$$k(\mathbf{x}, \mathbf{y}) \approx \frac{1}{D} \sum_{i=1}^{D} z_{\boldsymbol{\omega}_i}(\mathbf{x}) \, z_{\boldsymbol{\omega}_i}(\mathbf{y}), \qquad (14)$$

where $D$ samples of $\boldsymbol{\omega}$ are randomly chosen. Define the $D$-dimensional feature vector

$$z(\mathbf{x}) = \sqrt{\frac{2}{D}} \left[ \cos(\boldsymbol{\omega}_1^{\mathrm{T}} \mathbf{x} + b_1), \ldots, \cos(\boldsymbol{\omega}_D^{\mathrm{T}} \mathbf{x} + b_D) \right]^{\mathrm{T}}. \qquad (15)$$

Based on the random feature map $z(\cdot)$, a $D$-dimensional feature vector is obtained from the original data $\mathbf{x}$. Then, linear FDA is conducted in the $D$-dimensional feature space. The details of Fisher discriminant analysis with the random feature map are elaborated in the next section.
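Under the assumption of a Gaussian (RBF) kernel $k(\boldsymbol{\delta}) = \exp(-\lVert \boldsymbol{\delta} \rVert^2 / 2)$, whose Fourier distribution $p(\boldsymbol{\omega})$ is a standard normal, the random feature map can be checked numerically against the exact kernel value; a minimal sketch:

```python
import numpy as np

def random_fourier_map(X, D, rng):
    """z(x) = sqrt(2/D) * cos(W^T x + b), approximating the unit-bandwidth RBF kernel."""
    m = X.shape[1]
    W = rng.normal(size=(m, D))                 # omega_i ~ p(omega) = N(0, I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)   # b_i ~ Uniform[0, 2*pi]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))
y = rng.normal(size=(1, 4))
exact = np.exp(-0.5 * np.sum((x - y) ** 2))     # k(x, y) = exp(-||x - y||^2 / 2)
Z = random_fourier_map(np.vstack([x, y]), D=20000, rng=rng)
approx = float(Z[0] @ Z[1])                     # inner product of the mapped points
print(exact, approx)                            # the two values should be close
```

The approximation error decays roughly as $1/\sqrt{D}$, which is why the choice of $D$ trades accuracy against computational cost.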
Remark 1. A bound on the error between the kernel matrix $\mathbf{K}$ and its random Fourier feature approximation in spectral norm was derived in [28]. Obviously, the error becomes small as the dimensionality $D$ is chosen large; yet the computational cost also increases. To balance the computational burden against the approximation error, the dimensionality $D$ of the random features can be determined by cross-validation in the offline training phase.