A Semi-Supervised Approach to Bearing Fault Diagnosis under Variable Conditions towards Imbalanced Unlabeled Data

Fault diagnosis of rolling element bearings is an effective technology to ensure the steadiness of rotating machineries. Most of the existing fault diagnosis algorithms are supervised methods and generally require sufficient labeled data for training. However, the acquisition of labeled samples is often laborious and costly in practice, whereas there are abundant unlabeled samples which also imply health information of bearings. Thus, it is worthwhile to develop semi-supervised methods of fault diagnosis to make effective use of the plentiful unlabeled samples. Nevertheless, considering the normal data are much more than the faulty ones, the problem of imbalanced data exists among unlabeled samples for fault diagnosis. Besides, in practice, bearings often work under uncertain and variable operation conditions, which would also have negative influence on fault diagnosis. To solve these issues, a novel hybrid method for bearing fault diagnosis is proposed in this paper: (1) Inspired by visibility graph, a novel fault feature extraction method named visibility graph feature (VGF) is proposed. The obtained features by VGF are natively insensitive to variable conditions, which has been validated by a simulation experiment in this paper; (2) On basis of VGF, to deal with imbalanced unlabeled data, graph-based rebalance semi-supervised learning (GRSSL) for fault diagnosis is proposed. In GRSSL, a graph based on a weighted sparse adjacency matrix is constructed by the k-nearest neighbors and Gaussian Kernel weighting algorithm by means of the samples. Then, a bivariate cost function over classification and normalized label variable is built up to rebalance the importance of labels. Finally, the proposed VGF-GRSSL method was verified by data collected from Case Western Reserve University Bearing Data Center. The experiment results show that the proposed method of bearing fault diagnosis performs effectively to deal with the imbalanced unlabeled data under variable conditions.


Introduction
Rolling element bearings are one of the most frequently used components in machines. Reference [1] shows that a large part of faults in machines (above 40% of the total faults) owes to bearings. Therefore, bearing fault diagnosis has been a research hotspot for decades. Reference [2] reviewed the diagnostic techniques over the past decade and summarized six trends in fault diagnosis for the electrical machines. So far, many studies have proposed plenty of feature extraction

Related Work
Feature extraction is a key step for bearing fault diagnosis. There have been many studies on feature extraction under variable conditions. Borghesani et al. [19] proposed a new procedure for using envelop analysis to remove the effects of variable conditions, but required prior knowledge about defect frequencies of bearings. Feng et al. [20] exploited a concentration of frequency and time method to deal with the variable speed conditions, but required prior information about the undergoing conditions. Tian et al. [21] employed local mean decomposition method and extreme learning machine to detect the bearing fault under variable operation conditions, but required complete condition data. Wang et al. [22] conjugated variation mode decomposition and singular value decomposition to extract features which can fit the variable conditions adaptively. However, it is even impossible for such prior knowledge and complete data for all conditions. Therefore, it is essential to extract fault features which are natively insensitive to variable conditions. Visibility graph(VG), proposed by Lacasa et al. [23] intends to convert time series into complex networks, and the obtained structure parameters not only imply the characteristics of the raw data, but also keep invariant under several transformations of the data. Therefore, inspired by VG, a novel feature extraction method under variable conditions is proposed in this paper.
Fault classification is the following step after feature extraction. There are a large number of references on vibration-based bearing fault classification [7][8][9]. However, most of the existing methods purely depend on labeled data for training. The masses of unlabeled samples acquired in practice are abandoned. To make effective use of those valuable unlabeled samples, the semi-supervised learning (SSL) [24], which is capable of exploiting the unlabeled data combined with a small amount of labeled data to train a well-performed classifier, has been a highlight issue in the field of bearing fault diagnosis.
Concerning this issue, Li [25] proposed a semi-supervised weighted kernel clustering algorithm based on gravitational search for bearing fault diagnosis and processed the unlabeled samples by calculating the weighted kernel distances among them and fault cluster centers. Qin et al. [26] employed a tritraining method for bearing fault diagnosis. Zhao et al. [27] proposed a new graph-based semi-supervised classification for bearing fault diagnosis with sparse coding method. However, these existing semi-supervised methods of bearing fault diagnosis have not considered the imbalance of the unlabeled data. Therefore, it is crucial to develop novel semi-supervised methods for bearing fault diagnosis to deal with the imbalanced unlabeled data.
Among the semi-supervised methods, Graph-based semi-supervised learning (GSSL) [17,18] considers samples as vertices in a graph and shapes the pairwise edges, which are determined by the similarity between the corresponding samples. Then, the small portion of labels propagate to predict labels for unlabeled samples by semi-supervised learning method. Subramanya et al. [28] have proved that GSSL outperforms non-graph-based SSL approaches. Based on the GSSL, Jebara et al. [16] extends a bivariate optimization by which both classification function and a normalized label variable are considered to rebalance the importance of labels. Therefore, the graph-based rebalance semi-supervised learning (GRSSL) is employed for fault classification to deal with the imbalanced unlabeled data.
The intended contributions of this study can be summarized as follows:

1.
A novel fault feature extraction method named visibility graph feature (VGF) is proposed to extract fault characteristics under variable conditions. The vibration signals are mapped into graphs whose structures are natively insensitive to variations of signals corresponding to the speed and load conditions.

2.
A GRSSL algorithm by blending the normalized label variable is developed for bearing fault diagnosis with imbalanced and unlabeled data. To cope with the imbalanced problem, a bivariate optimization function is employed, and the importance of labels is rebalanced by the normalized label variable term.

Methodology
In this section, the proposed fault diagnosis method based on VGF-GRSSL is described in detail, as shown in Figure 1. There are four layers in the method: data segmentation layer, feature extraction layer, fault classification layer, and output layer.

1.
Data segmentation layer: The signals are segmented according to the sample rate and rough shaft speed to ensure each obtained sample covers several circles of signals. In the graph construction, an adjacency matrix is calculated based on a kernel function firstly, and then sparsified and reweighted by means of the k-nearest neighbors (KNN) and Gaussian Kernel weighting algorithm, respectively. Then, a graph with a weighted sparse undirected adjacency matrix is calculated from the samples.
During the label propagation process, a bivariate function over classification function and the unknown labels is built up, and then, a normalized label variable is used to rebalance the importance of labels. Finally, the greedy gradient-based Max-Cut algorithm is adopted to solve the optimization model. 4 Output layer: The unlabeled data is marked with the corresponding label according to the classification with few labeled data. The GRSSL classification based on VGF is expected to extend the bearing fault diagnosis under variable condition with imbalanced unlabeled data.

Feature extraction
Data segmentation   In the graph construction, an adjacency matrix is calculated based on a kernel function firstly, and then sparsified and reweighted by means of the k-nearest neighbors (KNN) and Gaussian Kernel weighting algorithm, respectively. Then, a graph with a weighted sparse undirected adjacency matrix is calculated from the samples. During the label propagation process, a bivariate function over classification function and the unknown labels is built up, and then, a normalized label variable is used to rebalance the importance of labels. Finally, the greedy gradient-based Max-Cut algorithm is adopted to solve the optimization model.

4.
Output layer: The unlabeled data is marked with the corresponding label according to the classification with few labeled data. The GRSSL classification based on VGF is expected to extend the bearing fault diagnosis under variable condition with imbalanced unlabeled data.

Construction of Visibility Graph
The theory of visibility graph has been lucubrated for years and applied widely on many fields, such as engineering and urban planning [29]. Lacasa et al. [23] introduced the theory to the analysis of time series. Here, an undirected complex network is established by individual observations in a time series whose connectivity is defined through the visibility condition in physical space. It has been proved that the obtained graph inherits the intrinsic properties of the raw time series. For instance, the values of impulses of time series are related to the hubs of the graph, and the periods of time series are related to the degree distributions of the graphs. Motivated by this creative idea, this paper proposes a novel feature extraction method based on visibility graph.
An example is given to describe how to convert a time series into a visibility graph, as shown in Figure 2. There are 10 points of a random series. Then, the visibility exists between any two points if there is a straight line that does not intersect any intermediate data height. The complexity of the time series is supposed to be revealed in the structure of the obtained visibility graph. The theory of visibility graph has been lucubrated for years and applied widely on many fields, such as engineering and urban planning [29]. Lacasa et al. [23] introduced the theory to the analysis of time series. Here, an undirected complex network is established by individual observations in a time series whose connectivity is defined through the visibility condition in physical space. It has been proved that the obtained graph inherits the intrinsic properties of the raw time series. For instance, the values of impulses of time series are related to the hubs of the graph, and the periods of time series are related to the degree distributions of the graphs. Motivated by this creative idea, this paper proposes a novel feature extraction method based on visibility graph.
An example is given to describe how to convert a time series into a visibility graph, as shown in Figure 2. There are 10 points of a random series. Then, the visibility exists between any two points if there is a straight line that does not intersect any intermediate data height. The complexity of the time series is supposed to be revealed in the structure of the obtained visibility graph.
The visibility criteria can be formulated as follows: two arbitrary points ( ) a a t x , of the series data will be connected as nodes of the graph, if the points ( ) It is obviously that the associated graph constructed from a time series is always: 1 Connected: each node can see its nearest two neighbors at least. 2 It is an undirected link between every two node. 3 Invariant under affine transformations: the visibility keeps invariant under the resizing of both horizontal and vertical axes (as shown in Figure 3). Therefore, the visibility-graph-based features are natively insensitive to variable conditions. The visibility criteria can be formulated as follows: two arbitrary points (t a , x a ) and (t b , x b ) of the series data will be connected as nodes of the graph, if the points (t c , x c ) between them fufill:

3.
Invariant under affine transformations: the visibility keeps invariant under the resizing of both horizontal and vertical axes (as shown in Figure 3). Therefore, the visibility-graph-based features are natively insensitive to variable conditions.

Features Based on Visibility Graph
After the construction of the graph, an adjacent matrix  Graph Index Complexity (GIC) As a measure of complexity of a graph, GIC [30] is also involved for feature extraction in this study. GIC is defined as follows: λ max represents the largest eigenvalue of the adjacency matrix of a graph with n nodes.

Features Based on Visibility Graph
After the construction of the graph, an adjacent matrix W V ∈ B n×n can be defined to record the links of the graph. It is a symmetric matrix, and the values of elements are either 1 or 0, which indicate that there is a connection or not between the corresponding nodes. For example, if one of the elements w ij = 1, it indicates that the node i and node j are connected with each other. Then, degree distribution of visibility graph (DDVG) and graph index complexity (GIC) are involved for feature extraction under variable conditions.

Degree Distribution of Visibility Graph (DDVG)
Define the degree distribution D V = [d 1 , · · · , d n ] with d i = ∑ n j=1 w ij , which indicate the connection situation of the graph. Then, the mean and standard deviation of the degree distribution (Mean(D v ) and Std(D v )) are calculated as fault features. Figure 4a shows a sample of faulty data with 1024 points. Figure 4b shows the degree distribution of its VG with 1024 nodes. There are several nodes with large values in the degree distribution. Such nodes are hubs of the graph and imply the corresponding points in the time series have large or small values.

Graph Index Complexity (GIC)
As a measure of complexity of a graph, GIC [30] is also involved for feature extraction in this study. GIC is defined as follows: where c = λ max − 2cos(π/(n + 1)) n − 1 − 2cos(π/(n + 1)) λ max represents the largest eigenvalue of the adjacency matrix of a graph with n nodes.  x connect x B x disconnect x

Fault Classification Based on GRSSL
Assume that there are labeled samples {(x 1 , y 1 ), · · · , (x l , y l )} and unlabeled samples {x l+1 , · · · , x l+u } derived from an identical independent distribution. Define X l = {x 1 , · · · , x l } and X u = {x l+1 , · · · , x l+u } as the set of labeled inputs and unlabeled inputs, respectively. l and u represent the number of labeled and unlabeled samples. Y = y ij ∈ B n×c is the label information matrix, where y ij = 1 if the sample x i is related to label j, j ∈ {1, 2, · · · , c}. Each sample can be marked with a label: c ∑ j=1 y ij = 1. The semi-supervised learning aims to identify the missing labels {y l+1 , · · · , y l+u } of unlabeled samples, where typically l << n(l + u = n); there are two parts of the graph-based rebalance semi-supervised learning (GRSSL): graph construction and label propagation.

Graph Construction
In this paper, assume the G = {X, E, W} represents the undirected graph from the data X = X l ∪ X u . There are usually two steps in the estimation of G from X.

1.
The sample similarities are calculated by using a kernel function and utilized to construct a full adjacency matrix K: K ∈ R n×n , K ij = k x i , x j 2.
The matrix K is c and reweighted to obtain the final matrix W.
The sparsification deletes edges of the matrix K by multiplying a binary matrix B ∈ B n×n and a distance matrix H ∈ R n×n . The binary and distance matrix are defined as follows: In this study, the k-nearest neighbor algorithm is employed to estimate the binary matrix B. The optimization procedure can be defined as follows: The final binary matrix is computed by: After the construction of the sparsified graph, the Gaussian Kernel Weighting algorithm is employed to calculate the weight matrix W. The weight between x i and x j is computed as follows: where, d x i , x j represent the Euclidean distance of x i and x j . Finally, a graph G with a weighted sparse adjacency matrix W is generated from the data.

Label Propagation
Given the constructed graph G = {X, E, W}, the purpose of label propagation is to diffuse the known labels to all unlabeled nodes X u in the graph and infer Y u . In this study, this issue is cast as a bivariate optimization over the classification function F and the unknown labels Y: s.t. y ij ∈ {0, 1}, ∑ c j=1 y ij = 1, j = 1, · · · , c. y ij = 1, f or label(x i ) = j, j = 1, · · · , c.
where Y is a binary matrix. The constraint ∑ c j=1 y ij = 1 indicates that it is a single label prediction issue, and (F * , Y * ) represents the optimal solution. The cost function can be specified as: where the first item ranks the function smoothness [31] over graph G, and the second item represents the loss to fit label matrix. The normalized graph Laplacian are as follows: where the vertex degree matrix The cost function can be rewritten as follows: where F i· denote the i'th row vectors of F. The bivariate issue is converted to a univariate issue in regards to the label variable Y. (1) Optimization of F: F is continuous, and the issue is convex while Y is fixed. The minimum is easy to be calculated by setting the partial derivative to zero: where, P = (L/µ + I) −1 is defined as the propagation matrix.
(2) Optimization of Y: The optimal F * is employed in Equation (10) instead of F.
To cope with the imbalanced data, a normalized label variable Y = ΛY is utilized to replace the original Y. The diagonal matrix Λ = diag([λ 1 , · · · , λ n ]) is imported to rebalance the importance of labels based on node degrees. In the formula (15), the degree of the vertex represents its importance in the graph. The majority class is supposed to correspond to a larger ∑ k y kj d k than that of the minority class, so the sample from majority has a small λ i that can reduce its importance in the graph. The sum of λ i of each class is equal to 1. That means that they shared the same importance in the graph. Therefore, the diagonal matrix is expected to eliminate the influence of the imbalanced data.
where d is the degree of the vertex, p is the prior distribution of the class and constrained to c ∑ j=1 p j = 1.
In this paper, p j = 1/c by default. The final optimization issue is rewritten as follows: The optimization is a NP hard problem which requires solving a linearly constrained binary integer programming(BIP) problem [32]. A forthright way to work out the minimization problem is to update the label variable Y with the gradient descent method. Wang and Jebara [16] has proved that the minimization problem equals to a Max K-Cut problem (K = c) over the graph G A = {X, A}.
Therefore, this study utilizes a greedy gradient Max-Cut algorithm to find local optima by selecting unlabeled vertices randomly and placing each of them into the appropriate class subset with minimum connectivity to maximize cross-set edge weights iteratively. The connectivity between unlabeled vertex xi and labeled subset S are defined as follows: S j = x i y ij = 1 , i = 1, 2, · · · , n; j = 1, 2, · · · , c The greedy gradient Max-Cut Algorithm 1. 3 Construct the initial cut S j by the initial labeled vertex X l :

Algorithm 1. Greedy Gradient Max-Cut
Unlabeled vertex set X u = XX l ; 5 repeat 6 for all j = 0 to |X u | do 7 calculate the connectivity: Update the cut S j by placing the vertex xi* to the Sj* subset: Add xi to X l : X l ← X l + x i ; 11 Delete xi from X u : X u ← X u − x i ; 12 Until X u = ∅ 13 Output: the final cut and the corresponding labeled subsets S j , j = 1, 2, . . . , cSj, j = 1, 2, . . . , c

Simulation Experiment for VGF
To demonstrate the performance of VGF, a faulty simulated rolling bearing vibration signal with different resonant frequency under variable speeds is generated and analyzed. The simulated signal can be generated as follows [33]: (19) where, α = 0.03 is the structural damping characteristic, f r = n s /60 (n s means the shaft speed) is the shaft rotation frequency, f m = 2000 Hz is the resonant frequency, A = f r /20 is the amplitude of the impulse. p represents the number of fault impulses in every shaft revolution (here p = 3.58). τ i denotes the randomness of rolling elements slippage, which is subject to a uniformly distribution with a zero mean and a standard deviation of ( 0.01 ∼ 0.02) f r . n(t) is a white Gaussian noise with a signal-to-noise ratio of 0 dB.
To simulate variable conditions, this study assumed that the shaft speed varies from 1500 to 1800 r/min, and the step size defaults to 30 r/min. Therefore, 11 groups of simulated signals were gathered with a sample frequency of 12 kHz, as shown in Figure 5.
To verify the VGF, the signals were segmented into samples with a fixed length (N = 1024). Then, GIC was extracted from the samples. For comparisons, Approximate Entropy [34], Sample Entropy [35], and Fuzzy Entropy [36] are involved in this study. This paper set the embedding dimension m = 2, similarity criterion r = 0.15 of the standard deviation of a signal, length of data N = 1024, and fuzzy power n = 2.
The averages of features are calculated. There are 11 values corresponding to different speeds for each method, as shown in Table 1. It is obvious that GIC has the smallest standard deviation which means it is natively insensitive to variable conditions. [35], and Fuzzy Entropy [36] are involved in this study. This paper set the embedding dimension m = 2, similarity criterion r = 0.15 of the standard deviation of a signal, length of data N = 1024, and fuzzy power n = 2.
The averages of features are calculated. There are 11 values corresponding to different speeds for each method, as shown in Table 1. It is obvious that GIC has the smallest standard deviation which means it is natively insensitive to variable conditions.

Experimental Setup
This study utilized the data collected by the Case Western Reserve University Bearing Data Center to verify the proposed method. The test rig is shown in Figure 6, the shaft is driven by a 2 hp (1470 W) reliance electric motor, a torque transducer, and an encoder mounted on the shaft (Details about the test rig and experimental data can be found in [37].). The tested bearings are 6205-2RS JEM SKF, deep groove ball bearings. The faults were introduced using electro-discharge machining ranging from 0.007 inches (0.178 mm) in diameter to 0.040 inches (0.355 mm) in diameter separately at the inner race, rolling element, and outer race. The location of faults at the fixed outer race was considered with 3 o'clock, 6 o'clock and 12 o'clock. With the motor loads ranging from 0 to 3 horsepower (0 to 2205 W) (motor speeds of 1797 to 1720 r/min), vibration data was recorded at sample frequencies of 12 kHz and 48 kHz. SKF, deep groove ball bearings. The faults were introduced using electro-discharge machining ranging from 0.007 inches (0.178 mm) in diameter to 0.040 inches (0.355 mm) in diameter separately at the inner race, rolling element, and outer race. The location of faults at the fixed outer race was considered with 3 o'clock, 6 o'clock and 12 o'clock. With the motor loads ranging from 0 to 3 horsepower (0 to 2205 W) (motor speeds of 1797 to 1720 r/min), vibration data was recorded at sample frequencies of 12 kHz and 48 kHz. To demonstrate the superiority of the proposed VGF-GRSSL method, this study randomly selected a set of imbalanced data with unlabeled samples under variable conditions for comparison and analysis. The dataset under the defect of 0.007 inches (0.178 mm) and sample frequencies of 12 kHz is showed in the Table 2. The vibration data was divided into four types including normal, inner race fault, outer race fault, and rolling element fault. All of them consist of four operating conditions with the load range from 0 to 3 horsepower (0 to 2205 W) and speed range from 1797 to 1720 r/min, respectively. Each sample contains 1024 points. Therefore, there are 3081 samples (1657 for normal, To demonstrate the superiority of the proposed VGF-GRSSL method, this study randomly selected a set of imbalanced data with unlabeled samples under variable conditions for comparison and analysis. The dataset under the defect of 0.007 inches (0.178 mm) and sample frequencies of 12 kHz is showed in the Table 2. The vibration data was divided into four types including normal, inner race fault, outer race fault, and rolling element fault. All of them consist of four operating conditions with the load range from 0 to 3 horsepower (0 to 2205 W) and speed range from 1797 to 1720 r/min, respectively. Each sample contains 1024 points. Therefore, there are 3081 samples (1657 for normal, 476 for inner fault, 475 for outer fault, and 473 for element fault). Generally speaking, the sampling time for each data sample is extremely short. For example, when the sample frequency is set to 12 kHz and the length of each data sample N = 1024, the sampling time is less than 0.085 s. During such a short time period, the operation condition is considered as constant. Therefore, the four different operation conditions of the Case Western data can be considered under variable conditions.

Fault Diagnosis Based on VGF-GRSSL
The VGFs were extracted, as showed in the Figure 7. It is observed that the locations of features are separated according to their fault modes, remarkably. Therefore, the VGF method not only can extract fault features, but also has strong robustness to variable conditions.

Fault Diagnosis Based on VGF-GRSSL
The VGFs were extracted, as showed in the Figure 7. It is observed that the locations of features are separated according to their fault modes, remarkably. Therefore, the VGF method not only can extract fault features, but also has strong robustness to variable conditions. Based on the obtained features, the GRSSL method is employed for fault classification. For comparisons, GFHF and LGC are involved in this study. This paper set the hyper-parameter µ = 0.01 for LGC and GRSSL. The algorithms GRSSL, GFHF, and LGC shared the same graph construction procedure. The standard k-nearest-neighbors (set k = 6) method was employed for the sparsification and Gaussian kernel weighting for reweighting the edges. In the Gaussian kernel, this study used the Based on the obtained features, the GRSSL method is employed for fault classification. For comparisons, GFHF and LGC are involved in this study. This paper set the hyper-parameter µ = 0.01 for LGC and GRSSL. The algorithms GRSSL, GFHF, and LGC shared the same graph construction procedure. The standard k-nearest-neighbors (set k = 6) method was employed for the sparsification and Gaussian kernel weighting for reweighting the edges. In the Gaussian kernel, this study used the 2 distance, and the kernel bandwidth σ is defined as the average distance between each selected sample and its k'th nearest neighbor.
In the case of the multi-class problem, the imbalanced ratio is defined as the number of majority samples divided by the sum of number of minority samples. It is assumed that the number of each minority class contains the same number of samples.
To demonstrate the performance of the proposed method under different imbalanced ratios, for the labeled data, we fix the minority classes (including inner fault, outer fault, rolling element fault) to have only three label samples and then randomly select m labels from the majority class (normal). For the unlabeled data, we keep it with the same imbalanced ratio as the labeled data. Here, we varied the ratio (r = m/3) from 1 to 20.
The results are shown in Figure 8 under three conditions (i.e., fault diameter 0.007, 0.014, and 0.021 inches (0.178 mm, 0.355 mm, and 0.533 mm)). In Figure 8a, it can be observed: (1) all three algorithms can obtain accuracy rates of nearly 100% when the imbalance ratio is less than 4, and (2) the accuracy of the GRSSL is more stable than the LGC and GFHF when the fault diameter is 0.007 inches (0.178 mm). Figure 8b illustrates results with a fault diameter of 0.014 inches (0.355 mm). The three methods obtained similar accuracy rates with varying imbalance ratios. Meanwhile, it is shown in Figure 8c that all methods achieved excellent accuracy rates except the GFHF with the imbalance ratio lower than 4. In summary, the GRSSL achieves a more stable accuracy rate than the LGC and GFHF under variable conditions with imbalanced unlabeled data.
inches (0.178 mm). Figure 8b illustrates results with a fault diameter of 0.014 inches (0.355 mm). The three methods obtained similar accuracy rates with varying imbalance ratios. Meanwhile, it is shown in Figure 8c that all methods achieved excellent accuracy rates except the GFHF with the imbalance ratio lower than 4. In summary, the GRSSL achieves a more stable accuracy rate than the LGC and GFHF under variable conditions with imbalanced unlabeled data. To illustrate the diagnostic performance of the proposed method exactly, we calculated the accuracy of each class, which is displayed in the Table 3. It is evident that the GRSSL method outperformed other methods under condition 1 and condition 3. However, all methods got a bad performance for the inner race fault and ball fault under the condition 2. But they recognized the normal samples and faulty samples at rate of 100%, which is of significance to industrial field application. Table 3. Results of LGC, GFHF, and GRSSL under different severity of fault with the imbalance ratio r = 16 (IRF-inner race fault, ORF-outer race fault, BF-ball Fault). To illustrate the diagnostic performance of the proposed method exactly, we calculated the accuracy of each class, which is displayed in the Table 3. It is evident that the GRSSL method outperformed other methods under condition 1 and condition 3. However, all methods got a bad performance for the inner race fault and ball fault under the condition 2. But they recognized the normal samples and faulty samples at rate of 100%, which is of significance to industrial field application.

Conclusions
Rolling element bearings are vital to rotating machineries. This paper proposes a novel hybrid method combined with visibility graph feature (VGF) and graph-based rebalance semi-supervised learning (GRSSL) for bearing fault diagnosis under variable conditions with imbalanced unlabeled data. Firstly, the visibility graph algorithm is used to extract features which are natively insensitive to variable conditions. Secondly, the GRSSL is utilized to make effective use of the imbalanced unlabeled data for fault classification. Experiment results have demonstrated the superiority of the proposed VGF and GRSSL: (1) Compared with approximate entropy, sample entropy, and fuzzy entropy, the VGF can effectively extract the fault characteristics under variable conditions; (2) GRSSL has superior performance in bearing fault diagnosis under variable conditions with imbalanced unlabeled data.
However, to some extent, the imbalanced ratio involved in this study only considered normal samples and faulty samples. The absence of faulty samples, such as inner race fault and outer race fault, may create new problems. Therefore, additional experiments under more different ratios should be done to validate and improve the method. Meanwhile, more attention should be paid to the development of semi-supervised learning diagnosis methods.
Author Contributions: X.C. collected and analyzed the data, made charts and diagrams, conceived and performed the experiments and wrote the paper; Z.W. and L.J. conceived the structure, provided guidance and modified the manuscript; Z.Z. analyzed the data and contributed analysis tools. Y.Q. provided guidance.