Fault Diagnosis for Rolling Bearings under Variable Conditions Based on Visual Cognition

Fault diagnosis for rolling bearings has attracted increasing attention in recent years. However, few studies have focused on fault diagnosis for rolling bearings under variable conditions. This paper introduces a fault diagnosis method for rolling bearings under variable conditions based on visual cognition. The proposed method includes the following steps. First, the vibration signal data are transformed into a recurrence plot (RP), which is a two-dimensional image. Then, inspired by the visual invariance characteristic of the human visual system (HVS), we utilize speed up robust feature to extract fault features from the two-dimensional RP and generate a 64-dimensional feature vector, which is invariant to image translation, rotation, scaling variation, etc. Third, based on the manifold perception characteristic of HVS, isometric mapping, a manifold learning method that can reflect the intrinsic manifold embedded in the high-dimensional space, is employed to obtain a low-dimensional feature vector. Finally, a classical classification method, support vector machine, is utilized to realize fault diagnosis. Verification data were collected from Case Western Reserve University Bearing Data Center, and the experimental result indicates that the proposed fault diagnosis method based on visual cognition is highly effective for rolling bearings under variable conditions, thus providing a promising approach from the cognitive computing field.


Introduction
Fault diagnosis is one of the hot topics in many fields, including machinery, power chemical engineering, aviation and space. It helps reduce the loss that component and system malfunction may cause [1]. Failures of rolling bearings, which are the most widely used bearings in industry, may cause unexpected machine breakdowns and thus result in economic loss, which is an issue that has attracted considerable attention of specialists and scholars [2].
Recently, various fault diagnosis have been proposed, including expert systems, spectrum analysis, fuzzy logic, statistical processing and so on [3,4]. Among these different processing techniques, vibration-based measurement is the most widely employed due to its high correlation with structure dynamics [5,6], and the vibration signals contain a wealth of information of the bearings. However, the working environment of rolling bearings is generally tough, complex, and especially variable,

Recurrence Plot
The prior work of introducing visual cognitive theory into the field of rolling bearing fault diagnosis is to achieve the transformation of unidimensional vibration signals into two-dimensional images. Image transformation is an important basis of successfully ensuring the extraction of features using VIC. Therefore, selection of a suitable image transformation method is especially important. On account of the nonlinear and non-stationary characteristics of rolling bearing signals, detection of dynamical changes in complex systems is one of the most difficult problems. Recurrence plot (RP), i.e., developing two-dimensional graphical plots that show the recurrence of states, is an important method for detecting dynamical changes [32]. It can uncover hidden periodicities in a signal in the recurrence domain that are not easily identified, and it is an important method for analyzing the periodic, chaotic and non-stationary properties of time series. This paper translates the vibration signals under variable conditions into RPs and extracts VIC features.

Recurrence Plot Theory
The recurrence plot algorithm utilized the concept of phase-space reconstruction, which can be described as follows: (1) For the time series u k (k = 1, 2, · · · N), whose sample interval is ∆t, we choose the Mutual Information Method and CAO algorithm to calculate the suitable embedding dimensionality m and delay time τ, which could reconstitute the time series. The reconstructed time series is as follows: x i = u i , u i+τ , · · · , u i+(m−1)τ , i = 1, 2, · · · N − (m − 1)τ (1) (2) Calculate the norm (e.g., the Euclidean norm) of ith x i and jth x j in reconstructed phase space [33]: S ij = x i − x j , i = 1, 2, · · · N − (m − 1)τ; j = 1, 2, · · · N − (m − 1)τ (2) (3) Calculate the recurrence value [34]: where ε is a predefined cut-off distance and H{r} is a Heaviside function: (4) Utilize a coordinate graph with an abscissa of i and ordinate of j to draw R(i, j), where i and j are the labels of the time series, and the image is an RP.
The details of the Mutual Information Method and CAO algorithm are as follows.
(1) Mutual Information Method The mutual information method is an effective method to estimate the delay time, which is widely used in the phase space reconstruction.
The Shannon theory shows that we can obtain the information content of a i from event b j : The relationship between a i and b j can be expressed with the comentropy I AB : Applying the theory of mutual information, set A is and set B is Then, Equation (6) translates into Usually at the beginning, I AB (τ) is very large, so we can obtain an infinite amount of information in x(t) = x(t + 0). x(t 0 + iτ x ) and x(t 0 + iτ x + τ) are completely independent for chaotic signals when τ is large; when τ → ∞ , I(τ) → 0 . We generally select the first minimum of mutual information entropy I AB (τ) as the delay time.
(2) CAO algorithm The CAO algorithm has excellent properties in that the algorithm can clearly distinguish real signal and noise and has a high computational efficiency. Firstly, we calculate the distance of the points under the embedded dimensionality: a(i, m) = u i (m + 1) − u n(i,m) (m + 1) u i (m) − u n(i,m) (m) , i = 1, 2, · · · , N − m (10) where · is the ∞ norm of the vector; u i (m + 1) is the ith vector after phase-space rebuilding, and the embedded dimensionality is m + 1; u n(i,m) (m + 1) is the nearest vector from u i (m + 1). Next, calculate the average value of distance change under the same dimensionality: where E(m) is the average value of all a(i, m). Last, according to the discriminant equation, when m > m 0 , E 1 (m) stops changing or changes slowly, and m 0 + 1 is the minimum embedding dimensionality.

Comparison between Recurrence Plot and Bi-spectrum
Nowadays, many methods have been proposed to transform vibration signals to two-dimensional images, including the spectrum analysis method, among which bi-spectrum has been extensively used in fault diagnosis [35]. However, the research of bi-spectrum analysis method mostly focuses on fault diagnosis in single working condition. Fault diagnosis methods for rotating machines working under variable working conditions based on bi-spectrum analysis method have been rarely reported. Moreover, bi-spectrum method is not useful in processing non-stable signals, which limits its application. Considering the advantages of recurrence plot in dealing with non-stable and non-linear signals, the application of recurrence plot in the field of fault diagnosis has attracted more and more attention of the researchers. To investigate the influence of working condition variation to recurrence plot and bi-spectrum, the authors conduct a simulation for comparison.
Given the fact that the frequency of the rolling bearing vibration signal changes when different faults occur, this study employs two sine functions of different frequency, x 1 = sin(2πt/8) and x 2 = sin(2πt/4), to simulate different fault types of rolling bearings. The simulation data are collected with a sampling frequency of 1 Hz and a sampling time of 100 s. Figures 1 and 2 show the simulation results of bi-spectrum and recurrence plot, respectively. Nowadays, many methods have been proposed to transform vibration signals to two-dimensional images, including the spectrum analysis method, among which bi-spectrum has been extensively used in fault diagnosis [35]. However, the research of bi-spectrum analysis method mostly focuses on fault diagnosis in single working condition. Fault diagnosis methods for rotating machines working under variable working conditions based on bi-spectrum analysis method have been rarely reported. Moreover, bi-spectrum method is not useful in processing non-stable signals, which limits its application. Considering the advantages of recurrence plot in dealing with non-stable and non-linear signals, the application of recurrence plot in the field of fault diagnosis has attracted more and more attention of the researchers. To investigate the influence of working condition variation to recurrence plot and bi-spectrum, the authors conduct a simulation for comparison.
Given the fact that the frequency of the rolling bearing vibration signal changes when different faults occur, this study employs two sine functions of different frequency,

Comparison between Recurrence Plot and Bi-spectrum
Nowadays, many methods have been proposed to transform vibration signals to two-dimensional images, including the spectrum analysis method, among which bi-spectrum has been extensively used in fault diagnosis [35]. However, the research of bi-spectrum analysis method mostly focuses on fault diagnosis in single working condition. Fault diagnosis methods for rotating machines working under variable working conditions based on bi-spectrum analysis method have been rarely reported. Moreover, bi-spectrum method is not useful in processing non-stable signals, which limits its application. Considering the advantages of recurrence plot in dealing with non-stable and non-linear signals, the application of recurrence plot in the field of fault diagnosis has attracted more and more attention of the researchers. To investigate the influence of working condition variation to recurrence plot and bi-spectrum, the authors conduct a simulation for comparison.
Given the fact that the frequency of the rolling bearing vibration signal changes when different faults occur, this study employs two sine functions of different frequency,  In Figures 1 and 2, it can be seen that, when the frequency of the sine function varies, i.e., the working condition of rolling bearing varies, the bi-spectrum figures exhibit notable variation in figure structure, whereas the recurrence plots maintain stable structure. The slight changes in two recurrence plots can be considered as scale zooming. The analysis results indicate that recurrence plot is more suitable for processing non-stationary and non-linear vibration signals.

VIC Theory
The HVS, which is the most intuitive tool to perceive the world, has recently gained considerable attention in the field of image processing [36]. Scientists believe that the essence of visual recognition is the perception of the invariant structure characteristics in the observed objects. Cayley et al. first introduced the theory of Algebraic Invariants and introduced it into the field of computing method, which initially formed the theory of visual invariance. At the Iceland conference in 1991, the theory of VIC was formally proposed [37]. The main concept of the theory is that: (1) images are composed of edge and texture details; and (2) the invariant is the essential description of the geometric structure of the object. The invariant is the most important geometric structure in the visual object, as it plays a key role in the recognition of the object.
The main reason why the theory of visual invariance is widely employed is that it is similar to the visual essence of human beings. Human visual perception is based on invariant features, meaning that the perception and recognition of human eyes to external objects cannot change with the rotation, scale variation, translation and brightness changes in the object, as shown in Figures 3 and 4. This is the most significant characteristic of the HVS [38], which indicates that human eyes recognize and understand the object based on the characteristic information of the object itself, and this does not change with rotation or scaling. It is precisely because of human vision capturing the invariant of the same target that people can recognize objects. In Figures 1 and 2, it can be seen that, when the frequency of the sine function varies, i.e., the working condition of rolling bearing varies, the bi-spectrum figures exhibit notable variation in figure structure, whereas the recurrence plots maintain stable structure. The slight changes in two recurrence plots can be considered as scale zooming. The analysis results indicate that recurrence plot is more suitable for processing non-stationary and non-linear vibration signals.

VIC Theory
The HVS, which is the most intuitive tool to perceive the world, has recently gained considerable attention in the field of image processing [36]. Scientists believe that the essence of visual recognition is the perception of the invariant structure characteristics in the observed objects. Cayley et al. first introduced the theory of Algebraic Invariants and introduced it into the field of computing method, which initially formed the theory of visual invariance. At the Iceland conference in 1991, the theory of VIC was formally proposed [37]. The main concept of the theory is that: (1) images are composed of edge and texture details; and (2) the invariant is the essential description of the geometric structure of the object. The invariant is the most important geometric structure in the visual object, as it plays a key role in the recognition of the object.
The main reason why the theory of visual invariance is widely employed is that it is similar to the visual essence of human beings. Human visual perception is based on invariant features, meaning that the perception and recognition of human eyes to external objects cannot change with the rotation, scale variation, translation and brightness changes in the object, as shown in Figures 3  and 4. This is the most significant characteristic of the HVS [38], which indicates that human eyes recognize and understand the object based on the characteristic information of the object itself, and this does not change with rotation or scaling. It is precisely because of human vision capturing the invariant of the same target that people can recognize objects.  Because of the influence of bionics to scientific progress, humans began to study visual cognition several decades ago. Visual object classification has been a long-term interest due to its important role in a variety of applications [39]. Because rolling bearings under variable conditions of the same failure mode may display similar image characteristics, we choose an image translation In Figures 1 and 2, it can be seen that, when the frequency of the sine function varies, i.e., the working condition of rolling bearing varies, the bi-spectrum figures exhibit notable variation in figure structure, whereas the recurrence plots maintain stable structure. The slight changes in two recurrence plots can be considered as scale zooming. The analysis results indicate that recurrence plot is more suitable for processing non-stationary and non-linear vibration signals.

VIC Theory
The HVS, which is the most intuitive tool to perceive the world, has recently gained considerable attention in the field of image processing [36]. Scientists believe that the essence of visual recognition is the perception of the invariant structure characteristics in the observed objects. Cayley et al. first introduced the theory of Algebraic Invariants and introduced it into the field of computing method, which initially formed the theory of visual invariance. At the Iceland conference in 1991, the theory of VIC was formally proposed [37]. The main concept of the theory is that: (1) images are composed of edge and texture details; and (2) the invariant is the essential description of the geometric structure of the object. The invariant is the most important geometric structure in the visual object, as it plays a key role in the recognition of the object.
The main reason why the theory of visual invariance is widely employed is that it is similar to the visual essence of human beings. Human visual perception is based on invariant features, meaning that the perception and recognition of human eyes to external objects cannot change with the rotation, scale variation, translation and brightness changes in the object, as shown in Figures 3  and 4. This is the most significant characteristic of the HVS [38], which indicates that human eyes recognize and understand the object based on the characteristic information of the object itself, and this does not change with rotation or scaling. It is precisely because of human vision capturing the invariant of the same target that people can recognize objects.  Because of the influence of bionics to scientific progress, humans began to study visual cognition several decades ago. Visual object classification has been a long-term interest due to its important role in a variety of applications [39]. Because rolling bearings under variable conditions of the same failure mode may display similar image characteristics, we choose an image translation Because of the influence of bionics to scientific progress, humans began to study visual cognition several decades ago. Visual object classification has been a long-term interest due to its important role in a variety of applications [39]. Because rolling bearings under variable conditions of the same failure

SURF Theory
Recognition of the images that are rotating, scaling and translating refers to finding the stable points of the images. These points, such as corners, blobs, T-junctions and light spots in dark regions, do not disappear with rotating, scaling, translation and brightness changes. Scale-invariant feature transform (SIFT) is the corresponding computing method of VIC that can identify the invariant features and realize image matching. In 2006, Herbert Bay et al. improved SIFT and presented a novel scale and rotation invariant detector and descriptor called the speed up robust feature (SURF). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster [40].

Theory of Scale Space
Scale space was first proposed by Iijima in 1962, and after the popularization by Witkin and Koenderink, scale space gradually gained attention and became widely used in the field of computer vision. The basic theory of scale space is that a scale parameter is introduced into the pattern information model and the scale space under multi-scale is obtained by continuously changing the scale parameter. The principal contours are extracted as eigenvectors to realize the detection of margins and corners and the extraction of different resolutions.

Integral Image Generation
Integral images, which can be used to fleetly conduct the function of box type convolution filtering is one of the main advantages of SURF. The entry of an integral image I ∑ (x) at a location x = (x, y) T represents the sum of all pixels in the input image I within a rectangular region formed by the origin and x [36].
Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright rectangular area, as shown in Figure 5 [41]. Hence, SURF uses block filtering instead of Gauss filtering, and it can greatly improve the efficiency of the algorithm.

SURF Theory
Recognition of the images that are rotating, scaling and translating refers to finding the stable points of the images. These points, such as corners, blobs, T-junctions and light spots in dark regions, do not disappear with rotating, scaling, translation and brightness changes. Scale-invariant feature transform (SIFT) is the corresponding computing method of VIC that can identify the invariant features and realize image matching. In 2006, Herbert Bay et al. improved SIFT and presented a novel scale and rotation invariant detector and descriptor called the speed up robust feature (SURF). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster [40].

Theory of Scale Space
Scale space was first proposed by Iijima in 1962, and after the popularization by Witkin and Koenderink, scale space gradually gained attention and became widely used in the field of computer vision. The basic theory of scale space is that a scale parameter is introduced into the pattern information model and the scale space under multi-scale is obtained by continuously changing the scale parameter. The principal contours are extracted as eigenvectors to realize the detection of margins and corners and the extraction of different resolutions.

Integral Image Generation
Integral images, which can be used to fleetly conduct the function of box type convolution filtering is one of the main advantages of SURF. The entry of an integral image represents the sum of all pixels in the input image I within a rectangular region formed by the origin and x [36]. (13) Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright rectangular area, as shown in Figure 5 [41]. Hence, SURF uses block filtering instead of Gauss filtering, and it can greatly improve the efficiency of the algorithm. Figure 5. Functionality of the integral image.

Interest point localization
SURF utilizes the local maximum value of the approximate Hessian matrix of the determinant to locate the interest points. When the Hessian determinant is the local maximum, the detected the interest point. At the point x(x, y), which is in the original image, the Hessian matrix H(x, σ) with a scale of σ at x is defined as follows: where L xx (x, σ) is the convolution of the Gaussian second-order derivative ∂ 2 ∂x 2 g(σ) with the image I at point x, and similarly for L xy (x, σ) and L yy (x, σ).
Simple box filters using the integral image are used to approximate the second-order Gaussian partial derivation and have less computation burden, as shown in Figure 6. Box filters can be quickly calculated by the integral image, and the calculation amount is not related to the template size, which improves the computational efficiency of SURF.
Simple box filters using the integral image are used to approximate the second-order Gaussian partial derivation and have less computation burden, as shown in Figure 6. Box filters can be quickly calculated by the integral image, and the calculation amount is not related to the template size, which improves the computational efficiency of SURF. When we use of the second-order differential Gaussian function to filter and a template size of 9 9  as the smallest scale space to detect the points, the determinant of the Hessian matrix is After simplification, the matrix becomes where 0.9 is used to balance the Hessian determinant.
To realize the scale invariance of the interest points, SURF applies box filters of different scales on the original image to obtain the Hessian matrix response in terms of structures of the scale pyramid, as shown in Figure 7. When we use σ = 1.2 of the second-order differential Gaussian function to filter and a template size of 9 × 9 as the smallest scale space to detect the points, the determinant of the Hessian matrix is Det(H) = L xx L yy − L xy L xy (15) After simplification, the matrix becomes where 0.9 is used to balance the Hessian determinant.
To realize the scale invariance of the interest points, SURF applies box filters of different scales on the original image to obtain the Hessian matrix response in terms of structures of the scale pyramid, as shown in Figure 7. As in the difference of Gaussian picture of SIFT, there are many layers referred to as octaves in the resolution pyramid, and several pictures of different scales remain in an octave. The size of pictures is unaltered, and the pictures in different octaves are obtained by changing the box filter size. In this way, SURF saves time during the down-sampling and improves the operation efficiency.
To determine the interest points, non-maximum suppression in a 3 3 3

 
neighborhood, which is shown in Figure 8, is employed. Each pixel processed by the Hessian matrix is compared with 26 points of its three-dimensional neighborhood to obtain the maximum or minimum as the preliminary feature points. The extreme point of discrete space is not a real extreme point. Figure 9 shows the difference of the extreme point of a two-dimensional function in discrete and continuous space. SURF utilizes the linear interpolation method to obtain accurate interest points. As in the difference of Gaussian picture of SIFT, there are many layers referred to as octaves in the resolution pyramid, and several pictures of different scales remain in an octave. The size of pictures is unaltered, and the pictures in different octaves are obtained by changing the box filter size. In this way, SURF saves time during the down-sampling and improves the operation efficiency.

Scale
To determine the interest points, non-maximum suppression in a 3 × 3 × 3 neighborhood, which is shown in Figure 8, is employed. Each pixel processed by the Hessian matrix is compared with 26 points of its three-dimensional neighborhood to obtain the maximum or minimum as the preliminary feature points. As in the difference of Gaussian picture of SIFT, there are many layers referred to as octaves in the resolution pyramid, and several pictures of different scales remain in an octave. The size of pictures is unaltered, and the pictures in different octaves are obtained by changing the box filter size. In this way, SURF saves time during the down-sampling and improves the operation efficiency.
To determine the interest points, non-maximum suppression in a 3 3 3

 
neighborhood, which is shown in Figure 8, is employed. Each pixel processed by the Hessian matrix is compared with 26 points of its three-dimensional neighborhood to obtain the maximum or minimum as the preliminary feature points. The extreme point of discrete space is not a real extreme point. Figure 9 shows the difference of the extreme point of a two-dimensional function in discrete and continuous space. SURF utilizes the linear interpolation method to obtain accurate interest points. The extreme point of discrete space is not a real extreme point. Figure 9 shows the difference of the extreme point of a two-dimensional function in discrete and continuous space. SURF utilizes the linear interpolation method to obtain accurate interest points.

Interest Point Description
To guarantee the rotation invariance, the main directions of all of the interest points are required. We first calculate the Haar wavelet responses in the x and y directions within a circular neighborhood with a radius of 6σ around the interest point, where σ is the scale at which the interest point was detected [40]. Regarding the interest point as the center, the sum of all points of the Haar wavelet responses within a sliding orientation window of size 60 • are the new vector. The longest such vector over all windows defines the orientation of the interest point, as shown in Figure 10.  The description of the interest point needs to split up the region regularly into smaller 4 × 4 square sub-regions. For each sub-region, we compute the Haar wavelet responses at 5 × 5 regularly spaced sample points. Finally, Haar wavelet processing is performed in each area to calculate the Haar wavelet response in the x and y directions, i.e., d x and d y , as shown in Figure 11. Then, the wavelet responses d x and d y are summed over each sub-region and form the first set of entries in the feature vector, thus the four-dimensional descriptor vector v = ∑ d x ,∑ |d x |, ∑ d y ,∑ d y is obtained. Concatenating this for all 4 × 4 sub-regions results in a descriptor vector with a length of 64.

MPC and Manifold Learning
It is generally known that through the visual system, humans can receive a great wealth of information of the surrounding world with changes in visual angles and objects. In fact, neurophysiologists have found that trigger rates of nerve cell groups can be described by functions consisting of a small number of variables. They consider that the neuronal population activity is controlled by the underlying low dimensional manifold structure [42]. According to the theory of manifold perception, the visual perception process is based on manifold topological continuity. When the scale, position, illumination, and other factors are changed continuously, the image set by the same object will be located on a low-dimensional manifold in the high-dimensional observation space. Furthermore, Ref. [18] noted that the HVS receives images through visual cells, and the information received by each visual cell lies on a high-dimensional manifold space; however, the brain only receives partial information, which lies on the low-dimensional manifold. That is, the HVS has the ability to sense the information hidden in the high-dimensional manifold. Manifold learning can find meaningful low-dimensional structures hidden within high-dimensional observations, and this concept has attracted increasing research. Inspired by MPC, when the features extracted by VIC are processed, the manifold learning method can be employed to discover the essential manifold features.
Manifold learning, the most important part of the nonlinear dimensionality reduction method, has attracted widespread attention. This method can embed high-dimensional samples into low-dimensional feature space by preserving some local or global geometric structures. In recent years, many manifold learning methods have been proposed, including Isometric mapping (Isomap), Locally Linear Embedding (LLE), Laplacian Eigenmaps (LE) and Local tangent space alignment (LTSA) [43]. Isomap is the method utilized in this paper for the remaining globalized features.

Isomap Theory
Isomap, which is based on the classical multidimensional scaling (MDS) method, is a global nonlinear manifold dimensionality reduction method. MDS utilizes Euclidean distance as the measure of data points, however Isomap strives to retain the intrinsic geometric characteristics of the data by acquiring the geodesic distance between all points. This method preserves geodesic proximities using a non-Euclidean metric and, as such, maintains nonlinear features of the original data that are lost in traditional empirical orthogonal function analysis [44]. The geodesic distance employed by Isomap can be described by Swiss Roll, which is shown as Figure 12. In Figure 12a, the blue dotted line represents the Euclidean distance between two sample points, however, it cannot

MPC and Manifold Learning
It is generally known that through the visual system, humans can receive a great wealth of information of the surrounding world with changes in visual angles and objects. In fact, neurophysiologists have found that trigger rates of nerve cell groups can be described by functions consisting of a small number of variables. They consider that the neuronal population activity is controlled by the underlying low dimensional manifold structure [42]. According to the theory of manifold perception, the visual perception process is based on manifold topological continuity. When the scale, position, illumination, and other factors are changed continuously, the image set by the same object will be located on a low-dimensional manifold in the high-dimensional observation space. Furthermore, Ref. [18] noted that the HVS receives images through visual cells, and the information received by each visual cell lies on a high-dimensional manifold space; however, the brain only receives partial information, which lies on the low-dimensional manifold. That is, the HVS has the ability to sense the information hidden in the high-dimensional manifold. Manifold learning can find meaningful low-dimensional structures hidden within high-dimensional observations, and this concept has attracted increasing research. Inspired by MPC, when the features extracted by VIC are processed, the manifold learning method can be employed to discover the essential manifold features.
Manifold learning, the most important part of the nonlinear dimensionality reduction method, has attracted widespread attention. This method can embed high-dimensional samples into low-dimensional feature space by preserving some local or global geometric structures. In recent years, many manifold learning methods have been proposed, including Isometric mapping (Isomap), Locally Linear Embedding (LLE), Laplacian Eigenmaps (LE) and Local tangent space alignment (LTSA) [43]. Isomap is the method utilized in this paper for the remaining globalized features.

Isomap Theory
Isomap, which is based on the classical multidimensional scaling (MDS) method, is a global nonlinear manifold dimensionality reduction method. MDS utilizes Euclidean distance as the measure of data points, however Isomap strives to retain the intrinsic geometric characteristics of the data by acquiring the geodesic distance between all points. This method preserves geodesic proximities using a non-Euclidean metric and, as such, maintains nonlinear features of the original data that are lost in traditional empirical orthogonal function analysis [44]. The geodesic distance employed by Isomap can be described by Swiss Roll, which is shown as Figure 12. In Figure 12a, the blue dotted line represents the Euclidean distance between two sample points, however, it cannot reflect the "real distance" of the two samples. The geodesic distance expressed by the blue line can truly reflect the distance of the two samples which are in the manifold space. The red line in Figure 12b is the approximate geodesic distance of the two samples which is calculated by the shortest path algorithm. We can see in Figure 12c, in two-dimensional spatial distribution, the approximate geodesic distance expressed by red line is really close to the real distance expressed by the blue line.
Materials 2017, 10, 582 12 of 23 reflect the "real distance" of the two samples. The geodesic distance expressed by the blue line can truly reflect the distance of the two samples which are in the manifold space. The red line in Figure  12b is the approximate geodesic distance of the two samples which is calculated by the shortest path algorithm. We can see in Figure 12c, in two-dimensional spatial distribution, the approximate geodesic distance expressed by red line is really close to the real distance expressed by the blue line. In Isomap, the followed method is utilized to approximately calculate the real geodesic distance. For a sample point in the data set, the geodesic distance in its neighborhood is replaced by Euclidean distance, while the geodesic distance out of its neighborhood is replaced by the shortest path on manifold. The shortest path can be calculated by Dijkstra algorithm or Folyd algorithm. The main steps of Isomap are as follows:

Calculate the shortest path
Supposing the shortest path is then the shortest path distance matrix

Calculate embedding dimensionality
The last step applies the classical MDS to the distance matrix G D , constructing the d-dimensional embedding space Y while retaining the expected intrinsic geometric features of the manifold. This is achieved by minimizing the cost function: where matrix transformation operator   In Isomap, the followed method is utilized to approximately calculate the real geodesic distance. For a sample point in the data set, the geodesic distance in its neighborhood is replaced by Euclidean distance, while the geodesic distance out of its neighborhood is replaced by the shortest path on manifold. The shortest path can be calculated by Dijkstra algorithm or Folyd algorithm. The main steps of Isomap are as follows:

Structure neighborhood relation graph G(V, E)
For each sample point x i (i = 1, 2, · · · , N), the Euclidean distance between the sample and the other sample points is calculated. When x j is one of the k-nearest points to x i or the Euclidean distance d x i , x j between x j and x i is less than a fixed threshold ε, x i x j is regarded as the edge of graph G, and the weight of edge x i x j is d x i x j .

Calculate the shortest path
Supposing the shortest path is d G x i , x j = d x i x j when G has an edge x i x j , otherwise then the shortest path distance matrix

Calculate embedding dimensionality
The last step applies the classical MDS to the distance matrix D G , constructing the d-dimensional embedding space Y while retaining the expected intrinsic geometric features of the manifold. This is achieved by minimizing the cost function: where matrix transformation operator τ(D G ) = − HSH 2 , and S is the matrix that contains the square of each element in D. H = I − ee T /n, e = [1, 1, · · · , 1] T . Denote the eigenvalue of τ(D G ) as s 1 , s 2 , · · · (descending order), and v 1 , v 2 , · · · is the corresponding column vector. Thus, the row vector of Y = √ s 1 v 1 , √ s 2 v 2 , · · · , √ s d v d is the most suitable d-dimensional embedded coordinate.

Insprition from Visual Cognition
Based on the above introduction of related theories, we can know that the recurrence plot has good performance in processing non-station and non-linear vibration signals. Therefore, a two-dimensional image can be transformed from vibration signals based on recurrence plot technique, which can reflect the fluctuation of working conditions on the variation of image scale, illumination, translation, and so on.
As a representative computing method based on VIC of the HVS, SURF can successfully find the stable points in images, which have strong robust to image scale, illumination, translation, etc. Inspired by this, stable fault features contained in the transformed two-dimensional images can be extracted by SURF method, which are also robust to variable working conditions.
Considering the SURF fault feature vector have a high dimensionality, which may result in high computation consumption, the manifold learning method, Isomap, can be utilized to excavated low-dimensional manifold structure hidden in the high-dimensional dataset. Finally, the stable fault feature vector is formed.
Based on the extracted low-dimensional stable fault feature vector, a fault classifier can easily realize fault diagnosis for rolling bearing under variable working conditions.

Method for Rolling Bearing Fault Diagnosis under Variable Conditions Based on Visual Cognition
We utilize the following four steps, which are shown in Figure 13, to execute fault diagnosis under variable conditions. First, all of the vibration data under different conditions are transformed into RPs, which facilitates the use of the visual cognitive method to extract features. Then, SURF extracts interest points of all RPs and generates descriptors for fault diagnosis. Nevertheless, because of the fault intrinsic features embedded in the high-dimensional space, an intrinsic manifold is established using Isomap. The features of different fault modes under the same condition are used to train the SVM classifier and classify the fault mode under the other conditions, after which cross validation is indispensable.

Description of the Rolling Bearing Experimental Data
Rolling bearing data collected by the Case Western Reserve University Bearing Data Center are used for testing and verifying the proposed method in the experiment [45]. As shown in Figure 14, the test-rig consists of a 2 hp (1 hp = 735 W) motor (left), a torque transducer/encoder (center), a dynamometer (right), and control electronics (not shown) [46]. The test-rig also includes both the drive end (DE) and fan end (FE) bearings of the 6205-2RS JEM SKF and NTN equivalent bearings, on which the single point faults were introduced using electro-discharge machining with fault diameters of 7 mils, 14 mils, 21 mils, 28 mils, and 40 mils (1 mil = 0.001 inches, 1 inch = 0.0254 m), respectively. To quantify the outer raceway fault stationary effect, experiments were conducted for both the FE and DE bearings with outer raceway faults located at 3 o'clock, 6 o'clock and 12 o'clock.  For data acquisition, vibration data under the normal, inner-race fault, outer-race fault, and rolling element fault conditions were collected by means of accelerometers, which were attached to the housing with magnetic bases and placed at the 12 o'clock position at both the drive end and fan  For data acquisition, vibration data under the normal, inner-race fault, outer-race fault, and rolling element fault conditions were collected by means of accelerometers, which were attached to the housing with magnetic bases and placed at the 12 o'clock position at both the drive end and fan For data acquisition, vibration data under the normal, inner-race fault, outer-race fault, and rolling element fault conditions were collected by means of accelerometers, which were attached to the housing with magnetic bases and placed at the 12 o'clock position at both the drive end and fan end of the motor housing. Digital data, which were post-processed in a Matlab environment, were collected at 12,000 samples and 48,000 samples per second for the DE and FE bearing faults, respectively [46].
In this test-rig, the motor speed was varied between 4 conditions. To verify the efficiency of the proposed approach, the typical data were selected (4.8 kHz sampling rate, 0.021 inch and outer race fault at 6 o'clock) and are described in Table 1. The speed was varied to change the conditions of the bearings so as to realize data collection under variable conditions.

Image Transformation of Vibration Signals for Visual Cognition
The vibration signals of the rolling bearings are one of the easier acquired signals contained in the masses of information of rolling bearings. Choosing an appropriate signal processing method can allow for obtaining the required features and contribute to the fault diagnosis. To explore the features of different failure modes under variable conditions using these vibration signals, the vibration data were transformed into an image for visual cognition. As previously mentioned, RPs can uncover hidden periodicities in a signal in the recurrence domain, which are not easily noticeable, and it is important that the method analyzes the periodic, chaotic and non-stationary elements of the time series; thus, RPs are very suitable for the image transformation of vibration signals without loss of signal information.
Because of the need for calculating the ergodic Euclidean distance of x i and x j in the reconstructed phase space and the restriction of the computer calculation speed, only 1000 points of vibration signals are chosen each time to transform into an RP, and the transformed RP is a black-white image with dimensionalities of N × N (the sizes of different fault modes under different conditions show very slight differences).

Feature Extraction Based on SURF and Isomap
This section describes the feature extraction method of vibration data under variable conditions based on SURF. By utilizing SURF, the transformed images from the vibration signals are extracted interest points, which are described with directions and vectors. The direction is determined by the longest vector, which is the sum of all points of the Haar wavelet responses in the x and y directions within the neighborhood around the interest point within a sliding orientation window of 60 • . The descriptor is described in the 4 × 4 sub-regions with a vector v = ∑ d x , ∑ |d x |, ∑ d y , ∑ d y , where ∑ d x and ∑ d y are the sums of the Haar wavelet response in the x and y directions in each sub-region, and ∑|dx| and ∑ d y are the sums of the absolute values of the Haar wavelet responses in the x and y directions in each sub-region, respectively. Therefore, the descriptor is a 64-dimensional vector.
Due to the visual information appearance on the intrinsic manifold, which is embedded in the high-dimensional space R m constructed by the high-dimensional descriptor, the aforementioned Isomap method is employed to reduce the number of dimensionalities. Given an arbitrary point in R m , the corresponding data point representing the fault mode on the intrinsic manifold can be obtained through mapping g = f −1 . However, the features of the low-dimensional intrinsic manifold are also too large and complex to be taken as feature vectors. To solve this problem and improve the robustness of the feature vectors, singular value decomposition (SVD) was utilized in this paper to compress the scale of the fault feature vectors and obtain more stable feature vectors. Therefore, mapping g = f −1 from the high-dimensional space to the intrinsic manifold could give an SVD scatter diagram that can be shown in three-dimensional space by selecting the first three dimensionalities.

Fault Classification Based on SVM
In this paper, the classic classifier SVM is selected to classify the extracted features from vibration signals, which are processed by SURF. The libsvm toolbox in Matlab, which was developed by Zhiren Lin, was used to find the optimal classification plane H of different fault modes. In terms of multi classification, the libsvm toolbox searches for the hyperplane between two arbitrary types of fault features through the one-versus-one method, and then the classifier is established. Next, the features of four types of fault modes under the same condition are employed as training data to classify the features of the other three conditions, and cross validation is also necessary.

Results and Discussion
To  Table 2. To analyze the influence of conditions on RPs, for each fault mode of four different conditions, a set of test data are randomly selected to generate RPs for comparison and analysis.
The transformed RPs are shown in Figure 15. Different rows represent different conditions; the first column shows the RPs in the normal state, the second column shows the RPs in the inner fault state, the third column shows the RPs in the rolling element fault state, and the fourth column shows the RPs in the outer fault state.   In Figure 14, we can see that the RPs of different fault mode under different conditions have different structural characteristics, while the same fault mode under different conditions have a strong similarity structure. Affected by the change of conditions, RPs under different conditions show the translation variation (shown as the first column of the third row and the fourth row), scale variation (shown as the second column of the first row and the second row) and the combination of these changes.
After the image transformation, SURF is employed to extract features from the transformed images. The extracted interest points under variable working conditions of each failure mode are shown in Figure 16. Each RP generates a 64-dimensional SURF feature vector after the feature extraction. Since the dimensionality of the feature vector is very high, it may cause huge computational consumption and low diagnosis accuracy. According to MPC, the low-dimensional manifold space can effectively highlight the features which are not easy to explore in high dimensional space, therefore Isomap is employed to reduce the feature dimensionality. To explore the relationship between the classification accuracy and the reduced dimensionality, the 64-dimensional feature vector is reduced to two-dimensional SVM is employed as the fault classifier to realize the fault classification of the rolling bearing under different working conditions. Cross validation is utilized to verify the accuracy of the proposed method; that is, the vibration data collected under each condition are orderly selected as training data, and the data collected under the other three conditions are as test data, as shown in Table 3. In each cross validation, the training data and test data are composed as follows: Each RP generates a 64-dimensional SURF feature vector after the feature extraction. Since the dimensionality of the feature vector is very high, it may cause huge computational consumption and low diagnosis accuracy. According to MPC, the low-dimensional manifold space can effectively highlight the features which are not easy to explore in high dimensional space, therefore Isomap is employed to reduce the feature dimensionality. To explore the relationship between the classification accuracy and the reduced dimensionality, the 64-dimensional feature vector is reduced to two-dimensional SVM is employed as the fault classifier to realize the fault classification of the rolling bearing under different working conditions. Cross validation is utilized to verify the accuracy of the proposed method; that is, the vibration data collected under each condition are orderly selected as training data, and the data collected under the other three conditions are as test data, as shown in Table 3. In each cross validation, the training data and test data are composed as follows: Training data: 20 groups of data of each fault mode under one condition, totally 80 groups of data. Test data: 20 groups of data of each fault mode under the other 3 conditions, totally 240 groups of data, where 1-80 groups of data are the first condition, 81-160 groups of data are in the second condition, and 161-240 groups of data are in the third condition. Table 4 shows the classification accuracy of the cross validation in different dimensionality. From Table 4, we can see that the lowest classification accuracy value is 92.083%. Moreover, the classification accuracy increases with the reduced dimensionality increasing. The classification accuracy reaches the highest when the reduced dimensionality reaches 8, in which the classification accuracy values of four groups of cross validation are above 99%, indicating that the correct fault detection probability is very high. To visually display the features of different fault modes under different conditions, the authors draw the feature scatter diagram in the two-dimensional and three-dimensional space, as shown in Figures 17 and 18, respectively. Training data: 20 groups of data of each fault mode under one condition, totally 80 groups of data.
Test data: 20 groups of data of each fault mode under the other 3 conditions, totally 240 groups of data, where 1-80 groups of data are the first condition, 81-160 groups of data are in the second condition, and 161-240 groups of data are in the third condition. Table 4 shows the classification accuracy of the cross validation in different dimensionality. From Table 4, we can see that the lowest classification accuracy value is 92.083%. Moreover, the classification accuracy increases with the reduced dimensionality increasing. The classification accuracy reaches the highest when the reduced dimensionality reaches 8, in which the classification accuracy values of four groups of cross validation are above 99%, indicating that the correct fault detection probability is very high. To visually display the features of different fault modes under different conditions, the authors draw the feature scatter diagram in the two-dimensional and three-dimensional space, as shown in Figures 17 and 18, respectively.

Conclusions
This study proposes a fault diagnosis method for rolling bearings under variable conditions based on visual cognition. First, this method transforms the vibration signals into images, which are expressed in RPs. The core step is to employ the VIC of the HVS to extract the stable features of different RPs transformed from vibration signals under different working conditions. After that, we utilize the Isomap method to establish the intrinsic manifolds embedded in the high-dimensional space, which reveal the clusters of features. Lastly, the classical classification method SVM is employed to execute fault diagnosis.
The proposed fault diagnosis method is a multidisciplinary method which successful introduces visual cognition into the field of fault diagnosis. From the experiment results based on data from the bearing data center of Case Western Reserve University, the following conclusions can be reached: This study provides a promising approach to machine fault diagnosis from the cognitive computing field, especially for those rotating machineries which usually work in a variable working condition. Therefore, it is of great significance to engineering practice.
In the future, more characteristics of the HVS and the possibility of being introduced into the field of fault diagnosis could be investigated. Moreover, the authors plan to apply this method to more types of rotating machineries to further verify its general applicability. Author Contributions: Yujie Cheng and Bo Zhou developed the methodology and performed the experiments. Chen Lu provided professional guidance to complete the research work. Yujie Cheng wrote the paper. Chao Yang contributed as the manuscript advisor.

Conflicts of Interest:
The authors declare no conflict of interest.

Conclusions
This study proposes a fault diagnosis method for rolling bearings under variable conditions based on visual cognition. First, this method transforms the vibration signals into images, which are expressed in RPs. The core step is to employ the VIC of the HVS to extract the stable features of different RPs transformed from vibration signals under different working conditions. After that, we utilize the Isomap method to establish the intrinsic manifolds embedded in the high-dimensional space, which reveal the clusters of features. Lastly, the classical classification method SVM is employed to execute fault diagnosis.
The proposed fault diagnosis method is a multidisciplinary method which successful introduces visual cognition into the field of fault diagnosis. From the experiment results based on data from the bearing data center of Case Western Reserve University, the following conclusions can be reached: This study provides a promising approach to machine fault diagnosis from the cognitive computing field, especially for those rotating machineries which usually work in a variable working condition. Therefore, it is of great significance to engineering practice.
In the future, more characteristics of the HVS and the possibility of being introduced into the field of fault diagnosis could be investigated. Moreover, the authors plan to apply this method to more types of rotating machineries to further verify its general applicability.