Supervised Manifold Learning Based on Multi-Feature Information Discriminative Fusion within an Adaptive Nearest Neighbor Strategy Applied to Rolling Bearing Fault Diagnosis

Rolling bearings are a key component for ensuring the safe and smooth operation of rotating machinery and are very prone to failure. Therefore, intelligent fault diagnosis research on rolling bearings has become a crucial task in the field of mechanical fault diagnosis. This paper proposes research on the fault diagnosis of rolling bearings based on an adaptive nearest neighbor strategy and the discriminative fusion of multi-feature information using supervised manifold learning (AN-MFIDFS-Isomap). Firstly, an adaptive nearest neighbor strategy is proposed using the Euclidean distance and cosine similarity to optimize the selection of neighboring points. Secondly, three feature space transformation and feature information extraction methods are proposed, among which an innovative exponential linear kernel function is introduced to provide new feature information descriptions for the data, enhancing feature sensitivity. Finally, under the adaptive nearest neighbor strategy, a novel AN-MFIDFS-Isomap algorithm is proposed for rolling bearing fault diagnosis by fusing various feature information and classifiers through discriminative fusion with label information. The proposed AN-MFIDFS-Isomap algorithm is validated on the CWRU open dataset and our experimental dataset. The experiments show that the proposed method outperforms other traditional manifold learning methods in terms of data clustering and fault diagnosis.


Introduction
With the rapid development of industrial intelligence, intelligent fault diagnosis technology is playing an increasingly important role in maintaining the health of mechanical equipment and ensuring its safe and stable operation [1]. Rolling bearings, as critical components of rotating machinery, often operate in harsh and variable environments, such as at high speed and under heavy loads, making them prone to wear and failure and leading to severe mechanical accidents [2]. To avoid significant economic losses and casualties, research on intelligent fault diagnosis methods for rolling bearings is particularly important [3]. Fault diagnosis is fundamentally a pattern recognition problem, and analyzing and processing vibration signals during the operation of rolling bearings is an effective approach for diagnosing faults in rotating machinery. One of the most crucial aims is to break the "curse of dimensionality" and extract low-dimensional features with high sensitivity [4][5][6].
To enhance the quality of feature extraction and address the severe "curse of dimensionality" issue at the current stage, manifold learning algorithms have emerged.
Sensors 2023, 23, 9820
Classical manifold learning algorithms mainly include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), and Multidimensional Scaling (MDS). However, these algorithms are primarily used for linear dimensionality reduction and may not be suitable for high-dimensional nonlinear vibration data from rolling bearings [7,8].
In 2000, Joshua B. Tenenbaum and Sam T. Roweis [9,10] proposed two classic nonlinear manifold learning dimensionality reduction algorithms, Isometric Feature Mapping (Isomap) and Locally Linear Embedding (LLE), in Science. Since then, manifold learning algorithms have been extensively studied and have gradually become a research hotspot in the fields of dimensionality reduction and pattern recognition [9][10][11]. Based on their different mathematical assumptions, manifold learning algorithms are divided into two major categories: locally preserving embedding methods and globally preserving embedding methods. Laplacian Eigenmaps (LE) [12], LLE [10], Hessian-based Locally Linear Embedding (HLLE) [13], and Local Tangent Space Alignment (LTSA) [14] are considered locally preserving embedding methods, while Isomap [8], Diffusion Maps (DM) [2], and t-Stochastic Neighbor Embedding (t-SNE) [15] are regarded as globally preserving embedding methods. However, regardless of a manifold learning algorithm's mathematical assumptions, the selection of neighboring points remains a bottleneck [6,11].
To address the sensitivity of neighbor point selection in manifold learning algorithms, Zhenyue Zhang et al. conducted research from two perspectives: adaptive neighbor selection and the interaction between manifold curvature and sampling density [16]. They explored methods for constructing nonlinear low-dimensional manifolds from high-dimensional space samples, providing directions for subsequent researchers. Chuang Sun et al. conducted research from the perspective of adaptive neighbors and used the kernel sparse representation method to select sample neighbors and reconstruct the weights of the neighbor graph for the LLE algorithm [17]. Yan Zhang et al. integrated nonnegative matrix factorization with sparsity constraints based on the work in reference [17] and applied it to the LLE algorithm to jointly minimize the neighborhood reconstruction error on the weight matrix [18]. All of these methods use sparsity constraints to select neighbor points, but they perform only moderately well when the data contain noise points and outliers.
To address this issue, Yunlong Gao et al. proposed a discriminant analysis based on the reliability of local neighborhoods, enhancing the performance of effective samples in low-dimensional space and filtering out the interference of outliers, thereby improving dimensionality reduction ability [19]. Jing An et al. introduced an adaptive neighborhood-preserving discriminant projection model [20]. By updating sparse reconstruction coefficients, the adverse effects of noise and outliers on dimensionality reduction were mitigated, enhancing sample clustering. Jiaqi Xue et al. proposed a locally linear embedding method applying an adaptive neighbor strategy, preserving more original information when embedding high-dimensional data manifolds into low-dimensional space and achieving better clustering results [11]. It can be observed that various discrimination methods have been widely applied in manifold learning models. Most of these adaptive neighbor strategies and discrimination methods are applied in locally preserving embedding methods, while their application in globally preserving embedding manifold learning algorithms is limited, especially in unsupervised learning models.
To address the aforementioned issue, incorporating label information into the algorithm's supervised learning mode can further enhance its clustering capability. Ratthachat Chatpatanasiri et al. proposed a general framework for semi-supervised manifold learning dimensionality reduction, providing research directions for subsequent researchers [21]. Jing Wang et al. proposed a semi-supervised manifold alignment algorithm that utilizes sample points and their corresponding relationships to construct connections between different manifolds [22]. Zohre Karimi et al. introduced a novel hierarchical spatial semi-supervised metric learning approach, integrating local constraints and information-theoretic nonlocal constraints to better represent the smoothness assumption of multiple manifolds using the metric matrix [23]. Mingxia Chen et al. proposed a robust semi-supervised manifold learning framework applied in locally preserving embedding manifold learning algorithms to eliminate adverse effects caused by noise points [24]. Ding Li et al. derived an extension of a semi-supervised manifold regularization algorithm for classification tasks, optimizing the algorithm's performance on multi-class problems using weighted strategies [25]. Jun Ma et al. proposed a secure semi-supervised learning framework, using both manifold and discriminant regularization to mitigate the influence of unlabeled points and boundary points in the pattern recognition process [26]. However, the impact of unlabeled points and boundary points in the semi-supervised learning mode on the model's clustering and classification capabilities remains unresolved.
Therefore, researchers have applied supervised learning modes to manifold learning models, which, compared to manifold learning models under the semi-supervised learning mode, demonstrate stronger robustness in handling classification problems [27][28][29]. However, current research methods are limited to dimensionality reduction and fault diagnosis tasks on a single feature space within the manifold learning model, so the feature information in the data is singular and incomplete.
In summary, manifold learning methods have been widely applied in the fields of dimensionality reduction and fault diagnosis. However, they still have limitations. The issues of neighbor point selection in manifold global preservation embedding, the influence of data outliers on clustering effectiveness, and the singular and incomplete feature information contained in the data have not been fully addressed. To address these problems and build upon existing research, this paper proposes a supervised manifold learning approach for rolling bearing fault diagnosis based on the discriminative fusion of multiple pieces of feature information using an adaptive nearest neighbor strategy.
The main contributions of this paper are summarized as follows:
1. Propose an adaptive neighbor selection strategy that amalgamates the Euclidean distance and cosine similarity measures. This strategy systematically computes both the distance and angular information among neighboring points, utilizing the metric mean as the discriminant criterion. By configuring the preset neighboring points as the criterion object, it dynamically adjusts the proximity graph to refine the local structure of the manifold. This process aims to enhance the precision of the manifold space depiction and local feature representation and to reduce the adverse effects of data outliers on clustering performance.
2. Propose three methods for transforming feature spaces and extracting spatial feature information and spatial information. Notably, this paper proposes a unique form of kernel function, the exponential linear kernel function, which serves to project data into a novel kernel Hilbert space. Concurrently, this function is employed as the nonlinear discriminant mapping function in the Supervised Version of the Isometric Feature Mapping (S-Isomap) algorithm, thus providing a distinct representation of data in the manifold space. The extracted feature information, originating from diverse kernel Hilbert spaces and manifold spaces, ensures that the features are both intricate and sensitive.
3. Propose a fault diagnosis algorithm model for rolling bearings employing a supervised learning paradigm. Under the adaptive neighbor selection strategy, sensitive and complex features from different spaces are merged to form a multi-space metric matrix. This matrix is designed to encapsulate substantial multi-space feature information, enabling its fusion with machine learning classifier algorithms to facilitate fault diagnosis.
The structure of this paper is organized as follows: Section 2 introduces the foundational manifold learning algorithms Isomap and S-Isomap along with their relevant theories. Section 3 presents the proposed supervised manifold learning method, involving an adaptive neighbor strategy, the extraction of multi-space feature information, and the discriminative fusion of multiple pieces of feature information. Section 4 conducts an evaluation of the model's clustering and classification capabilities, analyzing and comparing the proposed approach with traditional manifold learning methods from both qualitative and quantitative perspectives. Finally, Section 5 provides a comprehensive summary of the entire paper.

Related Work
The core idea of manifold learning is based on the manifold assumption, which posits that data are distributed on a smooth low-dimensional manifold embedded in a high-dimensional space. Traditional manifold learning algorithms such as Isomap, LLE, and LTSA aim to find the embedded low-dimensional manifold within a high-dimensional space [30]. This can be described mathematically as finding a mapping, f : X → Y, x_i ↦ y_i, where x_i ∈ R^D denotes a sample in the high-dimensional space, X; y_i ∈ R^d is the mapping of sample x_i in the low-dimensional space, Y; N is the number of data points; D is the number of high-dimensional features; and d is the number of low-dimensional features (d ≪ D).

Isometric Feature Mapping (Isomap)
Isomap, as one of the most traditional manifold learning algorithms, operates on the principle of preserving the global geometric properties of the intrinsic low-dimensional manifold to obtain a low-dimensional representation of the data. Isomap modifies the measurement method used in MDS, where the Euclidean distance describes the relationship between two data points, to a method based on the geodesic distance on the manifold [9]. The Isomap algorithm is as follows:
(1) Calculate the Euclidean distance, d_E(x_i, x_j), between any two data points. Then, use k-nearest neighbors (k-NNs) from the number information of samples or ε-nearest neighbors (ε-NNs) from the distance information of samples to construct a simple undirected nearest neighbor graph, G. If x_i and x_j are neighbors, connect x_i and x_j in G and assign a weight, d_E(x_i, x_j), to the edge.
(2) Based on the edges of the simple undirected nearest neighbor graph, G, use Dijkstra's algorithm or Floyd's algorithm to calculate the geodesic distances, d_G(x_i, x_j).
(3) Establish the low-dimensional embedded manifold coordinates, Y, based on an objective function. The typical objective function for Isomap can be expressed as follows:

min_Y Σ_{i<j} (d_G(x_i, x_j) − d_E(y_i, y_j))²

where d_E(y_i, y_j) represents the Euclidean distance in the low-dimensional space corresponding to the Euclidean distance, d_E(x_i, x_j), in the high-dimensional space.
(4) Compute the low-dimensional embedded manifold coordinates, Y, using MDS. To be specific, let τ(D_G²) = −(1/2) H D_G² H, where H = I − ee^T/N is the centering matrix and e is the N-dimensional column vector of all ones. Perform an eigenvalue decomposition on τ(D_G²) to obtain the low-dimensional embedded manifold coordinates Y = [√λ_1 u_1, √λ_2 u_2, ⋯, √λ_d u_d], where λ_p and u_p denote the p-th dominant eigenvalue and its eigenvector, respectively [30].
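As a concrete illustration, steps (1)-(4) can be sketched in a few lines of Python using NumPy, SciPy, and scikit-learn (a minimal sketch; variable names are ours, not the paper's):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap: k-NN graph -> geodesic distances -> classical MDS."""
    # Step (1): k-NN graph weighted by Euclidean distance, made undirected.
    G = kneighbors_graph(X, n_neighbors, mode="distance")
    G = G.maximum(G.T)
    # Step (2): geodesic distances via Dijkstra's shortest paths.
    D_G = shortest_path(G, method="D", directed=False)
    # Steps (3)-(4): classical MDS via double centering and eigendecomposition.
    N = X.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N           # centering matrix
    tau = -0.5 * H @ (D_G ** 2) @ H               # tau(D_G^2)
    w, v = np.linalg.eigh(tau)
    idx = np.argsort(w)[::-1][:n_components]      # top-d eigenpairs
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

Applied to a curve sampled in 3-D, this recovers a 2-D embedding whose pairwise distances approximate the geodesic distances along the curve.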

Supervised Version of Isometric Feature Mapping (S-Isomap)
The S-Isomap algorithm incorporates label information from the data as prior knowledge to guide the dimensionality reduction process. Utilizing data labels as discriminant information, the data are initially divided into a true nearest neighbor set, S⁺, and a pseudo nearest neighbor set, S⁻, which guide the discriminant manifold learning process. S⁺ and S⁻ are defined as follows:

S⁺ = {x_j | x_j ∈ N(x_i), L(x_j) = L(x_i)}
S⁻ = {x_j | x_j ∈ N(x_i), L(x_j) ≠ L(x_i)}

where L(x_i) denotes the class label of x_i and N(x_i) is the neighborhood set of x_i. Next, based on S⁺ and S⁻, we construct the true nearest neighbor graph, G⁺, and the pseudo nearest neighbor graph, G⁻. The objective of the S-Isomap algorithm is then to preserve the intrinsic geometric structure of the data within the same class and to separate different classes by optimizing the following objective function:

min_Y Σ_{i<j} (d_G(x_i, x_j) − d_E(y_i, y_j))²

where the geodesic distances, d_G, are now computed from a rescaled dissimilarity rather than the raw Euclidean distance. To find the optimal solution for the objective function mentioned earlier, the S-Isomap algorithm rescales the metric (Euclidean distance) between two data points as follows:

D(x_i, x_j) = sqrt(1 − e^{−d_E²(x_i, x_j)/β}),   L(x_i) = L(x_j)
D(x_i, x_j) = sqrt(e^{d_E²(x_i, x_j)/β}) − α,   L(x_i) ≠ L(x_j)

where β is a parameter related to the scale of the dataset and α is a parameter that can sensitively adjust the inter-class dissimilarity. The two values α and β are usually determined empirically. Finally, we compute the low-dimensional embedded manifold coordinates, Y, using MDS [30].
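The rescaled dissimilarity above can be sketched as follows (assuming the standard S-Isomap form from the literature; the default choice of β, tied to the mean squared distance, is our assumption):

```python
import numpy as np

def s_isomap_dissimilarity(X, labels, alpha=0.5, beta=None):
    """Rescaled dissimilarity used by S-Isomap: same-class pairs are
    shrunk towards 0, different-class pairs are inflated."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared Euclidean
    if beta is None:
        beta = D2.mean()                # dataset-scale parameter (assumption)
    same = labels[:, None] == labels[None, :]
    return np.where(same,
                    np.sqrt(1.0 - np.exp(-D2 / beta)),          # intra-class branch
                    np.sqrt(np.exp(D2 / beta)) - alpha)         # inter-class branch
```

Note that the intra-class branch is bounded by 1, while the inter-class branch grows without bound, which is what pushes different classes apart in the embedding.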

Adaptive Nearest Neighbor Strategy
The nearest neighbor count, k, serves as a hyperparameter in manifold learning algorithms, determining the size of local regions within the manifold. A larger k value may smooth over or eliminate small-scale structures within the manifold. Conversely, a smaller k value might incorrectly partition a continuous manifold into disjoint sub-manifolds, thereby affecting the accuracy of the algorithm in approximating the global geometric structure and computing the metric matrix. An appropriate choice of k is therefore crucial for balancing the capture of local details against the accurate preservation of the overall manifold structure.
In this study, we introduce an adaptive nearest neighbor strategy by integrating both distance and angular information among data points. This strategy incorporates the concepts of initial nearest neighbors, pseudo nearest neighbors, and true nearest neighbors. By combining the data distance and angular information, we calculate the edge weights between data points, which are used to construct a simple undirected nearest neighbor graph for true nearest neighbors. The adaptive nearest neighbor strategy can effectively suppress the adverse effect of data outliers on clustering performance. The algorithm's workflow is illustrated in Figure 1, and the detailed steps of the algorithm are outlined as follows:
(1) Define the initial nearest neighbor count, k, and construct a simple undirected nearest neighbor graph, G;
(2) Construct the cosine similarity matrix for the data. Compute the cosine similarity between two data points using the following equation to obtain angular information:

S_cos(x_i, x_j) = ⟨x_i, x_j⟩ / (∥x_i∥₂ ∥x_j∥₂)

where ⟨·,·⟩ denotes the vector inner product, ∥·∥₂ denotes the L2-norm of the vector, and [·]_{N×k} denotes the shape of the matrix, which is N × k;
(3) Compute the normalized cosine similarity matrix by normalizing the angular information using the following equation:

S̃_cos(x_i, x_j) = 1.5 − 0.5 S_cos(x_i, x_j),   1 ≤ S̃_cos(x_i, x_j) ≤ 2

(4) Construct the Euclidean distance matrix for the data. Compute the Euclidean distance between two data points using the following equation to obtain distance information:

d_E(x_i, x_j) = ∥x_i − x_j∥₂

(5) Compute the normalized Euclidean distance matrix by normalizing the distance information row by row:

d̃_E(x_i, x_j) = d_E(x_i, x_j) / max d_E(i·)

where d_E(i·) represents all elements of row i of d_E;
(6) Construct the weight matrix by integrating the angular and distance information. Reassign the edge weights in the simple undirected nearest neighbor graph, G, using the following equation to fuse the angular and distance information:

w_ij = S̃_cos(x_i, x_j) · d̃_E(x_i, x_j)

(7) Compute the weight discrimination criterion. Calculate the average weight for each data point using the following equation to serve as the discrimination criterion:

w̄_i = (1/k) Σ_{j=1}^{k} w_ij

If w_ij > w̄_i, then x_j is considered a pseudo nearest neighbor of x_i, and the corresponding edge is removed. This process leads to dynamically obtaining the true nearest neighbor values, k⁺, a simple undirected nearest neighbor graph, G⁺, for true nearest neighbors, and a new set of rules for calculating the metric matrix.
The adaptive neighbor strategy proposed in this section offers a novel approach for dynamically selecting neighboring points in manifold learning. Essentially, it seamlessly combines k-NNs and ε-NNs by integrating the angular and distance information to derive weights. These weights are compared with the average weight, ultimately resulting in the construction of a simple undirected nearest neighbor graph, G⁺, for true nearest neighbors.
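A compact sketch of the strategy in Python follows. The specific normalizations (mapping the cosine similarity into [1, 2] and dividing each distance row by its maximum) and the product fusion of the two terms are our assumptions where the original equations are not fully reproduced above:

```python
import numpy as np

def adaptive_neighbors(X, k=10):
    """Sketch of the adaptive strategy: fuse angular and distance
    information into edge weights, then prune neighbors whose weight
    exceeds the row-average weight (treated as pseudo neighbors)."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)     # Euclidean distances
    idx = np.argsort(D, axis=1)[:, 1:k + 1]                  # initial k neighbors
    norms = np.linalg.norm(X, axis=1)
    true_nn = []
    for i in range(N):
        j = idx[i]
        cos = X[i] @ X[j].T / (norms[i] * norms[j] + 1e-12)  # angular information
        s = 1.5 - 0.5 * cos                                  # map cos in [-1,1] to [1,2]
        d = D[i, j] / (D[i, j].max() + 1e-12)                # row-normalized distance
        w = s * d                                            # fused edge weight
        keep = w <= w.mean()                                 # discard pseudo neighbors
        true_nn.append(j[keep])
    return true_nn
```

Each point thus ends up with its own true-neighbor count k⁺ ≤ k, which is the dynamic behavior the strategy is designed to provide.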

Kernel Trick
The data matrix in the original space, denoted as X = [x_1, x_2, ⋯, x_N]^T, consists of N observation vectors of the data [31]. The kernel trick is a mathematical technique that utilizes a kernel function to map the data from the original space to a higher-dimensional Hilbert space, as follows:

φ : x_i → φ(x_i)

where φ(·) denotes the mapping of the kernel function. The data matrix in the higher-dimensional Hilbert space is computed through the inner product of the observation vectors using the kernel function. This can be seen as the metric matrix in the higher-dimensional Hilbert space, as follows:

K = [⟨φ(x_i), φ(x_j)⟩]_{N×N} = [k(x_i, x_j)]_{N×N}
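For illustration, the metric (Gram) matrix can be formed directly from any Mercer kernel without ever constructing φ explicitly; the RBF kernel below is only an example choice, not the paper's kernel:

```python
import numpy as np

def kernel_matrix(X, kernel):
    """Gram matrix K[i, j] = <phi(x_i), phi(x_j)> = kernel(x_i, x_j),
    computed via the kernel trick (phi is never formed)."""
    N = X.shape[0]
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

# An RBF kernel as an illustrative Mercer kernel.
rbf = lambda x, y, gamma=1.0: np.exp(-gamma * np.sum((x - y) ** 2))
```

For a valid Mercer kernel, the resulting matrix is symmetric and positive semi-definite, which is exactly the property the later spatial transformations rely on.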

The First Spatial Transformation and Feature Extraction
In this section, we use K-Isomap as the first spatial transformation method. This method treats the double centering process in Isomap as a kernel trick mapping process [32]. Therefore, the objective function takes the same form as that of Isomap:

min_Y Σ_{i<j} (d_G(x_i, x_j) − d_E(y_i, y_j))²

To obtain the optimal solution for the objective function mentioned above, the problem is transformed into constructing a space metric matrix that better adheres to the manifold assumption. K-Isomap treats the geodesic distance matrix of the original data as the metric matrix during the double centering process. To ensure the positive semi-definiteness of the double-centered matrix, the Mercer kernel matrix method [33] is utilized to construct the first spatial metric matrix. The spectral radius of this matrix is computed to represent the first piece of spatial information.
Compared to the classical Isomap algorithm, the K-Isomap algorithm exhibits stronger robustness. The specific process of the algorithm is as follows:
(1) Using the traditional Isomap algorithm, construct a simple undirected nearest neighbor graph, G. Then, calculate the geodesic distance, d_G(x_i, x_j), between data points x_i and x_j using the shortest path algorithm, and define τ(D_G²) = −(1/2) H D_G² H and τ(D_G) = −(1/2) H D_G H;
(2) Construct the following block matrix:

B = [ 0, 2τ(D_G²); −I, −4τ(D_G) ]

where 0 denotes the all-zero matrix and I denotes the identity matrix. Next, calculate its spectral radius, ρ*₁ = max_i |λ_i|, where λ_i are all the eigenvalues of the above block matrix;
(3) Construct the Mercer kernel matrix as the first spatial metric matrix using the following equation:

K₁ = τ(D_G²) + 2ρ^m₁ τ(D_G) + (1/2)(ρ^m₁)² H

where the parameter ρ^m₁ ≥ ρ*₁;
(4) Perform an eigenvalue decomposition of K₁ and select the eigenvectors Γ₁ corresponding to the top λ₁ eigenvalues as the first piece of spatial feature information. Calculate the spectral radius, ρ₁, of the first spatial metric matrix, K₁, to represent the first piece of spatial information.
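The constant-shifting (Mercer kernel matrix) construction in steps (2)-(3) can be sketched as follows, assuming the standard additive-constant form used by kernel Isomap; this sketch uses the smallest admissible shift ρ* directly rather than a tunable ρ^m:

```python
import numpy as np

def mercer_shift(D_G):
    """Constant-shifting construction: find the smallest c making
    K(c) = tau(D^2) + 2c*tau(D) + 0.5*c^2*H positive semi-definite,
    where tau(A) = -0.5 * H @ A @ H."""
    N = D_G.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    K2 = -0.5 * H @ (D_G ** 2) @ H
    K1 = -0.5 * H @ D_G @ H
    # The critical shift is the largest eigenvalue of this 2N x 2N block matrix.
    B = np.block([[np.zeros((N, N)), 2.0 * K2],
                  [-np.eye(N), -4.0 * K1]])
    c = max(np.max(np.linalg.eigvals(B).real), 0.0)
    return K2 + 2.0 * c * K1 + 0.5 * c * c * H
```

Applied to a geodesic distance matrix that is not Euclidean-embeddable, the returned matrix is a valid (PSD) kernel, which is what guarantees real embedding coordinates.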

The Second Spatial Transformation and Feature Extraction
In current manifold learning algorithms, widely used kernel functions include the Gaussian kernel function and the linear kernel function. The Gaussian kernel function is suitable for addressing complex nonlinear problems; however, it is highly sensitive to its hyperparameter and to noise. The linear kernel function, on the other hand, offers stability and good generalization but cannot effectively solve complex nonlinear problems [34]. In this section, we propose an exponential linear kernel function that combines the advantages of the above two kernel functions. The expression for the exponential linear kernel function is as follows:

k_Elinear(x_i, x_j) = exp(ϖ⟨x_i, x_j⟩ + b)

where the parameter ϖ > 0 denotes the linear weight and the parameter b denotes the offset coefficient. A detailed proof of the positive semi-definite property of the kernel matrix constructed from the exponential linear kernel function is provided in Section 6, Appendix A.
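Under the assumed form k(x_i, x_j) = exp(ϖ⟨x_i, x_j⟩ + b), the kernel matrix can be built in one line and its positive semi-definiteness checked numerically (the PSD property holds because an elementwise exponential of a PSD Gram matrix, scaled by the positive constant e^b, remains PSD):

```python
import numpy as np

def exp_linear_kernel(X, varpi=1.0, b=0.0):
    """Exponential linear kernel matrix under the assumed form
    k(x_i, x_j) = exp(varpi * <x_i, x_j> + b), with varpi > 0."""
    return np.exp(varpi * (X @ X.T) + b)
```

The linear term ⟨x_i, x_j⟩ keeps the kernel stable and interpretable, while the exponential wraps it in the nonlinearity needed for complex fault patterns.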
In this section, we employ the exponential linear kernel function to construct the second space. We utilize the Mercer kernel matrix method to construct the metric matrix for the second space. The spectral radius of this matrix is then calculated to represent the second piece of spatial information. The specific steps of the algorithm are as follows:
(1) Apply the traditional Isomap algorithm to construct a simple undirected nearest neighbor graph, G, and calculate the Euclidean distance, d_E(x_i, x_j);
(2) Construct the exponential linear kernel matrix, K_Elinear, based on Equation (17). Then, perform an eigenvalue decomposition and select the eigenvectors Γ₂ corresponding to the top λ₂ eigenvalues of K_Elinear as the second piece of spatial feature information. Let the spectral radius, ρ₂, represent the second piece of spatial information.

The Third Spatial Transformation and Feature Extraction
In this section, we propose the KS-Isomap manifold learning algorithm as the third space transformation method. The KS-Isomap algorithm utilizes data label information to guide the construction of the manifold and employs kernel tricks to map the metric matrix to a more abstract feature space. The KS-Isomap algorithm combines the robustness of the K-Isomap algorithm with the discriminative power of the S-Isomap algorithm.
The KS-Isomap algorithm follows the structure of the S-Isomap algorithm. It utilizes label information to guide the construction of the manifold. Simultaneously, it replaces the original metric matrix with the geodesic distance matrix and maps the geodesic distance matrix to the kernel space. Therefore, the objective function of the algorithm takes the same form as that of S-Isomap, with the kernelized geodesic distance matrix serving as the metric. To obtain the optimal solution for this objective function, the problem is transformed into constructing a more sensitive discriminative distance matrix. The specific steps of the KS-Isomap algorithm are as follows:
(1) Using the traditional Isomap algorithm, construct a simple undirected nearest neighbor graph, G. Then, calculate the geodesic distance, d_G(x_i, x_j), between data points x_i and x_j using the shortest path algorithm;
(2) Construct the discriminative distance matrix, D_S, based on Equations (6) and (18), and define its biased version using the bias coefficient ϕ;
(3) Construct the block matrix

B = [ 0, 2τ(D_S²); −I, −4τ(D_S) ]

and calculate its spectral radius, ρ*₃;
(4) Construct the Mercer kernel matrix as the metric matrix for the third space based on the following equation:

K₃ = τ(D_S²) + 2ρ^m₃ τ(D_S) + (1/2)(ρ^m₃)² H

where the parameter ρ^m₃ ≥ ρ*₃;
(5) Perform an eigenvalue decomposition and select the eigenvectors Γ₃ corresponding to the top λ₃ eigenvalues of K₃ as the third piece of spatial feature information. Calculate the spectral radius, ρ₃, of the third spatial metric matrix, K₃, to represent the third piece of spatial information.

AN-MFIDFS-Isomap Fault Diagnosis Algorithm
Currently, research on manifold learning algorithms is primarily confined to a high-dimensional Euclidean space or kernel Hilbert space, and the potential of leveraging feature information from multiple spaces corresponding to the same data has been overlooked. Therefore, this study, by utilizing the adaptive neighbor selection strategy proposed in Section 3.1 to obtain the metric matrix and employing the methods proposed in Section 3.2 for feature space transformation and feature information extraction, presents a fault diagnosis algorithm called AN-MFIDFS-Isomap. The algorithm's flowchart is illustrated in Figure 2, and the detailed steps of the algorithm are as follows:
(1) Divide the data into a training set, X_train, and a testing set, X_test, preprocess the data, and construct the corresponding label information for each set;
(2) For the training set, utilize the adaptive nearest neighbor strategy proposed in Section 3.1. Compute the geodesic distances under the rule of true nearest neighbors for the simple undirected neighbor graph, G⁺, constructing the geodesic distance matrix, i.e., the metric matrix in the original space;
(3) Employ the first spatial transformation method proposed in Section 3.2.2 to obtain the first piece of spatial feature information, Γ₁, and the first piece of spatial information, ρ₁;
(4) Let the first piece of spatial information, ρ₁, be the linear weight, ϖ, from Equation (19), incorporating the first piece of spatial information into the second spatial transformation method to obtain the second piece of spatial feature information, Γ₂, and the second piece of spatial information, ρ₂;
(5) Let the second piece of spatial information, ρ₂, be the linear weight, ϖ, of the exponential linear kernel function from Equation (24), incorporating the first and second pieces of spatial feature information into the third spatial transformation method to obtain the third piece of spatial feature information, Γ₃. Linearly combine the feature information from the three spaces to obtain Γ = [Γ₁, Γ₂, Γ₃]_{N×(λ₁+λ₂+λ₃)}, (Σλ_i = D, i = 1, 2, 3), and define each element in the matrix Γ² as the square of the corresponding element in the matrix Γ;
(6) Construct the block matrix from Γ and Γ² following the Mercer kernel matrix method in Section 3.2.2 and calculate its spectral radius, ρ*. Then, construct the Mercer kernel matrix with parameter ρ^m ≥ ρ*, which serves as the metric matrix for the fused space;
(7) Fuse the resulting metric matrix with a machine learning classifier to perform fault diagnosis on the testing set.
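The feature fusion in step (5) can be sketched as below; using the plain Gram matrix of the concatenated features as the fused metric is a simplification of the block-matrix Mercer construction (a Gram matrix is already positive semi-definite, so no spectral shift is needed in this sketch):

```python
import numpy as np

def fused_metric(gammas):
    """Sketch: stack the three spaces' feature blocks into
    Gamma = [G1, G2, G3] and take its Gram matrix as the
    fused-space metric (simplified from the paper's construction)."""
    Gamma = np.hstack(gammas)          # N x (lambda1 + lambda2 + lambda3)
    return Gamma @ Gamma.T
```

The fused matrix then plays the role of a kernel over the training samples and can be handed to any kernel-aware classifier.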

The three indicators in Fisher's discriminant criterion, the inter-class scatter, S_inter, the intra-class scatter, S_intra, and Fisher's metric, F, help calculate the distances and separation level between data points in the low-dimensional space, describing the clustering effectiveness of the dimensionality reduction algorithm. The definitions of these three indicators are as follows:

S_inter = Σ_{i=1}^{L} N_i (x̄_i − x̄)(x̄_i − x̄)^T
S_intra = Σ_{i=1}^{L} Σ_{j=1}^{N_i} (x_i^j − x̄_i)(x_i^j − x̄_i)^T
F = (w^T S_inter w) / (w^T S_intra w)

where w denotes the all-ones column vector, x̄_i represents the mean vector for the i-th class, x̄ represents the overall mean vector for all the sample points, x_i^j represents the j-th sample point in the i-th class, N_i denotes the number of samples in the i-th class, and L represents the number of classes. The confusion matrix records the complete results of the fault diagnosis, where the rows represent predicted labels and the columns represent true labels.
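The three indicators can be sketched directly from their definitions; taking w as the all-ones vector follows our reading of the definition above:

```python
import numpy as np

def fisher_indicators(Y, labels):
    """Inter-class scatter, intra-class scatter, and Fisher's metric
    F = (w^T S_inter w) / (w^T S_intra w) with w the all-ones vector."""
    d = Y.shape[1]
    mean_all = Y.mean(axis=0)
    S_inter = np.zeros((d, d))
    S_intra = np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        S_inter += len(Yc) * (diff @ diff.T)     # between-class scatter
        S_intra += (Yc - mc).T @ (Yc - mc)       # within-class scatter
    w = np.ones(d)
    F = (w @ S_inter @ w) / (w @ S_intra @ w)
    return S_inter, S_intra, F
```

A large F indicates compact classes that are far apart, i.e., good clustering in the embedded space.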
For the classification algorithm performance, we record the diagnostic process in detail using a confusion matrix and evaluate it using evaluation indexes such as accuracy, precision, recall, and the F1 score, with the following expressions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × Precision × Recall / (Precision + Recall)

where TP denotes a true positive, TN denotes a true negative, FP denotes a false positive, and FN denotes a false negative.
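These four indexes follow directly from the confusion-matrix counts, e.g.:

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For multi-class fault diagnosis, the same expressions are applied per class (treating that class as positive) and then averaged.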

Experiment and Analysis
In this section, the effectiveness of the proposed algorithm is evaluated using two cases in the context of rolling bearings, which are essential components of rotating machinery. The diagnostic results are compared with eight other commonly used manifold learning algorithms. The experiments were conducted using Python 3 in PyCharm Community Edition 2021.3.2, running on hardware equipped with an 11th Gen Intel(R) Core(TM) i7-11850H CPU @ 2.50 GHz.


Data Description
The vibration data used in this case study are sourced from the Case Western Reserve University (CWRU) Bearing Data Center; the test rig is depicted in Figure 3. The experimental setup comprises a motor, accelerometer, torque sensor, bearing, and dynamometer. The data were collected on an SKF 6205 deep groove ball bearing, with bearing health states including Normal (NO), Rolling Element Fault (RF), Inner Race Fault (IF), and Outer Race Fault (OF), totaling four distinct modes [35].
The data in this case study were collected using an accelerometer, with the measurement point located at the motor drive end. The sampling frequency was set at 12 kHz. Detailed information regarding the rolling bearings is presented in Table 1, and the original vibration signals are depicted in Figure 4. The experimental data were resampled to select 160 samples for each operational state of the bearing as the training set. The sampling window had a length of 1024, and the sliding-window stride was set at 512. For each sample, 17 time-domain features and 12 frequency-domain features were extracted. The same methodology was applied to select 40 samples per operational state for the testing set.
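The resampling step above amounts to overlapping sliding-window segmentation followed by feature extraction. A sketch with the paper's window length (1024) and stride (512) is shown below; the handful of time-domain features listed is an illustrative subset of the 17 used, not the paper's exact list:

```python
import numpy as np

def sliding_window_samples(signal, win=1024, stride=512, max_samples=160):
    """Segment a 1-D vibration signal into overlapping windows
    (window length 1024, stride 512, as in the experimental setup)."""
    starts = range(0, len(signal) - win + 1, stride)
    windows = [signal[s:s + win] for s in starts][:max_samples]
    return np.stack(windows)

def time_features(x):
    """A few representative time-domain features (illustrative subset)."""
    rms = np.sqrt(np.mean(x ** 2))
    peak = np.max(np.abs(x))
    return {
        "mean": np.mean(x),
        "rms": rms,
        "peak": peak,
        "kurtosis": np.mean((x - np.mean(x)) ** 4) / np.var(x) ** 2,
        "crest_factor": peak / rms,
    }
```

Each window then yields one feature vector (time-domain plus frequency-domain features) that forms a row of the training or testing matrix.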

Model Parameters Setting and Implementation
To comprehensively evaluate the proposed method, it is compared with eight other algorithms: PCA, MDS, Isomap, LE, LLE, HLLE, LTSA, and T-SNE. Among them, PCA and MDS are two commonly used linear manifold learning algorithms that have shown good performance in fault diagnosis applications. Isomap is a classic algorithm for preserving the global embedding in manifold learning, and LLE is a classic algorithm for preserving the local embedding. LE, HLLE, and LTSA are improvements on LLE and have shown good dimensionality reduction performance.

The detailed parameter descriptions of all the methods are listed in Table 2. To ensure the completeness of the experiments, we used six different classifiers: logistic regression, a decision tree, a random forest, naive Bayes, k-nearest neighbors (KNN), and a support vector machine (SVM). These classifiers were trained using the dimensionality-reduced data obtained from the various methods and were subsequently tested on the test set.

Methods
For each method, we applied the dimensionality reduction technique and then fed the reduced data into each of the six classifiers mentioned above. The classifiers were trained using the training data, and their performance was evaluated using the test data.
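A minimal sketch of this train/evaluate loop, using scikit-learn implementations as stand-ins for the six classifiers (hyperparameters here are library defaults, not the tuned values from Table 2); `Y_train`/`Y_test` denote the low-dimensional embeddings:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def evaluate_classifiers(Y_train, y_train, Y_test, y_test):
    """Train each of the six classifiers on the embedded training data
    and return its accuracy on the embedded test data."""
    classifiers = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "decision tree": DecisionTreeClassifier(random_state=0),
        "random forest": RandomForestClassifier(random_state=0),
        "naive Bayes": GaussianNB(),
        "KNN": KNeighborsClassifier(),
        "SVM": SVC(),
    }
    return {name: clf.fit(Y_train, y_train).score(Y_test, y_test)
            for name, clf in classifiers.items()}
```

The resulting accuracy dictionary corresponds to one column group of Figure 6.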

Diagnosis Results and Discussion
We conducted two comprehensive experiments to compare our proposed method with others.
In the first experiment, we applied all the experimental methods to perform a dimensionality reduction on the data and conducted a quantitative analysis of clustering capability using Fisher's discriminant criterion. In the second experiment, we separately trained different classifiers using the reduced data obtained from each method and conducted a quantitative analysis of the model's diagnostic accuracy.
In the first experiment, to intuitively demonstrate the superiority of our method in terms of its clustering capability, we applied our proposed method and eight other methods to perform a dimensionality reduction on the vibration data from the different states of the bearing. In Figure 5, we present 3D scatter plots of the abstract feature space of the bearing data after dimensionality reduction.
From Figure 5a-h, it can be observed that after dimensionality reduction using the traditional manifold learning algorithms, there are varying degrees of overlap between samples of different categories in the 3D feature space. Conversely, after dimensionality reduction using our proposed method, the data in the 3D feature space exhibit better separation. This demonstrates the strong clustering capability achieved through the fusion of multi-feature information.
To provide a more precise description of our method's clustering capability, we calculated Fisher's discriminant criterion for the dimensionality-reduced data from all the methods. Table 3 presents the detailed Fisher-criterion indicators for all the methods. From Table 3, it can be observed that our method has the largest inter-class scatter and the smallest intra-class scatter. Although our method does not have the maximum class separability, it does have the largest Fisher measure, F, indicating that dimensionality reduction with our method makes it easier to distinguish between data from different categories. This once again demonstrates the strong clustering capability of our method.
In the second experiment, to demonstrate the better diagnostic accuracy of our algorithm, we applied our proposed method and eight other methods to diagnose the vibration data from the different states of the bearing. Figure 6 illustrates the diagnostic accuracy of all the methods when using different classifiers. It can be observed that our proposed method exhibits high accuracy across the various classifiers, reaching a maximum classification accuracy of 100%. We used a confusion matrix to document in detail the diagnostic process with the highest accuracy for each fault diagnosis method; Figure 7 presents the detailed contents of the confusion matrices. Evidently, most misclassified samples have overlapping fault characteristics, which reduces the distinctiveness between the different categories of samples. Table 4 records the evaluation indexes at the best accuracy of each fault diagnosis method. It can be seen from the data in Case 1 that our proposed method achieves good performance on all evaluation indexes.

Data Description
The bearing vibration dataset used in this case study was obtained from a laboratory-built experimental rig. The rig and its working schematic are illustrated in Figures 8 and 9, respectively. The setup comprises a motor, DC driver, healthy bearing, experimental bearing, accelerometer, and loading system. The data were collected on a tapered roller bearing, model 33007. The bearing's health conditions encompass five modes, Normal (NO), Rolling Element Fault (RF), Inner Race Fault (IF), Outer Race Fault (OF), and Cage Fault (CF), as depicted in Figure 10. The data were collected with an accelerometer located at the test bearing position, with a sampling frequency of 12 kHz. The motor speed was 600 rpm, and the axial and radial loads were both 1 kN. This case focuses on diagnosing the vibration data of rolling bearings under bi-directional loads. Detailed information about the rolling bearing is provided in Table 5, and the original vibration signals are depicted in Figure 11. The experimental data were preprocessed using the same methods as in Case 1.

Model Parameters Setting and Implementation
To comprehensively evaluate the performance of the proposed method on the laboratory-built experimental rig, we again used a grid search to select the optimal hyperparameters for each algorithm. The detailed parameter descriptions for all the methods are provided in Table 6. This also substantiates the discussion in Section 3.2.3.
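The grid search can be sketched with scikit-learn's `GridSearchCV`; the parameter grid below is illustrative only, not the exact grid searched in the paper:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_svm(Y_train, y_train):
    """Cross-validated grid search over SVM hyperparameters on the
    low-dimensional embeddings (illustrative grid values)."""
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(Y_train, y_train)
    return search.best_params_, search.best_score_
```

The same pattern applies to the other classifiers and to the manifold-learning hyperparameters (intrinsic dimension d and neighborhood size k) listed in Table 6.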


Diagnosis Results and Discussion
We conduct three experiments to comprehensively compare the performance of our proposed method and the other methods on the laboratory-built experimental rig; the first two mirror those of Case 1.
In the first experiment, we apply all the experimental methods to perform a dimensionality reduction on the data and quantitatively analyze the clustering ability using Fisher's discriminant criterion. In the second experiment, we train different classifiers separately on the dimensionality-reduced data obtained by each method and analyze the model's diagnostic accuracy. Additionally, in the third experiment, we use a Gaussian kernel function, a linear kernel function, and the proposed exponential linear kernel function to construct the second matrix for the reduction, and then quantify the model's diagnostic accuracy using the best-performing classifier.
In the first experiment, Figure 12 visually demonstrates the superiority of our method in terms of its clustering ability. As observed in Figure 12a-h, the traditional manifold learning algorithms still exhibit varying degrees of overlap on the Case 2 data, whereas our proposed method continues to perform well, showcasing its robust clustering ability when applied to different datasets. To provide a more precise description of this clustering ability, we calculate Fisher's discriminant criterion for the dimensionality-reduced data obtained by all the methods. Table 7 presents the detailed Fisher-criterion metrics for each method. A comparison between Tables 3 and 7 reveals an overall improvement in the F values for all the methods. Our proposed method maintains the highest F value, indicating excellent discriminability between the different categories of Case 2 data after reduction with our method, which once again validates its robust clustering ability.
In the second experiment, to validate the superior diagnostic accuracy of our algorithm, we applied our proposed method as well as the eight other methods to diagnose faults in the Case 2 bearing vibration data. Figure 13 illustrates the diagnostic accuracy of all the methods when using different classifiers. It is evident that our proposed method exhibits higher stability in diagnostic accuracy across the various classifiers, with the highest accuracy reaching 100%. For the fault diagnosis, we utilized the classifier that demonstrated the best accuracy for each method and recorded the diagnostic process using confusion matrices; Figure 14 provides their detailed contents. Table 8 records the evaluation indexes at the best accuracy of each fault diagnosis method. It can be seen that for the data in Case 2, our proposed method achieves better performance on all the evaluation indexes than any other method.
In the third experiment, to showcase the superiority of our proposed kernel function over the traditional ones, we employed a linear kernel, a Gaussian kernel, and our proposed exponential linear kernel to extract features in the second space.
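The three candidate kernels of the third experiment can be sketched as Gram-matrix constructions. Note that the exact closed form of the exponential linear kernel is given in the paper's Appendix (Equations (A4)-(A5)); the `exp_linear_kernel` below is an illustrative stand-in combining an exponential factor $e^{-b}$ with an exponentiated linear kernel, not the paper's exact definition:

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel <x_i, x_j>."""
    return X @ X.T

def gaussian_kernel(X, gamma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def exp_linear_kernel(X, b=0.5):
    """Illustrative stand-in: e^{-b} * exp(<x_i, x_j>).  An elementwise
    exponential of a PSD matrix scaled by a positive constant remains PSD
    (Schur product theorem), so this is a valid kernel."""
    return np.exp(-b) * np.exp(X @ X.T)
```

Each Gram matrix serves as the "second space" feature description fed into the dimensionality reduction.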

$\|\cdot\|_2$ denotes the L2-norm of a vector, and $[\,\cdot\,]_{N \times k}$ denotes the shape of the matrix, which is $N \times k$.

(7) Apply classical MDS to compute the low-dimensional embedded manifold coordinates, $Y_{train}$, for the training set using the metric matrix $K_F$;
(8) Train a classifier using the low-dimensional embedded coordinates, $Y$, and their corresponding label information. Simultaneously, use a Multi-Layer Perceptron (MLP) to iteratively obtain the mapping from the original training set, $X$, to the low-dimensional embedded manifold coordinates, $Y$;
(9) Apply the mapping obtained through the MLP iterations to the test set, $X_{test}$, to obtain the low-dimensional embedded manifold coordinates for the test set, denoted as $Y_{test}$;
(10) Perform fault diagnosis on the low-dimensional embedded manifold coordinates of the test set, $Y_{test}$, using the trained classifier.
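Steps (8)-(10) can be sketched with scikit-learn's `MLPRegressor` learning the out-of-sample mapping from raw features to embedded coordinates; the hidden sizes follow the paper's 29-128-256-128-d structure, and Adam with a 0.001 learning rate matches the settings in Table 2 (the exact training loop is assumed, not reproduced from the paper):

```python
from sklearn.neural_network import MLPRegressor

def fit_mapping(X_train, Y_train, seed=0):
    """Learn the mapping X -> Y (raw features -> embedded coordinates)
    so the test set can be projected out-of-sample."""
    mlp = MLPRegressor(hidden_layer_sizes=(128, 256, 128),
                       solver="adam", learning_rate_init=0.001,
                       max_iter=1000, random_state=seed)
    return mlp.fit(X_train, Y_train)

# Usage (step (9)-(10)): Y_test = fit_mapping(X_train, Y_train).predict(X_test),
# then the classifier trained on (Y_train, labels) diagnoses Y_test.
```

This out-of-sample extension is what lets a method trained only on the embedded training coordinates diagnose previously unseen vibration samples.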

(1) Divide the data into a training set, $X_{train}$, and a testing set, $X_{test}$, and preprocess the data;

Figure 3 .
Figure 3. The bearing test rig of CWRU.
Proposed: intrinsic dimension d = 4; number of nearest neighbors k = 10; offset coefficient b = 0.5; bias coefficient ϕ = 0.5. The MLP structure is 29-128-256-128-4 for the training data; learning rate = 0.001; number of pre-training iterations = 1000; optimizer: Adam.
PCA: intrinsic dimension d = 4.
MDS: intrinsic dimension d = 4.
Isomap: intrinsic dimension d = 5; number of nearest neighbors k = 5.
LE: intrinsic dimension d = 4; number of nearest neighbors k = 5.
LLE: intrinsic dimension d = 2; number of nearest neighbors k = 30.
HLLE: intrinsic dimension d = 3; number of nearest neighbors k = 25.
LTSA: intrinsic dimension d = 9; number of nearest neighbors k = 30.
T-SNE: intrinsic dimension d = 2; perplexity p = 30.

Figure 6.
Figure 6. Accuracy of all methods with different classifiers for Case 1.

Figure 8.
Figure 8. The laboratory-built bearing experimental rig.

Figure 9 .
Figure 9. The working schematic of the laboratory-built bearing experimental rig.

Figure 10 .
Figure 10. Bearing health status: (a) Rolling Element Fault; (b) Inner Race Fault; (c) Outer Race Fault; (d) Cage Fault; (e) Healthy Bearing. The red box shows the exact shape of the bearing fault in detail.
Proposed: intrinsic dimension d = 3; number of nearest neighbors k = 10; offset coefficient b = 0.5; bias coefficient ϕ = 0.5. The MLP structure is 29-128-256-128-5 for the training data; learning rate = 0.001; number of pre-training iterations = 1000; optimizer: Adam.
PCA: intrinsic dimension d = 4.
MDS: intrinsic dimension d = 4.
Isomap: intrinsic dimension d = 4; number of nearest neighbors k = 15.
LE: intrinsic dimension d = 3; number of nearest neighbors k = 25.
LLE: intrinsic dimension d = 4; number of nearest neighbors k = 15.
HLLE: intrinsic dimension d = 3; number of nearest neighbors k = 25.
LTSA: intrinsic dimension d = 7; number of nearest neighbors k = 25.
T-SNE: intrinsic dimension d = 4; perplexity p = 15.

Figure 13 .
Figure 13. Accuracy of all methods with different classifiers for Case 2.

Figure 15 illustrates confusion matrices detailing fault diagnosis using different kernel functions for feature extraction.

Table 1 .
Details of the different working conditions of the drive-end bearings.


Table 2 .
Parameter settings of all methods in Case 1.

Table 3 .
The detailed Fisher statistical quantities of all methods for Case 1.

Table 4 .
The detailed evaluation indexes of all methods with the best accuracy for Case 1.

Table 5 .
Different work conditions of laboratory-built bearing experimental rig details.

Table 6 .
Parameter settings of all methods in Case 2.


Table 7 .
The detailed Fisher statistical quantities of all methods for Case 2.

Table 8 .
The detailed evaluation indexes of all methods with the best accuracy for Case 2.
