Intelligent Early Fault Diagnosis of Space Flywheel Rotor System

Early fault diagnosis of space flywheel rotor systems commonly faces three challenges: a variety of fault types, insufficiently labeled data, and missing fault types. To address these issues, this paper proposes an intelligent early fault diagnosis method based on the multi-channel convolutional neural network with hierarchical branch and similarity clustering (HB-SC-MCCNN). First, a similarity clustering (SC) method is integrated into a parameter-shared dual MCCNN architecture to form the basic structural block. A hierarchical branch model and an additional loss are then added to SC-MCCNN to form a hierarchical branch network, which simplifies multi-class fault classification into multi-step binary classification. Based on the self-learning characteristics of the proposed model, the unlabeled data and the missing fault types in the training set are relabeled so that the network can be retrained. Comparative experiments against several advanced deep learning models on the established early fault dataset of the space flywheel rotor system confirm that the proposed method achieves hierarchical diagnosis and is more competitive when labeled data are insufficient and fault types are missing at the same time.


Introduction
The space flywheel rotor system, as the rotating support core of the attitude control system, is extremely important in satellite equipment. A malfunction of the flywheel system can have catastrophic consequences [1][2][3]. Paired preloaded space ball bearings [4] are generally used in space flywheel rotor systems. This type of bearing uses a porous oil-impregnated cage [5] to realize internal micro-circulation lubrication [6], and it cannot be maintained during its service life.
The operational accuracy of the flywheel can be reduced by various factors, such as abnormal rubbing of the cage [7], wear [8], or surface damage defects [9] of the bearings, which can further lead to the failure of the attitude control system. Given the complex types of early faults, the insufficient labeled data, and the missing fault types in flywheel rotor systems, the intelligent early fault diagnosis of space flywheel rotor systems is a challenging task.
Furthermore, the electromechanical rotary system and the space flywheel rotor system are both rotary mechanisms. However, due to the unique operating conditions and precision requirements of the latter, its fault characteristics are not entirely the same as the former. Replicating many fault diagnosis methods from the former to the latter poses several challenges [10][11][12][13][14].
In 2015, LeCun et al. [15] published an overview of deep learning (DL) in Nature. Because it learns features automatically, classifies patterns powerfully, and reduces the need for prior knowledge and manual experience, DL is more intelligent and adaptive than conventional approaches. Its wide application in many fields and the results achieved [16][17][18] reflect the strength of DL.
Researchers have designed many intelligent bearing diagnosis methods by introducing DL [19][20][21][22][23], especially CNN-based rolling bearing fault diagnosis methods. However, these studies generally assume that the training dataset contains sufficient labeled data samples and no missing fault types [24,25]. In practical applications, it is generally difficult or costly to obtain a sufficient number of fault data samples, while unlabeled and normal data samples are relatively easy to obtain. This results in an imbalance in the number of data samples of the various types, which manifests as missing fault types and insufficient labeled data samples [26][27][28][29]. Therefore, the assumptions of those studies are unrealistic and would have a negative impact on practical applications.
Transfer learning and unsupervised learning are currently the most commonly used methods to address the lack of labeled data. Liao [30] proposed a cross-domain fault diagnosis method based on the dynamic distribution adaptive transfer network (DDATN) to address the impact that differences between the marginal and conditional distributions have on domain divergence. This method utilizes the instance-weighted dynamic maximum mean discrepancy for dynamic distribution adaptation to adapt the target domain to the source domain. Zhang et al. [31,32] monitored tool wear and bearing faults under different operating conditions, and proposed multi-label transfer reinforcement learning (ML-TRL) by integrating the feature extraction capabilities of deep reinforcement learning (DRL) and the knowledge transfer capabilities of transfer learning. Li [33,34] proposed a deep-learning-based fault diagnosis method for rotating machinery with a small amount of supervised (labeled) data and sufficient unsupervised data. Through adversarial training between feature extractors and domain discriminators, the problem of domain generalization in fault diagnosis was solved. The reliability of the proposed method was verified on the CWRU rolling bearing dataset and the high-speed train bogie bearing dataset.
In summary, current research on fault diagnosis using transfer learning and unsupervised learning mainly focuses on the issue of insufficient labeled data, but there is very little research on the issue of missing fault types in training sets, and the effect of transfer learning is easily affected by the amount of fully labeled source domain data. The core idea of unsupervised learning is to model with incomplete data, utilize the self-learning characteristics of the constructed network model to relabel unlabeled data, and retrain the network model. Given this, this paper draws on the core idea of unsupervised learning and focuses on the accuracy of relabeling in the case of insufficient labeled data and missing fault types. On this basis, we propose an intelligent early fault diagnosis method for space flywheel rotor systems based on HB-SC-MCCNN. The main contributions of this paper are as follows:

1. Similarity clustering is integrated into a dual MCCNN architecture with shared parameters to achieve accurate fault detection when labeled data are insufficient (binary classification, i.e., determining whether there is a fault in the rotor system).

2. A hierarchical branch is introduced to simplify the multi-classification problem into a multi-step binary classification problem, so that accurate bearing fault location (multi-classification) is achieved when labeled data are insufficient.

3. The SC model enables the convolutional network model to self-learn new fault types and realizes the relabeling of missing fault types in the training dataset.
The rest of this paper is organized as follows: we present the issues studied in this paper in Section 2. Section 3 describes the proposed intelligent early fault diagnosis method for space flywheel rotor systems based on HB-SC-MCCNN in detail. Then, in Section 4, an experimental evaluation of the model is conducted using the established early fault dataset of the satellite flywheel rotor system, and the results are discussed. Section 5 summarizes this paper.

Problem Formulation
Let $D$ represent the monitoring dataset (labeled data with the operation status of the flywheel rotor system), where $n$ represents the number of samples in the dataset. $T_N = \{(X_i^N, N_i)\}_{i=1}^{n_N}$ represents the normal dataset of the system, where $X_i^N$ is the normal data sample, $N_i$ is the corresponding normal operation status label, and $n_N$ is the number of normal data samples.
$T_{Fc}$, $T_{Fw}$, $T_{Fi}$, $T_{Fo}$, and $T_{Fb}$ represent the early fault datasets for cage rubbing, bearing wear, and surface damage of the inner ring, outer ring, and balls, respectively. Finally, $D$ is randomly divided into the training set, validation set, and test set, namely, $D_{train}$, $D_{validation}$, and $D_{test}$.
This paper aims to construct a multi-classifier $f$ for the operation status of the space flywheel rotor system based on $D_{train}$ and $D_{validation}$, which determines whether there is a fault in the system (binary classification) and locates the fault (multi-classification), even when the number of training samples $n_{train}$ is small ($n_{train}^{F} < n_{train}^{N}$) and regardless of whether fault types are missing from the training set. Based on the self-learning characteristics of the model, the data in the validation set are relabeled, and the model is retrained. The diagnostic results of the model are validated on $D_{test}$.

Overview
To solve the multi-classification problem of the early fault diagnosis of the space flywheel rotor system in the case of insufficient labeled data and missing fault types, we propose an intelligent fault diagnosis method based on the HB-SC-MCCNN model. Figure 1 shows the overview of the proposed method. The process of the method is as follows:
• Step 1: Multi-source signal acquisition. Multi-source signals include the operation trajectory of the cage, dynamic friction torque, and vibration. By utilizing complementary information from different data sources at the same time, the completeness and consistency of the description of the rotor system can be improved, making fault multi-classification more accurate and reliable.
• Step 2: Dataset division. The original vibration signals are directly divided into the training set, validation set, and test set without any signal processing or feature extraction.
• Step 3: Model training. The HB-SC-MCCNN model proposed in this paper is trained on the training samples, relabels the validation set data after preliminary training, and is then retrained.
• Step 4: Intelligent early fault diagnosis. The model trained in Step 3 analyzes the data in the test set, determines whether there is a fault in the rotor system, and locates the type of fault.
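The relabeling in Step 3 can be pictured with a minimal sketch. It assumes, purely for illustration, that each class is represented by one reference feature vector, that similarity is an L1 distance in feature space, and that a sample farther than a hypothetical `threshold` from every known class is treated as a new fault type. The names `relabel`, `prototypes`, and `threshold` are not from the paper, and the actual relabeling rule of HB-SC-MCCNN may differ.

```python
def l1_energy(a, b):
    """L1 distance between two feature vectors (lower means more similar)."""
    return sum(abs(u - v) for u, v in zip(a, b))


def relabel(features, prototypes, threshold):
    """Assign each unlabeled feature vector to the most similar known class.

    prototypes: dict mapping class label -> reference feature vector.
    A sample whose smallest energy exceeds `threshold` is treated as a
    previously unseen fault type and opens a new class "new_k".
    """
    prototypes = dict(prototypes)  # copy so the caller's dict is untouched
    labels = []
    new_count = 0
    for f in features:
        best = min(prototypes, key=lambda c: l1_energy(f, prototypes[c]))
        if l1_energy(f, prototypes[best]) > threshold:
            new_count += 1
            best = f"new_{new_count}"
            prototypes[best] = list(f)  # later samples can join this class
        labels.append(best)
    return labels


# Toy 2-D features (illustrative values only).
protos = {"normal": [0.0, 0.0], "cage_rubbing": [1.0, 1.0]}
unlabeled = [[0.1, -0.1], [0.9, 1.2], [5.0, 5.0], [5.1, 4.9]]
print(relabel(unlabeled, protos, threshold=1.0))
```

In this toy run the last two samples are far from both known classes, so they are grouped under a newly created label, which is then available as a training label for retraining.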

HB-SC-MCCNN
The model architecture of HB-SC-MCCNN is shown in Figure 2. The model takes the original multi-source data directly as input and predicts early faults through a hierarchical mechanism of multiple branch output layers, which outputs the status and the fault location of the space flywheel rotor system simultaneously.

Stratification of the Rotor System
According to the actual diagnostic requirements, we divide the faults into three levels: fault detection (a binary classification problem, which determines whether the rotor system has a fault), fault evaluation (a multi-classification problem, which determines the type of fault in the rotor system), and fault location (a multi-classification problem, which determines the specific part where the fault occurs). The hierarchical structure is shown in Table 1 (we use "level" to represent different levels in a hierarchical structure, and "layer" to represent layers in a neural network). By dividing the data categories hierarchically, we can limit the errors that may occur in diagnosis to a subcategory. For example, the model may not be able to distinguish whether the fault occurred in the inner or outer ring, but it can distinguish whether there is a fault in the rotor system.
We integrate similarity clustering into a parameter-shared dual MCCNN architecture (as shown in Figure 3 and Table 2) and use it as the basic structural block of HB-SC-MCCNN. In traditional convolutional networks, a Softmax function is connected after the fully connected layer to output the classification labels. In contrast, the SC adopted in this paper clusters the relevant features extracted by MCCNN through similarity measures. SC not only reduces the dependence on the amount of labeled data during classification but also, because it clusters by comparison, gives convolutional neural networks a self-learning ability for new fault types.
It is worth noting that to ensure that the same type of data has better relevant characteristics in the target space, we establish a dual MCCNN architecture with parameter sharing. In a single iteration, each channel of the MCCNN input data belongs to the same type.
The process of the method is as follows: Step 1: Relevant characteristics. An MCCNN with $c_i$ input channels and $c_o$ output channels is constructed ($c_i > c_o > 1$) [35]. The kernel arrays of shape $k_h \times k_w$ assigned to each input channel are concatenated along the input channel dimension to obtain a convolutional kernel of shape $c_i \times k_h \times k_w$. The input multi-channel samples and the convolutional kernel are then cross-correlated to obtain the two-dimensional cross-correlation features.
It is worth noting that it is not feasible to replace MCCNN with the single-channel CNN [36,37], which is commonly used in current intelligent fault diagnosis methods. If the characteristics of the input data in the target space do not correlate, the similarity clustering effect will be affected, especially in the case of the number of training samples being insufficient.
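The multi-channel cross-correlation of Step 1 can be sketched in plain Python as follows. This is a toy illustration with $c_i = 3$ input channels and $c_o = 2$ output channels on 3 × 3 inputs; the function names and the numeric values are illustrative assumptions, and the layer sizes of the actual MCCNN are those of Table 2, not the ones used here.

```python
def corr2d(x, k):
    """2-D cross-correlation ('valid' mode) of one channel x with kernel k."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]


def corr2d_multi_in(xs, ks):
    """Cross-correlate each input channel with its kernel slice and sum."""
    maps = [corr2d(x, k) for x, k in zip(xs, ks)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(cols)]
            for i in range(rows)]


def corr2d_multi_in_out(xs, kernels):
    """One c_i x k_h x k_w kernel per output channel gives c_o feature maps."""
    return [corr2d_multi_in(xs, ks) for ks in kernels]


# c_i = 3 input channels (toy values; the third channel is all zeros).
xs = [
    [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
]
# c_o = 2 kernels, each of shape 3 x 2 x 2.
kernel_a = [[[0, 1], [2, 3]], [[1, 2], [3, 4]], [[1, 1], [1, 1]]]
kernel_b = [[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]
features = corr2d_multi_in_out(xs, [kernel_a, kernel_b])
print(features)
```

Each output map mixes all input channels, which is why the resulting features carry cross-channel correlation that a single-channel CNN cannot provide.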
Step 2: Similarity measure. $E(x_1, x_2)$ can be considered an "energy" function [38] that measures the similarity between the relevant features of $x_1$ and $x_2$ extracted by MCCNN. It is specifically defined as
$$E(x_1, x_2) = \lVert G(x_1) - G(x_2) \rVert_1 \quad (1)$$
where $G(x)$ in the above equation is the one-dimensional correlation feature vector output after the input sample passes through the convolutional layers and the fully connected layer of MCCNN.
It is worth noting that it is not feasible to replace the $L_1$ norm with a square norm for $E(x_1, x_2)$. If $E(x_1, x_2)$ is the square norm of the difference between the features of $x_1$ and $x_2$, the gradient of $E(x_1, x_2)$ with respect to the model parameters vanishes as $E(x_1, x_2)$ approaches 0 [39].
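The vanishing-gradient argument can be checked on a deliberately simplified case. The sketch below assumes a scalar embedding $G(x) = \theta x$, which is a stand-in for the MCCNN and not the paper's model. Under that assumption the $L_1$ energy is $|\theta (x_1 - x_2)|$ and the squared energy is $(\theta (x_1 - x_2))^2$, so their derivatives with respect to $\theta$ can be written in closed form.

```python
def grad_l1(theta, x1, x2):
    """d/d(theta) of |theta * (x1 - x2)| for the toy embedding G(x) = theta * x."""
    d = x1 - x2
    diff = theta * d
    sign = (diff > 0) - (diff < 0)  # sign of the feature difference
    return sign * d


def grad_sq(theta, x1, x2):
    """d/d(theta) of (theta * (x1 - x2)) ** 2 for the same toy embedding."""
    d = x1 - x2
    return 2.0 * theta * d * d


# As theta shrinks, both energies approach 0 for the pair (2.0, 1.0).
for theta in (1.0, 0.1, 0.001):
    print(theta, abs(grad_l1(theta, 2.0, 1.0)), abs(grad_sq(theta, 2.0, 1.0)))
```

In this toy setting the magnitude of the $L_1$ gradient stays equal to $|x_1 - x_2|$ however small the energy becomes, whereas the squared-norm gradient is proportional to $\theta$ and therefore shrinks together with the energy.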
Step 3: Parameter optimization. The loss function of the model is defined as
$$\hat{\theta} = \arg\min_{\theta} \ell(\theta) = \arg\min_{\theta} \sum_{i} \ell\big(\theta, (Y, x_1, x_2)^i\big) \quad (2)$$
where $(Y, x_1, x_2)^i$ is the $i$-th sample, consisting of the input data $x_1$, $x_2$ and the label $Y$. The $Y$ in Equation (2) is a binary label (0 or 1). When $x_1$ and $x_2$ belong to the same type, $Y = 1$. When $x_1$ and $x_2$ belong to different types, $Y = 0$, and $x_2$ is written as $x_2'$. In supervised and semi-supervised learning, the label of the training data is known, so the optimal value of the label $Y$ can be determined. $\hat{\theta}$ is the optimized result, and $\arg\min_{\theta} \ell(\theta)$ denotes the process of selecting appropriate model parameters $\theta$ so that $\ell(\theta)$ reaches its minimum. Combining Equations (1) and (2), the loss function is indirectly related to the input data and model parameters through $E(x_1, x_2)$:
$$\ell\big(\theta, (Y, x_1, x_2)^i\big) = Y\,\ell_s\big(E(x_1, x_2)^i\big) + (1 - Y)\,\ell_d\big(E(x_1, x_2')^i\big) \quad (3)$$
where $\ell_s$ represents the partial loss function for pairs of the same type, while $\ell_d$ represents the partial loss function for pairs of different types.
$\ell_s$ and $\ell_d$ are designed on the principle that minimizing $\ell$ decreases the energy of same-type pairs and increases the energy of different-type pairs. A simple way to achieve this is to make $\ell_s$ monotonically increasing and $\ell_d$ monotonically decreasing.
To meet more general conditions and ensure that same-type and different-type data are always separated by the boundary $E(x_1, x_2) + m < E(x_1, x_2')$ (where the integer $m$ is the boundary), the exact loss function is expressed with a constant $Q$, the upper boundary of $E(x_1, x_2)$, which is determined by the one-dimensional correlated feature vectors $G(x)$ output by MCCNN, according to Equation (1). It is worth noting that there must be a contrastive term in the loss function of the model to ensure that the "energy" from the same type of data is low, while the "energy" from different types of data is high. Once the contrastive term disappears, the energy and the loss can be made zero by simply making $G(x)$ a constant function [40,41].
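A minimal sketch of such a contrastive loss is given below. The split into a same-type term and a different-type term, and the convention that Y = 1 marks a same-type pair, follow the text. The specific forms used for the two terms, $(2/Q)E^2$ and $2Q\,e^{-2.77E/Q}$, are an assumption borrowed from a commonly cited energy-based contrastive formulation; the text does not confirm that the paper uses these constants, and the function names are illustrative.

```python
import math


def energy(g1, g2):
    """L1 energy between two feature vectors G(x1) and G(x2)."""
    return sum(abs(a - b) for a, b in zip(g1, g2))


def contrastive_loss(e, y, q):
    """Loss for one pair with energy e.

    y = 1: same-type pair, y = 0: different-type pair (convention of the text).
    ASSUMED forms: l_s = (2/q) e^2 (increasing in e) and
    l_d = 2 q exp(-2.77 e / q) (decreasing in e); q bounds the energy.
    """
    l_s = (2.0 / q) * e * e
    l_d = 2.0 * q * math.exp(-2.77 * e / q)
    return y * l_s + (1 - y) * l_d


q = 4.0  # hypothetical upper bound of the energy
near = energy([0.1, 0.2], [0.1, 0.3])   # similar features, low energy
far = energy([0.1, 0.2], [1.5, 1.9])    # dissimilar features, high energy
print(contrastive_loss(near, 1, q), contrastive_loss(far, 1, q))
print(contrastive_loss(near, 0, q), contrastive_loss(far, 0, q))
```

With these assumed forms, a same-type pair is penalized more as its energy grows, and a different-type pair is penalized more as its energy shrinks, which is the monotonic behavior the text requires of the two partial losses.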
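Then, the model parameters $\theta$ can be optimized through the mini-batch stochastic gradient descent algorithm [42] as follows:
$$\theta \leftarrow \theta - \frac{\eta}{B} \sum_{i=1}^{B} \nabla_{\theta}\, \ell\big(\theta, (Y, x_1, x_2)^i\big)$$
where $\eta > 0$ is the learning rate, and $B$ is the batch size.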
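The mini-batch update can be sketched generically. To keep the example short and checkable, it uses a stand-in per-sample loss $(\theta - s)^2$ with a scalar parameter instead of the contrastive loss of the model; the function names, the learning rate, and the batch size are illustrative assumptions.

```python
import random


def sgd_step(theta, batch, grad_fn, eta):
    """theta <- theta - (eta / B) * sum of per-sample gradients over the batch."""
    b = len(batch)
    return theta - (eta / b) * sum(grad_fn(theta, s) for s in batch)


def train(theta, data, grad_fn, eta=0.1, batch_size=4, epochs=200, seed=0):
    """Run mini-batch SGD, reshuffling the data at the start of every epoch."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            theta = sgd_step(theta, data[start:start + batch_size], grad_fn, eta)
    return theta


def grad(theta, s):
    """Gradient of the stand-in loss (theta - s) ** 2 with respect to theta."""
    return 2.0 * (theta - s)


samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
theta_hat = train(0.0, samples, grad)
print(theta_hat)
```

For this stand-in loss the minimizer is the sample mean (4.5), and with a constant learning rate the estimate keeps fluctuating slightly around it from batch to batch, which is the usual behavior of mini-batch updates.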

Branch Structure
The branch structure integrates the hierarchical structure of MCCNN network features [43] with the natural classification hierarchy of the fault data of the space flywheel rotor system [44], so that faults are diagnosed at different classification levels.
The hierarchical structure of MCCNN network features refers to the fact that the lower layers of MCCNN typically capture low-level features, while the higher layers can extract high-level features. Therefore, each layer of MCCNN contains a hierarchical structure of network features.
The natural classification hierarchy of the rotor system fault data could be explained by taking the established early fault dataset of the satellite flywheel rotor system as an example. In this benchmark dataset, the normal and fault states of the system are easy to distinguish, but it is difficult to recognize the fault of the inner ring, outer ring, or balls' surface. The reason is that normal and faulty system vibration signals belong to different rough categories, while vibration signals of different fault types belong to the same rough category. For the convenience of expression, only vibration signals are used as an example here. The normal bearing vibration signal belongs to irregular random vibration. However, when a surface damage-like defect occurs in a bearing of the rotor system, if the damage point rolls over the surface of the bearing, sudden shock vibrations will occur, which results in periodic shock pulses in the vibration signal.
Compared to the traditional multi-classification method (as shown in Figure 4), HB-SC-MCCNN simplifies the multi-classification problem into a multi-step binary classification problem by introducing the branch structure, which effectively reduces the dependence on the amount of labeled data when directly performing multi-classification as shown in Figure 5.
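The idea of replacing one direct multi-class decision with a sequence of binary decisions can be sketched as a simple cascade. The binary tests below are hypothetical threshold rules on made-up scores, and the order of the fault tests is an assumption; in HB-SC-MCCNN the decisions come from the branch output layers of the network and the hierarchy of Table 1, not from hand-written rules.

```python
def hierarchical_diagnosis(sample, is_fault, fault_tests):
    """Multi-step binary classification.

    is_fault: binary test for level 1 (fault detection).
    fault_tests: ordered list of (label, binary test) pairs applied only
    to samples that were flagged as faulty at level 1.
    Returns the path of decisions, e.g. ["fault", "wear"].
    """
    if not is_fault(sample):
        return ["normal"]
    path = ["fault"]
    for label, test in fault_tests:
        if test(sample):
            return path + [label]
    return path + ["unknown"]


# Hypothetical stand-ins for trained binary classifiers.
def is_fault(s):
    return s["anomaly"] > 0.5


fault_tests = [
    ("cage_rubbing", lambda s: s["rubbing"] > 0.5),
    ("wear", lambda s: s["wear"] > 0.5),
    ("surface_damage", lambda s: s["damage"] > 0.5),
]

healthy = {"anomaly": 0.1, "rubbing": 0.0, "wear": 0.0, "damage": 0.0}
worn = {"anomaly": 0.9, "rubbing": 0.2, "wear": 0.8, "damage": 0.1}
print(hierarchical_diagnosis(healthy, is_fault, fault_tests))
print(hierarchical_diagnosis(worn, is_fault, fault_tests))
```

A useful property of this structure is that an error at a deeper level does not undo the first-level decision: a sample that reaches "unknown" has still been detected as faulty.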

Description of the Dataset
The dataset adopted for the experiments was obtained from the established visual multi-performance monitoring rig for the space flywheel rotor system, which is mainly composed of four parts (rotating driver, signal collector, A/D converter, and upper computer), as shown in Figure 6. The signal collector mainly includes a high-speed camera, a dynamic friction torque instrument, and a vibration sensor. Some parameters of the sensors used are shown in Table 3.
Figure 7 shows the structure of the rotor system used as the experimental object, and Table 4 lists the main parameters of the test bearing, model B7004C, which operates at a speed of 6000 r/min under an axial load of 100 N. Marking points distributed concentrically and uniformly on the bearing cage are used by the high-speed camera system to capture and track the running trajectory of the cage. The sampling frequency of the experiments is 10 kHz, with 47,690 sampling points. The samples in the dataset are all early faults of the rotor system, that is, real faults generated during ground development and testing.
The normal states of the cage and bearing raceway are shown in Figures 8 and 9, with no abnormal contact or rubbing marks on the outer surface of the cage, and uniform contact points in the cage pockets. In addition, the contact area between the inner and outer rings of the bearing and the balls during operation is normal, and no defects such as wear or surface damage can be observed there.
Figure 10 shows the early fault of cage rubbing. The abnormal rubbing between the outer diameter surface of the porous oil-containing cage and the guide surface of the outer ring caused shear deformation and fracture at the edges of the pores, which manifests as carbonization and blackening of the polyimide material. The abnormal friction and relative sliding between the pockets and balls also cause excessive blackening and scratches in the contact area.
The wear of the bearings can be seen in Figure 11, where the material transfer occurs in the form of particle shedding in the operating contact area of the rings and balls, causing the machining marks on the original machining surface to be worn, and manifested as a "whitening" phenomenon in the contact area. The surface damage of the inner ring, outer ring, and balls is shown in Figure 12, with surface damage such as indentation and scratches occurring in the contact area during rotation. Table 5 lists the data information of the established early fault dataset D 1 of the space flywheel rotor system. Samples in the dataset represent the segmented signals with 1024 consecutive data points randomly selected from the original multi-source signals shown in Figure 13.
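The sample construction described above (segments of 1024 consecutive points drawn at random from the multi-source signals) can be sketched as follows. The sketch assumes that all channels are sampled synchronously so that one start index can be shared across channels; that assumption, the synthetic stand-in signals, and the function name are not taken from the paper.

```python
import random


def sample_segments(channels, length=1024, count=4, seed=0):
    """Cut `count` random windows of `length` consecutive points.

    The same start index is used for every channel so that one sample
    holds time-aligned segments of all sources (an assumption).
    """
    n = min(len(c) for c in channels)
    if n < length:
        raise ValueError("signal shorter than one segment")
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        start = rng.randint(0, n - length)  # both bounds are inclusive
        samples.append([c[start:start + length] for c in channels])
    return samples


# Synthetic stand-ins for three recorded channels of 47,690 points each.
n_points = 47690
channels = [
    list(range(n_points)),
    [2 * i for i in range(n_points)],
    [0.0] * n_points,
]
samples = sample_segments(channels)
print(len(samples), len(samples[0]), len(samples[0][0]))
```

Each returned sample is a list with one 1024-point segment per channel, which matches the multi-channel input layout expected by a multi-channel network.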

Results
The experiment used 28 datasets to verify the effectiveness and adaptability of the proposed HB-SC-MCCNN. D 2 to D 28 were constructed based on D 1 as shown in Table 6.
The number of labeled samples in D 1 to D 4 decreases gradually. The number in D 5 to D 24 also decreases gradually, and in addition 1/5 of the fault types are missing. D 25 to D 27 miss 2/5, 3/5, and 4/5 of the fault types, respectively, and the fault data are completely missing in D 28 .
The data samples removed from D 2 to D 28 were placed in the validation set and relabeled during the training process. The relabeled samples were combined with the originally labeled samples to retrain the model. Moreover, to evaluate the performance of the model under harsh conditions, the number of samples in the test set is kept constant: the number of normal samples (N) in the test set is 1100, and the number of early fault samples of each of F c , F w , F i , F o , and F b is 220. To avoid accidental test results, 10-fold cross-validation was adopted, and the average and standard deviation of the test results were calculated. The findings from the test results (as shown in Table 6) are as follows.
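The test-set arithmetic and the fold statistics can be made concrete with a short sketch. The sample counts come from the text; the fold accuracies are placeholder values chosen only to show the computation and are not results from the paper. Using the sample standard deviation is also an assumption, since the text does not state which estimator was used.

```python
import statistics

# Test-set composition stated in the text.
n_normal = 1100
n_per_fault = 220
n_fault_types = 5
n_fault = n_per_fault * n_fault_types
n_test = n_normal + n_fault

# Chance-level reference for level 1: a detector that labels every
# test sample as "normal" is right only on the normal samples.
all_normal_accuracy = n_normal / n_test


def summarize(fold_accuracies):
    """Mean and sample standard deviation over cross-validation folds."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


# Placeholder fold accuracies in percent (NOT results from the paper).
folds = [99.0, 98.0, 100.0, 99.0, 98.0, 100.0, 99.0, 98.0, 100.0, 99.0]
mean_acc, std_acc = summarize(folds)
print(n_test, all_normal_accuracy)
print(round(mean_acc, 2), round(std_acc, 2))
```

Because the test set holds as many fault samples as normal samples, a first-level accuracy of 50% is what a detector with no ability to recognize faults would obtain, which is a useful baseline when reading the detection accuracies.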
The accuracy of the test results on the D 1 to D 4 datasets is above 99.98%, and there is no significant trend of change as the number of labeled samples decreases. The test results show that the HB-SC-MCCNN model proposed in this paper retains excellent diagnostic ability with a small number of samples. In addition, the confusion matrix of the model's prediction results on datasets D 1 to D 4 is shown in Figure 14. The accuracy of the test results for the D 5 to D 24 datasets is above 99.87%, and there is no significant trend of change as the number of labeled samples decreases when 1/5 of the fault types are missing. The test results show that the proposed model still has excellent diagnostic results with a small number of samples and 1/5 of the fault types missing.
The accuracy of the test results for D 25 is 84.93%, and the accuracy of the first-level binary classification is 100% (this result is not indicated in Table 6). The test results indicate that the proposed model still has good fault location results when 2/5 of the fault types are missing, and can effectively ensure the accuracy of fault detection.
The accuracy of the test results for D 26 and D 27 is 73.79% and 63.31%, respectively, and the accuracy of the first-level binary classification is 100% and 99.87%, respectively. The test results show that the proposed model loses fault location accuracy when 3/5 or 4/5 of the fault types are missing but can still effectively ensure the accuracy of fault detection.
The accuracy of the test results for the D 28 dataset is only 25%, and the accuracy of the first-level binary classification is 50%, from which it can be determined that HB-SC-MCCNN fails when fault samples are completely missing from the training set.
The experiment results (as shown in Table 7 and Figure 15) show that, on the datasets, the test accuracy drops from 99.87% for HB-SC-MCCNN to 90.47% for HB-SC-CNN, and the standard deviation increases from 0.15 to 9.12, which indicates increased dispersion of the test results. As the number of samples decreases, the test accuracy of HB-SC-CNN shows a slight downward trend.
We can conclude that the relevant features extracted by MCCNN in the target space effectively improve the clustering effect of SC.

Comparative Experiment for Exploring the Contribution of HB and SC to HB-SC-MCCNN
CNN with softmax is the most traditional and mature deep learning model; it uses a CNN to extract the features of the input data and classifies them through softmax. However, this model lacks the ability to self-learn new fault types. When fault types are missing from the training set, the samples in the validation set cannot be relabeled and the model cannot be retrained. The experiment results (as shown in Table 7 and Figure 15) show that, on the D 24 dataset, the test accuracy drops from 99.87% for HB-SC-MCCNN to 72.70% for softmax-CNN, and the standard deviation increases from 0.15 to 18.85. As the number of samples in the training dataset decreases, the test accuracy of softmax-CNN decreases significantly, which is consistent with the conclusion that the performance of currently common models [19,20,40] drops significantly as the number of samples decreases.

Comparison with Existing Models
In existing research, there are few fault diagnosis methods for rotating machines that simultaneously address insufficient labeled samples and missing fault types. The deep representation clustering-based fault diagnosis method with unsupervised data proposed by Li et al. [27] and Zhao et al. [28] achieved excellent diagnostic results. The test results (as shown in Table 7 and Figure 15) indicate that in the case of sufficient samples and 1/5 of the fault types missing (D 5 to D 9 ), the accuracy of deep representation clustering is consistent with that of HB-SC-MCCNN and is superior to that of softmax-CNN. Autoencoders and distance metric learning play important roles in deep representation clustering and give the model the ability to self-learn new fault types.
In the case of insufficient training set samples and 1/5 of the fault types missing (D 20 to D 24 ), the test accuracy of deep representation clustering is slightly lower than that of HB-SC-MCCNN, and its standard deviation is slightly higher. On D 24 , the accuracy is 92.58% for deep representation clustering versus 99.87% for HB-SC-MCCNN, and the standard deviation is 6.55 versus 0.15. The reason is that deep representation clustering still uses the traditional direct multi-classification method, which relies more on the amount of labeled data than the multi-step binary classification method used in this paper. This is also consistent with the analysis in Part 3.2.3.
When 2/5 of the fault types are missing (D 25 ), the test accuracy of deep representation clustering is 27.89% lower than that of HB-SC-MCCNN, and the standard deviation is 9.01 higher. The k-means clustering method used by deep representation clustering has a negative impact on the results, because it cannot ensure the accuracy of relabeling when two or more fault types are missing. In contrast, the similarity clustering method used in this paper makes the proposed model more competitive.

Conclusions
To meet the actual needs of fault diagnosis of the space flywheel rotor system, this paper divides faults into three levels: fault detection (a binary classification problem, which determines whether the rotor system has a fault), fault evaluation (a multi-classification problem, which determines the type of fault in the rotor system), and fault location (a multi-classification problem, which determines the specific part where the fault occurs). At the same time, considering the frequent problem of insufficient labeled samples and missing fault types in practical applications, an intelligent fault diagnosis method for the space flywheel rotor system based on HB-SC-MCCNN is proposed.
The model uses SC-MCCNN as the basic structural block and simultaneously achieves fault detection and network self-learning capabilities when the number of labeled samples is insufficient. A hierarchical branch structure (HB) is introduced into the model, and the multi-classification problem is simplified to a multi-step binary classification problem, which further reduces the dependence on the number of labeled samples when locating the fault.
By relabeling unlabeled data and missing fault types and retraining the network model, the proposed method achieves excellent diagnostic results. Experimental results show that the diagnostic accuracy of the proposed method is above 99.87% when 1/5 of the fault types are missing, with no obvious trend of change as the number of labeled samples decreases, which makes it more competitive than existing models.
However, although the proposed HB-SC-MCCNN model exhibits excellent diagnostic performance and adaptability, it still loses the accuracy of fault location when 2/5 or 3/5 fault types are missing. More efficient basic structural blocks should be further explored to reduce the computing costs. An exploration of more efficient multi-classification simplification methods should also be included in further research directions.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the confidentiality policy.

Conflicts of Interest: The authors declare no conflict of interest.