Data-Driven Fault Diagnosis for Satellite Control Moment Gyro Assembly with Multiple In-Phase Faults

Abstract: A satellite can complete its mission successfully only when all its subsystems, including the attitude control subsystem, are healthy and working properly. The control moment gyroscope is a type of actuator used in the attitude control subsystems of satellites. Any fault in a control moment gyroscope can cause satellite mission failure if it is not detected, isolated, and resolved in time. Fault diagnosis provides an opportunity to detect and isolate occurring faults and, if accompanied by proactive remedial actions, can avoid failure and improve satellite reliability. In this paper, an enhanced data-driven fault diagnosis method is introduced for the fault isolation of multiple in-phase faults in satellite control moment gyroscopes, a problem that has not previously been addressed in the literature with high accuracy. The proposed method is based on an optimized support vector machine, and the results yield fault predictions with up to 95.6% accuracy. In addition, a sensitivity analysis with regard to noise, missing values, and missing sensors is performed. The results show that the proposed model is robust enough to be used in real applications.


Introduction
Satellites are essential assets for space exploration and data collection. Therefore, their fault-free operation is critical, and it relies on the health of their subsystems and components. For example, one of the major systems of any satellite is the attitude control subsystem (ACS), which uses different actuators such as reaction wheels, momentum wheels, and control moment gyros (CMGs), among others. If the ACS fails, the satellite cannot complete its mission. Hence, if a fault occurs in any part of the CMGs and is left unattended, the satellite may fail. Fault isolation can prevent failure and increase satellite reliability by identifying any occurring fault and enabling remedial actions in time. Fault isolation approaches fall into model-based and data-driven categories [1][2][3][4].
The application of different data-driven methods in fault isolation has become popular in recent years. Specifically, different machine learning methods, such as support vector machines (SVM), neural networks, and gradient boosting machines, along with different deep learning methods, are widely used for this application [2,5]. These methods establish a fault isolation scheme by classifying given data to distinguish between different possible faults.
Several research publications cover the fault diagnosis of satellite ACS using SVM, among other data-driven approaches [6][7][8][9]. The SVM is a supervised learning method with reasonable flexibility that can be adapted to a wide range of applications. As each fault scenario can be considered a class, the SVM can be used for fault diagnosis. In [6], a multi-classifier model is formed based on the Dempster-Shafer theory and SVM, while nonlinear principal component analysis (NPCA) is adopted to reduce the feature size. In [8], SVM and neural networks are used to build a model for health monitoring of the satellite power supply system. In [7], the combination of random forest (RF), partial least squares, SVM, and Naïve Bayes is used to form a framework for detecting and isolating faults. In [9], telemetry data are used as input to extract the features, and principal component analysis (PCA) is used for feature reduction, followed by an SVM model optimized with particle swarm optimization (PSO) for fault detection and isolation (FDI).
Neural networks (NN) and deep learning methods are also employed for satellite ACS FDI [10][11][12][13]. In [10], Prony analysis is used for feature extraction, and a feed-forward NN is developed for anomaly classification. In [12], a model is first established to find the characteristics that express the faults using a deep neural network. Next, the fault-to-noise ratio and the characteristic differences are amplified using a sliding window. Then, the proposed method is used for fault identification of a satellite ACS. In [13], a feed-forward wavelet-based NN with a single hidden layer is adopted to form an adaptive observer for fault detection, which allows the proposed method to be applied to nonlinear systems. In [14], a Chebyshev neural network and a genetic algorithm are used to isolate CMG faults using satellite attitude rate data. However, the fault injection in that study does not accommodate in-phase faults, meaning that the faults occur out-of-phase or non-concurrently. That can be a limitation considering that multiple CMGs can become faulty simultaneously. In [15], the authors propose a new third-order nonlinear dynamics model for double gimbal control moment gyros (DGCMGs) affected by friction and coupling torques, unmodeled dynamics, or parameter uncertainties; that work does not directly address the fault isolation and identification challenge and instead focuses on the dynamics and control of CMGs under uncertain conditions.
Various other machine learning approaches such as minimum error minimax probability machine [16], gradient boosting machines (GBM) [17], and kernel principal component analysis [18] are used for fault detection and isolation in aerospace applications.
While several articles exist on ACS fault isolation, most focus on systems with only one active fault [14]. The proposed models cannot handle cases with multiple in-phase (concurrent) faults, although such cases are likely to occur during real-life satellite operation. When more than one fault is present simultaneously, the effect of each fault on the overall system and on the other subsystem units makes the isolation task more challenging. The only work that has evaluated multiple in-phase faults [19] reported a maximum accuracy of 66.6%, which is not sufficient for real applications. Thus, there is a need for a specific approach that handles this problem while achieving a reasonable accuracy higher than 90%.
In this work, a new data-driven scheme is developed for the fault isolation of multiple in-phase faults on a CMG assembly used in the ACS of a three-axis stabilized satellite in orbit. The proposed method can handle multiple in-phase faults in satellite CMGs to address the shortcoming identified above in the literature, namely the fault isolation of multiple concurrent faults with high accuracy. The initial results of this work were published in [20]; however, the accuracy reported in [20] was too low for real-life applications. This paper contains the complete work that has solved the accuracy issue and includes additional insights into the different aspects of the problem, including a comprehensive sensitivity analysis. The challenges faced in addressing this problem include (1) the multiple faults that are active simultaneously, (2) the randomness of the fault inception, duration, and severity, (3) the fact that the satellite is fully controlled, so the effect of faults can be compensated for by the controller, leaving the fault isolation scheme blind to the underlying conditions, and (4) the fact that the proposed data-driven approach uses satellite-level measurements (satellite orientation and angular velocities) to isolate faults at the actuator level.
The remainder of this paper is organized as follows: In Section 2, the problem at hand is defined. In Section 3, the proposed fault isolation scheme is introduced and described. Section 4 is devoted to the algorithm complexity analysis of the methods used in this paper. In Section 5, a case study is presented to assess the proposed method's performance. Results are presented and discussed in Section 6. Finally, Section 7 concludes the paper with final remarks and recommendations for future work.

Problem Definition
In general, any nonlinear dynamical system can be modeled in state-space form as:

X_{k+1} = f(X_k, u_k, θ_k) + ω_{X_k}
θ_{k+1} = θ_k + ω_{θ_k}    (1)
y_k = g(X_k, u_k) + v_k

where X_k ∈ ℝ^n is the state vector, u_k ∈ ℝ^m is the control input, θ_k ∈ ℝ^l is the system parameter vector, y_k ∈ ℝ^m is the measurement, and ω_{X_k} ∈ ℝ^n and ω_{θ_k} ∈ ℝ^l are the additive process noise terms for the states and parameters, respectively. v_k ∈ ℝ^m is the additive measurement noise, k is the time step, and the process and measurement models are represented by f(·) and g(·), respectively.
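As an illustration (not taken from the paper), the discrete state-space model above can be stepped in code; the functions `f` and `g`, the noise level, and all dimensions below are toy stand-ins:

```python
import numpy as np

def step(f, g, x_k, u_k, theta_k, rng, noise_std=0.01):
    """One step of the discrete state-space model:
    x_{k+1} = f(x_k, u_k, theta_k) + w_x,  y_k = g(x_k, u_k) + v_k."""
    w_x = rng.normal(0.0, noise_std, size=x_k.shape)  # additive process noise
    v_k = rng.normal(0.0, noise_std, size=x_k.shape)  # additive measurement noise
    x_next = f(x_k, u_k, theta_k) + w_x
    y_k = g(x_k, u_k) + v_k
    return x_next, y_k

# Toy example: linear process and measurement maps with a scalar "health"
# parameter theta; noise turned off so the result is deterministic.
rng = np.random.default_rng(0)
f = lambda x, u, th: th * x + u
g = lambda x, u: x
x, y = step(f, g, np.zeros(3), np.ones(3), 0.9, rng, noise_std=0.0)
```

A fault scenario would be represented by perturbing `theta_k` away from its nominal value, which is exactly the parameter change the classifier is later asked to recognize.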
Assuming that any change in the physical parameters of the satellite is accompanied by a change in one of the parameters of the system [21], a fault isolation problem can be expressed as:

θ_k = diag(α_k) θ_0    (2)

where θ_0 ∈ ℝ^L is a vector of the nominal parameter values, α_k ∈ ℝ^L is a vector representing the parameter values in the presence of a fault, and L is the number of possible fault scenarios that can be considered for the satellite. Equation (2) is a multi-parameter model and can be split into L single-parameter models as [21]:

θ^i_k = α^i_k θ^i_0,  i = 1, …, L    (3)

Equation (3) expresses a classification problem with L classes for which a data-driven approach can be used to find a solution. The data-driven method then predicts the class of the current system state among the L possible cases. This can be done once a fault is detected through each fault model shown in (3), where the ith model captures the ith system parameter θ^i_0 and its severity through α^i_k. The assumptions made in this work are as follows:
1. The induced faults are in phase. Each data instance has assigned fault inception and duration times, which are the same for all CMG units that are faulty.
2. The assigned fault severity for each instance is from 0 to 1 to cover all possible fault severities.
3. All state measurements are available.
4. There is no source of noise or missing values in the raw input data.
This work aims to design and develop a data-driven fault isolation scheme that can use the system outputs and predict the presence of any possible fault as well as isolate the fault location under the assumptions mentioned above.

Proposed Fault Isolation Scheme
A data-driven fault isolation method is introduced that comprises an optimized machine learning model for isolating multiple in-phase faults of the satellite CMG. First, the features are calculated using the CMG data, and then, feature reduction is made through the PCA. Next, the chosen features are fed to the machine learning model as inputs for the training and testing steps. To improve the performance, the machine learning model is tuned by finding the optimal values of its parameters. Finally, the optimized machine learning model is used for performance evaluation in a case study for isolating multiple in-phase faults of a CMG assembly on a three-axis stabilized satellite in orbit. Figure 1 shows the flow diagram of the proposed fault isolation scheme. As can be seen in Figure 1, the data can be obtained from either a model of the system or the actual physical system. Once data are collected, residuals are generated as the difference between the healthy and faulty unit measurements. Next, essential features are extracted from the residuals that capture the essence of the data for fault isolation. Once features are extracted, model training and tuning start. Once the model is trained, tuned, and tested, the final model is used to evaluate the performance of the proposed scheme in isolating faults in various scenarios. Further details of the proposed scheme are described in Sections 3.1-3.4.
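The flow just described (features → PCA reduction → classifier) can be sketched as a scikit-learn pipeline. The synthetic data, component counts, and kernel below are illustrative placeholders, not the paper's tuned configuration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes, n_features = 4, 20            # stand-ins for fault scenarios / extracted features
X = rng.normal(size=(400, n_features))
y = rng.integers(0, n_classes, size=400)
X += y[:, None] * 0.5                    # shift classes apart so the toy data are separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature reduction (PCA) followed by the classifier, mirroring Figure 1.
model = make_pipeline(StandardScaler(), PCA(n_components=5), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

In the paper's setting, `X` would hold the features extracted from the residual time series and `y` the fault scenario labels of Table 1.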

Data Acquisition
The raw data are acquired from a satellite telemetry system or a satellite mathematical model. In this study, a high-fidelity satellite model with four CMG units is used to generate the required data described in Section 5. The raw data comprise satellite attitude quaternions, angular speeds, and the CMG gimbal angles. The data are stored in a time-series format, with each set representing one of the fault scenarios shown in Table 1. There is a total of 16 scenarios. Scenario 0 represents the system without any fault. Scenarios 1 to 15 represent the system with one, two, three, or four faulty units. The raw data are used to calculate the residuals. Residuals represent the difference between the system outputs in nominal and faulty conditions and can be calculated using:

r^m_k = y^m_k − y^0_k    (4)

where y^m represents the system measurable states/outputs for faulty model m, m denotes the desired fault scenario, y^0 is the system states/outputs for a healthy model, and k is the measurement time step.
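A minimal sketch of the residual computation in Equation (4); the array layout (rows = time steps k, columns = measured outputs) is an assumption for illustration:

```python
import numpy as np

def residuals(y_faulty, y_healthy):
    """Equation (4): r_k^m = y_k^m - y_k^0, computed over a whole time series.
    Rows are time steps k, columns the measured states/outputs."""
    return np.asarray(y_faulty, dtype=float) - np.asarray(y_healthy, dtype=float)

r = residuals([[1.0, 2.0], [3.0, 4.0]],
              [[1.0, 1.5], [2.0, 4.0]])
# r is zero wherever the faulty and healthy outputs coincide
```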

Feature Extraction
The features are extracted from the residual time series. Feature selection/reduction methods are used to reduce the extracted features while looking for the most representative features. There are various methods for feature extraction/reduction/selection that are described in Section 5.6. Then, the chosen feature set is split into training and testing subsets that are fed into the machine learning model.

Machine Learning Model Selection
The machine learning model is developed for the classification of the data. A variety of suitable machine learning methods are described in Section 5.7. Fault scenarios are used as labels, and as each instance of the input feature sets belongs to a specific fault scenario, the developed machine learning model aims to predict the true label for every instance of the input feature set. This is achieved by training the model on the available feature sets with known labels and then testing and tuning the model.

Training, Testing, and Tuning the Model
The training portion of the feature sets is used to train the machine learning model. Then, the model is tested with the test portion of the feature sets, and finally, the optimum values for the model hyperparameters are obtained through an optimization process to avoid over- or under-fitting. Table 2 shows the time complexity of the machine learning models used in this work. In this table, n is the number of training samples, p is the number of features, n_sv is the number of support vectors, n_trees is the number of trees, d is the maximum depth of the trees, n_epoch is the number of epochs, and n_li is the number of neurons in layer i.
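The tuning step can be sketched with a cross-validated grid search over the SVM hyperparameters; the grid values and toy data below are illustrative, not the settings used in the paper:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy binary fault/no-fault labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Cross-validated search over C and gamma; cross-validation on the training
# split guards against over-fitting the hyperparameters to one data split.
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10],
                                "gamma": ["scale", 0.1, 1.0]},
                    cv=3)
grid.fit(X_tr, y_tr)
best_model = grid.best_estimator_
test_acc = best_model.score(X_te, y_te)   # held-out test accuracy
```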

Complexity analysis of neural networks is not straightforward, although [22,23] provide some insights into this analysis. The SVM algorithms include solving a constrained quadratic programming problem that is equivalent to calculating the inverse of an n × n matrix, which has a complexity of O(n^3). In [24], an extended time complexity analysis is given for the different steps of implementing an SVM classifier. The time complexity of training a gradient boosting machine is O(n d n_trees log n), and prediction for a new sample takes O(p n_trees) [25]. Assuming trees are free to grow to a maximum height of O(log n), the training time complexity for random forest is O(n p n_trees log n), and prediction of a new sample takes O(p n_trees) [26].

Application Case Study: Satellite with Four CMGs
In this section, a satellite with four CMGs is used to evaluate the performance of the proposed fault isolation scheme. Figure 2 shows the CMGs assembly in a pyramid configuration. A high-fidelity satellite mathematical model and simulator [27] are used in this work, as shown in Figure 3. The components of this simulator are described in the following sections.

Satellite Dynamics and Kinematics
Dynamics and kinematics for the satellite are used to calculate the required outputs from the input control torque. The dynamics equation of a satellite with momentum wheels onboard can be expressed as [27]:

\dot{H}^B_{BI} + ω^B_{BI} × H^B_{BI} = τ_e    (5)

where ω^B_{BI} is the satellite's angular speed relative to the inertial frame expressed in the body frame, τ_e ∈ ℝ^{3×1} is the external torque, and H^B_{BI} is the total angular momentum of the satellite. H^B_{BI} can be expressed as:

H^B_{BI} = J ω^B_{BI} + h_{CMG}    (6)

where J = J_s − A J_w A^T, in which J_s ∈ ℝ^{3×3} is the satellite's moment of inertia including the CMGs, and J_w ∈ ℝ^{4×4} = diag([J_{w1}, J_{w2}, J_{w3}, J_{w4}]) denotes the moment of inertia of the CMGs in the axial direction. The torques provided by the CMGs are transformed into the satellite body axes by the transformation matrix A. Substituting (6) into (5) and expressing h_{CMG} for the CMGs results in:

J \dot{ω}^B_{BI} = −ω^B_{BI} × (J ω^B_{BI} + h_{CMG}) − \dot{h}_{CMG} + τ_e    (7)

where h_{CMG} is the CMG momentum and \dot{h}_{CMG} is its derivative. The kinematic equations of the satellite can be expressed as:

\dot{q}_v = (1/2)(q^×_v + q_4 I) ω,  \dot{q}_4 = −(1/2) q^T_v ω    (8)

where q = [q^T_v, q_4]^T is the unit quaternion; q_4 ∈ ℝ and q_v = [q_1, q_2, q_3]^T ∈ ℝ^{3×1} denote the Euler parameters expressing the orientation of the satellite body frame with regard to the orbital frame, with q^T_v q_v + q^2_4 = 1. I ∈ ℝ^{3×3} is the identity matrix, and q^×_v is the skew-symmetric matrix representation of the quaternion vector.
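The quaternion kinematic equations above can be checked numerically; the function names below are hypothetical helpers, not from the simulator of [27]:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix v^x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def quat_rates(qv, q4, omega):
    """Quaternion kinematics: qv_dot = 0.5*(qv^x + q4*I) @ omega,
    q4_dot = -0.5 * qv.T @ omega."""
    qv_dot = 0.5 * (skew(qv) + q4 * np.eye(3)) @ omega
    q4_dot = -0.5 * qv @ omega
    return qv_dot, q4_dot

# Rates at the identity attitude (qv = 0, q4 = 1) under a small body rate
# about the z axis: only q3 changes, and the unit-norm constraint is preserved
# to first order since q4_dot = 0 there.
qv, q4 = np.zeros(3), 1.0
omega = np.array([0.0, 0.0, 0.01])
qv_dot, q4_dot = quat_rates(qv, q4, omega)
```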

Controller and Steering Logic
The desired attitude q_d ∈ ℝ^{4×1} and angular speed ω_d ∈ ℝ^{3×1} are attained by a simplified nonlinear sliding mode controller [27]. The error terms for quaternion tracking are the error quaternion vector part q_e and scalar part q_4e, which satisfy q^T_e q_e + q^2_4e = 1. The rotation matrix C_e = C(q_e, q_4e) is obtained from the error quaternion, and ω_e ∈ ℝ^{3×1} is the relative angular speed. Considering these error definitions, the sliding manifold can be obtained from:

s = ω_e + λ sgn(q_4e) q_e

where λ > 0 is the gain for the sliding manifold and sgn(q_4e) is the sign function of q_4e. Finally, the control command fed to the system is computed from the sliding manifold, where p_0 is a positive constant. In this work, all the parameters for the controller are set as in [27], with λ = 1 following the values shown in [27] and p_0 = 0.1 based on the simulation outcomes. As the CMGs have a gimballing action, an extra component known as the steering logic is needed for the controller. The steering logic converts the torque required by the controller into the gimbal angle rates needed to generate that torque with the CMGs. In general, the CMG angular momentum is a function of the CMG gimbal angles, δ = (δ_1, ..., δ_n), and the flywheel angular speeds, Ω = (Ω_1, ..., Ω_n):

h_{CMG} = h(δ, Ω)

where n is the number of CMGs. One CMG steering logic approach is to use the differential relationship between the gimbal angles and the CMG momentum vector. For such a method, the derivative of h is obtained as:

\dot{h}_{CMG} = A_{CMG} \dot{δ}

where A_{CMG} = A_{CMG}(δ) ∈ ℝ^{3×n} = ∂h_{CMG}/∂δ is the Jacobian matrix (19). The gimbal rates can then be calculated using Equation (21). First, h_{CMG} is calculated based on the CMG configuration. For the pyramid configuration [27], h_{CMG} is the sum of the angular momenta h_i of the individual CMGs expressed in the reference frame of the satellite, where δ_i are the gimbal angles, Ω_i the flywheel angular speeds, and h_0i the momentum magnitude of the ith CMG. The derivative of the CMG angular momentum with respect to time then follows from the chain rule, with \dot{δ} the gimbal angle rate vector. For a given control torque τ_c, the torque command of the CMGs, \dot{h}, is selected as \dot{h} = −τ_c. The gimbal rate command is then:

\dot{δ}_{cmd} = A^+_{CMG} \dot{h},  A^+_{CMG} = A^T_{CMG} (A_{CMG} A^T_{CMG})^{−1}    (21)

where A^+_{CMG} is the pseudoinverse steering logic; most CMG steering logics determine the gimbal rate commands with variations of it.
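The pseudoinverse steering logic can be sketched directly; the 3 × 4 Jacobian below is a toy full-row-rank matrix, not the pyramid-configuration Jacobian of the paper:

```python
import numpy as np

def gimbal_rate_command(A_cmg, h_dot_cmd):
    """Pseudoinverse steering logic: delta_dot = A^T (A A^T)^-1 h_dot.
    A_cmg is the 3 x n Jacobian of the CMG momentum w.r.t. gimbal angles."""
    A_pinv = A_cmg.T @ np.linalg.inv(A_cmg @ A_cmg.T)
    return A_pinv @ h_dot_cmd

# Toy 3 x 4 Jacobian (full row rank) and a commanded momentum rate.
A = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
delta_dot = gimbal_rate_command(A, np.array([0.1, 0.0, 0.0]))
# By construction, A @ delta_dot reproduces the commanded momentum rate.
```

The pseudoinverse gives the minimum-norm gimbal rate vector that realizes the commanded momentum rate, which is why most steering logics are built as variations of it.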

Actuators
As the critical components of any satellite's ACS, the actuators provide the torque required to control the satellite attitude. In this model, four CMGs are used as actuators. A CMG is a reaction wheel capable of changing its angular momentum direction by gimballing the spinning rotor. The CMGs receive the gimbal rate command as input and provide the required control torque for the satellite.

Fault Injection
In order to inject faults into the system, a fault parameter matrix F = diag([f_p1, f_p2, f_p3, f_p4]) is formed and multiplied by A_CMG in Equation (19) to form the faulty matrix A_CMG_f = A_CMG F, where f_pi denotes the fault severity in the ith CMG, and its value can range from 0 for a fully failed unit to 1 for a fully functional unit. The fault isolation aims to identify the faulty units using the outputs of the satellite, including its quaternions [q_1, q_2, q_3], angular speeds [ω_1, ω_2, ω_3], and gimbal angles [δ_1, δ_2, δ_3, δ_4], or a combination of these outputs, as discussed in Section 6.6.3.
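A sketch of the fault injection, assuming the fault parameter matrix is the diagonal matrix of severities applied to the columns of A_CMG (a reading consistent with the text, not the paper's exact implementation):

```python
import numpy as np

def inject_faults(A_cmg, fp):
    """Scale the columns of the CMG Jacobian by the per-unit fault severities:
    A_f = A_cmg @ diag(fp), with fp_i in [0, 1] (0 = fully failed, 1 = healthy)."""
    return A_cmg @ np.diag(fp)

# Toy Jacobian; the severities are those of the Figure 4 sample scenario.
A = np.ones((3, 4))
A_f = inject_faults(A, [0.71, 1.0, 1.0, 0.176])
# Column i of A_f is column i of A scaled by fp_i, so a failed unit (fp = 0)
# contributes no torque at all.
```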

Raw Data
The simulator with its components described in Sections 5.1-5.3 is used to generate the raw data. The first step is to define the required parameters for the desired fault scenarios in Table 1. Each simulation requires seven inputs: the scenario number from Table 1, the values of f_pi, i = 1, 2, 3, 4, the fault inception, and the fault duration. Next, the simulation factors are fed to the simulator depicted in Figure 3. Once the simulation is complete, the required outputs [q_1, q_2, q_3, q_4, ω_1, ω_2, ω_3, δ_1, δ_2, δ_3, δ_4]_h ∈ ℝ^{1×11} for the nominal system and [q_1, q_2, q_3, q_4, ω_1, ω_2, ω_3, δ_1, δ_2, δ_3, δ_4]_f ∈ ℝ^{1×11} for the faulty system are stored as time series. The raw data include 20,000 simulation sets for each fault scenario in Table 1. As there are 16 scenarios, in total, there are 320,000 datasets, each stored in a comma-separated values (CSV) file. As the data related to q_4 are not independent of q_1, q_2, q_3, they were discarded in this work. Each time series has a time length of 200 s. Twenty-two columns and 2000 rows are stored in each CSV file: with a simulation time step of 0.1 s over the 200 s length, 2000 rows are generated, and the number of columns is 11 × 2 = 22 for the two sets of 11 parameters. Figure 4 shows a sample of the raw data used in this work, generated with the simulation parameters: fault scenario = 7, f_p1 = 0.71, f_p2 = 1, f_p3 = 1, f_p4 = 0.176, fault inception = 13.2 s, and fault duration = 38.9 s. The raw data are used to calculate the residuals and for feature extraction and selection/reduction.

Feature Engineering

Residuals are calculated for each instance of the raw data related to each fault scenario using Equation (4). Figure 5 shows a sample of the residual data used in this work, corresponding to the Figure 4 simulation. In Figure 5, the distinct difference during the fault period confirms the suitability of using residuals for fault detection and isolation. The residuals are used to extract the features. In an attempt to find a feature set that best represents the desired fault scenarios, different methods are used for feature extraction in this work, including wavelet packet transform (WPT) [28], multi-domain analysis [29], correlation analysis [30], cross-correlation analysis [31], and multi-correlation analysis [32]. The WPT and multi-domain analysis features are used to discover almost any pattern that can be present in time-series data, including variations in the shape of the data, amplitude changes over a short period, and changes in data frequency. These two methods are considered univariate analyses, as they extract the features from each time series individually and do not consider any possible relation between any two sets of the time series.
Therefore, as there is more than one fault simultaneously active in the CMG units in this study, the residuals from the satellite outputs can have complicated relations with each other. For example, they can be correlated with each other differently in each possible scenario. Thus, multi-variable analysis techniques are needed to handle this issue. Based on this assumption, correlation, cross-correlation, and multi-correlation analysis are also chosen for feature extraction in this study. For the correlation analysis, the Pearson correlation coefficient is calculated between each pair of the residual data using [32]:

ρ_ij = C_ij / sqrt(C_ii C_jj)    (24)

in which C_ij is the covariance of r_i and r_j, where r denotes the residual calculated using (4), and C_ii and C_jj are the variances of r_i and r_j, respectively. Cross-correlation analysis and feature extraction are done based on the method used in [31], and the details are not repeated here. Multi-correlation analysis is the same as the correlation coefficient calculation, except that it is calculated for each set of three residuals and represents the correlation among the three parameters, calculated using [32]:

ρ_{i·jk} = sqrt((ρ_ij^2 + ρ_ik^2 − 2 ρ_ij ρ_ik ρ_jk) / (1 − ρ_jk^2))    (25)

where ρ is the correlation coefficient, as shown in (24). Feature reduction/selection aims at finding the most representative features to improve the model performance while reducing its time complexity. Different methods of feature reduction/selection have been used in the literature [2]. In this work, PCA [33], recursive feature elimination [34], and the feature importance method [35] are used for this purpose. Finally, the chosen features are used for training and testing the model. Results of the different methods are discussed in Section 6.
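The pairwise Pearson-coefficient features can be computed with `numpy.corrcoef`; the residual channels below are synthetic stand-ins for the residual time series:

```python
import numpy as np

def correlation_features(R):
    """Pairwise Pearson coefficients between residual channels.
    R has shape (timesteps, channels); returns the upper-triangular
    coefficients as a flat feature vector, one entry per channel pair."""
    C = np.corrcoef(R, rowvar=False)      # channels x channels matrix
    iu = np.triu_indices_from(C, k=1)     # indices above the diagonal
    return C[iu]

# Three synthetic residual channels: two identical sinusoids and one noise
# channel, so the first pairwise coefficient is 1.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 200.0, 2000)
R = np.column_stack([np.sin(t), np.sin(t), rng.normal(size=t.size)])
feats = correlation_features(R)           # 3 channels -> 3 pairwise features
```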

Machine Learning Model
Various machine learning approaches have been used for classification purposes in the literature [2]. In this work, SVM [29], neural networks [12], random forest [36], and gradient boosting [37,38] are used for classification. In addition to the models mentioned above, different classification approaches, including multi-label classification, multi-step classification, and ensemble learning, are used in this work to improve the performance of the proposed scheme.
The rationale for using the multi-label approach is that the fault scenarios include cases with more than one active fault, and the set of faulty unit numbers can be used as a label instead of the scenario number. For example, for scenario number 11, it is possible to use the array [1,2,3] as the label instead of merely using 11. The multi-label method is implemented using the LabelPowerset class [35]. This class transforms a multi-label problem into a multi-class problem with one multi-class classifier trained on all unique combinations of labels. The method maps each combination to a unique combination ID number and conducts multi-class classification using the classifier with the combination IDs as classes [35].
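The label powerset transformation can be reproduced in a few lines; this is a sketch of the combination-to-ID mapping idea, not the cited package's implementation:

```python
def label_powerset(label_sets):
    """Map each unique combination of faulty-unit labels (e.g. (1, 2, 3))
    to a single class ID, turning a multi-label problem into multi-class."""
    combo_to_id = {}
    ids = []
    for combo in map(tuple, label_sets):
        if combo not in combo_to_id:
            combo_to_id[combo] = len(combo_to_id)  # assign IDs in first-seen order
        ids.append(combo_to_id[combo])
    return ids, combo_to_id

# Toy multi-label targets: faulty-unit sets from four instances.
y_multi = [[1], [1, 2], [1, 2, 3], [1, 2]]
y_ids, mapping = label_powerset(y_multi)   # y_ids can feed any multi-class classifier
```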
Multi-step classification is implemented by dividing the problem into first finding the number of active faults and then using different classifiers for cases that belong to a different number of faults. Figure 6 shows the proposed method for multi-step classification. In step 1, the label set is [1,2,3], which is the possible combination of active faults. In step 2, three classifiers are trained. Each classifier only deals with the cases with the same number of active faults. The rationale behind using a multi-step approach is to narrow down the problem and solve it in a hierarchical approach. When using a multi-step approach, the resources are only allocated to finding the number of active faults using the residual discrepancy, and once the number of active faults is identified, in the second step, the resources are used to find which units are faulty. This approach can help use computational resources more efficiently by isolating the portion of the problem being solved (i.e., number of faults vs. source of faults) instead of trying to solve both problems at once.
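The two-step idea above can be sketched as follows, under stated assumptions: SVC as the base classifier at both steps, and a synthetic dataset where fault counts and scenario IDs are toy values:

```python
import numpy as np
from sklearn.svm import SVC

class TwoStepIsolator:
    """Step 1 predicts the number of active faults; step 2 dispatches to a
    per-count classifier that predicts the scenario (which units are faulty)."""
    def __init__(self):
        self.counter = SVC()      # step 1: number of active faults
        self.per_count = {}       # step 2: one classifier per fault count

    def fit(self, X, n_faults, scenario):
        self.counter.fit(X, n_faults)
        for c in np.unique(n_faults):
            mask = n_faults == c
            self.per_count[c] = SVC().fit(X[mask], scenario[mask])
        return self

    def predict(self, X):
        counts = self.counter.predict(X)
        out = np.empty(len(X), dtype=int)
        for c in np.unique(counts):
            mask = counts == c
            out[mask] = self.per_count[c].predict(X[mask])
        return out

# Toy data: 4 scenarios; scenarios {0, 1} have one active fault, {2, 3} two.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))
scenario = rng.integers(0, 4, size=120)
X = X + scenario[:, None]          # shift classes apart so the toy data separate
n_faults = scenario // 2 + 1
model = TwoStepIsolator().fit(X, n_faults, scenario)
pred = model.predict(X)
```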
In the next section, the proposed fault isolation scheme is applied to a case study to evaluate its performance.

Results and Discussion
In this section, the proposed fault isolation scheme is used on the satellite with four CMGs to find the optimum choices for each step of the method and evaluate the performance of the optimized scheme. The proposed scheme was run using a PC comprised of an Intel ® Core™ i7-4790 CPU with a processing power of 3.6 GHz, 8 MB cache, and 8 GB of RAM. The evaluation includes using different feature extraction methods, feature reduction/selection, and machine learning to find the optimum method for each step. It also includes evaluating the performance of the optimized scheme for the test data and performing the sensitivity analysis to ensure that the scheme is suitable for real applications.

Feature Extraction
By discarding q4, as explained in Section 5.5, 10 outputs of the satellite, namely [q1, q2, q3, ω1, ω2, ω3, δ1, δ2, δ3, δ4], are used for feature extraction. Feature extraction is performed using different methods to find the most suitable one for this application. Table 3 shows the performance of the proposed scheme when different methods are used for feature extraction. The results show that the correlation analysis features provide the best performance, at 85.5%, for the proposed method. Therefore, this set of features is used in the next step: feature reduction/selection.

Feature Reduction/Selection
Table 4 shows the results of applying the proposed scheme with different feature reduction/selection methods. The results show that PCA provides the highest score (92.9% for input and 92.2% for output) after feature reduction. Thus, PCA is selected for feature reduction. The reduction aims to keep the features that represent 99% of the total variance of all features. Figure 7 shows the explained variance of the features; 99% of the variance is reached with 25 out of 45 components.

Machine Learning Method Selection
Table 5 shows the results of using different machine learning methods. As the results show, the SVM model has the best performance compared to the neural networks, the gradient boosting machines, and the random forest. Table 6 shows the results of applying different classification approaches.
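The 99% cumulative-variance criterion used above for the PCA reduction can be sketched as follows (the variance ratios below are illustrative, not the paper's actual spectrum):

```python
# Choose the smallest number of principal components whose cumulative
# explained-variance ratio reaches a target (99% in this work).
def n_components_for_variance(explained_ratios, target=0.99):
    total = 0.0
    for k, r in enumerate(explained_ratios, start=1):
        total += r
        if total >= target:
            return k
    return len(explained_ratios)

# Illustrative ratios for a 4-component toy problem:
ratios = [0.6, 0.3, 0.08, 0.02]
print(n_components_for_variance(ratios))  # 4
```

Applied to the actual 45-feature spectrum, this criterion retains 25 components, as reported above.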
The SVM method is used as the classification algorithm in both the multi-label and multi-step approaches. As Table 6 shows, neither of these two approaches outperforms the traditional machine learning methods shown in Table 5. Based on the results in Tables 5 and 6, the SVM is selected as the most suitable method for the remaining analyses in this paper. Sections 6.4-6.6 further explore the performance of the proposed method with the SVM as the machine learning method in the overall process.

Validation/Learning Curves for the SVM Model
The optimum values and choices for the SVM model hyper-parameters are found using a grid search. Table 7 shows the search domain and the optimum value/choice for each hyper-parameter. The coefficient C is the penalty factor used for the regularization of the model; it balances training accuracy against model simplicity. A small C makes the decision surface smooth, while a large C aims at classifying all of the training samples correctly [35]. The gamma coefficient is applicable for the "poly", "rbf" (radial basis function), and "sigmoid" kernels. Gamma governs the influence that each training example has on the model; as gamma increases, only the closer samples affect the model [35]. The degree is only applicable for the polynomial kernel, and as "rbf" is selected as the optimum kernel through the grid search for this study, the degree does not apply here. Figure 8 shows the learning curve for the SVM model: the training score and the testing score, calculated by cross-validation, are shown versus the number of training samples. As the number of samples increases, the training score decreases while the testing score increases up to a specific point, 175,000 samples, and then plateaus. Based on this, having 240,000 total samples is sufficient when the train/test split ratio is chosen as 70% to 30%.
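The grid search over C and gamma can be sketched as follows (pure Python; the toy scoring function stands in for a cross-validated SVM accuracy, and its peak location is purely illustrative):

```python
from itertools import product

# Grid-search sketch over the SVM hyper-parameters discussed above.
# `score` is a stand-in for a cross-validated accuracy; in practice it
# would train and evaluate an RBF-kernel SVM for each (C, gamma) pair.
def grid_search(c_values, gamma_values, score):
    best = max(product(c_values, gamma_values), key=lambda cg: score(*cg))
    return {"C": best[0], "gamma": best[1]}

# Toy score peaking at C=100, gamma=0.01 (illustrative only):
toy_score = lambda C, g: -abs(C - 100) - 1000 * abs(g - 0.01)
print(grid_search([1, 10, 100], [0.001, 0.01, 0.1], toy_score))
# {'C': 100, 'gamma': 0.01}
```

Every candidate pair is scored exhaustively, which is exactly why the search domain in Table 7 has to be kept modest.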

Table 8 lists the results of the model's five-fold cross-validation. It can be observed from Table 8 that the chosen model achieves a high score in every fold while the standard deviation (SD) remains low. A high mean score with a low standard deviation means that no over- or under-fitting has occurred and that the scores for the different folds are very close to each other. Figure 9 shows the SVM model's validation curves for gamma and C. These curves confirm that the selected parameter values for the model are optimal.

Confusion Matrix for the SVM Model
The confusion matrix for the test data is presented in Table 9 to evaluate the model performance in more detail per scenario. The results are normalized by the number of instances tested per class; therefore, the table values represent percentages of the overall data. The values on the diagonal depict the percentage of instances predicted correctly. The scenario in which all units are healthy (scenario 0) has 100% accuracy, and the next four scenarios (scenarios 1 to 4), which correspond to cases with a single faulty unit, have an average accuracy of 99%. However, as the number of faulty units increases, the model performance degrades.

In cases with two concurrently faulty units (scenarios 5 to 10), the average accuracy drops to 97.8%. This pattern continues with an average accuracy of 91.4% for cases with three faulty units (scenarios 11 to 14) and 77% for the case with four faulty units (scenario 15). Therefore, the model performance degrades as more faults are present simultaneously. This behavior can be explained by the overlap between cases with more than one active fault and other cases. For example, scenario 13 has three active faults, in CMG units 1, 3, and 4, as shown in Table 1. This scenario overlaps with scenarios 7, 10, and 15, in which the faulty units are (1,4), (3,4), and (1,2,3,4), respectively. As Table 9 depicts, these scenarios account for the highest percentage of incorrectly predicted labels for scenario 13. This rationale extends to other similar scenarios.
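The per-class normalization used for Table 9 can be sketched as follows (pure Python, with a hypothetical two-scenario count matrix):

```python
# Normalize confusion-matrix rows by the number of tested instances per
# class, so the diagonal gives each scenario's accuracy.
def normalize_rows(confusion):
    normalized = []
    for row in confusion:
        total = sum(row)
        normalized.append([c / total for c in row] if total else list(row))
    return normalized

# Toy example: 99 of 100 scenario-0 cases predicted correctly,
# 90 of 100 scenario-1 cases predicted correctly.
cm = [[99, 1], [10, 90]]
print(normalize_rows(cm))  # [[0.99, 0.01], [0.1, 0.9]]
```

Row-wise normalization makes scenarios with different test-set sizes directly comparable, which is what allows the per-scenario accuracies above to be read straight off the diagonal.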

Sensitivity Analysis of the SVM Model
In this section, a comprehensive sensitivity analysis for the proposed model is presented. The model's sensitivity is evaluated for noise, missing sensors (e.g., due to sensor failure), and missing measurements (e.g., due to sensor fault) to ensure robustness.

Number of Active Faults
There are 16 different scenarios considered in this study, as shown in Table 1. Table 10 shows the results for subsets of all 16 scenarios, including one active fault, two active faults, three active faults, and combinations of these. As the results show, the model has 100% accuracy for the scenarios with only one active fault. The accuracy drops gradually as the maximum number of active faults (MNOAF) increases to four.

Noise
Noise has been added to the raw data at different signal-to-noise ratio (SNR) levels to study the effect of noisy raw data on the model performance. The added noise in this study is Gaussian with zero mean. Table 11 shows the results for different SNR levels. The results show that the model performance degrades as the SNR decreases. It should be noted that the model maintains a reasonable score when the SNR is above 50 dB, which is the case in most practical applications.

Missing Sensors
The satellite attitude parameters and the CMG gimbal angles, which are used as raw data in this work, represent sensor readings from the satellite. In practical applications, there may be circumstances where some of these sensors malfunction or fail. Here, the cases where one or more sensors have failed and no data are available from them are studied. Table 12 shows the results for different combinations of failed sensors. As Table 12 shows, the model accuracy degrades when one or more sensors fail. However, in cases where six or more of the 10 sensors function properly, the model performance remains reasonable for real applications.
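The SNR-based noise injection used in this sensitivity analysis can be sketched as follows (pure Python; the sinusoidal trace is a hypothetical stand-in for a sensor reading):

```python
import math
import random

# Add zero-mean Gaussian noise to a signal at a target SNR (in dB):
# noise power = signal power / 10^(SNR/10).
def add_noise(signal, snr_db):
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + random.gauss(0.0, noise_std) for s in signal]

# Hypothetical "gimbal angle" trace; the achieved SNR lands near 50 dB.
random.seed(0)
clean = [math.sin(0.01 * t) for t in range(10_000)]
noisy = add_noise(clean, snr_db=50.0)
noise_power = sum((n - s) ** 2 for n, s in zip(noisy, clean)) / len(clean)
signal_power = sum(s * s for s in clean) / len(clean)
print(10 * math.log10(signal_power / noise_power))  # close to 50
```

The same routine, run at a sweep of target SNR values, produces the perturbed datasets behind a table like Table 11.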

Missing Values
It is common for sensory data to contain missing values due to faults in communication channels or sensor components. In this section, the model performance is evaluated in the presence of missing data. The original raw data used in this study do not contain any missing values; hence, missing values are introduced into the original dataset at different percentages to conduct this analysis. To reconstruct the missing data, linear interpolation is used to impute the missing values before calculating the residuals and extracting the features. Table 13 shows the results for different missing measurement percentages (MMP). The model score drops as the percentage of missing values increases; however, it remains reasonable for 10% or fewer missing values.
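The linear-interpolation imputation step can be sketched as follows (a minimal pure-Python version; gaps are marked with None, and values missing at the ends of the series, with no known neighbour on one side, are left as-is):

```python
# Linear-interpolation imputation sketch: fill runs of missing values
# (None) by interpolating between the nearest known neighbours.
def interpolate_missing(values):
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

print(interpolate_missing([1.0, None, None, 4.0, 5.0]))
# [1.0, 2.0, 3.0, 4.0, 5.0]
```

Imputation is applied before residual calculation so that the downstream feature extraction always sees complete time series.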

Conclusions
A data-driven fault isolation scheme was presented in this work for isolating multiple in-phase CMG faults onboard a three-axis stabilized satellite. To achieve this goal, various methods were considered for each step of the data-driven method development, and the performance of each method was compared with the others to select the most suitable method for that step. Where possible, grid search optimization was conducted to fine-tune the selected methods' hyper-parameters and obtain an optimum scheme among the evaluated methods for the problem outlined in this paper. A case study with real-life satellite model parameters was considered to further evaluate the performance of the optimized fault isolation scheme. The case study included a comprehensive sensitivity analysis of the robustness of the proposed scheme toward various uncertainty sources, including noisy data, missing values, and missing sensors. The results showed that the proposed scheme can isolate the faulty CMG units for the different possible fault scenarios with reasonable accuracy, and the sensitivity analysis indicated that the scheme is robust enough to be used in real applications. One of the main limitations of this work is the fault injection approach, where faults are injected as fault parameters that directly capture the effectiveness of the actuator. In reality, faults occur in the system parameters of the actuators and result in non-optimal effectiveness of the actuator unit. In future work, the authors plan to inject faults into the actuator system parameters instead of multiplying the actuator outputs by an effectiveness factor, to better represent the real-life mechanisms under which faults emerge in mechanical systems. Furthermore, access to real satellite data can further improve the quality and validity of the proposed method.
Finally, the multi-step fault isolation approach will be further investigated to improve its accuracy.