Article

Machine Learning Approaches for Data-Driven Self-Diagnosis and Fault Detection in Spacecraft Systems

Department of Aerospace Science and Technology, Politecnico di Milano, Via La Masa 34, 20156 Milan, Italy
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(14), 7761; https://doi.org/10.3390/app15147761
Submission received: 10 June 2025 / Revised: 1 July 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

Abstract

Ensuring the reliability and robustness of spacecraft systems remains a key challenge, particularly given the limited feasibility of continuous real-time monitoring during on-orbit operations. In the domain of Fault Detection, Isolation, and Recovery (FDIR), no universal strategy has yet emerged. Traditional approaches often rely on precise, model-based methods executed onboard. This study explores data-driven alternatives for self-diagnosis and fault detection using Machine Learning techniques, focusing on spacecraft Guidance, Navigation, and Control (GNC) subsystems. A high-fidelity functional engineering simulator is employed to generate realistic datasets from typical onboard signals, including sensor and actuator outputs. Fault scenarios are defined based on potential failures in these elements, guiding the data-driven feature extraction and labeling process. Supervised learning algorithms, including Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs), are implemented and benchmarked against a simple threshold-based detection method. Comparative analysis across multiple failure conditions highlights the strengths and limitations of the proposed strategies. Results indicate that Machine Learning techniques are best applied not as replacements for classical methods, but as complementary tools that enhance robustness through higher-level self-diagnostic capabilities. This synergy enables more autonomous and reliable fault management in spacecraft systems.

1. Introduction

The management of onboard systems and health status monitoring of a spacecraft is a pivotal but extremely broad aspect of satellite design, and one that is being reshaped as satellites evolve and autonomy increases. Onboard Fault Detection, Isolation, and Recovery (FDIR) provides supervision and control of the satellite's behavior under unexpected situations and malfunctions. The main design philosophies currently adopted for carrying out anomaly identification onboard spacecraft are model-based and data-driven. Model-based techniques exploit an "analytical redundancy" of the onboard subsystems: by running an onboard model in parallel with real-time operations as they are performed, the spacecraft relies on a duplicated virtual version of part of itself. By simultaneously checking the outputs of real elements against their simulated versions, residuals are periodically calculated, making it possible to establish whether any discrepancy from the expected scenario has occurred. Analytical redundancy adds a layer of cross-checks without adding hardware components to the system: the result of these checks must then be translated into a decision function that triggers the available recovery actions, if necessary [1]. In the context of CubeSats, Lobo et al. (2019) built an FDIR framework based on analytical redundancy and residual-based logic for fault isolation [2]. For similar live monitoring of time-varying signals, in a more recent work Xu et al. (2025) developed a model-based approach that uses input-compensated recursive least squares and disturbance observers for robust inverter fault diagnosis through parameter estimation [3]. Data-driven algorithms, on the other hand, can work with live acquired data, ranging from housekeeping parameters to raw sensor output, to notify the spacecraft and flight engineers when some anomalous behavior is detected [4]. By exploiting knowledge of the process history, it is possible to notice novel or unusual behavior even without an explicit model of the system. Both of these philosophies come with potential weaknesses: one drawback of model-based techniques is that they require sufficiently accurate dynamic models, which must be designed and optimized, and which end up consuming a considerable amount of onboard computational resources. Data-driven approaches attempt to overcome precisely this issue while maintaining highly reliable detection. However, the focus must then shift to the quality and relevance of the training data, and the selection of the statistical knowledge used to monitor the system must be extremely well targeted.
Specifically, in the realm of fault detection, the adoption of data-driven methods is widespread and spans many different fields: surveys such as Chen et al. (2023) gathered useful and reliable diagnostic methods from recent years, in that specific case with a special focus on HVAC systems [5]. In the current study, attention is drawn to two very common (supervised) Machine Learning approaches for classification, with meaningful research heritage that hints at potential cutting-edge space applications: Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs). In industrial mechanics, SVMs are already quite popular, with a few examples focusing on fault detection through feature extraction. For instance, Rauber et al. (2014) focused on bearing fault diagnosis [6], Jan et al. (2017) confirmed the very same pool of extracted features for sensor fault diagnosis through SVMs [7], while Samanta (2004) employed a further optimization of extracted features through a genetic algorithm [8]. In the space domain, examples are fewer, at least as far as fault detection is concerned: Gao, Yu et al. (2012) employed principal component analysis and SVMs to classify output data from sensors and actuators, with promising results in all explored cases [9], while Ding et al. (2021) focused on spacecraft leakage detection through SVMs, also applying a feature ranking algorithm [10]. Many of these strategies employ sophisticated reduction methods to shrink the input vector, but most do not involve Model-in-the-Loop (MIL) testing, apply classification only offline, or do not compare a broad range of SVM architectures side by side. Support Vector Machines are relatively easy to comprehend intuitively and can achieve high efficiency thanks to the use of kernels [11]. They are inherently binary classifiers, but through structured architectures they have also become extremely widespread for multi-class categorization [12].
Artificial Neural Networks are slightly less straightforward to understand theoretically, and their hyper-parameters suffer from a lack of physically meaningful interpretation [13]; despite this, their popularity is rapidly increasing due to their high performance in pattern recognition and classification, especially when a feature extraction mechanism is used to select their inputs, much like SVMs. Sorsa et al. (1991) showed 10-class fault classification on a realistic continuous stirred tank reactor system through ANNs [14], Samanta and Al-Balushi (2001) focused once again on fault diagnostics for rolling element bearings with only five extracted features [15], while Ma et al. (2023) used networks to apply both detection and diagnosis on high-speed train air brake pipes [16]. The variety of layer structures and node types is incredibly wide, and the topic is therefore still under close study and heavy experimentation [17]: space applications are many and diversified, and extremely high accuracy in classification problems can be achieved with a multitude of layer setups. Valdes et al. (2009) developed a dynamic Neural Network Fault Detection and Isolation (FDI) framework for pulsed plasma thrusters, with hints at an integrated scheme combining high- and low-level FDI [18], O’Meara et al. (2018) used different Neural Network architectures to perform automatic feature extraction, anomaly detection, and telemetry prediction [19], and Li et al. approached voltage anomaly detection through a deep belief network architecture [20]. Once again, many tests are run offline, with no subsequent MIL testing: tests in space rarely concern a satellite’s ADCS, typically involve only a few network architectures, and lack a critical comparison with other kinds of data-driven methods, such as threshold-based ones, which could serve as a useful signal-monitoring benchmark. Fault detection and diagnosis through Neural Networks and SVMs is an incredibly wide topic, being tackled from dozens of different directions at once by exploiting the state of the art of both these models. Despite the proven efficacy of Deep Learning solutions for system health monitoring and fault detection, FDIR in space applications still lacks a shared standard or universal strategy. In a loosely structured field such as this, especially on the pivotal Attitude Determination and Control Subsystem, Machine Learning is a better choice in terms of computational resources, fast training, and heritage, and most importantly when the aim is to build a basic but broad comparison of a large range of architectures: for this reason, a rigorous but gradual approach is adopted.
The aim of this research is to focus on the analysis of signals onboard the spacecraft and on the fault diagnosis process, starting from a realistic framework in a functional simulator reproducing an Earth-orbiting satellite’s Attitude Determination and Control System (ADCS): the task is to examine the performance of a collection of basic detection algorithms employing Machine Learning, compared against a simple threshold-based data-driven fault detector, in order to find the most appropriate shape for an enhanced FDIR architecture. Once the problem of reliable AI-aided detection and diagnosis is tackled, this new paradigm can be extended to other subsystems beyond Guidance, Navigation, and Control (GNC), so that the whole spacecraft, at system level, could eventually benefit from this enhanced robustness.
The remainder of this document is structured as follows: Section 2 presents an overview of the methodology adopted and the main data-driven techniques explored in this study, from a conceptual point of view. Particular emphasis is placed on describing the specific features extracted in Section 3. The experimental setup is showcased in all its aspects in Section 4, alongside details on dataset generation and a description of the simple algorithm used as a benchmark. Finally, AI-based techniques are put to the test in a variety of scenarios in Section 5: through the comparison with other algorithms, their strengths and weak points suggest their most promising uses, which are further evaluated in specific case studies explored as a final investigation.

2. Proposed Methodology of This Study

The aim of this section is to explore the methodology adopted and the underlying concepts of the data-driven methods employed, to better understand the reasons behind the main design choices. First of all, an overview of the framework, the setting of the testing environment, and the data processing strategy is given. Afterwards, some insight into data-driven techniques for detection is presented: first, the basic one based on thresholds in Section 2.1; then, some theoretical background of the two main Machine Learning methods adopted is provided in Section 2.2 and Section 2.3, highlighting the reasons for their use and their limitations. Finally, in Section 2.4, most of the typical faults of a satellite’s ADCS are summarized in a restricted number of anomalous scenarios: these are adopted in a modular way as the baseline for building the large datasets employed in this study, for training and simulation purposes.
As shown in Figure 1, two detection methods with radically different approaches are followed: one is a relatively simple variance-based detection algorithm, relying on thresholds and logical relations, which takes data directly from the simulation and is explored in detail in Section 2.1. The other approach involves the characteristic process pipeline for Machine Learning techniques: data from sensors and actuators is gathered into large datasets containing signal records from hundreds of simulations. Each element of these datasets is then further processed: instead of storing and working with entire signals, only a few time- and frequency-domain features are extracted to significantly reduce the number of inputs for the learning agents to train on. All of the features are explained and motivated in Section 3, while the full list of employed classifiers is detailed in Section 5.
This work approaches the use of Machine Learning in fault detection in a spacecraft’s ADCS context, while trying to extract general observations, so that the methodology can potentially be extended to other pivotal subsystems, and even at a higher system level directly. Innovative learning algorithms are benchmarked against a variance-based method, which is already fully implemented in the functional simulation, and is explored in detail in Section 2.1. Trained learning agents in this study are tasked with many different duties, with the number of output classes heavily dependent on the specific task. Figure 2 shows all the detection methods and the instances in which they are employed: the testing considers first the baseline threshold-based algorithm for a simple detection, and the classifiers for a 4-class classification mimicking a similar behavior. Then, learning agents are tested and specialized on two-class detection, on signals containing one or multiple faults: in addition to that, there is the potential for a real diagnosis, instead of simple detection, which is tested on the same families of classifiers. As final tests, two specific case studies are proposed, where trained classifiers are fully implemented in the functional simulation loop of the satellite, in synergy with the already mentioned variance-based detection algorithm. All of the tests conducted and the results retrieved are reported in Section 5 in detail.
The data used to train the learning agents does not come from an actual mission, but is generated in an acceptably realistic way through the MATLAB® (R2024b) Simulink® simulator presented in Section 4. In the same environment, the proposed classifiers are built and developed through MATLAB®’s own Classification Learner App: they are various kinds of Support Vector Machines, using different types of kernels, and Artificial Neural Networks following the classic MLP structure, with varying numbers of nodes and layers. All of them are taken from among the most basic versions of the respective classifier family and do not present intricate architectures: this is done in order to grant increased repeatability and ease of implementation in other Machine Learning tools and software. Also, they are all Simulink®-compatible, so that testing is immediate and implementation is simplified. All of the learning agents take a collection of signal features as inputs and give as output the corresponding signal class. The feature pool is also the same for all of the classifiers; therefore, interchangeability is total and performance can be evaluated objectively. The complete list of classifiers used in this study is visible in Figure 1, and they are presented again in Section 5.

2.1. Threshold-Based Detection

The baseline functional simulator already includes a simple fault detection algorithm, which is used as a benchmark. This “Classic” algorithm is based on custom-made operators placed after each observed element, with the aim of achieving simple fault detection. These blocks separate any 3-axis time-variant signal into its three components, then calculate the running variance and running mean of each component, with a moving window of tunable length. The evolving variance of the signal, if checked in real time and coupled with the values of the signal itself, can already be a powerful indicator that some unusual behavior is occurring [21]. Through a pipeline of logical operators and thresholds, outlined in Figure 3, this basic algorithm can output four different flags:
  • Generic anomaly: When any of the axes exhibits a particularly high variance in its signal, a threshold is violated and the block emits a flag.
  • Out-of-range signal: A simple threshold is put on the maximum magnitude all signals can have, mimicking the violation of the maximum range imposed by physical limits on onboard elements.
  • Stuck signal: When the block senses a variance drop in the signal to the point where it basically reaches zero, this flag is activated.
  • Data loss: Just like the previous case, a variance drop is recognized. The detection algorithm also checks the actual magnitude of the defective signal: if it is exactly zero, then the monitored sensor or actuator is not simply stuck, but has almost certainly stopped outputting meaningful data altogether.
Clearly, the generic flag “Anomaly” can be specialized to distinguish the many faults described in Section 2.4: this, however, requires a much finer tuning of the detection block and the use of soft thresholding. A finer 6- or 7-class, data-driven, threshold-based detection is certainly possible, but the tuning would be so specific and custom-tailored to the simulation parameters that it would lose any practical significance or attempt at generality. Some amount of fine-tuning is nevertheless performed to adapt the detector blocks to each of the monitored elements, re-parametrized for each attitude mode of the satellite: despite the attempts at building an impartial system, given the fine-tuning performed and the fact that faults are also self-injected, the correct detection rate is expected to be very high. This is also a chance to point out that the need to manually and precisely tune so many parameters is one of the weaknesses of this method.
In real detection algorithms, information on the possible behaviors of onboard signals is much more limited, and it is a good idea to keep an intermediate variance threshold: not too high, otherwise it might miss some faulty behaviors, and not too low, to avoid repetitive triggering on false positives. Finally, a delaying function is implemented in each baseline detector, such that any flag recedes back to nominal status only after 10 consecutive samples of no detected faults. This basic data-driven method serves as a first benchmark for the performance and robustness of the Machine Learning classifiers.
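A minimal sketch of this variance-based logic is given below, for a single signal axis x sampled at the simulator’s fixed step; the window length, thresholds, and tolerances are illustrative placeholders, not the tuned values of the functional simulator.

```matlab
win      = 2000;       % samples in the moving window (20 s at a 0.01 s step)
varHigh  = 1e-2;       % "Anomaly" threshold on the running variance
varZero  = 1e-10;      % variance considered effectively zero (stuck signal)
rangeMax = 10;         % physical out-of-range limit of the monitored element

runVar  = movvar(x, win);            % running variance of the sampled signal
runMean = movmean(x, win);           % running mean, used for the data-loss check

anomalyFlag = runVar > varHigh;               % generic anomaly
rangeFlag   = abs(x) > rangeMax;              % out-of-range signal
stuckFlag   = runVar < varZero;               % variance drops to ~0
lossFlag    = stuckFlag & abs(runMean) < eps; % stuck AND magnitude exactly zero
```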

2.2. Support Vector Machines

Support Vector Machines are among the most popular learning algorithms, since in terms of relatively small training data, they are efficient and reliable [12]. The aim of these machines is to build a separator for the data in the form of a hyperplane, with linear equation [13]
$\mathbf{w} \cdot \mathbf{x} + b = 0,$
parametrized by the vector $\mathbf{w}$ and the constant $b$, to binary-classify any entry $\mathbf{x}$. By mapping both the entry data and the parameter vector into a higher-dimensional space through a transform $\Phi(\mathbf{x})$, the problem can be moved into a more suitable environment for optimized classification, with the separation hyperplane function taking the new form [13]
$f(\mathbf{x}) = \sum_{j=1}^{m} \alpha_j \, \Phi(\mathbf{x}_j) \cdot \Phi(\mathbf{x}) + b,$
where a crucial component emerges, the kernel function:
$K(\mathbf{x}_j, \mathbf{x}) = \Phi(\mathbf{x}_j) \cdot \Phi(\mathbf{x}).$
A few different kernels are adopted in this work, and they are shown in Section 4.
As previously stated, SVMs are binary classifiers, which means that they can potentially distinguish data into one of only two classes. Classification on a number of classes N higher than 2 is feasible, but only by building many sub-classifiers, as seen in Figure 4, trained either to distinguish between a class and all others (One-vs-All) or between all possible couplings of classes (One-vs-One). In this research, multi-class classification is approached in a One-vs-One philosophy.
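As a concrete illustration of this One-vs-One setup, the sketch below builds a multi-class SVM from binary learners with the kernel types used later in Section 5; it assumes a feature matrix X (one row per signal, 26 columns) and a categorical label vector Y, and the names are illustrative rather than the study’s actual scripts.

```matlab
% SVM templates mirroring the linear, quadratic, cubic, and Gaussian kernels
tLin   = templateSVM('KernelFunction', 'linear');
tQuad  = templateSVM('KernelFunction', 'polynomial', 'PolynomialOrder', 2);
tCube  = templateSVM('KernelFunction', 'polynomial', 'PolynomialOrder', 3);
tGauss = templateSVM('KernelFunction', 'gaussian',   'KernelScale', 1.4);

% One-vs-One coding: one binary SVM is trained for every pair of classes
mdl = fitcecoc(X, Y, 'Learners', tGauss, 'Coding', 'onevsone');

label = predict(mdl, X(1, :));   % classify a new 26-element feature vector
```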

2.3. Neural Networks

Other widespread classifiers are Artificial Neural Networks, learning agents that improve their accuracy by tuning the weights of their internal connections, whose structure resembles that of the human brain. In this preliminary analysis, we will focus on one of the simplest layouts: the very popular Multilayer Perceptron (MLP) architecture, portrayed in Figure 5.
The node count N of the first, immediate “input” layer is defined by the chosen number of actual inputs of the network, which are the extracted features; the output layer typically contains as many nodes as the total number of classes P that the network is able to recognize. Assuming a Fully Connected (FC) architecture, the typical output $u_j$ of one of the M nodes in the second (middle) layer is
$u_j = \varphi \left( \sum_{i=1}^{N} w_{i,j} \, x_i + b_j^{u} \right),$
which is a function of all the nodes $x_i$ of the previous input layer, of the weights $w_{i,j}$, and of the bias elements $b_j^{u}$. The activation function is $\varphi$: in the cases examined in this paper, a Rectified Linear Unit (ReLU) activation function, equivalent to a ramp function, is employed from any node to the next. Then, a normalized exponential function (also known as the softmax function) serves as the final activation function for the output layer of the network, in order to produce a probability distribution over the available classes. Some instances of the testing campaign of Section 5 will employ slightly more complicated architectures, such as Deep Networks with more than one hidden layer, but the current section serves to give a basic understanding of the underlying concepts of the most general learning agents.
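MLP classifiers of this kind can be reproduced, for instance, with MATLAB’s fitcnet from the Statistics and Machine Learning Toolbox (R2021a or later); the sketch below is a minimal illustration of that route, using the same assumed X and Y as before, and is not necessarily the exact configuration produced by the Classification Learner App.

```matlab
% ReLU hidden layers; the softmax output layer is handled internally by fitcnet
net10  = fitcnet(X, Y, 'LayerSizes', 10,         'Activations', 'relu'); % small
net25  = fitcnet(X, Y, 'LayerSizes', 25,         'Activations', 'relu'); % medium
net100 = fitcnet(X, Y, 'LayerSizes', 100,        'Activations', 'relu'); % large
netBi  = fitcnet(X, Y, 'LayerSizes', [10 10],    'Activations', 'relu'); % bilayered
netTri = fitcnet(X, Y, 'LayerSizes', [10 10 10], 'Activations', 'relu'); % trilayered

label = predict(netBi, X(1, :));   % predicted class for a new feature vector
```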

2.4. Types of Faults on Signals

To understand what kind of inputs the FDIR algorithm has to deal with, it is useful to have an overview of the main kinds of trends that can be expected on signals coming from faulty sensors. In practice, a limited number of typical behaviors can summarize most of the possible faults that a sensor or an actuator could encounter [1]. A collection of common fault types is reported in Figure 6, with a brief description for each of them.
  • Bias: The signal has an abrupt increase or decrease from its normal trend.
  • Drifting signal fault: The output shows an unexpected linear trend, drifting away from the nominal state.
  • Erratic behavior: Quality of the signal worsens significantly due to an increase in variance or noise.
  • Spikes: One or more sudden spikes appear in the output of the signal.
  • Stuck signal: Signal output gets stuck at a constant value.
  • Data loss: The signal presents gaps in which no data is available and the output is null.
All of the observed elements in this study, namely the three sensors and the actuator output, are assumed to be possibly subject to any of these faults. Systematic fault injection into nominal signals, for simulation purposes, required fine-tuning to adapt to each of the signals’ orders of magnitude and typical behaviors. If present, faults can appear anywhere between the first 5 % and the last 20 % of the simulated time, and signals can return to nominality before the end of the simulation, or proceed in their anomalous behavior until the end of the observation. For thorough testing and result collection, the simulation can also apply multiple faults of different types at once on the same signal.
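For illustration, a minimal sketch of an offline fault injection step of this kind is shown below, assuming a nominal column-vector signal x with sampling step dt; the fault amplitudes, names, and the injection window are illustrative placeholders, not the tuned values used for dataset generation.

```matlab
n  = numel(x);
i0 = randi([round(0.05*n), round(0.80*n)]);  % fault starts between 5% and 80% of the run
i1 = randi([i0, n]);                         % it may or may not end before the run does
xf = x;                                      % copy of the nominal signal to corrupt
switch faultType
    case 'bias',    xf(i0:i1) = x(i0:i1) + biasAmp;                        % abrupt offset
    case 'drift',   xf(i0:i1) = x(i0:i1) + driftRate*(0:(i1-i0))'*dt;      % linear drift
    case 'erratic', xf(i0:i1) = x(i0:i1) + noiseAmp*randn(i1-i0+1, 1);     % extra noise
    case 'spikes',  k = i0 + randi(i1-i0, 3, 1); xf(k) = xf(k) + spikeAmp; % isolated spikes
    case 'stuck',   xf(i0:i1) = x(i0);                                     % frozen value
    case 'loss',    xf(i0:i1) = 0;                                         % output goes to zero
end
```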

3. Feature Extraction

Datasets are built by running numerous model simulations and thereby collecting nominal and faulty signals, but classification algorithms only receive inputs in the form of a limited number of discrete parameters, which are extracted during post-processing and serve to collect only the most important and characterizing information from any input. This feature extraction phase constitutes the fundamental link between the information stored in a raw dataset and the learning abilities of the classifiers. The choice of which parameters to extract is a central topic in the realm of fault diagnosis and signal processing: this research’s contribution is to include parameters coming from different feature extraction philosophies. Table 1 defines the first batch of time-domain statistical features in the pool.
The first four features include basic and standard statistical quantities, sufficient for what is called simple “novelty detection” on a signal: a change in these features is a first, macroscopic indicator of a new behavior in the signal, not necessarily a fault [22]. These features are the mean value (AVG), standard deviation (STD), maximum value (MAX), and minimum value (MIN), where the standard deviation σ is simply the square root of the variance σ², which was the main tracked feature for threshold-based detection. The other group of time-related statistical features in the table enhances the overall signal characterization: they are the root mean square (RMS), square root of the amplitude (SRA), kurtosis value (KV), skewness value (SV), peak-to-peak value (PPV), crest factor (CF), impulse factor (IF), margin factor (MF), shape factor (SF), and kurtosis factor (KF). These features are well established for capturing behaviors and characterizing time-varying signals [6,7], as they are particularly significant: kurtosis, for instance, helps quantify the amount of extreme values in a distribution, while skewness quantifies the asymmetry of that distribution. The peak-to-peak value, crest factor, impulse factor, and margin factor are signal-based statistical metrics that, especially for relatively regular or periodic signals, are potential markers for outliers. Finally, the shape factor of a signal relates to its shape while being independent of its magnitude.
Learning entities are provided with some more information concerning the frequency domain, through other typical signal processing features, added to the pool of this work as well [6]. The signal is further analyzed by extracting the frequency center (FC), root mean square frequency (RMSF), and root variance frequency (RVF), all related to the N frequency amplitudes f i , as shown in Table 2.
As a last addition to the pool, the feature extraction process calculates the wavelet decomposition of the input 1-D signal using the Daubechies orthogonal wavelet [6]. This final tool, often used when extracting features from a signal, concerns both the time- and the frequency-domain localization of a signal. By choosing a waveform of reference (in this case, Wavelet Daubechies 4 or “db4”), it is possible to use it as a window function on the signal, while applying different scalings on the wavelet, therefore obtaining a multi-resolution analysis. What is actually saved as additional features to the pool are only the energy percentages corresponding to the approximation (low frequency), and the ones of the details (higher frequencies) up to level 8: by adding these to the previous list, the total number of features in the pool amounts to 26.
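The sketch below illustrates one way to compute such a 26-element feature vector in MATLAB for a single signal snippet x sampled at step dt; the time- and frequency-domain definitions follow common usage in the cited literature and may differ in detail from the exact expressions of Tables 1 and 2, and the wavelet energies rely on the Wavelet Toolbox functions wavedec and wenergy.

```matlab
% Time-domain statistical features
AVG = mean(x);   STD = std(x);   MAX = max(x);   MIN = min(x);
RMS = rms(x);                     SRA = mean(sqrt(abs(x)))^2;
KV  = kurtosis(x);                SV  = skewness(x);
PPV = peak2peak(x);               CF  = max(abs(x))/RMS;
IF  = max(abs(x))/mean(abs(x));   MF  = max(abs(x))/SRA;
SF  = RMS/mean(abs(x));           KF  = KV/RMS^4;

% Frequency-domain features from the one-sided amplitude spectrum
S    = abs(fft(x));  S = S(1:floor(numel(x)/2));
f    = (0:numel(S)-1)' / (numel(x)*dt);
FC   = sum(f.*S)/sum(S);                    % frequency center
RMSF = sqrt(sum(f.^2 .* S)/sum(S));         % root mean square frequency
RVF  = sqrt(sum((f - FC).^2 .* S)/sum(S));  % root variance frequency

% Wavelet energy percentages: db4 approximation + 8 detail levels
[c, l]   = wavedec(x, 8, 'db4');
[Ea, Ed] = wenergy(c, l);                   % 1 approximation + 8 details = 9 values

features = [AVG STD MAX MIN RMS SRA KV SV PPV CF IF MF SF KF ...
            FC RMSF RVF Ea Ed];             % 14 + 3 + 9 = 26 features
```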
Before training a classifier, a feature ranking algorithm is used to associate an importance score to all the predictors fed to the classifier. In the presented work, the chosen ranking is operated by a Minimum Redundancy Maximum Relevance (MRMR) algorithm, which tries to decrease redundancy in the feature pool, while maximizing the relevance of the individual entries of the set.
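A minimal sketch of this ranking step, assuming the 26-column predictor matrix X and label vector Y from the previous examples (fscmrmr is available in the Statistics and Machine Learning Toolbox from R2019b):

```matlab
[idx, scores] = fscmrmr(X, Y);  % predictors sorted by MRMR importance score
topFeatures   = idx(1:10);      % e.g., retain only the ten highest-ranked features
```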

4. Simulation and Dataset Generation

This section provides a detailed explanation of the testing setup, starting from the simulation environment and the characteristics of the satellite taken into consideration. Details are provided also on the description of the spacecraft and its specific subsystems, as well as the tunable parameters of the simulations, to produce a wide and diverse dataset.

4.1. Functional Engineering Simulator

All training datasets of this study are collected by running a high-fidelity functional engineering simulator: it is a validated environment, which replicates the Attitude Determination and Control System (ADCS) of a spacecraft in MATLAB® (R2024b) and Simulink®. The choice of this Model-in-the-Loop (MIL) setup, together with its modularity and tunable parameters, anticipates a realistic implementation and validation phase. The simulator replicates continuous-time quantities and thus uses ode5 as a solver, with a fixed step size of 0.01 s. Furthermore, the number of parameters randomized at the beginning of each simulation is very large: they concern noise and random errors on sensors and actuators, the initial orientation of the satellite, the target rotations for slew maneuvers, and physical characteristics as well, and they are explored in Section 4.2. By setting up the algorithm in this way, the process of building large datasets recalls the Monte Carlo method, removing unwanted biases and letting the final learning agents be trained on widely different scenarios.
The simulator models all the primary disturbances for a typical Earth-orbiting satellite: the first is the gravity-gradient torque, applied to the spacecraft under the hypothesis of a constant free-fall acceleration. Solar radiation pressure plays a small role by applying different radiation forces on the spacecraft surfaces; the result is once again a very weak torque, but with cumulative effects over time. The magnetic field of the planet, modeled through the IGRF-13 standard, is much more location-specific than the assumed gravity field, and by interacting with a parasitic magnetic dipole onboard, assumed to be constant and reported in Table 3, it produces further environmental torques on the spacecraft. One last source of small perturbations on the orientation of the satellite is air drag, which is computed by modeling Earth’s atmospheric density, assumed to follow an exponential model decreasing with altitude. Each run simulates at most a few minutes of satellite operations: in that small time frame, one of the main assumptions made is that perturbations act on the satellite’s orientation but have no effect on its orbit.

4.2. Spacecraft Description

The spacecraft architecture is chosen to be representative of many real satellites orbiting Earth: it mimics a standard mission in the chosen spacecraft class, where computational capabilities are limited and where the result of this work could be useful and actually implementable. At the start of each run, randomized parameters slightly modify many architectural characteristics, but the specific scenarios simulated are realistic enough to confidently generalize all of the results obtained: Table 3 gives more details on these simulation parameters. The case study follows a small satellite at around 700 km altitude, on an almost circular orbit. The spacecraft has different moments of inertia along its three main axes, since in this case study the chosen design is not axially symmetric. The initial angular rates set at the beginning of each run depend on which attitude mode is chosen; the orbital parameters of the satellite’s trajectory are randomized, while always keeping the orbit almost circular and the altitude within a 200 km range of the average value. All perturbations modeled by the functional simulator have a small but measurable effect on the spacecraft: the principal inertia, magnetic dipole, and other main physical details of the spacecraft are reported in Table 3 for reference.
Going deeper into the description of the onboard elements, the spacecraft is equipped with a sensor suite composed of a three-axis gyroscope, a Sun sensor, and a magnetometer. The output of these elements can exhibit the faults described in Section 2: since this is the raw data upon which the classifiers need to work, a realistic model should also take into consideration the errors and noise affecting the sensors. Generic sensors, as depicted in Figure 7, are affected by random and systematic errors such as scale factors, small misalignments, cross-axis sensitivity, biases, and white and brown noise. Furthermore, as a result of the resolution of the sensors and their sampling time, quantization and discretization of the signal play a substantial role in defining the characteristics of the final output [1]. The gyroscope outputs the three measured angular rates in [deg/s]; the signal output by the Sun sensor is simply a three-dimensional unit vector indicating the Sun’s direction with respect to a body frame of the satellite; the magnetometer outputs the three-axis magnetic field components measured in [T].
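A minimal sketch of such a generic sensor error model is given below, assuming a true three-axis quantity xTrue; the numerical magnitudes are illustrative placeholders, and brown noise and the sampling/filtering stage are omitted for brevity.

```matlab
SF   = diag(1 + 1e-3*randn(3, 1));   % per-axis scale-factor errors
Mis  = eye(3) + 1e-4*randn(3);       % small misalignment and cross-axis sensitivity
bias = 1e-3*randn(3, 1);             % constant per-axis bias
LSB  = 1e-4;                         % quantization step set by the sensor resolution

yRaw  = Mis*SF*xTrue + bias + 1e-4*randn(3, 1);  % white noise added at each sample
ySens = LSB*round(yRaw/LSB);                     % quantized, discretized sensor output
```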
Raw signals coming from the sensors are processed by a 4th-order low-pass Butterworth filter: this is fundamental to generate a more manageable output and is often a default function of real sensors. Data is then passed to the attitude estimation, which retrieves the satellite’s orientation in terms of quaternion and angular rates through a Multiplicative Extended Kalman Filter (MEKF) [23]. The estimator, which is one of the many versions of the Extended Kalman Filter (EKF), takes the gyroscope’s filtered output as a first estimate of the angular rates, associated with its own uncertainty, then uses the other two sensor measurements to correct the attitude prediction. The MEKF then outputs estimates of the attitude, expressed as a four-component quaternion, and of the three-axis angular rates.
The spacecraft model can follow one of three possible attitude modes. As a first basic option, the actuators can be completely deactivated, leaving the spacecraft in a “No Control” mode, where the initial angular rates are random and the only torques acting on the vehicle are perturbations and those due to the initial angular accelerations. Another customized control action sets the spacecraft in an Earth-pointing mode, with a very slow angular rate around a precise axis, to be followed throughout the simulation. Finally, to test an even more generic scenario, the simulation can reproduce a typical slew maneuver mimicking real satellite operations, where the attitude algorithm is instructed to start a rotation around one axis and return to stillness in a desired time length: during the creation of a large dataset, the 3D rotation axis is generated randomly at the beginning of every simulation, and the angle is randomized as well, reaching up to 60 deg in 100 s of maneuver. In this final iteration of the model, the actuator models are substituted by a simulation block which simply takes the ideal target torque to be imposed on the satellite and feeds it back into the dynamics and kinematics of the model, with some noise applied. This is done in order to avoid unnecessary layers of complexity: by avoiding references to particular hardware types, the generality of the observations made is preserved.

4.3. Dataset Generation

Datasets of a few hundred entries are created via the simulation presented in Section 4. Specifically, the whole testing campaign presented in Section 5 makes use of a main archive of 12 large datasets, each one containing 840 (or 672, if reduced) nominal simulations: there are 12 because they cover the four monitored elements (three sensors and one actuator) for each of the three attitude modes (control-free, Earth-pointing, and slew maneuver). Since the starting datasets are nominal, fault injection is performed offline, in MATLAB, with the types of faults and classes decided a priori. Identical fault injection algorithms are placed in the functional simulator directly after the sensors and actuators: these online, fully integrated versions are useful for single runs or specially constructed case studies, where diagnosis operations are executed live during a single simulation, such as the testing cases explored in Section 5.4. Any class imbalance issue is avoided by directly generating the precise number of required dataset entries: this means that the actual number of elements in the dataset is also important, since it must be exactly divisible, successive times, by the number of classes and subclasses chosen. The totals of 840 or 672 entries (which will appear again in Section 5) are optimally large and are a heritage of preliminary testing experiments, where grouping and sub-grouping into three or seven classes was necessary. Signals can also go through the fault injection process a second time if multiple faults are needed. The algorithm also applies labels to each signal to specify its individual behavior: these labels are essential for Machine Learning classifiers to learn, and must therefore be kept track of, even while shuffling the signal dataset. Finally, the actual feature extraction process can start: the 26 time and frequency features, selected and explored in Section 3, are calculated from each signal, and a brand new dataset composed only of these discrete inputs is created. Before each training, the dataset is split into two: 90% of the elements are used for training and validation purposes, while the remaining 10% are stored and used only at the end of the training.
All accuracies reported throughout the tables in Section 5 refer to the accuracy reached while evaluating this final 10% testing share of the dataset: validation accuracies are partial results and are therefore not reported. The full initial dataset size for each testing case is reported in the description of each training scenario. Training, in all cases, is performed with five-fold cross-validation, which helps avoid overfitting issues. Cross-validation works by partitioning the shuffled dataset into a training set and a validation set: the training set is directly used to tune the algorithm and supervise the learning, while the validation set evaluates its performance. By repeating this validation process multiple times (five in this case) and progressively using each slot of the dataset as a validation set, the chances of asymmetric sampling are minimized and the net is rigorously trained on all portions of the given dataset in equal shares. Figure 8 presents this whole process in a more graphic way.
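A minimal sketch of this hold-out split and five-fold cross-validated training is shown below, using the same illustrative X and Y as in the previous examples and one of the SVM templates; the exact workflow of the Classification Learner App may differ slightly.

```matlab
hold = cvpartition(Y, 'HoldOut', 0.10);        % 10% kept aside for final testing
Xtr  = X(training(hold), :);  Ytr = Y(training(hold));
Xte  = X(test(hold), :);      Yte = Y(test(hold));

t      = templateSVM('KernelFunction', 'linear');
cvMdl  = fitcecoc(Xtr, Ytr, 'Learners', t, 'Coding', 'onevsone', 'KFold', 5);
valAcc = 1 - kfoldLoss(cvMdl);                 % five-fold (partial) validation accuracy

finalMdl = fitcecoc(Xtr, Ytr, 'Learners', t, 'Coding', 'onevsone');
testAcc  = mean(predict(finalMdl, Xte) == Yte); % accuracy reported in the result tables
```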

5. Results

This section deals with the results obtained throughout the work. Initial tests involve comparing the baseline variance-based detection algorithm with classifiers performing a simple four-flag detection. From there onward, the true potential of Machine Learning algorithms is highlighted, starting with a more flexible binary detection, with tests on multiple simultaneous faults and no additional training of the algorithms. Next, innovative techniques are employed on more refined tasks, such as a clear example of fault diagnosis through classification, which is already well beyond the capabilities of simple thresholding. Finally, in Section 5.4 and Section 5.5, examples of the two architectures used in synergy are explored, to point out how a combined use is able to outperform the best capabilities of the individual entities.
The testing campaign involves four kinds of Support Vector Machines, respectively, with linear, quadratic, cubic, and Gaussian kernels: the Gaussian SVM requires a kernel scaling factor of 1.4, while in all instances of multi-class classification, as anticipated, the chosen architecture was One-vs-One. For what concerns Artificial Neural Networks, experiments focused on five different kinds of architectures. First, a small network with only one Fully Connected (FC) layer of 10 nodes; then, a slightly bigger version with 25 nodes in the FC layer; one architecture with a 100-element FC layer; finally, a Bilayered and a Trilayered architecture where each middle FC layer is made of 10 nodes. Activation functions for Neural Networks are always ReLU, except for the last layer, which serves as a probability distribution for the output. The input layer is the same for both SVMs and ANNs, and is made up of the 26 features described in Section 3. Output layer size changes based on which classification is tested, and all classifiers are summarized in Table 4.

5.1. Direct Comparison for Simple Detection

A first meaningful test to perform is an exact comparison between classic threshold-based detection and the innovative techniques. Since variance-based detection is able to emit three flags plus a simple binary “out-of-range” check, the classifiers are trained on a balanced dataset whose entries exhibit nominal, anomalous, stuck, and loss-of-data behaviors: anomalous behavior can mean any of the faults among bias, erratic, drift, and spikes. Standard threshold-based detection is fed 20 s snippets of signals, as a moving window observing the live evolution of parameters: the Machine Learning algorithms are trained on a feature pool extracted from the same dataset, whose entries are spliced up to mimic the same scanning window, as seen in Figure 9.
By looking at the accuracy results obtained in Table 5, some considerations can already be made. First, the consistency of the column dedicated to classic detection, except for one outlier in the last row, reflects how the fine-tuning of both the faults and the detection thresholds makes it extremely easy to reach high detection rates. The accuracy percentage falls short of 100% only because classic methods particularly struggle when trying to detect drifting signals: if the slope is not steep, variance-based detection is not triggered unless the signal rapidly comes back to nominal values, thereby generating a spike in the moving variance (see Figure 6b). However, if the drift does not end before the end of the simulation, and no instrument range is crossed, classic detection might not be triggered at all. In this testing campaign, classic detection is considered successful only if the correct flag is triggered within the first 5 s from the start of a fault: given that faults are programmed to last at least 15% of the time span of the full dataset entry, it is safe to assume that drifts are almost never detected by variance-based methods. Considering that drifting signals are one of the four possible behaviors labeled under “Anomaly”, and that there are four labels in total, the probability of a drifting trend occurring in any observed signal is 0.25 × 0.25 = 0.0625: classic detection falls exactly 6.25% short of complete successful detection of the dataset, and the missed instances are all drifts. Shifting the focus to the classifiers, and looking at the results mode by mode, some small increase in accuracy can be perceived on those signals whose trend is notoriously smoother (for example, Sun position sensors during a quite steady Earth-pointing mode); apart from that, Gaussian SVMs are reportedly the most unstable and least predictable in terms of behavior.
No particular recommendation on the use of SVMs rather than ANNs can be confidently given: this will come down to ease of implementation, simplicity of design, and the computational resources used during their activity, aspects which will be tackled later. The main takeaway from this first batch of tests is that Machine Learning classifiers, when tasked with the exact same duty as the benchmark detection method, reach medium-high accuracy, but are not robust enough to handle this task on their own without external checks.

5.2. Classifiers for Binary Detection and Multiple Faults

While the baseline threshold-based detection preserves its rigid structure, one of the advantages of learning machines is their flexibility: in the search for an optimal use for classifiers, another strategy is to let them work on a simpler and more general task. For example, the following testing batch in Table 6 is made on binary classifiers, tasked with the simple acknowledgment of a fault in the object they are monitoring.
This time, the large datasets are split equally between 50% perfectly nominal entries and 50% of signals showing one of the six basic faults (erratic behavior, drift, spikes, bias, stuck signal, or data loss), with an equal chance of each of them occurring, to avoid class imbalance. Accuracy results differ with respect to Table 5, and in some instances are slightly higher, but still not close enough to robust and total detection for any sensor or attitude mode in particular. One possible takeaway from this part of the testing campaign could be to use a classifier trained this way in parallel with a classic variance-based detector: this would add redundancy through “competing” software that works on the same task but on two completely different principles, and can therefore be a more robust solution than two identical “watchers” working in parallel.
Within binary classification, other experiments can be made. For example, it is possible to study the behavior of classifiers with multiple faults applied. Up until now, all of the considered signals were either nominal or presented only one fault. For this new analysis, a brand new dataset is prepared in this way: half of the signals are completely nominal, the other half present up to two different faults in the same signal. Time segments are prepared in the usual way, which mimics a limited 20 s long moving window, to extract a snippet of signal from a real simulation. The peculiarity of the following analysis is that it is performed with the same binary classifiers tested in Table 6, which were strictly trained on a dataset containing at most single faults: none of the networks have been re-trained for this second task.
Results for this test are reported in Table 7: apparently, the hyper-parameters of the classifiers and their interpretation of the input features were balanced enough to let the machines improve their performance in many instances, despite novel behaviors deviating from what was known from training. It is now clear that these cases drift away from what classic detection can do. While variance-based detection would simply turn on and off in an almost deterministic way due to its extremely specific tuning (while still ignoring drifts), here the elasticity and adaptability of the method are much more promising for the possible next steps toward an enhanced, higher-level detection.

5.3. Targeted Fault Diagnosis and Computational Cost

Classifiers, when charged with tasks similar to those of classic detection algorithms, are promising, but not optimal. One possible step to take at this point is to exploit the actual “classification” function of Machine Learning agents. Let us imagine an FDIR loop where standard detection is implemented: this basic block can work with thresholds to understand whether a signal is stuck or missing, or it can benefit from the binary classifiers tested before to spot any instances where other faulty behaviors are present. Once the “Anomaly” flag is triggered, however, no additional information is given on the type of fault. This first layer of detection has correctly identified a possible error, but has no immediate additional information to transmit to the system or to flight engineers. What the algorithm could do, instead, is try to figure out on its own the exact kind of fault, to demonstrate its autonomy and to choose the appropriate response based on the outcome. For this, an actual “Diagnostics” agent should be invoked: we can imagine it as another classifier which, given a faulty signal, precisely identifies which of the pre-defined anomalies it presents (bias, drift, spikes, erratic).
This is exactly the kind of scenario imagined for the next training and testing batch of classifiers, reported here below in Table 8. Here, the inputs are all known to be faulty signals, and the classifiers are tasked not with detection anymore, but with an identification of the anomaly, pointing out which of the four classes it belongs to. Results on this approach are much more promising, and showcase a more robust use for Machine Learning classifiers.
All agents show an increased accuracy: presumably, the features extracted from the signals are more distinct and separable between one fault and another than between a faulty and a nominal signal, as in the previous instances. The number and kinds of faults also influenced this result: instead of six possible faults sharing the same label, as in Table 6 and Table 7, the faults are now distinct and only four, with the two “stationary” faults (data loss and stuck) removed, as they are rather easily detectable by a simple imposed threshold, without the need to train a classifier to recognize them.
Given the high correct classification rate across the algorithms, the particular choice of which classifier to adopt comes down to method preferences or availability and computational cost: for this last point, some important parameters of any Simulink® model are obtainable through Profiler Reports. For example, by running one of the simulations for a slew maneuver, just like the ones that generated many of the training databases above, we obtain the times recorded and reported in Table 9.
Results are not totally unexpected: the first two rows refer to the four-class detection exactly equivalent to the classic variance-based one, the middle ones are binary fault/non-fault detectors, while the last ones are the more sophisticated diagnostic agents showcased in the example of Table 8. What can be observed is a general increase in processing speed in trained Neural Networks with respect to SVMs, which is good to know, given that for each example, they were built case by case to be completely interchangeable, maintaining the same inputs and outputs. On another note, Gaussian SVMs maintain their unpredictability and low consistency also in terms of computational resources used. Binary detectors take overall less time with respect to four-class classifiers, which is expected due to the lower internal complexity; four-class diagnostic agents are not too far from the first type of simple detectors in terms of computational time. This can be extremely convenient, since they turned out to be the most accurate and they are usually activated only once classic detection has first assessed a fault occurring.
What might be unexpected to notice are the report results in Table 10, concerning the baseline threshold-based detection, which was present and fully active in the same example simulation of the slew maneuver reported above.
This outcome, however, is easily explained: timing is much higher for the classic detectors portrayed in Figure 3 because their Total Time implicitly contains far more signal processing and thresholding operations on a variance signal, compared instant by instant; any of the above classifiers, by contrast, was summoned only about a hundred times in 100 simulated seconds, and its task was simply to gather input features and return a flag by applying already trained weights.
Training times are a slightly different matter with respect to the simulation times of the single classifiers. While the latter concern actual performance and are pivotal for the quickness of the algorithms, training times are a one-time-only cost to pay during the implementation process. They were not reported for each instance, both due to their non-recurring nature and, most importantly, because their order of magnitude, despite slightly fluctuating values, is consistent and is summarized here. To be specific, on the hardware employed for this study, consisting of a 12th-generation Intel® Core™ i7-1255U at 1700 MHz (10 cores) with 16 GB of RAM, the training times of all algorithms were well below 10 s: SVMs settled at around 2 s each, while Neural Networks required roughly 6 s on average for their training, in all cases. Notable outliers are Cubic SVMs and Trilayered networks, which required up to 8 s of training. These results derive once again from choosing very simple architectures for the two classifier families, and confirm once again the quickness of ML approaches when compared to other strategies, such as Deep Learning. It is easy to see how, even with datasets made of hundreds of entries, training times can be considered minimal in an actual Software-in-the-Loop (SIL) or Hardware-in-the-Loop (HIL) implementation, and even more so in real mission planning scenarios, where they are totally negligible. With these considerations, concern shifts from training cost to real-time performance: all things considered, trained classifiers become even more attractive as a solution at this point.

5.4. Possible Use in Synergy: Detection and Diagnosis

As previously stated, combining the two detection philosophies explored in this work tends to give the best results overall. The two final rows of Figure 2 anticipate these last case studies, which concern classifiers fully embedded and implemented in the functional simulator, instead of being trained and tested in a distinct environment. As a first example of use in synergy, it is possible to replicate exactly the scenario described in Section 5.3. In this simulation, during a slew maneuver, one of the axes of the magnetometer starts behaving unexpectedly. In particular, some erratic behavior starts at T = 44 s into the simulation, as shown in Figure 10a.
This is a particularly troublesome behavior affecting sensors, since the MEKF placed directly downstream in the spacecraft data-processing pipeline might filter out the excessive noise, possibly rendering the fault unnoticeable. However, in the functional simulation, the sensor in question is being monitored by the standard detection algorithms: the three axes of the magnetometer are processed live, and an “Anomaly” flag is almost immediately turned on (Figure 10b). On a spacecraft provided only with a threshold-based algorithm, the detection function would stop here. However, by applying the strategy described in the previous section, it is possible to start an in-depth diagnosis process, operated by a trained classifier, for a more detailed analysis. Therefore, in this example, the Anomaly flag activates a Large Neural Network for fault diagnosis of magnetometers (specialized in slew maneuvers), precisely the one trained and tested above, with details and accuracy reported in Table 8. The network is activated immediately, at T = 44 s, and its sliding window starts to process the past 20 s of simulation, sliding the window and refreshing its output once every second. After a first round in which the algorithm seems to have found evidence of spikes on the signal, the network stabilizes its diagnosis by consistently reporting an erratic behavior from T = 46 s onward. If this information is communicated to other recovery systems onboard, other decision processes could pick the most appropriate response, basing their choice on more than a simple binary fault alarm. This example is built by directly plugging one of the trained networks showcased before into the simulation, proving that the testing process has followed a highly modular approach.
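The gating logic of this case study can be summarized by the sketch below, where the classic variance-based flag triggers the trained diagnostic network on the last 20 s of the monitored axis; extractFeaturePool and the other names are hypothetical helpers used only for illustration.

```matlab
if anomalyFlag                                      % raised by the threshold-based block
    window    = signalBuffer(end - 20/dt + 1:end);  % last 20 s of the monitored axis
    feats     = extractFeaturePool(window);         % the 26 features of Section 3
    faultType = predict(diagnosisNet, feats);       % 'bias' | 'drift' | 'erratic' | 'spikes'
end
```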
Additionally, a brief initial instance of mislabeling in Figure 10c is understandable, since the network is trained, as explained in Section 2.4, only on signals containing faults that start between the first 5% and the last 20% of the analyzed window: an improved version of the same network could be fed more extreme cases during training, so that its quickness in recognizing faults during operation can improve.

5.5. Possible Use in Synergy: MEKF Enhanced Detection

As a final case study, let us address the main problem found while evaluating threshold-based detection performance: the impossibility of detecting drifting signals. A solution could be to set up a learning agent that collects the outputs of the MEKF in the configuration of Figure 11. The classification network is trained on a dataset of 840 short snippets of MEKF-estimated angular rates, recorded during simulated slew maneuvers, half of which behave in an irregular way due to a drift injected on the actuators. The inputs to the learner are the previously explained 26 features from each of the axes, so a total of 78 elements. Therefore, once the Neural Network is inserted in the loop, the estimated angular rates coming from the MEKF need to be split into their three axis components; they are then buffered over the desired 20 s window, and at that point the features are finally extracted. This is a chance both to show the general setup of a trained learning agent in the loop and to address a more specific and peculiar example: detecting drift faults in actuators by only looking at MEKF outputs.
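A minimal sketch of this in-the-loop detector follows, assuming a growing buffer omegaBuffer of MEKF-estimated angular rates (one column per axis) and the same hypothetical extractFeaturePool helper as before.

```matlab
winLen = round(20/dt);                        % 20 s window expressed in samples
if size(omegaBuffer, 1) >= winLen             % emit a null output until the buffer is full
    feats = [];
    for ax = 1:3                              % 26 features per axis -> 78 inputs in total
        feats = [feats, extractFeaturePool(omegaBuffer(end-winLen+1:end, ax))];
    end
    driftFlag = predict(driftNet, feats);     % 0 = nominal, 1 = drift detected
else
    driftFlag = 0;
end
```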
Figure 12 explores exactly this case study: the Neural Network detector emits a null output for the first 20 s, since it does not yet have enough data to work with. After that, the window starts moving and detection effectively begins: it takes only a few seconds, just enough to incorporate the onset of the anomaly in the window, for the network to realize something is wrong and raise a flag.
Despite the lower update rate of the Neural Network output (the window refreshes once per second, so its resolution is much coarser than the simulation sampling time), it clearly surpasses the classic threshold-based detector by being considerably quicker and more targeted. As previously noted, the training in this case was carried out specifically to identify drifts, the anomalies to which classic detection is completely blind; it is also interesting that the classifier works by monitoring parameters in a completely different subsystem from the one where the fault is located. It is easy to imagine other examples of classifiers and classic detection coming together to cover each other's weaknesses and blind spots, even while remaining within the realm of simple detection.

6. Final Remarks

In this research, the challenge of defining a reliable data-driven fault detection and diagnosis strategy for space systems has been explored. The focus was placed in particular on Support Vector Machines and Neural Networks, trained and validated on large datasets produced in a high-fidelity functional engineering simulator and compared against a benchmark variance-based method. The possible faults occurring on a space platform can be reduced to a small pool of basic instances, and the feature extraction phase has been presented in detail as one of the main elements responsible for the quality and accuracy of the classifiers in use. After a description of the hypotheses employed and of the satellite replicated in the simulation, the testing phase was set up.
The benchmark threshold-based detection algorithm has shown its strengths and weaknesses in terms of the accuracy reached even with a meticulous tuning of its parameters. The completely different approach and higher flexibility of AI-aided classifiers might suggest a progressive substitution of classic methods, or at least the use of both in parallel for fail-safe redundancy. However, classifier performance is not yet robust enough to confidently replace classic methods entirely: what is recommended instead is a synergistic use, which outperforms the capabilities of either architecture alone. Classic detection is optimal for simple checks, such as those signaling an out-of-range sensor or an actuator stuck on a constant value, but it runs into trouble when tasked with extracting useful information from a signal whose variance trend has never been observed with respect to nominal conditions. At that point, it is beneficial to let a more refined classifier enter the analysis and support these simple monitoring functions by filling their capability gaps.

Classifiers such as SVMs and Neural Networks can process signal features with remarkable sophistication and, in the case of ANNs, with very high computational speed: apart from the training time required in the design phase, their real-time performance is superior in most instances. By coupling simple variance-based thresholding with the accuracy and speed of trained learning agents for deeper diagnosis, much more information can be extracted after a fault has occurred, and autonomous onboard recovery decisions, if shaped by this enriched awareness of all subsystem statuses, can reach higher levels of autonomy than currently achieved. Thanks to Machine Learning and Artificial Intelligence, spacecraft can attain a high level of robust self-diagnostics, not only for GNC and sensor management but, following this paradigm, for the entire space system, with increased lifetime and improved self-healing capability.

Author Contributions

Conceptualization, A.C. and E.C.; methodology, A.C. and E.C.; formal analysis, E.C.; data curation, E.C.; writing—original draft preparation, E.C.; writing—review and editing, A.C. and E.C.; visualization, E.C.; supervision, A.C.; project administration, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries are welcomed and can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADCS    Attitude Determination and Control System
AI      Artificial Intelligence
ANN     Artificial Neural Network
CNN     Convolutional Neural Network
FC      Fully Connected
FDIR    Fault Detection, Isolation, and Recovery
GNC     Guidance, Navigation, and Control
MEKF    Multiplicative Extended Kalman Filter
ML      Machine Learning
SVM     Support Vector Machine

References

1. Pesce, V.; Colagrossi, A.; Silvestrini, S. Modern Spacecraft Guidance, Navigation and Control-Introduction; Elsevier: Amsterdam, The Netherlands, 2023; pp. 3–42.
2. Lobo, J.S.; Ghiglino, P.; Escobedo, S.L.; Rivo, M.S.; Robotics, K. Design of a model-based failure detection isolation and recovery system for cubesats. In Proceedings of the 8th European Conference for Aeronautics and Aerospace Sciences (EUCASS), Madrid, Spain, 1–4 July 2019.
3. Xu, S.; Zheng, Z.; Wang, L.; Wang, H.; Chai, Y.; Ma, M.; Zheng, W.X. Multiple Open-Switch Fault Diagnosis of Grid-Connected Three-Phase Inverters Under Unknown Parameter Conditions Using ICRLS and Disturbance Sliding Mode Observer. IEEE Trans. Power Electron. 2025, 40, 8631–8647.
4. Marzat, J.; Piet-Lahanier, H.; Damongeot, F.; Walter, E. Fault diagnosis for nonlinear aircraft based on control-induced redundancy. In Proceedings of the 2010 Conference on Control and Fault-Tolerant Systems (SysTol), Nice, France, 6–8 October 2010; pp. 119–124.
5. A review of data-driven fault detection and diagnostics for building HVAC systems. Appl. Energy 2023, 339, 121030.
6. Rauber, T.; Boldt, F.; Varejao, F. Heterogeneous Feature Models and Feature Selection Applied to Bearing Fault Diagnosis. IEEE Trans. Ind. Electron. 2015, 62, 637–646.
7. Jan, S.U.; Lee, Y.; Shin, J.; Koo, I. Sensor Fault Classification Based on Support Vector Machine and Statistical Time-Domain Features. IEEE Access 2017, 5, 8682–8690.
8. Samanta, B. Gear fault detection using artificial neural network & support vector machine with genetic algorithms. Mech. Syst. Signal Process. 2004, 18, 625–644.
9. Gao, Y.; Yang, T.; Xing, N.; Xu, M. Fault detection and diagnosis for spacecraft using principal component analysis and support vector machines. In Proceedings of the 2012 7th IEEE Conference on Industrial Electronics and Applications (ICIEA), Singapore, 18–20 July 2012; pp. 1984–1988.
10. Ding, H.; Liang, Z.; Qi, L.; Sun, H.; Liu, X. Spacecraft Leakage Detection Using Acoustic Emissions Based on Empirical Mode Decomposition and Support Vector Machine. In Proceedings of the 2021 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Glasgow, UK, 17–20 May 2021; pp. 1–6.
11. Ma, Y.; Guo, G. Support Vector Machines Applications; Springer International Publishing: Cham, Switzerland, 2014; pp. 1–302.
12. Aoudi, W.; Barbar, A.M. Support vector machines: A distance-based approach to multi-class classification. In Proceedings of the 2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET), Beirut, Lebanon, 2–4 November 2016; pp. 75–80.
13. Brunton, S.; Kutz, J. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control; Cambridge University Press: Cambridge, UK, 2022.
14. Sorsa, T.; Koivo, H.; Koivisto, H. Neural networks in process fault diagnosis. IEEE Trans. Syst. Man Cybern. 1991, 21, 815–825.
15. Samanta, B.; Al-Balushi, K. Artificial Neural Network based fault diagnostics of rolling element bearings using time-domain features. Mech. Syst. Signal Process. 2003, 17, 317–328.
16. Ma, W.; Wang, J.; Song, X.; Qi, J.; Yu, Y.; Hu, D. Data-Driven Model Space Method for Fault Diagnosis of High-Speed Train Air Brake Pipes. Appl. Sci. 2023, 13, 8335.
17. Park, P.; Marco, P.D.; Shin, H.; Bang, J. Fault Detection and Diagnosis Using Combined Autoencoder and Long Short-Term Memory Network. Sensors 2019, 19, 4612.
18. Valdes, A.; Khorasani, K.; Ma, L. Dynamic Neural Network-Based Fault Detection and Isolation for Thrusters in Formation Flying of Satellites. Adv. Neural Netw. 2009, 5553, 780–793.
19. O’Meara, C.; Schlag, L.; Wickler, M. Applications of Deep Learning Neural Networks to Satellite Telemetry Monitoring. In Proceedings of the 2018 SpaceOps Conference, Marseille, France, 28 May–1 June 2018.
20. Li, X.; Zhang, T.; Liu, Y. Detection of Voltage Anomalies in Spacecraft Storage Batteries Based on a Deep Belief Network. Sensors 2019, 19, 4702.
21. Colagrossi, A.; Lavagna, M. Fault Tolerant Attitude and Orbit Determination System for Small Satellite Platforms. Aerospace 2022, 9, 46.
22. Martínez-Heras, J.A.; Donati, A. Enhanced telemetry monitoring with novelty detection. AI Mag. 2014, 35, 37–46.
23. Markley, L. Multiplicative vs. Additive Filtering for Spacecraft Attitude Determination; NASA Goddard Space Flight Center: Greenbelt, MD, USA, 2004.
Figure 1. Scheme of data-driven detection performed. Outputs of sensors and actuators, the signals studied in this work, follow two different pipelines: in one case, they are directly processed by a threshold-based detection. In parallel to that, signals generated during many simulations are gathered in datasets, then processed to extract and rank by importance only a few meaningful features. These are the actual inputs to train learning agents, which are the other detection elements of this study.
Figure 2. Test instances for data-driven detection in this study. Variance-based detection has a rigid structure and is therefore employed only for its main use, shown in the first row of this graph: raising a flag when monitoring an anomalous signal. Machine Learning algorithms, which are more flexible, are trained on 4-class detection of one or more faults, binary detection, and diagnosis of faulty signals. The last two instances, surrounded by the dashed line, represent the specific case studies tested within a live simulation. All of these tests are detailed in Section 5.
Figure 3. Classic detection block layout. The input is the low-pass filtered signal coming from a monitored sensor or actuator: it undergoes various threshold comparisons and value checks to determine whether a fault is occurring (signal exceeding the instrument range, stuck signal, or partial loss of data). At the same time, the running variance of the signal is calculated axis by axis and checked: if it is too high, the anomaly flag is raised; if it is too low, the information is transferred to the other part of the process, where other flags may be raised. Right before each output flag there is a custom-made delayer, which ensures that no flag is turned off unless the alarm has stayed consistently off for at least 10 consecutive simulation samples [21].
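As a minimal illustration of the logic summarized in this caption, the running-variance check with a 10-sample turn-off delayer might be sketched as follows; the thresholds and window length are illustrative placeholders, not the tuned values used in the simulator.

```python
import numpy as np

HIGH_VAR, LOW_VAR = 1e-2, 1e-8   # illustrative thresholds, tuned per instrument in practice
HOLD_SAMPLES = 10                # flag released only after 10 consecutive clear samples

class VarianceDetector:
    """Sketch of the per-axis variance-based 'Anomaly' check with turn-off delayer."""

    def __init__(self, window_len):
        self.window_len = window_len
        self.buffer = []
        self.clear_count = 0
        self.flag = False

    def update(self, sample):
        self.buffer.append(sample)
        if len(self.buffer) > self.window_len:
            self.buffer.pop(0)                      # running window of the latest samples
        var = np.var(self.buffer) if len(self.buffer) == self.window_len else 0.0
        if var > HIGH_VAR:                          # excessive variance: raise immediately
            self.flag, self.clear_count = True, 0
        elif self.flag:                             # delayer: release only after HOLD_SAMPLES
            self.clear_count += 1
            if self.clear_count >= HOLD_SAMPLES:
                self.flag, self.clear_count = False, 0
        # a var < LOW_VAR condition would feed the stuck-signal branch of Figure 3
        return self.flag
```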
Figure 4. Comparison of One-vs-One and One-vs-All architectures for multi-class SVMs.
Figure 5. Example structure of a simple MLP with only one hidden layer.
Figure 6. Basic types of anomalous measurements. The signal taken as example is one-axis gyroscope data during a slew maneuver. In order: bias (a), drifting signal (b), erratic behavior (c), spikes (d), stuck signal (e), data loss (f).
Figure 7. Functional model of a generic sensor.
Figure 8. Dataset processing to obtain testing accuracy.
Figure 9. Differences in the two processing methods. While classic detection in the functional engineering simulator works through a moving window that scans real-time signals to extract their variance, feature extraction uses those same windows to segment the signal and extract meaningful features over short time periods. For a rigorous training and for immediate modularity in the functional simulator, the sliding window used by all methods is 20 s long.
Figure 10. Plot (a) shows the faulty magnetometer signal. Plot (b) highlights how the classic threshold-based detector quickly raises a flag, indicating an anomaly. Plot (c) shows the Large Neural Network responsible for diagnostics (from Table 8) adding detail to the analysis by pointing out exactly what kind of fault is occurring.
Figure 11. MEKF output classification network layout.
Figure 12. Example of monitored signals from the proposed architecture. Plot (a) shows the actuator output, with an injected drift fault starting at time t = 20 s. Plot (b) shows the angular rates estimated by the MEKF, with a visible repercussion of the fault happening elsewhere. Plot (c) is the “Anomaly” variance-based detector monitoring the actuators, which raises a flag only when the continuous fault injection stops. Plot (d) is the Large Neural Network monitoring the MEKF outputs. This is one of the mentioned instances in which a fault on the actuators completely bypasses threshold-based detection but is seen and recognized by classifiers strategically trained and placed on the outputs of the attitude estimation block.
Table 1. Time-domain statistical features, from a generic vector input signal x of length N.
$\mathrm{AVG} = \mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$ ; $\mathrm{STD} = \sigma = \left[\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2\right]^{1/2}$
$\mathrm{MAX} = \max(x)$ ; $\mathrm{MIN} = \min(x)$
$\mathrm{RMS} = \left[\frac{1}{N}\sum_{i=1}^{N} x_i^2\right]^{1/2}$ ; $\mathrm{SRA} = \left[\frac{1}{N}\sum_{i=1}^{N}\sqrt{|x_i|}\right]^{2}$
$\mathrm{KV} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\mu_x}{\sigma}\right)^{4}$ ; $\mathrm{SV} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\mu_x}{\sigma}\right)^{3}$
$\mathrm{PPV} = \max(x) - \min(x)$ ; $\mathrm{CF} = \frac{\max(|x|)}{\mathrm{RMS}}$
$\mathrm{IF} = \frac{\max(|x|)}{\frac{1}{N}\sum_{i=1}^{N}|x_i|}$ ; $\mathrm{MF} = \frac{\max(|x|)}{\mathrm{SRA}}$
$\mathrm{SF} = \frac{\mathrm{RMS}}{\frac{1}{N}\sum_{i=1}^{N}|x_i|}$ ; $\mathrm{KF} = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i-\mu_x}{\sigma}\right)^{4}}{\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right)^{2}}$
Table 2. Frequency-domain statistical features.
$\mathrm{FC} = \frac{1}{N}\sum_{i=1}^{N} f_i$ ; $\mathrm{RMSF} = \left[\frac{1}{N}\sum_{i=1}^{N} f_i^2\right]^{1/2}$
$\mathrm{RVF} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(f_i - \mathrm{FC}\right)^2\right]^{1/2}$
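For illustration, the per-window features of Tables 1 and 2 can be computed as in the following sketch. The interpretation of f as the amplitude spectrum of the window, as well as the exact forms of SRA and SF, are assumptions of this sketch rather than guaranteed reproductions of the implementation used in the paper.

```python
import numpy as np

def time_domain_features(x):
    """Time-domain features of Table 1 for a 1-D signal window x (sketch)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    rms = np.sqrt(np.mean(x**2))
    sra = np.mean(np.sqrt(np.abs(x)))**2              # square-root amplitude (assumed form)
    return {
        "AVG": mu, "STD": sigma, "MAX": x.max(), "MIN": x.min(),
        "RMS": rms, "SRA": sra,
        "KV": np.mean(((x - mu) / sigma)**4),          # kurtosis value
        "SV": np.mean(((x - mu) / sigma)**3),          # skewness value
        "PPV": x.max() - x.min(),                      # peak-to-peak value
        "CF": np.max(np.abs(x)) / rms,                 # crest factor
        "IF": np.max(np.abs(x)) / np.mean(np.abs(x)),  # impulse factor
        "MF": np.max(np.abs(x)) / sra,                 # margin factor
        "SF": rms / np.mean(np.abs(x)),                # shape factor (assumed form)
        "KF": np.mean(((x - mu) / sigma)**4) / np.mean(x**2)**2,
    }

def frequency_domain_features(x):
    """Frequency-domain features of Table 2; f is taken here as the amplitude
    spectrum of the window (an assumption of this sketch)."""
    f = np.abs(np.fft.rfft(np.asarray(x, dtype=float)))
    fc = f.mean()
    return {"FC": fc,
            "RMSF": np.sqrt(np.mean(f**2)),
            "RVF": np.sqrt(np.mean((f - fc)**2))}
```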
Table 3. Simulation parameters at the start of each run.
Element | Details | Value | Units
Spacecraft's inertia | Asymmetric | I_xx = 1.000, I_yy = 0.250, I_zz = 0.950 | [kg m²]
Parasitic magnetic dipole | – | [0.1 0.1 0.1] | [A m²]
Initial angular rates | Mode: No Control | [0.1 0.1 0.1] | [rad/s]
  | Mode: Earth pointing | [10⁻⁶ 10⁻⁶ 10⁻⁶] | [rad/s]
  | Mode: Slew | [10⁻⁶ 10⁻⁶ 10⁻⁶] | [rad/s]
Initial orientation | Uniform random | – | [–]
Spacecraft altitude | – | 700 ± 100 | [km]
Orbital parameters | Semimajor axis | a = R + altitude | [km]
  | Eccentricity | e ≤ 0.01 | [–]
  | A.N. longitude | Ω = 0–359 | [deg]
  | Inclination | i = 0–90 | [deg]
  | Periapsis argument | ω = 0–359 | [deg]
  | Initial true anomaly | θ₀ = 0–359 | [deg]
Table 4. Classifiers employed during the testing campaign.
Name | Kernel (SVM) | Hidden Layer Nodes (ANN)
Linear SVM | $x_1^{T} x_2$ | –
Quadratic SVM | $(x_1^{T} x_2 + 1)^2$ | –
Cubic SVM | $(x_1^{T} x_2 + 1)^3$ | –
Gaussian SVM | $\exp\left(-\|x_1 - x_2\|^2 / (2\sigma^2)\right)$ | –
Narrow Neural Network | – | [10]
Medium Neural Network | – | [25]
Large Neural Network | – | [100]
Bilayered Neural Network | – | [10/10]
Trilayered Neural Network | – | [10/10/10]
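As an illustrative sketch, independent of the environment actually used in this work, the classifier family of Table 4 could be instantiated with, for example, scikit-learn; setting gamma=1 and coef0=1 makes the polynomial kernel match the (x1ᵀx2 + 1)^d form listed above.

```python
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Classifier family of Table 4 (sketch; hyperparameters beyond the kernel and
# hidden-layer sizes are left at library defaults as an assumption).
classifiers = {
    "Linear SVM":    SVC(kernel="linear"),
    "Quadratic SVM": SVC(kernel="poly", degree=2, gamma=1, coef0=1),
    "Cubic SVM":     SVC(kernel="poly", degree=3, gamma=1, coef0=1),
    "Gaussian SVM":  SVC(kernel="rbf"),
    "Narrow NN":     MLPClassifier(hidden_layer_sizes=(10,)),
    "Medium NN":     MLPClassifier(hidden_layer_sizes=(25,)),
    "Large NN":      MLPClassifier(hidden_layer_sizes=(100,)),
    "Bilayered NN":  MLPClassifier(hidden_layer_sizes=(10, 10)),
    "Trilayered NN": MLPClassifier(hidden_layer_sizes=(10, 10, 10)),
}

# Usage sketch: clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```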
Table 5. Direct comparison: test batch % accuracy 1 on a 4-class classification. Each sensor in each attitude mode is validated and tested on a balanced dataset of 672 total entries.
Attitude Mode | Sensor | Classic | Linear SVM | Quadratic SVM | Cubic SVM | Gaussian SVM | Narrow NN | Medium NN | Large NN | Bilayered NN | Trilayered NN
Pointing | GYRO | 93.75 | 88.1 | 91.1 | 94 | 80.6 | 82.1 | 88.1 | 92.5 | 89.6 | 94
Pointing | MAG | 93.75 | 86.6 | 91 | 82.1 | 79.1 | 88.1 | 85.1 | 83.6 | 80.6 | 80.6
Pointing | SUN | 93.75 | 88.1 | 91 | 91 | 89.6 | 88.1 | 92.5 | 95.5 | 95.5 | 94
Pointing | ACTS | 93.75 | 83.6 | 82.1 | 88.1 | 82.1 | 89.6 | 91 | 89.6 | 86.6 | 86.6
Slew | GYRO | 93.75 | 88.1 | 88.1 | 88.1 | 83.6 | 86.6 | 89.6 | 80.6 | 91 | 92.5
Slew | MAG | 93.75 | 86.6 | 88.1 | 83.6 | 77.6 | 88.1 | 86.6 | 86.6 | 85.1 | 74.6
Slew | SUN | 93.75 | 89.6 | 91 | 88.1 | 94 | 94 | 94 | 92.5 | 89.6 | 88.1
Slew | ACTS | 93.75 | 92.5 | 91 | 83.6 | 86.6 | 88.1 | 83.6 | 83.6 | 82.1 | 79.1
No Control | GYRO | 93.75 | 80.6 | 86.6 | 73.1 | 76.1 | 91 | 91 | 85.1 | 85.1 | 95.5
No Control | MAG | 93.75 | 83.6 | 86.6 | 89.6 | 83.6 | 88.1 | 83.6 | 82.1 | 86.6 | 88.1
No Control | SUN | 93.6 | 92.5 | 92.5 | 88.1 | 92.5 | 89.6 | 89.6 | 97 | 94 | 94
No Control | ACTS | – | – | – | – | – | – | – | – | – | –
1 Results reported in bold are the highest accuracies reached by each classifier in this example.
Table 6. Binary classifiers training: test batch % accuracy 1 on a 2-class classification. Each sensor in each attitude mode is validated and tested on a balanced dataset of 672 total entries.
Attitude Mode | Sensor | Linear SVM | Quadratic SVM | Cubic SVM | Gaussian SVM | Narrow NN | Medium NN | Large NN | Bilayered NN | Trilayered NN
Pointing | GYRO | 89.6 | 92.5 | 95.5 | 92.5 | 97 | 97 | 95.5 | 98.5 | 94
Pointing | MAG | 94 | 91 | 91 | 91 | 95.5 | 94 | 95.5 | 95.5 | 95.5
Pointing | SUN | 91 | 91 | 71.6 | 89.9 | 94 | 94 | 91 | 91 | 94
Pointing | ACTS | 89.6 | 92.5 | 91 | 79.1 | 91 | 91 | 92.5 | 95.5 | 95.5
Slew | GYRO | 89.6 | 92.5 | 92.5 | 82.1 | 88.1 | 91 | 91 | 91 | 88.1
Slew | MAG | 86.6 | 97 | 95.5 | 83.6 | 95.5 | 97 | 95.5 | 92.5 | 94
Slew | SUN | 94 | 100 | 85.1 | 92.5 | 95.5 | 97 | 94 | 97.7 | 98.5
Slew | ACTS | 92.5 | 94 | 92.5 | 83.6 | 95.5 | 92.5 | 92.5 | 97 | 92.5
No Control | GYRO | 89.6 | 94 | 88.1 | 91 | 88.1 | 92.5 | 91 | 91 | 89.6
No Control | MAG | 89.6 | 95.5 | 91 | 88.1 | 95.5 | 95.5 | 92.5 | 95.5 | 95.5
No Control | SUN | 95.5 | 95.5 | 61.6 | 91 | 98.5 | 95.5 | 97 | 95.5 | 92.5
No Control | ACTS | – | – | – | – | – | – | – | – | –
1 Results reported in bold are the highest accuracies reached by each classifier in this example.
Table 7. Binary classifiers testing with multiple faults: test batch % accuracy 1 on a 2-class classification. Each sensor in each attitude mode is validated and tested on a balanced dataset of 672 total entries.
Attitude Mode | Sensor | Linear SVM | Quadratic SVM | Cubic SVM | Gaussian SVM | Narrow NN | Medium NN | Large NN | Bilayered NN | Trilayered NN
Pointing | GYRO | 94 | 98.5 | 97 | 97 | 98.5 | 100 | 98.5 | 98.5 | 97
Pointing | MAG | 97 | 94 | 82.1 | 97 | 98.5 | 97 | 97 | 95.5 | 97
Pointing | SUN | 95.5 | 95.5 | 65.7 | 100 | 94 | 92.5 | 95.5 | 94 | 92.5
Pointing | ACTS | 91 | 95.5 | 82.1 | 92.5 | 98.5 | 89.6 | 98.5 | 91 | 91
Slew | GYRO | 95.5 | 97 | 74.6 | 92.5 | 97 | 95.5 | 98.5 | 95.5 | 83.6
Slew | MAG | 86.6 | 97 | 95.5 | 83.6 | 95.5 | 97 | 95.5 | 92.5 | 94
Slew | SUN | 94 | 91 | 56.7 | 89.6 | 91 | 95.5 | 91 | 92.5 | 97
Slew | ACTS | 92.5 | 100 | 59.7 | 98.5 | 98.5 | 92.5 | 95.5 | 98.5 | 94
No Control | GYRO | 95.5 | 94 | 83.6 | 98.5 | 91 | 98.5 | 98.5 | 94 | 97
No Control | MAG | 95.5 | 95.5 | 91 | 95.5 | 97 | 97 | 98.5 | 98.5 | 98.5
No Control | SUN | 91 | 89.6 | 76.1 | 98.5 | 98.5 | 98.5 | 100 | 98.5 | 98.5
No Control | ACTS | – | – | – | – | – | – | – | – | –
1 Results reported in bold are the highest accuracies reached by each classifier in this example.
Table 8. Autonomous diagnostics agents: test batch % accuracy on a 4-class classification. Each sensor in each attitude mode is validated and tested on a balanced dataset of 480 total entries.
Attitude Mode | Sensor | Linear SVM | Quadratic SVM | Cubic SVM | Gaussian SVM | Narrow NN | Medium NN | Large NN | Bilayered NN | Trilayered NN
Pointing | GYRO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Pointing | MAG | 100 | 100 | 100 | 97.9 | 97.9 | 100 | 97.9 | 100 | 95.8
Pointing | SUN | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97.9
Pointing | ACTS | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 97.9
Slew | GYRO | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Slew | MAG | 100 | 100 | 100 | 95.8 | 97.9 | 97.9 | 100 | 97.9 | 97.9
Slew | SUN | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
Slew | ACTS | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
No Control | GYRO | 100 | 100 | 100 | 100 | 100 | 97.9 | 97.9 | 100 | 97.9
No Control | MAG | 97.9 | 97.9 | 100 | 95.8 | 97.9 | 100 | 100 | 97.9 | 95.8
No Control | SUN | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100
No Control | ACTS | – | – | – | – | – | – | – | – | –
Table 9. Profiler-Times Report (in seconds) on a slew maneuver run with all classifiers active. Total simulation time: 16.455 s.
Table 9. Profiler-Times Report (in seconds) on a slew maneuver run with all classifiers active. Total simulation time: 16.455 s.
Type ModeTimeSVMANN
Linear Quadratic Cubic Gaussian Narrow Medium Large Bilayered Trilayered
4-class detectionTotal Time0.0080.0170.0160.2260.0050.0040.0040.0040.004
Self Time0.0010.0010.0010.0010.0010.0010.0010.0010.001
Binary detectionTotal Time0.0020.0040.0030.0800.0040.0040.0040.0040.004
Self Time0.0010.0010.0010.0010.0010.0010.0010.0010.001
4-class diagnosisTotal Time0.0070.0240.0160.0790.0050.0040.0040.0040.004
Self Time0.0010.0010.0010.0010.0010.0010.0010.0010.001
Table 10. Profiler-Times Report (in seconds) on a slew maneuver run for classic detectors. Total simulation time: 16.455 s.
 | Gyroscope | Sun Sensor | Magnetometer | Actuators
Total Time | 0.428 | 0.411 | 0.446 | 0.468