Deep Generative Models-Based Anomaly Detection for Spacecraft Control Systems.

A spacecraft attitude control system provides mechanical and electrical control to achieve the required functions under various mission scenarios. Although generally designed to be highly reliable, mission failure can occur if anomalies occur and the attitude control system fails to properly orient and stabilize the spacecraft. Because accessing spacecraft to directly repair such problems is usually infeasible, developing a continuous condition monitoring model is necessary to detect anomalies and respond accordingly. In this study, a method for detecting anomalies and characterizing failures for spacecraft attitude control systems is proposed. Herein, features are extracted from multidimensional time-series data of a simulation of the attitude control system. Then, the artificial neural network learning algorithms based on two types of generation models are applied. A Bayesian optimization algorithm with a Gaussian process is used to optimize the hyperparameters for the neural network to improve the performance. The performance is evaluated based on the reconstruction error through the algorithm using the newly generated data not used for learning as input data. Results show that the detection performance depends on the operating characteristics of each submode in the operation scenarios and type of generation model. The diagnostic results are monitored to detect anomalies in operation modes and scenarios.


Introduction
Complex systems can sometimes fail or behave abnormally, which can affect the success or failure of a given mission. Therefore, detection of these related phenomena occurring in the system is important for decision making in order to successfully perform a mission. Several statistical methods have been developed to address various issues including, more recently, machine learning techniques [1]. The two main challenges associated with failure detection are fault diagnosis (FD) and anomaly detection (AD): FD is intended to determine what failure occurred, and AD is used to determine the extent of the deviation from the proper operating range of the system due to the failure [2]. AD does not rely on prior information about the type or form of failure as it only determines if the system is operating outside of its normal range. This approach is gaining attention as a way to evaluate the health of large, complex systems, such as aerospace systems. However, it is difficult to know in advance the mechanism by which faults occur and to obtain data regarding fault conditions because most aerospace systems are designed with high-reliability [3,4]. Further, in complex systems, anomalies occur via complicated pathways. Therefore, a method for processing multidimensional data coming from multiple channels is required.
In this study, a spacecraft system is considered as a representative example of a typical large system in aerospace. A spacecraft system anomaly is defined as any mission-degrading event affecting an data that can be clearly labeled as normal and abnormal. The data set utilizes soft-sensor-based data generated by LUNar Attitude and Orbit Control System (AOCS) SIMulator (LUNASIM) [21], a satellite attitude control simulator developed in-house at the Korea Aerospace Research Institute (KARI) to test the performance of attitude control flight software for the Korea Pathfinder Lunar Orbiter (KPLO) [22]. A series of processes for AD using multidimensional time series spacecraft AOCS data includes the following works: (a) preprocessing of multi-channel time series data (including purification, integration, cleanup and transformation) and data feature extraction using deep neural networks; (b) selecting diagnostic methods based on generative models considering actual data conditions (e.g. number of samples and level of labeling); (c) improving algorithm performance and robustness by applying automatic determination of hyperparameters using Bayesian optimization (BO) with a Gaussian process (GP) and data processing considering the characteristics of physical model and applied deep networks; (d) reviewing the suitability and validity of the system by comparing the predicted results between two selected generation models (VAE [23] and generative adversarial networks AD (GANomaly) [24]); (e) presenting AD monitoring cases in the spacecraft attitude control system based on the diagnosis results.

Problem and Data Description
This section describes the problem to be solved, the simulator used to generate data and characteristics of the data.

Problem Description
A spacecraft attitude control system determines and controls a spacecraft's attitude and momentum using various sensors and actuators within the limits defined by the payload pointing and the stability requirements until the mission is completed. Although the spacecraft is designed to be highly reliable in general, anomalies can lead to deviations from the requirements. Certain anomalies are apparent from the telemetry alarms, such as when a sensor fails; however, other anomalies may manifest as a collection of factors, trends, or nonlinear differences from the normal telemetry that are not apparent. Such a spectrum of anomalies presents special challenges when the spacecraft has difficulty making ground contact because of low communication bit rates, large distances, irregular contact times, and inconveniently located ground stations, such as during a lunar mission. An AD system using machine learning could be an effective tool in this case because it could proactively and semi-autonomously detect possible anomalies. In this study, we develop a model for detecting anomalies in the attitude control system signals for the development of KPLO. Table 1 shows the typical spacecraft attitude control system units and their corresponding output signals that are used by KPLO. KPLO is KARI's first lunar mission; therefore, the operational data is not available. We used simulated attitude control performance data generated by LUNASIM [21]. Certain aspects of the KPLO mission are similar to low-Earth and geosynchronous orbit missions; however, key differences exist that prevent us from blindly applying existing Earth-orbit AD standards. For example, a coordinate system transition occurs when entering the lunar orbit [22]. Moreover, the spacecraft experiences large angular rates and disturbances while preparing for the lunar orbit and during the lunar orbit insertion at equinoxes for reorienting the solar arrays. Large angular rates are also experienced at the apogee Sensors 2020, 20, 1991 4 of 20 and perigee during the lunar transfer orbit. An AD must consider these unique data characteristics to prevent false positives. In the next section, we provide a general description of LUNASIM, its output data, and the various types of simulated faults.

LUNASIM
LUNASIM [21] is being developed in-house at KARI to test the performance of attitude control flight software for the KPLO, a lunar orbiter mission currently in its critical design phase. LUNASIM is composed of attitude control flight software (FSW), equipment, environment, and dynamics models, I/O handlers, and simulation infrastructure. It can run flight software in a software-only model-in-the-loop mode on a PC, or in hardware-in-the-loop mode with flight software running on an actual onboard computer (OBC) interfacing with equipment hardware and model software for environment and dynamics.
KPLO attitude control FSW switches to discrete submodes when operating. Attitude control submodes group together similar attitude control methods and parameters into a set of predictable states or control laws. Each submode behaves differently, and each is assigned different performance envelopes. In particular, some submodes may use thrusters and Sun sensors for attitude control while others do not, thus training must be done on each mode separately. Table 2 lists the submodes.  (Table 3) as a series of pairs of event commands and times that may include submode transitions but also other maneuver commands. For example, the scenarios in Table 3 including LAM submode transitions also perform 180 degree rotation. Test scenarios may be of various durations. All scenarios start at 0, and brackets indicate the moment in time at which the mode transition occurs. For example, the first TP in LAM1 lasts from 199 to 698 s. The first mode transition usually does not start at 0 and this is because some extra (dummy) time is allocated at the beginning of scenarios to allow time for turning on equipment, etc. Scenario results are written as time-series tables in a database. All data are output at 8 Hz, matching the frequency at which flight software runs on the OBC.
Crucially, LUNASIM contains fault injection features that allow the test operator to inject a pre-defined set of faults at certain points during a simulation run. Available faults include: 1) failure of one of four reaction wheel assemblies (RWA), 2) wheel speed anomaly in RWA #1, 3) spike in gyro reference assembly (GRA)-measured spacecraft angular rate, 4) spike in course Sun sensor array (CSSA) #1 and #4. These injectable faults are adapted from a subset of fault management test code that has been developed for other spacecraft and are representative of faults that have a realistic chance of occurring on orbit.
Wheel speed anomalies and GRA and CSSA spikes can be simulated at arbitrary and multiple points during a run. We define special event commands that invoke (and cancel) these faults at the appropriate times, e.g., by setting a variable to a hard-coded 'failure' value. This, in turn, causes the simulator dynamics to react in a nonlinear fashion that is visible in telemetry variables but not Sensors 2020, 20, 1991 5 of 20 necessarily describable in a deterministic way. Wheel failure is a special case in that it is simulate d for the entire length of a run. This is because the operation concepts are: (1) Reboot to thruster-based safe-hold mode on a wheel failure, (2) Via ground command, start using the remaining three wheels only. As we are interested in attitude control performance after exiting safe-hold mode, our wheel-failure scenarios assume that the failure has already occurred and continues indefinitely. LUNASIM also models equipment noise such as gyro rate and angular noise over time, sun sensor current fluctuations, wheel voltage noise, etc. While a full treatment of LUNASIM noise models is beyond the scope of this work, the faults and maneuvers triggered in the test scenarios typically result in instantaneous signals with magnitudes that are many times greater than the mean and are visually distinguishable from noise, and with diminishing but persistent magnitudes afterward.

Data Schema
For every simulation run, LUNASIM writes a total of 31 tables, grouped roughly by functionality. These tables include spacecraft rate and stability, control error, knowledge error, various equipment (e.g., star trackers, thrusters, Sun sensors, solar array drive mechanisms (SADMs), RWAs, and GRAs), Kalman filter, disturbance torques, and current attitude control submode. While each table contains a different number of columns, each table follows the same general layout: a time column as the primary key (simulation time, starting from 0 s), followed by columns that usually map to variables used in simulator models or flight software. Such variables may represent continuous or discrete (e.g., true/false) variables. A sample table schema is shown in Table 4.  Table 2). Attitude control submodes group together similar attitude control methods and parameters into a set Sensors 2020, 20, 1991 6 of 20 of predictable states or control laws. Each submode behaves differently, and each is assigned different performance envelopes, allowing training on each mode separately. In particular, some submodes may use thrusters and Sun sensors for attitude control while others do not. For this reason, the set of injectable failures also differs based on the submode. Table 5 gives a summary of faults injected per scenario.

Preliminaries
In this study, the model learns using only normal data in an unsupervised mode, and then it is applied to abnormal data as an AD model. This method then detects anomalies by measuring the reconstruction error when new data is input. The key is that restoration would not work properly for abnormal data, because the generative model was trained using only normal data. This, in turn, means that the reconstruction error of a given sample is a good estimator for detecting an abnormal state. In this study, we utilize two unsupervised learning algorithms to detect abnormal states solely based on normal data sets: VAE [23] and GANomaly [24].

Variational Autoencoder (VAE)
VAE is an improvement on existing autoencoders (AE) [25], and it differs in that it samples latent code from a probability distribution. An AE always generates the same code for the same input, but VAE is a probabilistic model that tries to reconstruct data from the input data space via maximum likelihood estimation and variational inference. The model comprises two parts, encoder and decoder.
The key is that VAE trained with only normal data cannot reconstruct abnormal data well; this means the reconstruction error of a given sample is a good estimator for detection of an abnormal state. In VAE, the mean and standard deviation of a normal distribution are estimated through the encoder, and the data x are restored by sampling the code z, ( e φ : x → z) . Further, we could generate new samples similar to the original input data (x) by inputting the codes sampled from the distribution p(z) of the latent vectors (z) to the decoder, ( d θ : z → x) . [26]. In VAE, the encoder and decoder model parameters of the conditional distributions are denoted as q φ (z|x) and P θ (x|z), which were assumed to be Gaussian distributions. The cost function for training the model is as follows: where θ and φ denote hidden parameters of neural nets and λ = σ 2 x is a tuning parameter equal to a known variance of data [26]. E q(z|x) x − d θ (z) 2 2 is the reconstruction error and upon minimizing it through backpropagation, the AE is trained. D KL q φ (z|x) p(z) , the Kullback-Leibler divergence [27], is aimed to minimize the reconstruction loss of x during training. An important part of the training is the reparameterization trick, which creates random decoder inputs in the following manner: z = µ z + σ z , where µ z and σ z are outputs of the encoder, and is the sampled form p(z) [26].

GANomaly
GANs are holistic models for reconstructing data from a latent space by optimizing Jensen Shannon Divergence (JSD) of generated and real-world samples [28]. The first GAN-based attempt to detect anomalies was AnoGAN [29]. AnoGAN trains on a GAN in which the generator attempts to deceive the discriminator, while the discriminator attempts to discriminate the authenticity of a given normal datum. This kind of competitive play can sample normal data from latent variables of a probability distribution function. However, AnoGAN algorithm is not automated end-to-end because it needs an additional optimization process. In our work, we adopted a popular GAN-based AD framework: GANomaly. GANomaly borrows an AE-style generator to solve the additional optimization problem, and an encoder-style discriminator for parsing the representation of a given training data set. Overall, the objective function for the generator is as follows: where w adv , w con , and w enc are the weighting parameters that adjust the impact of the individual losses to the overall objective function [24]. Adversarial loss (L adv ) is the difference between the output from discriminator (f(x)) and output from autoencoder (f(x)) to reduce the instability of GAN training [24]. Contextual loss (L con ) is to optimize toward the learning contextual information about the input data (x) by measuring the distance between the input (x) and the generated images (x) [24]. Encoder loss (L enc ) is to minimize the distance between the bottleneck features of the input (z) and the encoded features of the generated image (ẑ) [24].

Bayesian Optimization
BO typically works for continuous functions by assuming the unknown function sampled from a GP and maintaining a posterior distribution for this function as observations are made or as the results of running learning algorithm experiments with different hyperparameters are observed [30]. This method can be used for carefully tuning the learning parameters and model hyperparameters in machine learning algorithms. Snoek et al. [30] presented methods and improved results for performing BO for hyperparameter selection of general machine learning algorithms.
We determine the optimized parameter: where, A is the space of the possible hyperparameters, and f is the objective function. We assumed that f(p) is a black-box function that does not know what the output is when the input is inserted. Based on the observed data , the function f(p) is estimated using the GP prior: The BO principle is based on the following Bayes's theorem: where P( f D i ) is the posterior probability of a model knowing the data; P(D i f ) is the likelihood of the data D i knowing model f ; and P( f ) is the prior probability of f . The function f (p i ) is moved to the next observation point (p n+1 , f (p n+1 )) with the acquisition function. We use the Matérn kernel, which is a stationary kernel and a generalization of the RBF kernel [31]. It has an additional parameter ν that controls the smoothness of the resulting function. It is parameterized by a length-scale parameter l > 0, which can either be a scalar (isotropic variant of the kernel) or a vector with the same number of dimensions as the inputs x (i.e., anisotropic variant of the kernel). The kernel is given by: As ν→∞, the Matérn kernel converges to the RBF kernel. When ν = 1/2, the Matérn kernel becomes identical to the absolute exponential kernel. In particular, ν = 3/2 and ν = 5/2 are popular choices for learning functions that are not infinitely differentiable (as assumed by the RBF kernel) but are at least once (ν = 3/2) or twice differentiable (ν = 5/2).
The acquisition function uses the most commonly used expected improvement (EI) function. Based on the objective function estimated to date, the EI outputs a number indicating the availability of the input value (p) by considering the probability improvement of deriving a function value greater than the maximum function value (f(p + ) = max i f(p i )) of the points (f(p 1 ), f(p 2 ), . . . , f(p n )) investigated so far and the difference between the function value and f(p + ). The expression of EI when using GP is summarized as follows: where ξ is a parameter that controls the relative intensity between the exploration and exploitation; φ is the normal probability density function; and Φ is the cumulative density function.

Method
As most spacecraft operate in a normal state, far fewer samples of abnormal data can be acquired than those of normal data. To resolve this problem, a probability-based unsupervised AD model is trained. After training the generation model using only normal data, abnormal data can be distinguished from normal data by measuring the reconstruction error when new simulated data not used for learning are input. This study aims to validate that the proposed AD method based on the selected generation model is suitable for diagnosing the status of a spacecraft attitude control system. The spacecraft attitude control system is operated in various scenarios according to the range of missions it was intended for, and each scenario included several operation modes. To measure AD performance of the two generation models considered in this study according to the data characteristics acquired for each submode operated in the operating scenarios, a healthy probabilistic model for discrimination is required. For this reason, the simulation data that can be clearly labeled with normal and abnormal status are used. The data sets are generated using LUNar Attitude and Orbit Control System (AOCS) SIMulator (LUNASIM) [21], a spacecraft attitude control simulator developed in-house at the Korea Aerospace Research Institute (KARI) to test the performance of attitude control flight software for the Korea Pathfinder Lunar Orbiter (KPLO) [22]. The training data for the learning model comprises normal data, and the test data comprises normal and anomaly data.
Furthermore, a series of processes for AD using multidimensional time-series spacecraft AOCS data include data preprocessing (including purification, integration, cleanup, and transformation), feature extraction using deep networks, and learning models represented by algorithms considering robustness and performance optimization. Additionally, a selection method based on a generation model considering the actual data status (i.e., the sample count and labeling level) and reviews of conformity and applicability are evaluated by comparing the prediction results between two selected generation models (VAE and GANomaly). Finally, cases of AD monitoring in a spacecraft attitude control system based on the diagnostic results are summarized.

Data Preprocessing
Each data file is a time-series sampled at 8 Hz, with 1063 values (i.e., columns and variables) for each timestamp. The range of each variable is substantially different; therefore, a normalization process is required. Without a normalization process, a learning model can be created that relies only on specific variables over a very large range [32].
We use min-max normalization because it is more stable than standardization such that all normalized data fall between 0 and 1, as shown in the following equation: where x min is the minimum value encountered for a given variable over the entire data set, and x max is the maximum value encountered for a given variable over the entire data set. However, the min-max normalization may be inaccurate in the presence of outliers. The data set used in this paper did not have any outliers in the simulated data; therefore, we chose the min-max normalization method.

Algorithms
In this paper, the data set generated from LUNASIM is labeled with information about normal and abnormal. In most real systems, the normal state data are much larger than the abnormal state. Considering the data status, we select the AD model using only the normal data for training. Then, we predict the state of the newly input data, which include normal and abnormal data acquired by LUNASIM. Considering the large number of datasets in the system, we use the convolutional neural network (CNN)-based approach to maximize the class by extracting the "optimized" features directly from the raw data for the problem.
In this paper, we use the AD methods based on generation models, which include GANomaly presented as the latest AD method based on GAN and AE-based VAE. In the problem of image restoration based on the existing AE and GAN, 2-D convolution and 2-D transposed convolution are mostly used for the encoder and decoder, respectively. CNN is specially modeled and produced for 2D signals; therefore, the appropriate conversion from 1D to 2D is made for application to 1D signal processing, such as time-series data [33].
However, the 2D-CNN technique suffers from hardware performance limitations because of its complexity, large number of computations to be performed, and large number of datasets that are required for learning. To compensate for these shortcomings, an alternative called 1D-CNN has been recently developed to reduce the computational complexity by performing simple array operations instead of matrix computations performed in 2D [34,35]. 2D-CNN is an operation designed to capture the spatial relations between the neighboring pixels in an image, but time sequences of sensor data such as gyroscope or accelerometer data do not have any spatial relation. We assume that the data columns are independent and the location of the feature within the segment (fixed-length) of the overall data is not highly relevant. Therefore, we use VAE and GANomaly generation model based on 1D-CNN, which is a 1D convolution for encoding and 1D transposed convolution for decoding. The filter size is 8, the stride is 4, and the epoch value is 100. Other major hyperparameters (learning rate, lamda, and warm-up period) are automatically determined within the algorithm using BO [30].
The architectures of the applied VAE algorithm are constructed as shown Figure 1a and listed in Tables 6 and 7. In VAE, we calculate an anomaly score from the reconstruction error of a given test dataset. The anomaly score function of the VAE can be derived from Equation (1) as follows: Sensors 2020, 20, x FOR PEER REVIEW 10 of 21 Normal cases should have low reconstruction errors, whereas anomalies should have high reconstruction errors.
The detailed network structure and loss functions for optimization are depicted in Figure 1b, Tables 8 and 9, respectively. During the test stage, the model uses encoder loss ( ) given in Equation (2) for scoring the abnormality of the given dataset. Hence, for a test sample , our anomaly score ( ) is defined with bottleneck features of the input ( ) and the encoded features of the generated output ( ( )) as follows [24]: To evaluate the overall anomaly performance, we computed the anomaly score = ∶ ( ), ∈ for the individual test sample within the test set . Then, we applied feature scaling to obtain the anomaly scores within the probabilistic range of [0, 1].
(a) (b)       Normal cases should have low reconstruction errors, whereas anomalies should have high reconstruction errors.
The detailed network structure and loss functions for optimization are depicted in Figure 1b, Tables 8 and 9, respectively. During the test stage, the model uses encoder loss (L enc ) given in Equation (2) for scoring the abnormality of the given dataset. Hence, for a test samplex, our anomaly score A(x) is defined with bottleneck features of the input G E (x) and the encoded features of the generated output E(G(x)) as follows [24]:  To evaluate the overall anomaly performance, we computed the anomaly score S = {s i : A(x i ),x i ∈D} for the individual test samplex within the test setD. Then, we applied feature scaling to obtain the anomaly scores within the probabilistic range of [0, 1].

Hyperparameter Optimization
Before training a neural network, choosing appropriate hyperparameters is crucial in obtaining a well-generalized neural network for the target task. Traditional ways to select hyperparameters include grid search and random search.
Both methods determine ranges of hyperparameter values and train the neural network by choosing hyperparameters. Although these two methods are sound ways of choosing hyperparameters, the computational complexity increases exponentially with the number of hyperparameters. To alleviate the computational complexity problem, Snoek et al. [30] proposed a BO. This BO utilizes a GP as a prior for a posterior distribution and samples a performance function whose domain is the set of hyperparameters from the posterior distribution. Finally, by choosing the most successful hyperparameters, this method can select hyperparameters without an exponentially expensive search.
In this paper, proper hyperparameters, such as learning rate and lambda for VAE, learning rate for GANomaly, and warm up period are proposed using BO. In VAE, some units of the latent variable are inactive owing to the KL-divergence term in the loss function, as shown in Equation (1), but these units remain inactive during training, preventing them from learning. This implies that the units are pruned away before they learn a useful representation. To solve this problem, Sønderby et al. propose a warm-up technique [36]. This method learns the GP based only on the reconstruction error and then gradually increases the weight of variational regularization term (KL-divergence) in loss function as the learning progresses. In this paper, not only batch normalization but also warm-up technique is applied to improve the robustness and performance of the algorithm. We use the performance indicator area under receiver operating characteristics (AUROC) as an object function to automatically find the best hyperparameters for a neural network. The related equations are described in Section 3.3.
The optimization starts with a sampling of three random hyperparameters and restricts the range of learning rate, lambda (λ) and warm up period to [10 −5 , 10 −3 ], [0.2, 0.8], and [30,500] respectively. Using the optimization method, the optimized parameters according to the two types of generation models and the performance when they are applied can be seen in Table 10.

Learning Models and Operation Modes
We present the AD results from the two deep-learning networks, VAE and GANomaly using data generatd by the satelite attitude control simulator LUNASIM. The reults of performance for two AD models are compared mode by mode via AUROC, as shown in Table 10. From the results, GANomaly (which utilizes GAN and AE simultaneously) gives more accurate results than VAE. This additional accuracy may arise from GANomaly's dual minimization of both reconstruction error and JSD between input and generated samples.
Looking at the results more closely, SP mode show low AUROC values for both models (VAE and GANomaly). The SP mode is aimed at the Sun or Earth orientation in the Earth transition orbit, and includes a rotational movement of the main body for this purpose, thereby causing a large range of variation; this procedure allows a greater attitude error range than other modes. For this reason, it may be more difficult in SP mode to distinguish between normal and abnormal data. VAE is an AE variant and probability generative model that regulates latent variables to follow a multivariate normal distribution. The objective function of VAE consists of two parts: MSE for maximum likelihood estimation and KL divergence for regularization. Although the KL divergence objective allows VAE to sample more informative latent variables, the sampling process for the reparameterization can cause some noise in the reconstructed output. Therefore, when the normal data and the abnormal data are not clearly different, the VAE trained only on normal data will generate abnormal samples instead of normal ones. Conversely, in the case of GANomaly, there is no such sampling process in the middle of the neural network. Additionally, because the GANomaly utilizes AE to reconstruct input data and GAN to match a distribution of generated samples to the target distribution (distribution of normal samples), GANomaly produces less noisy data than VAE. As a result, GANomaly can discriminate abnormal samples using reconstruction error more effectively than VAE. For this reason, SP and TP modes show much better results for GANomaly than for VAE and are not suitable for applying VAE techniques. In other words, normal and abnormal data can be separated easily by VAE in modes like LAM, TSH and WOL due to the simplicity of the training data set, but such separation cannot be achieved by VAE for SP and TP, which may have some noise in the training data set.
In particular, the performance in the SP mode is lower than that in other modes. Since this is not a frequent task such as solar orientation and Earth orientation in SP mode, overfitting may occur because there are relatively many overlapping features owing to the small change in behavior. Thus, a model that fits well with the training data is created, but the performance may be degraded for the test data to be predicted. To compensate for this, learning is performed by resampling 8 Hz data to 4 Hz and 1 Hz. The results are shown in Figure 2, and the smaller the sampling value, the better the performance. The performance for the SP in Table 11 is the result of the resampled data. The raw data output from the sensor is sampled at 8 Hz, but it is confirmed that the effect of overfitting is reduced, and performance is improved as shown in Table 11 by reducing the overlapping features through down-sampling. From the results, the diagnostic results of the SP mode presented in the results (Section 5) are derived by resampling at 1 Hz.
Sensors 2020, 20, x FOR PEER REVIEW 13 of 21 Hz and 1 Hz. The results are shown in Figure 2, and the smaller the sampling value, the better the performance. The performance for the SP in Table 11 is the result of the resampled data. The raw data output from the sensor is sampled at 8 Hz, but it is confirmed that the effect of overfitting is reduced, and performance is improved as shown in Table 11 by reducing the overlapping features through down-sampling. From the results, the diagnostic results of the SP mode presented in the results (chapter 5) are derived by resampling at 1 Hz.

Assistive Tools for AD
AD between normal and abnormal data is easier and clearer to determine using learning techniques than using monitoring of existing model output values. We look at some specific model inference examples (Figure 3a). As a simple example, we introduced a fault in LUNASIM's gyroscope model output from 1000-1001 s (Figure 3b). Artificially increasing the gyroscope model roll rotation

Assistive Tools for AD
AD between normal and abnormal data is easier and clearer to determine using learning techniques than using monitoring of existing model output values. We look at some specific model inference examples (Figure 3a). As a simple example, we introduced a fault in LUNASIM's gyroscope model output from 1000-1001 s (Figure 3b). Artificially increasing the gyroscope model roll rotation rate causes nonlinear changes in the modeled spacecraft control error (Figure 3c) and integrated angular rate ( Figure 3d); both of these may persist for a few hundred seconds after the fault. Control error here refers to the difference between desired attitude and estimated attitude; the instantaneous deviation in the control error appears small because the fault is introduced when the control error is quite large. By applying deep-learning models to this phenomenon, distinctions between normal and abnormal can be detected, as shown in Figure 4. The sudden increase in reconstruction errors in Figure 4 after 1000 s indicates that abnormal data have occurred that are clearly different from the learned model based on normal data. Figure 5 shows that while the anomaly was injected only briefly at 1000 s, differences from normal data. Knowledge error is defined as the difference between the true attitude of the spacecraft vs the attitude estimated using data from gyroscopes and star trackers and fused using an Extended Kalman Filter; an anomalous change in gyroscope rate data, though instantaneous, may result in a persistent bias in the attitude estimate and can explain why the reconstruction error of our models does not immediately return to normal.     Monitoring for the diagnosis of spacecraft systems has been conducted with the critical point (CP) based alarm method. However, because random values are conservatively set using the expert's experience and judgment in setting CP, frequent alarms are caused during development and operation. This makes the operator more resistant to problems, which can cause errors in troubleshooting. The deep neural network-based AD system proposed in this study can be used as a tool to improve the accuracy and reliability of the existing CP-based anomaly detection. In other words, an optimized threshold can be set based on the accumulated data.
We randomly generate some fault signals under CP to verify the algorithm. Table 12 shows the calculation parameters and performance. In the LAM mode, a spike occurs in the Gyro rate as shown in Figure 6. From the results in Figure 7, the AD model can detect abnormalities. Therefore, in addition to the secondary role, the deep neural network-based AD model can also mechanically detect what humans cannot detect.  Monitoring for the diagnosis of spacecraft systems has been conducted with the critical point (CP) based alarm method. However, because random values are conservatively set using the expert's experience and judgment in setting CP, frequent alarms are caused during development and operation. This makes the operator more resistant to problems, which can cause errors in troubleshooting. The deep neural network-based AD system proposed in this study can be used as a tool to improve the accuracy and reliability of the existing CP-based anomaly detection. In other words, an optimized threshold can be set based on the accumulated data.
We randomly generate some fault signals under CP to verify the algorithm. Table 12 shows the calculation parameters and performance. In the LAM mode, a spike occurs in the Gyro rate as shown in Figure 6. From the results in Figure 7, the AD model can detect abnormalities. Therefore, in addition to the secondary role, the deep neural network-based AD model can also mechanically detect what humans cannot detect.

Efficient Monitoring Tool According to Purpose
AD results can be processed and monitored according to purpose. LUNASIM outputs scenariobased data, so monitoring based on operational scenarios is easy in terms of data processing. If an anomaly is detected among the modes constituting the scenario, the results of the modes are checked in detail. We present AD results from two monitoring perspective: the operation scenario and mode.

AD In Operation Scenario
This method diagnoses abnormal modes for each operating scenario. Figure 8 shows the results of AD for MT8 operating scenarios in VAE-and GANomaly-based learning models. The MT8 scenario consists of SP and WOL modes. With reference to AUROCs (Table 10), both techniques in WOL mode show good detection performance, and the results of the GANomaly-based model are reliable in SP mode. In this case, it is possible to identify anomaly types in the mode as a whole in each operating scenario.

Efficient Monitoring Tool According to Purpose
AD results can be processed and monitored according to purpose. LUNASIM outputs scenario-based data, so monitoring based on operational scenarios is easy in terms of data processing. If an anomaly is detected among the modes constituting the scenario, the results of the modes are checked in detail. We present AD results from two monitoring perspective: the operation scenario and mode.

AD in Operation Scenario
This method diagnoses abnormal modes for each operating scenario. Figure 8 shows the results of AD for MT8 operating scenarios in VAE-and GANomaly-based learning models. The MT8 scenario consists of SP and WOL modes. With reference to AUROCs (Table 10), both techniques in WOL mode show good detection performance, and the results of the GANomaly-based model are reliable in SP mode. In this case, it is possible to identify anomaly types in the mode as a whole in each operating scenario.
This method diagnoses abnormal modes for each operating scenario. Figure 8 shows the results of AD for MT8 operating scenarios in VAE-and GANomaly-based learning models. The MT8 scenario consists of SP and WOL modes. With reference to AUROCs (Table 10), both techniques in WOL mode show good detection performance, and the results of the GANomaly-based model are reliable in SP mode. In this case, it is possible to identify anomaly types in the mode as a whole in each operating scenario.

AD in Operation Mode
If an anomaly occurs during spacecraft operation, the anomaly can be detected in each operating mode. Figure 9 shows the failure types in the LAM operating mode. A failure detected with a continuous reconstruction error of 0.996 or higher indicates an RWA failure, and the peak near 1000 and 2000 s is caused by an RWA wheel speed spike. Also, a failure persisting from around 1500 to 2000 s, indicates a gyroscope rate spike. The pattern of each type of failure is distinct, so it is easy to distinguish among failures given knowledge of typical failure types. If an anomaly occurs during spacecraft operation, the anomaly can be detected in each operating mode. Figure 9 shows the failure types in the LAM operating mode. A failure detected with a continuous reconstruction error of 0.996 or higher indicates an RWA failure, and the peak near 1000 and 2000 s is caused by an RWA wheel speed spike. Also, a failure persisting from around 1500 to 2000 s, indicates a gyroscope rate spike. The pattern of each type of failure is distinct, so it is easy to distinguish among failures given knowledge of typical failure types.  Figure 10a shows the observed results in the MT8 and MT9 scenarios in the event of gyroscope failure in TP mode. In both scenarios, failure patterns tended to be similar, followed by failures around 2500 s. Reconstruction errors are consistently high in the MT8 scenario, but not so in MT9. In this regard, the knowledge error values of gyroscopes presented in Figure 10b,c can be analyzed to determine what caused them. In the MT8 scenario, the yaw axis knowledge error was continuously high after abnormal behavior at 2500 s, and in the MT9 scenario, the value increased at the anomalous behavior but converged afterwards. As shown, even if the failure type is the same, there may be a difference in the failure occurrence pattern for each scenario. By monitoring the diagnosis result, it is possible to identify the cause of the failure in the system. If the cause is normal behavior, it is used as normal data to retrain the learning model.  Figure 10a shows the observed results in the MT8 and MT9 scenarios in the event of gyroscope failure in TP mode. In both scenarios, failure patterns tended to be similar, followed by failures around 2500 s. Reconstruction errors are consistently high in the MT8 scenario, but not so in MT9. In this regard, the knowledge error values of gyroscopes presented in Figure 10b,c can be analyzed to determine what caused them. In the MT8 scenario, the yaw axis knowledge error was continuously high after abnormal behavior at 2500 s, and in the MT9 scenario, the value increased at the anomalous behavior but converged afterwards. As shown, even if the failure type is the same, there may be a difference in the failure occurrence pattern for each scenario. By monitoring the diagnosis result, it is possible to identify the cause of the failure in the system. If the cause is normal behavior, it is used as normal data to retrain the learning model. determine what caused them. In the MT8 scenario, the yaw axis knowledge error was continuously high after abnormal behavior at 2500 s, and in the MT9 scenario, the value increased at the anomalous behavior but converged afterwards. As shown, even if the failure type is the same, there may be a difference in the failure occurrence pattern for each scenario. By monitoring the diagnosis result, it is possible to identify the cause of the failure in the system. If the cause is normal behavior, it is used as normal data to retrain the learning model.

Learning Model Update
For a single type of failure, it is possible to detect a scenario in which a failure occurs in a modebased learning model. In other words, abnormal situations in operating scenarios can be identified

Learning Model Update
For a single type of failure, it is possible to detect a scenario in which a failure occurs in a mode-based learning model. In other words, abnormal situations in operating scenarios can be identified in each observed mode. Figure 11a illustrates the pattern of failure in each scenario when RWA and gyro failures are mixed in TP mode.
Sensors 2020, 20, x FOR PEER REVIEW 18 of 21 in each observed mode. Figure 11a illustrates the pattern of failure in each scenario when RWA and gyro failures are mixed in TP mode. The TP1 and TP3 scenarios show similar aspects for the same type of failure, while the MT2 and MT9 scenarios initially tend to be similar, but then present anomalies in the MT2 scenario in the vicinity of 3500 s. After checking the gyroscope rate and wheel speed of the data, as shown in Figure  11b,c, a moderately large but normal maneuver observed at 3500 s appear to be misclassified. Therefore, it is necessary to update the learning model by classifying this maneuver as normal data. The TP1 and TP3 scenarios show similar aspects for the same type of failure, while the MT2 and MT9 scenarios initially tend to be similar, but then present anomalies in the MT2 scenario in the vicinity of 3500 s. After checking the gyroscope rate and wheel speed of the data, as shown in Figure 11b,c, a moderately large but normal maneuver observed at 3500 s appear to be misclassified. Therefore, it is necessary to update the learning model by classifying this maneuver as normal data.

Conclusions
A machine learning-based diagnostic model is developed to perform AD on lunar orbiter AOCS. Based on the data generated by applying various scenarios to perform the exploration mission in LUNASIM, a lunar orbiter AOCS simulator, we apply semi-supervised based AD and create algorithms based on generation models such as VAE and GANomaly. Data characteristics are extracted by deep learning method such as 1D-CNN, and hyperparameters are automatically determined by applying BO with GP to improve learning performance. Anomalies are detected by measuring the generation error for newly input data. The GANomaly model performs better than VAE. Normal and abnormal data may appear similar in modes allowing a relatively large margin of error when operating in an attitude control system. In VAE, if the difference between normal and abnormal data is unclear owing to the noise caused by the reconstructed output in the sampling process for reparameterization, the VAE trained on normal data can produce abnormal samples rather than normal samples. GANomaly, however, uses reconstruction errors more effectively than VAE to distinguish abnormal samples. The reconstruction-based AD models still do not provide state-of-the-art AD performance and in this regard, we expect that a deterministic AD model may improve upon the results of this study in the future. Moreover, the accuracy and reliability of the neural networks' abnormality detection may be improved by adding probability calibration techniques like temperature scaling and outlier exposure. In the present work, we confirm that AD models are selected based not only on the degree and kind of separation between normal and abnormal input data but also on the neural network's processing function, process, and method. The present study also provides examples of case monitoring on a case-by-case basis, based on the spacecraft's mission scenario and operational mode.
FD is necessary to additionally perform identification of the failure type after AD in order to be utilized more effectively in actual operation. However, this requires more data labelled for each failure type, so there is a limit to using simulation results, and actual data accumulated through long-term operation is required. After identifying the failure based on AD, the method of determining the degree of impact on the overall system performance by considering the severity of each failure and using it as a basis for decision-making can be studied as a basis for autonomous operation in the future.