A Satellite Incipient Fault Detection Method Based on Local Optimum Projection Vector and Kullback-Leibler Divergence

Abstract: Timely and effective detection of potential incipient faults in satellites plays an important role in improving their availability and extending their service life. This paper studies the detection of incipient faults in satellites using a projection vector (PV) and the Kullback-Leibler (KL) divergence. Under the assumption that the variables obey a multidimensional Gaussian distribution and that KL divergence is used for detection, the search for the optimum PV for detecting incipient faults is modeled as an optimization problem. It is proved that the PVs obtained by principal component analysis (PCA) are not necessarily the optimum PV for detecting incipient faults. The on-line probability density function (PDF) is then compared with the reference PDF on the local optimum PV to detect incipient faults. A numerical example and a real satellite fault case were used to assess the validity of the proposed method and its superiority over conventional methods. Since the method takes into account the characteristics of the actual incipient faults, it is more adaptable to various possible incipient faults. The fault detection rates for the three simulated faults and the real satellite fault are 98%, 84%, 93% and 92%, respectively.


Introduction
Some faults may occur during the operation of satellites because of the uncertainties of the space environment, the limitations on pre-launch testing, and the imperfect design of satellites. It remains a challenge to detect and solve incipient faults and avoid further complications [1,2]. At present, the detection of faults in satellites is done by comparing the telemetry parameters with the preset thresholds [3,4]. However, this method is not suitable for detecting incipient faults because they may not change significantly during early manifestation of the fault. Compared to system signals, the amplitudes of incipient faults are small, typically from 1 to 10% [5,6] and can easily be masked by the normal variations in systems [7].
Model-based fault detection methods have the advantage of being highly intelligent and interpretable. Examples of such systems include the Livingstone2 and the HyDE fault diagnosis models developed by National Aeronautics and Space Administration (NASA) [8] as well as the Testability Engineering and Maintenance System (TEAMS), developed by Qualtech Systems Inc (QSI) [9]. All of these model-based fault detection methods require manual fault modeling by experts in the field. In addition, the accuracy of these models has a determining impact on their final performance at detecting faults. However, due to the high cost yet low yield of components used in satellites, the lifetime tests or the acquisition of actual fault samples required to build accurate models may often be too costly. Moreover, model-based fault detection methods are only capable of detecting a narrow range of faults. This means that there may be faults that occur outside of the range detectable by the models and this may result in undetected incipient faults [10].
In recent years, data-driven fault detection methods have become a popular field of research [11][12][13]. These methods make effective use of the extensive ground test data and operational data of satellites, require only limited involvement of experts, and do not require accurate mathematical models. Iverson et al. [14] proposed a fault detection method named the Inductive Monitoring System (IMS) and used it to detect faults in the Columbia Space Shuttle and the TacSat-3 satellite. Schwabacher et al. [15] compared four data-driven fault detection methods, including IMS and the One Class Support Vector Machine (OS-SVM). Pang et al. [16] proposed a fault detection method based on Gaussian Process Regression (GPR) and used it to detect faults in the periodic telemetry parameters of satellites. Hundman et al. [17] proposed a spacecraft fault detection method based on Long Short-Term Memory (LSTM) networks and dynamic thresholds. Tariq et al. [18] proposed a fault detection method that combines a multivariate convolutional LSTM with probabilistic principal component analysis (PCA) and applied it to the Korea Multi-Purpose Satellite 2 (KOMPSAT-2). Among these data-driven fault detection methods, distance-based methods such as IMS and OS-SVM are suitable for detecting fault data that differ significantly from normal data. However, they are less effective when the fault data and the normal data mostly overlap. The fault detection methods using GPR or LSTM are suitable for detecting faults in regular (e.g., periodic) and time-continuous systems, but are ineffective when the telemetry parameters include interruptions in time.
Among the data-driven fault detection methods, methods based on PCA are simple in principle and have good real-time performance, and they have been used extensively in detecting faults in equipment [19,20]. Hotelling's T^2 statistic and the squared prediction error (SPE) statistic are usually used to measure deviations in the principal subspace and in the residual subspace, respectively. However, the conventional T^2 and SPE statistics are not effective at detecting incipient faults that are easily drowned out by noise [21]. Jinane et al. [21] proposed an incipient fault detection method using Kullback-Leibler (KL) divergence and PCA. On the basis of Jinane et al.'s research, Chen et al. [22] proposed an improved method that monitors the deviations in both the principal and the residual subspace. Youssef et al. [23] investigated a method of setting the fault detection threshold using KL divergence for unknown distributions.
However, existing methods for detecting incipient faults based on PCA and KL divergence depend entirely on the off-line normal data to obtain the principal components. Each principal component is essentially a projection vector (PV). Once the normal data have been determined, the PVs used to detect faults also remain fixed. Unfortunately, the PVs obtained by PCA are not necessarily the optimum PV to detect the incipient fault from the view of fault detection. Fixed PVs may result in lower fault detection rates for some types of incipient faults [24,25]. To address the problem of detecting incipient faults in satellites, this paper proposes a novel incipient fault detection method based on local optimum PV (LOPV) and KL divergence. The main contributions of the present research are summarized in the following three points:

1. This paper puts forward the argument that the PVs obtained by PCA are not necessarily the optimum PV for using KL divergence to detect an incipient fault.
2. The problem of finding the optimum PV to detect the incipient fault is modeled as an optimization problem, and the KL divergence is used to detect the incipient fault on the LOPVs.
3. The application of the incipient fault detection method based on PCA and KL divergence is extended to satellites. The effectiveness of the proposed method is demonstrated on a real satellite fault.
The remainder of this paper is organized as follows. Section 2 provides a brief introduction to PCA and KL divergence. In Section 3, the incipient fault detection method based on LOPV and KL divergence is presented in detail. In Section 4, the proposed method is validated and analyzed using a numerical example and a real satellite fault. Finally, the conclusions are presented in Section 5.
Appl. Sci. 2021, 11, 797

PCA
Let n_0 and m be the numbers of samples and variables of a normal data matrix X ∈ R^{n_0×m}, with X = [x_1, ..., x_k, ..., x_m]. In general, each column of the original normal data X is standardized to eliminate the impact of the different ranges of the variables:

$$x_k \leftarrow \frac{x_k - E(x_k)}{\sigma(x_k)}, \quad k = 1, \dots, m \tag{1}$$

E(x_k) and σ(x_k) in Equation (1) are the mean and standard deviation of the kth column of the matrix X, respectively. The covariance matrix Σ ∈ R^{m×m} of the standardized normal data matrix X is then computed and its singular value decomposition (SVD) is performed:

$$\Sigma = \frac{1}{n_0 - 1} X^{T} X = V \Lambda V^{T} \tag{2}$$

where Λ is the diagonal matrix of singular values in descending order. The matrix V = [v_1, ..., v_m] ∈ R^{m×m} in Equation (2) is a set of orthonormal basis vectors of R^m. The matrix P = [v_1, ..., v_l] ∈ R^{m×l}, which consists of the first l column vectors of V, spans the principal component subspace, and the matrix P̃ = [v_{l+1}, ..., v_m] ∈ R^{m×(m−l)}, which consists of the last m − l column vectors, spans the residual subspace [26].
The PCA-based fault detection method usually uses two statistics, T^2 and SPE, for fault detection. The calculations of the T^2 statistic and the SPE statistic are given in Equations (3) and (4), respectively. A fault is considered to have occurred when either the T^2 or the SPE statistic exceeds its corresponding detection threshold, δ_T^2 or δ_SPE^2, respectively [27].
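Equations (3) and (4) are not reproduced in the text above. The sketch below therefore uses the textbook forms of the two statistics, which is an assumption: T^2 as the sum of squared principal scores divided by their eigenvalues, and SPE as the squared norm of the part of the sample outside the principal subspace. The function names `fit_pca` and `t2_spe`, the dictionary layout, and the use of `numpy.linalg.eigh` in place of an SVD (the two agree for a symmetric positive semi-definite covariance matrix) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def fit_pca(X, var_ratio=0.90):
    """Standardize X and split the eigenvectors of its covariance matrix."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std                          # Z-score each column
    cov = np.cov(Xs, rowvar=False)                 # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()
    # Smallest l whose cumulative variance ratio reaches var_ratio.
    l = min(int(np.searchsorted(cum, var_ratio)) + 1, len(eigvals))
    return {"mean": mean, "std": std, "V": eigvecs,
            "P": eigvecs[:, :l], "lam": eigvals[:l]}

def t2_spe(model, x):
    """Return (T^2, SPE) for one raw sample x."""
    xs = (x - model["mean"]) / model["std"]
    scores = model["P"].T @ xs                     # coordinates in the principal subspace
    t2 = float(np.sum(scores ** 2 / model["lam"]))
    residual = xs - model["P"] @ scores            # part of xs outside the principal subspace
    spe = float(residual @ residual)
    return t2, spe
```

A sample equal to the training mean gives T^2 = 0 and SPE = 0, which is a quick sanity check on the standardization step. The detection thresholds δ_T^2 and δ_SPE^2 are left out because their formulas are not given here.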

KL Divergence
KL divergence is often used to measure the difference between two probability distributions in the same event space and is widely used for optimization tasks in machine learning [28,29]. The KL divergence between two continuous probability density functions (PDFs) f_1 and f_2 is defined as follows [30]:

$$I(f_1 \| f_2) = \int f_1(x) \ln \frac{f_1(x)}{f_2(x)} \, dx \tag{5}$$

Since this definition is not symmetric, so that in general I(f_1||f_2) ≠ I(f_2||f_1), an improved form referred to as the symmetric KL divergence is used [21]:

$$\mathrm{KLD}(f_1 \| f_2) = I(f_1 \| f_2) + I(f_2 \| f_1) \tag{6}$$

Let n_1 and m be the numbers of samples and variables of the on-line data matrix Y ∈ R^{n_1×m}. In the conventional incipient fault detection methods based on PCA and KL divergence [21,22], the standardized normal data X are projected on each PV v_j ∈ V, j ∈ [1, m], to obtain the reference PDF f_j of the normal data on v_j and the corresponding detection threshold ε_j. The on-line data are projected on the same PVs, and a fault is reported when the KL divergence between the on-line PDF and f_j exceeds ε_j on at least one PV (Equation (7)).
For each PV v_j, if the fault detection method can detect an incipient fault with a high detection rate on v_j at a given significance level α, then v_j is considered sensitive to the incipient fault. The higher the detection rate, the greater the sensitivity to the incipient fault. It can be seen from Equation (7) that the incipient fault can be detected well if at least one PV is sensitive to it. From the perspective of fault detection, we therefore want to find the optimum PV, that is, the PV most sensitive to the incipient fault.

Optimum PV for Incipient Fault Detection
We assume that there is an optimum PV w ∈ R^m that is more sensitive to the incipient fault than the existing PVs v_1, v_2, ..., v_m. To find the optimum PV, this paper assumes that the m-dimensional feature parameters to be monitored obey, in their normal state, the m-dimensional joint Gaussian distribution X ∈ R^{n_0×m} ∼ N(µ_x, Σ_x). Let the projection of X on w be F = Xw ∈ R^{n_0}. By the nature of the m-dimensional joint Gaussian distribution, F obeys the one-dimensional Gaussian distribution F ∼ N(µ_x^T w, w^T Σ_x w) [26].
We assume that after the incipient fault occurs, the on-line faulty data Y still obey the m-dimensional joint Gaussian distribution Y ∈ R^{n_1×m} ∼ N(µ_y, Σ_y). Due to the fault, however, the mean vector µ_y ∈ R^m and the covariance matrix Σ_y ∈ R^{m×m} of Y deviate from those of the normal data by ∆µ and ∆Σ, respectively, that is, µ_y = µ_x + ∆µ and Σ_y = Σ_x + ∆Σ.
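The projection of Y on w is G = Yw. Then, from the nature of the joint Gaussian distribution, G obeys the one-dimensional Gaussian distribution G ∼ N(µ_y^T w, w^T Σ_y w). Assume that f and g are the PDFs of two Gaussian-distributed signals F ∼ N(µ_1, σ_1^2) and G ∼ N(µ_2, σ_2^2), respectively. The KL divergence of f and g can be expressed as follows [21]:

$$\mathrm{KLD}(f \| g) = \frac{1}{2}\left[\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + (\mu_1 - \mu_2)^2\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2\right] \tag{11}$$

The projections of the normal data matrix X and the faulty data matrix Y on w obey the one-dimensional Gaussian distributions F ∼ N(µ_x^T w, w^T Σ_x w) and G ∼ N(µ_y^T w, w^T Σ_y w), respectively. Substituting the means and variances of F and G into Equation (11) gives:

$$\mathrm{KLD}(f \| g) = \frac{1}{2}\left[\frac{w^T \Sigma_y w}{w^T \Sigma_x w} + \frac{w^T \Sigma_x w}{w^T \Sigma_y w} + (\Delta\mu^T w)^2\left(\frac{1}{w^T \Sigma_x w} + \frac{1}{w^T \Sigma_y w}\right) - 2\right] \tag{12}$$

In Equation (12), the mean deviation vector ∆µ, the normal data covariance matrix Σ_x, and the on-line faulty data sample covariance matrix Σ_y can be obtained from the on-line faulty data matrix Y and the normal data matrix X. Therefore, once the historical normal data X and the on-line data Y to be verified are available, only the PV w is unknown in Equation (12). In other words, KLD(f||g) is a function h(w) of the PV w, as shown in Equation (13).

$$h(w) = \frac{1}{2}\left[\frac{w^T \Sigma_y w}{w^T \Sigma_x w} + \frac{w^T \Sigma_x w}{w^T \Sigma_y w} + (\Delta\mu^T w)^2\left(\frac{1}{w^T \Sigma_x w} + \frac{1}{w^T \Sigma_y w}\right) - 2\right] \tag{13}$$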
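The dependence of the projected KL divergence on w can be made concrete with a short sketch. It assumes the symmetric closed form for two one-dimensional Gaussians (the sum of the two directed KL divergences); the function names `sym_kl_gauss` and `h` and the toy numbers are illustrative and not taken from the paper.

```python
import numpy as np

def sym_kl_gauss(mu1, var1, mu2, var2):
    """Symmetric KL divergence between N(mu1, var1) and N(mu2, var2)."""
    return 0.5 * (var2 / var1 + var1 / var2
                  + (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2) - 2.0)

def h(w, mu_x, cov_x, mu_y, cov_y):
    """KL divergence between normal and on-line data after projection on w."""
    w = np.asarray(w, dtype=float)
    return sym_kl_gauss(mu_x @ w, w @ cov_x @ w, mu_y @ w, w @ cov_y @ w)

# Toy case (made-up numbers): the fault shifts only the low-variance variable.
mu_x, cov_x = np.zeros(2), np.diag([4.0, 1.0])
mu_y, cov_y = np.array([0.0, 0.5]), cov_x
h_pca = h([1.0, 0.0], mu_x, cov_x, mu_y, cov_y)    # first PCA direction
h_alt = h([0.0, 1.0], mu_x, cov_x, mu_y, cov_y)    # direction of the mean shift
```

In this toy case `h_pca` is 0 and `h_alt` is 0.25: the direction of largest variance sees nothing of the fault, while the low-variance direction carries all of the divergence. This is a small instance of the point that a PV chosen by variance need not be the one most sensitive to a given fault.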
Sliding windows of length n_1 are used to divide the normal data matrix X into the submatrices X_1, ..., X_i, ..., X_r, with X_i ∈ R^{n_1×m}, i ∈ [1, r]. All submatrices are projected on w to obtain r PDFs, PDF_{X_1}, ..., PDF_{X_i}, ..., PDF_{X_r}, each of which is assumed to obey a one-dimensional Gaussian distribution. The KL divergence between each PDF_{X_i} and f can be calculated by Equation (13), and the r resulting values form the vector h_X(w) (Equation (14)).
The process used to obtain the vector h_X(w) is shown in Figure 1.
The relative deviation in KL divergence of the on-line faulty data matrix Y and the normal data matrix X on the PV w is defined as shown in Equation (15).

$$J(w) = \frac{h(w) - E(h_X(w))}{\sigma(h_X(w))} \tag{15}$$

E(h_X(w)) and σ(h_X(w)) in Equation (15) are the mean and standard deviation of the vector h_X(w). They reflect the center and the range of variation of the KL divergence for normal data after projection on w, respectively. Once the on-line faulty data matrix Y and the normal data matrix X are given, the KL relative deviation J(w) remains a function of w. For a given significance level α and given matrices Y and X, the larger the value of J(w), the easier it is for the fault to be detected. Conversely, if the value of J(w) is small, the fault is likely to be drowned out by noise, which leads to poor detection performance. Thus, the problem of finding the optimum PV for fault detection can be modeled as an optimization problem, as shown in Equation (16).

$$\min_{w} \; -J(w) \quad \text{s.t.} \quad w^T w = 1 \tag{16}$$
If the PV w were optimized so that h(w), rather than J(w), is maximized, the optimum solution would tend to converge on the PV corresponding to the smallest eigenvalue, which would make the optimization process meaningless. This problem arises because the eigenvalues associated with different PVs have different ranges of variation. A small eigenvalue leads to small w^T Σ_x w and w^T Σ_y w in Equation (13), which appear in the denominators, so slight noise can cause large fluctuations in the KL divergence on the PVs corresponding to small eigenvalues. Therefore, it is necessary to standardize h(w), and the optimization goal is to maximize the relative deviation of the KL divergence, J(w), rather than the absolute deviation of the KL divergence, h(w).
The goal of PCA is to maximize the variance of the normal data X after projection on the PV w, which can be modeled as an optimization problem, as shown in Equation (17) [26].

$$\max_{w} \; w^T \Sigma_x w \quad \text{s.t.} \quad w^T w = 1 \tag{17}$$
Suppose that w_p is the optimum solution of the optimization problem in Equation (16) and that w_q is the optimum solution of the optimization problem in Equation (17). Since the objective functions of the two optimization problems are different, w^T Σ_x w ≠ J(w), w_p is generally not equal to w_q. By construction, w_p is the optimum PV for using KL divergence to detect the fault. Because w_p ≠ w_q in general, w_q is generally not the optimum PV for using KL divergence to detect a fault. Therefore, the PVs obtained by PCA are not necessarily the optimum PV for using KL divergence to detect the fault.
Equation (16) converts the problem of finding the maximum of J(w) into a standard constrained optimization problem by adding a negative sign and a constraint. The search for the optimum PV w that maximizes J(w) takes place on the surface of a hypersphere with a radius of 1. Equation (16) is a typical constrained optimization problem, which can be solved using readily available tools for solving optimization problems, such as the fmincon function built into MATLAB 2019b.
Since the objective function −J(w) in Equation (16) is a nonlinear function of w, the solution obtained by an iterative method is not necessarily a global optimum solution, but probably a local optimum solution [31,32]. However, because the optimization problem in Equation (16) needs to be solved each time a set of on-line data arrives, finding the global optimum solution would increase the computational cost considerably. For the sake of timeliness, the iteration can be stopped when the optimization problem in Equation (16) converges to a local optimum solution that is better than the existing fixed PVs.
It is necessary to ensure that the local optimum solution found for Equation (16) is better than all the fixed PVs v_1, v_2, ..., v_m obtained by PCA. To achieve this, when the on-line data Y arrive, they are projected on each fixed PV v_1, v_2, ..., v_m and on the mean drift vector ∆µ. The KL divergence relative deviation is calculated on each of these vectors according to Equations (13)-(15).
The vector with the largest KL divergence relative deviation is then selected as the initial iteration vector w_0 of the optimization problem in Equation (16), as expressed in Equation (18).
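The search described above can be sketched in Python. This is a stand-in under stated assumptions, not the authors' fmincon setup: the constrained solver is replaced by a simple finite-difference ascent that renormalizes w to the unit sphere after every step and accepts only improving steps, so the result is never worse than the best starting candidate. The relative deviation J(w) is taken as the z-score of h(w) against the KL values of the normal windows, the windows are non-overlapping, and the names `make_J` and `local_optimum_pv` and all step-size parameters are hypothetical.

```python
import numpy as np

def h(w, mu_a, cov_a, mu_b, cov_b):
    """Symmetric Gaussian KL divergence of two data sets projected on w."""
    va, vb = w @ cov_a @ w, w @ cov_b @ w
    d = (mu_a - mu_b) @ w
    return 0.5 * (vb / va + va / vb + d * d * (1.0 / va + 1.0 / vb) - 2.0)

def make_J(X, Y, n1):
    """Build J(w), the relative deviation of h(w) against normal windows."""
    mu_x, cov_x = X.mean(axis=0), np.cov(X, rowvar=False)
    mu_y, cov_y = Y.mean(axis=0), np.cov(Y, rowvar=False)
    # Assumption: non-overlapping windows of length n1 over the normal data.
    stats = [(X[i:i + n1].mean(axis=0), np.cov(X[i:i + n1], rowvar=False))
             for i in range(0, len(X) - n1 + 1, n1)]

    def J(w):
        w = w / np.linalg.norm(w)                  # enforce the unit-norm constraint
        h_x = np.array([h(w, mu_x, cov_x, m, c) for m, c in stats])
        return (h(w, mu_x, cov_x, mu_y, cov_y) - h_x.mean()) / h_x.std(ddof=1)
    return J

def local_optimum_pv(J, candidates, steps=200, lr=0.1, eps=1e-5):
    """Ascend J on the unit sphere, starting from the best candidate vector."""
    w = max(candidates, key=J)                     # initial vector w_0
    w = w / np.linalg.norm(w)
    best = J(w)
    for _ in range(steps):
        # Central finite-difference gradient, projected on the tangent plane.
        grad = np.array([(J(w + eps * e) - J(w - eps * e)) / (2 * eps)
                         for e in np.eye(len(w))])
        grad -= (grad @ w) * w
        norm = np.linalg.norm(grad)
        if norm < 1e-10:
            break
        trial = w + lr * grad / norm
        trial /= np.linalg.norm(trial)
        value = J(trial)
        if value > best:                           # accept only improving steps
            w, best = trial, value
        else:
            lr *= 0.5                              # otherwise shrink the step
            if lr < 1e-8:
                break
    return w, best
```

A typical call passes the columns of the PCA matrix V together with the mean drift vector as `candidates`, for example `local_optimum_pv(make_J(X, Y, 300), [V[:, j] for j in range(V.shape[1])] + [Y.mean(axis=0) - X.mean(axis=0)])`. A library solver with an equality constraint on the norm of w would be the closer analogue of fmincon; the hand-written ascent is used here only to keep the sketch dependent on NumPy alone.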

Detecting Incipient Faults Using LOPVs and Dynamic Thresholds
In this section, sliding windows of length n_1 are used to extract and monitor the on-line data in real time. The on-line data extracted by the kth sliding window are Y_k ∈ R^{n_1×m}. Hypothesis testing is used to detect incipient faults in each Y_k, as shown in Figure 2. Y_k is first assumed to be faulty, and this hypothesis is then tested. If the hypothesis holds after testing, a fault is considered to have occurred in Y_k. Otherwise, Y_k is normal. The specific hypothesis testing steps are as follows:
1. Let Y = Y_k and assume that Y_k is faulty. The method described in Section 3.1 is used to find the local optimum PV w_k for fault detection between the on-line data Y_k and the historical normal data X.
2. Let the projections of X and Y_k on the local optimum PV w_k be Xw_k and Y_k w_k.
3. The KL divergence KLD(Xw_k||Y_k w_k) between the two projections is calculated.
4. The threshold ε_k of the local optimum PV w_k is set according to the given significance level α.
5. If KLD(Xw_k||Y_k w_k) > ε_k, the assumption that Y_k is faulty is accepted. Otherwise, Y_k is normal. Let k = k + 1; the next sliding window Y_{k+1} is tested by repeating steps 1 to 5.

For each on-line data window Y_k, the proposed method finds a local optimum PV w_k that is most sensitive to the fault. After w_k is obtained, the vector h_X(w_k) can be computed. According to reference [22], K h_X(w_k) obeys the chi-square distribution with one degree of freedom, K h_X(w_k) ∼ χ^2(1), where K is a large integer. Given the significance level α, the threshold ε_k is set as shown in Equation (19). In Equation (19), γ is a constant. We suggest γ = E(h_X(w_k)). Because E(h_X(w_k)) ≠ E(h_X(w_{k+1})) may occur, ε_k ≠ ε_{k+1} may follow. Generally, w_k ≠ w_{k+1}, and thus ε_k ≠ ε_{k+1}. Therefore, dynamic thresholds are used to detect faults.

The Complete Incipient Fault Detection Process
The complete incipient fault detection method based on LOPV and KL divergence consists of two parts: the preprocessing process and the monitoring process.
The preprocessing process:
1. Z-score normalization is performed on the normal data matrix X to obtain the standardized matrix X̄.
2. The reference mean vector µ_x and the reference covariance matrix Σ_x of the standardized normal data matrix X̄ are calculated.
3. PCA is performed on the standardized normal matrix X̄ and the matrix V of all m PVs is retained.
4. Data matrices X_1, ..., X_i, ..., X_r (X_i ∈ X̄, i ∈ [1, r]) are extracted from the standardized normal matrix X̄ using sliding windows of length n_1. The mean vector µ_{X_i} and the covariance matrix Σ_{X_i} of each X_i are calculated and stored.
The monitoring process:
1. The on-line data matrix Y_k is extracted using a sliding window of length n_1.
2. Z-score normalization is performed on the on-line data matrix Y_k to obtain Ȳ_k.
3. The mean vector µ_{Y_k} and the covariance matrix Σ_{Y_k} of the matrix Ȳ_k are calculated.
4. Ȳ_k is projected on each fixed PV v_1, ..., v_m and on the mean drift vector ∆µ, and the KL divergence relative deviations are calculated according to Equations (13)-(15).
5. Equation (18) is used to find the initial iteration vector w_0 of the optimization problem.
6. The optimization problem in Equation (16) is solved to obtain the local optimum PV w_k of the on-line data matrix Ȳ_k.
7. Equations (13) and (14) are used to obtain the vector h_X(w_k).
8. The fault detection threshold ε_k is determined using Equation (19).
9. The KL divergence h(w_k) of X̄ and Ȳ_k after projection on the local optimum PV w_k is calculated using Equation (13).
10. If h(w_k) > ε_k, the assumption that Y_k is faulty is accepted. Otherwise, Y_k is normal. Let k = k + 1; the next sliding window Y_{k+1} is tested by repeating steps 1 to 10.
It should be noted that the above incipient fault detection method based on LOPV and KL divergence is suitable for systems with high requirements for incipient fault detection and sufficient computational resources. When the system has insufficient computational resources, the sliding window interval can be increased to reduce the frequency of solving the optimization problem. For example, the optimization problem can be solved every hour or daily. The LOPV obtained by solving the optimization problem can be added to the matrix V which contains m PVs obtained by the PCA method, V = V ∪ LOPV. Then, the new V that contains m + 1 PVs is used to detect incipient faults.
Another way to reduce the computational requirements is to lower the detection thresholds of the existing fault detection methods based on PCA and KL divergence, so that the optimization problem is solved only when a sliding window of on-line data exceeds the detection threshold of at least one PV. The resulting LOPV is then added to V (V = V ∪ LOPV), the detection thresholds are raised again, and the new V is used to detect faults. Users can flexibly select the sliding window interval or the detection thresholds based on actual monitoring needs and resource constraints.

Numerical Example
The effectiveness of the proposed method is verified through a numerical example that includes three incipient faults. The system is modeled by Equation (20), in which [s_1, s_2, s_3, s_4]^T and [e_1, e_2, e_3, e_4]^T are Gaussian-distributed source signals and noises, respectively. All the source signals obey the standard normal distribution N(0, 1); f_1, f_2 and f_3 are the three incipient faults; and [x_1, x_2, x_3, x_4]^T are the four variables that need to be monitored [22]. Three incipient faults were simulated in this numerical experiment: fault f_1 is an offset fault, and faults f_2 and f_3 are gain faults. Detection performance is measured by the fault detection rate (FDR) and the false alarm rate (FAR).
The other parameters of the numerical simulation experiment were set as follows. The total number of samples was 120,000, of which 60,000 were historical normal samples and 60,000 were on-line samples for testing. The sliding window length for both the historical normal data and the on-line test data was 300. The sliding windows yielded a total of 200 windows of on-line data, of which the first 100 are normal windows and the last 100 are fault windows. The default Signal-to-Noise Ratio (SNR) was 20 dB.
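The formulas for FDR and FAR are not reproduced in the text above. A common window-level definition, assumed here, takes FDR as the fraction of fault windows that raise an alarm and FAR as the fraction of normal windows that raise an alarm. The function name `fdr_far` and the example flags are illustrative only.

```python
def fdr_far(alarms, is_fault):
    """Return (FDR, FAR) from per-window alarm flags and ground-truth labels.

    Assumes at least one fault window and at least one normal window.
    """
    pairs = list(zip(alarms, is_fault))
    n_fault = sum(1 for _, f in pairs if f)
    n_normal = len(pairs) - n_fault
    detected = sum(1 for a, f in pairs if a and f)          # alarms on fault windows
    false_alarms = sum(1 for a, f in pairs if a and not f)  # alarms on normal windows
    return detected / n_fault, false_alarms / n_normal

# Made-up flags for 100 normal windows followed by 100 fault windows.
alarms = [False] * 99 + [True] + [True] * 98 + [False] * 2
labels = [False] * 100 + [True] * 100
fdr, far = fdr_far(alarms, labels)
```

With these made-up flags, 98 of the 100 fault windows and 1 of the 100 normal windows raise an alarm, so `fdr` is 0.98 and `far` is 0.01.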
For the purpose of comparison with other PCA-based fault detection methods, the proposed method was compared with the conventional T^2 statistic [33] (PCA + T^2), the SPE statistic [33] (PCA + SPE), the method of using KL divergence to monitor the PVs in the principal subspace [21] (PCA + KLD1), and the method of using KL divergence to monitor the PVs in both the principal subspace and the residual subspace [22] (PCA + KLD2). The principal subspace was selected with a cumulative variance contribution of more than 90%. The confidence levels for the T^2 statistic and the SPE statistic were both set at 0.99. The significance levels for the PCA + KLD1 method and the PCA + KLD2 method were both set at 0.05. The significance level for the proposed method was set at 0.01.
The results of the T^2 statistic and the SPE statistic for the incipient fault f_1 are shown in Figure 3. It can be seen from Figure 3 that the T^2 statistic and the SPE statistic did not change significantly before and after the insertion of the fault f_1. After f_1 occurred, most of the on-line fault samples were still within the detection threshold. The T^2 statistic and the SPE statistic were equally ineffective in detecting the incipient faults f_2 and f_3.
The results of detecting the incipient fault f 1 by using the PCA + KLD1 method and the PCA + KLD2 method are shown in Figure 4. A comparison of Figure 4a-d shows that the second and third PV were sensitive to the incipient fault f 1 . The remaining two PVs were less effective at detecting f 1 . The detection result of the proposed method for the incipient fault f 1 is shown in Figure 5. It can be seen from Figure 5 that the proposed method also has a good detection effect for the fault f 1 .
The results of detecting the incipient fault f_2 by using the PCA + KLD1 method and the PCA + KLD2 method are shown in Figure 6. Comparing Figure 6a-d shows that only the third of the four PVs obtained by PCA was sensitive to f_2, and about 30% of the fault windows were still below the fault detection threshold. The result of detecting the fault f_2 by using the proposed method is shown in Figure 7. It can be seen from Figure 7 that only about 12% of the fault windows were below the fault detection threshold, which is better than the results on the PVs obtained by PCA.
For the incipient fault f_3, about 60% of the fault windows were still below the fault detection threshold on the PVs obtained by PCA. Although the detection rate could be increased by lowering the detection threshold, the false alarm rate would also increase, because the fault windows and the normal windows are not clearly separated. It can be seen from Figure 9 that the proposed method has a very good detection effect on f_3: the fault detection rate is 98%, with a false alarm rate similar to that of the PCA + KLD2 method.

The three incipient faults were simulated randomly a total of 100 times, and the average values of the fault detection results were calculated and are summarized in Table 1. In Table 1, the traditional fault detection methods using PCA and the T^2 or SPE statistics were very poor at detecting the three incipient faults. The PCA + KLD1 and PCA + KLD2 methods, which use fixed PVs and KL divergence, greatly improved the detection of the three incipient faults compared to the T^2 and SPE statistics. However, at a similar false alarm rate, the average fault detection rate of the proposed method for the three faults was 60% and 30% higher than those of the PCA + KLD1 and PCA + KLD2 methods with fixed PVs, respectively. In Table 1, the proposed method and the PCA + KLD2 method had similar effects for the incipient fault f_1, while the detection rate of the incipient fault f_3 was improved by up to 60%.
After analysis, we found that the different improvements in the fault detection rate were related to the characteristics of the actual incipient faults. When the incipient fault occurs in the detection-sensitive region of a fixed PV obtained by the PCA method, the PCA + KLD2 method can obtain good results. For example, the incipient fault f_1 is located in the sensitive regions of both the second and the third PV. Consequently, the detection rate of the PCA + KLD2 method was more than 98% for the incipient fault f_1. However, when the incipient fault is located in regions where the existing PVs are insensitive (e.g., the incipient fault f_3), the detection rate of the proposed method is improved by 60% compared with the traditional PCA + KLD2 method, which uses fixed PVs. Taking the results for the three incipient faults together, at a given significance level α the PCA + KLD2 method is sensitive to only a part of the fault space. In contrast, the method proposed in this paper covers more of the fault space by finding the optimum PV for detecting incipient faults, and it is more adaptable to various possible incipient faults than the fixed-PV fault detection method because it takes into account the characteristics of the actual incipient faults.

The Phenomena and Causes of the Fault
On 3 October 2018, a downlink telemetry fault occurred in the spread spectrum transponder of a satellite. The ground telemetry, track and command (TT&C) station could not receive the satellite's downlink telemetry signal, and the satellite's telemetry and remote communication returned to normal after the backup spread spectrum transponder was enabled. After the incident, the spread spectrum transponder telemetry showed that the transponder was in a shutdown state; the 5 V power telemetry jumped to 0 V; the solid-state amplifier current jumped from 0.715 A to 0.13 A; the solid-state amplifier power jumped from 11.19 W to 0.13 W; the temperature data showed that the fault occurred in the high-temperature region; the 12 V power telemetry decreased slightly; and the other telemetry parameters were in the normal range. The behavior of the spread spectrum transponder 5 V power telemetry before the jump to 0 V is illustrated in Figure 10a, which shows an approximately exponential increase in the 5 V voltage before the fault, followed by the eventual power failure.


The post-fault analysis located the fault as an anomaly in the DC-DC module of the spread spectrum transponder power module. This module converts the voltage of the 42 V primary power supply into 5 V and 12 V power, which it supplies to the spread spectrum receiver and the spread spectrum transmitter. The analysis of the telemetry data around the occurrence of the fault shows that both the 5 V and the 12 V voltage fluctuations of the spread spectrum transponder are correlated with its temperature fluctuations. As shown in Figure 10b, the temperature of the spread spectrum transponder fluctuated periodically because the temperature of the satellite's -Z surface, where the stand-alone transmitter is located, fluctuates periodically under the influence of an external heat source. The 5 V voltage telemetry also fluctuated with the temperature before the fault. The 5 V voltage fluctuations to higher values mainly occurred in the high-temperature region, while the 5 V voltage fluctuations to lower values mainly occurred in the low-temperature region.

Results and Comparison of Detecting the Fault
We used a total of 5,215,000 samples of real telemetry data obtained from satellite measurements and control systems from 00:00:00 on 1 July 2018 to 08:05:49 on 3 October 2018. Each sample included three telemetry parameters: the spread-spectrum transponder temperature, the 5 V voltage of the spread-spectrum transponder, and the 12 V voltage of the spread-spectrum transponder, as shown in Figure 11. Because satellite-Z's angle to the sun varies, the temperature of the spread spectrum transponder drifts slowly with time. Consequently, the temperature telemetry data were de-drifted in the experiment. The sampling rate of all data was 1 Hz, and the satellite orbital period was 46,468 s. However, the data were discontinuous in time because of the visible arc segment of the satellite and the limited ground station measurement and control resources, so telemetry data were not transmitted downward during part of the time.
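To give a sense of how discontinuous the data are, the figures quoted above can be combined into a rough coverage estimate. The sketch below is illustrative arithmetic only: it assumes that a gapless 1 Hz record would contain one sample per second over the whole interval, and the function name `coverage_fraction` is hypothetical.

```python
from datetime import datetime

# Figures quoted in the text.
START = datetime(2018, 7, 1, 0, 0, 0)
END = datetime(2018, 10, 3, 8, 5, 49)
SAMPLE_RATE_HZ = 1.0
N_SAMPLES = 5_215_000


def coverage_fraction(start, end, n_samples, rate_hz):
    """Fraction of the nominal (gapless) sample count that was received.

    Assumption: the nominal count is simply span * rate, ignoring the
    one-sample difference between inclusive and exclusive endpoints.
    """
    span_s = (end - start).total_seconds()
    return n_samples / (span_s * rate_hz)


span_seconds = (END - START).total_seconds()
coverage = coverage_fraction(START, END, N_SAMPLES, SAMPLE_RATE_HZ)
print(f"span: {span_seconds:.0f} s, coverage: {coverage:.1%}")
```

Under this assumption the interval spans 8,150,749 s, so the 5,215,000 received samples correspond to roughly 64% of a gapless record.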
In this article, the satellite orbital period was used as the length of the sliding window. The sliding window interval was 10,000 s, and a total of 539 sliding windows, each containing more than 20,000 samples, were retained. The first 380 sliding windows were selected as normal windows, and the last 159 sliding windows were used as on-line test windows. The first 64 of the on-line test windows were normal windows, and the windows after that were fault windows. The significance levels of the PCA + KLD1 and PCA + KLD2 methods used for comparison were set to 0.05, 0.025 and 0.01 in turn, and the significance level of the proposed method was set to 0.01.
Figure 12a-c shows the results of the spread spectrum transponder fault detection on the three PVs obtained from the PCA method. It can be seen from Figure 12 that only the second PV was sensitive to the fault. With a significance level of 0.01, 23% of the fault windows were still located below the detection threshold. With a significance level of 0.025, 15% of the fault windows were still located below the detection threshold, and one normal window was considered to be a fault window.
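The windowing scheme described above (window length equal to the orbital period, a 10,000 s interval between window starts, and windows retained only when they hold more than 20,000 samples) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sliding_windows`, the half-open window convention, and the rule that a window must end within the recorded time span are all assumptions.

```python
from bisect import bisect_left

# Parameters quoted in the text.
WINDOW_LENGTH_S = 46_468   # satellite orbital period
WINDOW_STEP_S = 10_000     # sliding window interval
MIN_SAMPLES = 20_000       # windows must contain more than this many samples


def sliding_windows(timestamps, length_s, step_s, min_samples):
    """Return (start_idx, end_idx) index pairs of the retained windows.

    `timestamps` must be sorted in ascending order (seconds). Each window
    covers the half-open interval [t0, t0 + length_s) and is kept only if
    it contains more than `min_samples` samples, which discards windows
    that fall mostly inside telemetry gaps.
    """
    windows = []
    if not timestamps:
        return windows
    t0 = timestamps[0]
    last = timestamps[-1]
    while t0 + length_s <= last:
        lo = bisect_left(timestamps, t0)
        hi = bisect_left(timestamps, t0 + length_s)
        if hi - lo > min_samples:
            windows.append((lo, hi))
        t0 += step_s
    return windows


# Toy example: 1 Hz data with a gap between t = 100 s and t = 200 s.
demo_ts = list(range(0, 100)) + list(range(200, 300))
demo_windows = sliding_windows(demo_ts, length_s=50, step_s=25, min_samples=40)
print(demo_windows)
```

With real telemetry the call would be `sliding_windows(ts, WINDOW_LENGTH_S, WINDOW_STEP_S, MIN_SAMPLES)`. The retained windows would then be split in order into the normal (reference) windows and the on-line test windows.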
In comparison, Figure 13 shows the results of the proposed method for the spread-spectrum transponder fault. It is evident that the detection results on the local optimum PVs were better than those on the fixed PVs. Most of the fault windows were located above the detection threshold, giving a high fault detection rate of 92.63% and a false alarm rate of 0%. With a significance level of 0.01, the PCA + KLD2 method also achieved a 92.63% fault detection rate; however, 23.44% of the normal windows were considered to be fault windows. The fault detection results are summarized in Table 2. This real case of detecting the fault in a satellite's spread spectrum transponder further verifies the effectiveness and superiority of the method proposed in this paper.
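The fault detection rate and false alarm rate quoted above follow directly from per-window alarm decisions. The sketch below shows that arithmetic under stated assumptions: the helper `detection_rates` is hypothetical, and the window counts are inferred rather than taken from the source. Specifically, 92.63% is consistent with 88 of the 95 fault windows being flagged, and 23.44% with 15 of the 64 normal windows being flagged.

```python
def detection_rates(alarms, is_fault):
    """Return (fault detection rate, false alarm rate) over test windows.

    alarms[i]   -- True if window i exceeded the detection threshold
    is_fault[i] -- True if window i is a fault window (ground truth)
    """
    n_fault = sum(is_fault)
    n_normal = len(is_fault) - n_fault
    detected = sum(1 for a, f in zip(alarms, is_fault) if a and f)
    false_alarms = sum(1 for a, f in zip(alarms, is_fault) if a and not f)
    fdr = detected / n_fault if n_fault else float("nan")
    far = false_alarms / n_normal if n_normal else float("nan")
    return fdr, far


# Test-set layout from the text: 159 on-line windows, the first 64 normal.
N_TEST, N_NORMAL = 159, 64
N_FAULT = N_TEST - N_NORMAL  # 95 fault windows
is_fault = [False] * N_NORMAL + [True] * N_FAULT

# Hypothetical alarm pattern: 88 fault windows flagged, no false alarms.
# Which particular windows are missed is arbitrary here.
alarms = [False] * N_NORMAL + [False] * 7 + [True] * 88
fdr, far = detection_rates(alarms, is_fault)
print(f"FDR = {fdr:.2%}, FAR = {far:.2%}")
```

Reporting both rates together matters for the comparison in the text: two methods can share the same fault detection rate while differing sharply in how many normal windows they flag.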
 Figure 13. The detection result of the proposed method for the spread-spectrum transponder fault.

Conclusions
Efficient detection and resolution of incipient faults can effectively reduce the losses and hazards they cause. Under the assumption that the variables obey a multidimensional Gaussian distribution, this paper models the optimum PV for detecting incipient faults as an optimization problem. The validity of the proposed method is then assessed using a numerical simulation example and an actual satellite fault case. Two different mitigation schemes are proposed to address the computational cost of the proposed method, in which the user can flexibly choose the sliding window interval or the alarm threshold according to the actual monitoring requirements and resource constraints. The method has so far been validated on only three faults. If the system is non-Gaussian or nonlinear, the detection performance of the proposed method may degrade. Future work on improving the method could address its time efficiency and its extension to nonlinear systems. Although the proposed method was developed for detecting incipient faults in satellites, it is not specific to this application and can be extended to other applications where multivariate statistical analysis can be used to detect incipient faults.

Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest:
The authors declare no conflict of interest.