Sparsity-Constrained Vector Autoregressive Moving Average Models for Anomaly Detection of Complex Systems with Multisensory Signals

: Detecting anomalies in large, complex systems is a critical and challenging task, and this is especially true for high-dimensional anomaly detection due to the underlying dependency structures among sensors. To incorporate the interrelationships among various sensors, a novel sparsity-constrained vector autoregressive moving average (scVARMA) model is proposed for anomaly detection in complex systems with multisensory signals. This model aims to leverage the inherent relationships and dynamics among various sensor readings, providing a more comprehensive and accurate analysis suitable for complex systems’ complex behavior. This research uses convex optimization to search for a parameterization that is sparse based on the principal of parsimony. This sparse model will not only contribute to meeting the real-time requirements of online monitoring strategies but also keeps the correlations among different sensory signals. The performance of the proposed scVARMA model is validated using real-world data from complex systems. The results affirm the superiority of the proposed scheme, demonstrating its enhanced performance and potential in practical applications.


Introduction
Since the inception of health monitoring technology for complex systems, such as liquid rocket engines (LREs), major spacefaring nations around the world have placed great emphasis on the design and development of health monitoring systems (HMS) for these engines [1].In fact, health monitoring systems have now become an indispensable key component in both improving existing LREs and developing new ones.This system utilizes sensors to gather system-related data and employs various intelligent algorithms and models to assess the health status of the system [2].The application scenarios of the HMS for LRE can be categorized into online and offline modes [3].The "online" mode is utilized during flight and ground hot-fire tests for real-time monitoring and implementing fault-tolerant control with the objective of ensuring the successful completion of the launch mission [4,5].The "offline" mode is applied post-engine testing for the analysis of components' performance, which involves combining relevant resource information to provide practical maintenance measures, ultimately fulfilling the requirements for reusability [6].Several key technologies in this system include: the selection of sensitive engine parameters and optimization techniques for measurement points [7]; the design and application of real-time fault diagnosis algorithms; and accurate fault localization technology [8].
Over the past thirty years, significant progress has been made in the field of real-time fault diagnosis technology for LREs.Especially since the concept of Industry 4.0 was first introduced in 2011, the automation of the fault detection process has become a significant objective in technological transformation [9].Utilizing big data analytics, advanced statistical algorithms, and data visualization techniques enable the use of sensor data collected from machinery to predict potential faults and implement corresponding maintenance measures.In this field, time series analysis [10][11][12] and anomaly detection [13][14][15][16] have garnered significant attention.Compared to structural models, time series models have more advantages in this aspect as both modeling and forecasting can be readily accomplished.Time series models primarily include the autoregressive model (AR), moving average model (MA), and autoregressive moving average model (ARMA) [17].The ARMA model is a parametric method used to analyze sequentially ordered random vibration data to identify modal parameters.It predicts short-term system behaviors by examining the inherent patterns within time series data.This method is typically employed for engine fault detection during the steady-state phase [18].Deng et al. [19] developed a real-time fault detection method based on the ARMA model.They conducted simulations via a hardware-in-the-loop platform to validate the method's reliability.Similarly, Zhao et al. [20] employed the ARMA model for real-time fault detection in rocket engine diagnosis using a rapid prototyping approach on their hardware-in-the-loop simulation platform.Xue et al. [21] also implemented the ARMA model in a fault detection model for a liquid oxygenmethane rocket engine.Their algorithm successfully identified common LRE faults and fulfilled the real-time demands of engine fault diagnosis.
However, the ARMA model, being univariate in nature, is confined to the analysis of a singular time series.This inherent characteristic precludes its ability to discern and interpret the interrelations among a multitude of critical parameters, such as pressure, temperature, and flow rate.Such a limitation poses a significant constraint on its utility for grasping the intricacies embedded in the overall performance and health state of LREs.In contrast, the vector autoregressive moving average (VARMA), exemplifying a multivariate approach, is adept at concurrently analyzing multiple time series.This capability enables VARMA to elucidate the intricate and dynamic interplay among these parameters, thereby providing a more holistic and nuanced comprehension of the rocket engines' overall health status.Therefore, this manuscript delineates an innovative methodological approach, specifically the deployment of VARMA, within the ambit of LRE surveillance.Despite the established prominence of VARMA in the realm of statistical analysis, its applicability and potential in the field of aerospace engineering, particularly in the monitoring of engines, is still an underresearched area.This study endeavors to fill this research lacuna by methodically exploring the practicability and efficacy of VARMA in the real-time analysis and surveillance of LREs.The ensuing sections of this paper are dedicated to a comprehensive examination of VARMA encompassing its theoretical foundations and an avant-garde approach to application.This culminates in an elaborate elucidation of the transformative impact these models can have on the domain of aerospace engine monitoring.
The main contributions of this paper are as follows: 1.
A novel sparsity-constrained vector autoregressive moving average is proposed to apply multivariate time series analysis methods to the real-time fault detection of complex systems utilizing the interrelationships among data from various sensor points to monitor the overall health status of the systems.

2.
A sparse induction strategy is proposed to enhance the real-time capabilities of multivariate time series analysis methods, which is aimed at reducing the computational load during model parameter identification.

3.
Utilizing actual test data, the proposed scheme is compared with traditional online monitoring schemes that use single time series models, and the results demonstrate the advanced nature of the proposed model.
The remainder of this paper is organized as follows.Section 2 introduces the rationale behind selecting a multivariate time series analysis approach and the specific framework of the scheme.In Section 3, this study describe how to refine the existing approach to better suit the application scenarios of real-time fault diagnosis.Section 4 compares this method with traditional single time series methods.The discussion in Section 5 delves into the broader implications of the findings, highlighting the advancements it offers over conventional approaches.Finally, some important conclusions are drawn in Section 6.

Theoretical Framework
Although the VARMA model is widely used for the predictive analysis of time series data, this study specifically employs it for the real-time prediction of operational engine data.By comparing predicted values with actual observational data, the model aims to promptly identify potential anomalies or faults in the system, thereby supporting fault diagnosis and health monitoring.The complex challenges of real-time fault diagnosis in LREs are particularly evident in the dynamic interactions and complexity of multiple parameters.The application of the VARMA model in this context, due to its ability to analyze multiple time series concurrently, allows us to reveal and understand the complex and dynamic interactions among these parameters, thereby comprehensively grasping the overall health status of the rocket engines.

The Rationale for Choosing VARMA
Figure 1 shows the system diagram of an LRE.During engine start-up, high-pressure ammonia gas squeezes the kerosene in the starter box, and the change in pressure causes the ignition conduit to break, filling the generator and thrust chamber ignition path with igniter.Then, the main valve of the liquid oxygen opens, allowing liquid oxygen to enter the generator.After that, the fuel valve of the generator opens, and the igniter enters the generator, igniting with the mixture of liquid oxygen, producing oxygen-rich gas that drives the main turbine to rotate.This process causes the igniter to flow into the thrust chamber, mixing and igniting with the oxygen-rich gas.Finally, the main fuel valve opens, and the fuel mixes with the oxygen-rich gas for combustion.The start-up process is very brief, and the system quickly reaches a stable operating condition.In the context of fault diagnosis for LREs, this study is interested in the temporal variations of parameters such as temperature, pressure, flow rate, and rotational speed.In the operation of LREs, various components work closely in conjunction and adhere to thermodynamic principles.For instance, variations in the temperature of the combustion chamber can affect the temperature in the exhaust system, subsequently impacting the efficiency of the entire cooling system.Therefore, the performance parameters of the components are interdependent, exhibiting strong correlations or feedback relationships.
LREs generate data from multiple sensors simultaneously.Suppose that a d related time series variable is obtained as, say, y 1t , y 2t , . . ., y dt .Define y t = (y 1t , y 2t , . . ., y dt ) ′ as the time series vector at time point t.
In the process of conducting multi-parameter studies of LREs, a framework is required that not only characterizes the properties of individual sequences but also describes the interrelationships between sequences.The two primary objectives of jointly analyzing and modeling are as follows: 1.
To understand the dynamic relationships of these sequences over time; 2.
To enhance the accuracy of individual sequence forecasts by utilizing additional information provided by related sequences in each prediction.The vector autoregressive (VAR) model has been extensively applied to achieve these objectives, such as in [23,24].Initially proposed by Christopher Sims [25], this model offers a flexible approach, reducing reliance on the numerous inherent assumptions of structural models.Building on the VAR model, the integration of moving averages led to the creation of VARMA, which was an extension of the foundational work on ARMA models [26].Although the VAR is a critically important class of models in multivariate time series analysis, allowing each variable to be a linear function of past values (including its own past values and those of other variables), it requires extensive data to estimate all parameters, especially when involving multiple variables and longer lags.The VARMA, by integrating the AR part and the MA part, can capture the dynamic characteristics of time series more effectively.Additionally, the class of VARMA is closed under marginalization and linear transformations [27].

VARMA and Its Identification Problem
In a VARMA d (p, q) model, a stationary d-dimensional mean-zero vector time series y t is modeled as a function of its own p past values and q lagged error terms, where are MA part matrices, and a t represents a time series of white noise vectors in d dimensions, each with a mean of zero and characterized by a nonsingular d × d contemporaneous covariance matrix, denoted as Σ a .We define the AR and MA operators, respectively, as , where L ℓ represents the lag operator, indicating a lag of ℓ time periods.Then, the model can be written in more compact notation as The matrix polynomials in (2) are assumed to satisfy The first of these conditions ensures that the AR operator is stable and the process is stationary.The second part is the usual invertibility condition for the MA operator, which implies the existence of a pure VAR representation of the process.This denotes that {y t } can be expressed as where The Π matrices can be computed recursively from {Φ ℓ } and {Θ m } [28].The VARMA is distinctly characterized by the operator Π(L) rather than generally by the AR and MA operators Φ(L) and Θ(L).That is, for the same Π(L), p, and q, an equivalent data-generating process can be executed.Consequently, the VARMA representation of {y t } is not unique.
This lack of uniqueness presents challenges in parameter identification.To address this issue, researchers and practitioners have adopted several strategies: [29] discussed in detail the use of identification constraints in multivariate linear time series models, including VARMA models; [30] proposed LASSO regularization methods for parameter estimation and the selection of high-dimensional models; [31] provides an application of Bayesian methods in the estimation of VARMA models; and [32] describes the use of regularization methods in large VAR models, and these techniques can also be extended to VARMA models.

Model Identification
In traditional VARMA, the parameter coefficients are typically dense, reflecting that each time series is related to the past values of other time series, with the majority of elements in the coefficient matrices being non-zero.However, in the context of online monitoring, the model utilized should fulfill the following criteria: 1.
High storage and computational efficiency; 2.
Increased robustness, augmenting resistance to noise and outliers.
All the aforementioned requirements point towards sparsity.When Φ(L) and Θ(L) are sparse matrices, the model's characteristics include: 1.
The ability to conserve significant memory space, reducing computation time by bypassing zero elements; 2.
The retention of only salient features and relationships, which aids in comprehending the operating principles of the model; 3.
The exclusive transmission and processing of non-zero elements, thereby enhancing computational efficiency; 4.
The diminution of noise impact through sparsification.
In order to facilitate coefficient sparse identification in VARMA models, this study initially derives the Yule-Walker equations for the VARMA model.This equation embodies a method for representing stochastic processes, with its core essence residing in establishing a linkage between the parameters and the autocorrelation function.For a pth-order , where γ 0 is the variance and µ is the mean of the time series.The Yule-Walker equations can be derived by considering the autocovariance of both sides of the model at lag k: As mentioned above, the VARMA can be reformulated into the form as presented in (4), and it is noteworthy that the parameter pairs (Φ, Θ) satisfying the same transformation equation are not unique.Define, for a certain Π(L), p, and q, the equivalence class, which can be characterized as: Subsequently, one defines z t = y T t−1 : . . .: y T t−p : a T t−1 : . . .: a T t−q T and β = Φ 1 : . . .: Φ p : Θ 1 : . . .: Θ q T .Consequently, Equation ( 1) can be expressed as: Therefore, Given that a t represents white noise, its expected value is zero, which implies that E y t z T t = β T E z t z T t ; that is to say: Hence, the Yule-Walker-type equation for the VARMA is derived.Clearly, β is the solution to the equation.It is evident that the equation ( 7) is a system of non-chiral linear equations, so the solution takes from β = β * + δ, where β * is a particular solution and δ represents a specific solution to the system of chi-square linear equations and satisfies ∑ z δ = 0. Through this process, the parameter identification problem has been transformed into a problem of solving equations.
In order to identify a solution that meets the criteria from the multitude of available solutions, an additional penalty term has been incorporated into the objective function of the model optimization.This term is designed to facilitate the measurement of parameter sparsity and the identification of uniqueness.In this regard, a combination of the ℓ 1 -norm and Frobenius norm has been selected for addition to the model.The ℓ 1 -norm of a matrix is defined as the sum of the absolute values of its elements.Employing L 1 regularization tends to yield sparse models characterized by the estimation of numerous parameters as zero.However, the ℓ 1 -norm lacks strong convexity.Thus, the research incorporate the Frobenius norm, known for its strong convexity, to ensure the uniqueness of parameter identification.
The model parameters were estimated in two stages, adhering to the methodologies outlined in [33,34].Firstly, we converted the fundamental VARMA equation into an infiniteorder VAR y t = p ∑ τ=1 Π τ y t−τ + a t to approximate the unobservable errors ât .Following the estimation of the unobservable error terms, the model was derived, where u t is the vector error series.We then rewrote the equation in compact matrix notation as Y = ΦZ + ΘX + U.The estimates for the regularized VARMA were acquired as follows: During the parameter estimation process, we first defined three hyperparameters: p, p, and q, where p represents the order of the VAR to which the VARMA equation is converted in the first stage of parameter estimation and p and q are the maximum lag orders for establishing the VARMA, respectively.We then defined the length of the series chosen for modeling as N and took p = 1.5 √ N and p = q = 0.75 √ N. Thus, the scVARMA was obtained.
In the framework of the scVARMA model, the value of each time series at a given time is influenced not only by its own past values and those of other series but also by historical error terms.This structural characteristic allows the model to capture and articulate the complex dynamic interactions and dependencies among the time series.By integrating these relationships, the scVARMA model excels in delineating how each variable within the multivariate dataset influences others over time.The introduction of sparsity through regularization techniques aids in refining the model by emphasizing only the most significant links among the time series variables.These connections are not arbitrary; instead, they are determined based on their ability to explain the variance and co-movements in the dataset effectively.By acknowledging and modeling these intricate dependencies, the scVARMA model offers a profound understanding of the underlying processes governing the system's behavior, thereby facilitating more accurate predictions and strategic insights into the time series data interaction patterns.

Strategies for Online Monitoring
The transition of an LRE system from a state of normal operation to complete failure is typically not instantaneous.There exists an intermediate transitional process as the system progresses from normal functionality to a state of total failure.When parameter indices deviate from normal benchmarks but remain within acceptable limits, the parameters are considered anomalous.A system is deemed to have malfunctioned when at least one characteristic index strays beyond these acceptable boundaries.Failure is defined as the state wherein the system irreversibly loses the capacity to perform its intended function.In Figure 2, the interval T 1 − T 2 represents the system in an anomalous state.As the severity of the malfunction progresses, it transitions into the interval T 2 − T 3 , which is indicative of a state of malfunction.To enable the prediction of potential malfunctions when the LRE exhibits anomalous behavior, the scVARMA is utilized for fault forecasting.The operational conditions of LREs are typically categorized into three stages: the startup phase, the main stage, and the shutdown phase.A majority of studies indicate that during the startup and shutdown phases, the data measured by sensors exhibit highly non-stationary and transient characteristics [35][36][37][38].Conversely, during the main stage, the engine operates stably, and the sensor measurements are either inherently stationary or can be rendered stationary through normalization, differencing, and similar procedures.
When setting threshold values, a larger range decreases the sensitivity of fault detection, leading to an increased incidence of missed warnings.On the other hand, a narrower range may result in a higher frequency of false alarms.Both missed warnings and false alarms are undesirable outcomes in the context of engine fault diagnosis.For each monitored parameter y i , i = 1, . . ., d that is stationary, its mathematical variance and expectation are, respectively, defined as σ 2 = constant ≤ ∞, and µ.According to Chebyshev's inequality, for any bandwidth coefficient n > 0, the following holds: For any given false detection probability α, the upper and lower threshold limits are defined as The normal operating range for the monitored parameter is thus defined as [µ − nσ, µ + nσ].Given that Chebyshev's inequality is applicable to any random variable with finite mathematical expectation and finite variance, the largest normal range is within the interval [µ − nσ, µ + nσ], constituting a finite range.For fault prediction parameters that follow a normal distribution, the probability of their values falling within the interval [µ − nσ, µ + nσ] is 99.74% [39].Hence, it is feasible to set n as 3 or adjust n based on the actual performance of fault prediction.
Once the threshold strategy is determined, the scVARMA is employed to predict parameters, thereby enabling fault prognosis.The comprehensive process is depicted in Figure 3. Data are first collected from sensors in the engine.Upon retrieval, an initial check is performed to determine the operational status of each sensor.If a sensor is found to be faulty, its data are discarded.For sensors operating normally, their data are used for predictive analysis.The scVARMA model predicts future parameter performance based on this data.Simultaneously, thresholds are calculated to determine if the predicted values fall within acceptable ranges.If a predicted value lies outside these thresholds, an alarm is triggered indicating a potential fault.Otherwise, the sensor parameters are considered normal, and monitoring continues.During the real-time input of time series data, the threshold is computed, and the established scVARMA model is utilized to predict future parameter performance.If the predicted values exceed the threshold range, a fault alarm is issued.While establishing the model, typical historical data from engines of the same model are selected, and pre-processing is performed to stabilize the data, thereby fulfilling the application conditions for the scVARMA.For an established scVARMA, the predictive formula is expressed as follows:

Experimental Validation
To evaluate the effectiveness of the scVARMA model in a real-world application, we conducted a two-fold experimental study.The first part of the study focused on online monitoring trials using a proprietary dataset from LRE, while the second part tested the predictive capabilities on both the proprietary dataset and a public dataset from the Case Western Reserve University Bearing Dataset.

Online Monitoring with Proprietary Data
The online monitoring experiment utilized real-time sensor data from the LRE to detect anomalies and predict potential failures.The dataset was derived from the results of a specific test run on an LRE.The sampling frequency of the sensors was uniformly 100 Hz.This research selected eleven parameters for analysis, each accompanied by its respective unit to ensure clarity and precision in data interpretation:

Predictive Performance on Proprietary and Public Data
The second part of the experiment evaluated the predictive performance of the sc-VARMA model.The experimental validation involved a detailed comparison with traditional ARMA and VARMA models across various sensors for predicting different system parameters.Table 1 presents the mean and standard deviation of the residuals from the in-sample data fitting.The scVARMA model exhibits mean residuals close to zero and consistently lower standard deviations across most sensors, indicating better fitting to the in-sample data compared to both the ARMA and VARMA models.To further compare the predictive power, Table 2 presents the mean absolute error (MAE) and root-mean-square error (RMSE) for both models at 3-step and 5-step predictions, highlighting the predictive accuracy across different future intervals.As shown in Table 3, the MAE and RMSE metrics of the scVARMA model have smaller values than the ARMA and VARMA models for most of the sensor data at 3 and 5 steps ahead of prediction.To substantiate the universality of the proposed scVARMA model, we extended our evaluation to include the well-regarded Case Western Reserve University Bearing dataset.This dataset is a benchmark in the field of condition monitoring and predictive maintenance, consisting of vibration signal data that capture bearing performance under various conditions.For this phase of testing, this research specifically focused on the dataset's normal operation conditions to align with the baseline comparisons established in earlier experiments.This analysis applied the ARMA, VARMA, and scVARMA models to predict the future states of the bearings and used the MAE and RMSE as metrics to assess the accuracy of these predictions.The outcomes are shown in Table 3.

Discussion
The results of our experiments on both proprietary and public datasets present a compelling case for the scVARMA model's applicability in predictive maintenance and real-time monitoring.The improved prediction accuracy on both datasets when using the scVARMA model compared to the ARMA and VARMA models underscores the importance of considering cross-channel interactions.
In the experimental results, the scVARMA model shows the smallest increase in error as the prediction step increases from 3 to 5.This suggests that the scVARMA model is not only more accurate but also more stable over longer prediction horizons, likely due to its ability to effectively capture and utilize the interdependencies among multiple time series data.Additionally, by comparing the results of VARMA and scVARMA, although this study performs a sparse operation in identifying the parameters of the VARMA model, this rather leads to an increase in the predictive power.This may be due to the fact that the original VARMA model introduces redundant information when establishing relationships between different sequences.Performing a sparse operation can reduce the introduction of redundant information to some extent, which can improve the performance of the model to some extent.

Conclusions
In current engineering practices, ARMA models are predominantly used for online monitoring due to their simple structures, which meet the requirements for real-time processing.However, these models typically consider each channel independently, neglecting the interdependencies among various channels, which can be critical in complex system monitoring.Acknowledging this limitation, this study turned to the use of VARMA models, which are capable of capturing the dynamic interactions between multiple time series.Nevertheless, the extensive number of parameters in VARMA models poses a significant challenge for real-time forecasting due to their computational burden.
To mitigate this issue and pave the way for future real-time monitoring applications, the research introduced a sparsity-constrained approach to the VARMA structure.By enforcing sparsity, this method aimed to reduce model complexity without compromising the ability to model inter-channel dependencies.This adaptation of VARMA models not only makes real-time computation more feasible but also retains the essential characteristics needed for accurate monitoring and prediction in engineering systems.
The scVARMA model demonstrates promising results, offering a more sophisticated yet computationally efficient alternative to traditional ARMA models.It represents a step forward in the real-time health monitoring of complex systems, providing a foundation upon which future monitoring strategies can be developed to harness the full potential of multivariate time series analysis.

Figure 2 .
Figure 2. Schematic Diagram of System Failure Progression.

Table 1 .
Mean and standard deviation of model in-sample fitting residuals.

Table 3 .
MAE and RMSE comparisons using the CWRU Bearing dataset.