Robust Autoregression with Exogenous Input Model for System Identiﬁcation and Predicting

: Autoregression with exogenous input (ARX) is a widely used model to estimate the dynamic relationships between neurophysiological signals and other physiological parameters. Nevertheless, biological signals, such as electroencephalogram (EEG), arterial blood pressure (ABP), and intracranial pressure (ICP), are inevitably contaminated by unexpected artifacts, which may distort the parameter estimation due to the use of the L2 norm structure. In this paper, we deﬁned the ARX in the L p ( p ≤ 1) norm space with the aim of resisting outlier inﬂuence and designed a feasible iteration procedure to estimate model parameters. A quantitative evaluation with various outlier conditions demonstrated that the proposed method could estimate ARX parameters more robustly than conventional methods. Testing with the resting-state EEG with ocular artifacts demonstrated that the proposed method could predict missing data with less inﬂuence from the artifacts. In addition, the results on ICP and ABP data further veriﬁed its efﬁciency for model ﬁtting and system identiﬁcation. The proposed L p -ARX may help capture system parameters reliably with various input and output signals that are contaminated with artifacts.


Introduction
In the field of linear system identification, the autoregression with exogenous input (ARX) model [1][2][3] is the most widely used linear dynamic model due to being easily implemented by L2 norm-based analytical solutions and capturing the linear relationship in various real-world systems [4]. In recent years, the ARX model has been widely applied in electroencephalogram (EEG)-related system identification [5,6], EEG network analysis [7][8][9], and brain-computer interfaces [10]. For instance, Burke and colleagues used the ARX model to combine both feature filtering and feature extraction for EEG signal processing [11]. Li and colleagues represented the dynamic variation of parameters in time-varying ARX models with a series of multiwavelet bases and applied orthogonal least squares (OLS) regression to improve the robustness of the parameter estimation, which captured the transient information of the inherent dynamics of nonstationary systems [12]. Moreover, Qidwai and colleagues developed an ARX-based dynamic model for EEG signal

SISO ARX
A SISO ARX model is usually represented as where y(n) and u(n) (n = 1, . . . N) are the output and input of the system, respectively, e(n) is a zero-mean white Gaussian noise with variance σ 2 w . a i (i = 1, . . . , n a ) and b j (j = 1, . . . , n b ) are the model parameters for the output and input, respectively, n a and n b are the degree of the ARX model, and n k is the time delay.

MISO ARX
In real applications, there are usually multiple system inputs for the ARX model, and so we can transform (1) into MISO-ARX as, where m is the number of system inputs, u j is the j-th system input and its coefficient vector is b j . The matrix expression for (10) can be formed as, where The delay matrix for multiple system inputs Φ M ∈ R (N−q)×((m+1)×q) can be defined as To this end, the parameter vector WM can also be solved by the LS algorithm with Formula (7).
In this work, we denoted both the L2 norm-based SISO-ARX and MISO-ARX as LS-ARX to differentiate them from our proposed model. Though LS-ARXs have been successfully applied in previous studies, it has been proven that these models are easily affected by outliers because of the L2 norm structure utilized for parameter estimation [18][19][20]31]. To robustly estimate the ARX model, some strategies, such as sparse constraints, are proposed in various least square regression-based versions to alleviate the noise effect. For example, Mattsson and colleagues proposed a sparsity-enhanced identification approach, which exploits the sparse nature of the finite impulse responses (FIRs) of a MIMO system [21]. To further improve the robustness of the recursive LS-based ARX model, Rahim and colleagues proposed leaky least mean squares (LLMS) [33]. Though these methods have been proposed to improve the robustness of ARX parameter estimation, most of them still adopted the L2 norm-based residual for the objective functions, and this will inevitably exaggerate the outlier effect regardless of how the parameters are restricted. Therefore, in this work, we proposed to estimate the ARX model in the Lp (p ≤ 1) norm space to restrain the influence of outliers. The proposed method is termed hereinafter as Lp -ARX.

Lp (p ≤ 1) Norm-Based ARX Model
The ARX model defined in the Lp (p ≤ 1) norm space can be expressed as, where • p denotes the Lp norm of a matrix or a vector. Unless otherwise stated, the p is defined as p ≤ 1 in the following text.
The gradient of (13) is formulated as, where the sign function sgn(i) is defined as, In the field of unconstrained optimization problems, compared with the methods based on the first derivative, the Newton method uses the second derivative, which possesses good convergence and high precision. However, the Newton method has some drawbacks. Second partial derivatives are difficult to compute while computational complexity is high. BFGS is the most popular and effective quasi-Newton method, which has the advantage of Newton's method without the heavy computational burden [34][35][36][37][38][39][40][41]. The basic idea of BFGS is the use of a matrix to approximate the inverse matrix of the Hessian. In this work, with the gradient defined in (14), we adopted the BFGS to find the optimal W* and updated the pseudo-Hessian matrix as, where ∆g k denotes the change of the gradient and ∆W k is the step length of W in the kth iteration. H k is the approximate matrix. With (14) and (16), the parameters in the Lp-ARX can be estimated with the following iterative procedures (Algorithm 1). After the above iterations, the optimal W* in (13) can be solved. The computational complexity of the parameter estimation in each iteration is O(((m + 1) × q)(8((m + 1) × q) + 2(N − q))), where m is the number of system inputs, N is the number of samples, and q is the order of the MVAR model. In the current work, we set the termination error as 0.05, and the p in the Lp-ARX was set to five values, i.e., 1.0, 0.8, 0.6, 0.4, and 0.2.

Algorithm 1 Lp (p ≤ 1) BFGS
Require: Iteration number n, termination error ε ∈ [0, 1], initialize W as a random nonzero 2q-dimensional vector, initial pseudo-Hessian matrix H 0 . For k from 1 to n do Compute the gradient g k by Solve the coupled linear equations H k d k = −g k , and calculate d k Find the optimal learning velocity α k by The EEG data reported in [42] were selected for this simulation study. All experimental and surgical procedures were performed in accordance with protocols approved by the RIKEN ethical committee (No. H22-2-202(4)) and the recommendations of the Weatherall report, "The use of nonhuman primates in research". In the experiment, a blindfolded macaque monkey was seated in a primate chair with its hands tied. Two trials of simultaneous resting-state EEG and electrocorticogram (ECoG) activity were measured, with each trial lasting 5 min. For EEG measurements, a standard 10-20 EEG electrode system was used, where electrode Cz was removed [43]. This EEG recording was sampled at 4906 Hz and downsampled to 1000 Hz to match with the ECoG. Furthermore, the data were bandpass filtered from 0.3 Hz to 40 Hz.
For the SISO-ARX model, we selected 4 s of filtered EEG data at F3 as the system input. The model parameters are defined in [44] as, and where W a is the set of parameters for the system input, and W b is the set of parameters for the system output. With these parameters, we estimated the system outputs (4000 sample points) according to (1). For the purpose of comparison, the system input and output were divided into two equal-length segments. In this simulation, the first segments of both the system input and output were contaminated with outliers. Then, we utilized LS-ARX, Huber-ARX, and Lp-ARX to estimate model parameters and calculated the parameter bias between the predefined parameters and the estimated ones. The second segment of the system output was regarded as the desired output. With the estimated model parameters and the second segment of the system input, we further predicted the model output and calculated the errors between the predicted output and the desired output. The parameter bias and predicting errors were calculated to evaluate the performance of the proposed methods for outlier suppression. In addition, the outliers for the system input were generated from a Gaussian distribution with variance σ 2 = 0.5 and mean µ 1 = 111.0, and the corresponding value was twice the maximum value of the absolute value of the input signal. The outliers for the system output were generated from a Gaussian distribution with variance σ 2 = 0.5 and mean µ 2 = 123.0, whose value was twice the maximum value of the absolute value of the output signal, according to the simulation based on experimental data. For MISO-ARX, we mainly focused on a model with two inputs and a single output (TISO). The 4 s EEG data at F3 and F4 were selected as the system input. The parameters for the model output were defined in [44] as, The parameters for the model input were defined in [44] as, and For the TISO system, we carried out the same experimental processing and performance indexes as the SISO system.

Effect of Outlier Occurrence Rate
In this part, we evaluated the performance of LS-ARX, Huber-ARX, and Lp-ARX with various outlier numbers. The number of outliers added into the first segment of the system input and output increased from 8 to 24 in steps of 4, and the timings of the outliers were randomly determined. The second segment of the system input was used to predict the system output, and the second segment of the output without outlier influence was defined as ground truth to evaluate prediction performance. Figure 1 demonstrates the waveforms of the system output and input. We used the signals on the left side of the green dotted line for parameter estimation and the signal in Figure 1a for fitting performance evaluation of the ARX models. For each outlier condition, we repeated the procedure 200 times to obtain the average errors of the parameter estimation and output prediction. The average results of the model parameter bias and fitting errors for the SISO system are collected in Tables 1 and 2. A paired t-test was used to evaluate the significant differences in the results between LS-ARX and Lp-ARX (p = 1.0, 0.8, 0.6, 0.4, 0.2). Tables 1 and 2 show that the mean values of the model parameter bias and fitting errors for the Lp-ARX method are statistically smaller than those of the LS-ARX and Huber-ARX, which reveals that the Lp-ARX method is more robust to outliers than the other two methods.  0.9940 ± 0.007 * 0.9936 ± 0.008 * 0.9936 ± 0.008 * 0.9929 ± 0.008 * 0.9923 ± 0.009 * "*" indicates the significant difference (p ≤ 0.01) between least square (LS)-autoregression with exogenous input (ARX) and the Lp-ARX model. To demonstrate the fitting results intuitively, we depicted the waveforms of the ground truth and the predicted outputs of the LS-ARX and the five Lp-ARXs in Figure 2. The blue lines depict the desired output, and the red lines demonstrate the predicted results. For the TISO system, the bias of parameter estimation and output prediction were collected in Tables 3 and 4, respectively. Similar to Tables 1 and 2, the bias and errors of Lp-ARX are statistically smaller than those of LS-ARX and Huber-ARX, and the Huber-ARX is unstable.

Application to Actual EEG Recordings
The above simulation study reveals that outliers have obvious influences on ARX parameter estimation. In EEG recordings, ocular artifacts are inevitable, and they will unexpectedly disturb the signal analysis. Based on the similar results for various Lp-ARX models revealed in the simulation study and the relatively high efficiency of L1-ARX implementation, in this section, we concentrated on the different performances between L1-ARX and LS-ARX in the SISO system when they were applied to resting-state EEG datasets that are contaminated with ocular artifacts.
The details of this dataset can be found in [32]. This study was carried out in accordance with the recommendations of the Ethics Committee of the University of Electronic Science and Technology of China. For each subject, we visually selected one 1.5 s EEG segment with obvious ocular artifacts from the whole recording for performance evaluation. For the real data study, we selected the EEG at Fp1 as the system input and the EEG at Fpz as the system output. For both signals, the first 1 s data with obvious ocular artifacts were utilized to estimate the model parameters. The remaining part of the output signal was regarded as the ground truth. We predicted the system output by substituting the remaining system input and the estimated model's parameters into Equation (1). We calculated the predicting errors between the predicted output and the ground truth to evaluate the performance of L1-ARX and LS-ARX on system identification under the influence of ocular artifacts. Two typical waveforms of both the input and output signals are illustrated in Figure 3. Figure 4 depicts the fitting performance for the two methods, where the red curve is the ground truth, the green curve depicts the output predicted by L1-ARX and the blue curve represents the predicted output of LS-ARX. We can see intuitively that the fitting result of L1-ARX is better than that of LS-ARX. Though the results presented here are from a single subject, the results from other subjects are similar to those revealed in Figure 4. In Table 9, we can see that the L1-ARX model is a more robust method for dealing with outliers than the LS-ARX model according to the fitting errors.  The predicted results of the model. The red curve is the desired system output, the green curve is the predicted output by L1-ARX, and the blue curve represents the output predicted by LS-ARX. For the convenience of display, we only draw the prediction results for a certain period of time.

Application to Actual EEG Recordings
Another application for ARX in system identification was to estimate intracranial pressure (ICP) by ABP. However, both ICP and ABP are usually contaminated with artifacts, which may influence the parameter estimation of the ARX model and result in serious fitting errors. In this subsection, we utilized the dataset reported to evaluate the performance of both L1-ARX and LS-ARX. This database was recorded from 15 patients. Each patient had 5-to 25-min-long ABP and ICP recordings. The details of this dataset can be found in [5]. For each subject, we visually selected two 2 s segments from the whole recording of both the ICP and ABP for analysis. In this study, we regarded the ABP as the system input and the ICP as the system output. For both the input and output, the first segment is depicted in Figure 5 for model-parameter estimation, and Figure 6 depicts the second segment that is used for performance evaluation.  With the estimated system parameters and the second segment of the system input, we predicted the system output. The mean value of the ICP, predicting errors and correlation coefficients between the desired and the predicted system output are collected in Tables 10 and 11. "*" indicates a significant difference (p < 0.05 for all variables) between L1-ARX and LS-ARX. To intuitively compare the prediction efficiency of the two models, we show the system output waveforms in Figure 7, where the red curve is the desired system output, the green curve is the predicted output by L1-ARX, and the blue curve represents the predicted output by LS-ARX.  Figure 7. The system output waveforms. The waveform depicted by the red curve is the desired system output, the blue curve depicts the predicted results by LS-ARX, and the green curve depicts the predicted results by L1-ARX. For the convenience of display, we only draw the prediction results for a certain period of time.

Discussion
The existence of outliers is a major challenge in biomedical signal analysis. Our simulation study based on intracranial monkey EEG quantitatively evaluates the robustness of the proposed model to outlier effects. The conducted comparison reveals that both the proposed Lp-ARX (p = 1.0, 0.8, 0.6, 0.4, 0.2) and conventional L2 norm-based ARX are influenced by outliers. Both model parameter biases and waveform fitting errors in Tables 1-4 increase when the number of outliers increases from 8 to 24. Another aspect that needs to be noted is that the parameter bias and fitting errors in Tables 3 and 4 are  higher than those in Tables 1 and 2, which may be caused by the lower causal relationship between the two selected system inputs. Though the parameter bias and fitting errors in Tables 3 and 4 are higher than those in Tables 1 and 2, the results of all Lp-ARXs (p = 1.0, 0.8, 0.6, 0.4, 0.2) are still better than those of LS-ARX. Tables 5-8 further reveal that ARX parameter estimation is largely influenced by the strength of the outliers, i.e., stronger outliers result in more biased ARX parameters. However, for all the simulation results, the Lp-ARX (p = 1.0, 0.8, 0.6, 0.4, 0.2) models consistently show significantly better performance than LS-ARX with p < 0.05 for all conditions. Outliers with large amplitudes will be exaggerated in the objective function of the ARX model because of the square property when they are adopted for parameter estimation. Therefore, the L2 norm-based ARX focuses on fitting the outliers while other detailed information is neglected. When the ARX model is structured in Lp (p ≤ 1) norm space, the objective functions hold the ability to suppress outlier influence, as shown in previous studies [32]. Evidently, the different mathematical characteristics of the Lp norm account for different performances of the Lp-ARX models in the simulation study. In theory, smaller p (p ≤ 1) will facilitate the compress outlier effect, as shown in Tables 4 and 8. However, when p is smaller than one, the gradient of the Lp object function involves the rooting operation, which may dramatically increase the algorithm complexity. Moreover, when p is close to zero, the estimation of the gradient becomes unstable and results in biased model parameters.
In addition, to enrich the content, we also evaluated Huber loss according to the same experimental configuration. The performance of Huber-ARX is better than that of LS-ARX but weaker than that of Lp-ARX. Moreover, the variance of the results is too large, which indicates that the model is unstable.
Similar to the findings in the simulation study, the application to real EEG recordings also consistently indicates better performances of L1-ARX compared with LS-ARXs in dealing with ocular artifacts. The results in Table 9 demonstrate that a smaller fitting error can be obtained by L1-ARX than by LS-ARX, which is similar to what we discussed above. In addition to system identification in EEG, various studies have shown that the change of ICP is tightly associated with ABP waveforms, and the estimation of ICP using ICP-related ABP by a series of system identification strategies offers a promising solution for noninvasive ICP assessment [5]. In this work, we conducted an analysis of performance evaluation for both L1-ARX and LS-ARX on ICP estimation with ABP when the data were contaminated with outliers. As shown in Table 11, the mean ICP error calculated with L1-ARX is significantly smaller than that calculated by LS-ARX. Consistent with the ICP errors, the mean ICP correlation coefficients calculated with L1-ARX are significantly higher than those calculated with LS-ARX. The coefficients modeling the relationship between ICP and ABP play a crucial role in the system output prediction. The better results for both ICP error and the correlation coefficients further indicate the robustness of L1-ARX in the application of system identification when the data are contaminated with outliers.
The conducted experiments of both real EEG data and ICP estimation with ABP illustrate the superiority of L1-ARX relative to LS-ARX in system identification when both the system input and output are contaminated with artifacts. In other words, the performance efficiency of the Lp-ARX (p ≤ 1) method is better than the conventional LS method in data fitting processing, especially when the data are affected by outliers. Moreover, we think the Lp-ARX model may also be a potential method for biomedical signal processing in other applications, such as magnetoencephalography (MEG). We will conduct more investigations to probe the usefulness of our proposed Lp-ARX in our future work.

Conclusions
In this work, we utilize the merits of the Lp (p ≤ 1) norm for the suppression of outliers to build a new ARX model. To verify the robustness of the proposed model, we conducted the simulation experiments and adopted it in the actual datasets. Moreover, to further verify the proposed model, we conducted the compared experiments with the conventional methods. The simulation experimental results demonstrated that the proposed method could estimate ARX parameters more robustly than conventional methods under various outlier conditions, whether for the outlier occurrence rate or the outlier strength. In the actual application, the resting-state EEG with ocular artifacts demonstrated that the proposed method could predict missing data with less influence from the artifacts, and the results on ICP and ABP data further verified its efficiency for model fitting and system identification. The proposed Lp-ARX may hold great potential in the processing of biomedical signals. Furthermore, we adopted the Lp (p ≤ 1) norm in the widely used linear dynamic model, and the non-linear model has not been explored in this work. Considering the underlying non-linear relationship in time series, in future work, we will extend the proposed Lp (p ≤ 1) method to the non-linear ARX models, which is capable of capturing the more complicated couplings among time series.