Proper ARMA Modeling and Forecasting in the Generalized Segre’s Quaternions Domain

: The analysis of time series in 4D commutative hypercomplex algebras is introduced. Firstly, generalized Segre’s quaternion (GSQ) random variables and signals are studied. Then, two concepts of properness are suggested and statistical tests to check if a GSQ random vector is proper or not are proposed. Further, a method to determine in which speciﬁc hypercomplex algebra is most likely to achieve, if possible, the properness properties is given. Next, both the linear estimation and prediction problems are studied in the GSQ domain. Finally, ARMA modeling and forecasting for proper GSQ time series are tackled. Experimental results show the superiority of the proposed approach over its counterpart in the Hamilton quaternion domain.


Introduction
Time series theory is a very well-known statistical methodology for the analysis of stochastic processes that has found widespread applications in signal processing, communications engineering, economics, astronomy, meteorology, seismology, oceanography, among other fields [1]. The most extensively used class of stationary process models are the so-called autoregressive moving average (ARMA) processes. The relevance of ARMA models stems from their power to approximate any stationary process arbitrarily well [2,3]. In addition, ARMA models can be cast in a state-space framework and also have enormous significance in prediction problems, since they facilitate the derivation of several efficient prediction algorithms [3].
The traditional settings for developing time series analysis have been the real and complex numbers, while other number systems have demonstrated a high potential to outperform them in some applications. The most paradigmatic example is the system of Hamilton quaternions (HQs), which has proven to be of great value for the description of 3-dimensional (3D) movements, in fields such as graphic design or automated control [4,5]. Actually, HQs have become the most popular and researched system of hypercomplex numbers [6]. In spite of its relevance, the use of this system in time series analysis is scarce [7].
HQs belong to a more general family of algebras, named hypercomplex systems, which extends the real and complex numbers to higher dimensions. Among them, 4D hypercomplex systems are of special relevance since four coordinates are usually considered for describing the physical world [6]. Further, depending on how the product is defined, they can be classified into two types: noncommutative and commutative systems. HQs are of the first type, i.e., they form a normed division algebra, and this property makes them well suited to deal with signal processing benchmark problems, such as the signal estimation problem [8]. Though commutative hypercomplex numbers, which includes the generalized Segre's quaternions (GSQs), are less well known, their interest is increasing within the signal processing community due to their availability to extend the methodologies developed in the complex and real field to the 4D case [9][10][11][12][13][14][15][16][17][18][19][20][21]. These commutative hypercomplex algebras possess zero divisors, which implies that the Euclidean norm is not multiplicative [13]. This drawback, however, has essentially no impact on signal and image processing applications [10].
Interestingly, a recent line of research has discovered that the HQs algebra is not always the best option for processing 4D hypercomplex signals [22][23][24][25]. Specifically, some investigations have revealed that properness, a structural property of the hypercomplex signals, determines the performance of the used processing [24,25]. In all of these papers, the analyses have been performed by comparing HQs with tessarines, and several scenarios where the latter outperforms the former have been shown. It should be noted that tessarines are a subset of GSQs obtained by defining a particular product. In general, hypercomplex algebras are characterized by its product, which is usually provided by a multiplication table (Appendix A in [6]). From an intuitive point of view, it is to be expected that more general commutative hypercomplex algebras than tessarines can attain better performance results. On that basis, special attention is paid to such systems in this paper.
As indicated above, properness is a crucial property of hypercomplex signals. This property has two important effects: it determines which type of processing should be used, and also it achieves a significant decrease in the computational complexity involved in comparison to the real processing. Properness has been investigated in detail in the HQ domain [26][27][28][29][30], and for tessarines [24,25]. To our knowledge, however, properness has not been studied for GSQs yet.
Inspired by all the above discussed, the main objective of this paper is to present the time series analysis in the GSQs domain while analyzing the properness properties in this setting. Specifically, the concept of properness is introduced for GSQ signals and their properties studied. Remarkably, two tests to experimentally check whether real data are proper in the GSQs field are proposed and an estimate of the parameter characterizing the product in the hypercomplex algebra is provided. Such an estimate gives the best value of the parameter under which the properness properties could be fulfilled by the signal of interest. It should be highlighted that, unlike the existing approaches, a specific hypercomplex algebra is not fixed in advance in the proposed strategy, but a broad family of commutative hypercomplex algebras is considered. So, the suggested estimate allows us to select the particular algebra in which it is most likely to achieve, if possible, the properness properties. A simulation example illustrates how this approach is effective.
The linear estimation problem is also tackled. Although GSQ numbers are not a normed algebra and thus, the classical projection theorem cannot be applied to obtain the optimum linear estimate, we exploit its metric space feature to prove the existence and uniqueness of the projection and an extension of the innovations algorithm for proper GSQ signals is devised. As could be expected, the proposed algorithms attain a notable reduction in the computational burden. Finally, ARMA modeling for proper GSQ time series is introduced. For that, we first investigate an extension of the Yule-Walker equations and then, the prediction problem is addressed. The particular structure of the ARMA models allows us to improve the innovations algorithm giving rise to a more efficient prediction method. The steady-state properties of the suggested predictor are also studied. To conclude, the superiority in terms of performance of our proposal in comparison with its counterpart in the HQ domain is numerically analyzed.
The rest of this paper is organized as follows. Section 2 presents both the GSQ random variables and signals. Then, Section 3 introduces the concepts of properness in the GSQs domain and provides the statistical inference methods for testing the properness as well as estimating the associated parameter that characterizes the product in the hypercomplex algebra. This section ends with a simulation example. The linear estimation problem for GSQ random variables and the prediction problem for GSQ random signals are analyzed in Section 4. Section 5 focuses on the analysis of time series in the GSQs setting. It deals with the problems of modeling and forecasting time series data. A final example is included to show the efficiency of the proposed strategy. To improve the readability of the manuscript, all proofs have been deferred to Appendices B-F.

Notation
We designate matrices with boldface uppercase letters, column vectors with lowercase letters, and scalar quantities with lightface lowercase letters. Moreover, the symbols Z, R, C, and S represent the set of integer, real, complex, and GSQ numbers, respectively. In this regard, A ∈ R p×q (respectively, A ∈ S p×q ) indicates that A is a real (respectively, a GSQ) matrix of dimension p × q. Similarly r ∈ R p (respectively, r ∈ S p ) means that r is a p-dimensional real (respectively, a GSQ) vector. In particular, 0 p×q represents the zero matrix of dimension p × q , and I p is the identity matrix of dimension p. In addition, A ij denotes the ij element of the matrix A and diag(·) stands for a diagonal (or block diagonal) matrix with entries (or block entries) specified in the operator argument. Furthermore, superscript "T" represents the transpose. "⊗"stands for the Kronecker product. E[·] represents the expectation operator. R{·} denotes the real part of a GSQ, and |z| is either the complex modulus if z ∈ C or the absolute value if z ∈ R. Finally, δ nl denotes the Kronecker delta function.
Unless otherwise stated, all random variables are assumed to have a zero-mean throughout this paper.
A list of symbols that are used in the paper is given in Table 1. Symbol Meaning x r (t) Real vector formed with the components of x ∈ S p x(t) Augmented GSQ signal vector formed by x ∈ S p and its conjugations Γ a (t, s) Correlation function of a real random signal vector a(t) ∈ R p Γ ab (t, s) Correlation function of the real random signal vectors a(t) ∈ R p and b(t) ∈ R q Γ ν , for x ∈ S p , y ∈ S q , and ν = 1, 2 , for x ∈ S p and y ∈ S q Γ x Sample autocorrelation matrix of a random vector x C α Set of the random vectors {x 1 , . . . , x m }, with x i ∈ S p , i = 1, . . . , m C α Set of the augmented random vectors {x 1 , . . . ,x m }, withx i ∈ S 4p , i = 1, . . . , m G C α Set of all finite linear combinations of elements of C α GC α Set of all finite linear combinations of elements ofC α y SWL SWL estimator of y respect to C α y S α S α estimator of y respect to C α y QSL QSL estimator of y respect to C α SWL SWL estimation error associated toŷ SWL S α S α estimation error associated toŷ S α QSL QSL estimation error associated toŷ QSL

Generalized Segre's Quaternion Random Variables
This section establishes the basic concepts and properties of interest in the GSQ domain. For that, we first introduce the GSQ random variables and then the GSQ random signals are considered.
A random variable x ∈ S can be defined as x = a + ib + jc + kd where a, b, c, d ∈ R are random variables and the imaginary units (i, j, k) satisfy the following multiplication rules: where α is a nonzero real number. As is well-known, the form in which the product is defined is an essential feature of hypercomplex numbers. The product defined in (1) is commutative, associative, and distributive over addition; however, these algebras have divisors of zero. In contrast to GSQs, the product in the HQs is defined in such a way that the algebra becomes noncommutative and does not have divisors of zero [6]. Further, two special algebras in GSQs are of interest: for α > 0, S is called the hyperbolic quaternions system and for α < 0, S is the elliptic quaternions system [6]. In particular, if α = −1 then, S is named the tessarine numbers.
The real vector of x ∈ S is defined by Moreover, we define the following auxiliary GSQ random variables: The triplet (x i , x j , x k ) are the so-called principal conjugations and obey the following properties.
Consider the vector random variables x ∈ S p and y ∈ S q . We define the following functions: where y = [y 1 , . . . , y q ] T , y j = [y j 1 , . . . , y j q ] T , and y α = [y 1α , . . . , y qα ] T ; however, for reasons that will become clear, they are denoted by Γ 1 3. Now, we extend some of the above concepts to the case of random signals. Consider a random signal vector , and d i (t) real random signals. The real vectors associated to x(t) is denoted by In a similar way to (2), we define the following functions for the random signal vectors x(t) ∈ S p and y(t) ∈ S q : It is easy to check, for α = 0, that We define the augmented vector of x(t) as The following relationship between the augmented vector and the real vector x r (t) given in (3) can be established Moreover, from Property 1, the next result can be stated. Table A1 (see Appendix A) in function of the correlation functions (denoted by Γ) of the components of the real vector (3).

Properness Conditions
This section is devoted to analyze the concepts of properness for GSQ random signals. After introducing the notion of properness, a characterization of this property based on the correlations of the real vectors is presented.

Definition 1.
A random signal vector x(t) ∈ S p is said to be H α 0 -proper if, and only if, there exists a value α 0 > 0 such that the functions Γ 1 xx η (α 0 , t, s), η = i, j, k, vanish ∀t, s ∈ Z. In a similar manner, two random signal vectors x(t) ∈ S p and y(t) ∈ S q are cross H α 0 -proper if, and only if, there exists a value α 0 > 0 such that the functions Γ 1 xy η (α 0 , t, s), η = i, j, k, vanish ∀t, s ∈ Z. Finally, x(t) and y(t) are jointly H α 0 -proper if, and only if, are H α 0 -proper and cross A similar definition about E α 0 -properness can be given by relplacing α 0 > 0 by α 0 < 0, From Table A1, we have the following characterization result. Proposition 1. Let x(t) ∈ S p be a random signal vector. Then 1.
x(t) is H α 0 -proper if, and only if, the correlation of their real vectors verifies x(t) is E α 0 -proper if, and only if, the correlation of their real vectors verifies From an experimental standpoint, it becomes necessary to develop a method for checking whether a random vector x ∈ S p is proper and devise an estimate of the associated α 0 . Specifically, based on the information supplied by a random sample, we aim to test if x is H α 0 -proper (or E α 0 -proper) and we wish to estimate the suitable value of α 0 under which such properness properties could be satisfied by x, if possible. To this end, we consider the following statistical hypotheses test: The following result derives two generalized likelihood ratio tests (GLRT), Wilks' statistics, for the problem (8). These statistics are denoted by φ ν (x 1 , . . . , x n ). The value assigned to the letter "ν" as subscript, superscript or in any other role will indicate when H α 0 -properness (ν = 1) or E α 0 -properness (ν = 2) should be assumed. This convention is applied henceforth. Notice that, these tests generalize the one given in ( [24], Theorem 1) to the GSQs domain. The main difference between them lies in the choice of the appropriate α 0 to achieve properness conditions. Theorem 1. Given n independent and identically distributed random samples x 1 , . . . , x n of a random vector x ∈ S p such that x r follows a Gaussian distribution then, the GLRT statistics for (8) are given by where det(·) α 0 denotes the determinant under the product associated to α 0 , The parameter α 0 can be estimated as with c 1 = −2np ln(2π) + np ln(16|α|). Further, assuming that H 0 is true, the distributions of the statistics φ 1 (x 1 , . . . , x n ) and φ 2 (x 1 , . . . , x n ) tend to a chi-squared distribution with degrees of freedom equal to 6p 2 − 1 and 2p(3p + 1) − 1, respectively, as the sample size tends to infinity.

Example
Consider a random vector x = [x 1 , x 2 ] T ∈ S 2 such that the real vectors of x 1 and x 2 given by (3) verify and The following two cases are studied: 1. Case 1: λ = −0.7 and µ = 0.8 in an hyperbolic quaternions algebra with α 0 = 2. From Proposition 1, we have that the vector x is H 2 -proper under these conditions. Moreover, from Theorem 1, the GLRT statistic φ 1 (x 1 , . . . , x n ) converges to a χ 2 (23).
On the one hand, the behavior of the cumulative distribution functions (CDFs) of φ ν (x 1 , . . . , x n ), ν = 1, 2, is analyzed in both cases in terms of the sample size. For that, and through Montecarlo simulations, 2000 values of the statistics φ ν (x 1 , . . . , x n ), ν = 1, 2, have been generated for n = 50, 100, 300, 500. The corresponding empirical CDFs are displayed in Figure 1. At a glance, this figure shows the rapid convergence of the empirical CDFs to the target chi-squared distributions.
On the other hand, we assess the accuracy of the estimations of α 0 depending on the sample size n. The maximization process in (9) can be achieved using the optimization toolbox in Matlab (The Nelder-Mead search algorithm is used). Tables 2 and 3 depict the 95% confidence intervals for α 0 obtained from 2000 simulated runs. As can be expected, the amplitudes of the confidence intervals become narrower as the sample size grows.

Linear Estimation and Prediction
The objective of this section is to solve the linear estimation problem for proper GSQ random variables. Since it is not possible to define a norm in the set of GSQs then, we cannot apply the projection theorem to obtain the optimum linear estimator. So, we first need to define a suitable metric and thus, we prove the existence and the uniqueness of the projection.
Given the random variables x, y ∈ S, we define the product where Γ 3 xy (α) is the scalar version of Γ 3 xy (α) given in (2). For any random variables x, y, u, v ∈ S and deterministic coefficients λ, β ∈ S, γ, δ ∈ R, it follows that 1.
Thus, we can define a distance as where x 2 α = R{Γ 3 x (α)}. Consider the set of random vectors C α = {x 1 , . . . , x m }, with x i ∈ S p , i = 1, . . . , m, and denote by G C α the linear span of C α , i.e., the set of all finite linear combinations of elements of C α . The issue of existence and uniqueness of the projection of a given random variable y ∈ S onto the set G C α is now briefly investigated. This projection is denoted byŷ α .
In a metric space, neither the existence nor the uniqueness ofŷ α is assured; however, following a reasoning similar to [24], the existence and uniqueness ofŷ α can be proved. Further,ŷ α is the projection of y onto the set G C α if, and only if, y −ŷ α , x α = 0, ∀x ∈ G C α . As a consequence, if {ϑ 1 , . . . , ϑ r } is a basis of G C α then, where the deterministic coefficients h i ∈ S are obtained from the system Now, we are prepared to introduce the notion of linear estimator in the GSQ domain.

Definition 2.
The projection of y onto the set G C α ,ŷ α , is called the S α estimator of y respect to C α and, from now on, it is denoted byŷ S α . If the information set is formed by augmented vectors defined in (5), i.e., if we considerC α = {x 1 , . . . ,x m } withx i ∈ S 4p , i = 1, . . . , m, then, the projection of y onto the set GC α is called SWL estimator of y respect to C α and it is denoted byŷ SWL α .
These concepts are easily extended to the vectorial case. For example, the S α estimator of the random vector y ∈ S q respect to C α isŷ S α = [ŷ S 1α , . . . ,ŷ S qα ], whereŷ S jα is the projection of the component jth of y onto the set G C α . Theorem 2. Consider a random variable y ∈ S and the set of random vectors C α = {x 1 , . . . , x m }, with x i ∈ S p , i = 1, . . . , m. Then, 1.
The SWL estimator of y respect to C α is obtained aŝ where the deterministic vectors h ji (α) ∈ S p are computed through the equation The S α estimator of y respect to C α is calculated aŝ where the deterministic vectors g i (α) ∈ S p are computed through the equation If x 1 , . . . , x m are jointly H α 0 -proper (respectively, E α 0 -proper) and y is cross H α 0 -proper (respectively, E α 0 -proper) with each element of {x 1 , . . . , x m } then,ŷ SWL =ŷ S α 0 .
Remark 2. Theorem 2 shows that whatever hypercomplex algebra you choose provides the same SWL estimator. Likewise, the computational burden of the SWL estimator can be notably reduced whenever hyperbolic or elliptic conditions of properness are fulfilled (compare (13) with (14)). Effectively, while the computational complexity of SWL estimators is of order O(64p 3 m 3 ), this is of order O(p 3 m 3 ) under hyperbolic or elliptic properness. More importantly, this reduction in computational complexity cannot be achieved in the real numbers domain.
Now, we are concerned with the prediction problem for random signals. Consider a random signal vector x(t) ∈ S p . Our aim is to obtain the S α one-stage predictor of x(t + 1) from the information contained in the set C t α = {x(1), . . . , x(t)}, denoted byx S α (t + 1|t). Analogously, the SWL one-stage predictor is denoted byx SWL (t + 1|t).
Note that a similar result to Theorem 2 can be stated for random signals, but we do not include it here for the sake of brevity. It should be remarked, however, that a computational problem remains still unsolved. Although the estimation problem is indeed simplified due to the reduction in dimension implied by the properness conditions, we have to cope with the inverse in (14) as the number of observations t grows. The following result overcomes this problem for proper signals.

Proper ARMA Models
This section constitutes the core of the paper. Starting from the definition of proper ARMA models in the GSQs domain we then solve the one-stage prediction problem.

ARMA Modeling
Next, we adapt the proper ARMA signal concept to the GSQs domain and extend the Yule-Walker equations to this kind of signals. To do so, we first extend the concept of stationarity to the GSQs domain in the proper signal scenario.

Definition 4.
A random signal vector x(t) ∈ S p is said to be a H α 0 -proper ARMA signal of orders p and q (ARMA(p, q)) if it is H α 0 -proper, stationary, and can be modeled by the following system: where G(0) = I p , F(i), G(i) ∈ S p×p are deterministic matrices and v(t) is a H α 0 -proper white noise verifying Γ 1 v (α 0 , t, s) = Ωδ ts (21) with Ω ∈ S p×p a deterministic matrix. The concept of a E α 0 -proper ARMA(p, q) signal is defined analogously by replacing H α 0 -proper by E α 0 -proper, and Γ 1 v by Γ 2 v in (21).
The following result extends the Yule-Walker equations to the case of proper ARMA signals.

Example
By way of illustration, we consider a proper ARMA(1,1) model. With this model, we first carry out a performance comparative analysis between the S α 0 one-stage predictor (25), and its counterpart in the HQ domain,x QSL α 0 (t + 1|t). Our aim is to show the superiority in terms of performance of the proposed approach respect to the usual HQ methodology under different initial conditions. Finally, a steady-state performance analysis is carried out.
Note that the conditions of Proposition 2 and Corollary 2 are fulfilled and hence, x S α 0 (t + 1|t) can be obtained through the recursions (25) and (26) and the steady-state results in (27) hold.
The errors associated tox S α 0 (t + 1|t) andx QSL α 0 (t + 1|t) depend on the value α 0 and thus, they are denoted by SWL α 0 (t + 1|t) and QSL α 0 (t + 1|t), respectively. To assess the performance of both approaches, the following measure is computed: Figure 2 depicts the measure DP α 0 (t) in (29), for t = 1, . . . , 100, in the above two scenarios. As can be seen, the S α 0 one-stage predictor outperforms the HQ one-stage predictor in both scenarios. Furthermore, the measure becomes greater as |α 0 | grows, i.e., as the degree of HQ improperness grows; being DP α 0 (t) slightly greater for the E α 0 -proper case.  Table 4. Further, Figure 3 shows as SWL α 0 (t + 1|t) converges toward their respective steady-state errors when t grows. These plots indicate that the convergence is rapid.

Discussion
Recent investigations have shown that HQs are not always the best choice for processing 4D hypercomplex signals. These results suggest distinct strategies depending on whether properness conditions are complied with. As the main characteristic, they offer a significant computational saving in relation to the real-valued processing. Unlike earlier research, the approach described in this paper has the additional advantage of the flexibility in the choice of the hypercomplex algebra. In our opinion, this research topic opens a broad range of possibilities for studying new applications in signal processing. This paper precisely proposes the time series analysis as a possible field of application of the proper GSQ signal processing. Other potential areas include dynamic linear models, filtering and smoothing, signal detection, etc.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study: in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. where x 2