A Generic Approach to Covariance Function Estimation Using ARMA-Models

Covariance function modeling is an essential part of stochastic methodology. Many processes in geodetic applications have rather complex, often oscillating covariance functions, for which it is difficult to find corresponding analytical functions for modeling. This paper lays the methodological foundations for advanced covariance modeling and elaborates a set of generic base functions which can be used for flexible covariance modeling. In particular, we provide a straightforward procedure and guidelines for a generic approach to fitting oscillating covariance functions to an empirical sequence of covariances. The underlying methodology is developed from the well known properties of autoregressive processes in time series analysis. A surprising property of the proposed covariance model is its simplicity: it corresponds to a finite sum of covariance functions of second-order Gauss-Markov (SOGM) processes. Furthermore, a great benefit is that the method is automated to a large extent and directly yields the appropriate model; a manual decision for a set of components is not required. Notably, the numerical method can be easily extended to ARMA-processes, which results in the same linear system of equations. Although the underlying mathematical methodology is quite complex, the results can be obtained from a simple and straightforward numerical method.


Introduction and Motivation
Covariance functions are an important tool for many stochastic methods in various scientific fields. For instance, in the geodetic community, stochastic prediction or filtering is typically based on collocation or least squares collocation theory. It is closely related to the Wiener-Kolmogorov principle [1,2] as well as to the Best Linear Unbiased Predictor (BLUP) and kriging methods (e.g., [3][4][5]). Collocation is a general theory, which even allows for a change of functional within the prediction or filtering step, simply by propagating the covariance to the changed functional (e.g., [6] (p. 66); [4] (pp. 171-173); [7]). In this context, signal covariance modeling is the crucial part that directly controls the quality of the stochastic prediction or filtering (e.g., [8] (Ch. 5)).
In common practice, covariance modelers tend to use simple analytical models (e.g., exponential and Gauss-type) that can be easily adjusted to the first, i.e., short-range, empirically derived covariances. Even then, as these functions are nonlinear, appropriate fitting methods are not straightforward. On the other hand, for more complex covariance models, visual diagnostics such as half-value width, correlation length, or curvature are difficult to obtain or even undefined for certain (e.g., oscillatory) covariance models. In addition, fitting procedures for any of these are sparse in the literature and lack automation (cf. [17,22]).
In geodesy, autoregressive processes are commonly used in the filter approach as a stochastic model within a least squares adjustment. Decorrelation procedures by digital filters derived from the parametrization of stochastic processes are widely used as they are very flexible and efficient for equidistant data (cf. e.g., [36][37][38][39][40][41][42]). Especially for highly correlated data, e.g., observations along a satellite's orbit, advanced stochastic models can be described by stochastic processes. This is shown in the context of gravity field determination from observations of the GOCE mission with the time-wise approach, where flexible stochastic models are iteratively estimated from residuals [43][44][45].
The fact that an AR-process also defines an analytical covariance sequence (e.g., [23]) is not well established in geostatistics and geodetic covariance modeling. To counteract this, we relate the covariance function associated with these stochastic processes to the frequently used family of covariance functions of second-order Gauss-Markov (SOGM) processes. Expressions for the covariance functions and fitting methods are aligned with the mathematical equations of these stochastic processes. Especially in collocation theory, a continuous covariance function is necessary to obtain covariances for the arbitrary lags and functionals required for the prediction of multivariate data. However, crucially, the covariance function associated with an AR-process is in fact defined as a discrete sequence (e.g., [26]).
Whereas standard procedures, which manually assess decaying exponential functions or oscillating behavior by visual inspection, can miss relevant components, for instance high-frequency oscillations in the signal, the proposed method is automated and easily expandable to higher order models in order to fit components which are not obvious at first glance. A thorough modeling with a more complex covariance model not only results in a better fit of the empirical covariances but also yields beneficial knowledge of the signal process itself.
Within this contribution, we propose an alternative method for the modeling of covariance functions based on autoregressive stochastic processes. It is shown that the derived covariance can be evaluated continuously and corresponds to a sum of well known SOGM-models. To derive the proposed modeling procedure and guidelines, the paper is organized as follows. After a general overview of the related work and the required context of stochastic modeling and collocation theory in Section 2, Section 3 summarizes the widely used SOGM-process for covariance function modeling. Basic characteristics are discussed to create the analogy to the stochastic processes, which is the key to the proposed method. Section 4 introduces the stochastic processes with a special focus on the AR-process and presents the important characteristics, especially the underlying covariance sequence in different representations. The discrete sequence of covariances is continuously interpolated by re-interpretation of the covariance sequence as a difference/differential equation, which has a continuous solution. Based on these findings, the proposed representation can be easily extended to more general ARMA-processes, as discussed in Section 5. Whereas the previous sections are based on the consistent definition of the covariance sequences of the processes, Section 6 shows how the derived equations and relations can be used to model covariance functions from empirically estimated covariances in a flexible and numerically simple way. In Section 7, the proposed method is applied to a one-dimensional time series which often serves as an example application for covariance modeling. The numerical example highlights the flexibility and the advantage of the generic procedure. This is followed by concluding remarks in Section 8.

Least Squares Collocation
In stochastic theory, a measurement model L is commonly seen as a stochastic process which consists of a deterministic trend Aξ, a random signal S, and a white noise component N (cf. e.g., [3] (Ch. 2); [5]). Whereas the signal part S is characterized by a covariance stationary stochastic process, the noise N is usually assumed to be a simple white noise process with uncorrelated components. Autocovariance functions γ(|τ|) or discrete sequences {γ |j| } ∆t are models to describe the stochastic behavior of the random variables, i.e., generally γ S (|τ|) and γ N (|τ|), and are required to be positive semi-definite ([28] (Proposition 1.5.1, p. 26)). In case covariances are given as a discrete sequence {γ |j| } ∆t , they are defined at discrete lags h = j ∆t with sampling interval ∆t and j ∈ N 0 . In general, autocovariance functions or sequences are functions of a non-negative distance, here τ or h for the continuous and discrete case, respectively. Thus, covariance functions are often denoted by the absolute distance |τ| and |h|. Here, we introduce the conditions τ ≥ 0 and h ≥ 0 in order to omit the vertical bars. The term Least Squares Collocation, introduced in geodesy by [3,6,46], represents the separability problem within the remove-restore technique, where an estimated deterministic trend component A x is subtracted from the measurements ℓ and the remainder ∆ℓ = ℓ − A x is interpreted as a special realization of the stochastic process ∆L. In the trend estimation step, the optimal parameter vector x is computed from the measurements as the best linear unbiased estimator (BLUE) of the true trend Aξ. The collocation step follows as the best linear unbiased predictor (BLUP) of the stochastic signal s at arbitrary points or as a filter process at the measured points ([3] (Ch. 2); [5] (Ch. 3)). The variance/covariance matrices Σ SS reflect the stochastic behavior of the random signal S.
The coefficients are derived by evaluation of the covariance function γ S (τ), where τ represents the lags or distances between the corresponding measurement points. Σ NN denotes the variances/covariances of the random noise, which is often modeled as an independent and identically distributed random process, γ N (τ) = δ 0,τ σ 2 N , with δ i,j being the Kronecker delta. σ 2 N denotes the variance of the noise such that the covariance matrix reads Σ NN = σ 2 N I with identity matrix I. Σ S S is filled row-wise with the covariances of the signal between the prediction points and the measured points.
The true covariance functions γ S (τ) and γ N (τ) are unknown and often have to be estimated, i.e., g S (τ) and g N (τ), directly from the trend reduced measurements ∆ℓ by an estimation procedure using the empirical covariance sequences { g ∆L j } ∆t . The estimated noise variance s 2 N can be derived from the empirical covariance at lag zero. Thus, it is allowed to split g ∆L 0 up into the signal variance g S 0 given by the covariance function g S (0) = g S 0 and a white noise component s 2 N , known as the nugget effect (e.g., [9] (p. 59)). In theory, s 2 N can be manually chosen such that the function plausibly decreases from g S 0 towards g S 1 and the higher lags. A more elegant way is to estimate the analytical function g S (τ) from the empirical covariances g S j with j > 0 only. Naturally, all estimated functions g S (τ) must result in g ∆L 0 − g S (0) ≥ 0. Covariance function modeling is a task in various fields of application. Covariance functions are used, for example, to represent a stochastic model of the observations within parameter estimation in, e.g., laserscanning [18], GPS [47,48], or gravity field modeling [49][50][51][52][53][54]. The collocation approach is closely related to Gaussian Process Regression from the machine learning domain [55]. The family of covariance functions presented here can naturally be used as kernel functions in such approaches.
Within these kinds of applications, the covariance functions are typically fitted to empirically derived covariances which follow from post-fit residuals. Within an iterative procedure, the stochastic model can be refined. Furthermore, they are used to characterize the signal characteristics, again e.g., in gravity field estimation or geoid determination in the context of least squares collocation [7] or in atmospheric sciences [11,14].
Reference [8] (Ch. 3) proposed to look at only three parameters, the variance, correlation length, and curvature at the origin, to fit a covariance function to the empirical covariance sequence. Reference [17] also suggested taking the sequence of zero crossings into account to find an appropriate set of base functions. In addition, most approaches proceed with fixing appropriate base functions for the covariance model as a first step. This step corresponds to the determination of the type or family of the covariance function. The fitting is then restricted to this model and does not generalize well to other and more complex cases. Furthermore, it is common to manually fix certain parameters and optimize only a subset of the parameters in an adjustment procedure; see e.g., [18].
Once the covariance model is fixed, various optimization procedures exist to derive the model parameters which result in a best fit of the model to a set of empirical covariances. Thus, another aspect of covariance function estimation is the numerical implementation of the estimation procedure. Visual strategies versus least squares versus Maximum Likelihood, point clouds versus representative empirical covariance sequences, and non-robust versus robust estimators are various implementation possibilities discussed in the literature (see e.g., [11,14,18,47,51,56]). To summarize, the general challenges of covariance function fitting are to find an appropriate set of linearly independent base functions, i.e., the type of covariance function, to handle the nonlinear nature of the chosen base functions, and to address the common problems of outlier detection and of finding good initial values for the estimation process.
In particular, geodetic data often exhibit negative correlations or even oscillatory behavior in the covariance functions, which leaves a rather limited set of covariance function types, e.g., cosine and cardinal sine functions in the one-dimensional case or Bessel functions in the two-dimensional case (e.g., [27]). One general class of covariance functions with oscillatory behavior is discussed in the next section.

The Covariance Function of the SOGM-Process
A widely used covariance function is based on the second-order Gauss-Markov process as given in [23] (Equation (5.2.36)) and [25] (Ch. 4.11, p. 185). The process defines a covariance function of the form γ(τ) = σ 2 exp(−ζ ω 0 τ) cos(√(1 − ζ 2 ) ω 0 τ − η) / cos(η). Its shape is defined by three parameters. ω 0 represents a frequency and ζ is related to the attenuation. The phase η can be restricted to the domain |η| < π/2 for logical reasons.
A reparametrization with c := ζ ω 0 and a := √(1 − ζ 2 ) ω 0 gives γ(τ) = σ 2 exp(−c τ) (cos(a τ) + w sin(a τ)), which highlights the shape of a sinusoid. We distinguish between the nominal frequency a and the natural frequency ω 0 = a/√(1 − ζ 2 ). The relative weight of the sine with respect to the cosine term amounts to w := tan(η). It is noted here that the SOGM-process is uniquely defined by three parameters. Here, we will use ω 0 , ζ and η as the defining parameters. Of course, the variance σ 2 is a parameter as well. However, as it is just a scale and remains independent of the other three, it is not relevant for the theoretical characteristics of the covariance function.
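For reference, the reparametrized function can be evaluated directly; a minimal sketch in Python (function name and argument layout are our own):

```python
import numpy as np

def sogm_cov(tau, sigma2, omega0, zeta, eta):
    """Covariance of the second-order Gauss-Markov (SOGM) process in the
    reparametrized form: c = zeta*omega0 (attenuation),
    a = sqrt(1 - zeta**2)*omega0 (nominal frequency),
    w = tan(eta) (weight of the sine term)."""
    c = zeta * omega0
    a = np.sqrt(1.0 - zeta**2) * omega0
    w = np.tan(eta)
    return sigma2 * np.exp(-c * tau) * (np.cos(a * tau) + w * np.sin(a * tau))
```

At τ = 0 the function returns the variance σ 2 independently of η, since cos(0) = 1 and sin(0) = 0.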
The described covariance function is referenced by various names, e.g., the second-order shaping filter [57], the general damped oscillation curve [27] (Equation (2.116)), and the underdamped second-order continuous-time bandpass filter [58] (p. 270). In fact, the SOGM represents the most general damped oscillating autocorrelation function built from exponential and trigonometric terms. For example, the function finds application in VLBI analysis [59,60].

Positive Definiteness of the SOGM-Process
At first glance, it is surprising that a damped sine term is allowed in the definition of the covariance function of the SOGM-process (cf. Equation (6)), as the sine is not positive semi-definite. However, it is shown here that it is in fact a valid covariance function, provided that some conditions on the parameters are fulfilled.
The evaluation concerning the positive semi-definiteness of the second-order Gauss-Markov process can be derived by analyzing the process's Fourier transform, as given in [25] (Ch. 4.11, p. 185), and checking that it is non-negative (cf. the Bochner theorem, [61]). The topic is discussed and summarized in [60]. With some natural requirements already enforced by 0 ≤ ζ ≤ 1 and ω 0 > 0, the condition for positive semi-definiteness (cf. e.g., [57] (Equation (A2))) is tan |η| ≤ ζ/√(1 − ζ 2 ), which can be expressed by the auxiliary variable α := arcsin(ζ) as |η| ≤ α.
In terms of the alternative parameters a and c, this condition translates to |w| ≤ c/a (cf. e.g., [27] (Equation (2.117))). As a result, non-positive definite functions such as the sine term are allowed in the covariance function only if their relative contribution compared to the corresponding cosine term is small enough.
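This admissibility check is easy to automate; a small sketch (hypothetical helper, consistent with the condition above):

```python
import numpy as np

def sogm_is_valid(omega0, zeta, eta):
    """Check positive semi-definiteness of the SOGM covariance function:
    omega0 > 0, 0 <= zeta <= 1 and |eta| <= arcsin(zeta)."""
    return omega0 > 0.0 and 0.0 <= zeta <= 1.0 and abs(eta) <= np.arcsin(zeta)
```

The equivalence with |w| ≤ c/a follows from tan(arcsin(ζ)) = ζ/√(1 − ζ 2 ) = c/a.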

Definition of the Process
A more general and more flexible stochastic process is given by the autoregressive (AR) process. An AR-process is a time series model which relates the signal time series values, or more specifically the signal sequence S i , with autoregressive coefficients α k as S i = α 1 S i−1 + α 2 S i−2 + . . . + α p S i−p + E i (e.g., [26]). With the transition to ᾱ 0 := 1, ᾱ 1 := −α 1 , ᾱ 2 := −α 2 , etc., the decorrelation relation to the white noise E i is given by ᾱ 0 S i + ᾱ 1 S i−1 + . . . + ᾱ p S i−p = E i . The characteristic polynomial in the factorized form (cf. [26] (Ch. 3.5.4)) has the roots p k ∈ C, i.e., the poles of the autoregressive process, which can be complex numbers. This defines a unique transition between coefficients and poles. In common practice, the poles of an AR(p)-process only appear as single real poles or as complex conjugate pairs. Following this, an exemplary process of order p = 4 can be composed of either two complex conjugate pairs, or one complex pair and two real poles, or four individual single real poles. An odd order gives at least one real pole. For general use, AR-processes are required to be stationary. This requires that the poles lie inside the unit circle, i.e., |p k | < 1 (cf. [26] (Ch. 3.5.4)). AR-processes define an underlying covariance function as well. We will provide it analytically for the AR(2)-process and will summarize a computation strategy for the higher order processes in the following.
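The coefficient-to-pole transition is a polynomial root-finding problem; a minimal sketch using numpy (function name is our own):

```python
import numpy as np

def ar_poles(alpha):
    """Poles of an AR(p)-process from its coefficients alpha = [a1, ..., ap]:
    roots of the characteristic polynomial z^p - a1 z^(p-1) - ... - ap.
    Stationarity requires all |p_k| < 1."""
    poly = np.concatenate(([1.0], -np.asarray(alpha, dtype=float)))
    return np.roots(poly)
```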

The Covariance Function of the AR(2)-Process
Although AR-processes are defined by discrete covariance sequences, the covariance function can be written as a closed analytic expression which, evaluated at the discrete lags, gives exactly the discrete covariances. For instance, the AR(2)-process has a covariance function which can be written in analytical form. The variance of AR-processes is mostly parameterized using the variance σ 2 E of the innovation sequence E i . In this paper, however, we use a representation with the autocorrelation function and a variance σ 2 as in [26] (Equation (3.5.36), p. 130) and [62] (Section 3.5, p. 504). In addition, the autoregressive parameters α 1 and α 2 can be converted to the parameters a and c via a = arccos(α 1 /(2 √ −α 2 )) and c = −ln( √ −α 2 ). Hence, the covariance function of the AR(2)-process can be written in a somewhat complicated expression in the variables a and c as γ(τ) = σ 2 √((cot(a) tanh(c)) 2 + 1) e −c τ cos(a τ − arctan(cot(a) tanh(c))) (12) using the phase η = arctan(cot(a) tanh(c)) or likewise with the weight of the sine term w = tanh(c) cot(a). Please note that, in contrast to the SOGM-process, the weight or phase in Equations (12) and (13) cannot be set independently, but depends on a and c. Thus, this model is defined by two parameters only. Therefore, the SOGM-process is the more general model. Caution must be used with respect to the so-called second-order autoregressive covariance model of [11], which is closely related but does not correspond to the standard discrete AR-process.
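As a sketch of Equation (12), the following hypothetical helper converts (α 1 , α 2 ) to (a, c) and evaluates the covariance for the complex-pole case (requires α 2 < 0 and |α 1 /(2 √ −α 2 )| < 1):

```python
import numpy as np

def ar2_cov(tau, sigma2, alpha1, alpha2):
    """Continuous AR(2) covariance (complex-pole case):
    gamma(tau) = sigma2 * sqrt(w**2 + 1) * exp(-c*tau) * cos(a*tau - eta)
    with a = arccos(alpha1/(2*sqrt(-alpha2))), c = -log(sqrt(-alpha2)),
    w = tanh(c)*cot(a) and phase eta = arctan(w)."""
    a = np.arccos(alpha1 / (2.0 * np.sqrt(-alpha2)))
    c = -np.log(np.sqrt(-alpha2))
    w = np.tanh(c) / np.tan(a)
    eta = np.arctan(w)
    return sigma2 * np.sqrt(w**2 + 1.0) * np.exp(-c * tau) * np.cos(a * tau - eta)
```

Evaluated at integer lags, the values reproduce the Yule-Walker recursion γ j = α 1 γ j−1 + α 2 γ j−2 for j ≥ 2, and γ(0) = σ 2 .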

AR(p)-Process
The covariance function of an AR(p)-process is given as a discrete series of covariances {γ j } ∆t defined at discrete lags h = j ∆t with distance ∆t. The Yule-Walker equations (e.g., [24] (Section 3.2); [30] (Equation (11.8))) directly relate the covariance sequence to the AR(p) coefficients. With (14a) being the 0th equation, the next p Yule-Walker equations (first to pth, i.e., (14b)) form the linear system mostly used for the estimation of the autoregressive coefficients. Note that this system qualifies for the use of the Levinson-Durbin algorithm because it is Toeplitz structured, cf. [30] (Ch. 11.3) and [63].
The linear system containing only the higher equations (14c) is called the modified Yule-Walker (MYW) equations, cf. [64]. This defines an overdetermined system which can be used for estimating the AR-process parameters where n lags are included.
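Both systems reduce to small linear solves. A sketch of the square Yule-Walker solve (the modified, overdetermined variant would instead apply np.linalg.lstsq to the higher-lag rows); names are our own:

```python
import numpy as np

def yule_walker(gamma, p):
    """Estimate AR(p) coefficients from covariances gamma[0..n] via the
    Yule-Walker equations gamma_j = sum_k alpha_k gamma_{j-k}, j = 1..p."""
    g = np.asarray(gamma, dtype=float)
    # Toeplitz matrix G[j, k] = gamma_|j-k|, j, k = 0..p-1
    G = g[np.abs(np.subtract.outer(np.arange(p), np.arange(p)))]
    return np.linalg.solve(G, g[1:p + 1])
```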
The recursive relation represents a pth order linear homogeneous difference equation, whose general solution with respect to a discrete and equidistant time lag h = |t − t'| = j ∆t is γ j = ∑ k A k p k^j (16), cf. [23] (Equation (5.2.44)) and [26] (Equation (3.5.44)). p k are the poles of the process (cf. Equation (11)) and A k some unknown coefficients. It has to be noted here that covariances of AR(p)-processes are generally only defined at discrete lags. However, it can be mathematically shown that the analytic function of Equation (13) exactly corresponds to the covariance function Equation (6) of the SOGM. In other words, the interpolation of the discrete covariances is done using the same sinusoidal functions as in Equation (6), such that the covariance function of the AR(p)-process can equally be written with respect to a continuous time lag as γ(τ) = ∑ k A k p k^τ (17). This is also a valid solution of Equation (15) in the sense that γ(h) = γ j holds. For one special case of poles, namely negative real poles, the function can be complex valued due to the continuous argument τ. Thus, the real part has to be taken for general use. Now, assuming A k and p k to be known, Equation (17) can be used to interpolate the covariance defined by an AR-process for any lag τ. Consequently, the covariance definition of an AR-process leads to an analytic covariance function which can be used to interpolate or approximate discrete covariances.

AR(2)-Model
We can investigate the computation for the processes of second order in detail. Exponentiating a complex number mathematically corresponds to p k^τ = e^(τ ln(p k )) = |p k |^τ (cos(arg(p k ) τ) + i sin(arg(p k ) τ)) (18). As complex poles always appear in conjugate pairs, it is plausible that for complex conjugate pairs p k = p l * the coefficients A k and A l are complex and also conjugate to each other, A k = A l *. Thus, A k p k^τ + A l p l^τ becomes A k p k^τ + A k * (p k *)^τ and the result will be real. From Equation (13), we can derive that the constants amount to A k,l = (σ 2 /2) (1 ± i tanh(c) cot(a)) = (σ 2 /2) (1 ± i w) for the AR(2)-process, and from Equation (18) we can see that c = −ln(|p k |) and a = |arg(p k )|, such that the covariance function can be written as γ(τ) = σ 2 √(w 2 + 1) |p k |^τ cos(|arg(p k )| τ − η) (19). It is evident now that the AR(2) covariance model can be expressed as an SOGM covariance function. Whilst the SOGM-process has three independent parameters, here damping, frequency, and phase of Equation (19) are determined by only two parameters |p k | and |arg(p k )|, based on e −c = |p k |, c = −ln(|p k |), a = |arg(p k )| and η = arctan(tanh(ln(|p k |)) cot(|arg(p k )|)). Thus, the SOGM-process is the more general model, whereas the AR(2)-process has a phase η or weight w that is not independent. From Equation (19), the phase η can be recovered from the A k by |η k | = |arg(A k )| (21) and the weight by |w| = |Im(A k )/Re(A k )|.

AR(1)-Model
Here, the AR(1)-model appears as a limit case. Exponentiating a positive real pole results in exponentially decaying behavior. Thus, for a single real positive pole, one directly gets the exponential Markov-type AR(1) covariance function, also known in the literature as first-order Gauss-Markov (FOGM), cf. [65] (p. 81). A negative real pole causes discrete covariances of alternating sign. In summary, the AR(1)-process gives the exponentially decaying covariance function γ(τ) = σ 2 p k^τ (22) for 0 < p k < 1, or the exponentially decaying oscillation with Nyquist frequency γ(τ) = σ 2 |p k |^τ cos(π τ) (23) for −1 < p k < 0, cf. [23] (p. 163).
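A compact sketch of both AR(1) cases (hypothetical helper; the negative-pole case is the real part of p^τ):

```python
import numpy as np

def ar1_cov(tau, sigma2, p):
    """AR(1)/FOGM covariance: pure exponential decay for 0 < p < 1; for
    -1 < p < 0 the real part of p**tau gives a decay oscillating at the
    Nyquist frequency, |p|**tau * cos(pi*tau)."""
    out = sigma2 * np.abs(p) ** tau
    return out * np.cos(np.pi * tau) if p < 0 else out
```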

Summary
From Equation (17), one can set up a linear system in the unknown coefficients A k , shown here exemplarily for an AR(4)-process. The solution of Equation (24) uniquely determines the constants A k using standard numerical solvers, assuming the poles to be known from the process coefficients, see Equation (11). Since Equation (17) is a finite sum over exponentiated poles, the covariance function of a general AR(p)-process is a sum of AR(2)-type terms in the shape of Equation (19) in the case of complex conjugate pairs, or of damping terms as given in Equations (22) and (23) in the case of real poles. The great advantage is that the choice of poles is automatically done by the estimation of the autoregressive process by the YW-Equations (14). Here, we also see that the AR(2)-process as well as both cases of the AR(1)-process can be modeled with Equation (17), such that the proposed approach automatically handles all cases.
Furthermore, we see that Equation (17) adds up covariance functions of the forms of Equation (19), Equation (22), or Equation (23) for each pole or pair of poles. Any recursive filter can be uniquely dissected into a cascade of second-order recursive filters, described as second-order sections (SOS) or biquadratic filters, cf. [58] (Ch. 11). Correspondingly, the p poles can be grouped into complex conjugate pairs or single real poles. Thus, the higher order model is achieved by concatenation of the single or paired poles into the set of p poles (vector p) and, correspondingly, by adding up one SOGM covariance function for each section. This is automatically done by Equation (17).
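The linear system in the A k and the resulting continuous covariance can be sketched as follows (names our own; for m equal to the number of poles this is the unique AR solution of Equation (24), larger m anticipates the least-squares variant discussed below):

```python
import numpy as np

def fit_weights(poles, gamma, m=None):
    """Solve gamma_j = sum_k A_k * p_k**j, j = 1..m, for the weights A_k.
    m = p gives the unique AR solution; m > p a least-squares fit."""
    p = np.asarray(poles, dtype=complex)
    m = len(p) if m is None else m
    j = np.arange(1, m + 1)
    V = p[np.newaxis, :] ** j[:, np.newaxis]   # m x p Vandermonde-type block
    A, *_ = np.linalg.lstsq(V, np.asarray(gamma, dtype=complex)[1:m + 1],
                            rcond=None)
    return A

def cov_function(tau, poles, A):
    """Continuous covariance: real part of sum_k A_k * p_k**tau."""
    p = np.asarray(poles, dtype=complex)
    return float(np.real(np.sum(np.asarray(A) * p ** tau)))
```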

Covariance Representation of ARMA-Processes
Thus far, we introduced fitting procedures for the estimation of autoregressive coefficients as well as a linear system of equations to simply parameterize the covariance function of AR(p)-processes. In this section, we demonstrate that ARMA-models can be handled with the same linear system and the fitting procedure thus generalizes to ARMA-processes.
For the upcoming part, it is crucial to understand that the exponentiation p k^τ of Equation (17) exactly corresponds to the exponentiation p k^τ = e^(s k τ) if the transition between the poles p k and s k is done by s k = ln(p k ) = ln(|p k |) + i arg(p k ) and p k = e^(s k ). To be exact, this denotes the transition of the poles from the z-domain to the Laplace-domain. This parametrization of the autoregressive poles can, for example, be found in [23] (Equation (5.2.46)), [66] (Equation (A.2)), and [26] (Equation (3.7.58)). In these references, the covariance function of the AR-process is given as a continuous function with respect to the poles s k , such that the use of Equation (17) as a continuous function is also justified.
In the literature, several parametrizations of the moving average part exist. Here, we analyze the implementation of [33] (Equation (2.15)), where the covariance function of an ARMA-process is given by γ(τ) = ∑ k [b(s k ) b(−s k ) / (a'(s k ) a(−s k ))] e^(s k τ) (26). Inserting p k^τ = e^(s k τ) and denoting A k := b(s k ) b(−s k ) / (a'(s k ) a(−s k )), which is independent of τ, we obtain γ(τ) = ∑ k A k p k^τ, which is suitable for the purpose of understanding the parametrization chosen in this paper. Now, the covariance corresponds to the representation of Equation (17). The equation is constructed as a finite sum of complex exponential functions weighted by a term consisting of the polynomial a( · ) and its derivative a'( · ) for the AR-part and b( · ) for the MA-part, evaluated at the positions of the positive and negative autoregressive roots s k . It is evident in Equation (26) that τ is only linked with the poles. This exponentiation of the poles builds the term which is responsible for the damped oscillating or solely damping behavior. The fraction builds the weighting of these oscillations in exactly the same way as the A k in Equation (17). In fact, Equation (17) can undergo a partial fraction decomposition and be represented as in Equation (26). The main conclusion is that ARMA-models can also be realized with Equation (17). The same implication is also gained from the parametrizations by [67] (Equation (3.2)), [31] (Equation (48)), [68] (Equation (9)), and [34] (Equation (4)). It is noted here that the moving average parametrization varies to a great extent in the literature, in the sense that very different characteristic equations and zero and pole representations are chosen.
As a result, although the MA-part is considerably more complex than the AR-part and very differently modeled throughout the literature, the MA-parameters solely influence the coefficients A k weighting the exponential terms, which themselves are solely determined by the autoregressive part. This is congruent with the findings of Equation (19), where frequency and damping of the SOGM-process are encoded in the autoregressive poles p k .

The Numerical Solution for ARMA-Models
Autoregressive processes of order p have the property that p + 1 covariances are uniquely given by the process, i.e., by the coefficients and the variance. All higher model-covariances can be recursively computed from the previous ones, cf. Equation (15). This property generally does not hold for empirical covariances, where each covariance typically is an independent estimate. Now, suppose Equation (24) is solved as an overdetermined system by including higher empirical covariances, i.e., covariances that are not recursively defined. The resulting weights A k will automatically correspond to general ARMA-models because the poles p k are fixed.
Precisely, the contradiction within the overdetermined system will, to some extent, necessarily end up in the weights A k and thus in some, for the moment unknown, MA-coefficients. The model is still an SOGM-process because the number of poles is still two and the SOGM covariance function is the most general damped oscillating function. The two AR-poles uniquely define the two SOGM-parameters frequency ω 0 and attenuation ζ. The only free parameter to fit an ARMA-model into the shape of the general damped oscillating function (SOGM-process) is the phase η. Hence, the MA-part of arbitrary order will only result in a single weight or phase as in Equation (19), and the whole covariance function can be represented by an SOGM-process. Consequently, the A k will be different from those of Equation (20), cf. [29] (p. 60), but the phase can still be recovered from Equation (21).
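Recovering the SOGM parameters from a fitted pole/weight pair can be sketched as follows (hypothetical helper; uses c = ζ ω 0 and a = √(1 − ζ 2 ) ω 0 , hence ω 0 = √(a 2 + c 2 ), and Equation (21) for the phase):

```python
import numpy as np

def sogm_params(pole, A):
    """Recover SOGM parameters from one complex pole p_k and its weight A_k:
    c = -ln|p_k|, a = |arg p_k|, omega0 = sqrt(a**2 + c**2),
    zeta = c/omega0, and the phase eta = |arg A_k|."""
    c = -np.log(np.abs(pole))
    a = np.abs(np.angle(pole))
    omega0 = np.hypot(a, c)
    return omega0, c / omega0, np.abs(np.angle(A))
```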
In summary, the general ARMA(2,q)-model (Equation (26)) is also realizable with Equation (17) and thus with the linear system of Equation (24). Here, we repeat the concept of second-order sections. Any ARMA(p,q)-process can be uniquely dissected into ARMA(2,2)-processes. Thus, our parametrization of linear weights to complex exponentials can realize pure AR(2) and general SOGM-processes, which can be denoted as ARMA(2,q)-models. These ARMA(2,q)-processes form single SOGM-processes with corresponding parameters ω 0 , ζ and η. The combination of the ARMA(2,q) to the original ARMA(p,q) process is the operation of addition for the covariance (function), concatenation for the poles, and convolution for the coefficients. Thus, the expansion to higher orders is similar to the pure AR(p) case. The finite sum adds up the covariance function for each second-order section which is an SOGM-process.
The weights would have to undergo a partial-fraction decomposition to give the MA-coefficients. Several references exist for decomposing Equation (26) into the MA-coefficients, known as spectral factorization, e.g., by partial fraction decomposition. In this paper, we stay with the simplicity and elegance of Equation (17).

Estimation and Interpolation of the Covariance Series
Within this section, the theory summarized above is used for covariance modeling, i.e., estimating covariance functions g S (τ), which can be evaluated for any lag τ, from a sequence of given empirical covariances { g ∆L j } ∆t . Here, the choice of estimator for the empirical covariances is not discussed and it is left to the user whether to use the biased or the unbiased estimator of the empirical covariances, cf. [23] (p. 174) and [69] (p. 252).
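For completeness, both estimator variants can be sketched in a few lines (hypothetical helper):

```python
import numpy as np

def empirical_autocov(x, n_lags, biased=True):
    """Empirical autocovariances g_0..g_n of a series (mean removed).
    The biased estimator divides by N and yields a positive semi-definite
    sequence; the unbiased one divides by N - j."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    N = len(x)
    raw = np.array([x[:N - j] @ x[j:] for j in range(n_lags + 1)])
    return raw / N if biased else raw / (N - np.arange(n_lags + 1))
```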
The first step is the estimation of the process coefficients from the g ∆L j with the process order p defined by the user. Furthermore, different linear systems have been discussed for this step, cf. Equation (14), also depending on the choice of n, which is the index of the highest lag included in Equation (14). These choices already have a significant impact on the goodness of fit of the covariance function to the empirical covariances, as will be discussed later. The resulting AR-coefficients α k can be directly converted to the poles p k using the factorization of the characteristic polynomial (Equation (11)).
For the second step, based on Equation (16), a linear system of m equations with m ≥ p can be set up, but now for the empirical covariances. Using the first m covariances, but ignoring the lag 0 value contaminated by the nugget effect, this results in a system like Equation (24), but now in the empirical covariances g ∆L j . For m = p, the system can be uniquely solved, resulting in coefficients A k which model the covariance function as an AR(p)-process. The case m > p results in a fitting problem of the covariance model to the m empirical covariances g j with p unknowns. This overdetermined system can be solved, for instance, in the least squares sense to derive estimates A k from the g j . As was discussed, these A k signify a process modeling as an ARMA-model. Here, one could use a separate notation to indicate the adjusted parameters in contrast to the uniquely determined A k of the pure AR-process; for the sake of a unified notation with Equation (17), it is omitted.
Due to the possible nugget effect, it is advised to exclude g ∆L 0 and solve Equation (28); however, it can also be included, cf. Equation (24). Moreover, a possible procedure is to generate a plausible g S 0 from a manually determined s 2 N by g S 0 = g ∆L 0 − s 2 N . Equally, the MYW-Equations offer a possibility to circumvent the use of g ∆L 0 .
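This second step, setting up and solving the linear system in the empirical covariances for the weights A k , can be sketched as follows. The function name is ours, and the self-check uses hypothetical real poles and weights; lag 0 is excluded by default because of the possible nugget effect:

```python
import numpy as np

def fit_weights(g, poles, m, include_lag0=False):
    """Weights A_k of the covariance model g(j) = sum_k A_k p_k^j,
    fitted to the empirical covariances g[0..m] (cf. Equations (17),
    (24), (28)). m = p gives a unique solution (pure AR-model),
    m > p a least squares fit (ARMA-model)."""
    lags = np.arange(0 if include_lag0 else 1, m + 1)
    B = poles[np.newaxis, :] ** lags[:, np.newaxis]  # m x p design matrix
    A, *_ = np.linalg.lstsq(B, g[lags], rcond=None)
    return A

# Self-check: exact covariances generated from known poles and weights.
poles = np.array([0.9, 0.5])
A_true = np.array([2.0, 1.0])
j = np.arange(11)
g = (A_true[np.newaxis, :] * poles[np.newaxis, :] ** j[:, np.newaxis]).sum(axis=1)
A = fit_weights(g, poles, m=6)
```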

Modeling Guidelines
The idea of solving the system for the weights A k is outlined in [23] (p. 167). In the following, we summarize some guidelines for estimating the covariance function, starting from residual observation data.

Initial steps:
• Determine the empirical autocorrelation function g ∆L 0 to g ∆L n as estimates for the covariances γ ∆L 0 to γ ∆L n . The biased or unbiased estimate can be used.
• Optional step: Reduce g ∆L 0 by an arbitrary additive white noise component s 2 N (nugget) such that g S 0 = g ∆L 0 − s 2 N is a plausible y-intercept to g S 1 and the higher lags.
Estimation of the autoregressive process:
• Define a target order p and compute the autoregressive coefficients α k by solving the Yule-Walker equations, i.e., Equation (14b), or by solving the modified Yule-Walker equations, i.e., Equation (14c), in the least squares sense using g S 1 to g S n .
• Compute the poles of the process, which follow from the coefficients, see Equation (11). Check whether the process is stationary, which requires |p k | < 1 for all poles. If this is not the case, it can be helpful to make the estimation more overdetermined by increasing n. Otherwise, the target order of the estimation needs to be reduced. A third possibility is to choose only selected process roots and continue the next steps with this subset of poles. An analysis of the process properties such as system frequencies a or ω 0 can be useful, for instance in the pole-zero plot.
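The conversion of a complex pole to SOGM process properties mentioned in the last bullet can be sketched as follows. This assumes the standard mapping p = exp((−ζ + i√(1−ζ²)) ω 0 ∆t) between a discrete pole and the continuous-time parameters; the values in the round trip are illustrative, not from the paper:

```python
import numpy as np

def sogm_parameters(pole, dt=1.0):
    """System frequency omega_0 and damping ratio zeta of the SOGM
    component belonging to a complex pole pair (pole, conj(pole)),
    assuming p = exp((-zeta + 1j*sqrt(1 - zeta**2)) * omega_0 * dt)."""
    log_r, theta = np.log(np.abs(pole)), np.abs(np.angle(pole))
    omega0 = np.hypot(log_r, theta) / dt   # since (zeta^2 + (1-zeta^2)) = 1
    zeta = -log_r / (omega0 * dt)
    return omega0, zeta

# Round trip with hypothetical values omega_0 = 2, zeta = 0.3, dt = 0.25:
dt = 0.25
p = np.exp((-0.3 + 1j * np.sqrt(1 - 0.3 ** 2)) * 2.0 * dt)
omega0, zeta = sogm_parameters(p, dt)
```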

Estimation of the weights A k :
• Define the number of empirical covariances m to be used for the estimation. Set up the linear system, cf. Equation (28), either with or without g S 0 . Solve the system of equations either uniquely with m = p to determine the A k , which results in a pure AR(p)-process, or in an overdetermined manner in the least squares sense with m > p, which results in an underlying ARMA-process.
• g S (0) is given by g S (0) = ∑ p k=1 A k , from which s 2 N can be determined by s 2 N = g ∆L 0 − g S (0). If g S (0) exceeds g ∆L 0 , it is possible to constrain the solution to pass exactly through or below g ∆L 0 . This can be done using a constrained least squares adjustment with the linear condition ∑ p k=1 A k = g ∆L 0 (cf. e.g., [69] (8)).
• Check the positive semi-definiteness of each second-order section (SOGM component). In addition, the phases need to be in the range |η| < π/2. If the solution does not fulfill these requirements, process diagnostics are necessary to determine whether the affected component might be ill-shaped. If a component is entirely negative definite, i.e., with negative g S (0), it needs to be eliminated.
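A minimal sketch of such a constrained adjustment, written for real-valued poles (for complex poles the plain transposes become conjugate transposes), solves the KKT system of the equality-constrained least squares problem directly. Function name and check values are ours:

```python
import numpy as np

def fit_weights_constrained(g, poles, m, g0):
    """Least squares fit of the weights A_k to the lags 1..m subject to
    the linear condition sum_k A_k = g0, so that the covariance function
    passes exactly through the chosen lag-0 value. The KKT system
    [B^T B, 1; 1^T, 0] [A; lambda] = [B^T g; g0] is solved directly."""
    lags = np.arange(1, m + 1)
    B = poles[np.newaxis, :] ** lags[:, np.newaxis]  # m x p design matrix
    p = len(poles)
    K = np.zeros((p + 1, p + 1))
    K[:p, :p] = B.T @ B          # normal equation block
    K[:p, p] = 1.0               # constraint gradient (column)
    K[p, :p] = 1.0               # constraint row: sum_k A_k = g0
    rhs = np.concatenate((B.T @ g[lags], [g0]))
    return np.linalg.solve(K, rhs)[:p]

# Self-check with exact covariances g_j = 2*0.9^j + 0.5^j and g0 = 3:
poles = np.array([0.9, 0.5])
g = 2.0 * 0.9 ** np.arange(11) + 0.5 ** np.arange(11)
A = fit_weights_constrained(g, poles, m=6, g0=3.0)
```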
Here, it also needs to be examined whether the empirical covariances decrease sufficiently towards the high lags. If not, the stationarity of the residuals can be questioned and an enhanced trend reduction might be necessary.
Using the YW-Equations can be advantageous in order to obtain a unique (well determined) system to be solved for the α k . In this way, the analytic covariance function exactly interpolates the first p + 1 covariances, which reflects the fact that they are uniquely given by the process, cf. Equation (14b). On the other hand, including higher lags into the process estimation enables long-range dependencies with lower-order models. The same holds for solving the system of Equation (28) for the A k using only m = p or using m > p covariances. Which procedure gives the best fit is left to the user and strongly depends on the characteristics of the data and the application. In summary, there are several possible choices, cf. Table 1.
Finally, the analytic covariance function (Equation (17)) can be evaluated by applying the same linear system to arbitrary, e.g., densely sampled, lags τ. Though complex p k and A k are involved, the resulting covariance function values are theoretically real. However, due to limited numeric precision, the imaginary part is only numerically zero. Thus, it is advised to eliminate the imaginary part in any case.
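This final evaluation step, including the elimination of the numerically non-zero imaginary part and the 1/∆t scaling of the lag argument, can be sketched as follows. The conjugate pole pair and weights are illustrative values, not estimated parameters from the paper:

```python
import numpy as np

# Illustrative conjugate pole pair with conjugate weights (hypothetical
# values), so that the covariance function is real-valued.
poles = np.array([0.85 * np.exp(1j * 0.6), 0.85 * np.exp(-1j * 0.6)])
A = np.array([0.5 - 0.2j, 0.5 + 0.2j])

# Evaluate g_S(tau) = sum_k A_k p_k^(tau/dt) on a dense lag grid; the
# argument tau is scaled by the sampling interval dt.
dt = 0.25
tau = np.linspace(0.0, 10.0, 401)
g = (A[np.newaxis, :] * poles[np.newaxis, :] ** (tau[:, np.newaxis] / dt)).sum(axis=1)

# The result is real in theory; the imaginary part is only numerically
# zero and is therefore eliminated.
assert np.max(np.abs(g.imag)) < 1e-12
g = g.real
```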

An Example: Milan Cathedral Deformation Time Series
This example is the well known deformation time series of Milan Cathedral [62]. The time series measurements are levelling heights of a pillar in the period of 1965 to 1977. It is an equidistant time series of 48 values with sampling interval ∆t = 0.25 years. In [62], the time series is deseasonalized and then used for further analysis with autoregressive processes. In this paper, a consistent modeling of the whole signal component without deseasonalization is considered advantageous. The time series is detrended using a linear function and the remaining residuals define the stochastic signal, cf. Figure 1. Based on the detrended time series, the biased estimator is used to determine the empirical covariances in all examples. The order for the AR(p)-process is chosen as p = 4. Four different covariance functions were determined based on the method proposed here. All second-order components of the estimated covariance functions are converted to the SOGM parametrization and individually tested for positive semi-definiteness using Equation (8).
As a kind of reference, a manual approach (cf. Figure 2a) was chosen. A single SOGM is adjusted by manually tuning the parameters to achieve a good fit for all empirical covariances, ignoring g ∆L 0 . This function follows the long-wavelength oscillation contained in the signal.
In a second approach, the process coefficients are estimated using the YW-Equations (14b) with the covariances g ∆L 0 to g ∆L p . Furthermore, the system in Equation (28) is solved uniquely using the lags from g ∆L 1 up to g ∆L m with m = p. In Figure 2b, the covariance function exactly interpolates the first five values. This covariance model of relatively low order already contains a second high-frequency oscillation with a 1-year period, caused by annual temperature variations, which is not obvious at first. However, the function misses the long-wavelength oscillation and does not fit well to the higher covariances. Here, the model order can of course be increased in order to interpolate more covariances.
However, it will now be demonstrated that the covariance structure at hand can in fact also be modeled with the low-order AR(4)-model. For the remaining examples, the process coefficients are estimated using the MYW-Equations (14c) with n = 24 covariances. As a first case, the function was estimated with a manually chosen nugget effect, i.e., g S 0 = g ∆L 0 − 0.0019, and by solving Equation (28) with m = p − 1, which results in a pure AR(4)-process. The covariance function shown in Figure 2c approximates the correlations better than the first two cases. The shape models all oscillations, but does not exactly pass through all values at the lags τ = 1.5 to 4 years. Finally, the system of Equation (28) was solved using the higher lags up to m = 14 in order to enforce an ARMA-model. However, the best fit exceeds g ∆L 0 , such that the best valid fit is achieved with no nugget effect in the signal covariance model and a covariance function exactly interpolating g ∆L 0 . Thus, we fix g S (0) to g ∆L 0 using a constrained least squares adjustment, which is the result shown here. In comparison, Figure 2d shows the most flexible covariance function. The function passes very close to nearly all covariances up to τ = 6 years and the fit is still good beyond that. The approximated ARMA-process with order p = 4 allows more variation of the covariance function and fits better at higher lags.
The estimated parameters are listed in Tables 2 and 3. All parameters are given with numbers in rational approximation. The positive definiteness is directly visible by the condition |η| ≤ α. The approximated pure AR(4)-process in Table 2 is relatively close to being invalid for the second component but still within the limits. For the ARMA-model, notably, the poles, frequency, and attenuation parameters are the same, cf. Table 3. Just the phases η of the two components are different and also inside the bounds of positive definiteness.
The resulting covariance function using the ARMA-model (Figure 2d) is given in Equation (30). It needs to be noted here that the connection of unitless poles and parameters with distance τ cannot be done in the way indicated by Sections 2-6. In fact, for the correct covariance function, the argument τ requires a scaling with the sampling interval ∆t, which was omitted for reasons of brevity and comprehensibility. As a consequence, the scaling 1/∆t of the argument τ is included in Equation (30). We also apply the same factor in the transition from ω 0 to the ordinary frequency ν 0 given in Tables 2 and 3.

Table 2. Approximation with pure AR(2)-components. The frequency is also given as ordinary frequency ν 0 . The variance of the combined covariance function amounts to σ 2 = 0.02046 mm 2 which leads to σ N = 0.04359 mm. ω 0 and η are given in units of radians.

Using the process characteristics in Tables 2 and 3, it is now obvious that the long wavelength oscillation has a period of about 7.6 years. Diagnostics of the estimated process can be done in the same way in order to possibly discard certain components if they are irrelevant. In summary, the proposed approach can realize a much better combined modeling of long and short-wavelength signal components without manual choices of frequencies, amplitudes, and phases. The modified Yule-Walker equations prove valuable for a good fit of the covariance function due to the stabilization by higher lags. ARMA-models provide a further enhanced flexibility of the covariance function.

Summary and Conclusions
In this paper, we presented an estimation procedure for covariance functions based on the methodology of stochastic processes and a simple and straightforward numerical method. The approach is based on the analogy between the covariance functions defined by the SOGM-process and those of autoregressive processes. Thus, we provide the most general damped oscillating autocorrelation function built from exponential and trigonometric terms, which includes several simple analytical covariance models as limit cases.
The covariance models of autoregressive processes as well as of ARMA-processes correspond to a linear combination of covariance functions of second-order Gauss-Markov (SOGM) processes. We provide procedures for fitting these covariance functions to empirical covariance estimates based on simple systems of linear equations. Notably, the numerical method easily extends to ARMA-processes with the same linear system of equations. In the future, research will be directed towards possibilities of constraining the bounds of stationarity and positive definiteness within the estimation steps.
The great advantage is that the method is automated and yields the complete model, without the need to manually model each component. Our method is very flexible because the process estimation automatically chooses complex or real poles, depending on whether oscillating or only decaying covariance components are necessary to model the process. Naturally, our approach is restricted to stationary time series. In non-stationary cases, the empirically estimated covariance sequence would not decrease with increasing lag and would thereby contradict the defining properties of covariance functions, e.g., g ∆L 0 being the largest covariance, see Section 2 and [28]. Such an ill-shaped covariance sequence will definitely result in non-stationary autoregressive poles and the method will fail.
The real-world example has shown that covariance function estimation can in fact give good fitting results even for complex covariance structures. The guidelines presented here provide multiple possibilities for fitting procedures and process diagnostics. As a result, covariance function estimation is largely automated with a generic method, and a more consistent approach to complete signal modeling is provided.