1. Introduction
Singular spectrum analysis (SSA), which is closely related to signal-subspace methods (cf. [1,2,3,4,5] and reviews [6,7]), has been increasingly used in recent decades for practical tasks, including preprocessing and feature extraction as part of hybrid machine learning methods [8,9,10,11,12]. An attractive feature of the SSA method is that it does not require specifying a time series model.
SSA is capable of addressing a wide range of problems in time series analysis, including low-frequency filtering for smoothing, signal extraction, frequency estimation, gap filling, and forecasting; all but the first are based on signal subspace estimation [7]. The signal is understood to be a non-random component of the time series, which may include a trend and oscillations. The SSA algorithm consists of embedding the time series into a sequence of vectors of size L, collecting them into a matrix, decomposing this matrix into elementary matrices, grouping these matrices in an appropriate way, and then obtaining a decomposition of the original time series into a sum of interpretable components. To act as a low-frequency filter, the components are selected based on their frequency characteristics. For the majority of other problems, signal estimation is required; it is performed by grouping the leading r components. As a result, it is of great importance to know r, which is referred to as the signal rank or the signal model order. In the following, we describe the SSA algorithm applied to a particular signal estimation problem.
Let us briefly describe the SSA algorithm for signal extraction from a time series X = (x_1, …, x_N) of length N, following [7]. We assume that

  x_n = s_n + ε_n,  n = 1, …, N,

where S = (s_1, …, s_N) is a signal, and (ε_n) is random noise with zero expectation. The algorithm has two parameters, the window length L, 1 < L < N, and the number of components r, 0 ≤ r < min(L, K), where K = N − L + 1. First, the time series X is transformed into its trajectory matrix 𝐗 of size L × K:

  𝐗 = T(X) = (x_{i+j−1})_{i,j=1}^{L,K},  (1)

where the embedding operator T denotes the bijection between ℝ^N and H_{L,K}, and H_{L,K} is the set of Hankel matrices of size L × K with equal values on the anti-diagonals i + j = const.
The SSA estimator of the signal is defined as the composition

  S̃ = (T⁻¹ ∘ Π_H ∘ Π_r ∘ T)(X),  (2)

where Π_H is the orthogonal projector onto the set of Hankel matrices H_{L,K}, and Π_r is the projector onto the set M_r of L × K matrices of rank at most r. In both cases, the projections are taken with respect to the Frobenius norm. The projection by Π_H is constructed by averaging the values along the anti-diagonals [Section 6.2] in [3], and the result of Π_r can be obtained via the singular value decomposition as the sum of its r leading summands (Eckart–Young theorem [13,14]).
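To make the composition (2) concrete, the following minimal Python sketch (ours, with illustrative names such as ssa_reconstruct) embeds the series, truncates the singular value decomposition to r terms, and averages along the anti-diagonals:

```python
import numpy as np

def embed(x, L):
    """Trajectory matrix T(X): an L x K Hankel matrix, K = N - L + 1."""
    K = len(x) - L + 1
    return np.column_stack([x[j:j + L] for j in range(K)])

def hankelize(M):
    """Projector Pi_H: average along anti-diagonals and read off the series."""
    L, K = M.shape
    sums = np.zeros(L + K - 1)
    counts = np.zeros(L + K - 1)
    for i in range(L):
        sums[i:i + K] += M[i]       # row i contributes to anti-diagonals i..i+K-1
        counts[i:i + K] += 1
    return sums / counts

def ssa_reconstruct(x, L, r):
    """The composition (2): T^{-1} o Pi_H o Pi_r o T."""
    X = embed(np.asarray(x, dtype=float), L)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]   # Pi_r: rank-r truncation (Eckart-Young)
    return hankelize(Xr)               # Pi_H, then back to a series

# Toy usage: a rank-2 sinusoid plus white noise.
rng = np.random.default_rng(0)
n = np.arange(100)
signal = np.sin(2 * np.pi * n / 10)
series = signal + 0.1 * rng.standard_normal(n.size)
print(np.sqrt(np.mean((ssa_reconstruct(series, L=50, r=2) - signal) ** 2)))
```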
From the description of the algorithm, it follows that to adequately estimate a signal, its trajectory matrix must be of rank r or well approximated by a matrix of rank r. This motivates the following notion.
We say that a signal S is a series of rank r if its L-trajectory matrix is rank-deficient and has rank r for any L such that min(L, N − L + 1) > r. It is known that this definition is equivalent to the trajectory matrix having rank r for a single L satisfying r < min(L, N − L + 1) [Corollary 5.1] in [15]. If a signal has rank r, we call it a low-rank series.
For an infinite time series S = (s_1, s_2, …) of rank r, there exists a governing linear recurrence relation (LRR) of order r:

  s_n = Σ_{k=1}^{r} a_k s_{n−k},  a_r ≠ 0,  n > r,

[Chapter XVI, Section 10, Theorem 7] in [16]. A well-known result specifies the explicit parametric form of series governed by LRRs: s_n = Σ_k P_k(n) μ_k^n, where the P_k are polynomials in n (cf. [Theorem 3.1.1] in [17] and [Theorem 5.3] in [3]).
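As a simple illustration (our own, not from the paper), the following check confirms numerically that a single sinusoid, which satisfies an LRR of order 2, has an L-trajectory matrix of rank 2 for several window lengths:

```python
import numpy as np

n = np.arange(60, dtype=float)
s = np.cos(2 * np.pi * n / 12 + 0.3)      # governed by an LRR of order 2
for L in (5, 20, 30):
    X = np.column_stack([s[j:j + L] for j in range(len(s) - L + 1)])
    print(L, np.linalg.matrix_rank(X))    # prints rank 2 for each L
```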
The rank of a signal is referred to as the model order, since in the complex-valued case, a sum of r complex exponentials has rank r. Consequently, the rank estimation problem is known as model order selection.
If the signal is a series of known rank r, there are numerous methods for extracting it, including low-rank approximation [18,19,20,21]. In particular, the paper [21] proposes an efficient MGN (Modified Gauss–Newton) method for finding the best low-rank approximation in the least-squares sense. It is significant to note that in the case of Gaussian white noise, the least-squares approximation coincides with the maximum likelihood estimate. An alternative approach is the Cadzow method, which consists of alternating projections onto M_r and H_{L,K} at each iteration; it has been discussed in the literature, e.g., in [22,23,24].
In all versions, low-rank approximation methods are iterative, which makes even efficient methods quite time-consuming, and they are not guaranteed to find the global minimum. Furthermore, the rank of the signal must always be known in advance.
Let us describe the cases in which low-rank approximation methods, with r chosen equal to the signal rank, prove ineffective. The first situation is when the noise level is too high: the approximation by a signal-rank series then includes a significant portion of the noise in the result, and to extract the signal more accurately, it is necessary to take r less than the signal rank. The second situation is when the signal is not exactly a low-rank series, which is usually the case for real-world time series; here, the low-rank approximation can perform poorly.
The version of SSA for signal extraction is a single iteration of the Cadzow method and has a very efficient implementation [25]. Since the method employs a single iteration, it is not constrained to the extraction of low-rank signals. Instead, it can be used to identify trends and periodic components in real-world time series, which can then be subjected to further analysis and forecasting. By employing the intermediate SSA result in the form of the singular value decomposition of the trajectory matrix, one can visually identify the signal-related components of the decomposition. Clearly, this approach is not feasible when dealing with vast amounts of data. Consequently, techniques for automated component identification within SSA have been developed. For example, in [26], a method for automated trend identification was proposed, wherein the number of components to be identified must be specified. Consequently, it is again necessary to set r.
In this study, we propose a novel approach to the problem of estimating the signal rank. Rather than determining the signal rank itself, our objective is to identify the optimal parameter r in the SSA algorithm that minimizes the mean square error (MSE) of the signal reconstruction. In the case of a low-rank signal and low noise, this approach will yield the same result as the conventional method of finding the signal rank. However, in situations where the signal is not exactly of low rank or the noise level is high, the signal rank may not provide the optimal r.
This study builds on methods of signal rank detection (model order selection). These methods can be divided into two types: those based on information criteria and those based on the properties of the SSA method (properties of the signal subspace). The currently available information criteria were not designed for the SSA case; therefore, we propose modifications to them. Given that the original versions of the information criteria were developed for the case of Gaussian white noise, we suggest an approach that extends them to the case of red noise.
Let us describe the structure of the paper. Section 2 describes known methods for estimating the model order r. In Section 3, we propose an approach that is based on signal estimation by SSA: Section 3.1 considers the case of white noise, and Section 3.2 proposes a way to transfer the methods to the case of red noise. Section 4 includes numerical studies and comparisons on artificial examples. Section 5 verifies the performance of the methods on real-world time series. Section 6 presents a summary and discussion; conclusions complete the paper.
3. Modifications of Information Criteria for the Case of Hankel Noise
It is known that the set M_r of matrices of rank at most r in the neighborhood of a matrix of rank r is a smooth manifold of dimension r(L + K − r) (see [21]; [Ex. 13, p. 27] in [35]). This allows one to consider the linearization of the projector Π_r and to approximate the projection onto the set M_r by a linear projection onto the tangent subspace at the desired point.
Our approach is to estimate the variance σ² used in the information criteria (4) for a given rank r by using the residual matrix 𝐗 − Π_r(𝐗) rather than by constructing the maximum likelihood estimate of the signal. Since the computation of Π_r(𝐗) reduces to the summation of the first r terms of the singular value decomposition of the matrix 𝐗, this allows fast recalculation for different r and thus provides a fast method for estimating the rank of a signal.
3.1. White Noise
Let ε_n be Gaussian white noise with zero expectation and variance σ². The noise series can be estimated by the residual series X − S̃. In the proposed approach, the estimation of σ² will be conducted without proceeding from the matrix to a time series.
Let

  𝐑 = 𝐗 − Π_r(𝐗).  (7)

Then the estimate of σ² can be given as follows (let us call this version ‘SVD’):

  σ̂² = ‖𝐑‖²_F / (LK).  (8)

By employing the singular value decomposition, the same noise variance estimate can be obtained as σ̂² = (1/(LK)) Σ_{i>r} λ_i, where the λ_i are the squares of the singular values of the trajectory matrix 𝐗.
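Since the tail sums Σ_{i>r} λ_i for all r are available from a single singular value decomposition, the estimate (8) can be evaluated for every candidate rank at once; a short sketch (with illustrative names):

```python
import numpy as np

def sigma2_svd_all(X):
    """SVD estimates (8) for all ranks: out[r] = (sum of lambda_i, i > r) / (L*K)."""
    L, K = X.shape
    lam = np.linalg.svd(X, compute_uv=False) ** 2        # squared singular values
    tails = np.concatenate([[lam.sum()], lam.sum() - np.cumsum(lam)])
    return tails / (L * K)                               # indexed by r = 0, 1, ...
```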
It can be seen from (1) that the operator T repeats each time series element in the trajectory matrix as many times as there are elements on the corresponding anti-diagonal. Consequently, given the Hankel structure of the input matrices, we put forward a more accurate weighted version of the σ² estimator (which we shall henceforth refer to as ‘TRMAT’):

  σ̂² = (1/N) Σ_{i=1}^{N} (1/w_i) Σ_{l+j−1=i} r_{lj}²,  (9)

where r_{lj} are the entries of 𝐑 and w_i, i = 1, …, N, are the numbers of elements on the i-th anti-diagonals of an L × K matrix. The division by N is a consequence of the fact that the number of such anti-diagonals is N. Note that if the window length L is small compared to N, there is a negligible difference with the SVD criterion, since the weights w_i are essentially the same for both methods.
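A sketch of the weighted estimate, following our reading of (9): squared residuals are averaged within each anti-diagonal and then over the N anti-diagonals (the exact normalization is reconstructed from the description above):

```python
import numpy as np

def sigma2_trmat(R):
    """Weighted TRMAT estimate (9) from the L x K residual matrix R."""
    L, K = R.shape
    N = L + K - 1
    acc = np.zeros(N)   # sum of squared entries on each anti-diagonal
    w = np.zeros(N)     # w_i = number of entries on the i-th anti-diagonal
    for i in range(L):
        acc[i:i + K] += R[i] ** 2
        w[i:i + K] += 1
    return np.mean(acc / w)
```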
In both cases, the alternative estimate of the number of parameters k (instead of r(L + K − r)) to substitute into Formula (4) is

  k̃ = r(L + K − r) · N / (LK),  (10)

that is, it is the dimension r(L + K − r) of the smooth manifold M_r, reweighted to take into account the “replacement” of the dimension LK of the matrix space by the dimension N of the time series space. Let us explain this normalization. One can look at the contribution of the number k of parameters to the penalty in the AIC and BIC values in the form (4) by dividing the expression by N; this shows that the parameter penalty depends on k/N.
We have numerically checked that the non-normalized number of parameters k = r(L + K − r), which corresponds to the non-Hankel case, leads to a severe underestimation of the penalty term in (4) when (8) or (9) is taken as an estimate of σ²; therefore, we will not consider this case further.
We will consider (8) (SVD) and (9) (TRMAT) with the normalized number of parameters given by (10) in the information criteria (4). Recall that the best rank corresponds to the maximum value of a criterion. Preliminary experiments have shown that only the criteria with the BIC penalty turned out to work, and we consider them further: a graph of the AIC values flattens out after growing up to the correct rank, so the maximum point is determined unstably and the rank is usually overestimated.
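The following sketch shows how a rank can be selected by such a criterion: it combines a variance estimate, (8) or (9), supplied as a callback, with the normalized parameter count (10). The exact form of (4) is not reproduced here; the standard Gaussian BIC shape, −(N/2) ln σ̂² − (k/2) ln N, is used as a hedged stand-in and maximized over r:

```python
import numpy as np

def bic_rank(N, L, sigma2_of_r):
    """Pick the rank maximizing a BIC-type criterion.

    sigma2_of_r maps a candidate rank r to a variance estimate, e.g., (8) or (9).
    """
    K = N - L + 1
    best_r, best_val = 0, -np.inf
    for r in range(min(L, K)):
        k = r * (L + K - r) * N / (L * K)   # normalized number of parameters (10)
        val = -0.5 * N * np.log(sigma2_of_r(r)) - 0.5 * k * np.log(N)
        if val > best_val:
            best_r, best_val = r, val
    return best_r
```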
3.2. Red Noise
Let the noise be stationary and Gaussian. The procedure that makes the noise white is called whitening; it consists of multiplying the time series by the square root of the inverse of the noise autocovariance matrix. The whitening operation affects both the signal and the noise. Since the matrix-form model is stable with respect to a linear transformation (multiplication by a full-rank matrix), we can apply the methods of the previous section to the result of the whitening. To apply the criterion, it is sufficient to know the variance of the white noise after whitening; more precisely, it is sufficient to know how the variance of the noise after whitening is expressed through the variance of the original noise. For this, we need to know the covariance matrix of the noise.
Let us denote the variance estimate obtained by Formula (8) or (9) as σ̂². Then, we need to substitute the variance σ̃² of the whitened noise into Formula (4). For example, in the case of an AR(1) model with coefficient φ,

  σ̃² = (1 − φ²) σ̂².  (11)

Recall that red noise is an AR(1) process with a positive coefficient.
To implement this approach, it is sufficient to estimate the coefficient φ, which is equal to the correlation coefficient between successive observations. As before, 𝐑 is the residual matrix defined in (7); it is not exactly Hankel. Since, in the case of a wrong rank, its structure is far from Hankel and diagonal averaging would distort it considerably, let us estimate the correlation using the matrix before averaging, by shifting the rows of the residual matrix. Since horizontally adjacent entries of 𝐑 correspond to successive time points, as an estimate of φ we take the row-wise lag-one sample correlation

  φ̂ = Σ_{l=1}^{L} Σ_{j=1}^{K−1} r_{l,j} r_{l,j+1} / Σ_{l=1}^{L} Σ_{j=1}^{K−1} r_{l,j}².
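A sketch of this row-wise lag-one estimate; the normalization is our reconstruction and may differ slightly from the paper's exact formula:

```python
import numpy as np

def estimate_phi(R):
    """Lag-one AR coefficient from the residual matrix R, before diagonal averaging."""
    # Horizontally adjacent entries of R correspond to successive time points.
    return np.sum(R[:, :-1] * R[:, 1:]) / np.sum(R[:, :-1] ** 2)
```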
Remark 2. The idea behind TRMAT can also be applied to the evaluation of φ by considering the same weights in each sum as in (9). We will apply such a weighted estimate in the TRMAT algorithm. An alternative is to estimate φ as the correlation between successive observations in the residual series, but we will not consider it, since it did not lead to an improvement in preliminary numerical experiments.
Unfortunately, especially if the signal is not stationary, and even more so if it is not a low-rank series, the calculated estimate of φ based on nonstationary residuals in the case of a wrong rank can be accidental and lead to an incorrect maximum of an information criterion. Therefore, we consider the following variant, in which the information criterion is assigned the value −∞ when the residual is clearly nonstationary (recall that the best model corresponds to the maximum of the criteria); a code sketch of this guarded computation is given at the end of this subsection.
1. Check the residual series for nonstationarity (this step is optional, not necessary), e.g., using the KPSS test [36]. If the stationarity hypothesis is rejected (e.g., p-value < 0.05), then the criterion value is −∞, and STOP.
2. Choose the better model for the residual between the white and red noise models. If white noise is detected, then set φ̂ = 0; if the residual is closer to red noise, then estimate the parameter φ in the red-noise model, for example, by the MLE method, without requiring the model to be stationary.
3. If φ̂ ≥ 1, then the criterion returns −∞; otherwise, the value is calculated using the formula of the corresponding information criterion.
The variants of the criteria with the adjustment (11) according to the estimated φ̂ will be referred to by adding the suffix _AR.
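A sketch of the guarded computation in steps 1-3 above; it assumes the KPSS test from statsmodels, takes the matrix-based estimate φ̂ as input, and omits the white-versus-red model selection of step 2 for brevity (ic_of_sigma2 stands for the chosen criterion (4)):

```python
import numpy as np
from statsmodels.tsa.stattools import kpss

def guarded_ic(resid_series, phi_hat, sigma2, ic_of_sigma2,
               alpha=0.05, check_stat=True):
    """Return the criterion value, or -inf for clearly nonstationary residuals."""
    if check_stat:
        _, p_value, *_ = kpss(resid_series, regression="c", nlags="auto")
        if p_value < alpha:                 # stationarity hypothesis rejected
            return -np.inf
    if phi_hat >= 1:                        # nonstationary AR(1) fit
        return -np.inf
    return ic_of_sigma2(sigma2 * (1.0 - phi_hat ** 2))   # adjustment (11)
```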
3.3. Case of Zero Signal
Information criteria allow one to consider the absence of a signal as one of the models. In this case, in the form of the criterion (4), the mean square of the values of the initial time series serves as the estimate of σ²; accordingly, the signal values are all 0. Recall that the ESTER and SAMOS criteria do not consider the case r = 0; formally, they are defined so that the value r = 0 never provides the maximum for these criteria.
3.4. Algorithm
Algorithm 1 describes how to compute the BIC versions of the proposed criteria for the white and red noise cases; a code sketch is given after the specification.

Algorithm 1: Calculation of TRMAT and SVD.
Input: Time series X, window length L, rank r, type of IC (TRMAT or SVD), indicator CHECKSTAT of whether the stationarity check is needed, significance level for checking stationarity, indicator NOISETYPE.
Result: Value of the IC.
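The following self-contained Python sketch follows the Input/Result specification of Algorithm 1 under the assumptions stated earlier: a generic Gaussian BIC shape stands in for (4), the variance estimates follow our reconstructions of (7)-(10), and the red-noise branch is simplified (the adjustment (11) with a row-wise φ̂ estimate, without the stationarity check):

```python
import numpy as np

def algorithm1_ic(x, L, r, ic_type="TRMAT", noisetype="white"):
    """Hedged sketch of Algorithm 1: value of a BIC-type criterion for rank r."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    X = np.column_stack([x[j:j + L] for j in range(K)])  # trajectory matrix (1)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    R = X - (U[:, :r] * s[:r]) @ Vt[:r]                  # residual matrix (7)
    if ic_type == "SVD":
        sigma2 = np.sum(s[r:] ** 2) / (L * K)            # estimate (8)
    else:                                                # TRMAT, estimate (9)
        acc = np.zeros(N)
        w = np.zeros(N)
        for i in range(L):
            acc[i:i + K] += R[i] ** 2                    # anti-diagonal sums
            w[i:i + K] += 1                              # anti-diagonal sizes
        sigma2 = np.mean(acc / w)
    if noisetype == "red":
        phi = np.sum(R[:, :-1] * R[:, 1:]) / np.sum(R[:, :-1] ** 2)
        if phi >= 1:                                     # nonstationary AR(1) fit
            return -np.inf
        sigma2 *= 1 - phi ** 2                           # adjustment (11)
    k = r * (L + K - r) * N / (L * K)                    # parameter count (10)
    return -0.5 * N * np.log(sigma2) - 0.5 * k * np.log(N)
```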
For MDL and white noise, the values of the criterion are calculated by (6). If the noise is red, then σ̂² in (6) is recalculated with the substitution of σ̃² from (11) instead of σ̂².
4. Numerical Experiments
Let us numerically compare the considered methods and study their accuracy as a function of the noise level.
4.1. Approach to Comparison
One of the criterion quality characteristics used in practice for model order estimation is the proportion of correct order (rank) estimates, or the bias of the average rank estimate. However, overestimation and underestimation of the rank can affect the result differently: because the decomposition components are arranged by decreasing contribution, overestimating the rank increases the signal estimation error less than underestimating it. Therefore, we will consider the RMSE of the signal estimation as the main characteristic. Note also that the trend identification methods [26,37] are robust to rank overestimation when the number of identified components is chosen according to the estimated rank.
In the problem statement considered in this paper, the correct model order is generally not defined. Therefore, we will compare the estimated ranks (model orders) with the optimal rank r, which gives the minimum error of the signal estimate obtained by (2). Accordingly, we will compare the RMSE of the signal estimation at the estimated rank with the minimum error at the optimal rank. Since the best approximation depends on the noise level, we will consider the quality of the criteria as a function of the noise level.
Thus, in most cases, we will compare the signal estimation error with the average minimum error, and the average rank with the average optimal rank. In addition, we will consider the proportion of matches between the rank estimates and the individual optimal ranks that yield the minimum errors for each series separately.
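The comparison protocol can be sketched as follows; reconstruct stands for the SSA estimator (2) and pick_rank for any rank-selection criterion, both assumed given (the names are hypothetical):

```python
import numpy as np

def compare(signal, sigma2, reconstruct, pick_rank, L, rmax, n_trials=1000, seed=0):
    """Average estimated rank vs. the rank minimizing the average reconstruction MSE."""
    rng = np.random.default_rng(seed)
    N = len(signal)
    mse_by_r = np.zeros(rmax + 1)
    picked = np.zeros(n_trials, dtype=int)
    for t in range(n_trials):
        x = signal + np.sqrt(sigma2) * rng.standard_normal(N)
        for r in range(rmax + 1):
            mse_by_r[r] += np.mean((reconstruct(x, L, r) - signal) ** 2)
        picked[t] = pick_rank(x, L)
    optimal_r = int(np.argmin(mse_by_r))
    return optimal_r, picked.mean(), np.sqrt(mse_by_r[optimal_r] / n_trials)
```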
4.2. White Noise
In this section, we consider the case of a noisy signal, where the noise is Gaussian white with zero mean and variance σ².
4.2.1. Sum of Two Sinusoids
Let us start with a simple example, a signal in the form of the sum of two sinusoids (13). The rank of the signal (13) is 4 [Example 5.2] in [3]. Since deterministic signals are asymptotically separated from noise [Section 6.1.3] in [3], the optimal value of r will be 4 at low noise levels. However, as the noise level increases, the second sinusoid starts to mix with the noise and, after some period of uncertainty, the optimal rank becomes equal to 2, the rank of a single sinusoid. It is clear that for any signal, as the noise increases, at some noise level the optimal rank becomes zero, i.e., the best estimate of the signal is the zero series.
We will consider 20 values of σ² from 0.01 to 100 with equal logarithmic steps. Separately, we will focus on four noise levels: one at which the optimal rank is 4, one in the transition period from 4 to 2, one at which the optimal rank is 2, and one at which the optimal rank is 0.
We begin by examining the proportion of matches between the estimated ranks and the individual optimal ranks (which yield the smallest RMSE of the signal estimates for the given series).
In comparison with the SVD, TRMAT, and MDL methods, the ESTER and SAMOS methods demonstrate the most favorable outcome at a low noise level and the considered window length, with a proportion of matches reaching 0.998 (out of 1000 trials). This outcome aligns with the findings presented in [32,33]. However, as the noise level increases, these methods lose efficacy, yielding near-zero proportions of matches.
Figure 1 illustrates the dependence of the standardized criterion values on the rank. For comparability, the criteria were standardized (i.e., the mean was subtracted and the values were divided by the standard deviation), as the criterion scales may be incomparable. To illustrate the effect more clearly, we selected the window length as a multiple of the periods of both sinusoids, at a fixed noise standard deviation. It is readily apparent that the ESTER and SAMOS criteria lead to an erroneous determination of the rank in this instance, particularly the ESTER method, which has a maximum at the separability point. In contrast, the considered information criteria, such as TRMAT, correctly identify the rank as four.
Therefore, in the following sections, we will not consider the ESTER and SAMOS criteria, particularly given that they are not applicable in the absence of a signal. Thus, in what follows, we examine the SVD, TRMAT, and MDL criteria in greater detail.
As illustrated in Table 1, for the noise levels corresponding to stable rank detection (i.e., all except the transition level), the TRMAT, SVD, and MDL methods consistently yield optimal rank estimates: for each of these methods, the proportion of matches with the optimal ranks is nearly one. However, at the transition noise level, the TRMAT and SVD methods were unsuccessful, with only a small proportion of matches. In comparison, the MDL method demonstrated better performance, with a success rate of 0.48, as opposed to 0.235 and 0.277 for TRMAT and SVD, respectively.
Figure 2 illustrates the difference in the behavior of the criteria for two of these noise levels. As previously, the criterion values have been standardized.
Figure 3 depicts the mean estimated ranks and the optimal ranks (those giving the smallest root of the average MSE over 1000 realizations) for the TRMAT and MDL methods (SVD has been omitted due to its similarity to TRMAT). It can be observed that the graph of the optimal rank as a function of the noise level contains plateaus with identical rank values and transition periods. At the plateaus, both methods yield accurate estimates of the rank. During the transition periods, both methods underestimate the rank, but the MDL method does so to a lesser extent.
The graphs of RMSE versus noise level are presented in Figure 4. Let us explain the specifics of this figure. First, as a baseline, we consider the RMSE at the maximum possible rank, that is, when the signal estimate is the entire original series, and the RMSE is equal to the root of the mean squared time series values. All errors depicted in the plot are on a relative scale, i.e., divided by the baseline RMSE. Accordingly, the value 1 corresponds to a signal estimate equal to the original series; in a sense, this represents the most unfavorable scenario.
Figure 4 also presents the optimal case, which corresponds to the rank associated with the lowest average error. The lines exhibit minimal divergence at the plateaus, indicating stable rank detection, while diverging at the transition periods. It is evident that the error of both criteria exceeds the minimum error at the transitions. However, the error of the MDL criterion is slightly smaller than that of TRMAT, which is consistent with the results presented in Figure 3.
Let us include MGN in the consideration. When the MGN criterion is applied, we obtain an MGN signal estimate and use it for calculating the MSE of the signal estimate, in particular, for finding the optimal ranks. Figure 5, which depicts the mean of the rank estimates as a function of the noise level, shows that MGN overestimates the rank at the plateaus and is thus much closer to the optimal rank at the transitions than the other criteria, which estimate the rank accurately at the plateaus and underestimate it at the transitions. Note that the optimal ranks depicted in Figure 5 may differ from those presented in Figure 3. This discrepancy arises because, for the MGN criterion, the signal estimate is obtained by the MGN method rather than by SSA. A comparison of the two figures reveals that the optimal MGN ranks do not exhibit a transition range of noise levels, whereas the optimal SSA ranks include noise levels corresponding to an intermediate rank of 3. Consequently, it is possible to calculate both the minimal mean squared errors (MSE) using the MGN estimates and the minimal MSEs using the SSA estimates. The former are naturally smaller, given that the MGN estimates are obtained by the least-squares method.
As a consequence of rank overestimation, the resulting RMSE for the MGN-estimated rank is larger at the plateaus than the minimal MGN error. Therefore, at the plateaus, the MGN criterion provides the same level of accuracy (Figure 6) as TRMAT, with smaller errors at the transitions. Figure 6 also presents a hybrid scenario in which the criterion is TRMAT and the signal estimation is conducted using MGN. This combination results in the smallest errors, as TRMAT determines the rank with greater precision and MGN generates a more precise estimate of the signal.
4.2.2. Logarithmic Signal
As an example of a signal that is not low-rank, consider the signal in the form of a logarithmic series (14). As the signal is not low-rank, it is not possible to determine a proper rank; however, the selection of an appropriate model order can be discussed.
Figure 7 depicts the mean estimated ranks and the optimal ranks (model orders) giving the smallest root of the average MSE over 1000 realizations. As the noise level increases, the optimal rank values decrease from four to zero. The plateaus are relatively short, and there are significantly more transition regions than in the previous example involving a finite-rank signal. Figure 8 illustrates the dependence of the root mean square error (RMSE) on the noise level.
In this example, we consider the same noise levels as in Section 4.2.1. In contrast with the previous example, the first noise level falls in the transition period between ranks 3 and 2, the second is almost on a plateau (rank 1), the third lies exactly on a plateau (rank 1), and at the fourth, the signal is not detected. Table 2 is consistent with this description of the noise levels: in the first row, the accuracy of the criteria is generally lower; in the second row, it is higher; and in the third and fourth rows, it is high. It can be seen that the TRMAT method produces the best result, providing a good match of ranks in the transition periods as well.
In the preceding example, the TRMAT and MDL criteria exhibited comparable accuracy, with MDL demonstrating a slight advantage at the transition sections of the noise levels. For the considered signal that is not low-rank, the MDL criterion exhibits instability, while the TRMAT criterion demonstrates a notable advantage. These observations are illustrated in Table 2 and Figure 8.
The MGN criterion is now incorporated into the comparison (see Figure 9 and Figure 10). One can see that the error lines for TRMAT and MGN are intertwined, indicating no clear advantage of one criterion over the other. In this instance, the combination of the TRMAT criterion for rank detection with the MGN signal estimation at the obtained rank does not result in any improvement and is thus not depicted in the graph.
4.3. Red Noise
In this section, we examine the more complicated case of red noise. Since there is no efficient implementation of the MGN method for red noise with unknown autoregression parameters, we do not consider it in this section.
In this study, we consider the red noise process in the form ε_n = φ ε_{n−1} + δ_n, where the innovations δ_n are Gaussian with zero mean and variance σ², independent among themselves and of the past values; hence, Var(ε_n) = σ²/(1 − φ²). Accordingly, when calculating the baseline RMSE, it is necessary to normalize by the factor corresponding to Var(ε_n) = σ²/(1 − φ²) in order to obtain 1 for the maximum relative RMSE.
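A hedged sketch of generating such red noise with a stationary start, so that Var(ε_n) = σ²/(1 − φ²) holds from the first observation:

```python
import numpy as np

def red_noise(N, phi, sigma2, rng=None):
    """AR(1) noise eps_n = phi * eps_{n-1} + delta_n with Gaussian innovations."""
    rng = rng or np.random.default_rng()
    eps = np.empty(N)
    eps[0] = rng.normal(scale=np.sqrt(sigma2 / (1 - phi ** 2)))  # stationary start
    for n in range(1, N):
        eps[n] = phi * eps[n - 1] + rng.normal(scale=np.sqrt(sigma2))
    return eps
```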
4.3.1. Sum of Two Sinusoids
In this section, we examine the example with the signal defined in (13). We recall that the rank of this signal is 4. As with white noise, an increase in the variance results in the second sinusoid becoming mixed with the noise; as the variance continues to increase, after a period of uncertainty, the optimal rank becomes equal to 2 and then to 0. As before, 20 values of σ² are considered, ranging from 0.01 to 100 in equal logarithmic steps.
Figure 11 depicts the mean estimated ranks and the optimal ranks giving the smallest root of the average MSE over 1000 realizations. As demonstrated in Figure 12, the methods perform very similarly. At the plateaus, the criteria accurately determine the rank of the signal, confirming their capacity to estimate the rank correctly. However, at the transition from model order 2 to 0, both methods demonstrate a notable decline in performance, with a pronounced tendency to underestimate the rank.
4.3.2. Logarithmic Signal
As an example of a signal that is not low-rank, consider a series with the signal (14). In this case, we use the version of the criteria with the stationarity check (Algorithm 1, CHECKSTAT = TRUE). Figure 13 depicts the mean estimated ranks and the optimal ranks giving the smallest root of the average MSE over 1000 realizations. Here, the superiority of TRMAT is clearly evident at relatively low noise levels, as illustrated in Figure 14. At high noise levels, however, both methods perform poorly, largely due to a significant underestimation of the ranks.
4.4. Zero Signal
In the case of white noise, all three methods (TRMAT, SVD, and MDL) explicitly indicate that the rank is equal to zero. The noise level plays no role in this particular scenario.
In the case of red noise, it is recommended to use the criterion version without the additional stationarity check of the residual, since the signal is stationary. In this case, the three methods TRMAT_AR, SVD_AR, and MDL_AR almost always yield a rank estimate equal to 0.
6. Summary and Discussion
In this paper, we considered a variety of criteria for determining the model order (signal rank) in SSA, including non-conventional cases, namely, signals that are not low-rank or that are strongly mixed with noise. The criteria considered were ESTER, SAMOS, MGN, MDL, SVD, and TRMAT (the latter two were proposed by us).
The ESTER and SAMOS criteria appeared to be unsuitable for determining the model order in the majority of the considered cases. While the information criteria can be employed, none of them has a comprehensive theoretical justification for the considered problem statement. Such a justification is also unlikely to be obtained, due to the overly general formulation of the problem, which allows for signals that are not low-rank and for high noise levels.
An exception is the MGN criterion, which is theoretically justified in the case of a low-rank unmixed signal in the presence of white noise, since it numerically searches for the maximum likelihood estimate (MLE) of the signal. The MGN method, used in the MGN criterion for signal estimation, is computationally expensive, even though its implementation is as fast as possible. The optimization method used is local, as are many others, and can converge to a local extremum, leading to an excessive estimate of the standard error and thus to an overestimation of the rank by the MGN criterion. In general, overestimation may not be a significant issue, as the higher the number of decomposition components, the smaller their contribution. However, the high computational cost represents a significant obstacle to the application of the MGN criterion, even in the case of a low-rank signal.
In the case of SSA, when the singular value decomposition is performed once and the signal estimate is generally not a low-rank series, three variants, MDL, SVD, and TRMAT, were considered in the BIC version, since the penalty from the AIC criterion leads to a significant overestimation of the signal rank. In this case, the MSE of the signal estimation (an estimate of σ²) used in the information criteria is based not on the time series decomposition after diagonal averaging, but on the estimated noise matrix before hankelization. In our proposed TRMAT version, the Hankel structure of the trajectory matrix is accounted for using weights.
Numerical studies have demonstrated that for relatively simple cases, such as a noisy sum of sinusoids with a low noise level, the methods yield approximately the same results. A slight advantage of the MDL method can be observed in transition regions where the optimal order of the model changes. In the case of a signal that is not low-rank, such as a logarithmic signal, the proposed TRMAT criterion is preferred. In both cases, the MGN criterion is comparable to TRMAT.
Let us turn to the issue of computational cost. In the case of white noise, the SVD and MDL methods have the same cost as SSA itself. The TRMAT method requires an additional computation that can be implemented at the same asymptotic cost; consequently, TRMAT is only slightly more expensive. The MGN method is significantly more expensive, as demonstrated in [21], and the number of iterations required for convergence can be considerable.
In order to apply the given criteria to the case of red noise, the well-known technique of noise whitening was employed. Due to the linearity of the model, the approach reduces to multiplying the estimated variance by (1 − φ²), where φ is the AR(1) coefficient.
In the case of an incorrect model, φ can be estimated by an irrelevant value, which may lead to an incorrect estimate of the signal rank. The proposed approach is as follows: if the series has a trend, one can first test the residual of the extracted signal of rank r for stationarity and, if the hypothesis is rejected, exclude this value of the rank from the candidates for the rank estimate. Numerical experiments have shown that the stationarity check improved the accuracy in the considered examples. However, in the absence of a trend, such a check worsens the rank estimation. For the TRMAT method, the estimate of φ can be improved using the same weights, induced by the Hankel structure of trajectory matrices, as in the TRMAT method itself.
We do not discuss the costs of the methods for the red noise case, because estimating φ and checking the stationarity of the residuals can be conducted in different ways, and it is currently difficult to choose the best one.
A general recommendation on the choice of a criterion, based on the numerical experiments, is as follows: taking into account the considerable time consumption associated with MGN, TRMAT can be recommended for both the white and red noise cases.
In addition, the paper presented an approach to comparing methods based on the relative MSE as a function of the noise variance, with the range of noise levels divided into plateaus and transitions. This provides a structured framework for interpreting the comparison results; without it, the results were less organized and harder to interpret.
The application of the methods to real-world data sets demonstrated satisfactory outcomes, with the suggestion that a variance-stabilizing transformation should be performed.