The Decomposition and Forecasting of Mutual Investment Funds Using Singular Spectrum Analysis

Singular spectrum analysis (SSA) is a non-parametric method that breaks down a time series into a set of components that can be interpreted and grouped as trend, periodicity, and noise, emphasizing the separability of the underlying components and separate periodicities that occur at different time scales. The original time series can be recovered by summing all components. However, only the components associated to the signal should be considered for the reconstruction of the noise-free time series and to conduct forecasts. When the time series data has the presence of outliers, SSA and other classic parametric and non-parametric methods might result in misleading conclusions and robust methodologies should be used. In this paper we consider the use of two robust SSA algorithms for model fit and one for model forecasting. The classic SSA model, the robust SSA alternatives, and the autoregressive integrated moving average (ARIMA) model are compared in terms of computational time and accuracy for model fit and model forecast, using a simulation example and time series data from the quotas and returns of six mutual investment funds. When outliers are present in the data, the simulation study shows that the robust SSA algorithms outperform the classical ARIMA and SSA models.


Introduction
Mutual investment funds provide management services to institutional and individual investors, besides great liquidity for financial investments made in them and low transactional costs [1,2]. These funds can be of fixed or variable income and allow to diversify the assets while reducing unsystematic risk. Fixed income mutual investment funds are of low risk, whereas variable-income mutual investment funds vary in terms of risk but also in terms of returns. In this study, we were interested in analyzing the quotas and returns of six of the largest Brazilian based mutual investment funds-three purely based on stocks: (i) Alaska Black, (ii) APEX Long Biased, and (iii) Brasil Capital; and three balanced funds (usually combining a stock component, a bond component, and sometimes a money market component in a single portfolio): (iv) ADAM Strategy, (v) Gavea Macro, and (vi) SPX Nimitz.
A natural framework for analyzing mutual investment funds, due to its underlying structure, is a time series method. L 1 norm [24] instead of the L 2 norm used in the classical algorithm, which they used for model fit.
In this paper we go one step further than [23] and propose a new robust algorithm for SSA that considers the SVD based on the Huber function [25]. Moreover, we propose two robust SSA forecasting algorithms, one based on the the L 1 norm and another based on the Huber function. Comparisons are made between the classical SSA algorithm, the robust SSA algorithm based on the L 1 norm (RLSSA), the robust SSA algorithm based on the Huber function (RHSSA), and the classical autoregressive integrated moving average (ARIMA) model, in terms of computational time and accuracy for model fit and model forecast. These comparisons for decomposing and forecasting time series were done by considering a simulation example and the six mutual investment funds mentioned above.
The rest of this paper is organized as follows. Section 2 provides the materials and methods containing the data description, a brief introduction to the ARIMA and SSA methodologies, and the details of the proposed robust SSA algorithm that uses the SVD based on the Huber function. Section 3 presents the results and discussion, wherein the ARIMA, SSA, and robust SSA algorithms are compared in terms of model fit and model forecast, using the six mutual investment funds and the simulation example. The paper closes in Section 4, wherein some conclusions are drawn.

Data
In this paper we consider a dataset that includes daily observations of six mutual investment funds, three based purely on stocks and three balanced funds:  The datasets were collected from https://infofundos.com.br/carteira.

ARIMA Model
The autoregressive integrated moving average (ARIMA) models are among the most widely used techniques for time series analysis and forecasting. Such a model depends on three parameters: p is the number of lagged observations in the model, i.e., the autoregressive (AR) order; d is the number of times that the original observations are differenced, i.e., the integrated (I) degree; and q is the size of the moving average window, i.e., the order of the moving average (MA) [26]. This parametric model can then be written as ARI MA(p, d, q), with p, d, and q non-negative integers. Given a time series Y N = y 1 , . . . , y N , the ARI MA(p, d, q) model can be written as: where φ 1 , . . . , φ p are the parameters or coefficients of the p autoregressive terms; B is the time lag operator, or backward shift, which is a linear operator denoted by B k such that L k y t = y t−k , t ∈ Z; y t is the observation at the time point t; c = µ(1 − φ 1 − · · · − φ p ); µ is the mean of (1 − B) d y t ; β 1 , . . . , β q are the parameters or coefficients of the q moving average terms; and ε t is an error term, usually white noise with variance σ 2 . Alternatively, the model can be written as: which is the parametization used in the "arima" function of the software R [27].

Singular Spectrum Analysis
Singular spectrum analysis is a non-parametric technique for model fit and model forecasting that decomposes a time series into a number of components that are summed and interpreted as trend, periodicity, and noise. Similarly to many other time series techniques, SSA can be used for solving a wide range of problems, some of the most relevant being its ability to smooth the original time series, and to separate the signal (i.e., trend and oscillatory components with different amplitudes) from the noise components. Therefore, SSA can be used to analyze and reconstruct smoother noise-free time series that can then be used for model forecasting.
SSA is divided into two interconnected stages: decomposition and reconstruction of the time series. These stages are divided into two sets each, forming a total of four steps: embedding, singular value decomposition (SVD), grouping, and diagonal averaging. The complete algorithm for model fit is described in the following sub-section. Further details can be found in, e.g., [5,6,28].

Decomposition
In the first stage, the (univariate) time series is converted into a high-dimensional matrix called a trajectory matrix, which is then decomposed into the sum of rank-one matrices based on the SVD.
(1) Embedding: Consider a non-zero time series Y N = {y 1 , . . . , y n } with size N > 2. Let L(1 < L < N) be an integer value called window length and K an integer such that the trajectory matrix includes all values; i.e., K = N − L + 1. The embedding step is achieved by mapping the original time series into a sequence of K vectors with length L: Then, the trajectory matrix X, that includes the vectors Y i , i = 1, . . . , K, in its columns can be written as: (2) Singular value decomposition: Let S = XX T , U 1 , . . . , U L be the eigenvectors of S, and λ 1 ≥ · · · ≥ λ L , its corresponding eigenvalues. If d is the number of non-null eigenvalues of S, and considering V i = X T U i √ λ i , we can decompose the trajectory matrix X as: The decomposition stage can be accomplished either by the eigendecomposition of X T X or by the SVD of X (X = UDV T , D = diag( √ λ 1 , . . . , √ λ d )). A comparison between both decompositions can be found in [29].

Reconstruction
In the second stage, after a separating signal from noise components, a diagonal averaging procedure is conducted in the matrices associated to the signal resulting into the sum of time series components that can then be interpreted as trend or oscillatory components: (1) Eigentriple grouping: This step consists of identifying the first r eigentriples associated with the signal and discarding the d − r eigentriples associated with the noise. Formally, let I = 1, . . . , r and I c = r + 1, . . . , d. The goal of this step is to choose I such that the trajectory matrix can be written as: where is the noise term. The number of eigentriples to conduct the reconstruction is often decided based on w-correlations. We shall say that two series Y (1) and Y (2) are approximately separable if all correlations between the rows and the columns of the corresponding trajectory matrices obtained from series Y (1) and Y (2) are close to zero. In [5] they considered other characteristics of the quality of separability; namely, the weighted correlation or w-correlation, which is a natural measure of deviation of two series Y (1) T and Y (2) T from w-orthogonality: where Y If the absolute value of the w-correlation is small, the two series are almost w-orthogonal. If the absolute value of the w-correlation is large, the series are far from being w-orthogonal and are, therefore, badly separable. Further explanation and intuition about this measure can be found in [5,28]. Other proposals for this choice were proposed by, e.g., [30,31].
(2) Diagonal averaging: In this step, using anti-diagonal averaging on the matrices included in X I , the noise-free time series is reconstructed. First, the approximate trajectory matrix X I is transformed into a Hankel matrix. Let A s = {(l, k) : l + k = s, 1 ≤ l ≤ L, 1 ≤ k ≤ K} and #(A s ) be the number of elements in A s . The element x ij of the new Hankel matrix X is given by: Next, the Hankel matrix X I is transformed into a new series of dimension N, and the original time series Y N can be approximated by: where j = i − L + 1. The reconstructed noise-fee time series can then be used for out-of-sample forecasting.

Robust SSA
Despite knowing that SSA has shown to be superior to traditional model-based methods in many applications, the singular value decomposition (second step of the SSA algorithm) is highly sensitive to data contamination with outliers. Very few studies were made in order to access effects of outliers in SSA and to generalize this methodology [21,22]. A first attempt to robustify the SSA by considering an SVD based on a robust L 1 norm [24] instead of the L 2 norm used in the classical algorithm, was proposed by [23]. That robust generalization was compared with the classical SSA algorithm for model fit by these authors. In this subsection we review that robust SSA algorithm proposed by [23] and propose a new robust algorithm for SSA that considers the SVD based on the Huber function [25] and also propose an algorithm for robust SSA model forecasting. While the robust algorithms based on the L 1 norm are very popular, they have difficulties in handling heavy tail outliers. The robust algorithms based on the Huber function combine the sum of squares loss and the least absolute deviation loss, that is, a quadratic on small errors, but grows linearly for large errors. As a result, the Huber loss function is not only more robust against outliers but also more adaptive for different types of data [32]. Further details and comparisons between the L 1 and Huber loss functions, among others, can be found in [33]. The R source code is available upon request from the first author of this paper. The robust SSA algorithm proposed by [23] replaces the classical SVD based on the least squares L 2 norm, by the robust SVD algorithm based on the L 1 norm [24]. This robust SVD is performed iteratively, starting with an initial estimate of the first left singular vector U 1 and leading to an outlier-resistant approach that also allows for missing data. The robust SVD based on the L 1 norm is implemented under the function "robustSVD()" from the R package "pcaMethods".

Robust SSA based on the Huber Function
Here we propose a new alternative to robustify the SSA algorithm, where the least squares SVD in the step two is replaced by the robust SVD based on the Huber function [25]. The Huber loss function [34] can be defined as: where δ is a parameter that controls the robustness level, and a smaller value of δ usually leads to more robust estimation. The robust SVD based on the Huber function is a special case of robust regularized SVD and can be obtained with the function "RobRSVD" of the "RobRSVD" R package, in the following way: RobRSVD (data, rough = TRUE, uspar = 0, vspar = 0). In this R implementation, the authors consider δ = 1.345, the value commonly used in robust regression that produces 95% efficiency for normal errors [35]. However, numerical studies suggested that the RobRSVD function is not very sensitive to the choice of δ [25]. More details about this robust SVD can be found in [25].

Robust SSA Forecasting Algorithm
The standard recurrent SSA forecasting algorithm assumes that a given observation can be written as a linear combination of the L − 1 previous observations [5,6,30]. The coefficients of those linear combinations in the classical SSA forecasting algorithm are obtained based on the left singular vectors, U, of the trajectory matrix X. This is valid for SSA because of the orthogonality of the vectors in U and of the full rank decomposition of X, which is not the case for the robust SVD algorithms because of their construction and specific properties. To overcome this limitation for the robust SSA algorithms and to be able to obtain out-of-sample forecasts using a robust SSA algorithm, a three stages approach can be conducted: (i) Use the robust SSA algorithm to obtain a robust approximation for the signal in the trajectory matrix; i.e., conduct the two stages of the robust SSA algorithms, decomposition (using the robust SVD algorithm) and reconstruction, to obtain the noise free (i.e., the signal) trajectory matrix X; (ii) Apply the standard SVD to the matrix X obtained in (i) and obtain U ∇ j , the vector of the first L − 1 components of U j and π j , the last component of the vector U j , j = 1, · · · , r. Then, we can write the coefficient vector a as a = ( a L−1 , · · · , a 1 ) where γ 2 = ∑ r j=1 π 2 j . (iii) The h-steps-ahead out-of-sample recurrent robust SSA forecasts y N+1 , . . . , y N+h , can be obtained as for t = 1, · · · , N ∑ L−1 j=1 a j y t−j , for t = N + 1, · · · , N + h whereỹ 1 , . . . ,ỹ N , are the fitted values for the reconstructed time series, as obtained from the robust SSA algorithm in (i).

Accuracy Measures
There are several methods and measures for assessing model accuracy based on the behavior of model errors. Here, there are two types of errors: • Sample errors, called tuning errors; • Out-of-sample errors, called forecast errors.
Typically, the root mean squared error (RMSE) is used as a criterion for accessing the precision of a model. The RMSE to investigate the quality of the model fit can be written as: where y t are the observed values and y t the fitted values by the considered model/algorithm (i.e., ARIMA, SSA, robust SSA).
To investigate the forecasting accuracy, let us assume that the last g observations are used as a reference (i.e., as test set). Let N 0 = N − h − g. The RMSE to investigate the quality of the forecasting model can be written as: where y t are the last g observed values and y t the respective h-steps-ahead forecast values.

Results and Discussion
In this section, comparisons are made between the classical ARIMA model, the classical SSA algorithm, and the robust SSA algorithms, in terms of computational time and accuracy for model fit and model forecast. These comparisons for decomposing and forecasting time series are done by considering a simulation example and the time series of six mutual investment funds. Table 1 shows the descriptive statistics for the six mutual investment funds, including the minimum, maximum, and mean returns, being clear that Alaska Black is the fund that shows the largest variation and with the highest mean daily return. On the other end there are Gavea Macro and SPX Nimitz, which show the smallest variations among the considered funds, and low mean returns.
In addition to the descriptive measures, Figure 1 shows the behavior of the six investment funds over time. From these plots, it is possible to observe that all funds have an overall growing tendency, with similar patterns for Gavea Macro and SPX Nimitz.

Model Fit
The models/algorithms under comparison for model fit are: (i) ARIMA, (ii) SSA, (iii) robust SSA based on the L 1 norm (RLSSA), and (iv) robust SSA based on the Huber function (RHSSA).
The parameters of the ARIMA model for each of the six mutual investment funds were estimated with the function "auto.arima" from the R package "forecast" [36].
For the SSA and robust SSA algorithms, there are two choices to be made by the researcher: (i) the window length L; and (ii) the number of eigentriples used for reconstruction r. Three values of L were chosen for each time series, as defined in Table 2-L 1 = N/20, L 2 = N/2, and L p -being the L p obtained from the periodogram, based on the largest cycle for each time series [37] (i.e., about one trimester for ADAM Strategy, one semester for Alaska Black, one year for APEX Long Biased, one quadrimeter for Brasil Capital, one quadrimeter for Gavea Macro, and one quadrimester for SPX Nimitz), and N being the time series length. The choice of the number of eigentriples used for reconstruction r, for each of the considered window lengths and each of the time series, was done by taking into consideration the the w-correlations among components [5]. Figure 2 shows the w-correlation matrices for each of the six mutual investment funds, considering an window length L = N/20, and Figure A1 of the appendix shows the w-correlation matrices for each of the six mutual investment funds, considering an window length L = N/2. The w-correlation matrices can be obtained with the function "wcor" of the R package "Rssa" [38] and the number of eigentriples r should be chosen in order to maximize the separability between signal and noise components; i.e., maximize the w-correlation among signal components, maximize the w-correlation among noise components, and minimize the w-correlation between signal and noise components. A summary of the number of eigentriples used for the reconstruction of each time series for each of the window length considered can be seen in Table 2.
Since one of the objectives in SSA is to decompose the original time series into interpretable components such as trend and seasonality, plus the noise component that is then discarded, Figure 3 shows the original time series for the Alaska Black mutual investment fund, its trend component (sum of individual trend components), its seasonal component (sum of individual seasonal components), and its residuals (sum of the remaining components associated to noise), considering an window length L = N/20 = 33 and r = 12 eigentriples for reconstruction. Similar SSA decompositions for ADAM Strategy, APEX Long Biased, Brasil Capital, ADAM Strategy, Gavea Macro, and SPX Nimitz-considering the values of window length L 1 and r 1 eigentriples used for reconstruction, as defined in Table 2-can be found in Figures A2-A6 of the appendix, respectively.   In order to evaluate and compare the ability for model fit using the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), the root mean square error (RMSE) was calculated for each time series. Table 3 shows the RMSE for model fit by each of the four models applied to each of the six mutual investment funds, considering a window length L = N/2 ( Table 2). Table 4 shows the RMSE for model fit by each of the four models applied to each of the six mutual investment funds, considering a window length L = N/20 (Table 2). Table 5 shows the RMSE for model fit by each of the four models applied to each of the six mutual investment funds, considering a window length obtained based on the largest cycle for each time series (Table 2). From the analyzes of these tables, we can conclude that the ARIMA model shows an overall better performance when the window length in the SSA related algorithms is set to be half of the time series (Table 3). However, when the window length is set to be L 1 = N/20 or L p (i.e., equal to the length of the largest cycle), the classical SSA provides the best results, while the ARIMA model and the robust SSA algorithms alternate for the second best performances. For all choices of window length, the two robust SSA algorithms behaved similarly. Table 3. Root mean square error for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L 2 = N/2 and considering r 2 engentriples for reconstruction as defined in Table 2.  Table 4. Root mean square error for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L 1 = N/20 and considering r 1 engentriples for reconstruction as defined in Table 2.  Table 5. Root mean square error for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L p (i.e., the length of the largest cycle) and considering r p engentriples for reconstruction as defined in Table 2.  Tables 6-8 show the computational times for each combination of model/algorithm and mutual investment fund, as presented in Tables 3-5, respectively. From the analyzes of these tables, we can conclude that the best performance was obtained by the ARIMA and SSA algorithms. The computational time, for the classic and robust SSA algorithms, increases with the increase of the length L. Moreover, for larger trajectory matrices (i.e., considering L = N/2) the robust SSA algorithm based on the Huber function has a lower computational time than the robust SSA algorithm based on the L 1 norm (Table 6). However, when the trajectory matrices are more rectangular (i.e., considering L = N/20, Table 7, or L = L p , Table 8), the robust SSA algorithm based on the L 1 norm has a much lower computational time (comparable to the ARIMA and SSA computational times) than the robust SSA algorithm based on the Huber function). Table 6. Computational time, in minutes, for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L 2 = N/2 and considering r 2 engentriples for reconstruction as defined in Table 2.  Table 7. Computational time, in minutes, for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L 1 = N/20 and considering r 1 engentriples for reconstruction as defined in Table 2.  Table 8. Computational time, in minutes, for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm (RLSSA), and robust SSA based on the Huber function (RHSSA), for the window length L p (i.e., the length of the longest cycle) and considering r p engentriples for reconstruction as defined in Table 2.  Figure 4 shows the original time series and the model fit by the SSA model with L = N/20 and by the ARIMA model. We can confirm that both fits are almost overlapped and very near to the original time series, which was expected from the small RMSE showed in Table 4.

Model Forecasting
In this section we compare the forecasting abilities of ARIMA, SSA with L = N/2, SSA with L = N/20, SSA with L = L p based on the largest cycle for each time series, and robust SSA based on the L 1 norm with L = N/20 and L p . The decision for not considering the robust SSA algorithm based on the Huber function was because of its similarity in terms of RMSE with the robust SSA based on the L 1 norm (Tables 3-5) and the much higher computational time (Tables 6-8). A similar argument was considered for not presenting the results for the robust SSA algorithm based on the L 1 norm with L = N/2. Table 9 shows the RRMSE for model forecasting for each of the six mutual investment funds, considering each of the four models, ARIMA, SSA with L = N/2, SSA with L = N/20, SSA with L = L p , and robust SSA based on the L 1 norm (RLSSA) with L = N/20 and L p , considering the window length and engentriples used for reconstruction as defined in Table 2. These values were obtained based on the forecasting of the g = 12 observations from each time series, obtained for one, five, and ten steps ahead out-of-sample forecast; i.e., one day ahead, one week ahead, and two weeks ahead.
The overall best performance was obtained with the classic SSA algorithm that considers a lower value for the window length, either L = N/20 or L = L p , followed closely by ARIMA and the robust SSA algorithm based on the L 1 norm. The ARIMA model obtained the best performance in three cases for one-step-ahead forecasting, and the robust SSA algorithm based on the L 1 norm with L = N/20 yielded the best performance in a couple of time series for five-steps-ahead forecasting.
As expected, the RMSE shows an overall increase when increasing the number of steps ahead to be forecast. A possible justification for the similarity between the SSA and robust SSA algorithm can be explained by the possible lack of outliers in the data. Table 10 shows the computational time for model forecasting for each of the six mutual investment funds, considering each of the five models shown in Table 9. As expected, after analyzing the computational times for model fit (Tables 6-8), the best performance in terms of computational time for model forecasting was obtained by the the ARIMA and SSA (with lower values for the window length) models and the worse by the robust SSA algorithm based on the L 1 norm. Table 9. Root mean square error for model forecasting for each of the six mutual investment funds, considering the models ARIMA, SSA with L = N/2, SSA with L = N/20, SSA with L p , robust SSA based on the L 1 norm (RLSSA) with L = N/20, and RSSA with L p , and their respective engentriples, as defined in Table 2.  Table 10. Computational time, in minutes, for the model for each of the six mutual investment funds, considering the models ARIMA, SSA with L = N/2, SSA with L = N/20, SSA with L p , robust SSA based on the L 1 norm (RLSSA) with L = N/20, and RSSA with L p , and their respective engentriples, as defined in Table 2.

Simulation Example
To verify the hypothesis raised in the previous subsection that the similarity between the results from SSA and the robust SSA algorithm can be due to the lack of outliers in the time series, in this subsection we present a simulation example where the methods are compared while analyzing a time series contaminated with outlying observations. The synthetic data were obtained by generating random values from the following function, and then we transformed them into a time series (right-hand plot in Figure 5): f (t) = exp{0.02t + 0.5 sin(2πt/5)} + , t = 1, ..., 100, where is the noise generated from the N(0, 0.1). A total of 100 simulated time series were considered. The data contamination, for illustration purposes, was made by considering additive outliers and magnitude increase outliers in the following way:

•
Additive outliers: 2%, 5%, and 10% of the time points y i are randomly chosen to be replaced by 2 + y i ; i.e., the values of y i are increased by a constant value of 2, resulting in a mild contamination scenario (e.g., (left-hand plot in Figure 5)); • Magnitude increase: 2%, 5%, and 10% of the time points y i are randomly chosen to be replaced by 5 × y i ; i.e., the time point magnitude of y i is increased by a factor of 5, resulting in an a quite extreme contamination scenario (e.g., central plot in Figure 5). Table 11 shows the mean of the root mean square errors for model fit, computed for each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm, and robust SSA based on the Huber function, for the simulated data, based on 100 runs, using L = 24 and r = 5, and considering both contamination scenarios with 2, 5, and 10% outliers. As expected, when there is no data contamination, the classic SSA model is the most appropriated. For the mild contamination scenario with additive outliers, the robust SSA algorithms outperform both ARIMA and SSA models, the better performance being more evident when the percentage of the outliers increases. For the more extreme contamination scenario with multiplicative outliers, a similar patters was obtained, the RLSSA being the best robust algorithm, in this simulation example.
Appendix B includes a second simulation scenario where robust SSA algorithm based on the Huber function (RHSSA) outperforms the classic ARIMA and SSA models and the robust SSA algorithm based on the L 1 norm (RLSSA). Table 12 shows mean of the root mean square errors for model forecasting (M = 1, 5, 10 stepsahead), computed for each of ARIMA, SSA, and robust SSA based on the L 1 norm, for the simulated data, based on 100 runs, using L = 24 and r = 5. The results for the robust SSA based on the Huber function were not included because of their computational cost and out-performance when compared with the robust SSA based on the L 1 norm. Again, as expected, the SSA model yielded the best performance for no data contamination. For scenarios with data contamination, the best performance was obtained by the robust SSA forecasting algorithm, with a very large decrease in RMSE in many scenarios. Table 11. Mean of the root mean square errors for model fit, computed for each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm, and robust SSA based on the Huber function, for the simulated data, based on 100 runs, using L = 24 and r = 5.

Conclusions
In this paper we considered the problem of model fit and model forecasting in time series. In particular, we analyzed six mutual investment funds. Following up on [23], who proposed a robust SSA algorithm by replacing the standard least squares SVD by a robust SVD algorithm based on the L 1 norm [24] for model fit, we proposed another robust SSA algorithm where the robust SVD based on the Huber function is considered [25]. Moreover, we propose a forecasting strategy for the robust SSA algorithms, based on the linear recurrent SSA forecasting algorithm.
Comparisons were made between the classical SSA algorithm, the robust SSA algorithms, and the classical ARIMA model, both in terms of computational time and accuracy for model fit and model forecast. Those comparisons were made by using daily observations of six mutual investment funds, and a synthetic data set where the time series were contaminated with outlying observations. For model fit of the six mutual investment funds, the best results were obtained for the SSA model when the window length L was set to be equal to the length of the time series divided by 20, or when the window length is defined as the length of the largest cycle in the time series. The ARIMA model and the robust SSA algorithms alternated for the second best performance. For model forecasting of the six mutual investment funds, the best overall performance was obtained for the classic SSA model considering a lower value for the window length, L = N/20 or L p , followed closely by the ARIMA model and the robust SSA algorithm based on the L 1 norm.
Based on the similarity between the results from the classic SSA model and the robust SSA algorithms, both for model fit and model forecasting, one may assume that the time series data from the six mutual investment funds had no or little data contamination. To access that hypothesis and to better illustrate the usefulness of the robust SSA algorithms, using a scenario with known and controlled outliers, a simulation study and its results were presented in this article. For both mild and and more extreme contamination scenarios, the robust SSA algorithms clearly outperformed the classical AMMI and SSA models, both for model fit and for model forecasting. Another important advantage of the robust SSA algorithms, because of their use of the robust SVD, is that they allow for missing values.
In terms of computational time, the SSA model gives the best performance, the robust algorithms being the most time consuming. A possible future development to reduce the computational time in the robust SSA algorithms is to consider a similar strategy as in [39], where a randomized SVD algorithm was used to speed up the SSA algorithm.
The usefulness of the proposed approach, regarding the forecasting case, can be assessed based on forecasting competitions (e.g., [40]) or large scale forecasting studies (see, e.g., [41]).
The methodology and results presented in this paper are of great generality and can be applied to other time series applications.

Appendix B
A second synthetic dataset was obtained by generating random values from the following function and then transforming them into a time series: f (t) = cos (2πwt + φ) + , t = 1, ..., 100, with w = 3/8, φ = π/8 and the noise generated from the N(0, 0.1) (right-hand side of Figure A7). A total of 100 simulated time series were considered.  Figure A7. Synthetic data without contamination (right), data with 5% additive outliers (left), and data with 5% multiplicative outliers (center). The vertical axes show the simulated value and the horizontal axes show the index of the simulated observation.
The data contamination was done in the same manner as described before. An example of 5% additive outliers scenario can be found on the left-hand plot of Figure A7, and an example of 5% multiplicative outliers scenario can be found on the central plot of Figure A7. The results for the root mean square errors for model fit, computed for each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm, and robust SSA based on the Huber function, can be found in Table A1. Table A1. Mean of the root mean square errors for model fit, computed for each of the four models, ARIMA, SSA, robust SSA based on the L 1 norm, and robust SSA based on the Huber function, for the simulated data, based on 100 runs, using L = 24 and r = 2.