Modelling the Behaviour of Currency Exchange Rates with Singular Spectrum Analysis and Artiﬁcial Neural Networks

Abstract: A proper understanding and analysis of suitable models involved in forecasting currency exchange rate dynamics is essential to provide reliable information about the economy. This paper deals with model fit and model forecasting of eight time series of historical data about currency exchange rates, considering the United States dollar as reference. The time series techniques: the classical autoregressive integrated moving average model, the non-parametric univariate and multivariate singular spectrum analysis (SSA), artificial neural network (ANN) algorithms, and a recent prominent hybrid method that combines SSA and ANN, are considered and their performance compared in terms of model fit and model forecasting. Moreover, specific methodological and computational adaptations were conducted to allow for these analyses and comparisons.


Introduction
Apart from other important economic indicators such as interest rates, the consumer price index, money supply and inflation, the currency exchange rate is one of the most important determinants of a country's relative level of economic health [1]. Exchange rates play a vital role in any country's level of trade, which is critical to every free market economy in the world [2,3]. No economy can operate in autarky; therefore, exchange rates are among the most analysed and governmentally manipulated economic indicators in any nation. Recently, exchange rate forecasting has become an important economic problem that is receiving increasing attention among researchers and policy makers, especially because of its practical national economic significance. A fluctuating (volatile) exchange rate might lead to an unstable economy in which it becomes difficult to predict the value of goods, services and other important economic components. Exchange rates have been shown, in the literature, to be among the most challenging and difficult economic measures to forecast accurately, because changes in exchange rates are erratic and can have drastic effects on the economy [4][5][6]. Erratic behaviour of exchange rates was also identified in the literature as one of the leading causes of economic recessions [7]. Various nations adopt different exchange rate systems based on their history and economic goals. For instance, Brazil, India and South Africa implement a free floating exchange rate system, while China and Russia adopt a system of managed floating exchange rates.
According to the International Monetary Fund (IMF), the countries that form the BRICS group (Brazil, Russia, India, China, and South Africa) have more than 25% of the world's land, 40% of the world's population and about 18.3% of global nominal output [8]. The exchange rate, as the main system for foreign exchange of a country, has become a key factor affecting the stable economic development of the BRICS countries. The BRICS nations are the fastest growing among the emerging economies of the world. However, in recent years, the exchange rates of these economies have all experienced periods of high volatility [8]. In the context of the gradual recovery of the US economy and the relatively poor economic situations in Europe, the United Kingdom and Japan, these emerging economies have been experiencing relatively stable economic growth, except for recent unpalatable global circumstances. This article is therefore based on the examination of a suitable forecasting model for predicting currency exchange rates, with special emphasis on the BRICS nations. Here, besides the currency exchange rates of the BRICS currencies with respect to the United States dollar (USD), we also consider three other powerful world currencies: the British pound (GBP), the Euro (EUR), and the Japanese yen (JPY).
The main objective of this paper is to assess the performance of classical and contemporary methods for model fit and model forecasting of currency exchange rates. In particular, we want to compare the success of recently proposed hybrid methods with classical parametric and non-parametric, univariate and multivariate methods in the context of currency exchange rates. To achieve our objectives, we consider daily exchange rate data consisting of 4240 observations each for eight currencies from 01/12/2003 to 28/02/2020 and employ time series techniques such as the autoregressive integrated moving average (ARIMA) model, the non-parametric univariate and multivariate singular spectrum analysis (SSA), artificial neural network (ANN) algorithms, and a hybrid method that combines the SSA with the ANN. Moreover, we adapt the hybrid method for model fit, as it was originally proposed for model forecasting only [9]. Comparisons are made for model fit and model forecasting by employing the root mean square error (RMSE) and the mean absolute percentage error (MAPE).
The rest of this paper is structured as follows: Section 2 presents the contextual issues, the models and methodologies about the considered models for time series model fit and model forecasting. Section 3 presents the empirical results and discussions, and Section 4 gives a short discussion and concludes the paper.

Materials and Methods
In this section, we present the data used in this study and give a brief description of the forecasting models employed in this article.

The Data
This study employs data on the daily exchange rates of eight currencies, in reference to the United States dollar (USD), spanning seventeen years between 1 December 2003 and 28 January 2020 (4240 observations). These data were obtained from www.yahoo.finance. The currencies analysed and compared (Figure 2) are: Brazilian real (USD/BRL), Russian rouble (USD/RUB), Indian rupee (USD/INR), Chinese renminbi (USD/CNY), South African rand (USD/ZAR), British pound (USD/GBP), Euro (USD/EUR), and Japanese yen (USD/JPY).

Autoregressive Integrated Moving Average (ARIMA) Model
In time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting). The autoregressive (AR) part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged or prior values. The moving average (MA) part indicates that the regression error is actually a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The "integrated" (I) part of the ARIMA model indicates that the data values were replaced with the difference between the data values and their previous values [10]. This parametric model can then be written as ARIMA(p, d, q), with p, d and q non-negative integers [11]. Given a time series Y = (y_1, ..., y_N), the ARIMA(p, d, q) model can be written as
$$\left(1 - \phi_1 B - \cdots - \phi_p B^p\right)(1 - B)^d y_t = c + \left(1 + \beta_1 B + \cdots + \beta_q B^q\right)\varepsilon_t,$$
where φ_1, ..., φ_p are the parameters or coefficients of the p autoregressive terms; B is the time lag operator, or backward shift, which is a linear operator denoted by B^k such that B^k y_t = y_{t-k}, t ∈ Z; y_t is the observation at time point t; c = µ(1 − φ_1 − ··· − φ_p); µ is the mean of (1 − B)^d y_t; β_1, ..., β_q are the parameters or coefficients of the q moving average terms; and ε_t is an error term, usually white noise with variance σ². The results presented in this paper are based on an alternative parametrization of the ARIMA model that is implemented in the arima function of the software R [12].
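The paper fits ARIMA models with R's arima/auto.arima; as a minimal, language-neutral illustration of the differencing and autoregression machinery described above, the following Python/NumPy sketch (with simulated data, not the paper's exchange-rate series) generates an ARIMA(1,1,0) process, recovers the AR coefficient from the once-differenced series by least squares, and produces a one-step-ahead forecast:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an ARIMA(1,1,0) series: its first difference is a stationary AR(1).
phi, n = 0.6, 3000
eps = rng.normal(0, 1, n)
z = np.zeros(n)                      # z_t = (1 - B) y_t, the differenced series
for t in range(1, n):
    z[t] = phi * z[t - 1] + eps[t]
y = np.cumsum(z)                     # integrate once to obtain y_t

# Recover the AR coefficient from the differenced data by least squares.
d = np.diff(y)                       # apply the backshift difference (1 - B) to y
phi_hat = np.sum(d[1:] * d[:-1]) / np.sum(d[:-1] ** 2)

# One-step-ahead forecast: y_{N+1} = y_N + phi_hat * (y_N - y_{N-1}).
y_next = y[-1] + phi_hat * (y[-1] - y[-2])
```

With a series this long, the least-squares estimate phi_hat lands close to the true value 0.6, mirroring what a full ARIMA fitting routine would return for this simple special case.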
In this study, we only consider the classical ARIMA-based models from the class of pure parametric models. However, nonparametric and ANN-based approaches are also considered. In a recent study by [13] the supremacy of ANN over ARIMA or generalized autoregressive conditional heteroskedasticity (GARCH) model for time series prediction was discussed. On the other hand, Ref. [14] compared the methods of ARIMA, ANN and fuzzy systems on 1284 daily observations of seven major currencies for five years and concluded that ARIMA gives more significant results than ANN and fuzzy systems.
In the next subsection we briefly describe the ANN that is also considered in this paper.

Artificial Neural Network (ANN)
Neurons are the main cells that make up the nervous system and are responsible for conducting, receiving, and transmitting nerve impulses throughout the body, causing it to respond to stimuli in the environment, for example. The brain is a complex network of neurons that process information through a system of several interconnected neurons. It has always been challenging to understand brain functions; however, due to advances in computing technologies, we can now program neural networks artificially [15].
Neural networks were originally developed in cognitive science and later used in engineering for pattern recognition and classification [16]. Neural networks are particularly useful because they can be used to model nonlinear behaviour in economics and financial markets, in contrast to traditional linear models which are more restrictive. They also have the capability of being able to approximate any nonlinear function and decompose "noisy" data. They proved, in some instances, to be more effective in describing the dynamics of nonstationary time series due to their unique nonparametric, noise-tolerant, and adaptive properties [17]. Over the last few decades, researchers and practitioners alike showed growing interests in applying modified versions of ANNs for time series analysis and forecasting [18]. ANNs are an effective tool to realize any nonlinear input-output mapping. It was demonstrated that with a sufficient number of hidden layer units, an ANN is capable of approximating any continuous function to any desired degree of accuracy [17]. Due to the nature of their learning process, ANNs can be regarded as nonlinear autoregressive models [19].
Artificial neural networks (ANNs) have gained tremendous popularity and use as a promising alternative technique for forecasting time series because of their several distinguishing features. The first networks developed were the Perceptron and Adaline networks, developed in the 1950s and 1960s by Rosenblatt [20] and Widrow [21], respectively. Perceptron networks were developed with the objective of recognizing images, being a model that received a set of input data and returned a single binary output. Adaline networks were developed to be used for pattern recognition, signal processing and regression. Similar to the perceptron network in that it has several inputs and only one output, the Adaline differs in that its output is not binary but given by an activation function f. Similar to the biological structure of neurons, artificial neural networks define the neuron as a central processing unit, which performs a mathematical operation that generates an output from a set of inputs [15]. The output of a neuron is a function of the weighted sum of the inputs plus the bias. The scheme of a simple artificial neural network can be seen in Figure 1. An ANN is composed of the input and output layers and the so-called hidden layers, which sit in the center of the network and, with the help of the so-called weights (W_i), bias (b) and the activation function f, convert the input data into the expected output. The weights in a neural network are the most important factor in the transformation of the input into the output, functioning similarly to the slope in linear regression. The weights here are numerical parameters that determine how strongly each neuron affects the others.
Meanwhile, the bias is like the intercept added in a linear equation: an additional parameter used to adjust the output together with the weighted sum of the neuron inputs. In each neuron there is an aggregation step, through the function
$$z = \sum_{i} W_i x_i + b,$$
where x_i are the inputs. Finally, the activation function f is applied to z. The types of neurons are differentiated by the activation function attributed to them, and in practice the three most used functions are the sigmoid function, the hyperbolic tangent and the ReLU (rectified linear unit). There is also the loss function, which is used as the minimization criterion when estimating the parameters of a neural network; the most common loss function is the sum of squared errors. The neural network model for time series was applied with the aid of the R package forecast, through the nnetar function, which generates a feed-forward neural network with a single hidden layer and lagged inputs to forecast univariate time series.
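The paper's computations use forecast::nnetar in R; the NumPy sketch below is only a hypothetical illustration of the same idea, a single-hidden-layer feed-forward network whose inputs are lagged values of the series, with untrained random weights, just to make the computation f(Wx + b) concrete:

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    """Output of a single neuron: activation f applied to z = w.x + b."""
    z = np.dot(w, x) + b
    return f(z)

def nnar_forecast(series, p, W_in, b_in, w_out, b_out):
    """One-step forecast from a feed-forward net with one hidden layer,
    fed with the last p lagged values (the structure behind nnetar)."""
    x = np.asarray(series[-p:], dtype=float)    # lagged inputs y_{t-1},...,y_{t-p}
    hidden = np.tanh(W_in @ x + b_in)           # hidden layer activations
    return float(w_out @ hidden + b_out)        # linear output unit

# Toy example with fixed (untrained) weights: 3 lags, 2 hidden units.
rng = np.random.default_rng(1)
W_in, b_in = rng.normal(size=(2, 3)), rng.normal(size=2)
w_out, b_out = rng.normal(size=2), 0.1
yhat = nnar_forecast([0.2, 0.1, 0.3, 0.4], p=3, W_in=W_in, b_in=b_in,
                     w_out=w_out, b_out=b_out)
```

In practice the weights and biases would be estimated by minimizing the loss function (e.g., the sum of squared one-step errors) over the training series, which is what nnetar does internally.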

Singular Spectrum Analysis (SSA)
There is a vast literature on SSA, a non-parametric technique for time series modelling and forecasting. SSA incorporates elements of classical time series analysis, matrix algebra, and multivariate statistics, and aims at decomposing a time series into a set of components that can be interpreted as trend, seasonal, cyclic and noise components [22][23][24][25]. This relatively new technique for time series analysis has proved to be widely useful and applicable in many fields [9,[26][27][28][29][30][31][32][33][34][35][36][37][38], with applications ranging from parameter estimation to time series filtering and forecasting.
The basic SSA method consists of three complementary stages: decomposition, reconstruction and forecasting. The first stage comprises two steps, in which the time series is decomposed into several components; in the second stage (also two steps) the noise-free time series is reconstructed; and in the third stage the reconstructed time series is used for out-of-sample forecasting. A short description of the SSA technique is given below. More information can be found in, e.g., [23][24][25][39].
First Stage: Decomposition

1st step: Embedding. Given the time series Y = (y_1, ..., y_N) and a window length L (2 ≤ L ≤ N/2), the L × K trajectory matrix, K = N − L + 1, is built with the lagged vectors of the series as columns:
$$\mathbf{Y} = [Y_1 : \cdots : Y_K], \qquad Y_j = (y_j, \ldots, y_{j+L-1})^T.$$

2nd step: Singular value decomposition (SVD). In this step, the matrix $\mathbf{Y}$ will be decomposed using the SVD as
$$\mathbf{Y} = \mathbf{Y}_1 + \cdots + \mathbf{Y}_L, \qquad \mathbf{Y}_i = \sqrt{\lambda_i}\, U_i V_i^T,$$
with $\lambda_1 \ge \cdots \ge \lambda_L \ge 0$ the eigenvalues of $\mathbf{Y}\mathbf{Y}^T$, $U_1, \ldots, U_L$ the corresponding eigenvectors, and $V_i = \mathbf{Y}^T U_i / \sqrt{\lambda_i}$.

Second Stage: Reconstruction
3rd step: Grouping. The grouping step corresponds to splitting the elementary matrices into m disjoint subsets I_1, ..., I_m, and summing the matrices within each group. In our application we will focus on m = 2, i.e., only two groups: I_1 = {1, ..., r} and I_2 = {r + 1, ..., L}, associated with the signal and noise components, respectively.

4th step: Diagonal averaging. This step transforms each matrix Y_{I_j} into a new series of length N.
Denote by $\tilde{y}^{(j)}_{m,n}$ the $(m,n)$th entry of the matrix $\mathbf{Y}_{I_j}$, $j = 1, \ldots, m$, and let $L^* = \min\{L, K\}$ and $K^* = \max\{L, K\}$. Averaging over the anti-diagonals $m + n = s + 1$ yields the reconstructed components $\tilde{y}^{(j)}_1, \ldots, \tilde{y}^{(j)}_N$:
$$
\tilde{y}^{(j)}_s =
\begin{cases}
\dfrac{1}{s} \displaystyle\sum_{m=1}^{s} \tilde{y}^{(j)}_{m,\, s-m+1}, & 1 \le s < L^*,\\[6pt]
\dfrac{1}{L^*} \displaystyle\sum_{m=1}^{L^*} \tilde{y}^{(j)}_{m,\, s-m+1}, & L^* \le s \le K^*,\\[6pt]
\dfrac{1}{N-s+1} \displaystyle\sum_{m=s-K^*+1}^{N-K^*+1} \tilde{y}^{(j)}_{m,\, s-m+1}, & K^* < s \le N.
\end{cases}
$$
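The four steps above (embedding, SVD, grouping, diagonal averaging) can be sketched in a few lines of NumPy; this is an illustrative implementation on made-up data (a noisy sine), not the paper's Rssa-based code:

```python
import numpy as np

def ssa_reconstruct(y, L, r):
    """Basic SSA: embed, SVD, keep the leading r eigentriples, diagonal-average."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    # 1st step: L x K trajectory (Hankel) matrix, columns are lagged vectors.
    Y = np.column_stack([y[j:j + L] for j in range(K)])
    # 2nd step: singular value decomposition.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # 3rd step: grouping -- keep the r leading elementary matrices (the signal).
    S = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    # 4th step: diagonal averaging back to a series of length N.
    out = np.zeros(N)
    counts = np.zeros(N)
    for m in range(L):
        for n in range(K):
            out[m + n] += S[m, n]
            counts[m + n] += 1
    return out / counts

# Noisy sine: the two leading eigentriples capture the oscillation.
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20)
rng = np.random.default_rng(2)
noisy = signal + rng.normal(0, 0.3, len(t))
smooth = ssa_reconstruct(noisy, L=40, r=2)
```

Keeping all L eigentriples returns the original series exactly, while keeping only the leading pair removes most of the noise, which is the separation the grouping step aims for.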

Third Stage: Forecasting
Two main algorithms for out-of-sample forecasting in the context of SSA are available: the recurrent SSA forecasting algorithm [23,40,41], and the vector SSA forecasting algorithm [23,42,43]. Here we will be interested in the recurrent SSA forecasting algorithm, which is briefly described below.
The basic requirement to obtain SSA out-of-sample forecasts is that the time series Y = (y_1, ..., y_N) satisfies a linear recurrent formula, i.e., each observation can be written as a linear combination of the previous d observations:
$$y_t = a_1 y_{t-1} + \cdots + a_d y_{t-d}, \qquad t = d+1, \ldots, N.$$
Let us assume that $U_j^{\nabla}$ is the vector of the first $L-1$ components of the eigenvector $U_j$ and $\pi_j$ is the last component of $U_j$ ($j = 1, \ldots, r$). Denoting $\upsilon^2 = \sum_{j=1}^{r} \pi_j^2$, we define the coefficient vector $R$ as
$$R = \frac{1}{1 - \upsilon^2} \sum_{j=1}^{r} \pi_j U_j^{\nabla}.$$
Considering the above notation, the recurrent SSA forecasts $(\hat{y}_{N+1}, \ldots, \hat{y}_{N+h})$ can be obtained by
$$\hat{y}_i = R^T Z_i, \qquad i = N+1, \ldots, N+h,$$
where $Z_i = [\hat{y}_{i-L+1}, \ldots, \hat{y}_{i-1}]^T$ and $\hat{y}_1, \ldots, \hat{y}_N$ are the SSA reconstructed values obtained from the 4th step of the SSA algorithm described above.
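The recurrent forecasting algorithm can be sketched as follows. This is an illustrative NumPy implementation of the standard formulas above (not the paper's R code), applied to a noiseless sine, which satisfies an order-2 linear recurrence and is therefore continued essentially exactly by r = 2 eigentriples:

```python
import numpy as np

def ssa_recurrent_forecast(y, L, r, h):
    """Recurrent SSA forecasting sketch: reconstruct with r eigentriples,
    build the coefficient vector R from the eigenvectors of Y Y^T, then
    continue the reconstructed series h steps ahead."""
    y = np.asarray(y, dtype=float)
    N = len(y)
    K = N - L + 1
    Y = np.column_stack([y[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)

    # Reconstruction (grouping + diagonal averaging) with the leading r triples.
    S = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    rec, counts = np.zeros(N), np.zeros(N)
    for m in range(L):
        for n in range(K):
            rec[m + n] += S[m, n]
            counts[m + n] += 1
    rec /= counts

    # Coefficient vector R = (1 / (1 - v^2)) * sum_j pi_j * U_j^(nabla).
    pi = U[L - 1, :r]                 # last components of the leading eigenvectors
    v2 = np.sum(pi ** 2)
    R = (U[:L - 1, :r] @ pi) / (1.0 - v2)

    # Recurrent forecasts: y_hat_i = R^T [y_hat_{i-L+1}, ..., y_hat_{i-1}].
    ext = list(rec)
    for _ in range(h):
        ext.append(float(R @ np.asarray(ext[-(L - 1):])))
    return np.asarray(ext[N:])

# A noiseless sine obeys an order-2 linear recurrence, so r = 2 suffices.
t = np.arange(120)
y = np.sin(2 * np.pi * t / 12)
fc = ssa_recurrent_forecast(y, L=24, r=2, h=12)
true_future = np.sin(2 * np.pi * np.arange(120, 132) / 12)
```

For real, noisy series the forecasts would of course not match the future values exactly; the sine case only verifies that the recurrence machinery is implemented consistently.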

SSA Parameter Selection
The SSA calibration depends on two parameters: the window length, L, and the number of eigentriples used for reconstruction, r. The choice of improper values for L or r yields an incomplete reconstruction, and the forecasting results might be misleading [41,43]. Despite the importance of choosing proper values for these parameters, no theoretical solution has been proposed to solve this problem. A generally agreed suggestion is to choose a window length close to half of the series length and proportional to the number of observations per period (e.g., 12 for monthly time series, four for quarterly time series, etc.). However, this choice does not guarantee the best predictions [41,43], so it is advisable to choose the parameters according to the available data and the intended analysis.
Among the alternative ways described in the literature to determine the number of eigentriples used for reconstruction, r, the most widely used is the w-correlations approach. Considering two series $Y^{(1)} = (y^{(1)}_1, \ldots, y^{(1)}_N)^T$ and $Y^{(2)} = (y^{(2)}_1, \ldots, y^{(2)}_N)^T$, the w-correlation between them can be written as
$$
\rho^{w}_{12} = \frac{\sum_{j=1}^{N} w_j\, y^{(1)}_j y^{(2)}_j}{\sqrt{\sum_{j=1}^{N} w_j \big(y^{(1)}_j\big)^2}\,\sqrt{\sum_{j=1}^{N} w_j \big(y^{(2)}_j\big)^2}},
$$
where $w_j = \min\{j, L, N - j + 1\}$ and $2 \le L \le N - 1$. According to this measure, two series (e.g., signal and noise components) are separable if the absolute value of their w-correlation is small. Therefore, we determine r in such a way that the reconstructed series and the residual have a small w-correlation between them. Another way to determine r is by examining the forecast accuracy, i.e., r is chosen so as to minimize the out-of-sample forecasting error.
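A minimal sketch of the w-correlation computation (an illustrative stand-in for the Rssa wcor function used later in the paper, with made-up series) could read:

```python
import numpy as np

def wcorr(y1, y2, L):
    """Weighted correlation between two series of common length N, with
    weights w_j = min(j, L, N - j + 1), i.e., the number of times each
    point appears in the L x K trajectory matrix."""
    y1, y2 = np.asarray(y1, float), np.asarray(y2, float)
    N = len(y1)
    j = np.arange(1, N + 1)
    w = np.minimum(np.minimum(j, L), N - j + 1)
    inner = lambda a, b: np.sum(w * a * b)
    return inner(y1, y2) / np.sqrt(inner(y1, y1) * inner(y2, y2))

# A slow trend and a fast oscillation are nearly w-orthogonal (separable).
t = np.arange(100)
trend = 0.05 * t
cycle = np.sin(2 * np.pi * t / 10)
rho = wcorr(trend, cycle, L=50)
```

A value of rho near zero indicates good separability between the two components, which is exactly the criterion used to choose the cut-point r in the grouping step.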

Multivariate Singular Spectrum Analysis (MSSA)
Multivariate SSA is a natural extension of the univariate SSA for analysing multivariate time series data. The algorithm is similar to the univariate SSA and has the same range of applications. Complete details about MSSA can be found in [23,39,44], and a brief description is presented below.
Let $Y_t = (y^{(1)}_t, \ldots, y^{(M)}_t)$, $t = 1, \ldots, N$, denote a sample of an M-variate time series with length N. Let us assume that $Y_t$ can be written in terms of a signal plus noise model as
$$Y_t = S_t + N_t,$$
where $S_t$ denotes the signal and $N_t$ the noise. As with the univariate SSA, the goal here is to remove the noise from the original data and to obtain an estimate for the signal, without having to specify a parametric form for it, which can then be used to obtain out-of-sample forecasts. The MSSA algorithm also consists of three complementary stages, just like the univariate case: decomposition, reconstruction and forecasting. In the first stage the series is decomposed; in the second stage the noise-free series is reconstructed; and in the final stage the reconstructed time series is used to forecast new data points. Each stage in this algorithm includes two steps. In the 1st step (embedding), the trajectory matrices $\mathbf{Y}^{(1)}, \ldots, \mathbf{Y}^{(M)}$ of the individual series are combined in one of two ways:

• Horizontal form: $\mathbf{Y}_H = [\mathbf{Y}^{(1)} : \cdots : \mathbf{Y}^{(M)}]$, an $L \times MK$ matrix;

• Vertical form: $\mathbf{Y}_V = [\mathbf{Y}^{(1)T} : \cdots : \mathbf{Y}^{(M)T}]^T$, an $ML \times K$ matrix.

In the 2nd step, the SVD of the chosen trajectory matrix $\mathbf{Y}$ gives $\mathbf{Y} = \mathbf{Y}_1 + \cdots + \mathbf{Y}_d$, where $\mathbf{Y}_i = U_i U_i^T \mathbf{Y}$ is the elementary matrix corresponding to the $i$th largest singular value ($\sqrt{\lambda_i}$), and $d$ is the rank of $\mathbf{Y}$.
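The two forms of the MSSA trajectory matrix amount to stacking the univariate trajectory matrices side by side or on top of each other; the following NumPy sketch with toy data illustrates the resulting shapes:

```python
import numpy as np

def trajectory(y, L):
    """L x K Hankel trajectory matrix of a single series (K = N - L + 1)."""
    y = np.asarray(y, dtype=float)
    K = len(y) - L + 1
    return np.column_stack([y[j:j + L] for j in range(K)])

def mssa_trajectory(series, L, form="horizontal"):
    """Block trajectory matrix for M series of common length N:
    horizontal -> L x (M*K);  vertical -> (M*L) x K."""
    blocks = [trajectory(y, L) for y in series]
    if form == "horizontal":
        return np.hstack(blocks)
    return np.vstack(blocks)

# Three toy series of length 50 and window length 10, so K = 41.
rng = np.random.default_rng(3)
series = [rng.normal(size=50) for _ in range(3)]
YH = mssa_trajectory(series, L=10, form="horizontal")
YV = mssa_trajectory(series, L=10, form="vertical")
```

The SVD, grouping and diagonal averaging then proceed as in the univariate case, applied to the chosen block matrix; the horizontal and vertical forms lead to the H-MSSA and V-MSSA variants compared in the paper.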

Second Stage: Reconstruction
3rd step: Grouping. Considering $\mathbf{Y}_i$ to be associated with the $i$th largest singular value of $\mathbf{Y}$, this step intends to separate the signal and noise components as follows:
$$\mathbf{Y} = \hat{\mathbf{S}} + \hat{\mathbf{N}}, \qquad \hat{\mathbf{S}} = \mathbf{Y}_1 + \cdots + \mathbf{Y}_r,$$
where r < d is the number of components associated with the signal.
4th step: Diagonal averaging. In this step, anti-diagonal averaging is applied to each block of $\hat{\mathbf{S}}$ to reconstruct the de-noised/smoothed time series.

Third Stage: Forecasting
5th step: Forecast engine. The forecast engine of MSSA, which is a linear function of the last L observations of the de-noised/smoothed time series, is constructed in this step [39,44]. These forecasts are obtained by using the linear recurrent formula in a similar manner to that detailed above for the univariate SSA algorithm. By considering the two versions of the trajectory matrix defined in the 1st step of this algorithm, we obtain the forecasts based on the horizontal MSSA (H-MSSA) and the forecasts based on the vertical MSSA (V-MSSA).

6th step: Out-of-sample forecasting. In this step, h-steps-ahead forecasts are produced by using the forecast engine [39,44].

Hybrid Approach
To improve the results for model fit and model forecasting in time series, many hybrid models, which combine more than one time series methodology, have been developed [9]. In some of these, the SSA is first applied to the raw data in order to extract the deterministic component, and then another method, such as an ANN, is applied to the residuals of the SSA to fit/forecast the stochastic part of the time series [9,45]. In this analysis we will consider one of the methods proposed by [9], where the SSA, together with the recurrent SSA forecasting algorithm, is used to fit/forecast the deterministic part of the series, and an ANN is used to fit/forecast the stochastic part left after the SSA fit. As with the methods presented before, this hybrid approach will be considered for both model fit and model forecasting.
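A compact sketch of such a hybrid pipeline is given below. Note that, for brevity, a linear autoregression on the residual lags stands in for the ANN used in the paper, so this illustrates the SSA-plus-residual-model decomposition idea rather than the authors' exact method; the data are also made up:

```python
import numpy as np

def ssa_smooth(y, L, r):
    """SSA reconstruction with the leading r eigentriples (deterministic part)."""
    y = np.asarray(y, dtype=float)
    N, K = len(y), len(y) - L + 1
    Y = np.column_stack([y[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    S = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
    rec, cnt = np.zeros(N), np.zeros(N)
    for m in range(L):
        rec[m:m + K] += S[m, :]
        cnt[m:m + K] += 1
    return rec / cnt

def hybrid_fit(y, L, r, p):
    """Hybrid sketch: SSA extracts the deterministic part; a model on p
    residual lags fits the stochastic part (a linear AR stands in here
    for the ANN used in the paper)."""
    y = np.asarray(y, dtype=float)
    signal = ssa_smooth(y, L, r)
    resid = y - signal
    # Design matrix of p lagged residuals for targets resid[p], ..., resid[N-1].
    X = np.column_stack([resid[p - k - 1:len(resid) - k - 1] for k in range(p)])
    coef, *_ = np.linalg.lstsq(X, resid[p:], rcond=None)
    resid_fit = np.concatenate([resid[:p], X @ coef])
    return signal + resid_fit

# Toy series: dominant slow cycle plus a smaller fast cycle.
t = np.arange(300)
y = np.sin(2 * np.pi * t / 30) + 0.2 * np.sin(2 * np.pi * t / 7)
fit = hybrid_fit(y, L=60, r=2, p=5)
```

Here the two leading eigentriples capture the dominant cycle, and the residual model picks up the structure the SSA signal left behind, so the combined fit is closer to the series than the SSA part alone; in the paper's method that second role is played by an ANN.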

Accuracy Measure
Here we will evaluate two types of errors: (i) in-sample errors, associated with model fit; and (ii) out-of-sample errors, associated with model forecasting. For each of the two types of errors, two measures will be considered: the RMSE and the MAPE.
For model fit, the RMSE and MAPE are used as criteria for assessing the quality of a model's fit to the data, and they can be written, respectively, as
$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (y_t - \hat{y}_t)^2}$$
and
$$\text{MAPE} = \frac{100}{N} \sum_{t=1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,$$
where $y_t$ are the observed values, $\hat{y}_t$ the fitted values by the considered model/algorithm (i.e., ARIMA, SSA, MSSA, ANN), and N the length of the time series. For model forecasting, let us assume that the last g observations, e.g., g = 12, are used as the test set. The RMSE and MAPE measuring the out-of-sample forecasting error for a given model can be written, respectively, as
$$\text{RMSE} = \sqrt{\frac{1}{g} \sum_{t=N-g+1}^{N} (y_t - \hat{y}_t)^2}$$
and
$$\text{MAPE} = \frac{100}{g} \sum_{t=N-g+1}^{N} \left| \frac{y_t - \hat{y}_t}{y_t} \right|,$$
where $y_t$ are the last g observed values and $\hat{y}_t$ the respective h-steps-ahead forecast values. Other measures such as the symmetric mean absolute percentage error or the mean directional accuracy can also be used to evaluate both model fit and model forecasting.
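The two accuracy measures are straightforward to compute; the following NumPy sketch, with made-up numbers rather than the paper's results, shows both:

```python
import numpy as np

def rmse(actual, fitted):
    """Root mean square error."""
    a, f = np.asarray(actual, float), np.asarray(fitted, float)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mape(actual, fitted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    a, f = np.asarray(actual, float), np.asarray(fitted, float)
    return float(100.0 * np.mean(np.abs((a - f) / a)))

# Toy observed and fitted values, each off by one unit.
y_obs = [100.0, 102.0, 101.0, 105.0]
y_fit = [101.0, 101.0, 102.0, 104.0]
```

For these toy values the RMSE equals 1.0, while the MAPE is just under 1% because each one-unit error is measured relative to observations near 100.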
In this paper, we considered purely symmetric loss functions where the under-prediction and over-prediction of the currency exchange rates are considered to have the same importance. However, depending on the scope of the analysis, asymmetric loss functions that, e.g., give higher weights to losses of the currency exchange rates in relation to the USD, should be considered.

Results and Discussion
In this section, we will analyse the historical data from the eight currency exchange rates. These data will be used to compare: (i) the classical ARIMA model; (ii) the classical SSA algorithm; (iii) the classical MSSA algorithm; (iv) the artificial neural network algorithm; and (v) the hybrid algorithm that combines SSA and ANN, in terms of computational time and accuracy for model fit and model forecasting. In terms of model assumptions, stationarity is of key importance. While many of the standard parametric time series methods (e.g., ARIMA) require the data to be stationary, the non-parametric SSA and MSSA do not require this assumption [23]. As for the ANN, overfitting may ease the problem of having non-stationary time series significantly and might be a key to success for complex financial time-series analysis [46]. The computational times presented in this section were obtained on a laptop with a 2.00 GHz Intel Core i3-6006U processor, 4 GB of RAM, and a 64-bit Windows 10 operating system. Table 1 shows the descriptive statistics for the eight currency exchange rates, including the minimum, maximum, mean, standard deviation and coefficient of variation. Figure 2 shows the behaviour of the eight currency exchange rates over time. From the analysis of these plots, it is possible to observe different behaviours between series, and no clear pattern among developed or developing countries.

Model Fit
The models/algorithms under comparison for model fit are: (i) ARIMA; (ii) SSA (with three alternative parameter choices); (iii) multivariate SSA (two algorithms); (iv) artificial neural networks (ANN); and (v) the hybrid algorithm that combines SSA and ANN.
The parameters of the ARIMA model were estimated with the function "auto.arima" from the R package "forecast" [47], which selects the model based on either the Akaike information criterion or the Bayesian information criterion. The model parameters for the ARIMA models, together with the observed values of the test statistic and p-values of the Dickey-Fuller test (obtained using the function adf.test of the R package tseries), are given in Table 2. These results provide evidence that the stationarity requirement of the ARIMA model is met.

Table 2. ARIMA model orders (AR(p), I(d), MA(q)) and Dickey-Fuller test statistics and p-values for each currency.

As mentioned above, for the SSA and multivariate SSA algorithms, there are two choices to be made by the researcher: (i) the window length, L; and (ii) the number of eigentriples used for reconstruction, r. The values of L were chosen for each time series as defined in Table 3: L_1 = N/20, L_2 = N/2 and L_p, with L_p obtained from the periodogram, based on the largest cycle of each time series [48] (i.e., about one trimester for all time series), where N is the length of the time series. The number of eigentriples used for reconstruction, r, for each of the considered window lengths and each of the time series, was obtained by analysing the w-correlations between components [23].
The number of eigentriples, r, should be chosen in order to maximize the w-correlations among signal components, maximize the w-correlations among noise components, and minimize the w-correlations between signal and noise components, i.e., in order to maximize the separability between signal and noise. Figure 3 shows the w-correlation matrices for each of the eight currency exchange rates, considering the window length L_p obtained from the periodogram. Figures A1 and A2 of the appendix show the w-correlation matrices for each of the eight currency exchange rates, considering the window lengths L_1 = N/20 and L_2 = N/2, respectively. Figure A3 of the appendix shows the w-correlations for the horizontal and vertical MSSA. These w-correlation plots can be obtained with the function "wcor" of the R package "Rssa" [49], and they help with the decision about the separability between signal and noise components (3rd step of the SSA and MSSA algorithms). Darker colors in Figure 3 are associated with higher w-correlations and lighter colors with lower w-correlations; we choose the "best" cut-point that maximizes the separability, i.e., high w-correlations between signal components, high w-correlations between noise components, and low w-correlations between signal and noise components. To assess and compare the ability for model fit, the RMSE and the MAPE were calculated for each of the eight models/algorithms, one ARIMA, three SSA, two MSSA, one ANN, and one hybrid SSA-ANN, for each time series (Tables 4 and 5, respectively). The results for the univariate and multivariate SSA are for the parameters defined in Table 3. The parameters for the SSA part of the hybrid method that combines SSA and ANN were chosen to be L_p and r_p (Table 3) because of the best fit when compared with the other parameter choices for SSA.
The results in Tables 4 and 5 show that the overall best SSA algorithm to fit the time series was the one with parameters L_p and r_p (Table 3), which also outperformed the ARIMA model and, in most cases, the ANN. The best multivariate SSA algorithm was the one that uses the horizontal form of the trajectory matrix (H-MSSA), which also outperformed all SSA algorithms and the ANN. However, the best overall model for model fit in the considered eight time series of currency exchange rates was the hybrid model that combines the SSA and the ANN.

Table 4. Root mean square error for model fit for each of the eight currency exchange rates, considering each of the eight models/algorithms: ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, ANN, and the hybrid method that combines SSA and ANN.

Table 5. Mean absolute percentage error for model fit for each of the eight currency exchange rates, considering each of the eight models/algorithms: ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, ANN, and the hybrid method that combines SSA and ANN.

Table 6 shows the computational time for each of the eight currency exchange rates, considering each of the eight models/algorithms: ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, ANN, and the hybrid method that combines SSA and ANN. It can be seen that, although the hybrid algorithm that combines the SSA and ANN takes longer than the competing methods, the computational times are under three minutes.

Table 6. Computational time, in minutes, for model fit, for each of the eight currency exchange rates, considering each of the eight models/algorithms: ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, ANN, and the hybrid method that combines SSA and ANN.

Figure 4 shows the original time series, the smoothed time series after applying the SSA considering a window length L_p and r_p eigentriples (Table 3), and the model fit by the hybrid algorithm that combines the SSA and the ANN, for each of the eight currency exchange rates. It can be seen that the model fits almost overlap with the original time series, which was expected because of the overall low values of the RMSE (Table 4) and MAPE (Table 5). Similar behaviour was obtained with all considered methods.

Model Forecasting
In this section, we compare the forecasting ability of the eight models/algorithms under study: (i) ARIMA; (ii) SSA (with three alternative parameter choices); (iii) multivariate SSA (two algorithms); (iv) artificial neural networks (ANN); and (v) the hybrid algorithm that combines SSA and ANN. Tables 7 and 8 give the RMSE and MAPE forecasting values, respectively, for each method/algorithm applied to each time series. These values are obtained by considering a test set of g = 12 observations from each time series and one, five and ten steps ahead out-of-sample forecasts, i.e., one day ahead, one week ahead and two weeks ahead. The overall best performance, based on both the RMSE and MAPE, was obtained by the hybrid algorithm that combines the SSA and the ANN, for any number of steps ahead. For one-step-ahead out-of-sample forecasting, the second best overall performance was obtained by the SSA based on L_p and r_p (Table 3) and the multivariate SSA algorithms (Tables 7 and 8). For five and ten steps-ahead out-of-sample forecasting, the ARIMA model, the SSA with L_1 = N/20 and r_1, the SSA with L_p and r_p, both versions of the multivariate SSA algorithm and the ANN perform similarly in terms of RMSE (Table 7). When considering the MAPE (Table 8), for five and ten steps-ahead out-of-sample forecasting, the second best performance alternates between the multivariate versions of the SSA algorithm, the ANN, and the SSA algorithms based on L_p and r_p and on L_1 and r_1 (Table 3).
Although Tables Tables 7 and 8 only give the point estimates for the RMSE and MAPE, respectively, a measure of variability such as the standard errors could also be obtained based on resampling.
To reduce the variability in these measure, the test size could also be increased which, in this case, provides similar results. Table 7. Root mean square error for model forecasting for each of the eight currency exchange rates, considering each of the eight models/algorithms, ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, ANN, and the hybrid method that combines SSA and ANN.  Table 8. Mean absolute percentage error for model forecasting for each of the eight currency exchange rates, considering each of the eight models/algorithms, ARIMA, SSA for the window length and number of eigentriples for reconstruction as defined in Table 3, multivariate SSA for the window length and number of eigentriples for reconstruction as defined in Table 3 The computational times to obtain the RMSE and MAPE values in Tables 7 and 8 are presented in  Table 9. The lowest computational time was obtained for the multivariate SSA algorithms (the times reported in Table 9 are to obtain the forecast values for the eight time series together) in every number of steps ahead. These are followed by the SSA algorithms with L 1 and L p (Table 3) because of the more rectangular trajectory matrices used in the singular value decomposition, the most time consuming step of the SSA algorithm. As expected from the analysis of the computational times for model fit in Table 6, the hybrid model was the highest computational costly with times between 15 and 31 min, which was compensated with the excellent results in terms of model forecasting (Tables 7 and 8). Table 9. 
Computational time, in minutes, for model forecasting, for each of the eight currency exchange rates, considering each of the eight models/algorithms: ARIMA; SSA with the window length and number of eigentriples for reconstruction as defined in Table 3; multivariate SSA with the window length and number of eigentriples for reconstruction as defined in Table 3; ANN; and the hybrid method that combines SSA and ANN.
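The effect of the window length on the cost of the SVD step, noted above, can be illustrated directly. This is a generic timing sketch (not the paper's code): for a series of length N, the trajectory matrix is L x K with K = N - L + 1, so a smaller L gives a more rectangular matrix and a cheaper decomposition:

```python
import time
import numpy as np

# Time the SVD of an L x (N - L + 1) matrix for several window lengths L.
# The smaller (more rectangular) cases are typically the cheapest.
N = 3000
rng = np.random.default_rng(0)
times = {}
for L in (N // 20, N // 4, N // 2):
    X = rng.normal(size=(L, N - L + 1))
    t0 = time.perf_counter()
    np.linalg.svd(X, full_matrices=False)
    times[L] = time.perf_counter() - t0
print(times)
```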

Discussion and Conclusions
In this paper, we compared standard and advanced, parametric and non-parametric, and univariate and multivariate models to assess their ability for model fit and model forecasting.
The models under consideration were: (i) the ARIMA model; (ii) the univariate SSA model, considering three different choices for the window length L and the number of eigentriples used for reconstruction r; (iii) the multivariate SSA model, considering the horizontal and vertical forms of the trajectory matrix and the linear recurrent algorithm; (iv) the ANN; and (v) a hybrid model that uses the SSA to fit/forecast the deterministic part of the data and the ANN to fit/predict the stochastic part of the data.
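The two stages of the hybrid model (v) can be sketched in a simplified form. This is an illustrative reimplementation under stated assumptions, not the paper's exact configuration: basic SSA (embedding, SVD, rank-r reconstruction, diagonal averaging) for the deterministic part, and a tiny one-hidden-layer network on p lagged residuals, where p, the hidden size, and the training schedule are all hypothetical choices:

```python
import numpy as np

def ssa_reconstruct(x, L, r):
    """Reconstruct the 'deterministic' part of x from the leading r
    eigentriples of the L-lagged trajectory (Hankel) matrix."""
    N = len(x); K = N - L + 1
    X = np.column_stack([x[i:i + L] for i in range(K)])   # L x K trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Xr = (U[:, :r] * s[:r]) @ Vt[:r]                      # rank-r approximation
    rec = np.zeros(N); cnt = np.zeros(N)
    for j in range(K):                                    # diagonal (Hankel) averaging
        rec[j:j + L] += Xr[:, j]; cnt[j:j + L] += 1
    return rec / cnt

def fit_residual_ann(resid, p=5, hidden=8, epochs=500, lr=0.01, seed=0):
    """Tiny one-hidden-layer tanh network trained on p lagged residuals.
    Stands in for the ANN stage of the hybrid method (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([resid[i:len(resid) - p + i] for i in range(p)])
    y = resid[p:]
    W1 = rng.normal(0, 0.1, (p, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, hidden); b2 = 0.0
    for _ in range(epochs):                               # plain gradient descent
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        gW2 = H.T @ err / len(y); gb2 = err.mean()
        gH = np.outer(err, W2) * (1 - H ** 2)
        gW1 = X.T @ gH / len(y); gb1 = gH.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
    return lambda lags: float(np.tanh(lags @ W1 + b1) @ W2 + b2)

# Hybrid sketch: SSA captures the deterministic part, the ANN the residual.
t = np.arange(120)
x = np.sin(2 * np.pi * t / 12) + 0.05 * np.random.default_rng(1).normal(size=120)
det = ssa_reconstruct(x, L=24, r=2)
ann = fit_residual_ann(x - det)
next_resid = ann((x - det)[-5:])      # one-step-ahead residual prediction
```

The final hybrid forecast is then the SSA forecast of the deterministic part plus the ANN prediction of the residual.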
Based on previous analyses and comparisons, the non-parametric SSA has been shown to outperform standard methods such as the Holt-Winters and ARIMA models [38,50]. Another advantage of SSA over other standard methods for time series analysis and forecasting is that, contrary to those, it does not require the time series to be stationary. However, when the time series data includes outliers, the SSA, which uses an SVD based on the least squares L_2 norm, might not be appropriate and may give worse results than a robust SSA algorithm which uses an SVD based on the L_1 norm [38,51]. For multivariate time series data, the MSSA tends to outperform its univariate counterpart because the co-integration between time series is considered in MSSA but not in SSA. The forecasting performance of MSSA improves when there is dependency among the time series [39]. Further developments in the field of time series forecasting have been obtained by combining different methods into hybrid methodologies, which have proven to outperform most competing methods [9,52,53].
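The sensitivity of the L_2 criterion to outliers can be illustrated with the simplest possible case, fitting a constant to the data: the L_2 minimizer is the mean, while the L_1 minimizer is the median. This toy example (not from the paper) shows why an L_1-based decomposition can be more robust:

```python
import numpy as np

# A series with one outlier: the L2-optimal constant (the mean) is pulled
# toward the outlier, while the L1-optimal constant (the median) is not.
x = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 10.0])
l2_fit = x.mean()       # minimizes the sum of squared deviations
l1_fit = np.median(x)   # minimizes the sum of absolute deviations
print(l2_fit, l1_fit)
```

The same effect carries over to the SVD step of SSA, where a single outlying observation can distort every least-squares singular vector.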
Although part of the initial motivation was the use of data on currency exchange rates from both developing and developed countries, no specific similarity in behaviour was found, nor was any interpretable cluster obtained (Figure A4).
For both model fit and model forecasting, the best performance in terms of RMSE and MAPE was obtained by the hybrid method that combines SSA and ANN, although at a higher computational cost. It was followed by the multivariate SSA algorithms, with a much lower computational time. These results suggest promising further research directions, such as combining the robust SSA algorithm [38,51,54] with ANN to model time series contaminated with outlying observations, combining the randomized SSA algorithm [55] with ANN to reduce the computational time for long time series, and combining multivariate SSA algorithms [39] with ANN for multivariate time series analysis.

The vertical and horizontal lines in each w-correlations plot indicate the selected cut-point that maximizes separability between signal and noise components.

Figure A4. Dendrogram for the hierarchical cluster analysis of the eight currencies, obtained using the "TSclust" package [56] of the R software.