The Effect of Data Transformation on Singular Spectrum Analysis for Forecasting

Abstract: Data transformations are an important tool for improving the accuracy of forecasts from time series models. Historically, the impact of transformations has been evaluated on the forecasting performance of different parametric and nonparametric forecasting models. However, researchers have overlooked the evaluation of this factor in relation to the nonparametric forecasting model of Singular Spectrum Analysis (SSA). In this paper, we focus entirely on the impact of data transformations, in the form of standardisation and logarithmic transformations, on the forecasting performance of SSA when applied to 100 datasets with different characteristics. Our findings indicate that data transformations have a significant impact on SSA forecasts at particular sampling frequencies.


Introduction
Amidst the emergence of Big Data and Data Mining techniques, forecasting remains an important tool for planning and resource allocation in all industries. Accordingly, researchers, academics, and forecasters alike invest time and resources into methods for improving the accuracy of forecasts from both parametric and nonparametric forecasting models. One approach to improving forecast accuracy is to transform the data prior to fitting time series models. For example, it is noted in [1] that data transformations can simplify the forecasting task, whilst evidence from other research indicates that, in economic analysis, taking logarithms can provide forecast improvements if it results in stabilising the variance of a series [2]. However, studies also indicate that data transformations will not always improve forecasts [3] and that they could complicate time series analysis models [4,5].
In fact, a key challenge for forecasting under data transformation is to transform the data back to its original scale, a process which could result in a forecasting bias [6,7]. Historically, most studies have focused on the impact of data transformations on parametric models such as Regression and Autoregressive Integrated Moving Average (ARIMA) models [8,9]. More recently, authors have resorted to evaluating the impact of data transformations on several other forecasting models [10,11], further highlighting the relevance and importance of the topic. Our interest is focused on the evaluation of the impact of data transformations on a time series analysis and forecasting technique called Singular Spectrum Analysis (SSA).
In brief, the SSA technique is a popular denoising, forecasting, and missing value prediction technique with both univariate and multivariate capabilities [12,13]. Recently, its diverse applications have focused on forecasting solutions for varied industries and fields, from tourism [14,15] and economics [16][17][18] to fashion [19], climate [20,21], and several other sectors [22][23][24]. Regardless of its wide and varied applications, researchers have yet to explore the effect of data transformations on the forecasting performance of this nonparametric forecasting technique. Previously, in [25], the authors evaluated the forecasting performance of the two basic SSA algorithms under different data structures. However, their work did not extend to evaluating the impact of data transformations to provide empirical evidence for future research. Accordingly, through this paper, we aim to contribute to the existing research gap by studying the effects of different data transformation options on the forecasting behaviour of SSA.
Logarithmic transformation is the most commonly used transformation in time series analysis. It has been used to convert multiplicative time series structures to additive structures, or to reduce the skewness and volatility of a time series and increase its stability [2,26]. The autocorrelation structure of a time series may change under different transformations in ways that affect the model, and different transformations may result in different specifications for ARIMA models [6,27]. Like ARIMA models, SSA, too, can be greatly influenced by transformations. For instance, if a data transformation makes the noise uncorrelated or reduces the complexity of the time series, it can improve SSA performance [21,26]. As data standardisation and logarithmic transformations are the easiest in terms of interpretability and back-transformation to the original scale, we explore the effect of these two data transformations on the forecasting performance of SSA.
The remainder of this paper is organised as follows. In Section 2, we provide a detailed exposition of SSA and its recurrent and vector forecasting algorithms. In Section 3, we present data transformation techniques and their effect on forecasting accuracy. Procedures for examining the effect of transformation based on different characteristics of time series are presented in Section 4. In Section 5, we analyse different datasets of varied characteristics and present our results for an evidence-based exploration of the effect of data transformations on SSA forecasts. Finally, we present our concluding remarks in Section 6.

SSA Forecasting
There are two different algorithms for forecasting with SSA, namely recurrent forecasting and vector forecasting [12,28]. Those interested in a comparison of the performance of both algorithms are referred to [25]. Both of these forecasting algorithms require that one follows two common steps of SSA, the decomposition and reconstruction of a time series [12,28]. In what follows, we provide a brief description of forecasting processes in SSA.

Decomposition and Reconstruction of Time Series
In SSA, we embed the time series $\{x_1, x_2, \ldots, x_N\}$ into a high-dimensional space by constructing a Hankel-structured trajectory matrix of the form

$$X = [x_1 : x_2 : \cdots : x_n] = (x_{i+j-1})_{i,j=1}^{m,n}, \qquad (1)$$

where $m$ is the window length, the $m$-lagged vector $x_i = (x_i, x_{i+1}, \ldots, x_{i+m-1})^\top$ is the $i$th column of the trajectory matrix $X$, $n = N - m + 1$, and $m \le n$.
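The embedding step in Equation (1) can be sketched in a few lines. This is an illustrative implementation (the paper's own computations use the R package "Rssa"), and `trajectory_matrix` is a hypothetical helper name:

```python
import numpy as np

def trajectory_matrix(x, m):
    """Build the m x n Hankel trajectory matrix of Equation (1), n = N - m + 1."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    # Column i is the m-lagged vector (x_i, ..., x_{i+m-1}).
    return np.column_stack([x[i:i + m] for i in range(n)])

X = trajectory_matrix([1, 2, 3, 4, 5], m=3)
# X is 3 x 3 and Hankel: every anti-diagonal holds a single series value.
```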
The singular value decomposition (SVD) of the trajectory matrix $X$ can be expressed as

$$X = \sum_{j=1}^{m} \sqrt{\lambda_j}\, u_j v_j^\top, \qquad (2)$$

where $u_j$ is the $j$th eigenvector of $XX^\top$ corresponding to the eigenvalue $\lambda_j$ and $v_j = X^\top u_j / \sqrt{\lambda_j}$.
If $k$ is the number of signal components, $S_k = \sum_{j=1}^{k} \sqrt{\lambda_j}\, u_j v_j^\top$ represents the matrix of signal, and $\sum_{j=k+1}^{m} \sqrt{\lambda_j}\, u_j v_j^\top$ is the matrix of noise. We apply the diagonal averaging procedure to $S_k$ to reconstruct the signal series $\tilde{x}_t$, such that the observed series can be expressed as

$$x_t = \tilde{x}_t + \varepsilon_t, \qquad (3)$$

where $\tilde{x}_t$ is the less noisy, filtered series and $\varepsilon_t$ is the noise. A detailed explanation of the decomposition in Equation (3) can be found in [28,29].
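Steps (1)-(3) — embedding, truncated SVD, and diagonal averaging — can be combined into a minimal reconstruction sketch (an illustration under the conventions above, not the paper's Rssa-based implementation):

```python
import numpy as np

def ssa_reconstruct(x, m, k):
    """Rank-k SSA reconstruction: embed, truncate the SVD, diagonally average."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = N - m + 1
    X = np.column_stack([x[i:i + m] for i in range(n)])   # Equation (1)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)      # Equation (2)
    Sk = (U[:, :k] * s[:k]) @ Vt[:k]                      # signal matrix S_k
    # Diagonal (Hankel) averaging: average all entries of S_k that
    # correspond to the same time index to recover the filtered series.
    rec = np.zeros(N)
    counts = np.zeros(N)
    for j in range(n):
        rec[j:j + m] += Sk[:, j]
        counts[j:j + m] += 1
    return rec / counts
```

For a noise-free sinusoid the trajectory matrix has rank 2, so k = 2 recovers the series exactly; on noisy data the same call returns the filtered series $\tilde{x}_t$.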
To construct the trajectory matrix X in Equation (1) and to conduct the SVD in Equation (2), we have to select the Window Length m and the number of signal components k. Since our aim is not to demonstrate the selection of SSA choices (m and k), we opt not to reproduce the selection procedures for SSA choices, as these are already covered in depth in [12,28]. As our interest is in examining the effect of transformation on the forecasting performance of SSA, we select m and k such that the Root Mean Squared Error (RMSE) in forecasting is minimised.

Recurrent Forecasting
Recurrent forecasting in SSA is also known as R-forecasting, and the findings in [25] indicate that R-forecasting is best when dealing with large samples. Let $u_j^\nabla = (u_{1j}, \ldots, u_{(m-1)j})^\top$ be the vector of the first $m - 1$ elements of the $j$th eigenvector $u_j$, and let $u_{mj}$ be the last element of $u_j$. The coefficients of the linear recurrent equation can be estimated as

$$a = (a_{m-1}, \ldots, a_1)^\top = \frac{1}{1 - \nu^2} \sum_{j=1}^{k} u_{mj}\, u_j^\nabla, \qquad \nu^2 = \sum_{j=1}^{k} u_{mj}^2. \qquad (4)$$

With the parameters in Equation (4), a linear recurrent equation of the form

$$\tilde{x}_{t+1} = \sum_{i=1}^{m-1} a_i\, \tilde{x}_{t+1-i} \qquad (5)$$

is used to obtain a one-step-ahead recursive forecast [29]. This linear recurrent formula in Equation (5) forecasts the signal at time $t + 1$ given the signal at times $t, t - 1, \ldots, t - m + 2$ [28] (Section 2.1, Equations (1)-(6)), and the one-step-ahead recursive forecast of $x_{N+j}$ is

$$\hat{x}_{N+j} = \sum_{i=1}^{m-1} a_i\, \hat{x}_{N+j-i}. \qquad (6)$$

We apply the recursive forecasting method in Equation (6), one step at a time, to obtain forecasts at any horizon.
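Equations (4)-(6) translate into code as follows; this is a hedged sketch under the conventions above (left singular vectors as eigenvectors of $XX^\top$), not the Rssa implementation used later in the paper:

```python
import numpy as np

def ssa_rforecast(x, m, k, h):
    """h-step R-forecast via the linear recurrence of Equations (4)-(6)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = N - m + 1
    X = np.column_stack([x[i:i + m] for i in range(n)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    pi = Uk[-1, :]                      # last elements u_mj of the k eigenvectors
    nu2 = pi @ pi                       # verticality coefficient nu^2
    a = (Uk[:-1, :] @ pi) / (1 - nu2)   # LRR coefficients, Equation (4)
    # Reconstruct the signal (diagonal averaging of S_k), then iterate
    # the recurrence of Equations (5)-(6) h steps beyond the sample.
    Sk = (Uk * s[:k]) @ Vt[:k]
    rec = np.zeros(N)
    counts = np.zeros(N)
    for j in range(n):
        rec[j:j + m] += Sk[:, j]
        counts[j:j + m] += 1
    sig = list(rec / counts)
    for _ in range(h):
        sig.append(a @ np.asarray(sig[-(m - 1):]))
    return np.array(sig[N:])
```

Because a noise-free sine satisfies an exact linear recurrence, the forecast continues such a series essentially to machine precision.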

Vector Forecasting
In contrast, the SSA vector forecasting (V-forecasting) algorithm has proven more robust than the R-forecasting algorithm in most cases [25]. Let us define $U_k^\nabla = [u_1^\nabla, \ldots, u_k^\nabla]$ as the $(m - 1) \times k$ matrix consisting of the first $m - 1$ elements of the $k$ eigenvectors. The vector forecasting algorithm computes $m$-lagged vectors $\hat{z}_i$ and constructs a trajectory matrix $Z = [\hat{z}_1 \ldots \hat{z}_n\, \hat{z}_{n+1} \ldots \hat{z}_{n+h}]$ such that

$$\hat{z}_i = s_i \ \ (i = 1, \ldots, n), \qquad \hat{z}_i = \begin{pmatrix} \Pi\, \hat{z}_{i-1}^\nabla \\ a^\top \hat{z}_{i-1}^\nabla \end{pmatrix} \ \ (i = n+1, \ldots, n+h),$$

where $\Pi = U_k^\nabla (U_k^\nabla)^\top + (1 - \nu^2)\, a a^\top$, $s_i$ is the $i$th column of the reconstructed signal matrix $S_k = \sum_{j=1}^{k} \sqrt{\lambda_j}\, u_j v_j^\top$, and $\hat{z}_{i-1}^\nabla$ is the vector of the last $m - 1$ elements of $\hat{z}_{i-1}$. Diagonal averaging of $Z$ then yields the forecasts $\hat{x}_{N+1}, \ldots, \hat{x}_{N+h}$.
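Under the same assumptions, the vector forecasting step can be sketched as follows, using the standard expression $\Pi = U_k^\nabla (U_k^\nabla)^\top + (1 - \nu^2)aa^\top$ from the SSA literature for the projector onto the span of $U_k^\nabla$:

```python
import numpy as np

def ssa_vforecast(x, m, k, h):
    """h-step V-forecast: continue the lagged vectors, then diagonally average."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = N - m + 1
    X = np.column_stack([x[i:i + m] for i in range(n)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    pi = Uk[-1, :]
    nu2 = pi @ pi
    a = (Uk[:-1, :] @ pi) / (1 - nu2)            # same LRR coefficients as R-forecast
    W = Uk[:-1, :]                               # U_k^nabla: first m-1 rows
    Pi = W @ W.T + (1 - nu2) * np.outer(a, a)    # projector onto span(U_k^nabla)
    Sk = (Uk * s[:k]) @ Vt[:k]
    Z = [Sk[:, j] for j in range(n)]             # z_i = s_i for i = 1, ..., n
    for _ in range(h):
        tail = Z[-1][1:]                         # last m-1 elements of z_{i-1}
        Z.append(np.concatenate([Pi @ tail, [a @ tail]]))
    # Diagonal averaging of Z = [z_1 ... z_{n+h}] yields N + h values;
    # the last h are the forecasts.
    rec = np.zeros(N + h)
    counts = np.zeros(N + h)
    for j, z in enumerate(Z):
        rec[j:j + m] += z
        counts[j:j + m] += 1
    return (rec / counts)[N:]
```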

Transformation of Time Series
Data transformation is useful when the variation increases or decreases with the level of the series [1]. Whilst logarithmic transformation and standardisation are the most commonly used data transformation techniques in time series analysis, it is noteworthy that there are other transformations from the family of power transformations, such as square root and cube root transformations. However, their interpretability is not as simple or as widely familiar as that of standardisation and logarithmic transformation.

Standardisation
Standardisation of the time series $\{x_t\}$ is formulated as

$$y_t = \frac{x_t - \bar{x}}{\sigma_x},$$

where $\bar{x}$ and $\sigma_x$ are the mean and standard deviation of the series $\{x_t\}$, respectively. Data standardisation is a common data transformation in preprocessing, particularly in machine learning, where it reduces training time and error. In time series forecasting, standardisation has proven advantages when machine learning algorithms (e.g., neural networks and deep neural networks) are used [30]. In terms of SSA, the theoretical literature does not investigate the effect of standardisation on SSA forecasts in detail. However, Golyandina and Zhigljavsky [26] addressed the effect of centering the time series as preprocessing. In theory, if the time series can be expressed as an oscillation around a linear trend, centering will increase SSA's accuracy [26].
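A minimal sketch of standardisation, retaining the statistics needed to map forecasts back to the original scale (the helper names are illustrative):

```python
import numpy as np

def standardise(x):
    """z-score the series; also return the statistics needed to back-transform."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

def destandardise(z, mu, sigma):
    """Map standardised forecasts back to the original scale."""
    return np.asarray(z) * sigma + mu
```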

Logarithmic Transformation
In this paper, the following logarithmic transformation is applied to the time series $\{x_t\}$:

$$y_t = \log(x_t + C),$$

where $C$ is a constant large enough to guarantee that the term inside the logarithm is positive. As mentioned before, the log-transform is a common preprocessing step for handling variance instability or right skewness. Furthermore, one may use the log-transform to convert a time series with a multiplicative structure into an additive one. Given that SSA can be applied to time series with both additive and multiplicative structures, it does not necessarily need log-transform preprocessing [26]. However, Golyandina and Zhigljavsky [26] showed that the log-transform can affect SSA's forecasting accuracy: the accuracy will increase if the rank of the transformed series is smaller than that of the original one.
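A sketch of the transformation and its back-transformation; the rule for choosing $C$ below (shift the minimum of the series up to 1) is one possible convention, as the paper does not specify how $C$ is set:

```python
import numpy as np

def log_transform(x):
    """Apply y = log(x + C), with C chosen so the argument is strictly positive.
    Convention here (an assumption): shift the series minimum up to 1 if needed."""
    x = np.asarray(x, dtype=float)
    C = 1.0 - x.min() if x.min() <= 0 else 0.0
    return np.log(x + C), C

def back_transform(y, C):
    """Invert the logarithmic transformation."""
    return np.exp(np.asarray(y)) - C
```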

Comparison between Transformations
Time series with different characteristics will behave differently after transformation. For instance, forecasting accuracy for time series with positive skewness, non-stationarity, or non-normality may improve with logarithmic transformation. Furthermore, for time series with large observations or large variance, standardisation can improve forecasting accuracy. Sampling frequency is another potential factor affecting forecasting accuracy. Time series with a high sampling frequency (e.g., hourly or daily) usually have an oscillation frequency close to their noise frequency and consequently show unstable and noisy behaviour. On the other hand, time series with a lower sampling frequency (e.g., quarterly or annual) are smoother. These characteristics may affect forecasting accuracy as well. As such, to investigate the practical effect of data transformation on SSA forecasting, we should consider "Sampling Frequency," "Skewness," "Normality," and "Stationarity" as control factors.
To observe the effectiveness of data transformation prior to the application of SSA, we may compare the forecasting performance of SSA under different transformations and control factors: firstly, by comparing the Root Mean Squared Forecast Error (RMSFE), and secondly, by employing a nonparametric test to examine the treatment effect (data transformation).

Root Mean Squared Forecast Error (RMSFE)
The most commonly adopted approach for comparing predictive accuracy between forecasts is to compute and compare the RMSFE from out-of-sample forecasts. The RMSFE can be defined as

$$\mathrm{RMSE}_h = \sqrt{\frac{1}{N} \sum_{t=1}^{N} (x_{t+h} - \hat{x}_{t+h})^2},$$

where $h$ is the forecast horizon, $N$ is the number of observations, $x_t$ is the observed value of the time series, and $\hat{x}_t$ is the forecasted value. The application of data transformation prior to forecasting with SSA may significantly affect the forecasting outcome, and the effect may vary based on the properties of a time series. Thus, we need to examine the effect of data transformation on RMSFE along with the differing properties of time series. Comparisons between the RMSFE of the original and transformed time series can be used to learn about the forecasting performance of a model for a given time series. However, comparing RMSFE for a pool of time series with different characteristics is not straightforward. We therefore compute $\mathrm{RMSE}_h$ for $h = 1, 3, 6, 12$ ($h = 1$ for a short-term forecast, $h = 3, 6$ for medium-term horizons, and $h = 12$ for a long-term forecast) for each of the time series in the pool and examine the effect of transformation by using statistical tests.
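The error measure itself is a one-liner; a sketch matching the definition above:

```python
import numpy as np

def rmsfe(actual, forecast):
    """Root mean squared forecast error over an out-of-sample window."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))
```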

Nonparametric Repeated Measure Factorial Test
Treatment effects in the presence of factors can be examined by employing the nonparametric repeated measure factorial test [31,32] for a pool of time series of different characteristics. Thus, the effect of data transformation (treatment) can be examined by using this test under different characteristics of a time series.
Let us assume that we have K time series in the pool, with series codes A k , k = 1, . . . , K, and that for each series RMSE h is computed for h = 1, 3, 6, 12. If the interest lies in exploring the effect of transformation with respect to the skewness property of a time series, we essentially perform the test for the treatment effect (transformation) across the categories of the skewness property. There are three levels of the factor Skewness, namely Skew Negative, Skew Positive, and Skew Symmetric. Similarly, we have two levels for the factor Normality (Yes = normal; No = not normal) and two levels for the factor Stationarity (Yes = stationary; No = nonstationary). To test the effect of transformation (No transformation, Standardisation, and Logarithmic transformation), we follow the procedures described below.
First, we learn some basic characteristics of a time series such as normality, stationarity, skewness, and frequency. For example, the frequency of a time series can be learnt by examining the time of measurement: hourly, daily, weekly, monthly, or annually. We also classify time series into different categories via a series of statistical tests such as the Jarque-Bera test for normality [33], the KPSS test for stationarity [34], and the D'Agostino test for skewness [35].
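As an illustration of this classification step, the Jarque-Bera statistic can be computed by hand from sample skewness and excess kurtosis; in practice, library routines such as scipy.stats.jarque_bera, scipy.stats.skewtest (D'Agostino), and the KPSS test from statsmodels would be used instead:

```python
import numpy as np

def jarque_bera_stat(x):
    """Jarque-Bera statistic from sample skewness and excess kurtosis;
    under normality it is asymptotically chi-squared with 2 df."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    s2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / s2 ** 1.5
    excess_kurt = np.mean(d ** 4) / s2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
```

A large statistic (relative to the chi-squared critical value, 5.99 at the 5% level) flags a series as non-normal.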
Secondly, the nonparametric repeated measure factorial test [31,32] is used to test the effect of the transformation on RMSFE, across different categories where categories are defined based on Frequency, Normality, Skewness, and Stationarity.

Data Analysis
We used the same set of time series employed by Ghodsi et al. [25] to test the effect of data transformation on SSA forecasting accuracy. The dataset contains 100 real time series with different sampling frequencies and stationarity, normality, and skewness characteristics, representing various fields and categories, obtained via DataMarket (http://datamarket.com). Table 1 presents the number of time series with each feature. It is evident that the real data include series recorded at varying frequencies (annual, monthly, weekly, daily, and hourly) alongside varying distributions (normally distributed, skewed, stationary, and non-stationary). Interestingly, the majority of the data are non-stationary over time, which resonates with expectations in real-life scenarios.
The name and description of each time series, along with the codes assigned to improve presentation, are presented in Table A1. Table A2 presents descriptive statistics for all time series to enable the reader to obtain a rich understanding of the nature of the real data. This also includes skewness statistics and results from the normality (Shapiro-Wilk) and stationarity (Augmented Dickey-Fuller) tests. As visible in Table A1, the data come from different fields such as energy, finance, health, tourism, the housing market, crime, agriculture, economics, chemistry, ecology, and production. Figure 1 shows the time series for a selection of 9 of the 100 series used in this study. This enables the reader to obtain a further understanding of the different structures underlying the data considered in the analysis. For example, A007 is representative of an asymmetric non-stationary time series for the labour market in a U.S. county. This monthly series shows seasonality with an increasing non-linear trend. In contrast, A022 relates to a meteorological variable that is asymmetric, yet stationary and highly seasonal in nature. An example of a time series that is both asymmetric and non-stationary is A038, which represents the production of silver. Here, structural breaks are visible throughout. A055 is an annual time series, which is stationary and asymmetric, and relates to the production of coloured fox fur. An example of a quarterly time series representing the energy sector is shown via A061. This time series is non-stationary and asymmetric, with a non-linear trend and seasonality that increases over time. Another example focuses on the airline industry (A075) and is also asymmetric and non-stationary in nature. It appears to showcase a linear, increasing trend along with seasonality. A skewed and non-stationary sales series is shown via A081, with the trend indicating increasing seasonality with major drops in the time series between each season.
A time series for house sales (A082) can be found to be normally distributed and non-stationary over time. It also shows a slightly curved non-linear trend and a sine wave that is disrupted by noise. Finally, the labour market is drawn on again via A094, but this is an example of a time series affected by several structural breaks, leading to a non-stationary, asymmetric series that also has seasonal periods and a clear non-linear trend.

The R packages "Rssa" [36][37][38] and "nparLD" [39] are employed to implement SSA forecasting and the nonparametric repeated measure factorial test, respectively. We apply SSA to three versions of each dataset: the dataset without any transformation, a standardised dataset, and a log-transformed dataset. For each of the three versions, we obtain the RMSFE from out-of-sample forecasting at forecast horizons h = 1, 3, 6, 12. It is noteworthy that our aim in this paper is to examine the effect of transformation on SSA forecasting. Thus, we consider the best forecast based on the RMSFE of the last 12 data points, regardless of whether the forecast is from the recurrent or the vector-based approach.
We also know that the window length m, the number of components k, and the forecasting methods (recurrent and vector) affect the forecasting outcome. Thus, we adopt a computationally intensive approach by considering combinations of m and k, and methods that provide the minimum RMSFE for the out-of-sample forecast for the last 12 data points. The RMSFEs obtained from the computationally intensive approach are given in Tables A3-A6.
Given that the best forecasting results are achieved by utilising a computationally intensive approach, we seek to identify the factors that can affect the RMSFE. To address this, we employ the statistical tests described in Section 4.2. For each of the series with RMSFE reported in Tables A3-A6, we examine the characteristics of the time series as described in Section 4.2. At this stage, we have the inputs required for the nonparametric repeated measure factorial test of the treatment effect (data transformation) under different characteristics of these time series. Results obtained from the Wald-type tests are provided in Table 2. They indicate that transformation alone does not affect SSA forecasting performance, but the interaction between sampling frequencies and transformation is significant, which means that SSA performance is affected by transformation at some sampling frequencies.
The above findings are important in the practice for several reasons. First, in the real world, it is well known that most time series do not meet the assumption of normality. However, as the effect of normality and its interactions with transformations are not significant, when faced with normally distributed data, our findings indicate that there is no impact on the forecasting accuracy of SSA with or without data transformations. Furthermore, these findings also indicate that data transformations do not improve the forecast accuracy in non-normal data either. Secondly, we find that, when series are stationary, it affects the long-term forecasting accuracy of SSA. However, when generating short-term forecasts, the forecasting accuracy of SSA is not affected by stationarity. Thirdly, in reality, as most time series are skewed and increasingly found at varying frequencies (especially following the emergence of Big Data), these findings show that forecasters should remember that varying skewness and frequency of data are features indicative of the need for careful exploration of the use of SSA as the forecasts are sensitive to these features. In general, transformations are not required when forecasting with SSA, as there is no evidence of transformations impacting the SSA forecasting performance; however, there could be a significant impact at certain sampling frequencies. This indicates that, when modelling data with different frequencies, the sensitivity of SSA forecasts to such frequencies could potentially be controlled by transforming the input data.
Since the interaction between sampling frequency and transformation is significant, we explore the relative effect of frequencies on RMSFE. Figure 2 shows the effect plot of the treatment (transformation) for the different forecast horizons h = 1, 3, 6, 12. To explore the relative effects of sampling frequency for different forecast horizons, we plot the relative effect of frequencies in Figures 3 and 4. From these figures, we can evaluate how the hourly (F H), weekly (F W), quarterly (F Q), and annual (F A) sampling frequencies affect the forecasting performance of SSA. Moreover, the change in shape of the transformations' relative effects (e.g., see the difference between the shapes of the "F Q" and "F H" lines in Figures 3 and 4) suggests an interaction between transformation and sampling frequency.
We analyse the results by forecast horizon. It can be seen in Figure 3 that, in very short-term forecasting (h = 1), standardisation produces a comparatively large RMSFE at quarterly frequencies, while the log transformation reports a slightly larger RMSFE at daily, quarterly, hourly, and annual frequencies. This indicates that users should certainly avoid transforming data with quarterly frequencies when forecasting at h = 1 step ahead with SSA. In the short-term forecasting horizon (h = 3) (see Figure 3), the smallest RMSFE belongs to standardisation at monthly frequencies, while standardisation has the largest RMSFE at quarterly frequencies. In mid- and long-term forecasting horizons (h = 6 and 12), which are visible in Figure 4, the following can be seen. At h = 6 steps ahead, standardisation produces the lowest RMSFE at monthly sampling frequencies, whilst it has the largest RMSFE for quarterly and weekly time series data. The log transformation produces higher RMSFEs at daily, hourly, and annual frequencies. Accordingly, the only instance in which standardisation could produce better forecasts with SSA at this horizon is when faced with monthly data. At h = 12 steps ahead, standardisation leads to better forecasts at daily frequencies, whilst log transformations can provide better forecasts with SSA at weekly frequencies.
Finally, these findings indicate that standardisation should only be used to transform data when forecasting with SSA at h = 12 steps ahead at the daily frequency, at h = 3 or h = 6 steps ahead when dealing with a monthly frequency, and at h = 1 step ahead when forecasting data with monthly or weekly frequencies. At the same time, standardisation should not be employed when forecasting quarterly data at any horizon, as it worsens the forecasting accuracy by comparatively large margins. Interestingly, log transformations are only suggested when forecasting weekly data at h = 6 or h = 12 steps ahead. In the majority of instances, SSA is able to provide superior forecasts without the need for data transformations, across time series with varied frequencies.

Concluding Remarks
This paper focused on evaluating the impact of data transformations on the forecasting performance of SSA, a nonparametric filtering and forecasting technique. Following a concise introduction, the paper introduces the SSA forecasting approaches followed by the transformation techniques considered here. Regardless of its popularity (and in contrast to other methods such as ARIMA and neural networks), there has been no empirical attempt to quantify the impact of data transformations on the forecasting capabilities of SSA. Accordingly, we consider the impact of standardisation and logarithmic transformations on the forecasting performance of both vector and recurrent forecasting in SSA. In order to ensure robustness within the analysis, we not only compare the forecasts using the RMSFE but also rely on a nonparametric repeated measure factorial test.
The forecast evaluation is based on 100 time series with varying characteristics in terms of frequency, skewness, normality, and stationarity. Following the application of SSA to three versions of the same dataset, i.e., the original data, standardised data, and log-transformed data, we generate out-of-sample forecasts at horizons of 1, 3, 6, and 12 steps ahead. Our findings indicate that, in general, data transformations do not affect SSA forecasts. However, the interaction between sampling frequency and transformation is found to be significant, indicating that data transformations matter at certain sampling frequencies.
According to the results of this study, among time series with a higher sampling frequency (i.e., daily or hourly data), standardisation can improve SSA forecasting accuracy only in the long term (h = 12) and only at the daily frequency. On the other hand, for time series with low sampling frequencies (i.e., quarterly and annual), neither the logarithmic transformation nor standardisation is suitable at any horizon. At the remaining sampling frequencies (weekly and monthly), standardisation can improve forecasts at all horizons except h = 12 when faced with monthly data, and at h = 1 step ahead when faced with weekly data. The results also show an improvement in forecasting accuracy for weekly data with logarithmic transformations at h = 6 and h = 12 steps ahead. These findings provide additional guidance to forecasters, researchers, and practitioners alike in terms of improving the accuracy of forecasts when modelling data with SSA.
Future research should consider the relative gains of suggested data transformations at different sampling frequencies in relation to other benchmark forecasting models as well as theories explaining the mechanism of these effects in detail. Moreover, the development of automated SSA forecasting algorithms could be informed by the findings of this paper to ensure that data transformations are conducted prior to forecasting at selected sample frequencies.

Conflicts of Interest:
The authors declare that there is no conflict of interest.