ForecastTB -- An R Package as a Test-bench for Forecasting Methods Comparison

This paper introduces the R package ForecastTB that can be used to compare the forecasting accuracy of different methods as related to the characteristics of a dataset. The ForecastTB is a plug-and-play structured module, and several forecasting methods can be included with simple instructions. The proposed test-bench is not limited to the default forecasting and error metric functions, and users are able to append, remove or choose the desired methods as per requirements. Besides, several plotting functions are provided to allow comparative visualization of the performance and behavior of different forecasting methods. Furthermore, this paper presents example applications with natural time series datasets to exhibit the feature of the ForecastTB package to evaluate forecasting comparison analysis as affected by characteristics of a dataset.


Introduction
Decision making is one of the most crucial tasks in many domains, and often, decisions are based on the accurate forecast available in the respective domains. A large number of areas such as energy [1,2], economics [3], infrastructure [4,5], health [6,7], agriculture [8], defense [9], education [10,11], technology [12] and many others are looking forward to benefits that can be achieved with time series forecasting. Time series are consecutive sequences of values ordered with respect to time. Statistically, it is represented by the theory of stochastic processes. The notable feature of time series data are the time-based correlation between values such that the likelihood of observing a single value depends on the values of past or future observations [13,14].
There are several challenges in adopting a standardized approach to compare forecasting methods. An obvious concern is the nature of the time series and selection of appropriate methods, such that the complete variation from seasonality, trends and random parameters should be handled with each method. For example, one method may be superior for very seasonal time-series but perform poorly relative to other methods for a trendy or a random one. Besides, interpretations might be influenced by the choice of error metric, e.g., Root mean square error (RMSE), as different metrics have different objectives [15].
The ForecastTB package can be helpful for professionals and researchers working in the field of data science and forecasting analysis. The salient features of the ForecastTB package are as follows: •

Reduction in efforts and time consumption:
The ForecastTB package is designed to reduce the efforts and time consumption for the time-series forecasting analysis. It avoids the repetitive steps in the analysis and leads to the promising comparative results report generation.
• Truthful comparison assurance: The ForecastTB package ensures a truthful and unbiased comparison of forecasting methods. Hence, this package may be considered a reliable tool for forecasting models based on industrial reports generation or scientific publications. •

Reproducible research:
Along with unbiased comparisons, the ForecastTB package provides ease in the reproducible research with minimum efforts. In other words, the forecasting comparison can be reproduced several times easily with the help of the ForecastTB package. • Stepping stone in machine learning automation: Forecasting methods play a very important role in machine learning applications. The ForecastTB package aims to evaluate the best performing forecasting method for a given time series and this can be presented as a stepping stone in machine learning automation. For example, on changing nature and patterns of the time series dataset, a machine learning application could automatically replace the existing forecasting methods based on the output of the ForecastTB package. •

A handy tool:
The ForecastTB package is a handy tool, especially for the researchers who are not comfortable with computer coding, since it is a plug-and-play module based package. A very simple syntax leads to very impressive and accurate forecasting comparison analysis.
This paper describes the ForecastTB package to simultaneously compare different forecasting methods for univariate time series [16]. The motivation behind this package is another R package, named imputeTestbench [17], which found great success in performing comparison analysis with time series imputation methods in several research studies [18][19][20][21]. The ForecastTB package aims at providing an evaluation toolset that inscribes the challenges for discovering the most suitable forecasting method before a more detailed analysis. This package provides several options for simulating random possibilities with different strategies including Monte-Carlo with repeated sampling from a complete dataset. Future values are forecasted using any of several methods and then compared with a common error metric chosen by the user. A couple of plotting functions are provided to visualize the overall forecast accuracy evaluation between methods. Besides, example applications are discussed to demonstrate how the ForecastTB package can be used to understand why different methods have different forecasting accuracy given characteristics of the original time-series dataset.

Overview of ForecastTB
The ForecastTB package is a plug-and-play structured module as shown in Figure 1. It is used to compare the forecasting methods, which begins by forecasting time series with distinct strategies. Then the error metrics are used to identify the prediction accuracies of each method for several repetitions. The primary function is prediction_errors(), which is used to evaluate different forecasting methods with the consideration of various input parameters. It returns an object, which is the basic module in the package. Further this module can be updated with new methods and other parameters with append_() and choose_() functions. The Monte_Carlo() function is a further extension of the prediction_errors() module to compare distinct methods for randomly selected patches of the input time series. The remaining two functions, plotMethods() and plot_circle(), are used to visualize forecasted results and error summaries for the chosen methods. All functions, as a set of modules, are based on, and connected with the object provided by the prediction_errors() function. Hence, the framework of the ForecastTB package is version-control-friendly. It means, in the future, new features in the next versions of the package, can be easily introduced. Dependencies include additional packages for data manipulation (reshape2, [22]; stats, [23]), graphing (ggplot2, [24]; circlize, [25]; graphics, [26]; gridExtra, [27]; RColorBrewer, [28]), and forecast (forecast, [29]; PSF, [30,31]; decomposedPSF, [32]; methods, [33]).
Recently, a package in R was proposed for streamlining time series forecasting with limited facilities and features, named as predtoolsTS [34,35]. This tool assists in forecasting with the automated Auto Regressive Integrated Moving Average (ARIMA) model [29] and only regression type algorithms from the caret [36] package in the closed environment. Beyond this, there is no facility to append new forecasting methods and updating the comparison environment as per the user's need. On the contrary, the proposed ForecastTB package provides such multiple facilities along with the possibilities of introducing new features with a simple plug-and-play structure as discussed in the article.

The prediction_errors() function
The prediction_errors() function evaluates the accuracy of different forecasting methods based on various parameters supplied by user. The default methods included in prediction_errors() are ARIMA from the forecast package [29] and the Pattern sequence based forecast (PSF) method from the PSF package [30,31]. This statistical method, ARIMA, is most widely and routinely applied in time series forecasting. It is easily understood compared to more complex approaches. Moreover, to avoid further complexity, the auto.arima() function in the forecast package [29] and psf() from the PSF package are used, which automatically estimates the model parameters required for the given dataset. Although we acknowledge that the accuracy of a chosen method depends on the data, the default technique is one of the best state-of-the-art forecasting methods. As noted below, additional methods can be easily added as needed.
The prediction_errors() function has the following arguments: A ts (stats) or numeric object that will be evaluated. The input object is a complete dataset for performance evaluation and comparison of forecasting methods. The examples in the paper use the nottem time series object of average air temperatures recorded at Nottingham Castle from 1920 to 1930 (datasets package, [23]).

nval:
An integer value to decide a number of values to be predicted while comparing the forecasting methods. The default value is 12.

ePara:
The error metric used to compare the original, observed values from data with the forecasted values. Metrics included with ForecastTB are root-mean squared error (RMSE), mean absolute percent error (MAPE) and mean absolute error (MAE). By default, the prediction_errors() produces results with all three metric, whereas the individual metric can be opted using ePara = 'rmse', 'mape', or 'mae'. Formulas for the error metrics are as follows: where X i andX i are the observed and forecasted data at time t, respectively. N is the number of data for forecast evaluation. Additionally, a new error metric of user's choice can be introduced in the following manner: ePara = c("errorparametername"), where errorparametername is an external function which returns desired error values for original and forecasted values. It is discussed in detail in following sections.

ePara_name:
A character string for the names of the user-supplied script that includes one to many error methods passed to ePara. Default list of names of error parameters in the prediction_errors() function are RMSE, MAE and MAPE.

Method:
A string with method names that are used to forecast values with ARIMA, PSF and user defined methods. The ARIMA is a default method. It is used unless the argument is changed by the user. For example, Method = c("PSF") will use only the PSF with the prediction_errors() function. Methods not included in the default options can be added by including the vector as Method = c("test1(data, nval)"), where test1 is an external function which forecasts nval number of future values for input data and returns the forecasted values. Such multiple functions can be added in the Method parameter. The extrnal function is expected to provide forecast methods using the recursive strategy.

MethodName:
A character string for the names of the user-supplied script that includes one or more forecast methods passed to Method. The default list of name of method in the prediction_errors() function is ARIMA.

strats:
A character string of names of forecasting strategies used for time series forecasting. The strategies used in the current version of the prediction_errors() function are Recursive and DirRec. In the recursive approach, the first value is predicted with a prediction model and then appended to the data-set. Fits to the same model are then used to predict the next value. In the Dirrec approach, the first value is predicted with a model and appended it to the data-set. A new model is then fitted and used to predict next value and so on. The default strategy in the prediction_errors() function is the Recursive one.

append_:
A flag that suggests if the function is used to append to another instance. The default flag is 1, which indicates newly added methods are compared with the default methods. When the flag is 0, all methods, excluding default methods, will be compared.

dval:
An integer that specifies the length of the data subset to be used for forecasting. The default value is the length of the data parameter.
Considering the default values for the above arguments, the prediction_errors() function returns an S4 object of class prediction_errors as the error profile for the forecasting methods. This prediction_errors() function returns two slots as output. The first slot is output, which provides Error_Parameters, indicating error values for the forecasting methods and error parameters defined in the framework, and Predicted_Values as values forecasted with the same foreasting methods. Further, the second slot is parameters, which returns the parameters used or provided to prediction_errors() function. This is illustrated in the following example: The append_() and choose_() are functions to add new forecasting methods and choose the selected methods to and within the prediction_errors object. These functions make ForecastTB a handy plug-and-play tool for forecasting analysis. The append_() function has the following arguments: object: A prediction_errors object with two slots, output and parameters returned by the prediction_errors() function.
Method, MethodName, ePara and ePara_name: These parameters are with similar meaning as that supplied to the prediction_errors() function. The ePara and ePara_name parameters of the append_() function automatically get synced with the prediction_errors() function. The append_() function returns the updated prediction_errors object as the error profile for the forecasting methods along with newly added ones.
Further, the choose_() function helps the user to manipulate the prediction_errors object, by allowing selection of the desired methods existing in the object. The choose_() function has the following argument: choose_(object = a) object: A prediction_errors object returned by the prediction_errors() or append_() function. The syntax of the choose_() function is discussed in the following example. The variable c is an object with three forecasting methods comparisons. After provide the c object to the choose_() function, it returns the following output showing names of forecasting methods attached in the c object and asks the index number of method to be removed from the same object. Finally, it stores the newly updated object in the variable d.  In this way, the append_() and choose_() functions provide features to manage the desired forecasting methods in the prediction_errors object with simple and single lined syntax. Besides, these functions avoid repetitive computation and execution of the methods which are already attached in the prediction_errors object.

The monte_carlo() function
Monte-Carlo sampling is a very popular strategy to compare the performance of forecasting methods. In this strategy, multiple patches of a dataset are selected randomly and then tests of the performance of the forecasting methods are executed. It returns the average of error values obtained for each data patches. The most notable feature of the Monte-Carlo strategy is that it ensures an accurate comparison of forecasting methods and avoids the baised results which might be obtained by chance.
The monte_carlo() function has the following arguments: monte_carlo(object, size, iteration, fval, figs) object: A S4 object of the prediction_errors class with two slots, output and parameters returned by the prediction_errors(), append_() or choose_() functions. The monte_carlo() function tunes itself based on parameters returned by the object.

size:
An integer value representing the length of the patches from the time series dataset used in the Monte Carlo strategy. The input dataset is refered from the prediction_errors object.

iteration:
An integer value representing the number of iterations to be used by the Monte-Carlo strategy.

fval:
A flag, when it is on (fval = 1), the monte_carlo() function return the output along with the forecasted values for each iteration. The default value of fval is 0, that returns only the error values in RMSE in each iteration.

figs:
A flag to plot with the plot.prediction_errors() function for each iteration in Monte-Carlo strategy when figs = 1. The default value of figs is 0 and it don't return any plots.
This way, the monte_carlo() function returns the error values along with the mean values for the provided forecasted methods in each iteration of the Monte-Carlo strategy. This function can be used as a quick tool to create an intution of best suited forecasting method for a given time series dataset.

A case study: Performance of forecasting methods on natural time series
This section presents a case study to introduce the use of the ForecastTB package as a test-bench to compare the performance of time series forecasting methods. The methods used in the case study are the auto.arima(), ets(), psf(), and lpsf() functions, which are well accepted functions in the R community working over time series forecasting techniques. The data used in this example are nottem and sunspots, both of which are standard time series datasets available in R. The nottem dataset is the average air temperatures recorded at Nottingham Castle in degrees Fahrenheit from 1920 to 1930 (datasets package, [23]) Similarly, the sunspots dataset is mean relative sunspot numbers from 1749 to 1983 (datasets package, [23]). Both datasets are measured on monthly basis.
In the following examples, model training, forecasting, and plotting were shown for the nottem dataset, but only the error values are shown for the sunspots dataset. The following are the two functions that represent forecasting methods. These methods include LPSF and PSF methods from the decomposedPSF [32] and PSF [30] packages. These functions consume time-series dataset and nval as input parameters and return the forecasted values as a string.
library(decomposedPSF) test1 <-function(data, nval){ return(lpsf(data = data, n.ahead = nval)) } library(PSF) test2 <-function(data, nval){ a <-psf(data = data, cycle = 12) b <-predict(object = a, n.ahead = nval) return(b) } The following code chunk is the main function in ForecastTB package, in which performance of ARIMA along with the above mentioned (PSF and LPSF) methods are compared for the nottem time series for next 12 months forecast. The output of the function is saved in the variable a1 as a S4 object of prediction_errors class. a1 <-prediction_errors(data = nottem, nval = 12, Method = c("test1(data, nval)", "test2(data, nval)"), MethodName = c("LPSF","PSF"), append_ = 1) a1 ## An object of class "prediction_errors" ## Slot "output": "test1(data, nval)" "test2(data, nval)" ## ## $MethodName ## [1] "ARIMA" "LPSF" "PSF" ## ## $Strategy The object a1 shows the comparison of ARIMA, PSF and LPSF methods in addition to the parameters provided by users and the defaults ones, whenever not provided. Imagining a case that the user wishes to attach a new method in the object a1. Though this could be done with the prediction_errors() function, as discussed above, this will lead to the execution of all methods which were already available in the object. To avoid such redundant calculations and executions, the append_() function could be used. The Error, Trend, Seasonal (ETS) method from the forecast [29] package is framed in a function as follows: library(forecast) test3 <-function(data, nval){ b <-as.numeric(forecast(ets(data), h = nval)$mean) return(b) } The object a1 is updated with ETS method into object c1 with appned_() function and returns the error parameters as shown below: c1 <-append_(object = a1, Method = c("test3(data,nval)"), MethodName = c('ETS')) c1@output$Error_Parameters Further, the object c1 is plotted with the S3 method plot.prediction_errors() as follows and are shown in Figure 2. This figure shows the forecasting errors (bar plot) and forecasted values (line plot) for all methods attached in object c1. From Figure 2, it can be observed that the performance of ETS method is not satisfactory for the nottem time series dataset. In this case, the user can omit the unwanted method from object d1 with the choose_() function as shown below. After providing the object as input to the choose_() function, it returns the names of forecasting methods available in the prediction_errors object with respective indices, and asks the user to enter the indices of the methods to be removed from the object. In the following code chunk, index 4 for the ETS the method is provided by the user and this results in a new object e1, which is nothing but the object d1 without the ETS method. The choose_() function permits the user to remove multiple numbers of methods from the object, where the user needs to enter multiple numbers in a vector format.

Adding new error metrics
As described in the earlier section, the error metrics included with ForecastTB are RMSE, MAE, and MAPE. These error metrics provide distinct information and contrasting approaches to evaluate forecasting methods. For example, RMSE is the most used metric that maintains the scale of observations in the input data. The MAE metric is similar to RMSE but more interpretable as the differences are in proportion to the absolute values of the errors. This makes it different from RMSE that uses the sum of squared deviations. Further, the MAPE metric is a simple extension of the MAE metric. The MAPE scales the output by the range of observations in the input data and is used for comparing datasets, which differ in scale. Based on the information provided by each metric, users should use an appropriate one. Each of the metrics are called internally within the prediction_errors() function.
Additionally, new error metrics can be added using an approach similar to that used for adding forecasting methods described in earlier sections. The following example shows the use of the percent change in variance (PCV, [37]) as an alternative error metric: where var(Predicted) and var(Observed) are variance of predicted and obvserved values. The user-supplied error function should have two mandatory arguments as input, the first one must be a vector of observed values and the second as a vector of forecasted values. Both arguments should be of the same length. The function must also return a single value as a summary of the errors or differences. The new error function should be saved as an R script. The function is added to the ePara argument and the error function name is added as a character string to the ePara_name argument for the prediction_errors() function.
# error metric to include with 'prediction_errors()' function pcv <-function(obs, pred){ d <-(var(obs) -var(pred)) * 100/ var(obs) d <-abs(as.numeric(d)) return(d) } The following code chunk shows how the new error function is attached to the prediction_errors object and corresponding update are observed with the plot() function, as shown in Figure 3.

A unique plot
The ForecastTB package proposed a unique way of showing forecasted values, especially if the time series dataset shows seasonality. The Figure 4 shows the proposed plot type. This plot shows the nature of forecasted values and how these values are behaving on an increasing number of time horizons.
The numbers on the circle shows the forecasted time steps. Several forecasting methods are suitable for short-range forecasts but fail to maintain the accuracy for longer horizons. This proposed plot is capable of reflecting this feature in a more intuitive way. In Figure 4, the yellow-colored line for the ARIMA method is moving away from the test values as the time horizon proceeds.

Monte-Carlo strategy
The code chunk below shows Monte-Carlo simulation on nottem time series with monte_carlo() function. This function is supplied with above mentioned prediction_errors object a1. It is also supplied with size of each data patch in each iteration as 144 and the number of iterations as 10. The flag parameters fval and figs are set to 0, in order to avoid display of forecasted values and comparison plots for all iterations in the Monte-Carlo simulations. The effect of these flags is further discussed in the vignette published in ForecastTB package. This output shows the performance of ARIMA, LPSF and PSF methods on nottem time series for 10 different time patches. The first column in the output shows the numeric number as an index in time series from which the data patch is selected for the analysis. Besides, the last row shows the averaged error values for all methods and it ensures a more accurate and unbiased performance of the forecasting methods.
Furthermore, the following code chunk shows an object to compare the performance of forecasting methods on the sunspots time series and further a more rigorous comparison is done with Monte-Carlo simulations.
x1 <-prediction_errors(data = sunspots, nval = 24, Method = c("test1(data, nval)", "test2(data, nval)"), MethodName = c("LPSF","PSF"), append_ = 1) x1@output$Error_Parameters It is interesting to know, for the full-length sunspots time series, the PSF method observed to be the best among the selected methods with the prediction_error() function, where last 24 values of dataset were forecasted. Whereas, for the Monte-Carlo simulation, with rigorous iterations, ARIMA outperformed with significant error difference as compared to PSF and LPSF methods. In this way, the monte_carlo() function can be used in comparing forecasting methods by avoiding the biased results that can be obtained by chance.

Discussion
This article proposed and demonstrated the ForecastTB package for informing the use of forecasting methods as an important step towards more formal time series analysis. For the demonstration, two natural time-series datasets ware used to demonstrate how characteristics of the temporal correlation structure can influence the forecast accuracy. The ForecastTB package greatly assists in comparing different forecasting methods and considering the characteristic of the time series dataset.
This article also demonstrated how the package can be modified to include additional forecasting methods or comparison metrics. By default, the package provides an ARIMA (auto.arima()) as a forecasting method, and RMSE, MAE, or MAPE as error metrics. Although, as per the users need, the comparison of other methods and error metrics will be required in some advanced scenarios. As such, the package allow users to include additional forecasting methods for comparison, which can be extremely useful given the capability of R to interface with other programming languages (e.g, matlabr for MatLab, [38]; Rcpp for compiled languages, [39], and many other). Finally, a simple plug-and-play module based architecture of the ForecastTB to append or remove several forecasting methods and error metrics makes it a robust and handy tool to evaluate forecasting analysis.