Article

On the Connection between the GEP Performances and the Time Series Properties

by Alina Bărbulescu 1,* and Cristian Ștefan Dumitriu 2,*
1 Department of Civil Engineering, Faculty of Civil Engineering, Transilvania University of Brașov, 5, Turnului Street, 900152 Brașov, Romania
2 Technical Office of Design, Technologies, Research, SC Utilnavorep SA, 55, Aurel Vlaicu Av., 900055 Constanța, Romania
* Authors to whom correspondence should be addressed.
Mathematics 2021, 9(16), 1853; https://doi.org/10.3390/math9161853
Submission received: 20 May 2021 / Revised: 10 July 2021 / Accepted: 2 August 2021 / Published: 5 August 2021
(This article belongs to the Special Issue Analysis and Mathematical Modeling of Economic-Related Data)

Abstract
Artificial intelligence (AI) methods are interesting alternatives to classical approaches for modeling financial time series, since they relax the assumptions imposed on the data generating process by parametric models and do not impose any constraint on the model's functional form. Even though many studies have employed these techniques for modeling financial time series, the connection between the models' performances and the statistical characteristics of the data series has not yet been investigated. Therefore, this research aims to study the performances of Gene Expression Programming (GEP) for modeling monthly and weekly financial series that present a trend and/or seasonality, and after the removal of each component. It is shown that series normality and homoskedasticity do not influence the models' quality. The trend removal increases the models' performance, whereas the seasonality elimination diminishes the goodness of fit. Comparisons with ARIMA models are also provided.

1. Introduction

Financial time series analysis and forecasting have been interesting research fields for decades. Time series analysis may explain the law governing the data generating process, while good models may provide accurate predictions of its future behavior, supporting the decisions for profitable trading strategies [1,2].
Since the introduction of the Box–Jenkins methodology [3] for time series analysis and forecasting, ARIMA and SARIMA models have been extensively used to forecast financial time series [4,5]. In 1989, Hamilton [6] introduced the Markov Switching Model, which became one of the most popular models for nonlinear series, involving multiple equations that characterize the series' behavior in different regimes [7]. The switching mechanism has also been incorporated in conditional variance models, such as stochastic volatility models [8] or GARCH [9]. Recent studies suggest that GARCH [10,11] can be a promising alternative to the traditional SARIMA method in forecasting problems, especially for nonlinear data. Still, it was shown that it is difficult to find such models with very high accuracy for long series with high variability.
The efficient market hypothesis, which asserts that the market is efficient, has attracted much criticism from behavioral economists and psychologists. Introduced by Lo [12] to reconcile economic theories based on the efficient market hypothesis with behavioral economics, the adaptive market hypothesis applies the principles of evolution (competition, adaptation, and natural selection) [13]. The present study is in line with this theory.
In recent years, artificial intelligence methods have been used extensively for modeling financial series [14,15] because traditional statistical methods cannot capture high nonlinearity and discontinuities. Evolutionary algorithms have the capacity to learn the nonlinear features that dominate financial series [16,17], with the advantage of not relying on a specific form of the fitting model. Approaches such as support vector regression [15,18,19], neural networks [14,20], and hybrid algorithms [13,21] have provided valuable solutions to economic and financial modeling problems.
Part of the family of evolutionary techniques, the genetic programming paradigm is defined as a generalization of Holland's genetic algorithms. Starting from Koza's seminal work [22], many algorithms have been developed as alternatives to genetic programming. One of them is Gene Expression Programming (GEP) [23], which has been successfully used for modeling economic problems. For example, Huang et al. [24] used GEP to generate trading signals for stocks, and Chen et al. [16,25] utilized the Sortino ratio for fund selection, followed by the application of GEP to evolve dynamic trading strategies for the selected funds. Modeling economic problems and forecasting the evolution of mutual funds and financial indices have also been conducted by other scientists [13,15,25,26,27,28,29,30,31,32,33].
Despite the significant number of articles in this field, only one has investigated the relationship between the statistical properties of financial series and the goodness of fit of AI models, and it refers to generalized regression neural networks [14]. Therefore, this study fills a gap in the field. We analyze the statistical properties of different financial time series and investigate the connection between these properties and the GEP models' goodness of fit. Comparisons of these models with classical ARIMA models are also provided, showing the better performance of the GEP models. The research is important for selecting the best model from a diversity of options.

2. Materials and Methods

2.1. Studied Data and Statistical Tests

The data studied here consist of the BET index closing values, monthly series registered during October 2000–September 2014, and weekly series recorded in the period 1 October 2000–29 September 2014, downloaded from the website of the Bucharest Stock Exchange (BSE) [34]. Both series are complete, without gaps. The BET monthly and weekly series are represented in Figure 1.
BET, the first index developed by the BSE, is a free-float weighted capitalization index with ten constituents and is the reference index for the BSE. BET reflects the performance of the most traded Romanian companies listed on the BVB that meet the highest standards in the fields of investor relations and corporate governance. The constituents' weights are capped quarterly to ensure that they fulfill the standards for portfolio diversification [35].
Since the series present high variability, we took the logarithms of their values. The resulting monthly and weekly data series are denoted by S_m and S_w, respectively.
Statistical tests have been performed at the significance level of 5% for S_m, S_w, and their subseries determined after the change point detection. The null and alternative hypotheses (denoted by H0 and H1, respectively) of these tests are presented in Table 1, together with the references. Since the statistical tests are well-known, they are not discussed in detail here. For a deeper insight into them, readers may refer to [36,37,38,39,40,41,42,43,44,45,46,47,48,49].
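For illustration, part of the battery in Table 1 can be reproduced with standard statistical libraries. The sketch below is not the paper's R code; it assumes `x` is a one-dimensional array of ln(BET) values and covers only the normality, unit root, and stationarity tests.
```python
from scipy import stats
from statsmodels.tsa.stattools import adfuller, kpss

def run_battery(x, alpha=0.05):
    """Run a subset of the Table 1 tests and report reject/keep at level alpha."""
    pvals = {
        "Shapiro-Wilk (normality)": stats.shapiro(x)[1],
        "Jarque-Bera (normality)": stats.jarque_bera(x)[1],
        "ADF (unit root)": adfuller(x)[1],
        "KPSS, level stationarity": kpss(x, regression="c")[1],
        "KPSS, trend stationarity": kpss(x, regression="ct")[1],
    }
    return {name: (p, "reject H0" if p < alpha else "cannot reject H0")
            for name, p in pvals.items()}
```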

2.2. Methods

For modeling purposes, Gene Expression Programming (GEP) was used.
GEP is an algorithm for symbolic regression that does not require the form of the function to be specified in advance.
In GEP, an individual is a solution to the problem, with all the individuals (e.g., the problem’s solutions) forming a population. GEP individuals are represented by hierarchical mathematical expressions, encoded as multi-genic linear chromosomes of a fixed length, which are linear strings of functional symbols (+, −, ∗, /, and functions—exponential, trigonometric, logarithmic, etc.), variables (containing the lagged values of the time series), and constants [13].
A gene is composed of a head (that may contain any symbol) and a tail (containing only constants and variables). The user must set the number of genes and the set of symbols before running the algorithm.
The word generation describes an iteration of the algorithm, pointing out the evolutionary aspect of the search process. The initial generation of individuals is randomly selected.
An iteration of the algorithm consists of modifying the individuals and choosing them based on their quality. The individuals' modifications are realized by applying genetic operators: crossover, mutation, and transposition. Crossover combines the features of two or more parent individuals into one or more children. Mutation alters an individual, modifying some of its features. Gene transposition randomly chooses one of the chromosome's genes and moves it to the place of another gene.
The fitness function evaluates the solution quality.
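As a toy illustration of the head/tail encoding and of the operators described above (not the paper's implementation; the symbol sets and head length are illustrative), a gene can be held as a list of symbols whose head accepts any symbol and whose tail accepts only terminals:
```python
import random

FUNCTIONS = ["+", "-", "*", "/"]           # arity-2 function symbols
TERMINALS = ["x1", "x2", "c"]              # lagged variables and a constant
HEAD_LEN = 8
TAIL_LEN = HEAD_LEN * (2 - 1) + 1          # tail length = h * (max arity - 1) + 1

def random_gene():
    """Build a valid gene: head may hold any symbol, tail only terminals."""
    head = [random.choice(FUNCTIONS + TERMINALS) for _ in range(HEAD_LEN)]
    tail = [random.choice(TERMINALS) for _ in range(TAIL_LEN)]
    return head + tail

def point_mutation(gene):
    """Mutate one position, preserving the head/tail symbol constraints."""
    g = gene[:]
    i = random.randrange(len(g))
    pool = FUNCTIONS + TERMINALS if i < HEAD_LEN else TERMINALS
    g[i] = random.choice(pool)
    return g

def one_point_crossover(g1, g2):
    """Exchange the suffixes of two parent genes at a random cut point."""
    cut = random.randrange(1, len(g1))
    return g1[:cut] + g2[cut:], g2[:cut] + g1[cut:]
```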
GEP was used with good results for solving symbolic regression problems and time series forecasting. In the context of time series modeling, a candidate solution is a mathematical expression formed by functions, variables, and constants, and as a consequence, it is represented as a parse tree of the mathematical expression [20,23].
If $(x_t)_{t=1,\dots,n}$ are the values of the data series and $(\hat{x}_t)_{t=1,\dots,n}$ are those computed by the algorithm, GEP aims to determine a function $f$ that estimates the registered value at a moment $t$ ($t < n$) as well as possible, such that $\hat{x}_t = f(x_{t-1}, \dots, x_{t-w})$, where $w$ is a window size and $x_{t-1}, \dots, x_{t-w}$ ($0 \le t - w \le n$) are the previous $w$ values.
The analytical form of the function is not specified, being the result of the evolution of the mathematical expressions in the algorithm.
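A minimal sketch of the windowing just described, assuming the series is a one-dimensional numpy array: each target $x_t$ is paired with the regressors $x_{t-1}, \dots, x_{t-w}$.
```python
import numpy as np

def make_windows(series, w):
    """Return (X, y) with rows X[t] = [x_{t-1}, ..., x_{t-w}] and targets y[t] = x_t."""
    n = len(series)
    X = np.column_stack([series[w - 1 - k : n - 1 - k] for k in range(w)])
    y = series[w:]
    return X, y
```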
The basic structure of the gene expression algorithm is [23]:
(1) Create the chromosomes of the initial population;
(2) Express the chromosomes and evaluate their fitness:
  • If the stopping criterion is satisfied, designate the results and stop;
  • If the stopping criterion is not satisfied, go to the next step;
(3) Select chromosomes and keep the fittest for the next generation;
(4) Perform genetic modifications via genetic operators and gene recombination;
(5) Select the next-generation individuals;
(6) Go to (2).
The termination criterion is the number of generations.
The accuracy of a model for a series of n observations is measured by the correlation between the actual and predicted values, the root mean squared error (RMSE), the mean squared error (MSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) [13,14].
The settings for GEP parameters utilized in the present study (typical in the literature) are: the population size—50, the number of genes per chromosome—4, the gene head length—8, the maximum generations—2000, and the number of generations without improvement—1000. The settings of the evolution rate were: mutation rate—0.044, inversion rate—0.1, transposition rates—0.1, recombination rates: one point—0.3, two-point—0.3, gene rate—0.1. Other settings, recommended by the literature [13,16,22,23] and used for running the algorithm are:
  • The window size, w, was considered between 1 and 12. In the following, we report only the overall best result experiments;
  • The mutation rate of random constants: 0.01;
  • Maximum number of iterations: 1000;
  • Functions used in expressions: addition, subtraction, multiplication, division, square root;
  • Linking function: addition.
  • The fitness function is the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \hat{x}_i \right)^2, \tag{1}$$
where $x_i$ is the actual value and $\hat{x}_i$ is the computed value.
The best solution is the one with the smallest fitness.
The studied series was divided into two parts, the first one for training (70% of data) and the rest for the test (30% of data). The model was built using the training data (the algorithm learns the data on this set) and was validated on the test set.
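The paper's GEP models were built with the DTREG software (see Section 2.2 below). As a freely available stand-in, the sketch here uses gplearn, which evolves Koza-style expression trees rather than GEP chromosomes, but it mirrors the settings listed above (population of 50, MSE fitness, the {+, −, ∗, /, sqrt} function set, and the 70/30 split). The rate values and `log_series` (the log-transformed data as a numpy array) are assumptions.
```python
from gplearn.genetic import SymbolicRegressor

X, y = make_windows(log_series, w=1)      # helper from the windowing sketch above
split = int(0.7 * len(y))                 # 70% training, 30% test

model = SymbolicRegressor(
    population_size=50,
    generations=2000,
    function_set=("add", "sub", "mul", "div", "sqrt"),
    metric="mse",
    p_crossover=0.6,                      # assumed mapping of the rates listed above
    p_point_mutation=0.1,
    random_state=0,
)
model.fit(X[:split], y[:split])           # the algorithm learns on the training set
print(model._program)                     # the evolved expression
print(model.score(X[split:], y[split:]))  # validation on the test set
```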
The GEP was chosen for modeling purposes, since it is computationally faster than the methods based on neural networks and support vector regression, and generally gives better results in comparison to them.
For details about GEP and its applications, readers may refer to [20,23,49].
ARIMA models have been built as benchmark models for the series and their subseries determined by the change points.
A linear process $(X_t)_{t \in \mathbb{Z}}$ is called autoregressive of order p, AR(p), if
$$X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + \varepsilon_t, \quad \varphi_p \neq 0, \; t \in \mathbb{Z}, \tag{2}$$
where $(\varepsilon_t)_{t \in \mathbb{Z}}$ is a white noise with variance $\sigma^2$.
Defining
$$\Phi(z) = 1 - \varphi_1 z - \dots - \varphi_p z^p, \tag{3}$$
Equation (2) becomes
$$(1 - \varphi_1 B - \dots - \varphi_p B^p) X_t = \varepsilon_t \tag{4}$$
or
$$\Phi(B) X_t = \varepsilon_t, \tag{5}$$
where $B$ is the backshift operator, defined by $B(X_t) = X_{t-1}$ and $B^k(X_t) = X_{t-k}$.
A linear process $(X_t)_{t \in \mathbb{Z}}$ is called a moving average of order q, MA(q), if
$$X_t = \varepsilon_t - \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}, \quad \theta_q \neq 0, \; t \in \mathbb{Z}. \tag{6}$$
Defining
$$\Theta(z) = 1 - \theta_1 z - \dots - \theta_q z^q, \tag{7}$$
relation (6) becomes
$$X_t = \Theta(B)\, \varepsilon_t. \tag{8}$$
A linear process $(X_t)_{t \in \mathbb{Z}}$ is called an autoregressive moving average of orders p and q, ARMA(p, q), if
$$\Phi(B) X_t = c + \Theta(B)\, \varepsilon_t, \tag{9}$$
where the constant $c$ is called the drift.
An autoregressive integrated moving average process, ARIMA(p, d, q), is defined by the equation
$$\Phi(B)\, \Delta^d X_t = \Theta(B)\, \varepsilon_t, \tag{10}$$
where $\Delta^d X_t = (1 - B)^d X_t$ ($d > 0$), the roots of $\Phi$ and $\Theta$ have absolute values greater than 1, and $(\varepsilon_t)_{t \in \mathbb{Z}}$ is a centered white noise.
To select the best ARIMA model, the Akaike information criterion (AIC) was used. The lower the AIC, the better the model.
For a deeper insight into ARIMA models, the reader may refer to [41].
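A sketch of the AIC-based order selection described above, using statsmodels; the grid bounds are illustrative assumptions, and d is fixed in advance from the unit-root analysis.
```python
import itertools
import warnings
from statsmodels.tsa.arima.model import ARIMA

def best_arima(y, d, max_p=3, max_q=3):
    """Fit ARIMA(p, d, q) over a small grid and keep the model with the lowest AIC."""
    best = (None, float("inf"))
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                res = ARIMA(y, order=(p, d, q)).fit()
            if res.aic < best[1]:
                best = (res, res.aic)
        except Exception:
            continue                      # skip orders that fail to converge
    return best[0]
```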
Statistical tests and modeling were performed using the R and DTREG software.

3. Results and Discussion

3.1. Results of Statistical Analysis and the GEP Models for the Monthly Series

The results of the statistical analysis of the lnBET monthly series are the following. The normality and homoskedasticity hypotheses were rejected. After the application of the runs test, the null hypothesis was rejected. The analysis of the correlogram confirms the existence of autocorrelation in the data. After performing the Mann–Kendall test, the alternative hypothesis of the existence of a monotonic trend could not be rejected. Neither the ADF nor the PP test supported rejecting the existence of a unit root, while the KPSS test rejected the hypothesis of the series' stationarity in level or around a deterministic trend. After taking the first-order difference, the ADF and PP tests rejected the existence of the unit root, while the KPSS test did not reject the hypothesis of stationarity in level and around a deterministic trend.
The Pettitt test provides December 2004 (the 51st value of the series) as a breakpoint, while the mDP algorithm and CUSUM detected December 2004, November 2007 (86th value of the series), and March 2009 (102nd value of the series) as change points.
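For illustration, the Pettitt test can be implemented compactly from its definition; the approximate p-value below is the test's standard large-sample formula. The mDP and CUSUM procedures are more involved and are not sketched here.
```python
import numpy as np

def pettitt(x):
    """Pettitt change-point test: returns the change index, K, and approx. p-value."""
    x = np.asarray(x)
    n = len(x)
    s = np.sign(np.subtract.outer(x, x))           # s[i, j] = sign(x_i - x_j)
    # U_t = sum_{i <= t} sum_{j > t} sign(x_j - x_i)
    U = np.array([-s[: t + 1, t + 1 :].sum() for t in range(n - 1)])
    K = np.abs(U).max()
    p = min(1.0, 2.0 * np.exp(-6.0 * K**2 / (n**3 + n**2)))
    return int(np.abs(U).argmax()), K, p           # index of max |U_t|, statistic, p
```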
The same statistical tests have been performed for each subseries detected after the segmentation. The following notations are used in the rest of the paper for the monthly series: S_m is lnBET; S1_m, the series containing the first 51 values of S_m (October 2000–December 2004); S2_m, the series containing the values from the 52nd to the 86th (January 2005–November 2007); S3_m, the series containing the 16 values recorded from December 2007 to March 2009; S4_m, the series registered in the period April 2009–September 2014 (the last 66 values); U1_m, the series recorded between January 2005 and September 2014.
For each monthly series and subseries, the Mann–Kendall test [42] was used for testing the hypothesis that the series does not have a monotonic trend (H0) against the existence of such a trend (H1).
The results of all the tests are presented in Table 2, where yes means that H0 is rejected, and no means that H0 cannot be rejected.
For building GEP models for the monthly series, the window size, w, was selected between 1 and 12, considering the possible existence of a seasonal component. The performances of the best models are presented in Table 3, Table 4, Table 5, Table 6 and Table 7 for the training and test sets.
Table 3, Table 4 and Table 5 contain the results of the modeling of the monthly series without the trend removal, when the regressors were lag 1, and lag 1 and lag 12 variables, respectively. Since S3_m is too short, the model with lag 1 and lag 12 variables could not be built.
When analyzing the goodness of fit indicators for the series S_m and S1_m, one can see that the models’ performances on the training sets are not significantly different when lag 1 and lag 12 variables, or only lag 1 variables are considered.
The two-sided Diebold–Mariano test [50] was also used to cross-validate these findings. The null hypothesis was (H0) Both models have the same accuracy, and the alternative was (H1) The models do not have the same accuracy.
The p-values were 0.341 (0.532), when comparing the models for S_m (S1_m) on the training set. Therefore, the null hypothesis could not be rejected.
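The exact implementation of the Diebold–Mariano test is not given in the paper; a minimal one-step-ahead version under squared-error loss (without the HAC variance correction needed for multi-step horizons) can be sketched as:
```python
import numpy as np
from scipy import stats

def dm_test(actual, pred1, pred2):
    """Two-sided Diebold-Mariano test for equal accuracy of two forecast sets (h = 1)."""
    actual, pred1, pred2 = map(np.asarray, (actual, pred1, pred2))
    d = (actual - pred1) ** 2 - (actual - pred2) ** 2   # loss differential
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)          # no HAC correction for h = 1
    p_two_sided = 2 * (1 - stats.norm.cdf(abs(dm)))
    return dm, p_two_sided
```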
The model for S_m is better on the test set when using both regressors than when using only the first one (lag 1 variable), in terms of all indicators but the correlation between the actual and predicted values. The result of the Diebold–Mariano test confirms this assertion. This time the null hypothesis was the same as previously, while the alternative one was (H1′) The model obtained using both regressors is better than the models obtained utilizing lag 1 regressor.
Comparing the goodness of fit indicators, the results for the S1_m model are better when considering only the lag 1 regressor. Still, the two-sided Diebold–Mariano test does not reject the null hypothesis.
The models obtained for S2_m and S4_m are better when using only the lag 1 variable than when using lag 1 and lag 12, because a negative correlation between the actual and predicted values is noticed in the second case.
From the point of view of the correlation between actual and predicted values, GEP models with lag 1 regressors perform satisfactorily on both training and test sets, the best results being obtained for S1_m and S4_m. Since the models with lag 1 variables as regressors are better, in terms of correlation between the actual and predicted values, we kept these models for further comparisons.
For U1_m (Table 5), the quality of the model obtained with the lag 1 variable is worse than that obtained with lag 1 and lag 12 as regressors on the test set (see the negative value of the correlation between actual and predicted values), but on the training set, the results are not significantly different (as indicated by the two-sided Diebold–Mariano test).
GEP models have been built for the detrended series and subseries, with the same variables as regressors. The models with significant coefficients and the highest variance of the series explained by the model (among the linear, exponential, polynomial of second and third degrees, and logarithmic) have been chosen to express the trend. The best trends were found to be linear or exponential, as presented in Table 6.
For S1_m, S3_m, and S2_m, the variance of the data series explained by the models is very high.
The following notations are adopted for the monthly detrended series and subseries: S_md, S1_md, S2_md, S3_md, S4_md, and U1_md, respectively.
Table 7, Table 8 and Table 9 contain the modeling results after the trend removal.
Table 7 shows that the trend removal improved the models' performances (most significantly for S1_m and S4_m) on both the training and test series when the regressors were the lag 1 variables. Indeed, the MAPE values diminished after the trend removal for all series but S_m on the test set.
For example, Figure 2 presents the chart of S1_md (corresponding to the data from Table 7, columns 3 and 8). The equation of the detected linear trend in this model is:
Yt = 6.179 + 0.042t
where t = 1,…,51 is the number of the month, and Yt is the trend. The standard errors of the coefficients are 0.0235 and 0.0142, respectively.
The equation of the GEP model is:
$$y_t = 0.151\, y_{t-1}^2 + 0.7745\, y_{t-1}$$
The standard errors of the coefficients are 0.0121 and 0.0415, respectively. All the model's coefficients are significant at the 0.05 significance level.
From Table 8, one remarks that the algorithm learns the input data well for the monthly series and its subseries after the trend elimination. It performs better, in terms of the correlation between the actual and predicted values, on the test sets of S2_md and S4_md.
A significantly diminished MAPE was computed for the S2_md test set. When the regressors are lag 1 and lag 12 variables, a notable improvement of the models is noticed only on the training set. The algorithm learns data well, but it cannot use what it learned on the test sets. Overall, the models’ performances are comparable or better with one regressor (Table 7) than with two regressors (Table 8).
The best models for U1_md have been obtained with two regressors (columns 2 and 3 of Table 9). The models for U1_md are better than those for U1_m in terms of RMSE, MSE, MAE, and MAPE. Comparing the results from Table 5 and Table 9, one may remark that the trend removal contributed to improving the models' quality when using lag 1 variables as regressors. Still, the model is not satisfactory in terms of the correlation between the actual and predicted values on the test series.
The next stage of this analysis was building the deseasonalized series (and subseries) and the corresponding GEP models. For this aim, the classical decomposition method was employed. The equation is:
$$y_t = Y_t + S_t + \varepsilon_t,$$
where $y_t$ is the series to be decomposed, $Y_t$ is the trend, $S_t$ is the seasonality index, and $\varepsilon_t$ is the random component.
The difference $y_t - S_t$ is, by definition, the deseasonalized series.
In the following, the deseasonalized monthly series S_m is denoted by S_ms.
Analogous notations are used for the monthly subseries.
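As a sketch of this classical additive decomposition, assuming the monthly series is a pandas Series `y` with a DatetimeIndex:
```python
from statsmodels.tsa.seasonal import seasonal_decompose

dec = seasonal_decompose(y, model="additive", period=12)  # y_t = Y_t + S_t + e_t
deseasonalized = y - dec.seasonal                         # y_t - S_t, as defined above
```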
The performances of the best models are presented in Table 10 and Table 11.
For S_ms, the correlation between the actual and predicted values on the training set is close to zero. On the test set, the same indicator is higher. Still, it remains lower than for the raw series or the detrended one. Generally, the models for the deseasonalized subseries are worse than those for the raw and detrended series.
For example, in Figure 3, one may see the modeling errors for S4_ms (Table 10, columns 5 and 9) which are very low compared to the data values (that are in the interval (7.916, 8.932)). The equation of the corresponding GEP model is
$$y_t = (2.7144406 / y_{t-1}) + y_{t-1} - 0.9135744.$$
For U1_ms, the best model has been obtained for lag 1 and 12 as regressors.
To conclude, removing the seasonality did not improve the modeling quality.

3.2. Results of Statistical Analysis and the GEP Models for the Weekly Series

The change point determined by the Pettitt test for the weekly series is 215 (26 December 2004), and those found by mDP are 215 (26 December 2004), 365 (9 December 2007), 435 (26 April 2009). In what follows, the weekly series is denoted by S_w, and the subseries determined by the change points are denoted, respectively, by S1_w (containing the values 1–215), S2_w (containing the values 216–365), S3_w (containing the values 366–435), S4_w (containing the values 436–717), and U1_w (containing the values 216–717). The subseries and the fitted linear trend are presented in Figure 4.
Table 12 contains the equations of the trend lines of the weekly series. The coefficients and the models are significant at a 0.05 significance level (based on the t-test and the F-test). The trend determined for the series and subseries is the best fit for the data series, selected from a set of deterministic models (linear, exponential, polynomial of the second and third-order, logarithmic, and power functions).
The results of the statistical tests on the weekly series are presented in Table 13. For all tests except normality and homoskedasticity, they are similar to those for the monthly series. To summarize, the hypotheses that the series are Gaussian, homoskedastic, stationary (in level and around a trend), and random have been rejected, while the unit root hypothesis could not be rejected. Therefore, the series differentiation has been performed for finding the best ARIMA model (presented in the next section).
A window size from one to five has been considered when selecting the best GEP models for the weekly series. Therefore, in the models, the regressors were the lagged variables from one to five. These lag values have been selected because the actual values could be influenced by the values from the same week (and taking into account that during Saturdays and Sundays, the stocks are not traded at BSE). The best GEP models for the weekly series are presented in Table 14.
The models for S1_w, S4_w, and U1_w are good on both the training and test sets. The correlations between the actual and predicted values on the training sets are high for the other subseries, but they are much lower on the test sets (0.610 for S_w and 0.268 for S3_w). On the same test sets, the MAPEs are 2.294 and 1.887, confirming that the algorithm does not apply what it learned on the test sets of S_w and S3_w. Since the data series are neither monthly nor annual, a seasonal component does not appear.
The study results on the detrended series are presented in Table 15. The best models are those for S1_wd and S4_wd. Considering all indicators but the correlation between the actual and predicted values, the second place is occupied by the model for S2_wd, whereas taking this indicator into account, the model for U1_wd is in second place. The worst model is that for S3_wd. The trend removal improved the model quality (compared to the raw weekly series) on the training series and the S1_w test set.
On the test set, the models for S1_wd and U1_wd are worse than those for S1_w and U1_w, respectively. The models for S2_w and S2_wd are comparable. The model for S3_w (S4_wd) is better than that for S3_wd (S4_w) only based on the correlation between the actual and predicted values.

3.3. ARIMA Models for Monthly Series

The best ARIMA models for the monthly series are presented in the following. The lnBET series and subseries present a unit root, so the best models are obtained after taking the first-order difference of the raw series. Therefore, the goodness of fit coefficients will be smaller for the differentiated series (Figure 5) than for the raw series. This aspect should be taken into consideration when comparing the GEP and ARIMA models.
The best ARIMA models for the monthly series are presented in Table 16. For S1_m, S2_m, and S3_m, the best models are simple random walks of the differentiated series, with or without drift. The models for S_m and S4_m are ARIMA models whose autoregressive part is missing; for U1_m, it is an ARIMA model whose moving average part is absent.
To validate the ARIMA models, the hypothesis that the residuals form a white noise has been tested using the Anderson–Darling (A–D) test (for normality), the Levene test (for homoskedasticity), and plots of the autocorrelation and partial autocorrelation functions (ACF and PACF). For the last two, the lag at which a significant correlation appears is listed, while p-values (with three decimals) are displayed for the other tests. If correlation is absent, 'none' is inserted in Table 17, which summarizes the results.
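A sketch of these residual checks, assuming the residuals are a one-dimensional array; splitting the series into two halves for the Levene test is an assumption here, since the grouping used in the paper is not specified.
```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import normal_ad
from statsmodels.tsa.stattools import acf

def validate_residuals(resid, nlags=12, alpha=0.05):
    """White-noise checks: A-D normality, Levene on two halves, significant ACF lags."""
    resid = np.asarray(resid)
    ad_p = normal_ad(resid)[1]
    half = len(resid) // 2
    lev_p = stats.levene(resid[:half], resid[half:])[1]
    r, conf = acf(resid, nlags=nlags, alpha=alpha)
    sig_lags = [k for k in range(1, nlags + 1)
                if not (conf[k, 0] <= 0.0 <= conf[k, 1])]   # 0 outside CI => significant
    return {"A-D p": ad_p, "Levene p": lev_p, "significant ACF lags": sig_lags}
```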
Among the ARIMA models for the monthly series, only three are correct from a statistical viewpoint, those for S1_m, S2_m, and S3_m. These are kept for comparisons, given that only valid models can be employed for a possible forecast. The GEP models for S_m, S4_m, and U1_m (Table 4 and Table 5) are kept.
Comparing the models for S2_m (Table 4), the GEP model is the best on the test set (showing that the GEP algorithm learned the data well and applied what it learned on the new dataset). The same is true for S3_m, considering all indicators but MAPE, which is slightly higher on the S3_m test set.
For a fair comparison, the GEP has been run for S1_m, S2_m, and S3_m without dividing the series into two parts, for training and test. It resulted in:
  • For S1_m, RMSE = 0.1788, MSE = 0.0320, MAE = 0.0759, MAPE = 1.1035;
  • For S2_m, RMSE = 0.09165, MSE = 0.0084, MAE = 0.0638, MAPE = 0.7226;
  • For S3_m, RMSE = 0.2200, MSE = 0.0484, MAE = 0.1594, MAPE = 1.8826.
Therefore, in these cases, ARIMA models are the best.
The ARIMA models for the monthly detrended series are presented in Table 18.
Among them, there are an ARMA(3,1) model (for U1_md) and three AR models (for S1_md, S2_md, and S4_md). At first sight, all of them have significantly higher MAPEs than the corresponding GEP models (Table 7 and Table 9). Being a non-dimensional indicator, MAPE is the most suitable for comparing different kinds of models. On this basis, the GEP models are more competitive.
The results of the models’ validation are contained in Table 19. Based on this, only two models (Table 18) should be compared to the GEP ones, S1_md and S3_md. For these series, all the indicators of the GEP models are lower than the corresponding ones for ARIMA. Therefore, the best performances are noticed for all the GEP detrended series.
The ARIMA models for the monthly deseasonalized series are given in Table 20, and the results of their validation are in Table 21. Among the models, only that for S2_ms was validated (the hypothesis that the residual forms a white noise could not be rejected), so only the quality of this model should be discussed. The corresponding goodness of fit indicators of the ARIMA model are smaller than those of the GEP model for S2_ms (Table 10). Still, for a relevant comparison of the two models, the entire series has been used for building a GEP model. The indicators that characterize it are RMSE = 0.0523, MSE = 0.0027, MAE = 0.0384, and MAPE = 0.4237. Therefore, the GEP model is the best one.
Comparison of the ARIMA models for the raw series, the detrended and deseasonalized ones shows that:
  • In terms of MAPE, the models for the detrended series are the worst.
  • Taking into account RMSE, MSE, and MAE, the models for the raw series have performances similar to those for the detrended series for all but S3_m. Still, the model for S2_md is worse than that for S2_m, since the former does not satisfy the hypotheses on the residuals.
  • In terms of MAPE, the ARIMA models for S2_ms and S4_ms are better than those of S2_m and S4_m. Even if the MAPE for S1_ms is smaller than for S1_m, the residual is not white noise, so the first model cannot be considered better than the corresponding one for the raw data series.
  • Comparable results have been obtained for U1_m and U1_ms.
  • The other goodness of fit indicators generally have comparable values for the rest of the initial and deseasonalized series.

3.4. ARIMA Models for Weekly Series

The same procedure was followed for studying the weekly series. The ARIMA models for the series and the detrended ones are given in Table 22 and Table 23.
The output of the tests for validating the models is contained in Table 24 and Table 25.
Table 24 shows that all the ARIMA models except that for S2_w fail to satisfy the hypothesis that the residual series forms a white noise.
To compare the models for S2_w, the GEP algorithm has been run without dividing the series into training and test. The following values of the indicators have been obtained: RMSE = 0.0477, MSE = 0.0023, MAE = 0.0285, MAPE = 0.3222 (higher than the corresponding values in the ARIMA model). Therefore, ARIMA performs better than GEP only on S2_w.
Table 25 shows that the ARIMA models for the detrended weekly series cannot be validated. Furthermore, their MAPE values are extremely high compared to those of the GEP models for the same series and of the ARIMA models for the raw weekly data. Thus, the trend removal does not improve the models' quality.

4. Conclusions

Financial time series are characterized by nonlinearity and high variability, often accompanied by heteroskedasticity and non-stationarity, which make the deterministic methods unsuitable for modeling such series. Different stochastic models rely on restrictive assumptions on the data generating processes. The artificial intelligence methods do not impose such restrictions, making them good options when other kinds of algorithms cannot be used, or satisfactory models cannot be found by other approaches.
Despite their advantages, the relationship between the statistical properties of the data series (normality, homoscedasticity, autocorrelation, stationarity, and the existence of a trend) and the quality of the models built for the series using AI techniques has not been investigated. Knowledge of the performances of different algorithms applied to time series with the same statistical properties would eliminate some modeling options and reduce the time spent searching for an appropriate approach. Such modeling is suitable not only for financial time series but also for other fields, according to the statistical characteristics of a series.
Therefore, in this article, we built GEP and ARIMA models for lnBET series and subseries, to determine if the statistical properties are connected to the goodness of fit of such models. The modeling results show that:
(1) Normality and homoskedasticity do not have a major influence on the models' performances;
(2) The trend removal results in better GEP models;
(3) The seasonality elimination does not lead to an improvement of the modeling quality;
(4) The trend removal results in worse ARIMA models;
(5) Generally, GEP performed better than ARIMA on the studied series.
Remark (1) does not seem surprising, since we are dealing with point forecasts and performance measures of conditional location parameters of the conditional density.
The first three findings are in concordance with the results from [14], which refers to GRNN models.
The study should be extended to simulated and other real data series to collect more evidence on the connection between the models' performances and the series' statistical characteristics.

Author Contributions

Conceptualization, A.B.; methodology, A.B.; software, C.Ș.D.; validation, A.B.; formal analysis, A.B. and C.Ș.D.; investigation, A.B. and C.Ș.D.; data curation, A.B.; writing—original draft preparation, A.B. and C.Ș.D.; writing—review and editing, A.B. and C.Ș.D.; supervision, A.B.; project administration, A.B. Both authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available at: http://www.bvb.ro (accessed on 15 November 2018).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sinclair, T.M.; Stekler, H.O.; Kitzinger, L. Directional forecasts of GDP and inflation: A joint evaluation with an application to Federal Reserve predictions. Appl. Econ. 2008, 40, 2289–2297. [Google Scholar] [CrossRef]
  2. Wagner, N.; Michalewicz, Z.; Khouja, M.; Mcgregor, R.R. Time series forecasting for dynamic environments: The dyfor genetic program model. IEEE Trans. Evol. Comput. 2007, 11, 433–452. [Google Scholar] [CrossRef] [Green Version]
  3. Box, G.; Jenkins, G. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970. [Google Scholar]
  4. Li, Z.; Han, J.; Song, Y. On the forecasting of high-frequency financial time series based on ARIMA model improved by deep learning. J. Forecast. 2020, 39, 1081–1097. [Google Scholar] [CrossRef]
  5. Pahlavani, M.; Roshan, R. The Comparison among ARIMA and hybrid ARIMA-GARCH Models in Forecasting the Exchange Rate of Iran. Int. J. Bus. Dev. Stud. 2015, 7, 31–50. [Google Scholar]
  6. Hamilton, J. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994. [Google Scholar]
  7. Kuan, C.-M. Lecture on the Markov switching model. Inst. Econ. Acad. Sin. 2002, 8, 1–30. [Google Scholar]
  8. Harvey, A.C.; Ruiz, E.; Shephard, N. Multivariate stochastic variance models. Rev. Econ. Stud. 1994, 61, 247–264. [Google Scholar] [CrossRef] [Green Version]
  9. Gray, S.F. Modeling the conditional distribution of interest rates as a regime switching process. J. Financ. Econ. 1996, 42, 27–62. [Google Scholar] [CrossRef]
  10. Chatfield, C. Time Series Forecasting; Chapman and Hall Text in Statistical Science: London, UK, 2000. [Google Scholar]
  11. Lee, C.F.; Chen, G.-M.; Rui, O.M. Stock returns and volatility on China’s stock markets. J. Fin. Res. 2014, 24, 523–543. [Google Scholar] [CrossRef]
  12. Lo, A.W. The Adaptive Markets Hypothesis: Market Efficiency from an evolutionary perspective. J. Portf. Manag. 2004, 30, 15–29. [Google Scholar] [CrossRef]
  13. Bărbulescu, A.; Băutu, E. A hybrid approach for modeling financial time series. Int. Arab J. Inf. Technol. 2012, 9, 327–335. [Google Scholar]
  14. Bărbulescu, A. Do the time series statistical properties influence the goodness of fit of GRNN models? Study on financial series. Appl. Stoch. Model. Bus. 2018, 34, 586–596. [Google Scholar] [CrossRef]
  15. Simian, D.; Stoica, F.; Bărbulescu, A. Automatic Optimized Support Vector Regression for Financial Data Prediction. Neural Comput. Appl. 2020, 32, 2383–2396. [Google Scholar] [CrossRef]
  16. Chen, S. Genetic Algorithms and Genetic Programming in Computational Finance; Kluwer Academic Publishers: Amsterdam, The Netherlands, 2002. [Google Scholar]
  17. Karathanasopoulos, A.; Sermpinis, G.; Laws, J.; Dunis, C. Modelling and trading the Greek stock market with Gene Expression and Genetic Programing algorithms. J. Forecast. 2014, 33, 596–610. [Google Scholar] [CrossRef] [Green Version]
  18. Peng, X. TSVR: An efficient twin support vector machine for regression. Neural Netw. 2010, 23, 365–372. [Google Scholar] [CrossRef] [PubMed]
  19. Cao, L.; Tay, F.E.H. Financial forecasting using support vector machines. Neural Comput. Appl. 2001, 10, 184–192. [Google Scholar] [CrossRef]
  20. Saad, E.W.; Prokhorov, D.V.; Wunsch, D.C. Comparative study of stock trend prediction using time delay, recurrent and probabilistic neural networks. IEEE Trans. Neural Netw. 1998, 9, 1456–1470. [Google Scholar]
  21. Lin, C.T.; Prasad, M.; Saxena, A. An Improved Polynomial Neural Network Classifier Using Real Coded Genetic Algorithm. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 1389–1401. [Google Scholar] [CrossRef]
  22. Koza, J.R. Genetic programming as a means for programming computers by natural selection. Stat. Comput. 1994, 4, 87–112. [Google Scholar] [CrossRef]
  23. Ferreira, C. Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  24. Huang, C.H.; Yang, C.B.; Chen, H.H. Trading strategy mining with gene expression programming. In Proceedings of the 2013 International Conference on Applied Mathematics and Computational Methods in Engineering, Rhodes Island, Greece, 21–27 September 2013; pp. 37–42. [Google Scholar]
  25. Chen, H.H.; Yang, C.B.; Peng, Y.H. The trading on the mutual funds by gene expression programming with Sortino ratio. Appl. Soft Comput. 2014, 15, 219–230. [Google Scholar] [CrossRef]
  26. Lee, C.-H.; Yang, C.-B.; Chen, H.-H. Taiwan stock investment with gene expression programming. Procedia Comput. Sci. 2014, 35, 137–146. [Google Scholar] [CrossRef] [Green Version]
  27. Sermpinis, G.; Laws, J.; Karathanasopoulos, A.; Dunis, C. Forecasting and trading the EUR/USD exchange rate with Gene Expression and Psi Sigma Neural Networks. Expert Syst. Appl. 2012, 31, 8865–8877. [Google Scholar] [CrossRef]
  28. Bărbulescu, A.; Dumitriu, C.S. Artificial intelligence models for financial time series. Ovidius Univ. Ann. Econ. Sci. Ser. 2021, in press. [Google Scholar]
  29. Ariton, V.; Palade, V.; Postolache, F. Combined deep and shallow knowledge in a unified model for diagnosis by abduction. EuroEconomica 2008, 1, 33–42. [Google Scholar]
  30. Dragomir, F.L. Modeling Resource in E-Commerce. In Proceedings of the 10th International Conference on Knowledge Management: Projects, Systems and Technologies, Bucharest, Romania, 23–24 November 2017; Security and Defence Faculty “Carol I” National Defence University: Bucharest, Romania, 2017; pp. 38–41. [Google Scholar]
  31. Dragomir, F.L. Models of Digital Markets. In Proceedings of the 10th International Conference on Knowledge Management: Projects, Systems and Technologies, Bucharest, Romania, 23–24 November 2017; Security and Defence Faculty “Carol I” National Defence University: Bucharest, Romania, 2017; pp. 47–51. [Google Scholar]
  32. Bărbulescu, A.; Dumitriu, C.S. Markov Switching Model for Financial Time Series. Ovidius Univ. Ann. Econ. Sci. Ser. 2021, in press. [Google Scholar]
  33. Postolache, F.; Bumbaru, S.; Ariton, V. Complex systems virtualization in the current’s economical context. EuroEconomica 2010, 3, 29–50. [Google Scholar]
  34. Bucharest Stock Exchange. Available online: http://www.bvb.ro (accessed on 10 April 2019).
  35. Bucharest Exchange Trading. Available online: www.bvb.ro/info/indices/2017/2017.10.10%20-%20BET%20Factsheet.pdf (accessed on 10 April 2019).
  36. Gel, Y.R.; Gastwirth, J.L. A robust modification of the Jarque-Bera test of normality. Econ. Lett. 2008, 99, 30–32. [Google Scholar] [CrossRef]
  37. Razali, N.M.; Yap, B.W. Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests. J. Stat. Model. Anal. 2011, 2, 21–33. [Google Scholar]
  38. Levene, H. Robust Test for Equality of Variances. In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling; Olkin, I., Ed.; Stanford University Press: Palo Alto, CA, USA, 1960; pp. 278–292. [Google Scholar]
  39. Gibbons, J.D.; Chakraborti, S. Nonparametric Statistical Inference, 4th ed.; Marcel Dekker: New York, NY, USA, 2003. [Google Scholar]
  40. Wald, A.; Wolfowitz, J. On a test whether two samples are from the same population. Ann. Math. Stat. 1940, 11, 147–162. [Google Scholar] [CrossRef]
  41. Brockwell, P.J.; Davis, R.A. Introduction to Time Series and Forecasting; Springer: New York, NY, USA, 2002. [Google Scholar]
  42. Kendall, M.G. Rank Correlation Methods, 5th ed.; Oxford University Press: London, UK, 1990; pp. 56–80. [Google Scholar]
  43. Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431. [Google Scholar]
  44. Phillips, P.C.B.; Perron, P. Testing for a unit root in time series regression. Biometrika 1988, 75, 335–346. [Google Scholar] [CrossRef]
  45. Kwiatkowski, D.; Phillips, P.C.B.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root. J. Econ. 1992, 54, 159–178. [Google Scholar] [CrossRef]
  46. Pettitt, A.N. A non-parametric approach to the change-point problem. Appl. Stat. 1979, 28, 126–135. [Google Scholar] [CrossRef]
  47. Hawkins, D.M.; Olwell, D.H. Cumulative Sum Charts and Charting for Quality Improvement; Springer: New York, NY, USA, 1998. [Google Scholar]
  48. Gedikli, A.; Aksoy, H.; Unal, N.E.; Kehagias, A. Modified dynamic programming approach for offline segmentation of long hydrometeorological time series. Stoch. Environ. Res. Risk Assess. 2010, 24, 547–557. [Google Scholar] [CrossRef]
  49. Bărbulescu, A. Studies on Time Series. Applications in Environmental Sciences; Springer: New York, NY, USA, 2016. [Google Scholar]
  50. Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263. [Google Scholar]
Figure 1. BET monthly (left-hand side) and weekly series (right-hand side).
Figure 2. The best model for S1_m (with lag 1 regressor) when a linear trend was removed.
Figure 3. The best model for S4_ms (with lag 1 regressor).
Figure 4. The weekly series and subseries.
Figure 5. The monthly differentiated series.
Table 1. Tests performed for data analysis.

| No. | Type of Test | Null and Alternative Hypotheses | Tests Performed |
| --- | --- | --- | --- |
| I. | Normality | H0: The series is Gaussian; H1: The series is not Gaussian | Robust Jarque–Bera [36], Anderson–Darling, and Shapiro–Wilk tests [37] |
| II. | Homoskedasticity | H0: The series is homoskedastic; H1: The series is heteroskedastic | Levene test [38] |
| III. | Randomness | H0: The series comes from a random process; H1: The series does not come from a random process | Runs test [39,40]; autocorrelation function [41] |
| IV. | Trend existence | H0: The series does not have a monotonic trend; H1: The series has a monotonic trend | Mann–Kendall test [42] |
| V. | Unit root | H0: The series has a unit root; H1: The series is stationary | Augmented Dickey–Fuller (ADF) test [43], Phillips–Perron (PP) test [44] |
| VI. | Stationarity | H0: The series is stationary (a) in level or (b) around a deterministic trend; H1: The series is not stationary (a) in level or (b) around a deterministic trend | KPSS test [45] |
| VII. | Breakpoint | H0: The series has no breakpoint; H1: The series has at least one breakpoint | Pettitt test [46], CUSUM [47], mDP [48] |
Table 2. Results of statistical tests on the monthly series.

| Type of Test | S_m | S1_m | S2_m | S3_m | S4_m | U1_m |
| --- | --- | --- | --- | --- | --- | --- |
| I. Normality | yes | yes | no | no | no | yes |
| II. Homoskedasticity | yes | no | yes | no | no | yes |
| III. Randomness | yes | yes | yes | yes | yes | yes |
| IV. Trend existence | yes | yes | yes | yes | yes | no |
| V. Unit root | no | no | no | no | no | no |
| VI.a. Level stationarity | yes | yes | yes | yes | yes | yes |
| VI.b. Trend stationarity | yes | yes | yes | no | yes | yes |
Table 3. Performances of the best models built for the monthly series with lag 1 variables as regressors.

| Indicator | S_m (train) | S1_m (train) | S2_m (train) | S3_m (train) | S4_m (train) | S_m (test) | S1_m (test) | S2_m (test) | S3_m (test) | S4_m (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.965 | 0.893 | 0.850 | 0.785 | 0.743 | 0.617 | 0.980 | 0.570 | 0.577 | 0.960 |
| RMSE | 0.213 | 0.213 | 0.102 | 0.234 | 0.101 | 0.405 | 0.141 | 0.070 | 0.130 | 0.096 |
| MSE | 0.053 | 0.045 | 0.010 | 0.016 | 0.010 | 0.164 | 0.020 | 0.004 | 0.016 | 0.009 |
| MAE | 0.091 | 0.087 | 0.072 | 0.118 | 0.057 | 0.355 | 0.114 | 0.055 | 0.118 | 0.086 |
| MAPE | 1.205 | 1.323 | 0.830 | 1.529 | 0.690 | 4.129 | 1.421 | 0.608 | 1.529 | 0.979 |
| Removed trend | none | none | none | none | none | none | none | none | none | none |
Table 4. The goodness of fit indicators for the best models built for the monthly series with lag 1 and lag 12 variables as regressors.

| Indicator | S_m (train) | S1_m (train) | S2_m (train) | S4_m (train) | S_m (test) | S1_m (test) | S2_m (test) | S4_m (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.965 | 0.895 | −0.292 | 0.743 | 0.558 | 0.980 | 0.200 | −0.874 |
| RMSE | 0.228 | 0.212 | 0.275 | 0.096 | 0.147 | 0.226 | 0.097 | 0.239 |
| MSE | 0.052 | 0.044 | 0.075 | 0.009 | 0.021 | 0.051 | 0.009 | 0.057 |
| MAE | 0.091 | 0.087 | 0.193 | 0.056 | 0.123 | 0.184 | 0.074 | 0.215 |
| MAPE | 1.203 | 1.320 | 2.239 | 0.669 | 1.419 | 2.274 | 2.811 | 2.451 |
Table 5. The goodness of fit indicators for the best models built for U1_m.

| Indicator | Lag 1 & lag 12 (train) | Lag 1 & lag 12 (test) | Lag 1 (train) | Lag 1 (test) |
| --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.958 | 0.515 | 0.956 | −0.933 |
| RMSE | 0.101 | 0.136 | 0.104 | 0.183 |
| MSE | 0.010 | 0.018 | 0.010 | 0.033 |
| MAE | 0.073 | 0.114 | 0.074 | 0.151 |
| MAPE | 0.868 | 1.312 | 0.879 | 1.729 |
Table 6. The equations of the trend for the monthly series and subseries.

| Series | Trend Equation | Variance Explained by the Trend |
| --- | --- | --- |
| S_m | Yt = 7.2878 + 0.0112t | 47.938% |
| S1_m | Yt = 6.1794 + 0.0422t | 97.307% |
| U1_m | Yt = 8.5511 + 0.3236e^(−0.0264t) | 7.392% |
| S2_m | Yt = 8.5607 + 0.0202t | 84.604% |
| S3_m | Yt = 10.2396 − 1.0992e^(0.0604t) | 92.758% |
| S4_m | Yt = 8.3897 + 0.03742e^(0.0388t) | 54.772% |

Note. Yt denotes the trend and t is the time.
Table 7. Performances of the best models built for the detrended monthly series with lag 1 variables as regressors.

| Indicator | S_md (train) | S1_md (train) | S2_md (train) | S3_md (train) | S4_md (train) | S_md (test) | S1_md (test) | S2_md (test) | S3_md (test) | S4_md (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.988 | 0.988 | 0.927 | 0.810 | 0.576 | 0.975 | 0.980 | 0.567 | 0.699 | 0.942 |
| RMSE | 0.130 | 0.070 | 0.071 | 0.124 | 0.084 | 0.485 | 0.062 | 0.073 | 0.150 | 0.034 |
| MSE | 0.017 | 0.005 | 0.005 | 0.015 | 0.007 | 0.236 | 0.003 | 0.005 | 0.022 | 0.001 |
| MAE | 0.078 | 0.054 | 0.058 | 0.097 | 0.053 | 0.426 | 0.050 | 0.056 | 0.115 | 0.025 |
| MAPE | 1.006 | 0.791 | 0.664 | 1.126 | 0.633 | 4.956 | 0.630 | 0.625 | 1.482 | 0.290 |
| Removed trend | linear | linear | linear | exp | exp | linear | linear | linear | exp | exp |
Table 8. Performances of the best models built for the detrended monthly series and subseries with lag 1 and lag 12 variables as regressors.

| Indicator | S_md (train) | S1_md (train) | S2_md (train) | S4_md (train) | S_md (test) | S1_md (test) | S2_md (test) | S4_md (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.989 | 0.991 | 0.928 | 0.989 | 0.448 | 0.839 | 0.444 | 0.448 |
| RMSE | 0.129 | 0.061 | 0.075 | 0.121 | 0.656 | 0.295 | 0.108 | 0.656 |
| MSE | 0.016 | 0.003 | 0.005 | 0.016 | 0.431 | 0.087 | 0.011 | 0.431 |
| MAE | 0.078 | 0.048 | 0.059 | 0.078 | 0.587 | 0.239 | 0.088 | 0.587 |
| MAPE | 1.009 | 0.711 | 0.681 | 1.009 | 6.828 | 3.023 | 0.977 | 6.828 |
| Removed trend | linear | linear | linear | linear | linear | linear | linear | linear |
Table 9. Performances of the best models built for U1_md.

| Indicator | Lag 1 & lag 12 (train) | Lag 1 & lag 12 (test) | Lag 1 (train) | Lag 1 (test) |
| --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.958 | 0.168 | 0.954 | −0.916 |
| RMSE | 0.102 | 0.171 | 0.106 | 0.180 |
| MSE | 0.010 | 0.029 | 0.011 | 0.032 |
| MAE | 0.077 | 0.152 | 0.077 | 0.148 |
| MAPE | 0.902 | 1.754 | 0.904 | 1.694 |
| Removed trend | linear | linear | exp | exp |
Table 10. Performances of the best models built for the deseasonalized monthly series and its subseries with lag 1 variables as regressors.

| Indicator | S_ms (train) | S1_ms (train) | S2_ms (train) | S4_ms (train) | S_ms (test) | S1_ms (test) | S2_ms (test) | S4_ms (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.065 | 0.911 | 0.855 | 0.743 | 0.617 | 0.961 | 0.442 | 0.960 |
| RMSE | 0.231 | 0.198 | 0.096 | 0.101 | 0.405 | 0.212 | 0.105 | 0.096 |
| MSE | 0.053 | 0.039 | 0.009 | 0.010 | 0.164 | 0.044 | 0.011 | 0.009 |
| MAE | 0.091 | 0.080 | 0.055 | 0.057 | 0.355 | 0.187 | 0.083 | 0.086 |
| MAPE | 1.205 | 1.206 | 0.635 | 0.690 | 4.129 | 2.314 | 0.910 | 0.979 |
| Removed trend | none | none | none | none | none | none | none | none |
Table 11. Performances of the best models built for U1_ms.

| Indicator | Lag 1 & lag 12 (train) | Lag 1 & lag 12 (test) | Lag 1 (train) | Lag 1 (test) |
| --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.958 | 0.420 | 0.956 | −0.933 |
| RMSE | 0.102 | 0.112 | 0.104 | 0.184 |
| MSE | 0.010 | 0.031 | 0.011 | 0.034 |
| MAE | 0.152 | 0.097 | 0.075 | 0.151 |
| MAPE | 0.902 | 0.857 | 0.879 | 1.727 |
| Removed trend | none | none | none | none |
Table 12. The equations of the trend for the weekly series and subseries.

| Series | Trend Equation | Variance Explained by the Trend |
| --- | --- | --- |
| S_w | Yt = 7.2800 + 0.0026t | 47.650% |
| S1_w | Yt = 6.1372 + 0.0100t | 97.412% |
| U1_w | Yt = 8.7731 − 0.0005t | 5.589% |
| S2_w | Yt = 8.5569 + 0.0047t | 86.379% |
| S3_w | Yt = 11.3341 − 2.1602e^(0.0081t) | 89.660% |
| S4_w | Yt = 8.4420 + 0.0117e^(0.0137t) | 49.732% |
Table 13. Results of statistical tests on the weekly series.

| Type of Test | S_w | S1_w | S2_w | S3_w | S4_w | U1_w |
| --- | --- | --- | --- | --- | --- | --- |
| I. Normality | yes | yes | yes | yes | yes | yes |
| II. Homoskedasticity | yes | yes | yes | yes | yes | yes |
| III. Randomness | yes | yes | yes | yes | yes | yes |
| IV. Trend existence | yes | yes | yes | yes | yes | yes |
| V. Unit root | no | no | no | no | no | no |
| VI.a. Level stationarity | yes | yes | yes | yes | yes | yes |
| VI.b. Trend stationarity | yes | yes | yes | yes | yes | yes |
Table 14. Performances of the best models built for S_w and its subseries.

| Indicator | S_w (train) | S1_w (train) | S2_w (train) | S3_w (train) | S4_w (train) | U1_w (train) | S_w (test) | S1_w (test) | S2_w (test) | S3_w (test) | S4_w (test) | U1_w (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.992 | 0.976 | 0.957 | 0.952 | 0.930 | 0.991 | 0.610 | 0.968 | 0.604 | 0.268 | 0.901 | 0.916 |
| RMSE | 0.111 | 0.102 | 0.053 | 0.101 | 0.052 | 0.048 | 0.226 | 0.080 | 0.055 | 0.216 | 0.042 | 0.063 |
| MSE | 0.012 | 0.010 | 0.003 | 0.010 | 0.003 | 0.002 | 0.051 | 0.006 | 0.003 | 0.047 | 0.002 | 0.004 |
| MAE | 0.034 | 0.032 | 0.030 | 0.055 | 0.027 | 0.032 | 0.197 | 0.066 | 0.045 | 0.149 | 0.031 | 0.055 |
| MAPE | 0.448 | 0.493 | 0.339 | 0.638 | 0.323 | 0.380 | 2.294 | 0.828 | 0.491 | 1.887 | 0.356 | 0.642 |
| Regressors (lag) | 1 & 5 | 1 | 1 & 5 | 1 | 1 & 5 | 1 ÷ 5 | 1 & 5 | 1 | 1 & 5 | 1 | 1 & 5 | 1 ÷ 5 |
Table 15. Performances of the best models built for S_w and its subseries, after the trend removal. The detrended series are denoted by S_wd, S1_wd, S2_wd, S3_wd, S4_wd, and U1_wd.

| Indicator | S_wd (train) | S1_wd (train) | S2_wd (train) | S3_wd (train) | S4_wd (train) | U1_wd (train) | S_wd (test) | S1_wd (test) | S2_wd (test) | S3_wd (test) | S4_wd (test) | U1_wd (test) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Correlation actual–predicted values | 0.998 | 0.997 | 0.982 | 0.981 | 0.944 | 0.989 | 0.556 | 0.976 | 0.600 | 0.213 | 0.938 | 0.836 |
| RMSE | 0.060 | 0.033 | 0.034 | 0.065 | 0.046 | 0.050 | 0.487 | 0.058 | 0.059 | 0.204 | 0.045 | 0.187 |
| MSE | 0.004 | 0.001 | 0.001 | 0.004 | 0.002 | 0.002 | 0.237 | 0.003 | 0.003 | 0.041 | 0.002 | 0.035 |
| MAE | 0.032 | 0.025 | 0.025 | 0.043 | 0.026 | 0.032 | 0.439 | 0.049 | 0.046 | 0.142 | 0.037 | 0.152 |
| MAPE | 0.405 | 0.367 | 0.285 | 0.505 | 0.307 | 0.380 | 5.101 | 0.620 | 0.507 | 1.803 | 0.423 | 1.740 |
| Removed trend | lin. | lin. | lin. | exp | exp | lin. | lin. | lin. | lin. | exp | exp | lin. |
| Regressors (lag) | 1 & 5 | 1 ÷ 5 | 1 & 5 | 1 | 1 | 1 | 1 & 5 | 1 ÷ 5 | 1 & 5 | 1 | 1 | 1 |
Table 16. ARIMA models for the monthly series and subseries.

| Series | Model Type | AR1 | Drift | MA1 | RMSE | MSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S_m | ARIMA (0,1,1) with drift | - | 0.0156 | 0.2449 | 0.0875 | 0.0076 | 0.0621 | 0.7638 |
| S1_m | ARIMA (0,1,0) with drift | - | 0.0422 | - | 0.0707 | 0.0050 | 0.0531 | 0.7386 |
| S2_m | ARIMA (0,1,0) | - | - | - | 0.0755 | 0.0057 | 0.0551 | 0.6203 |
| S3_m | ARIMA (0,1,0) with drift | - | −0.0949 | - | 0.1508 | 0.0227 | 0.1187 | 1.4401 |
| S4_m | ARIMA (0,1,1) | - | - | 0.2774 | 0.0552 | 0.0030 | 0.0432 | 0.5073 |
| U1_m | ARIMA (1,1,0) | 0.2152 | - | - | 0.0893 | 0.0080 | 0.0630 | 0.7375 |
Table 17. Results of the statistical tests on the residuals from the ARIMA models for the monthly series.

| Series | Model Type | A–D | Levene | ACF | PACF | Remark |
| --- | --- | --- | --- | --- | --- | --- |
| S_m | ARIMA (0,1,1) with drift | 0.000 | 0.012 | 4 | 4 | Yes |
| S1_m | ARIMA (0,1,0) with drift | 0.195 | 0.708 | none | none | No |
| S2_m | ARIMA (0,1,0) | 0.335 | 0.199 | none | none | No |
| S3_m | ARIMA (0,1,0) with drift | 0.942 | 0.179 | none | none | No |
| S4_m | ARIMA (0,1,1) | 0.247 | 0.017 | 6 | 6 | Yes |
| U1_m | ARIMA (1,1,0) | 0.000 | 0.000 | none | none | Yes |

Note. Yes means that the hypothesis that the residual forms a white noise could be rejected.
Table 18. ARIMA models for the monthly detrended series.

| Series | Model Type | AR Coefficients | Drift | MA1 | RMSE | MSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S_md | ARIMA (0,1,1) | - | - | 0.2462 | 0.0875 | 0.0077 | 0.0626 | 111.8606 |
| S1_md | ARIMA (1,0,0) with 0 mean | AR1 = 0.7873 | - | - | 0.6720 | 0.4516 | 0.0530 | 41.9276 |
| S2_md | ARIMA (1,0,0) with 0 mean | AR1 = 0.6492 | - | - | 0.0693 | 0.0048 | 0.0552 | 197.9489 |
| S3_md | ARIMA (0,1,0) | - | - | - | 0.2260 | 0.0511 | 0.1675 | 65.7451 |
| S4_md | ARIMA (2,0,0) | AR1 = 1.2211, AR2 = −0.3352 | - | - | 0.0540 | 0.0029 | 0.0417 | 103.3804 |
| U1_md | ARIMA (3,0,1) | AR1 = 2.0902, AR2 = −1.2549, AR3 = 0.1561 | - | −0.9868 | 0.0858 | 0.0074 | 0.0610 | 87.0538 |
Table 19. Results of the statistical tests on the residuals from the ARIMA models for the monthly detrended series.

| Series | Model Type | A–D | Levene | ACF | PACF | Remark |
| --- | --- | --- | --- | --- | --- | --- |
| S_md | ARIMA (0,1,1) | 0.000 | 0.001 | 4 | 4 | Yes |
| S1_md | ARIMA (1,0,0) with 0 mean | 0.248 | 0.843 | none | none | No |
| S2_md | ARIMA (1,0,0) with 0 mean | 0.005 | 0.032 | none | none | Yes |
| S3_md | ARIMA (0,1,0) | 0.089 | 0.237 | none | none | No |
| S4_md | ARIMA (2,0,0) | 0.164 | 0.012 | 4 | 4 | Yes |
| U1_md | ARIMA (3,0,1) | 0.005 | 0.000 | none | none | Yes |

Note. Yes means that the hypothesis that the residual forms a white noise could be rejected.
Table 20. ARIMA models for the monthly deseasonalized series.

| Series | Model Type | AR1 | Drift | MA1 | RMSE | MSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S_ms | ARIMA (0,1,1) with drift | - | 0.0156 | 0.01564 | 0.0904 | 0.0080 | 0.0635 | 0.7816 |
| S1_ms | ARIMA (0,1,0) with drift | - | 0.0424 | - | 0.0740 | 0.0055 | 0.0540 | 0.0127 |
| S2_ms | ARIMA (0,1,0) with drift | - | 0.0156 | - | 0.0572 | 0.0033 | 0.0422 | 0.4753 |
| S4_ms | ARIMA (0,1,0) with drift | - | 0.0154 | - | 0.0606 | 0.0037 | 0.0431 | 0.5044 |
| U1_ms | ARIMA (0,1,0) | - | - | - | 0.0954 | 0.0091 | 0.0654 | 0.7698 |
Table 21. Results of the statistical tests on the residuals from the ARIMA models for the monthly deseasonalized series.

| Series | Model Type | A–D | Levene | ACF | PACF | Remark |
| --- | --- | --- | --- | --- | --- | --- |
| S_ms | ARIMA (0,1,1) with drift | 0.942 | 0.001 | 4 | 4 | Yes |
| S1_ms | ARIMA (0,1,0) with drift | 0.017 | 0.723 | none | none | Yes |
| S2_ms | ARIMA (0,1,0) with drift | 0.061 | 0.833 | none | none | No |
| S4_ms | ARIMA (0,1,0) with drift | 0.001 | 0.017 | none | none | Yes |
| U1_ms | ARIMA (0,1,0) | 0.000 | 0.000 | 1 | 1 | Yes |

Note. Yes means that the hypothesis that the residual forms a white noise could be rejected.
Table 22. ARIMA models for the weekly series S_w, S1_w, S2_w, S3_w, S4_w, and U1_w.

| Series | Model Type | AR Coefficients | Drift | MA Coefficients | RMSE | MSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S_w | ARIMA (1,0,5) | AR1 = −0.9765 | - | MA1 = 0.0167, MA2 = −0.8277, MA3 = 0.0373, MA4 = −0.1195, MA5 = −0.0816 | 0.0373 | 0.0014 | 0.0260 | 0.3219 |
| S1_w | ARIMA (0,1,0) with drift | - | 0.0101 | - | 0.0320 | 0.0010 | 0.0242 | 0.3404 |
| S2_w | ARIMA (0,1,0) | - | 0.0041 | - | 0.0342 | 0.0012 | 0.0258 | 0.2910 |
| S3_w | ARIMA (0,1,0) with drift | - | −0.017 | - | 0.0681 | 0.0046 | 0.0471 | 0.0012 |
| S4_w | ARIMA (4,1,1) with drift | AR1 = −0.6230, AR2 = 0.1165, AR3 = 0.0737, AR4 = −0.1307 | 0.0030 | MA1 = 0.5924 | 0.0292 | 0.0009 | 0.0208 | 0.2444 |
| U1_w | ARIMA (3,1,0) with drift | AR1 = 0.0034, AR2 = 0.1779, AR3 = 0.0934 | - | - | 0.0391 | 0.0015 | 0.0267 | 0.3123 |
Table 23. ARIMA models for the detrended weekly series S_wd, S1_wd, S2_wd, S3_wd, S4_wd, and U1_wd.

| Series | Model Type | AR Coefficients | Drift | MA1 | RMSE | MSE | MAE | MAPE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| S_wd | ARIMA (0,0,1) with 0 mean | - | - | 0.2462 | 0.0875 | 0.0077 | 0.0626 | 111.8606 |
| S1_wd | ARIMA (1,0,0) with 0 mean | AR1 = 0.7873 | - | - | 0.6720 | 0.4516 | 0.0530 | 41.9276 |
| S2_wd | ARIMA (1,0,0) with 0 mean | AR1 = 0.6492 | - | - | 0.0693 | 0.0048 | 0.0552 | 197.9489 |
| S3_wd | ARIMA (0,1,0) | - | - | - | 0.2260 | 0.0511 | 0.1675 | 65.7451 |
| S4_wd | ARIMA (2,0,0) | AR1 = 1.2211, AR2 = −0.3352 | - | - | 0.0540 | 0.0029 | 0.0417 | 103.3804 |
| U1_wd | ARIMA (3,0,1) | AR1 = 2.0902, AR2 = −1.2549, AR3 = 0.1561 | - | −0.9868 | 0.0858 | 0.0074 | 0.0610 | 87.0538 |
Table 24. Tests of the residuals in the ARIMA models for the weekly series S_w, S1_w, S2_w, S3_w, S4_w, and U1_w.

| Series | Model Type | A–D | Levene | ACF | PACF | Remark |
| --- | --- | --- | --- | --- | --- | --- |
| S_w | ARIMA (1,0,5) | 0.000 | 0.000 | none | none | Yes |
| S1_w | ARIMA (0,1,0) with drift | 0.004 | 0.130 | 11 | 11 | Yes |
| S2_w | ARIMA (0,1,0) | 0.117 | 0.287 | none | none | No |
| S3_w | ARIMA (0,1,0) with drift | 0.000 | 0.285 | 2 | 3 | Yes |
| S4_w | ARIMA (4,1,1) with drift | 0.000 | 0.000 | none | none | Yes |
| U1_w | ARIMA (3,1,0) with drift | 0.000 | 0.000 | 12 | 12 | Yes |

Note. Yes means that the hypothesis that the residual forms a white noise could be rejected.
Table 25. Tests of the residuals in the ARIMA models for the detrended weekly series S_wd, S1_wd, S2_wd, S3_wd, S4_wd, and U1_wd.

| Series | Model Type | A–D | Levene | ACF | PACF | Remark |
| --- | --- | --- | --- | --- | --- | --- |
| S_wd | ARIMA (0,0,1) with 0 mean | 0.000 | 0.000 | 12 | 12 | Yes |
| S1_wd | ARIMA (1,0,0) with 0 mean | 0.003 | 0.146 | none | 11 | Yes |
| S2_wd | ARIMA (1,0,0) with 0 mean | 0.087 | 0.317 | 10 | 11 | Yes |
| S3_wd | ARIMA (0,1,0) | 0.000 | 0.352 | none | none | Yes |
| S4_wd | ARIMA (2,0,0) | 0.000 | 0.000 | none | none | Yes |
| U1_wd | ARIMA (3,0,1) | 0.000 | 0.000 | 12 | 11 | Yes |

Note. Yes means that the hypothesis that the residual forms a white noise could be rejected.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
