On the Connection between the GEP Performances and the Time Series Properties

Abstract: Artificial intelligence (AI) methods are interesting alternatives to classical approaches for modeling financial time series, since they relax the assumptions imposed on the data generating process by parametric models and impose no constraint on the model's functional form. Although many studies have employed these techniques for modeling financial time series, the connection between the models' performances and the statistical characteristics of the data series has not yet been investigated. Therefore, this research studies the performances of Gene Expression Programming (GEP) for modeling monthly and weekly financial series that present trend and/or seasonality, before and after the removal of each component. It is shown that series normality and homoskedasticity do not influence the models' quality. The trend removal increases the models' performance, whereas the seasonality elimination diminishes the goodness of fit. Comparisons with ARIMA models are also provided.


Introduction
Financial time series analysis and forecasting have been interesting research fields for decades. Time series analysis may explain the law governing the data generating process, while good models may provide accurate predictions of its future behavior, supporting the decisions for profitable trading strategies [1,2].
Since the introduction of the Box-Jenkins methodology [3] for time series analysis and forecasting, ARIMA and SARIMA models have been extensively used to forecast financial time series [4,5]. In 1989, Hamilton [6] introduced the Markov Switching Model, which became one of the most popular models for nonlinear series, involving multiple equations that characterize the series' behavior in different regimes [7]. The switching mechanism has also been incorporated in conditional variance models, such as stochastic volatility models [8] or GARCH [9]. Recent studies suggest that GARCH [10,11] can be a promising alternative to the traditional SARIMA method in forecasting problems, especially for nonlinear data. Still, it was shown that it is difficult to find such models with very high accuracy for long series with high variability.
The efficient market hypothesis attracted many criticisms from behavioral economists and psychologists. Introduced by Lo [12] to reconcile the economic theories based on the efficient market hypothesis with behavioral economics, the adaptive market hypothesis applies the principles of evolution (competition, adaptation, and natural selection) [13]. The present study is in line with this theory.
Recently, artificial intelligence methods have been extensively used for modeling financial series [14,15] because the traditional statistical methods cannot capture high nonlinearity and discontinuities. Evolutionary algorithms seem to have the capacity to learn the nonlinear features that dominate financial series [16,17], with the advantage of not relying on a specific form of the fitting model. Approaches such as support vector regression [15,18,19], neural networks [14,20], and hybrid algorithms [13,21] have provided valuable solutions to modeling economic and financial problems.
Part of the family of evolutionary techniques, the genetic programming paradigm is defined as a generalization of Holland's genetic algorithms. Starting from Koza's seminal work [22], many algorithms have been developed as alternatives to genetic programming. One of them is Gene Expression Programming (GEP) [23], which has been successfully used for modeling economics problems. For example, Huang et al. [24] used GEP to generate trading signals for stocks, while Chen et al. [16,25] utilized the Sortino ratio for fund selection, followed by the application of GEP for evolving dynamic trading strategies. Modeling economics problems and forecasting the evolution of mutual funds and financial indices have also been conducted by other scientists [13,15,25–33].
Despite the significant number of articles in this field, only one has investigated the relationship between the statistical properties of financial series and the AI models' goodness of fit, and it refers to generalized regression neural networks [14]. Therefore, this study fills a gap in the field. We analyze the statistical properties of different financial time series and investigate the connection between these properties and the GEP models' goodness of fit. Comparisons of these models with classical ARIMA models are also provided, showing better performances of the GEP models. The research is important for selecting the best model from a diversity of options.

Studied Data and Statistical Tests
The data studied here consist of the BET index closing values, monthly series registered during October 2000-September 2014, and weekly series recorded in the period 1 October 2000-29 September 2014, downloaded from the website of the Bucharest Stock Exchange (BSE) [34]. Both series are complete, without gaps. The BET monthly and weekly series are represented in Figure 1.
BET, the first index developed by BSE, is a free-float weighted capitalization index with ten constituents, being the reference index for BSE. BET reflects the performance of the most traded Romanian companies, listed at BVB, meeting the highest standards in the fields of investor relations and corporate governance. Constituents' weights are capped quarterly to ensure that they fulfill the standards for portfolio diversification [35].
Since the series present high variability, we took the logarithms of their values. The resulting monthly and weekly data series are denoted by S_m and S_w, respectively.
Statistical tests have been performed at the significance level of 5% for S_m, S_w, and their subseries determined after the change point detection. The null and alternative hypotheses (denoted by H_0 and H_1, respectively) of these tests are presented in Table 1, together with the references. Since the statistical tests are well known, they are not discussed in detail here; for a deeper insight, readers may refer to [36–49].

Methods
For modeling purposes, gene expression programming (GEP) was used. GEP is an algorithm implementing symbolic regression that does not require specifying in advance the form of the function to be determined.
In GEP, an individual is a solution to the problem, all the individuals (i.e., the problem's solutions) forming a population. GEP individuals are represented by hierarchical mathematical expressions, encoded as multi-genic linear chromosomes of fixed length, which are linear strings of functional symbols (+, −, *, /, and functions: exponential, trigonometric, logarithmic, etc.), variables (containing the lagged values of the time series), and constants [13].
A gene is composed of a head (that may contain any symbol) and a tail (containing only constants and variables). The user must set the number of genes and the set of symbols before running the algorithm.
The word generation describes an iteration of the algorithm, pointing out the evolutionary aspect of the search process. The initial generation of individuals is randomly selected.
An iteration of the algorithm consists of modifying the individuals and selecting them based on their quality. The individuals' modifications are realized by applying genetic operators: crossover, mutation, and transposition. Crossover combines the features of two or more parent individuals into one or more children. Mutation alters an individual by modifying some of its features. Gene transposition randomly chooses one of the chromosome's genes and moves it to the place of another gene.
The fitness function evaluates the solution quality. GEP was used with good results for solving symbolic regression problems and time series forecasting. In the context of time series modeling, a candidate solution is a mathematical expression formed by functions, variables, and constants, and as a consequence, it is represented as a parse tree of the mathematical expression [20,23].
If (x_t), t = 1, . . . , n, are the values of the data series and (x̂_t), t = 1, . . . , n, are those computed by the algorithm, GEP aims to determine a function f that estimates the registered value at a moment t (t < n) as well as possible, such that x̂_t = f(x_{t−1}, . . . , x_{t−w}), where w is a window size and x_{t−1}, . . . , x_{t−w}, 0 ≤ t − w ≤ n, are the previous w values.
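The windowing scheme above can be sketched as follows (an illustrative helper only, not part of the GEP implementation used in the paper):

```python
import numpy as np

def make_lagged(x, w):
    """Build regressor rows (x_{t-1}, ..., x_{t-w}) and targets x_t
    for a window size w, as in the lagged-regression setup above."""
    x = np.asarray(x, dtype=float)
    # For each t >= w, take the w previous values, most recent lag first.
    rows = [x[t - w:t][::-1] for t in range(w, len(x))]
    return np.array(rows), x[w:]

X, y = make_lagged([1.0, 2.0, 3.0, 4.0, 5.0], w=2)
# X[0] = [x_1, x_0] = [2.0, 1.0] is the regressor row for the target y[0] = x_2 = 3.0
```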
The analytical form of the function is not specified, being the result of the evolution of the mathematical expressions in the algorithm. The basic structure of the gene expression algorithm is [23]:
(1) Create the chromosomes of the initial population;
(2) Express the chromosomes and evaluate their fitness: (i) if the stopping criterion is satisfied, designate the results and stop; (ii) otherwise, go to the next step;
(3) Select chromosomes and keep the fittest for the next generation;
(4) Perform genetic modifications via genetic operators and gene recombination;
(5) Select the next-generation individuals;
(6) Go to (2).
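The generational loop described by steps (1)–(6) can be sketched generically (a simplified skeleton, not Ferreira's full multi-genic chromosome encoding; all function arguments are placeholders supplied by the caller):

```python
import random

def evolve(fitness, random_individual, mutate, crossover,
           pop_size=50, generations=2000, seed=0):
    """Generic generational loop mirroring steps (1)-(6); the GEP-specific
    chromosome encoding and operators are abstracted behind the arguments."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]      # step (1)
    for _ in range(generations):                                 # steps (2)-(6)
        scored = sorted(pop, key=fitness)                        # evaluate fitness
        elite = scored[0]                                        # keep the fittest
        children = []
        while len(children) < pop_size - 1:
            a, b = rng.sample(scored[: pop_size // 2], 2)        # selection
            children.append(mutate(crossover(a, b, rng), rng))   # genetic operators
        pop = [elite] + children                                 # next generation
    return min(pop, key=fitness)

# Toy usage: "individuals" are single floats, fitness is (x - 3)^2,
# so the loop should converge toward x = 3.
best = evolve(
    fitness=lambda x: (x - 3.0) ** 2,
    random_individual=lambda rng: rng.uniform(-10, 10),
    mutate=lambda x, rng: x + rng.gauss(0, 0.1),
    crossover=lambda a, b, rng: (a + b) / 2,
    pop_size=20, generations=200)
```

The elitism line reflects step (3): the fittest chromosome always survives into the next generation, so the best fitness never deteriorates.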
The termination criterion is the number of generations. The accuracy of a model for a series of n observations is measured by the correlation between the actual and predicted values, the root mean squared error (RMSE), the mean squared error (MSE), the mean absolute error (MAE), and the mean absolute percentage error (MAPE) [13,14].
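These indicators can be computed directly (a small helper sketch; the function name is ours, and MAPE is expressed in percent here):

```python
import numpy as np

def accuracy_indicators(actual, predicted):
    """Goodness-of-fit indicators used throughout the paper."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    err = a - p
    return {
        "correlation": float(np.corrcoef(a, p)[0, 1]),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MSE": float(np.mean(err ** 2)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err / a)) * 100),  # percent
    }

ind = accuracy_indicators([2.0, 4.0, 6.0], [2.0, 5.0, 6.0])
# MSE = MAE = 1/3, RMSE = sqrt(1/3), MAPE = 25/3 percent
```

Being dimensionless, MAPE is the only one of these indicators that can be compared across series measured on different scales, which the paper exploits when comparing GEP and ARIMA models.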
The settings for the GEP parameters utilized in the present study (typical in the literature) are: population size, 50; number of genes per chromosome, 4; gene head length, 8; maximum number of generations, 2000; number of generations without improvement, 1000. The settings of the evolution rates were: mutation rate, 0.044; inversion rate, 0.1; transposition rates, 0.1; recombination rates: one-point, 0.3; two-point, 0.3; gene rate, 0.1. Other settings, recommended by the literature [13,16,22,23] and used for running the algorithm, are:

• The window size, w, was considered between 1 and 12; in the following, we report only the overall best results;
• The mutation rate of random constants: 0.01;
• Maximum number of iterations: 1000;
• Functions used in expressions: addition, subtraction, multiplication, division, square root;
• Linking function: addition.

• The fitness function is the mean squared error (MSE):

MSE = (1/n) Σ_{i=1}^{n} (x_i − x̂_i)²,

where x_i is the actual value and x̂_i is the computed value. The best solution is the one with the smallest fitness. The studied series was divided into two parts, the first one for training (70% of the data) and the rest for testing (30% of the data). The model was built using the training data (the algorithm learns the data on this set) and was validated on the test set.
The GEP was chosen for modeling purposes, since it is computationally faster than the methods based on neural networks and support vector regression, and generally gives better results in comparison to them.
For details about GEP and its applications, readers may refer to [20,23,49]. ARIMA models have been built as benchmark models for the series and their subseries determined by the change points.
A linear process (X_t), t ∈ Z, is called autoregressive of order p, AR(p), if

X_t = c + φ_1 X_{t−1} + . . . + φ_p X_{t−p} + ε_t,

where (ε_t, t ∈ Z) is a white noise with variance σ². Defining the polynomial Φ(B) = 1 − φ_1 B − . . . − φ_p B^p, where B is the backshift operator (defined by B(X_t) = X_{t−1} and B^k(X_t) = X_{t−k}), the relation becomes Φ(B)X_t = c + ε_t.

A linear process (X_t), t ∈ Z, is called moving average of order q, MA(q), if

X_t = ε_t + θ_1 ε_{t−1} + . . . + θ_q ε_{t−q},

where (ε_t, t ∈ Z) is a white noise with variance σ². Defining Θ(B) = 1 + θ_1 B + . . . + θ_q B^q, the relation becomes X_t = Θ(B)ε_t.

A linear process (X_t), t ∈ Z, is called an autoregressive moving average of orders p and q, ARMA(p, q), if Φ(B)X_t = c + Θ(B)ε_t, where (ε_t, t ∈ Z) is a white noise with variance σ². The constant c is called the drift. An autoregressive integrated moving average process, ARIMA(p, d, q), is defined by the equation Φ(B)Δ^d X_t = c + Θ(B)ε_t, where Δ^d X_t = (1 − B)^d X_t (d > 0), the roots of Φ and Θ have absolute values greater than 1, and (ε_t, t ∈ Z) is a centered white noise.
To select the best ARIMA model, the Akaike (AIC) criterion was used. The lower the AIC is, the better the model is.
For a deeper insight into ARIMA models, the reader may refer to [41]. Statistical tests and modeling were performed using the R and DTREG software.

Results of Statistical Analysis and the GEP Models for the Monthly Series
The results of the statistical analysis of the lnBET monthly series are the following. The normality and homoskedasticity hypotheses were rejected. After the application of the runs test, the null hypothesis was rejected. The analysis of the correlogram confirms the existence of data autocorrelation. The Mann-Kendall test rejected the hypothesis of no monotonic trend, supporting the existence of such a trend. Neither the ADF nor the PP test supported the rejection of the existence of a unit root, while the KPSS test rejected the hypothesis of the series' stationarity in level or around a deterministic trend. After taking the first-order difference, the ADF and PP tests rejected the existence of the unit root, while the KPSS test did not reject the hypothesis of stationarity in level and around a deterministic trend.
For each monthly series and subseries, the Mann-Kendall test [42] was used for testing the hypothesis that the series does not have a monotonic trend (H_0) against the existence of such a trend (H_1).
The results of all the tests are presented in Table 2, where yes means that H_0 is rejected and no means that H_0 cannot be rejected. For building the GEP models for the monthly series, the window size, w, was selected between 1 and 12, considering the possible existence of a seasonal component. The performances of the best models are presented in Tables 3–7 for the training and test sets.

Table 3. Performances of the best models built for the monthly series with lag 1 variables as regressors.

Tables 3–5 contain the results of the modeling of the monthly series without trend removal, when the regressors were lag 1 variables, and lag 1 and lag 12 variables, respectively. Since S3_m is too short, the model with lag 1 and lag 12 variables could not be built. When analyzing the goodness of fit indicators for the series S_m and S1_m, one can see that the models' performances on the training sets are not significantly different when lag 1 and lag 12 variables, or only lag 1 variables, are considered.
The two-sided Diebold-Mariano test [50] was also used to cross-validate these findings. The null hypothesis was (H_0) both models have the same accuracy, and the alternative was (H_1) the models do not have the same accuracy.
The p-values were 0.341 (0.532), when comparing the models for S_m (S1_m) on the training set. Therefore, the null hypothesis could not be rejected.
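A simplified version of the Diebold-Mariano statistic can be sketched as follows (one-step-ahead case, without the HAC long-run variance correction of the original test [50]; the function name and synthetic error series are illustrative):

```python
import numpy as np
from scipy import stats

def diebold_mariano(e1, e2, power=2):
    """Simplified two-sided DM test comparing the loss series |e1|^power
    and |e2|^power of two competing forecasts (no HAC correction)."""
    d = np.abs(np.asarray(e1, float)) ** power - np.abs(np.asarray(e2, float)) ** power
    n = len(d)
    dm = d.mean() / np.sqrt(d.var(ddof=1) / n)       # standardized mean loss differential
    p_two_sided = 2 * (1 - stats.norm.cdf(abs(dm)))  # asymptotic normal p-value
    return dm, p_two_sided

# Two forecasts with errors of the same distribution: H_0 should usually hold.
rng = np.random.default_rng(1)
e1 = rng.standard_normal(100)
e2 = rng.standard_normal(100)
dm, p = diebold_mariano(e1, e2)
```

Under H_0 the statistic is asymptotically standard normal, so a p-value above 0.05 means equal accuracy cannot be rejected, mirroring the decisions reported for S_m and S1_m above.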
The model for S_m is better on the test set when using both regressors than when using only the first one (lag 1 variable), in terms of all indicators but the correlation between the actual and predicted values. The result of the Diebold-Mariano test confirms this assertion. This time the null hypothesis was the same as previously, while the alternative was (H_1) the model obtained using both regressors is better than the model obtained utilizing only the lag 1 regressor.
Comparing the goodness of fit indicators, the results for the S1_m model are better when considering only the lag 1 regressor. Still, the two-sided Diebold-Mariano test does not reject the null hypothesis.
The models obtained for S2_m and S4_m are better when using only the lag 1 variable, than when using lag 1 and lag 12 because a negative correlation between the actual and predicted values is noticed in the second case.
From the point of view of the correlation between actual and predicted values, GEP models with lag 1 regressors perform satisfactorily on both training and test sets, the best results being obtained for S1_m and S4_m. Since the models with lag 1 variables as regressors are better, in terms of correlation between the actual and predicted values, we kept these models for further comparisons.
For U1_m (Table 5), the quality of the model obtained with the lag 1 variable is worse on the test set than that obtained with lag 1 and lag 12 as regressors (see the negative value of the correlation between actual and predicted values), but on the training set, the results are not significantly different (according to the two-sided Diebold-Mariano test).
GEP models have been built for the detrended series and subseries, with the same variables as regressors. The models with significant coefficients and the highest variance of the series explained by the model (among the linear, exponential, polynomial of second and third degrees, and logarithmic) have been chosen to express the trend. The best trends were found to be linear or exponential, as presented in Table 6.
For S1_m, S3_m, and S2_m, the variance of the data series explained by the models is very high.
The following notations are adopted for the monthly detrended series and subseries: S_md, S1_md, S2_md, S3_md, S4_md, and U1_md, respectively. Tables 7–9 contain the modeling results after the trend removal. Table 7 shows that the trend removal improved the models' performances (most significantly for S1_m and S4_m) on both training and test series when the regressors were the lag 1 variables. Indeed, the MAPE values diminished after the trend removal for all series but S_m on the test set.
For example, Figure 2 presents the chart of S1_md (corresponding to the data from Table 7, columns 3 and 8). The detected trend in this model is linear in t, where t = 1, . . . , 51 is the number of the month and Y_t is the trend; the standard errors of its coefficients are 0.0235 and 0.0142, respectively. The standard errors of the coefficients of the corresponding GEP model are 0.0121 and 0.0415, respectively, and all the model's coefficients are significant at a significance level of 0.05.
From Table 8, one remarks that the algorithm learns the input data well for the monthly series and its subseries after the trend elimination. It performs better, in terms of the correlation between the actual and predicted values, on the test sets of S2_md and S4_md.
A significantly diminished MAPE was computed for the S2_md test set. When the regressors are lag 1 and lag 12 variables, a notable improvement of the models is noticed only on the training set. The algorithm learns the data well, but it cannot use what it learned on the test sets. Overall, the models' performances are comparable or better with one regressor (Table 7) than with two regressors (Table 8).
The best models for U1_md have been obtained for two regressors (columns 2 and 3 from Table 9). The models for U1_md are better than those for U1, in terms of RMSE, MSE, MAE, and MAPE. Comparing the results from Tables 5 and 9, one may remark that the trend removal contributed to improving the models' quality when using lag 1 variables as regressors. Still, the model is not satisfactory, in terms of the correlation between actual and predicted values on the test series.
The next stage of this analysis was building the deseasonalized series (and subseries) and the corresponding GEP models. For this aim, the classical decomposition method was employed. The equation is

y_t = Y_t + S_t + ε_t,

where y_t is the series to be decomposed, Y_t is the trend, S_t is the seasonality index, and ε_t is the random variable. The difference y_t − S_t is, by definition, the deseasonalized series. In the following, the deseasonalized monthly series S is denoted by S_ms; analogous notations are used for the monthly subseries. The performances of the best models are presented in Tables 10 and 11. For S_ms, the correlation between the actual and predicted values on the training set is close to zero. On the test set, the same indicator is higher; still, it remains lower than for the raw series or the detrended one. Generally, the models for the deseasonalized subseries are worse than those for the raw and detrended series.
For example, in Figure 3, one may see the modeling errors for S4_ms (Table 10). For U1_ms, the best model has been obtained with lag 1 and lag 12 variables as regressors. To conclude, removing the seasonality did not improve the modeling quality.

Results of Statistical Analysis and the GEP Models for the Weekly Series
The change point determined by the Pettitt test for the weekly series is 215 (26 December 2004), and those found by mDP are 215 (26 December 2004), 365 (2007), and 435 (26 April 2009). In what follows, the weekly series is denoted by S_w and the subseries determined by the change points are denoted, respectively, by S1_w (containing the values 1–215), S2_w (containing the values 216–365), S3_w (containing the values 366–435), S4_w (containing the values 436–717), and U1_w. The subseries and the fitted linear trend are presented in Figure 4.

Table 12 contains the equations of the trend lines of the weekly series. The coefficients and the models are significant at a 0.05 significance level (based on the t-test and the F-test). The trend determined for the series and subseries is the best fit for the data series, selected from a set of deterministic models (linear, exponential, polynomial of the second and third order, logarithmic, and power functions).

The results of the statistical tests on the weekly series are presented in Table 13. For all tests except normality and homoskedasticity, they are similar to those for the monthly series. To summarize, the hypotheses that the series are Gaussian, homoskedastic, stationary (in trend and level), and not random have been rejected, while the unit root hypothesis could not be rejected. Therefore, the series differentiation has been performed for finding the best ARIMA model (presented in the next section).

A window size from one to five has been considered when selecting the best GEP models for the weekly series; therefore, the regressors were the lagged variables from one to five. These lag values have been selected because the actual values could be influenced by the values from the same week (taking into account that stocks are not traded at BSE during Saturdays and Sundays). The best GEP models for the weekly series are presented in Table 14. The models for S1_w, S3_w, and S4_w are good on both training and test sets.
The correlations between actual and predicted values on the training sets are high for the other subseries, but they are much lower on the test sets (0.610 for S_w and 0.268 for S3_w). On the same test sets, the MAPEs are 2.294 and 1.887, confirming that the algorithm does not apply what it learned on the test sets of S_w and S3_w. Since the data series are not monthly or annual, the seasonal component does not appear.
The study results on the detrended series are presented in Table 15. The best models are those for S1_wd and S4_wd. Considering all indicators but the correlation between actual and predicted values, the second place is occupied by the model for S2_wd, whereas taking this indicator into account, the model for U1_wd is in second place. The worst model is the one for S3_wd. The trend removal improved the model quality (compared to the raw weekly series) for the training series and the S1_w test set.

Table 15. Performances of the best models built for S_w and its subseries after the trend removal. The detrended series are denoted by S_wd, S1_wd, S2_wd, S3_wd, S4_wd, and U1_wd.

For the test set, the models for S1_wd and U1_wd are worse than those for S1_w and U1_w, respectively. The models for S2_w and S2_wd are comparable. The model for S3_w (S4_wd) is better than that for S3_wd (S4_w) only based on the correlation between the actual and predicted values.

ARIMA Models for Monthly Series
The best ARIMA models for the monthly series are presented in the following. The lnBET series and subseries present a unit root, so the best models are obtained after taking the first-order difference of the raw series. Therefore, the goodness of fit coefficients will be smaller for the differentiated series ( Figure 5) than for the raw series. This aspect should be taken into consideration when comparing the GEP and ARIMA models.
The best ARIMA models for the monthly series are presented in Table 16. For S1_m, S2_m, and S3_m, they are the differentiated series with or without drift. The models for S_m and S4_m are ARIMA models for which the autoregressive part is missing; for U1_m, it is an ARIMA model whose moving average part is absent.

To validate the ARIMA models, the hypothesis that the residual forms a white noise has been tested by using the Anderson-Darling (A-D) test (for normality), the Levene test (for homoskedasticity), and plotting the autocorrelation and partial autocorrelation functions (ACF and PACF). For the last two functions, the lag at which the correlation appears is listed, while p-values (with three decimals) are displayed for the other tests. If the correlation is absent, 'none' is inserted in Table 17, which summarizes the results. Note: yes means that the hypothesis that the residual forms a white noise could be rejected.
Among the ARIMA models for the monthly series, only three are correct from a statistical viewpoint, those for S1_m, S2_m, and S3_m. These are kept for comparisons, given that only valid models can be employed for a possible forecast. The GEP models for S_m, S4_m, and U1_m (Tables 4 and 5) are kept.
Comparing the models for S2_m (Table 4), the GEP one proves to be the best on the test set (showing that the GEP algorithm learned the data well and applied what it learned on the new dataset). The same holds for S3_m considering all indicators but MAPE, which is slightly higher on the S3_m test set.
For a fair comparison, the GEP was run for S1_m, S2_m, and S3_m without dividing the series into training and test parts. The resulting indicators showed that, in these cases, the ARIMA models are the best. The ARIMA models for the monthly detrended series are presented in Table 18. Among them, there are an ARMA(3,1) model (for U1_d) and three AR models (for S1_d, S2_d, and U1_d). At first sight, all of them have significantly higher MAPEs than the corresponding GEP models (Tables 7 and 9). Being a non-dimensional indicator, MAPE is the most suitable for comparing different kinds of models; on this basis, the GEP models are more competitive.
The results of the models' validation are contained in Table 19. Based on this, only two models (Table 18) should be compared to the GEP ones, S1_md and S3_md. For these series, all the indicators of the GEP models are lower than the corresponding ones for ARIMA. Therefore, the best performances are noticed for all the GEP detrended series.
The ARIMA models for the monthly deseasonalized series are given in Table 20, and the results of their validation in Table 21. Among the models, only that for S2_ms was validated (the hypothesis that the residual forms a white noise could not be rejected), so only the quality of this model is discussed. The corresponding goodness of fit indicators of the ARIMA model are smaller than those of the GEP model for S2_ms (Table 15). Still, for a relevant comparison of the two models, the entire series has been used for building a GEP model, characterized by RMSE = 0.0523, MSE = 0.0027, MAE = 0.0384, and MAPE = 0.4237. Therefore, the GEP model is the best one. Comparison of the ARIMA models for the raw, detrended, and deseasonalized series shows that:

•	In terms of MAPE, the models for the detrended series are the worst.
•	Taking into account RMSE, MSE, and MAE, the models for the raw series have performances similar to those for the detrended series for all but S3. Still, the model for S2_md is worse than that for S2, since the former does not satisfy the hypotheses on the residuals.
•	In terms of MAPE, the ARIMA models for S2_ms and S4_ms are better than those for S2_m and S4_m. Even though the MAPE for S1_ms is smaller than for S1_m, its residuals are not white noise, so the first model cannot be considered better than the corresponding one for the raw data series.
•	Comparable results have been obtained for U1_m and U1_ms.
•	The other goodness-of-fit indicators generally have comparable values for the rest of the initial and deseasonalized series.
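The detrending and deseasonalizing operations underlying the _d/_md and _ms series can be sketched as follows. This is an illustrative implementation, assuming a linear trend removed by ordinary least squares and additive seasonal means; the exact procedure used in the study may differ.

```python
def detrend(y):
    """Remove a linear trend fitted by ordinary least squares.
    (Illustrative; the study's detrending procedure may differ.)"""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

def deseasonalize(y, period):
    """Subtract the mean of each seasonal position, e.g. period=12
    for monthly and period=52 for weekly series (additive model)."""
    totals = [0.0] * period
    counts = [0] * period
    for i, yi in enumerate(y):
        totals[i % period] += yi
        counts[i % period] += 1
    means = [s / c for s, c in zip(totals, counts)]
    return [yi - means[i % period] for i, yi in enumerate(y)]
```

Applied in sequence, these two steps produce series analogous to the detrended and deseasonalized subseries compared above.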

ARIMA Models for Weekly Series
The same procedure was followed for the weekly series. The ARIMA models for the raw and detrended series are given in Tables 22 and 23. The output of the tests for validating the models is contained in Tables 24 and 25. Note. Yes means that the hypothesis that the residuals form white noise could be rejected. Table 24 shows that all ARIMA models but the one for S2_w fail to satisfy the hypothesis that the residual series forms white noise.
To compare the models for S2_w, the GEP algorithm has been run without dividing the series into training and test parts. The following indicator values have been obtained: RMSE = 0.0477, MSE = 0.0023, MAE = 0.0285, MAPE = 0.3222 (higher than the corresponding values for the ARIMA model). Therefore, ARIMA performs better than GEP only on S2_w. Table 25 shows that the ARIMA models for the detrended weekly series cannot be validated. Furthermore, their MAPE values are extremely high compared with those of the GEP models for the same series and of the ARIMA models for the raw weekly data. So, for ARIMA, trend removal does not improve the model quality.
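The validation step used throughout tests whether the model residuals form white noise. The text does not name the specific test employed; a common choice for this purpose is the Ljung-Box Q statistic, sketched below with a hard-coded chi-square critical value (18.307 is the 5% upper critical value for 10 degrees of freedom).

```python
import random

def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic on the first h residual autocorrelations.
    Under the white-noise hypothesis, Q is asymptotically chi-square
    with h degrees of freedom (fewer when ARMA parameters were
    estimated from the same data)."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)
    q = 0.0
    for k in range(1, h + 1):
        ck = sum((residuals[i] - mean) * (residuals[i - k] - mean)
                 for i in range(k, n))
        rho_k = ck / c0
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# Illustrative check at h = 10 lags on simulated residuals;
# compare Q against the 5% chi-square critical value for 10 df.
random.seed(0)
resid = [random.gauss(0.0, 1.0) for _ in range(200)]
q = ljung_box_q(resid, 10)
white_noise_not_rejected = q < 18.307
```

A large Q (exceeding the critical value) rejects the white-noise hypothesis, which is the situation reported for most of the detrended weekly ARIMA models in Table 25.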

Conclusions
Financial time series are characterized by nonlinearity and high variability, often accompanied by heteroskedasticity and non-stationarity, which make deterministic methods unsuitable for modeling such series. Various stochastic models rely on restrictive assumptions about the data generating process. Artificial intelligence methods do not impose such restrictions, making them good options when other kinds of algorithms cannot be used or when satisfactory models cannot be found by other approaches.
Despite their advantages, the relationship between the statistical properties of a data series (normality, homoskedasticity, autocorrelation, stationarity, and the existence of a trend) and the quality of the models built for it using AI techniques has not been investigated. Knowledge of the performances of different algorithms applied to time series with the same statistical properties would eliminate some modeling options and reduce the time spent searching for the appropriate approach. Such modeling is suitable not only for financial time series but also for series from other fields with similar statistical characteristics.
Therefore, in this article, we built GEP and ARIMA models for the lnBET series and its subseries, to determine whether their statistical properties are connected to the goodness of fit of such models. The modeling results show that: (1) normality and homoskedasticity do not have a major influence on the models' performances; (2) trend removal results in better GEP models; (3) seasonality elimination does not improve the modeling quality; (4) trend removal results in worse ARIMA models; (5) generally, GEP performed better than ARIMA on the studied series.
Remark (1) is not surprising, since we are dealing with point forecasts, i.e., with performance measures of conditional location parameters of the conditional density.
The first three findings are in concordance with the results from [14], which refers to GRNN models.
The study should be extended to simulated and other real data series to collect more evidence on the connection between the models' performances and the series' statistical characteristics.