Recurrent Neural Networks and ARIMA Models for Euro/Dollar Exchange Rate Forecasting

Abstract: Analyzing the future behavior of currency pairs represents a priority for governments, financial institutions, and investors, who use this type of analysis to understand the economic situation of a country and determine when to sell and buy goods or services from a particular location. Several models are used to forecast this type of time series with reasonable accuracy. However, due to the random behavior of these time series, achieving good forecasting performance represents a significant challenge. In this paper, we compare forecasting models to evaluate their accuracy in the short term using data on the EUR/USD exchange rate. For this purpose, we used three methods: Autoregressive Integrated Moving Average (ARIMA), Recurrent Neural Network (RNN) of the Elman type, and Long Short-Term Memory (LSTM). The analyzed period spanned from 2 January 1998 to 31 December 2019 and was divided into training and validation datasets. We performed forecasting calculations to predict windows with six different forecasting horizons. We found that the window of one month with 22 observations better matched the validation dataset in the short term compared to the other windows. Theil's U coefficients calculated for this window were 0.04743, 0.002625, and 0.001808 for the ARIMA, Elman, and LSTM networks, respectively. LSTM provided the best forecast in the short term, while Elman provided the best forecast in the long term.


Introduction
Different stock market forecasting techniques have been developed to predict values since the birth of the foreign exchange market (FOREX) in the 1970s [1][2][3]. Some of these techniques are used to identify future movements and include fundamental analysis, technical analysis, and mixed analysis, such as using statistical methods to model price behavior and generate future predictions [4,5]. Individual models, such as ARIMA, have been employed to forecast time series due to their popularity as classic prediction methods [6]. Since the beginning of the 1990s, economic and financial data studies have applied artificial neural networks (ANN) as estimation and forecasting methods for non-linear functions with great success [7,8]. For instance, a class of neural networks was designed mainly based on the use of Lyapunov's stability theory for learning laws. These networks are known as differential or dynamic neural networks (DNN) [8,9], and their applications were shown to be successful for forecasting the DAX and S&P 500 stock indices [7]. Following the same logic of variable weight analysis, a neural network (NN) system with twelve economic variables was used to analyze the significance of these variables in peso-dollar forecasting [10]. In the last decade, RNNs of the LSTM type have been widely used for forecasting sequential data [11][12][13][14]. The mechanism by which such networks store long- and short-term information makes them powerful when forecasting from historical data. This type of RNN has been used for currency-pair forecasting, stock trading on the New York Stock Exchange, recognition, environmental predictions, electricity demands, etc. [9]. The accuracy of the LSTM method was evaluated by comparing this method with other types of NNs and classical prediction methods [15][16][17][18]. Many of these comparisons and applications were used to formulate new hybrid models to improve the results of the predictions [19][20][21].
In this context, the results of a combination of classical prediction methods [19], neural networks [22], and recurrent neural networks [23] have helped clarify the path to creating new approximations based on standard methods applied to the currency exchange rate and stock market forecasting [24]. Most of these approaches were proposed to find the model that can provide the best forecasting in the short term according to the next-day predictions needed in currency exchange rate forecasting, which is the most challenging objective due to the inherently noisy and non-stationary behavior of the data.
In this work, we compare ARIMA modelling with RNNs of the Elman and LSTM types to perform out-of-sample forecasting for a EUR/USD exchange-rate dataset. The purpose of this work is to provide valuable tools not only to demonstrate the accuracy of these models and use them for financial purposes but also to show how these three methods can be used to create hybrid models to improve the forecasting of random time series. We begin this study by providing a summary of the three methods (ARIMA, Elman, and LSTM) to clarify how the algorithms work and how to optimize the models. Next, we define the datasets used for training and validation purposes, followed by exploratory analysis and data preprocessing. Then, we apply the ARIMA algorithm to determine the model that best forecasts the time series. We then define the Elman and LSTM networks by adjusting their parameters. We perform training and validation of these three methods by using prediction windows with different forecast horizons. Finally, we choose the window that provides the best forecasting in the short term to evaluate, in detail, the accuracy of the three prediction methods.

Overview of Regression Techniques
Classical statistical methods were historically used to analyze the behavior of time series. ARIMA models are well-known parametric techniques trained through linear regressions [6,25]. The ARIMA algorithm in Figure 1a uses graphs, basic statistics, the Autocorrelation Function (ACF), the Partial Autocorrelation Function (PACF), and transformations to identify patterns and model components. This model provides estimates through least squares and maximum likelihood methods and uses the graphs, ACF, and PACF of residuals to verify the validity of the model, i.e., if the model is valid, it is used; otherwise, the process returns to the first step. Lastly, the algorithm forecasts and tracks the model's performance using confidence intervals and simple statistics [26].
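The identification step relies on the ACF. As a minimal illustration (not the code used in this study, which was written in R), the sample autocorrelation at lag k can be sketched in plain Python; the function name `sample_acf` is hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (hypothetical helper)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)  # n times the lag-0 autocovariance
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Small usage example on a short toy series.
acf = sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```

For a white-noise series, the values at lags k > 0 are expected to stay close to zero, which is the pattern examined when a model is validated through its residuals.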
Appl. Sci. 2021, 11, x FOR PEER REVIEW 2 of 13
For a non-parametric technique such as ANN, the model is trained through non-linear algorithms. These self-adapting models do not require a priori assumptions about the series due to their flexibility in building model topologies and their ability to easily identify and predict behavior patterns in the series [16,25]. In this case, the training is conducted point-to-point; if the amount of atypical data is minimal, then the correct data fix the error generated by the atypical data, thereby converging to the exact model [27].
The Elman Neural Network (ENN) is a subclass of RNN. The ENN algorithm (Figure 2) starts with an input layer followed by a hidden layer and a context layer (delay layer) with the same number of neurons. The feedback gives temporality to the network, providing the system with short-term memory. This memory process occurs through delay units that are fed by the neurons of the hidden layer. The weights of the connections between the hidden layer and delay units are fixed and equal to 1, allowing one to obtain a copy of the output values of the hidden layer from the previous step [28].
In Figure 2, the input layer variables are represented by $x_t$, and the output layer variables are represented by $y_t$. The hidden layer vector is given by the expression $h_t = \sigma_h(W_x x_t + U_h h_{t-1} + b_h)$, and the output vector is given by $y_t = \sigma_y(W_y h_t + b_y)$, where $W = [W_x \; W_y]^T$ and $U_h$ are weight matrices, and $b = [b_h \; b_y]^T$ is the bias. Here, $\sigma_h$ and $\sigma_y$ are the activation functions for the hidden and output layers, respectively [28].
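The two expressions for Figure 2 can be sketched as a single forward step. This is only an illustration under stated assumptions: the hidden activation is taken to be tanh and the output activation the identity, neither of which is specified in the text, and `elman_step` and `matvec` are hypothetical helper names.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def elman_step(x_t, h_prev, W_x, U_h, b_h, W_y, b_y):
    """One Elman step; tanh and identity activations are assumptions."""
    # h_t = sigma_h(W_x x_t + U_h h_{t-1} + b_h), with sigma_h = tanh
    pre = [a + b + c for a, b, c in zip(matvec(W_x, x_t), matvec(U_h, h_prev), b_h)]
    h_t = [math.tanh(p) for p in pre]
    # y_t = sigma_y(W_y h_t + b_y), with sigma_y = identity
    y_t = [a + b for a, b in zip(matvec(W_y, h_t), b_y)]
    return h_t, y_t


# One input feature, one hidden unit, one output; weights are arbitrary.
h, y = elman_step([0.5], [0.0], [[1.0]], [[0.0]], [0.0], [[2.0]], [1.0])
```

The context layer corresponds to passing the returned `h` back in as `h_prev` at the next time step, which is the copy with fixed weight 1 described above.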
Similar to ENN, the LSTM network can remember a relevant sequence of data and preserve it for several time instances. In this way, an LSTM network can achieve short-term memory similar to that of basic recurrent networks, as well as long-term memory. As shown in Figure 3a, each block of the LSTM network can contain several cells in a similar manner to an Elman network, only replacing the neurons and hidden units with a memory block (LSTM cell; Figure 3b).
In Figure 3, the input and output are represented by $x_t$ and $y_t$, respectively, while the vector $h_t$ represents short-term memory, and $c_t$ represents long-term memory. For time series predictions of $x_t$, the LSTM system updates the memory cell $c_t$ and outputs a hidden state $h_t$ for each step $t$. Equation (1) represents the mechanism of LSTM, written here in its standard form [14,29]:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) \\
c_t &= f_t \otimes c_{t-1} + i_t \otimes g_t \\
y_t &= h_t = o_t \otimes \tanh(c_t)
\end{aligned}
\tag{1}
$$

The forget gate ($f_t$), input gate ($i_t$), and output gate ($o_t$), as well as the layer $g_t$, are fed by the input $x_t$ and the previous short-term state $h_{t-1}$. Here, $\sigma$ stands for the standard logistic sigmoid function $\sigma(x) = 1/(1 + e^{-x})$, $\tanh(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x})$, and $\otimes$ denotes element-wise multiplication. The weight matrices $W_{xi}, W_{xf}, W_{xo}, W_{xg}$ and $W_{hi}, W_{hf}, W_{ho}, W_{hg}$ are connected to the input vector $x_t$ and the short-term state $h_{t-1}$, respectively; $b_i, b_f, b_o, b_g$ are the bias terms for each of the four layers, where $b_f$ is initialized to ones to avoid forgetting everything at the beginning of the training [14,29].
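To make the gate mechanism concrete, one LSTM step can be sketched in plain Python. This is a minimal illustration of the standard cell update, not the network used in this study; the function name `lstm_step` and the `params` layout (a dictionary mapping each of the four layers to its input weights, recurrent weights, and bias) are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params[name] = (W_x, W_h, b) for name in 'i', 'f', 'o', 'g'."""

    def affine(name):
        W_x, W_h, b = params[name]
        return [
            sum(w * x for w, x in zip(row_x, x_t))
            + sum(w * h for w, h in zip(row_h, h_prev))
            + bias
            for row_x, row_h, bias in zip(W_x, W_h, b)
        ]

    i_t = [sigmoid(z) for z in affine("i")]    # input gate
    f_t = [sigmoid(z) for z in affine("f")]    # forget gate
    o_t = [sigmoid(z) for z in affine("o")]    # output gate
    g_t = [math.tanh(z) for z in affine("g")]  # candidate values
    # Long-term state: keep part of the old memory, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Short-term state (also the output of the cell).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Single unit with all-zero weights: every gate outputs 0.5.
zero = ([[0.0]], [[0.0]], [0.0])
h, c = lstm_step([1.0], [0.0], [2.0], {"i": zero, "f": zero, "o": zero, "g": zero})
```

Setting the forget-gate bias to 1 instead of 0 in this sketch pushes $f_t$ above 0.5 at the start, so more of $c_{t-1}$ is retained, which is the initialization described above.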

Data and Sampling
ARIMA, Elman, and LSTM were used to forecast the time series to analyze the accuracy of each model. For this purpose, the time series represents the EUR/USD exchange rate's daily value. The data were obtained from the records on Investing.com from 2 January 1998 to 31 December 2019, with a total of 5737 observations. Each observation represents the daily price of the EUR/USD exchange rate from Monday to Friday. To apply the prediction techniques, the time series was divided into training and validation sets. The training dataset (train) was defined by the expression train = n − valid, where n represents the total observations, and valid stands for the validation dataset, which includes windows with different forecasting horizons of 5, 11, 22, 35, 44, and 55 observations. The window that provided the best forecasting in the short term was selected to evaluate the performance of the three forecasting methods. Theil's U coefficient and the Diebold-Mariano test were used to evaluate the accuracy of each forecasting method.
Finally, we obtained the mean absolute percentage error (MAPE) of the selected prediction window to identify the observations where the method provided the greatest accuracy.
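The split train = n − valid can be illustrated with a short sketch. The function name `split_train_valid` and the placeholder series are hypothetical; only the six horizons and the total of 5737 observations come from the text.

```python
HORIZONS = (5, 11, 22, 35, 44, 55)  # forecasting horizons from the text


def split_train_valid(series, horizon):
    """Hold out the last `horizon` observations as the validation window."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


# Placeholder data standing in for the 5737 daily EUR/USD observations.
series = list(range(5737))
splits = {h: split_train_valid(series, h) for h in HORIZONS}
```

With the 22-observation window, for example, the training set holds 5715 observations and the validation set holds the final 22.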

Application of Models
Based on the time series behavior graphically presented in Figure 4, the series does not follow a specific pattern in time, and the peaks do not oscillate around the average. Instead, the peaks are far from the average. Moreover, the series shows no seasonality, which indicates a random time series.
In the period analyzed, the time series reached a minimum price of USD 0.8273 and a maximum price of USD 1.5988. The average daily price was USD 1.1992. Forty percent of the observations were below the average market rate, and 60% were equal to or exceeded the average.

Application of ARIMA
The time series showed random behavior, as indicated in Figure 4. Therefore, a series analysis was carried out to identify the ARIMA model that best fits the data according to the ARIMA algorithm (Figure 1).
First, we verified the stationarity of the series using a Dickey-Fuller test at a significance level of α = 0.05. We used the R software (v. 3.5.3) to code ARIMA. The p-value = 0.7546 > α = 0.05; therefore, the time series is not stationary. To make the time series stationary, we performed first-order differencing followed by another stationarity test. The reported p-value was 0.1, and the differenced series was taken to be stationary. The differenced series and its ACF were then analyzed. The former presented white-noise behavior, and the ACF corroborated the independence of the data. Based on this analysis, the model that best fit the data was the random walk, i.e., the ARIMA (0, 1, 0) model, which corresponds to the following expression:

$$(1 - B)\,Y_t = \varepsilon_t \tag{2}$$

where $B$ is the delay operator, $B Y_t = Y_{t-1}$, and $\varepsilon_t \sim WN(0, \sigma^2)$ is white noise. By testing the assumptions on the residuals, we verified stationarity, independence, normality, and homoscedasticity. For stationarity, the reported p-value was 0.1, and the residuals were taken to be stationary. For independence, the p-value = 0.2118 > α = 0.05, so the residuals are independent. For normality, the p-value = $2.2 \times 10^{-16}$ < α = 0.05, so the residuals do not follow a normal distribution. For homoscedasticity, the p-value = 0.1656 > α = 0.05, so the residuals present homogeneous variance. Thus, the normality assumption was not fulfilled. Consequently, the ARIMA model will not yield reliable results a priori; however, we applied this model to the data for comparison purposes.
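Equation (2) states that $Y_t = Y_{t-1} + \varepsilon_t$, so the point forecast of a driftless ARIMA (0, 1, 0) model at any horizon is simply the last observed value. The following sketch illustrates the differencing step and this forecast; it is not the R code used in the study, and the function names are hypothetical.

```python
def difference(series):
    """First-order differencing: (1 - B) Y_t = Y_t - Y_{t-1}."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def random_walk_forecast(train, horizon):
    """Point forecasts of a driftless ARIMA(0, 1, 0): repeat the last value."""
    if not train:
        raise ValueError("train must contain at least one observation")
    return [train[-1]] * horizon


# Toy prices, not the EUR/USD data.
prices = [1.10, 1.12, 1.11, 1.13]
diffs = difference(prices)
forecast = random_walk_forecast(prices, horizon=5)
```

Because every forecast equals the last training value, the forecast path of this model is a flat line over the whole window, whatever the horizon.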

Application of Recurrent Neural Networks
A. Data preprocessing
The data scale influences the processing of deep neural networks, mainly when using the sigmoidal or hyperbolic tangent activation functions. An alternative to standardization was used to scale the data so that each variable has a fixed minimum and maximum value, within an interval such as [−1, 1] or [0, 1]. To scale the random variable $y_i$, we used Equation (3):

$$z_i = \frac{y_i - \min(Y)}{\max(Y) - \min(Y)} \tag{3}$$

where $\min(Y)$ and $\max(Y)$ are the minimum and maximum values of the vector $Y$, and $z_i$ is scaled between 0 and 1.
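The min-max scaling described here, together with the inverse transform needed to bring network outputs back to price units, can be sketched as follows. The function names are hypothetical, and the inverse step is an assumption about how forecasts would be rescaled, since the text does not describe it.

```python
def minmax_scale(values):
    """Scale values to [0, 1]: z_i = (y_i - min(Y)) / (max(Y) - min(Y))."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def minmax_inverse(scaled, lo, hi):
    """Map scaled values back to the original range [lo, hi]."""
    return [z * (hi - lo) + lo for z in scaled]


# Toy values, not the EUR/USD data.
raw = [2.0, 4.0, 6.0]
scaled = minmax_scale(raw)
restored = minmax_inverse(scaled, min(raw), max(raw))
```

In a forecasting setting, the minimum and maximum are usually taken from the training set only, so that no information from the validation window enters the scaling.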
With the scaled data, we proceeded to build the RNN using Elman cells and LSTM. Each network parameter was then adjusted (activation functions, loss functions to minimize, number of layers, number of neurons in each layer, etc.) until better results were obtained.
The dropout technique was applied to the neural networks to avoid overtraining [31]. In this process, randomly selected sets of neurons are ignored during the training phase with a probability of 1 − p, so that each training step effectively uses a smaller network. If this dropout parameter is closer to 0, fewer neurons are deactivated, and if it is closer to 1, more neurons are deactivated. The optimal dropout value obtained for this study was 0.2. Poorer results were obtained when not using the dropout technique.
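As a rough illustration of what a dropout value of 0.2 means during training, the sketch below zeroes each activation with probability `rate`. Rescaling the surviving activations by 1 / (1 − rate) ("inverted dropout") is an assumption based on common practice, not a detail given in the text, and the function name is hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    # Inverted dropout (assumed): dividing by `keep` preserves the expected value.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the example is repeatable
hidden = [1.0] * 10
dropped = dropout(hidden, rate=0.2, rng=rng)
```

At prediction time no units are dropped, which corresponds to calling the function with `rate=0.0`.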
B. Elman Cell
This network consisted of 110 time steps (inputs) to forecast the window steps (outputs) and featured two layers with 3 and 2 neurons, respectively. We used the R software (v. 3.5.3) to code the Elman cell. Here, maxit = 7000 (the number of iterations to train the model), and learnFuncParams = 0.2 (network training speed). Figure 5 shows the network error.
In this case, the mean absolute error (MAE) was used as the loss function for the network, and Adam (adam) was used as the optimizer. Figure 6 shows the behavior of the error as a function of the number of iterations. Here, the error is large for iterations one and two and stabilizes starting from iteration three.

Selection of the Forecasting Window
In this section, we compare the forecasting dataset calculated with the ARIMA (0, 1, 0) model, Elman, and LSTM using the validation dataset for windows of 5,11,22,35,44, and 55 days. The purpose of this comparison is to select the window that best approximates the validation dataset in the short term.
As shown in Figure 7, for the 5-day forecasting window, the Elman model provided predictions far from the validation dataset, while ARIMA and LSTM provided results similar to the validation dataset in observations 3 and 4. For the 11-day window, the forecasting dataset better approximated the validation dataset for the three models. LSTM matched in three observations, while ARIMA and Elman matched in only one observation. For the third prediction window of 22 days, the LSTM forecasting dataset closely approximated the validation dataset in ten observations, while Elman matched the validation dataset in just one observation. ARIMA remained constant, with predictions far from the validation dataset. The Elman network presented an upward trend in its forecasts, while for the LSTM network, the forecasting data showed random behavior. For the prediction windows of 35, 44, and 55 days, the forecasting dataset obtained with the Elman network best matched the validation dataset and improved further when considering long-term predictions. ARIMA and LSTM approximated the validation dataset most poorly for long-term forecasts with this type of time series behavior and these conditions.
In this analysis, the 22-day forecasting window best approximated the validation dataset in the short term. According to these results, we selected this window to evaluate and compare the prediction methods in detail.

Prediction Evaluation Measures
Performance metrics were used for the evaluation measures (scale-dependent errors and percentage errors; Table 1). As shown in Table 1, the errors obtained through the LSTM network were smaller than the errors obtained with the Elman network and the ARIMA (0, 1, 0) model. These results corroborate the forecast accuracy obtained with these three methods (Figure 7; 22-day window).
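The specific metrics listed in Table 1 are not reproduced here. As an illustration only, the sketch below computes two common scale-dependent errors (MAE and RMSE) and one percentage error (MAPE); the choice of MAE and RMSE is an assumption, while MAPE is the percentage error named elsewhere in the text.

```python
import math


def mae(actual, forecast):
    """Mean absolute error (scale-dependent)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error (scale-dependent)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Toy values, not the EUR/USD data.
actual = [1.0, 2.0]
forecast = [1.1, 1.8]
scores = {"MAE": mae(actual, forecast), "RMSE": rmse(actual, forecast), "MAPE": mape(actual, forecast)}
```

Since exchange rates are strictly positive, the division by the actual value in MAPE is well defined for this kind of series.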

Accuracy Analysis
By using the first expression for U1 given in [33], we obtained Theil's U coefficients from the forecasting set to evaluate the accuracy of the models. Theil's coefficients are bounded by 0 and 1, where the lower boundary indicates a perfect forecast, and the upper boundary indicates unreliable forecasting. Coefficients close to 1 represent fully impractical situations for exchange forecasting: reaching such values would require forecasts that are zero or of opposite sign to the observed rates, i.e., negative exchange rates. Theil's U coefficients for the three forecasting methods are listed in Table 2. The three models give values close to 0. Therefore, the three techniques provide reliable predictions, with the coefficient for the LSTM network being lower than those of the Elman and ARIMA models.
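A coefficient with the stated bounds of 0 and 1 can be sketched as follows. The formula used, the root mean squared error divided by the sum of the root mean squares of the forecasts and of the actual values, is the commonly cited U1 form and is assumed here to be the expression taken from [33]; the function name is hypothetical.

```python
import math


def theil_u1(actual, forecast):
    """Theil's U1 (assumed form): RMSE / (RMS of forecast + RMS of actual)."""
    n = len(actual)
    rms_error = math.sqrt(sum((f - a) ** 2 for a, f in zip(actual, forecast)) / n)
    rms_forecast = math.sqrt(sum(f * f for f in forecast) / n)
    rms_actual = math.sqrt(sum(a * a for a in actual) / n)
    return rms_error / (rms_forecast + rms_actual)


# Toy values, not the EUR/USD data.
actual = [1.10, 1.11, 1.12]
perfect = theil_u1(actual, actual)             # lower bound: perfect forecast
worst = theil_u1(actual, [-a for a in actual])  # upper bound: sign-reversed forecast
```

In this form, the upper bound is reached when the forecasts are all zero or have the opposite sign to the actual values, which cannot occur for sensible exchange-rate forecasts.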
A Diebold-Mariano test was then performed to compare the prediction accuracy of the three techniques (Table 3), where the null hypothesis $H_0$ is that the two compared techniques have the same accuracy. By comparing the p-values in Table 3 with a significance level of α = 0.05, we can deduce the following results:
ARIMA vs. Elman: For the two-sided test, $p = 1.694 \times 10^{-4} < \alpha$; therefore, $H_0$ is rejected, and it can be concluded that the two techniques did not have the same accuracy. For the "greater" alternative, $p = 8.468 \times 10^{-5} < \alpha$, and $H_0$ is rejected; thus, it can be concluded that the predictions performed with Elman were more accurate than those made with ARIMA.
ARIMA vs. LSTM: For the two-sided test, $p = 4.287 \times 10^{-5} < \alpha$, and $H_0$ is rejected; thus, it can be concluded that the two techniques did not have the same accuracy. For the "greater" alternative, $p = 2.144 \times 10^{-5} < \alpha$; thus, $H_0$ is rejected, and it can be concluded that the predictions performed with LSTM were more accurate than those made with ARIMA.
Elman vs. LSTM: For the two-sided test, $p = 0.0154 < \alpha$, and $H_0$ is rejected; thus, the two techniques did not have the same accuracy. For the "greater" alternative, $p = 0.0077 < \alpha$, and $H_0$ is rejected; thus, it can be concluded that the forecasting performed with LSTM was more accurate than that performed with Elman.
Based on these results, the predictions made with the LSTM network were more accurate than the forecasts performed with the ARIMA (0, 1, 0) model and the Elman network for short-term forecasting.
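The logic of the two-sided and "greater" comparisons can be illustrated with a simplified Diebold-Mariano statistic. The sketch assumes a squared-error loss, a one-step-ahead horizon (no autocovariance correction), and a normal approximation for the p-values; these choices and the function name are assumptions for illustration and may differ from the test configuration used in the study.

```python
import math


def diebold_mariano(errors_1, errors_2):
    """Simplified DM test on forecast errors of method 1 and method 2.

    Returns (statistic, p_two_sided, p_greater), where "greater" is the
    alternative that method 2 is more accurate than method 1.
    """
    d = [e1 * e1 - e2 * e2 for e1, e2 in zip(errors_1, errors_2)]  # loss differential
    n = len(d)
    d_bar = sum(d) / n
    var_d = sum((x - d_bar) ** 2 for x in d) / n  # lag-0 autocovariance only
    if var_d == 0.0:
        raise ValueError("loss differential has zero variance")
    stat = d_bar / math.sqrt(var_d / n)
    cdf = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF
    p_two_sided = 2.0 * min(cdf, 1.0 - cdf)
    p_greater = 1.0 - cdf
    return stat, p_two_sided, p_greater


# Toy errors: method 2 has errors half the size of method 1.
stat, p_two, p_greater = diebold_mariano([1.0, 2.0, 1.0, 2.0], [0.5, 1.0, 0.5, 1.0])
```

A positive statistic means method 1 incurs the larger loss, so a small `p_greater` supports the conclusion that method 2 is the more accurate of the pair; when the statistic is positive, the one-sided p-value is half the two-sided one, as in the p-values reported above.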

Accuracy Based on Observations
To identify the observations where the method provided the greatest accuracy, we calculated the MAPE of the forecasting data obtained with LSTM for the 22-day window. The prediction data corresponded to observations from 2 December to 31 December 2019. As shown in Figure 8, the MAPE results presented randomness, with the last day of predictions showing a significantly higher error percentage than the other days. The error rates were high on 2, 3, 4, 6, 9, 16, 27, and 30 December and low on 5, 10, 12, 17, 20, 25, and 26 December. The lowest error rates were observed on 11, 13, 18, and 24 December. These results may be due to the volatility of this type of time series.
Figure 8. LSTM network performance in out-of-sample forecasting using the EUR/USD exchange rate dataset.

Conclusions
In this paper, we compared three methods (ARIMA models, Elman networks, and LSTM networks) to perform out-of-sample forecasting of the EUR/USD exchange rate dataset. The time series did not present a trend, seasonality, or stationarity; therefore, the time series was determined to be a random walk. An ARIMA (0, 1, 0) model was selected by analyzing the first-order differenced series and its ACF. Elman and LSTM networks were modeled using systematic simulations by adjusting parameters until we obtained the best results. These three models were then used to forecast and compare six windows for the validation datasets. Through this comparison, we determined that ARIMA generated constant forecasts, while LSTM provided the best forecasts up to a 22-day window in the short term. With Elman, we obtained better results in the long term. The selected window was then evaluated in detail to identify the observations with the lowest errors, finding only four observations among the 22 in the window where LSTM best approximated the validation dataset.
The average accuracy for the 22-day window was 71.76%. By comparing our results with the results of previous studies, we can conclude the following. First, it is difficult to make a direct comparison since there is no standardized method for selecting training and validation datasets. Second, previous studies reported an average accuracy lower than 70% for different approaches [34][35][36] and achieved greater than 70% average accuracy when using NN, combining models, or introducing a hybrid model [37][38][39][40]. Based on the comparison of the three forecasting models in this work, LSTM fit better in the short term, although the results were not entirely desirable, as LSTM only coincided in four observations with 95.99% average accuracy.
The advantage of using an RNN of the Elman or LSTM type is its efficiency when working with a time series. Both techniques have similar characteristics in their networks; the only difference lies in their memory capacity. The limitation of ARIMA is that it represents a general univariate model in which the assumptions must be fulfilled to succeed. A combination of these methods as a type of hybrid model could be an aim of future studies, similar to the model reported in [41,42].
Author Contributions: Programming, figure formatting, and drafting the process, W.A.; first draft preparation, P.E.; supervision and conceptualization, P.E., J.P.; writing, review, and editing, W.A., P.E., J.P.; funding acquisition, P.E. All authors have read and agreed to the published version of the manuscript.
Funding: This work was supported by the Indoamerican Technological University and Chimborazo's Superior Polytechnic School.