The Predictability of the Exchange Rate When Combining Machine Learning and Fundamental Models

In 1983, Meese and Rogoff showed that traditional economic models developed since the 1970s do not perform better than the random walk in predicting out-of-sample exchange rates when using data obtained after the beginning of the floating rate system. Subsequently, whether traditional economic models can ever outperform the random walk in forecasting out-of-sample exchange rates has received scholarly attention. Recently, a combination of fundamental models with machine learning methodologies was found to outperform the random walk in predictability (Amat et al. 2018). This paper focuses on combining modern machine learning methodologies with traditional economic models and examines whether such combinations can outperform the prediction performance of the random walk without drift. More specifically, this paper applies the random forest, support vector machine, and neural network models to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). We performed a thorough robustness check using six government bonds with different maturities and four price indexes, which demonstrated the superior performance of fundamental models combined with modern machine learning in predicting future exchange rates compared with the random walk. These results were examined using the root mean squared error (RMSE) and a Diebold-Mariano (DM) test. The main findings are as follows. First, when comparing the performance of fundamental models combined with machine learning with the performance of the random walk, the RMSE results show that the fundamental models with machine learning outperform the random walk. In the DM test, the results are mixed: most show predictive accuracies significantly different from the random walk, while some do not.
Second, when comparing the performance of the fundamental models combined with machine learning, the models using the producer price index (PPI) consistently show good predictability, whereas the consumer price index (CPI) appears comparatively poor at predicting the exchange rate, based on its results in both the RMSE and DM tests.


Introduction
Despite the existence of various economic theories explaining the fluctuation of future exchange rates, as shown in Meese and Rogoff (1983a, 1983b), the random walk often produces better predictions for future exchange rates. More specifically, it has been shown that traditional economic models developed since the 1970s do not perform better than the random walk in predicting the out-of-sample exchange rate when using data obtained after the beginning of the floating rate system. Since the publication of these papers, many researchers have investigated this puzzle. Cheung et al. (2005) confirmed the work of Meese and Rogoff (1983a, 1983b) and demonstrated that the interest rate parity, monetary, productivity-based, and behavioral exchange rate models do not outperform the random walk for any time period. Similarly, Rossi (2013) could not find a model with strong out-of-sample forecasting ability. On the contrary, Mark (1995) showed that the economic exchange-rate models perform better than the random walk in predicting long-term exchange rates. Amat et al. (2018) also found that combining machine learning methodologies, traditional exchange-rate models, and Taylor-rule exchange rate models could be useful in forecasting future short-term exchange rates in the case of 12 major currencies.
There have been similar attempts by researchers using stock market data. These studies show the predictability of future stock prices using machine learning methodologies (Cervelló-Royo et al. 2015; Chong et al. 2017) and of stock market trends (Chang et al. 2012; García et al. 2018). Hamori et al. (2018) also analyzed default risk using several machine learning techniques.
Following on from these previous studies, this paper focuses on a combination of modern machine learning methodologies and economic models. The purpose of this paper is to determine whether such combinations outperform the prediction performance of the random walk without drift, which has been used as the benchmark in most studies in this field since Meese and Rogoff (1983a, 1983b). The study most closely related to the present one is Amat et al. (2018). What distinguishes the present paper from previous studies is that instead of using an exponentially weighted average strategy and sequential ridge regression with discount factors, this paper applies the random forest, support vector machine (SVM), and neural network models to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). Furthermore, the robustness of the results is thoroughly examined using six government bonds with different maturities (1, 2, 3, 5, 7, and 10 years) and four price indexes (the producer price index (PPI), the consumer price index (CPI) of all items, the CPI excluding fresh food, and the CPI excluding fresh food and energy) individually in the three machine learning models. Together, these elements should provide concrete evidence for the results that were obtained.
In the empirical analysis, a rolling window analysis was used for a one-period-ahead forecast of the JPY/USD exchange rate. The sample data range from August 1980 to August 2019, and the window size was set as 421. Hence, in total, the rolling window analysis was conducted 47 times for each fundamental model. The main findings of this study are as follows. First, when comparing the performance of the fundamental models combined with machine learning to that of the random walk, the root mean squared error (RMSE) results show that the fundamental models with machine learning outperform the random walk (the mean absolute percentage error (MAPE) also confirmed this result). In the Diebold-Mariano (DM) test, most of the results show significantly different predictive accuracies compared to the random walk, while some of the random forest results show the same accuracy as the random walk. Second, when comparing the performance of the fundamental models combined with machine learning, the models using the PPI show fairly good predictability in a consistent manner, as indicated by both the RMSE and the DM test results. However, the CPI is not appropriate for predicting exchange rates, based on its poor results in the RMSE and DM tests. This result seems reasonable given that the CPI includes volatile price indicators such as food, beverages, and energy.
The rest of the paper is organized as follows. Section 2 explains the fundamental models, Section 3 describes the data used in the empirical studies, Section 4 describes the machine learning methodology, Section 5 shows the results and evaluation, and Section 6 summarizes the main findings of the paper.

Fundamental Models
Following Rossi (2013) and Amat et al. (2018), this paper uses four basic methods to predict the exchange rate: uncovered interest rate parity (UIRP), purchasing power parity (PPP), the monetary model, and the Taylor rule models.

Uncovered Interest Rate Parity
The UIRP theorem used in the following section was proposed by Fisher (1896). This theorem analyzes how interest rates can be altered by expected changes in the relative value of the currency units concerned. UIRP is based on the following assumption: in a world with only two currencies, where market participants possess perfect information, investors can buy $1/S_t$ units of foreign government bonds using one unit of their home currency. When investors hold a foreign bond between time t and time t + h, the earnings from the foreign bond are the bond premium plus the foreign interest rate, $i^*_{t+h}$. At the end of the period, investors can collect the return converted into the home currency, which is $S_{t+h}(1 + i^*_{t+h})/S_t$ in expectation. Additional transaction costs during the whole process are ignored in this analysis, and the bond return should be the same whether the investors buy the home bond or the foreign bond. Hence, the following equation is given:

$$1 + i_{t+h} = \frac{E_t[S_{t+h}]\,(1 + i^*_{t+h})}{S_t}$$

By taking logarithms, the previous UIRP equation can be rewritten as

$$E_t[s_{t+h}] - s_t = i_{t+h} - i^*_{t+h}$$

where $s_t$ is the logarithm of the exchange rate, and h is the horizon.
Another uncovered interest rate parity equation, used in Taylor (1995), is as follows:

$$\Delta s^{e}_{t+k} = i_t - i^*_t$$

where $s_t$ denotes the logarithm of the spot exchange rate (the domestic price of foreign currency) at time t, and $i_t$ and $i^*_t$ are the nominal interest rates on domestic and foreign securities, respectively (with k periods to maturity).
It is worth noting that in both equations maturity is denoted as k, meaning that if we followed the equations faithfully to predict the one-month-ahead exchange rate, we would use a government bond with one month to maturity. However, the focus here is on the relationship between interest rate differences and the exchange rate. Thus, the above equations are rewritten as

$$s_{t+1} - s_t = i_t - i^*_t$$

This equation is used in the following empirical analysis. Meese and Rogoff (1983a, 1983b), who used Equation (1) to forecast out-of-sample real exchange rates using real interest rates and compared its performance with the predictions of the random walk, found that the latter provided better forecasting results.

Purchasing Power Parity
The PPP was first proposed in Cassel (1918). The concept of PPP is that the same amount of goods or services can be purchased with the same initial amount of money in either currency. That is, a unit of currency in the home country has the same purchasing power in the foreign country.
The absolute purchasing power parity can be expressed as the following equation:

$$S_t = \frac{P_t}{P^*_t}$$

where $S_t$ denotes the exchange rate in period t, $P_t$ denotes the price level in the home country, and $P^*_t$ denotes the price level in the foreign country.
Assuming that the absolute purchasing power parity holds in period t + 1, we can obtain the following equation:

$$S_{t+1} = \frac{P_{t+1}}{P^*_{t+1}}$$

Assuming that the inflation rates from period t to period t + 1 are π at home and π* abroad, we can obtain

$$P_{t+1} = P_t\,(1 + \pi), \qquad P^*_{t+1} = P^*_t\,(1 + \pi^*)$$

which means that

$$S_{t+1} = S_t\,\frac{1 + \pi}{1 + \pi^*}$$

Assuming that the rate of change in the exchange rate is ρ, then

$$S_{t+1} = S_t\,(1 + \rho)$$

Using Equations (8) and (9), we can obtain

$$(1 + \rho)(1 + \pi^*) = 1 + \pi, \quad \text{i.e.,} \quad \rho + \pi^* + \rho\pi^* = \pi$$

Since ρπ* is a very small value, it is ignored in the following analysis. Then, we obtain

$$\rho = \pi - \pi^*$$

From Equation (11), we can see that there is a clear relationship between the rate of change in the exchange rate and the inflation rate differential. This paper uses four indexes to calculate the inflation rate: the PPI, the CPI of all items, the CPI excluding fresh food, and the CPI excluding fresh food and energy. Most papers use the CPI when describing the PPP theorem. However, Hashimoto (2011) mainly uses the PPI for purchasing power parity, since it reflects business activities in both the home and foreign markets.
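The approximation step that drops ρπ* can be checked numerically. A minimal sketch follows; the inflation figures are purely illustrative, not taken from the paper's data.

```python
# Numerical check of the PPP approximation rho ≈ pi − pi* (Equation (11)).
pi_home, pi_foreign = 0.02, 0.005   # illustrative home and foreign inflation rates

# Exact rate of change implied by relative PPP:
# (1 + rho)(1 + pi*) = 1 + pi  =>  rho = (1 + pi)/(1 + pi*) − 1
rho_exact = (1 + pi_home) / (1 + pi_foreign) - 1

# Approximation after dropping the cross term rho * pi*:
rho_approx = pi_home - pi_foreign

print(round(rho_exact, 6))   # 0.014925
print(round(rho_approx, 6))  # 0.015
```

The gap between the exact and approximate values is of the order ρπ*, which is negligible for monthly inflation rates.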

Monetary Model
The monetary model was first introduced by Frenkel (1976) and Mussa (1976). The monetary approach determines the exchange rate as the relative price of two currencies and models exchange rate behavior in terms of the relative demand for and supply of money in the two countries. The long-run money market equilibrium in the domestic and the foreign country is given by

$$m_t - p_t = k\,y_t - \theta\,i_t$$

$$m^*_t - p^*_t = k\,y^*_t - \theta\,i^*_t$$

From Equations (12) and (13), we can obtain

$$s_t = (m_t - m^*_t) - k\,(y_t - y^*_t) + \theta\,(i_t - i^*_t)$$

where $m_t$ denotes the logarithm of the money supply, $p_t$ denotes the logarithm of the price level, $y_t$ denotes the logarithm of income, $i_t$ denotes the interest rate, and k denotes the income elasticity. Assuming that k is 1 and using the uncovered interest rate parity condition $E_t[s_{t+1}] - s_t = i_t - i^*_t$, the model links the fundamentals to the expected change in the exchange rate. This paper mainly focuses on the relationship between the change rate of the exchange rate and the other variables. Thus, the following equation is used:

$$s_{t+1} - s_t = f\big(m_t - m^*_t,\; y_t - y^*_t,\; i_t - i^*_t\big)$$

Taylor Rule Models

Engel and West (2005, 2006) and Molodtsova and Papell (2009) improved the original Taylor rule for monetary policy (Taylor 1993) so that it describes the change in the exchange rate.
The concept of the original Taylor model (Taylor 1993) is that the monetary authority sets the real interest rate as a function of the deviation of inflation from its target level and of the output gap $y_t$. Taylor (1993) proposed the following equation:

$$i^T_t = \pi_t + \phi\,(\pi_t - \pi^*) + \gamma\,y_t + r^*$$

where $i^T_t$ denotes the target for the short-term nominal interest rate, $\pi_t$ is the inflation rate, $\pi^*$ is the target level of inflation, $y_t$ is the output gap, and $r^*$ is the equilibrium level of the real interest rate.
Following Molodtsova and Papell (2009), assuming that $\mu = r^* - \phi\pi^*$ and $\lambda = 1 + \phi$, the following equation is obtained:

$$i^T_t = \mu + \lambda\,\pi_t + \gamma\,y_t$$

Since monetary policy also depends on the real exchange rate, the real exchange rate variable $q_t$ is added to the previous equation:

$$i^T_t = \mu + \lambda\,\pi_t + \gamma\,y_t + \delta\,q_t$$

On top of Equation (19), we add another feature so that the interest rate adjusts gradually toward its target level (Clarida et al. 1998). This means that the actual observable interest rate $i_t$ is only partially adjusted toward the target, as follows:

$$i_t = (1 - \rho)\,i^T_t + \rho\,i_{t-1} + v_t$$

where ρ is the smoothing parameter, and $v_t$ is a monetary shock. By substituting Equation (19) into Equation (20), we get the following equation:

$$i_t = (1 - \rho)\,(\mu + \lambda\,\pi_t + \gamma\,y_t + \delta\,q_t) + \rho\,i_{t-1} + v_t$$

where for the US, δ = 0, and $v_t$ is the monetary policy shock. Thus, we can obtain two versions of this equation, one for the home country and one for the foreign country, using asterisks to denote foreign country variables. By taking the difference of Equations (22) and (23), using the UIRP model, and re-defining the coefficients, we get

$$s_{t+1} - s_t = \omega + \omega_\pi\,(\pi_t - \pi^*_t) + \omega_y\,(y_t - y^*_t) + \omega_i\,(i_{t-1} - i^*_{t-1}) + \eta_t$$

In Molodtsova and Papell (2009), the strongest result was found in the symmetric Taylor rule model, in which the coefficient of the real exchange rate is δ = 0. Therefore, the Taylor fundamentals take inflation, the output gaps, and the lagged interest rates into consideration.
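The target-rate and partial-adjustment equations above can be sketched in a few lines. All parameter values below (µ, λ, γ, δ, ρ) are illustrative assumptions, not estimates from this paper.

```python
# Sketch of the Taylor rule with partial adjustment (Equations (19)-(20)).
def taylor_target(pi_t, y_gap, q_t, mu=1.0, lam=1.5, gamma=0.5, delta=0.1):
    """Target rate i^T_t = mu + lam*pi_t + gamma*y_t + delta*q_t."""
    return mu + lam * pi_t + gamma * y_gap + delta * q_t

def smoothed_rate(i_prev, pi_t, y_gap, q_t, rho=0.8, v_t=0.0):
    """Observed rate i_t = (1 - rho)*i^T_t + rho*i_{t-1} + v_t."""
    return (1 - rho) * taylor_target(pi_t, y_gap, q_t) + rho * i_prev + v_t

# With a 2% inflation rate, a 0.5 output gap, and a lagged rate of 2%:
print(round(smoothed_rate(2.0, 2.0, 0.5, 0.0), 3))  # 2.45
```

The smoothing parameter ρ controls how slowly the observed rate moves toward the target: with ρ = 0.8, only 20% of the gap is closed each period.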

Data
The data used to describe the macroeconomies were taken from the DataStream database. All data are of monthly frequency. This paper used government bonds with different maturities (1, 2, 3, 5, 7, and 10 years) for each country. The producer price index (PPI), the consumer price index (CPI) of all items, the CPI excluding fresh food, and the CPI excluding fresh food and energy were used to calculate the inflation rate. For the money stock, we used each country's M1. To measure output, we used the industrial production index, as GDP is only available quarterly. Following Molodtsova and Papell (2009), we used the Hodrick-Prescott filter to calculate the potential output and obtain the output gap. The exchange rates were taken from the BOJ Time-Series Data Search. The data cover the period from August 1980 to August 2019 and are described in Table 1. This paper used a rolling window analysis for the one-period-ahead forecast. A rolling window analysis runs an estimation iteratively while shifting a fixed window forward by one period in each analysis. The whole sample dataset ranges from August 1980 until August 2019, and the window size was set as 421. For example, the first window, from August 1980 to August 2015, was used to estimate September 2015. Hence, the model uses the training data from period 1 to 421 to predict period 422 and then uses the training data from period 2 to 422 to predict period 423. This is repeated until the end of the time series. In total, the rolling window analysis is run 47 times for each model.
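The rolling-window scheme just described can be sketched in a few lines of Python. The figure of 468 usable monthly observations is an assumption chosen so that a 421-month window yields the 47 forecasts reported here; `model`, `X`, and `y` in the commented line are placeholders.

```python
# Rolling window: a fixed 421-month window shifted forward one month at a time.
n_obs, window = 468, 421        # assumed usable sample length; window size 421

windows = [(start, start + window) for start in range(n_obs - window)]
for start, end in windows:
    train_idx = range(start, end)   # e.g. periods 1..421 on the first pass
    test_idx = end                  # predict period 422, then 423, ...
    # model.fit(X[train_idx], y[train_idx]); model.predict(X[[test_idx]])

print(len(windows))  # 47
```

Each iteration re-estimates the model on the most recent 421 observations, so older observations are dropped as new ones enter the window.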
There are two reasons why we used the end-of-month exchange rate rather than the monthly average exchange rate. First, the end-of-month exchange rate is often used in this field of study. Second, as mentioned in Engel et al. (2019), although replacing the monthly average exchange rate with the end-of-month exchange rate reduces the forecasting power of the Taylor rule fundamentals relative to the random walk (Molodtsova and Papell 2009), changes in the monthly average exchange rate are likely to be serially correlated. Thus, following Engel et al. (2019), this study also used the end-of-month exchange rate.

Methodologies
Here, we use the results of the random walk as the benchmark and compare its performance with three types of machine learning models: random forest, support vector machine, and neural network. The results are examined using the RMSE and a DM test.

Random Forest
Random forest (Breiman 2001) is an ensemble learning method that builds multiple decision trees by analyzing data features and then merges them to improve prediction performance. This method helps avoid overfitting as more trees are added to the forest, because each tree is drawn from the original sample using bootstrap resampling and is grown on a randomly selected subset of features. The resulting trees are largely uncorrelated, which improves prediction performance: the individual errors of the trees offset each other rather than pushing the whole ensemble in the wrong direction. The random forest produces regression trees through the following steps (Figure 1).

Assume that there is a dataset D = {(x_1, y_1), . . ., (x_n, y_n)} and the target is to find the function f : X → Y, where X is the inputs, and Y is the produced outputs. Let M be the number of features.

1. Random forest randomly selects n observations from the sample D with replacement to form a bootstrap sample.
2. Multiple trees are grown on subsets of m features from the overall M features. For each subset, the m features are selected at random.
3. A prediction is produced by taking the average of the predictions from all trees in the forest (in the case of a classification problem, the prediction is decided by majority vote).

In this paper, X indicates the fundamental economic features, and Y is the exchange rate. D refers to all data.
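The steps above can be sketched with scikit-learn's RandomForestRegressor. The data here are synthetic stand-ins for the paper's fundamentals (X) and exchange-rate changes (y), and all sizes and hyper-parameter values are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(421, 4))      # 421 training periods, 4 fundamental features
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + rng.normal(scale=0.1, size=421)

# Each tree is grown on a bootstrap resample of the 421 observations and splits
# on a random subset of the features; the forest averages the trees' predictions.
model = RandomForestRegressor(n_estimators=500, max_features="sqrt", random_state=0)
model.fit(X, y)
forecast = model.predict(X[-1:])   # one-period-ahead style prediction
print(forecast.shape)  # (1,)
```

Because each tree sees a different bootstrap sample and feature subset, adding more trees reduces variance without increasing the overfitting risk of any single deep tree.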


Support Vector Machine
The original SVM algorithm was introduced by Vapnik and Lerner (1963). Boser et al. (1992) suggested an alternative way to create nonlinear classifiers by applying kernel functions to maximum-margin hyperplanes.
The primary concept of SVM regression is discussed first with a linear model and then is extended to a non-linear model using the kernel functions.
Given the training data $(x_1, y_1), \ldots, (x_n, y_n)$, $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$, the SVM regression is estimated with the ε-insensitive loss function

$$|\xi|_\varepsilon = \max\big(0,\; |y - f(x)| - \varepsilon\big)$$

The principal objective of SVM regression is to find a function f(x) with the minimum value of the loss function that is also as flat as possible. Thus, the model can be expressed as the following convex optimization problem:

$$\min \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi^*_i)$$

subject to

$$y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi^*_i, \qquad \xi_i, \xi^*_i \ge 0$$

where C determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated ($\xi_i$, $\xi^*_i$). After solving the Lagrange function from Equations (29)-(31) and using the kernel function, the SVM model can be expressed as follows:

$$f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha^*_i)\, k(x_i, x) + b$$

where $k(x_i, x_j)$ is the kernel function, and $\alpha_i$, $\alpha^*_i$ are the Lagrangian multipliers. SVM can be performed with various kernel functions, such as the linear, polynomial, radial basis function (RBF), and sigmoid functions. This paper uses the RBF kernel, which can be expressed as follows:

$$k(x_i, x_j) = \exp\big(-\sigma \|x_i - x_j\|^2\big)$$

Here, the best C and sigma are determined using a grid search. Depending on the size of C, there is a trade-off between the correct classification of training examples and a smooth decision boundary. A larger C does not tolerate misclassification and yields a more complicated decision function, whereas a smaller C does tolerate it, giving a simpler decision function. The sigma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'. A larger sigma gives a great deal of weight to nearby observations, so the decision boundary becomes wiggly. For a smaller sigma, the decision boundary resembles a linear boundary, since it also takes distant observations into consideration.
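As a hedged sketch (not the paper's actual implementation), the RBF-kernel SVM regression with a grid search over C and the kernel width can be written with scikit-learn, whose `gamma` parameter plays the role of sigma above; the data and grid values are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))                    # synthetic stand-in features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Grid search over C (flatness/violation trade-off) and gamma (kernel width):
# k(x, x') = exp(-gamma * ||x - x'||^2), matching the sigma parameterization above.
grid = GridSearchCV(
    SVR(kernel="rbf", epsilon=0.1),
    param_grid={"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)  # e.g. {'C': 10.0, 'gamma': 0.1}
```

Cross-validated grid search picks the (C, sigma) pair with the lowest average validation error, which guards against choosing an overly wiggly boundary on the training data alone.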

Neural Network
The feedforward neural network is the first and simplest type of neural network model. General references for this model include Bishop (1995), Hertz et al. (1991), and Ripley (1993, 1996). This paper uses a one-hidden-layer model, which is the simplest form, as shown in Figure 2. The information moves forward from the input nodes, through the hidden nodes, to the output nodes.
Inputs are summed by individual nodes. Then, after adding a bias ($w_{ij}$ in Figure 2), the result is passed through a fixed activation function $\varphi_h$ (Equation (37)). The output units are produced by the same process with the output function $\varphi_o$. Thus, the equation of the neural network is written as follows:

$$y = \varphi_o\Big(b_o + \sum_j w_j\, \varphi_h\big(b_j + \sum_i w_{ij}\, x_i\big)\Big)$$

The activation function $\varphi_h$ of the hidden layer units usually takes a logistic form,

$$\varphi_h(z) = \frac{1}{1 + e^{-z}}$$

and the output function $\varphi_o$ usually takes a linear form in regression (in the case of a classification problem, the output function often takes a logistic form). Here, we adjust two hyper-parameters using a grid search: the number of units in the hidden layer and the parameter for weight decay. The latter is a regularization parameter used to avoid over-fitting (Venables and Ripley 2002).
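A minimal sketch of this setup, assuming scikit-learn's MLPRegressor as a stand-in for the single-hidden-layer network of Figure 2: `activation='logistic'` gives the logistic hidden units, and `alpha` plays the role of the weight-decay parameter. All grid values and data are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.1 * rng.normal(size=200)

# Grid search over the number of hidden units and the weight-decay penalty.
grid = GridSearchCV(
    MLPRegressor(activation="logistic", solver="lbfgs",
                 max_iter=5000, random_state=0),
    param_grid={"hidden_layer_sizes": [(2,), (4,), (8,)],
                "alpha": [1e-4, 1e-2, 1.0]},
    cv=3,
)
grid.fit(X, y)
print(sorted(grid.best_params_))  # ['alpha', 'hidden_layer_sizes']
```

Larger weight-decay values shrink the network weights toward zero, smoothing the fitted function, so the grid search balances hidden-layer capacity against regularization strength.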


Root Mean Squared Error
The random walk without drift uses the following forecast:

$$F(t) = 0$$

that is, the predicted change rate of the exchange rate is zero. For the machine learning models, F(t) is the model's one-period-ahead prediction of the change rate. The RMSE is then

$$RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\big(A(t) - F(t)\big)^2}$$

where A(t) denotes the actual value of the change rate in the exchange rate, F(t) is the predicted value, and N is the number of forecasts; t = 1 corresponds to the prediction for September 2015.
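A minimal sketch of the two error measures used in this paper, assuming the forecasts are change rates so that the random walk without drift predicts zero; the sample values are illustrative.

```python
import numpy as np

def rmse(actual, forecast):
    a, f = np.asarray(actual), np.asarray(forecast)
    return float(np.sqrt(np.mean((a - f) ** 2)))

def mape(actual, forecast):
    # Mean absolute percentage error, used as a secondary check (Appendix A).
    a, f = np.asarray(actual), np.asarray(forecast)
    return float(np.mean(np.abs((a - f) / a)) * 100)

actual = np.array([0.010, -0.020, 0.015])   # illustrative realised change rates
rw_forecast = np.zeros_like(actual)         # random walk without drift: F(t) = 0
print(round(rmse(actual, rw_forecast), 6))  # 0.015546
```

A model beats the random walk benchmark when its RMSE over the 47 forecasts is smaller than the RMSE of the zero-change forecast above.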

Modified Diebold-Mariano Test
The DM test was proposed by Diebold and Mariano (1995). This test examines whether the null hypothesis that the competing models have the same predictive accuracy holds statistically. Let us define the forecast error $e_{it}$ as

$$e_{it} = \hat{y}_{it} - y_t$$

where $\hat{y}_{it}$ and $y_t$ are the predicted and actual values at time t, respectively. Let $g(e_{it})$ denote the loss function; in this paper, it is defined as

$$g(e_{it}) = e_{it}^2$$

Then, the loss differential $d_t$ can be written as

$$d_t = g(e_{1t}) - g(e_{2t})$$

The statistic for the DM test is defined as follows:

$$DM = \frac{\bar{d}}{\sqrt{s/N}}$$

where $\bar{d}$, s, and N represent the sample mean of $d_t$, the variance of $d_t$, and the sample size, respectively. The null hypothesis is $H_0 : E[d_t] = 0 \;\forall t$, which means that the two forecasts have the same accuracy, while the alternative hypothesis is $H_1 : E[d_t] \neq 0 \;\forall t$, meaning that the two forecasts have different levels of accuracy. If the null hypothesis is true, the DM statistic is asymptotically distributed as N(0, 1), the standard normal distribution. A modified DM test was proposed by Harvey et al. (1997), who found that the modified DM test performs better than the original one. They defined the statistic for the modified DM test as follows:

$$DM^* = \sqrt{\frac{N + 1 - 2h + h(h-1)/N}{N}}\; DM$$

where h denotes the horizon, and DM represents the original statistic, as in Equation (43). In this study, we predict one period ahead, meaning that h = 1, so the correction factor reduces to $\sqrt{(N-1)/N}$.

Results and Evaluation

Tables 2-6 indicate the following. According to the RMSE, the fundamental models using machine learning outperform the random walk with regard to error size. This is also confirmed by the MAPE (Appendix A, Table A1). Since this holds regardless of the government bonds' time to maturity and the price level measurements used, these findings are robust. Furthermore, models using the PPI always show better predictability than those using the CPI. This empirical result is in line with that of Hashimoto (2011). Because of the close trading relationship between Japan and the US, the fluctuation of JPY/USD tends to be influenced by the PPI rather than the CPI. The poor performance of the CPI in terms of its error size and the significance of its predictive accuracy could be explained by the inclusion of volatile price indicators, such as food, beverages, and energy, which makes it difficult to measure an accurate inflation rate gap. In addition, in the case of the Taylor rule models, both Equations (24) and (25) present reasonable results in this empirical study, which demonstrates that either equation can be used to predict the exchange rate.

Note. "DM" indicates the modified Diebold-Mariano test statistic, as in Equation (44), "Taylor1" indicates using Equation (25), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI_CORE" indicates the CPI excluding fresh food, and "CPI_CORECORE" indicates the CPI excluding fresh food and energy.
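The modified DM statistic described in this section can be sketched as below. For h = 1, the simple sample variance of $d_t$ is used in place of a long-run (HAC) variance, which is a simplifying assumption, and the error series are illustrative.

```python
import numpy as np

def modified_dm(e1, e2, h=1):
    """Modified DM statistic with squared-error loss (Harvey et al. 1997)."""
    d = np.asarray(e1) ** 2 - np.asarray(e2) ** 2   # loss differential d_t
    n = d.size
    dm = d.mean() / np.sqrt(d.var(ddof=0) / n)      # original DM statistic
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return correction * dm                          # modified DM statistic

e_model = np.array([0.1, -0.2, 0.05, 0.15])   # illustrative model forecast errors
e_rw = np.array([0.3, -0.1, 0.2, 0.25])       # illustrative random-walk errors
stat = modified_dm(e_model, e_rw)
print(stat)
```

A negative statistic indicates that the first forecast's squared errors are smaller on average; significance is judged against the standard normal (or, in small samples, Student's t) distribution.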
From the perspective of the modified DM test, most of the results show significantly different predictive accuracies compared to the random walk, while some of the random forest results show the same predictive accuracy as the random walk. Random forest is thus a weak tool for predicting the out-of-sample exchange rate compared with the other machine learning models. This seems reasonable, as the random forest model ignores two characteristics of time series data, namely the inherent time trend and the interdependency among variables. However, random forest can be useful in predicting time series data in some cases, such as in Dudek (2015).

Note. "SVM" indicates support vector machine, "Taylor2" indicates using Equation (24), "DM" indicates the modified Diebold-Mariano test statistic, as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI_CORE" indicates the CPI excluding fresh food, and "CPI_CORECORE" indicates the CPI excluding fresh food and energy. "Taylor2_PPI_2y" indicates using the PPI to calculate the inflation rate and a government bond with two years to maturity to calculate the lagged interest rate. "bond_2y" indicates the government bond with two years to maturity, "bond_3y" the government bond with three years to maturity, "bond_5y" the government bond with five years to maturity, "bond_7y" the government bond with seven years to maturity, and "bond_10y" the government bond with ten years to maturity.

Conclusions
Since the work of Meese and Rogoff (1983a, 1983b), there have been many attempts to solve the puzzle of why traditional economic models are unable to outperform the random walk in predicting out-of-sample exchange rates. In recent years, Amat et al. (2018) found that, in combination with machine learning methodologies, traditional exchange-rate models and Taylor-rule exchange rate models can be useful for forecasting future short-term exchange rates across 12 major currencies.
In this paper, we analyzed whether combining modern machine learning methodologies with economic models can outperform the prediction performance of a random walk without drift. More specifically, this paper sheds light on the application of the random forest method, the support vector machine, and neural networks to four fundamental theories (uncovered interest rate parity, purchasing power parity, the monetary model, and the Taylor rule models). The robustness of the results was also thoroughly examined using six government bonds with different maturities and four different price indexes in three machine learning models. This provides concrete evidence of predictive performance.
In the empirical analysis, a rolling window analysis was used for the one-period-ahead forecast of JPY/USD. Using sample data from August 1980 to August 2019, there were two main findings. First, comparing the performance of the fundamental models combined with machine learning with that of the random walk, the RMSE results show that the former outperform the random walk. In the DM test, most of the results show a predictive accuracy significantly different from the random walk, while some of the random forest results show the same accuracy as the random walk. Second, comparing the performance of the fundamental models combined with machine learning, the models using the PPI show fairly good predictability in a consistent manner, from the perspective of both the size of their errors and their predictive accuracy. However, the CPI does not appear to be a useful index for predicting the exchange rate, based on its poor results in the RMSE and DM tests.

Figure 2 .
Figure 2. The mechanism of a neural network.


Table 2 .
Results for the uncovered interest rate parity (UIRP) model.

Table 3 .
Results for the purchasing power parity (PPP) model. Note. "PPP" indicates purchasing power parity, "DM" indicates the modified Diebold-Mariano test statistic, as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI_CORE" indicates the CPI excluding fresh food, and "CPI_CORECORE" indicates the CPI excluding fresh food and energy.

Table 4 .
Results for the monetary model. Note. "DM" indicates the modified Diebold-Mariano test statistic, as in Equation (44), "PPI" indicates producer price index, "CPI" indicates the CPI of all items, "CPI_CORE" indicates the CPI excluding fresh food, and "CPI_CORECORE" indicates the CPI excluding fresh food and energy.