Dynamic Model Averaging in Economics and Finance with fDMA: A Package for R

The described R package allows the user to estimate Dynamic Model Averaging (DMA), Dynamic Model Selection (DMS) and the Median Probability Model. Both the original methods and some selected modifications of them are implemented. For example, the user can choose between recursive moment estimation and an exponentially weighted moving average for variance updating in the base DMA. Moreover, inclusion probabilities can be computed from "Google Trends" data. The code is written so as to minimise the computational burden, which is quite an obstacle for the DMA algorithm if numerous variables are used. For example, the package allows for parallel computations and implements the Occam's window approach. However, clarity and readability of the code, and the possibility for an R-familiar user to make his or her own small modifications in reasonably short time and with little effort, are also taken into consideration. In addition, some alternative (benchmark) forecasts can also be quickly performed within this package. Indeed, the package is designed in a way that is hoped to be especially useful for practitioners and researchers in economics and finance.


Introduction
This paper is organised as follows. The first section contains the general motivation and information about the package. The next section provides an economic motivation. It focuses on examples from the oil market, but it also briefly provides arguments in favour of using model averaging and Bayesian methods in economics. This section also contains short information about the research already done with Dynamic Model Averaging (DMA). The third section contains a brief note on the theory of DMA, Dynamic Model Selection (DMS) and the Median Probability Model. In the fourth section the main function of this package is described. The fifth section provides some information about the implemented information-theoretic averaging. The sixth section describes some additional functions implemented in fDMA; these functions are mostly variations on model averaging. The seventh section describes the implementation of the Diebold-Mariano test in such a way that the user can quickly compare forecasts from different models obtained with fDMA. The eighth section describes various minor functions from the package which can help a researcher working with fDMA organise and speed up his or her work. The ninth section provides an example of a direct application of fDMA to data from the oil market. The reader eager to become quickly familiar with the package can go directly to this section and read the whole paper later, or start from this section and then seek the necessary information in the other sections. The final section summarizes the performance of fDMA and compares it with other similar packages.
The primary motivation behind the package described in this paper is to provide an easy and efficient tool for practitioners and researchers dealing with DMA. It provides not only the basic implementation of the method originally described by Raftery et al. [1], but also various later modifications of this method, as well as some methods more or less linked with DMA. The manual for the package contains numerous examples, and the most basic literature references are provided there.
Finally, it should be mentioned that there exist packages implementing the "static" Bayesian Model Averaging (BMA) and Bayesian Model Selection (BMS). They are BMA [26] for BMA, BMS [27] for BMS and mlogitBMA [28] for BMA with multinomial logit data. Moreover, as BMA is a quite popular method in the social and natural sciences, there also exist packages which apply BMA/BMS as a part of certain modelling schemes. In particular, ensembleBMA [29] uses BMA to create probabilistic forecasts from ensemble forecasts and weather observations. metaBMA [30] uses BMA for some meta-analysis models. INLABMA [31] fits spatial econometrics models with the help of BMA. dga [32] uses BMA for capture-recapture. spatial.gev.bma [33] allows the user to fit a hierarchical spatial model for the generalized extreme value distribution with the option of Bayesian Model Averaging over the space of covariates. MHTrajectoryR [34] uses BMS in logistic regression for the detection of adverse drug reactions.

Economic Motivation
Now, the economic motivation for the use of the implemented methods is provided. Indeed, DMA joins a few features of econometric modelling together. First of all, the final forecast is produced out of several regression models by model averaging. Secondly, the method is a Bayesian one, i.e., probabilities are interpreted in a degree-of-belief way; for example, the DMA forecast for time t is made on the basis of data up to time t - 1 only. Moreover, the arrival of new data results in a direct updating of the parameters. Therefore, in DMA both the regression coefficients and the weights ascribed to the models vary in time.
Bayesian methods are not the mainstream of modern econometrics. However, they have been gaining more and more interest recently. There are various reasons for this. First of all, one can link this to the increasing amount of data that a potential researcher has to cope with. Due to technological progress, one is usually faced with many potential explanatory variables. Despite the fact that most of them are probably unimportant, the researcher usually does not know which ones should be rejected.
Of course, up to some point conventional methods can still be used. Unfortunately, precise estimates of the parameters often cannot be obtained due to the lack of enough information. The simplest example is when the number of explanatory variables is greater than the number of observations in the time-series. Then, even in the case of a linear regression, the standard ordinary least squares estimation leads to a singular matrix, which cannot be inverted. In the Bayesian framework a meaningful formula is still derived. It also seems that Bayesian methods deal better with over-parametrization and over-fitting issues. Conventional ways of dealing with omitted variable bias, like hypothesis testing to select the "true" model out of many possible ones, are also problematic in the case of many variables due to computational issues [35]. All this results in an increasing interest in Bayesian methods in new research, which is thoroughly described in various reviews and monographs [36][37][38][39][40][41].
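This degenerate case is easy to reproduce. The short base-R illustration below (not taken from the package; all names are illustrative) shows that with more regressors than observations the cross-product matrix X'X is rank-deficient, so the usual OLS formula breaks down:

```r
# With m > n regressors, X'X (an m x m matrix) has rank at most n,
# so the OLS estimator (X'X)^{-1} X'y does not exist.
set.seed(1)
n <- 5   # observations
m <- 8   # explanatory variables (m > n)
X <- matrix(rnorm(n * m), nrow = n, ncol = m)
XtX <- t(X) %*% X
qr(XtX)$rank                           # at most n = 5, i.e., less than m = 8
inv <- try(solve(XtX), silent = TRUE)  # inversion fails
inherits(inv, "try-error")             # TRUE
```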
It is worth mentioning here that various approaches can be found in recent trends in forecasting. Narrowing, as an example, to the oil price, forecasting methods can usually be classified into time-series models, structural models and other methods like machine learning, neural networks, etc. This is well described in recent extensive surveys [42][43][44][45][46]. Generally, time-series models focus on modelling volatility rather than spot prices. Structural models by definition include causal relations, but they usually have a good predictive power in certain periods and a very poor one in others. Also, the other methods, based on wavelet decomposition, neural networks, etc., usually omit the influence of other factors and focus on a single time-series [47][48][49][50][51][52][53][54][55][56][57][58][59][60][61]. This makes DMA an interesting method for a practitioner.
The next aspect of DMA is that it allows the regression coefficients to be time-varying. Indeed, in the presence of both slow and rapid (structural breaks) changes in the economy, such a property of an econometric model is very desirable. Of course, such an approach also exists in the conventional methodology, for example, as recursive or rolling window regressions. But since the seminal paper by Kalman [62], more sophisticated methods are usually used. They allow the user to incorporate the uncertainty of the variables' estimates in the presence of measurement noise, etc. As a result, more accurate estimates are obtained [63].
Another important feature of DMA is that the final forecast comes from a forecast combination. As already mentioned, in many economic problems the researcher is not sure which variables should be included in the model. In other words, narrowing for instance to linear regression models, the researcher has to cope with uncertainty about the model. Of course, one approach is to use some method to uncover the "true" model out of all the considered ones. Actually, one of the assumptions of DMA is that such a "true" model exists, so with the arrival of new data its weight should increase. But the forecast in DMA is formulated as a weighted average of the predictions of all possible models concerned. In this way DMA incorporates a model averaging technique. Since the seminal paper by Bates and Granger [64] it is known that forecast combination can be beneficial. Bates and Granger [64] provided an example in which a forecast combination of two models, with suitably set weights, can result in a smaller Mean Squared Error than those of the two separate models. Since then it has been found in many cases, not only on theoretical grounds, that model averaging can be beneficial over model selection. Just in the case of practical applications, i.e., seeking a method to produce smaller errors, to better fit the historical data, and/or to produce more accurate forecasts, it was found that model averaging can be useful for practitioners [65][66][67][68][69][70][71]. Indeed, model averaging is a popular technique not only in the Bayesian setting. Various approaches have been developed, based on different underlying theories [2][72][73][74][75][76][77][78][79][80]. Recently, very detailed surveys of model averaging in economics have been written [81][82][83], to which the interested reader is referred.
In economics, model averaging within the Bayesian framework was well adapted in studying economic growth [84]. However, the examples for fDMA in this paper are taken from the oil market, i.e., the spot oil price is examined. Indeed, this topic was studied, for example, by Drachal [85] and Naser [86]; therefore, the reader interested in the financial and economic perspective is referred to these papers. Herein, just the fDMA package is described, and the oil market data serve just as an illustrative example of how to use the package, not of how to do good economic research. In particular, the data set is greatly reduced in order to execute the examples in a few seconds. In Table 1 a collection of several potential oil price drivers is presented. These relationships, their strengths and directions have also been found to be time-varying. It has to be stressed that the variables in this table are only potential oil price drivers. Although in some cases a direct causal relationship can be found in the literature, it is enough just to have a tentative supposition that a given variable might be somehow linked with the oil price. In this way the list of potential drivers is reasonable, but uncertainty about them still exists.

Driver: Reason to Treat as a Potential Driver and Source

Interest rate: A higher rate results in a higher price of a non-renewable resource (Hotelling's rule). A higher rate results in a lower price of a commodity because of the cost of holding inventory [87].

Supply and demand: An increase in demand results in a price increase; an increase in supply, in a price decrease. This is sometimes known as the law of supply and demand [88][89][90][91][92][93][94].

Stock markets: Empirically, it was found that the oil price and stock indices usually follow similar time paths, and that the predictive accuracy of oil price forecasts increases when data from stock markets are used. The oil price can affect interest rates, production and GDP, which further affect the expected free cash flow; this, in turn, affects stock prices. Moreover, there is a rising financialization of the oil market, i.e., increasing links between the oil and stock markets since the 2000s [95][103][104][105][106][107][108][109][110][111][112].

Inventories: They can be released to cover supply shortages, as so-called buffer inventory. Also, they can be stored to be sold at higher prices in the future; this is called speculative inventory [119][120][121][122].

Oil price volatility: It has a direct effect on the energy sector. Higher volatility might put pressure on consumers to switch to alternative energy sources, and therefore decrease demand [123,124].

Policy uncertainty: Empirically, economic policy uncertainty increases the predictive accuracy of oil price forecasts. Usually, positive uncertainty shocks affect commodity price returns negatively [141][142][143].
As mentioned, for the most detailed economic analysis of the literature the reader is referred to the papers already cited. Moreover, it is worth noticing that there exist a few both up-to-date and extensive reviews of various oil price forecasting techniques [42][43][44][45][46]. Also, it is interesting to consider how the Bayesian approach in general is used in forecasting the oil price. Based on the Scopus database, and narrowing to the last 10 years, it can be concluded that conventional methods, or those based on, for example, neural networks and machine learning, are still more popular. But interestingly, the number of papers with the Bayesian approach has been rapidly increasing over the past few years [85,86,96,99,115,135,137,[143][144][145][146][147].
Some modifications of DMA implemented in fDMA allow the user to include data about Internet queries. Indeed, the use of such data in forecasting is gaining popularity. It was also found useful in modelling the oil market [148]. The exact use in DMA is described later in this paper.

Theoretical Framework
In this section the theoretical framework of fDMA is shortly described: in particular, Dynamic Model Averaging (DMA), Dynamic Model Selection (DMS), the Median Probability Model and information-theoretic averaging.

Dynamic Model Averaging (DMA)
DMA was introduced in great detail in the original paper by Raftery et al. [1]. However, a short exposition is presented below, necessary to understand what each function in fDMA does.
Suppose that y_t is the forecast time-series (dependent variable), and let x_t^{(k)} be the column vector of independent variables in the k-th regression model. For instance, 10 potential oil price drivers are listed in Table 1. If each of them is represented by a suitable time-series, then 2^10 possible linear regression models can be constructed. Indeed, each variable can either be included in a model or not, so 2 choices are possible for each variable, constituting 2^10 possibilities. This includes the model with the constant only. So, in general, having m potentially useful independent variables, up to K = 2^m models can be constructed. In other words, the state space model is given by

y_t = (x_t^{(k)})' \theta_t^{(k)} + e_t^{(k)} ,
\theta_t^{(k)} = \theta_{t-1}^{(k)} + \delta_t^{(k)} ,   (1)

where k = 1, . . . , K and \theta_t^{(k)} is the column vector of regression coefficients. It is assumed that the errors follow the normal distribution, i.e., e_t^{(k)} ~ N(0, V_t^{(k)}) and \delta_t^{(k)} ~ N(0, W_t^{(k)}). Please notice that, having m potential explanatory variables, 2^m is the upper limit of the number of constructed models. However, all the methodology described in this paper (if not stated otherwise) is applicable for any subset of these 2^m models, i.e., K <= 2^m.
Let L_t = k if the process is governed by the k-th model at time t. The evolution can be determined by a K x K transition matrix (p_{k,l}) with p_{k,l} = p[L_t = l | L_{t-1} = k]. For large K such computations are infeasible, therefore an approximation is necessary. In particular, let \theta_t = (\theta_t^{(1)}, . . . , \theta_t^{(K)})', so that the underlying state consists of the pair (\theta_t, L_t), with probability distribution p(\theta_t, L_t | Y^{t-1}), where Y^{t-1} = {y_1, . . . , y_{t-1}}. However, the following forgetting approximation can be used:

\pi_{t|t-1,k} = (\pi_{t-1|t-1,k})^\alpha / \sum_{l=1}^{K} (\pi_{t-1|t-1,l})^\alpha ,   (2)

where \alpha is a forgetting factor, i.e., a fixed number between 0 and 1. Further, \pi_{t|t-1,k} are called posterior probabilities. During the numerical approximations, zeros can emerge in Equation (2). Therefore, following [1], in the package Equation (2) is replaced by

\pi_{t|t-1,k} = ((\pi_{t-1|t-1,k})^\alpha + c) / \sum_{l=1}^{K} ((\pi_{t-1|t-1,l})^\alpha + c) ,   (3)

where c is a small positive constant. A second forgetting factor, \lambda, also between 0 and 1, is used to approximate the prediction of the state covariance matrix of the k-th model:

E_{t|t-1}^{(k)} = (1/\lambda) E_{t-1|t-1}^{(k)} .   (4)

Additionally, \pi_{t|t,k} = p[L_t = k | Y^t] can be computed in the following way:

\pi_{t|t,k} = \pi_{t|t-1,k} f_k(y_t | Y^{t-1}) / \sum_{l=1}^{K} \pi_{t|t-1,l} f_l(y_t | Y^{t-1}) ,   (5)

where f_k(y_t | Y^{t-1}) is the predictive density of the k-th model at y_t. This is given as the density of

N( (x_t^{(k)})' \hat{\theta}_{t-1}^{(k)} , V_t^{(k)} + (x_t^{(k)})' E_{t|t-1}^{(k)} x_t^{(k)} )   (6)

evaluated at y_t, where

e_t^{(k)} = y_t - (x_t^{(k)})' \hat{\theta}_{t-1}^{(k)}   (10)

is the error term from the one-step-ahead prediction of the k-th model. The variance V_t^{(k)} is updated by the recursive moment estimation, i.e.,

V_t^{(k)} = ((t-1)/t) V_{t-1}^{(k)} + (1/t) ( (e_t^{(k)})^2 - (x_t^{(k)})' E_{t|t-1}^{(k)} x_t^{(k)} ) .   (7)

With the updated V_t^{(k)}, the regression coefficients are updated in the following way:

\hat{\theta}_t^{(k)} = \hat{\theta}_{t-1}^{(k)} + E_{t|t-1}^{(k)} x_t^{(k)} ( V_t^{(k)} + (x_t^{(k)})' E_{t|t-1}^{(k)} x_t^{(k)} )^{-1} e_t^{(k)} .   (8)

On the other hand, E_t^{(k)} is updated in the following way:

E_{t|t}^{(k)} = E_{t|t-1}^{(k)} - E_{t|t-1}^{(k)} x_t^{(k)} ( V_t^{(k)} + (x_t^{(k)})' E_{t|t-1}^{(k)} x_t^{(k)} )^{-1} (x_t^{(k)})' E_{t|t-1}^{(k)} .   (9)

In order to start the above recursive computations, some initial values have to be set. Assuming that initially all K models are equally probable (i.e., setting the non-informative prior), for every k = 1, . . . , K,

\pi_{0|0,k} = 1/K ,   (11)

and the vectors of regression coefficients are initially set to zeros, i.e.,

\hat{\theta}_0^{(k)} = 0 .   (12)

Moreover, the initial state covariance matrices E_0^{(k)} are specified as in Equation (13), basing on the variances Var(x_i^{(k)}) of the explanatory variables, where j_k is the number of independent variables in the k-th model and \beta is the estimated intercept term in a linear regression model with y as the dependent variable and the whole x as the independent variables. It can happen during the numerical estimations that some Var(x_i^{(k)}) will be approximated to 0. In such a case, if a zero value emerges, fDMA replaces it by a small constant computed analogously as in Equation (3).
Finally, the DMA forecast is given by

\hat{y}_t^{DMA} = \sum_{k=1}^{K} \pi_{t|t-1,k} \hat{y}_t^{(k)} ,   (14)

where \hat{y}_t^{(k)} = (x_t^{(k)})' \hat{\theta}_{t-1}^{(k)} is the forecast of the k-th model. All in all, the above scheme can be viewed as estimating K time-varying parameters (TVP) regression models, and then averaging them with a set of recursively updated weights. In particular, having the variables y_t and x_t^{(k)}, the user has to specify the forgetting parameters \alpha and \lambda, and the initial variances V_0^{(k)}. Then, the other initial parameters are set through Equations (11)-(13). Next, the process is updated recursively. Each of the K models is estimated through Equations (1), (10), (6), (8), (9) and (7), and the weights are updated by Equations (2) and (5).
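For illustration, the whole recursion can be condensed into a short base-R sketch. This is not the package's implementation: the observational variance is held fixed at initvar instead of being updated recursively, the zero-guards are omitted, the columns of mods.incl are assumed to refer directly to the columns of x (no constant term), and all names are illustrative.

```r
# A compact, simplified sketch of the DMA recursion described above.
dma_sketch <- function(y, x, mods.incl, alpha = 0.99, lambda = 0.99, initvar = 1) {
  TT <- length(y); K <- nrow(mods.incl)
  pi.t  <- rep(1 / K, K)                                         # non-informative prior
  theta <- lapply(1:K, function(k) rep(0, sum(mods.incl[k, ])))  # coefficients start at 0
  E     <- lapply(1:K, function(k) diag(initvar, sum(mods.incl[k, ])))
  y.hat <- matrix(0, TT, K); w <- matrix(0, TT, K)
  for (t in 1:TT) {
    pi.pred <- pi.t^alpha / sum(pi.t^alpha)           # forgetting on the model weights
    dens <- numeric(K)
    for (k in 1:K) {
      xk <- x[t, mods.incl[k, ] == 1]
      Ek <- E[[k]] / lambda                           # forgetting on the state covariance
      y.hat[t, k] <- sum(xk * theta[[k]])             # one-step-ahead forecast
      e <- y[t] - y.hat[t, k]                         # prediction error
      s <- initvar + c(t(xk) %*% Ek %*% xk)           # predictive variance
      dens[k] <- dnorm(y[t], y.hat[t, k], sqrt(s))    # predictive density at y_t
      theta[[k]] <- theta[[k]] + c(Ek %*% xk) * e / s # Kalman coefficient update
      E[[k]] <- Ek - (Ek %*% xk) %*% t(xk) %*% Ek / s # covariance update
    }
    pi.t <- pi.pred * dens / sum(pi.pred * dens)      # weight update
    w[t, ] <- pi.pred
  }
  list(y.hat = rowSums(w * y.hat), weights = w)       # averaged forecast and weights
}

# Toy illustration: one relevant and one irrelevant regressor, three models
set.seed(2)
n <- 60
x <- cbind(rnorm(n), rnorm(n))
y <- 0.8 * x[, 1] + rnorm(n, sd = 0.5)
mods <- rbind(c(1, 0), c(0, 1), c(1, 1))
res <- dma_sketch(y, x, mods)
```

Setting alpha = 1 and lambda = 1 in this sketch reproduces the recursive BMA case mentioned below.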
It is worth noticing that if \alpha = 1 and \lambda = 1 are taken, then the above scheme reproduces, in a recursive way, Bayesian Model Averaging (BMA).

Dynamic Model Selection (DMS)
Dynamic Model Selection (DMS) is based on the same idea as the one behind DMA. The only difference is that in DMA model averaging is performed, whereas in DMS model selection is performed. In other words, for each period t the model with the highest posterior probability is selected. This means that Equation (14) is simply modified to

\hat{y}_t^{DMS} = \hat{y}_t^{(h_t)} ,

where h_t denotes the model, out of the K models, which corresponds to the highest \pi_{t|t-1,k} given by Equation (2).
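As a toy illustration (names assumed, not taken from the package), the DMS rule amounts to a per-period argmax over the weight matrix:

```r
# Sketch of the DMS rule: at each period pick the forecast of the model with
# the highest weight. Here w (a T x K matrix of weights) and y.hat (a T x K
# matrix of individual model forecasts) are illustrative inputs, e.g., the
# by-products of a DMA-type recursion.
dms_sketch <- function(y.hat, w) {
  sel <- max.col(w, ties.method = "first")  # index of the highest-weight model per period
  y.hat[cbind(seq_len(nrow(w)), sel)]       # its forecast, for each t
}

w  <- rbind(c(0.2, 0.8), c(0.6, 0.4))
yh <- rbind(c(1, 2), c(3, 4))
dms_sketch(yh, w)  # model 2 in period 1, model 1 in period 2: c(2, 3)
```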

Relative Variable Importance
Before proceeding further it is necessary to explain the concept of a relative variable importance.
The idea is just to sum the posterior probabilities of the models which contain the given explanatory variable. In other words, consider the posterior probabilities given by Equation (2), but not for all k = 1, . . . , K, only for those models which contain the given variable as an explanatory one:

RVI_t(x_j) = \sum_{k=1}^{K} 1_k(x_j) \pi_{t|t-1,k} ,   (15)

where 1_k(x_j) = 1 if the k-th model contains x_j as an explanatory variable, and 1_k(x_j) = 0 otherwise.
Of course, this concept can be applied both in the case of model averaging and of model selection. Simply, in model selection there will be no summing in Equation (15), but just one term, the one corresponding to the selected model; in such a case relative variable importance reduces to a trivial outcome. Generally, this concept can easily be transferred to other model averaging schemes, like the information-theoretic one, for instance. The idea is just to replace the posterior probabilities \pi_{t|t-1,k} by suitable weights. Indeed, \pi_{t|t-1,k} are just the weights ascribed to the averaged models in DMA.
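Given a matrix of weights and an inclusion matrix, this sum is a single matrix product. The following base-R sketch (illustrative names, not the package's interface) shows this:

```r
# Relative variable importance: for each variable, sum the weights of all
# models containing it. w is a T x K matrix of model weights, mods.incl a
# K x m 0/1 inclusion matrix.
rvi_sketch <- function(w, mods.incl) {
  w %*% mods.incl  # (T x K) %*% (K x m): column j sums the weights of models with x_j
}

w <- rbind(c(0.5, 0.5), c(0.1, 0.9))
incl <- rbind(c(1, 0),   # model 1 contains only x_1
              c(1, 1))   # model 2 contains x_1 and x_2
rvi_sketch(w, incl)  # importance of x_1 is 1 in both periods; of x_2: 0.5, then 0.9
```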
Actually, this concept is very interesting from the economic point of view. Indeed, it has been used and discussed in the applied papers mentioned above which use the DMA methodology. However, great caution has to be taken. First of all, the term importance is used in order not to confuse it with statistical significance. Indeed, relative variable importance is not connected with any statistical test; it is also a very relative measure. It measures how much the given variable becomes "appreciated" within the set of considered models.
Secondly, it is not known how to deal with the situation in which a quite high weight is ascribed to a model containing the given variable together with some other variables. In other words, it might happen that such a model is preferred not because it includes the variable the researcher is interested in, but because of the joint "importance" of this variable and the other ones in the model. The topic of jointness in the Bayesian framework is a separate research problem [169].
Another idea that directly comes to mind is to consider the expected number of variables. In other words, to consider the weighted average of the number of explanatory variables (including the constant term, if present), where the weights are given by the posterior probabilities (or by other suitable weights, if another kind of model averaging is performed):

E(size_t) = \sum_{k=1}^{K} \pi_{t|t-1,k} \sum_{j=0}^{m} 1_k(x_j) ,   (16)

where x_0 stands for the constant term.
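Analogously, the expected number of variables is a weighted average of the model sizes; a minimal sketch (illustrative names):

```r
# Expected number of explanatory variables: the weighted average of model
# sizes under the current weights. The constant is counted if it is a
# column of mods.incl.
exp_size_sketch <- function(w, mods.incl) {
  c(w %*% rowSums(mods.incl))  # rowSums gives the size of each model
}

w <- rbind(c(0.5, 0.5))
incl <- rbind(c(1, 0), c(1, 1))   # model sizes 1 and 2
exp_size_sketch(w, incl)  # 0.5 * 1 + 0.5 * 2 = 1.5
```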

Median Probability Model
Barbieri and Berger [2] observed that selecting the model with the highest posterior probability, although somehow desirable, is optimal only in the case of exactly two competing models, and in the case of linear models having orthogonal design matrices, but not under general conditions. Therefore, they proposed the Median Probability Model.
In the already presented framework the application of their approach is very similar to DMS. However, the selected model is the one which contains as explanatory variables exactly those whose relative variable importance is equal to or greater than 0.5.
It should be noticed that if K = 2^m, i.e., all possible models are considered, such a model always exists. However, if K < 2^m, it might not exist.
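A sketch of this selection rule (illustrative names; the constant term is ignored for simplicity):

```r
# Median Probability Model selection: keep exactly the variables whose
# relative importance at time t is at least 0.5, and look for a considered
# model with precisely that inclusion pattern; it may not exist when K < 2^m.
mpm_sketch <- function(w.t, mods.incl) {
  rvi <- c(t(mods.incl) %*% w.t)      # importance of each variable at time t
  target <- as.integer(rvi >= 0.5)    # variables with importance >= 0.5
  hit <- which(apply(mods.incl, 1, function(r) all(r == target)))
  if (length(hit) == 0) NA else hit[1]
}

incl <- rbind(c(1, 0), c(0, 1), c(1, 1))
mpm_sketch(c(0.2, 0.3, 0.5), incl)  # both importances exceed 0.5: model 3
```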

Internet Searches
Koop and Onorante [170] proposed to modify Equation (2) in such a way that it includes the information from Internet queries. The motivation behind this is that the weight ascribed to a model should include the information about the interest of investors in the variables in this model. For instance, if there is an increasing interest in exchange rates, then models with this explanatory variable should be given somewhat higher weights than the models without this variable. The data about Internet searches are available directly from, for example, Google [171]. These data are numbers between 0 and 100. Moreover, they correspond not to the absolute volume of searches, but to the relative one, in comparison to all searches. Therefore, these numbers can reasonably represent the interest in a given variable (representing some economic factor). In order to interpret these data as probabilities, they just have to be divided by 100 to fit between 0 and 1. They are called Google probabilities in this paper.
Of course, it is another topic how to suitably choose the search terms in the context of a particular research study. Moreover, how to include this information in model averaging is also an open topic. What Koop and Onorante [170] proposed is, first, to compute

p_{t,k} = \prod_{i \in IN} g_{i,t} \cdot \prod_{j \in OUT} (1 - g_{j,t}) ,   (17)

where g_{i,t} denotes the Google probability of the i-th variable at time t, i.e., the number obtained from Google Trends [171] corresponding to Internet searches of the query associated with the i-th variable, divided by 100; IN denotes the variables included in the k-th model, and OUT the variables not included in the k-th model. It can happen during the numerical estimations that some \prod_{i \in IN} g_{i,t} or \prod_{j \in OUT} (1 - g_{j,t}) will be approximated to 0. In such a case, in the functions in fDMA it is replaced by a small constant computed analogously as in Equation (3).
Next, Equation (2) is replaced with

\pi_{t|t-1,k} = \omega \cdot (\pi_{t-1|t-1,k})^\alpha / \sum_{l=1}^{K} (\pi_{t-1|t-1,l})^\alpha + (1 - \omega) \cdot p_{t,k} ,   (18)

where \omega is a fixed parameter, i.e., a number between 0 and 1. Of course, in the case of \omega = 1 this modification replicates the basic DMA without Google data. Naturally, the above concept can be similarly mimicked for any other kind of model averaging. The only thing is to replace the weights used in that forecast combination scheme, w_{t,k}, by \omega \cdot w_{t,k} + (1 - \omega) \cdot p_{t,k}.
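The construction of the Google-based weights and the mixing step can be sketched as follows (illustrative names; the small-constant guard of Equation (3) is omitted):

```r
# Google-based weight of model k at time t: multiply the Google probabilities
# of the included variables and one minus those of the excluded ones. Over
# all 2^m inclusion patterns these weights sum to 1 by construction.
gweight_sketch <- function(g.t, incl.k) {
  prod(g.t[incl.k == 1]) * prod(1 - g.t[incl.k == 0])
}
# Mixing step: combine the DMA weight with the Google-based one;
# omega = 1 recovers the plain DMA weights.
mix_weight <- function(w.tk, p.tk, omega) {
  omega * w.tk + (1 - omega) * p.tk
}

g <- c(0.6, 0.3)            # Google Trends indices divided by 100
gweight_sketch(g, c(1, 0))  # 0.6 * (1 - 0.3) = 0.42
mix_weight(0.5, 0.42, 1)    # omega = 1: plain DMA weight 0.5
```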

Dynamic Occam's Window
Actually, DMA is a quite computationally demanding technique. Generally, every extra explanatory variable added to DMA doubles the computation time. Therefore, when dealing with around 10 potential explanatory variables, DMA can still be performed on an average computer. However, if the number of variables grows to around 20, the time needed grows in an exponential pattern. The use of simply faster machines, or more machines working in parallel, is then still not a satisfactory solution.
Onorante and Raftery [172] proposed to apply a dynamic Occam's window. In particular, their procedure, modifying DMA, can be described in the following steps. It should be noticed that the implementation slightly differs from the one described by Onorante and Raftery [172], but the idea remains unaffected.
Suppose that one would like to estimate DMA for some set of models M, with observations covering the period up to time T, and that this is computationally infeasible, i.e., M is too large.
The procedure is recursive, in the following sense.

1. Assume that the standard DMA was performed for some subset of models M_t ⊂ M, but for the observations up to time t <= T.
2. The DMA forecast is produced; it is called DMA-E. Basing on the weights given by Equation (5), the cut-off is performed, i.e., the DMA forecast for the period t is made only from the models whose weights are above the cut-off level determined by the threshold C (Equation (19)). Of course, the weights for the remaining models are normalized to sum up to 1. This forecast is called DMA-R.
3. The above cut-off reduces the set of models M_t. This reduced set is then expanded by new models, constructed by adding or removing exactly one explanatory variable (out of all possible variables) to or from each model in the reduced set. The constant term is not included in this adding/removing procedure. As a result, the set of models M_{t+1} is created.
4. If t < T, Step 1 is performed with the updated set of models. Otherwise, the procedure is stopped.
In order to begin, the initial set of models M_0 should be specified. As M_0 = M is usually too large even for the first step, there are two natural candidates: one can start with some random subset M_0 ⊂ M, or specify M_0 as the models consisting of exactly one explanatory variable. Moreover, the cut-off threshold C, between 0 and 1, has to be specified. Finally, the user should decide which forecast is of his or her interest: DMA-R or DMA-E.
Of course, the above scheme requires estimating the standard DMA T times, if the length of the time-series is T. But the idea behind this concept is that in certain cases a suitable selection of the threshold C can greatly reduce the number of models used in averaging, therefore leading to an overall speed increase.
Actually, the original version described by Onorante and Raftery [172] did not make the cut-off immediately, as in the above presentation. The full initial set M was allowed to be estimated with the standard DMA for some short period. However, in the majority of tests performed by the author of fDMA such calculations were still taking too much time. Therefore, the idea was kept, but small details are modified in fDMA. Indeed, Onorante and Raftery [172] also stated directly that, for example, the expansion of models through adding/removing one explanatory variable might be replaced by some other procedure.
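As an illustration of the model-space update, one round of the cut-off and expansion can be sketched in base R as follows. The cut-off rule shown here (keeping models whose weight is at least C times the largest one) is an assumption of this sketch; fDMA's internal details differ.

```r
# One round of a dynamic Occam's window model-space update: cut off the
# low-weight models, then expand the survivors by adding or removing exactly
# one variable (the first column, the constant, is never flipped).
dow_round <- function(mods.incl, w, C) {
  keep <- which(w >= C * max(w))                   # cut-off on the weights
  surv <- mods.incl[keep, , drop = FALSE]
  out <- surv
  for (i in seq_len(nrow(surv))) {
    for (j in 2:ncol(surv)) {                      # column 1 = constant
      cand <- surv[i, ]
      cand[j] <- 1 - cand[j]                       # flip one variable
      out <- rbind(out, cand)
    }
  }
  unique(out)                                      # the new model set M_{t+1}
}

mods <- rbind(c(1, 1, 0), c(1, 0, 1))       # constant + two variables
dow_round(mods, w = c(0.9, 0.05), C = 0.5)  # only model 1 survives, then expands
```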

Fundamental Functions
In this section the main functions of fDMA are described. The main function in this package is fDMA. Below the arguments of this function are explained:
• y should be a numeric or a column matrix representing the dependent variable. If it is xts, then plots will have a time index on the x axis,
• x should be a matrix of explanatory variables. Observations should go by rows, and different columns should correspond to different variables,
• alpha should be a numeric representing the forgetting factor α used in Equation (2),
• lambda should be a numeric representing the forgetting factor λ used in Equation (4). It is also possible to specify lambda as a numeric vector. Then, its values are automatically ordered in descending order, and if the numbers are not unique they are reduced to become unique. The idea is that if more than one value is given for lambda, then the model state space, i.e., mods.incl, is expanded by considering all these models with the given values of lambda. The outcomes of fDMA are then ordered by columns in such a way that first the outcomes from the models with the first value of lambda are presented, then from the models with the second value of lambda, etc. This specification can be used if the user would like to perform a model combination scheme in which the models specified by mods.incl are additionally treated as separate new models, each with a different value of λ given by lambda. Such an approach is given by Raftery et al. [1] at the end of their paper,
• initvar should be a numeric. It represents the initial variances in the state space equation, V_0^{(k)}. It is taken the same for all k = 1, . . . , K models,
• W is optional. If W = "reg", then E_0^{(k)} are specified in the initial step as in Equation (13). On the other hand, if a positive numeric is given, then this numeric is used in Equation (13) instead of the regression-based specification. By default W = "reg",
• initial.period is optional.
This numeric indicates the moment since which MSE (Mean Squared Error) and MAE (Mean Absolute Error) should be computed. As fDMA already produces some forecast quality measures, the user might want to treat some first observations as a "training period". By default the whole sample is used to produce the forecast quality measures, i.e., initial.period = 1,
• V.meth is optional. If V.meth = "rec", then the recursive moment estimator, as in Equation (7), is used. If V.meth = "ewma", then the exponentially weighted moving average (EWMA) is used, i.e., Equation (7) is replaced by

V_t^{(k)} = \kappa V_{t-1}^{(k)} + (1 - \kappa) (e_t^{(k)})^2 ,   (20)

where κ (kappa) has to be specified as well. EWMA is a common estimator in finance. For instance, RiskMetrics calls κ a decay factor and suggests κ = 0.97 for monthly data and κ = 0.94 for daily data. Koop and Korobilis [157] used κ = 0.98 for quarterly data. EWMA was also used by them when ARCH effects in Equation (1) were suspected, as other methods would increase the computational burden too much. By default V.meth = "rec" is used,
• kappa is an optional numeric. It has to be specified if V.meth = "ewma". Then, it corresponds to κ in Equation (20),
• gprob is optional. This matrix represents the Google probabilities g_{i,t}, as in Equation (17). In other words, its columns should correspond to different explanatory variables, i.e., to the columns of x. These values should not be the direct Google Trends data, but the search volume indices divided by 100. It should also be noticed that gprob is not automatically lagged one period back inside the function. If nrow(gprob) < length(y), then the method by Koop and Onorante [170] is used for the last nrow(gprob) observations, and for the preceding ones the original method by Raftery et al. [1] is used. In such a case a warning is generated,
• omega is optional. This numeric has to be specified if gprob is used, and it represents the parameter ω from Equation (18),
• parallel is optional. This logical indicates whether parallel computations should be used. However, it should be noticed that parallel computations are not always desired. It can happen that the additional time for overheads will take too long, and sequential computations would be faster. Therefore, if the dynamic Occam's window is applied and parallel = TRUE, then parallel computations are turned on only for the rounds in which 2^10 or more models are estimated. This threshold was chosen basing on some tests. The reason for such a methodology is that forcing all rounds to be computed in a parallel way could result in a situation in which, for rounds where too few models are generated, the overheads would increase the computation time. By default parallel = FALSE is used,
• m.prior is optional. This numeric represents a parameter π for a general model prior. In other words, Equation (11) can be replaced by

\pi_{0|0,k} = \pi^{p_k} (1 - \pi)^{m - p_k} ,   (21)

where p_k is the number of variables (including the constant term) in the k-th model and m is the total number of potential explanatory variables (including the constant) considered in the forecast combination scheme [173,174]. By default m.prior = 0.5, which corresponds to the uniform distribution, i.e., non-informative priors. Then, Equation (21) reduces to Equation (11). The interpretation of π is that the prior expected number of explanatory variables in the model is m · π, so by changing π the user can modify the initial weights, giving more attention to models with more variables or to rather parsimonious models,
• mods.incl is optional. This matrix indicates which models should be used for the estimation. The first column indicates the inclusion of a constant. Different models are differentiated by rows, whereas columns correspond to the explanatory variables. Inclusion of a variable is indicated by 1, omission by 0. By default all possible models with a constant are used,
• DOW is optional. This numeric represents the threshold for the dynamic Occam's window, i.e., the parameter C in Equation (19). If DOW = 0, then no dynamic Occam's window is applied. By default DOW = 0 is used.
The dynamic Occam's window can be applied only to Dynamic Model Averaging (DMA), i.e., if model = "dma",
• DOW.nmods is optional. This numeric indicates the initial number of models for the dynamic Occam's window. Of course, it should be smaller than the number of all possible models and larger than or equal to 2. These models are chosen randomly. So if the user wants to start with all the models specified by mods.incl, he or she should simply set DOW.nmods equal to the number of models given by mods.incl. If DOW.nmods = 0, then initially the models with exactly one explanatory variable and a constant term are taken. By default DOW.nmods = 0 is used,
• DOW.type is optional. DOW.type = "r" corresponds to DMA-R in the dynamic Occam's window; DOW.type = "e" corresponds to DMA-E. By default DOW.type = "r" is used,
• DOW.limit.nmods is optional. This numeric indicates the maximum number of models selected by the dynamic Occam's window. In other words, it can happen that the cut-off specified by C leaves too many models for efficient computations. If the user wants an additional limitation, i.e., to be sure that no more than DOW.limit.nmods models are left after the cut-off, this parameter should be specified. By default no limit is set,
• progress.info is optional. This logical is applicable only if the dynamic Occam's window is used.
Otherwise it is ignored. If progress.info = TRUE, then the number of the current recursive DMA computation round and the number of models selected for this round are printed on the screen as the computations go. This feature can be useful if the user suspects the existence of some bottleneck in the dynamic Occam's window. For instance, the cut-off level can result in a reasonably small number of models up to some period, after which suddenly too many models are taken,
• av is optional. If av = "dma", then the original DMA averaging scheme is performed. If av = "mse", then the predictive densities in Equation (5) are replaced by the inverses of the Mean Squared Errors of the models [175,176]. If av = "hr1", then they are replaced by Hit Ratios (assuming the time-series are in levels). If av = "hr2", then they are replaced by Hit Ratios (assuming the time-series represent changes, i.e., differences). By default av = "dma" is used.
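As a minimal sketch of the EWMA variance-updating rule behind V.meth = "ewma", the recursion can be written in a few lines of base R. The function name ewma.variance and the initialisation of the variance with the sample variance are illustrative assumptions, not the package's internal code:

```r
# Sketch of EWMA variance updating (cf. Equation (20)); illustrative only.
ewma.variance <- function(e, kappa, V0 = var(e)) {
  V <- numeric(length(e))
  V[1] <- V0                                  # assumed initialisation
  for (t in 2:length(e)) {
    # recent squared forecast errors get the weight (1 - kappa)
    V[t] <- kappa * V[t - 1] + (1 - kappa) * e[t - 1]^2
  }
  V
}

set.seed(1)
e <- rnorm(50)                        # hypothetical one-step-ahead forecast errors
V <- ewma.variance(e, kappa = 0.97)   # kappa as suggested for monthly data
```

With κ close to 1 the estimate forgets old squared errors slowly, whereas the recursive moment estimator of Equation (7) corresponds to an equal weighting of all past errors.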
If y is the dependent variable and x represents the first lags of the independent variables, then the classical one-step-ahead forecast is computed. In order to compute an h-step-ahead forecast (h > 1), simply more lags of the independent variables should be taken, i.e., x should represent the h-th lags of the independent variables.
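For illustration, the h-th lags can be prepared with plain base R before calling fDMA. The data below are simulated, and the commented call is only a hypothetical example of how the aligned series would be passed to the function:

```r
set.seed(1)
n.obs <- 100
y <- rnorm(n.obs)                                      # simulated dependent variable
x.raw <- matrix(rnorm(n.obs * 2), ncol = 2,
                dimnames = list(NULL, c("v1", "v2")))  # unlagged regressors
h <- 3                                                 # forecast horizon
x.lagged <- x.raw[1:(n.obs - h), , drop = FALSE]       # h-th lags of the regressors
y.dep <- y[(h + 1):n.obs]                              # aligned dependent variable
# m <- fDMA(y = y.dep, x = x.lagged, alpha = 0.99, lambda = 0.99)
```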
The outcome of fDMA is an object of class dma, i.e., a list of:
• $y.hat: the forecasted values,
• $post.incl: the relative variable importance for all explanatory variables (also called posterior inclusion probabilities), as given by Equation (15),
• the posterior model probabilities given by Equation (2) for all k = 1, . . . , K from the last period,
• $exp.lambda: the expected values of the lambda parameter. This is meaningful only if lambda was specified as a numeric vector. Then, as the models are given different weights by Equation (2), the average value of λ varies in time.
For objects of class dma the results can also be easily presented. In particular, print.dma prints the parameters, RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error) of the estimated model. It also shows the number of observations, the number of models used in the forecast combination scheme and the number of variables (including the constant) used. The number of models does not count multiple lambdas. The function also shows forecast quality measures for the alternative forecasting methods.
The function summary.dma produces the same outcomes as print.dma. Additionally, for Dynamic Model Averaging (DMA), it shows how often (in comparison to the whole analysed period) the relative variable importance of a given explanatory variable exceeds 0.5. It also shows the minimum, maximum and mean relative variable importance of every explanatory variable throughout the analysed period.
For Dynamic Model Selection (DMS) and the Median Probability Model, it shows how often (in comparison to the whole analysed period) a given variable is present in the selected model. Also, plot.dma is available. Depending on the method used (DMA, DMS or Median Probability Model, and whether the dynamic Occam's window was applied), various outcomes can be graphically presented. In particular:
• actual and predicted values,
• residuals,
• the expected number of variables (including the constant),
• the relative variable importance for all explanatory variables, either all on one plot, or in separate png files saved in the current working directory and additionally joined into one big plot (also saved as a png file in the current working directory),
• the expected coefficients (including the constant), either on one plot, or in separate png files saved in the current working directory and additionally joined into one big plot (also saved as a png file in the current working directory),
• the expected value of lambda,
• posterior model probabilities, i.e., the ones given by Equation (2).
It is strongly suggested to execute graphics.off from grDevices before executing the plot commands. Of course, the user should take care to save all other plots before executing this command, as they can be lost. But if graphics.off is not executed, sometimes a legend might cover important parts of the plot.
Finally, for dma objects some other useful methods are implemented. In particular, coef extracts the expected values of the regression coefficients. fitted extracts the predicted values. predict can be used to compute predicted values based on the already estimated expected values of the regression coefficients for some arbitrary newdata, as well as predicted values based on the expected values of the regression coefficients from the last period, also for some arbitrary newdata. residuals extracts the residuals from the model. rvi extracts the relative variable importances, i.e., the posterior inclusion probabilities given by Equation (15).
Usually, in applications of DMA it was suggested to take α = 0.99 and λ = 0.99. However, these values are based on experimental grounds; there is no theory indicating which values of the forgetting factors should be taken [1]. Generally, they have some interpretation. For example, setting λ = 0.99 for monthly data means that observations from 3 months ago receive around 97% as much weight as the observation from the last month. For λ = 0.90 it would be around 73%. In other words, observations lagged i periods back are given the weight λ^i. This is somewhat similar to applying a rolling window regression with a window size of (1 − λ)^(−1). However, the forgetting factors are responsible for controlling the degree of instability in the coefficients. Rapidly changing coefficients can "catch the noise" instead of reasonably adapting to the data, i.e., they can inflate the estimation errors. Therefore, some researchers advise comparing the results based on different forgetting factors [150]. In this context, grid.DMA is useful. This function is just a wrapper for fDMA, so its arguments are used in the same way, except three of them. The resulting grid.dma object can also be easily presented with print.grid.dma, summary.grid.dma or plot.grid.dma. The first function just prints RMSE and MAE for all the estimated models. The second function produces the same information, but additionally indicates the indices of the model minimizing RMSE and of the model minimizing MAE. Finally, the third function allows one to graphically present:
• RMSE for all the estimated models,
• MAE for all the estimated models,
• the relative variable importance for all the estimated models, in separate png files in the current working directory, additionally joined into one big plot (also saved as a png file in the current working directory),
• the expected coefficients (including the constant) for all the estimated models, in separate png files in the current working directory, additionally joined into one big plot (also saved as a png file in the current working directory).
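The interpretation of λ given above can be checked numerically in base R (pure arithmetic, no package functions involved):

```r
lambda <- c(0.99, 0.90)
# weight of an observation lagged 3 periods back: lambda^3
weights.3 <- lambda^3         # about 0.970 and 0.729
# implied rolling-window size: 1 / (1 - lambda)
win.size <- 1 / (1 - lambda)  # about 100 and 10 observations
```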
Of course, as previously, for plots it is suggested to first execute graphics.off. If graphics.off is not executed before plotting a grid.dma object, sometimes a legend might cover important parts of the plot.
If any of the models comes from using multiple lambdas, then RMSE and MAE are not plotted. The same happens if length(grid.alpha) or length(grid.lambda) is less than 2.
From the above it can be clearly seen that grid.DMA can be used as a quick and easy robustness check of whether different forgetting factors lead to similar economic conclusions. Finally, it was already mentioned that the applied forecast combination schemes are based on time-varying parameters (TVP) regressions. In other words, such TVP regressions have to be computed independently from each other, and the rest of the task is just to ascribe certain time-varying weights to them. Therefore, there are two arguments in favour of including the function tvp in fDMA. The first is that these time-varying estimations must be done inside the DMA scheme anyway, and as a separate function they can also be useful for other purposes. The second is that with tvp as a separate function, fDMA can be written in a more elegant and readable way. In order to improve the computation speed this function is written partially in C++ [12].
The use of this function is the following: the arguments for this function are similar to those of fDMA. The only difference is that kappa is optional. If it is not specified, the recursive moment updating as in Equation (7) is performed. If kappa is specified, then EWMA is performed as explained in Equation (20). The argument c is optional. This logical argument indicates whether a constant term is included. If not specified, then c = TRUE is used, i.e., the constant term is included. In particular, it is not possible to set c = FALSE if ncol(x) = 0. In such a case the function automatically resets to c = TRUE inside the code. It is shown later in this paper when such a case, i.e., ncol(x) = 0, can emerge.
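The core of tvp is a Kalman-filter recursion with a forgetting factor. A minimal base-R sketch of this idea is given below; the function name tvp.sketch, the diffuse initialisation and the constant observational variance V are illustrative assumptions, while the actual package code also updates V and is partly written in C++:

```r
# Illustrative sketch of a forgetting-factor TVP regression; not the package code.
tvp.sketch <- function(y, x, lambda = 0.99, V = 1) {
  x <- cbind(1, x)                    # include the constant term (c = TRUE)
  k <- ncol(x)
  theta <- rep(0, k)                  # prior mean of the coefficients
  P <- diag(k)                        # prior covariance of the coefficients
  y.hat <- numeric(length(y))
  for (t in seq_along(y)) {
    xt <- x[t, ]
    R <- P / lambda                   # forgetting: inflate the state covariance
    y.hat[t] <- sum(xt * theta)       # one-step-ahead prediction
    e <- y[t] - y.hat[t]
    S <- V + drop(t(xt) %*% R %*% xt) # predictive variance
    K <- (R %*% xt) / S               # Kalman gain
    theta <- theta + drop(K) * e      # update the coefficients
    P <- R - K %*% t(xt) %*% R        # update the covariance
  }
  y.hat
}

set.seed(2)
x <- matrix(rnorm(100), ncol = 2)
y <- drop(1 + x %*% c(0.5, -0.3) + rnorm(50, sd = 0.1))
fit <- tvp.sketch(y, x)
```

Dividing the covariance by λ at each step is what makes older observations gradually lose influence, in line with the λ^i weighting discussed earlier.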
The outcomes of the tvp function form an object of class tvp. These outcomes can be easily presented with the help of the functions print.tvp, summary.tvp and plot.tvp. The first function prints the mean regression coefficients over the analysed period, as well as RMSE and MAE of the estimated model. The second function does exactly the same. The third function allows one to graphically present:
• actual and predicted values,
• residuals,
• the regression coefficients, either on one plot, or in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory).
The function grid.tvp estimates tvp models over a grid of different values of lambda. Its arguments are similar to those of tvp. However, grid.lambda is a numeric vector of different values of lambda. The optional parallel.grid is a logical indicating whether parallel computations should be used. By default parallel.grid = FALSE is used.
The outcomes form an object of class grid.tvp, i.e., a list of:
• $models: a list of the estimated tvp objects,
• $fq: a matrix with RMSE and MAE of all the estimated models.
As previously, the outcomes can be easily presented with the print.grid.tvp, summary.grid.tvp and plot.grid.tvp functions. The first one prints RMSE and MAE for all the estimated models. The second one additionally indicates the model minimizing RMSE and the model minimizing MAE. The last one allows one to graphically present:
• RMSE for all the estimated models,
• MAE for all the estimated models,
• the coefficients (including the constant) for all the estimated models, both in separate png files in the current working directory and collected into one big plot (also saved as a png file in the current working directory).

Information-Theoretic Averaging
Although the particular Bayesian model averaging scheme is the main topic of this package, information-theoretic averaging is also implemented herein. The motivation is to provide a tool which contains some competitive approach, for example, as a set of benchmark forecasts. With the package described herein, the presentation of the outcomes of information-theoretic averaging can be done in a very similar way as for DMA, DMS and the Median Probability Model. And, indeed, such a comparison can be interesting for researchers [80].
Of course, some other packages are designed especially for information-theoretic averaging [19]. The aim of fDMA is in no sense to compete with them, but just to provide a simple and easy tool for comparing DMA with the conventional method. Therefore, information-theoretic models are implemented only in very basic versions, as a kind of add-on to the fDMA package.
First of all, in the case of model averaging the Bayesian approach dominates in the literature. The conventional (i.e., frequentist) approach is more popular in the context of model averaging in biological, ecological and similar sciences, rather than in economics or finance. It should be noticed that this approach is based on completely different assumptions than the Bayesian one. For example, using all possible models is not considered a good practice here.
First, as it is a conventional (i.e., non-Bayesian) method, the user has to consider certain limitations on the number of observations. In other words, the sample should be large enough to provide correct estimations of the parameters. Secondly, including models without a theoretical underpinning in the averaging procedure is widely criticized. In other words, in the information-theoretic approach the user should not relocate the uncertainty about the correct model to the model averaging procedure. He or she should rather first examine the models thoroughly, in the context of the underlying (for example, economic) theory, and then select possibly few models for averaging [77,78].
The most popular scheme in information-theoretic averaging is to ascribe weights based on the Akaike Information Criterion (AIC). This comes from the fact that the difference in AIC between two competing models can be interpreted as an estimate of the difference in their Kullback-Leibler distances, and the Kullback-Leibler distance measures how much information is lost when one distribution is approximated by another [80,177].
Suppose there are k = 1, . . . , K competing models, each of them characterized by some AIC_k. Denote the minimum by min_{k=1,...,K} {AIC_k}, and consider Ψ^AIC_k = AIC_k − min_{k=1,...,K} {AIC_k}. The weights for information-theoretic model averaging [77] are then given by Equation (22). The numerator can be interpreted as the relative likelihood of the k-th model, and the denominator is used for the normalisation. Sometimes the Akaike Information Criterion with a correction (AICc) is suggested instead [77], for instance if the ratio of the number of observations (n) to the number of model parameters (l) is less than 40. Then the weights are given by Equation (23), with Ψ^AICc_k = AICc_k − min_{k=1,...,K} {AICc_k}. The distinction between AIC and AICc is given by AICc = AIC + 2l(l + 1)/(n − l − 1). Sometimes the Bayesian Information Criterion (BIC) is also used [77]. Then the weights are given by Equation (24), with Ψ^BIC_k = BIC_k − min_{k=1,...,K} {BIC_k}. Quite often, in certain applications, equal weights lead to a well-performing forecast in the sense of accuracy, despite many more sophisticated attempts [178]. In other words, it is sometimes worth considering the weights given by Equation (25), with K being the number of the averaged models. Finally, the inverse of the Mean Squared Error (MSE) can be used for a reasonable model averaging [178]. Then the weights are given by Equation (26). The function which allows one to perform these model averaging schemes settled in the conventional approach is altf2. As this implementation was done in order to have some benchmark (or alternative) forecasts, this function automatically generates forecast quality measures such as ME (Mean Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MPE (Mean Percentage Error) and MAPE (Mean Absolute Percentage Error). These measures are computed with the help of forecast, so the detailed formulas can be found in the description of that package [20].
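The weighting schemes of Equations (22)-(26) reduce to a few lines of base R; the numbers below (AIC values, sample size, MSEs) are purely illustrative:

```r
aic <- c(100.0, 101.2, 104.5)                 # hypothetical AIC values of 3 models
psi <- aic - min(aic)                         # differences to the best model
w.aic <- exp(-psi / 2) / sum(exp(-psi / 2))   # Akaike weights, Equation (22)

l <- 3; n <- 60
aicc <- aic + 2 * l * (l + 1) / (n - l - 1)   # AICc correction of each AIC

w.eq <- rep(1 / length(aic), length(aic))     # equal weights, Equation (25)

mse <- c(0.8, 1.1, 1.5)                       # hypothetical MSEs of the models
w.mse <- (1 / mse) / sum(1 / mse)             # inverse-MSE weights, Equation (26)
```

BIC weights (Equation (24)) follow the same pattern as the Akaike weights, with BIC differences in place of Ψ^AIC.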
The other computed forecast quality measure, HR (Hit Ratio), is obtained with the function hit.ratio, described later in this paper.
altf2(y, x, mods.incl = NULL, gprob = NULL, omega = NULL, av = NULL, window = NULL, initial.period = NULL, d = NULL, f = NULL, fmod = NULL, parallel = NULL)
The arguments for this function are the following:
• y should be a numeric or a column matrix of the dependent variable,
• x should be a matrix of the explanatory variables, in which different columns correspond to different explanatory variables and rows enumerate the observations,
• mods.incl is optional. This matrix indicates which models, constructed out of the explanatory variables given by x, should be used in the averaging scheme. If not specified, all possible models are taken. This argument is used similarly as in the already described fDMA function,
• gprob is optional. This is a matrix of Google probabilities, columnwise corresponding to the explanatory variables given by x, similar to the one in the fDMA function,
• omega is optional. This numeric corresponds to ω in a suitably modified Equation (18). In other words, let w_k be the weight given by one of Equations (22)-(26). Then, w_k can be replaced by its modified version, where p_k is computed analogously as in Equation (17),
• av is optional. This parameter indicates the method of model averaging.
- av = "ord" corresponds to equal weights for each model, i.e., weights are computed from Equation (25),
- av = "aic" corresponds to information-theoretic model averaging based on the Akaike Information Criterion (AIC), i.e., weights are computed from Equation (22),
- av = "aicc" corresponds to information-theoretic model averaging based on the Akaike Information Criterion with a correction for finite sample sizes [77], i.e., AICc; in other words, weights are computed from Equation (23),
- av = "bic" corresponds to information-theoretic model averaging based on the Bayesian Information Criterion (BIC), i.e., weights are computed from Equation (24),
- av = "mse" corresponds to weights proportional to the inverse of the models' Mean Squared Error (MSE), i.e., weights are computed from Equation (26),
• fmod is optional. This represents an object of class dma. Then a previously estimated DMA, DMS or Median Probability Model can be quickly compared with the alternative forecasts in the sense of forecast quality measures,
• parallel is optional. This logical indicates whether parallel computations should be used. By default parallel = FALSE is used.
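As a small helper sketch, a mods.incl matrix enumerating all models with a constant term can be built with base R (here for 3 hypothetical explanatory variables; the variable name mods.incl simply mirrors the argument name):

```r
k <- 3                                               # number of explanatory variables
combos <- as.matrix(expand.grid(rep(list(0:1), k)))  # all 0/1 inclusion patterns
mods.incl <- cbind(1, combos)                        # first column: constant, always 1
# nrow(mods.incl) equals 2^3 = 8, i.e., all possible models with a constant
```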
All weights, except the ones defined by av = "ord" and the case when the av. OLS method is chosen, are recursively updated. In other words, in the initial period equal weights are ascribed to the models. Then, they are successively updated based on the chosen criterion.
If gprob is used, then for av. OLS the mean values of Google searches from the in-sample period are taken; for av. rec. OLS, the mean values from the periods up to the current one; for av. roll. OLS, the mean values from the last window periods; and for av. TVP, the values from the current period. In particular, it should be noticed that the weights are computed based on past information, whereas the gprob matrix is not lagged one period back automatically inside the function.
The outcomes form an object of class altf2. The recursive regressions and rolling window regressions are based on the rec.reg and roll.reg functions described later in this paper.
The outcomes can be easily and quickly presented with the print.altf2, summary.altf2 and plot.altf2 functions. The first function prints the forecast quality measures. The second additionally provides:
• the mean values of the coefficients,
• the minimum, maximum and mean relative variable importance reached during the analysed time period for each explanatory variable,
• the frequency with which the relative variable importance is over 0.5 for each explanatory variable,
• how often the p-values (averaged over the used models) of the t-test of statistical significance for each explanatory variable are below 1%, 5% and 10%, respectively.
The third one allows one to graphically present:
• the expected coefficients, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the p-values (averaged over the used models) of the t-test of statistical significance for the regression coefficients of the applied models, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the weights of all the models used in the averaging scheme,
• the relative variable importance, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the expected number of variables (including the constant) from all the models used in the averaging scheme.

Alternative Forecasts
It was already mentioned that, in order to make this package an easy tool for practitioners, some methods other than DMA, DMS or the Median Probability Model were also implemented. The aim was to provide a few other, in some sense similar or competitive, methods of model averaging. The user is then able to quickly compare DMA, DMS or the Median Probability Model with some other forecast. Moreover, putting them into one package allows the outcomes to be visualised in a similar fashion, which might greatly ease the work.
The function altf computes a few basic and common models. In particular:
• the naive forecast (naive), i.e., all forecasts are set to the value of the last observation,
• Auto ARIMA, i.e., auto.arima from the package forecast of Hyndman and Khandakar [20].
HR (Hit Ratio) is computed by the later described hit.ratio.

ME (Mean Error), RMSE (Root Mean Squared Error), MAE (Mean Absolute Error), MPE (Mean Percentage Error) and MAPE (Mean Absolute Percentage Error) are computed by accuracy from forecast by Hyndman and Khandakar [20].
altf(y, x, window = NULL, initial.period = NULL, d = NULL, f = NULL, fmod = NULL, c = NULL)
Using this function is similar to using the already described function altf2. The arguments for this function are the following:
• y should be a numeric or a column matrix of the dependent variable,
• x should be a matrix of the explanatory variables, where different columns correspond to different explanatory variables,
• window is optional. This numeric represents the size of the rolling regression window (a number of observations). If not specified, then 10% of all observations are taken. For details, please see the description of the function roll.reg further in this paper,
• initial.period is optional. This numeric represents the number of the observation since which the forecast quality measures are computed. If not specified, the whole sample is used, i.e., initial.period = 1 is taken. This argument also divides the sample into the in-sample and out-of-sample periods for the non-recursive methods (such as OLS),
• f is optional. This logical vector indicates which of the alternative forecasts should be computed. If not specified, then f = c(rep(TRUE, 10)) is taken, i.e., all the alternative forecasts are computed,
• fmod is optional. This should be an object of class dma, i.e., a model which will be compared with the alternative forecasts,
• c is optional. This logical indicates whether a constant term should be included in the models.
If not specified c = TRUE is used, i.e., constant term is included in the estimated models.
The outcomes form an object of class altf. These outcomes can be easily visualised with the print.altf, summary.altf and plot.altf functions. The first function prints the computed forecast quality measures. The second additionally provides the mean values of the coefficients and how often the p-values of the t-test of statistical significance for each explanatory variable in the model are below 1%, 5% and 10%, respectively. The third allows one to graphically present:
• the regression coefficients, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the p-values of the t-test of statistical significance for the regression coefficients of the applied models, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory).
The next function allowing one to quickly estimate some alternative forecasts is altf3. This function estimates a rolling regression averaged over different window sizes. Indeed, sometimes there is uncertainty about the window size for the rolling regression, and averaging over different models can be a reasonable approach [179]. The arguments for this function are the following:
• y should be a numeric or a column matrix representing the dependent variable,
• x is optional. This matrix represents the explanatory variables. Different columns should correspond to different explanatory variables. However, if it is not specified, then only the constant term is included,
• windows should be a numeric vector representing the sizes of the rolling regression windows (i.e., the numbers of observations),
• av is optional. It indicates the method of model averaging. In particular:
- av = "ord" corresponds to equal weights for each model,
- av = "aic" corresponds to information-theoretic model averaging based on the Akaike Information Criterion (AIC),
- av = "aicc" corresponds to information-theoretic model averaging based on the Akaike Information Criterion with a correction for finite sample sizes (AICc),
- av = "bic" corresponds to information-theoretic model averaging based on the Bayesian Information Criterion (BIC),
- av = "mse" corresponds to setting the weights proportional to the inverse of the models' Mean Squared Error (MSE),
- if av is numeric, then the weights are computed proportionally to the av-th power of the window size. In particular, let there be K models considered with the windows win_1, . . . , win_K. Then, the weight of the k-th model is proportional to (win_k)^av, normalised over all K models. The exact formulas for the other av methods are given by Equations (22)-(26),
• c is optional. This logical indicates whether a constant term should be included in the models. If not specified, c = TRUE is used, i.e., the constant term is included in the estimated models. Of course, it is not possible to set simultaneously x = NULL and c = FALSE, as such settings would be automatically turned into c = TRUE inside the function.
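For a numeric av, the window-power weighting can be sketched in base R as follows (the window sizes below are illustrative):

```r
windows <- c(12, 24, 36)           # hypothetical rolling window sizes
av <- 2                            # weights proportional to the av-th power
w <- windows^av / sum(windows^av)  # normalised over all models
# for av > 0 larger windows receive larger weights
```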
Of course, for each av method, in the initial period equal weights are taken for each model, and then they are successively updated based on the chosen criterion.
The outcomes form an object of class altf3. These outcomes can be easily visualised with the print.altf3, summary.altf3 and plot.altf3 functions. The first function prints the computed forecast quality measures. The second additionally provides the mean values of the coefficients and how often the p-values (averaged over the selected window sizes) of the t-test of statistical significance for each explanatory variable in the model are below 1%, 5% and 10%, respectively. The third allows one to graphically present:
• the expected coefficients, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the p-values (averaged over the selected window sizes) of the t-test of statistical significance for the coefficients of the rolling regressions, in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• the weights of all the models used in the averaging,
• the expected window size, i.e., the weighted average of the window sizes.
Finally, the function altf4 computes the selected forecast quality measures for time-varying parameters rolling regressions averaged over different window sizes. The ascribing of weights for the averaging is performed by the method of Raftery et al. [1], i.e., as in Equations (2) and (5). The difference between this method and DMA is that the state space of the models is constructed not by choosing different combinations of explanatory variables; instead, for a fixed set of explanatory variables, various rolling window sizes are chosen, and the models constructed in such a way constitute the state space [79,179]. In other words, the function tvp is performed for the full set of explanatory variables, but with different window sizes. The models obtained in such a way are then recursively weighted with Equations (2) and (5).
altf4(y, x, windows, V = NULL, alpha = NULL, lambda = NULL, initial.period = NULL, d = NULL, fmod = NULL, parallel = NULL, c = NULL, small.c = NULL)
The arguments for this function are the following:
• y should be a numeric or a column matrix of the dependent variable,
• x should be a matrix of the explanatory variables, and different columns should correspond to different explanatory variables,
• windows should be a numeric vector. It indicates the sizes of the rolling regression windows (i.e., the numbers of observations),
• V is optional. This numeric represents the value of the parameter V in the tvp function (taken for the rolling regression case). If not specified, V = 1 is taken,
• lambda is optional. This numeric represents the forgetting factor used in the tvp function. If not specified, lambda = 0.99 is taken,
• alpha is optional. This numeric represents the forgetting factor α in Equation (2).
In the above examples, in a1 models without a constant term are considered. In a2 models with the constant term only are considered. a2 produces the same result as a22; in a22 the parameter c is automatically switched to c = TRUE inside the function altf4.

Forecast Comparison
In any research it is necessary to compare the obtained outcomes (predictions) with some benchmark ones. In other words, a researcher should not only present the outcomes based on his or her own methodology, but also discuss whether the presented method is beneficial in comparison with some other (already known and used) methods. Therefore, in some sense, it is necessary to compare the predictions from the studied model with some alternative (benchmark) ones. As already discussed in this paper, thanks to the package forecast [20], comparisons based on a few commonly used measures, such as ME, RMSE, MAE, MPE and MAPE, are easily available.
However, in certain applications an additional measure is interesting to consider. In particular, the Hit Ratio (HR) analyses just whether the forecast can predict the direction of a change in the modelled time-series. In other words, this measure describes the proportion of correctly predicted movements (i.e., how often the direction of a change given by the forecast agrees with the real observed change in the data). From an investor's perspective such a measure is interesting. Although it does not allow one to measure his or her gains or losses exactly, it allows one to assess whether, for example, a decision about buying or selling was right or wrong [150].
This function is implemented as hit.ratio.
hit.ratio(y, y.hat, d = NULL)
The arguments for this function are:
• y should be a numeric, a vector, a one-row or one-column matrix, or an xts object, representing the dependent (forecasted) time-series,
• y.hat should be a numeric, a vector, a one-row or one-column matrix, or an xts object, representing the forecast predictions,
• d is optional. This logical should be set to d = FALSE if the dependent time-series is in levels and to d = TRUE if the dependent time-series already represents changes (i.e., differences). By default d = FALSE is used.
It is clear from the definition of Hit Ratio that the above argument d is crucial for the correct computations.
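A minimal sketch of such a computation for level data (d = FALSE) is given below. The function name hit.ratio.sketch and the exact convention of comparing the two directions of change are illustrative assumptions, and the package's hit.ratio may use a slightly different convention:

```r
# Share of periods in which the predicted and realised directions of change agree.
hit.ratio.sketch <- function(y, y.hat) {
  mean(sign(diff(y)) == sign(diff(y.hat)))
}

y     <- c(1.0, 2.0, 1.5, 1.8, 1.2)   # hypothetical realised series (levels)
y.hat <- c(1.1, 1.9, 1.6, 1.7, 1.3)   # hypothetical forecasts
hr <- hit.ratio.sketch(y, y.hat)      # all four directions of change agree here
```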
The outcome is a numeric. Another aspect of forecast comparison is to have some statistical test which could differentiate forecast qualities. A well-known and commonly used test for comparing forecasts is the Diebold-Mariano test [180]. The great advantage of this test is that it is based on relatively weak assumptions; therefore, it can be applied in relatively many situations. One version of this test is already implemented in the forecast package of Hyndman and Khandakar [20]. Indeed, dmtest, mdmtest and hmdmtest are wrappers for dm.test from the forecast package.
In short, if ε_{i,1}, . . . , ε_{i,T} and ε_{j,1}, . . . , ε_{j,T} are forecast errors from two alternative forecasting methods, then the quality of each forecast can be evaluated by some loss function g. The null hypothesis is that the two methods have the same forecast accuracy, i.e., that E(g(ε_{i,t}) − g(ε_{j,t})) = 0 for all t.
All implemented functions assume that one-step-ahead forecasts are compared and that the second power is used in the loss function. Moreover, it should be noted that "the Diebold-Mariano (DM) test was intended for comparing forecasts; it has been, and remains, useful in that regard. The DM test was not intended for comparing models" [181].
The function dmtest computes the original Diebold-Mariano test [180]. The function mdmtest computes the modified Diebold-Mariano test; the modification is useful for small samples [182]. The function hmdmtest computes another modification of the Diebold-Mariano test. This modification is useful if the presence of ARCH effects is suspected in the forecast errors; it is also suitable for small samples [183].
In this package three versions of this test are implemented, each as a separate function. Moreover, the outcomes are presented as a matrix. The purpose of this choice is that the user can easily perform the test for a whole set of forecasts with one command.
dmtest(y, f)
mdmtest(y, f)
hmdmtest(y, f)

The arguments for all three functions are the same, i.e.,
• y should be a vector of the dependent (forecasted) time-series,
• f should be a matrix of the predicted values from various models. The forecasts should be ordered by rows. The first row should correspond to the method that is compared with the alternative ones (corresponding to the subsequent rows).
The outcome is a matrix, in which the first column contains the test statistics and the next columns contain p-values corresponding, respectively, to the alternative hypothesis that the alternative forecasts have different accuracy than the compared forecast, that the alternative forecasts are less accurate than the compared forecast, and that the alternative forecasts are more accurate than the compared forecast.
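The underlying statistic can be illustrated with a minimal Python sketch (not the implementation used by the package). For one-step-ahead forecasts only the lag-0 autocovariance enters the long-run variance, and a standard normal approximation is assumed here for the p-value:

```python
import math

def dm_statistic(e1, e2, power=2):
    """Sketch of the Diebold-Mariano statistic for one-step-ahead
    forecasts.  e1, e2: forecast errors of the two competing methods;
    loss function g(e) = |e|^power."""
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    T = len(d)
    mean_d = sum(d) / T
    # long-run variance: for one-step-ahead forecasts only the lag-0
    # autocovariance of the loss differential is used
    var_d = sum((x - mean_d) ** 2 for x in d) / T
    stat = mean_d / math.sqrt(var_d / T)
    # two-sided p-value from the standard normal distribution
    p_two = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return stat, p_two
```

A positive statistic indicates that the first method has the larger average loss; one-sided p-values follow from the same normal approximation.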
In the above example, assuming a 5% significance level, the null hypothesis that the two methods have the same forecast accuracy can be rejected for the second and the third alternative forecasting methods, and the alternative hypothesis that these methods have different accuracy than DMA can be assumed. In these two cases the alternative hypothesis that the alternative forecasts (i.e., av. rec. OLS and av. roll. OLS) are more accurate than the DMA forecast can also be assumed.

Make Work Easier
In the fDMA package a few additional functions were also implemented, which can be of general use or interest. The motivation behind this decision was to include these important tools in the package, so the user has them at hand with no need to search for them in other packages or to write such functions on his or her own.
The function descstat is simply a wrapper of the function describe from the package psych by Revelle [21]. It computes selected descriptive statistics which are quite useful to inspect before performing Dynamic Model Averaging.

descstat(data)
The argument for this function, data, should be a matrix. The observations should be put in rows, and variables should be grouped by columns. If the argument is not a matrix, the function tries to convert the object into a matrix. For example, it works smoothly for xts objects.
The outcomes are a matrix with, among other statistics:
• skewness, i.e., (1/n) Σ_{i=1}^{n} (x_i − x̄)^3 / s^3,
• kurtosis, i.e., (1/n) Σ_{i=1}^{n} (x_i − x̄)^4 / s^4 − 3.
In the above formulas x̄ denotes the mean of the sample, n the number of observations in the sample, and s the sample standard deviation. The sample is given by x = (x_1, . . . , x_n). Skewness is computed with a natural method of moments estimator, i.e., as the sample third central moment divided by the third power of the sample standard deviation. Kurtosis is computed as the sample fourth central moment divided by the fourth power of the sample standard deviation, less 3. Their properties are discussed by Joanes and Gill [184].

It is a very common situation that one needs all variables to have mean 0 and standard deviation 1. The function standardize rescales the values in such a way, i.e., standardizes them.
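The moment estimators above can be sketched in a few lines of Python (an illustration only; whether the standard deviation uses the divisor n or n − 1 is one of the variants discussed by Joanes and Gill, and n is assumed here):

```python
def skew_kurt(x):
    """Sketch of the method-of-moments skewness and (excess) kurtosis:
    third central moment over s^3, fourth central moment over s^4 less 3.
    Population divisor n is assumed throughout."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n   # sample variance (s^2)
    m3 = sum((v - mean) ** 3 for v in x) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3
    return skewness, kurtosis
```

For a symmetric sample the skewness is 0, and the "less 3" convention makes the kurtosis of a normal population equal to 0.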

standardize(data)
The argument for this function, data, should be a matrix. The observations should be put in rows, and variables should be grouped by columns. If the argument is not a matrix, the function tries to convert the object into a matrix. For example, it works smoothly for xts objects.
The outcomes are a matrix of the standardized data. The structure of argument data is kept, i.e., the observations are in rows, and variables are grouped by columns.
In particular, let x = (x_1, . . . , x_n) be the sample one wants to standardize. Then, the standardized sample is x̃ = (x̃_1, . . . , x̃_n), where x̃_i = (x_i − μ(x)) / σ(x) for all i = 1, . . . , n. Here μ(x) denotes the mean of the sample x and σ(x) the standard deviation of the sample x.
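The formula above amounts to the following one-variable sketch in Python (illustrative only; the population divisor n in the standard deviation is an assumption):

```python
def standardize(x):
    """Sketch of standardization of one variable: subtract the sample
    mean, divide by the sample standard deviation."""
    n = len(x)
    mu = sum(x) / n
    sigma = (sum((v - mu) ** 2 for v in x) / n) ** 0.5
    return [(v - mu) / sigma for v in x]
```

The resulting sample has mean 0 and standard deviation 1 by construction.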

R> s <- standardize(crudeoil)
It was mentioned that in order to estimate DMA some initial parameters have to be set by the user. One of these parameters is initvar in the function fDMA, which represents the initial values of the variances in the state space equation, V_0^(k). As many models have to be estimated, these values have to correspond in some sense to the explanatory variables used in these models. Indeed, initvar represents in a certain sense the assumed degree of instability of the parameters in the models, i.e., how high their volatility is assumed to be by the user. The problem becomes complicated if the explanatory variables are of significantly different magnitudes; on the other hand, it is simple if their magnitudes are similar. For example, if all explanatory variables are between 0 and 1, then setting V_0^(k) = 1 is a reasonable choice. Therefore, it can be desirable in certain cases, for computational reasons, to rescale the variables to lie between 0 and 1, i.e., to normalize them. The function normalize does this.

normalize(data)
The argument for this function, data, should be a matrix. The observations should be put in rows, and variables should be grouped by columns. If the argument is not a matrix, the function tries to convert the object into a matrix. For example, it works smoothly for xts objects.
The outcomes are a matrix of the normalized data. The structure of argument data is kept, i.e., the observations are in rows, and variables are grouped by columns.

R> n <- normalize(crudeoil)
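The rescaling to the [0, 1] interval can be sketched as follows (an illustrative Python version for a single variable, not the package's R code; min-max rescaling is assumed):

```python
def normalize(x):
    """Sketch of min-max rescaling of one variable to the [0, 1]
    interval: (x_i - min) / (max - min)."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]
```

After this transformation the smallest observation maps to 0 and the largest to 1, so a common initvar such as 1 is comparable across variables.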
On the other hand, sometimes normalization by rows of a matrix is desired. For example, Google Trends data are given as numbers between 0 and 100. If the user divides them by 100, they can be interpreted in a certain sense as probabilities [170]. However, if there are such probabilities for several variables, sometimes it might be desirable that these probabilities for all variables sum up to 1. The function gNormalize does not divide the values of an argument by 100, but rescales every row of the argument to sum up to 1. In other words, values in each row of the argument are divided by the sum of all values in this row.

gNormalize(data)
The argument for this function, data, should be a matrix. The observations should be put in rows, and variables should be grouped by columns.
The outcomes are a matrix.

R> gt <- gNormalize(trends)
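The row-wise rescaling can be sketched as follows (a Python illustration of the idea, not the package's R code):

```python
def g_normalize(rows):
    """Sketch of row-wise rescaling: each value is divided by its row
    sum, so every row sums to 1 (as with the Google Trends
    'probabilities' described above)."""
    return [[v / sum(row) for v in row] for row in rows]
```

For example, the row (0, 1, 2) is rescaled to (0, 1/3, 2/3).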
R> gNormalize(rbind(c(0, 1, 2), c(1, 2, 3)))

Most R packages offering Engle's ARCH test require the argument to be of a specific class, which might be a problem for the user. On the other hand, the procedure of checking for ARCH effects is very common and can also be useful during DMA research. The function archtest computes Engle's ARCH test [185]. The null hypothesis of this Lagrange Multiplier test is that the series of residuals exhibits no ARCH effects. The alternative hypothesis is that ARCH(lag) effects are present. The argument lag is specified by the user.
In particular, let y_t be the time-series, t = 1, . . . , n, in which ARCH effects of lag p are suspected. The test procedure is the following. First, the linear regression y_t = a + ε_t is estimated, and the residuals ε_t are derived. Next, the auxiliary linear regression ε_t^2 = α_0 + α_1 ε_{t−1}^2 + . . . + α_p ε_{t−p}^2 + u_t is estimated. The test statistic is R^2 · (n − p), where R^2 is the R-squared coefficient of this auxiliary regression. This statistic has the χ^2 distribution with p degrees of freedom.

archtest(ts, lag = NULL)
The arguments for this function are the following:
• ts should be a vector representing the tested time-series,
• lag is optional. This numeric represents the suspected order of the ARCH process, i.e., p from the above considerations. If not specified, then lag = 1 is taken.
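The test procedure can be sketched in Python for the simplest case p = 1 (an illustration only, not the package's R code; the residuals are assumed to be already demeaned, and the auxiliary regression then reduces to a simple OLS of squared residuals on their first lag):

```python
def arch_lm_stat(resid, p=1):
    """Sketch of Engle's ARCH LM statistic for lag p = 1: regress the
    squared residuals on their first lag and compute R^2 * (n - p),
    to be compared with the chi^2 distribution with p df."""
    e2 = [e * e for e in resid]
    y = e2[1:]    # e_t^2
    x = e2[:-1]   # e_{t-1}^2
    n = len(e2)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sxy / sxx
    ss_res = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    return r2 * (n - p)
```

A large statistic relative to the χ^2(p) critical value leads to rejecting the null of no ARCH effects.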
The outcome is an object of class htest, i.e., a list containing, in particular, the test statistic, the lag parameter and the p-value.

Another common procedure, used in various research with time-series, is checking stationarity. However, because of Equation (7), time-series used in DMA do not have to be stationary. Actually, in this case variance updating by Equation (20) can perform better [157]. Nevertheless, DMA is not the only method implemented in this package, and stationarity checking is an often performed procedure. The function stest computes a few common stationarity tests.
In particular, this function is a wrapper for three functions from tseries package by Trapletti and Hornik [186]. Augmented Dickey-Fuller (ADF), Phillips-Perron (PP) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests for stationarity are performed. The corresponding functions from tseries package are adf.test, pp.test and kpss.test.
For the ADF test the null hypothesis is that a unit root is present in the time-series; the alternative hypothesis is that the time-series is stationary. If n is the length of the tested time-series, then the lag order in the test statistic is the number obtained by discarding the non-integer part of (n − 1)^(1/3). For the PP test the null hypothesis is that a unit root is present in the time-series; the alternative hypothesis is that the time-series is stationary. The truncation parameter for the Newey-West estimator is the number obtained by discarding the non-integer part of 4 · (n / 100)^(1/4). The Z(α) statistic is used. The PP test mainly differs from the ADF test in the way it deals with serial correlation and heteroskedasticity in the errors. Both ADF and PP tests are asymptotically equivalent.
Contrary to most unit root tests, in the KPSS test the null hypothesis is that the time-series is stationary. The alternative hypothesis is that it contains a unit root. The truncation parameter for the Newey-West estimator is the number obtained by discarding the non-integer part of 3 · √n / 13. Unfortunately, the KPSS test tends to reject the null hypothesis too often.
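The three default truncation parameters quoted above can be summarised in a small sketch (Python, for illustration; these are the defaults of the wrapped tseries functions as described in the text):

```python
import math

def default_lags(n):
    """Sketch of the default lag/truncation parameters for a series of
    length n: ADF lag order, PP Newey-West truncation, KPSS Newey-West
    truncation.  int() discards the non-integer part."""
    adf = int((n - 1) ** (1 / 3))        # trunc((n - 1)^(1/3))
    pp = int(4 * (n / 100) ** (1 / 4))   # trunc(4 * (n/100)^(1/4))
    kpss = int(3 * math.sqrt(n) / 13)    # trunc(3 * sqrt(n) / 13)
    return adf, pp, kpss
```

For example, a series of 100 observations gives lag orders 4, 4 and 2, respectively.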

stest(data)
There is only one argument for this function, i.e., data, which should be a matrix representing the variables to be tested. Different columns should correspond to different variables, and observations should correspond to rows.
The outcome is a matrix, such that the test statistics and p-values are given by columns. The test outcomes for different variables are ordered by rows.

A very popular class of models possible to construct out of a given set of variables is the one consisting of models with the constant and just one other explanatory variable, or with the constant only. The function onevar generates a matrix representing such models out of the given collection of variables, so the user can quickly generate the suitable argument mods.incl for the functions used in the fDMA package.

onevar(x)
Only one argument is used for this function: x should be a matrix of explanatory variables. The outcome is a matrix, which can serve as the argument mods.incl in various functions from the package fDMA. In particular, the inclusion of a variable is indicated by 1, and its omission by 0.
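Generating such an inclusion matrix can be sketched as follows (a Python illustration; the ordering of the rows, constant-only model first, is an assumption of this sketch):

```python
def one_var_models(k):
    """Sketch of an inclusion matrix for k explanatory variables:
    first column = constant (always included), remaining k columns =
    the variables.  One row per model: the constant-only model plus
    k one-variable models."""
    models = [[1] + [0] * k]          # constant-only model
    for i in range(k):
        row = [1] + [0] * k
        row[1 + i] = 1                # include variable i only
        models.append(row)
    return models
```

For two variables this yields three rows: (1,0,0), (1,1,0) and (1,0,1).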
Recursive regression is implemented as the function rec.reg. This function is simply based on lm from the package stats.
rec.reg(y, x = NULL, c = NULL)

The arguments for this function are the following:
• y should be a numeric or a column matrix representing the dependent variable,
• x is optional. This matrix represents the explanatory variables. Different columns should correspond to different variables. If not specified, then only the constant term is used in the regression,
• c is optional. This logical indicates whether the constant term should be included in the regression equation. If not specified, c = TRUE is used, i.e., the constant term is included.
It is not possible to set c = FALSE if x = NULL. In such a case the function will automatically reset to c = TRUE inside the code.
The outcome is an object of class reg. It might happen during the computations that the lm function (used inside rec.reg) will produce NA or NaN. In such a case the regression coefficients for the given period are taken as 0, and the p-values of the t-test for statistical significance of the regression coefficients are taken as 1.
roll.reg(y, x = NULL, window, c = NULL)

The arguments y, x and c are the same as for the already described function rec.reg. The argument window should be a numeric indicating the size of the rolling window. In particular, for the first window − 1 observations recursive regression is computed; from the window-th observation onwards the exact rolling is performed.
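The rolling scheme just described can be sketched for a single regressor (a Python illustration, not the package's R code; only the slope estimate is tracked here):

```python
def roll_reg_coefs(y, x, window):
    """Sketch of the rolling scheme: for the first window - 1
    observations the sample expands recursively; from the window-th
    observation a fixed-size window is used.  Returns the OLS slope
    estimate at each step (single regressor, constant included)."""
    def ols_slope(ys, xs):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((a - mx) ** 2 for a in xs)
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        return sxy / sxx
    slopes = []
    for t in range(2, len(y) + 1):        # need at least 2 observations
        start = max(0, t - window)        # 0 during the recursive phase
        slopes.append(ols_slope(y[start:t], x[start:t]))
    return slopes
```

With a constant linear relation the slope estimate is the same in every window, which makes the scheme easy to verify.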
The outcome is the already described reg object. Outcomes from an object of class reg can be easily presented with the help of the functions print.reg, summary.reg and plot.reg. The first function prints the mean regression coefficients over the analysed period, and RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error) from the estimated model. If the outcomes come from the estimation of a rolling window regression, then the size of the rolling window is also printed.
The second function additionally reports how often the p-values of the t-test of statistical significance for each explanatory variable in the model are below 1%, 5% and 10%, respectively.
The third function allows the outcomes to be presented graphically. In particular, it can plot:
• residuals,
• regression coefficients on one plot, or in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory),
• p-values of the t-test of statistical significance for the regression coefficients on one plot, or in separate png files saved in the current working directory and, moreover, pasted into one big plot (also saved as a png file in the current working directory).

Sometimes it is necessary to consider various values of the parameter window in rolling regression. The function grid.roll.reg computes a set of roll.reg functions for the given values of window. In other words, it is a wrapper of roll.reg allowing rolling regressions for various window sizes to be computed quickly.
grid.roll.reg(y, x = NULL, grid.window, parallel.grid = NULL, c = NULL)

The arguments y, x and c are the same as the arguments for the function roll.reg.

The model minimising MAE: 3

In the above example, model is the extracted model with window = 100.

The computational issues with Dynamic Model Averaging are not trivial. As a result, in the case of a large number of models considered, the outcomes of the fDMA or grid.DMA functions can be too big. If the information about each of the sub-models is not so important, then the function reduce.size can be useful.

reduce.size(dma.object)

The argument dma.object should be a dma or grid.dma object. The outcome is a dma or grid.dma object with the information corresponding to each sub-model erased.

R> ld.wti <- (diff(log(wti)))[-1, ]
R> ld.drivers <- (diff(log(drivers)))[-1, ]
R> m1 <- fDMA(y = ld.wti, x = ld.drivers, alpha = 0.99, lambda = 0.99, initvar = 1)
R> m2 <- reduce.size(m1)

An Example: Oil Market
The example provided below is just for the purpose of familiarizing the user with the methods used in this package. First of all, it is not a copy of any real research, as such studies were already cited. Secondly, it does not explore all the ways in which this package can be used. It just gives some initial insight into the main steps and into what may be worth looking at more deeply in a real research project.
At the beginning of this paper an example from the oil market was given. According to this, it can be said that there is uncertainty about which time-series can be treated as useful explanatory variables for predicting the spot oil price. The xts object crudeoil contains selected data from the oil market. These data are in a monthly frequency. They cover the period between January 1990 and December 2016. They were obtained from the World Steel Association [187], CBOE [188], EIA [189], FRED [190] and MSCI [191].
The xts object trends contains data from Google [171] about the Internet queries for selected search terms. In particular:
• trends$stock_markets represents Google Trends for "stock markets",
• trends$interest_rate represents Google Trends for "interest rate",
• trends$economic_activity represents Google Trends for "economic activity",
• trends$exchange_rate represents Google Trends for "exchange rate",
• trends$oil_production represents Google Trends for "oil production",
• trends$oil_consumption represents Google Trends for "oil consumption",
• trends$market_stress represents Google Trends for "market stress".
These data are also in a monthly frequency. They cover the period between January 2004 and December 2016, because Google Trends does not cover the earlier period.
From the economic point of view it is reasonable to consider logarithmic differences of these time-series. Except for some problem with PROD, all time-series can be assumed stationary at the 5% significance level. For WTI differences ARCH effects are also present. Therefore, it seems reasonable to consider exponentially weighted moving average (EWMA) estimation of the variances in DMA. Also, a few forgetting factors can be tested. As suggested by RiskMetrics for monthly time-series, κ = 0.97 is taken. All variances are less than 1; therefore, no rescaling of the time-series seems necessary. It also seems enough to take initvar = 1 in the estimations of DMA.

R> wti <- crudeoil

Plotting the estimated model interactively produces the following menu:

1: actual and predicted
2: residuals
3: exp. var
4: posterior inclusion probabilities - one plot
5: posterior inclusion probabilities - separate plots (files in working directory)
6: expected coefficients - one plot
7: expected coefficients - separate plots (files in working directory)
8: exp. lambda
9: posterior model probabilities

Selection:

Comparing Figures 1 and 2, it can be seen that during turbulent periods on the market the DMA quickly adapts by ascribing higher weights to models with more variables. Indeed, this agrees with Figure 3: the relative variable importance of all explanatory variables rose in this period. It can also be seen that since 2007 the role of developed stock markets increased. However, after 2013 this role began to diminish, whereas the roles of other variables started to increase. This is especially clear for the exchange rates. Figure 3 should be read in correspondence with Figure 4. Although the relative variable importance can be high, the expected value of the regression coefficient for a variable can be around 0. Indeed, high relative variable importance is observed simultaneously with non-zero expected regression coefficients for MSCI, CSP and TWEXM. So, this analysis confirms that these three factors were playing an important predictive role for the oil price between 2007 and 2013. Around 2013, the most important role was played by developed stock markets; since 2013, their role diminished and was taken over by the exchange rates.

Finally, it can be suspected that there is some model outperforming, in some sense, the other ones; in other words, that model selection would be preferred over model averaging. This can be checked by analysing DMS and the Median Probability Model. However, from Figure 5 it is definitely clear that none of the models reached a posterior probability over 0.5. Secondly, neither after 2007 nor after 2013 does any of the models seem superior.
It can also be questioned whether the applied method is robust to different parameter settings; for example, whether other forgetting factors α and λ would lead to different conclusions. Figure 6 presents the relative variable importance of all explanatory variables for all models from the object g. At the end, the selected model can be compared with some alternative forecasts.

Comparison with Other Packages
The speed comparison of the existing packages for Dynamic Model Averaging is presented in Table 2. The evaluation was based on estimating a model with 10 explanatory variables, i.e., 2^10 = 1024 models were averaged. The suitable formula was executed 5 times for every tested package. Admittedly, this is a small sample, but a lot of speed checks were done while the fDMA package was being written, and it was very often observed that the variation (dispersion) of the evaluation time is very small. Therefore, such a small sample seems enough to give the user some insight, and a larger sample does not seem necessary. The speed comparisons were done with the help of the microbenchmark package by Mersmann et al. [192]. The calculations were done on an Intel® Core™ i5-6200U CPU 2.30 GHz machine with 20 GB RAM, under Debian 9 (Stretch).
It can be seen that fDMA is faster than dma, but not than eDMA. Indeed, eDMA is extremely fast. This speed, however, comes at the price of the package being mostly written in C++. On the other hand, fDMA is easier to modify for a user familiar with R. It was already mentioned that in many studies with DMA the authors slightly modify the original method. Therefore, it seems reasonable to aim for a package which is easier to modify, even if a bit slower.
Finally, the purpose of writing fDMA was not to compete with other packages. The motivation was to provide a tool yet unavailable to users; in other words, the aim was rather to complement the existing tools. Therefore, the stress was put on implementing features not available in other packages. The author believes that fDMA will be useful for some researchers. With the fDMA package published, they have more choice of tools and are able to choose among products more tailored to their needs and various applications. A somewhat subjective comparison of the three packages is presented in Table 3.
Finally, Figure 7 presents a comparison of sequential vs. parallel computations, i.e., the execution time of the function fDMA with the argument parallel = FALSE (default) and with parallel = TRUE. It can be seen that a time gain is obtained when the number of variables is over 10, or more precisely, when the number of estimated models exceeds 1024. Of course, this number is not exact; moreover, it can depend on the hardware and software used. Nevertheless, it gives the general warning that, because of overheads, parallel computations are not always faster. Unfortunately, the gain from parallel computations is not dramatic. Anyway, this gain is present and, in the case of, for instance, the estimation of multiple models, can be highly beneficial for the user. In particular, it is often useful to turn on parallel computations in the grid.DMA function, but not in the fDMA function. (In Table 3, grey means that the direct method is not available, but the user can extract the method.)