
Sovereign Risk Indices and Bayesian Theory Averaging

Norwegian Computing Center; Gaustadalleen 23a, Kristen Nygaards Hus, 0373 Oslo, Norway
Author to whom correspondence should be addressed.
Econometrics 2020, 8(2), 22;
Received: 15 October 2019 / Revised: 22 March 2020 / Accepted: 19 May 2020 / Published: 29 May 2020
(This article belongs to the Special Issue Bayesian and Frequentist Model Averaging)


In economic applications, model averaging has found principal use in examining the validity of various theories related to observed heterogeneity in outcomes such as growth, development, and trade. Though often easy to articulate, these theories are imperfectly captured quantitatively. A number of different proxies are often collected for a given theory, and the uneven nature of this collection requires care when employing model averaging. Furthermore, if valid, these theories ought to be relevant outside of any single narrowly focused outcome equation. We propose a methodology which treats theories as represented by latent indices, with these latent processes governed by model averaging on the proxy level. To achieve generalizability of the theory index, our framework assumes a collection of outcome equations. We accommodate a flexible set of generalized additive models, enabling non-Gaussian outcomes to be included. Furthermore, selection of relevant theories also occurs on the outcome level, allowing for theories to be differentially valid. Our focus is on creating a set of theory-based indices directed at understanding a country’s potential risk of macroeconomic collapse. These Sovereign Risk Indices are calibrated across a set of different “collapse” criteria, including default on sovereign debt, heightened potential for high unemployment or inflation, and dramatic swings in foreign exchange values. The goal of this exercise is to render a portable set of country/year theory indices which can find more general use in the research community.
Keywords: Bayesian model averaging; conditional Bayes factors; sovereign debt default; macroeconomic forecasting

1. Introduction

In economic applications, Bayesian Model Averaging (BMA) has proven a useful tool to assess theories related to the potentials and risks of economic expansion; see Steel (2019) for a comprehensive review. All economic theories are in some sense qualitative, and no single empirical observation can encapsulate a theory's essence perfectly. To address this, a group of variables, self-evidently correlated, is often collected to proxy each theory. Not accounting for the uneven manner in which different variables may be available for each theory can lead to inappropriate conclusions regarding overall theory validity. Standard approaches to BMA can be modified, especially through the model prior, to account for these characteristics, but they still consider the direct effect of the collected variables on the single response in question. One example is Chen et al. (2017), who consider the determinants of the 2008 crisis. They use a hierarchical formulation that allows for a simultaneous selection of both theories and relevant variables.
We propose an entirely separate approach to testing theories, distinct both from standard BMA and from Chen et al. (2017), through a new model averaging approach. We assume each observation has a number of latent features encoding values for these theories. This requires the researcher to pre-specify which theory a given empirical observation is meant to proxy, a task which is often straightforward and frequently performed in practice. The outcome of this modeling exercise is a set of theory indices associated with each observation, as well as the model parameters necessary to derive these indices for observations not included in training. Our second innovation is to link the embedding of empirical factors to theory indices across a number of correlated outcome variables. This is driven by a motivation for theory index consistency. Ideally, an index which assesses the strength of a government's institutions should be roughly the same when using the index to predict the potential for economic growth and the susceptibility to economic collapse, for example. Indeed, an ideal encoding would allow the theory index to be trained on one set of outcome variables and be immediately useful as a standalone feature in modeling separate but related economic activity. We therefore construct a framework by which theory-level modeling occurs on a latent level and is tuned to addressing a theory's role in explaining the variability of a number of economic outcome variables simultaneously. Brock et al. (2003) recommend considering both theory uncertainty (many theories can explain a phenomenon) and variable uncertainty (which empirical proxies should be used to explain each theory). Following this recommendation, model averaging in our Bayesian Theory Averaging (BTA) approach occurs on two separate levels. On the theory level, a standard BMA formulation is used to determine which proxies for a given theory have the greatest relevance.
Our modeling is across multiple different outcome variables and a given theory may only be relevant for a subset of these outcomes. Thus, we also perform theory averaging on the outcome level, allowing theories to selectively enter into each outcome under consideration.
Outcomes in economics can be quantified in a variety of manners, and thus our framework is formulated to entertain a broader family of outcome sampling distributions than the Gaussian context to which most economic BMA applications have adhered (Steel 2019). Indeed, our framework is organized to accommodate all generalized additive models (GAMs) (see, for example, Hastie and Tibshirani (1990) or Wood (2017)) and quantile GAMs (qgams) (Fasiolo et al. 2017). Operationally, the posterior model space is explored via Markov Chain Monte Carlo (MCMC), see for example, Gamerman and Lopes (2006) or Robert and Casella (2013), and model moves are efficiently performed via Conditional Bayes Factors (CBFs) (Karl and Lenkoski 2012), which have been shown to be highly useful in related model averaging exercises (Dyrrdal et al. 2015; Lenkoski 2013).
Our motivating example concerns developing useful theory-based indices for quantifying the potential for significant negative economic outcomes in macroeconomies, which we term Sovereign Risk Indices (SRIs). These outcomes range across default on sovereign debt, the potential for high levels of inflation or unemployment, and heightened risk of instability in foreign exchange. Useful introductions to sovereign default are found in Roubini and Manasse (2005) and Savona and Vezzoli (2015). Each of these outcomes has a number of theories which explain its variability. These theories encapsulate institutional and financial characteristics of each country and overall aspects of the global economy at the time, and are proxied by a large number of potential variables. By modeling these outcomes jointly, we can construct a set of theory indices that are relevant for general research into macroeconomic extremes. Our goal is to create a broad database of SRIs that can then be made available to the general research population, where each index has a clearly defined construction and encodes a well-articulated theory regarding economic well-being. Our data combines the data in Savona and Vezzoli (2015) with new data sources, as explained in Section 3.1.
The structure of the article is as follows: Section 2 outlines BTA. The specifics of the algorithm that performs posterior inference for BTA are rather involved and are relegated to Appendix A. Section 3 contains our analysis of the data which constructs the SRIs, while Section 4 concludes.

2. Bayesian Theory Averaging

In this section we discuss our modeling framework. Our final modeling framework has multiple response variables and non-standard response likelihoods. However, the basic concepts behind BTA can be explained via a partitioning of a standard BMA problem and the addition of an intermediate random effects process. Therefore, Section 2.1 shows how standard BMA exercises can be grouped by similar variables, from which the indices that are our main focus naturally arise. Section 2.2 then develops the general joint modeling framework.

2.1. BTA and Linear Gaussian Regression

We start with a standard Gaussian regression exercise. Let $Y$ be a length-$n$ univariate response with $Y_i \in \mathbb{R}$ and let $X$ be an $n \times p$ matrix of covariates. Furthermore, let $M \subseteq \{1, \dots, p\}$ be a model over a subset of the $p$ potential covariates and $X_M$ the sub-matrix of columns associated with the model $M$. The standard BMA regression with known variance is then
$$Y = \alpha + X_M \beta_M + \epsilon, \qquad \epsilon_i \sim N(0, 1).$$
We note that fixing the variance of $\epsilon_i$ to 1 in (2) is done for expositional convenience; the general case of unknown variance is not important to the developments of Section 2.2. Under the g-prior (Zellner 1962)
$$\beta_M \sim N\left(0, \; g\, (X_M^\top X_M)^{-1}\right),$$
we have that the integrated likelihood of this model is
$$pr(M \mid Y, X) \propto |\Xi_M|^{-1/2} \exp\left( \tfrac{1}{2}\, \hat\beta_M^\top \Xi_M \hat\beta_M \right),$$
$$\Xi_M = \frac{g+1}{g} X_M^\top X_M, \qquad \hat\beta_M = \Xi_M^{-1} X_M^\top Y.$$
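To make these quantities concrete, the integrated likelihood above can be computed in a few lines. The following is a minimal sketch on simulated data (not the authors' code), with $g$ fixed at 1:

```python
import numpy as np

def log_marginal(Y, X, model, g=1.0):
    """Log integrated likelihood of model M under the g-prior,
    up to a constant shared by all models."""
    XM = X[:, sorted(model)]
    Xi = (g + 1.0) / g * XM.T @ XM            # Xi_M = ((g+1)/g) X_M' X_M
    beta_hat = np.linalg.solve(Xi, XM.T @ Y)  # beta_hat = Xi_M^{-1} X_M' Y
    _, logdet = np.linalg.slogdet(Xi)
    return -0.5 * logdet + 0.5 * beta_hat @ Xi @ beta_hat

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.standard_normal((n, p))
Y = 1.5 * X[:, 0] + rng.standard_normal(n)  # only covariate 0 matters

good = log_marginal(Y, X, {0})  # model containing the true covariate
bad = log_marginal(Y, X, {1})   # model with an irrelevant covariate
```

With the relevant covariate included the integrated likelihood is markedly larger, which is what drives posterior model probabilities toward better-fitting models.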
Now suppose that there is a natural partition of the $p$ covariates into two groups, that is, the first $p_1$ columns of $X$ belong to group 1 and the final $p_2$ columns ($p_1 + p_2 = p$) belong to group 2. Then instead of considering a single model $M \subseteq \{1, \dots, p\}$, we could imagine there is a collection $(M_1, M_2)$ of models with $M_1 \subseteq \{1, \dots, p_1\}$ and $M_2 \subseteq \{p_1 + 1, \dots, p_1 + p_2\}$. In many BMA-driven studies such a partition is natural, since various concepts are proxied by collecting several features which are meant to encapsulate a given concept quantitatively. We therefore find it natural to discuss the model $M_1$ as the “theory one” model and the model $M_2$ as the “theory two” model.
We note that at this point, the integrated likelihood $pr(M_1, M_2 \mid Y, X)$ can be evaluated jointly and efficiently by (3). However, though there is no need to do so, one could instead elect to update the models $M_1$ and $M_2$ separately.
In particular, suppose that $M_1$ and $\beta_1$ are given. Then
$$pr(M_2 \mid \beta_1, M_1, Y, X) \propto |\Xi_{M_2}|^{-1/2} \exp\left( \tfrac{1}{2}\, \hat\beta_{M_2 \mid M_1}^\top \Xi_{M_2} \hat\beta_{M_2 \mid M_1} \right),$$
$$\Xi_{M_2} = \frac{g+1}{g} X_{M_2}^\top X_{M_2}, \qquad \hat\beta_{M_2 \mid M_1} = \Xi_{M_2}^{-1} X_{M_2}^\top E_1, \qquad E_1 = Y - I_1,$$
$$I_1 = X_{M_1} \beta_1.$$
Thus, we have effectively “separated” the response $Y$ from the update of $M_2$ by replacing it with the residual $E_1$ given the theory one parameter set. This leads to the alternative representation
$$Y = \alpha + I_1 + I_2 + \epsilon.$$
Thus, again though there is no need to do so, an MCMC for the overall BMA exercise could be conducted by alternating between updating model $M_1$ and thereby $I_1$, then updating model $M_2$ and $I_2$. These two summary variables $I_1$ and $I_2$ can then be referred to as the theory one and theory two indices, respectively.
In the Bayesian paradigm it is often natural to now incorporate a notion of over-dispersion. In particular, we can imagine that while $X_{M_1} \beta_1$ represents the “mean” theory one index given the features $X_{M_1}$, a random process adds a source of randomness to this mean level. It is therefore common to replace (3) with
$$I_1 = X_1 \beta_1 + \eta_1, \qquad \eta_{1i} \sim N(0, \nu_1^{-1}),$$
where the overdispersion parameter $\nu_1$ can then be given a prior distribution, for example $\Gamma(a_1/2, b_1/2)$. A similar formulation can be made for $I_2$. In the context of econometric BMA exercises we feel such a random effects representation is eminently sensible, as it implicitly admits that the features $X_1$ can only ever be imperfect encapsulations of a theory's essence.
At this juncture, the joint marginal likelihood (3) is no longer directly applicable. However, the conditional strategy of alternating between models $M_1$ and $M_2$ using (4) can still be used, with an important modification. In particular, we note that given $\beta_2$, $I_1$ and $\nu_2$ we have
$$I_2 \mid Y, I_1, \nu_2 \sim N\left( (1 + \nu_2)^{-1} (E_1 + \nu_2 X_2 \beta_2), \; (1 + \nu_2)^{-1} \right).$$
Furthermore, given $I_2$ we may replace (5) with
$$\hat\beta_{M_2 \mid M_1} = \Xi_{M_2}^{-1} X_{M_2}^\top I_2.$$
Subsequent to the sampling of the latent factors $I_2$, we may resample the random effects precision parameter $\nu_2$ via a standard Gibbs step.
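These two conditional draws can be sketched as follows. This is an illustrative Gibbs fragment on simulated data, with $\alpha$ and $\beta_2$ treated as known and $I_1$ omitted for simplicity; it is not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X2 = rng.standard_normal((n, 2))
beta2 = np.array([1.0, -0.5])
nu2_true = 4.0
I2_true = X2 @ beta2 + rng.normal(0.0, nu2_true ** -0.5, n)
alpha = 0.3
Y = alpha + I2_true + rng.standard_normal(n)
E1 = Y - alpha   # residual given the rest of the model

a2 = b2 = 1.0    # Gamma(a2/2, b2/2) prior on the precision nu2
nu2 = 1.0
for _ in range(200):
    # Gaussian conditional draw of the latent index I_2:
    prec = 1.0 + nu2
    I2 = rng.normal((E1 + nu2 * (X2 @ beta2)) / prec, prec ** -0.5)
    # Conjugate Gibbs step for the random-effects precision nu2:
    resid = I2 - X2 @ beta2
    nu2 = rng.gamma((a2 + n) / 2.0, 2.0 / (b2 + resid @ resid))
```

After burn-in, draws of `nu2` fluctuate around the generating precision of 4, since the data inform the split between index-level and outcome-level noise.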
Indeed, we could then consider one final embellishment where
$$Y = \alpha + \gamma_1 I_1 + \gamma_2 I_2 + \epsilon,$$
with $\gamma_t \in \{0, 1\}$ and, for example, prior probability that $\gamma_t = 1$ set to $1/2$ (or any other value in $(0, 1)$). Then when $\gamma_2 = 0$ the update of $I_2$ would simply be
$$I_2 \mid Y, I_1, \nu_2, \gamma_2 = 0 \sim N(X_2 \beta_2, \nu_2^{-1}),$$
that is, a sample from the prior conditional on $\beta_2$. Updating the parameter $\gamma_2$ conditional on all other factors would then involve a straightforward Metropolis-Hastings step. While the models $M_1$ and $M_2$ indicate which variables are included in the theory one and theory two models, $\gamma_1$ and $\gamma_2$ act as wholesale inclusion parameters which dictate the overall relevance of the respective theory.
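A Metropolis-Hastings flip of such an inclusion indicator can be sketched as below; this is a toy Gaussian-outcome version with hypothetical data, not the paper's implementation:

```python
import math
import random

def gamma2_mh_step(gamma2, Y, alpha, I1, I2):
    """Propose flipping the theory-inclusion indicator gamma_2 in {0, 1};
    the proposal is symmetric and the prior is flat, so the acceptance
    ratio reduces to a likelihood ratio."""
    def loglik(g2):
        # Gaussian outcome equation with unit error variance
        return sum(-0.5 * (y - alpha - i1 - g2 * i2) ** 2
                   for y, i1, i2 in zip(Y, I1, I2))
    proposal = 1 - gamma2
    log_ratio = loglik(proposal) - loglik(gamma2)
    return proposal if math.log(random.random()) < log_ratio else gamma2

random.seed(0)
I2 = [float(i % 7 - 3) for i in range(50)]  # a hypothetical theory index
I1 = [0.0] * 50
Y = list(I2)                                # data generated with gamma_2 = 1

gamma2 = 0
for _ in range(10):
    gamma2 = gamma2_mh_step(gamma2, Y, 0.0, I1, I2)
```

Because the simulated data strongly support including $I_2$, the chain moves to `gamma2 = 1` almost immediately and stays there.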
This partitioning and random effects strategy forms the basis of our development in Section 2.2. We note that the inclusion of the random effects component has the effect of keeping model evaluations conditionally Gaussian, which enables the use of conditional Bayes factors to efficiently resample model parameters.

2.2. Multivariate BTA and Generalized Regression Models

We now generalize to the case where we have $R$ responses from a general response family. Let $Y_i$ be an $R$-dimensional response vector for observation $i$ and $D = \{Y_1, \dots, Y_n\}$ be a collection of $n$ such observations. Each variate $Y_{ir}$ in the vector $Y_i$ is assumed to belong to a general field $\mathcal{F}_r$. In this paper we consider examples where $\mathcal{F}_r$ is $\{0, 1\}$, $\mathbb{R}$ or $\mathbb{R}_+$, though others, such as $\mathbb{N}$, could easily be entertained. We associate $Y_{ir}$ with an outcome distribution as
$$Y_{ir} \sim g_r(\alpha_r, \mu_{ir}),$$
where $g_r$ is a general probability density or mass function, $\alpha_r$ is a set of global parameters and $\mu_{ir}$ is an observation-$i$-dependent mean value. We note that the assumption that only the mean parameter $\mu_{ir}$ varies according to the observation $i$ could be relaxed in future work.
The parameter $\mu_{ir}$ is then assumed to have the form
$$\mu_{ir} = \sum_{t=1}^{T} \gamma_{rt} I_{it}.$$
In the above formulation $\gamma_{rt}$ is either 0 or $\gamma_{rt} \in \mathbb{R}$. We assign a prior probability of $1/2$ to each of these two possibilities; clearly other prior probabilities could be entertained. By convention, if several $\gamma_{rt}$ are non-zero for a given index $t$, then one of these non-zero $\gamma_{rt}$ is set to 1 to avoid issues related to identification. This matter is discussed subsequently.
The variable $I_{it}$ is then referred to as the theory-$t$ index for observation $i$. We further assume that $I_{it}$ depends on a set of $p_t$ theory proxies $X_{it}$ according to the linear model
$$I_{it} = X_{it} \beta_t + \epsilon_{it},$$
where $\epsilon_{it} \sim N(0, \nu_t^{-1})$ independently. The precision term $\nu_t$ is assigned a $\Gamma(a_t/2, b_t/2)$ prior. We note that this prior is forced to adapt throughout the procedure (by adjusting the $a_t, b_t$ parameters) to control for issues of identification; we discuss this aspect below. We typically begin the inference procedure by setting $a_t$ and $b_t$ to 1.
Associated with the parameter $\beta_t$ is a model $M_t \subseteq \{1, \dots, p_t\}$ such that $\beta_{it} = 0$ when $i \notin M_t$, a standard BMA formulation. As the “null” model can be controlled by the $\gamma_{rt}$ parameter, we exclude $M_t = \emptyset$ from our consideration; see Kourtellos et al. (2019) for a motivation of this structure. Writing $\beta_{M_t}$ for the subvector of $\beta_t$ not constrained to zero, we assume
$$\beta_{M_t} \sim N\left(0, \; \nu_t^{-1} g_t (X_{M_t}^\top X_{M_t})^{-1}\right),$$
where $p_{M_t}$ is the size of model $M_t$, independently across $t$. As with the prior parameters $a_t, b_t$, the g-prior parameter $g_t$ adapts throughout the procedure; we begin with $g_t = 1/n$. Alternative priors for this model could have been considered; see our discussion in the Conclusions section.
Finally, the model $M_t$ can be given a number of priors; see Ley and Steel (2009) for an overview of potential issues to consider when selecting this prior. For the time being we choose the uniform prior
$$pr(M_t) \propto 1.$$
When $\gamma_{rt} \in \mathbb{R}$ we assign the prior $\gamma_{rt} \sim N(0, 1)$. This has the effect of imposing a uniform model prior on the inclusion of theories in the outcome equation. Alternatively, joint priors for the $\gamma$ factors could be considered, which would control for the number of included theories. However, since the number of theories is meant to be modest (roughly five to ten), we have avoided such aspects in the current framework.
The system outlined above then serves as the core latent process which drives the subsequent outcome variables. Thus we see that the models $M_t$ investigate which proxies best encode a theory quantitatively, while also accounting for the obvious model uncertainty in this formulation and incorporating a notion of over-dispersion. The $\gamma_{rt}$ terms serve two purposes. First, by examining their non-zero elements we see for which response equations a given theory is relevant. Secondly, by requiring the first non-zero $\gamma_{rt}$ to be equal to 1 and all others to be in $\mathbb{R}$, the $\gamma_r$ term scales the latent indices, allowing them to enter the model parameters differentially and indeed in opposite directions.
Finally, the latent theory indices $I_{it}$ are potentially of greatest interest, as they are meant to encapsulate the way that the theory proxies affect the outcome equations of interest. Again, as outlined in the Appendix, these terms suffer from potential identification issues when combined with the restrictions placed on a given $\gamma_r$. The hyperparameters $a_t, b_t$ ultimately control this aspect and therefore final interest focuses on the scale-free term $\tilde{I}_{it} = (a_t / b_t) I_{it}$.
This concern regarding identification requires a modicum of bookkeeping when conducting posterior inference. If, for example, all non-zero $\gamma$ values were allowed to be in $\mathbb{R}$, then the final outcome equation could have a variety of $\gamma_{rt}$ and $\beta_t$ combinations yielding the same posterior probability. This is the justification for our restriction that the $\gamma_{rt}$ with the smallest $r$ be constrained to 1.
However, this constraint yields its own issues, primarily due to its effects on the priors for the $\beta$ and $\nu$ parameters. If, for example, $\gamma_{11} = 1$ and $\gamma_{21} = 0.5$ and our chain sets $\gamma_{11}$ to 0, then $\gamma_{21}$ will suddenly double, implying that $\gamma_{21} I_1$ will suddenly have twice the effect on the mean value of outcome equation 2. The obvious answer is to simultaneously halve $I_1$, or equivalently, halve $\beta_1$. However, it would no longer be appropriate to keep the priors for $\beta_t$ and $\nu_t$ fixed, and therefore their priors are also adjusted by this factor. Technical details are given in Appendix A.
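The invariance that necessitates this bookkeeping is easy to verify numerically. In the toy check below (illustrative values only), rescaling the gammas by a constant and the index by its reciprocal leaves every mean contribution unchanged:

```python
def contributions(gammas, index):
    """Mean contribution gamma_{r t} * I_{i t} of one theory t,
    for each outcome r and observation i."""
    return [[g * i for i in index] for g in gammas]

gammas = [1.0, 0.5, -2.0]  # gamma_{r t} across R = 3 outcome equations
index = [0.3, -1.2, 0.7]   # latent index I_{i t} for three observations

base = contributions(gammas, index)

# Divide the gammas by c and multiply the index by c:
c = 2.0
rescaled = contributions([g / c for g in gammas], [i * c for i in index])
```

Since `base` and `rescaled` agree, the likelihood cannot distinguish the two parameterizations, which is why the leading non-zero gamma is pinned to 1 and the priors are adjusted whenever that pivot changes.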
To review, our full modeling framework therefore takes the form
$$Y_{ir} \sim g_r(\alpha_r, \mu_{ir}), \qquad \mu_{ir} = \sum_{t=1}^{T} \gamma_{rt} I_{it},$$
$$pr(\gamma_{rt} = 0) = \tfrac{1}{2}, \qquad \gamma_{rt} \mid \gamma_{rt} \neq 0 \sim N(0, 1) \text{ unless constrained to } \gamma_{rt} = 1,$$
$$I_{it} \sim N(X_{it} \beta_{M_t}, \nu_t^{-1}), \qquad \beta_{M_t} \sim N(0, \nu_t^{-1} g_t (X_{M_t}^\top X_{M_t})^{-1}),$$
$$pr(M_t) \propto 1, \qquad \nu_t \sim \Gamma(a_t/2, b_t/2).$$
The choices for the families $g_r$ that control the outcome variables are considerable. In our application, we focus on three models. The first is logistic regression. In this case $Y_{ir} \in \{0, 1\}$, $\alpha_r$ is univariate and
$$pr(Y_{ir} = 1) = \frac{\exp(\alpha_r + \mu_{ir})}{1 + \exp(\alpha_r + \mu_{ir})}.$$
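As a small numerical sketch of this link (the intercept and index values below are illustrative, not estimates from the paper):

```python
import math

def default_prob(alpha_r, mu_ir):
    """Logistic link: probability of default given the intercept and
    the summed theory-index contribution mu_ir."""
    return 1.0 / (1.0 + math.exp(-(alpha_r + mu_ir)))

p_neutral = default_prob(-2.0, 0.0)  # baseline country/year
p_risky = default_prob(-2.0, 1.5)    # higher summed risk indices
```

A larger index sum pushes the default probability up monotonically through the sigmoid.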
We use this logistic regression to model the probability that a country will default on its sovereign debt based on theory-indices.
The second family considered corresponds to non-central asymmetric Laplace variates. In this case $\alpha_r$ is two dimensional, with $\alpha_{r1}$ denoting the intercept and $\alpha_{r2}$ the log-precision parameter. In particular, we write
$$pr(Y_{ir} \mid \alpha_r, \mu_{ir}, \tau) = \tau (1 - \tau) \exp\left( \alpha_{r2} - e^{\alpha_{r2}} \rho_\tau(Y_{ir} - \alpha_{r1} - \mu_{ir}) \right), \qquad \rho_\tau(x) = x \left( \tau - 1\{x < 0\} \right),$$
where $\tau$ is the quantile under consideration. This model is often referred to as Bayesian quantile regression, since its posterior mode is related to the quantile regression estimate under the so-called pin-ball loss $\rho_\tau$. We employ this model for two separate variates, the inflation and unemployment rates, and set $\tau = 0.9$ for both, thus focusing on the 90th percentile of the respective distributions.
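The pin-ball loss and its connection to quantiles can be illustrated directly. The sketch below recovers an empirical 90th percentile by minimizing total loss over candidate constants; this is a generic property of $\rho_\tau$, not code from the paper:

```python
def rho(x, tau):
    """Pin-ball (check) loss: rho_tau(x) = x * (tau - 1{x < 0})."""
    return x * (tau - (1.0 if x < 0 else 0.0))

def best_constant(sample, tau):
    """The constant minimizing total pin-ball loss over a sample is an
    empirical tau-quantile of that sample."""
    return min(sample, key=lambda c: sum(rho(y - c, tau) for y in sample))

data = list(range(1, 101))      # 1, 2, ..., 100
q90 = best_constant(data, 0.9)  # close to the 90th percentile
```

This is why centering the asymmetric Laplace density at the regression function targets the $\tau$-quantile of the outcome.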
Finally, we consider the Generalized Extreme Value (GEV) model with $\alpha_r = (\alpha_{r1}, \alpha_{r2}, \alpha_{r3})$, parameterized by
$$pr(Y_{ir} \mid \alpha_r, \mu_{ir}) = e^{\alpha_{r2}} \, h(Y_{ir})^{-(\alpha_{r3} + 1)/\alpha_{r3}} \exp\left( -h(Y_{ir})^{-1/\alpha_{r3}} \right),$$
for $h(Y_{ir}) > 0$ with
$$h(Y_{ir}) = 1 + \alpha_{r3} e^{\alpha_{r2}} (Y_{ir} - \alpha_{r1} - \mu_{ir}).$$
The GEV model is used to model block maxima and hence to understand the nature of extreme behavior. In our case we use it to model the largest daily percentage jump in a country's exchange rate (relative to USD) seen over the course of a year. The global parameters $\alpha_{r2}$ and $\alpha_{r3}$ are the log-precision and shape, respectively, while $\alpha_{r1}$ again serves as the global intercept.
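A direct transcription of this density is a useful sanity check (a sketch under the parameterization above; the support guard mirrors the $h > 0$ condition):

```python
import math

def gev_logdensity(y, alpha1, alpha2, alpha3, mu=0.0):
    """Log density of the GEV outcome model: alpha1 is the intercept,
    alpha2 the log precision, alpha3 the shape, mu the theory-index sum."""
    h = 1.0 + alpha3 * math.exp(alpha2) * (y - alpha1 - mu)
    if h <= 0.0:
        return -math.inf  # outside the support
    return alpha2 - (alpha3 + 1.0) / alpha3 * math.log(h) - h ** (-1.0 / alpha3)
```

At $y = \alpha_{r1} + \mu_{ir}$ we have $h = 1$, so the log density reduces to $\alpha_{r2} - 1$, a quick check on the implementation.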
Based on $D$ we then conduct posterior inference on the full parameter set, which includes the global parameters $\alpha_r$, the theory-level models $M_t$, the theory-inclusion and scaling parameters $\gamma_{rt}$ and the linear model parameters $\beta_t$, as well as the latent indices $I_t$ and their random effect precisions $\nu_t$. Posterior inference is performed via Markov Chain Monte Carlo (MCMC). Given the involved and nested nature of the MCMC, several different approaches are employed at different stages of the hierarchy, and the full details are provided in Appendix A.
The main themes of the MCMC involve conditional Bayes factors (CBFs) to change models $M_t$ and update the proxy regression parameters $\beta_t$. Standard block Metropolis-Hastings proposals using local Laplace approximations of the log posterior density are used to update the latent theory indices $I_{it}$ as well as any global parameters in $\alpha_r$. Finally, reversible jump methods (Green 1995) alternate $\gamma_{rt}$ between being 0 or in $\mathbb{R}$, with a modicum of bookkeeping to ensure that at least one $\gamma_{rt}$ is set to 1 when theory $t$ is represented in more than one dependent equation $r$, again to ensure identification of the system. When conducting this bookkeeping exercise, prior distributions are adjusted accordingly to ensure that log-posterior density values are not affected by mere changes in variate representation.

3. Using BTA to Construct Sovereign Risk Indices

3.1. Data Outline

Our dataset for constructing SRIs originates from the dataset in Savona and Vezzoli (2015), who track 70 countries between the years 1975 and 2010. We have extended these original data to 2018 and are primarily focused on whether a country defaults on its sovereign debt in a given year. Some country/year combinations are not present and thus $n = 2032$. To model this default probability, Savona and Vezzoli (2015) collect 27 covariates. These covariates are meant to proxy 5 different theories related to sovereign debt default. In particular, they entertain the concepts of Insolvency, Illiquidity, Macroeconomic Factors, Political Factors and Global Systemic Factors. Table 1 provides an overview of the 27 covariates considered and the theory to which Savona and Vezzoli (2015) associate each of them. For a given year, most covariates are “lagged” (except for contagion, the dummy for oil and the dummy for international capital markets), in that these values would be available at the start of a given year, as opposed to co-occurring with the default event. Covariate missingness is prevalent; we derive imputed values using the semi-parametric Gaussian copula of Hoff (2007) within each theory group.
Savona and Vezzoli (2015) are concerned with predictive models of sovereign default and therefore solely focus on this binary outcome. We augment the default binary with three other measures that can also indicate a macroeconomy in a state of collapse. First, the country's lagged (i.e., one-year-behind) inflation rate was originally included in the Macroeconomic Factors group of covariates in Savona and Vezzoli (2015). We instead treat (non-lagged) inflation as another dependent variable and note that doing so has no effect on the Default outcome; a run of BTA solely on Default with inflation included in the Macroeconomic Factors gave this variable 0 inclusion probability. In addition, we collected unemployment data from the IMF website. These data were only available for a subset (897 country/year pairs) of the total data. We note that this dependent-variable missingness poses no substantive problem in terms of the derivation of SRIs; the BTA approach simply omits the missing likelihood contributions when updating the associated latent theory indices.
Finally, we collected foreign exchange rate data from the website of the IMF. For each country/year pair, we first computed the log rate change relative to the US dollar and then used the annual maximum of these log changes. This variable therefore shows the largest single-day weakening of a currency relative to the US dollar in the course of a year. We avoided commercial sources of foreign exchange data and therefore only had these values for 272 country/year pairs. See our discussion in the conclusions section regarding expanding these data.
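The construction of this devaluation outcome can be sketched as follows (hypothetical rates for a fictional country; the real inputs are the IMF series described above):

```python
import math

# Hypothetical consecutive daily rates (local currency per USD):
rates = [10.00, 10.05, 10.50, 12.00, 11.80]

def max_log_change(series):
    """Largest single-day log change in the exchange rate, i.e. the
    biggest one-day weakening of the currency against the US dollar."""
    return max(math.log(b / a) for a, b in zip(series, series[1:]))

worst_day = max_log_change(rates)  # here, the 10.50 -> 12.00 jump
```

Taking this maximum within each calendar year yields one devaluation observation per country/year pair.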
A country is in default if it is classified so by Standard & Poor's (S&P) or if it receives a large nonconcessional loan from the IMF. A nonconcessional loan carries the IMF's standard market-related interest rate, while a concessional loan has a lower interest rate. The loans we consider from the IMF must be in excess of 100 percent of quota. Each member country has a quota, set initially when the country joins the IMF. The quota determines, among other things, the country's access to IMF loans and its voting power. By augmenting the data from S&P with data from the IMF, we also capture near-defaults or debt restructurings avoided through IMF loan packages. We consider Stand-By Arrangements (SBA) and the Extended Fund Facility (EFF).
Our posterior inference is performed after running the BTA algorithm for 400 thousand iterations over these data. In order to verify convergence, 30 separate runs of the algorithm were conducted simultaneously and the resulting output was inspected to verify that posterior inference for each individual chain was nearly identical. Runtime on a 32-core machine (dual 8-core 3.4 GHz AMD Ryzen processors with hyperthreading capabilities) with 128 GB of RAM was roughly 7.5 h. Runtime of a single chain (as opposed to all 30) on a MacBook with 8 GB RAM and a 2.4 GHz dual-core processor is roughly similar, indicating that specialized hardware is not necessary.

3.2. Results

We begin our discussion of the SRI results by investigating outcomes on the theory level. Table 2 shows the theory inclusion probability (i.e., the proportion of time that $\gamma_{rt}$ was not constrained to zero in the chain) for each theory, across the four outcome variables. Given that the original dataset was constructed to model the default variable, it is unsurprising that all theories achieve inclusion probabilities of one for this outcome. We note, however, that this does not indicate that all theories are equally strong in explaining default, simply that none of them can be considered irrelevant. The inflation outcome is interesting in that it suggests that proxies measuring a country's political stability, macroeconomic and systemic factors best explain the upper tails of the inflation distribution. The Insolvency and Illiquidity theories are also relevant to inflation, achieving probabilities between 0.367 and 0.64. The results for unemployment in Table 2 show little inclusion for the Insolvency index, while all others achieve inclusion of 1. Finally, we see relatively low inclusion probabilities for all theories for the devaluation outcome. This is likely due in part to the relatively small amount of data that was available using public sources; see our discussion in Section 4. However, we feel this result highlights a useful feature of BTA, namely that including this outcome variable and having the system set theory-inclusion probabilities to zero meant there was no subsequent effect on the calculation of theory indices.
Table 3 shows the mean value (conditional on inclusion) of the parameter $\gamma_{rt}$ for each theory and outcome pair. Since Default was ordered first in our system and achieves inclusion probabilities of 1 for all theories, this outcome serves to orient the rest of the outcomes. In particular, a positive $\gamma_{rt}$ for an outcome indicates that the directionality of this theory on the outcome is similar to its directionality on default. The conditional means show even more clearly the importance of the Macroeconomic, Political and Systemic theories relative to the remaining two. The value of their conditional means (between 1.4 and 2.5) is substantially higher than 0.451 and 0.124 for Insolvency and Illiquidity, respectively. Since these three theories achieved substantially higher inclusion probabilities in Table 2, this implies that their unconditional effect is the main driver of the upper tail of inflation.
Recalling again from Table 2 that the unemployment outcome was driven by the Illiquidity, Macroeconomic, Political and Systemic factors, the results in Table 3 are interesting. They show that the Macroeconomic and Systemic theories have a strong, positive effect on the unemployment outcome. We note that “positive” is in the sense of working in the same direction as Default. The Illiquidity and Political factors then balance this behavior; they are orientated in the opposite direction to the impact these factors have on Default. Finally, as noted in Table 2, there appears to be negligible effect of the theory indices on the devaluation outcome.
Table 4 shows the inclusion probabilities and conditional posterior mean for each proxy contained in the Insolvency theory group. Five factors achieve probabilities above 0.98: one factor that measures the strength of the country's balance sheet (ResG), two factors describing the country's trade balance dynamics (XG and MG), and two factors describing the state of foreign direct investment (FDIY and FDIG). Interestingly, features that assess the country's debt load are included in the posterior to an appreciable degree.
Table 5 shows the inclusion probabilities for the Illiquidity theory. In contrast to the balanced view offered by the Insolvency results of Table 4, Table 5 puts almost all weight on a single feature, a measure of a country's short-term cash and cash equivalents relative to reserves (M2R).
Table 6 shows the inclusion probabilities for the proxies in the Macroeconomic grouping. We see that measures related to inflation dynamics (RGRWT) and the rate on US treasuries (UST) are given inclusion probabilities of one while the other factors are given low inclusion probabilities.
Table 7 and Table 8 show the inclusion probabilities for proxies of the Political and Systemic theories respectively. In each theory there are only two features and all four receive inclusion probabilities of 1. We see that the Political theory is thus a blend of the Political rights index (PR) and a measure of past susceptibility to default (History). Likewise, a measure of global contagion (Cont_tot) as well as local factors (Cont_area) combine to form the Systemic theory.
Our results echo many of the main themes in Savona and Vezzoli (2015), in that variables from the Illiquidity, Insolvency and Systemic theories are included in both cases. However, Savona and Vezzoli (2015) find no inclusion of variables from the Macroeconomic or Political theories, while these theories are included with probability one in our results. We note that this is likely partially due to overall model size; Savona and Vezzoli (2015) only include 6 of the 26 variables in their tree-based approach. In contrast, the average total model size using BTA was 12.8 variables, with all iterations having between 11 and 17 variables included.
We conclude by investigating detailed results for two of the theories, namely Insolvency and Illiquidity. Table 9 shows the country/year pairs with the five lowest and five highest posterior mean values of $I_{it}$ for the Insolvency theory. The lowest five country/year pairs listed represent the countries whose Insolvency index indicates the lowest probabilities of default. Interestingly, Gabon is represented twice amongst these five countries (for the years 1981 and 1995), which is unsurprising given the country's oil wealth and relative aggregate prosperity amongst African nations. Amongst the five country/year pairs with the highest Insolvency index scores we see a mix of African (Tunisia 1988; Niger 1983), Caribbean (Trinidad and Tobago 1987; Haiti 1979) and South Asian (Sri Lanka 2009) countries. Two of these five pairs (i.e., 40%) experience a default, which is substantially higher than the 6% average over all the data, showing the degree to which this feature is positively associated with default.
Table 10 shows the five highest and lowest country/year pairs according to the Illiquidity index. Burundi in 1991 (i.e., two years before the start of the civil war that ran between 1993 and 2005) receives the lowest Illiquidity index, otherwise followed by countries in South Asia. On the highest end, we see both Jamaica and Lesotho represented twice. In addition, Gabon in 2002 is present, a year in which the country defaulted on its sovereign debt. This contrast with Table 9 is illuminating, as it shows the trade-off between potential for insolvency and risks of illiquidity in precipitating sovereign default. We note again that two of the top five country/year pairs record a default, similar to the results of Table 9. However, when inspecting the unemployment result, we also see high levels of unemployment for four of the five top countries (and a missing value for Gabon 2002, the remaining country). Simultaneously, the countries with the lowest Illiquidity indices have negligible unemployment rates. This lines up with the results presented in Table 4, where the insolvency index had a large, positive effect on the unemployment outcome equation.
Finally, we address an issue related to theory index portability. In the theoretical construction of these indices we specified an independence structure between indices. However, nothing enforces this condition in posterior estimation. If the theory indices were correlated in the posterior, this would be acceptable; however, it would imply that these indices would need to be included as a set when attempting to model other phenomena. Table 11 suggests such considerations are likely unnecessary. In Table 11 we show the posterior correlation matrix of the theory indices, estimated over all samples and country/year pairs. We see in general a low degree of correlation (the entry −0.161 between the Macroeconomic and Political theories being the highest in absolute value). This feature is desirable, since it suggests that the theory indices can be used on an individual basis for subsequent modeling of other issues related to economic collapse.

3.3. Investigating the Multiple Response Framework

One of the innovations of the BTA framework has been the use of multiple responses to jointly determine the parameters $\beta_t$ and, implicitly, the theory indices $I_t$. This section investigates the advantages of this joint modeling in the context of an out-of-sample prediction exercise. In this study we conduct a leave-one-country-out cross validation. For each country we create a training dataset which excludes all observations from that country. We then fit five models. The first is the full specification described above, with all four responses contained in the framework. We then fit versions of BTA including each of the responses individually.
For the country that has been left out, we then derive fitted values, based on its features, for each of the four responses from the joint model as well as from each individual model. These fitted values are then scored against the observed values using appropriate proper scoring rules, namely the Brier score for the binary response (default), the quantile score for the two quantile regressions (unemployment, inflation) and the likelihood score for the GEV regression. A permutation test (see, e.g., a similar procedure in Möller et al. (2013)) assesses the significance of these results.
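The cross-validation and scoring machinery above can be sketched as follows; the helper names (`brier_score`, `quantile_score`, `loco_folds`) are ours, not code from the paper, and this is only a minimal illustration of the scoring rules used.

```python
import numpy as np

def brier_score(p, y):
    """Brier score for a binary outcome: mean squared difference
    between predicted probability and the observed 0/1 value."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    return np.mean((p - y) ** 2)

def quantile_score(pred, y, q):
    """Pinball (quantile) score at level q; lower is better."""
    u = np.asarray(y, float) - np.asarray(pred, float)
    return np.mean(u * (q - (u < 0)))

def loco_folds(countries):
    """Leave-one-country-out folds: yields (country, train_idx, test_idx)."""
    countries = np.asarray(countries)
    for c in np.unique(countries):
        yield c, np.where(countries != c)[0], np.where(countries == c)[0]
```

In use, one would loop over `loco_folds`, fit the joint and single-response models on the training indices, and score the held-out country's fitted values with the rule matching each response type.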
Table 12 shows the mean scores across all countries in the joint and single-model cases for each of the four response variables. The results for the default variable are nearly identical. While the additional response variables did not aid predictive performance here, this result is still positive, as it indicates that performance was not hampered by their inclusion. For the two quantile regressions, however, we see a substantial improvement in predictive performance when using output from the joint model, and the GEV regression on devaluation is likewise substantially improved under joint modeling. We feel the results of Table 12 provide a strong indication of the usefulness of jointly modeling several response variables together.

4. Conclusions

We have constructed a system whose purpose is to create indices representing various theories which are believed to drive heterogeneity in economic outcomes. When constructing an index, interpretability is an important feature to retain. This is primarily because interpretability allows additional proxies to be found when deficiencies become apparent, and allows specific results to be explained directly. Our BTA approach then forms a natural means of incorporating and resolving the obvious model uncertainty present in such a specification. Furthermore, our focus on modeling multiple outcomes, coupled with the ability to entertain a broad set of outcome sampling distributions, lends our system both generalizability and flexibility.
There is considerable additional work to be done, both on the technical, algorithmic side of BTA and also related to the specific goal of modeling an economy's potential for collapse. One key assumption has been that the multiple outcome variables are conditionally independent of one another given the indices. In practice, this did not seem to be overly critical, as seen by the fact that the inflation outcome was not present in the posterior when BTA was run on default using this feature alongside the others shown above. However, incorporating outcome variable dependence should be relatively straightforward using the Gaussian copula approach of Hoff (2007). Indeed, uncertainty over these conditional independence assumptions could also be model averaged using the copula Gaussian graphical model approach of Dobra and Lenkoski (2011).
Another matter that was avoided was country and year effects. Initial investigations using country-level fixed effects suggested little residual country-level correlation once other features were accounted for. Furthermore, since our goal is ultimately the use of indices for forecasting, it is desirable that latent factors such as random effects (which would not be internally estimated for countries or years not in the dataset) do not need to be supplied when forecasting. It is our view that evidence of result clustering along year or country lines is primarily an indication of feature inadequacy. As we continue to build the SRI dataset we will monitor for clustering in results that is not captured in the feature set and use this to continue building out our collected features.
In the current system, outcome equations had a linear dependence on theory indices. While it will always be necessary to orient the indices for reasons of identification (i.e., the assumption that $\gamma_{rt} = 1$ for at least one of the non-zero entries), expanded linear forms such as spline models (Wood (2017)) are entirely feasible. Indeed, a third layer of model selection would be to test between linearity and the expanded forms offered by spline modeling.
The MCMC algorithm necessary to resolve BTA was nontrivial, though far from the most complex. As outlined as early as Rue (2001), block updates of parameters in hierarchical generalized models are often advantageous. We have in general avoided block updates at present, but such a sampling regime could speed up convergence and reduce algorithm run-time.
One difficulty we experienced when implementing the quantile regression was the null second derivative of the asymmetric Laplace log-likelihood. This, in turn, made intelligent updates of parameters for this distribution somewhat harder, since there is less information regarding posterior curvature, and thus proposals have a tendency to move too far along the posterior density surface. This feature has already been investigated in some detail in related contexts. One potential route to improved mixing would be to follow Fasiolo et al. (2017), who propose a smooth version of the pinball loss to aid the fitting of qgam models.
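To illustrate the idea, the sketch below contrasts the non-smooth pinball loss with a generic smooth surrogate. Note this is an illustrative smoothing, not the exact ELF loss of Fasiolo et al. (2017): as the smoothing parameter shrinks, the surrogate converges to the pinball loss while retaining strictly positive curvature.

```python
import numpy as np

def pinball(u, q):
    """The (non-smooth) pinball loss rho_q(u) = u * (q - 1{u < 0})."""
    u = np.asarray(u, float)
    return u * (q - (u < 0))

def smooth_pinball(u, q, lam=0.1):
    """A smooth surrogate for the pinball loss (illustrative; not the exact
    ELF loss of Fasiolo et al. 2017). As lam -> 0 this converges to
    pinball(u, q); for lam > 0 its second derivative is strictly positive,
    restoring the curvature information the asymmetric Laplace lacks."""
    u = np.asarray(u, float)
    # lam * log(1 + exp(u / lam)), computed stably via logaddexp
    return (q - 1.0) * u + lam * np.logaddexp(0.0, u / lam)
```

The softplus term `lam * logaddexp(0, u/lam)` tends to `max(u, 0)` as `lam -> 0`, recovering the pinball kink in the limit.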
Finally, our reversible jump proposals were in some sense the least inspired part of the current system. Though mixing appeared acceptable, more focused jumps could have been constructed, by following much the same Laplacian formulations as the other model parameters.
With the onset of a global pandemic in the form of the COVID-19 virus, the great expansionary period following the global financial crisis appears to have finally halted. It is clear that we can expect to enter a contractionary phase of the global business cycle. Our applied interest has been to begin building a monitoring, forecasting and inferential toolset that can prepare us for this period. While we believe the current version of the SRI estimation system is encouraging, considerable work remains to be done.
Fresh data will be paramount to this effort. We intend to continue building this system to include all available years. We are broadly happy with the proxies collected to model insolvency and illiquidity in an economy. The Macroeconomic and Systemic features could likely be expanded in a number of obvious ways. For instance, information on global financial markets or on personal and industrial bankruptcies could expand the Systemic theory proxies.
However, we are convinced that the Political risk proxies can be expanded in several important ways. Aspects related to political regimes are likely to affect potential for economic collapse. Merging our data with the regime change dataset of Reich (2002) could be one avenue to account for the effect of differing regimes and overall regime uncertainty.
Finally, it has been our hope to use only publicly available data sources to aid in the reproducibility of our index construction. While we are convinced that devaluation matters should be included in our set of outcome equations, the necessary currency data have been hard to find publicly. We will continue to investigate open and public sources of currency exchange data to increase the coverage of this variable. In doing so, we hope the relative inconclusiveness related to theories and their effect on sudden devaluations can be resolved.

Author Contributions

This is a collaborative project, and both authors contributed to all aspects of the work. All authors have read and agreed to the published version of the manuscript.


Funding

This research received no external funding.


Acknowledgments

We would like to thank Mark F.J. Steel for the invitation to contribute to the Special Issue on Bayesian and Frequentist Model Averaging. We are also grateful to Roberto Savona for providing the original dataset on which we based this study, as well as for helpful discussions. In addition, we would like to thank the guest editor and two anonymous reviewers for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Full Algorithm Details

Based on data $D$ we use MCMC to obtain a sample $\{\varsigma^{[1]}, \ldots, \varsigma^{[S]}\}$ from the posterior distribution, where each $\varsigma^{[s]}$ contains
  • $M_1, \ldots, M_T$, the models associated with theories 1 through $T$
  • $\beta_1, \ldots, \beta_T$, the coefficient vectors associated with each theory. Note that by construction $\beta_{jt} = 0$ when $j \notin M_t$
  • $\gamma_1, \ldots, \gamma_R$, the theory-scaling vectors for each outcome equation $r$. A $\gamma_{tr}$ can be set to zero, indicating that theory $t$ is not currently relevant for outcome equation $r$. For purposes of identification, if multiple $\gamma_{tr}$ are non-zero for a given $t$, we set $\gamma_{tr} = 1$ for whichever $r$ is smallest
  • $I_1, \ldots, I_T$, the latent theory index vectors (each of length $n$), where $I_{it}$ is the current state of the theory-$t$ index for observation $i$. By convention, if $\gamma_{tr} = 0$ for all $r$ then $I_{it} = 0$ for all $i$
  • $\nu_1, \ldots, \nu_T$, the random effect precision terms
  • Global parameters $\alpha_r$ in the $R$ outcome equations
When moving from $\varsigma^{[s]}$ to $\varsigma^{[s+1]}$ we utilize four different MCMC strategies, all of which are now relatively standard in the MCMC literature. These are
  • Gibbs sampling, relevant for updating $\beta_t$ and $\nu_t$
  • Conditional Bayes Factors, which are used to update the theory-level models $M_t$
  • Metropolis-Hastings via Laplacian approximations of the log posterior density, which are used, in turn, to update the theory indices $I_t$, the global parameters $\alpha_r$, and those theory-scaling parameters $\gamma_{tr}$ which are constrained to neither zero nor one
  • Reversible jump methods for alternating $\gamma_{tr}$ between being 0 or in $\mathbb{R}$. Note that the moves here become especially detailed (though primarily in the sense of bookkeeping) when $\gamma_{tr}$ is currently set to 1, or if $\gamma_{tr}$ is currently zero and $r$ is smaller than all other $r'$ with $\gamma_{tr'}$ non-zero. Finally, this becomes a joint reversible jump move when the model move will either turn on or shut off the theory entirely, as both $\gamma_{tr}$ and $I_t$ will be affected
The sections below detail each of these approaches individually.

Appendix A.1. Gibbs Sampling Updates

To resample $\beta_t$ we note that its posterior distribution satisfies
$$pr(\beta_t \mid \cdot) = pr(\beta_{M_t} \mid M_t, I_t, X_{M_t}, \nu_t),$$
where $\beta_{M_t}$ and $X_{M_t}$ indicate the restriction to those elements and columns of $\beta_t$ and $X_t$, respectively, associated with the variables in model $M_t$. We then have that
$$pr(\beta_{M_t} \mid I_t, X_{M_t}, \nu_t) \propto pr(I_t \mid \beta_{M_t}, X_{M_t}, \nu_t)\, pr(\beta_{M_t} \mid \nu_t).$$
Via standard results this yields
$$pr(\beta_{M_t} \mid I_t, X_{M_t}, \nu_t) = N(\hat{\beta}_{M_t}, \Xi_{M_t}^{-1}), \qquad \Xi_{M_t} = \nu_t \frac{g+1}{g} X_{M_t}' X_{M_t}, \qquad \hat{\beta}_{M_t} = \nu_t\, \Xi_{M_t}^{-1} X_{M_t}' I_t.$$
Given $I_t$ and $\beta_t$ we have that
$$\nu_t \sim \Gamma\!\left(a_t + \frac{n}{2},\; b_t + \frac{S_t}{2}\right), \qquad S_t = (I_t - X_t \beta_t)'(I_t - X_t \beta_t).$$
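A minimal sketch of these two conjugate draws follows. The function names are ours, and we write the posterior mean with the factor $\nu_t$ implied by the g-prior setup; this is an illustration under those assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_beta(I_t, X, nu, g):
    """Draw beta_{M_t} from its Gaussian full conditional:
    precision Xi = nu * (g+1)/g * X'X, mean nu * Xi^{-1} X' I_t."""
    Xi = nu * (g + 1.0) / g * (X.T @ X)
    Xi_inv = np.linalg.inv(Xi)
    beta_hat = nu * Xi_inv @ (X.T @ I_t)
    return rng.multivariate_normal(beta_hat, Xi_inv)

def draw_nu(I_t, X, beta, a_t, b_t):
    """Draw nu_t from Gamma(a_t + n/2, rate = b_t + S_t/2),
    with S_t the residual sum of squares."""
    resid = I_t - X @ beta
    S_t = resid @ resid
    # numpy's gamma is parameterized by shape and scale = 1/rate
    return rng.gamma(a_t + len(I_t) / 2.0, 1.0 / (b_t + S_t / 2.0))
```

With informative data the Gaussian draw concentrates near the (shrunken) least-squares solution, as the conjugate algebra above implies.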

Appendix A.2. Conditional Bayes Factors to Update $M_t$

Conditional Bayes Factors compare integrated likelihoods for the current model $M_t$ and a new proposal model $M_t'$, conditioning on the latent indices $I_t$. This conditioning separates the Gaussian regression components on which the models operate from the larger non-Gaussian components in the response equations, leading to an efficient sampling regime. This efficiency is present both in the availability of closed-form calculations to compare models and in the relative parsimony of the approach's exposition.
In particular, note that
$$pr(M_t \mid D, \cdot) \propto pr(I_t \mid M_t)\, pr(M_t),$$
where it is implicit that we have conditioned on the fixed regressors. This implies that the latent theory indices $I_t$ separate the conditional posterior of the model $M_t$ from the data $D$ and the associated non-integrable likelihoods. This term can then be represented by
$$pr(I_t \mid M_t)\, pr(M_t) = \left[\int_{\beta_{M_t}} pr(I_t \mid \beta_{M_t})\, pr(\beta_{M_t} \mid M_t)\, d\beta_{M_t}\right] pr(M_t).$$
The integral above is then
$$\int_{\beta_{M_t}} pr(I_t \mid \beta_{M_t})\, pr(\beta_{M_t} \mid M_t)\, d\beta_{M_t} \propto |\Xi_{M_t}|^{-1/2} \exp\left(\frac{1}{2}\hat{\beta}_{M_t}' \Xi_{M_t} \hat{\beta}_{M_t}\right),$$
where $\hat{\beta}_{M_t}$ and $\Xi_{M_t}$ are defined as above. Similar to the classic MC3 algorithm, models $M_t$ and $M_t'$ are compared via Metropolis-Hastings.
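The model comparison can be sketched as below; this is a direct transcription of the displayed collapsed expression under a uniform model prior and symmetric proposals, with function names of our own choosing.

```python
import numpy as np

def log_collapsed(I_t, X_M, nu, g):
    """Log of the collapsed quantity |Xi|^{-1/2} exp(0.5 beta_hat' Xi beta_hat),
    conditioning on I_t as in Appendix A.2 (model-independent constants dropped)."""
    Xi = nu * (g + 1.0) / g * (X_M.T @ X_M)
    beta_hat = nu * np.linalg.solve(Xi, X_M.T @ I_t)
    _, logdet = np.linalg.slogdet(Xi)
    return -0.5 * logdet + 0.5 * beta_hat @ Xi @ beta_hat

def cbf_accept(I_t, X, cur_cols, prop_cols, nu, g, rng):
    """Metropolis-Hastings comparison of M_t and M_t' via the conditional
    Bayes factor, assuming a uniform model prior and symmetric proposals."""
    log_bf = (log_collapsed(I_t, X[:, prop_cols], nu, g)
              - log_collapsed(I_t, X[:, cur_cols], nu, g))
    return np.log(rng.uniform()) < log_bf
```

In a full MC3-style sweep, `prop_cols` would typically differ from `cur_cols` by the addition or deletion of a single variable.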

Appendix A.3. Metropolis-Hastings Updates via Laplacian Expansions

The two sections above dealt with parameters that could effectively be “conditioned” away from the sampling model of the dependent variables, in both cases by conditioning on the latent variables I t . This, in turn, led to updates that were straightforward to calculate as in both cases they relied on well-known results for integrals over the Gaussian distribution. However, when conditional posterior distributions do not have a form amenable to integration or Gibbs sampling, Metropolis-Hastings algorithms provide an obvious alternative. This section therefore details all proposal distributions and acceptance ratios necessary to update these parameters.
In all cases, we follow a standard approach to creating Gaussian proposals which requires no pre-specified tuning parameters and instead adapts proposals to the local curvature of the log posterior density; see, for example, chp. 4 of Rue and Held (2005) for a detailed discussion of this approach and Dyrrdal et al. (2015) for a similar algorithmic design. More involved methods which build on these concepts, such as Hamiltonian MCMC or Manifold MCMC, could have been entertained, but mixing was already sufficiently acceptable that these more sophisticated methodologies seemed unnecessary; see our discussion in Section 4. Suppose, in general, that we would like to update a parameter $\tau$ and write $\log pr(\tau \mid \cdot) = f(\tau)$ to represent the log posterior density of this parameter with respect to the observations and all other parameters. For designing the proposal distribution, we employ a Gaussian approximation of this posterior density. A quadratic Taylor expansion of the log posterior $f(\tau)$ around the current value $\tau^*$ gives
$$f(\tau) \approx f(\tau^*) + f'(\tau^*)(\tau - \tau^*) + \frac{1}{2} f''(\tau^*)(\tau - \tau^*)^2 = a + b\tau - \frac{1}{2} c \tau^2,$$
where $b = f'(\tau^*) - f''(\tau^*)\tau^*$ and $c = -f''(\tau^*)$. The posterior distribution $pr(\tau \mid \cdot)$ can therefore be approximated by
$$\widetilde{pr}(\tau \mid \cdot) \propto \exp\left(-\frac{1}{2} c \tau^2 + b \tau\right),$$
the density of the Gaussian distribution $N(b/c, c^{-1})$. Using this relationship, we choose $N(b/c, c^{-1})$ as our proposal distribution, where $\tau^*$ is the current state in the MCMC chain. This formulation frees the user from specifying a large number of tuning parameters and achieves high acceptance proportions.
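For a scalar parameter, one Metropolis-Hastings step with this curvature-matched proposal can be sketched as follows (an illustrative implementation under the assumption that $f$ is locally concave, so that $c > 0$; the function names are ours):

```python
import numpy as np

def norm_logpdf(x, m, s):
    """Log density of N(m, s^2)."""
    return -0.5 * np.log(2.0 * np.pi * s * s) - 0.5 * ((x - m) / s) ** 2

def laplace_mh_step(tau, f, f1, f2, rng):
    """One Metropolis-Hastings update of a scalar tau using the Gaussian
    proposal N(b/c, 1/c), with b = f'(tau) - f''(tau)*tau and c = -f''(tau).
    f, f1, f2 evaluate the log posterior and its first two derivatives."""
    def params(t):
        c = -f2(t)
        b = f1(t) - f2(t) * t
        return b / c, 1.0 / np.sqrt(c)
    m_fwd, s_fwd = params(tau)
    tau_new = rng.normal(m_fwd, s_fwd)
    m_rev, s_rev = params(tau_new)
    # log acceptance ratio: posterior ratio times reverse/forward proposal ratio
    log_acc = (f(tau_new) - f(tau)
               + norm_logpdf(tau, m_rev, s_rev)
               - norm_logpdf(tau_new, m_fwd, s_fwd))
    return tau_new if np.log(rng.uniform()) < log_acc else tau
```

When the target is exactly Gaussian the proposal coincides with the posterior and every move is accepted, which is the sense in which this scheme "adapts" without tuning.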
The following subsections outline the specific forms of $f$, $f'$, and $f''$ for all variates that are updated in this manner. Since the $I_{it}$ depend on all $R$ equations, they are handled in a final, separate subsection.

Appendix A.3.1. Logistic Regression

If equation $r$ is a logistic model then it has the form
$$pr(Y_{ir} \mid \cdot) = \left(\frac{\exp(\mu_{ir})}{1 + \exp(\mu_{ir})}\right)^{Y_{ir}} \left(\frac{1}{1 + \exp(\mu_{ir})}\right)^{1 - Y_{ir}}, \qquad \mu_{ir} = \alpha_r + \sum_{t=1}^{T} \gamma_{rt} I_{it}.$$
The formulas for $\alpha_r$ and $\gamma_{rt}$ require derivation (as noted above, we leave $I_{it}$ to a final subsection). First, note
$$\log pr(Y_{ir}) = Y_{ir}\mu_{ir} - \log(1 + \exp(\mu_{ir})).$$
Then for the global parameter $\alpha_r$ with prior distribution $\alpha_r \sim N(0,1)$ we have that
$$f(\alpha_r) = \sum_{i=1}^{n}\left[Y_{ir}\mu_{ir} - \log(1 + \exp(\mu_{ir}))\right] - \frac{\alpha_r^2}{2}$$
$$f'(\alpha_r) = \sum_{i=1}^{n}\left[Y_{ir} - \frac{\exp(\mu_{ir})}{1 + \exp(\mu_{ir})}\right] - \alpha_r$$
$$f''(\alpha_r) = -\sum_{i=1}^{n}\frac{\exp(\mu_{ir})}{(1 + \exp(\mu_{ir}))^2} - 1.$$
Similarly, for $\gamma_{rt}$ not constrained to be 0 or 1 we assume $\gamma_{rt} \sim N(0,1)$ and have
$$f(\gamma_{rt}) = \sum_{i=1}^{n}\left[Y_{ir}\mu_{ir} - \log(1 + \exp(\mu_{ir}))\right] - \frac{\gamma_{rt}^2}{2}$$
$$f'(\gamma_{rt}) = \sum_{i=1}^{n}\left[Y_{ir} I_{it} - \frac{I_{it}\exp(\mu_{ir})}{1 + \exp(\mu_{ir})}\right] - \gamma_{rt}$$
$$f''(\gamma_{rt}) = -\sum_{i=1}^{n}\frac{I_{it}^2 \exp(\mu_{ir})}{(1 + \exp(\mu_{ir}))^2} - 1.$$
Finally, as it will be important in the derivations for the updates of $I_{it}$, we write
$$l_r(Y_{ir}, I_{it}) = Y_{ir}\mu_{ir} - \log(1 + \exp(\mu_{ir}))$$
$$\dot{l}_r(Y_{ir}, I_{it}) = \gamma_{rt} Y_{ir} - \frac{\gamma_{rt}\exp(\mu_{ir})}{1 + \exp(\mu_{ir})}$$
$$\ddot{l}_r(Y_{ir}, I_{it}) = -\frac{\gamma_{rt}^2 \exp(\mu_{ir})}{(1 + \exp(\mu_{ir}))^2}.$$
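The $\alpha_r$ formulas can be checked numerically; the sketch below (our own helper, with `eta` standing in for $\sum_t \gamma_{rt} I_{it}$) implements $f$, $f'$, $f''$ using the stable identity $e^{\mu}/(1+e^{\mu})^2 = p(1-p)$ for $p = \text{logit}^{-1}(\mu)$.

```python
import numpy as np

def logistic_alpha_fdf(alpha, eta, Y):
    """f, f', f'' for the logistic global parameter alpha_r under an N(0,1)
    prior; eta holds the remaining linear predictor sum_t gamma_rt * I_it,
    so mu = alpha + eta."""
    mu = alpha + eta
    p = 1.0 / (1.0 + np.exp(-mu))          # exp(mu) / (1 + exp(mu))
    f = np.sum(Y * mu - np.log1p(np.exp(mu))) - 0.5 * alpha**2
    f1 = np.sum(Y - p) - alpha
    f2 = -np.sum(p * (1.0 - p)) - 1.0      # p(1-p) = exp(mu)/(1+exp(mu))^2
    return f, f1, f2
```

A central finite difference of `f` recovers `f1`, and of `f1` recovers `f2`, confirming the displayed derivatives.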

Appendix A.3.2. Bayesian Quantile Regression

Let
$$pr(Y_{ir} \mid \mu_{ir}, \kappa, q) \propto \exp\left\{\kappa - e^{\kappa}\rho_q(Y_{ir} - \mu_{ir})\right\}, \qquad \rho_q(Y_{ir} - \mu_{ir}) = (Y_{ir} - \mu_{ir})\left(q - 1\{Y_{ir} < \mu_{ir}\}\right)$$
be a Bayesian quantile regression, that is, $Y_{ir}$ is considered asymmetric Laplace distributed with log-precision parameter $\kappa$ and
$$\mu_{ir} = \alpha_r + \sum_{t=1}^{T}\gamma_{rt} I_{it}.$$
We therefore need to derive the relevant formulas for $\alpha_r$, $\gamma_{rt}$, and the likelihood derivatives for $I_{it}$. We note
$$\log pr(Y_{ir} \mid \mu_{ir}, \kappa, q) = \kappa - e^{\kappa}\rho_q(Y_{ir} - \mu_{ir})$$
and thus
$$\frac{\partial \log pr(Y_i \mid \cdot)}{\partial \mu_i} = e^{\kappa}\left(q - 1\{Y_i < \mu_i\}\right), \qquad \frac{\partial^2 \log pr(Y_i \mid \cdot)}{\partial \mu_i^2} = 0.$$
Therefore, for $\alpha_r$ with $N(0,1)$ prior we have
$$f(\alpha_r) = n\kappa - e^{\kappa}\sum_{i=1}^{n}\rho_q(Y_{ir} - \mu_{ir}) - \frac{\alpha_r^2}{2}$$
$$f'(\alpha_r) = e^{\kappa}\sum_{i=1}^{n}\left(q - 1\{Y_{ir} < \mu_{ir}\}\right) - \alpha_r$$
$$f''(\alpha_r) = -1.$$
Similarly, when $\gamma_{rt}$ is not constrained to 0 or 1 we set $\gamma_{rt} \sim N(0,1)$ and have
$$f(\gamma_{rt}) = n\kappa - e^{\kappa}\sum_{i=1}^{n}\rho_q(Y_{ir} - \mu_{ir}) - \frac{\gamma_{rt}^2}{2}$$
$$f'(\gamma_{rt}) = e^{\kappa}\sum_{i=1}^{n} I_{it}\left(q - 1\{Y_{ir} < \mu_{ir}\}\right) - \gamma_{rt}$$
$$f''(\gamma_{rt}) = -1.$$
Likewise, we note that
$$\frac{\partial \log pr(Y_i \mid \cdot)}{\partial \kappa} = 1 - e^{\kappa}\rho_q(Y_i - \mu_i), \qquad \frac{\partial^2 \log pr(Y_i \mid \cdot)}{\partial \kappa^2} = -e^{\kappa}\rho_q(Y_i - \mu_i),$$
and thus, if $\kappa \sim N(0,1)$ in the prior, then
$$f(\kappa \mid \cdot) = n\kappa - e^{\kappa}\sum_{i=1}^{n}\rho_q(Y_i - \mu_i) - \frac{\kappa^2}{2}$$
$$f'(\kappa \mid \cdot) = n - e^{\kappa}\sum_{i=1}^{n}\rho_q(Y_i - \mu_i) - \kappa$$
$$f''(\kappa \mid \cdot) = -e^{\kappa}\sum_{i=1}^{n}\rho_q(Y_i - \mu_i) - 1.$$
Finally, for $I_{it}$ we have
$$l(Y_{ir}, I_{it}) = \kappa - e^{\kappa}\rho_q(Y_{ir} - \mu_{ir}), \qquad \dot{l}(Y_{ir}, I_{it}) = \gamma_{rt}\, e^{\kappa}\left(q - 1\{Y_{ir} < \mu_{ir}\}\right), \qquad \ddot{l}(Y_{ir}, I_{it}) = 0.$$
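The asymmetric-Laplace update for $\alpha_r$ can be sketched and sanity-checked as follows (illustrative helpers of our own; `eta` stands in for $\sum_t \gamma_{rt} I_{it}$). Note that away from the kinks of $\rho_q$ the gradient is exact, while the likelihood contributes zero curvature, so only the prior term appears in $f''$.

```python
import numpy as np

def rho_q(u, q):
    """Pinball loss rho_q(u) = u * (q - 1{u < 0})."""
    u = np.asarray(u, float)
    return u * (q - (u < 0))

def al_alpha_fdf(alpha, eta, Y, kappa, q):
    """f, f', f'' for alpha_r in the asymmetric-Laplace (quantile) outcome
    equation under an N(0,1) prior; mu = alpha + eta."""
    mu = alpha + eta
    n = len(Y)
    f = n * kappa - np.exp(kappa) * np.sum(rho_q(Y - mu, q)) - 0.5 * alpha**2
    f1 = np.exp(kappa) * np.sum(q - (Y < mu)) - alpha
    f2 = -1.0   # likelihood has zero curvature in mu; only the prior remains
    return f, f1, f2
```

The flat `f2` is precisely the difficulty discussed in the Conclusions: proposals see no likelihood curvature for this equation.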

Appendix A.3.3. GEV Regression

When $Y_{ir}$ has the form of a GEV regression with global log-precision $\kappa$ and shape $\xi$ we have
$$pr(Y_{ir} \mid \mu_{ir}, \kappa, \xi) = e^{\kappa}\, h(Y_{ir})^{-(\xi+1)/\xi} \exp\left\{-h(Y_{ir})^{-\xi^{-1}}\right\},$$
$$h(Y_{ir}) = 1 + \xi e^{\kappa}(Y_{ir} - \mu_{ir}), \qquad \mu_{ir} = \alpha_r + \sum_{t=1}^{T}\gamma_{rt} I_{it},$$
with the additional restriction that $h(\cdot) > 0$. Calculations for this density have a tendency to become somewhat involved. We first note
$$a(Y_{ir}) \equiv \log pr(Y_{ir} \mid \mu_{ir}, \kappa, \xi) = \kappa - \frac{\xi+1}{\xi}\log h(Y_{ir}) - h(Y_{ir})^{-\xi^{-1}}.$$
Since $\partial h(Y_{ir}) / \partial \mu_{ir} = -e^{\kappa}\xi$ we have that
$$\dot{a}(Y_{ir}) \equiv \frac{\partial}{\partial \mu_{ir}}\log pr(Y_{ir} \mid \cdot) = (\xi+1)\, e^{\kappa} h(Y_{ir})^{-1} - e^{\kappa} h(Y_{ir})^{-\xi^{-1}-1}$$
$$\ddot{a}(Y_{ir}) \equiv \frac{\partial^2}{\partial \mu_{ir}^2}\log pr(Y_{ir} \mid \cdot) = \xi(\xi+1)\, e^{2\kappa} h(Y_{ir})^{-2} - (\xi+1)\, e^{2\kappa} h(Y_{ir})^{-\xi^{-1}-2}.$$
Therefore, to update $\alpha_r \sim N(0,1)$ we have
$$f(\alpha_r) = \sum_{i=1}^{n} a(Y_{ir}) - \frac{\alpha_r^2}{2}, \qquad f'(\alpha_r) = \sum_{i=1}^{n}\dot{a}(Y_{ir}) - \alpha_r, \qquad f''(\alpha_r) = \sum_{i=1}^{n}\ddot{a}(Y_{ir}) - 1.$$
Likewise, to update any $\gamma_{rt}$ not constrained to 0 or 1 we have
$$f(\gamma_{rt}) = \sum_{i=1}^{n} a(Y_{ir}) - \frac{\gamma_{rt}^2}{2}, \qquad f'(\gamma_{rt}) = \sum_{i=1}^{n}\dot{a}(Y_{ir})\, I_{it} - \gamma_{rt}, \qquad f''(\gamma_{rt}) = \sum_{i=1}^{n}\ddot{a}(Y_{ir})\, I_{it}^2 - 1.$$
For the term $I_{it}$ we note
$$l(Y_{ir}, I_{it}) = a(Y_{ir}), \qquad \dot{l}(Y_{ir}, I_{it}) = \dot{a}(Y_{ir})\,\gamma_{rt}, \qquad \ddot{l}(Y_{ir}, I_{it}) = \ddot{a}(Y_{ir})\,\gamma_{rt}^2.$$
Now focusing on the global log-precision term $\kappa \sim N(0,1)$, we have
$$f(\kappa) = \sum_{i=1}^{n} a(Y_{ir}) - \frac{\kappa^2}{2}$$
$$f'(\kappa) = \sum_{i=1}^{n}\left[1 - e^{\kappa}(\xi+1)(Y_{ir} - \mu_{ir})\, h(Y_{ir})^{-1} + b_1(Y_{ir})\right] - \kappa$$
$$f''(\kappa) = \sum_{i=1}^{n}\left[-e^{\kappa}(\xi+1)(Y_{ir} - \mu_{ir})\, h(Y_{ir})^{-1} + \xi(\xi+1)\, e^{2\kappa}(Y_{ir} - \mu_{ir})^2 h(Y_{ir})^{-2} + b_1(Y_{ir}) - b_2(Y_{ir})\right] - 1$$
$$b_1(Y_{ir}) = e^{\kappa}(Y_{ir} - \mu_{ir})\, h(Y_{ir})^{-\xi^{-1}-1}, \qquad b_2(Y_{ir}) = (\xi+1)\, e^{2\kappa}(Y_{ir} - \mu_{ir})^2\, h(Y_{ir})^{-\xi^{-1}-2}.$$
The calculations for the shape parameter $\xi$ are somewhat more involved. Let
$$g_1(Y_{ir}) = \frac{\xi+1}{\xi}\log h(Y_{ir}), \qquad g_2(Y_{ir}) = \exp\left(-\xi^{-1}\log h(Y_{ir})\right).$$
We then obtain
$$\dot{g}_1(Y_{ir}) \equiv \frac{\partial g_1(Y_{ir})}{\partial \xi} = -\frac{\log h(Y_{ir})}{\xi^2} + \frac{\xi+1}{\xi}\, h(Y_{ir})^{-1} e^{\kappa}(Y_{ir} - \mu_{ir})$$
$$\dot{g}_2(Y_{ir}) \equiv \frac{\partial g_2(Y_{ir})}{\partial \xi} = g_2(Y_{ir})\left(\frac{\log h(Y_{ir})}{\xi^2} - \frac{h(Y_{ir})^{-1} e^{\kappa}(Y_{ir} - \mu_{ir})}{\xi}\right),$$
from which it follows that
$$\frac{\partial}{\partial \xi}\log pr(Y_{ir} \mid \cdot) = -\dot{g}_1(Y_{ir}) - \dot{g}_2(Y_{ir}).$$
For the second derivative, similar calculations return
$$\frac{\partial^2}{\partial \xi^2}\log pr(Y_{ir} \mid \cdot) = -\frac{\partial}{\partial \xi}\left(\dot{g}_1(Y_{ir}) + \dot{g}_2(Y_{ir})\right) = d_1 + d_2 - d_3 - d_4,$$
$$d_1 = -\frac{2\log h(Y_{ir})}{\xi^3} + \frac{h(Y_{ir})^{-1} e^{\kappa}(Y_{ir} - \mu_{ir})}{\xi^2}$$
$$d_2 = \frac{h(Y_{ir})^{-1}(Y_{ir} - \mu_{ir})\, e^{\kappa}}{\xi^2} + \frac{\xi+1}{\xi}\, h(Y_{ir})^{-2}(Y_{ir} - \mu_{ir})^2\, e^{2\kappa}$$
$$d_3 = \dot{g}_2(Y_{ir})\frac{\log h(Y_{ir})}{\xi^2} + g_2(Y_{ir})\left(-\frac{2\log h(Y_{ir})}{\xi^3} + \frac{h(Y_{ir})^{-1} e^{\kappa}(Y_{ir} - \mu_{ir})}{\xi^2}\right)$$
$$d_4 = -\dot{g}_2(Y_{ir})\frac{h(Y_{ir})^{-1} e^{\kappa}(Y_{ir} - \mu_{ir})}{\xi} + g_2(Y_{ir})\left(\frac{(Y_{ir} - \mu_{ir})\, e^{\kappa}\, h(Y_{ir})^{-1}}{\xi^2} + \frac{h(Y_{ir})^{-2}(Y_{ir} - \mu_{ir})^2\, e^{2\kappa}}{\xi}\right).$$
Hence, for updating $\xi \sim N(0,1)$ we have
$$f(\xi \mid \cdot) = \sum_{i=1}^{n}\left[\kappa - g_1(Y_{ir}) - g_2(Y_{ir})\right] - \frac{\xi^2}{2}$$
$$f'(\xi \mid \cdot) = \sum_{i=1}^{n}\left[-\dot{g}_1(Y_{ir}) - \dot{g}_2(Y_{ir})\right] - \xi$$
$$f''(\xi \mid \cdot) = \sum_{i=1}^{n}\left[d_1 + d_2 - d_3 - d_4\right] - 1.$$
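The GEV log density and its $\mu$-derivatives can be sketched and verified by finite differences; this is an illustrative transcription of $a$, $\dot{a}$, and $\ddot{a}$ above (function names are ours), valid only where $h > 0$.

```python
import numpy as np

def gev_loglik(y, mu, kappa, xi):
    """Per-observation GEV log density: kappa - ((xi+1)/xi) log h - h^(-1/xi),
    with h = 1 + xi * exp(kappa) * (y - mu); only valid where h > 0."""
    h = 1.0 + xi * np.exp(kappa) * (y - mu)
    return kappa - (xi + 1.0) / xi * np.log(h) - h ** (-1.0 / xi)

def gev_mu_derivs(y, mu, kappa, xi):
    """a_dot and a_ddot: first and second derivatives of the log density
    with respect to mu."""
    ek = np.exp(kappa)
    h = 1.0 + xi * ek * (y - mu)
    a1 = (xi + 1.0) * ek / h - ek * h ** (-1.0 / xi - 1.0)
    a2 = (xi * (xi + 1.0) * ek**2 / h**2
          - (xi + 1.0) * ek**2 * h ** (-1.0 / xi - 2.0))
    return a1, a2
```

Central differences of the log density recover `a1`, and of `a1` recover `a2`, confirming the derivative formulas.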

Appendix A.3.4. Updating Theory Indices

We now consider the updating of the theory indices $I_{it}$. Noting that
$$I_{it} = X_{it}\beta_t + \epsilon_{it}, \qquad \epsilon_{it} \sim N(0, \nu_t^{-1}),$$
where $X_{it}$ denotes row $i$ of $X_t$, we have the formulas
$$f(I_{it} \mid \cdot) = \sum_{r=1}^{R} l_r(I_{it} \mid \cdot) - \frac{\nu_t}{2}\left(I_{it} - X_{it}\beta_t\right)^2$$
$$f'(I_{it} \mid \cdot) = \sum_{r=1}^{R} \dot{l}_r(I_{it} \mid \cdot) - \nu_t\left(I_{it} - X_{it}\beta_t\right)$$
$$f''(I_{it} \mid \cdot) = \sum_{r=1}^{R} \ddot{l}_r(I_{it} \mid \cdot) - \nu_t,$$
where the $l_r$, $\dot{l}_r$ and $\ddot{l}_r$ terms are those discussed in the sections above for each respective outcome equation $r$ in the system.
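Assembling the index update from the per-equation pieces is then a matter of summation, which the following minimal sketch (our own helper names) makes concrete:

```python
import numpy as np

def index_fdf(I_it, x_beta, nu_t, lik_terms):
    """f, f', f'' for a single theory index I_it: the sum of each outcome
    equation's likelihood contributions (l, l_dot, l_ddot), evaluated at the
    current I_it, plus the Gaussian term from the proxy-level regression
    I_it ~ N(x_beta, 1/nu_t)."""
    f = sum(l for l, _, _ in lik_terms) - 0.5 * nu_t * (I_it - x_beta) ** 2
    f1 = sum(ld for _, ld, _ in lik_terms) - nu_t * (I_it - x_beta)
    f2 = sum(ldd for _, _, ldd in lik_terms) - nu_t
    return f, f1, f2
```

With no outcome-equation contributions, the update reduces to the Gaussian regression term alone, whose gradient vanishes at $X_{it}\beta_t$.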

Appendix A.4. Updating Theory Inclusion Parameters via Reversible Jump

Suppose now that $\gamma_{rt} = 0$ in the current state of the chain. In the relatively straightforward case in which there is an $r' < r$ for which $\gamma_{r't} = 1$ (and thus the inclusion of $\gamma_{rt}$ will not affect identification matters) we may attempt to make $\gamma_{rt}$ non-zero by proposing $\gamma_{rt}' \sim N(0,1)$. We thus transition from $(\gamma_r, \gamma_{rt})$, where $\gamma_{rt} = 0$, to $\gamma_t'$ with $(\gamma_t')_r = \gamma_{rt}'$ and $(\gamma_t')_s = (\gamma_t)_s$ for all other $s \neq r$, a transformation with Jacobian 1. Letting
$$\mu_{ir} = \alpha_r + \sum_{t=1}^{T}\gamma_{rt} I_{it}, \qquad \mu_{ir}' = \alpha_r + \sum_{t=1}^{T}\gamma_{rt}' I_{it},$$
and since our prior sets all $\gamma_{rs} \sim N(0,1)$, the auxiliary density cancels with the larger prior and we thus have that
$$\log pr(\gamma_r, \gamma_{rt}' \mid \cdot) \propto \sum_{i=1}^{n} l_r(Y_{ir} \mid \mu_{ir}'), \qquad \log pr(\gamma_r \mid \cdot) \propto \sum_{i=1}^{n} l_r(Y_{ir} \mid \mu_{ir}),$$
where $l_r$ is the associated log-likelihood for equation $r$. This gives the necessary log densities for comparing $\gamma_{rt}' \in \mathbb{R}$ and $\gamma_{rt} = 0$. See our discussion in the Conclusions section regarding more focused proposals of $\gamma_{rt}'$, which could aid mixing but would also make the expressions above slightly more involved.
When $\gamma_{rt} = 0$ and $\gamma_{st} = 1$ for some $s > r$, some bookkeeping is necessary to adjust the system. In particular, we sample $u \sim N(0,1)$ and create a new vector $\gamma_t'$ where
$$\gamma_{st}' = \begin{cases} 1, & \text{if } s = r, \\ u\,\gamma_{st}, & \text{otherwise.} \end{cases}$$
Similarly, we move from $I_{it}$ to $I_{it}'$ by setting $I_{it}' = I_{it}/u$, $\beta_t' = \beta_t/u$, $g_t' = g_t/u$ and $\nu_t' = u\,\nu_t$, $b_t' = b_t/u$. We therefore note that while we have changed all $\gamma_{st}$ values and the associated theory indices $I_{it}$, only the likelihood for dependent variable $r$ is affected, and comparisons can then be performed as discussed above.
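The key invariance of this bookkeeping move — that the products $\gamma_{st} I_{it}$ entering each outcome equation are preserved for the previously non-zero entries — can be sketched as follows. The function name is ours, and we scale the precision by $u^2$ (our reading of the rescaling, so that the precision remains positive for negative $u$).

```python
import numpy as np

def anchor_rescale(gamma_t, I_t, beta_t, nu_t, r, u):
    """Bookkeeping for the reversible jump move of Appendix A.4: make equation
    r the identification anchor (gamma_rt = 1) while rescaling the rest of the
    theory-t block by u, so every existing product gamma_st * I_it is preserved.
    Note: nu_t is scaled by u**2 (our reading) so it stays positive."""
    gamma_new = gamma_t.copy()
    gamma_new[r] = 1.0
    mask = np.arange(len(gamma_t)) != r
    gamma_new[mask] = u * gamma_t[mask]
    return gamma_new, I_t / u, beta_t / u, u**2 * nu_t
```

Since each previously non-zero $\gamma_{st}$ is multiplied by $u$ while $I_{it}$ is divided by $u$, the linear predictors of all equations other than $r$ are unchanged, which is why only equation $r$'s likelihood enters the acceptance ratio.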


  1. Brock, William A., Steven N. Durlauf, and Kenneth D. West. 2003. Policy Evaluation in Uncertain Economic Environments. Technical Report. Cambridge: National Bureau of Economic Research.
  2. Chen, Ray-Bing, Yi-Chi Chen, Chi-Hsiang Chu, and Kuo-Jung Lee. 2017. On the determinants of the 2008 financial crisis: A Bayesian approach to the selection of groups and variables. Studies in Nonlinear Dynamics & Econometrics 21: 1–17.
  3. Dobra, Adrian, and Alex Lenkoski. 2011. Copula Gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics 5: 969–93.
  4. Dyrrdal, Anita Verpe, Alex Lenkoski, Thordis L. Thorarinsdottir, and Frode Stordal. 2015. Bayesian hierarchical modeling of extreme hourly precipitation in Norway. Environmetrics 26: 89–106.
  5. Fasiolo, Matteo, Yannig Goude, Raphael Nedellec, and Simon N. Wood. 2017. Fast calibrated additive quantile regression. arXiv.
  6. Gamerman, Dani, and Hedibert F. Lopes. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. London: Chapman and Hall/CRC.
  7. Green, Peter J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–32.
  8. Hastie, Trevor, and Robert Tibshirani. 1990. Generalized Additive Models. London: Chapman and Hall/CRC.
  9. Hoff, Peter D. 2007. Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics 1: 265–83.
  10. Karl, Anna, and Alex Lenkoski. 2012. Instrumental variable Bayesian model averaging via conditional Bayes factors. arXiv.
  11. Kourtellos, Andros, Alex Lenkoski, and Kyriakos Petrou. 2019. Measuring the strength of the theories of government size. In Empirical Economics. Berlin: Springer, pp. 1–38.
  12. Lenkoski, Alex. 2013. A direct sampler for G-Wishart variates. Stat 2: 119–28.
  13. Ley, Eduardo, and Mark F.J. Steel. 2009. On the effect of prior assumptions in Bayesian model averaging with applications to growth regression. Journal of Applied Econometrics 24: 651–74.
  14. Möller, Annette, Alex Lenkoski, and Thordis L. Thorarinsdottir. 2013. Multivariate probabilistic forecasting using ensemble Bayesian model averaging and copulas. Quarterly Journal of the Royal Meteorological Society 139: 982–91.
  15. Reich, Gary. 2002. Categorizing political regimes: New data for old problems. Democratization 9: 1–24.
  16. Robert, Christian, and George Casella. 2013. Monte Carlo Statistical Methods. Berlin: Springer Science & Business Media.
  17. Roubini, Nouriel, and Paolo Manasse. 2005. "Rules of Thumb" for Sovereign Debt Crises. Working Paper No. 05/42. Washington: IMF.
  18. Rue, Håvard. 2001. Fast sampling of Gaussian Markov random fields. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63: 325–38.
  19. Rue, Håvard, and Leonhard Held. 2005. Gaussian Markov Random Fields: Theory and Applications. London: Chapman and Hall/CRC.
  20. Savona, Roberto, and Marika Vezzoli. 2015. Fitting and forecasting sovereign defaults using multiple risk signals. Oxford Bulletin of Economics and Statistics 77: 66–92.
  21. Steel, Mark F.J. 2019. Model averaging and its use in economics. arXiv.
  22. Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. London: Chapman and Hall/CRC.
  23. Zellner, Arnold. 1962. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association 57: 348–68.
Table 1. Overview of variables considered in the SRI dataset.

Theory | Short Variable Name | Description
Insolvency | MAC | Market access to capital markets, dummy
Insolvency | IMF | IMF lending dummy
Insolvency | CAY | Current account balance, in % of GDP
Insolvency | ResG | Reserves growth/change in %
Insolvency | XG | Export growth in %
Insolvency | WX | Export in USD billions
Insolvency | TEDX | Total external debt to exports, in %
Insolvency | MG | Import growth, in %
Insolvency | FDIY | Foreign direct investment to GDP, in %
Insolvency | FDIG | Change in % of foreign direct investment inflows
Insolvency | TEDY | Total external debt to GDP, in %
Insolvency | SEDY | Short term external debt to GDP, in %
Insolvency | PEDY | Public external debt to GDP, in %
Insolvency | OPEN | Exports and imports over GDP, in %
Illiquidity | STDR | Short term debt to reserves
Illiquidity | M2R | M2 to reserves
Illiquidity | DSER | Debt service on long term debt to reserves
Macroeconomic | DOil | Oil producing dummy
Macroeconomic | RGRWT | Real (inflation adjusted) GDP change in %
Macroeconomic | OVER | Exchange rate residual over linear trend
Macroeconomic | UST | US treasury bill
Political | PR | Index of political rights, 1 (most free) to 7 (least free)
Political | History | Number of past defaults
Systemic | Cont_tot | Number of defaults in the world
Systemic | Cont_area | Number of defaults in the region the country is part of
Table 2. Theory Inclusion Probabilities by Dependent Variable.
Table 3. Mean value of $\gamma_{rt}$ conditional on inclusion.
Table 4. Proxy Level Results for the Insolvency Theory.

Probability | Conditional Mean
Table 5. Proxy Level Results for the Illiquidity Theory.

Probability | Conditional Mean
Table 6. Proxy Level Results for the Macroeconomic Theory.

Probability | Conditional Mean
Table 7. Proxy Level Results for the Political Theory.

Probability | Conditional Mean
Table 8. Proxy Level Results for the Systemic Theory.

Probability | Conditional Mean
Table 9. Highest and Lowest Five Country/Year Pairs for the Insolvency Theory.
Korea, Rep. | 2009 | −6.157 | 0 | 4.70 | 43.2 | 0.071
Trinidad and Tobago | 1987 | 4.407 | 0 | 7.69 | 49.37 | NA
Sri Lanka | 2009 | 4.448 | 0 | 22.56 | 45.22 | 0.012
Table 10. Highest and Lowest Five Country/Year Pairs for the Illiquidity Theory.
Table 11. Posterior correlation matrix of theory indices. This table mainly shows that the indices have the desirable property of low dependence between one another.
Table 12. Comparison of out-of-sample scores for each dependent variable. This table shows that, with the exception of the default variable, there is a significant reduction in predictive loss when using a joint modeling framework. Asterisks indicate greater than 99% significance of differences based on a permutation test.
Variable | Joint Model | Single Model | Ratio Joint to Single
Unemployment | 19.992 | 22.765 | 0.878 *
Inflation | 2.362 | 2.658 | 0.889 *
Devaluation | 3.528 | 5.464 | 0.646 *