Revisiting Structural Breaks in the Terms of Trade of Primary Commodities (1900–2020)—Markov Switching Models and Finite Mixture Distributions

Abstract: This paper presents an analysis of the long-term dynamics of the terms of trade of primary commodities (TTPC) using an extended data set covering the whole period 1900–2020. Following our original contribution, we implement three time series approaches: the finite mixture of distributions, the Markov finite mixture of distributions, and the Markov regime-switching model. Our results confirm the hypothesis of a succession of three different dynamic regimes in the TTPC over the 1900–2020 period. The uncertainty characterising the long-term dynamic analysis of the TTPC appears to be better taken into account when the transition from one regime to another is assumed to follow a Markov process than when it is not. In addition, this hypothesis improves the quality of the segmentation of the time series into regimes.


Introduction
One of the main conclusions emerging from the abundant literature on the long-term evolution of primary commodity prices is that structural breaks are essential to understanding the long-term dynamics of the terms of trade of primary commodities. Empirical studies of price volatility report a high level of uncertainty, especially in research on the post-2008 boom [1]. However, this literature remains inconclusive on the identification of structural breaks. In this paper, we explore this question by implementing three time series approaches that, to our knowledge, have not been considered in this literature for detecting these breaks. We identify structural breaks as the endpoints of the time periods obtained by clustering the data (mixture distributions) or as the endpoints of the regimes (Markov switching regimes). Following our original contribution [2] to the empirical literature on the Prebisch-Singer hypothesis [3,4] of a secular decline in the terms of trade of primary commodities (TTPC), we extend our approaches to the whole period 1900-2020. The data correspond to the Grilli and Yang index, hereafter $\{GY_t\}_{t=t_1,\dots,t_N}$; see [5,6]. The three time series approaches we implement (the finite mixture of distributions, the Markov finite mixture of distributions, and the Markov regime-switching model) converge in detecting three different regimes over the 1900-2020 period.
The following three sections present the methodology and results of, respectively, a finite mixture of distributions approach (Section 2), a finite Markov mixture of distributions approach (Section 3), and a Markov switching model approach (Section 4). The last section (Section 5) is dedicated to the discussion and conclusion.

A Finite Mixture Distributions Approach
To investigate the hypothesis that the time series $\{GY_t\}_{t=t_1,\dots,t_N}$ goes through different periods over 1900-2020, we first use a finite mixture of distributions with normal components as a way of grouping similar data points (years) into clusters (which we call regimes). Clusters are represented by the component distributions of the mixture. The idea is that years exhibiting the same behaviour belong to the same cluster and come from the same distribution.
A very detailed account of the practical aspects of Markov Chain Monte Carlo (MCMC) for mixtures of distributions is given in Frühwirth-Schnatter [7]. The Handbook of Mixture Analysis [8] provides an overview of the methods of mixture modelling.

Methodology
A finite mixture of normal distributions can be defined as follows:

$$p(y) = \sum_{i=1}^{K} \eta_i \, f_i(y \mid \mu_i, \sigma_i^2),$$

where $K$ is the number of components, $\eta_i$ is the mixing weight of the $i$th component, and $f_i$ is a normal component distribution with mean $\mu_i$ and variance $\sigma_i^2$. In this approach, three kinds of statistical inference problems have to be considered:

• The number of components $K$ must be specified,
• The component parameters $(\mu_i, \sigma_i^2)$ and the weight distribution $(\eta_1, \dots, \eta_K)$ must be estimated from the data,
• Each observation of the time series $\{GY_t\}_{t=t_1,\dots,t_N}$ must be assigned to a certain component of the mixture model by making inference on a hidden indicator vector $S = (S_{t_1}, \dots, S_{t_N})$.

To estimate the parameters of the components and the weights, we use Bayesian estimation [9] with MCMC [10] and a two-block Gibbs sampling algorithm [7]:

(1) Parameter simulation conditional on the classification $S = (S_{t_1}, \dots, S_{t_N})$:
a. Sample the weights $\eta = (\eta_1, \dots, \eta_K)$ from a Dirichlet posterior $p(\eta \mid S)$,
b. Sample the variances $\sigma_i^2$ in each group $i$ from an inverted Gamma distribution $\mathcal{G}^{-1}(c_i(S), C_i(S))$,
c. Sample the means $\mu_i$ in each group $i$ from a normal distribution $\mathcal{N}(b_i(S), B_i(S))$.
The precise form of $b_i(S)$, $B_i(S)$, $c_i(S)$, $C_i(S)$ depends upon the chosen prior distribution family.

(2) Classification of each observation $y_t$ conditional on knowing $\mu = (\mu_1, \dots, \mu_K)$, $\sigma^2 = (\sigma_1^2, \dots, \sigma_K^2)$ and $\eta = (\eta_1, \dots, \eta_K)$:

$$P(S_t = k \mid y_t, \mu, \sigma^2, \eta) \propto \eta_k \, f_k(y_t \mid \mu_k, \sigma_k^2).$$

The number of components may be known or unknown. In our case, the number of components is unknown, and our model selection is based on the marginal likelihood [11]. In the academic literature, the unknown number of regimes taken into account is three at most. To determine the best model, we expand the number of potential regimes to five. We then choose the model with the largest marginal likelihood, approximated by three estimators [7]:

• RI, the estimator obtained by reciprocal importance sampling,
• IS, the estimator obtained by importance sampling,
• BS, the estimator obtained by bridge sampling techniques.
For computing purposes, we use the Matlab library Bayesf 2.0 in this publication.
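The two-block Gibbs sampler described above can be sketched in Python as follows. This is a minimal illustration, not the bayesf implementation: the function name `gibbs_normal_mixture`, the prior hyperparameters (`e0`, `b0`, `B0`, `c0`, `C0`), the starting values, and the ordering of components by mean to sidestep label switching are all assumptions made for the example.

```python
import numpy as np

def gibbs_normal_mixture(y, K, n_iter=2000, burn_in=500, seed=0):
    """Two-block Gibbs sampler for a K-component normal mixture (illustrative).

    Assumed priors (hypothetical choices, not taken from the paper):
    eta ~ Dirichlet(e0, ..., e0), mu_i ~ N(b0, B0), sigma2_i ~ InvGamma(c0, C0).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    e0, b0, B0, c0, C0 = 4.0, y.mean(), y.var(), 2.5, 0.5 * y.var()

    # Starting values: means at evenly spaced quantiles, nearest-mean allocation.
    mu = np.quantile(y, (np.arange(K) + 0.5) / K)
    sigma2 = np.full(K, y.var())
    S = np.abs(y[:, None] - mu).argmin(axis=1)

    keep = {"eta": [], "mu": [], "sigma2": [], "S": []}
    for it in range(n_iter):
        # (1) Parameter simulation conditional on the classification S.
        counts = np.bincount(S, minlength=K)
        eta = rng.dirichlet(e0 + counts)                         # (1a) weights
        for i in range(K):
            yi = y[S == i]
            c = c0 + 0.5 * counts[i]
            C = C0 + 0.5 * np.sum((yi - mu[i]) ** 2)
            sigma2[i] = 1.0 / rng.gamma(shape=c, scale=1.0 / C)  # (1b) variance
            B = 1.0 / (1.0 / B0 + counts[i] / sigma2[i])
            b = B * (b0 / B0 + yi.sum() / sigma2[i])
            mu[i] = rng.normal(b, np.sqrt(B))                    # (1c) mean

        # (2) Classification: P(S_t = i | ...) is proportional to
        #     eta_i * N(y_t; mu_i, sigma2_i); sample by inverting the CDF.
        log_p = (np.log(eta) - 0.5 * np.log(2 * np.pi * sigma2)
                 - 0.5 * (y[:, None] - mu) ** 2 / sigma2)
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        cdf = np.cumsum(p / p.sum(axis=1, keepdims=True), axis=1)
        S = np.minimum((cdf < rng.random((n, 1))).sum(axis=1), K - 1)

        if it >= burn_in:
            # Order components by mean to sidestep label switching (a simplification).
            order = np.argsort(mu)
            keep["eta"].append(eta[order])
            keep["mu"].append(mu[order])
            keep["sigma2"].append(sigma2[order])
            keep["S"].append(np.argsort(order)[S])
    return {name: np.array(values) for name, values in keep.items()}
```

Calling `gibbs_normal_mixture(series, K=3)` on a one-dimensional array returns the retained draws of the weights, means, variances, and allocations; averaging `draws["mu"]` over the first axis gives posterior mean estimates of the component means.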

Results
The results are presented in the following four sub-sections. First, we confirm the existence of three different components in the mixture. Then, we present the statistical parameters (mean, standard deviation, and weight) of each distribution, associated with the corresponding regime (regime 1: 1900-1921; regime 2: 1922-1985 and 2006-2020; regime 3: 1986-2005). The third sub-section presents a point process representation of the posterior draws. The fourth sub-section clusters the data based on the MCMC draws.
In all Monte Carlo simulations using posterior draws, we use 1,000,000 draws after a burn-in of 100,000 draws.

The Choice of the Number of Components
If $K$ is not too large, the different estimators should approximately agree. As $K$ increases, we observe that the reciprocal importance sampling and importance sampling estimators are less precise than the bridge sampling estimator, although all three select the same number of components: among the models considered (number of components ≤ 5), the model with the largest marginal likelihood is a mixture of three normal distributions.
Thus, the results (Table 1) for the mixture of distributions models support the hypothesis of three different components, as already established in our previous analysis for the 1900-2016 period.
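For reference, the three estimators of the marginal likelihood can be written in their standard textbook form. The notation below is an assumption introduced for this illustration: $\vartheta$ collects all parameters of the $K$-component model $\mathcal{M}_K$, $p^\star(\vartheta) = p(y \mid \vartheta)\,p(\vartheta)$ is the unnormalised posterior, $q$ is an importance density, $\tilde{\vartheta}^{(l)}$ ($l = 1, \dots, L$) are draws from $q$, $\vartheta^{(m)}$ ($m = 1, \dots, M$) are MCMC draws from the posterior, and $\alpha$ is a bridge function.

```latex
\begin{align*}
p(y \mid \mathcal{M}_K) &= \int p(y \mid \vartheta)\, p(\vartheta)\, d\vartheta, \\
\hat{p}_{IS} &= \frac{1}{L} \sum_{l=1}^{L}
  \frac{p^\star(\tilde{\vartheta}^{(l)})}{q(\tilde{\vartheta}^{(l)})}, \\
\hat{p}_{RI} &= \left[ \frac{1}{M} \sum_{m=1}^{M}
  \frac{q(\vartheta^{(m)})}{p^\star(\vartheta^{(m)})} \right]^{-1}, \\
\hat{p}_{BS} &= \frac{L^{-1} \sum_{l=1}^{L} \alpha(\tilde{\vartheta}^{(l)})\, p^\star(\tilde{\vartheta}^{(l)})}
  {M^{-1} \sum_{m=1}^{M} \alpha(\vartheta^{(m)})\, q(\vartheta^{(m)})}.
\end{align*}
```

Choosing $\alpha = 1/q$ in the bridge sampling estimator recovers the importance sampling estimator, and $\alpha = 1/p^\star$ recovers the reciprocal importance sampling estimator, so the first two are special cases of the third.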

The Parameters of the Mixture of Three Normal Distributions
The components of the mixture differ mainly in the mean. Components 2 and 3 have nearly the same variance, whereas the first component has a slightly higher variance (Table 2).

Point Process Representation of Posterior Draws
A scatter plot of the posterior draws is closely related to the point process representation of the underlying mixture distribution. A finite mixture distribution from a fixed parametric family has a representation as a marked point process [12]. Here, we use the point process representation of draws from the posterior density (Figure 1). Three clusters of draws can be distinguished; they scatter around the three points corresponding to the true point process representation, with the spread of the clouds representing the uncertainty in estimating the points.

Clustering the Data
We cluster the data into three groups (Figure 2) based on the MCMC draws. Three criteria are used:
• The Bayesian maximum a posteriori (MAP),
• The similarity matrix based on the posterior similarity,
• The misclassification rate.
Three regimes are confirmed (1900-1921; 1922-1985 and 2006-2020; 1986-2005). The second regime is interrupted by the 1986-2005 regime, which corresponds to the lowest level of the terms of trade of primary commodities (see Section 5). However, we observe that some years have an ambiguous cluster membership.
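To illustrate how a clustering can be read off the MCMC output, the sketch below computes a posterior similarity matrix and a simple allocation rule from a matrix of allocation draws. The function names and the input layout (one row per draw, one column per year) are assumptions for this example. The modal allocation used here is a simplification and is not the same as the MAP estimator used in the paper.

```python
import numpy as np

def posterior_similarity(S_draws):
    """Estimate P(S_t = S_s | y) as the share of draws in which t and s share a label."""
    S_draws = np.asarray(S_draws)
    same = S_draws[:, :, None] == S_draws[:, None, :]   # shape (draws, N, N)
    return same.mean(axis=0)

def modal_allocation(S_draws, K):
    """Assign each observation to its most frequent label across draws.

    Returns the labels and the corresponding membership frequencies; a frequency
    well below 1 flags an observation with ambiguous cluster membership.
    """
    S_draws = np.asarray(S_draws)
    freq = np.stack([(S_draws == k).mean(axis=0) for k in range(K)], axis=1)
    return freq.argmax(axis=1), freq.max(axis=1)
```

An observation whose membership frequency is close to 1 is classified without ambiguity, whereas intermediate values correspond to the ambiguous years mentioned above.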

A Finite Markov Mixture Distributions Approach

Methodology
In the finite mixture models approach, we assign each observation of the time series $\{GY_t\}_{t=t_1,\dots,t_N}$ to a certain component of the mixture model by making inference on a hidden indicator vector $S = (S_{t_1}, \dots, S_{t_N})$. Now, we suppose that this allocation vector is a hidden Markov chain:

$$GY_t = \mu_{S_t} + \varepsilon_t,$$

where $\varepsilon_t$ is a zero-mean white noise process with variance $\sigma^2$. This is a special case of interest of a finite Markov mixture of distributions. The transition probability matrix $T$ of the hidden Markov chain $S = (S_{t_1}, \dots, S_{t_N})$ is unknown and needs to be estimated from the data. We suppose that the Markov chain is aperiodic and starts from its ergodic distribution $\eta = (\eta_1, \dots, \eta_K)$, that is, $P(S_{t_1} = k \mid T) = \eta_k$, and the transition probability matrix $T$ is defined by $T_{ji} = P(S_{t+1} = j \mid S_t = i)$ for $i, j = 1, \dots, K$ and $t = t_1, \dots, t_{N-1}$.
What is the relation between finite mixture distributions and finite Markov mixture distributions? Every finite mixture of distributions may be considered as a limiting case of a finite Markov mixture of the same family of distributions, in which $S = (S_{t_1}, \dots, S_{t_N})$ is an i.i.d. random sequence and the transition probabilities do not depend on the current state: $P(S_{t+1} = k \mid S_t = i) = \eta_k$ for all $i$.
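The relation between the two models can be checked numerically. The sketch below computes the ergodic distribution of a transition matrix and simulates the hidden chain. It is an illustration under stated assumptions: the matrix `P` is stored row-wise, `P[i, j] = P(S_{t+1} = j | S_t = i)`, which is the transpose of the index order used in the text, and the function names and numerical values are hypothetical.

```python
import numpy as np

def ergodic_distribution(P):
    """Stationary distribution eta with eta = eta @ P, for a row-stochastic P."""
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    # Solve (P' - I) eta = 0 together with sum(eta) = 1 by least squares.
    A = np.vstack([P.T - np.eye(K), np.ones(K)])
    b = np.append(np.zeros(K), 1.0)
    eta = np.linalg.lstsq(A, b, rcond=None)[0]
    eta = np.clip(eta, 0.0, None)       # guard against tiny negative round-off
    return eta / eta.sum()

def simulate_chain(P, n, rng):
    """Simulate n states of the hidden chain, started from its ergodic distribution."""
    P = np.asarray(P, dtype=float)
    K = P.shape[0]
    S = np.empty(n, dtype=int)
    S[0] = rng.choice(K, p=ergodic_distribution(P))
    for t in range(1, n):
        S[t] = rng.choice(K, p=P[S[t - 1]])
    return S

# Hypothetical persistent chain (illustrative values, not the paper's estimates).
P_markov = np.array([[0.95, 0.05, 0.00],
                     [0.02, 0.96, 0.02],
                     [0.00, 0.05, 0.95]])
# Limiting i.i.d. case: every row equals the weight vector eta.
P_iid = np.tile([0.2, 0.5, 0.3], (3, 1))
```

With identical rows, the next state is drawn from the same weights whatever the current state, so the chain is an i.i.d. sequence and the model collapses to the finite mixture of Section 2; with a strongly diagonal matrix, the simulated states form long runs.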

Results
The results are presented in the following three sub-sections. First, we present the statistical parameters (mean and standard deviation) of each distribution, associated with the corresponding regime (regime 1: 1900-1921; regime 2: 1922-1985 and 2006-2020; regime 3: 1986-2005), and the transition probabilities from one regime to another. The second sub-section presents a point process representation of the posterior draws. The third sub-section clusters the data based on the MCMC draws.

The Parameters of the Markov Mixture of Three Normal Distributions
The components of the mixture differ mainly in the mean but have nearly the same variance (Table 3). The transition probabilities $T_{11}$, $T_{22}$, $T_{33}$ are high (Figure 3, Table 4), which indicates that it is difficult to switch from one regime to another.

Point Process Representation of Posterior Draws
We observe that, this time, the clusters obtained with the point process representation of posterior draws for the Markov finite mixture (Figure 4) are well separated and less dispersed than the clusters obtained with the finite mixture of distributions. The shapes of the clusters are also different.

Clustering the Data
We confirm the existence of the three regimes previously found (1900-1921; 1922-1985 and 2006-2020; 1986-2005). This time, all the years have a perfect cluster membership. The periods of the regimes are well defined (Figure 5).

A Markov Switching Model Approach

Methodology
A finite Markov switching (MS) model assumes that the dynamics of a data series $\{y_t\}_{t=t_1,\dots,t_N}$ depend on a discrete latent variable $S_t$, postulated to follow a Markov chain with realizations in $\{1, \dots, K\}$. This model was popularized by Hamilton [13,14], who applied the Markov-switching approach to model the probability of a recession in the U.S. economy. In this model, the economy alternates between two unobserved states of economic expansion and recession according to a Markov chain process. The model assumes constant transition probabilities for the unobserved states, which, in turn, imply constant expected durations in the various regimes. A general representation writes the observed series as the sum of a regression part with fixed effects, a regression part with random effects, an autoregressive part, and an error term, where:
• $y_t$ denotes the observed series,
• $X_t^{f,i}$ are the independent regressors with fixed effects,
• $X_t^{r,i}$ are the independent regressors with random effects,
• $y_{t-i}$ are the variables representing the autoregressive part of the model,
• $\varepsilon_t$ are independent variables with $N(0, \sigma^2_{\varepsilon,S_t})$ distribution,
• $S_t$ is modelled by a homogeneous Markov chain with $K$ states.
Eng. Proc. 2021, 5, 34
We consider only the case where there are no fixed or random effects and no autoregressive part in the model.
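The link between constant transition probabilities and constant expected durations can be made explicit. Writing $D_i$ for the number of consecutive periods spent in regime $i$ (a symbol introduced here for illustration) and $T_{ii}$ for the probability of staying in regime $i$, the duration of a homogeneous chain is geometrically distributed:

```latex
\begin{align*}
P(D_i = d) &= T_{ii}^{\,d-1}\,(1 - T_{ii}), \qquad d = 1, 2, \dots, \\
E[D_i] &= \sum_{d=1}^{\infty} d\, T_{ii}^{\,d-1}\,(1 - T_{ii}) = \frac{1}{1 - T_{ii}}.
\end{align*}
```

For example, an illustrative value $T_{ii} = 0.95$ (not taken from the paper's tables) implies an expected duration of $1 / 0.05 = 20$ periods, which is why high diagonal transition probabilities go together with long regimes.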
Essentially, two computational approaches can be used for the estimation of Markov-switching models. One approach involves maximising the log-likelihood, a function of the transition probabilities, subject to the constraint that the probabilities lie between 0 and 1 and sum to unity. This can be done with the EM algorithm [15], but a non-linear programming approach [16] can also be used. We use the latter approach, as implemented in OxMetrics. An alternative approach involves using Bayesian estimators with MCMC methods.
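The log-likelihood that such an optimiser maximises can be evaluated recursively with Hamilton's filter. The sketch below is a minimal Python illustration and not the OxMetrics implementation: it assumes a switching-mean model $y_t = \mu_{S_t} + \varepsilon_t$ with regime-specific variances, a row-stochastic matrix `P[i, j] = P(S_t = j | S_{t-1} = i)`, and a chain started from its stationary distribution. The function name `hamilton_loglik` is hypothetical.

```python
import numpy as np

def hamilton_loglik(y, mu, sigma2, P):
    """Log-likelihood and filtered regime probabilities of a switching-mean model.

    Assumed model: y_t = mu[S_t] + eps_t, with eps_t ~ N(0, sigma2[S_t]).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    P = np.asarray(P, dtype=float)
    K = mu.size

    # Initial regime probabilities: stationary distribution of the chain.
    A = np.vstack([P.T - np.eye(K), np.ones(K)])
    b = np.append(np.zeros(K), 1.0)
    pred = np.linalg.lstsq(A, b, rcond=None)[0]

    loglik = 0.0
    filtered = np.empty((y.size, K))
    for t, y_t in enumerate(y):
        # Regime-conditional normal densities of the current observation.
        dens = np.exp(-0.5 * (y_t - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        joint = pred * dens              # P(S_t = k, y_t | past observations)
        lik_t = joint.sum()              # p(y_t | past observations)
        loglik += np.log(lik_t)
        filtered[t] = joint / lik_t      # update: P(S_t = k | observations up to t)
        pred = filtered[t] @ P           # predict: P(S_{t+1} = k | observations up to t)
    return loglik, filtered
```

Passing the negative of this function to a constrained numerical optimiser over the means, variances, and transition probabilities is one way to obtain maximum likelihood estimates. When all rows of `P` are equal, the value coincides with the log-likelihood of the ordinary finite mixture.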

Results
The results of a three-regime model based on the terms of trade of commodities are shown in Tables 5 and 6, and Figure 6. There is a perfect match with the previous results, notably concerning the identification of three regimes over the exact same sub-periods.

Discussion and Conclusions
The existence of different regimes appears robust to changes in the data span. Indeed, considering data from 1900 to 2010, 1900 to 2014, 1900 to 2016, or 1900 to 2020 on the same $\{GY_t\}_{t=t_1,\dots,t_N}$ index leads to the same representation, with the same break dates (1921, 1986, 2006). The approach using a Markov finite mixture of distributions and the approach using a Markov switching model give very similar results. These two methods differ essentially in their computational aspects: the former uses Bayesian estimation with MCMC, and the latter involves maximising the log-likelihood. Assigning each observation of the time series $\{GY_t\}_{t=t_1,\dots,t_N}$ to a component of the Markov mixture model by making inference on a hidden Markov indicator vector improves on the results obtained with the finite mixture model: this time, all years have a perfect cluster membership.
These three approaches applied to the extended 1900-2020 data set confirm the identification of a succession of three different dynamic regimes in the TTPC over the 1900-2020 period. The third regime (1986-2005) is still characterized by the lowest level of the terms of trade of the whole period, and the return to the second regime after 2005 is associated with a significantly higher price level (56.7% higher). Such an upward shift in primary commodity prices is unprecedented at the scale of the 20th century and calls into question the hypothesis of a secular decline in the terms of trade of primary commodities. Indeed, the entry into a higher level of prices contradicts the hypothesis of a secular decline. However, from 1900 to 2006, this decline manifested itself through a succession of regimes with a lower average level of primary commodity terms of trade, but not in a continuous way. Moreover, the 2020 data for the TTPC do not exhibit a specific pattern, leaving open the question of the effect of COVID on the long-term dynamics of primary commodity prices. Therefore, the dynamics behind the long-run evolution of primary commodities call for alternative explanations and a change of perspective.
This paper contributes to this change of perspective by considering (and confirming) the existence of three changes in regime in the long term (121 years). Yet, an operational theory of long-term dynamic regime change in primary commodities' terms of trade is still to be constructed.
Following the methodologies used in this present paper, a promising perspective appears to be the introduction of explanatory variables (such as the GDP of main countries, the share of emerging countries in the global GDP, and various indices of real interest rate and exchange rates) in a Markov switching model, in order to identify the incidence of these covariates on the dynamic regimes.