Modeling the Yield Curve of BRICS Countries: Parametric vs. Machine Learning Techniques

: We compare parametric and machine learning techniques (namely: Neural Networks) for in–sample modeling of the yield curve of the BRICS countries (Brazil, Russia, India, China, South Africa). To such aim, we applied the Dynamic De Rezende–Ferreira ﬁve–factor model with time–varying decay parameters and a Feed–Forward Neural Network to the bond market data of the BRICS countries. To enhance the ﬂexibility of the parametric model, we also introduce a new procedure to estimate the time varying parameters that signiﬁcantly improve its performance. Our contribution spans towards two directions. First, we offer a comprehensive investigation of the bond market in the BRICS countries examined both by time and maturity; working on ﬁve countries at once we also ensure that our results are not speciﬁc to a particular data–set; second we make recommendations concerning modelling and estimation choices of the yield curve. In this respect, although comparing highly ﬂexible estimation methods, we highlight superior in–sample capabilities of the neural network in all the examined markets and then suggest that machine learning techniques can be a valid alternative to more traditional methods also in presence of marked turbulence.


Introduction
The term structure of interest rates, whose graphical representation is given by the yield curve, describes the relationship between market interest rates and different times to maturity, and provides an ex-ante measure of the investor's return in a fixed income market (Saunders and Cornett 2014). Besides, the yield curve contains fundamental information to analyze the economic and financial situation of a country, which can be interpreted in terms of market expectations of monetary policy, economic activity and inflation over short, medium and long-term horizons; for this reason it is often employed to support macroeconomic strategies. Modeling it is therefore fundamental for financial economists and risk managers to define hedging and pricing strategies, as well as to get an effective assessment of portfolio risk (Pereda 2009). Furthermore, yield curves can also provide valuable information as input for financial stability, and banking supervision. Besides, once a nominal yield curve is computed, a term structure of real interest rates and break-even inflation rates can be derived.
When estimating yield curves, an important challenge is that they should reflect as many as possible relevant movements in the underlying term structure of interest rates. In the past decades an extensive literature has been developed accordingly: models based on stochastic processes (Hess 2020), methods relying on splines (Filipović 2009), factor models (Diebold and Rudenbusch 2017) and techniques based on machine learning algorithms (Lopez De Prado 2018), to cite more relevant research strands. Shifting our attention to practical applications, Chakroun and Abid (2014); Ullah and Bari (2018) pinpointed that the ability to replicate yield curves dynamics and stylized facts, as testified by recent works of Rosadi et al. (2011), Vela (2013), Posthaus (2019) and Suimon et al. (2020). The FFNNs capability to manage the ex-ante uncertainty turns out to be of paramount importance within the BRICS bond market where FFNNs are asked to identify the functional form of the yield curve as well as to overcome the limitations of parametric models in presence of multiple humps. Furthermore, the possibility of customizing FFNNs settings for each country, in order to achieve improvements in their fitting ability for all markets is without any doubt an advantage of using this technique.
In short, our work will try to address the following instances. (i) We try to offer a comprehensive view of the yield curves of the BRICS countries through techniques of consolidate use on developed countries markets. (ii) Applying them on the BRICS, that is a set of five emerging countries, in our opinion, should give proof that the results are general and not data-dependent. (iii) An in-depth study of the fitting abilities of those techniques on BRICS yield curves, would also help to better address the forecasting issue for future research.
The remainder of the paper is organized as follows. Section 2 provides a brief description of both the De Rezende-Ferreira model and the neural architecture employed in our simulation; Section 3 contains the results and their discussion; Section 4 concludes and offers some suggestions for further research.

The Five Factor De Rezende-Ferreira Model
The De Rezende-Ferreira model assumes to estimate the value RF of the zero-coupon spot rate depending on a 5-parameter vector β and a two dimension array τ to model the set of humps observable on the yield curve dynamics: where t is the estimation time, m k is the maturity over a set of N possible values m 1 , m 2 , . . . , m N that can be both fractions and multiples of the year, β = (β 0 , β 1 , β 2 , β 3 , β 4 ) and τ = (τ 1 τ 2 ) are the parameters and decay terms vectors, respectively. In particular, β 0 represents the impact of the long-term component which is constant for every maturity; β 1 and β 2 are the weights associated to the short-term components, β 3 and β 4 are the weights of the medium-term components. Additionally, τ 1 and τ 2 control the convergence speed of the exponential components and determines the maturity at which the medium-term components reach their maximum. By construction, there is a trade-off between the values of τ 1 , τ 2 and the effectiveness of the model at long/short term maturities. Big values of τ j (j = 1, 2), in fact, result in a slow decay and ensure a better fit at long maturities but not in the short term if marked curvatures are present; conversely small values of τ j (j = 1, 2) get a quick decay and hence a better fit at short maturities, but not in the long run. Finally, m k is the error term for all the maturities m k , k = 1, . . . , N:, with: In our study we used a slight modification of the model discussed in De Rezende and Ferreira (2008). We consider the decay components as time-varying parameters and we run a two-step estimation procedure that at each time t finds the optimal pair (τ * 1t ,τ * 2t ) as the one associated to the lowest Root Mean Square Error (RMSE). This optimal pair is then employed to getβ * (t) = (β * 0t ,β * 1t ,β * 2t ,β * 3t ,β * 4t ) . In contrast to an a priori selection of decay terms, this procedure provides the model with the highest adaptive capability, that is a very desirable feature to use in such a turbulent context like that of the BRICS markets.
For an easier understanding, on following we describe the main steps of the procedure.

1.
For each market we define the sets Ω j = {m j,k } k=1,...,N j of maturities m j,k with j = 1, 2 and N j equal to the sets cardinality. In particular, m 1,1 is the lower bound of Ω 1 (m 1,L ) and corresponds to the first available maturity of the market, while the upper bound m 1,U is, at the same time, the lower bound of Ω 2 , that is m 1,U = m 2,L and it is equal to the straddling maturity between the short and medium-term period. Finally, the upper bound of Ω 2 (m 2,U ) is the longest observed maturity. In our study we set m 2,U = 30 years, as in general there aren't any bonds traded for longer maturities in the analyzed markets. Values in Ω 1 and Ω 2 ranges between corresponding lower/upper values by proper step sizes ∆ 1 and ∆ 2 . As the step size can affect the overall performance of the procedure, we tried various step sizes in the range [0.25, 0.75] for ∆ 1 and [0.25, 1] for ∆ 2 . After extensive simulations we set ∆ 1 = 0.75 and ∆ 2 = 1.

2.
For each maturity m j,k in the sets Ω 1 and Ω 2 we estimated the parameters τ 1 (t) and τ 2 (t) that maximize the medium term component: In this way we get as many curves as the number of maturities.

3.
For each time t in the time horizon of length T and for every maturity m j,k , k = 1, . . . , N, keep τ 1 (t) constant and vary τ 2 (t) to estimate by OLS different array setsβ(t); choose then the setβ * (t) associated to the lowest Sum of Squared Residuals (SSR): with y(t, m k ) and RF(t, m k ,β(t), τ(t)) being the observed and fitted spot rates respectively.

4.
Repeat Step 3 for all τ 1 (t) so that there are as many sets of optimal parametersβ * (t) as the τ 1 values. Then, select the set with the lowest SSR and fit the yield curve at the desired time t:

5.
Repeat Steps 3-4 for each time t (t = 1, . . . , T), to get the set of T yield curves fitting and the related time series for all the model parameters.

Feed Forward Neural Networks
Feed Forward Neural Networks (FFNNs) are non-linear regression tools which do not require any a priori assumption about the functional form or statistical properties of the data set under examination and can be used to identify and model nonlinearity in the data (Hornik et al. 1989). They are made of information processing units (nodes or neurons) arranged into one or more interconnected layers. Figure 1 shows a graphic representation of an artificial neuron: each input vector x = (x 1 , . . . , x n ) is processed by a linear combiner (Σ) through the use of a weight vector w = (w 1 , . . . , w n ) , and hence transformed with the aid of a proper activation function (usually the sigmoid). At this stage a bias value can be inserted to delay the triggering of the activation function.
Activation function z j Output . . . Nodes are grouped in three types of layers: the Input layer with nodes supplying the input features to next layers consisting of one or more Hidden layers which, in contrast to Input/Output layers, are not in direct contact with either the network input or the output. The response value (z j ) is sent either to another hidden layer (if any) or to the Output layer that is the neurons layer which generates the final response of the network.
In order to obtain the desired output, the network must undergo a learning process which identifies the optimal weights configuration (Wilamowski and Irwin 2011): increasing or decreasing the weights values by means of proper learning algorithms changes the strength of nodes connections and directly affects the network capability to learn the input space features. The FFNN makes use of a supervised learning algorithm: information and correct target results are available and presented to the network which tries to define the optimal weights configuration, so that the network response is as close as possible to the correct output. In detail, we used the Backpropagation Algorithm-BPA (Rumelhart et al. 1986). This is an iterative procedure through a certain number of cycles (epochs), each including two phases: a forward stage with the network, initialized at random, generating the output signals (responses), and a backward phase. In the latter the network responses are compared to the target values, the error is back-propagated from the output layer through the hidden ones towards the inputs, and then used to update the networks coefficients to reduce the error at the end of the next forward phase. The process goes on until the optimal combination of weights that minimizes a loss function (e.g., the Mean Squared Error-MSE) is determined according to the gradient descent criterion, so that the estimated output values are the closest to the target output.

Data
Our dataset consists of daily returns for government zero-coupon bonds (ZCB) of the BRICS countries, as summarized in Table 1, where for each country we provided the Start and the End of the observation period, the overall number of observations and the dataset source, that is either TRD (Thomson Reuters Datastream) or CBR (The Central Bank of the Russian Federation). As previously said, different maturities could be used to represent the yield curve, depending on the liquidity conditions and availability of information of the analyzed market: for example Diebold and Li (2006) examined maturities from 3 to 120 months (US), while De Rezende and Ferreira (2008) focused on maturities in the range 1 to 60 months (Brazil), and Caldeira et al. (2010) from 1 to 33 months (Brazil). In this work we considered maturities from 3 months (i.e., 0.25 of the year) to 30 years. Combining the available maturities with the observed time we obtained a tensor whose number of rows is equal to the number of analyzed days and with the number of columns equal to the number of maturities. In this way, for each maturity it is possible to observe the evolution of the spot rates time-series: Figure 2a provides an example on the Chinese market with the maturity set at 2, 7 and 15 years respectively; moreover for each day it is possible to extract the yield curve varying the maturities: in Figure 2b we give an example on the Chinese market with t set to 24/01/2005, 09/07/2007, 23/10/2014 and 30/12/2020.  Furthermore, varying both t and m for each country it is also possible to build a 3D surface chart as shown in Figure 3 for all the BRICS, where the x-axis reports the time t, the y-axis shows the maturity m expressed in fractions (multiples) of the year and the z-axis the observed level of the interest rate at each time t and for each maturity m.
Looking at the shapes of the yield surfaces in Figure 3 suggest that they are the result of quite different instances, that is they have been affected not only by external conditions such as the global crisis or the drop in the commodities demand, but also by internal drivers, such as policy decisions, unemployment and recession in general. In the case of Brazil, (Figure 3a), for instance, pronounced spikes are observable since 2015, when the country began to be interested by a crippling two-year recession (in 2015 and 2016) which was only partially recovered in next years. The booms and busts of the yield surface after 2017 bear witness of the uncertainty dominating the market in those years. In the case of Russia (Figure 3b), on the other hand, the highest turbulence in the yield surface corresponds to the period between 2014 and 2016, when the Russian economy suffered from a currency crisis caused by the collapse of oil prices and the country's engagement in the conflict with Ukraine. In the case of India (Figure 3c), instead, the mostly flat surface is due to the persistence of economic regression during the whole 2010s. Notably, we can observe an inverted shape of the yield surface corresponding to the period 2012-2014, when India underwent the worst slowdown of the decade in the manufacturing and mining sectors, both of which labour intensive crucial sectors for the growth of other sectors. Conversely, at the end of 2020 the surface turned to a normal behaviour, with lowest (higher) interest rates associated to lower (higher) maturities; how much this stage is solid will depend on the country capability in replying to ongoing inflation of fuel, food prices as well as rising urban unemployment. Similar considerations can be extended to the discussion of the yield surface of South Africa in Figure 3e: in the period 2011-2015, the surface is extremely flat, in connection to higher turbulence in the markets. In this period, in fact, like all economies dependent on commodity prices, the South African economy has been exposed to two highly correlated external shocks: the slowdown in Chinese demand for commodities, which has become the leading destination for South African exports in recent years and the ensuing decline in iron ore. Finally, if we turn the attention to Figure 3d, we can observe that highest variability corresponds to the period from June 2007 to September 2010, when the Chinese financial market faced the Global Financial Crisis, while the flattening of the surface in the following years is due partly to the trade war with the USA and partly to central government driving the country's transition towards an economy led by consumption and services, rather than one driven by exports and investment.
Overall, we can preliminary conclude that BRICS countries offer the ground for testing the effectiveness of fitting methods in presence of sensitively varying conditions.
For each country data we then run simulations with the 5F-DRF according to the guidelines given in Section 2.1, while for what is concerning the FFNN, the final output was obtained partitioning the countries market data into training (70%), validation (15%) and testing (15%) sets. We defined the number of input, hidden and output neurons depending on the analyzed country; in particular, the quantity of I/O nodes corresponds to the number of available maturities. Regarding the number of intermediate layers and nodes, as there is no a precise rule to select their best combination and the choice is data-dependent (Lantz 2019), we followed a trial and error approach analyzing different configurations (i.e., with one or more hidden layers). The best performance in terms of determination coefficient R 2 was obtained with the network architectures summarized in Table 2 for each BRICS country.
All the FFNNs were trained with the backpropagation learning rule; the learning process was cut after 1000 epochs, i.e., after presenting each training set to the network 1000 times. The fitting accuracy of both models was evaluated comparing the Mean Square Error (MSE) and Root Mean Square Error (RMSE) of both the 5F-DRF model and the FFNN.

Comparison of the 5F-DRF and FFNN Models Fitting Performances
In this section we present the results of the empirical estimation of the term structure. The analysis was carried out using R 4.0.4 and a freshly-new R package (Castello and Resta 2019) with estimation routines implementing the 5F-DRF model, while the MATLAB R2021a (9.10.0) Neural Network Toolbox was employed to run the FFNNs. All the code is available for download in the Zenodo repository (https://zenodo.org/record/5814658 (accessed on 3 January 2022)). Table 3 shows the average estimated parameters values for the time-varying 5F-DRF. The estimated values emphasise the different role played in the yield curve of BRICS by short/medium and long term components. The value of Brazilian β 0 , which addresses the long-term effect, is greater than those of Russia and South Africa by 35% and 24%, and by 96% and 366% than those of India and China, respectively. Clearly, these values reflect the different perception of long-term expectation in the observed countries. Again, if we turn to parameters associated to short term components of the yield curve (β 1 and β 2 ), we observe that they differ significantly from one country to another: the scale of values is in the (negative) tenths for Russia, in the dozens for South Africa, in the hundreds for Brazil and China with alternate signs, in the thousands for India. Although so different in scale, values for Brazil, India and China when compared to corresponding β 0 , β 3 and β 4 highlight the importance of short-term expectation in the yield curve of those countries. This aspect is more shaded in the case of South Africa and ever more so for Russia. Midterm parameters (β 3 and β 4 ) are in turn different in scales in various BRICS countries, with higher values associated to India and China, and lower values corresponding to Russia. Overall the estimated values point out how, despite BRICS are often referred as a compact group of countries, the underlying financial and economic drivers substantially differ and they are reflected in the behaviour of the yield curve.
Moving to the results comparison, we obtained a very accurate fit with both the models, as it can be seen in Figure 4, showing the average yield curve against the average fitted ones generated by both the 5F-DRF and the FFNN. At first sight it seems not possible to distinguish the average observable curve from the fit provided by both the 5F-DRF and the FFNN. Besides, a common feature for the two methods seems the capability to fit curves with quite different shapes as well as in different variation ranges. Indeed, despite the fact that all the curves appear to share similar shapes, that is increasing as function of the maturity, the slope of the average yield curve of South Africa in Figure 4e grows faster than in other cases. This curve, in fact, varies in a range of five percentage points while the range of variation of the other yield curve is much smaller. Conversely, the average yield curve of India, in Figure 4c is the flattest one with a range of variation of only 1.2 percentage points. Moreover, both the methods were able to capture different variation ranges: in the interval [9, 11] for Brazil, [6,9] for Russia, [7, 8.2] for India, [2.5, 4] for China and [6,11] for South Africa. Our first conclusion is that the analysis of the average yield curves does not provide sufficient evidence to uncover which is the best fitting method. Indeed, the lesson learned from this first exploration into the results is that both methods work very well in keeping very different behaviors. In Figure 5, in fact, we offer a direct comparison on the average observable yield curve of the BRICS countries where it is possible to look at the difference in both the variation range and the shape of the curves, as we have already highlighted in the above rows. This feature makes the fitting capabilities of the two methods even more valuable because they were able to capture the dynamics of all the curves, despite the difference in both the maturity spectrum and the steepness of the curve. In search of more clues, we then turned to analyze the average residuals that are shown in Figure 6. The average residuals are very close to 0 for all the BRICS countries: the range of variation is between ±2 × 10 −3 for Brazil and Russia while for India, China and South Africa it lies between ±2 × 10 −2 . These results highlight how both models are characterized by excellent average fitting capabilities. However FFNNs residuals are lower at least by a factor of 100 compared to those of the 5F-DRF model which are characterized by higher oscillations. Taking for example the case of China at the maturity 1 and 30, for m = 1, the errors are equal to 1.3 × 10 −3 and −3.7 × 10 −7 for the 5F-DRF and FFNNs respectively, and equal to 7.7 × 10 −4 and −1.2 × 10 −6 for m = 30.
To evaluate and compare the models fitting abilities we also calculated the MSE and RMSE for the observed yield curves for each maturity, reporting main statistics in Table 4.
Further evidence of the higher fitting abilities of FFNNs is given in the 3D plots shown in Figures 7 and 8 which represent the residuals surface generated by the 5F-DRF and FFNN models respectively, obtained varying both the time t (x-axis) and the maturity m (y-axis). For the FFNN the error surface also includes a zoomed-in area to highlight the residuals magnitude otherwise unobservable at the same scale used to monitor the errors behaviour in the 5F-DRF. In principle, we can conclude that both models provided very good fitting performances. The residuals did not exhibit systematic behaviour, i.e., neither incorrect zeroing or cyclical behaviour caused by wrongly specified models; they took absolute values of very small magnitude with fairly rare spikes mainly concentrated on periods characterized by greater market volatility. In this regard, the error surface of both models contains additional clues leading us to the following conclusions: • (a) FFNNs perform better than the 5F-DRF model for in sample fitting of the yield curve of the BRICS countries. • (b) The reason of (a) is in a better adaptability of the FFNN to both internal and external shocks.
The surface generated by the parametric model (5F-DRF), in fact, highlights that more pronounced residuals values are associated to higher market turbulence, as discussed analyzing the yield curve surfaces in Figure 3. This is, for instance, the case of Russia, with maximum values in the error surface of the 5F-DRF corresponding to the period between 2014 and 2016. Again, if we turn the attention to Figure 7d, we can observe that maxima in the error surface correspond to the period from June 2007 to September 2010, when the China was struggling with the Global Financial Crisis. In the case of India (see Figure 7c), instead, we can observe that the whole error surface is disseminated by spikes, mainly concentrated at lower maturities. This clearly indicate an underestimation (overestimation) of the observable values; as a matter of fact, the behaviour of the Indian yield curve at various maturities was a bit trickier to fit than for other countries in the sample: the economic regression probably affected the volatility of financial markets. Similar remarks hold also for South Africa and its error surface in Figure 7e: error peaks raised at the maximum level once again in connection to higher turbulence in the markets. The lone voice in this roundup comes from Brazil, with a flat error surface for the 5F-DRF as it can be seen in Figure 7a, and more pronounced spikes (however not exceeding the range ±0.001) in the zoomed-in area of Figure 8a, during the period 2015-2018. In general, we can conclude that 5F-DRF model suffers for underestimation (overestimation) issue when used to interpolate curves with pronounced oscillations at certain maturities. These are common for parametric model, and even the use of a more efficient parameters estimation can only reduce the error but not at the levels provided by non-linear black box technique like the FFNN. At the same time, in fact, FFNNs provided evidence of being a more flexible tool for in-sample fit of the term structure of interest rates: the residuals generated by the FFNNs are considerably lower, than those of the 5F-DRF and less influenced by the market turbulence in every analyzed situation. They therefore denote a greater ability to replicate almost all the patterns exhibited by the term structure curve, even in cases of strong market fluctuations or downturn.

Conclusions
Motivated by the important role played by the term structure of interest rates we investigated and compared the in-sample fitting abilities of two distinct methods, the Five Factor De Rezende-Ferreira (5F-DRF) model and Feed-Forward Neural Networks (FFNNs) when applied to the data of BRICS countries. For the parametric approach we discussed the use of time-varying decay terms to ensure more flexible parameters and hence get higher interpolating performances. Focusing on in-sample rather than out of sample was not limited, in our opinion, as bond prices reflect market participants' views on interest rate levels in a forward-looking way.
The results may be analyzed under two reading keys. First, we offered a comprehensive study of the BRICS yield curves all at once. In this respect, we outlined how despite BRICS countries are often viewed as a compact set, this is not true, as reflected by the different behaviour of related yield curves. This aspect was accurately kept by the 5F-DRF model, as it can be seen by looking at the average values of estimated parameters: the values, in fact, highlight the different weight associated to short/mid and long-term components of the yield curve. Second, we highlighted very high in-sample fitting capabilities of both the models for all the examined countries. Nevertheless although the 5F-DRF is the most flexible model in the Nelson-Siegel Family and these features have been further enhanced by way of time-varying parameters, the empirical evidence has clearly shown the superior capability of FFNN in interpolating the behaviour of the yield curve; this was also confirmed by comparing the average behaviour of monitored and interpolated yield curves as well as examining the average residuals and the residual surface of both the 5F-DRF and the FFNN. In fact, the FFNN perfectly adapted to all the typical yield curve shapes, even to the most twisted ones with multiple inflection points like in the Indian and Russian case. Moreover FFNNs efficiently replicated all the features of the examined term structures, being flexible enough to overcome the common limitations of parametric models in presence of booms and busts. The greater ability of the FFNN was confirmed by the MSE and RMSE associated to the fit, with values ranging, in the worst case, between ±0.02 for the 5F-DRF and between ±0.0002 for the FFNN. Another advantage of FFNNs was related to the estimation process which is less time-consuming than for the 5F-DRF, because it requires a lower amount of parameters during the network calibration process. The possibility of customizing FFNNs settings for each country, in order to achieve improvements in their fitting ability for all markets was without any doubt an additional advantage of using this technique.
Based on the arguments set out above, we therefore conclude that the FFNN is a better and flexible tool for the in-sample fit of the yield curve in all the BRICS markets. Nevertheless we do not underestimate some limitations of our approach and mainly the fact that we performed in-sample fitting. To such aim, future plans include the extension of our conclusions by comparing models performance out-of-sample and a deepest investigation of the potential of our procedure to estimate time-varying parameters in the 5F-DRF compared to alternative solutions discussed in the more recent literature on parametric models.