Data Driven Approach to the Dynamics of Import and Export of G7 Countries

The dynamics of imports plus exports of 226 product classes by the G7 countries between 1962 and 2000 is described in terms of stochastic differential equations. The model allows interesting comparisons among the different economies related to the compositions of the national baskets. Synthetic solutions can also be used to estimate hidden and unexploited growth potentials. These prerogatives are strictly connected with the fact that a network structure is at the basis of the model. Such a network expresses the mutual influences of different products through resource transfers, and is a key ingredient producing cooperative growth effects which can be quantified and distinguished from those generated by deterministic drifts and representing direct resource inputs. An analysis of this network, which differs substantially from those previously considered within the economic complexity approach, allows to estimate the centrality of different products in each national basket, highlighting the most essential commodities for each economy. Solutions of the model give the possibility of performing counterfactual analyses aimed at estimating how much the growth of each country could have profited from a general strengthening, or weakening, of the links in the same products network.


Introduction
Recently, the availability of extensive data-sets concerning a large number of product groups exported and imported by all nations stimulated investigations aimed at determining diversities or other complexity features of economic systems. In particular, the approaches to economic complexity [1,2] or Fitness [3][4][5][6] try to extract from the panel of yearly export data information on the degrees of diversification of national economies consistently with the specialization of the products. In all these approaches, even if dynamic considerations are the final objective, the data contained in the bipartite network connecting, e.g., countries to exported products, are used to construct indicators applying year by year. Only in a later stage these indicators are used in combination with other quantities, like gross domestic product per capita (GDP pc ), in regression or alternative analyses of dynamic character [6].
A different approach consists in considering the yearly data as a particular realization of a stochastic model process, which, once identified and calibrated, should give more direct insight into dynamics. The idea to model the time evolution of country-product networks by such processes has been put forward most recently and shown to open the possibility of insight into complexity features directly related to dynamics and not otherwise accessible [7,8]. A first step in this direction

The Model
Data for yearly exports and imports were taken from the international trade data furnished by the National Bureau of Economic Research [12] and cover a period of 39 years from 1962 to 2000. The products are classified on the basis of the Standardized International Trade Code at 3-digit level (SITC-3) and trades are reported in US-dollars. We limit here our consideration to the countries of G7 (Canada, France, Germany, Italy, Japan, the United Kingdom, and the USA) and, as a benchmark, we consider the global aggregate exports realized by all countries worldwide (note that imports and exports for each one of these countries have been recorded as imports from and export towards the rest of the world). We denote as Z c p,n the total value (in thousands of current US-dollars) of the product category p (p = 1, 2, . . . , M c ) traded (import plus export) in the year n (n = 0, 1, . . . , T with T = 38) by country c. The number of products, M c , varies from country to country and is reported in Table 1. When referring to worldwide aggregated data, the superscript c is implicitly assumed to mean world. Table 1. Values of the model parameters for the world and each country member of the G7. Errors of the calibration procedure are also reported. Exception is made for τ c , for which the errors obtained from the calibration procedure are not very meaningful since even a great change (up to 80%) in its value influences very little the results and the other calibrated parameters. As already noticed in previous papers [7,8], a key feature of the organization of economies consist in the fact that the various products, besides showing an approximate average exponential growth, have a rather stable ranking over the whole considered period. In fact, if we assign to each product p of a given country a color, the wavelength of which is proportional to the fraction of the value of product p over the whole country basket, Ω c n ≡ ∑ M c p=1 Z c p,n , averaged over the last 10 years covered by the database, a rainbow image is perceived when plotting the exports as a function of time (see Figure 1). The interpolations shown in Figure 1 resemble those of geometric Brownian motions: On average they show similar drifts and fluctuations keeping amplitude constant on a logarithmic scale. This suggests to define average growth as [10]: In Table 1 (second column), we report the estimated growth for the world and for each country. As we can see, Canada and Japan are the countries that experienced the largest average growth, while the United Kingdom is the country that grew the least among the G7.
Since yearly trade records result from variations in much shorter periods, we switch to continuous time t in year units (with t = 0 corresponding to 1962): The trade values in the year preceding time t is now indicated with Z c p (t). Thus, Z c p,n gives a discrete representation of Z c p (t) with a point for every year in the database. Taking inspiration from a former paper of Goudré et al. [10] and following Ref. [7] the minimal model describing how the values of various trades evolve in time can be written as a set of Stochastic Differential Equations (SDE): where µ c (t) represents a deterministic drift, accounting for the average growth of the exports (including the inflationary one), and η c p (t) is a multiplicative noise representing the variability of conditions faced by different products at different times. Such variable conditions may depend on many complex causes which can be both "internal" to the country itself and "external", i.e., due to dynamics in countries with which goods are traded. Finally, the coupling terms J c pp describe the shift of resources from production j to production i.   Stochastic differential equations similar to Equation (3) were used to describe other problems like population and evolutionary dynamics [13,14], portfolio strategies [11], interface growth [15], or optimal pinning of vortices by random defects in materials [16,17]. An interesting aspect of Equation (3) is the fact that, if the noise η c p (t) is correlated in time, as it is natural to expect, its combination with the coupling terms proportional to J c pp may induce a nonzero average growth, even in the absence of deterministic trends driving the single productions [10]. The deterministic drift, µ c (t), represents the average growth of the import plus exports in a given country in the absence of contributions generated by the interplay between the correlated noise and transfer terms. The time dependency comes from the need to include inflationary effects due to the fact that import-export values are expressed in the current currency of the specific year. Thus, we write: in order to separate the real average contribution to the drift,μ c , from the inflationary one, I(t), which can be read as a yearly step-wise function, the values of which are taken from the Organization for Economic Cooperation and Development (OECD) [18] and reported in Table 2.  [18], while for the missing initial 9 years period we evaluated a weighted average of the inflation of the 23 countries of highest GDP (taken from [19]). The average value over the 38 years is I = 7.92. To take into account the variability of conditions to which the production and acquisition of a given product are subjected, Equation (3) contains a multiplicative noise, η c p , that, together with the deterministic drift, lets the quantities Z c p (t) perform a sort of geometric Brownian motion. As we mention, this noise has to be correlated in time and our choice, similar to what has been done in Ref. [10], is for an exponential correlation with a characteristic time τ c which represents the typical duration of opportunity/crisis periods. Since the various products are reasonably clustered in sets facing similar external conditions, we also introduce a correlation between products. Thus, the noise has zero average, η c p (t) = 0, and its correlator reads:

Year
where σ c weights the importance of the stochastic part of the dynamics, and c c pp are the elements of the correlation matrix constructed from the variations at equal times of the different exports in the database [7] (see Section 5 for further details).
Finally, we put the coupling terms equal to: where c c pp are again the elements of the correlation matrix entering in the noise, and G c is a coupling constant that regulates the magnitude of the transfer of resources among different products. By considering the factor |c c pp | as a proxy for inverse distance, Equation (6) is consistent with the gravity law, often used for estimating transfer rates in economics [20,21]. The proportionality of J c pp to z c p guarantees that, for t → ∞, Z c p (t) ∼ z c p . The network specified by the J c pp 's is deeply different from other networks proposed in the economic complexity literature as that based on products similarities [9]. In the latter, for instance, apples and pears are strongly connected because they need the same infrastructures and undergo similar production processes. Such kinds of similarities are indirectly present in our case as a secondary effect inherent to the correlation matrix, which in turn also takes into account the fact that a fluctuation in the trade of oil at a certain time is likely going to affect the production of apples, pears, and many other products. However, an even more important feature of the matrix of transfer rates is the proportionality J c pp ∝ z c p , which strongly weights the influence of each single product on the global dynamics. Indeed, if a given product p experiences favorable conditions for growth, in force of this proportionality, part of its extra gain tends to be mostly redistributed towards nodes with larger z c p . A proper estimate of how effective this mechanism is, requires to also take into account proper notions of centrality of the directed network. The next section is partly devoted to such discussion. At the same time, given the proportionality of J c pp to z c p , the structure of our network is such that the most traded products are also the most central nodes (see Figure 2).

Food and live animals
Beverages and tobacco Crude materials, inedible, except fuels Mineral fuels, lubricants, and related materials Animal and vegetable oils, fats, and waxes Chemicals and related products, n.e.s. Manufactured goods classified chiefly by material Machinery and transport equipment Miscellaneous manufactured articles Commodities and transactions not classified in the SITC Figure 2. Representation of undirected graphs U WOR (panel (a)) and U USA (panel (b)) associated with the correspondingJ c pp = max{J c pp , J c p p } matrices. Since U c are fully connected graphs with approximately M c (M c − 1) ∼ 5 × 10 4 edges, here we reported only the 10% of the strongest links, which in turn are colored with a palette that is lighter for the weakest among these. The size of the nodes is directly proportional to the value of the ranking z c p associated with the product p (the SITC code is highlighted for the most central nodes, see Table 3), while the colors are representative of the macro-category of products illustrated in the legend.

Matrix of Transfer Rates
For each country and for the World, the directed network having products as nodes and links is quite complex. Figure 2 gives a partial and undirected visual representation of the network structure in the case of the USA and the world. The links,J c pp , of the undirected graphs, U c and U WOR , are built with a maximum criterion:J c pp = max{J c pp , J c p p }. At a first glance, the qualitative structure of the two networks appears very similar: There are few nodes (about 2% of the total number) that are central (degreewise) in the graphical representation, with the remaining nodes being connected almost exclusively to the more central ones in a fashion similar to scale-free networks [22,23]. In both cases the central nodes are given by the same categories of goods, namely oil, cars, machinery, and electronics related products. Finally, the USA seem to have an economy thoroughly dominated by machinery and electronics, and, quite surprisingly, the oil related nodes are substantially smaller than in the case of the world network.
A more quantitative analysis to highlight these analogies and differences can be performed by determining a suitable node centrality measure. The J c pp matrix has strictly positive real entries, that can be considered as the weights of the links of a full directed graph. To properly exploit such a feature, we chose to make use of the Kleinberg's authority score [24]. In Table 3, we show the top 10 commodities with respect to Kleinberg's authority: As we can see, 8 out of 10 products are present in both charts and only swap position in ranks, confirming that the two networks are very similar to each other. It is interesting to notice that the ranking of products in terms of authority score does not differ very much from that in terms of z c p . So, the structure of our network is such that the most traded products are also the most central nodes (see Figure 2). Interestingly, Sharma et al. have recently shown that a very similar structure arises in the financial network at sectoral level by using a methodology based on multi-layered networks [25,26]: In fact, their results show that there exists a one-to-one mapping between the economic size of the sectors and their centrality in the corresponding financial network. Table 3. Top 10 products in the world and USA J c pp networks with highest authority score. The value of z c p (defined in (1)) together with its ranking is also reported. In Figure 3, we plotted the empirical survival distribution function (ESDF) of the authority scores for the networks of the World and of all the G7 countries. One notices that the world's network has two nodes with a very high value (781 and 333), while the USA have only one (776). Another interesting feature pointed out by Figure 3 is the initial exponential decay of all the authority scores ESDF. This is another property that our product networks have in common with scale-free networks. All the networks of the G7 countries present a distribution steeper than the world's one. Taking inspiration from work related to the vulnerability of networks [27], steepness can be interpreted as an alternative instability indicator [28], since concentrating high values of centrality in few products will result in an exposure of the country to major risks in the eventuality of a crisis striking such sectors (negative trends will spread very easily to the rest of the nodes). These scenarios could be in principle tested with our dynamic model, through simulations of hypothetical setbacks of high-centrality products.  Table 3.

Calibrated Parameters
For every G7 country, we performed a calibration procedure in order to determine the values of the model parameters that best fit the historical data. In Table 1, we show such values with the associated errors. We note that σ c assumes values in the interval [0.1, 0.2] y −1/2 , with Canada and Japan showing the highest ones, and the world the smallest. The latter feature has to be expected since multiplicative noise should get reduced by the aggregation process. The values of the parameter G c of all the countries have comparable magnitudes. In the next section we will investigate more accurately the role played by such parameter in determining the overall average growth, λ c T , of every country. All the values ofμ c are of comparable magnitude and are substantially lower than the contribution, which can be ascribed to average inflation. This is approximately 0.08 y −1 (see Table 2 in the Section 5). Quite interesting is the case of the UK, the only country showing a negative value for the deterministic driftμ c : This is probably due to the fact that the UK, in the second half of the twentieth century, did not manage to fully exploit the relationship with its trade partners as well as the other countries did. Nevertheless, its inner trade network is still good because it shows an average growth similar to that of the other countries.

Counterfactual Analysis and Optimization
Our model allows to distinguish three distinct contributions to the overall growth. In fact, if we integrate Equation (3) and average over the products we obtain: On top of the two deterministic contributions due toμ c and the average inflation, we have the term h c T (G c ). This results from the integration of the transfer terms and can be estimated from historical data by discrete summations. Its magnitude depends sensibly on G c , and on the interplay that the resource transfers have with the fluctuations determined by multiplicative noise. Indeed, favorable stochastic fluctuations at a local level, if properly exploited, can spread globally to the rest of the network more efficiently than unfavorable ones. In portfolio optimization, this fact leads to the explore-exploit dilemma [10] of deciding whether to exploit a local opportunity (of amplitude σ c and expected duration τ c ), or to move towards possibilities offered by other nodes in the network by transferring a percentage of the local investment (at a transfer rate speed controlled by G c ).
It is therefore interesting to study and quantify what the overall growth of the network would have been if we varied the coupling constant of the transfers rate G c . Such counterfactual analysis produces the plots reported in Figure 4: In panel a we show, for every G7 country and the world, the dependence of λ c T on G c (in logarithmic scale). These results are obtained by simulating many times the evolution of every network at fixed, historically calibrated values ofμ c , σ c and τ c , and varying only G c . For extremely low values of G c the growth is exclusively determined given by the deterministic drifts, and Equation (3) reads: Once integrated in the interval [0, T] and averaged over the products, this equation yields the relation λ c T =μ c + 1 For increasing values of G c , we see that every country produces the same kind of curve: λ c T rises until it reaches a peak for some specific value of G c , and then starts to fall for large values. Almost every country (the exception is Canada) has a curve that for extremely large values of G c goes beneath the plateau defined by the deterministic drift: This means that in conditions of frenetic transfers the contribution to the growth h c T is negative. The arrows in the plots indicate the coordinates of G c and λ c T determined from the historical data (values shown in Table 1). In order to better compare the intrinsic growth of the network, in Figure 4b we plot the same curves deprived of their drift contributions.
Clearly, every network is characterized by different peak amplitudes, and by different locations of the peaks with respect to the historically calibrated G c . For two countries (namely Canada and Japan), the plots indicate a much higher, unexpressed growth potential compared to that of the other countries. Indeed, an even mild increase of G c for them would have produced substantial extra growth. These two countries are also those with the highest historical h c T , and this is in agreement with the fact that Japan and Canada have been two emerging economies in the second half of the twentieth century that reached a well established position nowadays (indeed they became G7 members). Looking at the rainbow plots in Figure 1, we also see that for these countries numerous products were out of rank in 1962 and only through the later evolution reached a presumably more stable position in the year 2000. This is in agreement with results obtained for the calibration of the parameter σ c : As we can see in Table 1 and already remarked above, Canada and Japan are the countries with the two highest σ c among those observed. Having a great amplitude of fluctuations, together with a good organization of the transfer rates, can lead to a very high overall growth.  For the other countries, we find that the peak has an amplitude of magnitude comparable to that of the world (just slightly bigger). We also see that the values of the calibrated parameter G c are pretty close to that of the world, while those of Canada and Japan are almost one order of magnitude smaller. Moreover, we find that the value of the historically calibrated G c is rather close to the location of the maximum (as for the world). All these hints tell us that these other countries have an economy which is much more similar to that of the world, because they have been well established since the beginning of the analysis in 1962, while Canada and Japan, as previously stated, underwent big radical changes in this 39 years period that led them to become leading economies in the world scenario.
Finally, we observe that a common feature shared by all countries is that the corresponding calibrated value of G c is always on the left side of the peak. This is an indication of a conservative character of their economies, in the sense that these countries prefer the safety of exploiting the resources rather than exploring new directions of investment through transfers.

Conclusions
A dynamic description of the evolution in time of the bipartite network connecting countries to the traded goods can be a key to identify interesting complexity features of the economies and of the products. The choice of undertaking such modelization amounts to pushing further some basic points of view of the economic complexity approach [1,2]: Besides assuming that export/import panels should be sufficient to take into account intangible factors of the economies, it is postulated that specifying the composition of the trade baskets should be fully sufficient to account for their time evolution up to the information contained in the deterministic drift and the noise. The dynamic insight one can gain is independent from, and complementary to, that provided by analyses of the Fitness complexity approach [3][4][5][6]. Here we tried to give an account of these features and of the perspectives they are opening, by considering the application of our stochastic differential system of equations to the case of the G7 countries and to the data aggregated for the whole world. The relevant emerging aspects are related to the novel network structure underlying the dynamics of transfers between different productions and to the possibility of performing synthetic simulations and counterfactual analyses. The preliminary results shown should provide a clear indication of the information one can expect to extract by a careful analysis of the networks and of the model dynamics.

Correlation Matrix
The first step in building the correlation matrix is to evaluate the yearly logarithmic returns: which we standardize exploiting the data of all the available years: The correlation between the products p and p is then defined by: r c p,n r c p ,n .

Calibration
In this section, we provide details regarding the calibration procedure that allowed us to find the values of the parameters (shown in Table 1) that best fit the historical data. To the purpose of better readability, here we will drop the · c superscript: Each quantity will be intended as country specific.
The calibration procedure follows mainly the one presented in Ref. [7], with the only difference lying in the approach to estimate the noise parameters σ and τ. The first parameter we are going to calibrate is G. Dividing Equation (3) by Z p (t) and integrating in the time interval [n 1 , n 2 ] (n 1 and n 2 integers), we obtain: f p (n 1 , n 2 ) = Gg p (n 1 , n 2 ) +μ + 1 n 2 − n 1 where we introduced the functions Integrals involving Z p (t) and I(t) are approximated in terms of summations because of the discrete nature of the data at our disposal.
Equation (12) establishes a linear relation between these two functions, therefore we can perform a linear regression of the scatter plot f vs. g to estimate the value of G. As explained in [7], the random source makes the points with coordinate g p close to zero not reliable for the calibration of G. By calibration of synthetic histories with known G c , we found that considering the 1/10 of the data with highest |g p | value in the regression will provide us with the correct G value with a confidence level of the 10%. The intercept obtained with the regression is a first estimate of the driftμ, however, since we excluded the points with lower |g p |, it is not a very accurate one. After calibrating σ and τ we will be able to perform a more accurate calibration of the drift.
In order to calibrate these two parameters, we first rearrange Equation (12) by moving to the left side the Gg p term, and we evaluate the variance of the resulting equation. Given n 2 = n and n 1 = 0 we obtain: v(n) = n 2 Var [ f i (0, n) − Gg i (0, n)] = 2σ 2 n + τ e −n/τ − 1 (15) We find the values of the parameters by fitting the empirical variance with the function on the r.h.s. of the equation. Exploiting the fact that the characteristic time is short compared to the time interval of 39 years, we find that the exponential term e −n/τ → n τ 0. As a consequence, we perform a linear regression analysis of Equation (15), neglecting data for which e −n/τ /v(n) < 1%. Eventually, we calibrate the remaining parameterμ through repeated synthetic simulations (see below) of the system of equations deprived of the deterministic driftμ itself. We can in fact exploit the growth Equation (7), which in this case reads as λ * T = h T (G) + ∑ T t=0 I t . Thus, one can find the value ofμ that reproduces correctly the growth from the historical data by difference:

Numerical Integration
Equation (3) is a stochastic differential equation (SDE) characterized by a colored Gaussian noise that evolves according to: where ρ ≡ e − dt/τ and ξ p (t) is a zero-mean Gaussian noise with correlation ξ p (t)ξ p (t ) = c pp δ(t − t ). The Cholesky decomposition (see for instance [29]) of the matrix C ≡ c pp allows us to obtain a noise with such properties. In detail, we perform the C = LDL T decomposition, which is ideal in this case because of the relatively large size of the matrices involved (it does not require the evaluation of any square roots in the diagonal terms, which, if excessively small, could become negative because of computing precision issues). This decomposition provides us with a vector of correlated Gaussian noises ξ starting from a vector of independent Gaussian noises ξ : Substituting the correlated noise in Equation (3) and discretizing gives us: where we introduced the Wiener processes ∆W p,t = √ ∆tξ p,t and the two coefficients: With the assumption ∆t τ, different integration prescriptions yield the same results. In all the numerical integrations performed we chose ∆t = 0.001, which satisfies such a condition. We chose a second order Runge-Kutta scheme for the Itô prescription, suited for systems of equations and characterized by both strong and weak convergence of order 1, which is thoroughly explained in [30].