A new and stable algorithm for economic complexity

We present a non-linear non-homogeneous fitness-complexity algorithm where the presence of non homogeneous terms guarantees both convergence and stability. After a suitable rescaling of the relevant quantities, the non homogeneous terms are eventually set to zero so that this new method is parameter free. This new algorithm reproduces the findings of the original algorithm proposed by Tacchella et al. [1], and allows for an approximate analytic solution in case of actual binarized RCA matrices. This solution discloses a deep connection with the network theory of bipartite graphs. We define the new quantity of “country net-efficiency” quantifying how a country efficiently invests in capabilities able to generate innovative high quality products. Eventually, we demonstrate analytically the local convergence of the algorithm.


Introduction
In the last decade a new approach to macroeconomics has been developed to better understand the growth of countries [2]. The key idea is to consider the international trade of countries as a proxy of their internal production system. By describing the international trade as a bipartite network, where countries and products are the sites of the two layers, new metrics for the economy of countries and the quality of products can be constructed with a simple algorithm [1] that leverages the network structure only. This algorithm evaluates the fitness of countries, the quality of their industrial system and the complexity of commodities, by indirectly inferring the technological requirements needed to produce them. These two new metrics have been successfully used to describe past events and to forecast the economic development of countries and the production of commodities [3,4].
The very same approach has been applied to different social and ecological systems presenting a bipartite network structure and a competition between the components of the system [5,6]. Thus, it is natural to interpret fitness and complexity as properties of the network underlying those systems. The revised version of the fitness-complexity algorithm that we show here results in a clear and natural interpretation in terms of network properties and helps to better understand the different components that contribute to the fitness.
In the following, we first describe the original algorithm and its properties, underlining some critical issues that we solve with the revised version. Then, we define the new algorithm step by step and study its advantages in the case of countries-products networks. Finally, we propose an approximate solution and discuss its interpretation.

Algorithm definition

The original algorithm
The object of this work is the network of countries and their exported goods. This network is of bipartite type (countries and products are mutually linked, but no link exists between countries or between products) and weighted (links carry a weight $s_{cp}$, i.e., the exported volume of product $p$ of country $c$, measured in US$). Data ranging from year 1995 to year 2015 can be freely retrieved from the Web [7], though we use them after a procedure that enhances their quality [4]. Eventually, we come up with data about 161 countries and more than 4000 products, which were categorized according to the Harmonized System 2007 coding system, at the 6-digit level of coarse-graining. The weighted bipartite network of countries and products can be projected onto an unweighted network described solely by the matrix $M_{cp}$, with elements set to unity when a given country $c$ meaningfully exports a good $p$ and to zero otherwise (see Methods).
The original algorithm is defined by the following non-linear iterative map,

$$F_c^{(n+1)} = \sum_p M_{cp}\, Q_p^{(n)}, \qquad Q_p^{(n+1)} = \Big[\sum_c \frac{M_{cp}}{F_c^{(n)}}\Big]^{-1}, \tag{1}$$

with initial values $F_c^{(0)} = Q_p^{(0)} = 1$, $\forall\, c, p$. In the previous expression $F_c$ and $Q_p$ stand for the fitness of a country $c$ and the quality of a product $p$; $C$ and $P$ are the total number of countries and exported products respectively, and from the dataset we have that $C \ll P$. By multiplying all $F_c$ and $Q_p$ by the same numerical factor $k$, the map remains unaltered, so that the fixed point of the map (as $n \to \infty$) is defined up to a normalization constant. In the original algorithm this constant is chosen at each iteration $n$ such that

$$\sum_c F_c^{(n)} = C, \qquad \sum_p Q_p^{(n)} = P. \tag{2}$$

The algorithm of Eqs. (1) and (2) successfully ranks the countries of the world according to their potential technological development and, when applied to different yearly time intervals, can be used to suggest precise strategies to improve country economies. It has also been proved to give the correct ranking of importance of species in a complex ecological system [5]. Despite its success, some points can still be improved:
i. Convergence issues: As stated in a recent paper [8]: "If the belly of the matrix [Mcp] is outward, all the fitnesses and complexities converge to numbers greater than zero. If the belly is inward, some of the fitnesses will converge to zero." This means that all the products exported by the countries with zero fitness get zero quality. This is mathematically acceptable but heavily underestimates the quality of such products: even natural resources need the right know-how to be extracted, so that their quality would be better represented by a positive quantity. To cure this issue one has to introduce the notion of "rank convergence" rather than absolute convergence, i.e., the fixed point is considered achieved when the ranking of countries stays unaltered from one step to the next.
ii. Zero exports: The countries that do not export any good have zero fitness, independently of their finite capabilities.
iii. Specialized world: In a hypothetical world where each country exports only one product, different from all the products exported by the other countries, the algorithm would assign unit fitness and quality to all countries and products. Though mathematically acceptable, this solution does not take into account the intrinsic complexity of products.
iv. Equation symmetry: This is rather an aesthetic point, in that Eqs. (1) are not cast in a symmetric form.
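As a hedged illustration of the map of Eqs. (1) and (2), the following minimal Python sketch iterates it on a small toy matrix. The function name `fitness_complexity`, the toy matrix `M_toy` and the fixed number of iterations are assumptions made for this example only; they are not taken from the original work.

```python
def fitness_complexity(M, n_iter=100):
    """Original fitness-complexity map, Eqs. (1)-(2), for a binary matrix M.

    M is a list of rows (countries) of 0/1 entries (products).
    Assumption: every product has at least one exporter.
    """
    C, P = len(M), len(M[0])
    F = [1.0] * C  # initial fitness F_c^(0) = 1
    Q = [1.0] * P  # initial quality Q_p^(0) = 1
    for _ in range(n_iter):
        # Eq. (1): both updates use the values of the previous iteration
        F_new = [sum(M[c][p] * Q[p] for p in range(P)) for c in range(C)]
        Q_new = [1.0 / sum(M[c][p] / F[c] for c in range(C)) for p in range(P)]
        # Eq. (2): rescale so that sum_c F_c = C and sum_p Q_p = P
        sF, sQ = sum(F_new), sum(Q_new)
        F = [f * C / sF for f in F_new]
        Q = [q * P / sQ for q in Q_new]
    return F, Q


# Toy nested matrix (assumption): 4 countries x 6 products
M_toy = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
F_toy, Q_toy = fitness_complexity(M_toy)
```

In this nested toy example the ranking of the countries follows their diversification, since each country exports a strict superset of the products of the next one.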

The new algorithm
First, we reshape Eq. (1) in a symmetric form by introducing the variable $P_p = Q_p^{-1}$, i.e.,

$$F_c^{(n+1)} = \sum_p \frac{M_{cp}}{P_p^{(n)}}, \qquad P_p^{(n+1)} = \sum_c \frac{M_{cp}}{F_c^{(n)}}. \tag{3}$$

Now the qualities of products are given by the quantities $P_p^{-1}$ and the algorithm is trivially equivalent to the original one provided one uses the normalization conditions $\sum_c F_c^{(n)} = C$ and $\sum_p (P_p^{(n)})^{-1} = P$. Next, we introduce two sets of quantities $\delta_c > 0$ and $\delta_p > 0$ and consider the inhomogeneous non-linear map defined as

$$F_c^{(n+1)} = \delta_c + \sum_p \frac{M_{cp}}{P_p^{(n)}}, \qquad P_p^{(n+1)} = \delta_p + \sum_c \frac{M_{cp}}{F_c^{(n)}}. \tag{4}$$

Since the map is no longer defined up to a multiplicative constant, the normalization condition is not required anymore, while the initial condition can be set as in the original algorithm,

$$F_c^{(0)} = P_p^{(0)} = 1, \quad \forall\, c, p. \tag{5}$$

The parameters $\delta_c$ and $\delta_p$ can be interpreted as follows. The parameter $\delta_c$ represents the intrinsic fitness of a country. In fact, for a country $k$ that does not export any good we have $M_{kp} = 0\ \forall p$, so that its fitness is simply equal to $\delta_k$. Irrespective of its exports, any country has a set of capabilities that characterize it. The parameter $\delta_p$ is more intriguing. If no country exports a product $q$ (probably because no country produces it), the product has not been invented yet and its quality lies at its maximum value $\delta_q^{-1}$, since $M_{cq} = 0\ \forall c$. Therefore, the inverse of $\delta_q$ may be interpreted as a sort of innovation threshold: the smaller the parameter is, the higher is the quality of the product at its outset and the more sophisticated are the capabilities necessary to produce it. On the other hand, products like natural resources may be associated with a larger value of the parameter, since they require less complex capabilities for their extraction.
In order to keep the algorithm simple and parameter free as the original one, we first set a common value δc = δq = δ, then we study the dependence of the algorithm on δ and finally we set δ = 0.
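To make the inhomogeneous map of Eq. (4) concrete, here is a minimal Python sketch with a common parameter $\delta_c = \delta_p = \delta$. The function name `inhomogeneous_map`, the nested toy matrix `M_toy` (5 countries, 200 products) and the value $\delta = 0.01$ are illustrative assumptions, not choices made in the original work.

```python
def inhomogeneous_map(M, delta, n_iter=200):
    """Iterate Eq. (4) with a common parameter delta_c = delta_p = delta > 0."""
    C, P = len(M), len(M[0])
    F = [1.0] * C  # fitness, F_c^(0) = 1
    R = [1.0] * P  # reciprocal quality P_p, P_p^(0) = 1
    for _ in range(n_iter):
        # simultaneous update: both right-hand sides use the previous iterate
        F, R = (
            [delta + sum(M[c][p] / R[p] for p in range(P)) for c in range(C)],
            [delta + sum(M[c][p] / F[c] for c in range(C)) for p in range(P)],
        )
    return F, R


# Toy nested matrix (assumption): diversification 200, 160, 120, 80, 40
DIVERSIFICATION = (200, 160, 120, 80, 40)
M_toy = [[1 if p < d else 0 for p in range(200)] for d in DIVERSIFICATION]

delta = 0.01
F_fix, R_fix = inhomogeneous_map(M_toy, delta)

# Relative residual of the fitness equation at the final iterate
residual = max(
    abs(F_fix[c] - delta - sum(M_toy[c][p] / R_fix[p] for p in range(200))) / F_fix[c]
    for c in range(5)
)
```

No normalization step appears in the loop: the non-homogeneous term fixes the scale of the solution. A country with an empty row would simply end up with fitness equal to `delta`.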

Dependence on the non-homogeneous parameter
We consider $\delta_c = \delta_q = \delta$ and address the dependence of the fixed point upon $\delta$. To outline the dependence of $F_c$ and $P_p$ on the parameter $\delta$, we can use the relations defined in Eq. (4) and introduce the rescaled quantities $\tilde P_p = P_p/\delta$ and $\tilde F_c = F_c\,\delta$. After some trivial algebra we get from Eq. (4),

$$\tilde F_c^{(n+1)} = \delta^2 + \sum_p \frac{M_{cp}}{\tilde P_p^{(n)}}, \qquad \tilde P_p^{(n+1)} = 1 + \sum_c \frac{M_{cp}}{\tilde F_c^{(n)}}, \tag{6}$$

from which we deduce that, as soon as the parameter $\delta^2$ is much smaller than the typical value of the $M_{cp}$ matrix elements, i.e., much smaller than unity, the fixed point in terms of $\tilde F_c$ and $\tilde P_p$ almost does not depend on $\delta$ (see Fig. 1).

It is worth noting that the values of fitness $F_c$ and quality $Q_p = P_p^{-1}$ of the original map defined by Eqs. (1) and (2) cannot be obtained from this new algorithm when the parameter $\delta$ tends to zero. In terms of $\tilde F_c$ and $\tilde P_p$, the fitness and quality corresponding to those of the original algorithm can be expressed as $F_c = \tilde F_c\,\delta^{-1}$ and $Q_p = \tilde P_p^{-1}\,\delta^{-1}$. Since the new algorithm provides finite non-vanishing values of $\tilde F_c$ and $\tilde P_p$, taking the limit $\delta \to 0$ would deliver infinite values of $F_c$ and $Q_p$. We might think that the normalization procedure necessary in the old algorithm to fix the arbitrary constant would get rid of the common factor $\delta^{-1}$ and deliver the same values as the new method. Unfortunately, this is not the case, since the new method does not rely on a normalization procedure. Therefore, since a self-consistent normalization, i.e., a projection on the double simplex defined by Eq. (2), is missing in the new algorithm, the results cannot coincide.

Since the quantities $\tilde F_c$ and $\tilde P_p$ are well defined in the limit $\delta \to 0$, we shall focus on them only in the following. We remind that the complexities of products delivered by the original method are connected to the set of $P_p^{-1}$ and thus to the $\tilde P_p^{-1}$. In particular, the second of Eqs. (6) can be interpreted at the fixed point as $\tilde P_p = 1 + \tilde Q_p^{-1}$, with the $\tilde Q_p$ expressed as in the second of Eqs. (1), but with the tilde quantities calculated with the new algorithm. Therefore, we shall assign to $\tilde Q_p = (\tilde P_p - 1)^{-1}$ the meaning of complexity of products in our new algorithm. The differences between the old and new algorithm are depicted in Fig. 2, while the evolution of the fitnesses in time is shown in Fig. 4.
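The weak dependence on $\delta$ and the definition of the complexity $\tilde Q_p = (\tilde P_p - 1)^{-1}$ can be checked numerically. The sketch below iterates the rescaled map of Eq. (6). The function name `rescaled_map`, the toy matrix, and the starting point $\tilde F = \tilde P = 1$ are assumptions made for illustration: the rescaled image of Eq. (5) would be $\tilde F = \delta$, $\tilde P = 1/\delta$, which is not defined at $\delta = 0$.

```python
def rescaled_map(M, delta=0.0, n_iter=200):
    """Iterate Eq. (6) for the rescaled fitness F~ and reciprocal quality P~."""
    C, P = len(M), len(M[0])
    F = [1.0] * C  # starting point chosen for convenience (assumption)
    R = [1.0] * P
    for _ in range(n_iter):
        # simultaneous update: both right-hand sides use the previous iterate
        F, R = (
            [delta**2 + sum(M[c][p] / R[p] for p in range(P)) for c in range(C)],
            [1.0 + sum(M[c][p] / F[c] for c in range(C)) for p in range(P)],
        )
    return F, R


# Toy nested matrix (assumption): 5 countries, 200 products
M_toy = [[1 if p < d else 0 for p in range(200)] for d in (200, 160, 120, 80, 40)]

F_a, R_a = rescaled_map(M_toy, delta=1e-2)
F_b, R_b = rescaled_map(M_toy, delta=0.0)

# Largest relative change of the rescaled fitness between delta = 0.01 and delta = 0
max_rel_diff = max(abs(a - b) / b for a, b in zip(F_a, F_b))

# Complexity of products in the new algorithm: Q~_p = (P~_p - 1)^(-1)
Q_b = [1.0 / (r - 1.0) for r in R_b]
```

In the toy matrix the last product is exported by the first country only, so its complexity coincides with that country's rescaled fitness, while the first product, exported by every country, has a much lower complexity.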

Analytic approximate solution
Despite their symmetric shape, Eqs. (4) are not symmetric at all since, in the case of actual countries and products, the matrix $M_{cp}$ is rectangular, with the number of its rows $C$ being much smaller than the number of its columns $P$. To estimate the effect of this asymmetry, we first consider Eq. (4), in its rescaled form Eq. (6), in a mean field fashion, where each element of $M_{cp}$ is set to the average value $\bar M = \sum_{c,p} M_{cp}/CP$, and write, at the fixed point,

$$f = \delta^2 + \frac{P\,\bar M}{p}, \qquad p = 1 + \frac{C\,\bar M}{f}, \tag{7}$$

with now all $\tilde F_c$ and $\tilde P_p$ set equal to their mean field values $f$ and $p$ respectively. By setting $\delta = 0$, we find $p = 1/(1 - C/P) \approx 1 + C/P$ and $f = \bar M\,(P - C)$.

Indeed, an approximate expression for the fixed point of Eq. (6) in the regime $\delta \ll 1$ and $C \ll P$ can be derived also beyond the mean field approximation. To this end, we set again $\delta = 0$ and consider the corresponding fixed point equation associated to Eq. (6), i.e.,

$$\tilde F_c = \sum_p \frac{M_{cp}}{\tilde P_p}, \qquad \tilde P_p = 1 + \sum_c \frac{M_{cp}}{\tilde F_c}. \tag{8}$$

We observe that the quantity $D_c = \sum_p M_{cp}$, representing the diversification of country $c$, i.e., the number of different products exported, is of the order of $P$ (at least for the majority of countries). Therefore, setting $\tilde P^* = \max_p \tilde P_p$ and $\tilde F_* = \min_c \tilde F_c$, Eq. (8) implies

$$\tilde F_* \ge \frac{\min_c D_c}{\tilde P^*}, \qquad \tilde P^* \le 1 + \frac{C}{\tilde F_*}.$$

From the first estimate, $\tilde F_* \ge \mathrm{const}\; P/\tilde P^*$, and therefore, by the second estimate, $\tilde P^* \le 1 + \mathrm{const}\;(C/P)\,\tilde P^*$. As $\tilde P_p \ge 1$, we conclude that $\tilde P_p = 1 + W_p$ with $W_p$ of the order of magnitude of $C/P$ and, as a consequence, $\tilde F_c$ is of the order of magnitude of $P$.
We next compute explicitly the values of $\tilde F_c$ and $\tilde P_p$ at the first order in this approximation. The calculation of second order terms can be found in Appendix A. By using the first order approximation $(1 + a)^{-1} \approx 1 - a$ twice, from Eq. (8) we have

$$\tilde F_c \approx D_c - \sum_p M_{cp} W_p, \qquad W_p \approx \sum_c \frac{M_{cp}}{D_c} + \sum_{p'} \Big(\sum_c \frac{M_{cp} M_{cp'}}{D_c^2}\Big) W_{p'}.$$

Let now $H$ be the square matrix of elements

$$H_{pp'} = \sum_c \frac{M_{cp} M_{cp'}}{D_c^2}.$$

Letting $D^{-1}$ be the column vector with components $1/D_c$, the second relation above reads

$$(1 - H)\, W = M^T D^{-1}.$$

We now observe that $H_{pp'} \le \sum_c 1/D_c^2 \le \mathrm{const}\; C/P^2$. Therefore, the matrix $(1 - H)$ is close to the identity (the correction is of order $C/P^2$) and hence invertible (with also the inverse close to the identity). In this approximation, $W = M^T D^{-1}$, so that the rescaled (reciprocals of the) qualities of products are given by

$$\tilde P_p \approx 1 + \sum_c \frac{M_{cp}}{D_c}. \tag{9}$$

In the same approximation, we obtain the rescaled fitnesses $\tilde F_c$; since $\sum_p M_{cp} W_p = \sum_{c'} K_{cc'}/D_{c'}$,

$$\tilde F_c \approx D_c - \sum_{c'} \frac{K_{cc'}}{D_{c'}}, \tag{10}$$

having introduced the co-production matrix $K = MM^T$ with elements $K_{cc'} = \sum_p M_{cp} M_{c'p}$, representing the number of products exported by both countries $c$ and $c'$. It is interesting to note that, up to the first order approximation, the values of the fitness of countries depend on the co-production matrix only. The goodness of the approximations above can be appreciated in Fig. 3, which shows that the relative difference between the numerical values at the fixed point and the approximate solution of Eq. (10) is below 0.5% for more than 85% of the countries.
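The first-order formulas of Eqs. (9) and (10) can be compared with the iterated fixed point of Eq. (8) on a toy example. This is only a sketch under stated assumptions: the helper names `rescaled_fixed_point` and `first_order`, the nested toy matrix and the starting point of the iteration are illustrative choices, and the error levels obtained here say nothing about those measured on real trade data.

```python
def rescaled_fixed_point(M, n_iter=200):
    """Iterate the rescaled map with delta = 0 until it settles on Eq. (8)."""
    C, P = len(M), len(M[0])
    F, R = [1.0] * C, [1.0] * P  # starting point chosen for convenience
    for _ in range(n_iter):
        F, R = (
            [sum(M[c][p] / R[p] for p in range(P)) for c in range(C)],
            [1.0 + sum(M[c][p] / F[c] for c in range(C)) for p in range(P)],
        )
    return F, R


def first_order(M):
    """First-order approximation: Eq. (9) for P~_p and Eq. (10) for F~_c."""
    C, P = len(M), len(M[0])
    D = [sum(row) for row in M]  # diversification D_c
    W = [sum(M[c][p] / D[c] for c in range(C)) for p in range(P)]  # W = M^T D^-1
    R_approx = [1.0 + w for w in W]  # Eq. (9)
    # Eq. (10), written as D_c - sum_p M_cp W_p = D_c - sum_c' K_cc' / D_c'
    F_approx = [D[c] - sum(M[c][p] * W[p] for p in range(P)) for c in range(C)]
    return F_approx, R_approx


# Toy nested matrix (assumption): 5 countries, 200 products
M_toy = [[1 if p < d else 0 for p in range(200)] for d in (200, 160, 120, 80, 40)]

F_num, R_num = rescaled_fixed_point(M_toy)
F_app, R_app = first_order(M_toy)

err_F = max(abs(a - b) / b for a, b in zip(F_app, F_num))
err_R = max(abs(a - b) / b for a, b in zip(R_app, R_num))
```

The approximation needs no iteration at all: it only uses the diversifications and the co-production counts.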

Country inefficiency and net-efficiency
From Eq. (10) we deduce that the leading part of the fitness $\tilde F_c$ is given by the diversification $D_c$. The diversification of a country is indeed an important quantity, for the calculation of which we do not need any complicated algorithm. On the other hand, what the proposed non-linear algorithm does is to quantify how successfully a country manages to differentiate its products, and it indirectly offers an estimate of the capabilities of a nation. In fact, a country exporting mainly raw materials would be less efficient than a country exporting high technology goods, when they have the same diversification value. For this reason, we introduce the new quantity $I_c = D_c - \tilde F_c$, the inefficiency of country $c$: the smaller the value of $I_c$, the more efficient is the diversification the country chooses. From the approximate solution displayed in Eq. (10), we get that $I_c \approx \sum_{c'} K_{cc'}/D_{c'}$, so that the inefficiency of a country is a weighted average of its co-production matrix elements. The dependence of the country inefficiency on the diversification is displayed in Fig. 5, while a visual representation of it is displayed in Fig. 8. It is interesting to notice that a clear power-law dependence exists between the inefficiency and the diversification of a country. By indicating with $I_c = q D_c^m$ the least-squares best fit of the yearly data, we find that over the range 1995-2014, $m = 0.751 \pm 0.0029$ and $q = 0.318 \pm 0.015$.
The structure of the $M$ matrix is such that those countries with high diversification also export low quality goods on average. Therefore, a large diversification statistically corresponds to a large inefficiency, though the power law found is not trivial and depends on the structure of $M$. A similar power-law behaviour is found between the fitness calculated with the traditional method and the diversification, but with a different exponent (from the left panel of Fig. 2 we deduce that there is a power-law relation between the fitnesses calculated with the original method and with this new method, with an exponent of around 1.53; since the fitness $\tilde F_c$ calculated with the new method goes as $D_c$ at the first order, the old fitnesses go as $D_c^{1.53}$). In order to better appreciate the production strategies of countries, we subtracted the common power-law trend of the dependence of the inefficiency on the diversification for each year, changed its sign and plotted the result in Fig. 6, which thus shows the time evolution of a quantity that we call country net-efficiency $N_c$ (net in the sense opposed to gross) over the years 1995-2014. It is interesting to note that countries behave differently over the time span considered. Some countries display a decreasing net-efficiency, others an increasing or a constant one. What many of these curves have in common is the decreasing behaviour after the year 2007, i.e., the year considered the beginning of the last large financial crisis.
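The steps from inefficiency to net-efficiency can be sketched as follows. The example uses the first-order expression $I_c \approx \sum_{c'} K_{cc'}/D_{c'}$ rather than the iterated fixed point, fits the power law by ordinary least squares in log-log scale, and takes $N_c$ as the trend minus the inefficiency. The function names and the toy matrix are assumptions, and the fitted exponent of the toy data is unrelated to the values reported for real data.

```python
import math


def inefficiency_first_order(M):
    """Return (D, I) with I_c ~ sum_c' K_cc' / D_c' and K = M M^T, from Eq. (10)."""
    C, P = len(M), len(M[0])
    D = [sum(row) for row in M]
    K = [[sum(M[a][p] * M[b][p] for p in range(P)) for b in range(C)] for a in range(C)]
    I = [sum(K[a][b] / D[b] for b in range(C)) for a in range(C)]
    return D, I


def power_law_fit(x, y):
    """Least-squares fit of log y = log q + m log x; returns (q, m)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(lx) / n, sum(ly) / n
    m = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    q = math.exp(my - m * mx)
    return q, m


# Toy nested matrix (assumption): 5 countries, 200 products
M_toy = [[1 if p < d else 0 for p in range(200)] for d in (200, 160, 120, 80, 40)]

D, I = inefficiency_first_order(M_toy)
q, m = power_law_fit(D, I)

# Net-efficiency: power-law trend minus inefficiency (detrended, sign changed)
N = [q * d**m - i for d, i in zip(D, I)]
```

A positive `N[c]` marks a country that is less inefficient than the trend predicts for its diversification.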

Local convergence
From the simulations it is clear that the fixed point obtained by iterating Eq. (4) is locally stable. We can also prove it by resorting to the Jacobian of the transformation, in the case of countries and products. First we recall that the sums over the indexes $c$ and $p$ of Eq. (4) run from 1 to $C$ and $P$ respectively, with usually $C \ll P$. In the case of countries and products $C/P \approx 10^{-1}$. We also fix $\delta_c = \delta_q = \delta \ll 1$, so that the fitnesses and the (reciprocals of the) qualities at the fixed point are approximately given by $F_c = \tilde F_c/\delta$ and $P_p = \delta\,\tilde P_p$, with $\tilde F_c$ and $\tilde P_p$ the components of the vectors $\tilde F$ and $\tilde P$ given in Eq. (10) and Eq. (9) respectively.
Next, we calculate the Jacobian of the transformation at the fixed point, which can be simply expressed as the block anti-diagonal matrix

$$J = -\begin{pmatrix} 0 & M\,\mathbf{P}^{-2} \\ M^T \mathbf{F}^{-2} & 0 \end{pmatrix}, \tag{11}$$

having introduced the diagonal matrices $\mathbf{F} = \mathrm{diag}(F_1, F_2, \ldots, F_C)$ and $\mathbf{P} = \mathrm{diag}(P_1, P_2, \ldots, P_P)$ respectively.
We claim that the spectral radius $\rho(J)$ of the square matrix $J$ is strictly smaller than one. Denoting by $\sigma(J)$ the spectrum of $J$, this means that $\rho(J) := \max\{|\lambda| : \lambda \in \sigma(J)\} < 1$. From this it follows [9] that the fixed point is asymptotically stable and the convergence is exponentially fast. To prove the claim we consider the square of the Jacobian, which can be written as a block diagonal matrix,

$$J^2 = \begin{pmatrix} M\,\mathbf{P}^{-2} M^T \mathbf{F}^{-2} & 0 \\ 0 & M^T \mathbf{F}^{-2} M\,\mathbf{P}^{-2} \end{pmatrix}, \tag{12}$$

and note that the traces of the two matrices on the diagonal are the same, as follows by applying a cyclic permutation.
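Noticing that $F_c P_p = \tilde F_c \tilde P_p$ and using the approximate solutions in Eq. (10) and Eq. (9), we find with simple algebra that

$$\mathrm{Tr}\big(M\,\mathbf{P}^{-2} M^T \mathbf{F}^{-2}\big) = \sum_{c,p} \frac{M_{cp}}{(\tilde F_c \tilde P_p)^2} \approx \sum_c \frac{1}{D_c}, \tag{13}$$

which is of order $C/P$ when the diversifications $D_c$ are of order $P$. Moreover, we can write the two non trivial matrices composing $J^2$ as

$$M\,\mathbf{P}^{-2} M^T \mathbf{F}^{-2} = \mathbf{F} A A^T \mathbf{F}^{-1}$$

and

$$M^T \mathbf{F}^{-2} M\,\mathbf{P}^{-2} = \mathbf{P} A^T A\, \mathbf{P}^{-1},$$

with $A = \mathbf{F}^{-1} M\, \mathbf{P}^{-1}$. The matrices $AA^T$ and $A^TA$ are symmetric and positive-semidefinite, so that their eigenvalues are real and non negative, and the matrices $\mathbf{F} A A^T \mathbf{F}^{-1}$ and $\mathbf{P} A^T A\, \mathbf{P}^{-1}$, being similar to them, have the same eigenvalues. Therefore, the eigenvalues of $J^2$ are real and non negative and we can write, according to Eq. (13),

$$\sum_i \lambda_i^2 = \mathrm{Tr}\, J^2 = 2\, \mathrm{Tr}\big(M\,\mathbf{P}^{-2} M^T \mathbf{F}^{-2}\big) < 1,$$

with $\lambda_i$ the eigenvalues of $J$. Finally, from the preceding equation we have $\max_i \lambda_i^2 \le \sum_i \lambda_i^2 < 1$, hence $\max_i |\lambda_i| < 1$, so that at the fixed point $\rho(J) < 1$.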
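The stability argument can be checked numerically on a toy matrix by building the $C \times C$ block $M\mathbf{P}^{-2}M^T\mathbf{F}^{-2}$ of $J^2$ and estimating its largest eigenvalue, whose square root is $\rho(J)$. The sketch below works with the rescaled quantities at $\delta = 0$, which is legitimate because only the products $F_cP_p = \tilde F_c\tilde P_p$ enter the block. The power iteration, the function names and the toy matrix are assumptions of this example. The power iteration is adequate here only because the toy block has strictly positive entries.

```python
import math


def rescaled_fixed_point(M, n_iter=200):
    """Iterate the rescaled map with delta = 0 (starting point is an assumption)."""
    C, P = len(M), len(M[0])
    F, R = [1.0] * C, [1.0] * P
    for _ in range(n_iter):
        F, R = (
            [sum(M[c][p] / R[p] for p in range(P)) for c in range(C)],
            [1.0 + sum(M[c][p] / F[c] for c in range(C)) for p in range(P)],
        )
    return F, R


def jacobian_block(M, F, R):
    """B = M P^-2 M^T F^-2, the upper-left (C x C) block of J^2."""
    C, P = len(M), len(M[0])
    return [
        [sum(M[a][p] * M[b][p] / R[p] ** 2 for p in range(P)) / F[b] ** 2 for b in range(C)]
        for a in range(C)
    ]


def largest_eigenvalue(B, n_iter=500):
    """Power iteration for the dominant eigenvalue of a positive matrix B."""
    v = [1.0] * len(B)
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(bij * vj for bij, vj in zip(row, v)) for row in B]
        lam = max(w)  # v is normalised to max(v) = 1, so max(Bv) estimates lambda
        v = [x / lam for x in w]
    return lam


# Toy nested matrix (assumption): 5 countries, 200 products
M_toy = [[1 if p < d else 0 for p in range(200)] for d in (200, 160, 120, 80, 40)]

F_fix, R_fix = rescaled_fixed_point(M_toy)
B = jacobian_block(M_toy, F_fix, R_fix)
trace_B = sum(B[c][c] for c in range(len(B)))
rho_J = math.sqrt(largest_eigenvalue(B))  # rho(J)^2 = rho(J^2) = rho(B)
```

Since the eigenvalues of the block are non negative, its largest eigenvalue cannot exceed its trace, which is the quantity estimated in Eq. (13).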

Robustness to noise
Fitness and complexity (and quality) values depend on the structure of the matrix $M_{cp}$. Noise can affect its elements by flipping their value. Thus, we test the robustness of the algorithm to noise as described in [10]. The idea is to introduce random noise by flipping each single bit of the matrix with probability $\eta$, which is then a parameter tuning the noise level. The ranking of country fitnesses in the presence of noise, $R_c^{\eta}$, is then compared with the ranking obtained without noise, $R_c^{0}$. The Spearman correlation $\rho_s$ between these two sets is evaluated and shown in Fig. 7 as a function of $\eta$ for both the original and the new algorithm: the new algorithm is as stable to random noise as the original one, with an unavoidable transition around $\eta \approx 0.5$, where noise is so strong that it significantly alters the structure of the matrix $M_{cp}$.
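A minimal version of this noise test can be sketched in a few lines. The bit flipping and the rank comparison follow the description above, while everything else is an assumption of the example: the small value $\delta = 10^{-3}$ (kept non-zero so that a country left without exports by the noise still has a positive fitness), the fixed seed, the toy matrix, and the rank-difference form of the Spearman coefficient, which is valid only in the absence of ties.

```python
import random


def rescaled_fitness(M, delta=1e-3, n_iter=100):
    """Rescaled fitness from the map of Eq. (6); delta > 0 avoids zero fitness."""
    C, P = len(M), len(M[0])
    F, R = [1.0] * C, [1.0] * P
    for _ in range(n_iter):
        F, R = (
            [delta**2 + sum(M[c][p] / R[p] for p in range(P)) for c in range(C)],
            [1.0 + sum(M[c][p] / F[c] for c in range(C)) for p in range(P)],
        )
    return F


def flip_bits(M, eta, rng):
    """Flip every entry of M independently with probability eta."""
    return [[1 - m if rng.random() < eta else m for m in row] for row in M]


def ranks(values):
    """Rank positions (0 = smallest); ties are broken by index."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def spearman(x, y):
    """Spearman correlation via the rank-difference formula (no ties assumed)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def robustness(M, eta, seed=0):
    """Spearman correlation between fitness rankings without and with noise."""
    rng = random.Random(seed)
    return spearman(rescaled_fitness(M), rescaled_fitness(flip_bits(M, eta, rng)))


# Toy nested matrix (assumption): 5 countries, 200 products
M_toy = [[1 if p < d else 0 for p in range(200)] for d in (200, 160, 120, 80, 40)]
rho_no_noise = robustness(M_toy, 0.0)
rho_full_flip = robustness(M_toy, 1.0)
```

At $\eta = 1$ every entry is flipped, so in this nested toy example the ranking is exactly reversed.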

Discussion
The proposed new inhomogeneous algorithm of economic complexity defined in Eq. (4) and in Eq. (6) carries many advantages with respect to the original one. The fitnesses and complexities coming out from these two methods are not identical, but highly correlated to each other, as witnessed by the plots in Fig. 2. This high correlation between the two methods ensures that all the studies carried out with the original method so far can be reproduced by applying this new method as well.
Besides the stability of the algorithm and its robustness, one advantage of this method is that the fitness is well defined also for those countries that have low export volumes and that in the original method had their fitness tending to zero. For those countries it is now possible to undertake a comparative study based on hypothetical investments (changing the elements of the $M$ matrix) so as to make predictions on their economic impact.
By first symmetrising the original equations, then adding an inhomogeneous parameter and finally rescaling the quantities, one obtains Eq. (6), where the parameter can be safely set to zero. This ensures that this new algorithm is parameter free as the original one. As a side effect, the fixed point of the map can be well approximated analytically, with an error with respect to the iterative fixed point of less than 3% (see Fig. 3). The result is represented by Eq. (9) and Eq. (10) at the first order (Eq. (19) and Eq. (20) at the second order), which allow for a simple intuitive explanation of the complexity of products and fitness of countries.

Let us discuss Eq. (10) first. The result suggests that the fitness of a country is trivially related, at the first order, to its diversification: the more products a country exports, the larger is its fitness, i.e., the more developed its capabilities. This simple explicit dependence of the fitness on the diversification is also an advantage with respect to the original method, where the dependence was not explicitly clear. The second term of Eq. (10), which we call inefficiency, is also very interesting. If a country is the only one to export a given product, the contribution of this product to its fitness is a full one or, in other words, the contribution to the inefficiency is zero. This situation mimics a condition of monopoly on that product and it is logical that the exporting country has the full benefit of it. When a product is exported by multiple nations, then it is critical to assess whether those countries export few or many other products (see Fig. 8). If a product is exported by a country $c'$ with low diversification (low capabilities), then that product is not supposed to be of high complexity. The result is that the ratio $K_{cc'}/D_{c'}$ can be close to one ($c = 1$, $c' = 2$ in the figure) and the inefficiency associated to the common products is high, resulting in a small contribution to the fitness of $c$. The inefficiency can be interpreted in terms of the bipartite network of countries and products: $K_{cc'}$ counts the number of links that connect countries $c$ and $c'$ to the same products, while the diversification $D_c$ is the node degree of country $c$. In other words, for a country $c$ the inefficiency counts the links to the products it has in common with all the other countries and weights them according to the degree of those countries. To our knowledge, this kind of measure has never been considered in complex networks so far. Since, statistically, countries with a high diversification also export many less complex products, the inefficiency is an increasing function of the diversification (Fig. 5, main graph). If we subtract the general trend, which stems from the structure of the matrix $M_{cp}$, we can appreciate the net effect of choosing the goods to export. We call this new de-trended quantity net-efficiency. In this way we somehow remove the negative effect of less valuable products and highlight the contribution of more sophisticated goods. In the inset of Fig. 5 we show the net-efficiency as a function of diversification and highlight the three nations (Japan, Korea and Switzerland) that stand out among the others.

The complexity of products is estimated by Eq. (9) as the reciprocal of the second term of the sum. Since the diversification of a country $D_c$ is a direct measure of its capabilities, we expect to find a simple relation between it and the complexities of products $\tilde Q_p$. Indeed, if we indicate with $c_i$ those countries exporting the product $p$, for which obviously we have $M_{c_i p} = 1$, we find

$$\tilde Q_p \approx \Big(\sum_i \frac{1}{D_{c_i}}\Big)^{-1},$$

from which we corroborate the main idea that the complexities of products are driven by the countries with low diversification (capabilities) that export them. Just for amusement, we observe that the complexity of a product can be considered as the equivalent resistance of a parallel of resistors, each one with resistance $D_{c_i}$. Somehow, a high $D_c$ represents an effective resistance to the creation of a product and its export, so that if a country with a low diversification exports it, the effort (resistance) of producing that product is also low.
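A short worked example may help to read the parallel-resistor analogy of Eq. (9). The diversification values below are hypothetical and chosen only for illustration: a product $p$ exported by two countries with $D_{c_1} = 1000$ and $D_{c_2} = 10$ is compared with a product $p'$ exported by the first country only.

```latex
% Hypothetical diversifications, chosen only for illustration
\tilde Q_p \approx \Big(\frac{1}{D_{c_1}} + \frac{1}{D_{c_2}}\Big)^{-1}
           = \Big(\frac{1}{1000} + \frac{1}{10}\Big)^{-1}
           = \frac{1}{0.101} \approx 9.9,
\qquad
\tilde Q_{p'} \approx \Big(\frac{1}{1000}\Big)^{-1} = 1000 .
```

As with resistors in parallel, the smallest term dominates: a single low-diversification exporter is enough to bring the estimated complexity of the product down to roughly its own diversification.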

Construction of the M matrix
Given the export volumes $s_{cp}$ of a country $c$ in a product $p$, one can evaluate the Revealed Comparative Advantage (RCA) indicator [11],

$$\mathrm{RCA}_{cp} = \frac{s_{cp} \big/ \sum_{p'} s_{cp'}}{\sum_{c'} s_{c'p} \big/ \sum_{c',p'} s_{c'p'}};$$

in this way one can filter out size effects. As described in the Supplementary Information of [4], from the time series of the RCA we can evaluate the productive competitiveness of each country in each product by assigning to it a productivity state from 1 to 4. State 1 means that the country does not produce (or is very uncompetitive in producing) a product, while state 4 means that it is one of the main producers in the world. We can then project these states onto the binarized matrix $M_{cp}$ by simply setting its elements to unity whenever a state larger than 2 is encountered, and to zero otherwise.
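The RCA indicator itself is straightforward to compute from a table of export volumes. The sketch below evaluates it and then binarizes it with a plain threshold $\mathrm{RCA} \ge 1$. This threshold is a simplifying assumption used only for illustration: it stands in for, and is not equivalent to, the four-state procedure based on RCA time series described in [4]. The function names and the toy volumes are also hypothetical.

```python
def rca_matrix(s):
    """Revealed Comparative Advantage for export volumes s[c][p]."""
    C, P = len(s), len(s[0])
    row = [sum(s[c]) for c in range(C)]  # total exports of country c
    col = [sum(s[c][p] for c in range(C)) for p in range(P)]  # world exports of p
    total = sum(row)  # total world exports
    rca = [[0.0] * P for _ in range(C)]
    for c in range(C):
        for p in range(P):
            # entries with an empty row or column are left at zero
            if row[c] > 0 and col[p] > 0:
                rca[c][p] = (s[c][p] / row[c]) / (col[p] / total)
    return rca


def binarize(rca, threshold=1.0):
    """Simplified binarization (assumption): M_cp = 1 when RCA_cp >= threshold."""
    return [[1 if v >= threshold else 0 for v in r] for r in rca]


# Toy export volumes (hypothetical): 2 countries x 2 products
s_toy = [[8.0, 2.0], [2.0, 8.0]]
rca_toy = rca_matrix(s_toy)
M_toy = binarize(rca_toy)
```

In the toy table each country devotes 80% of its exports to a product that makes up 50% of world trade, which gives an RCA of 1.6 for that product and of 0.4 for the other one.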

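Initial condition of the new algorithm, Eq. (5): $F_c^{(0)} = P_p^{(0)} = 1$, $\forall\, c, p$. The fixed point of the transformation in Eq. (4) is trivially characterized by the conditions $F_c \ge \delta_c$, $P_p \ge \delta_p$, $F_c P_p > M_{cp}$.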

Figure 1: Dependence on the non-homogeneous parameter: Dependence of fitness and quality at the fixed point on the parameter δ. One country (Afghanistan) and one product (live horses) were chosen arbitrarily from the sample of year 2014.

Figure 2: Comparison between the original and the revised method: Differences in country fitness (left panel) and product complexity (right panel) calculated with the original method of Ref. [1] (vertical axes) and the new method (horizontal axes), for the year 2007. The green line in the left panel is the best least-squares approximation of power-law type (correlation coefficient 0.989), with exponent ca. 1.53. The dark line in the right panel is the best power-law approximation (correlation coefficient 0.971), with exponent ca. 1.38.

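Figure 3: Numerical vs Analytic relative error: The histogram of the relative difference between $\tilde F_c^{(\text{fixed point})}$ and $\tilde F_c^{(\text{approximated})}$ is plotted with the number of countries on the vertical axis. The approximated values are calculated using Eq. (10).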

Figure 4: Country fitness evolution: Country fitness as calculated by the new algorithm. Curves were artificially smoothed by a cubic spline for a better visual representation.

Figure 5: Role of diversification: The country inefficiency $I_c = D_c - \tilde F_c$ vs the diversification $D_c$, with the black line representing the power-law relation $I_c \propto D_c^{0.75}$ (linear regression with correlation coefficient 0.994). In the inset the net-efficiency $N_c$, defined as the difference between the black line and the inefficiency of the main graph, is shown. Data pertain to year 2007.

Figure 6: Yearly evolution of net-efficiency: Yearly time evolution of country net-efficiency. The net-efficiency is a detrended version of the inefficiency defined in the text and already displayed in the inset of Fig. 5 for the year 2007. Curves were artificially smoothed by a cubic spline for a better visual representation.

Figure 7: Noise robustness: Spearman correlation between the ranking of countries based on fitness at zero noise and at different noise levels η (see Sec. 3.5 in the main text). The performance of the two algorithms is practically indistinguishable. Note that at η = 1 all elements are flipped, so that the perturbed system is perfectly anti-correlated with the original one.