Complexity of Products: The Effect of Data Regularisation

Among several developments, the field of Economic Complexity (EC) has notably seen the introduction of two new techniques. One is the Bootstrapped Selective Predictability Scheme (SPSb), which can provide quantitative forecasts of the Gross Domestic Product of countries. The other, Hidden Markov Model (HMM) regularisation, denoises the datasets typically employed in the literature. We contribute to EC along three different directions. First, we prove the convergence of the SPSb algorithm to a well-known statistical learning technique known as Nadaraya-Watson Kernel regression. The latter has significantly lower time complexity, produces deterministic results, and it is interchangeable with SPSb for the purpose of making predictions. Second, we study the effects of HMM regularization on the Product Complexity and logPRODY metrics, for which a model of time evolution has been recently proposed. We find confirmation for the original interpretation of the logPRODY model as describing the change in the global market structure of products with new insights allowing a new interpretation of the Complexity measure, for which we propose a modification. Third, we explore new effects of regularisation on the data. We find that it reduces noise, and observe for the first time that it increases nestedness in the export network adjacency matrix.


Introduction
Complexity and Fitness measures were originally proposed [1] within the field of Economic Complexity (EC) to capture respectively the level of sophistication of a given class of products found on the international export market and the advancement of the productive system of a country. These two measures are calculated from international trade data, and they stem from the hypothesis that the difference between countries' competitiveness comes from their respective capabilities [2][3][4]. Capabilities are non-exportable features of the productive system of a country that allow it to produce a certain class of products. The problem with the theory of capabilities is that capabilities themselves are hard to define: one can speculate on what they might be, e.g., good regulations, a well-organized education system, or maybe the presence of facilities specifically useful for a product's making, but there is currently no good principled "a priori" or normative approach to classify and measure them [5]. On the other hand, the observation that a country c exports product p contains a strong signal. It implies that c is competitive enough in the production of p for export to be convenient on the global market. Therefore, one could say that c has all the capabilities needed to make p. Hausmann [6] proposed the Method of Reflections, a non-normative algorithm to rank countries by how many capabilities they have, and products by how many capabilities they need for production, based on observed exports. The algorithm leverages topological properties of the export network, which is a bipartite network where the nodes can be either countries or product classes, and where a link is added to the network if country c is a significant exporter of p. Fitness and Complexity are the output of an alternative algorithm [7] exploiting the discovery that the export network has a nested topology [1] (a comparative analysis is found in [8]). In other words, it has been observed that some countries, usually the richest in monetary terms, export almost all product classes, and some products are exported only by the countries that are most diversified in terms of export. Conversely, the less diversified countries only export a handful of products which are also being exported by almost all countries. This means that the adjacency matrix of the export network M cp can be reordered to be very close to triangular, in analogy with some biological systems [9,10]. The Fitness/Complexity algorithm takes the adjacency matrix M cp as an input and produces a value of Fitness F for each country and one of Complexity C for each product. Sorting the matrix rows and columns by increasing Fitness and Complexity produces the characteristic triangular structure. This ordering offers a robust way to rank the countries in terms of their competitiveness and products in terms of how sophisticated they are [1]. Nestedness of the bipartite export network is a fundamental point of the theory and, in this paper, we measured nestedness with one widespread metric, NODF [11], for the first time. The Economic Complexity approach is an innovative way to use the wealth of data that is being currently produced in economics, and it has the advantage of offering a data-driven and mathematically defined method of analysis, which reduces the necessity of interpretation.
Several results have been produced in many directions but mainly in the direction of the Fitness measure. The network approach produced an algorithm to forecast the sequence of products a country will start to export [12], and inspired the exploration of innovation models [13]. Fitness as a macroeconomic indicator has been particularly fruitful. One very interesting result calls for an extension of neo-classical economic theories of growth. It is classically understood that for countries to start the process towards industrialization they have to pass a threshold of GDP per capita (GDPpc), and it has been found that higher Fitness can significantly lower this threshold [14]. It has long been observed that Fitness might allow for Gross Domestic Product (GDP) prediction [1,15], but the most recent advances have introduced a dynamical systems based approach to quantitative forecasting called Bootstrapped Selective Predictability Scheme [16] (SPSb, see Section 3.4). The method is based on the observation that trajectories of countries tend to be collinear in many regions in the GDP-Fitness two-dimensional space. Making the assumption that the growth process of countries can be modelled as a two-dimensional dynamical system allows to use nonparametric regression techniques such as the method of analogues [17] to forecast growth. SPSb has been proven to give state-of-the-art GDP forecasts [18]. In this work, we prove that SPSb converges to a well-known nonparametric regression originally proposed by Nadaraya and Watson. The same work introduced a new regularization method for the M cp based on a Hidden Markov model (HMM, see Section 3.6), and it has been proven to give state-of-the-art GDP forecasts [18] (but, to our best knowledge, has never been applied to the Complexity measure until the present work). These ideas were originally introduced to validate the new Fitness metric, which is non-monetary, by comparing and contrasting it to an established monetary metric such as GDP. This line of thinking proved very fruitful, so other attempts have been made to extract information by comparing an Economic Complexity metric with established ones. One such attempt compared economic inequality measurements with Fitness [19]. This paper contributes to the latest developments of the Complexity and Fitness measures and it follows up mainly from the earlier work by Angelini et al. [20] focusing on the Complexity measure. In particular, the Complexity index has been paired with logPRODY (L, see Section 3.2) to obtain an interesting insight. LogPRODY of a product is a weighted average of the GDP of its exporters, where the weights are proportional to comparative advantage in making that product. It is possible to represent product classes as points on the Complexity-logPRODY plane. Their motion on said plane can be modelled with a potential-like equation [20] (see Section 3.3 for more details). In this work, we report the results of the application of SPSb and HMM regularization on the Complexity measure, and we show how HMM affects the M cp matrices. This paper is structured as follows. In Section 2.1 we show that, as suggested in [16], the SPSb technique converges to the faster and mathematically well-grounded Nadaraya-Watson kernel regression (NWKR), allowing applications of SPSb to larger datasets. In Section 2.2 we look at how the HMM regularization affects the aforementioned Complexity-logPRODY plane motion and analyse its effect on a set of different M cp matrices. Finally, Section 2.3 reports our application of the SPSb algorithm to make predictions on the Complexity-logPRODY plane.

Convergence of SPSb to a Nadaraya-Watson Kernel Regression
In this section, we prove that the SPSb prediction method converges, for a large number of iterations, to a Nadaraya-Watson kernel regression (NWKR). The idea was originally suggested in [16], but never developed mathematically. We prove the convergence analytically and numerically so that for all prediction purposes the two methods are interchangeable. The result is significant because it connects SPSb to a well-established, tried and tested technique, and frames the predictions made with this method in a more mathematically rigorous setting. SPSb is a non-deterministic algorithm so, at every run, it will yield slightly different results, while NWKR will always produce the same results up to machine precision. From a computational perspective, NWKR has much smaller time complexity, so our result allows the use of SPSb on much larger datasets than previously explored.
SPSb is fundamentally a nonparametric regression. We describe the algorithm here, and in Section 3.4. In the original formulation [18], one is presented with xĉ ,t , the position of a given countrŷ c in the Fitness-GDP (FG) plane at timet, and wants to predict the change (displacement) in GDP at the next timestept + ∆t, namely δxĉ ,t . The method is based on the idea, advanced in [15], that the growth process of countries is well modeled by a low-dimensional dynamical systems. For many important cases, the best model is argued to be embedded in the two-dimensional Euclidean space given by Fitness and GDPpc. It is not possible to identify the analytical equations of motion, so instead one uses observations of previous positions and displacements of other countries (δx c,t , x c,t ), which are called analogues, a term borrowed from [17]. Because the evolution is argued to be dependent only on two parameters, observed past evolutions of countries nearby xĉ ,t in the FG plane are deemed to be good predictors of δxĉ ,t . Threfore SPSb predicts δxĉ ,t as a weighted average of past observations. The weights will be proportional to the similarity of countryĉ to its analogues, and the similarity is evaluated by calculating Euclidean distance on the Fitness-GDP plane. A close relative of this approach is the well-known K-nearest neighbours regression [21]. In order to obtain this weighted average, one samples with repetition a number B of bootstraps from all N available analogues. The sample probability density of an analogue δx c,t , found at position x c,t is given by a gaussian distribution: Therefore sampling probability will be inversely proportional to distance, i.e., analogues closer on the FG plane are sampled more often. We will adopt the following notation: each bootstrap will be numbered with b and each sampled analogue in a bootstrap with n, so each specific analogue sampled during the prediction of δxĉ ,t can be indexed with sĉ ,t b,n . Once the sampling operation is done, one averages the samples per bootstrap, obtaining vĉ ,t b = ∑ N n sĉ ,t b,n /N = sĉ ,t b,n n . These averaged values constitute the distribution we expect for δxĉ ,t . From this distribution we can derive an expectation value and a standard deviation (interpreted as expected prediction error) for δxĉ ,t : NWKR is conceptually very similar to SPSb. We will use the symbol ↔ to establish a correspondence between the two algorithms: in NWKR one is presented with an observation X ↔ xĉ ,t and wants to predict Y ↔ δxĉ ,t from it. Other observations are available (Y i , X i ) ↔ (δx c,t , x c,t ), and the prediction is a weighted average of the Y i 's.
The weights will be given by K, a function of the distance on the Euclidean space containing the X i values. This function is called kernel. A more detailed explanation of this technique can be found in Section 3.5.

Analytical Convergence
SPSb returns both an expected value and a standard deviation for the quantity being measured. We begin by proving convergence of expected value.
Expected values. -Suppose that we execute B bootstraps of N samples from all available analogues {δx c,t }, so that each sampled value in a bootstrap can be labelled as sĉ ,t b,n with 1 ≤ n ≤ N and 1 ≤ b ≤ B. Then the SPSb probabilistic forecast E SPSb (δxĉ ,t ) will be: If we aggregate all B bootstraps, we can label the frequency with which the analogue δx c,t appears overall in the sampled analogues as where 1 {·} is intended to be an indicator function. So we can rewrite the forecast as: where ∑ c,t indicates a sum over all available analogues. But since the analogues are being sampled according to a known probability distribution p(δx c,t | xĉ ,t ), we can expect, by the law of large numbers, that for B → ∞ the sample frequency will converge to the probability values (which it does, see Figure 1a): Now, SPSb uses a Gaussian probability distribution p(δx c,t | xĉ ,t ) = N ( x c,t − xĉ ,t |0, σ) (see Section 3.4) so our forecast will tend to: but this is exactly the definition of a NWKR with Gaussian kernel that has bandwidth σ, see Section 3.5 (in the machine learning literature it's usually not called Gaussian, but radial basis function.). We assumed for brevity that the sum is already normalized, i.e., ∑ c,t p(δx c,t | xĉ ,t ) = ∑ c,t N ( x c,t − xĉ ,t |0, σ) = 1, normalization is needed in Equations (9) and (10) if this is not true, but it doesn't change the result of the proof. B (δx c,t ) converge to kernel probabilities p(δx c,t | xĉ ,t ), as defined in Equation (8). This plot compares them after B = 5 × 10 4 bootstrap cycles of SPSb (with N = 100, i.e., 5 × 10 6 sampled analogues). The values, as expected, start to visibly diverge around 10 −6 ; (b) (right)-Histogram of the probabilities assigned by the kernel to all analogues on the plane, for a typical prediction. It can be seen that a sizeable proportion of the analogues has probability e.g., ≤ 10 −5 . They will therefore not be included in SPSb if the number of analogues sampled is of order 10 5 .
Variances. -The variance of the distribution of samples in SPSb is calculated first by computing vĉ ,t b = ∑ N n sĉ ,t b,n /N = sĉ ,t b,n n i.e., the average of the samples of each boostrap, and then computing the variance of the vĉ ,t b across bootstraps, so (with the same notation as Equation (6)) it can be written as: In the second row we considered that 1 BN ∑ B,N b ,n sĉ ,t b,n , the operation of averaging across all sample analogues, irrespective of which bootstrap they are in, is equivalent to taking the expected value in SPSb. In the third row, because in SPSb we are calculating the variance of the means sĉ ,t b,n n , and each of the means is done over N samples, for the central limit theorem when N 1 we expect a variance that is N times smaller than the population variance of the analogues sampled with probability p, which we called σ 2 bn (sĉ ,t b,n ). The approximation in the last row is justified by the fact that σ 2 bn (sĉ ,t b,n ) in the third and fourth row is an unbiased estimator of the variance, and ∑ B,n b,n (sĉ ,t b,n ) 2 /(BN) in the last row is an unbiased estimator of the second moment of the distribution of the samples. In the limit of large B, the relation E((z − E(z)) 2 ) = E(z 2 ) − E(z) 2 applies to unbiased estimators too. Now, we know by the definition of NWKR (Section 3.5) that E(δxĉ ,t ) ↔ E(Y) is actually a conditional probability E(δxĉ ,t |xĉ ,t ) ↔ E(Y|X), i.e., the probability of observing a certain displacement δxĉ ,t given the position on the plane xĉ ,t . Therefore we can compute the variance for a NWKR as: which tranlsates, for SPSb formalism, into: We again omitted normalization terms in the third and fourth rows. This equation, combined with Equation (11), means that the standard deviation calculated with NWKR is espected to be proportional to the standard deviation calculated with SPSb multiplied by √ N. Note that this method makes it possible to estimate any moment of thef (X|Y) distribution, not just the second.

Numerical Convergence
We computed expectations and standard deviations for economic complexity data with both SPSb (5 × 10 5 bootstraps) and NWKR. The results here refer to the calculation for GDP prediction, but the same results are obtained with products predictions. It can be clearly seen from Figure 2a that the expectation values for SPSb converge to NWKR expectation values as the number of bootstraps increases. We show that the mean average error MAE[E SPSb (δx)] = abs converges numerically to zero (by E M (δx) we mean the expectation value of the displacement of x calculated with method M). The standard deviations converge as well, as can be seen from Figure 2b.
. A comparison of the values obtained for expectations with the two methods is shown in Figure 3a. The difference between predictions with the two methods is 3 × 10 −5 on average with a standard deviation of 3 × 10 −5 . A comparison of the standard deviations obtained with the two methods is shown in Figure 3b. The difference between the two methods in this case is 6 × 10 −4 on average with a standard deviation of 5 × 10 −4 . For the purpose of GDP prediction we can therefore say that the two methods are completely interchangeable. The time complexity for SPSb is of the order O(NB), while for NWKR is O(N), so with B = 1000 bootstraps (as reccommended by the literature [18]) NWKR is expected to be 1000 times faster. The same is not true for space complexity, since the original SPSb can be implemented with O(N) memory requirements like NWKR.
, i.e., the percentage mean average error done by NWKR while estimating the output of SPSb, while on the horizontal axis the number of bootstraps. After B = 10 5 bootstrap cycles (with the default N = 100 samples per cycle), the relative error is always smaller than 0.1%. This figure also allows to estimate by how much SPSb results can vary between different runs. For 10 3 bootstrap cycles, the largest deviation is around 1% of the value. (b) (right)-For 30 predictions, we show the difference between standard deviations calculated with SPSb and the same quantity computed with NWKR at different numbers of bootstrap cycles. On the vertical axis , i.e., the percentage mean average error done by NWKR while estimating the standard deviation predicted by SPSb, while on the horizontal axis the number of bootstraps. After 10 5 bootstrap cycles, the relative error is always less than 1%. The match is, for all prediction purposes, perfect. In the legend, we report the value of R 2 for the observations, as well as the p-value for a linear regression (which is below machine precision, so it is approximated to 0), mean relative error (the absolute value of differences normalized), and the standard deviation of the relative error; (b) (right)-For all possible predictions to be made on the plane, a comparison of the standard deviations obtained with SPSb at 5 × 10 5 bootstraps and NWKR. The match is, again, perfect for prediction purposes.
The convergence does not reach machine precision even at 5 × 10 5 bootstrap cycles of SPSb because many of the analogues have extremely small probabilities to appear in a bootstrap. In Figure 1b we show the probabilities assigned by the kernel to all analogues of the plane for a typical prediction. In Figure 1a we compare, for a typical prediction, the sample frequency of each analogue with the sampling probability assigned to it by the kernel. It can be clearly seen from both figures that a sizeable proportion of the analogues has no chance to appear even in a bootstrap of 5 × 10 5 cycles since about 30 percent of them have probability significantly ≤ 10 −7 (each bootstrap samples N ≈ 10 2 analogues). These analogues are instead included in the NWKR estimate, although with a very small weight.
To obtain complete convergence one would have to sample, in total, as many analogues as the inverse of the smallest probability found among the analogues, and this number can go up to 10 25 in typical use cases. We expect the discrepancies to decrease with the total number of samples (i.e., NB), as more and more analogues are sampled with the correct frequency. A visual representation of such discrepancies can be seen in Figure 1a, where we plot the kernel probabilities of each available analogue p(δx c,t | xĉ ,t ) against the sampled frequencies φĉ ,t B (δx c,t ) for a bootstrap of 5 × 10 4 samples. Discrepancies start to show, as expected, at a probability of about 10 −6 . The code we used to compute NWKR is publicly available [22].

HMM Regularization Reduces Noise and Increases Nestedness
In analogy to what happens for countries, product classes too can be represented as points (L t , C t ) on the Complexity-logPRODY (CL) plane. Their trajectories over time t can be then considered, and one can find the average velocity field v by dividing the CL plane into a grid of square cells and averaging the time displacements (δL t , δC t ) of products per cell. The procedure of averaging per cell on a grid can be considered a form of nonparametric regression, but it is by no means the only technique available to treat this problem. All the following results hold independently of the regression technique used to do the spatial averages, as reported in [20]. The product model described in [20] and summarised in Section 3.3 explains the v field in terms of competition maximization. For each product, it is possible to compute the Herfindahl index H(p, t) (Equation (20) Section 3.3), which quantifies the competition on the international market for the export of product p in year t. The lower H(p, t), the higher the competition. Averaging the values of H(p, t) per cell on the CL plane gives rise to a scalar field, which we call the Herfindahl field H. The inverse of the gradient of this field −∇H explains the average velocity field (Equation (21), Section 3.3), much like a potential.
The original work where this model was proposed used a dataset of about 1000 products, classified according to the Harmonized System 2007 [20]. The Harmonized System classifies products hierarchically with a 6-digit code. The first 4 digits specify a certain class of product, and the subsequent two digits a subclass (see Section 3.7). In [20], the export flux was aggregated at the 4 digit level, and we will refer to this dataset as noreg4. We recently obtained the full 6-digit database, comprehensive of about 4000 products. We calculated the model on M cp matrices at 6 digit level (noreg6), to compare it with the noreg4 case. We also obtained the same 6-digit dataset regularized with the aforementioned HMM method [18] (see Section 3.6), which we will call hmm6. This method goes beyond the classical definition of the M cp matrix as a threshold of the RCA matrix (Equations (14) and (15), in Section 3.3). Because the value of RCA fluctuates over time around the threshold, it can lead to elements of the M cp matrix switching on and off repeatedly, polluting the measurements with noise. The HMM algorithm stabilizes this fluctuation. Because of this, it can significantly increase the accuracy of GDP predictions [18].
We computed the CL motion model on the three different datasets hitherto described. The results can be compared visually in Figure 4. Each of the panels in Figure 4a  The horizontal axes mark the Complexity, and the vertical ones logPRODY. Note that in these figures we use tied ranking as coordinates, instead of the observed values directly. Panels (a,c,e) show the v field, together with a kernel regression of the minima of the field across the vertical direction in yellow. An uncertainty measure of this minima line has been calculated by means of a bootstrap. Panels (b,d,f) show a heat map of the H field, and its gradient. The blue line indicates the minima of the H field along the vertical direction, together with an uncertainty calculated via bootstrap. The first feature of this Figure is the difference in the v fields. The one calculated from noreg6 has much higher velocities on the Complexity axis, while the hmm6 velocities along the same direction are much smaller. This might be an indication that much of the change in Complexity over time is actually due to noise. The second feature is that, when going from 4 to 6 digit granularity, the observed minima lines become incompatible with those predicted by the model. Noise reduction. -Panels in Figure 4a,b are almost identical to those in [20], since the noreg4 data set is the same with the addition of one more year of observations (namely 2015). Figure 4c,d represent the velocity and Herfindahl field obtained with noreg6. The most noticeable change is the strong horizontal component of the velocity field: Complexity changes much faster than in noreg4. We believe this is due to two effects. The first one is the increased noise: when a 4-digit code is disaggregated into many 6-digit codes, there are fewer recorded export trades for each product category. This means that each individual 6-digit product category will be more sensitive to random fluctuations in time, of the kind described in Section 3.6. The second source of change is due to overly specific product classes. There are some products, such as e.g., products typical of a specific country, for which we would expect generally low Complexity. It typically happens that these products are exported by almost only one, fairly high-Fitness, country, which produces it as a speciality. When the Complexity of such products is computed with Equation (18) (Section 3.1), it will be assigned a high value, because they have few high-quality exporters. This effect increases the Complexity of the product and is stronger in more granular data. Combined with the stronger fluctuations coming from disaggregation, it contributes to noise in the Complexity measurements.
Another, stronger argument in favour of noise causing fast Complexity change over the years in noreg6 is Figure 4e,f. These figures show the velocity and Herfindahl field for the regularized hmm6 data. It is clear that the horizontal components of the v field are much smaller compared to noreg6, and that the only change in the data comes from the regularization, which was explicitly developed to reduce the impact of random fluctuations in export measurements. We, therefore, conclude that the HMM regularization is effective in reducing noise and generating smoother Complexity time series. Another interesting observation is that the v obtained from hmm6 is very similar to the noreg4 one. Therefore we would like to conjecture that aggregating data from 6 digits to 4 has an effect similar to that of reducing noise with the HMM algorithm. We will see in the next section that there is a further evidence to this conjecture.
Increase in nestedness.
-A yet undocumented effect of HMM regularization is the increase in nestedness of the M cp matrices. It can be visualized by looking at Figure 5a,c,e. Here we show a point for each nonzero element of all M cp matrices available in each dataset. To be able to resolve the differences in density, we computed a kernel estimate of the density of points on the plane. The horizontal axis is the value of rank(Complexity), while rank(Fitness) is on the vertical axis. All three datasets feature very nested matrices, as expected, but hmm6 has one peculiarity. The top left corner of Figure 5e exhibits in fact a higher density than the other two. This means that regularization has the effect of activating many low-Complexity exports of high-Fitness countries. This makes sense since we expect the thresholding procedure described in Section 3.1 to be noisier in this area. Indeed, we know that the high-Complexity products are exported only by high-Fitness countries, so we expect the numerator of the RCA cp (proportional to the importance of p in total world export, see Section 3.1) in this area to be small. We also know RCA cp is proportional to the importance of product p relative to total exports of c, so we expect it to be high in the low-Complexity/low-Fitness area since low-Fitness countries export few products. Furthermore, it has been described in [20] that countries are observed to have similar competitive advantage in low-Complexity products regardless of their level of Fitness. So in the high-Fitness/low-Complexity area, we expect to observe a lower numerator, possibly fluctuating around the thresholding value, due to the high diversification of high-Fitness countries.
A higher density in the high-Fitness, low-Complexity area naturally results in more nested matrices. To show this, we computed the well-known NODF [11,23] measure of nestedness for all M cp matrices in all datasets. The results can be found in Figure 6a, and show clearly that hmm6 matrices are much more nested than unregularized ones. Another observed result is that noreg4 matrices are slightly but consistently more nested than the noreg6 ones. This is further support for our conjecture that aggregating from 6 to 4 digit has an effect similar to regularizing with an HMM model. Figure 6b shows the significance level of the NODF measurements. In order to assess significance, we computed n obs , the observed value of NODF on the M cp matrices, and we compared it with n null the NODF obtained from null models. The null models usually generate new adjacency matrices at random while holding some of the properties of the observed matrix (such as e.g., total number of nonzero elements) fixed. This is a way to control for the effect of the fixed property on the nestedness. Several runs of a null model generate an empirical probability distribution p(n null ). The p-value of the measurement is assessed by calculating in which quantile of p(n null ) the observed value n obs falls. In Figure 6b we report the ratio between n obs /E p (n null ) and the scaled standard deviation of the null distribution σ(n null )/E p (n null ), for three common null models [23]. The scaling allows to compare very different distributions on the same axis. The ratio of σ(n null ) to n obs − E p (n null ) is very small. Thus, the observed measurements' significance is so high that there is no need to calculate quantiles. NODF was calculated using the FALCON [23] software package, for which we provide a wrapper in Python [24]. To be able to resolve the difference in the density of points, we applied a kernel density estimate (KDE). The triangular shape suggesting nestedness is clearly visible in all three cases. The differences lie in the top left corner, where low-Complexity products exported by high-Fitness countries are found. The unregularized data (noreg4, noreg6) notably have lower density here when compared with regularized matrices (hmm6). This is reflected in the increased nestedness of regularized matrices, as shown in Figure 6. Figure 4 is that, while it works well for 4-digit data, the model of product motion has trouble with reproducing the data at the 6-digit level. Regressing the v components against the derivatives of the H field, as shown in Table 1, seems to indicate that the 6-digit models work better (However, the 4-digit BACI dataset hmm4 has one peculiarity that needs explaining. Specifically, the bottom right corner of Figure 4b does not contain the maximum of H that is found in all other datasets ever observed (including the Feenstra dataset studied in [20]). This causes the gradient of H in that area to produce small values, which do not match the high vertical components of v in the same spot, significantly lowering the R 2 coefficient of a linear regression.). But one key feature of the model disappears when moving from 4 to 6 digits. The yellow and blue lines in Figure 4 indicate a kernel regression of respectively the minima of the v field and the minima of the H field across each column of the grid (together with error bars obtained via bootstrap). The model predicts that v will be almost zero where the minima of H lie, but at 6 digits this feature disappears, and the minima lines become incompatible with each other. We are currently lacking an explanation of this behaviour, that seems independent of regularisation. matrices in the three datasets discussed in this work. We used the NODF [11] measure, which goes from 0 (no nestedness) to 100 (perfectly nested matrix). It can be clearly seen that the regularized data, hmm6, is much more nested than the rest, as already suggested by the observation of Figure 5. The noreg4 dataset, though, is significantly and consistently more nested than the noreg6. This suggests that aggregating from 6 to 4 digits might have a regularizing effect; (b) (right)-Significativity of NODF measures. We calculate an ensemble of 100 null models for each dataset and report the ratio (null model NODF)/(observed NODF). We do this for 3 commonly used null models [23], and we report the standard deviation of the ensemble (similarly scaled) in the form of an error bar. The standard deviation of the DD and EE null models ensembles is so small that it cannot be seen in the plot. We observe that all null models have significantly smaller NODF than the observed matrices, and the results are therefore highly significant. All calculations were done with the FALCON software package [23]. Table 1. R 2 coefficients of a linear regression of v components against the derivatives of the H field along the x-axis (Complexity) and y-axis (logPRODY).

Predictions on Products with SPSb
Dynamics of products on the CL plane appears to be laminar everywhere, in the sense that the average velocity field seems to be smooth [20], similarly to what happens to countries on the Fitness-GDP plane [15]. If so, then it's a reasonable hypothesis that the information contained in the average velocity field can be used to predict the future positions of products on the plane. The idea was originally conceived because it could lead to refined predictions on GDP and Fitness (see Appendix A). We tried to predict the future displacement of products with SPSb. Because the number of products is about 1 order of magnitude larger than the number of countries used in [18], the computational demand of the algorithm induced us to develop the proof of convergence reported in Section 2.1.
The results for the backtests on this methodology are reported in Figure 7. We predicted the Percentage Compound Annual Growth Rate (CAGR%) for each of the two metrics, and defined the error as E = |CAGR% observed − CAGR% forecasted |, so that if e.g., Complexity increases by 2% and we forecast 3%, E = 1%. The forecasts are made at timescales ∆t = 3, 4, 5 years. We used the three datasets hmm6, noreg6 and noreg4. The predictions are not very accurate, with an error between 12% and 6% for logPRODY and in the 32-13% range for Complexity. We compared the predictions to a random baseline, i.e., predicting the displacement by selecting an observed displacement at random from all the available analogues. Compared to the random baseline, SPSb is always more accurate. One peculiarity about the predictions, though, is that they are generally much smaller in magnitude than the actual displacements observed. This led us to add another comparison, which we call static baseline, that consists in predicting zero displacement for all products. Compared to this baseline, SPSb still systematically shows some predictive power for logPRODY, especially in noreg4, but is definitely worse when predicting Complexity. We will clarify our explanation for this behaviour with an analogy. While the average velocity field v exhibits laminar characteristics, in the sense that it is relatively smooth, the actual motion of the underlying products is much more disorderly. In a given neighbourhood of the CL plane, products generally move in every direction, often with large velocities, even though the average of their displacements is nonzero and small. We could tentatively describe this as a Brownian motion with a laminar drift given by v. So trying to predict the future position of a product from their aggregate motion would be similar to trying to predict the position of a molecule in a gas. That's why the static prediction is better than a random prediction: in general, the last position of a product is a better predictor than a new random position on the plane, since the new one might be farther away. To test this Brownian motion with drift hypothesis, we added a third baseline, which we call autocorrelation baseline. It consists in forecasting the displacement of a product to be exactly equal to its previous observed displacement. If the hypothesis is true, we expect each product displacement to be uncorrelated with its displacement at previous time steps. For logPRODY the autocorrelation baseline is always worse than the static, which we interpret as a signal that logPRODY changes are not autocorrelated. The reverse is true for Complexity: in fact, for noreg4 and hmm6 the autocorrelation baseline is the best predictor for Complexity change. Figure 7. SPSb predictions on products. We predicted future values of logPRODY and Complexity on the log(Complexity)-logPRODY plane (not using ranking) with backtested SPSb at ∆t = 3, 4, 5 years in the future. We used the three datasets hmm6,noreg6 and noreg4. On the vertical axis, the Mean Average Error of the prediction (MAE). Three baselines are shown. The first one, called "random", consists of predicting displacement by randomly selecting one available analogue. The second, called "autocorrelation", consists of predicting the next displacement of a product to be exactly the same as the last observed one. The last, called "static" predicts 0 displacement for every product. (a,c,e) (left)-Complexity predictions are always worse than both the static baselines, and worse than the autocorrelation one in hmm6 and noreg4. This might signify that observed changes in Complexity mostly caused by random noise. Very interesting is the good result of the autocorrelation baseline: this suggests that Complexity changes over time might be autocorrelated. Finally, prediction accuracy is significantly better for regularized data. It can be interpreted as a signal that, by reducing the noise, the motion becomes more predictable; (b,d,f) (right)-logPRODY predictions are significantly better than random predictions in all cases. Predictions are significantly better than all baselines for noreg4, and slightly but systematically better than the static prediction for the other two datasets. We interpret this as a clue that logPRODY change over time actually signals a change in market structure, as discussed in Section 2.3. These results also confirm that the logPRODY model performs significantly better on noreg4.
As already mentioned, SPSb does still have slightly but systematically more predictive power than the autocorrelation and static prediction, but only for logPRODY. We speculate that this is due to the fact that change in logPRODY is actually a signal of the underlying market structure changing, as explained in [20] and in Section 3.3. The fact that this advantage over the baseline is much bigger on noreg4 confirms that the logPRODY model performs significantly better on noreg4, as discussed in Section 3.3. On the other hand, the autocorrelation prediction (as well as the static one) can be significantly better than SPSb when predicting changes in Complexity. It is not clear whether this implies that changes in Complexity are autocorrelated in time -this effect for example disappears in noreg6, and will require an analysis with different techniques. But the fact that SPSb is always worse than the baseline, combined with the fact that regularization, which is supposed to mitigate noise, significantly reduces changes in Complexity over time raises a doubt over whether changes in Complexity are significant at all, or are drowned by noise in the M cp . The fact that Complexity predictions are significantly better on the hmm6 dataset suggests confirms the contribution of noise to Complexity changes, although it is not possible to argue that regularization is strengthening the signal coming from these changes over time, since we could not characterize any signal. This might be an important finding because it could shed some light on the nature of the Complexity metric. We suggest that an alternative line of thinking should be explored, in which one treats the Complexity of a product as fixed over time. This resonates with the data structure: product classes are fixed over the timescales considered in our analyses, and new products that might be introduced in the global market during this time are not included. It also might be derived from an interpretation of the theory: Complexity is meant to be a measurement of the number of capabilities required to successfully export a product [7]. Practically, this means that there is no specific reason to believe that the Complexity of (i.e., the capabilities required for) wheat, or aeroplanes, changes over the course of the 20 years typically considered in this kind of analysis. It is possible that changes in Complexity, defined as a proxy for the number of capabilities required to be competitive in a given product, occur over longer timescales, or maybe that Complexity never changes at all. If this were true, then all observed Complexity changes would be due to noise, and it would be better to consider defining a measure of Complexity that is fixed or slowly changing in time for the model. We remark that these definition problems will probably be insurmountable as long as it is impossible to give an operational definition of capabilities, and they can only be measured indirectly through aggregate proxies, i.e., countries and products. There always is a tradeoff of interpretability to pay in order to give up normative practices in favour of operational definitions, but it affects economics and social sciences more than the physical sciences.

Fitness and Complexity Algorithm
As discussed in Section 1, Fitness and Complexity measures are calculated from the M cp . This matrix is intended to be binary, with M cp = 1 if country c is an exporter of product p, and 0 elsewhere. To measure how significant the exports of p are for a given country, literature turns to the RCA cp , where the acronym stands for Revealed Comparative Advantage, or Balassa index [25], and we defined the weighs. If we define the value in dollars of product p exported by country c as EXM (also known as the export matrix), then the Balassa index is defined as: We take the ratio between the exports of p done by country c and total exports of c, and divide it by the world-average of this same ratio. Traditionally, the thresholding of this matrix returns the M cp : This is the definition we refer to when mentioning unregularized data. Because both EXM and RCA are noisy matrices, a new procedure procedure for deriving a regularized M cp has been introduced, as explained in Section 3.6. We mention in Section 1 that the M cp matrix is nested, and this observation is crucial to the definition of the Fitness-Complexity Algorithm because of two important implications. The first one is that observing a p being exported by a very diversified country c is uninformative, while if c is poorly diversified we have good reason to think that the product should be a low-Complexity one. On the other hand, if p is only exported by high-Fitness countries, chances are that it should be assigned high Complexity. The algorithm itself is a map that is iterated to convergence on the M cp , and it embeds the former considerations with a non-linearity. The equations of the map are: Now Fitness of country c is defined as the plain sum of Complexities of products exported by c. Complexity of product p is instead bound by the equations to be less than the lower Fitness found among the exporters of p. Additionally, the more exporters of p, the less its Complexity. Convergence of the map can be defined numerically in various ways [26,27], and the stability of the metric with respect to noise has been studied in [28,29].

LogPRODY
LogPRODY is a modification of the PRODY index proposed by Hausmann [30], who employed it to investigate the relationship between exports and growth of a country. logPRODY is defined, for a product p, as follows: where RCA is the Balassa index explained in Section 3.1, Equation (14). The Hausmann's PRODY is defined the same way, except that log 10 (GDP c ) is replaced by GDP c in the sum. We employ logarithms because the numerical distribution of GDPs spans several orders of magnitude, and a geometric average contributes to the stability of the measure [20]. Note that we defined nRCA cp = RCA cp ∑ j RCA jp , the normalized RCA. Comparing this quantity with the definition of RCA, we can see that normalization removes the effect of numerator from Equation (14). In other words, nRCA cp is proportional to the ratio between the exports of p done by country c and total exports of c. The more product p contributes to total exports of c, the more c will be weighed in logPRODY p . Further considerations about this measure can be found in [20].

Complexity-logPRODY Motion Model
Products can be represented as points on the Complexity-logPRODY (CL) plane. Their aggregate motion in time, averaged as a vector field v can be seen in Figure 4a,c,e. In those figures, the CL plane has been divided into a grid of cells, and we averaged the displacement vector of all products for each cell. Note that all axes in Figure 4 are labeled as rank(·). This is because Complexity and logPRODY can be badly behaved, and the standard treatment is to use tied ranking, instead of the observed value, when calculating this model. This motion can be modeled with a potential-like equation [20]. One first needs to define the Herfindahl index [31]: where EXM cp is the export matrix, defined in Section 3.1. The Herfindahl index measures the competitiveness of a market by summing the square of the market shares of each participant to the market. It ranges from 1 (for a monopoly) to 1/N (the case of N participants all with equal market share). When defined as in Equation (20), it refers to the total market share of countries. Averaging the Herfindahl index per cell on the CL plane produces a scalar field, H, for which one can compute the gradient with respect to the C (Complexity) and L (logPRODY) coordinates on the plane. Then the model explaining v is: where k C ,k L are two scalar constants. This implies that the average velocity of products v points towards area of lower H, i.e., higher competition on the CL plane. The lines in Figure 4 show respectively where v is minimum and where H is minimum for each column of the grid. The interpretation given to this model in [20] is that logPRODY p serves as a proxy for the global market structure of product p. The full market structure is defined by the distribution of the weights of logPRODY p across countries. As mentioned in Section 3.2, these weights are given by the nRCA cp and they are proportional to the competitive advantage of country c in making product p. The market structure that maximizes H, or competition, is named asymptotic in [20], and it depends on Complexity. Low-Complexity products typically show an asymptotic distribution of comparative advantage that is uniform across all countries, or sometimes mildly peaked on low-Fitness countries. High-Complexity products show instead a sharp peak of comparative advantage on high-Fitness countries. The name asymptotic comes from the observation that whenever the market structure of a product is different from the asymptotic, it tends to revert to it. In doing so, it increases competition (H). LogPRODY is by definition the expectation value of the GDP on the distribution of comparative advantage, so its value tends to revert to the value it assumes on the asymptotic distribution. Interpretation for the horizontal displacements (along the Complexity axis) is, instead, less clear-cut. This difference in interpretability between logPRODY and Complexity displacements plays a role into our discussion of Section 2.3.

SPSb
As mentioned in Section 2.1, Bootstrapped Selective Predictability Scheme (SPSb) is a prediction technique allowing quantitative forecast of GDP growth for a country by averaging the growth of countries nearby on the Fitness-GDP (FG) plane [16,18]. We will describe the algorithm in detail here. Given xĉ ,t , the position of countryĉ in the FG plane at timet, we want to forecast δxĉ ,t , the future displacement of countryc from timet tot + ∆t. Note that while the position on the FG plane is vectorial ( x), we are referring to the displacement as a scalar (δx). This is because we want to keep the formalism of the original work, which is concerned only with displacement along the GDP direction. Nothing forbids to forecast displacement along any arbitrary direction, though. In that case, the displacement would have to be a vector quantity. To do so, we consider the set of observed past observations (δx c,t , x c,t ) on the FG plane, which we will call analogues. Note that, if one wants to rigorously implement a backtesting procedure, only the analogues for which t <t are allowed. It is possible to bootstrap an empirical probability distribution for δxĉ ,t in two steps:

1.
Sample with repetition the N available analogues with a probability distribution p given by a gaussian kernel centered in xĉ ,t , i.e., the probability of sampling the analogue displacement δx c,t is: Note that the probability of sampling depends only on the Euclidean distance between xĉ ,t and the position of the analogue.

2.
Sample B = 1000 bootstraps with the above procedure (bootstrap) and average the displacements per bootstrap. The global distribution of these averages is the empirical probability distribution for δ xĉ ,t . The mean of the distribution is used as the prediction value and the standard deviation as the uncertainty on the forecast.
An example of the resulting prediction is shown in Figure 8. The analogues are represented in green (not to scale), and the opacity is proportional to their weight in the final prediction. Analogues excluded from the calculation because are observed in the at times t future ≥t are represented as red dots. A blue arrow represents the predicted displacement on the plane (for both GDP and Fitness), while a red arrow represents the observed displacement during ∆t.

Nadaraya-Watson Kernel Regression
Nadaraya-Watson kernel regression was originally introduced in 1964 [32,33]. Its purpose is to estimate the conditional expectation of a variable Y relative to a variable X, which we will denote as E(Y|X), in the hypothesis that the probability distributions f (X, Y) and f (X) exist. If one has n sampled observations (X 1 , Y 1 ), . . . , (X n , Y n ) (where X can be multivariate), the regression model is: where m(x) is a (yet) unknown function and the errors satisfy these hypotheses: One can try to approximate the probability distributions with a kernel density estimation: where K h (x) = K(x/h)/h is a kernel, i.e., a non-negative function such that K(x)dx = 1, and h > 0 is called bandwidth and scales the kernel to provide smoothing to the regression. In this paper we will use only one type of kernel, the gaussian (also known as radial basis function): K(x) = e −x 2 . The conditional expected value can therefore be approximated, using Equations (26) and (27) as: Therefore we can rewrite m in Equation (24) as:

HMM Regularization
As explained in Section 3.1, the traditional way to calculate the M cp matrix consists of calculating the RCA(Equation (14)) and then thresholding it (Equation (15)). This procedure introduces noise in the matrix because very often the value of RCA fluctuates around the threshold. By introducing time in the estimation of the M cp it is possible to mitigate this problem. The procedure has been introduced in [18], and it consists of modelling each RCA cp time series as the emission probabilities of hidden states in a Hidden Markov Model [34] (HMM). The competitive advantage of a given country c in making product p is represented as a series of 4 quantized "developement stages", obtained by calculating the quantiles of the RCA cp time-series. We will call this quantized matrix RCA q To each of these development stages corresponds a probability to express a given value of RCA cp . Countries transition between these development states with a Markov process that has transition matrix T. Both T and the parameters of the RCA distribution are estimated with the Baun-Welch algorithm [34]. Additionally, one separate model is evaluated for each country. The algorithm produces one RCA q cp matrix for each year of observation, containing the most probable development stage at each timestep. The matrices can then be binarized. It can be shown that this regularization technique reduces noise and increases the predictive performance of the SPSb algorithm [18].

Datasets and Product Digits
In this work, we use a dataset containing all the information of the EXM matrix (from which all the Economic Complexity metrics can be calculated). We call it BACI, and it is documented in [35].
The original data in BACI comes from UN-COMTRADE, and it has been further elaborated by CEPII, which sells the right to use it. The elaborated version of the dataset is not in the public domain, but a free version without data cleaning is available on the BACI section of the organization's website [36]. 149 countries are included in our analysis, spanning 21 years from 1995 to 2015. Products are classified by UN-COMTRADE according to the Harmonized System 2007 [37] (HS2007). HS2007 is divided in 16 Sections, which are broad categories such as, e.g., "Vegetable Products", "Textiles", "Metals", and so on. Products are then hierarchically denoted each by a set of 6-digit codes. The code is divided into three 2-digit parts, each specifying one level of the hierarchy: so the first part (Chapter) indicates the broadest categories, such as e.g., "Cereals" (10xxxx). The second two digits (Heading) specify further distinctions in each category, for example, "Rice" (1006xx). The last two digits (Subheading) are more specific, e.g., "Semi-milled or wholly milled rice, whether or not polished or glazed" (100630). For the analysis mentioned in the paper, we look at data for products aggregated at both 4-digit level (1131 products retained) and 6-digit level (4227 products). Data cleaning procedures outside of the HMM regularization mentioned above consist in the elimination of extremely small countries and countries with fragmented data; aggregation of some product categories that are closely related, and (for what we call non-regularized data) a very simple regularization of the M cp matrices based the recognition and substitution of fixed handmade patterns. GDPpc data has been downloaded from the World Bank Open Data website [38].

Conclusions
In this work, we focused on the analysis of Product Complexity, which had received little attention since [20]. The application of the motion model to the 6-digit data set with and without HMM regularisation seems to indicate that much of the change in Complexity over time is due to noise. Further analysis will be certainly needed on this topic, as it could lead to a better understanding of the Complexity measure as discussed in Section 2.3. We suggest that these results should be strengthened and confirmed in future work by an evaluation of the quantity of noise might be carried out, in the fashion of [28,39]. Insights gathered this way might be used to calibrate a model that evaluates the effect of noise on Complexity change over time. Also very interesting is the finding that changes in Complexity might be autocorrelated over time. Further analysis is needed to clarify whether this is true, and if appropriate to understand the causes of the autocorrelation. Applying SPSb to the CL plane seems to confirm the findings of [20] regarding the meaning of logPRODY and gives further grounds to argue that changes in Complexity over time are not relevant. The same suggestions as before apply: further validation with a study of the noise is probably a good research path. We analysed the change in nestedness caused by the HMM regularisation technique on the M cp matrices, and thoroughly validated the statistical significance of the difference with several null models. We suggest that aggregating data from 6 to 4-digit level might have a regularising effect. Finally, in order to be able to apply SPSb to a data set larger by one order of magnitude than what was previously done, we developed proof that SPSb itself converges, for a high number of iterations, to a well-known statistical learning technique, NWKR. The two techniques can be used interchangeably. NWKR has the advantage of being significantly faster, and of producing a deterministic result. The proof also has the benefit of further clarifying the nature of SPSb. This technique belongs to the same family of algorithms that predict by similarity based on distance, such as NWKR and k-nearest neighbours. We suggest that regression trees might do well in its place, too. We also suggest that a further technical development in this field might be the introduction of one of the many flavours of variable-bandwidth NWKR techniques because of the significant changes in density of analogues over the considered data sets.
Author Contributions: All authors contributed equally.
Funding: This research received no external funding.

Acknowledgments:
We would like to thank Luciano Pietronero, Andrea Gabrielli, Andrea Zaccaria, Andrea Tacchella, Dario Mazzilli, for enlightening conversations and for freely exchanging their findings and data with us. Thanks to Anshul Verma for the many useful discussions. We also wish to thank the ESRC Network Plus project "Rebuilding macroeconomics". We acknowledge support from Economic and Political Science Research Council (EPSRC) grant EP/P031730/1. We are grateful to the NVIDIA corporation for supporting our research in this area with the donation of a GPU.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: in the CL plane, but there is another important source of noise too. An important contribution to the change in Fitness is due to new products being exported (or lost) over time. But in a backtesting, the M cp and nRCA matrices fed to Equation (A1) contain only information about products exported at the initial time. This is true for the GDP too, if one substitutes the M cp matrix with the nRCA in Equation (A2).