The Probability Distribution of Worldwide Forest Areas

: This paper analyses the probability distribution of worldwide forest areas. We find moderate support for a Pareto-type distribution (power law) using FAO data from 1990 to 2015. Power laws are common features of many complex systems in nature. A power law is a plausible model for the world probability distribution of forest areas in all examined years, although the log-normal distribution is a plausible alternative model that cannot be rejected. The random growth of forest areas could generate a power law or log-normal distribution. We study the change in forest coverage using parametric and non-parametric methods. We identified a slight convergence of forest areas over the time reviewed; however, random forest area growth cannot be rejected for most of the distribution of forest areas. Therefore, our results give support to theoretical models of stochastic forest growth.


Introduction
A current emerging environmental concern is the loss of forest area in many developed countries. Economic and population growth requires increasing amounts of resources (such as land and timber). If these resources are not renewed, or regeneration is not adequate, one might expect a gradual depletion of resources over time.
After several decades of worldwide deforestation, recent data reports good news. In recent years, the current rates of deforestation have diminished in many countries. The Global Forest Resources Assessment (FRA) 2020, elaborated on by the Food and Agriculture Organization (FAO) of the United Nations [1], highlights that "the rate of net forest loss decreased substantially over the period 1990-2020 due to a reduction in deforestation in some countries, plus increases in forest area in others through afforestation and the natural expansion of forests." This changing trend from a decrease to an expansion in forests was defined as a forest transition by Mather [2] and can be expressed in terms of the environmental Kuznets curve [3]. Empirical evidence supporting the forest transition is increasing (as reported in [4][5][6][7]). Nevertheless, most of these studies are case studies, and the fate of a country's forest area depends on country-and area-specific idiosyncratic factors [8]. These factors include transport costs and trade issues, changes in land use, migrations from rural to urban areas, agricultural sector productivity (which reduces the pressure on arable land), energy diversification (which reduces energy dependence on wood fuel), climate change, and changes to the mindset of individuals who are becoming increasingly concerned about the preservation of nature.
Rather than focusing on a particular case of study, our approach here is global. We aimed to analyse the probability distribution of worldwide forest areas and to search for consistent statistical patterns in forest-area-frequencies. This paper contributes to the literature in several ways. First, using FAO yearly data from 1990 to 2015, we will describe the variation in the frequency distribution of worldwide forest areas over time.
Our benchmark model is a power law. Mandelbrot introduced the idea that one of the main characteristics of "nature" was that it possessed so-called "scaling laws" (or "power laws") [9], which is a property related to the fractal structure of nature, with the word "nature" designating both physical geography and human geography [10]. Therefore, power laws are common features of many complex systems and are applied in studies on varied environmental-related phenomena, such as the intensity of earthquakes [11,12], losses caused by floods [13], precipitation [14], forest fires [15] and the size distribution of national carbon dioxide emissions [16]. They have also been applied in human geography to analyse city and country size distributions [17,18].
Using the method of Clauset et al. [19], we find that the power law is a plausible fit in all years, but there is a plausible alternative model-namely, the log-normal distribution. If the probability distribution of forest areas follows a power law, the relationship between magnitude and frequency could be satisfactorily fitted by a straight decreasing line with a negative slope. This striking empirical regularity could have important empirical, theoretical, and policy implications. However, to our knowledge, this issue has remained unexplored from either a theoretical or an empirical point of view.
Secondly, exploiting our data's yearly temporal dimension, we study the behaviour of the rates of forest area changes. In this case, we test whether there is any relationship between a country's initial forest area and its rate of change: do large forest areas show higher rates of change, or on the contrary, are their deforestation rates higher? In particular, we empirically test random (or stochastic) forest area growth (i.e., the change in forest coverage is independent of the initial forest area), which could generate both Pareto and log-normal distributions. Although we find evidence of a slight convergence in forest areas over the period considered, our results support random growth in forest areas from 1990 to 2015 for most of the distribution of forest areas, indicating that no systematic pattern of change can be identified at a country level.
The paper is organised as follows: Section 2 introduces the materials and methods used. Section 3 shows the results, containing the statistical analysis of the probability distribution of worldwide forest areas and the analysis of its evolution over time. Lastly, Section 4 discusses the main results and concludes.

Data
Forests data come from the FAO statistics (FAOSTAT) concerning forest land by country. Although forests do not conform to geopolitical boundaries (for example, the Amazon rain forest spans nine countries), it is common to assess forest area changes by country. As forests are managed by countries, any changes in forest areas can reflect country-level forest policies.
FAO defines forest land as "land spanning more than 0.5 hectares with trees higher than 5 metres and a canopy cover of more than 10 per cent, or trees able to reach these thresholds in situ." Chen et al. highlight the limitations presented by FAO's concept of forest land [20]. The main one is that the term "forest" in the FRA reports represents a type of land use rather than physical trees compared to the satellite-based land cover datasets. Thus, an area can be classified as a forest if it is registered as "forest" land use, even if there is no tree. Data are collected from FAO member countries through the annual FAO questionnaire on land use, irrigation, and agricultural practices. The questionnaire design process involved users, national correspondents, and experts from various technical backgrounds. Starting from the Global Forest Resource Assessment [21][22][23][24][25] data on forest area for the years 1990, 2000, 2005, 2010 and 2015, FAO provides annual data via linear interpolation to obtain a complete time series from 1990 to 2015, which is the period considered in this study. Thus, we can estimate the year-by-year statistical distribution. Table 1 shows the sample sizes for each year and the descriptive statistics. The number of observations represents the number of countries included in the sample, which slightly increased over time. Forest land is reported in 1000 hectares. Values of the average forest area and its standard deviation are quite persistent, but a slight decrease in both statistics can be observed over time. This decrease is observed in the first half of the 1990s, and in the last several decades, the mean forest area has stabilised above 18 million hectares. The maximum value corresponds to the Russian Federation (former USSR in 1990 and 1991) in all years, while the minimum forest area is always recorded at the Faroe Islands. Our sample includes all countries with no size restriction but, although the number of countries in the sample is high (ranges from 196 in 1990 to 223 in 2015), we acknowledge that the FAO data includes only a partial list of countries because it does not provide information on all countries worldwide because some countries are non-represented in the FAO Regular Programme. Moreover, these data could be biased towards some areas on the planet (and some forest types) as noted in FAO reports using these data to assess global forests. Therefore, our results are restricted to a subset of the planet. Nevertheless, in 2015 most data were reported from countries themselves (a total of 155 reports)-countries that contain 98.8% of the world's forests according to the FAO data. The quality of data also improved over time. Keenan et al. found that estimates of about 60% of global forest area in 2015 were reported to be based on data of the highest quality (e.g., using remote sensing data), while this figure was 57% in 1990 [26]. This improvement in data quality is especially significant in certain areas, such as the tropical countries [27], although several methodological issues persist. Natural variation in forest growth and estimation errors in forest inventories can cause uncertainty in forest growth and developmental predictions [28]. Even pixel-based comparisons of land cover maps built using remote sensing data can reveal spatial disagreement and uncertainty [29]. This problem usually arises in the measurement of natural resources, and several technical methods have been developed to address this issue; for some real-life case studies of uncertainty on sustainability see [30][31][32][33][34][35]. Global Forest Resources Assessments to reveal uncertainties in the global forest changes in the early 21st century, finding that these datasets displayed substantial divergences in total area, spatial distribution, latitudinal profile, and annual area change [20]. Nevertheless, Chen et al. conclude that an inconsistent definition of forest areas is not the major factor driving the inconsistencies in the overall global forest area change, and acknowledge that the FRA reports are the most comprehensive forest assessment datasets, which are widely used for forest conditions popularizations, policy guidance, and land cover data accuracy validations [20].

Power Laws and Curve Fitting
Let S denote the forest area (measured in hectares) by country. If forest area is distributed according to a power law, also known as a Pareto distribution, the density ∀S ≥ S, in which a > 0 is the Pareto exponent (or the scaling parameter), and S is the number of forest hectares in the country at the truncation point, which is the lower bound to the power law behaviour.
Taking natural logarithms, we obtain a linear specification: where u represents a standard random error (E(u) = 0 and Var(u) = σ 2 ) and ln A is a constant. The greater the coefficientâ, the more homogeneous forest areas are across countries. Similarly, a small parameter (less than 1) indicates a heavy-tailed distribution. From Equation (1), it seems easy to estimate the Pareto exponent because it is just the slope of a line fitted by Ordinary Least Squares (OLS). However, this regression analysis used commonly in the literature can present some problems [36]. The main one is that the Maximum Likelihood (ML) estimator is more efficient if the underlying stochastic process is really a Pareto distribution [37,38]. Furthermore, both [37] and [19] highlight that the OLS estimates of the Pareto exponent are subject to systematic and potentially large errors. Finally, this procedure is strongly biased in small sample sizes [39].
Therefore, to overcome these limitations we use we use an innovative method proposed by Clauset et al. to estimate power laws, based on the (ML) estimator of the Pareto exponent [19]:â where n is the number of data points. Sample size is an important issue for the ML estimation. The ML estimator's properties are consistency, normality, and efficiency, but only when the sample size approaches infinity. In the field of power laws estimation, [38] showed that the variance of the estimates obtained with the ML estimator is notably lower than that of the estimates using a linear fit on the first five bins in the frequency distribution. In fact, the ML estimator has been shown mathematically to be the minimum variance unbiased estimator [40]. Therefore, the ML estimator has a lower variance than any other unbiased estimator for all possible values of the scaling parameter. This makes the ML estimator the most accurate and robust method for estimating the power law scaling parameter [41]. Therefore, even if the sample size is low, ML is less biased than other estimators. Clauset et al. [19] propose an iterative method to estimate the adequate truncation point (S). The exponent a is estimated for each S i ≥ S using the ML estimator (bootstrapped standard errors are calculated with 1000 replications). Then, the Kolmogorov-Smirnov (KS) statistic is computed for the data and the fitted model. The S lower bound that is finally chosen corresponds to the value of S i for which the KS statistic is the smallest.
Clauset et al. [19] proposed several goodness-of-fit tests (alternative statistical tests to check whether a variable is Pareto distributed are available; for instance, see [42]). In the same way as Brzezinski, we used a semi-parametric bootstrap approach [43]. This procedure is based on the iterative calculation of the KS statistic for 500 bootstrap dataset replications. This method samples from observed data and checks how often the resulting synthetic distributions fit the actual data as poorly as the ML-estimated power law. Thus, the null hypothesis is the power law behaviour of the original sample for S i ≥ S. Nevertheless, this test has an unusual interpretation because we can always fit a power law regardless of the true distribution from which our data were drawn. Clauset et al. [19] recommend the conservative choice that the power law is ruled out if the p-value is below 0.1: "that is, it is ruled out if there is a probability of 1 in 10 or less that we would merely by chance get data that agree as poorly with the model as the data we have". Therefore, this procedure only allows us to conclude whether the power law is a plausible fit to the data. Finally, we compare the linear power law fit with the fit provided by other non-linear standard statistical distributions, the log-normal and exponential distributions. The density functions for these distributions are for the log −normal and p(S) = e −λS λe λS for the exponential. We use Vuong's model selection test to make bilateral comparisons between the power law and the other distributions. Although there are specific tests designed to compare the fit provided by a power law and a log-normal distribution [44], Vuong's test allows for the comparison between any two distributions. The test is based on the normalised log-likelihood ratio; the null hypothesis is that the two distributions are equally far from the true distribution, while the alternative is that one of the test distributions is closer to the true distribution. High p-values indicate that one model cannot be favoured over the other, while low values indicate that one of the two distributions provides a better fit to the true distribution. If the null hypothesis is rejected, the sign of the normalised log-likelihood ratio indicates which one of the two compared distributions is closer to the empirical data.

Parametric and Non-Parametric Empirical Models of Change in Forest Coverage
Again, let S it be the forest area (measured in hectares) of the country i at time t and let Change it be its logarithmic change in forest coverage; then Change it = ln S it − ln S it−1 . One possible issue related to the change in forest coverage is that it might change simply because the country's area changed over time. To check whether this scenario could be the case, we use country area data from FAO to compute the change rate of country's area. Most of the rates are zero since countries' boundaries usually do not change over time. Only in 187 cases (3.4% of the total), we obtained a non-zero rate of change in land area. We then calculate the correlation between the rate of change in forest coverage and the rate of change in the country area in the same year, when the latter is non-zero: Spearman's rho = 0.0038. Furthermore, we also run a test in which the null hypothesis is that both rates are independent, and the p-value of the test is 0.9614. Therefore, we can conclude that changes in forest coverage are not significantly driven by changes in country area.
Next, we define g it as the normalized rate of change (by subtracting the contemporary mean and dividing by the standard deviation in the relevant year). Rates of change are normalised because we are considering a panel of rates of change from different years. The hypothesis we aim to test is the random (or stochastic) growth of forest areas, that is, whether the rate of change of the forest areas is independent of its initial area. First, we consider the following parametric model of change in forest coverage: where φ j are country fixed effects, δ t are time fixed effects, and u it is the residual term, which we assume to be identically and independently distributed for all countries, with E(u it ) = 0 and Var(u it ) = σ 2 ∀i, t. The specification includes a square term of the initial forest area (a quadratic function) to capture non-linearity in the relationship between change in forest area and initial size. Note that Equation (2) could be easily extended to include any other variables that can have an important influence on the growth of forests, from human actions to climate change [45,46]. Nevertheless, as our main interest is to tests the random (or stochastic) growth of forest areas and not to analyse the different drivers of change in forest coverage, the model in Equation (2) does not include additional regressors and the β j are the key coefficients, capturing the effect of the initial forest area on the rate of change. However, although the β j coefficients help to detect non-linearities, a parametric regression is not necessarily the best way to address such non-linear relationships. Some authors [47] have highlighted the advantages of the non-parametric approach over the standard parametric one. Mainly, non-parametric methods do not impose any structure on underlying relationships that may be non-linear and may change over time (no need to restrict the relationship to being stationary).
Therefore, we also perform a non-parametric analysis using kernel regressions [48]. This consists of taking the following specification: where g i is again the normalized rate of change (by subtracting the contemporary mean and dividing by the standard deviation in the relevant year) and s i is the logarithm of the ith country's forest area (s i = ln S i ). Instead of making assumptions about the functional relationship m,m(s) is estimated as a local mean around point s and is smoothed using a kernel, which is a symmetrical, weighted, and continuous function in s. The kernel used is an Epanechnikov, and the bandwidth is set using Silverman's rule of thumb. Thus, this non-parametric estimate allows the rate of change to vary with the initial forest area over the entire distribution. We run the kernel regression for each period and for a pool from 1990 to 2015, using the Nadaraya-Watson method to estimatem(s). As a robustness check, we re-estimated the kernel regression using the LOcally WEighted Scatter plot Smoothing (LOWESS) algorithm instead of the Nadaraya-Watson method and identified similar results (these results are available from the author upon request). As the rates of change are normalised, if the change was independent of the initial forest area, the non-parametric estimate would be a straight line on the zero value and values different from zero would involve deviations from the mean: significant higher-than-zero values would indicate divergence (with larger forest areas showing rates of change higher than those of the smaller ones), while negative estimates would point to convergence with the largest forest areas showing rates of change lower than those of the smaller units.

The Probability Distribution of Worldwide Forest Areas
We use the yearly FAO dataset to estimate the probability distribution of worldwide forest areas by year from 1990 to 2015 by fitting a power law for each period of our yearly sample of countries. Data interpolation could generate some doubts about the robustness of the annual data set. Furthermore, one possible concern with our analysis might be that the change in the list of countries might bring about bias to our results. Thus, as a robustness check, in Appendix A, we re-estimate the main results using only the FRA periodical data, considering a fixed list of countries. Figure 1 shows the results for four selected years: 1990, 2000, 2010, and 2015 (the results for all of the years are available from the author upon request). The data, plotted as a complementary cumulative distribution function (CCDF), are fitted by a power law, and its exponent is estimated using the ML estimator. For illustrative purposes, the log-normal distribution is also fitted to the data by ML (the blue dotted line). The optimal lower bound for both distributions is estimated using the method provided by Clauset et al.'s [19] method. The black line indicates the power law behaviour of the upper tail distribution. data, considering a fixed list of countries. Figure 1 shows the results for four selected years: 1990, 2000, 2010, and 2015 (the results for all of the years are available from the author upon request). The data, plotted as a complementary cumulative distribution function (CCDF), are fitted by a power law, and its exponent is estimated using the ML estimator. For illustrative purposes, the lognormal distribution is also fitted to the data by ML (the blue dotted line). The optimal lower bound for both distributions is estimated using the method provided by Clauset et al.'s [19] method. The black line indicates the power law behaviour of the upper tail distribution.
The estimated Pareto exponent is quite persistent over time with a value around 1.8 across all years (see Table 2). Not only is the scaling parameter persistent, the threshold estimate is also consistent over time with values around 8000 in most years. Thus, the power law tail starts from forest areas ≥ 8,000,000 hectares, including an average number of 61 countries by year. Note that this optimal threshold identifies the point of the forest area distribution at which the data's power law behaviour starts. We fit the different distributions to the upper tail, which means that our analysis focuses only on the countries with the largest forest areas, and although the threshold and the size of the upper tail may vary between years (not dramatically, as shown in Table 2), there are few variations in the sample of countries included in the upper tail. Therefore, although some countries enter the sample of FAO data over time (see Table 1), this variation in the set of countries should not cause a significant change in our results for the upper tail distribution because usually they are small countries. Nevertheless, Appendix A demonstrates that our results are robust regardless of new country entries in the sample.  The estimated Pareto exponent is quite persistent over time with a value around 1.8 across all years (see Table 2). Not only is the scaling parameter persistent, the threshold estimate is also consistent over time with values around 8000 in most years. Thus, the power law tail starts from forest areas ≥ 8,000,000 hectares, including an average number of 61 countries by year. Note that this optimal threshold identifies the point of the forest area distribution at which the data's power law behaviour starts. We fit the different distributions to the upper tail, which means that our analysis focuses only on the countries with the largest forest areas, and although the threshold and the size of the upper tail may vary between years (not dramatically, as shown in Table 2), there are few variations in the sample of countries included in the upper tail. Therefore, although some countries enter the sample of FAO data over time (see Table 1), this variation in the set of countries should not cause a significant change in our results for the upper tail distribution because usually they are small countries. Nevertheless, Appendix A demonstrates that our results are robust regardless of new country entries in the sample. The power law appears to provide a good description of the distribution behaviour. In contrast, the fit of the log-normal distribution does not seem to be visually appealing, especially for the higher observations. Nevertheless, visual methods can lead to inaccurate conclusions, especially at the upper tail because of large fluctuations in the empirical distribution [49], so next we conduct statistical tests on the goodness of fit. Table 2 shows the results of the tests. The p-values of the test are always higher than 0.1, confirming that the power law is a plausible approximation to the data's real behaviour, as we cannot reject the power law in any case.
Results in Table 2 show that the log-normal distribution is a plausible alternative to the power law that we cannot reject (to run the test we used the same lower bound, the estimated value corresponding to the power law). In contrast, while the exponential distribution is clearly rejected with low p-values of the test and a positive and large value (not shown for size restrictions, but available from the author upon request) of the normalized log-likelihood ratio for all years. Therefore, using the terminology described by Clauset et al.'s [19], we obtain moderate support for the power law behaviour of the crosscountry probability distribution of forest-area-frequencies: the power law is a plausible fit, but there is also a plausible alternative.

An Analysis of Change in Forest Coverage
The above results suggest what can be considered as a snapshot of the probability distribution of worldwide forest areas from 1990 to 2015. For each year, we estimated the Pareto exponent and conducted a goodness-of-fit test to indicate the plausibility of a power Sustainability 2021, 13, 1361 9 of 19 law model. Furthermore, our estimates revealed that the exponent of the power law for the upper-tail distribution remained stable throughout the considered period. However, this finding does imply that the distribution of worldwide forest areas remains static. To illustrate this point, Figure 2 shows the empirical density functions for the first and last periods in our sample (1990 and 2015), which was estimated using adaptive kernels. Contrary to the previous analysis where we focused on the upper-tail distribution behaviour, all observations are considered; the average estimated threshold was 8145 (see Table 2), which in logarithmic terms corresponds to a value roughly equal to 9. Although the shape of the empirical distribution is quite similar in both periods, we can observe a loss of density in both the upper and lower tails and, as a result, an increase in density in the central values. Therefore, in 2015 we find that the empirical distribution of forest area coverage was slightly more even than in earlier years.  Economic literature on the distribution of financial assets [50], firm size [51], and cit size [52] usually concludes that a Pareto-type distribution is generated by a random growth process (in the firm and city size literature, this hypothesis is called Gibrat's law Furthermore, other plausible, alternative models that cannot be rejected in the previou empirical analysis, such as the log-normal distribution, can also generate random rates o change in forest areas. The hypothesis that is usually tested states that the rate of chang of the variable is independent of its initial size (i.e., the underlying growth model is multiplicative process).
In ecology and bioeconomics the common approach is also to treat changes in th resource population as a random variable [53]. The bioeconomics literature typically as sumes that the standard deviation is proportional to the resource population and com bines this with a mean growth component following a Brownian motion that can be geo metric [54] or logistic [55]. Therefore, our analysis of the rates of change can be considere as a test of the models of stochastic forest growth.
We carry out a dynamic analysis of the change in forest areas using the parametri and non-parametric methods, as the FAO dataset enables us to calculate the yearly rate of change in forest areas by country. Table 3 shows the results of the OLS estimation o the parametric model of Equation (2). The first column corresponds to a simple bivariat regression, which indicates a negative and significant impact of the initial forest area o the change in forest coverage. In column 2, we add country and year fixed effects to con trol for temporal shocks and unobserved characteristics that can vary at a country leve Economic literature on the distribution of financial assets [50], firm size [51], and city size [52] usually concludes that a Pareto-type distribution is generated by a random growth process (in the firm and city size literature, this hypothesis is called Gibrat's law). Furthermore, other plausible, alternative models that cannot be rejected in the previous empirical analysis, such as the log-normal distribution, can also generate random rates of change in forest areas. The hypothesis that is usually tested states that the rate of change of the variable is independent of its initial size (i.e., the underlying growth model is a multiplicative process).
In ecology and bioeconomics the common approach is also to treat changes in the resource population as a random variable [53]. The bioeconomics literature typically assumes that the standard deviation is proportional to the resource population and combines this with a mean growth component following a Brownian motion that can be geometric [54] or logistic [55]. Therefore, our analysis of the rates of change can be considered as a test of the models of stochastic forest growth.
We carry out a dynamic analysis of the change in forest areas using the parametric and non-parametric methods, as the FAO dataset enables us to calculate the yearly rates of change in forest areas by country. Table 3 shows the results of the OLS estimation of the parametric model of Equation (2). The first column corresponds to a simple bivariate regression, which indicates a negative and significant impact of the initial forest area on the change in forest coverage. In column 2, we add country and year fixed effects to control for temporal shocks and unobserved characteristics that can vary at a country level.
The estimated coefficient for the initial forest area is not significant, although it remains negative. Finally, in column 3 we use the full specification, including both the log-level of initial forest area and its square term, and the country and time fixed effects. The estimated coefficients are significantly different from zero, withβ 1 < 0 andβ 2 > 0, which implies a U-shaped relationship between change in forest coverage and initial forest area, pointing to a robust non-linear relationship between both variables.  Figure 3 shows the non-parametric estimates for a selection of years (the results for all the years are available from the author upon request). The graphs also include the 95% confidence bands. We only focus on the relationship between mean rate of change and initial size although, strictly speaking, random growth implies that the rate has a distribution function with both mean and variance independent of the initial size. The decreasing pattern is clear in all cases and demonstrates that the greater the initial forest area, the lower the rate of change of forest area. This is especially noticeable during 1990 to 1991, in which estimated rates of change are significantly negative for large forest areas. However, we can observe a recovery in the rates at the upper tail end of the distribution for all years. This corresponds with the data from countries with the largest forest areas, displaying high rates of change. Overall, this pattern points to convergence (mean reversion) across countries, especially for the smallest units, consistent with the change in the empirical distribution shown in Figure 2. Nevertheless, the zero value falls within the confidence bands for the entire distribution of forest areas (especially since the years from 2000 to 2001). Thus, we cannot reject the premise that in recent years the rate of change of forest areas has differed from zero (random forest area growth hypothesis).   Figure 3 shows the non-parametric estimates for a selection of years (the results for all the years are available from the author upon request). The graphs also include the 95% confidence bands. We only focus on the relationship between mean rate of change and initial size although, strictly speaking, random growth implies that the rate has a distribution function with both mean and variance independent of the initial size. The decreasing pattern is clear in all cases and demonstrates that the greater the initial forest area, the lower the rate of change of forest area. This is especially noticeable during 1990 to 1991, in which estimated rates of change are significantly negative for large forest areas. However, we can observe a recovery in the rates at the upper tail end of the distribution for all years. This corresponds with the data from countries with the largest forest areas, displaying high rates of change. Overall, this pattern points to convergence (mean reversion) across countries, especially for the smallest units, consistent with the change in the empirical distribution shown in Figure 2. Nevertheless, the zero value falls within the confidence bands for the entire distribution of forest areas (especially since the years from 2000 to 2001). Thus, we cannot reject the premise that in recent years the rate of change of forest areas has differed from zero (random forest area growth hypothesis).   We also build a pool with all the annual rates of change between two consecutive years; there are 5453 forest area-rate of change pairs in the period from 1990 to 2015. Graph (a) in Figure 4 shows the kernel regression of the rate of change for the pool. The estimated mean rate decreases with the initial forest area, indicating a convergent pattern throughout the period while rejecting random growth. Previous year-by-year results showed that since 2000, rates of change were not significantly different from zero, so the pool results may be driven by the years prior to 2000. Nevertheless, another explanation could be that the density of observations around each point of the forest area distribution is uneven (especially at the lower tail of the distribution, with fewer observations), explaining why the estimate is jagged instead of smooth.
To check this last issue, Graph (b) in Figure 4 shows the stochastic kernel estimation of the distribution of normalised rates of change, conditional on the distribution of initial forest areas at the same date. To make the interpretation easier, a contour plot is shown. The graph reveals that although there are some deviations in the rates (especially at the upper tail of the distribution), most of the bivariate density is concentrated around the zero value (which is consistent with the previous finding, noting the tiny scale in the yaxis in Graph (a) in Figure 4). This means that a small number of observations drives the apparent decreasing pattern observed in Figure 4a and that the true pattern for the entire pool is random forest area growth, indicating that both distributions (forest areas and rates of change) are independent for most of the observations throughout the period from 1990 to 2015. We also build a pool with all the annual rates of change between two consecutive years; there are 5453 forest area-rate of change pairs in the period from 1990 to 2015. Graph (a) in Figure 4 shows the kernel regression of the rate of change for the pool. The estimated mean rate decreases with the initial forest area, indicating a convergent pattern throughout the period while rejecting random growth. Previous year-by-year results showed that since 2000, rates of change were not significantly different from zero, so the pool results may be driven by the years prior to 2000. Nevertheless, another explanation could be that the density of observations around each point of the forest area distribution is uneven (especially at the lower tail of the distribution, with fewer observations), explaining why the estimate is jagged instead of smooth.
To check this last issue, Graph (b) in Figure 4 shows the stochastic kernel estimation of the distribution of normalised rates of change, conditional on the distribution of initial forest areas at the same date. To make the interpretation easier, a contour plot is shown. The graph reveals that although there are some deviations in the rates (especially at the upper tail of the distribution), most of the bivariate density is concentrated around the zero value (which is consistent with the previous finding, noting the tiny scale in the y-axis in Graph (a) in Figure 4). This means that a small number of observations drives the apparent decreasing pattern observed in Figure 4a and that the true pattern for the entire pool is random forest area growth, indicating that both distributions (forest areas and rates of change) are independent for most of the observations throughout the period from 1990 to 2015.

Discussion and conclusion
Power laws appear extensively in physics, biology, earth and planetary sciences, economics and finance, computer science, demography, and the social sciences [56]. In this paper, we analyse whether the distribution of forest area coverages can also be well-described by a power law. Using cross-country FAO data of forest areas by year from 1990 to 2015, our study on the probability distribution of forest-area-frequencies reveals new insights: 1. Using the methodology described by Clauset et al., we found moderate support for a power law in the upper tail distribution of worldwide forest areas [19]. A power law cannot be rejected in any period. It is a finding consistent with the size distribution of other phenomena related to forests, such as rainfall precipitations [14] and

Discussion and conclusion
Power laws appear extensively in physics, biology, earth and planetary sciences, economics and finance, computer science, demography, and the social sciences [56]. In this paper, we analyse whether the distribution of forest area coverages can also be welldescribed by a power law. Using cross-country FAO data of forest areas by year from 1990 to 2015, our study on the probability distribution of forest-area-frequencies reveals new insights: 1.
Using the methodology described by Clauset et al., we found moderate support for a power law in the upper tail distribution of worldwide forest areas [19]. A power law cannot be rejected in any period. It is a finding consistent with the size distribution of other phenomena related to forests, such as rainfall precipitations [14] and forest fires [15]. However, the log-normal distribution appears to be a plausible alternative model that we cannot reject in any case.

2.
The scaling parameter's estimated value is around 1.8 during all periods and is quite consistent over time. This finding indicates stability in the probability distribution of worldwide forest areas over time, a result confirmed by the estimated empirical density functions.

3.
The study of the rates of change in forest areas reveals that the distribution of forest areas and their rates of change are independent for most of the observations throughout the whole period from 1990 to 2015. Therefore, random (or stochastic) forest area growth cannot be rejected for most of the distribution of forest areas, which could explain the resulting Pareto (power law) or log-normal probability distribution. 4.
In Appendix A, we run some robustness checks and re-estimate the main results using a subset of the primary sample that includes a fixed list of countries in all years and excludes the annual interpolated values estimated by the FAO. Results are quite similar to those obtained using the full sample from FAO, indicating that our results are not biased by changes in the sample size or issues related to interpolations.
The power law behaviour of forest areas data has important implications. First, the power law, and fat tails in general, have relevant policy implications [16]. Knowledge about the probability distribution of worldwide forest areas can play a crucial role in designing efficient environmental policies. Since a power law characterises the upper tail of the distribution, the total forest area in the world is mainly determined by the largest units; nowadays, five countries (the Russian Federation, Brazil, Canada, the United States, and China) account for 54% of the total forest area in the world, according to the FAO data. Therefore, it makes more sense to focus on these countries that contain the largest forest areas instead of dealing with smaller countries from a policy point of view. Such efforts could facilitate international negotiations and make agreements easier to reach.
Second, a power law suggests complex systems of forest areas. This type of complex systems of forest areas involves a hierarchy that is statistically self-similar and hence fractal [9,57,58]. Therefore, systems of forest areas are a kind of hierarchy with a cascade structure that is similar to other hierarchies observed in nature, such as the hierarchy of rivers and the energy distributions of earthquakes [59].
Third, our findings regarding the random rates of change of forest areas give support to models of stochastic forest growth [53]. Previous research on forest management under uncertainty focuses on the influence of stochasticity from different sources [60]; among them, we can highlight climate change [61]. Although stochastic forest growth can be modelled in several ways (as a Brownian motion process [54] or using Markov models [62], stochasticity in forest growth influences critical issues such as planting and harvest decisions [63]. Finally, our results strongly depend on the dataset used (explicitly, on the "forest" definition by FAO) and on the statistical functions considered. Further research varying any of these two features can generate new findings and remains an open question. We can propose several future lines of research: first, the use of different datasets could provide strong validation of the empirical patterns observed. As we acknowledged in Section 2, FAO data have some problems. Chen et al. recommend the use of the Hansen [64] dataset for forest cover and forest cover change estimations [20], although there are more datasets available. A meta-analysis using data from different sources could provide robust information about forest areas' statistical worldwide distribution. Second, as the log-normal distribution provides an alternative fit that we cannot reject in any case, a mixed distribution combining a log-normal body and power law tails could be considered [65][66][67]. This kind of convoluted distributions could provide a more accurate fit to the probability distribution of forest areas. To date, they have only been applied to study city size distributions [65][66][67][68] and the size distribution of national CO 2 emissions [16].
Funding: This research was funded by Fundación Ibercaja, "Proyectos de investigación, desarrollo e innovación para jóvenes investigadores", 2019 call. The author also acknowledges the financial support of the Spanish Ministry of Economy and Competitiveness (project ECO2017-82246-P), DGA (ADETRE research group), and ERDF.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: A publicly available dataset by FAO was analyzed in this study. These data can be found here: http://www.fao.org/faostat/en/#home, accessed on 15 December 2020.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A
In this Appendix, we conduct a robustness check: we re-estimate the main results of the paper using only Global FRA periodic data for the years 1990, 2000, 2005, 2010, and 2015 [21][22][23][24][25]. Therefore, we exclude linear interpolated values by FAO between FRA years, as natural variability and randomness of the data would disappear (data smoothing) through interpolation. Furthermore, we also use a balanced panel: the countries' list is the same in all periods. The subset of 189 countries considered is fixed, and thus, our results cannot be driven by changes in the sample size or issues related to interpolation. Nevertheless, this sample size is a smaller subset since it contains selected countries compared to the full sample size used in the previous sections. This change results in a lower value coverage of the worldwide forest land.
Thus, we fit the statistical distributions again to this fixed sample of countries. Table A1 reports the new results, which are consistent with our previous findings. Furthermore, the estimated exponent is relatively stable over time with a value of around 1.8 over all years. The threshold estimates are similar to those reported in Table 2, with values a slightly above 8000. The average tail size is 56 countries. The p-values of the power law test are still higher than 0.1, except for the year 2005. Therefore, the power law is a plausible approximation to the data's real behaviour in most periods, although we reject the power law in 2005. Regarding the comparison with the different non-linear distributions, again the log-normal distribution appears to be a plausible alternative to the power law that we cannot reject. In contrast, the Pareto distribution outperforms the exponential distribution. Next, we re-estimate the empirical density functions using adaptive kernels for our fixed sample of 189 countries. Results shown in Figure A1 display an almost identical pattern to Figure 2, confirming that these results are robust even with new countries' inclusion. From 1990 to 2015, we observe a slight decrease of density in the tails and an increase in density in the central values.  We also re-examine the relationship between changes in forest coverage and initial forest area. In this scenario, the rates of change are not annual since they measure the change in forest land between the FRA years:  Table A2, are quite similar to those reported in Table 3: we obtain a negative and significant impact of the initial level of forest area on the rate of change. The main difference is that now, the estimate of the initial forest land's square term is not significant, thus casting some doubts about the non-linear effect of the initial forest area on change. We also re-examine the relationship between changes in forest coverage and initial forest area. In this scenario, the rates of change are not annual since they measure the change in forest land between the FRA years:  Table A2, are quite similar to those reported in Table 3: we obtain a negative and significant impact of the initial level of forest area on the rate of change. The main difference is that now, the estimate of the initial forest land's square term is not significant, thus casting some doubts about the non-linear effect of the initial forest area on change. The non-parametric estimates of the rate of change in forest areas still show a decreasing pattern in all periods (see Figure A2). Similar to the results shown in Figure 3 Finally, Figure A3 shows the non-parametric results for a pool with all rates of change between two consecutive periods; now, there are 756 forest area-rate of change pairs.  Finally, Figure A3 shows the non-parametric results for a pool with all rates of change between two consecutive periods; now, there are 756 forest area-rate of change pairs. Graph (a) in Figure A3 shows the kernel regression of the rate of change for the pool. The estimated mean rate decreases with the initial forest land, but the estimated values are smoother than those presented in Figure 4a. Graph (b) in Figure A3 displays the stochastic kernel estimation of the distribution of normalised rates of change, conditional on the distribution of initial forest areas at the same date, showing a very similar plot to that shown in Figure 4b. Again, most of the bivariate density is concentrated around the zero value.
Overall, results in this Appendix using a balanced sample of countries and excluding interpolated values of forest areas are quite similar to those obtained in previous sections considering the full sample of countries by year provided by FAO. Therefore, we confirm that our results do not appear to have been driven by changes in the sample size or interpolation issues.