Comparing Reported Forest Biomass Gains and Losses in European and Global Datasets

Net CO2 emissions and sequestration from European forests are the result of removal and growth of flora. To arrive at aggregated measurements of these processes at a country’s level, local observations of increments and harvest rates are up-scaled to national forest areas. Each country releases these statistics through their individual National Forest Inventory using their particular definitions and methodologies. In addition, five international processes deal with the harmonization and comparability of such forest datasets in Europe, namely the IPCC, SOEF, FAOSTAT, HPFFRE, FRA (definitions follow in the article). In this study, we retrieved living biomass dynamics from each of these sources for 27 European Union member states. To demonstrate the reproducibility of our method, we release an open source python package that allows for automated data retrieval and analysis, as new data becomes available. The comparison of the published values shows discrepancies in the magnitude of forest biomass changes for several countries. In some cases, the direction of these changes also differs between sources. The scarcity of the data provided, along with the low spatial resolution, forbids the creation or calibration of a pan-European forest dynamics model, which could ultimately be used to simulate future scenarios and support policy decisions. To attain these goals, an improvement in forest data availability and harmonization is needed.


Introduction
Forests consist of the largest terrestrial ecosystems that actively store carbon in living biomass. The sequestration effect is highly relevant in the context of climate change mitigation. Quantifying the magnitude of this effect remains a very active research topic [1]. In addition, forests are an important source of raw materials and renewable energy. For instance, substituting fossil-based materials with forest products provides additional climate mitigation benefits [2,3]. Also, as an integral ecosystem, forests provide countless benefits such as biodiversity habitats and recreation services.
In this context, researchers build models and run scenarios that simulate the synergies and trade-offs between these different demands that are made on forest ecosystems [4,5]. Scenarios and models are generally parametrized or trained to reproduce historical developments. As such, a prerequisite for the calibration of a model is to obtain precise and wide ranging data about the current and historical state of forests, as well as records of silvicultural practices. At the minimum these should include the explanatory variables of growth/yield and harvest models, i.e., forested area, species composition, increments and harvest rates.
National Forest Inventories (NFI) make data directly available for each European country individually through their websites. The direct use of NFI data would certainly offer a high level of detail and number of modeling features such as age breakdown and species composition. However, we were not able to use these data sources for the following reasons: (i) It would require parsing dozens of different websites which each have different do not distinguish between the two. The distinction is complex because changes in natural mortality, anthropogenic and natural disturbances are frequently combined.
Changes in biomass volume through time can generally be described by the difference between increments (in green) and fellings (in red) visible in Figure 1. The gross increment box corresponds to all the above ground biomass growth. It was given a formal definition during the COST action usewood [6]. The net increment is the gross increment minus natural losses (Figure 1 [7].
By crossing these different datasets amongst each other, we were able to compare biomass growth and disturbances albeit at a very aggregated level. These aggregates should be useful for checking whether a forest dynamics model reproduces the historical biomass growth and disturbance trends or whether it diverges from official figures.
In contrast to forests in the boreal or tropical zones, a majority of Europe's forest are managed. Table A1 shows that most countries have close to 90% of their forest area available for wood supply according to the SOEF. Therefore it makes sense to compare the orders of magnitude of biomass dynamics across these sources even though they have different definitions of what constitutes forest land.
There is a need to facilitate exchange between scientific communities. For example, ref. [8] encourages collaboration between scientists contributing to the IPCC assessment reports and scientists contributing to the greenhouse gas inventory reports. In a similar vein, ref. [9] encourages data integration between sources to provide "detailed information at large spatial and long temporal scales that can be used in different modeling frameworks".
Finally, advances in remote sensing are changing the playing field. Forest dynamics modeling can only provide meaningful information when it is calibrated with ground data. Because of the high cost of data collection, inventory data are scarce with a periodicity of 5 to 10 years and a spatial resolution limited to a few hundred or a few thousand sample plots per country. Information gained from these sample plots is then extrapolated to the area of a whole country. In conjunction, airborne and space-based sensors complement ground data by providing repeated measurements over wide areas. However remote sensing observations from satellites and planes do not measure biomass directly. They measure various signals in the electromagnetic spectrum which are indicators of biomass stock (tree structure) or biological processes (photosynthesis). The temporal and spatial resolution of remote sensing methods has recently increased dramatically, with measurements available every year or even at a higher frequency and down to a spatial resolution of 30 m [10]. The detection of land use changes from forestry to agriculture is now performed routinely with a good accuracy at the global level. In permanently forested areas however, current remote sensing maps provide static information on the presence or absence of forest cover. Within a forested pixel, biomass has a fixed value [11]. When biomass is present, it is considered to be at a stable state without any notion of fluxes (input and outputs). Indeed, remote sensing based detection of tree growth or small scale biomass loss remains highly uncertain [12]. In the future, increased use of airborne LIDAR maps will help link field plot data to the scale of satellite based measurements [13].
In contrast to land use changes happening at the global scale, in EU countries, tree cover changes by less than 1% per year [7]. Therefore, the vast majority of forest biomass dynamics and related carbon fluxes happens in forest land remaining forest land. For the foreseeable future, ground measurement will remain the most precise source of information on slow growth processes. And this is where aggregating inventory data remains crucial.

Data Sources
We set out to retrieve data concerning forests on the European continent and to include only data sources that would provide measurements on multiple countries concurrently in a harmonized format. We focused on obtaining measurements expressing the amount of forested areas as well as growth and harvest rates.
There were several public data sources accessible online that provided these types of information in various forms and levels of granularity. Five data sources were identified and included in this study. Each one is detailed in the sections below.
We chose to include 27 countries in the analysis. Namely, all past and present European member states with the exception of Malta which has less than 400 hectares of forested land [7]. Not all countries were present in each data source, unfortunately.

IPCC
The "Intergovernmental Panel on Climate Change" (IPCC) is the United Nations body responsible for assessing the science related to climate change. An integral part of this process is the collection of emission estimates for each participating country. Under the name "National Inventory Submissions", they provide a common reporting format that assembles data on "all greenhouse gas (GHG) emissions and removals, implied emission factors and activity data" [14].
The IPCC provides a bulk download option [15] but it does not distinguish forest land emissions by losses and gains. To retrieve data from the IPCC National Inventory Submission website, we first parsed the HTML table [14] and selected the appropriate link for each country. For countries where several links were available, we picked the files that matched the country's ISO3 code. As the IPCC website currently blocks automated requests with the use of the "Incapsula" software solution, the pages and excel files were downloaded by running a headless browser with the "gecko-driver" and "selenium" [16] technologies to bypass their restrictions. As a side effect of the anti-DDoS techniques employed, their current interface is only practical for the manual retrieval of specific measurements. It actively prevents automated data retrieval and therefore impairs scientists from performing wide-ranging analyses.
We then developed routines that parsed the resulting excel sheets. We were interested in the carbon stock changes in forest biomass reported in " Table 4.a Sectoral background data for land use, land-use change and forestry-forest land" and focused only on forest land that remained as forest land. From this table, we used the columns titled "Carbon stock change in living biomass" subdivided in gains, losses and net change.
Countries varied in the structure of the aforementioned table and the number of rows provided. For instance, some countries used the different rows to distinguish coniferous and deciduous forest land while other countries had no such rows. Other times, the additional rows were used to distinguish between categories such as mainland versus overseas territories instead of forest types. This is due to the fact that the IPCC common reporting format does not impose a specific subdivision of the forest land category, and many countries add or remove rows at will, specifying custom information or not. This often prevents automated parsing of all countries. Fortunately, the code automatically identifies the length of the contained table. However it may require some manual intervention to adapt the software to a new country's data structure as it becomes available.

SOEF
The "Ministerial Conference on the Protection of Forests in Europe" [7] regularly publishes a "State of European Forests Report" (SOEF). This report includes an aggregation of NFI data that is made possible by a common reporting standard.
Each countries' submissions are accessible in a database [17]. To retrieve the data concerning all countries, we parsed the HTML contents of the drop-down menu and automated the download of all excel files based on the links extracted. Following the download, every country's data were contained in a single Excel file. For each file, we parsed four tables: • This was enabled by developing a flexible table parser that can identify the start and end rows within each different file to streamline the process and avoid manual interventions. This was necessary as the excel files differed greatly between countries and do not seem to be post-processed or corrected by SOEF themselves. This is made evident by the presence of calculations and temporary notes written in local languages next to the tables (in usually empty cells), as well as other typos, mistakes and inconsistencies. To help with ease of access and comparison, column titles were renamed and units were converted to SI standards where possible.
The main table of interest in the SOEF source contains the increment and fellings data. There are potential differences in land use definitions and in stock definitions between the SOEF and IPCC. We didn't account for the differences in forest land definitions, but we did account for the differences in stock definitions. Further explanations on the conversion from volumes of tree stem (over bark) to tons of carbon are available in the section on conversion to mass. To choose conversion factors, we used the tables on forest area by forest types and growing stock by forest types to compute a stock per forest type per hectare.

FAOSTAT
FAOSTAT is the corporate statistical database of the "Food and Agriculture Organization of the United Nations". The forest_puller software downloads forest area [18] and wood removals data [19]. In the case of the "Forest Land" dataset, we filtered the data by picking rows where "element" was equal to "area" and where "item" was equal to "forest land". In the case of the "Forestry Production and Trade" dataset, wood removals were determined by picking the rows where "element" was equal to "production quantity". Furthermore, we selected all the "roundwood" and "wood fuel" items, whether they were coniferous or non-coniferous. The wood removals obtained here are narrower in scope that the IPCC loss data, since they cover only the productive forest land. They are expressed in cubic meters under bark, we converted them to tons of carbon using the methodology below.

FRA
FRA stands for "Forest Resource Assessment" and is a report that is published every five years by the FAO (same organization as FAOSTAT). This report provides global information on forest area, stock and additional sustainability indicators.
The software produced downloads two datasets from the "CountrySTAT" platform. The first is titled "Extent of forest and other wooded land" [20]. We filtered this dataset by selecting rows where "category" was equal to "forest" to obtain the total area for each country. The second is titled "Growing stock by forest/other wooded land" [21]. We filtered this dataset by selecting rows where "category" was equal to "total growing stock" and "land type" was equal to "forest" to retrieve the standing stock in each country. This source doesn't provide data on the dynamics, i.e., increments and fellings, therefore we only report the forest area in the results.

HPFFRE
HPFFRE stands for "Harmonized projections of future forest resources in Europe" [22]. The supplementary data [23] covers 21 of the 27 countries studied in this analysis. It was released as part of a work package in the Diabolo project [24].
We selected forest area and felling volumes from the first scenario and used the historical period only, discarding all future predictions made by the model. We also summed the different categories of availability for wood supply together (FAWS, FNAWS, FRAWS).

Conversion to Mass
Though all data sources describe the same phenomena of tree growth and removal, not all use the same definitions or units. The IPCC data source provides measures of carbon stock change in the living biomass using tonnes of carbon per hectare as units. In their case, the term biomass includes the tree stem, the branches and the roots. The same unit is used for both increments and removals.
Meanwhile, SOEF provides increment and felling volumes using cubic meters per hectare and including only the stem of the tree (over bark). FAOSTAT describes roundwood removals measurement as "all quantities of wood felled and removed from the forest and other wooded land or other felling sites. They are measured in cubic meters under bark (without bark)". Lastly, HPFFRE expresses the dynamics in "Stemwood volume measured over bark expressed as unit area volume". It further specifies: "Total stemwood volume measured over bark. Part of tree stem from the felling cut to the tree top with the branches removed, including bark". A summary of the different units used is provided in Table 1. As three of the sources (SOEF, FAOSTAT and HPFFRE) report forest dynamics in volume of stem wood (over bark or under bark) while IPCC reports biomass dynamics in tonnes of carbon, for the purposes of comparison, we converted the merchantable volume increment to a carbon biomass gain (including both above and below ground biomass) using Equation (1) based on the IPCC guidelines [25]: where: BCEF I is the biomass conversion and expansion factor of the annual increment; it accounts for both the density and the expansion of merchantable biomass to above ground biomass [kg/m 3 ]. • R is the root to shoot ratio or the "ratio of below-ground biomass to above-ground biomass (r)" [25] [unitless]. • CF is the carbon fraction of dry biomass [unitless].
Similarly, we converted wood removal volumes H v to losses in tonnes of carbon L mc according to Equation (2) based on [25]: where: •  (1).
The 2006 IPCC guidelines [25] distinguish the BCEF and R parameters along different criteria. First, the criteria to choose BCEF I , BCEF R and BCEF S are: climatic zone, forest type and growing stock level in cubic meters. We allocated each country to one or two climatic zones based on the FAO map of global ecological zones [26]. We used the SOEF forest stock data to compute a merchantable biomass stock per hectare distinguished by coniferous and broadleaves. Based on that stock value, we choose separate BCEF parameters for coniferous and broadleaves in each country. Since country level gains and losses are not distinguished by forest types, we combined each BCEF into a weighted average, using the proportion of coniferous and broadleaved forest stock as a weighting factor. Second, the criteria to choose R are: domain, ecological zone, forest type and a threshold expressed in tons of aboveground biomass per hectare. We multiplied the merchantable stock by BCEF S to obtain the above ground biomass stock and used it as a threshold value (along climatic zone and forest type) to choose the R parameter. Since gains and losses are not distinguished by forest types, we used the ratio of conifer and broadleaves in the stock to obtain a single weighted average value of R for each country. For countries belonging to 2 climatic zones, the R values of the 2 climate zones are used in the weighed mean.
Based on the IPCC guidelines [25], we selected a single value for the carbon fraction of dry matter CF = 0.47. This value expresses tonnes of carbon per tonne of dry biomass.
In order to convert the under-bark volumes reported by FAOSTAT to over-bark volumes, we used an average value of "Volume ratio wood/bark plus wood" equal to 0.88 [27].
Several simplifying assumptions were made when choosing BCEF and R. For countries belonging to 2 climatic zones, the forest area was split equally between 2 climatic zones and the proportion of coniferous and broadleaves were considered to be the same in the 2 climatic zones. Different definitions of forested land use, mean that the biomass changes per hectare do not refer to the same areas. For example some data sources are limited to forest land available for wood supply while other sources include forest land not available for wood supply, even so, FAWS covers more than 90% of the total forest area in most countries according to the SOEF (Table A1). Increments and removals already have an unknown level of uncertainty in the input data and the conversion to tons of carbon increases the uncertainty. Therefore when using these values for comparison purposes, we should look only at their orders of magnitude and keep in mind that there is a large margin of error.

Common Data Format
From a software implementation perspective, the parsing of FAOSTAT bulk data is the most reproducible through time and across countries. The experience FAOSTAT has acquired by providing publicly available agricultural data for decades is apparent as the data are more standardised. Other sources tend to have ill-formtted spreadsheet for some countries or some years. We ended up implementing special conditions in the data preparation software to work around non-standard data input. It is likely that more of these special conditions will have to be implemented in future updates, rendering the updated process less automated than we would have desired.
Following the procedure described above for each data source, the results were compiled in a common format for all countries by using the pandas python package [28] and its DataFrame object. Data frames enable grouped computations, pivoting and table joins necessary to transform the variables to comparable units. Pandas data frames are also compatible with the matplotlib library [29] used for the graphs and visualizations.

Open Software
All the software written for the purpose of this publication to download, store, process, parse and visualize the forest data is freely available as a python package under the name forest_puller. The package can be installed with the command pip3 install forest_puller and the source code is distributed online [30] under the MIT license. For example Equations (1) and (2) are expressed in a vectorized form in the source code of forest_puller/viz/converted_to_tons.py line 85 and 87 of commit 6d713d8. We would like to encourage readers to review and contribute to the software as well as report any issues encountered on the bug tracker. Each source file has ample comments and every routine is documented.

Comparison of Forest Area
The main goal was to compare the changes in biomass volumes. Since these changes are always normalized by the area, it is natural to start comparing forest areas first (Figure 2). For SOEF, HPFFRE and FRA, the total forest area excludes other wooded land, i.e., land not defined as forest, though it should be noted that forest infrastructure such as roads and ditch networks can be reported under forest land. The maximum forest area is shown for each country in Tables A2 and A3 compares the forest area for the most recent years available in each source. In most countries, the total forest area reported by FAOSTAT is identical to the one reported by SOEF. The former has a periodicity of one year while the latter has a periodicity of five years. Additional points in FAOSTAT's yearly data have been obtained by interpolation as is visible in the changes of slope for Denmark and Bulgaria, for example. As the dynamics reported by SOEF and FAOSTAT are highly similar, we will focus on the comparison between the IPCC and SOEF forest areas for the rest of this section.
In Figure 2, four types of patterns emerged: (i) countries for which IPCC and SOEF forest areas are identical: Czechia, Hungary, Ireland, Italy, Sweden, (ii) counties for which the trends are similar but the curves are separated by an offset which could be due to a different forest land definition: Austria, Belgium, Croatia, Cyprus, Germany, Luxembourg, Slovakia, United Kingdom, (iii) countries for which the trends differ only slightly: Bulgaria, Estonia, France (footnote Guyana), Lithuania, Luxembourg, Poland, Portugal, Slovenia and Spain, (iv) countries for which the trends differ markedly: Denmark, Finland, Greece, Latvia, Lithuania, The Netherlands, Romania.
Forest dynamics are modeled differently between productive and non-productive forest land. This distinction is made available in the HPFFRE dataset which reports (i) forest available for wood supply, (ii) forest with restricted availability for wood supply, and (iii) forest not available for wood supply. However, in the IPCC data, this distinction is not available for all countries.

Growth Dynamics
International processes provide methods to aggregate biomass gains into comparable figures. However, conversion factors were not available for all countries, hence Figures 3 and A1 are expressed in the original units. In order to compare other sources with the IPCC, which expresses gains in tonnes of carbon for the total above and below ground biomass, we converted all sources to tonnes of carbon (see Figures 4 and A2).
In most countries, biomass gains are stable or slightly increasing over the period. Net increment values are around five to six cubic meters per hectare according to the SOEF data.
Corresponding gain values are below two tonnes of carbon per hectare in the IPCC data. The following countries show similar stable or slightly increasing trends both in terms of IPCC gains and SOEF increment values: Austria, Croatia, Finland, France, Italy, Poland, Romania, Hungary. However, trends differ for some countries. Indeed, the IPCC gain values are slightly decreasing year over year, while the SOEF data shows increases in gains over the same period of time in Denmark, Netherlands, Slovenia and Lithuania. In the case of Latvia, IPCC gain values have a decreasing trend over the period while the reported SOEF data are stable. Other countries do not have enough data points to compare trends.
Germany and Belgium use the stock change approach to report biomass gain values to the IPCC. As a result, the green curve Figure A1 has typical step shapes with constant gains for a few years followed by large changes. Other countries use the approach "one inventory plus change" which causes the curve to have more gradual annual changes along a trend.   (Austria, Croatia, Finland, France, Italy, Romania). Plots for all countries are available in the Appendix A, Figure A2. The following countries have similar biomass gain levels in the IPCC and SOEF after conversion to tonnes of carbon: Austria, Croatia, Cyprus, Finland, France, Latvia, The Netherlands. Another comparison provided here are the gains averaged over the whole period available in Table A3 and Figure 3. We can see that the IPCC gain level is higher than the corresponding SOEF gain in the case of Italy. Conversely, the IPCC gain level is lower than the one found in SOEF for Denmark, Romania, Slovenia.
A lower gain value is expected in countries where the IPCC forest area is larger than the SOEF forest area. Since the larger area is likely to include more of the unproductive forest land possessing slow growth rates, this lowers the average growth value.
A comparison in tonnes of carbon is not possible in other countries due to a lack of data in one of the sources or a missing conversion factor. Even though the conversion from stem volume increment to biomass gain is approximate, the fact that seven countries have similar values seemed to confirm that the approach is relevant to check the order of magnitude of biomass gains at a national level.

Disturbances Dynamics
Biomass losses are due to the combination of harvest and natural disturbances. At the national level, changes in losses can be due to large fluctuations in economic activity. As an example, in many countries, the impact of the 2008 financial crisis can be observed by a decrease in harvest. Indeed, the red curve moves upward as losses are represented by negative values on the vertical axis ( Figure 4). Then both types of disturbances can be combined. For instance, large storms or insect outbreaks lead to significant amounts of salvage logging visible in the national harvest statistics. A striking illustration is visible in the losses curves of Austria and Czechia as both countries have been severely hit by storms in recent years. Beyond the issue of combined disturbances, one should remain cautious and remember that in the international sources analysed in this article, disturbances losses are reported without any indication of the associated uncertainty. Improved methods to indicate the uncertainty of the estimates would help in understanding the significance of changes in the time series.
In the following countries, biomass loss levels were lower (in absolute value) in IPCC than in SOEF: Austria, Croatia, Cyprus, Denmark, Finland, Latvia, Netherlands, Romania, Slovenia. Countries where biomass loss levels were higher (in absolute value) in IPCC than in SOEF were: Ireland and Italy.
Considering the stark differences of reported loss values (Table A3 and figure 3), there is likely an issue with the expansion factors used in Equation (2). Further analysis of the reasons for these discrepancies would require a more detailed model benefiting from growing stock level broken down by species and climate zones.
On a planetary scale, taking into account indirect land use effect is crucial to avoid underestimating forest emissions, i.e., overestimating the forest sink effect [8]. Additionally, anthropogenic carbon missions lead to increases in natural disturbances and mortality. However, these effects are not separable from the base line disturbances and natural mortality in the National Inventory Reports data. We also note that future climate change is likely to continue impacting the interaction between disturbance agents [31].

Conclusions
The discrepancies across the five sources varied greatly from one country to another. It is difficult to identify the reasons for this incoherence as many sources of errors are compounded in national aggregates. More specifically, the variability along tree species composition, age distribution, soil and climatic conditions is lost in the aggregation process. Disaggregating data across these variables would help to find out the reasons for the differences observed.
Fundamentally, the raw measurements on which all estimates are based, are obtained by observing forests on a very small set of areas localized in space and time. The country-wide forest growth rates are but aggressive interpolations made from these ground measurements. Furthermore, these essential plot data obtained from field campaigns are habitually not divulged by the NFIs as they are protected by statistical confidentiality. To expand the spatial and temporal scales, more players would need to release harmonised public information as close as possible to these primordial quantifications, while continuing to ensure statistical confidentiality.
The data harvesting and merging software we introduce in this article can be reused by others. All data conversion steps have been developed in the python programming language. The software should be capable of updating data automatically as new data becomes available. Though future changes in the structure of the input data might require slight adjustments to the code. Grassi et al. [8] call for the global vegetation modeling community to: "[...] design future models and model experiments to increase their comparability with historical [Green House Gas Inventories] and thus their relevance in the context of the Paris Agreement". We hope the software module we produced can provide an overview of the biomass losses and gains at national levels and can facilitate comparison attempts by the vegetation and carbon cycle modeling community in the future. The harmonized data assembled here are not sufficient to calibrate a European forest dynamics model, but it provides a series of reference points necessary to validate such a model on historical data. The underlying software demonstrates how to structure the data acquisition and how to implement a conversion algorithm. Can it be a meaningful step towards "the availability and provision of harmonized freely-available databases" [9]?
In the future, the spatial and temporal precision of remote sensing data will continue to increase and maps of biomass change will become available. There will be a pressing need to compare them with ground based observations of biomass losses and gains. At the international level, these comparisons can be supported by a framework for sharing ground based observation. Building such a framework will be very challenging. It will be challenging on the scientific level because each ground data collection is adapted to its own biome and it will be challenging on the policy level because each national forest inventory effort is shaped to its particular socio-economic context.

Data Availability Statement:
The data is accessible through the python package forest_puller [30].

Acknowledgments:
We would like to thank our colleagues of the JRC biomass team. In particular, Roberto Pilli and Raul Abad Vinas for introducing us to the IPCC inventory submission dataset, Valerio Avitabile, Andrea Camia, Sarah Mubareka and Nicolas Robert for comments on an early version of this article and Bernd Eckhardt for sharing his experience concerning NFIs data platforms. The work described in this paper has (though not constituting its official output) been carried out in the context of the JRC Biomass assessment study https://ec.europa.eu/knowledge4policy/projectsactivities/jrc-biomass-study_en.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: