An Improved Forest Structure Data Set for Europe

Today, European forests face many challenges but also offer opportunities, such as climate change mitigation and the provision of renewable resources, energy and other ecosystem services. Large-scale analyses to assess these opportunities are hindered by the lack of consistent, spatially explicit and accessible forest structure data. This study presents a freely available pan-European forest structure data set. Building on our previous work, we use data from six additional countries and now consider ten key forest stand variables. Harmonized inventory data from 16 European countries were used in combination with remote sensing data and a gap-filling algorithm to produce this consistent and comparable forest structure data set across European forests. We show how land cover data can be used to scale inventory data to a higher resolution, which in turn ensures a consistent data structure across sub-regional, country and European forest assessments. Cross validation and comparison with published country statistics of the Food and Agriculture Organization (FAO) indicate that the chosen methodology is able to produce robust and accurate forest structure data across Europe, even for areas where no inventory data were available.


Introduction
Since the pre-industrial period, anthropogenic greenhouse gas emissions, including carbon dioxide (CO₂), have steadily increased and will have important impacts on human and natural systems [1]. Forests play an important role within the global carbon cycle because they store a large amount of carbon and mitigate climate change effects [2]. Reducing greenhouse gas emissions by replacing fossil material and energy with renewable resources, such as biomass, is important to avoid a further increase in atmospheric CO₂ concentration. Thus, the use of wood products is expected to increase, since wood is important for a bio-based economy that aims to reduce emissions from the combustion of fossil fuels. In addition, forests provide other valuable ecosystem services such as the protection of infrastructure in mountainous areas, habitat for wildlife and recreation areas, especially near large cities [3].
Forest ecosystems mitigate climate change effects, but they are also directly affected by climate change through changing growing conditions (e.g., temperature, precipitation and length of drought periods). Forest adaptation to changing environmental conditions takes time due to the long lifespan of trees [4,5]. Climate change is often associated with an increase in weather extremes such as drought or storm events followed by wildfires, windthrow and bark beetle infestations [6,7]. This is an additional challenge to the forestry sector because the demanded ecosystem services need to be provided and secured for the future [8,9].
European forests cover about 33% of Europe's total land area [3] and extend from the Mediterranean in the south to the Boreal regions in the north. They grow at elevations from sea level to high mountainous areas. These differences in the regional growing conditions have led to distinct ecosystems which are additionally shaped by the long-lasting historic …

1. to provide an improved gridded forest structure data set at 8 × 8 km resolution across Europe,
2. to assess the error components of the new forest structure data,
3. to obtain land cover information to generate consistent gridded forest structure maps at 500 m resolution enabling upscaling to regions and/or countries, and
4. to evaluate the provided higher resolution maps by calculating country totals and comparing these data to the original NFI data and the FAO statistics.

Materials and Methods
The principal approach of our study is to use only point-sampled National Forest Inventory (NFI) data covering 16 different countries and to apply a gap-filling algorithm for countries and regions where no such data are available. The R statistical software is used for data preparation, statistical analysis and data visualization [35]. The gap-filling is done using the Python script written and made publicly available by Moreno et al. [28]. Figure 1 shows the workflow and the data used, including the methodological steps.
Remote Sens. 2022, 14, 395

Figure 1. Flow-chart for the methodology to derive a gridded gap-filled forest structure dataset from individual NFI plot tree data across Europe. Grey, round-edged boxes represent input and output data and ellipses calculation steps. Light grey, dotted boxes summarize grouping of the input data and the two-step gap-filling algorithm.

National Forest Inventory Data
The National Forest Inventory (NFI) data are obtained from 16 European countries and consist of recorded tree information from 350,489 inventory plots (Table 1). All selected countries maintain a gridded systematic point sampling National Forest Inventory. Data for 12 countries were available from a previous study by Moreno et al. [28] with additional details available in [36]. Data from the Czech Republic, used by Moreno et al. [28], were not used in our study. We obtained data from four additional countries: Albania, Croatia [37], Ireland [38] and the Netherlands [39]. We complemented the data from Italy [40,41] (now covering the whole country) and Belgium (now also including the region Wallonie). Overall, data from about 90,000 additional plots were gathered. Each country has its own inventory system and sampling design (see Table 1) and uses its own definitions and methods. All countries provided plot-level data based on their inventory system. The data were gathered, processed and harmonized as described in Neumann et al. [36]. Harmonization was done according to tree species groups, age classes and, as far as possible, biomass and volume definitions (for details see [19,36]). However, basic definitions of volumes (e.g., inclusion of branches or measuring over or under bark) and sampling designs (e.g., diameter thresholds) cannot be changed, which makes harmonization difficult [20][21][22]. The resulting data comprise a full set of plot-level forest variables derived from the recorded tree information on each plot and cover information such as the carbon content, biomass for individual compartments (stem, branch, foliage, root), volume, height, diameter at breast height, stem number, basal area, stand density index, age class and the tree species group.
In the previous study [28], only six variables (carbon for whole tree, volume, basal area, diameter at breast height, height, age) were considered, while we extend the forest stand characteristics data to ten variables.
The plot-level inventory data are aggregated to an 8 × 8 km grid by averaging the metric variables. For the nominal variables (age class and tree species group), we calculate the proportion and the most frequent value within an 8 × 8 km grid cell based on the number of inventory points belonging to each class or group. At 8 × 8 km resolution, a cell contains on average eight inventory plots, which ensures statistical confidence in the cell values [42]. It is important to note that Moreno et al. [28] used a 0.133 × 0.133 degree resolution and the WGS84 projection, leading to a varying cell size from approximately 111 × 89 km at 37° latitude (e.g., southern Spain) to 111 × 43.5 km at 67° latitude (e.g., northern Finland). In this study, we use the ETRS89-LAEA projection with a fixed cell size of 8 × 8 km across latitudes, providing consistent cell size and resolution in the whole study area (Figure 2).
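The aggregation step described above can be sketched as follows. This is a minimal illustration, not the authors' R workflow: the record keys (`x`, `y`, `volume`, `species_group`) and the function names are hypothetical, and coordinates are assumed to be already projected to a metre-based system such as ETRS89-LAEA.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE_M = 8000  # 8 x 8 km grid cell in a projected, metre-based CRS


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def aggregate_plots(plots):
    """Aggregate plot records to grid cells.

    Each plot is a dict with hypothetical keys 'x', 'y' (projected
    coordinates in metres), 'volume' (m3/ha) and 'species_group'.
    """
    by_cell = defaultdict(list)
    for plot in plots:
        by_cell[cell_index(plot["x"], plot["y"])].append(plot)

    result = {}
    for cell, members in by_cell.items():
        counts = Counter(p["species_group"] for p in members)
        n = len(members)
        result[cell] = {
            "n_plots": n,
            # metric variable: arithmetic mean over the plots in the cell
            "volume_mean": sum(p["volume"] for p in members) / n,
            # nominal variable: share of plots per class and the modal class
            "species_share": {g: c / n for g, c in counts.items()},
            "species_mode": counts.most_common(1)[0][0],
        }
    return result


# Synthetic example: three plots fall into cell (0, 0), one into cell (1, 0)
plots = [
    {"x": 1000.0, "y": 2000.0, "volume": 200.0, "species_group": "spruce"},
    {"x": 7000.0, "y": 500.0, "volume": 300.0, "species_group": "spruce"},
    {"x": 4000.0, "y": 4000.0, "volume": 100.0, "species_group": "beech"},
    {"x": 9000.0, "y": 1000.0, "volume": 150.0, "species_group": "beech"},
]
cells = aggregate_plots(plots)
```

Keeping `n_plots` alongside the aggregated values makes it easy to check how many inventory plots support each cell.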
Table 1. … [43]. For angle count sampling (ACS) we provide the basal area factor, while for fixed area plots (FAP) we show the plot area. Min. DBH is the minimum diameter at breast height (DBH) required for a tree to be included in the sample. Sampling Date Range provides information about the observation period by country. Arrangement of sample plots indicates whether the inventory plots are arranged as single plots or within clusters.

Land Cover and Bioregions for Clustering
Following Moreno et al. [28], we use a bioregion map with six different regions: (i) Alpine, (ii) Atlantic, (iii) Boreal (including Boreal, Arctic and Norwegian Alpine), (iv) Continental (including Continental, Black Sea and Steppe), (v) Mediterranean and (vi) Pannonia. For land cover information, we use version 6 of the MODIS MCD12Q1 product with the University of Maryland (UMD) classification [44] representing land cover conditions of 2005. Spatial aggregation from 500 m to the 8 km resolution was needed to determine the most frequent land cover type within the 8 km grid cell. Only cells dominated by a vegetation land cover type are used for the gap-filling, while cells that are dominated by urban and other non-vegetated land are excluded.
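The majority-based spatial aggregation can be illustrated with a short sketch. It assumes that an 8 km cell corresponds to a nominal block of 16 × 16 cells of 500 m and that the fine grid is a plain list of rows; the function name and the class labels in the example are illustrative, not the MODIS class codes.

```python
from collections import Counter

BLOCK = 16  # nominal 8 km / 500 m: fine cells per coarse cell side


def majority_aggregate(grid, block=BLOCK):
    """Return a coarse grid holding the most frequent class of each block.

    `grid` is a list of equal-length rows of class labels whose
    dimensions are multiples of `block`.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, n_rows, block):
        row = []
        for c0 in range(0, n_cols, block):
            counts = Counter(
                grid[r][c]
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            row.append(counts.most_common(1)[0][0])
        coarse.append(row)
    return coarse


# Small demonstration with 2 x 2 blocks instead of 16 x 16
fine = [
    ["forest", "forest", "urban", "urban"],
    ["forest", "cropland", "urban", "cropland"],
]
coarse = majority_aggregate(fine, block=2)
```

Cells whose dominant class is non-vegetated would then be dropped before the gap-filling, as described in the text.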

Co-Variates for Gap-Filling
We use the following variables as co-variates in the gap-filling algorithm: Net primary production, net primary production trend, canopy height and a climate limitation index.
Net Primary Production (NPP) at 0.0083° resolution is derived using the original MOD17 algorithm for global calculations in combination with a European climate dataset which has improved the NPP estimations for European forests [45,46]. For net primary production trend, a linear regression line is fitted to the annual NPP values of 2000 to 2012 at the original 0.0083° resolution. The trend is given by the slope of the regression line.
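For a single pixel, the NPP trend is the ordinary least-squares slope of annual NPP against year. The sketch below uses a synthetic series; the function name and the numbers are illustrative only.

```python
def linear_trend(years, values):
    """Ordinary least-squares slope of `values` regressed on `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


years = list(range(2000, 2013))  # 2000 to 2012 inclusive: 13 annual values
npp = [500.0 + 2.5 * (y - 2000) for y in years]  # synthetic NPP series
slope = linear_trend(years, npp)  # trend in NPP units per year
```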
Tree canopy height is obtained from a global spaceborne lidar forest canopy height map at 0.0083° resolution [47]. The climate limitation index is the product of three normalized climate datasets: relative growing season length, average annual shortwave solar radiation and average annual vapor pressure deficit. Average growing season length is estimated as the average time between the onset of the increasing leaf area index (LAI) in spring and the end of the decreasing LAI in autumn, using the MODIS Leaf Area Index data [48]. Both shortwave solar radiation and vapor pressure deficit are calculated using the MtClim algorithm, which uses climate data and a digital elevation model as inputs [46,49]. Therefore, by using the climate limitation index, the elevational gradient is also considered. The climate limitation index was still available from the previous work carried out by Moreno et al. [28]. All data are aggregated to 8 km resolution by calculating the average value within the cell.
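The index construction can be sketched as a cell-wise product of three normalized layers. The text does not state how the layers are normalized, so the min-max scaling below is an assumption, as are the function names and the input values; the sketch also does not invert any layer, which the original index may do.

```python
def min_max(values):
    """Scale values linearly to the range 0..1 (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def climate_limitation_index(season_length, radiation, vpd):
    """Cell-wise product of the three normalized climate layers."""
    layers = (min_max(season_length), min_max(radiation), min_max(vpd))
    return [a * b * c for a, b, c in zip(*layers)]


# Three hypothetical cells
index = climate_limitation_index(
    season_length=[100.0, 200.0, 300.0],  # days
    radiation=[10.0, 15.0, 20.0],         # arbitrary units
    vpd=[0.5, 1.0, 1.5],                  # arbitrary units
)
```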

K-Means Clustering and k-Nearest Neighbor Gap-Filling
We apply the land cover and bioregion data and the above-mentioned co-variates in a two-step gap-filling algorithm which (i) clusters cells by similarity and (ii) uses a k-Nearest Neighbor algorithm to fill empty cells. In the first step, cells are grouped according to their land cover and related biogeographical region, using a k-means clustering algorithm to assign all grid cells to their corresponding cluster. In the second step, each cell with missing gridded inventory data is assigned to its nearest neighbor with inventory data belonging to the same cluster using a k-Nearest Neighbor (kNN) algorithm. The kNN method is a non-parametric approach used to predict the values of variables by finding the k most similar objects with observed values within a user-defined co-variate space [49]. Similarity is based on the (Euclidean) distance between the objects in the co-variate space [50]. Following Moreno et al. [28], two k-means clusters and one nearest neighbor are applied. Thus, each cell without NFI data receives the NFI data of the cell which (i) has NFI data, (ii) belongs to the same cluster as the cell with missing data and (iii) is, among all cells fulfilling these two conditions, its nearest neighbor in the co-variate space formed by NPP, NPP trend, canopy height and climate limitation index.
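The second step (one nearest neighbor within the same cluster) can be sketched as below. This is not the script by Moreno et al. [28]: cluster labels are assumed to be precomputed, the co-variates are assumed to be already on comparable scales, and the dictionary keys and function name are hypothetical.

```python
import math


def fill_gaps(cells):
    """Fill missing values with the value of the nearest donor cell.

    Each cell is a dict with hypothetical keys 'cluster', 'covariates'
    (tuple of NPP, NPP trend, canopy height, climate limitation index)
    and 'value' (None where no inventory data exist).
    """
    donors = [c for c in cells if c["value"] is not None]
    filled = []
    for cell in cells:
        if cell["value"] is not None:
            filled.append(cell["value"])  # observed cells stay unchanged
            continue
        # only donors from the same cluster are eligible
        candidates = [d for d in donors if d["cluster"] == cell["cluster"]]
        if not candidates:
            filled.append(None)
            continue
        nearest = min(
            candidates,
            key=lambda d: math.dist(d["covariates"], cell["covariates"]),
        )
        filled.append(nearest["value"])
    return filled


cells = [
    {"cluster": "A", "covariates": (0.10, 0.2, 0.3, 0.4), "value": 250.0},
    {"cluster": "A", "covariates": (0.90, 0.8, 0.7, 0.6), "value": 120.0},
    {"cluster": "B", "covariates": (0.15, 0.2, 0.3, 0.4), "value": 400.0},
    {"cluster": "A", "covariates": (0.20, 0.2, 0.3, 0.4), "value": None},
]
filled = fill_gaps(cells)
```

In the example, the closest cell in co-variate space belongs to cluster B and is therefore skipped; the empty cell receives the value of the nearest cluster-A donor.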

Forest Area Mask
We use the MODIS land cover data and UMD classification to produce a forest area mask at 500 × 500 m resolution. The UMD land cover classification is based, among other things, on tree cover and canopy height. We use this information to assign a forest area to each land cover class. Each forest land cover class, (i) Evergreen Needleleaf, (ii) Evergreen Broadleaf, (iii) Deciduous Needleleaf, (iv) Deciduous Broadleaf and (v) Mixed, is defined by a tree cover > 60% and a canopy height > 2 m. In our forest area mask, cells belonging to these classes are assumed to be fully forested and contribute 25 ha (500 m × 500 m) of forest area. Woody Savannas have by definition a tree cover of 30%-60%, and we therefore define a factor of 0.45 for calculating the forest area for such cells, meaning that each cell contributes 11.25 ha. For Savannas, we define a factor of 0.15 (3.75 ha), and for Cropland/Natural Vegetation Mosaics a factor of 0.05 (1.25 ha) is used. All other land cover classes (e.g., Closed/Open Shrublands, Grasslands, Croplands) are assumed to be non-forested.
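The forest area factors translate into a simple lookup. The factors and the 25 ha cell area are taken from the text; the dictionary and function names are illustrative.

```python
CELL_AREA_HA = 25.0  # 500 m x 500 m

# Share of each 500 m cell counted as forest, by land cover class
FOREST_FACTOR = {
    "Evergreen Needleleaf": 1.0,
    "Evergreen Broadleaf": 1.0,
    "Deciduous Needleleaf": 1.0,
    "Deciduous Broadleaf": 1.0,
    "Mixed": 1.0,
    "Woody Savannas": 0.45,
    "Savannas": 0.15,
    "Cropland/Natural Vegetation Mosaics": 0.05,
}


def forest_area_ha(land_cover):
    """Forest area of one 500 m cell; unlisted classes count as non-forest."""
    return CELL_AREA_HA * FOREST_FACTOR.get(land_cover, 0.0)


areas = {name: forest_area_ha(name) for name in ("Mixed", "Woody Savannas", "Grasslands")}
```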

Calculation of Country Sums and Comparison with FAO Statistics
The forest area mask (ha) is combined with the gap-filled volume map (m³/ha) to pro- … (Table 1). Linear regression is used to determine the goodness-of-fit and bias of our estimations.

An Improved Gridded Forest Structure Data
Even though the collected NFI data cover large forest areas in Europe, there are still regions where no systematic grid-sampled forest inventory data were available (see Figure 2). In addition, differences between countries may occur because the data recording system and the sampling grid differ by country. The missing forest inventory information from countries where no data were available is filled with the two-step gap-filling algorithm, which identifies, for every cell without forest inventory data, the most similar cell that does provide such data.
After applying the gap-filling algorithm, a full set of pan-European gridded data comprising volume, carbon content, biomass by compartment, height, diameter at breast height, stem number, basal area, stand density index, age class and tree species group is generated. Volume, carbon content, biomass, stem number and basal area are given as per-hectare values. All other variables represent the mean characteristics by cell, independent of the forest area of the cell. The results by cell for selected variables are given as maps in Figures 3-6 and include grid cells with low tree cover such as mixed forest-agriculture cells or small forests in urban areas.

Accuracy of the Improved Data Set
An important issue with data is its accuracy. Thus, we next assess the error range of the gap-filling algorithm by executing 'leave-one-out' and 'country-wise' cross validations. The 'leave-one-out' cross validation is performed by iteratively removing the data from one cell and gap-filling this cell using the data of all other cells. The 'country-wise' cross validation is performed by iteratively removing all data from one country and filling all cells within that country only with data from the other countries. We calculate the mean value and the standard deviation (SD) of the gap-filled data sets as well as the mean bias error (MBE), the mean absolute error (MAE) and the root mean square error (RMSE). These measures are compared with the mean value, SD and confidence interval (CI, α = 0.05) of the original inventory data.
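The three error measures can be computed as in the sketch below. The sign convention (gap-filled minus observed, so that a negative MBE means underestimation) is an assumption; the function name and the values are illustrative.

```python
import math


def error_metrics(observed, predicted):
    """Return MBE, MAE and RMSE for paired observed and predicted values."""
    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)
    return {
        "MBE": sum(diffs) / n,                    # mean bias error
        "MAE": sum(abs(d) for d in diffs) / n,    # mean absolute error
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
    }


observed = [100.0, 200.0, 300.0]   # aggregated NFI values per cell
predicted = [90.0, 210.0, 280.0]   # values obtained by gap-filling the cell
metrics = error_metrics(observed, predicted)
```

In a leave-one-out run, `predicted` would hold, for each cell, the value assigned when that cell's own data are withheld.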
The mean and SD of the gap-filled data are nearly the same as in the original data in the case of the 'leave-one-out' cross validation, which shows, depending on the variable considered, no or only little bias (Table 2). The 'country-wise' cross validation shows negative bias for all variables, leading to lower mean values, while the SD is slightly higher than, but still comparable to, the SD of the original data. MAE and RMSE are close to the CI for the 'leave-one-out' cross validation. The 'country-wise' cross validation shows higher values for the MAE and RMSE, with the MAE still being comparable to the SD.

Table 2. Results of the leave-one-out and country-wise cross validation versus gridded NFI data for entire Europe. N is the total number of 8 × 8 km cells evaluated, which may differ by variable since tree height and stand age were not reported by all countries. Aggregated NFI data refers to the aggregated National Forest Inventory (NFI) plots, for which the mean value, standard deviation (SD) and confidence interval (CI) are shown. For leave-one-out and country-wise cross validation the mean value, SD, mean bias error (MBE), mean absolute error (MAE) and root mean squared error (RMSE) are shown. Not Available (NA) indicates that this metric cannot be calculated.

At country level, the error is expressed as the median of the relative difference in percent between the aggregated inventory data and the gap-filled data (see Table 3). At this level, the 'country-wise' cross validation exhibits a higher error than 'leave-one-out'. No pattern is evident that would suggest that region, latitude or longitude is associated with higher errors. At the EU level, the median of the percentage relative difference is close to zero for 'leave-one-out', while for the 'country-wise' validation it is around 10%, which is evident for all produced variables except the most frequent age class (see Table 3).
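The country-level error measure can be sketched as the median of per-cell relative differences. The direction of the difference (gap-filled minus inventory, relative to inventory) is an assumption, and the function name and values are illustrative.

```python
from statistics import median


def median_relative_difference(inventory, gap_filled):
    """Median of 100 * (gap_filled - inventory) / inventory over all cells."""
    return median(
        100.0 * (g - i) / i for i, g in zip(inventory, gap_filled)
    )


inventory = [100.0, 200.0, 400.0]
gap_filled = [110.0, 190.0, 400.0]
mrd = median_relative_difference(inventory, gap_filled)  # percent
```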
Tree volume (m³/ha), shown only for 500 × 500 m cells with a tree cover of at least 5% (1.25 ha) according to the MODIS land cover classification. We combined this map with MODIS land cover data to calculate total volume (m³) for each 500 × 500 m cell. By summing up cell values, total volume at regional or country level can be estimated. In this figure, values higher than 400 m³/ha are truncated for display reasons.

Upscaling Data
An important part of this study is to deliver consistent European forest data across countries and regions at any spatial resolution. For the upscaling, we use the forest area mask derived from MODIS land cover data at a 500 × 500 m resolution. By combining the forest area mask with the 8 × 8 km gap-filled maps, we produce maps at 500 × 500 m resolution which better reflect the possible heterogeneity of the landscape within an 8 × 8 km grid cell (Figures 7-9). An 8 × 8 km cell may only be partly forested and also contain urban areas, water bodies or agricultural land. These maps can then be used in combination with other data available at a similar resolution to study certain aspects of European forests, such as the legal or technical accessibility of forest resources.
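The upscaling can be sketched as follows: every 500 m cell inherits the per-hectare value of the 8 km cell it lies in and is weighted by its forest area from the mask. The cell identifiers, function name and numbers are hypothetical.

```python
def total_volume_m3(fine_cells, volume_per_ha):
    """Sum forest area (ha) times volume (m3/ha) over all 500 m cells.

    `fine_cells` is an iterable of (coarse_cell_id, forest_area_ha) pairs;
    `volume_per_ha` maps each 8 km cell id to its gap-filled volume.
    """
    return sum(area * volume_per_ha[cell_id] for cell_id, area in fine_cells)


volume_per_ha = {"c1": 300.0, "c2": 150.0}  # gap-filled 8 km values
fine_cells = [
    ("c1", 25.0),    # forest class: fully forested
    ("c1", 11.25),   # Woody Savannas: factor 0.45
    ("c2", 25.0),
    ("c2", 0.0),     # non-forest class
]
region_total = total_volume_m3(fine_cells, volume_per_ha)
```

Restricting `fine_cells` to the cells of one country or region gives the corresponding total.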

Evaluate the Results Using FAO Statistics
An important step in this study is the comparison of our derived information with other available data such as the State of Europe's Forests (FAO) statistics [3], as this allows us to assess the quality of the gap-filling algorithm in countries where no NFI data were available. In general, a high agreement between our volume estimates and the reported values (adjusted R² = 0.903) is evident (Figures 10 and 11). At the European scale, almost no bias (slope = 0.964) is present. However, some countries (e.g., Bulgaria, France, Romania, Spain) do show substantial overestimation or underestimation (Figures 10 and 11). When only looking at countries where no inventory information was available, there is also an overall good agreement (adjusted R² = 0.787) between our estimations and the reported values (Figure 11b). In this case, the estimates in general tend to be higher than the reported values (slope = 1.12). Individual countries can be overestimated or underestimated, in some cases substantially (e.g., Bosnia-Herzegovina, Bulgaria, Serbia). Estimations are close to the reported values for some countries where we had inventory information (Austria, Belgium, Estonia, Finland, Italy, Netherlands, Poland) as well as for countries without that information (Czech Republic, Greece, Hungary, Latvia, Montenegro, Slovenia, United Kingdom) (Figures 10 and 11).
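The slope and adjusted R² reported above come from a linear regression of estimated against reported country totals. The sketch below assumes an ordinary least-squares fit with intercept and the estimates as the dependent variable; the text does not specify either choice, and the data are synthetic.

```python
def ols_fit(x, y):
    """Simple linear regression of y on x with slope, intercept, R2, adjusted R2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)  # one predictor
    return {"slope": slope, "intercept": intercept, "r2": r2, "adj_r2": adj_r2}


reported = [1.0, 2.0, 3.0, 4.0]    # reported totals (synthetic units)
estimated = [1.2, 1.9, 3.2, 3.9]   # our estimates for the same countries
fit = ols_fit(reported, estimated)
```

Under this convention, a slope above 1 indicates that estimates tend to exceed the reported values, and a slope below 1 the opposite.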

Discussion
The lack of consistent and accessible European forest inventory data makes detailed analysis at the European level difficult or even impossible [17,19]. With this study, we provide a pan-European gridded forest structure dataset based on an extensive collection of point-sampled National Forest Inventory data, remote sensing information and a gap-filling algorithm for those areas where no gridded data were collected or gridded inventory data were not available. With these data, we are able to depict forest structure differences at the regional and sub-regional levels, which allows detailed forest analysis across Europe (Figures 3-6).
Cross validation is used to assess the quality and accuracy of the data. The results show that the chosen methodology is robust and accurate, which gives confidence in the produced data. The gap-filling algorithm relies on the regional distribution of the input inventory data and on how well it covers the latitudinal and elevational gradients, climatic conditions and bioregions in Europe. Only high coverage of the relevant conditions ensures that the nearest object determined by the kNN algorithm is really similar to the object whose variable values need to be predicted. This limitation is evident when looking at the 'country-wise' cross validation results. When removing all data from a country where prevailing conditions (e.g., climatic, management history) are not well covered by the remaining data of all the other countries, the uncertainty tends to be higher. This can be observed for Spain on the Iberian Peninsula, Ireland on the British Isles, as well as Albania and Croatia in the north-eastern Mediterranean. This may also be evident for the Netherlands, where our produced inventory volume and DBH values are lower compared to the neighboring countries Belgium and Germany, which are likely to be used for gap-filling.
When removing data from an entire country and the growing conditions are well covered by the remaining data from the neighboring countries, the error tends to be lower, e.g., Austria, France or Germany. Being aware of these limitations, in this study we specifically added data from Albania, Croatia and Ireland to better cover these regions.
Differences in the sampling design or calculation method among countries, such as the minimum diameter at breast height required for a tree to be included or biomass function used, could also explain high errors. For instance, when removing the entire data from Sweden, it is likely that it will be filled with data coming from the two other Scandinavian countries. However, Sweden uses fixed area plots with no minimum diameter requirement, while Norway, although also using fixed area plots, requires trees to have a minimum diameter at breast height of 5 cm and Finland uses angle count sampling, which is a different plot design altogether.

Comparison with FAO Statistics
Comparison with FAO statistics gives further confidence in the chosen methodology and produced data, especially at the European scale, where agreement between calculated and reported values is high. This high agreement was expected for countries where we gathered inventory data, as the FAO statistics are also mainly based on national inventories [18]. However, for some of these countries substantial overestimation or underestimation can still be observed. The reasons can be manifold.
In Romania, the reported growing stock for the years 2000, 2005 and 2010 is estimated using a relationship between forest area and a mean growing stock per unit of area based on the Forest Inventory in 1985, while the estimation for 2015 is based on the National Forest Inventory 2012, the same inventory our data for Romania come from. The growing stock reported for 2015 is substantially higher than the previously reported values (2222 million m³ in 2015 versus 1378 million m³ in 2010) and closer to our estimations [51]. However, in our study, we only compare with the mean of the values reported for the period 2000 to 2010, as this period overall coincides best with our gathered inventory data. In general, different time periods can affect the comparison, as we considered the mean of the reported values for the period 2000 to 2010 for all countries, although the inventory data in individual countries do not cover this whole period (Table 1).
In Sweden, a different minimum diameter requirement for trees to be considered between FAO report (minimum of 10 cm) and NFI data (no minimum requirement) may contribute to overestimation [52]. In general, problems with the harmonization of variables and definitions such as stem volume [22], growing stock [20], forest available for wood supply or even forest in general [18,21] could partly explain some of the observed differences.
These differences in the definitions of forest and forest available for wood supply lead to different forest area estimations, which can be another reason for overestimation or underestimation [53]. The FAO defines forest as "land spanning more than 0.5 ha with trees higher than 5 m and a canopy cover of more than 10%, or trees able to reach these thresholds in situ. It does not include land that is predominantly under agricultural or urban land use" [54]. In addition to the presence of trees, the definition is also land-use based, meaning that temporarily unstocked areas intended for forestry or conservation use are included [54,55]. In our study, a forest area mask is generated using MODIS land cover data, which relies on the presence or absence of tree cover to identify forests and cannot account for intended land use [55]. The coarse resolution of 500 × 500 m (25 ha) also means that forest is not confined to the Forest or Woody Savanna land cover types: basically every cell, irrespective of its land cover class, can contain forest as defined by the FAO, i.e., land spanning more than 0.5 ha with trees higher than 5 m and a canopy cover of more than 10%. We partly considered this in our forest area mask by assuming all Forest land cover cells to be fully forested, compensating for small forest areas in cells with land cover classes which did not contribute to forest area in our study. We also introduced forest area factors for selected other land cover classes, such as Woody Savannas, Savannas or Cropland/Natural Vegetation Mosaic. Differences between reported global forest cover change in the Global Forest Resources Assessment 2015 (FRA 2015) and global remote sensing estimations by Hansen et al. [56,57] could also possibly be explained by the tree cover threshold of 25% used and by the coarse resolution of the MODIS images used in the earlier of the two studies [55].
We also explored the usage of the high-resolution 25 × 25 m CORINE land cover data to estimate forest area (Appendix A) [58]. Underestimation of forest area might be the main reason for the substantial underestimation of volume in France and Spain. In both countries, the share of 500 m cells belonging to one of the Forest MODIS land cover classes is low compared to other countries. As savannas and shrublands cover large areas in these countries, our approach may be unable to account for all the small forest areas in cells belonging to these and other classes. This could be improved by using a different approach based on forest cover data or by using different land cover data such as CORINE [59,60]. However, a detailed analysis of all possible reasons for overestimation or underestimation and the optimization of our estimations were beyond the aim and scope of our study. Furthermore, when using the gap-filled data, one is not limited to the forest area mask proposed in this study but can use any other forest area mask.
In countries where no inventory data were obtained, overestimation or underestimation can, in addition to the reasons already mentioned, also be related to the inventory data not covering the prevalent conditions in these countries well enough (see also Section 4.1). In south-east Europe, we were only able to gather data from Albania and Croatia, which might not be sufficient to accurately describe the situation in other countries belonging to this region, as suggested by the observed overestimation in many of these countries (Bosnia and Herzegovina (BA), Bulgaria (BG), Greece (EL), the Republic of North Macedonia (MK) and Serbia (RS)). As a consequence, although we were able to add the data from Albania and Croatia for this study, additional data from that region are likely needed to improve the forest structure data.

Potential Applications
A major advantage of high-quality, spatially explicit gridded forest structure data is that they can be combined with other spatially explicit information, such as soil information, conservation status, land cover, climate data or terrain data. This allows for detailed analyses that can help answer relevant questions about the carbon mitigation potential of European forests, the impact of conservation policies or how forest resources are threatened by changing disturbance regimes [31–34]. A recent example is the assessment of the harvestable forest area and stocking volume in Europe [61]. That study combines the forest structure data with conservation status, slope, soil and road infrastructure data to quantify the legal and technical accessibility of forest resources for mechanized harvesting. The availability of wood resources is an important question with regard to the potential of wood products to substitute non-renewable fossil products. Another application is the initialization of large-scale, climate-sensitive biogeochemical-mechanistic forest simulation models. All these studies can support decision makers during the transition towards a bio-based economy.

Room for Improvement
The lack of a common inventory system across European forests, with standardized definitions, sampling designs and measurement methods, makes it difficult to compare even basic forest variables, such as forest area or growing stock, across Europe.
Our harmonization of the different national forest inventory data, together with the improved data sources and methodology, is an important step towards consistent pan-European forest studies. If a common inventory system as well as a common forest area mask could be established, the methods described in this study could be used to support countries in the official reporting of key forest variables. The results of this study are an important attempt to provide consistent forest structure data to the scientific community.

Additional financial support came from the «GIS and Remote Sensing for Sustainable Forestry and Ecology (SUFOGIS)» project funded with support from the EU ERASMUS+ program. The European Commission support for the production of this publication does not constitute an endorsement of the contents, which reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

Acknowledgments:
We acknowledge the open data policies of European forest inventory agencies that made this analysis possible. Parts of the inventory data used were gathered as part of the collaborative project 'FORest management strategies to enhance the MITigation potential of European forests' (FORMIT), which received funding from the European Union Seventh Framework Programme under grant agreement n° 311970. We would like to thank John Redmond and Luke Heffernan and the Department of Agriculture, Food and the Marine for providing national forest inventory data for Ireland. The data used for Ireland are the property of the Department of Agriculture, Food and the Marine, Ireland. We appreciate the help of Huguez Lecomte and Andre Thibaut (Service Public de Wallonie) and Sebastien Bauwens (University of Liege) with the Wallonian forest inventory. The data used for Wallonia are the property of SPW-DGARNE. We thank Elvin Toromani (University of Tirana) for his support in analyzing the Albanian forest inventory. We further extend our gratitude to Jura Cavlovic (University of Zagreb, Croatia) and Goran Videc (Ministry of Agriculture, Croatia) for their support with the Croatian forest inventory. The data used for Croatia are the property of the Croatian Ministry of Agriculture. We acknowledge the open access to national forest inventory data from Italy and the Netherlands.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
During the course of our study, we also explored the use of CORINE land cover data for the production of the forest area mask. The higher-resolution 25 × 25 m data are better able to detect small forested areas that might be overlooked when using the coarse 500 × 500 m MODIS land cover data (see Section 4.1). For the CORINE forest area mask, each cell belonging to a Forest land cover class contributed 0.0625 ha (25 × 25 m) of forest area, while cells belonging to other land cover classes did not contribute any forest area.
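The effect of resolution on the estimated forest area can be illustrated with a small Python sketch. It is a toy example under stated assumptions, not the procedure used in the study: the grid is synthetic, the function names are hypothetical, and the coarse 500 m class is derived here by a simple majority rule, which only approximates how a coarse land cover product assigns classes.

```python
FINE_CELL_HA = 25 * 25 / 10_000      # 0.0625 ha per 25 m cell
COARSE_CELL_HA = 500 * 500 / 10_000  # 25 ha per 500 m cell
BLOCK = 500 // 25                    # 20 fine cells along each coarse cell edge


def fine_forest_area(fine):
    """Forest area (ha) when every 25 m forest cell contributes 0.0625 ha."""
    return sum(cell for row in fine for cell in row) * FINE_CELL_HA


def coarse_forest_area(fine):
    """Forest area (ha) after aggregating to 500 m cells.

    Assumption: a coarse cell is classified as Forest, and counted as fully
    forested, only if more than half of its 25 m cells are forest.
    """
    area = 0.0
    for r0 in range(0, len(fine), BLOCK):
        for c0 in range(0, len(fine[0]), BLOCK):
            forest_cells = sum(
                fine[r][c]
                for r in range(r0, r0 + BLOCK)
                for c in range(c0, c0 + BLOCK)
            )
            if forest_cells > BLOCK * BLOCK / 2:
                area += COARSE_CELL_HA
    return area


# Synthetic 20 x 40 grid of 25 m cells (1 = forest, 0 = other), i.e., two
# 500 m cells side by side. The left one is fully forested; the right one
# holds only a 4 x 4 forest patch (1 ha, above the 0.5 ha FAO minimum).
fine = [
    [1] * 20 + ([1] * 4 + [0] * 16 if r < 4 else [0] * 20)
    for r in range(20)
]

fine_ha = fine_forest_area(fine)
coarse_ha = coarse_forest_area(fine)
print(f"25 m mask: {fine_ha} ha, 500 m mask: {coarse_ha} ha")
```

In this synthetic case, the 1 ha patch in the second coarse cell is counted by the fine mask but missed by the coarse one, which mirrors the kind of underestimation discussed for regions with many small forest areas.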
In some countries, such as Finland, France, Norway or Spain, the difference in estimated forest area is substantial, while in other countries (e.g., Austria, Bulgaria, Romania) the difference is only minor (Figure A1). Especially in France and Spain, the forest area estimated with the CORINE data is likely much closer to the true value than the estimates based on the MODIS data. However, although the model estimates overall fitted the reported values better (Figure A2a), this was not true when looking only at countries for which we did not have inventory data (Figure A2b). Therefore, we decided to keep using the MODIS land cover data for forest area estimation, as this data set is already used in the gap-filling algorithm. It is important to stress that the gap-filled maps can be combined with any other forest area mask, and using a forest area mask with a higher resolution is encouraged. In this study, we simply provide one way of creating a forest area mask using MODIS land cover data, which was sufficient for our needs. It was not the aim of the study to provide the best possible forest area mask.
Remote Sens. 2022, 14, 395
Figure A1. Comparison of the forest area estimations based on MODIS [44] and CORINE [58] land cover data sets. MODIS refers to the forest area mask as used in this study (see Section 2.5) while CORINE refers to a forest area mask based on CORINE land cover data which was not used in the study (see Appendix A). Countries where no inventory data were available are marked with an asterisk.