Forest type and tree species classification of nemoral forests with Sentinel-1 and 2 Time Series data

Mapping forest extent and forest cover classification are important for the assessment of forest resources in socio-economic as well as ecological terms. Novel developments in the availability of remotely sensed data, computational resources, and advances in areas of statistical learning have enabled fusion of multi-sensor data, often yielding superior classification results. Most former studies of nemoral forests fusing multi-sensor and multi-temporal data have been limited in spatial extent and typically to a simple classification of landscapes into major land cover classes. We hypothesize that multi-temporal, multi-censor data will have a specific strength in further classification of nemoral forest landscapes owing to the distinct seasonal patterns of the phenology of broadleaves. This study aimed to classify the Danish landscape into forest/non-forest and further into forest types (broadleaved/coniferous) and species groups, using a cloud-based approach based on multi-temporal Sentinel 1 and 2 data and machine learning (random forest) trained with National Forest Inventory (NFI) data. Mapping of non-forest and forest resulted in producer accuracies of 99% and 90 %, respectively. The mapping of forest types (broadleaf and conifer) within the forested area resulted in producer accuracies of 95% for conifer and 96% for broadleaf forest. Tree species groups were classified with producer accuracies ranging 34-74%. Species groups with coniferous species were the least confused whereas the broadleaf groups, especially Oak, had higher error rates. The results are applied in Danish National accounting of greenhouse gas emissions from forests, resource assessment and assessment of forest biodiversity potentials.


Introduction
Forests produce a wide range of socio-economic goods for the owners of forest lands as well as for the surrounding society. Many of these goods are closely related to the spatial distribution of forest and tree species. Different tree species are managed differently, may be used for producing different products, and are sold at different prices yielding widely different economic profiles for the forest owner. In relation to climate change, different species accumulate different amounts of carbon in widely different temporal patterns [1], thereby contributing differently to climate change mitigation. The interception and evaporation of precipitation in tree crowns differ between tree species and results in different rates of ground water formation as well as in different ground water quality [2][3][4]. Different forest tree species furthermore produce different habitats and therefore contribute differently to biodiversity conservation in general [5]. The future provision of the goods provided by forests are furthermore largely dependent on their spatial arrangement since different tree species will be affected differently by climate change, depending on complex interactions with the local site conditions as well as with other trees [6].
Due to the intrinsic relationship between the spatial forest and forest tree species distribution and the provision of socioeconomic goods, mapping of forests and forest tree species provide a possibility to assess the provision of such goods. Consequently, mapping of forests as part of management planning has been detailed and widely used since the beginning of the 19th century [7]. However, previous methods of forest resource mapping were time consuming, expensive, and hardly feasible in today's society. Meeting the society's needs for detailed and at the same time cost effective mapping of forest characteristics based on remote sensing commenced in the early 1970's [8] and began a rapid development in mapping of forest ecosystem services, including forest resources [9][10][11][12][13], carbon storage [14,15], biological diversity [16][17][18], habitats [19][20][21], and forest recreation [22].
In recent decades, the possibility to map the forest and forest tree species distribution has increased dramatically due to improvements in temporal and spatial resolution of remotely sensed data, the combination of multiple sensor systems, increasing processing capacity as well as advances in areas of statistical learning. A prominent feature of recent advances in classification is the fusion of multi-sensor data [23][24][25]. Different sensors vary in terms of spatial and temporal resolution as well as in the properties of data collected. Combining different sensors may therefore overcome shortcomings of individual sensors, providing improved classification results [26]. In a review of 32 studies on fusing optical and radar data, the vast majority of studies concluded that the fusion of data improved forest classification compared to using a single source [27].
During recent years, the high acquisition frequencies of dual synthetic aperture radar (SAR) Sentinel-1 (S-1) and optical Sentinel-2 (S-2) has provided novel opportunities for classification of forested landscapes from multi-temporal data. In a study aiming to classify deciduous and coniferous forests in Switzerland, the use of multi-temporal S-1 Cband backscatter showed promising results with higher classification accuracies achieved for forest types than for individual species [28]. Dostalova et al. [29] focused on S-1 annual backscatter seasonality over forested areas from where annual backscatter variation was attributed to forest type (coniferous, deciduous or mixed forest) and forest structure in Austria and Sweden. Expanding the concept to include both multi-sensor and multi-temporal data has shown superior classification in mapping of forest-agriculture mosaics when combining multi-temporal sets of S-1 and S-2 data across highly different landscapes [30,31]. Similarly, high accuracies were obtained with fusion of S-1, S-2, a digital elevation model (DEM), and multi-temporal Landsat-8 data for classification forest types in Wuhan, China [32].
Most former studies conducted on nemoral forest resources were generally limited in spatial extent and/or focused on a simple classification of landscapes into major land cover classes. However, we hypothesize that multi-temporal, multi-censor data will have a specific strength in further classification of nemoral forest landscapes into species forest types or even individual species owing to the distinct seasonal patterns of the phenology of nemoral broadleaved forests. This study aims to classify the Danish landscape (43,000 km 2 ) into forest/non-forest and further into forest types (broadleaved/coniferous) and species groups. This is achieved by semi-automatic classification of multi-temporal S-1 and -2 images using machine learning (random forest) trained with samples from the National Forest Inventory (NFI).

Materials and Methods
The presented methodology comprises the preprocessing of multi-temporal microwave and optical S-1 and S-2 data from the Copernicus Programme, acquisition of suitable reference information from the national forest inventory (NFI), machine learning-based model building and classification including performing tuning and validation procedures.
To get a representation of the main phenological changes in the landscape during a year, two sets of images were collected. A set representing the winter period after the senescence of deciduous trees and a set representing the summer period after the foliation of deciduous trees. The winter collection is a series of mosaic images recorded from primo November 2018 to ultimo February 2019, where the summer collection represents images obtained from medio May to medio September 2019. A summary of all feature layers included in the models and their mean and standard deviation is provided in Table S1. All initial pre-processing of satellite images were performed in the Google Earth Engine (GEE) code environment [33].

Preparation of optical data
Multispectral optical data were generated from a series of 192 (summer) and 188 (winter) S-2 Level-2A BOA (bottom-of-atmosphere reflectance) images acquired from the constellation of the two twin satellites, S-2A and S-2B. A cloud-masking procedure was applied identifying flagged cloud and cirrus pixels. Finally, a cloud-free composite of each spectral band covering the entire land surface of Denmark was derived by applying a median compositing function on the winter and summer images, respectively. All bands with a resolution of 10-20 m were selected, including bands within the visible (band 1-4), near infrared (NIR, band 5-8) and short wave infrared (SWIR, band 9-12) parts of the electromagnetic spectrum. The normalized difference vegetation index (NDVI) was calculated and added to the set. The final composites were resampled (nearest neighbor) to 10 m resolution and exported as GeoTif-files.
To add a textural component to the image set, seven rotation-invariant Haralick texture features were calculated using a 3x3-window on the NDVI summer composite, including mean, variance, homogeneity, contrast, dissimilarity, entropy and second moment [34].

Preparation of SAR data
SAR data were generated from a series of 140 (summer) and 127 (winter) S-1 Level-1 Ground Range Detected images with a pixel size of 10 m. The constellation of two satellites, S-1A and S-1B, sharing the same orbital plane, carries a single C-band synthetic aperture radar instrument operating at a centre frequency of 5.405 GHz. The data was recorded in Interferometric Wide swath mode (IW), with incidence angels below 45 degrees and delivered in dual polarization, both vertical-vertical (VV) and vertical-horizontal (VH). The images were subject to radiometric calibration and thermal noise removal and finally mosaicked to a composite of the area of interest by applying a mean function on the winter and summer collection respectively. The final images were smoothened with a 3x3-window mean filter.

National Forest Inventory data
Sample plot measurements from the Danish National Forest inventory (NFI) were applied as ground-truth reference data for training of the classifiers and validation of the classification output. The NFI is a sample-based inventory with partial replacement of sample plots and is carried out on a yearly basis (Nord-Larsen and Johannsen, 2016). The sample plots are structured in a 2 x 2-km grid covering the whole country. Each sample plot (viz. primary sampling units or PSU for short) is composed of four circular subplots (viz. secondary sampling units or SSU for short) with a radius of 15 m and centered on the corners of a 200 ×200 m square. A total of ~43.000 SSUs are included in the sampling design but only a representative subsample of 1/5 covering the entire country is measured every year of a 5-year cycle. Prior to each year's measurements, the presence/absence of forest within SSU is identified on the most recent aerial photographs (typically updated every year, [35]) and only forest covered plots are measured in the field. The SSU may be subdivided into tertiary sampling units (TSU), when covering several land-use classes or forest stands.
The SSUs are located in the field using a Trimble GPS Pathfinder Pro XRS receiver mounted with a Trimble Hurricane antenna. The precision of the equipment is expected to be sub-one meter. The SSUs have a nested design composed of three concentric circles with a radius of 3.5, 10 and 15 meters respectively. In the 3.5-meter circle, the diameter of all trees is measured (single calipered) at breast height (1.3 meter). In the 10-meter circle, trees with a diameter > 10 cm are measured, and only trees with a diameter > 40 cm are measured on the full SSU. Total height is measured on a random sample of 2-6 trees inside the plot. Further recordings of relevance for this study include registration of tree species and crown cover. The crown cover in percent is estimated visually on temporary SSUs (2/3 of the sample) and using a densiometer on permanent SSUs (1/3 of the sample). Total crown cover is aggregated as the area weighted average of the TSU level estimates.
The individual SSU is designated a forest type (broadleaf, conifer or mixed) based on the proportion of the basal area of the measured trees. If broadleaves represent more than 75% of the basal area the plot is characterized as broadleaf and vice versa for conifers. If these criteria are not met, the plot is characterized as mixed forest.

Preparation of NFI data
The forest definition applied in the NFI follows the definition of the Global Forest Resource Assessment [36] i.e. a minimum area of 0.5 ha wider than 20 meters with a minimum crown cover of 10% with trees with a minimum height of 5 meters. The definition includes areas where the vegetation has the potential of reaching the aforementioned criteria as well as temporarily unstocked areas and permanently unstocked areas designated to forest management. The definitions exclude areas dominated by agricultural or urban land-uses such as orchards, cottage areas and city parks.
The adoption of land-use and potential criteria in the forest definition is problematic in the perspective of pixel-based classification of remotely sensed imagery as only the actual land cover is directly observable by the sensors. To reduce ambiguity on pixel level in the reference data, a stricter definition of forest was applied, aiming at reducing instances were land cover and land-use conflicts e.g. unstocked SSUs labeled as forest and SSUs only partly covered by forest. Consequently, a SSU was labeled as forest if a fraction of more than 75% was characterized as forest and the estimated crown cover exceeded 10%. A SSU was labeled as non-forest if 0% of the plot was characterized as forest and the estimated crown cover equals 0%. SSUs falling between these thresholds were labelled as ambiguous and removed from the reference data (Table 1).
For classification into forest types (conifer/broadleaf) only SSUs either dominated by coniferous or broadleaved species respectively, i.e. constituting a basal area of more than 75% of the plot, were included in the reference data (Table 1).
For tree species classification six distinct classes were defined based on morphological similarities and prevalence in the Danish forest area. A class can both consist of a single predominant species, a genus or ensembles of minor important species grouped together. Hereafter species, genera and species groups are referred to as tree species. The six classes include: (1) Beech (Fagus sylvatica).
(2) Quercus, comprising pedunculate oak (Quercus robur), sessile oak (Quercus petaea) and red oak (Quercus rubra). Only SSUs dominated (>75% of basal area) by one of the species of the defined classes were included in the reference data.
The training dataset was restricted to SSUs measured in the field season of 2018 to reduce the temporal span between the field measurements of the NFI and the collection of satellite data, resulting in a total of 8,586 SSUs.
The training data was created by stacking all 33 input layers (both optical and SAR) into one raster brick and subsequently sample all band values from pixels with a centroid covered by a 10 meter circular buffer of every labelled SSU center coordinate resulting in a total of 24,584 (corresponding to an average of three pixel samples per SSU), 2,220 and 1,812 pixel samples for forest cover classification, tree type classification, and species classification, respectively.
The evaluation data was based on SSUs measured in the field seasons of 2017 and 2019 keeping the maximum temporal span between the field data collection and the recording of satellite images at approximately one year and avoiding spatial correlation with the training data. The evaluation dataset included a total of 17,020 reference SSUs.

Model building and evaluation framework
The three different classification tasks (forest cover, forest type, and tree species) were performed in a hierarchical procedure. First, the forest area was delineated by the forest cover classification model, secondly and thirdly, the delineated forest mask was classified into forest type (conifer/broadleaf-dominated), and tree species. In all cases, the output was in 10x10 m resolution.
The Random Forest (RF) algorithm was applied for the supervised classification of pixels. RF is as a decision tree-based ensemble classifier building on the idea of bagging, i.e. bootstrap aggregation. The principle of RF is to create a large forest of decorrelated decision trees grown on bootstrapped reference samples and a randomly selected subset of features and subsequently average across the potential unstable single trees [37]. RF has been applied successfully for a variety of remote sensing-based forestry applications [38][39][40][41].
Model tuning and training of the RF classifier were performed in the CARET package in R [42]. The models were trained and tuned using resampling by repeated 10-fold crossvalidation to stabilize predictions and were optimized with respect to the overall accuracy (OA).
To estimate the area proportions of the mapped land cover classes and the corresponding standard errors, a "good practice" workflow for error adjusted area estimation was subsequently applied [43].
The classification performance was evaluated based on the independent evaluation dataset by the confusion matrix including overall accuracy (OA) as well as producer's (PA) and user's accuracy (UA) for the different classes. Feature importance was analyzed using a RF-based wrapper algorithm implemented in the Boruta package in R [44]. The algorithm works by creating permuted copies of the included features called shadow features. Subsequently, a RF classifier is trained with respect to the response of interest on the extended set of features and the importance measured in mean decrease accuracy is logged for every feature. Each feature is iteratively checked against the best performing shadow feature in terms of Z-score and either rejected or confirmed based on the relative importance. The algorithm was configured to perform 1000 iterations.

Post classification filtering
To close a part of the gap between the land-use definition of forest comprised by the NFI and the strict land cover definition used in the classification, two post-classification filtering procedures were applied. To comply with the minimum mapping unit criteria of the NFI, all clusters of coherent forest pixels less than 50 corresponding to an area of 0.5 ha was masked out. To comply with at least some of the land-use criteria in the NFI, areas mapped as low build-up in the GeoDanmark mapping (http://geodanmark.nu/Spec6/HTML5/DK/StartHer.htm) was masked out. The low build-up polygons include amongst others cottage areas, gardens, playgrounds, lawns, parks, and parking lots.

Forest cover model
The training procedure for the RF classification of forest cover resulted in 500 trees and 7 splitting features per tree node. Based on the Boruta simulations, all 33 input features were significant when classifying forest and non-forest (Figure 1). The most important features in predicting forest cover were the summer and winter SAR images in VH polarization (VHs and VHw). The most important feature of the optical data were the green band of the summer composite (B3s) and the SWIR band of the winter composite (B12w). In general, the most important features for classification of forest/non-forest were obtained from the S-1 SAR images (3 out of the 4 most important features). When considering the individual types of data (i.e. optical bands, SAR polarizations etc.), the summer image was nearly always more important for the classification than the winter image. Textural features from the optical data showed to have a moderate importance when classification forest vs non-forest.  The raw forest cover map was post-processed by the application of the minimum mapping unit (MMU) filter removing forest areas of less than 0.5 ha and low build-up filter removing cottage areas, gardens, playgrounds, lawns, etc.. After filtering, the mapping resulted in 100% correct classification of non-forest and a 90% correct classification of forest (Table 1). The resulting map reported a forest area of 597,980 ha (±8,573) corresponding to a forest fraction of 13.9%. The error-adjusted estimate was 636,079 ha corresponding to a forest fraction of 14.8%. Table 1. Confusion matrix and accuracy statistics based on the independent evaluation reference data and the raw forest cover classification (left) as well as for the post filtered (MMU and low buildup) forest cover classification (right). Entries in the matrix represents pixels and proportions in brackets. For the forest type classification, the RF model training resulted in 500 trees and 11 splitting features per tree node. Based on the Boruta simulations all 33 input features were found significant (Figure 3). The most important feature in predicting tree type was the winter NDVI image (NDVIw) followed by the SAR VH polarization summer image (VHs), and the band 11 SWIR summer image (B11s). The SAR VH polarization winter image (VHw) comes out as the fourth most important feature. When considering the individual types of data, the summer image was in most cases markedly more important for the classification than the winter image. Most of the textural features were less important for this type of classification. The mapping of forest types (broadleaf and conifer) resulted in an overall accuracy of 95% in correct classification of 95% (PA) of conifer and 96% (PA) of broadleaf forest (

Reference Prediction
Conifer Broadleaf Total pred. ). As for the forest vs non-forest classification, the forest type classification revealed a balanced level of commission/omission errors. The estimated conifer forest area adjusted for classification errors was 268,824 ha (± 7,247 ha) corresponding to a fraction 45.0% of the forest area. The adjusted broadleaf forest area was estimated to 329,156 ha (± 7,247 ha) corresponding to 55.0% of the forest area. Table 1. Confusion matrix and accuracy statistics based on the independent evaluation reference data and the forest type classifications of the raw forest map. Entries in the matrix represents pixels and proportions in brackets.

Tree species model
The training of the RF tree species classification model resulted in 500 trees and 4 splitting features per tree node. Based on the Boruta simulations all 33 input features were found significant. The most important feature in predicting tree species was the summer SAR image in VV polarization (VVs) followed by the winter VV image (VVw) and the band 12 SWIR summer image (B12s). The SAR VH polarization winter image (VHw) shows to be the fourth most important feature. As for the other classifications, the S-1 SAR features were generally important for the classification of individual species, but we found no particular pattern in the importance of summer vs. winter features for the classification. As for the forest type model, the textural features from the optical imagery were less important for the species type classification.  The tree species were classified with an overall accuracy of 63%. Producer accuracies were ranging from 34% to 73% (Table 3), with the oak species (Quercus) showing noticeable lower accuracies than the remaining species and the pine species (Pinus) performed the best. The species groups consisting of coniferous species were the least confused whereas the broadleaf groups (especially Oak) had a higher error rate. The error adjusted estimated area proportions and their confidence intervals are reported in Table 3. Table 2. Confusion matrix and accuracy statistics based on the independent evaluation reference data and the tree species classifications of the raw forest map. Entries in the matrix represents pixels and proportions in brackets.

Reference Prediction
Beech

Forest cover classification
Our study demonstrated that the use of multi-temporal data combining optical and SAR data from S-1 and -2 led to very high classification accuracies of forest/non-forest. Good accuracies for forest/non-forest classification were also obtained from the use of S-1 only [29] with overall agreements between 86% and 91%. We found the overall most important features for classifying forest cover to be the S-1 VH polarization (summer and winter composite). The total backscatter from a forest at wavelengths of 5.6 cm (C-band) comes primarily from leaves or needles and twigs of the upper canopy and their interactions, allowing the identification of forest/non-forest areas [45,46]. The importance of the dual-polarization (VH) information over single-polarization (VV) for non-forest vs forest cover classification aligns well with the well-known higher sensitivity of VH microwave backscatter to total above-ground biomass than VV [46]. The importance of VH information over spectral signatures from optical instruments for forest extent mapping contrasts other studies fusing S-1 and -2 data, who have found optical data features from the S-2 to be superior in areas covering temperate dense mixed forests, open savanna woody vegetation and forest plantations as well as forest-agriculture mosaics [30,31]. However, Heckel et al. [30] found that S-1 SAR data were better at predicting tree cover in highly fragmented landscapes, which apply for the Danish intensively cultivated landscape. It is well known that SAR-derived land-cover classification is imposed by topographic relief and here the flat terrain of all Danish landscapes makes an ideal case for optimal signalto-noise in the use of S-1 data. The highest ranking S-2 optical features were Band 3 (green) summer composite and Band 12 (SWIR) winter composite, which were also amongst the top predictors in [30].
The total forest area resulting from the final map was very close to the forest area of 598,186 ha (excluding unstocked areas) reported by the Danish NFI [47] with a deviation of less than 0.1%. A comparison of the mapped forest area to the land-use map resulting from the National emission inventory [48], demonstrated that the differences mainly included temporarily unstocked forest and auxiliary areas included in the forest definition such as fire-breaks, forest roads etc.
The obtained accuracies for the forest cover classification corresponds to a mapping accuracy similar to that obtained in studies using combinations of S-1 and S-2 data [31] and to the accuracy obtained for land classification with multi-seasonal Landsat Thematic Mapper-5 images [49]. The accuracy obtained in our study should be viewed in the context of the small-scale variation typical for the Danish landscape but also recognizing that we in this study distinguished only between forest and non-forest, where the studies above distinguished between up to eight different land-cover types. Even higher accuracies of classifying only forest/non-forest (OA=99%) were obtained in relation to a study of forest type classification in Wuhan, China when fusing S-2 and DEM images obtained from the Shuttle Radar Topography Mission (SRTM) [32]. In countries with very small differences in elevation, such as Denmark where the highest point is 171 m a.s.l., it is unlikely that inclusion of a DEM would improve classification results. However, wall-to-wall airborne laser scanning campaigns in Denmark have provided canopy height models, which in future efforts could be used to improve forest/non-forest classification [50].

Forest type classification
The classification of the forested landscape into forest types (i.e. broadleaved and coniferous forest) had similar classification accuracy to the forest/non-forest classification with an overall of accuracy of 95%. Not surprisingly, given the phenological differences between the two forest types, NDVI obtained from winter images had the highest mean rank of importance among the variables used in the classification. The use of dual-polarization SAR data (VH) was also found to perform well in discriminating between broadleaved and coniferous forests, which is likely caused by the intrinsic nature of SAR backscatter responding to the structure of the forest (forest type specific volume scattering from different interaction with leaves versus needles and twigs of the upper canopy).
The classification accuracy of forest types obtained in this study was higher than obtained in a study by Dostálová, et al. [29] reporting an overall accuracy of 85% when classifying the forest type (between non-forest, coniferous and broadleaf forest classes) in temperate forest from the use of S-1 only. Also in their study, the VH-polarization was found to contain most of the information necessary to perform a forest type classification. In a study by Rüetschi et al. [28], an overall accuracy of 86% was achieved in classifying deciduous and coniferous species, using S-1 in Northern Switzerland. Other data fusionbased studies of forest type classification involved eight different forest types in Wuhan, China from fusion of multiple sources of EO data including S-1, S-2, DEM and Landsat 8 images [32] and of 11 subtropical forest types in Guangxi Zhuang, China from fusion of ZiYuan-3, S-2, and Landsat 8 images [51]. In the study by Liu et al. [32], the highest accuracy (83%) was achieved when fusing S-2, DEM, S-1 (VV polarization) and Landsat 8 images and the authors concluded that that elevation and multi-temporal data contributed the most to forest type identification. Similarly, Yu et al. [51] found the highest classification accuracy (84%) when fusing spectral and topographical data. As with classification of forest/non-forest, it is unlikely that inclusion of terrain differences would improve classification results, as also observed by Strahler et al. [52], in a flat country, such as Denmark. However, as with classification of forest/non-forest, airborne laser scanning in Denmark has provided canopy height models that may provide additional information relevant to classification of forest types [50].
The resulting forest type distribution was very similar to that obtained from the Danish NFI (51% broadleaved forest and 49% conifer forest) (Nord-Larsen et al., 2020). Some of the differences between observed and predicted forest type was due to misclassification of larch (~1% of the forest area [53]), likely due to phenological changes across seasons similar to broadleaves.

Tree species classification
Our classification of individual tree species showed lower accuracy than the classification of forest types with an overall accuracy of 63% and producer accuracies ranging 34-74% for individual species and species groups. Features of highest importance for the classification of nemoral forests was primarily the VV polarization images during summer and winter, contrasting the classification of forest and forest type, where the VH polarization performed better. The obtained accuracy was lower than the one obtained in a much similar study using multi-temporal S-2 images for species classification in Central Sweden (Persson et al., 2018). Using four S-2 from April, May, June and October Persson et al. [54] reported an overall accuracy of 88% for the area-based classification of five different tree species. The most important classifiers in the study op. cit. included the red edge bands 2 and 3 as well as the narrow NIR (near-infrared) band 8a, all obtained on May 27th.
Although we found a different set of potential variables to be more important in the classification, we speculate that higher species classification accuracy may generally be obtained if including several image composites around the time of flushing of the nemoral species in April-June and during the peak of autumn colors in Oktober-November. The possibility for successfully doing so depends on the cloud cover during this critical time of the growing cycle. Another option would be to make use of the full temporal resolution of the Sentinel data to produce per-pixel phenological metrics such as start, end and length of growing season that have otherwise proven useful in land cover classification [55][56][57]. Other approaches for classification of single tree species are typically based on remote sensing data sources (Lidar and VHR data) not currently offered as cloud-based date sources facilitating national scale wall-to-wall coverage analysis of no cost [58,59] or data fusion between S-2 data and Airborne Laser Scanning Data [60].
Noticeably, in our study the groups of conifers were more accurately classified than the broadleaves. Especially oak was poorly classified. We speculate that the differences in results may be related to differences in crown size, as also pointed out in a comprehensive review on tree species classification from remotely sensed data [61]. In the predominant management systems in Denmark, oak is often open grown and obtain large crowns compared to most conifers commonly managed with dense stocking to avoid wind throw. With a pixel size of 10 m in the present study, each pixel may cover several conifer crowns but for older trees of oak and beech cover less than a tree crown. Another feature of oak management that may affect the result is the common use of understory crops to prevent formation of epicormics branches, which may influence reflected radiation from other trees [62,63], often too small to be captured by the design of NFI plots.
Finally, in our study, species groups were formed with respect to the prevalence of the individual species and their seeming similarities rather than based on a study of their spectral reflectance. We believe that improvements to the classification would be possible if forming of species groups based on an analysis of similarities in spectral reflectance for different tree species such as presented by Persson et al. [54].

Use of NFI data in area-based classification
A prominent feature of our study was the use of NFI sample plots (0.0706 ha) covering all land cover classes across the entire country area for both training and as reference in the evaluation of prediction accuracy. The classification of the individual plots into forest/non-forest relies on a visual interpretation of high-resolution (12.5 cm pixel-size) aerial photographs, which is subsequently validated for all plots with possible forest or other wooded land (OWL) cover. The subsequent classification of forested land into forest types and tree species relied on manual measurements of the trees inside the plot. This approach contrasts most other studies, commonly using existing maps or manual discrimination of polygons from aerial photographs [30,31,49] or classification of coarse resolution pixels from high resolution EO data [64].
Our choice of reference features a number of advantages over alternative approaches. The sample was freely available and collected in accordance with established and internationally acknowledged procedures [65]. Furthermore, the sample available for training is large, unbiased and represents the entire landscape rather than small test sites as often the case in similar studies [61]. Finally, the information available on the individual plots has a high degree of accuracy based on actual field measurements. However, the NFI design also posed a number of challenges when used for area-based mapping of forest cover. Although the Danish NFI use a sample plot size larger than most countries [65], the sample plots often cover more than one land use, forest type, and tree species. We elected to discard sample plots if a contrasting class covered more than 25% of the sample area and although this approach may be practical, it inevitably reduces the samples and ignore obvious variability. Another issue is related to the use of concentric circles for measuring different size trees. Individual sample plots may be dominated by tree crowns of trees not measured, when they are placed outside the concentric circle in which they should be measured. It is likely that some of the explanation for the poor performance of tree species classification is related to this issue.

Conclusions
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 12 January 2021 doi:10.20944/preprints202101.0235.v1 This study aimed at a wall-to-wall classification (random forest) of the Danish landscape into forest/non-forest and further into forest types (broadleaved/coniferous) and tree species classification of nemoral forests. A cloud-based preprocessing and analysis workflow (conducted in GEE), taking advantage of multi-temporal S-1 and -2 imagery covering Denmark in 2018, showed high accuracies of both forest/non-forest and forest type classification (overall accuracies of 98% and 95%, respectively) when evaluated against independent high quality ground observations from the National Forest Inventory (NFI). Area estimates of total forest area and the partitioning into conifer and broadleaf forests areas reveal a remarkable correspondence with the numbers reported by the national forest statistics 2018 (based solely on field inventories), showing the high potential for future use of freely available S-1 and -2 data for detailed, yet fully automated and timely forest resource mapping and management in a cost efficient manner. Classification of nemoral tree species performed less well (overall accuracy of 63%) from the current analysis design, with oak species mapping being the most challenging. Possibly, an increased temporal depth in the time-series data, to better capture the period of leaf flushing of the various nemoral species and the inclusion of seasonal metrics in the image classification, would improve the performance of tree species mapping. Table S1: Feature layers included in the models and their mean and standard deviation. All features except the texture features comes in a summer (s) and a winter (w) version.