Analysis of Regional Distribution of Tree Species Using Multi-Seasonal Sentinel-1&2 Imagery within Google Earth Engine

: Accurate information on tree species is in high demand for forestry management and further investigations on biodiversity and environmental monitoring. Over regional or large areas, distinguishing tree species at high resolutions faces the challenges of a lack of representative features and computational power. A novel methodology was proposed to delineate the explicit spatial distribution of six dominant tree species ( Pinus tabulaeformis , Quercus mongolia , Betula spp., Populus spp., Larix spp., and Armeniaca sibirica ) and one residual class at 10 m resolution. Their spatial patterns were analyzed over an area covering over 90,000 km 2 using the analysis-ready large volume of multisensor imagery within the Google Earth engine (GEE) platform afterwards. Random forest algorithm built into GEE was used together with the 20th and 80th percentiles of multitemporal features extracted from Sentinel-1/2, and topographic features. The composition of tree species in natural forests and plantations at the city and county-level were performed in detail afterwards. The classiﬁcation achieved a reliable accuracy (77.5% overall accuracy, 0.71 kappa), and the spatial distribution revealed that plantations ( Pinus tabulaeformis , Populus spp., Larix spp., and Armeniaca sibirica ) outnumber natural forests ( Quercus mongolia and Betula spp.) by 6% and were mainly concentrated in the northern and southern regions. Arhorchin had the largest forest area of over 4500 km 2 , while Hexingten and Aohan ranked ﬁrst in natural forest and plantation area. Additionally, the class proportion of the number of tree species in Karqin and Ningcheng was more balanced. We suggest focusing more on the suitable areas modeling for tree species using species’ distribution models and environmental factors based on the classiﬁcation results rather than ﬁeld survey plots in further studies.


Introduction
A clear understanding of the spatial distribution of tree species is crucial for afforestation decision-making, carbon cycle estimation, biodiversity assessment [1,2], and further analysis of tree-environment interactions [3,4]. Conventional forestry inventories, though time-consuming and inefficient, are the established standard. Remote sensing technology has greatly improved efficiency, because it is able to capture forest type composition and forest structure information over larger and inaccessible areas through multiband and multimode sensors compared to conventional field works [1]. This brings possible solutions to the challenging but promising topic of tree species identification.
Remotely sensed imagery data with very high spatial resolution (lower than 5 m or even submeter class) have been used for tree species identification, because they can Figure 1. The study area (the blue plots represent the field measurements and the enlarged one is the shape of the rectangular plots).

Field Measurements
The forest resources inventory sample plots from the eighth National Forest Inventory (NFI) were used as the ground-truth of the seven forest classes in this study, and a total of 342 rectangular plots (60 × 10 m) surveyed in 2018 were distributed throughout

Field Measurements
The forest resources inventory sample plots from the eighth National Forest Inventory (NFI) were used as the ground-truth of the seven forest classes in this study, and a total of 342 rectangular plots (60 × 10 m) surveyed in 2018 were distributed throughout the area (Figure 1). These plots are enabled to function as a reliable modeling and validation data because they have a positioning accuracy higher than 98% and their attribute information is updated every five years during which one full cycle of investigation is completed. Additional measurements of natural forests and plantations polygons from NFI were also used as a mask layer to delineate the study area for natural and plantation forests.

Satellite Data in the Google Earth Engine
The Google Earth engine (GEE) is a big data cloud platform with high performance computing where a petabyte analysis-ready dataset is freely available and the programming interface is also quite access-friendly [52,53,58]. The study was conducted based on the Sentinel-1A (S1) ground range detected (GRD) scenes, Sentinel-2 (S2) surface reflectance (SR) images, and the shuttle radar topography mission (SRTM) digital elevation model (DEM) dataset of GEE platform. The S1 GRD data was processed by thermal noise removal, radiation correction, and terrain correction, and the 10 m dual band VV+VH and HH+HV of interferometric wide swath (IW) mode was selected for further processing to match the resolution of Sentinel-2. For the S2 SR image geometric, radiation, and atmospheric correction were performed, and the final SR image composites consisted of 10 bands with two spatial resolutions, including 10 m visible and 20 m infrared and red-edge bands. Additionally, cloud mask (QA60) band was also used to help mask clouds in image scenes, leaving only cloudless pixels with good quality. To reflect the temporal characteristics of different tree species in the four seasons of spring, summer, autumn, and winter, the S1 and S2 images of four months (March, June, September, and December) in 2019 were selected. The 30 m SRTM DEM data was used to depict the varied topographic feature, generating aspect and slope variables. Table 1 provided details regarding the parameter of the satellite data in this study. Our goal was to leverage the powerful computing ability of the GEE platform for producing a high resolution typical tree species distribution map across Chifeng and figure out the spatial pattern of each tree species. We proposed and implemented the novel methodology within the GEE cloud-computing platform, which was split into four processes including field sample plots processing, mining multitemporal feature, optimizing RF model, and classification and analyzing. Figure 2 is an overview of our workflow described in detail in subsequent sections.

Field Plots Processing
These plots were not built-in data on the GEE platform. Instead, they were made up of 307 modeling and independent 35 validation plots, of which the spatial distribution was provided in Figure 1. Taking the differentiation of regional scales into account, modeling and verification points are reasonably distributed throughout the study area to better represent the local characteristics of each tree species and reduce the systematic errors in modeling and accuracy assessment. All pixel values extracted fall inside the polygons, not only the geometric center pixels, were used in the subsequent analysis.

Mining Multitemporal Features from SENTINEL-1/2 Imagery
The multitemporal features were all derived from the S1 and S2 products using GEE on line. We imported the S1 and S2 datasets from the data catalog of GEE and filtered out all the images covering the entire study area in March, June, September, and December according to the image acquisition date. Figure 3 provided the visual characteristic changes reflected by S2 surface reflectance images of four seasons. Cloud occlusion seriously affects the application of optical satellite images. Therefore, the cloud-mask operation was performed on each S2 scene afterwards, while GRD images of S1 were further screened leaving only image scenes of the IW mode.

Field Plots Processing
These plots were not built-in data on the GEE platform. Instead, they were made up of 307 modeling and independent 35 validation plots, of which the spatial distribution was provided in Figure 1. Taking the differentiation of regional scales into account, modeling and verification points are reasonably distributed throughout the study area to better represent the local characteristics of each tree species and reduce the systematic errors in modeling and accuracy assessment. All pixel values extracted fall inside the polygons, not only the geometric center pixels, were used in the subsequent analysis.

Mining Multitemporal Features from SENTINEL-1/2 Imagery
The multitemporal features were all derived from the S1 and S2 products using GEE on line. We imported the S1 and S2 datasets from the data catalog of GEE and filtered out all the images covering the entire study area in March, June, September, and December according to the image acquisition date. Figure 3 provided the visual characteristic changes reflected by S2 surface reflectance images of four seasons. Cloud occlusion seriously affects the application of optical satellite images. Therefore, the cloud-mask operation was performed on each S2 scene afterwards, while GRD images of S1 were further screened leaving only image scenes of the IW mode.  When all four months of S1 and S2 images were analysis-ready for further procedures, we made full use of the advantages of the GEE platform to integrate the threedimensional (time, space, and spectrum) features mined from these multitemporal images. Specifically, 23 metrics derived from S1 and S2 dataset ( Table 2) were divided into seven categories, of which 16 were from SR images and the remains from S1 GRD images ( Figure 4). For the S2 spectral index, we calculated eight commonly used indices incorporating the visible, near-infrared, and red-edge bands, including infrared percentage vegetation index (IPVI) [59], transformed normalized difference vegetation index (TNDVI) [60], green normalized difference vegetation index (GNDVI) [61], the second brightness index (BI2) [62], Meris terrestrial chlorophyll index (MTCI) [63], red-edge inflection point index (REIP) [64], inverted red-edge chlorophyll index (IRECI) [64,65], normalized difference vegetation index (NDVI), and enhanced vegetation index (EVI) [66]. To take full use of the high spatial resolution, grey-level co-occurrence matrix (GLCM) was performed on the NIR bands with highest resolution (10 m) and sensitivity to vegetation to generate four texture features (the second moment, contrast, homogeneity, and entropy) of S1 scenes. Furthermore, we addressed the like-polarization (VV/HH) and cross-polarization (VH/HV) yielding four radar indices (division, difference, amplitude, and normalization). Finally, we applied linear regression on the EVI and VH variables to capture the gradient of spectral and radar back scatter over time in one month as well. When all four months of S1 and S2 images were analysis-ready for further procedures, we made full use of the advantages of the GEE platform to integrate the three-dimensional (time, space, and spectrum) features mined from these multitemporal images. Specifically, 23 metrics derived from S1 and S2 dataset ( Table 2) were divided into seven categories, of which 16 were from SR images and the remains from S1 GRD images ( Figure 4). For the S2 spectral index, we calculated eight commonly used indices incorporating the visible, near-infrared, and red-edge bands, including infrared percentage vegetation index (IPVI) [59], transformed normalized difference vegetation index (TNDVI) [60], green normalized difference vegetation index (GNDVI) [61], the second brightness index (BI2) [62], Meris terrestrial chlorophyll index (MTCI) [63], red-edge inflection point index (REIP) [64], inverted red-edge chlorophyll index (IRECI) [64,65], normalized difference vegetation index (NDVI), and enhanced vegetation index (EVI) [66]. To take full use of the high spatial resolution, grey-level co-occurrence matrix (GLCM) was performed on the NIR bands with highest resolution (10 m) and sensitivity to vegetation to generate four texture features (the second moment, contrast, homogeneity, and entropy) of S1 scenes. Furthermore, we addressed the like-polarization (VV/HH) and cross-polarization (VH/HV) yielding four radar indices (division, difference, amplitude, and normalization). Finally, we applied linear regression on the EVI and VH variables to capture the gradient of spectral and radar back scatter over time in one month as well. Table 2 summarized all the monthly variables, of which, aside from the four characteristics of slope, aspect, EVI_scale, and VH_scale, we used the 20th and 80th percentiles of the remaining monthly characteristics instead for subsequent analysis. This can reduce sensitivity of features to noise such as residual cloud and shadows, and unify the same features used in the four seasons [67]. The original 30 m terrain features were resampled to 10 m to be consistent with the spatial resolutions of S1 and S2. A total of 176 features from four months including March, June, September, and December were finally derived. Table 2. Detailed description of all the features generated from the satellite images in GEE and used for random forest classification.

Additional Ancillary Features
Topographic factors (slope and aspect) closely related to the vegetation distribution were derived from SRTM DEM data, which was widely used in forestry research [68]. We used the built-in terrain algorithm of GEE to calculate the additional two features (Table 2), which were then reclassified to convert the two continuous variables to categorical variables according to the technical regulations for continuous inventory of forest resources. Table 3 lists the criteria used for slope and aspect reclassification. The result of one-hot encoding of the two reclassified variables was used as the final topographic feature.

Optimizing Random Forest Classifier
Machine learning algorithm tuning is of great importance in obtaining a stable and high-performance classifier. Here, the random forest algorithm built-in GEE platform named "smileRandomForest" (RF) was leveraged for capturing regional tree species distribution. A tree-based RF model was one of the most commonly used typical bagging learning algorithms [42,43]. RF merges multiple decision trees to obtain a more accurate and stable model, of which the predicted results are based on the results of each decision tree by voting. The critical hyperparameter "numberOfTrees" in the RF classifier was optimized by balancing model complexity and model generalization accuracy. The learning curves were used to characterize the generalization capability of the RF model with "numberOfTrees" increasing from 1 to 100 (Figure 8). Moreover, the RF built-in attribute of out-of-bag score (oob) was used in the model tuning process, which takes advantage of the unused samples during the random decision tree generation process to evaluate the accuracy of each tree, thus yielding the quantified performance of RF algorithms by taking the average accuracy value of all trees. The smaller the difference between kappa detailed in Equation (1) and oob learning curves, the better the robustness of the model given a specific parameter value. To map the spatial distribution of the six targeted tree species (Pinus tabulaeformis, Quercus mongolia, Betula spp., Populus spp., Larix spp., and Armeniaca sibirica) and one residual class, a bagging learning model was carried out using the optimal RF classifier based on multitemporal features within the GEE cloud-computing platform. Quality evaluation of the classification result is always indispensable in remote sensing because it proves how well the used classifier is capable of identifying the desired objects from a given image. Here, we applied two accuracy measures on the classification validation and one additional metric on model optimization. As a preliminary evaluation Cohen's kappa coefficient (kappa), expressed as Equation (1), measures the consistency of assigned and reference classification assuming they are equally reliable and independent. Another accuracy measure overall accuracy (OA), expressed as Equation (2), gives quantified evidence of the proportion of all correctly classified reference samples; that is, the class assignment of the classification agrees with the reference classification. To quantitatively clarify the tree species composition in whole areas and local districts on the basis of the spatial distribution area, the zonal statistics of each tree species of the whole area and of local regions was carried out respectively leading to assessment of the natural forest and plantation forest areas.
where n is the number of reference class, x ii is the number of the correctly classified pixels of the i-th category, N is the total number of the reference pixels, and x i+ and x +i represents the total number of the i-th category of reference classification and assignment of classification respectively.

Multisource Feature Composition
The feature dataset was composed of 176 multitemporal features (e.g., original image bands, spectral/radar indices, textures, and gradient) and additional two one-hot encoded topographic features (slope and aspect). The multitemporal features were listed as Figure 5, of which the ordinate and abscissa individually represent the feature name and feature score, which were computed by the importance attribute built into RF classifier within GEE and was proportional to the contribution of the corresponding feature to RF model. Additional two terrain features were represented as Figures 6 and 7 where the grayscale pictures on the left were the unprocessed continuous numeric features extracted from terrain data, while the classified color images were their corresponding reclassification results according to the criteria in Table 3.          Figure 8 implies the accuracy of the RF classifier generally increased with the increase of the parameter 'numberOfTrees'. The growth rate increased at first and then decreased, finally stabilizing with small fluctuations. The two learning curves (kappa and out_bag_score) had the minimum difference (0.06). This shows the RF model had the best generalization capability and robustness when the key hyperparameter 'numberOfTrees' reached 71. Therefore, we built a random forest model with 71 trees as the optimal model for tree species classification.  Figure 8 implies the accuracy of the RF classifier generally increased with the increase of the parameter 'numberOfTrees'. The growth rate increased at first and then decreased, finally stabilizing with small fluctuations. The two learning curves (kappa and out_bag_score) had the minimum difference (0.06). This shows the RF model had the best generalization capability and robustness when the key hyperparameter 'numberOfTrees' reached 71. Therefore, we built a random forest model with 71 trees as the optimal model for tree species classification.

Tree Species Classification and Accuracy Assessment
The spatial distribution mapping of target tree species across Chifeng city was presented as Figure 9, of which the spatial pattern clearly depicts that Pinus tabulaeformiswas mainly distributed in the southern four districts, while Quercus mongolia was concentrated in the northernmost part of Chifeng. Additionally, Armeniaca sibirica and Populus spp. trees were found in the central region and in the western and eastern regions, respectively. As for Betula spp. it was found along the western and northwestern forest margins, and Larix spp. was mainly distributed in the western and southwestern regions.

Tree Species Classification and Accuracy Assessment
The spatial distribution mapping of target tree species across Chifeng city was presented as Figure 9, of which the spatial pattern clearly depicts that Pinus tabulaeformis was mainly distributed in the southern four districts, while Quercus mongolia was concentrated in the northernmost part of Chifeng. Additionally, Armeniaca sibirica and Populus spp. trees were found in the central region and in the western and eastern regions, respectively. As for Betula spp. it was found along the western and northwestern forest margins, and Larix spp. was mainly distributed in the western and southwestern regions.
There were significant differences in the distribution range and tree species composition between natural forests and plantations Figure 10. The planted forests were distributed in a wider area compared to natural forests. The former was distributed almost throughout the area but mostly in the south, while the latter was dominant in the north. In addition, the planted trees in this region were composed of Pinus tabulaeformis, Populus spp., and Larix spp., while the natural tree species mainly consisted of Quercus mongolia, Betula spp., and Armeniaca sibirica.

Tree Species Classification and Accuracy Assessment
The spatial distribution mapping of target tree species across Chifeng city was presented as Figure 9, of which the spatial pattern clearly depicts that Pinus tabulaeformiswas mainly distributed in the southern four districts, while Quercus mongolia was concentrated in the northernmost part of Chifeng. Additionally, Armeniaca sibirica and Populus spp. trees were found in the central region and in the western and eastern regions, respectively. As for Betula spp. it was found along the western and northwestern forest margins, and Larix spp. was mainly distributed in the western and southwestern regions.   There were significant differences in the distribution range and tree species composition between natural forests and plantations Figure 10. The planted forests were distributed in a wider area compared to natural forests. The former was distributed almost throughout the area but mostly in the south, while the latter was dominant in the north. In addition, the planted trees in this region were composed of Pinus tabulaeformis, Populus spp., and Larix spp., while the natural tree species mainly consisted of Quercus mongolia, Betula spp., and Armeniaca sibirica. The classification accuracy assessment was carried out to yield the confusion matrix as in Figure 11, from which the quantified accuracy of the classification results was calculated according to the accuracy evaluation metrics of OA and kappa that are detailed in Section 2.4.6. The overall accuracy (OA) = 77.5% and kappa = 0.71 for the seven classes in Chifeng are based on the RF classifier with multitemporal features within GEE.  The classification accuracy assessment was carried out to yield the confusion matrix as in Figure 11, from which the quantified accuracy of the classification results was calculated according to the accuracy evaluation metrics of OA and kappa that are detailed in Section 2.4.6. The overall accuracy (OA) = 77.5% and kappa = 0.71 for the seven classes in Chifeng are based on the RF classifier with multitemporal features within GEE. There were significant differences in the distribution range and tree species composition between natural forests and plantations Figure 10. The planted forests were distributed in a wider area compared to natural forests. The former was distributed almost throughout the area but mostly in the south, while the latter was dominant in the north. In addition, the planted trees in this region were composed of Pinus tabulaeformis, Populus spp., and Larix spp., while the natural tree species mainly consisted of Quercus mongolia, Betula spp., and Armeniaca sibirica. The classification accuracy assessment was carried out to yield the confusion matrix as in Figure 11, from which the quantified accuracy of the classification results was calculated according to the accuracy evaluation metrics of OA and kappa that are detailed in Section 2.4.6. The overall accuracy (OA) = 77.5% and kappa = 0.71 for the seven classes in Chifeng are based on the RF classifier with multitemporal features within GEE. Figure 11. The confusion matrix of six target tree species and one remaining categories. Figure 11. The confusion matrix of six target tree species and one remaining categories.

Quantitative Analysis on the Tree Species Distribution
The tree species area results of the entire region revealed that natural forests and plantations separately accounted for 47% and 53% of the total forest area Figure 12. Moreover, Armeniaca sibirica was roughly equal in proportion to natural and cultivated trees, and it had the largest distribution area of more than 10,000 km 2 , followed by the Populus spp. trees covering an area over 8000 km 2 . Almost all Pinus tabulaeformis(84%) and Populus spp. (80%) were planted trees, whereas Quercus mongolia (91%) and Betula spp. (94%) occurred mainly in natural forest.

Quantitative Analysis on the Tree Species Distribution
The tree species area results of the entire region revealed that natural forests and plantations separately accounted for 47% and 53% of the total forest area Figure 12. Moreover, Armeniaca sibirica was roughly equal in proportion to natural and cultivated trees, and it had the largest distribution area of more than 10,000 km 2 , followed by the Populus spp. trees covering an area over 8000 km 2 . Almost all Pinus tabulaeformis(84%) and Populus spp. (80%) were planted trees, whereas Quercus mongolia (91%) and Betula spp. (94%) occurred mainly in natural forest. The statistical results of tree species composition and spatial area at the county-level are summarized in Figure 13. It quantitatively revealed the regional differences of forest resources among districts in Chifeng. In terms of total forest resources, Arhorchin and Aohan ranked first and second with total forest area of more than 4800 and 4100 km 2 respectively, and the proportion of natural forest to plantation forest in the former was relatively balanced, while the latter was mainly plantation forest. In addition, Linxi and Ningcheng are dominated by plantations having the smallest total forest area, but both were less than 2000 km 2 . From the perspective of diversity of tree species, the distribution area of these typical tree species in Kalaqin Banner and Ningcheng was more balanced with both having only slightly more Pinus tabulaeformis, of which each area was respectively about 280 and 480 km 2 . Pinus tabulaeformis, Populus spp., and Armeniaca sibirica trees were dominant in multiple regions, and the regions of Pinus tabulaeformis were Karqin and Ningcheng, of Populus spp. were Aohan, Arhorchin, and Ongniud, and of Armeniaca sibirica were Bahrain left and right, Hexingten, Linxi, and the municipal district. The statistical results of tree species composition and spatial area at the county-level are summarized in Figure 13. It quantitatively revealed the regional differences of forest resources among districts in Chifeng. In terms of total forest resources, Arhorchin and Aohan ranked first and second with total forest area of more than 4800 and 4100 km 2 respectively, and the proportion of natural forest to plantation forest in the former was relatively balanced, while the latter was mainly plantation forest. In addition, Linxi and Ningcheng are dominated by plantations having the smallest total forest area, but both were less than 2000 km 2 . From the perspective of diversity of tree species, the distribution area of these typical tree species in Kalaqin Banner and Ningcheng was more balanced with both having only slightly more Pinus tabulaeformis, of which each area was respectively about 280 and 480 km 2 . Pinus tabulaeformis, Populus spp., and Armeniaca sibirica trees were dominant in multiple regions, and the regions of Pinus tabulaeformis were Karqin and Ningcheng, of Populus spp. were Aohan, Arhorchin, and Ongniud, and of Armeniaca sibirica were Bahrain left and right, Hexingten, Linxi, and the municipal district.

Discussion
Tree species identification by use of the remote sensing technique is a fairly challenging task due to the mixed pixels and low separability among trees [2]. There are numerous studies that have been dedicated to tree species mapping using remotely sensed data, but the classification results dealt with a lower number of targeted categories, covered a small area, and required high spatial or spectral images. These images are less practical to assist in forestry inventories, environmental monitoring, or carbon cycle estimation, all of which require working over large areas [2]. Sufficient computing capacity to handle a large volume of satellite images is a prerequisite for large geographic regions, but it is generally not available locally.
This study explored the use of a non-parametric RF classifier built into the GEE cloud computing platform to classify the dominant tree species over a regional area of more than 90,000 km 2 to assess the potential of GEE in the identification of forest fine categories over large areas. Two main critical points were undertaken in this workflow: (i) taking the 20th and 80th percentiles of multisource indicators for the same month and scene, to reduce the noise effect of the maximum and minimum values of the images [67]; and (ii) making full use of the computing power provided by the GEE platform on the repeated observations of S2 satellite over the studied area.
The results obtained in this study revealed that most of the natural forests and plantations were locally concentrated, and different dominant tree species, which plainly indicated the heterogeneity of forest site conditions between the northern and southern regions. Four of the six tree species are typically found in mountainous areas. Quercus mongolia grows in the northern areas, Betula spp. is distributed along the western high altitude montane areas, while Pinus tabulaeformis and Larix spp. grows under a similar site condition and are mostly found in southwestern areas in mixed plantings. The main difference is that Pinus tabulaeformis has a wide distribution range while Larix spp. is only concentrated in steep areas. The other two species, including Armenniaca sibirica and Populus spp. have a wider suitable area. They are distributed in both mountainous and flat terrain areas, but the former is mainly in mountainous areas, while the latter mostly grows in relatively flat areas.
The tree species classification achieved an acceptable accuracy (kappa = 0.71, OA = 77.5%), which was comparable to existing related studies. For example, one similar study for seven different deciduous and coniferous tree species covering an area about 100 km 2 in Germany based on RF and Sentinel-2 images achieved a lower accuracy (OA = 65%) [35], and another for regional single tree species (Shorea robusta) classification using time series MODIS data had also obtained lower accuracy (OA = 69.9%, kappa = 0.58) [69]. Using object-based methods together with multitemporal and multispectral images acquired with UAV resulted in overall accuracies greater than 73% [70][71][72][73]. Furthermore, some of previous studies obtained better accuracy on tree species classification. Their study areas were relatively small, such as one case using time series Sentinel-2 data for an area of 11.8 km 2 with nine tree species achieving an 82% classification accuracy [74] and another in an area of 9 km 2 with two tree species achieved 88% accuracy [41]. Furthermore, very high spatial or spectral resolution images were required, e.g., using Sentinel-2 together with Hyperion images for the classification of two tree species over 11.2 km 2 , which exhibited an overall accuracy of 97% [75].
The present study mainly focused on tree species classification with high spatial resolution imagery using the GEE platform without analyzing the driving factors of tree species distribution. Further work may perform in suitable areas (ecological envelope) modeling of tree species using species' distribution models and environmental factors including climate, soil, and terrain attributes based on the classification results rather than field survey plots. This could assist in analyzing the projection of tree species in future climate scenarios.

Conclusions
High spatial resolution maps of tree species composition over large areas are key to support afforestation decision-making, to monitor deforestation, and assess biodiversity. A novel methodology was proposed for tree species identification over an area covering 90,000 km 2 using Sentinel-1/2 images acquired from four months (once per season within the same year) within the GEE platform.
To our knowledge, this is the first attempt to achieve tree species classification over such a large area with high spatial resolution using GEE. We produced a 10 m spatial map of six dominant trees species (Pinus tabulaeformis, Quercus mongolia, Betula spp., Populus spp., Larix spp., and Armeniaca sibirica) and found that Pinus tabulaeformis and Populus spp. were mainly present as plantation forests, while Quercus mongolia and Betula spp. were typically found in natural forest areas. Additionally, the areas of Populus spp. and Armeniaca sibirica occupied the largest area across the study area.
The reliable accuracy demonstrated that the proposed cloud-computing workflow is capable of classifying forest types and analyzing spatial pattern over large areas when using only freely-accessible Sentinel-1/2 imagery instead of more expensive high resolution or hyperspectral data. We conclude that the novel design is well-suited to be applied on larger geographic areas to assist in helping forestry inventories.