Use of Sentinel-2 Data to Improve Multivariate Tree Species Composition in a Forest Resource Inventory

: Aerial-photo interpreted inventories of forest resources, including tree species composition, are valuable in forest resource management, but are expensive to create and can be relatively inaccurate. Because of differences among tree species in their spectral properties and seasonal phenologies, it might be possible to improve such forest resource inventory information (FRI) by using it in concert with multispectral satellite information from multiple time periods. We used Sentinel-2 information from nine spectral bands and 12 dates within a two-year period to model multivariate percent tree species composition in >51,000 forest stands in the FRI of south-central Ontario, Canada. Accuracy of random forest (RF) and convolutional neural network (CNN) predictions were tested using species-speciﬁc basal area information from 155 0.25-ha ﬁeld plots. Additionally, we created models using the Sentinel-2 information in concert with the ﬁeld data and compared the accuracy of these models and the FRI-based models by use of basal areas from a second (13.7-ha) ﬁeld data set. Based on average R 2 values across species in the two ﬁeld data sets, the Sentinel-FRI models outperformed the FRI, showing 1.5- and 1.7-fold improvements relative to the FRI for RF and 2.1- and 2.2-fold improvements for CNN (mean R 2 : 0.141–0.169 (FRI); 0.217–0.295 (RF); 0.307–0.352 (CNN)). Models created with the ﬁeld data performed even better: improvements relative to the FRI were 2.1-fold for RF and 2.8-fold for CNN (mean R 2 : 0.169 (FRI); 0.356 (RF); 0.469 (CNN)). As predicted, R 2 values between FRI- and ﬁeld-trained predictions were higher than R 2 values with the FRI. Of the 21 tree species evaluated, 8 relatively rare species had poor models in all cases. Our multivariate approach allowed us to use more FRI stands in model creation than if we had been restricted to stands dominated by single species and allowed us to map species abundances at higher resolution. It might be possible to improve models further by use of tree stem maps and incorporation of the effects of canopy disturbances.


Introduction
Spatially extensive and up-to-date information on forest attributes, including tree species composition, play vital roles in forest management and in many other environmental fields. In Canada, for example, forest resource inventories (FRIs) created from interpreted aerial photos provide stand-level estimates of tree species composition, tree height, tree density, and site quality [1]. These attributes in turn play key roles in tactical and strategic management of wood fibre resources. Species composition is particularly important because it is used to define major forest types, which in turn are used to operationalize many aspects of forest management, including growth-and-yield calculations and forecasts of timber supply [2,3]. Forest management in Germany similarly makes use of databases of stand-level attributes derived from a mix of aerial photography, field sampling, and expert judgement (Peerenboom 2003, cited by [4]). In Norway, manual interpretation of orthophotos or stereo photogrammetry is commonly used [5]. In Finland, recent FRIs (also termed forest management inventories) use spectral and texture information from large-format digital cameras in combination with LiDAR information to estimate stand attributes, including attributes of dominant species and major forest types [6]. Such forest resource inventories are also widely used in a variety of other applications; for example, in quantifying wildlife habitats [1,7,8] and carbon stocks (e.g., [9,10]).
Unfortunately, although valuable, forest resource inventories are expensive and laborious to create and the subjective photo-interpretation can be relatively inaccurate, especially with regards to tree species composition. Potential sources of error are many, including photo-interpreter skills, image quality, forest complexity, and inventory definitions [2,11]. For example, Thompson et al. [12] used 50-100 ground sampling points in each of 129 boreal stands and found that percent species abundances in the FRI were accurate only about 50% of the time for common species and <25% of the time for rare species (where accuracy was defined as either ≤20% off or ≤10% off when FRI abundance was zero). Maxie et al. [13] reported compositional accuracies of 29-88% across six studies and found only 55-62% agreement between field and FRI data when coarse forest-type classifications were compared. Especially high error rates have been reported for mixed-compared to single-species stands [12,14]. In some regions, such as the boreal forest, relatively homogenous areas of one or a few tree species are common; however, in other regions, including northern temperate forests of North America, forest stands often are spatially heterogenous species mixes. In addition to being error prone, FRIs also are typically at multiple-hectare scales. The ability to map species composition at finer spatial scales could prove useful in many applications; for example, to facilitate more detailed understanding of wildlife-habitat relationships [13], to aid in cut-block design, to map small-scale disturbances, or to better mitigate effects of invasive forest pests.
Continuing advances in remote sensing offer considerable potential to improve estimates of tree community composition. Because of differences among tree species in their spectral characteristics and phenologies, of particular interest are multi-or hyper-spectral data from multiple time periods. Such information at high resolution has been used to identify individual tree species with considerable success, especially when combined with high-resolution information on tree structure from LiDAR (e.g., [15,16]). However, high-resolution multi-spectral data can be expensive and may not be widely available. Copernicus Sentinel-2 satellites, which were launched in 2015 and 2017 and whose data are freely available, are potentially useful in that they provide multiple spectral bands in the visible, near and short-wave infrared parts of the spectrum at moderate resolution (10-60 m) (e.g., [17]). Importantly, the satellites have nominal return times of 2-5 days, which allows for the possibility of multiple cloud-free images during the growing season. Several studies have reported success in identifying tree species at the stand level using these data, especially when images from several dates were included. For example, Immitzer et al. [17] summarized information from seven studies and reported overall accuracies of 76-92% in identifying tree species and forest types. In their own study they achieved an overall accuracy of 89% for 12 species using multiple Sentinel-2 scenes.
Some Sentinel-2 studies to date have used regional or national FRIs to identify areas that were heavily dominated by single species, and then tested the predictive power of Sentinel-2 models against these same data (see also [4]). This begs the question though: can such models be used to improve the FRI itself? In particular, can Sentinel-2 imagery in concert with the FRI be used to improve the FRI? The reasoning here is that because of species-specific phenological variation, the satellite information in concert with the FRI might increase the accuracy of the FRI, resulting in better estimation of tree species composition than in the original training data. Such an approach might also lend itself to higher resolution mapping of species composition, given that the satellite information is primarily at 20-m resolution compared to the multiple-hectare size of forest stands in FRIs. This approach might also help to simplify and improve the process of creating FRI updates.
A notable challenge, however, is that even single Sentinel-2 pixels, and certainly many FRI stands, include more than one tree species [17,18]. For example, in the FRI used in the present analysis (see below), only 5% of stands in the study area contained only one species and in 35% of stands, even the single-most abundant species comprised <45% of the forest canopy. Instead of focusing on areas heavily dominated by a single species, an interesting alternative might be to use a multivariate approach; that is, to model the vector of species abundances in each stand. Such an approach could make use of more FRI information, rather than just that from monodominant (or near-monodominant) stands and might allow mapping of species abundances rather than simple presence or absence. We are unaware of any tests of such an approach in the literature, although multivariate methods have been used to impute forest attributes for dominant species and major forest types from a combination of aerial and LiDAR information (e.g., [6]) or from a combination of field and satellite information [19,20].
Here, we investigate the utility of Sentinel-2 data in combination with FRI information to predict multivariate species composition in an area of the Great Lakes-St. Lawrence forest of south-central Ontario, Canada. We were especially interested to see if Sentinel-2 predictions improved on the FRI itself, which we tested by use of independent, field-based information on species-specific basal areas from 155 0.25-ha field plots. We were also interested to see how well the models performed against ones that were trained with field data rather than FRI information; therefore, we compared the performance of both FRI-and field-trained models using a second field-based data set (a 13.7-ha, exhaustively sampled area). Under the hypothesis that the FRI in combination with Sentinel information provides more accurate information on tree species composition than the FRI itself, we predicted that correlations among FRI-and field-trained predictions would be higher than their correlations with the FRI itself. Finally, in creating our models, we compared two statistical modelling methods. One was random forest, which is a relatively simple approach that has proven successful in many studies (e.g., [21,22]). However, several authors reported better prediction with multi-layer neural networks (e.g., [16,23]). Hence, as a second method we used convolutional neural networks.

Study Area
The study area consisted of an approximately 1.3 million ha area in south-central Ontario, Canada that included Algonquin Provincial Park (APP) to the north and adjoining areas to the south, including the privately-owned Haliburton Forest & Wild Life Reserve Ltd. (Figure 1a). The area falls within Rowe's [24] Great Lakes-St. Lawrence forest zone. This moderately-dissected region of the Canadian Shield has an elevation range of approximately 160-590 m, which in concert with variation in soils results in considerable diversity in local site conditions and tree communities. White pine and other conifers are especially prevalent on glacial outwash plains to the northeast and sugar maple and other hardwoods on the higher-elevation headwater regions to the southwest. In the Forest Resource Inventory (FRI) for the study area, 23 tree taxa were represented: sugar maple (26.6% of the canopy), Populus spp. (11.6%), white pine (8.0%), yellow birch (7.3%), eastern hemlock (7.0%), red maple (6.8%), balsam fir (6.5%), white birch (5.8%), white spruce (4.4%), black spruce (3.2%), red pine (2.8%), beech (2.5%), red oak (2.5%), eastern white cedar (2.3%), black ash (0.6%), basswood (0.6%), jack pine (0.5%), white ash (0.3%), larch (0.2%), ironwood (0.1%), black cherry (0.1%), red spruce (<0.1%), and white elm (<0.1%; see Table 1 for scientific names). Forests were diverse at the stand scale, with 95% of stands comprised of more than one tree species. Most of the area is managed for timber resources. Partial harvesting systems dominate, especially single tree selection in maple-dominated stands and shelterwood silviculture in pine-dominated stands. Mean monthly temperatures for January and July, respectively, were −10 and 19 • C, and annual precipitation was 1078 mm (1981-2010 normals; [25]). , and Forest Resource Inventory information (green) that was used in combination with Sentinel-2 satellite information to model percent tree species composition (a). White areas are lakes and areas outside the study area. Also shown are locations of field-measured 0.25-ha plots ("plus" symbols; (b)) and tree stems ≥10 cm DBH mapped in the "Megaplot" (dots) with superimposed 0.24-ha rectangular plots (black lines; (c)).

The Forest Resource Inventory
In the Ontario FRI, 1:10,000 or 1:20,000 analogue aerial photos that were panchromatic and stereoscopic were interpreted to delineate forests stands (relatively homogeneous areas with respect to tree species composition, age, arrangement, or condition) and to estimate associated forest attributes such as tree species composition, average tree height, relative tree density, and site quality.
Field cruises and other sources of information were used in some cases to aid in interpretation [2,26]. The FRIs that we used were based on photography taken in 2000 (APP) or 2007-2008 (areas outside of APP). More recent FRIs based on interpretation of higher-resolution colour imagery were available for some of the study area, but not all, hence we used the older FRI. Also, three forest managers in the region expressed reservations to us about the accuracy of the newer FRI compared to the older one. The disparity in age between the FRI data and the Sentinel-2 imagery is addressed in the discussion. In the FRI, photo interpreters estimated percent crown occupancy of tree species to the nearest 10% and included all species contributing at least 10% [2]. As noted above, 23 native tree taxa were represented in the FRI for the study area (22 species plus one taxon at the genus level (Populus tremuloides, P. balsamifera, plus P. grandidentata)). Other species groups and non-native species were rare and were excluded from consideration.

Sentinel-2 Data
Because of the relatively diverse species pool represented in the study area and evidence that multi-temporal imagery is useful when identifying multiple species or forest types (e.g., [17,27]), we used a relatively large set of cloud-free scenes between the approximate start and end of the growing season. Sentinel-2 data became available only recently for our study area, hence we were able to obtain 12 growing-season images with <10% cloud cover during 2018 and 2019. Most of the images were already transformed to bottom-of-atmosphere (BOA) reflectance (Sentinel product 2A); however, for two we used sen2cor in R to convert them from top-of-atmosphere (Sentinel product 1C) to BOA [28]. October 2019. We used the 9 spectral bands provided in the download product at 20-m resolution: namely, B02, B03, B04, B05, B06, B07, B8A, B11, and B12. In hindsight, it might also have been useful to have included B08 (10-m resolution), but it was excluded.

Field Data
We used information from two sets of field plots georeferenced at sub-metre accuracy. Both were in Haliburton Forest in the southwestern part of the study area ( Figure 1).

Data Pre-Processing
We created models to predict multivariate tree species abundances from the Sentinel-2 scenes using two sets of training data: the FRI (termed Sentinel-FRI models) and the 0.25-ha field plots (termed Sentinel-field models; see Figure 2 for a summary of the project workflow and Supplementary Materials Table S1 for a summary of model creation and validation).
Prior to creating Sentinel-FRI models, we excluded any FRI stands that included species or species groups other than the 23 listed above. We excluded defective, cloud, cloud-shadow, snow, and "dark area" Sentinel-2 pixels by making use of accompanying scene classification and cloud and snow probabilities (also at 20-m resolution). Specifically, we excluded any pixels that for one or more images had a probability of cloud or snow of >50% and we included only scene classification types 4 (vegetation) or 5 (nonvegetated land). The FRI stand polygons were overlaid onto the Sentinel images and those that included at least 2 ha of Sentinel imagery were used (n = 51,751 polygons; mean area = 11.3 ha; range = 2.2-26.7 ha). For the Sentinel-2 pixels in each polygon, we calculated the mean reflectance for each of the 108 combinations of scene and band. Each combination was normalized across stands to range between zero and one by subtracting the minimum and dividing by the maximum. These same constants were used to normalize all other reflectances mentioned below. (1) Forest Resource Inventory (FRI) information for the whole study area extent (termed "Sentinel-FRI" models) or (2) fieldbased measurements from 155 0.25-ha plots (termed "Sentinel-field" models). Prediction success of the first was evaluated using field data from 155 0.25-ha plots, whereas prediction success of both were evaluated using field data from 51 0.24-ha plots in the "Megaplot". See text for details.
In order to create Sentinel-field models, we calculated percent basal area of each tree species in each 0.25-ha plot. We used only trees ≥10 cm DBH because they were more likely to be represented in the canopy. Among deciduous trees sampled by Thomas [32] in Haliburton Forest, 10 cm was close to the lower DBH threshold at which trees typically reached a "co-dominant" size class where >50% of the canopy was exposed. In his sample, 95% of trees with >50% upper crown exposure had DBHs ≥ 10 cm (Thomas, unpubl. data). Plot boundaries were overlaid onto the Sentinel imagery, and mean reflectance for each combination of image and band were calculated based on weighted averages of Sentinel pixel areas within the plot boundaries. For comparative purposes, we also calculated FRI species composition for the field plots; again, weighted averages of the FRI polygon areas within the plot boundaries were used.

Model Creation
We used two modelling approaches: random forest (RF) and convolutional neural networks (CNN). The former is a regression-tree (or classification-tree) method in which random subsets of data and predictors are used to create multiple trees, with the final model representing a consensus among them [33]. It is non-parametric and implicitly incorporates interactions among predictors. We used the multivariate implementation of RF provided by R function rfsrc and used default parameter settings [34,35]. Several authors have found that RF models built with subsets of predictors performed better than those built with all predictors (e.g., [17,18,36]), hence we tested a variable selection strategy in which forward selection in a redundancy analysis (RDA) was used to pick best subsets of predictors. This is similar to the "greedy-Wilks" strategy used by Lim et al. [37]. Because of its linear nature, we used arcsine-transformed percent tree composition in the RDA. Ordination and forward selection were accomplished using rda and ordiR2step in the R library vegan [38]. For the Sentinel-FRI model, we undertook a single 80:20 split of the data into training and validation data sets, respectively, and tried the best 20, 40, 60, etc. variables. We observed that model performance increased with the number of variables (as judged by mean R 2 values between observed and predicted values across species) for both the training and validation data; accordingly, we used all variables (and data) in the final model. For the Sentinel-field model, because sample sizes were much smaller, we undertook 30 random 80:20 splits, and for each tried the best 10, 20, 30, etc. variables. In this case, the highest mean R 2 values were obtained with models with ≥90 variables, so again we used all variables (and data) in the final model.
Neural networks use layers of interconnected "neurons" in which the connections among them are weighted and iteratively improved to maximize prediction of the dependent data (the observed percent canopy composition) from the independent data (the Sentinel-2 imagery). Convolutional neural networks (CNN) are optimized to find spatial relationships in data and are widely used in the field of computer vision for both regression and classification tasks [39][40][41]. CNNs also have been used in many domains that process temporal information [42]; for example, via transformation of audio information into spectrogram images [43]. Here, we envisioned the 108 predictors as a 9 by 12 "image" with rows as spectral bands and columns as acquisition dates.
Unlike RF models, there are a plethora of possible CNN architectures and architecturespecific parameterizations. Typical architectures consist of a pipeline of standard computation layers, including: convolution layers, in which sliding filter windows are applied across the input image; pooling layers, in which the size of incoming data is reduced by averaging or taking the maximum of neighbouring pixels; batch normalization layers, in which output feature maps of hidden layers are normalized; dense layers, in which all-to-all connected sets of neurons are used to analyse features generated from the convolutional layers; regularization layers, in which typical L1 and/or L2 penalties are applied; and dropout layers, in which random proportions of the previous layers outputs are ignored to reduce overfitting. Additional parameterizations include choices of activation functions between layers, number of input samples used to train each optimization step, the final output loss function used to guide the optimization, and the learning rate associated with each optimization step. The base architecture used in our CNN analyses is shown in Figure 3.
Given the high dimensionally of the parametrization space, we used Bayesian hyperparameter searches to find best-performing models by comparing mean R 2 values among predictions for 30% validation datasets. In these searches, we allowed parameters to vary within predefined limits, including the optimizer, number of convolutional and dense layers, number of kernels per convolutional layer, the filter radius of each convolutional layer, number of neurons at each dense layer, pooling, dropout percentage, L1 and L2 regularization strength at each layer, and activation and loss functions. Networks were trained for an unlimited number of epochs with a decaying learning rate and were stopped once validation loss stopped improving. Final activation was through a softmax layer to ensure that the output distribution (i.e., proportional species composition) summed to 1 (see Supplementary Materials Table S2 for search parameters and final parameterizations for the Sentinel-FRI and Sentinel-field models). We used TensorFlow v. 2.1 [45] to undertake the CNN modelling; hyper-parameter searches made use of the HyperOpt library ( [46]). All other data processing and modelling was undertaken in R (v. 4.0.2; [47]).

Model Testing
Predictive capabilities of the Sentinel-FRI model (and of the FRI itself) were evaluated using the 0.25-ha field plots. Predictive capabilities of both the Sentinel-FRI and Sentinelfield models (and of the FRI) were evaluated using information from the Megaplot at both plot and stand levels. In all cases, predictions were compared against the actual species composition (percent basal areas (BA) of trees with DBH ≥ 10 cm). In the Megaplot plot-level approach, we used plot sizes that approximated the size of the 0.25-ha plots used to train the Sentinel-field models; specifically, we subdivided the Megaplot into 51 non-overlapping 40-by-60 m (0.24 ha) plots (Figure 1c). Plot boundaries were defined by Sentinel pixels. Because they were contiguous, information from these plots is presumably spatially autocorrelated; however, given that our purpose was primarily to compare model performances, spatial autocorrelation was ignored. For each plot, we calculated the mean reflectance for each of the 108 combinations of Sentinel image and band. As before, in calculating the FRI composition of the plots, we used weighted averages of the FRI polygon areas within the plot boundaries. In the Megaplot, stand-level approach, we used the four FRI polygons (stands) that overlapped parts of the Megaplot (areas of 3.9, 0.8, 6.0, and 3.1 ha). Corresponding field-based percent BA of the three leading tree species in the four stands were: (1) 42% eastern hemlock, 16% red maple, 7% eastern white cedar; (2) 44% American beech, 27% sugar maple, 20% eastern hemlock; (3) 67% sugar maple, 11% American beech, 6% eastern hemlock, and (4) 54% sugar maple, 12% American beech, 12% red maple. For each of the four stands, we calculated the mean reflectance for each of the 108 combinations of Sentinel-2 date and band based on weighted averages of Sentinel pixel areas within the stand boundaries. For each stand, we compared observed, FRI, and predicted species composition by calculating pair-wise Euclidean distances (ED): where n is the number of species and p i and q i are the relative abundances of species i in respective vectors p and q.
In a final comparison of model performance, we used a 3634-ha area in the Depot Lake region of Haliburton Forest. We subdivided it into 40-by-60 m contiguous plots (with boundaries defined by Sentinel pixels) and for each plot calculated the mean reflectance for each of the 108 combinations of Sentinel date and band. As before, we used 40-by-60 m (0.24 ha) plots to approximate the size of the 0.25-ha plots used to train the Sentinel-field models. We examined plot-level agreement among model predictions (and with the FRI) by calculating coefficients of determination (R 2 ). FRI values for each plot were based on weighted averages of plot areas within FRI stands. Calculations were undertaken for only those plots that had Sentinel information for 5 or more of the 6 Sentinel pixels per plot (n = 13,374 plots).

Results
Averaged across R 2 values for the 21 tree species in the 0.25-ha field plots, the Sentinel-FRI model created using RF outperformed the FRI by a factor of 1.5 (average R 2 values of 0.217 and 0.141, respectively). The Sentinel-FRI model created using CNN performed even better, showing a 2.2-fold increase over the FRI (average R 2 value of 0.307; Table 1). For species that showed an R 2 value of at least 0.2, RF predictions outperformed the FRI for all species except red oak, and CNN predictions outperformed both RF predictions and the FRI for all species. Eight species showed poor R 2 values (<0.2) in all cases (eastern white cedar, red maple, black spruce, eastern larch, basswood, black cherry, white elm, and red spruce). FRI and model predictions plotted against field observations for eastern hemlock and sugar maple illustrated the tighter relationships obtained for the Sentinel models compared to the FRI, but also showed that both models (but especially RF) tended to underestimate relatively high abundances (Figure 4).
Similar results were obtained when using the Sentinel-FRI model to estimate percent abundances of the 15 species in the Megaplot. At the plot level, based on average R 2 values across species, the RF model showed an average 1.7-fold improvement over the FRI and the CNN model showed a 2.1-fold improvement over the FRI (average R 2 values were 0.169 for the FRI, 0.295 for RF, and 0.352 for CNN; Table 2). For the 12 species with an R 2 value >0.2, the FRI had the highest R 2 value for two species, RF had the highest for two species, and CNN had the highest for eight species. Three species had no R 2 values >0.2 (ironwood, black cherry, and basswood). At the stand level, RF predictions slightly outperformed CNN predictions, showing a 35% improvement relative to the FRI, versus 33% for CNN (Euclidian distances from field observations averaged across the four stands were 6.8 for the FRI, 4.4 for RF, and 4.6 for CNN; Table 3).
When sentinel-based models were trained using field data (155 0.25-ha plots) instead of the FRI, they showed even better predictive performance for the Megaplot. For RF at the plot level, instead of a 1.7-fold improvement, we found a 2.1-fold improvement (Table 2). Similarly for CNN at the plot level, instead of the 2.2-fold improvement, we found a Table 1. Coefficients of determination (R 2 and corresponding p values) between percent basal area of stems >10 cm DBH in 155 0.25-ha field plots and: (1) percent canopy composition in the Forest Resource Inventory (FRI), (2) random forest (RF) predictions, and (3) convolutional neural network (CNN) predictions. The last two were from Sentinel-2-based models trained with FRI information for the entire study area. Species order is based on average R 2 values (highest to lowest).    Table 3. Euclidian distances between field observations and FRI or model predictions for four stands in the Haliburton Forest "Megaplot" in south-central Ontario. Tree communities in the field plots were quantified by species-specific percent basal area (stems ≥ 10 cm diameter at breast height). 1 FRI = Forest Resource Inventory; RF Sentinel-FRI = random forest predictions from Sentinel-2-based models trained with FRI information for the entire study area; CNN Sentinel-FRI = as previous, but using a convolutional neural network; RF Sentinel-field = random forest predictions from Sentinel-2-based models trained with fieldsampled information from 155, 0.25-ha plots; and CNN Sentinel-field = as previous, but using a convolutional neural network.

Species
2.8-fold improvement. Across all comparisons for species with at least one R 2 value >0.2, Sentinel-field CNN models had the highest R 2 for six species, Sentinel-FRI CNN models had the highest for three species, and the remaining models and the FRI each had one highest R 2 value. Again, no R 2 values >0.2 were observed for ironwood, black cherry, or basswood. At the stand level for the Megaplot, the Sentinel-field CNN model showed a 58% improvement relative to the FRI, whereas the Sentinel-field RF model showed a 52% improvement (respective mean Euclidian distances were 3.3 and 2.9; Table 3). In the plots of FRI and field observations and model predictions for sugar maple, tighter relationships for model predictions compared to the FRI were evident ( Figure 5). Again, all models underestimated abundances when they were at their highest, although underestimation was least pronounced for the Sentinel-field models.
In a final comparison, we plotted the FRI and predicted abundances of red oak in the Depot Lake region of Haliburton Forest ( Figure 6). The ability of the Sentinel-based models to provide higher-resolution mapping compared to the FRI was evident. For example, some areas of homogeneous abundances in the FRI showed a finer-grain pattern of higher and lower abundances in the models. Also, red oak was absent from the FRI in the north-western quadrant of the region, whereas the models predicted several areas of moderate abundances. Compared to CNN models, RF models appeared to show a more restricted ranges of values, overestimating relatively low abundances and underestimating relatively high abundances. When we looked at plot-level R 2 values among the various data sets, as predicted we observed that R 2 values among the various model predictions were higher than R 2 values between the model predictions and the FRI (ranges of 0.24-0.44 vs. 0.11-0.16; Table 4). Table 4. Average species-specific coefficients of determination (R 2 ) between the FRI and model predictions of percent species composition for the Depot Lake region in Haliburton Forest, Ontario. The 3634-ha study site was subdivided into 40 by 60 m plots; see Table 3 for descriptions of the various predictions.

Discussion
As tested against two independent field data sets, we could use Sentinel-2 information in concert with the FRI to better estimate tree species composition than in the FRI itself. Based on average R 2 values, improvements were approximately 1.5-and 2.1-fold using RF and CNN models, respectively. The FRI's poor performance at the relatively small scales that we used (c. 0.25 ha) is perhaps not surprising given that it was designed to be used at stand (multiple-ha) scales; however, our test at the stand scale also showed improvements over the FRI of 33-35%, although based on a small sample of four stands. We also found that predictions from FRI-trained models agreed more closely with predictions from fielddata-trained models than with the FRI itself, supporting the hypothesis that the Sentinel-2 models extracted accurate information from the FRI. The use of Sentinel-2 information also meant that we could map species composition at higher spatial resolution than in the FRI. Additionally, our multivariate approach allowed us to use a much larger set of FRI stands for model training than if we had relied upon stands heavily dominated by single tree species. For example, in our study area, this meant that we could use >50,000 stands for model training instead of the 4153 stands dominated by a single species (90% abundance or more). It also allowed us to model abundances of a relatively large number of species, which was a key future research need identified by Immitzer et al. [17].
Although they had improved performance relative to the FRI, our models still performed poorly for approximately 8 of 21 tree species. Differences in classification success among species using Sentinel-2 data have been noted by several authors and have been attributed to short-term changes in tree phenology missed by the satellite scenes available, imbalanced representation of tree species in the training data, small training sample sizes, and similar spectral signatures among species [17,27,48]. Sample size evidently was a major factor in our analyses: when we plotted model success (as judged by R 2 values) against percent abundances in the testing data, a strong effect of sample size was observed, with R 2 values increasing with abundance ( Figure 7). For both the FRI and field-based training data, but especially for the latter, model fits showed rapid improvement over a relatively small range of abundances. For the field-based training data, any species with abundance of <4% showed relatively poor R 2 values (<0.4), whereas more abundant species showed systematically higher R 2 values (>0.5). In some cases, for these rarer species, the few tree canopies available may not have permitted accurate Sentinel-2 signatures to be created. Part of the failure may also have represented biases in the models towards maximizing fits for the most abundant species, holding out the possibility that cost-sensitive learning or re-balancing efforts might improve model performance [49]. Interestingly, even for relatively abundant species, the considerable variability shown among FRI-trained model fits was not mirrored in the field-trained fits (Figure 7). For example, despite similar abundances, FRI-trained fits for red maple were relatively poor and those for red oak and eastern hemlock were relatively good. These same relative differences were not evident for field-trained models, suggesting that the differences were not driven by difficulties in discriminating these species based on spectral or phenological characteristics. Instead, they may have reflected species identification errors in the FRI. Under this hypothesis, the abundance of red maple was poorly estimated by photo interpreters, whereas red oak and eastern hemlock abundances were relatively accurately assessed. Thompson et al. [12] found evidence of systematic biases in FRI species accuracy between two boreal study areas, suggesting interpreter errors.
In general, one might expect field-trained models to outperform FRI-trained ones, however the improved performance that we observed was somewhat surprising given the much larger sample size available in the FRI compared to the field data (>50,000 stands of FRI information vs. 38.75 ha of field data). This appears to be a testament to the overall poor quality of the FRI information and suggests that field-based data should be used whenever possible in model creation, especially to avoid possible human identification errors. The field data in this case represented a relatively large effort (23,202 stems ≥ 10 cm DBH); such data may not always be available. Also, we used relatively large field plots (0.25 ha), which have special value because they encompass multiple remotely sensed pixels and have low perimeter:area ratios and hence a lesser influence of neighbouring (unmeasured) trees [50]. FRI information, on the other hand, is already available for the entire managed forest area of Ontario and in many other jurisdictions as well. Although beyond the scope of this study, an interesting research question concerns the quantity of field data required to improve upon simple use of the FRI, and the relative costs and benefits of doing so. The value of field data in training also might be increased without much additional effort. We simply included any stem ≥10 cm in diameter at breast height in the training data: actual canopy representation as seen by the satellites might be more accurately estimated by using stem maps in concert with estimates of tree canopy size (e.g., [51]) or by using LiDAR.
The nature of training data also raises the interesting question of approaches that might be used to periodically update or create FRIs. An obvious shortcoming of the present effort is the long temporal gap between the Sentinel-2 imagery and the FRI and field information, which was usually close to 10 years, but in the case of the FRI for Algonquin Provincial Park was nearly 20 years. Although trees in the study region are relatively slowgrowing, a smaller temporal gap can be expected to lead to improved model performance. At the same time, such gaps may not be unrealistic given the relative infrequency with which the FRI is updated; for example, in Ontario such updates occur approximately every 20 years [2]. The ability of the Sentinel-2 models to outperform the FRI even given the considerable temporal gap highlights a potentially useful aspect of our approach as applied to FRI updates. As new satellite imagery for a region becomes continuously available, evidently even quite old FRI or field-based information can be used to train models, which then present a snapshot of tree species information at the time the imagery was collected. That is, the satellites are automatically collecting new data all of time, whereas the FRI or plot-based information could be updated at less frequent intervals. Retraining of models presumably would be needed for each new set of satellite imagery because of year-to-year variability in scene availability and tree phenology. Here, methods for efficient use of large collections of satellite images, including those with considerable cloud cover, might be especially useful (e.g., [52,53]). Our methods might also find application in national forest inventories, where satellite information is often used in combination with large networks of field plots to derive forest attributes, although in this case by imputing information from neighbouring plots (e.g., [19,20]). Another potential use of our approach might be to test for biases of interpreter errors [11].  Table 1) and in (b) using field data (Sentinel-field; see Table 2). R 2 values for RF and CNN models were averaged. Species acronyms are the first two letters of the genus and species.
An additional opportunity to improve model performance might be to incorporate information on disturbance events such as forest harvesting. Stand and species spectral properties can be influenced by many factors, including tree age, local site conditions, shadowing effects, and crown health [4,48,54]. We did not have comprehensive information on harvesting activities hence we excluded them; we also expected such effects to be of lesser importance in this region where partial harvesting systems dominate compared to more intensive methods such as clearcutting. Presumably, information on gap fractions and gap ages could be used to further improve model performance, perhaps by treating expected understory species and shade as additional "species" in a multivariate approach. Another approach might be to screen out disturbed stands a priori; for example, Immitzer et al. [17] used NDVI changes from year to year to exclude disturbed stands.
In general, CNN models outperformed RF models by factors of 1.1-1.4 depending on the specific training and testing data sets. The exception was the Sentinel-FRI model at the stand scale, where RF slightly outperformed CNN. The generally higher performance of CNN models highlights the considerable potential of neural network approaches, which have received support in other studies as well (e.g., [16,23,41]). We did not incorporate vegetation indices into our analysis, which might have improved our RF models (e.g., [16,17]), but might have decreased performance in the CNN models [16]. Modifications of RF might also provide better performance; for example, optimized AdaBoosted RF [55] or Cascaded RF [56]. Certainly, RF was much easier to implement that CNN, although we hope this will change with continuing research on neural networks and their parameterization, especially as applied to remote sensing. We also did not incorporate environmental information such as topography into our species modelling, which has led to improved forest classification in several Sentinel-2 studies (e.g., [22,57]). This was purposeful however: although such approaches can improve model performance because of habitat associations of the various tree species, they might preclude extrapolation to situations where habitat associations have been altered; for example, due to management activities or climate change.

Conclusions
In conclusion, our use of multiple Sentinel-2 scenes from a two-year period in combination with local FRI information allowed us to map tree species abundances more accurately and at higher spatial resolution than in the original FRI. Use of information from 155 0.25-ha field plots instead of the FRI resulted in even more accurate models, as did CNN compared to RF models. Importantly, our multivariate approach meant that we could use information from any forested area to train models, irrespective of whether it was dominated by a single tree species or not. Such approaches may prove useful in updating and validating FRI information. Areas for future research include use of tree stem maps in improving field-trained models and incorporation of the effects of canopy disturbances.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13214297/s1, Table S1: Summary of model creation and testing, including model names, statistical methods, training data, validation data, and tables with R 2 values; Table S2: Neural network architecture and parameter values used in Bayesian hyper-parameter searches and final values used in Sentinel-FRI and Sentinel-field models. References [58][59][60][61][62][63][64] are cited in the supplementary materials.