The Best of Both Worlds? Integrating Sentinel-2 Images and airborne LiDAR to Characterize Forest Regeneration

: Sustainable forest management relies on practices ensuring vigorous post-harvest regeneration. Data on regeneration structure and composition are often collected through intensive ﬁeld surveys. Remote sensing technologies (e.g., Light Detection and Ranging (LiDAR), satellite imagery) can cover a much larger spatial extent, but their ability to estimate regeneration characteristics is often challenged by the obstruction associated with canopy foliage. Here, we determined whether the integration of LiDAR and Sentinel-2 images can increase the accuracy of sapling density estimates and whether this accuracy decreased with canopy cover in the Acadian forest of New Brunswick, Canada. Using random forest regression, we compared the accuracy of three models (LiDAR and Sentinel-2 images alone or combined) to estimate sapling density for two species groups: saplings of all species or commercial species only. The integration of both sensors did not increase the accuracy of sapling density estimates, nor did it reduce the negative inﬂuence of canopy cover for either species group compared to LiDAR, but it increased the accuracy by approximately 15% relative to Sentinel-2 images. Under very high canopy cover, the accuracy of density estimates for all species combined was signiﬁcantly lower with Sentinel-2 images only. We recommend using LiDAR and high-resolution satellite images acquired in the fall to obtain more accurate estimates of sapling density.


Introduction
Regeneration is essential to establish or renew a stand after a disturbance, whether of natural or anthropogenic origin [1]. Two main approaches are used to estimate regeneration characteristics such as density, composition, and height: field surveys and remote sensing. Even though field surveys are more accurate [2], they are more time-consuming [3] and their spatial extent is limited compared to remote sensing [4,5].
Remote sensing technologies such as satellite images have been used for decades in forestry to estimate various stand characteristics [6]. Forest health [7], aboveground biomass [8], fire risk zone [9], and cover change [10] are just a few variables that have been estimated using satellite images. More recently, Light Detection and Ranging (hereafter referred to as LiDAR) has been shown to be efficient at estimating various stand attributes, such as canopy height [11], basal area [4], tree crown diameter [12], and canopy structure [13]. However, forest stand characteristics that can be estimated from remote sensing are mostly restricted to overstory, mainly because canopy foliage interferes with the detection of understory vegetation. Overstory density [14][15][16][17], canopy height [18], proportion of hardwoods in the overstory [19], and canopy cover [20] all have been shown to decrease the accuracy of estimates of overstory tree characteristics. These limitations are even more evident when estimating regeneration characteristics, which may explain why remote sensing has mainly been used to quantify regeneration following a fire [21,22] or tree plantation [23,24]. Only a few authors have attempted to characterize regeneration under high values of canopy cover [25,26] and, not surprisingly, the accuracy of estimates decreased with canopy cover.
Studies estimating regeneration attributes from remote sensing have mainly been conducted using a single sensor: either satellite and aerial images, or LiDAR. Satellite and aerial images display attributes on the horizontal plan, and thus may not accurately represent vertical structure [39]. LiDAR is specifically designed to quantify characteristics on a vertical plan and, therefore, it can detect subcanopy features more accurately than satellite sensors [40]; thus, it could complement satellite and aerial images. Combining satellite or aerial images and LiDAR has been shown to explain a greater proportion of variance in stand characteristics such as basal area [41], tree volume [42], tree height [43], and tree density [23,44] than models based either on satellite images or LiDAR. The combination of LiDAR and satellite images has also been shown to increase the accuracy of crown delineation [45,46] and tree height [47]. However, very few studies have tried to characterize forest regeneration using a combination of satellite images and LiDAR and those that did have mainly been conducted in areas with low canopy cover [22,24].
Here, we compared the accuracy of sapling density estimates between LiDAR, Sentinel-2 images, and a combination thereof, and we identified the limitations of each approach along a gradient of canopy cover. First, we hypothesized that a model using variables derived from both sensors (satellite images and LiDAR) will be more accurate than a model using a single one, owing to complementarity in the information acquired by the two sensors. Then, we predicted that estimation errors will increase with canopy cover for either of the three approaches (single sensors or their combination), but that the difference between low and high canopy cover values will be less pronounced when both sensors are combined.

Study Area
We sampled forest stands across the entire province of New Brunswick in eastern Canada (66 • 21 42 W, 46 • 31 05 N, CRS: WGS84, Figure 1). This province is part of the Atlantic Maritime Ecozone, whose climate is strongly influenced by the Atlantic Ocean [48]. New Brunswick's forest has been shaped by a long history of forestry. As a result, it is dominated by second-and third-growth stands, with small, scattered patches of old-growth [48]. Agriculture is restricted to valleys [49] and natural disturbances are dominated by eastern spruce budworm (Choristoneura fumiferana) outbreaks and fine-scale disturbances such as windthrows [50,51]. Most of New Brunswick's forested lands are classified as Acadian forest, with a small portion of boreal forest in the northwestern corner of the province [52]. New Brunswick's Acadian forest is characterized by a high species richness of both softwood and hardwood species, with spruces (Picea spp.), balsam fir (Abies balsamea), maples (Acer spp.), and birches (Betula spp.) as dominant canopy species [53].

Field Data
In this study, we used permanent sample plots (PSP) monitored by the New Brunswick Department of Natural Resources and Energy Development [54] and J.D. Irving Ltd. (n = 813). Almost

Field Data
In this study, we used permanent sample plots (PSP) monitored by the New Brunswick Department of Natural Resources and Energy Development [54] and J.D. Irving Ltd. (n = 813). Almost all the PSP plots were distributed as a grid of 2 km × 2 km while the others were randomly distributed in stands of high values and intensively managed, 1/10 th of the plots were measured annually.
Field surveys were performed ±1 year from the date of LiDAR coverage and Sentinel-2 image acquisition. We visually inspected each plot to ensure that no disturbance had taken place between the field survey and LiDAR capture. Because the annual growth in height of saplings is very variable due to different factors such as light intensity [55], initial height of the sapling [56], and shade-tolerance [57], we did not consider annual height growth of saplings as a bias in our study. Each stem with a diameter at breast height (DBH) of 1 to 7 cm and a height of ≥1.3 m was tallied in 50-m 2 circular plots (3.99-m radius). Plot center location was determined with a GPS (Topcon HiPer SR, Topcon Corp., Tokyo, Japan) set up on a tripod. The tripod was placed in the center of the plot for a minimum of one hour to obtain high spatial accuracy (50 cm). For each stem tallied, species, DBH, and health class (live, dead, or declining) were recorded. Stem height was measured for every 10th stem using a clinometer.
All stems whose DBH was ≥7.1 cm were tallied in a 400-m 2 circular plot (11.28-m radius) using the same plot center as for regeneration plots. Species, DBH, health class, and height were then recorded.

Satellite Images and Preprocessing
We used Sentinel-2 images with the highest resolution, 10 m × 10 m, acquired on 1 July 2016, 9 July 2017, and 21 July 2018. We used images from July to maximize our sample size (low cloud cover) and the highest image resolution to increase the accuracy of the models [33]. At this resolution, the images only include four bands: blue, green, red, and near-infrared (NIR). Since only level-1C images (top-of-atmosphere reflectance) were available for our study area, we performed an atmospheric correction using sen2cor plugin (SNAP software, ESA, Venice, Italy [58]). Then, we divided the digital number of each band by 10,000 to extract reflectance.
We only selected plots classified as vegetation from the scene classification generated by the sen2cor plugin for further analysis. Finally, for each plot, we extracted the mean reflectance of each band within an 11.28-m radius circle around the plot center and we calculated the normalized difference vegetation index (NDVI; Equation (1)). We selected NDVI because of its strong and well-described relation to canopy vegetation.  [25,38].

LiDAR Acquisition and Preprocessing
We only used returns classified as "ground" or "vegetation". To do so, we first filtered points using LAStools software (rapidlasso GmbH, Gilching, Germany [59]) to remove those classified as building or noise, i.e., points not classified as "vegetation" or "ground". Then, also with LAStools, we normalized vegetation points with ground points to calculate vegetation height above ground. Finally, we extracted points corresponding to field plots (11.28-m radius) and calculated LiDAR metrics for each plot using the lidR package (Roussel and Auty, Québec, Qc, Canada [60]) in the R statistical software (R Core Team, Vienna, Austria [61]) ( Table 1).

Dependent Variables
We calculated the living stem density (stems/ha) of saplings for two species groups: (1) all species and (2) commercial species ( Table 2). The "All species" group combined non-commercial and commercial species, whereas the other species group only comprised species having a commercial value. Both species groups included hardwood and softwood species (Table 2). On the other hand, all non-commercial species were hardwoods ( Table 2). We defined a sapling as a stem whose DBH varied from 1 to 9 cm and whose height was ≥1.3 m.

Environmental Variables
We estimated canopy cover for each 11.28-m radius plot by calculating the proportion of all LiDAR points >7 m, which corresponds to the mean height of saplings in our study area. We also calculated the basal area of merchantable trees (DBH ≥ 9.1 cm), and the proportion of basal area of merchantable trees represented by hardwood species for each plot. We characterized plots using aspect, topography position index, and hillshade, which were derived from a digital elevation model (DEM) provided by the New Brunswick Department of Natural Resources and Energy Development (NBDNRED) at a resolution of 10 m. We also extracted ecoregion, ecodistrict, and ecosite [49], biomass growth index (BGI; [63]), depth to water table, soil type, and an index of the probability of occurrence of each species for every plot.

Statistical Analyses
To compare the accuracy of sapling density estimates between the different sensors (LiDAR, Sentinel-2 images, and both), we first built three models using random forest regression [64] for each dependent variable, i.e., absolute density of saplings of (1) all species and (2) commercial species. The first series of models combined variables from both sensors, LiDAR and Sentinel-2 images; the second series of models used LiDAR, whereas the third relied on Sentinel-2 images. In each model, environmental variables were also included.
We used the VSURF package [65] in R to select variables in each model based on their importance to reduce bias towards intercorrelated variables and categorical variables [66]. The model parameters entered were default values of ntree (500) and mtry (one-third of the variables), and we set the nodesize parameter to 1 to grow larger trees [64]. We then calculated the root means squared error (RMSE) with the randomForest package and obtained pseudo R-squared values for each model including only variables selected by the VSURF package. RMSE and pseudo R-squared values were calculated with internal values of mean squared error (MSE) and pseudo R-squared values of each tree (n = 500), respectively. We also computed relative RMSE (Equation (2)) and 95% confidence intervals (C.I.) of RMSE, relative RMSE, and pseudo R-squared to determine whether the accuracy and variance explained by the models differed.
Finally, to test the effect of canopy cover on the accuracy of each model, we performed a Kruskal-Wallis test on relative errors (Equation (3)) of four different intervals of canopy cover (low: 0% to 24%, medium: 25% to 49%, high: 50% to 74%, and very high: ≥75%). None of the transformations we tried could normalize the residuals from an ANOVA. Following significant Kruskal-Wallis tests, we performed a Dunn test using the Bonferroni correction to adjust p-values.

Variable Selection and Model Accuracy
Contrary to our prediction, the integration of LiDAR and Sentinel-2 images did not increase the accuracy of sapling density estimates, nor it did for the variance explained compared to LiDAR for both species groups, i.e., all species combined and commercial species only (Table 3). Moreover, no spectral variables were retained in the models integrating the two sensors for both species groups (Table 3). However, the integration of both sensors, as for LiDAR, yielded higher accuracy and pseudo R-squared values compared to models using Sentinel-2 images for both species groups (Table 3).
When estimating sapling density for all species, the RMSE of the models combining both sensors or LiDAR only was similar (2828 no. stem/ha vs. 2836 no. stem/ha, respectively). For the model using Sentinel-2 images only, it was 3320 no. stem/ha (Table 3). When estimating sapling density for commercial species, the model integrating both sensors yielded an accuracy of 2784 no. stem/ha, and that using LiDAR only had a similar RMSE (2779 no. stem/ha) ( Table 3). The model using Sentinel-2 images had an accuracy of 3165 no. stem/ha.
When estimating sapling density of all species, relative RMSE of the model combining both sensors and that using LiDAR was 84%, whereas it was 98% for the model using only Sentinel-2 images (Table 3). Therefore, relative RMSE of the model using Sentinel-2 images only was 14% higher than the model integrating both sensors and that using LiDAR. On the other hand, when estimating sapling density of commercial species, relative RMSE of the model integrating both sensors and that using LiDAR were identical at 100%, whereas relative RMSE of the model using Sentinel-2 images was 114% (Table 3). Again, relative RMSE of the model using Sentinel-2 images only was 14% higher than the model combining both sensors and that using LiDAR.
Pseudo R-squared of the models combining both sensors and those using LiDAR were similar at 0.33, when estimating sapling density of all species, and 0.23 and 0.24, respectively, when estimating sapling density of commercial species (Table 3). Pseudo R-square were lower for the models using Sentinel-2 images, at 0.08 when estimating sapling density of all species and 0.01 when estimating sapling density of commercial species (Table 3). When estimating sapling density of all the species, pseudo R-squared of the model using Sentinel-2 images was 25% lower than the model combining both sensors and the one using LiDAR. Furthermore, when estimating sapling density of commercial species, pseudo R-squared was 22% lower than the model integrating both sensors and 23% lower than the one using LiDAR. Table 3. Variables selected and comparison of root means squared error (RMSE) (stems/ha), relative RMSE (%), and pseudo R-squared of models estimating sapling density of all species and commercial species using LiDAR metrics and/or spectral variables and environmental variables. See Table 2  In addition, 95% confidence intervals of RMSE, relative RMSE, and pseudo R-squared were similar between the models combining both sensors and those using only LiDAR for both species groups (Table 3). However, 95% confidence intervals of the models using Sentinel-2 images were larger from the ones of the models combining both sensors as well as from that of the models using LiDAR only, again for both species groups (Table 3). Those results are suggesting that the accuracy and the variance explained of the models integrating both sensors and the ones using LiDAR only did not differ statistically. To the contrary, models using Sentinel-2 images yielded significantly lower accuracy and variance explained than those combining both sensors or using LiDAR only.
As shown in Table 3, spectral variables were only selected in the model using Sentinel-2 images when species needed to be differentiated, thus estimating sapling density of commercial species, whereas only an environmental variable, basal area of tree with DBH ≥ 10 cm, was retained when estimating sapling density for all species.
On the other hand, some variables were retained in almost all models, irrespective of the species group considered. Those variables were the higher percentiles of LiDAR height distribution (zq75, zq80, and zq85), the proportion of LiDAR points near the ground (zpcum1), the percentage of LiDAR points >2 m (pzabove2), as well as the percentage of LiDAR points above LiDAR mean height (pzabovezmean; Table 3). For models combining both sensors, predicted values of sapling density for both species groups were higher when values of the highest percentiles of height distribution, the proportion of LiDAR points near ground, as well as above the mean height were low. In contrast, predicted values increased with the proportion of LiDAR points above 2 m (Figures 2 and 3). These relations are indicating a denser vegetation in the lower than in the higher strata. On the other hand, some variables were retained in almost all models, irrespective of the species group considered. Those variables were the higher percentiles of LiDAR height distribution (zq75, zq80, and zq85), the proportion of LiDAR points near the ground (zpcum1), the percentage of LiDAR points >2 m (pzabove2), as well as the percentage of LiDAR points above LiDAR mean height (pzabovezmean; Table 3). For models combining both sensors, predicted values of sapling density for both species groups were higher when values of the highest percentiles of height distribution, the proportion of LiDAR points near ground, as well as above the mean height were low. In contrast, predicted values increased with the proportion of LiDAR points above 2 m (Figures 2 and 3). These relations are indicating a denser vegetation in the lower than in the higher strata.

Effect of Canopy Cover
Models using LiDAR and those integrating both sensors had similar relative errors, irrespective of canopy cover class or species group considered (Figures 4a, 4b, 5a, and 5b). By contrast and as predicted, the relative errors of sapling density estimates of both species groups increased under a very high canopy cover when using Sentinel-2 images along with environmental variables (Figures  4c and 5c), although the difference was only significant when considering all sapling species ( Figure  4c).

Effect of Canopy Cover
Models using LiDAR and those integrating both sensors had similar relative errors, irrespective of canopy cover class or species group considered ( Figure 4a,b and Figure 5a,b). By contrast and as predicted, the relative errors of sapling density estimates of both species groups increased under a very high canopy cover when using Sentinel-2 images along with environmental variables (Figures 4c and 5c), although the difference was only significant when considering all sapling species (Figure 4c).  (3)) as a function of canopy cover class (low: 0% to 24%, medium: 25% to 49%, high: 50% to 74%, and very high: ≥75%) for models estimating sapling density (stems/ha) of all species.

Variable Selection and Model Accuracy
The combination of Sentinel-2 images and LiDAR did not increase the accuracy and variance

Variable Selection and Model Accuracy
The combination of Sentinel-2 images and LiDAR did not increase the accuracy and variance explained for sapling density estimates of either species group (all species or commercial species) compared to LiDAR. Furthermore, no spectral variables were retained in the models integrating both sensors for either of the species groups. However, Leckie et al. [45] and Coops et al. [46] found that a combination of high-resolution (50 cm) images and LiDAR provided more accurate delineations of tree crowns. Pouliot et al. [33] also found that the accuracy of crown diameter estimates of planted seedlings increased with image resolution. Here, we used medium-resolution images (10 m) integrated with high-density LiDAR (6 points/m 2 ). Perhaps high-resolution images would have increased the accuracy of sapling density estimates.
The integration of both sensors and the use of LiDAR yielded higher accuracy and variance explained compared to Sentinel-2 images. Other authors have also reported that LiDAR was more accurate than satellite images to estimate density [67], biomass [47], forest height [68], and basal area [41] of canopy trees in conifer and mixed stands. LiDAR is known to penetrate deeper into the canopy than satellite images [40].
Spectral variables were only selected in the model estimating sapling density of commercial species using Sentinel-2 images, whereas none were selected when estimating sapling density of all species. Commercial species included both softwoods and hardwoods species, whereas non-commercial species only comprise hardwoods species. This suggests that Sentinel-2 images helped to discriminate between hardwood species of commercial and non-commercial species. Satellite images have been shown to be more accurate than LiDAR to differentiate species [44]. Consistent with our results, the model integrating LiDAR and aerial images developed by Korhonen et al. [69] to estimate sapling density of all species also did not include any spectral variable. Here, we used images captured in summer. Key et al. [70] have shown that fall images are the best to identify species owing to their maximally contrasting spectral signatures. Hence, future studies should examine whether fall images could increase the accuracy of sapling density estimates.
We identified good predictors of sapling density, namely, the higher percentiles of height distribution (zq75, zq80, and zq85), and the proportion of LiDAR points between 2 m and the mean height of LiDAR points. A low value of the highest percentiles of height distribution indicates that the height of most of the vegetation is low, thus corresponding to saplings. Height distribution variables have also been shown elsewhere to be important to estimate stem densities >1 m [34,37,38]. Our results indicated a high density of saplings when the proportion of points near the ground (zpcum1) and above mean height (pzabovezmean) were low as well as when the proportion of points above 2 m (pzabove2) was high. Hence, we concluded that the proportion of points between 2 m and mean height was a good predictor of sapling density and that a high proportion of points between 2 m and mean height suggested a high density of saplings.

Effects of Canopy Cover
The integration of LiDAR and Sentinel-2 images also did not decrease the influence of canopy cover on the accuracy of sapling density estimates compared to LiDAR for both species groups. However, the accuracy of sapling density estimates of all species was significantly lower under a very high canopy cover when using Sentinel-2 images only. Donoghue and Watt [63] also found that under a closed canopy, a model using only LiDAR yielded accurate estimates of tree height. Indeed, satellite images are more sensitive to foliage characteristics than to tree height [41].

Conclusions
The integration of two sensors, LiDAR and Sentinel-2 images, did not outperform LiDAR when estimating sapling density, but it increased the accuracy of estimates compared to Sentinel-2 images.
We identified two valuable predictors of sapling density: (1) the higher percentiles of height distribution (zq75, zq80, and zq85), and (2) the percentage of LiDAR points between a height of 2 m and mean height of LiDAR points. The influence of canopy cover on the accuracy of estimates was similar whether we used LiDAR or LiDAR and Sentinel-2 images combined. However, estimation accuracy was significantly lower under very high canopy covers (≥75%) when using Sentinel-2 images to estimate sapling density of all species. Hence, our results suggest that LiDAR alone is sufficient to estimate regeneration density in the Acadian forest across variable stand species compositions and canopy covers. Hence, when LiDAR is available, it should be prioritized over Sentinel-2 images. However, spectral variables were useful to differentiate species, and thus satellite images are highly recommended when information on sapling species composition is required. Finally, we recommend using the higher percentiles of height distribution (zq75, zq80, and zq85) and the proportion of LiDAR points between a height of 2 m and mean height of LiDAR points to obtain accurate estimates of sapling density in the Acadian forest under a range of canopy cover values.