Comparison of Multi-Temporal PlanetScope Data with Landsat 8 and Sentinel-2 Data for Estimating Airborne LiDAR Derived Canopy Height in Temperate Forests

: Developing accurate methods for estimating forest structures is essential for e ﬃ cient forest management. The high spatial and temporal resolution data acquired by CubeSat satellites have desirable characteristics for mapping large-scale forest structural attributes. However, most studies have used a median composite or single image for analyses. The multi-temporal use of CubeSat data may improve prediction accuracy. This study evaluates the capabilities of PlanetScope CubeSat data to estimate canopy height derived from airborne Light Detection and Ranging (LiDAR) by comparing estimates using Sentinel-2 and Landsat 8 data. Random forest (RF) models using a single composite, multi-seasonal composites, and time-series data were investigated at di ﬀ erent spatial resolutions of 3, 10, 20, and 30 m. The highest prediction accuracy was obtained by the PlanetScope multi-seasonal composites at 3 m (relative root mean squared error: 51.3%) and Sentinel-2 multi-seasonal composites at the other spatial resolutions (40.5%, 35.2%, and 34.2% for 10, 20, and 30 m, respectively). The results show that RF models using multi-seasonal composites are 1.4% more accurate than those using harmonic metrics from time-series data in the median. PlanetScope is recommended for canopy height mapping at ﬁner spatial resolutions. However, the unique characteristics of PlanetScope data in a spatial and temporal context should be further investigated for operational forest monitoring.


Introduction
Monitoring forest structure is an essential part of sustainable forest management. Canopy height is a key measure of the vertical structure of forest stands and an important indicator of potential forest growth [1] and biodiversity [2]. In addition, canopy height is a critical parameter for estimating aboveground biomass (AGB) and growing stock volume [3], which play an important role in global carbon stocks and dynamics. Traditionally, canopy or tree height has been measured in field surveys; however, this is time-consuming and labor-intensive on a large scale. Accurate methods for mapping canopy heights over large areas are required for an operational forest structure monitoring.
Remote sensing data are well suited to mapping wall-to-wall forest structural attributes, including canopy height. Airborne Light Detection and Ranging (LiDAR) can accurately map canopy height and has been used to measure canopy structures in various forest types [4]. However, the cost of measurement and the limited area coverage make regional-scale monitoring problematic. Therefore, optical and synthetic aperture radar (SAR) satellite data is investigated for regional wall-to-wall Location of the study area. A composite image of Sentinel-2 data was used for visualization. The land area was provided by the National Land Numerical Information (http://nlftp.mlit.go.jp/ksj-e/index.html).

Airborne LiDAR Data
Wall-to-wall airborne LiDAR data were acquired from 31 January to 5 February 2016, using a Trimble Harrier 68i laser system mounted on the rotary-wing helicopter with a flight altitude of 400 m. The average pulse density was 23.5 pulses per m 2 . Ground control points in the study area were used for data registration. The point cloud data were classified into ground and non-ground classes using a cloth simulation filtering algorithm [25]. A digital terrain model (DTM) was generated with spatial interpolation of the ground points using a triangle irregular network (TIN) at a spatial resolution of 0.25 m. All LiDAR point clouds were normalized using the DTM. The mean canopy height (MCH) was calculated at spatial resolutions of 3, 10, 20, and 30 m. Manipulation of the LiDAR point data was conducted using the "lidR" package [26] in the R statistical software [27].

PlanetScope, Landsat 8, and Sentinel-2 Data
PlanetScope Analytic Ortho Scene surface reflectance (SR) Level 3B data were used in this study. This data product has four spectral bands (blue, green, red, and near-infrared) with a spatial resolution resampled at 3 m [19]. All PlanetScope data were acquired in 2017 with less than 50% cloud cover. A visual interpretation was conducted to remove low-quality data and images affected by haze. The porder tool [28] was used to list, order, and download the PlanetScope data through the Planet Application Program Interface [29]. Clouds were masked using an Automatic Time-Series Analysis algorithm [30]. Several studies have reported radiometric consistency issues in the PlanetScope SR data due to cross-sensor inconsistency [31,32]. However, in this study, we considered the radiometric correction of PlanetScope SR data to be sufficient. This is because our preliminary analysis confirmed that the distribution of root mean squared error (RMSE) of the actual and predicted values from the first-order harmonic regression model applied to PlanetScope's time series spectral bands was in the range of that of the Landsat 8 and Sentinel-2 data ( Figure 2). This indicated that the variability of SR values in PlanetScope data due to atmospheric effects was similar to Landsat 8 and Sentinel-2. Therefore, no additional preprocessing was applied to the PlanetScope SR data. All four spectral bands were used for further analyses. A total of 148 images across 35 days were used (Table 1).

Figure 2.
The distribution of the RMSE of the actual and predicted values from the first-order harmonic regression model applied to four spectral bands (blue, green red, and near-infrared) for each satellite data. Landsat 8 Operational Land Imager Collection 1 SR data were downloaded from the Google Earth Engine [33]. Data from the study area that were acquired between the beginning of 2016 and the end of 2018 with less than 50% cloud cover were used. The Landsat 8 SR data were atmospherically corrected by the LaSRC [34]. Clouds and cloud shadows were masked using a pixel QA band from the CFmask algorithm [35,36]. A total of 45 images from 45 days from 2016 to 2018 were used, including 15 images from 15 days in 2017 ( Figure 3). A total of six spectral bands (band 2, 3, 4, 5, 6, and 7) were used.
Sentinel-2 Multispectral Instrument Level 1C data acquired in 2017 were used, where cloud cover was less than 50%. The Level 1C top of atmosphere (TOA) reflectance data in the Long Term Archive of the European Space Agency Copernicus Open Access Hub were ordered and downloaded using the "sen2r" package [37]. All Level 1C data were corrected to bottom of atmosphere (BOA) reflectance data using the Sen2Cor algorithm [38]. Clouds and cloud shadows were masked using the scene classification map computed by Sen2Cor. As a result of Sen2Cor processing, band 8a and 10 were removed at the BOA reflectance data. In addition, 60 m spatial resolution bands were removed in this study because these bands were mainly dedicated for atmospheric correction and cloud detection [39]. All remaining spectral bands of BOA reflectance data (band 2, 3, 4, 5, 6, 7, 8, 11, and 12) were resampled to a spatial resolution of 10 m. A total of 58 images from 33 days were used (Table 1 and Figure 3).

Predictor Variables from PlanetScope, Landsat 8, and Sentinel-2 Data
We prepared a suite of images corresponding to single composite, multi-seasonal composites, and time-series data to derive predictor variables from each satellite data (i.e., PlanetScope, Landsat 8, and Sentinel-2) at spatial resolutions of 3, 10, 20, and 30 m. All PlanetScope, Landsat 8 and Sentinel-2 data were first resampled and co-registered to LiDAR MCH pixels (i.e., 3, 10, 20, and 30 m) using bilinear interpolation. The single composite was generated by calculating the median composite of all 2017 images. The multi-seasonal composites were generated from the median composite of images acquired in winter (January to March), spring (April to June), summer (July to September), and autumn (October to December). For Landsat 8 only, images from 2016 to 2018 were used to generate multi-seasonal composites to avoid missing pixels. The time-series data were generated from all images in 2017. Predictor variables were extracted from a suite of images (Table 2). Spectral bands, the Normalized Difference Vegetation Index (NDVI), Normalized Difference Moisture Index (NDMI), Normalized Burn Ratio (NBR), Enhanced Vegetation Index (EVI), and the Gray Level Co-occurrence Matrix (GLCM) were used as predictor variables from the single composite. A total of seven first-and second-order texture measures (mean, variance, homogeneity, contrast, dissimilarity, entropy, and second moment) were calculated as GLCM variables within a 3 × 3 pixel window. The near-infrared band was selected to calculate the GLCM. We calculated the same predictor variables as the single composite for spring, summer, autumn, and winter multi-seasonal composites. The harmonic regression model was fitted to all spectral bands and the indices of time-series data using the following equation [40]: where t is the time of data acquired, a 0 is the constant term, a 1 and b 1 are the first-order amplitude of cosine and sine waves, respectively, and a 2 and b 2 are the second-order amplitude of cosine and sine waves, respectively. We used a first-order harmonic regression model for Landsat 8 due to the limited number of available images, whereas Equation (1) was used for PlanetScope and Sentinel-2. These coefficients were used as predictor variables in modeling with time-series data. In addition, the RMSEs of harmonic regression and the mean observation values were included as predictor variables.

Random Forest Model Building
To build random forest (RF) models [41] for estimating airborne LiDAR derived MCH, training and test sample pixels were extracted from the study area. Prior to sampling the pixels, water bodies were excluded using a global surface water mask [42]. In addition, because forest disturbances that had occurred between the airborne LiDAR data acquisition and the satellite data could potentially influence the RF model development, we excluded the disturbance areas in 2016 and 2017 using Hansen Global Forest Change data [43]. After eliminating the water bodies and disturbed pixels, we randomly selected 5000 pixels for both training and test samples, which were first derived from the MCH pixels at 3 m spatial resolution. The same sample locations were then used for other spatial resolutions to minimize possible sampling errors. In cases where multiple samples were located in the same pixel at resolutions of 10, 20, and 30 m, we simply left one sample at each pixel. For each RF model, feature selection was implemented to reduce redundant information in predictor variables using the "ClustOfVar" package [44]. This algorithm is based on a hierarchical clustering method and identifies the cluster of variables using the squared Pearson correlation for quantitative variables. The optimal number of clusters was determined using the adjusted Rand index [45]. One predictor variable was then selected from each cluster.
The RF models were developed for all PlanetScope, Landsat 8, and Sentinel-2 processing sets across all spatial resolutions using the "randomForest" package [46]. The models were tuned with 10-fold cross validation of 500 trees using the "caret" package [47]. The developed RF models were applied to test samples to evaluate the estimation accuracy. The coefficient of determination (R 2 ), RMSE, and relative RMSE (rRMSE) were used for accuracy assessment, which were expressed as follows: where x i is the LiDAR MCH of the ith test sample, x is the mean value of LiDAR MCH,x i is the predicted MCH of the ith test sample, and n is the number of test samples. Accuracy was also assessed based on forest and non-forest classes. The class was determined using a forest area layer from the National Land Numerical Information (http://nlftp.mlit.go.jp/ksj-e/index.html). The developed RF models were applied to the entire study area to map the canopy height. The variable importance was calculated based on the mean decrease in accuracy.

Accuracy Assessment of RF Models
A total of 36 RF models were developed based on predictor variables from different satellite data (PlanetScope, Landsat 8, and Sentinel-2), processing sets (single composite, multi-seasonal composites, and time series), and spatial resolutions (3,10,20 (Figure 4a). The lower rRMSEs were found in the RF models with a lower spatial resolution across all satellite data and processing sets. When the rRMSE was assessed in the forest and non-forest classes, rRMSE values in the forest class were lower than those in the non-forest class for all RF models (Figure 4b,c). Similar trends were observed for R 2 , ranging from 0.62 to 0.81, 0.61 to 0.80, and 0.66 to 0.83 for PlanetScope, Landsat 8, and Sentinel-2, respectively ( Figure 5). For 3 m spatial resolution, the lowest rRMSE and highest R 2 were achieved by an RF model with PlanetScope multi-seasonal composites (rRMSE of 51.3% and an R 2 of 0.70). For the other spatial resolutions, these were achieved by Sentinel-2 multi-seasonal composites (rRMSE of 40.5%, 35.2%, and 34.2% for 10, 20, and 30 m, respectively, and an R 2 of 0.79, 0.83, and 0.83 for 10, 20, and 30 m, respectively). The highest R 2 (0.83) and the lowest RMSE (2.40 m) and rRMSE (34.2%) were obtained by an RF model with the Sentinel-2 multi-seasonal composites resampled to 30 m. Table 3 summarizes the differences between rRMSE and R 2 for RF models with different satellite data, processing, and spatial resolutions. The lower rRMSEs and higher R 2 were found in RF models with a lower spatial resolution; however, the differences became smaller with decreasing spatial resolution (rRMSE, −8.0%, −5.0%, and −2.7% for 3 to 10 m, 10 to 20 m, and 20 to 30 m, respectively; R 2 , 0.06, 0.04, and 0.02 for 3 to 10 m, 10 to 20 m, and 20 to 30 m, respectively). In terms of processing, RF models using multi-seasonal composites exhibited a slightly better prediction accuracy compared with those using time-series data (−1.4% and 0.01 for rRMSE and R 2 , respectively), followed by those using a single composite. When the prediction accuracy was assessed based on satellite data, the RF models with Sentinel-2 achieved lower rRMSEs and a higher R 2 compared with PlanetScope (−2.2% and 0.03 for rRMSE and R 2 , respectively) and Landsat 8 (−4.0% and 0.04 for rRMSE and R 2 , respectively).   Table 3. Median, minimum, and maximum values of the differences between the prediction accuracy of RF models for each combination, in terms of rRMSE and R 2 . The comparison was based on combinations of (1) the same satellite data and processing, (2) the same satellite data and spatial resolution, and (3) the same spatial resolution and processing.  Figure 6 shows the scatter plots of the predicted and airborne LiDAR MCH. The underestimation of MCH in higher canopy height was observed in all RF models, even though this was alleviated in some cases. In addition, the predicted MCH values were also overestimated in lower canopy height, especially those with 3 m spatial resolution.
The five most important variables in each RF model were shown in Table 4. Although variable importance varied at different spatial resolutions, several predictor variables were ranked as the five most important variables regardless of spatial resolution (e.g., b5 in Landsat 8 single composite, NBR-mean in Sentinel-2 time series, and summer-NDVI in PlanetScope multi-seasonal composites). In the RF models using time-series data, variables from harmonic regression models were important variables although the mean values of spectral bands and indices had the highest importance in all models. The GLCM variables were important variables in all spatial resolutions of RF models using PlanetScope single composite. The near-infrared and short-wave infrared related spectral bands and indices had the highest importance in 29 out of 36 RF models. Figure 7 shows an example of MCH mapping from the PlanetScope multi-seasonal composites at 3 m for the entire study area. Figure 8 shows a detailed example of MCH mapping in the study area, which includes forest roads and harvested areas with stripe shapes, using multi-seasonal composites at different spatial resolutions. The mapped canopy height was different for each combination of satellite data and spatial resolution. It was visually difficult to map the MCH in small-scale disturbed areas, especially with 3 m spatial resolution, leading to obscured mapping to small paths and harvested areas. The canopy heights with a lower spatial resolution had better visualization of these small-scale disturbance areas; however, they were not clear at 30 m spatial resolution.

Discussion
The single composite, multi-seasonal composites, and time-series RF models had varying accuracies for different combinations of satellite data and spatial resolution. Although the RF models with time-series data performed better than those with multi-seasonal composites in some combinations (e.g., Landsat 8 with 3 m), the multi-seasonal composites generally had higher prediction accuracy. Previous studies have indicated the effectiveness of using textural variables, which are good indicators of vegetation structure as well as spatial characteristics of tree canopies [48], to estimate growing stock volumes [49]. The seasonal information from multi-temporal images was also effective for estimating forest structural attributes in previous studies [8,49,50], the results of which are consistent with those in this study. This might be explained by lower canopy height in broadleaf forests in the study area. Because seasonal and phenological variation in evergreen broadleaf forests in this region can be distinguished by satellite data [51], the RF models using multi-seasonal composites might have higher accuracies. In contrast, as noted by Csillik et al. [24], the contribution of textural information from the PlanetScope data is not apparent and needs further investigation. Several studies have used harmonic metrics from time-series data to classify tree species and estimate canopy cover (e.g., [40,52,53]). This has been found to be more accurate compared with multi-seasonal composites [54]; however, improvements using harmonic metrics were not observed when estimating canopy height in this study. This was potentially caused by undetected cloudy pixels in time-series data.
The PlanetScope estimations achieved the highest accuracy with a 3 m spatial resolution. Downscaling the Sentinel-2 and Landsat 8 data to 3 m does not result in any new information; therefore, it is sensible to use PlanetScope data with a 3 m spatial resolution to improve prediction accuracy. At a spatial resolution of less than 10 m, the RF models with Sentinel-2 achieved the highest accuracy. The potential of short-wave infrared and red-edge bands of Sentinel-2 for forest structural mapping has been demonstrated in previous studies [9,55]. This may be one of the reasons why Sentinel-2 is more accurate than PlanetScope, which has only four spectral bands. This was supported by high variable importance of short-wave infrared related variables in Sentinel-2. Although the original spatial resolution of the PlanetScope data is higher than that of Sentinel-2, the RF models with Sentinel-2 were more accurate. The influential factor for the lower accuracy achieved by Landsat 8 may be its lower original spatial resolution compared with PlanetScope and Sentinel-2. A previous study showed that the spectral indices from PlanetScope did not have a significant contribution to forest structural estimates in tropical forests [23]. This might be caused by different vegetation conditions and should be investigated in further studies.
In general, the RF models with a lower spatial resolution had a higher prediction accuracy (Figures 4 and 5). This is partially explained by the reduced co-registration errors between airborne LiDAR derived MCH and satellite observation [20]. As the errors in estimating AGB from tree height decrease with a lower spatial resolution [56] due to a slower rate of increasing total error than total AGB, our study may also be affected by spatial error variations in estimating canopy heights at different spatial resolutions.
All RF models showed an underestimation of canopy heights at higher values. The underestimation of canopy heights using optical satellite data was reported in other studies (e.g., [57]), and this is explained by the saturation of reflectance at a certain height. The overestimation of low canopy heights was partially caused by low prediction accuracy in small areas that had low or zero canopy height, as shown in Figure 8. Temporal differences between data acquisition may be another reason due to vegetation recovery after forest disturbance. One possible solution for reducing prediction biases could come from investigations into alternative machine learning approaches, such as deep learning (e.g., [10]).
In this study, we only investigated the use of optical satellite data; however, other types of satellite data, such as SAR data, might be used for accurate canopy height mapping. Because SAR can observe the forest structure even in cloudy conditions, frequent observation can be achieved by SAR data for regional and countrywide scale monitoring. Although L-band SAR observations are sometimes limited by the availability of the operational L-band SAR satellite (i.e., ALOS-2/PALSAR-2), the Copernicus Sentinel-1 mission provides temporally dense and systematically acquired C-band SAR data since launched in 2014. Previous studies have used ALOS-2 and Sentinel-1 data for forest structural estimation (e.g., [23,55]), and showed the potential use of SAR data. Further studies might investigate the integration of optical and SAR satellite data for estimating forest structural attributes.
CubeSat data, such as those from PlanetScope, has great potential to map forest structural attributes in conjunction with field surveys and airborne LiDAR data on a regional and countrywide scale at a reasonable cost. This study found that PlanetScope performed well with a fine spatial resolution of 3 m; however, Sentinel-2 was more accurate for other spatial resolutions using multi-seasonal composites. Sentinel-2 might be more applicable for large scale monitoring, while PlanetScope might be used for near-real-time and detailed monitoring. In addition, because of the high spatial and temporal resolution of the PlanetScope data, robust forest monitoring using PlanetScope data, such as the near-real-time monitoring of forest attributes with a finer spatial resolution, may be possible. Future research is needed to study the various applications in different forest types.

Conclusions
This study examined the capabilities of PlanetScope data to map canopy heights and compared this with the performance of Landsat 8 and Sentinel-2 data in temperate forests. Previous studies have rarely investigated the predictive features from multi-seasonal composites and time series of PlanetScope data. This study systematically evaluated the relative performance of RF models from these variables. The accuracy of RF models for estimating airborne LiDAR derived MCH indicated that PlanetScope data was suitable for mapping at 3 m spatial resolution with an rRMSE of 51.3% and an R 2 of 0.70; however, for lower spatial resolutions (10, 20, and 30 m), RF models using Sentinel-2 data performed better. The RF models using multi-seasonal composites achieved a higher prediction accuracy compared with those using time-series data or a single composite. PlanetScope data is recommended for use with finer spatial resolution; however, considering the minor differences between the prediction accuracies of Sentinel-2 and PlanetScope, and the high spatial and temporal resolution of PlanetScope, further uses of PlanetScope data, such as near-real-time monitoring, should be investigated.