Potential of Multi-Temporal ALOS-2 PALSAR-2 ScanSAR Data for Vegetation Height Estimation in Tropical Forests of Mexico

Information on the spatial distribution of forest structure parameters (e.g., aboveground biomass, vegetation height) are crucial for assessing terrestrial carbon stocks and emissions. In this study, we sought to assess the potential and merit of multi-temporal dual-polarised L-band observations for vegetation height estimation in tropical deciduous and evergreen forests of Mexico. We estimated vegetation height using dual-polarised L-band observations and a machine learning approach. We used airborne LiDAR-based vegetation height for model training and for result validation. We split LiDAR-based vegetation height into training and test data using two different approaches, i.e., considering and ignoring spatial autocorrelation between training and test data. Our results indicate that ignoring spatial autocorrelation leads to an overoptimistic model’s predictive performance. Accordingly, a spatial splitting of the reference data should be preferred in order to provide realistic retrieval accuracies. Moreover, the model’s predictive performance increases with an increasing number of spatial predictors and training samples, but saturates at a specific level (i.e., at 12 dual-polarised L-band backscatter measurements and at around 20% of all training samples). In consideration of spatial autocorrelation between training and test data, we determined an optimal number of L-band observations and training samples as a trade-off between retrieval accuracy and data collection effort. In summary, our study demonstrates the merit of multi-temporal ScanSAR L-band observations for estimation of vegetation height at a larger scale and provides a workflow for robust predictions of this parameter.


Introduction
The status of tropical forests and their temporal dynamics can be assessed and monitored by measuring different forest biophysical parameters (e.g., vegetation height, canopy cover, stem volume and aboveground biomass (AGB)).Accurate spatial estimates of these parameters are crucial to assess terrestrial carbon (C) stocks and C-emissions, as well as to develop sustainable forest management strategies.Furthermore, these products can help provide a better understanding of the ecosystem dynamics and the effects of environmental drivers through modelling.Field measurements of forest biophysical parameters are, however, associated with high costs (e.g., they are labour intensive and time consuming) and are limited to point measurements, which may not adequately describe patterns at different spatial scales.Spatially explicit information on three-dimensional vegetation structure can be provided by Light Detection and Ranging (LiDAR) sensors (e.g., [1][2][3]).Laser pulses sent from a sensor are capable to penetrate the forest canopy and to directly measure vertical vegetation structure.LiDAR is usually operated from an airborne platform and thus limited to small spatial coverage.Both spaceborne as well as airborne LiDAR may provide samples of forest structure parameters over large areas (e.g., regional, national, continental scales) and need to be integrated with satellite imagery to derive wall-to-wall estimates of forest biophysical parameters.
Satellite imagery collected by Radio Detection and Ranging (RADAR) sensors are sensitive to forest structure parameters (e.g., [4][5][6][7][8][9][10][11]), as microwave signals have the capability to penetrate the vegetation profile, and thus to probe the three-dimensional vegetation structure.Furthermore, RADAR data are particularly useful for weather independent applications, as long wavelengths (with a spectral range between 1 cm and 1 m) penetrate clouds.A key parameter obtained from Synthetic Aperture Radar (SAR) data, backscatter intensity, measures the return energy from a target and is determined by the geometric and dielectric properties (which is mostly determined by the water content) of the reflective material, as well as by the frequency, polarisation and angle of incidence of the emitted wave [12].Long wavelengths (e.g., at P-band and L-band) are more suitable for the retrieval of forest structure parameters (e.g., growing stock volume, AGB) because of their ability to penetrate deeper in forest canopies as compared to short wavelengths (e.g., at X-band and C-band) [7,13,14], and thus to interact with large branches (in order of the wavelength) and trunks.In general, the cross-polarised (HV, VH) waves are induced by volume scattering [15] (e.g., as occurring within woody canopies) and are thus more sensitive to volume and possess a stronger correlation with forest structure parameters compared to co-polarised (HH, VV) waves [16,17].Advanced SAR techniques (e.g., interferometry (InSAR), polarimetry (PolSAR), polarimetric interferometry (PolInSAR), tomography (TomoSAR)) show promising results for estimating vegetation structure parameters [18][19][20][21][22][23].For instance, studies based on TanDEM-X single-pass interferometer (e.g., [18,19,24,25]) reported promising sensitivity of the InSAR height to canopy height and thus to stem volume and AGB in boreal and tropical forests.Since the temporal baseline of the TanDEM-X mission is quasi zero, temporal decorrelation of the interferometric coherence can be excluded, and the InSAR coherence is primarily impacted by volume decorrelation.Furthermore, due to the short wavelength of TanDEM-X data (i.e., X-band), most of the incoming electromagnetic energy is scattered from the top of vegetation and the TanDEM-X InSAR height exhibits a strong correlation with vegetation height.For PolInSAR analysis, at least two full polarimetric (i.e., all polarisations are available) SAR datasets acquired from two slightly different positions are needed to determine the volume phase scattering centre, which is located close to the top of a canopy.Accordingly, this measurement is useful to estimate tree height.For TomoSAR methods, a stack of multi-baseline SAR data is required to delineate SAR Tomograms.From the Tomogram, the 3D position and scattering amplitude of each relevant scatterer can be derived, i.e., individual tomographic layers can be classified as surface layers, topmost layers and volume layers (e.g., middle heights).However, currently only one of these methods can be applied for large-scale mapping (i.e., single-pass InSAR), due to the lack of PolInSAR and TomoSAR data at this scale.Moreover, single-pass interferometry based on TanDEM-X data is also somehow limited due to data policy, including restricted data access for scientific use.Because of the data availability and technically less complex analysis compared to InSAR, PolInSAR and TomoSAR, most studies on the estimation of forest structure parameters are based on SAR backscatter analysis (e.g., for AGB estimation [26]).For instance, L-band backscatter was successfully applied to map fractional woody cover [27][28][29][30][31] as well as regional or global forests [32][33][34][35].Moreover, L-band backscatter was used to predict vegetation height in various biomes from boreal [36] and temperate [4,37,38] to tropical forests [39].
In this study, we examine the potential of multi-temporal L-band SAR backscatter acquired by the Advanced Land Observation Satellite 2 Phased Array L-band Synthetic Aperture Radar 2 (ALOS-2 PALSAR-2) in ScanSAR mode to estimate vegetation height over tropical deciduous and evergreen forests in the Yucatan peninsula, Mexico.ScanSAR data exhibit much larger swath width compared to Stripmap data (i.e., 350 km vs. 70 km) and accordingly can cover wider areas at higher repetition rates.For instance, ScanSAR acquisitions cover the entire tropical region approximately every 42 days.Therefore, we analyse the performance of the multi-temporal combination of L-band backscatter for vegetation height mapping.Airborne LiDAR-based vegetation height is used as reference data for model training and validation.
In this paper, we demonstrate the value of multi-temporal PALSAR-2 ScanSAR mosaics compared to open global PALSAR-2 Stripmap Fine Beam Dual Polarisation (FBD) mosaics.Moreover, we apply two different validation approaches to show the effect of spatial autocorrelation on model performance.In previous studies (e.g., [38,39]) it was reported that statistical models improve by increasing the number of spatial predictors as well as response variables.We examine these findings considering two different validation schemes, which include and minimise the autocorrelation.The results of the study are relevant in the context of the upcoming L-band missions for global vegetation monitoring (e.g., ALOS-4, NISAR, SAOCOM, Tandem-L) and an increasing number of L-band SAR data in the near future.Since vegetation height correlates with other forest parameters (e.g., AGB, tree canopy cover) [40][41][42], we can assume that similar results can be achieved for the estimation of AGB and tree canopy cover using multi-temporal dual-polarised L-band SAR backscatter.

Study Area
The study area is the Yucatan peninsula, Mexico, which is mostly covered by tropical deciduous and evergreen forests (Figure 1).According to a land use map for 2010 generated by the Mexican National Institute for Statistics and Geography (INEGI) [43], around 1/3 of the total area (i.e., ~140,000 km 2 ) is covered by tropical deciduous forests (i.e., ~42,000 km 2 ), whereas around 1/2 of the total area is covered by tropical evergreen forests (~68,000 km 2 ).Based on ca.5000 field plots collected over the entire peninsula during the National Forest Inventory programme between 2004 and 2011 [44], mean tree height is 9 m (max.height 27 m), mean diameter at breast height (DBH) is 13 cm (max.DBH 46 cm), mean basal area is 12 m 2 •ha −1 (max. 36m 2 •ha −1 ) and mean AGB is 70 t•ha −1 (max.AGB 227 t•ha −1 ).The area of the peninsula is rather flat.According to the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM) version 4.1 [45], average elevation is 62 m with a standard deviation of 70.8 m.The average slope is 1.3 • with a standard deviation of 1.5 • .
The regional climate is tropical sub-humid with slightly higher air temperatures in summer than in winter (Figure 2).The average annual temperature between 2013 and 2017 was ~27 • C [46].Average annual precipitation ranges from 1000 to 1500 mm, while the period between May and November is characterised by pronounced rainfall, and the relatively dry period extends between December and April (Figure 2) [46].Furthermore, the rainfall increases from north (~1100 mm mean annual total precipitation between 2013 and 2017 for the federal state of Yucatan) to south (~1400-1450 mm mean annual total precipitation between 2013 and 2017 for the federal states of Campeche and Quintana Roo, respectively).The forests in the north-western part of the peninsula have experienced slash-and-burn agriculture for 2000 years, resulting in forest patches at different succession stages [47].In contrast, the southern and eastern parts of the Yucatan peninsula show one of the lowest annual deforestation rates, with some nearly intact areas of tropical forest in Central America (e.g., Calakmul Biosphere Reserve) [47].

Airborne LiDAR Data
Small-footprint discrete-return airborne LiDAR data were collected by the NASA's G-LiHT imager [48] in April-May 2013 over entire Mexico (Figure 1).The average pulse density was

Airborne LiDAR Data
Small-footprint discrete-return airborne LiDAR data were collected by the NASA's G-LiHT imager [48] in April-May 2013 over entire Mexico (Figure 1).The average pulse density was

Airborne LiDAR Data
Small-footprint discrete-return airborne LiDAR data were collected by the NASA's G-LiHT imager [48] in April-May 2013 over entire Mexico (Figure 1).The average pulse density was approximately 6 returns•m −2 .The data over the Yucatan peninsula were acquired during leaf-off conditions, which can lead to an underestimation of vegetation height.The LiDAR returns were classified into "ground returns", "shrub returns" (i.e., non-ground returns below 1.37 m) and "tree returns" (i.e., returns above 1.37 m).From the topography-normalised point clouds, 88 plot-aggregated LiDAR metrics (e.g., percentiles, density metrics for "shrub returns", "tree returns" and "all returns") as described in [49][50][51] were calculated at 13 m spatial resolution.These LiDAR metrics correspond to the vertical structure of a target.Both LiDAR point cloud, as well as metrics, can be downloaded from the NASA G-LiHT data portal [52].In this study, a LiDAR metric, percentile 100% of all returns (hereafter p100) (i.e., top-of-canopy estimate), was used as reference data (i.e., for model training and result validation).In the next step, we aggregated this LiDAR metric from 13 m × 13 m to 100 m × 100 m using block averaging and nearest neighbour resampling, i.e., a top-of-canopy estimate for a 13 m × 13 m pixel was averaged to 100 m.In total, ca.150,000•1 ha LiDAR samples (ca.1% of the total area of the Yucatan peninsula) were used as reference data for wall-to-wall mapping of vegetation height.Although LiDAR provides an estimation of height, its accuracy might be higher than field height measurements, especially in a forest with a closed canopy, where the top of a tree is difficult to detect from the ground.

SAR Data
The multi-temporal L-band SAR backscatter measurements used in this study were collected by the ALOS-2 PALSAR-2 sensor in dual-polarisation (i.e., HH and HV) ScanSAR mode between October 2014 and February 2018 (resulting in 24 ScanSAR mosaics).ALOS-2 PALSAR-2 ScanSAR data feature a swath width of ca.350 km and a repetition rate of ca.42 days [53].The data were distributed by the Japan Aerospace Exploration Agency (JAXA) in the frame of the Kyoto & Carbon Initiative as 1 • × 1 • tiles with a pixel spacing of 50 m.Single ScanSAR tiles were then mosaicked to provide wall-to-wall L-band backscatter mosaics (hereafter ScanSAR mosaics) over the entire peninsula.In addition to the ScanSAR mosaics, three annual global PALSAR-2 mosaics based on Stripmap Fine Beam Dual Polarisation (i.e., HH and HV) data from 2015, 2016 and 2017 (hereafter FBD mosaics) [54] were used in the study to examine the merit of ScanSAR time series data.The FBD mosaics feature a pixel spacing of 25 m.Both ScanSAR and FBD mosaics were pre-processed (slope-corrected and orthorectified) by JAXA.In the next step, ScanSAR and FBD mosaics were speckle-filtered using the multi-temporal filter by Quegan et al. [55,56].Finally, the speckle-filtered ScanSAR and FBD mosaics were aggregated to a pixel spacing of 100 m using block averaging and nearest neighbour resampling.The SAR data were aggregated to a pixel spacing of 100 m as a trade-off between the efficiency of the model and the spatial details [38].The model's predictive performance increases with decreasing of spatial scale caused by reduction of speckle in SAR data and local variability in reference data [7,30,38,57].24 ScanSAR mosaics and 3 FBD mosaics were then used separately for vegetation height estimation (Section 2.4) (Table 1).

Splitting Methods of Reference Data
For model training and validation of SAR-based estimates, we applied two different splitting methods of the reference data (i.e., LiDAR-based vegetation height).For both approaches, 70% of the reference data were selected for model training (calibration) and the remaining 30% were used for testing (validation).In the first approach, we applied stratified random sampling, i.e., the data were split based on value intervals with ignoring spatial location of the samples.For this, intervals of 3 m were selected, i.e., 70/30 partition for each 3 m height class.We used intervals of 3 m as a trade-off between a sufficient number of samples for each height class and small enough bins to ensure the intra-class similarity.This splitting approach does not consider spatial autocorrelation between the training and test datasets, which generally results in overoptimistic model prediction statistics [58].Therefore, in the second splitting approach, we divided the reference data based on their spatial location using k-means clustering of spatial coordinates (i.e., latitude and longitude) (Figure 3) using the R package "sperrorest" [58].The first splitting approach will hereafter be called "stratified random sampling", while the second splitting approach will be called "spatial sampling".After data splitting using a "stratified random sampling" approach, the value distribution in training and test dataset is similar (Figure 4a).In "spatial sampling" after data splitting based on their geographical location, we fit the histograms of training and test data to each other, to ensure that value distribution in both datasets is similar, and the differences in the model's predictive performance are from spatial autocorrelation and not from different value distributions.For this, the number of samples for specific 3 m height classes have been reduced until similar height distributions in the training and test datasets are reached (Figure 4b).reference data were selected for model training (calibration) and the remaining 30% were used for testing (validation).In the first approach, we applied stratified random sampling, i.e., the data were split based on value intervals with ignoring spatial location of the samples.For this, intervals of 3 m were selected, i.e., 70/30 partition for each 3 m height class.We used intervals of 3 m as a trade-off between a sufficient number of samples for each height class and small enough bins to ensure the intra-class similarity.This splitting approach does not consider spatial autocorrelation between the training and test datasets, which generally results in overoptimistic model prediction statistics [58].Therefore, in the second splitting approach, we divided the reference data based on their spatial location using k-means clustering of spatial coordinates (i.e., latitude and longitude) (Figure 3) using the R package "sperrorest" [58].The first splitting approach will hereafter be called "stratified random sampling", while the second splitting approach will be called "spatial sampling".After data splitting using a "stratified random sampling" approach, the value distribution in training and test dataset is similar (Figure 4a).In "spatial sampling" after data splitting based on their geographical location, we fit the histograms of training and test data to each other, to ensure that value distribution in both datasets is similar, and the differences in the model's predictive performance are from spatial autocorrelation and not from different value distributions.For this, the number of samples for specific 3 m height classes have been reduced until similar height distributions in the training and test datasets are reached (Figure 4b).

Estimation of Vegetation Height from SAR Data
To estimate vegetation height using PALSAR-2 L-band backscatter, we applied different scenarios comprising two splitting approaches of reference data (i.e., "stratified random sampling" and "spatial sampling") and two sets of input variables (i.e., ScanSAR and FBD mosaics).For all scenarios, the estimation of vegetation height was performed using a machine learning algorithm, namely Random Forests (RF) [59].This machine learning algorithm generates an ensemble of regression trees with a random selection of predictors at each node as well as with a random subset of samples for each tree to prevent overfitting.To calculate a single estimate, the predictions of each regression tree are averaged [59].We selected RF, since it is computationally efficient and has already been successfully applied to map vegetation structure metrics over large areas (e.g., AGB, vegetation height) with high retrieval accuracy (e.g., [37,[60][61][62]).We generated RF models with 500 regression trees.
We furthermore investigated the impact of multi-temporal combination of ScanSAR data on the model's predictive performance.Therefore, we modelled vegetation height using a different number of input layers.The acquisition date of a single ScanSAR mosaic can be found in Figure 5 (note: a single mosaic can be comprised of different ScanSAR products with slightly different acquisition dates (+/− one week from the reference date in Figure 5).We first estimated vegetation height with the first four ScanSAR scenes from2014-2015 (i.e., mosaics from 13 October 2014 to 16 February 2015 (Figure 5)) and added four further scenes at each step, i.e., we estimated vegetation height using 4, 8, 12, 16, 20 and 24 ScanSAR mosaics.The reference data were split using "stratified random sampling" and "spatial sampling" with 100 repetitions, i.e., the training and testing data were selected 100 times

Estimation of Vegetation Height from SAR Data
To estimate vegetation height using PALSAR-2 L-band backscatter, we applied different scenarios comprising two splitting approaches of reference data (i.e., "stratified random sampling" and "spatial sampling") and two sets of input variables (i.e., ScanSAR and FBD mosaics).For all scenarios, the estimation of vegetation height was performed using a machine learning algorithm, namely Random Forests (RF) [59].This machine learning algorithm generates an ensemble of regression trees with a random selection of predictors at each node as well as with a random subset of samples for each tree to prevent overfitting.To calculate a single estimate, the predictions of each regression tree are averaged [59].We selected RF, since it is computationally efficient and has already been successfully applied to map vegetation structure metrics over large areas (e.g., AGB, vegetation height) with high retrieval accuracy (e.g., [37,[60][61][62]).We generated RF models with 500 regression trees.
We furthermore investigated the impact of multi-temporal combination of ScanSAR data on the model's predictive performance.Therefore, we modelled vegetation height using a different number of input layers.The acquisition date of a single ScanSAR mosaic can be found in Figure 5 (note: a single mosaic can be comprised of different ScanSAR products with slightly different acquisition dates (+/− one week from the reference date in Figure 5).We first estimated vegetation height with the first four ScanSAR scenes from2014-2015 (i.e., mosaics from 13 October 2014 to 16 February 2015 (Figure 5)) and added four further scenes at each step, i.e., we estimated vegetation height using 4, 8, 12, 16, 20 and 24 ScanSAR mosaics.The reference data were split using "stratified random sampling" and "spatial sampling" with 100 repetitions, i.e., the training and testing data were selected 100 times using both approaches to get reliable goodness-of-fit statistics (i.e., R 2 and RMSE).Additionally, we performed the same steps for speckle unfiltered ScanSAR data to show the effect of speckle filtering.
Finally, we examined the impact of the number of training samples on the model prediction performance.Therefore, from the training set, 1%, 5%, and every decile (i.e., 10%, 20% ... 90%) of the data were sampled randomly 100 times and model performance statistics (i.e., R 2 and RMSE) for each scenario were analysed.The validation dataset from "stratified random sampling" and "spatial sampling" (e.g., red stripes in Figure 3) remained unchanged.
Remote Sens. 2018, 10, x FOR PEER REVIEW 8 of 19 using both approaches to get reliable goodness-of-fit statistics (i.e., R 2 and RMSE).Additionally, we performed the same steps for speckle unfiltered ScanSAR data to show the effect of speckle filtering.Finally, we examined the impact of the number of training samples on the model prediction performance.Therefore, from the training set, 1%, 5%, and every decile (i.e., 10%, 20% ... 90%) of the data were sampled randomly 100 times and model performance statistics (i.e., R 2 and RMSE) for each scenario were analysed.The validation dataset from "stratified random sampling" and "spatial sampling" (e.g., red stripes in Figure 3) remained unchanged.

Estimation of Vegetation Height from SAR Data
We first sought to determine the merit of multi-temporal ScanSAR mosaics compared to the annual global FBD mosaics to map vegetation height.To examine this, we modelled vegetation height using ScanSAR and FBD mosaics separately.As described above, we applied two validation approaches, i.e., "stratified random sampling" and "spatial sampling" of reference data.
The results based on the "stratified random sampling" approach provide higher goodness-of-fit statistics (R 2 and RMSE) compared to the "spatial sampling" approach for different sets of input variables (i.e., ScanSAR and FBD mosaics) (Figure 6).This is caused by the spatial autocorrelation of reference data, which is ignored in the "stratified random sampling" approach.In contrast, the "spatial sampling" validation considers this effect, providing a more realistic estimation of the model's predictive performance [58], and might contribute to building a more robust predictive model.Furthermore, in the "spatial sampling" validation, the effect of over-and underestimations at low and high ranges, respectively, is more apparent compared to "stratified random sampling".The effect of over-and underestimations at both ends have been reported in many studies that have applied an ensemble of regression trees (e.g., Random Forests) [39,63,64].

Estimation of Vegetation Height from SAR Data
We first sought to determine the merit of multi-temporal ScanSAR mosaics compared to the annual global FBD mosaics to map vegetation height.To examine this, we modelled vegetation height using ScanSAR and FBD mosaics separately.As described above, we applied two validation approaches, i.e., "stratified random sampling" and "spatial sampling" of reference data.
The results based on the "stratified random sampling" approach provide higher goodness-of-fit statistics (R 2 and RMSE) compared to the "spatial sampling" approach for different sets of input variables (i.e., ScanSAR and FBD mosaics) (Figure 6).This is caused by the spatial autocorrelation of reference data, which is ignored in the "stratified random sampling" approach.In contrast, the "spatial sampling" validation considers this effect, providing a more realistic estimation of the model's predictive performance [58], and might contribute to building a more robust predictive model.Furthermore, in the "spatial sampling" validation, the effect of over-and underestimations at low and high ranges, respectively, is more apparent compared to "stratified random sampling".The effect of over-and underestimations at both ends have been reported in many studies that have applied an ensemble of regression trees (e.g., Random Forests) [39,63,64].The training and test data were split using "stratified random sampling" (left) and "spatial sampling" (right).ScanSAR mosaics (24 scenes) show higher retrieval accuracy compared to FBD mosaics (3 scenes) due to a larger number of SAR images.The "stratified random sampling" approach shows higher goodness-of-fit statistics compared to the "spatial sampling" approach due to spatial autocorrelation between training and test data.
Moreover, the FBD-based vegetation height estimates are more biased at low and high ranges compared to the results based on ScanSAR mosaics for both validation scenarios (Figure 6).In other words, FBD-based vegetation height estimates tend to average height values of the training data.In the difference map between the two products (Figure 7), the FBD height possesses greater values in the areas with small vegetation (e.g., the northern part of the peninsula) compared to the ScanSARbased product (i.e., it shows a larger overestimation), while in the intact areas (e.g., the Calakmul Biosphere Reserve) the FBD height exhibits lower values than in the ScanSAR height (i.e., it shows a larger underestimation).The training and test data were split using "stratified random sampling" (left) and "spatial sampling" (right).ScanSAR mosaics (24 scenes) show higher retrieval accuracy compared to FBD mosaics (3 scenes) due to a larger number of SAR images.The "stratified random sampling" approach shows higher goodness-of-fit statistics compared to the "spatial sampling" approach due to spatial autocorrelation between training and test data.
Moreover, the FBD-based vegetation height estimates are more biased at low and high ranges compared to the results based on ScanSAR mosaics for both validation scenarios (Figure 6).In other words, FBD-based vegetation height estimates tend to average height values of the training data.In the difference map between the two products (Figure 7), the FBD height possesses greater values in the areas with small vegetation (e.g., the northern part of the peninsula) compared to the ScanSAR-based product (i.e., it shows a larger overestimation), while in the intact areas (e.g., the Calakmul Biosphere Reserve) the FBD height exhibits lower values than in the ScanSAR height (i.e., it shows a larger underestimation).For areas with obvious disagreements between the two maps and where LiDAR data were available, transects of vegetation height (red stripes at the bottom of Figure 7) were generated (Figure 8).Compared to the LiDAR measurements, both SAR-based vegetation height maps show an overestimation of small vegetation and an underestimation of tall vegetation.The over-und underestimation at both ends (small and large trees) is partly caused by a tree-based regression (e.g., Random Forests and Cubist), where single predictions of each tree are averaged.Thus, height at low and high range tends to a mean value.Larger bias is observed for the FBD-based estimates (Figure 8).Accordingly, the increasing number of spatial predictors results in a reduction of the bias at both ends (i.e., the lowest and highest vegetation heights).Moreover, a lower deviation from LiDAR height for the A-B transect between pixel IDs 80-150 (Figure 8) can be observed.This is caused by the fact that this part of the LiDAR transect was used for model training, while other parts are independent (Figures 3 and 8).For areas with obvious disagreements between the two maps and where LiDAR data were available, transects of vegetation height (red stripes at the bottom of Figure 7) were generated (Figure 8).Compared to the LiDAR measurements, both SAR-based vegetation height maps show an overestimation of small vegetation and an underestimation of tall vegetation.The over-und underestimation at both ends (small and large trees) is partly caused by a tree-based regression (e.g., Random Forests and Cubist), where single predictions of each tree are averaged.Thus, height at low and high range tends to a mean value.Larger bias is observed for the FBD-based estimates (Figure 8).Accordingly, the increasing number of spatial predictors results in a reduction of the bias at both ends (i.e., the lowest and highest vegetation heights).Moreover, a lower deviation from LiDAR height for the A-B transect between pixel IDs 80-150 (Figure 8) can be observed.This is caused by the fact that this part of the LiDAR transect was used for model training, while other parts are independent (Figures 3 and 8).7), while the bottom image shows vegetation height for the transect C-D in the central part of the peninsula (Figure 7).The FBD-based map overestimated height in areas with small vegetation and underestimated in areas with tall vegetation more noticeably compared to the ScanSAR-based map.For visualisation reasons, every 5 pixels from north to south was averaged.7), while the bottom image shows vegetation height for the transect C-D in the central part of the peninsula (Figure 7).The FBD-based map overestimated height in areas with small vegetation and underestimated in areas with tall vegetation more noticeably compared to the ScanSAR-based map.For visualisation reasons, every 5 pixels from north to south was averaged.

Impact of Number of L-Band Observations on Model's Predictive Performance
Here, we analysed the influence of the number of spatial predictors on model prediction performance.For this analysis, we used ScanSAR mosaics only.As mentioned in Section 2.4, the reference data were divided into training and test data using "stratified random sampling" and "spatial sampling" with 100 repetitions.Furthermore, to show the effect of a speckle filter on the model's predictive performance, unfiltered and multi-temporal speckle-filtered SAR data after [56] were used as predictive variables.The models based on "stratified random sampling" demonstrate a steady increase of R 2 and decrease of RMSE using both unfiltered and filtered SAR data (Figure 9 upper and bottom left).Moreover, the results based on the filtered SAR data possess much higher R 2 and lower RMSE compared to the results from unfiltered SAR data.The increase of R 2 and decrease of RMSE from 4 to 8 scenes and further is much stronger in the models based on speckle-filtered SAR data.This is most likely caused by the fact that the applied filter uses a 7 × 7 moving window and thus strengthens the spatial autocorrelation between training and test data.Impact of number of scenes on model prediction performance using "stratified random sampling" (left) and "spatial sampling" (right) for multi-temporal Quegan speckle-filtered data ("mtf", white boxplots) and unfiltered ScanSAR data ("orig", dark-grey boxplot).

Impact of Number of Samples on Model Prediction Performance
To test the impact of number of training samples on model's predictive performance, the test data from both sampling strategies remained unchanged, while training set from both sampling strategies were sampled randomly 100 times with 1%, 5%, and every decile of the data.According to Figure 10 (dark-grey boxplots), for the "stratified random sampling" scenario, an increasing number of training data results in a steady improvement of the goodness-of-fit statistics (i.e., increase of R 2 of Figure 9. Impact of number of scenes on model prediction performance using "stratified random sampling" (left) and "spatial sampling" (right) for multi-temporal Quegan speckle-filtered data ("mtf", white boxplots) and unfiltered ScanSAR data ("orig", dark-grey boxplot).
Another observation is that the variance of the model statistics based on "stratified random sampling" is much lower compared to the model statistics based on "spatial sampling".Considering the models based on "spatial sampling", those based on filtered data possess higher R 2 and lower RMSE compared to those based on unfiltered SAR data (Figure 9 upper and bottom right).However, the differences in the statistics are not as high as in the "stratified random sampling" models.This confirms that the differences between the results based on filtered and unfiltered SAR data using "stratified random sampling" are caused by the spatial autocorrelation of the reference data.Finally, it can be observed that for more than 12 scenes, further increment of acquisitions does not result in further improvement of R 2 and RMSE.

Impact of Number of Samples on Model Prediction Performance
To test the impact of number of training samples on model's predictive performance, the test data from both sampling strategies remained unchanged, while training set from both sampling strategies were sampled randomly 100 times with 1%, 5%, and every decile of the data.According to Figure 10 (dark-grey boxplots), for the "stratified random sampling" scenario, an increasing number of training data results in a steady improvement of the goodness-of-fit statistics (i.e., increase of R 2 of about 27% and decrease of RMSE of about 22%).This is again caused by the spatial autocorrelation between training and test data, i.e., with an increasing number of training data, the probability to be adjoined with test data increases, resulting in continuous improvement of retrieval accuracies.However, if spatial autocorrelation is considered (i.e., the "spatial sampling" scenario), the model's predictive performance saturates with an increasing number of training samples (Figure 10

Discussion and Summary
We examined the potential of multi-temporal L-band SAR backscatter to estimate vegetation height.Additionally, we investigated the impact of the number of spatial predictors and training samples on the model's predictive performance given that the dataset is characterised by spatial autocorrelation.
This study has three main implications.First, we showed the value of multi-temporal L-band Figure 10.Impact of sample quantity on model prediction performance using "stratified random sampling" (dark-grey boxplot) and "spatial sampling" (white boxplot).With an increasing number of training samples, the model performance increases continuously for "stratified random sampling" and saturates for "spatial sampling".1% of training data corresponds to 1000 1-ha samples.

Discussion and Summary
We examined the potential of multi-temporal L-band SAR backscatter to estimate vegetation height.Additionally, we investigated the impact of the number of spatial predictors and training samples on the model's predictive performance given that the dataset is characterised by spatial autocorrelation.
This study has three main implications.First, we showed the value of multi-temporal L-band SAR backscatter for estimation of vegetation height.It is well known that, depending on forest structure, L-band SAR backscatter saturates in dense forests at biomass levels around 100 t/ha [65][66][67].Our results indicate that including more L-band observations in a statistical model can partly help to reduce under-and overestimation at low and high ranges of a forest parameter, respectively (Figure 8).These results are in agreement with previous studies reporting that the usage of multi-temporal L-and C-band SAR data improves retrieval accuracies of growing stock volume and AGB [4,8,23,68,69].We found, however, that after a specific number of observations, the model's predictive performance is not further enhanced.In our case, using 12 dual-polarised L-band SAR observations, we estimated vegetation height with similar retrieval accuracies as using 24 dual-polarised L-band SAR observations (Figure 9).Obviously, 10-12 ScanSAR observations are sufficient to represent the multi-seasonal conditions of the forest vegetation over one year including dry and wet conditions at different phenological stages (i.e., leaf-off/leaf-on).According to our findings, the integration of 12 additional ScanSAR scenes does not lead to further prediction improvements.Hence, no relevant additional information can be gained from these images.
Second, spatial autocorrelation in the training and test dataset must be considered to provide a realistic predictive performance of the model.In the case where spatial autocorrelation in the reference data is ignored, estimation of the model's performance is overoptimistic, caused by spatial autocorrelation between training and test data [58].Moreover, ignoring spatial autocorrelation between training and test data can lead to incorrect conclusions, e.g., an increasing number of spatial predictors and/or training samples leads to steady improvements of the model's predictive performance.
Third, an increasing number of training samples leads to improvements in the model's predictive performance, but it saturates at a specific percentage of training samples.In contrast, Xu et al. [39] reported steady improvements of model performance with an increasing number of training samples in tropical forests.Garcia et al. [38] showed that retrieval accuracies were enhanced and saturated at a different percentage of training samples depending on forest structure (i.e., temperate broadleaf, mixed or coniferous forests).Nonetheless, neither study split the training and test data using geographical location of them, which might result in an overoptimistic model performance due to the likely spatial autocorrelation.In our study, using 20% of the training data was sufficient to get a similar level of accuracy as using 100% of the training data (Figures 6 and 10).20% of the training data represents 20,000 1-ha samples, though it is still a large number of training data.In any case, it is crucial that training samples represent the entire range of values (in our case vegetation height) from different forest types (i.e., deciduous and evergreen forests) of the study area.
As expected, speckle filtering of the SAR images resulted in a better retrieval accuracy.However, using a speckle filter based on a moving window approach strengthens the spatial autocorrelation in the reference data.Nowadays, with the increasing availability of SAR time series (e.g., Sentinel-1), novel approaches that rely on temporal patterns only can be applied to suppress speckle and preserve spatial details without spatial blurring [70].Furthermore, since the difference in acquisition time between LiDAR and SAR data is between one to five years, significant changes (caused, e.g., by fire or deforestation) within the LiDAR transects might have occurred, which together with forest growth reduce model's predictive performance [71].
As shown in several studies [38,39,63,71,72], retrieval accuracies of forest structure parameters can be improved by including additional information from optical remote sensing data, i.e., Landsat or Sentinel-2 surface reflectance, and digital surface model (e.g., SRTM DEM).Nevertheless, in this study, we analysed the performance of multi-temporal L-band backscatter data only for the estimation of vegetation height.Since vegetation height correlates with other forest parameters (e.g., AGB, tree canopy cover) [40][41][42], we can assume that similar results can be achieved for the estimation of AGB and tree canopy cover using multi-temporal dual-polarised L-band SAR backscatter.

Conclusions
ALOS-2 PALSAR-2 ScanSAR data provide time series of dual-polarised L-band observations (~10 observations per year) over wider areas (i.e., 350 km) at medium spatial resolution (i.e., 50 m), which is crucial for land monitoring at national or continental scales.In this study, we investigated the potential of this dataset to map vegetation height over tropical deciduous and evergreen forests of the Yucatan peninsula, Mexico.For this, we used airborne LiDAR-based vegetation height for the training of L-band backscatter using the Random Forests algorithm.Specifically, we examined the value of multi-temporal L-band data for the estimation of vegetation height taking the spatial autocorrelation between training and test data into account.Our results indicate that ignoring spatial autocorrelation between training and test data lead to an overoptimistic model's predictive performance.Accordingly, a spatial splitting of the reference data into training and test data should be preferred to provide realistic retrieval accuracies.Moreover, based on this analysis, we determined an optimal number of L-band observations and training samples as a trade-off between retrieval accuracies and data collection effort.
Open data policies such as those of the ESA and NASA stimulate development of novel approaches based on these data (e.g., Landsat, Sentinels).Bearing in mind new L-band missions in near future (ALOS-4, NISAR, SAOCOM, Tandem-L) that will provide time series of L-band observations, open L-band PALSAR-2 ScanSAR data for the scientific community would foster further development of innovative algorithms for forest monitoring including mapping of forest structure parameters and detection of deforestation and forest degradation over large areas to support international climate initiatives (e.g., UN Reducing Emissions from Deforestation and Forest Degradation+ programme).

Figure 1 .
Figure 1.Study area.Land use and vegetation map of Mexico from the Mexican National Institute for Statistics and Geography (INEGI) Series IV [43] with the available airborne LiDAR strips.

Figure 2 .
Figure 2. Mean monthly air temperature (circles on the left-hand side) and total monthly precipitation (bars on the right-hand side) for three Mexican federal states for the years 2013-2017.Error bars represent monthly standard deviations.

Figure 1 .
Figure 1.Study area.Land use and vegetation map of Mexico from the Mexican National Institute for Statistics and Geography (INEGI) Series IV [43] with the available airborne LiDAR strips.

Figure 1 .
Figure 1.Study area.Land use and vegetation map of Mexico from the Mexican National Institute for Statistics and Geography (INEGI) Series IV [43] with the available airborne LiDAR strips.

Figure 2 .
Figure 2. Mean monthly air temperature (circles on the left-hand side) and total monthly precipitation (bars on the right-hand side) for three Mexican federal states for the years 2013-2017.Error bars represent monthly standard deviations.

Figure 2 .
Figure 2. Mean monthly air temperature (circles on the left-hand side) and total monthly precipitation (bars on the right-hand side) for three Mexican federal states for the years 2013-2017.Error bars represent monthly standard deviations.

Figure 3 .
Figure 3. Location of training and validation data using the "spatial sampling" approach of the LiDAR metric p100.Black stripes represent model training data; red stripes represent validation data.

Figure 3 .
Figure 3. Location of training and validation data using the "spatial sampling" approach of the LiDAR metric p100.Black stripes represent model training data; red stripes represent validation data.

Figure 4 .
Figure 4. Histograms of LiDAR-based vegetation height data used for training and test using (a) "stratified random sampling" (i.e., based on value intervals while ignoring spatial location) and (b) "spatial sampling" (i.e., based on spatial location).

Figure 4 .
Figure 4. Histograms of LiDAR-based vegetation height data used for training and test using (a) "stratified random sampling" (i.e., based on value intervals while ignoring spatial location) and (b) "spatial sampling" (i.e., based on spatial location).

Figure 5 .
Figure 5. Acquisition dates of the ScanSAR mosaics with corresponding monthly precipitation for the three federal states from [46].

Figure 5 .
Figure 5. Acquisition dates of the ScanSAR mosaics with corresponding monthly precipitation for the three federal states from [46].

Figure 6 .
Figure 6.ScanSAR (above) and FBD (bottom)-based vegetation height estimates plotted against LiDAR p100 metric.The training and test data were split using "stratified random sampling" (left) and "spatial sampling" (right).ScanSAR mosaics (24 scenes) show higher retrieval accuracy compared to FBD mosaics (3 scenes) due to a larger number of SAR images.The "stratified random sampling" approach shows higher goodness-of-fit statistics compared to the "spatial sampling" approach due to spatial autocorrelation between training and test data.

Figure 6 .
Figure 6.ScanSAR (above) and FBD (bottom)-based vegetation height estimates plotted against LiDAR p100 metric.The training and test data were split using "stratified random sampling" (left) and "spatial sampling" (right).ScanSAR mosaics (24 scenes) show higher retrieval accuracy compared to FBD mosaics (3 scenes) due to a larger number of SAR images.The "stratified random sampling" approach shows higher goodness-of-fit statistics compared to the "spatial sampling" approach due to spatial autocorrelation between training and test data.

Figure 7 .
Figure 7. Vegetation height estimates based on ScanSAR (A) and FBD (B) mosaics using the "spatial sampling" approach.The difference map (C) between the two products depicts disagreements in areas with low (northern part) and tall trees (central and southern parts).The height values over the two red transects (A-B profile and C-D profile) are shown in Figure 8. White gaps in the ScanSARbased vegetation height map resulted from the gaps in the ScanSAR backscatter mosaics.

Figure 7 .
Figure 7. Vegetation height estimates based on ScanSAR (A) and FBD (B) mosaics using the "spatial sampling" approach.The difference map (C) between the two products depicts disagreements in areas with low (northern part) and tall trees (central and southern parts).The height values over the two red transects (A-B profile and C-D profile) are shown in Figure 8. White gaps in the ScanSAR-based vegetation height map resulted from the gaps in the ScanSAR backscatter mosaics.

Figure 8 .
Figure 8. Transects of the vegetation height of the three products over areas with small (upper profile) and tall trees (bottom profile).The upper image shows vegetation height for the transect A-B in the northern part of the peninsula (Figure7), while the bottom image shows vegetation height for the transect C-D in the central part of the peninsula (Figure7).The FBD-based map overestimated height in areas with small vegetation and underestimated in areas with tall vegetation more noticeably compared to the ScanSAR-based map.For visualisation reasons, every 5 pixels from north to south was averaged.

Figure 8 .
Figure 8. Transects of the vegetation height of the three products over areas with small (upper profile) and tall trees (bottom profile).The upper image shows vegetation height for the transect A-B in the northern part of the peninsula (Figure7), while the bottom image shows vegetation height for the transect C-D in the central part of the peninsula (Figure7).The FBD-based map overestimated height in areas with small vegetation and underestimated in areas with tall vegetation more noticeably compared to the ScanSAR-based map.For visualisation reasons, every 5 pixels from north to south was averaged.
, white boxplots).With an increasing number of training data in the "spatial sampling" scenario, the model performance is enhanced up to a threshold of 20% of the training data, and very slightly afterwards.Additionally, the variance in the model statistics decreases with an increasing number of training samples.Based on the results, a threshold of around 20% of training data represents a plausible trade-off between model efficiency and data collection effort.20% of training data corresponds here to 20,000 1-ha samples.Remote Sens. 2018, 10, x FOR PEER REVIEW 14 of 19 boxplots).With an increasing number of training data in the "spatial sampling" scenario, the model performance is enhanced up to a threshold of 20% of the training data, and very slightly afterwards.Additionally, the variance in the model statistics decreases with an increasing number of training samples.Based on the results, a threshold of around 20% of training data represents a plausible tradeoff between model efficiency and data collection effort.20% of training data corresponds here to 20,000 1-ha samples.

Figure 10 .
Figure10.Impact of sample quantity on model prediction performance using "stratified random sampling" (dark-grey boxplot) and "spatial sampling" (white boxplot).With an increasing number of training samples, the model performance increases continuously for "stratified random sampling" and saturates for "spatial sampling".1% of training data corresponds to 1000 1-ha samples.

Table 1 .
Dataset used in the study.