Canopy Height Estimation Using Sentinel Series Images through Machine Learning Models in a Mangrove Forest

Canopy height serves as a good indicator of forest carbon content. Remote sensing-based direct estimations of canopy height are usually based on Light Detection and Ranging (LiDAR) or Synthetic Aperture Radar (SAR) interferometric data. LiDAR data is scarcely available for the Indian tropics, while Interferometric SAR data from commercial satellites are costly. High temporal decorrelation makes freely available Sentinel-1 interferometric data mostly unsuitable for tropical forests. Alternatively, other remote sensing and biophysical parameters have shown good correlation with forest canopy height. The study objective was to establish and validate a methodology by which forest canopy height can be estimated from SAR and optical remote sensing data using machine learning models i.e., Random Forest (RF) and Symbolic Regression (SR). Here, we analysed the potential of Sentinel-1 interferometric coherence and Sentinel-2 biophysical parameters to propose a new method for estimating canopy height in the study site of the Bhitarkanika wildlife sanctuary, which has mangrove forests. The results showed that interferometric coherence, and biophysical variables (Leaf Area Index (LAI) and Fraction of Vegetation Cover (FVC)) have reasonable correlation with canopy height. The RF model showed a Root Mean Squared Error (RMSE) of 1.57 m and R2 value of 0.60 between observed and predicted canopy heights; whereas, the SR model through genetic programming demonstrated better RMSE and R2 values of 1.48 and 0.62 m, respectively. The SR also established an interpretable model, which is not possible via any other machine learning algorithms. The FVC was found to be an essential variable for predicting forest canopy height. The canopy height maps correlated with ICESat-2 estimated canopy height, albeit modestly. The study demonstrated the effectiveness of Sentinel series data and the machine learning models in predicting canopy height. Therefore, in the absence of commercial and rare data sources, the methodology demonstrated here offers a plausible alternative for forest canopy height estimation.


Introduction
Understanding the role of forest carbon emissions and sequestration is needed to build a robust framework for international agreements to limit the concentration of greenhouse gases in the atmosphere [1]. The function of tropical forests is critical in the global carbon cycle because they are carbon-dense and highly productive [2]. Above-Ground Biomass (AGB) is the best indicator of the carbon content of tropical forests [3]. AGB estimation models for tropical forests generally ignore canopy height as a factor [4]. However, studies have shown that the inclusion of canopy height in the allometric models tends to improve the estimation accuracy of AGB in tropical forests [4][5][6]. Therefore, the tree canopy height of tropical forests is an essential factor in estimating its biomass, and an inaccurate estimate of canopy height can result in over-or underestimation of AGB [7]. total one-sided leaf area per unit ground surface area, demonstrating good correlation with tree height [48,49]. LiDAR based studies found that canopy height has a good correlation with LAI [50,51] and one could serve as proxy to other [52]. Canopy height estimation with/without a Digital Elevation Model (DEM) has been benefitted from the Shuttle Radar Topography Mission (SRTM) [53,54]. RS-based Canada Airborne LiDAR, WorldView-2 Multispectral LAI, LiDAR height [50] Modeling the relationship of forest parameters such as canopy height from field measured values, and remote sensing derived variables can be done in several ways. The most common are parametric regression methods [55][56][57] or machine learning models like Random Forest (RF) [58,59]. Due to their simplicity, parametric regression-based methods are widely used for modeling biophysical parameters and remote sensing variables. One major problem in parametric regression is that it assumes a standard relationship before the analysis, which may not be true, in reality. RF is a decision tree-based machine learning model used for estimation of biophysical parameters [60]. The RF model can capture the actual non-linear relationship between the predictor and predicted variables. However, the interpretation of an RF model is complicated as it consists of many trees and numerous sets of rules.
Symbolic Regression (SR) using genetic programming is a relatively modern technique that estimates a straightforward best-fit model for a given dataset by minimizing error rates while searching through all possible regression models [61,62]. Conventional machine learning algorithms work as black-boxes; implying that the internal mechanisms of these algorithms are hard to comprehend and difficult to reproduce desired results. The SR model works with the principle to determine the input-output relationship and selection of variables, which are most effective in predicting outputs [63]. The SR model's inclination towards finding correct solutions makes it distinct from other types of regressions. SR is capable of constructing a robust and interpretable formula, which is not possible by other linear and nonlinear regressions or machine learning models.
The main objective of this study was to establish a methodology by which forest canopy height can be estimated using SAR and optical remote sensing data using machine learning models. The mangrove forest in the Bhitarkanika Wildlife Sanctuary (BWS), Odisha, India, was chosen as the study area. First, we attempted to establish canopy height models using SR and RF, and Sentinel-1 and Sentinel-2 derived parameters. Further, the canopy heights estimated using the two models were compared and validated with field measured and ICESat-2 derived data.

Study Area and Filed Measurement of Canopy Heights
The Bhitarkanika Wildlife Sanctuary (BWS), with 58 species, is considered as one of the vital mangrove ecosystems in the world due to its genetic diversity [64]. Dense and moderately dense mangroves cover an area of 165 km 2 [65] of BWS. Some of the dominant species of BWS are Heritiera sp., Excoecaria sp., Avicennia sp., and Sonneratia sp. [66]. This area receives a high average annual rainfall of about 1642 mm, and most of it is received during June to October. BWS experiences a typical warm and humid tropical climate with temperature maxima in May and minima in January [67]. Multiple field surveys were conducted during November-December 2018 for canopy height measurement. Height measurement of all the trees in an inventory plot is time consuming, and redundant, especially if it has to be correlated with satellite derived canopy height pixels. Sullivan et al. [68] have mentioned that although an increase in the number of sampled trees results in better accuracy, the inclusion of the ten largest trees is most important. Thus, we observed the sampling frequency of the 10 largest trees per plot and measured tree height using a laser range finder instrument. The center locations were obtained using a handheld GPS for all 185 plots ( Figure 1). The Bhitarkanika Wildlife Sanctuary (BWS), with 58 species, is considered as one of the vital mangrove ecosystems in the world due to its genetic diversity [64]. Dense and moderately dense mangroves cover an area of 165 km 2 [65] of BWS. Some of the dominant species of BWS are Heritiera sp., Excoecaria sp., Avicennia sp., and Sonneratia sp. [66]. This area receives a high average annual rainfall of about 1642 mm, and most of it is received during June to October. BWS experiences a typical warm and humid tropical climate with temperature maxima in May and minima in January [67]. Multiple field surveys were conducted during November-December 2018 for canopy height measurement. Height measurement of all the trees in an inventory plot is time consuming, and redundant, especially if it has to be correlated with satellite derived canopy height pixels. Sullivan et al. [68] have mentioned that although an increase in the number of sampled trees results in better accuracy, the inclusion of the ten largest trees is most important. Thus, we observed the sampling frequency of the 10 largest trees per plot and measured tree height using a laser range finder instrument. The center locations were obtained using a handheld GPS for all 185 plots ( Figure 1).

Satellite Data and Processing
Sentinel-1 and Sentinel-2 data were downloaded from the European Space Agency's (ESA) Copernicus Hub. The Sentinel-1 mission consists of two satellites, Sentinel-1A and Sentinel-1B, which image the earth in C-Band SAR with a six day revisit period between them. A total of six images, three each for Sentinel-1A and Sentinel-1B, were in the Single Look Complex (SLC) format. They were acquired in the interferometric wide-swath mode (IW), which contains both amplitude and phase information of the backscattered SAR signal (Tables 2,3). Sentinel-1 uses the advanced Terrain Observation by Progressive Scans (TOPS) SAR mode to capture images in three sub-swaths in a total swath width of 250 km [69]. The study area falls in the second sub-swath of all the SLC images. So, each image was split to subset only the second sub-swath. Each six day pair of Sentinel-1 SLC images was co-registered in the ESA's Sentinel Application Platform (SNAP) using orbital information and SRTM 1-sec DEM. Only the VV polarisation band of the Sentinel-1 data were selected for the study as the VH polarisation tends to lower the coherence value by introducing decorrelation due to cross-polarization noise [70]. The Sentinel-2 mission also consists of two satellites, Sentinel-2A and Sentinel-2B, with a five day revisit period in combination. Seven cloud free Sentinel-2 data were acquired in the L1C processing level, with the top of the atmosphere reflectance values. Sentinel-2 data were atmospherically corrected to the L2A processing level using the SEN2COR processor [71]. It adjusted the image to yield the bottom of the atmosphere-surface

Satellite Data and Processing
Sentinel-1 and Sentinel-2 data were downloaded from the European Space Agency's (ESA) Copernicus Hub. The Sentinel-1 mission consists of two satellites, Sentinel-1A and Sentinel-1B, which image the earth in C-Band SAR with a six day revisit period between them. A total of six images, three each for Sentinel-1A and Sentinel-1B, were in the Single Look Complex (SLC) format. They were acquired in the interferometric wide-swath mode (IW), which contains both amplitude and phase information of the backscattered SAR signal (Tables 2 and 3). Sentinel-1 uses the advanced Terrain Observation by Progressive Scans (TOPS) SAR mode to capture images in three sub-swaths in a total swath width of 250 km [69]. The study area falls in the second sub-swath of all the SLC images. So, each image was split to subset only the second sub-swath. Each six day pair of Sentinel-1 SLC images was co-registered in the ESA's Sentinel Application Platform (SNAP) using orbital information and SRTM 1-sec DEM. Only the VV polarisation band of the Sentinel-1 data were selected for the study as the VH polarisation tends to lower the coherence value by introducing decorrelation due to cross-polarization noise [70]. The Sentinel-2 mission also consists of two satellites, Sentinel-2A and Sentinel-2B, with a five day revisit period in combination. Seven cloud free Sentinel-2 data were acquired in the L1C processing level, with the top of the atmosphere reflectance values. Sentinel-2 data were atmospherically corrected to the L2A processing level using the SEN2COR processor [71]. It adjusted the image to yield the bottom of the atmosphere-surface reflectance. The pre-processed Sentinel-1 and Sentinel-2 data were further used for specific parameter extraction and canopy height modeling ( Figure 2).  reflectance. The pre-processed Sentinel-1 and Sentinel-2 data were further used for specific parameter extraction and canopy height modeling ( Figure 2). Launched in September 2018, ICESat-2 can measure the earth surface with a 17 m diameter footprint and with 91 days temporal resolution. ICESat-2 has three pairs of beams. Each pair of beams footprint is separated by about 3 km of cross-track with a pair spacing of 90 m [72]. The ICESat-2 data were downloaded in hdf5 format from the National Snow and Ice Data Centre (NSIDC) [73]. The data product selected was ALT08, which includes the height of the surface, including the canopy. An R package was used to extract the canopy height information from the ALT08 data.

Interferometric Coherence and Biophysical Parameters Extraction
Coherence for all co-registered SLC pairs was estimated using the SNAP tool. The coherence of an image pair depends on the baseline length. A massive baseline results in low coherence and vice versa. The longest baseline for which coherence becomes zero is known as the critical baseline [39]. It can be expressed as a function of band chirp width, orbital height, and sensor operating frequency [74], as shown in Equation (1).
Where, = critical baseline; = chirp width; = operating frequency; = incidence angle Sentinel-1 sensor characteristics vary according to observation mode [75]. The critical baseline, Launched in September 2018, ICESat-2 can measure the earth surface with a 17 m diameter footprint and with 91 days temporal resolution. ICESat-2 has three pairs of beams. Each pair of beams footprint is separated by about 3 km of cross-track with a pair spacing of 90 m [72]. The ICESat-2 data were downloaded in hdf5 format from the National Snow and Ice Data Centre (NSIDC) [73]. The data product selected was ALT08, which includes the height of the surface, including the canopy. An R package was used to extract the canopy height information from the ALT08 data.

Interferometric Coherence and Biophysical Parameters Extraction
Coherence for all co-registered SLC pairs was estimated using the SNAP tool. The coherence of an image pair depends on the baseline length. A massive baseline results in low coherence and vice versa. The longest baseline for which coherence becomes zero is known as the critical baseline [39]. It can be expressed as a function of band chirp width, orbital height, and sensor operating frequency [74], as shown in Equation (1).
where, B cr = critical baseline; b = chirp width; f = operating frequency; θ = incidence angle. Sentinel-1 sensor characteristics vary according to observation mode [75]. The critical baseline, obtained using optimal values of parameters stands at 3.65 Km. As all the pairs have a baseline length significantly lower than the critical baseline, the coherence values fall within the acceptable range (Table 3). Backscatter images were also generated from the SLC images by converting them from the SLC image to Ground Range Detected (GRD) image.
Sentinel-2 L2A images were used to calculate LAI and FVC using the SNAP toolbox biophysical variable processor. All Sentinel-2 images were resampled to 20 m pixel size. The biophysical processor algorithm was implemented to generate biophysical products from a range of sensors [76]. An extensive database was prepared to include the top of canopy reflectance and associated vegetation characteristics in the biophysical processor algorithm. This database was used to train neural networks to estimate the canopy characteristics from the top of canopy reflectance along with the observational configuration. In SNAP, the prediction of the new variable was made based on the set of coefficients computed during the training phase.

Canopy Height Modeling
After generating the necessary variables, the next step was to establish the relationship between the variables and canopy height through regression. The values of coherence and other biophysical parameters were extracted at field plot locations to build the dataset for regression. Non-linear regression was implemented via two machine learning models, first, using the RF and then using SR. Primarily, the whole canopy height dataset was divided randomly into two parts using data partition function of CARET package. Seventy percent of it was used for model building, and the other 30 percent was used in model validation. The model building data was further divided into the model training and testing datasets in accordance with the model characteristics. Therefore, for both the models, we had separate model building, i.e., training and testing, and validation data.

Random Forest
The RF model was implemented via the CARET package in R [77]. In an RF model, hundreds of trees are built based on a bootstrap sample of the original data [78]. Variables were chosen randomly at each node for the split, and the final value was predicted by averaging the prediction of all the trees. The importance of the variables in the RF model was measured to find the most influential variables. As the first step of the importance measurement, the Mean Squared Error (MSE) was measured for each tree using an Out Of the Bag (OOB) sample. Thereafter a new error rate was calculated using the same procedure after permuting a variable. The difference between the two accuracies were averaged for all trees, and normalized by the standard error. This value was termed as the importance of the permuted variable to the model. The exclusion of a variable with positive importance increases the error rate of the model, while it is opposite for the variables with negative importance. After the initial run of the model, variables with negative importance were removed from the predictor list. The final model was built only on the variables with positive importance. Here one more fold cross-validation was done to reduce the chance of over-fitting. Initially, the field-measured canopy heights were correlated with the remote sensing derived variables, followed by model prediction of canopy heights.

Symbolic Regression
The use of SR, in retrieval of biophysical parameters has not been explored earlier. The success of machine learning algorithms in the biophysical parameters' retrieval problem showed the existence of the non-linear relationship between remotely sensed variables and biophysical parameters. However, the establishment of an interpretable model describing the relationship was not possible with machine learning regression. The working principle of SR makes it a viable option for the establishment of such a model. SR was implemented through genetic programming, and it consisted of several steps. First was the selection of the terminals, i.e., independent variables. Coherence, SRTM DEM, FVC, and LAI were selected as terminals for prediction of canopy height. The second step was to identify a set of functions that would be used to build the models. In this study, constant, addition, multiplication, subtraction, division, exponents and natural logarithms were selected as the primitive functions. Each of these functions has an associated complexity. The first four have a complexity of 1, division has a complexity of 2, and exponents and natural logarithm have a complexity of 4 each. The total complexity of the solution is the sum of the complexity of the functions used in the solution [79]. Each symbolic expression proposed by the genetic programming was evaluated based on its fitness, which in this case, was measured by the mean squared error between observed and predicted values. Probability values were assigned to the initial models based on their fitness. After that, a new generation of models were created by reproduction, i.e., copying an existing model to the new population and genetic recombination, i.e., building a new model by recombining parts from existing models [61]. The trial version of Eureqa pro software was used for the SR model [80]. In the absence of specific termination criteria in the software, the final model was chosen when the 50th generation was reached. After getting the final canopy height model, the efficiency of the model was checked using the test dataset. Further, canopy heights were predicted using the model for the study area.
One of the goals of the SR model was to identify the variables that provided the most significant explanatory power for the dataset. Sensitivity analysis was used to identify the variables which have the greatest contribution to the regression equation [81]. The partial derivative of the dependent variable with respect to an independent variable was taken as the first step of sensitivity analysis. The final sensitivity was obtained by multiplying the partial derivative with the ratio of the standard deviation of the independent variable and the dependent variable [79]. The sensitivity of variable x for the function y = f(x) is estimated as follows: where ∂y ∂x = partial derivative of y with respect to x; σ x = standard deviation of the independent variable x, and σ y = standard deviation of the dependent variable y.
The sensitivity indicates the direction, either positive or negative, and the magnitude of the correlation between input and output variables. A positive sensitivity suggests that an increase in the input variable will increase the value of the output variable and vice versa. The magnitude of the sensitivity determines the amount of increment or reduction of the output variable due to a unit increase in the input variable. A higher magnitude of positive sensitivity denotes a high amount of increase in the output variable value and vice versa for negative sensitivity.

Field Measured Canopy Height
The distribution of the field-measured canopy height showed that the height range of 8 to 10 m has the largest number of plots (Figure 3). The tallest canopy heights observed during the field measurement were in the range of 14-16 m and occurred in three plots, whereas the lowest measured from 2 plots were in the range of 2-3 m.
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 21 Figure 3. Distribution of field-measured canopy heights displayed against their frequency of occurrence.

Isnterferometric Coherence and Biophysical Psarameters
The average values of SAR coherence and biophysical variable images were used as inputs to reduce the effect of temporal variation. The spatial distribution of pixel values of all the variables showed different patterns (Figure 4). The coherence values found higher and close to '1' over fully correlated areas, and near '0', where there is no statistical relationship between the images [40]. As vegetation loss is a significant reason explaining loss of coherence, the coherence values remained low in dense mangrove areas. The FVC and LAI images provided idea about presence and absence of vegetation. Lower values of FVC and LAI indicated sparse or no vegetation. A comparison of the coherence images with LAI and FVC images showed that areas (within yellow ellipse in Figure 4), with lower FVC and LAI values, showed a relatively higher coherence corresponding to lower DEM values. However, for some areas (under the red ellipse), FVC, and LAI showed higher values, but the coherence of the region remained high, and DEM values remained on the lower range.

Isnterferometric Coherence and Biophysical Psarameters
The average values of SAR coherence and biophysical variable images were used as inputs to reduce the effect of temporal variation. The spatial distribution of pixel values of all the variables showed different patterns (Figure 4). The coherence values found higher and close to '1' over fully correlated areas, and near '0', where there is no statistical relationship between the images [40]. As vegetation loss is a significant reason explaining loss of coherence, the coherence values remained low in dense mangrove areas. The FVC and LAI images provided idea about presence and absence of vegetation. Lower values of FVC and LAI indicated sparse or no vegetation. A comparison of the coherence images with LAI and FVC images showed that areas (within yellow ellipse in Figure 4), with lower FVC and LAI values, showed a relatively higher coherence corresponding to lower DEM values. However, for some areas (under the red ellipse), FVC, and LAI showed higher values, but the coherence of the region remained high, and DEM values remained on the lower range.
The frequency of pixel values in the coherence image almost shaped like a gaussian distribution (

Canopy Height Model Establishment Using SR
The progression of the SR showed how the mean squared error decreased by selecting different relationships between the variables (Figure 6). The final regression model (Equation (3)) is a genetic combination of seven primary relationships formed with the input data. The final model by the SR used multiplication, addition, subtraction and division to build the relationships between the a b c d

Canopy Height Model Establishment Using SR
The progression of the SR showed how the mean squared error decreased by selecting different relationships between the variables ( Figure 6). The final regression model (Equation (3)) is a genetic combination of seven primary relationships formed with the input data. The final model by the SR used multiplication, addition, subtraction and division to build the relationships between the dependent and independent variables.
Remote Sens. 2020, 12, x FOR PEER REVIEW 11 of 21 The 'variable sensitivity' analysis gave an idea about the critical variables and their impacts on the regression model (Table 4). FVC had the highest sensitivity that means among all the variables, a unit increase in FVC caused the most significant change in estimated height. Additionally, the direction of FVC sensitivity was negative for 100% of cases, which suggested negative correlation with canopy height. A unit increase in FVC value caused a decrease of 1.122 m in estimated canopy height. LAI was also highly sensitive, but it was positively correlated to the estimated canopy height for all the cases. Thus, a unit increase in LAI increased the estimated canopy height by an amount of 1.108 m. Coherence and DEM had relatively lower sensitivity. Coherence was negatively correlated with height with a sensitivity of 0.57056. For DEM, the correlation was positive, with a sensitivity of 0.34082. VH backscatter had very low sensitivity with the estimated canopy height. The observed and predicted values of canopy height closely followed the identity line ( Figure  7). The R 2 value between observed and predicted canopy height was 0.62, and RMSE value was 1.48 m. A trend of overestimation for lower height and underestimation for higher height can be observed in the correlation plot. However, the magnitude of deviation from the identity line was pretty low, and the number of overestimated and underestimated points was almost evenly distributed. The normalized RMSE was 13.7% concerning the range of field measured canopy heights. Considering the operators and constants used in the model, the total complexity of the model was 32. Equation (3) was used to predict canopy height for the test dataset.
The 'variable sensitivity' analysis gave an idea about the critical variables and their impacts on the regression model (Table 4). FVC had the highest sensitivity that means among all the variables, a unit increase in FVC caused the most significant change in estimated height. Additionally, the direction of FVC sensitivity was negative for 100% of cases, which suggested negative correlation with canopy height. A unit increase in FVC value caused a decrease of 1.122 m in estimated canopy height. LAI was also highly sensitive, but it was positively correlated to the estimated canopy height for all the cases. Thus, a unit increase in LAI increased the estimated canopy height by an amount of 1.108 m. Coherence and DEM had relatively lower sensitivity. Coherence was negatively correlated with height with a sensitivity of 0.57056. For DEM, the correlation was positive, with a sensitivity of 0.34082. VH backscatter had very low sensitivity with the estimated canopy height. The observed and predicted values of canopy height closely followed the identity line (Figure 7). The R 2 value between observed and predicted canopy height was 0.62, and RMSE value was 1.48 m.
A trend of overestimation for lower height and underestimation for higher height can be observed in the correlation plot. However, the magnitude of deviation from the identity line was pretty low, and the number of overestimated and underestimated points was almost evenly distributed. The normalized RMSE was 13.7% concerning the range of field measured canopy heights.

Canopy Height Model Establishment Using RF
RF regression was run for canopy height model establishment. The same set of training and the test dataset were used. Coherence was the most critical variable in the RF model, followed by LAI and FVC. DEM and VH backscatter acted as the variables with the lowest importance (Figure 8a). The importance of VH backscatter found much less than other variables, similar to SR model ( Table  4). The correlation plot between field-measured canopy heights and model predicted canopy heights, showed that the magnitude of over-and underestimation of canopy height was more or less similar, like the SR model ( Figure 8b). However, the result showed a lower of 0.6, between field measured and model predicted canopy height values, while the RMSE value of RF model was higher (1.57 m) and the normalized RMSE was 14.54%.

Canopy Height Model Establishment Using RF
RF regression was run for canopy height model establishment. The same set of training and the test dataset were used. Coherence was the most critical variable in the RF model, followed by LAI and FVC. DEM and VH backscatter acted as the variables with the lowest importance (Figure 8a). The importance of VH backscatter found much less than other variables, similar to SR model ( Table 4). The correlation plot between field-measured canopy heights and model predicted canopy heights, showed that the magnitude of over-and underestimation of canopy height was more or less similar, like the SR model ( Figure 8b). However, the result showed a lower R 2 of 0.6, between field measured and model predicted canopy height values, while the RMSE value of RF model was higher (1.57 m) and the normalized RMSE was 14.54%.

Canopy Height Model Establishment Using RF
RF regression was run for canopy height model establishment. The same set of training and the test dataset were used. Coherence was the most critical variable in the RF model, followed by LAI and FVC. DEM and VH backscatter acted as the variables with the lowest importance (Figure 8a). The importance of VH backscatter found much less than other variables, similar to SR model ( Table  4). The correlation plot between field-measured canopy heights and model predicted canopy heights, showed that the magnitude of over-and underestimation of canopy height was more or less similar, like the SR model ( Figure 8b). However, the result showed a lower of 0.6, between field measured and model predicted canopy height values, while the RMSE value of RF model was higher (1.57 m) and the normalized RMSE was 14.54%.

Comparison of Canopy Height Maps Derived Using SR and RF Models
The predicted canopy height map using SR and RF models demonstrated a range between 0 to 18 m and 3 to 15 m respectively (Figure 9a,c). Areas showing the upper canopy height range distribution was less in the RF model based predicted map, while for SR prediction, a larger area was found with upper canopy height ranges. Overall, both the maps showed similar trends with the medium range of canopy height values being the same for both. The difference in the canopy height as per the two models were found largely in lower range values (Figure 9e,f).
Remote Sens. 2020, 12, x FOR PEER REVIEW 13 of 21 The predicted canopy height map using SR and RF models demonstrated a range between 0 to 18 m and 3 to 15 m respectively (Figure 9a,c). Areas showing the upper canopy height range distribution was less in the RF model based predicted map, while for SR prediction, a larger area was found with upper canopy height ranges. Overall, both the maps showed similar trends with the medium range of canopy height values being the same for both. The difference in the canopy height as per the two models were found largely in lower range values (Figure 9e,f).  (e,f) canopy height difference map derived using estimations from two models (SR-RF) and its histogram distribution.

Comparison of Canopy Heights Derived from Model Predictions with ICESat-2 Estimates
The distribution of canopy heights from ICESat-2 showed a similar pattern with SR model predictions (Figure 10a,b). However, canopy heights from ICESat-2, data demonstrated a peak at 10-12 m, whereas it was at 9-10 m for both model predictions. It can be observed that the ICESat-2 footprints lie mostly in the areas with higher canopy heights. Further, the canopy height distribution was more similar to the SR-based predictions than RF.

Comparison of Canopy Heights Derived from Model Predictions with ICESat-2 Estimates
The distribution of canopy heights from ICESat-2 showed a similar pattern with SR model predictions (Figure 10a,b). However, canopy heights from ICESat-2, data demonstrated a peak at 10-12 m, whereas it was at 9-10 m for both model predictions. It can be observed that the ICESat-2 footprints lie mostly in the areas with higher canopy heights. Further, the canopy height distribution was more similar to the SR-based predictions than RF. The canopy height values from SR prediction showed a better correlation with ICESat-2 estimated canopy heights with an R 2 value of 0.45 and RMSE of 2.24 m. The relationship with ICESat-2 based canopy height was much weaker for the RF model predicted canopy height values, where the R 2 value between observed and predicted height was 0.34 with an RMSE of 2.69 m ( Figure  11). There were quite a high number of footprints with extreme values, for which over-and under-estimation can be observed.

Comparison of Canopy Heights Derived from Model Predictions with ICESat-2 Estimates
The distribution of canopy heights from ICESat-2 showed a similar pattern with SR model predictions (Figure 10a,b). However, canopy heights from ICESat-2, data demonstrated a peak at 10-12 m, whereas it was at 9-10 m for both model predictions. It can be observed that the ICESat-2 footprints lie mostly in the areas with higher canopy heights. Further, the canopy height distribution was more similar to the SR-based predictions than RF. The canopy height values from SR prediction showed a better correlation with ICESat-2 estimated canopy heights with an R 2 value of 0.45 and RMSE of 2.24 m. The relationship with ICESat-2 based canopy height was much weaker for the RF model predicted canopy height values, where the R 2 value between observed and predicted height was 0.34 with an RMSE of 2.69 m ( Figure  11). There were quite a high number of footprints with extreme values, for which over-and under-estimation can be observed.

Comparison of SR and RF Model Based Canopy Height Estimates
Although, both SR and RF can be termed as a machine learning models, the fundamental working principle is entirely different from one another. The variable importance of RF and variable sensitivity of SR varied significantly for the canopy height models, as both methods had a different perception regarding the importance of variables. The RF model determines variable importance by the change in the regression error through variable permutation [82], i.e., the change in prediction accuracy due to the presence and absence of a variable was used as a measure of importance [83]. The size of the increase or decrease in regression error due to the absence of a predictor variable measured the magnitude of importance in the RF model. A more significant increase means that the variable was more important compared to other variables. There was no separate variable importance measurement procedure in the SR model. In SR, the model undergoes through a continuous evolutionary process. Models incorporating unimportant variables will perform worse than individuals using only relevant variables. Those unimportant variables will in turn, have a lower chance of being chosen to produce highly accurate symbolic expressions [84]. Therefore, the presence of irrelevant variables was discouraged throughout the process. Hence, the presence of a variable in a sufficiently evolved population will indicate the necessity of that variable in the model.
Stijven et al. [84] had argued that the variable selection method of SR is more reliable than in RF regression. With the detailed analysis of four different datasets, they have listed several reasons for which variable selection by RF may not always be reliable. As the first reason, they concluded that when multiple variables had almost equal importance, the RF model struggles to differentiate between them and assigns random variable importance. RF can also assign considerably less significance to a variable than expected due to its correlation with other irrelevant variables present in the model. Data distribution often can influence the variable importance. SR was found to be free from all these obstacles, which were held back in the RF model. Thus, in the SR model, there was a lower chance of omitting the vital variables. Chen et al. [85] also confirmed that SR was more efficient in variable selection than RF.

Importance of Variables in the Canopy Height Models
While building the canopy height model using SR, it was found that FVC, LAI, and coherence worked as the most sensitive variable for canopy height estimation. Although, FVC was an indicator of the tree crown property, it had shown a good correlation with canopy height in some earlier studies. Simrad et al. [86] concluded that while canopy cover shows a correlation with tree height, it might not hold for tall mature tropical forests due to the saturation of canopy cover with tree height. Wang et al. [47] also produced a global canopy height map using GLAS data and RF regression. In their study, they confirmed that tree cover was closely related to canopy height. Liu et al. [35] also showed the relationship between tree height and its predictors would be a non-linear one. Korhonen et al. [87] in their study, found that on reaching a certain height, canopy height has a strong non-linear relationship with canopy cover. One more reason for this kind of result can be the species distribution in the study area. It was also observed from the results that FVC had a negative sensitivity with canopy height, which means an increase in FVC will result in a decrease of canopy height. In BWS, it was observed that trees with relatively lower height such as Excoecaria agallocha were densely packed than other taller trees, like Avicennia officinalis, and Heritiera fomes. Thus, densely packed patches could have lower canopy heights, leading to negative sensitivity of FVC to canopy height.
Several studies found that coherence showed reasonable correlation with canopy height. Olesk et al. [42] used four different models to illustrate the relationship between coherence and canopy height and found that all the models performed well in describing the relationships. In general, the coherence of an interferometric pair decreases due to the volumetric effect, among other reasons [74]. As tree height is a significant indicator of aboveground volume in forested areas, and the canopy height is highly correlated with aboveground volume, coherence must have a correlation with canopy height. Therefore, coherence can be used to estimate canopy height. Schlund et al. [88] found that volume decorrelation can be traced back to the canopy height. As a result, we found it as an essential variable in both models. In the SR model, it is negatively correlated with the canopy height. It happened so because, with the increase in height, the volume decorrelation increases leading to decrease in coherence. Therefore, decrease in coherence could indicate increase in canopy height and vice versa.
The estimation of canopy height with the help of LAI was not largely explored. Some studies have shown a possible relationship between them [52,89]. Pope and Treitz [50] showed that LiDAR predicted height could be used for LAI estimation to an acceptable extent for boreal forests. Qu et al. [51] found that even for a tropical forest site, height metrics derived from LiDAR data can estimate LAI quite efficiently. They also showed that it even performs better than MODIS LAI. Thus, these studies showed that there is a relationship that exists between canopy height and LAI. However, remote sensing based studies have not tried to use the relationship to estimate canopy height from LAI. Here, LAI was found as the second most important variable for both models, which showed good correlation of LAI with canopy height.

Canopy Height Models and Maps
Both SR and RF models demonstrated capability in predicting canopy heights with the former having better efficiency. The SR model established here is complicated with more functions and higher complexity. A complex model is more likely to map the inherent non-linearity among the predictors, while a simpler model is preferred for its easy interpretability. The relationship observed between remotely sensed parameters, and biophysical variables were generally non-linear and complex [90]. Although different machine learning algorithms, including RF, can estimate biophysical parameters efficiently, the lack of interpretability restricts the replication of their results; which can be overcome with the use of SR based models.
Though the canopy height maps differ in values, they showed similar trends for most of the areas, i.e., higher values in one map correspond to higher values in other maps, and vice versa. The difference in canopy height map showed that the values were generally higher for SR model prediction than the RF model prediction values. However, there could be exceptions (marked by a red ellipse in Figure 3), due to mainly two reasons. First, the different variable importance in different models as the RF and SR models used different set of essential variables leading to slightly different results. Second, some differences occurred due to the extrapolation problem of the RF model [91]. RF regressions cannot predict values outside the range of the training data as it is based on averaging the values of multiple outputs. In RF regression, final predictions were derived by averaging the results of many tree canopies. Additionally, each canopy output was derived as the mean values in each terminal node of a tree. The average for a set of values must be well within the value range. Therefore, the highest canopy height value remained below 14 m in the RF model predicted map (Figure 9c), whereas field-measured data reported canopy height values beyond 14 m. The RF model training sample tend to underestimate the higher canopy height values and overestimate the lower range values. The SR model can extrapolate values beyond the range of training data, therefore, a large area observed with a canopy height between 14 and 18 m for the SR model prediction (Figure 9a). Therefore, for the upper and lower canopy height values that lie beyond the training data range, the SR model prediction could be erroneous. Castillo et al. [92] also reported discouraging extrapolation with SR model.
The canopy height maps showed limited correlation with the ICESat-2 estimated canopy height. The first release of the ALT08 data product has some known issues which may affect the estimated canopy height [93]. ICESat-2 data is also found to have a vertical RMSE of 3.2 m for canopy height retrieval [94]. So, these reasons may affect the correlation between model estimated canopy height and ICESat-2 estimated height. In future, with the availability of an increased amount of more accurate footprints, an improved relationship between the two can be expected. Additionally, the canopy height models were built on field measured inputs with a laser rangefinder, which may have some instrumental error [8].

Conclusions
The mangroves are one of the critical storages of aboveground carbon, and they are experiencing considerable alternations due to climate change. Accurate information about the carbon storage proxies, such as canopy height, will help estimate AGB and carbon sequestration. Forest canopy height models, especially for the mangroves, were generally prepared by using airborne LiDAR, high spatial resolution stereo imagery, or SAR interferometry [20,56,95]. Nonetheless, most of these methods were not always applicable to all the areas due to a lack of proper data availability. The current study proposed a method that can be applied to anywhere else in the world as it is based on Sentinel series data having global coverage.
In this study, we have analysed the potential of Sentinel-1 interferometric coherence, Sentinel-2 biophysical parameters in predicting the canopy height for mangroves. The interferometric coherence and biophysical parameters act as good predictors due to their relationship with the canopy height. Machine learning models were found to be an excellent method for canopy height modeling. Although the RF model demonstrated its efficiency in canopy height estimation, the SR model through genetic programming was found to be the most effective. The SR also established an interpretable model, which is not possible via any other machine learning algorithms. The SR-based model outperforms commonly used machine learning models like the RF. The fraction of vegetation cover (FVC) was found to be an essential variable for predicting canopy height. It was also found that the canopy height map correlates with ICESat-2 estimated canopy height, albeit modest. Overall, this study demonstrated the effectiveness of Sentinel series data and the SR in predicting canopy height.