Retrieval of Leaf Area Index Using Sentinel-2 Imagery in A Mixed Mediterranean Forest Area

: Leaf area index (LAI) is a crucial biophysical indicator for assessing and monitoring the structure and functions of forest ecosystems. Improvements in remote sensing instrumental characteristics and the availability of more efficient statistical algorithms, elevate the potential for more accurate models of vegetation biophysical properties including LAI. The aim of this study was to assess the spectral information of Sentinel-2 MSI satellite imagery for the retrieval of LAI over a mixed forest ecosystem located in northwest Greece. Forty-eight field plots were visited for the collection of ground LAI measurements using an ACCUPAR LP-80: PAR & LAI Ceptometer. Spectral bands and spectral indices were used for LAI model development using the Gaussian processes regression (GPR) algorithm. A variable selection procedure was applied to improve the model’s prediction accuracy, and variable importance was investigated for identifying the most informative variables. The model resulting from spectral indices’ variables selection produced the most precise predictions of LAI with a coefficient of determination of 0.854. Shortwave infrared bands and the normalized canopy index (NCI) were identified as the most important features for LAI prediction.


Introduction
Leaf area index (LAI), commonly defined as the amount of leaf area (m 2 ) in a canopy per unit of ground area (m 2 ) [1,2], is a critical biophysical indicator recognized as an essential climate variable (ECV). LAI is applicable in evaluating land ecosystem condition and functions, as well as observing various characteristics of global ecosystems [3]. It can also be applied for the evaluation of the photosynthetic capacity of vegetation as a function of available leaf area [4]. The total amount of leaf area is an important vegetation parameter that can be used to model and quantify the role of vegetation cover to many Earth's surface processes, such as primary productivity, rainfall interception, and carbon flux [5]. Thus, LAI can be used as a tool for adapting and implementing more sustainable forest management practices [6].
Direct field measurements such as point contact sampling, litterfall traps, or destructive sampling (area harvest) are considered to be the most accurate approach to estimate LAI. However, these processes are labor-intensive and are often constrained by site accessibility, logistical, staffrelated and financial factors [7]. Improved indirect, non-destructive field-based techniques such as the indirect point quadrat, allometric and canopy gap fraction analysis through terrestrial sensors, and have an increased cost-efficiency of the traditional in situ approach. However, ground measurements cannot provide LAI information over large areas and extended temporal periods [8].
At the present time, airborne and satellite, passive and active sensors are used for the rapid, spatial explicit retrieval of LAI [9]. Satellite sensors assist the potential to retrieve LAI values from local to global scales, relying on the specific spectral attributes of green leaves (i.e., strong absorption in the visible and high reflectance in the near infrared) compared with other land surface materials [5].
Currently, LAI estimating methods based on optical remote sensing include the development of empirical relationships between LAI and spectral/spatial information, biophysical modeling (inversion of radiative transfer models simulating canopy reflectance), and hybrid inversion methods [10,11]. Empirical approaches (parametric or non-parametric models) are set up on the association between texture or spectral features and field LAI values defined over sample plots. In the second approach (i.e., biophysical modeling), LAI is estimated through a reverse physical model using spectral reflectance as the input variable and LAI as the model's output. Finally, the hybrid inversion approach integrates statistical and physical models [12,13]. Compared to the statistical models, the last two approaches are considered more accurate, generalizing well across a wider area extent, however, they are more time and data demanding during the training phase [14,15].
The most common approach for LAI estimation through remote sensing data relies upon the use of spectral vegetation indices, and it is a relatively simple and computationally efficient approach [11,15,16]. Vegetation indices serving as proxy indicators of the vegetation's surface reflectance which can be used in order to reduce the dimensionality and redundancy of spectral information, as well as confounding variables such as scene illumination, soil background, topography or atmospheric effects [5,17]. Nevertheless, the statistical relationships between remotely sensed LAI and spectral indices are ecosystem dependent and do not typically generalize well across different ecosystems [16,18].
Until recently, Landsat imagery was the most frequent source of information for LAI, due to its spatial resolution, large area coverage, and free availability from the end of 2008 [11]. The launch of Sentinel-2A on 2015, providing a high revisit time and a spatial resolution imagery up to 10 m, also at no-cost, increased the analysis accuracy of biophysical parameters such as LAI which has been anticipated due its higher spatial/spectral resolution and higher revisit frequency [4]. In particular, Sentinel-2, apart from six bands that are comparable to Landsat-8, offers three bands in the red-edge part of the spectrum, met at 705, 740 and 783 nm. These additional bands located in the sharp-edge between the red absorption maximum and the near-infrared reflectance, respond to canopy reflectance as resulting from the multiple scattering among leaf layers [5,15]. In addition, spectral information from the red-edge region of the spectrum is less affected by biophysical attributes such as canopy structure and leaf spectral properties, solar zenith angle irradiance, and other atmospheric effects [19]. However, the improvement of LAI retrieval through the exploitation of spectral information available in the red-edge position of the reflectance spectrum is still open to research, since several studies presented controversy results [5,15].
Along with the advances in sensor characteristics, new approaches have been introduced for statistical model development, providing a more robust framework to model complex dynamics. The majority of the studies developing LAI prediction models over forest environments rely on the use of simple and multiple regression-based (i.e., linear, log-linear or exponential) models [5,11]. However, conventional regression methods might be insufficient for multiple independent predictors, such as multiple vegetation indices [12], because of its weakness to handle highdimensional nonlinear relationships, multi-collinearity limitations, and normal distribution requirement [20], raising the need for enhanced modeling approaches.
Meyer et al. [5] developed simple and multiple linear regression models for LAI estimation in a temperate forest in the southeastern Germany, using Landsat-8 and Sentinel-2 data. Omner et al. [20] compared SVM and NN regression models for the LAI retrieval of endangered tree species in South Africa, using WorldView imagery. SVM regression was also employed by Durbha et al. [21] to estimate LAI from multiangle imaging spectroradiometer in an agricultural area in France. Wang et al. [22] estimated the LAI of grassland in Oklahoma, United States, using Sentinel-1, Sentinel-2, and Landsat data in SVM and RF models. Cohrs et al. [16] used SVM classification to enhance the linear models of LAI-2200C data and the spectral information of Sentinel-2, in a pine plantation. Kial et al. [25] employed the PLSR algorithm to predict LAI in a grassland area, using hyperspectral data. Houborg et al. [26] assessed LAI in an agricultural area, suggesting a hybrid model on the base of decision tree regression algorithms. Campos-Taberner et al. [27] examined crops' LAI retrieval using GRP and PROSAIL model with Landsat and SPOT5 satellite data. Verrelst et al. [28] developed GRP models for LAI estimations, using a field hyperspectral dataset in agricultural area. Verrelst et al. [29] tested a range of parametric, non-parametric and physical retrieval methods for LAI estimation in different crop types, using Sentinel-2 imagery.
Nevertheless, the evaluation of statistical machine learning regression approaches to improve correlations between LAI and spectral reflectance over forest areas is still an open challenge [16], since studies were conducted in different biomes, the chosen algorithms are configured per biome, and the non-direct measurements of LAI are influenced by canopy structure [2]. To the best of our knowledge, the correlation of in situ LAI values of mixed forest with spectral values produced by Sentinel-2 imagery, using GPR algorithm, remains investigating in the Mediterranean environment.
Specifically, Gaussian processes have become popular in Earth science and the remote sensing field [31,32] and present encouraging results in estimating biophysical variables [27][28][29]31]. Several studies' findings support GRP advantages such as model stability and computation efficiency [33,34]. Comparative studies where statistical methods were evaluated for LAI prediction [29], demonstrate GPR's efficiency on processing time as well as on performance accuracy. Moreover, the GP technique provides an insight in the model evaluating the relevance of variables according to the automatic relevance determination (ARD) and indicates the relative contribution of different predictors in model development [35]. Given its advantages for biophysical parameter prediction, GRP appeared to be a first choice to explore the spectral information for LAI estimation.
In this framework, the aim of this study was to examine the utility of the Sentinel-2 optical images for LAI retrieval in a heterogeneous forest ecosystem in the Mediterranean area, through empirical statistical relationships built upon a GPR machine learning algorithm. Specific objectives were to evaluate both the original bands and spectral indices models for LAI prediction and to enhance LAI models using the variable selection approach and identifying the most informative spectral features.

Study Area
The forest ecosystem under investigation is the Northern Pindos National Park, which is one of the largest protected terrestrial areas in Greece. The Park is located in Northwestern Greece and covers a total area 1969 sq. km. The region is characterized by a montane climate which varies in aspect and elevation. The annual precipitation ranges between 1000 and 1800 mm and the monthly temperatures average from 0.9 to 21.4 Celsius.
Northern Pindos National Park has an outstanding diversity of flora and fauna and the woodland covers a unique aesthetic landscape. Lower and middle altitudes are covered by oaks (Q. macedonica, Q. cerris, Q. pubescens, Quercus frainetto), and other mixed or pure stands of deciduous tree species (Ostrya carpinifolia, Carpinus orientalis, Carpinus betulus, Fraxinus ornus). At higher altitudes, two conifers Pinus nigra and Abies borissii regis are found. Up to 1800 m, beech forest (Fagus sylvatica) extends on the northern slopes, and Bosnian pine (Pinus leucodermis) covers the verge of the mountain slope. Above 1800 m, sub-alpine grasslands reach the peaks and often are dotted with Balkan pines (Pinus peuce). In the treeless alpine meadows, only certain types of scrubs are found.

Field Data
Field data were collected during August 2018. The set of ground data was constituted by 48 elementary sampling units (ESU) (Figure 1), where biophysical parameters were measured. Each ESU has a size equal to a pixel size (20 × 20 m) and were located among various forest types. Canopy measurements were made with a portable photosynthetically active radiation meter ACCUPAR LP-80: PAR & LAI Ceptometer. The AccuPAR LP-80, facilitating non-destructive LAI measurements, consists of a linear array of 80 independent photosynthetically active radiation (PAR) sensors. The indirect field measurements with LP-80 AccuPAR Ceptometer consider the amount of light energy transmitted by a plant canopy and calculate LAI using a simplified version of the Norman-Jarvis radiation transmission and scattering model [36]. In each ESU, we collected above canopy measurements in nearby, unshaded open field, followed by six individual sensor retrievals below canopy, which were used to obtain a statistical mean of each ESU. The average LAI measures for broadleaved and coniferous species was 5.01 and 1.89, respectively.

Remote Sensing Data Acquisition and Preprocessing
The remotely sensing data employed in this study consist of a geometrically and atmospherically corrected at bottom-of-atmosphere (BoA) reflectance, and Sentinel-2 MSI (Level-2A) cloud-free image acquired on 25 August 2018. Sentinel imagery was available for download at no-cost via Sentinels Scientific Data Hub website (https://scihub.copernicus.eu/).
For the analysis, we retained 10 out of the 13 original spectral bands of the image (the 60 m spatial resolution bands were excluded), covering the visible to the shortwave infrared (SWIR) reflectance spectrum. The 10 m bands were resampled to 20 m to be compatible with the ESU size. Finally, three vegetation indices: normalized difference vegetation index (NDVI) [37], non-linear index (NLI) [38], and the normalized canopy index (NCI) [39], as well as their modified counterparts considering the red-edge and near-infrared regions of the spectrum [40,41], were estimated to generate the second feature set. In the second dataset, we also included the tasseled cap features (TCFs) [42] calculated upon the original spectral bands (Table 1).

Statistical Analysis
In the present study, a Gaussian process specified parametrically for regression problems was performed to determine the associations among field biophysical LAI measurements and spectral data. The Gaussian process is a Bayesian non-parametric algorithm, which could be considered as a generalization of a Gaussian (normal) probability distribution [43] extended to infinite dimensionality [44]. In contrast to other regression algorithms, the GPR algorithm does not define a conditional mean function but instead detects a befitting covariance between observations [45]. The GPR consists of a kernel method approach which can provide further advantages such as conditional, statistical information for the predicted variable. This extracted knowledge allows the interpretability as well as the flexibility of GPR models.
The GPR algorithm was used as implemented in the kernlab package [46] within the R environment software [47]. To assess the quality of the model, a 10-fold cross validation on the training data was performed. The determination coefficient (R 2 ) and root mean square error (RMSE) were calculated to access the accuracy of the models. In general, the higher R 2 values and the lower RMSE, the more accurate the model is. RMSE formula is: where n represents the number of predictions, is the prediction produced for observation i, and Y represents the observed values which are the inputs to the equation.
Initially, two GPR models (spectral bands and spectral indices) were developed using the full set of variables. In the next step, the importance of the individual predictors was estimated using permutation importance analysis. Permutation importance method figures the change in model's performance before and after permuting the values of each variable and contrasts this to the predictions made on full dataset [48]. Using backward elimination selection technique [49] and gradually removing the least contributing variables, we established an adequate set of features as input for the new models.
Subsequently, the minimal subset of predictors producing the lowest RMSE and the best coefficient of determination (R 2 ) for the LAI model were selected. The permutation importance analysis process was applied within R environment software (R Development Core Team, 2014) using 'mlr' package [48].

Results
Two GRP models for LAI prediction were built, considering the original ten spectral bands and the full set of spectral indices ( Table 1). The accuracy of the spectral indices model was slightly better (R 2 = 0.825 RMSE = 1.415) than that of the spectral bands model (R 2 = 0.811 RMSE = 1.646) (Figure 2). In the subsequent step of the analysis, we ranked the individual variables according to their importance on the model's prediction performance. Figure 3 shows the permutation importance rankings for spectral bands and for the 16 spectral indices.   Finally, we produced LAI maps ( Figure 5) based on the best developed models considering spectral bands and the spectral indices model.

Discussion
This study investigated the utility of Sentinel-2 spectral information for the estimation of the LAI over Northern Pindos National Park in Greece, using a GPR algorithm. As previous studies [29,50] have also indicated, the GPR technique provides a satisfactory accuracy of LAI estimates, efficiently handling a multi-dimensional dataset. In addition, GPR models generated through 10-fold cross validation on the training data reveal the most relevant variables to LAI in ranking order presenting an insight in the relation of LAI values with vegetation spectral response. The developed ranking list and the backward elimination process, pruning the least promising variables facilitated the development of less complex and computationally lighter model with less independent variables. and slightly improved accuracy.
The GPR model based on the original full dataset of spectral indices has shown marginally better performance than the model developed upon the spectral bands. The same pattern was also presented by the models after the variable selection procedure. Spectral indices models, using five spectral indices (NDVI, NLI, WET and the modified NLI_RE1, NLI_RE2) slightly outperformed the best spectral bands model of the seven spectral bands. In the Korhonen et al. [51] study, indices models present lower but adequate predictive accuracy compared to the individual band's models, for the assessment of biophysical variables. In Verrelst et al.'s [33] study where CHRIS hyperspectral satellite images were used, the Gaussian Process model of four or more well chosen bands outperformed vegetation indices for the assessment of vegetation biophysical parameters. However, the measurement of LAI values in these previous studies referred to different biomes affecting the relationship with spectral response. It has to be also noted that even though several studies examined the relation between Earth observation data with LAI, the fact that different techniques and methods for LAI measures were applied in the field, may render them not comparable to one another.
Based on spectral bands' importance rank order, SWIR (B11 and B12), NIRn2 (B8A), and NIR (B8) bands were found to be more proper for LAI assessment. Previous studies also found a strong correlation between SWIR bands with LAI [51,52]. Reflectance on the SWIR and NIR part of spectrum is noted to be affected by soil and vegetation attributes and thus can be useful for LAI estimation [53]. Moreover, the NIR as well the SWIR band has a capacity to sense plant components through combatively deep layers of vegetation [26,54]. In particular, the significance of the SWIR spectral band introduced by its relation to canopy reflectance and water content seems promising for the efficient estimation of LAI mostly in closed canopy forests [52]. The first narrow NIR B7 (NIRn1) and the second red-edge B6 (RE2) bands also appear to contribute importantly to LAI evaluation, as they are recognized indicators of plant chlorophyll content [55]. These bands located over the transition spectral zone which is characterized by chlorophyll absorption to leaf scattering, thus a positive change of leaf chlorophyll content implies reflectance changes from low in red-edge region to very high in the NIR [56].
Regarding the spectral indices importance rank order, the results were not entirely unexpected. NDVI is a very common index for LAI assessment across a wide range of ecosystems [51,52,[57][58][59][60] . NDVI boosts the contrast between vegetation and soil but it minimizes the influence rising by illumination conditions [61]. In addition, NDVI is associated with well known limitations of saturation at intermediate levels of LAI [26]. The importance of NDVI for LAI estimation becomes weaker while LAI is increasing beyond a species-dependent threshold, which is commonly around mid-LAI values [57,62].
The modified versions of NDVI derived from the red-edge and near-infrared narrow bands are also among the high important variables. Twele et al. [63], who evaluated LAI in a tropical environment, found that the performance of indices using narrowband indices was better than that of broadband indices. Red edge has also been also by previous studies as a valuable variable for the assessment of LAI [51,55] due to its detection strength of canopy depth under dense canopy and high biomass status [64,65]. Meyer et al. [5] also indicated that vegetation indices developed on nearinfrared bands were most highly correlated with LAI.
Furthermore, indices and TCFs have been suggested to avoid the saturation problem and enhance the predictive accuracies regarding forest LAI. Schönert et al. [66] confirmed the great abilities of TCFs for the estimation of crop LAI, using RapidEye imagery. The index of wetness, as also its terms reveals, gages the moisture content of the vegetation or soil by abstracting the sum of the visible and near infrared band from the longer infrared bands [67]. The wetness feature is sensitive to canopy moisture, thus as the amount of canopy increases, the wetness values rise until maximum canopy cover is achieved [67].
Moreover, the NCI index, which was calculated from the SWIR and green band, was presented as the most significant variable for LAI estimation. The importance of NCI for LAI assessment is linked to the green band's sensitivity to chlorophyll and on SWIR sensitivity to moisture. In a dry rangeland ecosystem, Barati et al. [68] detected that NCI presented low prediction accuracy for biophysical parameters, compared to other vegetation indices. Vescovo et al. [39] found that NCI showed stronger correlation to LAI when phytomass levels were relatively high in a grassland environment. Consequently, according to these previous studies, the correlation strength of NCI with LAI depends on the biomass level. This fact is aligned with the results of our study where NCI is among the top ranking important variables for LAI estimation in a Mediterranean forest ecosystem.
All in all, the GPR algorithm, the variable selection technique, and the Sentinel-2 MSI spectral data seem appealing for LAI assessment over mixed Mediterranean forests. The achieved results could assist the efficient selection of proper Sentinel-2 MSI bands and spectral indices for LAI retrieval, in a regional setting approach and under an operational monitoring framework.
However, additional studies are required to establish the relation between the specific forest ecosystem attributes and biophysical parameters using optical remote sensing data, since environmental conditions affect the spectral response. Previous research has also highlighted the importance of the understory vegetation, species diversity, and surface moisture for modeling the LAI and spectral information linkage [69].

Conclusions
Remote sensing-based LAI measurements could be used to monitor forest ecosystems response to the pressures induced by various drivers of change, and to indicate early warning signs regarding forest sustainability risks. In this paper, we evaluated the utility of Sentinel-2 MSI satellite imagery for estimating LAI in a heterogenous Mediterranean forest. Furthermore, we compared the effectiveness of spectral indices and the spectral band to model LAI using the GPR algorithm and we identified the most relevant informative variables for monitoring and mapping LAI.
The results of the present study can be summarized as follows: → GRP algorithm seems promising for LAI estimation and LAI models' interpretation through variables' permutation importance rankings; → Although SWIR bands have been designed for atmospheric correction applications and supposed to be of mirror significance for biophysical parameter estimation, GPR revealed spectral information in SWIR bands which is proven to be beneficial for the assessment of biophysical parameter such as LAI; → LAI over a heterogeneous Mediterranean forest can be mapped at a high predictive accuracy using five spectral indices (NCI2, NCI1, WET, NDVI_RE1, NDVI_RE2). NCI, red-edge NDVI, and TCFs wetness indices have been proven to be important predictors for forest LAI modeling.
Overall, the outcomes of this research provide proof of the potential of Sentinel-2 MSI spectral resolution for LAI assessment in Mediterranean forests. However, additional sampling efforts, extended over several growing seasons, could contribute in the verifying the robustness of our findings.