# Estimating Crop Biophysical Parameters Using Machine Learning Algorithms and Sentinel-2 Imagery

## Abstract

$_{ab}$), and Canopy Chlorophyll Content (CCC) are essential for characterising field-level spatial variability and are thus necessary for enabling variable rate application technologies, precision irrigation, and crop monitoring. Moreover, robust machine learning algorithms offer prospects for improving the estimation of biophysical parameters owing to their capability to deal with non-linear data, small samples, and noisy variables. This study compared the predictive performance of sparse Partial Least Squares (sPLS), Random Forest (RF), and Gradient Boosting Machines (GBM) for estimating LAI, LC$_{ab}$, and CCC with Sentinel-2 imagery in Bothaville, South Africa. Using variable importance measures, it also identified the most influential bands for estimating crop biophysical parameters. The results showed that RF was superior in estimating all three biophysical parameters. GBM followed, outperforming sPLS in estimating LAI and CCC but not LC$_{ab}$, for which sPLS was relatively better. Since all three biophysical parameters could be estimated with RF, it can be considered a good contender for operationalisation. Overall, the findings of this study are significant for future biophysical product development using RF. A single algorithm reduces reliance on multiple parameter-specific algorithms, thus facilitating the rapid extraction of actionable information to support precision agriculture (PA) and crop monitoring activities.

## 1. Introduction

$_{ab}$), and Canopy Chlorophyll Content (CCC). Leaf area index (LAI), defined as the total one-sided green leaf area (m$^{2}$) per unit ground surface area (m$^{2}$) in broadleaf canopies [7], is an essential biophysical parameter. It provides valuable information on plant physical and physiological processes and is thus critical for characterising crop growth status, health, and stress.

$_{ab}$ and CCC) are essential parameters for determining photosynthetic capacity, optimising N application to increase yields and profits, and reducing the environmental impact of excessive fertilization [9]. Traditionally, these parameters have been measured using direct, lab-based methods, which are destructive, spatially and temporally limited, expensive, time-consuming, and labour-intensive. Therefore, remotely sensed approaches to biophysical parameter estimation are highly sought after. They can rapidly and frequently characterise crop conditions at regional scales, which can inform the development of policy instruments for alleviating the impacts of climate change. At the landscape scale, they facilitate land management decision-making by indicating when and where to fertilize or irrigate and by how much, thus promoting the sustainability and profitability of agricultural systems.

$_{ab}$ and CCC for precision agriculture and crop monitoring applications [23]. Using the Sentinel-2 and Sentinel-3 band settings, Clevers and Gitelson [24] examined three chlorophyll indices: the red-edge chlorophyll index (CIred-edge), the green chlorophyll index (CIgreen), and the MERIS terrestrial chlorophyll index (MTCI). They observed that these indices were linearly related to CCC and nitrogen (N) content and hence were accurate estimators of both. However, Sakamoto, et al. [25] argue that SVIs are not transferable across canopy architectures, leaf structures, climate zones, and environmental conditions, and that they carry less information.
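The chlorophyll indices mentioned above are simple band combinations. The sketch below computes them for one pixel using the generic formulations CI = R_NIR / R_x − 1 and MTCI = (R_upper-RE − R_lower-RE) / (R_lower-RE − R_red). Two things are assumptions made only for illustration. First, the Sentinel-2 band mapping (B7 as NIR, B5 as red-edge, B3 as green, and B6/B5/B4 for MTCI) is not necessarily the configuration used in [24]. Second, the reflectance values are hypothetical.

```python
def ci_red_edge(nir: float, red_edge: float) -> float:
    """Red-edge chlorophyll index: R_NIR / R_red-edge - 1."""
    return nir / red_edge - 1.0


def ci_green(nir: float, green: float) -> float:
    """Green chlorophyll index: R_NIR / R_green - 1."""
    return nir / green - 1.0


def mtci(upper_re: float, lower_re: float, red: float) -> float:
    """MERIS terrestrial chlorophyll index written for three bands:
    (R_upper-RE - R_lower-RE) / (R_lower-RE - R_red)."""
    return (upper_re - lower_re) / (lower_re - red)


# Hypothetical surface reflectance (0-1) of a green crop pixel, not measured data.
pixel = {"B3": 0.08, "B4": 0.05, "B5": 0.12, "B6": 0.30, "B7": 0.40}

# Assumed Sentinel-2 band mapping (illustrative only).
print("CIred-edge:", round(ci_red_edge(pixel["B7"], pixel["B5"]), 3))
print("CIgreen:   ", round(ci_green(pixel["B7"], pixel["B3"]), 3))
print("MTCI:      ", round(mtci(pixel["B6"], pixel["B5"], pixel["B4"]), 3))
```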

$^{2}$ m$^{-2}$. In contrast, simpler and more robust MLRAs, such as Random Forest [29], Gaussian Process Regression (GPR) [39], and Kernel Ridge Regression (KRR) [40], offer several advantages [17,20,41]. They achieve better estimation accuracy at lower computational cost, can learn complex relationships from input data, and require few, intuitive tuning parameters. MLRAs, optimisation techniques (e.g., gradient descent), and sensor capabilities (e.g., increasing resolution and number) continue to improve, and user needs continue to evolve [42]. It is therefore essential to evaluate various MLRAs to improve site-specific accuracy and reliability and to inform future operationalisation, with a view to improving food production in developing countries.

$_{ab}$, and CCC using Sentinel-2 Multi-Spectral Imager (MSI) data, and (2) to identify the influential spectral bands with high predictive power for estimating these crop biophysical parameters. These algorithms have seen limited use for estimating crop biophysical parameters from Sentinel-2 imagery for precision agriculture and crop monitoring. Moreover, comparing the three MLRAs is worthwhile for two reasons. It elucidates their capabilities under the same environmental and acquisition conditions, and it tests their consistency with previously reported performance, since they may provide a promising alternative to existing operational techniques.

## 2. Materials and Methods

#### 2.1. Study Area

#### 2.2. Data

#### 2.2.1. Remotely Sensed Data

#### 2.2.2. Calibration and Validation Data

$_{ab}$, and CCC MLR models were collected in the field from 11 to 23 April 2021 over the dominant crops in the study area, i.e., maize, beans, and peanuts. LAI and LC$_{ab}$ measurements were collected non-destructively within 40 m × 40 m plots to avoid edge effects and allow biophysical parameter mapping at 20 m resolution. The plots were selected systematically along a transect to capture variability. Each plot value was the average of six to eight random measurements. For LAI measurements, we used a LiCor 2200c Plant Canopy Analyzer (Li-Cor, Inc., Lincoln, NE, USA) with a 180° view cap to shield the sensor from the operator and from uneven sky conditions. A Trimble® TDC600 handheld data collector, with a global navigation satellite system (GNSS) accuracy of 1.5 m, was used to geo-tag the centroid of each plot and take plot pictures. LC$_{ab}$ measurements were collected from the sunlit upper canopy with an MC-100 Chlorophyll Concentration Meter (Apogee Instruments, Inc., Logan, UT, USA). The Canopy Chlorophyll Content (CCC) of each plot was estimated as the product of LC$_{ab}$ and LAI (LC$_{ab}$ × LAI) [16]. The data were divided into 70% training and 30% validation subsets [46]. The summary statistics for calibration and validation are given in Table 1.
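The CCC computation and the 70/30 split described above can be sketched as follows. LAI is a ratio of areas (m² m⁻²), so CCC keeps the units of LC$_{ab}$ (µg cm⁻²). For example, a plot with LC$_{ab}$ = 40 µg cm⁻² and LAI = 3 m² m⁻² gives CCC = 120 µg cm⁻². The split here is a seeded random shuffle, which is an assumption: the text does not state whether the split was random or stratified. The plot records are hypothetical.

```python
import random


def canopy_chlorophyll(lcab_ug_cm2: float, lai_m2_m2: float) -> float:
    """CCC = LCab x LAI; LAI is dimensionless, so CCC is in ug cm-2 like LCab."""
    return lcab_ug_cm2 * lai_m2_m2


def split_70_30(records, seed=42):
    """Shuffle plot records with a fixed seed and split them 70% / 30%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical plot means (LCab in ug cm-2, LAI in m2 m-2), not the study's field data.
plots = [{"id": i, "lcab": 35.0 + i, "lai": 2.0 + 0.1 * i} for i in range(20)]
for plot in plots:
    plot["ccc"] = canopy_chlorophyll(plot["lcab"], plot["lai"])

train, valid = split_70_30(plots)
print(len(train), "calibration plots,", len(valid), "validation plots")
print(canopy_chlorophyll(40.0, 3.0))
```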

#### 2.3. Crop and Green-Vegetation Masking

#### 2.4. Machine Learning Regression Algorithms

#### 2.4.1. Sparse Partial Least Squares

$_{CV}$, Equation (3)). The sPLS analysis was performed in the R statistical software [62] using the ‘spls’ package [63]. The MSE$_{CV}$ was converted to the root mean squared error of CV (RMSE$_{CV}$) for comparison with the other MLRAs.
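Converting MSE$_{CV}$ to RMSE$_{CV}$ means taking the square root of the cross-validated mean squared error, which returns the error to the units of the biophysical parameter. The sketch below shows generic k-fold cross-validation followed by that conversion. It uses ordinary least squares on a single hypothetical predictor as a stand-in model and does not reproduce the tuning procedure of the ‘spls’ package. The fold count (k = 10) and the data are assumptions.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares with one predictor (a stand-in for the sPLS model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def kfold_mse(xs, ys, fit, k=10, seed=1):
    """Cross-validated MSE: mean squared error over all held-out predictions."""
    idx = list(range(len(ys)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    squared_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((ys[i] - model(xs[i])) ** 2 for i in fold)
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predictor (e.g., a band reflectance) and noisy response (e.g., LAI).
rng = random.Random(0)
x = [i / 30 for i in range(30)]
y = [1.0 + 4.0 * v + rng.gauss(0.0, 0.2) for v in x]

mse_cv = kfold_mse(x, y, fit_line)
rmse_cv = math.sqrt(mse_cv)  # back in the response's units, comparable with RF/GBM RMSE
print(f"MSE_CV = {mse_cv:.4f}, RMSE_CV = {rmse_cv:.4f}")
```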

#### 2.4.2. Random Forest

$_{ab}$, and CCC with Sentinel-2 MSI were determined using a grid-search strategy [69]. The values tested ranged from 1 to p in steps of 1, where p is the total number of input explanatory variables, and from 100 to 500 in steps of 10, respectively. Grid search is a commonly used approach to hyperparameter tuning: it exhaustively evaluates all parameter combinations and selects the pair that yields the minimum out-of-bag (OOB) error. The RF analysis was performed in the R statistical software [62] using the ‘randomForest’ package [70].
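The grid search described above can be sketched as an exhaustive loop over (mtry, ntree) pairs. With the ten Sentinel-2 bands used here (p = 10), the grid holds 10 × 41 = 410 combinations. In practice, each evaluation would fit a random forest and read its OOB error. Here `toy_oob_error` is a hypothetical stand-in so that the example runs without fitting any forest.

```python
from itertools import product


def grid_search(oob_error, p, ntree_values=range(100, 501, 10)):
    """Evaluate every (mtry, ntree) pair and return the pair with the lowest OOB error."""
    grid = list(product(range(1, p + 1), ntree_values))
    best_pair = min(grid, key=lambda pair: oob_error(*pair))
    return best_pair, len(grid)


def toy_oob_error(mtry, ntree):
    """Hypothetical stand-in for fitting RF and reading its OOB MSE.
    It is a smooth bowl with its minimum at mtry = 4, ntree = 300."""
    return (mtry - 4) ** 2 + ((ntree - 300) / 100) ** 2


best, n_evaluated = grid_search(toy_oob_error, p=10)
print("best (mtry, ntree):", best, "| combinations evaluated:", n_evaluated)
```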

#### 2.4.3. Gradient Boosting Machines

$_{CV}$). The GBM algorithm also calculates the relative influence of each variable on the model by averaging that variable's relative influence across all trees.
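Relative influence in boosted trees is usually computed from the reduction in squared error achieved by each split. The reductions are accumulated per variable over all trees and rescaled to percentages, as in Friedman's measure reported, for example, by the R ‘gbm’ package. Averaging rather than summing over trees, as described above, gives identical percentages after rescaling. The per-tree gains below are hypothetical.

```python
from collections import defaultdict


def relative_influence(per_tree_gains):
    """Average each band's split improvement over all trees, then rescale to percentages."""
    totals = defaultdict(float)
    for tree in per_tree_gains:
        for band, gain in tree.items():
            totals[band] += gain
    n_trees = len(per_tree_gains)
    mean_gain = {band: g / n_trees for band, g in totals.items()}
    grand_total = sum(mean_gain.values())
    return {band: 100.0 * g / grand_total for band, g in mean_gain.items()}


# Hypothetical squared-error reductions recorded at the splits of three boosted trees.
trees = [
    {"B3": 4.0, "B4": 1.0},
    {"B3": 3.0, "B11": 2.0},
    {"B4": 1.0, "B11": 1.0},
]
for band, ri in sorted(relative_influence(trees).items(), key=lambda kv: -kv[1]):
    print(f"{band}: {ri:.1f}%")
```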

#### 2.5. Prediction Accuracy Assessment

$^{2}$), root mean squared error (RMSE), and relative root mean squared error (RRMSE) (Equations (7)–(11)). These measures were recommended by Richter, et al. [72] and are frequently used in the literature; thus, they facilitate comparison between studies of biophysical parameters. The R$^{2}$ is a correlation-based, dimensionless measure that reflects spatial (temporal) patterns, with values ≥ 0.9 interpreted as Excellent and 0.5 ≤ R$^{2}$ ≤ 0.8 as Good. In contrast, the RMSE indicates the magnitude of error in the units of the biophysical parameter, i.e., m$^{2}$ m$^{-2}$ for LAI and µg cm$^{-2}$ for LC$_{ab}$ and CCC. For LAI, RMSE values < 0.5 m$^{2}$ m$^{-2}$ can be interpreted as Excellent and 0.5 m$^{2}$ m$^{-2}$ ≤ RMSE < 1.0 m$^{2}$ m$^{-2}$ as Good. The RRMSE is a dimensionless index suitable for comparisons between different variables or ranges, where values ≤ 10% are regarded as Excellent and 10% < RRMSE ≤ 20% as Good [72]. Finally, percentage bias (%Bias) measures the tendency of a model to underestimate or overestimate a biophysical parameter. Its ideal value is 0%, and values close to 0% indicate a model with little systematic error [73].

$_{i}$ is the observed biophysical parameter (e.g., LAI), $y_{i}$ is the predicted biophysical parameter (e.g., LAI), $\overline{x}$ and $\overline{y}$ are the means of the observed and predicted biophysical parameters, respectively, n is the sample size, and N is the number of errors.
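The equations themselves are not reproduced here, so the sketch below uses common textbook forms of the four measures together with the rating thresholds quoted above. It rests on three assumptions. First, R² is the squared Pearson correlation, consistent with its description as correlation-based. Second, RRMSE is RMSE divided by the observed mean. Third, %Bias = 100 × Σ(x_i − y_i) / Σx_i, so that positive values mean underestimation. The original Equations (7)–(11) may differ in detail, and the LAI values are hypothetical.

```python
import math


def accuracy_metrics(observed, predicted):
    """R2, RMSE, RRMSE (%) and %Bias for paired observed (x) and predicted (y) values."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation (assumed form)
    rmse = math.sqrt(sum((y - x) ** 2 for x, y in zip(observed, predicted)) / n)
    rrmse = 100.0 * rmse / mean_x  # relative to the observed mean (assumed form)
    # Assumed sign convention: positive %Bias = average underestimation.
    pbias = 100.0 * sum(x - y for x, y in zip(observed, predicted)) / sum(observed)
    return {"R2": r2, "RMSE": rmse, "RRMSE": rrmse, "%Bias": pbias}


def rate_r2(r2):
    """Ratings quoted in the text; values outside the stated ranges are left unrated."""
    if r2 >= 0.9:
        return "Excellent"
    if 0.5 <= r2 <= 0.8:
        return "Good"
    return "Not rated"


def rate_rrmse(rrmse):
    """RRMSE ratings quoted in the text (percent)."""
    if rrmse <= 10.0:
        return "Excellent"
    if rrmse <= 20.0:
        return "Good"
    return "Not rated"


# Hypothetical LAI observations and predictions (m2 m-2).
obs = [2.0, 3.0, 4.0, 5.0]
pred = [2.1, 2.9, 3.8, 5.2]
m = accuracy_metrics(obs, pred)
print({k: round(v, 3) for k, v in m.items()}, rate_r2(m["R2"]), rate_rrmse(m["RRMSE"]))
```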

## 3. Results

#### 3.1. Optimal Tuning Parameters

$_{ab}$. The excluded variables for sPLS-LC$_{ab}$ are B6, B8, and B8A.
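How sPLS can drop bands such as these is easiest to see in its first sparse direction. In the formulation of Chun and Keleş (2010), which the R ‘spls’ package implements, the direction weights for a single response are obtained by soft-thresholding Z = Xᵀy at η·max|Z|. Bands whose covariance with the response falls below the threshold therefore receive a weight of exactly zero. The sketch covers only that first step; the full algorithm iterates over K components. The Z values and η are hypothetical and do not reproduce the model reported here.

```python
import math


def sparse_direction(z, eta):
    """Soft-threshold covariances z_j = x_j^T y at eta * max|z|.
    Weights that fall below the cut-off become exactly zero (band excluded)."""
    cutoff = eta * max(abs(v) for v in z.values())
    return {band: math.copysign(max(abs(v) - cutoff, 0.0), v) for band, v in z.items()}


# Hypothetical covariances between centred band reflectances and LCab.
z = {"B3": 0.9, "B5": -0.7, "B8": 0.2, "B11": 0.5}
weights = sparse_direction(z, eta=0.4)
excluded = [band for band, w in weights.items() if w == 0.0]
print({b: round(w, 3) for b, w in weights.items()}, "excluded:", excluded)
```

Larger η raises the cut-off and excludes more bands; η = 0 keeps every band, which reduces this step to the ordinary PLS direction.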

#### 3.2. Model Performance

$_{ab}$, and CCC using sparse Partial Least Squares (sPLS), Random Forest (RF), and Gradient Boosting Machine (GBM).

$^{2}$ m$^{-2}$, explaining 76% of LAI variability. It is followed by GBM, with an RMSE of 0.63 m$^{2}$ m$^{-2}$ and an R$^{2}$ of 0.61, while sPLS performs relatively worse (RMSE: 0.77 m$^{2}$ m$^{-2}$), explaining only 49% of the variability in LAI. Consistently, sPLS has the greatest %Bias among the compared MLRAs (5%), while GBM has the lowest (1%). Similarly, in predicting LC$_{ab}$ (Figure 2d–f), the RF algorithm is superior (RMSE: 7.57 µg cm$^{-2}$), followed by sPLS (RMSE: 7.90 µg cm$^{-2}$), while GBM is relatively worse (RMSE: 8.25 µg cm$^{-2}$). The variability explained by sPLS is the lowest (81%), compared with 83% for both RF and GBM. Moreover, the %Bias is highest in the sPLS model (5%) and lowest in the GBM model (1%), while RF has a %Bias of 3%. For CCC estimation (Figure 2g–i), RF shows better predictive performance than sPLS and GBM, with an RMSE of 39.49 µg cm$^{-2}$. GBM has the next best predictive performance, achieving an RMSE of 44.19 µg cm$^{-2}$, while sPLS is relatively poor, with an RMSE of 52.76 µg cm$^{-2}$. The GBM-CCC and sPLS-CCC models both have markedly high %Bias (12% and 13%, respectively), compared with only 2% for the RF-CCC model. As with the LAI and LC$_{ab}$ models, the RF-CCC model explains the greatest variability (83%), followed by GBM (77%), while sPLS explains the least (74%).

#### 3.3. Biophysical Parameter Mapping

$_{ab}$, and CCC, within and between crop fields. The results are presented in Figure 3. As shown in Figure 3a,d,g, the within-field spatial variation of LAI is discernible, with some differences between MLRAs. The sPLS results (Figure 3a) show a wide distribution of low LAI values (i.e., ~2 m$^{2}$ m$^{-2}$), particularly over rainfed (regular) fields. In contrast, the lowest LAI values estimated by RF (Figure 3d) and GBM (Figure 3g) are mostly above 2 m$^{2}$ m$^{-2}$. However, GBM shows relatively lower values within some fields, indicative of senescing leaves. For LC$_{ab}$ (Figure 3b,e,h), a similar spatial variation can be observed. The sPLS map (Figure 3b) shows widely distributed low LC$_{ab}$ values over rainfed fields and maximum values, i.e., ~50 µg cm$^{-2}$, over irrigated fields. In contrast, LC$_{ab}$ values around 50 µg cm$^{-2}$ are not widely distributed in the LC$_{ab}$ maps estimated from RF (Figure 3e) and GBM (Figure 3h). At the canopy level, the chlorophyll content (i.e., CCC) maps for RF (Figure 3f) and GBM (Figure 3i) are more similar to each other, while the sPLS map (Figure 3c) contains lower values (i.e., red areas, ~10 µg cm$^{-2}$). These observations are consistent with previous studies that found PLS to be better adapted than other algorithms to estimating low values [74].

#### 3.4. Variable Importance

$_{ab}$, the RF importance results (Figure 4b) show that all Sentinel-2 spectral bands contribute highly to model performance, i.e., %IncMSE >20%. In contrast, in the GBM model only one of the 10 variables (i.e., spectral bands) makes a similarly high contribution (Figure 4e). Specifically, the following bands are the most influential for the RF-LC$_{ab}$ model, each with %IncMSE >30%:

- the SWIR bands, B11:1610 nm and B12:2190 nm;
- the RE band, B5:705 nm;
- the VIS bands, B3:560 nm and B4:665 nm.

Relatively low importance is shown by the NIR bands, B8:842 nm and B8A:865 nm, together with the VIS band, B2:490 nm, and the RE bands, B6:740 nm and B7:783 nm. In contrast, the GBM results show the greatest influence (>20%) for the VIS band, B3:560 nm. The VIS bands, B2:490 nm and B4:665 nm, and the RE band, B6:740 nm, have a moderate influence (i.e., between 10% and 20%) on the GBM-LC$_{ab}$ model. As with the RF-LC$_{ab}$ model, the variable importance for the RF-CCC model (Figure 4c) is high, i.e., %IncMSE >20%, for all Sentinel-2 spectral bands. The exception is B12:2190 nm, which has a moderate influence on the model, i.e., between 10% and 20%. The following most important variables have %IncMSE >30%:

- the VIS bands, B3:560 nm, B2:490 nm, and B4:665 nm;
- the RE bands, B5:705 nm, B6:740 nm, and B7:783 nm;
- the SWIR band, B11:1610 nm;
- the NIR band, B8:842 nm, and the narrow NIR band, B8A:865 nm.

On the other hand, the GBM-CCC model (Figure 4f) shows that the SWIR band, B11:1610 nm, has the highest influence on model performance (>20%). The VIS bands, B4:665 nm, B3:560 nm, and B2:490 nm, have a moderate influence (between 10% and 20%). The remaining bands, i.e., B8A:865 nm, B5:705 nm, B7:783 nm, B6:740 nm, B8:842 nm, and B12:2190 nm, have the least influence on the model (<10%).
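%IncMSE is a permutation-based measure: a band is important if randomly shuffling its values makes the predictions markedly worse. The sketch below computes a simplified version on a whole dataset with a fixed, hypothetical model. The ‘randomForest’ package instead works with each tree's out-of-bag samples and, by default, scales the result (see its `importance()` documentation). The numbers are therefore not directly comparable with Figure 4.

```python
import random


def permutation_importance(predict, rows, y, band, n_repeats=20, seed=0):
    """Mean percentage increase in MSE after randomly permuting one band's values."""
    rng = random.Random(seed)

    def mse(data):
        return sum((yi - predict(r)) ** 2 for r, yi in zip(data, y)) / len(y)

    base = mse(rows)
    increases = []
    for _ in range(n_repeats):
        column = [r[band] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, band: v} for r, v in zip(rows, column)]
        increases.append(100.0 * (mse(permuted) - base) / base)
    return sum(increases) / n_repeats


# Hypothetical reflectance data and a fixed model that ignores B12.
data_rng = random.Random(1)
rows = [{"B4": data_rng.uniform(0.02, 0.10),
         "B8": data_rng.uniform(0.20, 0.50),
         "B12": data_rng.uniform(0.10, 0.30)} for _ in range(50)]


def model(r):
    return 5.0 * r["B8"] - 2.0 * r["B4"]


y = [model(r) + data_rng.gauss(0.0, 0.05) for r in rows]

for band in ("B4", "B8", "B12"):
    print(band, round(permutation_importance(model, rows, y, band), 1))
```

A band the model never uses (B12 here) scores exactly zero, because permuting it leaves every prediction unchanged.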

## 4. Discussion

#### 4.1. Predictive Performance of Machine Learning Regression Algorithms

$_{ab}$) and canopy levels (i.e., CCC) are essential for both precision agriculture and crop monitoring. This paper evaluated the performance of three machine learning regression algorithms (MLRAs) in estimating these foliar biophysical parameters in a semi-arid African agricultural area in Bothaville, South Africa.

$_{ab}$, and CCC across the compared MLRAs. These results make RF a good contender for the operational retrieval of biophysical parameters to support precision agriculture and crop monitoring, since all the parameters relevant for detecting crop health could be obtained with a single algorithm. Previous MLRA comparison studies found that different algorithms were optimal for different individual parameters [17,75]. Therefore, the results here are significant for product developers aiming to reduce reliance on multiple algorithms for multiple-parameter estimation. One of RF's greatest strengths is its relatively transparent inner workings. It also requires only a few intuitive tuning parameters, i.e., m$_{try}$ and n$_{tree}$, compared with complex algorithms such as NN. For example, since it is based on the traditional CART algorithm, one can interrogate the variables used to split the nodes in the individual trees and thus explain the predictions. The strength of ensemble learning methods is also shown by the better predictive performance of GBM, relative to sPLS, in estimating LAI and CCC. Both RF and GBM use CART as the base regressor, but they combine trees differently [76]. RF constructs many trees independently. GBM instead builds weak regression trees sequentially, each one improving on the predictions of those before it. However, as shown by the results of this study, the RF formulation is superior in terms of predictive performance.
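The sequential character of boosting, in contrast to RF's independently grown and averaged trees, can be seen in a minimal gradient-boosting loop for squared error. Each new tree is fitted to the current residuals (the negative gradient of the squared loss) and added with a shrinkage factor. This sketch uses one hypothetical predictor and depth-1 trees (stumps). It is not the ‘gbm’ implementation, which supports deeper trees, subsampling, and other options.

```python
def fit_stump(xs, targets):
    """Depth-1 regression tree: best single threshold by squared error."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Squared-error gradient boosting: stumps are added one after another."""
    f0 = sum(ys) / len(ys)  # start from the mean
    preds = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(s(x) for s in stumps)


# Hypothetical non-linear relationship between a spectral index and LAI.
xs = [i / 10 for i in range(30)]
ys = [0.5 + 3.0 * x - 0.6 * x * x for x in xs]

model = gradient_boost(xs, ys)
mse_mean = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"training MSE: mean-only {mse_mean:.4f}, boosted {mse_boost:.4f}")
```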

$_{ab}$, compared to sPLS, which showed relatively better prediction accuracy. The relatively poor performance of sPLS in estimating the other biophysical parameters, i.e., LAI and CCC, can be attributed to its inability to deal with non-linear relationships among the explanatory variables (i.e., the spectral bands) or between these variables and the biophysical parameters [74]. Since Sentinel-2 has relatively few spectral bands, already optimised for land monitoring, the variable selection capability of sPLS was not fully exploited. Thus, except for the sPLS-LC$_{ab}$ model, the observed results largely reflect the ordinary PLS step of the algorithm. That step projects the variables into an orthogonal latent space to estimate the biophysical parameters. In the sPLS-LC$_{ab}$ model, the dimensionality reduction capability was exercised (three bands were excluded), which contributed to its better prediction accuracy relative to GBM. Despite its lower RMSE, however, it showed greater underestimation than GBM, i.e., a %Bias of ~5%. Delloye, et al. [77] found that extreme LC$_{ab}$ values, i.e., <30 and >70 µg cm$^{-2}$, tend to be underestimated. Our results showed this underestimation at >50 µg cm$^{-2}$. Two factors may explain this: the ranges of the training and validation data differ, and MLRAs have limited ability to estimate beyond the range of the training data. Notably, GBM resulted in the lowest %Bias for LAI and LC$_{ab}$ compared with both RF and sPLS. Therefore, its potential should be explored further in future studies, particularly its improved version, eXtreme Gradient Boosting (XGB), which has been reported to be highly efficient and more accurate.
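The extrapolation limit is structural for tree-based learners such as RF and GBM. Their predictions are piecewise constant in the inputs, so beyond the range of the training spectra the output stays flat. For a single tree, or an RF averaging many trees, a prediction can never exceed the largest training target. A linear model such as PLS can extrapolate, though not necessarily accurately. A minimal single-split tree on hypothetical data, with training LC$_{ab}$ capped at 50 µg cm⁻², illustrates this.

```python
def fit_stump(xs, ys):
    """Single-split regression tree: each leaf predicts the mean training target on its side."""
    best = None
    cuts = sorted(set(xs))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


# Hypothetical training pairs: a chlorophyll-sensitive index vs LCab (ug cm-2), max 50.
index = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
lcab = [25.0, 30.0, 35.0, 40.0, 45.0, 50.0]
tree = fit_stump(index, lcab)

# An index value far beyond the training range still maps to a training-leaf mean.
print(tree(6.0))  # 45.0
```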

$^{2}$ m$^{-2}$, R$^{2}$: 0.76) is slightly lower than that reported by Verrelst, et al. [17], i.e., RMSE$_{CV}$: 0.44 m$^{2}$ m$^{-2}$ (R$^{2}$: 0.90). They used variational heteroscedastic Gaussian Process regression (VHGPR) and simulated Sentinel-2 bands in Barrax, Spain, although their data had a higher spatial resolution, i.e., 5 m. Over the same site, Verrelst, et al. [20] obtained an RMSE of 0.49 m$^{2}$ m$^{-2}$ (R$^{2}$: 0.92) using Gaussian Process regression (GPR) and simulated Sentinel-2 data at 20 m resolution. This finding is comparable to the RF-LAI results in this study. However, their RRMSE is considerably worse, i.e., 23%, thus failing to reach the prescribed minimum accuracy of 10% for LAI products [78]. Our LC$_{ab}$ prediction performance was slightly worse than that reported by Delloye, Weiss and Defourny [77] using NN and Sentinel-2 data, i.e., RMSE: 7.26 µg cm$^{-2}$. It was better than that reported by Upreti, et al. [75] using Sentinel-2 data and RF, i.e., RMSE: 8.88 µg cm$^{-2}$. All the MLRAs evaluated in this study achieved considerably better RRMSE, i.e., 2–3%. The study therefore supports the recommendation in [17] that slow and non-optimal MLRAs such as NN should be replaced by simpler but more powerful ones.

#### 4.2. Influential Variables for Biophysical Parameter Estimation

$_{ab}$ results due to low absorption of solar radiation by chlorophyll pigments. As a result, LC$_{ab}$ induces the largest variation in reflectance in the RE region below 730 nm [77], whereas the other Sentinel-2 RE bands are centred above 730 nm, i.e., B6:740 nm and B7:783 nm. Interestingly, the LAI and CCC models were mostly influenced by similar spectral bands: in addition to B5, the bands B11:1610 nm, B3:560 nm, and B4:665 nm showed the greatest influence on MLR model performance. This indicates that these bands have high information content, explaining the greatest variability of LAI, LC$_{ab}$, and CCC in the study area. The similarity in variable importance between LAI and CCC reflects the co-variation of LAI and LC$_{ab}$ at the canopy level. The VIS bands, B3:560 nm and B4:665 nm, and the SWIR band, B11:1610 nm, featured prominently in the LC$_{ab}$ models, while the contribution of the NIR bands was relatively low. Consistently, the sPLS-LC$_{ab}$ model excluded the NIR bands B8 and B8A. In a few instances, i.e., LC$_{ab}$ and CCC with RF, the NIR bands B8:842 nm and B8A:865 nm were among the highly influential variables (i.e., %IncMSE >20%). Nonetheless, all the Sentinel-2 MSI covariates contributed to MLRA performance; hence, they should not be discarded. This is especially true for LAI, which has been found to drive spectral variation in all bands, and given the co-variation of biophysical parameters across spectral regions [13,14]. However, it may be worth testing subsets of four to ten bands selected on the basis of these variable importance measures. This suggestion is informed by the sPLS-LC$_{ab}$ model, which used only seven variables yet achieved a lower RMSE than GBM. Consistently, Verrelst, et al. [13] found that fewer HyMap bands were optimal (NRMSE$_{CV}$ <10%): 9 bands for LC$_{ab}$ and 7 for LAI. In another study, Verrelst, et al. [81] found that only four Compact High-Resolution Imaging Spectrometer (CHRIS) bands are sufficient to obtain accurate estimates: 674, 605, 942, and 978 nm for LAI, and 725, 471, 997, and 511 nm for LC$_{ab}$. Therefore, future studies could use the RF and GBM importance measures, with an importance threshold, to optimise the number of spectral bands for estimating biophysical parameters with MLRAs.
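Two simple ways to turn an importance ranking into candidate band subsets are nested top-k subsets (here k = 4 to 10, matching the range suggested above) and a fixed importance threshold. The sketch below shows both. The importance scores are illustrative assumptions, not the values in Figure 4, and each resulting subset would still need to be retrained and validated.

```python
def nested_band_subsets(importance, min_size=4, max_size=10):
    """Rank bands by importance (descending) and return top-k subsets for k = min_size..max_size."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    max_size = min(max_size, len(ranked))
    return {k: ranked[:k] for k in range(min_size, max_size + 1)}


def bands_above(importance, threshold):
    """Alternative: keep every band whose importance exceeds a chosen threshold."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return [band for band in ranked if importance[band] > threshold]


# Illustrative importance scores for the ten Sentinel-2 bands (not the values in Figure 4).
scores = {"B2": 12, "B3": 35, "B4": 31, "B5": 38, "B6": 15,
          "B7": 14, "B8": 9, "B8A": 8, "B11": 33, "B12": 11}

for k, bands in nested_band_subsets(scores).items():
    print(k, bands)
print("above 20:", bands_above(scores, 20))
```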

$_{ab}$ and CCC (Figure 4b,c). In contrast, GBM’s most important variable showed a markedly higher relative influence than the other variables. This observation can be attributed to differences in the architecture of the two MLRAs (i.e., RF and GBM). RF selects a random subset of variables as split candidates at each node of each tree; thus, all Sentinel-2 bands have an equal chance of contributing to the model. This may work against or in favour of prediction performance. For example, in datasets containing highly correlated variables, such as hyperspectral data, collinear variables may be selected in all or most of the trees in the forest, resulting in overfitting [82]. In such cases, feature selection and dimensionality reduction become essential [13,74]. Fortunately, in our case the spectral bands were discrete, and potentially redundant bands ranked relatively low in variable importance. These included B12:2190 nm, B6:740 nm, B7:783 nm, B8:842 nm, and B8A:865 nm. The influential variables found here are consistent with known absorption features and relationships with plant biophysical parameters [11,12,14].
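The "equal chance" above refers to candidate sampling. At every node, RF draws a fresh random subset of mtry bands and then splits on the best of those candidates. Each band is therefore a candidate with probability mtry/p, even though strongly predictive bands win more of the actual splits. The sketch below simulates only the candidate-sampling step for the ten Sentinel-2 bands; mtry = 3 and the number of nodes are arbitrary assumptions.

```python
import random
from collections import Counter

BANDS = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B11", "B12"]


def candidate_counts(n_nodes, mtry, seed=0):
    """Count how often each band is drawn as a split candidate when mtry bands are sampled per node."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_nodes):
        counts.update(rng.sample(BANDS, mtry))
    return counts


counts = candidate_counts(n_nodes=10_000, mtry=3)
for band in BANDS:
    # Expected share per band: mtry / p = 3 / 10 = 0.3
    print(band, round(counts[band] / 10_000, 3))
```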

#### 4.3. Limitations of the Study

## 5. Conclusions

$_{ab}$, and CCC using Sentinel-2 data. Moreover, the spectral bands with the greatest influence on model accuracy were identified using RF and GBM variable importance measures. The results showed that RF was superior in estimating all three biophysical parameters. GBM followed, performing better than sPLS in estimating LAI and CCC but not LC$_{ab}$, for which sPLS showed relatively better prediction accuracy. Nevertheless, all MLRAs achieved accuracy acceptable under the GCOS/GMES requirements, i.e., RRMSE ≤10%, for all the biophysical parameters. This result is comparable to, and in some cases better than, results from studies using simulated and hyperspectral data, RTMs, and advanced MLRAs such as NN and GPR [17,20,77,83]. Using only seven variables, sPLS achieved a lower RMSE than GBM in estimating LC$_{ab}$. We therefore recommend evaluating smaller Sentinel-2 band subsets (i.e., 4–8 bands) in future work. This recommendation is consistent with previous studies [13,81] that found four to nine bands sufficient to achieve robust estimates of biophysical parameters. Among all Sentinel-2 MSI bands, B5 (705 nm) was consistently among the most important for estimating the biophysical parameters considered here. This band was previously shown to be a better estimator of chlorophyll content than bands in other RE regions [80,81]. Interestingly, the results also showed some dependency between canopy biophysical parameters, i.e., LAI and CCC: the variables with the greatest influence on model performance were similar, i.e., B5, B11, B3, and B4. This suggests that these bands have high information content, and hence high predictive power, and are closely related to both LAI and CCC. This is consistent with previous studies that found that structural and leaf parameters have co-dependent effects on canopy spectral variation that are difficult to decouple [14,77]. Overall, our results demonstrated that the RF regression algorithm provides the most robust predictive performance for all the crop biophysical parameters considered here by efficiently utilising all Sentinel-2 MSI bands. It is therefore a good contender for operationalisation. However, this assertion should be tested further across different growth stages, crops, and climatic environments in future studies. Such testing is essential for future satellite-based biophysical product development using a single MLRA. It would improve the prospects and effectiveness of developing accurate prescription maps to support variable rate application (VRA) precision agriculture techniques and regional crop monitoring activities.

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Clark, M.; Tilman, D. Comparative analysis of environmental impacts of agricultural production systems, agricultural input efficiency, and food choice. Environ. Res. Lett. **2017**, 12, 064016.
- UN. Transforming Our World: The 2030 Agenda for Sustainable Development; United Nations: San Francisco, CA, USA, 2015.
- Peralta, N.R.; Costa, J.L.; Balzarini, M.; Franco, M.C.; Córdoba, M.; Bullock, D. Delineation of management zones to improve nitrogen management of wheat. Comput. Electron. Agric. **2015**, 110, 103–113.
- Stamatiadis, S.; Schepers, J.; Evangelou, E.; Glampedakis, A.; Glampedakis, M.; Dercas, N.; Tsadilas, C.; Tserlikakis, N.; Tsadila, E. Variable-rate application of high spatial resolution can improve cotton N-use efficiency and profitability. Precis. Agric. **2020**, 21, 695–712.
- Manandhar, A.; Zhu, H.; Ozkan, E.; Shah, A. Techno-economic impacts of using a laser-guided variable-rate spraying system to retrofit conventional constant-rate sprayers. Precis. Agric. **2020**, 21, 1156–1171.
- Karatay, Y.N.; Meyer-Aurich, A. Profitability and downside risk implications of site-specific nitrogen management with respect to wheat grain quality. Precis. Agric. **2020**, 21, 449–472.
- Weiss, M.; Baret, F.; Smith, G.; Jonckheere, I.; Coppin, P. Review of methods for in situ leaf area index (LAI) determination: Part II. Estimation of LAI, errors and sampling. Agric. For. Meteorol. **2004**, 121, 37–53.
- Haboudane, D.; Tremblay, N.; Miller, J.R.; Vigneault, P. Remote estimation of crop chlorophyll content using spectral indices derived from hyperspectral data. Geosci. Remote Sens. IEEE Trans. **2008**, 46, 423–437.
- Boegh, E.; Soegaard, H.; Broge, N.; Hasager, C.; Jensen, N.; Schelde, K.; Thomsen, A. Airborne multispectral data for quantifying leaf area index, nitrogen concentration, and photosynthetic efficiency in agriculture. Remote Sens. Environ. **2002**, 81, 179–193.
- Jensen, R.; Mausel, P.; Dias, N.; Gonser, R.; Yang, C.; Everitt, J.; Fletcher, R. Spectral analysis of coastal vegetation and land cover using AISA+ hyperspectral data. Geocarto Int. **2007**, 22, 17–28.
- Blackburn, G.A. Quantifying chlorophylls and caroteniods at leaf and canopy scales: An evaluation of some hyperspectral approaches. Remote Sens. Environ. **1998**, 66, 273–285.
- Curran, P.J. Imaging spectrometry for ecological applications. Int. J. Appl. Earth Obs. Geoinf. **2001**, 3, 305–312.
- Verrelst, J.; Rivera, J.P.; Gitelson, A.; Delegido, J.; Moreno, J.; Camps-Valls, G. Spectral band selection for vegetation properties retrieval using Gaussian processes regression. Int. J. Appl. Earth Obs. Geoinf. **2016**, 52, 554–567.
- Ollinger, S.V. Sources of variability in canopy reflectance and the convergent properties of plants. New Phytol. **2011**, 189, 375–394.
- Jacquemoud, S.; Verhoef, W.; Baret, F.; Bacour, C.; Zarco-Tejada, P.J.; Asner, G.P.; François, C.; Ustin, S.L. PROSPECT+ SAIL models: A review of use for vegetation characterization. Remote Sens. Environ. **2009**, 113, S56–S66.
- Jacquemoud, S.; Baret, F.; Andrieu, B.; Danson, F.; Jaggard, K. Extraction of vegetation biophysical parameters by inversion of the PROSPECT+ SAIL models on sugar beet canopy reflectance data. Application to TM and AVIRIS sensors. Remote Sens. Environ. **1995**, 52, 163–172.
- Verrelst, J.; Rivera, J.P.; Veroustraete, F.; Muñoz-Marí, J.; Clevers, J.G.; Camps-Valls, G.; Moreno, J. Experimental Sentinel-2 LAI estimation using parametric, non-parametric and physical retrieval methods–A comparison. ISPRS J. Photogramm. Remote Sens. **2015**, 108, 260–272.
- Fu, D.; Chen, B.; Wang, J.; Zhu, X.; Hilker, T. An improved image fusion approach based on enhanced spatial and temporal the adaptive reflectance fusion model. Remote Sens. **2013**, 5, 6346–6360.
- Huang, S.; Miao, Y.; Yuan, F.; Gnyp, M.L.; Yao, Y.; Cao, Q.; Wang, H.; Lenz-Wiedemann, V.I.; Bareth, G. Potential of RapidEye and WorldView-2 satellite data for improving rice nitrogen status monitoring at different growth stages. Remote Sens. **2017**, 9, 227.
- Verrelst, J.; Muñoz, J.; Alonso, L.; Delegido, J.; Rivera, J.P.; Camps-Valls, G.; Moreno, J. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3. Remote Sens. Environ. **2012**, 118, 127–139.
- Mutanga, O.; Skidmore, A.K. Red edge shift and biochemical content in grass canopies. ISPRS J. Photogramm. Remote Sens. **2007**, 62, 34–42.
- Houborg, R.; Fisher, J.B.; Skidmore, A.K. Advances in Remote Sensing of Vegetation Function and Traits; Elsevier: Amsterdam, The Netherlands, 2015.
- Mulla, D.J. Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosys. Eng. **2013**, 114, 358–371.
- Clevers, J.G.; Gitelson, A.A. Remote estimation of crop and grass chlorophyll and nitrogen content using red-edge bands on Sentinel-2 and-3. Int. J. Appl. Earth Obs. Geoinf. **2013**, 23, 344–351.
- Sakamoto, T.; Gitelson, A.A.; Nguy-Robertson, A.L.; Arkebauer, T.J.; Wardlow, B.D.; Suyker, A.E.; Verma, S.B.; Shibayama, M. An alternative method using digital cameras for continuous monitoring of crop status. Agric. For. Meteorol. **2012**, 154–155, 113.
- Xu, X.; Lu, J.; Zhang, N.; Yang, T.; He, J.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Inversion of rice canopy chlorophyll content and leaf area index based on coupling of radiative transfer and Bayesian network models. ISPRS J. Photogramm. Remote Sens. **2019**, 150, 185–196.
- Atzberger, C. Object-based retrieval of biophysical canopy variables using artificial neural nets and radiative transfer models. Remote Sens. Environ. **2004**, 93, 53–67.
- Houborg, R.; Boegh, E. Mapping leaf chlorophyll and leaf area index using inverse and forward canopy reflectance modeling and SPOT reflectance data. Remote Sens. Environ. **2008**, 112, 186–202.
- Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
- Vapnik, V.N. An overview of statistical learning theory. IEEE Trans. Neural Netw. **1999**, 10, 988–999.
- Haykin, S.; Network, N. A comprehensive foundation. Neural Netw. **2004**, 2, 41.
- Houborg, R.; McCabe, M.F. A hybrid training approach for leaf area index estimation via Cubist and random forests machine-learning. ISPRS J. Photogramm. Remote Sens. **2018**, 135, 173–188.
- Liang, L.; Di, L.; Zhang, L.; Deng, M.; Qin, Z.; Zhao, S.; Lin, H. Estimation of crop LAI using hyperspectral vegetation indices and a hybrid inversion method. Remote Sens. Environ. **2015**, 165, 123–134.
- Weiss, M.; Baret, F. S2ToolBox Level 2 Products: LAI, FAPAR, FCOVER; Institut National de la Recherche Agronomique (INRA), Avignon: Paris, France, 2016.
- Baret, F.; Weiss, M.; Lacaze, R.; Camacho, F.; Makhmara, H.; Pacholcyzk, P.; Smets, B. GEOV1: LAI and FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part1: Principles of development and production. Remote Sens. Environ. **2013**, 137, 299–309.
- Brown, L.A.; Fernandes, R.; Djamai, N.; Meier, C.; Gobron, N.; Morris, H.; Canisius, F.; Bai, G.; Lerebourg, C.; Lanconelli, C. Validation of baseline and modified Sentinel-2 Level 2 Prototype Processor leaf area index retrievals over the United States. ISPRS J. Photogramm. Remote Sens. **2021**, 175, 71–87.
- Kganyago, M.; Mhangara, P.; Alexandridis, T.; Laneve, G.; Ovakoglou, G.; Mashiyi, N. Validation of sentinel-2 leaf area index (LAI) product derived from SNAP toolbox and its comparison with global LAI products in an African semi-arid agricultural landscape. Remote Sens. Lett. **2020**, 11, 883–892.
- Bochenek, Z.; Dąbrowska-Zielińska, K.; Gurdak, R.; Niro, F.; Bartold, M.; Grzybowski, P. Validation of the LAI biophysical product derived from Sentinel-2 and Proba-V images for winter wheat in western Poland. Geoinf. Issues
**2017**, 9, 15–26. [Google Scholar] - Rasmussen, C.E. Gaussian processes in machine learning. In Proceedings of Summer School on Machine Learning, Magdeburg, Germany, 27–29 September 2021; pp. 63–71. [Google Scholar]
- Shawe-Taylor, J.; Cristianini, N. Kernel Methods for Pattern Analysis; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
- Camps-Valls, G.; Bruzzone, L. Kernel Methods for Remote Sensing Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Atzberger, C. Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs. Remote Sens. **2013**, 5, 949–981.
- Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. **2012**, 120, 25–36.
- Mueller-Wilm, U. Sentinel-2 MSI—Level-2A Prototype Processor Installation and User Manual. 2016. Available online: S2-PDGS-MPC-L2A-SUM-V2.4.0.pdf (accessed on 15 June 2021).
- Louis, J.; Debaecker, V.; Pflug, B.; Main-Knorn, M.; Bieniarz, J.; Mueller-Wilm, U.; Cadau, E.; Gascon, F. Sentinel-2 Sen2Cor: L2A Processor for Users. In Proceedings of the Living Planet Symposium 2016, Prague, Czech Republic, 9–13 May 2016; pp. 1–8.
- Adelabu, S.; Mutanga, O.; Adam, E. Testing the reliability and stability of the internal accuracy assessment of random forest for classifying tree defoliation levels using different validation methods. Geocarto Int. **2015**, 30, 810–821.
- Consortium, C.E. Field Crop Boundary data layer (Free State province). In Field Crop Boundary Data Layer (Free State Province), 2017; Consortium, C.E., Ed.; Department of Agriculture, Forestry and Fisheries: Pretoria, South Africa, 2017.
- Abdel-Rahman, E.M.; Mutanga, O.; Odindi, J.; Adam, E.; Odindo, A.; Ismail, R. A comparison of partial least squares (PLS) and sparse PLS regressions for predicting yield of Swiss chard grown under different irrigation water sources using hyperspectral data. Comput. Electron. Agric. **2014**, 106, 11–19.
- Sibanda, M.; Mutanga, O.; Dube, T.; Vundla, T.S.; Mafongoya, P.L. Estimating LAI and mapping canopy storage capacity for hydrological applications in wattle infested ecosystems using Sentinel-2 MSI derived red edge bands. GISci. Remote Sens. **2019**, 56, 68–86.
- Sibanda, M.; Mutanga, O.; Rouget, M.; Kumar, L. Estimating biomass of native grass grown under complex management treatments using worldview-3 spectral derivatives. Remote Sens. **2017**, 9, 55.
- Shah, S.H.; Angel, Y.; Houborg, R.; Ali, S.; McCabe, M.F. A random forest machine learning approach for the retrieval of leaf chlorophyll content in wheat. Remote Sens. **2019**, 11, 920.
- Xu, Y.; Smith, S.E.; Grunwald, S.; Abd-Elrahman, A.; Wani, S.P. Incorporation of satellite remote sensing pan-sharpened imagery into digital soil prediction and mapping models to characterize soil property variability in small agricultural fields. ISPRS J. Photogramm. Remote Sens. **2017**, 123, 1–19.
- Beltran, J.C.; Valdez, P.; Naval, P. Predicting Protein-Protein Interactions based on Biological Information using Extreme Gradient Boosting. In Proceedings of the 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Siena, Italy, 9–11 July 2019; pp. 1–6.
- Mao, H.; Meng, J.; Ji, F.; Zhang, Q.; Fang, H. Comparison of machine learning regression algorithms for cotton leaf area index retrieval using Sentinel-2 spectral bands. Appl. Sci. **2019**, 9, 1459.
- Azodi, C.B.; Tang, J.; Shiu, S.-H. Opening the Black Box: Interpretable machine learning for geneticists. Trends Genet. **2020**, 36, 442–455.
- Moreira, C.; Chou, Y.-L.; Velmurugan, M.; Ouyang, C.; Sindhgatta, R.; Bruza, P. LINDA-BN: An interpretable probabilistic approach for demystifying black-box predictive models. Decis. Support Syst. **2021**, 113561.
- Chun, H.; Keleş, S. Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. Roy. Stat. Soc. Ser. B **2010**, 72, 3–25.
- Wold, H. Estimation of Principal Components and Related Models by Iterative Least Squares; Academic Press: New York, NY, USA, 1966.
- Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory **1968**, 14, 55–63.
- Jolliffe, I.T.; Trendafilov, N.T.; Uddin, M. A modified principal component technique based on the LASSO. J. Comput. Graph. Stat. **2003**, 12, 531–547.
- Zou, H.; Hastie, T.; Tibshirani, R. Sparse principal component analysis. J. Comput. Graph. Stat. **2006**, 15, 265–286.
- Tierney, L. The R statistical computing environment. In Statistical Challenges in Modern Astronomy V; Springer: Berlin/Heidelberg, Germany, 2012; pp. 435–447.
- Chung, D.; Chun, H.; Keles, S. An Introduction to the ‘spls’ Package, Version 1.0; CRAN: University of Wisconsin: Madison, WI, USA, 2012; Available online: https://cran.r-project.org/web/packages/spls/vignettes/spls-example.pdf (accessed on 15 June 2021).
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees, 1st ed.; Routledge: Boca Raton, FL, USA, 1984; p. 368.
- Fawagreh, K.; Gaber, M.M.; Elyan, E. Random forests: From early developments to recent advancements. Syst. Sci. Control Eng. Open Access J. **2014**, 2, 602–609.
- Gislason, P.O.; Benediktsson, J.A.; Sveinsson, J.R. Random Forests for land cover classification. Pattern Recognit. Lett. **2006**, 27, 294–300.
- Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. **2012**, 67, 93–104.
- Mutanga, O.; Adam, E.; Cho, M.A. High density biomass estimation for wetland vegetation using WorldView-2 imagery and random forest regression algorithm. Int. J. Appl. Earth Obs. Geoinf. **2012**, 18, 399–406.
- Okun, O.; Priisalu, H. Random forest for gene expression based cancer classification: Overlooked issues. In Proceedings of the Iberian Conference on Pattern Recognition and Image Analysis, Girona, Spain, 6–8 June 2007; pp. 483–490.
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News **2002**, 2, 18–22.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. **2001**, 29, 1189–1232.
- Richter, K.; Hank, T.B.; Mauser, W.; Atzberger, C. Derivation of biophysical variables from Earth observation data: Validation and statistical measures. J. Appl. Remote Sens. **2012**, 6, 063557.
- Gara, T.W.; Darvishzadeh, R.; Skidmore, A.K.; Wang, T.; Heurich, M. Accurate modelling of canopy traits from seasonal Sentinel-2 imagery based on the vertical distribution of leaf traits. ISPRS J. Photogramm. Remote Sens. **2019**, 157, 108–123.
- Rivera-Caicedo, J.P.; Verrelst, J.; Muñoz-Marí, J.; Camps-Valls, G.; Moreno, J. Hyperspectral dimensionality reduction for biophysical variable statistical retrieval. ISPRS J. Photogramm. Remote Sens. **2017**, 132, 88–101.
- Upreti, D.; Huang, W.; Kong, W.; Pascucci, S.; Pignatti, S.; Zhou, X.; Ye, H.; Casa, R. A comparison of hybrid machine learning algorithms for the retrieval of wheat biophysical variables from sentinel-2. Remote Sens. **2019**, 11, 481.
- Konstantinov, A.V.; Utkin, L.V. Interpretable machine learning with an ensemble of gradient boosting machines. Knowl.-Based Syst. **2021**, 222, 106993.
- Delloye, C.; Weiss, M.; Defourny, P. Retrieval of the canopy chlorophyll content from Sentinel-2 spectral bands to estimate nitrogen uptake in intensive winter wheat cropping systems. Remote Sens. Environ. **2018**, 216, 245–261.
- ESA. Sentinel 2 Mission Requirements Document; ESA: Paris, France, 2019.
- Ramoelo, A.; Cho, M.A.; Mathieu, R.; Madonsela, S.; Van De Kerchove, R.; Kaszta, Z.; Wolff, E. Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data. Int. J. Appl. Earth Obs. Geoinf. **2015**, 43, 43–54.
- Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating chlorophyll content from hyperspectral vegetation indices: Modeling and validation. Agric. For. Meteorol. **2008**, 148, 1230–1241.
- Verrelst, J.; Alonso, L.; Camps-Valls, G.; Delegido, J.; Moreno, J. Retrieval of vegetation biophysical parameters using Gaussian process techniques. IEEE Trans. Geosci. Remote Sens. **2011**, 50, 1832–1843.
- Curran, P.J. Remote sensing of foliar chemistry. Remote Sens. Environ. **1989**, 30, 271–278.
- Fernandes, R.; Weiss, M.; Camacho, F.; Berthelot, B.; Baret, F.; Duca, R. Development and assessment of leaf area index algorithms for the Sentinel-2 multispectral imager. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 3922–3925.
- Djamai, N.; Fernandes, R. Comparison of SNAP-derived Sentinel-2A L2A product to ESA product over Europe. Remote Sens. **2018**, 10, 926.

**Figure 1.** Location of Bothaville (orange) in Free State province (dark grey), South Africa. The left panel shows the land cover distribution in 2018, and the right panel shows a Sentinel-2 false-colour composite (Red: B8, Green: B4, Blue: B3) acquired on 14 April 2021.

**Figure 2.** Scatterplots for Leaf Area Index (LAI, m^{2} m^{−2}; **a**–**c**), Leaf Chlorophyll Content (LC_{ab}, µg cm^{−2}; **d**–**f**), and Canopy Chlorophyll Content (CCC, µg cm^{−2}; **g**–**i**) showing the performance of sparse Partial Least Squares (sPLS; **a**, **d**, **g**), Random Forest (RF; **b**, **e**, **h**), and Gradient Boosting Machine (GBM; **c**, **f**, **i**) with Sentinel-2 data.

**Figure 3.** Spatial variation of Leaf Area Index (LAI, m^{2} m^{−2}; **a**, **d**, **g**), Leaf Chlorophyll Content (LC_{ab}, µg cm^{−2}; **b**, **e**, **h**), and Canopy Chlorophyll Content (CCC, µg cm^{−2}; **c**, **f**, **i**) estimated from Sentinel-2 data with sparse Partial Least Squares (sPLS; **a**–**c**), Random Forest (RF; **d**–**f**), and Gradient Boosting Machine (GBM; **g**–**i**).

**Figure 4.** Important Sentinel-2 spectral bands (ranked from most to least important) for estimating Leaf Area Index (LAI; **a**, **d**), Leaf Chlorophyll Content (LC_{ab}; **b**, **e**), and Canopy Chlorophyll Content (CCC; **c**, **f**) with Random Forest (RF; **a**–**c**) and Gradient Boosting Machine (GBM; **d**–**f**).
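
Figure 4 ranks bands by an importance score. As a rough illustration of how such a ranking can be produced, the sketch below computes model-agnostic permutation importance: each band is shuffled in turn and the resulting increase in RMSE is recorded. It assumes the fitted model is exposed as a `predict` function; the function names, band labels, and toy model are hypothetical. The idea is similar in spirit to randomForest's %IncMSE, whereas the default `summary()` of the 'gbm' package reports relative influence, which is computed differently, so this is not a reproduction of the rankings in Figure 4.

```python
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in RMSE when one feature column is shuffled.

    `predict` maps a list of feature rows to a list of predictions.
    Returns (scores, ranking); ranking lists feature indices from most to least important.
    """
    rng = random.Random(seed)
    base = rmse(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - base)
        scores.append(sum(increases) / n_repeats)
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return scores, ranking


# Toy example with hypothetical band labels: the fixed "model" uses B8 strongly,
# B5 weakly, and ignores B2 entirely.
bands = ["B8", "B5", "B2"]
rng = random.Random(1)
X = [[rng.random() for _ in bands] for _ in range(200)]
y = [3 * r[0] + 0.5 * r[1] for r in X]


def toy_model(rows):
    return [3 * r[0] + 0.5 * r[1] for r in rows]


scores, ranking = permutation_importance(toy_model, X, y)
print([bands[j] for j in ranking])  # ['B8', 'B5', 'B2']
```

Shuffling a band, rather than deleting it, keeps its marginal distribution while breaking its relationship with the target, so the model does not need to be refitted for each band.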

**Table 1.** Descriptive statistics of the calibration (70%) and validation (30%) datasets for the measured LAI (m^{2} m^{−2}), LC_{ab} (µg cm^{−2}), and CCC (µg cm^{−2}).

Variable | Dataset | n | Min | Mean | Max | SD |
---|---|---|---|---|---|---|
LAI | Calibration | 113 | 1.78 | 3.35 | 5.57 | 0.86 |
| Validation | 48 | 2.02 | 3.60 | 5.75 | 1 |
LC_{ab} | Calibration | 113 | 4.06 | 33.87 | 66.18 | 14.93 |
| Validation | 48 | 3.69 | 32.62 | 70.69 | 19.20 |
CCC | Calibration | 113 | 10.30 | 105.69 | 288.22 | 61.60 |
| Validation | 48 | 7.87 | 116.42 | 339.10 | 90.39 |
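
The 113/48 partition in Table 1 corresponds to a 70:30 split of 161 samples (0.7 × 161 ≈ 113). A minimal sketch of such a split and of the statistics reported in the table follows. It assumes a simple random split and a sample standard deviation (n − 1 denominator); neither detail is confirmed in this section, the function names are hypothetical, and the LAI values are synthetic.

```python
import random
from statistics import mean, stdev


def split_calibration_validation(samples, cal_fraction=0.7, seed=42):
    """Randomly partition samples into calibration (~70%) and validation (~30%) sets.

    Assumption: a simple random split; the study's exact sampling scheme may differ.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_cal = round(cal_fraction * len(shuffled))
    return shuffled[:n_cal], shuffled[n_cal:]


def describe(values):
    """n, Min, Mean, Max and SD (sample standard deviation, n - 1), as in Table 1."""
    return {
        "n": len(values),
        "Min": round(min(values), 2),
        "Mean": round(mean(values), 2),
        "Max": round(max(values), 2),
        "SD": round(stdev(values), 2),
    }


# Hypothetical LAI values for 161 plots (NOT the field measurements)
rng = random.Random(0)
lai = [rng.uniform(1.5, 6.0) for _ in range(161)]
cal, val = split_calibration_validation(lai)
print(describe(cal)["n"], describe(val)["n"])  # 113 48
```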

**Table 2.** Gradient Boosting Machine (GBM) parameters that require optimisation in the ‘gbm’ R package, with their descriptions.

Parameter | Description |
---|---|
Number of trees (T) | The total number of trees to fit, i.e., the number of boosting iterations. |
Tree depth (K) | The depth of each tree, which determines the number of splits per tree and thus controls the complexity of the boosted ensemble. |
Learning rate (λ) | Shrinks the contribution of each tree and so controls how quickly the algorithm descends the gradient. Smaller values generally improve performance and reduce the chance of overfitting, but require more trees. |
Subsample (p) | The fraction of training observations randomly sampled to grow each tree. For example, a value of 0.5 makes GBM fit each tree on a random half of the training data; this implements stochastic gradient boosting and helps prevent overfitting. Values must lie between 0 and 1. |
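
To show how the four parameters in Table 2 interact, the following is a small pure-Python sketch of stochastic gradient boosting for squared-error loss, in the spirit of Friedman's gradient boosting machine. It is not the ‘gbm’ package's implementation: all function names and the toy data are hypothetical, `n_trees` plays the role of T, `depth` of K, `learning_rate` of λ, and `subsample` of p, while `min_leaf` only loosely mirrors n.minobsinnode.

```python
import random


def fit_tree(X, r, depth, min_leaf=2):
    """Greedy least-squares regression tree with at most `depth` levels of splits.

    Nodes are ("leaf", value) or ("split", feature, threshold, left, right).
    """
    if depth == 0 or len(r) < 2 * min_leaf:
        return ("leaf", sum(r) / len(r))
    best = None  # (sse, feature, threshold, left_rows, right_rows)
    for j in range(len(X[0])):
        order = sorted(range(len(r)), key=lambda i: X[i][j])
        for k in range(min_leaf, len(order) - min_leaf + 1):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates identical values
            left, right = order[:k], order[k:]
            ml = sum(r[i] for i in left) / len(left)
            mr = sum(r[i] for i in right) / len(right)
            sse = sum((r[i] - ml) ** 2 for i in left) + sum((r[i] - mr) ** 2 for i in right)
            if best is None or sse < best[0]:
                best = (sse, j, (lo + hi) / 2, left, right)
    if best is None:  # every feature is constant within this node
        return ("leaf", sum(r) / len(r))
    _, j, thr, left, right = best
    return ("split", j, thr,
            fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1, min_leaf))


def predict_tree(node, x):
    while node[0] == "split":
        _, j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node[1]


def fit_gbm(X, y, n_trees=100, depth=3, learning_rate=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting with squared-error loss."""
    rng = random.Random(seed)
    f0 = sum(y) / len(y)  # initial constant prediction
    pred = [f0] * len(y)
    n_sub = max(1, int(subsample * len(y)))
    trees = []
    for _ in range(n_trees):
        # For L = (y - F)^2 / 2 the negative gradient w.r.t. F is the residual y - F
        resid = [yi - pi for yi, pi in zip(y, pred)]
        # Each tree sees a random subsample drawn without replacement
        idx = rng.sample(range(len(y)), n_sub)
        tree = fit_tree([X[i] for i in idx], [resid[i] for i in idx], depth)
        trees.append(tree)
        # The learning rate shrinks each tree's contribution
        pred = [p + learning_rate * predict_tree(tree, x) for p, x in zip(pred, X)]
    return {"f0": f0, "lr": learning_rate, "trees": trees}


def predict_gbm(model, x):
    return model["f0"] + model["lr"] * sum(predict_tree(t, x) for t in model["trees"])


# Usage on a toy 1-D curve (hypothetical data, not Sentinel-2 bands)
X = [[i / 20] for i in range(40)]
y = [3 * x[0] ** 2 for x in X]
model = fit_gbm(X, y, n_trees=100, depth=2, learning_rate=0.1, subsample=0.5)
fitted = [predict_gbm(model, x) for x in X]
```

Lowering `learning_rate` in this sketch makes each tree's correction smaller, which is why more trees are then needed to reach the same training error.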

**Table 3.** Optimal tuning parameters used to train the Leaf Area Index (LAI, m^{2} m^{−2}), Leaf Chlorophyll Content (LC_{ab}, µg cm^{−2}), and Canopy Chlorophyll Content (CCC, µg cm^{−2}) models, based on a grid-search parameterisation strategy and k-fold cross-validation.

Variable | sPLS | RF | GBM |
---|---|---|---|
LAI | eta = 0.7; K = 5; p = 10; RMSE_{CV} = 0.8 | m_{try} = 5; OOB error = 0.34 | n_{trees} = 146; interaction depth = 5; shrinkage = 0.1; n.minobsinnode = 15; RMSE_{CV} = 0.65 |
LC_{ab} | eta = 0.9; K = 5; p = 7 *; RMSE_{CV} = 7.58 | m_{try} = 3; OOB error = 7.21 | n_{trees} = 28; interaction depth = 3; shrinkage = 0.1; n.minobsinnode = 15; RMSE_{CV} = 7.56 |
CCC | eta = 0.8; K = 5; p = 10; RMSE_{CV} = 41.22 | m_{try} = 2; OOB error = 34.56 | n_{trees} = 94; interaction depth = 5; shrinkage = 0.1; n.minobsinnode = 15; RMSE_{CV} = 37.21 |
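
The grid-search and k-fold cross-validation strategy behind Table 3 can be sketched as follows. The sketch assumes RMSE_{CV} is the mean of the per-fold RMSEs (the convention used by R's caret package; this section does not state which software or averaging was used) and uses a k-nearest-neighbour regressor purely as a hypothetical stand-in so the example stays short. In the study the grid would instead range over, e.g., eta and K for sPLS, or the number of trees, interaction depth, and shrinkage for GBM; the RF column reports an out-of-bag (OOB) error instead.

```python
import itertools
import random


def rmse(y_true, y_pred):
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def grid_search_cv(fit_predict, X, y, grid, n_folds=5, seed=0):
    """Exhaustive grid search scored by k-fold cross-validated RMSE.

    fit_predict(X_train, y_train, X_test, **params) must return predictions for X_test.
    Assumption: RMSE_CV is the mean of the per-fold RMSEs.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # same folds for every candidate
    names = sorted(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            pred = fit_predict([X[i] for i in train], [y[i] for i in train],
                               [X[i] for i in test], **params)
            fold_scores.append(rmse([y[i] for i in test], pred))
        results.append((params, sum(fold_scores) / n_folds))
    best_params, best_rmse = min(results, key=lambda r: r[1])
    return best_params, best_rmse, results


def knn_fit_predict(X_train, y_train, X_test, n_neighbors):
    """Hypothetical stand-in model: mean target of the n_neighbors nearest training rows."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
        preds.append(sum(y_train[i] for i in nearest[:n_neighbors]) / n_neighbors)
    return preds


# Usage on synthetic data (not the study's samples)
rng = random.Random(3)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [2 * a + b for a, b in X]
best, rmse_cv, results = grid_search_cv(knn_fit_predict, X, y,
                                        {"n_neighbors": [1, 3, 5, 10]})
print(best, round(rmse_cv, 3))
```

Because every parameter combination is scored on the same folds, differences in RMSE_{CV} reflect the parameters rather than the fold assignment.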

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kganyago, M.; Mhangara, P.; Adjorlolo, C.
Estimating Crop Biophysical Parameters Using Machine Learning Algorithms and Sentinel-2 Imagery. *Remote Sens.* **2021**, *13*, 4314.
https://doi.org/10.3390/rs13214314

**Chicago/Turabian Style**

Kganyago, Mahlatse, Paidamwoyo Mhangara, and Clement Adjorlolo.
2021. "Estimating Crop Biophysical Parameters Using Machine Learning Algorithms and Sentinel-2 Imagery" *Remote Sensing* 13, no. 21: 4314.
https://doi.org/10.3390/rs13214314