Predicting Canopy Chlorophyll Content in Sugarcane Crops Using Machine Learning Algorithms and Spectral Vegetation Indices Derived from UAV Multispectral Imagery

The use of satellite-based remote sensing (RS) is a well-developed field of research, and RS techniques have been successfully used to evaluate chlorophyll content for the monitoring of sugarcane crops. This research provides a new framework for inferring the chlorophyll content of sugarcane crops at the canopy level using unmanned aerial vehicles (UAVs) and spectral vegetation indices processed with multiple machine learning algorithms. Studies were conducted in a sugarcane field at the Sugarcane Research Institute (SRI), Uda Walawe, Sri Lanka, with various fertilizer applications over the entire growing season from 2020 to 2021. A UAV with a multispectral camera was used to collect the aerial images from which the vegetation indices were generated. Ground measurements of leaf chlorophyll were used as indicators of fertilizer status in the sugarcane field. Different machine learning (ML) algorithms were trained on the ground-truth chlorophyll data and the spectral vegetation indices to predict sugarcane chlorophyll content. Seven algorithms (MLR, RF, DT, SVR, XGB, KNN and ANN) were applied in two ways: before feature selection (BFS), by training the algorithms with all twenty-four (24) vegetation indices and five (5) spectral bands, and after feature selection (AFS), by training the algorithms with fifteen (15) selected vegetation indices. All algorithms under both the BFS and AFS methods were compared using the coefficient of determination (R²) and the root mean square error (RMSE). Spectral indices such as RVI and DVI were shown to be the most reliable indices for estimating chlorophyll content in sugarcane fields, with coefficients of determination (R²) of 0.94 and 0.93, respectively. The XGB model shows the highest validation score (R²) and lowest RMSE in both BFS (0.96 and 0.14) and AFS (0.98 and 0.78), respectively, whereas the KNN and SVR algorithms show the lowest validation accuracy of all models. According to the results, the AFS validation score is higher than the BFS score for MLR, SVR, XGB and KNN, while the validation score of the ANN model decreases under AFS. The findings demonstrate that multispectral UAV imagery can be used to estimate chlorophyll content and assess crop health status over a larger sugarcane field. This methodology will aid real-time crop nutrition management in sugarcane plantations by reducing the need for conventional measurement of sugarcane chlorophyll content.


Introduction
The use of UAVs for agriculture and plant biosecurity is rapidly increasing [1][2][3][4][5][6], and the use of UAV remote sensing for precision agriculture (PA) has grown dramatically [7]. UAV-based remote sensing (RS) has developed rapidly as a method of capturing high-resolution images from close to the Earth's surface [8][9][10][11][12][13]. Several remote sensing applications have proven to be a valuable source of reflectance data for estimating crop canopy variables relating to biophysical, physiological, or biochemical properties [14]. Many crop monitoring criteria have already been shown to be addressable with remote sensing data and methodologies [15]. Remote sensing methods enable monitoring of agricultural fields by detecting variations in chlorophyll content over a large area in a short time [16]. Remote sensing of plant spectral responses has been demonstrated to be a promising, non-destructive method for capturing changes in vegetation attributes [17].
Evolving UAV platforms provide several benefits (for example, they are economical, versatile, and less affected by environmental variables), as well as the capability to collect data at high temporal and spatial resolution [18]. Advances in low-altitude remote sensing technology, such as UAVs, provide a high temporal and spatial resolution solution for non-destructive, quick, and accurate assessment of the biophysical parameters of various crops [19]. Satellite and manned aircraft-based remote sensing platforms can also monitor crop status across large fields and measure numerous crop and environmental parameters in real time based on precise spectral information. However, limited spatial and temporal resolution and high equipment costs are the major constraints of satellite remote sensing compared with UAV applications. Even though UAVs are highly sensitive to environmental factors [20], and their payload capacity and flying time are lower than those of satellite or manned aircraft platforms [20], the UAV remains a viable technology and a good aerial platform for farmers, with the benefits of high spatial and temporal resolution [21,22].
UAV remote sensing platforms equipped with different sensors, including RGB, multispectral and hyperspectral cameras, have emerged as a viable option for rapid high-throughput phenotyping due to their flexibility and convenience, on-demand data access, and high spatial resolution [23,24]. Previous work has illustrated the application and suitability of the different UAV cameras used in smart farming. RGB cameras are highly suited to the determination of canopy height and lodging. In contrast, multispectral cameras are highly suited to drought stress detection, pathogen detection, estimation of nutrients, determination of growth vigour, and yield prediction. Hyperspectral and multispectral cameras are more suitable for the identification of pests and diseases, weed detection, and estimation of nutrient status [25].
Sugarcane (Saccharum officinarum) is a perennial crop commonly planted throughout the tropical and subtropical regions of the world [26]. Many natural and man-made disturbances and stressors directly impact chlorophyll content, and chlorophyll is the principal pigment that drives photosynthesis [16]. Accurate measurement of leaf chlorophyll concentration is critical for examining overall plant health and for regulating fertilizer application and other inputs [17]. Chlorophyll content is associated with nitrogen concentration in vegetation and is an indicator of photosynthetic activity [15]. Traditional methods for pigment analysis, such as spectrophotometry, destructive leaf sampling, or high-performance liquid chromatography (HPLC), cannot quantify changes in pigmentation over a short time [19]. Furthermore, these methods are time-consuming and costly, which makes them impractical for assessing crop health and nutritional status over a large area and discourages routine crop monitoring. Reliable, efficient, and practical methods for estimating this biophysical parameter are therefore required [19,27,28].

Application of Remote Sensing on Sugarcane Crops
Studying the spectrometric response of leaves is essential because spectral properties enable non-destructive monitoring of plant growth and health [29]. Plant nutritional analysis has been made easier by spectral vegetation indices derived from UAV images combined with machine learning techniques [30]. Cui et al. [31] designed a new method for estimating chlorophyll content in winter wheat based on crop canopy reflectance, with low sensitivity to the leaf area index (LAI) and consistent sensitivity across crop growth situations. Gitelson et al. [32] established a conceptual model to remotely estimate chlorophyll in maize and soybean canopies, which is critical for regional and global carbon balance and for fertilizer management. Ballester et al. [33] performed experiments to verify that the green-red vegetation index (GRVI) and the red edge ratio (RE/R) derived from UAS imagery could be used to monitor the effects of soil water status in cotton.
Remote Sens. 2022, 14, 1140
Nitrogen deficiency reduces leaf chlorophyll concentration, which increases the leaf's transmittance at visible wavelengths. As a result, radiation reflected from the crop has been used to measure chlorophyll content. Leaf pigments show diverse spectral behaviour, with characteristic absorption at different wavelengths, which allows chlorophyll concentration to be estimated by remote sensing tools [19]. Spectral indices are a strong remote sensing feature for measuring plant nitrogen concentrations [34], and spectral vegetation analysis is therefore a viable alternative for estimating plant health [30]. On the other hand, these indices respond to a combination of changes in several vegetation and environmental variables, such as the LAI and leaf chlorophyll content [15]. Shah et al. [17] combined standard statistical methods with random forest regression, using 45 existing vegetation indices, to assess the potential of hyperspectral data for estimating chlorophyll in wheat. A typical univariate regression analysis was used as a baseline to model the association between observed chlorophyll and the selected vegetation indices.

Machine Learning for Crop Health and Chlorophyll Content
ML algorithms have recently been applied to a variety of remote sensing applications to monitor and measure crop health and crop parameters [35]. ML approaches try to create an empirical relationship between the independent factors and yield, which gives them the advantage of forecasting production without relying on specific crop attributes [36]. Artificial neural networks (ANN), random forests (RF), support vector machines (SVM), decision trees (DT), and other algorithms are useful for UAV-based image processing [30]. Multiple linear regression (MLR) is a statistical technique that predicts the outcome of a dependent variable using multiple independent variables. RF regression is a supervised ensemble learning approach that is widely used in remote sensing-based agricultural research [17,37]. The decision tree (DT) uses a tree topology to generate regression or classification models: it incrementally splits a dataset into smaller subsets while simultaneously growing the associated tree. Support vector regression (SVR) extends the support vector machine (SVM) to regression problems. An SVM builds an optimal separating hyperplane to distinguish classes that overlap and are not linearly separable; kernel functions map the data into a high-dimensional feature space in which a linear boundary can be applied [38].
ML technologies have been used to predict crop parameters [37]. For example, winter wheat biomass estimation was carried out using a visualization approach for SVR and an investigation of influential textures [39]. Extreme gradient boosting (XGB) is an efficient implementation of gradient boosting, a class of ensemble machine learning techniques for classification and regression predictive modelling tasks. K-nearest neighbours (KNN) regression, RF, and SVR are non-parametric techniques that approximate the relationship between the independent variables and a continuous outcome. ANN models use computation and mathematics to imitate processes of the human brain; their architecture is inspired by that of a biological nervous system, and they are composed of a complex, nonlinear network of neurons. Han et al. [40] showed that ANN is more effective than random forest regression in estimating maize above-ground biomass.
Zhang et al. [18] used four structure VIs and two chlorophyll VIs, together with three regression algorithms (MLR, ANN and RF), in a maize field in Inner Mongolia, China, to relate maize chlorophyll and vegetation indices (VIs) to crop water stress. Moran et al. [41] examined narrowband vegetation indices and multivariate approaches such as PLSR and RF regression to estimate forest canopy chlorophyll content from airborne hyperspectral data. The study in [26] used spectral data from a spaceborne hyperspectral image to determine the spatial variation of sugarcane canopy nitrogen concentration using MLR and SVR. Xu et al. [42] used six regression algorithms (MLR, SMR, KRLS, GLM, GBM and RF) to assess yield in sugarcane crops using ML with UAV-lidar data. Canata et al. [43] used Sentinel-2 multi-temporal imagery and found that the RF regression method enabled the development of predictive yield models for commercial sugarcane fields, concluding that RF regression was more accurate (lower RMSE and higher R²) than MLR. RF has lately gained popularity in remote sensing research for classification and regression, and the variable importance plot of the RF algorithm is particularly good at identifying the most important input variables in the model [41]. Using spectral vegetation indices computed from UAV imagery and the RF method, Osco et al. [30] provided a new framework to infer nitrogen content in citrus trees at the canopy level. Feng et al. [36] used UAV-based hyperspectral imaging and ensemble learning to forecast alfalfa yield using SVR, KNN, and RF. Lee et al. [44] used three empirical methods (linear regression, RF, and SVR) to statistically connect spectral data and nitrogen levels in two corn fields in Canada. Table 1 lists the different ML algorithms with their equations and optimal hyperparameter values.
Combining machine learning techniques with spectral vegetation indices is a relatively new practice that overcomes the limitations of the conventional method of determining the amount of chlorophyll in plants, which is time-consuming and labour-intensive [30]. Only a few studies have found a link between leaf nitrogen levels and chlorophyll content. Therefore, the major goal of this paper was to assess high-resolution multispectral UAV images for non-destructive measurement of the chlorophyll content of sugarcane crops. There were three sub-objectives: (1) to correlate the vegetation indices with the variations of chlorophyll in the field; (2) to compare the validation performance of the before feature selection (BFS) and after feature selection (AFS) approaches in selected ML models; and (3) to assess the performance of different ML methods in predicting chlorophyll content.

Study Site
The study was carried out during the September 2021 sugarcane growing season in a 1512 m² field at the Sugarcane Research Institute (SRI), Uda Walawe, Sri Lanka, as shown in Figure 1. The sugarcane variety SL 96 128 was planted on reddish-brown earth (RBE) soil. Average climatic data (Table 2) for the study period were collected from a weather station located at the SRI, Uda Walawe.

Experimental Design
The whole sugarcane field was divided into twelve (12) fertilizer treatments with three replications, as shown in Figure 2 and Table 3. Altogether, thirty-six (36) blocks (7 m × 6 m) were laid out across the treatments, and 90 three-budded setts were planted per block. Two subblocks (1.5 m × 1.5 m) were selected randomly as sampling sites in each block, as the average canopy area of each plant is 1.5 m².

Ground Truth Data Collection
This study used ground measurements of chlorophyll (SPAD readings) as references for sugarcane nutrient status, as shown in Figure 3. Chlorophyll content was measured using a SPAD-502 Plus chlorophyll meter (accuracy of ±1.0; Konica Minolta Optics Inc., Osaka, Japan). Three upper leaves were selected for SPAD readings within each subblock. A total of 216 SPAD readings were collected during the vegetative stage of the 5-month-old plants to build the different ML models, and the sample locations were geo-located using a Triton 2000 handheld GPS receiver (Magellan, CA, USA).
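The sampling figures quoted in the text can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check; it assumes one SPAD reading per selected leaf, which the text implies but does not state outright, and all variable names are illustrative.

```python
# Design figures taken from the text; one reading per leaf is an assumption.
TREATMENTS = 12
REPLICATIONS = 3
BLOCK_SIDES_M = (7, 6)        # block dimensions in metres
SUBBLOCKS_PER_BLOCK = 2
LEAVES_PER_SUBBLOCK = 3

blocks = TREATMENTS * REPLICATIONS
field_area_m2 = blocks * BLOCK_SIDES_M[0] * BLOCK_SIDES_M[1]
spad_readings = blocks * SUBBLOCKS_PER_BLOCK * LEAVES_PER_SUBBLOCK

print(blocks, field_area_m2, spad_readings)  # 36 1512 216
```

Under that assumption the 36 blocks account for both the 1512 m² field area and the 216 SPAD readings reported.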

Acquisition and Preprocessing of UAV Multispectral Images
A DJI P4 Multispectral system (Da-Jiang Innovations (DJI), Shenzhen, Guangdong, China) was used to conduct a UAV flight mission on a sunny day between 12:30 p.m. and 2:00 p.m. (Sri Lankan standard time) during the sugarcane growing season. The DJI P4 multispectral camera covers the visible to near-infrared range with five bands (blue, green, red, red edge, and near infrared) centred at 450.0, 560.0, 650.0, 730.0 and 840.0 nm, respectively. The flight altitude above ground, flight speed, and ground sample distance were 15 m, 6 m·s⁻¹ and 1.42 cm, respectively. The front and side overlap of images along the flight line was 80% and 70%, respectively, as shown in Table 4. Six ground control points (GCPs) were used to improve geolocation accuracy in post-processing. Image mosaicking was carried out with Agisoft Metashape (version 1.6.6, Agisoft LLC, St. Petersburg, Russia).

Estimation of Vegetation Indices
The reflectance values in the red, green, blue, red edge and NIR bands captured by the UAV were used to generate several VIs. Twenty-four (24) VIs were estimated, as shown in Table 5, to demonstrate the feasibility of using sugarcane vegetation indices to predict chlorophyll content. In each block, two rectangular regions of interest (ROIs) were identified on the aerial vegetation index map based on the GPS coordinates collected during ground truth measurement, and the average VI value inside each ROI was determined. Generation of the vegetation indices and extraction of index values were performed using the open-source geographic information system QGIS (version 3.20) [45].
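As a rough illustration of this step, the sketch below computes a handful of vegetation indices from five-band reflectance and averages them over the pixels of an ROI. It uses the commonly cited definitions of NDVI, GNDVI, NDRE, RVI, DVI and EVI, which may differ in detail from the formulations in Table 5. The function names and pixel values are hypothetical, and the study itself performed this step in QGIS rather than in Python.

```python
from statistics import mean

def vegetation_indices(blue, green, red, red_edge, nir):
    """Return a few widely used VIs for one pixel of band reflectances (0 to 1)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "RVI": nir / red,
        "DVI": nir - red,
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
    }

def roi_mean_indices(pixels):
    """Average each index over all pixels inside a region of interest."""
    per_pixel = [vegetation_indices(*p) for p in pixels]
    return {name: mean(v[name] for v in per_pixel) for name in per_pixel[0]}

# Hypothetical reflectances: (blue, green, red, red edge, NIR) per pixel.
roi_pixels = [
    (0.04, 0.08, 0.05, 0.30, 0.45),
    (0.05, 0.10, 0.06, 0.28, 0.46),
]
print(roi_mean_indices(roi_pixels))
```

Averaging the per-pixel index values, rather than computing an index from averaged bands, mirrors the description of taking the mean VI inside each ROI of the index map.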

Machine Learning Modelling and Statistical Analysis
Statistical analysis was used to examine and establish an association between the UAV-derived vegetation indices and the ground-truth SPAD readings through different machine learning models, using Python (version 3.8.10). Pearson's correlation coefficient, a feature selection technique, was used to identify the vegetation indices most sensitive to chlorophyll, and the reflectance features with the highest correlation coefficients (R²) were used to develop machine learning models to predict sugarcane chlorophyll content accurately. In this study, seven (7) machine learning regression algorithms (MLR, RF, DT, SVR, XGB, KNN and ANN) were compared for predicting sugarcane chlorophyll concentration based on VIs derived from the reflectance images. The root mean square error (RMSE) and the coefficient of determination, or validation score (R²), were calculated for training and validation to compare the algorithms and select the best-fit algorithm for chlorophyll prediction in the sugarcane field. In statistics and machine learning, feature selection refers to the process of selecting a subset of relevant features (predictors or variables) for inclusion in a model, that is, automatically selecting the data attributes (such as columns in tabular data) that are most significant and pertinent to the predictive modelling problem at hand. The aim is to reduce the input variables to those deemed most useful for predicting the target variable. Based on the previous studies listed in Table 5, 24 VIs were selected and correlated with the ground truth measurements. After estimating the correlation values for all VIs (before feature selection), 15 VIs were retained (after feature selection) because their absolute Pearson correlation with chlorophyll content exceeded 0.5, in order to improve model performance and reduce the training time of the ML models. Finally, a total of 216 samples were used to build the different ML models.
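The correlation-based filter described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names, the two example features and their values are hypothetical, and only the 0.5 absolute-correlation threshold comes from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_features(features, target, threshold=0.5):
    """Keep features whose absolute correlation with the target exceeds the threshold."""
    scores = {name: pearson_r(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) > threshold}

# Hypothetical SPAD readings and two candidate features.
spad = [30, 35, 40, 45, 50]
candidates = {
    "RVI": [4.0, 5.1, 5.9, 7.2, 8.0],      # rises with SPAD
    "noise": [0.3, 0.1, 0.4, 0.1, 0.3],    # unrelated to SPAD
}
selected = select_features(candidates, spad)
print(sorted(selected))  # ['RVI']
```

Using the absolute value keeps strongly negative predictors as well as strongly positive ones, which matches the "±0.5" reading of the threshold.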

One Way ANOVA Statistical Analysis for Different Treatments and Chlorophyll Content
A one-way ANOVA test was performed to test for a significant relationship between the twelve (12) fertilizer treatments and sugarcane chlorophyll content. The result shows a significant effect (p = 0.001) of fertilizer treatment on chlorophyll content. Figure 4 compares the treatment means and the variability of the chlorophyll readings, and Figure 5 shows the quantile-quantile (Q-Q) plot, which confirms that the data lie adequately close to the theoretical reference line, indicating a sound model fit.
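For readers unfamiliar with the test, the one-way ANOVA F statistic is the ratio of between-group to within-group mean squares. The sketch below computes it for small made-up groups; the function name and data are illustrative only. It does not compute the p-value, which requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical treatments with three readings each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 2), df_b, df_w)  # 13.0 2 6
```

If all 216 readings from the 12 treatments were treated as independent observations, the corresponding degrees of freedom would be 11 and 204; whether the study analysed individual readings or block averages is not stated.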

Correlation between Vegetation Indices and Sugarcane Chlorophyll Content
Pearson's correlation coefficients (R²) for the relationship between the VIs and sugarcane chlorophyll content are shown in Figure 6, and the detailed correlation matrix is shown in Figure A1 (Appendix A). Pearson's correlation test was performed to select the features that are essential for training the ML algorithms. The RVI showed the highest positive correlation with chlorophyll content (R² value: 0.94), followed closely by the DVI (R² value: 0.93). After the RVI and DVI, other vegetation indices such as NDRE, GNDVI, LCI, EVI, and NDVI showed positive correlation coefficients of 0.86, 0.86, 0.85, 0.84, and 0.82, respectively.

Prediction of Sugarcane Chlorophyll Content by Using Machine Learning Algorithms
Different ML techniques, including MLR, RF, DT, SVR, XGB, KNN and ANN, were developed using two methods. The first is before feature selection (BFS): training the algorithms with all twenty-four (24) vegetation indices and five (5) spectral bands. The second is after feature selection (AFS): training the algorithms with the fifteen (15) selected vegetation indices. The two methods were compared using the coefficient of determination (R²) and the root mean square error (RMSE), as shown in Figure 7.
As shown in Table 6, the XGB model shows the highest validation score (R²) and lowest RMSE in both BFS (0.96 and 0.14) and AFS (0.98 and 0.78), respectively. For RF, both R² values derived from the validation data set were lower than those of XGB. The MLR model also shows good training and validation scores in both methods. However, the KNN and SVR algorithms show the lowest validation accuracy of all models. When comparing the two approaches, the validation score increases under AFS for MLR, SVR, XGB and KNN. RF and DT show no change in validation score between the two methods, whereas the validation score of the ANN model decreases under AFS.
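The two evaluation metrics used throughout this comparison can be written out explicitly. The sketch below implements the standard definitions of RMSE and the coefficient of determination; the function names and the example values are hypothetical and are not taken from Table 6.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observed and predicted SPAD readings.
observed = [30, 40, 50]
predicted = [32, 40, 48]
print(round(rmse(observed, predicted), 3), round(r2_score(observed, predicted), 2))  # 1.633 0.96
```

RMSE is expressed in the units of the target (here SPAD units), while R² is unitless, so the two are best read together when ranking models.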

Discussion
In this study, we used canopy-level multispectral data to estimate leaf chlorophyll in sugarcane crops and compared different ML architectures. Chlorophyll has long been thought to be the most crucial pigment for detecting nutritional stress. The results indicated that chlorophyll content alone could be used to assess sugarcane nutrient stress when the canopy structure was responsive to that stress. Chlorophyll concentration drops when nutrient stress occurs, causing structural or colour changes that are recognized as visual nutrient stress symptoms. We investigated ML architectures including MLR, RF, DT, SVR, XGB, KNN, and ANN, comparing (1) all spectral bands and (2) selected VIs, as well as regression analysis using existing VIs.

Basic Statistical Analysis
An initial one-way ANOVA test, based on the F-distribution, was used to determine whether the treatment effects in the experiment were significant by comparing the treatment means [62]. As shown in Figure 4, the result is statistically significant, which indicates that the treatment means are not all equal. This test confirms that the fertilizer treatments differ significantly from each other, and this variation is important for developing efficient ML models to predict sugarcane chlorophyll content. After confirming the ANOVA outputs, Pearson's correlation coefficients (R²) for the relationship between the UAV-derived VIs and sugarcane chlorophyll content were estimated to select the features that are essential for training the ML algorithms. The highly correlated input variables, including RVI, DVI, NDRE, GNDVI, LCI, EVI, and NDVI, are linked with the target. In this study, we used an absolute correlation value of 0.5 as the variable selection threshold. If predictor variables are found to be correlated with one another, the variable with the lowest correlation coefficient with the target variable is discarded. Other feature selection techniques, such as the chi-square test, Fisher's score, variance threshold, mean absolute difference (MAD), forward feature selection, and backward feature elimination, can also be used in ML studies. Future studies could therefore compare different feature selection methods for chlorophyll prediction to find the best prototype model [63].
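The rule for handling correlated predictors can be sketched as a greedy filter: visit predictors from the most to the least correlated with the target, and keep one only if it is not strongly correlated with any predictor already kept. The text does not give the cut-off for treating two predictors as associated, so the 0.9 pair threshold below is an assumption, as are the function names, feature names and data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def drop_redundant(features, target, pair_threshold=0.9):
    """Of each strongly inter-correlated pair, keep the one closer to the target.

    pair_threshold is an assumed value; the source does not state one.
    """
    target_r = {name: abs(pearson_r(v, target)) for name, v in features.items()}
    kept = []
    for name in sorted(features, key=target_r.get, reverse=True):
        if all(abs(pearson_r(features[name], features[k])) < pair_threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical data: VI_a and VI_b are nearly collinear, VI_c is not.
spad = [30, 35, 40, 45, 50]
vis = {
    "VI_a": [1, 2, 3, 4, 5.5],
    "VI_b": [1, 2, 3, 4, 6],
    "VI_c": [5, 1, 4, 2, 6],
}
print(drop_redundant(vis, spad))  # ['VI_a', 'VI_c']
```

In practice this redundancy filter would be applied after the 0.5 relevance threshold, so weakly related predictors such as `VI_c` would already have been removed.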

Machine Learning Approach Using Multispectral Bands and VIs
The ML models evaluated (MLR, RF, DT, SVR, XGB, KNN, and ANN) are all suited to handling a continuous dependent variable that is correlated with VIs. The use of spectral vegetation indices in conjunction with machine learning models proved to be an effective method for predicting chlorophyll content, consistent with [64], and spectral indices have proven to be an essential technique for evaluating nitrogen [14]. This study did not remove the soil from the reflectance map before estimating vegetation indices because the crop completely covered the soil during the experimental period. However, the soil must be removed when estimating VIs during the early stage of sugarcane crops, as an aerial map can show the soil between the sugarcane plants. We used 80% of the available VI data for training and 20% for validation to estimate the machine learning models' performance on new data. Best fit line plots were generated to compare all ML models using both BFS and AFS methods, as illustrated in Figure 7. The line of best fit is a line that runs through a scatter plot of data points and best reflects the relationship between them [65]. The output of the regression ML analysis can be used to predict the chlorophyll content as the VIs vary. The red line in Figure 8 is the best fit straight line for each model. Figure 9 shows the learning curves of MLR, RF, DT, SVR, XGB, KNN and ANN, which evaluate model learning performance over training instances. The green and blue shaded areas represent the standard deviation of accuracy, while the lines show the mean accuracy values of the proposed models. Learning curves are a common diagnostic tool in ML for algorithms that learn progressively from a training dataset [66]. The models' performance improves as training proceeds, indicating that the models are learning [67].
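The 80/20 split, the line of best fit, and the two comparison metrics (R2 and RMSE) can be illustrated with a short sketch. It assumes a single synthetic index as the predictor and ordinary least squares as the model; the helper names, the random seeds, and the generated data are hypothetical and stand in for the study's VI and SPAD measurements.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction for validation."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept (the line of best fit)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Synthetic (index value, chlorophyll reading) pairs; not data from the study.
rng = random.Random(42)
data = [(1 + 0.1 * i, 10.0 * (1 + 0.1 * i) + 5.0 + rng.gauss(0, 0.5))
        for i in range(50)]

train, test = train_test_split(data)
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
y_test = [y for _, y in test]
y_pred = [slope * x + intercept for x, _ in test]
r2_val, rmse_val = r2_score(y_test, y_pred), rmse(y_test, y_pred)
print(f"validation R2={r2_val:.3f}, RMSE={rmse_val:.3f}")
```

Both metrics are computed only on the held-out 20%, which is what makes them an estimate of performance on new data rather than a measure of fit to the training set.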
When all spectral bands were utilized as input predictors, the results showed that the XGB technique delivers a higher retrieval accuracy than the other models. XGB has recently been demonstrated to be a highly effective machine learning technique for mapping in RS, and it is capable of performing well even with limited training data [68,69]. Yang et al. (2021) conducted experiments on wheat SPAD estimation utilising cluster-regression algorithms with UAV hyperspectral data, and the results indicated that the XGB model slightly outperformed the random forest model in estimating wheat SPAD. Therefore, the XGB algorithm may be used for fertiliser treatments in precision agriculture [70]. The MLR, RF, and DT models also show good training and validation scores with both the BFS and AFS methods. Although the ANN outperformed the SVM and LR algorithms, it produced results that were considered inferior to those of the RF and DT methods [71,72]. ANN approaches have, however, been favored with other types of spectral information for predicting crop nitrogen stress and mapping vegetation. Additionally, different types of plant stress have been identified using ANNs and multispectral data [73]. In this study, we observed that XGB, MLR, RF, and DT show almost the same training and validation scores. However, in a previous study on chlorophyll content estimation conducted by Osco et al. [42], the RF method showed the best accuracy (90%), outperforming the SVM and MLR models.
Our findings revealed that feature selection techniques can increase prediction accuracy in many models, including MLR, SVR, XGB and KNN; however, these techniques were less important in the ANN model. Feature selection is carried out to reduce the number of input features while maintaining the model's predictive accuracy. We examined a total of 24 spectral indices, more than prior research of this type has done [42]. According to previous studies by Ballester et al. [74] and Zeng & Chen [75], when a single VI was used to create an association with chlorophyll using basic linear regressions, the R2 of the generated relationships fluctuated significantly. When using a lower number of spectral indices, the XGB model helped to increase the algorithm's accuracy [42]. This suggests that the number of spectral indices utilized might be reduced while still obtaining highly accurate results [42]. This information is critical since it helps new research reduce the amount of processed data, which has an impact on training and testing times [42].
SVMs and KNN are frequently utilised when scientists are confronted with a huge number of features and a high degree of sparsity [74]. However, the prediction accuracy of the SVM and KNN algorithms was lower than that of the other algorithms in this investigation, which may be due to the selection of important features in this study. Previous research has also indicated an increase in SVM and KNN performance when specified variables are used [74]. This could be because picking relevant variables improves SVM and KNN performance by increasing interpretability, computational efficiency, and generalisation performance [74]. Furthermore, SVM and KNN algorithms are more sensitive to the quality of data than other algorithms, which may have reduced their performance in this prediction model.
This study is very important for the current fertiliser trial at SRI, Uda Walawe, because chlorophyll measurements should be taken every two weeks to compare and analyse the effect of different fertiliser treatments. However, taking these measurements with SPAD meters is difficult, time consuming, and labour intensive. Therefore, the proposed ML model, combined with UAV imagery, can be used to estimate the chlorophyll content every two weeks across a large sugarcane field. Further, measuring chlorophyll with a SPAD meter may produce inaccurate readings due to leaf structure, water content and leaf pigment distribution [76]. Environmental factors, including light intensity, can also affect the light transmittance of a leaf, resulting in incorrect measurement of chlorophyll content. Therefore, the application of UAVs, multispectral cameras and AI can be an effective solution for fertiliser trials on sugarcane crops.

Limitations of the Experimental and Modelling Approach
The small number of samples (216 samples) is a limitation of this study. Although RF is suitable for modest amounts of sampling data, the RF model's performance is linked to the sample size: the more sample points there are, the more accurate the forecasts [76]. In addition, the prediction model may not be suitable for discriminating crop chlorophyll content at different growth stages, as this research was done on 5-month-old sugarcane crops. In the modelling approach, manual hyperparameter tuning was performed to obtain the best model for each of the ML algorithms (MLR, RF, DT, SVR, XGB, KNN and ANN). Future work could employ grid searching, which is the process of scanning a set of candidate hyperparameter values to find the best parameters for a given model with minimal human effort. This study focused on regression ML models. Therefore, further studies are needed to develop chlorophyll prediction models using ML and DL classification techniques with grid-searching methods for different growth stages of sugarcane crops and various environmental conditions.
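The grid search suggested as future work can be sketched as below. This is only an illustration under stated assumptions: it uses a hand-written one-dimensional KNN regressor, a hypothetical grid over `k` and a distance-weighting flag, 5-fold cross-validated RMSE as the score, and synthetic data. None of these choices come from the study.

```python
import itertools
import random

def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    if weighted:
        # Inverse-distance weights; the small constant avoids division by zero.
        w = [1.0 / (abs(fx - x) + 1e-9) for fx, _ in nearest]
        return sum(wi * y for wi, (_, y) in zip(w, nearest)) / sum(w)
    return sum(y for _, y in nearest) / len(nearest)

def cv_rmse(data, k, weighted, folds=5):
    """Mean RMSE over contiguous cross-validation folds."""
    fold_size = len(data) // folds
    errors = []
    for f in range(folds):
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        sq = [(knn_predict(train, x, k, weighted) - y) ** 2 for x, y in test]
        errors.append((sum(sq) / len(sq)) ** 0.5)
    return sum(errors) / folds

def grid_search(data, grid):
    """Score every combination in the grid and return the best one."""
    names = sorted(grid)
    scores = {}
    for combo in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, combo))
        scores[combo] = cv_rmse(data, **params)
    best_combo = min(scores, key=scores.get)
    return dict(zip(names, best_combo)), scores

# Synthetic (index value, chlorophyll reading) pairs; not data from the study.
rng = random.Random(1)
data = [(x, 10.0 * x + 5.0 + rng.gauss(0, 1.0))
        for x in (rng.uniform(1, 5) for _ in range(60))]

grid = {"k": [1, 3, 5, 7], "weighted": [False, True]}  # hypothetical grid
best, scores = grid_search(data, grid)
print(best, round(min(scores.values()), 3))
```

Because every combination is scored by the same cross-validation routine, the choice of hyperparameters no longer depends on manual trial and error, at the cost of one model evaluation per grid point and fold.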

Conclusions
Chlorophyll is an important crop biophysical feature for measuring crop health and making early predictions. This study examined the viability of using multispectral UAV images to predict the chlorophyll content of sugarcane crops at SRI, Sri Lanka. Ground-truthing data on sugarcane chlorophyll content were acquired with a SPAD chlorophyll meter and correlated with the different vegetation indices for ML. Different ML models were compared across several vegetation indices and the chlorophyll content of sugarcane crops to construct a prediction model for sugarcane chlorophyll content. Among the indices utilized in the study, RVI and DVI revealed a strong and positive correlation with the chlorophyll content of sugarcane crops. The results show that the XGB technique delivered a higher retrieval accuracy than other models when all spectral bands were utilized as input predictors. The most important finding of this study is that spectral signals derived from space multispectral data offer useful information for quantifying sugarcane chlorophyll content over greater geographic areas for implementing proper farm management. The agronomical approach of collecting leaf tissue and performing chemical analysis in the laboratory is time consuming and spatially limited because of practical constraints. This research and the use of UAVs with AI can positively impact fertilization procedures and lead to more accurate yield projections. With the success of predicting the chlorophyll content across larger geographic areas using spaceborne multispectral data, cane growers will be able to monitor the nutritional state of their sugarcane early and address nutrient-deficient areas with appropriate management. In further work, the proposed approach for estimating chlorophyll content has to be tested in different sugarcane fields with different varieties and must be validated for the different growing stages of sugarcane crops.