Detection of White Leaf Disease in Sugarcane Using Machine Learning Techniques over UAV Multispectral Images

White leaf disease (WLD) in sugarcane crops is caused by a phytoplasma transmitted by leafhopper vectors. WLD occurs predominantly in some Asian countries and is a devastating global threat to sugarcane industries, especially in Sri Lanka. A feasible and effective approach to precisely monitoring WLD infection is therefore important, especially at the early pre-visual stage. This work presents the first approach to the preliminary detection of sugarcane WLD using high-resolution multispectral sensors mounted on small unmanned aerial vehicles (UAVs) and supervised machine learning (ML) classifiers. The detection pipeline discussed in this paper was validated in a sugarcane field located in Gal-Oya Plantation, Hingurana, Sri Lanka. The pixelwise segmented samples were classified as ground, shadow, healthy plant, early symptom, and severe symptom. Four ML algorithms, namely XGBoost (XGB), random forest (RF), decision tree (DT), and K-nearest neighbors (KNN), were implemented with different Python libraries, vegetation indices (VIs), and five spectral bands to detect WLD in the sugarcane field. XGB, RF, and KNN each attained an accuracy of 94% in detecting WLD in the field. The top three VIs for separating healthy and infected sugarcane crops are the modified soil-adjusted vegetation index (MSAVI), normalized difference vegetation index (NDVI), and excess green (ExG) in XGB, RF, and DT, while the best spectral band is red in XGB and RF and green in DT. The results revealed that this technology provides a dependable, more direct, cost-effective, and quick method for detecting WLD.


Introduction
Sugarcane (Saccharum officinarum) is a tropical plant, and it is the most important sugar-extracting crop in Sri Lanka [1,2]. Sugarcane white leaf disease (WLD) is one of the most economically important diseases in Sri Lanka's sugarcane industry [3], and WLD progresses severely in ratoon sugarcane, which ultimately affects yield [4]. WLD is caused by a phytoplasma, an obligate plant parasite that attacks plant phloem tissue. It is transmitted through leafhopper insect vectors [3][4][5]. In infected crops, cream-white stripes develop parallel to the midrib of sugarcane leaves, eventually covering the entire leaf. Other symptoms of WLD include stunted stalks, the absence of lateral shoots on the upper portion of infected stalks, and eventual plant death. Currently, no sugarcane varieties have been found to be resistant to WLD in Sri Lanka [4]. As a preventive approach, growers still follow traditional scouting methods across the field, monitoring disease symptoms by eye and burning infected crops on the spot. However, this method requires a significant amount of time to inspect the entire field and identify infected areas in large sugarcane plantations. Thus, precision agriculture technologies aided by modern computational machine learning approaches may provide an effective way of detecting sugarcane WLD on-field, as an alternative to human-based methods.
Drones 2022, 6, 230
Precision agriculture is a smart farming method that uses current technologies to examine and manage changes within an agricultural field to maximize cost-effectiveness, sustainability, and environmental protection [6][7][8]. Precision agriculture is crucial to seeking low-input, high-efficiency, and sustainable methods in agricultural industries [9]. Recent improvements in the application of UAV-based remote sensing in crop production have proved crucial in improving crop productivity [10]. Remote sensing for precision agriculture is based on the indirect detection of radiation reflected by soil and crops in an agricultural field [11]. This approach is well suited for monitoring plant stress and disease since it provides multitemporal and multispectral data. UAVs are increasingly used in agriculture to collect high-resolution images and videos for post-processing. Artificial intelligence (AI) approaches are used to process these UAV images for planning, navigation, and georeferencing, as well as for a variety of agricultural applications [12]. UAVs and advanced computational ML techniques are increasingly used to forecast and improve yield in various farming industries, including sugarcane [10].
León-Rueda et al. [13] examined the use of multispectral cameras mounted on UAVs to classify commercial potato vascular wilt using supervised random forest classification. Su et al. [14] investigated yellow rust disease in winter wheat using a multispectral camera by selecting spectral bands and spectral vegetation indices (SVIs) with a high discriminating capability. Albetis et al. [15] assessed the possibility of distinguishing Flavescence dorée symptoms using UAV multispectral imaging. Gomez Selvaraj et al. [16] examined the potential of aerial imagery and machine learning approaches for disease identification in bananas by classifying and localizing bananas in mixed-complex African environments using pixel-based classifications and machine learning models. Lan et al. [17] assessed the feasibility of large-area identification of citrus Huanglongbing using remote sensing and sought to improve detection accuracy using numerous ML techniques, including support vector machine (SVM), K-nearest neighbor (KNN), and logistic regression (LR). Table 1 summarizes the application of UAVs for disease management in precision agriculture, and Table 2 shows the use of UAVs for pest and disease control in the sugarcane sector.
ML algorithms have been used to monitor crop status in many remote sensing applications in agriculture [30][31][32][33]. ML methods attempt to establish a relationship between crop parameters to forecast crop production [34]. Artificial neural networks (ANN), random forests (RF), SVM, and decision trees (DT) are relevant algorithms in remote sensing applications [35].
Saini and Ghosh [36] utilized XGBoost (XGB), stochastic gradient boosting (SGB), RF, and SVM for rice crop mapping in India to evaluate the efficacy of ensemble methods. Huang et al. [37] used VIs generated from canopy-level hyperspectral scans to examine the utility of the RF technique in combination with the XGB approach for detecting wheat stripe rust at early and mid-term stages. Compared to typical machine learning approaches, XGB can reduce model overfitting and computing effort [37]. Tageldin et al. [38] used the XGB method to predict the occurrence of cotton leaf miner infestation with an accuracy of 84 percent, which was greater than the results obtained using algorithms such as RF and logistic regression. The RF non-parametric classifier is an ensemble-based machine learning technique that combines the predictions of many decision tree classifiers using a voting strategy [39]. Santoso et al. [40] assessed the RF model's potential for predicting BSR disease in oil palm fields and produced BSR disease distribution maps. With the cascade parallel random forest (CPRF) algorithm and a 20-year examination of pertinent data, Zhang [41] identified the pattern of rice diseases. Samajpati and Degadwala [42] experimented with identifying apple scab, apple rot, and apple blotch using the RF algorithm. Several researchers proposed decision tree models to identify and categorize leaf disease, improving detection accuracy while reducing detection time compared with existing systems [43][44][45]. K-nearest neighbor (KNN) is a prevalent machine learning algorithm that performs well in supervised learning scenarios and simple recognition problems [46]. Vaishnnave et al. [47] developed an ML model with the KNN algorithm to detect groundnut leaf disease, and Krithika and Grace [48] used a KNN classifier to identify grape leaf diseases. Kapil et al. [49] developed a system for recognizing cotton leaf disease with the KNN algorithm.
Vegetation indices are numerical metrics used in remote sensing applications to assess differences in vegetation cover, vigor, and growth dynamics. A vegetation index is normally a sum, difference, ratio, or other linear combination of reflectance factor or radiance measurements from two or more wavelength intervals. It is used to increase the reliability of regional and temporal comparisons of terrestrial photosynthetic activity and canopy structure variation by enhancing the contribution of vegetation features [50]. Sanseechan et al. [5] examined the ability of VIs to detect WLD-infected sugarcane via image processing from a multispectral camera mounted on a UAV. Moriya et al. [29] developed a method for accurately identifying and mapping mosaic virus in sugarcane using aerial surveys conducted with a UAV equipped with a hyperspectral camera.
Few studies have applied ML techniques to UAV multispectral images to identify other sugarcane diseases, and none has addressed the detection of WLD using ML models and high-resolution UAV imagery in sugarcane crops. Therefore, this study proposes a method for identifying sugarcane WLD by combining UAV technology with high-resolution multispectral cameras and multiple machine learning classification algorithms. There were two sub-goals: (1) to correlate the VIs with the variation in WLD severity level in the sugarcane field; and (2) to evaluate the detection performance for WLD severity levels using various ML approaches.
UAV-based remote sensing can assist farmers in analyzing crop health and management in precision agriculture. Early detection of WLD in Sri Lankan sugarcane fields will be used to implement effective management measures throughout the crop's early phases. This method will aid in disease management in sugarcane farms by eliminating the requirement for conventional methods [51]. Ultimately, it will help farmers and the cane industry in Sri Lanka recover economically. However, the commercial application of UAVs and artificial intelligence algorithms in sugarcane sectors has been limited by various factors, including technology, UAV legislation, and cost [51].

Process Pipeline
As depicted in Figure 1, a process pipeline with four key components was developed: acquisition, preprocessing, training, and prediction. Images were collected, orthorectified, mosaicked, and preprocessed to extract samples with essential features, which were then labelled. The labelled data were supplied to supervised machine learning classifiers, which were trained and optimized for detection. The complete orthorectified data were then analyzed to determine where WLD-infected plants occurred in the field.

Study Site
The study was conducted in a 1.24-hectare sugarcane field in Gal-Oya Plantation, Hingurana, Sri Lanka (7°16'42.94"N, 81°42'25.53"E) during the sugarcane growing season of October 2021 (Figure 2). For this experiment, two-month-old sugarcane plants with an average height of 1.2 m were chosen. Diseased plants were randomly sampled throughout the field for each level of disease severity, following the natural disease occurrence pattern in the field. Field agronomists confirmed the following during this experiment: (1) the ridge-and-furrow irrigation method was used in the field, and there was no water stress to the plants; (2) the entire experimental site had a uniform soil type (sandy to clay loam soils); (3) fertilizers were applied at the recommended level to the entire experimental field, and there was no fertilizer stress to the plants; and (4) WLD was transmitted by an insect vector and was not associated with soil or water, and the observed symptoms were caused only by WLD. For these four reasons, it was not necessary to use a block design for the experiment at this site.

Ground Truth Data Collection
Experts visually inspected and labelled diseased and healthy plants as ground truth before image acquisition to train and test the classifier [39]. In the training site, a total of 150 sugarcane plants were classified into three types, healthy plants (50 plants), early symptom plants (50 plants), and severe symptom plants (50 plants), marked with white, yellow, and red tags, respectively, which were installed manually as shown in Figure 3. In the testing site, a total of 90 plants were classified into the same three types, healthy plants (30 plants), early symptom plants (30 plants), and severe symptom plants (30 plants), using color tags for validation. Early symptom plants were characterized by the youngest leaves appearing white while older leaves remained green. Severe symptom plants were characterized by pure white coloring in most leaves together with stunted growth [52].

UAV Platform
A DJI P4 Multispectral UAV was used to conduct the experiment in the sugarcane field. The DJI P4 Multispectral is a fully integrated UAV platform that can complete the data collection task independently, without the help of other aircraft. It has a take-off weight of 1487 g and an average flight time of 27 min. The P4 Multispectral imaging system contains six cameras with 1/2.9-inch CMOS sensors: an RGB camera that produces images in JPEG format and a multispectral camera array of five cameras (Figure 4b) that produce multispectral images in TIFF format. It uses a global shutter to ensure performance. The five cameras in the multispectral camera array capture the following imaging bands: blue (B): 450 nm ± 16 nm; green (G): 560 nm ± 16 nm; red (R): 650 nm ± 16 nm; red edge (RE): 730 nm ± 16 nm; and near-infrared (NIR): 840 nm ± 26 nm [53]. The cameras do not support zoom. Table 3 shows the central wavelength and wavelength width for the DJI P4 Multispectral camera [53]. Table 4 shows the camera specifications of the DJI P4 Multispectral.
The remote controller (Figure 4a) uses DJI's long-range transmission technology and can control the aircraft and the gimbal cameras at a maximum transmission range of 4.3 mi (7 km). An iPad can be connected to the remote controller via the USB port to use the DJI GS Pro app to plan and perform missions. The app can also be used to export the captured images for analysis and to create multispectral maps [53]. The RTK module is integrated directly into the Phantom 4 RTK, providing real-time, centimeter-level positioning data for improved absolute accuracy on image metadata. The ground sampling distance (GSD) for the P4 Multispectral is (H/18.9) cm/pixel, where H is the flight height in meters. The flight height can therefore be chosen according to the accuracy needed for the flight mission.
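The GSD relation above can be illustrated with a short calculation. The following is a minimal sketch, assuming H is expressed in meters; the function names are hypothetical and are not part of any DJI software. It evaluates the relation at the candidate flight heights reported in this study.

```python
def gsd_cm_per_pixel(height_m: float) -> float:
    """Ground sampling distance in cm/pixel, using GSD = H / 18.9 (H in meters)."""
    if height_m <= 0:
        raise ValueError("flight height must be positive")
    return height_m / 18.9


def height_for_gsd(target_gsd_cm: float) -> float:
    """Invert the relation: flight height (m) that yields a target GSD (cm/pixel)."""
    return target_gsd_cm * 18.9


# Candidate flight heights tested before the mission
for h in (10, 15, 20, 25):
    print(f"{h} m -> {gsd_cm_per_pixel(h):.2f} cm/pixel")
```

At 20 m the relation gives roughly 1.06 cm/pixel, which agrees with the reported pixel size of about 1.1 cm/pixel.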

Collection of Multispectral UAV Images
A UAV flight was carried out during the growing season with a DJI P4 Multispectral system on a sunny day between 11:00 a.m. and 12:00 p.m. The visible-to-near-infrared spectral range of the DJI P4 Multispectral camera comprises five bands centered at 450.0 nm, 560.0 nm, 650.0 nm, 730.0 nm, and 840.0 nm (blue, green, red, red edge, and near-infrared, respectively). The flying altitude was 20 m and was maintained by the UAV's onboard barometer, which measures air pressure and rapidly detects changes in atmospheric pressure to establish and hold a stable altitude during flight. Additionally, the experimental site is on level ground, so it was easy to keep the UAV at a constant height. The pixel size in real-world dimensions for this experiment was 1.1 cm/pixel.
Before the flight mission, the UAV was flown at heights of 10 m, 15 m, 20 m, and 25 m to select a suitable height for labelling WLD on the multispectral orthomosaic image. The flight campaign at 15 m was selected as the captured data because it provided the best outcomes in terms of WLD detection, UAV endurance, and battery capacity. The UAV speed was 1.4 m per second, and the front and side overlaps of the images were 75% and 65%, respectively. The experiment was conducted between 11:00 a.m. and 12:00 p.m. because plant leaves are erect and transpiration is at its maximum (the active time for plants) at that time of image capture. Early morning and late afternoon or evening are not suitable for this experiment because of dew on the plants in the early morning and drooping of leaves in the late afternoon or evening. Additionally, this will not have an effect on the VI values. Therefore, the time of image capture is very important for developing accurate WLD detection models.

Software and Python Libraries
This research was conducted using several software tools and Python libraries. Agisoft Metashape (Version 1.6.6; Agisoft LLC, St. Petersburg, Russia) was used to process, filter, and orthorectify 5600 raw photos for multispectral image analysis. A set of images from cropped regions was extracted and then labelled using QGIS (Version 3.2.0; Open-Source Geospatial Foundation, Chicago, IL, USA). Visual Studio Code (VS Code) 1.70.0 was used as the source-code editor to develop the different ML algorithms in the Python 3.8.10 programming language. Several libraries were used for data manipulation and machine learning, including Geospatial Data Abstraction Library (GDAL) 3.0.2, eXtreme Gradient Boosting (XGBoost) 1.5.0, Scikit-learn 0.24.2, OpenCV 4.6.0.66, and Matplotlib 3.4.3.

Data Labelling
To perform image labelling, a mask was generated for each image in QGIS by assigning an integer value to every highlighted pixel. The integer values were set as follows: 1 = ground cover; 2 = shadow; 3 = healthy; 4 = early symptoms; and 5 = severe WLD. Each brightly colored pixel was filtered from the orthomosaic image. A new shapefile was created, and polygons were drawn on the multispectral orthomosaic image to label each class using the toggle editing and add polygon tools in QGIS. In total, 471,748 pixels were labelled across all classes based on the ground truth information, by observing the different color tags in the orthomosaic image as shown in Figure 3. The edges of the plant leaves were not labelled, to prevent the misclassification of mixed pixels. The leaves of all 150 selected plants were labelled using the polygon tool in QGIS, as shown in Figure 5. Ground truth shapefiles (.shp) were exported for training the different ML models. Before training, the polygon regions in the shapefile were converted into labelled pixels (rasterized).
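The conversion of labelled polygons into a pixel mask can be sketched as follows. This is a minimal, pure-Python illustration using a ray-casting point-in-polygon test on pixel centers; it is an assumption about how such a conversion might work, not the GDAL or QGIS routine actually used in the study. The polygon coordinates and the function names `point_in_polygon` and `rasterize` are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygons, width, height):
    """polygons: list of (class_code, vertices). Returns mask[row][col]; 0 = unlabelled."""
    mask = [[0] * width for _ in range(height)]
    for code, vertices in polygons:
        for row in range(height):
            for col in range(width):
                # Test the pixel center so that partially covered edge pixels are skipped
                if point_in_polygon(col + 0.5, row + 0.5, vertices):
                    mask[row][col] = code
    return mask


# Class codes as defined in the text
CLASSES = {1: "ground cover", 2: "shadow", 3: "healthy", 4: "early symptoms", 5: "severe WLD"}

# Hypothetical polygons in pixel coordinates: one healthy region, one severe region
polys = [(3, [(0, 0), (3, 0), (3, 3), (0, 3)]), (5, [(4, 4), (6, 4), (6, 6), (4, 6)])]
mask = rasterize(polys, 8, 8)
```

Pixels left at 0 correspond to unlabelled data, which are excluded from training and only passed to the classifier at prediction time.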

Statistical Analysis for Algorithm Development
Statistical analysis was conducted using multicollinearity testing and normality testing to select the best-fit ML models before tuning them with labelled data. From an initial list of twenty VIs, only six were chosen to train the models, using multicollinearity testing with variance inflation factors (VIF) to avoid model overfitting. In total, eleven input features (five bands and six VIs) were used to develop the ML models to detect WLD. The variance inflation factor measures how much the variance of an estimated regression coefficient is inflated when the independent variables are correlated [54]. VIF is calculated as shown in Equation (1):

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} = \frac{1}{\mathrm{Tolerance}_i} \tag{1}$$

where $R_i^2$ represents the unadjusted coefficient of determination for regressing the i-th independent variable on the remaining ones, and tolerance is simply the inverse of the VIF. The lower the tolerance, the more likely multicollinearity is among the variables. A value of VIF = 1 indicates that the independent variables are not correlated with each other. A value of 1 < VIF < 5 indicates that the variables are moderately correlated. A VIF between 5 and 10 is problematic, as it indicates highly correlated variables and multicollinearity among the predictors in the regression model, and VIF > 10 indicates that the regression coefficients are poorly estimated because of multicollinearity [54].
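The VIF screening described above can be sketched with NumPy. This is a minimal illustration under stated assumptions: each feature is regressed on the others by ordinary least squares with an intercept, and the synthetic columns `a`, `b`, and `c` are hypothetical stand-ins for band and VI values rather than data from this study.

```python
import numpy as np


def vif(X):
    """Return VIF_i = 1 / (1 - R_i^2) for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot  # unadjusted coefficient of determination
        out[i] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# Synthetic example: c is nearly a copy of a, b is independent
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)
vifs = vif(np.column_stack([a, b, c]))
print(dict(zip("abc", np.round(vifs, 2))))
```

In this toy case the nearly duplicated pair receives a VIF far above 10 while the independent column stays close to 1, which mirrors the rule used here to discard highly correlated VIs.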
Based on the literature review [54][55][56], input features that were uncorrelated or only moderately correlated with the other input features, namely blue, green, red, red edge, NIR, normalized difference vegetation index (NDVI), green normalized difference vegetation index (GNDVI), normalized difference red edge index (NDRE), green chlorophyll index (GCI), modified soil-adjusted vegetation index (MSAVI), and excess green (ExG), were selected to train the models, as shown in Table 5. Highly correlated input variables, namely leaf chlorophyll index (LCI), difference vegetation index (DVI), ratio vegetation index (RVI), enhanced vegetation index (EVI), triangular vegetation index (TVI), green difference vegetation index (GDVI), normalized green red difference index (NGRDI), atmospherically resistant vegetation index (ARVI), structure insensitive pigment index (SIPI), green optimized soil adjusted vegetation index (GOSAVI), excess red (ExR), excess green red (ExGR), normalized difference index (NDI), and simple ratio index (SRI), were not selected to train the ML models because of their higher VIF values, which ranged from around 7 to 22 [54].
A second statistical experiment, a normality test, was conducted to determine whether the sample data were drawn from a normally distributed population for the development of the ML models. A quantile-quantile (Q-Q) plot was used to confirm the normal distribution of the features. Figure 6 shows the Q-Q plot, confirming that the data were adequately close to the theoretical reference line, representing a sound model fit. The Python libraries matplotlib, numpy, statsmodels.graphics.gofplots, and scipy.stats were used to develop the Q-Q plot.

Table 5 (fragment; earlier rows not recovered).
9 MSAVI 1.0121
10 ExG 3.0231
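The idea behind the Q-Q check can be sketched without plotting. The example below is a minimal illustration using only the Python standard library rather than the statsmodels and scipy routines named above. The plotting positions (i - 0.5)/n, the correlation summary, and the synthetic samples are assumptions made for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical, observed) standardized quantile pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    observed = [(x - mu) / sigma for x in xs]
    return theoretical, observed


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; values near 1 mean points hug the reference line."""
    t, o = qq_points(sample)
    mt, mo = mean(t), mean(o)
    num = sum((a - mt) * (b - mo) for a, b in zip(t, o))
    den = (sum((a - mt) ** 2 for a in t) * sum((b - mo) ** 2 for b in o)) ** 0.5
    return num / den


random.seed(1)
normal_like = [random.gauss(0.6, 0.1) for _ in range(2000)]  # hypothetical feature values
skewed = [random.expovariate(1.0) for _ in range(2000)]      # clearly non-normal sample

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"normal-like: {r_normal:.4f}, skewed: {r_skewed:.4f}")
```

A feature whose Q-Q points lie close to the reference line yields a correlation near 1, whereas a skewed feature yields a visibly lower value.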

Development of Classification Algorithms and Prediction
Algorithm development comprises multiple steps: loading the data, preprocessing, fitting the classifier to the data, and prediction. The processing phase converts the loaded data into a collection of features, which are then analyzed by the classifier, as shown in Table 6. An orthomosaic multispectral raster was loaded into the algorithm to calculate spectral indexes and improve the detection rates, as mentioned in step 5. For this approach, the VIs ExG, GCI, MSAVI, GNDVI, NDRE, and NDVI are estimated (step 6), as shown in Table 7. All five bands in the multispectral raster, together with the estimated vegetation spectral indexes, are designated as input features (step 7).

Table 6. Steps in algorithm (XGB, RF, DT, and KNN) development: detection and segmentation of WLD using multispectral imagery.
Step 1-Import required modules and libraries.
Step 2-Load input file (multispectral images as .tiff) and ground truth file (ground truth shape file as .shp).
Step 3-Extract the bands (features) from the input file (blue, green, red, red edge, and NIR) through the GDAL library.
Step 5-Store the bands in the variable (V) as five input features.
Step 6-Estimate the selected VIs (six additional input features).
Step 7-Append the VIs and five bands and store them in the same variable (V), for a total of 11 input features.
Step 8-Search for the number of classes in the labelled data.
Step 9-Filter unlabelled data from the source image and store their values in the 'X' features variable and in the array (x_array).
Step 10-Select only labelled data from the labelled image and store their values in the 'y' labels variable and in the array (y_array).
Step 11-Split the dataset into the 'Training' set (75%) and 'Test' set (25%).
Step 12-Normalize the data (feature scaling) of the 'X' features matrix for Euclidean distance.
Step 13-Fit the classifier (XGB, RF, DT, and KNN) to the training set (11 input features).
Step 14-Tune hyper-parameters manually based on the algorithms, as shown in Table 7.
Step 15-Apply k-fold cross-validation.
Step 16-Export and save the model.
Step 17-Predict the values for each sample in x_array.
Step 18-Export the output file in tagged image file (TIF) format.
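Steps 11, 12, and 15 in Table 6 can be sketched with NumPy alone. The following is a minimal illustration under stated assumptions: z-score standardization stands in for the unspecified feature scaling, k = 5 is an arbitrary choice because the number of folds is not reported, and the random arrays are synthetic stand-ins for the labelled pixel samples. The function names are hypothetical.

```python
import numpy as np


def train_test_split_75_25(X, y, seed=0):
    """Randomly split samples into 75% training and 25% test (step 11)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(round(0.75 * len(X)))
    tr, te = idx[:n_train], idx[n_train:]
    return X[tr], X[te], y[tr], y[te]


def standardize(X_train, X_test):
    """Scale features using training-set statistics only (step 12)."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    return (X_train - mu) / sd, (X_test - mu) / sd


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation (step 15)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val


# Synthetic stand-in: 400 pixels, 11 features, class labels 1..5
rng = np.random.default_rng(42)
X = rng.random((400, 11))
y = rng.integers(1, 6, size=400)

X_tr, X_te, y_tr, y_te = train_test_split_75_25(X, y)
X_tr_s, X_te_s = standardize(X_tr, X_te)
folds = list(kfold_indices(len(X_tr_s), k=5))
```

Scaling with training-set statistics only avoids leaking information from the test set, which matters most for distance-based classifiers such as KNN.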
The labelled regions from the ground-based assessments are exported from QGIS and loaded into an array (y_array) (step 10). In all, 471,748 pixelwise samples were filtered and randomly divided into a training array (75%) and a testing array (25%) (step 11). In step 13, the data are passed to the different ML classifiers. This study employed four machine learning classification methods, XGB, RF, DT, and KNN, to detect sugarcane WLD from multispectral UAV images. The fitted model is then validated using k-fold cross-validation (step 15). In the prediction stage, unlabelled pixels are processed by the optimized classifier, and their predicted values are displayed in the same 2D spatial image as the orthorectified multispectral raster (step 17). The identified pixels in each image are then colored by class and exported in TIF format, which can be read by geographic information system (GIS) platforms (step 18). The best-performing model for identifying WLD in the sugarcane field was selected by comparing performance metrics: precision, recall, F1 score, and accuracy. Further details on the calculation of these metrics can be found in Section 3.4.
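The performance metrics used for model comparison can be sketched as follows. This is a minimal standard-library illustration that uses the usual per-class definitions of precision, recall, and F1 score. The label lists are hypothetical toy data, not results from this study, and the function name is an assumption.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Return per-class precision, recall, and F1, plus overall accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)
        fn = sum(pairs[(c, p)] for p in labels if p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    return report, accuracy


# Toy labels using the class codes 3 = healthy, 4 = early symptoms, 5 = severe WLD
y_true = [3, 3, 3, 4, 4, 5, 5, 5]
y_pred = [3, 3, 4, 4, 4, 5, 5, 3]
report, acc = classification_report(y_true, y_pred)
print(acc, report[4])
```

Reporting the metrics per class makes it visible when a model separates severe symptoms well but confuses healthy plants with early symptoms, which overall accuracy alone can hide.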

Validation
For validation, 90 sugarcane plants in the testing site were classified into three types, healthy plants (30 plants), early symptom plants (30 plants), and severe symptom plants (30 plants), and marked with white, yellow, and red tags, respectively, as shown in Figure 3. The labelling was performed in the same way as in the training site, as described in Section 2.6. Then, a Python script was developed to estimate the validation accuracy in the testing site. Finally, an input file (multispectral images as .tif for the testing site), a ground truth file (ground truth shapefile as .shp for the testing site), and a best model file (as .json exported from training) were loaded into the different algorithms to estimate the validation accuracy.

Estimation of Vegetation Indices
Leaf pigments' absorption characteristics govern spectral reflectance. Therefore, any variation in pigment concentrations correlates closely with the health and production of the plant [20]. Six (06) VIs were selected based on the results from multicollinearity testing and variable optimization techniques [51,67] to develop the different ML models to detect the WLD, as shown in Figure 7 and Table 8. To construct the various VIs, reflectance values in the following multispectral bands were used in this study: blue (B), 450 nm ± 16 nm; green (G), 560 nm ± 16 nm; red (R), 650 nm ± 16 nm; red edge (RE), 730 nm ± 16 nm; and near-infrared (NIR), 840 nm ± 26 nm.
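As an illustration of how such indices are derived from the five bands, the sketch below computes the five VIs named in this paper (NDVI, NDRE, GCI, ExG, and MSAVI) with their commonly used textbook formulas. These formulas are an assumption and may differ in detail from the definitions in Table 8; the function name and the small `eps` guard against division by zero are also illustrative.

```python
import numpy as np

def vegetation_indices(blue, green, red, red_edge, nir, eps=1e-9):
    """Return a dict of per-pixel VIs from reflectance arrays of equal shape.

    Formulas are the commonly cited definitions (ASSUMED, not taken from Table 8).
    """
    blue, green, red, red_edge, nir = (
        np.asarray(band, dtype=float) for band in (blue, green, red, red_edge, nir)
    )
    ndvi = (nir - red) / (nir + red + eps)
    ndre = (nir - red_edge) / (nir + red_edge + eps)
    gci = nir / (green + eps) - 1.0
    exg = 2.0 * green - red - blue
    # MSAVI in its closed form (often called MSAVI2)
    msavi = (2.0 * nir + 1.0
             - np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red))) / 2.0
    return {"NDVI": ndvi, "NDRE": ndre, "GCI": gci, "ExG": exg, "MSAVI": msavi}

# Single-pixel example with made-up reflectance values
vis = vegetation_indices(blue=0.05, green=0.20, red=0.10, red_edge=0.30, nir=0.50)
```

The same function works unchanged on whole band rasters, because every operation is applied element by element.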

Ranking of Feature Importance
As shown in Figure 8, the five individual bands and the selected VIs were ranked using feature importance techniques implemented in Python during model development. The top five features in the XGB model are MSAVI, NDVI, red, green, and NIR. Similarly, MSAVI, NDVI, red, blue, and NIR were ranked as the top five features in the RF model, while NDVI, green, MSAVI, red, and ExG were the top five in the DT model. GCI was ranked lowest in the XGB and DT models, while NDRE was ranked lowest in the RF model for detecting WLD in the sugarcane field.
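The text does not state which feature importance technique was used. One model-agnostic option is permutation importance, sketched below on synthetic data with a toy nearest-centroid classifier standing in for XGB, RF, or DT. The classifier, the data, and all function names are assumptions made for illustration only.

```python
import numpy as np

def fit_centroids(x, y):
    """Toy stand-in classifier: one mean vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, x):
    classes, centroids = model
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

def permutation_importance(model, x, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(model, x) == y)
    scores = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        drops = []
        for _ in range(n_repeats):
            x_perm = x.copy()
            x_perm[:, j] = rng.permutation(x_perm[:, j])  # break feature j
            drops.append(baseline - np.mean(predict(model, x_perm) == y))
        scores[j] = np.mean(drops)
    return scores

# Synthetic data: feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
informative = rng.normal(loc=y, scale=0.1)
noise = rng.normal(size=y.size)
x = np.column_stack([informative, noise])

model = fit_centroids(x, y)
importance = permutation_importance(model, x, y)
ranking = np.argsort(importance)[::-1]   # most important feature first
```

Tree-based libraries also expose built-in importance scores; permutation importance is shown here only because it applies to any fitted classifier.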

Segmentation Results of the Proposed Approaches
Figure 9a represents the multispectral orthomosaic image generated from the UAV raw images, while Figure 9b shows the WLD spatial map produced by XGB, the optimized model among the ML models. The severe WLD plant shows a red color in almost all the canopy areas, as shown in Figure 10c. In early symptom plants, most of the canopy region shows yellow, as shown in Figure 10b, while healthy plants show a green color in most of the canopy region (Figure 10a). However, the margin of the canopy shows a red color in all the classifications due to the dead leaves present on each sugarcane plant. The spatial distribution of the severity of WLD in sugarcane is plotted in Figure 9 using different prediction models, namely (a) XGB, (b) RF, (c) DT, and (d) KNN. Segmented images for photo interpretation and accuracy indicators were used for validation purposes. In total, 117,937 labelled pixels from the test set were evaluated to assess the algorithms. Figure 11 represents the segmentation results of healthy, early symptom, and severe symptom WLD sugarcane plants for the different ML models.
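Turning a predicted class raster into a color-coded map can be sketched with a simple lookup table. The integer class ids and the colors for ground and shadow are hypothetical; only green, yellow, and red for healthy, early symptom, and severe symptom plants follow the description above.

```python
import numpy as np

# Class ids and the ground/shadow colors are illustrative ASSUMPTIONS.
CLASS_COLORS = {
    1: (139, 69, 19),   # ground (assumed brown)
    2: (0, 0, 0),       # shadow (assumed black)
    3: (0, 128, 0),     # healthy plant: green
    4: (255, 255, 0),   # early symptom: yellow
    5: (255, 0, 0),     # severe symptom: red
}

def colorize(class_map):
    """Map a 2D array of class ids to an RGB image via a lookup table."""
    class_map = np.asarray(class_map)
    lut = np.zeros((max(CLASS_COLORS) + 1, 3), dtype=np.uint8)
    for class_id, rgb in CLASS_COLORS.items():
        lut[class_id] = rgb
    return lut[class_map]       # fancy indexing: one RGB triple per pixel

predicted = np.array([[3, 3, 4],
                      [5, 1, 2]])
rgb_image = colorize(predicted)
```

The resulting array has shape (rows, cols, 3) and can then be written to a georeferenced TIF with a raster I/O library of choice.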

Confusion Matrix and Classification Report
The training performance of the machine learning models (XGB, RF, DT, and KNN) was compared over consecutive runs using overall accuracy, F1 score, precision, and recall. The results indicate that all models performed similarly well in the suggested pipeline for detecting WLD and achieved high accuracy. The confusion matrix of each model is presented in Table 9, and the classification report for each model is shown in Table 10. The descriptors true positive (TP), false positive (FP), true negative (TN), and false negative (FN) were used to construct the confusion matrix (Equation (2)) and subsequently to calculate the overall accuracy (Equation (3)), precision (Equation (4)), recall (Equation (5)), and F-score (Equation (6)) [68].
The results show that an overall accuracy of 94% was attained by XGB, RF, and KNN in detecting WLD in the field, while the DT model also achieved a good overall accuracy of 93%. Among the five classes, ground cover, shadow, and healthy plants were classified with precision, recall, and F1 scores above 93% in all the models. In contrast, early and severe symptom crops were classified with more than 75% accuracy in the XGB and RF models. The DT model obtained the lowest precision, recall, and F1 scores, 67%, 69%, and 68%, respectively, when classifying the severe symptom crops. In previous studies, Sandino et al. [69] detected healthy trees and trees infected with exotic pathogens in a forest using the XGBoost algorithm with a 97% classification accuracy. Santoso et al. [40] identified healthy and unhealthy oil palms with an overall accuracy of 91% using the RF classifier model. Sandika et al. [70] presented an RF-based classification scheme for three grape diseases (anthracnose, powdery mildew, and downy mildew) that achieved a classification accuracy of 86%. Suresha et al. [71] proposed a method for identifying blast and brown spot diseases in rice with a KNN classifier with an accuracy of 76.59%. Abdulridha et al. [46] developed KNN algorithms with overall classification accuracies of 94%, 95%, and 96% to detect citrus canker on tree canopies in the orchard. Zhang et al. [50] built optimal BFW classification models with a higher overall accuracy (OA) of 97.28% using RF based on the five multispectral bands.
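The relationship between a multiclass confusion matrix and the reported metrics can be sketched as follows, using the standard one-vs-rest definitions of precision, recall, and F1. The toy matrix is made up for illustration and is not taken from Table 9; the function name is likewise an assumption.

```python
def classification_report(confusion):
    """Per-class precision, recall, F1 and overall accuracy.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    report = {"accuracy": correct / total, "per_class": []}
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp   # column sum minus TP
        fn = sum(confusion[k]) - tp                        # row sum minus TP
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report["per_class"].append(
            {"precision": precision, "recall": recall, "f1": f1}
        )
    return report

# Made-up 3-class confusion matrix (rows: true class, columns: predicted class)
toy = [[8, 2, 0],
       [1, 7, 2],
       [0, 1, 9]]
report = classification_report(toy)
```

For the toy matrix, 24 of 30 samples lie on the diagonal, so the overall accuracy is 0.8, while the first class has a precision of 8/9 and a recall of 0.8.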

Testing and Validation in a Different Field at Gal-Oya Plantation
The k-fold cross-validation technique was used to develop the best trained models of XGB, RF, DT, and KNN. The best models were then used to detect WLD in the testing site located in the same region (Figure 2). In addition, validation was performed by observing and labelling the color tags in the testing site, as explained in Section 2.3. Finally, another classification report was produced, as shown in Table 11. The results show that an overall accuracy of 92% was attained by XGB, RF, and KNN in detecting WLD in a different field using the same ML models, while an accuracy of 91% was obtained by the DT model.
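The partitioning behind k-fold cross-validation can be sketched in a few lines. The value of k is not stated in this section, so `k=5` below is an assumption, as are the function name and the shuffling seed.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(10, k=5))
```

Each sample is used for validation exactly once and for training in the remaining k - 1 rounds; averaging the validation scores over the folds gives a more stable estimate than a single split.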

Model Training Time at the Training Site
The training times for each approach are listed in Table 12; they were measured on a computer with an 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00 GHz (1805 MHz, 4 cores, 8 logical processors) and 16.0 GB RAM running Microsoft Windows 10 Enterprise. The most accurate model, XGB, also had the shortest training time, nine minutes. KNN took the longest to train, 29 min, but reached the same overall accuracy as XGB. RF and DT reached overall accuracies of 94% and 93%, respectively, with training durations of 15 and 18 min.
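Training time can be measured with a small wall-clock wrapper such as the one below. The helper name and the dummy workload are hypothetical; in practice the wrapped callable would be the `fit` call of each classifier.

```python
import time

def timed_fit(fit_function, *args, **kwargs):
    """Run fit_function and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fit_function(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def dummy_fit(n):
    """Placeholder workload standing in for model training."""
    return sum(i * i for i in range(n))

model, seconds = timed_fit(dummy_fit, 100_000)
```

Because wall-clock time depends on the hardware and on other running processes, such timings are only comparable when all models are trained on the same machine, as was done here.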

Discussion
The current study demonstrates a viable strategy for detecting WLD in sugarcane fields using UAVs and machine learning-based classification models. This methodology offers a realistic, accurate, and efficient way of determining the presence of WLD in vast sugarcane fields. VIs are crucial for developing the best classification algorithms because diseases cause changes in the color, water content, and cell structure of the leaves, which are reflected in the spectrum [72]. Pigment changes cause visible spectral responses, while changes in cell structure cause near-infrared spectral responses. Initially, twenty VIs were considered, and six were retained via multicollinearity testing and feature selection techniques, both to minimize the training time and computing resources required (training time being crucial for model evaluation) and to avoid model overfitting. However, the UAV-derived spectral bands and indices used in this work are not disease-specific; hence, they can only measure different infestation levels or damage when a single disease affects the crop, as they cannot differentiate between different types of disease.
Feature selection is important to attain a higher classification accuracy with less training time. However, it is not easy to optimize both time and accuracy, and hence a balance must be established based on users' requirements. Different color tags were used for ground truth measurements to support labelling during image post-processing. A handheld GPS meter with higher accuracy could locate each class more precisely, but high-accuracy GPS meters were unavailable for this study. Nevertheless, color tagging is a good method for validating the prediction results in the segmented images. Since conducting ground truth investigations into plant diseases needs professional competence and is time- and labor-intensive, most of the research relied heavily on sampling surveys, as did the evaluation outcomes [50]. Two-month-old sugarcane plants were selected in this study because young plants are highly affected by WLD in the sugarcane industries. Early detection of disease is crucial for successful mitigation actions [46]. However, this work should be extended to other sugarcane crop stages. Additionally, flight missions should be conducted in different climatic seasons and over different sugarcane varieties to assess the severity level of the disease.
Variable optimization was implemented, and just six variables were deemed essential for developing the various ML prediction models. During the optimization process, less significant variables were eliminated for all ML models. Other studies have indicated that excluding insignificant factors improves the classification performance of machine learning. When variable relevance is very low, the variable is either unimportant or substantially collinear with one or more other variables. Based on the five-band images and VIs, the selected ML models (XGB, RF, DT, and KNN) produced distribution maps with comparable results. In addition, the overall classification accuracy obtained in this study using UAV-derived multispectral VIs is comparable to that of the similar studies described in the preceding sections. RF offers high precision, excellent outlier tolerance, and ease of parameter selection. León-Rueda et al. [13] also used the RF classifier for the classification process. Lan et al. [17] evaluated the feasibility of monitoring citrus Huanglongbing (HLB) using multispectral images, VIs, and the KNN algorithm because KNN is one of the simplest classification algorithms available. It may be used to solve classification and regression prediction problems with extremely competitive results [46]. The DT algorithm tends to have more numerical features in the classification results for data within consistent sample sizes in each category [73]. Overall, XGB was chosen as the most suitable model for monitoring WLD in the sugarcane field because it is highly flexible and works well on small to medium datasets. It yielded the best prediction model, with high accuracy within a short training time.
In the segmentation results, the margins of all the crops were shown in red due to the dead leaves. As a consequence, the precision, recall, and F1 score for early and severe symptoms were reduced during the training process, which is a limitation of this study. However, severely diseased plants can still be detected easily when the segmented crop canopy is covered completely in red. Further research should be conducted to determine the usefulness of deep learning algorithms for detecting WLD in sugarcane fields. Only four ML algorithms were selected in this study, based on the previous studies mentioned in Section 1. Other ML models, such as SVM and LR, could also be developed to detect WLD and compared with the existing models. In addition to these research gaps, high-resolution hyperspectral cameras could improve accuracy, and disease-specific VIs should be developed to detect specific diseases in the sugarcane field.

Conclusions
This research utilized multispectral UAV images and machine learning methods to detect WLD in a sugarcane field. High-resolution multispectral images and pixel-by-pixel classification answered the need for precise and efficient detection and segmentation approaches for WLD monitoring. The classification performance of four machine learning (ML) methods (XGB, RF, DT, and KNN) was comprehensively evaluated from multiple perspectives, including classification accuracies at the pixel scale and the plant scale, the degree of agreement with the ground truth density maps, and the identified areas of infection. The overall accuracy of all ML models on the five-band images was 93% or higher. The experimental results reveal that both XGB and RF performed well in classification, whereas DT demonstrated the lowest classification performance. The five-multispectral-band XGB model, with an OA of 94% and the shortest training time of nine minutes, was deemed the best supervised model. This study's findings could guide sugarcane plantation management for disease identification by pinpointing the precise location of infected areas in sugarcane fields.

Figure 1. Main steps of the proposed methodology for a single flight campaign.

Figure 3. Ground truth classification of sugarcane crops in the field.

Figure 4. UAV and camera used in this study: (a) DJI P4 with remote controller and (b) RGB sensor for visible light imaging and five monochrome sensors (Source: [54]).

Figure 5. Pixelwise labelling of severe symptom of WLD plant using QGIS tool.

Figure 6. Normal Q-Q plot for the observed sample against theoretical quantiles.

Figure 9. Segmentation results of the proposed approach: (a) multispectral image and (b) segmentation result of XGB model.

Table 1. Application of UAVs for disease management in precision agriculture.

Table 2. Use of UAVs for pest and disease control in the sugarcane sector.

Table 3. Spectral band information for the DJI P4 Multispectral.

Table 4. Camera specification of the DJI P4 Multispectral.

Table 5. VIF values for selected VIs.

Table 9. Confusion matrix of different classifiers in the training site.

Table 10. Classification report for different ML models in the training site. Precision: ratio between true positives and the sum of true positives and false positives; Recall: ratio between true positives and the sum of true positives and false negatives.

Table 11. Classification report for different ML models in the testing site.

Table 12. The training time of the XGB, RF, DT, and KNN.