Detecting Walnut Leaf Scorch Using UAV-Based Hyperspectral Data, Genetic Algorithm, Random Forest and Support Vector Machine Learning Algorithms

Weng, Jian; Zhang, Qiang; Wang, Baoqing; Zhang, Cuifang; Zhang, Heyu; Meng, Jinghui

doi:10.3390/rs17243986


by Jian Weng ¹,², Qiang Zhang ¹,³,*, Baoqing Wang ¹,³, Cuifang Zhang ⁴, Heyu Zhang ⁴ and Jinghui Meng ²

¹ Xinjiang Uygur Autonomous Region Academy of Forestry, Urumqi 830092, China
² Research Group on Efficient Management of Water Conservation Forests in Northwest China, State Key Laboratory of Efficient Production of Forest Resources, Beijing 100083, China
³ Akesu Observation and Research Station of Chinese Forest Ecosystem, Akesu 843101, China
⁴ College of Forestry and Landscape Architecture, Xinjiang Agricultural University, Urumqi 830052, China

* Author to whom correspondence should be addressed.

Remote Sens. 2025, 17(24), 3986; https://doi.org/10.3390/rs17243986

Submission received: 29 October 2025 / Revised: 29 November 2025 / Accepted: 2 December 2025 / Published: 10 December 2025

(This article belongs to the Special Issue Remote Sensing-Assisted Forest Inventory Planning)


Highlights

What are the main findings?

An efficient monitoring model integrating UAV hyperspectral imagery and machine learning was developed for detecting walnut leaf scorch.
The Genetic Algorithm-optimized SVM model (GA-SVM) achieved the highest predictive performance (R² = 0.6302, RMSE = 0.0629, MAE = 0.0480).

What are the implications of the main findings?

Offers a rapid and precise tool for the detection and precision management of walnut leaf scorch.
The UAV-based approach enables site-specific disease detection, improves monitoring efficiency, and reduces reliance on costly manual ground surveys.

Abstract

Walnut (Juglans regia L.), a critical economic species, suffers substantial declines in fruit quality and yield due to Walnut Leaf Scorch (WLS). This issue is particularly severe in the Xinjiang Uygur Autonomous Region (XUAR), one of Asia’s leading walnut-producing regions. To mitigate the disease, timely and efficient monitoring approaches for detecting infected trees and quantifying their disease severity are urgently needed. In this study, we explored the feasibility of developing a predictive model for the precise quantification of WLS severity. First, five sample plots of 4 mu each (1 mu ≈ 0.067 ha) were established to identify infected individual trees, and the WLS Disease Index (DI) was calculated for each tree. Concurrently, hyperspectral data of individual trees were acquired with an unmanned aerial vehicle (UAV) platform. Second, DI estimation models were developed based on the Random Forest (RF) and Support Vector Machine (SVM) algorithms, each optimized using either Grid Search (GS) or a Genetic Algorithm (GA). Finally, the four resulting models (GS-RF, GA-RF, GS-SVM, and GA-SVM) were systematically compared. The results showed that the Genetic Algorithm-optimized SVM model (GA-SVM) exhibited the highest predictive accuracy and robustness, achieving a coefficient of determination (R²) of 0.6302, a Root Mean Square Error (RMSE) of 0.0629, and a Mean Absolute Error (MAE) of 0.0480. Our findings demonstrate the great potential of integrating UAV-based hyperspectral remote sensing with optimized machine learning algorithms for WLS monitoring, offering a novel technical approach for the macroscopic, rapid, and non-destructive surveillance of this disease.

Keywords:

walnut leaf scorch; hyperspectral data; unmanned aerial vehicle; random forest; support vector machine; genetic algorithm

1. Introduction

Walnuts (Juglans regia L.), which are extensively cultivated throughout the world, are a crucial economic forest species of high nutritional and medicinal value [1]. The Xinjiang Uygur Autonomous Region (XUAR) has a long history of walnut cultivation and a rich variety of cultivars [2], and it is one of the prominent walnut-producing regions in Asia [3]. Walnuts in XUAR are predominantly produced in the southern part of the region. According to data from the Forest and Fruit Industry Development Center of XUAR, by the end of 2021 the walnut planting area was 6.3088 million Chinese mu (1 mu ≈ 0.067 ha), with an output of 1.1322 million tons [4]. Notably, walnuts account for more than 40% of the annual income of local farmers in XUAR [5].

Walnut leaf scorch (WLS) is generally regarded as a physiological disease [5,6,7], although one study has attributed it to Xylella fastidiosa [8]. Affected walnuts exhibit symptoms such as kernel leakage, empty shells, discoloration, and shrinkage, and the marketable rate is only 50% to 80% [9], leading to a significant decline in walnut quality and yield [6,10]. As walnut cultivation in southern XUAR expands each year, the damage from WLS has become increasingly prominent, severely affecting the region’s walnut industry [11]. As early as 2012, 20% to 30% of walnut orchards in southern XUAR showed varying degrees of leaf scorch [7].

Timely and efficient detection of infected trees and their disease severity provides a crucial basis for the prevention, accurate diagnosis, and treatment of WLS. Currently, WLS identification relies mainly on ground surveys, which are time-consuming, costly, and inefficient, and which are impractical given the large scale of walnut planting in XUAR. Although multispectral and satellite-based remote sensing have been applied to agricultural disease monitoring [12,13], their relatively low spatial resolution limits the identification of individual infected trees [14], and their broad spectral bands often fail to distinguish subtle differences in disease severity [15]. Consequently, high-precision methods for identifying WLS at large scale are urgently needed to achieve macroscopic, rapid, and non-destructive surveillance.

Hyperspectral remote sensing images targets using narrow, contiguous spectral channels, with spectral resolution on the order of nanometers [16]. Compared with traditional broadband multispectral data, its core advantage is that its many contiguous bands can sensitively capture the spectral differences caused by subtle, disease-induced physiological and biochemical changes in vegetation [17,18]. Because it is highly sensitive to minor anomalies in the spectral features of targets [19], even early-stage disease, with small changes in leaf chlorophyll content, moisture content, and cell structure, can be detected through subtle fluctuations in the spectral curve [20,21].

Low-altitude unmanned aerial vehicle (UAV) remote sensing is flexible, efficient, and cost-effective compared with ground and satellite remote sensing, and it can acquire high-spatial-resolution data that enhance monitoring accuracy [22]. Moreover, UAVs can carry hyperspectral imaging spectrometers, and the resulting hyperspectral images contain rich spatial and spectral information, showing great potential for vegetation disease monitoring [23]. For instance, Abdulridha et al. (2019) developed a hyperspectral imaging system (400–1000 nm) integrated with UAV technology to detect citrus canker across multiple disease stages, achieving classification accuracies of 94–100% for leaves and 100% accuracy for distinguishing healthy from infected trees under orchard conditions [24]. Guo et al. (2021) employed UAV-based hyperspectral imaging with vegetation indices and texture features to monitor wheat yellow rust at field scale using PLSR models, achieving optimal monitoring accuracy (R² = 0.75–0.82) with combined VI-TF models at 10 cm spatial resolution across different infection periods [25].

The construction of effective predictive models from the acquired hyperspectral data is critical for precise disease identification and assessment. Traditional statistical methods, such as partial least squares regression (PLSR), have been employed in previous agricultural monitoring studies [26,27,28]. However, as a linear algorithm, PLSR is often limited in capturing the complex, non-linear relationships between spectral features and physiological stress indicators, particularly with high-dimensional hyperspectral data [29,30]. For instance, Chemura et al. (2018) reported that linear models struggled to accurately predict coffee leaf rust severity because of the non-linear influence of infection on spectral reflectance [31]. Similarly, Wang et al. (2023) demonstrated that machine learning algorithms significantly outperformed PLSR in walnut canopy analysis by effectively modeling these non-linear interactions [26]. To address these challenges, numerous machine learning algorithms have been employed in recent years for analyzing multidimensional hyperspectral data [20,32,33,34]. Among them, Random Forest (RF) and Support Vector Machine (SVM) are well-established algorithms that perform well on the complex classification problems posed by hyperspectral data [35,36]. Many authors [37,38,39] have demonstrated that both RF and SVM are well suited to hyperspectral image classification, because they can handle large input spaces, process noisy datasets efficiently [40,41], and produce satisfactory classification accuracies.

The performance of machine learning models depends largely on the choice of hyperparameters and input features, both of which directly influence model accuracy and generalization capability [42,43]. Grid search (GS), a classical hyperparameter optimization method, systematically traverses a predefined hyperparameter space [44] and selects the combination that minimizes the validation error as optimal. However, conventional grid search addresses only hyperparameter optimization [45], whereas in the high-dimensional feature space of hyperspectral data, feature selection is equally critical [46]; this limits its effectiveness for complex data modeling.

To address this limitation, the Genetic Algorithm (GA), an adaptive search method based on Darwinian natural selection and the principles of biological genetics, can accomplish hyperparameter optimization and feature selection simultaneously [45]. This matters for RF and SVM in particular, because the optimal hyperparameters depend on which input features are selected, so the two should be optimized together; GA addresses exactly this requirement.

The advantages of GA have been validated in many studies [47,48]. For instance, Miranda et al. (2022) demonstrated that applying GA to variable selection in RF reduced the number of explanatory variables while significantly improving predictive performance [49]. Zhang et al. (2022) combined GA with SVM classifiers into a GA-SVM model that achieved an overall accuracy of 95.24% and a Kappa coefficient of 0.9234 in identifying different infection stages of pine tree diseases, significantly surpassing conventional KNN, RF, and single SVM classifiers [50]. More recently, Samanta et al. (2024) employed a Genetic Algorithm for feature selection in wheat rust disease detection and showed that GA-selected features significantly improved classification accuracy by reducing data redundancy compared with classical methods [51]. Similarly, Zhang et al. (2024) validated the GA-SVM framework for identifying maize varieties from hyperspectral images, reporting high classification accuracy and showing that it overcame the local-optimum problem of high-dimensional datasets more effectively than standard grid search optimization [52].

This study aimed to develop a predictive model for detecting WLS using UAV-based hyperspectral data together with the genetic algorithm, random forest, and support vector machine algorithms. It also provided an opportunity to compare the performance benefits of GA and GS optimization. We expect that the predictive model can contribute to the timely and efficient detection of WLS-infected trees and their disease severity, thereby providing a solid basis for the prevention, accurate diagnosis, and treatment of WLS.

2. Materials and Methods

2.1. Study Area

The study area is located in Lop County, Hotan Prefecture, Xinjiang Uygur Autonomous Region (Figure 1). The county lies on the southern edge of the Tarim Basin and is bordered by the Taklamakan Desert to the north, the Kunlun Mountains to the south, Hotan City and County across the Yurungkash River to the west, and Qira County to the east [53]. The terrain slopes from south to north and comprises mountains and plains at elevations from 1200 to 5466 m; the climate is extremely arid and temperate continental, with frequent sandstorms, low air humidity, and high evaporation [54]. As one of China’s primary walnut-producing regions, Hotan Prefecture has developed a large-scale, diversified forest fruit industry and is known as the “Hometown of Walnuts.” In 2022, the total area under fruit cultivation in Hotan Prefecture reached 3.2425 million Chinese mu, of which walnut plantations accounted for 1.7307 million Chinese mu [4].

Data collection was conducted at a walnut orchard in Kuoqiak Airik Village, Qiarbag Town, Lop County, Hotan Prefecture. The orchard borders National Highway 315 to the south and covers an area of 120 Chinese mu. The primary cultivar is Za343, with a small proportion of Xinfeng, planted at 6 m × 8 m spacing. The age structure of the walnut trees consists of 70% at 13 years, 20% ranging from 7 to 13 years, and 10% under 7 years of age.

2.2. The General Structure and Overall Workflow of This Study

The overall workflow of this study is illustrated in Figure 2. The research comprised four key stages: data acquisition, model optimization, model development, and model evaluation. Ultimately, the optimal predictive model for WLS detection was identified. Throughout this workflow, we utilized UAV-based hyperspectral images, the Genetic Algorithm (GA), Random Forest (RF), and Support Vector Machine (SVM) algorithms.

2.3. Ground Data Collection and Analysis

2.3.1. Sampling Design and Field Measurements

Following Wu et al. (2023), the walnut orchard was systematically sampled using five sample plots of approximately 4 Chinese mu each, established at the four corners and the center of the orchard [55]. This systematic sampling strategy is widely used in forest and orchard disease monitoring to ensure spatial representativeness and minimize subjective bias; for instance, Yang et al. (2021) applied it to investigate field crop diseases [56]. Each plot contained approximately 40 walnut trees (Figure 3).

We collected data during the peak incidence period of WLS, from June to August 2024, with repeated sampling across five periods (16 June, 2 July, 30 July, 12 August, and 30 August). Leaf samples were collected from five directions (east, west, south, north, and center) of the upper canopy of each tree. Three leaves were collected from each direction, and the disease severity of the central leaflets was assessed and graded. Figure 4 shows walnut leaf scorch at different severity levels.

2.3.2. Ground Data Analysis Method

Following Xing et al. (2023), we adopted the disease severity classification criteria for individual leaves listed in Table 1 [57]. The disease index (DI) of each walnut tree was calculated following Xing et al. (2023) and Chiang and Bock (2021) [57,58]:

DI = \frac{\sum_{i = 1}^{n} V_{i}}{n V_{m}}

(1)

where n is the total number of leaves investigated for each walnut tree; V_{i} is the representative value of the disease grade of the i-th leaf; and V_{m} is the maximum value of V_{i}, which equals 4.
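As a worked illustration of Equation (1), the sketch below computes DI for a single tree. It assumes the 15 leaves per tree implied by Section 2.3.1 (five canopy directions × three leaves) and integer grade values from 0 to 4, consistent with V_m = 4 (Table 1 is not reproduced here). The grades themselves are hypothetical.

```python
def disease_index(grades, v_max=4):
    """Equation (1): DI = sum(V_i) / (n * V_m) for one tree's sampled leaves."""
    if not grades:
        raise ValueError("at least one leaf grade is required")
    if any(g < 0 or g > v_max for g in grades):
        raise ValueError("each grade must lie between 0 and v_max")
    return sum(grades) / (len(grades) * v_max)


# Hypothetical grades for 15 leaves (5 canopy directions x 3 leaves per direction)
leaf_grades = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 4, 2, 1, 0, 2]
print(round(disease_index(leaf_grades), 4))  # 20 / (15 * 4) -> 0.3333
```

DI therefore ranges from 0 (no symptomatic leaves) to 1 (every sampled leaf at the maximum grade), which is the scale on which the RMSE and MAE values in Section 3 should be read.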

2.4. Hyperspectral Data Acquisition and Preprocessing

2.4.1. UAV-Based Hyperspectral Imagery Acquisition

Hyperspectral imagery was acquired by UAV synchronously with the ground-based data collection, with all flights conducted under clear weather conditions at solar noon to keep illumination consistent. The UAV platform, a DJI M350 RTK (DJI, Shenzhen, China), was equipped with an FS-60c hyperspectral imaging sensor (Hangzhou CHNSpec Technology Co., Ltd., Hangzhou, China) (Figure 5) and flown over the 120-mu survey area at an altitude of 120 m. Forward and side overlap were set to 80% and 70%, respectively, to ensure full coverage. The sensor has a spectral resolution of 2.17 nm, an image resolution of 480p (approximately 11 cm ground sampling distance), a spectral range of 400–1000 nm, and 300 discrete spectral bands.

2.4.2. Hyperspectral Imagery Preprocessing

Radiometric calibration, including reflectance correction, was performed with a white reference panel in FigSpec Studio (v2023.11.24.01, Hangzhou CHNSpec Technology Co., Ltd., Hangzhou, China). The crown boundary of each walnut tree was then digitized by visual interpretation in ArcGIS 10.4. Mask-based spectral extraction was subsequently performed in a Python 3.10 environment, yielding the mean reflectance in every hyperspectral band for each individual crown.
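The mask-based extraction step can be sketched as follows. The study's own Python code is not given, so the data layout here is an assumption: the reflectance cube is indexed [row][col][band], and the rasterized crown mask stores an integer tree ID per pixel, with 0 as background. The sketch only shows the core operation of averaging each band over the pixels inside each crown; the function name is hypothetical.

```python
from collections import defaultdict


def crown_mean_spectra(cube, mask, background=0):
    """Return {tree_id: [mean reflectance per band]} for every crown in the mask.

    cube: nested lists indexed [row][col][band]
    mask: nested lists indexed [row][col] holding integer tree IDs
    """
    sums = {}
    counts = defaultdict(int)
    for r, mask_row in enumerate(mask):
        for c, tree_id in enumerate(mask_row):
            if tree_id == background:
                continue  # pixel lies outside every digitized crown
            spectrum = cube[r][c]
            if tree_id not in sums:
                sums[tree_id] = [0.0] * len(spectrum)
            acc = sums[tree_id]
            for b, value in enumerate(spectrum):
                acc[b] += value
            counts[tree_id] += 1
    return {tid: [s / counts[tid] for s in acc] for tid, acc in sums.items()}


# Tiny hypothetical 2 x 3 image with 2 bands and two crowns (IDs 1 and 2)
cube = [
    [[0.10, 0.40], [0.20, 0.50], [0.90, 0.90]],
    [[0.30, 0.60], [0.05, 0.30], [0.15, 0.50]],
]
mask = [
    [1, 1, 0],
    [1, 2, 2],
]
print(crown_mean_spectra(cube, mask))
```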

Following Dye et al. (2011) and Abdel-Rahman (2014), spectral data beyond 900 nm were excluded from further analysis, primarily because the signal-to-noise ratio (SNR) drops sharply and noise increases in this region as the sensor approaches the limit of its spectral sensitivity [37,59]. We therefore removed the 55 bands beyond 900 nm and retained 245 bands for subsequent processing. The retained bands were smoothed with a Savitzky–Golay filter to minimize instrumental artifacts [60]. Following Yu et al. (2021) and Mullen et al. (2016), we evaluated filter window sizes from 3 to 15 points, aiming to balance noise suppression against the preservation of subtle spectral absorption features. Based on this evaluation, a 15-point window was identified as the optimal configuration for removing high-frequency noise without distorting the key spectral characteristics required for WLS detection [61,62]. The final dataset, designated the WLS-UAV Dataset, comprised smoothed spectral reflectance values from 231 wavebands for subsequent analysis.
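The band trimming and smoothing steps can be sketched in pure Python. Two details are assumptions because the text does not state them. First, the polynomial order is taken as 2, for which the (2m + 1)-point Savitzky–Golay smoothing weights have the closed form 3(3m² + 3m − 1 − 5i²) / ((2m − 1)(2m + 1)(2m + 3)). Second, the filter is applied in "valid" mode without edge padding, which drops m = 7 bands at each end of a 15-point window. Under that second assumption the 245 retained bands would become 245 − 14 = 231, which matches the band count of the WLS-UAV Dataset, but this is an inference rather than a documented step. Function names are hypothetical.

```python
def trim_bands(wavelengths, spectrum, max_nm=900.0):
    """Keep only bands at or below max_nm (e.g. drop the noisy region beyond 900 nm)."""
    kept = [(w, v) for w, v in zip(wavelengths, spectrum) if w <= max_nm]
    return [w for w, _ in kept], [v for _, v in kept]


def savgol_weights_quadratic(window):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit (assumed order)."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd integer >= 3")
    m = window // 2
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom for i in range(-m, m + 1)]


def savgol_valid(spectrum, window=15):
    """Smooth without edge padding; returns len(spectrum) - window + 1 values."""
    weights = savgol_weights_quadratic(window)
    return [
        sum(w * spectrum[start + k] for k, w in enumerate(weights))
        for start in range(len(spectrum) - window + 1)
    ]


# Hypothetical 245-band spectrum; a straight line passes through the filter unchanged
raw = [0.02 + 0.001 * b for b in range(245)]
smoothed = savgol_valid(raw, window=15)
print(len(smoothed))  # 231
```

In practice a library routine such as SciPy's savgol_filter would normally be used; note that its default edge handling keeps the input length, so it would not by itself reduce 245 bands to 231.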

2.5. Model Building

The WLS-UAV Dataset consists of the spectral reflectance in 231 bands and the corresponding disease index (DI) for each walnut tree in all five observation periods. To ensure that the models captured the full range of spatiotemporal variability [26,63], this pooled dataset was randomly partitioned at the individual tree level into a training set (70%) and a testing set (30%) [64,65]. Random forest (RF) and support vector machine (SVM) algorithms were used to develop the predictive models. Because SVM is sensitive to feature scaling, the input data were normalized to the range 0 to 1 before SVM training [66]. To optimize model performance, Grid Search (GS) and the Genetic Algorithm (GA) were used for hyperparameter optimization, with GA additionally performing feature selection [45].
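A minimal sketch of the partition and 0–1 scaling follows, under two stated assumptions. "Partitioned at the individual tree level" is interpreted here as assigning all five observation periods of a tree to the same subset, and the min–max bounds are computed on the training set only and then applied to the test set; the text does not say which data the bounds were fitted on. The record layout and function names are hypothetical.

```python
import random


def split_by_tree(records, train_frac=0.7, seed=42):
    """Split (tree_id, features, di) records so that each tree lands in one subset."""
    tree_ids = sorted({rec[0] for rec in records})
    rng = random.Random(seed)
    rng.shuffle(tree_ids)
    n_train = round(train_frac * len(tree_ids))
    train_ids = set(tree_ids[:n_train])
    train = [rec for rec in records if rec[0] in train_ids]
    test = [rec for rec in records if rec[0] not in train_ids]
    return train, test


def fit_minmax(rows):
    """Per-feature (min, max) bounds computed from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_minmax(rows, bounds):
    """Scale each feature with the training bounds; constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Hypothetical data: 10 trees x 5 observation periods, 3 bands each
rng = random.Random(0)
records = [(t, [rng.random() for _ in range(3)], rng.random()) for t in range(10) for _ in range(5)]
train, test = split_by_tree(records)
bounds = fit_minmax([feat for _, feat, _ in train])
train_x = apply_minmax([feat for _, feat, _ in train], bounds)
test_x = apply_minmax([feat for _, feat, _ in test], bounds)
```

With bounds taken from the training set, scaled test values can fall slightly outside [0, 1]; that is expected and avoids leaking test information into preprocessing.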

2.5.1. Random Forest

Random Forest is an ensemble learning method proposed by Leo Breiman (2001) that constructs a multitude of decision trees during training [40]. It combines bagging (bootstrap aggregating) with random feature selection to create a diverse set of decorrelated trees [67]. Each tree is grown on a random bootstrap sample drawn from the training data, and when splitting a node, the algorithm considers only a random subset of the available features. This dual randomization effectively reduces model variance [68]. In a regression task, the final prediction is the average of the outputs of all individual trees in the forest. The randomForest package [67] in R version 4.5.1 [69] was used to build the RF model.
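To make the two sources of randomness concrete, the sketch below builds a toy regression forest in pure Python. It is not the randomForest package: each "tree" is deliberately a single split (a stump) to keep the code short, whereas randomForest grows deep trees. What it does illustrate is the bootstrap sample drawn per tree, the random subset of mtry candidate features at the split, and the averaging of tree outputs for regression. All names and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, features):
    """Lowest-SSE single split over the candidate features (a depth-1 tree)."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            if not right:  # threshold at the maximum leaves nothing on the right
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # all candidate values identical: predict the sample mean
        m = mean(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, ntree=50, mtry=1, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        features = rng.sample(range(p), mtry)       # random feature subset for the split
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Regression output: the average of all tree predictions."""
    return mean(lm if row[f] <= t else rm for f, t, lm, rm in forest)


# Hypothetical data: feature 0 drives the response, feature 1 is noise
rng = random.Random(1)
X = [[x / 20, rng.random()] for x in range(20)]
y = [0.1 if row[0] < 0.5 else 0.6 for row in X]
forest = fit_forest(X, y, ntree=50, mtry=1, seed=0)
print(predict_forest(forest, [0.1, 0.5]), predict_forest(forest, [0.9, 0.5]))
```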

2.5.2. Support Vector Machine

Support Vector Regression (SVR) is an adaptation of the Support Vector Machine algorithm for predicting continuous values [70]. Its fundamental principle is to find a function that fits the data points within a specified error margin, epsilon (ε). Unlike traditional regression models that minimize the error for all data points, SVM creates an “epsilon-insensitive tube” around the regression function and ignores errors inside this tube [71]. Only the data points lying on or outside the tube, the support vectors, influence the final regression function. To model complex, non-linear relationships, SVM employs the kernel trick (e.g., the radial basis function (RBF) kernel), which implicitly maps the data into a higher-dimensional feature space in which a linear regression function can be fitted [72]. The e1071 package [73] in R version 4.5.1 [69] was used to build the SVM model.
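The two SVR ingredients described above can be written out directly. The sketch assumes the RBF kernel in the form exp(−γ‖x − z‖²), the parameterization e1071 documents for its radial kernel, and shows the ε-insensitive loss and the kernel expansion f(x) = Σ(α_i − α_i*)K(x_i, x) + b. The dual coefficients and intercept in the usage line are made-up numbers; obtaining real ones requires solving the SVR quadratic program, which is not shown.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + intercept


eps = 2 ** -5  # 0.03125, one value from the epsilon grid in Section 2.5.3
print(eps_insensitive_loss(0.30, 0.32, eps))  # error 0.02 lies inside the tube
print(eps_insensitive_loss(0.30, 0.40, eps))  # error 0.10 lies outside: about 0.06875
print(svr_predict([0.0], [[0.0], [1.0]], [0.5, -0.5], 0.1, gamma=1.0))  # made-up model
```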

2.5.3. Hyperparameter Optimization and Feature Selection

To enhance the predictive performance and generalization ability of the models, a systematic approach was employed to identify the optimal combination of hyperparameters and input features for both RF and SVM. This was achieved by comparing a conventional grid search (GS) with a more advanced Genetic Algorithm (GA) approach.

Grid Search (GS) was implemented for hyperparameter tuning. It systematically evaluated each hyperparameter combination defined in the parameter grids. For each combination, a 10-fold cross-validation was performed on the training set. The average Root Mean Square Error (RMSE) across the ten folds was calculated, and the hyperparameter set yielding the minimum average RMSE was selected as optimal [43,74,75]. While comprehensive, this method is computationally intensive and is limited to optimizing hyperparameters without simultaneously performing feature selection [76].
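The GS procedure (evaluate every grid point by 10-fold cross-validation on the training set and keep the one with the lowest mean RMSE) does not depend on the model, so it can be sketched with any regressor. To stay self-contained, the sketch below tunes a hypothetical one-dimensional k-nearest-neighbour regressor over its number of neighbours; in the study, the same loop wraps RF (ntree, mtry) or SVM (C, γ, ε) fits in R. Assigning folds by a seeded shuffle is an assumption.

```python
import itertools
import math
import random


def kfold_indices(n, n_folds=10, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, x, n_neighbors):
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_rmse(x, y, params, n_folds=10):
    """Mean RMSE over the folds for one hyperparameter combination."""
    fold_rmse = []
    for fold in kfold_indices(len(x), n_folds):
        held = set(fold)
        train_idx = [i for i in range(len(x)) if i not in held]
        tx, ty = [x[i] for i in train_idx], [y[i] for i in train_idx]
        sq = [(y[i] - knn_predict(tx, ty, x[i], params["n_neighbors"])) ** 2 for i in fold]
        fold_rmse.append(math.sqrt(sum(sq) / len(sq)))
    return sum(fold_rmse) / len(fold_rmse)


def grid_search(x, y, grid):
    """Try every combination in grid; return (best_params, lowest mean CV RMSE)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_rmse(x, y, params)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Hypothetical noisy 1-D data
rng = random.Random(3)
x = [i / 60 for i in range(60)]
y = [math.sin(3 * xi) + rng.gauss(0, 0.05) for xi in x]
best_params, best_rmse = grid_search(x, y, {"n_neighbors": [1, 3, 5, 9, 15]})
print(best_params, round(best_rmse, 4))
```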

The predefined parameter grids in this study were as follows: for the Random Forest model, the search space included ntree (100, 200, 300, …, 800) and mtry (features per split: {1/16, 1/8, 1/6, 1/4, 1/2, 3/4, 1, 2} multiplied by the default mtry) [67,77,78,79]. For the Support Vector Machine model (using a radial basis function kernel), the parameters included C (2⁻³, 2⁻¹, …, 2¹¹), γ (2⁻¹¹, 2⁻⁹, …, 2³), and ε (2⁻⁸, 2⁻⁷, …, 2⁻¹) [80,81].
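For reference, the grids listed above can be generated programmatically. Each of the five hyperparameters has eight candidate values, giving 8 × 8 = 64 RF combinations and 8³ = 512 SVM combinations. Two points are assumptions rather than statements from the text: the default mtry is taken as randomForest's regression default, max(floor(p/3), 1), and the product with the factor is rounded down (and capped at p). Under those assumptions the mtry values reported in Section 3.1 are reproduced: floor(0.5 × floor(231/3)) = 38 with all 231 bands, and floor(0.75 × floor(108/3)) = 27 with the 108 GA-RF bands.

```python
import math
from fractions import Fraction

NTREE = list(range(100, 801, 100))                  # 100, 200, ..., 800
MTRY_FACTORS = [Fraction(1, 16), Fraction(1, 8), Fraction(1, 6), Fraction(1, 4),
                Fraction(1, 2), Fraction(3, 4), Fraction(1), Fraction(2)]
SVM_C = [2.0 ** e for e in range(-3, 12, 2)]        # 2^-3, 2^-1, ..., 2^11
SVM_GAMMA = [2.0 ** e for e in range(-11, 4, 2)]    # 2^-11, 2^-9, ..., 2^3
SVM_EPSILON = [2.0 ** e for e in range(-8, 0)]      # 2^-8, 2^-7, ..., 2^-1


def default_mtry(n_features):
    """randomForest's default for regression: max(floor(p / 3), 1)."""
    return max(n_features // 3, 1)


def mtry_from_factor(n_features, factor):
    """Assumed rounding: floor(factor * default mtry), kept within [1, p]."""
    return min(max(math.floor(factor * default_mtry(n_features)), 1), n_features)


print(len(NTREE) * len(MTRY_FACTORS), len(SVM_C) * len(SVM_GAMMA) * len(SVM_EPSILON))  # 64 512
print(mtry_from_factor(231, Fraction(1, 2)), mtry_from_factor(108, Fraction(3, 4)))  # 38 27
```

Exact fractions are used for the mtry factors so that values such as 1/6 do not pick up floating-point error before rounding.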

To address the limitations of GS and investigate a potentially superior optimization strategy, the Genetic Algorithm (GA) was implemented. In theory, unlike the exhaustive traversal of GS, which suffers from the “curse of dimensionality” [82], GA employs a heuristic evolutionary search that is well suited to combined high-dimensional feature selection and hyperparameter optimization [45,47]. A key distinction lies in how local optima are avoided: whereas GS or simple gradient-based methods may become trapped in local optima, GA maintains population diversity through its genetic operators [48,49]. Specifically, the ‘crossover’ operator recombines superior genes to explore the solution space, while the ‘mutation’ operator introduces stochastic perturbations with a low probability [83]. This mechanism lets GA balance global exploration with local exploitation, which in theory enables it to escape local optima and move toward the global optimum.
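The two operators can be sketched on binary chromosomes as follows. The sketch uses single-point crossover and a mutation that flips one randomly chosen bit with probability p_m; these are common choices for binary GAs, but which operators the study's GA configuration actually used is not stated, so treat this as an illustration under that assumption. Function names are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents after a random cut point."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]


def mutate(chromosome, p_mutation, rng):
    """With probability p_mutation, flip one randomly chosen bit of a copy."""
    child = list(chromosome)
    if rng.random() < p_mutation:
        j = rng.randrange(len(child))
        child[j] = 1 - child[j]
    return child


rng = random.Random(7)
a, b = [0] * 10, [1] * 10
c1, c2 = single_point_crossover(a, b, rng)
print(c1, c2)
print(mutate(a, 1.0, rng))  # exactly one bit flipped
```

Crossover recombines blocks of bits that already exist in the population, while mutation can introduce a bit value that no parent carries at that position, which is what lets the search leave a local optimum.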

To implement this strategy for concurrently optimizing hyperparameters and selecting the most relevant feature subset [84,85], a custom chromosome structure was designed to encode a complete solution in two distinct segments: one for the hyperparameters (Ch_{hyp}) and one for the features (Ch_{feat}). The structure of a chromosome (Ch) can be represented as:

Ch = \overbrace{P_{1}^{1} P_{1}^{2} P_{1}^{3} \dots P_{j}^{1} P_{j}^{2} P_{j}^{3}}^{Ch_{hyp}} \; \overbrace{F_{1} F_{2} F_{3} \dots F_{k} \dots F_{231}}^{Ch_{feat}}

(2)

where Ch is the chromosome of any individual in the GA population; the hyperparameter segment, Ch_{hyp}, consists of one gene per hyperparameter; j indexes the hyperparameters, of which there are two for RF and three for SVM; P_{j}^{1}, P_{j}^{2}, and P_{j}^{3} are the first, second, and third bits of the gene for the j-th hyperparameter; the feature selection segment, Ch_{feat}, consists of one gene per feature; and F_{k} is the gene for the k-th input feature. All gene values are binary (0 or 1): F_{k} = 1 means that the k-th input feature is included, and F_{k} = 0 means that it is excluded.
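Concretely, with three bits per hyperparameter and one bit per band, an RF chromosome has 2 × 3 + 231 = 237 genes and an SVM chromosome 3 × 3 + 231 = 240 genes. The sketch below, with hypothetical helper names, splits a chromosome into its two segments and lists the selected band indices.

```python
import random

BITS_PER_HYPERPARAMETER = 3
N_BANDS = 231


def split_chromosome(ch, n_hyper):
    """Return (list of 3-bit hyperparameter genes, 231-bit feature mask)."""
    n_hyp_bits = n_hyper * BITS_PER_HYPERPARAMETER
    if len(ch) != n_hyp_bits + N_BANDS:
        raise ValueError("unexpected chromosome length")
    hyp = [ch[i:i + BITS_PER_HYPERPARAMETER] for i in range(0, n_hyp_bits, BITS_PER_HYPERPARAMETER)]
    return hyp, ch[n_hyp_bits:]


def selected_bands(feature_mask):
    """0-based indices k with F_k = 1 (band k + 1 in the 1-based notation of Eq. (2))."""
    return [k for k, bit in enumerate(feature_mask) if bit == 1]


rng = random.Random(0)
svm_ch = [rng.randint(0, 1) for _ in range(3 * BITS_PER_HYPERPARAMETER + N_BANDS)]
hyp_genes, mask = split_chromosome(svm_ch, n_hyper=3)
print(len(svm_ch), hyp_genes, len(selected_bands(mask)))
```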

The conversion from the binary gene segment (Ch_{hyp}) to the real value of a hyperparameter is as follows:

P_{j} = \mathrm{List}_{j}\left[\mathrm{decimal}\left(P_{j}^{1} P_{j}^{2} P_{j}^{3}\right)\right]

(3)

where P_{j} is the real value of the j-th hyperparameter; List_{j} is the predefined list of candidate values for the j-th hyperparameter; and P_{j}^{1} P_{j}^{2} P_{j}^{3} is the gene for the j-th hyperparameter.
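Equation (3) amounts to reading each 3-bit gene as an unsigned integer from 0 to 7 and using it to index an eight-entry candidate list, which matches the eight values per hyperparameter in the grids of Section 2.5.3. The bit order (P_j^1 as the most significant bit) and the order of hyperparameters in the segment (C, γ, ε) are assumptions. With the SVM lists ordered as in those grids, the genes 111 | 000 | 110 decode to C = 2¹¹, γ = 2⁻¹¹ and ε = 2⁻², the GA-SVM values reported in Section 3.1.

```python
SVM_LISTS = {
    "C": [2.0 ** e for e in range(-3, 12, 2)],       # 2^-3, 2^-1, ..., 2^11
    "gamma": [2.0 ** e for e in range(-11, 4, 2)],   # 2^-11, 2^-9, ..., 2^3
    "epsilon": [2.0 ** e for e in range(-8, 0)],     # 2^-8, 2^-7, ..., 2^-1
}


def decimal(bits):
    """Binary-to-integer conversion with the first bit most significant (assumed)."""
    value = 0
    for bit in bits:
        value = value * 2 + bit
    return value


def decode(hyp_genes, lists):
    """Equation (3): P_j = List_j[decimal(P_j^1 P_j^2 P_j^3)]."""
    return {name: lists[name][decimal(gene)] for name, gene in zip(lists, hyp_genes)}


print(decode([[1, 1, 1], [0, 0, 0], [1, 1, 0]], SVM_LISTS))
# {'C': 2048.0, 'gamma': 0.00048828125, 'epsilon': 0.25}
```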

The GA process begins with an initial random population of chromosomes, whose fitness is then evaluated to guide the evolution [45]. The fitness function [86], designed to balance model accuracy against feature parsimony, was also computed with 10-fold cross-validation on the training set and is defined as:

\mathrm{Fitness} = 0.8 \times \overline{R^{2}} + 0.2 \times \frac{1}{N}

(4)

where \overline{R^{2}} is the average coefficient of determination (R²) across the ten cross-validation folds of the training set, and N is the number of selected features.
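A direct transcription of Equation (4) helps show the relative weight of its two terms. With roughly 100 selected bands, as in Section 3.1, the parsimony term 0.2/N contributes only about 0.002, so the cross-validated R² dominates the score; the term becomes substantial only for very small subsets (0.2 at N = 1). The fold R² values below are hypothetical, and rejecting an empty feature subset is an assumption.

```python
def fitness(fold_r2, n_selected):
    """Equation (4): 0.8 * mean fold R^2 + 0.2 / N."""
    if n_selected < 1:
        raise ValueError("at least one feature must be selected")  # assumed handling
    mean_r2 = sum(fold_r2) / len(fold_r2)
    return 0.8 * mean_r2 + 0.2 / n_selected


folds = [0.62, 0.58, 0.65, 0.60, 0.61, 0.63, 0.59, 0.64, 0.60, 0.62]  # hypothetical
print(round(fitness(folds, 96), 4), round(0.2 / 96, 4))  # total score, parsimony term
```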

Individuals were selected as parents based on this fitness score, with fitter individuals having a greater probability of being chosen for crossover and mutation. To preserve the best solutions, an elitism strategy automatically carried the top 5 individuals into the next generation. The iterative process ran for at most 100 generations; to improve efficiency, an early stopping criterion terminated the process if the population’s maximum fitness did not improve for 20 consecutive generations. The GA was configured with a population size of 100, a crossover probability of 0.8, and a mutation probability of 0.1, using the GA package [87] in R version 4.5.1 [69].
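Putting the pieces together, the loop below is a compact sketch of the configuration described above: population 100, crossover probability 0.8, mutation probability 0.1, five elites, at most 100 generations, and early stopping after 20 generations without improvement. It is not the GA package. Selection is linear-rank (one way to give fitter individuals a higher chance), the operators are single-point crossover and one-bit mutation, and the fitness is a hypothetical toy in which only the first 5 of 20 "bands" are informative, standing in for the cross-validated score of Equation (4).

```python
import random

N_BITS, INFORMATIVE = 20, set(range(5))


def toy_fitness(ch):
    """Hypothetical stand-in for Eq. (4): reward informative bits, penalize noise bits."""
    chosen = [k for k, bit in enumerate(ch) if bit]
    if not chosen:
        return 0.0
    hits = sum(1 for k in chosen if k in INFORMATIVE)
    pseudo_r2 = hits / len(INFORMATIVE) - 0.02 * (len(chosen) - hits)
    return 0.8 * pseudo_r2 + 0.2 / len(chosen)


def run_ga(fitness, n_bits, pop_size=100, pc=0.8, pm=0.1, elitism=5,
           max_iter=100, run=20, seed=1):
    """Maximize fitness over bit strings; return (best chromosome, best-fitness history)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    weights = list(range(pop_size, 0, -1))  # linear rank: best gets pop_size, worst 1
    history, stall = [], 0
    for _ in range(max_iter):
        ranked = sorted(pop, key=fitness, reverse=True)
        best = fitness(ranked[0])
        stall = stall + 1 if history and best <= history[-1] else 0
        history.append(best)
        if stall >= run:  # early stop: no improvement for `run` generations
            break
        next_pop = [list(ch) for ch in ranked[:elitism]]  # elitism
        while len(next_pop) < pop_size:
            a, b = rng.choices(ranked, weights=weights, k=2)
            if rng.random() < pc:  # single-point crossover
                cut = rng.randrange(1, n_bits)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (list(a), list(b)):
                if rng.random() < pm:  # flip one random bit
                    j = rng.randrange(n_bits)
                    child[j] = 1 - child[j]
                if len(next_pop) < pop_size:
                    next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness), history


best_ch, history = run_ga(toy_fitness, N_BITS)
print(best_ch, round(toy_fitness(best_ch), 4), len(history))
```

Because the elites are copied unchanged, the best fitness per generation can never decrease, which is what makes the "no improvement for 20 generations" test a well-defined stopping rule.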

2.6. Evaluation Metrics

Once the optimal hyperparameters (and, for GA, the feature subsets) were identified, the final predictive models were trained on the entire 70% training set, and the remaining 30% test set was used to validate them. To quantitatively assess model performance, three widely used evaluation metrics were chosen: the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE).

The R² value indicates the proportion of variance in the observed data that is explained by the model, where a value closer to 1 signifies a better model fit. RMSE and MAE measure the magnitude of the error between predicted and observed values, with lower values indicating higher prediction accuracy. The evaluation metrics are defined in Equations (5)–(7).

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(5)

R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(6)

M A E = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(7)

where n is the total number of samples in the test set; y_{i} is the observed value of the i-th sample; \hat{y}_{i} is the predicted value of the i-th sample; and \bar{y} is the mean of all observed sample values.
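Equations (5)–(7) translate directly into code. The observed and predicted DI values below are hypothetical. Note that R² in Equation (5) is computed against the mean of the observed test values, so it can be negative for a model that predicts worse than that mean; it is not the same quantity as the R² of a regression line fitted through a predicted-versus-observed scatter plot.

```python
import math


def r2_score(y, y_hat):
    """Equation (5): 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Equation (6): square root of the mean squared error."""
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Equation (7): mean absolute error."""
    return sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / len(y)


observed = [0.10, 0.25, 0.40, 0.30]   # hypothetical DI values
predicted = [0.12, 0.20, 0.35, 0.33]
print(round(r2_score(observed, predicted), 4), round(rmse(observed, predicted), 4), round(mae(observed, predicted), 4))
```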

3. Results

3.1. Model Optimization Results

The optimal configurations of the RF and SVM models were determined through 10-fold cross-validation on the 70% training set, and the final configurations of the four models are shown in Table 2. The optimal hyperparameter values for both RF and SVM differed between the two optimization methods. The models optimized via Grid Search (GS) used the full set of 231 spectral features. The optimal hyperparameter set for the GS-RF model was (ntree: 600, mtry: 38, mtry factor: 0.5), which yielded the lowest cross-validation RMSE of 0.0754 ± 0.0103. Similarly, the optimal configuration for the GS-SVM model was (C: 2¹¹, γ: 2⁻¹¹, ε: 2⁻⁵), achieving a minimum RMSE of 0.0683 ± 0.0081. In contrast, the Genetic Algorithm (GA) approach also performed feature selection. The optimal GA-RF model used hyperparameters (ntree: 400, mtry: 27, mtry factor: 0.75) and a reduced subset of 108 features, achieving the highest fitness score of 0.4115. The GA-SVM model selected an even more streamlined set of 96 bands with optimal hyperparameters (C: 2¹¹, γ: 2⁻¹¹, ε: 2⁻²), reaching a maximum fitness of 0.4966.

The spectral features (bands) selected by GA-RF and GA-SVM are shown in Figure 6. The “Common” set (green) comprises the core wavelengths selected by both algorithms, whereas the “GA-RF Only” (blue) and “GA-SVM Only” (red) sets contain wavelengths selected exclusively by the respective algorithm.

3.2. Comparative Performance and Visualization

The evaluation metrics (R², RMSE, and MAE) of each model are detailed in Table 3. Overall, RF performed worse than SVM. Moreover, RF consistently showed clear signs of overfitting, performing markedly better on the training set than on the test set regardless of the optimization method (GS or GA). In contrast, SVM performed robustly, with minimal discrepancy between training and test results. Furthermore, GA-based feature selection improved all three test-set metrics for both RF and SVM (Table 3).

Figure 7 displays scatter plots of predicted versus observed values with their regression fits (left panels), alongside the residual distributions of each model (right panels). The GA-SVM model (Figure 7g,h) performed best: its regression line aligned most closely with the 1:1 diagonal, indicating minimal systematic bias. The GS-SVM model (Figure 7c,d) performed comparably, though with slightly higher bias. In contrast, the regression lines of the RF models (Figure 7a,b,e,f) deviated more markedly from the 1:1 diagonal, suggesting a stronger systematic tendency toward overestimation or underestimation.

The Genetic Algorithm-optimized SVM (GA-SVM) achieved the best test-set scores on all three metrics (R² = 0.6302, RMSE = 0.0629, and MAE = 0.0480) and showed minimal systematic bias. GA-SVM was therefore selected as the optimal model, because it fit the training data well while maintaining superior generalization ability.

4. Discussion

4.1. Model Development and Performance

This study aimed to address a notable gap by developing a framework for the large-scale monitoring of WLS, a task for which traditional ground surveys are inefficient and ill-suited. By leveraging UAV-based hyperspectral remote sensing, we demonstrated that machine learning models built on canopy reflectance data can effectively predict the disease index, identifying infected trees and the severity of WLS.

The central finding is the superior performance of the Support Vector Machine model optimized with a Genetic Algorithm (GA-SVM), which outperformed both the Random Forest (RF) and Grid Search-tuned SVM (GS-SVM) models. Its superior predictive accuracy was confirmed by achieving the highest coefficient of determination (R² = 0.6302) alongside the lowest root mean square error (RMSE) and mean absolute error (MAE).

Numerous studies [88,89,90] have documented the superior performance of SVM for both regression and classification tasks in forestry and ecological research, particularly when compared with other machine learning algorithms. For instance, Zadbagher et al. (2024) found that SVM was the best method for estimating forest above-ground biomass among a linear regression (LR) model and three machine learning (ML) algorithms, i.e., RF, Artificial Neural Network (ANN), and SVM [91]. Zhang et al. (2023) employed SVM, K-Nearest Neighbor (KNN), RF, decision trees (DT), and Multi-layer Perceptron (MLP) to classify forest land resource types, with SVM achieving the highest overall accuracy (95.8%) as well as the highest average accuracy (SVM 88.3%, KNN 87.5%, RF 85.3%, MLP 85.0%, and DT 77.5%) [92].

The robustness of Support Vector Machines (SVMs), particularly with high-dimensional data, is rooted primarily in the principle of structural risk minimization [72]. This characteristic helps explain why SVM avoided the overfitting observed in the Random Forest model in our study. The RF model, driven by empirical risk minimization and lacking manual depth constraints, tended to grow excessively complex trees that fitted the high-dimensional noise [77]. Unlike algorithms that seek only to minimize training error (empirical risk), SVM aims to minimize an upper bound on the generalization error, thereby providing a stronger safeguard against overfitting [93]. SVM operationalizes this principle through margin maximization, constructing a model defined only by the most critical data points (the support vectors) to achieve robustness against noise and outliers [94]. Furthermore, the kernel trick extends this framework to model complex, non-linear patterns by implicitly mapping the data into a higher-dimensional feature space [95], a crucial capability for analyzing intricate datasets such as hyperspectral imagery. To fully leverage this theoretical framework in our context, we implemented two improvement measures: (1) Regularization, where optimizing the penalty factor (C) functions as an automated regularization term [96]; and (2) Structural Adjustment, where the input structure is dynamically optimized by filtering out redundant noise.
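To make the structural-risk argument concrete, the sketch below writes out the ingredients of ε-support vector regression: the ε-insensitive loss, the regularized objective in which C weighs training errors against model flatness, and the RBF kernel that implements the kernel trick. It is a textbook linear-primal illustration under assumed values (ε and the toy data), not the solver or kernel settings used in this study; the RBF kernel is assumed only because the text tunes γ.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2), an inner product in an implicit feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def epsilon_insensitive(residual: float, epsilon: float) -> float:
    """SVR loss: residuals inside the +/- epsilon tube cost nothing."""
    return max(0.0, abs(residual) - epsilon)


def linear_predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svr_objective(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                  y: Sequence[float], C: float, epsilon: float) -> float:
    """Primal linear SVR objective: 0.5*||w||^2 + C * sum of epsilon-insensitive losses.

    The ||w||^2 term (flatness, i.e. the margin) controls model capacity;
    C sets how strongly training errors are penalized against it.
    """
    capacity = 0.5 * sum(wi * wi for wi in w)
    errors = sum(epsilon_insensitive(yi - linear_predict(w, b, xi), epsilon)
                 for xi, yi in zip(X, y))
    return capacity + C * errors


def on_or_outside_tube(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                       y: Sequence[float], epsilon: float) -> List[bool]:
    """At the optimum, only these samples can carry non-zero dual weights (support vectors)."""
    return [abs(yi - linear_predict(w, b, xi)) >= epsilon for xi, yi in zip(X, y)]


# Hypothetical one-feature example.
X, y = [[0.0], [1.0], [2.0]], [0.05, 1.5, 2.0]
print(svr_objective([1.0], 0.0, X, y, C=1.0, epsilon=0.1))
print(on_or_outside_tube([1.0], 0.0, X, y, epsilon=0.1))
```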

Although SVM performs strongly among machine learning algorithms, it is frequently combined with a conventional grid search (GS) or a more advanced Genetic Algorithm (GA) to reach an optimal configuration and further enhance predictive performance and generalization ability. For instance, Lameski et al. (2015) investigated how tuning SVM parameters with GS affects classification performance and over-fitting, and reported that proper parameter tuning improved not only predictive performance but also the stability of the classification models, even when the test data came from a different time period and class distribution [97]. Similarly, Hong et al. (2018) combined GA with SVM to select the optimal combination of forest-fire-related variables for modeling forest fire susceptibility, and reported that GA-SVM yielded higher AUC values than the original SVM model [98].

The improvement of machine learning models after introducing GA or GS can be explained by their systematic approaches to hyperparameter optimization and feature selection, which address critical limitations of default model configurations [99,100]. GS exhaustively tests all combinations of hyperparameters on a predefined grid, ensuring that no configuration on the grid is overlooked [101]. This brute-force approach guarantees the best configuration among the grid points and is tractable in low-to-moderate dimensional parameter spaces [37], although the optimum it finds is only as fine as the grid resolution. In comparison, GA mimics natural selection (selection, crossover, and mutation) to evolve a population of candidate solutions, enabling global search and multi-objective optimization in complex, high-dimensional parameter spaces [102].
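Grid search as described here is an exhaustive loop over the Cartesian product of candidate values. In the sketch below, the power-of-two grids and the `toy_score` surface (a hypothetical stand-in for cross-validated R²) are assumptions; the point is that GS returns the best grid point, whose accuracy is bounded by the grid spacing.

```python
import itertools
import math
from typing import Callable, List, Tuple


def log2_grid(lo_exp: int, hi_exp: int) -> List[float]:
    """Powers of two from 2**lo_exp to 2**hi_exp, a common grid for C and gamma."""
    return [2.0 ** e for e in range(lo_exp, hi_exp + 1)]


def grid_search(score: Callable[[float, float], float], Cs: List[float],
                gammas: List[float]) -> Tuple[Tuple[float, float, float], int]:
    """Evaluate every (C, gamma) pair; return (best score, C, gamma) and the evaluation count."""
    best = None
    n_evals = 0
    for C, gamma in itertools.product(Cs, gammas):
        s = score(C, gamma)
        n_evals += 1
        if best is None or s > best[0]:
            best = (s, C, gamma)
    return best, n_evals


def toy_score(C: float, gamma: float) -> float:
    """Hypothetical smooth surface peaked at C = 10, gamma = 0.05 (stand-in for CV R²)."""
    return -((math.log10(C) - 1.0) ** 2 + (math.log10(gamma) - math.log10(0.05)) ** 2)


best, n_evals = grid_search(toy_score, log2_grid(-5, 15), log2_grid(-15, 3))
print(best, n_evals)
```

With these grids, 21 × 19 = 399 evaluations return C = 8 and γ = 0.0625, the grid points nearest the toy optimum at C = 10 and γ = 0.05; refining the grid raises the cost multiplicatively.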

Numerous researchers have argued that, compared with GS, GA (an optimization method based on Darwinian natural selection [87]) performs better on combinatorial optimization problems. Consequently, GA has been extensively applied to Support Vector Machine (SVM) modeling and feature selection in forestry and ecological research [86,103]. For instance, using polarimetric SAR data, Ji et al. (2021) employed a GA-SVM approach to estimate forest aboveground biomass (AGB), obtaining cross-validation coefficients of 80.21% (GF-3) and 71.41% (ALOS-2) and outperforming traditional approaches such as GA with default SVM parameters or GS optimization [104]. Sukawattanavijit et al. (2017) applied GA-SVM to multisource remote sensing data and achieved >95% classification accuracy, outperforming GS optimization by 5–8% while reducing the number of input features by 30–40% [103].

The superior performance of the GA-SVM model could be attributed to the inherent strengths of the Genetic Algorithm in complex optimization tasks. Unlike Grid Search, which is exhaustive but tunes only the hyperparameters, GA can simultaneously optimize both the model hyperparameters and the input feature subset [45]. This simultaneous optimization is critical for managing model complexity: by selecting feature subsets while tuning parameters (e.g., C and γ), the model balances bias and variance and avoids the overfitting risks often associated with manual or fixed-step tuning. More importantly, GA's population-based, stochastic search explores the search space more broadly, reducing the risk of becoming trapped in local optima and thus increasing the probability of finding the global optimum [105,106]. This mechanism forms a synergistic closed loop that combines heuristic global search with SVM's structural risk minimization. Moreover, numerous studies show that, especially in complex problems, GA can find better or near-optimal parameter combinations in less time than grid search [107]. This dual optimization is particularly advantageous for the data characteristics of this study: it addresses the high dimensionality of the hyperspectral data [108] by filtering out uninformative noise, so that the SVM operates on a highly discriminative feature subset. To further examine the model's decision-making process, we analyzed the spectral features selected by the GA-SVM model (Figure 6). The selected bands are broadly distributed across the visible, red-edge, and near-infrared (NIR) regions. Physiologically, features in the visible and red-edge regions align with pigment degradation, while NIR selections likely capture structural and water content changes associated with necrosis [109].
However, the discrete distribution of these bands reflects the inherently data-driven character of the Genetic Algorithm [83]. Rather than choosing bands for their individual physiological signal, GA selects features for their joint contribution to model accuracy: it identifies complementary features that, although potentially less informative individually, together reduce noise and enhance separability in the high-dimensional feature space [110].
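The joint search described above can be sketched as a GA whose chromosome concatenates a binary band mask with log2(C) and log2(γ). Everything concrete below is an assumption for illustration: tournament selection, one-point and blend crossover, bit-flip and Gaussian mutation, elitism, the population settings, and the hypothetical `toy_fitness`, which stands in for the cross-validated R² that an SVR trained on the kept bands would return. It is not the authors' GA configuration.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

C_BOUNDS = (-5.0, 15.0)    # assumed search range for log2(C)
G_BOUNDS = (-15.0, 3.0)    # assumed search range for log2(gamma)


@dataclass
class Individual:
    mask: List[int]        # 1 = band kept, 0 = band dropped
    log2_C: float
    log2_gamma: float


def clip(v: float, bounds: Tuple[float, float]) -> float:
    return min(max(v, bounds[0]), bounds[1])


def random_individual(rng: random.Random, n_bands: int) -> Individual:
    return Individual([rng.randint(0, 1) for _ in range(n_bands)],
                      rng.uniform(*C_BOUNDS), rng.uniform(*G_BOUNDS))


def tournament(pop: List[Individual], fits: List[float], rng: random.Random, k: int = 3) -> Individual:
    """Selection: the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """One-point crossover on the band mask, blend crossover on the hyperparameters."""
    cut = rng.randint(1, len(a.mask) - 1)
    w = rng.random()
    return Individual(a.mask[:cut] + b.mask[cut:],
                      w * a.log2_C + (1 - w) * b.log2_C,
                      w * a.log2_gamma + (1 - w) * b.log2_gamma)


def mutate(ind: Individual, rng: random.Random, p_flip: float = 0.02, sigma: float = 0.5) -> Individual:
    """Bit-flip mutation on the mask, Gaussian mutation on log2(C) and log2(gamma)."""
    mask = [1 - g if rng.random() < p_flip else g for g in ind.mask]
    return Individual(mask,
                      clip(ind.log2_C + rng.gauss(0.0, sigma), C_BOUNDS),
                      clip(ind.log2_gamma + rng.gauss(0.0, sigma), G_BOUNDS))


def run_ga(fitness: Callable[[Individual], float], n_bands: int,
           pop_size: int = 30, generations: int = 40, seed: int = 0) -> Tuple[Individual, float]:
    rng = random.Random(seed)
    pop = [random_individual(rng, n_bands) for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: fits[i])]   # elitism: keep the best
        children = [elite]
        while len(children) < pop_size:
            parent_a = tournament(pop, fits, rng)
            parent_b = tournament(pop, fits, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = children
    fits = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]


INFORMATIVE = {2, 5, 7}    # hypothetical "useful" band indices for the toy fitness


def toy_fitness(ind: Individual) -> float:
    """Hypothetical stand-in for cross-validated R² of an SVR on the kept bands."""
    kept = {i for i, g in enumerate(ind.mask) if g}
    hits, noise = len(kept & INFORMATIVE), len(kept - INFORMATIVE)
    return hits - 0.3 * noise - 0.05 * ((ind.log2_C - 3) ** 2 + (ind.log2_gamma + 4) ** 2)


best, score = run_ga(toy_fitness, n_bands=10)
print(best, round(score, 3))
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases; in a real run each fitness call would train and validate an SVR, which dominates the cost.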

4.2. Comparative Study and Future Perspective

The evaluation metrics obtained in this study (R² = 0.6302, RMSE = 0.0629, and MAE = 0.0480) indicate relatively good performance compared with similar disease prediction models. For instance, Li et al. (2022) developed a hybrid model combining a stochastic radiative transfer model (SRTP) with a Random Forest predictor to estimate the infected area of pine wilt disease, obtaining R² values of 0.57 in Jiangxi Province and 0.48 in Shandong Province [111]. However, our GA-SVM model performs less favorably than the models reported in other comparable studies [112,113,114]. For example, various modeling approaches have achieved high coefficients of determination in disease severity estimation, such as for cucumber downy mildew (R² = 0.9190) using a Convolutional Neural Network (CNN) model [115], wheat powdery mildew (validation R² > 0.722) using a Partial Least-Squares Regression (PLSR) model [113], and wheat leaf rust (R² of about 0.94) using linear spectral unmixing and the Fisher function [116]. These performance disparities underscore the potential limitations of applying machine learning to spectral data alone. Addressing these gaps is therefore the primary motivation for the following discussion on integrating multi-source data and applying advanced deep learning architectures.

4.2.1. Deep Learning

The better performance of these studies could partly be explained by their use of more advanced modeling methods. Compared with traditional machine learning methods such as RF and SVM, Deep Learning (DL) models, particularly Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), offer significant advantages for processing high-dimensional hyperspectral data [117]. Sothe et al. (2020) reported that, when classifying 16 tree species with hyperspectral data, a CNN model using only the raw spectral bands as input achieved an accuracy 22% to 26% higher than both SVM and RF models using the same input [117]. Adeniyi et al. (2024) compared CNN and SVM for detecting diseases on various plant leaves and found that the CNN model achieved an accuracy of 98%, while the SVM model's accuracy was 62% [118]. This substantial improvement highlights the capacity of deep learning architectures to discriminate between healthy and diseased plant tissues by learning more discriminative features directly from image data. More recently, Tu et al. (2025) further demonstrated the efficacy of advanced architectures, such as autoencoders, for optimizing feature representation in hyperspectral analysis [119].

Beyond accuracy, DL models also exhibit greater stability against variations in data quality. Jia et al. (2024) revealed that, unlike traditional models (e.g., RF), which are sensitive to fluctuations in signal-to-noise ratio (SNR), advanced architectures (specifically Vision Transformers) maintain high accuracy even at reduced spectral or spatial resolutions [34]. This conclusion is further reinforced by Yao et al. (2025), who used advanced AI frameworks to demonstrate significant improvements in model robustness against environmental variability in complex monitoring tasks [120].

These findings strongly suggest that DL models are a powerful tool for hyperspectral data analysis. Our current study used traditional RF and SVM models, which, while effective, may not fully exploit the complex patterns inherent in the data. Therefore, a key direction for our future research will be to develop and evaluate DL architectures, including 1D-CNNs for spectral feature extraction and Long Short-Term Memory (LSTM) networks for analyzing temporal disease progression. Furthermore, following recent advances, we also aim to explore modern object detection frameworks, such as the CNN-based YOLO series [121] and the Transformer-based DETR series (e.g., MSMT-RTDETR) [122], to integrate spatial and textural information for more granular disease monitoring.
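As an illustration of what a 1D-CNN does with a reflectance spectrum, the sketch below implements one convolution, ReLU, and global max-pooling stage in plain Python. The kernels are hand-set (a first-difference slope detector and a three-band mean) and the spectrum is hypothetical; in a trained network the kernel weights are learned by backpropagation, and this is not an architecture evaluated in the study.

```python
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution as used in CNN layers (cross-correlation, no kernel flip)."""
    k = len(kernel)
    if k > len(signal):
        raise ValueError("kernel longer than signal")
    return [sum(w * signal[i + j] for j, w in enumerate(kernel)) + bias
            for i in range(len(signal) - k + 1)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def conv_block(spectrum: Sequence[float], kernels: Sequence[Sequence[float]],
               biases: Sequence[float]) -> List[float]:
    """One convolution + ReLU + global max-pooling stage: one feature per kernel."""
    return [max(relu(conv1d_valid(spectrum, k, b))) for k, b in zip(kernels, biases)]


# Hypothetical reflectance values across six adjacent bands, rising at the red edge.
spectrum = [0.05, 0.06, 0.08, 0.30, 0.45, 0.47]
kernels = [[-1.0, 0.0, 1.0],          # slope detector (difference across two bands)
           [1 / 3, 1 / 3, 1 / 3]]     # local mean reflectance
features = conv_block(spectrum, kernels, biases=[0.0, 0.0])
print(features)
```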

4.2.2. Multi-Source Remote Sensing Application

The better performance of these studies might also be attributed to the integration of multi-source remote sensing data. While passive remote sensing technologies such as hyperspectral imaging provide valuable information on the biochemical composition of vegetation, they are primarily limited to two-dimensional surface characteristics [123]. Active remote sensing technologies, such as Light Detection and Ranging (LiDAR) and Synthetic Aperture Radar (SAR), emit energy and measure the return signal, providing unique information about the physical structure of plants, while thermal imaging records the radiation emitted by the canopy, which reflects plant physiological status. Many authors [124,125,126] have documented that fusing data from these sensors with passive hyperspectral data can significantly improve the accuracy and reliability of disease monitoring.

For instance, Zhou et al. (2022) combined UAV-based hyperspectral imagery (UAV-HI) with LiDAR data to monitor Emerald Ash Borer (EAB) insect infestation and found that the fusion of LiDAR and hyperspectral data, leveraging LiDAR’s precise three-dimensional structural information for improved tree segmentation, increased the overall classification accuracy of infestation levels to 82.90%, a notable increase from the 79.03% and 70.32% achieved when using each data source individually [127]. Sankey et al. (2017) also combined UAV-based hyperspectral and LiDAR imagery for individual species classification in southwestern US forests, yielding an overall accuracy of 84–89%, a significant improvement over the 72–76% accuracy achieved using hyperspectral data alone [128]. This demonstrates the synergistic benefit of combining biochemical information from hyperspectral sensors with 3D structural information from LiDAR, because the integrated information can reveal subtle differences indicative of plant health and adaptability. Most recently, Zhou et al. (2024) further reinforced this direction by demonstrating that such multi-sensor fusion significantly enhances the characterization of complex vegetation structures, providing a more comprehensive basis for monitoring forest health [129].

In addition to combining UAV-based hyperspectral and LiDAR imagery, other sensor combinations have also been used in forestry and ecology studies, e.g., for disease monitoring and above-ground biomass estimation. Plant diseases often disrupt transpiration, leading to abnormal changes in canopy temperature that thermal sensors can detect before visible symptoms appear. Exploiting this, Francesconi et al. (2021) used a UAV equipped with both thermal and RGB cameras to monitor Fusarium Head Blight (FHB) in wheat and successfully detected the disease by identifying these canopy temperature anomalies, showcasing the potential of thermal imaging for early disease warning [130].
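The thermal cue described above can be sketched as a simple screening rule on the canopy-minus-air temperature difference. The rule, the threshold k = 2, and the readings are assumptions for illustration and do not reproduce the method of Francesconi et al. (2021).

```python
import statistics
from typing import List, Sequence


def thermal_anomalies(canopy_temps: Sequence[float], air_temps: Sequence[float],
                      k: float = 2.0) -> List[bool]:
    """Flag trees whose canopy-air temperature difference is unusually high.

    Using the difference rather than raw canopy temperature removes shifts in air
    temperature between acquisitions. A tree is flagged when its difference exceeds
    the stand mean by more than k population standard deviations (k is an assumption).
    """
    if len(canopy_temps) != len(air_temps):
        raise ValueError("need one air temperature per canopy reading")
    delta = [tc - ta for tc, ta in zip(canopy_temps, air_temps)]
    mu = statistics.fmean(delta)
    sd = statistics.pstdev(delta)
    if sd == 0:
        return [False] * len(delta)
    return [(d - mu) / sd > k for d in delta]


# Hypothetical readings (°C): nine trees cooler than the air, one markedly warmer.
canopy = [30.0] * 9 + [34.0]
air = [32.0] * 10
print(thermal_anomalies(canopy, air))
```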

In the present study, we used only a single remote sensing data source, i.e., UAV-borne hyperspectral imagery. The literature clearly indicates that a more comprehensive understanding of plant health can be achieved by integrating multiple data streams [131]. Active sensors such as LiDAR provide crucial 3D structural information [132], while thermal sensors offer insights into physiological stress [133]. Furthermore, recent advances in 3D point cloud analysis by Deng et al. (2025) have enabled more precise extraction of phenotypic traits from LiDAR data, offering new possibilities for quantifying disease-induced structural changes [134]. Therefore, a primary goal for our future work is to pursue a multi-sensor fusion approach. This will involve equipping our UAV platform with additional LiDAR and thermal sensors or, alternatively, integrating our high-resolution UAV data with satellite-based SAR imagery to create a more robust and accurate disease detection model.
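At its simplest, the multi-sensor fusion proposed here could be feature-level fusion: per-tree features from each sensor are standardized separately and concatenated before modeling. The sketch assumes the trees are already matched across sources (e.g., via the same crown segmentation), and the feature values are hypothetical.

```python
import itertools
import statistics
from typing import List, Sequence

Row = List[float]


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[Row]:
    """Standardize each feature column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]


def fuse_sources(*blocks: Sequence[Sequence[float]]) -> List[Row]:
    """Feature-level fusion: standardize each source separately, then concatenate per tree."""
    n_trees = len(blocks[0])
    if any(len(block) != n_trees for block in blocks):
        raise ValueError("every source must describe the same trees, in the same order")
    scaled = [zscore_columns(block) for block in blocks]
    return [list(itertools.chain.from_iterable(block[i] for block in scaled))
            for i in range(n_trees)]


# Hypothetical per-tree features: two reflectance bands and one LiDAR canopy height (m).
hyperspectral = [[0.10, 0.40], [0.20, 0.50], [0.30, 0.60]]
lidar = [[10.0], [12.0], [14.0]]
fused = fuse_sources(hyperspectral, lidar)
print(fused)
```

Standardizing per source keeps, for example, canopy heights in metres from dominating reflectance values between 0 and 1 in distance-based models such as an RBF-kernel SVM. In practice the means and standard deviations should be estimated on the training set only and then applied to the test set, to avoid information leakage.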

4.2.3. Spectral and Texture Indices

The better performance of similar studies might also be attributed to the integration of spectral and textural information into model building. While the full hyperspectral spectrum provides the most complete information, it also introduces data redundancy and high computational complexity (the “curse of dimensionality”) [135]. Spectral and textural indices are effective means of data reduction and feature enhancement, often improving model efficiency and accuracy [136,137]. For instance, Zhang et al. (2020) developed a new, disease-specific Fusarium Disease Index (FDI) for wheat Fusarium Head Blight (FHB) and found that the disease severity estimation model built on the FDI performed best, with a coefficient of determination (R²) greater than 0.90, compared with 16 other common vegetation indices [138]. Furthermore, Guo et al. (2020) compared models using only spectral features (vegetation indices, VIs), only textural features (TFs), and a combination of both for identifying yellow rust in wheat leaves, and found that the model combining VIs and TFs achieved the highest accuracy (95.8%), which was 6.3% and 9.3% higher than the models using only VIs and only TFs, respectively [139]. This indicates that fusing spectral and texture features can significantly improve disease identification.
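A minimal sketch of the two kinds of engineered features discussed here, a normalized-difference spectral index and a grey-level co-occurrence matrix (GLCM) texture measure, follows. NDVI and NDRE are standard indices, but the band centres (670, 720, and 800 nm), the reflectances, and the tiny quantized image patch are assumptions; the disease-specific FDI is not reproduced.

```python
from collections import Counter
from typing import Sequence


def band_at(spectrum: Sequence[float], wavelengths: Sequence[float], target_nm: float) -> float:
    """Reflectance of the band whose centre wavelength is closest to target_nm."""
    i = min(range(len(wavelengths)), key=lambda j: abs(wavelengths[j] - target_nm))
    return spectrum[i]


def normalized_difference(a: float, b: float) -> float:
    return (a - b) / (a + b)


def ndvi(spectrum: Sequence[float], wavelengths: Sequence[float],
         red_nm: float = 670.0, nir_nm: float = 800.0) -> float:
    """NDVI = (NIR - Red) / (NIR + Red); band centres are assumptions."""
    return normalized_difference(band_at(spectrum, wavelengths, nir_nm),
                                 band_at(spectrum, wavelengths, red_nm))


def ndre(spectrum: Sequence[float], wavelengths: Sequence[float],
         red_edge_nm: float = 720.0, nir_nm: float = 800.0) -> float:
    """Normalized difference red-edge index = (NIR - RE) / (NIR + RE)."""
    return normalized_difference(band_at(spectrum, wavelengths, nir_nm),
                                 band_at(spectrum, wavelengths, red_edge_nm))


def glcm_contrast(image: Sequence[Sequence[int]], dx: int = 1, dy: int = 0) -> float:
    """GLCM contrast, sum of P(i, j) * (i - j)**2, for one non-negative pixel offset (dx, dy)."""
    pairs = Counter()
    rows, cols = len(image), len(image[0])
    for y in range(rows - dy):
        for x in range(cols - dx):
            pairs[(image[y][x], image[y + dy][x + dx])] += 1
    total = sum(pairs.values())
    return sum(n * (i - j) ** 2 for (i, j), n in pairs.items()) / total


wavelengths = [670.0, 720.0, 800.0]      # hypothetical band centres (nm)
canopy = [0.05, 0.25, 0.45]              # hypothetical crown-mean reflectance
print(ndvi(canopy, wavelengths), ndre(canopy, wavelengths))
print(glcm_contrast([[0, 0, 1], [0, 1, 1]]))   # quantized grey levels of a tiny patch
```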

In the present study, we used only the full reflectance spectrum as the primary input to train our machine learning models. While this approach is comprehensive, many authors have suggested that performance can be enhanced by using engineered features that reduce dimensionality and amplify the signals related to disease stress, and spectral and texture indices have proven effective in this regard [140,141,142]. In future work, a wide array of both established and newly developed spectral and texture indices should therefore be extracted from our hyperspectral imagery and used as input features for new models, with the objectives of improving predictive accuracy, reducing computational load, and increasing the interpretability of our results.

4.2.4. Canopy 3D Structure

Many authors [143,144] have demonstrated that tree shape and the vertical distribution of disease within the canopy can have a greater impact on the performance of vegetation indices than observation geometry alone. This is particularly challenging for early detection when diseases originate in the lower or middle canopy layers [145]. To address this problem, as mentioned above, active sensors such as LiDAR and SAR are commonly combined with passive remote sensing to provide unique information about the physical structure of plants. Indeed, numerous studies have documented substantial improvements in predictive models for disease identification [126,127,146] and species classification [90,128,147] after introducing active remote sensing data.

In this study, however, we utilized an “average” crown reflectance, a simplification that overlooks the canopy’s complex three-dimensional structure, including the vertical profile of foliage [148] and leaf angle distribution [149], which profoundly influences the spectral signal captured by remote sensors. Collectively, these dynamic and structural factors likely contribute to the unexplained variance.

Consequently, our future research will focus on incorporating this structural information to build more sophisticated models. A promising direction is the fusion of our hyperspectral data with LiDAR, which can provide explicit 3D measurements of the canopy. This structural information could be used to normalize spectral data against structural effects, or it could serve as an additional set of predictive features, potentially improving the accuracy of health estimations [150]. An alternative approach involves employing advanced radiative transfer models that can simulate the light interaction within the canopy, using parameters like leaf angle distribution to normalize the spectral data and better isolate the spectral signature of the disease itself.

5. Conclusions

This study developed and compared several models (GS-RF, GA-RF, GS-SVM, and GA-SVM) for monitoring walnut leaf scorch (WLS) using UAV-based hyperspectral data. The results demonstrated that the Support Vector Machine model optimized with a Genetic Algorithm (GA-SVM) achieved the highest predictive accuracy (R² = 0.6302, RMSE = 0.0629, and MAE = 0.0480), outperforming the counterparts based on Random Forest and Grid Search optimization. The superior performance of the GA-SVM model is largely attributed to the robustness of SVM in handling non-linear relationships without the strict assumptions of normality and independence required by linear regression, and to the more effective feature optimization of GA compared with GS. Consequently, this model serves as a powerful tool for the large-scale, rapid, and non-destructive surveillance of WLS, providing targeted decision support for disease prevention. While this research provides an effective method for the precision monitoring of walnut leaf scorch, it has limitations related to data acquisition, model-building techniques, and the data processing workflow. Future research should aim to overcome these limitations by integrating multi-source data (e.g., thermal infrared, LiDAR), applying more advanced deep learning models, and developing novel spectral and textural indices. Such efforts would advance the development of more robust and accurate systems for the early monitoring and warning of WLS.

Author Contributions

Conceptualization, Q.Z., J.M. and J.W.; methodology, Q.Z., J.M. and J.W.; software, J.W.; validation, J.W., C.Z., H.Z. and Q.Z.; formal analysis, B.W.; investigation, J.W.; resources, B.W.; data curation, C.Z., H.Z. and B.W.; writing—original draft preparation, Q.Z. and J.W.; writing—review and editing, J.M. and J.W.; visualization, C.Z. and H.Z.; supervision, C.Z., H.Z. and J.M.; project administration, Q.Z. and B.W.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project “Research on the Occurrence Mechanism and Integrated Prevention and Control Technologies of Juglans Leaf Necrosis Disease” of the Xinjiang Uygur Autonomous Region Challenge & Solution Science and Technology Program.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We acknowledge the support from the villagers of Kuoqiak Airik Village, Qiarbag Town, Lop County, Hotan Prefecture during the field work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhu, K.; Ma, J.; Cong, J.; Zhang, T.; Lei, H.; Xu, H.; Luo, Z.; Li, M. The road to reuse of walnut by-products: A comprehensive review of bioactive compounds, extraction and identification methods, biomedical and industrial applications. Trends Food Sci. Technol. 2024, 143, 104264.
Zhao, X.; Yang, J. Research on the development strategy of Xinjiang walnut industry cluster. China Oils Fats 2025, 50, 5–10.
Zhu, B. Models for Estimating Foliar Mineral Nutrition Concentrations of Juglans regia ‘Xinxin2’ Using Spectral Reflectance. Master’s Thesis, Xinjiang Agricultural University, Ürümqi, China, 2016.
China News Service. How Xinjiang Forged a Ten-Billion-Yuan Industry Cluster from Walnuts? Available online: https://www.chinanews.com.cn/cj/2023/12-30/10138171.shtml (accessed on 17 September 2025).
Li, Y.; Pu, S.; Mao, X.; Zhang, J.; Li, Q. Analysis on characteristics and causes of leaf scorch in walnut. Xinjiang Agric. Sci. 2022, 59, 1475–1481.
Liang, Z.; Zhang, J.; Jing, R.; Zou, Y.; Han, Y.; Ba, W.; Kheymo, A. Control technology for the physiological disease of “leaf margin scorch” in walnut. Xinjiang Agric. Sci. Technol. 2014, 19–20.
Zhang, J.; Liang, Z.; Zou, Y.; Zhou, B. Study on causation of walnut withered leaf symptom in southern Xinjiang. Xinjiang Agric. Sci. 2012, 49, 1261–1265.
Guo, T.; Wang, S.; Pan, C.; Sattar, A.; Xing, C.; Hao, H.; Zhang, C. Evidence of the involvement of Xylella fastidiosa in the occurrence of walnut leaf scorch in Xinjiang, China. Plant Dis. 2024, 108, 3648.
Liu, Z. Causes and control measures of walnut leaf scorch in Aksu. Rural Sci. Technol. 2014, 41–42.
Gao, R. Study on Effect of Environmental Factors on Disease of Walnut Leaf Scorch in Xinjiang Shajingzi. Master’s Thesis, Tarim University, Alar, China, 2017.
Han, M.; Jiang, P. Identification of the pathogens of walnut leaf spot disease. Xinjiang Agric. Sci. 2015, 52, 91–96.
Liu, L.; Dong, Y.; Huang, W.; Du, X.; Ren, B.; Huang, L.; Zheng, Q.; Ma, H. A disease index for efficiently detecting wheat Fusarium head blight using Sentinel-2 multispectral imagery. IEEE Access 2020, 8, 52181–52191.
Bai, Y.; Zarco-Tejada, P.J.; Peñuelas, J.; McCabe, M.F.; Hawkesford, M.J.; Atzberger, C.; Poblete, T.; Kumar, L.; Reynolds, M.P.; Nie, C. Hyperspectral remote sensing for monitoring crop disease: Applications, challenges, and perspectives. IEEE Geosci. Remote Sens. Mag. 2025, 2–26.
Zhi, J.; Li, L.; Fang, Y.; Zhi, D.; Guang, Y.; Liu, W.; Qu, L.; Fu, X.; Zhao, H. Rapid large-scale monitoring of pine wilt disease using Sentinel-1/2 images in GEE. Forests 2025, 16, 981.
Feng, A.; Zhou, J.; Vories, E.; Sudduth, K.A. Evaluation of cotton emergence using UAV-based narrow-band spectral imagery with customized image alignment and stitching algorithms. Remote Sens. 2020, 12, 1764.
Yao, J.; Hong, D.; Li, C.; Chanussot, J. SpectralMamba: Efficient Mamba for hyperspectral image classification. arXiv 2024, arXiv:2404.08489.
Goetz, A.F. Three decades of hyperspectral remote sensing of the earth: A personal view. Remote Sens. Environ. 2009, 113, S5–S16.
Lin, X. Hyperspectral Remote Sensing Monitoring of Pinus densiflora Bursaphelenchus xylophilus Disease Infection Stages. Master’s Thesis, Beijing Forestry University, Beijing, China, 2020.
Nasi, R.; Honkavaara, E.; Lyytikainen-Saarenmaa, P.; Blomqvist, M.; Litkey, P.; Hakala, T.; Viljanen, N.; Kantola, T.; Tanhuanpaa, T.; Holopainen, M. Using UAV-Based photogrammetry and hyperspectral imaging for mapping bark beetle damage at tree-level. Remote Sens. 2015, 7, 15467–15493.
Abdulridha, J.; Batuman, O.; Ampatzidis, Y. UAV-Based remote sensing technique to detect citrus canker disease utilizing hyperspectral imaging and machine learning. Remote Sens. 2019, 11, 1373.
Liu, F.; Zhang, M.; Hu, J.; Pan, M.; Shen, L.; Ye, J.; Tan, J. Early diagnosis of pine wilt disease in Pinus thunbergii based on chlorophyll fluorescence parameters. Forests 2023, 14, 154.
Ren, W.; Wu, D.; Qin, L. Preliminary study on data collecting and processing of unmanned airship low altitude hyperspectral remote sensing. Ecol. Environ. Monit. Three Gorges 2016, 1, 52–57.
Yu, R.; Luo, Y.; Zhou, Q.; Zhang, X.; Wu, D.; Ren, L. A machine learning algorithm to detect pine wilt disease using UAV-based hyperspectral imagery and Lidar data at the tree level. Int. J. Appl. Earth Obs. Geoinformation 2021, 101, 102363.
Abdulridha, J.; Ampatzidis, Y.; Kakarla, S.C.; Roberts, P. Detection of target spot and bacterial spot diseases in tomato using UAV-Based and benchtop-based hyperspectral imaging techniques. Precis. Agric. 2019, 21, 955–978.
Guo, A.; Huang, W.; Dong, Y.; Ye, H.; Ma, H.; Liu, B.; Wu, W.; Ren, Y.; Ruan, C.; Geng, Y. Wheat yellow rust detection using UAV-based hyperspectral technology. Remote Sens. 2021, 13, 123.
Wang, Y.; Feng, C.; Ma, Y.; Chen, X.; Lu, B.; Song, Y.; Zhang, Z.; Zhang, R. Estimation of nitrogen concentration in walnut canopies in southern Xinjiang based on UAV multispectral images. Agronomy 2023, 13, 1604.
An, L.; Liu, Y.; Liu, G.; Zhao, R.; Tang, W.; Liu, M.; Li, J.; Li, Z.; Sun, H.; Li, M.; et al. Estimation on powdery mildew of wheat canopy based on in-situ hyperspectral responses and characteristic wavelengths optimization. Crop Prot. 2024, 184, 106804.
Niu, Z.; Li, Y.; Moncada, J.D.S.; Johnson, W.; Lang, E.B.; Li, X.; Jin, J. Proximal hyperspectral imaging for early detection and disease development prediction of Septoria leaf blotch in wheat using spectral-temporal features. Comput. Electron. Agric. 2025, 235, 110400.
Cohen, A.R.; Chen, G.; Berger, E.M.; Warrier, S.; Lan, G.; Grubert, E.; Dellaert, F.; Chen, Y. Dynamically controlled environment agriculture: Integrating machine learning and mechanistic and physiological models for sustainable food cultivation. ACS EST Eng. 2022, 2, 3–19.
Özdoğan, G.; Gowen, A. Unveiling the potential: Harnessing spectral technologies for enhanced protein and gluten content prediction in wheat grains and flour. Curr. Res. Food Sci. 2025, 10, 101054.
Chemura, A.; Mutanga, O.; Sibanda, M.; Chidoko, P. Machine learning prediction of coffee rust severity on leaves using spectroradiometer data. Trop. Plant Pathol. 2018, 43, 117–127.
Lelong, C.C.D.; Roger, J.-M.; Brégand, S.; Dubertret, F.; Lanore, M.; Sitorus, N.A.; Raharjo, D.A.; Caliman, J.-P. Evaluation of oil-palm fungal disease infestation with canopy hyperspectral reflectance data. Sensors 2010, 10, 734–747.
Zhang, N.; Wang, Y.; Zhang, X. Extraction of tree crowns damaged by Dendrolimus tabulaeformis Tsai et Liu via spectral-spatial classification using UAV-based hyperspectral images. Plant Methods 2020, 16, 135.
Jia, J.; Zheng, X.; Wang, Y.; Chen, Y.; Karjalainen, M.; Dong, S.; Lu, R.; Wang, J.; Hyyppä, J. The effect of artificial intelligence evolving on hyperspectral imagery with different signal-to-noise ratio, spectral and spatial resolutions. Remote Sens. Environ. 2024, 311, 114291.
Adam, E.M.; Mutanga, O.; Rugege, D.; Ismail, R. Discriminating the papyrus vegetation (Cyperus papyrus L.) and its co-existent species using random forest and hyperspectral data resampled to HYMAP. Int. J. Remote Sens. 2012, 33, 552–569.
Plaza, A.; Benediktsson, J.A.; Boardman, J.W.; Brazile, J.; Bruzzone, L.; Camps-Valls, G.; Chanussot, J.; Fauvel, M.; Gamba, P.; Gualtieri, A.; et al. Recent advances in techniques for hyperspectral image processing. Remote Sens. Environ. 2009, 113, S110–S122.
Abdel-Rahman, E.M.; Mutanga, O.; Adam, E.; Ismail, R. Detecting Sirex noctilio grey-attacked and lightning-struck pine trees using airborne hyperspectral data, random forest and support vector machines classifiers. ISPRS J. Photogramm. Remote Sens. 2014, 88, 48–59.
Abe, B.T.; Olugbara, O.O.; Marwala, T. Experimental comparison of support vector machines with random forests for hyperspectral image land cover classification. J. Earth Syst. Sci. 2014, 123, 779–790.
Shoot, C.; Andersen, H.-E.; Moskal, L.M.; Babcock, C.; Cook, B.D.; Morton, D.C. Classifying forest type in the national forest inventory context with airborne hyperspectral and Lidar data. Remote Sens. 2021, 13, 1863.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Camps-Valls, G.; Bruzzone, L. Kernel-based methods for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1351–1362.
Cai, J.; Luo, J.; Wang, S.; Yang, S. Feature selection in machine learning: A new perspective. Neurocomputing 2018, 300, 70–79.
Won, J.; Lee, H.-S.; Lee, J.-W. A review on multi-fidelity hyperparameter optimization in machine learning. ICT Express 2025, 11, 245–257.
Waske, B.; Benediktsson, J.A.; Arnason, K.; Sveinsson, J.R. Mapping of hyperspectral AVIRIS data using machine-learning algorithms. Can. J. Remote Sens. 2009, 35, S106–S116.
Huang, C.-L.; Wang, C.-J. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. 2006, 31, 231–240.
Guo, T.; Han, L.; He, L.; Yang, X. A GA-based feature selection and parameter optimization for linear support higher-order tensor machine. Neurocomputing 2014, 144, 408–416.
Zhuo, L.; Zheng, J.; Li, X.; Wang, F.; Ai, B.; Qian, J. A genetic algorithm based wrapper feature selection method for classification of hyperspectral images using support vector machine. In Geoinformatics 2008 and Joint Conference on GIS and Built Environment: Classification of Remote Sensing Images, Proceedings of the 2008 SPIE, Guangzhou, China, 28–29 June 2008; Society of Photo Optical: Bellingham, WA, USA, 2008; Volume 7147, pp. 503–511.
Wang, Z.; Shao, Y.-H.; Wu, T.-R. A GA-based model selection for smooth twin parametric-margin support vector machine. Pattern Recognit. 2013, 46, 2267–2277.
Miranda, E.N.; Barbosa, B.H.G.; Silva, S.H.G.; Monti, C.A.U.; Tng, D.Y.P.; Gomide, L.R. Variable selection for estimating individual tree height using genetic algorithm and random forest. For. Ecol. Manag. 2022, 504, 119828.
Zhang, S.; Huang, H.; Huang, Y.; Cheng, D.; Huang, J. A GA and SVM classification model for pine wilt disease detection using UAV-based hyperspectral imagery. Appl. Sci. 2022, 12, 6676.
Samanta, S.; Chatterji, S.; Pratihar, S. Feature selection using quantum inspired island model genetic algorithm for wheat rust disease detection and severity estimation. In Proceedings of the International Conference on Frontiers in Computing and Systems, Goa, India, 13–15 December 2024; Springer: Singapore, 2024; Volume 492, pp. 499–511.
Zhang, F.; Wang, M.; Zhang, F.; Xiong, Y.; Wang, X.; Ali, S.; Zhang, Y.; Fu, S. Hyperspectral imaging combined with GA-SVM for maize variety identification. Food Sci. Nutr. 2024, 12, 3177–3187.
Ju, Y. Research on the Relative Poverty Management of Luopu County in Hetian Area. Master’s Thesis, Tarim University, Alar, China, 2021.
Zhang, J. Study on the Typical Experience of Poverty Alleviation in Industry and the Consolidation Path of Poverty Alleviation in Luopu County. Master’s Thesis, Tarim University, Alar, China, 2023.
Wu, W.; Zhong, X.; Lei, C.; Zhao, Y.; Liu, T.; Sun, C.; Guo, W.; Sun, T.; Liu, S. Sampling survey method of wheat ear number based on UAV images and density map regression algorithm. Remote Sens. 2023, 15, 1280.
Yang, L.; Gao, W.; Wang, W.; Zhang, C.; Wang, Y. First report of stem and root rot of coriander caused by Fusarium equiseti in China. Plant Dis. 2021, 105, 220.
Xing, C.; Wang, S.; Zhang, C.; Guo, T.; Hao, H.; Zhang, Z.; Wang, S.; Shu, J. Effects of leaf scorch on chlorophyll fluorescence characteristics of walnut leaves. J. Plant Dis. Prot. 2023, 130, 115–124.
Chiang, K.-S.; Bock, C.H. Understanding the ramifications of quantitative ordinal scales on accuracy of estimates of disease severity and data analysis in plant pathology. Trop. Plant Pathol. 2021, 47, 58–73.
Dye, M.; Mutanga, O.; Ismail, R. Examining the utility of random forest and AISA Eagle hyperspectral image data to predict Pinus patula age in KwaZulu-Natal, South Africa. Geocarto Int. 2011, 26, 275–289.
Savitzky, A.; Golay, M.J.E. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 1964, 36, 1627–1639.
Yu, R.; Ren, L.; Luo, Y. Early detection of pine wilt disease in Pinus tabuliformis in North China using a field portable spectrometer and UAV-based hyperspectral imagery. For. Ecosyst. 2021, 8, 44.
Mullen, K. Early Detection of Mountain Pine Beetle Damage in Ponderosa Pine Forests of the Black Hills Using Hyperspectral and WorldView-2 Data. Master’s Thesis, Minnesota State University, Mankato, MN, USA, 2016.
Kumar, C.; Mubvumba, P.; Huang, Y.; Dhillon, J.; Reddy, K. Multi-stage corn yield prediction using high-resolution UAV multispectral data and machine learning models. Agronomy 2023, 13, 1277.
Latif, G.; Abdelhamid, S.E.; Mallouhy, R.E.; Alghazo, J.; Kazimi, Z.A. Deep learning utilization in agriculture: Detection of rice plant diseases using an improved CNN model. Plants 2022, 11, 2230.
Centorame, L.; Gasperini, T.; Ilari, A.; Del Gatto, A.; Foppa Pedretti, E. An overview of machine learning applications on plant phenotyping, with a focus on sunflower. Agronomy 2024, 14, 719.
Sarle, W.S. Neural Network FAQ, Part 2 of 7: Learning. 1997. Available online: https://www.inf.ufsc.br/~aldo.vw/patrec/FAQ2.html (accessed on 14 October 2025).
Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22.
Peters, J.; De Baets, B.; Verhoest, N.E.C.; Samson, R.; Degroeve, S.; De Becker, P.; Huybrechts, W. Random forests as a tool for ecohydrological distribution modelling. Ecol. Model. 2007, 207, 304–318.
R Development Core Team. R: A Language and Environment for Statistical Computing. 2025. Available online: https://www.R-project.org (accessed on 2 July 2025).
Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. In Advances in Neural Information Processing Systems 9, Proceedings of the 1996 Conference on Neural Information Processing Systems, Denver, CO, USA, 2–5 December 1996; MIT Press: Cambridge, MA, USA, 1996; Volume 9, pp. 155–161.
Vergun, S.; Deshpande, A.; Meier, T.B.; Song, J.; Tudorascu, D.L.; Nair, V.A.; Singh, V.; Biswal, B.B.; Meyerand, M.E.; Birn, R.M.; et al. Characterizing functional connectivity differences in aging adults using machine learning on resting state fMRI data. Front. Comput. Neurosci. 2013, 7, 38.
Basak, D.; Pal, S.; Patranabis, D.C. Support vector regression. Neural Inf. Process.-Lett. Rev. 2007, 11, 203–224.
Meyer, D.; Wien, F.T. Support vector machines. R News 2001, 1, 23–26.
Demirtürk, B.; Harunoğlu, T. A comparative analysis of different machine learning algorithms developed with hyperparameter optimization in the prediction of student academic success. Appl. Sci. 2025, 15, 5879.
Kang, Y.-W.; Li, J.; Cao, G.-Y.; Tu, H.-Y.; Li, J.; Yang, J. Dynamic temperature modeling of an SOFC using least squares support vector machines. J. Power Sources 2008, 179, 683–692.
Aszemi, N.M.; Dominic, P.D.D. Hyperparameter optimization in convolutional neural network using genetic algorithms. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 269–278.
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
Guan, H.; Li, J.; Chapman, M.; Deng, F.; Ji, Z.; Yang, X. Integration of orthoimagery and Lidar data for object-based urban thematic mapping using random forests. Int. J. Remote Sens. 2013, 34, 5166–5186.
Mustafa, G.; Zheng, H.; Liu, Y.; Yang, S.; Khan, I.H.; Hussain, S.; Liu, J.; Weize, W.; Chen, M.; Cheng, T.; et al. Leveraging machine learning to discriminate wheat scab infection levels through hyperspectral reflectance and feature selection methods. Eur. J. Agron. 2024, 161, 127372.
Hsu, C.-W.; Chang, C.-C.; Lin, C.-J. A Practical Guide to Support Vector Classification. 2003. Available online: https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf (accessed on 17 September 2025).
Fan, R.E.; Chen, P.H.; Lin, C.J. Working set selection using second order information for training support vector machines. J. Mach. Learn. Res. 2005, 6, 1889–1918.
Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126.
Bakır, H.; Ceviz, Ö. Empirical enhancement of intrusion detection systems: A comprehensive approach with genetic algorithm-based hyperparameter tuning and hybrid feature selection. Arab. J. Sci. Eng. 2024, 49, 13025–13043.
Taha, Z.Y.; Abdullah, A.A.; Rashid, T.A. Optimizing feature selection with genetic algorithms: A review of methods and applications. Knowl. Inf. Syst. 2025, 67, 9739–9778.
Zhang, Y.; Deng, N.; Zhang, S.; Liu, P.; Chen, C.; Cui, Z.; Chen, B.; Tan, T. Prediction of plasticizer property based on an improved genetic algorithm. Polymers 2022, 14, 4284.
Scrucca, L. GA: A package for genetic algorithms in R. J. Stat. Softw. 2013, 53, 1–37.
He, N.; Chen, B.; Lu, X.; Bai, B.; Fan, J.; Zhang, Y.; Li, G.; Guo, X. Integration of UAV multi-source data for accurate plant height and SPAD estimation in peanut. Drones 2025, 9, 284.
Mabdeh, A.N.; Al-Fugara, A.; Khedher, K.M.; Mabdeh, M.; Al-Shabeeb, A.R.; Al-Adamat, R. Forest fire susceptibility assessment and mapping using support vector regression and adaptive neuro-fuzzy inference system-based evolutionary algorithms. Sustainability 2022, 14, 9446.
Wu, Y.; Zhang, X. Object-based tree species classification using airborne hyperspectral images and LiDAR data. Forests 2019, 11, 32.
Zadbagher, E.; Marangoz, A.M.; Becek, K. Estimation of above-ground biomass using machine learning approaches with InSAR and LiDAR data in tropical peat swamp forest of Brunei Darussalam. Iforest-Biogeosci. For. 2024, 17, 172–179.
Zhang, C.; Liu, Y.; Tie, N. Forest land resource information acquisition with Sentinel-2 image utilizing support vector machine, k-nearest neighbor, random forest, decision trees and multi-layer perceptron. Forests 2023, 14, 254.
Shin, H.J.; Cho, S. Response modeling with support vector machines. Expert Syst. Appl. 2006, 30, 746–760.
Ke, T.; Ge, X.; Yin, F.; Zhang, L.; Zheng, Y.; Zhang, C.; Li, J.; Wang, B.; Wang, W. A general maximal margin hyper-sphere SVM for multi-class classification. Expert Syst. Appl. 2024, 237, 121647.
Nedaie, A.; Najafi, A.A. Polar support vector machine: Single and multiple outputs. Neurocomputing 2016, 171, 118–126.
Cherkassky, V.; Ma, Y. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw. 2004, 17, 113–126.
Lameski, P.; Zdravevski, E.; Mingov, R.; Kulakov, A. SVM Parameter Tuning with Grid Search and Its Impact on Reduction of Model Over-Fitting. In Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing; Yao, Y., Hu, Q., Yu, H., Grzymala-Busse, J.W., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9437, pp. 464–474.
Hong, H.; Tsangaratos, P.; Ilia, I.; Liu, J.; Zhu, A.-X.; Xu, C. Applying genetic algorithms to set the optimal combination of forest fire related variables and model forest fire susceptibility based on data mining models. The case of Dayu County, China. Sci. Total Environ. 2018, 630, 1044–1056.
Arnold, C.; Biedebach, L.; Kuepfer, A.; Neunhoeffer, M. The role of hyperparameters in machine learning models and how to tune them. Polit. Sci. Res. Methods 2024, 12, 841–848.
Rimal, Y.; Sharma, N.; Alsadoon, A. The accuracy of machine learning models relies on hyperparameter tuning: Student result classification using random forest, randomized search, grid search, bayesian, genetic, and optuna algorithms. Multimed. Tools Appl. 2024, 83, 74349–74364.
Chong, K.; Shah, N. Comparison of naive bayes and SVM classification in grid-search hyperparameter tuned and non-hyperparameter tuned healthcare stock market sentiment analysis. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 90–94.
Wang, Z.; Yue, C.; Wang, J. An optimization framework with dimensionality reduction using Markov chain Monte Carlo and genetic algorithms for groundwater potential assessment. Appl. Soft Comput. 2024, 164, 111991.
Sukawattanavijit, C.; Chen, J.; Zhang, H. GA-SVM algorithm for improving land-cover classification using SAR and optical remote sensing data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 284–288.
Ji, Y.; Xu, K.; Zeng, P.; Zhang, W. GA-SVR algorithm for improving forest above ground biomass estimation using SAR data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 6585–6595.
Anh, V.P.; Minh, L.N.; Lam, T.B. Feature Weighting and SVM parameters optimization based on genetic algorithms for classification problems. Appl. Intell. 2017, 46, 455–469.
Sawyerr, B.A.; Adewumi, A.O.; Ali, M.M. Real-coded genetic algorithm with uniform random local search. Appl. Math. Comput. 2014, 228, 589–597.
Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM Parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA Telecommun. Comput. Electron. Control 2016, 14, 1502–1509.
Zhao, H.; Bruzzone, L.; Guan, R.; Zhou, F.; Yang, C. Spectral-spatial genetic algorithm-based unsupervised band selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9616–9632.
Zhang, J.; Huang, Y.; Pu, R.; Gonzalez-Moreno, P.; Yuan, L.; Wu, K.; Huang, W. Monitoring plant diseases and pests through remote sensing technology: A review. Comput. Electron. Agric. 2019, 165, 104943.
Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
Li, X.; Tong, T.; Luo, T.; Wang, J.; Rao, Y.; Li, L.; Jin, D.; Wu, D.; Huang, H. Retrieving the infected area of pine wilt disease-disturbed pine forests from medium-resolution satellite images using the stochastic radiative transfer theory. Remote Sens. 2022, 14, 1526.
Castro-Valdecantos, P.; Egea, G.; Borrero, C.; Perez-Ruiz, M.; Aviles, M. Detection of Fusarium wilt-induced physiological impairment in strawberry plants using hyperspectral imaging and machine learning. Precis. Agric. 2024, 25, 2958–2976.
Khan, I.H.; Liu, H.; Li, W.; Cao, A.; Wang, X.; Liu, H.; Cheng, T.; Tian, Y.; Zhu, Y.; Cao, W.; et al. Early detection of powdery mildew disease and accurate quantification of its severity using hyperspectral images in wheat. Remote Sens. 2021, 13, 3612.
Azadbakht, M.; Ashourloo, D.; Aghighi, H.; Radiom, S.; Alimohammadi, A. Wheat leaf rust detection at canopy scale under different LAI levels using machine learning techniques. Comput. Electron. Agric. 2019, 156, 119–128.
Zhang, L.; Tian, X.; Li, Y.; Chen, Y.; Chen, Y.; Ma, J. Estimation of disease severity for downy mildew of greenhouse cucumber based on visible spectral and machine learning. Spectrosc. Spectr. Anal. 2020, 40, 227–232.
Ashourloo, D.; Mobasheri, M.; Huete, A. Developing two spectral disease indices for detection of wheat leaf rust (Puccinia triticina). Remote Sens. 2014, 6, 4723–4740.
Sothe, C.; De Almeida, C.M.; Schimalski, M.B.; La Rosa, L.E.C.; Castro, J.D.B.; Feitosa, R.Q.; Dalponte, M.; Lima, C.L.; Liesenberg, V.; Miyoshi, G.T.; et al. Comparative performance of convolutional neural network, weighted and conventional support vector machine and random forest for classifying tree species using hyperspectral and photogrammetric data. GIScience Remote Sens. 2020, 57, 369–394.
Adeniyi, A.E.; Madamidola, O.A.; Awotunde, J.B.; Misra, S.; Agrawal, A. Comparative Analysis of CNN and SVM Machine Learning Techniques for Plant Disease Detection. In Data Engineering and Applications; Agrawal, J., Shukla, R.K., Sharma, S., Shieh, C.-S., Eds.; Lecture Notes in Electrical Engineering; Springer Nature: Singapore, 2024; Volume 1146, pp. 389–402.
Tu, B.; Zhou, T.; Liu, B.; He, Y.; Li, J.; Plaza, A. Multi-scale autoencoder suppression strategy for hyperspectral image anomaly detection. IEEE Trans. Image Process. 2025, 34, 5115–5130.
Yao, S.; Guan, R.; Peng, Z.; Xu, C.; Shi, Y.; Ding, W.; Gee Lim, E.; Yue, Y.; Seo, H.; Lok Man, K.; et al. Exploring radar data representations in autonomous driving: A comprehensive review. IEEE Trans. Intell. Transp. Syst. 2025, 26, 7401–7425.
Wang, Y.; Yang, X.; Wang, H.; Wang, H.; Chen, Z.; Yun, L. RSWD-YOLO: A walnut detection method based on UAV remote sensing images. Horticulturae 2025, 11, 419.
Zhao, Y.; Lv, W.; Xu, S.; Wei, J.; Wang, G.; Dang, Q.; Liu, Y.; Chen, J. DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 16965–16974.
Tao, J.-Z.; Song, D.-R.; Song, C.-M.; Wang, X.-H. Multi-band remote sensing image sharpening: A survey. Spectrosc. Spectr. Anal. 2023, 43, 2999–3008.
Sa, H.-Y.; Huang, X.; Ling, L.; Zhou, D.; Zhang, J.; Bao, G.; Tong, S.; Bao, Y.; Ganbat, D.; Ariunaa, M.; et al. Multi-dimensional estimation of leaf loss rate from larch caterpillar under insect pest stress using UAV-based multi-source remote sensing. Drones 2025, 9, 529.
Zhang, N.; Chai, X.; Li, N.; Zhang, J.; Sun, T. Applicability of UAV-based optical imagery and classification algorithms for detecting pine wilt disease at different infection stages. GIScience Remote Sens. 2023, 60, 2170479.
Feng, Z.; Song, L.; Duan, J.; He, L.; Zhang, Y.; Wei, Y.; Feng, W. Monitoring wheat powdery mildew based on hyperspectral, thermal infrared, and RGB image data fusion. Sensors 2022, 22, 31.
Zhou, Q.; Yu, L.; Zhang, X.; Liu, Y.; Zhan, Z.; Ren, L.; Luo, Y. Fusion of UAV hyperspectral imaging and LiDAR for the early detection of EAB stress in Ash and a new EAB detection index—NDVI (776,678). Remote Sens. 2022, 14, 2428.
Sankey, T.; Donager, J.; McVay, J.; Sankey, J.B. UAV Lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 2017, 195, 30–43.
Zhou, G.; Jia, G.; Zhou, X.; Song, N.; Wu, J.; Gao, K.; Huang, J.; Xu, J.; Zhu, Q. Adaptive high-speed echo data acquisition method for bathymetric LiDAR. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–17.
Francesconi, S.; Harfouche, A.; Maesano, M.; Balestra, G.M. UAV-based thermal, RGB imaging and gene expression analysis allowed detection of Fusarium head blight and gave new insights into the physiological responses to the disease in durum wheat. Front. Plant Sci. 2021, 12, 628575.
Li, K.; Wang, C.; Rong, G.; Wei, S.; Liu, C.; Yang, Y.; Sudu, B.; Guo, Y.; Sun, Q.; Zhang, J. Dynamic evaluation of agricultural drought hazard in Northeast China based on coupled multi-source data. Remote Sens. 2023, 15, 57.
Utla, C.S.; Dashora, A.; Mishra, R.K.; Zhang, Y. A review on aboveground biomass estimation methods utilizing forest structural characteristics. Int. J. Remote Sens. 2025, 46, 5917–5937.
Wang, S.; Xu, W.; Guo, T. Advances in thermal infrared remote sensing technology for geothermal resource detection. Remote Sens. 2024, 16, 1690.
Deng, J.; Liu, S.; Chen, H.; Chang, Y.; Yu, Y.; Ma, W.; Wang, Y.; Xie, H. A precise method for identifying 3-D circles in freeform surface point clouds. IEEE Trans. Instrum. Meas. 2025, 74, 1–13.
Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature extraction for hyperspectral imagery: The evolution from shallow to deep: Overview and toolbox. IEEE Geosci. Remote Sens. Mag. 2020, 8, 60–88.
Szantoi, Z.; Escobedo, F.J.; Abd-Elrahman, A.; Pearlstine, L.; Dewitt, B.; Smith, S. Classifying spatially heterogeneous wetland communities using machine learning algorithms and spectral and textural features. Environ. Monit. Assess. 2015, 187, 262.
Matarira, D.; Mutanga, O.; Naidu, M. Google Earth Engine for informal settlement mapping: A random forest classification using spectral and textural information. Remote Sens. 2022, 14, 5130.
Zhang, D.; Wang, Q.; Lin, F.; Yin, X.; Gu, C.; Qiao, H. Development and evaluation of a new spectral disease index to detect wheat Fusarium head blight using hyperspectral imaging. Sensors 2020, 20, 2260.
Guo, A.; Huang, W.; Ye, H.; Dong, Y.; Ma, H.; Ren, Y.; Ruan, C. Identification of wheat yellow rust using spectral and texture features of hyperspectral images. Remote Sens. 2020, 12, 1419.
Gao, C.; Ji, X.; He, Q.; Gong, Z.; Sun, H.; Wen, T.; Guo, W. Monitoring of wheat Fusarium head blight on spectral and textural analysis of UAV multispectral imagery. Agriculture 2023, 13, 293.
Ma, R.; Zhang, N.; Zhang, X.; Bai, T.; Yuan, X.; Bao, H.; He, D.; Sun, W.; He, Y. Cotton Verticillium wilt monitoring based on UAV multispectral-visible multi-source feature fusion. Comput. Electron. Agric. 2024, 217, 108628.
Song, Z.; Liu, Y.; Yu, J.; Guo, Y.; Jiang, D.; Zhang, Y.; Guo, Z.; Chang, Q. Estimation of chlorophyll content in apple leaves infected with mosaic disease by combining spectral and textural information using hyperspectral images. Remote Sens. 2024, 16, 2190.
Gara, T.W.; Skidmore, A.K.; Darvishzadeh, R.; Wang, T. Leaf to canopy upscaling approach affects the estimation of canopy traits. GIScience Remote Sens. 2019, 56, 554–575.
Lin, S.; Li, J.; Liu, Q.; Huete, A.; Li, L. Effects of forest canopy vertical stratification on the estimation of gross primary production by remote sensing. Remote Sens. 2018, 10, 1329.
Zhang, W.; Yang, G.; Qi, J.; Chen, R.; Zhang, C.; Xu, B.; Wu, B.; Su, X.; Zhao, C. The impacts of tree shape, disease distribution and observation geometry on the performances of disease spectral indices of apple trees. Remote Sens. Environ. 2025, 329, 114953.
Alsadik, B.; Ellsasser, F.J.; Awawdeh, M.; Al-Rawabdeh, A.; Almahasneh, L.; Elberink, S.O.; Abuhamoor, D.; Al Asmar, Y. Remote sensing technologies using UAVs for pest and disease monitoring: A review centered on date palm trees. Remote Sens. 2024, 16, 4371.
Heinzel, J.; Koch, B. Exploring full-waveform LiDAR parameters for tree species classification. Int. J. Appl. Earth Obs. Geoinformation 2011, 13, 152–160.
Ni-Meister, W.; Yang, W.; Kiang, N.Y. A clumped-foliage canopy radiative transfer model for a global dynamic terrestrial ecosystem model. I: Theory. Agric. For. Meteorol. 2010, 150, 881–894.
Toda, M.; Ishihara, M.I.; Doi, K.; Hara, T. Determination of species-specific leaf angle distribution and plant area index in a cool-temperate mixed forest from UAV and upward-pointing digital photography. Agric. For. Meteorol. 2022, 325, 109151.
Liu, X.; Liu, Y.; Chen, X.; Wan, Y.; Gao, D.; Cao, P. LiDAR-assisted UAV variable-rate spraying system. Agriculture 2025, 15, 1782.

Figure 1. Locations of the study area and UAV flight areas: (a) location of Luopu County within Hotan Prefecture; (b) location of Hotan Prefecture within the Xinjiang Uygur Autonomous Region.

Figure 2. The technical workflow of our experiment.

Figure 3. Sampling and plot design for WLS identification.

Figure 4. Walnut leaf scorch at different severity levels. (I–V) correspond to scorched leaf areas of 0% (healthy), 0–25%, 26–50%, 51–75%, and 76–100%, respectively.

Figure 5. (a) The FS-60c hyperspectral imaging sensor; (b) the DJI M350 RTK UAV platform.

Figure 6. Distribution of the band subsets selected by the GA-RF and GA-SVM algorithms.
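In a GA wrapper for band selection, as in several of the cited GA-SVM studies, a candidate solution is commonly encoded as a binary string with one bit per band, where a 1 keeps that band. The Python sketch below illustrates that encoding together with generic single-point crossover and bit-flip mutation. It assumes 231 candidate bands (the feature count of the grid-search models in Table 2, taken here as the full band set); the study's actual encoding, operators, rates, and the hyperparameter genes that GA-RF and GA-SVM also tune are not given in this section. In a real wrapper, each decoded subset would be scored by cross-validating RF or SVM on those bands.

```python
import random

N_BANDS = 231  # assumption: full band set = feature count of the GS models in Table 2


def decode(chromosome: list[int]) -> list[int]:
    """Return the indices of the bands whose bit is set to 1."""
    return [i for i, bit in enumerate(chromosome) if bit == 1]


def single_point_crossover(a: list[int], b: list[int], rng: random.Random) -> tuple[list[int], list[int]]:
    """Swap the tails of two parents after a random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(chrom: list[int], rate: float, rng: random.Random) -> list[int]:
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chrom]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
parent_a = [rng.randint(0, 1) for _ in range(N_BANDS)]
parent_b = [rng.randint(0, 1) for _ in range(N_BANDS)]
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
child_a = bit_flip_mutation(child_a, rate=0.01, rng=rng)
selected_bands = decode(child_a)
print(f"{len(selected_bands)} of {N_BANDS} bands selected")
```

Because the band mask is just a bit string, the size of the selected subset (108 bands for GA-RF and 96 for GA-SVM in Table 2) emerges from the search rather than being fixed in advance.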

Figure 7. Scatter plots of predicted versus observed values (a,c,e,g) and the corresponding residual distributions (b,d,f,h) for the four optimized models on the test set. The models evaluated are Random Forest with Grid Search (GS-RF), Support Vector Machine with Grid Search (GS-SVM), Random Forest with Genetic Algorithm (GA-RF), and Support Vector Machine with Genetic Algorithm (GA-SVM). The solid 1:1 identity line in the scatter plots and the dashed zero-error line in the residual plots both indicate perfect prediction.

Table 1. Classification criteria for WLS disease severity on individual leaves.

Disease Grade	Representative Value	Grading Standard
Grade I (b0)	0	No scorched leaf area (0%)
Grade II (b1)	1	0–25% of the leaf area is scorched
Grade III (b2)	2	26–50% of the leaf area is brown and scorched
Grade IV (b3)	3	51–75% of the leaf area is brown and scorched
Grade V (b4)	4	76–100% of the leaf area is scorched
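The representative values in Table 1 are typically aggregated into a tree- or plot-level disease index. The Python sketch below assumes the widely used form DI = Σ(vᵢ·nᵢ) / (N·v_max), where nᵢ is the number of leaves in grade i, vᵢ its representative value, N the total number of leaves, and v_max = 4. The article's own ground-data analysis may define or scale the index differently. Treating each upper percentage bound as inclusive is also an assumption, and the leaf readings are hypothetical.

```python
# Representative values from Table 1 (Grade I..V -> 0..4)
GRADE_VALUES = {"I": 0, "II": 1, "III": 2, "IV": 3, "V": 4}


def grade_from_scorched_percent(pct: float) -> str:
    """Map a leaf's scorched-area percentage to a Table 1 grade (inclusive upper bounds assumed)."""
    if not 0 <= pct <= 100:
        raise ValueError("percentage must lie in [0, 100]")
    if pct == 0:
        return "I"
    if pct <= 25:
        return "II"
    if pct <= 50:
        return "III"
    if pct <= 75:
        return "IV"
    return "V"


def disease_index(leaf_percents: list[float]) -> float:
    """Assumed index: sum(v_i * n_i) / (N * v_max), giving a value in [0, 1]."""
    if not leaf_percents:
        raise ValueError("no leaves recorded")
    v_max = max(GRADE_VALUES.values())
    # Summing per-leaf grade values is equivalent to sum(v_i * n_i)
    total = sum(GRADE_VALUES[grade_from_scorched_percent(p)] for p in leaf_percents)
    return total / (len(leaf_percents) * v_max)


# Hypothetical scorched-area readings (%) for five leaves, not data from the study
print(disease_index([0, 10, 30, 60, 90]))  # prints 0.5
```

Here the five leaves fall into grades I to V, so the weighted sum is 0 + 1 + 2 + 3 + 4 = 10 and the index is 10 / (5 × 4) = 0.5.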

Table 2. Optimal configurations for the developed models based on 10-fold cross-validation.

Optimization Method	Model	Optimal Hyperparameters	No. of Features	Selection Criterion (Value)
Grid Search	GS-RF	ntree: 600, mtry: 38, mtry factor: 0.5	231	Min. RMSE (0.0754 ± 0.0103)
Grid Search	GS-SVM	C: 2¹¹, γ: 2⁻¹¹, ε: 2⁻⁵	231	Min. RMSE (0.0683 ± 0.0081)
Genetic Algorithm	GA-RF	ntree: 400, mtry: 27, mtry factor: 0.75	108	Max. Fitness (0.4115)
Genetic Algorithm	GA-SVM	C: 2¹¹, γ: 2⁻¹¹, ε: 2⁻²	96	Max. Fitness (0.4966)
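The RF rows of Table 2 are consistent with reading "mtry factor" as a multiplier on the randomForest regression default ⌊p/3⌋ (Liaw and Wiener): ⌊231/3⌋ = 77 and 0.5 × 77 = 38.5, which truncates to 38; ⌊108/3⌋ = 36 and 0.75 × 36 = 27. This is an inference from the table values, not a definition stated here. The SVM values are powers of two, as in the log2 grids recommended by Hsu et al. The grid ranges in the Python sketch below are hypothetical placeholders, used only to show how quickly an exhaustive search grows when every combination is refit in each of the 10 folds.

```python
import itertools
import math


def default_rf_mtry_regression(p: int) -> int:
    """randomForest's default mtry for regression: max(floor(p / 3), 1)."""
    return max(p // 3, 1)


def scaled_mtry(p: int, factor: float) -> int:
    """Assumed meaning of Table 2's 'mtry factor': a multiplier on the default."""
    return max(math.floor(factor * default_rf_mtry_regression(p)), 1)


# (number of features, mtry factor, reported mtry) taken from Table 2
rf_rows = {"GS-RF": (231, 0.5, 38), "GA-RF": (108, 0.75, 27)}
for name, (p, factor, reported) in rf_rows.items():
    print(name, scaled_mtry(p, factor), reported)

# Hypothetical log2 grids (ranges are placeholders, not the study's settings)
C_grid = [2.0 ** k for k in range(-5, 16, 2)]      # 2^-5 ... 2^15
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]  # 2^-15 ... 2^3
eps_grid = [2.0 ** k for k in range(-8, 0)]        # 2^-8 ... 2^-1
combos = list(itertools.product(C_grid, gamma_grid, eps_grid))
print(len(combos), "combinations,", 10 * len(combos), "SVR fits with 10-fold CV")
print(2.0 ** 11 in C_grid, 2.0 ** -11 in gamma_grid, 2.0 ** -5 in eps_grid)
```

Even this modest grid needs 880 candidate settings and 8800 model fits, which is one motivation for a guided search such as a GA that also prunes bands at the same time.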

Table 3. Evaluation metrics of all models on the training and test sets.

WLS-UAV Dataset	Model	R²	RMSE	MAE
Train	GS-RF	0.9226	0.0303	0.0229
	GS-SVM	0.6882	0.0608	0.0431
	GA-RF	0.9216	0.0305	0.0232
	GA-SVM	0.6647	0.0631	0.0480
Test	GS-RF	0.5260	0.0712	0.0554
	GS-SVM	0.5997	0.0654	0.0498
	GA-RF	0.5331	0.0707	0.0550
	GA-SVM	0.6302	0.0629	0.0480
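Table 3 reports R², RMSE, and MAE. The Python sketch below computes them, assuming R² is the coefficient of determination 1 − SS_res/SS_tot (some studies instead report the squared Pearson correlation, which can differ on held-out data) and taking residuals as predicted minus observed; the sign convention used in Figure 7 is an assumption. The observed and predicted values are hypothetical. The second part only subtracts the test R² in Table 3 from the training R², which makes the train-test gap explicit: about 0.39 to 0.40 for both RF models versus under 0.09 for both SVM models.

```python
import math


def regression_metrics(observed: list[float], predicted: list[float]) -> dict[str, float]:
    """Return R² (1 - SS_res / SS_tot), RMSE and MAE for paired observations."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [p - o for o, p in zip(observed, predicted)]  # predicted - observed (assumed sign)
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical disease-index values, not data from the study
print(regression_metrics([0.10, 0.20, 0.30, 0.40], [0.12, 0.18, 0.33, 0.37]))

# Train and test R² copied from Table 3
r2_table = {
    "GS-RF": (0.9226, 0.5260),
    "GS-SVM": (0.6882, 0.5997),
    "GA-RF": (0.9216, 0.5331),
    "GA-SVM": (0.6647, 0.6302),
}
gaps = {model: round(train - test, 4) for model, (train, test) in r2_table.items()}
print(gaps)
```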

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
