Optimized Hybrid Feature Space for High-Efficiency Citrus Disease Diagnosis: A Fusion of Handcrafted Blue-Green-Red Color Moments and Deep Convolutional Descriptors

Tello-Leal, Edgar; Macías-Hernández, Bárbara A.; Rubio-Tinajero, Sarahi; Hernandez-Resendiz, Jaciel David; Ramirez-Alcocer, Ulises Manuel

doi:10.3390/agriculture16060711

by Edgar Tello-Leal ¹,*, Bárbara A. Macías-Hernández ¹, Sarahi Rubio-Tinajero ², Jaciel David Hernandez-Resendiz ³ and Ulises Manuel Ramirez-Alcocer ⁴

¹ Faculty of Engineering and Science, Autonomous University of Tamaulipas, Victoria 87000, Mexico

² Multidisciplinary Academic Unit Mante, Autonomous University of Tamaulipas, El Mante 89840, Mexico

³ Cinvestav Tamaulipas, Center for Research and Advanced Studies, Victoria 87130, Mexico

⁴ Multidisciplinary Academic Unit Reynosa-Rodhe, Autonomous University of Tamaulipas, Reynosa 88779, Mexico

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(6), 711; https://doi.org/10.3390/agriculture16060711

Submission received: 1 March 2026 / Revised: 19 March 2026 / Accepted: 20 March 2026 / Published: 23 March 2026

(This article belongs to the Special Issue Diseases Diagnosis, Prevention and Weeds Control in Crops—2nd Edition)

Abstract

Accurate and timely diagnosis of citrus diseases is essential for reducing economic losses in global agriculture. Although deep learning models provide high diagnostic accuracy, their computational demands often hinder deployment on resource-limited edge devices. To overcome this challenge, this study proposes an optimized hybrid framework for phytopathological classification. The methodology combines handcrafted descriptors (Blue-Green-Red “BGR” color statistical moments) with hierarchical spatial abstractions derived from a pre-trained Visual Geometry Group 16-layer (VGG16) deep architecture. An initial high-dimensional feature space was created by concatenating 360 handcrafted statistical descriptors and 12,800 deep textural features. By implementing a Wrapper-Greedy Stepwise selection strategy, this original space was reduced by over 96%. The resulting Elite Model identifies 12 and 18 critical attributes across two independent, transcontinental datasets (Mexico and Pakistan, respectively), effectively capturing both subtle chromatic anomalies and complex structural lesions. Experimental benchmarking confirms that this parsimonious hybrid approach delivers robust classification accuracy ranging from 87.30% to 95.23%, significantly outperforming unimodal architectures. Ultimately, this framework provides a highly efficient, interpretable, and scalable solution for real-time disease monitoring in precision agriculture.

Keywords:

citrus diseases; feature fusion; wrapper selection; VGG16; BGR color space; dimensionality reduction

1. Introduction

Citrus crops are among the most economically important fruit commodities globally, with production spanning tropical and subtropical regions worldwide. However, citrus cultivation faces persistent challenges from foliar diseases that significantly reduce yield, degrade fruit quality, and shorten tree longevity [1,2]. Among the most devastating are Huanglongbing (HLB or citrus greening), citrus canker, melanose, greasy spot, and nutrient deficiencies, including zinc deficiency [3,4,5]. Traditional disease diagnosis relies heavily on visual inspection by trained agronomists, a process that is time-consuming, subjective, and often delayed until symptoms become severe [6,7]. HLB disease is widely recognized as the most destructive citrus disease worldwide, causing branch dieback and plant death, with no known cure [8,9,10]. Early detection and removal of infected trees remain the primary management strategy to prevent disease spread. Citrus canker, caused by Xanthomonas citri, produces characteristic raised lesions on leaves, stems, and fruit, leading to defoliation and fruit drop [9]. Melanose and greasy spot are fungal diseases that cause cosmetic damage and reduce photosynthetic capacity, while zinc deficiency manifests as interveinal chlorosis and reduced leaf size, affecting overall tree health and productivity [1].

In agriculture, diagnosing citrus diseases is not a simple visual task. Field symptoms show significant biological variation, with signs changing based on the infection stage, the plant’s health, and environmental conditions. Additionally, the overlap between disease symptoms (such as HLB) and nutritional deficiencies (such as zinc deficiency) complicates diagnosis, often requiring specialized knowledge from an agronomist. Expert inspection remains the highest standard because it detects subtle color changes and structural textures. However, the scale of global citrus production demands tools capable of supporting expert-level screening. Our hybrid framework is designed not to replace human expertise but to act as a digital proxy that captures these multidimensional biological patterns, combining optical physiology (color) with structural features (texture) and providing a reliable decision-support tool in uneven field conditions.

In this context, the advent of machine learning (ML) and deep learning (DL) has revolutionized plant disease diagnosis [11], offering automated, objective, and scalable solutions for early detection [12,13,14,15]. Computer vision, combined with artificial intelligence (AI), enables rapid analysis of leaf images to identify disease symptoms with accuracy that often matches or exceeds that of human experts [3,16]. This transformation is particularly critical for citrus diseases such as HLB, which requires early detection [10]. The transition to DL represents a fundamental paradigm shift, as convolutional neural networks (CNN) automatically learn hierarchical feature representations directly from raw image data [15,17], eliminating the need for manual feature engineering [6,18]. This capability has proven particularly valuable for detecting subtle disease symptoms and for handling the natural variability in leaf appearance, lighting conditions, and disease progression stages encountered in real-world agricultural settings [19].

Transfer learning has become a cornerstone technique in agricultural DL applications [20,21], leveraging knowledge from models pre-trained on large-scale datasets such as ImageNet [22]. This approach addresses the challenge of limited labeled agricultural data by initializing the model weights with features learned from general image-recognition tasks and then fine-tuning on domain-specific citrus-disease datasets. Studies consistently show that transfer learning with architectures such as ResNet50 [23], Visual Geometry Group 16-layer (VGG16) [24], InceptionV3 [25], and EfficientNetB0 [26] outperforms training from scratch.

Despite the undeniable success of transfer learning and deep architectures, several technical challenges remain unaddressed in citrus pathology, and a gap persists in developing models that achieve high performance without excessive computational resources [6,27]. CNNs, while proficient at capturing complex textures, often exhibit texture bias, in which pooling and normalization layers suppress fine-grained variations in the blue-green-red (BGR) color space [4,28]. This limitation is particularly critical for citrus diseases such as HLB or zinc deficiency, where the earliest diagnostic markers are subtle shifts in leaf pigment distribution rather than structural changes. Moreover, the shift from these three primary color channels to the multi-channel representation used in deep architectures, which often produces $5 \times 5 \times 512$ feature maps in the final convolutional stages, results in an extremely high-dimensional feature space. This high dimensionality of deep-learned features imposes a significant computational burden [29], and most current approaches sacrifice interpretability and efficiency for architectural depth, hindering deployment of these models on resource-constrained devices for real-time field monitoring (edge computing).

Recent advances in precision agriculture have shifted the focus from powerful cloud-based servers to Edge AI and lightweight systems that operate on resource-limited devices, such as mobile platforms and Internet of Things (IoT) sensors [6]. While cutting-edge DL models achieve high accuracy, deploying them in the field is often hampered by high energy use and long inference times. Current research in lightweight plant disease detection aims to close this gap through model quantization, knowledge distillation, or hybrid feature engineering. In this context, our proposed framework introduces a streamlined elite feature space that emphasizes computational efficiency while maintaining the diagnostic accuracy required for international disease detection. By concentrating on a small set of 12 to 18 distinctive descriptors, this study meets the needs of real-time edge deployment and provides a scalable solution for local, on-site agricultural monitoring.

To address the trade-off between complex high-dimensional features and the limitations of edge computing hardware, this study introduces a targeted feature optimization method built on a two-stage process. First, it incorporates handcrafted BGR statistical moments (specifically, the mean, standard deviation, skewness, and kurtosis) to counteract the loss of color information common in deep architectures. Second, to manage the high-dimensional space created by the $5 \times 5 \times 512$ convolutional maps, we utilize a Wrapper-Greedy Stepwise selection process. Unlike traditional hybrid classification frameworks that focus on high-dimensional feature concatenation, this study introduces an Elite Model based on an aggressive pruning and distillation strategy. The novelty of this work lies in an optimization pipeline that connects traditional optical physiology and deep textural abstractions, capturing both subtle chromatic anomalies and complex structural lesions within a streamlined framework. By identifying a minimalist, domain-invariant feature set, the proposed approach reduces the original feature space by 96.23% without compromising diagnostic accuracy. Additionally, this method effectively reduces the texture bias present in DL models, thereby improving the architecture for real-time field monitoring and edge computing environments. The strength of this framework is confirmed by a transcontinental validation using independent datasets from Mexico and Pakistan, effectively linking high-level spatial abstractions to practical biophysical descriptors. Based on this objective, the primary contributions of this research are threefold:

We present a hybrid feature selection framework that combines DL abstractions with biophysical color moments (BGR). This method goes beyond black-box models by incorporating human-interpretable descriptors that identify subtle symptomatic variations often overlooked by CNNs.
By applying an aggressive pruning strategy through Greedy Stepwise Search, we reduced the feature space by 95.38% (from 390 to 18 descriptors). This results in a 0.03 s inference latency, demonstrating that minimalist architectures can outperform multi-million-parameter models such as MobileNetV2 and EfficientNetB0 on specific agricultural tasks.
The framework underwent transcontinental validation using diverse datasets from Mexico and Pakistan. This ensures that the model is not overfitted to a single region’s optical conditions, providing a robust, edge-ready solution for real-time citrus disease monitoring on low-resource mobile devices.
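As a quick arithmetic check of the reduction quoted in the second contribution, the snippet below recomputes the percentage from the two counts given there (390 and 18). The helper name `reduction_pct` is illustrative only and does not come from the source.

```python
def reduction_pct(n_original: int, n_selected: int) -> float:
    """Percentage of descriptors removed by feature selection."""
    return 100.0 * (1.0 - n_selected / n_original)

# Counts quoted in the text: 390 initial descriptors, 18 retained.
print(f"{reduction_pct(390, 18):.2f}%")  # 95.38%
```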

2. Materials and Methods

2.1. Datasets

2.1.1. Transcontinental Data Sources

The first experiment uses a dataset of Citrus sinensis (Valencia orange) leaves, originally compiled to identify diseases in local crops in Mexico, and collected during the 2019–2020 period [30]. The dataset comprises 262 field-collected samples, classified into four leaf conditions: greasy spot (64), healthy (70), variegated chlorosis (64), and zinc deficiency (64). To evaluate how well the proposed methodology works across different image resolutions, geographic locations, and class distributions, a second experiment was conducted using the leaf subset of the citrus fruits and leaves dataset [31]. From the original collection, only images of leaf tissue ($N = 609$) were retained, while fruit samples were discarded to maintain morphological consistency with the first experiment. The samples originate from the Sargodha region of Pakistan, focusing on the Kinnow Mandarin (Citrus nobilis × Citrus deliciosa). These images have been validated by experts (ground truth) and were collected between 2017 and 2018. The distribution of instances in this second dataset is characterized by a notable imbalance among classes [32]: black spot (171), canker (163), greening/HLB (204), melanose (13), and healthy (58). This class imbalance reflects real-world agricultural settings, where the prevalence of certain diseases often exceeds that of others, providing a rigorous test of the proposed framework’s robustness.

2.1.2. Data Acquisition and Environmental Conditions

Regarding the first dataset (Mexico), the images were captured with a 5-megapixel camera, producing high-resolution files of $2592 \times 1456$ pixels. The capture occurred in a semi-controlled environment with consistent lighting and a standard background (natural canopy) to minimize specular reflections and shadow artifacts. This setup aimed to preserve the color accuracy of the leaf pigments and serve as a baseline for the model’s highest possible precision (see Table 1).

On the other hand, the second dataset (Pakistan) comprised images taken outdoors using a high-resolution Digital Single-Lens Reflex (DSLR) camera (Canon EOS 1300D). The images were captured under inconsistent natural lighting and varying weather conditions, which created significant optical challenges, including harsh shadows, specular reflections on leaf surfaces, and complex backgrounds such as natural soil, dense foliage, and human interference. Although the original images were shot at a high pixel density, all samples were resized to $256 \times 256$ pixels to improve computational efficiency. This resizing caused a substantial loss of detail (72 dpi), posing a major challenge for manual texture descriptors and testing the model’s performance under less-than-ideal data conditions (see Table 1).

2.1.3. Dataset Comparison and Class Distribution

Table 1 summarizes the technical specifications and the distribution of pathological classes for both study areas.

The selection of datasets from Mexico and Pakistan was strategically designed to assess the framework’s robustness across different climate and varietal conditions. Mexico (specifically the Veracruz region) has a tropical-to-subtropical, humid climate in which Citrus sinensis is dominant, often resulting in leaves with high cuticle reflectance. In contrast, the Sargodha region in Pakistan has a semi-arid climate and is a major producer of Kinnow mandarin (Citrus nobilis × Citrus deliciosa). This deliberate choice to use non-overlapping disease sets and different hosts was made to avoid pathological and taxonomic symmetry. By evaluating the methodology in geographically distant regions with notable differences in leaf thickness, solar exposure (Albedo), and humidity levels, we ensure that the hybrid feature selection is not overfit to a specific disease. Instead, this transcontinental approach demonstrates that the framework is a versatile, globally relevant diagnostic tool for the genus Citrus, capable of detecting both subtle color anomalies and complex structural lesions, regardless of regional variety or environmental conditions.

2.2. Experiment Design

The procedure for constructing the initial dataset uses an early-feature-fusion approach. For each leaf sample, a feature vector was generated from two complementary domains: handcrafted descriptors and DL CNN features. From these, three test configurations were derived: (a) handcrafted only, (b) CNN only (post-PCA), and (c) hybrid (post-PCA). This experimental design enabled an ablation study to quantify the individual contributions of traditional computer vision and DL.

2.2.1. Feature Extraction

Handcrafted Chromatic Moments ($V_{HC}$): In this work, we opted for patch-based feature extraction rather than global segmentation-based approaches, such as Otsu’s method. To capture pathological signatures in the color domain, we performed a local statistical analysis. Each image was divided into 30 patches of $128 \times 128$ pixels using a deterministic $5 \times 6$ grid sampling pattern with no overlap (stride = 128 pixels). To ensure reproducibility across all runs and folds, a fixed coordinate seed was used, keeping the position of each patch (e.g., Patch 1 to Patch 30) consistent throughout the dataset. For each patch, the first four statistical moments, namely the mean ($μ$), standard deviation ($σ$), skewness ($γ$), and kurtosis ($κ$), were computed across the three BGR color channels. This produced a vector of 360 handcrafted descriptors ($3 \times 4 \times 30 = 360$), providing a robust representation of chromatic variations, spot distributions, and intensity peaks associated with foliar lesions. The segmentation into 30 patches of $128 \times 128$ pixels was chosen to capture local chromatic variability, which is critical in diseases such as variegated chlorosis and greasy spot, where lesions are not uniform. Table 2 lists the technical parameters for extracting handcrafted features. Higher-order measures, such as $γ$ and $κ$, enable the model to identify, beyond the average color, the asymmetry and spectral-intensity peaks that characterize necrotic lesions and mineral deficiencies. The choice of BGR statistical moments over options such as Hue-Saturation-Value (HSV) or LAB color space (L for lightness, a for green-red, and b for blue-yellow channels) is based on two main reasons. First, from a computational perspective, BGR is the default format used by most digital image sensors and processing libraries (e.g., OpenCV). Therefore, extracting moments directly from BGR preserves signal integrity and avoids nonlinear color-space conversion steps. This decision is crucial for the Edge-AI pipeline’s real-time performance, as it greatly reduces inference latency (see Section 4.3). Second, from a feature-engineering standpoint, the first four statistical moments of the BGR channels effectively capture the necrotic and chlorotic signs of citrus diseases. Although HSV and LAB spaces are commonly used for luminance invariance, the skewness and kurtosis of the BGR channels offer a similar level of robustness against specular reflections and shadow effects by showing the asymmetry and peakedness of the pigment distribution.
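The patch layout and descriptor count described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the row-major ordering, the helper names `patch_boxes` and `descriptor_names`, and the naming scheme are assumptions. The working image size is not stated in this passage; a $5 \times 6$ grid of 128-pixel patches implies a $640 \times 768$ pixel region.

```python
from itertools import product

PATCH = 128          # patch side length in pixels (from the text)
ROWS, COLS = 5, 6    # grid layout (from the text); row/column order is an assumption
CHANNELS = ("B", "G", "R")
MOMENTS = ("mean", "std", "skew", "kurt")

def patch_boxes(rows=ROWS, cols=COLS, size=PATCH):
    """Return (top, left, bottom, right) boxes in row-major order, no overlap."""
    return [(r * size, c * size, (r + 1) * size, (c + 1) * size)
            for r, c in product(range(rows), range(cols))]

def descriptor_names(n_patches):
    """One name per (patch, channel, moment) triple, e.g. 'p01_B_mean'."""
    return [f"p{p:02d}_{ch}_{m}"
            for p in range(1, n_patches + 1)
            for ch in CHANNELS
            for m in MOMENTS]

boxes = patch_boxes()
names = descriptor_names(len(boxes))
print(len(boxes), len(names))   # 30 360
print(boxes[0], boxes[-1])      # (0, 0, 128, 128) (512, 640, 640, 768)
```

Because the boxes are derived from fixed constants, every image yields descriptors in the same order, which is what lets a selected attribute index refer to the same leaf region across the dataset.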

Deep Textural Features ($V_{CNN}$): Structural and textural features were extracted from the VGG16 CNN base, pre-trained on ImageNet. To balance feature resolution and computational cost, the input images were resized to $160 \times 160 \times 3$. The fully connected layers were excluded, and only the feature maps from the last pooling layer (block5_pool) were retained. Following a fixed feature-extraction strategy, all convolutional layers were kept frozen and used the pre-trained weights without additional fine-tuning; consequently, no learning rate or optimizer was required for the backbone. With the architecture’s downsampling factor of 32, the input dimensions were reduced to a $5 \times 5$ spatial grid of 512 channels. The grid acts as a deterministic spatial map, with each of the 25 units representing a specific receptive field of the leaf. This produced a high-dimensional feature vector of 12,800 raw descriptors per image ($5 \times 5 \times 512$), providing a dense representation of deep textural patterns associated with citrus pathologies and leaf symptoms while requiring significantly less memory than the standard 224-pixel resolution. Table 2 describes the CNN domain parameters in detail. Figure 1 shows an excerpt of the feature map produced by the CNN model on the Experiment 1 dataset. At this stage of the methodology, two new sets of features are obtained (derived from the handcrafted and CNN methods, respectively). VGG16 was selected as the primary architecture for feature extraction due to its depth and the use of small $3 \times 3$ convolutional kernels, which are highly effective at capturing hierarchical textural features. Unlike some lightweight architectures that focus on faster computation by using depthwise separable convolutions and compromise spatial resolution, the traditional convolutional layers of VGG16 preserve the structural integrity of citrus leaf lesions. This precise feature extraction is crucial for the subsequent PCA-based compression and the identification of the hybrid elite subset.
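The feature-map dimensions quoted above follow directly from VGG16's five $2 \times 2$ max-pooling stages, which together downsample by a factor of 32. The short sketch below reproduces that arithmetic for the 160-pixel input used here and for the conventional 224-pixel input; the function name is illustrative.

```python
VGG16_POOL_STAGES = 5        # VGG16 has five 2x2 max-pooling stages
VGG16_FINAL_CHANNELS = 512   # channels in block5_pool

def block5_pool_shape(side: int):
    """Spatial grid size and flattened length of block5_pool for a square input."""
    grid = side // (2 ** VGG16_POOL_STAGES)   # total downsampling factor of 32
    return grid, grid * grid * VGG16_FINAL_CHANNELS

for side in (160, 224):
    grid, length = block5_pool_shape(side)
    print(side, f"{grid}x{grid}x{VGG16_FINAL_CHANNELS}", length)
# 160 5x5x512 12800
# 224 7x7x512 25088
```

At 160 pixels the flattened vector holds roughly half as many values as at 224 pixels, which is the memory saving the text refers to.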

2.2.2. Dimensionality Reduction Using PCA

To mitigate the high dimensionality of the CNN’s dense descriptor output, principal component analysis (PCA) was applied. This procedure transformed the original feature space into a set of orthogonal, uncorrelated variables called principal components [33]. A selection criterion based on cumulative explained variance was applied, retaining only components that together accounted for 95% of the original dataset’s total variance. Mathematically, the number of components k was determined as follows:

$\frac{\sum_{i=1}^{k} λ_{i}}{\sum_{j=1}^{n} λ_{j}} \geq 0.95$ (1)

where $λ$ represents the ordered eigenvalues of the covariance matrix. This step significantly reduces redundant noise and improves the computational efficiency of subsequent classifiers without sacrificing information relevant to the discrimination of leaf symptoms.
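The criterion in Equation (1) amounts to finding the smallest k whose leading eigenvalues reach the variance threshold. A minimal sketch, assuming the eigenvalues are already available as a list; the spectrum below is hypothetical and chosen only for illustration.

```python
def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest k whose leading eigenvalues reach `threshold` of total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalue spectrum, for illustration only.
eigs = [50, 25, 15, 6, 2, 2]
print(components_for_variance(eigs))  # 4
```

Here the first three components explain 90% of the variance and the fourth raises the total to 96%, so k = 4.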

Subsequently, when constructing the initial hybrid dataset for each experiment, only the leading principal components were retained, selected with the elbow criterion on a scree plot of explained variance. This ensures balanced representation between the handcrafted chromatic features and the deep structural descriptors while minimizing computational redundancy. This component selection captures the essence of the VGG16 textural hierarchy without introducing excessive dimensionality that would dilute the influence of the 360 handcrafted chromatic moments. By limiting the deep representation to its main high-variance components, a robust balance between chromatic descriptors and structural characteristics associated with leaf morphology is maintained, facilitating faster convergence of the classification algorithms and minimizing the risk of overfitting during the initial benchmarking phase.

In summary, to ensure computational feasibility and mitigate the high-dimensional noise inherent in CNN architectures, PCA was implemented as a preliminary denoising stage for the DL descriptors. Given that the VGG16 base generates high-density feature maps with significant redundancy, PCA was used to project these features into a latent space defined by a number of principal components (determined by the elbow method) that capture the maximum variance. This step is critical for reducing the search space prior to applying wrapper-based selection. Because Wrapper methods are computationally expensive, requiring multiple training iterations for each subset evaluation, the PCA-based pre-processing enables the Greedy Stepwise search (described in the following subsection) to identify the elite feature subset more efficiently and stably without compromising the integrity of the diagnostic signal.

2.2.3. Hybrid Space Definition and Augmentation Strategy

The initial hybrid dataset ($D_{Hybrid}$) for each experimental phase is formally defined as the union of two distinct feature domains: the handcrafted statistical descriptors ($V_{HC}$) and the latent features extracted by the convolutional architecture ($V_{CNN}$), as follows:

$D_{Hybrid} = V_{HC} \cup V_{CNN}$ (2)
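At the level of a single sample, the union in Equation (2) corresponds to concatenating the two descriptor vectors, in line with the early-fusion design of Section 2.2. The sketch below illustrates this. The value `N_PC = 30` is an assumption, chosen so that the total matches the 390 descriptors mentioned in the Introduction; the number of retained components is not stated in this passage.

```python
def fuse(v_hc, v_cnn):
    """Early fusion: concatenate the two descriptor vectors of one sample."""
    return list(v_hc) + list(v_cnn)

N_HC = 360   # handcrafted chromatic moments (from the text)
N_PC = 30    # retained principal components: ASSUMED value for illustration

v_hc = [0.0] * N_HC    # placeholder handcrafted vector
v_cnn = [0.0] * N_PC   # placeholder PCA-compressed CNN vector
sample = fuse(v_hc, v_cnn)
print(len(sample))     # 390
```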

For Experiment 1 (Mexico), to strengthen generalizability and mitigate overfitting inherent in small-scale datasets, a horizontal-flip data augmentation strategy was applied [34,35]. This procedure doubled the sample size from $N = 262$ to $N = 524$ instances, ensuring the statistical stability required for 10-fold cross-validation and enabling the model to capture the spatial variability of lesions without compromising the spectral integrity of the images.

In Experiment 2 (Pakistan), utilizing the Sargodha citrus leaves dataset ($N = 609$), a targeted data augmentation protocol was implemented to mitigate the severe class imbalance, particularly evident in the melanose category ($n = 13$). To ensure statistical fairness and enhance model robustness, the training samples were standardized to $n = 200$ images per category via lossless geometric transformations (horizontal/vertical reflections and 90° rotations), resulting in a balanced training set of 1000 instances. This standardization (detailed in Table 3) was fundamental to preventing majority-class bias and to ensuring high diagnostic sensitivity, even for rare pathological symptoms, under heterogeneous field conditions.
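The lossless transformations named above (reflections and 90° rotations) can be sketched on a plain pixel grid as follows. This is an illustrative sketch, not the authors' pipeline. In particular, the source does not specify how variants are chosen to reach the target count, so the cycling policy in `upsample` is an assumption, and all function names are hypothetical.

```python
def hflip(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror top-bottom (equivalent to a 180-degree rotation of hflip)."""
    return img[::-1]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def lossless_variants(img):
    """All distinct images reachable by flips and 90-degree rotations."""
    seen, out = set(), []
    current = [list(r) for r in img]
    for _ in range(4):
        for cand in (current, hflip(current)):
            key = tuple(map(tuple, cand))
            if key not in seen:
                seen.add(key)
                out.append(cand)
        current = rot90(current)
    return out

def upsample(images, target):
    """Cycle through variants until `target` samples exist (assumed policy)."""
    pool = [v for img in images for v in lossless_variants(img)]
    return [pool[i % len(pool)] for i in range(target)]

tiny = [[1, 2], [3, 4]]               # stand-in for one leaf image
print(len(lossless_variants(tiny)))   # 8
print(len(upsample([tiny], 20)))      # 20
```

Flips combined with 90° rotations yield at most eight distinct variants per image, so for a very small class a target of 200 can only be reached by reusing variants, as the cycling in `upsample` does.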

To ensure the reliability of the results and minimize the risk of overfitting, a stratified 10-fold cross-validation scheme was applied separately to each dataset. This method ensures that, in each fold, the model is tested on held-out samples that were not used during feature selection or training. Crucially, any balancing process was applied only to the training part of each fold, leaving the validation sets untouched to prevent data leakage and ensure an accurate assessment of the model’s performance in unbalanced, real-world agricultural environments. Moreover, using two separate datasets (Mexico and Pakistan) serves as a stress test for domain generalization. This setup tests whether the proposed hybrid architecture can effectively adapt to different citrus species and various disease conditions, demonstrating the methodology’s flexibility in addressing the diverse requirements of global precision agriculture.

2.2.4. Feature Space Optimization

To achieve high efficiency, a hierarchical optimization strategy based on the wrapper method was implemented. Unlike filter methods, the wrapper approach evaluates attribute subsets using a learning algorithm as the validation engine, thereby capturing complex interdependencies among attributes. The process was executed in two independent phases to ensure domain isolation and prevent the high dimensionality of the features from introducing statistical noise.

A similar feature selection approach was applied to both the handcrafted domain ($V_{HC}$) and the CNN-PCA domain ($V_{CNN}$). This approach uses the WrapperSubsetEval evaluator on the initial 360 chromatic moments and on the principal components extracted from VGG16. In the WrapperSubsetEval method, the Random Forest (RF) classifier was configured as the base learner due to its robustness to noise and its ability to model nonlinear interactions. Furthermore, the Greedy Stepwise search method was defined in the forward direction, enabling the identification of an optimal subset of features for each dataset and efficient convergence to a subset of critical components. The wrapper-based feature selection was configured to use classification accuracy as the primary evaluation metric. This ensured that the Greedy Stepwise search prioritized feature subsets that maximized the base classifier’s (RF) hit rate through 5-fold internal cross-validation, effectively pruning redundant descriptors while preserving high diagnostic sensitivity, which can be formally described as follows:

Initialization: Define the selected feature set as an empty set $S \leftarrow \emptyset$.
Iterative Evaluation: For each iteration i, every feature $f_{j} \notin S$ is evaluated by training and testing the classifier on the subset $S \cup \{f_{j}\}$.
Selection: The feature $f_{best}$ that maximizes the objective function improvement is selected and updated: $S \leftarrow S \cup \{f_{best}\}$.
Stepwise Refinement (Backward Step): After each addition, the algorithm evaluates the exclusion of previously incorporated features. Any feature whose removal increases the objective function value is discarded from S to eliminate redundancy.
Termination Criteria: The process converges when no further significant improvement is observed in the objective function ($ΔJ < ϵ$) or when the optimal performance plateau is reached.

Mathematically, the algorithm seeks to find:

$S^{\star} \in \arg\max_{S \subseteq \{1, \dots, D\}} J(S)$ (3)

where $J(S)$ is the classification accuracy estimated through internal 5-fold cross-validation. The winning subsets from both domains were combined to form the optimized hybrid dataset ($D_{Opt}$). This configuration represents a substantial reduction in the original dimensionality, enabling ultrafast execution in edge computing environments. To ensure scientific validity, the final model was evaluated using 10-fold cross-validation with an RF classifier and 500 iterations to stabilize performance metrics.
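The five steps of the search procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Weka's implementation. In the study, $J(S)$ is the RF accuracy from internal 5-fold cross-validation; here it is replaced by a toy objective (`toy_J`) so that the example stays self-contained. All feature names and weights are hypothetical.

```python
def greedy_stepwise(features, J, eps=1e-6):
    """Forward selection with a backward refinement step after each addition."""
    selected = []
    score = J(selected)
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Iterative evaluation: score S ∪ {f_j} for every unused feature.
        best = max(candidates, key=lambda f: J(selected + [f]))
        best_score = J(selected + [best])
        if best_score - score < eps:      # termination: ΔJ < ε
            break
        selected.append(best)
        score = best_score
        # Backward step: drop earlier features whose removal raises J.
        for f in selected[:-1]:
            trial = [g for g in selected if g != f]
            trial_score = J(trial)
            if trial_score > score:
                selected, score = trial, trial_score
    return selected, score

# Toy objective (hypothetical): each feature carries one "signal"; duplicated
# signals add nothing, and every feature costs a small penalty.
SIGNAL = {"hc_1": "color", "hc_2": "color", "pc_1": "texture", "pc_2": None}
WEIGHT = {"color": 0.40, "texture": 0.30}

def toy_J(subset):
    covered = {SIGNAL[f] for f in subset} - {None}
    return sum(WEIGHT[s] for s in covered) - 0.01 * len(subset)

subset, score = greedy_stepwise(list(SIGNAL), toy_J)
print(subset, round(score, 2))   # ['hc_1', 'pc_1'] 0.68
```

The redundant feature `hc_2` and the uninformative `pc_2` are never added because neither improves the objective. Every accepted addition or removal strictly increases $J$, so the loop is guaranteed to terminate.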

2.2.5. Statistical Moments of BGR Color Space

To quantify the characteristics of optical physiology and chromatic distribution in leaf images, four statistical moments were computed for each color channel ($B, G, R$). For a color channel with N pixels, where $x_{i}$ denotes the intensity of the i-th pixel, the descriptors are defined as follows:

Mean ($μ$): Represents the average brightness of the color channel.

$μ = \frac{1}{N} \sum_{i=1}^{N} x_{i}$ (4)

Standard Deviation ($σ$): Measures the contrast or dispersion of color tones.

$σ = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - μ)^{2}}$ (5)

Skewness ($γ_{1}$): Indicates the degree of asymmetry in the color distribution; high values suggest the presence of spots or subtle chromatic anomalies (chlorosis).

$γ_{1} = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - μ)^{3}}{σ^{3}}$ (6)

Kurtosis ($κ$): Describes the shape of the distribution. High kurtosis indicates that the disease manifests in very specific and extreme color ranges (localized necrosis).

$κ = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_{i} - μ)^{4}}{σ^{4}}$ (7)

These moments capture essential biophysical indicators: $μ$ and $σ$ represent global illumination and contrast, while $γ_{1}$ and $κ$ detect subtle pigmentary shifts associated with early-stage chlorosis and localized necrotic tissue.
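Equations (4) to (7) translate directly into code. The sketch below computes the four moments for one channel of one patch, given as a flat list of intensities. The handling of a perfectly flat patch ($σ = 0$) is a convention assumed here, since the text does not address that case.

```python
import math

def color_moments(pixels):
    """Mean, population std, skewness and kurtosis of one channel (Eqs. 4-7)."""
    n = len(pixels)
    mu = sum(pixels) / n

    def central(p):
        """p-th central moment with the 1/N normalization used in the text."""
        return sum((x - mu) ** p for x in pixels) / n

    sigma = math.sqrt(central(2))
    if sigma == 0:                       # flat patch: higher moments undefined
        return mu, 0.0, 0.0, 0.0         # convention assumed here
    return mu, sigma, central(3) / sigma ** 3, central(4) / sigma ** 4

mu, sigma, skew, kurt = color_moments([1, 2, 3, 4, 5])
print(mu, round(sigma, 4), skew, round(kurt, 4))   # 3.0 1.4142 0.0 1.7
```

Note that Equation (7) defines the raw (non-excess) kurtosis, so a normal distribution would score 3 rather than 0.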

2.3. Validation and Splitting Strategy

To ensure the statistical reliability and reproducibility of the results, a stratified 10-fold cross-validation procedure was used in all experiments. Under this method, the entire dataset (N) was divided into ten mutually exclusive subsets (folds) of equal size. In each of the ten iterations, nine folds (90%) of the data were used for model training and feature selection, while the remaining fold (10%) served as an independent test set.

Crucially, to prevent data leakage and ensure the independence of the validation results, the dataset was divided into folds at the original source image level before any data augmentation was performed. This group-based splitting method ensures that all augmented versions derived from a single original leaf image remain in the same fold. As a result, no augmented version of a training image ever appears in the corresponding test fold.

The stratified split maintains the original class distribution within each fold. For Experiment 2 (Pakistan), the internal balancing described in Section 2.2.3 was strictly applied only to the 90% training subset of each fold, while the 10% test subset remained unbalanced to mimic real-world diagnostic scenarios. This careful separation prevents biased evaluations and ensures that the reported mean accuracy and Kappa coefficients genuinely reflect the model’s true ability to generalize.
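The splitting order described in this section (stratify the original images first, then augment only the training part) can be sketched as below. This is an illustrative sketch with hypothetical helper names. The round-robin assignment and the `flip_aug` placeholder are assumptions; only the class counts come from the text.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Map each original image id to a fold, keeping class proportions per fold."""
    by_class = defaultdict(list)
    for img_id, label in labels.items():
        by_class[label].append(img_id)
    rng = random.Random(seed)
    fold_of, offset = {}, 0
    for label in sorted(by_class):
        ids = by_class[label]
        rng.shuffle(ids)
        for i, img_id in enumerate(ids):
            fold_of[img_id] = (i + offset) % k
        offset += len(ids)   # continue the round-robin so fold sizes stay even
    return fold_of

def split(fold_of, test_fold, augment):
    """Augment only training originals; test images stay untouched."""
    train = [v for i, f in fold_of.items() if f != test_fold for v in augment(i)]
    test = [(i, "orig") for i, f in fold_of.items() if f == test_fold]
    return train, test

def flip_aug(img_id):
    """Placeholder augmentation: the original plus one horizontal flip."""
    return [(img_id, "orig"), (img_id, "hflip")]

# Class counts of the Mexico dataset as reported in the text.
counts = {"greasy_spot": 64, "healthy": 70,
          "variegated_chlorosis": 64, "zinc_deficiency": 64}
labels = {f"{c}_{n}": c for c, total in counts.items() for n in range(total)}
fold_of = stratified_folds(labels)
train, test = split(fold_of, test_fold=0, augment=flip_aug)
print(len(train), len(test))   # 470 27
```

Because fold membership is decided per original image id, a flipped copy can never land in a different fold from its source image.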

2.4. Classification Algorithms Benchmarking

To evaluate the discriminatory capacity of the hybrid descriptor feature space, a comparative study was conducted using six ML algorithms. This diversity ensures that the ability to discriminate among foliar symptoms does not depend on a specific algorithm but rather on the quality of the extracted information. The classifiers were implemented in the Waikato Environment for Knowledge Analysis (Weka) 3.8.6 software tool (https://ml.cms.waikato.ac.nz/weka/ (accessed on 6 January 2026)):

Multilayer Perceptron (MLP): A feed-forward artificial neural network was configured with an adaptive hidden layer of 197 neurons ( $[\text{attributes} + \text{classes}]/2$ ) and trained for 500 epochs. It was selected for its ability to model complex nonlinear relationships between texture and color.
RF: An ensemble algorithm based on bagging 500 decision trees was used for its high noise tolerance and its ability to assess the intrinsic importance of attributes, making it the engine of the optimization phase.
Support vector machine (SVM): Implemented using the sequential minimal optimization (SMO) algorithm and configured with a polynomial kernel, which projected the data into a higher-dimensional space where classes defined by foliar symptoms are linearly separable.
BayesNet: A probabilistic graphical model that uses the K2 algorithm to learn the dependency structure among descriptors and the foliar symptoms class.
J48: An implementation of the C4.5 algorithm for generating pruned decision trees. It was included to evaluate the model’s interpretability through explicit decision rules.
Naive Bayes: A probabilistic classifier based on Bayes’ theorem, with a strong independence assumption. It served as a baseline for validating the system’s required complexity.
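The hidden-layer size quoted for the MLP can be reproduced with a one-line calculation. The sketch below assumes the 390-attribute hybrid space and the four foliar classes of Dataset 1, and assumes integer division as in Weka's default hidden-layer setting; the function name is hypothetical.

```python
def default_hidden_units(n_attributes, n_classes):
    # (attributes + classes) / 2, truncated to an integer
    return (n_attributes + n_classes) // 2


print(default_hidden_units(390, 4))  # 197
```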

2.5. Computational Environment

To ensure the model’s viability in precision agriculture scenarios, computational efficiency was evaluated on a high-performance workstation based on the Apple Silicon architecture. The hardware consisted of an Apple MacBook Pro (Apple Inc., Cupertino, CA, USA) with the M4 Max chip (14-core), which integrates a 32-core GPU and 36 GB of unified memory. This configuration is particularly relevant because unified memory enables ultra-low-latency data transfers between the CPU and GPU, optimizing the execution of the VGG16 architecture via the Metal Performance Shaders (MPS) interface.

The software ecosystem was built in Python 3.10.14 for preprocessing and hybrid descriptor extraction, using the PyTorch (v2.10.0), OpenCV (v4.11.0.86), and scikit-learn (v1.3.0) libraries. The latter (scikit-learn) was used for feature scaling (StandardScaler) and dimensionality reduction via PCA. Pandas (v2.3.3) was employed for high-level data structure management and fusion of hybrid feature vectors, while matplotlib.pyplot (v3.10.8) was used to visualize performance metrics and generate diagnostic stability plots. Final feature selection and classifier validation were performed in Weka 3.8.6.

3. Results

This section presents findings from three experimental phases: (1) evaluation of individual descriptors, (2) optimization using feature selection techniques, and (3) synthesis of the final hybrid model.

3.1. Experiment 1

3.1.1. Base Classifier Selection and Hybrid Benchmarking

The initial hybrid feature space for Dataset 1 consists of 390 descriptors: 360 handcrafted chromaticity moments and 30 principal components derived from the CNN. Although the cumulative variance analysis identified a mathematical elbow at 158 components (reaching the 95% threshold at 385 PCs), as shown in Figure 2, a heuristic selection of the top 30 PCs was used. This choice was made to maintain a balanced 1:12 ratio between deep textural abstractions and handcrafted features, capturing the region of steepest information gain while emphasizing model simplicity. Therefore, to determine the most suitable computational engine for classifying foliar symptoms in citrus leaves, the initial hybrid dataset was subjected to a stress test across various ML paradigms, using a benchmarking approach to select the base classifier. This phase is crucial because the classification model’s performance on the original hybrid dataset determines the feasibility of subsequent optimizations. The following algorithms were evaluated: Bayes Net, J48, MLP, Naive Bayes, RF, and SVM. Table 4 compares the models’ performance, revealing a significant disparity in their generalization capabilities.
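The 95% variance threshold mentioned above corresponds to a simple cumulative-sum rule over the explained-variance ratios of the principal components (in scikit-learn, these are exposed by a fitted `PCA` object as `explained_variance_ratio_`). The following sketch shows that rule in plain Python; the function name and the toy ratios are illustrative assumptions, not values from the study.

```python
def components_for_variance(explained_ratio, threshold=0.95):
    """Smallest number of leading components whose cumulative explained-variance
    ratio reaches `threshold`; returns None if the threshold is never reached."""
    total = 0.0
    for k, ratio in enumerate(explained_ratio, start=1):
        total += ratio
        if total >= threshold - 1e-12:  # small tolerance for float round-off
            return k
    return None


# Toy spectrum (hypothetical): cumulative sums are 0.50, 0.80, 0.90, 0.96, 1.00
toy_ratios = [0.5, 0.3, 0.1, 0.06, 0.04]
n_components = components_for_variance(toy_ratios, threshold=0.95)
```

Keeping a fixed top-30 subset instead of the threshold-based count is the heuristic adopted in the text; with 360 handcrafted features, it gives the stated 30:360 = 1:12 ratio.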

The MLP algorithm consistently outperformed all other models in terms of accuracy and stability (kappa). The results suggest that the MLP more effectively integrates manual features with CNN components, avoiding overfitting by modeling complex nonlinear relationships among descriptors. Furthermore, the MLP was slightly more effective than RF and SVM. While the J48 algorithm outperformed Naive Bayes, its accuracy lagged behind the ensemble methods. This is attributed to the fact that a single decision tree tends to overfit due to the dataset’s high feature dimensionality, making it less efficient at capturing nonlinear decision boundaries than MLP or RF models. Based on this preliminary benchmarking, the RF classifier was selected as the engine for subsequent feature selection and model optimization. Although the MLP showed slightly higher accuracy (97.71% vs. 96.75%), the RF model demonstrated superior statistical robustness, validated by an area under the receiver operating characteristic (ROC) curve (AUC) of 0.996, and far greater computational efficiency, reducing training time by 99.6% compared to the MLP. This speed is critical for the iterative process of the wrapper feature selection method, enabling an extensive search of the feature space without compromising the system’s viability in real-time environments.

3.1.2. Attribute Selection and Elite Model Definition

Identifying the Elite Model with the wrapper method enabled distillation into a hybrid space of just 12 descriptors, achieving a 96% reduction in dimensionality without compromising diagnostic stability. In the manual domain, selection focused on five key attributes of the first extracted patch (V2 to V6): the standard deviation, skewness, and kurtosis of the red channel, as well as the mean and standard deviation of the green channel. From a phytopathological perspective, the prominence of these statistical moments indicates that the system has isolated optical markers of pigment degradation: while the green channel detects the decline in chlorophyll in variegated chlorosis and zinc deficiency, the variability captured by the red channel reveals the heterogeneity of necrotic lesions in greasy spot.
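The general logic of a greedy stepwise wrapper can be sketched as below. This is a generic forward-selection loop under stated assumptions, not the Weka implementation used in the study: the search direction (forward), the stopping rule, and the toy scoring function are all assumptions for illustration. In the actual experiments, the score of a subset would be the cross-validated performance of the RF classifier trained on that subset.

```python
def greedy_forward_selection(candidates, evaluate, min_gain=1e-9):
    """Greedy stepwise (forward) wrapper.

    Repeatedly adds the single feature that most improves evaluate(subset)
    and stops when no remaining feature improves the score by min_gain.
    """
    selected = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score - best_score < min_gain:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical scoring function: three informative features, plus a small
# penalty per feature so that uninformative ones never pay for themselves.
gains = {"V4": 0.30, "PC_CNN_2": 0.25, "V6": 0.20}


def toy_score(subset):
    return 0.25 + sum(gains.get(f, 0.0) for f in subset) - 0.01 * len(subset)


selected, score = greedy_forward_selection(
    ["V1", "V4", "V6", "PC_CNN_2", "PC_CNN_3"], toy_score
)
```

Because every step re-evaluates the classifier once per remaining feature, the cost of the wrapper grows quickly with the size of the feature space, which is why a fast base classifier matters.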

To enhance this color sensitivity, the model incorporates seven latent components derived from the VGG16 architecture (PC_CNN_2, 4, 5, 7, 9, 12, and 20), which act as spatial-invariance descriptors. While BGR moments capture local color variation, these CNN components provide the structural hierarchy needed to differentiate complex morphologies. Specifically, the importance of the first principal components (PC_CNN_2, 4, and 5) shows that the model focuses on high-energy structural differences, enabling it to distinguish the rough, textured surface of a canker from the diffuse, smooth patterns of a nutritional deficiency. This connection between understandable color and deep abstraction forms the foundation of a model that can generalize diagnoses across different agricultural settings. Table 5 shows the performance of the pruned classification model. An important aspect is the problem boundary between variegated chlorosis and healthy leaves, identified from the model’s error structure. The overlap observed in the confusion matrix (where mutual confusion persists in 8 instances) reflects the biological heterogeneity of the infection in its early stages. At this stage, the chlorotic mottling is so subtle that it blends in with the natural variation in leaf pigmentation or with light interference from the canopy. Analyzing recall by class confirms that trait hybridization resolves these ambiguities, capturing chromatic gradients that pure DL architectures typically overlook because of their bias toward structural textures.

To verify the consistency of the elite subset, a 10-fold cross-validation was performed. Table 6 summarizes the selection frequency for the top-performing features. In Dataset 1 (Mexico), chromatic moments V4 and V6 showed a perfect selection frequency (10/10), while in the deep textural components, PC_CNN_2, 4, and 5 achieved an 80% recurrence rate. These results demonstrate a high level of stability in the feature engineering process.
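The selection frequency reported for each feature is the fraction of cross-validation folds in which the wrapper retained it. A minimal sketch follows, with a hypothetical function name and made-up fold contents used only to show the calculation.

```python
from collections import Counter


def selection_frequency(per_fold_subsets):
    """Fraction of folds in which each feature was selected."""
    counts = Counter(f for subset in per_fold_subsets for f in set(subset))
    n_folds = len(per_fold_subsets)
    return {feature: count / n_folds for feature, count in counts.items()}


# Illustrative input only: 10 folds, one feature kept in 8 of them.
example_folds = [["V4", "V6", "PC_CNN_2"]] * 8 + [["V4", "V6"]] * 2
freq = selection_frequency(example_folds)
```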

The discriminatory power of the optimized hybrid model is further validated by the AUC (Figure 3). The nearly perfect performance observed in the greasy spot and zinc deficiency classes ( $AUC \approx 1.0$ ) shows that combining BGR chromatic moments with CNN features effectively captures their unique spectral and structural signatures, almost eliminating overlap with other pathologies. This indicates that for lesions with high contrast or clearly defined necrotic patterns, the 12 selected descriptors provide a sufficient margin of separation. In contrast, the curves for the healthy and variegated chlorosis classes, while still showing high area-under-the-curve values, reflect the complexity of the previously discussed problem boundary. The slight deviations from the ideal point in these categories are not due to model limitations but to the subtle pigmentary transitions of early-stage chlorosis. This graphical evidence confirms that hybrid feature pruning did not reduce predictive power; instead, it refined the diagnostic space to the most relevant biological markers, resulting in a clearer decision boundary and enhancing the model’s ability to generalize across varying symptom levels.
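For a multi-class problem, a per-class AUC is commonly obtained in a one-vs-rest fashion: the class of interest is treated as positive and all other classes as negative. The sketch below computes that quantity from its rank interpretation (the probability that a random positive sample receives a higher score than a random negative one). The function name and the toy scores are assumptions for illustration; how Weka computes its per-class ROC area internally is not described in the text.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one class against all others.

    scores: classifier confidence that each sample belongs to `positive`.
    Ties between a positive and a negative sample count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: well separated scores give 1.0, overlapping scores give less.
separated = auc_one_vs_rest([0.9, 0.8, 0.3, 0.1], ["gs", "gs", "h", "h"], "gs")
overlapping = auc_one_vs_rest([0.9, 0.4, 0.6, 0.1], ["vc", "vc", "h", "h"], "vc")
```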

The comparative performance across various classification methods (Table 7) shows that the optimized hybrid space is mostly independent of the paradigm. The high stability observed in neural networks (MLP) and ensemble methods (RF), with Kappa statistics exceeding 0.93, indicates that the 12-descriptor subset captures consistent, high-quality signatures that go beyond specific learning architectures. Notably, the fact that models such as Naive Bayes and J48 maintain high discriminatory power ( $AUC > 0.92$ ) provides further scientific evidence supporting the attribute space’s linear separability. This demonstrates that combining BGR chromatic moments and latent CNN textures effectively differentiates the diagnostic signal from background noise. The slight performance advantage of the MLP and RF models suggests that modeling the non-linear interactions within the problem boundary (particularly the subtle overlap between early-stage variegated chlorosis and healthy tissue) is best accomplished by architectures capable of mapping complex hierarchical relationships between chromatic and structural descriptors.
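The Kappa statistic used here as a stability measure corrects the observed agreement for the agreement expected by chance. A minimal sketch of Cohen's kappa computed from a confusion matrix is shown below; the function name and the example matrix are illustrative and do not reproduce any table from the study.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of class-i samples predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(n_classes)
    ) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Illustrative two-class matrix: 90% observed agreement, 50% expected by chance.
kappa = cohen_kappa([[45, 5], [5, 45]])
```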

Beyond overall accuracy metrics, it is crucial to analyze how the models perform for each specific foliar condition. Figure 4 shows the confusion matrix for the MLP classifier in the final experiment. The main diagonal indicates that the model classifies most instances correctly. Perfect performance is observed in the greasy spot class, and near-perfect performance in the zinc deficiency class, with 128 and 126 correct classifications, respectively. However, the matrix also reveals where the model’s difficulties lie, particularly in the interaction between the variegated chlorosis and healthy leaves classes. Specifically, the MLP model misclassified 8 healthy leaves as having variegated chlorosis (false positives). Conversely, and perhaps more critically, 8 leaves that actually had variegated chlorosis were misclassified as healthy (false negatives). This behavior is consistent with the visual similarity between the two conditions, particularly in early or moderate stages of chlorosis, when yellowing can be subtle, heterogeneous, or confined to specific interveinal regions, making differentiation based on global chromatic characteristics difficult. This suggests that the features extracted for these two categories overlap in the vector space, hindering their separation by the MLP.

To determine whether this pattern is unique to the MLP or inherent in the data, Table 8 presents the recall metric (true positive rate) by class for the three best-performing models (MLP, RF, and SVM). The analysis of Table 8 shows a consistent pattern: the variegated chlorosis class is the most difficult to detect across all algorithms, with the lowest recall values (ranging from 84.4% to 93.8%), which highlights the greater difficulty in correctly identifying it based solely on visual characteristics. This behavior is consistent with foliar chlorosis in citrus, where interveinal yellowing can occur heterogeneously and irregularly across the leaf blade, especially in the early stages of the disease. These characteristics can resemble normal physiological variations or appear healthy, increasing the likelihood of confusion even in visual assessments by specialists. None of the models surpassed the 94% threshold for this class, reinforcing the hypothesis that the visual characteristics of variegated chlorosis are either the most difficult to isolate or the most similar to other conditions (especially healthy leaves) in this dataset. In contrast, the MLP achieved perfect detection (100%) for the greasy spot class and is the most balanced model, with the best recall in three of the four classes (greasy spot, healthy leaves, and zinc deficiency), although by narrow margins. This detailed analysis shows that, although overall accuracy is high, the choice of the final model could depend on the practical priority: if the critical objective is to avoid missing any cases of variegated chlorosis, RF would be ideal; if the highest average reliability across all diseases is sought, the MLP would be the most robust solution.
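Per-class recall follows directly from the rows of a confusion matrix: the diagonal entry divided by the row total. The sketch below uses a hypothetical four-class matrix whose counts are invented for illustration and do not correspond to Figure 4 or Table 8.

```python
def recall_per_class(matrix, class_names):
    """Recall (true positive rate) per class.

    matrix[i][j] = number of samples of class i predicted as class j.
    """
    return {
        name: matrix[i][i] / sum(matrix[i])
        for i, name in enumerate(class_names)
    }


classes = ["greasy spot", "healthy", "variegated chlorosis", "zinc deficiency"]
# Hypothetical counts: most errors sit between healthy and variegated chlorosis.
example_matrix = [
    [50, 0, 0, 0],
    [0, 46, 4, 0],
    [0, 5, 45, 0],
    [0, 0, 1, 49],
]
recalls = recall_per_class(example_matrix, classes)
```

Reading recall row by row makes the asymmetry between false negatives (missed disease) and false positives explicit, which is the basis for the model-choice trade-off discussed above.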

3.2. Experiment 2

3.2.1. Validation Results of Hybrid Benchmarking

The analysis of Dataset 2 reveals greater structural complexity compared to the Mexican samples. As shown in Figure 5, the mathematical elbow was identified at 314 PCs, and 790 PCs were required to reach the 95% variance threshold. Despite this broader distribution, we limited the selection to the top 30 PCs to maintain consistency across experiments. This heuristic choice effectively captures the main structural features (where the information gain per component is highest) while preventing the high-dimensional CNN family from numerically overshadowing the chromatic descriptors.

Following this dimensionality reduction, the benchmarking of ML models on the Pakistani dataset reveals a high-stability core, with SVM, RF, and MLP achieving an accuracy ceiling of about 88–89% (Table 9). While this confirms the effectiveness of the hybrid feature space in addressing inconsistent field conditions, the main scientific contribution of this experiment is the identification of a distinct error zone between HLB (greening) and black spot. This diagnostic boundary highlights the real challenge of distinguishing symptoms digitally in the Sargodha region. The confusion (particularly the 47 HLB cases misclassified as black spot by the MLP) stems from the biological similarity of their early signs. In early black spot, necrotic lesions are small and often surrounded by chlorotic halos that resemble the irregular, asymmetric mottling typical of HLB. This visual similarity leads to comparable spectral and textural signatures, making it difficult to separate these classes linearly, even with deep hierarchical features.

In contrast, the model achieves near-perfect discrimination for melanose, with recall exceeding 95% across all top-tier classifiers. This is due to the unique textural signature of melanose, characterized by raised, corky, necrotic pustules that provide high chromatic contrast against healthy tissue, making them easily recognizable even in poor lighting. The disparity between how easily melanose is identified and the difficulty of defining the HLB-black spot boundary indicates that future improvements should focus on hyperspectral or higher-resolution textural descriptors to better navigate the subtle transition between chlorotic halos and actual necrotic lesions.

3.2.2. Synthesis of the Hybrid Elite and Attribute Hierarchy Model

To maximize computational efficiency and diagnostic accuracy in the transcontinental environment, a two-stage attribute selection was performed using a wrapper approach with the RF algorithm. This strategy enabled a drastic reduction in dimensionality, reducing the feature space from 390 features to an optimal subset of 18 elite descriptors (8 handcrafted and 10 CNN). This reduction represents a 95.4% decrease in problem complexity by eliminating redundant variables that caused errors at the decision boundaries. Among the 8 selected handcrafted descriptors, Patch 24 contained a critical concentration of information, contributing six of the eight handcrafted features. The standard deviation (V286) and kurtosis (V288) in the blue channel are particularly noteworthy. The persistence of these second- and fourth-order moments suggests that the model relies on the dispersion and peakedness of chromatic intensity to distinguish the necrotic lesions of black spot from the chlorotic mottling of HLB.

On the other hand, integrating the 10 CNN components with the highest statistical merit ( $\text{merit} = 0.833$ ) provided the necessary structural robustness. Components PC_CNN_10, PC_CNN_2, and PC_CNN_5 ranked highest in Gini importance, capturing overall leaf morphology and enabling flawless detection of melanose ( $AUC = 1.000$ ). This simplified 18-feature model retained 99.5% of the predictive capacity of the full model, achieving 87.4% accuracy. However, minor errors in discriminating between HLB and black spot persist, suggesting symptomatic overlap in the early stages of these diseases in the Pakistani phytosanitary context. This finding indicates that the remaining confusion does not depend on the number of features but rather on subtle shared morphological variance that even challenges deep texture descriptors and higher-order statistical moments. Finally, the 10-fold cross-validation for attribute selection showed strong consistency in the feature space. Textural components from the deep backbone (VGG16-PCA) exhibited the highest stability, with PC_CNN_2, PC_CNN_22, and PC_CNN_23 selected in 90% of the runs. This consistent pattern indicates that these specific components reliably capture textural features across different stratified partitions of the Pakistan dataset (see Table 10). Conversely, although handcrafted chromatic features had lower individual selection frequencies (e.g., 40% for V282), their occasional selection indicates redundancy among color descriptors, as the wrapper algorithm alternates between highly correlated features to optimize classification performance.

The comparative analysis of the optimized subset of 18 attributes shown in Table 11 indicates that the RF model has the most robust architecture for diagnosing citrus diseases under low-dimensional conditions. Although effective (83.5% accuracy), the MLP model showed greater sensitivity to errors in distinguishing between HLB and black spot, suggesting that the loss of redundant descriptors affects the neural network’s ability to converge on stable decision boundaries for these pathologies. However, both models significantly outperformed SVM (74.3%), demonstrating that ensemble-based and neural network techniques are preferable to SVM for future mobile implementations that prioritize parsimony and computational efficiency.

Furthermore, a significant divergence in classifier performance was observed after dimensionality reduction. While RF maintained remarkable robustness (87.3%) with only 18 features, the SVM model’s overall accuracy decreased to 74.3%. This phenomenon suggests that SVMs rely on redundancy in high-dimensional spaces to compensate for the overlap in symptomatology between HLB and black spot, especially in the early stages, when both diseases can present irregular chlorosis and mottling on the leaf blade. Conversely, the RF model demonstrated greater efficiency in leveraging elite descriptors, making it the most suitable architecture for low-computation implementations.

Table 12 presents the class-wise performance of the RF classifier on the optimized hybrid dataset of 18 descriptors. In pathologies such as HLB, the characteristic symptom is chlorosis, manifested as irregular mottling, with chlorophyll loss occurring heterogeneously across the leaf blade. Statistically, this translates into a high dispersion of intensity values and a heterogeneous distribution of chlorophyll and photosynthetic pigments, associated with phloem alteration and physiological deterioration of leaf tissue. The RF-based model uses these second- and fourth-order moments to distinguish HLB from healthy leaves, achieving a recall of 0.873 in this category. This confirms that the hybrid approach captures both the chromatic alterations associated with chlorophyll degradation and the structural changes resulting from pathogen-induced physiological stress. Furthermore, the RF model relies on global structural information (leaf margins and morphology captured by VGG16) and employs BGR statistical moments to refine the diagnosis through fine chromatic analysis.

Figure 6 shows the confusion matrix from cross-validation in the classification analysis of the Experiment 2 dataset. A strong classification diagonal is evident, confirming the effectiveness of the selected 18-feature subset. The classification model stands out for its high performance in identifying critical pathologies, particularly in the canker and melanose classes, where the highest accuracy was achieved (0.905 and 1.000, respectively). Lesions in these classes exhibit distinctive morphological characteristics, including well-defined necrotic areas, suberized tissue, and high-contrast patterns that facilitate visual and spectral differentiation. This result is particularly relevant to precision agriculture, as early diagnosis of these diseases is vital to preventing massive economic losses. Despite the high efficiency of the 18-feature subset, persistent overlap between the HLB and black spot classes was observed (45 cases of confusion), yielding a recall of 0.730, which reflects the partial similarity in their symptomatology, particularly in the early stages of symptom development. This phenomenon is attributed to the convergence of the kurtosis and standard deviation statistical signatures in the green and blue channels. From a phytopathological perspective, HLB is characterized by diffuse, asymmetric mottling resulting from the irregular degradation of chlorophyll. In black spot, the lesions are angular, brown to purple, and surrounded by a yellow halo. It also produces localized necrotic lesions that alter the structural and chromatic integrity of the leaf surface. Hence, by reducing the model’s dimensionality, fine-texture descriptors that distinguish the stochastic irregularity of mottling from the defined geometry of necrotic lesions are eliminated, thereby establishing an intrinsic limit on the accuracy of parsimonious models for these two diseases.

The optimized hybrid space of 18 descriptors was evaluated using attribute importance analysis based on the Gini impurity index (Figure 7). The results reveal a clear hierarchy where deep spatial abstractions from the CNN architecture (PC_CNN_10, 2, and 5) rank highest, with importance values exceeding 0.34. This highlights the significance of structural hierarchy in identifying eruptive pustules and complex morpho-textural patterns, such as those found in canker. However, the hierarchy also demonstrates a compensatory synergy: the prominence of CNN components is supported by handcrafted descriptors that prevent texture bias, a common limitation in DL where pooling operations average out subtle chromatic signals.
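Importance scores based on the Gini index are typically built from the impurity decrease that each feature produces at the tree nodes where it is used for splitting, accumulated over the forest. The sketch below shows only the per-split quantity; the function names are hypothetical, and the exact aggregation and normalization behind Figure 7 are not specified in the text.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Weighted Gini decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (
        gini(parent)
        - len(left) / n * gini(left)
        - len(right) / n * gini(right)
    )


# A split that separates the two classes perfectly removes all impurity.
parent = ["HLB", "HLB", "black spot", "black spot"]
perfect = impurity_decrease(parent, ["HLB", "HLB"], ["black spot", "black spot"])
useless = impurity_decrease(parent, ["HLB", "black spot"], ["HLB", "black spot"])
```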

A key discovery in attribute ranking is the high importance of the blue-channel statistical moments, especially the standard deviation (V286). From a biophysical viewpoint, this descriptor’s significance arises from the scattering characteristics of short-wavelength light; as blue light is highly responsive to refractive-index changes in deteriorated cells and structural modifications in the cuticular layer, V286 acts as an important indicator for detecting cellular necrosis in black spot and greasy spot. Moreover, the combination of V286 and the blue-green contrast (V282) is crucial for distinguishing the problem boundary between HLB and black spot. While traditional texture descriptors often mistake early necrotic spots for the asymmetric mottling seen in HLB, these second-order chromatic moments effectively represent the diffuse pigment gradients related to phloem dysfunction.

The importance of this hybrid architecture is supported by the ablation study: while a standalone CNN reached a performance limit of 83.50% in this transcontinental scenario, incorporating these 18 elite descriptors resulted in a 4.55% relative increase, raising the final accuracy to 87.30%. This shows that the model does not depend on a single dominant feature but rather on the combination of a deep spatial hierarchy and detailed spectral descriptors, ensuring robustness in diagnostics despite optical variability under field conditions.
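The relative increase quoted above can be checked from the two accuracies given in the text. The helper name below is hypothetical; the inputs are the standalone-CNN and hybrid accuracies reported in this section.

```python
def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


print(round(relative_gain_percent(83.50, 87.30), 2))  # 4.55
```

In absolute terms, the same change corresponds to 3.80 percentage points (87.30 minus 83.50), so the 4.55% figure should be read as a relative gain.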

Figure 8 summarizes the F-measure, AUC, and precision metrics by category. This analysis indicates that the model has strong discriminative power, with a weighted average AUC of 0.976. All classes exceed the 0.94 threshold, indicating that the hybrid architecture (BGR + CNN) is highly effective at separating the classes. With the exception of black spot and HLB, all pathologies maintained an F-measure above 0.87, underscoring the model’s role as a balanced diagnostic tool. This metric stability on a dataset with reduced resolution ( $256 \times 256$ px) and variable lighting conditions supports the robustness of the proposed methodology. Therefore, the model is a strong candidate as a support tool for early phytosanitary monitoring in the field, particularly in precision agriculture systems and real-time monitoring devices. In this regard, the melanose class shows a near-perfect balance across the three metrics, and the canker class is also well balanced, with all three metrics above 0.92. This suggests that the 18 selected features (especially the BGR color moments) accurately capture the morphology of the necrotic lesions characteristic of these diseases. Importantly, despite reducing the number of features from 390 to 18, precision remains above 0.87 for most classes, with a minimum of 0.75 in the black spot class. This confirms that no critical information was lost during the CNN’s PCA compression.
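The F-measure summarized in Figure 8 is the harmonic mean of precision and recall, so a class with one weak component is penalized more than an arithmetic mean would suggest. A minimal sketch follows; the function name and the example values are illustrative and are not taken from the figure.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical class with good precision but weaker recall.
example_f = f_measure(0.8, 0.6)
```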

To validate the need for the hybrid architecture, an ablation study was conducted to compare the performance of statistical color descriptors (handcrafted), deep texture descriptors (CNN), and the optimized hybrid model (see Table 13). The results show that although BGR moments provide an acceptable diagnostic basis ( $\text{accuracy} = 75.70\%$ ), integrating latent deep vision features via the CNN model increases accuracy by 10.30%. Notably, this improvement is attained while reducing overall dimensionality, confirming that the latent synergy between color and texture enables a more robust decision boundary at lower computational cost. Furthermore, the hybrid model’s superiority over the pure DL approach is demonstrated by analyzing the black spot class. The model based exclusively on latent CNN descriptors (10 features) failed to diagnose 57 cases of this pathology ( $\text{recall} = 0.715$ ). Integrating BGR moments increased discrimination by over 11%. This result suggests that chromatic information complements structural features, since many citrus diseases manifest through alterations in the distribution of chlorophyll and pigments before producing evident structural changes. Thus, deep neural networks, despite their capacity for textural abstraction, exhibit a texture bias toward pathologies whose lesions share similar textural signatures but different colorimetric profiles. Therefore, the hypothesis that the handcrafted + CNN fusion produces a feature space superior to those of its individual components is validated, with a 4.55% increase in overall accuracy and a 6% improvement in statistical stability (kappa) compared to CNN alone. This improvement demonstrates that handcrafted features are not redundant but essential for compensating for the loss of spectral information in deep convolutional layers.

The diagnostic robustness of the 18-descriptor subset is confirmed through a multi-class AUC analysis (Figure 9). The model achieves an $AUC = 1.000$ for melanose and scores above 0.99 for canker and healthy leaves, demonstrating that the hybrid space effectively captures the most distinctive features of these conditions. The slight decrease in AUC for HLB (0.947) and black spot (0.951) aligns with the convergence of kurtosis descriptors in the green and blue channels, pinpointing the exact boundary where chromatic signatures overlap. This high discriminatory ability is further demonstrated by the reduction in false-positive rates, which is crucial for cutting down unnecessary phytosanitary interventions in real field conditions. Additionally, including the standard deviation of the green channel was key to achieving a 73% recall for HLB, solving ambiguities that are usually hard to distinguish with traditional computer vision methods.

4. Discussion

Unlike traditional methods that depend on high-dimensional feature spaces, this study shows that an Elite Model fusion offers a more robust diagnostic framework. By reducing the input to a hybrid subset, the classifier structure becomes simpler, effectively reducing noise often found at decision boundaries in non-controlled agricultural environments.

4.1. Discussion Experiment 1

The technical advantage of the MLP over decision-tree models such as RF lies in its ability to capture complex nonlinear relationships among foliar symptom phenotypes. While RF splits the attribute space into separate regions, the MLP’s non-linear activation functions are more effective at capturing subtle interdependencies between BGR chromatic moments and CNN latent descriptors. This multidimensional mapping is essential for distinguishing between variegated chlorosis and nutritional deficiencies, a challenge that remains difficult even for experienced field pathologists due to overlapping interveinal chlorosis patterns. These results support the idea of efficient minimalism by showing that large DL models are not always necessary for accurate plant diagnosis. Instead, a carefully designed feature space that combines the statistical sensitivity of color with the structural hierarchy of texture allows compact models to achieve strong generalization. This is especially important for agricultural applications, where data labeling can be costly, and processing must happen on edge devices with limited computing power, offering a sustainable alternative to data-heavy CNN models.

The continued use of chromatic descriptors in the Elite Model indicates that changes in pigment concentration are still the main spectral indicator of citrus diseases. Specifically, the green channel moments highlight impaired photosynthesis and lower chlorophyll levels in affected leaves. By measuring color fragmentation using standard deviation, the model identifies interveinal chlorosis patterns that structural descriptors alone might miss, offering a chromatic anchor that improves diagnosis stability despite variations in leaf age or environmental conditions.

A key finding is that while CNN architectures excel at recognizing spatial patterns like edges and shapes, they tend to average out chromatic information because of successive pooling layers. This inherent ‘texture bias’ often limits their ability to detect diseases mainly characterized by pigment gradients. By explicitly reintroducing BGR statistical moments, the hybrid model combines structural hierarchy with chemical sensitivity. This synergy allows the classifier to distinguish between the physical roughness of a necrotic lesion (captured by CNN components) and the underlying tissue chemistry (captured by chromatic moments), providing a more complete representation of the phytopathological state.

4.2. Discussion Experiment 2

The shift to a set of 18 descriptors for the Pakistani dataset reflects a necessary adjustment to address transcontinental domain differences. Unlike the controlled settings of the first experiment, field-collected images from Pakistan exhibit significant variation in sensor quality and ambient lighting. This expansion of the feature space acts as an information buffer, capturing residual variance caused by different spectral signatures in semi-arid environments. This flexibility shows that the framework is not overfitted to a single regional type but instead maintains diagnostic stability across geographically distant citrus populations.

The hierarchy of attribute importance reveals a crucial compensatory synergy between deep spatial abstractions and statistical color descriptors. While the CNN components provide a strong structural hierarchy for detecting complex lesions such as canker, they can still be influenced by texture bias in uncontrolled environments, where pooling operations might dilute subtle pigmentary anomalies. By explicitly incorporating handcrafted BGR moments, the model addresses this limitation. In particular, the high significance of the blue-channel descriptors is related to the biophysical properties of short-wavelength light scattering. Since blue light is highly responsive to structural changes in the leaf cuticle and refractive-index variations in degraded cells, including it helps the model detect the earliest signs of cellular necrosis in diseases like black spot that structural textures alone might miss.

A key analytical finding, and possibly the most important diagnostic boundary in this study, is the error structure between HLB and black spot. From a plant disease standpoint, the difficulty in differentiating this boundary comes from the biological similarity between early necrotic halos and the irregular, asymmetric mottling typical of HLB. While single CNN architectures often focus on edge detection (frequently confusing small necrotic spots with chlorotic green islands), our hybrid method employs second-order statistical moments to capture the diffuse pigment gradients typical of phloem dysfunction. The framework’s ability to address this biological overlap confirms the usefulness of a dual-perspective architecture (chromatic and textural) in resolving ambiguities that are still challenging even for expert visual inspection in the field.

The stability demonstrated through AUC analysis confirms that the hybrid space is a highly effective solution for real-time monitoring. The near-perfect separation of melanose, compared to the more complex boundary between HLB and black spot, highlights that the model’s strength is in its ability to adapt its discriminative logic to the specific shape of each disease. Achieving high diagnostic accuracy with about 95% reduction in dimensionality, this approach meets the scalability requirements of modern precision agriculture. This efficiency enables the deployment of advanced diagnostic tools on edge computing systems, providing a sustainable alternative to data-heavy, monolithic DL models that often struggle to generalize across the diverse optical and biological conditions of global citrus production.

The stability selection statistics address the potential risk of selection bias and stochastic overfitting. The fact that 90% of the folds prioritized the same textural principal components confirms that the elite hybrid subset is not an artifact of a specific data split but is driven by intrinsic pathological signatures. The repeated occurrence of these features across multiple runs with different random seeds shows that the model effectively filters out noise specific to individual cases. This consistency is crucial for real-world use because it ensures the diagnostic criteria stay reliable across various training instances, thus minimizing the impact of stochastic artifacts often seen in high-dimensional biological datasets.
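The selection frequencies reported in Tables 6 and 10 can be reproduced in principle by counting how often each descriptor survives the wrapper search in each fold. The sketch below assumes the per-fold selections are available as lists of feature names; the fold contents and the 0.9 stability cut-off are hypothetical and are not the study's actual selections.

```python
from collections import Counter

def selection_frequency(selected_per_fold):
    """Map each feature to the fraction of folds in which it was selected."""
    counts = Counter()
    for selected in selected_per_fold:
        counts.update(set(selected))  # count a feature once per fold
    n_folds = len(selected_per_fold)
    return {name: counts[name] / n_folds for name in counts}


# Hypothetical fold outcomes, NOT the study's actual selections.
folds = [["PC_CNN_2", "V234"]] * 4 + [["PC_CNN_2"]] * 5 + [["PC_CNN_11"]]
freq = selection_frequency(folds)
stable = sorted(f for f, share in freq.items() if share >= 0.9)
```

A feature chosen in 9 of 10 folds has a frequency of 0.9 and would be labeled highly stable under this illustrative threshold.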

Finally, the superiority of the hybrid approach was evaluated using the corrected t-test for cross-validation proposed by Nadeau and Bengio [36]. While the resulting p-value (0.056) sits at the edge of the conventional significance threshold, this is expected given the conservative nature of the correction method. However, the analysis confirmed a substantial mean improvement of 3.70% and a large effect size (Cohen’s d_{z} = 1.00). This evidence, combined with the fact that the hybrid model outperformed the unimodal CNN baseline in 9 out of 10 folds (including one tie), demonstrates the practical robustness and diagnostic advantage of the proposed feature fusion strategy.
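The relation between the reported effect size and p-value can be checked with a short sketch of the corrected resampled t statistic. It assumes 10-fold cross-validation (so the test-to-train ratio is 1/9) and 9 degrees of freedom; the per-fold differences are hypothetical, and the p-value is obtained by numerically integrating Student's t density rather than by any routine used in the study.

```python
import math
import statistics

def corrected_t(diffs, n_train, n_test):
    """Corrected resampled t statistic and Cohen's d_z for paired fold scores."""
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (k - 1)
    d_z = mean_diff / sd_diff
    # The n_test / n_train term inflates the variance to account for
    # overlapping training sets across folds.
    t_stat = mean_diff / (sd_diff * math.sqrt(1 / k + n_test / n_train))
    return t_stat, d_z


def two_sided_p(t_stat, dof, steps=20000):
    """Two-sided p-value of Student's t via Simpson integration of the pdf."""
    norm = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(x):
        return norm * (1 + x * x / dof) ** (-(dof + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (total * h / 3)


# Hypothetical per-fold accuracy gains (percentage points), not the study's data.
diffs = [4.0, 2.0, 6.0, 0.0, 3.0, 5.0, 1.0, 6.0, -1.0, 4.0]
t_stat, d_z = corrected_t(diffs, n_train=9, n_test=1)
p_value = two_sided_p(t_stat, dof=len(diffs) - 1)

# Plugging the reported d_z = 1.00 into the same relation for 10 folds:
t_reported = 1.00 / math.sqrt(1 / 10 + 1 / 9)
p_reported = two_sided_p(t_reported, dof=9)
```

Under these assumptions, an effect size of 1.00 corresponds to a corrected t of roughly 2.18 and a two-sided p-value just under 0.06, which is in line with the reported 0.056 once rounding of the effect size is taken into account.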

4.3. Edge-AI Feasibility Analysis

To meet the practical requirements of precision agriculture, the Hybrid Elite Model was tested for deployment readiness on edge-computing architectures. Benchmarking was conducted on a native ARM64 platform (ARMv8 instruction set), enabling a direct performance comparison with devices such as the Raspberry Pi 4/5 and modern mobile system-on-chip (SoC) units. The model was serialized using the Joblib protocol to ensure efficient memory mapping during inference. As shown in Table 14, replacing the full DL backbone (VGG16) with our optimized hybrid feature space resulted in a 97.4% reduction in storage needs (from approximately 528 MB to 13.59 MB). More importantly, the inference process proved highly efficient, with a peak RAM usage of just 1.2 GB and an average latency of 56.35 ms per diagnostic task. This offers roughly a 37x speed increase compared to running unoptimized CNN architectures on ARM-based CPUs. These metrics confirm that the proposed framework can support real-time monitoring at >17 frames per second (FPS) in offline orchard environments, where cloud connectivity is often unavailable, and power resources are limited.

The end-to-end latency analysis for the decision-making stages (Stages 3 and 4) showed a processing time of 56.35 ms, resulting in a throughput of 17.75 diagnostic tasks per second. Despite the complexity of the hybrid feature space, system memory usage remained within the limits of modern edge computing platforms (1.2 GB total system RAM). These metrics demonstrate that combining deep textural features with handcrafted chromatic moments is computationally feasible for both offline and real-time orchard monitoring.
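The throughput and storage figures quoted above follow directly from the reported measurements, as the short calculation below shows. The variable names are illustrative, and the baseline latency is only what the reported 37x factor implies, not a separately reported measurement.

```python
latency_ms = 56.35        # average latency per diagnostic task (reported)
full_model_mb = 528.0     # approximate VGG16 size (reported)
hybrid_model_mb = 13.59   # serialized hybrid model size (reported)
speedup = 37              # reported speed-up factor

throughput = 1000.0 / latency_ms  # diagnostic tasks per second
storage_reduction = 100.0 * (1.0 - hybrid_model_mb / full_model_mb)
implied_baseline_ms = speedup * latency_ms  # implied, not measured here

print(f"throughput: {throughput:.2f} tasks/s")
print(f"storage reduction: {storage_reduction:.1f}%")
print(f"implied baseline latency: {implied_baseline_ms:.0f} ms")
```

The first two lines of output reproduce the 17.75 tasks per second and the 97.4% storage reduction stated in the text.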

4.4. Contextualization Within the State-of-the-Art

The comparative analysis shown in Table 15 offers context for the proposed framework within the current landscape of citrus disease detection. It is important to highlight that making direct numerical comparisons between studies is limited due to differences in dataset size, host species, and validation methods. Therefore, the value of the Elite Model is not solely based on reaching a specific accuracy percentage, but on its methodological simplicity. While high-performance architectures like DenseNet121 [6] or complex ensemble stacks [4] achieve near-perfect accuracy on limited, often controlled samples, they require millions of parameters and substantial computational resources. In contrast, the proposed hybrid method simplifies the diagnostic problem to a minimal signature of only 12 to 18 descriptors. This major reduction in the feature space (achieving similar diagnostic stability with 95% fewer dimensions) shows that combining BGR chromatic moments with structural hierarchy captures a more essential and efficient biological signal than standalone DL models.
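The reduction percentages quoted in this article depend on the starting space they are measured against. The sketch below computes them for both reference points given in Tables 2 and 5: the raw concatenated space (360 handcrafted plus 12,800 CNN descriptors) and the PCA-compressed hybrid space (360 handcrafted plus 30 principal components). The helper name is illustrative.

```python
def reduction_pct(initial, selected):
    """Percentage of dimensions removed when keeping `selected` of `initial`."""
    return 100.0 * (initial - selected) / initial


raw_space = 360 + 12_800  # handcrafted + raw VGG16 descriptors
pca_space = 360 + 30      # handcrafted + 30 principal components

for name, kept in (("Mexico", 12), ("Pakistan", 18)):
    print(
        name,
        f"vs PCA space: {reduction_pct(pca_space, kept):.2f}%",
        f"vs raw space: {reduction_pct(raw_space, kept):.2f}%",
    )
```

Measured against the 390-feature hybrid space, the reduction is about 97% for 12 features and about 95% for 18 features; measured against the raw 13,160-dimensional space, it exceeds 99.8% in both cases.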

Furthermore, the strength of this simple representation is demonstrated by its performance in the transcontinental validation (Experiment 2). Unlike models that rely on expensive sensors [3] or are built for a single geographic region, the Elite Model retains its discriminative ability across different climates and regions (from Mexico to Pakistan). This capability to generalize without requiring large amounts of retraining data or costly hardware makes the framework a scalable solution for real-world precision agriculture, where the emphasis is on the reliability and efficiency of the tool in uncontrolled field environments rather than on minor improvements in accuracy.

4.5. Integrative Analysis of the Hybrid Elite Model Performance

The experimental results revealed a performance gap between the paradigms when handling transcontinental data. While the MLP achieved high accuracy on Dataset 1, its performance declined on the more diverse Pakistan dataset. In contrast, RF demonstrated better generalization. This is mainly because RF’s non-parametric nature and its use of feature bagging lessen the impact of feature drift across different geographic regions. Additionally, while MLPs require precise hyperparameter tuning and are sensitive to data distribution, RF’s decision-tree architecture is more resilient against the non-linear noise often found in field-acquired agricultural images, making it a more reliable choice for edge-AI deployment in various ecological zones.

The comparative analysis shown in Table 16 reveals a clear trade-off between model complexity and diagnostic performance across various geographical contexts. In the Mexico dataset, there was an inverse relationship between the number of parameters and accuracy: the proposed Elite Model, despite its simple 12-feature design, outperformed both MobileNetV2 (93.92%) and EfficientNetB0 (93.53%). This indicates that for high-quality images, high-dimensional DL features may introduce unnecessary noise, whereas our biophysically relevant descriptors yield a more stable and accurate decision boundary.

In the Pakistan dataset, which is characterized by greater complexity, higher background noise, and variable lighting, EfficientNetB0 achieved the highest mean accuracy (89.33%). However, the Elite Model remained highly competitive (87.30%) with only a 2.03% difference. From an Edge-AI perspective, this small increase in accuracy by the DL baseline comes at a much higher computational cost, using approximately 5.3 million parameters compared to just 18 features in our framework. The consistent Mean Kappa values (≥0.84) across both datasets for the Elite Model confirm that combining BGR color moments with pruned deep features produces a robust and interpretable representation. This design choice results in an inference latency of 0.03 s, making it much more suitable for real-time use on low-resource mobile devices compared to standard ‘black-box’ CNN architectures.

To assess the feasibility of the proposed framework for Edge-AI applications, we analyzed the model size and inference requirements. As shown in Table 17, the Elite Model requires substantially less computation than traditional DL methods. The reduction comes from skipping the full neural network during the specialized diagnostic phase and using only the optimized 18-feature vector. This efficient design meets the constraints of mobile devices with limited RAM and processing power.

4.6. Complexity and Overfitting Mitigation

A key challenge in wrapper-based feature selection is the risk of overfitting, where the search algorithm might identify descriptors that reflect random variations in the training data. In this study, this risk was reduced by maintaining a high sample-to-feature ratio. While DL models often have thousands of latent parameters, our Elite Model uses a small vector of n = 12 features for the Mexico dataset and n = 18 for the Pakistan dataset. This parsimonious approach ensures that the classifier focuses on the most reliable optical and structural features of citrus diseases. Additionally, using stratified 10-fold cross-validation during evaluation guarantees that the reported metrics accurately reflect the model’s true generalization ability. The fact that the same hierarchical selection logic produced highly accurate results in two different environmental contexts (tropical Mexico vs. semi-arid Pakistan) provides evidence that the selected features are not statistical artifacts but universal indicators of citrus leaf diseases.
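The wrapper search discussed above can be pictured with a generic forward greedy selection loop. This is a sketch under stated assumptions rather than the study's implementation: the search direction, the stopping rule (`min_gain`), and the merit function are not specified in this section, and `toy_merit` is a hypothetical stand-in for a cross-validated classifier score.

```python
def greedy_forward_selection(features, merit, min_gain=1e-9):
    """Add one feature at a time while the wrapper merit keeps improving."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        # Score every candidate subset formed by adding one more feature.
        scored = [(merit(selected + [f]), f) for f in remaining]
        score, winner = max(scored, key=lambda pair: pair[0])
        if score - best_score <= min_gain:
            break  # no candidate improves the merit: stop the search
        selected.append(winner)
        remaining.remove(winner)
        best_score = score
    return selected, best_score


# Toy merit standing in for cross-validated accuracy (hypothetical values):
# two informative features help, every extra feature costs a small penalty.
useful = {"V4": 0.50, "PC_CNN_2": 0.30}

def toy_merit(subset):
    return sum(useful.get(f, 0.0) for f in subset) - 0.01 * len(subset)


subset, score = greedy_forward_selection(
    ["V1", "V4", "PC_CNN_2", "PC_CNN_9"], toy_merit
)
```

In this toy run the search keeps the two informative features and stops once adding a third would lower the merit. Because the merit is evaluated on the data used for the search, a wrapper can reward chance patterns, which is why a high sample-to-feature ratio and cross-validated evaluation matter.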

Despite the high performance of the Hybrid Elite Model, some limitations remain. Handcrafted chromatic features are vulnerable to extreme lighting changes, which are common in uncontrolled field environments. Although the deep descriptors from VGG16 provide structural stability, stochastic lighting and harsh shadows can still affect the color distribution of the patches. Additionally, leaf occlusion (where overlapping leaves or debris partially obscure the lesion) presents a challenge to current global feature extraction methods, as it can introduce non-target texture noise. Future research will focus on two main areas. First, integrating Instance Segmentation (e.g., Mask R-CNN) to isolate individual leaves before feature extraction will effectively eliminate background noise and overlapping foliage. Second, we plan to explore multispectral imaging to detect physiological changes, such as chlorophyll fluorescence or water stress, that occur before visible symptoms appear. These efforts will result in a more resilient, early-stage citrus diagnostic system suitable for autonomous robotic monitoring across various orchard environments.

Although the models performed well in both testing scenarios, directly applying a model trained in one geographic region to data from another remains challenging due to differences in disease patterns between datasets. While both datasets include the healthy category, the specific diseases found in Dataset 1 (Mexico) do not match those in Dataset 2 (Pakistan), reflecting different regional plant health issues. In supervised learning frameworks, the lack of shared labels across domains prevents direct one-to-one model evaluation. However, the ‘transcontinental’ robustness of our approach is supported by the observed methodological convergence: the same hybrid architecture, which combines deep textural features and handcrafted BGR moments, appeared as the best configuration across two independent datasets collected under different environmental conditions and citrus cultivars. Future studies could investigate domain adaptation techniques or incorporate more varied international datasets to further confirm the cross-continental diagnostic abilities of the proposed framework.

5. Conclusions

This study demonstrates that combining latent CNN features with statistical BGR moments, optimized through elite-wrapper selection, offers a simple and effective framework for identifying citrus symptoms. The findings show that reducing the feature space from thousands of descriptors to just 12–18 attributes still preserves diagnostic accuracy, despite the optical and biological variability present in transcontinental datasets.

The methodological contribution of this study lies in combining deep structural hierarchy with handcrafted chromatic sensitivity. Keeping key BGR moments in the Elite Models shows that chromatic variance remains an important complement to CNN abstractions, especially for differentiating the boundaries between early-stage chlorosis and healthy tissue. However, it is important to recognize that the system performs best when detecting high-contrast necrotic lesions, and its current success in distinguishing symptoms across hosts and regions (Mexico and Pakistan) should be viewed as a proof of concept for domain adaptability rather than as a universal field diagnostic tool.

Although the 99.8% reduction in dimensionality and ultra-low training latency (0.03 s) meet technical standards for deployment on edge devices, their practical use in precision agriculture still needs more validation. Future research should address the limitations of internal cross-validation by including independent hold-out samples from diverse agronomic regimes and conducting longitudinal studies. Moving from image-based classification to a comprehensive field diagnostic solution will involve integrating these efficient hybrid models with multi-temporal data to more effectively detect subtle pigmentary changes in complex agricultural environments.

Author Contributions

Conceptualization, E.T.-L., J.D.H.-R. and U.M.R.-A.; methodology, E.T.-L. and B.A.M.-H.; software, E.T.-L.; validation, E.T.-L., B.A.M.-H. and S.R.-T.; formal analysis, E.T.-L., J.D.H.-R. and U.M.R.-A.; investigation, E.T.-L., B.A.M.-H. and S.R.-T.; resources, E.T.-L.; data curation, E.T.-L.; writing—original draft preparation, E.T.-L., B.A.M.-H. and S.R.-T.; writing—review and editing, E.T.-L., B.A.M.-H. and S.R.-T.; visualization, E.T.-L., J.D.H.-R. and U.M.R.-A.; supervision, B.A.M.-H.; project administration, E.T.-L.; funding acquisition, E.T.-L. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the Autonomous University of Tamaulipas, Mexico.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in this study are openly available in https://www.eddsac.com/LeavesImages.rar (accessed on 6 January 2026) and https://doi.org/10.17632/3f83gxmv57.2.

Acknowledgments

The Autonomous University of Tamaulipas (Mexico) partially supported this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ML	Machine learning
DL	Deep learning
CNN	Convolutional neural network
BGR	Blue-Green-Red
VGG16	Visual Geometry Group 16-layer
AI	Artificial Intelligence
RF	Random forest
MLP	Multilayer perceptron
SVM	Support vector machine
SMO	Sequential minimal optimization
FPS	Frames per second
AUC	Area under the curve
ROC	Receiver operating characteristic

References

1. Dhiman, P.; Kaur, A.; Balasaraswathi, V.R.; Alwan, A.A.; Hamid, Y. Image Acquisition, Preprocessing and Classification of Citrus Fruit Diseases: A Systematic Literature Review. Sustainability 2023, 15, 9643.
2. Barman, K.D.; Kumar, A.; Halder, A. Hybrid Deep Learning for Citrus Disease Detection. In Advances in Environmental Engineering and Green Technologies Book Series; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 173–208.
3. Min, H.J.; Qin, J.; Yadav, P.K.; Frederick, Q.; Burks, T.F.; Dewdney, M.M.; Baek, I.; Kim, M.J. Classification of Citrus Leaf Diseases Using Hyperspectral Reflectance and Fluorescence Imaging and Machine Learning Techniques. Horticulturae 2024, 10, 1124.
4. Zhu, H.; Wang, D.; Wei, Y.; Zhang, X.; Li, L. Combining Transfer Learning and Ensemble Algorithms for Improved Citrus Leaf Disease Classification. Agriculture 2024, 14, 1549.
5. Syed-Ab-Rahman, S.F.; Hesamian, M.H.; Prasad, M. Citrus disease detection and classification using end-to-end anchor-based deep learning model. Appl. Intell. 2022, 52, 927–938.
6. Goyal, A.; Lakhwani, K. Integrating advanced deep learning techniques for enhanced detection and classification of citrus leaf and fruit diseases. Sci. Rep. 2025, 15, 12659.
7. Kaur, B.; Gupta, S.K.; Janarthan, M.; Alsekait, D.M.; AbdElminaam, D.S. Precision diagnosis of citrus leaf diseases using image enhancement and nonlinear fuzzy ranking ensemble approach NLFuRBe. Sci. Rep. 2025, 15, 32296.
8. Dong, R.; Shiraiwa, A.; Hayashi, T. A Simple Diagnostic Method for Citrus Greening Disease with Deep Learning. Electron. Commun. Jpn. 2025, 108, e12472.
9. Çetiner, H. Citrus disease detection and classification using based on convolution deep neural network. Microprocess. Microsyst. 2022, 95, 104687.
10. Xu, Q.; Cai, J.R.; Zhang, W.; Bai, J.W.; Li, Z.Q.; Tan, B.; Sun, L. Detection of citrus Huanglongbing (HLB) based on the HLB-induced leaf starch accumulation using a home-made computer vision system. Biosyst. Eng. 2022, 218, 163–174.
11. Li, J.; Chen, D.; Qi, X.; Li, Z.; Huang, Y.; Morris, D.; Tan, X. Label-efficient learning in agriculture: A comprehensive review. Comput. Electron. Agric. 2023, 215, 108412.
12. Dong, R.; Shiraiwa, A.; Pawasut, A.; Sreechun, K.; Hayashi, T. Diagnosis of Citrus Greening Using Artificial Intelligence: A Faster Region-Based Convolutional Neural Network Approach with Convolution Block Attention Module-Integrated VGGNet and ResNet Models. Plants 2024, 13, 1631.
13. Dinesh, P.; Lakshmanan, R. Deep Learning-Driven Citrus Disease Detection: A Novel Approach with DeepOverlay L-UNet and VGG-RefineNet. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1023–1041.
14. Ergün, E. Deep learning based multiclass classification for citrus anomaly detection in agriculture. Signal Image Video Process. 2024, 18, 8077–8088.
15. Dananjayan, S.; Tang, Y.; Zhuang, J.; Hou, C.; Luo, S. Assessment of state-of-the-art deep learning based citrus disease detection techniques using annotated optical leaf images. Comput. Electron. Agric. 2022, 193, 106658.
16. Dai, Q.; Huang, Y.; Li, Z.; Lyu, S.; Xue, X.; Song, S.; Liang, S.; Fu, J.; Zhang, S. Field-Based, Non-Destructive and Rapid Detection of Citrus Leaf Physiological and Pathological Conditions Using a Handheld Spectrometer and ASTransformer. Agriculture 2025, 15, 1864.
17. Shastri, R.; Chaturvedi, A.; Mouleswararao, B.; Varalakshmi, S.; Prasad, G.N.R.; Ram, M.K. An Automatic Detection of Citrus Fruits and Leaves Diseases Using Enhanced Convolutional Neural Network. Remote Sens. Earth Syst. Sci. 2023, 6, 123–134.
18. Kaur, A.; Kukreja, V.; Upadhyay, D.; Aeri, M.; Sharma, R. DeepCitrus: Leveraging Integrated CNN and VGG16 for Automated Orange Leaf Disease Detection. In 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI); IEEE: Gwalior, India, 2024; Volume 2, pp. 1–5.
19. George, R.; Thuseethan, S.; Ragel, R.G.; Mahendrakumaran, K.; Nimishan, S.; Wimalasooriya, C.; Alazab, M. Past, present and future of deep plant leaf disease recognition: A survey. Comput. Electron. Agric. 2025, 234, 110128.
20. Butt, N.; Iqbal, M.M.; Ramzan, S.; Raza, A.; Abualigah, L.; Fitriyani, N.L.; Gu, Y.; Syafrudin, M. Citrus diseases detection using innovative deep learning approach and Hybrid Meta-Heuristic. PLoS ONE 2025, 20, e0316081.
21. Li, Y.; Guo, J.; Qiu, H.; Chen, F.; Zhang, J. Denoising Diffusion Probabilistic Models and Transfer Learning for citrus disease diagnosis. Front. Plant Sci. 2023, 14, 1267810.
22. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Miami, FL, USA, 2009; pp. 248–255.
23. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Las Vegas, NV, USA, 2016; pp. 770–778.
24. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
25. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Las Vegas, NV, USA, 2016; pp. 2818–2826.
26. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946.
27. Sankaran, S.; Subbiah, D.; Chokkalingam, B.S. CitrusDiseaseNet: An integrated approach for automated citrus disease detection using deep learning and kernel extreme learning machine. Earth Sci. Inform. 2024, 17, 3053–3070.
28. Xian, Z.; Huang, R.; Towey, D.; Yue, C. Convolutional Neural Network Image Classification Based on Different Color Spaces. Tsinghua Sci. Technol. 2025, 30, 402–417.
29. Saygılı, A. Enhancing Lemon Leaf Disease Detection: A Hybrid Approach Combining Deep Learning Feature Extraction and mRMR-Optimized SVM Classification. Appl. Sci. 2025, 15, 10988.
30. Sánchez-DelaCruz, E. Orange Leaves Images from Local Crops. 2021. Available online: https://www.eddsac.com/LeavesImages.rar (accessed on 12 December 2025).
31. Rauf, H.T.; Saleem, B.A.; Lali, M.I.U.; Khan, M.A.; Sharif, M.; Bukhari, S.A.C. A Citrus Fruits and Leaves Dataset for Detection and Classification of Citrus Diseases through Machine Learning. 2019. Mendeley Data, V2. Available online: https://data.mendeley.com/datasets/3f83gxmv57/2 (accessed on 15 December 2025).
32. Rauf, H.T.; Saleem, B.A.; Lali, M.I.U.; Khan, M.A.; Sharif, M.; Bukhari, S.A.C. A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data Brief 2019, 26, 104340.
33. Xiao, W.; Shang, J.; Li, F.; Ao, O.; Wang, X.; Tian, S. Research on Citrus Leaf Disease Recognition Using Class-Agnostic Contrastive Learning and Supervised Organizational Mapping. IEEE Access 2025, 13, 101592–101608.
34. Siam, A.K.M.F.K.; Bishshash, P.; Nirob, M.A.S.; Mamun, S.B.; Assaduzzaman, M.; Noori, S.R.H. A comprehensive image dataset for the identification of lemon leaf diseases and computer vision applications. Data Brief 2025, 58, 111244.
35. Elaraby, A.; Hamdy, W.; Alanazi, S. Classification of Citrus Diseases Using Optimization Deep Learning Approach. Comput. Intell. Neurosci. 2022, 2022, 9153207.
36. Nadeau, C.; Bengio, Y. Inference for the Generalization Error. Mach. Learn. 2003, 52, 239–281.
37. Sujatha, R.; Chatterjee, J.M.; Jhanjhi, N.; Brohi, S.N. Performance of deep learning vs machine learning in plant leaf disease detection. Microprocess. Microsyst. 2021, 80, 103615.

Figure 1. Feature activation maps extracted from the convolutional layers of the VGG16 backbone. The visualization shows how the deep network captures complex textural and structural patterns associated with citrus foliar lesions before PCA dimensionality reduction.

Figure 2. Scree plot and cumulative variance analysis for Dataset 1 (Mexico). The blue line tracks the cumulative explained variance in the 12,800 raw CNN descriptors. The mathematical elbow (dashed green line) at 158 PCs and the 95% threshold (dotted black line) at 385 PCs contrast with the heuristic selection of 30 PCs (solid red line).

Figure 3. Multi-class AUC for the 12-feature optimized hybrid model. The weighted-average AUC of 0.994 demonstrates an almost perfect discriminative capacity across the four conditions associated with the visible expression of symptoms, thereby validating the applied dimensionality reduction.

Figure 4. Heatmap of the confusion matrix for the MLP model.

Figure 5. Scree plot and cumulative variance analysis for Dataset 2 (Pakistan). Due to the higher morphological diversity of the samples, the 95% threshold is reached at 790 PCs, with an elbow point at 314 PCs. The standardized selection of 30 PCs (solid red line) captures the main structural patterns while maintaining a balanced feature ratio for the subsequent hybrid classification.

Figure 6. Confusion matrix for citrus disease classification on the dataset of Experiment 2 using the optimized hybrid model. The values represent the number of correctly classified instances (diagonal) versus the prediction errors across phytopathological categories.

Figure 7. Relative importance analysis (Gini index/wrapper merit) of the top 18 hybrid descriptors.

Figure 8. Class-wise performance comparison (F-measure, AUC, and precision) for the dataset of Experiment 2.

Figure 9. AUC analysis for the transcontinental validation on Dataset 2.

Table 1. Comprehensive summary of the transcontinental datasets.

Parameter	Mexico Dataset	Pakistan Dataset
Origin & Climate	Veracruz (Warm-humid) [30]	Sargodha (Semi-arid) [31]
Citrus Varieties	C. sinensis	C. nobilis × C. deliciosa (Kinnow mandarin)
Total Samples (N)	262	609
Number of Classes	4 (incl. healthy)	5 (incl. healthy)
Class Distribution	Healthy (70), Greasy Spot (64), Variegated Chlorosis (64), Zinc Deficiency (64)	Healthy (58), Black Spot (171), Canker (163), HLB ¹ (204), Melanose (13)
Device Type	High-resolution mobile	DSLR (Canon EOS 1300D)
Acquisition Environment	Semi-controlled (Canopy)	Inconsistent Field Conditions
Original Resolution	5 MP ( $2592 \times 1456$ )	18 MP ( $5202 \times 3465$ )
Processing Size	$224 \times 224$ pixels	$256 \times 256$ pixels
Distance	10 cm to 20 cm	Variable (macro and standard shots)
Image Format	RGB (JPEG/PNG)	RGB (JPEG)

¹ HLB stands for Huanglongbing.

Table 2. Technical parameters of the hybrid feature extraction process.

Domain	Parameter	Configuration
Handcrafted (color)	Color space	BGR
	Analysis unit	Patches
	Patch size	$128 \times 128$ pixels
	Number of patches	30 per image
	Statistical moments	4 ( $μ$ , $σ$ , $γ$ , and $κ$ )
	Total features $V_{H C}$	360 ( $30 \times 3 \times 4$ )
CNN (texture)	Base model	VGG16
	Weights	Pre-trained (ImageNet)
	Training mode	Fixed feature extractor (frozen)
	Optimization	None (static inference)
	Input resolution	$160 \times 160 \times 3$
	Extraction layer	block5_pool
	Final spatial grid	$5 \times 5$
	Filter depth	512
	Total features $V_{C N N}$	12,800 ( $5 \times 5 \times 512$ )

Table 3. Image distribution and augmentation strategy for Dataset 2 (Pakistan).

Phytopathological Condition	Original Images	Augmented Images	Total (Balanced)	Techniques Applied
Black spot	171	29	200	Flip (H/V), Rotation 90°
Canker	163	37	200	Flip (H/V), Rotation 90°
HLB	204	0	200	Random Subsampling
Melanose	13	187	200	Flip (H/V), Rotation 90°
Healthy	58	142	200	Flip (H/V), Rotation 90°
Total	609	395	1000	–

Table 4. Preliminary benchmarking of ML paradigms on the initial hybrid dataset (390 features).

Algorithm	Accuracy (%)	Kappa	AUC	Training Time (s)
MLP	97.71	0.969	0.993	102.28
RF	96.75	0.956	0.996	0.38
SVM (SMO)	96.56	0.954	0.980	0.05
BayesNet	88.55	0.847	0.975	0.02
J48	85.50	0.806	0.924	0.04
Naive Bayes	81.87	0.757	0.946	0.01

Table 5. Impact of feature pruning on dimensionality reduction and classification performance.

Model	Initial Attributes	Selected Attributes	Reduction (%)	Accuracy (%)	Kappa	Training Time (s)
Handcrafted (BGR)	360	5	98.61	93.51	0.913	0.06
CNN (VGG16-PCA)	30	7	76.66	92.56	0.900	0.03
HC-CNN Hybrid Optimized	390	12	96.92	95.23	0.936	0.09
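
The pruning summarised in Table 5 relies on a wrapper-based greedy stepwise search. A minimal forward-selection sketch is given below under stated assumptions: `evaluate` is a hypothetical stand-in for the cross-validated accuracy of the wrapped classifier, the toy scorer and its three "informative" feature indices are invented for the demonstration, and the forward direction with a stop-at-first-non-improvement rule is one common configuration rather than a confirmed detail of the study.

```python
def greedy_forward_selection(n_features, evaluate):
    """Add the single best feature per step until no candidate improves the score."""
    selected = []
    best_score = evaluate(selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no extension helps: stop
        selected.append(top_feature)
        best_score = top_score
    return selected, best_score

def reduction_pct(initial, kept):
    """Share of the initial attributes that were discarded, in percent."""
    return 100.0 * (1.0 - kept / initial)

# Toy scorer (hypothetical): three useful features, every other feature costs a little.
INFORMATIVE = {3, 7, 11}
def toy_evaluate(subset):
    useful = len(INFORMATIVE.intersection(subset))
    noise = len(subset) - useful
    return 0.5 + 0.1 * useful - 0.01 * noise

subset, score = greedy_forward_selection(390, toy_evaluate)
print(sorted(subset), round(score, 2))
print(round(reduction_pct(390, 12), 2))  # hybrid row of Table 5
```

In a real wrapper the scorer retrains the classifier for every candidate subset, which is why the search cost grows with both the number of features and the number of accepted steps.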

Table 6. Feature selection stability across 10-fold cross-validation (Dataset 1).

Feature ID	Type	Selection Frequency	Stability
V4	Handcrafted	10/10	High
V6	Handcrafted	10/10	High
V5	Handcrafted	8/10	Medium-High
PC_CNN_2	CNN (PCA)	8/10	Medium-High
PC_CNN_4	CNN (PCA)	8/10	Medium-High
PC_CNN_5	CNN (PCA)	8/10	Medium-High
PC_CNN_7	CNN (PCA)	7/10	Medium
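
The selection frequencies in Table 6 are counts of how often a feature survives the selection step across the 10 folds. A short sketch of that tally is shown below; the per-fold subsets are hypothetical and were constructed only so that three of the reported frequencies (10/10, 8/10, 7/10) are reproduced.

```python
from collections import Counter

def selection_frequency(fold_selections):
    """Count in how many folds each feature was selected."""
    counts = Counter()
    for selected in fold_selections:
        counts.update(set(selected))  # a feature counts at most once per fold
    return counts

# Hypothetical per-fold outputs of the feature selector (10 folds).
folds = (
    [{"V4", "V5", "PC_CNN_7"}] * 7
    + [{"V4", "V5"}]
    + [{"V4"}] * 2
)
freq = selection_frequency(folds)
for feature, count in freq.most_common():
    print(f"{feature}: {count}/{len(folds)}")
```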

Table 7. Comparative performance metrics of classifiers using the optimized hybrid feature space (12 descriptors).

Classifier	Accuracy (%)	Kappa	Precision	Recall	F-Measure	AUC
MLP	95.61	0.941	0.956	0.956	0.956	0.992
RF	95.23	0.936	0.952	0.952	0.952	0.994
SVM (SMO)	92.37	0.898	0.925	0.924	0.924	0.964
Naive Bayes	87.21	0.829	0.872	0.872	0.872	0.960
J48	87.21	0.829	0.873	0.872	0.873	0.929
BayesNet	85.50	0.806	0.855	0.855	0.855	0.972

Table 8. Class-wise recall (%) of the top-performing classification models.

Target Class	MLP	RF	SVM (SMO)
Greasy spot	100	98.4	96.9
Healthy leaves	94.3	91.4	94.3
Variegated chlorosis	89.8	93.8	84.4
Zinc deficiency	98.4	97.7	93.8

Table 9. Preliminary benchmarking of ML paradigms on the initial hybrid feature space of Dataset 2 (390 features).

Algorithm	Accuracy (%)	Kappa	AUC	Training Time (s)
MLP	87.80	0.847	0.972	248.88
RF	87.80	0.847	0.975	0.31
SVM (SMO)	89.00	0.862	0.958	0.15
BayesNet	80.00	0.750	0.940	0.08
J48	77.00	0.712	0.860	0.23
Naive Bayes	66.60	0.582	0.885	0.04

Table 10. Feature selection stability across 10-fold cross-validation (Dataset 2).

Feature ID	Type	Selection Frequency	Stability
PC_CNN_2	CNN (PCA)	9/10	High
PC_CNN_23	CNN (PCA)	9/10	High
PC_CNN_24	CNN (PCA)	9/10	High
PC_CNN_25	CNN (PCA)	7/10	Medium
PC_CNN_11	CNN (PCA)	6/10	Medium
PC_CNN_20	CNN (PCA)	6/10	Medium
V234	Handcrafted	4/10	Consistent

Table 11. Comparative performance metrics of classifiers using the optimized hybrid feature space of Experiment 2 (18 descriptors).

Classifier	Accuracy (%)	Kappa	Precision	Recall	F-Measure	AUC
MLP	83.6	0.795	0.835	0.836	0.835	0.962
RF	87.3	0.841	0.874	0.873	0.873	0.976
SVM (SMO)	74.2	0.677	0.743	0.742	0.741	0.897
Naive Bayes	76.7	0.709	0.767	0.767	0.766	0.935
J48	77.4	0.717	0.772	0.774	0.773	0.881
BayesNet	78.1	0.726	0.783	0.781	0.782	0.947
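
Because Dataset 2 is balanced to 200 images in each of five classes (Table 3), the chance-agreement term of Cohen's kappa reduces to $p_e = 1/5$ whatever the distribution of predictions, so the Kappa column of Table 11 can be cross-checked against the Accuracy column. The worked check below uses the RF row and assumes kappa is computed on the pooled confusion matrix over all $N = 1000$ images; last-digit differences in other rows are expected because accuracy is reported to one decimal place.

```latex
\begin{align}
\kappa &= \frac{p_o - p_e}{1 - p_e}, \qquad
p_e = \sum_{k=1}^{5} \frac{n_{k\cdot}}{N}\,\frac{n_{\cdot k}}{N}
    = \frac{1}{5} \sum_{k=1}^{5} \frac{n_{\cdot k}}{N} = 0.2, \\
\kappa_{\mathrm{RF}} &= \frac{0.873 - 0.2}{1 - 0.2} = \frac{0.673}{0.8} \approx 0.841 .
\end{align}
```

Here $p_o$ is the observed accuracy, $n_{k\cdot}$ the number of true instances of class $k$ (200 each), and $n_{\cdot k}$ the number of instances predicted as class $k$.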

Table 12. Transcontinental validation: class-wise performance metrics on the Experiment 2 dataset using the minimalist 18-feature architecture.

Class	Precision	Recall	F-Measure	MCC	AUC
Black spot	0.749	0.790	0.769	0.710	0.951
Canker	0.943	0.905	0.923	0.905	0.992
HLB	0.772	0.730	0.751	0.691	0.947
Melanose	1.000	1.000	1.000	1.000	1.000
Healthy	0.904	0.940	0.922	0.902	0.990
Average	0.874	0.873	0.873	0.841	0.976
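
As a consistency check on Table 12, the F-Measure is the harmonic mean of precision $P$ and recall $R$. The Black spot row serves as a worked example; since the tabulated $P$ and $R$ are already rounded to three decimals, recomputing other rows may differ from the table in the last digit.

```latex
\begin{equation}
F = \frac{2PR}{P + R}
  = \frac{2 \times 0.749 \times 0.790}{0.749 + 0.790}
  = \frac{1.18342}{1.539} \approx 0.769 .
\end{equation}
```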

Table 13. Ablation study: Comparative analysis of final optimized models across different feature domains.

Feature Domain	Features	Quantity	Accuracy (%)	Kappa	F-Measure	AUC	Contribution
Handcrafted	BGR moments	8	75.70	0.696	0.754	0.934	Baseline
CNN	PCA descriptors	10	83.50	0.794	0.834	0.968	+10.30% (over HC)
Optimized hybrid	BGR + PCA (CNN)	18	87.30	0.841	0.873	0.976	+4.55% (over CNN)
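
The Contribution column in Table 13 is consistent with relative accuracy gains rather than percentage-point differences, as the following arithmetic on the tabulated accuracies shows:

```latex
\begin{align}
\Delta_{\mathrm{CNN\ over\ HC}} &= \frac{83.50 - 75.70}{75.70} = \frac{7.80}{75.70} \approx 10.30\%, \\
\Delta_{\mathrm{Hybrid\ over\ CNN}} &= \frac{87.30 - 83.50}{83.50} = \frac{3.80}{83.50} \approx 4.55\% .
\end{align}
```

In absolute terms these correspond to gains of 7.80 and 3.80 percentage points, respectively.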

Table 14. Hardware performance benchmarking on ARM64 edge-compatible architecture (Stages 3 and 4: Scaling, PCA, and Classification).

Metric	Full VGG16 (Baseline)	Hybrid Elite Model (Ours)	Improvement
Model Storage	528 MB	13.59 MB	97.4% reduction
Peak System RAM	>1.5 GB	1.21 GB	19.3% reduction
E2E Latency	2.1 s (ARM CPU)	56.35 ms	37x Speedup
Throughput	<1 FPS	17.75 FPS	Real-time ready
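
The Improvement column of Table 14 can be re-derived from the two measurement columns. The short sketch below does this arithmetic; the variable names are illustrative, and because the baseline RAM figure is reported only as ">1.5 GB", the RAM reduction computed from 1.5 GB is a lower bound.

```python
def reduction_pct(baseline, optimized):
    """Relative reduction of `optimized` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - optimized) / baseline

baseline_latency_ms = 2100.0   # 2.1 s on ARM CPU (Full VGG16)
elite_latency_ms = 56.35       # Hybrid Elite Model

speedup = baseline_latency_ms / elite_latency_ms
throughput_fps = 1000.0 / elite_latency_ms   # frames per second from per-frame latency
storage_reduction = reduction_pct(528.0, 13.59)
ram_reduction = reduction_pct(1.5, 1.21)     # lower bound, baseline is ">1.5 GB"

print(f"speedup: {speedup:.1f}x")
print(f"throughput: {throughput_fps:.2f} FPS")
print(f"storage reduction: {storage_reduction:.1f}%")
print(f"RAM reduction: {ram_reduction:.1f}%")
```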

Table 15. Contextual benchmark of the Proposed Elite Model vs. state-of-the-art systems.

Reference	Diagnostic Strategy	Feature Space Complexity	Validation Scope	Reported Accuracy (%)
Goyal (2025) [6]	DL (DenseNet)	High (>20 M parameters)	Single Domain (Controlled)	99.12
Zhu (2024) [4]	Ensemble (VGG16 + ResNet)	Very High (Multi-model stack)	Single Domain (Regional)	97.50
Min (2024) [3]	Hyperspectral Imaging	High (Multiple Spectral Bands)	Specialized Hardware	94.00
Sujatha (2021) [37]	VGG16 Benchmarking	High (Monolithic CNN)	Single Domain	89.50
Proposed Elite Model	Hybrid (BGR + CNN + Pruning)	Minimalist (12–18 descriptors)	Transcontinental (Mexico/Pakistan)	95.23/87.30 (Field Robustness)

Note: Accuracy values are contextual; direct comparison is restricted by the use of non-standardized datasets and diverse environmental conditions.

Table 16. Quantitative performance comparison between the proposed Elite Model and state-of-the-art lightweight CNN architectures using 10-fold cross-validation.

Dataset	Architecture	Parameters/Features	Accuracy (%)	Kappa
Mexico	EfficientNetB0	∼5.3 Million	93.53	0.9115
	MobileNetV2	∼3.4 Million	93.92	0.9169
	Elite Model	18	95.23	0.9410
Pakistan	EfficientNetB0	∼5.3 Million	89.33	0.8517
	MobileNetV2	∼3.4 Million	84.39	0.7842
	Elite Model	18	87.30	0.8400

Table 17. Comparative analysis of computational complexity and inference performance between the VGG16 baseline and the proposed Elite Model.

Metric	Full VGG16 (Baseline)	Proposed Elite Model (Pakistan)
Number of Features	25,088 (last conv layer)	18
Model Size (Storage)	∼528 MB	<2.5 MB
Inference Latency	∼0.18 s (CPU)	0.03 s (CPU)
Complexity (FLOPs)	∼15.3 GigaFLOPs	Negligible (<0.01 MFLOPs)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tello-Leal, E.; Macías-Hernández, B.A.; Rubio-Tinajero, S.; Hernandez-Resendiz, J.D.; Ramirez-Alcocer, U.M. Optimized Hybrid Feature Space for High-Efficiency Citrus Disease Diagnosis: A Fusion of Handcrafted Blue-Green-Red Color Moments and Deep Convolutional Descriptors. Agriculture 2026, 16, 711. https://doi.org/10.3390/agriculture16060711

