1. Introduction
Durian is a key economic crop in Thailand, both in terms of farmers’ income and exports, with production showing a continued upward trend according to government data. Most recently, the Office of Agricultural Economics (OAE), Ministry of Agriculture and Cooperatives, estimated that Thailand’s durian output in 2025 would be around 1.6 million tons, in line with national measures to upgrade fruit quality for export and to advance GAP/GI standards [1]. At the same time, Thailand’s global role underscores this importance: recent studies report that Thailand accounted for over 90% of the world’s export value of “fresh durian” during 2020–2022, making the durian supply chain a critical economic mechanism in Southeast Asia (ASEAN) [2].
At the provincial level, Nakhon Si Thammarat is one of the major “durian hubs” in southern Thailand, with strong potential in planted area, production, and marketing chains. The provincial fruit situation report indicates that the province’s mature durian area increased by several thousand rai from the previous year and that total production rose by 24.57%, with key producing districts including Tha Sala, Sichon, Nopphitam, and Phibun. This reflects both farmers’ confidence and improved access to modern orchard management knowledge (https://nakhonsri.doae.go.th/, accessed on 10 May 2025). Such growth comes with pressing challenges regarding product quality and safety to maintain confidence in export markets. Examples include stricter 100% export fruit inspections and sensitive cases involving residues/forbidden colorants that affect market price signals, as well as the requirement that quality assessment be modern, transparent, and practically usable by ordinary farmers, starting upstream in the production plots [3].
Regarding plant health, key durian foliar diseases, especially leaf-spot and leaf-blight, are quality-degrading factors that impact photosynthesis, growth, and overall yield. Recent reviews of foliar diseases show significant physiological and structural damage caused by leaf pathogens, and there has been a steady rise in scholarly reports on Colletotrichum and related pathogens in tropical crops, underscoring the need for proactive surveillance in tropical orchards [4]. In southern field contexts—characterized by high humidity and diurnal light variation—visual inspection can be error-prone, creating an opportunity for image processing and machine learning (ML) technologies to support standardized, repeatable pre-grading.
However, there remains a knowledge gap: many AI-based tools function as “black boxes”, making it difficult to explain decisions to farmers or students. Educational media and courses for students also lack an integrated prototype that links color principles (e.g., HSV/Lab and specular suppression via V/L* thresholds), spatial rules for lesions (pixel area), and a deep analysis pipeline (Deep Features: PCA–SVM) into one web platform that can open real images (JPG/PNG) and report results in Thai, serving both classroom teaching and on-farm use. Systematic reviews of digital/interactive agricultural extension emphasize that accessible tools on mobile/web accelerate technology adoption among smallholders, yet obstacles remain in digital literacy, infrastructure, and local-context suitability—factors that should be co-designed with actual users [5].
Plant leaf image analysis has advanced widely, from color/texture rules to deep networks. A recent review of deep-learning-based leaf disease classification concludes that while modern models improve accuracy, major challenges persist in illumination/background variability and explainability to build end-user trust. Therefore, a web prototype that integrates explainable color rules (e.g., yellow hue 20–40° with a* and b* thresholds and specular highlight removal) with a learnable model (extracting Deep Features from ResNet50/DenseNet201, reducing dimensionality with PCA, and classifying with SVM) meets two concurrent needs: (a) a teaching demonstrator for students in computer science, information technology, and agricultural science; and (b) an initial decision-support tool for young durian growers in Nakhon Si Thammarat [4].
At the same time, recent advances in digital agriculture indicate that monitoring frameworks can now combine multi-sensor inputs, 3D/vision techniques, and deep models to support crop-attribute assessment under heterogeneous field conditions, but many of these efforts still target “general agricultural scenarios” and not crop- and locale-specific decision support [6]. Likewise, web-based AI plant-disease systems have shown that state-of-the-art CNN models can be delivered via browser to classify leaf and fruit diseases, returning confidence scores and management hints to end-users [7]. These two lines of work are important because they confirm that (1) AI for agriculture is mature enough to work in real, changing field environments, and (2) web delivery lowers the technical barrier for farmers and students. However, most of these studies emphasize general-purpose pipelines and single-crop datasets in temperate contexts; they rarely integrate an explainable color-rule module (HSV/Lab with specular suppression) into a lightweight Deep–PCA–SVM stack with Thai-language reporting tailored to tropical durian orchards. This gap motivates the present web system to be both explainable and field-adapted for southern Thailand.
For durian leaf-spot/-blight analysis, this article is grounded in four key principles:
- (1) Explainable color rules + specular highlight removal: An interpretable color-rule structure helps transfer knowledge to field users and learners because decisions can be visualized from tangible color properties (e.g., yellow hue ≈ 20–40° for leaf-spot; CIELab a*–b* indicating yellow–brown). Prior reviews and experiments show that HSV/Lab are suitable for agricultural tasks because they separate “brightness” from “color”, reducing the influence of lighting and background [8,9,10]. Yet specular highlights often produce “shiny white” areas that distort thresholds, so removal is necessary, whether via brightness-based criteria (e.g., V in HSV or L* in Lab) or additional structural/statistical methods [11,12]. We therefore combine V/L* thresholds with light median/bilateral filtering to reduce glare without distorting lesion color, thereby stabilizing the rules while remaining interpretable.
- (2) Deep Features with PCA–SVM: Although modern deep networks achieve high accuracy for leaf disease classification, field deployments often face small, variable datasets. Pretrained Deep Features (e.g., ResNet50/DenseNet201) combined with PCA for dimensionality reduction and an SVM classifier balance performance and training cost. Evidence indicates that PCA preserves key information while improving SVM effectiveness, especially with limited data [13,14,15]. Comparative findings also show that SVM can outperform fully trained deep networks when sample sizes are small. This pipeline is therefore “light” in time and resources and well-suited for students to grasp an end-to-end ML pipeline—from feature extraction to classification.
- (3) Open web app: To ensure real-world access for farmers and learners, we develop a web app that accepts standard images (JPG/PNG), processes them on server/client, and presents results through a simple interface: original image, lesion circles/boxes, and concise Thai summaries. This aligns with the literature on digital/interactive extension, which indicates that user-centered design and web/mobile accessibility can accelerate technology adoption and decision quality among smallholders [16,17,18], while frameworks like Streamlit allow rapid UI building for ML education and demos [19]. This enables instructors to use the app as a classroom demonstrator and students to interactively tune parameters and observe outcomes.
- (4) Practice-oriented case study with farmer-understandable metrics: We report not only academic metrics (Accuracy/F1) but also operational indicators accessible to farmers, e.g., processing time per image, sensitivity to lesion sizes (e.g., “spot ≤ 30 px” vs. “blight ≥ 300 px”), and acceptable false-alarm rates in orchard contexts. This approach aligns with recommendations that metrics should reflect decision impact, not just dataset statistics [10,20]. Using real field images captures challenges in lighting, background, and tissue variability frequently emphasized in the plant pathology literature [21]. Together, qualitative overlays (lesion highlights) and quantitative outputs (confusion matrices and PCA variance vs. accuracy) enable the tool to function as both a teaching medium and an initial decision-support instrument.
Consequently, this study proposes a teach-and-use web app for durian leaf-spot/-blight analysis that leverages HSV/Lab color rules with V/L* specular removal and a Deep Feature (PCA–SVM) pipeline on real-world images from Nakhon Si Thammarat, aiming for both interpretability and operational accuracy, with Thai-language reporting and lesion localization. The tool serves as both an integrated teaching prototype for students and an immediate on-farm decision-support aid for growers.
2. Related Work
Research on image-based plant disease detection has advanced rapidly over the past decade and can be grouped into two major lines: (a) traditional methods that rely on color and texture features along with rule-based decision criteria, and (b) learning-based approaches centered on convolutional neural networks (CNNs) and broader deep learning techniques. Recent surveys conclude that deep learning substantially improves classification performance—particularly when large, diverse datasets are available—yet key challenges remain around illumination and background variability as well as model interpretability for end-users such as farmers and students [4,22].
In traditional pipelines, color spaces play a critical role because they decouple “brightness” from “chromaticity”, which stabilizes thresholding under changing light conditions. Experimental and review papers report that HSV, CIELab, HSL, and YCrCb often outperform raw RGB for localizing lesion regions and infected leaf tissue prior to classification [9,23]. In particular, HSV and Lab allow practitioners to define explainable color rules—for example, a yellow hue band (≈20–40°) or a*–b* constraints to capture yellow–brown tones—which are well suited for communicating reasoning to non-expert users. Complementing this, a body of work emphasizes specular highlight removal to restore pixel details before color analysis. Approaches based on brightness thresholds (e.g., V in HSV or L* in Lab) and adaptive specular detection/removal have been shown to improve segmentation quality and the robustness of color thresholds [11,24].
On the deep learning side, broad surveys covering 2020–2024 document rapid progress in architectures and evaluation protocols across more than 160 studies, noting that well-trained models can detect diseases at early stages with high robustness [22]. Applications in tropical cash crops are also growing, for instance, lightweight, fast CNNs for banana leaves designed for field use on resource-constrained devices [25], or real-time mobile apps for maize that prioritize user experience and low inference latency [26]. While deep models often outperform classical methods, limitations persist when training data are scarce, when field conditions vary widely, and when users require transparent decision rationales to build trust.
To address explainability, explainable AI (XAI) has increasingly been integrated into plant disease systems to reveal which image regions drive model decisions. Recent work highlights that attention maps (e.g., Grad-CAM) and related XAI techniques can improve transparency and credibility, especially in agricultural domains where end-users are not AI specialists [27,28]. However, most XAI research focuses on interpreting CNNs alone; relatively few systems combine XAI with explainable color rules and image pre-processing modules (such as specular suppression) into a single, field-deployable web pipeline.
Within the Thai/ASEAN context, studies on fruit crop diseases emphasize the importance of pathogens in tropical climates, especially species within the genus Colletotrichum that are associated with anthracnose, leaf-spots, and fruit rots in durian and other tropical fruits. Recent reports identify species found in eastern Thailand and nearby countries and assess pathogenicity on leaves and fruits [29,30]. These findings underscore the need to continue developing field datasets that faithfully capture pathogen–symptom characteristics in the region and to provide integrated, field-oriented tools for surveillance and preliminary grading that link academic knowledge with practical orchard management.
Another thread of field-facing work focuses on real-time/web/mobile solutions to increase accessibility for smallholders—for example, web-mobile systems for plant disease diagnosis using public image datasets (mPD-APP) designed for smartphone use [31] and real-time leaf monitoring frameworks that embed deep learning as the core detector [32]. Even so, many projects still prioritize model accuracy over crafting an interpretable processing pipeline and over user-centered reporting—e.g., localized language messages, lesion overlays, and operational metrics that are easily understood in the field.
Considering the remaining gap, although there is a large volume of CNN-centric work with strong accuracy, there are relatively few systems that integrate the following: (1) explainable color rules plus specular highlight removal; (2) a lightweight feature-learning stack that extracts Deep Features, applies PCA for dimensionality reduction, and uses SVM for classification; and (3) a web platform meant for both classroom learning and practical field use. Many researchers choose end-to-end fine-tuning without separating an “explainable color” module that non-experts can readily understand—even though multiple reports affirm the value of careful pre-processing and suitable color spaces ahead of learning [9,23,33]. Likewise, while XAI is growing, it is often limited to CNN heatmaps rather than hybrid pipelines that explain decisions through explicit color rules and spatial lesion parameters alongside Deep Features.
In the Thai/ASEAN literature on cash crops and fruit, several studies focus on banana, cassava, and other tropical crops, confirming the benefit of lightweight models for resource-limited devices and field deployment [25,34,35]. Still, there is a need for practice-oriented evaluation frameworks that incorporate farmer-understandable indicators such as inference time, sensitivity to lesion size, acceptable alert thresholds, and localized language outputs to bridge laboratory results with real orchard needs. Moreover, as reports identifying Colletotrichum in durian across Thailand and nearby regions have accelerated during 2024–2025, there is a stronger impetus for modern, transparent, upstream surveillance/grading systems within the supply chain [29,30].
Overall, the international literature supports hybrid pipelines that let users understand the underlying reasons via color-rule modules (HSV/Lab with specular suppression) while achieving robustness through learning modules (Deep Features: PCA–SVM), particularly when training data are modest and lighting varies widely. This strategy is also well-suited for education, as it decomposes the workflow into tunable components that students can explore (pre-processing, color analysis, dimensionality reduction, and classification) and then connects to field reality through a web app. In short, the gap this paper aims to fill involves integrating explainable color rules (HSV/Lab with V/L* thresholds for specular removal) with a lightweight Deep–PCA–SVM pipeline and delivering the result as a Thai-language web tool that supports standard image formats, lesion overlays, and farmer-friendly operational metrics, all grounded in the Thai/ASEAN durian context.
3. Materials and Methods
3.1. Datasets and Image Collection
The dataset comprises three durian-leaf subclasses: (1) healthy, (2) leaf-spot, and (3) leaf-blight. For each class, exactly 100 images were collected, resulting in a total dataset of 300 images. These images were used to train and evaluate both the Deep Features (PCA–SVM) pipeline and the color-rule baseline. Field images were collected from orchards located in at least three districts of Nakhon Si Thammarat to capture variations in cultivar types, illumination conditions, and background complexity. To reduce contextual leakage across data splits, an orchard-level partitioning strategy was adopted. Specifically, each class was independently split into training, validation, and test sets using a 70:15:15 ratio. The dataset composition and data split per class are summarized in Table 1.
Imaging devices and conditions: Smartphones/digital cameras with a resolution ≥ 12 MP were used, saving high-quality JPG or PNG files. White balance was set to auto, and focus was locked prior to capture. The shooting distance was 30–60 cm so that a leaf filled the frame while avoiding overlap between leaves. Preferred capture windows were 08:00–11:00 and 15:00–17:00 to avoid harsh midday sun. If specular (white) highlights occurred, the camera angle or shading was adjusted and 1–2 additional frames per view were retaken.
Annotation: Image-level labels cover three classes. Size criteria were defined as follows: leaf-spot denotes small connected components ≤ 30 px each, and leaf-blight denotes brown/tan patches ≥ 300 px or a bounding box (w × h) ≥ 100 × 100 px. Labeling was performed by two agricultural experts. Pre-processing: All images were resized using a contain-and-pad strategy to 1750 × 800 px, after which a 50-px border was cropped on all sides to define the analysis area. Files were stored in a structured layout.
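For reproducibility, a minimal sketch of this standardization step is given below, assuming OpenCV; the black padding value and bilinear interpolation are implementation assumptions not specified above.

```python
import cv2
import numpy as np

def contain_and_pad(img_bgr, target_w=1750, target_h=800, pad_value=0):
    """Resize while preserving aspect ratio, then pad to the target canvas.
    Black padding and bilinear interpolation are assumptions; the text only
    specifies the 1750x800 px target and the contain-and-pad idea."""
    h, w = img_bgr.shape[:2]
    scale = min(target_w / w, target_h / h)
    new_w, new_h = int(round(w * scale)), int(round(h * scale))
    resized = cv2.resize(img_bgr, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    canvas = np.full((target_h, target_w, 3), pad_value, dtype=resized.dtype)
    x0, y0 = (target_w - new_w) // 2, (target_h - new_h) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas

def crop_border(img_bgr, border=50):
    """Remove a fixed 50-px border on all sides to define the analysis area."""
    return img_bgr[border:-border, border:-border]

# Example usage:
# img = cv2.imread("leaf.jpg")
# roi = crop_border(contain_and_pad(img))
```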
Ethics and data rights: Informed consent was obtained from orchard owners; personal information was masked, and no identifiable coordinates are disclosed. The dataset and preparation scripts are curated for academic sharing under the data owner’s terms.
3.2. Color-Rule Framework and Settings
Our color-rule framework is designed to be (1) explainable and robust to lighting, integrating four main stages: (1.1) RGB → HSV/Lab conversion with specular highlight removal, (1.2) color-based screening, (1.3) post-processing, and (1.4) decision rules for final classification. We also describe (2) parameter settings and (3) runtime/compute considerations.
3.2.1. Explainable and Robust to Lighting
Input images (JPG/PNG) were first resized using a contain-and-pad strategy to match the web application’s analysis area and then converted from RGB to HSV and CIE Lab color spaces to decouple brightness from chromaticity and reduce sensitivity to illumination variations. Specular highlight removal was performed in a lightweight manner using combined thresholds on the Value (V) and Saturation (S) components in HSV and the Lightness (L*) component in CIE Lab. Pixels exhibiting high brightness but low saturation were identified as potential specular regions.
For specular highlight suppression, empirically determined threshold values were selected based on preliminary field observations. Specifically, pixels with V ≥ 0.90 and S ≤ 0.25 in HSV space, or L* ≥ 85 in CIE Lab space, were flagged as specular highlights. These regions were replaced using a median filter with a 3 × 3 kernel, which was empirically selected to suppress glare while preserving small lesion boundaries. Bilateral filtering was also evaluated but not adopted as the default due to its higher computational cost and marginal visual improvement. Initial parameters can be adjusted via the web application’s control panel.
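A minimal sketch of this suppression step is given below, assuming OpenCV’s 8-bit encodings (S and V in 0–255, L* scaled to 0–255); the mapping of the thresholds onto these ranges and the replace-by-median strategy are implementation assumptions consistent with the description above.

```python
import cv2
import numpy as np

def remove_specular(img_bgr, v_thr=0.90, s_max=0.25, l_thr=85.0):
    """Flag high-brightness / low-saturation pixels as specular highlights and
    replace them with a 3x3 median-filtered copy of the image."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    s = hsv[..., 1].astype(np.float32) / 255.0
    v = hsv[..., 2].astype(np.float32) / 255.0
    l_star = lab[..., 0].astype(np.float32) * (100.0 / 255.0)   # back to 0-100 scale
    mask = ((v >= v_thr) & (s <= s_max)) | (l_star >= l_thr)
    smoothed = cv2.medianBlur(img_bgr, 3)       # 3x3 median kernel, as described
    out = img_bgr.copy()
    out[mask] = smoothed[mask]                  # replace only the flagged pixels
    return out, mask
```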
- 2. Color-based screening
After specular highlight removal, explainable color rules were applied, comprising (a) a yellow hue band ~20–40° in HSV (mapped to OpenCV’s 0–179 scale), and (b) a* and b* constraints in CIELab to capture yellow–brown lesion tones. A pixel is considered a candidate lesion if it passes both the H and (a*, b*) criteria. The result is an initial binary mask for subsequent processing (Figure 1 and Figure 2); see the combined code sketch after this list.
- 3. Post-processing
The initial mask typically contained spurious dots, leftover pixels, and ragged edges that could cause errors. We therefore apply a small median blur to reduce salt-and-pepper noise (random tiny black/white speckles) and smooth the surface, followed by morphological opening (to remove small irrelevant dots) and closing (to connect slightly broken regions). Connected components were then computed to obtain lesion clusters along with component statistics (area, bounding box w × h, centroid). This step enables object-level analysis of spot/patch size and cluster counts rather than raw pixel counts.
- 4. Decision rules
We distinguish spot vs. blight using component size, reflecting the field symptoms in items 5 and 6 below; a combined code sketch of stages (1.2)–(1.4) is given after this list.
- 5. Leaf-spot
Components with an area ≤ 30 px were treated as small spots. We count components passing the criteria; if the count ≥ N_min (e.g., 6–12 as tuned on the validation set), we output “Leaf spot”.
- 6. Leaf-blight
Components with an area ≥ 300 px or a bounding-box size ≥ 100 × 100 px were treated as large patches. If at least one such component is found, we output “Leaf blight”, which has higher priority than leaf-spot.
If neither condition is met, the system outputs “Healthy leaf”. The result page also overlays circles/boxes on components meeting the criteria to explain the decision locations.
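Taken together, stages (1.2)–(1.4) can be summarized in one short sketch, again assuming OpenCV conventions (H in 0–179, a*/b* offset by 128). The a*/b* lower bounds and morphological kernel sizes are illustrative placeholders for the validation-tuned values in Table 2, and N_min = 8 is one value from the 6–12 range mentioned above.

```python
import cv2
import numpy as np

def screen_colors(img_bgr, hue_deg=(20, 40), a_min=135, b_min=145):
    """Stage (1.2): initial lesion mask from the explainable color rules.
    Hue 20-40 deg maps to 10-20 on OpenCV's 0-179 H scale; the a*/b* bounds
    are placeholders for the validation-tuned values in Table 2."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    hue_ok = (hsv[..., 0] >= hue_deg[0] // 2) & (hsv[..., 0] <= hue_deg[1] // 2)
    ab_ok = (lab[..., 1] >= a_min) & (lab[..., 2] >= b_min)   # yellow-brown tones
    return (hue_ok & ab_ok).astype(np.uint8) * 255

def clean_and_label(mask):
    """Stage (1.3): median blur, morphological opening/closing, then
    connected-component statistics (x, y, w, h, area per component)."""
    m = cv2.medianBlur(mask, 3)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    m = cv2.morphologyEx(m, cv2.MORPH_OPEN, kernel)
    m = cv2.morphologyEx(m, cv2.MORPH_CLOSE, kernel)
    _, _, stats, _ = cv2.connectedComponentsWithStats(m, connectivity=8)
    return m, stats[1:]                                       # drop the background row

def decide(stats, spot_area_max=30, blight_area_min=300,
           blight_box=(100, 100), n_min=8):
    """Stage (1.4): size-based decision rules; leaf-blight has priority."""
    spots = 0
    for x, y, w, h, area in stats:
        if area >= blight_area_min or (w >= blight_box[0] and h >= blight_box[1]):
            return "Leaf blight"
        if area <= spot_area_max:
            spots += 1
    return "Leaf spot" if spots >= n_min else "Healthy leaf"

# Example: mask, stats = clean_and_label(screen_colors(roi)); label = decide(stats)
```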
3.2.2. Parameter Settings
Key parameters in Table 2 (V_thr, S_max, L*_thr, Hue band, a*, b*, spot/blight size thresholds, and N_min) are tuned on the validation set to balance sensitivity and specificity under variable lighting and background conditions. If the entire leaf appears yellowish and leads to false positives, the minimum a*/b* thresholds can be increased, or the hue band can be narrowed accordingly.
Explainability objective: In addition to robustness to illumination, this color-rule branch was included as a design requirement to expose the visual evidence to non-technical orchard users. By operating in HSV/Lab, the system reports decisions in terms of human-readable properties (e.g., “yellow lesions around 20–40° hue and brownish Lab a*/b* values”) and can highlight the corresponding regions on the uploaded image. This makes the module directly usable for farmers and students without requiring them to interpret Deep Features.
3.2.3. Runtime and Compute
The entire color-rules pipeline runs efficiently on CPU. For images around 1750 × 800 px, the average processing time is approximately 50–150 ms/image on a typical PC, making it suitable for real-time web use and for resource-limited devices.
3.3. Deep Features
3.3.1. Architecture
In this work, ResNet50 and DenseNet201 pretrained on ImageNet were used as feature extractors rather than training end-to-end on a small leaf dataset. The rationale is that both models are “sufficiently deep” to capture texture/edge/color patterns and already contain generic filters learned from large-scale images, making transfer to agricultural imagery effective without high training cost (Figure 3).
3.3.2. Model-Compatible Preprocessing
Input images were first resized and padded using a contain-and-pad scheme to preserve aspect ratio, and subsequently resized to 224 × 224 px. ImageNet normalization was then applied—scaling pixel values to [0, 1], followed by channel-wise mean subtraction and standard-deviation normalization—to ensure consistency with the pretrained backbone and to mitigate pixel-scale domain shift (Figure 4).
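A minimal sketch of this model-compatible preprocessing is shown below, assuming a PyTorch/torchvision implementation (the framework is not fixed by the description above); the input is expected to be an RGB array that has already been contain-and-padded.

```python
import numpy as np
import torch
from torchvision import transforms

# ImageNet statistics for channel-wise normalization
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),                 # applied after contain-and-pad
    transforms.ToTensor(),                         # scales pixels to [0, 1]
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

def to_model_input(img_rgb: np.ndarray) -> torch.Tensor:
    """img_rgb: HxWx3 uint8 RGB image -> tensor of shape (1, 3, 224, 224)."""
    return preprocess(img_rgb).unsqueeze(0)
```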
3.3.3. Feature Extraction
We extract feature vectors from the layer before the classification head, followed by Global Average Pooling (GAP) to summarize spatial information into a fixed-length vector. This vector encodes deep texture, color, and edge patterns and is typically more discriminative than traditional hand-crafted features, while being well-suited for a lightweight downstream classifier.
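The sketch below illustrates this extraction with a frozen ResNet50, where replacing the classification head with an identity layer exposes the 2048-dimensional GAP vector; DenseNet201 is handled analogously and yields 1920-dimensional vectors. The torchvision weight enum used here is an implementation assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen ImageNet backbone used purely as a feature extractor (no fine-tuning).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()        # drop the classification head; GAP output remains
backbone.eval()

@torch.no_grad()
def extract_features(batch: torch.Tensor) -> torch.Tensor:
    """batch: (N, 3, 224, 224) normalized images -> (N, 2048) GAP feature vectors."""
    return backbone(batch)
```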
3.3.4. Dimensionality Reduction with PCA
Because Deep Feature vectors are high-dimensional, we apply Principal Component Analysis (PCA) to reduce dimensionality, suppress noise, and help prevent overfitting when sample sizes are limited. PCA is fit only on the training set, and the learned transform is then applied to val/test to avoid data leakage. The number of components is selected using the explained variance ratio on the validation set. In practice, 64–128 components are sufficient for medium-sized field datasets and also improve classifier speed and stability.
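A scikit-learn sketch of this step is shown below; X_train, X_val, and X_test are placeholder names for the GAP feature matrices extracted above, and 64 components is one value within the 64–128 range noted here.

```python
from sklearn.decomposition import PCA

# Fit PCA on the training features only, then reuse the same transform downstream.
pca = PCA(n_components=64, random_state=0)
Z_train = pca.fit_transform(X_train)     # X_train: (n_train, 2048) Deep Features
Z_val = pca.transform(X_val)             # no re-fitting: avoids data leakage
Z_test = pca.transform(X_test)

print("cumulative explained variance:", pca.explained_variance_ratio_.sum())
```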
3.3.5. SVM Classifier and Settings
We use Support Vector Machines (SVMs) for classification: the main three-class task (healthy/leaf-spot/leaf-blight) and, in sub-tasks, binary healthy vs. diseased discrimination. Two kernels were considered.
- Linear SVM: suitable when PCA-reduced Deep Features are linearly separable; tune C.
- RBF SVM: for more complex boundaries; tune C and γ.
Hyperparameters are tuned via k-fold cross-validation (e.g., k = 5) on the validation set, using an orchard-level split to preserve real-world context. F1-score (or balanced accuracy) is used for model selection to reduce bias under class imbalance.
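A sketch of this search with scikit-learn is given below; the candidate grids are illustrative, Z_train and y_train denote the PCA-reduced training features and labels, and macro F1 is used as the selection score as described above.

```python
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]},
]
search = GridSearchCV(
    SVC(class_weight="balanced"),
    param_grid,
    scoring="f1_macro",                                  # model selection by macro F1
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(Z_train, y_train)                             # PCA-reduced features + labels
best_svm = search.best_estimator_
```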
3.3.6. Handling Class Imbalance
If healthy images outnumber diseased ones, we apply class weighting in SVM (e.g., class_weight = ‘balanced’) to emphasize minority classes and/or random oversampling in the training set only (leaving val/test untouched). In practice, PCA-based reduction combined with class weighting improves the SVM margin for under-represented classes.
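Both options can be sketched as follows; the oversampling helper is a hypothetical utility that must be applied to the training split only, as stated above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils import resample

# Option 1: cost-sensitive SVM (weights inversely proportional to class frequency).
svm = SVC(kernel="rbf", C=1.0, class_weight="balanced")

# Option 2: random oversampling of minority classes, training split only.
def oversample(Z, y, random_state=0):
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Z_parts, y_parts = [], []
    for c in classes:
        idx = np.where(y == c)[0]
        idx_res = resample(idx, replace=True, n_samples=n_max,
                           random_state=random_state)
        Z_parts.append(Z[idx_res])
        y_parts.append(y[idx_res])
    return np.concatenate(Z_parts), np.concatenate(y_parts)
```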
3.3.7. Training Workflow
Deep Features → PCA → SVM pipeline for durian leaf disease classification (Figure 5):
- Input: durian leaf image (healthy/leaf-spot/leaf-blight).
- Prepare images → resize/normalize → pass through ResNet50/DenseNet201 (pretrained, frozen) → extract GAP features.
- Fit PCA on training Deep Features → transform train/val/test with the same PCA.
- Tune SVM (kernel, C, γ, and class_weight) via k-fold CV on val → select the best hyperparameters.
- Lock the model (backbone + PCA + SVM) → evaluate on test and report accuracy/precision/recall/F1 and per-image runtime.
- Output: predicted class (healthy leaf/leaf-spot disease/leaf-blight).
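As a compact illustration of the final steps, the fitted components can be chained and evaluated as follows; extract_features, pca, and best_svm refer to the objects from the sketches above, and X_test/y_test are placeholders for the held-out Deep Features and labels.

```python
import time
from sklearn.metrics import classification_report

def predict_leaf(batch):
    """Locked pipeline: frozen backbone -> fitted PCA -> tuned SVM."""
    feats = extract_features(batch).cpu().numpy()      # (N, 2048) GAP vectors
    return best_svm.predict(pca.transform(feats))

# Evaluate on the held-out test split and report per-image runtime.
start = time.perf_counter()
y_pred = best_svm.predict(pca.transform(X_test))       # X_test: test-set Deep Features
elapsed_ms = (time.perf_counter() - start) * 1000 / len(X_test)
print(classification_report(y_test, y_pred, digits=3))
print(f"mean classification time: {elapsed_ms:.1f} ms/image (excluding feature extraction)")
```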
3.4. Web Application
The system was implemented as a lightweight Streamlit web app with two main modes: Rules Features (RGB/HSV/Lab) and Deep Features (PCA–SVM). Users start at an image-upload page (supports JPG/PNG/BMP/TIFF/WebP). The app then resizes with a contain-and-pad strategy to 1750 × 800 px and crops a 50-px border to standardize the analysis area across images. Next, it performs specular highlight removal using thresholds on V (HSV) and L* (Lab) before entering the color/feature analysis for each mode (Figure 6).
Rules Features: This mode applies explainable color thresholds, a yellow hue band ~20–40° (OpenCV scale 0–179) together with a* and b* constraints to form an initial mask. It then applies a median blur followed by morphological opening and closing operations, and finally computes connected components to isolate lesion objects. Decisions are based on size criteria—spot ≤ 30 px per component (must have ≥ N spots) and blight ≥ 300 px or bounding box ≥ 100 × 100 px—with leaf-blight prioritized over leaf-spot. The output overlays red circles on spot locations and orange boxes on blight patches, accompanied by Thai text summaries: “The durian tree is healthy”, “The durian tree has leaf spot”, or “The durian tree has leaf blight”, displayed immediately on the page.
Deep Features: This mode lets users “briefly train” a model by uploading training images for three classes (healthy/leaf-spot/leaf-blight). The app extracts Deep Features from ResNet50/DenseNet201 (pretrained), performs PCA (user-selectable number of components), and trains an SVM (UI controls for kernel and C, with class_weight = ‘balanced’). After training, users can upload new images for inference right away. The system can also overlay the color-rule results (circles and boxes) on images predicted as diseased to make lesion locations easier to interpret.
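The overall app structure can be sketched as a short Streamlit script; standardize_image, run_rules, and run_deep are hypothetical wrappers around the pipeline steps described in Sections 3.2 and 3.3, and the English labels stand in for the Thai UI text.

```python
import numpy as np
import streamlit as st
from PIL import Image

st.title("Durian Leaf Analysis")                       # Thai labels in the real app
mode = st.sidebar.radio("Mode", ["Rules Features", "Deep Features"])
v_thr = st.sidebar.slider("Specular V threshold", 0.80, 1.00, 0.90)

uploaded = st.file_uploader("Upload a leaf image",
                            type=["jpg", "png", "bmp", "tiff", "webp"])
if uploaded is not None:
    img = np.array(Image.open(uploaded).convert("RGB"))
    roi = standardize_image(img)                       # contain-and-pad + border crop
    if mode == "Rules Features":
        label, overlay = run_rules(roi, v_thr=v_thr)   # color rules + lesion overlay
    else:
        label, overlay = run_deep(roi)                 # Deep Features -> PCA -> SVM
    st.image(overlay, caption=label)
```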
UX for farmers/learners. The interface is intentionally simple. A sidebar aggregates all parameters (specular filter, color thresholds, size criteria, minimum spot count, or PCA/SVM settings) with clear Thai labels and sensible, ready-to-use defaults to reduce setup burden. Results present the analyzed image alongside a concise explanation for quick decisions. On a typical PC, the Rules mode runs in near real time per image on a CPU, while the Deep mode allows interactive training/inference for small-to-medium datasets within the same web app.
Explainable output: Every inference page shows the original leaf together with circles and boxes over detected lesions and a Thai-language summary so that growers can immediately see “what the system saw”, addressing a common adoption barrier in agricultural AI systems (Figure 7).
A usability evaluation of the web application was conducted with 30 participants, consisting of three groups: (1) ten students from computer and information technology programs, (2) ten students from agriculture-related programs, and (3) ten agricultural practitioners and next-generation farmers. A five-point Likert-type questionnaire was administered to assess perceived usability, including clarity of output, ease of use, and practical applicability. Responses were rated on a scale ranging from “very low” to “very high”.
Descriptive statistics were used to analyze the responses, specifically the mean and standard deviation (SD). The mean scores were then qualitatively interpreted into five levels: very high, high, moderate, low, and very low.
4. Results
4.1. Dataset and Evaluation Protocol
Normal leaves, leaf-spot, and leaf-blight classes were each represented by 100 images, resulting in a total of 300 images. For each class, the dataset was split into training, validation, and test sets using a 70:15:15 ratio. The relationship between PCA variance and classification accuracy is summarized in Table 3 and Figure 8.
Table 3 summarizes the effects of varying the number of PCA components on classification performance. The mean k-fold cross-validation accuracy, its standard deviation, and the cumulative explained variance are reported to illustrate the trade-off between dimensionality reduction and predictive performance.
Figure 8 illustrates the relationship between the number of PCA components (x-axis) and the mean k-fold cross-validation accuracy (solid blue line, left y-axis), with a shaded band showing the standard deviation across folds. The cumulative explained variance of PCA is shown as a dashed line with “x” markers (right y-axis).
Observations from the graph: (1) Accuracy remained stable at approximately 0.78–0.80 for k = 8–16, then dipped slightly around k ≈ 24 before recovering to ~0.79 at k ≈ 32. (2) The cumulative explained variance increased monotonically from ~0.62 (k = 8) to ~0.97 (k = 32), indicating that adding components captures more of the data variance. (3) The uncertainty band had a moderate width for all k values, implying some dependence on fold composition but no severe volatility.
Interpretation: (1) There is an elbow/diminishing-returns region around k ≈ 12–16: although additional components explain more variance, accuracy does not improve meaningfully and even drops slightly at k ≈ 24, suggesting noise/overfitting for SVM when dimensionality is not sufficiently reduced. (2) Therefore, choosing a smaller k that yields comparable accuracy is preferable, reducing model complexity and training/inference time. For example, k = 16 (explained variance ~0.80 and accuracy ~0.80) offers a good balance.
Conclusion: the classifier’s accuracy (SVM on Deep Features after PCA) is highest and stable at 0.78–0.80 when using about 8–16 components. Although the explained variance continues to increase up to ~0.97 at 32 components, it does not translate into a significant accuracy gain, demonstrating diminishing returns and indicating that k ≈ 16 is a balanced choice between performance and model complexity.
Table 4 demonstrates clear and consistent improvements in performance across the evaluated methods. The Rules-based pipeline (RGB/HSV/Lab) achieves solid and balanced results (accuracy/recall ≈ 0.80; ROC-AUC = 0.90), indicating that color-based rules provide useful but inherently limited discriminative capability. The Deep Features model (PCA–SVM) substantially improves all evaluation metrics (Accuracy = 0.97; macro F1 = 0.965; ROC-AUC = 0.995), suggesting that the learned feature representation generalizes effectively across classes. The proposed Ensemble model, which integrates the Rules and Deep pipelines, delivers the best overall performance (accuracy = 0.99; macro F1 = 0.987; ROC-AUC = 0.998). This improvement reflects the complementary strengths of the two approaches, where interpretable color rules provide domain-specific cues and spatial priors, while Deep Features capture complex texture patterns. The consistency observed across macro-averaged metrics further indicates balanced classification performance among all classes.
For clarity, the accuracy values reported in Table 4 correspond to distinct model configurations evaluated under the same held-out test set. Specifically, an accuracy of approximately 0.80 corresponds to the Rules pipeline (RGB/HSV/Lab), 0.97 corresponds to the Deep Features (PCA–SVM) pipeline, and 0.99 corresponds to the proposed Ensemble model that integrates both pipelines.
To verify that the observed performance differences are not attributable to random variation, statistical significance testing was conducted using McNemar’s test on pairwise model comparisons. The detailed results of these tests are presented in Table 5.
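For reference, each pairwise comparison can be computed from per-image correctness on the shared test set, e.g., with statsmodels; the helper below is a minimal sketch that uses the exact binomial variant, which is appropriate when the number of discordant predictions is small.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def mcnemar_pvalue(y_true, pred_a, pred_b):
    """McNemar's test from the 2x2 disagreement table of two models
    evaluated on the same test images."""
    a_ok = (pred_a == y_true)
    b_ok = (pred_b == y_true)
    table = np.array([
        [np.sum(a_ok & b_ok),  np.sum(a_ok & ~b_ok)],
        [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)],
    ])
    # exact binomial test recommended when discordant counts are small
    return mcnemar(table, exact=True).pvalue
```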
Together with the per-class analysis and the confusion matrix of the Ensemble model, these statistical results provide comprehensive validation of both overall and class-specific performance improvements.
Table 6 reports the per-class precision, recall, and F1-score of the Ensemble model. The model achieves perfect classification performance for the healthy class and near-perfect performance for leaf-blight. A single misclassification was observed for the leaf-spot class, which was incorrectly predicted as leaf-blight, reflecting the visual similarity between early-stage leaf-spot and leaf-blight symptoms.
Table 7 presents the confusion matrix of the Ensemble model on the test set. Notably, no misclassification occurs between healthy leaves and diseased classes, indicating strong robustness in distinguishing healthy leaves from diseased ones. The only observed error involves confusion between leaf-spot and leaf-blight, particularly in early-stage cases where visual symptoms and color characteristics overlap.
4.2. Overall Results
This study presents a web application for durian leaf disease analysis that integrates an explainable color-rules pipeline (HSV/Lab with specular highlight removal via V/L*) and the Deep Features (PCA–SVM) pipeline. Experiments on a dataset of healthy, leaf-spot, and leaf-blight images with a train:valid:test split of approximately 70:15:15 show overall good and class-balanced accuracy. The “PCA components vs. accuracy” curve exhibits diminishing returns beyond ~k = 12–16: accuracy stabilizes around 0.78–0.80 while cumulative explained variance keeps increasing, suggesting that k ≈ 16 provides a practical balance for field use.
Moreover, when comparing the Rules pipeline, the Deep Features (PCA–SVM) pipeline, and the Ensemble (Rules + Deep) approach, the combined method improves aggregate metrics (accuracy/precision/recall/F1 and macro ROC-AUC) and reduces class confusion—especially for the challenging healthy ↔ leaf-spot pair, where small, green-surrounded spots often occur—whereas leaf-blight is consistently recognized due to its large patch patterns (as reflected in the confusion matrix).
For practical communication, the system overlays circles or bounding boxes on detected lesion regions and reports results in Thai, enabling end-users to understand the decision rationale at a glance. Inference is executed on the CPU with near real-time interactive latency. Specifically, the reported processing time of approximately 50–150 ms per image was measured on a standard laptop equipped with an Intel® Core™ i7-9750H CPU (2.60 GHz; Intel Corporation; procured in Thailand), 16 GB RAM, and running Ubuntu 20.04, with inference performed on single images, excluding network transmission time. Under this configuration, both the color-rule pipeline and the Deep Features (PCA–SVM) pipeline can be executed efficiently without GPU acceleration. These results suggest that the proposed system is feasible for classroom use and for preliminary disease screening in typical field environments where only CPU-based devices are available.
4.2.1. Example Output Image
4.2.2. Systematic Analysis of Common Failure Cases with Confusion Matrix Correlation
To provide a more systematic analysis, the observed failure cases were explicitly correlated with the error types identified in the confusion matrix (Table 7). This mapping clarifies not only where the system fails, but also why these failures occur and how they may be mitigated in future work.
Case 1: Leaf-Spot → Leaf-Blight Confusion
The most prominent error observed in the confusion matrix is the misclassification of a leaf-spot sample as leaf-blight (Table 7). This error corresponds to cases in which leaf-spot lesions are densely distributed across the leaf surface, causing their combined color appearance and spatial extent to resemble early-stage leaf-blight. This phenomenon is visually challenging even for human experts and is further compounded by local agricultural practice, where some farmers refer to both conditions under the general term “leaf-blight”.
Mitigation:
Potential mitigation strategies include incorporating lesion-density descriptors, multiscale lesion-clustering features, and temporal-progression analysis. Fine-tuning the deep backbone on early-stage leaf-spot samples or introducing an explicit “dense spot” intermediate category may further reduce this confusion.
Case 2: Specular Highlights → False Positives (Healthy → Disease)
False positives caused by specular highlights correspond to healthy leaves that are incorrectly flagged as diseased, although no such error appears in the Ensemble confusion matrix due to effective fusion. These cases mainly affect the rules-based pipeline and are associated with bright reflections from water droplets or waxy leaf surfaces.
Mitigation:
Future improvements may include more advanced illumination normalization, polarization-aware capture guidance, or learned glare-suppression modules prior to color-rule processing.
Case 3: Low-Light or Motion Blur → False Negatives
Under low-light conditions or motion blur, small lesions may not be detected, leading to false negatives, primarily in the Rules-based model. While the Ensemble model mitigates most of these cases, extreme blur remains challenging.
Mitigation:
Possible solutions include incorporating blur detection with user feedback (“image too blurry—please recapture”) and data augmentation with low-light and motion-blur samples during training.
Overall, correlating qualitative failure cases with the quantitative confusion matrix provides a clearer understanding of system limitations. Importantly, most residual errors are confined to visually ambiguous disease stages rather than healthy–disease discrimination, indicating that the proposed Ensemble model is suitable for practical field screening while highlighting concrete directions for future technical refinement.
4.2.3. Evaluation of the Web App
Usability was evaluated with 30 participants, including 10 students in computer and information technology, 10 students in agriculture, and 10 agricultural practitioners or next-generation farmers. Overall, the tool was rated as very easy to use, with an average score of 4.83 and a standard deviation of 0.34, as shown in Table 8. The highest-rated aspects were system responsiveness, multi-device usability, and practical usefulness, each receiving an average score of 4.97 with a very low standard deviation of 0.18, indicating strong agreement among users. Participants further reported that the interface was clear, the key functions (image upload, analysis, and result display) were easy to locate, and the visual output—including textual summaries and highlighted lesion regions—helped them interpret the system’s decision. Importantly, most respondents indicated that they could begin using the system independently without relying heavily on an expert user, suggesting that the design is accessible to both technical and non-technical users in real practice.
4.3. Task-Based User Diagnostic Accuracy Evaluation
To further assess the practical on-farm utility of the proposed system beyond usability perception, a controlled task-based diagnostic evaluation was conducted. Participants were asked to perform leaf-disease identification tasks under two conditions: (1) unaided visual inspection and (2) inspection assisted by the proposed web application.
A subset of 30 images (10 healthy, 10 leaf-spot, and 10 leaf-blight), distinct from the training and test sets, was used for this evaluation. The same image set was presented to all participants in randomized order. Participants were first asked to classify each image without system assistance, followed by a second round where the web application’s prediction, lesion overlays, and Thai-language explanation were available.
The average diagnostic accuracy increased from 73.1% (unaided) to 91.4% (assisted by the system). The largest improvement was observed in distinguishing leaf-spot from early-stage leaf-blight samples, where visual ambiguity is common. These results suggest that the system not only provides correct predictions but also meaningfully supports user decision-making in challenging diagnostic scenarios.
5. Discussion
This study proposes a hybrid, explainable AI system that integrates two complementary analysis modes: (1) an interpretable color rule-based pipeline operating in HSV/Lab color spaces with explicit specular highlight suppression, and (2) a Deep Feature-based pipeline that combines pretrained convolutional backbones, PCA-based dimensionality reduction, and SVM classification. The motivation for this dual design is to balance explainability and predictive accuracy under real-world agricultural conditions, rather than to optimize accuracy in a controlled laboratory setting. The system is implemented as a Thai-language web application intended for practical use by durian farmers in Nakhon Si Thammarat, as well as a pedagogical tool for students in computer science, information technology, and agricultural science.
A key contribution of the proposed system lies in the explainability of the color-rule-based mode. Instead of relying solely on black-box deep-learning models, this mode encodes visual reasoning that aligns with how farmers and extension officers visually assess leaf diseases. For instance, leaf-spot symptoms typically manifest as small yellow–orange–brown speckles scattered on otherwise green tissue, whereas leaf-blight appears as larger, continuous brown, tan, or gray patches that expand across the leaf surface. These distinctions were formalized using measurable criteria in HSV/Lab color space, such as hue ranges of approximately 20–40° for yellow lesions, combined with a* and b* thresholds indicating yellow–brown chromaticity. Spatial constraints are also applied, limiting leaf-spot lesions to roughly ≤30 × 30 pixels while identifying leaf-blight as larger contiguous regions (≥100 × 100 pixels). To mitigate the effects of strong sunlight and leaf surface reflectance, specular highlights are suppressed using V (HSV) and L* (Lab) thresholding prior to lesion analysis. This step is particularly important in southern-Thai orchards, where humidity, cloud cover, and sun angle vary rapidly throughout the day.
Explainability is critical for agricultural AI adoption because farmers often ask not only “Is this leaf diseased?” but also “What did the system see that led to this conclusion?” By visually marking suspected lesion regions and providing an immediate Thai-language explanation (e.g., “healthy leaf”, “leaf spot”, or “leaf blight”), the Rules-based mode enhances user trust and interpretability beyond what a purely label-driven model can offer.
The second major contribution is the robustness and accuracy of the Deep Features → PCA → SVM pipeline. Pretrained backbones such as ResNet50 and DenseNet201 are used solely as feature extractors, avoiding full end-to-end fine-tuning, which would require larger datasets and greater computational resources. This design choice explicitly targets low-data field conditions, on the order of approximately 100 images per class, with substantial variation in lighting, background clutter, and leaf orientation. Such conditions better reflect real smallholder orchards than curated image datasets. The resulting performance, reflected in accuracy, macro-averaged precision, recall, and F1-score, indicates that the learned Deep Feature representation generalizes effectively across disease categories without being dominated by the healthy class.
An important observation emerges from the relationship between PCA dimensionality and classification accuracy. While increasing the number of principal components initially improves cross-validation performance (e.g., from 8 to 12 or 16 components), further increases yield diminishing returns and may even slightly degrade accuracy despite higher cumulative explained variance. This suggests that adding more dimensions can introduce noise related to lighting or background variation rather than disease-specific information, leading to overfitting in the SVM classifier. This behavior underscores the challenges of agricultural field data, where environmental variability may exceed inter-class visual differences.
Beyond overall accuracy, the superior performance of the Ensemble model can be attributed to complementary error characteristics between the Rules-based and Deep Features pipelines. The rule-based pipeline tends to fail in cases where disease symptoms are subtle or borderline, such as early-stage leaf-spot or sparsely distributed lesions partially obscured by shadows or glare. Because this pipeline relies on fixed color thresholds and lesion size constraints, early or weak symptoms may not fully satisfy predefined criteria, resulting in false negatives. Conversely, the Deep Features model exhibits greater sensitivity to texture, vein structure, and spatial patterns, enabling it to detect early disease manifestations more effectively. However, its errors are more likely to occur when visually similar disease patterns overlap, particularly between dense clusters of leaf-spot lesions and early-stage leaf-blight, where global texture similarity may dominate learned representations.
The Ensemble model mitigates these weaknesses by integrating evidence from both pipelines. When the Rules-based model provides strong, interpretable cues—such as clearly bounded large necrotic regions consistent with leaf-blight—these cues reinforce the deep model’s prediction. Conversely, when the Rules-based model fails to detect subtle lesions, the Deep Features pipeline compensates by leveraging learned texture representations. This complementary behavior is quantitatively supported by the per-class performance metrics (
Table 6) and the confusion matrix (
Table 7). Notably, the Ensemble model eliminates confusion between healthy leaves and diseased classes entirely, indicating robust healthy–diseased discrimination. The remaining error is confined to a single misclassification between leaf-spot and leaf-blight, reflecting a genuinely challenging borderline case rather than systematic failure.
A concrete example of this complementarity arises when leaf-spot lesions are densely distributed across a leaf surface. In such cases, the rules-based pipeline may incorrectly aggregate multiple small spots into a blight-like region due to spatial proximity and color similarity. The Deep Features model, however, can recognize finer texture and distribution patterns indicative of leaf-spot. The Ensemble decision resolves this conflict by prioritizing consistent evidence across both pipelines, thereby correcting errors that would occur if either model were used in isolation.
From a practical perspective, confusion between leaf spot and leaf-blight is influenced not only by visual similarity but also by agricultural practice. When leaf-spot lesions become numerous and dense, their color appearance and spatial patterns increasingly resemble those of leaf-blight. Moreover, some farmers colloquially refer to both conditions under the general term “Leaf-blight”, reflecting their perceived similarity in field contexts. These observations highlight the intrinsic difficulty of early-stage disease differentiation and reinforce the importance of explainable decision cues to support real-world interpretation and usage.
Despite its promising performance, the proposed system has several limitations. The dataset size, while sufficient for a prototype (approximately 100 images per class), remains small for large-scale deployment across diverse cultivars, seasons, and orchards. The use of orchard-level splits for training, validation, and testing reduces contextual leakage but further constrains available data per split. Additionally, the color-rule pipeline remains sensitive to threshold selection; overly permissive thresholds increase false positives, while overly strict thresholds risk missing early disease. Illumination handling, although improved through specular highlight suppression, does not yet include advanced color constancy or white balance normalization, which could further stabilize rule-based analysis under extreme lighting.
In addition to algorithmic performance, the task-based diagnostic evaluation suggests that the proposed system can effectively support human decision-making. The observed improvement in user accuracy, particularly in visually ambiguous cases such as dense leaf-spot versus early leaf-blight, indicates that explainable overlays and rule-based cues help users interpret disease symptoms more reliably. This addresses an important practical limitation of many agricultural AI tools, which often report high accuracy but do not evaluate their impact on user decisions under realistic conditions.
Future work will focus on incorporating automatic illumination normalization, farmer-informed threshold tuning, and selective fine-tuning of deep backbones using locally collected durian leaf data. Such extensions are expected to improve robustness in borderline cases, including very early leaf-blight and mixed symptom scenarios. Overall, the results demonstrate that a carefully designed hybrid Ensemble model can achieve a practical balance between explainability, accuracy, and real-world usability, making it suitable both for preliminary disease screening in orchards and for educational use in agricultural informatics and computer vision.