Article

An Explainable Machine Learning Framework for the Hierarchical Management of Hot Pepper Damping-Off in Intensive Seedling Production

1 Agricultural College, Shihezi University/Key Laboratory of Special Fruits and Vegetables Cultivation Physiology and Germplasm Resources Utilization of Xinjiang Production and Construction Corps, Shihezi 832003, China
2 Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences/National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Horticulturae 2025, 11(10), 1258; https://doi.org/10.3390/horticulturae11101258
Submission received: 14 September 2025 / Revised: 7 October 2025 / Accepted: 15 October 2025 / Published: 17 October 2025
(This article belongs to the Special Issue New Trends in Smart Horticulture)

Abstract

Facility cultivation is the principal production form of the global vegetable industry. As an important vegetable crop, hot pepper is easily threatened by many diseases in the facility microclimate. Traditional disease detection methods are time-consuming and allow disease to proliferate, so timely detection and suppression of disease development have become a focus of agricultural practice worldwide. This article proposes a generalizable and explainable machine learning model for hot pepper damping-off in intensive seedling production that maintains high predictive accuracy. After data preprocessing, including Kalman filter smoothing, SMOTE-ENN imbalanced-sample processing, and feature selection, 19 baseline models were developed for prediction. Following statistical testing of the results, a Bayesian optimization algorithm was used to tune the hyperparameters of the five best-performing models, and the Extreme Random Trees (ET) model was identified as the most suitable for this research scenario. The model achieved an F1-score of 0.9734 and an AUC of 0.9969 in predicting the severity of hot pepper damping-off, and explainable analysis was carried out with SHAP (SHapley Additive exPlanations). Based on the results, hierarchical management strategies for different severity levels are interpreted. Combined with the front-end visualization interface through which the model is deployed, this helps growers anticipate disease development trends and precisely regulate seedling-raising environmental factors, which is of great significance for disease prevention and control and for reducing the impact of disease on hot pepper growth and development.

1. Introduction

As a modern cultivation system, facility agriculture breaks seasonal limitations inherent in traditional agriculture, enabling a year-round supply of agricultural products and enhancing supply chain resilience. It creates a controlled growing environment for crops, thereby achieving efficient production [1,2]. However, the enclosed microclimate in facility agriculture also promotes disease epidemics. Diseases threaten crops throughout their entire growth cycle. Outbreaks at the seedling stage can be particularly devastating, leading to significant crop losses and severely impacting grower income [3]. Hot pepper (Capsicum annuum L.) is the world’s fourth-largest cash crop. China, as the largest global consumer and producer, accounts for approximately 40% of the worldwide cultivation area [4,5]. Damping-off, caused by the pathogen Rhizoctonia solani Kühn, is one of the most common diseases in hot pepper seedling production. It attacks seedlings, causing stunting and plant death, making it a leading cause of seedling loss [6].
In practice, determining peak disease incidence often relies on visual inspection or expert judgment, followed by traditional chemical control methods involving multiple pesticide applications. This approach leads to issues such as high production costs, pesticide residues, and environmental pollution [7]. Furthermore, due to disease developmental stages like the incubation period, a lag exists between infection and visual symptom detection in the field. By the time symptoms appear, pathogen populations are often already high, necessitating intensive control measures. However, frequent chemical applications exacerbate pathogen resistance [8]. Over time, the efficacy of chemicals diminishes, often requiring higher doses to maintain control, which intensifies ecological damage [9,10]. Therefore, developing predictive models for early intervention is crucial to prevent large-scale outbreaks and enable timely management strategies [11,12].
The integration of big data, artificial intelligence (AI), and the Internet of Things (IoT) has provided strong methodological support for advancing disease prediction [13]. Machine learning (ML) models have demonstrated significant capabilities in this domain. Existing studies have shown that environmental factors play a key role in disease prediction [14,15], and developing prediction models with environmental robustness has become the key to breaking through the current industrial dilemma. In recent years, research on agricultural disease prediction models based on environmental parameters has made great progress, highlighting its potential in early warning and precise prevention and control [16]. A prevalent paradigm in early research has been binary classification, which focuses on predicting the simple occurrence or non-occurrence of a disease. For instance, a typical approach might involve using environmental data to train a model that distinguishes between “healthy” and “diseased” states, providing a basic early warning signal [17]. Similarly, the study by Liu et al. [18] used air temperature, air relative humidity, solar radiation, soil temperature and other variables, combined with deep learning methods, to develop a prediction model for cucumber downy mildew in solar greenhouses, with an accuracy of 90%. However, a key limitation of this paradigm is its inability to predict the trajectory of severity after pathogen establishment. While useful for initial alerts, it does not address the critical need for assessing how severe an outbreak might become, which is essential for planning graded control measures. Furthermore, reliance on standard evaluation metrics often overlooks the practical requirement for guiding specific interventions.
To address the challenge of model interpretability, the application of explainable machine learning (XML) is gaining increasing attention. Techniques such as the SHAP (SHapley Additive exPlanations) framework [19] have been introduced to interpret model decisions and quantify feature contributions. For example, Wadhwa and Malik [19] applied SHAP to pest classification, providing an important tool for understanding the model decision logic. Furthermore, advancing beyond simple binary outcomes, some studies have begun to explore multi-class severity prediction. The potato late blight model by Fenu and Malloci [20] establishes an important benchmark for prediction of this disease, but its reliance on data from a single weather station fails to adequately capture farmland microclimate variability. Subsequent research, such as the random forest-neural network hybrid model proposed by Bai et al. [14], which performs well in rubber tree powdery mildew prediction, and Sriwanna’s rice blast model [21] which successfully identified key climate factors through feature importance ranking, further demonstrates the value of XML in identifying key drivers and offering decision support. Despite these advancements, a critical bottleneck persists: even explainable and multi-class models often fall short of establishing quantitative relationships with agronomic control measures, which remains a key bottleneck connecting model prediction with field actions [22].
This article is devoted to developing an explainable prediction framework for the facility hot pepper seedling environment. The main contents are as follows: (i) the Isolation Forest method is used for outlier detection, followed by multi-source sensor data fusion and Kalman filter smoothing to improve data quality; (ii) features are selected through correlation analysis and variance inflation factor (VIF) evaluation, and target-variable samples are balanced using the SMOTE-ENN method; (iii) nineteen machine learning models are developed and evaluated on F1-score and AUC, with shortlisted models undergoing statistical testing and hyperparameter optimization to identify the best-performing model; and (iv) the SHAP framework is employed to quantify feature contributions and analyze their influence on disease severity through global and local interpretability analysis.

2. Materials and Methods

2.1. Data Acquisition

The experiment was conducted from March to July 2024 and from March to June 2025 at the Xinjiang Kashi (Shandong Shuifa) Vegetable Industry Demonstration Park (39.35° N, 76.02° E), located in Shule County, Kashi City, within China’s Xinjiang Uygur Autonomous Region. Three hot pepper varieties—‘Special Selected Screw Hot Pepper’, ‘Sipingtou’, and ‘Zhudachang’ (representing high-, medium-, and low-resistance levels, respectively)—were selected as test varieties. A substrate of peat moss and perlite mixed at a 2:1 ratio was filled into 128-cell seedling trays. Seedlings were raised following the “one plant per cell” method, with water and fertilizer management strictly adhering to local production standards throughout the growth period. Sensors were deployed inside the greenhouse to continuously monitor air temperature, relative humidity, and solar radiation. Outdoor wind speed data were obtained from weather stations within the park. A schematic diagram of the experimental greenhouse and sensor deployment scheme is presented in Figure 1.
A representative seedling tray was selected, and a calibrated weighing system was used to continuously record its weight, enabling the calculation of changes in substrate moisture content. The frequency and descriptions of the collected data are provided in Table 1. The incidence of hot pepper damping-off for each variety was assessed and recorded daily using the five-point sampling method. Disease severity was graded according to the criteria outlined in Table 2, and the disease index (DI) was calculated daily using Equation (1) based on these records. The entire investigation period spanned 100 calendar days. Based on the numerical range of the disease index, severity was categorized into three grades: low (0 ≤ DI < 5), moderate (5 ≤ DI < 13), and high (DI ≥ 13). These grades corresponded to classes 0, 1, and 2, respectively, for the purpose of classification (where the class labels do not represent ordinal numerical significance). The temporal variation in the disease index throughout the experimental period is illustrated in Figure 2.
DI = \frac{\sum_i (n_i \times i)}{9N} \times 100 ,
where i represents the disease grade in Table 2, n_i is the number of plants at grade i, and N is the total number of plants surveyed.
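The disease-index calculation in Equation (1) can be sketched as follows; the grade counts below are hypothetical, not survey data:

```python
import numpy as np

def disease_index(counts):
    """Disease index (DI) from Equation (1).

    counts[i] = number of surveyed plants at severity grade i,
    with 9 as the maximum possible grade.
    """
    grades = np.arange(len(counts))
    n_total = counts.sum()
    return (counts * grades).sum() / (9 * n_total) * 100

# Hypothetical daily survey: 90 healthy plants, 10 diseased at grades 1-3.
counts = np.array([90, 5, 3, 2])
di = disease_index(counts)  # (5*1 + 3*2 + 2*3) / (9 * 100) * 100
```

With these counts the index falls below 5, i.e., the "low" severity class used for labeling.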

2.2. Data Preprocessing

2.2.1. Outlier Handling

The sensor data acquisition process is susceptible to factors such as equipment failure, unstable communication, and environmental disturbances, which can lead to anomalous data values. This study employs the Isolation Forest algorithm [23] for unsupervised anomaly detection. Its core advantage lies in constructing isolation trees through random feature segmentation and leveraging the sparsity of outliers in the feature space for efficient detection.
The decision score threshold was selected based on the high precision of the sensors used in this experiment. Under normal operating conditions, the anomaly ratio for these sensors is typically below 2%. To preserve the integrity of normal data and prevent over-detection, a low contamination parameter of 1% (i.e., contamination = 0.01) was adopted. The threshold θ was thus dynamically determined using the dynamic quantile method, as follows:
\theta = Q_{\text{scores}}(100c) ,
where c = 0.01 is the preset contamination rate (i.e., the proportion of outliers), and Qscores(p) represents the p-th percentile of the decision score distribution. Detected outliers were addressed using a global median replacement strategy. This approach significantly reduced noise interference while preserving the temporal characteristics of the data.
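A minimal sketch of this detection-and-replacement step, using scikit-learn's IsolationForest with the 1% contamination setting and the dynamic quantile threshold on simulated (not experimental) sensor readings:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Simulated temperature series with three injected spikes (hypothetical data).
temp = rng.normal(25.0, 2.0, size=1000)
temp[[100, 500, 900]] = [80.0, -40.0, 95.0]

X = temp.reshape(-1, 1)
iso = IsolationForest(contamination=0.01, random_state=42).fit(X)

# decision_function: lower scores = more anomalous.
scores = iso.decision_function(X)
theta = np.percentile(scores, 100 * 0.01)  # dynamic quantile threshold, c = 0.01
outliers = scores <= theta

# Global median replacement for the flagged points.
cleaned = temp.copy()
cleaned[outliers] = np.median(temp[~outliers])
```

The percentile threshold flags roughly 1% of samples, and the injected spikes fall well inside that set.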

2.2.2. Kalman Filter Smoothing Processing

This study employed the Kalman filter to fuse and smooth multi-source environmental data collected from sensors within the greenhouse. The process noise covariance matrix was optimized to enhance data stability. As an efficient recursive algorithm, the Kalman filter can accurately estimate the system state in the presence of noise [24,25]. The initial state estimate and error covariance matrix were initialized using the first observation. A set of candidate process noise covariance matrices (Q) was constructed based on the variance of each variable, and their performance was evaluated. The Q-matrix that yielded the smallest mean square error (MSE) was selected as optimal. The smoothing performance of the Kalman filter was quantified using the mean square error (MSE) and the coefficient of determination (R2). The MSE measures the average squared deviation between estimated and true values. A smaller MSE and an R2 value closer to 1 indicate better smoothing performance, i.e., effective noise suppression with minimal distortion of the underlying signal. The equations are defined as follows:
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 ,
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} ,
where n is the total number of samples, y_i is the original value, \hat{y}_i is the smoothed value, and \bar{y} is the mean of the original values.
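The candidate-Q selection described above can be sketched with a scalar random-walk Kalman filter on a synthetic diurnal signal (the data, candidate values, and noise levels are illustrative assumptions, not the study's settings):

```python
import numpy as np

def kalman_smooth(z, Q, R):
    """Scalar random-walk Kalman filter: state = level, observation = z."""
    x = float(z[0])  # initialize with the first observation
    P = 1.0
    out = np.empty(len(z))
    for k, zk in enumerate(z):
        P = P + Q                # predict: propagate error covariance
        K = P / (P + R)          # Kalman gain
        x = x + K * (zk - x)     # update state with the innovation
        P = (1 - K) * P
        out[k] = x
    return out

rng = np.random.default_rng(0)
t = np.arange(500)
signal = 25 + 5 * np.sin(2 * np.pi * t / 144)   # diurnal-like cycle
z = signal + rng.normal(0, 1.0, t.size)         # noisy observations

# Evaluate candidate process-noise values; keep the one with lowest MSE
# (here against the known clean signal, for illustration).
best_Q, best_mse = None, np.inf
for Q in [1e-4, 1e-3, 1e-2, 1e-1]:
    est = kalman_smooth(z, Q=Q, R=1.0)
    mse = np.mean((signal - est) ** 2)
    if mse < best_mse:
        best_Q, best_mse = Q, mse

est = kalman_smooth(z, Q=best_Q, R=1.0)
r2 = 1 - np.sum((signal - est) ** 2) / np.sum((signal - signal.mean()) ** 2)
```

The chosen Q trades off noise suppression (small Q) against responsiveness to the diurnal trend (large Q).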

2.2.3. One-Hot Encoding

In the construction of the dataset, encoding variety information is a critical step. One-hot encoding enables the model to learn the distinctive characteristics of each variety by treating each one as an independent category, thereby avoiding the introduction of ordinal or hierarchical relationships. This approach helps the model learn features accurately, prevents misleading information, maintains class independence, enhances model accuracy and interpretability, and improves its performance in classification tasks [26].

2.3. Feature Engineering

2.3.1. Feature Creation

To analyze the environmental drivers of hot pepper disease development, key features were selected based on phytopathological principles (Table 3). Existing studies indicate that diurnal temperature variation significantly influences disease occurrence. Daytime temperatures of 28 °C and 30 °C represent critical thresholds for the disease resistance mechanism in hot pepper [27]. Temperatures exceeding these thresholds adversely affect plant growth, whereas nighttime temperatures often approach the optimal range for soil-borne pathogen development [28]. Disease development is influenced by multiple factors. Diurnal fluctuations in relative humidity, reflecting moisture variation, can accelerate the spread of pathogens [29]. Wind speed and solar radiation indirectly influence disease spread by modulating the greenhouse thermal environment. Changes in substrate weight are linked to the dynamic balance between plant water uptake and pathogenic activity. Based on this rationale, features were extracted at 12 h intervals. A total of 17 features related to temperature, relative humidity, solar radiation, wind speed, and other factors were derived. The definitions, calculation methods, and units for all features are provided in Table 3.

2.3.2. Feature Selection

In order to optimize the model’s performance and interpretability, this study adopted a feature selection strategy that combines Spearman correlation analysis with the variance inflation factor (VIF). Spearman’s correlation coefficient (Spearman’s ρ) removes distributional assumptions through rank transformation, making it suitable for evaluating monotonic nonlinear correlations [30]. This method helps identify input variables that are associated with the target variables. The equation for calculating Spearman’s ρ is as follows:
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} ,
where di is the rank difference in the i-th pair of observations, and n is the sample size. Based on Cohen’s [31] effect size criteria and the specific conditions of the experimental environment, features demonstrating a statistically significant correlation were retained for subsequent analysis. Following Schober’s [32] guidelines for interpreting Pearson correlation coefficients, a threshold of |ρ| = 0.1 was established. Features with a correlation coefficient below this threshold were treated as uncorrelated noise and excluded. Subsequently, to mitigate model distortion due to multicollinearity, a collinearity test was conducted on the features that passed the initial screening, and the variance inflation factor (VIF) was calculated as follows:
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} ,
where Rj2 is the coefficient of determination when the j-th variable is regressed against all other variables. A VIF value exceeding 5 indicates severe multicollinearity, and the variable with the highest VIF is iteratively removed [33]. This two-stage screening process ensures that the selected features are significantly correlated with the target variables, thereby enhancing the model’s stability and interpretability.
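The two-stage screening can be sketched as follows on synthetic features (the feature names mirror the paper's style but the data and coefficients are invented for illustration):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
n = 300
feats = pd.DataFrame({"mean_temp": rng.normal(26, 3, n),
                      "mean_rh": rng.normal(70, 8, n),
                      "mean_ws": rng.normal(2.0, 0.5, n)})
feats["max_temp"] = feats["mean_temp"] + rng.normal(4, 0.5, n)  # collinear pair
target = 0.5 * feats["mean_temp"] + 0.3 * feats["mean_rh"] + rng.normal(0, 2, n)

# Stage 1: keep features with |Spearman rho| >= 0.1 against the target.
kept = [c for c in feats.columns
        if abs(spearmanr(feats[c], target)[0]) >= 0.1]

# Stage 2: iteratively drop the highest-VIF feature while any VIF > 5.
def vif_table(frame):
    X = frame.to_numpy()
    vifs = {}
    for j, col in enumerate(frame.columns):
        # Regress column j on all other columns (plus an intercept).
        A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid.var() / X[:, j].var()
        vifs[col] = 1.0 / max(1.0 - r2, 1e-12)
    return vifs

selected = feats[kept].copy()
while True:
    vifs = vif_table(selected)
    worst = max(vifs, key=vifs.get)
    if vifs[worst] <= 5:
        break
    selected = selected.drop(columns=worst)
```

Because `mean_temp` and `max_temp` are nearly collinear, the VIF loop retains at most one of them.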

2.4. Dataset Preparation

2.4.1. Dataset Partitioning and Standardization

The selected feature data were split into training and test sets at an 8:2 ratio. The training set was used for the model to learn the nonlinear mapping between features, while the test set was reserved for evaluating its accuracy and generalization performance. K-fold cross-validation (with 5 folds and a random seed of 42) was employed. This technique alternates the selection of training and test sets, reducing the risk of overfitting, improving data utilization, and enhancing model generalization. Data standardization eliminates scale differences between features, making them comparable and thereby improving model training speed and performance, which is crucial in data analysis [34]. In this study, min-max scaling was applied to the input variables of the training set (note that variety variables were already encoded). The fitted scaler was saved and subsequently applied to the test set, preventing data leakage during model evaluation. The equation is as follows:
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} ,
where x’ is the normalized value, x is the original data value, xmin is the minimum value in the data, and xmax is the maximum value in the data.
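The leakage-free split-then-scale procedure can be sketched as follows (synthetic data, arbitrary shapes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 3, size=500)

# 8:2 split, stratified on the class label.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Fit the scaler on the training set ONLY, then reuse it on the test set
# so no test-set statistics leak into training.
scaler = MinMaxScaler().fit(X_tr)
X_tr_s = scaler.transform(X_tr)
X_te_s = scaler.transform(X_te)  # may fall slightly outside [0, 1]
```

Persisting the fitted scaler (e.g., with joblib) lets deployment-time inputs be scaled identically.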

2.4.2. SMOTE-ENN for Imbalanced Classification

Machine learning models can exhibit bias toward the majority class when trained on imbalanced datasets. To mitigate the potential impact of class imbalance on model training and evaluation, the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE-ENN) was employed [35]. This technique balances class distribution by synthesizing new minority class instances (SMOTE) and cleaning the resulting dataset by removing noisy samples (ENN). This process enhances the model’s sensitivity to minority classes and improves its overall robustness.

2.5. Model Development, Evaluation, and Statistical Testing

2.5.1. Model Operating Environment

All computational work was conducted on a workstation with the following hardware configuration: an Intel Core i5-13500 processor (12 cores, 4.6 GHz), an NVIDIA GeForce RTX 4050 GPU (6 GB VRAM), and 16 GB of DDR4 RAM. The software environment comprised the Windows 11 operating system. Development was carried out in the PyCharm 2024 IDE, utilizing Python 3.10 within an Anaconda virtual environment. Key software dependencies included scikit-learn (v1.4), XGBoost (v2.0), CatBoost (v1.2), LightGBM (v4.1), TensorFlow (v2.15), and the SHAP library (v0.45) for model interpretation.

2.5.2. Development of Baseline Model

For baseline model development, 19 machine learning algorithms were selected, encompassing a diverse set of approaches: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naive Bayes (NB), Multilayer Perceptron (MLP), XGBoost, CatBoost, LightGBM, AdaBoost, Extreme Random Trees (ET), Balanced Random Forest (BRF), Decision Tree (DT), Gradient Boosting Machine (GBM), Gaussian Process Classifier (GPC), One-vs-Rest Logistic Regression (OVR-Logistic), Artificial Neural Network (ANN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). This selection includes both classical and ensemble machine learning algorithms. Table 4 summarizes their applications in agricultural contexts, along with the key parameters employed in this study. All baseline models were initially executed using their default parameters as implemented in scikit-learn to ensure a fair comparison of their intrinsic performance on this specific task [36,37].
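A minimal sketch of this baseline comparison, using three of the nineteen algorithms with scikit-learn defaults on synthetic (not experimental) data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_classes=3, n_informative=5,
                           random_state=42)

# Three of the nineteen baselines, each with library defaults.
models = {"ET": ExtraTreesClassifier(random_state=42),
          "RF": RandomForestClassifier(random_state=42),
          "KNN": KNeighborsClassifier()}

# 5-fold CV with a fixed seed, mirroring the study's protocol.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
f1_scores = {name: cross_val_score(m, X, y, cv=cv, scoring="f1_macro").mean()
             for name, m in models.items()}
```

Keeping every model at its defaults makes the comparison one of intrinsic fit, not of tuning effort.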

2.5.3. Model Evaluation Metrics

This study primarily employs the F1-score, ROC curve, AUC value, and confusion matrix to evaluate and compare the predictive performance of the classification models. The F1-score is a key metric for assessing a model’s overall performance, particularly in identifying minority classes. It represents the harmonic mean of precision and recall, thereby simultaneously considering the accuracy and completeness of predictions.
F1\text{-}score = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} ,
where Precision is the proportion of samples predicted as positive that are actually positive, and Recall is the proportion of actually positive samples that are correctly predicted:
\mathrm{Precision} = \frac{TP}{TP + FP} ,
\mathrm{Recall} = \frac{TP}{TP + FN} ,
Here, TP (True Positive) is a sample that is actually a positive class and is correctly predicted as a positive class, TN (True Negative) is a sample that is actually a negative class and is correctly predicted as a negative class, FP (False Positive) is a sample that is actually a negative class but is incorrectly predicted as a positive class, and FN (False Negative) is a sample that is actually a positive class but is incorrectly predicted as a negative class.
AUC (Area Under the ROC Curve) is a key metric to measure the robustness of the model to class sorting ability, and its calculation is based on the ROC curve (Receiver Operating Characteristic Curve). The ROC curve is generated by plotting the True Positive Rate (TPR) against the False Positive Rate (FPR) across all possible classification thresholds. This curve reflects the model’s ability to distinguish between positive and negative classes at any given threshold.
The True Positive Rate (TPR), or sensitivity, for a given class is the proportion of samples truly belonging to that class which are correctly predicted. The False Positive Rate (FPR) for a given class is the proportion of samples not belonging to that class which are incorrectly predicted as belonging to it.
\mathrm{TPR} = \frac{TP}{TP + FN} ,
\mathrm{FPR} = \frac{FP}{FP + TN} ,
For this multi-class problem, the One-vs-Rest (OvR) strategy was adopted. Under this strategy, the AUC evaluates the model’s ability to distinguish each class from all other classes combined. In the discrete case, the trapezoidal rule is used to calculate the area under the ROC curve, which in turn leads to the AUC value for this class.
\mathrm{AUC} = \sum_{i=1}^{n-1} \frac{(\mathrm{TPR}_{i+1} + \mathrm{TPR}_i) \times (\mathrm{FPR}_{i+1} - \mathrm{FPR}_i)}{2} ,
where n is the number of points on the ROC curve of this class, and TPRi and FPRi are the true positive and false positive rates of the i-th point, respectively. Furthermore, the confusion matrix provides a visual summary of the TP, FP, TN, and FN counts for each class, illustrating the distribution of the model’s final predictions. It is an essential tool for evaluating per-class performance.
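These metrics can be computed directly with scikit-learn; the class probabilities below are hypothetical, with one deliberately misclassified sample:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

# Hypothetical predicted probabilities for six samples, three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])
proba = np.array([[0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.3, 0.5, 0.2],
                  [0.1, 0.2, 0.7],
                  [0.5, 0.3, 0.2]])  # last sample misclassified as class 0
y_pred = proba.argmax(axis=1)

f1 = f1_score(y_true, y_pred, average="macro")
# One-vs-Rest AUC, macro-averaged over the three classes.
auc = roc_auc_score(y_true, proba, multi_class="ovr", average="macro")
cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted
```

The single error lowers per-class F1 for classes 0 and 2, while the OvR AUC stays high because the ranking of probabilities is only mildly disturbed.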

2.5.4. Model Performance Statistical Testing

To assess the overall performance differences among multiple models, the Friedman test was employed. This non-parametric statistical test is used to determine if there are significant differences in the performance of multiple classification models across different datasets. It makes no assumptions about the underlying data distribution, making it suitable for comparing multiple algorithms [52]. It is particularly appropriate for this study, which involves evaluating multiple models, as it accounts for dependencies between repeated measures [53].
Q = \frac{12N}{k(k+1)} \left[ \sum_{j=1}^{k} R_j^2 - \frac{k(k+1)^2}{4} \right] ,
where Q is the Friedman test statistic measuring the overall difference in the models' performance rankings; N is the number of datasets, which in five-fold cross-validation corresponds to the 5 mutually exclusive subsets; k is the number of models being compared, that is, the number of classification algorithms participating in the evaluation; and R_j is the average rank of the j-th model across all datasets, obtained by summing its ranks on each dataset and dividing by N, reflecting its overall relative position.
If the Friedman test rejects the null hypothesis, indicating significant differences among models, the Conover post hoc test can be applied for pairwise comparisons to identify which specific model pairs differ significantly. This test determines significance by calculating adjusted p-values. Its core principle involves non-parametric pairwise comparisons based on ranks, incorporating a correction for multiple comparisons [54].
t = \frac{\bar{R}_i - \bar{R}_j}{\sqrt{MS_{\mathrm{error}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}} ,
where \bar{R}_i and \bar{R}_j are the average ranks of the i-th and j-th groups, respectively; MS_{\mathrm{error}} is the rank-based mean squared error from the Friedman test; and n_i and n_j are the sample sizes of groups i and j. The p-value corresponding to this t-statistic is then determined from the t-distribution with the appropriate degrees of freedom.
A p-value below the preset significance level α (0.05) indicates a statistically significant performance difference between the two models. This sequential testing approach provides a comprehensive and accurate assessment of inter-model performance differences [55,56].
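The Friedman statistic above can be computed directly from a fold-by-model score table; the F1-scores below are illustrative values, not the study's results:

```python
import numpy as np
from scipy.stats import chi2, rankdata

# Hypothetical F1-scores: rows = 5 CV folds, columns = 5 models.
scores = np.array([[0.97, 0.96, 0.95, 0.91, 0.89],
                   [0.98, 0.95, 0.96, 0.90, 0.91],
                   [0.96, 0.97, 0.94, 0.92, 0.90],
                   [0.97, 0.94, 0.95, 0.91, 0.88],
                   [0.98, 0.96, 0.95, 0.89, 0.90]])
N, k = scores.shape

ranks = np.apply_along_axis(rankdata, 1, -scores)  # rank 1 = best per fold
R = ranks.mean(axis=0)                             # average rank per model
Q = 12 * N / (k * (k + 1)) * (np.sum(R**2) - k * (k + 1)**2 / 4)
p_value = chi2.sf(Q, df=k - 1)                     # chi-square, k - 1 dof
```

If `p_value` falls below 0.05, pairwise Conover comparisons can follow; ready-made implementations exist in third-party packages such as scikit-posthocs.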

2.5.5. Hyperparameter Tuning

Following predictive evaluation and performance testing, the top-performing baseline models were selected for hyperparameter tuning to improve predictive accuracy. A Bayesian optimization framework, implemented using the Scikit-Optimize (v0.10.2) library, was employed to optimize the models’ internal hyperparameters. This framework models the hyperparameter space as a Gaussian process. The objective was set to maximize the cross-validated F1-score, aiming to efficiently identify the optimal hyperparameter configuration and obtain the model with the best performance [57,58].
The optimization process commenced with 20 initial points in the hyperparameter space (controlled by the n_initial_points parameter), sampled using Latin Hypercube Sampling to ensure a uniform distribution and broad exploration. It then proceeded to the sequential optimization phase, wherein the next evaluation point was selected based on the Expected Improvement (EI) acquisition function. This function balances the exploration of unknown regions of the parameter space with the exploitation of known promising areas [36].
The total number of objective function evaluations was set to 100 (controlled by the n_calls = 100 parameter). This method can efficiently search mixed parameter spaces comprising continuous (Real, e.g., learning rate), integer (Integer, e.g., tree depth), and categorical (Categorical, e.g., kernel type) variables. Compared to grid search and random search, Bayesian optimization can identify superior solutions with fewer evaluations, significantly improving tuning efficiency.

2.5.6. Model Explainable Analysis

To elucidate the decision-making mechanism of the optimal model and quantify the contribution of environmental features to its predictions, this study employed the SHAP (SHapley Additive exPlanations) framework for interpretability analysis. SHAP is a model-agnostic interpretation method grounded in cooperative game theory’s Shapley values. Its core idea is to fairly allocate the contribution of each feature to the prediction by measuring the difference between the model’s output and a baseline (typically the average model output over the dataset). This approach ensures both consistency and local accuracy in its attributions, providing a theoretically grounded framework for interpreting machine learning model predictions [59]. Computations and analyses were performed using the SHAP library in Python [60].

2.6. Overall Modeling Workflow

In order to provide a clear overview of the entire method described in the Section 2, the overall workflow of the proposed model, from data preprocessing to interpretation, is illustrated in Figure 3.

3. Results

3.1. Kalman Filter Smoothing Results

Figure 4 shows the dynamic smoothing process of environmental data using the Kalman filter. Figure 4a–c present time series comparisons between the original temperature, relative humidity, and solar radiation data and the filtered data, respectively. The original data exhibit obvious noise fluctuations, and the filtered curves are significantly smoother, while preserving the natural rhythm of diurnal variation.
The results show that the Kalman filter estimates effectively maintain the temporal correlation of the data while suppressing random noise, thus providing a more stable data basis for the correlation analysis of subsequent disease and environmental factors.

3.2. Feature Selection Results

Figure 5 shows scatter plots in the upper-right quadrant that visually display relationships between variables, histograms along the diagonal that display the distribution of each variable, and correlation analysis in the lower-left quadrant used to eliminate variables weakly correlated with the target variable. The variables ‘wst’ and ‘mean_ws’ were excluded because their correlation coefficients fell below the |ρ| = 0.1 threshold.
After screening features based on preliminary correlation analysis, a variance inflation factor (VIF) screening was performed to assess multicollinearity, and the process and results are summarized in Table 5.

3.3. Results of Unbalanced Sample Processing

Analysis of the initial dataset revealed a significant class imbalance, with low, medium, and high severity classes accounting for 36.4%, 19.8%, and 43.8% of the data points, respectively (Figure 6). To mitigate this imbalance for subsequent model training, the SMOTE-ENN hybrid resampling technique was applied to the training set. This approach generates new instances for minority classes while removing noisy samples. After processing, the distribution across the three severity classes was notably more balanced (Figure 6). This preprocessing step provided a more robust data foundation for model training and ensured that model performance was evaluated on a held-out test set, thus guaranteeing the reliability of the results.

3.4. Baseline Model Prediction Results

In this study, 19 commonly used baseline models were trained on the training set. The optimal metrics of each model are shown in Figure 7. The F1-score ranges from 0.6883 to 0.9754, and the AUC value ranges from 0.9133 to 0.9984. The results indicate that ensemble learning models are generally superior to neural network models, suggesting that ensemble learning offers advantages for this dataset and problem. Ranked by F1-score, ET > GPC > CatBoost > LightGBM > SVM > RF > XGBoost; ranked by AUC, ET > CatBoost > SVM > GPC > KNN > RF > LightGBM. Statistical testing was then performed to determine whether the differences among these top-performing models are significant.

3.5. Model Difference Test Results

The Friedman test results (Table 6) indicated significant differences among the models in both F1-score and AUC, showing clear performance differentiation in the classification task. Post hoc testing was therefore needed to locate the sources of the pairwise differences between models.
As shown in Figure 8, the heatmap from the F1-score post hoc test visualizes the statistically significant pairwise comparisons. The p-values between the ET model and most models (GPC, CatBoost, LightGBM, SVM) exceeded 0.05: performance differences exist, but they did not reach statistical significance. Conversely, the p-values for ET versus XGBoost, KNN, and RF were all below 0.05, indicating that the ET model's advantage over these models was statistically significant.
The AUC post hoc test revealed no significant differences between models. Based on the baseline metrics and test results, ET, GPC, CatBoost, LightGBM, and SVM were shortlisted. A Bayesian optimization framework was subsequently used to conduct a structured hyperparameter search, prioritizing F1-score as the primary optimization objective with AUC as a secondary constraint.
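The Friedman test above can be run directly from SciPy on per-fold scores; the Conover post hoc of Figure 8 is available via scikit-posthocs' `posthoc_conover` (not shown here). The per-fold F1 values below are hypothetical stand-ins, shifted so that one model is clearly weaker.

```python
# Friedman test over repeated measurements: each array holds one model's
# F1-score on the same set of CV folds (the "blocks" of the test).
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
folds = 10
et = rng.normal(0.97, 0.01, folds)   # hypothetical ET per-fold F1
gpc = rng.normal(0.95, 0.01, folds)  # hypothetical GPC per-fold F1
knn = rng.normal(0.90, 0.01, folds)  # hypothetical KNN per-fold F1

stat, p = friedmanchisquare(et, gpc, knn)
print(f"chi2={stat:.2f}, p={p:.4f}")  # small p -> models differ; run post hoc
```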

3.6. Hyperparameter Tuning and Optimal Model Selection

Based on the preceding results, systematic hyperparameter tuning was carried out for the five shortlisted models (ET, CatBoost, SVM, GPC, LightGBM) using a Bayesian Optimization algorithm to explore the parameter space and globally optimize key hyperparameters. The process balanced model performance with generalization ability, determining the optimal parameter combination through multiple rounds of cross-validation. The hyperparameter optimization process is summarized in Table 7.
As shown in Table 8, hyperparameter tuning improved the performance of all five models, and validation on the held-out test set indicates that the models generalize well and are robust.
Based on the test results (Table 8), CatBoost had a marginally higher AUC, but ET significantly outperformed it in F1-score. Overall, the ET model was selected as the optimal model for predicting hot pepper damping-off severity. The ROC curves and confusion matrix for the low, moderate, and high severity classes (0, 1, 2) generated by the final model demonstrate its excellent performance (Figure 9).
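The per-class evaluation artifacts of Figure 9 can be reproduced in a few lines; this sketch uses synthetic stand-in data and computes one-vs-rest AUC for each severity class (0, 1, 2) plus the confusion matrix.

```python
# Final-model evaluation sketch: one-vs-rest AUC per severity class and the
# confusion matrix on a held-out test split.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=900, n_classes=3, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = ExtraTreesClassifier(random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)

# Binarize labels so each severity class gets its own ROC/AUC
y_bin = label_binarize(y_te, classes=[0, 1, 2])
aucs = [roc_auc_score(y_bin[:, k], proba[:, k]) for k in range(3)]
cm = confusion_matrix(y_te, model.predict(X_te))
print([round(a, 3) for a in aucs])
print(cm)
```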

3.7. SHAP Explainable Analysis of the Model

3.7.1. Global Importance Analysis

For the low-severity class (Figure 10), the variety was the most influential feature (23.9% contribution), indicating that genetic resistance traits most significantly affect disease initiation. This aligns with the known relationship between varietal resistance and incidence rates, where higher resistance correlates with lower initial disease occurrence.
Substrate moisture content (max_w) was the second most important feature (19.3%). Its positive SHAP values indicate that higher moisture levels increase the predicted probability of low-severity disease. Maximum air temperature (max_t) ranked third (18.6%), with its SHAP profile suggesting that higher temperatures may reduce incidence probability. These findings are consistent with the principles of disease suppression through environmental control considered during feature selection. Collectively, variety, substrate moisture, and air temperature accounted for over 80% of the feature importance for low severity. This underscores that, alongside selecting resistant varieties, management should focus on environmental control—such as ventilation and dehumidification—to reduce early-stage disease risk and prevent escalation.
For the moderate severity class (Figure 11), the maximum substrate moisture content (max_w) was the most critical factor, contributing 22.5%. The negative SHAP values indicate that maintaining higher substrate moisture can effectively inhibit disease progression to more severe stages. Variety ranked second (22.0% contribution), suggesting that genetic differences influence not only initial disease incidence (low severity) but also directly correlate with the likelihood of developing moderate severity. This was followed by the substrate moisture range (de, 12.2%), maximum air temperature (max_t, 11.2%), maximum air relative humidity (max_rh, 8.0%), and minimum substrate moisture content (min_w, 7.5%). Therefore, for moderate severity, management strategies should include increasing irrigation volume (to raise substrate moisture and its range) while avoiding frequent light watering that keeps roots perpetually wet, combined with maintaining a relatively lower temperature environment to curb disease severity.
For high severity (Figure 12), the substrate moisture range (de) ranked first (19.0% contribution), followed by the minimum substrate moisture (min_w, 17.5%) and maximum relative humidity (max_rh, 16.0%). The contribution of variety decreased to 11.0%, indicating that environmental factors surpass varietal characteristics at the high-severity stage, although the initial resistance selection continues to influence disease development. The SHAP values of the top three features suggest that, during late-stage disease, alleviating high severity involves reducing the substrate moisture range (de), increasing the minimum moisture level (min_w), and increasing air humidity (max_rh). This translates to maintaining adequate substrate moisture to prevent excessive drying under severe disease conditions, thereby supporting normal crop growth.
Analysis of the SHAP results across the three severity classes reveals a shift in influential factors as disease severity increases. This underscores the necessity for differentiated control measures tailored to specific disease stages—a hierarchical management strategy in field management. The interaction mechanisms among these key factors are further elucidated in the dependency analysis below.

3.7.2. Local Dependency Analysis

For the low-severity class (Figure 13), the maximum temperature (max_t) significantly enhances the inhibitory prediction once it exceeds a threshold of 0.7 (SHAP value > 0.04). Its interaction with maximum substrate moisture (max_w) is synergistic: when max_w > 0.6, the inhibitory effect of high temperature (max_t > 0.7) increases by 28%. Conversely, the combination of minimum temperature (min_t) and minimum moisture (min_w) reveals a moisture-sensitivity mechanism under low-temperature conditions (min_t < 0.4). Here, a min_w > 0.65 is required to maintain a positive SHAP value (>0.02); otherwise, the inhibitory effect is reversed (SHAP value ≤ 0.01).
For the moderate severity class (Figure 14), the core dynamic involves water regulation. While a maximum substrate moisture (max_w) > 0.75 significantly inhibits disease escalation (SHAP value = 0.05), this effect is modulated by temperature, decaying by 40% when max_t > 0.6.
For the high-severity class (Figure 15), extreme stress responses reveal a hydrothermal compensation mechanism. A significant synergistic effect exists between the irrigation amount (de) and minimum moisture (min_w): when de > 0.6 and min_w > 0.55, the SHAP value remains slightly positive (0.01–0.02). However, if min_w < 0.5, even de > 0.8 fails to reverse the negative trend (SHAP value ≤ 0.015). Additionally, the combination of maximum humidity (max_rh) and minimum temperature (min_t) shows that under low-temperature conditions (min_t < 0.3), a max_rh > 0.8 is necessary for positive regulation (SHAP value = 0.012); otherwise, a synergistic detrimental effect occurs (SHAP value = 0.022).

4. Discussion

4.1. Machine Learning for Predicting Disease Severity

Numerous countries have established goals to reduce chemical usage in agricultural production, aiming for sustainable development [61]. Compared to traditional strategies, Decision Support Systems (DSS) have proven effective in promoting sustainable development, reducing fungicide use by an average of 50% [62]. DSS enable agricultural practitioners to conduct preventive spraying when necessary and gradually refine measures based on predictions and real-world feedback, moving toward an optimal operational model [63].
Although numerous disease risk prediction models exist, most remain at the research level, with few being applied in agricultural production [63]. A primary reason is that while existing models may predict disease occurrence or progression, they often fail to provide actionable decision-making support needed in practical farming situations. Producers require a transparent interface and model-derived recommendations that are interpretable, enabling them to mitigate risks through practical operations. Consequently, we developed a website with a real-time feedback interface, deployed in the experimental field, allowing users to observe results, verify predictions, and plan subsequent operations (Figure 16). The system calls an open API to obtain a seven-day weather forecast. This external data is then fed into a pre-established and validated indoor microclimate prediction model [64] to generate high-precision forecasts of the greenhouse’s internal environment. After standardized preprocessing, the data is input into the pre-trained optimal model to generate disease risk predictions. These predictions are then delivered via an API to the front-end interface and visualized using a color-coded scheme, completing the closed loop from data to decision.
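The closed loop described above can be outlined as a small pipeline. Everything here is a hypothetical stub: the function names, the weather-API response shape, and the classification rule are placeholders, not the deployed system; only the three-level color scheme follows Figure 16.

```python
# Hypothetical sketch of the data-to-decision loop: weather forecast ->
# indoor microclimate model [64] -> disease-risk classifier -> color coding.
def fetch_weather_forecast(days=7):
    """Stub for the open weather API call (7-day outdoor forecast)."""
    return [{"t_out": 24 + d, "rh_out": 60 + d} for d in range(days)]

def microclimate_model(outdoor):
    """Stub for the validated indoor-microclimate prediction model."""
    return [{"max_t": o["t_out"] + 4, "max_rh": min(o["rh_out"] + 20, 100)}
            for o in outdoor]

def predict_risk(indoor):
    """Stub for the trained ET classifier: 0=low, 1=medium, 2=high risk."""
    return [2 if e["max_rh"] > 75 else 1 if e["max_rh"] > 65 else 0
            for e in indoor]

COLORS = {0: "green", 1: "yellow", 2: "red"}  # Figure 16 color coding

risks = predict_risk(microclimate_model(fetch_weather_forecast()))
print([COLORS[r] for r in risks])
```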

4.2. Comparison with Existing Models

Research in agricultural disease prediction has made progress, but limitations in application persist. For instance, the potato late blight model by Fenu and Malloci [20] relies on a single data source and focuses on a single disease, overlooking the spatiotemporal heterogeneity of the facility agriculture microenvironment, which limits its generalizability. Although the rice blast model [21] and rubber tree powdery mildew model [14] employ feature selection, they inadequately address data imbalance, and their reliance on post hoc explanation limits direct guidance for agronomic regulation.
The model developed in this study achieves breakthroughs in data preprocessing, feature selection, model evaluation, and interpretability. For data outliers, the Isolation Forest algorithm dynamically sets thresholds combined with a global median replacement strategy, effectively overcoming the poor scenario adaptability of traditional methods. In feature selection, redundant features are eliminated by combining Spearman correlation and VIF, while SMOTE-ENN hybrid sampling balances the data, significantly enhancing minority class recognition.
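The outlier-handling step can be sketched as follows; the contamination rate here is an illustrative guess rather than the study's dynamically set threshold, and the data are simulated sensor readings.

```python
# Isolation Forest outlier handling with global median replacement, as
# described above: flagged readings are overwritten with the column median.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
temp = rng.normal(25, 2, size=(300, 1))  # simulated air-temperature series
temp[::50] = 60.0                        # inject sensor spikes

labels = IsolationForest(contamination=0.05, random_state=0).fit_predict(temp)
cleaned = temp.copy()
cleaned[labels == -1] = np.median(temp)  # global median replacement
print(float(temp.max()), "->", float(cleaned.max()))
```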
Furthermore, the introduction of the SHAP framework enables both global feature importance ranking and the revelation of causal relationships between environmental factors and disease risk through local interpretation, addressing the “black box” bottleneck of machine learning models. Compared to existing models, the proposed framework offers advantages in the depth and breadth of interpretability. Although studies like [19] utilize SHAP, they lack in-depth exploration of the insights from an agronomic perspective. Experimental results show that the model maintains over 97% classification accuracy, and the feature attribution analysis aligns closely with agronomic experience, providing an accurate and operable solution for disease prevention and control in facility agriculture.

4.3. Horticultural Insights and Model Limitations

This study first developed a reliable machine learning model. Based on the quantitative interpretability of the SHAP framework, a phased dynamic blight control system was then proposed. Key measures and thresholds were converted back to their original dimensions via denormalization to guide agricultural production directly (Table 9). Insights from the explainable analysis at different stages were summarized into stage-specific horticultural management insights.
Although the model demonstrated strong initial performance and high predictive accuracy, its prediction error may accumulate over time. The risk associated with error accumulation is that over-reliance on the model’s risk recommendations could introduce decision-making bias, potentially leading to reduced productivity [65].
A key strategy to mitigate these risks is to acquire substantial new data for continuous model optimization. However, data acquisition is challenging. Currently, high-quality data primarily depends on manual collection by technicians, which is costly and difficult to scale. Despite these data challenges, the results indicate good generalizability potential for this modeling approach within the current research scope.
Therefore, future research should focus on expanding the dataset, particularly by incorporating environmental data from diverse geographical regions worldwide to further validate and enhance the model’s generalization ability.

5. Conclusions

This study developed a generalizable and interpretable machine learning model for predicting hot pepper seedling damping-off in intensive seedling production systems. A total of 19 machine learning models were constructed utilizing Kalman filter smoothing, SMOTE-ENN for handling class imbalance, and feature selection. Based on the performance of these baseline models, statistical tests were conducted, and hyperparameters of the top-performing models were optimized. The Extreme Random Trees (ET) model was ultimately selected as the optimal model, and an interpretability analysis was performed. The main conclusions are as follows:
(1) 
The ET model achieved an F1-score of 0.9734 and an AUC of 0.9969 in predicting hot pepper damping-off severity.
(2) 
SHAP analysis was employed for both global and local interpretability, leading to the formulation of a hierarchical management and control strategy for damping-off.
(3) 
Key interacting environmental variables were identified. Based on the dependency analysis results, threshold-based environmental control measures were proposed and implemented in the platform’s real-time prediction system.

Author Contributions

Conceptualization, H.L. and M.D.; Methodology, Z.W. and K.L.; Software, Z.W. and K.L.; Validation, Z.W. and K.L.; Formal analysis, Z.W. and K.L.; Investigation, Z.W. and L.L.; Resources, H.L. and M.D.; Data curation, Z.W., L.L., C.L. and J.X.; Writing—original draft preparation, Z.W. and K.L.; Writing—review and editing, Z.W., K.L. and T.J.; Visualization, Z.W. and K.L.; Manuscript revising, H.L. and M.D.; Study design, Z.W., L.L., C.L. and J.X.; Supervision, T.J., H.L. and M.D.; Project administration, T.J., H.L. and M.D.; Funding acquisition, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Key Research and Development Program of the Xinjiang Uygur Autonomous Region (2022B02032-3), the National Talent Plan Project (KZ617201), and the earmarked fund for XJARS (XJARS-07).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xie, J.; Yu, J.; Chen, B.; Feng, Z.; Li, J.; Zhao, C.; Lyu, J.; Hu, L.; Gan, Y.; Siddique, K.H.M. Facility Cultivation Systems “设施农业”: A Chinese Model for the Planet. In Advances in Agronomy; Elsevier: Amsterdam, The Netherlands, 2017; Volume 145, pp. 1–42. ISBN 978-0-12-812417-8. [Google Scholar]
  2. Xie, J.; Yu, J.; Chen, B.; Feng, Z.; Lyu, J.; Hu, L.; Gan, Y.; Siddique, K.H.M. Gobi Agriculture: An Innovative Farming System That Increases Energy and Water Use Efficiencies. A Review. Agron. Sustain. Dev. 2018, 38, 62. [Google Scholar] [CrossRef]
  3. Kowalska, A.; Lingham, S.; Maye, D.; Manning, L. Food Insecurity: Is Leagility a Potential Remedy? Foods 2023, 12, 3138. [Google Scholar] [CrossRef]
  4. Sundari, M.T.; Darsono, D.; Sutrisno, J.; Antriyandarti, E. Analysis of Trade Potential and Factors Influencing Chili Export in Indonesia. Open Agric. 2023, 8, 20220205. [Google Scholar] [CrossRef]
  5. Zou, Z.; Zou, X. Geographical and Ecological Differences in Pepper Cultivation and Consumption in China. Front. Nutr. 2021, 8, 718517. [Google Scholar] [CrossRef]
  6. Delai, C.; Muhae-Ud-Din, G.; Abid, R.; Tian, T.; Liu, R.; Xiong, Y.; Ma, S.; Ghorbani, A. A Comprehensive Review of Integrated Management Strategies for Damping-off Disease in Chili. Front. Microbiol. 2024, 15, 1479957. [Google Scholar] [CrossRef] [PubMed]
  7. Zhao, C.-J.; Li, M.; Yang, X.-T.; Sun, C.-H.; Qian, J.-P.; Ji, Z.-T. A Data-Driven Model Simulating Primary Infection Probabilities of Cucumber Downy Mildew for Use in Early Warning Systems in Solar Greenhouses. Comput. Electron. Agric. 2011, 76, 306–315. [Google Scholar] [CrossRef]
  8. Deguine, J.-P.; Aubertot, J.-N.; Flor, R.J.; Lescourret, F.; Wyckhuys, K.A.G.; Ratnadass, A. Integrated Pest Management: Good Intentions, Hard Realities. A Review. Agron. Sustain. Dev. 2021, 41, 38. [Google Scholar] [CrossRef]
  9. Ghorbani, A.; Emamverdian, A.; Pishkar, L.; Chashmi, K.A.; Salavati, J.; Zargar, M.; Chen, M. Melatonin-Mediated Nitric Oxide Signaling Enhances Adaptation of Tomato Plants to Aluminum Stress. S. Afr. J. Bot. 2023, 162, 443–450. [Google Scholar] [CrossRef]
  10. Nanehkaran, F.M.; Razavi, S.M.; Ghasemian, A.; Ghorbani, A.; Zargar, M. Foliar Applied Potassium Nanoparticles (K-NPs) and Potassium Sulfate on Growth, Physiological, and Phytochemical Parameters in Melissa officinalis L. under Salt Stress. Environ. Sci. Pollut. Res. 2024, 31, 31108–31122. [Google Scholar] [CrossRef]
  11. Corkley, I.; Fraaije, B.; Hawkins, N. Fungicide Resistance Management: Maximizing the Effective Life of Plant Protection Products. Plant Pathol. 2022, 71, 150–169. [Google Scholar] [CrossRef]
  12. Liu, K.; Mu, Y.; Chen, X.; Ding, Z.; Song, M.; Xing, D.; Li, M. Towards Developing an Epidemic Monitoring and Warning System for Diseases and Pests of Hot Peppers in Guizhou, China. Agronomy 2022, 12, 1034. [Google Scholar] [CrossRef]
  13. Liakos, K.G.; Busato, P.; Moshou, D.; Pearson, S.; Bochtis, D. Machine Learning in Agriculture: A Review. Sensors 2018, 18, 2674. [Google Scholar] [CrossRef] [PubMed]
  14. Bai, R.; Wang, J.; Li, N.; Chen, R. Short- and Long-Term Prediction Models of Rubber Tree Powdery Mildew Disease Index Based on Meteorological Variables and Climate System Indices. Agric. For. Meteorol. 2024, 354, 110082. [Google Scholar] [CrossRef]
  15. Ghorbani, A.; Emamverdian, A.; Pehlivan, N.; Zargar, M.; Razavi, S.M.; Chen, M. Nano-Enabled Agrochemicals: Mitigating Heavy Metal Toxicity and Enhancing Crop Adaptability for Sustainable Crop Production. J. Nanobiotechnol. 2024, 22, 91. [Google Scholar] [CrossRef]
  16. Scortichini, M. Sustainable Management of Diseases in Horticulture: Conventional and New Options. Horticulturae 2022, 8, 517. [Google Scholar] [CrossRef]
  17. Baker, K.M.; Kirk, W.W. Comparative Analysis of Models Integrating Synoptic Forecast Data into Potato Late Blight Risk Estimate Systems. Comput. Electron. Agric. 2007, 57, 23–32. [Google Scholar] [CrossRef]
  18. Liu, K.; Zhang, C.; Yang, X.; Diao, M.; Liu, H.; Li, M. Development of an Occurrence Prediction Model for Cucumber Downy Mildew in Solar Greenhouses Based on Long Short-Term Memory Neural Network. Agronomy 2022, 12, 442. [Google Scholar] [CrossRef]
  19. Wadhwa, D.; Malik, K. A Generalizable and Interpretable Model for Early Warning of Pest-Induced Crop Diseases Using Environmental Data. Comput. Electron. Agric. 2024, 227, 109472. [Google Scholar] [CrossRef]
  20. Fenu, G.; Malloci, F.M. Artificial Intelligence Technique in Crop Disease Forecasting: A Case Study on Potato Late Blight Prediction. In Intelligent Decision Technologies; Czarnowski, I., Howlett, R.J., Jain, L.C., Eds.; Smart Innovation, Systems and Technologies; Springer: Singapore, 2020; Volume 193, pp. 79–89. ISBN 9789811559242. [Google Scholar]
  21. Sriwanna, K. Weather-Based Rice Blast Disease Forecasting. Comput. Electron. Agric. 2022, 193, 106685. [Google Scholar] [CrossRef]
  22. Saha, S.; Kucher, O.D.; Utkina, A.O.; Rebouh, N.Y. Precision Agriculture for Improving Crop Yield Predictions: A Literature Review. Front. Agron. 2025, 7, 1566201. [Google Scholar] [CrossRef]
  23. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar]
  24. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45. [Google Scholar] [CrossRef]
  25. Bitmead, R.R.; Hovd, M.; Abooshahab, M.A. A Kalman-Filtering Derivation of Simultaneous Input and State Estimation. Automatica 2019, 108, 108478. [Google Scholar] [CrossRef]
  26. Wang, Z.; Li, W.; Tang, Z. Enhancing the Genomic Prediction Accuracy of Swine Agricultural Economic Traits Using an Expanded One-Hot Encoding in CNN Models. J. Integr. Agric. 2024, 24, 3574–3582. [Google Scholar] [CrossRef]
  27. Kim, M.K.; Jeong, H.B.; Yu, N.; Park, B.M.; Chae, W.B.; Lee, O.J.; Lee, H.E.; Kim, S. Comparative Heat Stress Responses of Three Hot Pepper (Capsicum annuum L.) Genotypes Differing Temperature Sensitivity. Sci. Rep. 2023, 13, 14203. [Google Scholar] [CrossRef]
  28. Bita, C.E.; Gerats, T. Plant Tolerance to High Temperature in a Changing Environment: Scientific Fundamentals and Production of Heat Stress-Tolerant Crops. Front. Plant Sci. 2013, 4, 273. [Google Scholar] [CrossRef]
  29. Chaves, S.W.P.; Coelho, R.D.; Costa, J.d.O.; Tapparo, S.A. Micrometeorological Modeling and Water Consumption of Tabasco Pepper Cultivated under Greenhouse Conditions. Ital. J. Agrometeorol. 2021, 21–36. [Google Scholar] [CrossRef]
  30. Spearman, C. The Proof and Measurement of Association between Two Things. Am. J. Psychol. 1904, 15, 72–101. [Google Scholar] [CrossRef]
  31. Cohen, J. The Analysis of Variance. In Statistical Power Analysis for the Behavioral Sciences; Routledge: New York, NY, USA, 2013; ISBN 978-0-203-77158-7. [Google Scholar]
  32. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768. [Google Scholar] [CrossRef]
  33. Senter, H.F. Applied Linear Statistical Models. J. Am. Stat. Assoc. 2008, 103, 880. [Google Scholar] [CrossRef]
  34. Zhang, Q.; Sun, S. Weighted Data Normalization Based on Eigenvalues for Artificial Neural Network Classification; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
  35. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29. [Google Scholar] [CrossRef]
  36. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175. [Google Scholar] [CrossRef]
  37. Scikit-Learn Developers. User Guide (Version 1.4) [Computer Software Documentation]. 2024. Available online: https://scikit-learn.org/1.4/user_guide.html (accessed on 4 October 2025).
  38. Mohammed, S.; Arshad, S.; Alsilibe, F.; Moazzam, M.F.U.; Bashir, B.; Prodhan, F.A.; Alsalman, A.; Vad, A.; Ratonyi, T.; Harsányi, E. Utilizing Machine Learning and CMIP6 Projections for Short-Term Agricultural Drought Monitoring in Central Europe (1900–2100). J. Hydrol. 2024, 633, 130968. [Google Scholar] [CrossRef]
  39. Khan, N.; Sachindra, D.A.; Shahid, S.; Ahmed, K.; Shiru, M.S.; Nawaz, N. Prediction of Droughts over Pakistan Using Machine Learning Algorithms. Adv. Water Resour. 2020, 139, 103562. [Google Scholar] [CrossRef]
  40. Aneece, I.; Thenkabail, P.S. Classifying Crop Types Using Two Generations of Hyperspectral Sensors (Hyperion and DESIS) with Machine Learning on the Cloud. Remote Sens. 2021, 13, 4704. [Google Scholar] [CrossRef]
  41. Tageldin, A.; Adly, D.; Mostafa, H.; Mohammed, H.S. Applying Machine Learning Technology in the Prediction of Crop Infestation with Cotton Leafworm in Greenhouse. bioRxiv 2020. [Google Scholar] [CrossRef]
  42. Gao, Y.; Huang, C.; Zhang, X.; Zhang, Z.; Chen, B. Vertical Stratification-Enabled Early Monitoring of Cotton Verticillium Wilt Using in-Situ Leaf Spectroscopy via Machine Learning Models. Front. Plant Sci. 2025, 16, 1599877. [Google Scholar] [CrossRef]
  43. Nagesh, O.S.; Budaraju, R.R.; Kulkarni, S.S.; Vinay, M.; Ajibade, S.-S.M.; Chopra, M.; Jawarneh, M.; Kaliyaperumal, K. Boosting Enabled Efficient Machine Learning Technique for Accurate Prediction of Crop Yield towards Precision Agriculture. Discov. Sustain. 2024, 5, 78. [Google Scholar] [CrossRef]
  44. Zhao, Y.; Dong, H.; Huang, W.; He, S.; Zhang, C. Seamless Terrestrial Evapotranspiration Estimation by Machine Learning Models across the Contiguous United States. Ecol. Indic. 2024, 165, 112203. [Google Scholar] [CrossRef]
  45. Branstad-Spates, E.H.; Castano-Duque, L.; Mosher, G.A.; Hurburgh, C.R.; Owens, P.; Winzeler, E.; Rajasekaran, K.; Bowers, E.L. Gradient Boosting Machine Learning Model to Predict Aflatoxins in Iowa Corn. Front. Microbiol. 2023, 14, 1248772. [Google Scholar] [CrossRef]
  46. Ghosh, S.S.; Mandal, D.; Kumar, S.; Bhogapurapu, N.; Banerjee, B.; Siqueira, P.; Bhattacharya, A. An Evidence Modified Gaussian Process Classifier (EM-GPC) for Crop Classification Using Dual-Polarimetric C- and L- Band SAR Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 18683–18702. [Google Scholar] [CrossRef]
  47. Vázquez-Veloso, A.; Toraño Caicoya, A.; Bravo, F.; Biber, P.; Uhl, E.; Pretzsch, H. Does Machine Learning Outperform Logistic Regression in Predicting Individual Tree Mortality? Ecol. Inform. 2025, 88, 103140. [Google Scholar] [CrossRef]
  48. Shahoveisi, F.; Riahi Manesh, M.; Del Río Mendoza, L.E. Modeling Risk of Sclerotinia sclerotiorum-Induced Disease Development on Canola and Dry Bean Using Machine Learning Algorithms. Sci. Rep. 2022, 12, 864. [Google Scholar] [CrossRef]
  49. Kim, Y.; Roh, J.-H.; Kim, H.Y. Early Forecasting of Rice Blast Disease Using Long Short-Term Memory Recurrent Neural Networks. Sustainability 2017, 10, 34. [Google Scholar] [CrossRef]
  50. Xiao, Q.; Li, W.; Kai, Y.; Chen, P.; Zhang, J.; Wang, B. Occurrence Prediction of Pests and Diseases in Cotton on the Basis of Weather Factors by Long Short Term Memory Network. BMC Bioinform. 2019, 20, 688. [Google Scholar] [CrossRef] [PubMed]
  51. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU Neural Network Methods for Traffic Flow Prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328. [Google Scholar]
  52. Friedman, M. The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. J. Am. Stat. Assoc. 1937, 32, 675–701. [Google Scholar] [CrossRef]
  53. Demšar, J. Statistical Comparisons of Classifiers over Multiple Data Sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  54. Borror, C.M. Practical Nonparametric Statistics, 3rd Ed. J. Qual. Technol. 2001, 33, 260. [Google Scholar] [CrossRef]
  55. Brown, I.; Mues, C. An Experimental Comparison of Classification Algorithms for Imbalanced Credit Scoring Data Sets. Expert Syst. Appl. 2012, 39, 3446–3453. [Google Scholar] [CrossRef]
  56. García, S.; Fernández, A.; Luengo, J.; Herrera, F. Advanced Nonparametric Tests for Multiple Comparisons in the Design of Experiments in Computational Intelligence and Data Mining: Experimental Analysis of Power. Inf. Sci. 2010, 180, 2044–2064. [Google Scholar] [CrossRef]
  57. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar]
  58. Jones, D.R.; Schonlau, M.; Welch, W.J. Efficient Global Optimization of Expensive Black-Box Functions. J. Glob. Optim. 1998, 13, 455–492. [Google Scholar] [CrossRef]
  59. Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 4768–4777. [Google Scholar]
  60. Mishra, P. Model Explainability and Interpretability. In Practical Explainable AI Using Python: Artificial Intelligence Model Explanations Using Python-Based Libraries, Extensions, and Frameworks; Mishra, P., Ed.; Apress: Berkeley, CA, USA, 2022; pp. 1–22. ISBN 978-1-4842-7158-2. [Google Scholar]
  61. Shuqin, J.; Fang, Z. Zero Growth of Chemical Fertilizer and Pesticide Use: China’s Objectives, Progress and Challenges. J. Resour. Ecol. 2018, 9, 50–58. [Google Scholar] [CrossRef]
  62. Lázaro, E.; Makowski, D.; Vicent, A. Decision Support Systems Halve Fungicide Use Compared to Calendar-Based Strategies without Increasing Disease Risk. Commun. Earth Environ. 2021, 2, 224. [Google Scholar] [CrossRef]
  63. Magarey, R.D.; Travis, J.W.; Russo, J.M.; Seem, R.C.; Magarey, P.A. Decision Support Systems: Quenching the Thirst. Plant Dis. 2002, 86, 4–14. [Google Scholar] [CrossRef]
  64. Liang, L.; Shi, H.; Wang, Z.; Wang, S.; Li, C.; Diao, M. Research on Time Series Prediction Model for Multi-Factor Environmental Parameters in Facilities Based on LSTM-AT-DP Model. Front. Plant Sci. 2025, 16, 1652478. [Google Scholar] [CrossRef]
  65. Zhao, G.; Zhao, Q.; Webber, H.; Johnen, A.; Rossi, V.; Nogueira Junior, A.F. Integrating Machine Learning and Change Detection for Enhanced Crop Disease Forecasting in Rice Farming: A Multi-Regional Study. Eur. J. Agron. 2024, 160, 127317. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of experiment shed and sensors deployment. a: Greenhouse baby sensor NX-WSWW1230 (Green Water, NERCITA); b: Schematic diagram of anemometer; c: weighing system NX-WTS-302 (Green Water, NERCITA).
Figure 2. Distribution of DI under three varieties (three sowing dates in the seedling period).
Figure 3. Flowchart of model development process.
Figure 4. Comparison of environmental variable profiles before and after Kalman filter fusion: (a) Air temperature; (b) Relative humidity; (c) Solar radiation.
Figure 5. Correlation analysis heat map.
Figure 6. Comparison of effects before and after SMOTE-ENN treatment.
Figure 7. Radar chart of baseline model (a) F1-score and (b) AUC value.
Figure 8. Conover test results for (a) F1-score and (b) AUC value.
Figure 9. Classification prediction results of ET model in test set. (a) ROC curve, (b) confusion matrix.
Figure 10. Importance map of SHAP global analysis features for low-severity class.
Figure 11. Importance map of SHAP global analysis features for moderate-severity class.
Figure 12. Importance map of SHAP global analysis features for the high-severity class.
Figure 13. Local dependencies of SHAP for the low-severity class.
Figure 14. Local dependencies of SHAP for the moderate-severity class.
Figure 15. Local dependencies of SHAP for the high-severity class.
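The SHAP analyses of Figures 10–15 require the `shap` package (a tree model would use `shap.TreeExplainer`). As a lightweight, model-agnostic stand-in, permutation importance yields a comparable global feature ranking; a sketch on synthetic data (not the paper's features) with the ET model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import permutation_importance

# shuffle=False keeps the 3 informative columns first, pure noise last
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
clf = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X, y)

# Mean accuracy drop when each feature is shuffled; larger = more important
imp = permutation_importance(clf, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(imp.importances_mean)[::-1]
```

Unlike SHAP, permutation importance gives only a global ranking; the per-class local dependency patterns in Figures 13–15 are specific to SHAP's additive attributions.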
Figure 16. Visual interface for real-time disease prediction. The interface displays risk predictions for different crop diseases over the next seven days. A grid visualization uses color coding (red: high risk; yellow: medium risk; green: low risk), allowing users to quickly assess risk levels and plan corresponding agricultural practices. The interface includes a toggle to switch between time views such as “Real-time,” “Tomorrow,” and “Day after tomorrow”.
Table 1. Variable names and data descriptions.

| Variable Name | Measurement Range & Accuracy | Statistical Parameters | Sampling Interval | Unit |
|---|---|---|---|---|
| Air temperature | −30 to 70 °C, ±0.20 °C | Max: 53.83; Min: 11.61; Mean: 28.04; SD: 8.96 | 15 min | °C |
| Air relative humidity | 0 to 100%, ±2% | Max: 90.58; Min: 4.71; Mean: 40.64; SD: 20.30 | 15 min | % |
| Solar radiation | 0 to 1800 W/m², ±5% | Max: 665.86; Min: 0; Mean: 64.76; SD: 116.85 | 15 min | W/m² |
| Substrate moisture content | 0–100%, ±0.02% | Max: 81; Min: 39; Mean: 68; SD: 8 | 10 min | % |
| Outdoor wind speed | 0 to 67 m/s, ±0.30 m/s | Max: 13.10; Min: 0; Mean: 1.46; SD: 1.76 | 10 min | m/s |
Table 2. Classification criteria of hot pepper damping-off disease.

| Disease Grading | Symptom Description |
|---|---|
| 0 | No visible symptoms |
| 1 | Slight discoloration or faint lesions at the stem base |
| 3 | Distinct lesions at the stem–root junction, but plant growth is unaffected |
| 5 | Lesions or rot covering 1/3 to 1/2 of the stem base or root collar |
| 7 | Complete girdling of the stem base or root collar with discoloration and rot |
| 9 | Whole plant wilts and dies |
Table 3. Feature list derived from the original data, aggregated at 12 h intervals.

| Feature Name | Input Variable | Unit |
|---|---|---|
| Maximum air temperature | max_t | °C |
| Minimum air temperature | min_t | °C |
| Mean air temperature | mean_t | °C |
| Temperature range (max_t − min_t) | dt | °C |
| Duration of temperature >28 °C | tt_28 | minutes |
| Duration of temperature >30 °C | tt_30 | minutes |
| Maximum relative humidity | max_rh | % |
| Minimum relative humidity | min_rh | % |
| Mean relative humidity | mean_rh | % |
| Relative humidity range (max_rh − min_rh) | drh | % |
| Maximum solar radiation | max_sr | W/m² |
| Mean solar radiation | mean_sr | W/m² |
| Maximum substrate moisture content | max_w | % |
| Minimum substrate moisture content | min_w | % |
| Substrate moisture range (max_w − min_w) | de | % |
| Mean wind speed | mean_ws | m/s |
| Duration of non-zero wind speed | wst | minutes |
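The Table 3 features are windowed aggregates of the raw sensor logs. A sketch with pandas on one synthetic day of 15 min temperature readings shows the pattern (feature names follow Table 3; the data and `2025-03-01` start date are invented):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2025-03-01", periods=96, freq="15min")   # one day at 15 min
rng = np.random.default_rng(2)
temp = 22 + 8 * np.sin(np.linspace(0, 2 * np.pi, 96)) + rng.normal(0, 0.3, 96)
df = pd.DataFrame({"t": temp}, index=idx)

r = df["t"].resample("12h")                       # two 12 h windows per day
agg = pd.DataFrame({"max_t": r.max(), "min_t": r.min(), "mean_t": r.mean()})
agg["dt"] = agg["max_t"] - agg["min_t"]           # temperature range
# minutes above 28 °C within each window, cf. tt_28 (15 min per reading)
agg["tt_28"] = df["t"].gt(28).resample("12h").sum() * 15
```

The remaining features (humidity, radiation, substrate moisture, wind) follow the same max/min/mean/range/duration recipe over their own columns.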
Table 4. Description of the model parameters used in this article.

| Model | Important Parameters | Research Task |
|---|---|---|
| RF | n_estimators, max_depth, max_features, min_samples_leaf | Short-term agricultural drought monitoring [38] |
| SVM | C, kernel, gamma | Potato blight disease [20] |
| KNN | n_neighbors, weights | Prediction of droughts [39] |
| NB | alpha | Classifying crop types [40] |
| MLP | hidden_layer_sizes, activation, learning_rate_init | Weather-based rice blast disease forecasting [21] |
| XGBoost | n_estimators, learning_rate, max_depth, gamma, lambda | Predicting the manifestation of Egyptian cotton leaf worm [41] |
| CatBoost | iterations, learning_rate, depth | Early warning of pest-induced crop diseases [19] |
| LightGBM | num_leaves, learning_rate, n_estimators | Early monitoring of cotton Verticillium wilt [42] |
| AdaBoost | n_estimators, learning_rate, base_estimator | Crop yield prediction [43] |
| ET | n_estimators, max_depth, max_features | Seamless terrestrial evapotranspiration estimation [44] |
| BRF | n_estimators, sampling_strategy, base_estimator | Early warning systems for pest-induced crop diseases [19] |
| DT | max_depth, min_samples_split, criterion | Short-term agricultural drought monitoring [38] |
| GBM | n_estimators, learning_rate, max_depth | Predicting aflatoxin contamination in Iowa corn [45] |
| GPC | kernel, optimizer, n_restarts_optimizer | Accurate crop classification [46] |
| OVR-Logistic | C, penalty, solver, max_iter | Predicting individual tree mortality [47] |
| ANN | layers, units, learning_rate_init, dropout | Diseases on canola and dry bean crops [48] |
| RNN | num_layers, hidden_size, units, activation, optimizer, dropout | Prediction of rice blast disease [49] |
| LSTM | num_layers, hidden_size, units, activation, optimizer, dropout | Prediction of the occurrence of cotton pests and diseases [50] |
| GRU | num_layers, hidden_size, units, activation, optimizer, dropout | Prediction of plant sap flow in precision agriculture [51] |
Table 5. Process and results of VIF analysis of variables.

| Input Variable | Original VIF | Final VIF | Select Status |
|---|---|---|---|
| max_t | 3.18 | | save |
| min_t | 2.06 | | save |
| mean_t | 48.29 | | remove |
| dt | | | remove |
| tt_28 | 14.67 | | remove |
| tt_30 | 13 | | remove |
| max_rh | 2.52 | | save |
| min_rh | 2.67 | | save |
| drh | | | remove |
| mean_rh | 23.35 | | remove |
| mean_sr | 8.92 | 3.38 | save |
| max_sr | 4.24 | 2.94 | save |
| max_w | 2.11 | 2.05 | save |
| min_w | 2.12 | 2.02 | save |
| de | 1.84 | 1.75 | save |
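The screening in Table 5 rests on the variance inflation factor, VIF_j = 1/(1 − R²_j), where R²_j comes from regressing feature j on all the others. A self-contained least-squares sketch on synthetic data (not the study's variables):

```python
import numpy as np

def vif(X):
    """VIF per column: regress each column on the others; VIF = 1/(1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])    # design matrix with intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / max(1.0 - r2, 1e-12)          # guard against division by zero
    return out

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.1 * rng.normal(size=200)])  # col 2 nearly duplicates col 0
v = vif(X)
```

Iteratively dropping the worst offender and recomputing, as Table 5 does, shrinks the surviving columns' VIFs (e.g. mean_sr falls from 8.92 to 3.38 once its collinear partners are removed).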
Table 6. Friedman test results.

| Performance Metric | Test Statistic | p-Value | Conclusion |
|---|---|---|---|
| F1-score | 23.7808 | 8.8374 × 10⁻⁵ | Highly significant |
| AUC value | 18.9434 | 8.0633 × 10⁻⁴ | Highly significant |
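The Friedman test of Table 6 compares models by their rank order across shared evaluation folds. A sketch with scipy on invented fold scores (the Conover post-hoc of Figure 8 is available separately, e.g. in the scikit-posthocs package):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical F1-scores of three models over the same ten CV folds
rng = np.random.default_rng(4)
base = rng.normal(0.90, 0.02, size=10)            # shared per-fold difficulty
model_a = base + 0.06 + rng.normal(0, 0.005, 10)  # consistently best
model_b = base + 0.03 + rng.normal(0, 0.005, 10)
model_c = base + rng.normal(0, 0.005, 10)

stat, p = friedmanchisquare(model_a, model_b, model_c)
```

Because the test uses within-fold ranks, it is robust to fold-to-fold difficulty shifts (the shared `base` term) that would inflate an unpaired comparison.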
Table 7. Hyperparameter search range and optimal parameter table of shortlisted models.

| Model | Parameter | Default Value | Search Range | Optimal Value |
|---|---|---|---|---|
| ET | n_estimators | 100 | 5–2000 | 1770 |
| | max_depth | None | 3–200 | 200 |
| | min_samples_split | 2 | 2–60 | 2 |
| | min_samples_leaf | 1 | 1–60 | 1 |
| | max_leaf_nodes | None | 10–200 | 166 |
| | max_features | sqrt | 0.1–1.0 | 0.1 |
| | max_samples | None | sqrt\log2\None | None |
| CatBoost | learning_rate | 0.1 | 0.001–0.1 | 0.0986 |
| | depth | 6 | 4–15 | 14 |
| | iterations | 1000 | 10–60 | 51 |
| | l2_leaf_reg | 3 | 1–10 | 1 |
| | random_strength | 1 | 1–10 | 3.0605 |
| | border_count | 255 | 5–15 | 14 |
| SVM | C | 1.0 | 0.001–1000 | 1000 |
| | gamma | scale | 1 × 10⁻⁵–10 | 0.0159 |
| | kernel | rbf | linear\rbf\poly | rbf |
| GPC | kernel_type | RBF | RBF\Matern | Matern |
| | length_scale | 1.0 | 0.001–10 | 0.0054 |
| | alpha | 0 | 1 × 10⁻⁷–1 | 0.0287 |
| | n_restarts | 0 | 2–60 | 37 |
| | max_iter_predict | 100 | 20–300 | 145 |
| LightGBM | learning_rate | 0.1 | 0.003–0.1 | 0.1 |
| | max_depth | −1 | 3–15 | 3 |
| | n_estimators | 100 | 100–1000 | 1000 |
| | subsample | 1.0 | 0.6–1.0 | 0.8845 |
| | colsample_bytree | 1.0 | 0.6–1.0 | 0.7035 |
| | lambda_l1 | 0 | 0.003–1.0 | 0.6485 |
| | lambda_l2 | 0 | 0.003–1.0 | 0.0213 |
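The tuning in Table 7 used Bayesian Optimization. As a dependency-light stand-in, the same search-space idea can be sketched with scikit-learn's `RandomizedSearchCV` on the ET model and a synthetic three-class task; the ranges below loosely follow Table 7 but are shortened to keep the example fast, so this is not the paper's actual search:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the three-severity classification task
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

param_dist = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 50],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", 0.5, 1.0],
}
search = RandomizedSearchCV(
    ExtraTreesClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=8, cv=3, scoring="f1_macro", random_state=0,
)
search.fit(X, y)
best_f1 = search.best_score_
```

Bayesian Optimization differs by modelling the score surface and proposing the next trial where expected improvement is highest, which is why it typically needs fewer evaluations than random sampling over the same ranges.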
Table 8. Performance comparison of the five models before and after hyperparameter tuning. The Extreme Trees (ET) model (in bold) was selected as the optimal model for the final analysis.

| Model | F1-Score (Before Tuning) | F1-Score (After Tuning) | F1-Score (Test Set) | AUC (Before Tuning) | AUC (After Tuning) | AUC (Test Set) |
|---|---|---|---|---|---|---|
| **ET** | **0.9754** | **1** | **0.9734** | **0.9984** | **1** | **0.9969** |
| CatBoost | 0.9441 | 1 | 0.9504 | 0.9957 | 1 | 0.9988 |
| SVM | 0.9357 | 1 | 0.9734 | 0.9948 | 1 | 0.9877 |
| GPC | 0.9475 | 0.9687 | 0.8738 | 0.9938 | 0.9985 | 0.9766 |
| LightGBM | 0.9379 | 1 | 0.9457 | 0.9878 | 1 | 0.9716 |
Table 9. Prevention and control measures described in original dimensions after de-normalization.

| Stage of Intervention | Control Measures |
|---|---|
| Low severity | When the minimum air temperature falls below 20.01 °C, the minimum substrate moisture content should be maintained above 73.2% to ensure disease control. |
| Moderate severity | A maximum substrate moisture content greater than 78.7% significantly inhibits disease progression. |
| High severity | When relative humidity exceeds 67.94% and the minimum temperature drops below 19.24 °C, environmental humidity should be actively reduced to mitigate disease risk. |
| Full stage | When solar radiation exceeds 262.1 W/m², precise water management becomes critical: irrigation volume and humidity control must be dynamically coordinated. |
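The Table 9 thresholds translate directly into an alerting rule of the kind the Figure 16 interface could surface. A hypothetical helper follows: the function name, argument names, and rule structure are ours; only the numeric thresholds come from Table 9.

```python
def management_advice(min_t, min_w, max_w, rh, sr):
    """Map environment readings to Table 9 intervention rules (illustrative sketch).

    min_t: minimum air temperature (deg C); min_w/max_w: min/max substrate
    moisture (%); rh: relative humidity (%); sr: solar radiation (W/m^2).
    """
    advice = []
    if min_t < 20.01 and min_w < 73.2:
        advice.append("low severity: raise minimum substrate moisture above 73.2%")
    if max_w <= 78.7:
        advice.append("moderate severity: raise maximum substrate moisture above 78.7%")
    if rh > 67.94 and min_t < 19.24:
        advice.append("high severity: actively reduce environmental humidity")
    if sr > 262.1:
        advice.append("full stage: coordinate irrigation volume and humidity control")
    return advice

tips = management_advice(min_t=18.5, min_w=70.0, max_w=75.0, rh=70.0, sr=300.0)
```

For example, the cool, humid, low-moisture reading above trips all four rules, while a warm day with well-watered substrate returns an empty list.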
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wang, Z.; Liu, K.; Liang, L.; Li, C.; Ji, T.; Xu, J.; Liu, H.; Diao, M. An Explainable Machine Learning Framework for the Hierarchical Management of Hot Pepper Damping-Off in Intensive Seedling Production. Horticulturae 2025, 11, 1258. https://doi.org/10.3390/horticulturae11101258