Article

Early Detection of Chinese Cabbage Clubroot Based on Integrated Leaf Multispectral Imaging and Machine Learning

1 Ministry of Education of China-Hebei Province Joint Innovation Center for Efficient Green Vegetable, Key Laboratory of Vegetable Germplasm Innovation and Utilization of Hebei, College of Horticulture, Hebei Agricultural University, Baoding 071000, China
2 College of Mechanical & Electrical Engineering, Hebei Agricultural University, Baoding 071000, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Horticulturae 2025, 11(11), 1335; https://doi.org/10.3390/horticulturae11111335
Submission received: 23 August 2025 / Revised: 29 October 2025 / Accepted: 3 November 2025 / Published: 5 November 2025
(This article belongs to the Section Biotic and Abiotic Stress)

Abstract

Clubroot, caused by Plasmodiophora brassicae, is a destructive disease of Chinese cabbage (Brassica rapa ssp. pekinensis) at all growth stages. Early detection of the disease is essential to mitigate its impact. Here, we established an optimal algorithm combining multispectral imaging with machine learning to detect leaf responses of the highly susceptible cultivar YoulvNo.3 at different days after inoculation (DAI). Spectral data at 19 wavelengths were collected from leaf multispectral images, and key characteristic wavelengths were then extracted. Principal Component Analysis (PCA) revealed a clear separation between healthy and infected samples at 11 DAI. Four classification algorithms, Random Forest (RF), Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), and Extreme Learning Machine (ELM), were employed to construct early detection models for clubroot. SVM achieved over 81% accuracy with full-spectrum data, while ELM based on characteristic wavelengths provided the best performance, with accuracy exceeding 84%. Stratified five-fold cross-validation of the optimal model yielded an average accuracy of 83.79% (±1.04%) and a macro-averaged F1-score of 82.13% (±1.12%) across validation folds, confirming stable performance. Our findings identify, for the first time, detectable spectral differences between healthy and infected plants at 11 DAI using leaf multispectral imaging combined with machine learning, providing a potential tool for the early detection and timely control of clubroot in Chinese cabbage.

1. Introduction

Chinese cabbage (Brassica rapa L. ssp. pekinensis), originating in China, is an important leafy vegetable of the genus Brassica within the family Brassicaceae [1]. It is widely planted in China and Southeast Asian countries owing to its rich nutritional value, ease of cultivation, and resilience during storage and transportation. Clubroot, a serious soil-borne disease caused by Plasmodiophora brassicae, infects the roots of cruciferous crops [2]. In recent years, clubroot has become a major disease of Chinese cabbage, with a rapidly expanding incidence area; it seriously reduces yield and quality, leading to substantial economic losses. The resting spores of P. brassicae can survive in soil for more than 20 years and are generally difficult to eliminate [3]. As a result, effective control of clubroot in agricultural production remains challenging [4].
The impact of clubroot on crop yield varies with disease type and with the growth stage at which infection occurs. Early detection is critical for implementing timely interventions to minimize yield losses. However, detecting P. brassicae infection at early stages remains challenging: by the time visible symptoms emerge, the root system has already suffered severe damage, leading to considerable economic losses. Therefore, achieving early detection and effective prevention of clubroot is essential. Conventional diagnostic methods include field observation, symptom diagnosis, and molecular techniques such as polymerase chain reaction (PCR) and other assays for pathogen detection [5,6,7,8,9]. These methods, however, involve multiple steps, including soil sampling, separation of roots from soil, washing, diagnosis, and tissue sampling, which makes them labor-intensive, time-consuming, costly, and destructive to plant tissue [10]. Consequently, there is a growing need for early diagnostic techniques that address these limitations.
Plants respond to biotic and abiotic stresses through alterations in both external morphology and internal physiology [11]. For example, drought stress on plant roots can lead to leaf yellowing and wilting. Root diseases can likewise induce visible or invisible changes in the aerial parts of plants. Previous research has established that aboveground plant characteristics can reflect belowground conditions, enabling disease detection. Various technologies, including X-ray computed tomography, nuclear magnetic resonance imaging, minirhizotron systems, and spectral imaging, have been employed to investigate this relationship [12]. Among these, spectral imaging techniques, particularly multispectral and hyperspectral imaging, combine the advantages of conventional imaging and spectroscopy by capturing spatial and spectral information from target objects simultaneously [13], making them particularly valuable for plant disease detection.
Multispectral imaging captures data from a limited number (typically 5–20) of discrete spectral bands, enabling rapid and cost-effective detection, whereas hyperspectral imaging records data from hundreds of contiguous narrow bands, providing higher spectral resolution but at greater cost and computational demand. For example, Heath et al. [14] examined changes in canopy reflectance of potatoes infested with potato cyst nematodes. Feng et al. [15] employed visible/near-infrared hyperspectral imaging coupled with convolutional neural networks to identify aboveground phenotypic changes caused by P. brassicae. Under controlled conditions, Hillnhütter et al. [16] successfully utilized hyperspectral imaging to detect and evaluate symptoms of Heterodera schachtii (nematode) and Rhizoctonia solani (fungus) infections in sugar beet. Jayapal et al. [17] employed deep learning on RGB images of leaves to diagnose root rot in Korean ginseng.
Plant spectral characteristics represent integrated responses to environmental influences during growth. Previous studies have shown that the spectral features of plants affected by fungi, diseases, or insect pests differ significantly from those of healthy plants [18]. Leaves, in particular, contain rich information related to plant health, and changes in their optical properties can be leveraged for infection detection [19]. Multispectral imaging is high-throughput and accurate, and it acquires spectral and spatial data from target objects simultaneously. This capability makes it possible to infer belowground disease status from aboveground spectral information [20].
Several studies demonstrate the practical implementation of this approach. For instance, Feng et al. [21] utilized multispectral imaging to extract information from rice leaves and canopies, establishing an efficient model for detecting and grading rice leaf blast. Hou et al. [22] applied an ant colony clustering algorithm with multispectral image overlays to detect grape leaf curl disease. Bebronne et al. [23] employed multispectral imaging to assess the severity of stripe rust, brown rust, and black spots in wheat, finding that an artificial neural network algorithm yielded the best recognition performance. Zhang et al. [24] identified drought stress in soybean canopies using multispectral imaging and developed a support vector machine recognition model achieving 96.87% accuracy. Lizarazo et al. [25] integrated UAV-based multispectral imaging with a gradient boosting algorithm to classify the severity of Verticillium wilt infection in potato, achieving stable classification results. Collectively, these studies affirm the reliability of multispectral imaging for plant disease detection.
In addition to convolutional neural networks (CNNs), Support Vector Machines (SVMs), and gradient boosting, several other machine learning methods, such as Random Forest (RF), Partial Least Squares Discriminant Analysis (PLS-DA), and Extreme Learning Machine (ELM), have also been employed in plant disease detection. RF has demonstrated robust performance in quantifying wheat stripe rust severity using hyperspectral data [26]. PLS-DA has been widely adopted in optical spectroscopy for disease discrimination and is frequently used in plant spectral studies [27]. ELM has been utilized in hyperspectral analysis and plant leaf disease classification, showing fast training and competitive accuracy compared with traditional classifiers [28]. Notably, SVM has been widely applied in spectral-based plant disease recognition and diagnostic studies owing to its strong generalization ability and effectiveness in handling high-dimensional nonlinear data [29].
Despite these methodological advances, most existing studies have focused on agronomic crops such as wheat, rice, and maize, with relatively limited attention paid to cruciferous vegetables. Specifically, early detection of clubroot in Chinese cabbage remains underexplored. Moreover, current approaches predominantly focus on disease identification at later stages rather than early diagnosis, and systematic comparisons of different machine learning algorithms are often lacking. These research gaps underscore the need for an efficient and robust approach for early and rapid detection of clubroot in Chinese cabbage.
The objective of this study was to develop an effective model for the early and rapid detection of clubroot in Chinese cabbage using multispectral imaging. To achieve this, we monitored changes in the leaf spectral reflectance of seedlings during P. brassicae infection and applied the Successive Projections Algorithm (SPA) and Principal Component Analysis (PCA) to extract informative features. By integrating multiple machine learning methods with spectral features, this study demonstrates for the first time that leaf multispectral imaging, combined with a systematic comparison of classification algorithms, enables reliable early detection of clubroot, with potential application in practical disease monitoring.

2. Materials and Methods

2.1. Experimental Materials

The highly clubroot-susceptible Chinese cabbage variety YoulvNo.3 was used in this study. The inoculum consisted of a mixed field population of P. brassicae, primarily comprising the predominant pathotype race 4 and race 1. Chinese cabbage roots infected with P. brassicae and filled with resting spores were stored at −20 °C until use. Seeds of YoulvNo.3 and the P. brassicae inoculum were provided by the Key Laboratory of Vegetable Germplasm Innovation and Utilization in Hebei Province.

2.2. Preparation of P. brassicae Suspension

The clubroot-infected roots were removed from the −20 °C freezer, thawed, and left to decay fully at 25 °C. Spores were extracted from the decayed root material by pressing, filtering, and centrifugation. The spore concentration was adjusted to 1 × 10⁸ spores/mL with sterile water, and the suspension was stored at 4 °C for short-term use.
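Adjusting the suspension to a target concentration is a simple dilution (C₁V₁ = C₂V₂). The sketch below assumes that the stock concentration has already been measured (the counting method is not stated in the text); the function name, arguments, and the 100 mL final volume are hypothetical choices for illustration only.

```python
def dilution_volumes(stock_conc, target_conc=1e8, final_volume_ml=100.0):
    """Return (stock_ml, water_ml) needed to dilute a spore stock to target_conc.

    Concentrations are in spores/mL; uses C1 * V1 = C2 * V2.
    """
    if stock_conc <= 0 or target_conc <= 0:
        raise ValueError("concentrations must be positive")
    if stock_conc < target_conc:
        raise ValueError("stock is more dilute than the target; concentrate it first")
    # Volume of stock that contains the required number of spores
    stock_ml = target_conc * final_volume_ml / stock_conc
    # Remaining volume is made up with sterile water
    return stock_ml, final_volume_ml - stock_ml
```

For example, a hypothetical stock at 5 × 10⁸ spores/mL would need 20 mL of stock plus 80 mL of sterile water to give 100 mL at 1 × 10⁸ spores/mL.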

2.3. Inoculation and Disease Assessment

YoulvNo.3 seeds were sown in a 50-cell matrix tray, one seed per cell. On the sixth day after sowing, 300 plants were inoculated by injecting 3 mL of P. brassicae spore suspension near the root zone, and 150 control plants each received 3 mL of sterile water. All plants were then placed in an artificial climate chamber at 25 °C (day) and 20 °C (night) with a 16 h light/8 h dark photoperiod. Substrate moisture was maintained at appropriate levels throughout the experiment.
Disease severity levels were evaluated 42 days after inoculation and classified into four grades (0–3) based on the following criteria [30]:
  • Grade 0: Normal root development, no galls;
  • Grade 1: No galls on the taproot, but small galls present on lateral roots;
  • Grade 2: Small galls on the taproot and larger galls on lateral roots;
  • Grade 3: Large galls on both taproot and lateral roots, accompanied by plant wilting.
Disease incidence and disease index were calculated as follows:
Incidence (%) = (No. of diseased plants/No. of total plants) × 100
Disease index = Σ(No. of diseased plants at each grade × corresponding grade value)/(No. of total plants × highest disease grade) × 100
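The two formulas above can be computed directly from the number of plants scored at each grade. This is a minimal sketch assuming a 0–3 scale in which every plant above grade 0 counts as diseased; the function name is hypothetical and the counts in the example are made up, not data from this study.

```python
def incidence_and_disease_index(counts_by_grade):
    """counts_by_grade[g] is the number of plants scored at grade g (g = 0..3)."""
    total = sum(counts_by_grade)
    if total == 0:
        raise ValueError("no plants scored")
    max_grade = len(counts_by_grade) - 1          # highest grade (3 on a 0-3 scale)
    diseased = total - counts_by_grade[0]          # any grade above 0 is diseased
    incidence = diseased / total * 100
    # Sum of (plants at grade g) x g over all grades
    weighted = sum(grade * n for grade, n in enumerate(counts_by_grade))
    disease_index = weighted / (total * max_grade) * 100
    return incidence, disease_index
```

With hypothetical counts of 10, 20, 30, and 40 plants at grades 0–3, incidence is 90% and the disease index is 200/(100 × 3) × 100 ≈ 66.7.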

2.4. Multispectral Data Acquisition

In this experiment, a total of 150 healthy plants and 300 plants inoculated with P. brassicae were cultivated. Multispectral images were acquired at 10 time points after inoculation (4, 7, 11, 15, 19, 23, 27, 32, 37, and 42 days after inoculation [DAI]). Some inoculated plants died during the experiment, resulting in a final dataset of 3820 valid spectral samples (1500 healthy and 2320 infected). To avoid information leakage, the spectral data were split by plant ID into training and test sets at a 7:3 ratio, so that spectra from the same plant never appeared in both subsets.
Multispectral imaging was performed using a VideometerLab4 system (Videometer A/S, Hørsholm, Denmark). The device consists of a hollow integrating sphere with a white inner coating and is equipped with multiple light-emitting diodes (LEDs). A CCD multi-band camera with a resolution of 2192 × 2192 pixels, mounted at the top of the sphere, captures images in 19 spectral bands: 365 nm, 405 nm, 430 nm, 450 nm, 470 nm, 490 nm, 515 nm, 540 nm, 570 nm, 590 nm, 630 nm, 645 nm, 660 nm, 690 nm, 780 nm, 850 nm, 880 nm, 940 nm, and 970 nm, covering the ultraviolet, visible, and near-infrared regions.
Before image acquisition, instrument and illumination calibration procedures were performed. The multispectral camera was focused to ensure the image was clear and free of distortion. The descent height of the imager was set to 20 cm, and the height was adjusted to 90 cm. Chinese cabbage leaves were sampled in vivo on the 4th, 7th, 11th, 15th, 19th, 23rd, 27th, 32nd, 37th, and 42nd days after inoculation. Each seedling was placed upside down inside the closed integrating sphere. The largest and clearest leaf was selected and placed on the sample table. Multispectral images were captured by triggering the capture function (F12 command), yielding 19 reflectance images at the respective wavelengths. Spectral band data were extracted using the VideometerLab software (version 3.12.15). The flowchart of the spectral data research process is shown in Figure 1.

2.5. Multispectral Data Preprocessing

To reduce the influence of light scattering, noise, and background interference and to improve the signal-to-noise ratio, the multispectral data were preprocessed before model development. Spectral data were smoothed using the Savitzky–Golay (SG) method in MATLAB 2019b. The SG algorithm fits a low-order polynomial by least squares within a moving window, which smooths the spectra and reduces noise interference [31].
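A minimal Python sketch of the SG step using SciPy's `savgol_filter` is shown below. The study used MATLAB, and its window length and polynomial order are not reported, so `window_length=5` and `polyorder=2` are assumed values; the 19 bands are also treated as equally spaced points, which is a simplification because the wavelengths are not evenly spaced.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=5, polyorder=2):
    """Savitzky-Golay smoothing of each spectrum (row) in an (n_samples, n_bands) array.

    window_length and polyorder are assumptions, not values from the study.
    """
    spectra = np.asarray(spectra, dtype=float)
    # axis=1 smooths along the band dimension of every sample independently
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, axis=1)
```

Because SG fits local polynomials, any spectrum that is already a polynomial of degree ≤ `polyorder` passes through unchanged, while high-frequency noise is attenuated.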

2.6. Dimension Reduction Analysis of Multispectral Data

Multispectral images comprise high-dimensional data with redundant information, which can lead to instability in the classification task.
Principal Component Analysis (PCA) is an unsupervised linear dimensionality-reduction technique that transforms potentially correlated high-dimensional variables into linearly uncorrelated variables through an orthogonal transformation. PCA retains the dominant sources of variance while reducing dimensionality, which suppresses noise and minor features and improves the robustness of subsequent statistical methods [32,33], with the ultimate aim of improving model accuracy.
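The PCA step can be sketched with scikit-learn as follows. This is an illustration, not the authors' MATLAB code: the data are only mean-centred (scikit-learn's default), since whether the original analysis also scaled each band is not stated, and the function name is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_scores(spectra, n_components=2):
    """Project (n_samples, n_bands) spectra onto the leading principal components.

    Returns the PC scores (for score plots) and the fraction of variance
    explained by each component.
    """
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(np.asarray(spectra, dtype=float))
    return scores, pca.explained_variance_ratio_
```

Plotting the two score columns against each other, coloured by DAI, gives the kind of PC1/PC2 scatter used to judge when healthy and infected samples separate.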
The Successive Projections Algorithm (SPA) is a variable selection technique that identifies spectral variables with minimal collinearity. Its selection criterion is based on choosing each new variable as the one with the maximum projection value in the orthogonal subspace of previously selected variables [34]. In this study, SPA was applied to extract characteristic wavelengths highly correlated with clubroot infection in Chinese cabbage.
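The projection chain at the core of SPA can be sketched in NumPy as below. This is a simplified version under stated assumptions: the starting band and the number of bands are supplied by the caller, whereas the full SPA procedure usually selects both by comparing the validation error of regression models built on candidate subsets. The function name is hypothetical.

```python
import numpy as np


def spa_select(X, n_select, start):
    """Successive projections chain over the columns (bands) of X.

    X: (n_samples, n_bands) spectra; start: index of the first band.
    Each new band is the one whose projection onto the orthogonal complement
    of the previously selected band has the largest norm.
    """
    Xp = np.array(X, dtype=float, copy=True)
    if n_select > min(Xp.shape):
        raise ValueError("cannot select more bands than the rank allows")
    selected = [start]
    for _ in range(n_select - 1):
        last = Xp[:, selected[-1]]
        denom = last @ last
        if denom <= 1e-12:
            raise ValueError("remaining columns are linearly dependent")
        # Remove from every column its component along the last selected column
        Xp = Xp - np.outer(last, (last @ Xp) / denom)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0            # never re-select a chosen band
        selected.append(int(np.argmax(norms)))
    return selected
```

Because each step works in the subspace orthogonal to the bands already chosen, the selected wavelengths have minimal collinearity, which is the property the paragraph above describes.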

2.7. Model Construction and Evaluation

In multispectral image analysis, understanding both the qualitative and quantitative relationships between spectral data and sample attributes is essential. Classification rules learned from the spectral data are used to assign samples to categories, and the quality of these rules is critical to the recognition performance of the model [35]. In this study, a detection model for Chinese cabbage clubroot was developed in MATLAB 2019b using four machine learning methods: RF, PLS-DA, SVM, and ELM.
For model construction, the hyperparameters of each algorithm were set according to commonly used values in the literature and empirical experience, without further optimization. Specifically, the SVM model employed a radial basis function (RBF) kernel with a penalty parameter (C = 100) and kernel coefficient (γ = 0.01). The RF model was configured with 500 decision trees, a maximum depth of 10, and the Gini index as the splitting criterion. The PLS-DA model retained 10 latent variables for discrimination, while the ELM model incorporated 100 hidden neurons with sigmoid activation function and random initialization of input weights. All hyperparameters remained unchanged throughout the study to ensure methodological reproducibility.
To avoid information leakage and maintain experimental validity, the dataset was first partitioned by plant ID, allocating 70% of samples to the training set and 30% to the test set. During partitioning, a fixed random seed was applied to guarantee reproducibility, while stratified sampling maintained consistent distribution between healthy and infected plants across both sets. This initial split provided the basis for comparative model evaluation and optimal model selection.
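A plant-level 70/30 split with stratification and a fixed seed can be sketched as follows: plant IDs, not individual spectra, are split, and every spectrum then follows its plant. The sketch assumes that each plant carries a single label (control vs. inoculated); the seed value and function name are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_by_plant(plant_ids, labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx) sample indices with no plant in both sets."""
    plant_ids = np.asarray(plant_ids)
    labels = np.asarray(labels)
    uniq, first_idx, inverse = np.unique(plant_ids, return_index=True,
                                         return_inverse=True)
    plant_labels = labels[first_idx]
    # Assumption check: every spectrum of a plant must share the plant's label
    if np.any(labels != plant_labels[inverse]):
        raise ValueError("each plant must have a single label")
    # Stratify at plant level so both sets keep the healthy/infected proportion
    train_plants, _ = train_test_split(uniq, test_size=test_size,
                                       stratify=plant_labels, random_state=seed)
    train_mask = np.isin(plant_ids, train_plants)
    return np.flatnonzero(train_mask), np.flatnonzero(~train_mask)
```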
To further evaluate the robustness and generalization capability of the optimal model, and to mitigate potential overfitting caused by class imbalance, a stratified five-fold cross-validation strategy was adopted. Specifically, the dataset was randomly shuffled and stratified by class labels to ensure that each fold preserved the same class distribution as the entire dataset. The dataset was then evenly divided into five mutually exclusive subsets. In each iteration, one subset was used as the validation set, while the remaining four subsets were used for training. This procedure was repeated five times, ensuring every sample was included in the validation set exactly once.
During training, the model was fitted on the training folds and evaluated on the corresponding validation fold. For each fold, classification performance was assessed using accuracy and macro-averaged precision, recall, and F1-score. The mean and standard deviation of each metric across the five folds were then calculated to reflect the overall robustness and generalization ability of the model. Model performance was visualized using a binary confusion matrix, as illustrated in Figure 2.
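The stratified five-fold procedure and the per-fold macro metrics can be sketched with scikit-learn. Assumptions: `model_factory` is any callable returning a fresh classifier with `fit`/`predict`, the seed is arbitrary, and the standard deviation uses NumPy's default (population, `ddof=0`) because the text does not say which was used. As in the described procedure, samples (not plants) are stratified into folds; scikit-learn's `StratifiedGroupKFold` would be the option if plant-level grouping were also wanted.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


def cv_report(model_factory, X, y, n_splits=5, seed=0):
    """Mean and std of [accuracy, macro precision, macro recall, macro F1]."""
    X, y = np.asarray(X), np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, val_idx in skf.split(X, y):
        model = model_factory()                   # fresh, untrained model per fold
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        rows.append([
            accuracy_score(y[val_idx], pred),
            precision_score(y[val_idx], pred, average="macro", zero_division=0),
            recall_score(y[val_idx], pred, average="macro", zero_division=0),
            f1_score(y[val_idx], pred, average="macro", zero_division=0),
        ])
    rows = np.array(rows)
    return rows.mean(axis=0), rows.std(axis=0)
```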
In Figure 2, TP (True Positive) indicates the number of correctly predicted positive cases, TN (True Negative) represents correctly predicted negative cases, FP (False Positive) denotes negative cases incorrectly predicted as positive, and FN (False Negative) refers to positive cases incorrectly predicted as negative. Based on the confusion matrix, higher-level evaluation metrics can be derived, including accuracy, precision, recall, and F1-score. The formulas for these metrics are as follows:
Accuracy = (TP + TN)/(TP + FP + TN + FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F1-score = 2 × Precision × Recall/(Precision + Recall)
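Recall is equivalent to the true positive rate (sensitivity). For the macro-averaged metrics, precision, recall, and F1-score are computed separately with each class treated in turn as the positive class, and the per-class values are then averaged without weighting.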
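The formulas above translate directly into code. In this sketch, returning 0.0 when a denominator is zero is an assumed convention, and the function names are hypothetical. The macro version simply swaps the roles of the two classes and averages.

```python
def _safe_div(num, den):
    # Assumed convention: report 0.0 when a denominator is zero
    return num / den if den else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall (true positive rate) and F1 from a 2x2 confusion matrix."""
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": _safe_div(2 * precision * recall, precision + recall),
    }


def macro_metrics(tp, tn, fp, fn):
    """Unweighted mean of per-class precision, recall and F1 for two classes."""
    pos = binary_metrics(tp, tn, fp, fn)   # infected class treated as positive
    neg = binary_metrics(tn, tp, fn, fp)   # healthy class treated as positive
    return {k: (pos[k] + neg[k]) / 2 for k in ("precision", "recall", "f1")}
```

For a hypothetical matrix with TP = 8, TN = 5, FP = 2, and FN = 1, accuracy is 13/16 = 0.8125, precision is 0.8, recall is 8/9, and F1 is 16/19.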
The main workflow of this study is summarized in the flowchart in Figure 3.

3. Results

3.1. Disease Identification

After each multispectral image acquisition, six additional plants from each treatment were randomly selected to evaluate root disease progression (Figure 4). No disease symptoms were observed in the roots of either inoculated or control plants before 15 DAI. From 15 DAI, very slight swelling appeared on the lateral roots of plants inoculated with P. brassicae, and these symptoms progressively intensified at subsequent observations. By 42 DAI, nearly all inoculated plants exhibited gall formation, with the majority showing pronounced swelling: the disease incidence reached 99.67%, and the disease index was 96.00%.
Custom regions of interest (ROIs) were selected on the leaves, and the band reflectance of control and P. brassicae-inoculated plants was extracted. Figure 5 shows the average leaf spectral reflectance of healthy and infected plants from 4 DAI to 19 DAI. Both displayed the typical spectral pattern of green vegetation: strong green reflectance near 550 nm and the strongest chlorophyll absorption near 680 nm. The reflectance curves of healthy and infected leaves followed the same trend, with identical peak and trough positions: a reflection peak around 570 nm, a reflection valley around 680 nm, and a sharp increase within the 700–800 nm range. From 11 DAI, however, reflectance differed in the 500–600 nm and 780–970 nm regions, with infected leaves showing significantly higher reflectance than healthy leaves.
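Extracting a mean spectrum from an ROI amounts to averaging, band by band, the pixels inside a leaf mask. The sketch below is illustrative only: it assumes the 19 reflectance images are available as a NumPy array of shape (bands, height, width) with a boolean ROI mask, which may differ from how the VideometerLab software actually exports data; the names are hypothetical.

```python
import numpy as np

# The 19 band centres (nm) listed in the Methods
WAVELENGTHS_NM = (365, 405, 430, 450, 470, 490, 515, 540, 570, 590,
                  630, 645, 660, 690, 780, 850, 880, 940, 970)


def roi_mean_spectrum(cube, mask):
    """Mean reflectance per band over the pixels selected by a boolean ROI mask.

    cube: array of shape (n_bands, height, width); mask: bool array (height, width).
    """
    cube = np.asarray(cube, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if cube.shape[1:] != mask.shape:
        raise ValueError("mask must match the spatial shape of the cube")
    if not mask.any():
        raise ValueError("ROI mask selects no pixels")
    # cube[:, mask] has shape (n_bands, n_roi_pixels)
    return cube[:, mask].mean(axis=1)
```

Averaging these per-leaf spectra within each treatment and DAI gives group mean curves of the kind compared in Figure 5.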

3.2. Multispectral Preprocess and Characteristic Wavelength Extraction

The original spectral data contain noise from the instrument and the environment. SG smoothing removes irrelevant signal fluctuations from the original spectra while preserving useful information (Figure 6): compared with the original curves, the processed curves are smoother, with markedly reduced noise.
The successive projections algorithm (SPA) was used to extract characteristic wavelengths containing information related to Chinese cabbage clubroot. A total of 11 characteristic wavelengths were selected: 365 nm, 430 nm, 470 nm, 490 nm, 515 nm, 540 nm, 590 nm, 690 nm, 780 nm, 850 nm, and 880 nm (Figure 7). The remaining 8 wavelengths were discarded as redundant, which simplifies model construction to some extent.

3.3. Principal Component Analysis (PCA) of Multispectral Data

PCA was used to explore the spectral reflectance of healthy and infected plant leaves. The first two principal components (PCs) together explained 95.4% of the spectral variance, with PC1 accounting for 69.1% and PC2 for 26.3% (Figure 8). The score plot showed distinct distribution patterns for samples at 4 and 7 DAI versus samples from 11 to 42 DAI. Samples at 4 and 7 DAI were distributed mainly near the axes of the first and second quadrants, in a highly overlapping region. As P. brassicae infection progressed, samples from 11 to 42 DAI gradually moved towards the third and fourth quadrants, overlapping heavily near the origin and in the third quadrant (Figure 8). Consequently, 11 DAI was used as the dividing point, and the spectral data were partitioned into two groups: uninfected samples before 11 DAI and infected samples from 11 to 42 DAI. These results indicate that clubroot infection in Chinese cabbage becomes spectrally detectable at 11 DAI.

3.4. Machine Learning Classification Results

Recognition models were constructed with the RF, PLS-DA, SVM, and ELM methods using full-band reflectance data. Multispectral images were collected at 10 time points after inoculation with P. brassicae, yielding spectral data from a total of 3820 samples. To evaluate model performance, the spectral data collected at each time point were randomly divided into training and test sets at a 7:3 ratio.
Figure 9a,b show the confusion matrices of the RF model. The RF model achieved an accuracy of 99.89%, a precision of 99.90%, and a recall of 99.81% on the training set, but an accuracy of 82.45%, a precision of 80.89%, and a recall of 72.44% on the test set. This large gap between training and test performance indicates that the RF model was unstable.
Figure 9c,d show the confusion matrices of the PLS-DA model. The training-set accuracy and precision of the PLS-DA model were 80.85% and 77.22%, respectively, and the test-set accuracy and precision were 80.80% and 77.52%, respectively. For the SVM model, the training-set accuracy, precision, and recall were 84.74%, 80.74%, and 98.27%, respectively, and the test-set values were 81.85%, 77.94%, and 96.11%, respectively (Figure 9e,f). The SVM metrics were consistent between the training and test sets and higher than those of the PLS-DA model.
Figure 9g,h show the confusion matrices of the ELM model. The training-set accuracy, precision, and recall of the ELM model were 82.01%, 74.25%, and 99.44%, respectively, and the test-set values were 78.36%, 73.90%, and 99.42%, respectively. The test-set recognition performance of this model was relatively poor compared with that on the training set.
The F1-score was further used to evaluate the four models (Table 1). The SVM model achieved an F1-score of 0.8865 on the training set and 0.8604 on the test set, the best performance among the four models, followed by the PLS-DA model.

3.5. Early Detection Model Results Based on Characteristic Bands

Recognition models were further constructed with the RF, PLS-DA, SVM, and ELM algorithms using characteristic-band reflectance data; the performance of each model is shown in Table 2. All evaluation indexes of the RF model were close to 100% on the training set and markedly higher than on the test set, indicating overfitting. The test-set performance of the PLS-DA model was relatively poor compared with its training-set performance, and the SVM model showed low recognition accuracy on both the training and test sets. The accuracy, precision, recall, and F1-score of the ELM model were 84.74%, 88.98%, 69.87%, and 0.7828 on the training set and 84.28%, 87.29%, 70.22%, and 0.7783 on the test set, respectively. These were the best results among the four models, with little difference between training and test performance. Thus, the ELM model was the best of the four characteristic-band models and exceeded the full-band models in test accuracy.

3.6. Performance of SPA-ELM in Early Prediction of Clubroot Disease Based on Five-Fold Cross-Validation

In this study, the best-performing SPA-ELM model was evaluated using stratified five-fold cross-validation to verify its generalization capability and reduce the risk of overfitting resulting from class imbalance. The results demonstrated that the model exhibited stable and reliable performance in detecting clubroot in Chinese cabbage based on characteristic bands from multispectral data. On the training folds, the model achieved an average accuracy of 84.69% (±0.69%), precision of 85.81% (±0.84%), recall of 82.13% (±0.79%), and F1-score of 83.93% (±0.75%). On the validation folds, it achieved an average accuracy of 83.79% (±1.04%), precision of 84.80% (±1.21%), recall of 81.19% (±1.30%), and F1-score of 82.13% (±1.12%).
All four evaluation metrics remained above 80% on both the training and validation sets, indicating that the optimal SPA-ELM model is robust and generalizes well. These results show that combining multispectral imaging with the SPA-ELM algorithm provides an effective approach for the early prediction of clubroot in Chinese cabbage, with potential practical value for accurate and reliable disease monitoring.

4. Discussion

Detecting root diseases is more challenging than detecting diseases of aboveground organs [36]. By the time symptoms become visible on the leaves, root disease has often progressed to a serious level. In terms of spectral reflectance, differences in the visible range (400–700 nm) are associated with the concentration of intracellular pigments [37]. The results of this study suggest that P. brassicae infection may affect the synthesis of leaf pigments. In addition, the near-infrared range (700–1300 nm) is related to cell structure and water content [38]. In this study, differences between inoculated and control plants were observed at 780–970 nm, suggesting that root swelling impaired water and nutrient uptake, which in turn altered the physiological phenotype of the leaves.
Spectral imaging has shown considerable promise for plant disease detection. For instance, Veys et al. [39] combined a multispectral imaging system with an SVM model to accurately assess the severity of rapeseed leaf spot disease, achieving an accuracy of 92%. Zhao et al. [40] adopted hyperspectral imaging technology to collect spectral data from ginseng leaves to investigate ginseng root diseases and employed an extreme RF algorithm to construct an early detection model for ginseng root diseases. For Chinese cabbage, Feng et al. [15] also employed hyperspectral imaging technology to capture leaf images of both healthy and clubroot-diseased plants six weeks after inoculation. They established a spectral model for identifying clubroot disease using a convolutional neural network and SVM. These findings confirm that root diseases can be diagnosed through spectroscopic analysis of leaf phenotypes. However, model development at late infection stages is time-consuming and impractical for timely disease management, as root disease symptoms are already apparent by this stage. In contrast, the current study shows that multispectral imaging can detect P. brassicae infection as early as 11 days after inoculation. This approach offers a rapid and efficient alternative to hyperspectral imaging for early clubroot detection.
SVM performed best among the full-band models, reaching a test-set accuracy of 81.15%, whereas ELM performed best among the characteristic-band models, reaching a test-set accuracy of 84.28%; overall, the characteristic-band model gave the best recognition performance. The advantage of ELM over SVM on the characteristic-band data can be attributed to its learning mechanism [41]. ELM uses a single-hidden-layer feedforward neural network in which the input weights and biases are randomly generated and then fixed, while the output weights are determined analytically using the Moore–Penrose generalized inverse. This non-iterative training avoids gradient-based optimization, so training is fast, and it has been reported to reduce the risk of overfitting [42]. SVM, by contrast, relies on kernel-based optimization whose results are sensitive to the penalty and kernel parameters [29]. Because C and γ were fixed rather than tuned in this study, the chosen setting may have been less suitable for the reduced characteristic-band data, which may explain why SVM did not find as effective a classification boundary in this scenario.
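The ELM training rule described above (random, fixed input weights; output weights from the Moore–Penrose pseudoinverse) fits in a few lines of NumPy. This is a sketch matching the reported settings of 100 sigmoid hidden neurons, not the authors' implementation: the uniform(−1, 1) weight range, one-hot targets, and the seed are assumptions, and inputs are assumed to be on a moderate scale so the sigmoid does not saturate.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine with sigmoid activation."""

    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Clip pre-activations to avoid overflow in exp for extreme inputs
        z = np.clip(X @ self.W + self.b, -500, 500)
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Random input weights and biases, drawn once and never trained
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        # Output weights solved analytically: beta = pinv(H) @ T
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

The only "training" is one pseudoinverse of the hidden-layer output matrix, which is why ELM trains quickly compared with iteratively optimized models.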
The RF model showed signs of overfitting with both full-band and characteristic-band data: its training F1-scores (0.9993 in Table 1 and 0.9985 in Table 2) were far higher than its test F1-scores (0.7643 and 0.7743), likely because of irrelevant or redundant spectral features [43]. Since the primary objective of this study was to compare model performance and identify the most suitable algorithm for early detection of clubroot, the RF model was not optimized further, for example through hyperparameter tuning or feature regularization. Instead, the robustness and generalization capability of the best-performing model were assessed with stratified five-fold cross-validation [44].
To evaluate the robustness of the optimal model, stratified five-fold cross-validation was conducted. The average training accuracy was 84.69% (±0.69%) and the average validation accuracy was 83.80% (±1.04%). Macro-averaged precision, recall, and F1-score all remained in the range of 83–85% with low standard deviations, indicating consistent performance across data splits and good generalization within this dataset.
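The stratified five-fold procedure can be sketched with scikit-learn as below. This is a minimal illustration on synthetic data, not the authors' code: the classifier (an RBF-kernel `SVC` standing in for the ELM model, which scikit-learn does not provide), the shuffling seed, and the synthetic 19-feature dataset are assumptions. The text does not say whether the ± values are population or sample standard deviations; this sketch uses NumPy's default (`ddof=0`).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC


def stratified_cv(model_factory, X, y, n_splits=5, seed=0):
    """Per-fold validation accuracy and macro-averaged F1 under stratified k-fold CV."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs, f1s = [], []
    for train_idx, val_idx in skf.split(X, y):
        model = model_factory()  # fresh, untrained model for every fold
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        accs.append(accuracy_score(y[val_idx], pred))
        # Macro averaging weights the healthy and infected classes equally
        f1s.append(f1_score(y[val_idx], pred, average="macro"))
    return np.array(accs), np.array(f1s)


if __name__ == "__main__":
    # Synthetic stand-in for 19-band leaf spectra (not the study's data)
    X, y = make_classification(n_samples=300, n_features=19, n_informative=6, random_state=0)
    accs, f1s = stratified_cv(lambda: SVC(kernel="rbf"), X, y)
    print(f"accuracy {accs.mean():.2%} (±{accs.std():.2%})")
    print(f"macro F1 {f1s.mean():.2%} (±{f1s.std():.2%})")
```

Stratification keeps the healthy-to-infected ratio roughly constant in every fold, so the fold-to-fold spread reflects model stability rather than shifts in class balance.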
However, it should be noted that this study was conducted under controlled environmental conditions using a single susceptible Chinese cabbage cultivar (Youlv No.3). While such controls help ensure experimental consistency, they may not fully represent the variability of field conditions such as light intensity, temperature fluctuations, and soil heterogeneity [45]. Moreover, the use of a single cultivar might limit the model’s generalizability across genotypes, as spectral characteristics may differ due to variations in resistance, pigment composition, and physiological traits. Therefore, future research should include multiple cultivars with differing resistance levels and validate the model under field conditions to enhance its applicability.
The optimal model achieved early detection accuracy of 80–84%, a level that may already be useful in practice because it would allow timely warnings and early intervention for most plants. However, misclassifications, particularly false negatives in which infected plants are classified as healthy, could allow the disease to spread if left unaddressed. Future work should therefore focus on reducing the false negative rate through larger datasets, the integration of environmental variables, or more advanced modeling techniques.
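As a rough illustration of the false-negative risk, assume (the tables do not state this) that precision and recall in Table 2 are computed with infected plants as the positive class. The false negative rate is then the complement of recall, and for the ELM characteristic-band model on the test data it works out as:

```latex
\mathrm{FNR} = \frac{FN}{TP + FN} = 1 - \mathrm{Recall} = 1 - 0.7022 = 0.2978 \approx 29.8\%
```

Under that assumption, roughly three in ten infected test plants would be classified as healthy, even though overall accuracy exceeds 84%.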
Despite the promising results achieved in this study, several practical considerations also need to be addressed before large-scale field deployment [46]. While the multispectral system used here is more cost-effective than hyperspectral imaging, it requires further optimization for real-time, large-scale monitoring. In addition, confounding factors such as nutrient stress, drought, or other pathogens may produce spectral responses similar to clubroot, complicating accurate diagnosis under real-world conditions. Although these were minimized in the current controlled setup, future studies should incorporate additional indicators—such as chlorophyll fluorescence, soil moisture, or nutrient status—and test under multi-stress field environments to improve discrimination ability and robustness.

5. Conclusions

Multispectral imaging shows potential for the early diagnosis of clubroot in Chinese cabbage. Four classification models (RF, PLS-DA, SVM, and ELM) were used to distinguish infected from uninfected plants. The results demonstrate that a multispectral imaging system can detect infection as early as 11 days after P. brassicae inoculation in Chinese cabbage. The SVM model based on full-band data and the ELM model based on characteristic-band data both exceeded 80% accuracy for early detection of clubroot, and the ELM characteristic-band model outperformed the SVM full-band model. Stratified five-fold cross-validation of the optimal model yielded an average accuracy of 83.80% (±1.04%) and a macro-averaged F1-score of 82.95% (±1.12%) across validation folds, confirming stable performance. To our knowledge, this is the first study to identify detectable spectral differences between healthy and infected plants at 11 DAI using leaf multispectral imaging combined with machine learning, providing a potential tool for early detection and timely control of clubroot in Chinese cabbage.
In future research, the goal is to extend this approach to the early detection of Chinese cabbage clubroot and other soil-borne diseases under field conditions, as well as to quantify the severity of clubroot disease in Chinese cabbage. However, it should be acknowledged that this study was conducted under controlled environmental conditions using a single Chinese cabbage cultivar. Therefore, further validation under variable field environments and across multiple cultivars with different resistance levels is necessary to confirm the robustness and practical applicability of the proposed model.

Author Contributions

This manuscript was written by Z.J., who completed the experimental design and data collection. D.Z., J.Z., X.F. and B.P. were involved in data analysis and model construction; S.X., S.S. and A.G. conceived the research plans; S.X. and S.S. supervised the experiments; L.W. and D.M. helped to manage the plant materials and take photos; L.M. and Y.W. contributed to the paper revision. All authors have read and agreed to the published version of the manuscript.

Funding

We appreciate the financial support provided by the Key Research & Development Project of Hebei Province (grant nos. 21326311D-2 and 22326903D), the ‘Hundred Talents Plan’ of Hebei Province (grant no. E2020100004), and the Special Project for Enhancing Innovation Capability in Baoding City, Hebei Province.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Zhao, Y.; Wang, H.; Mei, Y.; Yue, Z.; Lei, J.; Tao, P.; Li, B.; Zhao, J.; Hu, Q. Profiling metabolites distribution among various leaf layers of Chinese cabbage. Horticulturae 2024, 10, 988.
2. Adhikary, D.; Islam, A.U.; Adhikari, S.; Chapara, V.; Truman, W.; Ludwig-Müller, J. Clubroot disease: 145 years post-discovery, challenges, and opportunities. Annu. Rev. Phytopathol. 2025, 63, 603–626.
3. Khalid, M.; Rahman, S.; Kayani, S.I.; Khan, A.A.; Gul, H.; Hui, N. Plasmodiophora brassicae–the causal agent of clubroot and its biological control/suppression with fungi—A review. S. Afr. J. Bot. 2022, 147, 325–331.
4. Greer, S.F.; Surendran, A.; Grant, M.; Lillywhite, R. The current status, challenges, and future perspectives for managing diseases of brassicas. Front. Microbiol. 2023, 14, 1209258.
5. Ito, S.; Maehara, T.; Maruno, E.; Tanaka, S.; Kameya-Iwaki, M.; Kishi, F. Development of a PCR-based assay for the detection of Plasmodiophora brassicae in soil. J. Phytopathol. 1999, 147, 83–88.
6. Zhang, D.; Fang, H.; He, Y. Research of crop disease based on visible/near infrared spectral image technology: A review. Spectrosc. Spectr. Anal. 2019, 39, 1748–1756.
7. Faggian, R.; Strelkov, S.E. Detection and measurement of Plasmodiophora brassicae. J. Plant Growth Regul. 2009, 28, 282–288.
8. Tu, J.; Bush, J.; Bonham-Smith, P.; Wei, Y. Live cell imaging of Plasmodiophora brassicae—Host plant interactions based on a two-step axenic culture system. Microbiol. Open. 2019, 8, e00765.
9. Lu, X.H.; Zhang, X.M.; Jiao, X.L.; Hao, J.J.; Zhang, X.S.; Luo, Y.; Gao, W.W. Taxonomy of fungal complex causing red-skin root of Panax ginseng in China. J. Ginseng Res. 2020, 44, 506–518.
10. Tholl, D.; Hossain, O.; Weinhold, A.; Röse, U.S.R.; Wei, Q. Trends and applications in plant volatile sampling and analysis. Plant J. 2021, 106, 314–325.
11. Cao, J.; Chang, J.; Huang, Y.; Wu, Y.; Ji, Z.; Lai, X.; Wang, J.; Li, Y.; Zhu, W.; Li, X. Optical design and fabrication of a common-aperture multispectral imaging system for integrated deep space navigation and detection. Opt. Lasers Eng. 2023, 167, 107619.
12. Urfan, M.; Sharma, S.; Hakla, H.R.; Rajput, P.; Andotra, S.; Lehana, P.K.; Bhardwaj, R.; Khan, M.S.; Das, R.; Kumar, S.; et al. Recent trends in root phenomics of plant systems with available methods-discrepancies and consonances. Physiol. Mol. Biol. Plants 2022, 28, 1311–1321.
13. Hernanda, R.A.P.; Lee, J.; Lee, H. Spectroscopy imaging techniques as in vivo analytical tools to detect plant traits. Appl. Sci. 2023, 13, 10420.
14. Heath, W.; Haydock, P.; Wilcox, A.; Evans, K. The potential use of spectral reflectance from the potato crop for remote sensing of infection by potato cyst nematodes. Remote Sens. Agric. Asp. Appl. Biol. 2000, 60, 185–188.
15. Feng, L.; Wu, B.; Chen, S.; Zhang, C.; He, Y. Application of visible/near-infrared hyperspectral imaging with convolutional neural networks to phenotype aboveground parts to detect cabbage Plasmodiophora brassicae (clubroot). Infrared Phys. Technol. 2022, 121, 104040.
16. Hillnhütter, C.; Mahlein, A.K.; Sikora, R.A.; Oerke, E.C. Use of imaging spectroscopy to discriminate symptoms caused by Heterodera schachtii and Rhizoctonia solani on sugar beet. Precis. Agric. 2012, 13, 17–32.
17. Jayapal, P.K.; Park, E.; Faqeerzada, M.A.; Kim, Y.S.; Kim, H.; Baek, I.; Kim, M.S.; Sandanam, D.; Cho, B.K. Analysis of RGB plant images to identify root rot disease in Korean ginseng plants using deep learning. Appl. Sci. 2022, 12, 2489.
18. Shi, Y.; Huang, W.; Luo, J.; Huang, L.; Zhou, X. Detection and discrimination of pests and diseases in winter wheat based on spectral indices and kernel discriminant analysis. Comput. Electron. Agric. 2017, 141, 171–180.
19. Vishnoi, V.K.; Kumar, K.; Kumar, B. A comprehensive study of feature extraction techniques for plant leaf disease detection. Multimedia Tools Appl. 2022, 81, 367–419.
20. Pei, Y.; Zuo, Z.; Zhang, Q.; Wang, Y. Multi-source information fusion strategies of aerial parts in FTIR-ATR spectroscopic characterization and classification of Paris polyphylla var. yunnanensis. J. Mol. Struct. 2019, 1196, 478–490.
21. Feng, L.; Wu, D.; He, Y. Identification and classification of rice leaf blast based on multi-spectral imaging sensor. Spectrosc. Spectr. Anal. 2009, 29, 2730–2733.
22. Hou, J.; Li, L.; He, J. Detection of grapevine leafroll disease based on 11-index imagery and ant colony clustering algorithm. Precis. Agric. 2016, 17, 488–505.
23. Bebronne, R.; Carlier, A.; Meurs, R.; Leemans, V.; Vermeulen, P.; Dumont, B.; Mercatoris, B. In-field proximal sensing of Septoria tritici blotch, stripe rust and brown rust in winter wheat by means of reflectance and textural features from multispectral imagery. Biosyst. Eng. 2020, 197, 257–269.
24. Zhang, T.; Guan, H.; Ma, X.; Shen, P. Drought recognition based on feature extraction of multispectral images for the soybean canopy. Ecol. Inf. 2023, 77, 102248.
25. Lizarazo, I.; Rodriguez, J.L.; Cristancho, O.; Olaya, F.; Duarte, M.; Prieto, F. Identification of symptoms related to potato Verticillium wilt from UAV-based multispectral imagery using an ensemble of gradient boosting machines. Smart Agric. Technol. 2023, 3, 100138.
26. Cross, J.F.; Cobo, N.; Drewry, D.T. Non-Invasive Diagnosis of Wheat Stripe Rust Progression Using Hyperspectral Reflectance. Front. Plant Sci. 2024, 15, 1–15.
27. Niu, Z.; Li, Y.; Moncada, J.D.S.; Johnson, W.; Lang, E.B.; Li, X.; Jin, J. Proximal Hyperspectral Imaging for Early Detection and Disease Development Prediction of Septoria Leaf Blotch in Wheat Using Spectral–Temporal Features. Comput. Electron. Agric. 2025, 235, 110400.
28. Zarbakhsh, S.; Fakhrzad, F.; Rajkovic, D.; Niedbała, G.; Piekutowska, M. Approaches and Challenges in Machine Learning for Monitoring Agricultural Products and Predicting Plant Physiological Responses to Biotic and Abiotic Stresses. Curr. Plant Biol. 2025, 43, 100535.
29. Ray, K.K.; Kumari, A.; Kumar, S.; Machavaram, R.; Shekh, I.; Deshmukh, S.M.; Tadge, P. Guava Leaf Disease Detection Using Support Vector Machine (SVM). Smart Agric. Technol. 2025, 12, 101190.
30. Zhang, H.; Feng, J.; Hwang, S.F.; Strelkov, S.E.; Falak, I.; Huang, X.; Sun, R. Mapping of clubroot (Plasmodiophora brassicae) resistance in canola (Brassica napus). Plant Pathol. 2016, 65, 435–440.
31. Zhang, G.; Hao, H.; Wang, Y.; Jiang, Y.; Shi, J.; Yu, J.; Cui, X.; Li, J.; Zhou, S.; Yu, B. Optimized adaptive Savitzky–Golay filtering algorithm based on deep learning network for absorption spectroscopy. Spectrochim. Acta Part A 2021, 263, 120187.
32. Sando, K.; Hino, H. Modal principal component analysis. Neural Comput. 2020, 32, 1901–1935.
33. Zhang, J.; Chen, X.; Khan, A.; Zhang, Y.; Kuang, X.; Liang, X.; Taccari, M.L.; Nuttall, J. Daily runoff forecasting by deep recursive neural network. J. Hydrol. 2021, 596, 126067.
34. Wu, D.; Wang, S.; Wang, N.; Nie, P.; He, Y.; Sun, D.W.; Yao, J. Application of time series hyperspectral imaging (TS-HSI) for determining water distribution within beef and spectral kinetic analysis during dehydration. Food Bioprocess Technol. 2013, 6, 2943–2958.
35. Zhang, J.; Cui, X.; Cai, W.; Shao, X. A variable importance criterion for variable selection in near-infrared spectral analysis. Sci. China Chem. 2019, 62, 271–279.
36. Ram, B.G.; Oduor, P.; Igathinathane, C.; Howatt, K.; Sun, X. A Systematic Review of Hyperspectral Imaging in Precision Agriculture: Analysis of Its Current State and Future Prospects. Comput. Electron. Agric. 2024, 222, 109037.
37. Hallik, L.; Kazantsev, T.; Kuusk, A.; Galmés, J.; Tomás, M.; Niinemets, Ü. Generality of relationships between leaf pigment contents and spectral vegetation indices in Mallorca (Spain). Reg. Environ. Change 2017, 17, 2097–2109.
38. Hughes, R.F.; Balzotti, C. A spectral mapping signature for the rapid ohia death (ROD) pathogen in Hawaiian forests. Remote Sens. 2018, 10, 404.
39. Veys, C.; Chatziavgerinos, F.; AlSuwaidi, A.; Hibbert, J.; Hansen, M.; Bernotas, G.; Smith, M.; Yin, H.; Rolfe, S.; Grieve, B. Multispectral imaging for presymptomatic analysis of light leaf spot in oilseed rape. Plant Methods 2019, 15, 4.
40. Zhao, G.; Pei, Y.; Yang, R.; Xiang, L.; Fang, Z.; Wang, Y.; Yin, D.; Wu, J.; Gao, D.; Yu, D.; et al. A non-destructive testing method for early detection of ginseng root diseases using machine learning technologies based on leaf hyperspectral reflectance. Front. Plant Sci. 2022, 13, 1031030.
41. Chorowski, J.; Wang, J.; Zurada, J.M. Review and Performance Comparison of SVM- and ELM-Based Classifiers. Neurocomputing 2014, 128, 507–516.
42. Wang, Q.; Gao, Z.; Li, T.; Li, J.; Yang, F.; Chen, X.; Li, Z. Hierarchical Extreme Learning Machine for Dimensionality Reduction in Near-Infrared Spectral Analysis. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2026, 346, 126852.
43. Chittaragi, A.; Patil, B.; Kumar, M.K.P.; Devanna, P. Hybrid Random Forest—Artificial Neural Network Model Based Forecasting of Anthracnose in Bottle Gourd across Different Transplanting Windows. Smart Agric. Technol. 2025, 12, 101477.
44. Liu, C.; Grasso, S.; Brunton, N.P.; Yang, Q.; Li, S.; Chen, L.; Zhang, D. Metabolomics for Origin Traceability of Lamb: An Ensemble Learning Approach Based on Random Forest Recursive Feature Elimination. Food Chem. X 2025, 29, 102856.
45. Pérez-Pérez, C.A.; Gonzalez Viejo, C.; Fuentes, S.; Valiente-Banuet, J.I. Vineyard Proximal Sensing Using Multispectral Imaging to Evaluate Grape Ripening and Quality Traits Using Artificial Neural Networks Modeling. J. Agric. Food Res. 2025, 23, 102252.
46. Wei, J.; Dai, Z.; Zhang, Q.; Yang, L.; Zeng, Z.; Zhou, Y.; Liu, J.; Chen, B. Seed Multispectral Imaging Combined with Machine Learning Algorithms for Distinguishing Different Varieties of Lettuce (Lactuca sativa L.). Food Chem. X 2025, 27, 102399.
Figure 1. Multispectral imaging system and data processing workflow. (a) Multispectral imaging acquisition system used for capturing reflectance data from plant leaves; (b) Representative 19-channel multispectral image cube, showing the spatial and spectral dimensions used for data extraction. The marked rectangular region indicates the selected spectral region for analysis; (c) Six customized regions of interest (ROIs) on the leaf surface; (d) Representative leaf spectral reflectance curve derived from the ROI data.
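Deriving a reflectance curve from the image cube in panels (b) to (d) amounts to averaging the pixel values inside each ROI, band by band. A minimal NumPy sketch follows, assuming a cube stored as (rows, columns, bands) and rectangular ROIs given as hypothetical `(row0, row1, col0, col1)` bounds; the actual ROI shapes, and whether the six ROI spectra were averaged per leaf or used as separate samples, are not specified here.

```python
import numpy as np


def roi_mean_spectra(cube, rois):
    """Mean reflectance inside each rectangular ROI of a (rows, cols, bands) cube.

    rois: list of (row0, row1, col0, col1) half-open pixel bounds (hypothetical format).
    Returns an array of shape (len(rois), bands).
    """
    spectra = []
    for r0, r1, c0, c1 in rois:
        patch = cube[r0:r1, c0:c1, :]  # all pixels of one ROI, all bands
        spectra.append(patch.reshape(-1, cube.shape[2]).mean(axis=0))
    return np.vstack(spectra)


def leaf_spectrum(cube, rois):
    """Average of the ROI mean spectra, giving one reflectance curve per leaf (assumption)."""
    return roi_mean_spectra(cube, rois).mean(axis=0)
```

`leaf_spectrum` weights every ROI equally, so ROIs of different sizes contribute equally rather than in proportion to their pixel counts.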
Figure 2. Diagram of the binary confusion matrix. The confusion matrix shows the relationship between actual and predicted classes in a two-class problem. True Positive (TP), False Negative (FN), False Positive (FP), and True Negative (TN) are the four possible outcomes, and they form the basis for calculating the key model performance metrics.
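The four outcomes in the diagram yield the metrics reported in Tables 1 and 2 through their standard definitions. The helper below is an illustrative sketch (the name `binary_metrics` is hypothetical); returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1-score from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if tp + fn else 0.0  # share of actual positives that are found
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics(tp=8, fn=2, fp=1, tn=9)` gives an accuracy of 0.85, a precision of about 0.889, a recall of 0.8, and an F1-score of about 0.842.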
Figure 3. Workflow of early detection of clubroot using multispectral imaging and machine learning. The workflow outlines the process for detecting Plasmodiophora brassicae infection at early stages. After inoculation, multispectral images of healthy and diseased plant samples are captured and processed through ROI extraction, Savitzky–Golay (SG) pre-processing, and feature extraction using principal component analysis (PCA) and successive projection algorithm (SPA). Extracted spectral features are used to train and validate machine learning models, including random forest (RF), support vector machine (SVM), partial least squares discriminant analysis (PLS-DA), and extreme learning machine (ELM). Model performance is then evaluated to enable accurate classification of clubroot infection.
Figure 4. Disease progression of the Chinese cabbage cultivar Youlv No.3 inoculated with Plasmodiophora brassicae at different days after inoculation (DAI). Panels (a–e) show representative plants at 11, 15, 19, 23, and 42 DAI, respectively. Early infection stages (a–c) display mild leaf curling and reduced root elongation, while advanced stages (d,e) exhibit evident root swelling and deformation, typical of clubroot symptoms. Panel (f) presents the uninoculated control (CK) plants at 42 DAI, which maintained normal morphology and healthy white roots.
Figure 5. Changes in leaf spectral reflectance of Chinese cabbage (Youlv No.3) in control (CK) and inoculated (TS) groups at different days after inoculation (DAI). The spectral reflectance curves of Chinese cabbage leaves in CK and TS groups at 4, 7, 11, 15, and 19 DAI are displayed. Each curve represents the mean reflectance spectrum across the visible (400–700 nm) and near-infrared (700–1100 nm) regions.
Figure 6. Original and smoothed spectral reflectance curves of Chinese cabbage leaves. Spectral data before and after Savitzky–Golay (SG) smoothing are shown. (a) Original spectral reflectance curves containing noise from the instrument and environmental interference. (b) Spectral reflectance curves after SG smoothing.
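Savitzky–Golay smoothing fits a low-order polynomial within a moving window and replaces each point with the fitted value. A sketch using SciPy's `scipy.signal.savgol_filter` is shown below; the window length of 5 bands and polynomial order of 2 are assumptions, since the smoothing parameters are not given in this section.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_smooth(spectra, window_length=5, polyorder=2):
    """Savitzky-Golay smoothing of each spectrum along the last (wavelength) axis.

    spectra: array of shape (n_samples, n_bands); window_length must be odd,
    greater than polyorder, and no larger than n_bands.
    """
    return savgol_filter(spectra, window_length=window_length, polyorder=polyorder, axis=-1)


if __name__ == "__main__":
    # Synthetic 19-band reflectance curves with additive noise (illustrative only)
    rng = np.random.default_rng(0)
    bands = np.arange(19, dtype=float)
    clean = 0.3 + 0.02 * bands + 0.001 * bands**2
    noisy = clean + rng.normal(0.0, 0.01, size=(5, 19))
    print(np.round(sg_smooth(noisy)[0], 3))
```

The filter operates on band index and assumes equally spaced samples; if the instrument's 19 bands are unevenly spaced in wavelength, the smoothing is still applied as if they were, and with so few bands a wide window can flatten genuine spectral features.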
Figure 7. Characteristic wavelength extraction by the successive projection algorithm (SPA). The characteristic wavelengths from the leaf spectral reflectance curve of Chinese cabbage were selected. The blue line represents the original spectral reflectance across the 400–1000 nm wavelength range, while the red squares indicate the specific wavelengths identified by the SPA method.
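The successive projections algorithm selects bands with minimal collinearity: starting from one band, it repeatedly projects the remaining bands onto the subspace orthogonal to the most recently selected band and picks the band with the largest residual norm. The sketch below covers only this projection phase with a fixed starting band; in its usual formulation, SPA repeats the procedure for every starting band and subset size and keeps the subset with the lowest validation error of a regression model. The function name `spa` and its arguments are illustrative assumptions, not the settings used in this study.

```python
import numpy as np


def spa(X, n_select, start=0):
    """Projection phase of the successive projections algorithm.

    X: (n_samples, n_bands) spectral matrix; start: index of the first band.
    Returns the selected band indices in order of selection.
    """
    Xp = X.astype(float)  # astype returns a copy, so X is left untouched
    selected = [start]
    for _ in range(n_select - 1):
        last = Xp[:, selected[-1]].copy()
        # Remove from every column its component along the last selected column
        Xp = Xp - np.outer(last, last @ Xp) / (last @ last)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # already-selected bands are never picked again
        selected.append(int(np.argmax(norms)))
    return selected
```

A band that is an exact multiple of an already selected band has a zero residual after projection, so the algorithm skips it in favor of bands that carry new information.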
Figure 8. Principal component analysis (PCA) score plot of multispectral data for Chinese cabbage at different days after inoculation (DAI). The plot is based on multispectral reflectance data collected from Chinese cabbage at various infection stages following Plasmodiophora brassicae inoculation. Each point represents an individual sample, and colors correspond to different days after inoculation (4–42 DAI). The circles represent the 95% confidence intervals.
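A score plot like this can be produced by projecting the spectra onto their first two principal components, and a 95% region per group is commonly drawn as an ellipse derived from that group's score covariance and the chi-square quantile with two degrees of freedom. Both functions below are sketches that assume scikit-learn and SciPy; the figure's exact construction is not described, so this is one common choice rather than the authors' method.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import PCA


def pca_scores(spectra, n_components=2):
    """Scores of each spectrum on the leading principal components."""
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(spectra)  # PCA mean-centres the data internally
    return scores, pca.explained_variance_ratio_


def ellipse_axes(scores_2d, level=0.95):
    """Semi-axis lengths (major first) and major-axis angle in degrees for one group."""
    cov = np.cov(scores_2d, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    radius = np.sqrt(chi2.ppf(level, df=2))  # about 2.448 for level=0.95
    semi_axes = radius * np.sqrt(eigvals[::-1])
    major = eigvecs[:, -1]
    angle = np.degrees(np.arctan2(major[1], major[0]))
    return semi_axes, angle
```

Under a normality assumption, this ellipse encloses about 95% of a group's individual samples; it describes the spread of samples rather than the uncertainty of the group mean.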
Figure 9. Confusion matrices of the classification results obtained by different machine learning models. The matrices show the performance of four models, random forest (RF) (a,b), partial least squares discriminant analysis (PLS-DA) (c,d), support vector machine (SVM) (e,f), and extreme learning machine (ELM) (g,h), for early detection of Plasmodiophora brassicae infection in Chinese cabbage using multispectral data. Panels (a,c,e,g) show the training datasets and panels (b,d,f,h) the test datasets. RF achieves the highest accuracy on the training data. Class 1 refers to healthy plants and Class 2 to infected plants.
Table 1. F1-scores of random forest (RF), partial least-squares discriminant analysis (PLS-DA), support vector machine (SVM) and extreme learning machine (ELM) models for classifying Chinese cabbage seedlings based on multispectral imaging in relation to inoculation with Plasmodiophora brassicae.

| Model | RF | PLS-DA | SVM | ELM |
|---|---|---|---|---|
| Train data | 0.9993 | 0.8600 | 0.8865 | 0.8502 |
| Test data | 0.7643 | 0.8588 | 0.8604 | 0.8477 |
Table 2. Prediction results of random forest (RF), partial least-squares discriminant analysis (PLS-DA), support vector machine (SVM) and extreme learning machine (ELM) models for classifying Chinese cabbage seedlings using characteristic bands in relation to inoculation with Plasmodiophora brassicae.

| Model | Training Accuracy | Training Precision | Training Recall | Training F1-Score | Test Accuracy | Test Precision | Test Recall | Test F1-Score |
|---|---|---|---|---|---|---|---|---|
| RF | 99.89% | 99.90% | 99.80% | 0.9985 | 83.41% | 83.16% | 72.44% | 0.7743 |
| PLS-DA | 86.70% | 91.23% | 56.37% | 0.6968 | 80.52% | 89.55% | 57.11% | 0.6864 |
| SVM | 77.44% | 87.23% | 50.00% | 0.6356 | 75.98% | 83.27% | 48.67% | 0.6143 |
| ELM | 84.74% | 88.98% | 69.87% | 0.7828 | 84.28% | 87.29% | 70.22% | 0.7783 |
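Because the F1-score is the harmonic mean of precision and recall, the F1 values in Table 2 can be cross-checked against the precision and recall columns. The short script below does this for the test data; the dictionary simply transcribes the table, and the helper name `f1_from_pr` is illustrative.

```python
# Test-data precision (%), recall (%) and reported F1 transcribed from Table 2
table2_test = {
    "RF": (83.16, 72.44, 0.7743),
    "PLS-DA": (89.55, 57.11, 0.6864),
    "SVM": (83.27, 48.67, 0.6143),
    "ELM": (87.29, 70.22, 0.7783),
}


def f1_from_pr(precision_pct, recall_pct):
    """Harmonic mean of precision and recall given as percentages."""
    p, r = precision_pct / 100, recall_pct / 100
    return 2 * p * r / (p + r)


for model, (p, r, f1_reported) in table2_test.items():
    print(f"{model}: computed F1 = {f1_from_pr(p, r):.4f}, reported = {f1_reported:.4f}")
```

The computed RF, SVM and ELM test F1-scores agree with the reported values to four decimal places, whereas the PLS-DA row computes to about 0.6974 against the reported 0.6864.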
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jiao, Z.; Zhang, D.; Zhang, J.; Wang, L.; Ma, D.; Ma, L.; Wang, Y.; Gu, A.; Fan, X.; Peng, B.; et al. Early Detection of Chinese Cabbage Clubroot Based on Integrated Leaf Multispectral Imaging and Machine Learning. Horticulturae 2025, 11, 1335. https://doi.org/10.3390/horticulturae11111335

AMA Style

Jiao Z, Zhang D, Zhang J, Wang L, Ma D, Ma L, Wang Y, Gu A, Fan X, Peng B, et al. Early Detection of Chinese Cabbage Clubroot Based on Integrated Leaf Multispectral Imaging and Machine Learning. Horticulturae. 2025; 11(11):1335. https://doi.org/10.3390/horticulturae11111335

Chicago/Turabian Style

Jiao, Zhiyang, Dongfang Zhang, Jun Zhang, Liying Wang, Daili Ma, Lisong Ma, Yanhua Wang, Aixia Gu, Xiaofei Fan, Bo Peng, et al. 2025. "Early Detection of Chinese Cabbage Clubroot Based on Integrated Leaf Multispectral Imaging and Machine Learning" Horticulturae 11, no. 11: 1335. https://doi.org/10.3390/horticulturae11111335

APA Style

Jiao, Z., Zhang, D., Zhang, J., Wang, L., Ma, D., Ma, L., Wang, Y., Gu, A., Fan, X., Peng, B., Shen, S., & Xuan, S. (2025). Early Detection of Chinese Cabbage Clubroot Based on Integrated Leaf Multispectral Imaging and Machine Learning. Horticulturae, 11(11), 1335. https://doi.org/10.3390/horticulturae11111335

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
