1. Introduction
Rice is one of the world’s most important crops and a staple food for over half of the world’s population [1]. Crop pests and diseases are among the most serious threats to agricultural production in China. Diseases such as rice blast are increasingly affecting rice growers worldwide, posing a significant risk to global food security [2]. The incidence and severity of rice blast fluctuate annually with geographical location and environmental factors [3], and the disease can cause annual yield losses of 10 to 30% [4]. During severe outbreaks, yield reductions can reach 40 to 50%. Symptoms are most pronounced when the pathogen infects the crown, leaf blade, leaf neck, and spike [5]. The infection initially appears as small brown spots on the leaf tissue, which gradually expand into fusiform lesions with gray centers and brown margins. These lesions may continue to grow and eventually merge, killing the entire leaf [6,7]. Therefore, rapid and non-destructive disease detection methods are indispensable for effectively monitoring rice growth.
Traditional methods for detecting rice diseases rely on labor-intensive practices, including field inspections, surveys, and sample collection [8]. These approaches are time-consuming, depend heavily on subjective judgment, and often yield limited diagnostic accuracy. Moreover, they typically require inspectors with specialized expertise [9]. In large-scale or cross-regional farming, such methods are highly inefficient, and detecting early or latent signs of infection poses significant time and technical challenges [10]. Hyperspectral imaging (HSI) has emerged as a promising alternative to these methods. Owing to its high spectral resolution and its ability to capture both spectral and spatial information, HSI has become a research hotspot and has been widely applied to crop disease identification [11,12,13].
Advances in computer vision and machine learning have promoted the effective application of image processing to crop disease detection [14,15]. Since the introduction of convolutional neural networks (CNNs), deep learning-based crop disease detection has shown remarkable performance compared with traditional machine learning methods [16,17]. Even with limited labeled samples, semi-supervised models have achieved high classification accuracy. You et al. [18] combined hyperspectral microscope imaging with advanced chemometrics to recognize diseased Korla fragrant pears; a CNN model based on fused datasets performed best at differentiating healthy from diseased pears, with a maximum accuracy of 96.72%. Chun et al. [19] developed a rapid and non-destructive method for determining the stage of Botrytis cinerea infection in strawberry fruit using hyperspectral fluorescence imaging combined with multiple classification models. A 1D-CNN model based on ResNet-50 outperformed the other models, with a precision of 96.88%, a recall of 96.87%, an F1-score of 96.85%, and an accuracy of 96.86%. Gui et al. [20] proposed an HSI-based classification method for the early identification of soybean mosaic virus disease that combined a CNN with a support vector machine; the resulting CNN-SVM model achieved 96.67% accuracy on the training set and 94.17% on the test set. Qi et al. [21] proposed a deep learning framework for detecting potato late blight in hyperspectral images that integrated 2D and 3D convolutional neural networks (2D-CNN and 3D-CNN) with deep collaborative attention networks (PLB-2D-3D-A), achieving a maximum accuracy of 79%. Bu et al. [22] proposed ResNet-R&H, a classification model based on the ResNet architecture that fuses RGB and hyperspectral images to assess the freshness of vegetable soybean, achieving a test accuracy of 97.6%. Chen et al. [23] developed a deep learning model, the Rice Bakanae Disease-Visual Geometry Group (RBD-VGG) network, to detect rice bakanae disease, attaining an average accuracy of 92.2% on the 21st day after infection.
Although HSI and image processing technologies have been widely used in agricultural production, most existing studies model and analyze either spectral or image data alone. Relatively little attention has been paid to extracting and fusing the spatial structural features embedded in spectral images. Rice blast infection often presents multilevel symptoms, ranging from microscopic pigment changes to macroscopic changes in leaf morphology. Biochemical response signals in spectral curves alone can hardly capture the spatial distribution of the disease, whereas image information can reveal structural changes in the shape, margins, and irregularity of lesions. Relying on features from a single dimension may therefore cause information loss and limit discriminative capability. Because HSI data are three-dimensional, containing rich two-dimensional image information together with one-dimensional spectral data, they are well suited to fusing spectral and spatial features to distinguish different levels of rice blast.
Given these limitations, this paper proposes a dual-channel deep learning model with a feature fusion adaptive conditioning module. The successive projections algorithm (SPA), random frog (RFrog), and competitive adaptive reweighted sampling (CARS) were used to identify characteristic wavelengths associated with rice blast. In parallel, spatial features were extracted using MobileNetV2 combined with a dual-attention mechanism. Spectral and image data were input simultaneously and fused through the feature fusion adaptive conditioning module. This approach exploits both the spectral (internal) and spatial (external) features of the disease, enabling high-precision classification of rice blast. The proposed model was compared with a support vector machine (SVM), an SVM optimized by particle swarm optimization (PSOSVM), random forest (RF), a residual neural network (ResNet50), and MobileNetV2. This study provides ideas and methods for applying hyperspectral imaging to rice blast detection.
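Section 4.2 identifies the dual-attention module as CBAM. The NumPy sketch below illustrates how a CBAM-style block reweights a single feature map: channel attention passes average- and max-pooled channel descriptors through a shared two-layer MLP, and spatial attention applies a 7 × 7 kernel to the channel-wise mean and max maps. The weight arrays `W1`, `W2`, and `K`, the reduction ratio, and the omission of biases and batching are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(F, W1, W2):
    """F: (C, H, W) feature map; W1: (C // r, C) and W2: (C, C // r) shared MLP weights."""
    avg = F.mean(axis=(1, 2))  # (C,) average-pooled channel descriptor
    mx = F.max(axis=(1, 2))    # (C,) max-pooled channel descriptor

    def mlp(v):
        return W2 @ np.maximum(W1 @ v, 0.0)  # ReLU hidden layer, biases omitted

    return sigmoid(mlp(avg) + mlp(mx))  # (C,) channel weights in (0, 1)


def spatial_attention(F, K):
    """F: (C, H, W); K: (2, k, k) kernel (k odd) applied to [mean; max] over channels."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    k = K.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W
    H, W = F.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # cross-correlation, as computed by deep-learning "convolution" layers
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * K)
    return sigmoid(out)  # (H, W) spatial weights in (0, 1)


def cbam(F, W1, W2, K):
    F1 = channel_attention(F, W1, W2)[:, None, None] * F  # reweight channels
    return spatial_attention(F1, K)[None, :, :] * F1      # reweight locations


# Usage with hypothetical random weights (reduction ratio r = 4).
rng = np.random.default_rng(0)
F = rng.normal(size=(8, 6, 6))
out = cbam(F, rng.normal(size=(2, 8)), rng.normal(size=(8, 2)), rng.normal(size=(2, 7, 7)))
print(out.shape)  # (8, 6, 6)
```

Because both attention maps lie in (0, 1), the block can only rescale activations, emphasizing some channels and locations relative to others; in a trained network the weights would be learned rather than random.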
4. Discussion
Rice blast poses a serious threat to rice production, potentially causing significant yield losses or even total crop failure [37]. Efficient and precise monitoring and control of rice blast are crucial for improving rice yield and quality. HSI is an emerging technology that has been widely applied in agriculture, including disease detection, because it is non-destructive, rapid, and precise.
4.1. Physiological Relevance and Mechanistic Interpretation of Selected Spectral Features
To reduce the effects of nonlinear distortions and noise introduced by the environment and equipment, and to obtain more accurate data, a nine-point Savitzky–Golay (SG) filter was first applied to smooth the hyperspectral data [38]. Identifying the correct spectral features is essential for recognizing abnormal areas and disease lesions [39]; Figure 7 shows the spectral reflectance curves of the different types of ROIs. After pre-processing, the high-dimensional spectral data were therefore analyzed with SPA, RFrog, and CARS to identify characteristic wavelengths. As shown in Table 3, the wavelengths extracted by the three feature selection methods differed somewhat, but all were highly concentrated in the red-edge and NIR regions. Specifically, seven of the ten characteristic wavelengths (685–973 nm) extracted by SPA were located in the red-edge and NIR bands, as were 11 of the 18 wavelengths (648–961 nm) extracted by RFrog and 44 of the 59 characteristic wavelengths (630–990 nm) extracted by CARS. The red-edge band, located between 630 and 720 nm, is the spectral region between the red absorption maximum and the high NIR reflectance [40], reflecting the transition from chlorophyll absorption to leaf scattering. Using derivative analysis, Horler et al. [41] identified a peak at about 720 nm attributed to leaf scattering properties. Table 3 shows that all three methods selected wavelengths near this peak (726 nm, 721 nm, and 718 nm, respectively). The red-edge band was probably selected frequently because rice blast reduces leaf chlorophyll content, which shifts the red-edge position toward shorter wavelengths (a blueshift); this spectral change provides reliable feature support for early disease detection [42]. As infection progresses, fungal hyphae gradually penetrate the leaf cells, altering the NIR reflectance associated with leaf water and weakening the water-holding capacity of the cells. At the same time, the plant produces reactive oxygen species (e.g., hydrogen peroxide) and deposits cellulose at the infection site, eventually triggering tissue necrosis. The selected characteristic wavelengths were therefore highly concentrated in the NIR region, reflecting the sensitivity of these bands to changes in leaf water status [43].
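The nine-point SG filter and the red-edge blueshift discussed above can be illustrated together. The sketch below computes SG weights by least-squares polynomial fitting (a polynomial order of 2 is an assumption; the text specifies only the nine-point window), uses them to smooth or differentiate a spectrum, and takes the red-edge position as the wavelength of maximum first derivative between 680 and 750 nm, a common convention assumed here rather than taken from this study. Edges use nearest-value padding, which differs from, for example, SciPy's default `interp` mode. The synthetic sigmoid spectra and the 2 nm band spacing are hypothetical.

```python
import numpy as np
from math import factorial


def savgol_coeffs(window=9, polyorder=2, deriv=0):
    """Least-squares Savitzky-Golay weights for the centre point of the window."""
    m = window // 2
    z = np.arange(-m, m + 1)
    A = np.vander(z, polyorder + 1, increasing=True)  # columns z**0 ... z**polyorder
    return np.linalg.pinv(A)[deriv] * factorial(deriv)


def savgol(spectrum, window=9, polyorder=2, deriv=0, delta=1.0):
    """Smooth (deriv=0) or differentiate (deriv=1) a 1-D spectrum sampled every `delta`."""
    h = savgol_coeffs(window, polyorder, deriv)
    padded = np.pad(spectrum, window // 2, mode="edge")  # nearest-value edge handling
    return np.correlate(padded, h, mode="valid") / delta**deriv


def red_edge_position(wavelengths, reflectance, lo=680.0, hi=750.0):
    """Wavelength of the maximum first derivative inside [lo, hi] nm."""
    delta = wavelengths[1] - wavelengths[0]  # assumes uniform band spacing
    d1 = savgol(reflectance, deriv=1, delta=delta)
    band = (wavelengths >= lo) & (wavelengths <= hi)
    return wavelengths[band][np.argmax(d1[band])]


# Hypothetical spectra: the "diseased" curve has a lower plateau and an edge shifted by 10 nm.
wl = np.linspace(600.0, 800.0, 101)  # 2 nm spacing
healthy = 0.05 + 0.45 / (1.0 + np.exp(-(wl - 716.0) / 8.0))
diseased = 0.05 + 0.35 / (1.0 + np.exp(-(wl - 706.0) / 8.0))
print(red_edge_position(wl, healthy), red_edge_position(wl, diseased))
```

With these inputs, the estimated red-edge position of the simulated diseased spectrum falls at a shorter wavelength than that of the healthy one, mirroring the blueshift described above.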
4.2. Performance Assessment of Rice Blast Recognition Based on Spectral–Spatial Feature Fusion
Previous studies have shown that combining spectral features with image texture features enables the identification of rice blast [44,45]. In contrast, the DCFM model proposed in this study feeds two-dimensional images, obtained by PCA-based dimensionality reduction of the hyperspectral data, directly into the network. MobileNetV2 was employed to extract spatial information, and the CBAM module was introduced to capture dependencies between spatial neighborhoods and the salient features of the input feature maps, enabling the model to extract more important spatial features. The final SoftMax layer of MobileNetV2 was removed so that its output was the spatial feature vector of the image, eliminating the need to extract texture and shape features separately. A fully connected layer mapped these spatial features to one-dimensional data, which were then fused with the spectral data. This method uses both the spectral and image data obtained from hyperspectral images, allowing structural differences and spectral features associated with rice blast to be considered simultaneously to improve classification accuracy. Three DCFM models were constructed using different input features, all with an OA above 90% and a Kappa above 88%.
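The PCA step that turns the hyperspectral cube into two-dimensional network input can be sketched as follows. Each pixel is treated as a sample and each band as a variable; keeping three components (so that the result resembles a three-channel image for MobileNetV2) is an assumption, as is the random cube in the usage lines, and any rescaling applied before the network is not shown.

```python
import numpy as np


def pca_cube(cube, n_components=3):
    """Project an (H, W, B) hyperspectral cube onto its leading principal components.

    Returns an (H, W, n_components) stack of component images and the
    explained-variance ratio of every component.
    """
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(float)  # one row per pixel, one column per band
    X = X - X.mean(axis=0)                 # centre each band
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ vt[:n_components].T       # pixel scores on the leading components
    explained = s**2 / np.sum(s**2)        # singular values are sorted in descending order
    return scores.reshape(h, w, n_components), explained


# Usage with a hypothetical 32 x 32 cube of 120 bands.
rng = np.random.default_rng(0)
cube = rng.random((32, 32, 120))
images, ratio = pca_cube(cube, n_components=3)
print(images.shape, ratio[:3].sum())
```

The component images are mutually uncorrelated, so the retained channels carry non-overlapping spectral variation, which is the usual motivation for using PCA to compress many bands into a few image channels.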
To further evaluate the classification model, the proposed DCFM model was compared with SVM, PSOSVM, RF, ResNet50, and MobileNetV2, all of which achieved lower OA, Kappa, and F1-scores than DCFM. Combined with the SPA feature selection method, DCFM achieved its best performance, with an OA of 96.72% and a Kappa coefficient of 95.97%. It also classified individual disease levels precisely, with accuracies of 94.48%, 99.42%, 95.03%, 94.53%, and 100% for levels 0, 1, 2, 3, and 4, respectively. Several factors may explain these results. First, SPA selected fewer but more representative feature dimensions: it retained a set of wavelengths with minimal redundancy among bands and effectively avoided multicollinearity. CARS, in contrast, tended to select characteristic wavelengths highly correlated with the classification results, but its stability was easily affected by its adaptive resampling strategy, so the CARS-based model performed slightly worse than the SPA-based model. RFrog, a probability-based feature selection method, was more likely to introduce redundant wavelengths or noise while capturing potential information; this reduced the purity of the feature set and weakened the model’s generalization ability, making its overall performance the poorest of the three methods. Second, the spatial attention mechanism added to the convolutional neural network emphasized important features and suppressed irrelevant information, which strengthened feature extraction and facilitated the fusion of spatial and spectral features, enabling the proposed model to achieve superior classification accuracy and robustness. These results demonstrate the effectiveness and superiority of the dual-channel feature fusion method for rice blast classification.
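SPA's tendency to avoid redundant, collinear wavelengths follows from how it selects bands: each step removes the direction of the last selected band from all others and keeps the band with the largest remaining norm. A simplified NumPy sketch is shown below. It fixes the starting band and the number of bands, whereas complete SPA implementations usually choose both by minimizing a validation error, which is omitted here. The duplicated bands in the usage lines are synthetic and only demonstrate that a band collinear with an already selected one is not picked again.

```python
import numpy as np


def spa(X, n_select, start=0):
    """Simplified successive projections algorithm.

    X: (n_samples, n_bands) spectra. Returns the indices of the selected bands.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Project every band onto the orthogonal complement of the last selected band.
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never reselect a band
        selected.append(int(np.argmax(norms)))
    return selected


# Usage with synthetic data: bands 5 and 6 are exact copies of bands 0 and 1.
rng = np.random.default_rng(1)
base = rng.normal(size=(50, 5))     # 50 spectra, 5 independent bands
X = np.hstack([base, base[:, :2]])
print(spa(X, n_select=5))
```

Once a band is selected, any exact copy of it is projected to (numerically) zero norm, so at most one band of each duplicated pair appears in the output.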
Figure 13A–C show the predictions of the fusion feature models compared with the actual categories, Figure 13D–L present the predictions of models based solely on spectral features, and Figure 13M,N illustrate the results of models using only image features. Models using spectral or image features alone misclassified more samples at the mild disease levels (levels 1–3), indicating that they struggle to discriminate the early stages of the disease. In contrast, DCFM produced markedly fewer missed and incorrect classifications, indicating that the fused spatial information provided more discriminative features. This enhances the practicality and application potential of the DCFM model in complex field environments.
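The OA, Kappa, and per-level accuracies used to compare these models can all be derived from a confusion matrix. The sketch below assumes that rows are true levels and columns are predicted levels, and reports per-level accuracy as recall (correct predictions divided by the number of true samples at that level); whether the per-level accuracies in this study were computed in exactly this way is an assumption.

```python
import numpy as np


def classification_metrics(cm):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    oa = np.trace(cm) / n                                # overall accuracy
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2  # agreement expected by chance
    kappa = (oa - pe) / (1.0 - pe)                       # Cohen's kappa
    per_class = np.diag(cm) / cm.sum(axis=1)             # per-class recall
    return oa, kappa, per_class
```

For example, the hypothetical two-class matrix [[20, 5], [10, 15]] gives OA = 35/50 = 0.7, chance agreement p_e = (25 × 30 + 25 × 20)/50² = 0.5, Kappa = (0.7 − 0.5)/(1 − 0.5) = 0.4, and per-class recalls of 0.8 and 0.6.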
4.3. Advantages and Limitations
With small and medium-sized datasets, complex feature fusion mechanisms may cause problems such as unstable model training. To balance performance and computational efficiency, this study adopted a simple tandem (concatenation) strategy to fuse the spectral features with the spatial features mapped to one-dimensional vectors. Although fusion methods based on attention mechanisms or weighted combinations can, in theory, improve the interaction and representation between different features, they tend to incur substantially higher computational costs. In practical agricultural applications, the model could be integrated into hyperspectral imaging systems and deployed on ground monitoring platforms or handheld portable devices for real-time field monitoring. Owing to its low computational complexity, the model is suitable for resource-constrained agricultural production environments.
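The tandem fusion described above amounts to projecting the pooled CNN features with a fully connected layer and concatenating the result with the reflectance at the selected wavelengths. A minimal NumPy sketch follows. The 1280-dimensional input matches the final feature width of a standard MobileNetV2 (width multiplier 1.0), and the ten spectral inputs match the number of SPA wavelengths reported here; the 64-dimensional projection, the random weights, and the absence of an activation are assumptions, and the adaptive conditioning module is not modeled.

```python
import numpy as np


def concat_fusion(spectral, spatial, W, b):
    """Tandem (concatenation) fusion of spectral and spatial features.

    spectral: (N, k) reflectance at k selected wavelengths
    spatial:  (N, d) pooled CNN features
    W, b:     (d, m) and (m,) weights of a fully connected projection
    Returns an (N, k + m) fused feature matrix for a downstream classifier.
    """
    projected = spatial @ W + b  # map spatial features to an m-dimensional vector
    return np.concatenate([spectral, projected], axis=1)


# Usage with hypothetical sizes and random weights.
rng = np.random.default_rng(0)
n, k, d, m = 4, 10, 1280, 64
spectral = rng.random((n, k))
spatial = rng.normal(size=(n, d))
W = rng.normal(scale=0.01, size=(d, m))
b = np.zeros(m)
fused = concat_fusion(spectral, spatial, W, b)
print(fused.shape)  # (4, 74)
```

The only trainable parameters this fusion adds are the d × m projection weights and m biases, which is consistent with the low computational cost argued above; attention-based fusion would add further layers on top of this.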
Meanwhile, the number of training samples from different complex field environments strongly affects the performance of deep learning models. To further improve the generalization ability and recognition accuracy of the rice blast detection model, future research will expand the sample sources to include rice blast samples from diverse geographical regions, cropping systems, and rice varieties. It will also extend detection from the leaf scale to the canopy scale and incorporate more advanced feature fusion strategies to support broader field-level monitoring and management applications.