Article

Recognition Method of Crop Disease Based on Image Fusion and Deep Learning Model

College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(7), 1518; https://doi.org/10.3390/agronomy14071518
Submission received: 13 June 2024 / Revised: 10 July 2024 / Accepted: 11 July 2024 / Published: 12 July 2024
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Accurate detection of early diseased plants is of great significance for high crop quality and yield, as well as for cultivation management. To address the low accuracy of traditional deep learning models in disease diagnosis, a crop disease recognition method based on multi-source image fusion was proposed, taking adzuki bean rust disease as an example. First, color and thermal infrared images of healthy and diseased plants were collected; the dynamic threshold excess green index algorithm was applied to extract the color image of the canopy as the reference image, and an affine transformation was used to extract the thermal infrared image of the canopy. Then, the color image was fused with the thermal infrared image using a linear weighting algorithm to form a multi-source fusion image. The samples were randomly divided into a training set, validation set, and test set at a ratio of 7:2:1. Finally, a recognition model for adzuki bean rust disease was established based on a novel deep learning model (ResNet-ViT, RMT) combining an improved multi-head self-attention mechanism with the Squeeze-and-Excitation channel attention mechanism. The results showed that the average recognition rate was 99.63%, the Macro-F1 was 99.67%, and the recognition time was 0.072 s. The results enable efficient and rapid recognition of adzuki bean rust and provide a theoretical basis and technical support for crop disease diagnosis and effective field management.

1. Introduction

Diseases seriously affect crop yield and quality, so timely and efficient disease recognition is important for high-quality breeding, scientific cultivation, and fine management [1,2]. The adzuki bean is a short-day, warmth-loving plant with modest soil requirements; it grows best in well-drained sandy loam that retains water and fertilizer and is rich in organic matter. It is not only an important food crop but also a high-quality source of nutrition and health benefits. Rust disease is one of the major factors limiting adzuki bean yield. When conditions favor the pathogen, rust spreads over a wide area, progresses rapidly, re-infects repeatedly, and causes substantial yield losses. If routine prevention is neglected, a severe outbreak year can lead to defoliation, premature plant senescence, shriveled grain, and heavy yield loss. It is therefore necessary to diagnose rust disease promptly and reliably. Rust disease reduces adzuki bean yield by about 20% every year, seriously affecting both yield and quality [3]. The adzuki bean rust studied here is transmitted by spores; at the beginning of infection there are no significant changes in leaf appearance, but as the disease progresses, yellow or light green spots gradually appear and spore mounds form on both sides of the leaves. Traditional disease diagnosis relies mainly on visual inspection based on the professional experience of plant protection experts. Although accurate and authoritative, this approach cannot keep pace with the extremely fast spread of adzuki bean rust, because experts cannot be present in the field at all times, so diagnosis is delayed [4]. With the continuous development of intelligent agriculture, many scholars have combined computer technology and agriculture through image processing. Color images, however, have limited ability to reflect the subtle changes of early rust infection. After a crop is infected, the respiration and transpiration of the canopy are affected, producing a temperature difference between infected and healthy canopies. As the time spent collecting thermal infrared samples at room temperature increases, both the acquisition device and the ambient temperature affect the leaf temperature distribution in the thermal infrared image, so relying on a single image type has limitations for disease recognition. Thus, color and thermal infrared images were fused to improve recognition accuracy. The fused images retain the color information while integrating the thermal signal of the crop's internal physiological state, compensating for the deficiency of a single image type in recognizing early symptoms or resisting environmental interference. The method proposed here enhances the early detection and accurate recognition of plant diseases such as adzuki bean rust, an emerging frontier in intelligent agriculture.
In addition, deep learning and well-designed algorithms play a crucial role in intelligent agriculture. Pre-trained models were used to recognize the dry, healthy, and unhealthy states of lemongrass, and the dataset was expanded, showing that an enriched dataset substantially improves the precision of machine learning models in disease detection [5]. Segmentation of high-resolution remote sensing images was achieved by introducing a fuzzy neighborhood module to resolve ambiguity in the feature information [6]. An improved thresholding neural network was used to improve image quality, and a deep spectral generative adversarial network was used to detect rice leaf blight disease [7]. The DFN-PSAN network, which integrates multi-level deep features, was applied to effectively classify plant diseases in complex natural environments [8]. The ResNet18 network model, based on transfer learning, was used to identify black pepper leaf anthracnose, slow blight, early blight, and yellowing diseases, with up to 99.7% accuracy [9]. By analyzing the shape and color features of different kinds of pests, the HSV segmentation threshold and template contour of pests were obtained, and a shape factor was used to determine the adhesion area; color segmentation and contour positioning were then used to segment non-interspecific and interspecific adhering pests [10]. The affine transformation algorithm was used to obtain thermal infrared images of the lettuce canopy; by calculating the crop water stress index (CWSI) and daily evapotranspiration (ET) from canopy temperature under different treatments, the correlation between CWSI and ET under different irrigation treatments was analyzed to monitor the degree of lettuce water deficit [11]. The thermal infrared canopy area of crops was extracted by the affine transformation algorithm [12,13]. A weighted fusion algorithm fused the input images by calculating the saliency and saturation weights of the two images [14]. Detection and classification of anthracnose, powdery mildew, and leaf blight diseases of South Indian mango were performed using a deep convolutional neural network, with a 93.34% classification accuracy [15]. The above analysis shows that most current studies focus on plant diseases with relatively obvious symptoms, using color images and convolutional or residual neural networks. The adzuki bean rust disease studied in this paper does not show obvious characteristics in its early stage, so color images alone cannot reliably diagnose it. Therefore, the use of multi-source fusion images to diagnose adzuki bean rust disease has great exploration value.
In plant disease detection, automatic disease diagnosis using deep learning models is a contemporary research hotspot in smart agriculture. A convolutional neural network was used to detect downy mildew and early stages of infestation in RGB images of grape leaves; the recognition rate and F1 value for downy mildew at the infected stage were 81% and 77%, respectively [16]. A YOLOv3 model was constructed to diagnose tobacco disease, with an average accuracy of 77% [17]. A BP neural network was adopted to recognize three plant health conditions of mango, lemon, and pomegranate, with an 83.90% recognition accuracy [18]. A hybrid 3D-CNN-RNN model was proposed for the detection of corn leaf diseases, achieving more than 90% accuracy in disease prediction [19]. This analysis shows that traditional deep learning models, which use single-source images as experimental samples, have limitations in recognizing crop diseases with no obvious characteristics. Therefore, early recognition of crop diseases using multi-source images has become a research priority.
In this study, multi-source image fusion algorithms and deep learning models were used to fuse color and thermal infrared images of the adzuki bean canopy for efficient recognition of adzuki bean rust disease. First, the dynamic threshold excess green index algorithm (DEXG) was applied to obtain the color image of the adzuki bean canopy, which was used as the reference image; second, thermal infrared images of the canopy were acquired using affine transformation operations, and the linear weighting algorithm was applied to fuse the multi-source canopy images. Then, data enhancement algorithms (image rotation, salt-and-pepper noise, and Gaussian noise) were used to expand the data samples of healthy and diseased adzuki bean plants. Finally, the multi-head self-attention mechanism (MHSA) was improved and combined with the Squeeze-and-Excitation (SE) attention mechanism embedded into the feed-forward neural network (FFN) structure of the transformer network to form the ResNet-ViT (RMT) network model for efficiently diagnosing adzuki bean rust disease.

2. Materials and Methods

2.1. Experimental Materials

In this research, the Baoqing variety of adzuki bean was used as the research object, cultivated in the artificial climate laboratory of Heilongjiang Bayi Agricultural University, China. Adzuki bean seeds with plump grains were selected and cultivated in PVC pots (20 cm aperture × 15 cm height) filled with a 2:1 soil–sand mixture substrate pre-applied with base fertilizer. Six adzuki beans were planted in each pot, and when each seedling had sprouted at least three fully extended leaves, a urediospore suspension of the rust pathogen was uniformly inoculated onto the leaves using a spray inoculation technique to mimic the infection process. Before collecting the experimental samples, a multi-source image acquisition system based on a thermal infrared imager (FLIR-T620, Wilsonville, OR, USA) was constructed to collect the color and thermal infrared images of the adzuki bean canopy (Figure 1). The FLIR-T620 senses temperature changes in the range of −40 °C to 1200 °C, with a thermal sensitivity of 0.04 °C, a resolution of 640 × 480 pixels, and a wavelength range of 7.8–14 μm.

2.2. Data Acquisition Method

In this paper, the color image and thermal infrared image of the healthy and diseased adzuki bean plants were collected as the research objects. When the image of the adzuki bean plant was collected in an indoor environment, the adzuki bean canopy was fixed in the center area of the image. The distance of the camera lens from the ground was adjusted to 110 cm, and the focal length and aperture were set manually with automatic white balance. The resolution of the color image and the thermal infrared image was set to 640 × 480 pixels. Data samples were collected at a fixed time every day. The collected color images and thermal infrared images of adzuki bean canopies are shown in Figure 2 and Figure 3.

2.2.1. Extraction of the Color Canopy Images

The dynamic threshold excess green index algorithm (DEXG) is an improved algorithm based on the excess green index [20,21]. Through the linear combination of R, G, and B channels, the excess green index effectively enhanced the contrast between the green plants and the background to highlight the green vegetation in the image. However, its segmentation threshold was fixed, which could not adapt to the light and color changes of different images, resulting in a limited recognition efficiency of green areas. Therefore, in this paper, the DEXG algorithm was selected to extract the green canopy region in the color image of the adzuki bean canopy.
The DEXG algorithm applies a dynamic threshold that changes according to the maximum and average values of the excess green index in the image. The index was calculated by the following equation:
$$DEXG = \begin{cases} 255, & 2G - R - B > 255 \\ 2G - R - B, & 0 \le 2G - R - B \le 255 \\ 0, & 2G - R - B < 0 \end{cases}$$
where $R$, $G$, and $B$ represent the red, green, and blue channels of the image, respectively, and $2G - R - B$ is the excess green index value.
By calculating the dynamic threshold excess green index, the gray image was generated to highlight the green canopy area in the image, and the dynamic threshold T2 was used to distinguish the canopy area and the background area in the sample. The calculation equation of T2 was as follows:
$$T_2 = DEXG_{avg} + f \cdot (DEXG_{max} - DEXG_{avg})$$
where $DEXG_{avg}$ represents the average value of the $DEXG$, $DEXG_{max}$ represents its maximum value, and $f$ is the adjustment coefficient, with a value range of $[-0.5, 0.5]$.
The dynamic threshold obtained above was used to extract the canopy region in the sample, and the segmentation results were post-processed with noise removal, filling the holes to obtain a color image of the adzuki bean canopy (Figure 4).
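For concreteness, the DEXG computation and dynamic threshold described above can be sketched as follows in Python with OpenCV; the function name, the morphological post-processing, and the default value of the adjustment coefficient f are illustrative assumptions rather than the authors' implementation.

```python
import cv2
import numpy as np

def extract_canopy_dexg(bgr_image: np.ndarray, f: float = 0.2) -> np.ndarray:
    """Extract the green canopy with the dynamic-threshold excess green index (DEXG)."""
    b, g, r = cv2.split(bgr_image.astype(np.float32))
    exg = 2 * g - r - b                      # excess green index, 2G - R - B
    dexg = np.clip(exg, 0, 255)              # piecewise mapping to [0, 255]

    # Dynamic threshold T2 = DEXG_avg + f * (DEXG_max - DEXG_avg), f in [-0.5, 0.5]
    t2 = dexg.mean() + f * (dexg.max() - dexg.mean())
    mask = (dexg > t2).astype(np.uint8) * 255

    # Post-processing: remove small noise and fill holes with morphological operations
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

    # Keep only the canopy pixels of the original color image
    return cv2.bitwise_and(bgr_image, bgr_image, mask=mask)

# Usage: canopy = extract_canopy_dexg(cv2.imread("adzuki_canopy.jpg"))
```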

2.2.2. Evaluation of Extraction Effect for Color Canopy Image

In this study, the following indexes were used to verify the extraction effect of the color canopy image of the adzuki bean plant.
(1) The DICE coefficient was used to assess the similarity of the color canopy images before and after extraction; it indicates the proportion of the intersected canopy area before and after extraction relative to the total area. The DICE coefficient was calculated by the following equation:
$$DICE = \frac{2|X \cap Y|}{|X| + |Y|}$$
where $|X|$ and $|Y|$ represent the element counts in the color canopy image before and after extraction, respectively. The value range is [0, 1]; the nearer the value is to 1, the more closely the image content before and after extraction matches.
(2) OE (overlap error) is the ratio of the erroneously overlapped area between the extracted canopy image and the original image to the total area. The calculation equation was as follows:
$$OE = \frac{2|X \setminus Y|}{|X| + |Y|}$$
(3) The Jaccard coefficient was used to measure the similarity between images; it represents the proportion of intersection elements before and after extraction relative to their union. The calculation equation was as follows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where $A$ and $B$ represent the color image before and after canopy extraction, respectively, and the range of $J(A, B)$ is [0, 1]; the closer the value is to 1, the higher the match between the images.
The values of each evaluation index for the above methods are shown in Table 1.
Table 1 showed that the DICE index was 0.9971, the Jaccard coefficient was 0.9869, with the value tending to 1, and the OE index was 0.0016, with the value tending to 0. Therefore, the DEXG algorithm selected in this paper could accurately extract the color images of the canopy for the adzuki bean plant.
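A minimal sketch of these three indexes computed from binary canopy masks is given below; the helper name is illustrative, and the overlap-error expression follows the reconstruction above, which is an assumption about the paper's exact definition.

```python
import numpy as np

def mask_agreement(before: np.ndarray, after: np.ndarray) -> dict:
    """Compare binary canopy masks before/after extraction (True = canopy pixel)."""
    x, y = before.astype(bool), after.astype(bool)
    inter = np.logical_and(x, y).sum()
    union = np.logical_or(x, y).sum()
    dice = 2 * inter / (x.sum() + y.sum())
    jaccard = inter / union
    # Overlap error interpreted here as the mis-overlapped area relative to the
    # total area (an assumption; the paper's exact OE definition may differ).
    oe = 2 * np.logical_and(x, np.logical_not(y)).sum() / (x.sum() + y.sum())
    return {"DICE": dice, "Jaccard": jaccard, "OE": oe}
```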

2.2.3. Extraction of the Thermal Infrared Canopy Image

The thermal infrared sensor is highly sensitive to temperature changes. As the time spent collecting the experimental data increases, the thermal infrared image is increasingly affected by the ambient temperature, producing fuzzy, uneven edges with no obvious gray-gradient difference between the canopy region and the background region. Traditional threshold-based image segmentation algorithms therefore cannot accurately extract the canopy region from the thermal infrared images; thus, the affine transform algorithm [13] was selected to extract the thermal infrared images of the canopy.
(1) Registration of the feature point pairs in the reference image
The reference image and the original thermal infrared image were placed in the same coordinate system, and corresponding feature point pairs, $P(x, y)$ and $P'(x', y')$, were labeled. According to the labeled feature point pairs, the multi-source images were initially aligned (Figure 5).
(2) Extraction of the canopy thermal infrared image
After the initial alignment operation, the canopy thermal infrared image could not accurately contain all the regions of the canopy, so the affine transformation parameters were adjusted by translating, rotating, and scaling the reference image to realize the accurate fusion of the reference image and the canopy thermal infrared image. The feature parameters during the 2D affine transformation operation were expressed as follows:
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \eta \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$$
where $\eta$ represents the scaling parameter, $\theta$ the rotation angle, $t_x$ the offset along $x$, and $t_y$ the offset along $y$.
By continuously adjusting the translation, scaling, and rotation parameters, the extracted canopy thermal infrared images were as shown in Figure 6.
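A sketch of this extraction step is shown below, assuming OpenCV and a binary canopy mask derived from the reference color image; the function and parameter names are illustrative, and in practice the transform parameters would be tuned interactively or estimated from the labeled feature-point pairs (e.g., with cv2.estimateAffinePartial2D).

```python
import cv2
import numpy as np

def align_thermal_to_reference(thermal: np.ndarray, canopy_mask: np.ndarray,
                               scale: float, angle_deg: float,
                               tx: float, ty: float) -> np.ndarray:
    """Warp the thermal infrared image with a 2D scale/rotate/translate transform so
    that it overlays the reference color canopy, then keep only the canopy region."""
    h, w = canopy_mask.shape[:2]
    theta = np.deg2rad(angle_deg)
    # [x'; y'] = eta * R(theta) [x; y] + [tx; ty]
    M = np.array([[scale * np.cos(theta), -scale * np.sin(theta), tx],
                  [scale * np.sin(theta),  scale * np.cos(theta), ty]],
                 dtype=np.float32)
    warped = cv2.warpAffine(thermal, M, (w, h))
    return cv2.bitwise_and(warped, warped, mask=canopy_mask)
```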

2.2.4. Evaluation of Extraction Effect of the Thermal Infrared Canopy Image

To evaluate the extraction effect of the canopy thermal infrared image, the segmentation accuracy (SA), over-segmentation rate (OR), and under-segmentation rate (UR) were chosen as the evaluation indexes [22,23,24,25,26].
(1) $SA$ denotes the proportion of pixels correctly recognized as target canopy pixels after segmentation relative to the total number of pixels segmented as canopy. $SA$ was defined as follows:
$$SA = \frac{TP}{TP + FP}$$
where $TP$ is the number of pixels correctly segmented as belonging to the canopy in the thermal infrared image, and $FP$ is the number of background pixels in the original image misclassified as canopy.
(2) $OR$ refers to the ratio of background pixels misclassified as canopy to the total number of pixels segmented as canopy. $OR$ was calculated as follows:
$$OR = \frac{FP}{TP + FP}$$
(3) $UR$ refers to the ratio of pixels that actually belonged to the canopy but were wrongly classified as background to the total number of pixels segmented as canopy. $UR$ was calculated as follows:
$$UR = \frac{FN}{TP + FP}$$
where $FN$ denotes the number of pixels that actually belonged to the adzuki bean canopy but were incorrectly classified as background. Table 2 shows the evaluation results.
Table 2 shows that the over-segmentation rate of the affine transform algorithm was 0.0026 and the under-segmentation rate was 0.0193, both of which converged toward the ideal criterion value of 0. The average segmentation accuracy was 0.9785, which converged toward the ideal criterion value of 1. Therefore, the affine transform algorithm could accurately extract the thermal infrared image of the adzuki bean canopy.
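These three indexes can be computed directly from the predicted and reference binary masks, as in the following sketch (names are illustrative; the denominators follow the equations above).

```python
import numpy as np

def segmentation_scores(pred_mask: np.ndarray, true_mask: np.ndarray) -> dict:
    """Segmentation accuracy, over- and under-segmentation rates from binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    tp = np.logical_and(pred, true).sum()        # canopy correctly segmented as canopy
    fp = np.logical_and(pred, ~true).sum()       # background misclassified as canopy
    fn = np.logical_and(~pred, true).sum()       # canopy misclassified as background
    return {"SA": tp / (tp + fp), "OR": fp / (tp + fp), "UR": fn / (tp + fp)}
```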

2.3. Multi-Source Image Fusion Algorithm

2.3.1. Linear Weighted Algorithm

The linear weighted algorithm is a pixel-level image fusion method [27]: it multiplies the value of each pixel in the color image and the thermal infrared image by a corresponding weight coefficient and dynamically adjusts the weight ratio between the multi-source images to obtain an accurately fused image.
First, the multi-source canopy images were resized to a uniform 224 × 224 pixels as input, and the canopy thermal infrared image was read in grayscale mode. Second, to avoid distortion of the canopy region after fusion, the background region of the multi-source image was extracted by setting a threshold: pixels below the threshold were identified as background and set to white. Then, the R, G, and B channels of the color image were each fused with the temperature information contained in the canopy thermal infrared image to form a fused canopy image. Finally, the weight ratio between the multi-source images was dynamically adjusted to complete the accurate fusion of the multi-source canopy image. The flowchart of the process is shown in Figure 7.
In this study, the linear weighted algorithm was used to fuse the extracted color canopy image with the thermal infrared image. For inputted I 1 and I 2 , the fused image I f u s e was defined as follows:
$$I_{fuse} = \alpha I_1 + \beta I_2$$
where, α and β represent the percentage weights of the color image and thermal infrared image of the canopy in the range of [0, 1], respectively, and α + β = 1 . I 1 and I 2 represent the color canopy image and the canopy thermal infrared image, respectively.
The linear weighted fusion algorithm selected in this paper could easily and efficiently fuse the multi-source images of the canopy. By weighting and summing the multi-source images of the canopy, the weight of the color image was set to α , and the weight of the thermal infrared image was set to β , to generate the fused image.
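A minimal sketch of this fusion step with OpenCV is shown below; the default β = 0.64 anticipates the optimal ratio reported in the next subsection, and the background-whitening step described above is omitted for brevity.

```python
import cv2
import numpy as np

def fuse_color_thermal(color: np.ndarray, thermal_gray: np.ndarray,
                       beta: float = 0.64) -> np.ndarray:
    """Pixel-level linear weighted fusion: I_fuse = alpha * I_color + beta * I_thermal."""
    alpha = 1.0 - beta
    color = cv2.resize(color, (224, 224))
    thermal = cv2.resize(thermal_gray, (224, 224))
    # Replicate the single thermal channel so it can be blended with R, G and B.
    thermal_bgr = cv2.cvtColor(thermal, cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(color, alpha, thermal_bgr, beta, 0)
```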

2.3.2. Evaluation of the Fusion Effect

To determine the weight ratio of the canopy thermal infrared image and the canopy color image, the weight ratio, β , of the thermal infrared image was progressively increased by increments of 1%, within the range of 0–100%. Correspondingly, the color image had a weight ratio of α . The effects of different proportions on the thermal infrared image are illustrated in Figure 8.
In this study, five evaluation indexes were used to evaluate the optimal weight ratio of multi-source canopy images, including mutual information (MI), peak signal-to-noise ratio (PSNR), structural similarity (SSIM), entropy, and edge count [28,29,30,31,32].
(1) $MI$ is an index for evaluating the similarity between images, mainly used to measure the correlation between variables. The closer the value of $MI$ is to 0, the lower the similarity between images; conversely, larger values indicate higher similarity. $MI$ was calculated as follows:
$$MI(X, Y) = \sum_{x \in X}\sum_{y \in Y} p(x, y)\log\left(\frac{p(x, y)}{p(x)p(y)}\right)$$
where $X$ and $Y$ represent the random variables corresponding to images A and B, $p(x, y)$ represents their joint probability distribution, and $p(x)$ and $p(y)$ represent the marginal probability distributions of images A and B, respectively.
(2) PSNR is an index for evaluating the quality of image reconstruction and is commonly used to measure the similarity between the initial image and the fused image. The higher the value of PSNR, the lower the degree of distortion of the image. PSNR was calculated as follows:
$$PSNR = 10 \times \log_{10}\left(\frac{M^2}{MSE}\right)$$
where $M$ represents the maximum pixel value of the fused image, and $MSE$ represents the mean squared error between the initial and fused images.
(3) SSIM is an evaluation method based on three comparative dimensions: brightness, contrast, and structure. It measures the change in structural information between images. SSIM was calculated as follows:
$$SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$ represents the mean, $\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right)^{1/2}$ represents the standard deviation, and $\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$ represents the covariance.
(4) Entropy is a measure of the richness of the information in the image: the higher the entropy value, the more information in the fusion image, and the more complex the texture and details. The calculation process was as follows:
① The occurrence probability of each gray level was calculated, assuming the image contains $N$ pixels with gray levels in the range $[0, L-1]$. The number of occurrences of the $i$th gray level is denoted by $n_i$, and $p_i = n_i / N$ represents the occurrence probability of that gray level.
② The entropy value, $H$, was calculated as follows:
$$H = -\sum_{i=0}^{L-1} p(i)\log_2 p(i)$$
where $-\log_2 p(i)$ represents the amount of information carried by gray level $i$ in the multi-source canopy image.
(5) Edge count is an index used for evaluating the quality and clarity of an image. It assesses the detail and clarity by calculating the number of edges within the image. A higher number of edges represents that the image contains more detailed information. The calculation process proceeded as follows:
① Edge detection
The edge detection operator was used to process the fusion image, resulting in an edge image, E .
② Binarization
The edge image was binarized by setting a specific threshold, T , which transformed the edge image, E , into a binarized image, B .
③ Calculated edge pixels
Edge count was obtained by calculating the total count of pixels with the value of 1 in the binary edge image, B . The calculation equation was as follows:
$$EdgeCount = \sum_x \sum_y B(x, y)$$
where $B(x, y) \in \{0, 1\}$ represents a pixel value in the binary edge image $B$, with 1 representing an edge pixel and 0 a non-edge pixel.
The fused images were analyzed by using the above image evaluation algorithm. Figure 9 shows the visualization results.
Figure 9 indicates that the maximum values of the indexes were MI = 0.936756, PSNR = 0.939969, SSIM = 0.763645, Entropy = 1.000000, and EdgeCount = 1.000000, obtained at canopy thermal infrared image proportions of 64%, 64%, 59%, 64%, and 64%, respectively. When the proportion was 64%, the corresponding SSIM was 0.758827, a difference of 0.0048 from the maximum value. A comprehensive analysis of the five evaluation indexes (MI, PSNR, SSIM, entropy, and edge count) showed that when the proportion of the canopy thermal infrared image was 64% and the proportion of the color image was 36%, the information contained in the images was maximal and the fusion effect of the multi-source images was most effective, which provided a reliable database for the subsequent training of the model.
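The weight-ratio sweep can be scored as in the following sketch, which uses scikit-image for PSNR, SSIM, and entropy and a Canny edge map for the edge count; mutual information is omitted for brevity, and the choice of the color image as reference and the Canny thresholds are illustrative assumptions.

```python
import cv2
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.measure import shannon_entropy

def score_fusion(color: np.ndarray, fused: np.ndarray) -> dict:
    """Score one candidate fusion against the color reference image."""
    gray_ref = cv2.cvtColor(color, cv2.COLOR_BGR2GRAY)
    gray_fused = cv2.cvtColor(fused, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray_fused, 100, 200)
    return {
        "PSNR": peak_signal_noise_ratio(gray_ref, gray_fused),
        "SSIM": structural_similarity(gray_ref, gray_fused),
        "Entropy": shannon_entropy(gray_fused),
        "EdgeCount": int((edges > 0).sum()),
    }

# Sweep the thermal weight from 0 to 100% in 1% steps (using e.g. the
# fuse_color_thermal sketch above) and keep the best-scoring ratio:
# scores = {b / 100: score_fusion(color, fuse_color_thermal(color, thermal, b / 100))
#           for b in range(0, 101)}
```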

2.4. Construction of Improved Deep Learning Model

2.4.1. The ResNet-ViT Network Structure

The transformer network captured the connections of the elements in the input sequence by means of a self-attentive mechanism, independent of traditional convolutional or recursive operations [33,34,35]. The network divided the input multi-source fusion image into non-overlapping small blocks, captured the global dependence between the image blocks in the self-attention layer, and completed the classification of the multi-source fusion image. The structure is illustrated in Figure 10.
The transformer network applies only a simple linear projection to each image block, capturing limited local structural information, so its local feature extraction ability is inferior to that of a residual neural network. Thus, a stem architecture and a residual network were used to improve the transformer network, forming the ResNet-ViT (RMT) network structure. The structure of the RMT model is shown in Figure 11.
The model adequately combined the advantages of the two network models of residual structure and transformer, which could better extract the local and global information of the image and improve the training and inference speed of the model. First, the model contained four stages for generating feature maps at different scales, and each stage was preceded by a patch-embedding layer, which segmented the input multi-source fusion image into image blocks of the same size without overlapping and transformed them into a series of vectors that could be processed by the model. Second, it entered the transformer encoder module to perform the normalization operation and integrated the overall information of the fusion image to complete the image feature transformation through the lightweight multi-headed self-attention mechanism (LMHSA) and improved reverse feed-forward neural network (IRFFN). Finally, the rust disease recognition of adzuki bean plant was completed by averaging the pooling layer to reduce the dimensionality of the fusion image and to determine whether it was a rusted plant or not.
The RMT model has been improved in terms of local information extraction, residual structure, MHSA mechanism, and feed-forward neural network structure, which are described in detail as follows.
(1) The image’s local information was extracted by the local perceptual unit (LPU)
To improve the ability of the model to capture local features in the input image, the LPU was utilized to capture the local features of the image, which was defined as follows:
$$LPU(X) = DWConv(X) + X$$
where $X \in \mathbb{R}^{H \times W \times D}$, $H \times W$ represents the resolution of the current input stage, $D$ represents the dimensionality of the features, and $DWConv(\cdot)$ represents the depth-wise convolution.
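A minimal PyTorch sketch of the LPU as defined above, operating on channel-first feature maps; the 3 × 3 kernel size is an assumption of this sketch.

```python
import torch
import torch.nn as nn

class LPU(nn.Module):
    """Local perceptual unit: a depth-wise convolution plus a residual connection,
    LPU(X) = DWConv(X) + X, applied to feature maps of shape (B, D, H, W)."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dwconv(x) + x
```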
(2) The SE channel attention mechanism was introduced
The SE channel attention mechanism was used to form a residual attention network, which improved the feature-capture and image-recognition ability of the model. The attention mechanism adaptively learns the channel weights of the input features and mines the feature information contained in the fused adzuki bean image in depth, which enhances the representation of fused image features, strengthens the extraction of feature information for healthy and diseased plants, and further improves the recognition accuracy of adzuki bean diseases. The structure of the model with the SE channel attention mechanism embedded in the residual block is shown in Figure 12.
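A sketch of an SE channel attention module embedded in a ResNet-style basic block, as described above; the reduction ratio, the layer sizes, and the placement of the SE module before the skip addition are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: global average pooling ("squeeze")
    followed by a two-layer bottleneck that rescales each channel ("excitation")."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(x)          # channel-wise rescaling of the input features

class SEResidualBlock(nn.Module):
    """A ResNet basic block with SE attention applied before the skip addition."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            SEBlock(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.body(x) + x)
```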
(3) The MHSA mechanism was improved
The traditional MHSA mechanism was applied to extract the global features of the image by modeling the remote dependencies between different locations in the multi-source fusion image of the adzuki bean plant, which was defined as follows:
$$Attn(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
where $Q \in \mathbb{R}^{n \times d_k}$, $K \in \mathbb{R}^{n \times d_k}$, and $V \in \mathbb{R}^{n \times d_v}$; $n = H \times W$ represents the number of patches in the image; and $d$, $d_k$, and $d_v$ represent the dimensions of the input, keys, and values, respectively.
To decrease the computational complexity, a lightweight MHSA was adopted to improve the inference speed and reduce the model size while preserving image recognition performance. Here, $h$ lightweight self-attention heads operate on the input; the output of each head is a sequence of size $n \times (d/h)$, and the $h$ head outputs are concatenated into an $n \times d$ sequence. The spatial size of $K$ and $V$ is reduced by a depth-wise convolution with stride $k \times k$, and a relative position bias, $B$, is added to each self-attention module, giving the following expression:
$$LightAttn(Q, K, V) = \mathrm{Softmax}\left(\frac{QK'^{T}}{\sqrt{d_k}} + B\right)V'$$
where $K' = DWConv(K) \in \mathbb{R}^{\frac{n}{k^2} \times d_k}$, $V' = DWConv(V) \in \mathbb{R}^{\frac{n}{k^2} \times d_v}$, and $B \in \mathbb{R}^{n \times \frac{n}{k^2}}$ is a randomly initialized, learnable relative position bias. The learned bias can be transferred to a bias $B' \in \mathbb{R}^{m_1 \times m_2}$ of a different size $m_1 \times m_2$ by bicubic interpolation (i.e., $B' = \mathrm{Bicubic}(B)$). Thus, the RMT block in this model can be readily adapted to other downstream vision tasks.
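A simplified PyTorch sketch of the lightweight attention described above, in which K and V are spatially reduced by a strided depth-wise convolution and a learnable bias is added to the attention logits; the layer names, the handling of heads, and the requirement that H and W be divisible by k are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class LMHSA(nn.Module):
    """Lightweight multi-head self-attention with spatially reduced keys/values and a
    learnable relative position bias added to the attention logits."""
    def __init__(self, dim: int, heads: int, k: int, h: int, w: int):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.sr = nn.Conv2d(dim, dim, kernel_size=k, stride=k, groups=dim)  # DWConv reduction
        self.proj = nn.Linear(dim, dim)
        n, n_r = h * w, (h // k) * (w // k)
        self.bias = nn.Parameter(torch.zeros(heads, n, n_r))  # relative position bias B

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, d = x.shape
        q = self.q(x).reshape(b, n, self.heads, d // self.heads).transpose(1, 2)
        x_r = self.sr(x.transpose(1, 2).reshape(b, d, h, w))      # (B, D, H/k, W/k)
        x_r = x_r.flatten(2).transpose(1, 2)                      # (B, n/k^2, D)
        k_, v = self.kv(x_r).chunk(2, dim=-1)
        n_r = x_r.shape[1]
        k_ = k_.reshape(b, n_r, self.heads, d // self.heads).transpose(1, 2)
        v = v.reshape(b, n_r, self.heads, d // self.heads).transpose(1, 2)
        logits = q @ k_.transpose(-2, -1) * self.scale + self.bias  # (B, heads, n, n/k^2)
        out = (logits.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```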
(4) FFN structure was improved
The FFN in the traditional ViT model consisted of two fully connected layers. The first fully connected layer extended the dimension of the input vector to facilitate the model to capture more abundant information in the input data. The second fully connected layer compressed the extended dimension to the original dimension and recombined the activated features. The expression was as follows:
$$FFN(X) = \mathrm{ReLU}(XW_1 + \xi_1)W_2 + \xi_2$$
where $W_1$ and $W_2$ represent the weights of the two fully connected layers, respectively, and $\xi_1$ and $\xi_2$ represent the bias terms.
The IRFFN proposed here is similar to an inverted residual block: it uses residual connectivity to prevent the gradient explosion problem and improve the model's performance on the input image, while the lightweight convolution kernel reduces the computational cost and parameter count of the model. The expression was as follows:
$$IRFFN(X) = \mathrm{Conv}(F(\mathrm{Conv}(X)))$$
$$F(X) = DWConv(X) + X$$
where $DWConv(\cdot)$ represents the depth-wise convolution.
The local feature information in the image was extracted by the IRFFN module, and shortcut connections were introduced in the structure to transfer information directly from the shallow layers to the deep layers, thus alleviating the gradient vanishing problem.
In summary, the RMT module in this model could be represented as follows:
$$Y_i = LPU(X_{i-1})$$
$$Z_i = LMHSA(LN(Y_i)) + Y_i$$
$$X_i = IRFFN(LN(Z_i)) + Z_i$$
where $Y_i$ represents the output of the LPU module of the $i$th block, $Z_i$ represents the output features of the LMHSA module of the $i$th block, and $LN$ represents the normalization layer. The model stacks multiple RMT modules in each stage to perform the feature transformation.
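A sketch of the IRFFN and of one RMT block following the three equations above; it assumes the LPU and LMHSA classes from the earlier sketches are in scope, and the expansion ratio and GELU activation are illustrative assumptions rather than the authors' exact settings.

```python
import torch
import torch.nn as nn

class IRFFN(nn.Module):
    """Inverted-residual feed-forward network: 1x1 expand -> depth-wise 3x3 with a
    shortcut -> 1x1 project, i.e. IRFFN(X) = Conv(F(Conv(X))) with F(X) = DWConv(X) + X."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.expand = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.GELU())
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.project = nn.Conv2d(hidden, dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.expand(x)
        x = self.dwconv(x) + x            # F(X) = DWConv(X) + X (shortcut)
        return self.project(x)

class RMTBlock(nn.Module):
    """One RMT block, composing the three steps above:
    Y = LPU(X); Z = LMHSA(LN(Y)) + Y; X' = IRFFN(LN(Z)) + Z."""
    def __init__(self, dim: int, heads: int, k: int, h: int, w: int):
        super().__init__()
        self.lpu = LPU(dim)                      # from the LPU sketch above
        self.attn = LMHSA(dim, heads, k, h, w)   # from the LMHSA sketch above
        self.ffn = IRFFN(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (B, D, H, W)
        b, d, h, w = x.shape
        y = self.lpu(x)
        tokens = y.flatten(2).transpose(1, 2)                     # (B, N, D)
        z = self.attn(self.norm1(tokens), h, w) + tokens
        z_map = z.transpose(1, 2).reshape(b, d, h, w)
        return self.ffn(self.norm2(z).transpose(1, 2).reshape(b, d, h, w)) + z_map
```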

2.4.2. Optimization Algorithm

To further improve the model's performance, the Adam with Weight Decay Fix optimization algorithm (AdamW) was used in this study [36,37]. AdamW combines the Adam optimizer with decoupled weight decay and gradient correction. The weight decay prevents the model weights from growing too large and reduces the risk of overfitting, while the gradient correction prevents the gradient from becoming too large and improves the stability of the model.
The specific steps of the AdamW optimizer to train network parameters were as follows:
① For each parameter to be optimized, a first-moment vector, $m_t$, and an exponentially weighted second-moment (squared-gradient) vector, $v_t$, must be maintained. Here, $m_t$ and $v_t$ were initialized to 0 at the beginning of the search, i.e., $m_0 = 0$ and $v_0 = 0$.
② At each time step, $t$, the AdamW optimizer updated the model parameters, $\theta$, through a series of steps. To update one parameter, for example, it first calculated the gradient, $g_t$, of the current parameter, $\theta_t$, with respect to the loss function, $L(\theta_t)$:
$$g_t = \nabla_{\theta_t} L(\theta_t)$$
It then updated the first moment with the gradient $g_t$ and hyperparameter $\beta_1$:
$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$
and updated the second moment with the squared gradient $g_t^2$ and hyperparameter $\beta_2$:
$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
where $g_t$ is the gradient at the current time step, $g_t^2$ is its element-wise square, $\beta_1$ is the decay factor of the first moment, and $\beta_2$ is the decay factor of the second moment. Here, $\beta_1$ and $\beta_2$ were used to control the first- and second-order momentum.
③ Because $m_t$ and $v_t$ were initialized to 0 in step ①, the moment estimates are biased toward 0, especially in the early training stage, so the bias must be corrected to reduce its influence on the initial training stage.
The deviation correction was carried out for both moments, starting with the first moment:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$
and for the second moment:
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
where $\beta_1^t$ and $\beta_2^t$ decay according to schedule during the iteration of the algorithm, with $\beta_1^t = (\beta_1)^t$ and $\beta_2^t = (\beta_2)^t$.
Finally, the parameters were iteratively updated:
$$\theta_{t+1} = \theta_t - \alpha\left(\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \varepsilon} + \lambda\theta_t\right)$$
where $\alpha$ is the step-size hyperparameter (learning rate), $\varepsilon$ is a small smoothing term that prevents the denominator from being 0, and $\lambda$ is the weight decay coefficient.
The relevant hyperparameters of the experimental model in this paper were set as follows: the learning rate $\alpha$ was 0.001, adjusted by exponential decay; the first-moment decay factor $\beta_1$ was 0.9; the second-moment decay factor $\beta_2$ was 0.99; and the smoothing term $\varepsilon$ was 1 × 10−8. Each batch of 64 images was used for training.
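In PyTorch, this configuration corresponds roughly to the sketch below; the weight-decay value and the exponential-decay factor gamma are not reported in the paper and are shown only as placeholder assumptions.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import ExponentialLR

def build_optimizer(model: nn.Module):
    """AdamW with the hyperparameters reported above; weight_decay and gamma are
    illustrative placeholders, not values from the paper."""
    optimizer = AdamW(model.parameters(), lr=1e-3, betas=(0.9, 0.99),
                      eps=1e-8, weight_decay=1e-2)
    scheduler = ExponentialLR(optimizer, gamma=0.95)   # exponential learning-rate decay
    return optimizer, scheduler

# Typical use inside a training loop (cross-entropy over the two classes):
# optimizer, scheduler = build_optimizer(rmt_model)
# for epoch in range(num_epochs):
#     for images, labels in train_loader:
#         optimizer.zero_grad()
#         loss = nn.functional.cross_entropy(rmt_model(images), labels)
#         loss.backward()
#         optimizer.step()
#     scheduler.step()
```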

3. Results

3.1. Experimental Environment

To diagnose the adzuki bean rust disease, the research was completed with the PyTorch deep learning framework, in which the AlexNet convolutional network, the ResNet18 residual neural network, the transformer network, and the RMT model were constructed. The hardware environment was a Windows 10 64-bit operating system with an NVIDIA GeForce GTX 1050 Ti (4096 MB). The software environment was Anaconda3 (64-bit), CUDA 10, Python 3.9, and PyTorch-GPU 2.0.

3.2. Sample Expansion and Segmentation

The size of the acquired multi-source fusion images of the adzuki bean canopy was unified to 224 × 224 pixels. To address image deflection caused by the shooting angle and image distortion caused by sensor failure, lighting changes, and other factors, this research adopted image rotation, salt-and-pepper noise, and Gaussian noise methods to expand the experimental samples and provide a sufficient database for model training [38,39].
(1) Image rotation
Image rotation is the rotation of a point within an image by a specific angle in a clockwise or counterclockwise direction, and the principle is shown in Figure 13.
In Figure 13, the coordinates of $P$ and $Q$ are $(x, y)$ and $(x', y')$, respectively. Placing the two points on the unit circle, the two sets of coordinates and the angle relationship between them are expressed as follows:
$$x = \cos\alpha, \quad y = \sin\alpha$$
$$x' = \cos(\alpha + \theta) = x\cos\theta - y\sin\theta, \quad y' = \sin(\alpha + \theta) = x\sin\theta + y\cos\theta$$
The center of the image was taken as the origin, and $\theta$ was set to 90° and 180°, meaning that the multi-source fusion image of the adzuki bean canopy was rotated counterclockwise by 90° and 180° to obtain the rotated fused canopy images.
(2) Salt and pepper noise addition
Salt-and-pepper noise is impulse noise caused by signal pulse intensity, in which pepper noise and salt noise take pixel values of 0 and 255, respectively. Pixel points in the multi-source fusion image of the adzuki bean canopy were randomly selected and their gray values set to 0 or 255 to simulate pepper and salt noise, and the generated noise was then added to the multi-source fusion image.
(3) Gaussian noise
The density of the Gaussian noise was determined by the distribution $G(X_{mean}, sigma)$, where $X_{mean}$ represents the mean and $sigma$ the standard deviation, and the expression for the output pixel, $P_{out}$, was:
$$P_{out} = P_{in} + X_{mean} + sigma \times G(d)$$
where $P_{in}$ represents each input pixel, $d$ represents a uniformly distributed random number in the range $(0, 1)$, and $G(d)$ represents the corresponding Gaussian-distributed random value.
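The three augmentations can be sketched as follows with OpenCV and NumPy; the noise amount and the Gaussian standard deviation are illustrative defaults, not values reported in the paper.

```python
import cv2
import numpy as np

def rotate(img: np.ndarray, angle: int) -> np.ndarray:
    """Counter-clockwise rotation by 90 or 180 degrees about the image centre."""
    code = {90: cv2.ROTATE_90_COUNTERCLOCKWISE, 180: cv2.ROTATE_180}[angle]
    return cv2.rotate(img, code)

def salt_pepper(img: np.ndarray, amount: float = 0.02) -> np.ndarray:
    """Set a random fraction of pixels to 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    coords = np.random.rand(*img.shape[:2])
    noisy[coords < amount / 2] = 0
    noisy[(coords >= amount / 2) & (coords < amount)] = 255
    return noisy

def gaussian_noise(img: np.ndarray, mean: float = 0.0, sigma: float = 10.0) -> np.ndarray:
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noise = np.random.normal(mean, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
```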
The enhanced multi-source fusion image of the adzuki bean is shown in Figure 14, where (a)–(d) and (e)–(f) represent the enhanced effects of the healthy and diseased plants of the adzuki bean, respectively.
Taking 73 color images and 73 thermal infrared images as experimental objects, there were 40 infected and 33 healthy samples for each image type, giving 33 healthy and 40 infected fused images. The above data enhancement algorithms were used to expand the samples, yielding 13,152 diseased and 12,646 healthy multi-source fused canopy images, for a total of 25,798 images. The dataset was then randomly divided into a training set, validation set, and test set in the ratio of 7:2:1 for training, validation, and performance testing, giving 18,060 training samples, 5159 validation samples, and 2579 test samples of healthy and diseased plants. The number and division of multi-source fusion image samples of healthy and diseased plants are shown in Table 3.
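A minimal sketch of the 7:2:1 random split with PyTorch; the folder layout read by ImageFolder and the fixed random seed are assumptions for illustration.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# One sub-directory per class ("healthy", "diseased") is assumed for ImageFolder.
dataset = datasets.ImageFolder("fused_canopy_images",
                               transform=transforms.Compose([
                                   transforms.Resize((224, 224)),
                                   transforms.ToTensor()]))
n = len(dataset)
n_train, n_val = int(0.7 * n), int(0.2 * n)
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n - n_train - n_val],
    generator=torch.Generator().manual_seed(42))
```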

3.3. Model Training

To avoid pixel differences between images and improve the model training speed and accuracy, in this experiment, the pixels of the multi-source fusion images of the adzuki bean plant were normalized to 224 × 224 . When the network model was trained, the batch of training samples was set to 4, the epoch was 100 times, and the data from each batch were normalized using batch normalization. During the training process, the AdamW optimizer was selected, with a learning rate of 0.001. To assure the experiment’s accuracy, the same optimization algorithm, sample size, number of training rounds, and activation function were used in all four network models employed in this study.
The Alexnet network [40] was the beginning of deep learning in the field of image recognition, being the typical neural network model. The ResNet18 network [41] introduced residual connection in traditional neural networks, which solved the problem of gradient disappearance and gradient explosion caused by the increase in the network depth and improved the model’s ability of extracting local features of the image. The transformer network [42] used the mechanism of self-attention, which worked in capturing the global features of the image. Thus, the Alexnet, ResNet18, and transformer networks were selected as the comparison models. During the training and validation of the AlexNet, ResNet18, and transformer network models, the variation in accuracy and loss values were as shown in Figure 15.
Figure 15a,b showed that in the training set and validation set, the accuracy curves of the Alexnet model, the ResNet18 model, and the transformer model changed by 0.427863 and 0.434221, 0.179695 and 0.123355, and 0.466686 and 0.54524, respectively. The Epoch values at convergence were 41 and 42, 38 and 43, and 72 and 62, respectively. The accuracy rates were 96.93% and 95.66%, 97.62% and 96.33%, and 97.27% and 99.88%, respectively. The accuracy curves of the RMT model on the training set and the verification set were 0.395362 and 0.254995, respectively. When Epoch = 35 and Epoch = 19, the model began to converge, and the accuracy was 99.41% and 100%. The smaller variation in accuracy indicated higher network stability, and the stability of the models employed in this study was sequentially improved, following the order of transformer, AlexNet, RMT, and ResNet18.
Figure 15c,d showed that the loss values of the training set and the validation set in the Alexnet model, the ResNet18 model, and the transformer model decreased by 0.656602 and 0.667728, 0.458021 and 0.429020, and 0.951776 and 1.151022, respectively. The models began to converge when the Epoch values were 43 and 51, 50 and 54, and 50 and 72, respectively. The loss value of the RMT model on the training set and the verification set decreased by 0.768456 and 0.552645, respectively. When Epoch = 42 and Epoch = 29, the models began to converge. When the convergence value approached 0 and the Epoch value decreased, it indicated that the network learned more efficiently. Based on the order of transformer, ResNet18, Alexnet, and RMT, the convergence rate of each network model on the training and validation sets was enhanced sequentially.
A comprehensive analysis shows that the RMT model was the best-performing network during training and validation and could recognize the adzuki bean rust disease more efficiently.

3.4. Simulation Test

The sample size in the test set was uniformly resized to 224 × 224 pixels, and four neural network models (Alexnet, ResNet18, transformer, and RMT) were used for the comparative evaluation of training and performance. Table 4 shows the recognition accuracies of these four models on the test set.
Table 4 shows that the Alexnet network had the lowest recognition accuracy, and the ResNet18 network was 1.99% more accurate compared to the Alexnet network. The reason was that the residual structure was able to extract the disease features in a more detailed way, thus improving the recognition of the disease. The average recognition accuracy of the RMT network was found to be 4.61%, 2.63%, and 2.55% higher than that of the Alexnet, ResNet18, and Transformer networks, respectively. The superior performance of the RMT network was attributed to its retention of the long-distance dependence advantage typical of the transformer network in processing images. Additionally, it incorporated the ResNet18 residual neural network with an SE attention mechanism, which significantly enhanced the model’s capability to capture and learn local features of the input samples and improved the accuracy of the model in recognizing adzuki bean rust.
In this paper, a confusion matrix was used to evaluate the effectiveness of the models in classifying adzuki bean rust disease. Figure 16 shows the results of these four models in recognizing the adzuki bean rust disease.
Figure 16 shows that the models were ranked as Alexnet, ResNet18, transformer, and RMT in descending order of the overall recognition error rate value. Compared to the Alexnet model, the ResNet18 network incorporated a residual structure and reduced the error rate of disease recognition. The transformer network excelled at capturing global dependencies between images. Consequently, the transformer network was combined with the ResNet18 network to form the RMT network model, further reducing the recognition error rate. In the test set, the RMT network model performed the best.
In Table 5, the model size was reduced in a sequence of transformer, ResNet18, RMT, and Alexnet. The recognition time was reduced in a sequence of transformer, Alexnet, ResNet18, and RMT. Thus, the RMT model performed optimally.
Precision rate and recall rate are indicators of model classification performance and are a pair of contradictory measurements. It was not reasonable to rely on one of them alone to judge the strengths and weaknesses of the model, so it was necessary to use both of them together to evaluate the model more comprehensively. F1-score, as a reconciled average of the precision rate and the recall rate, effectively balanced the impact of the two indexes, and made the evaluation of the model more reasonable.
The specific steps for evaluating the effectiveness of the RMT model in recognizing adzuki bean rust using the F1-score were as follows:
(1) Precision rate
The precision rate represents the proportion of samples predicted as positive that were truly positive, which was calculated as follows:
$$Precision_k = \frac{TP}{TP + FP}$$
where $TP$ represents positive samples correctly predicted as positive, $FP$ represents negative samples incorrectly predicted as positive, and $k \in \{0, 1\}$ indexes the two classes of adzuki bean plants (rust-diseased and healthy).
(2) Recall rate
The recall rate indicates the proportion of actually positive samples that were correctly predicted as positive, which was calculated as follows:
$$Recall_k = \frac{TP}{TP + FN}$$
where $FN$ represents positive samples incorrectly predicted as negative, and $k$ again indexes the rust-diseased and healthy classes of adzuki bean plants.
(3) F1-score
According to the precision rate and recall rate, the F1-score was calculated as follows:
$$F1_k = \frac{2 \times Precision_k \times Recall_k}{Precision_k + Recall_k}$$
(4) Macro-F1
From the per-class F1-scores, the macro-average (Macro-F1) was computed, yielding the final evaluation results. The equation was as follows:
$$Macro\text{-}F1 = \frac{1}{n}\sum_{k} F1_k$$
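With scikit-learn, the per-class precision, recall, and F1 together with the Macro-F1 can be computed as in the following sketch; the label encoding (0 = healthy, 1 = rust) and the function name are assumptions for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support, f1_score

def report_metrics(y_true, y_pred) -> dict:
    """Per-class precision/recall/F1 and the Macro-F1 from test-set predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1], zero_division=0)
    macro_f1 = f1_score(y_true, y_pred, average="macro")  # mean of per-class F1 scores
    return {"precision": precision, "recall": recall, "f1": f1, "macro_f1": macro_f1}
```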
According to the confusion matrix and Equations (32)–(34), the Alexnet, ResNet18, transformer, and RMT models were evaluated for the recognition performance of adzuki bean rust disease. The evaluation results of each model are shown in Figure 17.
We macro-averaged the above F1 values to obtain Macro-F1 values to evaluate the final results [38]. The Macro-F1 value of each model is shown in Figure 18.
In Figure 17 and Figure 18, when the precision rate, recall rate, and F1-score all converged to 1, the performance of the model was better, and thus the model performance was improved sequentially. The RMT model performed better than the other three models, with the highest Macro-F1 value of 99.67% and an average recognition rate of 99.63%, including 99.3% for color images of leaves without obvious symptoms, which achieved fast and efficient recognition of adzuki bean rust disease.

4. Discussion

In this paper, a crop disease recognition method based on multi-source image processing technology was proposed, and it achieved a better effect in the diagnosis of the adzuki bean rust disease, which is discussed as follows.
(1) Fusion efficiency of the multi-source image
Pre-processing of the data samples was the basic condition for efficient training of the network model. First, the DEXG algorithm was used to acquire the color canopy image, compared with the single background threshold segmentation algorithm [43,44]; the algorithm selected in this study was able to accurately extract the color image of the adzuki bean canopy against a complex background, and this image was used as the reference image. Second, the color image and the original thermal infrared image of the canopy were initially aligned, and the thermal infrared image was then accurately extracted by affine transformation of the reference image, which improved the segmentation accuracy by 11% compared with an image segmentation algorithm based on grayscale clustering [45]. Then, the color image and thermal infrared image were fused using the linear weighting algorithm to obtain effective experimental samples. Compared with an image fusion algorithm based on feature optimization and GAN [46], the entropy value of the algorithm used in this paper improved by 20.49%, achieving the fusion of multi-source images in a simpler and more efficient way. Finally, the experimental samples were expanded to 25,798 by applying the image rotation, salt-and-pepper noise, and Gaussian noise data enhancement algorithms, which enhanced the data diversity and robustness of the neural network training and provided a sufficient and reliable experimental dataset for training and extracting image feature information.
(2) Advantage of the recognition model
In this paper, the performance of the Alexnet network was compared with the ResNet18 network. The ResNet18 network with the inclusion of a residual structure was 1.99% higher than the Alexnet network in the recognition accuracy of adzuki bean rust disease. The recognition accuracy of rust disease was improved by 7.23% compared to the lightweight CCG-YOLOv5n model [47]. The ResNet Block was used to improve the feed-forward neural network for the transformer network to form an inverse residual feed-forward neural network (IRFFN) combined with the SE attention mechanism. Compared to the apple leaf disease recognition model [48], the RMT model used in this paper enhanced the ability to capture local features of an image and effectively mitigated the gradient explosion problem, which enhanced the recognition precision of the model. The MHSA mechanism in the transformer network was improved to the lightweight MHSA mechanism to reduce the computational effort. The RMT model used in this paper achieved accurate and non-destructive diagnosis of crop diseases, as compared to the existing research literature [49,50,51]. In the test set, the average recognition rate of the RMT model was 99.63%, the F1-score was 99.67%, and the recognition time was 0.072184 s. The recognition accuracy of the RMT model was 99.3% for the samples with no obvious symptoms in the color images. Compared with the models of Alexnet, ResNet18, and transformer, the accuracy, Macro-F1 and recognition time of the RMT model were improved by 3.27%, 5.48%, and 37.79% on average, respectively. Therefore, the model was able to recognize adzuki bean rust efficiently, rapidly, and immediately.
(3) Model error analysis
The network model proposed in this paper was capable of recognizing adzuki bean rust disease, but misrecognition occurred in the simulation experiment. The early characteristics of rust disease are not obvious and show some similarity to healthy samples, which interferes with accurate recognition, so more detailed feature differentiation algorithms are required for the disease characteristics. During data acquisition, prolonged use of the imaging equipment raised the temperature of the camera body, which distorted some of the collected thermal infrared images of the adzuki bean. Further, it is better to acquire data in the morning to avoid strong light. If data acquisition is conducted with a thermal infrared sensor in the field, a precise calibration device, such as a black body, should be adopted to reduce the impact of prolonged strong light on data accuracy [52]. Additionally, a variety of network models were compared, and the network structure and algorithms were optimized to avoid overfitting during the training process, which would otherwise lead to unsatisfactory recognition of adzuki bean rust disease.
(4) Future work
The multi-source canopy image extraction and fusion algorithm selected in this research had higher robustness and was able to accurately acquire the canopy and its fused images. In addition, the RMT model proposed in this research also possessed a high recognition rate in the case of inconspicuous adzuki bean disease characteristics, which had a wide range of application prospects. For different plant and disease species, the RMT model proposed in this research was utilized, equipped with a strong generalization ability and adaptive learning properties to enable specific model construction for a wide range of plant diseases. Based on the content of this research, future work will expand the scope of plant disease detection in this research and integrate the model into portable mobile terminals. Meanwhile, we plan to add a thermal infrared sensor to the phenotype device (PlantEye F500, Heerlen, The Netherlands) to acquire not only 3D data and multi-spectral data, but also color and thermal infrared data with the support of the black body. Once the improved phenotype device is applied in field conditions, the high-throughput phenotypic traits can be acquired rapidly and accurately based on the established models to accomplish real-time monitoring of phenotypic traits, which will help agricultural researchers to detect diseases as early as possible and provide accurate disease prevention as well as control measures to optimize the production structure. By combining the use of Internet of Things technology, an intelligent agricultural management system will be established to realize continuous remote monitoring of crop growth and disease infection, which will assist farmers in specifying accurate pest control measures and optimization management strategies.

5. Conclusions

In this paper, healthy and diseased plants of the adzuki bean plant were used as experimental datasets, and the DEXG algorithm and affine transform algorithm were applied to extract color images and thermal infrared images of the canopy, respectively. The linear weighting algorithm was used to fuse multi-source images of the canopy. Based on applying the method of digital image enhancement for image sample expansion and segmentation, the multi-head attention mechanism was improved, and the SE channel attention mechanism was combined with residual blocks and embedded into the feed-forward neural network structure of the transformer to establish a novel adzuki bean rust disease recognition model (RMT). The model had an average recognition accuracy of 99.63%, which was higher than the recognition accuracy using color canopy images or thermal infrared canopy images alone. The proposed model achieved a recognition time of 0.072184 s and a Macro-F1 value of 99.67%. Among them, the recognition rate of samples without obvious symptoms in the leaf color image was 99.3%. The results showed that the linear weighting algorithm proposed in this paper realized the accurate extraction and fusion of multi-source canopy images of adzuki bean disease, provided data support and technical support for the high-throughput automated phenotyping research of the crop, and the novel neural network (RMT)-based recognition method of adzuki bean rust was adopted, which accurately, rapidly, and efficiently realized the diagnosis of adzuki bean rust disease. In the future, high-throughput phenotypic traits (plant type traits and component traits) will be acquired in field using the multi-source sensors (RGB sensor, 3D laser sensor, thermal infrared sensor and multispectral sensor) to monitor the plant diseases in the growth stage. The proposed method provided scientific guidance for variety selection, planting technology, and a refined management strategy of adzuki bean plant, and laid the theoretical foundation and technological support for intelligent big data analysis and extraction of crop disease phenotypes.

Author Contributions

X.M., L.W. and H.G. conceived and designed the experiments; X.M. and X.Z. generated the data; X.Z. analyzed the data; X.Z. and H.G. analyzed and processed the data; X.M. and X.Z. drafted the manuscript; X.Z., X.M. and H.G. revised the manuscript; L.W. acquired the funding. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded jointly by the Natural Science Foundation of Heilongjiang Province, China (funding code: LH2021C062), the Heilongjiang Bayi Agricultural University Support Program for San Heng San Zong, China (funding code: ZRCQC202311), and the Postdoctoral Scientific Research Developmental Fund of Heilongjiang Province, China (LBH-Q20053).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The dynamic acquisition system of the multi-source image for adzuki bean plant.
Figure 2. Multi-source images of healthy plant and diseased plant with no obvious infection symptoms. Note: (a1) Color image of healthy condition. (a2) Thermal infrared image of healthy condition. (b1) Color image of rust disease infection at 24 h. (b2) Thermal infrared image of rust disease infection at 24 h. (c1) Color image of rust disease infection at 48 h. (c2) Thermal infrared image of rust disease infection at 48 h. (d1) Color image of rust disease infection at 72 h. (d2) Thermal infrared image of rust disease infection at 72 h. (e1) Color image of rust disease infection at 96 h. (e2) Thermal infrared image of rust disease infection at 96 h. (f1) Color image of rust disease infection at 120 h. (f2) Thermal infrared image of rust disease infection at 120 h.
Figure 3. Multi-source images of plants with obvious infection symptoms. Note: (a1) Color image showing chlorotic yellow spots. (a2) Thermal infrared image showing chlorotic yellow spots. (b1) Color image showing dark yellow urediniospores. (b2) Thermal infrared image showing dark yellow urediniospores. (c1) Color image showing dark brown lesions. (c2) Thermal infrared image showing dark brown lesions. (d1) Color image showing yellowing, drying, and wilting. (d2) Thermal infrared image showing yellowing, drying, and wilting.
Figure 4. The extracted color canopy images. Note: (a1) Extracted color image of the healthy condition. (b1) Extracted color image of rust disease infection at 24 h. (c1) Extracted color image of rust disease infection at 48 h. (d1) Extracted color image of rust disease infection at 72 h. (e1) Extracted color image of rust disease infection at 96 h. (f1) Extracted color image of rust disease infection at 120 h. (g1) Extracted color image showing chlorotic yellow spots. (h1) Extracted color image showing dark yellow urediniospores. (i1) Extracted color image showing dark brown lesions. (j1) Extracted color image showing yellowing, drying, and wilting.
Figure 5. Initial registration of feature point pairs.
Figure 6. Canopy thermal infrared images. Note: (a1) Thermal infrared image of the healthy condition. (a2) Extracted thermal infrared canopy image of the healthy condition. (b1) Thermal infrared image of rust disease infection at 24 h. (b2) Extracted thermal infrared canopy image of rust disease infection at 24 h. (c1) Thermal infrared image of rust disease infection at 48 h. (c2) Extracted thermal infrared canopy image of rust disease infection at 48 h. (d1) Thermal infrared image of rust disease infection at 72 h. (d2) Extracted thermal infrared canopy image of rust disease infection at 72 h. (e1) Thermal infrared image of rust disease infection at 96 h. (e2) Extracted thermal infrared canopy image of rust disease infection at 96 h. (f1) Thermal infrared image of rust disease infection at 120 h. (f2) Extracted thermal infrared canopy image of rust disease infection at 120 h. (g1) Thermal infrared image showing chlorotic yellow spots. (g2) Extracted thermal infrared canopy image showing chlorotic yellow spots. (h1) Thermal infrared image showing dark yellow urediniospores. (h2) Extracted thermal infrared canopy image showing dark yellow urediniospores. (i1) Thermal infrared image showing dark brown lesions. (i2) Extracted thermal infrared canopy image showing dark brown lesions. (j1) Thermal infrared image showing yellowing, drying, and wilting. (j2) Extracted thermal infrared canopy image showing yellowing, drying, and wilting.
Figure 7. Image fusion flowchart.
Figure 8. The weight proportions of the canopy thermal infrared image.
Figure 9. Evaluation results using five indexes.
Figure 10. Structure of the transformer block.
Figure 11. The structure of the RMT model.
Figure 12. The SE channel attention mechanism structure.
Figure 13. Rotating diagram.
Figure 14. Data enhancement effects.
Figure 15. Training set and validation set accuracy and loss rate.
Figure 16. Comparison of the four models in recognizing the adzuki bean rust disease.
Figure 17. Evaluation results of each model.
Figure 18. Macro-F1 value.
Table 1. Evaluation of the segmentation method.
Evaluation Indicator | DEXG Algorithm
DICE                 | 0.9971
OE                   | 0.0016
Jaccard              | 0.9869
Table 2. Evaluation of extraction results.
Canopy Extraction Method | OR     | UR     | SA
Affine transformation    | 0.0026 | 0.0193 | 0.9785
Table 3. Division of the number of samples of adzuki beans.
Status of Adzuki Beans | Training Set | Validation Set | Test Set
Rust disease           | 9207         | 2630           | 1315
Healthy                | 8853         | 2529           | 1264
Table 4. Comparison of models on the test set (accuracy/%).
Plant Condition | AlexNet | ResNet18 | Transformer | RMT
Rust disease    | 94.17   | 96.36    | 96.76       | 99.47
Healthy         | 95.86   | 97.65    | 97.39       | 99.78
Average value   | 95.01   | 97.00    | 97.08       | 99.63
Table 5. Performance of the models.
Performance Index          | AlexNet  | ResNet18 | Transformer | RMT
Model size/MB              | 55.69    | 72.76    | 127.35      | 61.26
Average recognition time/s | 0.086380 | 0.073682 | 0.138329    | 0.072184