Deep-Learning Approach for Fusarium Head Blight Detection in Wheat Seeds Using Low-Cost Imaging Technology

Abstract: Modern techniques that enable high-precision and rapid identification/elimination of wheat seeds infected by Fusarium head blight (FHB) can help to prevent human and animal health risks while improving agricultural sustainability. Robust pattern-recognition methods, such as deep learning, can achieve high precision in detecting infected seeds using more accessible solutions, such as ordinary RGB cameras. This study used different deep-learning approaches based on RGB images, combining hyperparameter optimization and fine-tuning strategies with different pretrained convolutional neural networks (convnets), to discriminate wheat seeds of the TBIO Toruk cultivar infected by FHB. The models achieved 97% accuracy using a low-complexity architecture designed through hyperparameter optimization and 99% accuracy using pretrained convnets with fine-tuning. These findings suggest the potential of low-cost imaging technology and deep-learning models for the accurate classification of wheat seeds infected by FHB. However, FHB symptoms are genotype-dependent, and therefore the accuracy of the detection method may vary depending on phenotypic variations among wheat cultivars.


Introduction
Wheat (Triticum aestivum L.) is a key crop for global food security [1]. However, this crop is threatened by numerous biotic and abiotic pressures, which result in significant losses in productivity and quality [2]. Fusarium head blight (FHB), predominantly caused by Fusarium graminearum Schwabe, is one of the most widespread and devastating fungal diseases in wheat production [3]. This pathogen produces different mycotoxins, such as deoxynivalenol (DON), that accumulate in wheat seeds and can seriously affect human and animal health [4]. Furthermore, mycotoxins drastically compromise the physiological potential of seeds used to establish production fields [5]. Therefore, techniques that enable the identification of seeds infected by FHB are necessary to mitigate the risks associated with this disease. For instance, enzyme-linked immunosorbent assays (ELISA) and high-performance liquid chromatography (HPLC) are among the most widely used and accurate traditional techniques for identifying contaminated seeds. However, they have high costs [6] and require a large number of samples to be tested, which affects the efficiency of the seed production chain.
Emerging techniques, particularly those based on spectroscopy, hyperspectral, and multispectral imaging, have shown satisfactory results for the detection of FHB in wheat spikes or crop canopy in the field [7][8][9][10][11][12][13]. Few studies, however, have tried to apply imaging systems to detect FHB or predict DON content levels in wheat kernels, seeds, or flour, and most of them were performed with hyperspectral imaging techniques [14][15][16][17][18]. Although these techniques provide important spectral information from qualitative and quantitative measurements, there are still challenges related to cost, complexity, and instrumentation [17,19,20]. Alternatively, ordinary RGB cameras and scanners are more accessible options, and although they are limited in terms of spectral resolution, when combined with robust pattern-recognition methods, such as deep learning, RGB cameras and scanners can also achieve high precision [21][22][23] (Table S1).
Deep-learning models, including convolutional neural networks (convnets), provide outstanding performance in solving computer-vision tasks, such as plant disease recognition, fruit classification, and crop seed phenotyping [24][25][26][27][28][29]. In the convnets, the input goes through a convolutional base composed of convolutional layers (conv layers) that perform feature extraction. In the final portion of these networks, there is a classifier consisting of fully connected layers (dense layers) that perform a classification to produce the network output [30]. These networks learn how to predict the results based on features extracted from inputs, and then they can be applied to assess seed quality without elaborate equipment or manual extraction.
When building a deep-learning model, an important step is the choice of hyperparameters. Hyperparameters are usually set somewhat arbitrarily: they define aspects of the network architecture and training procedure that, unlike the model parameters, are not learned via backpropagation. Different strategies can be used to adjust these hyperparameters, including random search iterations that maximize search efficiency [31]. Another useful approach is to use a convnet previously trained on a large dataset with established hyperparameters [30]. With pretrained networks, a method called fine-tuning can be used. This method consists of retraining the weights of a few of the top layers of a pretrained convolutional base while jointly training a new fully connected classifier, to adjust the more abstract representations of the model [32]. Therefore, the features learned by a pretrained network can effectively act as a generic model for other computer-vision tasks [30].
The objectives of this work were as follows: (1) to provide deep-learning models for monitoring FHB in wheat seeds; for this, we employed convnets to discriminate unhealthy seeds (i.e., seeds infected by FHB) from healthy seeds based on data extracted from RGB digital images; (2) to compare the performance of different convnet models; to address this, we defined a network architecture from scratch (custom convnet) and performed numerous iterations in the search for optimal hyperparameters, and we also used different pretrained convnet architectures with different fine-tuning strategies, comparing their performance with the custom convnet; and (3) to identify the seed features most used by the convnets in the learning process.

Seed Sample
Wheat seeds of the TBIO Toruk genotype (moderately susceptible to FHB) were produced during the 2019/2020 crop season in an experimental field located in the region of Passo Fundo (28°15′54″ S, 52°19′14″ W, 700 m), Southern Brazil. The spikelets showing symptoms of FHB (premature bleaching) and healthy spikelets were collected, and the seeds were kept in a cold chamber (15 °C) until image acquisition. In total, 1200 seeds were sampled: 600 seeds infected by FHB and 600 healthy seeds. In this study, the fungal colonization symptoms related to FHB infection in wheat seeds were roughness, a shriveled appearance, and pink-colored tissues. The following subsections provide details about the deep-learning procedures and the proposed imaging technology (Figure 1); the steps shown in Figure 1 are described in Sections 2.2 to 2.5.

Image Acquisition, Segmentation, and Resizing
The images were acquired using an Epson Perfection V800 scanner with 1200 DPI resolution. The seeds were randomly arranged without standardization of the ventral and dorsal surfaces to obtain generalist models for any surface or position of the seeds. All images were segmented using histogram-based thresholding to remove the background. Then, the seed bounding boxes were calculated based on the contour lines of the selection and used to cut and store each seed image individually. The images were resized to a resolution of 124 × 124 pixels before training the convnet models. We sampled 1200 images from individual seeds (600 healthy and 600 unhealthy), and this dataset was split into 70% for training, 15% for testing, and 15% for validation.
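The thresholding, cropping, and resizing steps can be sketched as follows. This is a minimal NumPy-only illustration: the threshold value, the synthetic test image, and the nearest-neighbor resize are assumptions for the sketch, not the study's exact implementation.

```python
import numpy as np

def segment_and_crop(gray, threshold=60, out_size=124):
    """Threshold out the dark background, locate the seed's bounding
    box from the foreground mask, crop it, and resize (nearest neighbor)."""
    mask = gray > threshold                      # histogram-based threshold
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]          # bounding-box row limits
    c0, c1 = np.where(cols)[0][[0, -1]]          # bounding-box column limits
    crop = gray[r0:r1 + 1, c0:c1 + 1]
    # nearest-neighbor resize to the fixed 124 x 124 input resolution
    ri = np.arange(out_size) * crop.shape[0] // out_size
    ci = np.arange(out_size) * crop.shape[1] // out_size
    return crop[np.ix_(ri, ci)]

# synthetic example: a bright "seed" region on a dark scanner background
img = np.zeros((300, 300), dtype=np.uint8)
img[100:180, 120:220] = 200
seed = segment_and_crop(img)
print(seed.shape)  # (124, 124)
```

In practice a library resize (e.g., with anti-aliasing) would replace the nearest-neighbor indexing, but the bounding-box logic is the same.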

Custom Convnet Hyperparameter Optimization
To define the architecture of the custom convnet, we used a hyperparameter space (Table 1), and a random search strategy was performed to maximize the search efficiency [31]. Initially, we performed 1728 random iterations to optimize the number of convolutional layers and dense layers in the network architecture. After defining the architecture, we performed 1701 more iterations to evaluate the other categorical and numeric hyperparameters of the network (Table 1). To avoid overfitting during hyperparameter optimization, we used data augmentation to randomly rotate images by up to 30° and to apply 20% variations in height, width, shear, and zoom. We also flipped half of the images horizontally, and the nearest-pixel strategy was used to fill in newly created pixels, which can appear after a rotation or a width/height shift. We assessed the relationship between validation accuracy and the number of convolutional and dense layers through analysis of variance (ANOVA). An analysis of covariance (ANCOVA) was performed to verify whether the correlation between training loss and validation loss (continuous covariate) differed among the numbers of layers (categorical variable). We performed a two-way ANOVA to evaluate the effects of the optimizers, activation functions, and their interaction on the response variable (validation accuracy). For the other numeric hyperparameters, we applied multivariate correlations. When necessary, we used Tukey's test (p < 0.05) for comparisons among the explanatory-variable levels.
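The random search procedure can be sketched generically as below. The hyperparameter space shown is a small hypothetical subset of Table 1, and the toy `evaluate` function merely stands in for training and validating a candidate model:

```python
import random

SPACE = {  # hypothetical subset of the hyperparameter space (Table 1)
    "conv_layers": [3, 4, 5],
    "dense_layers": [3, 4, 5],
    "optimizer": ["sgd", "rmsprop", "adam", "adagrad"],
    "activation": ["relu", "elu"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "dropout": [0.01, 0.2, 0.5],
}

def random_search(evaluate, n_iter=10, seed=0):
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        score = evaluate(cfg)          # e.g., validation accuracy
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_evaluate(cfg):
    """Stand-in for model training; rewards fewer conv layers, ELU, and SGD."""
    return (5 - cfg["conv_layers"]) + (cfg["activation"] == "elu") + (cfg["optimizer"] == "sgd")

cfg, score = random_search(toy_evaluate, n_iter=200)
print(cfg, score)
```

Random search samples configurations independently, so it parallelizes trivially and tends to cover a high-dimensional space more efficiently than grid search [31].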

Fine-Tuning Pretrained Convnets
To perform fine-tuning, we used the pretrained convnets MobileNet [33], InceptionV3 [34], VGG16 [35], VGG19 [35], and Xception [36], retraining different percentages of the weights learned on the ImageNet dataset [37]. We fine-tuned 0% of the convolutional base (FT 0; all weights kept from ImageNet), 20% (FT 20), 50% (FT 50), 80% (FT 80), or 100% (FT 100; the complete network trained with randomly initialized weights). The classifier added on top of these pretrained convnets consisted of three dense layers with 64, 32, and 2 units, respectively. Between these dense layers, a dropout rate of 20% and an exponential linear unit (ELU) activation function were applied. The sigmoid function was used in the last layer, as the task involved binary classification. Mini-batch stochastic gradient descent (mini-batch SGD) was used as the optimizer, and binary cross-entropy was used as the cost function. In the SGD optimizer, the learning rate was 0.001 and the momentum was 0.9. We used a callback to decrease or increase the learning rate in the case of a loss plateau, to escape local minima during training. In these fine-tuning approaches, we also used data augmentation with the same parameters as in the hyperparameter optimization of the custom convnet (Section 2.3).
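The layer-freezing logic behind the FT percentages can be sketched framework-agnostically. This is a hypothetical helper: in a real framework one would set the `trainable` flag of the corresponding layers of the pretrained convolutional base (and FT 100 would additionally reinitialize the weights):

```python
def layers_to_train(n_layers, ft_percent):
    """Return a per-layer trainable flag for a convolutional base:
    the top `ft_percent` of layers are fine-tuned, the rest stay frozen."""
    n_trainable = round(n_layers * ft_percent / 100)
    # bottom layers keep their ImageNet weights; top layers are unfrozen
    return [i >= n_layers - n_trainable for i in range(n_layers)]

# e.g., an FT 50 scheme over an illustrative 88-layer base
flags = layers_to_train(88, 50)
print(sum(flags), len(flags))  # 44 88
```

Freezing the lower layers preserves the generic low-level filters (edges, textures) learned on ImageNet, while the unfrozen top layers adapt the more abstract representations to the seed images.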
During training and validation of the fine-tuned convnets, we set a callback to stop training early after five successive epochs without improvement in validation accuracy. The accuracy, loss, and mean square error for training and validation were measured to identify the best models. We also measured the training time per epoch and overfitting, with overfitting calculated as the ratio of training loss to validation loss.
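The early-stopping rule and the overfitting ratio described above can be expressed in a few lines (a generic sketch; the metric history below is invented for illustration):

```python
def stop_epoch(val_acc_history, patience=5):
    """Return the epoch at which training stops: after `patience`
    successive epochs without a new best validation accuracy."""
    best, since_best = float("-inf"), 0
    for epoch, acc in enumerate(val_acc_history):
        if acc > best:
            best, since_best = acc, 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_acc_history) - 1

def overfit_ratio(train_loss, val_loss):
    """Overfitting measured as training loss divided by validation loss."""
    return train_loss / val_loss

history = [0.5, 0.6, 0.7, 0.7, 0.69, 0.68, 0.66, 0.65]
print(stop_epoch(history))  # 7: five epochs passed without improving on 0.7
```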

Model Tests
The three best-established convnets (one custom convnet and two from the fine-tuning stage) were validated using the test data. We constructed the confusion matrices and calculated the overall accuracy (OA) (Equation (1)) and kappa coefficient ($\hat{K}$) (Equation (2)):

$$\mathrm{OA} = \frac{\sum_{i=1}^{k} x_{ii}}{N} \quad (1)$$

$$\hat{K} = \frac{N \sum_{i=1}^{k} x_{ii} - \sum_{i=1}^{k} n_{i\oplus}\, n_{\oplus i}}{N^{2} - \sum_{i=1}^{k} n_{i\oplus}\, n_{\oplus i}} \quad (2)$$

where k = number of classes; $x_{ii}$ = observations classified in the correct class (diagonal of the confusion matrix); N = sample size; $n_{i\oplus}$ = marginal total of matrix row i; $n_{\oplus i}$ = marginal total of matrix column i. We considered 0.4 < $\hat{K}$ < 0.6 to indicate that the classification is reasonably suitable, 0.61 < $\hat{K}$ < 0.8 substantially appropriate, and $\hat{K}$ > 0.81 highly suitable [38]. A Z-test (Equation (3)) was also used to indicate whether the kappa coefficients of two models ($\hat{K}_i$ and $\hat{K}_j$) were statistically different at a significance level of 5%:

$$Z = \frac{\left|\hat{K}_i - \hat{K}_j\right|}{\sqrt{\hat{\sigma}^{2}_{\hat{K}_i} + \hat{\sigma}^{2}_{\hat{K}_j}}} \quad (3)$$

where $\hat{\sigma}^{2}_{\hat{K}_i}$ and $\hat{\sigma}^{2}_{\hat{K}_j}$ are the estimated variances of $\hat{K}_i$ and $\hat{K}_j$, respectively. After defining the best model in the test step, we developed a class activation map to understand which parts of the seeds led the convnet to its classification decision. All image analysis and data processing used the Python language (version 3.6.8; Amsterdam, The Netherlands) on a Dell G3 machine with an Intel i7-9750H CPU (2.60 GHz × 12), 8 GB RAM, and an NVIDIA GeForce GTX 1660 Ti Max-Q GPU (6 GB). Statistical analyses were performed in the R language (version 4.0.0; Vienna, Austria) [39].
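Equations (1)-(3) can be computed directly from a confusion matrix. The sketch below uses a common large-sample approximation for the kappa variance, which may differ from the exact estimator used in the paper, and the two confusion matrices are hypothetical:

```python
import numpy as np

def kappa_stats(cm):
    """Overall accuracy (Eq. (1)) and kappa (Eq. (2)) from a confusion
    matrix, plus an approximate large-sample variance for the Z-test."""
    cm = np.asarray(cm, dtype=float)
    N = cm.sum()
    p_o = np.trace(cm) / N                                 # overall accuracy
    p_e = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / N**2   # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)                        # equivalent to Eq. (2)
    var = p_o * (1 - p_o) / (N * (1 - p_e) ** 2)           # approximate variance
    return p_o, kappa, var

def z_test(cm_a, cm_b):
    """Z statistic comparing two kappa coefficients (Eq. (3))."""
    _, ka, va = kappa_stats(cm_a)
    _, kb, vb = kappa_stats(cm_b)
    return abs(ka - kb) / np.sqrt(va + vb)

# hypothetical test-set confusion matrices (rows: true, cols: predicted)
cm1 = [[88, 2], [1, 89]]
cm2 = [[85, 5], [4, 86]]
oa, k, _ = kappa_stats(cm1)
z = z_test(cm1, cm2)
print(round(oa, 3), round(k, 3))  # 0.983 0.967
```

The model comparison then checks `z` against the 5% critical value (1.96): values above it indicate statistically different kappa coefficients.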

Custom Convnet Architecture
Regarding the number of conv layers, there was a significant difference among the numbers tested, with the model with the fewest layers (three conv layers) showing the highest accuracy (F 2,1725 = 10.54, p < 0.0001, Figure 2A). The accuracy did not vary significantly with the number of dense layers (F 2,1725 = 2.1, p = 0.13, Figure 2A). For overfitting, assessed through the correlation between training loss and validation loss, there was a difference among the numbers of convolutional layers (F 2,1724 = 5.32, p = 0.005, Figure 2B); however, there was no difference among the dense layers (F 2,1724 = 0.79, p = 0.45, Figure 2B). Accordingly, a model with fewer layers tended to avoid overfitting. There was also a significant interaction between the optimizer and the activation function (F 3,1693 = 13.1, p < 0.0001), wherein better accuracy was obtained with the SGD optimizer and ELU activation (Figure 2C). Although the other tested hyperparameters did not show a high correlation with validation accuracy (Figure 2D), we selected their values following the direction of the correlations.

Fine-Tuning
Considering the results obtained during fine-tuning of the pretrained convnets, the outstanding strategies were MobileNet FT 50 and InceptionV3 FT 100 (Figure 4). High validation accuracy (98.9% and 99.9%), low validation loss (0.029 and 0.012), and low mean square error (0.009 and 0.003) were achieved for MobileNet FT 50 and InceptionV3 FT 100, respectively. These networks also exhibited outstanding values associated with the proportion of overfitting (3.16 and 1.91 for MobileNet FT 50 and InceptionV3 FT 100, respectively), demonstrating excellent generalizability. Therefore, these two networks were used to compare the performance with the custom convnet in the test dataset.


Model Performance on the Test Dataset
For the classifier tests, the custom convnet (i.e., the network architecture we defined), MobileNet FT 50, and InceptionV3 FT 100 were evaluated on the test dataset. The activation map for the custom convnet showed that different regions are activated between unhealthy and healthy seeds (Figure 6). In healthy seeds, the activation is broader and more homogeneous, as evidenced by the image histograms (Figure 6). In unhealthy seeds, by contrast, the network tends to activate specific regions where the fungus is evident on the seeds.

Discussion
In this study, we used a low-cost imaging technology combined with convnets for the detection of FHB in wheat seeds. Studies using imaging technologies and deep-learning approaches for wheat seeds are mainly based on hyperspectral imaging systems [8,[14][15][16]40]. Although hyperspectral imaging is becoming more popular, it still presents cost restrictions and operational complexity [17,19,20]. On the other hand, the use of RGB images generally offers easy-to-use devices with high spatial resolution and low cost [41][42][43]. RGB scanner systems are easy to set up, upgrade, and replace in an online quality-control process. Furthermore, these devices (which are based on closed light sources) reduce the influence of ambient lighting and, therefore, do not require advanced processing techniques to obtain accurate results. However, their image-capture speed may not match that of high-speed cameras, which can be inconvenient when analyzing a very large number of samples [41].
RGB images have already been tested for the identification of FHB in wheat seeds, achieving 85% accuracy by means of a linear discriminant analysis [44]. Gu et al. [23] used features of RGB images of wheat ears extracted with AlexNet (a deep convolutional neural network, DCNN) to train a random forest (RF) algorithm, which was able to identify infected and healthy ears with an accuracy of ca. 93%. Qiu et al. [22] used RGB images of the canopy and a mask region-based convolutional neural network (Mask-RCNN) for in situ, non-destructive FHB detection, obtaining an average precision of 92.01%. In our study, a high accuracy (>97%) was achieved using a low-complexity design architecture with hyperparameter optimization, and 99% accuracy was reached using pretrained networks with fine-tuning.
The deep-learning approaches (hyperparameter optimization and fine-tuning) applied in the current work were crucial to obtaining accurate models for the detection of FHB.
Regarding the number of layers in the custom convnet model, the hyperparameter optimization results demonstrated that the custom convnet with three convolution layers and three dense layers (i.e., the custom convnet with the fewest layers) presented superior performances, both in terms of learning (superior accuracy in random search) and generalization (low overfitting in random search). With increasing network size, the number of parameters also increases, which can generate overparameterization, wherein the network has more parameters than training data (i.e., RGB images of seeds). Therefore, the system tends to overfit the training data and does not learn to generalize when applied to disease prediction in new wheat seed data [45]. The satisfactory generalization of networks can also be attributed to the use of data augmentation. This approach increases the amount of training data through random transformations and helps expose the model to more aspects of the data and enables it to generalize better [46].
The results of hyperparameter optimization also demonstrated that the activation function and optimizer were important for improving the accuracy of the custom convnets. The highest accuracy on the validation dataset was obtained with the ELU function combined with the mini-batch stochastic gradient descent (SGD) optimizer. The ELU activation, which acts as the identity function for positive inputs, tends to normalize the output of the layers [47]. This normalization probably increased the accuracy by improving the performance of SGD in updating the trainable parameters (weights) of the network to minimize the cost function.
We used binary cross-entropy as the cost function to compute the loss during network training, which is consistent with the binary classification of the seeds (healthy and unhealthy) [48]. The other hyperparameters were set following the direction of the correlations to keep the custom convnet more parsimonious. For instance, dropout had a low correlation with accuracy, so we applied only 1% dropout. Dropout is a regularization technique that randomly zeroes out the input units of a layer, breaking incidental patterns to avoid overfitting, but it can decrease accuracy at high rates [49]. The learning rate was one of the hyperparameters with the highest correlation with accuracy in the hyperparameter optimization, indicating that a higher learning rate would be better for FHB detection. However, at high learning rates, weight updates can become excessively noisy [50]; therefore, we applied a rate that decreased over the training epochs (an epoch being one iteration over all the training data).
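A plateau-based learning-rate schedule of the kind used here can be sketched as follows. This is a simplified, hypothetical version of such a callback (real frameworks typically track a best-so-far value rather than a fixed window):

```python
def adjust_lr(lr, val_loss_history, factor=0.5, patience=3, min_lr=1e-6):
    """Reduce the learning rate when validation loss has not improved
    on its value `patience` epochs ago (i.e., a plateau)."""
    recent = val_loss_history[-(patience + 1):]
    if len(recent) == patience + 1 and min(recent[1:]) >= recent[0]:
        return max(lr * factor, min_lr)   # halve, but never below min_lr
    return lr

print(adjust_lr(1e-3, [0.30, 0.30, 0.31, 0.30]))  # plateau -> 0.0005
print(adjust_lr(1e-3, [0.30, 0.25, 0.20, 0.15]))  # improving -> 0.001
```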
Fine-tuning is a useful technique for reusing model weights learned on a large dataset by adapting the model to new data of interest. The models tested in the current study (MobileNet, InceptionV3, VGG16, VGG19, and Xception) were trained on the ImageNet dataset. This dataset is broad enough that the features learned by the pretrained networks could effectively act as a generic and relevant model for the detection of FHB in wheat seeds [37]. Although the MobileNet FT 50 and InceptionV3 FT 100 strategies showed similar accuracy in the fine-tuning stage, InceptionV3 FT 100 presented better predictive capacity (i.e., better performance in the test stage), as indicated by the Z-test. The InceptionV3 and MobileNet networks have similar design goals, but InceptionV3 aims for high precision, while MobileNet is a lightweight model that balances compression with accuracy [51]. Thus, InceptionV3 was the model that exhibited better generalization capacity in the prediction of FHB. An interesting result is that the custom convnet performed similarly to InceptionV3, indicating that even a small network with few parameters can be applied successfully to the identification of FHB in wheat seeds at low computational cost.
The specific areas of the seeds detected by the network (i.e., the most activated areas) would be difficult to extract manually for the detection of FHB in wheat seeds. The application of deep-learning methods allowed different features to be extracted automatically, without requiring manual feature engineering. This means that the method has great potential for practical applications in quality-control programs for wheat seeds, with high throughput capacity and high accuracy, in addition to the advantage of requiring low-cost equipment. With the availability of sufficiently good-quality images, the models could also be trained in the future to make classifications based on the severity of FHB on seeds, and even of other diseases, such as ergot (Claviceps purpurea Tul.) [18]. However, despite the promising results of our approach, FHB symptoms are genotype-dependent, and the accuracy of the detection method may vary depending on phenotypic variations among wheat cultivars. This issue, however, can be addressed by training the convnets with datasets of seed images from the different cultivars used worldwide.

Conclusions
We present an approach based on deep learning (hyperparameter optimization and fine-tuning) with different convnets (MobileNet, InceptionV3, VGG16, VGG19, and Xception) using RGB images for automatic inspection of classes of wheat seeds infected by FHB. The proposed method was developed using wheat seeds of the TBIO Toruk cultivar, and the pretrained convnets with fine-tuning allowed an increase in the accuracy of detection of infected seeds to more than 97% with the InceptionV3 FT 100 model and the custom convnet. Therefore, our findings provide guidelines for the accurate detection of FHB in wheat seeds using low-cost imaging technology combined with deep-learning models, providing new opportunities for non-destructive and rapid screening of the wheat seed health status in breeding programs, seed-analysis laboratories, and the food industry.