Article

Imperfect Wheat Grain Recognition Combined with an Attention Mechanism and Residual Network

1 School of Information and Computer, Anhui Agricultural University, Hefei 230036, China
2 National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Anhui University, Hefei 230601, China
3 Institute of Intelligent Machinery, Hefei Academy of Material Sciences, Chinese Academy of Sciences, Hefei 230031, China
4 Anhui Institute of Grains and Oil Sciences, Hefei 230031, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(11), 5139; https://doi.org/10.3390/app11115139
Submission received: 8 May 2021 / Revised: 27 May 2021 / Accepted: 27 May 2021 / Published: 1 June 2021
(This article belongs to the Section Agricultural Science and Technology)

Abstract

Intelligent detection of imperfect wheat grains based on machine vision is of great significance for evaluating wheat quality correctly and rapidly. There is little difference between the partial characteristics of imperfect and perfect wheat grains, which is a key factor limiting the classification and recognition accuracy of imperfect wheat based on deep learning network models. In this paper, we propose a method for imperfect wheat grain recognition that combines an attention mechanism with a residual network (ResNet), and verify its recognition accuracy by adding an attention mechanism module to residual networks of different depths. Five residual networks of different depths (18, 34, 50, 101, and 152 layers) were selected for the experiment. The recognition accuracy of every network model improved with the attention mechanism, and the average recognition rate of ResNet-50 with the attention mechanism reached 96.5%. For ResNet-50 with the attention mechanism, the optimal learning rate was further screened as 0.0003, at which the average recognition accuracy reached 97.5%; the recognition rates of scab, insect-damaged, sprouted, mildew, broken, and perfect wheat grains reached 97%, 99%, 99%, 95%, 96%, and 99%, respectively. This work can provide guidance for the detection and recognition of imperfect wheat grains using machine vision.

1. Introduction

Wheat is a major grain crop and an important commodity and strategic reserve grain variety in China, playing an important role in grain production, circulation, and consumption. Imperfect wheat grains refer to damaged but still usable wheat grains, including scab, insect-damaged, sprouted, mildew, and broken grains. In the process of wheat circulation, the content of imperfect grains is a limiting index for measuring wheat quality [1]. At present, the detection of imperfect grains mainly relies on artificial sensory detection methods and recognition methods based on machine vision [2]. Because manual detection is time consuming, laborious, poorly reproducible, and strongly subjective, it can no longer meet the requirements of rapid and accurate detection of large-scale wheat. In recent years, automatic wheat grain recognition based on machine vision has attracted widespread attention. This detection technology has made progress in wheat species and variety recognition [3,4], classification of wheat and similar grains [5], wheat quality detection and grading [6,7], imperfect grain detection [8], and cuticle rate and hardness detection [9]. Machine-vision classification uses a camera to capture wheat images; through image analysis and processing, the shape, color, texture, and other characteristic parameters of the wheat are calculated, so that quality inspection of wheat size, color, surface smoothness, surface defects, and damage can be completed in one pass. The corresponding grading equipment is then controlled to grade the wheat. This method not only overcomes the disadvantages of manual classification, but also does not damage the wheat during recognition.
At present, most identification methods adopt feature extraction algorithms, but artificial feature extraction must be continuously optimized through testing, and the process is quite complex. In addition, wheat varieties are mixed, defect types can overlap on a single grain (e.g., diseased spots on a broken grain), and image collection inevitably involves shifts and uneven illumination [10]. As a result, it is difficult to find accurate and stable features in practical applications, and this approach can no longer meet the need for rapid identification of imperfect grains.
With the rapid development of deep learning in the field of image recognition, the convolutional neural network (CNN) has also received extensive attention in agriculture [11]; for example, it has achieved excellent performance in the identification of plant diseases and insect pests [12,13,14], weed identification [15], crop species identification [16,17], and crop yield estimation [18]. Many imperfect wheat grain identification methods are also based on CNNs. In 2010, Cheng et al. [19] used a two-layer back-propagation (BP) neural network to identify perfect and broken wheat grains, with a recognition accuracy of 97.5%. In 2017, Cao et al. [20] added spatial pyramid pooling to a conventional CNN and used this model to identify perfect wheat grains and two types of imperfect wheat grains, with an average test recognition rate of 93.36%. In 2017, Le et al. [21] realized rapid identification of perfect and imperfect wheat grains by combining hyperspectral data with a CNN. In 2020, Zhu et al. [22] used four CNNs (LeNet-5, AlexNet, VGG-16, and ResNet-34) to identify perfect and broken wheat grains and compared them with a traditional support vector machine (SVM) and a BP neural network; the results showed that identification accuracy was greatly improved by using CNNs. In 2021, He et al. [23] used the LeNet-5, ResNet-34, and VGG-16 models combined with an image enhancement method to highlight the characteristics of imperfect grains, and the test accuracy improved by 1% compared with the models without image enhancement.
The image features of imperfect wheat grains are not clearly distinguished and the overall similarity is high; therefore, their classification and recognition can be regarded as a fine-grained image classification problem. In 2017, Luo et al. [24] pointed out that, unlike ordinary image classification tasks, fine-grained images have a very small signal-to-noise ratio (SNR), and sufficiently discriminative information usually exists only in very small local regions. How to effectively extract and utilize the useful information in these local regions is the key to the success of a fine-grained image classification algorithm. In 2017, the Google team [25] proposed a simple network structure based on an attention mechanism and applied it to machine translation; the attention mechanism was subsequently applied widely in deep learning. In 2018, Woo et al. [26] proposed an attention module, the convolutional block attention module (CBAM), and added it to a CNN model; tests found that the network model with the attention module outperformed the one without it in image recognition. In 2019, Xu et al. [27] added a channel attention mechanism to VGG-16 and compared three types of fine-grained image datasets; the results showed that the network with the attention mechanism not only improved classification accuracy but also generalized well. In 2020, Peng et al. [28] applied a CNN with an attention mechanism to soybean aphid identification, which resulted in higher accuracy.
Based on the above, this study will attempt to combine the attention mechanism and residual network to classify and recognize six kinds of wheat grains: scab wheat grains, insect-damaged wheat grains, sprouted wheat grains, mildew wheat grains, broken wheat grains, and perfect grains. The aim is to explore a more accurate deep learning model that is suitable for imperfect wheat grain recognition, and to provide guidance for intelligent detection and recognition methods of wheat.
The structure of this paper is as follows: the second section introduces the methods used in this paper, the third section shows the experimental results, tables, and discussions, and the conclusions are presented in the fourth section.

2. Model and Methods

2.1. Attention Mechanism

The attention module used in this paper is CBAM [26], which is primarily divided into two parts: channel attention and spatial attention. Channel attention focuses on 'what' is meaningful given an input image, while spatial attention focuses on 'where' the informative parts are.

2.1.1. Channel Attention

Channel attention passes the input feature map $F \in \mathbb{R}^{C \times H \times W}$ through both an average-pooling and a max-pooling layer. The resulting feature vectors are denoted $F^c_{avg}$ and $F^c_{max}$, respectively. Both then pass through a shared multi-layer perceptron (MLP), yielding the channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$. The formula is expressed as [26]:

$$M_c(F) = \sigma\big(\mathrm{MLP}(F^c_{avg}) + \mathrm{MLP}(F^c_{max})\big) = \sigma\big(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\big)$$

where $\sigma$ represents the sigmoid function. The MLP contains one hidden layer; to reduce parameter overhead, the hidden activation size is set to $C/r \times 1 \times 1$, where $r$ is the reduction ratio, set to 16. $W_0 \in \mathbb{R}^{C/r \times C}$ and $W_1 \in \mathbb{R}^{C \times C/r}$ are the weights shared by the MLP. Note that $W_0$ is followed by a ReLU activation function.
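As a concrete illustration, the channel-attention computation above can be sketched in NumPy. This is a hand-rolled sketch for clarity, not the paper's TensorFlow implementation; the matrices `W0` and `W1` stand in for the shared MLP's learned parameters.

```python
import numpy as np

def channel_attention(F, W0, W1):
    """Sketch of M_c(F) = sigmoid(MLP(avg-pool F) + MLP(max-pool F)).

    F  : feature map of shape (C, H, W)
    W0 : hidden-layer weights of the shared MLP, shape (C/r, C); ReLU follows W0
    W1 : output-layer weights of the shared MLP, shape (C, C/r)
    """
    f_avg = F.mean(axis=(1, 2))   # F_avg^c: average-pool over H x W -> (C,)
    f_max = F.max(axis=(1, 2))    # F_max^c: max-pool over H x W -> (C,)
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0)   # shared two-layer MLP, ReLU after W0
    s = mlp(f_avg) + mlp(f_max)
    return 1.0 / (1.0 + np.exp(-s))   # sigmoid -> per-channel weights in (0, 1)
```

Broadcasting the returned (C,) vector back over F's spatial dimensions reweights each channel of the feature map.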

2.1.2. Spatial Attention Module

The spatial attention module applies average-pooling and max-pooling along the channel axis of the feature map, generating two 2D maps, $F^s_{avg} \in \mathbb{R}^{1 \times H \times W}$ and $F^s_{max} \in \mathbb{R}^{1 \times H \times W}$, which denote the average-pooled and max-pooled features across channels. The two maps are then concatenated and convolved by a standard convolution layer to generate the spatial attention map $M_s \in \mathbb{R}^{H \times W}$. Spatial attention is calculated as follows [26]:

$$M_s(F) = \sigma\big(f^{7 \times 7}([F^s_{avg}; F^s_{max}])\big)$$

where $\sigma$ is the sigmoid function and $f^{7 \times 7}$ denotes a convolution operation with a 7 × 7 kernel.
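Under the same caveat (a NumPy sketch rather than the paper's implementation; `kernel` is a placeholder for the learned 7 × 7 filter), the spatial branch can be written as:

```python
import numpy as np

def spatial_attention(F, kernel):
    """Sketch of M_s(F) = sigmoid(f^{7x7}([F_avg^s; F_max^s])).

    F      : feature map of shape (C, H, W)
    kernel : convolution weights of shape (2, 7, 7) -- one 7x7 filter
             applied over the two channel-pooled maps
    """
    # pool along the channel axis -> two 2D maps stacked as (2, H, W)
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])
    pad = np.pad(pooled, ((0, 0), (3, 3), (3, 3)))   # 'same' padding for a 7x7 kernel
    H, W = F.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[:, i:i + 7, j:j + 7] * kernel)
    return 1.0 / (1.0 + np.exp(-out))   # sigmoid -> (H, W) map in (0, 1)
```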

2.2. ResNet Model

ResNet [29] stacks a large number of residual blocks; its core idea is to give each block an identity-mapping ability, so that the input of a building block can pass unchanged to its output. This identity mapping is achieved through shortcut connections, which add the input of a block to its output. Two types of basic residual block are used: those of ResNet-18/34 and those of ResNet-50/101/152, shown in Figure 1a,b, respectively. A residual block can be expressed as [29]:

$$X_{i+1} = X_i + \mathcal{F}(X_i)$$

where $X_i$ is the identity (direct-mapping) part of the residual block and $\mathcal{F}(X_i)$ is the residual mapping, i.e., the result of two or three convolution operations.

2.3. The Residual Block Integrates with the Attention Mechanism

Integrating the residual block with the attention mechanism means adding the attention mechanism inside each ResNet residual block: channel and spatial attention are appended at the end of each basic residual block, and the attended result is added to the block input to generate the new feature map. Because different depths of ResNet use different basic residual blocks, the attention-augmented blocks also differ. Figure 2a,b shows the structure of the residual blocks of ResNet-18/34 and ResNet-50/101/152 with the attention mechanism. A residual block with attention can be expressed as [26]:

$$F' = M_c(F) \otimes F$$
$$F'' = M_s(F') \otimes F'$$
$$X_{i+1} = X_i + F''$$

where $F$ is the residual map after two or three convolution operations, $F''$ is the final feature map after channel and spatial attention (i.e., the residual map with attention applied), and $\otimes$ denotes element-wise multiplication.
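A minimal sketch of the three equations above, with the block's convolutions and both attention maps passed in as callables (all placeholders for the learned components, not the paper's code):

```python
import numpy as np

def cbam_residual_block(x, residual_fn, channel_att, spatial_att):
    """One residual block with CBAM:
    F' = M_c(F) (x) F,  F'' = M_s(F') (x) F',  X_{i+1} = X_i + F''.

    x           : block input of shape (C, H, W)
    residual_fn : the block's 2-3 convolutions, (C, H, W) -> (C, H, W)
    channel_att : returns per-channel weights of shape (C,)
    spatial_att : returns a spatial map of shape (H, W)
    """
    f = residual_fn(x)                       # residual map F
    f = channel_att(f)[:, None, None] * f    # F' : broadcast over H and W
    f = spatial_att(f)[None, :, :] * f       # F'': broadcast over channels
    return x + f                             # identity shortcut
```

With an identity residual function and constant attention maps, the output is simply the input plus its attended copy, which makes the data flow of the block easy to verify.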

3. Results and Discussion

3.1. Data Acquisition

The wheat samples used in this study were collected by the relevant personnel of the Anhui Grains and Oil Research Institute and included perfect wheat grains and five kinds of imperfect wheat grain samples: scab wheat grains, insect-damaged wheat grains, sprouted wheat grains, mildew wheat grains, and broken wheat grains. Some sample pictures are shown in Figure 3. The data were acquired by professional wheat quality inspection personnel of the research institute. The dataset used for wheat grain identification comprised 1800 manually photographed wheat grain images in six categories; each category had 300 images, and each image was a 100 × 100 pixel three-channel RGB image. The original dataset was divided into a training set and a test set, with 200 images per category used for training and 100 for testing.
Because the wheat images were captured in a single fixed scene, the training set was enriched by rotating the samples by arbitrary angles, adjusting image saturation, brightness, contrast, and sharpness, and adding image noise; this diversity gives the model a stronger generalization ability. The training set of each type of wheat (200 pictures) was expanded to 3000, bringing the total number of training pictures to 18,000. These 18,000 pictures were split 9:1 during training, meaning that 16,200 pictures actually participated in network training and 1800 were used as the validation set.
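A simplified sketch of such an augmentation step in NumPy (the paper does not state its augmentation tooling; arbitrary-angle rotation and saturation/sharpness adjustment are omitted here for brevity):

```python
import numpy as np

def augment(img, rng):
    """Simplified augmentation sketch: 90-degree rotation, brightness,
    contrast, and additive Gaussian noise on a (100, 100, 3) uint8 image.
    (The paper also rotates by arbitrary angles and adjusts saturation
    and sharpness, which are omitted here.)"""
    out = np.rot90(img, k=int(rng.integers(0, 4))).astype(np.float32)
    out = out * rng.uniform(0.8, 1.2)                      # brightness jitter
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.8, 1.2) + mean      # contrast jitter
    out = out + rng.normal(0.0, 5.0, size=out.shape)       # additive image noise
    return np.clip(out, 0, 255).astype(np.uint8)
```

Applying this repeatedly with fresh random draws is how 200 source images per class can be expanded to 3000 variants.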

3.2. Hardware and Software Preparation

The main hardware parameters used in this test were an Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz and an NVIDIA GTX 1050 Ti GPU with 4 GB of memory for GPU acceleration. Python 3.8 was the programming language, and the TensorFlow 2.3 framework was used to build the model on the PyCharm development platform.

3.3. Model Training and Test Results

In order to verify the effectiveness of the attention mechanism, a total of 10 models were tested: ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152, each without and with an attention mechanism. Each network model was trained for 100 epochs, the initial learning rate was set to 0.0001, and the learning rate decayed by 1% per iteration.
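The schedule described above (a 1% decay per iteration) amounts to the following, shown here as a plain function for clarity:

```python
def decayed_lr(initial_lr, epoch, decay=0.01):
    """Learning rate after `epoch` iterations when the rate
    shrinks by 1% per iteration, as in the experiments above."""
    return initial_lr * (1.0 - decay) ** epoch
```

With the initial rate 0.0001, this leaves roughly 3.7e-05 after 100 epochs, so the final epochs train with about a third of the starting rate.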
Figure 4 shows the loss-function iteration curves of the ResNet training process without and with an attention mechanism, together with the accuracy iteration curves of the training and testing processes. As can be seen from Figure 4a, the convergence of the training loss curve of the ResNet model without the attention mechanism slows as the number of network layers increases; Figure 4c,e shows that the growth of training and test accuracy slows correspondingly. Figure 4b shows that adding the attention mechanism greatly reduces this phenomenon: the training loss curves of models with different depths converge at almost the same time. Compared with the results without an attention mechanism, Figure 4d,f shows that training with the attention mechanism is significantly more stable and reaches higher accuracy within only a small number of iterations.
Table 1 and Table 2 compare the number of parameters, training time, and optimal accuracy of the ResNet training process without and with an attention mechanism. According to the results in Table 1, the classification accuracy of imperfect wheat grains did not keep increasing with the number of layers in the ResNet model. ResNet-50 achieved the highest classification accuracy in a relatively short time, whereas ResNet-101 and ResNet-152 needed more training time for the same number of iterations without achieving higher classification accuracy. Table 1 and Table 2 show that adding the attention mechanism increased the number of trainable parameters; for the same training batch, the training time for 100 iterations increased by 1–2 h. Comparing the accuracies shows that the classification of imperfect wheat grains improved significantly after adding the attention mechanism, with ResNet-50 achieving the best result; the only decrease was for ResNet-152. We speculate that its parameter count was too large: to keep training feasible on the same computer, the batch size had to be reduced, which significantly increased the training time of ResNet-152 without improving its recognition accuracy.

3.4. Comparison of Identification Results

The dataset used for the test included perfect wheat grains and the five types of imperfect wheat grains, each consisting of 100 images (100 × 100 pixels), for a total of 600 images. Precision, recall, and F-measure were used as indicators to evaluate the performance of the model, calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \times 100\%$$
$$\mathrm{F\text{-}measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
In these formulas, TP is the number of positive samples correctly labeled as positive, i.e., grains of a given class correctly identified as that class. FP is the number of negative samples incorrectly labeled as positive, i.e., grains classified as a given class that actually belong to another class. FN is the number of positive samples incorrectly labeled as negative, i.e., grains of a given class misclassified as another class.
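These per-class metrics can be computed directly from a confusion matrix; a minimal plain-Python sketch for illustration:

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F-measure for class k, given a confusion
    matrix cm where cm[i][j] counts samples of true class i predicted
    as class j."""
    tp = cm[k][k]                                        # correct predictions of class k
    fp = sum(cm[i][k] for i in range(len(cm))) - tp      # other classes predicted as k
    fn = sum(cm[k]) - tp                                 # class k predicted as something else
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```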
According to the above experimental comparison, the optimal network model both without and with an attention mechanism is ResNet-50. Therefore, a confusion matrix was drawn, and the precision, recall, and weighted F-measure were calculated to further compare the recognition accuracy of ResNet-50 on the various types of perfect and imperfect wheat grains. Table 3 and Table 4, respectively, present the ResNet-50 confusion matrix and performance analysis without and with an attention mechanism. According to the F-measure values in Table 3, the classification results of the various types of imperfect wheat grains in the ResNet-50 model were similar, all exceeding 94%. Comparing Table 3 and Table 4 shows that the ResNet-50 model with an attention mechanism is superior to the model without one in the recognition rate of every kind of imperfect wheat grain; however, the F-measure of the mildew category increased by only 0.09% over the traditional ResNet-50. From the confusion matrix, we can see that the model with the attention mechanism mistakenly classified six mildew wheat grains as sprouted wheat grains; as a result, the precision of the sprouted category and the recall of the mildew category decreased, and the F-measure values of these two categories were noticeably lower than those of the other categories.
We tested the inference time of the traditional ResNet-50 and the ResNet-50 with an attention mechanism on the 600 pictures in the test set; the total inference times were 1.09 s and 1.63 s, respectively. The model with the attention mechanism took about 50% more time than the traditional model; since the attention mechanism adds parameters, this extra time is reasonable. Both models can produce a large number of image predictions in a very short time, and the additional 0.49 s over 600 images with the attention mechanism is acceptable.
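For reference, the per-image cost implied by these timings works out as follows (simple arithmetic on the reported numbers):

```python
# Per-image inference cost implied by the timings reported above
# (600 test images; 1.09 s without and 1.63 s with the attention mechanism).
total_images = 600
t_plain, t_cbam = 1.09, 1.63                 # seconds over the whole test set
ms_plain = t_plain / total_images * 1000     # ~1.8 ms per image
ms_cbam = t_cbam / total_images * 1000       # ~2.7 ms per image
overhead = (t_cbam - t_plain) / t_plain      # ~0.50, i.e. roughly 50% slower
```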

3.5. Network Visualization with Grad-CAM

For further analysis, Grad-CAM [30] was used to visualize the attention of the network model. Grad-CAM is a recently proposed visualization method that uses gradients to calculate importance. As shown in Figure 5, compared to the network without an attention mechanism, the network model with an attention mechanism can focus attention more on the imperfect features of grains, which is obviously more beneficial to the recognition of the network model. It can be seen that the attention mechanism can obtain the feature information in pictures more effectively.

3.6. Learning Rate Optimization Results

In order to further improve the identification accuracy of imperfect wheat grains, the learning rate of ResNet-50 with the attention mechanism (the best-performing model) was optimized; seven initial learning rates were tested: 0.0001, 0.0002, 0.0003, 0.0004, 0.0005, 0.001, and 0.01. Learning rate decay was again applied, with the rate decreasing by 1% per iteration. Iteration curves for the different learning rates are shown in Figure 6. As can be seen from Figure 6a, when the learning rate was set to 0.01, the loss function of the network model decreased slowly, which led to a slow increase in training and testing accuracy (Figure 6b,c); it took nearly 50 iterations to reach 90% accuracy. When the learning rate was small, the loss function converged quickly, training and testing accuracy reached 90% within 10 iterations, and the curves were very stable. Thus, the network model performs better with a small learning rate.
Table 5 shows the model training parameters and training results at different learning rates. According to Table 5, the best achieved test accuracy was 97.5% when the learning rate was set to 0.0003. Table 6 shows the classification performance of ResNet-50 with an attention mechanism when the learning rate was set to 0.0003. It can be seen that the F-measure value of wheat grains of all kinds reached more than 96%, and the average F-measure value was 97.51%.

4. Conclusions

In this paper, we combined an attention mechanism with the ResNet model to classify imperfect wheat grains and proposed an imperfect wheat grain recognition method based on the attention mechanism. The study compared the recognition performance of ResNet models with and without an attention mechanism. The results showed that the recognition accuracies of all five ResNet models improved with the attention mechanism, and the recognition accuracy of ResNet-50 reached 97.5%. The confusion matrix and classification performance of the ResNet-50 model with and without an attention mechanism were calculated, and the results showed that the model with the attention mechanism classified the perfect wheat grains and the five types of imperfect wheat grains better than the model without it. The attention mechanism increased the number of training parameters and therefore the training time; however, the stability of the model was enhanced, so it is feasible to introduce an attention mechanism into the classification of imperfect wheat grains.
The method proposed in this study provides a new idea for the automatic identification of imperfect wheat grains in practical applications, and effectively improving the automatic recognition accuracy of imperfect wheat grains is of great significance for the intelligent, automatic classification of grain quality. However, the classification in this study was limited to single wheat grains, which is inefficient in practical applications. In future work, we will identify a large number of tiled wheat grains in one image and consider the problems of grain segmentation and target detection.

Author Contributions

Conceptualization, W.Z. and H.M.; methodology, W.Z. and H.M. and X.L. (Xiaohong Li); validation, W.B., S.C. and P.Z.; formal analysis, P.Z. and X.L. (Xiaoli Liu); investigation, H.M.; resources, X.L. (Xiaohong Li) and X.L. (Xiaoli Liu); data curation, X.L. (Xiaohong Li), Q.W.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z. and H.M.; visualization, H.M.; supervision, L.G.; project administration, J.J.; funding acquisition, Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Open Research Fund of National Engineering Research Center for Agro-Ecological Big Data Analysis & Application, Anhui University, grant number AE201905; the Natural Science Foundation of Higher Education of Anhui Province, grant number KJ2019A0210; the Key research and development project of Anhui Province, grant number 201904c03020007; and the Major Science and Technology Project of Anhui Province, grant number S201903a06020020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chen, F.N. Real-Time Detection of Kernel-like Impurity and Unsound Kernel in Wheat Using Machine Vision. Ph.D. Thesis, Zhejiang University, Zhejiang, China, 2012. [Google Scholar]
  2. Vithu, P.; Moses, J.A. Machine vision system for food grain quality evaluation: A review. Trends Food Sci. Technol. 2016, 56, 13–20. [Google Scholar] [CrossRef]
  3. Manickavasagan, A.; Sathya, G.; Jayas, D.S. Comparison of illuminations to identify wheat classes using monochrome images. Comput. Electron. Agric. 2008, 63, 237–244. [Google Scholar] [CrossRef]
  4. Pourreza, A.; Pourreza, H.; Abbaspour-Fard, M.H.; Sadrnia, H. Identification of nine Iranian wheat seed varieties by textural analysis with image processing. Comput. Electron. Agric. 2012, 83, 102–108. [Google Scholar] [CrossRef]
  5. Choudhary, R.; Paliwal, J.; Jayas, D.S. Classification of cereal grains using wavelet, morphological, colour, and textural features of non-touching kernel images. Biosyst. Eng. 2008, 99, 330–337. [Google Scholar] [CrossRef]
  6. Zhijun, W.; Peisheng, C.; Jialu, Z.; Zhongliang, Z. Method for identification of external quality of wheat grain based on image processing and artificial neural network. Trans. Chin. Soc. Agric. Eng. 2007, 23, 158–161. [Google Scholar]
  7. Da, Z.H.; Rong, Z.Y.; Yu, W.W.; Qing, Z.X.; Sai, C.S. Identification method of maize quality grades based on machine vision. Sci. Technol. Cereals Oils Foods 2016, 24, 50–56. [Google Scholar]
  8. Rong, Z.Y.; Sai, C.S.; Qing, Z.X.; Yu, W.W.; Qiong, W.; Rong, W.H. Identification of unsound kernels in wheat based on image processing and neural network. Sci. Technol. Cereals Oils Foods 2014, 22, 59–63. [Google Scholar]
  9. Xie, F.; Pearson, T.; Dowell, F.; Zhang, N. Detecting vitreous wheat kernels using reflectance and transmittance image analysis. Cereal Chem. 2004, 81, 594–597. [Google Scholar] [CrossRef] [Green Version]
  10. Barbedo, J.G.A. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst. Eng. 2016, 144, 52–60. [Google Scholar] [CrossRef]
  11. Kamilaris, A.; Prenafeta-Boldu, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef] [Green Version]
  12. Dyrmann, M.; Karstoft, H.; Midtiby, H.S. Plant species classification using deep convolutional neural network. Biosyst. Eng. 2016, 151, 72–80. [Google Scholar] [CrossRef]
  13. Cheng, X.; Zhang, Y.; Chen, Y.; Wu, Y.; Yue, Y. Pest identification via deep residual learning in complex background. Comput. Electron. Agric. 2017, 141, 351–356. [Google Scholar] [CrossRef]
  14. Fuente, A.; Yoon, S.; Kim, S.; Park, D. A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition. Sensors 2017, 17, 2022. [Google Scholar] [CrossRef] [Green Version]
  15. Rahnemoonfar, M.; Sheppard, C. Deep count: Fruit counting based on deep simulated learning. Sensors 2017, 17, 905. [Google Scholar] [CrossRef] [Green Version]
  16. Lee, S.H.; Chan, C.S.; Mayo, S.J.; Remagnino, P. How deep learning extracts and learns leaf features for plant classification. Pattern Recognit. 2017, 71, 1–13. [Google Scholar] [CrossRef] [Green Version]
  17. Yu, S.; Yuan, L.; Guan, W.; Haiyan, Z. Deep learning for plant identification in natural environment. Comput. Intell. Neurosci. 2017, 2017, 7361042. [Google Scholar]
  18. Elavarasan, D.; Vincent, D.R.; Sharma, V.; Zomaya, A.Y.; Srinivasan, K. Forecasting yield by integrating agrarian factors and machine learning models: A survey. Comput. Electron. Agric. 2018, 155, 257–282. [Google Scholar] [CrossRef]
  19. Cheng, F.; Chen, F.N.; Ying, Y.B. Image recognition of unsound wheat using artificial neural network. In Proceedings of the 2010 Second WRI Global Congress on Intelligent Systems, Wuhan, China, 16–17 December 2010; pp. 172–175. [Google Scholar]
  20. Ting-Cui, C.; Xiao-Hai, H.; De-Liang, D.; Heng, S.; Shu-Hua, X. Identification of unsound kernels in wheat based on CNN deep model. Mod. Comput. 2017, 36, 9–14. [Google Scholar]
  21. Le, Y.; Chao, W.; Jingzhu, W.; Yan, C.; Yangyang, L.; Yao, W. Identification method of unsound kernel wheat based on hyperspectral and convolution neural network. J. Electron. Meas. Instrum. 2017, 31, 1297–1303. [Google Scholar]
  22. Zhu, S.; Zhuo, J.; Huang, H.; Li, G. Wheat grain integrity image detection system based on CNN. Trans. Chin. Soc. Agric. Mach. 2020, 51, 36–42. [Google Scholar]
  23. He, J.; Wu, X.; He, X.; Hu, J.; Qin, L. Imperfect wheat kernel recognition combined with image enhancement and conventional neural network. Comput. Appl. 2021, 41, 911–916. [Google Scholar]
  24. Luo, J.H.; Wu, J.X. A survey on fine-grained image categorization using deep convolutional features. Acta Autom. Sin. 2017, 43, 1306–1318. [Google Scholar]
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  26. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2018. [Google Scholar]
  27. Xu, L. Research on Fine-Grained Object Classification Method Based on Attention Mechanism. Master’s Thesis, Agricultural University of Hebei, Baoding, China, 2019. [Google Scholar]
  28. Sun, P.; Chen, G.; Cao, L. Image recognition of soybean pests based on attention convolutional neural network. J. Chin. Agric. Mech. 2020, 41, 171–176. [Google Scholar]
  29. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  30. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Residual blocks.
Figure 2. Residual blocks with an attention mechanism.
Figure 3. Sample images of imperfect grains and perfect grains of wheat.
Figure 4. Comparison diagrams of the training results.
Figure 5. Sample visualization: Grad-CAM. (Left) original image, (Center-left) activated heatmaps without attention mechanism, (Center) activated heatmaps with attention mechanism, (Center-right) heatmap superimposed on original image without attention mechanism, and (Right) heatmap superimposed on original image with attention mechanism.
Figure 6. Results of ResNet-50 with attention mechanism at different learning rates.
Table 1. Parameters and results of the training model.

Method     | Batch | Parameters | Epoch | Time          | Accuracy (%)
ResNet-18  | 32    | 11,189,190 | 100   | 1 h 27 m 25 s | 94.17
ResNet-34  | 32    | 21,304,774 | 100   | 2 h 27 m 54 s | 94.83
ResNet-50  | 32    | 23,573,446 | 100   | 3 h 27 m 54 s | 95.33
ResNet-101 | 32    | 42,617,798 | 100   | 5 h 34 m 21 s | 95.17
ResNet-152 | 32    | 58,307,526 | 100   | 7 h 45 m 19 s | 95.17
Table 2. Parameters and results of the training model with attention mechanism.

Method (Attention) | Batch | Parameters | Epoch | Time           | Accuracy (%)
ResNet-18          | 32    | 11,279,054 | 100   | 2 h 13 m 9 s   | 95.17
ResNet-34          | 32    | 21,467,538 | 100   | 3 h 59 m 52 s  | 95.50
ResNet-50          | 32    | 26,106,006 | 100   | 5 h 24 m 52 s  | 96.50
ResNet-101         | 32    | 47,398,744 | 100   | 9 h 42 m 9 s   | 96.17
ResNet-152         | 15    | 64,941,466 | 100   | 24 h 26 m 34 s | 94.83
Table 3. Confusion matrix and classification performance of the ResNet-50 model (rows: true wheat grain type; columns: predicted species).

True Class     | Scab | Insect-Damaged | Sprouted | Mildew | Broken | Perfect | Precision | Recall | F-Measure
Scab           | 98   | 0              | 0        | 0      | 0      | 2       | 95.15     | 98     | 96.55
Insect-damaged | 2    | 93             | 1        | 3      | 1      | 0       | 97.89     | 93     | 95.38
Sprouted       | 1    | 0              | 93       | 1      | 4      | 1       | 95.88     | 93     | 94.41
Mildew         | 2    | 0              | 1        | 96     | 0      | 1       | 94.12     | 96     | 95.05
Broken         | 0    | 0              | 1        | 1      | 96     | 2       | 95.05     | 96     | 95.52
Perfect        | 0    | 2              | 1        | 1      | 0      | 96      | 94.12     | 96     | 95.05
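The per-class figures in Table 3 follow directly from its confusion matrix: precision divides each diagonal entry by its column sum (all samples predicted as that class), recall divides it by its row sum (all samples truly of that class), and the F-measure is their harmonic mean. A minimal sketch re-deriving them, with the matrix values copied from Table 3 and the helper function being our own illustration:

```python
# Confusion matrix from Table 3: rows = true class, columns = predicted class.
CLASSES = ["Scab", "Insect-damaged", "Sprouted", "Mildew", "Broken", "Perfect"]
CONFUSION = [
    [98, 0, 0, 0, 0, 2],
    [2, 93, 1, 3, 1, 0],
    [1, 0, 93, 1, 4, 1],
    [2, 0, 1, 96, 0, 1],
    [0, 0, 1, 1, 96, 2],
    [0, 2, 1, 1, 0, 96],
]

def class_metrics(matrix, i):
    """Precision, recall, and F-measure (in %) for class index i."""
    tp = matrix[i][i]
    predicted_i = sum(row[i] for row in matrix)  # column sum
    actual_i = sum(matrix[i])                    # row sum
    precision = 100 * tp / predicted_i
    recall = 100 * tp / actual_i
    f_measure = 2 * precision * recall / (precision + recall)
    return round(precision, 2), round(recall, 2), round(f_measure, 2)

for i, name in enumerate(CLASSES):
    print(name, class_metrics(CONFUSION, i))
```

Running this reproduces the table's last three columns, e.g. 95.15/98/96.55 for scab grains.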
Table 4. Confusion matrix and classification performance of the ResNet-50 model with attention (rows: true wheat grain type; columns: predicted species).

True Class     | Scab | Insect-Damaged | Sprouted | Mildew | Broken | Perfect | Precision | Recall | F-Measure
Scab           | 100  | 0              | 0        | 0      | 0      | 0       | 96.15     | 100    | 98.04
Insect-damaged | 0    | 100            | 0        | 0      | 0      | 0       | 95.24     | 100    | 97.56
Sprouted       | 0    | 1              | 98       | 0      | 0      | 1       | 92.45     | 98     | 95.65
Mildew         | 3    | 1              | 6        | 90     | 0      | 0       | 98.9      | 90     | 95.14
Broken         | 1    | 1              | 1        | 0      | 95     | 2       | 100       | 95     | 97.43
Perfect        | 0    | 2              | 1        | 1      | 0      | 96      | 96.97     | 96     | 96.48
Table 5. The test of different learning rates for ResNet-50 with an attention mechanism.

Learning Rate | Batch | Epoch | Time          | Accuracy (%)
0.0001        | 32    | 100   | 5 h 24 m 52 s | 96.33
0.0002        | 32    | 100   | 5 h 25 m 6 s  | 97
0.0003        | 32    | 100   | 5 h 24 m 1 s  | 97.5
0.0004        | 32    | 100   | 5 h 34 m 16 s | 97.17
0.0005        | 32    | 100   | 5 h 34 m 16 s | 96.5
0.001         | 32    | 100   | 5 h 43 m 52 s | 95.5
0.01          | 32    | 100   | 5 h 59 m 59 s | 93.33
Table 6. Confusion matrix and classification performance of the ResNet-50 model with attention at the optimal learning rate of 0.0003 (rows: true wheat grain type; columns: predicted species).

True Class     | Scab | Insect-Damaged | Sprouted | Mildew | Broken | Perfect | Precision | Recall | F-Measure
Scab           | 97   | 0              | 0        | 1      | 0      | 2       | 98.98     | 97     | 97.98
Insect-damaged | 0    | 99             | 1        | 0      | 0      | 0       | 100       | 99     | 99.5
Sprouted       | 0    | 0              | 99       | 1      | 0      | 0       | 93.4      | 99     | 96.12
Mildew         | 1    | 0              | 4        | 95     | 0      | 0       | 97.94     | 95     | 96.43
Broken         | 0    | 0              | 1        | 0      | 96     | 3       | 100       | 96     | 97.96
Perfect        | 0    | 0              | 1        | 0      | 0      | 99      | 95.19     | 99     | 97.06
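With 100 test samples per class, the overall accuracy implied by Table 6 is the sum of its diagonal (correct predictions) over the 600 test samples, which matches the 97.5% average recognition accuracy reported in the abstract for the optimal learning rate. A small sanity check:

```python
# Diagonal of Table 6's confusion matrix (Scab ... Perfect), 100 samples per class.
diagonal = [97, 99, 99, 95, 96, 99]
total_samples = 6 * 100
overall_accuracy = 100 * sum(diagonal) / total_samples
print(overall_accuracy)  # 97.5, the reported average recognition accuracy
```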
Zhang, W.; Ma, H.; Li, X.; Liu, X.; Jiao, J.; Zhang, P.; Gu, L.; Wang, Q.; Bao, W.; Cao, S. Imperfect Wheat Grain Recognition Combined with an Attention Mechanism and Residual Network. Appl. Sci. 2021, 11, 5139. https://doi.org/10.3390/app11115139