Classifying Wheat Hyperspectral Pixels of Healthy Heads and Fusarium Head Blight Disease Using a Deep Neural Network in the Wild Field

Abstract: Classification of healthy and diseased wheat heads in a rapid and non-destructive manner for the early diagnosis of Fusarium head blight disease is difficult. Our work applies a deep neural network classification algorithm to the pixels of hyperspectral images to accurately discern the diseased area. The spectra of hyperspectral image pixels in a manually selected region of interest are preprocessed via mean removal to eliminate interference due to the time interval and the environment. The generalization of the classification model is considered, and two improvements are made to the model framework. First, the pixel spectra are reshaped into a two-dimensional data structure for the input layer of a Convolutional Neural Network (CNN). After training two types of CNNs, the assessment shows that a two-dimensional CNN model is more efficient than a one-dimensional CNN. Second, a hybrid neural network with a convolutional layer and a bidirectional recurrent layer is constructed to improve the generalization of the model. Considering the characteristics of the dataset and models, the confusion matrices based on the testing dataset indicate that the classification model is effective for background and disease classification of hyperspectral image pixels. The results show that the two-dimensional convolutional bidirectional gated recurrent unit neural network (2D-CNN-BidGRU) has an F1 score and accuracy of 0.75 and 0.743, respectively, for the total testing dataset. A comparison of all the models shows that the hybrid 2D-CNN-BidGRU network is the best at preventing over-fitting and optimizing generalization. Our results illustrate that the hybrid-structure deep neural network is an excellent algorithm for classifying healthy and Fusarium head blight diseased wheat heads in the field of hyperspectral imagery.


Introduction
Fusarium head blight disease is not evenly distributed across a wheat field; in the early stages of infestation it occurs in patches, with large areas of the field free of disease, but it eventually results in significant losses. Fusarium head blight is an intrinsic infection by fungal organisms. After infestation, the disease can harm the normal physiological function of wheat and cause changes in its form and internal physiological structure [1][2][3]. After the wheat is infected, several fungal toxins are produced. Deoxynivalenol (DON) is the most serious of these toxins. DON can lead to poisoning in people or animals and persists in the food chain for a long time [4,5]. If the disease in the field can be detected earlier and more quickly, wheat containing the toxins can be isolated, which can reduce the loss caused by the disease. Scabs appear on the wheat head due to chlorophyll degradation and water loss after the wheat is infected [6,7]. Digital image and spectral analysis can be applied to detect the disease. The spectral reflectance of diseased plant tissue can be investigated as a function of the change in plant chlorophyll content, water, morphology, and structure during the development of the disease [8,9].
Hyperspectral imaging is a non-invasive technique, but its very large data cubes consume large amounts of memory and transmission bandwidth, so features and spectral signatures that represent a small subset of the data are usually extracted. Hyperspectral imagery can include the spectral, spatial, textural, and contextual features of food and agricultural products [10][11][12]. In addition, the application of hyperspectral imagery to wheat disease detection is relatively recent and offers interesting and potentially discerning opportunities [13]. Fusarium head blight disease develops over a short period of approximately half a month. Since the field environment has a strong influence on hyperspectral imagery, previous studies have mainly focused on laboratory conditions [14]. Bauriegel identified the wavelengths associated with Fusarium infection in wheat using hyperspectral imaging and Principal Component Analysis (PCA) under laboratory conditions. The degree of disease was correctly classified (87%) in the laboratory by utilizing the spectral angle mapper. Meanwhile, the medium milk stage was found to be the best time to detect the disease using spectral ranges of 665-675 nm and 550-560 nm [14]. Karl-Heinz Dammer used the normalized differential vegetation index of the image pixels and threshold segmentation to discriminate between infected and non-infected plant tissue. The grey threshold image of the diseased ear shows a linear correlation between multispectral images and visually estimated disease levels [15]. However, multispectral and RGB imagery can detect only infected ears with typical symptoms, and only via sophisticated image analysis under uniform illumination conditions. Spectroscopy and imaging platforms for tractors, UAVs, aircraft, and satellites are current innovative technologies for mapping disease in wild fields [16].
The hyperspectral imagery classification model must be robust and generalizable to improve the disease detection accuracy in wild fields. The most commonly used traditional algorithm is the Support Vector Machine (SVM), which has achieved remarkable results in statistical process control applications [17]. Qiao used an SVM to classify fungi-contaminated peanuts from hyperspectral image pixels, and the classification accuracy exceeded 90% [18]. When using an SVM to classify hyperspectral images, the spectral and spatial features should be extracted with reduced dimensionality. The overall accuracy increased from 83% without any feature reduction to 87% with feature reduction based on several principal components and morphological profiles [19]. Feature reduction can be seen as a transformation from high dimensions to low dimensions to overcome the curse of dimensionality, which is a common phenomenon when conducting analyses in high-dimensional space. For a given number of available training samples, the curse of dimensionality decreases the classification accuracy as the dimensions of the input feature vectors increase [20]. Therefore, the analysis of hyperspectral images, which are large data cubes, faces the major challenge of addressing redundant information.
Deep learning originates from an artificial neural network, the Multiple Layer Perceptron (MLP). Among the several applications of deep neural networks, Deep Convolutional Neural Nets (DCNN) have brought about breakthroughs in processing images, face detection, audio, and so on [21][22][23]. In 2015, DCNN was first introduced into hyperspectral image classification, and Wei Hu also proposed convolutional layers and a max pooling layer to discriminate each spectral signature [24,25]. DCNN shows excellent performance for hyperspectral image classification [26], including cancer classification [27] and land-cover classification [28]. A Deep Recurrent Neural Network (DRNN), another typical deep neural network, is a sequence network that contains hidden layers and memory cells to remember the sequence state [29,30]. In 2017, Li was the first to use the DRNN to treat the pixels of a hyperspectral image as sequence data, developing a novel function, named PRetanh, for hyperspectral imagery [31]. In addition, the deep convolutional recurrent neural network (DCRNN) was first applied to hyperspectral images by Hao Wu in 2017. That work constructed a few convolutional layers followed by recurrent layers to extract contextual spectral information from the features of the convolutional layers. By combining convolutional and recurrent layers, the DCRNN model achieves results that are superior to those of other methods [32].
Up to now, research on wheat Fusarium infection has primarily focused on approaches to classify the fungal disease of wheat kernels by grey threshold segmentation, or to extract head blight symptoms based on PCA [33][34][35]. In the early development stage of Fusarium head blight disease, infected and healthy grains are easier to separate, but the disease symptoms are difficult to diagnose [14,15]. Moreover, in a wild field, the complicated environmental conditions and irregular disease patterns limit the classification accuracy of hyperspectral imagery experiments.
Therefore, the main aim of the current study is to develop a robust and generalizable classification model for hyperspectral image pixels to detect early-stage Fusarium head blight disease in a wild field. Specifically, the main work is listed as follows.

1. Design and complete a hyperspectral image classification experiment for healthy heads and Fusarium head blight disease in the wild field.

The remainder of this paper is organized as follows. An introduction to the experiment for obtaining hyperspectral imagery of Fusarium head blight disease is briefly given in Section 2. The details of the classification algorithms, including modeling and evaluating, are described in Section 3. The experimental results and a comparison with different approaches are provided in Section 4. Section 5 discusses the effectiveness of the early diagnosis of Fusarium head blight disease by different hyperspectral classification models; finally, Section 6 concludes this paper.

Plant Material
The field wheat plants were grown in Guo He town, Hefei City, Anhui Province, China, in 2017. The occurrence of disease was completely natural because the cultivation process did not utilize pesticides, which guarantees the success of cultivation and illustrates the real and typical symptoms of Fusarium head blight of wheat. To ensure the quality of the experimental data, the experiment was conducted from 29 April to 15 May 2017, the ideal period for disease detection as the wheat passes from the medium milk stage to the fully ripe stage, in order to obtain real and valid hyperspectral images. Several factors influence the hyperspectral imagery experiment, including wind, humidity, and temperature, and the best experimental time of day is noon because of the suitable sunbeam angle. The constraining factors of the environment were considered in the experiment, and 90 samples of wheat ears were divided into 10 regions, with a hyperspectral image acquired for each region. Analysis of the hyperspectral images indicated that on 9-10 May, in the early period of disease development, the environmental conditions were stable (29 ± 2 °C, humidity 70%, breeze), and three groups of wheat hyperspectral images were selected in consideration of the prominent disease appearance.

Experiment Apparatus and Procedure in the Field
Considering the complexity of the field environment, the field experiment was designed to improve the measurement procedure. Figure 1 shows that the hyperspectral image data of Fusarium head blight disease were acquired with a hyperspectral VNIR system and auxiliary equipment in the field. The system consisted of the following parts: a pushbroom-type hyperspectral apparatus (OKSI, Torrance, CA, USA), a rotation stage with a pan/tilt head for scanning, a Dell Precision Workstation with the data collection software HyperVision (OKSI, Torrance, CA, USA), and a set of height-adjustable mounting brackets. Wheat samples were placed in a fixed region, and a piece of black light-absorbing cloth was placed under the targeted object as the background. The lateral view of the experiment shows the basic setting:
• the tripod apparatus is placed about 30 cm in front of the samples;
• the hyperspectral camera is adjusted to a height of 1.5 m from the ground;
• the pan/tilt head is set at 45 degrees in the horizontal direction;
• the scan range is −30 degrees to +30 degrees; and,
• the measurement times were set from 11:00 a.m. to 2:00 p.m. to acquire sufficient light.
Figure 1 shows an RGB image of the nine wheat heads in each hyperspectral image. The outputs of this system were "image cubes" of the wheat head region, consisting of a two-dimensional spatial image (1620 × 2325 pixels) with spectral data (400-1000 nm at 1.79 nm resolution, 339 wavebands) at each pixel. The pixel spectra classes are background, healthy, and diseased. The reflectance of the pixel spectra was derived using the standard white panel method (Labsphere, North Sutton, NH, USA). The Digital Number (DN) is the uncalibrated value of the hyperspectral imaging system. $DN_T$ is the DN of a diseased sample, and $DN_W$ is the DN of a white panel. $DN_B$ is the DN that is taken as a substitute for dark current and noise when the camera shutter is closed. The reflectance $R$ can be calculated from the following equation [36]:
$$R = \frac{DN_T - DN_B}{DN_W - DN_B} \qquad (1)$$
The hyperspectral image data were analysed with the ENVI (Environment for Visualizing Images) software of the Exelis Visual Information Solutions Company. The Fusarium head blight disease is visually distinguished by false colour images in Figure 1. False colour images also facilitated the proper manual setting of the Region of Interest (ROI) and the selection of tissues for spectral analysis. Additionally, the ENVI software allows the ROI of the wheat head to be extended manually.
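The white-panel calibration described above can be illustrated with a short sketch. It assumes the usual dark-corrected ratio of sample to white-panel readings, applied band by band; the function name `reflectance` and the three-band readings are hypothetical and are not taken from the experiment.

```python
def reflectance(dn_target, dn_white, dn_dark):
    """Convert raw digital numbers to reflectance, band by band."""
    result = []
    for t, w, b in zip(dn_target, dn_white, dn_dark):
        span = w - b
        # Guard against a dead band where white and dark readings coincide.
        result.append((t - b) / span if span != 0 else 0.0)
    return result


# Hypothetical three-band readings, for illustration only.
spectrum = reflectance([1200, 2100, 900], [4000, 4100, 3900], [100, 100, 100])
```

In practice the same ratio would be evaluated for all 339 wavebands of every pixel in the image cube.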

The Deep Convolutional Neural Network
For deep models, DCNN is widely used as a feed-forward neural network consisting of a convolution layer, a pooling layer, and a fully connected layer [37,38]. The unit of the convolution layer is the feature map, and each unit is connected to a block of the previous feature map through a filter bank. The pooling layer takes a specific value as the output value in a small area. The main purpose of the pooling operation is to reduce the dimension [39]. The one-dimensional convolutional neural network (1D-CNN) has been successfully used for hyperspectral image pixel-level classification [40,41]. A spectral vector of the hyperspectral image pixel is taken as the input layer. Analogous to the 1D-CNN, the two-dimensional convolutional neural network (2D-CNN) is widely used for two-dimensional image data.
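To make the two input layouts concrete, the following sketch turns a 339-band spectral vector into a small two-dimensional grid. This section does not state the grid size used by the authors, so the choice of 19 columns and zero padding of the last row are assumptions made purely for illustration, as is the helper name `spectrum_to_grid`.

```python
def spectrum_to_grid(spectrum, n_cols, pad_value=0.0):
    """Lay a 1-D spectrum out row by row on a grid with n_cols columns."""
    n_rows = -(-len(spectrum) // n_cols)  # ceiling division
    padded = list(spectrum) + [pad_value] * (n_rows * n_cols - len(spectrum))
    return [padded[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# A placeholder spectrum with 339 bands (values are just band indices).
grid = spectrum_to_grid([float(i) for i in range(339)], n_cols=19)
```

The 1D-CNN would consume the original vector directly, whereas the 2D-CNN would consume `grid` as a single-channel grey-scale image.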
Figure 2 compares the one-dimensional pixel vector with the two-dimensional grey-scale image, both of which are based on the same spectral information. For the neural network of the model, the convolutional and pooling layers follow the Visual Geometry Group neural network (VGG network) [42]: the kernel is 3 × 3, and the pooling window is 2 × 2. The significant improvement of the VGG network is that it makes the neural network deeper and uses two 3 × 3 kernels instead of a 5 × 5 kernel, which decreases the number of parameters and the amount of computation. Following the configuration of the VGG network, the kernel sizes of the 1D-CNN and 2D-CNN are 3 and 3 × 3, respectively. Table 1 shows the configuration of the 1D-CNN and 2D-CNN, which both use four convolutional layers, two pooling layers, one dense layer, and the dropout function to prevent over-fitting.
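The parameter saving from stacking two 3 × 3 kernels in place of one 5 × 5 kernel can be checked with simple arithmetic. The sketch below counts convolution weights (biases ignored) for a hypothetical channel width of 64; the width is an assumption and is not taken from Table 1.

```python
def conv_weights(kernel_size, in_channels, out_channels, layers=1):
    """Weights (without biases) in a stack of square-kernel conv layers
    that all share the same channel width."""
    return layers * kernel_size * kernel_size * in_channels * out_channels


channels = 64  # hypothetical channel width
two_3x3 = conv_weights(3, channels, channels, layers=2)
one_5x5 = conv_weights(5, channels, channels)
ratio = two_3x3 / one_5x5
```

Both arrangements cover a 5 × 5 receptive field, but the stacked version needs 18 weights per channel pair instead of 25, that is, 72% of the parameters.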

Deep Recurrent Neural Network
A DRNN is a classic framework for time sequence data that is different from the convolutional feed-forward neural network [43,44]. In a deep recurrent network, the output of a neuron can directly affect the same neuron at the next time step. The original RNN has problems with vanishing and exploding gradients. Long Short-Term Memory (LSTM) is a traditional gated framework that extends the RNN to solve these problems, and the Gated Recurrent Unit (GRU) is a newer gated framework.
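LSTM was introduced by Hochreiter and Schmidhuber [45]. Graves reviewed and utilized the LSTM to generate and recognize speech [46,47]. LSTM consists of three gates (i.e., a forget gate, an input gate, and an output gate). In the formulas below, $h_{t-1}$ is the prior output, $x_t$ is the current input, $\sigma$ is the sigmoid function, $W$ is a weight matrix, and $b$ is the bias term.
The forget gate formula is:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \qquad (2)$$
The input gate formula is:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \quad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C), \quad C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (3)$$
The tanh function creates a new candidate $\tilde{C}_t$; $i_t$ is the selection result for $\tilde{C}_t$; and $C_t$ uses $C_{t-1}$, $f_t$, and $\tilde{C}_t$ to update the state.
The output gate formula is:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \quad h_t = o_t * \tanh(C_t) \qquad (4)$$
GRU is a streamlined form of the LSTM architecture that adaptively captures dependencies of different time scales [48,49]. The GRU has an update gate and a reset gate. These gates' formulas are:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z), \quad r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r), \quad \tilde{h}_t = \tanh(W_h \cdot [r_t * h_{t-1}, x_t] + b_h), \quad h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t \qquad (5)$$
Two RNN models with three stacked LSTM layers and GRU layers, respectively, are able to learn higher-level temporal representations, as shown in Table 2.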
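The gate updates of an LSTM cell can be sketched in a few lines of plain Python. This is a minimal illustration of the standard LSTM step, not the authors' implementation; the helper names (`affine`, `lstm_step`), the parameter layout, and the zero-valued weights in the example are all assumptions.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vec):
    """Compute W @ vec + b for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps gate names 'f', 'i', 'c', 'o' to (W, b)."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f = [sigmoid(v) for v in affine(*params["f"], z)]          # forget gate
    i = [sigmoid(v) for v in affine(*params["i"], z)]          # input gate
    c_tilde = [math.tanh(v) for v in affine(*params["c"], z)]  # candidate state
    o = [sigmoid(v) for v in affine(*params["o"], z)]          # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


# Hidden size 1 and input size 1 with all-zero (hypothetical) parameters:
# every gate then evaluates to 0.5 and the candidate state to 0.
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in "fico"}
h_t, c_t = lstm_step([0.3], [0.0], [2.0], params)
```

With all gates at 0.5, the cell keeps half of its previous state, which shows how the forget gate scales the memory carried between steps.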

Deep Convolutional Recurrent Neural Network
To take advantage of the characteristics of the convolutional and recurrent layers, a deep convolutional recurrent neural network (DCRNN) integrates a convolutional layer with a recurrent layer as a complete framework for the classification of images, language, and so on [50]. The DCRNN can learn better image representations, because the convolutional layers act as feature extractors and provide abstract representations of the input data in feature maps, and the recurrent neural networks are designed to learn contextual dependencies by using the recurrent connections [50,51]. Because features at different wavelengths are mutually correlated, as exploited by vegetation indices, the hybrid neural network structure can extract features of spectral sequences with the convolutional layer and then obtain the contextual information of these features with the recurrent layer. The training process of the hybrid model may thus draw on deeper and more useful characteristics of hyperspectral image pixels. The framework of the DCRNN consists of three parts: convolution layers, a reshape layer to change the tensor dimensions, and three stacked recurrent layers using LSTM or GRU.
To improve the generalizability, a bidirectional RNN includes forward and backward features of the spectra, as in Equation (6). The hybrid structure can be utilized for natural language processing and phonetic recognition [52]. The bidirectional LSTM network was proposed for sequence tagging and other fields [53,54]. In 2017, the bidirectional LSTM in a CRNN was first used for hyperspectral image classification, and the accuracy on the training dataset was over 95% [32]. A bidirectional GRU, which exploits the context to resolve ambiguities better than the unidirectional GRU, was utilized to extract efficient features and make the classifier perform better than the unidirectional method [55,56].
The formula of the bidirectional RNN is:
$$\overrightarrow{h}_t = f(x_t, \overrightarrow{h}_{t-1}), \quad \overleftarrow{h}_t = f(x_t, \overleftarrow{h}_{t+1}), \quad h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t] \qquad (6)$$
where $f$ is the recurrent unit (LSTM or GRU). Therefore, the hybrid structure uses the bidirectional LSTM and bidirectional GRU instead of the LSTM and GRU, and Figure 3 shows that the features of the convolutional layers can flow into the recurrent layers to capture the global information of hyperspectral images. Table 3 presents four hybrid models, namely, a two-dimensional convolutional neural network with long short-term memory (2D-CNN-LSTM), a two-dimensional convolutional neural network with gated recurrent unit (2D-CNN-GRU), a two-dimensional convolutional neural network with bidirectional long short-term memory (2D-CNN-BidLSTM), and a two-dimensional convolutional neural network with a bidirectional gated recurrent unit (2D-CNN-BidGRU).
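The idea of combining forward and backward passes can be sketched with a toy recurrence. For brevity, a scalar tanh unit stands in for the LSTM or GRU cell, and the weights and input sequence are hypothetical; only the pairing of the two directions at each position is the point of the example.

```python
import math


def run_rnn(sequence, weight_x=0.5, weight_h=0.5):
    """Scalar recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}); returns every h_t."""
    h, states = 0.0, []
    for x in sequence:
        h = math.tanh(weight_x * x + weight_h * h)
        states.append(h)
    return states


def bidirectional(sequence):
    """Pair the forward state and the backward state at each position."""
    forward = run_rnn(sequence)
    # Run over the reversed sequence, then realign to the original positions.
    backward = run_rnn(sequence[::-1])[::-1]
    return list(zip(forward, backward))


features = bidirectional([0.1, 0.4, 0.9])
```

Each position then carries context from both the shorter and the longer wavelengths of the sequence, rather than from one side only.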

Evaluation Method
This paper considers the following criteria to select the best model. Accuracy is widely used for classified hyperspectral pixels. However, accuracy is not sufficient for imbalanced datasets. The confusion matrix, viewed as an error matrix, can clearly depict the predicted categories in each row and the actual categories in each column. However, the confusion matrix does not by itself yield a single score for evaluating the classifier model. Therefore, we use precision, recall, and F1 score to assess these models.
TP, FP, FN, and TN stand for true positive, false positive, false negative, and true negative, respectively. The formulas of Precision (P) and Recall (R) are [57]:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$
A good model should have both high precision and high recall, but the two tend to trade off against each other. The F1 score is a better metric that combines the characteristics of precision and recall to evaluate the model for different classes in the dataset. A good F1 score is also indicative of satisfactory classification performance. The F1 score formula is [57]:
$$F1 = \frac{2 \times P \times R}{P + R}$$
The models are implemented in the Tensorflow framework with Python 3.5 on a workstation with a 3.5 GHz Intel(R) Core i7 CPU and an NVIDIA(R) GTX 1080TI GPU.
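As an illustration of these metrics, the sketch below derives per-class precision, recall, and F1 score from a confusion matrix laid out as described above (rows are predicted classes, columns are actual classes). The 3 × 3 matrix is hypothetical and does not reproduce any result from this paper.

```python
def per_class_metrics(confusion):
    """confusion[i][j] counts samples predicted as class i whose actual class is j."""
    metrics = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fp = sum(confusion[k]) - tp                  # predicted k, actually another class
        fn = sum(row[k] for row in confusion) - tp   # actually k, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical counts for background, healthy, and diseased pixels.
example = [[8, 1, 1],
           [1, 6, 3],
           [1, 3, 6]]
scores = per_class_metrics(example)
```

Reporting the three values per class makes the behaviour on a minority class visible even when the overall accuracy looks acceptable.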

Experiment Dataset and Analysis
The original data comprise hyperspectral image cubes for six sample regions in Figure 1. Three types of pixels are included: background pixels, healthy pixels, and diseased pixels. Table 4 shows the number of these ROI pixels. The dataset suffers from an imbalance problem in which the number of diseased pixels is clearly smaller than the numbers of healthy and background pixels. Sample class imbalance is a common problem that has a detrimental effect on classification performance. Several methods can be used to resolve the issue: oversampling, undersampling, and two-phase training [58][59][60]. To avoid interference from sample quantity, we use undersampling to guarantee that every class has the same number of samples. After randomly undersampling the data, the total size of the training and validation datasets is 227,484. The total size of the testing dataset is 581,716. For the deep models, the spectrum of each hyperspectral image pixel is reshaped into two-dimensional data as a grey image. There are many preprocessing methods for the image data of deep models, such as normalization, mean removal, and PCA whitening. Preprocessing methods can speed up gradient descent and improve accuracy. PCA whitening is mainly used for the dimensionality reduction of colour images [61]. However, the input data for these deep models are similar to grey images. Therefore, the spectra samples are preprocessed with the mean-removal method, which can reduce the sampling error between different experiment dates [62].
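Random undersampling, as used above to balance the classes, can be sketched as follows. The class sizes and the helper name `undersample` are hypothetical; the sketch simply draws, without replacement, as many samples from each class as the smallest class contains.

```python
import random


def undersample(samples_by_class, seed=0):
    """Randomly reduce every class to the size of the smallest class."""
    rng = random.Random(seed)
    target = min(len(samples) for samples in samples_by_class.values())
    return {label: rng.sample(samples, target)
            for label, samples in samples_by_class.items()}


# Hypothetical pixel identifiers per class, for illustration only.
pixels = {
    "background": list(range(0, 500)),
    "healthy": list(range(500, 900)),
    "diseased": list(range(900, 1000)),
}
balanced = undersample(pixels)
```

The cost of this approach is that most majority-class pixels are discarded, which is acceptable here because all classes remain large after balancing.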
PCA, which can identify the principal characteristics of the data at different observation times, is used to visualize the spectra data and compare the mean-removed and non-mean-removed data [63][64][65]. In Figure 4, the red discrete points are the first-day samples, and the yellow discrete points are the second-day samples. The discrete points in Figure 4a show 800 random background, healthy, and diseased pixels, which are clearly separated by observation time. In Figure 4b, the first- and second-day discrete points show more obvious overlap, especially for the background samples. The illustration shows that the spectra from different experiment dates do not belong to the same range of values, and that mean removal reduces the difference between the first- and second-day samples. The last step of preprocessing is normalization, which is a conventional method for deep learning [23].
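The effect of mean removal on samples from two acquisition dates can be sketched as below. The exact definition used by the authors is not spelled out in this section, so the sketch assumes that the per-band mean is computed and subtracted separately for each date, and that the final normalization is min-max scaling to [0, 1]; both are assumptions, and the two-band spectra are made up.

```python
def remove_mean(spectra):
    """Subtract the per-band mean computed over a batch of spectra."""
    n = len(spectra)
    means = [sum(band) / n for band in zip(*spectra)]
    return [[v - m for v, m in zip(s, means)] for s in spectra]


def min_max_normalize(spectra):
    """Scale all values to the range [0, 1] using the global minimum and maximum."""
    flat = [v for s in spectra for v in s]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in s] for s in spectra]


# Hypothetical two-band spectra; day 2 is offset from day 1 by a constant.
day1 = [[0.30, 0.50], [0.34, 0.54]]
day2 = [[0.40, 0.60], [0.44, 0.64]]
centered = remove_mean(day1) + remove_mean(day2)
scaled = min_max_normalize(centered)
```

After centering, the constant offset between the two days disappears, which mirrors the increased overlap of the first- and second-day points described for Figure 4b.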

Model Training
When training models, the dataset is divided into a training set (70%) and a validation set (30%). The training results are used to search for the hyperparameters of the deep neural network. In our experiments, the loss function is "cross entropy" [66,67]; the optimizer is "adadelta" [68,69]; the activation function is "elu" for the convolutional layer and the dense layer and "tanh" for the recurrent layer; the batch size is 64 [44,70]; and regularization and the dropout function are used to decrease over-fitting [71,72]. In terms of accuracy and loss, a number of the deep neural networks behave regularly and stabilize after 300 epochs of training.
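The 70/30 division of the training data can be sketched as a shuffled split. The seed, the helper name `train_validation_split`, and the use of integer indices in place of real pixel spectra are assumptions; only the total of 227,484 samples comes from the text.

```python
import random


def train_validation_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and cut them into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Indices stand in for the 227,484 undersampled pixel spectra.
train, validation = train_validation_split(range(227484))
```

Shuffling before the cut keeps the class and acquisition-date proportions roughly equal in both parts.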
Figure 5 shows the accuracy and loss of the models for the training dataset and validation dataset after 300 epochs. Figure 5e,f show that the first hybrid structure also suffers from over-fitting with LSTM and GRU, especially the 2D-CNN-LSTM, but these models are still better than the models with only a recurrent neural network. Based on these results, the second hybrid neural network is reconstructed with the bidirectional LSTM and GRU instead of LSTM and GRU; the new hybrid networks are called 2D-CNN-BidLSTM and 2D-CNN-BidGRU, respectively. On the basis of Figure 5g,h, the second hybrid deep neural networks avoid over-fitting, while simultaneously improving the performance.
According to the results of the different deep neural networks in Figure 5, the models that minimize the loss for the validation dataset are considered to be the best. Table 5 shows the training accuracy, validation accuracy, training loss, validation loss, epoch, and training time for each model. 2D-CNN-BidGRU, which has the minimum loss, is the best training model. On the basis of Table 5, we can conclude that the maximum training accuracy is 0.847 for the GRU model, but the maximum validation accuracy is 0.83 for the 2D-CNN model. Therefore, a model using only a CNN or only an RNN is not sufficient for training the best hyperspectral image classification model. Moreover, the different deep learning algorithms take almost the same training time, except LSTM, but all of the deep models require a longer time than SVM.

Model Testing
The performance of the models should be assessed using different methods on the testing dataset. In this experiment, the testing dataset (581,716 pixels) is roughly 2.6 times the size of the combined training and validation dataset (227,484 pixels); therefore, the testing accuracy is important, especially for the generalization assessment, because the models will be applied to classify hyperspectral images for large-scale disease detection.
In Figures 6 and 7, we calculate the confusion matrix, precision, recall, and F1 score of the testing dataset to evaluate all models and to determine the best models based on generalizability. Figure 6 presents the confusion matrices of the classification models for the testing dataset, which contains the background class, healthy class, and diseased class. Note the relatively large size of the testing dataset and that the healthy class has a high misclassification rate for all models.
Figure 7 and Table 6 show the precision, recall, and F1 scores of the testing dataset for the three classes and for all models, respectively. In Figure 7, the precision of the healthy class is the highest, the recall of the diseased class is the highest, the precision of the diseased class is the lowest, and the recall of the healthy class is the lowest. The total precision and the total recall of 2D-CNN-BidLSTM and 2D-CNN-BidGRU are both the highest among these models. Therefore, the F1 scores of 2D-CNN-BidGRU and 2D-CNN-BidLSTM are the same, as shown in Table 6.
The F1 scores and accuracy of the models indicate that RBF-SVM performs worse than the deep neural network models because SVM classification does not scale well to a large number of samples. Based on this assessment, 2D-CNN-BidGRU and 2D-CNN-BidLSTM are both efficient, but 2D-CNN-BidGRU is better than 2D-CNN-BidLSTM with respect to the accuracy of the diseased and healthy classifications. The hybrid structure connecting the CNN with the bidirectional recurrent neural network is better than the other deep models. Although deep learning takes a longer training time, Table 6 shows that the deep models are faster than SVM at testing time, which is more important in practice. The development of GPUs keeps the training and testing time of deep neural networks tolerable.

Original Hyperspectral Image Mapping
Finally, the grey hyperspectral image (670 nm) in the range of 665-675 nm used for the head blight index (HBI) [14] is used to compare the mapping of the original hyperspectral image by the different models. Figure 8 shows the grey images of the original hyperspectral images mapped by these models. The results indicate that although these models can classify the diseased and healthy wheat heads, instrument noise influences the accuracy of the classification.

Discussion
The main purpose of this work was to study classification algorithms for hyperspectral image pixels and to analyse the performance of deep models for diagnosing Fusarium head blight disease in wheat. Previous studies attempted to classify the disease symptoms based on spatial and spectral features via machine learning algorithms [14,15], for example, PCA [33][34][35], random forest, and SVM [17][18][19]. Recently, deep algorithms for hyperspectral imagery have been studied increasingly intensively, such as DCNN [24][25][26][27], DRNN [31], and hybrid neural networks [32]. The more complicated hybrid structure [50,51] with a bidirectional recurrent layer [53,54] helps to improve the generalization of the classification model for prediction on testing datasets. This study therefore demonstrates the benefits gained from the deep characteristics of the disease spectra and identifies the best hybrid model for the diagnosis of Fusarium head blight disease in the field.

Extracting and Representing Deep Characteristics for Disease Symptoms
With high-resolution spectral instrumentation, hyperspectral images provide more bands than multispectral images [12]. Therefore, the valid features of hyperspectral image pixels for disease detection in wild fields are difficult to determine. Many classifiers of hyperspectral images are based on specific wavelet features or vegetation indices that are related to pathological characteristics [7,10]. Deep feature extraction by deep neural networks in the spectral domain has been proposed to acquire deeper information from hyperspectral images for disease detection [26].
Among the many deep learning methods, DCNN is commonly used to classify two-dimensional image data in visual tasks [42]. Our training and testing of 1D-CNN and 2D-CNN show that a two-dimensional data structure captures the intrinsic features of hyperspectral images better than a one-dimensional one. Another significant branch of the deep learning family is the DRNN, which is designed to address sequential data [31]. In contrast to the CNN, the RNN maintains all of the spectral information in a recurrent procedure with a sequence-based data structure to characterize spectral correlation and band-to-band variability. To integrate the advantages of these deep neural networks, novel models that combine convolutional and recurrent neural networks are proposed. In this work, the validation accuracy of the second hybrid neural network, which has a bidirectional recurrent layer, is better than that of the first one, which has a unidirectional recurrent layer. Bidirectional RNNs can incorporate contextual information from both past and future inputs [54]. The assessment results of all the deep models on the validation data show that 2D-CNN-BidGRU, with an accuracy of 0.846, is the most effective method for disease feature extraction from hyperspectral image pixels.
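To make the idea of a bidirectional recurrent layer concrete, the sketch below runs a GRU over a sequence in both directions and concatenates the two final states. It is a deliberately simplified illustration with a scalar input and a scalar hidden state, not the 2D-CNN-BidGRU architecture of this study; the weight names and values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, w):
    """One GRU update for a scalar input x and a scalar hidden state h."""
    z = sigmoid(w["wz"] * x + w["uz"] * h + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h + w["br"])  # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h) + w["bh"])
    # One common convention; some formulations swap z and (1 - z).
    return (1.0 - z) * h + z * h_cand


def run_gru(sequence, w):
    """Final hidden state after reading the sequence from left to right."""
    h = 0.0
    for x in sequence:
        h = gru_step(x, h, w)
    return h


def bidirectional_gru(sequence, w_forward, w_backward):
    """Concatenate the final states of a forward and a backward pass."""
    forward = run_gru(sequence, w_forward)
    backward = run_gru(list(reversed(sequence)), w_backward)
    return [forward, backward]


# Hypothetical weights for illustration only.
EXAMPLE_WEIGHTS = {
    "wz": 0.5, "uz": -0.3, "bz": 0.1,
    "wr": 0.8, "ur": 0.2, "br": 0.0,
    "wh": 1.2, "uh": -0.7, "bh": 0.05,
}

if __name__ == "__main__":
    spectrum_features = [0.2, 0.9, -0.5, 0.3]  # made-up band sequence
    print(bidirectional_gru(spectrum_features, EXAMPLE_WEIGHTS, EXAMPLE_WEIGHTS))
```

The backward pass sees the later bands first, so the concatenated output summarizes each position with context from both ends of the spectrum, which is the property attributed to bidirectional RNNs in [54].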

Assessing the Performance of Different Models for Hyperspectral Image Pixel Classification
Robustness and generalizability are very important for large-scale disease detection in hyperspectral images. Therefore, the performance of a model must be assessed on a large testing dataset. The assessment metrics of these classification models include not only accuracy but also the confusion matrix, F1 score, precision, and recall [57]. The precision, recall, and F1 score for the testing dataset indicate that both the background and diseased classes are classified effectively, but part of the healthy dataset is misclassified as diseased by all of the models.
The assessment results show that the deep models perform better than SVM. Among these deep models, despite the use of regularization [71] and dropout [72], the LSTM model, which has the worst performance, has an over-fitting problem. Over-fitting occurs when a deep model learns the noise in the training data to such an extent that it negatively impacts the performance of the model on the testing data [73]. The performance of the hybrid neural networks is better than that of the other deep neural networks. Comparing the assessment results of the two novel hybrid models on the training and testing datasets, the first hybrid neural network with LSTM has the same over-fitting problem, but the second hybrid model with bidirectional LSTM and GRU can restrain the problem. The F1 score (0.75) and accuracy (0.743) on the testing dataset provide evidence that the hybrid structure of 2D-CNN-BidGRU is the best for improving the model with respect to robustness and generalizability.

Next Steps
Notably, the application of deep neural networks to hyperspectral image pixel classification for Fusarium head blight disease is a new idea and has great potential for disease diagnosis by remote sensing. Despite some insufficiencies, these results show that a deep neural network with a convolutional layer and a bidirectional recurrent layer can classify hyperspectral image pixels to diagnose Fusarium head blight disease and improve the generalization and robustness of the classification model. In a wild wheat field, many types of objects have strong impacts on the generalization of the classification model. Therefore, future studies of deep models should apply deeper neural networks and customized loss functions to optimize these algorithms for larger testing datasets and more types of objects. Future application studies will focus on airborne hyperspectral remote sensing, which could be used to develop large-scale Fusarium head blight disease monitoring and mapping.

Conclusions
The study illustrates that deep neural networks can improve the classification accuracy and F1 score of Fusarium head blight disease detection from hyperspectral image pixels in a wild field. These results demonstrate that: (1) the hyperspectral image can be used to classify diseased and healthy wheat heads using the spectra of the pixels in a wild field; (2) the spectra of pixels can be reshaped into a two-dimensional data type to identify the features of disease symptoms more easily than with a one-dimensional data structure; (3) when compared with other deep neural networks, the hybrid model of the convolutional bidirectional recurrent neural network can prevent over-fitting and achieve a higher accuracy (0.846) on the validation dataset; and (4) with a larger testing dataset, the 2D-CNN-BidGRU model has the best generalization performance, with an F1 score and accuracy of 0.75 and 0.743, respectively. Our study provides a novel classification algorithm for research on Fusarium head blight disease in a wild field within a complicated environment, which can be used in future studies of disease prediction and larger-scale disease assessment based on airborne hyperspectral remote sensing.

Figure 1. Hyperspectral imagery field experiment and manual hyperspectral image Region Of Interest (ROI) for diseased, healthy, and background.

Figure 4. Principal Component Analysis (PCA) image of the original data and the mean-removed data. (a) Illustration of the original data by PCA; and (b) illustration of the mean-removed data by PCA. The red points are the first-day samples, and the yellow points are the second-day samples.

Figure 5a,b show the training accuracy and loss of 1D-CNN and 2D-CNN, which have similar networks, and contrast 1D-CNN with 2D-CNN, which reshapes the spectra into 16 × 16 grey images for the input layer. The results indicate that the two models have equivalent accuracy and loss for the training dataset, whereas for the validation dataset, the accuracy and loss of 2D-CNN are better than those of 1D-CNN. Figure 5c,d show the training results of LSTM and GRU. The LSTM and GRU recurrent neural networks both have over-fitting problems in terms of accuracy and loss in the training dataset and the validation dataset, but GRU is better than LSTM in the preliminary training period.
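The 16 × 16 reshaping for the 2D-CNN input layer can be sketched as follows. This is an illustrative assumption rather than the preprocessing code of this study: it assumes that each pixel spectrum holds exactly 256 values and that they are laid out row by row; the function name is hypothetical.

```python
def reshape_spectrum(spectrum, side=16):
    """Reshape a flat pixel spectrum into a side x side grid (row-major).

    Assumes len(spectrum) == side * side; how the band count is matched
    to 256 values is not specified here.
    """
    if len(spectrum) != side * side:
        raise ValueError(
            "expected %d values, got %d" % (side * side, len(spectrum))
        )
    return [list(spectrum[r * side:(r + 1) * side]) for r in range(side)]


if __name__ == "__main__":
    flat = [float(i) for i in range(256)]  # made-up spectrum
    grid = reshape_spectrum(flat)
    print(len(grid), len(grid[0]))  # number of rows and columns
```

The 1D-CNN would consume the flat list directly, whereas the 2D-CNN treats the grid as a single-channel grey image.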

Figure 5. Accuracy and loss in the training dataset and validation dataset. (a,b) Illustration of the accuracy and loss by 1D-CNN and 2D-CNN. (c,d) Illustration of the accuracy and loss by LSTM and GRU. (e,f) Illustration of the accuracy and loss by 2D-CNN-LSTM and 2D-CNN-GRU. (g,h) Illustration of the accuracy and loss by 2D-CNN-BidLSTM and 2D-CNN-BidGRU.

Figure 5e-h compare the training and validation results of the hybrid structures 2D-CNN-LSTM, 2D-CNN-GRU, 2D-CNN-BidLSTM, and 2D-CNN-BidGRU. Figure 5e,f show that the first hybrid structure, with LSTM and GRU, also suffers from over-fitting, especially 2D-CNN-LSTM, but these models are still better than the models with only a recurrent neural network. Based on these results, the second hybrid neural network is reconstructed with bidirectional LSTM and GRU instead of LSTM and GRU; the new hybrid networks are called 2D-CNN-BidLSTM and 2D-CNN-BidGRU, respectively. Figure 5g,h show that the second hybrid deep neural networks avoid over-fitting while simultaneously improving the performance.

Figure 6. Confusion matrix of the testing dataset.

Figure 8. Original hyperspectral image and the grey image mapped using different models (white represents the background dataset; grey represents the healthy dataset; and black represents the diseased dataset).
The hyperspectral images are divided by pixels of different classes into a training dataset, a validation dataset, and a testing dataset to train the model.
2. Compare and improve the different deep neural networks for hyperspectral image classification. These neural networks include DCNN with two input data structures, DRNN with Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU), and an improved hybrid CRNN.
3. Take advantage of these assessment methods to determine the best model for classifying hyperspectral image pixels. Different SVM and deep neural network models are assessed and analysed on the training dataset, the validation dataset, and the testing dataset.
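The division of labelled pixels into training, validation, and testing datasets can be sketched as a per-class (stratified) split. The split fractions, the random seed, and the function name below are assumptions for illustration; the actual proportions and procedure of this study are not stated in this passage.

```python
import random


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split sample indices into train/validation/test lists per class.

    fractions are hypothetical; the remainder after the first two
    parts goes to the test list.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within each class
        n_train = int(len(indices) * fractions[0])
        n_val = int(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Made-up labels for 30 pixels, 10 per class.
    labels = ["background"] * 10 + ["healthy"] * 10 + ["diseased"] * 10
    train, val, test = stratified_split(labels)
    print(len(train), len(val), len(test))
```

Splitting within each class keeps the class proportions similar across the three datasets, which matters when one class is much larger than the others.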

Table 4. The total number of ROI pixels.

Table 5. Accuracy and loss of the best models, and training time (h).
1 The SVM algorithm has a different loss value, and it has no epochs.

Table 6. Evaluation of testing dataset for different models.