Rape Plant Disease Recognition Method of Multi-Feature Fusion Based on D-S Evidence Theory

In view of the low accuracy and uncertainty of traditional rape plant disease recognition methods that rely on a single feature, this paper puts forward a rape plant disease recognition method based on Dempster-Shafer (D-S) evidence theory and multi-feature fusion. Firstly, color moments and gray-level co-occurrence matrix features are extracted as two kinds of features from preprocessed rape plant images. Then, by calculating the Euclidean distance between the test samples and the training samples, the basic probability assignment function is constructed. Finally, the D-S combination rule of evidence is used to achieve fusion, and the final recognition results are given by using the variance. The method is applied to collected rape plant images for disease recognition, and the recognition rate reaches 97.09%. Compared with other methods, the experimental results show that the method is more effective and has lower computational complexity.


Introduction
Rape is one of the main oil crops in China and plays an important role in people's daily life and production. However, rape plants are affected by various diseases during their growth; black spot, bacterial black spot and downy mildew are the three most common. These diseases seriously reduce the yield of rape plants and bring huge losses to farmers' income. Traditional methods of recognizing the different diseases are not only time-consuming but also laborious. Computer vision technology [1][2][3] is real-time, objective and nondestructive, which makes it suitable for crop disease recognition. It can efficiently guide farmers to take corresponding prevention and control measures and thus improve crop yield. Therefore, intelligent recognition of rape plant diseases is very important.
Rape plant disease recognition can be divided into three steps: image preprocessing, feature extraction and final recognition of the disease. The most important part of image preprocessing is segmentation of the diseased area, whose accuracy directly affects the accuracy of feature extraction. At present, the features extracted from crop images mainly include color features, texture features and shape features [4]. Color features generally include color moments and color histograms. Texture features generally include the local binary pattern (LBP) and the gray-level co-occurrence matrix (GLCM). Shape features generally include area, perimeter, length-width ratio, and so on. Komi et al. [5] recognize diseased crops by combining color features with leaf spectrograms; however, the image acquisition facilities are expensive and the spectrograms are difficult to obtain. Tao et al. [6] fuse the color completed local binary pattern (CCLBP), the hue-saturation-value (HSV) histogram and border/interior pixel classification (BIC) for vegetable and fruit recognition, and obtain satisfactory results. However, the CCLBP feature has a high dimension, which slows down computation and hinders practical application. Because of the limitations of single-feature methods, multi-feature methods are widely used to describe image information. In the recognition step, the Support Vector Machine (SVM), the Back Propagation (BP) neural network and Dempster-Shafer (D-S) evidence theory are the most commonly used. Because of its simple principle and its comprehensive analysis of the effect of each piece of evidence on the final recognition result, D-S evidence theory [7] is very popular in related research. However, how to construct the basic probability assignment function that D-S evidence theory requires is a difficult problem.
In this paper, a rape plant disease recognition method based on D-S evidence theory is proposed. The basic probability assignment function is constructed by comparing the Euclidean distances between training samples and test samples. In addition, to overcome a shortcoming of the D-S evidence theory decision-making rules, this paper introduces the variance to comprehensively evaluate the fusion results, aiming to further improve the recognition rate.

Feature Extraction
Observation and analysis of a large number of diseased rape plant images showed that the diseased leaf areas of the three rape plant diseases differ clearly in color and texture. Thus, this paper extracts the color feature and the texture feature of rape plant images for further experimental analysis.

Color Feature
Color moments are a simple and effective way to describe the color information of an image. The color distribution can be described by its moments, and most of the color information is concentrated in the low-order moments. Therefore, the first-, second- and third-order moments are usually used to represent color features. The color moments are defined as follows:
$$M_{1,i} = \frac{1}{N}\sum_{j=1}^{N} q_{i,j} \tag{1}$$
$$M_{2,i} = \left(\frac{1}{N}\sum_{j=1}^{N}\left(q_{i,j} - M_{1,i}\right)^{2}\right)^{1/2} \tag{2}$$
$$M_{3,i} = \left(\frac{1}{N}\sum_{j=1}^{N}\left(q_{i,j} - M_{1,i}\right)^{3}\right)^{1/3} \tag{3}$$
where $N$ is the total number of pixels, $q_{i,j}$ is the value of the $j$-th pixel in the $i$-th color channel, and $M_{1,i}$, $M_{2,i}$, $M_{3,i}$ are the first-, second- and third-order moments, respectively. In view of the advantages of the Hue-Saturation-Intensity (HSI) color space in processing color images, this paper extracts the color moments in HSI color space. Before the features are extracted, each channel of the HSI color space is non-uniformly quantized, with the number of quantization levels set to eight [8]. Calculating the three color moments of each of the three HSI channels yields a nine-dimensional vector.
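A minimal sketch of Equations (1) to (3) in Python, assuming each HSI channel has already been converted and quantized and is passed in as a flat list of pixel values; the function names are hypothetical, and the HSI conversion and eight-level quantization [8] are not reproduced. The third moment is taken as a signed real cube root so that a negatively skewed channel does not produce a complex number.

```python
import math

def color_moments(channel):
    """First three color moments of one (already quantized) channel.

    channel: flat sequence of pixel values q_{i,j}, j = 1..N.
    Returns (M1, M2, M3): mean, standard deviation, and the signed cube
    root of the third central moment.
    """
    n = len(channel)
    m1 = sum(channel) / n
    m2 = math.sqrt(sum((q - m1) ** 2 for q in channel) / n)
    third = sum((q - m1) ** 3 for q in channel) / n
    # Real cube root that keeps the sign of a negative third moment.
    m3 = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return m1, m2, m3

def color_feature(h, s, i):
    """Concatenate the moments of the H, S and I channels into a 9-D vector."""
    vec = []
    for channel in (h, s, i):
        vec.extend(color_moments(channel))
    return vec
```

Calling color_feature on the three quantized channels of one image gives the nine-dimensional color vector described above.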

Texture Feature
Ojala et al. [9] propose a model called the uniform local binary pattern (ULBP). When the circular binary code of an LBP contains fewer than three transitions from 0 to 1 or from 1 to 0, the code is called a uniform pattern; all other codes are grouped into a single class called the mixed pattern. This greatly reduces the number of binary patterns while retaining most of the texture information. Wang et al. [10] and Sun et al. [11] propose a texture feature extraction algorithm that combines ULBP with the GLCM. On the one hand, this algorithm extracts richer texture information, which benefits the later processing steps; on the other hand, it greatly reduces the computational complexity. After the GLCM is obtained, four scalar statistics are selected to represent the texture feature: angular second moment, contrast, entropy and inverse difference moment. Concatenating these four values with the nine-dimensional color feature vector from Section 2.1.1 transforms each image into a 13-dimensional vector.
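The uniform-pattern test and the four GLCM statistics can be sketched as follows. This is an illustration under stated assumptions, not the ULBP-GLCM algorithm of [10] and [11]: the GLCM helper accepts any 2-D grid of integer labels (for example, an image whose pixels have already been replaced by ULBP codes mapped to 0..levels-1), uses a single hypothetical offset of one pixel to the right, and computes entropy with the natural logarithm. All function names are hypothetical.

```python
import math

def transitions(code, bits=8):
    """Count 0->1 and 1->0 jumps in the circular binary code of an LBP."""
    count = 0
    for k in range(bits):
        a = (code >> k) & 1
        b = (code >> ((k + 1) % bits)) & 1
        count += a != b
    return count

def is_uniform(code, bits=8):
    """A uniform pattern has fewer than three circular transitions."""
    return transitions(code, bits) < 3

def glcm(labels, levels, dx=1, dy=0):
    """Normalized GLCM of a 2-D grid of integer labels in 0..levels-1.

    Counts pairs (labels[r][c], labels[r+dy][c+dx]) for one offset
    (HYPOTHETICAL choice: one pixel to the right) and divides by the total.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(labels), len(labels[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[labels[r][c]][labels[r2][c2]] += 1
                total += 1
    if total == 0:
        raise ValueError("image too small for this offset")
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """ASM, contrast, entropy (natural log) and inverse difference moment."""
    asm = contrast = entropy = idm = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            asm += v * v
            contrast += (i - j) ** 2 * v
            if v > 0:
                entropy -= v * math.log(v)
            idm += v / (1 + (i - j) ** 2)
    return [asm, contrast, entropy, idm]
```

For 8 neighbors, 58 of the 256 codes are uniform, which is why the ULBP mapping shrinks the pattern set so much. Appending the four values returned by texture_features to the nine color moments gives the 13-dimensional vector described above.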

Definition of the Dempster-Shafer Evidence Theory
Dempster-Shafer evidence theory is an efficient method for processing uncertain, incomplete and vague information in data fusion. It fuses the Basic Probability Assignment (BPA) functions of two or more pieces of evidence into a new BPA that serves as the basis for the final decision, which yields more reliable results and a higher recognition rate.
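Suppose that $E_1$ and $E_2$ are two pieces of evidence, $m_1$ and $m_2$ are the BPA functions derived from them, and $A_i$ and $B_j$ are their focal elements. The D-S combination rule is shown in Equation (4):
$$m(A) = \begin{cases} \dfrac{1}{1-K}\displaystyle\sum_{A_i \cap B_j = A} m_1(A_i)\, m_2(B_j), & A \neq \varnothing \\ 0, & A = \varnothing \end{cases} \tag{4}$$
where $K = \sum_{A_i \cap B_j = \varnothing} m_1(A_i)\, m_2(B_j)$ reflects the degree of conflict between the pieces of evidence.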
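A minimal Python sketch of Equation (4), assuming each BPA is stored as a dictionary that maps focal elements (frozensets of category labels) to masses; the function name is hypothetical. It returns both the combined BPA and the conflict coefficient K, and refuses to combine totally conflicting evidence (K = 1), for which Equation (4) is undefined.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two BPAs with Dempster's rule, Equation (4).

    m1, m2: dicts mapping focal elements (frozensets) to masses.
    Returns (combined BPA, conflict coefficient K).
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass supporting the non-empty intersection of A_i and B_j.
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            # Mass that falls on the empty set accumulates into K.
            conflict += ma * mb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict (K = 1): Equation (4) is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict
```

When every focal element is a single disease category, as in the three-disease setting, the rule reduces to multiplying the two beliefs for each category and renormalizing by 1 - K.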

The Construction of BPA
The Euclidean distance reflects the similarity between different cases: the greater the distance, the less similar they are. For two $n$-dimensional vectors $x_1$ and $x_2$, the Euclidean distance between them is given by Equation (5):
$$d(x_1, x_2) = \sqrt{\sum_{k=1}^{n}\left(x_{1,k} - x_{2,k}\right)^{2}} \tag{5}$$
Based on this principle, the BPA can be constructed from the Euclidean distance. For image samples, a very large Euclidean distance between a training sample and a test sample means that they are unlikely to belong to the same category. To obtain the probability density of a sample belonging to each category, we adopt the following hypothesis: if the Euclidean distance between a training sample and a test sample is $d$, the probability density $p$ of the test sample is inversely proportional to the $\alpha$-th power of $d$, as shown in Equation (6):
$$p \propto \frac{1}{d^{\alpha}} \tag{6}$$
Equation (6) describes the relationship between the Euclidean distance and the probability density. In this paper, α is set to 2.
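Equations (5) and (6) can be illustrated with a few lines of Python. The helper names are hypothetical, and the proportionality constant in Equation (6) is omitted because only the ratios between categories matter when comparing them. With α = 2, a training sample twice as far away contributes one quarter of the density.

```python
import math

def euclidean(x1, x2):
    """Equation (5): Euclidean distance between two equal-length feature vectors."""
    if len(x1) != len(x2):
        raise ValueError("feature vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def relative_density(d, alpha=2.0):
    """Equation (6) up to a constant factor: density proportional to d**(-alpha)."""
    if d <= 0:
        raise ValueError("d must be positive; an exact match needs special handling")
    return d ** (-alpha)

# With alpha = 2, doubling the distance divides the density by four.
print(relative_density(2.0) / relative_density(1.0))  # 0.25
```

The explicit check for d = 0 matters in practice: a test vector identical to a training vector would otherwise make the density infinite.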
The concrete steps for constructing the basic probability assignment function are as follows. Suppose the training set contains $n_1$, $n_2$ and $n_3$ images of the three rape plant diseases, respectively. The vectors describing these images are denoted $x_{i,1}, x_{i,2}, \ldots, x_{i,n_i}$ ($i = 1, 2, 3$); they are all 13-dimensional vectors (Section 2.1.2). An image from the test set is denoted $x_0$, and $d_i$ represents the distance between this test sample and the training samples of the $i$-th disease, as defined in Step 1.
Step 1: First, all the Euclidean distances between the test sample and the training samples are calculated and denoted $d_{i,1}, d_{i,2}, \ldots, d_{i,n_i}$ ($i = 1, 2, 3$). The minimum of these values is then taken as the distance between the test sample and the $i$-th class of training samples, i.e., $d_i = \min(d_{i,1}, d_{i,2}, \ldots, d_{i,n_i})$. Because $d_i$ can be any real number in $[0, \infty)$, all the $d_i$ must be normalized according to Equation (7).
Step 2: After the $d_i$ are obtained, they are substituted into Equation (7) to obtain the probability density $p(i)$ that the test sample belongs to the $i$-th disease.
Step 3: Using Equation (8), the BPA function required by D-S evidence theory is constructed.
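Steps 1 to 3 can be sketched in Python as below. Equations (7) and (8) are not reproduced in this text, so the sketch assumes two stand-ins, labeled in the code: the $d_i$ are normalized by dividing by their sum, and the densities $p(i)$ are rescaled to sum to 1 to form the BPA over the three singleton categories. The function name, the small eps guard against a zero distance, and the dictionary layout are hypothetical; α = 2 follows the text.

```python
import math

def build_bpa(x0, training, alpha=2.0, eps=1e-12):
    """Sketch of Steps 1-3: a BPA over the disease classes for one test vector.

    x0: test feature vector.
    training: dict mapping class label -> list of training feature vectors.
    """
    # Step 1: distance to each class = minimum distance to its training samples.
    d = {c: min(math.dist(x0, x) for x in samples) for c, samples in training.items()}
    # ASSUMED form of Equation (7): normalize by the sum of all d_i.
    total = sum(d.values())
    d_norm = {c: v / total if total > 0 else 0.0 for c, v in d.items()}
    # Step 2: density inversely proportional to the alpha-th power of the distance.
    p = {c: 1.0 / (v + eps) ** alpha for c, v in d_norm.items()}
    # Step 3, ASSUMED form of Equation (8): rescale so the masses sum to 1.
    s = sum(p.values())
    return {c: v / s for c, v in p.items()}
```

Under these assumed forms the sum-normalization of the $d_i$ cancels in Step 3, because every $p(i)$ is scaled by the same factor; a different Equation (7) could change that behavior.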

Decision-Making Rules
Let $\mu_i$ ($i = 1, 2, 3$) represent the categories of rape plant diseases, and let $A_r$ be the target category. The belief of $A_r$, denoted $m(A_r)$, obtained through Equation (9), must satisfy the following rules:
• $m(A_r)$ is the largest value in the final BPA;
• The difference between $m(A_r)$ and the belief of any other category is larger than a threshold $\varepsilon_1$.
If the above rules are not all satisfied, the recognition result is identified as "uncertain".
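A small sketch of the two decision rules listed above, assuming the fused BPA is a dictionary over singleton categories and taking ε1 = 0.1, which matches the 0.1 threshold used in the Decision-Making Improvement subsection. The function name is hypothetical, and any further rule the original formulation may include is not checked here.

```python
def decide(bpa, eps1=0.1):
    """Apply the two decision rules to a fused BPA over singleton categories.

    Rule 1: the target is the category with the largest belief.
    Rule 2: its belief must exceed every other belief by more than eps1.
    Returns the target category, or "uncertain" if Rule 2 fails.
    """
    ranked = sorted(bpa.items(), key=lambda kv: kv[1], reverse=True)
    target, target_mass = ranked[0]
    # Checking the runner-up is enough: it is the closest competitor.
    if len(ranked) > 1 and target_mass - ranked[1][1] <= eps1:
        return "uncertain"
    return target
```

For example, fused beliefs of 0.40 and 0.45 for two categories differ by less than 0.1, so this check returns "uncertain".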

Decision-Making Improvement
By using D-S evidence theory, the color feature and the texture feature are combined to judge the final result. However, this introduces a new problem: if the fused belief of category A is 0.40 while that of category B is 0.45, their difference is lower than 0.1, which violates the second D-S decision-making rule. The final recognition result would then be judged as "uncertain", which inevitably lowers the final recognition rate. To solve this problem and improve the recognition rate, the decision-making rules need to be changed. A series of observations and experiments showed that, for the actual category of a test sample, the beliefs given by the color evidence, the texture evidence and the fused result vary only within a small range. If one piece of evidence strongly supports classifying a sample as A while another gives A little support, the belief in A does not meet expectations. This can be interpreted as one piece of evidence producing a wrong belief, or as the corresponding feature not describing the sample's information well. With ideal features, this phenomenon should not appear. It indicates that the belief each piece of evidence assigns to the actual category is relatively stable, which is also consistent with human cognitive experience.
The variance [12] reflects how widely data vary. Therefore, this paper introduces the variance to improve the D-S evidence theory decision-making rules. Suppose $X_1 = [m_1(\mu_1), m_2(\mu_1), m(\mu_1)]$ and $X_2 = [m_1(\mu_2), m_2(\mu_2), m(\mu_2)]$, where $m_1$ is the BPA function of the color feature, $m_2$ is the BPA function of the texture feature, $m$ is the fused D-S BPA function, and $\mu_1$, $\mu_2$ correspond to two different diseases. If $m(\mu_1)$ and $m(\mu_2)$ violate the D-S decision-making rules because their difference is less than 0.1, the variances of the sequences $X_1$ and $X_2$ are calculated. After the variance (denoted var) is obtained, a new BPA function is generated using Equation (9), where λ is the weight that adjusts the proportion between the variance and the original fused belief. This paper selects λ = 0.5 (different types of data may require different values). If λ is too small, the effect of the variance is not obvious; if λ is too large, the variance may interfere with the final results. Table 1 shows that the method is effective.
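The variance step can be sketched as follows. Equation (9) is not reproduced in this text, so the score below, (1 - λ)·m(μk) - λ·var(Xk), is a hypothetical stand-in that only has the described behavior: λ trades the fused belief off against the variance, and a category whose beliefs are more stable across color, texture and fusion is favored. The use of the population variance, the function name and the example beliefs are also assumptions, not values from the paper.

```python
from statistics import pvariance

def variance_rescore(m1, m2, m, candidates, lam=0.5):
    """Re-score categories whose fused beliefs are too close to separate.

    m1, m2, m: dicts category -> belief from color, texture and D-S fusion.
    candidates: the categories that violate the margin rule (e.g. mu1, mu2).
    Returns (best category, dict of scores).
    """
    scores = {}
    for c in candidates:
        x = [m1[c], m2[c], m[c]]  # X_k = [m1(mu_k), m2(mu_k), m(mu_k)]
        var = pvariance(x)        # ASSUMPTION: population variance
        # HYPOTHETICAL stand-in for Equation (9): reward the fused belief,
        # penalize instability across the three sources.
        scores[c] = (1 - lam) * m[c] - lam * var
    best = max(scores, key=scores.get)
    return best, scores

# Illustrative (made-up) beliefs over three diseases.
m1 = {"mu1": 0.60, "mu2": 0.30, "mu3": 0.10}  # color evidence
m2 = {"mu1": 0.20, "mu2": 0.45, "mu3": 0.35}  # texture evidence
# Dempster's rule for singleton focal elements: multiply and renormalize.
agree = {c: m1[c] * m2[c] for c in m1}
total = sum(agree.values())
m = {c: v / total for c, v in agree.items()}
best, scores = variance_rescore(m1, m2, m, ["mu1", "mu2"])
print(best)  # mu2
```

With these numbers the fused beliefs of mu1 and mu2 differ by about 0.05, so the original rules would return "uncertain"; the rescoring still produces a decision.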

Experiments and Analysis
Using the method above, experiments were conducted on a rape plant image database containing three kinds of rape plant diseases, with 200 training images and 103 test images. The images were collected under natural lighting in the experimental fields of the Chinese Academy of Agricultural Sciences (CAAS), and the final recognition rate is shown in Table 2.
Table 1 shows that black spot and bacterial black spot are occasionally misrecognized, because samples of these two diseases are similar to a certain extent. If an image of black spot is not clear enough, the extraction of its color and texture features may be affected, and it may be recognized as bacterial black spot. Further research is needed to deal with such problems. Overall, the proposed algorithm performs well: the average recognition rate reaches 97.09%, which is higher than those of the other algorithms in Table 3. However, the recognition time increases to a certain degree because of the additional operations in the D-S decision-making step. Nonetheless, because the extracted features are low-dimensional, the overall time cost increases only slightly while the recognition rate improves considerably, which is an acceptable trade-off.

Conclusions
This paper selects two relatively independent features, the color feature and the texture feature, for multi-feature research on rape plant disease recognition. D-S evidence theory shows its feasibility and superiority in multi-feature fusion. By introducing the variance, the method overcomes the inability of D-S evidence theory to classify samples whose fused results do not conform to the decision-making rules. However, the experiments in this paper are based on a small sample set, and the performance of the method on larger datasets remains to be tested. How to reduce the time complexity in practical applications is also worth considering.