Crops Fine Classification in Airborne Hyperspectral Imagery Based on Multi-Feature Fusion and Deep Learning

Abstract: Hyperspectral imagery has been widely used in precision agriculture due to its rich spectral characteristics. With the rapid development of remote sensing technology, airborne hyperspectral imagery offers detailed spatial information and temporal flexibility, which opens a new way to accurate agricultural monitoring. To extract crop types from airborne hyperspectral images, we propose a fine classification method based on multi-feature fusion and deep learning. In this research, morphological profiles, GLCM texture and endmember abundance features are leveraged to exploit the spatial information of the hyperspectral imagery. The multiple spatial features are then fused with the original spectral information, and classification results are generated by a deep neural network with a conditional random field (DNN+CRF) model. Specifically, the deep neural network (DNN) is a deep recognition model that can extract deep features and mine the latent information in the data. As a discriminative model, the conditional random field (CRF) considers both spatial and contextual information to reduce misclassification noise while preserving object boundaries. Moreover, three multi-feature fusion approaches, namely feature stacking, decision fusion and probability fusion, are considered. In the experiments, two airborne hyperspectral remote sensing datasets (the Honghu dataset and the Xiong’an dataset) are used. The experimental results show that the proposed method achieves satisfactory classification performance: salt-and-pepper noise is reduced and the boundaries of ground objects are preserved.


Introduction
Accurate and timely information about agricultural resources is extremely important for agricultural development. Obtaining the area and spatial distribution of crops is an important way to acquire agricultural information [1,2]. Traditional methods obtain crop classification results through field measurement, investigation and statistics, which are time-consuming, labor-intensive and costly [3,4]. As remote sensing technology has advanced rapidly, the resolution and timeliness of remote sensing images have improved, and hyperspectral remote sensing data have been widely used [5,6]. In particular, hyperspectral data play an important role in agricultural surveys [7][8][9][10], where they have been used for crop condition monitoring, yield estimation, pest monitoring and so on. In agricultural surveys, fine classification of hyperspectral images provides information on crop distribution [11][12][13]. Fine classification of crops requires images with high spatial and spectral resolution [14]. In recent years, airborne […] spectral represents different features. Moreover, a deep neural network model is used. The DNN model has multiple hidden layers, which are fully connected. In the experiments, the DNN model is used to learn the latent features of the airborne hyperspectral imagery, mining its internal information and producing a probability image. A conditional random field is then used as a classifier to remove noise and preserve the boundaries of ground objects. As a discriminative model, the CRF directly models the posterior probability of the label field given the observation field. The probability image is taken as the unary potential function of the CRF model, which reduces salt-and-pepper noise.
The remainder of this article is organized as follows. Section 2 explains the spatial features, fusion methods and classification model. Section 3 introduces the datasets and analyzes the experimental results. Section 4 summarizes the article. Figure 1 shows the flowchart of the proposed method for the fine classification of crops in airborne hyperspectral remote sensing images using multi-feature fusion and deep learning. The dimensionality of the original hyperspectral image is reduced by principal component analysis (PCA), and the first four principal component bands are selected as the base images. Morphological, texture and endmember abundance features are extracted from the base images to mine the spatial information. Subsequently, the DNN-CRF is employed as the classification model to mine latent information and obtain the classification results.

Materials and Methods
Remote Sens. 2021, 13, 3 of 17

Multiple Feature Extraction
Hyperspectral images contain abundant information, and different features express different details of the image. Fusing multiple features compensates for the limited information carried by any single feature and can therefore improve classification accuracy.

Texture Features
Hyperspectral remote sensing images contain rich texture information. Texture is an intrinsic property common to all object surfaces and carries important information about the spatial distribution of objects and their neighborhood relations [28]. The Gray Level Co-Occurrence Matrix (GLCM) is often used to extract texture characteristics [29]. By counting how often pairs of gray levels co-occur in two pixels separated by a given distance in a given direction, the GLCM reflects comprehensive information about the direction, spacing, amplitude and rate of gray-level change in the image [30].
Suppose $f(x, y)$ is a two-dimensional digital image of size $M \times N$ with $N_g$ gray levels, where $\#\{\cdot\}$ denotes the number of elements in a set and $P$ is an $N_g \times N_g$ matrix. If the distance between $(x_1, y_1)$ and $(x_2, y_2)$ is $d$ and the angle is $\theta$ ($0^\circ$, $45^\circ$, $90^\circ$ or $135^\circ$), the GLCM for a given distance and angle is:

$$P(i, j, d, \theta) = \#\{((x_1, y_1), (x_2, y_2)) \in M \times N \mid f(x_1, y_1) = i,\ f(x_2, y_2) = j,\ |(x_1, y_1) - (x_2, y_2)| = d,\ \angle((x_1, y_1), (x_2, y_2)) = \theta\}$$

In this method, we use six measures, namely mean, homogeneity, contrast, dissimilarity, entropy and angular second moment, to describe the textural information of the image. The mean represents the regularity of the image gray values, and homogeneity represents the uniformity of the local gray levels [30]. Contrast reflects the sharpness and depth of the texture, and dissimilarity measures the degree of difference between gray levels. Entropy expresses the complexity or unevenness of the texture, and the angular second moment indicates the uniformity of the local gray-level distribution and the coarseness of the texture.
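The GLCM definition and the six measures above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the image has already been quantized to `levels` integer gray levels, counts each pixel pair in both orders (a symmetric GLCM), and uses the common Haralick-style forms of the measures; the helper names `glcm`, `glcm_measures` and `directional_average` and the angle-to-offset convention are hypothetical.

```python
import numpy as np

# Hypothetical offset table: (row, col) step for d = 1 at each GLCM angle.
# Assumption: 0 deg points right and 90 deg points up (decreasing row index).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(img, d, theta, levels):
    """Normalized symmetric GLCM P(i, j, d, theta) of an integer-quantized image."""
    dr, dc = OFFSETS[theta][0] * d, OFFSETS[theta][1] * d
    rows, cols = img.shape
    P = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[img[r, c], img[r2, c2]] += 1
    P = P + P.T  # count each pixel pair in both orders (symmetric GLCM)
    return P / P.sum()


def glcm_measures(P):
    """The six texture measures named in the text, computed from a normalized GLCM."""
    i, j = np.indices(P.shape)
    nz = P > 0  # skip empty cells to avoid log(0) in the entropy
    return {
        "mean": float(np.sum(i * P)),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "contrast": float(np.sum((i - j) ** 2 * P)),
        "dissimilarity": float(np.sum(np.abs(i - j) * P)),
        "entropy": float(-np.sum(P[nz] * np.log(P[nz]))),
        "second_moment": float(np.sum(P ** 2)),
    }


def directional_average(img, d=1, levels=8):
    """Average each measure over the four angles 0, 45, 90 and 135 degrees."""
    per_angle = [glcm_measures(glcm(img, d, t, levels)) for t in (0, 45, 90, 135)]
    return {k: sum(m[k] for m in per_angle) / 4 for k in per_angle[0]}
```

Applied to each 7 × 7 window of a quantized base image, `directional_average` would yield one six-dimensional texture vector per pixel; the number of quantization levels is an assumption, since the text does not state it.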

Endmember Abundance Features
Owing to the mixing effect of the sensor and atmospheric transmission, hyperspectral imagery contains many mixed pixels. To reduce the impact of mixed pixels on the classification of hyperspectral images, endmember abundance features are extracted [31]. An endmember is a characteristic ground object with a relatively fixed spectrum. The Sequential Maximum Angle Convex Cone (SMACC) method is based on the convex cone model and uses constraint conditions to identify the endmember spectra of the image [32]. First, the convex cone is defined by the extreme point (pole) and the first endmember spectrum. Then, the next endmember spectrum is generated by an oblique projection subject to the constraint conditions. Cones are added in this way, each producing a new endmember spectrum, until the specified number of endmembers is reached. In this paper, the SMACC method is used to extract endmember spectra from the image. The SMACC model is:

$$H(c, i) = \sum_{k=1}^{N} R(c, k)\, A(k, j)$$

where $H$ is the endmember spectrum; $c$ and $i$ are the band and pixel indices, respectively; $k$ and $j$ are indices running from 1 to the largest endmember $N$; $R$ is the matrix containing the endmember spectra as columns; and $A$ is the matrix of the abundance of endmember $j$ in endmember $k$ for each pixel.
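The linear model above expresses each pixel spectrum as endmember spectra weighted by non-negative abundances. The sketch below shows only that abundance step, under the assumption that the endmember matrix `R` is already known; it is not SMACC itself (which also selects the endmembers sequentially), and it swaps in a plain non-negative least-squares inversion solved by projected gradient descent. The function name `nonneg_abundances` is hypothetical.

```python
import numpy as np


def nonneg_abundances(R, H, n_iter=2000):
    """Estimate abundances A >= 0 that minimize ||R A - H||_F.

    R: (bands, N) matrix with one endmember spectrum per column.
    H: (bands, pixels) matrix with one pixel spectrum per column.
    Returns A with shape (N, pixels), using projected gradient descent.
    """
    R = np.asarray(R, dtype=float)
    H = np.asarray(H, dtype=float)
    step = 1.0 / np.linalg.norm(R, 2) ** 2  # 1 / Lipschitz constant of the gradient
    A = np.zeros((R.shape[1], H.shape[1]))
    for _ in range(n_iter):
        grad = R.T @ (R @ A - H)               # gradient of 0.5 * ||R A - H||^2
        A = np.maximum(0.0, A - step * grad)   # project back onto A >= 0
    return A
```

Reshaping an abundance matrix of shape `(N, rows * cols)` back to `(N, rows, cols)` gives one abundance image per endmember, which is the form in which such features would be stacked with the spectra.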

Morphological Profiles
Morphological profiles are based on mathematical morphology and are used to mine the shape of target objects [33]. The basic morphological operations include erosion, dilation, opening and closing [34]. Opening and closing are combinations of erosion and dilation. Opening applies dilation to an eroded image, which removes small bright structures from the image. Conversely, closing applies erosion to a dilated image, which removes small dark structures.
Opening and closing by reconstruction preserve the shape and structure of objects while removing fine noise, and these operators have proven effective for exploiting spatial information in hyperspectral image classification. Let $\gamma_{SE}(I)$ be the morphological opening of image $I$ with structuring element SE (an SE has properties such as size and shape), and $\phi_{SE}(I)$ the corresponding morphological closing. Applying a series of SEs of increasing size defines the MPs:

$$MP_{\gamma}(I) = \{\gamma_{\lambda}(I) \mid \lambda \in \{0, \lambda_1, \ldots, \lambda_n\}\}, \qquad MP_{\phi}(I) = \{\phi_{\lambda}(I) \mid \lambda \in \{0, \lambda_1, \ldots, \lambda_n\}\}$$

with $\gamma_0(I) = \phi_0(I) = I$, where $\lambda$ is the radius of the commonly used disk-shaped SE and $\lambda_1 < \cdots < \lambda_n$ are the radii considered. MPs are generated from a grayscale image by opening/closing by reconstruction, and the set of SEs of gradually increasing size captures the multi-scale information of the image.
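A dependency-free sketch of the profile construction follows: flat grayscale erosion/dilation with a disk SE, geodesic reconstruction by iterated unit dilation (or erosion) clipped to the input image, and the stacked profile. It is an illustrative NumPy implementation under stated assumptions (8-connected reconstruction, default radii taken from the experimental settings), not the authors' code; all helper names are hypothetical. Libraries such as scikit-image provide optimized equivalents.

```python
import numpy as np


def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def _filter(img, se, reduce_fn, pad_value):
    """Flat grey erosion (np.min) or dilation (np.max) with a symmetric SE."""
    img = np.asarray(img, dtype=float)
    r = se.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    rows, cols = img.shape
    shifted = [padded[dy:dy + rows, dx:dx + cols] for dy, dx in zip(*np.nonzero(se))]
    return reduce_fn(np.stack(shifted), axis=0)


def erode(img, se):
    return _filter(img, se, np.min, np.inf)     # border never lowers the minimum


def dilate(img, se):
    return _filter(img, se, np.max, -np.inf)    # border never raises the maximum


def reconstruct(marker, mask, by_dilation=True):
    """Geodesic reconstruction of `marker` under (dilation) or above (erosion) `mask`."""
    step = np.ones((3, 3), dtype=bool)          # assumption: 8-connected unit step
    while True:
        if by_dilation:
            nxt = np.minimum(dilate(marker, step), mask)
        else:
            nxt = np.maximum(erode(marker, step), mask)
        if np.array_equal(nxt, marker):
            return marker
        marker = nxt


def morphological_profile(img, radii=(1, 3, 5, 7)):
    """Stack [closings (largest radius first), I, openings] by reconstruction."""
    img = np.asarray(img, dtype=float)
    openings = [reconstruct(erode(img, disk(r)), img, True) for r in radii]
    closings = [reconstruct(dilate(img, disk(r)), img, False) for r in radii]
    return np.stack(closings[::-1] + [img] + openings)
```

With the four radii used in the experiments, each base image yields nine profile layers (four closings, the image itself and four openings).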

Decision Fusion
Decision fusion is often used to fuse multiple features in image classification [35]. Depending on their mathematical foundations, decision fusion methods are roughly divided into four types: methods based on evidence theory, methods based on probability, methods based on fuzzy logic, and voting and election strategies. The basic idea of the voting strategy is that each voter evaluates and ranks the candidates, and the votes of all voters are then counted; the candidate with the largest number of votes wins.
Decision fusion is a data-reduction process that maps multiple inputs to a smaller number of outputs [36]. First, each of the three features extracted from the original image is fused with the spectral information, and a classification result is obtained for each feature. Then, these per-feature classification results are combined by decision fusion: each pixel is assigned the most frequently occurring class among the classifiers' outputs. In this way, a single classification image is produced from the classification results of multiple features. The number of votes for candidate class $m$ and the winning class are:

$$A_m = \sum_{i=1}^{n} \delta(c_i, m), \qquad \hat{m} = \arg\max_{m} A_m$$

where $A_m$ is the number of votes calculated for candidate $m$, $n$ is the number of features (classifiers), $c_i$ is the label predicted by the $i$-th classifier, and $\delta(c_i, m)$ equals 1 if $c_i = m$ and 0 otherwise. The candidate $\hat{m}$ with the largest number of votes after all $n$ classifiers have voted is the winner and is taken as the final label.
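The vote count $A_m$ and the arg max above translate directly into array operations. The sketch below is a minimal illustration with a hypothetical function name; breaking ties in favor of the smallest class index is an assumption, since the text does not say how ties are resolved.

```python
import numpy as np


def majority_vote(label_maps, n_classes):
    """Fuse n per-feature label maps by plurality voting.

    label_maps: (n, H, W) integer class labels in [0, n_classes).
    Returns the fused (H, W) label map; ties go to the smallest class index.
    """
    label_maps = np.asarray(label_maps)
    # votes[m] holds A_m, the number of classifiers voting for class m at each pixel
    votes = np.stack([(label_maps == m).sum(axis=0) for m in range(n_classes)])
    return votes.argmax(axis=0)
```

For three classifiers that label one pixel as (0, 0, 1) and another as (1, 2, 2), the fused labels are 0 and 2.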

Probability Fusion
Probability fusion is based on the probability outputs of the classifier: the class-probability outputs obtained with the different features are calculated and then fused. The main steps are as follows. First, the classification probability image of each spatial feature and of the spectral feature is obtained from the classifier. Then, the probability images of the multiple features are fused into a single probability image, from which the final classification result is obtained.
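The text does not specify the rule used to combine the probability images, so the sketch below assumes a (optionally weighted) arithmetic mean over features followed by a per-pixel arg max. It is illustrative only; the function name and the weighting option are hypothetical.

```python
import numpy as np


def probability_fusion(prob_maps, weights=None):
    """Fuse per-feature class-probability maps and take the most probable class.

    prob_maps: (n, H, W, C) probabilities from n feature-specific classifiers.
    weights: optional (n,) non-negative weights; equal weights by default.
    Returns (fused probabilities of shape (H, W, C), label map of shape (H, W)).
    """
    p = np.asarray(prob_maps, dtype=float)
    w = np.ones(p.shape[0]) if weights is None else np.asarray(weights, dtype=float)
    fused = np.tensordot(w / w.sum(), p, axes=1)  # weighted mean over the feature axis
    return fused, fused.argmax(axis=-1)
```

Unlike majority voting, averaging keeps the classifiers' confidence: outputs of (0.6, 0.4) and (0.1, 0.9) for one pixel tie under voting but fuse to (0.35, 0.65), selecting the second class.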

Stacking Fusion
Stacking fusion performs classification on a combined feature vector. Its steps are as follows: first, the extracted spatial features are combined with the spectral information to form a new feature vector, which is used as the input to the classifier. The classification map is therefore the result of fusing spatial and spectral features before classification. Stacking fusion is represented as:

$$X = [X_{spec}, X_{spat}]$$

where $X_{spec}$ is the spectral feature and $X_{spat}$ is the spatial feature derived from the extended morphological profiles, GLCM texture and endmember abundance features. The fused feature is then expressed as:

$$\gamma = \phi X$$

where $\gamma$ is the fused feature and $\phi$ is the linear mapping matrix applied to the extracted features.
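The concatenation $X = [X_{spec}, X_{spat}]$ can be sketched as a per-pixel reshape and column stack. The optional z-score standardization is an assumption added here because spectral, texture, profile and abundance features have very different ranges; the text does not state whether the authors normalize. The function name is hypothetical.

```python
import numpy as np


def stack_features(x_spec, *x_spat, standardize=True):
    """Concatenate per-pixel spectral and spatial features: X = [X_spec, X_spat].

    Each input has shape (H, W, d_k); the output has shape (H * W, sum of d_k).
    """
    blocks = [np.asarray(x, dtype=float) for x in (x_spec, *x_spat)]
    h, w = blocks[0].shape[:2]
    X = np.concatenate([b.reshape(h * w, -1) for b in blocks], axis=1)
    if standardize:  # assumption: z-score each column so feature scales are comparable
        std = X.std(axis=0)
        X = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    return X
```

Each row of the result is one pixel's stacked feature vector, ready to be fed to the classifier.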

Deep Neural Networks
Deep Neural Networks (DNNs) have strong learning ability and have often been used for image classification. Here, the DNN is used as the classification model to learn the latent features of the images. A DNN consists of an input layer, several hidden layers and an output layer. Each hidden layer applies a linear transformation to its input, and the output is obtained through an activation function.
Training a deep neural network involves forward propagation and backpropagation. The forward propagation algorithm applies a series of linear operations and activation operations to the input vector using multiple weight coefficient matrices and bias vectors. The backpropagation algorithm minimizes the selected loss function by iteratively updating these weight matrices and bias vectors. By constructing multiple hidden layers of connected neurons, the DNN mines deep features of high-dimensional data. The structure of the DNN is shown in Figure 2.
The forward propagation algorithm uses several weight coefficient matrices $W$ and bias vectors $b$. After the data are input, the output of each layer is calculated from the output of the previous layer. The output is not limited to a single neuron; the output layer can have multiple neurons. The forward propagation formula is:

$$a^l = \sigma(z^l) = \sigma(W^l a^{l-1} + b^l)$$

where $l$ is the layer index, $W^l$ is the weight matrix of hidden or output layer $l$, $b^l$ is the bias (offset) vector, $\sigma$ is the activation function, and the final output is $a^L$ of the last layer $L$. Backpropagation is the core of deep learning. A loss function is defined to measure the gap between the probability output of the model and the true label of the sample; here, cross entropy is selected as the loss function. The backpropagation algorithm runs in the opposite direction to forward propagation: it proceeds backwards from layer $L$ to the first layer, revising $W$ and $b$ through repeated iterations, and the final $W$ and $b$ are the parameters used for classification.
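The forward pass, the cross-entropy loss and one backpropagation step from layer $L$ back to layer 1 can be sketched in NumPy as below. This is a minimal fully connected network under stated assumptions: ReLU hidden activations, a softmax output, He initialization and plain full-batch gradient descent (the text names none of these), and no dropout. The code uses row vectors (samples × features), so each weight matrix is the transpose of $W^l$ in the equation. All names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_params(sizes, rng):
    """He-initialized (W, b) pairs for consecutive layer sizes."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x):
    """Return all activations [a^0, ..., a^L]: ReLU hidden layers, softmax output."""
    acts = [x]
    for l, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(softmax(z) if l == len(params) - 1 else relu(z))
    return acts


def cross_entropy(probs, labels):
    """Mean negative log-probability of the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12)))


def backward(params, acts, labels, lr):
    """One gradient-descent step on the mean cross-entropy, from layer L back to 1."""
    n = len(labels)
    delta = acts[-1].copy()
    delta[np.arange(n), labels] -= 1.0   # dJ/dz^L for softmax combined with cross-entropy
    delta /= n
    updated = []
    for l in range(len(params) - 1, -1, -1):
        W, b = params[l]
        grad_W, grad_b = acts[l].T @ delta, delta.sum(axis=0)
        if l > 0:  # propagate through W (old values) and the ReLU of the previous layer
            delta = (delta @ W.T) * (acts[l] > 0)
        updated.append((W - lr * grad_W, b - lr * grad_b))
    return updated[::-1]
```

A network with five hidden layers of 29 neurons, as in the experimental settings, would be created with `init_params([n_features] + [29] * 5 + [n_classes], rng)`; the learning rate and iteration count in any quick test are arbitrary.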

Conditional Random Field
A Conditional Random Field (CRF) is a class of statistical modeling methods often applied in pattern recognition and machine learning for structured prediction. Whereas an ordinary classifier predicts a label for a single sample without considering "neighboring" samples, a CRF can take context into account. To do so, the prediction is modeled as a graphical model that encodes dependencies between the predictions; the kind of graph used depends on the application.
As a discriminative model, the CRF is extensively used for image classification and target labeling. The CRF uses a unified probabilistic framework to model the local neighborhood interactions between random variables. It directly models the posterior probability of the labels through the corresponding Gibbs energy, and the final label image is the one that maximizes the posterior probability under the Bayesian maximum a posteriori (MAP) rule. Given the observation $y$, the CRF directly models the posterior distribution of the label $x$ as:

$$P(x|y) = \frac{1}{Z(y)} \exp\{-E(x|y)\}$$

where $Z(y)$ is the normalizing partition function and $E(x|y)$ is the Gibbs energy, so that maximizing $P(x|y)$ is equivalent to minimizing $E(x|y)$.
The unary potential function models the relationship between the label and the observed image data: it measures the cost of assigning a specific class label to a single pixel based on its feature vector. The binary (pairwise) potential function models the spatial contextual information between a pixel and its neighborhood by considering both the label field and the observation field. This paper uses the probability output of the DNN classifier to define the unary potential function of the CRF model.
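To make the role of the two potentials concrete, the sketch below assumes a Gibbs energy $E(x|y) = \sum_i \psi_i(x_i, y) + \lambda \sum_i \sum_{j \in N_i} \psi_{ij}(x_i, x_j, y)$ with the unary term $\psi_i = -\ln p(x_i \mid y)$ taken from the DNN probability image, a simple Potts pairwise term $\psi_{ij} = [x_i \neq x_j]$ over a 4-connected neighborhood, and iterated conditional modes (ICM) as an approximate MAP solver. The Potts term and ICM are swapped-in assumptions for illustration; the paper's exact pairwise function (including its parameter $\theta$) and inference method are not reproduced here. Function names are hypothetical.

```python
import numpy as np

NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # assumption: 4-connected neighborhood N_i


def crf_energy(labels, unary, lam):
    """E = sum_i unary_i(x_i) + lam * sum_i sum_{j in N_i} [x_i != x_j] (Potts)."""
    h, w = labels.shape
    e = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    # the double sum over i and j in N_i visits every neighboring pair twice
    e += lam * 2 * ((labels[1:, :] != labels[:-1, :]).sum()
                    + (labels[:, 1:] != labels[:, :-1]).sum())
    return float(e)


def icm(prob, lam, n_iter=10, eps=1e-12):
    """Approximate MAP labeling by iterated conditional modes.

    prob: (H, W, C) per-pixel class probabilities, e.g. the DNN softmax output.
    """
    unary = -np.log(prob + eps)            # psi_i(x_i, y) = -ln p(x_i | y)
    labels = unary.argmin(axis=-1)         # start from the DNN's own decision
    h, w, c = prob.shape
    for _ in range(n_iter):
        changed = False
        for r in range(h):
            for s in range(w):
                cost = unary[r, s].copy()
                for dr, ds in NEIGHBORS:
                    rr, ss = r + dr, s + ds
                    if 0 <= rr < h and 0 <= ss < w:
                        # factor 2 matches the double-counted pairs in crf_energy
                        cost += lam * 2 * (np.arange(c) != labels[rr, ss])
                new = int(cost.argmin())
                if new != labels[r, s]:
                    labels[r, s] = new
                    changed = True
        if not changed:
            break
    return labels
```

Each ICM update can only lower the energy, which is how an isolated pixel that weakly disagrees with all its neighbors (a salt-and-pepper error) gets relabeled while strongly supported regions and their boundaries are kept.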
The Gibbs energy of the CRF model is defined as:

$$E(x|y) = \sum_{i \in V} \psi_i(x_i, y) + \lambda \sum_{i \in V} \sum_{j \in N_i} \psi_{ij}(x_i, x_j, y)$$

where $V$ is the set of all $N$ pixels of the observed data and $N_i$ is the neighborhood of pixel $i$. $\psi_i(x_i, y)$ and $\psi_{ij}(x_i, x_j, y)$ are the unary and binary potential functions, respectively, defined on the local area of pixel $i$. The adjustment parameter $\lambda$ of the binary potential function is a non-negative constant that balances the influence of the unary and binary potential functions.
Table 1 shows the class types and corresponding numbers of pixels of the Honghu dataset.

Xiong'an Dataset
Xiong'an New District is located in Baoding City, Hebei Province, China (Figure 5). The planning scope covers Xiongxian, Rongcheng, Anxin and some surrounding areas in Hebei Province. The Xiong'an New Area lies in the mid-latitude zone and has a warm temperate monsoon continental climate. In October 2017, the Institute of Remote Sensing and Digital Earth of the Chinese Academy of Sciences and the Shanghai Institute of Technical Physics of the Chinese Academy of Sciences conducted an airborne hyperspectral remote sensing data acquisition experiment in Xiong'an New District, Hebei Province (Xiong'an dataset, Figure 6). The hyperspectral image of Horseshoe Bay Village in Xiong'an New District was collected by the full-spectrum multimode imaging spectrometer of the high-resolution special aviation system, with a spatial resolution of 0.5 m, a size of 3750 × 1580 pixels and 250 bands from 400 to 1000 nm. Table 2 shows the class types and corresponding numbers of pixels of the Xiong'an dataset.
To verify the effectiveness of the method, we compared the following seven sets of experiments: the original spectral features, GLCM texture, morphological profiles, endmember abundance features, decision fusion, probability fusion and stacking fusion.
Airborne hyperspectral images have rich spectral characteristics. For dimensionality reduction, we use PCA to reduce the airborne hyperspectral image to its first eight principal component bands. Using the PCA output as the base data source, texture features are extracted with the GLCM, with the window size set to 7 × 7 and the directions set to 0°, 45°, 90° and 135°; the average over the four directions is used as the GLCM texture. Endmember spectra are extracted with the RMS error tolerance set to 0, from which the abundance images and spectra are obtained. The morphological profiles are obtained by opening and closing by reconstruction, with the radius of the disk-shaped structuring element set to 1, 3, 5 and 7.
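The PCA step that produces the base images can be sketched as follows, assuming a standard covariance-based PCA on mean-centred bands (the text does not state whether the bands are also scaled). Working with the band covariance matrix keeps memory proportional to the number of bands rather than the number of pixels, which matters for a 3750 × 1580 × 250 cube. The function name is hypothetical.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project an (H, W, B) hyperspectral cube onto its leading principal components.

    Returns (scores of shape (H, W, n_components), component variances),
    ordered by decreasing variance.
    """
    h, w, b = cube.shape
    X = cube.reshape(h * w, b).astype(float)     # copy, so the input is not modified
    X -= X.mean(axis=0)                          # centre each band
    cov = X.T @ X / (X.shape[0] - 1)             # (B, B) band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = X @ eigvecs[:, order]               # principal-component scores per pixel
    return scores.reshape(h, w, n_components), eigvals[order]
```

With the settings above, `pca_reduce(cube, 8)` would give the eight base images from which the GLCM, morphological and abundance features are computed.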
The deep neural network has five hidden layers with 29 neurons each. The learning rate is set to 0.00001, the number of iterations to 1800 and the minibatch size to 27. To avoid overfitting, dropout is used to randomly delete 30% of the neural nodes, which reduces network complexity and improves the generalization ability of the model. Based on extensive experiments, the CRF parameters λ and θ are set to 1.6 and 3.0, respectively. The accuracy of each crop, the overall accuracy (OA) and the Kappa coefficient (Kappa) are used to evaluate the classification results. The Kappa coefficient is a measure of classification accuracy, calculated as:

$$\kappa = \frac{P_o - P_e}{1 - P_e}$$

where $P_o$ is the number of correctly classified samples summed over all classes divided by the total number of samples, that is, the overall classification accuracy. Suppose that the numbers of real samples of each class are $A_1, A_2, \ldots, A_C$, the numbers of predicted samples of each class are $B_1, B_2, \ldots, B_C$, and the total number of samples is $n$. Then:

$$P_e = \frac{A_1 B_1 + A_2 B_2 + \cdots + A_C B_C}{n^2}$$
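The OA and Kappa formulas above can be computed from a confusion matrix, whose row sums are the $A_c$ and column sums the $B_c$. The sketch below is a straightforward NumPy version with a hypothetical function name; it assumes integer labels in `[0, n_classes)` and that $P_e < 1$ (Kappa is undefined when all samples fall into a single class in both maps).

```python
import numpy as np


def oa_kappa(y_true, y_pred, n_classes):
    """Overall accuracy P_o and Cohen's kappa from reference and predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # rows: reference class, columns: predicted class
    n = cm.sum()
    p_o = np.trace(cm) / n               # correctly classified samples / all samples
    a = cm.sum(axis=1)                   # A_c: reference samples per class
    b = cm.sum(axis=0)                   # B_c: predicted samples per class
    p_e = float(a @ b) / n ** 2          # chance agreement
    return float(p_o), float((p_o - p_e) / (1 - p_e))
```

For reference labels (0, 0, 1, 1) and predictions (0, 1, 1, 1), $P_o = 0.75$, $P_e = (2 \cdot 1 + 2 \cdot 3)/16 = 0.5$ and $\kappa = 0.5$.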

Experimental Results
The classification results for the Honghu dataset are shown in Figure 7. Figure 7a is the classification result of the original image, which still contains many misclassified ground objects: the white radish and small Brassica chinensis in the lower left corner of the image are misrecognized, and the carrot in the upper right corner is classified as tuber mustard. Figure 7b is the endmember abundance classification result. The cabbage in the lower left corner is misclassified as film-covered lettuce, some Brassica chinensis is mistakenly classified as rapeseed and small Brassica chinensis, and the romaine lettuce in the middle is also mistakenly classified as film-covered lettuce. The result of GLCM texture is shown in Figure 7c. In addition to the misclassification of Lactuca sativa and carrot, some small Brassica chinensis is classified as pakchoi cabbage. The classification result of the morphological profiles, shown in Figure 7d, is slightly improved. However, the cabbage is still classified as small green vegetables and film-covered lettuce, and part of the greens is mistakenly classified as small greens and rapeseed. The results of decision fusion, probability fusion and stacking fusion are shown in Figure 7e-g. The classification results of the three fusion strategies are satisfactory: almost all categories are classified correctly, although some misclassifications remain. For example, in the decision fusion result, some Chinese cabbage is wrongly classified as bare land and rape, green cabbage is wrongly classified as small green vegetables, and some small green vegetables are mixed with rape. In the probability fusion result, the various classes are better distinguished; in particular, the accuracy of carrot, sprouting garlic and celtuce is greatly improved.
Table 3 lists the classification accuracy of the different features and fusion strategies for the Honghu dataset.
The OA of the original spectral features is 91.05%, and the accuracy of celtuce, romaine lettuce and carrot is 0%. The OA of classification using endmember abundance is 91.77%. Compared with the results of the original image, the endmember abundance features improve the accuracy of some classes, such as celtuce and carrot, but pakchoi cabbage is misclassified. The OA of GLCM texture and morphological profiles is 91.92% and 93.64%, respectively, and the accuracy of some classes such as bare soil, cotton and lettuce is improved. The accuracy of multi-feature fusion exceeds 95%, and the 18 categories in the classification results are basically consistent with the ground truth. Probability fusion and stacking fusion perform better overall, with OA reaching 96.89% and 98.7%, respectively. The accuracy of multi-feature fusion is higher than that of single-feature classification, which shows that fusing multiple features benefits the fine classification of crops.
Figure 8 shows the classification results for the Xiong'an dataset. Figure 8a is the classification result of the original image. Large areas are classified well, but the accuracy of small areas such as peach, vegetable field and locust is 0%. The classification result of endmember abundance is shown in Figure 8b. The peach in the upper part is still misclassified, and many small patches of pixels in the upper right corner are misclassified. Figure 8c is the result of the GLCM texture. Compared with the first two experiments, the boundaries of ground objects are much better preserved. The morphological profiles preserve the shape characteristics of the image well: the result in Figure 8d shows almost all categories, across different area sizes.
The OA of the original spectral is 91.05%, and the accuracy of celtuce, romaine lettuce, and carrots is 0%. The overall accuracy of classification using endmember abundance is 91.77%. Compared with the classification results of the original image, the accuracy of part types of endmember abundance was improved, such as celtuce and carrot, but pakchoi cabbage was misclassified. The OA of GLCM texture and morphological profiles were 91.92% and 93.64%, respectively. The accuracy of some crops such as bare soil, cotton, lettuce were improved. The accuracy of multi-features fusion more than 95%, and the 18 categories in the classification result are basically consistent with the ground truth. The probability fusion and stacking fusion classification results are generally better, with OA reaching 96.89% and 98.7%, respectively. The accuracy of multi-feature fusion is higher than that of single-feature classification, which shows that the fusion of multiple features is helpful for the fine classification of crops. Figure 8 shows the classification results of Xiongan. Figure 8a is the classification result of the original image. Large-area classification results are better, but the accuracy of small areas such as peach, vegetable field, and locust is only 0%. The classification result of endmember abundance is shown in Figure 8b. The peach in the upper part is still misclassified, and there are many small pixels in the upper right corner that have been misclassified. Figure 8c is the result of the GLCM texture. Compared with the first two sets of experiments, there was a great improvement in the maintenance of the ground object boundary. The morphological profiles can well maintain the shape characteristics of the image. The experimental results shown in Figure 8d clearly show that almost all categories of different area sizes are displayed.  The classification accuracy of different features is shown in Table 4. 
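The overall accuracy (OA), per-class accuracy and Kappa coefficient reported in Tables 3 and 4 are standard confusion-matrix statistics. The sketch below computes them with NumPy. It assumes integer label arrays for the reference and predicted classes and treats per-class accuracy as producer's accuracy (correct pixels divided by reference pixels of that class); the tables may use a different per-class definition, and the function names are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the reference class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)      # unbuffered count per (ref, pred)
    return cm


def accuracy_report(y_true, y_pred, n_classes):
    """Return (OA, per-class producer's accuracy, Cohen's Kappa)."""
    cm = confusion_matrix(np.asarray(y_true), np.asarray(y_pred), n_classes)
    n = cm.sum()
    oa = np.trace(cm) / n                   # observed agreement
    # Producer's accuracy; classes absent from the reference become NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = np.diag(cm) / cm.sum(axis=1)
    # Chance agreement from the row (reference) and column (predicted) totals.
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (oa - pe) / (1 - pe)
    return oa, per_class, kappa
```

Because Kappa discounts the agreement expected by chance, it is usually somewhat lower than OA, as with the decision fusion (OA 94.34%, Kappa 0.915) and probability fusion (OA 95.74%, Kappa 0.928) results on the Xiong'an dataset.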
In Table 4, with the original spectral features, the classification accuracy of sparse forest, peach and soybean is 0%, as all of them are mistaken for pear tree; the accuracy of acer compound and corn is below 60%, and the OA is 85.46%. With endmember abundance, the classification accuracy of most crops improves, but the accuracy of peach, soybean and locust is still 0%. With GLCM texture, the classification accuracy of peach trees increases by 3.17%, that of maple leaves reaches 91.28%, and the OA reaches 90.85%. With morphological profiles, the classification accuracy of peach increases by more than 60%, rice reaches the highest accuracy of 98.83%, and the OA is 94.08%. The last three groups are the results of decision fusion, probability fusion and stacking fusion. Decision fusion achieves an OA of 94.34% and a Kappa coefficient of 0.915, and every class except vegetable field and sparse forest exceeds 50% accuracy. Probability fusion achieves an OA of 95.74% and a Kappa coefficient of 0.928; only vegetable field falls below 60%, and seven crop categories exceed 95%. With stacking fusion, 12 categories, including rice, water and willow, exceed 99%, and the OA reaches a satisfactory 99.71%.

To verify the effect of the training sample size on the results, 3%, 5% and 10% of the samples were used as training samples. The results of the different methods with different numbers of training samples are listed in Table 5.
As Table 5 shows, classification accuracy increases with the number of training samples. For the Honghu dataset, using 10% of the samples for training raises the accuracy on the original image to 98.71%; for the Xiong'an dataset, the highest accuracy is 99.94%. The accuracy of the different fusion strategies also increases with the sample size, and with 3% or more of the samples used for training, the accuracy on the original image exceeds 90%. The number of training samples therefore plays an important role in image classification.
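Drawing a fixed fraction of labeled pixels for training, as in the 3%, 5% and 10% experiments, is commonly done per class so that small classes stay represented. The sketch below assumes such stratified random sampling with at least one training pixel per class; the paper does not state its exact sampling procedure, so the function name `stratified_split` and the minimum-of-one rule are illustrative assumptions.

```python
import numpy as np


def stratified_split(labels, fraction, seed=0, min_per_class=1):
    """Split labeled pixel indices into training and test sets, per class.

    labels: 1-D integer array of class labels of the labeled pixels.
    fraction: share of each class used for training (e.g. 0.03, 0.05, 0.10).
    Returns (train_indices, test_indices), both sorted.
    """
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)                 # pixels of class c
        n_train = max(min_per_class, int(round(fraction * idx.size)))
        n_train = min(n_train, idx.size)                  # cannot exceed class size
        train_idx.append(rng.choice(idx, size=n_train, replace=False))
    train_idx = np.sort(np.concatenate(train_idx))
    test_mask = np.ones(labels.size, dtype=bool)
    test_mask[train_idx] = False                          # remaining pixels are test
    return train_idx, np.flatnonzero(test_mask)
```

Calling it once per fraction, for example `for frac in (0.03, 0.05, 0.10): tr, te = stratified_split(labels, frac)`, yields the training sets for each experiment; fixing `seed` keeps the comparison between methods on identical samples.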

Effect of Classifier
To verify the effect of the classifier on the results, we conducted experiments with an SVM classifier and a DNN classifier, each combined with the three fusion strategies.
The training sample size is 3% of the original image, and the results of the different classifiers are listed in Table 6. The DNN classifier achieves higher classification accuracy than the SVM classifier: by mining the latent information of the image with a deep network classifier combined with the conditional random field, the DNN classifier improves the classification accuracy by about 10%. Conventional classifiers have difficulty mining the inherent information of airborne hyperspectral images, whereas the combination of multiple features and deep learning can extract deep information. This shows that the DNN classifier is suitable for airborne hyperspectral image classification.
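The DNN classifier is described as a network of several fully connected hidden layers whose output is a per-pixel probability image. A minimal forward pass over a feature cube can be sketched in NumPy as follows. The layer sizes, ReLU activation, He-style random initialization and the untrained weights are assumptions for illustration only; training (for example by minimizing cross-entropy on the training pixels) is omitted.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """He-style random weights for a fully connected network (untrained)."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)         # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def probability_image(cube, params):
    """Map an (H, W, B) feature cube to an (H, W, C) class-probability image."""
    h, w, b = cube.shape
    x = cube.reshape(-1, b)                       # one row per pixel
    for weight, bias in params[:-1]:
        x = np.maximum(x @ weight + bias, 0.0)    # fully connected layer + ReLU
    weight, bias = params[-1]
    return softmax(x @ weight + bias).reshape(h, w, -1)
```

The resulting probability image is what the CRF stage consumes as its unary term, so the DNN and the CRF can be developed and evaluated separately.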

Effect of CRF
We also examined the effect of the conditional random field on the classification results by running two groups of experiments, one with CRF and one without. In the first, the fused image is input into the DNN model to obtain the probability image, whose accuracy is evaluated directly to give the classification result of the DNN method. The second is the method proposed in this paper, in which the multi-feature fusion data are input into the model with CRF to obtain the classification result, which is then evaluated. The accuracy of the two methods is shown in Table 7: the model with CRF is about 5% more accurate than the model without CRF. Therefore, adding a conditional random field to the deep classification model improves the accuracy of crop fine classification.
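The with/without-CRF comparison can be illustrated with a simple grid CRF in which the unary potential is the negative logarithm of the DNN probability image and the pairwise potential is a Potts penalty between 4-connected neighbours. The sketch below minimizes this energy with iterated conditional modes (ICM). The Potts pairwise term, the smoothness weight `beta` and the ICM solver are assumptions chosen for brevity, not necessarily the pairwise model or inference algorithm used in the paper.

```python
import numpy as np


def neighbour_sum(x):
    """Sum over the 4-connected neighbours of every pixel (nothing beyond borders)."""
    s = np.zeros_like(x, dtype=float)
    s[1:] += x[:-1]
    s[:-1] += x[1:]
    s[:, 1:] += x[:, :-1]
    s[:, :-1] += x[:, 1:]
    return s


def crf_icm(prob, beta=1.0, n_iter=10, eps=1e-8):
    """Approximate MAP labelling of a grid CRF by iterated conditional modes.

    prob: (H, W, C) probability image from the classifier (unary source).
    Energy: sum_i -log p_i(y_i) + beta * sum_{i~j} [y_i != y_j]  (Potts model).
    """
    h, w, c = prob.shape
    unary = -np.log(prob + eps)
    labels = prob.argmax(axis=-1)                      # start from classifier labels
    n_neigh = neighbour_sum(np.ones((h, w, 1)))        # 2, 3 or 4 per pixel
    # Checkerboard colouring: same-coloured pixels are never 4-neighbours,
    # so updating one colour at once equals sequential ICM over those pixels.
    parity = np.add.outer(np.arange(h), np.arange(w)) % 2
    for _ in range(n_iter):
        changed = False
        for colour in (0, 1):
            agree = neighbour_sum(np.eye(c)[labels])   # neighbours per class
            energy = unary + beta * (n_neigh - agree)  # local Potts energy
            current = np.take_along_axis(energy, labels[..., None], axis=-1)[..., 0]
            better = (parity == colour) & (energy.min(axis=-1) < current - 1e-12)
            if better.any():
                labels[better] = energy.argmin(axis=-1)[better]
                changed = True
        if not changed:
            break
    return labels
```

With `beta=0` the output equals the DNN argmax, which corresponds to the "without CRF" setting; increasing `beta` removes isolated salt-and-pepper pixels, while label changes still require the neighbourhood evidence to outweigh a confident unary term, which helps keep genuine class boundaries.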

Discussion
In this paper, we proposed a method for crop fine classification in airborne hyperspectral imagery based on multi-feature fusion and deep learning. We extracted GLCM texture, morphological profiles and endmember abundance features from airborne hyperspectral imagery and fused the spatial and spectral information of the image using decision fusion, probability fusion and stacking fusion. A classification model consisting of a deep neural network and a conditional random field was then employed: the deep learning model mines the deep information of the image, while the CRF reduces noise and keeps the boundaries of ground objects intact. Experiments on the Honghu and Xiong'an datasets showed that the proposed DNN+CRF method improves the accuracy of crop classification. Specifically, multi-feature fusion achieves higher classification accuracy than single features, and a larger number of training samples yields higher accuracy. In addition, DNN is more accurate than SVM because it mines the deep features of the image for crop fine classification, and classification with CRF is more accurate than classification without it, confirming that CRF improves the accuracy of crop classification. In future work, we will consider other types of neural networks, such as CNNs, as well as the integration of UAV and aerial images, to meet the needs of larger-scale crop fine classification and to bring our research results into agricultural practice more quickly.