Spatial–Spectral Fusion Based on Conditional Random Fields for the Fine Classification of Crops in UAV-Borne Hyperspectral Remote Sensing Imagery

The fine classification of crops is critical for food security and agricultural management. There are many different species of crops, and some of them have similar spectral curves, which makes precise crop classification a difficult task. Classification methods that incorporate spatial information can reduce noise and improve classification accuracy to a certain extent, but the problem is far from solved. Therefore, in this paper, a spatial–spectral fusion method based on conditional random fields (SSF-CRF) is presented for the fine classification of crops in UAV-borne hyperspectral remote sensing imagery. The proposed method designs suitable potential functions in a pairwise conditional random field model, fusing spectral and spatial features to reduce the spectral variation within homogeneous regions and accurately identify the crops. Experiments on hyperspectral datasets of the cities of Hanchuan and Honghu in China showed that, compared with traditional methods, the proposed method effectively improves the classification accuracy, preserves the edges and shapes of ground objects, and alleviates over-smoothing while retaining detailed information. The method is therefore of considerable significance for the fine classification of crops in hyperspectral remote sensing imagery.


Introduction
The accurate identification of crop types is an important basis for agricultural monitoring, crop yield estimation, growth analysis, and the determination of crop area and spatial distribution [1,2]. It is also an important basis for rationally allocating resources, scientifically adjusting the agricultural structure, and planning economic development strategies in agricultural production [3][4][5]. Remote sensing technology has been widely used in crop classification because it is fast, simple, and low in cost [6]. However, conventional multispectral remote sensing images are limited by their low spectral resolution. Furthermore, the spectra of different plants share many similar features, so traditional broad-band spectral data cannot be used to accurately identify crop types [7,8]. In contrast, the high spectral resolution of hyperspectral images makes it possible to detect subtle spectral differences between crop species, which is conducive to fine crop classification [9,10].
In recent years, an increasing number of researchers have used hyperspectral images for classification. There are two main approaches in this field: (1) machine learning and pattern recognition; and (2) probability and statistics. Among the methods in the first category, Cheng et al. [11] proposed a sparsity-based hyperspectral image classification algorithm that incorporates contextual information into the sparse recovery optimization problem, achieving better classification performance than the classical supervised support vector machine classifier. Chen et al. [12] employed a sparse auto-encoder (SAE) deep model to extract features of hyperspectral imagery and classified these features via logistic regression. Wang and Wu [13] analyzed the hyperspectral characteristic parameters of eight common crops in the Jianghuai watershed area in China and used a back-propagation (BP) neural network to classify them, achieving an accuracy of 91.8%. In the second category, there have also been some notable achievements. Zhang et al. [14] designed a hybrid decision tree classification algorithm based on the spectral characteristics of hyperspectral data in the rice growing season; when applied to hyperspectral image data of the Jintan rice breeding farm in Changzhou, Jiangsu, China, it obtained an accuracy of 94.9%. Senthilnath et al. [15] used principal component analysis (PCA) to reduce the dimensionality of an EO-1 Hyperion image and the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) Indian Pines image, and used a hierarchical artificial immune system to extract a variety of crops, obtaining a higher classification accuracy than traditional unsupervised classification methods. Mariotto et al. [8] used hyperspectral reflectance data to accurately identify cotton, wheat, corn, rice, and alfalfa, improving accuracy by about 20% compared with multispectral data. Finally, Chen [16] applied a spectrum analysis method to analyze the spectral characteristics of typical wetland vegetation in different seasons. Although these methods provide useful ideas for the classification of hyperspectral remote sensing images, they mainly target spaceborne hyperspectral imagery, which generally has insufficient spatial resolution. Their classification models therefore rely mainly on spectral information and ignore spatial information, which makes it difficult to achieve fine classification results.
In southern China, farmland fragmentation [17] and the low spatial resolution of spaceborne hyperspectral remote sensing images make it difficult to obtain good classification results. With the rapid development of unmanned aerial vehicle (UAV) technology, UAV-borne remote sensing has become an important means of Earth observation, supporting the development of precision agriculture. Owing to their small size, low cost, flexible operation, and short operating cycles [18][19][20][21], UAV-borne remote sensing systems can simultaneously acquire data with high spatial and spectral resolutions, enabling more accurate agricultural information to be obtained [22]. These advantages compensate for the drawbacks of existing spaceborne, airborne, and ground-based remote sensing systems, making UAV-borne systems well suited to small- and medium-scale agricultural remote sensing applications [23]. UAV-borne hyperspectral imagery has therefore become a unique data source for the fine classification of crops. On the one hand, however, as the dimensionality of hyperspectral data increases, the high redundancy between bands poses great difficulties for classification [24]. On the other hand, the increased spatial resolution reveals more detailed ground features, which increases spectral variability and heterogeneity within the same class and reduces spectral separability [25]. Spectral classification alone is therefore insufficient for imagery of such high spatial resolution. The spatial features hidden in hyperspectral data are gradually being exploited, and methods that merge spectral and spatial features are increasingly being applied to crop classification [26].
The random field method is a classification approach that can effectively incorporate spatial contextual information. The Markov random field (MRF) model was first used for image processing in 1984 [27,28] and has since been widely used in classification problems [29,30]. The Markovian support vector classifier (MSVC) is an MRF-based classifier that integrates the support vector machine (SVM) with an MRF and uses iterated conditional modes (ICM) to optimize the energy function of the spatial contextual classification [31]. The MRF model can incorporate spatial information in the label field, but it only models the joint distribution of the labels and cannot model spatial interactions in the observed data [32]. The conditional random field (CRF) model extends the MRF model and can consider contextual information in both the label field and the observed data [33]. For example, the support vector conditional random field classifier [34,35] is widely used to incorporate spatial information, effectively suppressing salt-and-pepper classification noise. The pairwise CRF model has also been successfully applied to the classification of remote sensing images [36][37][38], where the unary and pairwise potential functions jointly model the spatial interactions in the local neighborhood. However, many CRF-based models produce some degree of over-smoothing when applied to classification [39]. In particular, when high spatial resolution hyperspectral images are used for the fine classification of crops, many small but important ground objects are treated as noise and removed, which seriously degrades the fine classification result.
Therefore, in this paper, we propose the spectral-spatial fusion method based on conditional random fields (SSF-CRF) for the fine classification of crops in hyperspectral imagery. It is designed to fuse the spatial and spectral features of high spatial resolution hyperspectral data by combining suitable potential functions in a pairwise CRF model. To reduce the spectral variation within homogeneous regions, preserve details, and alleviate over-smoothing, SSF-CRF selects representative features from the perspectives of mathematical morphology, spatial texture, and mixed pixel decomposition to form a spatial feature vector, which is then combined with the spectral information of each pixel to form a spectral-spatial fusion feature vector. The method then models the relationship between the labels and the fusion features, and computes a class probability estimate for each pixel independently from its feature vector, yielding a probability image. Finally, through the spatial smoothing term and the local class label cost term, the label field and the observation field model the spatial contextual information of each pixel and its neighborhood. This accounts for spatial correlation and reduces noise while retaining detailed features, thereby maintaining the integrity of homogeneous regions and the shapes of ground objects.

The Improved Conditional Random Field (CRF) Model
The CRF model describes the local neighborhood interactions between random variables in a unified probabilistic framework, and directly models the posterior probability of the labels, given the observed image data, as a Gibbs distribution [40,41]:

$$P(\mathbf{x} \mid \mathbf{y}) = \frac{1}{Z(\mathbf{y})} \exp\Big\{ -\sum_{c \in C} \psi_c(\mathbf{x}_c, \mathbf{y}) \Big\} \tag{1}$$

where $\mathbf{y} = \{y_1, y_2, \ldots, y_N\}$ is the observed data; $y_i$ is the spectral vector of pixel $i \in V = \{1, 2, \ldots, N\}$; $V$ is the set of all the pixels of the observed data; $N$ is the number of pixels in the observed data; $\mathbf{x} = \{x_1, x_2, \ldots, x_N\}$ represents the class labels of the whole image, where each $x_i$ ($i = 1, 2, \ldots, N$) takes a value from the label set $L = \{1, 2, \ldots, K\}$ and $K$ is the number of classes; $Z(\mathbf{y})$ is the normalization function; and $\psi_c(\mathbf{x}_c, \mathbf{y})$ is the potential function defined locally over the clique $c$. $C$ is the set of all cliques, where a clique is a fully connected subgraph.
The CRF model directly models the posterior distribution of the labels $\mathbf{x}$, given the observations $\mathbf{y}$. The corresponding Gibbs energy is given in Equation (2):

$$E(\mathbf{x} \mid \mathbf{y}) = \sum_{c \in C} \psi_c(\mathbf{x}_c, \mathbf{y}) \tag{2}$$

Classification then seeks the label image $\mathbf{x}$ that maximizes the posterior probability $P(\mathbf{x} \mid \mathbf{y})$ according to the Bayesian maximum a posteriori (MAP) rule. The MAP labeling $\mathbf{x}^{\mathrm{MAP}}$ of the random field is therefore

$$\mathbf{x}^{\mathrm{MAP}} = \arg\max_{\mathbf{x}} P(\mathbf{x} \mid \mathbf{y}) = \arg\min_{\mathbf{x}} E(\mathbf{x} \mid \mathbf{y}) \tag{3}$$

That is, the posterior probability $P(\mathbf{x} \mid \mathbf{y})$ is largest when the energy $E(\mathbf{x} \mid \mathbf{y})$ is smallest. The remote sensing classification problem can be formulated by designing suitable potential functions for the pairwise CRF model:

$$E(\mathbf{x} \mid \mathbf{y}) = \sum_{i \in V} \psi_i(x_i, \mathbf{y}) + \lambda \sum_{i \in V} \sum_{j \in N_i} \psi_{ij}(x_i, x_j, \mathbf{y}) \tag{4}$$

where $\psi_i(x_i, \mathbf{y})$ and $\psi_{ij}(x_i, x_j, \mathbf{y})$ are, respectively, the unary potential function and the pairwise potential function defined in the local neighborhood $N_i$ of pixel $i$. In this paper, an eight-neighborhood system is used to encode the pairwise interactions, as shown in Figure 1. The non-negative constant $\lambda$ is a tuning parameter of the pairwise potential function that balances the effects of the unary and pairwise potentials.
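To make the MAP formulation concrete, the following is a minimal sketch of minimizing a pairwise CRF energy of the form described above (unary costs plus λ times pairwise costs over the eight-neighborhood) with iterated conditional modes (ICM), the optimizer mentioned for MSVC. This is an illustrative assumption: this section does not state which optimizer SSF-CRF uses, and the pairwise cost here is a plain Potts penalty rather than the data-dependent pairwise potential described later. The names `icm` and `local_energy` are hypothetical.

```python
from typing import List, Sequence

Unary = Sequence[Sequence[Sequence[float]]]  # H x W x K unary costs psi_i(k)
# Eight-neighbourhood offsets, matching the neighbourhood system used for the pairwise terms.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def local_energy(unary: Unary, labels: List[List[int]], r: int, c: int, k: int, lam: float) -> float:
    """Unary cost of label k at (r, c) plus lam times a Potts penalty per disagreeing neighbour."""
    h, w = len(labels), len(labels[0])
    energy = unary[r][c][k]
    for dr, dc in OFFSETS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != k:
            energy += lam
    return energy


def icm(unary: Unary, lam: float, max_iter: int = 10) -> List[List[int]]:
    """Greedy ICM: start from the unary-only labelling, then repeatedly give each
    pixel the label with the lowest local energy until no label changes."""
    h, w, n_classes = len(unary), len(unary[0]), len(unary[0][0])
    labels = [[min(range(n_classes), key=lambda k: unary[r][c][k]) for c in range(w)]
              for r in range(h)]
    for _ in range(max_iter):
        changed = False
        for r in range(h):
            for c in range(w):
                best = min(range(n_classes),
                           key=lambda k: local_energy(unary, labels, r, c, k, lam))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels
```

With λ = 0 the result is simply the per-pixel unary decision; increasing λ lets neighbors overrule a weakly supported isolated label. ICM only reaches a local minimum, which is why graph-cut methods such as α-expansion are a common alternative for energies of this kind.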

Unary Potential
The unary potential function $\psi_i(x_i, \mathbf{y})$ models the relationship between the labels and the observed image data: the cost of assigning a particular class label to an individual pixel is computed from the spectral-spatial feature vector. Each pixel can therefore be handled independently by a discriminative classifier that gives a probability estimate for the label $x_i$ based on the pixel's feature vector. The unary potential plays the leading role in the classification process and is generally derived from the posterior probability of a supervised classifier. It is usually defined as

$$\psi_i(x_i, \mathbf{y}) = -\ln P\big[x_i = l_k \mid f_i(\mathbf{y})\big]$$

where $f$ is a feature mapping function that maps an arbitrary subset of contiguous image cells to a feature vector; $f_i(\mathbf{y})$ represents the feature vector at position $i$; and $P[x_i = l_k \mid f_i(\mathbf{y})]$ is the probability of pixel $i$ taking the label $l_k$, based on its feature vector. Because the SVM classifier performs well with a small number of training samples in remote sensing image classification [42,43], we use an SVM classifier with a radial basis function (RBF) kernel to obtain probability estimates from the spatial-spectral feature vectors for the unary potential. In this paper, the two parameters $C$ and $\gamma$ are set to their default values.
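A minimal sketch of turning per-pixel class probabilities into unary costs, assuming the negative-log form $\psi_i(x_i = l_k, \mathbf{y}) = -\ln P[x_i = l_k \mid f_i(\mathbf{y})]$. The probabilities could come, for example, from scikit-learn's `SVC(kernel="rbf", probability=True)` and its `predict_proba` method, although that library's default `C` and `gamma` are not necessarily the defaults the authors used. The function names and the clamping constant `eps` are hypothetical.

```python
import math
from typing import List, Sequence


def unary_costs(probs: Sequence[float], eps: float = 1e-10) -> List[float]:
    """Return -ln P for each class of one pixel.

    `probs` holds P[x_i = l_k | f_i(y)] for k = 1..K (e.g. one row of an SVM's
    probability output). Probabilities are clamped at `eps` so that a class with
    zero estimated probability gets a large but finite cost.
    """
    return [-math.log(max(p, eps)) for p in probs]


def unary_image(prob_image: Sequence[Sequence[Sequence[float]]]) -> List[List[List[float]]]:
    """Apply unary_costs to every pixel of an H x W x K probability image."""
    return [[unary_costs(px) for px in row] for row in prob_image]
```

High-probability labels get low cost, so the unary term alone reproduces the classifier's per-pixel decision; the pairwise term then adjusts it.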

Spectral Characteristics
Minimum noise fraction (MNF) rotation is a commonly used method for extracting spectral features that is simple and easy to implement. After the MNF transformation, the components are ordered by signal-to-noise ratio, so the information is mainly concentrated in the first few components, and the image quality gradually decreases as the component number increases. Studies have shown that, compared with the original high-dimensional image data and the feature images obtained by PCA transformation, the low-dimensional feature images obtained by MNF transformation extract spectral information more effectively [44]. We therefore use MNF to extract the spectral information of the high spatial resolution hyperspectral imagery.
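For reference, the standard MNF formulation can be summarized as follows (the usual textbook definition, not an equation given in this paper). With $\Sigma$ the covariance matrix of the image data $\mathbf{y}$ and $\Sigma_N$ the covariance matrix of the noise (typically estimated, for example, from differences between neighboring pixels), the noise fraction of a projection $\mathbf{a}^{\top}\mathbf{y}$ is minimized by a generalized eigenvalue problem, where $B$ is the number of bands:

```latex
\begin{aligned}
\mathrm{NF}(\mathbf{a}) &= \frac{\mathbf{a}^{\top}\Sigma_N\,\mathbf{a}}{\mathbf{a}^{\top}\Sigma\,\mathbf{a}},
&\qquad \Sigma_N\,\mathbf{a}_k &= \mu_k\,\Sigma\,\mathbf{a}_k,\quad \mu_1 \le \mu_2 \le \dots \le \mu_B,\\
z_k &= \mathbf{a}_k^{\top}\mathbf{y},
&\qquad \mathrm{SNR}_k &= \frac{1}{\mu_k} - 1 .
\end{aligned}
```

Assuming signal and noise are uncorrelated, component $z_1$ has the smallest noise fraction and therefore the highest SNR, which is why the leading MNF components carry most of the information.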

Spatial Characteristics
A. Morphological Feature
Mathematical morphology is an effective image feature extraction tool that describes the local characteristics of images. The basic morphological operations are erosion, dilation, opening, and closing, which act on the image through a set of shaped regions called structuring elements (SEs). Morphological opening and closing by reconstruction are another common kind of operator, with a better shape-preservation ability than the classical morphological filters. Because reconstruction adapts the filtered shapes to the structures present in the image itself, these filters nominally introduce no shape noise [45,46], as shown in Figure 2. In this paper, we extract the spatial information of the images with the "opening by reconstruction followed by closing by reconstruction" (OFC) operator, which simultaneously smooths out bright and dark details while maintaining the overall stability of ground objects and improving the consistency within object regions [47,48].
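The OFC operator is a hybrid of opening by reconstruction (OBR) and closing by reconstruction (CBR). For a band $f$ and a structuring element SE, it can be defined as

$$\mathrm{OFC}_{SE}(f) = \phi^{R}_{SE}\big(\gamma^{R}_{SE}(f)\big)$$

where $\gamma^{R}_{SE}(f) = R^{\delta}_{f}\big(\varepsilon_{SE}(f)\big)$ is the opening by reconstruction of $f$, i.e., the reconstruction by dilation of the eroded image $\varepsilon_{SE}(f)$ under $f$; and $\phi^{R}_{SE}(g) = R^{\varepsilon}_{g}\big(\delta_{SE}(g)\big)$ is the closing by reconstruction, i.e., the reconstruction by erosion of the dilated image $\delta_{SE}(g)$ above $g$. Thus, $\mathrm{OFC}_{SE}(f)$ is the closing by reconstruction of the opening-by-reconstruction image.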
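A minimal pure-Python sketch of the OFC operator on a single band (for example, one MNF component), assuming a 3 × 3 square structuring element and ignoring pixels outside the image at the borders. The SE shape and size used in the paper are not specified in this section, and all function names are hypothetical. Reconstruction is implemented by iterating geodesic dilation (or erosion) until the image stops changing.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def _filter3x3(img: Sequence[Sequence[float]], op: Callable) -> Image:
    """Apply min (erosion) or max (dilation) over a 3x3 square SE.

    Out-of-image neighbours are simply ignored at the borders.
    """
    h, w = len(img), len(img[0])
    out: Image = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(op(window))
        out.append(row)
    return out


def erode(img): return _filter3x3(img, min)
def dilate(img): return _filter3x3(img, max)


def reconstruct_by_dilation(marker, mask):
    """Iterate geodesic dilation min(dilate(marker), mask) until stable (marker <= mask)."""
    cur = [list(row) for row in marker]
    while True:
        nxt = [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(dilate(cur), mask)]
        if nxt == cur:
            return cur
        cur = nxt


def reconstruct_by_erosion(marker, mask):
    """Iterate geodesic erosion max(erode(marker), mask) until stable (marker >= mask)."""
    cur = [list(row) for row in marker]
    while True:
        nxt = [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(erode(cur), mask)]
        if nxt == cur:
            return cur
        cur = nxt


def opening_by_reconstruction(f): return reconstruct_by_dilation(erode(f), f)
def closing_by_reconstruction(f): return reconstruct_by_erosion(dilate(f), f)


def ofc(f):
    """Opening by reconstruction followed by closing by reconstruction."""
    return closing_by_reconstruction(opening_by_reconstruction(f))
```

On a dark background, a single bright pixel is removed by the opening stage while a 3 × 3 bright block is restored to its original shape; a single dark pixel inside a bright region is filled by the closing stage. Multi-scale variants simply repeat this with SEs of increasing size.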

B. Texture Feature
Hyperspectral remote sensing images contain not only continuous and abundant spectral information but also rich texture information. Several studies have demonstrated the effectiveness of texture for improving land-cover classification accuracy [49,50]. Image textures are complex visual patterns composed of entities or regions whose sub-patterns are characterized by brightness, color, shape, size, and so on. Texture is an intrinsic property of all object surfaces and contains important information about the structural arrangement of a surface and its relationship with the surrounding environment. The gray-level co-occurrence matrix (GLCM) is a commonly used texture extraction method with good discriminative ability [51,52]. It counts how often pairs of gray levels occur at two pixels in a given positional relationship, and texture features are then computed from this matrix.
Let $f(x, y)$ be a two-dimensional digital image of size $M \times N$ with $N_g$ gray levels. The GLCM for a given spatial relationship is then

$$P(i, j) = \#\Big\{\big((x_1, y_1), (x_2, y_2)\big) \in (M \times N) \times (M \times N) \;\Big|\; f(x_1, y_1) = i,\; f(x_2, y_2) = j\Big\}$$

where $\#(x)$ is the number of elements in the set $x$, and $P$ is an $N_g \times N_g$ matrix. If the distance between $(x_1, y_1)$ and $(x_2, y_2)$ is $d$ and their angle is $\theta$, the GLCM $P(i, j, d, \theta)$ for a given spacing and angle is

$$P(i, j, d, \theta) = \#\Big\{\big((x_1, y_1), (x_2, y_2)\big) \in (M \times N) \times (M \times N) \;\Big|\; f(x_1, y_1) = i,\; f(x_2, y_2) = j,\; \|(x_1, y_1) - (x_2, y_2)\| = d,\; \angle\big((x_1, y_1), (x_2, y_2)\big) = \theta\Big\}$$

In this paper, we use the following texture metrics:
(1) Homogeneity: reflects the uniformity of the image gray levels;
(2) Angular second moment: reflects the uniformity of the gray-level distribution of the image and the coarseness of the texture;
(3) Contrast: reflects the amount of gray-level variation in the image;
(4) Dissimilarity: measures the degree of dissimilarity of the gray values in the image;
(5) Mean: indicates the degree of regularity of the texture;
(6) Entropy: reflects the complexity or non-uniformity of the image texture.
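The GLCM construction and the six metrics listed above can be sketched in pure Python as follows. The paper describes the metrics only verbally, so the formulas below are the common Haralick-style definitions (homogeneity as $\sum p(i,j)/(1+(i-j)^2)$, GLCM mean as $\sum i\,p(i,j)$, natural-log entropy) and should be read as assumptions. The offset `(dr, dc)` plays the role of the distance $d$ and angle $\theta$ (for example, `(0, 1)` is $d = 1$, $\theta = 0°$). The matrix is not symmetrized here; some implementations also add the transposed counts. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def glcm(img: Sequence[Sequence[int]], dr: int, dc: int, levels: int) -> List[List[float]]:
    """Normalized co-occurrence matrix p(i, j) over pixel pairs (r, c) -> (r + dr, c + dc)."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    total = 0
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r][c]][img[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


def texture_metrics(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Six GLCM metrics; zero cells are skipped (they contribute nothing to any sum)."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row) if v > 0]
    return {
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "asm": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "mean": sum(i * v for i, _, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells),
    }
```

In practice the metrics are computed in a sliding window around each pixel (on a quantized band such as an MNF component), so each metric becomes one texture feature image.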

C. Endmember Component
High spatial resolution hyperspectral images contain a large number of mixed pixels. If mixed pixels are handled by hard classification, much information is lost. Mixed pixel decomposition instead expresses the fraction of each class within a mixed pixel, yielding one abundance image per class. An endmember is a physical quantity associated with mixed pixels: it is the main parameter of the linear mixing model and represents a pure ground component with a relatively fixed spectrum. Endmember extraction can therefore recover more detailed information from the image. In the proposed method, the sequential maximum angle convex cone (SMACC) endmember model is used to extract the endmember spectra and abundance images that form the endmember component [53]. The model can be written as

$$H(c, i) = \sum_{k=1}^{N} R(c, k)\, A(k, i)$$

where $H(c, i)$ is the spectrum of pixel $i$ in band $c$, modeled as a mixture of endmembers; $c$ and $i$ are the band index and the pixel index, respectively; $k$ indexes the endmembers from 1 to the largest endmember $N$; $R$ is the matrix containing the endmember spectra as columns; and $A$ is the abundance matrix containing the contribution of each endmember $k$ to each pixel.
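SMACC itself builds the endmember set incrementally under convex-cone constraints, which is beyond a short example. The sketch below only illustrates the linear mixing model behind the abundances: it synthesizes a mixed spectrum from given endmember spectra and recovers the abundances by unconstrained least squares. This is a swapped-in technique for illustration, not SMACC; the endmember values and function names are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # R[c][k]: band c of endmember k


def mix(R: Matrix, a: Sequence[float]) -> List[float]:
    """Linear mixing model: H[c] = sum_k R[c][k] * a[k] for one pixel."""
    return [sum(R[c][k] * a[k] for k in range(len(a))) for c in range(len(R))]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def unmix_least_squares(R: Matrix, h: Sequence[float]) -> List[float]:
    """Unconstrained abundances from the normal equations (R^T R) a = R^T h."""
    bands, k = len(R), len(R[0])
    RtR = [[sum(R[c][p] * R[c][q] for c in range(bands)) for q in range(k)] for p in range(k)]
    Rth = [sum(R[c][p] * h[c] for c in range(bands)) for p in range(k)]
    return solve(RtR, Rth)
```

Unlike SMACC, this solver imposes no non-negativity or sum-to-one constraints, so on real, noisy spectra it can return negative abundances.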

Pairwise Potential
The pairwise term models the spatial contextual information between each pixel and its neighborhood by considering both the label field and the observation field. Although the spectral values of adjacent pixels in a homogeneous region may differ because of spectral variability and noise, the pixels are likely to belong to the same class because of spatial correlation. The pairwise potential function models this smoothness while taking label constraints into account, which favors assigning the same class to pixels in homogeneous regions and preserves the edges between adjacent regions. The pairwise potential function is defined as

$$\psi_{ij}(x_i, x_j, \mathbf{y}) = \begin{cases} 0, & x_i = x_j \\ g_{ij}(\mathbf{y}) + \theta\, \Theta_L(x_i, x_j \mid \mathbf{y}), & x_i \neq x_j \end{cases}$$

where $\Theta_L(x_i, x_j \mid \mathbf{y})$ is the local class label cost term, of size $|L| \times |L|$, which represents the cost between $x_i$ and $x_j$ in the neighborhood. The parameter $\theta$ is the interaction coefficient that controls the weight of the label cost term and is usually set within the range [0, 4]. $g_{ij}(\mathbf{y})$ is the smoothing term dependent on $\mathbf{y}$; it models the interaction between adjacent pixels $i$ and $j$ and measures the difference between them:

$$g_{ij}(\mathbf{y}) = \operatorname{dist}(i, j)^{-1} \exp\big(-\beta \|y_i - y_j\|^2\big)$$

where $(i, j)$ are the spatial positions of the adjacent pixels, and $\operatorname{dist}(i, j)$ is their Euclidean distance in image space rather than in feature space. $y_i$ and $y_j$ are the spectral vectors of pixels $i$ and $j$; through them, the strength of the interactions within the neighborhood depends on the image data, which promotes consistency in similar regions. The parameter $\beta$ is set from the mean squared difference between the spectral vectors of all adjacent pixels in the image, $\beta = \big(2\langle \|y_i - y_j\|^2 \rangle\big)^{-1}$, where $\langle \cdot \rangle$ denotes the average over the image.
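The data-dependent smoothing term $g_{ij}(\mathbf{y}) = \operatorname{dist}(i, j)^{-1}\exp(-\beta\|y_i - y_j\|^2)$, with $\beta = (2\langle\|y_i - y_j\|^2\rangle)^{-1}$ averaged over all eight-neighborhood pairs, can be sketched as follows. Pixels are lists of spectral (or feature) values; the function names are hypothetical, and returning β = 0 for a perfectly uniform image is an assumption made only to avoid division by zero. The local class label cost term is omitted from this sketch because it depends on the SVM probabilities.

```python
import math
from typing import Sequence, Tuple

SpectralImage = Sequence[Sequence[Sequence[float]]]  # H x W x B
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_beta(img: SpectralImage) -> float:
    """beta = 1 / (2 * mean ||y_i - y_j||^2) over all eight-neighbour pairs."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w:
                    # each unordered pair is visited twice, which leaves the mean unchanged
                    total += sq_dist(img[r][c], img[rr][cc])
                    count += 1
    mean = total / count
    return 1.0 / (2.0 * mean) if mean > 0 else 0.0


def smoothing_term(img: SpectralImage, i: Tuple[int, int], j: Tuple[int, int], beta: float) -> float:
    """g_ij = exp(-beta * ||y_i - y_j||^2) / dist(i, j), dist measured in image space."""
    (r1, c1), (r2, c2) = i, j
    spatial = math.hypot(r1 - r2, c1 - c2)
    return math.exp(-beta * sq_dist(img[r1][c1], img[r2][c2])) / spatial
```

Because the weight decays with spectral difference, label changes across strong spectral edges are penalized less than label changes inside homogeneous regions, and diagonal neighbors are weighted by $1/\sqrt{2}$.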
The local class label cost term $\Theta_L(x_i, x_j \mid \mathbf{y})$ models the spatial relationship between the class labels in the neighborhood and the observed image data. It is computed from $P[x_i \mid f_i(\mathbf{y})]$, the label probability of the feature vector $f_i(\mathbf{y})$ given by the SVM classifier. The term takes the current class label $x_i$ into account to measure the correlation between the labels of adjacent pixels $i$ and $j$. When classes overlap strongly in the feature space, it can change the label of a pixel using the label information in its spatial neighborhood. The local class label cost term associated with the current thematic label thus incorporates spectral information through the estimated class probability distribution, performing appropriate smoothing while taking spatial contextual information into account.

Algorithm Flowchart
The flowchart of the proposed SSF-CRF method is shown in Figure 3. Based on the characteristics of high spatial resolution hyperspectral data, SSF-CRF combines the spatial and spectral features of the pixels into a spectral-spatial fusion feature vector, which is used to build the unary potential in the CRF framework. The local class label cost term is then incorporated into the pairwise potential. The method proceeds as follows:
(1) MNF rotation is applied to the original image. An estimate of the noise covariance matrix is used to decorrelate and rescale the noise in the data, so that the transformed noise has unit variance and no band-to-band correlation;
(2) Representative features are selected from the perspectives of mathematical morphology, spatial texture, and mixed pixel decomposition, and are combined with the spectral information of each pixel to form a spectral-spatial fusion feature vector. The SVM classifier models the relationship between the labels and the fusion features, and the class probability estimate of each pixel is computed independently from its feature vector;
(3) The spatial smoothing term and the local class label cost term model the spatial contextual information of each pixel and its neighborhood through the label field and the observation field. In line with spatial correlation, both terms encourage adjacent pixels to take the same class label.
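Step (2) can be sketched as per-pixel concatenation of the spectral (MNF) features with the morphological, texture, and endmember-abundance features. Standardizing each feature dimension to zero mean and unit variance before the SVM is an assumption added here (it is common practice for RBF-kernel SVMs, but this section does not say whether the authors rescale the features); the function name is hypothetical.

```python
import statistics
from typing import List, Sequence

FeatureImage = Sequence[Sequence[Sequence[float]]]  # H x W x d_m feature vectors


def fuse_features(*feature_images: FeatureImage) -> List[List[List[float]]]:
    """Concatenate per-pixel feature vectors from several images, then z-score each dimension."""
    h, w = len(feature_images[0]), len(feature_images[0][0])
    fused = [[[v for img in feature_images for v in img[r][c]] for c in range(w)]
             for r in range(h)]
    for k in range(len(fused[0][0])):
        values = [fused[r][c][k] for r in range(h) for c in range(w)]
        mu, sd = statistics.fmean(values), statistics.pstdev(values)
        for r in range(h):
            for c in range(w):
                # a constant feature carries no information, so it is mapped to 0
                fused[r][c][k] = (fused[r][c][k] - mu) / sd if sd > 0 else 0.0
    return fused
```

Without rescaling, features with large numeric ranges (for example, raw GLCM contrast) would dominate the RBF kernel distance and swamp the MNF components.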

Figure 3. Flowchart of the spectral-spatial fusion based on conditional random fields (SSF-CRF) method.

Study Areas
The two datasets cover the cities of Hanchuan (113°22′–113°57′E, 30°22′–30°51′N) and Honghu (113°07′–114°05′E, 29°39′–30°12′N) in Hubei province, China (see Figures 4 and 5). The city of Hanchuan is located in the central part of Hubei province, on the lower reaches of the Han River and in the middle of the Jianghan Plain, where the terrain is flat and low-lying. The city of Honghu is located in the south-central part of Hubei province, on the middle and lower reaches of the Yangtze River, in the southeast of the Jianghan Plain. Its terrain is higher in the north and south. The climate of Honghu is similar to that of Hanchuan, and both belong to the subtropical monsoon climate zone. The main crops grown in Honghu are cotton, rice, wheat, barley, broad beans, sorghum, and rapeseed.

Data Acquisition
The two datasets used to verify the proposed SSF-CRF method were provided by the Intelligent Data Extraction and Remote Sensing Analysis Group of Wuhan University (RSIDEA). The data were collected with a DJI Matrice 600 Pro drone carrying a Nano-Hyperspec hyperspectral imaging sensor, whose parameters are listed in Table 1. The Hanchuan dataset is a hyperspectral image of 303 × 600 pixels with 270 bands and a spatial resolution of 0.1 m. It contains nine land-cover classes: red roof, gray roof, tree, road, strawberry, pea, soy, shadow, and iron sheet. The true-color image is shown in Figure 6a and the corresponding ground-truth map in Figure 6b. The Honghu dataset is a hyperspectral image of 400 × 400 pixels with 274 bands and a spatial resolution of 0.4 m. It contains 18 land-cover classes: red roof, bare soil, rape, cotton, Chinese cabbage, pakchoi, cabbage, tuber mustard, Brassica parachinensis, Brassica chinensis, small Brassica chinensis, Lactuca sativa, celtuce, film-covered lettuce, romaine lettuce, carrot, white radish, and sprouting garlic. The true-color image is shown in Figure 7a and the corresponding ground-truth map in Figure 7b.

Experimental Description
The high spatial resolution hyperspectral datasets of the cities of Hanchuan and Honghu in China were used to verify the proposed SSF-CRF method. The comparison algorithms were the traditional pixel-based SVM classifier with a radial basis function (RBF) kernel, the object-oriented classification approach based on mean shift segmentation (MS) [30], and several random field-based classification methods: the Markovian support vector classifier (MSVC) [31], the support vector conditional random field classifier with a Mahalanobis distance boundary constraint (SVRFMC) [37], and the detail-preserving smoothing classifier based on conditional random fields (DPSCRF) [54]. MSVC integrates SVM with the MRF model, using the Gaussian RBF as the kernel function and the Potts model as the local prior energy function, and obtains the final classification result through the iterated conditional modes (ICM) algorithm. SVRFMC is a CRF-based classifier in which the spatial term is constrained by a Mahalanobis distance boundary to preserve the spatial details of the classification results. DPSCRF models the interaction between segmentation and classification in the CRF model and uses segmentation to add large-scale spatial contextual information.
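To make the MSVC baseline more concrete, the following is a minimal pure-Python sketch of ICM under a Potts prior, assuming the SVM has already produced per-pixel class probabilities. The function name `icm_potts`, the parameters `beta` and `max_iter`, the 4-neighbourhood, and the negative-log-probability unary term are illustrative assumptions, not the exact formulation of [31].

```python
import math


def icm_potts(probs, beta=1.0, max_iter=10):
    """Iterated conditional modes (ICM) under a Potts smoothness prior.

    probs[r][c][k]: assumed probability of class k at pixel (r, c), e.g. a
    calibrated SVM output. The local energy of label k at (r, c) is
        -log probs[r][c][k] + beta * (number of 4-neighbours labelled != k).
    beta and max_iter are illustrative parameters, not values from the paper.
    """
    rows, cols, n_classes = len(probs), len(probs[0]), len(probs[0][0])
    eps = 1e-12  # guards against log(0)
    # Start from the pixel-wise maximum-probability labelling.
    labels = [[max(range(n_classes), key=lambda k: probs[r][c][k])
               for c in range(cols)] for r in range(rows)]
    for _ in range(max_iter):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]
                # Greedy update: choose the label with the lowest local energy.
                best = min(range(n_classes),
                           key=lambda k: -math.log(probs[r][c][k] + eps)
                           + beta * sum(n != k for n in neighbours))
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:  # reached a local minimum of the energy
            break
    return labels


# Toy 3 x 3 example: the centre pixel weakly prefers class 1,
# while its neighbours strongly prefer class 0.
strong0, weak1 = [0.9, 0.1], [0.4, 0.6]
probs = [[strong0, strong0, strong0],
         [strong0, weak1, strong0],
         [strong0, strong0, strong0]]
print(icm_potts(probs, beta=0.0)[1][1])  # 1: no smoothing, spectral label kept
print(icm_potts(probs, beta=1.0)[1][1])  # 0: Potts prior relabels the isolated pixel
```

ICM is greedy and only reaches a local minimum, which is one reason the initial pixel-wise labelling matters in this kind of model.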
In the experiments, for each algorithm, 1%, 3%, 5%, and 10% of the labeled samples were randomly selected for training, and the remaining 99%, 97%, 95%, and 90% of the samples, respectively, were used for accuracy assessment. Three accuracy measures are used in this paper to assess the quantitative performance: the accuracy of each class, the overall accuracy (OA), and the Kappa coefficient (Kappa) [55].
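The three measures can all be derived from a confusion matrix, as sketched below. This sketch assumes that rows index the reference classes and columns the predicted classes, and that "accuracy of each class" refers to the producer's accuracy; the function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(confusion):
    """Per-class accuracy, overall accuracy (OA) and the Kappa coefficient.

    confusion[i][j] counts validation pixels whose reference class is i and
    whose predicted class is j (rows = reference is an assumed convention).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_tot = [sum(confusion[i]) for i in range(n)]                      # reference totals
    col_tot = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # predicted totals
    # Per-class (producer's) accuracy: correct pixels / reference pixels of that class.
    per_class = [confusion[i][i] / row_tot[i] if row_tot[i] else 0.0 for i in range(n)]
    oa = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement p_e from the marginals; Kappa = (OA - p_e) / (1 - p_e).
    pe = sum(row_tot[k] * col_tot[k] for k in range(n)) / (total * total)
    kappa = (oa - pe) / (1 - pe)
    return per_class, oa, kappa


per_class, oa, kappa = accuracy_metrics([[45, 5], [10, 40]])
print(per_class, oa, round(kappa, 4))  # [0.9, 0.8] 0.85 0.7
```

Kappa discounts the agreement expected by chance, so it is usually lower than the OA for the same confusion matrix.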

Classification Results and Discussion
For the Hanchuan and Honghu datasets, the classification maps obtained by the SVM, MS, MSVC, SVRFMC, DPSCRF, and SSF-CRF algorithms with 1% training samples are shown in Figures 8 and 9, respectively. The corresponding classification accuracies and confusion matrices are provided in Tables 2-5.

Experiment 1: Hanchuan Dataset
The first experiment was conducted on the Hanchuan dataset, for which the MNF transformation reduced the original image from 270 bands to 10 bands. Based on the characteristics of the data, suitable features were selected through several trial experiments to minimize the noise. Four endmembers (shadow, tree, strawberry, and red roof) were then extracted. Four texture features (homogeneity, angular second moment, contrast, and mean) were extracted from the image with a 7 × 7 window. The morphological features were extracted with a disk-shaped structuring element (SE) of size 8.
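As an illustration of the texture step, the sketch below computes gray-level co-occurrence matrix (GLCM) statistics for a single window, covering the measures used for Hanchuan (homogeneity, angular second moment, contrast, mean) and for Honghu (dissimilarity, entropy). The single neighbour offset, the symmetric normalization, the prior quantization of the band, and the function name `glcm_features` are assumptions; in practice the 7 × 7 window would slide over the image, and libraries such as scikit-image provide optimized GLCM routines.

```python
import math
from collections import Counter


def glcm_features(window, offset=(0, 1)):
    """Texture measures from a symmetric, normalised grey-level co-occurrence matrix.

    window: 2-D list of integer grey levels, e.g. a 7 x 7 patch centred on the
    pixel being described (the band is assumed to be quantised beforehand).
    offset: (row, column) displacement of the neighbour; (0, 1) is an
    illustrative choice, not a setting taken from the paper.
    """
    rows, cols = len(window), len(window[0])
    dr, dc = offset
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = window[r][c], window[rr][cc]
                counts[(i, j)] += 1
                counts[(j, i)] += 1  # count both directions -> symmetric GLCM
    total = sum(counts.values())
    p = {pair: n / total for pair, n in counts.items()}  # joint probabilities
    return {
        "homogeneity": sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items()),
        "asm": sum(v * v for v in p.values()),  # angular second moment
        "contrast": sum(v * (i - j) ** 2 for (i, j), v in p.items()),
        "dissimilarity": sum(v * abs(i - j) for (i, j), v in p.items()),
        "mean": sum(v * i for (i, _), v in p.items()),
        "entropy": -sum(v * math.log(v) for v in p.values()),
    }


feats = glcm_features([[0, 1], [0, 1]])
print(feats["contrast"], feats["homogeneity"], feats["mean"])  # 1.0 0.5 0.5
```

Homogeneity and ASM are high for smooth patches, while contrast and dissimilarity grow with local gray-level variation, which is why such measures can help separate fields with similar spectra but different canopy textures.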
As can be seen from Figure 8 and the confusion matrix of SSF-CRF in Table 2, the classification result of the SVM algorithm contains a lot of salt-and-pepper noise because it does not consider the neighborhood spatial contextual information. Figure 8b shows the result of the object-oriented classification approach (MS), and Figure 8c-f shows the results of the random field-based methods (SVRFMC, DPSCRF, MSVC, and SSF-CRF). These classification maps have a better visual appearance, as the neighborhood interaction is taken into consideration. Although all these methods consider spatial contextual information, they differ in detail. As highlighted in the black and red boxes, SSF-CRF better maintains the integrity of the shape structure of the red roof and tree classes, while the other algorithms lose most of these parts and their results still contain classification noise. Furthermore, as displayed in the blue boxes, the soy class is misclassified as pea and tree by most methods, but not by SSF-CRF. Correspondingly, the confusion matrix shows that these three classes are less often misclassified into other categories by SSF-CRF. Overall, the SSF-CRF algorithm not only preserves details and boundary information well but also better distinguishes crops with similar spectra.
The quantitative metrics (the accuracy of each class, the OA, and the Kappa) of the different algorithms are listed in Table 3. The table shows that, compared with the traditional pixel-based classification method (SVM), the object-oriented method (MS) and the random field-based classification methods (SVRFMC, DPSCRF, MSVC, and SSF-CRF) improve the OA and Kappa by more than 3%, which confirms the importance of spatial contextual information for classification. With the spatial feature vector added, the OA of SSF-CRF reaches 94.60%, an increase of about 9% over SVM. SSF-CRF also outperforms the other algorithms in terms of the accuracy of each class. For example, for the soy class, the accuracy of most of the algorithms is below 79%, whereas SSF-CRF achieves an accuracy of 89.26%, which demonstrates that it performs well in separating similar crops and in alleviating the spectral variation and heterogeneity within the same land-cover class. On the whole, SSF-CRF obtains clearly improved classification results in terms of the accuracy of each class, the OA, and Kappa.

Experiment 2: Honghu Dataset
The second experiment was conducted on the Honghu dataset. Based on the characteristics of the data, suitable features were again selected through several trial experiments to minimize the noise. The original 274-band hyperspectral image was reduced to 10 dimensions by the MNF transformation, and the endmember features of bare soil, rape, and film-covered lettuce were extracted. Four texture features (homogeneity, mean, dissimilarity, and entropy) were extracted with a 7 × 7 window. The morphological features were extracted with a disk-shaped SE of size 8.
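The morphological features rely on opening and closing by reconstruction with a disk-shaped SE. A minimal pure-Python sketch follows, assuming that "size 8" denotes the disk radius, that offsets falling outside the image are ignored, and that reconstruction uses 8-connected geodesic dilation; these choices and all function names are illustrative assumptions rather than the paper's implementation. The toy example uses radius 1 so the result can be checked by hand.

```python
def disk(radius):
    """Offsets of a disk-shaped structuring element (SE) of the given radius."""
    return [(dr, dc) for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1) if dr * dr + dc * dc <= radius * radius]


def _filter(img, se, reducer):
    # Grey-level erosion (reducer=min) or dilation (reducer=max); SE offsets
    # that fall outside the image are ignored.
    rows, cols = len(img), len(img[0])
    return [[reducer(img[r + dr][c + dc] for dr, dc in se
                     if 0 <= r + dr < rows and 0 <= c + dc < cols)
             for c in range(cols)] for r in range(rows)]


def reconstruct_by_dilation(marker, mask):
    """Repeat geodesic dilation of marker under mask (8-connectivity) until stable."""
    square = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]  # 3 x 3 neighbourhood
    current = marker
    while True:
        dilated = _filter(current, square, max)
        nxt = [[min(d, m) for d, m in zip(drow, mrow)] for drow, mrow in zip(dilated, mask)]
        if nxt == current:
            return current
        current = nxt


def opening_by_reconstruction(img, radius):
    # Erode with the disk SE, then rebuild surviving structures under the original image.
    return reconstruct_by_dilation(_filter(img, disk(radius), min), img)


def closing_by_reconstruction(img, radius):
    # Dual operator: negate, open by reconstruction, negate back.
    negated = [[-v for v in row] for row in img]
    return [[-v for v in row] for row in opening_by_reconstruction(negated, radius)]


# A 3 x 3 bright block survives a radius-1 opening; a 1-pixel spike does not.
img = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        img[r][c] = 4
img[5][5] = 7
opened = opening_by_reconstruction(img, 1)
print(opened[2][2], opened[1][1], opened[5][5])  # 4 4 0
```

Unlike a plain opening, the reconstruction step restores the exact outline of every structure that survives the erosion, which is why these operators are commonly used when object edges and shapes need to be preserved.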
The classification results in Figure 9 and the confusion matrix of SSF-CRF in Table 4 show that, as with the Hanchuan dataset, the algorithms fusing spatial contextual information improve the classification accuracy and produce smoother classification maps. The SVM result again contains a lot of noise. Figure 9b shows the result of the MS algorithm, which contains less noise because it considers spatial contextual information. Although the random field-based methods (SVRFMC, DPSCRF, MSVC, and SSF-CRF) exhibit a better visual performance, as presented in Figure 9c-f, spectral variation and heterogeneity problems remain for crops with similar spectra. For example, the romaine lettuce in the yellow boxes is almost completely classified as film-covered lettuce by the SVM and DPSCRF algorithms, but it is well preserved in the SSF-CRF classification result. With the help of the spatial features, the sprouting garlic and Brassica chinensis classes in the red and blue boxes keep a relatively complete shape structure in the SSF-CRF result, whereas the results of the other methods are poor.
The quantitative evaluation results in Table 5 clearly show that, by taking the neighborhood interaction into consideration, MS, SVRFMC, DPSCRF, and MSVC achieve higher OAs than SVM, and their accuracies for each class also improve, except for pakchoi and romaine lettuce. Because the spectral differences between pakchoi, romaine lettuce, and the other crops are not obvious, and because these classes cover small areas, they are completely misclassified by SVM and DPSCRF, and the improvement of MS, SVRFMC, and MSVC is also limited. By additionally considering the texture, morphology, and endmember features, the SSF-CRF algorithm effectively distinguishes these classes, obtaining an OA of 97.95%, with the accuracies of most classes exceeding 90%. For the pakchoi and romaine lettuce classes, most of the other algorithms obtain accuracies of around 15% and 44%, respectively, whereas SSF-CRF obtains accuracies of 87.50% and 95.64%, respectively.

Sensitivity Analysis for the Training Sample Size
Both the Hanchuan and Honghu datasets were used to analyze the influence of the training sample size on the different classification algorithms. In this experiment, 1%, 3%, 5%, and 10% of the labeled samples of each class were randomly selected from the corresponding ground-truth map for training, and the remaining 99%, 97%, 95%, and 90% of the samples, respectively, were used as validation samples to evaluate the classification accuracy. The OAs of the different classification algorithms under the different training sample sizes are shown in Figure 10.
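The per-class random sampling described above can be sketched as follows, assuming that the ground-truth map is flattened to a list and that label 0 marks unlabeled pixels. The function `stratified_split`, the fixed seed, and the rule of keeping at least one training pixel per class are hypothetical choices, not details from the paper.

```python
import random
from collections import defaultdict


def stratified_split(ground_truth, fraction, seed=0, background=0):
    """Randomly draw `fraction` of the labelled pixels of each class for training.

    ground_truth: flat list of class labels, one per pixel; pixels equal to
    `background` are treated as unlabelled (an assumed convention).
    Returns (train_indices, validation_indices); every labelled pixel ends up
    in exactly one of the two lists.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(ground_truth):
        if label != background:
            by_class[label].append(idx)
    train, validation = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one training pixel per class (an assumption for tiny classes).
        n_train = max(1, round(fraction * len(indices)))
        train.extend(indices[:n_train])
        validation.extend(indices[n_train:])
    return sorted(train), sorted(validation)


gt = [0] * 10 + [1] * 100 + [2] * 50
for frac in (0.01, 0.03, 0.05, 0.10):
    train, val = stratified_split(gt, frac)
    print(frac, len(train), len(val))
```

Sampling within each class keeps small classes such as pakchoi represented in the training set even at the 1% level, which a single global random draw would not guarantee.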
As can be seen from Figure 10, the classification accuracies of all the algorithms increase with the training sample size. The object-oriented MS algorithm performs better on the Honghu dataset than on the Hanchuan dataset because the Hanchuan dataset is more fragmented. The random field-based classification methods (SVRFMC, MSVC, DPSCRF, and SSF-CRF) behave similarly on both datasets. All the other algorithms are superior to the pixel-based SVM classification algorithm, which considers only the spectral information. In summary, SSF-CRF obtains the best classification performance across the different training sample sizes.

Conclusions
High spatial resolution hyperspectral data contain rich spectral and spatial detail, which makes the land-cover classes and their spatial distribution in the imagery more complicated. Traditional classification methods cannot adequately handle the many crop species and their similar spectral curves. In this paper, the SSF-CRF classification method was proposed for the accurate identification of crops. Based on the characteristics of the data, three representative features were selected from three perspectives (mathematical morphology, spatial texture, and mixed pixel decomposition) and combined with the spectral features to form a spectral-spatial feature vector, which was integrated into the CRF model to alleviate the spectral variation and heterogeneity within the same land-cover class. At the same time, a local class label cost constraint is considered to relieve the over-smoothing of the CRF model. Experiments with two high spatial resolution hyperspectral datasets from the cities of Hanchuan and Honghu in China demonstrated that the SSF-CRF classification method obtains competitive accuracy and visual performance compared with the traditional classification methods.
Due to the characteristics of crop planting in southern China and the limited flight time of UAVs, the experimental datasets used in this paper were small, so the proposed method can be considered suitable for small- and medium-scale crop classification applications. When the method is applied to a wider range of crops, more appropriate features should be selected for the classification, based on the characteristics of the data and crops. In our future work, we will attempt to classify a wider range of crops.

φ indicates the closing reconstruction of image f and the reconstruction of the closing reconstruction image.

Figure 3. Flowchart of the spectral-spatial fusion based on conditional random fields (SSF-CRF) method.

Remote Sens. 2018, 10, x FOR PEER REVIEW 9 of 21

Figure 4. (a) The location of Hubei province in China. (b) Administrative area map of the city of Hanchuan in Hubei province. (c) The study site.

Figure 5. (a) The location of Hubei province in China. (b) Administrative area map of the city of Honghu in Hubei province. (c) The study site.

The city of Hanchuan is located in the central part of Hubei province, China, on the lower reaches of the Han River and in the middle of the Jianghan Plain, where the terrain is flat and low-lying. The area is dominated by a subtropical humid monsoon climate. A wide variety of crops are grown in the area, including rice, wheat, cotton, and rapeseed. The city of Honghu is located in the south-central part of Hubei province, on the middle and lower reaches of the Yangtze River, in the southeast of the Jianghan Plain. The terrain in this region is higher in the north and south of the area. The climatic characteristic of Honghu is similar to that of

Table 3. The classification accuracies for the Hanchuan dataset.

Table 5. The classification accuracies for the Honghu dataset.