A Hybrid Model Based on Superpixel Entropy Discrimination for PolSAR Image Classification

Abstract: Superpixel segmentation is widely used in polarimetric synthetic aperture radar (PolSAR) image classification. However, the classification method using simple majority voting cannot easily handle evidence conflicts in a single superpixel. At present, there is no method to evaluate the quality of superpixel classification. To solve the above problems, this paper proposes a hybrid classification model based on superpixel entropy discrimination (SED), and constructs a two-level cascade classifier. Firstly, a light gradient boosting machine (LGBM) was used to process large-dimensional input features, and simple linear iterative clustering (SLIC) was integrated to obtain the primary classification results based on superpixels. Secondly, information entropy was introduced to evaluate the quality of superpixel classification, and a complex-valued convolutional neural network (CV-CNN) was used to reclassify the high-entropy superpixels to obtain the secondary classification results. Experiments with two measured PolSAR datasets show that the overall accuracy of both classification methods exceeded 97%. This method suppressed the evidence conflict in a single superpixel and the inaccuracy of superpixel segmentation. The test time of our proposed method was shorter than that of CV-CNN, and using only 55% of CV-CNN test data could achieve the same accuracy as using CV-CNN for the whole image.


Introduction
Polarimetric synthetic aperture radar (PolSAR) can obtain images of different polarimetric channels. Compared with traditional single-channel synthetic aperture radar (SAR), PolSAR has rich polarimetric and texture features [1,2], and can obtain more comprehensive target-scattering characteristics. PolSAR images have a wide range of application scenarios, such as terrain classification, building extraction and building damage assessment [3][4][5], among which terrain classification is an important research objective of PolSAR image interpretation.
In the field of terrain classification of PolSAR images, using superpixels for surface feature segmentation is an important step. Superpixel segmentation is an image segmentation technology [6] that is widely used in computer vision tasks such as target detection [7], visual tracking [8] and image quality assessment [9]. Using the spatial and color features of the image, the whole image is condensed into a set of subregions that effectively maintain the local consistency of the image. Compared with traditional pixel images, superpixel images contain less redundant information, which can reduce the complexity of subsequent image processing tasks. PolSAR image interpretation usually focuses on natural feature categories gathered over large areas, such as forests and oceans. In addition, feature categories with traces of human activities, such as farmland and building areas, usually show a regular aggregation form. Therefore, similar ground objects often appear in the form of local coherent aggregation, and an isolated pixel is usually the result of noise or classification error. Each superpixel in a PolSAR image can thus be treated as a single ground object; for example, the fast and efficient simple linear iterative clustering (SLIC) algorithm and its improved variants have been used in this way to suppress the interference of speckle noise [10][11][12].
In the early days of PolSAR image interpretation, the superpixel method was often used together with model-based classification methods. The Wishart distribution-based method [13] uses the polarimetric covariance matrix to derive a Wishart distance model, and uses this distance in combination with superpixel classification. In addition, a variety of classification algorithms based on statistical models are used, such as the classification algorithm based on a Markov random-field model [14] and its variant [15], and the hybrid model of Wishart and Markov random field [16]. However, superpixel-based algorithms depend on the performance of the classifiers. These model-based classifiers rely heavily on accurate statistical models, while PolSAR parameter estimation tasks are sensitive to data obtained from different environments or platforms, which makes it difficult to widely apply the algorithms to various PolSAR datasets [17].
Data-driven machine learning classifiers also combine well with superpixels; for example, random forest (RF) [18], XGBoost (XGB) [19] and support vector machine (SVM) [20] have been used to classify PolSAR image datasets. The input of machine learning algorithms is often obtained from various polarization decomposition methods, such as Cloude-Pottier decomposition [21,22], Freeman decomposition [23] and Yamaguchi decomposition [24]. However, high-dimensional input data lead to the "curse of dimensionality", which has led some researchers to try to select the optimal feature combination [25,26]. Because there are many kinds of ground objects to classify, feature selection remains difficult. These manually selected features limit the performance of the classifier and cannot solve the evidence conflict problem when the local superpixel segmentation is of low quality.
In recent years, deep learning has made outstanding achievements in PolSAR image processing tasks [27]. Benefiting from its end-to-end characteristics, the process of manual feature extraction can be replaced by automatic deep feature extraction. The convolutional neural network (CNN) is a typical representative of deep learning [28]. The improved complex-valued CNN (CV-CNN) can make full use of the amplitude and phase information of images [29,30]; adding a context mechanism and attention module to the network can further automatically extract useful features [31,32], and deep networks such as the capsule network [33] have also been applied to PolSAR image classification. The fusion of multi-temporal SAR data and optical data has achieved higher classification accuracy by using a 2D-CNN-based classifier [34]. These methods have achieved high classification accuracy. However, as the network becomes wider and deeper, the computing resources required by the model increase significantly. In addition, end-to-end networks are difficult to combine with the superpixel method, which makes it difficult to avoid the influence of speckle noise. To solve this problem, the probability distribution of the output of a multilayer autoencoder has been used as a measure, and the k-nearest neighbor (KNN) algorithm has been introduced to improve the classification accuracy of superpixels [35]. Lately, SLIC has been added to the end-to-end process to automatically seek the optimal superpixel segmentation by using a superpixel sampling network (SSN) [36], which improves the effect of superpixel edge fitting. However, the impact of a large number of training iterations on classification efficiency must be considered.
To sum up, in complex local images, the existing superpixel methods cannot fully fit the edges of terrains, and rely heavily on the accuracy of the classifier. In addition, the simple majority voting mechanism cannot reflect the evidence conflict in a single superpixel, which easily leads to a loss of accuracy in terrain edge classification. At present, there are many improved methods for superpixel image segmentation [11,37,38], but there is no method for evaluating the quality of each classified superpixel, which leaves superpixel-based PolSAR image classification incomplete.
In order to solve the above problems, this study proposes a method to evaluate the quality of superpixel classification called superpixel entropy discrimination (SED), and constructs a hybrid classification model. Firstly, a light gradient boosting machine (LGBM) was used to process large-dimensional input features with an extremely fast training speed and a strong feature selection ability [39], and SLIC was used to obtain superpixels, quickly obtaining the primary classification results based on pixel-by-pixel voting in a single superpixel. Secondly, information entropy was used to evaluate the quality of superpixel classification. The features of ground objects in high-entropy superpixels are complex, and it is difficult to express accurate classification information with manually extracted features. CV-CNN was therefore used to reclassify the high-entropy superpixels, extracting features automatically, and the secondary classification results were obtained. The main contributions of our work include:
(1) A superpixel entropy discrimination method was proposed, and a definition of superpixel entropy based on information entropy was given to describe the evidence conflict in a single superpixel, which was used to evaluate the quality of superpixel classification.
(2) A two-level cascade classifier based on LGBM+SLIC and CV-CNN was proposed. The superpixels with high entropy were reclassified by CV-CNN to reduce the accuracy loss caused by evidence conflict in a single superpixel.
(3) The training and testing times of LGBM+SLIC were short. The integrated model could achieve the same accuracy as using CV-CNN for the whole image, which greatly shortened the testing time while maintaining high accuracy.
The rest of this paper is organized as follows: Section 2 introduces the main framework and submodules of our proposed method and gives the derivation process of SED. Section 3 shows the results and analysis of our proposed method for two typical PolSAR datasets. Section 4 discusses the effect of SED under different conditions. Finally, Section 5 presents the conclusion.

Proposed Method
In past research, the combination of superpixel segmentation and a classification algorithm has played an important role in the ground object classification of PolSAR images, and has achieved good classification accuracy. Our proposed method also continues this main technical route and improves on it. We proposed SED based on information entropy to evaluate the quality of single-superpixel classification, and constructed a two-level cascade classifier.

Main Framework
In this paper, we proposed a hybrid model based on superpixel entropy discrimination (SED), which was applied to PolSAR terrain classification. PolSAR image data were used as the input, and the polarimetric features and texture features of the image were obtained in the feature decomposition module. The processed data entered a two-level cascade classification model. In the primary classification module, LGBM+SLIC was used to classify the extracted features. This combination avoids heavy feature selection work and obtains the classification results quickly. In the secondary classification module, the SED method was used to evaluate the quality of superpixel classification, and CV-CNN was used to reclassify the high-entropy superpixel data to obtain the complete image classification results after fusion. The flow of the whole method is shown in Figure 1. In the following, the feature decomposition and the two classification modules are introduced in detail according to the flow, and the derivation process of SED is given in Section 2.4.1.

Feature Decomposition
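The polarimetric scattering matrix S is commonly used to define the features of PolSAR images. The complex matrix S is expressed as:
$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \tag{1}$$
In a monostatic backscattering system, S satisfies the reciprocity condition $S_{HV} = S_{VH}$, and the three-dimensional Pauli scattering vector k is then defined as:
$$k = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH} + S_{VV} & S_{HH} - S_{VV} & 2S_{HV} \end{bmatrix}^{T} \tag{2}$$
The 3 × 3 complex polarimetric coherence matrix T can be obtained:
$$T = \langle k k^{H} \rangle = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix} \tag{3}$$
where $\langle \cdot \rangle$ is the average of the set and the superscript H denotes the conjugate transpose. The off-diagonal elements in (3) are complex-valued, so T is represented by the nine real values $T_{11}$, $T_{22}$, $T_{33}$, $\mathrm{Re}(T_{12})$, $\mathrm{Im}(T_{12})$, $\mathrm{Re}(T_{13})$, $\mathrm{Im}(T_{13})$, $\mathrm{Re}(T_{23})$ and $\mathrm{Im}(T_{23})$.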
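As an illustration of how the nine real coherence-matrix features can be derived, the following minimal sketch builds the Pauli vector for each sample in a local window, averages the outer products, and extracts the real and imaginary parts. It assumes the standard Pauli basis with the $1/\sqrt{2}$ normalization and reciprocity; the function names and the two-sample window are hypothetical and are not taken from the paper.

```python
import math

def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli scattering vector for a reciprocal target (S_HV = S_VH assumed)."""
    c = 1.0 / math.sqrt(2.0)
    return [c * (s_hh + s_vv), c * (s_hh - s_vv), c * 2.0 * s_hv]

def coherency_matrix(samples):
    """Average of k k^H over a list of (S_HH, S_HV, S_VV) complex samples."""
    n = len(samples)
    t = [[0j] * 3 for _ in range(3)]
    for s_hh, s_hv, s_vv in samples:
        k = pauli_vector(s_hh, s_hv, s_vv)
        for r in range(3):
            for c in range(3):
                t[r][c] += k[r] * k[c].conjugate() / n
    return t

def real_features(t):
    """T11, T22, T33 and the real/imaginary parts of T12, T13, T23."""
    return [t[0][0].real, t[1][1].real, t[2][2].real,
            t[0][1].real, t[0][1].imag,
            t[0][2].real, t[0][2].imag,
            t[1][2].real, t[1][2].imag]

# Hypothetical two-sample averaging window.
window = [(1 + 1j, 0.2 - 0.1j, 0.5 + 0j), (0.9 + 0.8j, 0.1j, 0.6 - 0.2j)]
T = coherency_matrix(window)
features = real_features(T)
```

Because T is Hermitian, the lower triangle carries no extra information, which is why nine real numbers are sufficient.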
Polarization decomposition theories include coherent decomposition and target polarization decomposition, the latter being based on the eigenvector or scattering model. They characterize different features of PolSAR images [31]. Although complex features can hardly be represented by a single decomposition, the combination of multiple decomposition methods can represent them from different perspectives.
Then, considering the relationship between adjacent pixels in the PolSAR image, the pixels belonging to a certain class are extremely dependent on their neighborhood space. The texture feature can reflect the spatial distribution characteristics of the image, which is one of the most important features of the PolSAR image. This paper used the gray-level co-occurrence matrix (GLCM) [40] to extract eight texture features of the image.
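To make the GLCM step concrete, the sketch below counts co-occurring gray levels for one pixel offset and derives six of the texture statistics named in this paper (contrast, dissimilarity, homogeneity, ASM, entropy and maximum). It uses the common textbook definitions, a single horizontal offset and base-2 logarithms, all of which are assumptions; the quantization, window size and offsets actually used by the authors are not given in the text, and the image is assumed to be wide enough to contain at least one pixel pair.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel pairs offset by (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Texture statistics of a normalized GLCM p (textbook definitions)."""
    feats = {"contrast": 0.0, "dissimilarity": 0.0, "homogeneity": 0.0,
             "asm": 0.0, "entropy": 0.0, "max": 0.0}
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            feats["contrast"] += v * (i - j) ** 2
            feats["dissimilarity"] += v * abs(i - j)
            feats["homogeneity"] += v / (1 + (i - j) ** 2)
            feats["asm"] += v * v
            if v > 0:
                feats["entropy"] -= v * math.log2(v)
            feats["max"] = max(feats["max"], v)
    return feats

# Hypothetical 3-level quantized patch.
quantized = [[0, 0, 1], [0, 1, 1], [2, 2, 2]]
features = texture_features(glcm(quantized, levels=3))
```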
Reasonably selected features have positive effects on classification results. We deliberately selected different types of features in the feature selection process, and we also tried different combinations, such as Yamaguchi decomposition and Krogager decomposition [41]. Finally, the combination of Cloude-Pottier decomposition, Freeman-Durden decomposition, Pauli decomposition [42] and texture features was found to have a better effect.
Finally, the polarimetric coherence matrix T, nine decomposition parameters and eight texture features were used to constitute the 26-dimensional features for the classification of PolSAR images, as shown in Table 1. H is the scattering entropy, α is the scattering angle, A is the anisotropy, $P_S$ is the surface scattering, $P_D$ is the even scattering, $P_V$ is the volume scattering, $a_{Pauli}$ is the odd scattering of the flat surface, $b_{Pauli}$ and $c_{Pauli}$ are the dihedral angle scattering of the angle reflector with direction angles of 0° and 45°, µ is the mean value, θ is the variance, γ is the contrast, Dis is the heterogeneity, Hom is the homogeneity, ASM is the angular second moment, Ent is the entropy, and Max is the maximum value.

Primary Classification Module
The LGBM is part of the boosting algorithm in the field of ensemble learning [43], and it was developed on the basis of the gradient boosting decision tree (GBDT) [44]. The LGBM has extremely fast training and testing speed, and has a strong feature selection ability due to its unique gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB) algorithms [39], which can avoid the problem of "dimension explosion" caused by the input of large-dimensional features of PolSAR images [45].
The LGBM uses the gradient boosting algorithm to iteratively generate an ensemble of decision trees as the final prediction model. For the pre-specified loss function $L(y, f(x))$ and the input training dataset $T_D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, the LGBM generates the prediction model as follows. Firstly, the initialized prediction model $f_0(x)$ is:
$$f_0(x) = \arg\min_{\rho} \sum_{i=1}^{N} L(y_i, \rho) \tag{4}$$
where $y_i$ represents the label of the training set, ρ represents the initialized output value, and N represents the size of the training set. For iteration $m \in [1, M]$, the negative gradient $\tilde{y}_i$ is calculated as:
$$\tilde{y}_i = -\left[ \frac{\partial L(y_i, f(x_i))}{\partial f(x_i)} \right]_{f(x) = f_{m-1}(x)}, \quad i = 1, 2, \ldots, N \tag{5}$$
where M represents the number of base classifiers. Secondly, the base classifier is fitted:
$$(a_m, \rho_m) = \arg\min_{a, \rho} \sum_{i=1}^{N} \left[ \tilde{y}_i - \rho\, h(x_i; a) \right]^2 \tag{6}$$
where $h(\cdot)$ represents the expression of the base classifiers, i.e., the decision tree, and $a_m$ and $\rho_m$ are the parameters and weight of the m-th base classifier, respectively. Then, the current prediction model is updated:
$$f_m(x) = f_{m-1}(x) + \rho_m h(x; a_m) \tag{7}$$
Finally, the final prediction model $f(x)$ is obtained:
$$f(x) = f_M(x) \tag{8}$$
The LGBM is able to reduce the bias of the model output. Because classification based on superpixels has strong local image consistency, it can suppress the adverse impact caused by the variance of the model output. Therefore, the LGBM and SLIC complement each other.
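The boosting loop described above (constant initial model, negative gradient, base-learner fit, additive update) can be sketched in a few lines. This is a deliberately simplified illustration rather than LightGBM itself: it assumes a squared loss (so the negative gradient is the residual), one-dimensional inputs, depth-1 regression stumps as base learners, and a fixed weight `rho` in place of a per-round $\rho_m$. GOSS and EFB are not modeled, and all names and values are hypothetical.

```python
def fit_stump(xs, targets):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    mean = sum(targets) / len(targets)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for thr in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
        if sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1:]

def fit_boosting(xs, ys, n_rounds=50, rho=0.15):
    """Gradient boosting with squared loss; rho is a fixed step (an assumption)."""
    f0 = sum(ys) / len(ys)  # constant that minimizes the squared loss
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        neg_grad = [y - p for y, p in zip(ys, preds)]  # -dL/df for L = (y - f)^2 / 2
        thr, lv, rv = fit_stump(xs, neg_grad)
        stumps.append((thr, lv, rv))
        preds = [p + rho * (lv if x <= thr else rv) for x, p in zip(xs, preds)]
    return {"f0": f0, "rho": rho, "stumps": stumps}

def predict(model, x):
    """Sum the initial constant and all weighted stump outputs."""
    out = model["f0"]
    for thr, lv, rv in model["stumps"]:
        out += model["rho"] * (lv if x <= thr else rv)
    return out

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_boosting(xs, ys)
```

On this toy step function every round shrinks the residual by the factor `1 - rho`, so `predict(model, 1)` approaches 0 and `predict(model, 6)` approaches 1 as the number of rounds grows.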

Simple Linear Iterative Clustering (SLIC)
The SLIC [46] algorithm is a fast and effective superpixel generation algorithm. The SLIC algorithm flow is shown in Figure 2. The initialization of the algorithm includes:
(1) Converting the input Pauli pseudo-color image to the CIELAB color space, in which the pixel $p_i$ is represented by $CLA_i = (l_i, ca_i, cb_i)$, where $CLA_i$ is a coordinate in the uniform color space, $l_i$ is the brightness of the color, $ca_i$ is the position between red and green, and $cb_i$ is the position between yellow and blue.
(2) Placing cluster centers according to the equal step length L:
$$center_i = (CLA_i, POS_i) = (l_i, ca_i, cb_i, I_{x_i}, I_{y_i}) \tag{9}$$
where $center_i$ is the position of the cluster centers, $POS_i = (I_{x_i}, I_{y_i})$ represents the position of pixel $p_i$ on the image, and $I_{x_i}$, $I_{y_i}$ are the coordinates of pixel $p_i$. The step length L during initialization is calculated as $L = (S/N_c)^{1/2}$, where $N_c$ is the number of cluster centers, and S is the image size.
(3) Moving each cluster center to the position with the lowest gradient in its eight-neighborhood.
SLIC combines color distance and spatial distance to obtain a new distance metric, D, for clustering. The color distance $d_C$ and spatial distance $d_S$ between two pixels, $p_i$ and $p_j$, are calculated as follows:
$$d_C = \sqrt{(l_i - l_j)^2 + (ca_i - ca_j)^2 + (cb_i - cb_j)^2} \tag{10}$$
$$d_S = \sqrt{(I_{x_i} - I_{x_j})^2 + (I_{y_i} - I_{y_j})^2} \tag{11}$$
The distance metric D is obtained:
$$D = \sqrt{\left( \frac{d_C}{M} \right)^2 + \left( \frac{d_S}{L} \right)^2} \tag{12}$$
where M is the maximum color distance in the cluster, and L is the step size when the cluster center is placed, which is used to replace the maximum spatial distance in the cluster.
In actual calculation, M is treated as a constant and (12) is rewritten as:
$$D = \sqrt{d_C^2 + \left( \frac{d_S}{L} \right)^2 M^2} \tag{13}$$
The time complexity of SLIC is linear in the image size S, so SLIC has extremely high computational efficiency. After superpixels are generated, simple majority voting is used to complete the classification of a single superpixel, and then the whole image is traversed.
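The distance metric and the voting step can be illustrated as follows. This sketch assumes the standard SLIC form $D = \sqrt{d_C^2 + (d_S/L)^2 M^2}$ with M treated as a constant compactness weight, and represents each pixel as a hypothetical tuple `(l, ca, cb, x, y)`; the function names are illustrative only.

```python
import math
from collections import Counter

def slic_distance(p, q, step, m):
    """Combined SLIC distance between two pixels given as (l, ca, cb, x, y)."""
    d_c = math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:3], q[:3])))
    d_s = math.sqrt((p[3] - q[3]) ** 2 + (p[4] - q[4]) ** 2)
    return math.sqrt(d_c ** 2 + (d_s / step) ** 2 * m ** 2)

def majority_vote(pixel_labels, superpixel_ids):
    """Assign each superpixel the most frequent per-pixel label inside it."""
    votes = {}
    for label, sp in zip(pixel_labels, superpixel_ids):
        votes.setdefault(sp, Counter())[label] += 1
    return {sp: counts.most_common(1)[0][0] for sp, counts in votes.items()}

# Two pixels with identical color, 5 px apart, step length 5, constant M = 10.
d = slic_distance((50, 0, 0, 0, 0), (50, 0, 0, 3, 4), step=5, m=10)

# Six pixels in two superpixels with per-pixel labels from a primary classifier.
winners = majority_vote(["a", "a", "b", "b", "b", "c"], [0, 0, 0, 1, 1, 1])
```

Majority voting discards the minority labels ("b" in superpixel 0 and "c" in superpixel 1 here), which is exactly the information that a per-superpixel quality measure needs to retain.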

Secondary Classification Module

Superpixel Entropy Discrimination (SED)
In local images with complex feature types, the existing superpixel methods cannot fully fit the feature edges, and rely heavily on the accuracy of the classifier. The simple majority voting mechanism also cannot reflect the evidence conflict within a single superpixel, which is highly likely to lead to a loss in the accuracy of feature edge classification. At present, there is no method to evaluate the quality of each classified superpixel.
In a local region, information entropy can be used to accurately measure the difference between two pixels [2,47,48]. In the same way, information entropy can also measure the difference between superpixels. We call this superpixel entropy discrimination (SED). This paper selected the classification proportion of the LGBM in a single superpixel to construct the superpixel entropy, which describes the degree of evidence conflict in the superpixel.
First, we calculate the classification proportion in the superpixel. When the classification categories are determined, this proportion is equivalent to the classification probability within the superpixel:
$$P(x_i) = \frac{N_i}{N_s}, \quad i = 1, 2, \ldots, n \tag{14}$$
where $x_i$ is the classification category, n is the number of classification categories, $N_s$ is the number of pixels in a single superpixel, and $N_i$ is the number of pixels in the superpixel assigned to category $x_i$. Then, the information entropy of each superpixel, $H(s)$, is constructed:
$$H(s) = -\sum_{i=1}^{n} P(x_i) \log P(x_i), \quad s \in A \tag{15}$$
where A is the collection of superpixels of the whole image.
If the classification results for all pixels in a superpixel tend to be consistent, the superpixel entropy will be small, which means that the classification of the superpixel is of high quality. If there are evidence conflicts between the classification results, the superpixel entropy will be large, which means that the classification of the superpixel is of low quality.
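A minimal sketch of the superpixel entropy is given below. It takes the per-pixel labels predicted inside one superpixel, turns the label counts into proportions, and returns their Shannon entropy. The base-2 logarithm is an assumption (the text writes only "log"), and the function name and example labels are hypothetical.

```python
import math
from collections import Counter

def superpixel_entropy(labels):
    """Shannon entropy (base 2, assumed) of the label proportions in one superpixel."""
    n_s = len(labels)
    counts = Counter(labels)
    # Categories that do not occur contribute nothing, so only observed ones are summed.
    return sum(-(c / n_s) * math.log2(c / n_s) for c in counts.values())

consistent = superpixel_entropy(["wheat"] * 8)                 # all pixels agree
conflicted = superpixel_entropy(["wheat"] * 4 + ["peas"] * 4)  # evenly split
```

A superpixel whose pixels all agree has zero entropy, while an even split between two categories gives one bit, which matches the qualitative description of high-quality and low-quality superpixels above.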
According to the extremal property of information entropy, there is an upper limit on the entropy of each superpixel, which is related to the number of classification categories n: $0 \le H(s) \le \log n$. Assuming that the probability of the largest category in a single superpixel is $P_m$ ($1/n \le P_m \le 1$), the entropy of a single superpixel can be expressed as:
$$H(s) = -P_m \log P_m - \sum_{i \ne m} P_i \log P_i \tag{16}$$
The following optimization problem can be constructed:
$$\max \; -\sum_{i \ne m} P_i \log P_i \quad \text{s.t.} \quad \sum_{i \ne m} P_i = 1 - P_m \tag{17}$$
where $P_i$ represents the probability of the other categories. Let us construct the Lagrange function:
$$F(P_i, \lambda) = -\sum_{i \ne m} P_i \log P_i + \lambda \left( \sum_{i \ne m} P_i - (1 - P_m) \right) \tag{18}$$
where λ represents the Lagrange multiplier. Solving Equation (18) gives:
$$P_i = \frac{1 - P_m}{n - 1}, \quad i \ne m \tag{19}$$
Substituting (19) into Formulas (16) and (17):
$$H(s) \le -P_m \log P_m - (1 - P_m) \log \frac{1 - P_m}{n - 1} \tag{20}$$
Let $H_D$ be the discrimination threshold of superpixel entropy, and let $H_D$ equal the maximum value in (20):
$$H_D = -P_m \log P_m - (1 - P_m) \log \frac{1 - P_m}{n - 1} \tag{21}$$
This transformation is mainly used to facilitate the evaluation of the quality of superpixel classification. Superpixel entropy still has all the properties of information entropy. The uncertainty in a high-entropy superpixel is mainly caused by the error of superpixel edge segmentation or by large-scale classification errors of the primary classifier. In this case, it is not feasible to assign the majority category to the whole local area, so we used CV-CNN to reclassify it.
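The threshold can be evaluated numerically. The sketch below assumes the closed form $H_D = -P_m \log_2 P_m - (1 - P_m) \log_2 \frac{1 - P_m}{n - 1}$, i.e. the entropy obtained when the probability left over by the largest category is spread uniformly over the other n − 1 categories, with base-2 logarithms. Under these assumptions it reproduces the threshold values reported in this paper for $P_m = 0.75$ with 15 categories (Flevoland) and 5 categories (San Francisco). The function name is hypothetical.

```python
import math

def sed_threshold(p_m, n_classes):
    """Entropy threshold H_D for a largest-category share p_m among n_classes."""
    if not 1.0 / n_classes <= p_m <= 1.0:
        raise ValueError("p_m must lie in [1/n, 1]")
    if p_m == 1.0:
        return 0.0  # a pure superpixel has no uncertainty
    rest = 1.0 - p_m
    return -p_m * math.log2(p_m) - rest * math.log2(rest / (n_classes - 1))

print(round(sed_threshold(0.75, 15), 4))  # 1.7631
print(round(sed_threshold(0.75, 5), 4))   # 1.3113
```

At $P_m = 1/n$ the expression reduces to $\log_2 n$, the upper limit of the entropy, and it falls to 0 as $P_m$ approaches 1, which is consistent with the negative correlation between $H_D$ and $P_m$ noted in the experiments.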

Complex-Valued Convolutional Neural Network (CV-CNN)
Compared with manually extracting image features, deep learning methods, such as CNN, can deeply mine the joint features between adjacent pixels through the convolution-pooling process. Applications to PolSAR images have shown that this automatic feature extraction classification method performs much better than the manual feature extraction classification method, especially in local images with complex terrain, ground object boundaries and noise points [28,49,50]. At present, quite a number of PolSAR image studies use CNN-related classification methods. In order to reduce the complexity of the whole network, we used CV-CNN to reclassify the pixels within the high-entropy superpixels, using the combination of two convolution-pooling layers and two fully connected layers. The specific network hyperparameter settings follow [36].

Experimental Setup
The two measured datasets used in the experiments in this paper were the image of Flevoland in the Netherlands taken by AIRSAR on an airborne platform at the L band and the image of San Francisco in the United States taken by Radarsat-2 on a spaceborne platform at the C band. These two datasets are often used for PolSAR image classification. Pauli pseudo-color images and ground truth maps of the two datasets are presented in Figure 3. The size of the Flevoland image was 750 × 1024, and there were 15 different types of ground objects in total. The ground truth of this image was drawn with reference to [36]. The size of the San Francisco image was 1800 × 1380, and there were five different types of ground objects. The ground truth of this image was drawn with reference to [51]. The black part in the ground truth is the unlabeled area. Our proposed method was compared with seven PolSAR image classification methods, including SVM, RF and XGB in machine learning and RV-CNN in deep learning. In order to compare the functions of each part of the model proposed in this paper, we also listed the classification effects of only using LGBM, only using LGBM+SLIC and only using CV-CNN. The input of SVM was the nine real and imaginary parts generated by the polarimetric coherence matrix T. XGB and RF were also combined with SLIC, and their input was the same as that used for the proposed method. The input of RV-CNN was a tensor with a size of 12 × 12 × 9 generated from the polarimetric coherence matrix T.
In the experiments, 9% of the labeled pixels were selected as the training set and 1% as the validation set. We used the whole image as the test set, and used the overall accuracy (OA) and kappa coefficient to evaluate the classification performance. The experiments were run in Python on an Intel i7-11700 CPU.
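For reference, the two evaluation metrics can be computed from paired label lists as in the sketch below. It uses the usual definitions: OA is the fraction of correctly labeled pixels, and the kappa coefficient is $(p_o - p_e)/(1 - p_e)$, where $p_e$ is the agreement expected by chance from the marginal class frequencies. The function name and the toy labels are hypothetical.

```python
def overall_accuracy_and_kappa(y_true, y_pred):
    """Return (OA, Cohen's kappa) for two equally long label sequences."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement from the marginal frequencies of each class.
    p_e = sum((list(y_true).count(c) / n) * (list(y_pred).count(c) / n)
              for c in classes)
    if p_e == 1.0:
        return p_o, 1.0  # degenerate case: a single class everywhere
    return p_o, (p_o - p_e) / (1.0 - p_e)

oa, kappa = overall_accuracy_and_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

In this toy case three of four pixels are correct (OA = 0.75), the chance agreement is 0.5, and the kappa coefficient is therefore 0.5.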

Classification Results of Flevoland Dataset
In the Flevoland dataset, the number of superpixels was set to 592, the number of base classifiers of the LGBM was set to 600, the maximum depth of base classifiers was set to 9, and the learning rate was set to 0.15. All the above hyperparameters were selected through 10-fold cross-validation. CV-CNN used a 12 × 12 local image around a single pixel to represent the category of the pixel, and the size of each input was a 12 × 12 × 6 tensor. Zero padding was performed on the outermost layer of the whole image. To prevent the network from overfitting, we set the learning rate of CV-CNN to 0.5 and the number of iterations to 50. According to (20), the threshold $H_D$ for SED is negatively correlated with $P_m$. $P_m$ was set to 0.75, and its corresponding $H_D$ = 1.7631, which will be discussed in detail in the next section. The classification results of the Flevoland dataset are shown in Table 2, and the classification output images are shown in Figure 4. Due to the limitations of manually extracted features, SVM and LGBM, which are not combined with superpixel segmentation, were seriously affected by speckle noise, resulting in their low classification accuracy, as shown in Figure 4a,b. LGBM-SLIC, XGB-SLIC and RF-SLIC had obvious recognition effects on wheat, grass and water because superpixel segmentation has good segmentation effects on these categories. The OA of these three methods was close, as seen in Figure 4c-e. Because RV-CNN only used the amplitude of the polarization coherence matrix as input, its overall performance was not as good as that of the classification methods based on superpixels, as seen in Figure 4f. Because the superpixel-based methods were limited by the performance of the classifier and the inaccurate edge segmentation of the features, CV-CNN performed better than the above methods, as shown in Figure 4g. However, it was evident that RV-CNN and CV-CNN could not identify buildings. This was due to the combination of few building samples and few iterations. This phenomenon can be avoided by increasing the iterations from 50 to 100. The OA of CV-CNN then increased to 98.01%, and the OA of our proposed method also increased to 98.48%; nevertheless, the training time and the possibility of overfitting increased accordingly.
As seen in Table 2, our proposed method achieved the best classification results for a total of seven types of ground objects. This method performed best for peas, bare soil and barley, for which its accuracy was 1.02%, 0.72% and 1.96% higher, respectively, than that of the second most accurate method. In other categories, such as grape and rapeseed, the accuracy of our proposed method was always higher than that of CV-CNN. Improving the performance of CV-CNN would also improve the accuracy of our proposed method, because the secondary classification module uses CV-CNN for reclassification. The proposed method also achieved the best OA and kappa coefficient. The OA was 1.2% higher than that of CV-CNN, which had the highest score among the other seven classification methods, and the kappa coefficient reached 0.9709. This is because this method improved the accuracy of edge classification of the superpixels. Therefore, in the Flevoland image classification experiment, our proposed method achieved the optimal classification results.

Classification Results of San Francisco Dataset
In the San Francisco dataset, the number of superpixels was set to 2148, the number of base classifiers was set to 1000, and the rest of the hyperparameter settings were the same as those described in Section 3.2. The classification results for the San Francisco dataset are shown in Table 3, and the classification image is shown in Figure 5. According to Table 3, our proposed method achieved the best classification results for water, vegetation, low-density urban and slashed urban, and nearly had the highest accuracy for high-density urban. It can be seen from the comparison of the subgraphs in Figure 5 that this method further avoided the influence of speckle noise.
The OA of this method reached 97.52%, which is 1.1% higher than that of CV-CNN, and the kappa coefficient reached 0.9643, which is also higher than that achieved by the other seven classification methods. In short, in the San Francisco image classification experiment, our proposed method achieved the best results. In fact, the performance of the model was affected by several parameters, such as the number of superpixels k and the superpixel entropy threshold $H_D$. The parameter settings applied to the datasets did not necessarily give the best possible performance. We will discuss this in detail in the next section.

Discussion
The performance of our proposed method was affected by the number of superpixels k and the superpixel entropy threshold $H_D$. In this section, we first carefully observe the impact of SED on the edge segmentation of ground objects, and then discuss the performance changes in the model with different hyperparameters.

Classification Effect of SED
Focusing on the Flevoland dataset of Section 3.2, Figure 6 shows the performance of the superpixel entropy. Considering Figure 6d-f, pay attention to the forest area in the green box. LGBM+SLIC classified this region incorrectly, whereas CV-CNN classified it correctly. Through SED, our proposed method could use the results of CV-CNN to obtain the correct classification. On the contrary, as seen for the building area in the bottom-right box, CV-CNN did not correctly classify this, but SED judged that the superpixel quality of the area was high enough, and locked the final result to the correct classification of LGBM+SLIC. As seen for the box on the left side of the image, the terrain edge classification accuracy for peas and wheat was also significantly improved, indicating that SED is helpful for the classification of feature edges.

Configuration of SED
In the experiments, we manually specified $P_m$ = 0.75, which corresponds to the largest category accounting for 75% of a single superpixel. We further explore the relationship between $H_D$ and $P_m$ in Figure 7. The superpixel entropy thresholds were $H_D$ = 1.7631 for Flevoland and $H_D$ = 1.3113 for San Francisco. If $H(s)$ was higher than this value, the corresponding superpixel was reclassified by CV-CNN. However, this is not necessarily the value that gives the optimal accuracy, so we investigated the influence of $P_m$ and k on OA.
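The decision rule described above can be sketched as a small fusion routine: superpixels whose entropy stays at or below the threshold keep the majority label of the primary classifier, and only the pixels of high-entropy superpixels are passed to the secondary classifier. The callable `secondary` stands in for CV-CNN inference, and all names, the base-2 logarithm and the toy threshold of 0.9 are assumptions made for illustration.

```python
import math
from collections import Counter

def fuse(primary_labels, superpixel_ids, secondary, h_d):
    """Return final per-pixel labels for a two-level cascade with threshold h_d."""
    groups = {}
    for idx, sp in enumerate(superpixel_ids):
        groups.setdefault(sp, []).append(idx)
    final = list(primary_labels)
    for idxs in groups.values():
        counts = Counter(primary_labels[i] for i in idxs)
        n_s = len(idxs)
        entropy = sum(-(c / n_s) * math.log2(c / n_s) for c in counts.values())
        if entropy > h_d:
            for i in idxs:  # evidence conflict: reclassify pixel by pixel
                final[i] = secondary(i)
        else:
            winner = counts.most_common(1)[0][0]  # simple majority voting
            for i in idxs:
                final[i] = winner
    return final

calls = []
def secondary(i):
    """Hypothetical stand-in for the secondary classifier; records which pixels it sees."""
    calls.append(i)
    return "b"

primary = ["a", "a", "a", "b", "a", "b", "a", "b"]
superpixels = [0, 0, 0, 0, 1, 1, 1, 1]
result = fuse(primary, superpixels, secondary, h_d=0.9)
```

Only the four pixels of the evenly split superpixel reach the secondary classifier here, which illustrates how the cascade can reduce the amount of data processed by the slower model.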
Keeping the models of LGBM-SLIC and CV-CNN unchanged, we considered the changes in OA under different numbers of superpixels, k, and superpixel entropy thresholds, $H_D$. We set the initial placement step of each superpixel to 34-57, and the corresponding number of superpixels to 244-750, which is a moderate range of computing resource consumption. The threshold $H_D$ for SED was negatively correlated with $P_m$. We varied $P_m$ from 0 to 1 in steps of 0.01 to change $H_D$. After 2400 experiments, Figure 8 was obtained. When $P_m$ = 0 or 1, point A and point D on the curve represent the $OA_{ave}$ when using LGBM+SLIC or CV-CNN, respectively, for all data in the proposed model. Point C indicates that the maximum value was obtained when $P_m$ = 0.73. At this point, $OA_{ave}$ = 97.16%, which is 0.94% higher than that of CV-CNN. Point B indicates that when $P_m$ = 0.60, the $OA_{ave}$ of our proposed method was the same as that of CV-CNN, but only 55.02% of the data were tested using CV-CNN. Because LGBM+SLIC is fast, our proposed method could greatly shorten the test time needed to process the whole image, which was 33% shorter than that of CV-CNN at the same accuracy of 96.20%, and the increase in the training time cost was almost negligible. The comparison of testing time cost is shown in Table 4. In addition, on the basis of Figure 8, we also used XGB and RF to replace the LGBM in our primary classification model. The average OA curve is shown in Figure 9a. We performed the same operation on the San Francisco dataset, as shown in Figure 9b. The initial placement step of each superpixel for the San Francisco dataset was 32-61, and accordingly, the corresponding number of superpixels was 690-2425. In terms of the overall trend, the largest improvement of $OA_{ave}$ in the two images was concentrated in $P_m$ = 0.7-0.8. According to the experimental results, our manually chosen value of $P_m$ = 0.75 is reasonable. We also tested a different form of the threshold $H_D$ of superpixel entropy: we found the maximum value of all superpixel entropies in the whole image, and set the threshold coefficient K:
$$H_D = K \cdot \max_{s \in A} H(s) \tag{22}$$
We used (22) instead of (21) to implement classification on the two datasets according to the above method and also plotted the $OA_{ave}$-K curves in Figure 9c,d.
As seen in Figure 9, there was a difference in accuracy between the curves, which is related to the performance of the classifier itself. Most notably, whether $P_m$ or K was used to design $H_D$, our proposed method always had a maximum value higher than the OA of either the SLIC-based method or CV-CNN alone for these two datasets, which shows that the experimental results in Section 3 are not an isolated phenomenon, and that the SED and hybrid model proposed in this paper are effective. In addition, in the current model, the number of superpixels and $H_D$ still need to be manually specified. The fact that a maximum value always appears indicates that SED could be added to an end-to-end model, in which these parameters would be learned automatically through the loss function; this is the direction of our further research.
This paper has some limitations. One limitation is that SED cannot help when the entropy of a superpixel is below the threshold H D but the dominant classification in that superpixel is wrong. When only pixel-by-pixel voting is used, a classification error of the primary classifier still causes a loss of OA in the superpixel classification. Preventing this situation places higher requirements on the performance of the primary classifier.
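The first limitation can be illustrated with a small constructed case. The class names and pixel counts below are hypothetical, the base-2 entropy is an assumption, and the threshold is the H D value quoted in the caption of Figure 6. The primary classifier is confidently wrong, so the entropy is low and the superpixel is never sent for reclassification.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the predicted labels in one superpixel."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


H_D = 1.7631          # threshold quoted in the caption of Figure 6
truth = "wheat"       # hypothetical true class of the whole superpixel
predicted = ["barley"] * 95 + ["wheat"] * 5  # hypothetical primary predictions

flagged = entropy(predicted) >= H_D               # would SED reclassify it?
voted = Counter(predicted).most_common(1)[0][0]   # majority-vote label

print(flagged, voted)  # False barley
```

Because the superpixel is not flagged, the wrong majority label is kept for all 100 pixels, which is why the quality of the primary classifier bounds what SED can recover.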
The other limitation is that our proposed method performed well on low-resolution images, but its effect on high-resolution images remains to be verified. In future work, we plan to explore the application of SED to high-resolution PolSAR images. Because the information in high-resolution images is large in volume and complex, it is difficult to classify directly at the pixel level, so patch-based classification algorithms are preferred [27]. SED combines superpixel classification with a patch-based algorithm, which makes it possible to transfer it to high-resolution images. Moreover, most current methods for high-resolution PolSAR images have a complex network structure, a large number of parameters and a long prediction time [52]. Our proposed method simplified the model and reduced time consumption while remaining accurate, so it also has application potential for high-resolution images.

Conclusions
In this paper, a superpixel entropy discrimination method was proposed, in which information entropy was introduced to assess the classification quality of superpixels. This paper also proposed a hybrid model based on SED. Firstly, an LGBM was used to filter large-dimensional features, and SLIC was combined with it to quickly obtain superpixel classification results. After the classification quality of the superpixels was determined by SED, CV-CNN was used to reclassify the low-quality superpixels. The results show that our proposed method can improve the classification accuracy, and that SED can effectively evaluate the quality of superpixel classification. In the future, we have two main research directions. One is to add the superpixel entropy to an end-to-end process, and the other is to explore the application of our proposed method to high-resolution PolSAR images. Simplifying the model and reducing time consumption while maintaining performance is a major challenge, and our proposed method is promising for these directions.

Figure 1. Schematic diagram of the proposed method.

(1) If H(s) < H D, one classification dominates in a single superpixel. The superpixel has high classification quality. The uncertainty in this superpixel is mainly caused by speckle noise or small-scale classification errors of the primary classifier. It is feasible to use the maximum classification instead of local region classification.
(2) If H(s) ≥ H D, multiple classifications may account for similar proportions in a single superpixel. This kind of superpixel has low classification quality.
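The two cases above can be sketched as a small decision function. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, H(s) is assumed to be the base-2 Shannon entropy of the primary classifier's pixel labels within the superpixel, and the threshold value is the one quoted in the caption of Figure 6.

```python
import math
from collections import Counter


def superpixel_entropy(labels):
    """Assumed H(s): Shannon entropy, in bits, of the pixel labels in one superpixel."""
    if not labels:
        raise ValueError("a superpixel must contain at least one pixel")
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def sed_decision(labels, h_d):
    """Return (majority_label, needs_reclassification) for one superpixel.

    Case (1), H(s) < H D: keep the majority label for the whole superpixel.
    Case (2), H(s) >= H D: flag the superpixel for the secondary classifier.
    """
    majority = Counter(labels).most_common(1)[0][0]
    return majority, superpixel_entropy(labels) >= h_d


H_D = 1.7631  # value quoted in the caption of Figure 6

# One dominant class with a few stray pixels: low entropy, majority vote is kept.
print(sed_decision(["a"] * 9 + ["b"], H_D))           # ('a', False)
# Four classes in equal shares: entropy of 2 bits, flagged for reclassification.
print(sed_decision(["a", "b", "c", "d"] * 5, H_D)[1])  # True
```

When a superpixel is flagged, the majority label returned here would be discarded and its pixels passed to the secondary classifier instead.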

Figure 3. AIRSAR Flevoland and RS-2 San Francisco datasets. (a) Pauli pseudo-color image of Flevoland. (b) Ground truth of Flevoland. (c) Labels of Flevoland. (d) Pauli pseudo-color image of San Francisco. (e) Ground truth of San Francisco. (f) Labels of San Francisco.

Figure 6. Performance of the superpixel entropy when k = 592, P m = 0.75, H D = 1.7631. (a) Superpixels on the Pauli pseudo-color image. (b) Superpixels distinguished by SED. Red represents superpixels reclassified using CV-CNN. (c) Heat map of superpixel entropy, which also reflects the degree of evidence conflict in each superpixel. (d) Full-image result of LGBM+SLIC. (e) Full-image result of CV-CNN. (f) Full-image result of our proposed method.

Figure 8. OA ave -P m curve. The curve shows the OA averaged over the different numbers of superpixels, k, which ranged from 244 to 750. The percentage of data represents the proportion of LGBM+SLIC and CV-CNN data used by our proposed method in the test set.

Figure 9. Average OA curves when LGBM, XGB and RF were used as the primary classifier. (a) OA ave -P m curve for Flevoland. (b) OA ave -P m curve for San Francisco. (c) OA ave -K curve for Flevoland. (d) OA ave -K curve for San Francisco.

Table 2. Accuracy results of Flevoland Dataset.

Table 3. Accuracy results of San Francisco Dataset.

Table 4. Comparison of time cost.