Gaofen-3 PolSAR Image Classification via XGBoost and Polarimetric Spatial Information

The launch of the Chinese Gaofen-3 (GF-3) satellite will provide enough synthetic aperture radar (SAR) images with different imaging modes for land cover classification and other potential usages in the next few years. This paper aims to propose an efficient and practical classification framework for a GF-3 polarimetric SAR (PolSAR) image. The proposed classification framework consists of four simple parts including polarimetric feature extraction and stacking, the initial classification via XGBoost, superpixels generation by statistical region merging (SRM) based on Pauli RGB image, and a post-processing step to determine the label of a superpixel by modified majority voting. Fast initial classification via XGBoost and the incorporation of spatial information via a post-processing step through superpixel-based modified majority voting would potentially make the method efficient in practical use. Preliminary experimental results on real GF-3 PolSAR images and the AIRSAR Flevoland data set validate the efficacy and efficiency of the proposed classification framework. The results demonstrate that the quality of GF-3 PolSAR data is adequate enough for classification purpose. The results also show that the incorporation of spatial information is important for overall performance improvement.


Introduction
The Chinese Gaofen-3 (GF-3) satellite carrying a synthetic aperture radar (SAR) sensor, was launched on 10 August 2016. With a powerful imaging capability of 12 modes, GF-3 can provide polarimetric SAR (PolSAR) images with a nominal resolution of 8 m. By transmitting and receiving horizontal and vertical waves, PolSAR data is able to capture four polarization channels, helpful for physical understanding and mechanism interpretation regarding land surface [1]. Many applications such as crop monitoring, biomass estimation, target detection, land cover and land use classification, can be undertaken by choosing appropriate GF-3 SAR data. Among the applications, the classification of PolSAR image plays an important role in PolSAR image analysis. Actually, land cover classification on a real whole PolSAR image is the main task in our cooperation with the National Disaster Reduction Center of China (NDRCC). How to find an efficient framework to classify an entire PolSAR image quickly and accurately is the main concern in practical use.
In the last two decades, researchers have developed many PolSAR image classification algorithms. In general, these algorithms can be divided into three categories called statistical model-based algorithms [2][3][4], scattering mechanism analysis-based algorithms [5][6][7], and algorithms combining standard classifiers and polarimetric features [8]. Many works focus on the third type among these approaches. More and more complicated and efficient classifiers or features are integrated into the PolSAR image classification framework to improve accuracy [9][10][11][12]. For example, many neural network(NN)-based methods are proposed and they have reported very excellent results in recent years [11,[13][14][15]. However, even though accuracy is improved, most classification algorithms and procedures of complicated feature extraction are of high complexity and time-consuming, which makes them impractical to process an entire PolSAR image. Moreover, enough labeled samples for NN-based classifiers are not always available. Therefore, an efficient classifier with high accuracy and low time cost is preferable in practical use, which also explains why tree-based classifiers are still welcome in remotely sensed image analysis [16][17][18][19][20]. In Ref. [16], QUEST (quick, unbiased, efficient, statistical tree [21]) was used as a decision tree (DT) tool to implement the classification. The decision tree was very efficient and provided clear split rules that can be easily interpreted according to the physical understanding of used features. Du et al. [17] used random forest (RF) and rotation forest to investigate polarimetric-spatial features in PolSAR image classification and found that rotation forest performed better while RF was much faster [17]. Pradhan et al. [18] compared the performance of several classification techniques for extracting urban areas. It turned out that the rule-based classifier and DT distinguished land cover classes better than support vector machine (SVM) and K-nearest neighbor (KNN) [18]. In [19,20], hybrid DT and RF were used as classifiers, separately, and showed comparable results. Previous work has shown that the tree-based classifier is very efficient and popular in practical use.
Polarimetric features are often associated with the scattering mechanism of terrain scatterers and represent the information embedded in PolSAR data. The quality of input features basically determines the classification performance of employed classifiers. Besides polarimetric information, spatial information is another important factor that affects classification performance and has been paid more and more attentions. The introduction of spatial features in PolSAR image analysis is inspired by the remarkable improvements resulting from the complementarity between spectral and spatial features in optical image classification [17]. Markov random fields (MRF) modeling, texture information and region-based or object-based algorithms are the three main methodological approaches that can be employed to incorporate spatial information into image analysis [22,23]. Du et al. combined spatial features (consisting of textural metrics and morphological profiles) with polarimetric features for PolSAR image classification [17]. In our previous work, spatial information and statistical model-based data fidelity were incorporated into the classification framework via MRF [24]. In Ref. [25], superpixels were generated by a modified linear iterative clustering algorithm to incorporate spatial context. Another potential benefit due to the incorporation of spatial information is to reduce speckles, which are very common in PolSAR images and adversely affect the classification performance.
To make the classification method more practical in use and incorporate spatial information at each image pixel, we select a recently proposed tree-based classification framework called XGBoost [26] for initial image classification. Spatial information is incorporated via superpixels generated by the statistical region merging (SRM) method [27]. After superpixels are generated, the label of a superpixel is determined in a post-processing step through a modified majority voting strategy, which runs quickly and turns out to be efficient. The remainder of this paper is organized as follows. In Section 2, we introduce the adopted features, related classifiers and the proposed classification framework. In Section 3, we describe the experimental data sets and analyze the classification results obtained on real GF-3 PolSAR images. Additional discussions are presented in Section 4. Conclusions are drawn in Section 5.

Polarimetric Features
Providing more information for land cover classification, PolSAR data can be represented by a scattering matrix S, a covariance matrix C or a coherency matrix T [1]. |HH| 2 , |HV| 2 , and |VV| 2 represent the scattered power of three channels and directly reflect the type of scattering media. With an assumption of reciprocal condition S HV = S V H , the coherency matrix can be expressed by T 3 , of which the diagonal elements are T 11 , T 22 , and T 33 , respectively.
Another useful polarimetric parameter is the circular polarization correlation coefficient ρ [28], which can be expressed as Man-made targets such as buildings exhibit low surface roughness and a high value of |ρ| [29]. Many other techniques also have been proposed to decompose PolSAR data and most of these can be divided into two categories [30]. One category is based on an eigenvector-based decomposition proposed by Cloude [5]. Three eigenvector-based parameters of entropy H, scattering angle α, and anisotropy A can classify a pixel by scattering mechanisms.
where p i correspond to the pseudo-probabilities from the eigenvalues, λ i are the sorted eigenvalues of T 3 , and α i represent scattering mechanism associated with each eigenvector. The other category is model-based decomposition such as Freeman decomposition, which models PolSAR data as the sum of surface, volume, and double-bounce scattering components under an assumption of reflection symmetry [6]. Yamaguchi et al. went further than the Freeman decomposition by introducing the helix component [7].
However, it has been demonstrated that the Yamaguchi decomposition may generate negative powers due to overestimation in the volume scattering, which is obviously nonphysical [30]. Van Zyl et al. tackled the problem by introducing the nonnegative eigenvalue decomposition (NNED) and ensured nonnegative scattering powers of different scattering mechanisms [30].
In this paper, we extract 16 polarimetric features to represent the information embedded in GF-3 PolSAR data. Due to the relation between T 33 and |HV| 2 , only T 33 is taken into consideration. Even though Van Zyl NNED and Yamaguchi decompositions reflect similar scattering mechanisms, we simultaneously use the two decompositions since the extraction procedures are different and Yamaguchi decomposition provides another helix component. Therefore, the total selected features consist of |HH| 2 , |VV| 2 , the diagonal elements of T 3 (T 11 , T 22 , T 33 ), |ρ|, three eigenvector-based parameters from H/α/A model (H, α, A), four components of Yamaguchi decomposition (YP S , YP V , YP D , YP C ) and three components of Van Zyl NNED decomposition (VP S , VP V , VP D ).

XGBoost for PolSAR Image Classification
In PolSAR image classification, the goal is to determine the labels of unknown scattering cells or pixels based on a fine-trained classifier. The adopted XGBoost is short for "extreme gradient boosting" and designed to be a scalable machine learning system for tree boosting. The system runs much faster and gives state-of-the-art results on many problems from data mining challenges [26]. Given n labeled samples with m features D = (x i , y i )(|D| = n, x i ∈ m , y i ∈ ), the tree ensemble method uses K additive functions to predict the label.ŷ is the space of regression trees. q represents the structure of each tree that has T leaves. Each f k corresponds to an independent tree structure q and leaf weights w. To learn the set of functions in the model, XGBoost minimizes the following regularized objective.
Here l is the loss function and Ω is the regularized term. The regularized objective is slightly improved compared to previous gradient tree boosting algorithm. Letŷ i be the prediction of the i-th instance at the t-th iteration. The ensemble model works better in an additive manner. Second-order approximation is used to speed up the optimization procedure.
where g i and h i are first and second order gradient statistics on the loss function. For a fixed structure q(x), the optimal weight w and the corresponding optimal value can be calculated by evaluating the split candidates. Besides the improvements in the regularized objective, several additional techniques are also used to further promote classification performance. Shrinkage and column/feature subsampling are used to prevent overfitting. Column/feature subsampling can also speed up computations of the algorithm. In order to find the best split efficiently, the algorithm visits feature values in sorted order to accumulate the gradient statistics in (8) and an approximate algorithm is summarized to avoid enumerating all possible splits greedily. In system design level, a block structure is used to reduce the cost of sorting, which is the most time consuming part of tree learning. Readers can refer to [26] for detailed information of XGBoost. In case of the mentioned advancements and excellent performance in practical use, XGBoost is adopted for PolSAR image to quickly generate the initial classification map.

Post-Processing via Superpixels and Majority Voting
To incorporate spatial information into the pixel-wise classification framework, we adopt the decision strategy of superpixel-based majority voting, which is similar to [31] except for the decision rule and the generation algorithm of superpixels. Superpixels can be generated via segmentation, which is another important application of PolSAR images. Many superpixel segmentation algorithms suitable for PolSAR data have been proposed by incorporating polarimetric information. Lang et al. improved SRM considering the characteristics of PolSAR data and made the generalized SRM suitable for single-or multi-dimensional SAR data [32]. Xiang et al. modified the simple linear iterative clustering(SLIC) and developed an adaptive superpixel generation procedure with local iterative clustering and spherically invariant random vector (SIRV) [33]. Wang et al. introduced two distance measures and an entropy rate method into the PolSAR image superpixel segmentation [34]. In this paper, statistical region merging is directly applied to the Pauli RGB image to generate superpixels for its rapidity [27,32]. The parameter Q in SRM quantifies the statistical complexity of a perfect scene I * and controls the coarseness of the segmentation. Conceptual simplicity and the ability of coping with significant noise corruption make SRM suitable for superpixel generation of PolSAR image. One way to use superpixels is to apply classification algorithms on superpixels directly. But it is inconvenient for users to select superpixels as training samples and may also decrease the number of available samples. In this paper, the post-processing procedure of superpixel-based majority voting is shown in Figure 1. The direct use of superpixel-based majority voting goes both ways. It may misclassify a superpixel if the majority is misclassified. To weaken such an effect, we modify the majority voting. For a given superpixel sp j with n pixels that are classified into C classes, we construct the normalized histogram H = {H 1 , H 2 , · · · , H C } of class labels and the sorted histogram Hs = {H s1 , H s2 , · · · , H sC } in descending order. If H s1 <= σ 1 and H s2 >= σ 2 , the pixels labeled s2 are kept and the remaining pixels are assigned as the majority label. Otherwise, the label with the highest frequency is assigned to all pixels within the superpixel sp j . The proposed classification framework consists of four simple parts including polarimetric feature extraction and stacking, the initial classification via XGBoost, superpixels generation by SRM based on the Pauli RGB image, and label determination via a post-processing step through superpixel-based modified majority voting. The extended Lee Sigma filter is actually used to pre-process the original GF-3 PolSAR images for speckle filtering [35,36]. The workflow of the method is presented in Figure 2. Fast initial classification via XGBoost and the incorporation of spatial information in the post-processing step via superpixel-based modified majority voting, would potentially make the method efficient in practical use and promote the classification performance.

Experimental Data
In this study, the city of Wuhan, which lies in central China, is chosen as a case study area. Two GF-3 single-look complex (SLC) PolSAR images with quad-polarized strip I (QPSI) mode are captured to evaluate the proposed algorithm. Both images are acquired with similar imaging parameters over Wuhan, China, as shown in Figure 3. The detailed data information is presented in Table 1. Two sub-images specified by the red rectangles in Figure 3 are cropped from the two whole images (images #1 and #2) and we name the sub-images data set A and data set B, respectively. Two optical images acquired by Sentinel-2 [37] on 26 March 2017 are also collected for identifying land cover classes. Figure 4 shows the mosaic image and the two regions of interest.    Figure 5 illustrates Pauli RGB images of the two data sets. Data set A has a size of 656 × 1192 pixels and covers the whole region of Jinyin Lake, Wuhan. Three classes consisting of water, buildings, and vegetation, are identified, as shown in Figure 5b. Data set B has a size of 603 × 976 pixels. The study site is located at a rural region of Hannan District, Wuhan. A total of five classes are considered, which include water, building, farmland, vegetation1 and vegetation2. Vegetation areas are divided into two classes according to the backscattering responses and vegetation types, which is shown in Figure 5d. Since only Sentinel-2 optical images are available and no field survey is conducted, the exact land cover types are not identified. It is speculated that vegetation1 are shrubberies or trees with strong branches and vegetation2 involve grass or reeds that grow in wet places. Vegetation1 show stronger scattered power than vegetation2. Both ground-truth maps shown in Figure 5 are manually created based on the Sentinel-2 optical images in Figure 3. The white areas labeled "None" in both data sets are not included in the final accuracy assessment step.

Classification Results
In this section, initial classification results and related analysis achieved by the proposed framework from real GF-3 PolSAR data are reported. The compared classifiers include DT, RF, and SVM with radial basis function kernel. The parameters of three tree-based classifiers are set default for fair comparison. Parameter tuning is recommended for tree-based classifiers to produce better classification results. A total of n = 2000 labeled samples are randomly selected to train the classifiers. The number of train samples for each class is equal. Remaining samples are used for evaluation purpose. The extended Lee sigma filter with σ v = 0.9 and a 9 × 9 window is adopted for speckle reduction of GF-3 PolSAR data [36]. The empirical parameters of modified majority voting are set σ 1 = 0.8 and σ 2 = 0.1. m = 16 polarimetric features are employed as described in Section 2.1. Overall accuracy (OA) and the confusion matrix are adopted to evaluate the performance.
Samples of five classes consisting of water, building, farmland, vegetation1 and vegetation2, are randomly selected from data set B, image #2. Figure 6 illustrates the 2D scatter plots of different features in pairs. All the features are extracted by PolSARPro 5.1. Water and farmland show higher surface components while the volume components of both vegetation types are obvious. Figure 6g illustrates the scatter plots of five classes basically fit into the zones defined in the H/α plane [5]. However, it also shows that the used GF-3 PolSAR image shows higher values of the entropy H. The |ρ| value, the anisotropy A and the double-bounce component of buildings are higher compared to other land cover classes. The components of Yamaguchi and Van Zyl NNED decompositions reflect similar scattering mechanisms according to corresponding scatter plots. The characteristics of adopted features from GF-3 PolSAR data are basically in accordance with the theorems.    Tables 2 and 3. According to the results, The OAs of DT on both data sets are worse. The other three classifiers preform well and report comparable results on both data sets. For both data sets, XGBoost performs best. From Table 3, it can be seen that the improperly classified regions mainly occur in the building area. The corresponding accuracies obtained by the four classifiers are all below 77%. Figure 9 shows the confusion matrices of both data sets, respectively. For data set A, the misclassification mainly occurs between building and vegetation types. For data set B, about 20% of the building area are misclassified as vegetation areas. The reason is that data set B covers rural region, where trees are planted around buildings. Misclassification between farmland, vegetation1 and vegetation2 results from their similarity. In all, XGBoost reports similar classification results to SVM and RF, but performs much better than DT. However, many misclassified pixels emerge in homogeneous areas for all classification maps, which badly affect the performance. Moreover, the classification results in building areas are not adequate. Some necessary steps should be taken to remove the discrete misclassified pixels and to improve the the overall performance.

Performance Analysis
To further compare performances of different classifiers, more experiments are conducted here. First, the effects of the number of training samples are examined. Figure 10 illustrates the overall classification accuracies for each classifier under different numbers of training samples. 10 to 2000 pixels of the labeled data are randomly selected as training samples. Additionally, 5000 randomly selected pixels are used for testing. Ten repeated experiments are conducted and we take the average as the final OA. Classification accuracies of both data sets generally increase as more training samples are selected. Other conclusions are same to Section 3.2. XGBoost, SVM, and RF report similar results and perform much better than DT. Figure 11 shows the time costs of different classifiers. Both training samples and test samples are randomly selected. The training sample numbers is set to 2000. The number of test samples ranges from 1000 to 10,000. Accordingly, The curves of time costs on two data sets are very similar with each other. The time cost of SVM linearly increases as more samples are used. The time cost of DT is lowest but the classification result is not satisfying. SVM costs more time than XGBoost but report similar classification results. RF costs more time than XGBoost on data set A while the contrary is the case on data set B. XGBoost performs better than RF on both data sets according to Figure 10. In sum, XGBoost is most efficient in case of classification result and time cost.

Final Classification
Based on the initial classification results obtained by XGBoost, the post-processing step is conducted to incorporate spatial information into the final classification map via superpixels and the modified majority voting. Superpixels are generated by SRM based on the Pauli RGB image. The parameter Q in SRM controls the coarseness of superpixels. The post-processing step can also be combined with other classifiers. Figure 12 shows the change of OA for different classifiers when the Q level increases from 1 to 256. Tables 4 and 5 report the Q levels with best OAs on the two data sets, respectively. It is visible that the OAs for both data sets have improved, which demonstrates the effectiveness of incorporating spatial information through superpixels for pixel-wise classification. It can also concluded that the Q level has an impact on the classification result. One point is the Q level should not be set too small.The Q levels with highest OAs for the two data sets are different and should be set accordingly.    Figures 13 and 14 show the generated superpixels and the final classification maps of both data sets with selected Q levels, according to the OA curves in Figure 12. The specific Q levels can be found at Tables 4 and 5. Compared to Figures 7 and 8, the misclassified pixels in a superpixel have been removed, which helps to generate smoothed classification maps. The confusion matrices shown in Figure 15 illustrate detailed changes after the incorporation of spatial information. For data set A, the percentage at which buildings are wrongly assigned as vegetations decreases by 19.42%. The accuracy of building reaches 96.38%. The OA of data set A acquired by XGBoost is 92.20%, which is superior to other classifiers. As for data set B, the accuracies of building, farmland, and vegetations increased. Specifically, the accuracy of farmland areas in data set B is higher by 11% than that of the pixel-wise results. Overall performance and accuracies of most classes improve at the price of the misclassification between building and farmland areas. The incorporation of spatial information significantly contributes to the performance improvement. Even though SVM and XGBoost report comparable OAs according to Table 5, tree-based classifiers, especially XGBoost, are preferable in practical classification application in consideration of accuracy and time costs.

Discussion on Polarimetric Features
In this section, the two GF-3 PolSAR data sets are further processed with six feature sets (FS) to find out how these features contribute to classification results. The detailed content of feature sets can be seen in Table 6. In order not to be affected by spatial information, initial classification results without the post-processing step are analyzed. Tables 7 and 8 report the accuracies by classes and OAs with the six feature sets on both data sets. FS1 produces the lowest OAs and the lowest accuracies of building on both data sets. The H/α/A model improves the results obviously by increasing OAs by 3.96% and 5.4%, respectively. |ρ| contributes to building areas but has not benefited much to overall classification results. Both Yamaguchi and Van Zyl NNED decompositions contribute to building areas and further promote the overall performance. However, the simultaneous use of Yamaguchi and Van Zyl NNED decompositions decreases the accuracy of buildings. In all, H, α, and A are three important parameters for land cover classification. One of the Yamaguchi, Van Zyl NNED or other model-based decompositions reflecting similar scattering mechanisms is recommended. Table 6. Detailed content of feature sets.

Feature Set Polarimetric Features
All the m = 16 features

Discussion on Road Type
In data set A, the roads surrounding Jinyin Lake often appear as vegetation types. This is probably due to the lack of training samples since current land cover classes do not contain roads. Another possible reason is that trees are planted along the roads as shown in Figure 16. Here, some training sites on the road around Jinyin Lake are added into the original ground truth map. Figure 17 shows the updated ground truth map and final classification map with the road type. Some parts of roads are extracted. Some none-road areas are unexpectedly misclassified as roads. Based on the confusion matrix shown in Figure 18, the accuracy of road is only 32.70% and the accuracies acquired by the other three compared classifiers are 29.20% (DT), 34.56% (RF), and 21.21% (SVM). The majority of misclassification can be found in vegetation and building areas for trees and buildings alongside roads. The classification methods are unsuitable for the extraction of roads or other line targets.

Discussion on More Complex Land Cover Classification System
To further evaluate the performance on the data set with a more complex land cover classification system, the AIRSAR Flovland data set is tested by the proposed classification framework. The scene covers over Flevoland, the Netherland with an image size of 750 × 1024 pixels. With well-established ground truth map, the AIRSAR Flevoland data set has been widely used in land use classification since [38]. The land cover classification system consists of eight crop classes and three other classes of bare soil, water, and forest, which is shown in Figure 19. In consideration of the homogeneity of this data set, the empirical parameters of the modified majority voting are set σ 1 = 0.5 and σ 2 = 0.4. The extended Lee sigma filter is adopted and other parameter settings are same to Section 3.  Figure 20 illustrates the change of OA with different Q levels. Table 9 reports the Q level with best OA for different classifiers. The OAs acquired by methods proposed by Lee et al. [38], Tao et al. [10], and Qin et al. [11] are also reported in Table 9 since the same data set has been used and land cover classes are same. Lee et al. used an maximum likelihood classifier based on the Wishart distribution for all polarization combinations to quantitatively assess classification results [38].
Tao et al. developed a classification method with selected hidden polarimetric features in the rotation domain and a SVM/DT classifier. The overall classification accuracy of DT with a window size of 9 × 9 is shown in Table 9 in view of better performance and fair comparison [10]. Qin et al. developed an object-oriented classification method using restricted Boltzmann machines. The best OA of the proposed RBM-Adaboost with 29 units and 88 learners is 96.15% [11]. According to the results, tree-based classifiers surpass SVM by 3%. The OA of XGBoost is higher by 10% than that of Lee's method. Figure 21 shows the generated superpixels and the final classification map acquired by XGBoost. The corresponding confusion matrix is shown in Figure 22. The accuracies of most classes except for beet, are above 90%. Therefore, the proposed method is competitive with other benchmark methods.

Conclusions
In this paper, we proposed an efficient classification framework for GF-3 PolSAR images. With quick initial classification and incorporation of spatial information, the proposed method is hopefully able to classify an entire GF-3 PolSAR image quickly and accurately. We also analyzed the characteristics of features from GF-3 PolSAR data and how the adopted features contribute to classification results. Preliminary experimental results on real GF-3 PolSAR images and the AIRSAR Flevoland PolSAR data have demonstrated their own advantages. XGBoost showed comparable classification results with less time consumed. Spatial information turned out to be an important factor for PolSAR image classification and accounted for the accuracy improvements achieved in the post-processing step via superpixel-based modified majority voting. The results also showed that the quality of GF-3 PolSAR data was adequate enough for land cover classification and other potential usages.
Further improvement is still required. Analysis about more palarimetric parameters extracted from Gaofen-3 data [39] and how to choose essential polarimetric features need to be investigated. The generation of adequate superpixels is worth further exploration. Future efforts should also be devoted to implementing the classification framework in C++, which will further improve efficiency.