Experiments were carried out on three typical datasets. The performance of the MFCN+LPP algorithm was quantitatively and qualitatively compared with the standard FCN- and LPP-based approaches. The discussion and analysis are presented in this section.
4.2. Experimental Settings and Parameter Tuning
In the experiments, all datasets were first processed by multi-looking, a basic speckle-suppression method for SAR images. Drawing on transfer learning, different pseudo-color maps were fed into pre-trained multiple-FCN-8s (MFCN) models to learn multi-scale deep spatial features automatically; during this process, polarimetric characteristics and spatial information were adaptively fused. The outputs of the last layer of each FCN-8s were then stacked to form spatially polarized fused features. From these high-dimensional fused features (whose labels are known in advance), a fixed proportion was randomly selected as the training set, and the remainder was used for testing. The training set was used to construct an intrinsic graph and a penalty graph and thereby learn the mapping (represented by a matrix Q) from the high-dimensional feature space to a manifold subspace. Finally, an SVM classifier with a Gaussian kernel was trained to obtain the final classification results on the test samples.
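The stacking step above, in which the last-layer outputs of the parallel FCN-8s branches are concatenated into one per-pixel feature vector, can be sketched as follows. This is a minimal illustration only; the function name, branch count and map shapes are hypothetical, not the paper's implementation.

```python
import numpy as np

def fuse_mfcn_features(fcn_outputs):
    """Stack per-pixel score maps from several FCN-8s branches
    (each assumed H x W x C) into one spatially polarized
    feature vector per pixel."""
    fused = np.concatenate(fcn_outputs, axis=-1)   # H x W x (n_branches * C)
    return fused.reshape(-1, fused.shape[-1])      # (H*W) x D feature matrix

# toy illustration: seven branches, each producing a 4-channel score map
outputs = [np.random.rand(8, 8, 4) for _ in range(7)]
features = fuse_mfcn_features(outputs)
print(features.shape)  # (64, 28)
```

Each row of the resulting matrix is then a candidate training or test sample for the LPP stage.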
To robustly evaluate the classification performance of the proposed algorithm, the overall accuracy (OA), kappa coefficient and confusion matrix were employed as evaluation indicators, and comparative experiments were conducted with the following algorithms.
- (a) LPP: Locality preserving projection finds a manifold representation of the polarimetric features in a low-dimensional subspace without considering the spatial relationships between pixels.
- (b) FCN: A single pre-trained FCN is adopted to learn the multi-scale spatial structure of the PolSAR data. This nonlinear feature is a deep, abstract semantic representation with relatively simple polarimetric properties.
- (c) FCN+LPP: A single FCN extracts the multi-scale spatial features of the PolSAR imagery, which then undergo dimensionality reduction by the LPP algorithm for classification.
- (d) MFCN+LPP: The algorithm proposed in this paper. First, to describe various ground objects effectively, seven polarimetric decompositions are fed into multiple parallel FCN-8s models to learn multi-scale deep spatial features. Then, through the manifold graph embedding model (LPP), a compact manifold representation of the high-dimensional spatially polarized features from the MFCN is extracted for final classification.
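The OA and kappa coefficient used to compare these algorithms can both be computed directly from a confusion matrix. A minimal sketch (the toy matrix is illustrative):

```python
import numpy as np

def oa_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    confusion = np.asarray(confusion, dtype=float)
    n = confusion.sum()
    po = np.trace(confusion) / n                              # observed agreement (OA)
    pe = (confusion.sum(0) * confusion.sum(1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1 - pe)

oa, kappa = oa_and_kappa([[45, 5], [10, 40]])
print(round(oa, 2), round(kappa, 2))  # 0.85 0.7
```

Kappa corrects the OA for the agreement expected by chance, which is why it is reported alongside the OA throughout this section.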
The classification algorithm proposed in this paper can be divided into three stages: feature learning and fusion, dimensionality reduction, and classification. In the first stage, millions of parameters in the FCN model must be learned. Considering the cost and scarcity of PolSAR data, this paper draws on transfer learning and employs FCN-8s models pre-trained on the PASCAL VOC 2011 dataset; the well-trained model parameters are transferred to learn the characteristics of the PolSAR data.
During this process, the original image is cut into four small patches; three patches are selected as the training set, and the remaining one is used as the test set. After four successive cycles, classification results for the whole image are obtained. Seven typical decomposition methods (the Pauli, Cloude, Freeman, Huynen, Yamaguchi and Krogager decompositions, together with one further decomposition) were adopted in the experiments. Pseudo-color images from some of the decompositions and the corresponding constituents of the R, G and B channels are shown in Figure 8 and Table 1, respectively. Seven parallel FCNs, taking the pseudo-color images from the different decompositions as input, learned the multi-scale deep spatial features adaptively, achieving the simultaneous fusion of polarimetric properties and spatial information.
During the second stage, dimensionality reduction is conducted based on the manifold graph embedding model, and the training and test sets are divided. According to our previous research [31], the training ratios for Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2 were set to 5%, 3% and 5%, respectively. In addition, establishing the manifold graph embedding model involves two further parameter choices: designing the weight matrix W and specifying the subspace dimension k. For the weight matrix W, a heat kernel function (W_ij = exp(−‖x_i − x_j‖²/t), with Tikhonov regularization of strength 0.1) is chosen. The curve of classification accuracy versus subspace dimensionality is shown in Figure 9. Considering both accuracy and time consumption, the subspace dimensions k for Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2 were set to 90, 50 and 35, respectively.
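With those choices fixed (heat-kernel weights, Tikhonov strength 0.1, subspace dimension k), a standard LPP projection can be sketched as follows. This is a generic textbook-style sketch, not the paper's code; the neighborhood size, kernel width t and toy data are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, k, t=1.0, reg=0.1, n_neighbors=5):
    """Locality preserving projection: heat-kernel weights
    W_ij = exp(-||x_i - x_j||^2 / t) on a kNN graph, then the
    generalized eigenproblem  X^T L X q = lam X^T D X q.  The k
    eigenvectors with the smallest eigenvalues form the projection;
    the reg*I term is the Tikhonov regularization from the text."""
    X = np.asarray(X, float)                          # n samples x d features
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / t)                               # heat-kernel weights
    far = np.argsort(d2, axis=1)[:, n_neighbors + 1:] # beyond self + kNN
    for i, cols in enumerate(far):
        W[i, cols] = 0.0
    W = np.maximum(W, W.T)                            # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                         # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + reg * np.eye(X.shape[1])        # Tikhonov-regularized
    vals, vecs = eigh(A, B)                           # ascending eigenvalues
    return X @ vecs[:, :k]                            # embedded coordinates

# toy high-dimensional features projected to a 2-D subspace
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
Y = lpp(X, k=2)
print(Y.shape)  # (20, 2)
```

In the experiments, X would hold the stacked MFCN features of the training pixels, and the learned matrix Q (the selected eigenvectors) is then applied to the test pixels.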
For the final classification, an SVM with a Gaussian kernel is employed as the classifier, and its parameters are optimized by five-fold cross-validation. To allow a more direct comparison across the experimental methods, we assume that the speckle-noise intensity of the training and test data within each dataset is consistent.
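The final stage can be sketched with scikit-learn, using synthetic data as a stand-in for the low-dimensional LPP features; the parameter grid and small training ratio are illustrative assumptions, not the paper's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# synthetic stand-in for the low-dimensional manifold features (labels known)
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=6, random_state=0)
# small stratified training ratio, mirroring the few-percent splits in the text
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.1,
                                          stratify=y, random_state=0)

# Gaussian (RBF) kernel SVM; C and gamma tuned by five-fold cross-validation
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
                    cv=5)
grid.fit(X_tr, y_tr)
acc = grid.score(X_te, y_te)   # accuracy on the held-out samples
print(grid.best_params_, round(acc, 2))
```

The cross-validated grid search plays the role of the parameter optimization described above; the chosen C and gamma are then reused when scoring the test set.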
4.3. Experimental Results and Analysis
In this section, the classification results of the proposed method and the comparative algorithms are described from multiple perspectives. Specifically, the corresponding per-class classification accuracies, OAs and kappa coefficients for the different algorithms are listed in Table 2, Table 3 and Table 4, and the confusion matrices and visualized classification results of the proposed method are shown in Figure 10, Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15.
Table 2 shows that the LPP method performs poorly for all classes, with an overall accuracy of 73.70%. The main reason is that LPP is a shallow learning algorithm and lacks spatial constraints. The FCN greatly improves the classification accuracy for all categories (OA of 90.99%), except for Category 9 (71.13%), where it is lower than that of the LPP algorithm. By contrast, in the FCN+LPP algorithm, the complementarity between the FCN's deep spatial features and manifold graph embedding learning greatly enhances the classification accuracy in most categories. In particular, for rapeseed, beets and stem beans, the accuracy improves by 6%, 13% and 19%, respectively, indicating that the manifold subspace representation benefits from the deep multi-scale spatial features.
Compared with the FCN+LPP algorithm, the MFCN+LPP algorithm improves the accuracy in almost every category, especially for beets and stem beans (5% for both). This finding demonstrates that fusing polarimetric properties with multi-scale spatial features through the proposed MFCN is more effective for pixel-level classification.
For the San Francisco dataset, the experimental results in Table 3 show that the FCN+LPP algorithm outperforms either the LPP or the FCN method alone. In terms of accuracy, the FCN outperforms the LPP algorithm: its classification accuracy is greater than 95% for all categories and reaches 99.44% for sea, which shows that the spatial features from the FCN play a leading role in distinguishing different ground objects. However, the visualized classification results in Figure 12b clearly show that the boundary areas contain many misclassified pixels. For example, there are misclassified samples at the intersection of the categories indicated in yellow and green, since the FCN cannot handle edge details well.
By integrating the spatial features from the FCN with the manifold representations from the graph embedding model, the LPP algorithm's strength in retaining local characteristics largely compensates for the FCN's defects; after combining the FCN and LPP methods, the overall accuracy reaches 98.46%. The proposed MFCN+LPP algorithm further combines polarimetric information with spatial features through multiple parallel convolutional networks and then employs the manifold graph embedding model to remove redundancy, so the classification accuracy for vegetation, high-density areas and low-density areas continues to grow. Moreover, the accuracy for low-density areas reaches 100%, indicating that the multi-scale polarized spatial features have strong discriminating ability. However, the accuracy for sea and developed areas is lower than with the FCN+LPP method; because these two categories account for a large proportion of the scene, the proposed method's overall performance is not as good as that of the FCN+LPP method.
Figure 14 shows the visualized classification results of the different algorithms on Flevoland Dataset 2, and the corresponding per-class accuracies, OAs and kappa coefficients are given in Table 4. Although the LPP-based manifold method obtains an OA of up to 96.54%, combining the FCN and LPP methods achieves better performance, with the OA increasing to 99.83% (FCN+LPP) and 99.54% (MFCN+LPP).
Compared with the single LPP method (56.39% for beans), the accuracy of both the FCN+LPP and MFCN+LPP methods on the same class increases by nearly 42%. For fruit, oats, wheat, peas, maize, flax, rapeseed, grass and lucerne, the MFCN+LPP algorithm achieves the best performance in both the visualization and the classification accuracy: the accuracy for most of these categories is 100%, except for rapeseed (99.86%). The single FCN already has excellent classification performance, with an OA as high as 97.45%, but its synergy with the LPP algorithm further boosts the final results. More importantly, when the polarimetric features are fed into the MFCN, the polarization information fuses effectively with the spatial characteristics at multiple scales and levels, thereby improving the representational and discriminative ability of the extracted features. Finally, the runtimes of the proposed algorithm and the comparison methods are given in Table 5 and Table 6. The experiments based on the FCN-8s models for deep multi-scale spatial feature extraction were conducted in the Caffe framework on an NVIDIA Tesla K40 GPU and took 423.08, 490.98 and 559.86 s on Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2, respectively; these times are approximately linear in the input image size. When the hardware permits, parallel operation on a machine with multiple GPUs can reduce the time consumption. The other experiments were performed in MATLAB 2014 on a computer with an Intel Core i5-4570 CPU and 32 GB of RAM.
Table 6 indicates that the single FCN and the FCN+LPP algorithm have lower time costs, while the time consumption of the MFCN+LPP algorithm is the highest because its generated feature space has the highest dimensionality, which increases the computational complexity.