Fully Convolutional Networks and a Manifold Graph Embedding-Based Algorithm for PolSAR Image Classification

With the rapid development of artificial intelligence, how to take advantage of deep learning and big data to classify polarimetric synthetic aperture radar (PolSAR) imagery is a hot topic in the field of remote sensing. As a key step for PolSAR image classification, feature extraction technology based on target decomposition is relatively mature, and how to extract discriminative spatial features and integrate these features with polarized information to maximize the classification accuracy is the core issue. In this context, this paper proposes a PolSAR image classification algorithm based on fully convolutional networks (FCNs) and a manifold graph embedding model. First, to describe different types of land objects more comprehensively, various polarized features of PolSAR images are extracted through seven kinds of traditional decomposition methods. Afterwards, drawing on transfer learning, the decomposed features are fed into multiple parallel and pre-trained FCN-8s models to learn deep multi-scale spatial features. Feature maps from the last layer of each FCN model are concatenated to obtain spatial polarization features with high dimensions. Then, a manifold graph embedding model is adopted to seek an effective and compact representation for spatially polarized features in a manifold subspace, simultaneously removing redundant information. Finally, a support vector machine (SVM) is selected as the classifier for pixel-level classification in a manifold subspace. Extensive experiments on three PolSAR datasets demonstrate that the proposed algorithm achieves a superior classification performance.


Introduction
As an advanced SAR system, polarimetric SAR (PolSAR) inherits the unique advantages of SAR, and it can transmit and receive electromagnetic waves in four polarization combinations (HH, VV, HV, and VH). PolSAR can measure the medium's complex scattering matrix, which integrates multiple types of information, such as the amplitude, frequency and phase. Moreover, it contains polarization properties highly related to the object structure and thus can characterize land surfaces more accurately. PolSAR data have been widely applied in image classification, target recognition and detection tasks, among which image classification plays an important role in remote sensing image interpretation systems. Classification results can not only be directly applied to land cover classification and urban representation of polarimetric SAR data through a multi-layer network. Deep learning can not only mine polarized information but can also integrate it with spatial properties for classification.
Early typical neural networks include the stacked autoencoder (SAE) and the deep belief network (DBN), both of which are unsupervised learning algorithms. As a multi-layer self-coding neural network, the SAE cascades hidden layers by training the network parameters layer by layer [14]. Similar to the SAE, the stacked denoising autoencoder (SDAE) is built on the denoising autoencoder (DAE). It learns a high-level representation of images in a purely unsupervised manner, which helps to improve the performance of subsequent SVM classifiers and enhances the robustness of the network [15]. A DBN is composed of several restricted Boltzmann machines (RBMs), and its core idea is to optimize the connection weights layer by layer through a greedy strategy [16]. Many studies have utilized these models. Liu et al. proposed a remote sensing image classification algorithm based on deep learning, in which texture features were first extracted by a nonsubsampled contourlet transform and a DBN was then utilized for classification [17]. Shi obtained initial classification results according to the polarimetric hierarchical semantic model and, for clustered areas in PolSAR images, used an SAE to obtain a high-level feature representation, thus realizing terrain classification [18]. However, the algorithm was quite complex and time consuming.
Since the CNN was proposed, researchers have tended to explore deeper and more complex convolutional models. In 1998, the five-layer LeNet was introduced. Then, a 7-layer AlexNet and a 16-layer VGG network were proposed, with improved feature extraction ability and classification performance. A CNN-based remote sensing image classification algorithm was proposed in [19]. On the basis of a traditional LeNet network, this algorithm adopted a ReLU function instead of the sigmoid and tanh functions and achieved good results. In [20], a supervised classification algorithm for PolSAR imagery was proposed. The algorithm selected effective polarized features and obtained the final classification results by means of a deep neural network. However, methods based on CNNs can only achieve image-level classification, and the advent of FCNs [21] made the semantic segmentation of SAR imagery a reality. On this basis, many researchers have made further attempts. For example, Mohammadimanesh et al. [22] used a fully convolutional neural network for semantic image segmentation, which performed much better than the traditional random forest classifier.
Speckle is an inherent characteristic of SAR, which complicates SAR image interpretation by reducing the effectiveness of the image information. The source of speckle is random interference between the coherent returns from the many scatterers present on the Earth's surface. Speckle noise not only gives the SAR image a very low signal-to-noise ratio but also overwhelms the true scattering information of the target. At the same time, post-processing such as multi-looking and image enhancement in actual applications may cause differences in the intensity of the speckle noise between the test SAR image and the training data. To avoid this problem, many effective methods have been proposed. For example, the training set can be expanded by data augmentation to include data with different intensities of speckle noise. A network model trained on the expanded data is more robust to speckle noise [23]. In addition, a method based on noise-invariant constraints introduces regularization terms into the optimization loss function of the model. By constraining the feature vectors of images under different noise intensities, the robustness of the CNN model to speckle noise is improved as much as possible [24].
Although the above-mentioned methods can achieve good classification results for objects in an image, they neglect the influence of the multi-channel information contained in PolSAR data. How to effectively extract highly discriminative features for different region types in PolSAR imagery remains the key to achieving high-precision classification.

Problems and Motivation
PolSAR data have multiple polarized channels and contain information including the amplitudes and phases of different pixels. If these data could be better utilized, it would be easier to extract features with a stronger representation ability for ground objects. However, with the increase in data, there are also difficulties and challenges in PolSAR image classification, mainly reflected in the following aspects:
(1) Scattering mechanism: There are different forms of electromagnetic wave scattering, such as plane scattering and dihedral angle scattering. Each pixel is formed by a mixture of various scattering forms. Therefore, high-resolution SAR images exhibit the phenomenon that different objects can have the same spectrum while the same object can have different spectra, which increases the difficulty of semantic classification. In addition, speckle noise in PolSAR imagery can easily lead to misclassification.
(2) Complex data format: In terms of the polarized coherence matrix, each pixel in a PolSAR image is a 3 × 3 complex matrix. For a large PolSAR image, there is much redundancy between pixels. Therefore, how to extract effective features for ground objects becomes one of the major difficulties. In addition, for different kinds of terrain with various scales and shapes, it is quite hard for a single method to achieve a good classification for all objects.
(3) Semantic gap: For the task of machine vision, people do not focus on each individual pixel but on the target formed by pixels. In fact, pixels are discrete points and have no specific meaning. Only when many discrete points aggregate to form targets can they be classified. However, discrete pixels with large differences in their underlying features (such as gray-scale features) are likely to represent the same object. This is the semantic gap problem: a mismatch between underlying features and high-level semantics.
To overcome this problem and obtain a region-consistent classification result, higher-level semantic features should be mined.
In recent years, deep learning has become popular since it can automatically learn image features through multi-layer networks and has good performance. Compared with traditional methods with lower precision, a deep network can extract more abstract and high-level semantic features, showing its good analytical ability for complex scenes. As a pioneering work of semantic segmentation, fully convolutional networks (FCNs) [21] perform well on optical images.
However, considering the differences in the imaging mechanism between optical and PolSAR images, the direct application of FCNs to PolSAR images encounters a serious problem: the number of network parameters is very large, so training depends heavily on a large number of training samples [25]. Accurately labeled samples of PolSAR images are few and difficult to obtain, so directly training an FCN from scratch with PolSAR images can easily cause serious overfitting. Given the high demand for training samples and the time-consuming model training, combining deep neural networks with transfer learning is of great practical significance. Transfer learning is a learning method that uses existing knowledge to solve problems in different but similar domains. Compared with traditional machine learning, transfer learning can not only solve the problem of insufficient training samples but also migrate and share a trained model between different tasks, even when the training data and test data obey different distributions [26].
Since the physical mechanism of polarized signals is extremely complex, features extracted from original PolSAR imagery usually have high dimensionality. There is a lot of redundant information and, if not processed effectively, the curse of dimensionality will occur and have a great impact on classification results. Therefore, while using deep transfer learning for high-level semantic feature extraction, this paper also combines a graph embedding model to obtain the most intrinsic representation of high-dimensional data to identify principal features that have decisive effects on classification.

Contributions and Structure
Aiming at these problems, this paper proposes a PolSAR image classification algorithm based on a multi-parallel FCN and a manifold graph embedding model. First, based on the scattering matrix and coherence matrix, high-dimensional polarized features of PolSAR data are extracted and color-coded according to different decomposition methods. Then, inspired by transfer learning, those synthesized multi-color maps are fed into the multiple-parallel FCN (MFCN) model, pre-trained on optical images, for deep multi-scale spatial feature learning. Feature maps from the last layer of the multi-parallel FCN-8s models are concatenated to generate better fused features. Finally, the manifold graph embedding model is applied to determine the effective representation of the fused feature in a low-dimensional subspace, which serves as the input of an SVM to obtain the final classification results. The main contributions of this paper are as follows: (1) Based on transfer learning, parallel FCN models are utilized to automatically learn deep multi-scale spatial features. Since the input of the MFCN originates from polarized features in PolSAR imagery, spatially polarized information can be adaptively fused while learning discriminative deep features. (2) The manifold graph embedding model explores the representation of spatially polarized fused features in a low-dimensional subspace, mining the essential structure of the PolSAR data to improve the classification discriminability of the fused features.
This article is organized as follows. Section 2 introduces the background of the multi-dimensional space of PolSAR data, the FCN structure and the graph embedding framework. In Section 3, the principle of the presented manifold graph embedding-based MFCN model for PolSAR image classification is illustrated. Section 4 describes our experimental results and the corresponding detailed analysis. Finally, the conclusions of this paper are summarized in Section 5.

Multi-Dimensional Polarization SAR Data
The polarimetric synthetic aperture radar (PolSAR) system can provide the amplitude, phase and frequency characteristics of the target scattering echo, and it also exploits the polarization characteristics of the echo. By transmitting and receiving electromagnetic waves with different polarizations, PolSAR measures the polarimetric scattering characteristics of the target and thereby obtains its polarimetric scattering matrix. The polarization scattering matrix, also known as the Sinclair matrix [27], is a 2 × 2 complex matrix denoted by S:
$$S = \frac{e^{i k_0 r}}{r} \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix}$$
where H stands for horizontal polarization and V for vertical polarization. The element $S_{xy}$ is the complex scattering coefficient when the emission polarization is y and the receiving polarization is x; $S_{HH}$ and $S_{VV}$ are the co-polarization components, while $S_{HV}$ and $S_{VH}$ are the cross-polarization components. r denotes the distance between the terrain target and the PolSAR system, and $k_0$ is the wavenumber of the electromagnetic wave emitted by the PolSAR system. The matrix unifies the phase, energy and polarization characteristics of the target and describes the electromagnetic scattering characteristics of the radar target relatively completely.
In real radar measurements, SAR data show a certain degree of randomness. To suppress the interference of speckle noise, statistical methods are needed to analyze the electromagnetic scattering characteristics of the target. A commonly used second-order statistic is the polarization coherence matrix T:
$$T = \langle k_p k_p^{H} \rangle = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix}$$
where $T_{ij}$ is an element of the polarized coherence matrix T, $\langle \cdot \rangle$ denotes averaging over looks, the superscript H denotes the conjugate transpose, and $k_p$ is the target vector of the polarized scattering matrix S under the Pauli basis:
$$k_p = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH} + S_{VV} & S_{HH} - S_{VV} & 2 S_{HV} \end{bmatrix}^{T}$$
For the better interpretation of PolSAR data, many target polarimetric decomposition methods have been developed, such as the Yamaguchi decomposition [28] and the Huynen decomposition. In practice, different polarization decomposition methods are suitable for different objects. A reasonable approach is to combine the polarization characteristics of different types of target decomposition to describe PolSAR data. Therefore, this paper adopts coherent features based on the scattering matrix S, statistical features based on the coherence matrix T and polarized incoherence features to represent PolSAR images.
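The relationship between the scattering coefficients, the Pauli target vector and the coherence matrix can be illustrated with a small numerical sketch. It assumes reciprocity (so that a single cross-polarized channel is used), replaces real PolSAR measurements with synthetic random values, and averages over all pixels of a small neighbourhood as a stand-in for multi-look averaging. The function names `pauli_vector` and `coherence_matrix` are hypothetical and not taken from the paper.

```python
import numpy as np

def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli target vector per pixel; inputs are complex arrays of equal shape."""
    return np.stack([s_hh + s_vv, s_hh - s_vv, 2.0 * s_hv], axis=-1) / np.sqrt(2.0)

def coherence_matrix(k_p):
    """Average the outer products k_p k_p^H over all pixels (assumed averaging window)."""
    k = k_p.reshape(-1, 3)
    return np.einsum("ni,nj->ij", k, k.conj()) / k.shape[0]

# Synthetic complex scattering coefficients for a 4 x 4 neighbourhood (made-up data).
rng = np.random.default_rng(0)
shape = (4, 4)
s_hh = rng.normal(size=shape) + 1j * rng.normal(size=shape)
s_hv = rng.normal(size=shape) + 1j * rng.normal(size=shape)
s_vv = rng.normal(size=shape) + 1j * rng.normal(size=shape)

k_p = pauli_vector(s_hh, s_hv, s_vv)
T = coherence_matrix(k_p)

# The trace of T equals the mean total backscattered power (span).
span = np.mean(np.abs(s_hh) ** 2 + 2.0 * np.abs(s_hv) ** 2 + np.abs(s_vv) ** 2)
print(T.shape, np.isclose(np.trace(T).real, span))
```

In this sketch T is Hermitian by construction, which is why only its upper-triangular elements carry independent information.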

Fully Convolutional Networks
FCN [21] solves the problem of semantic segmentation by performing pixel-level classification of images. Traditional CNNs use fully connected layers after the convolutional layers to obtain fixed-length feature vectors for classification. As a pioneer of image semantic segmentation, FCN differs from the classic CNN in that it can accept input images of any size. By using a deconvolution layer to upsample the feature map of the last convolution layer to the same size as the input image, a prediction can be generated for each pixel while retaining the spatial information of the original input image. Finally, the softmax classification loss is calculated pixel by pixel on the upsampled feature map, which is equivalent to treating each pixel as a training sample.
FCN can roughly locate the target area through convolution and deconvolution. However, the early stages of the model produce feature maps through convolution, pooling and nonlinear activation functions, and the output obtained after deconvolution and other operations is coarse and loses many details. Thus, FCN needs a way to recover the missing details.
To further improve prediction performance, FCNs use a skip structure to fuse information from different layers. There are three variants: FCN-32s, FCN-16s and FCN-8s, which differ in the number of feature fusions. FCN-8s, as shown in Figure 1, performs feature fusion between three layers of feature maps and has been shown to achieve the best semantic segmentation performance of the three. That is, multi-layer feature fusion helps to improve accuracy. This is because the shallow layers extract local linear features of the image, while the deep layers learn global semantic features. Therefore, the final feature map has strong expressive ability. Based on this, this paper uses FCN-8s to learn the multi-scale deep spatial features of PolSAR images.
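The three-level fusion performed by FCN-8s can be sketched as follows. This is an illustration under simplifying assumptions rather than the network used in the paper: nearest-neighbour repetition stands in for the learned deconvolution layers, the score maps are random placeholders, and the names `score_pool3`, `score_pool4` and `score_conv7` follow the layer naming of the original FCN architecture (strides 8, 16 and 32).

```python
import numpy as np

def upsample(score, factor):
    """Nearest-neighbour upsampling of an (H, W, C) score map (stand-in for deconvolution)."""
    return score.repeat(factor, axis=0).repeat(factor, axis=1)

def fcn8s_fuse(score_pool3, score_pool4, score_conv7):
    """Fuse score maps at strides 8, 16 and 32 and return a full-resolution score map."""
    fused16 = upsample(score_conv7, 2) + score_pool4   # stride 32 -> 16, first fusion
    fused8 = upsample(fused16, 2) + score_pool3        # stride 16 -> 8, second fusion
    return upsample(fused8, 8)                         # stride 8 -> input resolution

# Placeholder score maps for a 256 x 256 input and 5 classes (assumed sizes).
rng = np.random.default_rng(0)
n_classes = 5
score_pool3 = rng.normal(size=(32, 32, n_classes))
score_pool4 = rng.normal(size=(16, 16, n_classes))
score_conv7 = rng.normal(size=(8, 8, n_classes))

scores = fcn8s_fuse(score_pool3, score_pool4, score_conv7)
labels = scores.argmax(axis=-1)  # per-pixel class prediction
print(scores.shape, labels.shape)
```

FCN-16s would stop after the first fusion and upsample by 16, and FCN-32s would upsample `score_conv7` by 32 directly, which is why FCN-8s retains the most spatial detail.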

Graph Embedding Framework
Graph embedding transforms data represented by a graph into a low-dimensional subspace representation while retaining the structure and characteristics of the graph to the maximum extent [29]. The unified dimensionality reduction framework based on graph embedding was first proposed by Yan et al. [30], who used an undirected weighted graph to represent the similarity between data points. Let $x_1, x_2, \ldots, x_m$, $x_i \in \mathbb{R}^{n}$, denote m data points in an n-dimensional space. Dimensionality reduction aims to seek a set of representations $y_1, y_2, \ldots, y_m$, $y_i \in \mathbb{R}^{k}$, in a low-dimensional subspace with k dimensions, so that $y_i$ can stand for $x_i$. Under the graph embedding framework, a graph $G = \{X, W\}$ is given, where $X = [x_1, x_2, \ldots, x_m]$ collects the m vertices of the graph and W is an m × m symmetric matrix whose element $W_{ij}$ is the similarity between the connected samples $x_i$ and $x_j$.
The purpose of graph embedding is to find a mapping matrix $Q \in \mathbb{R}^{n \times k}$ that maps the original data X to Y in the low-dimensional space by $Y = Q^{T} X$ while preserving the similarity between vertex pairs. For a good embedding, two similar data points should be as close as possible in the low-dimensional subspace; if they are embedded far apart, a larger "penalty" should be imposed. Therefore, in the graph embedding model, an intrinsic graph and a penalty graph are constructed to represent the attributes we want to retain and suppress in the low-dimensional subspace, respectively. In the intrinsic graph, two samples are connected if they are k-nearest neighbors belonging to the same class. In the penalty graph, the connection is made only if the two samples belong to different classes. The unified objective function of the graph embedding algorithm is as follows:
$$Q^{*} = \arg\min_{\operatorname{tr}(Q^{T} X A X^{T} Q) = c} \sum_{i \neq j} \left\| Q^{T} x_i - Q^{T} x_j \right\|^{2} W_{ij}$$
Through simple algebraic operations (for a detailed derivation, see [31]), it can be further formulated as follows:
$$Q^{*} = \arg\min_{\operatorname{tr}(Q^{T} X A X^{T} Q) = c} \operatorname{tr}\left( Q^{T} X L_I X^{T} Q \right)$$
where $L_I$ denotes the Laplacian matrix of the intrinsic graph, A is a constraint matrix for scale normalization, which is usually taken to be the Laplacian matrix $L_P$ of the penalty graph, and c is a constant that usually takes a value of 1. From the formulas above, the optimal mapping matrix can be obtained:
$$Q^{*} = \arg\min_{Q} \frac{\operatorname{tr}\left( Q^{T} X L_I X^{T} Q \right)}{\operatorname{tr}\left( Q^{T} X L_P X^{T} Q \right)}$$
According to the formulas above, the difference between dimensionality reduction algorithms based on graph embedding is mainly rooted in the Laplacian matrices $L_I$ and $L_P$ corresponding to the intrinsic and penalty graphs, respectively. Generally, the Laplacian matrix of the penalty graph is fixed. Therefore, the essential difference between graph embedding algorithms lies in the design of the similarity matrix $W_I$, which plays a decisive role in the effect of dimensionality reduction.
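The algebraic step that turns the weighted sum of pairwise distances into a trace involving a Laplacian matrix can be written out explicitly. The following is a sketch under stated assumptions: W is symmetric, D is the diagonal degree matrix with $D_{ii} = \sum_j W_{ij}$ (a symbol introduced here for illustration), $L = D - W$, and $Y = Q^{T} X = [y_1, \ldots, y_m]$. The terms with i = j vanish, so the sum can be taken over all pairs.

```latex
\begin{aligned}
\sum_{i \neq j} \left\| y_i - y_j \right\|^{2} W_{ij}
  &= \sum_{i,j} \left( y_i^{T} y_i - 2\, y_i^{T} y_j + y_j^{T} y_j \right) W_{ij} \\
  &= 2 \sum_{i} D_{ii}\, y_i^{T} y_i - 2 \sum_{i,j} W_{ij}\, y_i^{T} y_j \\
  &= 2 \operatorname{tr}\left( Y D Y^{T} \right) - 2 \operatorname{tr}\left( Y W Y^{T} \right) \\
  &= 2 \operatorname{tr}\left( Y L Y^{T} \right)
   = 2 \operatorname{tr}\left( Q^{T} X L X^{T} Q \right)
\end{aligned}
```

The constant factor 2 does not change the minimizer, which is why it can be dropped from the objective.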

The Proposed Algorithm Framework for PolSAR Image Classification
Because the FCN-8s model has receptive fields of various sizes, it can perceive the structural information of objects at multiple scales and layers. In addition, as a traditional manifold method, the locality preserving projection (LPP) model based on graph embedding can preserve the local characteristics of the original data and handle high-dimensional data lying in a manifold space well. Therefore, the LPP and MFCN models are integrated so that their advantages complement each other in classifying PolSAR images. The proposed algorithm framework is shown in Figure 2. First, seven "RGB" synthetic pseudo-color maps are obtained through different polarized decomposition methods. After color encoding, they are input into the pre-trained FCN-8s models. While the network automatically learns the spatial information of adjacent pixels, the polarized properties are adaptively fused with the spatial structure. In this way, the MFCN can extract features at different network levels. Since the same pre-trained FCN-8s model is adopted as the feature extractor in every branch of the MFCN, there is redundancy in the spatially polarized fused features extracted by the MFCN. To remove redundant information from the high-dimensional cascaded features, the manifold algorithm LPP based on graph embedding is utilized for dimensionality reduction before the final classification. Because the features learned by the MFCN are already sparse and nonlinear, LPP is used to capture the most essential data structure, which is finally fed into a classifier to obtain the classification results. Since the SVM is a trainable classifier whose nonlinear kernel function can better distinguish complex image features, our method uses the widely adopted SVM as the classifier.

Feature Learning Based on an MFCN
CNN-based classification methods are usually based on image patches, whose appropriate size is difficult to determine. In addition, to classify each pixel, a pixel-by-pixel sliding window strategy is necessary. However, this leads to many repeated operations, greatly increasing the computational time and required storage space. Therefore, this paper utilizes an FCN for pixel-level PolSAR image classification, as shown in Figure 3. Since the amount of available PolSAR data is quite small, retraining the network from scratch is not realistic. Therefore, a transfer-learning strategy is adopted to migrate the FCN model (pre-trained on the PASCAL VOC dataset) to learn the deep multi-scale spatial features of the PolSAR images [32]. Considering the small sample capacity of the PolSAR dataset and the low similarity between PolSAR images and optical images, a reasonable approach is to use the FCN model pre-trained on optical images as a feature extractor to obtain deep spatial features of the PolSAR images and then use a classifier to perform the final classification.
In the experiments, the FCN is employed as a feature extractor, with some of the underlying layer parameters fine-tuned. There are many targets with different scattering characteristics in real scenes. To make the spatially polarized features represent different objects more effectively, this paper proposes a multiple-parallel FCN-8s (MFCN) model for feature learning and fusion of PolSAR images, as shown in Figure 4. Various polarized decompositions are fed into different FCN-8s models, and the final spatial feature maps are cascaded into high-dimensional features for PolSAR imagery representation. Our previous research [31] showed that, for the same network input, e.g., the Pauli decomposition, each filter in the score layer outputs a different feature map, which means that different filters prefer different object types. In addition, for different network inputs, the same filter at the same network layer can produce different feature maps, depending mainly on the polarized information in the input. In this respect, spatially polarized fused features from the proposed MFCN can be helpful in distinguishing different types of terrain objects.
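The cascading of the final feature maps from the parallel branches can be sketched as a channel-wise concatenation followed by flattening to one feature vector per pixel. The sketch below is illustrative only: the feature maps are random placeholders rather than FCN-8s outputs, the number of channels per branch is an assumed value, and the function name `stack_features` is hypothetical.

```python
import numpy as np

def stack_features(feature_maps):
    """Concatenate (H, W, C_b) maps from all branches along channels and
    flatten to a (H * W, sum of C_b) matrix with one row per pixel."""
    fused = np.concatenate(feature_maps, axis=-1)
    height, width, depth = fused.shape
    return fused.reshape(height * width, depth)

# Seven branches (one per decomposition), each with 3 output channels (assumed).
rng = np.random.default_rng(0)
height, width, channels = 4, 5, 3
maps = [rng.normal(size=(height, width, channels)) for _ in range(7)]

fused = stack_features(maps)
print(fused.shape)  # one 21-dimensional feature vector per pixel
```

Each row of `fused` is the high-dimensional spatially polarized feature of one pixel, which is the input to the subsequent dimensionality reduction.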

Dimensionality Reduction Based on Manifold Graph Embedding
Nonlinear data with high dimensions are usually compactly distributed in a low-dimensional manifold space; therefore, it is necessary to explore the most intrinsic data structure using manifold algorithms. The objective functions of global manifold methods (ISOMAP) and local manifold techniques (LLE and LE) all involve a Laplacian matrix and are eventually transformed into eigenvalue decomposition problems. However, these nonlinear methods share a common drawback: the mapping from the original high-dimensional data to the low-dimensional subspace has no explicit functional expression, so samples outside the training set cannot be embedded directly.
To solve the out-of-sample problem, this paper adopts the locality preserving projection (LPP) [33], which can be regarded as a linear approximation of the nonlinear LE method. As a manifold dimension reduction model based on graph embedding, the LPP algorithm can recover important properties of nonlinear manifolds by retaining local structures. The LPP algorithm includes three main stages: construction of the adjacency graph, assignment of weights and eigenvector calculation. The detailed process is as follows:
Stage 1: There are two strategies to measure the "similarity" in adjacency graphs. The first is the ε-nearest neighbor: if $\|x_i - x_j\|^{2} < \varepsilon$, then node i is connected to node j. The second strategy is the k-nearest neighbor: if node i is among the k nearest neighbors of node j, then the two nodes are connected. Relatively speaking, the k-nearest neighbor is widely used because of its stability, but it has a higher time cost [34]. The ε-nearest neighbor is suitable for scenarios with high computational complexity, but it is very difficult to select an optimal ε in real applications. To construct a stable graph and make full use of existing label information to enhance feature discriminability, this paper chooses the k-nearest neighbor method to generate the graph.
Stage 2: There are two strategies for weight distribution: the heat kernel method and the simple method. If node i and node j are connected, the weight is $W_{ij}$; otherwise, the weight is 0. The simple method involves no parameters: $W_{ij}$ is 1 if node i and node j are connected and 0 otherwise. For connected nodes, this paper utilizes the heat kernel strategy to distribute the weights as follows:
$$W_{ij} = \exp\left( -\frac{\|x_i - x_j\|^{2}}{t} \right)$$
where t is the heat kernel parameter. When the value of ε or k is infinite and the eigenvector corresponding to the maximum eigenvalue is selected, the data will be mapped along the direction of the maximum covariance to preserve the global structure; otherwise, local characteristics will be maintained.
Stage 3: In this stage, the eigenvectors are calculated by generalized eigenvalue decomposition:
$$X L X^{T} q = \lambda X D X^{T} q$$
where D is a diagonal matrix with $D_{ii} = \sum_{j} W_{ij}$ and $L = D - W$ is the Laplacian matrix. Let $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ denote the eigenvalues arranged from smallest to largest and let the column vectors $q_1, q_2, \ldots, q_n$ be the corresponding eigenvectors. The embedding of the LPP can then be expressed as:
$$x_i \rightarrow y_i = Q_{LPP}^{T} x_i, \quad Q_{LPP} = [q_1, q_2, \ldots, q_k]$$
where $Q_{LPP}$ is a transformation matrix with a size of n × k and each column represents an embedding direction of the LPP in manifold space. To a certain extent, the local neighborhood information of the data is effectively preserved.
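Stages 1 and 2 can be sketched as a supervised k-nearest-neighbor graph with heat kernel weights. The sketch assumes that samples are stored as the columns of X, that two samples are connected only when they share a class label (as in the intrinsic graph described earlier), and that the graph is symmetrized by keeping an edge if either endpoint selects the other. The function name `supervised_knn_heat_graph`, the parameter name `t` and the toy data are illustrative assumptions.

```python
import numpy as np

def supervised_knn_heat_graph(X, labels, k=3, t=1.0):
    """Build a symmetric weight matrix W for samples stored as columns of X (n, m)."""
    labels = np.asarray(labels)
    m = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    W = np.zeros((m, m))
    for i in range(m):
        # Stage 1: the k nearest neighbours of sample i among same-class samples.
        same = np.flatnonzero((labels == labels[i]) & (np.arange(m) != i))
        nearest = same[np.argsort(d2[i, same])[:k]]
        # Stage 2: heat kernel weight on each selected edge.
        W[i, nearest] = np.exp(-d2[i, nearest] / t)
    # Keep an edge if either endpoint selected the other, so W is symmetric.
    return np.maximum(W, W.T)

# Toy data: two well-separated classes with four 2-D samples each (made-up values).
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(0.0, 0.3, size=(2, 4)), rng.normal(3.0, 0.3, size=(2, 4))])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

W = supervised_knn_heat_graph(X, labels, k=2, t=1.0)
print(np.round(W, 2))
```

Setting every nonzero entry of W to 1 instead of the exponential would give the parameter-free simple weighting.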
Moreover, to solve the singularity problem, this paper uses a regularization strategy following the idea in [35], adding a constant value to every diagonal element of the matrix $X D X^{T}$. For any regularization parameter ϕ > 0, $X D X^{T} + \phi I$ (where I is the identity matrix) is a nonsingular matrix. Therefore, by solving the eigenvalue decomposition problem with the regularization term, the k eigenvectors corresponding to the k smallest eigenvalues are selected to form the manifold mapping matrix $Q^{*}_{LPP}$.
Finally, the representation $Y_{LPP}$ of the PolSAR data in the low-dimensional manifold subspace can be obtained by the linear mapping function $Y_{LPP} = Q_{LPP}^{T} X$, which retains the local intrinsic characteristics of the original data well.
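The regularized eigenvalue problem and the final linear mapping can be sketched with NumPy alone. This is a minimal illustration, not the authors' implementation: it assumes samples are stored as columns of X, uses a fully connected heat kernel graph on random data as a placeholder for the real adjacency graph, and reduces the generalized problem to a standard symmetric one through a Cholesky factorization. The function name `lpp` and the values of `phi`, `k` and the kernel width are assumptions.

```python
import numpy as np

def lpp(X, W, k, phi=1e-3):
    """Return (Q, Y, eigenvalues) for X L X^T q = lambda (X D X^T + phi I) q."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    A = X @ L @ X.T
    B = X @ D @ X.T + phi * np.eye(X.shape[0])   # regularized, hence nonsingular
    # With B = C C^T, the problem becomes (C^-1 A C^-T) v = lambda v, q = C^-T v.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    lam, V = np.linalg.eigh((M + M.T) / 2.0)     # eigenvalues in ascending order
    Q = C_inv.T @ V[:, :k]                       # k smallest eigenvalues
    return Q, Q.T @ X, lam[:k]

# Placeholder data: 40 samples with 6 features each (made-up values).
rng = np.random.default_rng(0)
n, m, k, phi = 6, 40, 2, 1e-3
X = rng.normal(size=(n, m))
sq = np.sum(X ** 2, axis=0)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
W = np.exp(-d2 / 10.0)        # fully connected heat kernel graph (assumption)
np.fill_diagonal(W, 0.0)

Q, Y, lam = lpp(X, W, k, phi)
print(Q.shape, Y.shape)
```

Because Q is an explicit matrix, a previously unseen sample x can be embedded as `Q.T @ x`, which is the property that the nonlinear manifold methods lack.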

Experiments and Analysis
Experiments were carried out on three typical datasets. The performance of the MFCN+LPP algorithm was quantitatively and qualitatively compared with the standard FCN- and LPP-based approaches. The discussion and analysis are presented in this section.

Experimental Data
To validate the effectiveness of the proposed classification scheme, we conducted extensive experiments on three classic PolSAR datasets for quantitative and qualitative comparison with other existing algorithms. The ground truth images of Flevoland Dataset 1 and the San Francisco dataset were generated by manual annotation according to the associated optical images, which can be found on Google Earth using the information provided in the metadata files.
(1) Flevoland Dataset 1: These data were obtained in the Netherlands by the AIRSAR system of the NASA Jet Propulsion Laboratory in 1989. It is a four-look fully polarized L-band image with a size of 750 × 1024 pixels, and the range and azimuth resolutions are 6 and 12 m, respectively. According to Wang et al. [31], the whole scene has been divided into 11 classes: rapeseed, grass, forest, peas, lucerne, wheat, beets, bare soil, stem beans, water and potatoes. This fully polarimetric SAR image, a classic farmland dataset, has been widely used to verify pixel-level classification performance. Its Pauli pseudo-color map, corresponding ground truth and labels are shown in Figure 5, and the number of pixels in each class is given in brackets in the label image.
(2) San Francisco dataset: This dataset is a four-look fully polarized L-band image collected by the NASA Jet Propulsion Laboratory's AIRSAR system in the San Francisco Bay Area. The image size is 900 × 1024 pixels, and its spatial resolution is approximately 10 m × 10 m. As shown in the Pauli image and ground truth in Figure 6, there are three major types of objects, namely vegetation, sea and urban areas, the latter being further divided into high-density urban areas, developed urban areas and low-density urban areas.
(3) Flevoland Dataset 2: The third fully polarized SAR image adopted in this paper is also L-band AIRSAR data, collected in the Netherlands in 1991. The number of looks is four, and the image, with a size of 1024 × 1024 pixels, contains only farmland areas. The scene consists of 14 crops: potatoes, fruit, oats, beet, barley, onions, wheat, beans, peas, maize, flax, rapeseed, grass and lucerne. Because it comprises a rich variety of crop types with similar scattering characteristics, this image is well suited to verifying the proposed classification algorithm. Figure 7 shows the Pauli RGB map and the corresponding ground truth, as suggested by Zhang et al. [36].

Experimental Settings and Parameter Tuning
(1) Experimental settings: In the experiments, all datasets were multi-looked, which is a basic method for speckle suppression in SAR images. Drawing on transfer learning, different pseudo-color maps were first fed into pre-trained multiple FCN-8s (MFCN) models to learn the multi-scale deep spatial features automatically. During this process, polarization characteristics and spatial information could be adaptively fused. Then, the outputs of the last layer from each FCN-8s were stacked to form spatially polarized fused features. For the high-dimensional fused features (whose labels are known in advance), a certain proportion was randomly selected as the training set, and the remainder was used for testing. The training set was utilized to construct an intrinsic graph and a penalty graph to learn the mapping relationship (represented by a matrix Q) from the high-dimensional feature space to the manifold subspace. Finally, an SVM classifier with a Gaussian kernel was trained to obtain the final classification results on the test samples.
To robustly evaluate the classification performance of the proposed algorithm, the overall accuracy (OA), kappa coefficient, and confusion matrix are employed as evaluation indicators, and comparative experiments were conducted with the following algorithms.
(a) LPP: Locality preserving projection (LPP) finds a manifold representation of polarized features in a low-dimensional subspace without considering the spatial relationship between pixels.
(b) FCN: A single pre-trained FCN is adopted to learn the multi-scale spatial structure of the PolSAR data. This nonlinear feature is actually a deep abstract semantic representation with relatively simple polarization properties.
(c) FCN+LPP: A single FCN extracts the multi-scale spatial features in PolSAR imagery, which undergo dimensionality reduction by the LPP algorithm for further classification.
(d) MFCN+LPP: This is the algorithm proposed in this paper. First, to describe various ground objects effectively, seven polarized decompositions are fed into multiple parallel FCN-8s models to learn multi-scale deep spatial features. Afterwards, through the manifold graph embedding model (LPP), the compact manifold representation of high-dimensional spatially polarized features from the MFCN is extracted for final classification.
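For reference, the evaluation indicators named above can be computed from predicted and true labels as in the following sketch. It uses only the Python standard library; the function names and the toy labels are hypothetical, and the kappa coefficient follows the standard definition (observed agreement corrected for chance agreement).

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """cm[i][j] counts samples of true class i that were predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of correctly classified samples (diagonal sum over total)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

def kappa_coefficient(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    # Chance agreement from the products of row sums and column sums.
    p_e = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
oa = overall_accuracy(cm)
kappa = kappa_coefficient(cm)
```

For these toy labels, six of the eight samples lie on the diagonal of the confusion matrix, so the OA is 0.75, while the kappa coefficient is lower because part of that agreement is expected by chance.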
(2) Parameter settings: The classification algorithm proposed in this paper can be divided into three stages: feature learning and fusion, dimension reduction and classification. In the first stage, there are millions of parameters in the FCN model to be learned. Considering the cost and scarcity of PolSAR data, this paper draws on transfer learning and employs an FCN-8s model that has been pre-trained on the PASCAL VOC 2011 dataset. The pre-trained model parameters are then transferred to learn the characteristics of the PolSAR data.
During this process, the original image is cut into four small patches; three of these patches are selected as the training set, and the remaining one is used as the test set. After four successive cycles, the classification results of the whole image can be obtained. Seven typical decomposition methods (Pauli decomposition, Cloude decomposition, Freeman decomposition, H/A/α decomposition, Huynen decomposition, Yamaguchi decomposition and Krogager decomposition) were adopted in the experiments. Pseudo-color images from several of these decompositions and the corresponding constituents of the R, G, and B channels are shown in Figure 8 and Table 1, respectively. Seven parallel FCNs, each taking the pseudo-color image of one decomposition as input, learned the multi-scale deep spatial features adaptively, achieving the simultaneous fusion of the polarized properties and spatial information.
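The four-patch rotation described above can be sketched as follows. The text does not state how the image is cut, so the 2 × 2 quadrant layout is an assumption, as are the function names; the image size is that of Flevoland Dataset 1.

```python
def quadrant_patches(height, width):
    """Cut an image grid into four patches (assumed 2 x 2 layout).

    Returns a list of (row_slice, col_slice) pairs.
    """
    h2, w2 = height // 2, width // 2
    rows = [slice(0, h2), slice(h2, height)]
    cols = [slice(0, w2), slice(w2, width)]
    return [(r, c) for r in rows for c in cols]

def rotation_folds(patches):
    """Yield (training patches, test patch) so that each patch is tested once."""
    for i, test_patch in enumerate(patches):
        train = [p for j, p in enumerate(patches) if j != i]
        yield train, test_patch

def patch_area(patch):
    """Number of pixels covered by a (row_slice, col_slice) patch."""
    r, c = patch
    return (r.stop - r.start) * (c.stop - c.start)

patches = quadrant_patches(750, 1024)
folds = list(rotation_folds(patches))
```

Because every patch serves as the test patch in exactly one of the four cycles, the union of the test predictions covers the whole image.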

Table 1. Constituents of the R, G and B channels for each polarized decomposition (columns: Polarized Decomposition, R Channel, G Channel, B Channel).
During the second stage, the training set and test set are divided, and dimensionality reduction is conducted based on the manifold graph embedding model. According to our previous research [31], the training ratios for Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2 were set to 5%, 3% and 5%, respectively. In addition, when establishing the manifold graph embedding model, designing a weight matrix W and specifying a subspace dimension k also involve parameter selection. First, for the weight matrix W, a heat kernel function (t = 1, Tikhonov regularization with a strength of 0.1) is chosen. Second, the variation of the classification accuracy with the subspace dimensionality is shown in Figure 9. Considering both the accuracy and time consumption, the subspace dimensions k of Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2 were set to 90, 50 and 35, respectively. For the final classification, an SVM with a Gaussian kernel is employed as the classifier, and five-fold cross-validation is used to optimize its parameters. To allow a more direct comparison of the methods on each dataset, we assume that the intensity of the speckle noise in the training data and test data of each dataset is consistent.
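To make the roles of W, t, the regularization strength and k concrete, the following is a minimal NumPy sketch of a generic LPP projection with heat kernel weights. It is an illustration under stated assumptions rather than the implementation used in the experiments: the graph here is an unsupervised k-nearest-neighbor graph, whereas the paper builds intrinsic and penalty graphs from the labeled training set, and the names `heat_kernel_weights`, `lpp`, `n_neighbors` and `alpha` are hypothetical.

```python
import numpy as np

def heat_kernel_weights(X, n_neighbors=5, t=1.0):
    """Symmetric k-NN graph with weights W_ij = exp(-||x_i - x_j||^2 / t)."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    d = dist2.copy()
    np.fill_diagonal(d, np.inf)                      # exclude self-neighbors
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    mask = np.zeros(dist2.shape, dtype=bool)
    rows = np.repeat(np.arange(X.shape[0]), n_neighbors)
    mask[rows, nearest.ravel()] = True
    mask = mask | mask.T                             # symmetrize the graph
    return np.where(mask, np.exp(-dist2 / t), 0.0)

def lpp(X, k, n_neighbors=5, t=1.0, alpha=0.1):
    """Return a projection matrix Q (d x k) for samples X (n x d).

    Solves X^T L X q = lambda (X^T D X + alpha I) q for the k smallest
    eigenvalues, where alpha is a Tikhonov regularization strength.
    """
    W = heat_kernel_weights(X, n_neighbors, t)
    D = np.diag(W.sum(axis=1))
    L = D - W                                        # graph Laplacian
    A = X.T @ L @ X
    B = X.T @ D @ X + alpha * np.eye(X.shape[1])
    # Reduce the generalized problem to a standard symmetric one
    # via the Cholesky factorization B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    eigvals, eigvecs = np.linalg.eigh((M + M.T) / 2.0)   # ascending order
    return C_inv.T @ eigvecs[:, :k]

# Toy data: 60 samples with 10 features, reduced to a 3-dimensional subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
Q = lpp(X, k=3)
Z = X @ Q                                            # low-dimensional embedding
```

In this sketch, a larger k keeps more projection directions at a higher classification cost, which is the trade-off between accuracy and time consumption considered when choosing the subspace dimension.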

Experimental Results and Analysis
In this section, the classification results of the proposed method and comparative algorithms are described from multiple perspectives. Specifically, the corresponding classification accuracy values, OAs and kappa coefficients for the different algorithms are listed in Tables 2-4; the confusion matrix and visualized classification results of the proposed method are shown in Figures 10-15.
Table 2 shows that the LPP method has poor classification performance for all classes, with an average overall accuracy of 73.70%. The main reason is that the LPP algorithm is a shallow learning algorithm and lacks spatial constraints. The FCN greatly improves the classification accuracy for all categories (OA of 90.99%) except Category 9, where its accuracy of 71.13% is lower than that of the LPP algorithm. By contrast, with the FCN+LPP algorithm, the complementarity of the deep spatial features from the FCN and manifold graph embedding learning greatly enhances the classification accuracy in most categories. In particular, for rapeseed, beets and stem beans, the accuracy improves by 6%, 13% and 19%, respectively, indicating that the manifold subspace representation has captured the deep multi-scale spatial features.
Compared with the FCN+LPP algorithm, the MFCN+LPP algorithm improves the accuracy for almost every category, especially beets and stem beans (5% for both). This finding demonstrates that passing the polarized properties through the presented MFCN and fusing them with multi-scale spatial features is more effective for pixel-level classification.
For the San Francisco dataset, the experimental results in Table 3 show that the FCN+LPP algorithm performs better than either the LPP or the FCN method alone. Looking at the accuracy only, the FCN outperforms the LPP algorithm; its classification accuracy is greater than 95% for all categories and reaches 99.44% for sea, which shows that spatial features from the FCN play a leading role in distinguishing different ground objects. However, the visualized classification results presented in Figure 12b show that the boundary areas contain many misclassified pixels. For example, there are some misclassified samples at the intersection of the categories indicated by yellow and green, since the FCN cannot handle edge details well.
By integrating spatial features from the FCN and manifold representations from the graph embedding model, the LPP algorithm's advantage of retaining local characteristics can largely compensate for the FCN's defects. After combining the FCN and LPP methods, the overall accuracy reaches 98.46%. The proposed MFCN+LPP algorithm further combines polarized information with spatial features through multiple parallel convolutional neural networks; then, the manifold graph embedding model is employed to remove redundancies, and the classification accuracy of vegetation, high-density areas and low-density areas continues to grow. Moreover, the accuracy for the low-density areas reaches 100%, indicating that multi-scale polarized spatial features have a strong discriminating ability for classification. However, the accuracy for sea and developed areas is lower than that of the FCN+LPP method; because these two categories account for a large proportion of the area, the proposed method's overall performance on this dataset is not as good as that of the FCN+LPP method.
Figure 14 shows the visualized classification results of different algorithms on Flevoland Dataset 2. The corresponding class accuracies, OAs and kappa coefficients are given in Table 4. Although the LPP-based manifold method can obtain an OA of up to 96.54%, the combination of the FCN and LPP methods achieves a better performance, with the OA increasing to 99.83% (FCN+LPP) and 99.54% (MFCN+LPP).
Compared with the single LPP method (56.39% for beans), the accuracy of both the FCN+LPP and MFCN+LPP methods on the same class increased by nearly 42%. For fruit, oats, wheat, peas, maize, flax, rapeseed, grass and lucerne, the MFCN+LPP algorithm achieves the best performance in terms of both the visualization and the classification accuracy. The accuracy for most of the categories mentioned above is 100%; for rapeseed, it is 99.86%. The single FCN already has excellent classification performance, with an OA as high as 97.45%, but its synergy with the LPP algorithm can boost the final results. More importantly, when polarized features are fed into the MFCN, the polarization information can effectively fuse with the spatial characteristics at multiple scales and levels, thereby improving the representation ability and discrimination ability of the extracted features.
Finally, the runtimes of the proposed algorithm and the other comparison methods are given in Tables 5 and 6. The experiments based on the FCN-8s models for deep multi-scale spatial feature extraction were conducted in the Caffe framework using an NVIDIA Tesla K40 GPU and took 423.08, 490.98 and 559.86 s for Flevoland Dataset 1, the San Francisco dataset and Flevoland Dataset 2, respectively; these times are approximately linear in the input image size. When the hardware conditions are satisfied, parallel operations can be performed on a machine with multiple GPUs to reduce the time consumption. Other experiments were performed using MATLAB 2014 on a computer with an Intel Core i5-4570 CPU and 32 GB RAM. Table 6 indicates that the single FCN and the FCN+LPP algorithm have lower time costs, while the time consumption of the MFCN+LPP algorithm is the highest because its generated feature space has the highest dimensionality, which increases the computational complexity.
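The approximately linear relationship between feature extraction time and image size can be checked with simple arithmetic on the figures quoted above. The sketch below uses only the image sizes and times reported in the text; the function and variable names are hypothetical.

```python
# (height, width, feature extraction time in seconds), as reported in the text.
datasets = {
    "Flevoland Dataset 1": (750, 1024, 423.08),
    "San Francisco": (900, 1024, 490.98),
    "Flevoland Dataset 2": (1024, 1024, 559.86),
}

def milliseconds_per_pixel(height, width, seconds):
    """Average extraction time per pixel, in milliseconds."""
    return 1000.0 * seconds / (height * width)

per_pixel = {
    name: milliseconds_per_pixel(h, w, s) for name, (h, w, s) in datasets.items()
}
for name, value in per_pixel.items():
    print(f"{name}: {value:.4f} ms per pixel")
```

The three values fall between roughly 0.53 and 0.55 ms per pixel, so the cost per pixel is nearly constant across the three image sizes.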

Conclusions
In this paper, to effectively describe terrain with various ground object types, we propose a parallel and multi-level fully convolutional neural network for PolSAR imagery classification. For better spatial-feature learning and fusion with polarized information, the network incorporates a shallow manifold embedding subspace representation to remove redundancies. In terms of the classification accuracy, the proposed algorithm can achieve results comparable to those of state-of-the-art algorithms, without the need for post-processing. Furthermore, the proposed algorithm is quite fast and efficient on small datasets. The effectiveness of the proposed algorithm mainly depends on the following aspects: (1) the FCN has a powerful ability to automatically learn nonlinear deep multi-scale spatial features, and the skip architecture in the network handles pixel-level classification well; (2) subspace learning based on manifold embedding can largely preserve local features and reveal the most essential structure of PolSAR data; and (3) the deep MFCN and shallow LPP algorithms can complement each other and improve the representation and discrimination ability of the spatially polarized features. Relatively speaking, the proposed method requires more time to train the network in the feature extraction phase, unless a machine that supports parallel computing is used. This shortcoming is one of the problems we intend to address in future work. In addition, phase information has been proven to be meaningful for characterizing ground objects in many studies, but existing network architectures are usually designed for real-valued input. Nevertheless, we will attempt to incorporate the phase information contained in PolSAR imagery to boost classification performance.