Pixel-Wise PolSAR Image Classification via a Novel Complex-Valued Deep Fully Convolutional Network

Although complex-valued (CV) neural networks have shown better classification results than their real-valued (RV) counterparts for polarimetric synthetic aperture radar (PolSAR) classification, the extension of pixel-level RV networks to the complex domain has not yet been thoroughly examined. This paper presents a novel complex-valued deep fully convolutional neural network (CV-FCN) designed for PolSAR image classification. Specifically, CV-FCN uses PolSAR CV data that includes the phase information and utilizes the deep FCN architecture that performs pixel-level labeling. It integrates the feature extraction module and the classification module in a unified framework. To account for the particular characteristics of PolSAR data, a dedicated complex-valued weight initialization scheme is defined to initialize CV-FCN. It considers the distribution of polarization data so that CV-FCN can be trained from scratch efficiently. CV-FCN employs a complex downsampling-then-upsampling scheme to extract dense features. To enrich discriminative information, multi-level CV features that retain more polarization information are extracted via the complex downsampling scheme. Then, a complex upsampling scheme is proposed to predict dense CV labeling. It employs complex max-unpooling layers to capture more spatial information for better robustness to speckle noise. In addition, to achieve faster convergence and obtain more precise classification results, a novel average cross-entropy loss function is derived for CV-FCN optimization. Experiments on real PolSAR datasets demonstrate that CV-FCN achieves better classification performance than other state-of-the-art methods.


I. INTRODUCTION
Polarimetric synthetic aperture radar (PolSAR) images have received a lot of attention because they provide more comprehensive and abundant information than single-polarization SAR images [1]. In PolSAR image analysis and interpretation, image classification is one of the most typical and important tasks. Numerous traditional schemes have been developed for PolSAR image classification, such as Wishart classifiers [2]-[4], target decompositions (TDs) [5]-[8], and random fields (RFs) [9]-[11]. However, these traditional methods rely on features that are mostly low-level and hand-crafted and that involve a considerable amount of manual trial and error [12]. Besides, hand-engineered features such as TD features rely heavily on the complex analysis of PolSAR data. Meanwhile, selecting a descriptive feature set is a burden in terms of computation time.
With the rapid development of learning algorithms, several machine learning tools that perform feature learning (or at least feature optimization) have been applied, such as support vector machines (SVMs) [13], [14] and random forest (RF) [12]. However, they are still shallow models that depend on a large number of input features and may not be robust to nonlinear data [15]. Recently, deep learning (DL) has achieved remarkable results in the remote sensing community [16]-[18]. Compared with the aforementioned conventional methods, DL techniques can automatically learn discriminative features and perform advanced tasks through multiple neural layers in an end-to-end manner, thereby reducing manual error and achieving promising results [19]. In recent years, several DL-based algorithms have significantly improved the performance of PolSAR image classification, such as the sparse autoencoder (SAE) [20], deep belief network (DBN) [21], convolutional neural network (CNN) [22], [23], and deep fully convolutional neural network (FCN) [24]-[26].
Notably, most studies on DL methods for PolSAR classification tasks predominantly focus on the case of real-valued neural networks (RV-NNs). In RV-NNs, input, weights, and output are all modeled as real-valued (RV) numbers. This means that projections are required to convert the PolSAR complex-valued (CV) data to RV features as RV-NNs input.
Although RV-NNs have demonstrated excellent performance in PolSAR image classification tasks, RV features pose a couple of problems. Firstly, it is unclear which projection yields the best performance for a particular PolSAR image. Although descriptive feature sets generated by multiple projections have achieved remarkable results, a larger feature set increases computing time and memory consumption, and may even cause data redundancy problems [12]. Secondly, projection sometimes means a loss of valuable information, especially the phase information, which may lead to unsatisfactory results. In fact, the phase of multichannel SAR data can provide useful information in the interpretation of SAR images. Especially for PolSAR systems, phase differences between polarizations have received significant attention for a couple of decades [27]-[30].
arXiv:1909.13299v1 [eess.IV] 29 Sep 2019
In view of the aforementioned problems, some researchers have begun to investigate networks that are tailored to the CV data of PolSAR images and do not require any projection to classify them. Hansch et al. [31] first proposed complex-valued MLPs (CV-MLPs) for land use classification in PolSAR images. Shang et al. [32] suggested a complex-valued feedforward neural network in the Poincare sphere parameter space. Moreover, an improved quaternion neural network [33] and a quaternion autoencoder [34] have been proposed for PolSAR land classification. Recently, a complex-valued CNN (CV-CNN) specifically designed for PolSAR image classification has been proposed by Zhang et al. [35], where the authors derived a complex backpropagation algorithm based on stochastic gradient descent for CV-CNN training.
Although CV-NNs have achieved remarkable breakthroughs for PolSAR image classification, they still face some challenges. Firstly, relatively deep network architectures have not received considerable attention in the complex domain. The structures of the above CV-NNs are relatively simple, with a limited number of feature extraction layers. This limits what the networks can learn and carries the risk of suboptimal classification results. Secondly, these networks fail to take sufficient spatial information into account to reduce the impact of speckle on classification results. Because of the inherent speckle in PolSAR images, pixel-based classification is easily affected and may even produce incorrect results. In this case, those CV-NNs are ineffective at distinguishing complex classes, since only the local context of small image patches is considered. Thirdly, a CV-NN for direct pixel-wise labeling is needed to predict quickly and effectively. Image classification is a dense (pixel-level) problem that aims at assigning a label to each pixel in the input image. However, existing CV-NNs usually assign an entire input image patch to a single category. This results in a large amount of redundant processing and seriously repetitive computation.
In response to the above challenges, this paper explores a complex-valued deep FCN architecture, which is an extension of FCN to the complex domain. The FCN was first proposed in [36] and is an excellent pixel-level classifier for semantic labeling. Typically, an FCN outputs a 2-dimensional (2D) spatial image and can preserve certain spatial context information for accurate labeling results. Recently, FCNs have demonstrated remarkable classification ability in the remote sensing community [37], [38]. However, to utilize FCN in the complex domain (i.e., CV-FCN) for PolSAR image classification, some tricky problems need to be tackled. Firstly, a CV-FCN tailored to PolSAR data requires a proper scheme for complex-valued weight initialization. Generally, FCNs are pre-trained on VGG-16 [39], whose parameters are first trained using optical images and are all real-valued numbers. However, those parameters are not appropriate for initializing the CV weights of CV-FCN and are ineffective for PolSAR images, since they cannot preserve polarimetric phase information. A proper complex-valued weight initialization scheme can not only initialize CV weights effectively but also has the potential to reduce the risk of vanishing or exploding gradients, thereby speeding up training and improving the performance of the network. Secondly, the layers in the upsampling scheme of CV-FCN should be constructed in the complex domain. Although some works have extended some layers to the complex domain [31], [35], [40], upsampling layers have not yet been thoroughly examined in this domain. Finally, in the training process of CV-FCN, it is necessary to select a loss function for CV predicted labeling. The aim is to achieve faster convergence during CV-FCN optimization and obtain higher classification accuracy. Thus, a reasonable loss function in the complex domain that is suitable for PolSAR image classification needs to be designed.
In view of the above limitations, we present a novel complex-valued deep fully convolutional network (CV-FCN) for classification of PolSAR imagery. The proposed deep CV-FCN adopts the complex downsampling-then-upsampling scheme to achieve pixel-wise classification results. To this end, this paper focuses on four aspects: 1) complex-valued weight initialization for faster PolSAR feature learning; 2) multi-level CV feature extraction for enriching discriminative information; 3) recovery of more spatial information for stronger speckle noise immunity; 4) an average cross-entropy loss function for more precise labeling results. Specifically, the CV weights of CV-FCN are first initialized by a new complex-valued weight initialization scheme. This scheme explicitly takes the statistical characteristics of PolSAR data into account, which makes it effective for faster training. Then, different-level CV features that retain more polarization information are extracted via the complex downsampling section. Those CV features have a powerful discriminative capacity for various classes. Subsequently, the complex upsampling section upsamples low-resolution CV feature maps and generates dense labeling.
Notably, to retain more spatial information, complex max-unpooling layers are used in the upsampling section. Those layers recover more spatial information through the max locations maps, which reduces the effect of speckle on the coherence of labeling results and improves boundary delineation. In addition, to train CV-FCN more effectively, an average cross-entropy loss function is employed to update CV-FCN parameters. The loss function performs cross-entropy operations on the real and imaginary parts of the CV predicted labeling, respectively. In this way, the phase information is also taken into account during parameter updating, resulting in more precise classification for PolSAR images. Extensive experimental results reflect the effectiveness of CV-FCN for classification of PolSAR imagery. In summary, the major contributions of this paper can be highlighted as follows:
1) The CV-FCN structure is proposed for PolSAR image classification, in which weights, biases, input, and output are all modeled as complex values. The CV-FCN directly utilizes PolSAR CV data as input without any data projection, so it can extract multi-level and more robust CV features that retain more polarization information and have a powerful discriminative capacity for various categories.
2) A new complex-valued weight initialization scheme is employed to initialize CV-FCN parameters and train CV-FCN from scratch. It allows CV-FCN to mine polarimetric features after relatively little tuning. Thus, it makes CV-FCN training faster and saves computation time.
3) A complex upsampling scheme for CV-FCN is proposed to capture more spatial information through max-unpooling layers. This scheme not only removes the need to learn the upsampling, which simplifies optimization, but also recovers more spatial information through max locations maps to reduce the impact of speckle. Thus, smoother and more coherent classification results can be achieved.
4) A new average cross-entropy loss function in the complex domain is employed for CV-FCN optimization. It takes the phase information into account during parameter updating through the average cross-entropy operation on CV predicted labels. Therefore, the new loss function makes CV-FCN optimization more precise while boosting the labeling accuracy.
The remainder of this paper is organized as follows. Section II presents the theory of the CV-FCN classification method in detail. In Section III, we conduct experiments on real benchmark PolSAR images and give detailed comparisons and analyses. Finally, the conclusion and future work are discussed in Section IV.

II. PROPOSED CV-FCN FOR CLASSIFICATION OF POLSAR IMAGERY
In this work, a deep CV-FCN is proposed to conduct PolSAR image classification. The CV-FCN method integrates the feature extraction module and the classification module in a unified framework. Thus, features extracted by a CV-FCN trained on PolSAR data are better able to distinguish various categories in PolSAR classification tasks. In the following, we first give the framework of the deep CV-FCN classification method in Section II-A. Then, to learn more discriminative features for classification faster and more accurately, it is critical to train a CV-FCN suitable for PolSAR images. Thus, we highlight four critical aspects of CV-FCN training in Sections II-B, II-C, and II-D. They include CV weight initialization, deep and multi-level CV feature extraction, recovery of more spatial information, and a loss function for more precise optimization. Finally, the CV-FCN classification algorithm is summarized in Section II-E.

A. Framework of the Deep CV-FCN Classification Method
The framework of CV-FCN classification method is shown in Fig. 1, which is composed of two separate modules: feature extraction module and classification module. In the feature extraction module, CV-FCN is trained to exploit the discriminative information. Then, the trained CV-FCN is used to classify PolSAR images in the classification module.
The data patches set and the corresponding label patches set are first prepared as input to CV-FCN before training. The two sets are generated from the PolSAR dataset and the corresponding ground truth mask, respectively. Let the CV PolSAR dataset be $H \in \mathbb{C}^{h \times w \times B}$, where $h$ and $w$ are the height and width of the spatial dimensions respectively, $B$ is the number of complex bands, and $\mathbb{C}$ is the complex domain. The corresponding ground truth mask is denoted as $G \in \mathbb{R}^{h \times w}$. The set of all data patches cropped from the given dataset is denoted as $I = \{I_1, I_2, \ldots, I_n\}$, and the corresponding label patches set is $T = \{T_1, T_2, \ldots, T_n\}$, where $I_i \in \mathbb{C}^{H \times W \times B}$ and $T_i \in \mathbb{R}^{H \times W}$ ($i \in [1, n]$) represent one data patch and the corresponding label patch, respectively. Here $n$ is the total number of patches, and $H$ and $W$ are the patch sizes in the spatial dimensions.
In the feature extraction module, the CV-FCN is trained. A novel complex-valued weight initialization scheme is first adopted to initialize CV-FCN. Then, a certain percentage of patches from the set $I$ is randomly chosen as the training data patches $I_{train}$ for the network. These data patches are forward-propagated through the complex downsampling section of CV-FCN [marked by red dotted boxes in Fig. 1] to extract multi-level CV feature maps. Then those low-resolution feature maps are upsampled by the complex upsampling section [marked by blue dotted boxes in Fig. 1] to generate predicted label patches. Subsequently, the error between the predicted label patches and the corresponding label patches $T_{train}$ is calculated according to a novel loss function, and the CV parameters in CV-FCN are updated iteratively. The iteration terminates when the error value no longer changes substantially.
In the classification module, we feed the entire PolSAR dataset H to the trained network. The label of every pixel in this PolSAR image is predicted based on the output of the last complex softmax layer. Notably, compared with a CNN model which predicts a single label for the center of each image patch, the CV-FCN model can predict all pixels in the entire image at one time. Thus, this enables pixel-level labeling and can decrease the computation time during the prediction.

B. New Complex-valued Weight Initialization Scheme Using Polarization Data Distribution for Faster Feature Learning
Once the CV-FCN architecture for the PolSAR image classification task has been built, the weight initialization problem arises when training the network. Generally, deep ConvNets can start from pre-trained weights obtained by transfer learning. However, those weights are all real-valued numbers and only reflect the backscattering intensities, while the polarimetric phase is lost [41]. Here, based on the distribution of polarization data, a new complex-valued (CV) weight initialization scheme is employed for faster network learning.
For RV networks that process PolSAR images, learned weights, commonly known as kernels, can characterize scattering patterns well, particularly in high-level layers [22]. In [35], the initialization scheme simply initializes the real and imaginary parts of a CV weight separately with a uniform distribution. Fortunately, for a reciprocal medium, a complex scattering vector $u = [S_1, S_2, S_3]^T$ can be modeled by a multivariate complex Gaussian distribution [1]. Thus, we utilize this distribution to initialize complex weights in CV-FCN.
Suppose that a CV weight is denoted as $W = \Re(W) + j\,\Im(W)$, where the real component $\Re(W)$ and the imaginary component $\Im(W)$ are identically Gaussian distributed with zero mean and variance $\sigma^2/2$. Here, the initialization criterion proposed by He et al. [42] is used to calculate the variance of $W$, i.e., $\mathrm{Var}(W) = 2/n_{in}$, where $n_{in}$ is the number of input units, since this criterion provides the current best practice when the activation function is ReLU.
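Notably, the CV weight can also be denoted as $W = |W|e^{j\theta}$, where the magnitude $|W|$ follows the Rayleigh distribution. The expectation and the variance are given by
$$E[|W|] = \sigma\sqrt{\pi/2}, \tag{1}$$
$$\mathrm{Var}(|W|) = \frac{4-\pi}{2}\,\sigma^2, \tag{2}$$
where $\sigma$ is the single parameter in the Rayleigh distribution. In addition, the variance $\mathrm{Var}(|W|)$ and the variance $\mathrm{Var}(W)$ can be defined as
$$\mathrm{Var}(|W|) = E[|W|^2] - (E[|W|])^2, \tag{3}$$
$$\mathrm{Var}(W) = E[|W|^2] - |E[W]|^2. \tag{4}$$
According to the initialization rules of [40], in the case of $W$ symmetrically distributed around 0, $\mathrm{Var}(W) = E[|W|^2]$. Thus, $\mathrm{Var}(W)$ can be formulated as
$$\mathrm{Var}(W) = \mathrm{Var}(|W|) + (E[|W|])^2. \tag{5}$$
Taking Equation (1) and Equation (2) into account, $\mathrm{Var}(W)$ is calculated as
$$\mathrm{Var}(W) = \frac{4-\pi}{2}\,\sigma^2 + \frac{\pi}{2}\,\sigma^2 = 2\sigma^2. \tag{6}$$
According to He's initialization criterion and Equation (6), $2\sigma^2 = 2/n_{in}$, so the single parameter in the Rayleigh distribution can be computed as $\sigma = 1/\sqrt{n_{in}}$. At this point, the Rayleigh distribution can be used to initialize the amplitude $|W|$. In addition, the phase $\theta$ is initialized using the uniform distribution between $-\pi$ and $\pi$. Thus, the initialization of the complex weight is finished.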
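As an illustration of the scheme described above, the following NumPy sketch draws complex weights with a Rayleigh-distributed magnitude and a uniformly distributed phase. It is not the authors' implementation. The function name `complex_rayleigh_init` and its arguments are hypothetical, and the sketch assumes that $\sigma$ is the Rayleigh scale chosen so that $\mathrm{Var}(W) = 2\sigma^2 = 2/n_{in}$.

```python
import numpy as np

def complex_rayleigh_init(shape, n_in, rng=None):
    """Draw complex weights W = |W| * exp(j*theta).

    |W| ~ Rayleigh(sigma), theta ~ Uniform(-pi, pi).
    Assumption: sigma = 1/sqrt(n_in), so that Var(W) = 2*sigma^2 = 2/n_in.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = 1.0 / np.sqrt(n_in)
    magnitude = rng.rayleigh(scale=sigma, size=shape)
    phase = rng.uniform(-np.pi, np.pi, size=shape)
    return magnitude * np.exp(1j * phase)

# Example: 3x3 kernels, 6 input channels, 64 output channels (illustrative sizes).
n_in = 3 * 3 * 6
W = complex_rayleigh_init((3, 3, 6, 64), n_in, rng=np.random.default_rng(0))
# np.var on a complex array returns the mean of |W - mean(W)|^2.
print(W.dtype, np.var(W), 2.0 / n_in)
```

The empirical variance printed at the end should be close to $2/n_{in}$, which is a quick way to check that a chosen $\sigma$ satisfies the variance criterion.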
It is worth noting that our initialization scheme is quite different from the random initialization of both the real and imaginary parts of a CV weight [35]. The most notable advantage of the new initialization scheme is that it explicitly takes the statistical characteristics of the training data into account, which makes it possible to learn a CV network suitable for PolSAR images after a small amount of fine-tuning. In other words, at the beginning of training the network already exhibits some of the same properties as the data to be learned, which provides prior information rather than purely random initial information. Thus, the scheme increases the chance of learning the special properties of PolSAR datasets and is effective for faster training.

C. Deep CV-FCN for Dense Feature Extraction
In the forward propagation of CV-FCN training procedure, dense features are extracted through the complex downsampling-then-upsampling scheme. The detailed configuration of CV-FCN is shown in Table I. The complex downsampling section first extracts effective multi-level CV features through downsampling blocks (i.e., B1-B5 in Fig. 1). Then, the complex upsampling section recovers more spatial information in a simple manner and produces dense labeling through a series of upsampling blocks (i.e., B7-B11 in Fig.  1). In particular, fully skip connections between the complex downsampling section and the complex upsampling section fuse shallow, fine features and deep, coarse features to preserve sufficient detailed information for complex classes distinction.
1) Multi-level Complex-valued Feature Extraction via the Complex Downsampling Section: The complex downsampling section consisting of downsampling blocks extracts 2-D CV features of different levels. In CV-FCN, five downsampling blocks are employed to extract more abstract and extensive features. Each of them contains four layers, including a complex convolution layer, a complex batch normalization layer, a complex activation layer, and a complex max-pooling layer. Among these layers, the main feature extraction work is performed in the complex convolution layer. Compared with the real convolution layer, it extracts CV features retaining more polarization information and discriminative information through the complex convolution operation.
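In the $l$th complex convolution layer, the complex filters are $W^l \in \mathbb{C}^{H \times H \times M_{l-1} \times N_l}$ and the complex bias is $b^l \in \mathbb{C}^{N_l}$, where $M_{l-1}$ is the number of input channels and $N_l$ is the number of output channels. The output complex feature maps $Y^l \in \mathbb{C}^{H_l \times W_l \times N_l}$ of the complex convolution layer are computed by
$$Y^l = W^l \otimes X^{l-1} + b^l, \tag{7}$$
where $X^{l-1} \in \mathbb{C}^{H_{l-1} \times W_{l-1} \times M_{l-1}}$ is the given input complex feature maps, $H_{l-1} \times W_{l-1}$ is the input feature map size, and $\otimes$ is the convolution operation in the complex domain. The matrix notation of the $n$th output complex feature map $Y_n^l$ ($n \in [1, N_l]$) is given by
$$\begin{bmatrix} \Re(Y_n^l) \\ \Im(Y_n^l) \end{bmatrix} = \sum_{m=1}^{M_{l-1}} \begin{bmatrix} \Re(W_{nm}^l) & -\Im(W_{nm}^l) \\ \Im(W_{nm}^l) & \Re(W_{nm}^l) \end{bmatrix} * \begin{bmatrix} \Re(X_m^{l-1}) \\ \Im(X_m^{l-1}) \end{bmatrix} + \begin{bmatrix} \Re(b_n^l) \\ \Im(b_n^l) \end{bmatrix}, \tag{8}$$
where $W_{nm}^l = \Re(W_{nm}^l) + j\,\Im(W_{nm}^l)$, $\Re(W_{nm}^l)$ and $\Im(W_{nm}^l)$ are respectively the real part and the imaginary part of $W_{nm}^l$, and $*$ is the convolution operation in the real domain. Thus the $n$th output complex feature map can be represented as
$$Y_n^l = \sum_{m=1}^{M_{l-1}} \left[ \Re(W_{nm}^l) * \Re(X_m^{l-1}) - \Im(W_{nm}^l) * \Im(X_m^{l-1}) \right] + \Re(b_n^l) + j \left( \sum_{m=1}^{M_{l-1}} \left[ \Re(W_{nm}^l) * \Im(X_m^{l-1}) + \Im(W_{nm}^l) * \Re(X_m^{l-1}) \right] + \Im(b_n^l) \right). \tag{9}$$
The complex batch normalization (BN) layer [40] is applied for normalization after the complex convolution, which holds great potential to relieve networks from overfitting. For the non-linear transformation of CV features, we find that the complex-valued ReLU (CReLU) as the complex activation gives good results. The CReLU is defined as
$$\mathrm{CReLU}(x) = \mathrm{ReLU}(\Re(x)) + j\,\mathrm{ReLU}(\Im(x)), \tag{10}$$
where $x = \Re(x) + j\,\Im(x) \in \mathbb{C}$. Then the output $Z^{l+1} \in \mathbb{C}^{H_l \times W_l \times N_l}$ of the $(l+1)$th complex nonlinear layer is given by
$$Z^{l+1} = \mathrm{CReLU}(Y^l) = \mathrm{ReLU}(\Re(Y^l)) + j\,\mathrm{ReLU}(\Im(Y^l)). \tag{11}$$
Furthermore, the complex max-pooling layer [35] is adopted to generalize features to a higher level. In this way, features are more robust and CV-FCN can converge well. After the five downsampling blocks, block 6 (B6 in Fig. 1), which includes a complex convolution layer with 1×1 kernels and a complex batch normalization layer, densifies its sparse input and extracts complex convolution features.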
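The decomposition of the complex convolution into four real-valued convolutions, followed by CReLU, can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the network's actual layer: the helper names (`real_conv2d`, `complex_conv2d`, `crelu`) are hypothetical, stride 1 with no padding is assumed, square kernels are assumed, and the real-domain operation is implemented as cross-correlation (no kernel flip), as is common in deep learning frameworks. Batch normalization is omitted.

```python
import numpy as np

def real_conv2d(x, k):
    """'Valid' 2-D cross-correlation of a real map x (H, W) with a real kernel k (F, F)."""
    F = k.shape[0]
    H, W = x.shape
    out = np.zeros((H - F + 1, W - F + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + F, j:j + F] * k)
    return out

def complex_conv2d(x, w, b):
    """x: (H, W, M) complex input, w: (F, F, M, N) complex filters, b: (N,) complex bias."""
    F, _, M, N = w.shape
    H, Wd, _ = x.shape
    y = np.zeros((H - F + 1, Wd - F + 1, N), dtype=complex)
    for n in range(N):
        re = np.zeros(y.shape[:2])
        im = np.zeros(y.shape[:2])
        for m in range(M):
            xr, xi = x[..., m].real, x[..., m].imag
            wr, wi = w[..., m, n].real, w[..., m, n].imag
            re += real_conv2d(xr, wr) - real_conv2d(xi, wi)  # real part
            im += real_conv2d(xr, wi) + real_conv2d(xi, wr)  # imaginary part
        y[..., n] = re + 1j * im + b[n]
    return y

def crelu(z):
    """Apply ReLU separately to the real and imaginary parts."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 6, 2)) + 1j * rng.normal(size=(6, 6, 2))
w = rng.normal(size=(3, 3, 2, 4)) + 1j * rng.normal(size=(3, 3, 2, 4))
b = rng.normal(size=4) + 1j * rng.normal(size=4)
z = crelu(complex_conv2d(x, w, b))
print(z.shape)
```

Each output value equals the direct complex sum of products over the kernel window, which gives a simple consistency check for the real and imaginary decomposition.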
2) Using the Complex Upsampling Section to Recover More Spatial Information for Stronger Speckle Noise Immunity: After the complex downsampling section has extracted multi-level CV features, a complex upsampling section is used to upsample those CV feature maps. Specifically, new complex max-unpooling layers are employed in the complex upsampling section. The reason is two-fold. On the one hand, compared with the complex deconvolution layer, which is another upsampling operation, the complex max-unpooling layer reduces the number of trainable parameters and mitigates the information loss caused by complex pooling operations. On the other hand, owing to the inherent speckle in PolSAR images, obtaining smooth labeling results is not easy. This issue can be addressed by the complex max-unpooling layer, which recovers more spatial information through the max locations maps [represented by purple dotted arrows in Fig. 1]. Spatial information is critical for classifying easily confused categories, because it captures wider visual cues and thus strengthens immunity to speckle noise.
To be more intuitive, Fig. 2 illustrates an example of the complex max-unpooling operation. The green and black boxes are simple structures of the complex max-pooling and complex max-unpooling, respectively. As shown in the green box, the amplitude feature map is formed by the real and imaginary feature maps where the red dotted box represents 2×2 pooling window with a stride of 2. In the amplitude feature map, four maximum amplitude values are chosen by corresponding pooling windows which are marked by orange, blue, green, and yellow, respectively. They construct the pooled map. At the same time, locations of those maxima are recorded in a set of switch variables which is visualized by the so-called max locations map. On the other hand, within the black box, the real and imaginary input maps are upsampled by the usage of this max locations map, respectively. Then the real and imaginary unpooled maps are produced. Here, those unpooled maps are sparse wherein white regions have the values of 0. This will ensure that the resolution of the output is higher than the resolution of its input.
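The pooling and unpooling operations illustrated in Fig. 2 can be sketched as follows, assuming a single complex feature map, non-overlapping windows, and a map size divisible by the window size. The function names are hypothetical. Unpooling is applied here directly to a complex array, which is equivalent to unpooling the real and imaginary maps separately with the same max locations map.

```python
import numpy as np

def complex_max_pool(z, size=2):
    """Keep, in each size x size window, the entry with the largest amplitude.

    Returns the pooled complex map and the max locations map (switch variables),
    stored as the flat index of the maximum inside each window.
    """
    H, W = z.shape
    win = z.reshape(H // size, size, W // size, size).transpose(0, 2, 1, 3)
    win = win.reshape(H // size, W // size, size * size)
    idx = np.argmax(np.abs(win), axis=-1)
    pooled = np.take_along_axis(win, idx[..., None], axis=-1)[..., 0]
    return pooled, idx

def complex_max_unpool(z, idx, size=2):
    """Place each value of z back at its recorded location; all other entries are 0."""
    h, w = z.shape
    win = np.zeros((h, w, size * size), dtype=z.dtype)
    np.put_along_axis(win, idx[..., None], z[..., None], axis=-1)
    out = win.reshape(h, w, size, size).transpose(0, 2, 1, 3)
    return out.reshape(h * size, w * size)

z = np.array([[1 + 1j, 0.5, 2, 0],
              [0, 0, 0, 3j],
              [0, -4, 1, 0],
              [0, 0, 0, 1 + 1j]])
pooled, idx = complex_max_pool(z)
unpooled = complex_max_unpool(pooled, idx)
print(pooled)
print(idx)
print(unpooled)
```

In this toy input, the unpooled map is sparse: only the four positions recorded in the max locations map are non-zero.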
In particular, we use fully skip connections, which fuse multi-level features to preserve sufficient discriminative information for the classification of complex classes. Finally, the complex output layer with the complex softmax function is used to calculate the prediction probability map. Thus, the output of CV-FCN can be formulated as
$$O = g(\Re(Y^{l-1})) + j\,g(\Im(Y^{l-1})), \tag{12}$$
where $Y^{l-1} \in \mathbb{C}^{H \times W \times K}$ is the input of the complex output layer and $g(\cdot)$ is the softmax function in the real domain. In this layer, the output feature maps have the same size as the data cube fed into CV-FCN. This enables pixel-to-pixel training. After the complex downsampling section and the complex upsampling section, the complex forward propagation process of the training phase is completed.

D. Average Cross-entropy Loss Function for Precise CV-FCN Optimization
To train CV-FCN more effectively and achieve more precise results, a novel loss function is used as the learning objective to iteratively update the CV-FCN parameters $\Theta$ during backpropagation, where $\Theta$ includes $W^l$ and $b^l$. Usually, for multi-class classification tasks, the cross-entropy loss function performs well for updating parameters. Compared with the quadratic cost function, it can increase the training speed and promote the training of NNs more effectively. Thus, a novel average cross-entropy (ACE) loss function $J$, based on the definition of the popular cross-entropy loss function, is employed for CV predicted labels in PolSAR classification tasks. It is defined on $O$ and $R$, where $O \in \mathbb{C}^{H \times W \times K}$ indicates the output data cube of the last complex softmax layer and $K$ is the total number of classes. $R \in \mathbb{C}^{H \times W \times K}$ is the sparse representation of the true label patch $T \in \mathbb{R}^{H \times W}$, obtained by one-hot encoding. Notably, the non-zero positions within $R$ are $1 + 1 \cdot j$ instead of $1 + 0 \cdot j$. This means that the phase information is also taken into account during parameter updating. As a result, the updated CV-FCN can work effectively, leading to more precise classification results for PolSAR images.
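To make the description above concrete, the sketch below applies a real softmax separately to the real and imaginary parts of the output and averages the two resulting cross-entropies. It is one plausible reading of the text rather than the paper's exact definition: the factor 1/2, the mean over pixels, the small constant `eps`, and all function names are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable real-valued softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def complex_softmax(y):
    """Real softmax over the class axis, applied to real and imaginary parts separately."""
    return softmax(y.real) + 1j * softmax(y.imag)

def one_hot_complex(t, num_classes):
    """Integer label patch (H, W) -> (H, W, K) with 1+1j at the true class, 0 elsewhere."""
    return np.eye(num_classes)[t] * (1 + 1j)

def ace_loss(o, r, eps=1e-12):
    """Average of the cross-entropies on the real and imaginary parts (assumed form)."""
    ce_re = -np.sum(r.real * np.log(o.real + eps), axis=-1)
    ce_im = -np.sum(r.imag * np.log(o.imag + eps), axis=-1)
    return np.mean(0.5 * (ce_re + ce_im))  # assumed: mean over all pixels

K = 3
t = np.array([[0, 2], [1, 1]])             # toy 2x2 label patch
y = np.zeros((2, 2, K), dtype=complex)     # uninformative scores
loss = ace_loss(complex_softmax(y), one_hot_complex(t, K))
print(loss, np.log(K))
```

With uninformative scores, both cross-entropies equal $\ln K$, so the averaged loss also equals $\ln K$. In the paper's training setup only labeled pixels contribute to the update, which this sketch does not model.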
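$\Theta$ can be updated iteratively by $J$ and the learning rate $\eta$ according to
$$W_{nm}^l[t+1] = W_{nm}^l[t] - \eta\,\frac{\partial J}{\partial W_{nm}^l[t]}, \tag{14}$$
$$b_n^l[t+1] = b_n^l[t] - \eta\,\frac{\partial J}{\partial b_n^l[t]}. \tag{15}$$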
To calculate (14) and (15), the key point is computing the partial derivatives. Since $J$ is a real-valued loss function, it can be back-propagated through CV-FCN according to the generalized complex chain rule in [31], which yields the partial derivatives with respect to the real and imaginary parts of each parameter. When the value of the loss function no longer decreases, the parameter update is stopped and the training phase is completed. Then the trained network is used to predict the entire PolSAR image in the classification phase.
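The update of a complex parameter with the gradient of a real-valued loss can be illustrated on a toy problem. The sketch below is not CV-FCN backpropagation. It minimizes the hypothetical loss $J(w) = |wx - t|^2$ for a single complex weight $w$, using the combined derivative $\partial J/\partial \Re(w) + j\,\partial J/\partial \Im(w) = 2(wx - t)\bar{x}$ and a plain gradient-descent step with learning rate $\eta$.

```python
import numpy as np

def complex_gradient(w, x, t):
    """dJ/dRe(w) + j * dJ/dIm(w) for the toy loss J = |w*x - t|^2."""
    return 2.0 * (w * x - t) * np.conj(x)

def train_toy(x, t, eta=0.05, steps=100):
    """Gradient descent on a single complex weight, starting from w = 0."""
    w = 0j
    for _ in range(steps):
        w = w - eta * complex_gradient(w, x, t)
    return w

x, t = 1 + 2j, 3 - 1j
w_hat = train_toy(x, t)
print(w_hat, t / x)  # the learned weight approaches t / x
```

Because the loss is real-valued, the real and imaginary parts of the weight are updated jointly by one complex step, which is the idea behind the update rule described above.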
Algorithm 1 CV-FCN Classification Algorithm for PolSAR Imagery
Input: PolSAR dataset $H \in \mathbb{C}^{h \times w \times B}$, learning rate $\eta$, batch size, momentum parameter.
Output: Dense label $k$.
1: Construct the data patches set $I$ and the label patches set $T$ using $H$;
2: Initialize CV-FCN parameters $\Theta$ by Section II-B;
3: Choose the entire training set $D_{train} = \{I_{train}, T_{train}\}$ from $I$ and $T$;
4: Repeat:
5:   Forward pass the complex downsampling section to obtain multi-level feature maps by Section II-C1;
6:   Call the complex upsampling section to recover more spatial information by Section II-C2;
7:   Calculate the loss function $J$ by Section II-D;
8:   Update $\Theta$ by $J$ and $\eta$;
9: Until the termination criterion is met.
10: Classify the entire PolSAR image by forward passing the trained network to obtain $k$.
11: End

E. CV-FCN PolSAR Classification Algorithm
For clarity, the proposed CV-FCN PolSAR classification algorithm is summarized in Algorithm 1. Specifically, we first construct the entire training set for CV-FCN and employ the new complex-valued weight initialization scheme to initialize the network. Then, we train CV-FCN by iteratively updating its parameters using the average cross-entropy loss function. Finally, the entire PolSAR image is classified using the trained network.

III. EXPERIMENTAL ANALYSIS AND EVALUATION
In this section, experimental datasets description and evaluation metrics are first presented. Then, input data vector and experimental settings are listed for CV-FCN training. Moreover, the effectiveness of some strategies for CV-FCN is analyzed in detail by a series of special experiments. Finally, comparisons with other classification methods on three PolSAR datasets are presented to demonstrate the superiority of the proposed CV-FCN.

A. Experimental Datasets Description
We use three benchmark PolSAR datasets for experiments. Details about these datasets are listed as follows. 1) Flevoland Benchmark dataset: Fig. 3(a) shows the Pauli-RGB image of the Flevoland Benchmark data, which was acquired by NASA/JPL AIRSAR in 1991. The size of the image is 1020×1024. The ground-truth class labels and the corresponding color codes are shown in Fig. 3(b) and Fig. 3(c), respectively. There are 14 classes in the image including potato, fruit, oats, beet, barley, onions, wheat, beans, peas, maize, flax, rapeseed, grass, and lucerne.
2) San Francisco dataset: This AIRSAR full PolSAR image provides good coverage of four targets including water, vegetation, low-density urban and high-density urban. The original data has a dimension of 900×1024 pixels with a spatial resolution of 10 m, as shown in Fig. 4(a). The ground-truth class labels and color codes are shown in Fig. 4(b) and Fig. 4(c).

3) Oberpfaffenhofen dataset: This ESAR dataset of the Oberpfaffenhofen area in Germany, provided by the German Aerospace Center, has a size of 1300×1200 pixels. The Pauli-RGB image, the ground-truth class labels, and color codes are respectively shown in Fig. 5.

B. Evaluation Metrics
With the hand-marked ground-truth images, the overall accuracy (OA), average accuracy (AA), and Kappa coefficient (κ) are used to evaluate classification performance. OA is the ratio of the number of correctly labeled pixels to the total number of test pixels. AA is the average of the individual class accuracies. Kappa discounts the agreement that would be obtained by chance and therefore gives a good representation of the overall performance of the classifiers. The larger the values of the three criteria, the better the classification performance.
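The three measures can be computed from a confusion matrix, as in the following sketch. The function name `classification_metrics` is hypothetical, and labels are assumed to be integers in `[0, num_classes)`. Classes without any test pixel contribute an accuracy of 0 to AA in this simple version.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Return (OA, AA, kappa) from flat integer label arrays."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: truth, columns: prediction
    total = cm.sum()
    oa = np.trace(cm) / total
    per_class = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    aa = per_class.mean()
    # Expected agreement by chance, from the row and column marginals.
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2
    kappa = (oa - pe) / (1 - pe)
    return oa, aa, kappa

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 2, 0])
print(classification_metrics(y_true, y_pred, 3))
```

For this toy example, 6 of 8 pixels are correct, the per-class accuracies are 3/4, 2/2, and 1/2, and the chance agreement is 24/64.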
C. Preparing for Classifier Model Training
1) Complex-valued Input Vector for CV-FCN: Before training CV-FCN, the CV input vector needs to be determined. CV-FCN works directly on the PolSAR CV data without any data projection from the complex to the real domain. Since the coherency matrix or covariance matrix completely describes the distributed target [1], the PolSAR data is usually presented in these formats. The polarimetric coherency matrix $T$ is calculated as
$$T = \frac{1}{L} \sum_{i=1}^{L} u_i u_i^H,$$
where the superscript $H$ denotes the complex conjugate transpose, $L$ is the number of looks, and $u_i$ denotes the $i$th scattering vector in the multi-look processing window. The coherency matrix $T$ is a Hermitian positive semidefinite matrix, which implies that the main diagonal elements are RV and the other CV elements are conjugate symmetric about the main diagonal. Therefore, the six elements of the upper triangular part of $T$ can be used to fully represent the PolSAR data [1]. So we utilize these six elements, $\{T_{11}, T_{12}, T_{13}, T_{22}, T_{23}, T_{33}\}$, to construct the CV input vector for CV-FCN. Here, the imaginary parts of $T_{11}$, $T_{22}$, $T_{33}$ in the CV input feature vector are all set to 0. By contrast, an RV input feature vector built from the same data carries no phase information.
2) Parameter Settings: Relevant parameter settings are required before CV-FCN training. For PolSAR image classification, several studies have discussed the sampling rate and the parameter settings of NN structures [23]-[25] in detail. Hence, we do not repeat that discussion here and choose the settings through experiments.
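Returning briefly to the CV input vector of 1) above, the construction of the coherency matrix and of the six-element complex vector can be sketched as follows. The function names are hypothetical, the scattering vectors are random placeholders rather than real PolSAR data, and the ordering of the six elements is an assumption.

```python
import numpy as np

def coherency_matrix(u):
    """u: (L, 3) complex scattering vectors of one multi-look window -> (3, 3) matrix T."""
    return np.einsum('li,lj->ij', u, u.conj()) / u.shape[0]

def cv_input_vector(T):
    """Six upper-triangular elements (T11, T12, T13, T22, T23, T33); ordering is assumed."""
    v = T[np.triu_indices(3)].copy()
    diag = [0, 3, 5]                # positions of T11, T22, T33 in the vector
    v[diag] = v[diag].real          # diagonal entries are real; imaginary parts set to 0
    return v

rng = np.random.default_rng(0)
u = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))  # placeholder 4-look window
T = coherency_matrix(u)
v = cv_input_vector(T)
print(np.allclose(T, T.conj().T), v.shape)
```

The check printed at the end confirms that the averaged outer products form a Hermitian matrix, which is why the upper triangle suffices.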
In this paper, the sliding window operation in [43], [44] is used to generate the set of data patches from the experimental images and the corresponding set of label patches from the ground-truth images. We choose 128 as the default sliding window size and 40 as the default stride for all experimental datasets, which is a trade-off between classification performance and computational burden. Additionally, to mitigate overfitting, the data augmentation strategy in [26] is applied by vertically and horizontally flipping all patches. All these patches form the input data of the proposed CV-FCN, with 90% used for training and 10% for validation. It is worth noting that only the labeled pixels in each label patch are considered when updating the network parameters during training [25].
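The patch generation described above can be sketched as follows, assuming that windows which do not fit entirely inside the image are discarded and that the same operations are applied to the label image. The function names, the toy image size, and the fixed random seed are hypothetical choices for this example.

```python
import random


def sliding_window_patches(image, size=128, stride=40):
    """Cut size x size patches from a 2-D grid (list of rows) with the given stride."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def flip_augment(patch):
    """Return the patch together with its vertical and horizontal flips."""
    vertical = patch[::-1]                     # reverse the row order
    horizontal = [row[::-1] for row in patch]  # reverse every row
    return [patch, vertical, horizontal]


def train_val_split(samples, train_fraction=0.9, seed=0):
    """Shuffle the samples and split them into training and validation parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Toy complex-valued "image" of 208 x 168 pixels (one channel for brevity).
image = [[complex(r, c) for c in range(168)] for r in range(208)]
patches = sliding_window_patches(image)
augmented = [p for patch in patches for p in flip_augment(patch)]
train, val = train_val_split(augmented)
print(len(patches), len(augmented), len(train), len(val))
```

With a stride of 40 and a window of 128, neighboring patches overlap by 88 pixels, which is what makes the patch set much larger than a non-overlapping tiling would be.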
Moreover, Adam with momentum 0.9 is used to update the CV-FCN parameters. The learning rate η is 0.0001. The mini-batch size is empirically set to 30. The number of training epochs is set to 200, by which point the objective function has converged. Additionally, dropout regularization is adopted to reduce overfitting. In this paper, all non-deep methods are run in Matlab R2014b, and DL-based methods are implemented in the Keras framework with TensorFlow as the back end. The machine used for the experiments is a Lenovo Y720 cube gaming PC with an Intel Core i7-7700 CPU, an Nvidia GeForce GTX 1080 GPU, and 16GB RAM, running the Ubuntu 18.04 LTS operating system. To make the comparisons as fair as possible, we take the average of 10 experiments as the final result.

D. CV-FCN Model Analysis and Discussions
To evaluate several aspects of the CV-FCN model, two ablation experiments and two comparison experiments are conducted as follows. Notably, the most distinctive aspect of the proposed CV-FCN is the complex-valued upsampling scheme, in which the fully skip connections and the max location maps are the two important strategies. Therefore, two ablation experiments are designed for comparison and evaluation. Specifically, the impact of fully skip connections in the CV-FCN structure is investigated first. Then, the effect of max location maps on classification performance is evaluated on all datasets. Additionally, two comparison experiments, one on the complex weight initialization and one on the loss function, are conducted.
1) Ablation Experiment 1 - Impact of Fully Skip Connections: The fully skip connections are an important part of CV-FCN because they enable the network to recover more detail. The core idea is to superimpose feature maps of different levels to improve the final classification result. Here, to evaluate the effect of fully skip connections on classification accuracy, we construct a CV-FCN structure without skip connections, denoted by NS CV-FCN. Table II contains the evaluation indices for classification.
As illustrated in Table II, CV-FCN outperforms NS CV-FCN, which reveals that fully skip connections are useful for improving classification accuracy. Compared with NS CV-FCN, the proposed CV-FCN improves OA by 0.99%, AA by 3.17%, and the Kappa coefficient by 0.0116 on the Flevoland dataset. On the San Francisco dataset, the gains are 1.32% in OA, 1.53% in AA, and 0.0183 in Kappa. The improvement is largest on the Oberpfaffenhofen dataset: 3.9% in OA, 5.87% in AA, and 0.07 in Kappa. This superiority can be attributed to the fact that fully skip connections fuse features of different levels to preserve more discriminative information for PolSAR image classification.
2) Ablation Experiment 2 - Impact of Max Location Maps: The most prominent trait of the complex upsampling section is that the max location maps are used to perform nonlinear upsampling of the feature maps, which is beneficial for a more precise reconstruction of the output. To examine the effect of the max location maps, we construct a CV-FCN structure wherein the complex upsampling layers upsample feature maps without the guidance of max location maps. We call this network NL CV-FCN. The experimental results on all datasets are shown in Table III.
As illustrated in Table III, CV-FCN outperforms NL CV-FCN on all three datasets. Specifically, CV-FCN improves OA by 0.43%, AA by 1.73%, and Kappa by 0.0052 on the Flevoland dataset; OA by 0.72%, AA by 0.81%, and Kappa by 0.001 on the San Francisco dataset; and OA by 1.46%, AA by 2.07%, and Kappa by 0.0258 on the Oberpfaffenhofen dataset. These results suggest that the max location maps benefit the classification accuracy because they help retrieve more spatial information.
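To make the role of the max location maps concrete, the following minimal sketch pairs a complex max-pooling step with the corresponding max-unpooling step. It assumes that the maximum within each window is selected by magnitude and that unpooling writes zeros at all non-maximal positions; both choices, as well as the function names, are assumptions for illustration rather than the exact CV-FCN layers.

```python
def complex_max_pool(fmap, k=2):
    """k x k pooling that keeps the largest-magnitude entry and records its location."""
    rows, cols = len(fmap) // k, len(fmap[0]) // k
    pooled = [[0j] * cols for _ in range(rows)]
    locations = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [(r * k + i, c * k + j) for i in range(k) for j in range(k)]
            best = max(window, key=lambda rc: abs(fmap[rc[0]][rc[1]]))
            pooled[r][c] = fmap[best[0]][best[1]]
            locations[r][c] = best  # the max location map entry for this window
    return pooled, locations


def complex_max_unpool(pooled, locations, out_rows, out_cols):
    """Place each pooled value back at its recorded location; zeros elsewhere."""
    out = [[0j] * out_cols for _ in range(out_rows)]
    for r, row in enumerate(pooled):
        for c, value in enumerate(row):
            rr, cc = locations[r][c]
            out[rr][cc] = value
    return out


fmap = [
    [1 + 0j, 0 + 2j, 0 + 0j, 1 + 1j],
    [3 + 4j, 1 - 1j, 2 + 0j, 0 - 1j],
    [0 + 0j, 1 + 0j, 0 - 3j, 1 + 1j],
    [2 + 2j, 0 + 1j, 1 + 0j, 2 - 2j],
]
pooled, locations = complex_max_pool(fmap)
restored = complex_max_unpool(pooled, locations, 4, 4)
for row in restored:
    print(row)
```

The restored map is sparse but keeps every retained value, with its phase, at its original spatial position, which is the spatial information that upsampling without location maps cannot recover.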

3) Comparison Experiment 1 - Complex Weight Initialization:
To evaluate the impact of the complex-valued weight initialization, which is critical for CV-FCN learning, we conduct a comparison experiment on the Flevoland Benchmark dataset. We use only the complex weight initialization in [35] as the old CV initialization scheme for comparison. The old CV initialization scheme initializes the real and imaginary parts of a CV weight separately with a uniform distribution [35]. Fig. 6 illustrates the difference in the validation curves of one representative experiment, where the proposed weight initialization and the compared weight initialization are denoted by CWI-1 and CWI-2, respectively. Furthermore, we also report the evaluation indices of the two initialization schemes as a function of epoch. Specifically, we first train CV-FCN for 10 epochs and then update the classification results every 10 epochs. Table IV contains a comparison of the results.
As shown in Fig. 6, both the old and the proposed initialization lead to convergence, but the proposed initialization trains CV-FCN faster and reaches the optimal value earlier.
As illustrated in Table IV, the proposed initialization achieves its best results at around 60 training epochs, while the old initialization does so at around 120. These results validate that the proposed initialization not only facilitates faster learning but also improves the classification performance of CV-FCN. This may be attributed to the ability of the proposed initialization to reduce the risk of vanishing or exploding gradients, which is of great significance for training deep networks. Additionally, these phenomena partly illustrate that the proposed initialization scheme is suitable for CV-FCN in PolSAR image classification.
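For reference, the old scheme used as the baseline above, in which the real and imaginary parts of each weight are drawn separately from a uniform distribution, can be sketched as follows. The bound `limit`, the function name, and the weight shape are hypothetical, since the range used in [35] is not stated here; the proposed scheme is not reproduced in this sketch.

```python
import random


def uniform_complex_init(shape, limit, seed=None):
    """Draw real and imaginary parts independently from U(-limit, limit)."""
    rng = random.Random(seed)
    rows, cols = shape
    return [
        [complex(rng.uniform(-limit, limit), rng.uniform(-limit, limit))
         for _ in range(cols)]
        for _ in range(rows)
    ]


# Toy 3 x 4 complex weight matrix with an assumed bound of 0.1.
W = uniform_complex_init((3, 4), limit=0.1, seed=0)
for row in W:
    print(row)
```

Under this scheme the magnitude and phase of a weight are not controlled directly, because the two parts are sampled independently within a square in the complex plane.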

4) Comparison Experiment 2 - Loss Function:
We carry out a comparison experiment on the Oberpfaffenhofen dataset to evaluate the effectiveness of the complex average cross-entropy loss function. The complex-valued mean square error (MSE) in [35] and the complex-valued mean absolute error (MAE) are used as the compared loss functions, denoted by CMSE and CMAE, respectively. The CMSE and the CMAE can be respectively expressed as
The overall accuracy curves of training and validation for the different loss functions are illustrated in Fig. 7(a) and Fig. 7(b), respectively. Moreover, the resulting typical classification maps for the different loss functions are shown in Fig. 8.
As seen from Fig. 7, the proposed complex loss function, denoted by ACE, and the CMAE converge faster than CMSE. The training and validation accuracies obtained with ACE and CMAE remain relatively stable after 120 epochs, while CMSE does not achieve similar stability until 250 epochs. Additionally, the best validation accuracy of the proposed loss function is higher than that of CMAE. As shown in Fig. 8, the classification map obtained with CMSE is smoother than those obtained with CMAE and ACE, possibly because the CMSE loss mitigates the effects of speckle noise. However, the boundaries between different categories are ambiguous because the map is too smooth. Although the classification map obtained with CMAE contains clear structural information, it has more misclassified points because it is affected by speckle noise. Notably, the proposed loss function achieves correct boundary localization while also showing better robustness to speckle noise. These observations partially establish the effectiveness of the proposed loss function.
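To illustrate how the two compared losses differ, the sketch below uses one common form of each: the squared error and the absolute error are taken on the real and imaginary parts separately and averaged over the outputs. This form, the normalization, and the function names are assumptions for illustration and may differ from the exact CMSE and CMAE definitions used in the experiments.

```python
def complex_mse(targets, outputs):
    """Mean of the squared real and imaginary differences (squared modulus of the error)."""
    n = len(targets)
    return sum(
        (t.real - o.real) ** 2 + (t.imag - o.imag) ** 2
        for t, o in zip(targets, outputs)
    ) / n


def complex_mae(targets, outputs):
    """Mean of the absolute real and imaginary differences."""
    n = len(targets)
    return sum(
        abs(t.real - o.real) + abs(t.imag - o.imag)
        for t, o in zip(targets, outputs)
    ) / n


# Toy complex-valued targets and network outputs.
targets = [1 + 1j, 0j, 1 + 1j]
outputs = [0.5 + 1j, 0.5 - 0.5j, 1 + 0j]
print(complex_mse(targets, outputs), complex_mae(targets, outputs))
```

The squared form penalizes large deviations much more strongly than small ones, whereas the absolute form weights all deviations linearly.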

E. Comparing Models
We demonstrate the effectiveness of the proposed method by comparison with several state-of-the-art methods, including SVM [13], Wishart classifier [3], Markov random field (MRF) [9], MLP [20], CVNN [31], CNN [22], CV-CNN [35], and FCN [43]. The structure of CV-FCN has already been introduced in the second part of this paper. The specific settings of the comparison methods are briefly described as follows.
• Non-deep methods: The non-deep methods include SVM [13], Wishart classifier [3], and MRF [9]. They all adopt the input feature vector shown in Equation (20). For SVM, the radial basis function (RBF) kernel is chosen as advised by [13]. For MRF, parameters are set according to the original publication [9].
• RV-SCNN/CV-SCNN: We use RV-SCNN and CV-SCNN to represent the networks in [22] and [35], respectively. Following RV-SCNN in [22], the architecture of CV-SCNN is adjusted to contain the input layer, two convolution layers interleaved with two pooling layers, two fully connected layers, and the softmax layer. For SCNNs, a 32×32 neighborhood of each pixel is employed as the patch for training.
• RV-DCNN/CV-DCNN: The downsampling section in FCN is transformed from a CNN structure. Therefore, for a fair comparison between FCN and CNN, we construct a new CNN structure, denoted by DCNN, according to the CV-FCN structure. Table VI reports the detailed configuration of RV-DCNN and CV-DCNN. Compared with the CNNs in [35], DCNNs contain more convolutional layers. For DCNNs, patches for training are generated in the same way as for SCNNs.

F. Classification Performance Evaluation
To evaluate the effectiveness of CV-FCN, comparisons with the above models on three PolSAR datasets are presented as follows.

1) Flevoland Benchmark Dataset Result:
For this dataset, we randomly choose 5% of the available labeled samples per class for training. The classification maps obtained from all methods are shown in Fig. 9, and the accuracies are reported in Table V.
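The per-class random selection of training samples can be sketched as follows, assuming a flattened label map in which unlabeled pixels carry the marker -1 and in which at least one sample is drawn from every class. The function name, the marker value, and the fixed seed are hypothetical choices for this example.

```python
import random
from collections import defaultdict


def sample_per_class(labels, fraction, seed=0, unlabeled=-1):
    """Pick `fraction` of the labeled pixel indices of every class at random."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        if label != unlabeled:
            by_class[label].append(index)

    rng = random.Random(seed)
    train = []
    for label in sorted(by_class):
        indices = by_class[label]
        count = max(1, round(fraction * len(indices)))  # at least one per class
        train.extend(rng.sample(indices, count))

    chosen = set(train)
    # All remaining labeled pixels are kept for testing.
    test = [i for i, label in enumerate(labels)
            if label != unlabeled and i not in chosen]
    return sorted(train), test


# Toy flattened label map: three classes plus ten unlabeled pixels.
labels = [0] * 100 + [1] * 40 + [-1] * 10 + [2] * 60
train_idx, test_idx = sample_per_class(labels, fraction=0.05)
print(len(train_idx), len(test_idx))
```

Sampling within each class, rather than over all labeled pixels at once, keeps small classes represented in the training set even at a low sampling rate.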
As shown in Fig. 9(b) and Fig. 9(c), the classification maps obtained from SVM and Wishart are seriously affected by speckle noise since these methods only consider polarimetric information. Compared with Fig. 9(b) and Fig. 9(c), the classification map from MRF shown in Fig. 9(d) is much clearer, with significantly fewer misclassified pixels. The reason is that MRF can embed spatial smoothness information into the classification stage. Fig. 9(e)-(l) show the classification results of all DL-based methods, where Fig. 9(e)-(h) are the results of the different RV-NNs and Fig. 9(i)-(l) are the results of the different CV-NNs. It can be seen that all DL-based methods outperform the non-deep methods, which indicates that learned features have stronger discriminative ability than traditional features.
When comparing RV-NNs, it can be seen that RV-FCN performs best on the flax class [marked by white ovals in Fig. 9(e)-(h)]. In addition, among CV-NNs, CV-FCN has the highest classification accuracy on the beet class [marked by yellow ovals in Fig. 9(i)-(l)], and the whole class label map of CV-FCN is much clearer than the others. These two results indicate that the proposed FCN architecture is advantageous for PolSAR classification compared to other network structures, especially CNNs.
Moreover, comparing RV-NNs and CV-NNs directly, we can observe that CV-NNs perform better than their RV counterparts. For example, Fig. 9(h) and Fig. 9(l) are the classification results of RV-FCN and CV-FCN, respectively. The confusion between the oats class and the beet class is severe in Fig. 9(h), but does not appear in Fig. 9(l) [marked by sky-blue rectangles]. This confirms the effectiveness of complex-valued features with phase information for the classification of PolSAR imagery. From the overall effects depicted in Fig. 9, the classification map of CV-FCN is noticeably closer to the ground truth map.
The evaluation indices of all methods are listed in Table V. As shown in Table V, MRF and all DL-based methods achieve an OA exceeding 90%. All CV-NN methods achieve better performance than their RV counterparts in terms of all evaluation metrics. In particular, the largest part of changes   Fig. 9. In summary, from Fig. 9 and Table V, for the Flevoland Benchmark dataset, CV-FCN achieves the best performance among all methods and has a powerful ability to distinguish different terrain categories.
2) San Francisco Dataset Result: For the San Francisco dataset, we randomly choose 1% of the labeled pixels per class for training and use the remainder for testing. The classification results obtained from all methods are shown in Fig. 10, and Table VII reports their evaluation metrics.
Fig. 10(b) and Fig. 10(c) give the classification results of SVM and the Wishart classifier. It can be seen that vegetation, low-density urban, and high-density urban are severely mixed and that there are many isolated pixels in the images. Fig. 10(d) shows the classification result obtained from MRF, where the confusion between low-density urban and high-density urban is not severe. In addition, misclassification is much less frequent than with the previous two methods, as MRF can consider the spatial information to obtain a smoother classification map. Nevertheless, due to limited discriminative features, it is difficult for non-deep methods to distinguish complex backscatters, especially for vegetation and urban areas.
Fig. 10(e)-(l) show the classification results of the DL-based methods. From Fig. 10(e)-(h), it is worth noting that RV-FCN outperforms the other three methods, with much clearer boundaries between categories. In addition, from Fig. 10(i)-(l), CV-FCN yields the best visual result among the CV-NNs. The above analyses of NNs demonstrate the effectiveness of the proposed FCN structure, which can capture more discriminative features and effectively incorporate more spatial information. Furthermore, from Fig. 10, CV-FCN achieves the best performance, which illustrates that both the FCN structure and the phase information contribute to improving classification accuracy.
As Table VII shows, CV-FCN achieves the highest classification accuracy. The OA of CV-FCN is about 3% and 4% higher than those of CV-DCNN and CV-SCNN, respectively, which indicates that the proposed FCN structure is suitable for PolSAR data. In addition, the results of CV-FCN are slightly better than those of RV-FCN. This confirms that the phase information plays an important role in improving classification accuracy. Furthermore, CV-FCN yields the highest values for all evaluation metrics, which is consistent with the results in Fig. 10.
3) Oberpfaffenhofen Dataset Result: For the Oberpfaffenhofen dataset, we also choose 1% of the pixels with ground-truth class labels for training. Fig. 11 shows the visual classification results. The overall evaluation indices are given in Table VIII, and Fig. 12 shows the per-class classification accuracies obtained from the different methods.
From Table VIII, CV-FCN achieves the best performance in terms of all metrics. The non-deep methods perform poorly, with OA below 85% in all cases. This might be a result of the limited number of labeled pixels available as prior information and the few discriminative features. It can also be seen that CV-NNs perform better than their RV counterparts. However, this superiority is not prominent. In terms of OA, CV-MLP, CV-SCNN, CV-DCNN, and CV-FCN are only 0.66%, 0.17%, 1.01%, and 1.75% higher than RV-MLP, RV-SCNN, RV-DCNN, and RV-FCN, respectively. Fig. 11(e)-(h) and Fig. 11(i)-(l) show the classification results of RV-NNs and CV-NNs, respectively. As shown in Fig. 11(e)-(h), the classification result of RV-FCN is much clearer than the other three, especially in the purple boxes, which are noticeably closer to the ground truth map. The same holds in the comparison among CV-NNs. Comparing all results shown in Fig. 11, for this dataset, the classification map of CV-FCN is the closest to the ground truth map.
As shown in Fig. 12, the non-deep methods have poor ability to distinguish built-up areas and wood land. This can also be observed in Fig. 11(b)-(d), where misclassification is severe across the whole images and all classification maps have many isolated pixels. However, the accuracies of wood land and open areas using DCNNs and FCNs are all over 95%, which illustrates the discriminative feature learning ability of deep networks. In addition, CV-FCN is advantageous in terms of accuracy for all categories relative to the other methods, which demonstrates its effectiveness in extracting more discriminative features. Overall, the above analyses illustrate that CV-FCN exhibits better contextual consistency and extracts more discriminative features for PolSAR image classification.
As the above comparisons demonstrate, the classification performance of CV-FCN exceeds that of the other methods on all PolSAR datasets. On the one hand, CV-FCN improves the classification accuracy effectively compared to its RV counterpart (i.e., RV-FCN). This conclusion also holds for the other network structures, which confirms the validity of complex-valued features containing the phase information. On the other hand, compared with CNN structures, CV-FCN performs more coherent labeling and shows better robustness to speckle noise, while producing smooth classification with precise localization. This demonstrates the effectiveness of the CV-FCN architecture in considering more spatial information and extracting more discriminative features for PolSAR image classification.
IV. CONCLUSION
In this paper, a novel complex-valued (CV) pixel-level model called CV-FCN has been proposed for PolSAR image classification, which obtains better performance than non-deep methods and other DL-based methods. This model integrates the feature extraction module and the classification module in a unified framework. To learn meaningful features faster, a new complex-valued weight initialization scheme is proposed to initialize CV-FCN. It greatly speeds up learning for this network and helps improve CV-FCN performance. Then, multi-level and robust CV features that retain more discriminative information are extracted via CV-FCN. In particular, a new complex upsampling scheme in CV-FCN is proposed to output the CV predicted labeling. It also recovers rich spatial information with max location maps to alleviate the problem of speckle noise. Furthermore, a novel average cross-entropy loss function is presented for more precise CV-FCN optimization. The proposed CV-FCN model produces pixel-to-pixel classification results directly from the PolSAR CV data without any data projection. Moreover, it automatically learns a higher-level feature representation and fuses multi-level features for accurate category identification. Experimental results on real benchmark PolSAR images show that CV-FCN achieves comparable or better results than the compared models.
In the future, this work may be continued along the following lines: 1) Some experiments demonstrated the effectiveness of the new complex-valued weight initialization scheme for initializing CV-FCN for PolSAR image classification. However, stronger evidence is still needed to establish its superiority, along with visualizations to observe the difference; 2) Given the limited availability of PolSAR datasets and of high-quality training data, training rather deep complex-valued networks devoted to PolSAR classification is very challenging and often carries the risk of overfitting and model collapse. Moreover, data augmentation strategies designed for natural images are generally not suited to enlarging PolSAR training datasets because of the difference in imaging mechanisms. A suitable data augmentation strategy therefore appears to be urgently needed to tackle these issues.