Wetlands Classification Using Quad-Polarimetric Synthetic Aperture Radar through Convolutional Neural Networks Based on Polarimetric Features

Wetlands are the “kidneys” of the earth and are crucial to the ecological environment. In this study, we used GF-3 quad-polarimetric synthetic aperture radar (QP) images to classify ground objects (nearshore water, seawater, Spartina alterniflora, tamarix, reed, tidal flat, and Suaeda salsa) in the Yellow River Delta with convolutional neural networks (CNNs) based on polarimetric features. Four schemes were proposed based on polarimetric features extracted from the polarization coherency matrix and reflection symmetry decomposition (RSD), and the well-known CNNs AlexNet and VGG16 were used as backbone networks to classify the GF-3 QP images. Testing showed that using all 21 polarimetric features from RSD and the polarization coherency matrix yielded the highest overall accuracy (OA): 96.54% on AlexNet and 94.93% on VGG16. The polarization coherency matrix and the polarimetric power features performed similarly, and both outperformed the three main diagonal elements of the polarization coherency matrix alone. We also conducted noise test experiments: OAs and kappa coefficients decreased to varying degrees after we added 1 to 3 channels of Gaussian random noise, indicating that the polarimetric features are helpful for classification. Thus, higher OAs and kappa coefficients can be acquired when more informative polarimetric features are input into CNNs. In addition, RSD performed slightly better than the polarization coherency matrix. Therefore, RSD can help improve the accuracy of polarimetric SAR image classification of wetland objects using CNNs.


Introduction
Remote Sens. 2022, 14, 5133
Several types of ground objects are distributed in wetlands, making their classification challenging [1]. Understanding the distribution of ground objects in wetlands can help prevent alien species from encroaching on the habitats of local species, which may otherwise unbalance the ecological environment. A good survey of the distribution of ground objects in wetland areas can provide technical support for wetland protection. In recent years, many studies have focused on wetland classification. In 2008, Touzi et al. [2] proposed the Touzi decomposition method for extracting polarization information from synthetic aperture radar (SAR) images and applied the extracted polarimetric features to classify wetland areas, which provided a new method of wetland classification. However, this decomposition method still has room for improvement. Chen et al. [3] investigated the influence of different polarimetric parameters and an object-based approach on the classification of various land use and land cover types in the coastal wetlands of Yancheng using quad-polarimetric ALOS PALSAR data. The results showed that polarimetric parameters such as Shannon entropy, together with object-based methods, notably improved the classification accuracy of coastal wetland land cover using QP data, indicating that these polarimetric parameters are helpful for wetland classification. Yang et al. [4] fused a GF-1 wide-field-of-view optical image with a RadarSat-2 SAR image and then used a support vector machine (SVM) for supervised classification. The fused image yielded higher accuracy than either single image, because fusing optical and SAR images provided more ground feature information. He et al. [5] proposed an efficient generative adversarial network, ShuffleGAN, to classify wetlands using Jilin-1 satellite data. ShuffleGAN is composed of two neural networks (a generator and a discriminator) that behave as adversaries during training, and ShuffleNet units were added to both the generator and the discriminator to balance speed and accuracy. Compared with existing generative adversarial network (GAN) algorithms, ShuffleGAN achieved an overall accuracy about 2% higher and proved effective for land cover analysis.
Apart from the above research, the following works are worth mentioning. Liu et al. [6] used C-band Sentinel-1 and L-band ALOS-2 PALSAR data to map coastal wetlands in the Yellow River Delta. Using three classical machine learning algorithms, namely naive Bayes (NB), random forest (RF), and multilayer perceptron (MLP), they proposed a classification approach based on SAR coherence, backscatter intensity, and optical imagery that achieved an overall accuracy (OA) of 98.3%. This approach outperformed single-source classification, indicating that combining data from more satellites can improve the accuracy of machine learning algorithms. Gao et al. [7] combined hyperspectral and multispectral images using a CNN-based method and designed a spatial-spectral vision transformer (SSVIT) to extract sequence relations from the combined images; this is another case of using multiple satellite data sources to classify wetland ground objects. In 2021, Gao et al. [8] proposed a depthwise feature interaction network to classify multispectral images of the Yellow River Delta region, in which a depthwise cross-attention module extracts self-correlation and cross-correlation from multisource feature pairs so that meaningful complementary information is emphasized for classification. Chen et al. [9] used an object-oriented method to classify polarimetric synthetic aperture radar (PolSAR) images of coastal wetlands based on the scattering characteristics obtained from polarization decomposition and achieved an overall accuracy of 87.29%. However, this method could not effectively separate reed from Spartina alterniflora and leaves room for more detailed classification. Delancey et al. [10] applied a deep CNN and a shallow CNN to classify large-area wetlands and compared their effectiveness. The results indicated that the deep CNN extracted more informative features of ground objects and was useful for complex land use classification.
However, deep neural networks are not suitable for every wetland classification task; the satellite spatial resolution, data type, surface features, and other factors must also be considered when choosing the network depth. Banks et al. [11] classified wetlands with an RF algorithm using combined SAR and digital elevation model (DEM) data at different resolutions, and the results indicated that PolSAR data are reliable for wetland classification. This survey shows that deep learning methods are widely applied in wetland classification.
A PolSAR system transmits and receives electromagnetic waves in different polarization modes through multiple channels, forming a complete polarization basis from which the polarization scattering matrix and scattering features can be obtained. PolSAR is an active remote sensing technique: it illuminates target surfaces with electromagnetic waves and then receives the signals scattered back from the targets. Moreover, it can capture high-resolution images day and night in all weather conditions. Using PolSAR images from the GF-3 satellite, several polarization features of targets can be acquired through polarization decomposition. Because different ground objects produce different backscattering, we used this principle to classify PolSAR images with convolutional neural networks (CNNs) in this study.
In addition, CNNs are highly popular in computer vision and are used in domains such as region of interest (ROI) [12][13][14][15], synthetic aperture sonar (SAS) image classification [16][17][18][19], visual quality assessment [20], mammogram classification [21], brain tumor classification [22], and PolSAR image classification [23][24][25][26][27]. In the field of PolSAR especially, researchers have proposed many new deep network frameworks in recent years that outperform existing methods in accuracy or efficiency. For example, Wang et al. [23] proposed a vision transformer (ViT)-based method for PolSAR classification; the ViT extracts features from the global range of an image through self-attention blocks, which makes it suitable for PolSAR image classification at different resolutions. Dong et al. [24] were the first to explore neural architecture search (NAS) in the PolSAR area and proposed a PolSAR-tailored differentiable architecture search (DARTS) method to adapt NAS to PolSAR classification; the architecture parameters are optimized efficiently by stochastic gradient descent (SGD) rather than being set randomly. Dong et al. [25] introduced the transformer, a state-of-the-art architecture from natural language processing, into PolSAR image classification for the first time to tackle the bottleneck that may be induced by the inductive biases of CNNs, providing new ideas for this underexplored field. Nie et al. [26] proposed a deep reinforcement learning (RL)-based PolSAR image classification framework. Xie et al. [27] proposed a novel fully convolutional network (FCN) model with complex-valued stacked dilated convolutions (CV-SDFCN), which combines the FCN with polarimetric characteristics for PolSAR image classification.
Conventionally, the physical scattering features [28] and texture information [29] of SAR images are broadly adopted. Pixel-level classification is often sufficient for SAR images of low and medium spatial resolution. However, for target recognition and classification, pixel-level information cannot adequately reflect the texture features of targets. Deep CNNs can effectively extract both polarimetric and spatial features from PolSAR images, enabling comprehensive classification of ground objects [30]. Some traditional methods, such as the gray-level co-occurrence matrix [31] and four-component decomposition [32], can be used to classify PolSAR images; however, they cannot extract all of the information in the data. With the development of computer hardware, several powerful machine learning methods have been proposed, such as SVM, RF [33], deep belief network (DBN) [34], stacked autoencoder (SAE) [35], and deep CNN [36,37], which have considerably improved the efficiency and accuracy of data recognition and classification. With the help of deep learning, terrain surface classification using PolSAR images has become an active research direction, and several studies have applied these algorithms to the classification [38], segmentation [39], and object detection [40] of SAR images [41][42][43][44] with desirable results.
Early research on neural networks primarily focused on classifying SAR images using the SAE algorithm and its variants [45][46][47][48]. Instead of simply applying an SAE, Geng et al. [45] proposed a deep convolutional autoencoder (DCAE) to extract features automatically. The first layer of the DCAE is a manually designed convolution layer with predefined filters. The second layer performs a scale transformation and integrates relevant neighborhood pixels to reduce speckle. After these two layers, a trained SAE model extracts more abstract features. The method was applied to high-resolution single-polarization TerraSAR-X images. Building on the DCAE, Geng et al. [46] proposed a deep supervised and contractive neural network (DSCNN) that uses a histogram of oriented gradients (HOG) descriptor. In addition, a supervised penalty was designed to capture the information between features and labels, and a contractive constraint was incorporated to enhance local invariance. Compared with other methods, DSCNN classified images with higher accuracy. Zhang et al. [47] applied a sparse SAE to PolSAR image classification by considering local spatial information. Hou et al. [48] proposed a PolSAR image classification method that combines superpixels with an SAE: multiple SAE layers are trained pixel by pixel, superpixels are formed from a pseudocolor image based on Pauli decomposition, and in the final k-nearest-neighbor superpixel clustering step, the outputs of the SAE are used as features.
In addition to SAE-based methods, some researchers have classified PolSAR images using other deep networks, including CNNs [49][50][51][52]. Zhao et al. [49] proposed a discriminant DBN for SAR image classification, which extracts discriminant features in an unsupervised manner by combining ensemble learning with the DBN. Most current deep learning methods use both the polarization information and the spatial information of PolSAR images. Gao et al. [50] proposed a two-branch CNN that classifies PolSAR images using two kinds of features: polarization features extracted from a six-channel real matrix and spatial features extracted through Pauli decomposition. The features from the two branches are then combined by parallel fully connected layers and input into a softmax layer for classification. Wang et al. [51] proposed a fully convolutional network that integrates sparse and low-rank subspace representations for PolSAR images. Qin et al. [52] applied an enhanced adaptive restricted Boltzmann machine to PolSAR image classification.
The polarization coherency matrix (T) contains complete information on the polarization scattering of targets. Because different targets have different backscattering coefficients, studies have used the three diagonal elements and the correlation coefficients of the off-diagonal elements of the T matrix to classify PolSAR images with high accuracy [53][54][55][56]. Instead of the T matrix, the polarization covariance matrix can also be used. For example, Zhou et al. [53] first extracted a six-channel representation of the covariance matrix and then input it into a trainable CNN for PolSAR image classification. Xie et al. [55] used a stacked sparse autoencoder for multilayer PolSAR feature extraction, with the input data represented as a nine-dimensional real vector extracted from the covariance matrix. Polarimetric decomposition yields surface scattering, double-bounce scattering, and volume scattering components [56], and some studies have used these three components in CNNs, achieving an OA as high as 95.85% [57]. Other studies have combined the information in the T matrix with polarization power parameters using different weights to classify PolSAR images, and the OA was improved by adjusting the weights [58]. Chen et al. [54] improved the performance of a CNN by combining the target scattering mechanism with polarization feature mining. In a recent work, He et al. [59] combined features extracted through nonlinear manifold embedding with an FCN applied to the input PolSAR images, and the final classification was performed with an SVM ensemble method. In [60], the authors emphasized the computational efficiency of deep learning methods and proposed a lightweight 3D CNN, demonstrating that it achieved higher classification accuracy than other CNN methods while notably reducing the number of learnable parameters and achieving high computational efficiency.
This literature survey shows that ground objects can be classified using polarization features decomposed from PolSAR images. However, few studies have focused on polarization scattering features obtained with advanced polarization decomposition algorithms. Reflection symmetry decomposition (RSD) [61] is an effective algorithm for obtaining polarization scattering characteristics. This study investigated whether higher classification accuracy can be acquired when more informative polarimetric features are input into CNNs. To this end, we propose four schemes that explore combinations of polarization scattering features for QP image classification, based on the polarization scattering characteristics obtained through RSD and the T matrix. The remainder of this paper is organized as follows: the first section introduces the current research progress in using polarization scattering information and describes the innovations of this study; the second section describes the study area and data preprocessing; the third section describes the experimental method and process; the fourth section analyzes the experimental results and accuracy; finally, we discuss the strengths and limitations of the experiment.
The main goals of this study were, therefore, (1) to provide a method for wetland classification based on polarimetric features; (2) to examine the power of classical CNNs for classifying wetland classes with similar backscattering; (3) to investigate the generalization capacity of existing CNNs for classifying different satellite imagery; (4) to explore polarimetric features that are helpful for wetland classification and compare different combinations of polarimetric features; and (5) to compare the performance and efficiency of the best-known deep CNNs. Thus, this study contributes to CNN classification tools for complex land cover mapping using QP data based on polarimetric features.

Study Area and Data
The GF-3 satellite is China's first C-band high-resolution PolSAR satellite. It has 12 imaging modes, with spatial resolutions ranging from 1 to 500 m. Its images are broadly used in applications such as ship recognition [62], terrain surface classification [63], and feature classification [64]. The quad-polarization stripmap I (QPSI) imaging mode of GF-3 has an incidence angle range of 21°-41° and a swath width of 20-35 km, which is suitable for studying the Yellow River Delta. Therefore, we selected the QPSI mode of GF-3 to classify ground objects in the Yellow River Delta.
The QP images can be downloaded from the China Ocean Satellite Data Service System [65]. We selected four scenes (two acquired on 14 September 2021, one on 13 October 2021, and one on 12 October 2017); the first three were used for training and the last one for testing. All scenes were acquired in QPSI mode with a spatial resolution of 8 m. They cover longitudes 118°33′-119°20′E and latitudes 37°35′-38°12′N, with incidence angles ranging from 30.97° to 37.71°. The images selected for the experiment are specified in Table 1.

Data Preprocessing
Before the QP images are input into the CNNs, they must undergo radiometric calibration, polarimetric filtering, polarization decomposition, pseudocolor synthesis, data normalization, training dataset generation, and other processing steps. The radiometric calibration formula is described in [66].
Owing to the imaging mechanism of PolSAR, speckle is inevitably generated in the images, which reduces classification accuracy. The non-local means filtering method proposed by Chen et al. in 2011 can effectively suppress speckle [67]. In this method, a neighborhood is treated as a single unit: the method considers not only the similarity between two individual pixels but also the similarity between their neighborhoods. It is more stable and effective than traditional neighborhood filtering methods. Therefore, we adopted non-local means filtering to despeckle the PolSAR images.
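The core idea of non-local means, weighting pixels by the similarity of the patches around them, can be sketched as follows. This is a minimal, generic single-channel version on an intensity image, assuming a Gaussian weight on the mean squared patch difference and a hypothetical smoothing parameter `h`; it is not the PolSAR-specific filter of [67], and the function name is illustrative only.

```python
import math

def nonlocal_means(img, patch_radius=1, search_radius=2, h=0.5):
    """Generic non-local means for a 2D intensity image (list of lists of floats).

    Each output pixel is a weighted average of the pixels in a search window;
    a pixel's weight depends on how similar the patch around it is to the
    patch around the pixel being filtered, not on the single pixel values.
    """
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so patches near the border reuse edge pixels.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return img[r][c]

    def patch_dist(r1, c1, r2, c2):
        # Mean squared difference between the two patches.
        total, n = 0.0, 0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                d = px(r1 + dr, c1 + dc) - px(r2 + dr, c2 + dc)
                total += d * d
                n += 1
        return total / n

    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            wsum, acc = 0.0, 0.0
            for rr in range(max(r - search_radius, 0), min(r + search_radius + 1, rows)):
                for cc in range(max(c - search_radius, 0), min(c + search_radius + 1, cols)):
                    w = math.exp(-patch_dist(r, c, rr, cc) / (h * h))
                    wsum += w
                    acc += w * img[rr][cc]
            out[r][c] = acc / wsum
    return out

# Hypothetical 3 x 3 intensity image with one bright outlier in the centre.
noisy = [[1.0, 1.2, 0.9], [1.1, 5.0, 1.0], [0.95, 1.05, 1.0]]
smoothed = nonlocal_means(noisy, h=1.0)
```

Because every output value is a weighted average of input pixels, the filtered values stay within the range of the input; a larger `h` makes the weights more uniform and the smoothing stronger.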
The material, size, and shape of ground objects influence their polarimetric features, so different targets have different backscattering coefficients. In PolSAR, the backscattering of a target is represented by the T matrix. Polarization decomposition theories decompose the scattering matrix into several components to interpret the scattering mechanisms of targets. Polarization characteristics are defined using parameters extracted from QP data, such as the total polarization power, scattering angle, and similarity parameters, which reflect the polarization characteristics of ground objects to a certain extent. Polarization features are extensively used in polarimetric target feature extraction, target classification, target detection, and parameter inversion.
Traditional polarization decompositions yield only a small number of polarization features. For example, Freeman decomposition [56] retains the information of only five real elements of the original polarization coherency matrix, so polarization information is lost, and negative power components may appear in the results. Yamaguchi decomposition [68] retains only six real elements, de-orientation Freeman decomposition [69] retains six, and de-orientation Yamaguchi decomposition [70] retains seven. Cui decomposition [71] and improved Cui decomposition [72] neither lose polarization information nor produce negative power components; however, their third component is modeled only as a polarization coherency matrix of rank 1. Thus, these methods do not achieve complete polarization decomposition. RSD is a novel incoherent polarimetric decomposition method that involves no loss of polarization information [61]. The decomposition is complete and performs well: the three components obtained through RSD satisfy the reflection symmetry assumption, and RSD can decompose all polarization scattering features and completely reconstruct the polarization coherency matrix from them. RSD therefore yields more polarimetric features, so the polarization scattering characteristics used in this study are new. Owing to these advantages, we selected RSD as the polarization decomposition method in this study.
The polarization features obtained through RSD include the volume scattering power P_V, surface scattering power P_S, double-bounce scattering power P_D, the total power of the second RSD component P_2, the total power of the third RSD component P_3, and the further RSD parameters θ, ϕ, x, y, a, b, and P_0. To facilitate labeling, the polarization scattering parameters were synthesized into pseudocolor images, with P_D, P_S, and P_V assigned to red, green, and blue, respectively (Figure 1e). The pseudocolor images for training and testing are displayed in Figure 1.
Accurate correspondence between the training and validation labels and the QP images is necessary. We used unmanned aerial vehicle (UAV) images (Figure 2), combined with empirical knowledge, to label targets and guarantee the accuracy of the labeled training datasets. Different ground objects were randomly sampled to generate the CNN training sets. The trained model was then used to classify a QP image that was not in the training datasets; thus, the training and testing images cover the same scene but were captured at different times.
In this study, we classified seven classes according to the survey results: nearshore water, seawater, Spartina alterniflora, tamarix, reed, tidal flat, and Suaeda salsa, labeled 1 to 7, respectively. For each class, 800 samples were selected for training and 200 for validation. The numbers of samples drawn from each training image are listed in Table 2.

Table 2. Number of labeled samples per class (classes 1-7 as numbered above) drawn from each training image.
Image        1     2     3     4     5     6     7
20210914_1   500   400   1000  500   500   500   500
20210914_2   500   200   0     0     0     500   0
20211013     0     400   0     500   500   0     500
Total        1000  1000  1000  1000  1000  1000  1000
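The per-class 800/200 split can be sketched as below, assuming the labeled samples are stored as `(sample_id, class_label)` pairs and divided with a seeded random shuffle. The helper name `split_per_class` and the seeding are illustrative choices, not details taken from the study.

```python
import random
from collections import defaultdict

def split_per_class(samples, n_train=800, n_val=200, seed=0):
    """Randomly split (sample_id, class_label) pairs into training and
    validation sets with a fixed number of samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sid, label in samples:
        by_class[label].append((sid, label))
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label][:]          # copy before shuffling
        if len(items) < n_train + n_val:
            raise ValueError(f"class {label} has only {len(items)} samples")
        rng.shuffle(items)
        train.extend(items[:n_train])
        val.extend(items[n_train:n_train + n_val])
    return train, val

# Seven classes labeled 1-7 with 1000 labeled samples each, as in Table 2.
samples = [(f"{c}_{i}", c) for c in range(1, 8) for i in range(1000)]
train, val = split_per_class(samples)
print(len(train), len(val))  # 5600 1400
```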

Normalized Method
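The polarization coherency matrix can be expressed as (1). The reciprocity theorem is satisfied in the PolSAR images.

$$\mathbf{T} = \left\langle \mathbf{k}\mathbf{k}^{H} \right\rangle = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{12}^{*} & T_{22} & T_{23} \\ T_{13}^{*} & T_{23}^{*} & T_{33} \end{bmatrix}, \qquad \mathbf{k} = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{HH} + S_{VV} \\ S_{HH} - S_{VV} \\ 2S_{HV} \end{bmatrix} \tag{1}$$

where the superscript * represents conjugation, the superscript H represents the conjugate transpose, ⟨•⟩ represents the ensemble average, and k is the Pauli vector. The relationship of the elements of the polarization scattering matrix can be expressed as (2):

$$S_{HV} = S_{VH} \tag{2}$$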
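To make these definitions concrete, the following sketch builds the Pauli vector of a reciprocal target (S_HV = S_VH) and averages the outer products k k^H over a window of samples. The function names and sample values are hypothetical; a real implementation would operate on calibrated, filtered image arrays.

```python
import math

def pauli_vector(s_hh, s_hv, s_vv):
    """Pauli vector k for a reciprocal target (S_HV = S_VH)."""
    f = 1.0 / math.sqrt(2.0)
    return [f * (s_hh + s_vv), f * (s_hh - s_vv), f * 2.0 * s_hv]

def coherency_matrix(samples):
    """T = <k k^H>: average the outer products k k^H over a window of
    (S_HH, S_HV, S_VV) samples. Returns a 3x3 nested list of complex numbers."""
    T = [[0j] * 3 for _ in range(3)]
    for s_hh, s_hv, s_vv in samples:
        k = pauli_vector(s_hh, s_hv, s_vv)
        for i in range(3):
            for j in range(3):
                T[i][j] += k[i] * k[j].conjugate()   # (k k^H)_ij = k_i * conj(k_j)
    n = len(samples)
    return [[T[i][j] / n for j in range(3)] for i in range(3)]

# Hypothetical scattering-matrix samples from one averaging window.
samples = [(1 + 0j, 0.1j, 0.5 + 0j), (0.8 + 0.2j, 0.05 + 0j, 0.4 - 0.1j)]
T = coherency_matrix(samples)
total_power = sum(T[i][i].real for i in range(3))   # trace of T
```

The trace of T equals the window average of |S_HH|² + 2|S_HV|² + |S_VV|², i.e., the total polarization power, and T is Hermitian, so its lower triangle holds the conjugates of the upper triangle.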

Because of the PolSAR imaging mechanism, the backscattering values of PolSAR images after RSD are distributed nonlinearly, so linear normalization methods (such as min-max normalization and Z-score standardization) are unsuitable for PolSAR image processing. Therefore, according to the relationship between the total polarization power of PolSAR and each polarization scattering feature, this study adopted different methods to normalize the polarimetric features. First, before the polarization scattering features are normalized, the total polarization power Span must be processed. Span is the sum of the diagonal elements of the T matrix:

$$\mathrm{Span} = T_{11} + T_{22} + T_{33} \tag{3}$$

To better represent Span, it is converted into decibels:

$$\mathrm{Span}_{\mathrm{dB}} = 10\log_{10}\left(\mathrm{Span}\right) \tag{4}$$

Several different types of ground objects are distributed in the Yellow River Delta, so it is necessary to understand the distribution of the total polarization power of the targets. We therefore statistically analyzed the features. The results indicated that the ground features, as well as P_0, were mainly distributed within [−30 dB, 0 dB]. Therefore, we clipped the scattering characteristics to [−30 dB, 0 dB] for classification.
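A minimal sketch of the Span processing follows, assuming the decibel conversion is the usual 10·log10 power-to-dB form and that values outside [−30 dB, 0 dB] are clipped to the interval bounds. The function name and the small `eps` guard against log(0) are our own illustrative additions.

```python
import math

def span_db(t11, t22, t33, low_db=-30.0, high_db=0.0, eps=1e-12):
    """Total polarization power in dB, clipped to [low_db, high_db]."""
    span = t11 + t22 + t33                    # trace of T (real diagonal elements)
    db = 10.0 * math.log10(max(span, eps))    # assumed 10*log10 power-to-dB conversion
    return min(max(db, low_db), high_db)      # keep only the [-30 dB, 0 dB] interval

# Hypothetical diagonal elements of T for three pixels.
for diag in [(0.5, 0.3, 0.2), (0.01, 0.0, 0.0), (1e-4, 0.0, 0.0)]:
    print(diag, span_db(*diag))
```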
The input polarization features must be normalized before the neural network is trained and tested. Because Span = T_11 + T_22 + T_33, each element of the T matrix is normalized as T_ij/Span, where i and j are the row and column indices of T. For the complex off-diagonal elements, the real and imaginary parts are each divided by Span.
The following quantities are smaller in magnitude than the total polarization power Span: the volume scattering power P_V, surface scattering power P_S, double-bounce scattering power P_D, the total power of the second RSD component P_2, and the total power of the third RSD component P_3. Therefore, these parameters were normalized by dividing them by Span.
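The Span-based normalization of the T elements and the RSD powers can be sketched as follows. The dictionary keys (`Re_T12`, `PS`, and so on) are hypothetical names, and only the upper-triangle off-diagonal elements are kept on the assumption that the lower triangle, being their conjugates, adds no information.

```python
def normalize_by_span(T, powers):
    """T: 3x3 complex coherency matrix (nested lists).
    powers: dict of RSD powers, e.g. {"PS": ..., "PD": ..., "PV": ...}.
    Returns a dict of real-valued features, each divided by Span."""
    span = (T[0][0] + T[1][1] + T[2][2]).real
    feats = {}
    for i in range(3):                                 # diagonal elements T11, T22, T33
        feats[f"T{i + 1}{i + 1}"] = T[i][i].real / span
    for i, j in ((0, 1), (0, 2), (1, 2)):              # upper-triangle off-diagonals
        feats[f"Re_T{i + 1}{j + 1}"] = T[i][j].real / span
        feats[f"Im_T{i + 1}{j + 1}"] = T[i][j].imag / span
    for name, value in powers.items():                 # RSD powers P_S, P_D, P_V, P_2, P_3
        feats[name] = value / span
    return feats

# Hypothetical coherency matrix and RSD powers for one pixel.
T = [[0.6 + 0j, 0.1 + 0.05j, 0.02 - 0.01j],
     [0.1 - 0.05j, 0.3 + 0j, 0.03 + 0.02j],
     [0.02 + 0.01j, 0.03 - 0.02j, 0.1 + 0j]]
feats = normalize_by_span(T, {"PS": 0.5, "PD": 0.3, "PV": 0.2})
```

After this step, the three normalized diagonal elements sum to 1 for every pixel, which removes the large dynamic range of the raw power values.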
The spherical-scattering power ratios x and y of the second and third RSD components, respectively, lie within [0, 1], so these parameters require no normalization.
The double orientation angle and the double helix angle lie within (−π/2, π/2], and the phases a and b of the T_12 element of the second RSD component and the T_13 element of the third RSD component, respectively, lie within [−π, π]. These parameters were therefore processed using Equation (5), where X_1 is the normalized quantity, x is the quantity to be processed, and m_max and n_min are the maximum and minimum values of the range of the physical quantity being processed.
Some parameters processed using Equation (5) lie within [−1, 1]. To match the value range of the neural network activation function, this range was mapped to [0, 1] using

$$z = \frac{y + 1}{2} \tag{6}$$

where y is the quantity to be processed, within [−1, 1], and z is the processed quantity, within [0, 1].
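Equation (5) is not reproduced in this section, so the sketch below assumes one plausible form: a linear map of [n_min, m_max] onto [−1, 1], followed by the affine map z = (y + 1)/2 onto [0, 1]. Both function names are hypothetical.

```python
import math

def to_unit_symmetric(x, n_min, m_max):
    """ASSUMED form of Eq. (5): linearly map [n_min, m_max] onto [-1, 1]."""
    return 2.0 * (x - n_min) / (m_max - n_min) - 1.0

def to_unit_interval(y):
    """Map y in [-1, 1] onto z in [0, 1] (affine form z = (y + 1) / 2)."""
    return (y + 1.0) / 2.0

# A phase a or b lies in [-pi, pi]; an angle lies in (-pi/2, pi/2].
phase = math.pi / 2
y = to_unit_symmetric(phase, -math.pi, math.pi)
z = to_unit_interval(y)
angle = math.pi / 4
z_angle = to_unit_interval(to_unit_symmetric(angle, -math.pi / 2, math.pi / 2))
```

Chained together, the two steps reduce to a min-max scaling over the known physical range of each angle or phase, which is appropriate here because these quantities are bounded, unlike the backscattered powers.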

Schemes
Based on the polarization features generated through RSD and the T matrix, four schemes were proposed. First, we used the polarimetric features of the polarization coherency matrix T, which have been used extensively in the literature [53][54][55][56], to classify QP images. The three main diagonal elements of T (T_11, T_22, and T_33) contain most of the polarization information. The polarimetric features must be normalized before being input into the CNNs, and this normalization depends on the relationship between the elements of T and Span. Therefore, these three elements together with Span form scheme 1.
The T matrix contains substantial polarimetric information: in addition to the three diagonal elements, the off-diagonal elements also carry relevant information. Therefore, all elements of the T matrix together with Span form scheme 2.
In addition to the information in the T matrix, various polarization quantities (P_S, P_D, P_V, P_2, P_3, θ, ϕ, x, y, a, b, and P_0) were obtained through RSD. Because normalization involves Span, these 12 quantities together with Span form scheme 3. Finally, a total of 20 polarimetric features together with Span form scheme 4. As in image processing, each polarimetric feature is treated as an input channel. The details are given in Table 3.
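Treating each feature as a channel, the schemes can be assembled into input cubes as in the sketch below. The channel lists for schemes 1-3 follow the text; splitting the complex off-diagonal T elements into real and imaginary channels is an assumption based on the normalization described earlier, the feature names are hypothetical, and scheme 4 is omitted because its exact composition is given in Table 3.

```python
# Channel lists for schemes 1-3 (feature names are illustrative).
SCHEMES = {
    1: ["T11", "T22", "T33", "Span"],
    2: ["T11", "T22", "T33", "Re_T12", "Im_T12", "Re_T13", "Im_T13",
        "Re_T23", "Im_T23", "Span"],
    3: ["PS", "PD", "PV", "P2", "P3", "theta", "phi", "x", "y", "a", "b",
        "P0", "Span"],
}

def stack_channels(feature_maps, scheme):
    """feature_maps: dict mapping feature name -> normalized H x W map.
    Returns a C x H x W nested list whose channel order follows the scheme."""
    names = SCHEMES[scheme]
    missing = [n for n in names if n not in feature_maps]
    if missing:
        raise KeyError(f"missing feature maps: {missing}")
    return [feature_maps[n] for n in names]

# Hypothetical 2 x 3 feature maps for every feature used by any scheme.
H, W = 2, 3
maps = {name: [[0.0] * W for _ in range(H)]
        for names in SCHEMES.values() for name in names}
cube = stack_channels(maps, 1)   # 4 channels for scheme 1
```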

Experimental Method
CNNs are neural networks designed to process data with a grid-like structure, such as images. A complete CNN is generally composed of a data input layer, convolution layers, activation layers, pooling layers, and fully connected layers. CNNs of sufficient depth and width can extract deeper image features and perform better object recognition and classification. In 2012, Krizhevsky, Sutskever, and Hinton proposed AlexNet [73], which used ReLU as the nonlinear activation function and was among the first networks to adopt dropout, which randomly deactivates neurons during training to avoid overfitting. The model has been applied by many scholars and marked the emergence of a new era of deep learning. Only two years later, VGG-Nets [74] were proposed and quickly became popular among researchers. Both AlexNet and VGG16 are classic models with relatively few layers, good generalization performance, and lower computational cost than deeper state-of-the-art networks. Therefore, these two CNNs were selected as the backbone networks in this study.
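To illustrate how these layer types fit together, the toy forward pass below applies one convolution filter across all C input channels (each channel being one polarimetric feature), then ReLU, 2 × 2 max pooling, a fully connected layer, and softmax. It is a hypothetical single-filter network for illustration only, not AlexNet or VGG16, which stack many such layers with many filters each; all sizes and weights are made up.

```python
import math
import random

def conv2d_valid(x, w, b):
    """Single-filter 'valid' convolution (cross-correlation, as in CNN libraries).
    x: C x H x W input, w: C x k x k kernel, b: scalar bias."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    out = []
    for r in range(H - k + 1):
        row = []
        for c in range(W - k + 1):
            s = b
            for ch in range(C):              # sum over all input channels (features)
                for i in range(k):
                    for j in range(k):
                        s += x[ch][r + i][c + j] * w[ch][i][j]
            row.append(s)
        out.append(row)
    return out

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def maxpool2(m):
    """2 x 2 max pooling with stride 2 (an odd last row/column is dropped)."""
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(0, len(m[0]) - 1, 2)]
            for r in range(0, len(m) - 1, 2)]

def dense(v, weights, bias):
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(weights, bias)]

def softmax(z):
    m = max(z)                               # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, kernel, k_bias, fc_w, fc_b):
    fmap = maxpool2(relu(conv2d_valid(x, kernel, k_bias)))
    flat = [v for row in fmap for v in row]
    return softmax(dense(flat, fc_w, fc_b))

# Hypothetical 4-channel 6 x 6 patch and random weights for 7 classes.
rng = random.Random(0)
C, H, W, M = 4, 6, 6, 7
x = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
kernel = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(C)]
fc_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(M)]   # 6x6 -> 4x4 -> 2x2 = 4
probs = forward(x, kernel, 0.0, fc_w, [0.0] * M)
```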

Experiment
The procedure of the experiment is available as follows: First, the PolSAR data should be radiometrically calibrated [66] and filtered [67], a total of 21 polarimetric features were extracted from the RSD [61], and four schemes were proposed. Next, the polarimetric features were normalized before input CNNs [73,74]. Next, image cubes were extracted from training images and then divided into training datasets and validation datasets. We used training and validation datasets for model training. The model parameters are described in [73,74]. We used the weights and other parameters for testing when the model performed well. According to the processing methods, we took AlexNet for example, the experimental flow is displayed in Figure 3. The flow chart can be roughly divided into two lines. The first line is the main part of the experiment, and each color represents different processing content. The second line explains the main processing part of the first line. The orange box is the four research schemes designed, the green box is the visualization of the batch data of the corresponding research scheme, and the red box is the architecture and parameters of AlexNet.
The pseudocode of the experiment is as follows:

Conventional image classification involves feature extraction and classifier design, so the quality of the extracted polarimetric features is crucial. The spatial information of a pixel depends not only on the target itself but also on its neighborhood. The neighborhood data include the polarimetric features and the spatial image patterns around the center pixel; that is, samples from all feature channels within the same spatial window were input to AlexNet together.
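Extracting such a neighborhood sample (an image cube centred on one pixel, spanning all feature channels) can be sketched as below. The window size of 5 and the replicate-padding rule at image borders are assumptions for illustration; the text does not specify either, and `extract_patch` is a hypothetical name.

```python
def extract_patch(channels, row, col, size=5):
    """Return a C x size x size image cube centred on (row, col).

    channels: list of C equally sized 2-D lists (one per polarimetric feature).
    Pixels outside the image are filled by clamping the index to the nearest
    border pixel (replicate padding), an assumption made for this sketch.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre pixel")
    half = size // 2
    height, width = len(channels[0]), len(channels[0][0])

    def clamp(v, upper):
        # Keep an index inside [0, upper - 1].
        return min(max(v, 0), upper - 1)

    return [
        [
            [ch[clamp(row + dr, height)][clamp(col + dc, width)]
             for dc in range(-half, half + 1)]
            for dr in range(-half, half + 1)
        ]
        for ch in channels
    ]


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(extract_patch([img], 0, 0, size=3)[0])  # [[0, 0, 1], [0, 0, 1], [4, 4, 5]]
```

Because every channel is sliced with the same window, the cube carries both the polarimetric features of the centre pixel and the spatial pattern of its neighbors, which is what the CNN then convolves over.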
In the experiments, the batch size was set to 64 (see [74]) and Kaiming initialization was used [75]; the initial learning rate was 0.1, the attenuation (decay) rate was 0.1, the initial weight was 0.9, and the weight attenuation (decay) rate was 0.0005 [8].
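The training settings above can be illustrated with a plain-Python sketch of Kaiming (He) normal initialization, a step learning-rate schedule, and one SGD update with momentum and weight decay. This rests on our own reading of the text: we interpret the "attenuation rate" of 0.1 as a step decay factor for the learning rate, the "initial weight" of 0.9 as the SGD momentum, and the "weight attenuation rate" of 0.0005 as L2 weight decay. The step size of 30 epochs is a hypothetical value that the text does not give.

```python
import math
import random


def kaiming_normal(fan_in, n, rng=None):
    """Draw n weights from N(0, 2 / fan_in), He et al.'s fan-in initialization
    for layers followed by ReLU."""
    rng = rng or random.Random()
    std = math.sqrt(2.0 / fan_in)
    return [rng.gauss(0.0, std) for _ in range(n)]


def step_lr(epoch, base_lr=0.1, gamma=0.1, step_size=30):
    """Step decay: multiply the learning rate by gamma every step_size epochs.
    step_size=30 is a hypothetical value."""
    return base_lr * gamma ** (epoch // step_size)


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=5e-4):
    """One SGD step with momentum and L2 weight decay:
    g <- grad + wd * w;  v <- momentum * v + g;  w <- w - lr * v."""
    new_w, new_v = [], []
    for wi, gi, vi in zip(w, grad, velocity):
        g = gi + weight_decay * wi
        v = momentum * vi + g
        new_v.append(v)
        new_w.append(wi - lr * v)
    return new_w, new_v


if __name__ == "__main__":
    w, v = sgd_momentum_step([1.0], [0.5], [0.0], lr=step_lr(0))
    print(w, v)  # approximately [0.94995] [0.5005]
```

The update rule mirrors the momentum formulation used by common deep-learning frameworks (weight decay folded into the gradient before the momentum buffer is updated).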
Whenever the model fitted well and the validation accuracy improved, the model parameters were saved for testing. The size of the input units (image cubes) was the same for training and testing.
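Saving the parameters only when the validation accuracy improves can be sketched with a small helper. This is a hypothetical illustration: "improves" is read here as strictly exceeding the best accuracy seen so far, and parameters are stored as a deep copy so later training steps cannot overwrite them (in a PyTorch workflow, the analogous step would typically save `model.state_dict()`).

```python
import copy


class BestCheckpoint:
    """Keep a copy of the parameters with the highest validation accuracy."""

    def __init__(self):
        self.best_acc = float("-inf")
        self.best_params = None
        self.best_epoch = None

    def update(self, epoch, val_acc, params):
        """Store a deep copy of params if val_acc beats the best so far.

        Returns True when a new checkpoint was saved.
        """
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_params = copy.deepcopy(params)
            self.best_epoch = epoch
            return True
        return False


if __name__ == "__main__":
    ckpt = BestCheckpoint()
    for epoch, acc in enumerate([0.80, 0.75, 0.90], start=1):
        saved = ckpt.update(epoch, acc, {"w": [epoch]})
        print(epoch, acc, saved)
    print(ckpt.best_epoch, ckpt.best_acc)  # 3 0.9
```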
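We used the cross-entropy loss function, averaged over $N$ samples, as expressed in (7):

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M} y_{ic}\log\left(p_{ic}\right) \qquad (7)$$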
where $M$ represents the number of categories; $y_{ic}$ is an indicator (sign) function that equals 1 if the true category of sample $i$ is $c$ and 0 otherwise; and $p_{ic}$ is the predicted probability that sample $i$ belongs to category $c$. After the convolution and pooling layers of the CNN, the feature maps of the input cube are obtained. The fully connected and softmax layers are then used to determine the category of the pixel. We created an empty matrix of the same size as the test image and filled in the predicted label of each pixel one by one. Finally, the prediction results were output.
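The quantities in this paragraph can be illustrated in plain Python: the spatial size after a convolution or pooling layer, the softmax output, the mean cross-entropy of Eq. (7) with one-hot labels, and the pixel-by-pixel filling of an empty label map. This is a sketch under stated assumptions: `logits_at(r, c)` is a hypothetical stand-in for the trained CNN applied to the cube centred on pixel (r, c), and the 224 x 224 example for the layer-size formula is only arithmetic, not the patch size used in the paper.

```python
import math


def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def softmax(logits):
    """Numerically stable softmax over one sample's class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, labels):
    """Mean cross-entropy of Eq. (7): -(1/N) sum_i sum_c y_ic log p_ic.

    With one-hot y_ic, only the true-class term of each sample survives.
    """
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def predict_label_map(height, width, logits_at):
    """Fill an initially empty label matrix pixel by pixel with the argmax class.

    logits_at(r, c) is a hypothetical stand-in for the trained CNN. Softmax is
    monotonic, so the argmax of the logits equals the argmax of the probabilities.
    """
    label_map = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            scores = logits_at(r, c)
            label_map[r][c] = max(range(len(scores)), key=scores.__getitem__)
    return label_map


if __name__ == "__main__":
    print(conv_output_size(224, 11, stride=4, padding=2))  # 55
    print(softmax([0.0, 0.0]))  # [0.5, 0.5]
    print(cross_entropy([[0.5, 0.5]], [0]))  # log(2), about 0.693
```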

Evaluation Method
A confusion matrix was used to evaluate the classification accuracy. Evaluation indicators derived from it include the overall accuracy (OA), producer's (mapping) accuracy, user's accuracy, and so on; these indicators reflect the accuracy of the image classification from different aspects. The confusion matrix is calculated by comparing the reference class of each measured pixel with the class assigned to the corresponding position in the classified image. Each column of the confusion matrix represents a predicted category, and the total of each column is the number of pixels predicted as that category. Each row represents a true category, and the total of each row is the number of reference pixels of that category. Each element gives the number of pixels of the row's true category that were predicted as the column's category.
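Building the confusion matrix with this row/column convention, and deriving the per-class producer's and user's accuracies from it, can be sketched as below. Class labels are assumed to be integers from 0 to num_classes − 1; the function names are our own.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the reference (true) class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def producer_accuracy(cm):
    """Per-class producer's (mapping) accuracy: diagonal / row total."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


def user_accuracy(cm):
    """Per-class user's accuracy: diagonal / column total."""
    n = len(cm)
    col_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return [cm[i][i] / col_totals[i] if col_totals[i] else 0.0 for i in range(n)]


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1, 1, 2], [0, 1, 1, 1, 2, 2], 3)
    print(cm)  # [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
    print(producer_accuracy(cm))  # [0.5, 0.666..., 1.0]
    print(user_accuracy(cm))  # [1.0, 0.666..., 0.5]
```

Producer's accuracy measures omission (how many reference pixels of a class were found), while user's accuracy measures commission (how reliable a predicted label is).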
The overall accuracy can be expressed as follows:

$$OA = \frac{\sum_{k}\rho_{kk}}{\rho}$$

where $\rho$ is the total number of classified pixels and $\rho_{kk}$ is the number of correctly classified pixels of category $k$ (the $k$th diagonal element of the confusion matrix). The kappa coefficient is expressed as

$$K = \frac{N\sum_{i=1}^{r}x_{ii} - \sum_{i=1}^{r}x_{i+}x_{+i}}{N^{2} - \sum_{i=1}^{r}x_{i+}x_{+i}}$$

where $r$ is the total number of columns in the confusion matrix (total number of categories); $x_{ii}$ is the number of pixels in row $i$ and column $i$ of the confusion matrix (number of correct classifications); $x_{i+}$ and $x_{+i}$ are the total numbers of pixels in row $i$ and column $i$, respectively; and $N$ is the total number of pixels used for accuracy evaluation.
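Both formulas can be computed directly from a confusion matrix stored as a list of rows (true classes) by columns (predicted classes). This is a minimal sketch; it does not guard against the degenerate case $N^2 = \sum_i x_{i+}x_{+i}$ (for example, a single-class matrix), where kappa is undefined.

```python
def overall_accuracy(cm):
    """OA = (sum of diagonal elements) / (total number of pixels)."""
    total = sum(map(sum, cm))
    return sum(cm[k][k] for k in range(len(cm))) / total


def kappa(cm):
    """K = (N * sum x_ii - sum x_i+ x_+i) / (N^2 - sum x_i+ x_+i)."""
    r = len(cm)
    n = sum(map(sum, cm))
    diag = sum(cm[i][i] for i in range(r))
    row_totals = [sum(cm[i]) for i in range(r)]                         # x_i+
    col_totals = [sum(cm[j][i] for j in range(r)) for i in range(r)]    # x_+i
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (n * diag - chance) / (n * n - chance)


if __name__ == "__main__":
    cm = [[1, 1, 0], [0, 2, 1], [0, 0, 1]]
    print(overall_accuracy(cm))  # 0.666... (4 of 6 pixels correct)
    print(kappa(cm))  # 11/23, about 0.478
```

The example shows why kappa is reported alongside OA: it discounts the agreement expected by chance from the row and column totals, so it is lower than OA on the same matrix.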

Results
We classified the QP image acquired on October 12, 2017, using the four schemes. We selected 1000 samples of each category and used the OA and kappa coefficient (K) to evaluate the algorithm performance. The classification results of the four schemes are displayed in Figure 4. The ground-truth map of the test image, drawn according to our investigation, is displayed in Figure 4e. According to the classification results on AlexNet, the highest OA and kappa coefficient were obtained when all 21 polarimetric features were adopted. The performance of scheme 2 and scheme 3 is similar but better than that of scheme 1. In addition, the OA and kappa coefficient of scheme 3 are slightly higher than those of scheme 2. The confusion matrix is displayed in Table 4.

Based on the confusion matrices of the four research schemes, we inferred that the more polarimetric features were input to AlexNet, the higher the OA. When all 21 polarimetric features were used for classification, the OA was 96.54%, which was 8.13% higher than that obtained using only the main diagonal elements of the T matrix (scheme 1), 2.88% higher than that obtained using the T matrix including its off-diagonal elements (scheme 2), and 1.1% higher than that obtained using the polarimetric power and other polarimetric features (scheme 3). Thus, the classification OA can be improved by using more informative polarimetric features. We conducted the same experiments on VGG16, with the parameters described in [74] and [8]. The OA of scheme 4 was 94.93%, which was 5.4% higher than that obtained using only the three diagonal elements of the T matrix. The performance of scheme 2 and scheme 3 was similar but better than that of scheme 1. These results indicate that higher OAs and kappa coefficients can be acquired when more informative polarimetric features are input to VGG16. The confusion matrix of VGG16 is displayed in Table 5.

Discussion
To verify that the polarimetric features used for CNN classification were informative, we designed noise test experiments by adding one, two, and three channels of Gaussian random noise to each scheme. On AlexNet, after adding one channel of Gaussian random noise, the OAs of schemes 1 to 4 were 81.36%, 85.47%, 94.93%, and 95.71%, respectively, and the kappa coefficients were 78.25%, 83.05%, 94.08%, and 95%, respectively; these values were lower than those obtained using the original schemes. Similarly, after adding two channels of Gaussian random noise, the OAs of the four schemes were 81.2%, 90.44%, 94.9%, and 93.76%, respectively, and the kappa coefficients were 78.07%, 88.85%, 93.22%, and 92.72%, respectively. After adding three channels of Gaussian random noise, the OAs of the four schemes were 85.09%, 91.87%, 93.89%, and 94.5%, respectively, and the kappa coefficients were 82.6%, 90.52%, 92.87%, and 93.58%, respectively. When noise was added on VGG16, the OAs and kappa coefficients also decreased to varying degrees. The OAs and kappa coefficients obtained after adding noise were worse than those of the original schemes, which indicates that the polarimetric features are helpful for classification; nevertheless, AlexNet and VGG16 still showed good anti-noise performance. Higher accuracy can therefore be achieved by adopting more informative polarimetric features to classify QP images. Furthermore, the results obtained using RSD were slightly better than those obtained using the T matrix.
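The noise test can be sketched as appending extra channels of i.i.d. Gaussian noise to a scheme's feature stack before classification. The noise mean (0) and standard deviation (1) are assumptions, since the text does not state the noise parameters, and `add_noise_channels` is a hypothetical name.

```python
import random


def add_noise_channels(channels, k, mean=0.0, std=1.0, seed=None):
    """Return a copy of the feature stack with k extra Gaussian-noise channels.

    channels: list of equally sized 2-D lists (height x width).
    Each noise channel has the same height and width; mean and std are
    assumed values for illustration. The original stack is not modified.
    """
    rng = random.Random(seed)
    height, width = len(channels[0]), len(channels[0][0])
    noise = [
        [[rng.gauss(mean, std) for _ in range(width)] for _ in range(height)]
        for _ in range(k)
    ]
    return [[row[:] for row in ch] for ch in channels] + noise


if __name__ == "__main__":
    stack = [[[float(c)] * 5 for _ in range(4)] for c in range(3)]  # 3 channels, 4 x 5
    for k in (1, 2, 3):
        print(k, len(add_noise_channels(stack, k, seed=0)))  # 4, 5, 6 channels
```

If the network's accuracy drops when real feature channels are accompanied by noise channels of the same size, the drop reflects the value of the real features relative to uninformative input.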
Based on the results of the different schemes, the scheme with 21 polarimetric features achieved the highest OA because it contains more polarimetric scattering information. The back-scattering coefficient is a crucial factor affecting target classification. A CNN can easily distinguish targets whose back-scattering coefficients differ from those of other ground objects. For example, the back-scattering coefficients of seawater and vegetation differ considerably; thus, the boundary between them is apparent. By contrast, distinguishing seawater from nearshore water is challenging because of their similar back-scattering coefficients.

Conclusions
In this study, we employed two well-known CNNs to classify QP images of the Yellow River Delta captured during summer. The ground objects in this area were classified into seven categories: nearshore water, seawater, spartina alterniflora, tamarix, reed, tidal flat, and suaeda salsa.
Based on the polarimetric features from RSD and the T matrix, four schemes were proposed. After radiometric calibration, polarimetric filtering, and normalization, the image cubes of the corresponding ground objects were divided into training and validation datasets. The highest OAs were 96.54% on AlexNet and 94.93% on VGG16, which were 8.13% and 5.4% higher, respectively, than those obtained using only the three diagonal elements of the T matrix. The OAs of the four schemes were all higher than 88%. The results indicate that the accuracy improves when more informative polarimetric features are input to the CNNs, and they confirm that CNN classification based on polarimetric features can be applied to wetland classification using QP images. Furthermore, the back-scattering coefficient is a crucial parameter for distinguishing ground objects. The results obtained through RSD were slightly better than those obtained using the T matrix. Therefore, RSD can help improve the accuracy of polarimetric SAR image classification of wetland objects using CNNs. This study provides a method for wetland classification based on polarimetric features and will promote future research on wetland cover.
The QP images captured by the GF-3 satellite contain a substantial amount of information with high utilization value. However, only four summer images of the Yellow River Delta captured between 2016 and 2022 are available, which is insufficient for analyzing the ground cover of the entire Yellow River Delta. Moreover, no benchmarks are available for performance assessment. In the future, we intend to utilize more GF-3 QP images of the Yellow River Delta to train a model that can be applied to both summer and winter conditions. Furthermore, optical and QP images can be fused to classify wetlands. In addition, we will use novel CNNs or propose new algorithms to analyze the Yellow River Delta in future studies.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The