A Crop Classification Method Integrating GF-3 PolSAR and Sentinel-2A Optical Data in the Dongting Lake Basin

With the growing number of satellite sensors, more multi-source data are available for large-scale, high-precision crop classification. Both polarimetric synthetic aperture radar (PolSAR) and multi-spectral optical data have been widely used for classification. However, it is difficult to combine the covariance matrix of PolSAR data with the spectral bands of optical data. Using Hoekman's method, this study solves this problem by transforming the covariance matrix into an intensity vector that contains multiple intensity values on different polarization bases. To reduce feature redundancy, the principal component analysis (PCA) algorithm is adopted to select useful polarimetric and optical features. PolSAR data acquired by the Gaofen-3 (GF-3) satellite on 19 July 2017 and optical data acquired by Sentinel-2A on 17 July 2017 over the Dongting Lake basin are used for the validation experiment. The results show that the full feature integration method proposed in this study achieves an overall classification accuracy of 85.27%, higher than that of the single-dataset methods or the other feature integration modes.


Introduction
To meet the demand for large-scale, high-efficiency crop mapping, remote sensing technology can substitute for traditional field measurement, and it can observe the same area many times thanks to short revisit times. Nowadays, optical data and polarimetric synthetic aperture radar (PolSAR) data are often used for crop monitoring, and the integration of multi-source datasets can help to achieve high-precision classification results. However, in integrated classification, some effective features extracted from the data of different sensors cannot be used at the same time, so the potential of the integrated datasets cannot be fully exploited. In particular, the covariance matrix of PolSAR data is difficult to combine with multi-spectral optical data for classification. Considering that the covariance matrix contains rich polarimetric information, this paper applies Hoekman's method [1] to transform the matrix into an intensity vector, as detailed in Section 3.2. This intensity vector has nine bands, denoting the intensity values on different polarization bases, and has a data structure similar to that of the spectral bands of optical data, so the two kinds of information are easy to combine. In addition, some other useful features are extracted, including polarimetric features such as the radar vegetation index (RVI) and the four Yamaguchi decomposition components, as well as some optical features such as the

We collected the crop information through an in-situ survey, recording the crop types and their growth stages. The crop types were identified with the help of regional agricultural experts and farmers. Finally, the training samples and testing samples were selected separately (Figure 2) according to the basic sampling principle [49,50]; the detailed sample information is listed in Table 3.

Methodology
The proposed method includes the following steps: data preprocessing, feature extraction and integration, and SVM classification. The flowchart of the proposed method is shown in Figure 3.

Data Preprocessing
For the extracted features to be useful for classification, careful data preprocessing is necessary. First, the GF-3 data are polarimetrically calibrated: the backscattering amplitude of each polarization channel is corrected according to the calibration constants in the header file. Then, the polarimetric coherency matrix $T_3$ is generated and Non-Local filtering is used to reduce the speckle noise [51,52]. Finally, the area of interest is selected for subsequent experiments. The Sentinel-2A data have 13 bands, of which the four bands commonly used for classification are selected: red (R), green (G), blue (B) and near infrared (NIR).
Then the two datasets are registered into the same coordinate system for feature extraction and integration. Because SAR acquisition is side-looking, unlike the central projection of optical data, the original optical data are registered into the SAR coordinate system to preserve the targets' backscattering characteristics. Details are shown in Figure 4. We choose ground control points (GCPs) and register the datasets based on the corresponding GCPs. Since the study area has flat terrain, the SAR data show no obvious foreshortening, layover or shadow, so the GCP-based registration achieves high accuracy. Finally, the optical data are clipped to the same area of interest as the GF-3 data.
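As an illustration of GCP-based registration, the following minimal sketch fits a transform from optical pixel coordinates to SAR pixel coordinates by least squares. It assumes an affine model, which the paper does not specify; the function names and the GCP coordinates are hypothetical.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src (x, y) to dst (x, y)."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Design matrix [x, y, 1]; solve A @ M = dst for the 3x2 parameter matrix M.
    A = np.hstack([src, np.ones((len(src), 1))])
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_affine(M, pts):
    """Apply the fitted 3x2 affine parameters to an array of (x, y) points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M

# Hypothetical GCPs: optical pixel coordinates and their SAR counterparts.
optical_gcps = [(10, 10), (200, 15), (20, 180), (210, 190)]
true_M = np.array([[1.02, 0.01], [-0.02, 0.98], [5.0, -3.0]])
sar_gcps = apply_affine(true_M, optical_gcps)

M = fit_affine(optical_gcps, sar_gcps)
residual = apply_affine(M, optical_gcps) - sar_gcps
rmse = float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))
```

The residual RMSE at the GCPs gives a quick check of registration quality; over flat terrain a low-order model of this kind is often adequate.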

Feature Extraction and Integration
To fully characterize different crops, we extract the backscattering intensity, backscattering type and canopy vegetation index from the GF-3 data, and the spectral characteristics, spatial texture and canopy vegetation index from the Sentinel-2A data. Since intensity is the most direct representation of how ground objects backscatter radar waves, it is extracted first.
We use the method proposed by Hoekman in 2003 [1] to transform the elements of the covariance matrix $C_3$ into a multi-channel intensity vector. Matrix $B$ converts the elements of $C_3$ into an intensity vector $P$, which represents the backscattering intensity of crops in different polarimetric channels. In this transformation, DN denotes the intensity value and the subscripts denote the received and transmitted polarization bases: horizontal (h), vertical (v), left circular (l), right circular (r), 45° linear (+ or +45) and −45° linear (− or −45). It is worth noting that the backscattering intensity often contains a number of very large values. To normalize the data for combination, we transform the original intensity into backscattering-coefficient format (dB), where $\sigma$ denotes the transformed intensity vector; its detailed values are presented in Formula (4).
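The conversion to dB can be sketched as follows. This assumes the standard relation $\sigma = 10 \log_{10} P$, which may differ from the exact expression used in the paper; the function name and the small floor `eps` are illustrative choices.

```python
import numpy as np

def intensity_to_db(p, eps=1e-10):
    """Convert linear intensity to decibels, flooring values at eps to avoid log(0)."""
    p = np.asarray(p, dtype=float)
    return 10.0 * np.log10(np.maximum(p, eps))

# Hypothetical intensities spanning several orders of magnitude.
P = np.array([1.0, 0.1, 0.01, 100.0])
sigma = intensity_to_db(P)
```

The logarithm compresses the large dynamic range of the intensities, which makes the later normalization less sensitive to a few very bright pixels.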
The subscripts of $\sigma$ are the same as those of $P$. Although $\sigma$ characterizes the backscattering intensity, its dimension is large for multi-source data integration and leads to data redundancy, which reduces both classification accuracy and computational efficiency. The principal component analysis (PCA) algorithm can replace the full feature vector with its first one or two principal components, improving both. In this paper, the first two principal components together account for 98% of the total variance, so they can substitute for the full vector in the calculation. These two principal component features, $\sigma_{pca1}$ and $\sigma_{pca2}$, are therefore extracted.
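The PCA step can be sketched as below, assuming the nine dB intensity bands are arranged as a pixels-by-bands matrix. The data here are synthetic and the function name is hypothetical; the sketch only shows how the first two components and their share of the variance are obtained.

```python
import numpy as np

def pca_first_components(X, k=2):
    """Project X (n_samples, n_features) onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each band
    cov = np.cov(Xc, rowvar=False)           # band-by-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :k]
    explained = float(eigvals[:k].sum() / eigvals.sum())
    return scores, explained

# Synthetic stand-in for nine correlated intensity bands driven by two factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 9))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 9))

scores, explained = pca_first_components(X, k=2)
```

Here `explained` plays the role of the 98% figure quoted in the text: the fraction of total variance retained by the two components.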
As for the backscattering type, the corresponding polarimetric characteristics can be extracted by the Y4R decomposition method, proposed by Yamaguchi in 2005 [53]. Building on the classical Freeman three-component decomposition, the Y4R method additionally considers the helix scattering mechanism, which brings the decomposed backscattering types closer to the real situation; as a result, Y4R has been widely used for PolSAR image classification.
$P_s = f_s (1 + |\beta|^2)$ (6)
where $P_s$, $P_d$, $P_v$ and $P_c$ represent the scattering intensity of surface scattering, double-bounce scattering, volume scattering and helix scattering, respectively; $f_s$, $f_d$, $f_v$ and $f_c$ are the surface, double-bounce, volume and helix scattering contributions to $|S_{VV}|^2$; and $\alpha$ and $\beta$ are real-valued.
The RVI extracted from PolSAR data can be used as a canopy vegetation index [54]; it uses the power of different polarimetric channels to reflect the canopy vegetation characteristics at different phenological stages. The greater the power, the closer the crop canopy is to a forest canopy.
The crop spectral information, spatial texture information and canopy vegetation index are extracted from the Sentinel-2A optical data. Multi-spectral information is sensitive to the moisture and chlorophyll content of crop leaves, which can be used to identify the crop species. In this paper, four common spectral bands (R, G, B, NIR) are extracted to characterize the spectral information of crops, and the corresponding feature vectors are also transformed by the PCA algorithm. The first two principal components, $Opband_{pca1}$ and $Opband_{pca2}$, are extracted; together they account for 99% of the overall variance.
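For the RVI mentioned above, a minimal sketch is given here. It assumes the commonly used definition $RVI = 8\sigma_{hv} / (\sigma_{hh} + \sigma_{vv} + 2\sigma_{hv})$ with backscatter in linear power units; the exact formula from [54] is not reproduced in this text, so this form is an assumption, as are the function name and sample values.

```python
import numpy as np

def radar_vegetation_index(hh, hv, vv, eps=1e-12):
    """RVI from linear-power backscatter; eps guards against a zero denominator."""
    hh, hv, vv = (np.asarray(a, dtype=float) for a in (hh, hv, vv))
    return 8.0 * hv / (hh + vv + 2.0 * hv + eps)

# Two hypothetical pixels: a smooth surface (no cross-pol return) and a
# dense random canopy (cross-pol equal to one third of co-pol).
rvi = radar_vegetation_index(hh=[0.2, 0.3], hv=[0.0, 0.1], vv=[0.2, 0.3])
```

Under this definition the index is near 0 for smooth bare surfaces and approaches 1 for a dense, randomly oriented canopy.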
Then the information entropy H of the red (R) band image is used to characterize the spatial texture of crops. Information entropy measures uncertainty: the greater the value, the higher the uncertainty [55]. For a single-band image, the uncertainty is largely determined by the richness of texture: the richer the texture, the higher the uncertainty.
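The entropy measure can be sketched as the Shannon entropy of the grey-level histogram of an image patch. This assumes 8-bit quantized data and a histogram-based estimate; the window size and quantization used in the paper are not stated, and the function name is hypothetical.

```python
import numpy as np

def shannon_entropy(patch, levels=256):
    """Shannon entropy (bits) of the grey-level histogram of an integer image patch."""
    patch = np.asarray(patch)
    counts = np.bincount(patch.ravel(), minlength=levels)
    p = counts[counts > 0] / patch.size      # drop empty bins: 0 * log(0) := 0
    return float(-(p * np.log2(p)).sum())

# A uniform patch carries no texture; a patch of 64 distinct grey levels is maximally varied.
flat = np.full((8, 8), 100, dtype=np.uint8)
textured = np.arange(64, dtype=np.uint8).reshape(8, 8)

h_flat = shannon_entropy(flat)
h_textured = shannon_entropy(textured)
```

Applying such a function in a sliding window over the red band would yield a per-pixel texture feature H.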
Finally, the normalized difference vegetation index (NDVI) is calculated from the red and near infrared band images by Equation (11). NDVI is used to characterize the canopy properties of different crops, especially changes in canopy density and biomass.
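The NDVI computation follows the standard definition $NDVI = (NIR - R) / (NIR + R)$. A minimal sketch, with a hypothetical function name and reflectance values:

```python
import numpy as np

def ndvi(nir, red, eps=1e-12):
    """Normalized difference vegetation index; eps guards against division by zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

# Hypothetical reflectances: a vegetated pixel and a pixel with equal NIR and red.
values = ndvi(nir=[0.5, 0.3], red=[0.1, 0.3])
```

Dense green canopies reflect strongly in the NIR and absorb red light, so they give values well above zero.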
The extracted features are then integrated before SVM classification. To eliminate the effect of the features' different scales, this paper normalizes all features to the range 0~1. As shown in Figure 5, the imaging characteristics of PolSAR data and optical data are clearly different; the features obtained from the two kinds of data are independent of and complementary to each other.
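The normalization and integration step can be sketched as below, assuming per-feature min-max scaling and a simple pixel-wise stack of co-registered feature images. The function names and the tiny example images are hypothetical.

```python
import numpy as np

def minmax(band):
    """Scale one feature image to the range [0, 1]; constant bands map to zeros."""
    band = np.asarray(band, dtype=float)
    lo, hi = band.min(), band.max()
    if hi == lo:
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)

def stack_features(bands):
    """Turn co-registered (rows, cols) feature images into a (pixels, features) matrix."""
    return np.stack([minmax(b).ravel() for b in bands], axis=1)

# Two hypothetical 2x2 feature images with very different scales.
sar_db = np.array([[-20.0, -10.0], [-15.0, -5.0]])
ndvi_img = np.array([[0.1, 0.8], [0.4, 0.6]])
X = stack_features([sar_db, ndvi_img])
```

Each row of `X` is then one pixel's integrated feature vector, ready to be passed to a classifier.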

SVM Classification
Based on the integrated features, the support vector machine (SVM) method is applied to crop classification. The SVM classifier is fundamentally a two-class classification model that uses a kernel function to map the multi-dimensional feature set into a higher-dimensional space, where a separating hyperplane is constructed to distinguish the categories. This method can efficiently produce high-precision classification results with few training samples, and it has been successfully applied in many fields, such as land use classification mapping and data mining. The kernel function adopted in this paper is the radial basis function (RBF), which handles linearly non-separable problems through nonlinear mapping and has only a few parameters and low model complexity. After the SVM classification, the results in the SAR coordinate system are transformed into the geographic coordinate system.
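An RBF-kernel SVM of this kind can be sketched with scikit-learn's `SVC`. The library choice, the hyperparameters (`C=10.0`, `gamma="scale"`) and the synthetic three-class data are assumptions for illustration; the paper does not state its implementation or parameter values.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic training samples: three well-separated classes in a
# two-dimensional, 0~1 normalized feature space.
rng = np.random.default_rng(0)
centers = np.array([[0.2, 0.2], [0.8, 0.2], [0.5, 0.8]])
X_train = np.vstack([c + 0.03 * rng.normal(size=(30, 2)) for c in centers])
y_train = np.repeat(np.arange(3), 30)

# SVC extends the two-class SVM to several classes with a one-vs-one scheme.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train, y_train)

pred = clf.predict(centers)
```

In practice `C` and `gamma` would be tuned, for example by cross-validation on the training samples.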

Experimental Results
As shown in Table 4, the overall classification accuracy is 85.27% and the Kappa coefficient is 0.8306. The accuracy of water, lotus pond and vegetation reaches 96%, and that of single-season rice, watermelon greenhouse, bare soil and grassland reaches 80%. However, the misclassification rate of two-season rice is higher than 54%, because two-season rice has spectral characteristics similar to those of single-season rice and vegetation. The omission rates of water, watermelon greenhouse and lotus pond are lower than 10%, and those of bare soil and grassland are lower than 20%. The omission rates of the two kinds of rice are higher than those of the above five classes, at around 25%, while the omission rate of vegetation is over 30%. Although PolSAR can distinguish rice in different growing seasons, the classification accuracy is low: nearly 1/4 of the two-season rice was misclassified as single-season rice. This could result from the small amount of available data; if multi-temporal images were available, the two kinds of rice could be distinguished using temporal information. The omitted vegetation pixels are mainly classified as two-season rice and grassland. The reason is that the vegetation mostly grows on undulating mountains, where the speckle noise in PolSAR images is stronger and reduces the classification accuracy.
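For reference, overall accuracy and the Kappa coefficient are both derived from the confusion matrix. The sketch below uses the standard definitions on a small made-up matrix, not the values of Table 4; the function name is hypothetical.

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                   # observed agreement
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2   # chance agreement
    return float(po), float((po - pe) / (1.0 - pe))

# Made-up two-class confusion matrix (rows: reference, columns: predicted).
cm = [[45, 5],
      [10, 40]]
oa, kappa = overall_accuracy_and_kappa(cm)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy.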

Comparison with Different Datasets
To validate the proposed full feature integration method, this section compares the results generated from the integrated data with those from the GF-3 data alone and from the Sentinel-2A data alone (Figure 6). We also assessed the classification accuracies. The evaluation indicators are the rates of true positive (TP), false negative (FN), true negative (TN) and false positive (FP). These indicators evaluate the result for each class fairly, no matter how many samples are used [56]. We present them as histograms. The TP rate and FN rate sum to 1, so they can be shown in one bar of the histogram (Figure 7); the same holds for the TN rate and FP rate (Figure 8). The overall classification accuracy of the integrated data is the highest, followed by that of the optical data alone and then the PolSAR data alone. The GF-3 PolSAR data alone can distinguish single-season rice from two-season rice, but they misclassify bare soil, grassland and watermelon greenhouse, which mainly exhibit surface scattering. The Sentinel-2A data alone behave in the opposite way: they classify bare soil, grassland and watermelon greenhouse better, because the spectral information of these three land covers varies greatly, but they cannot separate single-season rice and two-season rice as well as the GF-3 data, giving a classification accuracy for two-season rice as low as 28%. The proposed integration method takes advantage of both datasets, so its results have the highest classification accuracy.
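The four per-class rates can be computed from a multi-class confusion matrix by treating each class in turn as the positive class. A minimal sketch on made-up counts, with a hypothetical function name:

```python
import numpy as np

def per_class_rates(cm):
    """Per-class TP, FN, TN and FP rates (rows: reference, columns: predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # reference pixels of the class that were missed
    fp = cm.sum(axis=0) - tp          # pixels wrongly assigned to the class
    tn = cm.sum() - tp - fn - fp
    tpr, fnr = tp / (tp + fn), fn / (tp + fn)
    tnr, fpr = tn / (tn + fp), fp / (tn + fp)
    return tpr, fnr, tnr, fpr

# Made-up three-class confusion matrix.
cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]
tpr, fnr, tnr, fpr = per_class_rates(cm)
```

Because the rates are normalized by the number of positives or negatives of each class, they stay comparable across classes with very different sample counts.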

Comparison with Different Feature Integration Modes
This section validates the advantage of the full feature integration proposed in this paper. Traditional data fusion methods feed both the intensity values of SAR data and the spectral information of optical data into the classification at the same time, leading to data redundancy. But the intensity of SAR data differs from the spectral information of optical data: the former denotes the backscattering characteristics, whereas the latter denotes the reflection of sunlight. The classification results under different feature integration modes are discussed here, and the details are shown in Table 5. In this study, we used three feature integration modes:
(1) GF-3 features ($\sigma_{pca1}$, $\sigma_{pca2}$, RVI, $P_s$, $P_d$, $P_h$ and $P_v$) + Sentinel-2A features ($Opband_{pca1}$, $Opband_{pca2}$, NDVI and H);
(2) GF-3 features ($\sigma_{pca1}$, $\sigma_{pca2}$, RVI, $P_s$, $P_d$, $P_h$ and $P_v$) + Sentinel-2A features (NDVI and H);
(3) GF-3 features (RVI, $P_s$, $P_d$, $P_h$ and $P_v$) + Sentinel-2A features ($Opband_{pca1}$, $Opband_{pca2}$, NDVI and H).
The classification results are shown in Figure 9 and the accuracy assessments in Figures 10 and 11. The full feature integration method achieves the highest overall classification accuracy and the largest Kappa coefficient, mainly owing to the improved classification accuracy of vegetation and grassland; the involvement of more features makes the classification more accurate and stable. In addition, when more PolSAR features are involved (GF-3 (7 bands) + S2A (2 bands)), the classification accuracy of single-season rice and two-season rice increases, whereas when more optical features are involved (GF-3 (5 bands) + S2A (4 bands)), the classification accuracy of bare soil and watermelon greenhouse improves. This conclusion is consistent with that of the previous section.
To sum up, the full feature integration method proposed in this paper achieves a higher classification accuracy.
Table 5. Details on different feature integration modes.
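The three integration modes described in this section amount to selecting different column subsets of one integrated feature matrix. The sketch below illustrates this; the feature labels, mode names and helper function are hypothetical, and the data are random placeholders.

```python
import numpy as np

# Column order of the full integrated feature matrix (labels are illustrative).
FEATURES = ["sigma_pca1", "sigma_pca2", "RVI", "Ps", "Pd", "Ph", "Pv",
            "Opband_pca1", "Opband_pca2", "NDVI", "H"]

MODES = {
    # (1) full integration: 7 GF-3 features + 4 Sentinel-2A features
    "full": FEATURES,
    # (2) GF-3 (7 bands) + S2A (2 bands)
    "gf3_7_s2a_2": FEATURES[:7] + ["NDVI", "H"],
    # (3) GF-3 (5 bands) + S2A (4 bands)
    "gf3_5_s2a_4": FEATURES[2:7] + FEATURES[7:],
}

def select_mode(X, mode):
    """Return the columns of X (pixels, 11 features) used by the given mode."""
    idx = [FEATURES.index(name) for name in MODES[mode]]
    return X[:, idx]

X = np.random.default_rng(0).random((100, len(FEATURES)))  # placeholder data
shapes = {mode: select_mode(X, mode).shape for mode in MODES}
```

Training the same classifier on each subset then isolates the contribution of the features that a mode leaves out.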

Classification Ability of σ
The Wishart supervised classification based on the covariance matrix $C_3$ or the coherency matrix $T_3$ has been widely used. In this study, we substituted the intensity vector $\sigma$ for the covariance matrix to suit the SVM classifier, whose input variables should be multiple independent bands. Hoekman has shown that the intensity vector $\sigma$ can represent the full polarimetric target characteristics carried by a covariance matrix [1] and that $\sigma$ is more suitable for crop classification, because it can describe the variations in the biophysical parameters of crops. To clarify this point, we compare three polarimetric classification methods: (1) Wishart supervised classification with $C_3$, (2) SVM classification with $\sigma$ and (3) SVM classification with the first two PCA components of $\sigma$. The results are presented in Figure 12. As the figure shows, the SVM classification with $\sigma$ has the highest overall accuracy and Kappa coefficient of the three methods. We also calculated the rates of TP, FN, TN and FP and compared them (Figures 13 and 14). The comparison shows that the SVM method with $\sigma$ performs better than the Wishart supervised method for most land covers, although the Wishart method performs best of the three for the watermelon greenhouse and the forest region. For the crops, the SVM classification with $\sigma$ has the highest accuracy, supporting Hoekman's view that $\sigma$ is more suitable for describing crops, and the first two PCA components of $\sigma$ achieve classification results similar to those of the whole $\sigma$. We conclude that the intensity vector and its PCA components can be successfully applied to polarimetric classification and give better results than the Wishart supervised classification for most crops.
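For context, the Wishart supervised classifier assigns each pixel to the class whose mean covariance matrix minimizes a Wishart distance. The sketch below assumes the commonly used form $d(C, \Sigma_m) = \ln|\Sigma_m| + \mathrm{Tr}(\Sigma_m^{-1} C)$; the implementation used in the paper is not described, and the matrices and function names here are made up.

```python
import numpy as np

def wishart_distance(C, sigma):
    """Wishart distance between a pixel covariance C and a class mean sigma."""
    _, logdet = np.linalg.slogdet(sigma)
    # solve(sigma, C) equals inv(sigma) @ C without forming the inverse explicitly.
    return float(logdet + np.trace(np.linalg.solve(sigma, C)).real)

def classify_wishart(C, class_means):
    """Index of the class mean with the smallest Wishart distance to C."""
    return int(np.argmin([wishart_distance(C, s) for s in class_means]))

# Made-up 3x3 class means and one pixel covariance close to the first class.
sigma_a = np.diag([1.0, 0.1, 1.0]).astype(complex)
sigma_b = np.diag([0.2, 0.2, 0.2]).astype(complex)
C = np.diag([0.9, 0.12, 1.1]).astype(complex)

label = classify_wishart(C, [sigma_a, sigma_b])
```

Unlike the SVM, this classifier works directly on the matrix-valued data, which is why it cannot take the optical bands as additional inputs.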

Conclusions
The GF-3 PolSAR data are sensitive to changes in morphological structure during crop growth, whereas the Sentinel-2A optical data capture changes in the moisture and chlorophyll content of crop leaves well. Integrating the two kinds of data can improve the accuracy of crop classification. However, some useful features cannot be used in the classification at the same time; in particular, the covariance matrix of PolSAR data is hard to combine with the spectral bands of optical data. To solve this problem, we used Hoekman's method to transform the covariance matrix into an intensity vector. The PCA algorithm was applied to reduce the redundancy of the feature sets, and training samples were then selected for the SVM classification. The classification accuracy of the proposed method is higher than that of the single-dataset methods and of the other two feature integration modes, and the intensity vector performs better than the covariance matrix for crop classification. Overall, the full feature integration method proposed in this paper is suitable for crop classification and can effectively improve the classification accuracy. Furthermore, this paper extends the application of the GF-3 satellite in agriculture, demonstrating its great potential for crop monitoring.
Author Contributions: H.G. performed the experiments, wrote the paper; C.W. contributed the ideas, analyzed the experimental results and revised the paper; G.W. analyzed the experimental results and revised the paper; J.Z., Y.T., P.S. and Z.Z. contributed discussions for the results and revised the paper.