Multi-Feature Classification of Multi-Sensor Satellite Imagery Based on Dual-Polarimetric Sentinel-1A, Landsat-8 OLI, and Hyperion Images for Urban Land-Cover Classification

This paper evaluates the ability and contribution of backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification, and compares different multi-sensor land cover mapping methods for improving classification accuracy. Landsat-8 OLI and Hyperion images were also acquired and combined with the Sentinel-1A data to explore the potential of multi-sensor urban land cover mapping. The classification was performed using a random forest (RF) method. The results showed that the optimal window size for the combination of all texture features was 9 × 9, and that the optimal window size differed for each individual texture feature. Among the four feature types, the texture features contributed the most to the classification, followed by the coherence and backscatter intensity features; the color features had the least impact on the urban land cover classification. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy of up to 91.55% and a kappa coefficient of up to 0.8935. Among all combinations of Sentinel-1A-derived features, the combination of the four features gave the best classification result. Multi-sensor urban land cover mapping obtained higher classification accuracy. The combination of Sentinel-1A and Hyperion data achieved higher classification accuracy than the combination of Sentinel-1A and Landsat-8 OLI images, with an overall accuracy of up to 99.12% and a kappa coefficient of up to 0.9889. When Sentinel-1A data were added to the Hyperion images, the overall accuracy and kappa coefficient increased by 4.01% and 0.0519, respectively.


Introduction
Over the past few decades, the world has undergone an unprecedented process of urbanization and it is estimated that by 2030, 60% of the global population will live in cities [1]. The rapid population growth has caused unhealthy housing, air pollution, traffic congestion, food security and other issues in urban areas. With the acceleration of urbanization, many changes have taken place in the spatial layout and functions of cities, and the regional ecosystems and climate have been affected [2]. Accurate and timely collection of reliable urban land use and land cover (LULC) information is the key to addressing these issues and achieving sustainable urban development, which is also important for urban planners and decision-makers. Due to its characteristics of frequent and large area detection, remote sensing technology has become an important means to obtain the information of LULC quickly, and has made great contributions to monitoring the process of dynamic urbanization. Extensive important for the identification of LULC and many studies have demonstrated the effectiveness of random forest (RF) [38] in SAR data classification [39][40][41][42].
Furthermore, many features can be extracted from SAR images, and the most commonly used is the backscatter coefficient. For example, Skakun et al. [43] used multi-temporal C-Band Radarsat-2 intensity and Landsat-8 for crop identification. Shu et al. [44] extracted the shoreline from RADARSAT-2 intensity images. Li et al. [45] used a marked point process to detect oil spills from SAR intensity images. Since SAR data can provide abundant texture information, many previous studies have extracted texture features from SAR data for LULC information extraction. Some previous studies used the combination of texture and backscatter intensity features to map urban land cover and found that texture features were important information that can improve classification accuracy [46,47]. In addition to backscatter intensity and texture features, some researchers also extracted coherence and color features from SAR data for urban land cover classification. Uhlmann et al. [48] performed LULC classification by extracting color features from SAR data and demonstrated the usefulness of color features. Pulvirenti et al. [49] analyzed the role of coherence features in mapping floods in agricultural and urban environments. Xiang et al. [50] conducted urban mapping using model-based decomposition and polarization coherence. Zhang et al. [51] only quantitatively evaluated the contribution of texture features. Another study evaluated only the contribution of texture features obtained using three texture measures families [52]. Wurm et al. [53] performed slum mapping using multi-temporal X-band SAR data and evaluated the contribution of texture features by using RF. Schlund et al. [54] evaluated the contribution of texture and coherence features to forest mapping. Although the above studies obtained good classification results using different SAR-derived features, they only used part of these features and did not compare the contribution of these different feature types. 
Consequently, there are still only a few studies that make use of multiple feature types extracted from SAR data for urban land cover mapping, especially studies that quantitatively assess the contribution of different feature types.
The objective of this study was to evaluate the ability and contribution of using backscatter intensity, texture, coherence and color features extracted from Sentinel-1A data for urban land cover classification and to compare different multi-sensor land cover mapping methods to improve classification accuracy. For this purpose, the backscatter intensity, texture, coherence, and color features were extracted from Sentinel-1A data and then the RF classifier was used to explore the contribution and potential of different combinations of these features to urban mapping. We then used the different combinations of Sentinel-1A, Landsat-8 OLI, and Hyperion images to compare and explore the complementary advantages of the three different optical and radar sensors.

Study Area
The study area is located in the downtown area of Suzhou (30°47′ to 32°02′ N; 119°55′ to 121°20′ E), Jiangsu Province, China (Figure 1). As one of the largest cities in the Yangtze River Delta, Suzhou has a total area of 8657.32 km² and had a permanent population of 10.616 million at the end of 2015. It has a subtropical monsoon climate with an annual average temperature of 15-17 °C and an annual average precipitation of around 1076 mm. The average elevation of Suzhou is about 4 m above sea level; plains account for 54% of the total area, and there are many hills in the southwest. In recent years, Suzhou has become a typical rapid urbanization area owing to rapid economic development and dramatic changes in land use. However, this rapid urbanization has also brought a rapid increase in urban population, environmental deterioration, and a resource crisis. To address these issues effectively, there is an urgent demand to monitor the dynamic urbanization in the area.

SAR Satellite Data
As the first new space component of the GMES (Global Monitoring for Environment and Security) satellite series, Sentinel-1 is a constellation of two satellites designed and developed by ESA (European Space Agency) and funded by the European Commission. Sentinel-1A and Sentinel-1B were launched on 3 April 2014 and 25 April 2016, respectively [55]. With the launch of Sentinel-1B, the amount of data obtained doubled and global coverage was achieved in six days. Sentinel-1A carries a C-band SAR instrument (with a 12-day repeat cycle [56]) and has four operational modes: Stripmap mode (SM), Interferometric Wide Swath mode (IW), Extra Wide Swath mode (EW), and Wave mode (WV) [57]. In this study, four Sentinel-1A images were obtained from ESA; detailed image information is shown in Table 1.

Optical Satellite Data
As a joint project of NASA and the U.S. Geological Survey (USGS), Landsat 8 was launched in February 2013 and carries two sensors: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) [26]. The OLI sensor has nine spectral bands (bands 1-9), comprising eight channels with 30 m spatial resolution and one panchromatic channel with 15 m spatial resolution; the TIRS has two spectral bands (bands 10-11) [58]. A Landsat 8 OLI image acquired on 13 October 2015, with a cloud cover of 3.2%, was obtained from the USGS.
Hyperion is a hyperspectral instrument on the Earth Observation 1 (EO-1) spacecraft launched on 21 November 2000 [59], with 7.5 km coverage [60]. It has a total of 242 bands from 357 to 2577 nm with a spectral resolution of 10 nm and a spatial resolution of 30 m [61,62]. In this study, one Hyperion image (ID: EO1H1190382015042110PF) acquired on 11 February 2015 was obtained from the USGS. In addition, a GaoFen-2 (GF-2) image acquired on 28 December 2015 was obtained and used as reference data. GF-2 was successfully launched on 19 August 2014. It is China's first civilian optical remote sensing satellite with a spatial resolution better than 1 m. The GF-2 satellite carries two high-resolution cameras, each providing 0.8-m panchromatic and 3.2-m multispectral imagery [63].

Accuracy Assessment
In this study, five different urban land cover types were identified: dark impervious surface (DIS), bright impervious surface (BIS), forest (FOR), water (WAT), and grass (GRA). An impervious surface is any material that prevents water from penetrating the soil; it not only represents urbanization but is also a major contributor to the environmental impact of urbanization [64]. Impervious surfaces can be divided into two types according to their physical composition [65]: DIS (e.g., asphalt and old concrete) and BIS (e.g., new concrete and metal) [66]. A set of samples was obtained by visual interpretation of the GF-2 image (image date: 28 December 2015), with reference to very high spatial resolution Google Earth images (image date: 8 December 2015). Then, using stratified random sampling, about 50% of the samples were used as training samples and the remaining 50% were used to test the results and calculate the accuracy (Table 2). Finally, the confusion matrix was used to calculate the overall accuracy (OA) and kappa coefficient to evaluate the classification results of urban land cover. In addition, the F1 measure (Equation (1)), the harmonic mean of the producer's and user's accuracy [67,68], was calculated to evaluate the effectiveness of the urban land cover classification. The F1 measure is considered to be more meaningful than the kappa coefficient and the overall accuracy [69]. Its values range from 0 to 1; a larger F1 measure indicates better results, and a smaller F1 measure indicates poorer results:

$$F1 = 2 \times \frac{\text{producer's accuracy} \times \text{user's accuracy}}{\text{user's accuracy} + \text{producer's accuracy}} \quad (1)$$
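The accuracy measures described above can be illustrated with a minimal sketch that derives the overall accuracy, the kappa coefficient, and the per-class F1 measure from a confusion matrix. The function name `accuracy_metrics` and the row/column convention (rows are reference classes, columns are predicted classes) are assumptions made for illustration; they are not taken from the original workflow.

```python
def accuracy_metrics(cm):
    """Return (overall accuracy, kappa, per-class F1) from a square confusion matrix.

    Assumed convention: cm[i][j] is the number of test samples whose
    reference class is i and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    diagonal = sum(cm[i][i] for i in range(n))
    row_totals = [sum(cm[i]) for i in range(n)]                       # reference totals
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted totals

    overall_accuracy = diagonal / total
    # Agreement expected by chance, used by the kappa coefficient.
    # (If expected == 1 the kappa coefficient is undefined; not handled here.)
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)

    f1 = []
    for i in range(n):
        producers = cm[i][i] / row_totals[i] if row_totals[i] else 0.0
        users = cm[i][i] / col_totals[i] if col_totals[i] else 0.0
        denom = producers + users
        # Harmonic mean of producer's and user's accuracy.
        f1.append(2 * producers * users / denom if denom else 0.0)
    return overall_accuracy, kappa, f1
```

For a hypothetical two-class matrix `[[45, 5], [10, 40]]`, this gives an overall accuracy of 0.85 and a kappa coefficient of 0.70.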

Satellite Data Pre-Processing
The Sentinel-1A data were preprocessed using SARscape 5.2 software, with the following steps: multi-looking, registration, speckle filtering, geocoding, and radiometric calibration. To reduce speckle, a Lee filter with a 3 × 3 window was applied to all Sentinel-1A images [70]. The digital number (DN) values of the Sentinel-1A images were converted to the backscatter coefficient (σ⁰) on a decibel (dB) scale. The Sentinel-1A images were then geocoded using the Shuttle Radar Topography Mission (SRTM) DEM with a spatial resolution of 30 m.
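As a small illustration of the decibel scaling mentioned above, the sketch below converts a calibrated backscatter coefficient in linear power units to dB using the standard relation 10 · log10(σ⁰). It does not reproduce the SARscape calibration itself; the function name `to_db` and the `floor` value that guards against taking the logarithm of zero are assumptions.

```python
import math


def to_db(sigma0_linear, floor=1e-6):
    """Convert a linear-scale backscatter coefficient to decibels.

    `floor` is an assumed lower bound that avoids log10(0) for
    no-data or fully dark pixels.
    """
    return 10.0 * math.log10(max(sigma0_linear, floor))
```

For example, a linear value of 0.1 corresponds to about -10 dB.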
Preprocessing of the Landsat 8 OLI and GF-2 images using ENVI 5.3 included radiance calibration, atmospheric correction, and geometric correction. The DN value of the raw image was converted to the surface spectral reflectance by radiance calibration. The atmospheric correction of these multispectral data was conducted using the ENVI FLAASH model. The bad bands and bad columns of the Hyperion image were removed because many of its bands show a low signal-to-noise ratio or other problems [59,71]. Atmospheric correction of the Hyperion image was then also implemented using the FLAASH algorithm. All images were geometrically rectified using 25 ground control points (GCPs), with a root mean square error (RMSE) of less than 0.5 pixels.

Color Features
Color features are important visual features that describe the visual content of a whole image or of a specific image area. Although they do not provide the natural color information of a target, they can provide useful information for understanding and analyzing Sentinel-1A data. Compared with grayscale images, color images not only have a better visual display but also carry rich information about image details [72]. To extract more information for improving the classification accuracy of urban land cover, the HSV color space [73] was used to extract the color features of Sentinel-1A images from false color images. The HSV color space decomposes a color into hue (Hu), saturation (Sa), and value (Va) components [74], and the Hu, Sa, and Va values are the color features [75]. For the dual-polarized Sentinel-1A data used in this study, the backscattering matrices of the two polarizations were used to obtain pseudo-color images by assigning them to the red, green, and blue image components in one of two ways: R: |VH|, G: |VH-VV|, B: |VV|, or R: |VV|, G: |VV-VH|, B: |VH|.
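The pseudo-color construction and HSV decomposition described above can be sketched for a single pixel as follows. This is only an illustration under stated assumptions: the channel values are taken as real amplitudes, the difference channel is computed as the absolute difference of the two amplitudes, and the `scale` parameter used to normalise the channels to [0, 1] is hypothetical. The conversion itself uses `colorsys.rgb_to_hsv` from the Python standard library.

```python
import colorsys


def hsv_features(vv, vh, scale):
    """Return (hue, saturation, value) for one pixel of a pseudo-color image.

    Channel assignment follows R: |VH|, G: |VH-VV|, B: |VV|.
    `scale` is an assumed normalisation constant mapping amplitudes to [0, 1].
    """
    def normalise(x):
        # Clip to [0, 1] after scaling, as colorsys expects that range.
        return min(max(x / scale, 0.0), 1.0)

    red = normalise(abs(vh))
    green = normalise(abs(vh - vv))
    blue = normalise(abs(vv))
    return colorsys.rgb_to_hsv(red, green, blue)
```

Swapping the `vv` and `vh` arguments gives the second channel assignment (R: |VV|, G: |VV-VH|, B: |VH|).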

Texture Features
Texture is an intrinsic spatial feature of an image [76] and an effective representation of spatial relationships [77]. It is important for SAR applications because SAR images contain rich texture information. Therefore, for remote sensing data, and especially SAR data, texture is considered an important tool for distinguishing land cover types. In this study, the gray level co-occurrence matrix (GLCM) was used to extract the following texture features: mean, variance, correlation, dissimilarity, contrast, entropy, angular second moment, and homogeneity. Since the extraction of texture features is affected by the size of the selected window, texture measures were calculated at different window sizes to find the appropriate size. To account for the different spatial variability patterns of the land cover types in the study area, empirical criteria were used to select the window size [78]. The following window sizes were tested: 3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, ..., 75 × 75. To reduce the influence of direction and improve the extraction accuracy of the texture features [76], the features were computed in four directions (0°, 45°, 90°, and 135°), and the final texture information was calculated as the average over the four directions.
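The direction-averaged GLCM procedure described above can be sketched in plain Python for a single window. This is a hedged illustration rather than the software implementation used in the study: it assumes the window has already been quantised to integer gray levels, uses a distance of one pixel, builds a symmetric normalised GLCM, and adopts the commonly used definitions of the eight measures. The names `glcm`, `measures`, and `texture_features` are hypothetical.

```python
import math

# (dx, dy) pixel offsets for the 0, 45, 90 and 135 degree directions.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]


def glcm(window, levels, dx, dy):
    """Symmetric, normalised co-occurrence matrix of a 2-D list of gray levels."""
    p = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    count = 0
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                a, b = window[y][x], window[y2][x2]
                p[a][b] += 1  # count the pair in both orders so the
                p[b][a] += 1  # matrix is symmetric
                count += 2
    return [[v / count for v in row] for row in p]


def measures(p):
    """Eight texture measures from one normalised symmetric GLCM."""
    n = len(p)
    mean = sum(i * p[i][j] for i in range(n) for j in range(n))
    var = sum(p[i][j] * (i - mean) ** 2 for i in range(n) for j in range(n))
    out = {"mean": mean, "variance": var, "contrast": 0.0,
           "dissimilarity": 0.0, "homogeneity": 0.0,
           "angular_second_moment": 0.0, "entropy": 0.0, "correlation": 0.0}
    for i in range(n):
        for j in range(n):
            v = p[i][j]
            out["contrast"] += v * (i - j) ** 2
            out["dissimilarity"] += v * abs(i - j)
            out["homogeneity"] += v / (1 + (i - j) ** 2)
            out["angular_second_moment"] += v * v
            if v > 0:
                out["entropy"] -= v * math.log(v)
            if var > 0:  # correlation is left at 0 for a constant window
                out["correlation"] += v * (i - mean) * (j - mean) / var
    return out


def texture_features(window, levels):
    """Average each measure over the four directions."""
    per_direction = [measures(glcm(window, levels, dx, dy)) for dx, dy in OFFSETS]
    return {name: sum(d[name] for d in per_direction) / len(per_direction)
            for name in per_direction[0]}
```

In practice the function would be applied to a moving window centred on each pixel, with the window size chosen per feature as discussed in the text.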

Coherence Features
Coherence is an estimate of the phase consistency of the imaged targets during the time interval between two SAR acquisitions [79]. The value of the coherence coefficient lies between 0 and 1, and its magnitude indicates whether the corresponding pixel has changed: a larger coherence coefficient indicates a smaller change, while a smaller coherence coefficient indicates a larger change. In this study, SARscape 5.2 software was used to calculate the coherence of Sentinel-1A data for both VV and

Feature Combination
To assess the ability and contribution of backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification, and to compare different multi-sensor land cover mapping methods for improving classification accuracy, the combinations listed in Table 3 were considered. The surface reflectance values in the spectral bands of the Landsat-8 OLI and Hyperion data formed a feature vector that was input to the classifier. The images were then stacked in different combinations.

Classifiers
Random forest is a machine learning method proposed by Breiman [38]. It is a classifier based on decision trees, in which each tree contributes one vote [78], and the final classification or prediction result is obtained by voting [80]. A large number of studies have shown that RF produces relatively high classification accuracy in SAR data classification [23,81,82]. According to [83], RF outperforms standard classification methods, such as a simple decision tree, because it allows for greater differentiation between land cover classes; it is relatively robust to training dataset reduction and noise; and, owing to the Law of Large Numbers, it does not overfit. In addition, it can handle large data sets [53] and provides useful information about variable importance. For each RF classifier, the default of 500 trees was grown, using the square root of the total number of features at each node. In the present study, the variable importance was used to reduce the number of input variables.
RF uses bootstrap aggregating to enhance the diversity of classification trees [78], and the samples that are excluded from the bootstrap sample are referred to as out-of-bag (OOB) samples [51]. The RF classifier assesses the importance of variables using the Gini index and the OOB subset [84]. According to [84], for a training data set T, the Gini index can be defined as (Equation (2)):

$$\sum\sum_{j \neq i} \left(\frac{f(C_i, T)}{|T|}\right)\left(\frac{f(C_j, T)}{|T|}\right) \quad (2)$$

where f(C_i, T)/|T| is the probability that the selected case belongs to class C_i.
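As an illustration of the Gini index used for variable importance, the sketch below computes it for a list of class labels, taking the index in its usual form as the sum, over all pairs of distinct classes, of the product of the two class probabilities f(C_i, T)/|T| and f(C_j, T)/|T|. The function name `gini_index` and the list-of-labels input are assumptions for illustration.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a set of training labels T.

    Each class probability is f(C_i, T) / |T|; the index sums the
    products of probabilities over all ordered pairs of distinct classes.
    """
    total = len(labels)
    probabilities = [count / total for count in Counter(labels).values()]
    return sum(p_i * p_j
               for i, p_i in enumerate(probabilities)
               for j, p_j in enumerate(probabilities)
               if i != j)
```

A node containing a single class has an index of 0, while an even split between two classes gives 0.5; the expression is equivalent to one minus the sum of squared class probabilities.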

Results and Discussion
Texture Analysis of Sentinel-1A Image
Figure 2 shows the classification accuracy of the texture feature experiments based on different window sizes, which were carried out to obtain the best identification window size. The combination of all texture features obtained the highest classification accuracy; in terms of window size, the best classification result was achieved with 9 × 9 (overall accuracy and kappa coefficient were about 90% and 0.88, respectively). According to the work of Pesaresi [85], image spatial resolution and land cover characteristics are the main determinants of the optimal window size for texture features. Wurm et al. [53] used TerraSAR-X data to calculate texture features at different window sizes for LULC identification and found that a window size of 81 × 81 was the best. The classification accuracy using a single texture feature was lower than that of the combination of all the texture features. The classification results for each individual feature reveal the following: the mean feature had the best classification result, followed by the dissimilarity; the worst classification result came from the angular second moment and correlation features.
The best classification results for each individual feature (mean, variance, entropy, dissimilarity, homogeneity, correlation, contrast and angular second moment) correspond to the following window sizes: 5 × 5, 23 × 23, 25 × 25, 13 × 13, 19 × 19, 49 × 49, 51 × 51 and 9 × 9, respectively, with overall accuracy of 72.70%, 58.81%, 61.82%, 63.08%, 60.55%, 60.31%, 63.39% and 59.13%. Before the best classification result was achieved, the classification accuracy increased with the increase of the window size, and then decreased with the increase of the window size after reaching the best classification result. This shows that the best classification results for different texture features correspond to different window sizes. A more effective representation of the most heterogeneous environments with high local variance can be obtained with the smallest window size. In contrast, a more accurate representation of the often homogeneous pattern of spatial variability of large areas can be obtained with the larger window sizes [78]. Compared with the previous studies [53,86], this paper not only explored the optimal window size of the combination of all the texture features, but also obtained the optimal window size corresponding to each individual texture feature. We chose the optimal window size corresponding to each individual texture feature for the following experiment.

Feature Selection
In this paper, the input features were 102 Sentinel-1A-derived features (i.e., color features, texture features, coherence features, and backscatter intensity features), and the importance of variables was assessed using RF classifiers (see Section 3.3). Figure 3 shows how the classification accuracy varies with the number n of best features selected. The overall accuracy and kappa coefficient obtained using the first 10 most important features were about 85% and 0.8000, respectively. As more features were added, higher classification accuracy was achieved. This improvement continued until 50 features, and then stabilized. Although all 102 features achieved the highest classification accuracy, only the 50 most important features contributed significantly to the classification results. Therefore, considering the computational cost and efficiency, we chose the first 50 most important features to reduce the original number of Sentinel-1A-derived features and used them for the following experiments.
Figure 4 shows the first 50 Sentinel-1A-derived features ordered by normalized variable importance. According to the feature rankings, the texture features contributed the most to the classification, with the highest value above 0.75, followed by the coherence features and the backscatter intensity features; the color features contributed the least to urban land cover classification. All the coherence features and the backscatter intensity features appeared among the first 50 features, while the color features and texture features did not all appear. In addition, the maximum value of the color features was about 0.35, which was less than the minimum of the coherence features and the backscatter intensity features.
Figure 5 presents all the texture features ordered by normalized variable importance. The horizontal axis is labeled according to the polarization of the image (VV or VH), its acquisition date, and the name of the texture feature. The individual contributions of the texture features show that the mean feature contributed the most to the classification, with all of its contributions exceeding 0.5, followed by the dissimilarity, contrast, and variance features; the correlation and angular second moment features contributed the least to urban land cover classification, with values all less than 0.25. There was no significant difference between the contributions of the homogeneity and entropy features. Comparing VV and VH polarization information, we found that VH polarization contributed more to the classification than VV polarization; VH polarization appeared 17 times in the first 50 most important features, while VV polarization appeared only 8 times. Jin et al. [87] reported that backscatter coefficients at HV polarization were more important than those at HH polarization. Compared with co-polarization, cross-polarization is strongly sensitive to vegetation structure and biomass because of multiple volume scattering [88,89].
In terms of dates, there was no significant difference in the contribution of the texture features of the four Sentinel-1A images, mainly due to the fact that the data came from the same season. In Suzhou, the rainy season is mainly concentrated in the summer and into the plum-rains season in late June. The measure of feature importance provides useful information for selecting the most important classification features, which can reduce the number of inputs to the classification algorithm and thus speed up the classification process [87].
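The feature-reduction step described above (ranking features by importance and keeping the n best) can be sketched as follows. This is an illustration under assumptions: the importance scores are supplied as a dictionary, normalisation divides by the largest score (the text does not state how the normalised importance was obtained), and the function name and feature names in the usage note are hypothetical.

```python
def top_n_features(importances, n):
    """Rank features by normalised importance and return the n best.

    `importances` maps a feature name to a raw importance score
    (for example, a Gini-based score from an RF classifier).
    Returns (names of the n most important features, normalised scores).
    """
    largest = max(importances.values())
    # Assumed normalisation: scale so the most important feature equals 1.
    normalised = {name: score / largest for name, score in importances.items()}
    ranked = sorted(normalised, key=normalised.get, reverse=True)
    return ranked[:n], normalised
```

For instance, with hypothetical scores `{"VH_mean": 0.8, "VV_coherence": 0.4, "hue": 0.1}` and `n = 2`, the function keeps `VH_mean` and `VV_coherence`. In the study the same idea was applied with n = 50 out of 102 features.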

Contribution of Different Feature Combinations
The contribution was assessed by performing RF classification using different feature combinations. The classification results are shown in Table 4. First, the SAR images were classified based on each of the four single feature types: color features, texture features, coherence features, and backscatter intensity features. As shown in Table 4, the classification accuracy using T was the highest, with an overall accuracy of 89.08% (kappa coefficient = 0.8621), followed by VV + VH and C2. Compared with previous studies [46,90], the classification accuracy of the texture features in this study was higher, mainly because each individual texture feature was calculated using its corresponding best window. Although C1 had the worst performance of the four single features (with an overall accuracy of 58.45% and a kappa coefficient of 0.4734), it could distinguish urban impervious surfaces (BIS and DIS) well. Although the coherence features contribute strongly to the classification and identify urban impervious surfaces well, their overall classification accuracy is low, mainly because of their limited ability to identify forests, grasslands, and water bodies. This shows that texture information is more suitable for urban land cover classification and that the coherence feature is important information for the extraction of urban impervious surfaces. Some previous studies have shown that the crop classification accuracy of backscatter intensity information is higher than that of texture information [26,91]. For the four single features, overall accuracies were lower than 80% except for the value obtained using T. Although the overall accuracy of T was higher than 85%, the corresponding F1 measures of GRA and BIS were lower than 85%, which indicates that none of the four single features alone could meet the requirements of urban land cover classification.
Consistent with previous studies, the classification accuracy was less than 85% when using a single-polarization image [46] or only coherence features [92]. The classification accuracy of the backscatter intensity features in this study was higher than in previous studies using only single-date dual-polarimetric SAR data [93], mainly due to the use of multi-temporal SAR data. Another previous study using texture information from single-date SAR data obtained producer's and user's accuracies of only 54% and 76% for urban areas, respectively [94]. Therefore, multi-date Sentinel-1A images are needed to obtain better urban land cover identification. For color features and backscatter intensity features, the best classification results were for WAT (with F1 measures of 84.04% and 91.70%, respectively), followed by FOR; BIS had the worst performance. For texture features, the best performance came from FOR (with an F1 measure of 94.03%), followed by WAT; the poorest performance was from BIS. For coherence features, the best classification results were found for DIS (with an F1 measure of 73.68%), followed by BIS; GRA had the poorest performance. It can therefore be seen that these four Sentinel-1A-derived features each have their own advantages and disadvantages for urban land cover classification. Although texture features had the highest classification accuracy among the four features, they could not recognize GRA and BIS well. Coherence features were better at identifying urban impervious surfaces, but their overall classification accuracy was the lowest. Color and backscatter intensity features were better at recognizing water bodies and forests, but they could not identify BIS well, with F1 measures of less than 50% for BIS. To investigate the advantages of combining features, classification was carried out by combining two Sentinel-1A-derived features.
The best classification results were produced by the SAR feature combination T + C1, with an overall accuracy of up to 91.55% and a kappa coefficient of up to 0.8935; all F1 measures were above 85%. In general, when the classification accuracy is higher than 85%, the classification result is considered reliable [43,95]. This shows that SAR images can not only replace optical images for urban land cover classification, but can also meet the classification accuracy requirements using only a combination of texture and coherence features. Furthermore, F1 measures using T + C2 and VV + VH + T were all higher than 85% except for BIS. Adding coherence features derived from SAR images to the backscatter intensity images improved the overall accuracy and kappa coefficient by 10.21% and 0.1300, respectively. The importance of coherence features was also supported by the works of Ai et al. [96] and Watanabe et al. [97]. Similar improvements were also observed when combining two other Sentinel-1A-derived features. Uhlmann et al. [98] also showed that the addition of color features can improve the classification accuracy compared with the traditional use of texture features. This indicates that combined features perform better than single features because they provide more useful information for urban land cover classification. For WAT and FOR, the best classification results were from T + C1 (with F1 measures of 94.93% and 96.00%, respectively), followed by VV + VH + T and T + C2; the poorest classification accuracy came from C1 + C2 and VV + VH + C2. For GRA and BIS, the best accuracy was found with T + C1, followed by T + C2; the worst performance was from VV + VH + C2 and C1 + C2. For DIS, the best F1 measure (89.11%) was obtained with T + C2, followed by T + C1 and VV + VH + T; the poorest performance was from VV + VH + C2, with an F1 measure of 76.77%. 
This shows that different features extracted from SAR data are suitable for the extraction of different urban land cover types. That is to say, for the extraction of urban land cover, the selection of SAR data features is very important.
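The accuracy measures quoted throughout this section (overall accuracy, kappa coefficient, and per-class F1 measure) can all be derived from a confusion matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `accuracy_metrics` is hypothetical, rows are assumed to hold reference classes and columns predicted classes, and the F1 measure is taken as the harmonic mean of producer's and user's accuracy.

```python
from typing import List, Sequence, Tuple


def accuracy_metrics(cm: Sequence[Sequence[int]]) -> Tuple[float, float, List[float]]:
    """Return (overall accuracy, kappa, per-class F1) for a square confusion matrix.

    Assumption: rows are reference classes, columns are predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    diag = [cm[k][k] for k in range(n)]

    # Observed agreement (overall accuracy) and chance agreement.
    p_o = sum(diag) / total
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    # Kappa is undefined when chance agreement is 1 (a single class only).
    kappa = (p_o - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    f1_scores = []
    for k in range(n):
        producers = diag[k] / row_sums[k] if row_sums[k] else 0.0  # recall
        users = diag[k] / col_sums[k] if col_sums[k] else 0.0      # precision
        denom = producers + users
        f1_scores.append(2 * producers * users / denom if denom else 0.0)
    return p_o, kappa, f1_scores


# Hypothetical two-class matrix, for illustration only (not data from Table 4).
oa, kappa, f1 = accuracy_metrics([[40, 10], [10, 40]])
print(f"OA = {oa:.2f}, kappa = {kappa:.2f}, F1 = {[round(v, 2) for v in f1]}")
```

For the illustrative matrix, the overall accuracy is 0.80 and the chance agreement is 0.50, so kappa is 0.60; this shows why kappa is always reported alongside, and is lower than, the overall accuracy.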
In order to improve classification accuracy and fully explore the ability of Sentinel-1A-derived features to identify urban land cover, combinations of three features were classified. T + C1 + C2 had the highest classification accuracy (with an overall accuracy of 92.95% and a kappa coefficient of 0.9113), followed by VV + VH + T + C1 (with an overall accuracy of 91.90% and a kappa coefficient of 0.8978). Relative to T + C1, the addition of color features increased the overall accuracy and kappa coefficient by 1.40% and 0.0178, respectively; for T + C1 + C2, the highest classification accuracy came from FOR (F1 measure = 96.35%), while the lowest came from GRA (F1 measure = 87.72%). VV + VH + C1 + C2 had the worst classification results, with an overall accuracy of 87.85% and a kappa coefficient of 0.8466; unlike the results of T + C1 + C2, the best classification results were from WAT (F1 measure = 94.58%) and the worst were from BIS (F1 measure = 76.60%). VV + VH + T + C2 identified urban land cover well: except for BIS, the F1 measures of all land cover types were no less than 85%. Adding texture images to the combination of VV + VH + C2 increased the overall accuracy from 76.41% to 90.14% and the kappa coefficient from 0.7009 to 0.8757. This also indicates that the classification accuracy can be effectively improved when different Sentinel-1A-derived features are combined.
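The two- and three-feature experiments above amount to enumerating subsets of the four Sentinel-1A feature groups. The sketch below is only an illustration of that enumeration: the helper name `feature_combinations` is hypothetical, and the step of stacking the corresponding bands and training an RF classifier for each combination is deliberately omitted.

```python
from itertools import combinations
from typing import Dict, List

# Feature-group labels as used in the text (backscatter, texture, coherence, color).
FEATURE_GROUPS = ["VV + VH", "T", "C1", "C2"]


def feature_combinations(groups: List[str]) -> Dict[int, List[str]]:
    """Map each subset size k to the labels of all k-group combinations."""
    return {
        k: [" + ".join(combo) for combo in combinations(groups, k)]
        for k in range(1, len(groups) + 1)
    }


combos = feature_combinations(FEATURE_GROUPS)
for size, labels in combos.items():
    print(size, labels)
```

With four feature groups this yields four single-feature cases, six two-feature combinations, four three-feature combinations, and one combination of all features, matching the labels discussed in the text.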

Multi-Sensor Urban Land Cover Mapping
To assess the impact of multi-sensor Earth observation data on urban land cover extraction, data from three different Earth observation sensors (Sentinel-1A, Landsat-8 OLI, and Hyperion) were classified with the RF classifier (see Table 5). First, the data from each sensor were classified separately. As shown in Table 5, L had the highest classification accuracy (with an overall accuracy of 95.89% and a kappa coefficient of 0.9480), followed by E. Although the classification accuracy of VV + VH + T + C1 + C2 was the lowest of the three, its classification results were satisfactory, and the classification accuracy of each urban land cover type was higher than 85%; it had the best performance among all combinations of Sentinel-1A-derived features, with an overall accuracy of up to 93.13% and a kappa coefficient of up to 0.9135. Compared with previous research results [91], the classification accuracy of SAR data in this study was higher, mainly because more SAR-derived features were used. For both Landsat-8 OLI and Hyperion images, the highest classification accuracy came from FOR, with F1 measures of 98.00% and 97.59%, respectively. The worst classification results for Landsat-8 OLI data came from DIS (F1 measure = 93.52%), while the worst performance for Hyperion data was from GRA (F1 measure = 92.74%). Unlike Landsat-8 OLI and Hyperion, the highest classification accuracy for Sentinel-1A came from WAT (F1 measure = 97.11%); the worst performance came from BIS, with an F1 measure of 89.02%. This shows that the Earth observation data of different sensors are suited to identifying different land cover types. These sensors record surface information at different wavelengths [99]: optical data carry more spectral information, whereas SAR data carry richer texture information.
In order to use the multi-source data to improve the classification accuracy, classification was performed by combining the optical and radar data. As shown in Table 5, the best classification results were obtained with the combination of Sentinel-1A and Hyperion data, with an overall accuracy of up to 99.12% and a kappa coefficient of up to 0.9889; the addition of Sentinel-1A data to Hyperion images increased the overall accuracy and kappa coefficient by 4.01% and 0.0519, respectively. These classification results are better than those reported by Kumar et al. [35], in which the overall accuracy of the combination of SAR and hyperspectral data was only 68.90%. The accuracy obtained using the combination of Sentinel-1A and Landsat-8 OLI data was lower, with an overall accuracy of 96.83% and a kappa coefficient of 0.9600; the addition of Sentinel-1A data to Landsat-8 OLI images also improved the classification accuracy. This indicates that the combination of Sentinel-1A and Hyperion data leads to better classification accuracy than the combination of Sentinel-1A and Landsat-8 OLI images. Because they operate at different wavelengths, optical and SAR sensors respond to different characteristics of the surface. SAR data capture the structure and dielectric properties of the Earth's surface material, while optical data provide information on surface reflectance and emissivity characteristics [100]. Zhang et al. [101] used an empirical neural network with Landsat TM and ERS-2 SAR data to improve the estimation of water characteristics and found that microwave data can help improve the estimation of these characteristics. Optical and SAR sensors can thus exploit the complementarity of their information to improve accuracy. This explanation is also supported by previous literature that makes use of multispectral and SAR data for land use and land cover identification [30,90,102]. 
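The gains quoted above can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text; the Hyperion-only result and the Landsat-8 OLI gain are derived here by subtraction and are not read from Table 5, so they should be treated as implied values. `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Reported fused results: (overall accuracy in %, kappa coefficient).
S1_HYPERION = (Decimal("99.12"), Decimal("0.9889"))
S1_LANDSAT = (Decimal("96.83"), Decimal("0.9600"))

# Reported gain from adding Sentinel-1A to Hyperion, and reported Landsat-8 OLI-only result.
GAIN_OVER_HYPERION = (Decimal("4.01"), Decimal("0.0519"))
LANDSAT_ONLY = (Decimal("95.89"), Decimal("0.9480"))

# Implied Hyperion-only result (derived, not taken from Table 5).
hyperion_only = tuple(f - g for f, g in zip(S1_HYPERION, GAIN_OVER_HYPERION))
# Implied gain from adding Sentinel-1A to Landsat-8 OLI (also derived).
gain_over_landsat = tuple(f - s for f, s in zip(S1_LANDSAT, LANDSAT_ONLY))

print("Hyperion only (implied):", hyperion_only)
print("Gain over Landsat-8 OLI (implied):", gain_over_landsat)
```

The implied Hyperion-only accuracy (95.11%, kappa 0.9370) is slightly below the Landsat-8 OLI-only accuracy, in line with the single-sensor ranking given earlier, while the implied gain from adding Sentinel-1A to Landsat-8 OLI (0.94%, kappa 0.0120) is much smaller than the gain for Hyperion.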
Figure 6 presents samples of the final map products. Abbreviations are as in Table 3 (e.g., L means Landsat-8 data; E means EO-1 Hyperion data).

Conclusions
In this study, we evaluated the ability and contribution of backscatter intensity, texture, coherence, and color features extracted from Sentinel-1A data for urban land cover classification and compared different multi-sensor land cover mapping methods to improve classification accuracy. The following conclusions can be drawn from these experiments: (1) For the combination of all texture features, the optimal window size was 9 × 9. For each individual texture feature (mean, variance, entropy, dissimilarity, homogeneity, correlation, contrast and angular second moment), the optimal window sizes were 5 × 5, 23 × 23, 25 × 25, 13 × 13, 19 × 19, 49 × 49, 51 × 51 and 9 × 9, respectively. The mean feature had the best classification result, followed by the dissimilarity; the worst classification results came from the angular second moment and correlation features; (2) The RF importance measures showed that the texture features contributed the most to the classification, with the highest value above 0.75, followed by the coherence features and the backscatter intensity features; the color features contributed the least to urban land cover classification; (3) Among the four different feature types, the classification accuracy using texture features was highest, with an overall accuracy of 89.08% (kappa coefficient = 0.8621), followed by backscatter intensity and color features. Although the coherence features had the worst performance of the four single features, they could distinguish urban impervious surfaces (BIS and DIS) well. Satisfactory classification results can be obtained using only the combination of texture and coherence features, with an overall accuracy of up to 91.55% and a kappa coefficient of up to 0.8935; all F1 measures were above 85%. 
When only Sentinel-1A was used, the combination of all Sentinel-1A-derived features performed the best; and (4) Better classification results could be obtained when using multi-sensor data for urban land cover mapping. The combination of Sentinel-1A and Hyperion data yielded higher accuracy than the combination of Sentinel-1A and Landsat-8 OLI imagery, with an overall accuracy of up to 99.12% and a kappa coefficient of up to 0.9889; the addition of Sentinel-1A data to Hyperion images increased the overall accuracy and kappa coefficient by 4.01% and 0.0519, respectively.
With the further application of Sentinel-1A images, our findings provide a reference guide on how to select and combine multi-type features. In a future study, different SAR-derived features from different seasons will be extracted to investigate the impact of seasonality on urban land cover mapping under different climatic conditions.